Compiling the wacom driver from source to fix tilt & rotation support

I was all sat down and set up to do some digital drawing the other day, and then I finally snapped. My graphics tablet (a secondhand Wacom Intuos Pro S from Vinted) - which supports pen tilt - was not functioning correctly. Due to a bug that has yet to be patched, the tilt X/Y coordinates were being wrongly interpreted as unsigned integers (i.e. uint32) instead of signed integers (i.e. int32). This had the effect of causing the rotational calculation to jump around randomly, making drawing difficult.

So, given that someone had kindly posted a source patch, I set about compiling the driver from source. For some reason that is currently unclear to me, the patch has not been merged into the main wacom tablet driver repository. This leaves compiling from source with the patch as the only option currently available.

It worked! I was so ecstatic. I had tilt functionality for the first time!

Fast-forward to yesterday....... and it broke again. I first noticed because I am left-handed, and I have a script that flips the mapping of the pad around so I can use it the opposite way around.
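
For the curious, the flip itself boils down to a couple of xsetwacom calls - something like this sketch (the device names here are examples; yours will differ, so list them first):

xsetwacom --list devices
xsetwacom --set "Wacom Intuos Pro S Pen stylus" Rotate half
xsetwacom --set "Wacom Intuos Pro S Pen eraser" Rotate half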

I have since fixed it, but the entire process took me long enough to figure out that I realised I was already halfway to writing a blog post as a comment on the aforementioned GitHub issue. So I decided to just go the rest of the way, write this up into a full blog post / tutorially kinda thing, and do the drawing I wanted to do in the first place tomorrow.

In short, there are 2 parts to this:

  • input-wacom, the kernel driver
  • xf86-input-wacom, the X11 driver that talks to the kernel driver

....and they both have to be compiled separately, as I discovered yesterday.

Who is this for?

If you've got a Wacom Intuos tablet that supports pen tilt / rotation, then this blog post is for you.

Mine is a Wacom Intuos Pro S PTH-460.

This tutorial has been written on Ubuntu 24.04, but it should work for other systems too.

If there's the demand I might put together a package and put it in my apt repo, though naturally this will be limited to the versions of Ubuntu I personally use on my laptop - though I do tend to upgrade through the 6-monthly updates.

I could also put together an AUR package, but on the devices I run Artix (an Arch derivative) on, I don't usually have a tilt-supporting graphics tablet physically nearby when I'm using them - and they run Wayland for unavoidable reasons.

For reference, this is the system I was compiling on:

Linux MY_DEVICE_NAME 6.8.0-48-generic #48-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 27 14:04:52 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Kompiling the kernel module

Navigate to a clean directory somewhere persistent, as you may need to get back to it later.

If you have the kernel driver installed, then uninstall it now.

On Ubuntu / apt-based systems, the kernel module and the X11 driver bit are bundled all in a single package..... hence the reason why we hafta do all the legwork of compiling and installing both the kernel module and the X11 driver from source :-/

e.g. on Ubuntu:

sudo apt remove xserver-xorg-input-wacom

Then, clone the git repo and checkout the right branch:

git clone https://github.com/jigpu/input-wacom.git -b fix-445
cd input-wacom;

....then, per the official instructions, install build-time dependencies if required:

sudo apt-get install build-essential autoconf linux-headers-$(uname -r)

...check if you have these installed already by replacing apt-get install with apt-cache policy.

Then, build and install all-in-one:

if test -x ./autogen.sh; then ./autogen.sh; else ./configure; fi && make && sudo make install || echo "Build Failed"

....this will prompt for a password to install directly into your system. I think they recommend doing it this way to simplify the build process for people.

This should complete our khecklist for the kernel module, but to activate it you'll need to reboot.

Don't bother doing that right now on Ubuntu though, since we have the X11 driver to go. If you're on a system lucky enough to split the 2 drivers up, you can just reboot here.

You can check (after rebooting!) if you've got the right input-wacom kernel module with this command:

grep "" /sys/module/wacom*/version

....my research suggests you need to have a wacom tablet plugged in for this to work.

If you get something like this:

$ grep "" /sys/module/wacom*/version
v2.00

....then you're still using your distribution-provided wacom kernel module. Go uninstall it!

The output you're looking for should look a bit like this:

$ grep "" /sys/module/wacom*/version
v2.00-1.2.0.37.g2c27caa

Compiling the X11 driver

Next up is xf86-input-wacom, the X11 side of things.

Instructions for this are partially sourced from https://github.com/linuxwacom/xf86-input-wacom/wiki/Building-The-Driver#building-with-autotools.

First, install dependencies:

sudo apt-get install autoconf pkg-config make xutils-dev libtool xserver-xorg-dev$(dpkg -S $(which Xorg) | grep -Eo -- "-hwe-[^:]*") libx11-dev libxi-dev libxrandr-dev libxinerama-dev libudev-dev

Then, clone the git repository and checkout the latest release:

git clone https://github.com/linuxwacom/xf86-input-wacom.git
cd "xf86-input-wacom";
git tag; # Pick the latest one from this list
git switch "$(git tag | tail -n1)"; # Basically git switch TAG_NAME

It should be at the bottom, or at least that's what I found. For me, that was xf86-input-wacom-1.2.3.

Then, to build and install the software from source, run these 2 commands one at a time:

set -- --prefix="/usr" --libdir="$(readlink -e $(ls -d /usr/lib*/xorg/modules/input/../../../ | head -n1))"
if test -x ./autogen.sh; then ./autogen.sh "$@"; else ./configure "$@"; fi && make && sudo make install || echo "Build Failed"

Now you should have the X11 side of things installed. In my case that includes xsetwacom, the (questionably designed) CLI for managing the properties of connected graphics tablets.

If that is not the case for you, you can extract it from the Ubuntu apt package:

apt download xserver-xorg-input-wacom
dpkg -x DEB_FILEPATH_HERE .
ar xv DEB_FILEPATH_HERE # or, if you don't have dpkg for some reason

....then, go locate the tool and put it somewhere in your PATH. I recommend somewhere towards the end, in case you forget and fiddle with your setup some more later - that way it gets overridden automatically. When I was fiddling around, that was /usr/local/games for me.
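
For example, something along these lines (hypothetical paths on my part - adjust to match where you extracted the package):

find . -name xsetwacom
sudo cp ./usr/bin/xsetwacom /usr/local/games/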

Making X11 like the kernel driver

Also known as: enabling hotplug support, or getting the kernel module and X11 to play nicely with each other.

This is required to make udev (the daemon that listens for devices to be plugged into the machine and then performs custom actions on them) tell the X server that you've plugged in your graphics tablet, or X11 to recognise that tablet devices are indeed tablet devices, or something else vaguely similar to that effect.

Thankfully, this just requires the installation of a single configuration file in a directory that may not exist for you yet - especially if you uninstalled your distro's wacom driver package.

Do it like this:

sudo mkdir -p /etc/X11/xorg.conf.d/;
sudo curl -sSv https://raw.githubusercontent.com/linuxwacom/xf86-input-wacom/refs/heads/master/conf/70-wacom.conf -o /etc/X11/xorg.conf.d/70-wacom.conf

Just in case they move things around - as I've seen happen in far too many tutorials with broken links before - the direct link to the exact commit of this file I used is:

https://github.com/linuxwacom/xf86-input-wacom/blob/47552e13e714ab6b8c2dcbce0d7e0bca6d8a8bf0/conf/70-wacom.conf

Final steps

With all that done and out of the way, reboot. This serves 2 purposes:

  1. Reloading the correct kernel module
  2. Restarting the X11 server so it has the new driver.

Make sure to use the above instructions to check you are indeed running the right version of the input-wacom kernel module.

If all goes well, tilt/rotation support should now work in the painting program of your choice.

For me, that's Krita, the AppImage of which I bundle into my apt repository because I like the latest version:

https://apt.starbeamrainbowlabs.com/

The red text "Look! Negative TX/TY (TiltX / TiltY) numbers!" crudely overlaid using the Shutter screenshotting tool on top of a screenshot of the Krita tablet tester with a red arrow pointing at the TX/TY values highlighted in yellow.

Conclusion

Phew, I have no idea where this blog post has come from. Hopefully it is useful to someone else out there who also owns a tilt-supporting Wacom tablet and is encountering a similar kinda issue.

Ref teaching and the previous post, preparing teaching content is thankfully starting to slow down now. Ahead are the uncharted waters of assessment - it is unclear to me how much energy that will take to deal with.

Hopefully though there will be more PhD time (post on PhD corrections..... eventually) and free energy to spend on writing more blog posts for here! This one was enjoyable to write, if rather unexpected.

Has this helped you? Are you still stuck? Do report any issues to the authors of the above two packages I've shown in this post!

Comments below are also appreciated, both large and small.

Ubuntu 24.04 upgrade report

Heya! I just upgraded from Ubuntu 23.10 to Ubuntu 24.04 today, so I thought I'd publish a quick blog post on my experience. There are a number of issues to watch out for on this one.

tldr: Until the first point release comes out, do not upgrade to 24.04 on any machine you do not have physical access to!

While the do-release-upgrade itself went relatively well, I encountered a number of problematic issues that significantly affected the stability of my system afterwards, which I describe below, along with the fixes and workarounds that I applied.

Illustration of a striped numbat, looking up at fireflies against a pink and purple gradient background with light rays coming from the top corners

(Above: One of the official wallpapers for Ubuntu 24.04 Noble Numbat entitled "Little numbat boy", drawn by azskalt in Krita)

apt sources

Of course, any do-release-upgrade you run is going to disable third-party sources. But this time there's a new mysterious format for apt sources that looks a bit like this:

Enabled: yes
Signed-By: /etc/apt/trusted.gpg.d/sbrl.asc
Types: deb
URIs: https://apt.starbeamrainbowlabs.com/
Suites: ./
Components: 

....pretty strange, right? As it turns out, Ubuntu 24.04 has decided to switch to this new "DEB822" apt sources format by default, though I believe the existing format that looks like this:

deb [signed-by=/etc/apt/trusted.gpg.d/sbrl.asc] https://apt.starbeamrainbowlabs.com/ ./ # apt.starbeamrainbowlabs.com

....should still work. Something else to note: the signed-by there is now required, and sources won't work without it.

For more information, see steeldriver's Ask Ubuntu Answer here:

Where is the documentation for the new apt sources format used in 24.04? - Ask Ubuntu

Boot failure: plymouth and the splash screen

Another issue I encountered was this bug:

boot - Kubuntu 24.04 Black Screen / Not Booting After Upgrade - Ask Ubuntu

...basically, there's a problem with the splash screen which crashes the system because it tries to load an image before the graphics drivers load. The solution here is to disable the splash option in the grub settings.

This can be done either before you reboot into 24.04, or - if you have already rebooted into 24.04 - from the grub menu: simply hit e on the default Ubuntu entry and then remove the word splash from the boot line there.
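
For reference, the line you're after in the grub editor looks something like this (the kernel version and root device will differ on your machine) - just delete the word splash from the end:

linux /boot/vmlinuz-6.8.0-48-generic root=UUID=SOME_UUID ro quiet splash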

If you are lucky enough to see this post before you reboot, then simply edit /etc/default/grub and change quiet splash under GRUB_CMDLINE_LINUX_DEFAULT to be an empty string:

GRUB_CMDLINE_LINUX_DEFAULT=""

...and then update grub like so:

sudo update-grub

Boot failure: unable to even reach grub

A strange one I encountered was an inability to even reach grub, even when manually selecting grub.efi as a boot target via my UEFI firmware settings (I'm on an Entroware laptop so that's F2, but your key will vary).

This one kinda stumped me, so I found this page:

Boot-Repair - Community Help Wiki

...which suggests a boot repair tool. Essentially, it reinstalls grub and fixes a number of common issues, such as a missing nvram entry for grub (UEFI systems need bootloaders registering against them) or missing packages - I suspect the latter was the issue this time.

It did claim that my nvram was locked, but it still seems to have resolved the issue anyway. I do recommend booting into the live Ubuntu session with the toram kernel parameter (press e in the grub menu → add the kernel parameter → press ctrl + x) and then removing your flash drive before running this tool, just to avoid it getting confused and accidentally messing with the bootloader on your flash drive - thus rendering it unusable.

Essentially, boot into a live environment, connect to the Internet, and then run these commands:

sudo add-apt-repository ppa:yannubuntu/boot-repair && sudo apt update
sudo apt install -y boot-repair
boot-repair

For some strange reason, sudo is not required for the last command.

indicator-keyboard-service memory leak

Finally, there is a significant memory leak in indicator-keyboard-service, which I assume provides the media/function key functionality. I only noticed it because I have a system resource monitor running in my system tray (indicator-multiload; multiload-ng is an alternative version that may work if you have issues with the former).

The workaround I implemented was to move the offending binary aside and install a stub script in its place:

cd /usr/libexec/indicator-keyboard
sudo mv indicator-keyboard-service indicator-keyboard-service.bak
sudo nano indicator-keyboard-service

In the text editor for the replacement for indicator-keyboard-service, paste the following content:

#!/usr/bin/env sh
exit 0

...save and exit. Then, chmod +x:

sudo chmod +x indicator-keyboard-service

....this should at least work around the issue so that you can regain system stability.

I run the Unity desktop, but this will likely affect the GNOME desktop and others too. There's already a bug report on Launchpad here:

Bug #2055388 "suspected memory leak with indicator-keyboard (causing gnome-session-flashback to freeze after startup)" : Bugs : indicator-keyboard package : Ubuntu

...if this issue affects you, do make sure to go and click the green text towards the top of the page to say so. The more people that say it affects them, the higher it will be on the priority list to fix.

Conclusion

A number of significant issues currently plague the upgrade process to 24.04:

  • Memory leaks from indicator-keyboard-service
  • Multiple issues preventing systems from booting by default

...so I recommend upgrading to 24.04 cautiously at this time. If you do not have physical access to a given system, or do not have the time/energy to fix issues that prevent your system from booting successfully, I strongly recommend waiting for the first or second point release (i.e. 24.04.1 / 24.04.2) before upgrading.

If you haven't already, I also strongly recommend configuring timeshift to take automated snapshots of your system so that you can easily roll back in case of a failure.

Finally, I also recommend upgrading via the command line with this command:

sudo do-release-upgrade

...and carefully monitoring the logs as the upgrade process is running. Then, do not reboot as it asks you to until you have checked and resolved all of the above issues.

That's all I have at the moment for upgrading Ubuntu. I have 3 other systems to upgrade from 22.04, but I'll be waiting for the first point release before attempting those. I'll make another post (or a comment on this one) to let everyone know how it goes when I do begin the process of upgrading them.

If you've encountered any issues in the upgrade process to 24.04 (or have any further insight into the issues I describe here), please do leave a comment below!

A memory tester for the days of UEFI

For the longest time, memtest86+ was a standard for testing sticks of RAM that one suspects may be faulty. I haven't used it in a while, but when I do use it I find that an OS-independent tool (i.e. one that you boot into instead of your normal operating system) is the most reliable way to identify faults with RAM.

It may surprise you, but I've had this post mostly written up for about 2 years...! I remembered about this post recently, and decided to rework some of it and post it here.

Since UEFI (the Unified Extensible Firmware Interface) was invented and replaced the traditional BIOS for booting systems around the world, booting memtest86+ became more challenging, as for a long time it was not compatible with UEFI. Now that it has been updated to support UEFI though, I thought I'd write a blog post about it - mainly because there are very rarely guides on booting images like memtest86+ from a multiboot flash drive, like the one I have blogged about before.

Before we begin, memtest86 is worthy of note. While it has a very similar name, it is a variant of memtest86+ that is not open source. I have tried it though, and it works well too - brief instructions for it can be found at the end of this blog post.

I will assume that you have already followed my previous guide on setting up a multiboot flash drive. You can find that guide here:

Multi-boot + data + multi-partition = octopus flash drive 2.0?

Alternatively, anywhere you can find a grub config file you can probably follow this guide. I have yet to find an actually decent reference for the grub configuration file language, but if you know of one, please do post it in the comments.

Memtest86+ (the open source one)

Personally, I recommend the open-source Memtest86+. Since the update to version 7.0, it is now compatible with both BIOS and UEFI-based systems without any additional configuration, which is nice. See the above link to one of my previous blog posts if you would like a flash drive that boots both BIOS and UEFI grub at the same time.

To start, visit the official website, and scroll down to the download section. From here, you want to download the "Binary Files (.bin/.efi) For PXE and chainloading" version. Unzip the file you download, and you should see the following files:

memtest32.bin
memtest32.efi
memtest64.bin
memtest64.efi

....discard the files with the .efi file extension - these are for booting directly instead of being chainloaded by grub. As the names suggest, the ones with 64 in the filename are for 64-bit systems, which includes most systems today. Copy these to the device of your choice, and then open up your relevant grub.cfg for editing (or equivalent grub configuration file - see below for already-installed systems). Then, somewhere in there add the following:

submenu "Memtest86+" {
    if loadfont unicode ; then
        set gfxmode=1024x768,800x600,auto
        set gfxpayload=800x600,1024x768
        terminal_output gfxterm
    fi

    insmod linux

    menuentry "[amd64] Start Memtest86+, use built-in support for USB keyboards" {
        linux /images/memtest86/memtest64.bin keyboard=both
    }
    menuentry "[amd64] Start Memtest86+, use BIOS legacy emulation for USB keyboards" {
        linux /images/memtest86/memtest64.bin keyboard=legacy
    }
    menuentry "[amd64] Start Memtest86+, disable SMP and memory identification" {
        linux /images/memtest86/memtest64.bin nosmp nosm nobench
    }
}

...replace /images/memtest86/memtest64.bin with the path to the memtest64.bin (or memtest32.bin) file, relative to your grub.cfg file. I forget where I took the above config from, and unfortunately I can't find it in my history either.

If you are doing this on an installed OS instead of a USB flash drive, then things get a little more complicated. You will need to dig around and find what your version of grub considers paths to be relative to, and put your memtest64.bin file somewhere nearby - see the sketch below. If you have experience with this, then please do leave a comment below.
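
As a rough sketch (assuming a stock Ubuntu-style grub where custom entries go in /etc/grub.d/40_custom, and that you've copied memtest64.bin into /boot - note that if /boot is a separate partition, the path becomes /memtest64.bin instead):

menuentry "Memtest86+" {
    linux /boot/memtest64.bin
}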

This should be all you need. For those using a grub setup for an already-installed OS (e.g. via /etc/default/grub), then you will need to run a command for your changes to take effect:

sudo update-grub

Adding shutdown, reboot, and firmware settings options to grub

Another thing I discovered recently is how to add options to my grub menu to reboot, shut down, and reboot into firmware settings. rEFInd (an alternative bootloader to grub that I like very much, but which I haven't yet explored for booting multiple ISOs on a flash drive) has these in its menus by default, but grub doesn't - so since I discovered how to do it recently, I thought I'd include the config here for reference.

Simply add the following somewhere in your grub configuration file:

menuentry "Reboot" {
    reboot
}

menuentry "Shut Down" {
    halt
}

menuentry "UEFI Firmware / BIOS Settings" {
    fwsetup
}

Bonus: Memtest86 (non open-source)

I followed this guide: https://www.yosoygames.com.ar/wp/2020/03/installing-memtest86-on-uefi-grub2-ubuntu/ - but ended up changing a few things, so I'll outline the process here. Again, I'll assume you already have a multiboot flash drive.

Firstly, download memtest86-usb.zip and extract the contents. Then, find the memtest86-usb.img file and find the offset of the partition that contains the actual EFI image - the memtest86 program itself:


fdisk -lu memtest86-usb.img

Disk memtest86-usb.img: 500 MiB, 524288000 bytes, 1024000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 68264C0F-858A-49F0-B692-195B64BE4DD7

Device              Start     End Sectors  Size Type
memtest86-usb.img1   2048  512000  509953  249M Microsoft basic data
memtest86-usb.img2 514048 1023966  509919  249M EFI System

Then, take the start position of the second partition (the EFI System partition - the last line of the output above), and multiply it by 512, the sector size. In my case, the number is 514048 × 512 = 263192576. Then, mount the partition into a directory you have already created:

sudo mount -o loop,offset=263192576 memtest86-usb.img /absolute/path/to/dir
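
By the way, the shell will happily do the offset multiplication for you:

echo $((514048 * 512)) # prints 263192576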

Then, browse the contents of the mounted partition and copy the EFI/BOOT directory off to your flash drive, renaming it to memtest or something, like this:
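
Something along these lines, assuming the mount point from earlier (the flash drive path is a placeholder):

sudo cp -r /absolute/path/to/dir/EFI/BOOT /path/to/flashdrive/images/memtest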

Now, update your grub.cfg and add the following:

menuentry "memtest86" {
    chainloader /images/memtest/BOOTX64.efi
}

....replacing /images/memtest/BOOTX64.efi with the path to the BOOTX64.efi file that should be directly in the BOOT directory you copied off.

Finally, you should be able to try it out! Boot into your multiboot flash drive as normal, and then select the memtest86 option from the grub menu.

Extra note: booting from hard drives

This post is really turning into a random grab-bag of items in my grub config file, isn't it? Anyway, an option I don't use all that often (but which is very useful when I do need it) is booting from the different hard drives in a machine. Since you can't get grub to figure out how many there are in advance, you have to statically define them ahead of time:



submenu "Boot from Hard Drive" {
    menuentry "Hard Drive 0" {
        set root=(hd0)
        chainloader +1
    }
    menuentry "Hard Drive 1" {
        set root=(hd1)
        chainloader +1
    }
    menuentry "Hard Drive 2" {
        set root=(hd2)
        chainloader +1
    }
    menuentry "Hard Drive 3" {
        set root=(hd3)
        chainloader +1
    }
}

....chainloading (aka calling another bootloader) is a wonderful thing :P

Of course, expand this as much as you like. This approach also works with specific partitions using the syntax (hd0,X), where X is the partition number - note that in grub 2, partition numbers start from 1 (e.g. (hd0,1) or (hd0,gpt1)), while drive numbers start from 0.
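
For example, a menu entry for the first partition on the first disk might look like this (an untested sketch on my part):

menuentry "Hard Drive 0, Partition 1" {
    set root=(hd0,1)
    chainloader +1
}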

Again, add this to your grub.cfg file and update as above.

Conclusion

This post is more chaotic and disorganised than I expected, but I thought it would be useful to document some of the tweaks I've made to my multiboot flash drive setup over the years - something that has more than proven its worth many, many times since I first set it up.

We've added a memory (RAM) tester to our setup using the open-source Memtest86+, as well as the alternative non-open-source version. We've also added options to reboot, shut down, and enter the BIOS/UEFI firmware settings.

Finally, we took a quick look at adding options to boot from different hard drives and partitions. If anyone knows how to add a menu item that could allow one to distinguish between different hard disks, partitions, their sizes, and their content more easily, please do leave a comment below.

Sources and further reading

Encrypting and formatting a disk with LUKS + Btrfs

Hey there, a wild tutorial appeared! This is just a quick one for self-reference, but I hope it helps others too.

The problem at hand is that of formatting a data disk (if you want to format your root / disk, please look elsewhere - that usually has to be done before or during installation, unless you like fiddling around in a live environment) with Btrfs.... but also encrypting the disk, which isn't something that Btrfs natively supports.

I'm copying over some data to my new lab PC, and I've decided to up the security on the data disk I store my research data on.

Unfortunately, both GParted and KDE Partition Manager were unable to help me (the former not supporting LUKS, and the latter crashing with a strange error), so I ended up looking through more posts than should be reasonable to find a solution that didn't involve encrypting either / or /boot.

It's actually quite simple. First, find your disk's name via lsblk, and ensure you have created the partition in question. You can format it with anything (e.g. with one of the graphical tools above), since we'll be overwriting it anyway.

Note: You may need to reboot after creating the partition (or after some of the steps below) if you encounter errors, as Linux sometimes doesn't much like new partitions appearing out of the blue with names that were previously used during that boot.

Then, format it with LUKS, the most common encryption scheme on Linux:

sudo cryptsetup luksFormat /dev/nvmeXnYpZ

...then, formatting with Btrfs is a 2-step process. First we hafta unlock the LUKS encrypted partition:

sudo cryptsetup luksOpen /dev/nvmeXnYpZ SOME_MAPPER_NAME

...this creates a virtual 'mapper' block device we can hit like any other normal (physical) partition. Change SOME_MAPPER_NAME to anything you like so long as it doesn't match anything else in lsblk/df -h and also doesn't contain spaces. Avoid unicode/special characters too, just to be safe.

Then, format it with Btrfs:

sudo mkfs.btrfs --metadata single --data single --label "SOME_LABEL" /dev/mapper/SOME_MAPPER_NAME

...replacing SOME_MAPPER_NAME (same value you chose earlier) and SOME_LABEL as appropriate. If you have multiple disks, rinse and repeat the above steps for them, and then bung them on the end:

sudo mkfs.btrfs --metadata raid1 --data raid1 --label "SOME_LABEL" /dev/mapper/MAPPER_NAME_A /dev/mapper/MAPPER_NAME_B ... /dev/mapper/MAPPER_NAME_N

Note the change from single to raid1. raid1 stores at least 2 copies on different disks - it's a bit of a misnomer as I've talked about before.

Now that you have a kewl Btrfs-formatted partition, mount it as normal:

sudo mount /dev/mapper/SOME_MAPPER_NAME /absolute/path/to/mount/point

For Btrfs filesystems with multiple disks, it shouldn't matter which source partition you pick here as Btrfs should pick up on the other disks.

Automation

Now that we have it formatted, we don't want to hafta keep typing all those commands again. The simple solution to this is to create a shell script and put it somewhere in our $PATH.

To do this, we should ensure we have a robust name for the disk instead of /dev/nvmeXnYpZ, which could point to a different disk in future if your motherboard or kernel decides to present the disks in a different order for a giggle. That's easy: look over the output of blkid and cross-reference it with lsblk and/or df -h:

sudo lsblk
sudo df -h
sudo blkid # → UUID

The value you're after should be in the UUID="" field. The shell script I came up with is short and sweet:

#!/usr/bin/env bash
disk_id="ID_FROM_BLKID";
mapper_name="SOME_NAME";
mount_path="/absolute/path/to/mount/dir";

sudo cryptsetup luksOpen "/dev/disk/by-uuid/${disk_id}" "${mapper_name}";
sudo mount "/dev/mapper/${mapper_name}" "${mount_path}"

Fill in the values as appropriate:

  • disk_id: The UUID of the disk in question from blkid.
  • mapper_name: A name of your choosing that doesn't clash with anything else in /dev/mapper on your system
  • mount_path: The absolute path to the directory that you want to mount into - usually in /mnt or /media.

Put this script in e.g. $HOME/.local/bin or somewhere else in $PATH that suits you and your setup. Don't forget to run chmod +x path/to/script!
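
For completeness, a matching teardown script might look something like this - a minimal sketch; fill in the same mapper_name and mount_path values as above:

#!/usr/bin/env bash
mapper_name="SOME_NAME";
mount_path="/absolute/path/to/mount/dir";

# Unmount the filesystem first, then lock the LUKS container again
sudo umount "${mount_path}";
sudo cryptsetup luksClose "${mapper_name}";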

Conclusion

We've formatted an existing partition with LUKS and Btrfs, and written a quick-and-dirty shell script to semi-automate the process of mounting it.

If this has been useful or if you have any suggestions, please do leave a comment below!

Sources and further reading

.desktop files: Launcher icons on Linux

Heya! Just thought I'd write a quick reminder post for myself on the topic of .desktop files. In most Linux distributions, launcher icons for things are dictated by files with the file extension .desktop.

Of course, most programs these days come with a .desktop file automatically, but if you, for example, download an AppImage, then you might not get an auto-generated one. You might also be packaging something for your distro's package manager (go you!) - something I do semi-regularly when the apt repos for software I need aren't updated (see my apt repository!).

They can live either locally to your user account (~/.local/share/applications) or globally (/usr/share/applications), and they follow the XDG desktop entry specification (see also the Arch Linux docs page, which is fabulous as usual ✨). It's basically a fancy .ini file:

[Desktop Entry]
Encoding=UTF-8
Type=Application
Name=Krita
Comment=Krita: Professional painting and digital art creation program
Version=1.0
Terminal=false
Exec=/usr/local/bin/krita
Icon=/usr/share/icons/krita.png

Basically, leave the first line and the Type, Version, Terminal, and Encoding directives alone, but you'll want to customise the others:

  • Name: The name of the application to be displayed in the launcher
  • Comment: The short 1-line description. Some distros display this as the tooltip on hover, others display it in other ways.
  • Exec: Path to the binary to execute. Prepend with env Foo=bar etc if you need to set some environment variables (e.g. running on a discrete card - 27K views? wow O.o)
  • Icon: Path to the icon to display as the launcher icon. For global .desktop files, this should be located somewhere in /usr/share/icons.

This is just the basics. There are many other directives you can include - like the Categories directive, which describes - if your launcher supports categories - which categories a given launcher icon should appear under.
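
For example, for Krita one might add something like this (the category names come from the XDG menu specification):

Categories=Graphics;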

Troubleshooting: It took me waaay too long to realise this, but if you have put your .desktop file in the right place and it isn't appearing - even after a relog - then the desktop-file-validate command could come in handy:

desktop-file-validate path/to/file.desktop

It validates the content of a given .desktop file. If it contains any errors, it will complain about them for you - unlike your desktop environment, which just silently ignores .desktop files that are invalid.

If you found this useful, please do leave a comment below about what you're creating launcher icons for!

Sources and further reading

NAS Series List

Somehow, despite posting about my NAS back in 2021, I have yet to post a proper series list for it! I'm rectifying that now with this quick post.

I wrote this series of 4 posts back when I first built my new NAS box.

Here's the full list of posts in the main NAS series:

Additionally, as a bonus, later in 2021 I also wrote a pair of posts about how I was backing up my NAS to a backup NAS. Here they are:

How (not) to recover a consul cluster

Hello again! I'm still getting used to a new part-time position at University which I'm not quite ready to talk about yet, but in the meantime please bear with me as I shuffle my schedule around.

As I've explained previously on here, I have a consul cluster (superglue service discovery!) that forms the backbone of my infrastructure at home. Recently, I had a small powercut that knocked everything offline, and as the recovery process was quite interesting I thought I'd blog about it here.

The issue at hand happened at about 5pm, but I only discovered it was a problem a few hours later when I got home. Essentially, a small powercut knocked everything offline. While my NAS rebooted automatically afterwards, my collection of Raspberry Pis wasn't so lucky. I can only suspect that they were caught in some transient state or something. None of them responded when I pinged them, and later inspection of the logs on my collectd instance revealed that they were essentially non-functional until they were rebooted manually.

A side effect of this was that my Consul cluster (and, by extension, my Nomad cluster) was knocked offline.

Anyway, at first I only rebooted the controller host (which has both a Consul and a Nomad server running on it, but does not accept and run jobs). This rebooted just fine and came back online, so I then rebooted my monitoring box (which also runs continuous integration), which also came back online.

Due to the significantly awkward physical location I keep my cluster in with the rest of the Pis, I decided to flip the power switch on the extension to restart all my hosts at the same time.

While this worked..... it also caused my cluster controller node to reboot, which caused its raft epoch number to increment by 1... which broke the quorum (agreement) of my cluster, and required manual intervention to resolve.

Raft quorum

To understand the specific issue here, we need to look at the Raft consensus algorithm. Raft is, as the name suggests, a consensus algorithm. Such an algorithm is useful when you have a cluster of servers that need to work together in a redundant fault-tolerant fashion on some common task, such as in our case Consul (service discovery) and Nomad (task scheduling).

The purpose of a raft server is to maintain agreement amongst all nodes in a cluster as to the global state of an application. It does this using a distributed log that it replicates through a fancy but surprisingly simple algorithm.

At the core of this algorithm is the concept of a leader. The cluster leader is responsible for managing and committing updates to the global state, as well as sending out the global state to everyone else in the cluster. In the case of Consul, the Consul servers are the cluster (the clients simply connect back to whichever servers are available) - and I have 3 of them, since Raft requires an odd number of nodes.

When the cluster first starts up or the leader develops a fault (e.g. someone sets off a fork bomb on it just for giggles), an election occurs to decide on a new leader. The election term number (or epoch number) is incremented by one, and everyone votes on who the new leader should be. The node with the most votes becomes the new leader, and quorum (agreement) is achieved across the entire cluster.

Consul and Raft

In the case of Consul, everyone must cast a vote for the vote to be considered valid, otherwise the vote is considered invalid and the election process must begin again. Crucially, the election term number must also be the same across everyone voting.

In my case, because I started my cluster controller and then rebooted it before it had a chance to achieve quorum, it incremented its election term number one more time than the rest of the cluster did. This caused the cluster to fail to reach quorum: the other 2 nodes in the Consul server cluster considered the controller node's vote to be invalid, yet they still demanded that all servers vote to elect a new leader.

The practical effect of this was that because the Consul cluster failed to agree on who the leader should be, the Nomad cluster (which hangs off the Consul cluster, using it to find each other) also failed to start and subsequently reach quorum, which knocked all my jobs offline.

The solution

Thankfully, the Hashicorp Consul documentation for this specific issue is fabulous:

https://developer.hashicorp.com/consul/tutorials/datacenter-operations/recovery-outage#failure-of-a-server-in-a-multi-server-cluster

To summarise:

  1. Boot the cluster as normal if it isn't booted already
  2. Stop the failed node
  3. Create a special config file (raft/peers.json) that will cause the failed node to drop its state and accept the state of the incomplete cluster, allowing it to rejoin and the cluster to regain collective quorum once more - see the sketch below.
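
For reference, peers.json is a JSON array listing the servers that should make up the cluster. On modern Consul versions (Raft protocol 3) it looks something like this sketch - the IDs and addresses here are made-up placeholders; the real node ID for each server lives in the node-id file in Consul's data directory:

[
  {
    "id": "b563ad19-0000-0000-0000-000000000000",
    "address": "10.0.0.10:8300",
    "non_voter": false
  },
  {
    "id": "0d31c398-0000-0000-0000-000000000000",
    "address": "10.0.0.11:8300",
    "non_voter": false
  }
]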

The documentation for this recovery protocol is quite clear. While there is an option to recover a failed node if you still have a working cluster with a leader, in my case I didn't, so I had to use the alternate route.

Conclusion

I've talked briefly about an interesting issue that caused my Consul cluster to break quorum, which inadvertently brought my entire infrastructure down until I resolved the issue.

While Consul is normally really quite resilient, you can break it if you aren't careful. Having an understanding of the underlying consensus algorithm Raft is very helpful to diagnosing and resolving issues, though the error messages and documentation I looked through were generally clear and helpful.

Considerations on monitoring infrastructure

I like Raspberry Pis. I like them so much that by my count I have at least 8 in operation at the time of typing performing various functions for me, including a cluster for running various services.

Having Raspberry Pis and running services on servers is great, but once you have some infrastructure set up hosting something you care about, your thoughts naturally turn to mechanisms by which you can ensure that such infrastructure continues to run without incident - and that if problems do occur, they can be diagnosed and fixed efficiently.

Such is the thought that is always on my mind when managing my own infrastructure, which sprawls across multiple physical locations. To this end, I thought I'd blog about what my monitoring system looks like - what its strengths are, and what it could do better.

A note before we begin: I continue to have a long-term commitment to posting on this blog - I have just started a part-time position alongside my PhD due to the end of my primary research period, which has been taking up a lot of my mental energy. Things should get slowly back to normal soon-ish.

Keep in mind as you read this that my situation may be different to your own. For example, monitoring a network primarily consisting of Raspberry Pis demands a very different approach than an enterprise setup (if you're looking for a monitoring solution for a bunch of big powerful servers, I've heard the TICK stack is a good place to start).

Monitoring takes many forms and purposes. Broadly speaking, I split the monitoring I have on my infrastructure into the following categories:

  1. Logs (see my earlier post on Centralising logs with rsyslog)
  2. System resources (e.g. CPU/RAM/disk/etc usage) - I use collectd for this
  3. Service health - I use Consul for my cluster, and Uptime Robot for this website.
  4. Server health (e.g. whether a server is down or not, hanging due to a bad mount, etc.)

I've found that as there are multiple categories of things that need monitoring, there isn't a single one-size-fits-all solution to the problem, so different tools are needed to monitor different things.

Logs - centralised rsyslog

At the moment, monitoring logs is a solved problem for me. I've talked about my setup previously, in which I have a centralised rsyslog server which receives and stores all logs from my entire infrastructure (barring a few select boxes I need to enrol in this system). Storing logs nets me 2 things:

  1. The ability to reference them (e.g. with lnav) later in the event of an issue for diagnostic purposes
  2. The ability to inspect the logs during routine maintenance for any anomalies, issues, or errors that might become problematic later if left unattended

System information - collectd

Similarly, storing information about system resource usage - such as CPU load or disk usage for instance - is more useful than you'd think for spotting and pinpointing issues with one's infrastructure - be it a single server or an entire fleet. In my case, this also includes monitoring network latency (useful should my ISP encounter issues, as then I can identify if it's a me or a them problem) and HTTP response times.

For this, I use collectd, backed by rrd (round-robin database) files. These are fixed-size files containing ring buffers that collectd iteratively writes over, allowing efficient storage of up to 1 year's worth of history.

To visualise this in the browser, I use Collectd Graph Panel, which is unfortunately pretty much abandonware (I haven't found anything better).

To start with the strengths of this system: it's very computationally efficient. I have previously tried to set up a TICK (Telegraf, InfluxDB, Chronograf, and Kapacitor) stack on a Raspberry Pi, but it was way too heavy - especially considering the Raspberry Pi my monitoring system runs on is also my continuous integration server. Collectd, on the other hand, runs quietly in the background, barely using any resources at all.

Another strength is that it's easy and simple. You throw a config file at it (which could easily be standardised across an entire fleet of servers), and collectd will dutifully send encrypted system metrics to a given destination for you with minimal fuss. Meanwhile, the browser-based dashboard I use automatically plots graphs and displays them for you, without any tedious creation of a custom dashboard.
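
Conceptually, the client side of such a config boils down to loading the network plugin and pointing it at the monitoring box - something like this sketch (the hostname, port, and credentials are placeholders; the username/password must match an entry in the auth file on the receiving server):

LoadPlugin network
<Plugin network>
    <Server "monitoring.example.com" "25826">
        SecurityLevel "Encrypt"
        Username "collectd"
        Password "SOME_LONG_PASSWORD"
    </Server>
</Plugin>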

Having a system monitor things is good, but having it notify you in the event of an anomaly is even better. While collectd does have the ability to generate and send notifications, its capacity to do this is unfortunately rather limited.

Another limitation of collectd is that accessing and processing the stored system metrics data is not a trivial process, since it's stored in rrd databases, which are surprisingly difficult to parse due to a lack of readily available libraries. This makes it difficult to integrate with other systems - such as n8n for example, which I have recently set up to replace some functions of IFTTT to automatically repost my blog posts here to Reddit and Discord.
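
The one escape hatch I'm aware of is the rrdtool CLI itself, which can dump a database to XML or fetch recent datapoints as text - though you're still left parsing the output yourself:

rrdtool dump path/to/file.rrd > file.xml
rrdtool fetch path/to/file.rrd AVERAGE -s -1h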

Collectd can write to multiple destinations however (e.g. MQTT), so I might look into this as an option to connect it to some other program to deliver more flexible notifications about issues.

Service health

Service health is what most people might think of when I initially said that this blog post would be about monitoring. In many ways, it's one of the most important things to monitor - especially if other people rely on infrastructure which is managed by you.

Currently, I achieve this in 2 ways. Firstly, for services running on the server that hosts this website I have a free Uptime Robot account which monitors my server and website. It costs me nothing, and I get monitoring of my server from a completely separate location. In the event my server or the services thereon that it monitors are down, I will get an email telling me as such - and another email once it goes back up again.

Secondly, for services running on my cluster I use Consul's inbuilt service monitoring functionality, though I don't yet have automated emails to notify me of failures (something I need to investigate a solution for).

The monitoring system you choose here depends on your situation, but I strongly recommend having at least some form of external monitoring of whether your target boxes go down that can notify you of this. If your monitoring is hosted on the box that goes down, it's not really of much use...!

Monitoring service health more robustly and notifying myself about issues is currently on my todo list.

Server health

Server health ties into service health, and perhaps also system information too. Knowing which servers are up and which ones are down is important - not least because of the services running thereon.

The tricky part of this is that if a server goes down, it could be because of any one of a number of issues - ranging from a simple software/hardware failure, all the way up to an entire-building failure (e.g. a powercut) or a natural disaster. With this in mind, it's important to plan your monitoring carefully such that you still get notified in the event of a failure.

Conclusion

In this post, I've talked a bit about my monitoring infrastructure, and things to consider more generally when planning monitoring for new or existing infrastructure.

It's never too late to iteratively improve your infrastructure monitoring system - whether it be enrolling that box in the corner that never got added to the system, or implementing a totally new kind of monitoring (e.g. centralised logging) - or, in my case, working on more notifications for when things go wrong.

On a related note, what do your backups look like right now? Are they automated? Do they cover all your important data? Could you restore them quickly and efficiently?

If you've found this interesting, please leave a comment below!

NSD, Part 2: Dynamic DNS

Hey there! In the last post, I showed you how to set up nsd, the Name Server Daemon - an authoritative DNS server that serves records for a given domain. In this post, I'm going to talk through how to extend that configuration to support dynamic DNS.

Normally, if you query, say, the A or AAAA records for a domain or subdomain like git.starbeamrainbowlabs.com, it will return the same IP address that you manually set in the DNS zone file - or, if you use some online service, the value you manually set there. This is fine if your IP address does not change, but becomes problematic if your IP address may change unpredictably.

The solution, as you might have guessed, lies in dynamic DNS. Dynamic DNS is a fancy word for some kind of system where the host system that a DNS record points to (e.g. compute.bobsrockets.com) informs the DNS server about changes to its IP address.

This is done by making a network request from the host system to some kind of API that automatically updates the DNS server - usually over HTTP (though anything else could work too, but please make sure it's encrypted!).

You may already be familiar with using a HTTP API to inform your cloud-based registrar (e.g. Cloudflare, Gandi, etc) of IP address changes, but in this post we're going to set dynamic DNS up with the nsd server we configured in the previous post mentioned above.

The first order of business is to find some software to do this. You could also write something yourself (see also my post on setting up a systemd service). There are several choices, but I went with dyndnsd (I may update this post if I ever write my own daemon for this).

Next, you need to determine what subdomain you'll use for dynamic DNS. Since DNS is hierarchical, an entire subdomain is required - you can't just do dynamic DNS for, say, wiki.bobsrockets.com. Since dyndnsd will manage its own DNS zone file, all dynamic DNS hostnames will be under that subdomain - e.g. wiki.dyn.bobsrockets.com.

Configuring the server

For the server, I will be assuming that the dynamic dns daemon will be running on the same server as the nsd daemon.

For this tutorial, we'll be setting it up unencrypted. This is a security risk if you are setting it up to accept requests over the Internet rather than a local trusted network! There are notes on how to fix this at the end of this post.

Since this is a Ruby-based program (something I generally recommend avoiding, since I've observed Ruby to be a rather inefficient language to write a program in), first we need to install gem, the Ruby package manager:

sudo apt install ruby ruby-rubygems ruby-dev

Then, we can install dyndnsd itself via gem:

sudo gem install dyndnsd

Now, we need to configure it. dyndnsd is configured using a YAML (ew) configuration file. It's probably best to show an example configuration file and explain it afterwards:

# listen address and port
host: "0.0.0.0"
port: 5354
# The internal database file. We'll create this in a moment.
db: "/var/lib/dyndnsd/db.json"
# enable debug mode?
debug: false
# all hostnames are required to be cool-name.dyn.bobsrockets.com
domain: "dyn.bobsrockets.com"
# configure the updater, here we use command_with_bind_zone, params are updater-specific
updater:
  name: "command_with_bind_zone"
  params:
    zone_file: "/etc/dyndnsd/zones/dyn.bobsrockets.com.zone"
    command: "systemctl reload nsd"
    ttl: "5m"
    dns: "bobsrockets.com."
    email_addr: "bob.bobsrockets.com"
# Users with the hostnames they are allowed to create/update
users:
  computeuser: # <--- Username
    password: "alongandrandomstring"
    hosts:
      - compute1.dyn.bobsrockets.com
  computeuser2:
    password: "anotherlongandrandomstring"
    hosts:
      - compute2.dyn.bobsrockets.com
      - compute3.dyn.bobsrockets.com

...several things to note here that I haven't already noted in comments.

  • zone_file: "/etc/dyndnsd/zones/dyn.bobsrockets.com.zone": This is the path to the zone file dyndnsd should update, matching the config above (we'll symlink it into nsd's zone directory in a moment).
  • dns: "bobsrockets.com.": This is the fully-qualified hostname with a dot at the end of the DNS server that will be serving the DNS records (i.e. the nsd server).
  • email_addr: "bob.bobsrockets.com": This sets the email address of the administrator of the system, but with the @ sign replaced by a dot . (as is the convention in DNS SOA records). If your email address contains a dot . in the user part, then it won't work as expected here.

Also important here: although when dealing with DNS it is less confusing to always require a dot . at the end of fully-qualified domain names, this config file is not consistent about it - note which of the values above have trailing dots and which do not.

Once you've written the config file, create the directory /etc/dyndnsd and write it to /etc/dyndnsd/dyndnsd.yaml.
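
e.g. like so:

sudo mkdir -p /etc/dyndnsd
sudo nano /etc/dyndnsd/dyndnsd.yaml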

With the config file written, we now need to create and assign permissions to the data directory it will be using. Do that like so:

sudo useradd --no-create-home --system --home /var/lib/dyndnsd dyndnsd
sudo mkdir /var/lib/dyndnsd
sudo chown dyndnsd:dyndnsd /var/lib/dyndnsd

Also, we need to create the zone file and assign the correct permissions so that it can write to it:

sudo mkdir /etc/dyndnsd/zones
sudo chown dyndnsd:dyndnsd /etc/dyndnsd/zones
# symlink the zone file into the nsd zones directory. This way dyndnsd isn't allowed to write to all of /etc/nsd/zones - just the 1 zone file it is supposed to update.
sudo ln -s /etc/dyndnsd/zones/dyn.bobsrockets.com.zone /etc/nsd/zones/dyn.bobsrockets.com.zone

Now, we can write a systemd service file to run dyndnsd for us:

[Unit]
Description=dyndnsd: Dynamic DNS record updater
Documentation=https://github.com/cmur2/dyndnsd

[Service]
User=dyndnsd
Group=dyndnsd
ExecStart=/usr/local/bin/dyndnsd /etc/dyndnsd/dyndnsd.yaml
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=dyndnsd

[Install]
WantedBy=multi-user.target

Save this to /etc/systemd/system/dyndnsd.service. Then, start the daemon like so:

sudo systemctl daemon-reload
sudo systemctl enable --now dyndnsd.service

Finally, don't forget to update your firewall to allow requests through to dyndnsd. For UFW, do this:

sudo ufw allow 5354/tcp comment dyndnsd

That completes the configuration of dyndnsd on the server. Now we just need to update the nsd config file to tell it about the new zone.

nsd's config file should be at /etc/nsd/nsd.conf. Open it for editing, and add the following to the bottom:

zone:
    name: dyn.bobsrockets.com
    zonefile: dyn.bobsrockets.com.zone

...and you're done on the server!

Configuring the client(s)

For the clients, all that needs doing is configuring them to make regular requests to the dyndnsd server to keep it appraised of their IP addresses. This is done by making a HTTP request, so we can test it with curl like this:

curl http://computeuser:alongandrandomstring@bobsrockets.com:5354/nic/update?hostname=compute1.dyn.bobsrockets.com

...where computeuser is the username, alongandrandomstring is the password, bobsrockets.com is the server running dyndnsd, and compute1.dyn.bobsrockets.com is the hostname it should update.

The server will be able to tell what the IP address is it should set for the subdomain compute1.dyn.bobsrockets.com by the IP address of the client making the request.

The simplest way of automating this is using cron. Add the following cronjob (sudo crontab -e to edit the crontab):

*/5 * * * *     curl -sS http://computeuser:alongandrandomstring@bobsrockets.com:5354/nic/update?hostname=compute1.dyn.bobsrockets.com

....and that's it! It really is that simple. Windows users will need to set up a scheduled task instead and install curl, but that's outside the scope of this post.

Conclusion

In this post, I've given a whistle-stop tour of setting up a simple dynamic DNS server. This can be useful if a host has a dynamic IP address on a local network but still needs a (sub)domain for some reason.

Note that this is not suitable for untrusted networks! For example, setting dyndnsd to accept requests over the Internet is a Bad Idea, as this simple setup is not encrypted.

If you do want to set this up over an untrusted network, you must encrypt the connection to avoid nasty DNS poisoning attacks. Assuming you already have a working reverse proxy setup on the same machine (e.g. Nginx), you'll need to add a new virtual host (a server { } block in Nginx) that reverse-proxies to your dyndnsd daemon and sets the X-Real-IP HTTP header, and then ensure port 5354 is closed on your firewall to prevent direct access.
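
As a very rough sketch of what the Nginx side of that might look like (my assumption of a minimal config - TLS certificate directives and dyndnsd's handling of X-Real-IP need checking against your own setup; the hostname is a placeholder):

server {
    listen 443 ssl;
    server_name dyndnsd.bobsrockets.com;

    # ssl_certificate / ssl_certificate_key directives go here

    location / {
        proxy_pass http://127.0.0.1:5354;
        proxy_set_header X-Real-IP $remote_addr;
    }
}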

A full walkthrough is beyond the scope of this post and slightly different depending on your setup, but if there's the demand I can blog about how to do this in more detail.

Sources and further reading

The NSD Authoritative DNS Server: What, why, and how

In a previous blog post, I explained how to setup unbound, a recursive resolving DNS server. I demonstrated how to setup a simple split-horizon DNS setup, and forward DNS requests to an upstream DNS server - potentially over DNS-over-TLS.

Recently, for reasons that are rather complicated, I found myself in an awkward situation which required an authoritative DNS server - and given my love of explaining complicated and rather niche concepts here on my blog, I thought this would be a fabulous opportunity to write a 2-part series :P

In this post, I'm going to outline the difference between a recursive resolver and an authoritative DNS server, and explain why you'd want one and how to set one up. I'll explain how it fits as a part of a wider system.

Go grab your snacks - you'll be learning more about DNS than you ever wanted to know....

DNS in a (small) nutshell

As I'm sure you know if you're reading this, DNS stands for the Domain Name System. It translates domain names (e.g. starbeamrainbowlabs.com.) into IP addresses (e.g. 5.196.73.75, or 2001:41d0:e:74b::1). Every network-connected system will make use of a DNS server at one point or another.

DNS functions on records. These define how a given domain name should be resolved to its corresponding IP address (or vice versa, but that's out of scope for this post). While there are many different types of DNS record, here's a quick reference for the most common ones you'll encounter when reading this post - with some example lookups after the list.

  • A: As simple as it gets. An A record defines the corresponding IPv4 address for a domain name.
  • AAAA: Like an A record, but for IPv6.
  • CNAME: An alias, like a symlink in a filesystem [Linux] or a directory junction [Windows].
  • NS: Specifies the domain name of the authoritative DNS server that holds DNS records for this domain. See more on this below.
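If you have the dig tool installed (it's usually in a package called dnsutils or bind9-dnsutils, depending on your distribution), you can look up each of these record types yourself. example.com here is, of course, a placeholder:

dig +short example.com. A           # the IPv4 address(es)
dig +short example.com. AAAA        # the IPv6 address(es)
dig +short example.com. NS          # the authoritative DNS server(s) for the zone
dig +short www.example.com. CNAME   # what (if anything) www is an alias for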

A tale of 2 (DNS) servers

Consider your laptop, desktop, phone, or other device you're reading this on right now. Normally (if you are using DHCP, which is a story for another time), your router (which usually acts as the DHCP server on most home networks) will tell you what DNS server(s) to use.

These servers that your device talks to are what's known as recursive resolving DNS servers. These DNS servers do not hold any DNS records themselves: their entire purpose is to ask other DNS servers to resolve queries for them.

At first this seems rather counterintuitive. Why bother when you can have a server that actually hosts the DNS records themselves and just ask that every time instead?

Given the size of the Internet today, this is unfortunately not possible. If we all used the same DNS server that hosted all DNS records, it would be drowned in DNS queries that even the best Internet connection would not be able to handle. It would also be a single point of failure - bringing the entire Internet crashing down every time maintenance was required.

To this end, a more scalable system was developed. By having multiple DNS servers between users and the authoritative DNS servers that actually hold the real DNS records, we can ensure the system scales virtually infinitely.

The next question that probably comes to mind is where the name recursive resolving DNS server comes from. It comes from the way these servers ask other DNS servers for the answer to a query, instead of answering based on records they hold locally (most recursive resolving DNS servers also have a cache for performance, but that's a tale for another time).

Some recursive resolving DNS servers - such as the one built into your home router - simply ask 1 or 2 upstream DNS servers (usually either provided by your ISP or manually set by you; I recommend 1.1.1.1/1.0.0.1), but others are truly recursive.

Take peppermint.mooncarrot.space. for example. If we had absolutely no idea where to start resolving this domain, we would first ask a DNS root server for help. Domain names are hierarchical in nature - sub.example.com. is a subdomain of example.com.. It follows that mooncarrot.space. is a subdomain of space., which is itself a subdomain of ., the DNS root zone. It is no accident that all the domain names in this blog post have a dot at the end of them (try entering starbeamrainbowlabs.com. into your browser, and watch as your browser auto-hides the trailing dot .).

In this way, if we know the IP address of a DNS root server (e.g. 193.0.14.129, or 2001:7fd::1), we can recurse through this hierarchical tree to discover the IP address associated with a domain name we want to resolve.

First, we'd ask a root server to tell us the authoritative DNS server for the space. domain name. We do this by asking it for the NS record for the space. domain.

Once we know the address of the authoritative DNS server for space., we can ask it for the NS record for mooncarrot.space.. We may repeat this process a number of times - I'll omit the specific details for brevity (if anyone's interested, I can write a full deep-dive post into this, how it works, and how it's kept secure - comment below) - and then we can finally ask the authoritative DNS server we've tracked down to resolve the domain name peppermint.mooncarrot.space. to an IP address for us (e.g. by asking for the associated A or AAAA record).
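Happily, you don't have to take my word for it: dig has a +trace option that performs exactly this process, starting at the root servers and following the NS delegations all the way down. Try it on any domain name you like:

# Start at the DNS root and follow the chain of NS records down to the
# authoritative DNS server for the domain name in question
dig +trace peppermint.mooncarrot.space.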

Authoritative DNS servers

With this in mind, we can now move on to the main purpose of this post: setting up an authoritative DNS server. As you might have guessed by now, the purpose of an authoritative DNS server is to hold records about 1 or more domain names.

While most of the time the authoritative DNS server for your domain name will be either your registrar or someone like Cloudflare, there are a number of circumstances in which it can be useful to run your own authoritative DNS server(s) and not rely on your registrar:

  • If you need more control over the DNS records served for your domain than your registrar provides
  • Serving complex DNS records for a domain name on an internal network (split-horizon DNS)
  • Setting up your own dynamic DNS system (i.e. where you dynamically update the IP address(es) that a domain name resolves to via an API call)

Other situations certainly exist, but these are the ones that come to mind at the moment (comment below if you have any other uses for authoritative DNS servers).

The specific situation I found myself in was a combination of the latter 2 points here, so that's the context in which I'll be talking.

To set one up, we first need some software to do this. There are a number of DNS servers out there:

  • Bind9 [recursive; authoritative]
  • Unbound [recursive; not really authoritative; my favourite]
  • Dnsmasq [recursive]
  • systemd-resolved [recursive; it always breaks for me so I don't use it]

As mentioned, Unbound is my favourite, so for this post I'll be showing you how to use its equally cool sibling, nsd (Name Server Daemon).

The Name Server Daemon

Now that I've explained what an authoritative DNS server is and why it's important, I'll show you how to install and configure one, and then convince another recursive resolving DNS server that's under your control to ask your new authoritative DNS server instead of its default upstream to resolve DNS queries for a given domain name.

It goes without saying that I'll be using Linux here. If you haven't already, I strongly recommend using Linux for hosting a DNS server (or any other kind of server). You'll have a bad day if you don't.

I will also be assuming that you have a level of familiarity with the Linux terminal. If you don't, learn your terminal and then come back here.

nsd is available in the default repositories of all major Linux distributions. Adjust as appropriate for your distribution:

sudo apt install nsd
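As far as I'm aware, the package is simply called nsd on other distributions too - for example:

sudo dnf install nsd    # Fedora / RHEL-alikes (the latter may need EPEL)
sudo pacman -S nsd      # Arch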

nsd has 2 configuration files that are important. First is /etc/nsd/nsd.conf, which configures the nsd daemon itself. Let's do this one first. If there's an existing config file here, move it aside and then paste in something like this:

server:
    port: 5353

    server-count: 1
    username: nsd

    logfile: "/var/log/nsd.log"
    pidfile: "/run/nsd.pid"

    # The zonefile directive(s) below are resolved relative to this path
    zonesdir: /etc/nsd/zones

zone:
    name: example.com
    zonefile: example.com.zone

...replace example.com with the domain name that you want the authoritative DNS server to serve DNS records for. You can also have multiple zone: blocks for different (sub)domains - even if those domain names are subdomains of others.

For example, I could have a zone: block for both example.com and dyn.example.com. This can be useful if you want to run your own dynamic DNS server, which will write out a full DNS zone file (a file that contains DNS records) without regard to any other DNS records that might have been in that DNS zone.
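To illustrate, the zone definitions in nsd.conf for that hypothetical setup would look like this:

zone:
    name: example.com
    zonefile: example.com.zone

zone:
    # dyn.example.com is a subdomain of example.com, but it gets its
    # own zone and zone file, managed independently of its parent
    name: dyn.example.com
    zonefile: dyn.example.com.zone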

Also replace 5353 with the port you want nsd to listen on. In my case, I have my authoritative DNS server running on the same box as the regular recursive resolver, so I've had to move the authoritative DNS server to a different port, as dnsmasq (the recursive DNS server I have running on this particular box) has already claimed port 53.

Next up, create the directory /etc/nsd/zones, and then open up example.com.zone for editing inside that new directory. In here, we will put the actual DNS records we want nsd to serve.

The format of this file is governed by RFC1035 section 5 and RFC1034 section 3.6.1, but the nsd docs provide a simpler example. See also the Wikipedia page on DNS zone files.

Here's an example:

; example.com.
$TTL 300
example.com. IN     SOA    a.root-servers.net. admin.example.com. (
                2022090501  ; Serial
                3H          ; refresh after 3 hours
                1H          ; retry after 1 hour
                1W          ; expire after 1 week
                1D)         ; minimum TTL of 1 day

; Name Server
@                   IN NS       dns.example.com.

@                   IN A        5.196.73.75
example.com.        IN AAAA     2001:41d0:e:74b::1
www                 IN CNAME    @
ci                  IN CNAME    @

Some notes about the format to help you understand it:

  • Make sure ALL your fully-qualified domain names have the trailing dot at the end, otherwise you'll have a bad day.
  • $TTL 300 specifies the default TTL (Time To Live, or the time DNS records can be cached for) in seconds for all subsequent DNS records.
  • Replace example.com. with your domain name.
  • admin.example.com. should be the email address of the person responsible for the DNS zone file, with the @ replaced with a dot instead.
  • dns.example.com. in the NS record must be set to the domain name of the authoritative DNS server serving the zone file.
  • @ IN A 5.196.73.75 is the format for defining an A record (see the introduction to this blog post) for example.com. - @ is automatically replaced with the domain name in question - in this case example.com.
  • When declaring a record, if you don't add the trailing dot then it is assumed you're referring to a subdomain of the domain this DNS zone file is for - e.g. if you put www it assumes you mean www.example.com.
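Before starting nsd, it's worth sanity-checking your work. nsd ships with a pair of utilities for validating the daemon config and zone files respectively:

# Check /etc/nsd/nsd.conf for syntax errors (it exits silently if all is well)
nsd-checkconf /etc/nsd/nsd.conf

# Check that a zone file parses: the zone name comes first, then the file path
nsd-checkzone example.com /etc/nsd/zones/example.com.zone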

Once you're done, all that's left for configuring nsd is to start it up for the first time (and on boot). Do that like so:

sudo systemctl restart nsd
sudo systemctl enable nsd

Now, you should be able to query it to test it. I like to use dig for this:

dig -p 5353 +short @dns.example.com example.com

...this should return a result based on the DNS zone file you defined above. Replace 5353 with the port number your authoritative DNS server is running on, or omit -p 5353 altogether if it's running on port 53.

Try it out by updating your DNS zone file (remember to bump the serial number in the SOA record - secondary DNS servers and some other tooling use it to detect changes) and reloading nsd: sudo systemctl reload nsd

Congratulations! You now have an authoritative DNS server under your control! This does not mean that it will be queried by any other DNS servers on your network though - read on.....

Integration with the rest of your network

The final part of this post will cover integrating an authoritative DNS server with another DNS server on your network - usually a recursive one. How you do this will vary depending on the target DNS server you want to convince to talk to your authoritative DNS server.

For Unbound:

I've actually covered this in a previous blog post. Simply update /etc/unbound/unbound.conf with a new block like this:

forward-zone:
    name: "example.com."
    forward-addr: 127.0.0.1@5353

...where example.com. is the domain name to forward for (WITH THE TRAILING DOT; and all subdomains thereof), 127.0.0.1 is the IP address of the authoritative DNS server, and 5353 is the port number of the authoritative DNS server.
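One gotcha to watch out for: if - as in my case - the authoritative DNS server lives on the same box as Unbound and you're forwarding to 127.0.0.1 as above, be aware that Unbound refuses to query localhost by default. If that applies to you, you'll also need this in the server: block of unbound.conf:

server:
    # Allow Unbound to send queries to a DNS server running on localhost
    do-not-query-localhost: no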

Then, restart Unbound like so:

sudo systemctl restart unbound

For dnsmasq:

Dnsmasq's main config file is located at /etc/dnsmasq.conf, but there may be other config files in /etc/dnsmasq.d/ that could interfere. Either way, update dnsmasq's config file with this directive:

server=/example.com./127.0.0.1#5353

...where example.com. is the domain name to forward for (WITH THE TRAILING DOT; and all subdomains thereof), 127.0.0.1 is the IP address of the authoritative DNS server, and 5353 is the port number of the authoritative DNS server.

If there's another server=/example.com./... directive elsewhere in your dnsmasq config, it may override your new definition.

Then, restart dnsmasq like so:

sudo systemctl restart dnsmasq
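Whichever recursive resolver you're running, once it has been restarted you can check that the forwarding works by querying the recursive resolver (rather than the authoritative DNS server directly this time) for a record in your zone:

# 127.0.0.1 is a placeholder - substitute the IP of your recursive resolver
dig +short @127.0.0.1 example.com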

If there's another DNS server that I haven't included here that you use, please leave a comment on how to reconfigure it to forward a specific domain name to a different DNS server.

Conclusion

In this post, I've talked about the difference between an authoritative DNS server and a recursive resolving DNS server. I've shown why authoritative DNS servers are useful, and alluded to reasons why running your own authoritative DNS server can be beneficial.

In the second post in this 2-part miniseries, I'm going to go into detail on dynamic DNS, why it's useful, and how to set up a dynamic DNS server.

As always, this blog post is a starting point - not an ending point. DNS is a surprisingly deep subject: from DNS root hint files to mDNS (multicast DNS) to the various different DNS record types, there are many interesting and useful things to learn about it.

After all, it's always DNS..... especially when you don't think it is.

Sources and further reading
