
Cluster, Part 6: Superglue Service Discovery | Setting up Consul

Hey, welcome back to another weekly installment of cluster configuration for colossal computing control. Apparently I'm managing to keep this up as a weekly series every Wednesday.

Last week, we sorted out managing updates to the host machines in our cluster to keep them fully patched. We achieved this by firstly setting up an apt caching server with apt-cacher-ng. Then we configured our host machines to use it. Finally, we set up automated updates with unattended-upgrades so we don't have to keep installing them manually all the time.

For reference, here are all the posts in this series so far:

In this part, we're going to install and configure Consul - the first part of the Hashicorp stack. Consul doesn't sound especially exciting, but it is an extremely important part of our (diabolical? I never said that) plan. It serves a few purposes:

  • Clusters the machines together, so that Nomad (the task scheduler) can find other nodes
  • Keeps track of which services are running where

It uses the Raft Consensus Algorithm (like wesher from part 4 - it would appear they actually use the same library under the hood) to provide a relatively decentralised approach to the problem, allowing some nodes to fail without impacting the operation of the cluster as a whole.

It also provides a DNS API, which we'll be tying into with Unbound later in this post.

Before continuing, you may find reading through the official Consul guides a useful exercise. Try out some of the examples too to get your head around what Consul is useful for.

(Above: NASA's DSN dish in Canberra, Australia just before major renovations are carried out. Credit: NASA/Canberra Deep Space Communication Complex)

Installation and Preamble

To start with, we need to install it. I've done the hard work of packaging it already, so you can install it from my apt repository - which, if you've been following this series, you should have set up already (if not, follow the link and read the instructions there).

Install consul like this:

sudo apt install consul

Next, we need a systemd service file. Again, I have packages in my apt repository for this. There are 2 packages:

  • hashicorp-consul-systemd-client
  • hashicorp-consul-systemd-server

The only difference between the 2 packages is where it reads the configuration file from. The client package reads from /etc/consul/client.hcl, and the server from /etc/consul/server.hcl. They also conflict with each other, so you can't install both at the same time. This is because - as far as I can tell - servers can expose the client interface in just the same way as any other normal client.

To get a feel for the bigger picture, let's talk architecture. Because Consul uses the Raft Consensus Algorithm, we'll need an odd number of servers to avoid issues (if you use an even number of servers, then you run the risk of a 'split brain', where there's no clear majority vote as to who's the current leader of the cluster). In my case, I have 5 Raspberry Pi 4s:

  • 1 x 2GB RAM (controller)
  • 4 x 4GB RAM (workers)

In this case, I'm going to use the controller as my first Consul server, and pick 2 of the workers at random to be my other 2, to make up 3 servers in total. Note that in future parts of this series you'll need to keep track of which ones are the servers, since we'll be doing this all over again for Nomad.

With this in mind, install the hashicorp-consul-systemd-server package on the nodes you'll be using as your servers, and the hashicorp-consul-systemd-client package on the rest.

A note about cluster management

This is probably a good point to talk about cluster management before continuing. Specifically the automation of said management. Personally, I have the goal of making the worker nodes completely disposable - which means completely automating the setup from installing the OS right up to being folded into the cluster and accepting jobs.

To do this, we'll need a tool to help us out. In my case, I've opted to write one from scratch using Bash shell scripts. This is not something I can recommend to anyone else, unless you want to gain an understanding of how such tools work. My inspiration was efs2 - which appears to be a Go program - and Dockerfiles. As an example, my job file for automating the install of a Consul server agent looks like this:

#!/usr/bin/env bash

SCRIPT "${JOBFILE_DIR}/common.sh";

COPY "../consul/server.hcl" "/tmp/server.hcl"

RUN "sudo mv /tmp/server.hcl /etc/consul/server.hcl";
RUN "sudo chown root:root /etc/consul/server.hcl";
RUN "sudo apt-get update";
RUN "sudo apt-get install --yes hashicorp-consul-systemd-server";

RUN "sudo systemctl enable consul.service";
RUN "sudo systemctl restart consul.service";

...I'll be going through all steps in a moment. Of course, if there's the demand for it then I'll certainly write a post or 2 about my shell scripting setup here (comment below), but I still recommend another solution :P

Note that the firewall configuration is absent here - this is because I've set it to allow all traffic on the wgoverlay network interface, which I talked about in part 4. If you did want to configure the firewall, here are the rules you'd need to create:

sudo ufw allow 8301 comment consul-serf-lan;
sudo ufw allow 8300/tcp comment consul-rpc;
sudo ufw allow 8600 comment consul-dns;

Many other much more mature tools exist - you should use one of those instead of writing your own:

  • Ansible - uses YAML configuration files; organises things logically into 'playbooks' (personally I really seriously can't stand YAML, which is another reason for writing my own)
  • Puppet
  • and more

The other thing to be aware of is version control. You should absolutely put all your configuration files, scripts, Ansible playbooks, etc under version control. My preference is Git, but you can use anything you like. This will help provide a safety net in case you make an edit and break everything. It's also a pretty neat way to keep it all backed up by pushing it to a remote such as your own Git server (you do have automated backups, right?), GitHub, or GitLab.
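If you've not done this before, getting started is only a handful of commands. Here's a rough sketch (the directory and remote URL are placeholders - substitute your own):

cd ~/cluster-config                 # wherever you keep your configs and scripts
git init
git add .
git commit -m "Initial import of cluster configuration"
# Optional: push it to a remote for safekeeping (the URL here is an example)
git remote add origin git@git.example.com:you/cluster-config.git
git push -u origin master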

Configuration

Now that we've got that sorted out, we need to deal with the configuration files. Let's do the server configuration file first. This is written in the Hashicorp Configuration Language. It's probably a good idea to get familiar with it - I have a feeling we'll be seeing a lot of it. Here's my full server configuration (at the time of typing, anyway - I'll try to keep this up-to-date).

bind_addr = "{{ GetInterfaceIP \"wgoverlay\" }}"

# When we have this many servers in the cluster, automatically run the first leadership election
# Remember that the Hashicorp stack uses the Raft consensus algorithm.
bootstrap_expect = 3
server = true
ui = true

client_addr = "127.0.0.1 {{ GetInterfaceIP \"docker0\" }}"

data_dir = "/srv/consul"
log_level = "INFO"

domain = "mooncarrot.space."


retry_join = [
    // "172.16.230.100"
    "bobsrockets",
    "seanssatellites",
    "tillystelescopes"
]

This might look rather unfamiliar. There's also a lot to talk about, so let's go through it bit by bit. If you haven't already, I do suggest coming up with an awesome naming scheme for your servers. You'll thank me later.

The other thing you'll need to do before continuing is buy a domain name. It sounds silly, but it's actually really important. As we talked about in part 3, we're going to be running our own internal DNS - and Consul is a huge part of this.

By default, Consul serves DNS under the .consul top-level domain, which is both unregistered and very bad practice (because it's unregistered). Someone could come along tomorrow, register the .consul top-level domain, and start using it - and then things would get awkward if you ever wanted to visit an external domain ending in .consul while still using .consul internally.

I've chosen mooncarrot.space myself, but if you don't yet have one, I recommend taking your time and coming up with a really clever one you like - since you'll be using it for a long time - and updating it later is a pain in the behind. If you're looking for a recommendation for a DNS provider, I've been finding Gandi great so far (much better than GoDaddy, who have tons of hidden charges).

Once you've decided on a domain name (and bought it, if necessary), set it in the server configuration file via the domain directive:

domain = "mooncarrot.space."

Don't forget that trailing dot. As we learned in part 3, it's important, since it indicates an absolute domain name.

Also note that I'm using a subdomain of the domain in question here. This is because of an issue whereby I'm unable to get Unbound to forward queries that Consul is unable to resolve on to CloudFlare.

Another thing of note is the data_dir directive. Note that this specifies the data storage directory local to the specific node, not shared storage (we'll be tackling that in a future post).

The client_addr directive here tells Consul which network interfaces to bind the client API to. In our case, we're binding it to the local loopback (127.0.0.1) and the docker0 network interface by dynamically grabbing its IP address - so that Docker containers on the host can use the API.

The bind_addr directive is similar, but for the inter-node communication interfaces. This tells Consul that the other nodes in the cluster are accessible over the wgoverlay interface that we set up in part 4. This is important, since Consul doesn't encrypt or authenticate its traffic by default as far as I can tell - and I haven't yet found a good way to do this that doesn't involve putting a password directly into a configuration file.

In this way the WireGuard mesh VPN provides the encryption & authentication that Consul lacks by default (though I'm certainly going to be looking into it anyway).

bootstrap_expect is also interesting. If you've decided on a different number of Consul server nodes, then you should change this value to equal the number of server nodes you're going to have. 3, 5, and 7 are all good numbers - though don't go overboard. More servers means more overhead. Servers take more computing power than clients, so try not to have too many of them.

Finally, retry_join is also very important. It should contain the domain names of all the servers in the cluster. In my case, I'm using the hostnames of the other servers in the network, since wesher (our WireGuard mesh VPN program) automatically adds the machine names of all the nodes in the VPN cluster to your /etc/hosts file. In this way we ensure that the cluster always talks over the wgoverlay VPN network interface.

Oh yeah, and I should probably note here that your servers should not use FQDNs (Fully Qualified Domain Names) as their hostnames. I found out the hard way: it messes with Consul, and it ends up serving node IPs via DNS on something like harpsichord.mooncarrot.space.node.mooncarrot.space instead of something sensible like harpsichord.node.mooncarrot.space. If anyone has a solution for this that doesn't involve using non-FQDNs as hostnames, I'd love to know (since FQDNs as hostnames is my preference).
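If you do need to switch a node over from an FQDN to a plain hostname, something like this should do the trick on systemd-based systems (the hostname here is just an example - and check /etc/hosts afterwards to make sure it still matches):

# Check the current hostname
hostnamectl status
# Set a plain (non-FQDN) hostname
sudo hostnamectl set-hostname harpsichord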

That was a lot of words. Next, let's do the client configuration file:

bind_addr = "{{ GetInterfaceIP \"wgoverlay\" }}"

bootstrap = false
server = false

domain = "mooncarrot.space."

client_addr = "127.0.0.1 {{ GetInterfaceIP \"docker0\" }}"

data_dir = "/srv/consul"
log_level = "INFO"

retry_join = [
    "wopplefox",
    "spatterling",
    "sycadil"
]

Not much to talk about here, since the configuration is almost identical to that of the server, except you don't have to tell it how many servers there are, and retry_join should contain the names of the servers that the client should try to join, as explained above.

Once you've copied the configuration files onto all the nodes in your cluster (/etc/consul/server.hcl for servers; /etc/consul/client.hcl for clients), it's now time to boot up the cluster. On all nodes (probably starting with the servers), do this:

# Enable the Consul service on boot
sudo systemctl enable consul.service
# Start the Consul service now
sudo systemctl start consul.service
# Or, you can do it all in 1 command:
sudo systemctl enable --now consul.service

It's probably a good idea to follow the cluster's progress along in the logs. On a server node, do this after starting the service & enabling start on boot:

sudo journalctl -u consul --follow

You'll probably see a number of error messages, but be patient - it can take a few minutes after starting Consul on all nodes for the first time for them to start talking to each other, sort themselves out, and calm down.

Now, you should have a working Consul cluster! On one of the server nodes, do this to list all the servers in the cluster:

consul members

If you like, you can also run this from your local machine. Simply install the consul package (but not the systemd service file), and make some configuration file adjustments. Update your ~/.bashrc on your local machine to add something like this:

export CONSUL_HTTP_ADDR=http://consul.service.mooncarrot.space:8500;

....replacing mooncarrot.space with your own domain, of course :P

Next, update the server configuration file to make the client_addr directive look like this:

client_addr = "127.0.0.1 {{ GetInterfaceIP \"docker0\" }} {{ GetInterfaceIP \"wgoverlay\" }}"

Upload the new version to your servers, and restart them one at a time (unless you're ok with downtime - I'm trying to practice avoiding downtime now so I know all the processes for later):

sudo systemctl restart consul.service

At this point, we've got a fully-functioning Consul cluster. I recommend following some of the official guides to learn more about how it works and what you can do with it.
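If you fancy a quick hands-on test in the meantime, you can register a throwaway service by hand with a service definition file. This is just a sketch - the file path, service name, and port are made up, and Nomad will be doing this for us properly later:

# Write a throwaway service definition (HCL), then register it with the local agent
cat > /tmp/test-service.hcl <<'EOF'
service {
  name = "webtest"
  port = 8080
}
EOF
consul services register /tmp/test-service.hcl
# It should now show up in the catalogue
consul catalog services
# Clean up again afterwards
consul services deregister /tmp/test-service.hcl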

Unbound

Before we finish for today, we've got 1 more task to complete. As I mentioned back in part 3, we're going to configure our DNS server to conditionally forward queries to Consul. The end result we're aiming for is best explained with a diagram:

A diagram showing how we're aiming for Unbound to resolve queries.

In short:

  1. Try the localzone data
  2. If nothing was found there (or it didn't match), see if it matches Consul's subdomain
  3. If so, forward the query to Consul and return the result
  4. If Consul couldn't resolve the query, forward it to CloudFlare via DNS-over-TLS

The only bit we're currently missing of this process is the Consul bit, which we're going to do now. Edit /etc/unbound/unbound.conf on your DNS server (mine is on my controller node), and insert the following:

###
# Consul
###
forward-zone:
    name: "node.mooncarrot.space."
    forward-addr: 127.0.0.1@8600
forward-zone:
    name: "service.mooncarrot.space."
    forward-addr: 127.0.0.1@8600

...replace mooncarrot.space. with your domain name (not forgetting the trailing dot, of course). Note here that we have 2 separate forward zones here.

Unfortunately, I can't seem to find a way to get Unbound to fall back to a more generic forward zone in the event that a more specific one is unable to resolve the query (I've tried both a forward-zone and a stub-zone). To this end, we need to define multiple more specific forward-zones if we want to be able to forward queries out to CloudFlare for additional DNS records. Here's an example:

  1. tuner.service.mooncarrot.space is an internal service that is resolved by Consul
  2. peppermint.mooncarrot.space is an externally defined DNS record defined with my registrar

If we then ask Unbound to resolve them both, only #1 will be resolved correctly. Unbound will do something like this for #2:

  • Check the local-zone for a record (not found)
  • Ask Consul (not found)
  • Return error

If you are not worried about defining DNS records in your registrar's web interface / whatever they use, then you can just do something like this instead:

###
# Consul
###
forward-zone:
    name: "mooncarrot.space."
    forward-addr: 127.0.0.1@8600

For advanced users, Consul's documentation on the DNS interface is worth a read, as it gives the format of all the DNS records Consul can serve.

Note also here that the recursors configuration option is an alternative solution, but I don't see an option to force DNS-over-TLS queries there.

If you have a better solution, please get in touch by commenting below or answering my ServerFault question.

With this done, you should be able to ask Consul what the IP address of any node in the cluster is like so:

dig +short harpsichord.node.mooncarrot.space
dig +short grandpiano.node.mooncarrot.space
dig +short oboe.node.mooncarrot.space
dig +short some_machine_name_here.node.mooncarrot.space

Again, you'll need to replace mooncarrot.space of course with your domain name.
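Once you've got services registered in Consul too (something Nomad will handle for us later in this series), you should be able to query those in the same way - for example, for a hypothetical service called tuner:

dig +short tuner.service.mooncarrot.space
dig +short tuner.service.mooncarrot.space SRV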

Conclusion

Phew! There were a lot of steps and moving parts to set up in this post. Take your time, and re-read this post a few times to make sure you've got all your ducks in a row. Make sure to test your new Consul cluster by working through the official guides mentioned above too, as any bugs lurking around in Consul will cause lots of issues later.

I can't promise that it's going to get easier in future posts - it's probably going to get a lot more complicated with lots more to keep track of, so make sure you have a solid understanding of what we've done so far before continuing.

To summarise, we've managed to set up a brand-new Consul cluster. We've configured Unbound to forward queries to Consul, to enable seamless service discovery (regardless of which host things are running on) later on.

We've also picked an automation framework for automating the setup and configuration of the various services and such we'll be configuring. I recommend taking some time to get to know your chosen framework - many have lots of built-in modules to make common tasks much easier. Try going back to previous posts in this series (links at the top of this post) and implementing them in your chosen framework.

Finally, we've learnt a bit about version control and its importance in managing configuration files.

In the next few posts in this series (unless I get distracted - likely - or have a change of plans), we're going to be setting up Nomad, the task scheduler that will be responsible for managing what runs where and informing Consul of this. We're also going to be setting up and configuring a private Docker registry and Traefik - the latter of which is known as an edge router (more on that in future posts).

See you next time, when we dive further down into what's looking less like a rabbit hole and more like a cavernous sinkhole of epic proportions.

Found this useful? Confused about something? Got a suggestion? Comment below! It's really awesome to hear that my blog posts have helped someone else out.

Cluster, Part 5: Staying current | Automating apt updates and using apt-cacher-ng

Hey there! Welcome to another cluster blog post. In the last post, we looked at setting up a WireGuard mesh VPN as a trusted private network for management interfaces and inter-cluster communication. As a refresher, here's a list of all the parts in this series so far:

Before we get to the Hashicorp stack though (next week, I promise!), there's an important housekeeping matter we need to attend to: managing updates.

In Debian-based Linux distributions such as Raspbian (that I'm using on my Raspberry Pis), updates are installed through apt - and this covers everything from the kernel to the programs we have installed - hence the desire to package everything we're going to be using to enable easy installation and updating.

There are a number of different related command-line tools, but the ones we're interested in are apt (the easy-to-use front-end CLI) and apt-get (the original tool for installing updates).

There are 2 key issues we need to take care of:

  1. Automating the installation of package updates
  2. Caching the latest packages locally to avoid needless refetching

Caching package updates locally with apt-cacher-ng

Issue #2 here is slightly easier to tackle, surprisingly enough, so we'll do that first. We want to cache the latest packages locally, because if we have lots of machines in our cluster (I have 5, all running Raspbian), then when they update they all have to download the package lists and the packages themselves from the remote sources each time. Not only is this bandwidth-inefficient, but it takes longer and puts more strain on the remote servers too.

For apt, this can be achieved through the use of apt-cacher-ng. Other distributions require different tools - in particular I'll be researching and tackling Alpine Linux's apk package manager in a future post in this series - since I intend to use Alpine Linux as my primary base image for my Docker containers (I also intend to build my own Docker containers from scratch too, so that will be an interesting experience that will likely result in a few posts too).

Anyway, installation is pretty simple:

sudo apt install apt-cacher-ng

Once done, there's a little bit of tuning we need to attend to. apt-cacher-ng by default listens for HTTP requests on TCP port 3142 and has an administrative interface at /acng-report.html. This admin interface is not, by default, secured - so this is something we should do before opening a hole in the firewall.

This can be done by editing the /etc/apt-cacher-ng/security.conf configuration file. It should read something like this:


# This file contains confidential data and should be protected with file
# permissions from being read by untrusted users.
#
# NOTE: permissions are fixated with dpkg-statoverride on Debian systems.
# Read its manual page for details.

# Basic authentication with username and password, required to
# visit pages with administrative functionality. Format: username:password

AdminAuth: username:password

....you may need to use sudo to view and edit it. Replace username and password with your own username and a long unguessable password that's not similar to any existing passwords you have (especially since it's stored in plain text!).

Then we can (re) start apt-cacher-ng:

sudo systemctl enable apt-cacher-ng
sudo systemctl restart apt-cacher-ng
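To check that it's up and that your credentials work, you can pull the admin report page with curl (replace the username, password, and host as appropriate):

curl -u username:password http://localhost:3142/acng-report.html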

The last thing we need to do here is to punch a hole through the firewall, if required. As I explained in the previous post, I'm using a WireGuard mesh VPN, so I'm allowing all traffic on that interface (for reasons that will - eventually - come clear), so I don't need to open a separate hole in my firewall unless I want other devices on my network to use it too (which wouldn't be a bad idea, all things considered).

Anyway, ufw can be configured like so:

sudo ufw allow 3142/tcp comment apt-cacher-ng

With the apt-cacher server installed and configured, you can now get apt to use it:

echo 'Acquire::http { Proxy "http://X.Y.Z.W:3142"; }' | sudo tee -a /etc/apt/apt.conf.d/proxy

....replacing X.Y.Z.W with the IP address (or hostname!) of your apt-cacher-ng server. Note that it will get upset if you use https anywhere in your apt sources, so you'll have to inspect /etc/apt/sources.list and all the files in /etc/apt/sources.list.d/ manually and update them.
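A quick way to find any https sources that need changing is something like this (adjust the paths if your distribution puts them elsewhere):

grep -rn "https://" /etc/apt/sources.list /etc/apt/sources.list.d/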

Automatic updates with unattended-upgrades

Next on the list is installing updates automatically. This is useful because we don't want to have to manually install updates every day on every node in the cluster. There are positives and negatives about installing updates - I recommend giving the top of this article a read.

First, we need to install unattended-upgrades:

sudo apt install unattended-upgrades

Then, we need to edit the /etc/apt/apt.conf.d/50unattended-upgrades file - don't forget to sudo.

Unfortunately, I haven't yet automated this process (or properly developed a replacement configuration file that can be automatically placed on a target system by a script), so for now we'll have to do this manually (the mssh command might come in handy).

First, find the line that starts with Unattended-Upgrade::Origins-Pattern, and uncomment the lines that end in -updates, label=Debian, label=Debian-Security. For Raspberry Pi users, add the following lines in that bit too:

"origin=Raspbian,codename=${distro_codename},label=Raspbian";

// Additionally, for those running Raspbian on a Raspberry Pi,
// match packages from the Raspberry Pi Foundation as well.
"origin=Raspberry Pi Foundation,codename=${distro_codename},label=Raspberry Pi Foundation";

unattended-upgrades will only install packages that are matched by a list of origins. Unfortunately, the way that you specify which updates to install is a total mess, and it's not obvious how to configure it. I did find an Ask Ubuntu answer that explains how to get unattended-upgrades to install updates. If anyone knows of a cleaner way of doing this - I'd love to know.
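For reference, the uncommented Debian entries usually end up looking something like the following - but the exact syntax varies between releases, so treat these as illustrative and use the commented-out lines already present in your own file as the template:

"origin=Debian,codename=${distro_codename}-updates";
"origin=Debian,codename=${distro_codename},label=Debian";
"origin=Debian,codename=${distro_codename},label=Debian-Security";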

The other decision to make here is whether you'd like your hosts to automatically reboot. This could be disruptive, so only enable it if you're sure that it won't interrupt any long-running tasks.

To enable it, find the line that starts with Unattended-Upgrade::Automatic-Reboot and set it to true (uncommenting it if necessary). Then find the Unattended-Upgrade::Automatic-Reboot-Time setting and set it to a time of day you're ok with it rebooting at - e.g. 03:00 for 3am - but take care to stagger the times so that all your servers don't reboot at once, as this might cause issues later.
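Once edited, those 2 lines should look something like this (the time is just an example - pick your own, and stagger it across your nodes):

Unattended-Upgrade::Automatic-Reboot "true";
Unattended-Upgrade::Automatic-Reboot-Time "03:00";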

A few other settings need to be updated too. Here they are, with their correct values:

APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "1";
APT::Periodic::Unattended-Upgrade "1";
APT::Periodic::AutocleanInterval "7";

Make sure you find the existing settings and update them, because otherwise if you just paste these in, they may get overridden. In short these settings:

  • Enables automatic updates to the package metadata indexes
  • Downloads upgradeable packages
  • Installs downloaded updates
  • Automatically cleans the cache up every 7 days

Once done, save and close that file. Finally, we need to enable and start the unattended-upgrades service:

sudo systemctl enable unattended-upgrades
sudo systemctl restart unattended-upgrades

To learn more about automatic upgrades, these guides might help shed some additional light on the subject:

Conclusion

In this post, we've taken a look at apt package caching and unattended-upgrades. In the future, I'm probably going to have to sit down and either find an alternative to unattended-upgrades that's easier to configure, or rewrite the entire configuration file and create my own version. Comments and suggestions on this are welcome in the comments.

In the next post, we'll finally be getting to the Hashicorp stack by installing and configuring Consul. Hold on to your hats - next week's post is significantly more complicated.

Edit 2020-05-09: Add missing instructions on how to get apt to actually use an apt-cacher-ng server

Cluster, Part 4: Weaving Wormholes | Peer-to-Peer VPN with WireGuard

(Above: The WireGuard and wesher logos. Background photo: from Unsplash by Clint Adair.)

Hey - welcome back! Last week, we set Unbound up as our primary DNS server for our network. We also configured cluster member devices to use it for DNS resolution. As a recap, here are links to the all the posts in this series so far:

In this part, we're going to set up a WireGuard peer-to-peer VPN. This is a good idea for several reasons:

  • It provides defence-in-depth
  • It protects non-encrypted / unprotected private services from the rest of the network

The latter point here is particularly important - especially if you've got other devices on your network like I have. If you're somehow following along with this series with devices fancy enough to have multiple network interfaces, you can connect the 2nd network interface of every server to a separate switch that doesn't connect to anywhere else. Don't forget that you'll need to set up a DHCP server on this new mini-network (or configure static IPs manually on each device, which I don't recommend) - but this is out-of-scope of this article.

In the absence of such an opportunity, a peer-to-peer VPN should do the trick just as well. We're going to be using WireGuard, which I discovered recently. It's very cool indeed - and it's apparently special enough to be merged directly into the Linux kernel in v5.6! It's been given high praise from security experts too.

What I like most is its simplicity. It follows the UNIX Philosophy, and as such, while it's very simple in the way it works, it can be used and applied in so many different ways to solve so many different problems.

With this in mind, let's get to installing it! If you're on an Ubuntu or Debian-based machine, then you should just be able to install it directly:

sudo apt install wireguard

Note that you might have to have your kernel's development headers installed if you experience issues. For Raspbian users (like me), installation is slightly more complicated. We'll need to set up the debian-backports apt repository to pull it in, since the Debian developers have backported it to the latest stable version of Debian (a bit like a hotfix) - but Raspbian hasn't yet pulled it in. This is done in 2 steps:

# Add the debian-backports GPG key
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 04EE7237B7D453EC 648ACFD622F3D138
# Add the debian-backports apt repo
echo 'deb http://httpredir.debian.org/debian buster-backports main contrib non-free' | sudo tee /etc/apt/sources.list.d/debian-backports.list

Now, we should be able to update to download the new repository metadata:

sudo apt update

Next, we need to install the Raspberry Pi Linux kernel headers. Unlike other distributions (which use the linux-headers-generic package), this is done in a slightly different way:

sudo apt install raspberrypi-kernel-headers

This might take a while. Once done, we can now install WireGuard itself:

sudo apt install wireguard

Again, this will take a while. Don't forget to pay close attention to the output - I've noticed that it's fond of throwing error messages, but not actually counting it as an error and ultimately claiming it completed successfully.

With this installed, we can now set up our WireGuard peer-to-peer VPN. WireGuard itself works on a public-private keypair per-device basis. A device first generates a keypair, and then its public key needs copying to all other devices it wants to connect to. In this fashion, both a peer-to-peer setup (like we're after) and a client-server setup (like a more traditional VPN such as IPSec or OpenVPN) can be configured.
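We won't be doing any of this by hand (wesher below handles it all for us), but for reference, generating a WireGuard keypair manually looks like this:

# Generate a private key, and derive the matching public key from it
wg genkey | tee privatekey | wg pubkey > publickey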

The overhead of configuring such a peer-to-peer WireGuard VPN is considerable though, since every time a device is added to the VPN every existing device needs updating to add the public key thereof to establish communications between the 2.

While researching an easier solution to the problem, I came across wesher, which does much of the heavy-lifting for you. It does of course come at the cost of slightly reduced security (since the entire VPN network is protected by a single pre-shared key) and reduced configurability, but from my experiences so far it works like a charm for my purposes - and it eases management too.

It is distributed as a single Go binary that uses the Raft Consensus Algorithm (the same library that Nomad and Consul use, actually) to manage cluster membership and provide self-healing properties. This makes it very easy for me to package into an apt package, which I've added to my apt repository.

The instructions to add my apt repository can be found on its page, but here they are directly:

# Add the repository
echo "deb https://apt.starbeamrainbowlabs.com/ ./ # apt.starbeamrainbowlabs.com" | sudo tee /etc/apt/sources.list.d/sbrl.list
# Import the signing key
wget -q https://apt.starbeamrainbowlabs.com/aptosaurus.asc -O- | sudo apt-key add -
# Alternatively, import the signing key from a keyserver:
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys D48D801C6A66A5D8
# Update apt's cache
sudo apt update

Don't forget that if you're using a caching server like apt-cacher-ng (which we'll be setting up in the next post in this series), you'll probably want to change the https in the first command there to regular http in order for the caching to take effect. Note that this doesn't affect the security of the downloaded packages, since apt will verify the integrity and GPG signature of all packages downloaded. It only affects the privacy of the connection used to download the packages themselves (more on this in a future post).

Once setup, install wesher like so:

sudo apt install wesher

If you've got a systemd-based system (as I have sadly with Raspbian), I provide a systemd service file in a separate package:

sudo apt install wesher-systemd

Don't forget to perform these steps on all the machines you want to enter the cluster. Next, we need to configure our firewall. I'm using UFW - I hope you set up something similar when you first configured the servers you're clustering (I recommend UFW. Note also that this tutorial series will not cover the basics like this - instead, I'll link to other tutorials and such as I think of them). For UFW users, do this:

sudo ufw allow 7946 comment wesher-gossip
sudo ufw allow 51820/udp comment wesher-wireguard

Wesher requires 2 ports: 1 for the clustering traffic for managing cluster membership, and another for the WireGuard traffic itself. Now, we can initialise the cluster. This has to be done on an interactive terminal for the first time, to avoid leaking the cluster's pre-shared key to log files. Do it like this in a terminal:

sudo wesher

It should tell you that it's generated a new cluster key - it will save it in a private configuration directory. Save this somewhere safe, such as in your password manager. Now, you can press Ctrl + C to quit, and start the systemd service:

sudo systemctl enable --now wesher.service

It's perhaps good practice to check that the service has started successfully:

sudo systemctl status wesher.service

Having 1 node setup is nice, but not very useful. Adding additional nodes to the cluster is a bit different. Follow this tutorial up to and including the installation of the wesher and wesher-systemd packages, and then instead of just doing sudo wesher, do this instead:

sudo wesher --cluster-key CLUSTER_KEY_HERE --join IP_OF_ANOTHER_NODE --overlay-net 172.31.250.0/16 --log-level info

...replacing CLUSTER_KEY_HERE with your cluster key (don't forget to prefix the entire command with a space to prevent it from being written to your shell history file), and IP_OF_ANOTHER_NODE with the IP address of another node in the cluster on the external untrusted network. Note that the --overlay-net there is required because of the way I wrote the systemd service file in the wesher-systemd package. I did this:

[Unit]
Description=wesher - wireguard mesh builder
After=network-online.target
After=syslog.target
After=rsyslog.service

[Service]
EnvironmentFile=-/etc/default/wesher
ExecStart=/usr/local/sbin/wesher --overlay-net 172.31.250.0/16 --log-level info
Restart=on-failure
Type=simple
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=wesher

[Install]
WantedBy = multi-user.target

I explicitly specify the subnet of the VPN network here to avoid clashes with other networks in the 10.0.0.0/8 range. I don't have a network in that range, but I know that others do. Unfortunately there's currently a known bug that means that IP address collisions may occur, and the cluster can't currently sort them out. So you need to use a pretty large subnet for now to avoid such collisions (here's hoping this one is patched soon).

Note that if you want to set additional configuration options, you can do so in /etc/default/wesher in the format VAR_NAME=VALUE - 1 per line. A full reference of the supported environment variables can be found here.

Anyway, once wesher has joined the cluster on the new node, press Ctrl + C to exit and then start and enable the systemd service as before (note that wesher saves all the configuration details to a configuration directory, so the cluster key doesn't need to be provided more than once):

sudo systemctl enable --now wesher.service
sudo systemctl status wesher.service

With this, you should have a fully-functional WireGuard peer-to-peer VPN set up with wesher. You can ask a node what IP address it has on the VPN by using the following command:

ip address

The IP address should be shown next to the wgoverlay network interface.
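If you'd rather not scroll through every interface, you can ask for just that one:

ip address show wgoverlay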

The last thing to do here is to configure our firewall. In my case, I'm using UFW, so the instructions I include here will be specific to UFW (if you use another firewall, translate these commands for your firewall, and comment below!).

In my specific case, I want to take the unusual step of allowing all traffic in on the VPN. The reason for this will become apparent in a future post, but in short Nomad dynamically allocates outward-facing ports for services. There's a good reason for this I'll get into at the time, but for now, this is how you'd do that:

sudo ufw allow in on wgoverlay

Of course, we can add override rules here that block traffic if we need to. Note that this only allows in all traffic on the wgoverlay network interface - not on all network interfaces, as a rule like the one from my previous blog post about UFW would have done - i.e. like this:

sudo ufw allow 1234/tcp

In the next part, we'll take a look at setting up an apt caching server to improve performance and reduce Internet data usage when downloading system updates.

Found this useful? Spotted a mistake? Confused about something? Comment below! It really helps motivate me to write these posts.

Cluster, Part 3: Laying groundwork with Unbound as a DNS server

Looking up at the canopy of some trees

(Above: A picture from my wallpapers folder. If you took this, comment below and I'll credit you)

Welcome to another blog post about my cluster! Although I had to replace the ATX PSU I was planning on using to power the thing with a USB power supply instead, I've got all the Pis powered up and networked together now - so it's time to finally start on the really interesting bit!

In this post, I'm going to show you how to install Unbound as a DNS server. This will underpin the entire stack we're going to be setting up - and over the next few posts in this series, I'll take you through the entire process.

Starting with DNS is a logical choice, which comes with several benefits:

  • We get a minor performance boost by caching DNS queries locally
  • We get to configure our DNS server to make encrypted DNS-over-TLS queries on behalf of the entire network (even if some of the connected devices - e.g. phones don't support doing so)
  • If we later change our minds and want to shuffle around the IP addresses, it's not such a big deal as if we used IP addresses in all our config files

With this in mind, I'm starting with DNS before moving on to Docker and the Hashicorp stack:

A picture of the homepages of the Hashicorp stack

Before we begin, let's set out our goals:

  • We want a caching resolver, to avoid repeated requests across the Internet for the same query
  • We want to encrypt queries that leave the network via DNS-over-TLS
  • We want to be able to add our own custom DNS records for a domain of our choosing, for internal resolution only.

The last point there is particularly important. We want server1.bobsrockets.com to resolve to something like 172.16.230.100 internally, but not externally outside the network. This way we can include server1.bobsrockets.com in config files, and if the IP changes then we don't have to go back and edit all our config files - just reload or restart the relevant services.

Without further delay, let's start by installing unbound:

sudo apt install unbound

If you're on another system, translate this for your package manager. See this amazing wiki page for help on translating package manager commands :-)

Next, we need to write the config file. For me, the default config file is located at /etc/unbound/unbound.conf:

# Unbound configuration file for Debian.
#
# See the unbound.conf(5) man page.
#
# See /usr/share/doc/unbound/examples/unbound.conf for a commented
# reference config file.
#
# The following line includes additional configuration files from the
# /etc/unbound/unbound.conf.d directory.
include: "/etc/unbound/unbound.conf.d/*.conf"

There's not a lot in the /etc/unbound/unbound.conf.d/ directory, so I'm going to be extending /etc/unbound/unbound.conf. First, we're going to define a section to get Unbound to forward requests via DNS-over-TLS:

forward-zone:
    name: "."
    # Cloudflare DNS
    forward-addr: 1.0.0.1@853
    # DNSlify - ref https://www.dnslify.com/services/resolver/
    forward-addr: 185.235.81.1@853
    forward-ssl-upstream: yes

The . name there simply means "everything". If you haven't seen it before, the fully-qualified domain name for seanssatellites.io for example is as follows:

seanssatellites.io.

Notice the extra trailing dot . there. That's really important, as it signifies the DNS root - the very top of the DNS hierarchy. The io bit is the top-level domain (commonly abbreviated as TLD). seanssatellites is the actual domain bit that you buy.

It's a hierarchical structure, and apparently used to be inverted here in the UK before the formal standard was defined by the IETF (Internet Engineering Task Force) - of which RFC 1034 was a part.

Anyway, now that we've told Unbound to forward queries, the next order of business is to define a bunch of server settings to get it to behave the way we want it to:

server:
    interface: 0.0.0.0
    interface: ::0

    ip-freebind: yes

    # Access control - default is to deny everything apparently

    # The local network
    access-control: 172.16.230.0/24 allow
    # The docker interface
    access-control: 172.17.0.1/16 allow

    username: "unbound"

    harden-algo-downgrade: yes
    unwanted-reply-threshold: 10000000


    prefetch: yes

There's a lot going on here, so let's break it down.

  • interface - Tells unbound what interfaces to listen on. In this case I tell it to listen on all interfaces on both IPv4 and IPv6.
  • ip-freebind - Tells it to try and listen on interfaces that aren't yet up. You probably don't need this, so you can remove it. I'm including it here because I'm currently unsure whether unbound will start before docker, which I'm going to be using extensively. In the future I'll probably test this and remove this directive.
  • access-control - unbound has an access control system, which functions rather like a firewall from what I can tell. I haven't had the time yet to experiment (you'll be seeing that a lot), but once I've got my core cluster up and running I intend to experiment and play with it extensively, so expect more in the future from this.
  • username - The name of the user on the system that unbound should run as.
  • harden-algo-downgrade - Protect against downgrade attacks when making encrypted connections. For some reason the default is to set this to no, so I enable it here.
  • unwanted-reply-threshold - Another security-hardening directive. If this many DNS replies are received that unbound didn't ask for, then take protective actions such as emptying the cache, just in case of a DNS cache poisoning attack.
  • prefetch - Causes unbound to prefetch updated DNS records for cache entries that are about to expire. Should improve performance slightly.

If you have a flaky Internet connection, you can also get Unbound to return stale DNS cache entries if it can't reach the remote DNS server. Do that like this:

server:
    # Service expired cached responses, but only after a failed 
    # attempt to fetch from upstream, and 10 seconds after 
    # expiration. Retry every 10s to see if we can get a
    # response from upstream.
    serve-expired: yes
    serve-expired-ttl: 10
    serve-expired-ttl-reset: yes

With this, we should have a fully-functional DNS server. Enable it to start on boot and (re)start it now:

sudo systemctl enable unbound.service
sudo systemctl restart unbound.service

If it's not already started, the restart action will start it.
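It's worth a quick sanity check at this point. From another machine on the network (or from the DNS server itself), try resolving something through it - replacing 172.16.230.100 with the IP address of your own DNS server:

dig +short example.com @172.16.230.100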

Internal DNS records

If you're like me and want some custom DNS records, then continue reading. Unbound has a pretty nifty way of declaring custom internal DNS records. Let's enable that now. First, you'll need a domain name that you want to return custom internal DNS records for. I recommend buying one - don't use an unregistered one, just in case someone else comes along and registers it.

Gandi are a pretty cool registrar - I can recommend them. Cloudflare are also cool, but they don't allow you to register several years at once yet - so they are probably best set as the name servers for your domain (that's a free service), leaving your domain name with a registrar like Gandi.

To return custom DNS records for a domain name, we need to tell unbound that it may contain private DNS records. Let's do that now:

server:
    private-domain: "mooncarrot.space"

This of course goes in /etc/unbound/unbound.conf, as before. See the bottom of this post for the completed configuration file.

Next, we need to define some DNS records:

server:
    local-zone: "mooncarrot.space." transparent
    local-data: "controller.mooncarrot.space.   IN A 172.16.230.100"
    local-data: "piano.mooncarrot.space.        IN A 172.16.230.101"
    local-data: "harpsichord.mooncarrot.space.  IN A 172.16.230.102"
    local-data: "saxophone.mooncarrot.space.    IN A 172.16.230.103"
    local-data: "bag.mooncarrot.space.          IN A 172.16.230.104"

    local-data-ptr: "172.16.230.100 controller.mooncarrot.space."
    local-data-ptr: "172.16.230.101 piano.mooncarrot.space."
    local-data-ptr: "172.16.230.102 harpsichord.mooncarrot.space."
    local-data-ptr: "172.16.230.103 saxophone.mooncarrot.space."
    local-data-ptr: "172.16.230.104 bag.mooncarrot.space."

The local-zone directive tells it that we're defining custom DNS records for the given domain name. The transparent bit tells it that if it can't resolve using the custom records, to forward it out to the Internet instead. Other interesting values include:

  • deny - Serve local data (if any), otherwise drop the query.
  • refuse - Serve local data (if any), otherwise reply with an error.
  • static - Serve local data, otherwise reply with an nxdomain or nodata answer (similar to the responses you'd expect from a DNS server that's authoritative for the domain).
  • transparent - Respond with local data, but resolve other queries normally if the answer isn't found locally.
  • redirect - Serves the zone data for any subdomain in the zone.
  • inform - The same as transparent, but logs client IP addresses.
  • inform_deny - Drops queries and logs client IP addresses.

Adapted from /usr/share/doc/unbound/examples/unbound.conf, the example Unbound configuration file.

Others exist too if you need even more control, like always_refuse (which always responds with an error message).

The local-data directives define the custom DNS records we want Unbound to return, written in standard DNS resource record syntax - the same format you'd see in a zone file. The local-data-ptr directive is a shortcut for defining PTR, or reverse DNS records - which resolve IP addresses to their respective domain names (useful for various things, but also commonly used as a step to verify email servers - comment below and I'll blog on lots of other shady and not so shady techniques used here).
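After restarting Unbound to pick up the new records, you can check both directions with dig (the hostname and IP here are from my example records above - substitute your own, and add @your-dns-server-ip if you're querying from another machine):

# Forward lookup: name -> IP
dig +short controller.mooncarrot.space
# Reverse (PTR) lookup: IP -> name
dig +short -x 172.16.230.100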

With that, our configuration file is complete. Here's the full configuration file in its entirety:

# Unbound configuration file for Debian.
#
# See the unbound.conf(5) man page.
#
# See /usr/share/doc/unbound/examples/unbound.conf for a commented
# reference config file.
#
# The following line includes additional configuration files from the
# /etc/unbound/unbound.conf.d directory.
include: "/etc/unbound/unbound.conf.d/*.conf"

server:
    interface: 0.0.0.0
    interface: ::0

    ip-freebind: yes

    # Access control - default is to deny everything apparently

    # The local network
    access-control: 172.16.230.0/24 allow
    # The docker interface
    access-control: 172.17.0.1/16 allow

    username: "unbound"

    harden-algo-downgrade: yes
    unwanted-reply-threshold: 10000000

    private-domain: "mooncarrot.space"

    prefetch: yes

    # ?????? https://www.internic.net/domain/named.cache

    # Service expired cached responses, but only after a failed 
    # attempt to fetch from upstream, and 10 seconds after 
    # expiration. Retry every 10s to see if we can get a
    # response from upstream.
    serve-expired: yes
    serve-expired-ttl: 10
    serve-expired-ttl-reset: yes

    local-zone: "mooncarrot.space." transparent
    local-data: "controller.mooncarrot.space.   IN A 172.16.230.100"
    local-data: "piano.mooncarrot.space.        IN A 172.16.230.101"
    local-data: "harpsichord.mooncarrot.space.  IN A 172.16.230.102"
    local-data: "saxophone.mooncarrot.space.    IN A 172.16.230.103"
    local-data: "bag.mooncarrot.space.          IN A 172.16.230.104"

    local-data-ptr: "172.16.230.100 controller.mooncarrot.space."
    local-data-ptr: "172.16.230.101 piano.mooncarrot.space."
    local-data-ptr: "172.16.230.102 harpsichord.mooncarrot.space."
    local-data-ptr: "172.16.230.103 saxophone.mooncarrot.space."
    local-data-ptr: "172.16.230.104 bag.mooncarrot.space."

    fast-server-permil: 500

forward-zone:
    name: "."
    # Cloudflare DNS
    forward-addr: 1.0.0.1@853
    # DNSlify - ref https://www.dnslify.com/services/resolver/
    forward-addr: 185.235.81.1@853
    forward-ssl-upstream: yes

Where do we go from here?

Remember that it's important to not just copy and paste a configuration file, but to understand what every single line of it does.

In a future post in this series, we'll be revising this to forward requests for *.service.mooncarrot.space to Consul, a clustered service discovery system that keeps track of what is running where and presents a DNS server as an interface (there are others).

In the next post, we'll (probably) be looking at setting up Consul - unless something else happens first :P Nomad should be next after that, followed closely by Vault.

Once I've got all that set up (which is a blog post series in and of itself!), I'll then look at encrypting all communications between all nodes in the cluster. After that, we'll (finally?) get to Docker and my plans there. Other things include caching servers for apt and apk (the Alpine Linux package manager) - which will have to be tackled separately.

Oh yeah, and I want to explore Traefik, which is apparently like Nginx, but intended for containerised infrastructure.

All this is definitely going to keep me very busy!

Found this interesting? Got a suggestion? Comment below!

Sources and Further Reading

Cluster, Part 2: Grand Designs

In the last part of this series, I talked about my plans for building an ARM-based cluster, because I'm growing out of the Raspberry Pi 3B+ I currently have at home. Since then, I have decided to focus on the compute cluster first, as I have a reasonable amount of room left on the 1TB WD PiDrive I have attached to my existing Raspberry Pi 3B+.

Hardware

To this end, I have been busy ordering parts and organising things to get construction of the compute cluster side of things going. The most important part of the whole cluster is the compute boards themselves. I've decided to go with 4 x Raspberry Pi 4s with 4GB RAM each for the worker nodes, and 1 x Raspberry Pi 4 with 2GB of RAM as the controller (it would have been a 1GB RAM model, but a recent announcement changed my mind :D):

(Above: The Raspberry Pi 4s I'm going to be using. The colourful heatsink cases there are to dissipate heat passively if possible and reduce the need for the fan to run as often. The one with the smaller red heatsink is the controller node - I don't anticipate the load on that node being high enough to need a bigger more expensive heatsink)

My reasoning for Raspberry Pis is software support. They are hugely popular - and from experience I can tell that they are pretty well supported on the software side of things. Issues with hardware features not being supported by the operating system are minimal - and where issues do arise they are more often than not sorted out. Regular kernel security updates are also provided - something that isn't always a thing with Linux distributions for other boards I've noticed.

Although the nodes in the cluster are very important, they are far from the only component I'll need. I'll also need a way to power it all - for which I've settled on using a desktop ATX power supply (generously donated by the University).

(Above: The ATX power supply, with a few wires cut and other bits and bobs attached. As of this blog post I'm in the middle of wiring it up, so I haven't finished it yet)

This adds some additional complications though, because wiring an ATX power supply up to a fleet of Raspberry Pi 4s isn't as easy as it sounds. To do that, I've decided to wire the 5V and ground wires up to 5 USB type-a breakout boards, with a 3 amp self-resettable fuse on each live (red) wire. Then I can use 5 short type-a to type-c converter cables to power the Raspberry Pi 4s.

(Above: The extra bits and bobs laid out that I'll be using to wire the ATX power supply up to the USB type-a breakout boards. From left to right: 3A self-resettable fuses, 18 AWG wire, Wagos, header pins, and finally the USB type-a breakout boards themselves)
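As a rough power budget check (assuming the official worst-case figure of 3A at 5V per Raspberry Pi 4, which is also what the 3A fuses are sized around): 5 boards x 3A = 15A on the 5V rail, or roughly 75W. Most desktop ATX supplies can provide considerably more than that on their 5V rail, but it's worth double-checking the rating printed on the label of yours before wiring everything up.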

With power to the Raspberry Pis, the core compute hardware is in place. I still need a bunch of things around the edges though, such as a (very quiet) fan to keep it cool:

(Above: A Noctua NF-P14s redux-1200)

I found this particular fan on quietpc.com. While their prices and shipping are somewhat expensive (I didn't actually buy it from there - I got a better deal on Amazon instead), they are a great place to look into the different options available for really quiet fans. I'm pretty sensitive to noise, so having a quiet fan is an important part of my cluster design.

This one is the large 14cm model, so that it fits in front of all 5 Raspberry Pis if they are stood up on their sides and stacked horizontally. It takes 12 volts, so I'll be connecting it to the 12V rail from the ATX power supply. The fan speed is also controllable via PWM (pulse-width modulation), so I plan on using an Arduino (probably one of the Arduino Unos I've got lying around) to control it and present a serial interface or something to the Raspberry Pi that's acting as the controller node in the cluster.

Lastly, another extremely important part of any cluster is a solid switch. Without a great switch at the base of the network, you'll have all sorts of connection issues and the performance of the cluster will be degraded significantly. I'm anticipating that I'll want to transfer significant amounts of data around very quickly (e.g. Docker container images, and later large blocks of data during a storage cluster rebalance).

For this reason, I've bought myself a Netgear GS116v2. It's unmanaged, since I can't currently afford a more expensive managed switch. It is however gigabit, and also has an array of other features such as Energy Efficient Ethernet (802.3az), full duplex on all ports (i.e. 32Gbps of total switching bandwidth, which is enough for all 16 ports to be transmitting and receiving at gigabit speeds at the same time), and a silent fanless design.


(Above: The switch I'll be using. I watched eBay and got it used for much less than it sells for new)

Networking

Hardware isn't the only thing I've been thinking about. While I've been waiting for packages to arrive, I've also been planning out the software I'm going to use and how I'm going to network all my Pis together.

My plans on the networking side of things are subject to significant change depending on how many responsibilities I can convince my home router to give up, but I have drawn up a network diagram showing what I'm currently aiming towards:

An ideal-case scenario network diagram. Explained below.

The cluster is represented on the left half of the diagram. This will probably entail some considerable persuasion of my router to pull off, but a quick look reveals that it's (probably) possible with some trial-and-error.

The idea is that I have a separate subnet for the cluster from the rest of the home network. Then I can do strange stuff and fiddle with it (hopefully) without affecting everyone else on the network.
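
As a rough sketch of what that could look like in practice (the addresses here are placeholders I've picked purely for illustration - the real subnet depends entirely on what I can get the router to agree to), giving the controller node a static address on the cluster subnet under Raspbian would go something like this:

    # Hypothetical addressing - the real cluster subnet depends on the router.
    # Raspbian uses dhcpcd, so a static address goes in /etc/dhcpcd.conf:
    sudo tee -a /etc/dhcpcd.conf >/dev/null <<'EOF'
    interface eth0
    static ip_address=172.16.230.100/24
    static routers=172.16.230.1
    static domain_name_servers=172.16.230.1
    EOF

    # Apply the change (this may briefly drop an existing SSH connection):
    sudo systemctl restart dhcpcd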

Software

Meanwhile, out of all the different aspects of building this cluster, the software I'm going to be using is the part I've got the clearest picture of.

I've decided that I'm going to use a container-based system. I've looked at a number of different options (such as podman and Singularity) - but I'm currently of the opinion that Docker is the most suitable option for what I'm going for. It's not as enterprisey as Singularity, and it seems to be more mature than podman. It also has a huge library of prebuilt container images - but for learning purposes I'm going to be writing almost all my container scripts from scratch, probably using some sort of Alpine Linux image as a base. If I ever run into a situation where Docker isn't suitable and I need something closer to a VM, I'll probably use LXC, which I believe builds on the same underlying kernel container primitives that Docker does.
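
To give a flavour of what I mean by writing container scripts from scratch, here's a minimal sketch of an Alpine-based image. Everything in it is illustrative rather than decided: the web server, the paths, and the registry hostname are all just examples I've picked for the purposes of this post.

    # Make some dummy content to serve, then write a minimal hand-rolled Dockerfile:
    mkdir -p site && echo '<h1>It works!</h1>' > site/index.html
    cat > Dockerfile <<'EOF'
    FROM alpine:3.12
    RUN apk add --no-cache darkhttpd
    COPY site/ /srv/site
    EXPOSE 8080
    CMD ["darkhttpd", "/srv/site", "--port", "8080"]
    EOF

    # Build it, tag it against a (hypothetical) private registry, and run it locally:
    docker build -t controller.cluster.local:5000/demo-site .
    docker run -d --name demo-site -p 8080:8080 controller.cluster.local:5000/demo-site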

I'm anticipating that container-based tech is going to be great for managing the stuff that's running on my cluster - so you can expect more posts that go into some depth about how it all works and how I'm setting my system up in the future.

To complement my container-based tech, I'm also going to be using a workload orchestrator. The Viper High-Performance Computer I've recently gained access to has lots of nodes in it and uses Slurm for workload orchestration, but that seems more geared towards environments that have lots of jobs that each have a defined running time. Great for scientific simulations and other such things, but not so great for personal self-hosted applications and the like.

Instead, I'm probably going to use Nomad. It looks seriously cool, and an initial look at the documentation reveals that it's probably going to be much simpler and easier to understand than Kubernetes (see also), which seems to be the other competing software in the business. It also seems to integrate well with other programs by the same company (Hashicorp), like Consul for service networking management (I'm hoping I can get DNS resolution for the services running on the cluster under control with it) and Vault for secret management (e.g. API keys, passwords, and other miscellaneous secrets) - all of which I'm going to install and experiment with (expect more on that soon).
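
I haven't actually installed any of this yet, but from the documentation, kicking the tyres looks like it should be as straightforward as something along these lines (a sketch only - the real setup will get its own post later):

    # Start a throwaway single-node Nomad agent for experimenting (not for real use):
    nomad agent -dev &

    # Generate the stock example job definition, submit it, and check on it
    # (the example job runs a container, so Docker needs to be installed too):
    nomad job init              # writes example.nomad to the current directory
    nomad job run example.nomad
    nomad job status example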

For now, all of this will be backed by an NFS share mounted on every node in the cluster, which will provide the persistent volumes attached to running containers.
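
As a sketch of what that might look like on each worker node (the hostname and export path here are placeholders - I haven't set the share up yet):

    # Placeholder hostname and export path - the real NFS share doesn't exist yet.
    sudo apt install -y nfs-common
    sudo mkdir -p /mnt/cluster-data
    sudo mount -t nfs controller.cluster.local:/srv/cluster-data /mnt/cluster-data

    # Persist the mount across reboots:
    echo "controller.cluster.local:/srv/cluster-data /mnt/cluster-data nfs defaults,_netdev 0 0" \
        | sudo tee -a /etc/fstab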

On the controller node I mentioned earlier, I'm also going to be running a few extra items to aid in the management of the cluster:

  • A Docker registry, from which the worker nodes will be pulling container images for execution (worker nodes will not have access to the public Docker registry at hub.docker.com)
  • An apt caching proxy - probably apt-cacher-ng. Since all the nodes in the cluster are going to be running the same OS, with the same packages installed and the same configuration settings, it doesn't make much sense for each of them to be downloading apt packages from the Internet separately - so I'll be caching them locally on the controller node (there's a rough sketch of how this and the registry might fit together just after this list)
  • Potentially some sort of reverse proxy that sits in front of all the services running on the cluster, but I haven't decided on how this will fit into the larger puzzle just yet (more research is required). I'm already very familiar with Nginx, but I've seen Traefik recommended for dynamic container-based setups, so I'm going to investigate that too.
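
As promised above, here's a rough sketch of how the registry and the apt caching proxy might end up being wired together. The hostnames and storage paths are assumptions on my part (apt-cacher-ng listens on port 3142 by default), and none of this is set in stone yet:

    # On the controller: run a private registry (the storage path is my own choice):
    docker run -d --name registry --restart always \
        -p 5000:5000 -v /srv/registry:/var/lib/registry registry:2

    # On each worker: allow pulling from the registry over plain HTTP for now
    # (this overwrites any existing daemon.json; proper TLS would be the better long-term answer):
    echo '{ "insecure-registries": ["controller.cluster.local:5000"] }' \
        | sudo tee /etc/docker/daemon.json
    sudo systemctl restart docker

    # Also on each worker: route apt through the caching proxy on the controller:
    echo 'Acquire::http::Proxy "http://controller.cluster.local:3142";' \
        | sudo tee /etc/apt/apt.conf.d/00proxy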

That about covers my high-level design ideas. As of the time of typing, the next thing I need to do is organise a case for it all to go in, fix the loose connections in the screw terminals (not pictured; they arrived after I took the pictures), and then find a place to put it....

Cluster, Part 1: Answers only lead to more questions

At home, I have a Raspberry Pi 3B+ as a home file server. Lately though, I've been noticing that I've been starting to grow out of it (both in terms of compute capacity and storage) - so I've decided to get thinking early about what I can do about it.

I thought of 2 different options pretty quickly:

  • Build a 'proper' server
  • Build a cluster instead

While both of these options are perfectly viable and would serve my needs well, one of them is distinctly more interesting than the other - that being a cluster. Having a 'proper' server would be much simpler, perhaps slightly more power efficient (though I'd need to run tests to confirm that, since ARM - the CPU architecture I'm planning on using for the cluster - is itself rather power efficient), and would put all my system resources on the same box, but I like the idea of building a cluster for a number of reasons.

For one, I'll learn new skills setting it up and managing it. So far, I've been mostly managing my servers by hand. When you start to acquire a number of machines though, this quickly becomes unwieldy. I'd like to experiment with a containerisation technology (I'm not sure which one yet) and play around with distributing containers across hosts - and auto-restarting them on a different host if 1 host goes down. If this is decentralised, even better!

For another, having a single larger server is a single point of failure - one which would be relatively expensive to replace. If I use lots of small machines instead, then if one dies it's not only cheaper to replace, but it's also less urgent, since the other machines in the cluster can take over while I order a replacement.

Finally, having a cluster is just cool. Do we really need more of a reason than this?

With all this in mind, I've been thinking quite a bit about the architecture of such a cluster. I haven't bought anything yet (and probably won't for a while yet) - because as you may have guessed from the title of this post I've been running into a number of issues that all need researching.

First though let's talk about which machines I'm planning on using. I'm actually considering 2 clusters, to solve 2 different issues: compute and storage. Compute refers to running applications (e.g. Nextcloud etc), and storage refers to a distributed storage mechanism with multiple hosts - each with 1 drive attached - though I'm unsure about the storage cluster at this stage.

For the compute cluster, I'm leaning towards 4 x Raspberry Pi 4 with 4GiB of RAM each. For the storage cluster, I'm considering a number of different boards. 3 identical boards of 1 of the following:

I do seem to remember a board that had USB 3 onboard, which would be useful for connecting to the external drives. Currently the plan is to use a SATA to USB converter to connect to internal HDDs (e.g. WD Greens) - but I have yet to find one that doesn't include the power connector or splits the power off into a separate USB cable (more on power later). This would all be backed by a Gigabit switch of some description (so the Rock Pi S is not a particularly attractive option, since it would be limited to 100Mbps Ethernet).

I've been using HackerBoards.com to discover different boards which may fit my project - but I'm not particularly satisfied with any of the options here so far. Specifically, I'm after Gigabit Ethernet and USB 3 on the same board if possible.

The next issue is software support. I've been bitten by this before, so I'm being extra cautious this time. I'm after a board that provides good software support, so that I can actually use all the hardware I've paid for.

The other software-related thing I'd like, if possible, is to use a systemd-free operating system. Just like before, when I selected Manjaro OpenRC (which is now called Artix Linux): since I already have a number of systems using systemd, I'd like to balance things out a bit with some systems that use something else. I don't really mind whether that's OpenRC, S6, or runit - just that it's something different, to broaden my skill set.

Unfortunately, it's been a challenge to locate a Linux distribution that both has broad support for ARM SoCs and does not use systemd. I suspect that I may have to give up on this, but I'm still holding out hope that there's a distribution out there that can do what I want - even if I have to prepare the system image myself (Alpine Linux looks potentially promising, but at the moment it's a huge challenge to figure out whether a given chipset is supported or not....). Either way, from my research it looks like having mainline Linux kernel support for my board of choice is critically important to ensure continued support and updates (both feature and security) in the future.

Lastly, I also have power problems. Specifically, how to power the cluster. The big problem is that the Raspberry Pi 4 requires up to 3A - rather than the usual 2.4A of the 3B+ model. Of course, it won't be drawing this all the time, but it's apparently important that the power supply can deliver a ceiling of 3A to avoid issues. The problem is that most multi-port chargers can barely provide 2A to each connected device - and I have yet to see one that can provide 3A to 4 or more devices (4 boards at 3A each is already 12A, or roughly 60W at 5V) while also supporting additional peripherals such as hard drives and the other supporting boards described above.

To this end, I may end up having to build my own power solution around an ATX supply of the kind you'd find in an old desktop PC. These can generally supply plenty of power (though it's always best to check) - but the problem here is that I'd need to do a ton of research to make sure that I wire it up correctly and safely, to avoid issues there too (I'm scared of blowing a fuse or electrocuting someone, etc.).

This concludes my first blog post on my cluster plans. It may be a while until the next one, as I have lots more research to do before I can continue. Suggestions and tips are welcomed in the comments below.
