Cluster, Part 7: Wrangling... boxes? | Expanding the Hashicorp stack with Docker and Nomad
Welcome back to part 7 of my cluster configuration series. Sorry this one's a bit late - the last one was a big undertaking, and I needed a bit of a rest :P
Anyway, I'm back at it with another part to my cluster series. For reference, here are all the posts in this series so far:
- Cluster, Part 1: Answers only lead to more questions
- Cluster, Part 2: Grand Designs
- Cluster, Part 3: Laying groundwork with Unbound as a DNS server
- Cluster, Part 4: Weaving Wormholes | Peer-to-Peer VPN with WireGuard
- Cluster, Part 5: Staying current | Automating apt updates and using apt-cacher-ng
- Cluster, Part 6: Superglue Service Discovery | Setting up Consul
Don't forget that you can see all the latest posts in the cluster tag right here on my blog.
Last time, we lit the spark for the bonfire, so to speak, that keeps track of what is running where. We also tied it into the internal DNS system that we set up in part 3, which will act as the binding fabric of our network.
In this post, we're going to be doing 2 very important things:
- Installing Docker
- Installing and configuring Hashicorp Nomad
This is set to be another complex blog post that builds on the previous ones in this series (remember that benign rabbit hole from a few blog posts ago?).
Above: Nomad is a bit like a railway network manager. It decides what is going to run where and at what time. Picture taken by me.
Installing Docker
Let's install Docker first. This should be relatively easy. According to the official Docker documentation, you can install Docker like so:
curl https://get.docker.com/ | sudo sh
I don't like piping to `sh` though (and neither should you), so we're going to be doing something more akin to the "install using the repository" method instead. As a reminder, I'm using Raspberry Pi 4s running Raspbian (well, DietPi - but that's a minor detail). If you're using a different distribution or CPU architecture, you'll need to read the documentation to figure out the specifics of installing Docker for your architecture.
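If you're not sure which architecture you're dealing with, something like this will tell you (dpkg reports the architecture your package manager is using - armhf in my case - while uname reports the kernel's):

uname -m
dpkg --print-architecture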
For Raspberry Pi 4s at least, it looks a bit like this:
echo 'deb [arch=armhf] https://download.docker.com/linux/raspbian buster stable' | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt update
sudo apt install docker-ce
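One thing to watch out for: apt will refuse to use the new repository if it doesn't trust Docker's GPG signing key. If sudo apt update complains about missing or untrusted keys, importing the key first should sort it out - something like this (the key URL is inferred from the repository URL above, so double-check it against the official Docker documentation):

curl -fsSL https://download.docker.com/linux/raspbian/gpg | sudo apt-key add -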
Don't forget that if you're running an apt caching server, you'll need to tweak that `https` to be plain-old `http`. For the curious, the script for my automated Ansible replacement (see the "A note about cluster management" in part 6) looks like this:
#!/usr/bin/env bash
RUN "echo 'deb [arch=armhf] http://download.docker.com/linux/raspbian buster stable' | sudo tee /etc/apt/sources.list.d/docker.list";
RUN "sudo apt-get update";
RUN "sudo apt-get install --yes docker-ce";
Docker should install without issue - note that you need to install it on all nodes in the cluster. We can't really do anything meaningful with it yet though, as we don't yet have Nomad installed. Let's move on and install that then!
Installing Hashicorp Nomad
Nomad is what's known as a workload orchestrator. This means that it, given a bunch of jobs, decides what is going to run where. If a host goes down, it is also responsible for shuffling things around to compensate.
Nomad works on the concept of 'jobs', which can be handled by any 1 of a number of drivers. In our case, we're going to be using the built-in Docker driver, as we want to manage the running of lots of Docker containers across multiple hosts in our cluster.
After installing Consul last time, we can build on that with Nomad. The 2 actually integrate really nicely with each other. Nomad will, by default, seek out a local Consul daemon, use it to discover other hosts in the cluster, and hang its own cluster from Consul. Neat!
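Since Nomad is going to lean on Consul for discovering the other nodes, it's worth making sure the Consul cluster from last time is still healthy before continuing. Running this on any node should list every host you joined to the cluster:

consul members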
Also like Consul, Nomad functions with servers and clients. The servers all talk to each other via the Raft consensus algorithm, and the clients are lightweight daemons that do what the servers tell them to. I'm going to have 3 Nomad servers and 4 Nomad clients (2 of which double up as servers), in the following layout:
Host # | Consul | Nomad |
---|---|---|
1 | Server | Server |
2 | Server + Client | Client |
3 | Server + Client | Client |
4 | Client | Server + Client |
5 | Client | Server + Client |
Just for the record, according to the Nomad documentation it's not recommended that servers also act as clients, but I don't have enough hosts to avoid this yet.
With this in mind, let's install Nomad. Again, as last time, I've packaged Nomad in my apt repository. If you haven't already, go and set it up now. Then, install Nomad like so:
sudo apt install hashicorp-nomad
Also as last time, I've deliberately chosen a different name than the existing `nomad` package that you'll probably find in your distribution's repositories, to avoid confusion during updates. If you're a systemd user, then I've also got a trio of packages that provide a systemd service file:
Package Name | Config file location |
---|---|
hashicorp-nomad-systemd-server | /etc/nomad/server.hcl |
hashicorp-nomad-systemd-client | /etc/nomad/client.hcl |
hashicorp-nomad-systemd-both | /etc/nomad/both.hcl |
They all conflict with each other (such that you can only have 1 installed at a time), and the only difference between them is where the configuration file is located.
Install 1 of these now too with your package manager (if required). If you're not a systemd user, consult your service manager's documentation and write a service definition. If you're willing, comment below and I'll include a note about it here!
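For example, on one of my nodes that's going to act as both a Nomad server and a client, that looks like this:

sudo apt install hashicorp-nomad-systemd-both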
Speaking of configuration files, we should write one for Nomad. Let's start off with the bits that will be common across all the config file variants:
bind_addr = "{{ GetInterfaceIP \"wgoverlay\" }}"
# Increase log verbosity
log_level = "INFO"
# Setup data dir
# The data directory used to store state and other persistent data. On client
# machines this is used to house allocation data such as downloaded artifacts
# used by drivers. On server nodes, the data dir is also used to store the
# replicated log.
data_dir = "/srv/nomad"
A few things to note here. `log_level` is mostly personal preference, so do whatever you like there. I'll probably tune it myself as I get more familiar with how everything works.
`data_dir` needs to be a path to a private root-owned directory on disk for the Nomad agent to store stuff locally to that node. It should not be shared with other nodes. If you installed one of the systemd packages above, `/srv/nomad` is created with the correct permissions for you.
`bind_addr` tells Nomad which network interface to send management clustering traffic over. For me, I'm using the WireGuard mesh VPN I set up in part 4, so I specify `wgoverlay` here.
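That {{ GetInterfaceIP \"wgoverlay\" }} template gets resolved by Nomad when the agent starts up. If you want to check ahead of time which IP it's going to pick, you can just inspect the interface directly (substitute your own interface name if it's different):

ip address show wgoverlay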
Next, let's look at the server config:
# Enable the server
server {
enabled = true
# We've got 3 servers in the cluster at the moment
bootstrap_expect = 3
# Note that Nomad finds other servers automagically through the consul cluster
# TODO: Enable this. Before we do we need to figure out how to move this sekret into vault though or something
# encrypt = "SOME_VALUE_HERE"
}
Not much to see here. Don't forget to change the `bootstrap_expect` to a different value if you are going to have a different number of servers in your cluster (nodes that are just clients don't count).
Note that this isn't the complete server configuration file - you need to take both this and the above common bit to make the complete server configuration file.
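As an aside, when I do get around to enabling the gossip encryption that's commented out above, Nomad can generate a suitable value for that encrypt setting itself - the resulting key then needs to be identical in the config on every server:

nomad operator keygen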
Now, let's look at the client configuration:
client {
enabled = true
# Note that Nomad finds other servers automagically through the consul cluster
# Just a worker, nothing special going on here
node_class = "worker"
# use wgoverlay for network fingerprinting and forwarding
network_interface = "wgoverlay"
    # Nobody is allowed to run as root - even if you *are* inside a container.....
# For 1 thing it'll trigger a permission denied when writing to the NFS share
options = {
"user.blacklist" = "root"
}
}
This is more interesting.
`network_interface` is really important if you're using a WireGuard mesh VPN like wesher, which I set up and configured in part 4. By default, Nomad port forwards over all interfaces that make sense, and in this case gets it wrong. This fixes that by telling it to explicitly port forward containers over the `wgoverlay` interface. If your network interface has a different name, this is the place to change it. It's a fairly common practice, from what I can tell, to have both a 'public' and a 'private' network in a cluster environment. The private network is usually trusted, and as such has lots of management traffic running over it. The public network is the one that's locked down, and is the one that requests come in to from outside.
The "user.blacklist" = "root"
here is a precaution that I may end up having to remove in future. It blocks any containers from running on this client from running as root inside the Docker container. This is actually worth remembering, because it's a bit of a security risk. This is a fail-safe to remind myself that it's a Bad Idea.
Apparently there are tactics that can be deployed to avoid running containers as root - even when you might think you need to. In addition, if there's no other way to avoid it, apparently there's a clever user namespace remapping trick one can deploy to avoid a container from having root privileges if it breaks out of it's container.
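I haven't tried it myself yet, but as far as I can tell that remapping trick is Docker's userns-remap feature, which shifts the user ids inside containers into an unprivileged range on the host. A minimal sketch of enabling it might look like this (untested on my part - and note that it can also interfere with volume and NFS permissions, so read up on it before turning it on):

# Warning: this overwrites any existing /etc/docker/daemon.json - merge by hand if you already have one
echo '{ "userns-remap": "default" }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker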
Another thing to note is that NFS shares often don't like you reading or writing files owned by `root` either, so if you're going to be saving data to a shared NFS partition like me, this is yet another reason to avoid `root` in your containers.
At this point it's also probably a good idea to talk a little bit about usernames - although we'll talk in more depth about this later. From my current understanding, the usernames inside a container aren't necessarily the same as those outside the container.
Every process runs under a specified username, but each username is backed by a given user id. It's this user id that is translated back into a username on the client machine when reading files from an NFS mount - hence why usernames in NFS shares can be somewhat odd.
Docker containers often have custom usernames created inside the containers for running processes inside the container with specific user ids. More on this later, but I plan to dive into this in the process of making my own Docker container images.
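You can see the user id vs username distinction in action quite easily. Inside a container, id reports the user the process is running as, while ls -n on the host shows the raw numeric ids that actually get stored on disk (and sent over NFS) - the path below is just a placeholder:

# Inside a throwaway container (alpine is just a conveniently tiny image)
sudo docker run --rm alpine id
# On the host: -n shows numeric user / group ids instead of translating them to names
ls -n /path/to/some/nfs/share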
Anyway, we now have our configuration files for Nomad. For a client node, take the client config and the common config from the top of this section. For a server, take the server and common sections. For a node that's to act as both a client and a server, take all 3 sections.
Now that we've got that sorted, we should be able to start the Nomad agent:
sudo systemctl enable --now nomad.service
This is the same for all nodes in the cluster - regardless of whether it's a client, a server, or both (this is also the reason that you can't have more than 1 of the systemd apt packages I mentioned above installed at once).
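If the agent doesn't seem to come up properly, checking on the service and tailing its logs is a good first port of call:

sudo systemctl status nomad.service
sudo journalctl --follow --unit nomad.service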
If you're using the UFW firewall, then that will need configuring. For me, I'm allowing all traffic on the `wgoverlay` network interface that's acting as my trusted network:
sudo ufw allow in on wgoverlay
If you'd prefer not to do that, then you can allow only the specific ports through like so:
sudo ufw allow 4646/tcp comment nomad-http
sudo ufw allow 4647/tcp comment nomad-rpc
sudo ufw allow 4648/tcp comment nomad-serf
Note that this allows the traffic on all interfaces - these will need tweaking if you only want to allow the traffic in on a specific interface (which, depending on your setup, is probably a wise decision).
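For example, limiting those rules to just my wgoverlay interface would look something like this (as far as I can tell from the ufw documentation - test against your own setup before relying on it):

sudo ufw allow in on wgoverlay to any port 4646 proto tcp comment nomad-http
sudo ufw allow in on wgoverlay to any port 4647 proto tcp comment nomad-rpc
sudo ufw allow in on wgoverlay to any port 4648 proto tcp comment nomad-serf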
Anyway, you should now be able to ask the Nomad cluster for its status like so:
nomad node status
...execute this from any server node in the cluster. It should give output like this:
ID DC Name Class Drain Eligibility Status
75188064 dc1 piano worker false eligible ready
9eb7a7a5 dc1 harpsicord worker false eligible ready
c0d23e71 dc1 saxophone worker false eligible ready
a837aaf4 dc1 violin worker false eligible ready
If you see this, you've successfully configured Nomad. Next, I recommend reading the Nomad tutorial and experimenting with some of the examples. In particular the Getting Started and Deploy and Manage Jobs topics are worth investigating.
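As an aside, nomad node status only lists the client nodes. To check on the servers as well, there's a separate subcommand:

nomad server members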
Conclusion
In this post, we've installed Docker, and installed and configured Nomad. We've also touched briefly on some of the security considerations we need to be aware of when running things in Docker containers - much more on this in the future.
In future posts, we're going to look at setting up shared storage, so that jobs running on Nomad can safely store state and execute on any client / worker node in the cluster, while retaining access to said state information.
On the topic of Nomad, we're also going to look at running our first real job: a Docker registry, so that we can push our own custom Docker images to it when we've built them.
You may have noticed that both Nomad and Consul also come with a web interface. We're going to look at these too, but in order to do so we need a special container-aware reverse-proxy to act as a broker between 'cluster-space' (in which everything happens 'somewhere', and we don't really know nor do we particularly care where), and 'normal-network-space' (in which everything happens in clearly defined places).
I've actually been experiencing some issues with this, as I initially wanted to use Traefik for this purpose - but I ran into a number of serious difficulties with their (lack of) documentation. After getting thoroughly confused, I'm now experimenting with Fabio (git repository) instead, which I'm getting on much better with. It's a shame really - I even got as far as writing the automated packaging script for Traefik, as evidenced by the `traefik` packages in my apt repository.
Until then though, happy cluster configuration! Feel free to post a comment below.
Found this interesting? Found a mistake? Confused about something? Comment below!