Cluster, Part 7: Wrangling... boxes? | Expanding the Hashicorp stack with Docker and Nomad

Welcome back to part 7 of my cluster configuration series. Sorry this one's a bit late - the last one was a big undertaking, and I needed a bit of a rest :P

Anyway, I'm back at it with another part to my cluster series. For reference, here are all the posts in this series so far:

Don't forget that you can see all the latest posts in the cluster tag right here on my blog.

Last time, we lit the spark for the bonfire (so to speak) that keeps track of what is running where: Consul. We also tied it into the internal DNS system that we set up in part 4, which will act as the binding fabric of our network.

In this post, we're going to be doing 4 very important things:

This is set to be another complex blog post that builds on the previous ones in this series (remember that benign rabbit hole from a few blog posts ago?).

Above: Nomad is a bit like a railway network manager. It decides what is going to run where and at what time. Picture taken by me.

Installing Docker

Let's install Docker first. This should be relatively easy. According to the official Docker documentation, you can install Docker like so:

curl https://get.docker.com/ | sudo sh

I don't like piping to sh though (and neither should you), so we're going to be doing something more akin to the "install using the repository". As a reminder, I'm using Raspberry Pi 4s running Raspbian (well, DietPi - but that's a minor detail). If you're using a different distribution or CPU architecture, you'll need to read the documentation to figure out the specifics of installing Docker for your architecture.
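
If you're not sure exactly what you're running on, these 2 commands will tell you the CPU architecture the kernel reports and the architecture that apt / dpkg is using respectively:

# The architecture the kernel reports (e.g. armv7l on a 32-bit Raspberry Pi)
uname -m
# The architecture apt / dpkg is configured for (e.g. armhf)
dpkg --print-architecture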

For Raspberry Pi 4s at least, it looks a bit like this:

echo 'deb [arch=armhf] https://download.docker.com/linux/raspbian buster stable' | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt update
sudo apt install docker-ce

Don't forget that if you're running an apt caching server, you'll need to tweak that https to be plain-old http. For the curious, the relevant script for my own automated Ansible replacement (see "A note about cluster management" in part 6) looks like this:

#!/usr/bin/env bash
RUN "echo 'deb [arch=armhf] http://download.docker.com/linux/raspbian buster stable' | sudo tee /etc/apt/sources.list.d/docker.list";
RUN "sudo apt-get update";
RUN "sudo apt-get install --yes docker-ce";

Docker should install without issue - note that you need to install it on all nodes in the cluster. We can't really do anything meaningful with it yet though, as we don't yet have Nomad installed. Let's move on and install that then!
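
It's worth a quick sanity check that the Docker daemon is actually up and running on each node before continuing. Something like this should do the trick:

# Ask the daemon for its status - if this errors out, Docker isn't running
sudo docker info
# Optionally, run a throwaway test container too
sudo docker run --rm hello-world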

Installing Hashicorp Nomad

Nomad is what's known as a workload orchestrator. This means that it, given a bunch of jobs, decides what is going to run where. If a host goes down, it is also responsible for shuffling things around to compensate.

Nomad works on the concept of 'jobs', which can be handled by any 1 of a number of drivers. In our case, we're going to be using the built-in Docker driver, as we want to manage the running of lots of Docker containers across multiple hosts in our cluster.
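
To give a flavour of what a 'job' actually looks like, here's a minimal sketch of a Nomad job specification that uses the Docker driver. The job name and container image here are just examples - we'll be writing real job files in a later post:

job "example-webserver" {
    datacenters = ["dc1"]
    type = "service"

    group "web" {
        count = 1

        task "nginx" {
            driver = "docker"

            config {
                image = "nginx:alpine"
            }

            resources {
                cpu    = 100 # MHz
                memory = 64  # MB
            }
        }
    }
}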

After installing Consul last time, we can build on that with Nomad. The 2 actually integrate really nicely with each other. Nomad will, by default, seek out a local Consul daemon, use it to discover other hosts in the cluster, and hang its own cluster off Consul. Neat!

Also like Consul, Nomad functions with servers and clients. The servers all talk to each other via the Raft consensus algorithm, and the clients are lightweight daemons that do what the servers tell them to. I'm going to have 3 servers and 2 clients, in the following layout:

Host #    Consul             Nomad
1         Server             Server
2         Server + Client    Client
3         Server + Client    Client
4         Client             Server + Client
5         Client             Server + Client

Just for the record, according to the Nomad documentation it's not recommended that servers also act as clients, but I don't have enough hosts to avoid this yet.

With this in mind, let's install Nomad. Again, as last time, I've packaged Nomad in my apt repository. If you haven't already, go and set it up now. Then, install Nomad like so:

sudo apt install hashicorp-nomad

Also as last time, I've deliberately chosen a different name than that of the existing nomad package that you'll probably find in your distribution's repositories, to avoid confusion during updates. If you're a systemd user, then I've also got a trio of packages that provide a systemd service file:

Package Name                      Config file location
hashicorp-nomad-systemd-server    /etc/nomad/server.hcl
hashicorp-nomad-systemd-client    /etc/nomad/client.hcl
hashicorp-nomad-systemd-both      /etc/nomad/both.hcl

They all conflict with each other (such that you can only have 1 installed at a time), and the only difference between them is where the configuration file is located.

Install 1 of these (if required) now too with your package manager. If you're not a systemd user, consult your service manager's documentation and write a service definition. If you're willing, comment below and I'll include a note about it here!
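
For reference, whichever service manager you use, the service definition essentially just needs to run the Nomad agent and point it at the right configuration file - something along these lines (the config file path here assumes the hashicorp-nomad-systemd-both layout from the table above):

nomad agent -config=/etc/nomad/both.hcl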

Speaking of configuration files, we should write one for Nomad. Let's start off with the bits that will be common across all the config file variants:

bind_addr = "{{ GetInterfaceIP \"wgoverlay\" }}"

# Increase log verbosity
log_level = "INFO"

# Setup data dir
# The data directory used to store state and other persistent data. On client
# machines this is used to house allocation data such as downloaded artifacts
# used by drivers. On server nodes, the data dir is also used to store the
# replicated log.
data_dir = "/srv/nomad"

A few things to note here. log_level is mostly personal preference, so do whatever you like there. I'll probably tune it myself as I get more familiar with how everything works.

data_dir needs to be a path to a private root-owned directory on disk for the Nomad agent to store stuff locally to that node. It should not be shared with other nodes. If you installed one of the systemd packages above, /srv/nomad is created and given the correct permissions for you.
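
If you didn't install one of those packages, creating it by hand is easy enough. Something like this should be about right:

sudo mkdir -p /srv/nomad
sudo chown root:root /srv/nomad
sudo chmod 0700 /srv/nomad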

bind_addr tells Nomad which network interface to send management clustering traffic over. For me, I'm using the WireGuard mesh VPN I set up in part 4, so I specify wgoverlay here.
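
If you're not sure what your mesh VPN's interface is actually called, iproute2 will tell you:

# Lists every interface along with its state and addresses - look for your VPN interface in here
ip -brief address show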

Next, let's look at the server config:

# Enable the server
server {
    enabled = true

    # We've got 3 servers in the cluster at the moment
    bootstrap_expect = 3

    # Note that Nomad finds other servers automagically through the consul cluster

    # TODO: Enable this. Before we do we need to figure out how to move this sekret into vault though or something
    # encrypt = "SOME_VALUE_HERE"
}

Not much to see here. Don't forget to change the bootstrap_expect to a different value if you are going to have a different number of servers in your cluster (nodes that are just clients don't count).

Note that this isn't the complete server configuration file - you need to take both this and the above common bit to make the complete server configuration file.

Now, let's look at the client configuration:

client {
    enabled = true
    # Note that Nomad finds other servers automagically through the consul cluster

    # Just a worker, nothing special going on here
    node_class = "worker"

    # use wgoverlay for network fingerprinting and forwarding
    network_interface = "wgoverlay"

    # Nobody is allowed to run as root - even if you *are* inside a container.....
    # For 1 thing it'll trigger a permission denied when writing to the NFS share
    options = {
        "user.blacklist" = "root"
    }
}

This is more interesting.

network_interface is really important if you're using a WireGuard mesh VPN like wesher, which I set up and configured in part 4. By default, Nomad port forwards over all interfaces that it thinks make sense - and in this case it gets it wrong.

This fixes that by telling it to explicitly port forward containers over the wgoverlay interface. If your network interface has a different name, this is the place to change it. From what I can tell, it's fairly common practice to have both a 'public' and a 'private' network in a cluster environment. The private network is usually trusted, and as such has lots of management traffic running over it. The public network is locked down, and is the one that requests from outside come in on.

The "user.blacklist" = "root" here is a precaution that I may end up having to remove in future. It blocks any containers from running on this client from running as root inside the Docker container. This is actually worth remembering, because it's a bit of a security risk. This is a fail-safe to remind myself that it's a Bad Idea.

Apparently there are tactics that can be deployed to avoid running containers as root - even when you might think you need to. In addition, if there's no other way around it, apparently there's a clever user namespace remapping trick one can deploy to prevent a container from gaining root privileges on the host if something breaks out of it.
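
I haven't deployed this myself yet, but as I understand it Docker's user namespace remapping can be enabled daemon-wide through /etc/docker/daemon.json. A rough sketch of what that might look like (note that remapping has knock-on effects on file ownership in volumes):

# Map root (and everyone else) inside containers to an unprivileged range of host uids / gids
# Careful: this clobbers any existing /etc/docker/daemon.json!
echo '{ "userns-remap": "default" }' | sudo tee /etc/docker/daemon.json
sudo systemctl restart docker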

Another thing to note is that NFS shares often don't like you reading or writing files owned by root either, so if you're going to be saving data to a shared NFS partition like me, this is yet another reason to avoid root in your containers.

At this point it's also probably a good idea to talk a little bit about usernames - although we'll talk in more depth about this later. From my current understanding, the usernames inside a container aren't necessarily the same as those outside the container.

Every process runs under a specified username, but each username is backed by a given user id. It's this user id that is translated back into a username on the client machine when reading files from an NFS mount - hence why usernames in NFS shares can be somewhat odd.
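
You can see this in action for yourself: id prints the numeric user id your shell is running as, and ls -n shows the raw ids stored on disk rather than translating them into usernames (the mount point below is just an example):

# Show the numeric uid / gid of the current user - works inside containers too
id
# List files showing raw numeric uids / gids instead of resolved usernames
ls -ln /mnt/shared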

Docker containers often have custom usernames created inside the containers for running processes inside the container with specific user ids. More on this later, but I plan to dive into this in the process of making my own Docker container images.

Anyway, we now have our configuration files for Nomad. For a client node, take the client config and the common config from the top of this section. For a server, take the server and common sections. For a node that's to act as both a client and a server, take all 3 sections.

Now that we've got that sorted, we should be able to start the Nomad agent:

sudo systemctl enable --now nomad.service

This is the same for all nodes in the cluster, regardless of whether it's a client, a server, or both (this is also the reason that you can only have 1 of the systemd apt packages I mentioned above installed at a time).

If you're using the UFW firewall, then that will need configuring. For me, I'm allowing all traffic on the wgoverlay network interface that's acting as my trusted network:

sudo ufw allow in on wgoverlay

If you'd prefer not to do that, then you can allow only the specific ports through like so:

sudo ufw allow 4646/tcp comment nomad-http
sudo ufw allow 4647/tcp comment nomad-rpc
sudo ufw allow 4648/tcp comment nomad-serf

Note that this allows the traffic on all interfaces - these will need tweaking if you only want to allow the traffic in on a specific interface (which, depending on your setup, is probably a wise decision).
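
For example, to restrict the HTTP API port to just the wgoverlay interface, the rule would look something like this (repeat for the other 2 ports):

sudo ufw allow in on wgoverlay to any port 4646 proto tcp comment nomad-http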

Anyway, you should now be able to ask the Nomad cluster for its status like so:

nomad node status

...execute this from any server node in the cluster. It should give output like this:

ID        DC   Name         Class   Drain  Eligibility  Status
75188064  dc1  piano        worker  false  eligible     ready
9eb7a7a5  dc1  harpsicord   worker  false  eligible     ready
c0d23e71  dc1  saxophone    worker  false  eligible     ready
a837aaf4  dc1  violin       worker  false  eligible     ready

If you see this, you've successfully configured Nomad. Next, I recommend reading the Nomad tutorial and experimenting with some of the examples. In particular the Getting Started and Deploy and Manage Jobs topics are worth investigating.
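
While you're experimenting, a couple of other CLI commands are handy too. nomad server members checks that the servers have found each other and elected a leader, and nomad job init generates an example job specification to play with (assuming a reasonably recent version of Nomad):

# Check the servers have found each other and elected a leader
nomad server members
# Generate an example job specification (written to example.nomad in the current directory)
nomad job init
# Submit it to the cluster, then check on its status
nomad job run example.nomad
nomad job status example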

Conclusion

In this post, we've installed Docker, and installed and configured Nomad. We've also touched briefly on some of the security considerations we need to be aware of when running things in Docker containers - much more on this in the future.

In future posts, we're going to look at setting up shared storage, so that jobs running on Nomad can safely store state and execute on any client / worker node in the cluster while retaining access to said state information.

On the topic of Nomad, we're also going to look at running our first real job: a Docker registry, so that we can push our own custom Docker images to it when we've built them.

You may have noticed that both Nomad and Consul also come with a web interface. We're going to look at these too, but in order to do so we need a special container-aware reverse-proxy to act as a broker between 'cluster-space' (in which everything happens 'somewhere', and we don't really know nor do we particularly care where), and 'normal-network-space' (in which everything happens in clearly defined places).

I've actually been experiencing some issues with this, as I initially wanted to use Traefik for this purpose - but I ran into a number of serious difficulties with reading their (lack of) documentation. After getting thoroughly confused I'm now experimenting with Fabio (git repository) instead, which I'm getting on much better with. It's a shame really, I even got as far as writing the automated packaging script for Traefik - as evidenced by the traefik packages in my apt repository.

Until then though, happy cluster configuration! Feel free to post a comment below.


