
Cluster, Part 8: The Shoulders of Giants | NFS, Nomad, Docker Registry

Welcome back! It's been a bit of a while, but now I'm back with the next part of my cluster series. As a refresher, here's a list of all the parts in the series so far:

In this one, we're going to look at running our first job on our Nomad cluster! If you haven't read the previous posts in this series, you'll probably want to go back and read them now, as we're going to be building on the infrastructure we've setup and the groundwork we've laid there.

Before we get to that though, we need to sort out shared storage - as we don't know which node in the cluster tasks will be running on. In my case, I'll be setting up NFS. This is hardly the only solution to the issue though - other options include:

If you're going to choose NFS like me though, you should be warned that it's neither encrypted nor authenticated. You should ensure that NFS is only run on a trusted network. If you don't have a trusted network, use the WireGuard Mesh VPN trick in part 4 of this series.

NFS: Server

Setting up a server is relatively easy. Simply install the relevant package:

sudo apt install nfs-kernel-server

....edit /etc/exports to look something like this:

/mnt/somedrive/subdirectory 10.1.2.0/24(rw,async,no_subtree_check)

/mnt/somedrive/subdirectory is the directory you'd like clients to be able to access, and 10.1.2.0/24 is the IP range that should be allowed to talk to your NFS server.
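
If the NFS server is already running when you edit /etc/exports, it won't pick up the changes by itself. Something like this should get it to re-read the file (restarting the nfs-kernel-server service should work too):

sudo exportfs -ra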

Next, open up the relevant ports in your firewall (I use UFW):

sudo ufw allow nfs

....and you're done! Pretty easy, right? Don't worry, it'll get harder later on :P
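
Before moving on to the clients, it's worth a quick sanity check on the server to make sure the directory is actually being exported:

sudo exportfs -v

You should see /mnt/somedrive/subdirectory listed there, along with the options we set in /etc/exports.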

NFS: Client

The client, in theory, is relatively straightforward too. This must be done on all nodes in the cluster - except the node that's acting as the NFS server (although having the NFS server as a regular node in the cluster is probably a bad idea). First, install the relevant package:

sudo apt install nfs-common

Then, update /etc/fstab and add the following line:

10.1.2.10:/mnt/somedrive/subdirectory   /mnt/shared nfs auto,nofail,noatime,intr,tcp,bg,_netdev 0   0

Again, 10.1.2.10 is the IP of the NFS server, and /mnt/somedrive/subdirectory must match the directory exported by the server. Finally, /mnt/shared is the location that we're going to mount the directory from the NFS server to. Speaking of, we should create that directory:

sudo mkdir /mnt/shared

I have yet to properly tune the options here on both the client and the server. If I find that I have to change anything, I'll come back and edit this post, and also mention the change in a future post.

From here, you should be able to mount the NFS share like so:

sudo mount /mnt/shared

You should see the files from the NFS server located in /mnt/shared. Check that it auto-mounts on boot too (that's what the auto and _netdev options are supposed to do).
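
If you want to double check that it really is an NFS mount (and not just the empty local directory), something like this should tell you:

findmnt /mnt/shared
# ....or, if findmnt isn't available:
mount | grep /mnt/shared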

If you experience issues on boot (like me), you might see something like this buried in /var/log/syslog:

mount[586]: mount.nfs: Network is unreachable

....then we can quickly hack around this by creating a script in the directory /etc/network/if-up.d. Something like this should fix the issue:

#!/usr/bin/env bash
mount /mnt/shared

Save this to /etc/network/if-up.d/cluster-shared-nfs for example, not forgetting to mark it as executable:

sudo chmod +x /etc/network/if-up.d/cluster-shared-nfs

Alternatively, there's autofs that can do this more intelligently if you prefer.
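
I haven't tested autofs as part of this cluster myself, but a rough sketch of that approach might look something like this (using the same hypothetical IP and paths as above - and you'd want to drop the /etc/fstab entry if you go this route):

sudo apt install autofs
# A 'direct' map: mount /mnt/shared on demand, and unmount it after 5 minutes of inactivity
echo '/- /etc/auto.shared --timeout=300' | sudo tee /etc/auto.master.d/shared.autofs
echo '/mnt/shared -fstype=nfs,rw,noatime 10.1.2.10:/mnt/somedrive/subdirectory' | sudo tee /etc/auto.shared
sudo systemctl restart autofs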

First Nomad Job: Docker Registry

Now that we've got shared storage online, it's time for the big moment. We're finally going to start our very first job on our Nomad cluster!

It's going to be a Docker registry, and in my very specific case I'm going to be marking it as insecure (gasp!) because it's only going to be accessible from the WireGuard VPN - which I figure provides the encryption and authentication for us to get started reasonably simply without jumping through too many hoops. In the future, I'll probably revisit this in a later post to tighten things up.

Tasks on a Nomad cluster take the form of a Nomad job file. These can be written in either JSON or HCL (Hashicorp Configuration Language). I'll be using HCL here, because it's easier to read and we're not after machine legibility at this stage.

Nomad job files work a little bit like Nginx config files, in that they have nested sequences of blocks in a hierarchical structure. They loosely follow the following pattern:

job > group > task

The job is the top-level block that contains everything else. tasks are the items that actually run on the cluster - e.g. a Docker container. groups are a way to logically group tasks in a job, and are not required as far as I can tell (but we'll use one here anyway just for illustrative purposes). Let's start with the job spec:

job "registry" {
    datacenters = ["dc1"]
    # The Docker registry *is* pretty important....
    priority = 80

    # If this task was a regular task, we'd use a constraint here instead & set the weight to -100
    affinity {
        attribute   = "${attr.class}"
        value       = "controller"
        weight      = 100
    }

    # .....

}

This defines a new job called registry, and it should be pretty straightforward. We don't need to worry about the datacenters definition there, because we've only got the 1 (so far?). We set a priority of 80, and get the job to prefer running on nodes with the controller class (though I observe that this hasn't actually made much of a difference to Nomad's scheduling algorithm at all).

Let's move on to the real meat of the job file: the task definition!

group "main" {
    task "registry" {
        driver = "docker"

        config {
            image = "registry:2"
            labels { group = "registry" }

            volumes = [
                "/mnt/shared/registry:/var/lib/registry"
            ]

            port_map {
                registry = 5000
            }
        }

        resources {
            network {
                port "registry" {
                    static = 5000
                }
            }
        }

        # .......
    }
}

There's quite a bit to unpack here. The task itself uses the Docker driver, which tells Nomad to run a Docker container.

In the config block, we define the Docker driver-specific settings. The Docker image we're going to run is registry:2, where registry is the image name and 2 is the tag. This will be automatically pulled from the Docker Hub. Future tasks will pull Docker images from our very own private Docker registry, which we're in the process of setting up :D

We also mount a directory into the Docker container to allow it to persist the images that we push to it. This is done through a volume, which is the Docker word for bind-mounting a specific directory on the host system into a given location inside the guest container. For me, I'm (currently) going to store the Docker registry data at /mnt/shared/registry - you should update this if you want to store it elsewhere. Remember that this needs to be a location on your shared storage, as we don't know in advance which node in the cluster the Docker registry is going to run on.
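
It's probably a good idea to create that directory on the shared storage before starting the job (from any node, since it's shared):

sudo mkdir -p /mnt/shared/registry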

The port_map allows us to tell Nomad the port(s) that our service inside the Docker container listens on, and attach a logical name to them. We can then expose them in the resources block. In this specific case, I'm forcing Nomad to statically allocate port 5000 on the host system to point to port 5000 inside the container, for reasons that will become apparent later. This is done with the static keyword there. If we didn't do this, Nomad would allocate a random port number (which is normally what we'd want, because then we can run lots of copies of the same thing at the same time on the same host).

The last block we need to add to complete the job spec file is the service block. With a service block, Nomad will inform Consul that a new service is running, which in turn allows us to query it via DNS.

service {
    name = "${TASK}"
    tags = [ "infrastructure" ]

    address_mode = "host"
    port = "registry"
    check {
        type        = "tcp"
        port        = "registry"
        interval    = "10s"
        timeout     = "3s"
    }

}

The service name here is pulled from the name of the task. We tell Consul about the port number by specifying the logical name we assigned to it earlier.

Finally, we add a health check, to allow Consul to keep an eye on the health of our Docker registry for us. This will appear as a green tick if all is well in the web interface, which we'll be getting to in a future post. The health check in question simply ensures that the Docker registry is listening via TCP on the port it should be.

Here's the completed job file:

job "registry" {
    datacenters = ["dc1"]
    # The Docker registry *is* pretty important....
    priority = 80

    # If this task was a regular task, we'd use a constraint here instead & set the weight to -100
    affinity {
        attribute   = "${attr.class}"
        value       = "controller"
        weight      = 100
    }

    group "main" {

        task "registry" {
            driver = "docker"

            config {
                image = "registry:2"
                labels { group = "registry" }

                volumes = [
                    "/mnt/shared/registry:/var/lib/registry"
                ]

                port_map {
                    registry = 5000
                }
            }

            resources {
                network {
                    port "registry" {
                        static = 5000
                    }
                }
            }

            service {
                name = "${TASK}"
                tags = [ "infrastructure" ]

                address_mode = "host"
                port = "registry"
                check {
                    type        = "tcp"
                    port        = "registry"
                    interval    = "10s"
                    timeout     = "3s"
                }

            }
        }

        // task "registry-web" {
        //  driver = "docker"
        // 
        //  config {
        //      // We're going to have to build our own - the Docker image on the Docker Hub is amd64 only :-/
        //      // See https://github.com/Joxit/docker-registry-ui
        //      image = ""
        //  }
        // }
    }
}

Save this to a file, and then run it on the cluster like so:

nomad job run path/to/job/file.nomad

I'm as yet unsure as to whether Nomad needs the file to persist on disk to avoid it getting confused - so it's probably best to keep your job files in a permanent place on disk to avoid issues.
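
If you want to see what Nomad intends to do before actually committing to it, there's also a handy dry-run mode:

nomad job plan path/to/job/file.nomad

This prints a diff of what would change and where Nomad thinks it can place things, without actually scheduling anything.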

Give Nomad a moment to start the job, and then you can check on its status like so:

nomad job status

This will print a summary of the status of all jobs on the cluster. To get detailed information about our new job, do this:

nomad job status registry

It should show that 1 task is running, like this:

ID            = registry
Name          = registry
Submit Date   = 2020-04-26T01:23:37+01:00
Type          = service
Priority      = 80
Datacenters   = dc1
Namespace     = default
Status        = running
Periodic      = false
Parameterized = false

Summary
Task Group  Queued  Starting  Running  Failed  Complete  Lost
main        0       0         1        5       6         1

Latest Deployment
ID          = ZZZZZZZZ
Status      = successful
Description = Deployment completed successfully

Deployed
Task Group  Desired  Placed  Healthy  Unhealthy  Progress Deadline
main        1        1       1        0          2020-06-17T22:03:58+01:00

Allocations
ID        Node ID   Task Group  Version  Desired  Status   Created   Modified
XXXXXXXX  YYYYYYYY  main        4        run      running  6d2h ago  2d23h ago

Ignore the Failed, Complete, and Lost there in my output - I ran into some snags while learning the system and setting mine up :P
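
If you run into failed allocations like I did, the allocation ID from that last table (XXXXXXXX in my redacted output above) is the key to digging deeper:

nomad alloc status XXXXXXXX
nomad alloc logs XXXXXXXX registry

The former explains what happened to a given allocation, and the latter pulls the logs from the named task inside it.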

You should also be able to resolve the IP of your Docker registry via DNS:

dig +short registry.service.mooncarrot.space

mooncarrot.space is the root domain I've bought for my cluster. I highly recommend you do the same if you haven't already. Consul exposes all services under the service subdomain, so in the future you should be able to resolve the IP of all your services in the same way: service_name.service.DOMAIN_ROOT.

Take care to ensure that it's showing the right IP address here. In my case, it should be the IP address of the wgoverlay network interface. If it's showing the wrong IP address, you may need to carefully check the configuration of both Nomad and Consul. Specifically, start by checking the network_interface setting in the client block of your Nomad worker nodes from part 7 of this series.
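
As a final check, you can poke the registry itself over HTTP. A freshly-started registry should respond to the catalog endpoint with an empty list of repositories - something like this (remembering that we statically allocated port 5000 above):

curl http://registry.service.mooncarrot.space:5000/v2/_catalog

....which should return something along the lines of {"repositories":[]}.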

Conclusion

We're getting there, slowly but surely. Today we've setup shared storage with NFS, and started our first Nomad job. In doing so, we've started to kick the tyres of everything we've installed so far:

  • wesher, our WireGuard Mesh VPN
  • Unbound, our DNS server
  • Consul, our service discovery superglue
  • Nomad, our task scheduler

Truly, we are standing on the shoulders of giants: a whole host of open-source software that thousands of people from across the globe have collaborated together to produce which makes this all possible.

Moving forwards, we're going to be putting that Docker registry to good use. More immediately, we're going to be setting up Fabio (whose documentation is only marginally better than Traefik's, but just good enough that I could figure out how to use it....) in order to take a peek at those cool web interfaces for Nomad and Consul that I keep talking about.

We're also going to be looking at setting up Vault for secret (and certificate, if all goes well) management.

Until then, happy cluster configuration! If you're confused about anything so far, please leave a comment below. If you've got a suggestion to make it even better, please comment also! I'd love to know.

Sources and further reading

Cluster, Part 7: Wrangling... boxes? | Expanding the Hashicorp stack with Docker and Nomad

Welcome back to part 7 of my cluster configuration series. Sorry this one's a bit late - the last one was a big undertaking, and I needed a bit of a rest :P

Anyway, I'm back at it with another part to my cluster series. For reference, here are all the posts in this series so far:

Don't forget that you can see all the latest posts in the cluster tag right here on my blog.

Last time, we lit the spark for the bonfire so to speak, that keeps track of what is running where. We also tied it into the internal DNS system that we setup in part 4, which will act as the binding fabric of our network.

In this post, we're going to be doing 2 very important things:

  • Installing Docker
  • Installing and configuring Hashicorp Nomad

This is set to be another complex blog post that builds on the previous ones in this series (remember that benign rabbit hole from a few blog posts ago?).

Above: Nomad is a bit like a railway network manager. It decides what is going to run where and at what time. Picture taken by me.

Installing Docker

Let's install Docker first. This should be relatively easy. According to the official Docker documentation, you can install Docker like so:

curl https://get.docker.com/ | sudo sh

I don't like piping to sh though (and neither should you), so we're going to be doing something more akin to the "install using the repository" approach. As a reminder, I'm using Raspberry Pi 4s running Raspbian (well, DietPi - but that's a minor detail). If you're using a different distribution or CPU architecture, you'll need to read the documentation to figure out the specifics of installing Docker for your architecture.

For Raspberry Pi 4s at least, it looks a bit like this:

echo 'deb [arch=armhf] https://download.docker.com/linux/raspbian buster stable' | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt update
sudo apt install docker-ce
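
If apt update complains about a missing GPG key, you'll also need to import Docker's signing key. Something like this should do the trick (apt-key has since been deprecated, but it was the standard approach at the time):

curl -fsSL https://download.docker.com/linux/raspbian/gpg | sudo apt-key add -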

Don't forget that if you're running an apt caching server, you'll need to tweak that https to be plain-old http. For the curious, the equivalent script for my automated Ansible replacement (see the "A note about cluster management" section in part 6) looks like this:

#!/usr/bin/env bash
RUN "echo 'deb [arch=armhf] http://download.docker.com/linux/raspbian buster stable' | sudo tee /etc/apt/sources.list.d/docker.list";
RUN "sudo apt-get update";
RUN "sudo apt-get install --yes docker-ce";

Docker should install without issue - note that you need to install it on all nodes in the cluster. We can't really do anything meaningful with it yet though, as we don't yet have Nomad installed. Let's move on and install that then!
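
Before we do though, a quick sanity check that the Docker daemon is actually up and happy on each node doesn't hurt:

sudo docker info
# ....and, if you're feeling thorough:
sudo docker run --rm hello-world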

Installing Hashicorp Nomad

Nomad is what's known as a workload orchestrator. This means that it, given a bunch of jobs, decides what is going to run where. If a host goes down, it is also responsible for shuffling things around to compensate.

Nomad works on the concept of 'jobs', which can be handled by any 1 of a number of drivers. In our case, we're going to be using the built-in Docker driver, as we want to manage the running of lots of Docker containers across multiple hosts in our cluster.

After installing Consul last time, we can build on that with Nomad. The 2 actually integrate really nicely with each other. Nomad will, by default, seek out a local Consul daemon, use it to discover other hosts in the cluster, and hang its own cluster from Consul. Neat!

Also like Consul, Nomad functions with servers and clients. The servers all talk to each other via the Raft consensus algorithm, and the clients are lightweight daemons that do what the servers tell them to. I'm going to have 3 servers and 2 clients, in the following layout:

Host #  Consul           Nomad
1       Server           Server
2       Server + Client  Client
3       Server + Client  Client
4       Client           Server + Client
5       Client           Server + Client

Just for the record, according to the Nomad documentation it's not recommended that servers also act as clients, but I don't have enough hosts to avoid this yet.

With this in mind, let's install Nomad. Again, as last time, I've packaged Nomad in my apt repository. If you haven't already, go and set it up now. Then, install Nomad like so:

sudo apt install hashicorp-nomad

Also as last time, I've deliberately chosen a different name than the existing nomad package that you'll probably find in your distribution's repositories, to avoid confusion during updates. If you're a systemd user, then I've also got a trio of packages that provide a systemd service file:

Package Name                    Config file location
hashicorp-nomad-systemd-server  /etc/nomad/server.hcl
hashicorp-nomad-systemd-client  /etc/nomad/client.hcl
hashicorp-nomad-systemd-both    /etc/nomad/both.hcl

They all conflict with each other (such that you can only have 1 installed at a time), and the only difference between them is where the configuration file is located.

Install 1 of these (if required) now too with your package manager. If you're not a systemd user, consult your service manager's documentation and write a service definition. If you're willing, comment below and I'll include a note about it here!

Speaking of configuration files, we should write one for Nomad. Let's start off with the bits that will be common across all the config file variants:

bind_addr = "{{ GetInterfaceIP \"wgoverlay\" }}"

# Increase log verbosity
log_level = "INFO"

# Setup data dir
# The data directory used to store state and other persistent data. On client
# machines this is used to house allocation data such as downloaded artifacts
# used by drivers. On server nodes, the data dir is also used to store the
# replicated log.
data_dir = "/srv/nomad"

A few things to note here. log_level is mostly personal preference, so do whatever you like there. I'll probably tune it myself as I get more familiar with how everything works.

data_dir needs to be a path to a private root-owned directory on disk for the Nomad agent to store stuff locally to that node. It should not be shared with other nodes. If you installed one of the systemd packages above, /srv/nomad is created and properly permissed for you.

bind_addr tells Nomad which network interface to send management clustering traffic over. For me, I'm using the WireGuard mesh VPN I setup in part 4 of this series, so I specify wgoverlay here.

Next, let's look at the server config:

# Enable the server
server {
    enabled = true

    # We've got 3 servers in the cluster at the moment
    bootstrap_expect = 3

    # Note that Nomad finds other servers automagically through the consul cluster

    # TODO: Enable this. Before we do we need to figure out how to move this sekret into vault though or something
    # encrypt = "SOME_VALUE_HERE"
}

Not much to see here. Don't forget to change the bootstrap_expect to a different value if you are going to have a different number of servers in your cluster (nodes that are just clients don't count).

Note that this isn't the complete server configuration file - you need to take both this and the above common bit to make the complete server configuration file.
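
As an aside, when the time comes to fill in that encrypt value, Nomad can generate a suitable gossip encryption key for you:

nomad operator keygen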

Now, let's look at the client configuration:

client {
    enabled = true
    # Note that Nomad finds other servers automagically through the consul cluster

    # Just a worker, nothing special going on here
    node_class = "worker"

    # use wgoverlay for network fingerprinting and forwarding
    network_interface = "wgoverlay"

    # Nobody is allowed to run as root - even if you *are* inside a container.....
    # For 1 thing it'll trigger a permission denied when writing to the NFS share
    options = {
        "user.blacklist" = "root"
    }
}

This is more interesting.

network_interface is really important if you're using a WireGuard mesh VPN like wesher that I setup and configured in part 4. By default, Nomad port forwards over all interfaces that make sense, and in this case gets it wrong.

This fixes that by telling it to explicitly port forward containers over the wgoverlay interface. If your network interface has a different name, this is the place to change it. It's a fairly common practice from what I can tell to have both a 'public' and a 'private' network in a cluster environment. The private network is usually trusted, and as such has lots of management traffic running over it. The public network is the one that's locked down that requests come in to from outside.

The "user.blacklist" = "root" here is a precaution that I may end up having to remove in future. It blocks any containers from running on this client from running as root inside the Docker container. This is actually worth remembering, because it's a bit of a security risk. This is a fail-safe to remind myself that it's a Bad Idea.

Apparently there are tactics that can be deployed to avoid running containers as root - even when you might think you need to. In addition, if there's no other way to avoid it, there's apparently a clever user namespace remapping trick one can deploy to stop a container from having root privileges on the host if a process breaks out of its container.

Another thing to note is that NFS shares often don't like you reading or writing files owned by root either, so if you're going to be saving data to a shared NFS partition like me, this is yet another reason to avoid root in your containers.

At this point it's also probably a good idea to talk a little bit about usernames - although we'll talk in more depth about this later. From my current understanding, the usernames inside a container aren't necessarily the same as those outside the container.

Every process runs under a specified username, but each username is backed by a given user id. It's this user id that is translated back into a username on the client machine when reading files from an NFS mount - hence why usernames in NFS shares can be somewhat odd.

Docker containers often have custom usernames created inside the containers for running processes inside the container with specific user ids. More on this later, but I plan to dive into this in the process of making my own Docker container images.
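
You can see this id-to-name translation in action on the NFS mount itself, by comparing the translated usernames with the raw numeric ids that are actually stored:

ls -l /mnt/shared     # Usernames, as translated on this machine
ls -ln /mnt/shared    # The raw user / group ids actually stored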

Anyway, we now have our configuration files for Nomad. For a client node, take the client config and the common config from the top of this section. For a server, take the server and common sections. For a node that's to act as both a client and a server, take all 3 sections.

Now that we've got that sorted, we should be able to start the Nomad agent:

sudo systemctl enable --now nomad.service

This is the same for all nodes in the cluster - regardless as to whether it's a client, a server, or both (this is also the reason that you can't have more than 1 of the systemd apt packages installed at once that I mentioned above).
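
If something doesn't look right, the systemd journal is the first place to check what the Nomad agent is complaining about:

sudo systemctl status nomad.service
sudo journalctl -u nomad.service --since "15 minutes ago"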

If you're using the UFW firewall, then that will need configuring. For me, I'm allowing all traffic on the wgoverlay network interface that's acting as my trusted network:

sudo ufw allow in on wgoverlay

If you'd prefer not to do that, then you can allow only the specific ports through like so:

sudo ufw allow 4646/tcp comment nomad-http
sudo ufw allow 4647/tcp comment nomad-rpc
sudo ufw allow 4648/tcp comment nomad-serf

Note that this allows the traffic on all interfaces - these will need tweaking if you only want to allow the traffic in on a specific interface (which, depending on your setup, is probably a wise decision).
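
For example, something like this should pin those same rules to just the wgoverlay interface instead (untested on my part, but it follows the same pattern as the rules above):

sudo ufw allow in on wgoverlay to any port 4646 proto tcp comment nomad-http
sudo ufw allow in on wgoverlay to any port 4647 proto tcp comment nomad-rpc
sudo ufw allow in on wgoverlay to any port 4648 proto tcp comment nomad-serf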

Anyway, you should now be able to ask the Nomad cluster for its status like so:

nomad node status

...execute this from any server node in the cluster. It should give output like this:

ID        DC   Name         Class   Drain  Eligibility  Status
75188064  dc1  piano        worker  false  eligible     ready
9eb7a7a5  dc1  harpsicord   worker  false  eligible     ready
c0d23e71  dc1  saxophone    worker  false  eligible     ready
a837aaf4  dc1  violin       worker  false  eligible     ready

If you see this, you've successfully configured Nomad. Next, I recommend reading the Nomad tutorial and experimenting with some of the examples. In particular the Getting Started and Deploy and Manage Jobs topics are worth investigating.
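
It's also worth checking that the servers have all found each other and elected a leader:

nomad server members

This lists the server nodes in the cluster, along with which one is currently the leader.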

Conclusion

In this post, we've installed Docker, and installed and configured Nomad. We've also touched briefly on some of the security considerations we need to be aware of when running things in Docker containers - much more on this in the future.

In future posts, we're going to look at setting up shared storage, so that jobs running on Nomad can safely store state and execute on any client / worker node in the cluster while retaining access to said state information.

On the topic of Nomad, we're also going to look at running our first real job: a Docker registry, so that we can push our own custom Docker images to it when we've built them.

You may have noticed that both Nomad and Consul also come with a web interface. We're going to look at these too, but in order to do so we need a special container-aware reverse-proxy to act as a broker between 'cluster-space' (in which everything happens 'somewhere', and we don't really know nor do we particularly care where), and 'normal-network-space' (in which everything happens in clearly defined places).

I've actually been experiencing some issues with this, as I initially wanted to use Traefik for this purpose - but I ran into a number of serious difficulties with reading their (lack of) documentation. After getting thoroughly confused I'm now experimenting with Fabio (git repository) instead, which I'm getting on much better with. It's a shame really, I even got as far as writing the automated packaging script for Traefik - as evidenced by the traefik packages in my apt repository.

Until then though, happy cluster configuration! Feel free to post a comment below.

Found this interesting? Found a mistake? Confused about something? Comment below!

Sources and Further Reading
