
A memory tester for the days of UEFI

For the longest time, memtest86+ was a standard for testing sticks of RAM that one suspects may be faulty. I haven't used it in a while, but when I do use it I find that an OS-independent tool (i.e. one that you boot into instead of your normal operating system) is the most reliable way to identify faults with RAM.

It may surprise you, but I've had this post mostly written up for about 2 years...! I remembered about this post recently, and decided to rework some of it and post it here.

Since UEFI (the Unified Extensible Firmware Interface) replaced the traditional BIOS for booting systems around the world, booting memtest86+ became more challenging, as for a long time it was not compatible with UEFI. It has since been updated to support UEFI though, so I thought I'd write a blog post about it - mainly because guides on booting images like memtest86+ from a multiboot flash drive, like the one I have blogged about before, are few and far between.

Before we begin, worthy of note is memtest86. While it has a very similar name, it is a variant of memtest86+ that is not open source. I have tried it though, and it works well too - brief instructions can be found for it at the end of this blog post.

I will assume that you have already followed my previous guide on setting up a multiboot flash drive. You can find that guide here:

Multi-boot + data + multi-partition = octopus flash drive 2.0?

Alternatively, anywhere you can find a grub config file you can probably follow this guide. I have yet to find an actually decent reference for the grub configuration file language, but if you know of one, please do post it in the comments.

Memtest86+ (the open source one)

Personally, I recommend the open-source Memtest86+. Since the update to version 7.0, it is now compatible with both BIOS and UEFI-based systems without any additional configuration, which is nice. See the above link to one of my previous blog posts if you would like a flash drive that boots both BIOS and UEFI grub at the same time.

To start, visit the official website, and scroll down to the download section. From here, you want to download the "Binary Files (.bin/.efi) For PXE and chainloading" version. Unzip the file you download, and you should see the following files:

memtest32.bin
memtest32.efi
memtest64.bin
memtest64.efi

....discard the files with the .efi file extension - these are for booting directly instead of being chainloaded by grub. As the names suggest, the ones with 64 in the filename are for 64-bit systems, which includes most systems today. Copy these to the device of your choice, and then open up your relevant grub.cfg (or equivalent grub configuration file - on an already-installed system, that's a custom entry file such as /etc/grub.d/40_custom rather than /etc/default/grub, which only holds settings) for editing. Then, somewhere in there add the following:

submenu "Memtest86+" {
    if loadfont unicode ; then
        set gfxmode=1024x768,800x600,auto
        set gfxpayload=800x600,1024x768
        terminal_output gfxterm
    fi

    insmod linux

    menuentry "[amd64] Start Memtest86+, use built-in support for USB keyboards" {
        linux /images/memtest86/memtest64.bin keyboard=both
    }
    menuentry "[amd64] Start Memtest86+, use BIOS legacy emulation for USB keyboards" {
        linux /images/memtest86/memtest64.bin keyboard=legacy
    }
    menuentry "[amd64] Start Memtest86+, disable SMP and memory identification" {
        linux /images/memtest86/memtest64.bin nosmp nosm nobench
    }
}

...replace /images/memtest86/memtest64.bin with the path to the memtest64.bin (or memtest32.bin) file, relative to your grub.cfg file. I forget where I originally found the above config snippet, and unfortunately I can't find it in my history either.

If you are doing this on an installed OS instead of a USB flash drive, then things get a little more complicated. You will need to dig around and find what your version of grub considers paths to be relative to, and put your memtest64.bin file somewhere nearby. If you have experience with this, then please do leave a comment below.
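As a rough sketch of the kind of thing I mean on a Debian/Ubuntu-style install (untested in this exact form, so treat it as a starting point rather than a recipe):

sudo cp memtest64.bin /boot/memtest64.bin

# Anything appended to /etc/grub.d/40_custom ends up in the generated grub.cfg
sudo tee -a /etc/grub.d/40_custom >/dev/null <<'EOF'
menuentry "[amd64] Memtest86+" {
    insmod linux
    # If /boot is a separate partition, use /memtest64.bin here instead
    linux /boot/memtest64.bin keyboard=both
}
EOF

Then run the update command described below for the changes to take effect.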

This should be all you need. If you are using grub on an already-installed OS (i.e. where grub.cfg is regenerated from /etc/default/grub and the scripts in /etc/grub.d/), you will need to run a command for your changes to take effect:

sudo update-grub

Adding shutdown / reboot / reboot-to-firmware-settings options

Another thing I discovered recently is how to add options to my grub menu to reboot, shutdown, and reboot into firmware settings. rEFInd (an alternative bootloader to grub that I like very much, but I haven't yet explored for booting multiple ISOs on a flash drive) has these in its menus by default, but grub doesn't - so since I discovered how to do it recently I thought I'd include the config here for reference.

Simply add the following somewhere in your grub configuration file:

menuentry "Reboot" {
    reboot
}

menuentry "Shut Down" {
    halt
}

menuentry "UEFI Firmware / BIOS Settings" {
    fwsetup
}

Bonus: Memtest86 (non open-source)

I followed this guide (https://www.yosoygames.com.ar/wp/2020/03/installing-memtest86-on-uefi-grub2-ubuntu/), but ended up changing a few things, so I'll outline the process here. Again, I'll assume you already have a multiboot flash drive.

Firstly, download memtest86-usb.zip and extract the contents. Then, find the memtest86-usb.img file and find the offset of the partition that contains the actual EFI image that is the memtest86 program:


fdisk -lu memtest86-usb.img

Disk memtest86-usb.img: 500 MiB, 524288000 bytes, 1024000 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 68264C0F-858A-49F0-B692-195B64BE4DD7

Device              Start     End Sectors  Size Type
memtest86-usb.img1   2048  512000  509953  249M Microsoft basic data
memtest86-usb.img2 514048 1023966  509919  249M EFI System

Then, take the start position of the second partition (the EFI System partition - the last line of the output above), and multiply it by 512, the sector size. In my case, the number is 263192576.
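If you'd rather let the shell do the multiplication for you:

echo $((514048 * 512))
# prints 263192576

Then, mount the partition into a directory you have already created: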

sudo mount -o loop,offset=263192576 memtest86-usb.img /absolute/path/to/dir

Then, browse the contents of the mounted partition and copy the EFI/BOOT directory off to your flash drive, and rename it to memtest86 or something.
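Concretely, that might look something like this - the flash drive path here is just an example, so adjust it (and the mount point) to match your setup:

cp -r /absolute/path/to/dir/EFI/BOOT /path/to/flashdrive/images/memtest
sudo umount /absolute/path/to/dir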

Now, update your grub.cfg and add the following:

menuentry "memtest86" {
    chainloader /images/memtest/BOOTX64.efi
}

....replacing /images/memtest/BOOTX64.efi with the path to the BOOTX64.efi file that should be directly in the BOOT directory you copied off.

Finally, you should be able to try it out! Boot into your multiboot flash drive as normal, and then select the memtest86 option from the grub menu.

Extra note: booting from hard drives

This post is really turning into a random grab-bag of items in my grub config file, isn't it? Anyway, an option I don't use all that often (but which is very useful when I do need it) is booting from the different hard drives in a machine. Since you can't get grub to figure out how many there are in advance, you have to statically define them ahead of time:



submenu "Boot from Hard Drive" {
    menuentry "Hard Drive 0" {
        set root=(hd0)
        chainloader +1
    }
    menuentry "Hard Drive 1" {
        set root=(hd1)
        chainloader +1
    }
    menuentry "Hard Drive 2" {
        set root=(hd2)
        chainloader +1
    }
    menuentry "Hard Drive 3" {
        set root=(hd3)
        chainloader +1
    }
}

....chainloading (aka calling another bootloader) is a wonderful thing :P

Of course, expand this as much as you like. I believe this approach also works with specific partitions with the syntax (hd0,X), where X is the partition number - starting from 1 in GRUB 2 (GRUB legacy counted partitions from 0).

Again, add to your grub.cfg file and update as above.

Conclusion

This post is more chaotic and disorganised than I expected, but I thought it would be useful to document some of the tweaks I've made to my multiboot flash drive setup over the years - something that has more than proven its worth many times over since I first set it up.

We've added a memory (RAM) tester to our setup, using the open-source Memtest86+, and the alternative non-open-source version. We've also added options to reboot, shutdown, and enter the bios/uefi firmware settings.

Finally, we took a quick look at adding options to boot from different hard drives and partitions. If anyone knows how to add a menu item that could allow one to distinguish between different hard disks, partitions, their sizes, and their content more easily, please do leave a comment below.

Sources and further reading

Building the science festival demo: How to monkeypatch an npm package

A pink background dotted with bananas, with the patch-package logo front and centre, and the npm logo small in the top-left. Small brown package boxes are present in the bottom 2 corners.

In a previous post, I talked about the nuts and bolts of the demo on a technical level, and how it's all put together. I alluded to the fact that I had to monkeypatch Babylon.js to disable the gamepad support because it was horribly broken, and I wanted to dedicate an entire post to the subject.

Partly because it's a clever hack I used, and partly because if I ever need to do something similar again I want a dedicated tutorial-style post on how I did it so I can repeat the process.

Monkeypatching an npm package after installation in a reliable way is an inherently fragile task: it is not something you want to do if you can avoid it. In some cases though, it's unavoidable:

  1. If you're short on time, and need something to work
  2. If you are going to submit a pull request to fix something now, but need an interim workaround until your pull request is accepted upstream
  3. If upstream doesn't want to fix the problem, and you're forced to either maintain a patch or fork upstream into a new project, which is a lot more work.

We'll assume that one of these 3 cases is true.

In the game Factorio, there's a saying 'there's a mod for that' that is often repeated in response to questions in discourse about the game. The same is true of Javascript: If you need to do a non-trivial thing, there's usually an npm package that does it that you can lean on instead of reinventing the wheel.

In this case, that package is called patch-package. patch-package is a lovely little tool that enables you to do 2 related things:

a) Generate patch files simply by editing a given npm package in-situ
b) Automatically and transparently apply generated patch files on npm install, requiring no additional setup steps should you clone your project down from its repository and run npm install.

Assuming you have a working setup with the target npm package you want to patch already installed, first install patch-package:

npm install --save patch-package

Note: We don't --save-dev here, because patch-package needs to run any time the target package is installed... not just in your development environment - unless the target package to patch is also a development dependency.

Next, delve into node_modules/ and directly edit the files associated with the target package you want to edit.

Sometimes, projects will ship multiple npm packages, with one containing the pre-minified build distribution, and the other distributing the raw source - e.g. if you have your own build system like esbuild and want to tree-shake it.

This is certainly the case for Babylon.js, so I had to switch from the main babylonjs package to @babylonjs/core, which contains the source. Unfortunately, the official documentation for Babylon.js is rather inconsistent, which can lead to confusion when using the latter - but once I figured out how the imports worked it all came out in the wash.

Once done, generate the patch file for the target package like so:

npx patch-package your-package-name-here

This should create a patch file in the directory patches/ alongside your package.json file.

The final step is to enable automatic and transparent application of the new patch file on package installation. To do this, open up your package.json for editing, and add the following to the scripts object:

"scripts": {
    "postinstall": "patch-package"
}

...so a complete example might look a bit like this:

{
    "name": "research-smflooding-vis",
    "version": "1.0.0",
    "description": "Visualisations of the main smflooding research for outreach purposes",
    "main": "src/index.mjs",

    // ....

    "scripts": {
        "postinstall": "patch-package",
        "test": "echo \"No tests have been written yet.\"",
        "build": "node src/esbuild.mjs",
        "watch": "ESBUILD_WATCH=yes node src/esbuild.mjs"
    },

    // ......

    "dependencies": {
        // .....
    }
}

That's really all you need to do!
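If you want to double-check that the patch really is applied transparently, you can wipe node_modules and reinstall - the postinstall script should kick in, and patch-package will report the patches it applies:

rm -rf node_modules
npm install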

After you've applied the patch like this, don't forget to commit your changes to your git/mercurial/whatever repository.

I would also advise being a bit careful when installing updates to any packages you've patched in future, in case the patch no longer applies cleanly - though of course installing dependency updates is vitally important to keep your code up to date and secure.

As a rule of thumb, I recommend actively working to minimise the number of patches you apply to packages, and only use this method as a last resort.

That's all for this post. In future posts, I want to look more at the AI theory behind the demo, its implications, and what it could mean for research in the field in the future (is there even a kind of paper one writes about things one learns from outreach activities that accidentally have a bearing on my actual research? And would it even be worth writing something formal? A question for my supervisor and commenters on that blog post when it comes out, I think).

See you in the next post!

(Background to post banner: Unsplash)

How to pin an apt repository for preferential package installation

As described in my last post, pinning apt repositories is now necessary if you want to install Firefox from an apt repository (e.g. if you want to install Firefox Beta). This is not an especially difficult process, but it is significantly confusing, so I thought I'd write a post about it.

Pinning an apt repository means that even if there's a newer version of a package elsewhere, the 'older' version will still be installed from the apt repository you pin.

Be very careful with this technique. You can easily cause major issues with your system if you pin the wrong repository!

Firstly, you want to head to /etc/apt/sources.list.d/ and find the .list file for the repository you want to pin. Take note of the URL inside that file, and then run this command:

apt-cache policy

No root is necessary here, as it's still a read-only command. Depending on how many apt repositories you have installed in your system, there may be a significant amount of output. Find the lines that correspond to the apt repository you want to preferentially install from in this output. For this example, I'm going to pin the excellent nautilus-typeahead apt repository, so the bit I'm looking for looks like this:

999 http://ppa.launchpad.net/lubomir-brindza/nautilus-typeahead/ubuntu jammy/main amd64 Packages
    release v=22.04,o=LP-PPA-lubomir-brindza-nautilus-typeahead,a=jammy,n=jammy,l=nautilus-typeahead,c=main,b=amd64
    origin ppa.launchpad.net

From here, take a note of the o= bit. In my case, it's o=LP-PPA-lubomir-brindza-nautilus-typeahead. Then, create a new file in /etc/apt/preferences.d with the following content:

Package: *
Pin: release o=LP-PPA-lubomir-brindza-nautilus-typeahead
Pin-Priority: 1001

See that o=.... bit there? Replace it with the one for the repository you want to pin. The number is the new priority of the repository. The numbers at the beginning of each line in the output of the apt-cache policy command are the priorities of your existing apt repositories, so this should give you an idea as to what number you need to use here - a higher number means a higher priority, regardless of the version numbers of the packages contained therein. Note that a priority above 1000 (hence the 1001 above) also allows apt to downgrade a package to the pinned repository's version if necessary.

Then, simply sudo apt update and sudo apt dist-upgrade, and apt should pick up the "upgrades" from your newly pinned repository! In some situations you may need to remove and reinstall the offending package if you encounter issues.
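To double-check which version apt now prefers (and from where), ask apt-cache policy about a specific package - nautilus here is just an example, as that's the package the above PPA patches:

apt-cache policy nautilus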

Sources and further reading

Autoplant, Part 1: Overview

At a recent plant sale at my University, I bought myself both a parlour palm and an areca palm for my desk. They look lovely, but I always worry that I'm going to forget to water them.

The palms on my desk, with a blurred background

Having some experience with arduino already, I decided to resolve the issue by implementing an arduino-based system to monitor my new plants and log the resulting data to the existing collectd-based monitoring system I use for the various servers and systems that I manage (see also CGP, which I use as a frontend - sadly it's abandonware, so any suggestions for alternatives are greatly appreciated).

The eventual aim is to completely automate the watering process too - but before I can do that I need to first get the monitoring up and running so that I can calibrate the sensors, so this is what I'll be focusing on in this post.

Circuit design

To do this, I've used a bunch of parts I have lying around (plus a few new ones), and wired up a NodeMCU v0.9 to some sensors:

The circuit I've designed. See below for explanation.

(Above: The circuit I've wired up. See the Fritzing file.)

A full list of parts can be found at the end of this post, along with links to where I got them from. The sensors I'm using are:

  • 2 x Capacitive soil moisture sensors
  • 2 x Liquid level sensors
  • 1 x BME280 that I had lying around and thought "why not?"

Both the soil sensors and the liquid level sensors give out an analogue signal (shown in orange), but unfortunately the NodeMCU v0.9 (based on the ESP8266) only has a single analogue pin, so I bought myself a CD4051 from switch electronics (link at the bottom of this post) to act as a multiplexer. Given 3 digital wires to act as a channel selector (shown in purple), it allows you to select between 8 different analogue channels - perfect for my use-case. You can see it in the diagram at the left-hand side of the larger breadboard.

While the Fritzing design for a USB breakout board isn't exactly the same as the ones I have, I plan on using them to transport power, ground, and the 2 analogue outputs from the 2 sensors for the plant on the other side of my desk.

The other component here is a BME280 I have lying around to monitor temperature, humidity, and air pressure. This isn't required, but since I have one lying around anyway I thought it was a good idea to use it.

Networking

With the circuit designed and implemented (still got to finalise the USB breakout boards), the next thing to organise is the transport and logging for the data generated. MQTT is easy to use on the Arduino because of PubSubClient, so that's what I decided on using.

Time for another diagram!

A brief overview of the networking setup I'll be going for with the backend. Subject to change.

(Can't see the above? Try a PNG instead.)

If you've followed my blog here for a while, you might remember that I have a cluster that's powered by a bunch of Raspberry Pis running Hashicorp Nomad and Consul.

On this cluster - amongst many other things - I have an MQTT server (check out my earlier post on how to set one up) running inside a Docker container as a Nomad task. Connections to my Mosquitto MQTT server are reverse-proxied by Fabio, which exposes 2 ports by which one can connect to the MQTT server:

  • TCP port 1883: Unencrypted MQTT
  • TCP port 8883: (TLS encrypted) MQTTS

In doing so, Fabio terminates TLS so that Mosquitto doesn't need to have access to my TLS certificates. For those interested, the proxy.addr in your fabio.properties file is actually a comma-separated list, and to achieve TLS termination for MQTTS you can add something like this to proxy.addr (make sure to use an up-to-date cipher list):

:1883;proto=tcp,:8883;proto=tcp;cs=mooncarrot;tlsmin=tls12;tlsciphers="TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305"

Then if we continue working backwards, the next piece of the puzzle is port forwarding on my router. While Fabio exposes both MQTT and MQTTS, I only port-forward MQTTS and not unencrypted MQTT. My autoplant system will only be communicating over MQTTS, so it doesn't make sense to expose the less secure unencrypted port to the world too.

From here, no matter where in the world my autoplant monitoring system is located it will still be able to send sensor values back to be logged.

Working forwards again from the Mosquitto MQTT server, the Arduino is going to send MQTT messages to the sensors/data topic with messages that look something like this (but minified):

{
    "id": "plant-a",
    "sensor": "soil",
    "value": 2.41
}

The MQTT plugin for Collectd doesn't support JSON input though, so I'm going to write a broker that translates messages from the JSON format above to the format that Collectd likes. My reasoning for this is twofold:

  1. I also want to log data to a tab-separated-values file for later long-term analysis
  2. The collectd format is somewhat complicated and has the potential to mess up other parts of my monitoring system, so I want to do some extra validation on incoming data from remote sensors

I haven't fully decided on which language I'm going to write this validating broker in, but I'm thinking it might end up being a shell script. Weird as it sounds, I found a cool MQTT library for Bash called bish-bosh which I want to try out. My second choice here would be Rust, but I unfortunately can't find a (preferably pure-rust) MQTT(S) library, which I'm finding rather strange.

Either way, if possible I'm going to package up the completed implementation of this broker into a Docker container and write a quick Hashicorp Nomad job file to get it running on my cluster, so that it benefits from the redundancy of my Nomad cluster. Then, with collectd listening on another topic, it can transparently bridge the 2. I'm not quite sure how collectd's MQTT plugin actually works though, so this shell script may end up being a child process of my main collectd server using the exec plugin instead.
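To illustrate the idea, here's a minimal sketch of what such a broker could look like as a child process of collectd's exec plugin. I haven't written the real thing yet - it assumes mosquitto_sub and jq are available, and the MQTT host, identifiers, and file paths in it are all placeholders:

#!/usr/bin/env bash
# Sketch only: authentication, TLS flags, and proper error handling omitted.
hostname="${COLLECTD_HOSTNAME:-autoplant}";  # set by collectd's exec plugin
interval="${COLLECTD_INTERVAL:-60}";         # ditto
tsv_file="/var/lib/autoplant/data.tsv";      # long-term TSV log

mosquitto_sub -h mqtt.example.com -p 8883 -t "sensors/data" | while read -r line; do
    # Validate: skip anything that isn't valid JSON
    echo "${line}" | jq -e . >/dev/null 2>&1 || continue;

    id="$(echo "${line}" | jq -r '.id')";
    sensor="$(echo "${line}" | jq -r '.sensor')";
    value="$(echo "${line}" | jq -r '.value')";

    # collectd's exec plugin reads PUTVAL lines like this from stdout
    echo "PUTVAL \"${hostname}/autoplant-${id}/gauge-${sensor}\" interval=${interval} N:${value}";

    # ...and keep a copy in the TSV file for later analysis
    printf '%s\t%s\t%s\t%s\n' "$(date -u +%s)" "${id}" "${sensor}" "${value}" >>"${tsv_file}";
done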

Conclusion

In this post, I've outlined my solution to a seemingly simple problem of watering plants automatically (because all simple problems need complex solutions :P). With an arduino-based system, I'm going to send messages via MQTTS containing sensor data, which will then be passed to a backend for processing. In future posts in this series, I want to (in no particular order):

  • Go through the code for the arduino I've written
  • Look at the code I have yet to write for the translation broker
  • Explore the 3d printed parts I have yet to design for various purposes
  • Show off the final completed circuit in action
  • Look at some initial statistics collected by the sensors
  • Start to play with pumping water around and different holding containers / designs, etc

Not all of these warrant a separate post, but there are definitely multiple posts to come on this project :D

Full parts list

  • 1 x NodeMCU v0.9 (anything with WiFi will do, I just had an old one lying around)
  • 1 x Half size breadboard
  • 1 x Mini breadboard (the smallest I had)
  • 2 x Capacitive soil moisture sensor (source, significantly cheaper if you're willing to wait ages)
  • 2 x Waveshare liquid level sensor (source, original website)
  • 1 x CD4051 8:1 analogue multiplexer (source - breadboard compatible)
  • 2 x USB type a breakouts
  • 1 x BME 280 (I got mine cheap on AliExpress)
  • Lots of jumper wires

MIDI to Music Box score converter

Keeping with the theme of things I've forgotten to blog about (every time I think I've mentioned everything, I remember something else), in this post I'm going to talk about yet another project of mine that's been happily sitting around (and I've nearly blogged about but then forgot on at least 3 separate occasions): A MIDI file to music box conversion script. A while ago, I bought one of these customisable music boxes:

(Above: My music box (centre), and the associated hole punching device it came with (top left).)

The basic principle is that you punch holes in a piece of card, and then you feed it into the music box to make it play the notes you've punched into the card. Different variants exist with different numbers of notes - mine has a staggering 30 different notes it can play!

It has a few problems though:

  1. The note names on the card are wrong
  2. Figuring out which one is which is a problem
  3. Once you've punched a hole, it's permanent
  4. Punching the holes is fiddly

Solving problem #4 might take a bit more work (a future project out-of-scope of this post), but the other 3 are definitely solvable.

MIDI is a protocol (and a file format) for storing notes from a variety of different instruments (this is just the tip of the iceberg, but it's all that's needed for this post), and it just so happens that there's a C♯ library on NuGet called DryWetMIDI that can read MIDI files in and allow one to iterate over the notes inside.

By using this and a homebrew SVG writer class (another blog post topic for another time once I've tidied it up), I implemented a command-line program I've called MusicBoxConverter (the name needs work - comment below if you can think of a better one) that converts MIDI files to an SVG file, which can then be printed (carefully - full instructions on the project's page - see below).

Before I continue, it's perhaps best to give an example of my (command line) program's output:

Example output

(Above: A lovely little tune from the TV classic The Clangers, converted for a 30 note music box by my MusicBoxConverter program. I do not own the song!)

The output from the program can (once printed to the correct size) then be cut out and pinned onto a piece of card for hole punching. I find paper clips work for this, but in future I might build myself an Arduino-powered device to punch holes for me.

The SVG generated has a bunch of useful features that makes reading it easy:

  • Each note is marked by a blue circle with a red plus sign to help with accuracy (the hole puncher that came with mine has lines on it that make a plus shape over the viewing hole)
  • Each hole is numbered to aid with getting it the right way around
  • Big arrows in the background ensure you don't get it backwards
  • The orange bar indicates the beginning
  • The blue circle should always be at the bottom and the red circle at the top, so you don't get it upside down
  • The green lines should be lined up with the lines on the card

While it's still a bit tedious to punch the holes, it's much easier to adapt a score in Musescore, export it to MIDI, and push it through MusicBoxConverter than to do lots of trial-and-error punching directly unaided.

I won't explain how to use the program in this blog post in any detail, since it might change. However, the project's README explains in detail how to install and use it:

https://git.starbeamrainbowlabs.com/sbrl/MusicBoxConverter

Something worth a particular mention here is the printing process for the generated SVGs. It's somewhat complicated, but this is unfortunately necessary in order to avoid rescaling the SVG during the conversion process - otherwise it doesn't come out the right size to be compatible with the music box.

A number of improvements stand out to me that I could make to this project. While it only supports 30 note music boxes at the moment, it could easily be extended to support other types of music box (I just don't have them to hand).

Implementing a GUI is another possible improvement but would take a lot of work, and might be best served as a different project that uses my MusicBoxConverter CLI under the hood.

Finally, as I mentioned above, in a future project (once I have a 3D printer) I want to investigate building an automated hole punching machine to make punching the holes much easier.

If you make something cool with this program, please comment below! I'd love to hear from you (and I may feature you in the README of the project too).

Sources and further reading

  • Musescore - free and open source music notation program with a MIDI export function
  • MusicBoxConverter (name suggestions wanted in the comments below)

Automatically downloading emails and extracting their attachments

I have an all-in-one printer that's also a scanner - specifically the Epson Ecotank 4750 (though annoyingly the automated document feeder doesn't support duplex). While it's a great printer (very eco-friendly, and the inks last for ages!), my biggest frustration with it is that it doesn't scan directly to an SMB file share (i.e. a Windows file share). It does support SANE though, which allows you to use it through a computer.

This is ok, but the ability to scan directly from the device itself without needing to use a computer was very convenient, so I set out to remedy this. The printer does have a cloud feature they call "Epson Connect", which allows one to upload to various cloud services such as Google Drive and Box, but I don't want to upload potentially sensitive data to such services.

Fortunately, there's a solution at hand - email! The printer in question also supports scanning to an email address. Once the scanning process is complete, it sends an email to the preconfigured email address with the scanned page(s) attached. It's been far too long since my last post about email too, so let's do something about that.

Logging in to my email account just to pick up a scan is clunky and annoying though, so I decided to automate the process to resolve the issue. The plan is as follows:

  1. Obtain a fresh email address
  2. Use IMAP IDLE to instantly download emails
  3. Extract attachments and save them to the output directory
  4. Discard the email - both locally and remotely

As some readers may be aware, I run my own email server - hence my previous post about email - so I reconfigured it to add a new email address. Many other free providers exist out there too - just make sure you don't use an account you might want to use for anything else, since our script will eat any emails sent to it.

Steps 2, 3, and 4 there took some research and fiddling about, but in the end I cooked up a shell script solution that uses fetchmail, procmail (which is apparently unmaintained, so I should consider looking for alternatives), inotifywait, and munpack. I've also packaged it into a Docker container, which I'll talk about later in this post.

To illustrate how all of these fit together, let's use a diagram:

A diagram showing how the whole process fits together - explanation below.

fetchmail uses IMAP IDLE to hold a connection open to the email server. When it receives notification of a new email, it instantly downloads it and spawns a new instance of procmail to handle it.

procmail writes the email to a temporary directory structure, which a separate script is watching with inotifywait. As soon as procmail finishes writing the new email to disk, inotifywait triggers and the email is unpacked with munpack. Any attachments found are moved to the output directory, and the original email discarded.

With this in mind, let's start drafting up a script. The first order of the day is configuring fetchmail. This is done using a .fetchmailrc file - I came up with this:

poll bobsrockets.com protocol IMAP port 993
    user "[email protected]" with pass "PASSWORD_HERE"
    idle
    ssl

...where user@bobsrockets.com is the email address you want to watch, bobsrockets.com is the domain part of said email address (everything after the @), and PASSWORD_HERE is the password required to login.

Save this somewhere safe with tight file permissions for later.

The other configuration file we'll need is one for procmail. Let's do that one now:

CORRECTHOME=/tmp/maildir
MAILDIR=$CORRECTHOME/

:0
Mail/

Replace /tmp/maildir with the temporary directory you want to use to hold emails in. Save this as procmail.conf for later too.

Now we have the mail config files written, we need to install some software. I'm using apt on Debian (a minideb Docker container actually), so you'll need to adapt this for your own system if required.

sudo apt install ca-certificates fetchmail procmail inotify-tools mpack
# or, if you're using minideb:
install_packages ca-certificates fetchmail procmail inotify-tools mpack

fetchmail is for some strange reason extremely picky about the user account it runs under, so let's update the pre-created fetchmail user account to make it happy:

groupadd --gid 10000 fetchmail
usermod --uid 10000 --gid 10000 --home /srv/fetchmail fetchmail
chown fetchmail:fetchmail /srv/fetchmail

fetchmail now needs that config file we created earlier. Let's fix up its ownership and permissions (fetchmail refuses to run if the file is readable by anyone else):

chown 10000:10000 path/to/.fetchmailrc
chmod 0600 path/to/.fetchmailrc

If you're running on bare metal, move it to the /srv/fetchmail directory now. If you're using Docker, keep reading, as I recommend that this file is mounted using a Docker volume to make the resulting container image more reusable.

Now let's start drafting a shell script to pull everything together. Let's start with some initial setup:

#!/usr/bin/env bash

if [[ -z "${TARGET_UID}" ]]; then
    echo "Error: The TARGET_UID environment variable was not specified.";
    exit 1;
fi
if [[ -z "${TARGET_GID}" ]]; then
    echo "Error: The TARGET_GID environment variable was not specified.";
    exit 1;
fi
if [[ "${EUID}" -ne 0 ]]; then
    echo "Error: This Docker container must run as root because fetchmail is a pain, and to allow customisation of the target UID/GID (although all possible actions are run as non-root users)";
    exit 1;
fi

dir_mail_root="/tmp/maildir";
dir_newmail="${dir_mail_root}/Mail/new";
target_dir="/mnt/output";

fetchmail_uid="$(id -u "fetchmail")";
fetchmail_gid="$(id -g "fetchmail")";

temp_dir="$(mktemp --tmpdir -d "imap-download-XXXXXXX")";
on_exit() {
    rm -rf "${temp_dir}";
}
trap on_exit EXIT;

log_msg() {
    echo "$(date -u +"%Y-%m-%d %H:%M:%S") imap-download: $*";
}

This script will run as root, and fetchmail runs as UID 10000 and GID 10000. The reasons for this are complicated (and mostly have to do with my weird network setup). We look for the TARGET_UID and TARGET_GID environment variables, as these define the uid:gid we'll be setting files to before writing them to the output directory.

We also determine the fetchmail UID/GID dynamically here, and create a second temporary directory to work with too (the reasons for which will become apparent).

Before we continue, we need to create the directory procmail writes new emails to. Not because procmail won't create it on its own (because it will), but because we need it to exist up-front so we can watch it with inotifywait:

mkdir -p "${dir_newmail}";
chown -R "${fetchmail_uid}:${fetchmail_gid}" "${dir_mail_root}";

We're running as root, but we'll want to spawn fetchmail (and other things) as non-root users. Technically, I don't think you're supposed to use sudo in non-interactive scripts, and it's also not present in my Docker container image. The alternative is the setpriv command, but using it is rather complicated and annoying.

It's more powerful than sudo, as it allows you to specify not only the UID/GID a process runs as, but also the capabilities the process will have too (e.g. binding to low port numbers). There's a nasty bug one has to work around if one is using Docker too, so given all this I've written a wrapper function that abstracts all of this complexity away:

# Runs a process as another user.
# Ref https://github.com/SinusBot/docker/pull/40
# $1    The UID to run the process as.
# $2    The GID to run the process as.
# $3-*  The command (including arguments) to run
run_as_user() {
    run_as_uid="${1}"; shift;
    run_as_gid="${1}"; shift;
    if [[ -z "${run_as_uid}" ]]; then
        echo "run_as_user: No target UID specified.";
        return 1;
    fi
    if [[ -z "${run_as_gid}" ]]; then
        echo "run_as_user: No target GID specified.";
        return 2;
    fi

    # Ref https://github.com/SinusBot/docker/pull/40
    # WORKAROUND for `setpriv: libcap-ng is too old for "all" caps`, previously "-all" was used here
    # create a list to drop all capabilities supported by current kernel
    cap_prefix="-cap_";
    caps="$cap_prefix$(seq -s ",$cap_prefix" 0 "$(cat /proc/sys/kernel/cap_last_cap)")";

    setpriv --inh-caps="${caps}" --reuid "${run_as_uid}" --clear-groups --regid "${run_as_gid}" "$@";
    return "$?";
}

With this in hand, we can now wrap fetchmail and procmail in a function too:

do_fetchmail() {
    log_msg "Starting fetchmail";

    while :; do
        run_as_user "${fetchmail_uid}" "${fetchmail_gid}" fetchmail --mda "/usr/bin/procmail -m /srv/procmail.conf";

        exit_code="$?";
        if [[ "$exit_code" -eq 127 ]]; then
            log_msg "setpriv failed, exiting with code 127";
            exit 127;
        fi 

        log_msg "Fetchmail exited with code ${exit_code}, sleeping 60 seconds";
        sleep 60
    done
}

In short this spawns fetchmail as the fetchmail user we configured above, and also restarts it if it dies. If setpriv fails, it returns an exit code of 127 - so we catch that and don't bother trying again, as the issue likely needs manual intervention.

To finish the script, we now need to setup that inotifywait loop I mentioned earlier. Let's setup a shell function for that:


do_attachments() {
    while :; do # : = infinite loop
        # Wait for an update
        # inotifywait's non-0 exit code forces an exit for some reason :-/
        inotifywait -qr --event create --format '%:e %f' "${dir_newmail}";

        # Process new email here
    done
}

Processing new emails is not particularly difficult, but requires a sub loop because:

  • More than 1 email could be written at a time
  • Additional emails could slip through when we're processing the last one
The sub loop itself looks like this:

while read -r filename; do

    # Process each email

done < <(find "${dir_newmail}" -type f);

Finally, we need to process each email we find in turn. Let's outline the steps we need to take:

  1. Move the email to that second temporary directory we created above (since the procmail directory might not be empty)
  2. Unpack the attachments
  3. chown the attachments to the target UID/GID and move them to the output directory

Let's do this in chunks. First, let's move it to the temporary directory:

log_msg "Processing email ${filename}";

# Move the email to a temporary directory for processing
mv "${filename}" "${temp_dir}";

The filename variable there holds the absolute path to the email in question, since we used find and passed it an absolute directory to list the contents of (as opposed to a relative path).

To find the filepath we moved it to, we need to do this:

filepath_temp="${temp_dir}/$(basename "${filename}")"

This is important for the next step, where we unpack it:

# Unpack the attachments
munpack -C "${temp_dir}" "${filepath_temp}";

Now that we've unpacked it, let's do a bit of cleaning up, by deleting the original email file and the .desc description files that munpack also generates:

# Delete the original email file and any description files
rm "${filepath_temp}";
find "${temp_dir}" -iname '*.desc' -delete;

Great! Now we have the attachments sorted, now all we need to do is chown them to the target UID/GID and move them to the right place.

chown -R "${TARGET_UID}:${TARGET_GID}" "${temp_dir}";
chmod -R a=rX,ug+w "${temp_dir}";

I also chmod the temporary directory too to make sure that the permissions are correct, because otherwise the mv command is unable to read the directory's contents.

Now to actually move all the attachments:

# Move the attachment files to the output directory
while read -r attachment; do
    log_msg "Extracted attachment ${attachment}";
    chmod 0775 "${attachment}";
    run_as_user "${TARGET_UID}" "${TARGET_GID}" mv "${attachment}" "${target_dir}";
done < <(find "${temp_dir}" -type f);

This is rather overcomplicated because of an older design, but it does the job just fine.

With that done, we've finished the script. I'll include the whole script at the bottom of this post.

Dockerification

If you're running on bare metal, then you can skip to the end of this post. Because I have a cluster, I want to be able to run this thereon. Since said cluster works with Docker containers, it's natural to Dockerise this process.

The Dockerfile for all this is surprisingly concise:

(Can't see the above? View it on my personal Git server instead)

To use this, you'll need the following files alongside it:

It exposes the following Docker volumes:

  • /mnt/fetchmailrc: The fetchmailrc file
  • /mnt/output: The target output directory

All these files can be found in this directory on my personal Git server.

Conclusion

We've strung together a bunch of different programs to automatically download emails and extract their attachments. This is very useful for ingesting all sorts of different files. Things I haven't covered:

  • Restricting it to certain source email addresses to handle spam
  • Restricting the file types accepted (the file command is probably your friend)
  • Disallowing large files (most 3rd party email servers do this automatically, but in my case I don't have a limit that I know of other than my hard disk space)
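On the second of those points, here's a hedged sketch of the sort of check I mean - file and its --mime-type option are real, but the accepted types are just examples, and you'd slot this in just before the attachments are moved:

# Keep only PDFs and images; discard everything else
while read -r attachment; do
    case "$(file --brief --mime-type "${attachment}")" in
        application/pdf|image/*) ;;  # keep - it'll be moved as normal
        *) rm "${attachment}" ;;     # discard anything else
    esac
done < <(find "${temp_dir}" -type f);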

As always, this blog post is both a reference for my own use and a starting point for you if you'd like to do this for yourself.

If you've found this useful, please comment below! I find it really inspiring / motivating to learn how people have found my posts useful and what for.

Sources and further reading

run.sh script

(Can't see the above? Try this link, or alternatively this one (bash))

A much easier way to install custom versions of Python

Recently, I wrote a rather extensive blog post about compiling Python from source: Installing Python, Keras, and Tensorflow from source.

Since then, I've learnt of multiple other ways to achieve that goal which, as it turns out, are much easier.

For context, the reason I needed a specific version of Python in the first place is that Viper, my University's High-Performance Computer (HPC), doesn't have a version of Python new enough to run the latest version of Tensorflow.

Using miniconda

After contacting the Viper team at the suggestion of my supervisor, I discovered that they already had a mechanism in place for specifying which version of Python to use. It seems obvious in hindsight - since they are sure to have been asked about this before, they already had a solution in the form of miniconda.

If you're lucky enough to have access to Viper, then you can load miniconda like so:

module load python/anaconda/4.6/miniconda/3.7

If you don't have access to Viper, then worry not. I've got other methods in store which might be better suited to your environment in later sections.

Once loaded, you can specify a version of Python like so:

conda create -n py python=3.8

The -n py specifies the name of the environment you'd like to create, and can be anything you like. Using the name of the project you're working on would be a good idea. The python=3.8 is the version of Python you want to use. You can list the versions of Python available like so:

conda search -f python

Then, to activate the new environment, do this:

conda init bash
conda activate py
exec bash

Replace py with the name of the environment you created above.

Now, you should have the specific version of Python you wanted installed and ready to use.

Edit 2022-03-30: Added conda install pip step, as some systems don't natively have pip by default which causes issues.

The last thing we need to do here is to install pip inside the virtual conda environment. Do that like so:

conda install pip

You can also install packages with pip, and it should all come out in the wash.
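For example (the package names here are just placeholders - install whatever your project actually needs):

pip install numpy pandas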

For Viper users, further information about miniconda can be found here: Applications/Miniconda

Gentoo Project Prefix

Another option I've been made aware of is Gentoo's Project Prefix. Essentially, it installs Gentoo (a distribution of Linux) inside a directory without root privileges. It doesn't work very well on Ubuntu, however, due to this bug - but it should work on other systems.

They provide a bootstrap script that you can run that helps you bootstrap the system. It asks you a few questions, and then gets to work compiling everything required (since Gentoo is a distribution that compiles everything from source).

If you have multiple versions of gcc available, try telling it about a slightly older version of GCC if it fails to install.

If you can get it to install, a Gentoo Prefix install allows you to install whatever software you like!

pyenv

The last solution to the problem I'm aware of is pyenv. It automates the process of downloading and compiling specified versions of Python, and also updates your shell configuration automatically. It does require some additional dependencies to be installed though, which could be somewhat awkward if you don't have sudo access to your system. I haven't actually tried it myself, but it may be worth looking into if the other 2 options don't work for you.
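For reference, usage reportedly looks something like this - I haven't tested it myself, and the version number is only an example:

pyenv install 3.8.12    # download and compile this version of Python
pyenv local 3.8.12      # use it whenever you're inside the current directory
python --version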

Conclusion

There's always more than 1 way to do something, and it's always worth asking if there's a better way if the way you're currently using seems hugely complicated.

Installing Python, Keras, and Tensorflow from source

I found myself in the interesting position recently of needing to compile Python from source. The reasoning behind this is complicated, but it boils down to a need to use Python with Tensorflow / Keras for some natural language processing AI, as Tensorflow.js isn't going to cut it for the next stage of my PhD.

The target upon which I'm aiming to be running things currently is Viper, my University's high-performance computer (HPC). Unfortunately, the version of Python on said HPC is rather old, which necessitated obtaining a later version. Since I obviously don't have sudo permissions on Viper, I couldn't use the default system package manager. Incredibly, pre-compiled Python binaries are not distributed for Linux either, which meant that I ended up compiling from source.

I am going to be assuming that you have a directory at $HOME/software in which we will be working. In there, there should be a number of subdirectories:

  • bin: For binaries, already added to your PATH
  • lib: For library files - we'll be configuring this correctly in this guide
  • repos: For git repositories we clone

Make sure you have your snacks - this was a long ride to figure out and write - and it's an equally long ride to follow. I recommend reading this all the way through before actually executing anything to get an overall idea as to the process you'll be following and the assumptions I've made to keep this post a reasonable length.

Setting up

Before we begin, we need some dependencies:

  • gcc - The compiler
  • git - For checking out the cpython git repository
  • readline - An optional dependency of cpython (presumably for the REPL)

On Viper, we can load these like so:

module load utilities/multi
module load gcc/10.2.0
module load readline/7.0

Compiling openssl

We also need to clone the openssl git repo and build it from source:

cd ~/software/repos
git clone git://git.openssl.org/openssl.git;    # Clone the git repo
cd openssl;                                     # cd into it
git checkout OpenSSL_1_1_1-stable;              # Checkout the latest stable branch (do git branch -a to list all branches; Python will complain at you during build if you choose the wrong one and tell you what versions it supports)
./config;                                       # Configure openssl ready for compilation
make -j "$(nproc)"                              # Build openssl

With openssl compiled, we need to copy the resulting binaries to our ~/software/lib directory:

cp lib*.so* ~/software/lib;
# We're done, cd back to the parent directory
cd ..;

To finish up openssl, we need to update some environment variables to let the C++ compiler and linker know about it, but we'll talk about those after dealing with another dependency that Python requires.

Compiling libffi

libffi is another dependency of Python that's needed if you want to use Tensorflow. To start, go to the libffi GitHub releases page in your web browser, and copy the URL for the latest release file. It should look something like this:

https://github.com/libffi/libffi/releases/download/v3.3/libffi-3.3.tar.gz

Then, download it to the target system:

cd ~/software/lib
curl -OL URL_HERE

Note that we do it this way, because otherwise we'd have to run the autogen.sh script which requires yet more dependencies that you're unlikely to have installed.

Then extract it and delete the tar.gz file:

tar -xzf libffi-3.3.tar.gz
rm libffi-3.3.tar.gz

Now, we can configure and compile it:

cd libffi-3.3
./configure --prefix=$HOME/software
make -j "$(nproc)"

Before we install it, we need to create a quick alias:

cd ~/software;
ln -s lib lib64;
cd -;

libffi for some reason likes to install to the lib64 directory, rather than our pre-existing lib directory, so creating an alias makes it so that it installs to the right place.
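With the alias in place, we can install libffi into the ~/software prefix we configured earlier. Run this from inside the libffi source directory:

make install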

Updating the environment

Now that we've dealt with the dependencies, we now need to update our environment so that the compiler knows where to find them. Do that like so:

export LD_LIBRARY_PATH="$HOME/software/lib:${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}";
export LDFLAGS="-L$HOME/software/lib -L$HOME/software/include $LDFLAGS";
export CPPFLAGS="-I$HOME/software/include -I$HOME/software/repos/openssl/include -I$HOME/software/repos/openssl/include/openssl $CPPFLAGS"

It is also advisable to update your ~/.bashrc with these settings, as you may need to come back and recompile a different version of Python in the future.

Personally, I have a file at ~/software/setup.sh which I run with source $HOME/software/setup.sh in my ~/.bashrc file to keep things neat and tidy.

Compiling Python

Now that we have openssl and libffi compiled, we can turn our attention to Python. First, clone the cpython git repo:

git clone https://github.com/python/cpython.git
cd cpython;

Then, checkout the latest tag. This essentially checks out the latest stable release:

git checkout "$(git tag | grep -ivP '[ab]|rc' | tail -n1)"

Important: If your intention is to use tensorflow, check the Tensorflow Install page for supported Python versions. It's probable that it doesn't yet support the latest version of Python, so you might need to checkout a different tag here. For some reason, Python is really bad at propagating new versions out to the community quickly.
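For example, to pin to a specific release rather than whatever the latest tag happens to be (the exact version here is only illustrative - pick one that Tensorflow supports):

git checkout v3.9.10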

Before we can start the compilation process, we need to configure it. We're going for performance, so execute the configure script like so:

./configure --with-lto --enable-optimizations --with-openssl=/absolute/path/to/openssl_repo_dir

Replace /absolute/path/to/openssl_repo_dir with the absolute path to the openssl repo we cloned and compiled above.

Now, we're ready to compile Python. Do that like so:

make -j "$(nproc)"

This will take a while, but once it's done it should have built Python successfully. For a sanity check, we can also test it like so:

make -j "$(nproc)" test

The Python binary compiled should be called simply python, and be located in the root of the git repository. Now that we've compiled it, we need to make a few tweaks to ensure that our shell uses our newly compiled version by default and not the older version from the host system. Personally, I keep my ~/bin folder under version control, so I install host-specific software to ~/software, and put ~/software/bin in my PATH like so:

export PATH=$HOME/software/bin:$PATH

With this in mind, we need to create some symbolic links in ~/software/bin that point to our new Python installation:

cd $HOME/software/bin;
ln -s relative/path/to/python_binary python
ln -s relative/path/to/python_binary python3
ln -s relative/path/to/python_binary python3.9

Replace relative/path/to/python_binary with the relative path to the Python binary we compiled above.

To finish up the Python installation, we need to get pip up and running, the Python package manager. We can do this using the inbuilt ensurepip module, which can bootstrap a pip installation for us:

python -m ensurepip --user

This bootstraps pip into our local user directory. This is probably what you want, since if you try to install it directly, the shebang incorrectly points to the system's version of Python, which doesn't exist.

Then, update your ~/.bash_aliases and add the following:

export LD_LIBRARY_PATH=/absolute/path/to/openssl_repo_dir/lib:$LD_LIBRARY_PATH;
alias pip='python -m pip'
alias pip3='python -m pip'

...replacing /absolute/path/to/openssl_repo_dir with the path to the openssl git repo we cloned earlier.

The next stage is to use virtualenv to locally install our Python packages that we want to use for our project. This is good practice, because it keeps our dependencies locally installed to a single project, so they don't clash with different versions in other projects.

Before we can use virtualenv though, we have to install it:

pip install virtualenv

Unfortunately, Python / pip is not very clever at detecting the actual Python installation location, so in order to actually use virtualenv, we have to use a wrapper script - because the shebang in the main ~/.local/bin/virtualenv entrypoint does not use /usr/bin/env to auto-detect the python binary location. Save the following as virtualenv in ~/software/bin (or any other location that's in your PATH ahead of ~/.local/bin):

#!/usr/bin/env bash

exec python ~/.local/bin/virtualenv "$@"

For example:

# Write the script to disk
nano ~/software/bin/virtualenv;
# chmod it to make it executable
chmod +x ~/software/bin/virtualenv

Installing Keras and tensorflow-gpu

With all that out of the way, we can finally use virtualenv to install Keras and tensorflow-gpu. Let's create a new directory and create a virtual environment to install our packages in:

mkdir tensorflow-test
cd tensorflow-test;
virtualenv "$PWD";
source bin/activate;

Now, we can install Tensorflow & Keras:

pip install tensorflow-gpu

It's worth noting here that Keras is a dependency of Tensorflow.

Tensorflow has a number of alternate package names you might want to install instead depending on your situation:

  • tensorflow: Stable tensorflow without GPU support - i.e. it runs on the CPU instead.
  • tf-nightly-gpu: Nightly tensorflow for the GPU. Useful if your version of Python is newer than the version of Python supported by Tensorflow

Once you're done in the virtual environment, exit it like this:

deactivate

Phew, that was a huge amount of work! Hopefully this sheds some light on the maddeningly complicated process of compiling Python from source. If you run into issues, you're welcome to comment below and I'll try to help you out - but you might be better off asking the Python community instead, as they've likely got more experience with Python than I have.


Running multiple local versions of CUDA on Ubuntu without sudo privileges

I've been playing around with Tensorflow.js for my PhD (see my PhD Update blog post series), and I had some ideas that I wanted to test out on my own that aren't really related to my PhD. In particular, I've found this blog post to be rather inspiring - where the author sets up a character-based recurrent neural network to generate text.

The idea of transcoding all those characters to numerical values and back seems like too much work and too complicated just for a quick personal project though, so my plan is to try and develop a byte-based network instead, in the hopes that I can not only teach it to generate text as in the blog post, but valid Unicode as well.

Obviously, I can't really use the University's resources ethically for this (as it's got nothing to do with my University work) - so since I got a new laptop recently with an Nvidia GeForce RTX 2060, I thought I'd try and use it for some machine learning instead.

The problem here is that Tensorflow.js supports only CUDA 10.0, but since I'm running Ubuntu 20.10 with all the latest patches installed, I have CUDA 11.1. A quick search of the apt repositories on my system reveals nothing to suggest that I can install older versions of CUDA alongside the newer one, so I had to devise another plan.

I discovered some months ago (while working with Viper - my University's HPC - for my PhD) that you can actually extract - without sudo privileges - the contents of the CUDA .run installers. By then fiddling with your PATH and LD_LIBRARY_PATH environment variables, you can get any program you run to look for the CUDA libraries elsewhere instead of loading the default system libraries.

Since this is the second time I've done this, I thought I'd document the process for future reference.

First, you need to download the appropriate .run installer for the CUDA libraries. In my case I need CUDA 10.0, so I've downloaded mine from here:

https://developer.nvidia.com/cuda-10.0-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_type=runfilelocal

Next, we need to create a new subdirectory and extract the .run file into it. Do that like so:

cd path/to/runfile_directory;
mkdir cuda-10.0
./cuda_10.0.130_410.48_linux.run --extract=${PWD}/cuda-10.0/

Make sure that the path to the current working directory contains no spaces, and preferably no other special characters either. Also, adjust the file and directory names to suit your situation.

Once done, this will have extracted 3 sub-files - which also have the suffix .run. We're only interested in CUDA itself, so we only need to extract the one that starts with cuda-linux. Do that like so (adjusting file/directory names as before):

cd cuda-10.0;
./cuda-linux.10.0.130-24817639.run -noprompt -prefix=$PWD/cuda;
rm *.run;
mv cuda/* .;
rmdir cuda;

If you run ./cuda-linux.10.0.130-24817639.run --help, the help text is actually somewhat deceptive - since there's a typo in it! I've corrected it in the commands above though. Once done, this should leave the CUDA libraries in the current working directory - that is, in a subdirectory next to the original .run file:

+ /path/to/some_directory/
    + ./cuda_10.0.130_410.48_linux.run
    + cuda-10.0/
        + version.txt
        + bin/
        + doc/
        + extras/
        + ......

Now, it's just a case of fiddling with some environment variables and launching your program of choice. You can set the appropriate environment variables like this:

export PATH="/absolute/path/to/cuda-10.0/bin:${PATH}";
if [[ ! -z "${LD_LIBRARY_PATH}" ]]; then
    export LD_LIBRARY_PATH="/absolute/path/to/cuda-10.0/lib64:${LD_LIBRARY_PATH}";
else
    export LD_LIBRARY_PATH="/absolute/path/to/cuda-10.0/lib64";
fi

You could save this to a shell script (putting #!/usr/bin/env bash before it as the first line, and then running chmod +x path/to/script.sh), and then execute it in the context of the current shell for example like so:

source path/to/activate-cuda-10.0.sh
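
After sourcing the script, a quick way to check that the extracted toolkit is the one being picked up (nvcc lives in the bin/ directory we just prepended to PATH) is:

# Both should point at / report the extracted CUDA 10.0 toolkit rather than the system-wide one
which nvcc
nvcc --version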

Many deep learning applications that use CUDA also use CuDNN, a deep learning library provided by Nvidia for accelerating deep learning applications. The archived versions of CuDNN can be found here: https://developer.nvidia.com/rdp/cudnn-archive

When downloading (you need an Nvidia developer account, but thankfully this is free), pay attention to the CuDNN version being requested in the error messages generated by your application. Also take care to pick the CuDNN build that matches the version of CUDA you're using.

When you download, select the "cuDNN Library for Linux" option. This will give you a tarball, which contains a single directory cuda. Extract the contents of this directory over the top of your CUDA directory from following my instructions above, and it should work as intended. I used my graphical archive manager for this purpose.
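
If you'd rather do this on the command line than with a graphical archive manager, something along these lines should work - note that the tarball filename below is just an example, so substitute the one you actually downloaded, and adjust the destination path as before:

# Filename is an example - use whichever version you downloaded
tar -xzf cudnn-10.0-linux-x64-v7.6.5.32.tgz
# Copy the contents of the extracted cuda/ directory over the top of the extracted CUDA toolkit
cp -r cuda/. /absolute/path/to/cuda-10.0/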

Avoiding accidental array mutation when iterating arrays in PHP

Pepperminty Wiki is written in PHP, and I've posted before about the search engine I've implemented for it that's powered by an inverted index. In this post, I want to talk about an anti-feature of PHP that doesn't behave the way you'd expect, and how to avoid running into the same problem I did.

To do this, let's introduce a simple example of the problem at work:

<?php
$arr = [];
for($i = 0; $i < 3; $i++) {
    $key = random_int(0, 2000);
    $arr[$key] = $i;
    echo("[init] key: $key, i: $i\n");
}

foreach($arr as $key => &$value) {
    // noop
}

echo("structure before: "); var_dump($arr);

foreach($arr as $key => $value) {
    echo("key: $key, i was $value\n");
}

echo("structure after: "); var_dump($arr);
?>

The above code initialises an associative array with 3 elements. The contents might look like this:

Key Value
469 0
1777 1
1685 2

Pretty simple so far. It then iterates over it twice: once referring to the values by reference (that's what the & there is for), and the second time referring to the items by value.

You'd expect the array to be identical before and after the second foreach loop, but you'd be wrong:

Key Value
469 0
1777 1
1685 1

Wait, what? That's very odd. What's going on here? How can a foreach loop that's iterating an array by value mutate an array? To understand why, let's take a step back for a moment. Here's another snippet:

<?php

$arr = [ 1, 2, 3 ];

foreach($arr as $key => $value) {
    echo("$key: $value\n");
}

echo("The last value was $key: $value\n");
?>

What do you expect to happen here? While in Javascript a for..of loop with a let declaration would mean that both variables had fallen out of scope by now, in PHP foreach statements don't create a new scope for variables. Instead, they inherit the scope from their parent - e.g. the global scope in the above, or that of the containing function if defined inside a function.

To this end, we can still access the values of both $key and $value in the above example even after the foreach loop has exited! Unexpected.

It gets better. Try prefixing $value with an ampersand & in the above example and re-running it - note that $key and $value are both still defined afterwards.

This leads us to why the unexpected behaviour occurs. Because of the way that PHP's foreach loop is implemented, if we re-use the same variable name for $value in a subsequent loop, it ends up replacing the value of the last item in the array.

Shockingly enough this is actually documented behaviour (see also this bug report), though I'm somewhat confused as to how it happens on the last element in the array instead of the first.

With this in mind, to avoid this problem in future if you iterate an array by reference with a foreach loop, always remember to unset() the $value, like so:

<?php
$arr = [];
for($i = 0; $i < 3; $i++) {
    $key = random_int(0, 2000);
    $arr[$key] = $i;
    echo("[init] key: $key, i: $i\n");
}

foreach($arr as $key => &$value) {
    // noop
}
unset($key); unset($value);

echo("structure before: "); var_dump($arr);

foreach($arr as $key => $value) {
    echo("key: $key, i was $value\n");
}

echo("structure after: "); var_dump($arr);
?>

By doing this, you can ensure that you don't accidentally mutate your arrays and spend weeks searching for the bug like I did.

It's language features like these that catch developers out: and being aware of the hows and whys of their occurrence can help you to avoid them next time (if anyone can explain why it's the last element in the array that's affected instead of the first, I'd love to know!).

Regardless, although I'm aware of how challenging implementing a programming language is, language designers should take care to avoid surprising behaviour like this that catches developers out.

Found this interesting? Comment below!

