Stardust | Starbeamrainbowlabs

Automatically organising & optimising photos and videos with Bash

As I promised recently, this post is about a script I implemented a while back that automatically organises and optimises the photos and videos that I take for me. Since I've been using it a while now and it seems stable, I thought I'd share it here in the hopes that it might be useful to someone else too.

I take quite a few photos and the odd video or two with my phone. These are automatically uploaded to a Raspberry Pi 3B+ that's acting as a file server on my home network with FolderSync (yes, it has ads, but it's the best I could find that does the job). Once uploaded to a folder, I then wanted a script that would automatically sort the uploaded images and videos into folders by year and month according to their date taken.

To do this, I implemented a script that uses exiftool (sudo apt install libimage-exiftool-perl I believe) to pull out the date taken from JPEGs and sort based on that. For other formats that don't support EXIF data, I take the last modified time with the date command and use that instead.

Before I do this though, I run my images through a few preprocessing tools:

PNGs are optimised with optipng (sudo apt install optipng)
JPEGs are optimised with jpegoptim (sudo apt install jpegoptim)
JPEGs are additionally automatically reoriented with mogrify -auto-orient from ImageMagick, as many cameras will set an EXIF tag for the rotation of an image without bothering to physically rotate the image itself

It's worth noting here that these preprocessing optimisation steps are lossless. In other words, no quality lost by performing these actions - it simply encodes the images more efficiently such that they use less disk space.

Once all these steps are complete, images and videos are sorted according to their date taken / last modified time as described above. That should end up looking a bit like this:

images
    + 2019
        + 07-July
            + image1.jpeg
    + 2020
        + 05-May
            + image2.png
            + image3.jpeg
        + 06-June
            + video1.mp4

Now that I've explained how it works, I can show you the script itself:

(Can't see the above? Check out the script directly on GitLab here: organise-photos)

The script operates on the current working directory. All images directly in the working directory will be sorted as described above. Once you've put it in a directory that is in your PATH, simply call it like this:

organise-photos

The script can be divided up into 3 distinct sections:

The setup and initialisation
The function that sorts individual files themselves into the right directory (handle_file - it's about half-way down)
The preprocessing steps and the driver code that calls the above function.

So far, I've found that it's been working really rather well. During development and testing I did encounter a number of issues with the sorting system in handle_file that caused it to sort files into the wrong directory - which took me a while finally squash.

I'm always tweaking and improving it though. To that end, I have several plans to improve it.

Firstly, I want to optimise videos too. I'd like to store them in a standard format if possible. It's not that simple though, because some videos don't take well to being transcoded into a different format - indeed they can even take up more space than they did previously! In those cases it's probably worth discarding the attempt at transcoding the video to a more efficient format if it's larger than the original file.

I'd also like to reverse-geocode (see also the usage policy) the (latitude, longitude) geotags in my images to the name of the place that I took them, and append this data to the comment EXIF tag. This will make it easier to search for images based on location, rather than having to remember when I took them.

Finally, I'd also like to experiment with some form of AI object recognition with a similar goal as reverse-geocoding. By detecting the objects in my images and appending them to the comment EXIF tag, I can do things like search for "cat", and return all the images of cats I've taken so far.

I haven't started to look into AI much yet, but initial search results indicate that I might have an interesting time locating an AI that can identify a large number of different objects.

Anyway, my organise-photos script is available on GitLab in my personal bin folder that I commit to git if you'd like to take a closer look - suggestions and merge requests are welcome if you've got an idea that would make it even better :D

Sources and further reading

Website change detection with headless Firefox and ImageMagick

This wasn't the script I had in mind in the previous blog post (so you can look forward to another blog post about it), but have you ever wanted to know when a web page changes? If it does change, it's almost impossible to tell where on the page it's changed. Recently, I was thinking about the problem, and realised a few things:

Firefox can be operated headlessly (with --headless) to take screenshots
ImageMagick must be advanced enough to diff images

With this in mind, I set about implementing a script. Before we continue, here's an example diff image:

It's rather tall because of the webpage I chose, but the bits that have changed appear in red. The script I've written also generates an animated PNG showing the difference too:

Again, it's very tall because of the page I tested with, but I think it's pretty cool!

If you'd like to check the script out for yourself, you find it in the following git repository: sbrl/url-diff

For the curious, the script in question is written in Bash. It uses apcalc (available in Debian / Ubuntu based Linux distributions with sudo apt install apcalc) to crunch the numbers, and headless Firefox + Imagemagick as described above to take the screenshots and do the image processing. It should in theory work on Windows, but you'll need to jump through a number of hoops:

Install call url-diff.sh from [git bash]()
Install [ImageMagick]() and make sure the binaries are in your PATH
Install Firefox and make sure firefox is in your PATH
Explicitly set the URLDIFF_STORAGE_DIR environment variable when calling the script (do this by prefixing the command at the bottom of this post with URLDIFF_STORAGE_DIR=path/to/directory)

With my fancy new embed system, I can show you the code behind it:

(Can't see the above? Check it out in the git repository.)

I'm working on line numbers (sadly the author of highlight.js doesn't like them, so an alternative solution is required).

Anyway, the basic layout of the script is as follows:

First, the settings are read in and the default values set
Then, I define some utility functions.
- The calculate_percentage_colour function is integral to the image change detection algorithm. It counts percentage of an image that is a given colour.
Next, the help text is displayed if necessary
The case statement that follows allows multiple subcommands to be implemented. Currently I only have a check subcommand, but you never know!
Inside this case statement, the screenshots are taken and compared.
- A new screenshot is taken with headless Firefox
- If we don't have a screenshot stored away already, we stash the new screenshot and exit
- If we do have a pre-existing screenshot, we continue with the comparison, starting by generating a diff image where pixels that have changed are given 1 colour, and pixels that haven't changed another
- It's at this point that calculate_percentage_colour is called to calculate how much of the image has changed - the diff image is passed in and the changed pixels are counted
- If more than 2% (by default) has changed, then we continue on to generate the output images
- The first output image consists of the new screenshot with the diff image overlaid - this is generated with some ImageMagick wizardry: -compose over -composite
- The second is an animated PNG comprised of the old and new screenshots. This is generated with ffmpeg - which supports animated PNGs
- Finally, the old screenshot that we have stored away is replaced with the new one

It sounds more complicated than it is - hopefully my above explanation makes sense (post a comment below if you're confused about something!).

You can call the script like so:

git clone https://git.starbeamrainbowlabs.com/sbrl/url-diff.git
cd url-diff;
./url-diff.sh check URL_HERE path/to/output_diff.png path/to/output.apng

....replacing URL_HERE with the URL to check, and the paths with the places you'd like to write the output images to.

Own your code, part 6: The Lantern Build Engine

It's time again for another installment in the own your code series! In the last post, we looked at the git post-receive hook that calls the main git-repo Laminar CI task, which is the core of our Continuous Integration system (which we discussed in the post before that). You can see all the posts in the series so far here.

In this post we're going to travel in the other direction, and look at the build script / task automation engine that I've developed that goes hand-in-hand with the Laminar CI system - though it can and does stand on it's own too.

Introducing the Lantern Build Engine! Finally, after far too long I'm going to formally post here about it.

Originally developed out of a need to automate the boring and repetitive parts of building and packing my assessed coursework (ACWs) at University, the lantern build engine is my personal task automation system. It's written in 100% Bash, and allows tasks to be easily defined like so:

task_dostuff() {
    task_begin "Doing a thing";
    do_work;
    task_end "$?" "Oops, do_work failed!";

    task_begin "Doing another thing";
    do_hard_work;
    task_end "$?" "Yikes! do_hard_work failed.";
}

When the above task is run, Lantern will automatically detect the dustuff task, since it's a bash function that's prefixed with task_. The task_begin and task_end calls there are 2 other bash functions, which generate pretty output to inform the user that a task is starting or ending. The $? there grabs the exit code from the last command - and if it fails task_end will automatically display the provided error message.

Tasks are defined in a build.sh file, for which Lantern provides a template. Currently, the template file contains some additional logic such as the help text output if no tasks were specified - which is left-over from the time when Lantern was small enough to fit in the same file as the build tasks themselves.

I'm in the process of adding support for the all the logic in the template file, so that I can cut down on the extra boilerplate there even further. After defining your tasks in a copy of the template build file, it's really easy to call them:

./build dostuff

Of course, don't forget to mark the copy of the template file executable with chmod +x ./build.

The above initial example only scratches the surface of what Lantern can do though. It can easily check to see if a given command is installed with check_command:

task_go-to-the-moon() {
    task_begin "Checking requirements";
    check_command git true;
    check_command node true;
    check_command npm true;
    task_end 0;
}

If any of the check_command calls fail, then an error message is printed and the build terminated.

Work that needs doing in Lantern can be expressed with 3 levels of logical separation: stages, tasks, and subtasks:

task_build-rocket() {
    stage_begin "Preparation";

    task_begin "Gathering resources";
    gather_resources;
    task_end "$?" "Failed to gather resources";

    task_begin "Hiring engineers";
    hire_engineers;
    task_end "$?" "Failed to hire engineers";

    stage_end "$?";

    stage_begin "Building Rocket";
    build_rocket --size big --boosters 99;
    stage_end "$?";

    stage_begin "Launching rocket";
    task_begin "Preflight checks";
    subtask_begin "Checking fuel";
    check_fuel --level full;
    subtask_end "$?" "Error: The fuel tank isn't full!";
    subtask_begin "Loading snacks";
    load_items --type snacks --from warehouse;
    subtask_end "$?" "Error: Failed to load snacks!";
    task_end "$?";

    task_begin "Launching!";
    launch --countdown 10;
    task_end "$?";

    stage_end "$?";
}

Come to think about it, I should probably rename the function prefix from task to job. Stages, tasks, and subtasks each look different in the output - so it's down to personal preference as to which one you use and where. Subtasks in particular are best for commands that don't return any output.

Popular services such as [Travis CI]() have a thing where in the build transcript they display the versions of related programs to the build, like this:

$ uname -a
Linux MachineName 5.3.0-19-generic #20-Ubuntu SMP Fri Oct 18 09:04:39 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
$ node --version
v13.0.1
$ npm --version
6.12.1

Lantern provides support for this with the execute command. Prefixing commands with execute will cause them to be printed before being executed, just like the above:

task_prepare() {
    task_begin "Displaying environment details";
    execute uname -a;
    execute node --version;
    execute npm --version;
    task_end "$?";
}

As build tasks get more complicated, it's logical to split them up into multiple tasks that can be called independently and conditionally. Lantern makes this easy too:

task_build() {
    task_begin "Building";
    # Do build stuff here
    task_end "$?";
}
task_deploy() {
    task_begin "Deploying";
    # Do deploy stuff here
    task_end "$?";
}

task_all() {
    tasks_run build deploy;
}

The all task in the above runs both the build and deploy tasks. In fact, the template build script uses tasks_run at the very bottom to treat every argument passed to it as a task name, leading to the behaviour described above.

Lantern also provides an array of other useful functions to make expressing build sequences easy, concise, and readable - from custom colours to testing environment variables to see if they exist. It's all fully documented in the README of the project too.

As described 2 posts ago, the git-repo Laminar CI task (once it's spawned a hologram of itself) currently checks for the existence of a build or build.sh executable script in the root of the repository it is running on, and passes ci as the first and only argument.

This provides easy integration with Lantern, since Lantern build scripts can be called anything we like, and with a tasks_run call at the bottom as in the template file, we can simply define a ci Lantern task function that runs all our continuous integration jobs that we need to execute.

If you're interested in trying out Lantern for yourself, check out the repository!

https://gitlab.com/sbrl/lantern-build-engine#lantern-build-engine

Personally, I use it for everything from CI to rapid development environment setup.

This concludes my (epic) series about my git hosting and continuous integration. We've looked at git hosting, and taken a deep dive into integrating it into a continuous integration system, which we've augmented with a bunch of scripts of our own design. The system we've ended up with, while a lot of work to setup, is extremely flexible, allowing for modifications at will (for example, I have a webhook script that's similar to the git post-receive hook, but is designed to receive notifications from GitHub instead of Gitea and queue the git-repo just the same).

I'll post a series list post soon. After that, I might blog about my personal apt repository that I've setup, which is somewhat related to this.

Building Javascript (and other things) with Rollup

Hey, another blog post!

Recently I've been somewhat distracted by another project which has been most interesting. I've learnt a bunch of things (including getting started with LeafletJS and Chart.JS), but most notably I've been experimenting with another Javascript build system.

I've used quite a few build systems over the years. For the uninitiated, their primary purpose is to turn lots of separate source files into as few output files as possible. This is important with things that run in the browser, because the browser has to download everything before it can execute it. Fewer files and less data means a speedier application - so everyone wins.

I've been rather unhappy with the build systems I've used so far:

Gulp.js was really heavy and required a lot of configuration in order to get things moving.
Browserify was great, but I found the feature set lacking and ended up moving on because it couldn't do what I wanted it to.
Webpack was ok, but it also has a huge footprint and was super slow too.

One thing led to another, and after trying a few other systems (I can't remember their names, but one begin with P and has a taped-up cardboard box as a logo) I found Rollup (I think Webpack requires Rollup as a dependency under the hood, if I'm not mistaken).

Straight away, I found that Rollup was much faster than Webpack. It also requires minimal configuration (which doesn't need changing every time you create a new source file) - which allows me to focus on the code I'm writing instead of battling with the build system to get it to do what I want.

It also supports plugins - and with a bunch available on npm, there's no need to write your own plugin to perform most common tasks. For my recent project I linked to above, I was able to put a bunch of plugins together with a configuration file - and, with a bit of fiddling around, I've got it to build both my Javascript and my CSS (via a postcss integration plugin) and putt he result in a build output folder that I've added to my .gitignore file.

I ended up using the following plugins:

rollup-plugin-node-resolve - essential. Allows import statements to be automatically resolved to their respective files.
rollup-plugin-commonjs - An addition to the above, if I recall correctly
rollup-plugin-postcss - Provides integration with Postcss. I've found Postcss with be an effective solution to the awkward issue of CSS
rollup-plugin-terser - Uses Terser to minify modern Javascript. I didn't know this existed until I found the Rollup plugin - it's a fork of and the successor to UglifyJS2, which only supports ES5 syntax.

....and the following Postcss plugins:

postcss-import - Inlines @import statements into a single CSS source file, allowing you to split your CSS up into multiple files (finally! :P)
postcss-copy - after trying 5 different alternatives, I found this to be the most effective at automatically copying images etc. to the build output folder - potentially renaming them - and altering the corresponding references in the CSS source code

The definition of plugins is done in a Javascript configuration file in a array - making it clear that your code goes through a pipeline of plugins that transform step-by-step to form the final output - creating and maintaining a source map along the way (very handy for debugging code in the browser. It's basically a file that lets the browser reverse-engineer the final output to present you with the original source code again whilst debugging) - which works flawlessly in my browser (the same can't be said for Webpack - I had lots of issues with the generated source maps there).

I ended up with a config file like this:

(Can't see the file above? Try viewing it directly.)

In short, the pipeline goes something like this:

(Source file; Created in Draw.io)

Actually calling Rollup is done in a Bash-based general build task file. Rollup is installed locally to the project (npm install --save rollup), so I call it like this:

node_modules/.bin/rollup --sourcemap --config rollup.config.js;

I should probably make a blog post about that build task system I wrote and why/how I made it, but that's a topic for another time :P

I'll probably end up tweaking the setup I've described here a bit in the future, but for now it does everything I'd like it to :-)

Found this interesting? Got a different way of doing it? Still confused? Comment below!

Stardust Blog

Tag Cloud

Automatically organising & optimising photos and videos with Bash

Sources and further reading

Website change detection with headless Firefox and ImageMagick

Own your code, part 6: The Lantern Build Engine

Building Javascript (and other things) with Rollup

Stardust
Blog