Distributing work with Node.js
(Above: A pair of graphs generated with gnuplot from the data crunched by the scripts I talk about in this blog post.)
I really like Node.js. For those not in the know, it's basically Javascript for servers - and it's brilliant at networking. Like really really good. Like C♯-beating good. Anyway, last week I had a 2-layer neural network for which I wanted to simulate every combination of 1-64 nodes in both layers, so that I could generate a 3-dimensional surface graph of the error.
Since my neural network (which is also written in Node.js :P) has a command-line interface, I wrote a simple shell script to drive it in parallel, and set it going on a Raspberry Pi I have acting as a file server (it doesn't do much else most of the time). After doing some calculations, I determined that it would finish at 6:40am Thursday..... next week!
Of course, taking so long is no good at all if you need it done by Thursday this week - so I set about writing a script that would parallelise it over the network. In the end I didn't actually include the generated data in the report that had the Thursday deadline, but it was a cool challenge nonetheless!
Server
To start with, I created a server script that would allocate work items, called nodecount-surface-server.js
. The first job was to set things up and create a quick settings object and a work item generator:
#!/usr/bin/env node
// ^----| Shebang to make executing it on Linux easier

const http = require("http"); // We'll need this later

const settings = {
    port: 32000,
    min: 1,
    max: 64,
};
settings.start = [settings.min, settings.min];

function* work_items() {
    // <= here so that the top value of 64 nodes per layer is included
    for(let a = settings.start[0]; a <= settings.max; a++) {
        for(let b = settings.start[1]; b <= settings.max; b++) {
            yield [a, b];
        }
    }
}
That function*
is a generator. C♯ has them too - and they let a function return more than one item in an orderly fashion. In my case, it returns arrays of numbers which I use as the topology for my neural networks:
[1, 1]
[1, 2]
[1, 3]
[1, 4]
....
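If you haven't met generators before, consuming one by hand looks something like this (a quick illustration using the settings above):

const generator = work_items();
console.log(generator.next()); // { value: [ 1, 1 ], done: false }
console.log(generator.next()); // { value: [ 1, 2 ], done: false }
// ...and so on, until it eventually returns { value: undefined, done: true }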
Next, I wrote the server itself. Since it was just a temporary script that was running on my local network, I didn't implement too many security measures - please bear this in mind if using or adapting it yourself!
function calculate_progress(work_item) {
    let i = (work_item[0]-1) * settings.max + (work_item[1]-1),
        max = settings.max * settings.max;
    return `${i} / ${max} ${(i/max*100).toFixed(2)}%`;
}

var work_generator = work_items();

const server = http.createServer((request, response) => {
    switch(request.method) {
        case "GET": {
            // Allocate the next work item and send it to the requester
            let next = work_generator.next();
            if(next.done)
                break; // No work left - reply with an empty body
            let next_item = next.value;
            response.write(next_item.join("\t"));
            console.error(`[allocation] [${calculate_progress(next_item)}] ${next_item}`);
            break;
        }
        case "POST": {
            // A worker is reporting a result - write it to the standard output
            var body = "";
            request.on("data", (data) => body += data);
            request.on("end", () => {
                console.log(body);
                console.error(`[complete] ${body}`);
            });
            break;
        }
    }
    response.end();
});

server.on("clientError", (error, socket) => {
    socket.end("HTTP/1.1 400 Bad Request\r\n\r\n");
});

server.listen(settings.port, () => { console.error(`Listening on ${settings.port}`); });
Basically, the server accepts 2 types of requests:

- GET requests, which ask for a work item
- POST requests, which report back the results of a work item
In my case, I send out work items like this:
11 24
...and will be receiving work results like this:
11 24 0.2497276811644629
This means that I don't even need to keep track of which work item I'm receiving a result for! If I did though, I'd probably have some kind of ID-based system with a list of allocated work items that I could refer back to - and periodically iterate over to identify any items that got lost somewhere, so that I could add them to a reallocation queue.
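For the curious, that might look something like the following sketch (entirely hypothetical - none of these names appear in the real server script):

// Hypothetical sketch of ID-based work item tracking
const allocated = new Map(); // id → { item, allocated_at }
const reallocation_queue = [];
let next_id = 0;

function allocate(item) {
    let id = next_id++;
    allocated.set(id, { item, allocated_at: Date.now() });
    return id; // Send this id out alongside the work item itself
}

function complete(id) {
    allocated.delete(id); // A result came back, so stop tracking it
}

// Periodically requeue anything that's been out suspiciously long
setInterval(() => {
    for(let [id, entry] of allocated) {
        if(Date.now() - entry.allocated_at > 60 * 60 * 1000) {
            allocated.delete(id);
            reallocation_queue.push(entry.item);
        }
    }
}, 5 * 60 * 1000);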
With that, the server was complete. It outputs the completed work item results to the standard output, and progress information to the standard error. This allows me to invoke it like this:
node ./nodecount-surface-server.js >results.tsv
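While it's running, poking it by hand with curl is a quick way to check it's behaving (assuming it's listening locally on the default port of 32000):

curl http://localhost:32000/                          # Ask for a work item
curl --data $'11\t24\t0.25' http://localhost:32000/   # Report a (made-up) result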
Worker
Very cool. A server isn't much good without an army of workers ready and waiting to tear through the work items it's serving at breakneck speed though - and that's where the worker comes in. I started writing it in much the same way I did the server:
#!/usr/bin/env node
// ^----| Another shebang, just like the server

const http = require("http"); // We'll need this to talk to the server later
const child_process = require("child_process"); // This is used to spawn the neural network subprocess

const settings = {
    server: { host: "172.16.230.58", port: 32000 },
    worker_command: "./network.js --epochs 1000 --learning-rate 0.2 --topology {topology} <datasets/acw-2-set-10.txt 2>/dev/null"
};
That worker_command
there in the settings object is the command I used to execute the neural network, with a placeholder {topology}
which we find-and-replace just before execution. For obvious reasons (no plagiarism thanks!) I can't release that script itself, but it's not necessary for understanding how the distributed work item system I've written works. It could just as well be any other command you like!
Next up is the work item executor itself. Since it obviously takes time to execute a work item (why else would I go to such lengths to process as many of them at once as possible :P), I take a callback as the 2nd argument (it's just like a delegate or Action
in C♯):
function execute_item(data, callback) {
    // Substitute the topology into the command template
    let command = settings.worker_command.replace("{topology}", data.join(","));
    console.log(`[execute] ${command}`);
    let network_process = child_process.exec(command, (error, stdout, stderr) => {
        if(error) {
            // The subprocess fell over - log it, skip this item, and move on
            console.error(`[error] ${error.message}`);
            return callback();
        }
        console.log(`[done] ${stdout.trim()}`);
        // Pull the topology and the error value out of the subprocess' output
        let result = stdout.trim().split(/\t|,/g);
        let payload = `${result[0]}\t${result[1]}\t${result[5]}`;
        // Report the result back to the server
        let request = http.request({
            hostname: settings.server.host,
            port: settings.server.port,
            path: "/",
            method: "POST",
            headers: {
                "content-length": Buffer.byteLength(payload)
            }
        }, (response) => {
            console.log(`[submitted] ${payload}`);
            callback();
        });
        request.write(payload);
        request.end();
    });
}
In the above I substitute in the work item array as a comma-separated list, execute the command as a subprocess, report the result back to the server, and then call the callback. To report the result back I use the http module built into Node.js, but if I were to tidy this up I would probably use an npm package like got instead, as it simplifies the code a lot and provides more features / better error handling / etc.
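To give an idea, the submission step above might collapse into something like this with got (a rough sketch against its promise-based API - I haven't actually converted the script):

const got = require("got"); // npm install got

async function submit_result(payload) {
    // POST the result string to the server and wait for the response
    await got.post(`http://${settings.server.host}:${settings.server.port}/`, {
        body: payload
    });
    console.log(`[submitted] ${payload}`);
}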
A work item executor is no good without any work to do, so that's what I tackled next. I wrote another function that fetches work items from the server and executes them - wrapping the whole thing in a Promise to make looping it easier later:
function do_work() {
    return new Promise(function(resolve, reject) {
        // Ask the server for a work item
        let request = http.request({
            hostname: settings.server.host,
            port: settings.server.port,
            path: "/",
            method: "GET"
        }, (response) => {
            var body = "";
            response.on("data", (chunk) => body += chunk);
            response.on("end", () => {
                // An empty body means the server has run out of work items
                if(body.trim().length == 0) {
                    console.error(`No work item received. We're done!`);
                    process.exit();
                }
                let work_item = body.split(/\s+/).map((item) => parseInt(item.trim()));
                console.log(`[work item] ${work_item}`);
                execute_item(work_item, resolve);
            });
        });
        request.end();
    });
}
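Since do_work() hands back a Promise, trying out a single work item is as simple as chaining onto it - for example (just a sanity check, not part of the final script):

do_work().then(() => console.log("Processed a single work item"));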
Awesome! It's really coming together. Doing just one work item isn't good enough though, so I took it to the next level:
function* do_lots_of_work() {
    while(true) {
        yield do_work();
    }
}

// From https://starbeamrainbowlabs.com/blog/article.php?article=posts/087-Advanced-Generators.html
// Pumps a generator that yields Promises: waits for each Promise to resolve
// before asking the generator for the next one.
function run_generator(g) {
    var it = g(), ret;

    (function iterate() {
        ret = it.next();
        ret.value.then(iterate);
    })();
}
run_generator(do_lots_of_work);
Much better. That completed the worker script - so all that remained was to set it going on as many machines as I could get my hands on, sit back, and watch it go :D
I did have some trouble with crashes at the end because there was no work left for them to do, but it didn't take (much) fiddling to figure out where the problem(s) lay.
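As an aside, if I were writing this today I'd probably skip the generator-pumping trick entirely and write the loop as an async function instead - a quick sketch of the equivalent, not the code I actually ran:

async function do_lots_of_work() {
    while(true)
        await do_work(); // Wait for each work item to finish before fetching the next one
}

do_lots_of_work();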
Each instance of the worker script can max out a single core of a machine, so multiple instances of the worker script are needed per machine in order to fully utilise a machine's resources. If I ever need to do this again, I'll probably make use of the built-in cluster module to simplify it such that I only need to start a single instance of the worker script per machine, instead of 1 for each core.
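A rough sketch of what that could look like - I haven't actually tried this for the project above, so treat it as an illustration of the idea rather than tested code:

const cluster = require("cluster");
const os = require("os");

if(cluster.isMaster) {
    // Fork one copy of this script for every CPU core on the machine
    for(let i = 0; i < os.cpus().length; i++)
        cluster.fork();
} else {
    // Each forked copy just runs the normal worker loop
    run_generator(do_lots_of_work);
}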
Come to think of it, it would have looked really cool if I'd done it at University and employed a whole row of machines in a deserted lab doing the crunching - especially since it was for my report....
Liked this post? Got an improvement? Comment below!