Distributing work with Node.js
(Above: A pair of graphs generated with gnuplot from the data crunched by the scripts I talk about in this blog post.)
I really like Node.js. For those not in the know, it's basically Javascript for servers - and it's brilliant at networking. Like really really good. Like C♯-beating good. Anyway, last week I had a 2-layer neural network for which I wanted to simulate every combination of 1-64 nodes in both layers, so that I could generate a 3-dimensional surface graph of the error.
Since my neural network (which is also written in Node.js :P) has a command-line interface, I wrote a simple shell script to drive it in parallel, and set it going on a Raspberry Pi I have acting as a file server (it doesn't do much else most of the time). After doing some calculations, I determined that it would finish at 6:40am Thursday..... next week!
Of course, taking so long is no good at all if you need it done by Thursday this week - so I set about writing a script that would parallelise it over the network. In the end I didn't actually include the generated data in the report that had the Thursday deadline, but it was a cool challenge nonetheless!
Server
To start with, I created a server script that would allocate work items, called nodecount-surface-server.js
. The first job was to set things up and create a quick settings object and a work item generator:
#!/usr/bin/env node
// ^----| Shebang to make executing it on Linux easier

const http = require("http"); // We'll need this later

const settings = {
    port: 32000,
    min: 1,
    max: 64,
};
settings.start = [settings.min, settings.min];

function* work_items() {
    // <= here so that the top value of 64 nodes per layer is included
    for(let a = settings.start[0]; a <= settings.max; a++) {
        for(let b = settings.start[1]; b <= settings.max; b++) {
            yield [a, b];
        }
    }
}
That function*
is a generator. C♯ has them too - and they let a function return more than one item in an orderly fashion. In my case, it returns arrays of numbers which I use as the topology for my neural networks:
[1, 1]
[1, 2]
[1, 3]
[1, 4]
....
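If you haven't met generators before, consuming one by hand looks something like this (a quick illustration using the settings above):

const generator = work_items();
console.log(generator.next()); // { value: [ 1, 1 ], done: false }
console.log(generator.next()); // { value: [ 1, 2 ], done: false }
// ...and so on, until it eventually returns { value: undefined, done: true }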
Next, I wrote the server itself. Since it was just a temporary script that was running on my local network, I didn't implement too many security measures - please bear this in mind if using or adapting it yourself!
function calculate_progress(work_item) {
    let i = (work_item[0]-1) * settings.max + (work_item[1]-1),
        max = settings.max * settings.max;
    return `${i} / ${max} ${(i/max*100).toFixed(2)}%`;
}

var work_generator = work_items();

const server = http.createServer((request, response) => {
    switch(request.method) {
        case "GET": {
            // Allocate the next work item and send it to the requester
            let next = work_generator.next();
            if(next.done)
                break; // No work left - reply with an empty body
            let next_item = next.value;
            response.write(next_item.join("\t"));
            console.error(`[allocation] [${calculate_progress(next_item)}] ${next_item}`);
            break;
        }
        case "POST": {
            // A worker is reporting a result - write it to the standard output
            var body = "";
            request.on("data", (data) => body += data);
            request.on("end", () => {
                console.log(body);
                console.error(`[complete] ${body}`);
            });
            break;
        }
    }
    response.end();
});

server.on("clientError", (error, socket) => {
    socket.end("HTTP/1.1 400 Bad Request\r\n\r\n");
});

server.listen(settings.port, () => { console.error(`Listening on ${settings.port}`); });
Basically, the server accepts 2 types of requests:

- GET requests, which ask for a work item
- POST requests, which report back the results of a work item
In my case, I send out work items like this:
11 24
...and will be receiving work results like this:
11 24 0.2497276811644629
This means that I don't even need to keep track of which work item I'm receiving a result for! If I did though, I'd probably have some kind of ID-based system with a list of allocated work items that I could refer back to - and periodically iterate over to identify any items that got lost somewhere, so that I could add them to a reallocation queue.
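For the curious, that might look something like the following sketch (entirely hypothetical - none of these names appear in the real server script):

// Hypothetical sketch of ID-based work item tracking
const allocated = new Map(); // id → { item, allocated_at }
const reallocation_queue = [];
let next_id = 0;

function allocate(item) {
    let id = next_id++;
    allocated.set(id, { item, allocated_at: Date.now() });
    return id; // Send this id out alongside the work item itself
}

function complete(id) {
    allocated.delete(id); // A result came back, so stop tracking it
}

// Periodically requeue anything that's been out suspiciously long
setInterval(() => {
    for(let [id, entry] of allocated) {
        if(Date.now() - entry.allocated_at > 60 * 60 * 1000) {
            allocated.delete(id);
            reallocation_queue.push(entry.item);
        }
    }
}, 5 * 60 * 1000);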
With that, the server was complete. It outputs the completed work item results to the standard output, and progress information to the standard error. This allows me to invoke it like this:
node ./nodecount-surface-server.js >results.tsv
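While it's running, poking it by hand with curl is a quick way to check it's behaving (assuming it's listening locally on the default port of 32000):

curl http://localhost:32000/                          # Ask for a work item
curl --data $'11\t24\t0.25' http://localhost:32000/   # Report a (made-up) result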
Worker
Very cool. A server isn't much good without an army of workers ready and waiting to tear through the work items it's serving at breakneck speed though - and that's where the worker comes in. I started writing it in much the same way I did the server:
#!/usr/bin/env node
// ^----| Another shebang, just like the server

const http = require("http"); // We'll need this to talk to the server later
const child_process = require("child_process"); // This is used to spawn the neural network subprocess

const settings = {
    server: { host: "172.16.230.58", port: 32000 },
    worker_command: "./network.js --epochs 1000 --learning-rate 0.2 --topology {topology} <datasets/acw-2-set-10.txt 2>/dev/null"
};
That worker_command
there in the settings object is the command I used to execute the neural network, with a placeholder {topology}
which we find-and-replace just before execution. For obvious reasons (no plagiarism thanks!) I can't release that script itself, but it's not necessary for understanding how the distributed work item system I've written works. It could just as well be any other command you like!
Next up is the work item executor itself. Since it obviously takes time to execute a work item (why else would I go to such lengths to process as many of them at once as possible :P), I take a callback as the 2nd argument (it's just like a delegate or Action
in C♯):
function execute_item(data, callback) {
    // Substitute the topology into the command template
    let command = settings.worker_command.replace("{topology}", data.join(","));
    console.log(`[execute] ${command}`);
    let network_process = child_process.exec(command, (error, stdout, stderr) => {
        if(error) {
            // The subprocess fell over - log it, skip this item, and move on
            console.error(`[error] ${error.message}`);
            return callback();
        }
        console.log(`[done] ${stdout.trim()}`);
        // Pull the topology and the error value out of the subprocess' output
        let result = stdout.trim().split(/\t|,/g);
        let payload = `${result[0]}\t${result[1]}\t${result[5]}`;
        // Report the result back to the server
        let request = http.request({
            hostname: settings.server.host,
            port: settings.server.port,
            path: "/",
            method: "POST",
            headers: {
                "content-length": Buffer.byteLength(payload)
            }
        }, (response) => {
            console.log(`[submitted] ${payload}`);
            callback();
        });
        request.write(payload);
        request.end();
    });
}
In the above I substitute in the work item array as a comma-separated list, execute the command as a subprocess, report the result back to the server, and then call the callback. To report the result back I use the http module built into Node.js, but if I were to tidy this up I would probably use an npm package like got instead, as it simplifies the code a lot and provides more features / better error handling / etc.
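To give an idea, the submission step above might collapse into something like this with got (a rough sketch against its promise-based API - I haven't actually converted the script):

const got = require("got"); // npm install got

async function submit_result(payload) {
    // POST the result string to the server and wait for the response
    await got.post(`http://${settings.server.host}:${settings.server.port}/`, {
        body: payload
    });
    console.log(`[submitted] ${payload}`);
}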
A work item executor is no good without any work to do, so that's what I tackled next. I wrote another function that fetches work items from the server and executes them - wrapping the whole thing in a Promise to make looping it easier later:
function do_work() {
    return new Promise(function(resolve, reject) {
        // Ask the server for a work item
        let request = http.request({
            hostname: settings.server.host,
            port: settings.server.port,
            path: "/",
            method: "GET"
        }, (response) => {
            var body = "";
            response.on("data", (chunk) => body += chunk);
            response.on("end", () => {
                // An empty body means the server has run out of work items
                if(body.trim().length == 0) {
                    console.error(`No work item received. We're done!`);
                    process.exit();
                }
                let work_item = body.split(/\s+/).map((item) => parseInt(item.trim()));
                console.log(`[work item] ${work_item}`);
                execute_item(work_item, resolve);
            });
        });
        request.end();
    });
}
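Since do_work() hands back a Promise, trying out a single work item is as simple as chaining onto it - for example (just a sanity check, not part of the final script):

do_work().then(() => console.log("Processed a single work item"));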
Awesome! It's really coming together. Doing just one work item isn't good enough though, so I took it to the next level:
function* do_lots_of_work() {
    while(true) {
        yield do_work();
    }
}

// From https://starbeamrainbowlabs.com/blog/article.php?article=posts/087-Advanced-Generators.html
// Pumps a generator that yields Promises: waits for each Promise to resolve
// before asking the generator for the next one.
function run_generator(g) {
    var it = g(), ret;

    (function iterate() {
        ret = it.next();
        ret.value.then(iterate);
    })();
}
run_generator(do_lots_of_work);
Much better. That completed the worker script - so all that remained was to set it going on as many machines as I could get my hands on, sit back, and watch it go :D
I did have some trouble with crashes at the end because there was no work left for them to do, but it didn't take (much) fiddling to figure out where the problem(s) lay.
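As an aside, if I were writing this today I'd probably skip the generator-pumping trick entirely and write the loop as an async function instead - a quick sketch of the equivalent, not the code I actually ran:

async function do_lots_of_work() {
    while(true)
        await do_work(); // Wait for each work item to finish before fetching the next one
}

do_lots_of_work();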
Each instance of the worker script can max out a single core of a machine, so multiple instances of the worker script are needed per machine in order to fully utilise a machine's resources. If I ever need to do this again, I'll probably make use of the built-in cluster module to simplify it such that I only need to start a single instance of the worker script per machine, instead of 1 for each core.
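A rough sketch of what that could look like - I haven't actually tried this for the project above, so treat it as an illustration of the idea rather than tested code:

const cluster = require("cluster");
const os = require("os");

if(cluster.isMaster) {
    // Fork one copy of this script for every CPU core on the machine
    for(let i = 0; i < os.cpus().length; i++)
        cluster.fork();
} else {
    // Each forked copy just runs the normal worker loop
    run_generator(do_lots_of_work);
}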
Come to think of it, it would have looked really cool if I'd done it at University and employed a whole row of machines in a deserted lab doing the crunching - especially since it was for my report....
Liked this post? Got an improvement? Comment below!