
Building the science festival demo: technical overview

Banner showing gently coloured point clouds of words against a dark background on the left, with the Humber Science Festival logo, fading into a screenshot of the attract screen of the actual demo on the right.

Hello and welcome to the technical overview of the Hull Science Festival demo I did on the 9th September 2023. If you haven't already, I recommend reading the main release post for context, and checking out the live online demo here: https://starbeamrainbowlabs.com/labs/research-smflooding-vis/

I suspect that a significant percentage of the readers of my blog here love the technical nuts and bolts of things (though you're usually very quiet in the comments :P), so I'm writing a series of posts about various aspects of the demo, because it was a very interesting project.

In this post, we'll cover the technical nuts and bolts of how I put it together, the software and libraries I used, and the approach I took. I also have another post written on monkeypatching npm packages after you install them, which I'll be posting after this one, because I wanted that to be its own post. In a post after that, we'll look at the research and the theory behind the project and how it fits into my PhD and wider research.

To understand the demo, we'll work backwards and deconstruct it piece by piece - starting with what you see on screen.

Browsing for a solution

As longtime readers of my blog here will know, I'm very partial to cramming things into the browser that probably shouldn't run in one. This is also the case for this project, which uses WebGL and the HTML5 Canvas.

Of course, I didn't implement it using the WebGL API directly. That's far too much effort. Instead, I used a browser-based game engine called Babylon.js. Babylon abstracts the complicated bits away, so I can just focus on implementing the demo itself and not reinventing the wheel.

Writing code in Javascript is often an exercise in putting lego bricks together (which makes it very enjoyable, since you rarely have to deviate from your actual task thanks to the existence of npm). To this end, in the process of implementing the demo I collected a bunch of other npm packages to which I could then delegate various tasks.

Graphics are easy

After picking a game engine, it is perhaps unsurprising that the graphics were easy to implement - even with 70K points to display. I achieved this with Babylon's PointsCloudSystem class, which made the display of the point cloud a trivial exercise.
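For anyone curious, the gist of it looks something like this - a minimal sketch rather than the demo's exact code, assuming points is an array of { x, y, z, colour } objects:

const pcs = new BABYLON.PointsCloudSystem("words", 3, scene);
pcs.addPoints(points.length, (particle, i) => {
    // each particle gets a position and a colour from the dataset
    particle.position = new BABYLON.Vector3(points[i].x, points[i].y, points[i].z);
    particle.color = points[i].colour; // a BABYLON.Color4
});
// all 70K points end up in a single mesh, so they cost just 1 draw call
pcs.buildMeshAsync().then(mesh => {
    // the mesh is now part of the scene and will be rendered each frame
});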

After adapting and applying a clever plugin (thanks, @sebavan!), I had points that were further away displaying smaller and closer ones larger. Dropping in a perceptually uniform colour map (I wonder if anyone's designed a perceptually uniform mesh map for a 3D volume?) and some fog made the whole thing look pretty cool and intuitive to navigate.

Octopain

Now that I had the points displaying, the next step was to get the text above each point displaying properly. Clearly, with 70K points (140K in the online demo!) I can't display text for all of them at once (and it would look very messy if I did), so I needed to index them somehow and efficiently determine which points were near the player in real time. This is actually quite a well-studied problem, and from prior knowledge I remembered that octrees were reasonably efficient. If I had some time to sit down and read papers (a great pastime), this one (some kind of location recognition from point clouds; potentially indoor/outdoor tracking) and this one (AI semantic segmentation of point clouds) look very interesting.

Unfortunately, extracting a list of points within a given radius is not something commonly implemented by the octree packages on npm. Combined with a bit of a headache figuring out the logic of this and how to hook it up to the existing Babylon renderer, this step took some effort before I found octree-es and got it working the way I wanted it to.

In the end, I had the octree as a completely separate point indexing data structure, and I used the word itself as a key to link it with the PointsCloudSystem in Babylon.
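The pattern looks roughly like this (hypothetical method names for illustration, not the actual octree-es API):

// index every point by its position, using the word as the key
const octree = new Octree(boundingBoxOfPointCloud);
for (const p of points)
    octree.insert(p.word, [p.x, p.y, p.z]);

// each frame: ask the octree which words are near the player, then look
// those words up in the PointsCloudSystem to render their labels
function wordsNearPlayer(playerPosition, radius) {
    return octree.queryRadius(playerPosition, radius);
}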

Gasp, is that a memory leak I see?!

Given I was in a bit of a hurry to get the whole demo thing working, it should come as no surprise that I ended up with a memory leak. I didn't actually have time to fix it before the big day either, so I had the demo on the big monitor while I kept an eye on the memory usage of my laptop on my laptop screen!

A photo of my demo up and running on a PC with a PS4 controller on a wooden desk. An Entroware laptop sits partially obscured by a desktop PC monitor, the latter of which has the demo full screen.

(Above: A photo of my demo in action.... I kept an eye on the memory graph in the taskbar on my laptop the whole time. It only crashed once!)

Anyone who has done anything with graphics and game engines probably suspects where the memory leak was already. When rendering the text above each point with a DynamicTexture, I didn't reuse the instance when the player moved, leading to a build-up of unused textures in memory that would eventually crash the machine. After the day was over, I had time to sit down and implement a pool to re-use these textures over and over again, which didn't take nearly as long as I thought it would.
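The pool itself is a simple idea. A sketch of it (not my exact implementation):

const pool = [];

function acquireTexture(scene) {
    // reuse a previously released texture if one is available...
    if (pool.length > 0) return pool.pop();
    // ...otherwise allocate a fresh one
    return new BABYLON.DynamicTexture("label", { width: 256, height: 64 }, scene);
}

function releaseTexture(texture) {
    // hand the texture back for reuse instead of letting it leak
    pool.push(texture);
}

// when a point comes into range: wipe the old label and draw the new word
const texture = acquireTexture(scene);
texture.getContext().clearRect(0, 0, 256, 64);
texture.drawText(word, null, 40, "32px sans-serif", "white", null);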

Gamepad support

You would think that, being a well-known game engine, Babylon would have working gamepad support. The documentation even suggests as much, but sadly this is not the case. When I discovered that gamepad support was broken in Babylon (at least for my PS4 controller), I ended up monkeypatching Babylon to disable the inbuilt support (it caused a crash even when disabled O.o) and then hacking together a custom implementation.

This custom implementation is actually quite flexible, so if I ever have some time I'd like to refactor it into its own npm package. Believe it or not I tried multiple other npm packages for wrapping the Gamepad API, and none worked reliably (it's a polling API, which can make designing an efficient and stable wrapper an interesting challenge).
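The core of it is a polling loop along these lines (a sketch; the axis indices assume a standard-mapping controller, and the camera handlers are hypothetical):

const DEADZONE = 0.15;

function pollGamepads() {
    // getGamepads() returns a snapshot, so it has to be re-queried every frame
    const pad = Array.from(navigator.getGamepads()).find(p => p !== null);
    if (pad) {
        // axes 0/1 = left stick, 2/3 = right stick on standard-mapping pads
        const [lx, ly, rx, ry] = pad.axes.map(v => Math.abs(v) < DEADZONE ? 0 : v);
        moveCamera(lx, ly);    // hypothetical functions wired into the renderer
        rotateCamera(rx, ry);
    }
    requestAnimationFrame(pollGamepads);
}

requestAnimationFrame(pollGamepads);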

To do that though, I would need some other controllers to test with, as currently it's designed only for the PS4 dualshock controller I have on hand. Some time ago I purchased an Xbox 360 controller, wanting something that worked out of the box with Linux, but it didn't work out so well, so I ended up selling it on and buying a white PS4 dualshock controller instead (pictured below).

I'm really impressed with how well the PS4 dualshock works with Linux - it functions out of the box in the browser (useful test website) just fine, and even has native Linux mainline kernel support, which is a big plus. The little touchpad on it is cute and helpful in some situations too, but most of the time you'd use a real pointing device.

A white PS4 dualshock controller.

(Above: A white PS4 dualshock controller.)

How does it fit in a browser anyway?!

Good question. The primary answer to this is the magic of esbuild: a magical build tool that packages your Javascript and CSS into a single file. It can also handle other associated files like images, and on top of that it's suuuper easy to use. It tree-shakes by default, and is just all-around a joy to use.

Putting it to use resulted in my ~1.5K lines of code (wow, I thought it was more than that) along with ~300K lines in libraries being condensed into a single 4MiB .js file and a 0.7KiB .css file, which I could serve to the browser along with the main index.html file. It's even really easy to implement subresource integrity, so I did that just for giggles.
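The build boils down to a single call to esbuild's Javascript API. Something like this (the entry point and options here are illustrative, not my exact config):

import * as esbuild from "esbuild";

await esbuild.build({
    entryPoints: ["src/index.mjs"], // hypothetical entry point
    bundle: true,                   // inline all imports into one file
    minify: true,
    sourcemap: true,
    loader: { ".png": "file" },     // copy referenced images to the output dir
    outdir: "dist",
});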

Datasets, an origin story

Using the Fetch API, I could fetch a pre-prepared dataset from the server, unpack it, and do cool things with it as described above. The dataset itself was prepared using a little Python script I wrote (source).

The script uses GloVe to vectorise words (I think I used 50 dimensions, since that's what fit inside my laptop at the time), and then UMAP (paper, good blog post on why UMAP is better than tSNE) to do dimensionality reduction down to 3 dimensions, whilst still preserving global structure. Judging by the experiences we had on the day, I'd say it was pretty accurate, if not always obvious why given words were related (more on why this is the case in a separate post).

My social media data, plotted in 2D with PCA (left), tSNE (centre), and UMAP (right). Points are blue against a white background, plotted with the Python datashader package.

(Above: My social media data, plotted in 2D with PCA (left), tSNE (centre), and UMAP (right). Points are blue against a white background, plotted with the Python datashader package.)

I like Javascript, but I had the code already written in Python from prior research, so I just used Python (looking now, there does seem to be a package implementing UMAP in JS, so I might look at that another time). The script is generic enough that I should be able to adapt it for other projects in the future to do similar kinds of analyses.
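Should the JS route ever pan out, the core of the pipeline might look like this (a sketch assuming the umap-js package; the parameters are illustrative):

import { UMAP } from "umap-js";

// vectors: number[][] - one 50-dimensional GloVe embedding per word
const umap = new UMAP({ nComponents: 3, nNeighbors: 15, minDist: 0.1 });
const embedding = umap.fit(vectors); // number[][] - 3 values per word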

For example, if I were to do a comparative analysis of, say, the language used by social media posts from different hashtags, I could use the same pipeline and just label each group with a different colour to see the difference between the two visually.

The data itself comes from 2 different places, depending on where you see the demo. If you were lucky enough to see it in person, then it's directly extracted from my social media data. The online one comes from page abstracts from various Wikipedia language dumps, to preserve the privacy of the social media dataset, just in case.

With the data converted, the last piece of the puzzle is that of how it ends up in the browser. My answer is a gzip-compressed headerless tab-separated-values file that looks something like this (uncompressed, of course):

cat    -10.147051      2.3838716       2.9629934
apple   -4.798643       3.1498482       -2.8428414
tree -2.1351748      1.7223179       5.5107193

With the data stored in this format, it was relatively trivial to load it into the browser, decompress it, and then display it with Babylon.js. There's also room here to expand and add additional columns later if needed, to e.g. control the colour of each point, label each word with a group, or something else.
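Loading it is then just a few lines - this sketch uses the browser's built-in DecompressionStream, though any gzip implementation would do:

async function loadDataset(url) {
    const response = await fetch(url);
    // un-gzip the response body as it streams in
    const stream = response.body.pipeThrough(new DecompressionStream("gzip"));
    const tsv = await new Response(stream).text();

    // each line is word<tab>x<tab>y<tab>z
    return tsv.split("\n")
        .filter(line => line.length > 0)
        .map(line => {
            const [word, x, y, z] = line.split("\t");
            return { word, x: parseFloat(x), y: parseFloat(y), z: parseFloat(z) };
        });
}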

Conclusion

We've pulled the demo apart piece by piece, and seen at a high level how it's put together and the decisions I made while implementing it. We've seen how I implemented the graphics - aided by Babylon.js and a clever hack. I've explained how I optimised the location polling to achieve real-time performance with an octree, and why reusing textures is so important. Finally, we took a brief look at the dataset and where it came from.

In the next post, we'll take a look at how to monkeypatch an npm package and when you'd want to do so. In a later post, we'll look at the research behind the demo, what makes it tick, what I learnt while building and showing it off, and how that fits in with the wider field from my perspective.

Until then, I'll see you in the next post!

Edit 2023-11-30: Oops! I forgot to link to the source code....! If you'd like to take a gander at the source code behind the demo, you can find it here: https://github.com/sbrl/research-smflooding-vis

Creating a UEFI + BIOS multi-boot + Data flash drive

This post is slightly different - it is mainly to document the process of creating a flash drive that boots into Grub2 with both BIOS and UEFI firmware, and that you can also store data on in Windows.

First of all, you need to make sure that your flash drive has a GPT (GUID Partition Table) and not an MBR. If you don't know, it is safe to assume that you are currently using an MBR (Master Boot Record).

Switching to a GPT

If you don't have a GPT, then you need to backup the contents of your flash drive because you will need to wipe it clean in order to switch it over.

Make sure you are in a Linux environment (start a virtual machine and pass the flash drive through if you don't have one). Work out where your flash drive is (in this instance, ours is at /dev/sdx) and then execute the following commands:

sudo apt-get install gdisk
sudo sgdisk --zap-all /dev/sdx

The above will wipe all partition tables from the device. Before continuing, you may need to remove and re-plug-in the flash drive you are working in. Run this pair of commands to add a "Microsoft basic data" partition, and format it with FAT32:

sudo sgdisk --new=1:0:0 --typecode=1:0700 /dev/sdx
sudo mkfs.vfat -F32 -n GRUB2EFI /dev/sdx1

The type code 0700 is the bit that determines the type of partition that we are creating. In this case, we are creating a Microsoft basic data partition. The tutorial I followed (see the source at the bottom) set the type to ef00, which is an EFI System Partition. If you experience problems, try changing the type to this.

Next up, we need to mount the new partition. Do it like this:

sudo mount -t vfat /dev/sdx1 /mnt -o uid=1000,gid=1000,umask=022

The above will mount it to the directory /mnt. Next, you need to go to the first source below and download the zip that can be found under the text "pack with all necessary files for you to modify as you need". Extract this to the root of the flash drive (/mnt in our case):

cd ~/Downloads/
unzip usb-pack_efi.zip
rsync -auv usb-pack_efi/ /mnt

Next, we need to install grub2 in BIOS mode. It will complain horribly, but apparently it works :D

sudo grub-install --force --removable --boot-directory=/mnt/boot /dev/sdx

Now you should have a flash drive with Grub2 installed, that you can also see in Windows!

Credit to sudodus of ubuntuforums.org for the guide.

Sources:

  1. How to Create a EFI/UEFI GRUB2 Multiboot USB Drive to boot ISO images

File System Performance in PHP

While writing Pepperminty Wiki, I started seeing a rather nasty increase in page load times. After looking into it, I concluded that the file system was the cause of the problem. At the time, I had multiple calls to PHP's glob function to find all the wiki pages in the current directory, and I was checking whether each wiki page existed before reading it into memory.

The solution: a page index. To cut down on the number of reads from the file system, I created a JSON file that contained information about every page on the wiki. This way, only a single file needs its existence checked and its contents read before rendering any one page can start. If the page index doesn't exist, it is automatically rebuilt using the glob function to find all the wiki pages in the current directory.
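The same pattern, sketched in Node.js rather than PHP (the filename and page format here are made up for illustration):

import fs from "fs";
import path from "path";

function getPageIndex(dir) {
    const indexPath = path.join(dir, "pageindex.json");

    // the happy path: one existence check + one read, regardless of page count
    if (fs.existsSync(indexPath))
        return JSON.parse(fs.readFileSync(indexPath, "utf8"));

    // fall back to scanning the directory, then cache the result for next time
    const index = {};
    for (const file of fs.readdirSync(dir).filter(f => f.endsWith(".md")))
        index[path.basename(file, ".md")] = { filename: file };
    fs.writeFileSync(indexPath, JSON.stringify(index));
    return index;
}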

In short: to increase the performance of your PHP application, try to reduce the number of reads (and writes!) to the file system to an absolute minimum.

I still need to update the code to allow users to delete pages via the GUI though, because at present you have to have access to the server's files to delete a page, and then remove it from the page index manually.

Behind the Parallax Stars

Unfortunately I have been busy, so I have been unable to write this post up until now. Anyway, welcome to the second technical post on this blog. This time, we will be looking behind the Parallax Stars.

The FPS and rendering time graphs are drawn by a slightly modified version of stats.js by mrdoob. Its original GitHub repository can be found here, and the slightly modified version can be found here. The modifications I made allow both the FPS and rendering time graphs to be displayed at once.

The interesting code that makes the stars work is found in stars.js. The function is a constructor that returns an object containing everything needed to make the stars appear and animate.

this.build = function() {
    this.data = [];

    // populate the array with freshly constructed stars
    for (var i = 0; i < this.number; i++)
    {
        this.data[i] = new star();
    }

    // sort far-to-near so that closer stars draw over further ones
    if(this.depthsort == true)
    {
        this.data.sort(depthsort);
    }
};

function star()
{
    // z determines depth: size, speed, and brightness all scale with it
    this.z = Math.random()*(0.7) + 0.3;

    this.x  = rand(canvas.width);
    this.y  = rand(canvas.height);
    this.radius = Math.floor(this.z * 7);

    // keep green below red for a yellow-ish tint
    var r, g;
    r = rand(200, 255);
    g = rand(200, r - 20);

    this.colour = "rgb(" + r + ", " + g + ", " + 0 + ")";

    this.speed = this.z * 3;
}

The build() function contains some setup code that populates an array with stars, which are constructed by a separate function, star(). Each star is assigned x, y, and z co-ordinates. The x and y co-ordinates determine its position on the screen, and the z co-ordinate determines its size and speed. Each star is given a colour too. The function rand() is a small utility function I sometimes use for generating random numbers:

// EXAMPLES:
// rand(256) returns a whole number between or equal to 0 and 255
// rand(100,201) returns a whole number between or equal to 100 and 200
// rand() returns a decimal between or equal to 0 and less than 1
function rand()
{
    // takes 0-2 arguments (numbers)
    var args = arguments, num;
    if (args[0] && !args[1]) { 
        num = Math.floor(Math.random()*(args[0]));
    } else if (args[0] && args[1]) {
        num = Math.floor(Math.random()*(args[1]-args[0])) + args[0];
    } else {
        num = Math.random();
    }
    return num;
}

The array that holds all the stars is also depth-sorted, both before we start and on each frame, to keep the stars in order, since stars leave the screen and new ones come onto the screen on every frame. The standard array .sort() function is used, along with a custom comparison function:

function depthsort(a,b) {
    if(a.z < b.z)
        return -1;
    if(a.z > b.z)
        return 1;
    else
        return 0;
}

The updates that happen every frame are split into two different functions. step() deals with incrementing co-ordinates and keeping the depth sorting up to date, and render() takes the current state and renders it on the canvas.

this.step = function() {
    for (var i = 0; i < this.data.length; i ++)
    {
        // move the star to the right at its own speed
        this.data[i].x += this.data[i].speed;

        // if it has left the right-hand edge, replace it with a new star
        // that starts just off the left-hand edge
        if(this.data[i].x > canvas.width + this.data[i].radius + 1)
        {
            this.data[i] = new star();
            this.data[i].x = -this.data[i].radius - 1;

            // re-sort so the new star slots in at the correct depth
            if(this.depthsort == true)
            {
                this.data.sort(depthsort);
            }
        }
    }
}

The this.data array holds all of the stars. For each star, its x value is incremented by its speed. If this moves the star off the screen, we replace it with a new star that starts off at the left-hand side. The stars are then depth-sorted again to keep all the further-away stars behind the stars that are (supposed to be) closer to the front.

this.render = function() {
    context.clearRect(0, 0, canvas.width, canvas.height);

    for (var i = 0; i < this.data.length; i ++)
    {
        // punch a fully opaque hole where the star will go, so the star's
        // alpha isn't blended with whatever was drawn underneath
        context.globalAlpha = 1;
        context.globalCompositeOperation = "destination-out";

        context.beginPath();
        context.arc(this.data[i].x, this.data[i].y, this.data[i].radius, 0, Math.PI*2, true);
        context.closePath();
        context.fill();

        // then draw the star itself, dimmer the further away it is
        context.globalAlpha = this.data[i].z;
        context.globalCompositeOperation = "source-over";

        context.fillStyle = this.data[i].colour;

        context.beginPath();
        context.arc(this.data[i].x, this.data[i].y, this.data[i].radius, 0, Math.PI*2, true);
        context.closePath();
        context.fill();
    }
}

To start off the rendering, the canvas is cleared. After that, for each star, a hole is cleared in the canvas with the destination-out drawing mode. The reason for this is that the z co-ordinate is also used as an alpha value for each star, to make it dimmer the further away it is - this gives a slightly more realistic 3D appearance. After clearing a hole, the drawing mode is reset to the default (source-over), and the alpha and fill colours are set up according to the star's current z co-ordinate and colour respectively. Then the star is drawn. Since, when this script was written, there was no canvas drawing function for drawing an ellipse, the context.arc(this.data[i].x, this.data[i].y, this.data[i].radius, 0, Math.PI*2, true); code is used to draw circles via context.beginPath() and context.fill().

Finally, one nice neat update() function is provided that calls render() and step() in sequence:

this.update = function() {
    this.render();
    this.step();
}

That concludes this technical post! If you have any questions, please post them below and I will try my best to answer them.

How the trianglfier works

This technical post will be all about how the recently released Image Trianglfier works.

The visible <img /> element that shows the original image has a function called drawimage() attached to its load event, such that every time a new image is selected, it updates the first of two invisible <canvas> elements:
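Something along these lines (a simplified sketch, not the exact code; the element ids are illustrative):

function drawimage() {
    const img = document.getElementById("sourceimage"),
        cache = document.getElementById("imagecache");

    // scale the image down so that neither side exceeds 1000 pixels
    const scale = Math.min(1, 1000 / Math.max(img.naturalWidth, img.naturalHeight));
    cache.width = Math.floor(img.naturalWidth * scale);
    cache.height = Math.floor(img.naturalHeight * scale);

    cache.getContext("2d").drawImage(img, 0, 0, cache.width, cache.height);
}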

The above code takes the newly loaded image, calculates new dimensions for the image if either side of the image exceeds 1000 pixels, and draws it to the imagecache canvas.

When the render button is clicked, a second <canvas> with an id of main is put to work:
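In essence, it does something like this (again a simplified sketch; averagecolour() is sketched further down):

const main = document.getElementById("main"),
    context = main.getContext("2d"),
    cache = document.getElementById("imagecache");

// grab the pixel data once up front - reading it back per-triangle would be slow
const pixeldata = cache.getContext("2d").getImageData(0, 0, cache.width, cache.height);

for (let i = 0; i < trianglecount; i++) {
    // pick a random central point, then 3 vertices in a box around it
    const cx = Math.random() * main.width,
        cy = Math.random() * main.height;
    const verts = [0, 1, 2].map(() => [
        cx + (Math.random() - 0.5) * trianglesize,
        cy + (Math.random() - 0.5) * trianglesize
    ]);

    // colour the triangle with the average colour around its centre
    context.fillStyle = averagecolour(pixeldata, cx, cy, coloursampleradius);
    context.beginPath();
    context.moveTo(verts[0][0], verts[0][1]);
    context.lineTo(verts[1][0], verts[1][1]);
    context.lineTo(verts[2][0], verts[2][1]);
    context.closePath();
    context.fill();
}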

After grabbing a few references to the main canvas (the one we want to draw the image on) and the imagecache canvas (the one with the scaled original image on it), we extract the pixel data of the imagecache canvas so that we can use it to work out colours for each triangle later.

We then enter a for loop and start drawing triangles. A triangle is drawn by picking a central point at random, and then picking 3 random points within a box around it whose diameter is set by the trianglesize setting - these become the triangle's 3 vertices.

The colour of the triangle is also set at this point, by averaging the colours in a box centred on that same central point. The size of this box is set by the coloursampleradius setting. The algorithm is passed the pixel data we extracted earlier, along with the canvas size, the co-ordinates of the central point, and the box's diameter, and it calculates the average by simply adding up all the colour components of each pixel in the box and dividing by the number of pixels it added up the components of. This algorithm is not particularly optimised, and is one of the main causes of the hanging that occurs during the rendering process. This algorithm can be found here.
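A sketch of the averaging itself:

function averagecolour(pixeldata, cx, cy, diameter) {
    const { width, height, data } = pixeldata;
    let r = 0, g = 0, b = 0, count = 0;

    // walk every pixel in the sample box, clamped to the canvas edges
    for (let y = Math.max(0, Math.floor(cy - diameter / 2)); y < Math.min(height, cy + diameter / 2); y++) {
        for (let x = Math.max(0, Math.floor(cx - diameter / 2)); x < Math.min(width, cx + diameter / 2); x++) {
            const offset = (y * width + x) * 4; // RGBA = 4 values per pixel
            r += data[offset]; g += data[offset + 1]; b += data[offset + 2];
            count++;
        }
    }
    return "rgb(" + Math.round(r / count) + ", " + Math.round(g / count) + ", " + Math.round(b / count) + ")";
}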

After the colour and point co-ordinate calculations, all that is left to do is draw the triangle itself.

After all the triangles have been drawn, the rendering is complete. The finished image is then converted into a JPEG and displayed on the right-hand side.

Got a question? Spotted a mistake? Leave a comment below.
