
Building the science festival demo: technical overview

Banner showing gently coloured point clouds of words against a dark background on the left, with the Humber Science Festival logo, fading into a screenshot of the attract screen of the actual demo on the right.

Hello and welcome to the technical overview of the Hull Science Festival demo I did on the 9th September 2023. If you haven't already, I recommend reading the main release post for context, and checking out the live online demo here: https://starbeamrainbowlabs.com/labs/research-smflooding-vis/

I suspect that a significant percentage of the readers of my blog here love the technical nuts and bolts of things (though you're usually very quiet in the comments :P), so I'm writing a series of posts about various aspects of the demo, because it was a very interesting project.

In this post, we'll cover the technical nuts and bolts of how I put it together, the software and libraries I used, and the approach I took. I also have another post written that I'll be publishing after this one, on monkeypatching npm packages after you install them, because I wanted that to be its own post. In a post after that, we'll look at the research and the theory behind the project and how it fits into my PhD and wider research.

To understand the demo, we'll work backwards and deconstruct it piece by piece - starting with what you see on screen.

Browsing for a solution

As longtime readers of my blog here will know, I'm very partial to cramming things into the browser that probably shouldn't run in one. This is also the case for this project, which uses WebGL and the HTML5 Canvas.

Of course, I didn't implement using the WebGL API directly. That's far too much effort. Instead, I used a browser-based game engine called Babylon.js. Babylon abstracts the complicated bits away, so I can just focus on implementing the demo itself and not reinventing the wheel.

Writing code in Javascript is often an exercise in putting lego bricks together (which makes it very enjoyable, since you rarely have to deviate from your actual task due to the existence of npm). To this end, in the process of implementing the demo I collected a bunch of other npm packages to which I could then delegate various tasks.

Graphics are easy

After picking a game engine, it is perhaps unsurprising that the graphics were easy to implement - even with 70K points to display. I achieved this with Babylon's PointsCloudSystem class, which made the display of the point cloud a trivial exercise.
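To give a concrete idea of how little code this takes, here's a minimal sketch of the sort of thing PointsCloudSystem enables. The points array and colour value here are illustrative assumptions, not the demo's actual data structures:

import { PointsCloudSystem, Vector3, Color4 } from "@babylonjs/core";

// Sketch: `points` is assumed to be an array of { word, x, y, z } objects
// parsed from the dataset.
function makePointCloud(scene, points) {
	const pcs = new PointsCloudSystem("words", 2, scene);
	pcs.addPoints(points.length, (particle, i) => {
		const p = points[i];
		particle.position = new Vector3(p.x, p.y, p.z);
		particle.color = new Color4(0.5, 0.8, 1.0, 1.0); // placeholder colour
	});
	return pcs.buildMeshAsync(); // resolves to the mesh backing the cloud
}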

After adapting and applying a clever plugin (thanks, @sebavan!), I had points that were further away displaying smaller and closer ones larger. Dropping in a perceptually uniform colour map (I wonder if anyone's designed a perceptually uniform mesh map for a 3D volume?) and some fog made the whole thing look pretty cool and intuitive to navigate.
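For reference, Babylon's fog only needs a few scene properties to switch on. The values here are purely illustrative assumptions, not the settings the demo actually uses:

import { Scene, Color3 } from "@babylonjs/core";

// Illustrative fog settings (assumed values, not the demo's).
function addFog(scene) {
	scene.fogMode = Scene.FOGMODE_EXP2;           // exponential falloff with distance
	scene.fogDensity = 0.01;                      // how quickly distant points fade out
	scene.fogColor = new Color3(0.05, 0.05, 0.1); // blend towards the dark background
}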

Octopain

Now that I had the points displaying, the next step was to get the text above each point displaying properly. Clearly with 70K points (140K in the online demo!) I can't display text for all of them at once (and it would look very messy if I did), so I needed to index them somehow and efficiently determine which points were near to the player in real time. This is actually quite a well studied problem, and from prior knowledge I remember that octrees were reasonably efficient. If I had some time to sit down and read papers (a great pastime), this one (some kind of location recognition from point clouds; potentially indoor/outdoor tracking) and this one (AI semantic segmentation of point clouds) look very interesting.

Unfortunately, extracting a list of points within a given radius is not something commonly implemented in the octree packages on npm. Combined with a bit of a headache figuring out the logic of this and how to hook it up to the existing Babylon renderer, this step took some effort before I found octree-es and got it working the way I wanted it to.

In the end, I had the octree as a completely separate point indexing data structure, and I used the word itself as a key to link it with the PointsCloudSystem in Babylon.
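The pattern looks something like the sketch below. Note that the spatial-index interface shown (insert, queryRadius) is a hypothetical placeholder to illustrate the idea of keeping the index separate from the renderer - it is not octree-es's actual API:

// Hypothetical spatial-index interface - illustrates the word-as-key linkage,
// not octree-es's real method names.
class WordIndex {
	constructor(octree) {
		this.octree = octree;    // spatial index over point positions
		this.byWord = new Map(); // word → particle index in the PointsCloudSystem
	}

	register(word, position, particleIndex) {
		this.octree.insert({ word, position }); // assumed signature
		this.byWord.set(word, particleIndex);
	}

	// Words within `radius` of the player, ready to have labels drawn for them.
	nearby(playerPosition, radius) {
		return this.octree.queryRadius(playerPosition, radius) // assumed signature
			.map(entry => ({ word: entry.word, index: this.byWord.get(entry.word) }));
	}
}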

Gasp, is that a memory leak I see?!

Given I was in a bit of a hurry to get the whole demo thing working, it should come as no surprise that I ended up with a memory leak. I didn't actually have time to fix it before the big day either, so I had the demo on the big monitor while keeping an eye on my laptop's memory usage on the laptop's own screen!

A photo of my demo up and running on a PC with a PS4 controller on a wooden desk. An Entroware laptop sits partially obscured by a desktop PC monitor, the latter of which has the demo full screen.

(Above: A photo of my demo in action.... I kept an eye on the memory graph in the taskbar on my laptop the whole time. It only crashed once!)

Anyone who has done anything with graphics and game engines probably suspects where the memory leak was already. When rendering the text above each point with a DynamicTexture, I didn't reuse the instance when the player moved, leading to a build-up of unused textures in memory that would eventually crash the machine. After the day was over, I had time to sit down and implement a pool to re-use these textures over and over again, which didn't take nearly as long as I thought it would.
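A minimal sketch of such a pool is below. The texture size, font, and class shape are assumptions for illustration, not the demo's actual implementation:

import { DynamicTexture } from "@babylonjs/core";

// Sketch of a DynamicTexture pool: reuse textures for labels instead of
// allocating a fresh one every time the player moves.
class LabelTexturePool {
	constructor(scene, size = 256) {
		this.scene = scene;
		this.size = size;
		this.free = [];
	}

	acquire(text) {
		// Reuse a spare texture if one is available, otherwise allocate.
		const texture = this.free.pop()
			?? new DynamicTexture("label", { width: this.size, height: this.size }, this.scene);
		const ctx = texture.getContext();
		ctx.clearRect(0, 0, this.size, this.size); // wipe whatever the previous word drew
		texture.drawText(text, null, null, "bold 44px sans-serif", "white", null, true);
		return texture;
	}

	release(texture) {
		this.free.push(texture); // keep it for later instead of leaking it
	}
}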

Gamepad support

You would think that, being a well-known game engine, Babylon would have working gamepad support. The documentation even suggests as much, but sadly this is not the case. When I discovered that gamepad support was broken in Babylon (at least for my PS4 controller), I ended up monkeypatching Babylon to disable the inbuilt support (it caused a crash even when disabled O.o) and then hacking together a custom implementation.

This custom implementation is actually quite flexible, so if I ever have some time I'd like to refactor it into its own npm package. Believe it or not I tried multiple other npm packages for wrapping the Gamepad API, and none worked reliably (it's a polling API, which can make designing an efficient and stable wrapper an interesting challenge).
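To give an idea of what a polling wrapper involves, here's a minimal sketch built directly on the browser's Gamepad API. It is not my actual implementation - the class shape and callback style are illustrative assumptions:

// Minimal polling wrapper around navigator.getGamepads(). Button indices
// follow the standard mapping (e.g. 0 is usually cross on a PS4 controller).
class GamepadPoller {
	constructor() {
		this.previous = new Map(); // gamepad index → last frame's button states
		this.listeners = [];       // callbacks fired on a new button press
	}

	onPress(callback) {
		this.listeners.push(callback);
	}

	// Call once per frame, e.g. from the render loop.
	poll() {
		for (const gamepad of navigator.getGamepads()) {
			if (!gamepad) continue; // empty controller slots are null
			const last = this.previous.get(gamepad.index) ?? [];
			gamepad.buttons.forEach((button, i) => {
				if (button.pressed && !last[i])
					this.listeners.forEach(cb => cb(gamepad.index, i));
			});
			this.previous.set(gamepad.index, gamepad.buttons.map(b => b.pressed));
		}
	}
}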

To do that though, I would need some other controllers to test with, as currently it's designed only for the PS4 dualshock controller I have on hand. Some time ago I purchased an Xbox 360 controller, wanting something that worked out of the box with Linux, but it didn't work out so well, so I ended up selling it on and buying a white PS4 dualshock controller instead (pictured below).

I'm really impressed with how well the PS4 dualshock works with Linux - it functions perfectly out of the box in the browser (useful test website), and even appears to have native support in the mainline Linux kernel, which is a big plus. The little touchpad on it is cute and helpful in some situations too, but most of the time you'd use a real pointing device.

A white PS4 dualshock controller.

(Above: A white PS4 dualshock controller.)

How does it fit in a browser anyway?!

Good question. The primary answer to this is the magic of esbuild: a build tool that packages your Javascript and CSS into a single file. It can also handle other associated files like images, and on top of that it's suuuper easy to use. It tree-shakes by default, and is just all-around a joy to use.

Putting it to use resulted in my ~1.5K lines of code (wow, I thought it was more than that) along with ~300K lines in libraries being condensed into a single 4MiB .js and a 0.7KiB .css file, which I could serve to the browser along with the main index.html file. It's even really easy to implement subresource integrity, so I did that just for giggles.
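A build script along these lines is all it takes (the file paths and loader settings here are assumptions for illustration, not the demo's actual configuration):

// build.mjs - minimal esbuild setup
import * as esbuild from "esbuild";

await esbuild.build({
	entryPoints: ["src/index.mjs"], // hypothetical entry point
	bundle: true,                   // pull every import into one output file
	minify: true,
	sourcemap: true,
	loader: { ".png": "file" },     // copy image assets alongside the bundle
	outdir: "dist",
});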

Datasets, an origin story

Using the Fetch API, I could fetch a pre-prepared dataset from the server, unpack it, and do cool things with it as described above. The dataset itself was prepared using a little Python script I wrote (source).

The script uses GloVe to vectorise words (I think I used 50 dimensions since that's what fit inside my laptop at the time), and then UMAP (paper, good blog post on why UMAP is better than tSNE) to do dimensionality reduction down to 3 dimensions, whilst still preserving global structure. Judging by the experiences we had on the day, I'd say it was pretty accurate, if not always obvious why given words were related (more on why this is the case in a separate post).

My social media data, plotted in 2D with PCA (left), tSNE (centre), and UMAP (right). Points are blue against a white background, plotted with the Python datashader package.

(Above: My social media data, plotted in 2D with PCA (left), tSNE (centre), and UMAP (right). Points are blue against a white background, plotted with the Python datashader package.)

I like Javascript, but I had the code written in Python due to prior research, so I just used Python (looking now, there does seem to be a package implementing UMAP in JS, so I might look at that another time). The script is generic enough that I should be able to adapt it for other projects in the future to do similar kinds of analyses.

For example, if I were to look at a comparative analysis of e.g. language used by social media posts from different hashtags or something, I could use the same pipeline and just label each group with a different colour to see the difference between the 2 visually.

The data itself comes from 2 different places, depending on where you see the demo. If you were lucky enough to see it in person, then it's directly extracted from my social media data. The online one comes from page abstracts from various Wikipedia language dumps to preserve the privacy of the social media dataset, just in case.

With the data converted, the last piece of the puzzle is that of how it ends up in the browser. My answer is a gzip-compressed headerless tab-separated-values file that looks something like this (uncompressed, of course):

cat    -10.147051      2.3838716       2.9629934
apple   -4.798643       3.1498482       -2.8428414
tree -2.1351748      1.7223179       5.5107193

With the data stored in this format, it was relatively trivial to load it into the browser, decompress it, and then display it with Babylon.js. There's also room here to expand and add additional columns later if needed, to e.g. control the colour of each point, label each word with a group, or something else.
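Loading such a file in the browser can be done along these lines. The filename and the use of DecompressionStream are assumptions about the approach, rather than a copy of the demo's actual loader:

// Fetch, decompress, and parse a gzipped headerless TSV of word → 3D position.
async function loadPoints(url) {
	const response = await fetch(url);
	const decompressed = response.body.pipeThrough(new DecompressionStream("gzip"));
	const text = await new Response(decompressed).text();

	return text.split("\n")
		.filter(line => line.length > 0)
		.map(line => {
			const [word, x, y, z] = line.split("\t");
			return { word, x: parseFloat(x), y: parseFloat(y), z: parseFloat(z) };
		});
}

// const points = await loadPoints("./data/words.tsv.gz"); // hypothetical path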

Conclusion

We've pulled the demo apart piece by piece, and seen at a high level how it's put together and the decisions I made while implementing it. We've seen how I implemented the graphics - aided by Babylon.js and a clever hack. I've explained how I used an octree to achieve real-time performance for the nearby-point lookups, and why reusing textures is so important. Finally, we took a brief look at the dataset and where it came from.

In the next post, we'll take a look at how to monkeypatch an npm package and when you'd want to do so. In a later post, we'll look at the research behind the demo, what makes it tick, what I learnt while building and showing it off, and how that fits in with the wider field from my perspective.

Until then, I'll see you in the next post!

Edit 2023-11-30: Oops! I forgot to link to the source code....! If you'd like to take a gander at the source code behind the demo, you can find it here: https://github.com/sbrl/research-smflooding-vis

My Hull Science Festival Demo: How do AIs understand text?

Banner showing gently coloured point clouds of words against a dark background on the left, with the Humber Science Festival logo, fading into a screenshot of the attract screen of the actual demo on the right.

Hello there! On Saturday 9th September 2023, I was on the supercomputing stand for the Hull Science Festival with a cool demo illustrating how artificial intelligences understand and process text. Since then, I've been hard at work tidying that demo up, and today I can announce that it's available to view online here on my website!

This post is a general high-level announcement post. A series of technical posts will follow on the nuts and bolts of both the theory behind the demo and the actual code itself and how it's put together, because it's quite interesting and I want to talk about it.

I've written this post to serve as a foreword / quick explanation of what you're looking at (similar to the explanation I gave in person), but if you're impatient you can just find it here.

All AIs currently developed are essentially complex parametrised mathematical models. We train these models by updating their parameters little by little until the output of the model gets close to some ground truth label.

In other words, an AI is just a bunch of maths. So how does it understand text? The answer to this question lies in converting text to numbers - a process often called 'word embedding'.

This is done by splitting an input sentence into words, and then individually converting each word into a series of numbers, which is what you will see in the demo at the link below - just converted with some magic to 3 dimensions to make it look fancy.

Similar sorts of words will have similar sorts of numbers (or positions in 3D space in the demo). As an example here, at the science festival we found a group of footballers, a group of countries, and so on.

In the demo below, you will see clouds of words processed from Wikipedia. I downloaded a bunch of page abstracts from Wikipedia in a number of different languages (source), extracted a list of words, converted them to numbers (GloVe, then UMAP), and plotted them in 3D space. Can you identify every language displayed here?


Find the demo here: https://starbeamrainbowlabs.com/labs/research-smflooding-vis/

A screenshot of the initial attract screen of the demo. A central box allows one to choose a file to load, with a large load button directly beneath it. The background is a blurred + bloomed screenshot of a point cloud from the demo itself.



If you were one of the lucky people to see my demo in person, you may notice that this online demo looks very different to the one I originally presented at the science festival. That's because the in-person demo uses data from social media, but this one uses data from Wikipedia to preserve privacy, just in case.

I hope you enjoy the demo! Time permitting, I will be back with some more posts soon to explain how I did this and the AI/NLP theory behind it at a more technical level. Some topics I want to talk about, in no particular order:

  • General technical outline of the nuts and bolts of how the demo works and what technologies I used to throw it together
  • How I monkeypatched Babylon.js's gamepad support
  • A detailed and technical explanation of the AI + NLP theory behind the demo, the things I've learnt about word embeddings while doing it, and what future research could look like to improve word embeddings based on what I've learnt
  • Word embeddings, the options available, how they differ, and which one to choose.

Until next time, I'll leave you with 2 pictures I took on the day. See you in the next post!

Edit 2023-11-30: Oops! I forgot to link to the source code....! If you'd like to take a gander at the source code behind the demo, you can find it here: https://github.com/sbrl/research-smflooding-vis

A photo of my demo up and running on a PC with a PS4 controller on a wooden desk. An Entroware laptop sits partially obscured by a desktop PC monitor, the latter of which has the demo full screen.

(Above: A photo of my demo in action!)

A photo of some piles of postcards arranged on a light wooden desk. My research is not shown, but visuals from other researchers' projects are printed, ranging from microbiology to disease research to jellyfish galaxies.

(Above: A photo of the postcards on the desk next to my demo. My research is not shown, but visuals from other researchers' projects are printed, with everything from microbiology to disease research to jellyfish galaxies.)
