My Hull Science Festival Demo: How do AIs understand text?

Banner showing gently coloured point clouds of words against a dark background on the left, with the Humber Science Festival logo, fading into a screenshot of the attract screen of the actual demo on the right.

Hello there! On Saturday 9th September 2023, I was on the supercomputing stand for the Hull Science Festival with a cool demo illustrating how artificial intelligences understand and process text. Since then, I've been hard at work tidying that demo up, and today I can announce that it's available to view online here on my website!

This post is a general high-level announcement. A series of technical posts will follow on the nuts and bolts of both the theory behind the demo and the actual code itself and how it's put together, because it's quite interesting and I want to talk about it.

I've written this post to serve as a foreword / quick explanation of what you're looking at (similar to the explanation I gave in person), but if you're impatient you can skip straight to the demo via the link below.

All AIs currently developed are essentially complex parametrised mathematical models. We train these models by updating their parameters little by little until the output of the model closely matches the associated ground truth label.
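To make that concrete, here's a minimal sketch of that idea in Python (an illustration I've written just for this post - not the actual model behind the demo!). It 'trains' a model with a single parameter w so that y = w * x matches the ground truth y = 2x:

# Ground truth data: y = 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0              # the model's single parameter
learning_rate = 0.01

for epoch in range(200):
    for x, y_true in zip(xs, ys):
        y_pred = w * x                        # the model's output
        gradient = 2 * (y_pred - y_true) * x  # slope of the squared error
        w = w - learning_rate * gradient      # nudge the parameter a little

print(w)  # converges towards 2.0

Real models work just like this, only with millions (or billions) of parameters instead of one.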

In other words, an AI is just a bunch of maths. So how does it understand text? The answer lies in converting text to numbers - a process often called 'word embedding'.

This is done by splitting an input sentence into words, and then individually converting each word into a series of numbers - which is what you will see in the demo at the link below, just converted with some magic down to 3 dimensions to make it look fancy.

Similar sorts of words will have similar sorts of numbers (or positions in 3D space in the demo). For example, at the science festival we found a group of footballers, a group of countries, and so on.

In the demo below, you will see clouds of words processed from Wikipedia. I downloaded a bunch of page abstracts from Wikipedia in a number of different languages (source), extracted a list of words, converted them to numbers (GloVe → UMAP), and plotted them in 3D space. Can you identify every language displayed here?
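If you're curious what that pipeline might look like in code, here's a minimal sketch in Python using the gensim and umap-learn packages (not the demo's actual code - that's linked at the end of this post):

import gensim.downloader
import umap  # from the umap-learn package

words = ["france", "germany", "japan", "football", "cricket", "cheese"]

# Convert each word into 50 numbers using pretrained GloVe vectors
glove = gensim.downloader.load("glove-wiki-gigaword-50")
vectors = [glove[word] for word in words]

# Squash the 50 dimensions down to 3 so the words can be plotted in 3D
# (a real run uses thousands of words, hence the tiny n_neighbors here)
points = umap.UMAP(n_components=3, n_neighbors=3).fit_transform(vectors)

for word, point in zip(words, points):
    print(word, point)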


Find the demo here: https://starbeamrainbowlabs.com/labs/research-smflooding-vis/

A screenshot of the initial attract screen of the demo. A central box allows one to choose a file to load, with a large load button directly beneath it. The background is a blurred + bloomed screenshot of a point cloud from the demo itself.



If you were one of the lucky people to see my demo in person, you may notice that this online demo looks very different to the one I originally presented at the science festival. That's because the in-person demo uses data from social media, but this one uses data from Wikipedia to preserve privacy, just in case.

I hope you enjoy the demo! Time permitting, I will be back with some more posts soon to explain how I did this and the AI/NLP theory behind it at a more technical level. Some topics I want to talk about, in no particular order:

  • General technical outline of the nuts and bolts of how the demo works and what technologies I used to throw it together
  • How I monkeypatched Babylon.js's gamepad support
  • A detailed and technical explanation of the AI + NLP theory behind the demo, the things I've learnt about word embeddings while doing it, and what future research could look like to improve word embeddings based on what I've learnt
  • Word embeddings, the options available, how they differ, and which one to choose.

Until next time, I'll leave you with 2 pictures I took on the day. See you in the next post!

Edit 2023-11-30: Oops! I forgot to link to the source code....! If you'd like to take a gander at the source code behind the demo, you can find it here: https://github.com/sbrl/research-smflooding-vis

A photo of my demo up and running on a PC with a PS4 controller on a wooden desk. An Entroware laptop sits partially obscured by a desktop PC monitor, the latter of which has the demo full screen.

(Above: A photo of my demo in action!)

A photo of some piles of postcards arranged on a light wooden desk. My research is not shown, but visuals from other researchers' projects are printed, ranging from microbiology to disease research to jellyfish galaxies.

(Above: A photo of the postcards on the desk next to my demo. My research is not shown, but visuals from other researchers' projects are printed, with everything from microbiology to disease research to jellyfish galaxies.)

I've submitted a paper on my rainfall radar research to NLDL 2024!

A screenshot of the nldl.org conference website.

(Above: A screenshot of the NLDL website)

Hey there! I'm excited to say that last week I submitted a paper to what I hope will be my very first conference! I've attended the AAAI-22 doctoral consortium online, but I haven't had the opportunity to attend a conference in person until now. Of course, I had to post about it here.

First things first: which conference have I chosen? With the help of my supervisor, I chose the Northern Lights Deep Learning Conference. It's relatively close to the UK (where I live), it's relevant to my area and the paper I wanted to submit (which I've been working on since ~July/August 2023), and the deadline wasn't too tight. There were a few other conferences I was considering, but they either had really awkward deadlines (sorry, HADR! I've missed you twice now), or got moved to an unsafe country (IJCAI → China).

The timeline is roughly as follows:

  • ~early - ~mid November 2023: acceptance / rejection notification
  • somewhere in the middle: paper revision time
  • 9th - 11th January 2024: conference time!

Should I get accepted, I'll be attending in person! I hope to meet some cool new people in the field of AI/machine learning and have lots of fascinating discussions about the field.

As longtime readers of my blog here might have guessed, the paper I've submitted is on my research using rainfall radar data and abusing image segmentation to predict floods. The exact title is as follows:

Towards AI for approximating hydrodynamic simulations as a 2D segmentation task

As the paper is unreviewed, I don't feel comfortable releasing it publicly yet. However, feel free to contact me if you'd like to read it - I'm happy to share a copy of the unreviewed paper individually.

Most of the content has already been covered quite casually in my PhD update blog post series (16 posts in the series so far - easily my longest series by now!); the paper just explains it in more formal language.

This paper will also form the foundation of the second of two big meaty chapters of my thesis, the first being based on my social media journal article. I'm currently at 80 pages of thesis (including appendices, excluding the bibliography; single-spaced A4), and I still have a little way to go before it's done.

I'll be back soon with another PhD update blog post with more details about the thesis writing process and everything else I've been up to over the last 2 months. I may also write a post on the Hull Science Festival, which I'll be attending on the supercomputing stand with a Cool Demo™, 'cause the demo is indeed very cool.

See you then!

I've got some business cards!

My new business cards!

I've been to several events of various natures now (like the Hardware Meetup), and at each one I've found that a 'business card' or two would be really handy to give to people so that they can remember the address of this website.

After about a week of fiddling, Mythdael and I came up with the design you see above. Personally, I'm really pleased with them, so I decided to post here to show them off :-)

I'm finding that it's a rather good idea to promote and build your brand at these kinds of events by showing people the cool things that you've created and learnt, and business cards seem to be just the thing that helps you do it.

A close up of the front and back of my new business cards.

My robot works!

The Hull Pixelbot 2.0!

I went to the hardware meetup again today. For the last 2 meetups I've either managed to forget or been on holiday (probably both...), but I remembered this time around and decided to go. I'm glad I did, because, with Rob's help, I fixed my robot! It is now trundling around on the floor quite happily :D

The problem was that I assumed the digital pin numbers on the Wemos D1 R2 were the same as the GPIO pin numbers, which apparently isn't the case. Here's a table:

Digital pin    GPIO pin
D0             GPIO 16
D1             GPIO 5
D2             GPIO 4
D3             GPIO 0
D4             GPIO 2
D5             GPIO 14
D6             GPIO 12
D7             GPIO 13
D8             GPIO 15

(Table from wemos.cc)

Very odd numbering - it's rather frustrating actually! Anyway, there seemed to be quite a lot more people there this time compared to last time - it looks like the idea is taking off, which I'm very glad about :-)

If you're in the Hull area and interested in hardware, you should seriously consider coming along. There's a page on meetup.com that contains the details of the next meetup, but the general rule of thumb is that it happens at C4DI on the first and third Thursday of every month, 6pm - 8pm.

Before I forget, I'm going to end this post off with the modified version of Rob's Hull Pixelbot dance code that works on the Wemos:


(Git Repo, Raw)

I built a robot!

The robot I built!

The day before yesterday we had another hardware meetup at C4DI, and I built a robot! Now I just have to figure out how to program it...

It's one of Rob Miles' Hull Pixelbots, with a Wemos D1 R2 on the top, two stepper motors and their driver boards mounted in the middle, and a battery box on the bottom.

The wheels are a little wonky, but we'll sort that out next time :) For now, I'm going to have some fun making it run around on the floor :D

The Hull Pixelbot Meetup

Rob's Hull Pixelbot

(Above: Rob's WiFi-enabled Pixelbot.)

Today Rob Miles was kind enough to give me a lift to the monthly hardware (or Hull Pixelbot) meetup. It was different to what I'm used to, but it was rather fun actually!

Rob Miles has built a kit that gives you the parts to build your very own Arduino-powered robot that trundles around on the floor. He's managed to add a WiFi chip to it too - so you can (provided you write the code) connect to your Pixelbot and control it remotely!

You can build your own by going to hullpixelbot.com.

I'll certainly be playing around with it and attending the next meetup (meetups are on the first Thursday of every month at 6:00pm at C4DI).

Portfolios are important

Attending the Game Development conference for students at Hull University gave me a little more of an idea as to what companies are looking for in prospective graduates (and, more importantly in my case, interns) that they are thinking of hiring. The thing that came across as most important is the idea of an up-to-date portfolio. If you haven't come across one of these before, a portfolio is basically a showcase of everything that you've done, presented in a manner that is pleasing to the eye.

In my case, my portfolio is my website, so I've just spent half an hour or so updating it to reflect my current projects and accounts (I've opened an account on Codepen). You should do this too, and if you haven't got a portfolio set up, you can create one for free with GitHub Pages. If you're feeling particularly adventurous, you could also create a blog using Jekyll - GitHub Pages supports this too, and it lets you write blog posts as markdown documents (like I do for this blog, although I wrote my blog engine myself), automatically transforming them into blog posts on your website. You can even use the GitHub web interface to do everything here!
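To give you an idea, here's a minimal sketch of what a Jekyll blog post looks like (the layout name depends on the theme you choose). You'd save it as something like _posts/2016-03-01-my-first-post.md, and GitHub Pages would build it into a page on your site automatically:

---
layout: post
title: "My first post"
---

Hello, world! This post is written in **markdown**, and Jekyll
turns it into an HTML page automatically.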

If you comment below or get in touch with me in some other manner with a link to your portfolio, I might feature a selection here on this blog.

Learning Prolog: Lab Session #6 - More Looping

The new learning prolog banner!

Fireworks

Before I get into today's post, I wanted to include a picture of some fireworks that I managed to photograph a few days ago on my phone. I heard them outside, and when I took a look out of the window, this is what I saw!

Anyway, today's Prolog lab session was about more loops. These are different to the failure-driven loops that we did last time, though, in that they are recursive instead. Here's an example:

loopa :-
    write('Enter \'end.\' to exit.'), nl,
    read(Input),
    % Either the user typed end (stop), or loop round again
    (Input = end ; loopa).

The above is a simple rule that keeps asking for user input until the user enters 'end'.

?- loopa.
Enter 'end.' to exit.
|: hello.
Enter 'end.' to exit.
|: cheese.
Enter 'end.' to exit.
|: end.
true.

This is done through the semicolon, which acts as an or operator. Although this works, it isn't very readable. Let's try something else.

% Stopping condition
loopb(end).
% Main loop
loopb(Input) :-
    write('Enter \'end.\' to exit.'),
    read(Input2),
    loopb(Input2).

% Starter rule
go :-
    loopb(something).

Much better. The above is divided up into the stopping condition (i.e. when the user types end), the main loop, and a rule that kicks the rest of the program off. When go is called, we immediately call loopb with something as a parameter. Since this doesn't match the fact loopb(end), Prolog looks at the rule we defined, loopb(Input). That rule asks the user for input, and starts the process over again until the user inputs end.

Here's an example of the above in action:

?- go.
Enter 'end.' to exit.gdsfgdrt.
Enter 'end.' to exit.sdfgsdgf.
Enter 'end.' to exit.sdfgsdfgertr.
Enter 'end.' to exit.cheese.
Enter 'end.' to exit.end.
true.

Having learnt this, one of the challenges was to write a factorial rule in Prolog. The factorial of a positive integer is that integer multiplied by all of the positive integers below it. For example, 4! (4 factorial) is 4x3x2x1, which is 24. Here's what I came up with:

factorial(1, 1).
factorial(N, Factorial) :-
    N > 1,    % guard: stops Prolog recursing forever below 1 on backtracking
    N2 is N - 1,
    factorial(N2, Ans),
    Factorial is N * Ans.

And here's an example of the above in action:

?- factorial(6, Answer).
Answer = 720 .

That concludes this post on the 6th Prolog lab session. If you don't understand something (or I've made a mistake!) please post a comment below and I'll do my best to help you. Also, if you have any, post your firework pictures below. I'd be interested to see them!
