PhD, Update 8: Eggs in Baskets
I'm back again with another PhD update blog post! Before we begin, here's a list of all the parts in the series so far:
- PhD Update 1: Directions
- PhD Update 2: The experiment, the data, and the supercomputers
- PhD Update 3: Simulating simulations with some success
- PhD Update 4: Ginormous Data
- PhD Update 5: Hyper optimisation and frustration
- PhD Update 6: The road ahead
- PhD Update 7: Just out of reach
As in the previous post, progress since last time is split into 2 parts: the Temporal CNN, and the social media side of things. I've started to split my time more evenly between the 2, as it seems the Temporal CNN is going to take a lot more work than anticipated and I'd rather not put all my eggs in 1 basket.
Temporal CNN
As you might have guessed, the Temporal CNN still isn't learning anything, but at least now I think I know why. Since last time, I've done a bunch of debugging and testing to figure out what's going wrong, and during that process I've managed to reach a record of ~20% accuracy, which at least gives me hope that it's going to work!
Specifically, I used the MNIST (alternative site) handwriting digit dataset with my "easy" task as explained in the previous post, but with a small difference: I pre-generated 2 random tensors to serve as the "below 5" and "5 and above" targets to predict, instead of a pair of tensors filled with 0s or 1s respectively. The model didn't like this at all, which is how I narrowed down what the problem is.
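For the curious, a minimal sketch of what I mean by pre-generated random targets might look something like this (the target shape here is just for illustration - in practice the targets match whatever my model's output layer produces):

```python
import numpy as np
import tensorflow as tf

# Load MNIST and normalise the images to [0, 1]
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255.0

# Pre-generate 2 fixed random tensors to act as the prediction targets.
# The (28, 28) shape is illustrative only.
rng = np.random.default_rng(seed=42)
target_below_5 = rng.random((28, 28)).astype("float32")
target_5_and_above = rng.random((28, 28)).astype("float32")

# Map each digit label to one of the 2 random targets
y_targets = np.where(
    (y_train < 5)[:, None, None],
    target_below_5,
    target_5_and_above,
)
```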
For those interested, here's the laundry list of other things I've tried since last time:
- Giving it more data (all of 2007, with the 2013 floods as validation; made things a bit worse)
- Found and fixed a bug in data normalisation that managed to sneak through during the rewrite (reduced training times a touch)
- Inverting the heightmap (helped a bit)
- Making the model deeper (gave me a full 5 percentage point accuracy increase, from 15% to 20%!)
Knowing what the problem is, though, is 1 thing; solving it is another matter entirely. Thankfully, my supervisor and I have a plan: we're going to look into taking a modified version of the latter half of a variational autoencoder and squidging it onto the tail end of the Temporal CNN. If it works, then I imagine we'll need a new name for the Temporal CNN (suggestions?), but I'll tackle that once I've finished revising the model.
For context, a variational autoencoder is a modified version of a "vanilla" autoencoder, and is 1 of the 2 main classes of generative AI model architecture - the other being the Generative Adversarial Network (GAN). In contrast to a GAN, a variational autoencoder does image-to-image translation with a single model, mapping an input parameter space onto an output parameter space. It first encodes the input into a smaller tensor of features, before upscaling that back into an image again. In this fashion, it can learn to translate between 2 different kinds of image - for example putting glasses on people's faces.
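Just to make the encoder / decoder shape of the thing concrete, here's a rough sketch of a plain convolutional autoencoder in Keras. The layer sizes are arbitrary assumptions, and a proper variational version adds a sampling layer and a KL-divergence loss term on top, which I've left out for brevity:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Encoder: squash a 28x28 image down into a small tensor of features
encoder = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2D(64, 3, strides=2, padding="same", activation="relu"),
    layers.Flatten(),
    layers.Dense(16, activation="relu"),  # the compressed feature tensor
], name="encoder")

# Decoder: upscale the feature tensor back into an image
decoder = tf.keras.Sequential([
    tf.keras.Input(shape=(16,)),
    layers.Dense(7 * 7 * 64, activation="relu"),
    layers.Reshape((7, 7, 64)),
    layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu"),
    layers.Conv2D(1, 3, padding="same", activation="sigmoid"),
], name="decoder")

autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
```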
To do this, I'm going to implement a vanilla variational autoencoder on the MNIST dataset first. Once that's working, I'll lift part of the model structure and transplant it onto the top of my existing Temporal CNN - by doing it this way, I'll be starting from a known-good model that is definitely capable of image-to-image translation.
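The transplant itself would be a functional-API job - something vaguely like the following, though everything here is a placeholder since I haven't settled on the final architecture yet. The feature-extractor model stands in for the front half of my Temporal CNN, and the decoder is the one from the sketch above:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical stand-in for the feature-extracting half of the Temporal CNN -
# the real thing is deeper; the only requirement here is that its output
# shape matches what the decoder (from the earlier sketch) expects.
temporal_cnn_features = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 128, 2)),  # placeholder input shape
    layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(16, activation="relu"),  # must match the decoder's input size
], name="temporal_cnn_features")

# Graft the decoder onto the end of the feature extractor
inputs = tf.keras.Input(shape=(128, 128, 2))
outputs = decoder(temporal_cnn_features(inputs))
combined = tf.keras.Model(inputs, outputs)
```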
Social Media
In other news, I've started to make some real progress on the social media side of things. I've downloaded and anonymised some tweets (the code for which is open source on npm under the package name twitter-academic-downloader - I intend to write a separate blog post about it at some point soon-ish), and I've also put together an LSTM-based model to start looking at doing some text classification.
I decided to implement said model in Python instead of Javascript, because from what I can tell Tensorflow.js doesn't come with as many batteries included as Tensorflow for Python does for natural language processing tasks. This has caused some interesting adventures (and a number of frustrating crashes), but I think I'm starting to get the hang of it.
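For reference, the LSTM-based model in question is nothing fancy at the moment - it's along these lines, though the vocabulary size, embedding dimension, and sequence length here are placeholder assumptions rather than the real hyperparameters:

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 10_000      # placeholder assumptions, not the real values
EMBED_DIM = 100
SEQUENCE_LENGTH = 100

model = tf.keras.Sequential([
    tf.keras.Input(shape=(SEQUENCE_LENGTH,)),
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.LSTM(100),                       # single LSTM layer, 100 units
    layers.Dense(1, activation="sigmoid"),  # binary classification output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```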
In particular it's interesting coming from Tensorflow.js (which is a later project), because Tensorflow for Python feels much less cohesive and more disjointed as a library. Tensorflow.js has learnt and applied lessons from the Python implementation, resulting in a much more cohesive and well thought out API. A prime example of this is tf.data.Dataset vs tf.keras.utils.Sequence in the Python version - 2 overlapping ways of feeding data to a model - which isn't an issue in Tensorflow.js, where the Javascript bindings have a single tf.data.Dataset API.
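To illustrate what I mean, here's a simplified sketch of the 2 overlapping approaches the Python version gives you for feeding data to a model (the random data is just for the example):

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(1000, 10).astype("float32")
y = np.random.randint(0, 2, 1000)

# Option 1: the tf.data API
dataset = tf.data.Dataset.from_tensor_slices((x, y)).shuffle(1000).batch(32)

# Option 2: a Keras Sequence subclass, which does much the same job
class MySequence(tf.keras.utils.Sequence):
    def __init__(self, x, y, batch_size=32):
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, i):
        sl = slice(i * self.batch_size, (i + 1) * self.batch_size)
        return self.x[sl], self.y[sl]

# model.fit() happily accepts either, which is part of what makes the API
# feel disjointed - Tensorflow.js just has the one tf.data.Dataset.
```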
This aside, my next step here is to train a significantly larger model than the mini single-layer, 100-unit one I've been using for testing purposes (that's my task for this afternoon - and I'll likely have done it by the time you're reading this post).
In terms of literature, I've read a bunch more papers on the subject since last time - but I still feel like I've got more to read. Recently I read a series of papers about word embeddings (converting words into numerical tensors), which was very interesting. The process has evolved over the years, from a simple dictionary mapping incrementing numbers to words, to training an AI to generate said representations in increasingly sophisticated ways (starting with word2vec, then moving on to - in no particular order - ELMo, GloVe, and finally BERT; transformers are pretty incredible models). It was a fascinating read - I can recommend it to anyone who's interested in natural language processing (along with this excellent post).
In the model I've implemented, I ultimately decided to go with GloVe (Global Vectors for Word Representation), as the pre-trained model is simply a text file containing a lookup table that one can read into a dictionary or hash table.
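Loading it really is that simple - something along these lines, where glove.6B.100d.txt is one of the standard pre-trained files GloVe is distributed as:

```python
import numpy as np

def load_glove(filepath):
    """Read a GloVe .txt file into a dict mapping word -> embedding vector."""
    embeddings = {}
    with open(filepath, "r", encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            # Each line is a word followed by its vector components
            embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embeddings

glove = load_glove("glove.6B.100d.txt")
print(glove["flood"].shape)  # (100,) for the 100-dimensional file
```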
Conclusion
Things have been moving forwards - albeit slowly. I've got an idea as to how I can resolve the issues I've been facing with the Temporal CNN (pending a new name once I'm done with all the modifications and I know what the model architecture is going to be like), though it's going to take a lot of work.
Things are finally starting to move in social media land - hopefully the accuracy of the LSTM-based model will be higher than that of the mini model I trained, which was only 50% on a balanced dataset - no better than blind guessing!
See you again in 2 months or so, when hopefully I'll have some real results to show (though of course I'll be keeping up with weekly posts about other things in the meantime). If you have any comments or questions about any of this - please leave a comment below! I'd love to hear your thoughts.
Sources and further reading
- Intuitively Understanding Variational Autoencoders
- Understanding Variational Autoencoders (VAEs)
- What is "Word Embedding"
- A Beginner's Guide to Word2Vec and Neural Word Embeddings
- ELMo (paper)
- GloVe: Global Vectors for Word Representation
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- The Unreasonable Effectiveness of Recurrent Neural Networks