PhD Update 15: Finding what works when
Hey there! Sorry this post is a bit late again: I've been unwell.
In the last post, I revisited my rainfall radar model, and shared how I had switched over to using .tfrecord
files to store my data, and the speed boost to training I got from doing that. I also took an initial look at applying contrastive learning to my rainfall radar problem. Finally, I looked briefly at dimensionality reduction algorithms - in short: use UMAP (paper).
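As a quick illustration of what the .tfrecord switch looks like in practice, reading the files back in with tf.data goes roughly like this minimal sketch (the feature names, dtypes, and compression setting here are illustrative assumptions, not my actual schema):

```python
import tensorflow as tf

def parse_example(example):
    # Hypothetical schema: each record holds 2 serialised tensors
    features = tf.io.parse_single_example(example, {
        "rainfall": tf.io.FixedLenFeature([], tf.string),
        "waterdepth": tf.io.FixedLenFeature([], tf.string),
    })
    rainfall = tf.io.parse_tensor(features["rainfall"], out_type=tf.float32)
    waterdepth = tf.io.parse_tensor(features["waterdepth"], out_type=tf.float32)
    return rainfall, waterdepth

dataset = tf.data.TFRecordDataset(
        tf.data.Dataset.list_files("dataset/*.tfrecord"),
        compression_type="GZIP"  # assumption: your files may be uncompressed
    ) \
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE) \
    .batch(32) \
    .prefetch(tf.data.AUTOTUNE)
```

The parallel map and prefetch are where most of the speed boost comes from, since the GPU no longer has to wait around for the next batch to be decoded.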
Before we continue, here's the traditional list of previous posts:
- PhD Update 1: Directions
- PhD Update 2: The experiment, the data, and the supercomputers
- PhD Update 3: Simulating simulations with some success
- PhD Update 4: Ginormous Data
- PhD Update 5: Hyper optimisation and frustration
- PhD Update 6: The road ahead
- PhD Update 7: Just out of reach
- PhD Update 8: Eggs in Baskets
- PhD Update 9: Results?
- PhD Update 10: Sharing with the world
- PhD Update 11: Answers to our questions
- PhD Update 12: Is it enough?
- PhD Update 13: A half complete
- PhD Update 14: An old enemy
There's also an intermediate post about my PhD entitled "A snapshot into my PhD: Rainfall radar model debugging", which I published between the last PhD update and this one. If you're interested in the details of the process I go through while stumbling around in the dark doing research on my PhD, do give it a read:
https://starbeamrainbowlabs.com/blog/article.php?article=posts/519-phd-snapshot.html
Since last time, I have been noodling around with the rainfall radar dataset and image segmentation models to see if I can find something that works. The results are mixed, but the reasons for this are somewhat subtle, so I'll explain those below.
Social media journal article
Last time I mentioned that I had written up my work on social media sentiment analysis models, but had yet to finalise it for publication. That process is now complete, and the paper is currently under review! It's unlikely I'll have anything more to share on this for a good 3-6 months, but know that it's a process happening in the background. The journal I've submitted it to is Elsevier's Computers & Geosciences, though of course since it's under review I don't yet know whether they will accept it.
Image segmentation, sort of
It doesn't feel like I've done much since last time, but looking back I've done a lot, so I'll summarise what I've been doing here. Essentially, the bulk of my work has been into different image segmentation models and strategies to see what works and what doesn't.
The specific difficulty here is that while I'm modelling my task - going from rainfall radar data (plus heightmap) to water depth in 2 dimensions - as an image segmentation task, it's not exactly image segmentation: the output is significantly different in nature to the input I'm feeding the model, which significantly increases the difficulty of the learning task.
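To make that difference concrete, here's roughly what the tensors involved look like (all the specific numbers here are illustrative assumptions rather than my actual dataset dimensions):

```python
import tensorflow as tf

# Illustrative shapes only - not my actual dataset dimensions
batch_size, height, width = 32, 128, 128
rainfall_timesteps = 24            # rainfall radar frames stacked as channels
channels = rainfall_timesteps + 1  # +1 channel for the heightmap

model_input = tf.zeros([batch_size, height, width, channels])
# Target: per-pixel water / no-water, one-hot encoded - a very different
# kind of information to the rainfall + heightmap input
model_target = tf.zeros([batch_size, height, width, 2])
```

In regular image segmentation the target is essentially a labelling of structures already visible in the input; here, the model has to infer where water will accumulate from information that doesn't directly depict it.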
As a consequence of this, it is not obvious which model architecture to try first, or which ones will perform well or not, so I've been trying a variety of different approaches to see what works and what doesn't. My rough plan of model architectures to try is as follows:
1. Split: contrastive pretraining
2. Mono / autoencoder: encoder [ConvNeXt] → decoder [same as #1]
3. Different loss functions:
    - Categorical Crossentropy
    - Binary Crossentropy
    - Dice
4. DeepLabV3+ (almost finished)
5. Encoder-only [ConvNeXt / ResNet / maybe Swin Transformer] (pending)
Out of all of these approaches, I'm almost done with DeepLabV3+ (#4), and #5 (encoder-only) is the backup plan.
My initial hypothesis was that a more interconnected image segmentation model such as the pre-existing PSPNet, DeepLabV3+, etc. would not be a suitable choice for this task, since regular image segmentation models like these assume the output closely mirrors the spatial structure of the input. Hence, I theorised that an autoencoder-style model would be the best place to start - especially since I've fiddled around with an autoencoder before, albeit for a trivial problem.
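To give a flavour of what I mean by approach #2, here's a minimal Keras sketch of the encoder → decoder idea (the input shape, decoder design, and layer sizes are all illustrative assumptions rather than my actual model, and it assumes a TensorFlow recent enough to ship ConvNeXt in tf.keras.applications):

```python
import tensorflow as tf

def make_model(input_shape=(128, 128, 25), num_classes=2):
    # ConvNeXt encoder; the unusual channel count rules out ImageNet weights
    encoder = tf.keras.applications.ConvNeXtTiny(
        include_top=False,
        weights=None,
        input_shape=input_shape,
    )
    x = encoder.output  # feature map downsampled 32x by the encoder
    # Simple decoder: upsample back to the input resolution
    for filters in [256, 128, 64, 32, 16]:
        x = tf.keras.layers.UpSampling2D(2)(x)
        x = tf.keras.layers.Conv2D(filters, 3, padding="same",
                                   activation="gelu")(x)
    out = tf.keras.layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return tf.keras.Model(encoder.input, out)
```

The encoder squeezes everything down to a compact representation, and the decoder then has to reconstruct an output in a completely different domain from it - which, as it turned out, is asking rather a lot.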
However, I discovered with approaches #1 and #2 that autoencoder-style models applied to this task have a tendency to get 'lost amongst the weeds' and ignore the minority class entirely.
To remedy this, I tried a different loss function called Dice, but it did not help the situation (see the intermediary A snapshot into my PhD post for details).
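For reference, Dice loss is commonly formulated something like the following generic sketch (not necessarily the exact variant I used):

```python
import tensorflow as tf

def dice_loss(y_true, y_pred, smooth=1e-6):
    # Flatten everything but the batch dimension
    y_true = tf.reshape(y_true, [tf.shape(y_true)[0], -1])
    y_pred = tf.reshape(y_pred, [tf.shape(y_pred)[0], -1])
    intersection = tf.reduce_sum(y_true * y_pred, axis=-1)
    union = tf.reduce_sum(y_true, axis=-1) + tf.reduce_sum(y_pred, axis=-1)
    # 1 - Dice coefficient; smooth avoids division by zero on empty masks
    return 1 - (2 * intersection + smooth) / (union + smooth)
```

The intuition is that it directly optimises the overlap between prediction and target, which in theory makes it more robust to class imbalance than crossentropy - in practice, as noted above, it didn't save me here.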
I ended up cutting the contrastive pretraining temporarily (#1), as it added additional complexity to the model that made it difficult to debug. In the future, when the model actually works, I intend to revisit the idea of contrastive pretraining to see if I can boost the performance of the working model at all.
If there's one thing that doing a PhD teaches you, it's to keep on going in the face of failure. I should know: my PhD has been full of failed attempts. A saying I found online (I forget who said it, unfortunately) definitely rings true here: "The difference between the novice and the master is that the master has failed more times than the novice has tried".
In the spirit of this, the next step in proving (or disproving) that this task is possible is to try a pre-existing image segmentation model and see what happens. After some research (again, see the intermediary A snapshot into my PhD post for details), I determined that DeepLabV3+ is the current state of the art for image segmentation.
After verifying that DeepLabV3+ actually works with its intended dataset, I've now just finished adapting it to take my rainfall radar (plus heightmap) dataset as input instead. It's training as I write this post, so I'll definitely have some results for next time.
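The input adaptation itself boils down to stacking everything into a single multi-channel image, something like this hedged sketch (the tensor names and axis layout are assumptions):

```python
import tensorflow as tf

def make_input(rainfall, heightmap):
    # rainfall:  [height, width, timesteps] - radar frames as channels
    # heightmap: [height, width]
    heightmap = tf.expand_dims(heightmap, axis=-1)  # -> [height, width, 1]
    return tf.concat([rainfall, heightmap], axis=-1)
```

The segmentation model itself doesn't care that the channels aren't RGB - the only structural change needed is matching the channel count of the first convolution.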
The plan from here depends on the performance of DeepLabV3+. Should it work, then I'm going to first make an excited social media post, and then try adding an attention layer to further increase performance (if I have time). CBAM will probably be my choice of attention mechanism here - inspired by this paper.
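For the curious, CBAM boils down to channel attention followed by spatial attention. Here's a rough Keras sketch of the mechanism as described in the paper (a generic reconstruction, not the implementation I'd necessarily end up using):

```python
import tensorflow as tf

def cbam(x, reduction=8):
    channels = x.shape[-1]
    # --- Channel attention: shared MLP over avg- and max-pooled features ---
    shared = tf.keras.Sequential([
        tf.keras.layers.Dense(channels // reduction, activation="relu"),
        tf.keras.layers.Dense(channels),
    ])
    avg = shared(tf.keras.layers.GlobalAveragePooling2D()(x))
    mx = shared(tf.keras.layers.GlobalMaxPooling2D()(x))
    ca = tf.sigmoid(avg + mx)[:, None, None, :]  # [batch, 1, 1, channels]
    x = x * ca
    # --- Spatial attention: 7x7 conv over channel-wise avg + max maps ---
    avg = tf.reduce_mean(x, axis=-1, keepdims=True)
    mx = tf.reduce_max(x, axis=-1, keepdims=True)
    sa = tf.keras.layers.Conv2D(1, 7, padding="same", activation="sigmoid")(
        tf.concat([avg, mx], axis=-1))
    return x * sa
```

The appeal is that it's cheap and bolts onto existing convolutional blocks without restructuring the rest of the model.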
If DeepLabV3+ doesn't work, then I'm going to go with my backup plan (#5), and quickly try training a classification-style model that takes a given area around a central point and predicts water / no water for the pixel in the centre. Ideally, I would train this with a large batch size, as this will significantly boost the speed at which the model can make predictions after training. In terms of the image encoder, I'll probably use ConvNeXt and at least one other image encoder for comparison - probably ResNet - just in case there's a bug in the ConvNeXt implementation I have (I'm completely paranoid haha).
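Sketched out, that backup plan might look something like this (the patch size, channel count, and the choice of ConvNeXtTiny are illustrative assumptions):

```python
import tensorflow as tf

def make_patch_classifier(patch_size=32, channels=25):
    # Encoder pooled to a single feature vector per patch
    encoder = tf.keras.applications.ConvNeXtTiny(
        include_top=False, weights=None, pooling="avg",
        input_shape=(patch_size, patch_size, channels),
    )
    # Binary water / no-water prediction for the patch's centre pixel
    out = tf.keras.layers.Dense(2, activation="softmax")(encoder.output)
    return tf.keras.Model(encoder.input, out)
```

Swapping in tf.keras.applications.ResNet50 for the comparison run would then be a one-line change - which is exactly the point of the paranoia check.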
Ideally though, I want to get a basic grasp on a model that works soon, and leave most of the noodling around with improving performance until later, because if at all possible it would be very cool to attend IJCAI 2023. At this point it feels unlikely that I'll be able to scrape something together for the main conference (the deadline for full papers is 18th January 2023, with abstracts due by 11th January 2023), but submitting to an IJCAI 2023 workshop is definitely achievable, I think - they usually open later on.
Long-term plans
Looking to the future, my formal (and funded) PhD research period comes to an end this month, so I will be taking on some half-time work alongside my PhD - I may publish some details of what this entails at a later date. This does not mean that I'm stopping my PhD - just that I'll be doing some related (and paid) work on the side as I finish up.
Hopefully, in 6 months' time I will have cracked this rainfall radar model and be a good way into writing up my thesis.
Conclusion
Although I've ended up doing things a bit back-to-front with this rainfall radar model (trying DeepLabV3+ first would have been the bright idea), I've been trying a selection of different model architectures and image segmentation models on my rainfall radar (plus heightmap) to water depth problem to see which ones work and which don't. While I'm still in the middle of testing these different approaches, it won't take me long to finish.
Between now and the next post in this series in 2 months' time, I plan to finish trying DeepLabV3+, and then try an encoder-only (image classification style) model should that not work out. I'm also going to pay particularly close attention to the order of my dimensions and how I crop them, as I discovered yesterday that I had mixed up the order of the width and height dimensions, feeding one of the models I've tested data in the form [batch_size, width, height, channels] instead of [batch_size, height, width, channels] as it should be.
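Thankfully the fix itself is a one-liner - the sneaky part is that with square crops the bug doesn't even show up in the tensor shapes:

```python
import tensorflow as tf

batch = tf.zeros([32, 128, 96, 25])  # oops: [batch_size, width, height, channels]
fixed = tf.transpose(batch, perm=[0, 2, 1, 3])  # swap the 2 spatial axes
print(fixed.shape)  # (32, 96, 128, 25) - [batch_size, height, width, channels]
```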
If I can possibly manage it, I'm going to begin the process of writing up my thesis by writing a paper for IJCAI 2023, because it would be very cool to get the chance to go to a real conference in person for the first time.
Finally, if anyone knows of any good resources on considerations in the design of image segmentation heads for AI models, I'm very interested. Please do leave a comment below.