Starbeamrainbowlabs

Stardust
Blog

Defining AI: Image segmentation

Banner showing the text 'Defining AI' on top on translucent white vertical stripes against a voronoi diagram in white against a pink/purple background. 3 progressively larger circles are present on the right-hand side.

Welcome back to Defining AI! This series is all about defining various AI-related terms in short-and-sweet blog posts.

In the last post, we took a quick look at word embeddings, which is the key behind how AI models understand text. In this one, we're going to investigate image segmentation.

Image segmentation is a particular way of framing a learning task for an AI model that takes an image as an input, and then instead of classifying the image into a given set of categories, it classifies every pixel by the category each one belongs to.

The simplest form of this is what's called semantic segmentation, where we classify each pixel via a given set of categories - e.g. building, car, sky, road, etc if we were implementing a segmentation model for some automated vehicle.

If you've been following my PhD journey for a while, you'll know that it's not just images that can be framed as an 'image' segmentation task: any data that is 2D (or can be convinced to pretend to be 2D) can be framed as an image segmentation task.

The output of an image segmentation model is basically a 3D map. This map will obviously have the width and height, but then also have an extra dimension for the channel. This is best explained with an image:

(Above: a diagram explaining how the output of an image segmentation model is formatted. See below explanation. Extracted from my work-in-progress thesis!)

Essentially, each value in the 'channel' dimension will be the probability that pixel is that class. So, for example, a single pixel in a model for predicting the background and foreground of an image might look like this:

[ 0.3, 0.7 ]

....if we consider these classes:

[ background, foreground ]

....then this pixel has a 30% change of being a background pixel, and a 70% change of being a foreground pixel - so we'd likely assume it's a foreground pixel.

Built up over the course of an entire image, you end up with a classification of every pixel. This could lead to models that separate the foreground and background in live video feeds, autonomous navigation systems, defect detection in industrial processes, and more.

Some models, such as the Segment Anything Model (website) have even been trained to generically segment any input image, as in the above image where we have a snail sat on top of a frog sat on top of a turtle, which is swimming in the water.

As alluded to earlier, you can also feed in other forms of 2D data. For example, this paper) predicts rainfall radar data a given number of hours into the future from a sample at the present moment. Or, my own research approximates the function of a physics-based model!


That's it for this post. If you've got something you'd like defining, please do leave a comment below!

I'm not sure what it'll be next, but it might be either staying at a high-level and looking at different ways that we can frame tasks in AI models, or I could jump to a lower level to look at fundamentals like loss (error) functions, backpropagation, layers (AI models are made up of multiple smaller layers), etc. What would you like to see next?

Tag Cloud

3d 3d printing account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blender blog bookmarklet booting bug hunting c sharp c++ challenge chrome os cluster code codepen coding conundrums coding conundrums evolved command line compilers compiling compression conference conferences containerisation css dailyprogrammer data analysis debugging defining ai demystification distributed computing dns docker documentation downtime electronics email embedded systems encryption es6 features ethics event experiment external first impressions freeside future game github github gist gitlab graphics guide hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs latex learning library linux lora low level lua maintenance manjaro minetest network networking nibriboard node.js open source operating systems optimisation outreach own your code pepperminty wiki performance phd photos php pixelbot portable privacy problem solving programming problems project projects prolog protocol protocols pseudo 3d python reddit redis reference release releases rendering research resource review rust searching secrets security series list server software sorting source code control statistics storage svg systemquery talks technical terminal textures thoughts three thing game three.js tool tutorial twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 worldeditadditions xmpp xslt

Archive

Art by Mythdael