Stardust | Starbeamrainbowlabs

AI encoders demystified

When beginning the process of implementing a new deep learning / AI model, alongside deciding on the input and outputs of the model and converting your data to tfrecord files (file extension: .tfrecord.gz; you really should, the performance boost is incredible) is designing the actual architecture of the model itself. It might be tempting to shove a bunch of CNN layers at the problem to make it go away, but in this post I hope to convince you to think again.

As with any field, extensive research has already been conducted, and when it comes to AI that means that the effective encoding of various different types of data with deep learning models has already been carefully studied.

To my understanding, there are 2 broad categories of data that you're likely to encounter:

Images: photos, 2D maps of sensor readings, basically anything that could be plotted as an image
Sequenced data: text, sensor readings with only a temporal and no spatial dimension, audio, etc

Sometimes you'll encounter a combination of these two. That's beyond the scope of this blog post, but my general advice is:

If it's 3D, consider it a 2D image with multiple channels to save VRAM
Go look up specialised model architectures for your problem and how well they work/didn't work
Use a Convolutional variant of a sequenced model or something
Pick one of these encoders and modify it slightly to fix your use case

In this blog post, I'm going to give a quick overview of the various state of the art encoders for the 2 categories of data I've found on my travels so far. Hopefully, this should give you a good starting point for building your own model. By picking an encoder from this list as a starting point for your model and reframing your problem just a bit to fit, you should find that you have not only a much simpler problem/solution, but also a more effective one too.

I'll provide links to the original papers for each model, but also links to any materials I've found that I found helpful in understanding how they work. This can be useful if you find yourself in the awkward situation of needing to implement them yourself, but it's also generally good to have an understanding of the encoder you ultimately pick.

I've read a number of papers, but there's always the possibility that I've missed something. If you think this is the case, please comment below. This is especially likely if you are dealing with audio data, as I haven't looked into handling that too much yet.

Finally, also out of scope of this blog post are the various ways of framing a problem for an AI to understand it better. If this interests you, please comment below and I'll write a separate post on that.

Images / spatial data

ConvNeXt: Image encoders are a shining example of how a simple stack of CNNs can be iteratively improved through extensive testing, and in no encoder is this more apparent than in ConvNeXt.

In 1 sentence, a ConvNeXt model can be summarised as "a ResNet but with the features of a transformer". Researchers took a bog-standard ResNet, and then iteratively tested and improved it by adding various features that you'd normally find in a transformer, which results in the model performant (*read: had the highest accuracy) encoder when compared to the other 2 on this list.

ConvNeXt is proof that CNN-based models can be significantly improved by incorporating more than just a simple stack of CNN layers.

Paper: A ConvNet for the 2020s
Explanation: ConvNext: The Return Of Convolution Networks
Implementation: https://github.com/leanderme/ConvNeXt-Tensorflow

Vision Transformer: Born from when someone asked what should happen if they tried to put an image into a normal transformer (see below), Vision Transformers are a variant of a normal transformer that handles images and other spatial data instead of sequenced data (e.g. sentences).

The current state of the art is the Swin Transformer to the best of my knowledge. Note that ConvNeXt outperforms Vision Transformers, but the specifics of course depend on your task.

Paper: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Explanation: Swin Transformer: Hierarchical Vision Transformer using Shifted Windows - Committed towards better future
Implementation: https://github.com/microsoft/Swin-Transformer (PyTorch, if you know of a simple Tensorflow implementation please let me know)

ResNet: Extremely popular, you may have heard of the humble ResNet before. It was originally made a thing when someone wanted to know what would happen if they took the number of layers in a CNN-based model to the extreme. It has a 'residual connection' between blocks of layers, which avoids the 'vanishing gradient problem' which in short is when your model is too deep, and when it backpropagates the error the gradient it adjusts the weight by becomes so small that it has hardly any effect.

Since this model was invented, skip connections (or 'residual connections', as they are sometimes known) have become a regular feature in all sorts of models - especially those deeper than just a couple of layers.

Despite this significant advancement though, I recommend using a ConvNeXt encoder instead of images - it's better than and the architecture more well tuned than a bog standard ResNet.

Sequenced Data

Transformer: The big one that is quite possibly the most famous encoder-decoder ever invented. The Transformer replaces the LSTM (see below) for handling sequenced data. It has 2 key advantages over an LSTM:

It's more easily paralleliseable, which is great for e.g. training on GPUs
The attention mechanism it implements revolutionises the performance of like every network ever

The impact of the Transformer on AI cannot be overstated. In short: If you think you want an LSTM, use a transformer instead.

Paper: Attention is all you need
Explanation: The Illustrated Transformer
Implementation: Many around, but I have yet to find a simple enough one, so I implemented it myself from scratch. While I've tested the encoder and I know it works, I have yet to fully test the decoder. Once I have done this, I will write a separate blog post about it and add the link here.

LSTM: Short for Long Short-Term Memory, LSTMs were invented in 1997 to solve the vanishing gradient problem, in which gradients when backpropagating back through recurrent models that handle sequenced data shrink to vanishingly small values, rendering them ineffective at learning long-term relationships in sequenced data.

Superceded by the (non recurrent) Transformer, LSTMs are a recurrent model architecture, which means that their output feeds back into themselves. While this enables some unique model architectures for learning sequenced data (exhibit A), it also makes them horrible at being parallelised on a GPU (the dilated LSTM architecture attempts to remedy this, but Transformers are just better). The only thing that stands them apart from the Transformer is that the Transformer is somehow not built-in into Tensorflow as standard yet, whereas LSTMs are.

Just like Vision Transformers adapted the Transformer architecture for multidimensional data, so too are Grid LSTMs to normal LSTMs.

Paper: Long Short-Term Memory
Explanation: Understanding LSTMs,
Implementation: Tensorflow: tf.keras.layers.LSTM, PyTorch: torch.nn.LSTM

Conclusion

In summary, for images encoders in priority order are: ConvNeXt, Vision Transformer (Swin Transformer), ResNet, and for text/sequenced data: Transformer, LSTM.

We've looked at a small selection of model architectures for handling a variety of different data types. This is not an exhaustive list, however. You might have another awkward type of data to handle that doesn't fit into either of these categories - e.g. specialised models exist for handling audio, but the general rule of thumb is an encoder architecture probably already exists for your use-case - even if I haven't listed it here.

Also of note are alternative use cases for data types I've covered here. For example, if I'm working with images I would use a ConvNeXt, but if model prediction latency and/or resource consumption mattered I would consider using a MobileNet](https://www.tensorflow.org/api_docs/python/tf/keras/applications/mobilenet_v3), which while a smaller model is designed for producing rapid predictions in lower resource environments - e.g. on mobile phones.

Finally, while these are encoders decoders also exist for various tasks. Often, they are tightly integrated into the encoder. For example, the U-Net is designed for image segmentation. Listing these is out of scope of this article, but if you are getting into AI to solve a specific problem (as is often the case), I strongly recommend looking to see if an existing model/decoder architecture has been designed to solve your particular problem. It is often much easier to adjust your problem to fit an existing model architecture than it is to design a completely new architecture to fit your particular problem (trust me, I've tried this already and it was a Bad Idea).

The first one you see might even not be the best / state of the art out there - e.g. Transformers are better than the more widely used LSTMs. Surveying the landscape for your particular task (and figuring out how to frame it in the first place) is critical to the success of your model.

Encryption demystified: What to use and when

The other day, I found myself explaining different types of encryption, how they work, and what they are used for to someone in my lab implementing a secure system. During this process, I ended up creating a series of fancy diagrams in draw.io - so I thought I'd write it up into a proper demystification blog post.

To start us off here, let's define encryption. Encryption is the process of transforming a given input block of data (of an arbitrary data) using some kind of secret key into a form that is then completely unreadable. Any adversary obtaining a block of encrypted data encrypted with a suitably strong key (and algorithm) is not able to read or understand the data at all - except perhaps infer its original length.

Conversely, decryption is the process of undoing the encryption process with the same (or different, in some cases) key to get back the original data.

For purpose of this blog post, we will assume:

The encryption algorithms in question are perfect with no known weaknesses
Keys used to encrypt and/or decrypt are very strong and can't be cracked

Each of these are fields in their own right that could quite easily take many blog posts to fully explore.

From the perspective of a developer, there are 3 different basic places one needs to aware of. Others certainly exist, but to avoid making this post too long I'll just be covering the following 3:

Device encryption
Transport layer encryption
End-to end encryption

If there's any other encryption scheme you'd like me to cover, please leave a comment below and I'll try my best to explain it in a separate post.

Device encryption

First up is device encryption. Most modern operating systems for phones and PCs alike support device encryption:

Windows
Linux
Android
iOS

Not sure on macOS since I don't own one, but I'd be surprised if it didn't. The purpose of device encryption is that when the device is powered off, all data is stored physically on disk in an encrypted format, making it unreadable should the device be physically stolen - thereby protecting all data stored on it.

This is accomplished in a layered fashion. Let's explain it with a diagram:

A vertical layered diagram explaining device encryption. Physical block devices on the bottom, software applications on the top.

Although they may have different names for it, most operating systems back a concept of a "block device". Such a device is capable of storing a given number of bytes of data. Such devices need not be physical disks: they can instead be virtual. For example, zram presents block devices that store data compressed in RAM.

We can make use of this to encrypt hard drives. An encryption layer such as LUKS on Linux presents a virtual block device to the operating system which encrypts all data written to it before saving them back to some physical disk by which it is backed.

On boot, the encryption layer is initialised by the operating system and it asks the user for a password. Upon being given the correct password, the encryption layer is activated, and the operating system can then both request data blocks from the virtual block device (which causes the encryption layer to fetch the encrypted block from disk and then decrypt it before passing it to the requester) and write data blocks back to the virtual block device (whereby the encryption layer will encrypt the new data block before writing it to disk).

Even operating systems such as Windows (e.g. Bitlocker) and iOS which don't expose block devices in the same way as Linux does, the same principles I've explained here apply.

When the device is powered off, the key that was being stored in memory is wiped (it's stored in RAM, and RAM requires power to store data) and the data is secured.

Transport layer encryption

Another place encryption is commonly encountered in when transferring data to and from remote hosts over the Internet. Since the Internet is untrusted, it becomes rather a problem when one wants to transfer personal information such as passwords, bank card numbers, and location information across the Internet, in that such data could be stolen or modified in transit.

To solve this problem, the Transport Layer Security (TLS) protocol was invented. The purpose of TLS is to provide a secure connection between 2 hosts using authenticated encryption that has the following properties:

Eavesdroppers are unable to read data being transmitted
Attackers are unable to successfully modify any data in transit without the destination host knowing about it
The 2 hosts communicating with each other can verify each other's identities ¹

Although TLS itself is a protocol that is usually spoken over TCP, because it provides a generic bidirectional pipe through which any binary data can be transmitted and received, it is commonly used to wrap around other protocols to secure them. Examples include:

HTTP: Hypertext Transfer Protocol (used in web browsers)
SMTP: Simple ² Mail Transfer Protocol (used for sending and receiving emails)
IMAP: Internet Message Access Protocol (used for accessing email inboxes)
XMPP: Extensible Messaging and Presence Protocol (a federated messaging protocol used for instant messaging) ³

....and many others. There's a reason it's so prevalent: The most important rule when dealing with encryption and security is to never roll your own. Follow the standards, and use existing crypto libraries for your platform. Don't implement your own, as it's much more difficult than it appears to ensure your system is actually secure.

Here's a diagram of how it works:

End-to-end encryption

The last form of encryption I'm going to talk about is also perhaps the most misunderstood: end-to-end encryption.

End-to-end encryption is useful when you have 3 parties involved in the equation - usually 2 clients and a server. Suppose Alice and Bob have a messaging app on their phone that sends messages through an intermediary server (perhaps performing store-and-forward functions), but they do not want the server to be able to read their message. The solution here is end-to-end encryption, which prevents the intermediary server from being able to read the message.

Here's a diagram to explain what I mean:

End-to-end encryption is accomplished by using asymmetric cryptography. Asymmetric encryption - unlike symmetric encryption uses 2 keys instead of 1, and these keys also have to possess special properties, so you can't just generate some cryptographically secure random numbers and call it a day ⁴.

In asymmetric encryption, you have a public key which can only encrypt data, and a private key which can then decrypt the data. An example of this in practice is GPG, which is extensively used e.g. by apt (the package manager on some Linux systems).

In the diagram above, the sender encrypts the message with the public key that belongs to the receiver. They then send the message to the server, who forwards it on to the receiver. The receiver then decrypts the message with the private key (sometimes called a secret key).

In this way, the server is never able to read the content of the message. If the receiver wanted to reply to the sender, the same would happen in reverse. The receiver would need to ask the sender to securely transmit their public key to them, which they could then use to encrypt a message to send back.

In practice, every client involved in an end-to-end encryption system will generate their own keypair that consists of a public and a private key. They will then advertise their public key to everyone, allowing anyone to encrypt a message that only they can decrypt (an example of this: my GPG key can be found here).

It is important to avoid confusing end-to-end encryption with transport layer encryption. Indeed, end-to-end encryption is absolutely no substitute for transport layer encryption, because an application may for example need to authenticate with the intermediary server before being allowed to transmit end-to-end encrypted messages.

Transport layer encryption:

Allows 2 parties to communicate with each other securely
Does not prevent the receiver from reading received data (even if device encryption is employed)

End-to-end encryption:

Requires 3 parties to be involved in order to be effective
Ensures that 2 parties can communicate securely through an intermediary party
Requires that 2 parties wishing to communicate must first securely exchange their public keys and be confident that the public keys they have received actually belong to the other party they wish to communicate with
Can be significantly complicated to implement

Conclusion

In this post, we've looked at 3 types of encryption, how they work, and when they are useful. To summarise:

Device encryption protects data from physical theft
Transport layer encryption protects data in transit between 2 communicating parties talking to each other directly
End-to-end encryption protects the communications of 2 parties who are talking through 1 or more intermediary parties

Each of these are useful in different situations - and most likely are already solved problems. Do not implement any of these yourself. Use well known, battle tested libraries and programs for your platform that are regularly receiving updates instead.

While I've simplified this a lot in writing this post (we'd be here all week if I didn't!), I hope you've found this helpful (or even if you're still confused). This is a starting point, not an ending point - if this kind of thing interests you I can recommend researching it further and playing around with some practical implementations thereof.

Please do comment below (especially if you've spotted a mistake)! It's very motivating to hear that the things I write here are actually helpful to people.

In TLS, this is done using certificates. Each host has a list of certificate authorities (CAs) it trusts, and when a connection is initiated between a client and a server during the handshake certificates signed by these CAs are exchanged securely and checked. In practice, generally only the server sends a certificate which is then checked by the client - for example in HTTPS in web browsers. Server-to-server connections in a federated system (e.g. email) however give an opportunity to put this mutual authentication into action though. ↩
SMTP is not simple. While it was simple once upon a time, unfortunately it was not designed with the modern web and security in mind (given that it was first invented in 1981, I'm not surprised). Since it was invented, a large number of additions (both standardised and otherwise) have been adopted, significantly complicating it. Setting up a mail server correctly and ensuring your emails are delivered properly is not a simple task. ↩
See Snikket for a server, and Conversations for an Android client. See also the full client list. ↩
Use a crypto library like your programming language's crypto built-ins. If your language doesn't have a built-in module and you've tried checking your package manager, try libsodium, bearssl, or openssl. ↩

Servers demystified

Something I see a lot of around the Internet are people who think that you need to purchase a big (often rack-mounted) "server" in order to host things like websites, email, game servers, and more (exhibit a). Quite often, they turn to ebay to purchase used enterprise rack mounted servers too.

I want to take a moment here to write up my thoughts here on why that is almost never the correct approach for a home user to take to host such applications at home, and what the (much better) alternatives are to serve as a reference post I can direct people to who need educating about this important issue.

What is a "server"?

A server can mean 2 things: a physical computer whose primary role is to act as a server, and server applications, which serve content to other users elsewhere - be it phones, laptops, desktops, etc.

A lot of people new to the field don't realise it, but any computer can take on the role of a server - you don't need any fancy hardware. The things that a computer does is defined by the software it runs - not the hardware that it is built from.

Does a server need a graphics card (GPU)?

No. It really doesn't. It's extremely unlikely that for a general purpose server you would need a GPU. Another related myth here is that you need a GPU in your server if you're running a game server. This is also false. Most of the time a server is going to be running headlessly (i.e. without a monitor) - so it really doesn't need a GPU in order to function effectively.

The following tasks however may require a GPU:

Serious Machine Learning / Artificial Intelligence workloads
3D Rendering (e.g. Blender)
Live video streaming (video transcoding does not always utilise the GPU, as far as I can tell - make sure you check the documentation for your video editing software before buying any hardware)

Web servers, game servers, email servers, and other application servers do not use and cannot make use of a GPU. Programs need to be specially designed to support GPUs.

I need to purchase a license for Windows Server. Windows 10 isn't enough.

This is false. If you prefer Windows, then a regular old Windows 10 machine will be just fine for most home server use-cases. Windows Server provides additional features for enterprise that you are unlikely to need.

Personally, I recommend running a distribution of Linux though such as Ubuntu Server.

The problems with used hardware

Of particular frustration is the purchasing of old used (often rack mountable) servers from eBay and other auction sites. The low prices might be attractive, but such servers will nearly always have a number of issues:

The CPU and other components will frequently be 10+ years old, and draw lots of electricity
The fans will be very loud - sounding like a jet is taking off inside your house
They often don't come with hard drives, and often have custom drive bays that require purchasing expensive drives to fill

Awkward issues to be sure! Particularly of note here is the electricity problem. Very old devices draw orders of magnitude more power than newer ones - leading to a big electricity bill. It will practically always be cheaper to purchase a newer more expensive machine - it'll pay for itself in dramatically lower electricity bills.

What are the alternatives?

Many far more suitable alternatives exist. They fall into 2 categories:

Renting from a hosting company
Buying a physical device

I'll be talking through both of these options below.

Renting from a hosting company

If you'd rather not have any hardware of your own locally, you can always rent a server from a hosting company. These come in 1 flavours:

Virtual Private Servers (VPS): A virtual machine running on the hosting company's infrastructure. Often easier to scale to multiple machines.
Dedicated servers: Bare-metal hardware running in a hosting company's datacentre somewhere. Useful if you've outgrown a VPS.

Example providers include OVH, Kimsufi (dedicated servers), Digital Ocean, and many more.

Things to watch out for when choosing one include:

How can you get support if you have an issue?
What network speeds are provided? Are there any data caps?
How much hard drive space do they come with? You often can't get any additional hard drive space once you've bought it without switching to a new host.
How many CPU cores does it have (or, if you want to run a game server, what's the clock speed)?
How much RAM does it have?
How much is it per month?

Buying a physical device

If you'd rather buy a physical device (beware that email servers cannot be effectively hosted on a residential Internet connection), then I can recommend either looking into one of these 2:

An Intel NUC or other Mini PC in the same form factor
A Raspberry Pi (or, for more advanced users, I've heard good things about a Rock Pi, but haven't tried it myself)

Both options are quiet, reasonably priced, and will draw orders of magnitude less power than a big rack mounted server.

A notable caveat here is that if you intend to run a game server, you'll want to check the CPU architecture it runs on, as it may not be compatible with the Raspberry Pi (which has an ARM chip built it - which can be either arm64 or armv7l - I use the official Debian CPU architecture codes here to avoid ambiguity).

Other alternatives here include old laptops and desktops you already have lying around at home. Make sure they aren't too old though, because otherwise you'll run afoul of point #1 in my list of problems there above.

Conclusion

In this post, I've busted some common myths about serves. I've also taken a quick look some appropriate hardware that you can buy or rent to use as a server.

If you're in the market for a server, don't be fooled by low prices for used physical servers. Rather, either rent one from a hosting company, or buy a Mini PC or Raspberry Pi instead. It'll run quieter and use less power too.

Other common questions I see are how to get started with running various different applications on a server. This is out of scope of this article, but there are plenty of tutorials out there on how to do this.

Often you'll need some basic Linux terminal skills to follow along though - I've written a blog post about how you can get started with the terminal already. I also on occasion post tutorials here on this blog on how to setup various applications - these are usually tagged with tutorial and server.

Other sites have excellent tutorials on to setup all manner of different applications - I'll leave a bunch of links at the end of this post.

If this this post has helped demystify servers for you, please consider sharing it with others to clear up their misconceptions too.

Sources and Further Reading

DigitalOcean community tutorials (purchasing from DigitalOcean is not required)
How to install Ubuntu Server
Advanced users only (solid experience with the Linux Terminal and networking required): Ars Technica Taking Email Back series: 1, 2, 3, and finally 4
NGINX vs. Apache: Our View of a Decade-Old Question
How to secure a Linux server
Freeside resource list

Demystificating VPNs

After seeing yet another article that misunderstands and misrepresents VPNs, I just hda to make a post about it. This post actually started life as a reddit comment, but I decided to expand on it and make it a full post here on my blog.

VPNs are a technology that simply sends your traffic through an encrypted tunnel that pops out somewhere else.

For the curious, they do this by creating what's known as a virtual tunnel interface on your computer (on Linux-based machines this is often tun0) and alter your machine's routing table to funnel all your network traffic destined for the Internet into the tunnel interface.

The tunnel interface actually encrypts your data and streams it to a VPN server (though there are exceptions) that then forwards it on to the wider Internet for you.

This is great if:

You live in a country that censors your Internet connection
You don't trust your ISP
You are connected to a public open WiFi hotspot
You need to access resources on a remote network that only allow those physically present to use them

By using a VPN, you can make your device appear as though it is somewhere else. You can also hide your Internet traffic from the rest of your network that you are connected to.

However, VPNs are not a magic bullet. They are not so great at:

Blocking trackers
Blocking Ads
Blocking mining scripts that suck up your CPU
Limiting the amount of your data online services get a hold of

This is because it simply makes you appear as though you are somewhere else - it doesn't block or alter any of the traffic coming and going from your device - online services can still see all your personal data - all that's changed when you use a VPN is that your data is going via a waypoint on it's journey to it's final destination.

All hope is not lost, however - for there are steps you can take to deal with these issues. Try these steps instead:

Install an Ad blocker (personally I use µBlock Origin)
Activate tracking protection in your browser
Use a multi-account containers extension to sandbox sites that follow you across the web ( cough Facebook cough Google cough )

You may already be aware of these points - but in particular multi-account containers are quite interesting. By using an extension like the one I link to above, you can in effect have multiple browsers open at the same time. By this, I mean that you can have multiple 'sandboxes' - and the site data (e.g. cookies etc) will not cross over from 1 sandbox to another.

This gives websites the illusion of being loaded in multiple different environments - with limited options to figure out that they are in fact on the same machine - especially when combined with other measures.

Hopefully this clears up some of the confusion. If you know anyone else who's confused about VPNs, please share a link to this post with them! The fewer people who get the wrong idea about VPNs, the better.

Found this interesting? Have another privacy related question? Found an error in this post? Comment below!

Compilers, VMs, and JIT: Spot the difference

It's about time for another demystification post, I think :P This time, I'm going to talk about Compilers, Virtual Machines (VMs), and Just-In-Time Compilation (JIT) - and the way that they are both related and yet different.

Compilers

To start with, a compiler is a program that converts another program written in 1 language into another language (usually of a lower level). For example, gcc compiles C++ into native machine code.

A compiler usually has both a backend and a frontend. The frontend is responsible for the lexing and parsing of the source programming language. It's the part of the compiler that generates compiler warnings and errors. The frontend outputs an abstract syntax tree (or parse tree) that represents the source input program.

The backend then takes this abstract syntax tree, walks it, and generates code in the target output language. Such code generators are generally recursive in nature. Depending on the compiler toolchain, this may or may not be the last step in the build process.

gcc, for example, generates code to object files - which are then strung together by a linking program later in the build process.

Additional detail on the structure of a compiler (and how to build one yourself!) is beyond the scope of this article, but if you're interested I recommend reading my earlier Compilers 101 post.

Virtual Machines

A virtual machine comes in several flavours. Common to all types is the ability to execute instructions - through software - as if they were being executed on a 'real' hardware implementation of the programming language in question.

Examples here include:

KVM - The Linux virtual machine engine. Leverages hardware extensions to support various assembly languages that are implemented in real hardware - everything from Intel x86 Assembly to various ARM instruction sets. Virtual Machine Manager is a good GUI for it.
VirtualBox - Oracle's offering. Somewhat easier to use than the above - and cross-platform too!
.NET / Mono - The .NET runtime is actually a virtual machine in it's own right. It executes Common Intermediate Language in a managed environment.

It's also important to note the difference between a virtual machine and an emulator. An emulator is very similar to a virtual machine - except that it doesn't actually implement a language or instruction set itself. Instead, it 'polyfills' hardware (or environment) features that don't exist in the target runtime environment. A great example here is WINE or Parallels, which allow programs written for Microsoft Windows to be run on other platforms without modification.

Just-In-Time Compilation

JIT is sort of a combination of the above. The .NET runtime (officially knows as the Common Language Runtime, or CLR) is an example of this too - in that it compiles the CIL in the source assembly to native code just before execution.

Utilising such a mechanism does result in an additional delay during startup, but usually pays for itself in the vast performance improvements that can be made over an interpreter - as a JITting VM can automatically optimise code as it's executing to increase performance in real-time.

This is different to an interpreter, which reads a source program line-by-line - parsing and executing it as it goes. If it needs to go back to another part of the program, it will usually have to re-parse the code before it can execute it.

Conclusion

In this post, we've taken a brief look at compilers, VMs, and JIT. We've looked at the differences between them - and how they are different from their siblings (emulators and interpreters). In many ways, the line between the 3 can become somewhat blurred (hello, .NET!) - so learning the characteristics of each is helpful for disentangling different components of a program.

If there's anything I've missed - please let me know! If you're still confused on 1 or more points, I'm happy to expand on them if you comment below :-)

Found this useful? Spotted a mistake? Comment below!

LoRa Terminology Demystified: A Glossary

My 2 RFM95s on the lid of my project's box. More info in a future blog post coming soon!

(Above: My 2 RFM95s. One works, but the other doesn't yet....)

I've been doing some more experimenting with LoRa recently, as I've got 1 of my 2 RFM95 working (yay)! While the other is still giving me trouble (meaning that I can't have 1 transmit and the other receive yet :-/), I've still been able to experiment with other people's implementations.

To that end, I've been learning about a bunch of different words and concepts - and thought that I'd document them all here.

LoRa

The radio protocol itself is called LoRa, which stands for Long Range. It provides a chirp-based system (more on that later under Bandwidth) to allow 2 devices to communicate over great distances.

LoRaWAN

LoRaWAN builds on LoRa to provide a complete end-to-end protocol stack to allow Internet of Things (IoT) devices to communicate with an application server and each other. It provides:

Standard device classes (A, B, and C) with defined behaviours
- Class A devices can only receive for a short time after transmitting
- Class B devices receive on a regular, timed, basis - regardless of when they transmit
- Class C devices send and receive whenever they like
The concept of a Gateway for picking up packets and forwarding them across the rest of the network (The Things Network is the largest open implementation to date - you should definitely check it out if you're thinking of using LoRa in a project)
Secure multiple-layered encryption of messages via AES

...amongst many other things.

The Things Network

The largest open implementation of LoRaWAN that I know of. If you hook into The Things Network's LoRaWAN network, then your messages will get delivered to and from your application server and LoRaWAN-enabled IoT device, wherever you are in the world (so long as you've got a connection to a gateway). It's often abbreviated to TTN.

Check out their website.

A coverage map for The Things Network.

(Above: A coverage map for The Things Network. The original can be found here)

Data Rate

The data rate is the speed at which a message is transmitted. This is measured in bits-per-second, as LoRa itself is an 'unreliable' protocol (it doesn't guarantee that anyone will pick anything up at the other end). There are a number of preset data rates:

Code	Speed (bits/second)
DR0	250
DR1	440
DR2	980
DR3	1760
DR4	3125
DR5	5470
DR6	11000
DR7	50000

_(Source: Exploratory Engineering: Data Rate and Spreading Factor)_

These values are a little different in different places - the above are for Europe on 868MHz.

Maximum Payload Size

Going hand-in-hand with the Data Rate, the Maximum Payload Size is the maximum number of bytes that can be transmitted in a single packet. If more than the maximum number of bytes needs to be transmitted, then it will be split across multiple packets - much like TCP's Maximum Transmission Unit (MTU), when it comes to that.

With LoRa, the maximum payload size varies with the Data Rate - from 230 bytes at DR7 to just 59 at DF2 and below.

Spreading Factor

Often abbreviated to just simply SF, the spreading factor is also related to the Data Rate. In LoRa, the Spreading Factor refers to the duration of a single chirp. There are 6 defined Spreading Factors: ranging from SF7 (the fastest transmission speed) to SF12 (the slowest transmission speed).

Which one you use is up to you - and may be automatically determined by the driver library you use (it's always best to check). At first glance, it may seem optimal to choose SF7, but it's worth noting that the slower speeds achieved by the higher spreading factors can net you a longer range.

Data Rate	Configuration	bits / second	Max payload size (bytes)
DR0	SF12/125kHz	250	59
DR1	SF11/125kHz	440	59
DR2	SF10/125kHz	980	59
DR3	SF9/125kHz	1 760	123
DR4	SF8/125kHz	3 125	230
DR5	SF7/125kHz	5 470	230
DR6	SF7/250kHz	11 000	230
DR7	FSK: 50kpbs	50 000	230

_(Again, from Exploratory Engineering: Data Rate and Spreading Factor)_

Duty Cycle

A Duty Cycle is the amount of time something is active as a percentage of a total time. In the case of LoRa(/WAN?), there is an imposed 1% Duty Cycle, which means that you aren't allowed to be transmitting for more than 1% of the time.

Bandwidth

Often understood, the Bandwidth is the range of frequencies across which LoRa transmits. The LoRa protocol itself uses a system of 'chirps', which are spread form one end of the Bandwidth to the other going either up (an up-chirp), or down (a down-chirp). LoRahas 2 bandwidths it uses: 125kHz, 250kHz, and 500kHz.

Some example LoRa chirps as described above.

(Some example LoRa Chirps. Source: This Article on Link Labs)

Frequency

Frequency is something that most of us are familiar with. Different wireless protocols utilise different frequencies - allowing them to go about their business in peace without interfering with each other. For example, 2.4GHz and 5GHz are used by WiFi, and 800MHz is one of the frequencies used by 4G.

In the case of LoRa, different frequencies are in use in different parts of the world. ~868MHz is used in Europe (443MHz can also be used, but I haven't heard of many people doing so), 915MHz is used in the US, and ~780MHz is used in China.

Location	Frequency
Europe	863 - 870MHz
US	902 - 928MHz
China	779 - 787MHz

(Source: RF Wireless World)

Found this helpful? Still confused? Found a mistake? Comment below!

Sources and Further Reading

https://electronics.stackexchange.com/a/305287/180059

Demystifying Inverted Indexes

(The magnifying glass in the above banner came from openclipart)

After writing the post that will be released after this one, I realised that I made a critical assumption that everyone knew what an inverted index was. Upon looking for an appropriate tutorial online, I couldn't find one that was close enough to what I did in Pepperminty Wiki, so I decided to write my own.

First, some context. What's Pepperminty Wiki? Well, it's a complete wiki engine in a single file of PHP. The source files are obviously not a single file, but it builds into a single file - making it easy to drop into any PHP-enabled web server.

One of its features is a full-text search engine. A personal wiki of mine has ~75k words spread across ~550 pages, and it manages to search them all in just ~450ms! It does this with the aid of an inverted index - which I'll be explaining in this post.

First though, we need some data to index. How about the descriptions of some video games?

Kerbal Space Program

In KSP, you must build a space-worthy craft, capable of flying its crew out into space, without killing them. At your disposal is a collection of parts, which must be assembled to create a functional ship. Each part has its own function and will affect the way a ship flies (or doesn't). So strap yourself in, and get ready to try some Rocket Science!

Cross Code

Meet Lea as she logs into an MMO of the distant future. Follow her steps as she discovers a vast world, meets other players and overcomes all the challenges of the game.

Fort Meow

Fort Meow is a physics game by Upper Class Walrus about a girl, an old book and a house full of cats! Meow.

Factory Balls

Factory balls is the brand new bonte game I already announced yesterday. Factory balls takes part in the game design competition over at jayisgames. The goal of the design competition was to create a 'ball physics'-themed game. I hope you enjoy it!

Very cool, this should provide us with plenty of data to experiment with. Firstly, let's consider indexing. Take the Factory Balls description. We can split it up into tokens like this:

T	o	k	e	n	s	V	V
factory	balls	is	the	brand	new	bonte	game
i	already	announced	yesterday	factory	balls	takes
part	in	the	game	design	competition	over
at	jayisgames	the	goal	of	the	design
competition	was	to	create	a	ball	physics
themed	game	i	hope	you	enjoy	it

Notice how we've removed punctuation here, and made everything lowercase. This is important for the next step, as we want to make sure we consider Factory and factory to be the same word - otherwise when querying the index we'd have to remember to get the casing correct.

With our tokens sorted, we can now count them to create our index. It's like a sort of tally chart I guess, except we'll be including the offset in the text of every token in the list. We'll also be removing some of the most common words in the list that aren't very interesting - these are known as stop words. Here's an index generated from that Factory Balls text above:

Token	Frequency	Offsets
factory	2	0, 12
balls	2	1, 13
brand	1	4
new	1	5
bonte	1	6
game	3	7, 18, 37
i	2	8, 38
announced	1	10
yesterday	1	11
takes	1	14
design	2	19, 28
competition	2	20, 29
jayisgames	1	23
goal	1	25
create	1	32
ball	1	34
physics	1	35
themed	1	36
hope	1	39
enjoy	1	41

Very cool. Now we can generate an index for each page's content. The next step is to turn this into an inverted index. Basically, the difference between the normal index and a inverted index is that an entry in an inverted index contains not just the offsets for a single page, but all the pages that contain that token. For example, the Cross-Code example above also contains the token game, so the inverted index entry for game would contain a list of offsets for both the Factory Balls and Cross-Code pages.

Including the names of every page under every different token in the inverted index would be both inefficient computationally and cause the index to grow rather large, so we should assign each page a unique numerical id. Let's do that now:

Id	Page Name
1	Kerbal Space Program
2	Cross Code
3	Fort Meow
4	Factory Balls

There - much better. In Pepperminty Wiki, this is handled by the ids class, which has a pair of public methods: getid($pagename) and getpagename($id). If an id can't be found for a page name, then a new id is created and added to the list (Pepperminty Wiki calls this the id index) transparently. Similarly, if a page name can't be found for an id, then null should be returned.

Now that we've got ids for our pages, let's look at generating that inverted index entry for game we talked about above. Here it is:

Term: game

Id	Frequency	Offsets
2	1	31
3	1	5
4	3	5, 12, 23

Note how there isn't an entry for page id 1, as the Kerbal Space Program page doesn't contain the token game.

This, in essence, is the basics of inverted indexes. A full inverted index will contain an entry for every token that's found in at least 1 source document - though the approach used here is far from the only way of doing it (I'm sure there are much more advanced ways of doing it for larger datasets, but this came to mind from reading a few web articles and is fairly straight-forward and easy to understand).

Can you write a program that generates a full inverted index like I did in the example above? Try testing it on the test game descriptions at the start of this post.

You may also have noticed that the offsets used here are of the tokens in the list. If you wanted to generate contexts (like Duck Duck Go or Google do just below the title of a result), you'd need to use the character offsets from the source document instead. Can you extend your program to support querying the inverted index, generating contexts based on the inverted index too?

Liked this post? Got your own thoughts on the subject? Having trouble with the challenges at the end? Comment below!

Demystifying microphones: The difference between dynamics and condensers

Welcome back to another demystification post! This time, it's about microphones. I had a question recently about microphones and phantom power, and after doing some rather extensive research on the subject (unintentionally of course :P), I thought it a waste not to summarise it here.

Basically, phantom power is a +48V direct current that's transmitted through a microphone cable (not the kind you plug into your laptop I don't think - the big chunky ones). It's required by condenser microphones (though some use a battery instead), which have a pair of films (called diaphragms) which vibrate. When a current is passed through from one plate to the other, the physical sound gets converted into an electrical signal we can use.

A diagram of how a condenser microphone works on a whiteboard. Full explanation below.

Condenser microphones are much more sensitive than their dynamic microphone counterparts. They are able to better represent a wider range of frequencies - but as a result of this heightened sensitivity, you normally need a pop filter if you're recording vocals. In addition, they don't tend to perform too well in loud environments, such as concerts. Finally, they tend to be more expensive than dynamic microphones, too.

A diagram of how a dynamic microphone works on a whiteboard. Full explanation below.

Dynamic microphones, on the other hand, don't require phantom power. They are basically a loudspeaker in reverse and generate the current themselves - they have a single diaphragm that's attached to a metal core - which in turn has a coil of wire around it. When the diaphragm vibrates, so does the metal core - and as you can probably guess, a current is induced in the coil, as metal cores tend to do when inside coils of conveniently placed wires.

While they are better in loud environments (like concerts and drum kits), dynamic microphones aren't so good at representing a wide ranges of frequencies - and as such they are usually tailored to be pick up a specific frequency range better than others. Furthermore, they aren't as sensitive in general as your average condenser microphone, so they don't get on particularly well with very quiet sounds either.

Which you use generally depends on what you want to do. If you've got an overly enthusiastic drummer in a rock concert, you probably want a dynamic microphone. On the other hand, if you're trying to record the song of a cricket on a still summer's evening, you probably want to keep a condenser microphone handy.

I'm not an audio expert, so I might have made a few mistakes here and there! If you spot one, please do let me know in the comments below :-)

Sources and Further Reading

What Is Phantom Power? on the Dawsons blog
Condenser Microphones by Media College
Dynamic Microphones by Media College

Demystifying traceroute

A map of undersea cables. (Image from labnol. View the full map at submarinecablemap.com.)

A little while ago someone I know seemed a little bit confused as to how a traceroute interacts with a firewall, so I decided to properly look into and write this post.

Traceroute is the practice of sending particular packets across a network in order to discover all the different hops between the source computer and the destination. For example, here's a traceroute between starbeamrainbowlabs.com and bbc.co.uk:

traceroute to bbc.co.uk (212.58.244.23), 30 hops max, 60 byte packets
 1  125.ip-37-187-66.eu (37.187.66.125)  0.100 ms  0.015 ms  0.011 ms
 2  be10-147.rbx-g1-a9.fr.eu (37.187.231.169)  0.922 ms  0.912 ms  0.957 ms
 3  be100-1187.ldn-1-a9.uk.eu (91.121.128.87)  7.536 ms  7.538 ms  7.535 ms
 4  * * *
 5  ae-1-3104.ear2.London2.Level3.net (4.69.143.190)  18.481 ms  18.676 ms  18.903 ms
 6  unknown.Level3.net (212.187.139.230)  10.725 ms  10.434 ms  10.415 ms
 7  * * *
 8  ae0.er01.telhc.bbc.co.uk (132.185.254.109)  10.565 ms  10.666 ms  10.603 ms
 9  132.185.255.148 (132.185.255.148)  12.123 ms  11.781 ms  11.529 ms
10  212.58.244.23 (212.58.244.23)  10.596 ms  10.587 ms  65.243 ms

As you can see, there are quite a number of hops between us and the BBC, not all of which responded to attempts to probe them. Before we can speculate as to why, it's important to understand how a traceroute is performed.

There are actually a number of different methods to perform a traceroute, but they all have a few things in common. The basic idea exploits something called time to live (TTL). This is a special value that all IP packets have (located 16 bytes into an ipv4 header, and 7 bytes into an ipv6 header for those who are curious) that determines the maximum number of hops that a packet is allowed to go through before it is dropped. Every hop along a packet's route decreases this value by 1. When it reaches 0, an ICMP TTL Exceeded message is returned to the source of the packet. This message can be used to discover the hops between a given source and destination.

With that out of the way, we can move on to the different methods of generating this response from every hop along a given route. Linux comes with a traceroute utility built-in, and this is the tool that I'm going to be investigating. If you're on Windows, you can use tracert, but it doesn't have as many options as the Linux version.

Linux's traceroute utility defaults to using UDP packets on an uncommon port. It defaults to this because it's the best method that unprivileged users can use if they have a kernel older than 3.0 (check your kernel version with uname -r). It isn't ideal though, because many hosts don't expect incoming UDP packets and silently drop them.

Adding the -I flag causes traceroute to use ICMP ping requests instead. Thankfully most hosts will respond to ICMP pings, making it a much better probing tool. Some networks, however, don't allow ping requests to pass through their gateways (usually large institutions and schools), rendering this method useless in certain situations.

To combat the above, a new method was developed that uses TCP SYN packets instead of UDP or ICMP ping. If you send a TCP SYN packet (manipulating the TTL as above), practically all hosts will return some kind of message. This is commonly referred to as the TCP half-open technique, and defaults to port 80 - this allows the traceroute to bypass nearly all firewalls. If you're behind a proxy though I suspect it'll snag on it - theoretically speaking using port 443 instead should rectify this problem in most cases (i.e. traceroute -T -p 443 hostname.tld).

Traceroute has a bunch of other less reliable methods, which I'll explain quickly below.

-U causes traceroute to use UDP on port 53. This method usually only elicits responses from DNS servers along the route.
-UL makes traceroute use udplite in a similar fashion to UDP in the bullet point above. This is only available to administrators.
DCCP can also be used with the -D. It works similar to the TCP method described earlier.
A raw IP packet can also be used, but I can't think of any reasons you'd use this.

That just about covers all the different traceroute methods. If you have any questions, please leave a comment below.

Demystifying UDP

Yesterday I was taking a look at [UDP Multicast], and attempting to try it out in C#. Unfortunately, I got a little bit confused as to how it worked, and ended up sending a couple of hours wondering what I did wrong. I'm writing this post to hopefully save you the trouble of fiddling around trying to get it to work yourself.

UDP stands for User Datagram Protocol (or Unreliable Datagram Protocol). It offers no guarantee that message sent will be received at the other end, but is usually faster than its counterpart, TCP. Each UDP message has a source and a destination address, a source port, and a destination port.

When you send a message to a multicast address (like the 239.0.0.0/8 range or the FF00::/8 range for ipv6, but that's a little bit more complicated), your router will send a copy of the message to all the other interested hosts on your network, leaving out hosts that have not registered their interest. Note here that an exact copy of the original message is sent to all interested parties. The original source and destination addresses are NOT changed by your router.

With that in mind, we can start to write some code.

IPAddress multicastGroup = IPAddress.Parse("239.1.2.3");
int port = 43;
IPEndPoint channel = new IPEndPoint(multicastGroup, port);
UdpClient client = new UdpClient(43);
client.JoinMulticastGroup(multicastGroup);

In the above, I set up a few variables or things like the multicast address that we are going to join, the port number, and so on. I pass the port number to the new UdpClient I create, letting it know that we are interested in messages sent to that port. I also create a variable called channel, which we will be using later.

Next up, we need to figure out a way to send a message. Unfortunately, the UdpClient class only supports sends arrays of bytes, so we will be have to convert anything we want to send to and from a byte array. Thankfully though this isn't too tough:

string data = "1 2, 1 2, Testing!";
byte[] payload = Encoding.UTF8.GetBytes(data);
string message = Encoding.UTF8.GetString(payload);

The above converts a simple string to and from a byte[] array. If you're interested, you can also serialise and deserialise C♯ objects to and from a byte[] array by using Binary Serialisation. Anyway, we can now write a method to send a message across the network. Here's what I came up with:

private static void Send(string data)
{
    Console.WriteLine("Sending '{0}' to {1}.", data, destination);
    byte[] payload = Encoding.UTF8.GetBytes(data);
    Send(payload);
}
private static void Send(byte[] payload)
{
    client.Send(payload, payload.Length, channel);
}

Here I've defined a method to send stuff across the network for me. I've added an overload, too, which automatically converts string into byte[] arrays for me.

Putting the above together will result in a multicast message being sent across the network. This won't do us much good though unless we can also receive messages from the network too. Let's fix that:

public static async Task Listen()
{
    while(true)
    {
        UdpReceiveResult result = await client.ReceiveAsync();
        string message = Encoding.UTF8.GetString(result.Buffer);
        Console.WriteLine("{0}: {1}", result.RemoteEndPoint, message);
    }
}

You might not have seen (or heard of) asynchronous C# before, but basically it's a ways of doing another thing whilst you are waiting for one thing to complete. Dot net perls have a good tutorial on the subject if you want to read up on it.

For now though, here's how you call an asynchronous method from a synchronous one (like the Main() method since that once can't be async apparently):

Task.Run(() => Listen).Wait();

If you run the above in one program while sending a message in another, you should see something appear in the console of the listener. If not, your computer may not be configured to receive multicast messages that were sent from itself. In this case try running the listener on a different machine to the sender. In theory you should be able to run the listener on as many hosts on your local network as you want and they should all receive the same message.

Stardust Blog

Tag Cloud

AI encoders demystified

Images / spatial data

Sequenced Data

Conclusion

Encryption demystified: What to use and when

Device encryption

Transport layer encryption

End-to-end encryption

Conclusion

Servers demystified

What is a "server"?

Does a server need a graphics card (GPU)?

I need to purchase a license for Windows Server. Windows 10 isn't enough.

The problems with used hardware

What are the alternatives?

Renting from a hosting company

Buying a physical device

Conclusion

Sources and Further Reading

Demystificating VPNs

Compilers, VMs, and JIT: Spot the difference

Compilers

Virtual Machines

Just-In-Time Compilation

Conclusion

LoRa Terminology Demystified: A Glossary

LoRa

LoRaWAN

The Things Network

Data Rate

Maximum Payload Size

Spreading Factor

Duty Cycle

Bandwidth

Frequency

Sources and Further Reading

Demystifying Inverted Indexes

Demystifying microphones: The difference between dynamics and condensers

Sources and Further Reading

Demystifying traceroute

Demystifying UDP

Stardust
Blog