Stardust | Starbeamrainbowlabs

Pipes, /dev/shm, or a TCP socket: Which is faster?

I've been busy patching HAIL-CAESAR (a simplified 2D flood simulation program designed for HPC supercomputers) to make it more suitable for the scale of my PhD project, and as part of this I'm trying to use the standard input & output where possible to speed up data transfer for the pre and post-processing steps, since I need to convert the data to and from different formats.

As part of this, it crossed my mind that there are actually a number of different ways of getting data in and out of a program, so I decided to do a quick (relatively informal) test to see which was fastest.

In my actual project, I'm going to be doing the following data transfers:

From .jsonstream.gz files to a Node.js process
From the Node.js process to HAIL-CAESAR
From HAIL-CAESAR to another Node.js process (there's a LOT of data in this bit)
From that Node.js process to disk as PNG files

That's a lot of transferring. In particular the output of HAIL-CAESAR, which I'm currently writing directly to disk, appears to be absolutely enormous - due mainly to the hugely inefficient storage format used.

Anyway, the 3 mechanisms I'm putting to the test here are:

A pipe (e.g. writing to standard output)
Writing to a file in /dev/shm
A TCP socket

If anyone can think of any other mechanisms for rapid inter-process communication, please do get in touch by leaving a comment below.

Pipe

I'm simulating a pipe with the following code:

timeout --signal=SIGINT 30s dd if=/dev/zero status=progress | cat >/dev/null

The timeout --signal=SIGINT 30s bit lets it run for 30 seconds before stopping it with a SIGINT (the same as Ctrl + C). I'm reading from /dev/zero here, because I want to test the performance of the pipe and not be limited by the speed of random number generation if I were to use /dev/urandom.

Running this on my laptop resulted in a speed of ~396 MB/s.

`/dev/shm`

/dev/shm is the shared memory area on Linux - and is usually backed by a tmpfs file system (i.e. an in-memory ramdisk).

Here are the command I'm using to test this:

dd if=/dev/zero of=/dev/shm/test-1gb bs=1024 count=1000000
dd if=/dev/shm/test-1gb of=/dev/null bs=1024 count=1000000

This writes a 1GB file to /dev/shm, and then reads it back again (to be consistent with the pipe test). To calculate the overall MB/s speed, we need to know the time it took to do the read and write operations. I observed the following:

Operation	Speed	Time
Write	692 MB/s	1.4788s
Read	890 MB/s	1.1501s

....so that's 2.6289s in total. Then, we can calculate the MB/s by dividing 1GB by the total time, giving us a total transfer speed of ~380 MB/s. This seemed quite variable though - as when I tested it the other day I got only ~273 MB/s.

TCP Socket

Finally, to test a TCP socket, I devised the following:

nc -l 8888 >/dev/null &
timeout --signal=SIGINT 30s dd status=progress if=/dev/zero | nc 127.0.0.1 8888

The first line sets up the listener, and the 2nd line is the sender. As before with the pipe test, I'm stopping it after 30 seconds. It took a moment to stabilise, but towards the end it levelled off at about ~360 MB/s.

Conclusion

After running the 3 tests, the results were as follows:

Test	Speed
Pipe	396 MB/s
/dev/shm	380 MB/s
TCP Socket	360 MB/s

According to this, the pipe (i.e. writing to the standard output and reading from the standard input) is the fastest. This isn't particularly surprising (since the other methods have overhead), but interesting to test all the same. Here's a quick graph of that:

A quick bar chart of the above data

Of course, there are other considerations to take into account. For example, If you need scalable multi-core processing, then /dev/shm or TCP sockets (the latter especially since Linux has a special mechanism for multiple processes to listen on the same port and allow load-balancing between them) might be a better option - despite the additional overhead.

Other CPU architectures may have an effect on it too due to different CPU instructions being available - I ran these tests on Ubuntu 19.10 on the Intel Core i7-7500U in my laptop.

As of yet I'm unsure as to how much post-processing the data coming from HAIL-CAESAR will require - and whether it will require multiple processes to handle the load or not. I hope not - since HAIL-CAESAR is written in C++, and TCP sockets would be awkward and messy to implement (since you would probably have to use the low-level socket API, and I don't have any experience with networking in C++ yet) - and the HPC in question doesn't appear to have inotifywait installed to make listening for file writes on disk easier.

A comparison of compression formats for storing JSON

Happy new year, everyone!

I've blogged about different aspects of a (not so?) little project of mine several times now (exhibits A, B, C, D, and finally E - even if I didn't know it at the time), but it appears that I end up running into all sorts of interesting problems that I invent cool solutions for. I also find myself doing a bunch of research that I'm surprised nobody's compiled into a single place yet - as is the case in this blog post. (Also, Happy 2018 everyone! First post of the year :D)

I've been refactoring the subsystem that saves a considerable amount of JSON data to a bunch of different files on disk. Obviously, I'm interested in minimising the amount of space this JSON data takes up on disk. As this saving process happens in the background on a separate thread, I'm not too concerned about performance - other than it can't be too slow. With this in mind, I've found myself testing a bunch of different compression algorithms. Let's introduce our test data:

A 17MiB card game dataset, as a single minified JSON file
A 40KiB 'live' specimen chunk's data, saved as a single minified JSON file.

I can't remember which card game the first dataset is from, but I do know that I found it in the awesome JSON datasets list. Next up, here's our cast of compression algorithms we'll be testing:

The venerable GZip
BZip2 - Apparently GZIP, but smaller and slightly more computationally expensive
XZ (the newer child of LZMA2)
7zip
Google's Brotli

A colourful cast, to be sure! Let's run them through their paces - starting with the card game dataset. Here are the results I observe with each set to their default settings:

Format	Size
Uncompressed	17M
gzip	2.4M
bzip2	1.6M
xz	1.3M
7zip	1.4M
brotli	1.3M

Very interesting. It looks like xz and brotli are tied in first place - though I observed that brotli took ages in comparison to all the other algorithms I tested - and upon closer inspection xz beat it by 17.3KiB. Numbers are all very well, but to really see what's going on here, let's plot it on a graph:

A graph of the data in the table above.

That's better! I can actually make some comparisons now. From this graph we can observe that gzip is the worst of the lot, followed by bzip2. 7zip is surprisingly in third place, but then again it is designed for multiple files, whereas the rest of them are designed for a single stream of data. In second place is the terribly slow brotli, and finally in first place is xz.

Hrm - very interesting. How do our algorithms measure up when confronted a smaller load though? Here are the results for the sample chunk data:

Format	Size
Uncompressed	40K
gzip	5.4K
bzip2	4.3K
xz	4.6K
7zip	4.9K
Brotli	4.6K

Interesting results, to be sure, but I can't discern much from that. Let's plot a graph:

A graph of the data in the above table. Further explanation below.

Very interesting. With smaller loads, it appears that bzip2 performs much better with smaller loads than any other algorithm. While gzip is still the worst performing algorithm, while xz and brotli, surprisingly, performed much worse than bzip2.

To that end, I'm think I'm going to be choosing bzip2 as my compression of choice for this job, as it produces the best results for the type of work I'm going to be doing.

I'm really surprised about brotli though, actually. I had high hopes for it, considering it's a new algorithm invented by Google. They claimed that it would provide xz-like compression with gzip-like speeds - but from what I'm seeing, it does anything but.

Sources and Further Reading

Brotli.NET

Making Mathematical Art with C Sharp and PPM

The other day I wanted to (for some random reason) create some stripes. Having worked out a simple algorithm that would produce some rather nice stripes given an (x, y) pixel coordinate, I set out to write a small script that would generate some stripes for me.

I discovered that it wasn't as easy as I'd thought. Lockbits confused me, and I couldn't find a good enough example to learn from. Thankfully I caught wind of a ridiculously simple image format called PPM (Portable Pixel Map) that I could use to output a byte[] array of pixel data as a valid image.

Here's a diagram I made to illustrate the format:

The PPM Format.

The format basically consists of a ascii header, followed by a raw dump of a byte[] full of pixel data. The header contains several parts:

The characters P6 (This is called the 'magic byte', and can be used to identify the type of content that a file contains)
A single whitespace (I used \s in the diagram because that is the escape code for whitespace in a javascript regular expression)
The width of the image, in ascii
Another single whitespace
The height of the image, in ascii
Another single whitespace
The maximum value that the red / green / blue pixels will go up to. This value will be considered 100% saturated. Normally, you'd want this to be 255.
Another single whitespace - not shown on the diagram (oops); usually a new line (\n).
The raw byte[] array of pixel data.

Once you've your pixel data as a PPM, you can then use something like imagemagick to convert it to a png with a command like mogrify -format png image.ppm or convert image.ppm image.png.

Some red stripes.

Using this method, you can generate almost anything using pure C#. Here's the code I used to generate the above stripes:

using System;
using System.IO;

public class EmptyClass
{
    #region Settings

    static string filename = "image.ppm";

    static int width = 1500;
    static int height = 400;

    static int stripeWidth = width / 30;
    static rgb stripeLowCol = new rgb(204, 0, 0);
    static rgb stripeHighCol = new rgb(255, 51, 51);

    static float multiplier = 1f;

    #endregion

    #region Image Generator

    public static void Main()
    {
        byte[] pixelData = new byte[width * height * 3];
        for(int x = 0; x < width; ++x)
        {
            for(int y = 0; y < height; ++y)
            {
                int currentPixel = ((y * width) + x) * 3;
                pixelData[currentPixel] = redPixel(x, y);
                pixelData[currentPixel + 1] = greenPixel(x, y);
                pixelData[currentPixel + 2] = bluePixel(x, y);
            }
        }

        StreamWriter destination = new StreamWriter(filename);
        destination.Write("P6\n{0} {1}\n{2}\n", width, height, 255);
        destination.Flush();
        destination.BaseStream.Write(pixelData, 0, pixelData.Length);
        destination.Close();
    }

    #endregion

    #region Pixel value functions - edit these

    public static byte redPixel(int x, int y)
    {
        return (byte)(((x + y) % stripeWidth < stripeWidth / 2 ? stripeLowCol.r : stripeHighCol.r) * multiplier);
    }
    public static byte greenPixel(int x, int y)
    {
        return (byte)(((x + y) % stripeWidth < stripeWidth / 2 ? stripeLowCol.g : stripeHighCol.g) * multiplier);
    }
    public static byte bluePixel(int x, int y)
    {
        return (byte)(((x + y) % stripeWidth < stripeWidth / 2 ? stripeLowCol.b : stripeHighCol.b) * multiplier);
    }

    #endregion
}

#region Utility Classes
class rgb
{
    public byte r, g, b;
    public rgb(byte inCol)
    {
        r = g = b = inCol;
    }
    public rgb(byte inR, byte inG, byte inB)
    {
        r = inR;
        g = inG;
        b = inB;
    }
}
#endregion

The settings at the top control the appearance of the output. filename is the filename to write the image to, width and height set the dimensions of the image, stripeWidth sets the width in pixels of each stripe, and stripeLowCol and stripeHighCol set the colour of the different stripes. The multiplier at the end isn't actually needed, but you can use it to brighten or dim the resulting image if you want.

Not content with stripes, I played around for a bit longer and came up with this:

Another mathematical picture.

Above: My second attempt at mathematical art. It looks better in my native image previewer...

The above actually consists of a 3 different functions - one for each channel. Here they are:

public static byte redPixel(int x, int y)
{
    return (byte)(Math.Sin(x / (width / (Math.PI * 10))) * 255 * Math.Sin(y / (height / (Math.PI*10))));
}
public static byte greenPixel(int x, int y)
{
    return (byte)(Math.Sin(x / (width / (Math.PI * 5))) * 128 * Math.Sin(y / (height / (Math.PI * 5))));
}
public static byte bluePixel(int x, int y)
{
    return (byte)((Math.Sin(x / (width / Math.PI)) * 52 * Math.Sin(y / (height / Math.PI))) + 25);
}

I don't actually know how it works (even though I wrote it strangely enough), but if you do know, please leave a comment down below!

Since it might be a bit difficult to see, here's an animated gif that shows each of the colour channels broken down:

Illusion Boxes Decoded

Lastly, I have rendered a larger copy of the above. You can view it here (Size: 4.8MB).

Have you made some interesting mathematical art? Post it in the comments below!

A Distributed solution to /r/dailyprogrammer Challenge #223

A little while ago, I wrote my first distributed computing program to solve the second optional challenge to /r/dailyprogrammer's #223rd problem. I wrote it in Javascript, because I know really know how to utilise the tcp/ip networking stack in any other language yet :(

Anyway, if you want to check out my source code, I have put it up on GitHub. Below I will explain how you can get started with the pair for scripts I wrote, and how they work

Getting Started

Getting started is easy. Just clone the repository & cd into the new directory:

git clone https://github.com/sbrl/dailyprogrammer-223.git
cd dailyprogrammer-223

Then install the dependency:

npm install

Then you can start a server like so:

node server.js 4321 # starts a new server on port 4321

Or you can start a client like this:

node client.js starbeamrainbowlabs.com:9999

Note that I have used some ES6 features (check out my ES6: Features series), so if you are using Node.js and not io.js, you will need to tack on the --harmony flag in order to get them to work.

It works by mapping the search space (aaaaa to zzzzz) to a set of numbers, starting at 0. I used an ES6 generator to keep track of where we have got up to, wrapped in a function that handles reallocating work units that haven't been completed within a certain time limit. Each work unit consist of 16 words by default (though it can be changed), with the time limit set to 5 times the number of words in the block.

Next time, I will write the client so that it is compatible with a conventional browser - this should make it much easier for people to contribute their CPU time!

Stardust Blog

Tag Cloud