Pipes, /dev/shm, or a TCP socket: Which is faster?
I've been busy patching HAIL-CAESAR (a simplified 2D flood simulation program designed for HPC supercomputers) to make it more suitable for the scale of my PhD project. As part of this, I'm trying to use the standard input & output where possible to speed up data transfer for the pre- and post-processing steps, since I need to convert the data to and from different formats.
This got me thinking that there are actually a number of different ways of getting data in and out of a program, so I decided to do a quick (relatively informal) test to see which was fastest.
In my actual project, I'm going to be doing the following data transfers:
- From .jsonstream.gz files to a Node.js process
- From the Node.js process to HAIL-CAESAR
- From HAIL-CAESAR to another Node.js process (there's a LOT of data in this bit)
- From that Node.js process to disk as PNG files
That's a lot of transferring. In particular the output of HAIL-CAESAR, which I'm currently writing directly to disk, appears to be absolutely enormous - due mainly to the hugely inefficient storage format used.
Anyway, the 3 mechanisms I'm putting to the test here are:
- A pipe (e.g. writing to standard output)
- Writing to a file in /dev/shm
- A TCP socket
If anyone can think of any other mechanisms for rapid inter-process communication, please do get in touch by leaving a comment below.
Pipe
I'm simulating a pipe with the following code:
timeout --signal=SIGINT 30s dd if=/dev/zero status=progress | cat >/dev/null
The timeout --signal=SIGINT 30s bit lets it run for 30 seconds before stopping it with a SIGINT (the same as Ctrl + C). I'm reading from /dev/zero here, because I want to test the performance of the pipe and not be limited by the speed of random number generation if I were to use /dev/urandom.
Running this on my laptop resulted in a speed of ~396 MB/s.
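One thing to bear in mind with the command above is that dd uses 512-byte blocks when no bs is given, so the data crosses the pipe in lots of small writes. Here's a minimal sketch of a variant that sets the block size explicitly and pushes a fixed amount of data, so you can check that everything arrives at the other end. The bs=1M and count=64 values are arbitrary choices for illustration (not the settings behind the figure above), and it assumes GNU dd for status=none and the M suffix.

```shell
#!/bin/sh
# Sketch: push a fixed 64 MiB through a pipe using an explicit block size.
# bs=1M and count=64 are illustrative assumptions, not the settings used
# for the ~396 MB/s measurement. status=none silences dd's own summary.
bytes=$(dd if=/dev/zero bs=1M count=64 status=none | wc -c)

# wc -c counts what actually came out of the reading end of the pipe
echo "$bytes bytes arrived at the reading end"
```

Swapping status=none for status=progress and wrapping the dd in the same timeout command as before would turn this back into a throughput test, which makes it easy to see how much the block size matters on a given machine.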
/dev/shm
/dev/shm is the shared memory area on Linux - and is usually backed by a tmpfs file system (i.e. an in-memory ramdisk).
Here are the commands I'm using to test this:
dd if=/dev/zero of=/dev/shm/test-1gb bs=1024 count=1000000
dd if=/dev/shm/test-1gb of=/dev/null bs=1024 count=1000000
This writes a 1GB file to /dev/shm, and then reads it back again (to be consistent with the pipe test). To calculate the overall MB/s speed, we need to know the time it took to do the read and write operations. I observed the following:
Operation | Speed | Time |
---|---|---|
Write | 692 MB/s | 1.4788s |
Read | 890 MB/s | 1.1501s |
...so that's 2.6289s in total. We can then calculate the MB/s by dividing 1GB (1000 MB) by the total time, giving us a total transfer speed of ~380 MB/s. This seemed quite variable though - as when I tested it the other day I got only ~273 MB/s.
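The arithmetic can be sketched as a small shell function. combined_rate is a hypothetical helper name made up for illustration, and it uses awk because plain sh can't do floating-point maths. Note that bs=1024 count=1000000 actually writes 1,024,000,000 bytes, so the second call shows what the figure would be if you used 1024 MB instead of rounding down to 1000 MB.

```shell
#!/bin/sh
# combined_rate SIZE_MB WRITE_SECONDS READ_SECONDS
# Hypothetical helper: MB/s for a write followed by a read of the same data.
combined_rate() {
    awk -v size="$1" -v w="$2" -v r="$3" \
        'BEGIN { printf "%.1f\n", size / (w + r) }'
}

# Timings from the table above, treating 1GB as 1000 MB
combined_rate 1000 1.4788 1.1501   # prints 380.4

# Same timings, using the exact 1024 MB that dd wrote
combined_rate 1024 1.4788 1.1501   # prints 389.5
```

Either way the result lands a little below the pipe figure, and the gap is smaller than the run-to-run variation mentioned above.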
TCP Socket
Finally, to test a TCP socket, I devised the following:
nc -l 8888 >/dev/null &
timeout --signal=SIGINT 30s dd status=progress if=/dev/zero | nc 127.0.0.1 8888
The first line sets up the listener, and the second line is the sender. As with the pipe test, I'm stopping it after 30 seconds. It took a moment to stabilise, but towards the end it levelled off at ~360 MB/s.
Conclusion
After running the 3 tests, the results were as follows:
Test | Speed |
---|---|
Pipe | 396 MB/s |
/dev/shm | 380 MB/s |
TCP Socket | 360 MB/s |
According to this, the pipe (i.e. writing to the standard output and reading from the standard input) is the fastest. This isn't particularly surprising (since the other methods have overhead), but interesting to test all the same. Here's a quick graph of that:
Of course, there are other considerations to take into account. For example, if you need scalable multi-core processing, then /dev/shm or TCP sockets (the latter especially, since Linux has a special mechanism, the SO_REUSEPORT socket option, for multiple processes to listen on the same port and allow load-balancing between them) might be a better option - despite the additional overhead.
Other CPU architectures may have an effect on it too due to different CPU instructions being available - I ran these tests on Ubuntu 19.10 on the Intel Core i7-7500U in my laptop.
As of yet I'm unsure as to how much post-processing the data coming from HAIL-CAESAR will require - and whether it will require multiple processes to handle the load or not. I hope not - since HAIL-CAESAR is written in C++, and TCP sockets would be awkward and messy to implement (since you would probably have to use the low-level socket API, and I don't have any experience with networking in C++ yet) - and the HPC in question doesn't appear to have inotifywait installed to make listening for file writes on disk easier.