Stardust | Starbeamrainbowlabs

Summer Project Part 2: Random Number Analysis with Gnuplot

In my last post about my Masters Summer Project, I talked about my plans and what I'm doing. In this post, I want to talk about random number generator evaluation.

As part of the Arduino-based Internet of Things device that will be collecting the data, I need to generate high-quality random numbers in order to ensure that the unique ids I use in my project are both unpredictable and unique.

In order to generate such numbers, I've found a library that exploits the jitter in the inbuilt watchdog timer that's present in the Arduino Uno. It's got a manual which is worth a read, as it explains the concepts behind it quite well.

After some experimenting, I ended up with a program that would generate random numbers as fast as it could:

// Generate_Random_Numbers - This sketch makes use of the Entropy library
// to produce a sequence of random 32-bit integers, printing one per line
// over the serial port.
//
// Copyright 2012 by Walter Anderson
//
// This file is part of Entropy, an Arduino library.
// Entropy is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// Entropy is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with Entropy.  If not, see <http://www.gnu.org/licenses/>.
// 
// Edited by Starbeamrainbowlabs 2019

#include "Entropy.h"

void setup() {
    Serial.begin(9600);

    // This routine sets up the watch dog timer with interrupt handler to maintain a
    // pool of real entropy for use in sketches. This mechanism is relatively slow
    // since it will only produce a little less than two 32-bit random values per
    // second.
    Entropy.initialize();
}

void loop() {
    uint32_t random_long;
    random_long = Entropy.random();
    Serial.println(random_long);
}

As you can tell, it's based on one of the examples. You may need to fiddle around with the imports in order to get it to work, because the Arduino IDE is terrible.

With this in place and uploaded to an Arduino, all I needed to do was log the serial console output to a file. Thankfully, this is actually really quite easy on Linux:

screen -S random-number-generator dd if=/dev/ttyACM0 of=random.txt bs=1

Since I connected the Arduino in question to a Raspberry Pi I have acting as a file server, I've included a screen call here that ensures that I can close the SSH session without it killing the command I'm executing - retaining the ability to 'reattach' to it later to check on it.

With it set off, I left it for a day or 2 until I had at least 1 MiB of random numbers. Once it was done, I ended up with a file looking a little like this:

(Want more? Download the entire set here.)

In total, it generated 134318 numbers for me to play with, which should be plenty to graph their distribution.

Graphing such a large amount of numbers requires a special kind of program. Since I've used it before, I reached for Gnuplot.

A histogram is probably the best kind of graph for this purpose, so I looked up how to get gnuplot to draw one and tweaked it for my own purposes.

I didn't realise that you could do arbitrary calculations inside a Gnuplot graph definition file, but apparently you can. The important bit below is the bin_width variable:

set key off
set border 3

# Add a vertical dotted line at x=0 to show centre (mean) of distribution.
#set yzeroaxis

# Each bar is half the (visual) width of its x-range.
#set boxwidth 0.05 absolute
#set style fill solid 1.0 noborder

bin_width = 171797751.16;

bin_number(x) = floor(x / bin_width)

rounded(x) = bin_width * ( bin_number(x) + 0.5 )

set terminal png linewidth 3 size 1920,1080

plot 'random.txt' using (rounded($1)):(1) smooth frequency with boxes

It specifies the width of each bar on the graph. To work this out, we need to know the maximum number in the dataset. Then we can divide it by the target number of bins to get the width thereof. Time for some awk!

awk 'BEGIN { a = 0; b=999999999999999 } { if ($1>0+a) a=$1; if ($1 < 0+b) b=$1; } END { print(a, b); }' <random.txt

This looks like a bit of a mess, so let's unwind that awk script so that we can take a better look at it.

BEGIN {
    max = 0;
    min = 999999999999999
}
{
    if ($1 > max)
        max = $1;
    if ($1 < min)
        min = $1;
}
END {
    print("Max:", max, "Min:", min);
}

Much better. In short, it keeps a record of the maximum and minimum numbers it's seen so far, and updates them if it sees a better one. Let's run that over our random numbers:

awk -f minmax.awk <random.txt

Excellent. 4294943779 ÷ 25 = 171797751.16 - which is how I arrived at that value for the bin_width earlier.
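For comparison, here's a rough Python sketch of the same arithmetic: find the maximum, divide by the number of bins, and map each value to the centre of its bin just like the gnuplot rounded(x) function does. The function names and the 25-bin default are illustrative choices, and it assumes a file with 1 integer per line like random.txt.

```python
import math


def bin_width_for(values, bins=25):
    """Width of each bin when the largest value is divided into `bins` parts."""
    return max(values) / bins


def bin_centre(x, bin_width):
    """Mirror of the gnuplot rounded(x) function: the centre of the bin x falls in."""
    return bin_width * (math.floor(x / bin_width) + 0.5)


def load_numbers(path):
    """Read 1 integer per line, skipping blank or partially-written lines."""
    with open(path) as handle:
        return [int(line) for line in handle if line.strip().isdigit()]
```

Calling bin_width_for(load_numbers("random.txt")) should give the same figure as the awk approach, so long as the file is intact.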

Now we can render our graph:

gnuplot random.plt >histogram.png && optipng -o7 histogram.png

I always optimise images with either optipng or jpegoptim to save on storage space on the server, and bandwidth for readers - and in this case the difference was particularly noticeable. Here's the final graph:

The histogram generated by the above command.

As you can see, the number of numbers in each bin is pretty even, so we can reasonably conclude that the algorithm isn't too terrible.
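If eyeballing the graph isn't convincing enough, a chi-squared statistic puts a number on "pretty even". The sketch below is only that: it assumes the values are meant to be uniform over 0 to 2^32, and the function name and parameters are made up for illustration. With 25 bins there are 24 degrees of freedom, for which the usual 5% critical value is about 36.4 - a statistic well above that would suggest the bins aren't as even as they look.

```python
from collections import Counter


def chi_squared_uniform(values, bins, upper):
    """Chi-squared statistic of `values` against a uniform spread over [0, upper)."""
    width = upper / bins
    # Clamp to the last bin so a value equal to `upper` doesn't create an extra one
    counts = Counter(min(int(v / width), bins - 1) for v in values)
    expected = len(values) / bins
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(bins))
```

For the dataset here, that would be chi_squared_uniform(numbers, 25, 2**32), where numbers is the list of values read from random.txt.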

What about uniqueness? Well, that's much easier to test than the distribution. If we count the numbers before and after removing duplicates, it should tell us how many duplicates there were. There's even a special command for it:

wc -l <random.txt 
134318
sort <random.txt | uniq | wc -l
134317
sort <random.txt | uniq --repeated
1349455381

Very interesting. Out of ~134K numbers, there's only a single duplicate! I'm not sure whether that's a good thing or not, as I haven't profiled any other random number generator in this manner before.

Since I'm planning on taking 1 reading a minute for at least a week (that's 10080 readings), I'm not sure that I'm going to get close to running into this issue - but it's always good to be prepared I guess......
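As a back-of-the-envelope check, the birthday problem gives a rough idea of how many duplicates to expect. The sketch below assumes the generator produces uniformly distributed 32-bit values (an assumption, not something measured here), and the function names are illustrative. Under that assumption, 134318 draws should produce roughly 2 colliding pairs on average, so a single duplicate is about what you'd expect - and 10080 readings have only around a 1% chance of containing any collision at all.

```python
import math


def expected_duplicate_pairs(n, space=2**32):
    """Expected number of colliding pairs among n uniform draws from `space` values."""
    return n * (n - 1) / (2 * space)


def collision_probability(n, space=2**32):
    """Birthday-problem approximation of the chance of at least 1 collision."""
    return 1 - math.exp(-expected_duplicate_pairs(n, space))


print(round(expected_duplicate_pairs(134318), 2))
print(round(collision_probability(10080), 4))
```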

Found this interesting? Got a better way of doing it? Comment below!

Sources and Further Reading

Entropy Library for Arduino
Gnuplot
Optipng
Jpegoptim
Optipng on wapm.io - I didn't even know that WebAssembly had a package manager. I definitely need to investigate that now.....
LoRa library that has a random byte generation feature based on untuned radio input

LoRa Terminology Demystified: A Glossary

My 2 RFM95s on the lid of my project's box. More info in a future blog post coming soon!

(Above: My 2 RFM95s. One works, but the other doesn't yet....)

I've been doing some more experimenting with LoRa recently, as I've got 1 of my 2 RFM95 working (yay)! While the other is still giving me trouble (meaning that I can't have 1 transmit and the other receive yet :-/), I've still been able to experiment with other people's implementations.

To that end, I've been learning about a bunch of different words and concepts - and thought that I'd document them all here.

LoRa

The radio protocol itself is called LoRa, which stands for Long Range. It provides a chirp-based system (more on that later under Bandwidth) to allow 2 devices to communicate over great distances.

LoRaWAN

LoRaWAN builds on LoRa to provide a complete end-to-end protocol stack to allow Internet of Things (IoT) devices to communicate with an application server and each other. It provides:

- Standard device classes (A, B, and C) with defined behaviours
    - Class A devices can only receive for a short time after transmitting
    - Class B devices receive on a regular, timed, basis - regardless of when they transmit
    - Class C devices send and receive whenever they like
- The concept of a Gateway for picking up packets and forwarding them across the rest of the network (The Things Network is the largest open implementation to date - you should definitely check it out if you're thinking of using LoRa in a project)
- Secure multiple-layered encryption of messages via AES

...amongst many other things.

The Things Network

The largest open implementation of LoRaWAN that I know of. If you hook into The Things Network's LoRaWAN network, then your messages will get delivered to and from your application server and LoRaWAN-enabled IoT device, wherever you are in the world (so long as you've got a connection to a gateway). It's often abbreviated to TTN.

Check out their website.

A coverage map for The Things Network.

(Above: A coverage map for The Things Network. The original can be found here)

Data Rate

The data rate is the speed at which a message is transmitted, and is measured in bits-per-second. Note that LoRa itself is an 'unreliable' protocol (it doesn't guarantee that anyone will pick anything up at the other end). There are a number of preset data rates:

Code	Speed (bits/second)
DR0	250
DR1	440
DR2	980
DR3	1760
DR4	3125
DR5	5470
DR6	11000
DR7	50000

_(Source: Exploratory Engineering: Data Rate and Spreading Factor)_

These values are a little different in different places - the above are for Europe on 868MHz.

Maximum Payload Size

Going hand-in-hand with the Data Rate, the Maximum Payload Size is the maximum number of bytes that can be transmitted in a single packet. If more than the maximum number of bytes needs to be transmitted, then it has to be split across multiple packets - much like the Maximum Transmission Unit (MTU) in IP networking.

With LoRa, the maximum payload size varies with the Data Rate - from 230 bytes at DR7 to just 59 at DR2 and below.

Spreading Factor

Often abbreviated to just simply SF, the spreading factor is also related to the Data Rate. In LoRa, the Spreading Factor refers to the duration of a single chirp. There are 6 defined Spreading Factors: ranging from SF7 (the fastest transmission speed) to SF12 (the slowest transmission speed).

Which one you use is up to you - and may be automatically determined by the driver library you use (it's always best to check). At first glance, it may seem optimal to choose SF7, but it's worth noting that the slower speeds achieved by the higher spreading factors can net you a longer range.
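To put some numbers on that trade-off, the standard LoRa relation is that one chirp lasts 2^SF / bandwidth seconds, so each step up in spreading factor doubles the chirp duration. That formula isn't from the sources above, and the function name below is purely illustrative - treat it as a sketch:

```python
def chirp_duration_ms(spreading_factor, bandwidth_hz=125_000):
    """Duration of a single chirp in milliseconds: 2^SF / bandwidth."""
    return (2 ** spreading_factor) / bandwidth_hz * 1000


# Compare every spreading factor at the common 125kHz bandwidth
for sf in range(7, 13):
    print(f"SF{sf}: {chirp_duration_ms(sf):.3f} ms per chirp")
```

At 125kHz that works out to about 1ms per chirp at SF7 and about 33ms at SF12.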

Data Rate	Configuration	bits / second	Max payload size (bytes)
DR0	SF12/125kHz	250	59
DR1	SF11/125kHz	440	59
DR2	SF10/125kHz	980	59
DR3	SF9/125kHz	1 760	123
DR4	SF8/125kHz	3 125	230
DR5	SF7/125kHz	5 470	230
DR6	SF7/250kHz	11 000	230
DR7	FSK: 50kbps	50 000	230

_(Again, from Exploratory Engineering: Data Rate and Spreading Factor)_
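The table above translates quite naturally into a lookup for working out how many packets a message will need at a given data rate. This is just a sketch: the values are copied from the table (so they only apply to Europe on 868MHz), and the names DATA_RATES and packets_needed are made up for illustration.

```python
import math

# Data rate code -> (bits per second, maximum payload size in bytes)
DATA_RATES = {
    "DR0": (250, 59),
    "DR1": (440, 59),
    "DR2": (980, 59),
    "DR3": (1760, 123),
    "DR4": (3125, 230),
    "DR5": (5470, 230),
    "DR6": (11000, 230),
    "DR7": (50000, 230),
}


def packets_needed(payload_bytes, data_rate):
    """Number of packets required to carry payload_bytes at the given data rate."""
    _, max_payload = DATA_RATES[data_rate]
    return math.ceil(payload_bytes / max_payload)
```

For example, a 100 byte message fits in 1 packet at DR3 and above, but needs 2 at DR0.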

Duty Cycle

A Duty Cycle is the amount of time something is active as a percentage of a total time. In the case of LoRa(/WAN?), there is an imposed 1% Duty Cycle, which means that you aren't allowed to be transmitting for more than 1% of the time.
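To get a feel for what a 1% duty cycle means in practice, here's a rough sketch. It estimates time on air as simply payload bits divided by the data rate, which ignores the preamble and header overhead that real packets carry - so treat the results as optimistic lower bounds. The function names are illustrative.

```python
def airtime_seconds(payload_bytes, bits_per_second):
    """Rough time on air, ignoring preamble and header overhead."""
    return payload_bytes * 8 / bits_per_second


def min_wait_seconds(airtime, duty_cycle=0.01):
    """Silence needed after transmitting to stay within the duty cycle."""
    return airtime / duty_cycle - airtime


# A full 59 byte payload at DR0 (250 bits/second)
airtime = airtime_seconds(59, 250)
print(f"Time on air: {airtime:.3f}s, then wait {min_wait_seconds(airtime):.1f}s")
```

In other words, a full payload at the slowest data rate takes nearly 2 seconds to send, and then the device has to stay quiet for over 3 minutes.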

Bandwidth

Often misunderstood, the Bandwidth is the range of frequencies across which LoRa transmits. The LoRa protocol itself uses a system of 'chirps', which are spread from one end of the Bandwidth to the other going either up (an up-chirp), or down (a down-chirp). LoRa has 3 bandwidths it uses: 125kHz, 250kHz, and 500kHz.

Some example LoRa chirps as described above.

(Some example LoRa Chirps. Source: This Article on Link Labs)

Frequency

Frequency is something that most of us are familiar with. Different wireless protocols utilise different frequencies - allowing them to go about their business in peace without interfering with each other. For example, 2.4GHz and 5GHz are used by WiFi, and 800MHz is one of the frequencies used by 4G.

In the case of LoRa, different frequencies are in use in different parts of the world. ~868MHz is used in Europe (433MHz can also be used, but I haven't heard of many people doing so), 915MHz is used in the US, and ~780MHz is used in China.

Location	Frequency
Europe	863 - 870MHz
US	902 - 928MHz
China	779 - 787MHz

(Source: RF Wireless World)

Found this helpful? Still confused? Found a mistake? Comment below!

Sources and Further Reading

https://electronics.stackexchange.com/a/305287/180059

LoRaWAN: Dream wireless communication for IoT

(Above: The LoRaWAN Logo. Nope, I'm not affiliated with them in any way - I just find it really cool and awesome :P)

Could it be? Wireless communication for internet of things devices that's not only low-power, but also fairly low-cost, and not only provides message authentication, but also industrial-strength encryption? Too good to be true? You might think so, but if what I'm reading is correct, there's an initiative that aims to provide just that: LoRaWAN, long-range radio.

I first heard about it at the hardware meetup, and after a discussion last time, I thought I ought to take a serious look into it - and as you can probably guess by this post, I'm rather impressed by what I've seen.

Being radio-based, LoRaWAN uses various sub-gigahertz bands - the main one being ~868MHz in Europe, though apparently it can also use 433MHz and 169MHz. It can transfer up to 50kbps, but obviously that kind of speed can only be reached fairly close to the antenna.

Thankfully, the protocol seems to have accounted for this, and provides an adaptive speed negotiation system that lowers data rates in suboptimal conditions and at long range - down to just 300bps, apparently - so while you're not going to be browsing the web on it any time soon (sounds like a challenge to me :P), it's practically perfect for internet-of-things devices, which enable one to answer questions like "where's my cat? It's 2am and she's got out again....", and "what's the air quality like around here? Can we model it?" - without having to pay for an expensive cellular-based solution with a SIM card.

It's this that has me cautiously excited. The ability to answer such questions without paying thousands of pounds will certainly be rather cool. But my next question was: won't that mean even more laughably insecure devices scattered across the countryside? Well, maybe, but the LoRa Alliance seems to have thought of this too, and has somehow managed to bake in 128-bit AES encryption and authentication.

Wait, what? Before we go into more detail, let's take a quick detour to look at how the LoRaWAN network functions. It's best explained with a diagram:

A diagram showing how the LoRa network works - explanation below.

The IoT device sends a message by radio to the nearest gateways.
All gateways in range receive the message and send it to the network server.
The message travels through the internet to the network server.

In essence, the LoRa network is a fairly simple multi-layered network:

IoT Device: The (low-power) end device sending (or receiving) a message.
Gateway: An internet-capable device with a (more powerful) LoRa antenna on it. Relays messages between IoT Devices and the requested network server.
Network Server: A backend server that sends and receives messages to and from the gateways. It deduplicates incoming messages from the gateways, and sends them on to the right Application Server. Going in the opposite direction, it remembers to which gateway the IoT device has the strongest connection, and sends the message there to be transmitted to the IoT device in the next transmit window.
Application Server (not pictured): The server that does all the backend processing of the data coming from or going out to the IoT Devices.
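The Network Server's job described above can be sketched as a toy model. To be clear, this is not how any real LoRaWAN network server is implemented: the class, method names, and the idea of identifying a message by a (device id, frame counter) pair are all assumptions made for illustration.

```python
class NetworkServer:
    """Toy model of message deduplication and downlink gateway selection."""

    def __init__(self):
        self.seen = set()         # (device_id, frame_counter) pairs already forwarded
        self.best_gateway = {}    # device_id -> (gateway_id, rssi) of strongest link

    def receive(self, device_id, frame_counter, gateway_id, rssi):
        """Return True if this copy should go on to the application server."""
        # Remember the gateway with the strongest signal for later downlinks
        current = self.best_gateway.get(device_id)
        if current is None or rssi > current[1]:
            self.best_gateway[device_id] = (gateway_id, rssi)

        # Several gateways may hear the same message, so forward only the first copy
        key = (device_id, frame_counter)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True

    def downlink_gateway(self, device_id):
        """Gateway to use when sending a message back to the device."""
        return self.best_gateway[device_id][0]
```

If 2 gateways both pick up the same message, only the first copy is forwarded, but the one with the stronger signal is remembered for sending replies.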

Very interesting. With the network structure out of the way, let's talk about that security I mentioned earlier. Firstly, reading their security white paper reveals that it's more specifically AES 128 bit in counter mode (AES-128-CTR).

Secondly, isn't AES the Advanced Encryption Standard? What's all this about authentication then? Well, it (ab?)uses AES to create a CMAC (cipher-based message authentication code) for every message sent across the network, thus verifying its integrity. The specific algorithm in use is AES-CMAC, which is standardised in RFC 4493.

Reading the white papers and technical documents on the LoRa Alliance website doesn't reveal any specific details on how the encryption keys are exchanged, but it does mention that there are multiple different keys involved - with separate keys for the network server, application server, and the connecting device itself - as well as a session key derivation system, which sounds to me a lot like forward secrecy that's used in TLS.

Since there's interest at the C4DI hardware meetup of possibly doing a group-style project with LoRaWAN, I might post some more about it in the future. If you're interested yourself, you should certainly come along!

Sources and Further Readings

Getting started with arduino

An arduino and a simple circuit.

Since I've been playing around with the Arduino a bit recently (thank you Rob!) I thought I'd write a quick post on how you can get started with the arduino and its many variants.

If you're into creating electronics and creating circuits, then the arduino is for you. The arduino is a small electronic board; you might find some variant of it in your thermostat at home, or in Rob's thing-o-matic, for example. You'll probably find something akin to an arduino in most embedded systems.

To get started, you'll need to buy an Arduino Uno (you can probably find it cheaper elsewhere). Some LEDs, resistors, and jumper cables wouldn't hurt either.

Once you've got all that, you can start to have some fun. To compile and send programs to your new arduino, you'll need to download and install the Arduino IDE from the official arduino website (direct link for debian / ubuntu users). Once installed, connect your arduino to your computer using the supplied cable and open the IDE.

The menu options that need changing in the IDE

Next, we need to set the IDE up to send correctly compiled programs to our new board. Firstly, we need to tell the IDE what kind of board we have. Go to Tools->Board and select Arduino Uno. We also need to tell the IDE which programmer to use. Go to Tools->Programmer and select AVRISP mkII. Finally, we need to tell the IDE which serial port the arduino is connected on. Go to Tools->Serial Port and select the last item in the list. If the next steps don't work, try selecting a different option in this list until it works.

With that out of the way, we can start to test out our arduino! Arduinos are programmed using a variant of C++, which has a syntax similar to GLSL. To get started quickly, let's send some example code to our arduino to start with. In the file menu, go to Examples->01. Basics and select Blink.

Selecting the example code in the file menu.

A new window will pop up containing the example code. To compile and send the code to your arduino, click the second button in from the left, with the right facing arrow on it. This will send the code to your arduino. Once it's done, you should see a flashing light on your arduino board!

The Arduino IDE interface.

The other buttons are also useful. Here's an explanation:

Verify - Compiles and checks your code for syntax errors, but doesn't write it to the arduino.
Upload - Compiles your code and sends it to your arduino.
New - Creates a new document. This clears your existing tab! Use the down arrow below the 6 in the picture and select New Tab instead.
Open - Opens an existing document. Again, this clears your existing tab.
Save - This should be obvious.
Serial Monitor - Opens the serial monitor. The serial monitor is like a very basic console which allows you to see what your arduino is saying and lets you send messages to it.

That just about covers my very basic getting started tutorial for the arduino. If you've got any questions or comments, please leave them down below.

Sources and Further Reading

The official Arduino website
Basic arduino tutorials - Learn more about the arduino board and what everything does.
Arduino language reference - Useful when writing code.
08249 Labs - A comprehensive set of tutorials that teach everything that you'll need to get started (Hull University students only).
Simple Arduino Serial Communication by Arduino Basics
Arduino Programming Cheat Sheet by Mark Liffiton
