Summer Project Part 4: Threading the needle and compacting it down
In the last part, I put the circuit for the IoT device together and designed a box for said circuit to be housed inside of.
In this post, I'm going to talk a little bit about 3D printing, but I'm mostly going to discuss the software aspect of the firmware I'm writing for the Arduino Uno that's going to control the whole operation out in the field.
Since last time, I've completed the design for the housing and sent it off to my University for 3D printing. They had some great suggestions for improving the design like making the walls slightly thicker (moving from 2mm to 4mm), and including an extra lip on the lid to keep it from shifting around. Here are some pictures:
(Left: The housing itself. Right: The lid. On the opposite side (not shown), the screw holes are indented.)
At the same time as handling sending the housing off to be 3D printed, I've also been busily iterating on the software that the Arduino will be running - and this is what I'd like to spend the majority of this post talking about.
I've been taking an iterative approach to writing it - adding a library, interfacing with it and getting it to do what I want on its own, then integrating it into the main program... and then compacting the whole thing down so that it'll fit inside the Arduino Uno. The thing is, the Uno is powered by an ATmega328P (datasheet), which has 32K of program space and just 2K of RAM. Not much. At all.
The codebase I've built for the Uno is based on the following libraries.
- LMiC (the matthijskooijman fork) - the (rather heavy and needlessly complicated) LoRaWAN implementation
- Entropy for generating random numbers as explained in part 2
- TinyGPS, for decoding NMEA messages from the NEO-6M
- SdFat, for interfacing with microSD cards over SPI
Memory Management
Packing the whole program into a 32K + 2K box is not exactly an easy challenge, I discovered. I chose to first deal with the RAM issue. This was greatly aided by the FreeMemory library, which tells you how much RAM you've got left at a given point in the execution of your program. While it's a bit outdated, it's still a useful tool. It works a bit like this:
#include <MemoryFree.h>
void setup() {
Serial.begin(115200);
Serial.println(freeMemory(), DEC);
char test[] = "Bobs Rockets";
Serial.println(freeMemory(), DEC); // Expected to be lower than the above call, though the compiler may optimise the unused array away
}
void loop() {
// Nothing here
}
It's worth taking a moment to revise the way stacks and heaps work - and the differences between how they work in the Arduino environment and on your desktop. This is going to get rather complicated quite quickly - so I'd advise reading this Stack Overflow answer first before continuing.
First, let's look at the locations in RAM for different types of allocation:
- Things on the stack
- Things on the heap
- Global variables
Unlike on the device you're reading this on, the Arduino does not support multiple processes - and therefore the entirety of the RAM available is allocated to your program.
Since I wasn't sure about precisely how the Arduino does it (it's processor architecture-specific), I wrote a simple test program to tell me:
#include <Arduino.h>
struct Test {
uint32_t a;
char b;
};
Test global_var;
void setup() {
Serial.begin(115200);
Test stack;
Test* heap = new Test();
Serial.print(F("Stack location: "));
Serial.println((uint32_t)(&stack), DEC);
Serial.print(F("Heap location: "));
Serial.println((uint32_t)heap, DEC);
Serial.print(F("Global location: "));
Serial.println((uint32_t)&global_var, DEC);
}
void loop() {
// Nothing here
}
This prints the following for me:
Stack location: 2295
Heap location: 461
Global location: 284
From this we can deduce that global variables are located at the beginning of the RAM space, heap allocations go on top of globals, and the stack grows down starting from the end of RAM space. It's best explained with a diagram:
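As a rough worked example of that layout, the sketch below turns the three printed addresses into distances. It assumes the ATmega328P's SRAM occupies addresses 0x0100 to 0x08FF (2048 bytes), as given in the datasheet; the constant and function names are made up for illustration, and the figures ignore the size of the objects themselves.

```cpp
#include <cstdint>

// Assumed ATmega328P SRAM bounds (from the datasheet): 0x0100..0x08FF.
constexpr uint16_t kRamStart = 0x0100;  // 256
constexpr uint16_t kRamEnd   = 0x08FF;  // 2303

//  0x0100                                               0x08FF
//  | globals | heap -->         free gap         <-- stack |

// Addresses printed by the test program above.
constexpr uint16_t kGlobalAddr = 284;
constexpr uint16_t kHeapAddr   = 461;
constexpr uint16_t kStackAddr  = 2295;

// Total SRAM size in bytes.
constexpr uint16_t ramSize() {
    return kRamEnd - kRamStart + 1;
}

// How far an address sits above the start of SRAM.
constexpr uint16_t offsetFromRamStart(uint16_t addr) {
    return addr - kRamStart;
}

// How far an address sits below the end of SRAM.
constexpr uint16_t distanceFromRamEnd(uint16_t addr) {
    return kRamEnd - addr;
}

// Approximate gap between a heap object and a stack object.
constexpr uint16_t gapBetween(uint16_t heapAddr, uint16_t stackAddr) {
    return stackAddr - heapAddr;
}
```

With the numbers above, the global sits 28 bytes into SRAM, the stack variable sits 8 bytes below the end of SRAM, and roughly 1834 bytes separate the heap allocation from the stack variable.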
Now for the differences. On a normal machine running an operating system, there's an extra layer of abstraction between where things are actually located in RAM and where the operating system tells you they are located. This is known as virtual memory address translation (see also virtual memory, virtual address space).
It's a system whereby the operating system maintains a series of tables that map physical RAM to a virtual address space that the running processes actually use. Usually each process running on a system will have its own table (but this doesn't mean that it will have its own physical memory - see also shared memory, but this is a topic for another time). When a process accesses an area of memory with a virtual address, the operating system will transparently translate the address using the table to the actual location in RAM (or elsewhere) that the process wants to access.
This is important (and not only for security), because under normal operation a process will probably allocate and deallocate a bunch of different lumps of memory at different times. With a virtual address space, the operating system can defragment the physical RAM space in the background and move stuff around without disturbing currently running processes. Keeping the free memory contiguous speeds up future allocations, and ensures that if a process asks for a large block of contiguous memory the operating system will be able to allocate it without issue.
As I mentioned before though, the Arduino doesn't have a virtual memory system - partly because it doesn't support multiple processes (it would need an operating system for that). The side-effect here is that it doesn't defragment the physical RAM. Since C/C++ isn't a managed language, we don't get _heap compaction_ either like in .NET environments such as Mono.
All this leads us to an environment in which heap allocation needs to be done very carefully, in order to avoid fragmenting the heap and having it collide with the stack. If an object somewhere in the middle of the heap is deallocated, the heap will not shrink until everything allocated above it (at higher addresses) is also deallocated. This post has a good explanation of the problem too.
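To make the fragmentation problem concrete, here's a toy model of a heap written in plain C++ (so it runs on a desktop rather than the Uno). It is not how the Arduino's allocator is actually implemented: the `ToyHeap` class and its methods are hypothetical, and it uses a simplified first-fit strategy that never splits holes. It only illustrates that freeing a block in the middle leaves the top of the heap where it was.

```cpp
#include <cstddef>
#include <vector>

// A toy heap: blocks are laid out back to back starting at offset 0.
class ToyHeap {
public:
    // Allocate `size` bytes. Reuse the first free hole that is big enough
    // (first fit, without splitting it), otherwise grow the heap at the top.
    // Returns the index of the block, which doubles as a handle.
    std::size_t allocate(std::size_t size) {
        for (std::size_t i = 0; i < blocks_.size(); ++i) {
            if (!blocks_[i].used && blocks_[i].size >= size) {
                blocks_[i].used = true;
                return i;
            }
        }
        blocks_.push_back({size, true});
        return blocks_.size() - 1;
    }

    // Free a block. The heap only shrinks when the free space reaches
    // all the way up to the top of the heap.
    void release(std::size_t handle) {
        blocks_[handle].used = false;
        while (!blocks_.empty() && !blocks_.back().used) {
            blocks_.pop_back();
        }
    }

    // Offset of the top of the heap, i.e. how far it reaches towards the stack.
    std::size_t top() const {
        std::size_t total = 0;
        for (const Block& b : blocks_) {
            total += b.size;
        }
        return total;
    }

private:
    struct Block {
        std::size_t size;
        bool used;
    };
    std::vector<Block> blocks_;
};
```

Allocating 100, 200 and 50 bytes puts the top of this toy heap at 350. Releasing the 200-byte block in the middle leaves the top at 350, and a later 300-byte request can't use the hole, so the top moves up to 650. Only once the blocks above the hole are released does the top drop back to 100.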
Other things we need to consider are keeping global variables to a minimum, and trying to keep most things on the stack if we can help it (though this may slow the program down if it's copying things between stack frames all the time).
To this end, we need to choose the libraries we use with care - because they can easily break these guidelines that we've set for ourselves. For example, the inbuilt SD library is out, because it uses a global variable that eats over 50% of our available RAM - and there's no way (that I can see at least) to reclaim that RAM once we're finished with it.
This is why I chose SdFat instead, because it's at least a little better at allowing us to reclaim most of the RAM it used once we're finished with it by letting the instance fall out of scope (though in my testing I never managed to reclaim all of the RAM it used afterwards).
Alternatives like µ-Fat do exist and are even lighter, but they have restrictions such as no appending to files for example - which would make the whole thing much more complicated since we'd have to pre-allocate the space for the file (which would get rather messy).
The other major tactic you can use to save RAM is the F() trick. Consider the following sketch:
#include <Arduino.h>
void setup() {
Serial.begin(115200);
Serial.println("Bills boosters controller, version 1");
}
void loop() {
// Nothing here
}
In setup() we've got an innocent Serial.println() call. What's not obvious is that the string literal is actually copied to RAM before being passed to Serial.println() - eating into our precious working memory. Wrapping it in the F() macro forces it to stay in your program's storage space:
Serial.println(F("Bills boosters controller, version 1"));
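To put a number on that saving, here's a small desktop C++ sketch that works out how many bytes of RAM a literal would occupy if it were left unwrapped: one byte per character plus the terminating NUL. The helper `literalRamCost` and the constants are made up for illustration, and the 2048-byte figure is the ATmega328P's total SRAM.

```cpp
#include <cstddef>

// Bytes of RAM a string literal occupies when it is NOT wrapped in F():
// the array length N already includes the terminating NUL.
template <std::size_t N>
constexpr std::size_t literalRamCost(const char (&)[N]) {
    return N;
}

// Total SRAM on the ATmega328P, in bytes.
constexpr std::size_t kTotalRam = 2048;

// Cost of the banner string from the sketch above.
constexpr std::size_t kBannerCost =
    literalRamCost("Bills boosters controller, version 1");

// The same cost in tenths of a percent of total RAM (integer arithmetic).
constexpr std::size_t kBannerPerMille = kBannerCost * 1000 / kTotalRam;
```

That single banner comes to 37 bytes, a little under 2% of the Uno's RAM, which adds up quickly once a sketch has a few dozen debug messages.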
Saving storage
With the RAM issue mostly dealt with, I then had to deal with the thorny issue of program space. Unfortunately, this is not as easy as saving RAM because we can't just 'unload' something when it's not needed.
My approach to reducing program storage space was twofold:
- Picking lightweight alternatives to libraries I needed
- Messing with flags of said libraries to avoid compiling parts of libraries I don't need
It is for these reasons that I ultimately went with TinyGPS instead of TinyGPS++, as it saved 1% or so of the program storage space.
It's also for this reason that I've disabled as much of LMiC as possible:
#define DISABLE_JOIN
#define DISABLE_PING
#define DISABLE_BEACONS
#define DISABLE_MCMD_DCAP_REQ
#define DISABLE_MCMD_DN2P_SET
This disables OTAA, Class B functionality (which I don't need anyway), receiving messages, the duty cycle cap system (which I'm not sure works between reboots), and a bunch of other stuff that I'd probably find rather useful.
In the future, I'll probably dedicate an entire microcontroller to handling LoRaWAN functionality - so that I can still use the features I've had to disable here.
Even doing all this, I still had to trim down my Serial.println() calls and remove any non-essential code to bring it under the 32K limit. As of the time of typing, I've got just 26 bytes to spare!
Next time, after tuning the TPL5110 down to the right value, we're probably going to switch gears and look at the server-side of things - and how I'm going to be storing the data I receive from the Arduino-based device I've built.
Found this interesting? Got a suggestion? Comment below!