
Website update: Share2Fediverse, and you can do it too!

Heya! Got another short post for you here. You might notice that on all posts there's now a new share button (those buttons that take you to different places with a link to this site so you can share it elsewhere) that looks like this:

The 5-pointed rainbow fediverse logo

If you haven't seen it before, this is the logo for the Fediverse, a decentralised network of servers and software that all interoperate (find out more here: https://fedi.tips/).

Since creating my Mastodon account, I've wanted some way to allow everyone here to share my posts on the Fediverse if they feel that way inclined. Unlike centralised social media platforms like Reddit though, the Fediverse doesn't have a 'central' server that you can link to.

Instead, you need a landing page to act as a middleman. There are a few options out there already (e.g. share2fedi), but I wanted something specific and static, so I built my own solution. It looks like this:

A screenshot of Share2Fediverse. The background is rainbow like the fediverse logo, with translucent pentagons scattered across it. The landing page window is centred, with a title and a share form.

(Above: A screenshot of Share2Fediverse.)

It's basically a bit of HTML + CSS for styling, plus a splash of Javascript to make the interface function and to remember, via localStorage, the instance and software you select for next time.

Check it out at this demo link:

https://starbeamrainbowlabs.com/share2fediverse/#text=The%20fediverse%20is%20cool!%20%E2%9C%A8

Currently, it supports sharing to Mastodon, GNU Social, and Diaspora. As it turns out, finding the share URL (e.g. for Mastodon on fediscience.org it's https://fediscience.org/share?text=some%20text%20here) is more difficult than it sounds, as they don't seem to be well advertised. I'd love to add e.g. Pixelfed, Kbin, GoToSocial, Pleroma, and more.... but I need the share URL! If you know the share URL for any piece of Fediverse software, please do leave a comment below.

If you're interested in the source code, you can find it here:

https://github.com/sbrl/Share2Fediverse/

...if you'd really like to help out, you could even open a pull request! The file you want to edit is src/lib/software_db.mjs - though if you leave a comment here or open an issue I'll pick it up and add any requests.

See you on the Fediverse! o/

500 posts - thank you!

Looking up into a blossom tree against a blue sky.

500 posts is a lot. When I started writing back in 2014, I never imagined that I would make it to this milestone. I've thought for a while about what I wanted to do to celebrate, but couldn't think of anything specific - so I wanted to thank everyone who has supported me so far in my journey through University - first in my undergraduate course, then in my MSc course, and now in my PhD.

It was Rob Miles who first encouraged me to start a blog in the first year of my undergraduate course. A few weeks later, I had gone from a coming-soon page to building starbeamrainbowlabs.com, followed closely by this blog, which I put together piece by piece.

The backend is actually written in PHP - though it is on my (seemingly endless :P) todo list to rewrite it as it's not particularly well written. I've made a start on this already by refactoring the commenting system (and adding more statistics), but I haven't touched the blog itself and the main website (particularly the CSS) much yet.

In total, over the last 499 posts (I'm still writing this post as of the time of typing) I've written 347,256 words, counted by running cat *.md | tr -d -- '-{}\[\]();=><' | wc -w on the markdown sources of all the posts I've written. This is a mind-boggling number! I suspect it's somewhat inflated by the code I include in my blog posts though.

On these, I've received 192 (probably) genuine top-level comments that aren't spam (not counting replies, which are difficult to count with jq, as the replies parameter isn't always present in my backend JSON files I store comments in). Each and every one of these has been helpful, and given me motivation to continue writing here - especially more recently on my PhD Update series.

I might have missed some spam comments, so do get in touch if you spot one.

The span from my first post way back on 29th June 2014 to this post in the present is exactly 7 years, 10 months, 13 days, and 8 hours (or 2874 days and 8 hours), averaging 5 days 17 hours between each post overall.

I would like to thank everyone who has supported me on this incredible journey - especially my personal supervisor and also my PhD supervisor - both of whom have continuously assisted me with issues both large and small at all times of the day and year. The entire Department of Computer Science at the University of Hull - members both past and present - have all been very kind and helpful, and I'm deeply grateful to have had such a welcoming place to be.

Finally, thank you for reading. While I don't write posts on my blog here expecting that anyone will read them, it's amazing to see and hear about people finding them helpful :D

I can't say where I'm headed next after my PhD (the end of which is still some time away), but I can say that I'm committed to posting on this blog - so it won't be going anywhere any time soon :P

If there's a specific topic you'd like me to cover (and I haven't already done so), please do leave a comment below.

A ladybird in a hawthorn bush.

Website Update: Tools section

A while ago I noticed that the tools section of my website was horribly outdated, so recently I decided to do something about it. It was still largely displaying tools from the days when I used Windows as my primary operating system - which was a long time ago now!

The new revision changes it to display icons only instead of icons and screenshots, as screenshots aren't always helpful for some of the tools I now use - and it also makes it easier to keep the section updated in the future.

A screenshot of part of the new tools section of my website.

I also switched from a JSON file to a tab-separated values (TSV) file for the backend data file that the tools list is generated from, as a TSV file is much more convenient to edit by hand than a JSON one (JSON is awesome for lots of things and I use it all the time - it's just not as useful here). If you're interested, you can view the source TSV file here: tools.tsv
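Parsing it is simple too. Here's a minimal sketch of how a file like this can be parsed in PHP - note that the column layout here (name, url, icon) is made up for illustration; check the real tools.tsv for the actual fields:

<?php
// Parse a TSV data file into a list of tool entries.
// Hypothetical column layout: name, url, icon
$tools = [];
foreach(file("tools.tsv", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
    if($line[0] == "#") continue; // Allow comment lines
    [ $name, $url, $icon ] = explode("\t", $line);
    $tools[] = [ "name" => $name, "url" => $url, "icon" => $icon ];
}
var_dump($tools);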

I'm still filling out the list (each item in the list also necessitates an update to my personal logo collection), but it's already a huge improvement from the old tools list. Things like GitHub's Atom IDE, Ubuntu, Mozilla Firefox, and KeePass2 are currently my daily drivers - so it's great to have them listed there.

Check out the new tools section here: https://starbeamrainbowlabs.com/#tools

New website for Pepperminty Wiki

By now, Pepperminty Wiki is quite probably my longest-running project - and I'm absolutely committed to continuing to support and improve it over time (I use it to host quite a lot of very important information myself).

As part of this, one of the things I'm always looking to improve is the installation process and the first impression users get when they first visit Pepperminty Wiki. Currently, that first impression is a GitHub repository. This is great (as it shows people that we're open-source), but it isn't particularly user-friendly for those who are less technically inclined.

To this end, I've built a shiny new website to introduce people to Pepperminty Wiki and the features it has to offer. I've been thinking about this for a while, and I realised that, despite the fact that I haven't incremented the version number to v1.0 yet (as of the time of posting, the latest stable release is v0.22), Pepperminty Wiki is actually pretty mature, easy to deploy and use, and stable.

The new website for Pepperminty Wiki (link below)

(Above: The new Pepperminty Wiki website. Check it out here!)

Stability is a new focus for me, as it isn't something I've traditionally put much emphasis on - focusing instead on educational purposes. Development of Pepperminty Wiki has sort of fallen into a pattern of 2-3 releases per year - each of which is preceded by one or more beta releases. I always leave at least 1 week between releasing a beta and the subsequent stable release to give myself and beta testers (of which Pepperminty Wiki has some! If you're reading this, I really appreciate it) time to spot any last-minute issues.

Anyway, the website can be found here: https://peppermint.mooncarrot.space/

Share it with your friends! :D

The initial plan was to buy a domain name like pepperminty.wiki for it, but after looking into the prices (~£36.29 per year) I found it was waaay too expensive for a project that I'm not earning a penny from working on (of course, if you're feeling that way inclined I have a Liberapay setup if you'd like to contribute towards server costs, but it's certainly not required).

Instead, I used a subdomain of one of my existing domains, mooncarrot.space (I use this one mostly for personal web app instances on my new infrastructure I'm blogging about in my cluster series), which is a bit shorter and easier to spell/say than starbeamrainbowlabs.com if you're not used to it.

After a few false starts, I settled on using Eleventy as my static site generator of choice. I'm not making use of all its features (not even close), but I've found it fairly easy to use and understand how it ticks - and also flexible enough that it will work with me, rather than attempting to force me into a particular way of working.

Honourable mentions here include Hugo (a great project, but if I recall correctly I found it confusing and complicated to set up and use), and documentation (an epic documentation generator for JS projects, but not suitable for this type of website - check out some of the docs I auto-generate via my Laminar CI setup: powahroot, applause-cli, terrain50).

The Pepperminty Wiki website light theme

(Above: The light theme for the website - which one you see depends on your system preference, via the prefers-color-scheme media query. Personally I prefer the dark theme, as it's easier on my eyes)
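For the curious, here's a minimal sketch of how prefers-color-scheme is typically used. This isn't the website's actual stylesheet - the custom property names are made up:

/* Default to the dark theme */
:root {
    --background: #232323;
    --text: #eeeeee;
}
/* Switch to the light theme if the visitor's system asks for it */
@media (prefers-color-scheme: light) {
    :root {
        --background: #fefefe;
        --text: #232323;
    }
}
body {
    background: var(--background);
    color: var(--text);
}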

The experience of implementing the website was an interesting one. Never having built a website to 'sell' something before (even if this is for a thing that's free), I found the most challenging part of the experience to be determining what text to use to appropriately describe the features of Pepperminty Wiki.

From the beginning I sort of had a vision for how I wanted the website to look. I wanted an introductory bit at the top (with a screenshot at a cool angle!), followed by a bit that explained the features, then some screenshots with short descriptions, followed finally by a download section. I also wanted it to be completely mobile-friendly.

A screenshot of the website as viewed by a mobile device

(Above: A screenshot of the website as viewed by a mobile device. The Firefox Developer Tools were useful for simulating this)

For the most part, this panned out quite well. Keeping the design relatively simple enabled me to support mobile devices as I went along, with minimal tweaks needed at the end of the process (mobile support really needs to be part of the initial design process).

The cool screenshot at the top and the fancy orange buttons you'll see in various places across the site were especially fun to put together - the iterative process of adding CSS directives to bring the look I had in my head to life was very satisfying. I think I'll use the same basic principle behind the fancy buttons again elsewhere (try hovering over them and clicking them to see the animations).

The bottom of the website, showing the fancy orange buttons

(Above: The bottom of the website, showing the fancy orange buttons)

I did contemplate using a CSS framework for the website, but the fact that I've never seriously used one for a personal project, combined with the advent of CSS grid, resulted in the decision to abandon the idea of a framework once again (I'll learn one eventually, I'm sure).

So far my experience with frameworks is that they just get in the way when you want to do something that wasn't considered when the framework was built - but given their widespread use elsewhere, I suppose I really should make an effort to learn at least one to get that experience (any suggestions in the comments are welcome).

All in all the experience of building the Pepperminty Wiki website was an enjoyable one. It took a number of hours over a number of days to put together (putting the false starts aside), but I feel as though it was definitely worth it.

Find the website here: https://peppermint.mooncarrot.space/

If I end up moving it at a later date, I'll ensure there's a redirect in place so the above link won't break.

Found this useful? Got a comment about or a suggestion to improve the website? Comment below! I'd love to hear from you.

Spam statistics are live!

I've blogged about spam a few times before, and as you might have guessed, defending against it and analysing the statistics thereof is a bit of a hobby of mine. Since I first installed the comment key system (and then later upgraded it) in 2015, I've been keeping a log of all the attempts to post spam comments on my blog. Currently it amounts to ~27K spam attempts, which is about 14 comments per day overall(!) - far too many to sort out manually!

This tracking system is based on mistakes. I have a number of defences in place, and each time a defence is tripped, the system logs it. For example, here are the mistake codes for some of my defences:

Code Meaning
website A web address was entered (you'll notice you can't see a website address field in the comment form below - it's hidden to regular users)
shortcomment The comment was too short
invalidkey The comment key was missing or invalid
http10notsupported The request was made over HTTP 1.0 instead of HTTP 1.1+
invalidemail The email address entered was invalid

These are the 5 leading causes of comment posting failures over the past month or so. Until recently, the system would only log the first defence that was tripped - leaving any other defences that might also have been tripped unrecorded. This saves on computational resources, but doesn't help the statistics I've been steadily gathering.

With the new system I implemented on the 12th June 2020, a comment is checked against all current defences - even if one of them has been tripped already, leading to some interesting new statistics. I've also implemented a quick little statistics calculation script, which is set to run every day via cron. The output thereof is public too, so you can view it here:

Failed comment statistics
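To illustrate the difference, here's a rough sketch in PHP of the new check-everything approach. The defence functions (and their thresholds) are hypothetical stand-ins, not my actual checks:

<?php
// Check a comment against every defence and record all the mistakes made,
// rather than bailing out at the first tripped defence.
// These defences are hypothetical stand-ins for the real ones.
$defences = [
    "website" => fn($comment) => empty($comment["website"]), // The honeypot field should be empty
    "shortcomment" => fn($comment) => strlen($comment["message"]) > 10, // Threshold made up
    "invalidemail" => fn($comment) => filter_var($comment["email"], FILTER_VALIDATE_EMAIL) !== false
];

function find_mistakes(array $comment, array $defences) : array {
    $mistakes = [];
    foreach($defences as $code => $passes) {
        if(!$passes($comment)) // A defence 'trips' when it returns false
            $mistakes[] = $code;
    }
    return $mistakes; // An empty array means no defences were tripped
}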

Some particularly interesting things to note are the differences in the mistake histograms. There are 2 sets thereof: 1 pair that tracks all the data, and another that only tracks the data that was recorded after 12th June 2020 (when I implemented the new mistake recording system).

From this, we can see that if we look at only the first mistake made, invalidkey catches the most spammers by a landslide. However, if we look at all the mistakes made, the website check wins out instead - this is because the invalidkey check happens before the website check, so being the first line of defence skewed the first-mistake results in its favour.

Also interesting is how comment spam numbers have grown over time in the spam-by-month histogram. Although it's a bit early to tell by that graph, there's a very clear peak around May / June, which I suspect are malicious actors attempting to gain an advantage from people who may not be able to moderate their content as closely due to recent happenings in the world.

I also notice that the overall amount of spam I receive has an upwards trend. I suspect this is due to more people knowing about my website since it's been around for longer.

Finally, I notice that in the average number of mistakes (after 2020-06-12) histogram, most spammers make at least 2 mistakes. Unfortunately there's also a significant percentage of spammers who make only a single mistake, so I can't yet relax the rules such that you need to make 2 or more mistakes to be considered a spammer.

Incidentally, it would appear that the most common pair of mistakes to make is shortcomment and website - perhaps this is an artefact of some specific scraping / spamming software? If I knew more in this area, I suspect it might be possible to identify the spammer from the mistakes they've made and perhaps their user agent.

This is, of course, a very rudimentary analysis of the data at hand. If you're interested, get in touch and I'm happy to consider sharing my dataset with you.

EmbedBox: Lightweight syntax-highlighted embeds

I was planning on posting about something else yesterday, but I wanted to show some GitLab code in a syntax-highlighted embed. When I couldn't figure out how to do that, I ended up writing EmbedBox.

The whole thing is best explained with an example. Have an embed:

(Can't see the above? Check out the original file here)

Pretty cool, right? The above is the default settings file for EmbedBox. Given any URL (e.g. https://raw.githubusercontent.com/sbrl/EmbedBox/master/src/settings.default.toml), it will generate a syntax-highlighted embed for it.

It does so using highlight.php to do the syntax-highlighting server-side, Stash PHP for the cache, and without any Javascript in the embed itself.
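As a rough sketch of the core idea (based on my reading of the highlight.php README, so treat the exact API and language name as assumptions - and note that the Stash caching is elided here):

<?php
require("vendor/autoload.php");

use Highlight\Highlighter;

// Fetch the target file (in reality, this is where the cache comes in)
$code = file_get_contents("https://raw.githubusercontent.com/sbrl/EmbedBox/master/src/settings.default.toml");

// Do the syntax highlighting server-side
$highlighter = new Highlighter();
$result = $highlighter->highlight("ini", $code); // highlight.js' ini mode also covers TOML

// ->value contains HTML full of hljs-* spans, ready to be styled with CSS
echo "<pre><code class=\"hljs {$result->language}\">{$result->value}</code></pre>";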

It comes with a web interface that generates the embed code given the input URL and a few other settings and shows a preview of what it'll look like.

EmbedBox is open-source too (under the Mozilla Public Licence 2.0), so you're welcome to set up your own instance!

To do so, check out the code here: https://github.com/sbrl/EmbedBox/

The installation instructions should be pretty straightforward in theory, but if you get stuck please open an issue.

Now that I've implemented EmbedBox, you can expect to see it appear in future blog posts. I'm planning to write about my organise-photos script soon, so expect it to make an appearance there.

Found this interesting? Got a suggestion? Want to say hi? Comment below!

Website update: Blog post view counter

Website update! This time, I've added a blog post view counter. You can see it at the bottom of every blog post:

A screenshot of the new blog post view counter with a red box around it. I would have liked to highlight it by darkening & blurring the rest of the image, but my screenshotting tool doesn't support it yet.

While views don't really matter to me on this blog, I am curious as to how many people read my posts.

It was fairly simple to implement, and the internals are quite interesting. Under the hood, it uses a 1x1 transparent tracking image that's actually located just to the right of the word "views". You can view that image here. I searched the Internet for the absolute smallest tracking image I could find, and came up with the one I'm using now (it's from here).

The aim here with using an external tracking image is to avoid counting bots that just load the page without images to see if they can spam me.

Every time you load the image, it adds 1 to a counter stored in an SQLite database file. It also serves a caching header, so your browser shouldn't request the same tracking image more than once in a 30-minute window.
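As a rough sketch (this isn't my actual implementation - the database schema, filename, and query string parameter here are all made up), the tracking endpoint boils down to something like this:

<?php
// Sketch of a view-counting tracking-image endpoint
$post = preg_replace("/[^0-9a-zA-Z\-_]/", "", $_GET["post"] ?? "unknown");

// Bump the counter for this post
$db = new SQLite3("views.sqlite");
$db->exec("CREATE TABLE IF NOT EXISTS views (post TEXT PRIMARY KEY, count INTEGER NOT NULL DEFAULT 0)");
$statement = $db->prepare("INSERT INTO views(post, count) VALUES(:post, 1)
    ON CONFLICT(post) DO UPDATE SET count = count + 1"); // Needs SQLite 3.24+
$statement->bindValue(":post", $post);
$statement->execute();

// Ask the browser not to re-request the image for 30 minutes
header("cache-control: max-age=1800");
header("content-type: image/gif");
// A well-known 1x1 transparent GIF
echo base64_decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7");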

The system itself is fairly portable and flexible - I can use it in other places with little to no changes should I wish to. It also has a simple status dashboard where you can see all the views at the same time. As of the time of typing, these are the top 5 posts:

Spot Name Views
1 How to set up a WebDav share with Nginx 78
2 Run a program on your dedicated AMD graphics card on Linux 48
3 Embedding Files in C♯ Binaries 36
4 Orange Pi 3 in review 28
5 Developing and Running C# Programs on Linux 25

I kind of suspected that the posts in spots #1 and #2 would be popular. I've got quite a few comments on both of them - which is quite unusual for this blog. I estimate that only 1 in 500 to 1 in 1000 people actually leave a comment.

The post in #3 isn't really a surprise either - I've seen it crop up a number of times in my server logs, and I found it really difficult to find a clear and easy-to-read post on the subject when I wrote that post.

The post in #4 is probably only there because I used it for testing purposes - so at least 70% of those 'views' were me :P

Lastly, the post in #5 surprises me a bit. I would have thought that there's plenty of other resources around the internet about running .NET applications on Linux with Mono that would rank much more highly than my blog post, but I guess I was wrong! I'd be really curious to know if those people are primarily from my University.

The views further down the list drop into the <5 range fairly quickly, so I'd take those with a pinch of salt. I suspect that many of them are bots automatically crawling the page - such as GoogleBot, for instance.

It's amazing to know that people actually read the things I write on here, even if they don't comment. It gives me motivation to write more blog posts :P

Of course, if there's something in particular that you'd like to see, you're welcome to leave a comment.

Where in the world does spam come from?

Answer: The US, apparently. I was having a discussion with someone recently, and since I have a rather extensive log of comment failures for debugging & analysis purposes (dating back to February 2015!) they suggested that I render a map of where the spam is coming from.

It was such a good idea that I ended up doing just that - and somehow also writing this blog post :P

First, let's start off by looking at the format of said log file:

[ Sun, 22 Feb 2015 07:37:03 +0000] invalid comment | ip: a.b.c.d | name: Nancyyeq | articlepath: posts/015-Rust-First-Impressions.html | mistake: longcomment
[ Sun, 22 Feb 2015 14:55:50 +0000] invalid comment | ip: e.f.g.h | name: Simonxlsw | articlepath: posts/015-Rust-First-Impressions.html | mistake: invalidkey
[ Sun, 22 Feb 2015 14:59:59 +0000] invalid comment | ip: x.y.z.w | name: Simontuxc | articlepath: posts/015-Rust-First-Impressions.html | mistake: invalidkey

Unfortunately, I didn't think about parsing it programmatically when I designed the log file format.... Oops! It's too late to change it now, I suppose :P

Anyway, as an output, we want a list of countries in 1 column, and a count of the number of IP addresses from each country in another. First things first - we need to extract those IP addresses. awk is ideal for this. I cooked this up quickly:

BEGIN {
    FS="|"
}

{
    gsub(" ip: ", "", $2);
    print $2;
}

This basically tells awk to split lines on the pipe character (|), extract the IP address bit (ip: p.q.r.s), and then strip out the ip: prefix.

With this done, we're ready to lookup all these IP addresses to find out which country they're from. Unfortunately, IP addresses can change hands semi-regularly - even across country borders, so my approach here isn't going to be entirely accurate. I don't anticipate the error generated here to be all that big though, so I think it's ok to just do a simple lookup.

If I was worried about it, I could probably investigate cross-referencing the IP addresses with a GeoIP database from the date & time I recorded them. The effort here would be quite considerable - and this is a 'just curious' sort of thing, so I'm not going to do that here. If you have done this, I'd love to hear about it though - post a comment below.

Actually doing a GeoIP lookup is fairly easy. While for the odd IP address here and there I usually use ipinfo.io, when there are lots of lookups to be done (10,479 to be exact! Wow.), it's probably best to utilise a local database. A quick bit of research reveals that Ubuntu Server has a package that should do the job called geoip-bin:


sudo apt install geoip-bin
(....)
geoiplookup 1.1.1.1 # CloudFlare's 1.1.1.1 DNS service
GeoIP Country Edition: AU, Australia

Excellent! We can now look up IP addresses automagically via the command line. Let's plug that into the little command chain we've got going on here:

cat failedcomments.log | awk 'BEGIN { FS="|" } { gsub(" ip: ", "", $2); print $2 }' | xargs -n1 geoiplookup

It doesn't look like geoiplookup supports multiple IP addresses at once, which is a shame. In that case, the above will take a while to execute for 10K IP addresses.... :P

Next up, we need to remove the annoying label there. That's easy with sed:

(...) | sed -E 's/^[A-Za-z: ]+, //g'

I had some trouble here getting sed to accept a regular expression. At some point I'll have to read the manual pages more closely and write myself a quick reference guide. Come to think about it, I could use such a thing for awk too - their existing reference guide appears to have been written by a bunch of mathematicians who like using single-letter variable names everywhere.

Anyway, now that we've got our IP address list, we need to strip out any errors, and then count them all up. The first point is somewhat awkward, since geoiplookup doesn't send errors to the standard error for some reason, but we can cheese it with grep -v:

(...) | grep -iv 'resolve hostname'

The -v here tells grep to instead remove any lines that match the specified string, instead of showing us only the matching lines. This appeared to work at first glance - I simply copied a part of the error message I saw and worked with that. If I have issues later, I can always look at writing a more sophisticated regular expression with the -P option.

The counting bit can be achieved in bash with a combination of the sort and uniq commands. sort will, umm, sort the input lines, and uniq will de-duplicate multiple consecutive input lines, whilst optionally counting them. With this in mind, I wound up with the following:

(...) | sort | uniq -c | sort -n

The first sort call sorts the input to ensure that all identical lines are next to each other, ready for uniq.

uniq -c does the de-duplication, but also inserts a count of the number of duplicates for us.

Lastly, the final sort call with the -n argument sorts the completed list via a natural sort, which means (in our case) that it handles the numbers as you'd expect it to. I'd recommend you read the Wikipedia article on the subject - it explains it quite well. This should give us an output like this:

      1 Antigua and Barbuda
      1 Bahrain
      1 Bouvet Island
      1 Egypt
      1 Europe
      1 Guatemala
      1 Ireland
      1 Macedonia
      1 Mongolia
      1 Saudi Arabia
      1 Tuvalu
      2 Bolivia
      2 Croatia
      2 Luxembourg
      2 Paraguay
      3 Kenya
      3 Macau
      4 El Salvador
      4 Hungary
      4 Lebanon
      4 Maldives
      4 Nepal
      4 Nigeria
      4 Palestinian Territory
      4 Philippines
      4 Portugal
      4 Puerto Rico
      4 Saint Martin
      4 Virgin Islands, British
      4 Zambia
      5 Dominican Republic
      5 Georgia
      5 Malaysia
      5 Switzerland
      6 Austria
      6 Belgium
      6 Peru
      6 Slovenia
      7 Australia
      7 Japan
      8 Afghanistan
      8 Argentina
      8 Chile
      9 Finland
      9 Norway
     10 Bulgaria
     11 Singapore
     11 South Africa
     12 Serbia
     13 Denmark
     13 Moldova, Republic of
     14 Ecuador
     14 Romania
     15 Cambodia
     15 Kazakhstan
     15 Lithuania
     15 Morocco
     17 Latvia
     21 Pakistan
     21 Venezuela
     23 Mexico
     23 Turkey
     24 Honduras
     24 Israel
     29 Czech Republic
     30 Korea, Republic of
     32 Colombia
     33 Hong Kong
     36 Italy
     38 Vietnam
     39 Bangladesh
     40 Belarus
     41 Estonia
     44 Thailand
     50 Iran, Islamic Republic of
     53 Spain
     54 GeoIP Country Edition: IP Address not found
     60 Poland
     88 India
    113 Netherlands
    113 Taiwan
    124 Indonesia
    147 Sweden
    157 Canada
    176 United Kingdom
    240 Germany
    297 China
    298 Brazil
    502 France
   1631 Russian Federation
   2280 Ukraine
   3224 United States

Very cool. Here's the full command for reference (explainshell explanation):

cat failedcomments.log | awk 'BEGIN { FS="|" } { gsub(" ip: ", "", $2); print $2 }' | xargs -n1 geoiplookup | sed -e 's/GeoIP Country Edition: //g' | sed -E 's/^[A-Z]+, //g' | grep -iv 'resolve hostname' | sort | uniq -c | sort -n

With our list in hand, I imported it into LibreOffice Calc to parse it into a table with the fixed-width setting (Google Sheets doesn't appear to support this), and then pulled that into a Google Sheet in order to draw a heat map:

A world map showing the above output in a heat-map style. Countries with more failed comment attempts appear redder.

At first, the resulting graph showed just a few countries in red and the rest in white. To rectify this, I pushed the counts through the natural log (log()) function, which yielded a much better map, where countries that have only been spamming a bit are still shown in a shade of red.

From this graph, we can quite easily conclude that the 'spammiest' countries are:

  1. The US
  2. Russia
  3. Ukraine (I get lots of spam emails from here too)
  4. China (I get lots of SSH intrusion attempts from here)
  5. Brazil (Wat?)

Personally, I was rather surprised to see the US in the top spot. I figured that with tough laws on that sort of thing, spammers wouldn't risk buying a server there to send spam from.

On further thought though, it occurred to me that it may be because there are simply lots of infected machines in the US that are being abused (without the knowledge of their unwitting users) to send lots of spam.

At any rate, I don't appear to have a spam problem on my blog at the moment - it's just fascinating to investigate where the spam I do block comes from.

Found this interesting? Got an observation of your own? Plotted a graph from your own data? Comment below!

Using libsodium to upgrade the comment key system

I've blogged about the comment key system I utilise on this blog to prevent spam before (see also). Today, I've given it another upgrade to make it harder for spammers to fake a comment key!

In the last post, I transformed the comment key with a number of reversible operations - including a simple XOR password system. This is, of course, very insecure - especially since an attacker knows (or can at least guess) the content of the encrypted key, making it trivial (I suspect) to guess the password used for 'encryption'.

The solution here, obviously, is to utilise a better encryption system. Since it's the 5th November and I'm not particularly keen on my website going poof like the fireworks tonight, let's do something about it! PHP 7.2+ comes with native libsodium support (those still using older versions of PHP can still follow along! Simply install the PECL module). libsodium bills itself as

A modern, portable, easy to use crypto library.

After my experiences investigating it, I'd certainly say that's true (although the official PHP documentation could do with, erm, existing). I used this documentation instead - it's quite ironic, because I have actually base64-encoded the password.......

Anyway, after doing some digging I found the quick reference, which explains how you can go about accomplishing common tasks with libsodium. For my comment key system, I want to encrypt my timestamp with a password - so I wanted sodium_crypto_secretbox() and its associated sodium_crypto_secretbox_open() functions.

This pair of functions, when used together, implement a secure symmetric key encryption system. In other words, they securely encrypt a lump of data with a password. They do, however, have 2 requirements that must be taken care of. Firstly, the password must be of a specific length. This is easy enough to accomplish, as PHP is kind enough to tell us how long it needs to be:

/**
 * Generates a new random comment key system password.
 * @return string   The base64-encoded password.
 */
function key_generate_password() {
    return base64_encode_safe(random_bytes(SODIUM_CRYPTO_SECRETBOX_KEYBYTES));
}

Easy-peasy! base64_encode_safe isn't a built-in function - it's a utility function I wrote that we'll need later. For consistency, I've used it here too. Here it is, along with its counterpart:

/**
 * Encodes the given data with base64, but replaces characters that could 
 * get mangled in transit with safer equivalents.
 * @param   mixed   $data   The data to encode.
 * @return  string  The data, encoded as transmit-safe base64.
 */
function base64_encode_safe($data) {
    return str_replace(["/", "+"], ["-", "_"], base64_encode($data));
}
/**
 * Decodes data encoded with base64_encode_safe().
 * @param   mixed   $base64     The data to decode.
 * @return  string  The data, decoded from transmit-safe base64.
 */
function base64_decode_safe($base64) {
    return base64_decode(str_replace(["-", "_"], ["/", "+"], $base64));
}

With that taken care of, we can look at the other requirement: a nonce. Standing for Number used ONCE, it's a sequence of random bytes that's used by the encryption algorithm. We don't need to keep it a secret, but we do need it to decrypt the data again at the other end, in addition to the password - and we do need to ensure that we generate a new one for every comment key. Thankfully, this is also easy to do in a similar manner to generating a password:

$nonce = random_bytes(SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);

With everything in place, we can look at upgrading the comment key generator itself. It looks much simpler now:

/**
 * Generates a new comment key.
 * The key is the current timestamp, encrypted with sodium_crypto_secretbox()
 * and bundled with the nonce used to encrypt it, then base64-encoded.
 * @param  string $pass The password to encrypt with. Should be a base64-encoded string from key_generate_password().
 * @return string       A new comment key stamped at the current time.
 */
function key_generate($pass) {
    $pass = base64_decode_safe($pass);
    $nonce = random_bytes(SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
    $encrypt = sodium_crypto_secretbox(
        strval(time()), // The thing we want to encrypt
        $nonce, // The nonce
        $pass // The password to encrypt with
    );

    return base64_encode_safe($nonce.$encrypt);
}

I bundle the nonce with the encrypted data here to ensure that the system continues to be stateless (i.e. we don't need to store any state information about a user on the server). I also encode the encrypted string with base64, as the encrypted strings contain lots of nasty characters (it's actually a binary byte array I suspect). This produces keys like this:

BOqDvr26XY9s8PhlmIZMIp6xCOZyfsh6J05S4Jp2cY3bL8ccf_oRgRMldrmzKk6RrnA=
Tt8H81tkJEqiJt-RvIstA_vz13LS8vjLgriSAvc1n5iKwHuEKjW93IMITikdOwr5-NY=
5NPvHg-l1GgcQ9ZbThZH7ixfKGqAtSBr5ggOFbN_wFRjo3OeJSWAcFNhQulYEVkzukI=

They are a bit long, but not unmanageable. In theory, I could make it a bit shorter by avoiding converting the integer output from time() to a string, but in my testing it didn't make much difference. The fixed overhead here comes from sodium_crypto_secretbox() itself: it appends a 16-byte authentication tag (SODIUM_CRYPTO_SECRETBOX_MACBYTES) to the ciphertext, which is why the 10-character timestamp comes out at 26 bytes either way:


php > var_dump(sodium_crypto_secretbox(time(), random_bytes(24), random_bytes(32)));
string(26) "GW$:���ߌ@]�+1b��������d%"
php > var_dump(sodium_crypto_secretbox(strval(time()), random_bytes(24), random_bytes(32)));
string(26) ":_}0H �E°9��Kn�p��ͧ��"

Now that we're encrypting comment keys properly, it isn't much good unless we can decrypt them as well! Let's do that too. The first step is to decode the base64 and re-split the nonce from the encrypted data:

$pass = base64_decode_safe($pass);
$key_enc_raw = base64_decode_safe($key);
$nonce = mb_substr($key_enc_raw, 0, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES, "8bit");
$key_enc = mb_substr($key_enc_raw, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES, null, "8bit");

This is derived from the example code. With the 2 separated once more, we can decrypt the key:

$key_dec = sodium_crypto_secretbox_open($key_enc, $nonce, $pass);

Apparently, according to the example code on the website I linked to above, if the return value isn't a string then the decryption failed. We should make sure we handle that when returning:

if(!is_string($key_dec))
    return null;
return intval($key_dec);

That completes the decryption code. Here it is in full:

/**
 * Decodes the given key.
 * @param  string $key  The key to decode.
 * @param  string $pass The password to decode it with.
 * @return int|null     The time that the key was generated, or null if the provided key was invalid.
 */
function key_decode($key, $pass) {
    $pass = base64_decode_safe($pass);
    $key_enc_raw = base64_decode_safe($key);
    $nonce = mb_substr($key_enc_raw, 0, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES, "8bit");
    $key_enc = mb_substr($key_enc_raw, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES, null, "8bit");
    $key_dec = sodium_crypto_secretbox_open($key_enc, $nonce, $pass);
    if(!is_string($key_dec)) return null;
    return intval($key_dec);
}

Explicitly returning null in key_decode() requires a small change to key_verify(), in order to prevent it from thinking that a key is really old if the decryption fails (null is treated as 0 in arithmetic operations, apparently). Let's update key_verify() to handle that:

/**
 * Verifies a key.
 * @param   string  $key        The key to decode and verify.
 * @param   string  $pass       The password to decode the key with.
 * @param   int     $min_age    The minimum required age for the key.
 * @param   int     $max_age    The maximum allowed age for the key.
 * @return  bool                Whether the key is valid or not.
 */
function key_verify($key, $pass, $min_age, $max_age) {
    $key_time = key_decode($key, $pass);
    if($key_time == null) return false;
    $age = time() - $key_time;
    return $age >= $min_age && $age <= $max_age;
}

A simple check is all that's needed. With that, the system is all updated! Time to generate a new password with the provided function and put it to use. You can do that directly from the PHP REPL (php -a):


php > require("lib/comment_system/comment_key.php");
php > var_dump(key_generate_password());
string(44) "S0bJpph5htdaKHZdVfo9FB6O4jOSzr3xZZ94c2Qcn44="

Obviously, this isn't the password I'm using on my blog :P
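To tie all the pieces together, here's a quick usage sketch (the age bounds passed to key_verify() are made up for illustration):

<?php
require("lib/comment_system/comment_key.php");

$pass = key_generate_password(); // Generate once, then store it in your settings
$key = key_generate($pass);      // Embed this in the comment form

// Later, when the comment is submitted:
var_dump(key_verify($key, $pass, 0, 24 * 60 * 60)); // bool(true)

// A key encrypted with a different password fails to decrypt:
$other_pass = key_generate_password();
var_dump(key_verify($key, $other_pass, 0, 24 * 60 * 60)); // bool(false)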

I've got a few more ideas I'd like to play around with to deter spammers even more whilst maintaining the current transparency - so you may see another post soon on this subject :-)

With everything complete, here's the complete script:

(Above: The full comment key system code. Can't see it? Check it out on GitHub Gist here.)

Found this useful? Got an improvement? Confused about something? Comment below!


Maintenance: Server Push Support!

Recently, I took the time to add the official nginx PPA to my server to keep nginx up-to-date. In doing so, I jumped from a security-patch-backported nginx 1.10 to version 1.14..... which adds a bunch of very cool new features. As soon as I learnt that HTTP/2 Server Push was among the newly-supported features, I knew that I had to try it out.

In short, Server Push is a new technology - part of HTTP/2.0 (it's here at last :D) - that allows you to send resources to the client before they even know they need them. This is done by enabling it in the web server, and then having the web application append a specially-formatted link header to outgoing responses - which tells the web server which resources to bundle along with the response.

First, let's enable it in nginx. This is really quite simple:

http {
    # ....

    http2_push_preload      on;

    # ....
}

This enables link header parsing server-wide. If you want to enable it for just a single virtual host, the http2_push_preload directive can be placed inside server blocks too.
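For example, a hypothetical virtual host with push support enabled might look like this:

server {
    listen 443 ssl http2;
    server_name bobsrockets.com;

    http2_push_preload      on;

    # ....
}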

With support enabled in nginx, we can add support to our web application (in my case, this website!). If you do a HEAD request against a page on my website, you'll get a response looking like this:

HTTP/2 200 
server: nginx/1.14.0
date: Tue, 21 Aug 2018 12:35:02 GMT
content-type: text/html; charset=UTF-8
vary: Accept-Encoding
x-powered-by: PHP/7.2.9-1+ubuntu16.04.1+deb.sury.org+1
link: </theme/core.min.css>; rel=preload; as=style, </theme/main.min.css>; rel=preload; as=style, </theme/comments.min.css>; rel=preload; as=style, </theme/bit.min.css>; rel=preload; as=style, </libraries/prism.min.css>; rel=preload; as=style, </theme/tagcloud.min.css>; rel=preload; as=style, </theme/openiconic/open-iconic.min.css>; rel=preload; as=style, </javascript/bit.min.js>; rel=preload; as=script, </javascript/accessibility.min.js>; rel=preload; as=script, </javascript/prism.min.js>; rel=preload; as=script, </javascript/smoothscroll.min.js>; rel=preload; as=script, </javascript/SnoozeSquad.min.js>; rel=preload; as=script
strict-transport-security: max-age=31536000;
x-xss-protection: 1; mode=block
x-frame-options: sameorigin

Particularly of note here is the link header. It looks long and complicated, but that's just because I'm pushing multiple resources down. Let's pull it apart. In essence, the link header takes a comma (,) separated list of paths to resources that the web server should push to the client, along with the type of each. For example, if https://bobsrockets.com/ wanted to push down the CSS stylesheet /theme/boosters.css, they would include a link header like this:

link: </theme/boosters.css>; rel=preload; as=style

It's also important to note here that pushing a resource doesn't mean that we don't have to utilise it somewhere in the page. By this I mean that pushing a stylesheet down as above still means that we need to add the appropriate <link /> element to put it to use:

<link rel="stylesheet" href="/theme/boosters.css" />

Scripts can be sent down too. Doing so is very similar:

link: </js/liftoff.js>; rel=preload; as=script

There are other as values as well. You can send all kinds of things:

  • script - Javascript files
  • style - CSS Stylesheets
  • image - Images
  • font - Fonts
  • document - <iframe /> content
  • audio - Sound files to be played via the HTML5 <audio /> element
  • worker - Web workers
  • video - Videos to be played via the HTML5 <video /> element.

The full list can be found here. If you don't have support in your web server yet (or can't modify HTTP headers) for whatever reason, never fear! There's still something you can do. HTML also supports a similar <link rel="preload" href="...." /> element that you can add to your document's <head>.

While this won't cause your server to bundle extra resources with a response, it'll still tell the client to go off and fetch the specified resources in the background with a high priority. Obviously, this won't help much with external stylesheets and scripts (since simply being present in the document is enough to get the client to request them), but it could still be useful if you're lazily loading images, for example.
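Continuing the bobsrockets.com example from before, the preload equivalent of the boosters.css link header would look like this:

<link rel="preload" href="/theme/boosters.css" as="style" />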

In future projects, I'll certainly be looking out for opportunities to take advantage of HTTP/2.0 Server Push (probably starting with investigating options for Pepperminty Wiki). I've found the difference to be pretty extraordinary myself.

Of course, this is hardly the only feature that HTTP/2 brings. If there's the demand, I may blog about other features and how they work too.

Found this interesting? Confused about something? Using this yourself in a cool project? Comment below!
