Starbeamrainbowlabs


Finding Favicons with PHP

There hasn't been a post here for a little while because I have been ill. I am back now though :)

While working on Bloworm, I needed a function that would automatically detect the url of the favicon associated with a given url. I wrote a quick function to do this a while ago, and have been improving it little by little.

I now have it at a point where it finds the correct url 99% of the time, so I thought that I would share it with you.

/*
 * @summary Given a url, this function will attempt to find its corresponding favicon.
 *
 * @returns The url of the corresponding favicon.
 */
function auto_find_favicon_url($url)
{
    if(!validate_url($url))
        senderror(new api_error(400, 520, "The url you specified for the favicon was invalid."));

    // todo protect against downloading large files
    // todo send HEAD request instead of GET request
    // get_headers() returns false (with a warning) on failure rather than throwing
    $headers = @get_headers($url, true);
    if($headers === false)
        senderror(new api_error(502, 710, "Failed to fetch the headers from url: $url"));
    $headers = array_change_key_case($headers);

    $urlparts = [];
    preg_match("/^([a-z]+)\:(?:\/\/)?([^\/?#]+)(.*)/i", $url, $urlparts);

    // the server may not send a content type at all
    $content_type = isset($headers["content-type"]) ? $headers["content-type"] : "";
    if(is_array($content_type)) // account for arrays of content types
        $content_type = $content_type[0];

    $faviconurl = "images/favicon-default.png";
    if(strpos($content_type, "text/html") !== false)
    {
        // file_get_contents() returns false (with a warning) on failure rather than throwing
        $html = @file_get_contents($url);
        if($html === false)
            senderror(new api_error(502, 711, "Failed to fetch url: $url"));
        $matches = [];
        // matches rel="icon" as well as rel="shortcut icon"
        if(preg_match("/rel=[\'\"](?:shortcut )?icon[\'\"]\s+(?:href=[\'\"]([^\'\"]+)[\'\"])/i", $html, $matches) === 1)
        {
            $faviconurl = $matches[1];
            // make sure that the favicon url is absolute
            if(preg_match("/^[a-z]+\:(?:\/\/)?/i", $faviconurl) === 0)
            {
                // the url is not absolute, make it absolute
                $basepath = dirname($urlparts[3]);

                // the path should not include the basepath if the favicon url begins with a slash
                if(substr($faviconurl, 0, 1) === "/")
                {
                    $faviconurl = "$urlparts[1]://$urlparts[2]$faviconurl";
                }
                else
                {
                    $faviconurl = "$urlparts[1]://$urlparts[2]$basepath/$faviconurl";
                }
            }
        }
    }

    if($faviconurl == "images/favicon-default.png")
    {
        // we have not found the url of the favicon yet, parse the url
        // todo guard against invalid urls

        $faviconurl = "$urlparts[1]://$urlparts[2]/favicon.ico";
        $faviconurl = follow_redirects($faviconurl);
        $favheaders = @get_headers($faviconurl, true);

        // fall back to the default image unless the server answers with a 2xx status code
        if($favheaders === false || preg_match("/\s2\d{2}\b/", $favheaders[0]) !== 1)
            return "images/favicon-default.png";
    }

    return $faviconurl;
}

This code is pulled directly from the Bloworm source code - so you will need to edit it slightly to suit your needs. It is not perfect, and will probably be updated from time to time.
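The regular expression above only matches when rel comes directly before href. As a point of comparison, here is a minimal sketch that reads the <link> tags with PHP's built-in DOMDocument class instead. The function name find_icon_href() is made up for this example and is not part of Bloworm, and it only returns the raw href, which still has to be made absolute as in the code above.

```php
<?php
// Hypothetical helper, not part of Bloworm: returns the href of the first
// <link> tag whose rel attribute contains the token "icon", or null if none.
function find_icon_href($html)
{
    if($html === "")
        return null;

    $doc = new DOMDocument();
    // real-world html is rarely valid, so silence the parser's warnings
    @$doc->loadHTML($html);

    foreach($doc->getElementsByTagName("link") as $link)
    {
        // rel is a space-separated list of tokens, e.g. "shortcut icon"
        $rel = preg_split("/\s+/", strtolower(trim($link->getAttribute("rel"))));
        if(in_array("icon", $rel) && $link->getAttribute("href") !== "")
            return $link->getAttribute("href");
    }
    return null;
}

// the attribute order no longer matters
echo find_icon_href('<html><head><link href="/img/fav.png" rel="shortcut icon"></head></html>'), "\n";
```

This needs the DOM extension, which most PHP installations enable by default.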

Following Redirects in PHP

Recently I have found that PHP sometimes doesn't follow redirects (e.g. the get_headers() function). So I wrote this quick function to follow a url's redirects to a certain depth:

/*
 * @summary Follows a chain of redirects and returns the last url in the sequence.
 * 
 * @param $url - The url to start at.
 * @param $maxdepth - The maximum depth to which to travel following redirects.
 * 
 * @returns The url at the end of the redirect chain.
 */
function follow_redirects($url, $maxdepth = 10, $depth = 0)
{
    //return the current url if we have hit the maximum depth
    if($depth >= $maxdepth)
        return $url;

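    //download the headers from the url and make all the keys lowercase
    $headers = @get_headers($url, true);
    //if the request failed, we can't go any further
    if($headers === false)
        return $url;
    $headers = array_change_key_case($headers);
    //we have a redirect if the `location` header is set
    if(isset($headers["location"]))
    {
        //a header that was sent more than once comes back as an array
        $next = is_array($headers["location"]) ? end($headers["location"]) : $headers["location"];
        return follow_redirects($next, $maxdepth, $depth + 1);
    }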
    else
    {
        return $url;
    }
}

For example, you could do this:

follow_redirects("https://example.com/some/path", 5);

That would follow the redirects, starting at https://example.com/some/path, to a maximum depth of 5 urls.
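The same idea can also be written as a loop. The sketch below is only an illustration: the name follow_redirects_loop() and the optional $fetch_headers parameter are made up for this example. The parameter lets you swap get_headers() for a fake lookup, so the logic can be tried out without any network access.

```php
<?php
// Sketch only: $fetch_headers is a hypothetical hook (not in the original
// function) that takes a url and returns its headers, or false on failure.
function follow_redirects_loop($url, $maxdepth = 10, $fetch_headers = null)
{
    if($fetch_headers === null)
        $fetch_headers = function($u) { return @get_headers($u, true); };

    for($depth = 0; $depth < $maxdepth; $depth++)
    {
        $headers = $fetch_headers($url);
        if($headers === false)
            break; // the request failed, so return what we have

        $headers = array_change_key_case($headers);
        if(!isset($headers["location"]))
            break; // no redirect, so this is the end of the chain

        $location = $headers["location"];
        // a header that was sent several times comes back as an array
        $url = is_array($location) ? end($location) : $location;
    }
    return $url;
}

// usage with a fake set of redirects
$fake = [
    "http://a.example/" => [ "HTTP/1.1 301 Moved Permanently", "Location" => "http://b.example/" ],
    "http://b.example/" => [ "HTTP/1.1 302 Found", "Location" => "http://c.example/" ],
    "http://c.example/" => [ "HTTP/1.1 200 OK" ],
];
$lookup = function($u) use ($fake) { return isset($fake[$u]) ? $fake[$u] : false; };

echo follow_redirects_loop("http://a.example/", 10, $lookup), "\n"; // http://c.example/
echo follow_redirects_loop("http://a.example/", 1, $lookup), "\n";  // http://b.example/
```

Note that neither version handles a relative Location header, which would need resolving against the current url first.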

When I learn networking in C♯ (and if it doesn't follow redirects), I will rewrite this function in C♯ for you.

Generating Session Tokens with PHP

Recently I needed to generate random hex strings to act as session tokens for Blow Worm. Using session tokens means that you send the login credentials once, and then the server hands out a session token for use instead of the password for the rest of that session. In theory this is more secure than sending the password to the server every time.

The problem with generating random session tokens is that you need a secure random number generator, so that hackers can't attempt to guess the random numbers and hence guess the session tokens (that would be bad).

The way I did it (please leave a comment below if this is insecure!) is as follows:

  1. Generate ~128 bits of randomness using the OpenSSL function openssl_random_pseudo_bytes(). This randomness generator is apparently better than rand() and mt_rand().
  2. Hash that resulting randomness with SHA256 to ensure a constant session key length.

The PHP code I am currently using is as follows:

$sessionkey = hash("sha256", openssl_random_pseudo_bytes($session_key_length));

I thought that I would share this here since it took me a little while to look up how to do this. If anyone has a better way of doing this, I will gladly take suggestions and give full credit.
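For comparison, here is a minimal sketch of one alternative, assuming PHP 7 or newer: the built-in random_bytes() function is documented as returning cryptographically secure random bytes, and bin2hex() already gives a fixed-length string, so the hashing step isn't needed just to get a constant length. The function name generate_session_token() is made up for this example.

```php
<?php
// Sketch only: generate_session_token() is a hypothetical name.
// random_bytes() needs PHP 7 or newer and throws an Exception if no
// secure source of randomness is available.
function generate_session_token($bytes = 16)
{
    // 16 bytes = 128 bits; bin2hex() doubles the length, giving 32 hex characters
    return bin2hex(random_bytes($bytes));
}

$sessionkey = generate_session_token();
echo strlen($sessionkey), "\n"; // 32
```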

File System Performance in PHP

While writing Pepperminty Wiki, I started seeing a rather nasty increase in page load times. After looking into it, I drew the conclusion that it must have been the file system that caused the problem. At the time, I had multiple calls to PHP's glob function to find all the wiki pages in the current directory, and I was checking to see if the wiki page existed before reading it into memory.

The solution: A page index. To cut down on the number of reads from the file system, I created a json file that contained information about every page on the wiki. This way, it only needs to check the existence of and read in a single file before it can start rendering any one page. If the page index doesn't exist, it is automatically rebuilt with the glob function to find all the wiki pages in the current directory.

In short: to increase the performance of your PHP application, try to reduce the number of reads (and writes!) to the file system to an absolute minimum.
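The page index idea can be sketched roughly like this. The file name, the *.md pattern and the fields stored for each page are assumptions made for illustration; they are not Pepperminty Wiki's actual format.

```php
<?php
// Sketch only: the file name, the *.md pattern and the index layout are
// assumptions for illustration, not Pepperminty Wiki's actual format.
function load_page_index($dir, $indexfile = "pageindex.json")
{
    $path = "$dir/$indexfile";
    if(file_exists($path))
    {
        $index = json_decode(file_get_contents($path), true);
        if(is_array($index))
            return $index; // fast path: a single file read
    }

    // slow path: scan the directory once and cache the result
    $index = [];
    foreach(glob("$dir/*.md") ?: [] as $filename)
    {
        $index[basename($filename, ".md")] = [
            "filename" => basename($filename),
            "size" => filesize($filename),
            "lastmodified" => filemtime($filename),
        ];
    }
    file_put_contents($path, json_encode($index));
    return $index;
}

// usage, in a throwaway directory
$dir = sys_get_temp_dir() . "/wiki-" . uniqid();
mkdir($dir);
file_put_contents("$dir/Main Page.md", "# Hello");

$pageindex = load_page_index($dir); // first call: rebuilt with glob()
$pageindex = load_page_index($dir); // later calls: read from pageindex.json
var_dump(isset($pageindex["Main Page"])); // bool(true)
```

With the index in memory, checking whether a page exists becomes an isset() on an array rather than another trip to the file system.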

I still need to update the code to allow users to delete pages via the GUI though, because at present you have to have access to the server files to delete a page and then remove it from the page index manually.

Pepperminty Wiki: A Wiki in a box

Recently I found a post on reddit by someone called am2064 about a 'one file wiki' called 'Minty Wiki' written in PHP. I took a look and while it was cool, I found it to have some bugs in it. I also found that it needed an extra PHP file to parse markdown to make it work properly. Still, I thought it was a cool idea so I decided to have a go myself.

694 lines of code later, I had something that worked and I thought that I might post about it here on my blog. It is by no means finished, but it is in a somewhat usable (hopefully secure) state. I decided that markdown was the most logical choice for editing pages, so I modified Slimdown (by Johnny Broadway) to add internal link parsing and tweaked the bold/italics code to be more like Gmail's chat, amongst other things. I first found Slimdown when looking for a lightweight markdown parser for comments on this blog.

I named my creation 'Pepperminty Wiki' (after the wiki that gave me the idea). It currently allows you to create and edit pages (although you need access to the server's files to delete pages currently), list all current pages, and view a printable version of a page. It even has a 'search' box that allows you to type in the name of the page you want to view. The search box has an HTML5 <datalist> to provide the autocomplete functionality.
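As a rough illustration of the <datalist> approach (the function name and the id below are invented for this example, and are not copied from Pepperminty Wiki), the search box could be generated from a list of page names like so:

```php
<?php
// Sketch only: the function name and element id are made up for this example.
function render_search_box($pagenames)
{
    $html = "<input type=\"search\" name=\"page\" list=\"allpages\" />\n";
    $html .= "<datalist id=\"allpages\">\n";
    foreach($pagenames as $name)
    {
        // escape page names so that they can't break out of the attribute
        $html .= "\t<option value=\"" . htmlspecialchars($name, ENT_QUOTES, "UTF-8") . "\" />\n";
    }
    $html .= "</datalist>\n";
    return $html;
}

echo render_search_box(["Main Page", "Tips & Tricks"]);
```

The browser then offers the <option> values as suggestions while you type, without any javascript.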

To use it yourself, simply download index.php from the GitHub repository below and put it in a folder on your server. Make sure that you have enabled write access to the folder though, or else you will start to see some rather strange error messages :)

To configure it, simply open the file you downloaded with your favourite text editor. You will find the settings (along with an explanation of each) at the top of the file. Make sure that you change the usernames and passwords!

You can find it on github here: Pepperminty Wiki

A (uneditable) version can be found here: Demo

Soon I will write up a technical post about my efforts to improve the performance of Pepperminty Wiki.

Security update to atom.gen.php

Since this website gets a lot of spam (investigations into the spambots' patterns are ongoing, and a post will be made here when they have been stopped) and also has a comments feed powered by atom.gen.php, I have had a chance to test atom.gen.php out in the wild with real data.

I discovered, unfortunately, that the script didn't handle invalid utf-8 and non-printable characters very well, and this led to the feed breaking because XML doesn't allow certain characters. This has now been fixed.

If you handle user input and use atom.gen.php to turn it into a feed, you will want to grab an updated copy of the script (quick link here) and overwrite your previous copy in order to fix this.

As well as fixing that, I also added a new option, $usecdata. This controls whether the <content> tag's contents should be wrapped in <![CDATA[...]]>. This should add extra protection against HTML / JavaScript injection attacks breaking your feeds. It defaults to false, though, so you need to manually enable it by setting it to true.
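To give a feel for the kind of cleaning involved, here is a minimal sketch. The helper names are hypothetical and this is not the code from atom.gen.php. It strips the control characters that XML 1.0 forbids, replaces invalid utf-8 sequences, and shows one common way to keep a CDATA section from being closed early by its own contents.

```php
<?php
// Sketch only: these helper names are hypothetical and are not taken from atom.gen.php.

// Makes arbitrary user input safe to place inside an XML text node.
function xml_safe_text($str)
{
    // strip the control characters that XML 1.0 forbids (tab, \n and \r are allowed)
    $str = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $str);
    // escape & < > and swap invalid utf-8 sequences for U+FFFD
    return htmlspecialchars($str, ENT_NOQUOTES | ENT_XML1 | ENT_SUBSTITUTE, "UTF-8");
}

// Wraps a string in a CDATA section, splitting any "]]>" that would end it early.
function xml_cdata($str)
{
    return "<![CDATA[" . str_replace("]]>", "]]]]><![CDATA[>", $str) . "]]>";
}

echo xml_safe_text("Tom & Jerry <3\x07"), "\n"; // Tom &amp; Jerry &lt;3
echo xml_cdata("a]]>b"), "\n";                  // <![CDATA[a]]]]><![CDATA[>b]]>
```

A CDATA section doesn't lift the ban on control characters, so the stripping step is still needed when wrapping content that way. This sketch also leaves a few rarer forbidden code points (such as U+FFFE and U+FFFF) alone.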

The reference has been updated accordingly.

If you find another bug, please comment below. You will receive full credit at the top of the file (especially if you provide a fix!).

Binary Searching

We had our first Algorithms lecture on Wednesday. We were introduced to two main things: complexity and binary searching. Why it is called binary searching, I do not know (leave a comment below if you do!). The following diagram I created explains it better than I could in words:

Binary Search Algorithm
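On the complexity side, the usual textbook argument (not taken from the lecture itself) is that every comparison throws away about half of the remaining items. That halving may also be where the 'binary' in the name comes from, since each step splits the range in two. For a sorted list of n items, after k comparisons roughly n / 2^k candidates are left:

```latex
n \;\to\; \frac{n}{2} \;\to\; \frac{n}{4} \;\to\; \cdots \;\to\; \frac{n}{2^{k}},
\qquad
\frac{n}{2^{k}} \le 1 \iff k \ge \log_2 n
```

So the number of comparisons grows as O(log n). A list of 1,000,000 items needs only about 20 of them, because 2^20 = 1,048,576.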

I have implemented the 'binary search' algorithm in JavaScript (should work in Node.js too), PHP, and Python 3 (not tested in Python 2).

Javascript (editable version here):

/**
 * @summary Binary Search Implementation.
 * @description Takes a sorted array and the target number to find as input.
 * @author Starbeamrainbowlabs
 * 
 * @param arr {array} - The *sorted* array to search.
 * @param target {number} - The number to search array for.
 * 
 * @returns {number} - The index at which the target was found, or -1 if it is not in the array.
 */
function binarysearch(arr, target)
{
    console.log("searching", arr, "to find", target, ".");
    var start = 0,
        end = arr.length, // exclusive upper bound
        midpoint;

    // keep going until there is nothing left to search
    while(start < end) {
        midpoint = Math.floor((end + start) / 2);
        console.log("midpoint:", midpoint, "start:", start, "end:", end);
        if(arr[midpoint] === target)
        {
            console.log("found", target, "at position", midpoint);
            return midpoint;
        }
        console.log("at", midpoint, "we found", arr[midpoint], ", the target is", target);
        if(arr[midpoint] > target)
        {
            console.log("number found was larger than target - searching bottom half");
            end = midpoint;
        }
        else
        {
            console.log("number found was smaller than target - searching top half");
            start = midpoint + 1; // the midpoint itself has already been checked
        }
    }
    console.log(target, "was not found");
    return -1;
}

The javascript can be tested with code like this:

//utility function to make generating random numbers easier (min is inclusive, max is exclusive)
function rand(min, max)
{
    if(min > max)
        throw new Error("min was greater than max");
    return Math.floor(Math.random()*(max-min))+min;
}

var tosearch = [];
for(var i = 0; i < 10; i++)
{
    tosearch.push(rand(0, 25));
}
tosearch.sort(function(a, b) { return a - b;});
var tofind = tosearch[rand(0, tosearch.length)]; // max is exclusive, so this can pick any element
console.log("result:", binarysearch(tosearch, tofind));

PHP:

<?php
//utility function
function logstr($str) { echo("$str\n"); }

/*
 * @summary Binary Search Implementation.
 * @description Takes a sorted array and the target number to find as input.
 * @author Starbeamrainbowlabs
 * 
 * @param arr {array} - The *sorted* array to search.
 * @param target {number} - The number to search array for.
 * 
 * @returns {number} - The index at which the target was found, or -1 if it is not in the array.
 */
function binarysearch($arr, $target)
{
    logstr("searching [" . implode(", ", $arr) . "] to find " . $target . ".");
    $start = 0;
    $end = count($arr); // exclusive upper bound

    // keep going until there is nothing left to search
    while($start < $end) {
        $midpoint = (int) floor(($end + $start) / 2);
        logstr("midpoint: " . $midpoint . " start: " . $start . " end: " . $end);
        if($arr[$midpoint] == $target)
        {
            logstr("found " . $target . " at position " . $midpoint);
            return $midpoint;
        }
        logstr("at " . $midpoint . " we found " . $arr[$midpoint] . ", the target is " . $target);
        if($arr[$midpoint] > $target)
        {
            logstr("number found was larger than target - searching bottom half");
            $end = $midpoint;
        }
        else
        {
            logstr("number found was smaller than target - searching top half");
            $start = $midpoint + 1; // the midpoint itself has already been checked
        }
    }
    logstr($target . " was not found");
    return -1;
}
?>

The PHP version can be tested with this code:

<?php
$tosearch = [];
for($i = 0; $i < 10; $i++)
{
    $tosearch[] = rand(0, 25);
}
sort($tosearch);

$tofind = $tosearch[array_rand($tosearch)];
logstr("result: " . binarysearch($tosearch, $tofind));
?>

And finally the Python 3 version:

#!/usr/bin/env python3
import random


def binarysearch(tosearch, target):
    """
    @summary Binary Search Implementation.
    @description Takes a sorted list and the target number to find as input.
    @author Starbeamrainbowlabs

    @param tosearch {list} - The *sorted* list to search.
    @param target {number} - The number to search list for.

    @returns {number} - The index at which the target was found, or -1 if it is not in the list.
    """
    print("searching [" + ", ".join(map(str, tosearch)) + "] to find " + str(target) + ".")
    start = 0
    end = len(tosearch)  # exclusive upper bound

    # keep going until there is nothing left to search
    while start < end:
        midpoint = (start + end) // 2
        print("midpoint: " + str(midpoint) + " start: " + str(start) + " end: " + str(end))
        if tosearch[midpoint] == target:
            print("found " + str(target) + " at position " + str(midpoint))
            return midpoint

        print("at " + str(midpoint) + " we found " + str(tosearch[midpoint]) + ", the target is " + str(target))
        if tosearch[midpoint] > target:
            print("number found was larger than target - searching bottom half")
            end = midpoint
        else:
            print("number found was smaller than target - searching top half")
            start = midpoint + 1  # the midpoint itself has already been checked

    print(str(target) + " was not found")
    return -1

The python code can be tested with something like this:

tosearch = []
for i in range(50):
    tosearch.append(random.randrange(0, 75))

tosearch.sort()
tofind = random.choice(tosearch)

print("result: " + str(binarysearch(tosearch, tofind)))

That's a lot of code for one blog post.....

A Simpler Way to Generate XML in PHP

In an effort to make XML generation simpler in PHP, I have written another PHP class, called simplexmlwriter.

Much like atom.gen.php, everything you need is all packaged up into one file - simply download and require simplexmlwriter.php and you are ready to start. Links can be found near the bottom of this post.

The same system is in place for contributions and feature requests: post a comment below to either request a feature or link to a modified version of the code and I will consider either merging your changes or adding the feature that you request.

It also has a 'reference' - just like atom.gen.php. A link can be found near the bottom of this post.

If you are still reading this and you are not interested in code, there will be a few things appearing on this website soon that you may be interested in.

simplexmlwriter.php

simplexmlwriter.php reference

Generating Atom Feeds

Edit 11th May 2020: Since I made this post, I've discovered a much better and safer way to generate XML. I suggest reading this newer post instead: Generating Atom Feeds. I'm leaving this post otherwise intact for historical interest.
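The edit above doesn't say which approach the newer post takes. One built-in option is PHP's XMLWriter class, which escapes text for you. The sketch below only illustrates that class: the feed values are made-up placeholders, and the result is not a complete, valid Atom feed (it has no author element, for example).

```php
<?php
// Sketch only: the feed contents are made-up placeholder values.
$writer = new XMLWriter();
$writer->openMemory();
$writer->setIndent(true);
$writer->startDocument("1.0", "UTF-8");

$writer->startElement("feed");
$writer->writeAttribute("xmlns", "http://www.w3.org/2005/Atom");
$writer->writeElement("title", "Tom & Jerry"); // the & is escaped automatically
$writer->writeElement("id", "https://example.com/");
$writer->writeElement("updated", gmdate('Y-m-d\TH:i:s\Z'));

$writer->startElement("entry");
$writer->writeElement("title", "First post");
$writer->writeElement("id", "https://example.com/posts/1");
$writer->writeElement("updated", gmdate('Y-m-d\TH:i:s\Z'));
$writer->startElement("content");
$writer->writeAttribute("type", "html");
$writer->text("<p>Hello, world!</p>"); // written out as &lt;p&gt;...
$writer->endElement(); // </content>
$writer->endElement(); // </entry>

$writer->endElement(); // </feed>
$writer->endDocument();

$feed = $writer->outputMemory();
echo $feed;
```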


This week I am releasing atom.gen.php, the PHP script that powers this blog's Atom feed (which you can find here!).

This PHP class has been designed to be simple and easy to use (apart from the addentry() function which needs tidying up :D), and quick to get started with. I have also created a basic example showing you how to use it and a 'reference' that covers all of the functions and properties that are available for use. Links to both the script and the 'reference' can be found at the bottom of this post.

Although I have tested it, it is entirely possible that you will come across a bug. If you do, please post about it in the comments below.

You may also find that atom.gen.php does not do everything that you want it to. In this case, you have two options: either post a comment down below and I will consider adding the feature you request, or add the feature yourself. If you add the feature or fix a bug yourself, please post a comment down below along with a link to the modified code and I will merge your changes and give you full credit for all the work you have done.

atom.gen.php

atom.gen.php reference

Art by Mythdael