Starbeamrainbowlabs

Stardust
Blog

Probably the world's most advanced string splitting function

When I was setting up this website, I foolishly picked a custom log file format that is rather hard for computers to parse. Because of this I haven't found a server log analysis tool that is intelligent enough to parse my logs (if you know of one please let me know in the comments!).

Here is an example of a typical log file entry:

[19/May/2015:00:47:52 +0100] "starbeamrainbowlabs.com" HTTP/1.1 GET 200 162.243.87.220 0s :443 /blog/article.php article=posts/013-Terminal-Reference.html "https://starbeamrainbowlabs.com/blog/?offset=60" "Mozilla/5.0 (compatible; spbot/4.4.2; +http://OpenLinkProfiler.org/bot )"

It looks strange, doesn't it? Since I want to have some idea of how many people are visiting my site, I have finally gotten around to writing my own custom log parser. In order to do this, I needed a way to convert each line into an array of terms. None of the answers on stackoverflow seemed to cut it, so I wrote my own:

<?php
function explode_adv($openers, $closers, $togglers, $delimiters, $str)
{
    $chars = str_split($str);
    $parts = [];
    $nextpart = "";
    $toggle_states = array_fill_keys($togglers, false); // true = now inside, false = now outside
    $depth = 0;
    foreach($chars as $char)
    {
        if(in_array($char, $openers))
            $depth++;
        elseif(in_array($char, $closers))
            $depth--;
        elseif(in_array($char, $togglers))
        {
            if($toggle_states[$char])
                $depth--; // we are inside a toggle block, leave it and decrease the depth
            else
                // we are outside a toggle block, enter it and increase the depth
                $depth++;

            // invert the toggle block state
            $toggle_states[$char] = !$toggle_states[$char];
        }
        else
            $nextpart .= $char;

        if($depth < 0) $depth = 0;

        if(in_array($char, $delimiters) &&
           $depth == 0 &&
           !in_array($char, $closers))
        {
            $parts[] = substr($nextpart, 0, -1);
            $nextpart = "";
        }
    }
    if(strlen($nextpart) > 0)
        $parts[] = $nextpart;

    return $parts;
}
?>

I have also posted this on stackoverflow. This function of mine takes 5 parameters:

  1. An array of characters that open a block - e.g. [, (, etc.
  2. An array of characters that close a block - e.g. ], ), etc.
  3. An array of characters that toggle a block - e.g. ", ', etc.
  4. An array of characters that should cause a split into the next part.
  5. The string to work on.

This function probably will have flaws, but it works well enough for me.

You can also find this function on GitHub's Gist - as always suggestions and contributions are always welcome :)

Tag Cloud

3d 3d printing account algorithms android announcement architecture archives arduino artificial intelligence artix assembly async audio automation backups bash batch blender blog bookmarklet booting bug hunting c sharp c++ challenge chrome os cluster code codepen coding conundrums coding conundrums evolved command line compilers compiling compression conference conferences containerisation css dailyprogrammer data analysis debugging defining ai demystification distributed computing dns docker documentation downtime electronics email embedded systems encryption es6 features ethics event experiment external first impressions freeside future game github github gist gitlab graphics guide hardware hardware meetup holiday holidays html html5 html5 canvas infrastructure interfaces internet interoperability io.js jabber jam javascript js bin labs latex learning library linux lora low level lua maintenance manjaro minetest network networking nibriboard node.js open source operating systems optimisation outreach own your code pepperminty wiki performance phd photos php pixelbot portable privacy problem solving programming problems project projects prolog protocol protocols pseudo 3d python reddit redis reference release releases rendering research resource review rust searching secrets security series list server software sorting source code control statistics storage svg systemquery talks technical terminal textures thoughts three thing game three.js tool tutorial twitter ubuntu university update updates upgrade version control virtual reality virtualisation visual web website windows windows 10 worldeditadditions xmpp xslt

Archive

Art by Mythdael