Probably the world's most advanced string splitting function
When I was setting up this website, I foolishly picked a custom log file format that is rather hard for computers to parse. Because of this I haven't found a server log analysis tool that is intelligent enough to parse my logs (if you know of one please let me know in the comments!).
Here is an example of a typical log file entry:
[19/May/2015:00:47:52 +0100] "starbeamrainbowlabs.com" HTTP/1.1 GET 200 162.243.87.220 0s :443 /blog/article.php article=posts/013-Terminal-Reference.html "https://starbeamrainbowlabs.com/blog/?offset=60" "Mozilla/5.0 (compatible; spbot/4.4.2; +http://OpenLinkProfiler.org/bot )"
It looks strange, doesn't it? Since I want to have some idea of how many people are visiting my site, I have finally gotten around to writing my own custom log parser. To do this, I needed a way to convert each line into an array of terms, treating anything inside brackets or quotes as a single term even if it contains spaces. None of the answers on Stack Overflow seemed to cut it, so I wrote my own:
<?php
function explode_adv($openers, $closers, $togglers, $delimiters, $str)
{
    $chars = str_split($str);
    $parts = [];
    $nextpart = "";
    $toggle_states = array_fill_keys($togglers, false); // true = now inside, false = now outside
    $depth = 0; // how many blocks we are currently nested inside
    foreach($chars as $char)
    {
        // block characters only change the depth; they are not copied into the output
        if(in_array($char, $openers))
            $depth++;
        elseif(in_array($char, $closers))
            $depth--;
        elseif(in_array($char, $togglers))
        {
            if($toggle_states[$char])
                $depth--; // we are inside a toggle block, leave it and decrease the depth
            else
                // we are outside a toggle block, enter it and increase the depth
                $depth++;
            // invert the toggle block state
            $toggle_states[$char] = !$toggle_states[$char];
        }
        else
            $nextpart .= $char;

        // an unmatched closer must not push the depth below zero
        if($depth < 0) $depth = 0;

        // only split on a delimiter when we are outside every block
        if(in_array($char, $delimiters) &&
           $depth == 0 &&
           !in_array($char, $closers))
        {
            // drop the delimiter that was just appended to the end of the part
            $parts[] = substr($nextpart, 0, -1);
            $nextpart = "";
        }
    }
    // keep whatever is left over after the last delimiter
    if(strlen($nextpart) > 0)
        $parts[] = $nextpart;
    return $parts;
}
?>
I have also posted this on Stack Overflow. This function of mine takes 5 parameters:
- An array of characters that open a block - e.g. [, (, etc.
- An array of characters that close a block - e.g. ], ), etc.
- An array of characters that toggle a block - e.g. ", ', etc.
- An array of characters that should cause a split into the next part.
- The string to work on.
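To make the parameters a bit more concrete, here is a rough sketch of how the function might be called on a shortened version of the log entry from earlier. The particular choice of arguments ([ and ( as openers, ] and ) as closers, " as the only toggler and a space as the only delimiter) is just an assumption about what suits this log format, and the function body is a compact copy included only so that the snippet runs on its own:

```php
<?php
// Compact copy of explode_adv() from above, repeated only so this snippet is standalone.
function explode_adv($openers, $closers, $togglers, $delimiters, $str)
{
    $parts = [];
    $nextpart = "";
    $toggle_states = array_fill_keys($togglers, false);
    $depth = 0;
    foreach (str_split($str) as $char) {
        if (in_array($char, $openers)) {
            $depth++;
        } elseif (in_array($char, $closers)) {
            $depth--;
        } elseif (in_array($char, $togglers)) {
            // leaving a toggle block lowers the depth, entering one raises it
            $depth += $toggle_states[$char] ? -1 : 1;
            $toggle_states[$char] = !$toggle_states[$char];
        } else {
            $nextpart .= $char;
        }
        if ($depth < 0) $depth = 0;
        if (in_array($char, $delimiters) && $depth == 0 && !in_array($char, $closers)) {
            $parts[] = substr($nextpart, 0, -1);
            $nextpart = "";
        }
    }
    if (strlen($nextpart) > 0) $parts[] = $nextpart;
    return $parts;
}

// A shortened version of the log entry shown earlier.
$line = '[19/May/2015:00:47:52 +0100] "starbeamrainbowlabs.com" HTTP/1.1 GET 200';

// Assumed arguments: brackets open/close blocks, double quotes toggle, spaces split.
$parts = explode_adv(["[", "("], ["]", ")"], ['"'], [" "], $line);

print_r($parts);
```

With these arguments the timestamp and the quoted host name each come back as a single part, even though the timestamp contains a space. Note that the bracket and quote characters themselves are not included in the parts, because the function only uses them to track the depth.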
This function probably has flaws, but it works well enough for me.
You can also find this function as a GitHub Gist - as always, suggestions and contributions are welcome :)