Starbeamrainbowlabs

Finding Favicons with PHP

There hasn't been a post here for a little while because I have been ill. I am back now though :)

While writing more of Bloworm, I needed a function that would automatically detect the url of the favicon associated with a given url. I wrote a quick function to do this a while ago, and have been improving it little by little.

I now have it at a point where it finds the correct url 99% of the time, so I thought that I would share it with you.

/*
 * @summary Given a url, this function will attempt to find its corresponding favicon.
 *
 * @param string $url The url of the page to find the favicon for.
 *
 * @returns string The url of the corresponding favicon.
 */
function auto_find_favicon_url($url)
{
    if(!validate_url($url))
        senderror(new api_error(400, 520, "The url you specified for the favicon was invalid."));

    // todo protect against downloading large files
    // todo send HEAD request instead of GET request
    // get_headers() returns false on failure rather than throwing an exception
    $headers = @get_headers($url, true);
    if($headers === false)
        senderror(new api_error(502, 710, "Failed to fetch the headers from url: $url"));
    $headers = array_change_key_case($headers);

    $urlparts = [];
    preg_match("/^([a-z]+)\:(?:\/\/)?([^\/?#]+)(.*)/i", $url, $urlparts);

    // the content-type header may be missing altogether
    $content_type = isset($headers["content-type"]) ? $headers["content-type"] : "";
    if(is_array($content_type)) // account for arrays of content types
        $content_type = $content_type[0];

    $faviconurl = "images/favicon-default.png";
    if(strpos($content_type, "text/html") !== false)
    {
        // file_get_contents() returns false on failure rather than throwing an exception
        $html = @file_get_contents($url);
        if($html === false)
            senderror(new api_error(502, 711, "Failed to fetch url: $url"));
        $matches = [];
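        // accept rel="icon" as well as rel="shortcut icon", in single or double quotes
        // note: this still expects the rel attribute to come directly before the href attribute
        if(preg_match("/rel=[\'\"](?:shortcut icon|shortcut|icon)[\'\"]\s+href=[\'\"]([^\'\"]+)[\'\"]/i", $html, $matches) === 1)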
        {
            $faviconurl = $matches[1];
            // make sure that the favicon url is absolute
            if(preg_match("/^[a-z]+\:(?:\/\/)?/i", $faviconurl) === 0)
            {
                // the url is not absolute, make it absolute
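                // strip any query string or fragment, then keep the path up to (but not including) the last slash
                $path = preg_replace("/[?#].*$/", "", $urlparts[3]);
                $basepath = substr($path, 0, (int)strrpos($path, "/"));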

                // the path should not include the basepath if the favicon url begins with a slash
                if(substr($faviconurl, 0, 1) === "/")
                {
                    $faviconurl = "$urlparts[1]://$urlparts[2]$faviconurl";
                }
                else
                {
                    $faviconurl = "$urlparts[1]://$urlparts[2]$basepath/$faviconurl";
                }
            }
        }
    }

    if($faviconurl == "images/favicon-default.png")
    {
        // we have not found the url of the favicon yet, so try /favicon.ico at the root of the domain
        // todo guard against invalid urls

        $faviconurl = "$urlparts[1]://$urlparts[2]/favicon.ico";
        $faviconurl = follow_redirects($faviconurl);
        $favheaders = @get_headers($faviconurl, true);

        // fall back to the default icon unless the request succeeded with a 2xx status code
        if($favheaders === false || preg_match("/^HTTP\/\S+\s+2\d{2}\b/i", $favheaders[0]) !== 1)
            return "images/favicon-default.png";
    }

    return $faviconurl;
}
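
The regular expression in the function above depends on the attributes of the link tag appearing in a particular order. As a rough sketch of an alternative (not what Bloworm does), the same two steps, finding the icon's href and making it absolute, could be done with PHP's DOM extension and parse_url(). The function names find_icon_href() and resolve_icon_url() are made up for this example, and it assumes PHP 7.1 or newer with the DOM extension enabled.

```php
<?php
// Hypothetical helper: returns the href of the first <link> whose rel
// attribute contains the token "icon", or null if there isn't one.
function find_icon_href(string $html): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() refuses empty input
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('link') as $link) {
        // rel is a space-separated list of tokens, e.g. "shortcut icon"
        $rel = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        if (in_array('icon', $rel, true) && $link->getAttribute('href') !== '') {
            return $link->getAttribute('href');
        }
    }
    return null;
}

// Hypothetical helper: turns the href found above into an absolute url,
// relative to the page it was found on. It does not collapse "../" segments.
function resolve_icon_url(string $pageurl, string $href): ?string
{
    if ($href === '') {
        return null;
    }
    // Already absolute: it starts with a scheme such as "https:".
    if (preg_match('/^[a-z][a-z0-9+.-]*:/i', $href) === 1) {
        return $href;
    }

    $parts = parse_url($pageurl);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null;
    }
    $origin = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');

    if (strpos($href, '//') === 0) {
        return $parts['scheme'] . ':' . $href; // protocol-relative url
    }
    if ($href[0] === '/') {
        return $origin . $href; // relative to the root of the site
    }

    // Relative to the directory of the page: keep the path up to the last slash.
    $path = $parts['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}
```

For a page fetched from http://example.com/blog/post.html, you would pass the downloaded HTML to find_icon_href() and, if it returns something, pass that along with the page url to resolve_icon_url().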

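This code is pulled directly from the Bloworm source code, so you will need to edit it slightly to suit your needs: validate_url(), senderror(), api_error and follow_redirects() are defined elsewhere in Bloworm and are not shown here. It is not perfect, and will probably be updated from time to time.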
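
If you want to try the function outside of Bloworm, you will need something to stand in for the helpers it calls. Below is a minimal sketch of what validate_url() and follow_redirects() might look like. These are my guesses at their behaviour, not the real Bloworm versions, and head_request_context() is a made-up extra that also covers the HEAD request todo. It assumes PHP 7.1 or newer, since that is when get_headers() gained its context parameter. You would still need to swap senderror() and api_error for your own error handling.

```php
<?php
// Hypothetical stand-in: accepts only well-formed http and https urls.
function validate_url($url): bool
{
    if (!is_string($url) || filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return $scheme === 'http' || $scheme === 'https';
}

// Hypothetical helper: a stream context that makes get_headers() send a HEAD
// request with a timeout, and stops it from following redirects by itself.
function head_request_context(int $timeout = 5)
{
    return stream_context_create([
        'http' => [
            'method' => 'HEAD',
            'timeout' => $timeout,
            'follow_location' => 0,
            'ignore_errors' => true,
        ],
    ]);
}

// Hypothetical stand-in: follows up to $max_hops redirects and returns the
// last url reached. Only absolute Location headers are followed.
function follow_redirects(string $url, int $max_hops = 5): string
{
    for ($hop = 0; $hop < $max_hops; $hop++) {
        $headers = @get_headers($url, true, head_request_context());
        if ($headers === false) {
            break; // the request failed, so give back what we have
        }
        $headers = array_change_key_case($headers);

        $is_redirect = preg_match('/^HTTP\/\S+\s+3\d\d\b/', $headers[0] ?? '') === 1;
        if (!$is_redirect || empty($headers['location'])) {
            break;
        }

        $location = $headers['location'];
        if (is_array($location)) {
            $location = end($location);
        }
        if (preg_match('/^https?:\/\//i', $location) !== 1) {
            break; // relative redirect targets are not handled in this sketch
        }
        $url = $location;
    }
    return $url;
}
```

The same context can be passed as the third argument to the get_headers() calls in the main function, which avoids downloading the body of the page just to read its headers.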
