Finding Favicons with PHP
There hasn't been a post here for a little while because I have been ill. I am back now though :)
While working on Bloworm, I needed a function that would automatically detect the URL of the favicon associated with a given URL. I wrote a quick function to do this a while ago, and have been improving it little by little.
It is now at a point where it finds the correct URL about 99% of the time, so I thought that I would share it with you.
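```php
/*
 * @summary Given a url, this function will attempt to find its corresponding favicon.
 *
 * @returns The url of the corresponding favicon, or "images/favicon-default.png"
 *          if none could be found.
 *
 * Relies on the Bloworm helpers validate_url(), senderror(), api_error and
 * follow_redirects(); senderror() is expected to stop execution.
 */
function auto_find_favicon_url($url)
{
    if(!validate_url($url))
        senderror(new api_error(400, 520, "The url you specified for the favicon was invalid."));

    // todo protect against downloading large files
    // todo send HEAD request instead of GET request
    // get_headers() returns false on failure instead of throwing, so check the result
    $headers = @get_headers($url, true);
    if($headers === false)
        senderror(new api_error(502, 710, "Failed to fetch the headers from url: $url"));
    $headers = array_change_key_case($headers);

    // $urlparts[1] = scheme, [2] = host (and port), [3] = path, query and fragment
    $urlparts = [];
    preg_match('/^([a-z]+)\:(?:\/\/)?([^\/?#]+)(.*)/i', $url, $urlparts);

    $content_type = $headers["content-type"] ?? "";
    if(is_array($content_type)) // one entry per response when redirects were followed
        $content_type = end($content_type); // the last one belongs to the final page

    $faviconurl = "images/favicon-default.png";
    if(stripos($content_type, "text/html") !== false)
    {
        // file_get_contents() also returns false on failure rather than throwing
        $html = @file_get_contents($url);
        if($html === false)
            senderror(new api_error(502, 711, "Failed to fetch url: $url"));

        $matches = [];
        // matches rel="icon", rel="shortcut icon" and rel="shortcut" (rel must come before href)
        if(preg_match('/<link[^>]+rel=["\'](?:shortcut icon|shortcut|icon)["\'][^>]*?href=["\']([^"\']+)["\']/i', $html, $matches) === 1)
        {
            $faviconurl = $matches[1];
            // make sure that the favicon url is absolute
            if(substr($faviconurl, 0, 2) === "//")
            {
                // protocol-relative url: reuse the page's scheme
                $faviconurl = "$urlparts[1]:$faviconurl";
            }
            elseif(preg_match('/^[a-z]+\:(?:\/\/)?/i', $faviconurl) === 0)
            {
                if(substr($faviconurl, 0, 1) === "/")
                {
                    // root-relative: the base path is not needed
                    $faviconurl = "$urlparts[1]://$urlparts[2]$faviconurl";
                }
                else
                {
                    // path-relative: resolve against the page's directory
                    $path = preg_replace('/[?#].*$/', '', $urlparts[3]);
                    if($path === "")
                        $path = "/";
                    $basepath = substr($path, 0, strrpos($path, "/") + 1); // keeps the trailing slash
                    $faviconurl = "$urlparts[1]://$urlparts[2]$basepath$faviconurl";
                }
            }
        }
    }

    if($faviconurl == "images/favicon-default.png")
    {
        // we have not found the url of the favicon yet, try /favicon.ico on the same host
        // todo guard against invalid urls
        $faviconurl = follow_redirects("$urlparts[1]://$urlparts[2]/favicon.ico");
        $favheaders = @get_headers($faviconurl, true);
        // status lines have numeric keys; the last one is the final response
        $status = "";
        if($favheaders !== false)
        {
            foreach($favheaders as $key => $value)
            {
                if(is_int($key))
                    $status = $value;
            }
        }
        // only use favicon.ico if the server answered with a 2xx status
        if(preg_match('/^HTTP\/\S+\s+2\d\d/', $status) !== 1)
            $faviconurl = "images/favicon-default.png";
    }
    return $faviconurl;
}
```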
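Matching `<link>` tags with a regular expression depends on attribute order: the pattern in auto_find_favicon_url only matches when `rel` comes before `href`, so markup such as `<link href="/icon.png" rel="icon">` is missed. Below is a minimal sketch of a more tolerant approach, assuming PHP 7.1+ with the bundled DOM extension enabled. It parses the page with `DOMDocument`, takes the first `<link>` whose space-separated `rel` tokens include `icon`, and resolves the `href` against the page URL with `parse_url`. The names `find_icon_href` and `resolve_favicon_url` are hypothetical and are not part of Bloworm.

```php
<?php
// Sketch: find a page's favicon link with DOMDocument instead of a regex.
// find_icon_href() and resolve_favicon_url() are hypothetical helper names.

// Return the href of the first <link> whose rel tokens include "icon"
// (this covers rel="icon" and rel="shortcut icon"), or null if there is none.
function find_icon_href(string $html): ?string
{
    if (trim($html) === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('link') as $link) {
        $rels = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        $href = trim($link->getAttribute('href'));
        if ($href !== '' && in_array('icon', $rels, true)) {
            return $href;
        }
    }
    return null;
}

// Turn an href found on $page_url into an absolute URL.
// Note: "./" and "../" segments are left as-is rather than collapsed.
function resolve_favicon_url(string $page_url, string $href): string
{
    // Already absolute: it starts with a scheme such as "https:".
    if (preg_match('/^[a-z][a-z0-9+.-]*:/i', $href) === 1) {
        return $href;
    }
    $parts = parse_url($page_url);
    $scheme = $parts['scheme'] ?? 'http';
    $authority = ($parts['host'] ?? '') . (isset($parts['port']) ? ':' . $parts['port'] : '');

    if (substr($href, 0, 2) === '//') { // protocol-relative
        return "$scheme:$href";
    }
    if (substr($href, 0, 1) === '/') {  // root-relative
        return "$scheme://$authority$href";
    }
    // Path-relative: keep the page path up to and including its last slash.
    $path = $parts['path'] ?? '/';
    $pos = strrpos($path, '/');
    $dir = $pos === false ? '/' : substr($path, 0, $pos + 1);
    return "$scheme://$authority$dir$href";
}

$html = '<head><link href="img/fav.png" rel="icon"></head>';
echo resolve_favicon_url('https://example.com/blog/post.html', find_icon_href($html)), "\n";
// prints https://example.com/blog/img/fav.png
```

Because the DOM parser sees attributes rather than raw text, attribute order, quoting style and extra attributes such as `type` or `sizes` no longer matter. The trade-off is that the whole page is parsed, which is slower than a single regex on very large documents.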
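The auto_find_favicon_url function above comes from the Bloworm source code, so you will need to edit it slightly to suit your needs: it calls Bloworm's own validate_url, senderror, api_error and follow_redirects helpers, which are not shown here. It is not perfect, and will probably be updated from time to time.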
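If you are not using Bloworm, those helpers need replacing. Below is a minimal sketch of stand-ins, assuming plain PHP 7.1+ with `allow_url_fopen` enabled. These are hypothetical replacements, not Bloworm's implementations, and their signatures are only inferred from how auto_find_favicon_url calls them. `validate_url` wraps `filter_var` with `FILTER_VALIDATE_URL`, `api_error` becomes an exception class, and `senderror` throws it so execution stops. `follow_redirects` reads the `Location` headers that `get_headers` collects while following redirects.

```php
<?php
// Hypothetical stand-ins for the Bloworm helpers used by auto_find_favicon_url().

// Accept only absolute http(s) URLs.
function validate_url($url): bool
{
    return is_string($url)
        && filter_var($url, FILTER_VALIDATE_URL) !== false
        && preg_match('/^https?:\/\//i', $url) === 1;
}

// Assumption: the first argument is an HTTP status, the second an app error code.
class api_error extends Exception
{
    public $http_status;

    public function __construct(int $http_status, int $code, string $message)
    {
        parent::__construct($message, $code);
        $this->http_status = $http_status;
    }
}

// Throwing guarantees that the caller does not continue after an error.
function senderror(api_error $error)
{
    throw $error;
}

// Return the final URL after any redirects. get_headers() follows redirects
// itself and reports every Location header it saw; the last one is the target.
function follow_redirects(string $url): string
{
    $headers = @get_headers($url, true);
    if ($headers === false) {
        return $url; // unreachable; let the caller's status check fail
    }
    $headers = array_change_key_case($headers);
    if (!isset($headers['location'])) {
        return $url;
    }
    $location = is_array($headers['location'])
        ? end($headers['location'])
        : $headers['location'];
    // Root-relative Location headers are resolved against the original host;
    // other relative forms are not handled in this sketch.
    if (substr($location, 0, 1) === '/' && substr($location, 0, 2) !== '//') {
        $parts = parse_url($url);
        $port = isset($parts['port']) ? ':' . $parts['port'] : '';
        return $parts['scheme'] . '://' . $parts['host'] . $port . $location;
    }
    return $location;
}
```

With these in place, an invalid or unreachable URL surfaces as an `api_error` exception that the caller can catch, instead of being sent back as a Bloworm API response.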