Generating Atom 1.0 Feeds with PHP (the proper way)
I've generated Atom feeds in PHP before, but recently I went on the hunt to discover if PHP has something like C♯'s XMLWriter class - and it turns out it does! Although poorly documented (probably why I didn't find it in the first place :P), it's actually quite logical and easy to pick up.
To this end, I thought I'd blog about how I used it to write the Atom 1.0 feed generator for recent changes on Pepperminty Wiki that I've recently implemented (coming soon in the next release!), as it's so much cleaner than atom.gen.php
that I blogged about before! It's safer too - as all the escaping is handled automatically by PHP - so there's no risk of an injection attack because I forgot to escape a character in my library code.
It ends up being a bit verbose, but a few helper methods (or a wrapper class?) should alleviate this nicely - I might refactor it at a later date.
To begin, we need to create an instance of the aptly-named XMLWriter
class. It's probable that you'll need the php-xml
package installed in order to use this.
$xml = new XMLWriter();
$xml->openMemory();
$xml->setIndent(true);
$xml->setIndentString("\t");
$xml->startDocument("1.0", "utf-8");
In short, the above creates a new XMLWriter
instance, instructs it to write the XML to memory, enables pretty-printing of the output, and writes the standard XML document header.
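As a quick sanity check, here's a minimal self-contained sketch of the above. The empty feed element is just a placeholder of mine so that there's something to output - it isn't the real feed yet.

```php
<?php
$xml = new XMLWriter();
$xml->openMemory();                  // Buffer the output in memory
$xml->setIndent(true);               // Enable pretty-printing...
$xml->setIndentString("\t");         // ...using tabs
$xml->startDocument("1.0", "utf-8"); // Writes the XML declaration

$xml->startElement("feed");          // Placeholder root element
$xml->endElement();
$xml->endDocument();                 // Closes anything that's still open

$result = $xml->flush();             // In memory mode, this returns the buffer
echo $result;
```

Running this should print the XML declaration followed by an empty feed element.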
With the document created, we can begin to generate the Atom feed. To figure out the format (I couldn't remember from when I wrote atom.gen.php
- that was ages ago :P), I ended up following this helpful guide on atomenabled.org. It seems familiar - I think I might have used it when I wrote atom.gen.php
. To start, we need something like this:
<feed xmlns="http://www.w3.org/2005/Atom">
......
</feed>
In PHP, that translates to this:
$xml->startElement("feed");
$xml->writeAttribute("xmlns", "http://www.w3.org/2005/Atom");
$xml->endElement(); // </feed>
Next, we probably want to advertise how the Atom feed was generated. Useful for letting the curious know what a website is powered by, and for spotting usage of your code out in the wild!
Since I'm writing this for Pepperminty Wiki, I'm settling on something not unlike this:
<generator uri="https://github.com/sbrl/Pepperminty-Wiki/" version="v0.18-dev">Pepperminty Wiki</generator>
In PHP, this translates to this:
$xml->startElement("generator");
$xml->writeAttribute("uri", "https://github.com/sbrl/Pepperminty-Wiki/");
$xml->writeAttribute("version", $version); // A variable defined elsewhere in Pepperminty Wiki
$xml->text("Pepperminty Wiki");
$xml->endElement();
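This start / attribute / text / end pattern is where the verbosity I mentioned earlier creeps in. Here's a rough sketch of what a helper might look like - write_element_attrs() is a hypothetical name of my own, and isn't part of XMLWriter or Pepperminty Wiki:

```php
<?php
/**
 * Hypothetical helper: writes an element with attributes (and optionally
 * some text content) in a single call.
 */
function write_element_attrs(XMLWriter $xml, string $name, array $attrs, ?string $text = null): void {
	$xml->startElement($name);
	foreach ($attrs as $key => $value) {
		$xml->writeAttribute($key, $value);
	}
	if ($text !== null) {
		$xml->text($text);
	}
	$xml->endElement();
}

$xml = new XMLWriter();
$xml->openMemory();
write_element_attrs($xml, "generator", [
	"uri" => "https://github.com/sbrl/Pepperminty-Wiki/",
	"version" => "v0.18-dev" // Hard-coded here; $version in the real thing
], "Pepperminty Wiki");
$generator = $xml->flush();
echo $generator;
```

The escaping is still handled by XMLWriter, so the helper doesn't give up any of the safety.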
Next, we need to add a <link rel="self" />
tag. This informs clients as to where the feed was fetched from, and the canonical URL of the feed. I've done this:
$xml->startElement("link");
$xml->writeAttribute("rel", "self");
$xml->writeAttribute("type", "application/atom+xml");
$xml->writeAttribute("href", full_url());
$xml->endElement();
That full_url()
function is from StackOverflow, and calculates the full URI that was used to make a request. As Pepperminty Wiki can be run in any directory on any server, I can't pre-determine this URL - hence the complexity.
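To give a rough idea of what such a function does, here's a simplified sketch. To be clear, this is not the StackOverflow function I actually use: sketch_full_url() is a made-up name, it takes the $_SERVER array as a parameter so it can be tried out on the command line, and it assumes the HTTPS, HTTP_HOST, and REQUEST_URI keys are populated by the web server (it ignores reverse proxies entirely).

```php
<?php
// Simplified sketch only - assumes a typical $_SERVER array
function sketch_full_url(array $server): string {
	// Some servers set HTTPS to "off" for plain HTTP requests
	$is_https = !empty($server["HTTPS"]) && $server["HTTPS"] !== "off";
	$scheme = $is_https ? "https" : "http";
	$host = $server["HTTP_HOST"] ?? "localhost";
	$uri = $server["REQUEST_URI"] ?? "/";
	return "$scheme://$host$uri";
}

$url = sketch_full_url([
	"HTTPS" => "on",
	"HTTP_HOST" => "wiki.bobsrockets.com",
	"REQUEST_URI" => "/?action=recent-changes&format=atom"
]);
echo $url; // https://wiki.bobsrockets.com/?action=recent-changes&format=atom
```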
Note also that I output type="application/atom+xml"
here. This specifies the type of content that can be found at the supplied URL. The idea here is that if you represent the same data in different ways, you can advertise them all in a standard way, with other formats having rel="alternate"
. Pepperminty Wiki does this - generating the recent changes list in HTML, CSV, and JSON in addition to the new Atom feed I'm blogging about here (the idea is to make the wiki data as accessible and easy-to-parse as possible). Let's advertise those too:
$xml->startElement("link");
$xml->writeAttribute("rel", "alternate");
$xml->writeAttribute("type", "text/html");
$xml->writeAttribute("href", "$full_url_stem?action=recent-changes&format=html");
$xml->endElement();
$xml->startElement("link");
$xml->writeAttribute("rel", "alternate");
$xml->writeAttribute("type", "application/json");
$xml->writeAttribute("href", "$full_url_stem?action=recent-changes&format=json");
$xml->endElement();
$xml->startElement("link");
$xml->writeAttribute("rel", "alternate");
$xml->writeAttribute("type", "text/csv");
$xml->writeAttribute("href", "$full_url_stem?action=recent-changes&format=csv");
$xml->endElement();
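Those three blocks differ only in the format name and MIME type, so they could be collapsed into a loop. A sketch, assuming that $full_url_stem is the wiki's URL without the query string (the value below is a stand-in of mine):

```php
<?php
// Stand-in value - in Pepperminty Wiki this is calculated elsewhere
$full_url_stem = "https://wiki.bobsrockets.com/";

// Format name => MIME type
$formats = [
	"html" => "text/html",
	"json" => "application/json",
	"csv" => "text/csv"
];

$xml = new XMLWriter();
$xml->openMemory();
$xml->setIndent(true);
$xml->startElement("feed");
foreach ($formats as $format => $mime_type) {
	$xml->startElement("link");
	$xml->writeAttribute("rel", "alternate");
	$xml->writeAttribute("type", $mime_type);
	$xml->writeAttribute("href", "$full_url_stem?action=recent-changes&format=$format");
	$xml->endElement();
}
$xml->endElement(); // </feed>
$links = $xml->flush();
echo $links;
```

Note that the & in each URL comes out as &amp;amp; in the generated XML - that's the automatic escaping doing its job.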
Before we can output the articles themselves, there are a few more pieces of metadata left on our laundry list - namely <updated />
, <id />
, <icon />
, <title />
, and <subtitle />
. There are others in the documentation too, but they aren't essential (as far as I can tell) - and not appropriate in this specific case. Here's what they might look like:
<updated>2019-02-02T21:23:43+00:00</updated>
<id>https://wiki.bobsrockets.com/?action=recent-changes&amp;format=atom</id>
<icon>https://wiki.bobsrockets.com/rocket_logo.png</icon>
<title>Bob's Wiki - Recent Changes</title>
<subtitle>Recent Changes on Bob's Wiki</subtitle>
The <updated />
tag specifies when the feed was last updated. It's unclear as to whether it's the date/time the last change was made to the feed or the date/time the feed was generated, so I've gone with the latter. If this isn't correct, please let me know and I'll change it.
The <id />
element can contain anything, but it must be a globally-unique string that identifies this feed. I've seen other feeds use the canonical url - and I've gone to the trouble of calculating it for the <link rel="self" />
- so it seems a shame to not use it here too.
The remaining elements (<icon />
, <title />
, and <subtitle />
) are pretty self-explanatory - although it's worth mentioning that the icon must apparently be square. Let's whip those up with some more PHP:
$xml->writeElement("updated", date(DateTime::ATOM));
$xml->writeElement("id", full_url());
$xml->writeElement("icon", $settings->favicon);
$xml->writeElement("title", "$settings->sitename - Recent Changes");
$xml->writeElement("subtitle", "Recent Changes on $settings->sitename");
PHP even has a preset for generating a date string in the format required by the spec :D
$settings
is an object containing the wiki settings that's a parsed form of peppermint.json
, and contains useful things like the wiki's name, icon, etc.
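To see what that date preset produces, here's a small standalone sketch using fixed dates so that the output is predictable. Note that date() uses the server's default timezone, whereas gmdate() always formats in UTC.

```php
<?php
// DateTime::ATOM is simply the format string "Y-m-d\TH:i:sP"
$when = new DateTimeImmutable("2019-02-02T21:23:43+00:00");
$formatted = $when->format(DateTime::ATOM);
echo $formatted, "\n"; // 2019-02-02T21:23:43+00:00

// gmdate() takes a UNIX timestamp, just like date() does
$epoch = gmdate(DateTime::ATOM, 0);
echo $epoch, "\n"; // 1970-01-01T00:00:00+00:00
```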
Finally, with all the preamble done, we can turn to the articles themselves. In the case of Pepperminty Wiki, the final result will look something like this:
<entry>
<title type="text">Edit to Compute Core by Sean</title>
<id>https://seanssatellites.co.uk/wiki/?page=Compute%20Core</id>
<updated>2019-01-29T10:21:43+00:00</updated>
<content type="html">&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Change type:&lt;/strong&gt; edit&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;User:&lt;/strong&gt; Sean&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page name:&lt;/strong&gt; Compute Core&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Timestamp:&lt;/strong&gt; Tue, 29 Jan 2019 10:21:43 +0000&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;New page size:&lt;/strong&gt; 1.36kb&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page size difference:&lt;/strong&gt; +1&lt;/li&gt;
&lt;/ul&gt;</content>
<link rel="alternate" type="text/html" href="https://seanssatellites.co.uk/wiki/?page=Compute%20Core"/>
<author>
<name>Sean</name>
<uri>https://seanssatellites.co.uk/wiki/?page=Users%2FSean</uri>
</author>
</entry>
There are a bunch of elements here that deserve attention:
<title />
- The title of the article. Easy peasy!
<id />
- Just like the id of the feed itself, each article entry needs an id too. Here I've followed the same system I used for the feed, and given the url of the page content.
<updated />
- The last time the article was updated. Since this is part of a feed of recent changes, I've got this information readily at hand.
<content />
- The content to display. If the content is HTML, it must be escaped and type="html"
present to indicate this.
<link rel="alternate" />
- Same deal as above, but on an article-by-article level: it should link to the page the article content is from. Here, I link to the page & revision of the change in question. In other cases, you might link to the blog post in question, for example.
<author />
- Can contain <name />
, <uri />
, and <email />
, and should indicate the author of the content. In this case, I use the name of the user that made the change, along with a link to their user page.
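The escaping of HTML inside the content element is worth seeing in action. Here's a small sketch with a made-up snippet of HTML, showing that text() takes care of it for us:

```php
<?php
// Made-up HTML content, purely for illustration
$content = "<ul><li><strong>User:</strong> Sean & co</li></ul>";

$xml = new XMLWriter();
$xml->openMemory();
$xml->startElement("content");
$xml->writeAttribute("type", "html");
$xml->text($content); // <, >, and & are escaped automatically
$xml->endElement();
$escaped = $xml->flush();
echo $escaped;
```

The resulting string contains &amp;lt;ul&amp;gt; rather than a literal ul tag, which is what type="html" requires.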
Here's all that in PHP:
foreach($recent_changes as $recent_change) {
	if(empty($recent_change->type))
		$recent_change->type = "edit";
	
	$xml->startElement("entry");
	
	// Change types: revert, edit, deletion, move, upload, comment
	$type = $recent_change->type;
	$url = "$full_url_stem?page=".rawurlencode($recent_change->page);
	
	$content = ".......";
	
	$xml->startElement("title");
	$xml->writeAttribute("type", "text");
	$xml->text("$type $recent_change->page by $recent_change->user");
	$xml->endElement();
	
	$xml->writeElement("id", $url);
	$xml->writeElement("updated", date(DateTime::ATOM, $recent_change->timestamp));
	
	$xml->startElement("content");
	$xml->writeAttribute("type", "html");
	$xml->text($content);
	$xml->endElement();
	
	$xml->startElement("link");
	$xml->writeAttribute("rel", "alternate");
	$xml->writeAttribute("type", "text/html");
	$xml->writeAttribute("href", $url);
	$xml->endElement();
	
	$xml->startElement("author");
	$xml->writeElement("name", $recent_change->user);
	$xml->writeElement("uri", "$full_url_stem?page=".rawurlencode("$settings->user_page_prefix/$recent_change->user"));
	$xml->endElement(); // </author>
	
	$xml->endElement(); // </entry>
}
I've omitted the logic that generates the value of the <content />
tag, as it's not really relevant here (you can check it out here if you're curious :D).
That just about finishes the XML we need to generate for our feed. To extract the XML from the XMLWriter
, we can do this:
$atom_feed = $xml->flush();
Then we can do whatever we want with the generated XML!
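For example, one option is to check that it parses and then send it to the client with the right content type. This is only a sketch - the tiny feed here is a stand-in for the full document built above, and the DOMDocument check is my own addition rather than something Pepperminty Wiki does:

```php
<?php
// Tiny stand-in feed
$xml = new XMLWriter();
$xml->openMemory();
$xml->startDocument("1.0", "utf-8");
$xml->startElement("feed");
$xml->writeAttribute("xmlns", "http://www.w3.org/2005/Atom");
$xml->writeElement("title", "Bob's Wiki - Recent Changes");
$xml->endElement(); // </feed>
$xml->endDocument();
$atom_feed = $xml->flush();

// Sanity check: does it parse as well-formed XML?
$doc = new DOMDocument();
$is_well_formed = $doc->loadXML($atom_feed);

// Headers can only be sent before any output has been written
if ($is_well_formed && !headers_sent()) {
	header("Content-Type: application/atom+xml");
}
echo $atom_feed;
```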
When the latest version of Pepperminty Wiki comes out, you'll be able to see a live demo here! Until then, you'll need to download a copy of the latest master version and experiment with it yourself. I'll also include a complete demo feed below:
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<generator uri="https://github.com/sbrl/Pepperminty-Wiki/" version="v0.18-dev">Pepperminty Wiki</generator>
<link rel="self" type="application/atom+xml" href="http://[::]:35623/?action=recent-changes&amp;format=atom&amp;count=3"/>
<link rel="alternate" type="text/html" href="http://[::]:35623/?action=recent-changes&amp;format=html"/>
<link rel="alternate" type="application/json" href="http://[::]:35623/?action=recent-changes&amp;format=json"/>
<link rel="alternate" type="text/csv" href="http://[::]:35623/?action=recent-changes&amp;format=csv"/>
<updated>2019-02-03T17:25:10+00:00</updated>
<id>http://[::]:35623/?action=recent-changes&amp;format=atom&amp;count=3</id>
<icon></icon>
<title>Pepperminty Wiki - Recent Changes</title>
<subtitle>Recent Changes on Pepperminty Wiki</subtitle>
<entry>
<title type="text">Edit to Internal link by admin</title>
<id>http://[::]:35623/?page=Internal%20link</id>
<updated>2019-01-29T19:55:08+00:00</updated>
<content type="html">&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Change type:&lt;/strong&gt; edit&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;User:&lt;/strong&gt; admin&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page name:&lt;/strong&gt; Internal link&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Timestamp:&lt;/strong&gt; Tue, 29 Jan 2019 19:55:08 +0000&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;New page size:&lt;/strong&gt; 2.11kb&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page size difference:&lt;/strong&gt; +2007&lt;/li&gt;
&lt;/ul&gt;</content>
<link rel="alternate" type="text/html" href="http://[::]:35623/?page=Internal%20link"/>
<author>
<name>admin</name>
<uri>http://[::]:35623/?page=Users%2FInternal%20link</uri>
</author>
</entry>
<entry>
<title type="text">Edit to Main Page by admin</title>
<id>http://[::]:35623/?page=Main%20Page</id>
<updated>2019-01-05T20:14:07+00:00</updated>
<content type="html">&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Change type:&lt;/strong&gt; edit&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;User:&lt;/strong&gt; admin&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page name:&lt;/strong&gt; Main Page&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Timestamp:&lt;/strong&gt; Sat, 05 Jan 2019 20:14:07 +0000&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;New page size:&lt;/strong&gt; 317b&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page size difference:&lt;/strong&gt; +68&lt;/li&gt;
&lt;/ul&gt;</content>
<link rel="alternate" type="text/html" href="http://[::]:35623/?page=Main%20Page"/>
<author>
<name>admin</name>
<uri>http://[::]:35623/?page=Users%2FMain%20Page</uri>
</author>
</entry>
<entry>
<title type="text">Edit to Main Page by admin</title>
<id>http://[::]:35623/?page=Main%20Page</id>
<updated>2019-01-05T17:53:08+00:00</updated>
<content type="html">&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Change type:&lt;/strong&gt; edit&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;User:&lt;/strong&gt; admin&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page name:&lt;/strong&gt; Main Page&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Timestamp:&lt;/strong&gt; Sat, 05 Jan 2019 17:53:08 +0000&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;New page size:&lt;/strong&gt; 249b&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Page size difference:&lt;/strong&gt; +31&lt;/li&gt;
&lt;/ul&gt;</content>
<link rel="alternate" type="text/html" href="http://[::]:35623/?page=Main%20Page"/>
<author>
<name>admin</name>
<uri>http://[::]:35623/?page=Users%2FMain%20Page</uri>
</author>
</entry>
</feed>
...this is from my local development instance.
Found this interesting? Confused about something? Want to say hi? Comment below!