What Powers the Aggregators?

All lifestream and link-sharing aggregators use an RSS/Atom parser to help power their service.

I built LinkRiver using Ruby on Rails and would have preferred a parser written in Ruby. However, Mark Pilgrim’s Universal Feed Parser (UFP), a Python library, is rock-solid and very well tested, so LinkRiver uses UFP for feed parsing. LinkRiver drives UFP through a memcached-based message queue, and a small piece of Python glue posts newly shared links through a simple HTTP API.
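The queue-plus-glue setup described above might look roughly like the following Python sketch. It is not LinkRiver’s code: an in-process `queue.Queue` stands in for the memcached-based queue, the `API_URL` endpoint and the JSON payload shape are hypothetical, and the `parse` argument is where UFP’s `feedparser.parse` would be plugged in. The functions only build the POST requests, so nothing is sent over the network.

```python
import json
import queue
import urllib.request

# Hypothetical endpoint: the post only says "a simple HTTP API".
API_URL = "http://localhost:3000/api/links"


def build_link_posts(parsed_feed, api_url=API_URL):
    """Build one JSON POST request per feed entry that has a link."""
    posts = []
    for entry in parsed_feed.get("entries", []):
        link = entry.get("link")
        if not link:
            continue  # nothing to share without a URL
        body = json.dumps({"url": link, "title": entry.get("title", "")}).encode("utf-8")
        posts.append(
            urllib.request.Request(
                api_url,
                data=body,
                headers={"Content-Type": "application/json"},
                method="POST",
            )
        )
    return posts


def drain_queue(jobs, parse):
    """Pop feed URLs until the queue is empty; parse each and collect the POSTs.

    `jobs` is a stand-in for the memcached-based queue (an assumption);
    `parse` is any callable returning a feedparser-style dict.
    """
    pending = []
    while True:
        try:
            feed_url = jobs.get_nowait()
        except queue.Empty:
            break
        pending.extend(build_link_posts(parse(feed_url)))
    return pending
```

In a real worker, `parse` would be `feedparser.parse`, whose result is a dict subclass with an `entries` list, and each request returned by `drain_queue` would be sent with `urllib.request.urlopen`. Keeping the request-building separate from the sending makes the glue easy to test without a running server.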

A while back RSSMeme’s Benjamin Golub tweeted that he also uses UFP, so I thought I’d ask around to see what some of the other aggregators are using.

Bret Taylor from FriendFeed told me they use UFP as a fallback but rely primarily on a custom parser that uses much less memory.

ReadBurner developer Alexander Marktl replied to say that he uses MagicParser, a commercial parser for PHP.

After testing a bunch of options and finding none that worked, Tumblr’s Marco Arment wrote his own parser for PHP “with regular DOM functions”.
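For comparison, a parser built on plain DOM calls can be quite small. Marco’s parser is written in PHP and its code isn’t shown here, so the following is a hypothetical Python analogue using the standard library’s `xml.dom.minidom`. The function names are illustrative, and the sketch pulls only titles and links out of RSS 2.0 `<item>` and Atom `<entry>` elements.

```python
from xml.dom import minidom


def text_of(parent, tag):
    """Return the text of the first <tag> element under parent, or '' if absent."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes:
        return ""
    # Join text and CDATA children; a title may be split across several nodes.
    return "".join(
        child.data
        for child in nodes[0].childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    ).strip()


def parse_feed_items(xml_text):
    """Extract title/link pairs from RSS 2.0 <item> or Atom <entry> elements."""
    doc = minidom.parseString(xml_text)
    items = []
    # RSS 2.0: the link is the text content of <link>.
    for item in doc.getElementsByTagName("item"):
        items.append({"title": text_of(item, "title"), "link": text_of(item, "link")})
    # Atom: the link is the href of a <link> with no rel or rel="alternate".
    for entry in doc.getElementsByTagName("entry"):
        href = ""
        for link in entry.getElementsByTagName("link"):
            if link.getAttribute("rel") in ("", "alternate"):
                href = link.getAttribute("href")
                break
        items.append({"title": text_of(entry, "title"), "link": href})
    doc.unlink()  # release the DOM's internal reference cycles
    return items
```

A strict DOM parser like this raises `xml.parsers.expat.ExpatError` on any feed that isn’t well-formed XML, which is one reason a liberal parser such as UFP is attractive for real-world feeds. The Python documentation also warns that its `xml` modules are not hardened against maliciously constructed input.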

Google’s Chris Wetherell has blogged about the history of Google Reader and mentioned that UFP was involved, at least in the early stages.

Any others?

Update: see the comments. Gabe Rivera from Techmeme built his own parser in Perl.


4 thoughts on “What Powers the Aggregators?”

  1. I’m not surprised you went with UFP. The standard Ruby RSS parser is pretty hairy, especially in how it returns many different classes based on the type of feed it parsed (which is the correct thing to do, but can make developing with it a pain).
