Find Similar Links on LinkRiver

I’ve been noodling on this feature for a while: how can I find “more links like this one” in LinkRiver? Putting on my machine learning hat, I contemplated link-to-link co-visitation schemes, semantic indexing, and various clustering algorithms… but all of these approaches were too data-heavy, at least for now. There had to be an easier way…

LinkRiver has supported full-text search of links (by river and stream) for a while now. The link title and host (e.g. http://www.techcrunch.com) are both part of the index. Could the full-text search engine help out here? Let’s try it out.

One popular link today was a story on news.com about the possibility of eBay selling Skype to Google. What if I send the link host and title to the search engine? Are the results relevant?

Try it yourself: Click to see similar links

In most cases this works really well…

Twobile-Twitter for Windows Mobile
FriendFeed Has Search

But sometimes, the results are not so great:

TechMeme Leaderboard: Six Months In

Options: one thing I may do, depending on feedback, is stop including the link host in the search query. Play around (click similar, then re-run the search after removing the link host from the search box) and let me know what you think.
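A minimal sketch of the query described above, assuming the search engine accepts a plain space-separated keyword string: take the link’s host and title, strip the punctuation from the title, and join them. The function name `similar_links_query` and the `include_host` flag are hypothetical, not LinkRiver’s actual code; the flag simply makes the host-free variant easy to try.

```python
import re
from urllib.parse import urlparse


def similar_links_query(url, title, include_host=True):
    """Build a 'more like this' full-text query from a link's host and title."""
    terms = []
    if include_host:
        host = urlparse(url).netloc  # e.g. "news.com"
        if host:
            terms.append(host)
    # Keep only word characters; punctuation adds nothing to a keyword query.
    terms.extend(re.findall(r"\w+", title))
    return " ".join(terms)
```

For a made-up title, `similar_links_query("http://news.com/story", "eBay may sell Skype to Google!")` gives `news.com eBay may sell Skype to Google`, and passing `include_host=False` drops `news.com` from the front.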

What Powers the Aggregators?

Every lifestream and link-sharing aggregator relies on an RSS/Atom parser to help power its service.

I built LinkRiver using Ruby on Rails and would have preferred a parser written in Ruby. However, Mark Pilgrim’s Universal Feed Parser (UFP), a Python library, is rock-solid and very well tested, so I use it for feed parsing. LinkRiver controls UFP via a memcached-based message queue, and some Python glue code around UFP posts new shared links via a simple HTTP API.
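A rough sketch of what such glue could look like, assuming the HTTP API accepts one form-encoded POST per link. The endpoint URL and the `stream_id`, `url`, and `title` field names are placeholders, not LinkRiver’s real API, and the memcached queue that hands the glue a feed to fetch is left out. `feedparser.parse` and the `entries`, `link`, and `title` fields are UFP’s own; the import sits inside the function so the request-building half works without the dependency installed.

```python
import urllib.parse
import urllib.request

# Placeholder endpoint: the post only says a "simple HTTP API" exists.
API_URL = "http://linkriver.example/api/links"


def build_post(entry, stream_id, api_url=API_URL):
    """Turn one parsed feed entry into a form-encoded POST request."""
    data = urllib.parse.urlencode({
        "stream_id": stream_id,  # hypothetical field names
        "url": entry.get("link", ""),
        "title": entry.get("title", ""),
    }).encode("utf-8")
    return urllib.request.Request(api_url, data=data, method="POST")


def parse_and_post(feed_url, stream_id):
    """Fetch and parse a feed with UFP, then post each entry; returns the count."""
    import feedparser  # Universal Feed Parser (pip install feedparser)

    feed = feedparser.parse(feed_url)
    for entry in feed.entries:
        with urllib.request.urlopen(build_post(entry, stream_id)) as resp:
            resp.read()  # drain the response; error handling omitted
    return len(feed.entries)
```

Keeping the parse step in Python and the rest of the app behind HTTP means the Rails side never has to load the parser at all.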

A while back RSSMeme’s Benjamin Golub tweeted that he also uses UFP, so I thought I’d ask around to see what some of the other aggregators are using.

Bret Taylor from FriendFeed told me they use UFP as a fallback but rely primarily on a custom parser that uses much less memory.

ReadBurner developer Alexander Marktl replied to say that he uses MagicParser, a commercial parser for PHP.

After testing a bunch of options and finding none that worked, Tumblr’s Marco Arment wrote his own parser for PHP “with regular DOM functions”.
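To make “regular DOM functions” concrete, here is a minimal sketch in Python’s standard `xml.dom.minidom` rather than PHP (this is not Marco’s code). It only handles well-formed RSS 2.0 `<item>` elements; Atom `<entry>` elements, malformed XML, and odd encodings are exactly the cases that make robust parsers like UFP hard to write.

```python
from xml.dom import minidom


def text_of(parent, tag):
    """Text content of the first <tag> element under parent, or '' if absent."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes:
        return ""
    parts = [
        child.data
        for child in nodes[0].childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    ]
    return "".join(parts).strip()


def parse_rss_items(xml_text):
    """Extract title/link pairs from an RSS 2.0 document with plain DOM calls."""
    doc = minidom.parseString(xml_text)
    return [
        {"title": text_of(item, "title"), "link": text_of(item, "link")}
        for item in doc.getElementsByTagName("item")
    ]
```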

Google’s Chris Wetherell has blogged about the history of Google Reader and mentioned that UFP was involved, at least in the early stages.

Any others?

Update: see comments. Gabe Rivera from Techmeme built his own in Perl.

Favorite Firefox Extension – Tabs Open Relative

One feature I’d missed since abandoning NetCaptor for Firefox a few years ago was the ability to open new tabs next to the current tab instead of at the end of my tab stack. I spent an hour whiteboarding this with Firefox developer Ben Goodger, but I gave up trying to build it myself after finding the Firefox tab-ordering code to be a spaghetti mess of independent arrays.

I don’t remember how I stumbled on Tabs Open Relative… but all is well in my tabbed-browsing world again, as if some annoying background music has stopped. Ahhhhh.

Why is this feature so important? Context. When you open new tabs, they tend to be related to the current tab. If I’m searching Google for digital camera reviews and open the top five links as separate tabs, those tabs should be close to the “starter” tab, not lost at the end.
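The placement rule can be modeled in a few lines: a tab opened from the current tab lands just to its right, after any tabs already opened from it, so the five review links keep their order next to the starter tab. This is a toy model with hypothetical names (`TabStrip`, `open_related`), not how Tabs Open Relative or Firefox actually implements it.

```python
class TabStrip:
    """Toy model of 'open relative' tab placement."""

    def __init__(self, tabs):
        self.tabs = list(tabs)
        self.current = 0
        # Tabs opened from the current tab since it was last selected.
        self._opened_from_current = 0

    def select(self, index):
        self.current = index
        self._opened_from_current = 0

    def open_related(self, title):
        # Insert right after the starter tab and its earlier children, so
        # links opened in order stay in order beside the starter tab.
        pos = self.current + 1 + self._opened_from_current
        self.tabs.insert(pos, title)
        self._opened_from_current += 1
        return pos

    def open_at_end(self, title):
        # The default behavior: new tabs get lost at the end of the strip.
        self.tabs.append(title)
        return len(self.tabs) - 1
```

For example, with tabs `mail, search, news`, selecting `search` and opening three related tabs `r1`, `r2`, `r3` yields `mail, search, r1, r2, r3, news`; appending would have pushed them past `news`.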

Save Links for Later on LinkRiver

This happens to me all the time. I’m in super-productive mode and I run across an article or blog post that is interesting but entirely outside the context of what I’m doing. I need to stay on task – no tangents allowed.

I’ve tried a few things, such as a ‘To Read’ folder in my browser’s bookmarks or tagging links ‘toread’ on del.icio.us, but these methods were either too disruptive or too difficult to manage.

I tried out InstaPaper the other day and loved it: one click and a link is saved for later. It worked great, but it didn’t help when I found something to save for later while in Google Reader. Still too much friction.

Inspired by InstaPaper, I added a ‘Save for Later’ feature to LinkRiver.


Links you mark ‘Later’ show up under your ‘Later’ tab in LinkRiver. These links are private and are not shared with your followers unless you explicitly choose to share them.


Three Ways to Save Links for Later

There are three ways to add links to your ‘Later’ stream.

First – there is a new one-click bookmarklet you can add to your browser toolbar. One click, boom, you’ve saved the link for later without leaving the page you are on. Look for the bookmarklets in your sidebar after logging in to LinkRiver.


Second – links inside LinkRiver now have a ‘later’ option in addition to the ‘share’ option that’s been there for a while. Again, one click and it’s saved for later.

Third – this one is probably the most powerful of them all – you can import an external feed into your ‘Later’ stream.


I set up LinkRiver to import my Google Reader shared items into my main stream and my starred items into my Later stream. This works beautifully, especially when using Google Reader on my iPhone. Just click ‘share’ in Google Reader to share on LinkRiver, or ‘star’ to save it for later. Sweet GTD goodness!
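A sketch of the routing this setup implies: each imported feed maps to a destination stream, and links landing in the Later stream are marked private so they don’t show up for followers. The feed URLs, the `FEED_ROUTES` table, and the `SavedLink` fields are hypothetical placeholders, not Google Reader’s real feed URLs or LinkRiver’s data model; sending unknown feeds to the main stream is also an assumption.

```python
from dataclasses import dataclass

# Hypothetical import configuration: feed URL -> destination stream.
FEED_ROUTES = {
    "https://reader.example/shared.atom": "main",
    "https://reader.example/starred.atom": "later",
}


@dataclass
class SavedLink:
    url: str
    title: str
    stream: str    # "main" or "later"
    private: bool  # Later links are hidden from followers


def route_entry(feed_url, url, title):
    """Place an imported entry in the stream configured for its feed."""
    stream = FEED_ROUTES.get(feed_url, "main")  # assumption: unknown feeds go to main
    return SavedLink(url=url, title=title, stream=stream, private=(stream == "later"))


def visible_to_followers(links):
    """Only non-private links are shared with followers."""
    return [link for link in links if not link.private]
```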

Elephants, T-Rexes, Giants and Dad


I saw this on the wall at Claire’s preschool today. They had asked the kids to name things that are big – elephant, train (2x), giant, big big truck, and t-rex all made the list. What comes to Claire’s mind when you ask her to name something big? Dad. I love it!