SXSW 2011: Hacking RSS: Filtering & Processing Obscene Amounts of Information

Dawn Foster
MeeGo Community Manager
Intel

Information Overload

There is an obscene amount of data in the world today: more than 600 exabytes (1 exabyte = 1,073,741,824 gigabytes).

Most of this information is…

  • Complete Crap
  • Out of Date / Obsolete
  • Not Relevant

So, what techniques can you use to find the information you want?

RSS is a start: the sources you care about are delivered right to you. But do you care about everything in each feed? What about feeds you do not subscribe to? Can you keep up with what you already have?

Prioritizing your reader

  • Put things you care about at the top
  • Categorize
  • Don’t try to read everything

Outsource / Crowdsource New Sources

The Real Magic is Filtering RSS

  • PostRank: Finds the best posts in a feed, ranked by engagement (links, sharing, comments). The output is available as an RSS feed that includes the PostRank score as a field.
  • Yahoo! Pipes: Lets you filter on any field in the RSS file, not just title and description. The downside is that it takes a long time to learn and muddle through.
  • Feed Rinse: Easy to use, but not as flexible. Import RSS feeds, set up filters, then get new RSS feeds out.
  • BackTweets: Lets you search Twitter for mentions of a URL, regardless of its short link.
  • …and many more!
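
To make the idea of filtering on any field concrete, here is a minimal Python sketch using only the standard library. It is not how any of the tools above work internally; the sample feed, the filter_items function, and the choice of the category field are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny made-up RSS 2.0 feed, embedded so the sketch runs without a network.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>MeeGo 1.2 released</title>
      <description>Release notes for the new version.</description>
      <category>meego</category>
    </item>
    <item>
      <title>Weekly link roundup</title>
      <description>Assorted links.</description>
      <category>misc</category>
    </item>
  </channel>
</rss>"""


def filter_items(rss_text, field, keyword):
    """Return the titles of items whose `field` element contains `keyword`.

    `field` can be any child element of <item> (title, description,
    category, ...). Matching is case-insensitive.
    """
    root = ET.fromstring(rss_text)
    matches = []
    for item in root.iter("item"):
        # A missing field is treated as empty text, so it never matches.
        value = item.findtext(field, default="")
        if keyword.lower() in value.lower():
            matches.append(item.findtext("title", default=""))
    return matches


print(filter_items(SAMPLE_RSS, "category", "meego"))
```

The same function filters on title or description by changing the field argument, which is roughly the flexibility the hosted tools give you without writing code.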

What to use this for

  • Personal Productivity
  • Understanding the Possibilities
  • Creating prototypes for something you want to build

When not to use it

  • Don’t use in critical or production environments
  • Typically, all of this can be done in most programming languages, where you can add caching and error checking
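
As a rough illustration of what "caching and error checking" could look like if you did this in code, here is a small Python sketch. The fetch_feed and _download names, the 15-minute cache lifetime, and the in-memory cache are assumptions chosen for the example, not anything from the talk.

```python
import time
import urllib.request

# In-memory cache: url -> (time fetched, feed text). Hypothetical design;
# a real service would more likely cache on disk or in a shared store.
_cache = {}


def _download(url):
    """Fetch a URL and return its body as text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_feed(url, max_age=900, opener=None):
    """Return the feed text for `url`.

    Uses the cached copy if it is younger than `max_age` seconds.
    If the download fails, falls back to a stale cached copy when one
    exists, and raises RuntimeError otherwise.
    `opener` lets callers swap in their own download function.
    """
    opener = opener or _download
    cached = _cache.get(url)
    if cached and time.time() - cached[0] < max_age:
        return cached[1]
    try:
        body = opener(url)
    except OSError as exc:  # urllib's URLError and timeouts are OSErrors
        if cached:
            return cached[1]  # serve stale data rather than fail
        raise RuntimeError(f"could not fetch {url}") from exc
    _cache[url] = (time.time(), body)
    return body
```

Serving a stale copy when the source is down is one possible policy; whether that is acceptable depends on how fresh the data has to be.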

References

Dawn Foster’s Blog Post

Posted in SXSW 2011.
