Yahoo! Pipes takes aggregation and integration to the masses

Posted Sep 29, 2008
Dave:

In recent years, widespread adoption of RSS technology really pushed the concept of aggregating content/data from multiple sources into the mainstream.

The problem is that as the number of sources increases, so does the amount of junk and duplicate content.

Similarly, there are a number of web-based tools and services out there that are great on their own, but would be even better if they were used in conjunction with one another (e.g. the struggle for social bookmarking supremacy among Digg, del.icio.us, and countless others).

Yahoo’s Pipes service, which has been in beta for about a year and a half now, gives users the power to aggregate and mash together all sorts of data from around the web in more ways than ever. Users can create a Pipe by pulling in a number of RSS feeds, web searches, and APIs, aggregating that content into one “pool”, and applying logic to strip out irrelevant or duplicate content from the final output.

The drag-and-drop UI is slick and intuitive, allowing everyone from the layman to the developer to build algorithms of varying complexity without writing a single line of code. Here’s a glimpse into the look and feel of the UI:

[Screenshot of the Pipes editor: _editing__no_dupes_title_source_description_.png]

For example, suppose I wanted to create a feed about Henry Paulson’s bailout plan that pulls from Bloomberg, Reuters, CNN, MSNBC, and NYT.

I would drag a block for each feed into the Pipes workspace and enter the URL for each site’s “Financial News” feed. Then I would drag in a Union module that pulls all of that content together, a Filter module that keeps only content whose description contains “Paulson bailout”, another Filter that removes content with duplicate descriptions (such as repurposed AP reports), and a Sort module that orders the output by publication date.
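For readers who think in code, the same chain of modules can be sketched in a few lines of Python. This is only an illustration of the logic, not how Pipes works internally: the function names, the item fields (`title`, `description`, `published`), and the sample items are all made up for the example, and the feeds are plain in-memory lists rather than real RSS fetched from those sites.

```python
from datetime import datetime
from itertools import chain


def union(*feeds):
    """Union step: merge several feeds (lists of item dicts) into one list."""
    return list(chain.from_iterable(feeds))


def filter_contains(items, field, phrase):
    """Filter step: keep items whose field contains the phrase, ignoring case."""
    needle = phrase.lower()
    return [item for item in items if needle in item.get(field, "").lower()]


def unique_by(items, field):
    """De-duplication step: keep only the first item seen for each field value."""
    seen = set()
    result = []
    for item in items:
        key = item.get(field, "").strip().lower()
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result


def sort_by_date(items, newest_first=True):
    """Sort step: order items by their publication date."""
    return sorted(items, key=lambda item: item["published"], reverse=newest_first)


def run_pipe(feeds, phrase="Paulson bailout"):
    """Wire the steps together in the same order as the modules described above."""
    merged = union(*feeds)
    relevant = filter_contains(merged, "description", phrase)
    deduped = unique_by(relevant, "description")
    return sort_by_date(deduped)


# Made-up sample items standing in for two of the feeds.
bloomberg = [
    {"title": "Markets react",
     "description": "AP: Lawmakers discuss the Paulson bailout proposal.",
     "published": datetime(2008, 9, 28, 9, 0)},
    {"title": "Oil prices slip",
     "description": "Crude falls for a third day.",
     "published": datetime(2008, 9, 28, 10, 0)},
]
reuters = [
    {"title": "Bailout debate",
     "description": "AP: Lawmakers discuss the Paulson bailout proposal.",
     "published": datetime(2008, 9, 28, 9, 30)},
    {"title": "Vote expected",
     "description": "A vote on the Paulson bailout may come soon.",
     "published": datetime(2008, 9, 29, 8, 0)},
]

for item in run_pipe([bloomberg, reuters]):
    print(item["published"].date(), item["title"])
```

With this sample data, the repurposed wire description shows up only once, the unrelated oil story is dropped, and the newer item comes first. Each function maps to one block in the workspace, which is why reordering or swapping modules in Pipes feels so much like refactoring a small script.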

From there, you can look through the output of your Pipe to see how your content looks. If you notice a couple of strange things still slipping through the cracks, you can tweak your logic over time. Perhaps most importantly, the open and modular nature of Pipes allows you to include (or be inspired by) Pipes that have been created and shared by other users, since the underlying architecture is openly accessible and easy to interpret.

Many people also add search controls to their Pipes, so you can search across some of the most reliable sources. Here are some Pipes that I think are interesting:

What are your favorite Pipes?

About Dave

Dave Leonard is a Solutions Architect with Phase2 Technology. He is responsible for identifying business needs and translating them into functional requirements.

