Here’s a pipe I’ve created that attempts to marshal the content from hyperlocal blogging in Birmingham and let people subscribe only to the feeds that interest them. This is a piece of investigation and experimentation that I’ve been able to find the time to do thanks to Will Perrin and his hyperlocal blogging initiative Talk About Local. Will also helped define why it would be useful: for what he called “lazy journalists”.
Lazy here is used in the same way that it might be used — in praise — of a computer programmer; that is, lazy means you’ll work hard at setting yourself up right to make sure you get everything you need easily later on. Will got to the crux of the argument by saying that journalists interested in a subject — let’s say noise abatement issues — could easily find examples of those at a local level outside the areas they physically know.
So this is a run-through of the decisions made in building it (and what other options could work). It’s no more than a prototype at this stage, so comments and improvements are very welcome. However, if you would rather just get stuck into the pipe itself, head on over (the links are at the end of this post).
What are we collating?
This is actually one of the most important decisions. What we want are blogs about an area, rather than blogs that are merely based there (see this list of Birmingham based blogs for the size of the task without that distinction). This isn’t a simple case of aggregation: we wouldn’t want it clogged up with planning applications, Flickr photos and BBC travel news. If any of those subjects are interesting then we trust the local bloggers to raise them. There are two levels of human filtering here: first that of the bloggers, and second that of the person selecting the blogs, who is filtering for interest and quality too. In this instance I used Pete Ashton’s local blog tumblr as a basic list to pick from (some were discarded, and I think you can see why from the descriptions there; that’s not a reflection of quality, more of subject matter).
Why Birmingham? Simply because I’m there, and that helps when making value judgements on the contents and context. A little local knowledge, or a good deal of research, is needed to select which sites to get info from.
Why a pipe?
Yahoo Pipes make operations on RSS feeds easy to understand and alter, plus they are easily copied (cloned), so people are free to make improvements or use the method for their own areas. Another reason for using a pipe is that what it’s doing is passing the content through; it’s not storing or replicating the information, so no issues of permission arise.
Luckily in this instance all of the sites we’re interested in have RSS feeds, and almost all have full-text feeds. Tools like Dapper’s Dapp Factory will attempt to produce RSS content from sites that don’t offer it, and can work quite well.
You could attempt to do this via search: it’s possible to set up a custom Google search engine that is restricted to a list of hand-added sites (and here I attempted it). That is useful for one-off searches (although you’ve still to find the search engine) but not “lazy” enough. It would also be possible to use Google Reader, which can collect feed items that you later search within, and to offer a hand-curated OPML list of relevant sites. There are two issues with this. One is that you still need to “read” (or “mark as read”) the items as they come in, which is time consuming. The more important one is that with a pipe, blogs and sites can be added or removed by the curator without the people using the pipe having to do anything.
How is the pipe built?
The first task is to take the RSS feeds of the blogs and add them to a ‘Fetch Feed’ module:
These feeds are all for blogs that are simply about Birmingham (or areas within it), so we can expect the content, titles and tags to be useful when filtering the collection of posts.
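For anyone who wants to try the same step outside Pipes, here is a minimal sketch in Python of what combining several feeds amounts to: parse each RSS document and merge the items into one list. It is not how Pipes works internally. The function names and the two sample feeds are made up for illustration, and the feeds are inline strings so the example runs without a network connection.

```python
import xml.etree.ElementTree as ET


def parse_items(rss_xml):
    """Return one dict per <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
            "guid": item.findtext("guid", default=""),
        })
    return items


def fetch_all(feeds):
    """Merge the items of several RSS documents into a single list."""
    merged = []
    for rss_xml in feeds:
        merged.extend(parse_items(rss_xml))
    return merged


# Hypothetical sample feeds standing in for two local blogs.
SAMPLE_A = """<rss version="2.0"><channel><title>Blog A</title>
<item><title>Council meeting notes</title>
<description>A row over planning</description>
<guid>http://a.example/council/1</guid></item>
</channel></rss>"""

SAMPLE_B = """<rss version="2.0"><channel><title>Blog B</title>
<item><title>Sunday league round-up</title>
<description>Football in the park</description>
<guid>http://b.example/sport/7</guid></item>
</channel></rss>"""

items = fetch_all([SAMPLE_A, SAMPLE_B])
```

In real use the strings would be replaced by the downloaded text of each blog’s feed.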
For blogs with specific subject remits as well as area, there is something else we can do to increase the quality of the filtering. By adding tags that apply to the whole site we can add information into the feed, the thought being that sites about a subject may not mention it in every post. For example, the blog Created in Birmingham is about art in Birmingham, although not every post will mention the word “art”. We can add this into the feed with a little jiggery with regex:
What we’re doing here is creating another field in the RSS feed and adding the “tag” (or tags) that we wish to apply to the whole feed. I’ll happily admit that my knowledge of regex isn’t huge; this works, but a more elegant solution would be very welcome. These feeds can then be combined with the feeds we’re grabbing unaltered.
Then comes the filter. The text box (yellow and on the right in the illustration) is what makes the pipe work for any specific subject. It is what creates the box on the pipe home page for input:
The text input is used to filter by only letting an item through if there’s a text match in either the title, description, guid (the URL, which can contain useful category info) or the “tags” (which we added).
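The idea of a site-wide tag can be sketched in a few lines of Python. This is an illustration of the concept rather than a copy of the regex used in the pipe: the field name `tags`, the function name and the sample item are all assumptions.

```python
def add_site_tags(items, tags):
    """Return copies of the items with a site-wide 'tags' field added."""
    return [dict(item, tags=tags) for item in items]


# Hypothetical item from an arts blog whose text never says "art".
arts_blog_items = [{
    "title": "New exhibition opens in Digbeth",
    "description": "Opening night is Friday",
    "guid": "http://example.org/1",
}]

tagged = add_site_tags(arts_blog_items, "art arts culture")
```

Every item from that blog now carries the extra words, so a later filter on “art” can find posts that never use the word themselves.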
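As a rough equivalent of that filter, the sketch below keeps an item if the query appears anywhere in one of the four fields. Ignoring case is my assumption here, not something I have checked against the Pipes filter, and the sample items are invented.

```python
FIELDS = ("title", "description", "guid", "tags")


def filter_items(items, query):
    """Keep items where the query text appears in any of the four fields."""
    needle = query.lower()
    return [
        item for item in items
        if any(needle in item.get(field, "").lower() for field in FIELDS)
    ]


# Hypothetical items: the first only mentions the subject in its URL.
sample_items = [
    {"title": "Minutes published", "description": "Read them here",
     "guid": "http://example.org/category/council/post-1", "tags": ""},
    {"title": "Match report", "description": "A late winner",
     "guid": "http://example.org/post-2", "tags": "football"},
]

council_posts = filter_items(sample_items, "council")
football_posts = filter_items(sample_items, "Football")
```

The first item is matched through its guid and the second through the added tags, which are the two cases the extra fields are there to catch.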
That’s basically it — sorting the feed wasn’t that useful (the initial results aren’t as important as what’s fed through in future) and as we’re hand picking feeds (rather than using search) we shouldn’t need to run a “uniqueness” operation.
Results and further development
For a prototype the results are good. Trying searches like “council” or “football” works well, and even more detailed queries such as “noise abatement” produce what I’d expect.
Due to the “matching” algorithm used by Yahoo Pipes, a search for “art”, for example, will match “part” too. I couldn’t see a way to make sure it matched whole words only.
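Outside Pipes, whole-word matching is usually done with a regular expression word boundary (`\b`). The sketch below shows the idea in Python. It makes no claim about what the Pipes modules can or can’t do, and the function name is made up.

```python
import re


def matches_whole_word(text, word):
    """True if 'word' appears in 'text' as a whole word, ignoring case."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


hit = matches_whole_word("A new art show in Moseley", "art")
miss = matches_whole_word("Part of the ring road is closed", "art")
phrase = matches_whole_word("More noise abatement notices served", "noise abatement")
```

Here `hit` and `phrase` are true while `miss` is false, because “art” inside “Part” has no word boundary in front of it.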
The adding of the tags probably needs further testing, more synonyms and a few more feeds to test to make sure it works. You could in theory take the content of each blog post and use replace to add synonyms for all sorts of words (football/soccer for example) — this would be time consuming, but may add to the usefulness of the additional tagging.
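A small sketch of that synonym idea, in Python: look for known words in each post and append their synonyms to the tags field. The synonym table, the function name and the sample post are all invented for illustration. Note that the lookup is a plain substring match, so it shares the “art”/“part” weakness.

```python
# Hypothetical synonym table; a real one would need building by hand.
SYNONYMS = {
    "football": ["soccer"],
    "council": ["local authority"],
}


def add_synonym_tags(item, synonyms):
    """Return a copy of the item with synonyms of matched words added to 'tags'."""
    text = (item.get("title", "") + " " + item.get("description", "")).lower()
    extra = []
    for word, alternatives in synonyms.items():
        if word in text:  # plain substring match, not whole-word
            extra.extend(alternatives)
    tags = " ".join([item.get("tags", "")] + extra).strip()
    return dict(item, tags=tags)


post = {"title": "Football on the rec",
        "description": "Sunday league kicks off", "tags": ""}
tagged_post = add_synonym_tags(post, SYNONYMS)
```

After this, a search for “soccer” would find the post even though the blogger only ever wrote “football”.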
Thoughts, improvements, versions for other locations (or subjects perhaps) would be very interesting to see — do please tell.
The filter-able pipe is here: http://pipes.yahoo.com/bounder/birminghamlocal please clone and develop (and for those who just fancy all of the content, a feed without filter is here: http://pipes.yahoo.com/bounder/birminghamlocalblogs )