I’ve been running a, very rough, scrape of the Birmingham (UK) based interweb for ‘emotional wellbeing’ since April of 2008. Simply put a script running twice a day read in Tweets, news headlines and (originally) blog posts and compared the words within them to a table I’d drawn up of ‘emotion’ words and fairly arbitrary scores.
It was surprisingly interesting to watch: despite its roughness, the internal consistency let patterns emerge. It broadly followed weather and sports results, with some peaks and dips you could map to specific happenings, or news stories.
It lead to a spin off focussing on Tweets from MPs, which I think influenced some of the developments that Tweetminster produced in the next year or so.
It was the patterns that lead me to keep putting off improving the algorithm, but recent Twitter API developments meant I had to do some work anyway and that (together with another project, of which more soon) gave me the impetus to give the project an overhaul. And here’s how it works now…
Twitter’s geolocation services are now much improved, so I can specify a point (the centre of Victoria Square in Birmingham) and a radius (10 miles) and get a reasonably accurate dump of Tweet data back—the algorithm calls for the most recent 1000.
Twitter is now the sole focus of data, in keeping with the ‘conversational pychogeography‘ aims of the project (in essence, words used without too much pre-meditation are more interesting than those written purely for publication). It also provides much more and more reactive data.
The words contained within these tweets are then compared to data from the University of Florida (The Affective Norms for English Words – PDF link). Within that data set each word covered (there are around a thousand in the set I’ve using) is given a score for Valence (sad to happy on a scale 0-10), Arousal (asleep to awake on a scale of 0-10) and Dominance (feeling lack of control to feeling in control on a scale of 0-10). The scores are then collated and a mean calculated. The overall emotional wellbeing score here is calculated as a mean of the three individual means, although the scores are revealed individually on the site.
I’m unsure if combining the results in this way is the best, which is why the site reveals the working — the Twitter feed just goes with one value for ease of understanding and adds a rating adjective too:
The Twitter feed produces results twice a day, and these scores are being saved to visualise more graphically, but the website updates every ten seconds (and will self-refresh if you stay on the site) and also displays a word cloud of the currently found ‘emotion words':
Thoughts on further development
I’ve been experimenting with more local results (here is a version running on just one Birmingham post code — B13) as well as live graphing. I also have a version that will analyse results for a hashtag—something we may use in conjunction with the Civico player to produce ‘wormals’ (graphs of sentiment) during conferences.
But for now, I’m happy to let the new algorithm bed in—wondering about the amount of data and frequency that will be required to see the most detail—and to see what patterns we can spot.
Feedback welcome. Go see for yourself or follow on Twitter.