Improving the happiness index

After my talk at Oxford Geek Night I was happy to have a couple of suggestions to see if the algorithm could produce better results. One was to remove retweets from the search, which makes sense as we all know from many Twitter bios that “a RT does not imply endorsement”. That was easy to implement, as the basic Twitter search API returns retweets ‘old-style’ with “RT” at the head.
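A minimal sketch of that retweet filter, written in Python rather than the site's own PHP. It assumes the tweets have already been fetched as plain strings; the function name `strip_retweets` and the sample tweets are made up for illustration.

```python
def strip_retweets(tweets):
    """Return only the tweets whose text does not start with an old-style 'RT' marker."""
    kept = []
    for text in tweets:
        words = text.lstrip().split(None, 1)
        # Old-style retweets begin with the standalone token "RT" (sometimes "RT:").
        if words and words[0].rstrip(":").upper() == "RT":
            continue
        kept.append(text)
    return kept


# Hypothetical sample data, not real search results.
sample = [
    "RT @someone: lovely morning in Oxford",
    "Lovely morning in Oxford",
    "RTing this later, stuck on the ring road",
]
print(strip_retweets(sample))
```

Matching on the whole first word, rather than on the first two letters, keeps tweets that merely start with a word like "RTing".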

The other was more complex, so I’m going to quote Owen who emailed me directly:

“This morning I thought up an analogy. Suppose you have weather readings for the last 100 days. For each day you have temperature (T), humidity (H) and mm of precipitation (P). What you’re doing is multiplying these all together, presumably because you want to get one number out. Unfortunately this number is meaningless. If you wanted to combine these quantities in some way you should really be thinking about what meaning you’re attaching to the number you get out. I’m ignoring here the fact that you multiplied them all together, when in all likelihood adding them would make more sense. I suggest it would be more meaningful to keep track of them separately, and plot three graphs instead of one. Indeed, this is what is done with weather data.

You spoke about wanting to get a measure of how much spread a set of data has. What you want is the variance, or something like it. The average (more properly called the mean) of a set of numbers is obtained by adding them all up and dividing by the total number. This tells you something very useful, but it loses all information about how spread out the information was. The variance captures that. It’s a bit tricky to calculate. I’ll try to explain it here, but you can always google for more details. Suppose you have numbers a1 up to a100. The average is M = (a1 + a2 + … + a100) / 100. To calculate the variance we have to calculate some intermediate numbers. First, you have to calculate the average. Then you have to calculate the average of each number squared: Z = (a1^2 + … + a100^2) / 100. Now the variance is V = Z – M^2. I know that doesn’t seem to make much sense. There is a way of calculating the variance which makes it clearer why it’s any use, but it’s a bit harder to actually implement.

You might want to square root the variance to get the standard deviation. This is measured on the same scale as the original numbers you had, so it makes a bit more sense to use that instead.”
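Owen's recipe can be sketched in a few lines of Python (not the site's PHP). Note that the shortcut form of the variance is the mean of the squares minus the square of the mean, Z - M^2. The second function is the "clearer" form he alludes to: the average squared distance from the mean. The function names and the sample scores are illustrative assumptions.

```python
import math


def mean(values):
    """M: add the numbers up and divide by how many there are."""
    return sum(values) / len(values)


def variance(values):
    """Shortcut form: mean of the squares minus the square of the mean (Z - M^2)."""
    m = mean(values)
    z = mean([v * v for v in values])
    return z - m * m


def variance_from_deviations(values):
    """Definition form: the average squared distance of each number from the mean."""
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])


def standard_deviation(values):
    """Square root of the variance, on the same scale as the original numbers."""
    return math.sqrt(variance_from_deviations(values))


# Made-up scores, chosen so the arithmetic is easy to check by hand.
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean(scores), variance(scores), standard_deviation(scores))
```

Both variance functions give the same answer up to rounding. The deviation form is used for the square root because the shortcut form can come out a hair below zero for near-identical values.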

So, @IsOxfordHappy and the location sensitive page now do both of those. I’ve removed the ‘word scale’ for the time being till I can see roughly what the numbers are. Thanks everyone for your suggestions.

Is Oxford happy?

After moving down to Oxford I did an update of my Birmingham Emotions conversational psychogeography project. That’s now quite simple as I have built a ‘happy monitor’ that can centre anywhere. I’m not as happy myself as I was with the results, however: whether due to the increasing volume of the Tweets that it analyses or something else, the rating doesn’t move around much. That was the problem I posed in a very quick talk at Oxford Geek Night 27. Here are the slides from the presentation. I think the audio was being recorded, and I will add it if I get hold of it.

I’ve already had a number of suggestions about improving the equation or analysis. If they’re code-able by me I shall try; if not I will have to ask for help…

On a side note, the whole idea of conversational psychogeography came to me when I was thinking of putting an emotional wellbeing indicator in the form of a light at the top of Birmingham’s Rotunda (see how it’s still unfinished right at the top). That was back in 2008, but it seems that London has finally installed something a little similar. Drat.

You can get twice daily Oxford updates on Twitter.

Twitizen Kane

Yesterday, I tried the Twitpanto method on “the greatest film ever made”. As part of ‘Yarn presents Five Stories High’ at Flatpack Festival, I re-interpreted around ten minutes of Citizen Kane. It was a tight deadline, so plans to do something really different fell behind just writing a script and getting together a few ‘actors’ I could trust.

In a live setting I was interested in how the audience would understand the language of the Twitter feed just being projected on the wall. I hoped to get heckles and confusing stuff too.

The script is here. We got ‘moved on’ (for reasons of time I suspect) just before the bit about the principles, which I thought was the crux of it. Never mind.

I’m not sure everyone got what was going on but this quick review from another participant means that at least someone did:

“obviously, members of the audience start tweeting using the hashtag, and it was just hilarious. And silent, and awkward, but in a brilliant way.”

The weekend’s other Flatpack activity for me was to chair a Q&A with Lawrence (ex of Felt etc), that was both more conventional and a little better received I think. Great fun, and really nice to meet a musical hero.

Excellent Engagement

Content, interaction, community: that’s what your social media profile is all about. It’s a message that seems to have hit most brands and organisations, right down to the smallest. But from what I’m seeing at the moment, a lot of people are finding it hard to think about what to do once they get there.

There’s an episode of the Simpsons (Season Two, Episode 22), stay with me, where Mr Burns would like to be nice to Homer—but he knows nothing about him (nor really cares) so falls on the most bland of engagement:

“Hey there Mr….d’uh….Brown Shoes! How ’bout that local sports team eh?”

(Oddly for a great Simpsons quote, the video doesn’t seem to be on YouTube anywhere, but there is an audio clip here.)

Does that remind you of anything? Here’s a collection of Tweets reminding me of it that I collected on Friday:

It’s not exclusive to Twitter, nor the Royal Wedding: check out any number of Facebook fan pages or any social platform on a Friday lunchtime to see loads of “Hey guys, what are you doing this weekend. Let us know!” type-posts. They’re a close cousin of the way blogs starting up will often end their debut post with a plaintive cry of “what would you like to see?”

It is no doubt amusing to watch them all come in (and to watch the meme or cliche spread), but there’s something deeper I think—and some lessons to learn.

I think it sometimes happens because people are following what the mainstream media started to do a few years ago (‘have your say’). “Let us know!” became their coda to all stories, because they were getting to grips with the idea that people could converse and create en masse without their involvement. They were trying to channel this new thing called UGC (user-generated content) through them so they could continue to act as gatekeepers, or perhaps they were genuinely excited by all of those pictures of snow. The TV programmes and the newspapers (and to an extent their associated online spaces) were offering an audience, much like Tony Hart in his gallery, and still do, hence the potential motivation for sharing your content through them.

Most brand social web channels don’t have such a huge audience, or if they have a big one it’s often very tightly around a subject—big wide and generic questions aren’t going to engage that audience. Your dry cleaners, or a skincare brand, aren’t the first place you think of to tell your plans for a Bank Holiday.

Possibly it also comes from a desire to “get into the conversation”, to make a brand seem like it’s one of your mates. That might work if you’re trying to create a very small community round your social web space, but if you’re usually about answering questions and sending out news, isn’t it a little odd? What are your other followers going to do with the information if you get it and then spread it?

Most of all, people probably do it because they see others doing the same. That’s one way to learn, but you need to think more deeply about whether any techniques apply to your situation—what they might achieve and how they might look. In essence if you’re attempting to engage around your brand then things closely related, or of direct relevance are going to hold more weight.

As a bonus here’s Mr Burns’s classic funk track ‘Look at all those idiots’, including wailing guitar from Waylon Smithers. What’s your favourite Simpsons as metaphor for social web engagement story? Let us know!

Is Twitter about to do a mass reclaim of unused accounts?

(Screenshot: the “We've missed you on Twitter!” email in Gmail)

It’s easy to sign up for a Twitter account: all you need is an email address. It used to be even easier, as addresses weren’t even verified. I have, I estimate, about a hundred accounts: lots used, but others created for short-term projects or jokes. Some, in truth, were registered in the same way as you register a domain name: idea half-formed but name assured.

That led to a lot of great Twitter names being claimed but left unused and unloved. (@fry is posted to about once every 6 months, @cat about the same.)

Unless it violates a trademark, there’s no real mechanism for getting one freed up either.

Twitter has about the same sign-up to action ratio as most social web sites, but unlike Facebook for example your username, its uniqueness, its readability, matters. And those are getting used up too cheaply.

So, the first stage, I think, is the “where are you” email above, which I’ve just received. It’s a shot that says ‘we did warn you’ when, six months later, if you don’t log in, the account is closed and the name freed.

How do I tell them that directing Twitpantos is a very, erm, seasonal activity?

If you’ve an account that you value, I’d take time to post every so often.

Sentiment Analysis of a Football Match

Last night I turned my sentiment analysis tool on two hashtags: #bcfc and #avfc, the most widely used tags to refer to Birmingham City and Aston Villa during their League Cup quarter final game. It was a chance to see if visualising two ‘competing’ tags around the same event would be a useful exercise.

Caveats that would apply to this:

  • Some people use the tags instead of team names, meaning that they might be used by people supporting the other team (or no team at all). Most fans, though, seem to tag with just the hashtag representing their team.
  • Some tweeters use both—these tweets could be removed technically, but make no difference to the comparative scores.
  • If there’s a subject that uses more slang or metaphor than football, it’s not often discussed on Twitter.
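The second caveat, tweets that carry both tags, could be handled by setting those tweets aside before scoring. This is a rough Python sketch under assumptions: tweets arrive as plain strings, and `split_by_tag` plus the sample tweets are hypothetical, not part of the tool described here.

```python
import re

HASHTAG = re.compile(r"#(\w+)")


def split_by_tag(tweets, tag_a, tag_b):
    """Group tweets by which of two hashtags they carry, setting aside any that carry both."""
    tag_a, tag_b = tag_a.lower(), tag_b.lower()
    only_a, only_b, both = [], [], []
    for text in tweets:
        # Compare case-insensitively, so #AVFC and #avfc count as the same tag.
        tags = {t.lower() for t in HASHTAG.findall(text)}
        has_a, has_b = tag_a in tags, tag_b in tags
        if has_a and has_b:
            both.append(text)
        elif has_a:
            only_a.append(text)
        elif has_b:
            only_b.append(text)
    return only_a, only_b, both


# Hypothetical sample data.
match_tweets = [
    "Come on you Blues #bcfc",
    "Villa till I die #AVFC",
    "Derby day! #avfc #bcfc",
    "Nothing on telly",
]
villa, blues, shared = split_by_tag(match_tweets, "avfc", "bcfc")
print(len(villa), len(blues), len(shared))
```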

There was generally a downward trend throughout the match. Tension? Bad football? It could have been both. The first two goals seemed to have a much bigger impact than the third. This I don’t quite understand, but it seems to be more about the tweets themselves than the tool.

I could see how a special subject-set of emotion words could be created for football, which could cope with more nuanced or unusual words. It’s something to consider.

The sentiment scores in a Google spreadsheet, csv files: #avfc tweets (657 of which were during the game), #bcfc tweets (370 during).

The obligatory Wordle:

Sentiment Analysis of the X-Factor

As promised, I turned my Twitter sentiment analysis tool on the big TV/social web phenomenon that is the X-Factor. I started the script running at around 6:30pm and stopped it at 10:30pm, but the really interesting bit is during the show itself (thankfully, watching the results stream in meant I didn’t have to watch the show itself).

It ran every minute and looked at the most recent 1,000 tweets tagged #xfactor.

The real reason for using the X-Factor is that I was aware just how violently the emotions can swing on Twitter when watching, and it also has a very defined timeline of events. The Valence (the happy-sad ratio, red line) had greater peaks and troughs in short times than any sentiment graphing project I’ve tried before.

The differences are far more prominent in the graph than any trends over the whole two and a half hours. Arousal (awake-ness, for want of a better word) was relatively constant, as was dominance (the feeling of control), although both jump up and down (within boundaries) along with Valence.

And who was ever-so unpopular around 8:50pm? This chap:

Next, I think I’ll try Question Time.

Sentiment Analysis and Twitter ‘wormals’

I’ve tried two experiments with the “is Birmingham happy” algorithm in the last few days. As they’re not based on place, it makes more sense to use the popular term ‘sentiment analysis’ to refer to what it’s doing in this instance. As they were both reasonably short uses, it was possible to update the reading often (and use a smaller number of tweets as the sample, giving more variation in the average scores) and give the sentiment graphs a live ‘wormal’ feeling, watching the ratings change over time.
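To see why a smaller sample gives a livelier worm, here is a rough Python sketch (not the tool's own code). It assumes each tweet has already been reduced to a `(time, score)` pair sorted by time; `worm` and the sample numbers are invented for illustration.

```python
def worm(scored_tweets, poll_times, sample_size):
    """For each poll time, average the scores of the most recent `sample_size` tweets seen so far.

    `scored_tweets` is a list of (time, score) pairs sorted by time;
    `sample_size` must be at least 1.
    """
    points = []
    for poll in poll_times:
        seen = [score for when, score in scored_tweets if when <= poll]
        recent = seen[-sample_size:]
        if recent:
            points.append((poll, sum(recent) / len(recent)))
    return points


# Made-up scores at made-up times (seconds since the start).
scored = [(0, 6.0), (10, 2.0), (20, 8.0), (30, 4.0)]
polls = [10, 20, 30]

print(worm(scored, polls, 1))  # one tweet per reading: follows every swing
print(worm(scored, polls, 3))  # three tweets per reading: smoother line
```

With a sample of one, the readings jump between 2.0 and 8.0. With a sample of three, the same tweets produce readings that stay within about 1.3 points of each other.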

First was the Personal Democracy Forum EU conference in Barcelona. For the length of the two-day conference I monitored the hashtag #pdfeu every five minutes:

The highest rating was 64.4% (at 12:45pm on Tuesday), the lowest 49.6% (Monday at 12:14pm during a short power failure). What was interesting to me was that the “arousal” rating seemed to work well as it stayed pretty steady during the power failure (or even leaped up a little) even as the happiness of the hashtag users dived. Post-lunch conference lulls and periods of excitement (the big spikes in day two, at least, corresponded with much applause) were mapped quite accurately.

The overall average was 57.29%. If you would like to explore or graph the data yourself, you can see it all in a Google Spreadsheet here.

Secondly I tried a much shorter and more mainstream application, David Cameron’s speech to the Conservative Party Conference:

The emotion tracking tool graphed here ran every 10 seconds during David Cameron’s speech to the CPC and analysed the last 100 tweets with the hashtag #cpc10 and the word “tories”. I chose two versions as I wasn’t sure that non-Conservative supporters would use the ‘official’ hashtag; I theorised that they would be likely to use the word ‘tories’. As it turned out, I think there was a more even spread of pro and anti political types using the hashtag than I expected, but the ‘tories’ Tweeters were definitely more hostile. (See the data.) There was greater movement across the graph than on any other test I’ve run.

Conclusions? None so far, other than that I think this might be a very useful tool, and that more interesting data is created the more Tweets you have and the more you can afford (server-wise) to poll for results. I’m itching to try it on another big live event with conflicting opinions, that might mean training it on a reality TV event. Roll on the X-Factor.

Is Birmingham Happy?

I’ve been running a very rough scrape of the Birmingham (UK) based interweb for ‘emotional wellbeing’ since April of 2008. Simply put, a script running twice a day read in Tweets, news headlines and (originally) blog posts and compared the words within them to a table I’d drawn up of ‘emotion’ words and fairly arbitrary scores.

It was surprisingly interesting to watch: despite its roughness, the internal consistency let patterns emerge. It broadly followed weather and sports results, with some peaks and dips you could map to specific happenings, or news stories.

graph of emotion scores

It led to a spin off focussing on Tweets from MPs, which I think influenced some of the developments that Tweetminster produced in the next year or so.

It was the patterns that led me to keep putting off improving the algorithm, but recent Twitter API developments meant I had to do some work anyway and that (together with another project, of which more soon) gave me the impetus to give the project an overhaul. And here’s how it works now…

Twitter’s geolocation services are now much improved, so I can specify a point (the centre of Victoria Square in Birmingham) and a radius (10 miles) and get a reasonably accurate dump of Tweet data back—the algorithm calls for the most recent 1000.
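The point-and-radius idea can be illustrated with a great-circle (haversine) distance check. This is only a sketch of the geometry in Python: the search itself is done by Twitter's service, not by code like this. The coordinates given for Victoria Square are approximate, and the tweet records and helper names are assumptions.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean radius of the Earth


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(tweets, centre, radius_miles):
    """Keep only the tweets whose coordinates fall inside the circle."""
    lat0, lon0 = centre
    return [t for t in tweets
            if miles_between(lat0, lon0, t["lat"], t["lon"]) <= radius_miles]


VICTORIA_SQUARE = (52.4797, -1.9027)  # approximate, for illustration only

# Hypothetical geotagged tweets.
geo_tweets = [
    {"text": "Sunny in Moseley", "lat": 52.446, "lon": -1.888},
    {"text": "Sunny in Oxford", "lat": 51.752, "lon": -1.258},
]
print([t["text"] for t in within_radius(geo_tweets, VICTORIA_SQUARE, 10)])
```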

Twitter is now the sole focus of data, in keeping with the ‘conversational psychogeography’ aims of the project (in essence, words used without too much pre-meditation are more interesting than those written purely for publication). It also provides much more, and more reactive, data.

The words contained within these tweets are then compared to data from the University of Florida (The Affective Norms for English Words – PDF link). Within that data set each word covered (there are around a thousand in the set I’m using) is given a score for Valence (sad to happy on a scale 0-10), Arousal (asleep to awake on a scale of 0-10) and Dominance (feeling lack of control to feeling in control on a scale of 0-10). The scores are then collated and a mean calculated. The overall emotional wellbeing score here is calculated as a mean of the three individual means, although the scores are revealed individually on the site.
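The scoring step described above might look something like this in Python (the site itself runs PHP, and this is not its code). The three-word lexicon uses made-up numbers, not real ANEW values, and the function name and tokenising rule are assumptions.

```python
import re

# Made-up (valence, arousal, dominance) scores for illustration; NOT real ANEW values.
LEXICON = {
    "happy": (8.0, 6.0, 7.0),
    "win": (8.0, 7.0, 7.0),
    "rain": (5.0, 2.0, 4.0),
}


def score_tweets(tweets, lexicon):
    """Average valence, arousal and dominance over every lexicon word found in the tweets."""
    found = []
    for text in tweets:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in lexicon:
                found.append(lexicon[word])
    if not found:
        return None  # no emotion words in this sample
    n = len(found)
    valence = sum(f[0] for f in found) / n
    arousal = sum(f[1] for f in found) / n
    dominance = sum(f[2] for f in found) / n
    # The overall figure is the mean of the three individual means.
    overall = (valence + arousal + dominance) / 3
    return {"valence": valence, "arousal": arousal,
            "dominance": dominance, "overall": overall}


print(score_tweets(["So happy about the win", "More rain in Brum"], LEXICON))
```

Keeping the three means in the result alongside the combined figure mirrors the way the site shows its working.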

I’m unsure if combining the results in this way is the best, which is why the site reveals the working — the Twitter feed just goes with one value for ease of understanding and adds a rating adjective too:

// Pick an adjective for the combined score; each later test overrides the earlier ones.
$rating = "fantastic"; // default, so a score of exactly 100 still gets a rating
if ($brumemotion < 90) { $rating = "superb"; }
if ($brumemotion < 80) { $rating = "good"; }
if ($brumemotion < 70) { $rating = "okay"; }
if ($brumemotion < 60) { $rating = "average"; }
if ($brumemotion < 50) { $rating = "quiet"; }
if ($brumemotion < 40) { $rating = "subdued"; }
if ($brumemotion < 30) { $rating = "low"; }
if ($brumemotion < 20) { $rating = "dreadful"; }
if ($brumemotion < 10) { $rating = "awful"; }

The Twitter feed produces results twice a day, and these scores are being saved to visualise more graphically. The website, though, updates every ten seconds (and will self-refresh if you stay on the site) and also displays a word cloud of the currently found ‘emotion words’:

is Brum happy right now?

Thoughts on further development

I’ve been experimenting with more local results (here is a version running on just one Birmingham post code — B13) as well as live graphing. I also have a version that will analyse results for a hashtag—something we may use in conjunction with the Civico player to produce ‘wormals’ (graphs of sentiment) during conferences.

But for now, I’m happy to let the new algorithm bed in—wondering about the amount of data and frequency that will be required to see the most detail—and to see what patterns we can spot.

Feedback welcome. Go see for yourself or follow on Twitter.

Loose Tweets Sink Fleets

WWIII Propaganda: Loose Tweets Sink Fleets

…or otherwise carefully crafted communications anyway.

If you’re newish to Twitter and attempting to communicate with people to achieve anything more than a way to update friends or follow people you like you might want to print this out and stick it close to your monitor.

This stuff doesn’t seem to be explained clearly enough by Twitter or people who are encouraging its use, based on the number of people I see trying to reach an audience and scuppering themselves a bit by making these mistakes:

Think Before You Tweet: People can’t DM you if you’re not following them. A tweet starting with a @username can only be seen by those following both of you. You can’t guess a username, a typo or no space after breaks them.

Once you remember, pass it on.