After my talk at Oxford Geek Night I was happy to receive a couple of suggestions for getting better results out of the algorithm. One was to remove retweets from the search, which makes sense; as we all know from many Twitter bios, “a RT does not imply endorsement”. That was easy to implement, as the basic Twitter search API returns retweets ‘old-style’, with “RT” at the head of the text.
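Filtering those old-style retweets comes down to dropping any tweet whose text begins with “RT”. A minimal sketch in Python (the function name and sample tweets are my own illustration, not the project’s actual code):

```python
def remove_retweets(tweets):
    """Drop 'old-style' retweets, which start with "RT" at the head of the text."""
    return [t for t in tweets if not t.lstrip().startswith("RT")]

# Hypothetical sample of search results.
tweets = [
    "RT @someone: what a lovely day in Oxford",
    "what a lovely day in Oxford",
]
print(remove_retweets(tweets))  # only the second tweet survives
```

This only catches the “RT” convention the basic search API preserves; quoted or ‘new-style’ retweets would need checking via the API’s retweet metadata instead.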
The other was more complex, so I’m going to quote Owen who emailed me directly:
“This morning I thought up an analogy. Suppose you have weather readings for the last 100 days. For each day you have temperature (T), humidity (H) and mm of precipitation (P). What you’re doing is multiplying these all together, presumably because you want to get one number out. Unfortunately this number is meaningless. If you wanted to combine these quantities in some way you should really be thinking about what meaning you’re attaching to the number you get out. I’m ignoring here the fact that you multiplied them all together, when in all likelihood adding them would make more sense. I suggest it would be more meaningful to keep track of them separately, and plot three graphs instead of one. Indeed, this is what is done with weather data.
You spoke about wanting to get a measure of how much spread a set of data has. What you want is the variance, or something like it. The average (more properly called the mean) of a set of numbers is obtained by adding them all up and dividing by the total number. This tells you something very useful, but it loses all information about how spread out the data was. The variance captures that. It’s a bit tricky to calculate. I’ll try to explain it here, but you can always google for more details. Suppose you have numbers a1 up to a100. The average is M = (a1 + a2 + … + a100) / 100. To calculate the variance we have to calculate some intermediate numbers. First, you have to calculate the average. Then you have to calculate the average of each number squared: Z = (a1^2 + … + a100^2) / 100. Now the variance is V = Z – M^2. I know that doesn’t seem to make much sense. There is a way of calculating the variance which makes it clearer why it’s any use, but it’s a bit harder to actually implement.
You might want to take the square root of the variance to get the standard deviation. This is measured on the same scale as the original numbers you had, so it makes a bit more sense to use that instead.”
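Owen’s recipe translates directly into a few lines of Python. This is a sketch of the formulas as quoted above (mean M, mean of squares Z, variance V = Z − M², standard deviation √V), not the project’s actual code; the function names and sample numbers are my own:

```python
import math

def mean(values):
    """M: add them all up and divide by how many there are."""
    return sum(values) / len(values)

def variance(values):
    """V = Z - M^2, where Z is the mean of the squared values."""
    m = mean(values)
    z = mean([v * v for v in values])
    return z - m * m

def std_dev(values):
    """Square root of the variance, on the same scale as the data."""
    return math.sqrt(variance(values))

scores = [1.0, 2.0, 3.0, 4.0]
print(mean(scores))      # 2.5
print(variance(scores))  # 1.25
print(std_dev(scores))   # ~1.118
```

Note this is the population variance (dividing by n, as in the email); Python’s `statistics.pvariance` and `statistics.pstdev` compute the same quantities, and a numerically safer route for large datasets is the sum-of-squared-deviations form.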
So, @IsOxfordHappy and the location-sensitive page now do both of those. I’ve removed the ‘word scale’ for the time being, until I can see roughly what the numbers are. Thanks, everyone, for your suggestions.
After moving down to Oxford I updated my Birmingham Emotions conversational psychogeography project. That’s now quite simple, as I have built a ‘happy monitor’ that can be centred anywhere. I’m not as happy with the results as I was, however: whether due to the increasing volume of the tweets it analyses or something else, the rating doesn’t move around much. That was the problem I posed in a very quick talk at Oxford Geek Night 27. Here are the slides from the presentation; I think the audio was being recorded, and I’ll add it if I get hold of it.
I’ve already had a number of suggestions about improving the equation or analysis; if they’re codeable by me I shall try. If not, I will have to ask for help…
On a side note, the whole idea of conversational psychogeography came to me when I was thinking of putting an emotional wellbeing indicator, in the form of a light, at the top of Birmingham’s Rotunda (see how it’s still unfinished right at the top). That was back in 2008, but it seems that London has finally installed something a little similar. Drat.
You can get twice daily Oxford updates on Twitter.