Site Link: http://election.twitter.com/
Mr. Sharp said the index drew on a database of thousands of words to determine whether these Twitter messages were for or against a candidate. Because these messages are shared by millions of people on Twitter, the software also takes colloquialisms into account.
Mr. Sharp noted that “bad,” for instance, could mean bad, or it could be slang for good. He said that Topsy could use the rest of the sentence to tell which meaning was intended and whether the word was positive or negative.
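The article does not say how Topsy resolves words like “bad.” As a rough illustration only, the Python sketch below scores a message against a small word list and lets an ambiguous word take the polarity of the rest of the sentence. The lexicon, the `AMBIGUOUS` set and the context rule are hypothetical stand-ins, not Topsy's actual method.

```python
import re

# Hypothetical miniature lexicon (word -> polarity). The real index reportedly
# held thousands of words; these entries are illustrative only.
LEXICON = {
    "good": 1, "great": 1, "love": 1, "win": 1, "strong": 1,
    "bad": -1, "awful": -1, "hate": -1, "lose": -1, "weak": -1,
}

# Hypothetical set of words whose polarity depends on context (e.g. slang "bad").
AMBIGUOUS = {"bad"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment(text: str) -> int:
    """Return +1 (positive), -1 (negative) or 0 (neutral) for one message.

    Unambiguous words are summed first. Each ambiguous word then takes the
    polarity suggested by that context; if the context is neutral, it falls
    back to its dictionary polarity.
    """
    tokens = tokenize(text)
    context = sum(LEXICON.get(t, 0) for t in tokens if t not in AMBIGUOUS)
    total = context
    for t in tokens:
        if t in AMBIGUOUS:
            if context > 0:
                total += 1          # read as slang praise
            elif context < 0:
                total -= 1          # read in its ordinary negative sense
            else:
                total += LEXICON[t]  # no cues: use the dictionary meaning
    return (total > 0) - (total < 0)
```

Under these assumptions, “That debate was bad, I love it” scores +1, while “The economy is bad” scores -1.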
Topsy uses Twitter’s high-volume fire hose of data to look at every tweet in the world and establish a neutral baseline. Separately, it looks at all the tweets about Barack Obama and Mitt Romney, runs a sentiment analysis on them, and compares this analysis to the baseline. It looks at three days’ worth of tweets each day, weighting newer tweets more heavily than older ones. It then returns a numerical score for each candidate based on how tweets about that candidate compare with all tweets as a whole. A completely neutral score would be 50; anything above that is a net positive, and anything below it is a net negative.
So, for example, if Obama has a score of 38, that would mean that tweets about him are more positive than 38 percent of all other messages on Twitter.
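The scoring described above can be sketched as a percentile rank, assuming each tweet already carries a polarity between -1 and 1. The day weights (3, 2, 1), the `Tweet` record and the tie-handling rule are hypothetical choices; the article says only that newer tweets count more and that 50 is neutral.

```python
from dataclasses import dataclass

# Hypothetical recency weights for the three-day window (key = days ago).
# The article says only that newer tweets are weighted more heavily.
DAY_WEIGHTS = {0: 3.0, 1: 2.0, 2: 1.0}


@dataclass
class Tweet:
    days_ago: int     # 0 = today, 1 = yesterday, 2 = two days ago
    sentiment: float  # per-message polarity in [-1, 1] (assumed precomputed)


def weighted_mean(tweets: list[Tweet]) -> float:
    """Recency-weighted average sentiment; tweets older than the window are ignored."""
    pairs = [(DAY_WEIGHTS[t.days_ago], t.sentiment)
             for t in tweets if t.days_ago in DAY_WEIGHTS]
    total_weight = sum(w for w, _ in pairs)
    if total_weight == 0:
        raise ValueError("no tweets inside the three-day window")
    return sum(w * s for w, s in pairs) / total_weight


def index_score(candidate: list[Tweet], baseline: list[Tweet]) -> float:
    """Score in [0, 100] where 50 is neutral.

    The score is the weighted percentage of baseline messages whose sentiment
    is below the candidate's weighted mean, with ties counted as half.
    """
    target = weighted_mean(candidate)
    below = equal = total = 0.0
    for t in baseline:
        w = DAY_WEIGHTS.get(t.days_ago)
        if w is None:
            continue  # outside the three-day window
        total += w
        if t.sentiment < target:
            below += w
        elif t.sentiment == target:
            equal += w
    if total == 0:
        raise ValueError("empty baseline")
    return 100.0 * (below + 0.5 * equal) / total
```

Ranking against the baseline is what makes 50 neutral in this sketch: a candidate whose average sentiment sits in the middle of all messages lands at 50, and a score of 38 means the candidate's average is more positive than 38 percent of the weighted baseline.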
So Twitter began working with polling groups and Topsy to dig into the political data buried in the din of constant online chatter; it wanted a better way to measure the sentiment voters were expressing in real time. Topsy would look at every tweet sent in the world each day and create a three-day average baseline. It built an algorithm to determine which tweets skewed positive and which skewed negative. Together, Twitter and Topsy built a keyword engine, and through repeated, ongoing spot checks by human observers, they found that the algorithm gauged voter sentiment accurately 90 percent of the time.
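The 90 percent figure comes from checking the algorithm's output against human judgments. A minimal sketch of such a spot check follows, assuming labels of -1, 0 or +1 and using a callable `human_label` as a hypothetical stand-in for a human reviewer.

```python
import random
from typing import Callable, Sequence


def spot_check(messages: Sequence[str],
               classify: Callable[[str], int],
               human_label: Callable[[str], int],
               sample_size: int,
               seed: int = 0) -> float:
    """Share of a random sample on which the algorithm agrees with a human reviewer.

    `human_label` is a hypothetical stand-in for a person reading each tweet;
    labels are assumed to be -1 (negative), 0 (neutral) or +1 (positive).
    """
    if not messages or sample_size < 1:
        raise ValueError("nothing to check")
    rng = random.Random(seed)
    # Draw without replacement; never ask for more messages than exist.
    sample = rng.sample(list(messages), min(sample_size, len(messages)))
    agreements = sum(classify(m) == human_label(m) for m in sample)
    return agreements / len(sample)
```

Repeating the check on fresh samples (different `seed` values) gives an ongoing estimate rather than a one-off figure.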
And that was just the beginning of a refinement process. Every time they checked the algorithm's output against human curators and found differences, they were able to improve the algorithm. What Twitter eventually built was the Twindex. Unlike a poll, it didn’t rely on asking questions, and it could be generated in real time. And when Twitter compared the Twindex for Obama with Gallup’s approval rating, the resulting graph was remarkable.
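The article does not describe how the disagreements were fed back into the algorithm. One common way to do this, shown here as a swapped-in technique rather than Twitter's or Topsy's method, is a perceptron-style update: whenever the classifier disagrees with a curator's label, every word in that message is nudged toward the curator's label. The learning rate, the number of rounds and the +1/-1 label convention are assumptions.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def classify(weights: dict[str, float], text: str) -> int:
    """Sum the word weights and return the sign (+1, -1 or 0)."""
    score = sum(weights.get(t, 0.0) for t in tokenize(text))
    return (score > 0) - (score < 0)


def refine(weights: dict[str, float],
           curated: list[tuple[str, int]],
           rate: float = 0.5,
           rounds: int = 10) -> dict[str, float]:
    """Perceptron-style refinement against human-curated labels (+1 or -1).

    Each round re-runs the classifier on every curated message; on each
    disagreement, the weights of that message's words move toward the
    curator's label. Stops early once a full round has no disagreements.
    """
    weights = defaultdict(float, weights)
    for _ in range(rounds):
        mistakes = 0
        for text, human in curated:
            if classify(weights, text) != human:
                mistakes += 1
                for t in tokenize(text):
                    weights[t] += rate * human
        if mistakes == 0:
            break
    return dict(weights)
```

Starting from weights that know only “bad” = -1, a curated example such as “the comeback was sick” (+1) teaches this sketch that “sick” can be positive. With so little data, common words such as “the” also pick up weight; a real system would need far more curated examples to avoid that.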