As a company devoted to finding the meaning in piles of data…
We thought it would be interesting to try to understand how people feel about the presidential candidates through their tweets. In this post, we go through how we used a simple sentiment analysis to interpret how the Greater Richmond Area felt about the two presidential candidates during the debates on the nights of September 26th, October 9th, and October 19th. Sentiment analysis is the extraction of positive or negative attitudes from a snippet of text. The main goal is not to determine which candidate is more positive or negative, but to observe whether the tweets mentioning each candidate express a positive or negative attitude on the part of the writer (regardless of whether the writer supports that candidate).
Preliminary Analysis – Overview of Tweets and Tweeters
In our preliminary analysis, we noticed that Trump was mentioned in roughly twice as many tweets as Clinton, and that the total number of tweets went down with each debate, with the third debate drawing roughly two-thirds the volume of the first. The graphs that follow show a four-hour window: the two-hour debate with an hour of buffer on either side (8 PM to 12 AM, for debates starting at 9 PM). Trump is mentioned in more tweets than Clinton for almost every minute of all three debates.
We can also spot moments of great interest during each debate. Without distinguishing whether these moments were well or poorly received, we looked at the tweets around each spike to determine the topic of discussion. In the first debate, the high-traffic point was a discussion mentioning Trump and “stop and frisk”. The second debate’s high point was the opening question to Trump about a video released the week before involving “locker room talk” (the candidate used the phrase “locker room banter”, but “locker room talk” was the phrase seen most often in the tweets). The final debate hit its peak in tweets per minute when Trump made a string of comments in short succession involving the phrases “hombres”, “very much better”, and “big league”. These phrases were repeated in the tweets along with mentions of the candidate. The most tweeted minute of all three debates was the one centered on “locker room talk”. Besides that minute and “stop and frisk” from the first debate, the only other minute with over 30 tweets came in the second debate when Martha Raddatz, the moderator, said, “let me repeat the question.” For those watching on TV, this comment even drew laughter from the audience.
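The per-minute counts and high-traffic minutes described above could be computed along the following lines. This is only a minimal sketch, not the code used for this post: the function names `tweets_per_minute` and `busiest_minutes`, the `(timestamp, text)` input format, and the naive substring matching on candidate names are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical helper names; the input is assumed to be (timestamp, text) pairs.
CANDIDATES = ("trump", "clinton")


def tweets_per_minute(tweets, start, end):
    """Count tweets per (candidate, minute) for timestamps in [start, end)."""
    counts = Counter()
    for timestamp, text in tweets:
        if not (start <= timestamp < end):
            continue  # outside the four-hour window
        minute = timestamp.replace(second=0, microsecond=0)
        lowered = text.lower()
        for candidate in CANDIDATES:
            # Naive substring match (an assumption); a tweet naming both
            # candidates is counted once for each.
            if candidate in lowered:
                counts[(candidate, minute)] += 1
    return counts


def busiest_minutes(counts, threshold):
    """Return the (candidate, minute) keys with more than `threshold` tweets."""
    return sorted(key for key, n in counts.items() if n > threshold)
```

Calling `busiest_minutes(counts, 30)` would then list the minutes with over 30 tweets, whose text can be read to find the topic under discussion.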
To make sure that the tweets were not all coming from one person expressing one attitude about one candidate, we wanted to see how many unique tweeters we had. In other words, we wanted to know how many people were sending the tweets we were seeing. To measure this, we computed the number of unique tweeters as a percentage of the number of tweets they sent (again by candidate for each debate). The observation here is that not only did the number of tweets decrease with each debate, but the number of people sending them decreased as well.
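As a rough illustration of this measure, the sketch below divides the number of distinct tweeters by the number of tweets and expresses the result as a percentage. The function name `tweeter_percentage` and the `(user, text)` input format are assumptions for the example, not the code behind this post.

```python
def tweeter_percentage(tweets):
    """Return unique tweeters as a percentage of tweets.

    `tweets` is assumed to be a list of (user, text) pairs for one
    candidate in one debate.
    """
    if not tweets:
        return 0.0  # avoid dividing by zero when there are no tweets
    unique_users = {user for user, _ in tweets}
    return 100.0 * len(unique_users) / len(tweets)
```

A value near 100 means almost every tweet came from a different person, while a low value means a few accounts produced most of the tweets.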
Examining Tweet Sentiment
It’s important to reiterate here that sentiment analysis does not provide us with a tweeter’s opinion of a particular candidate. Rather, it strives to reveal the attitude of the writer through the tone of the text.
Our sentiment analysis algorithm divided tweets into positive, negative, or neutral categories; in our analysis, we only examined those that were positive or negative. The counts of positive and negative tweets followed the same overall trend as tweet volume by debate, as can be seen in the numbers of positive and negative tweets below. A trend specific to the sentiment analysis was that in some cases the number of negative tweets was as much as ten times the number of positive tweets. For each candidate, and even when both are mentioned, negative tweets outweigh positive ones. Also, for each candidate, the percentage of negative tweets decreased with each debate. And when both candidates were mentioned in a tweet, there was a higher chance that the tweet was negative than if either candidate was mentioned alone.
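To make the comparison concrete, here is a minimal sketch of how the percentage of negative tweets could be computed for tweets mentioning Trump alone, Clinton alone, or both. It assumes each tweet already carries a label of "positive", "negative", or "neutral" from some classifier; the function names, the input format, and the substring matching are hypothetical and not the algorithm used for this post.

```python
from collections import Counter


def mention_group(text):
    """Classify a tweet as mentioning 'trump', 'clinton', 'both', or None."""
    lowered = text.lower()
    has_trump = "trump" in lowered      # naive substring match (assumption)
    has_clinton = "clinton" in lowered
    if has_trump and has_clinton:
        return "both"
    if has_trump:
        return "trump"
    if has_clinton:
        return "clinton"
    return None


def percent_negative(labeled_tweets):
    """Return {group: percent of negative tweets}, ignoring neutral tweets.

    `labeled_tweets` is assumed to be a list of (text, label) pairs.
    """
    tallies = {}
    for text, label in labeled_tweets:
        group = mention_group(text)
        if group is None or label not in ("positive", "negative"):
            continue  # skip neutral tweets and tweets naming neither candidate
        tallies.setdefault(group, Counter())[label] += 1
    return {
        group: 100.0 * c["negative"] / (c["positive"] + c["negative"])
        for group, c in tallies.items()
    }
```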
With the data, we calculated what we call a Running Sentiment Score (RSS), which assesses the net positivity or negativity of the tweets for each minute of each debate, by candidate. In other words, if there is a negative tweet mentioning a candidate, the RSS drops by 1; likewise, if there is a positive tweet mentioning a candidate, the RSS goes up by 1. This skews the data in the graph below because the majority of the political tweets mentioned Trump. However, the RSS does reflect the popularity of the debates, which decreased for both candidates. The only exception to this trend is Trump’s second debate, where the RSS is slightly lower than in the more popular first debate for the first 90 minutes. The RSS for both candidates in all three debates decreases as the negative tweets mentioning them outweigh the positive tweets.
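The scoring rule can be sketched in a few lines. The version below assumes the RSS is a cumulative total carried forward across the window, which is our reading of "running" rather than something stated explicitly; the function name `running_sentiment_score` and the `(minute_index, label)` input format are likewise hypothetical.

```python
from itertools import accumulate


def running_sentiment_score(labeled_minutes, window_minutes):
    """Return the cumulative sentiment score at each minute of the window.

    `labeled_minutes` is assumed to be a list of (minute_index, label)
    pairs for one candidate, where minute_index counts from the start
    of the window and label is "positive", "negative", or "neutral".
    """
    net = [0] * window_minutes
    for minute, label in labeled_minutes:
        if not 0 <= minute < window_minutes:
            continue  # ignore tweets outside the window
        if label == "positive":
            net[minute] += 1   # a positive tweet raises the score by 1
        elif label == "negative":
            net[minute] -= 1   # a negative tweet lowers the score by 1
        # neutral tweets leave the score unchanged
    # Carry the net score forward minute by minute (assumed cumulative).
    return list(accumulate(net))
```

Because every tweet moves the score by exactly 1, a candidate mentioned in more tweets can swing further in either direction, which is the skew noted above.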