Blog

  • Inklings to Answers; Data Guides

    Challenges in Crowdsourcing

One of our small Twitter-based projects involved capturing a score during a game along with the time it occurred, and deducing how the team scored (when there is more than one way to do so). We felt that crowdsourcing this task could reduce human error as well as the number of employees dedicated to it (after all, there are no sensors yet that detect which team crosses the goal line, interpret the officials, and link the clock to the scoreboard).

    Introduction

We began testing our theory by tracking college football games and noticed that Twitter could accurately recall the score, but our algorithm would register it about a minute after the touchdown. (In a football game, a touchdown is followed by an extra-point attempt, which can add one point, or two, depending on the team, its approach, and the situation. The majority of tweets did not acknowledge the touchdown until after the extra point was made.)

Because kicked extra points are near certainties (success rates in the high 90s), the common impression is that a touchdown is worth 7 points. For this reason, scores are reported as 7-0 and 14-0 rather than 6-0, 7-0, 13-0, and then 14-0. For the football fan, this is second nature, and because football fans are the ones tweeting, they express touchdowns as the full 7 points. More important than being consistent with the conventions of what one is tweeting about is being correct: when tweeting something like a score, the urge to be right is strong, and it forces tweeters to be accurate. Combined, these pressures lead the tweeting crowd to wait almost a minute, until the extra point is made, before tweeting the score.

We also noticed that volume dropped for less meaningful scores. While the score could still be extracted when it changed from 7-0 to 7-7, far fewer people tweeted it than when the score became 7-0. The tying score simply meant less (the game was even again; one team had lost the lead, the other was not yet ahead, and it was still early). Tweeters, it seems, prefer to tweet something that means something.

    Gameplay extracted from tweets

    Other Issues in Crowdsourcing Information

For the delay issue, there was an easy fix: subtract a minute from the tweet time to recover when the score actually happened. More importantly, the exercise taught us that the majority of people like to tweet confidently, accurately, and emphatically when it comes to sports scores.

Other behavioral trends appear with other subject matters, but they must be understood before interpreting the results or emotions of the masses. Over a number of experiments, we were able to dig up other issues that can produce errors, or misrepresentations, in crowdsourced information. Beyond processing volume or reconciling geographical-tag discrepancies between tweets and users (Twitter-specific dilemmas), there are elements of crowd behavior that can contribute to errors or fraudulent representations.

    "That last retweet, some of you wouldn't understand."

    Retweets (increase in volume for unknown reason)

Retweets can be a great source of volume reinforcing an accurate, factual message. However, retweets can also be sarcastic, sending the opposite message; they can flag the original as not factual; and they can happen simply because the source is popular. The likelihood of this for a sports score is slim, but for political issues it happens frequently. Because retweets quote the original verbatim, there is no textual difference between the original message and its copies, and therefore no indication of whether it was retweeted for accuracy, sarcasm, fallacy, or no reason at all.
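At minimum, retweets can be separated from original tweets before counting. A minimal sketch, assuming tweets arrive as dicts with a text field and carry the classic "RT @user:" prefix; both the field name and the heuristic are assumptions about the ingestion format, not a specific Twitter API contract.

```python
import re

# Classic-style retweets copy the original text behind an "RT @user:"
# prefix. This is a heuristic, not a guarantee of provenance.
RT_PATTERN = re.compile(r"^RT @\w+:?", re.IGNORECASE)

def split_retweets(tweets):
    """Partition tweet dicts into originals and retweets."""
    originals, retweets = [], []
    for tweet in tweets:
        (retweets if RT_PATTERN.match(tweet["text"]) else originals).append(tweet)
    return originals, retweets

originals, retweets = split_retweets([
    {"text": "Army 7, Navy 0"},
    {"text": "RT @fan: Army 7, Navy 0"},
])
print(len(originals), len(retweets))  # 1 1
```

Separating the two populations does not tell you why a message was copied, but it does let you weigh originals and echoes differently.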

    "who's going to win? retweet so more people can vote"

    Marketing (increase in volume from only specific sample)

In an attempt to attract viewers and clicks, many marketing campaigns link to Twitter. It is not difficult to filter out generic marketing tweets (so we will ignore those here); however, some campaigns carry meaningful information. For example, during a political event, a reputable newspaper let survey takers tweet how they would vote in the upcoming election. Tweeted surveys happen pregame for sporting events as well. Each tweet includes a link to the survey, so, much like a retweet, these tweets spread quickly, producing a jump in volume. That jump can be problematic because the campaign might not reach everyone; in fact, it could be generated by a non-neutral party (if an opinion is sought) or by a geographically bounded source. For these reasons, the number and sentiment of such tweets might misrepresent the overall opinion of users.

    "Can you recommend anyone for this job?"

    Uniformity of Tweets (not using the appropriate semantics)

Another issue is that people's tweeting styles differ. A specific example is the character limit: in many cases people use multiple tweets to send one message. While this can be resolved fairly easily, when a large volume of tweets is imported the problem is magnified by retweets, since a message can be posted and then altered or completely reversed by a follow-up tweet, which takes time to untangle. Another example is tweeting through emojis, which can carry an accurate emotional tone or a sarcastic one, depending on the user and the situation; understanding the sentiment of specific emojis in combination with each other and with the surrounding words becomes extremely important. Hashtags, too, can categorize comments or express a point of view, but their meaning is not uniform across the population. A final example is that tweets come in different languages, and some mix several; technically, each must be understood to capture every message.
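One small slice of the uniformity problem is easy to handle: hashtag casing. A minimal sketch that extracts hashtags and lowercases them so variants of the same tag collapse to one key (the regex and the example tags are illustrative):

```python
import re

HASHTAG = re.compile(r"#(\w+)")

def normalized_hashtags(text):
    """Extract hashtags and lowercase them so #GoArmyBeatNavy and
    #goarmybeatnavy collapse to the same key."""
    return {tag.lower() for tag in HASHTAG.findall(text)}

print(normalized_hashtags("Great start! #GoArmyBeatNavy #ArmyNavy"))
# {'goarmybeatnavy', 'armynavy'}
```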

    "When you typo in your hashtag"

    Typos (unknowingly not using appropriate semantics)

A big issue, when there is no quality control, is the typo. A typo can turn the sentence in the writer's mind into a different sentence in the reader's mind. The difference can be simple, meaningless gibberish or a comprehensible expression of the exact opposite idea. Again, retweets magnify the issue. Typos overlap with the uniformity problems of emojis and words above; we list them separately because in many cases the users themselves are unaware that what they said is inconsistent with what they meant.
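Misspelled hashtags can sometimes be rescued by fuzzy matching against the tags we expect to see. A hedged sketch using Python's standard difflib; the known-tag list and the 0.8 cutoff are illustrative values to tune, not settings from our pipeline:

```python
from difflib import get_close_matches

# Tags we expect to see; an illustrative list, not a real lexicon.
KNOWN_TAGS = ["goarmybeatnavy", "gonavybeatarmy", "armynavy"]

def correct_hashtag(tag, cutoff=0.8):
    """Map a possibly misspelled hashtag onto the closest known tag,
    or None if nothing is similar enough; tune the cutoff against
    labeled examples."""
    matches = get_close_matches(tag.lower(), KNOWN_TAGS, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(correct_hashtag("goarmybeatnvay"))  # 'goarmybeatnavy'
```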

    "so y'all gone RT the tweet with the typo... ok."

    Twitter Population (samples might not represent population)

The next issue in crowdsourcing through tweets is that the people who tweet are not representative of the population as a whole. The people who tweet do not even represent Twitter as a whole, and the people who have accounts do not represent the population in terms of age, race, geography, or social status. Even if everyone had the means and desire to tweet, not all users allow their accounts to carry geographical indicators. For these reasons, the opinions and ideas expressed through tweets might not be indicative of the entire population.

    ambiguous tweet

    Conclusion

Mining Twitter and other social media outlets allows for the efficient extraction of factual information, and it can also be used to develop an interpretation of sentiment around topics. However, there are many factors users must be aware of in order to distill the imported wealth of data into an accurate, digestible message. Through experience, we have been able to eliminate a few of these issues, and we continue to reduce others as we move forward with each project. In general, these issues do not badly distort the results, but understanding them helps portray specific moments of importance during significant events.

  • Predicting Game Outcomes through Tweet Volume

Tweets can tell us a lot about how the population feels during a period of time or an event. Along with replying or tweeting with hashtags (or other relevant information) to contribute to a topic, a user can browse for ideas in specific areas or simply observe what's trending among the popular subjects and hashtags on Twitter. Taking advantage of Twitter's accessibility, we decided to look at tweets pertaining to the Army-Navy football game on December 10th of this year.

    Following the Army Navy Game Flow in Tweets

While a lot can be learned by reading tweets or analyzing their sentiment, with a few simple assumptions about those who tweet we can extract a great deal from a simple distribution over specific samples of tweeters. First, we start with a base of tweets containing handles and phrases related to the Army-Navy game, including, but not limited to, #GoArmyBeatNavy and #GoNavyBeatArmy. Second, we determine which team each tweet supports: if a tweet carries a hashtag like #GoArmyBeatNavy, it supports Army (and likewise for Navy). If a tweet includes nothing team-specific, or contains hashtags or phrases specific to both teams, it is considered "not team supporting", or general.
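A minimal sketch of that classification rule, assuming hashtags have already been extracted and lowercased; the tag sets beyond the two hashtags named above are assumed examples:

```python
# Illustrative tag sets; the post names only #GoArmyBeatNavy and
# #GoNavyBeatArmy, the rest are assumed examples.
ARMY_TAGS = {"goarmybeatnavy", "goarmy", "beatnavy"}
NAVY_TAGS = {"gonavybeatarmy", "gonavy", "beatarmy"}

def team_supported(tags):
    """Classify a tweet's lowercased hashtags as 'army', 'navy', or
    'general' (nothing team-specific, or both teams at once)."""
    army, navy = bool(tags & ARMY_TAGS), bool(tags & NAVY_TAGS)
    if army and not navy:
        return "army"
    if navy and not army:
        return "navy"
    return "general"

print(team_supported({"goarmybeatnavy"}))    # 'army'
print(team_supported({"goarmy", "gonavy"}))  # 'general'
```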

By monitoring tweets per minute over the four-hour window from 2 PM (when TV coverage of the game started) to around 6 PM (the game ended at about 6:15 PM), we can see that the total volume of tweets with game indications reflects game events. The only information pulled from these tweets is whether they contain the Army-Navy phrases mentioned earlier, plus their timestamps; the tweets are not being read.
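Counting matching tweets per minute is straightforward; here is a small pandas sketch, under the assumption that matched tweets land in a DataFrame with a timestamp column (the ingestion format is an assumption, not our actual pipeline):

```python
import pandas as pd

# Toy rows; in practice each row is one tweet that matched the
# Army-Navy handles and phrases above.
tweets = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-12-10 14:00:30",
        "2016-12-10 14:00:45",
        "2016-12-10 14:01:10",
    ]),
})

# Bucket tweets into one-minute bins and count rows per bin.
per_minute = tweets.resample("1min", on="timestamp").size()
print(per_minute)
```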

    Tweets per Minute with Game Events in Real Time

    tweets a minute of the army navy football game

We see that the volume of tweets increases during significant events in the game. Broken out by team, the picture becomes clearer.

    tweets in support of army per minute of the army navy football game
    tweets in support of navy per minute of the army navy football game

Comparing the teams, we see considerably more volume from Army supporters while Army is ahead, and the same pattern holds for Navy in the short stretch that Navy is ahead. We see peaks for Army at their 7-0 lead, their 14-0 score, their 21-17 lead, and their official victory. For Navy, we see peaks at their first score and at their only lead, 17-14. Both sets of supporters tweet at the beginning of the game and throughout. The main takeaway is that support is stronger when winning seems more likely, which gives us a larger peak at 14-0 than at 7-0, and an enormous peak just before victory becomes certain. There is also information we can pull from general game tweets (those not in support of a specific team).

    General Game Tweets and Team Specific Tweets

    non team specific tweets per minute of the army navy football game
    total tweets per minute for the army navy football game

We see green (Army supporters) and blue (Navy supporters) only after key moments in the game. The general game tweets increase in density during exciting moments that do not directly affect who will win (or do not result in a score). The largest peak for general tweets, which support neither team, corresponds to President-elect Donald Trump's visit to the announcing booth at halftime.

Following where we see blue and green, the largest volume of tweets comes not during every success, but during the successes that give a team an advantage that would lead to victory. We assume that supporters do not change (flip) sides during the game, and that supporters express themselves more frequently when victory seems imminent, as the peaks mentioned earlier and the surge at the end of the game suggest. Simply put, we assume that the majority of people supporting a team through social media do so only when they are confident that their support will not be denounced. For this reason, we determined that from the tweet volumes of each team's supporters we can produce a Win Confidence for each team throughout the game.

    Win Confidence of Tweeting Supporters

The resulting "Win Confidence" of tweeting fans is comparable to the "Win Probabilities" found on sports websites and can be produced in real time over the course of the game. The key difference is that those "Win Probabilities" come from game scores, times, and situations, whereas the "Win Confidence" is created without watching the game, knowing any game details, or even reading the tweets being counted.
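A minimal sketch of one way to turn supporter volume into a confidence number: take each team's share of the team-supporting tweet volume in each minute and smooth it with a moving average. The exact ratio and the five-minute window are illustrative choices, not the precise formula behind our graphs:

```python
import pandas as pd

def win_confidence(army_per_min, navy_per_min, window=5):
    """Army's share of team-supporting tweet volume each minute,
    smoothed with a rolling mean (Navy's confidence is 1 minus this)."""
    total = army_per_min + navy_per_min
    raw = army_per_min / total.where(total > 0)  # NaN out empty minutes
    return raw.rolling(window, min_periods=1).mean()

# Toy per-minute supporter counts, not real game data.
army = pd.Series([12, 30, 25, 8, 5])
navy = pd.Series([10, 5, 6, 9, 20])
print(win_confidence(army, navy).round(2).tolist())
# [0.55, 0.7, 0.74, 0.67, 0.58]
```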

Before the game, the Win Confidence (shown below in yellow, with a moving average in blue when Navy's fans are more confident and in green when Army's fans are more confident) bounces back and forth; then, because Army scores twice before Navy scores at all, Army's fans reach 70% confidence. When Navy scores for the first time, even though they do not immediately take the lead, their supporters gain better than 50% confidence in a win. After scoring twice more without a response, their confidence grows to almost 90%. But when Army takes the lead and closes out the game, the Army Win Confidence approaches 100%. So as not to violate ESPN's intellectual property rights, we created our own Win Probability, the thin red line, using a simple linear regression model (it would have been cool to overlay their Win Probability on our graph; the correlation was pretty significant!).
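The post does not spell out the features behind that linear model, so the sketch below is one plausible setup: score differential and minutes remaining as inputs, with a handful of made-up training rows standing in for real historical outcomes. Because a linear model can stray outside [0, 1], the prediction is clipped; a logistic regression would bound it naturally.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Features: [score differential, minutes remaining]. These rows are
# made-up stand-ins, purely to show the mechanics; they are not data
# from this post.
X = np.array([
    [  7, 45.0],   # up 7 with 45 minutes left -> won
    [ -3, 20.0],   # down 3 with 20 minutes left -> lost
    [ 14,  5.0],   # up 14 with 5 minutes left -> won
    [  0, 60.0],   # tied at kickoff -> won
    [-10, 10.0],   # down 10 with 10 minutes left -> lost
    [  3,  2.0],   # up 3 with 2 minutes left -> won
])
y = np.array([1, 0, 1, 1, 0, 1])  # 1 = eventual win

model = LinearRegression().fit(X, y)
prob = model.predict(np.array([[4, 15.0]]))[0]  # up 4, 15 minutes left
print(float(np.clip(prob, 0.0, 1.0)))
```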

    army navy football game fan win confidence with win prediction

By observing the tweeting fan bases in real time throughout a game, you can see that their confidence reflects the current game status and is a good predictor of the probable game outcome. We find this result exciting because it suggests that individual people, through their varied experiences, build internal "machine learning" algorithms that establish confidence in their team; when aggregated, the sum of their knowledge can compete with expert opinion and computer-based models.

  • Tweets Mentioning Candidates by Debate

    Sentiment Analysis of Regional Tweets About Presidential Candidates During the Debates

    As a company devoted to finding the meaning in piles of data…

We thought it would be interesting to try to understand how people feel about the presidential candidates through tweets. In this post, we walk through how we used a simple sentiment analysis to interpret how the Greater Richmond Area felt about the two presidential candidates during the debates on the nights of September 26th, October 9th, and October 19th. Sentiment analysis is the extraction of positive or negative attitudes from a snippet of text. The goal is not to determine which candidate is more positive or negative, but to observe whether the tweets mentioning a candidate are positive or negative in the attitude of the writer (regardless of whether the writer supports that candidate).
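For readers who want to experiment, a freely available analyzer such as NLTK's VADER produces the same positive/negative/neutral bucketing; this is a stand-in for illustration, not our production classifier:

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon fetch
analyzer = SentimentIntensityAnalyzer()

def label(text, threshold=0.05):
    """Bucket text by VADER's compound score; +/-0.05 is the
    conventional neutral band."""
    score = analyzer.polarity_scores(text)["compound"]
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

print(label("What a great answer!"))         # positive
print(label("That answer was a disaster."))  # negative
```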

    Preliminary Analysis – Overview of Tweets and Tweeters

In our preliminary analysis, we noticed that Trump was mentioned in roughly twice as many tweets as Clinton, and that the total number of tweets fell with each debate, the third drawing roughly two-thirds the volume of the first. The graphs that follow show a four-hour window around each two-hour debate, with an hour of buffer on either side (8 PM to 12 AM for debates starting at 9 PM). We see that Trump is mentioned in more tweets than Clinton for almost every minute of all three debates.

    Tweets Mentioning Candidates by Debate

We can also spot moments of great interest during the debates. Without distinguishing whether these moments were well or ill received, we were able to examine each incident through the tweets and determine the topic of discussion. In the first debate, the high-traffic point was a discussion mentioning Trump and "stop and frisk". The second debate's high point was the opening question to Trump about a video released the week before involving "locker room talk" (the candidate said "locker room banter", but "locker room talk" was the phrase most seen in the tweets). The third debate hit its peak tweets per minute when Trump delivered a string of comments in short succession involving the phrases "hombres", "very much better", and "big league", which were repeated in the tweets alongside mentions of the candidate. The most-tweeted minute of all three debates was the one centered on "locker room talk". In addition to "stop and frisk" from the first debate, the only other minute with over 30 tweets came in the second debate when Martha Raddatz, the moderator, said, "let me repeat the question", a comment that drew laughter from the audience for those watching on TV.

    Tweet-able Debate Moments

To make sure the tweets were not all coming from one person expressing one attitude about one candidate, we wanted to see how many unique tweeters we had; in other words, how many people were sending the tweets we were seeing. To measure this, we computed the percentage of distinct tweeters per tweet: the number of unique people divided by the number of tweets they sent (again, by candidate for each debate). The observation here is that not only did the number of tweets decrease with each debate, but so did the number of people sending them.
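The measure itself is a simple ratio. A pandas sketch, with assumed column names, of unique tweeters over tweets per candidate per debate:

```python
import pandas as pd

# Toy rows with assumed column names: one row per tweet.
tweets = pd.DataFrame({
    "debate":    [1, 1, 1, 1, 2, 2],
    "candidate": ["trump", "trump", "trump", "clinton", "trump", "trump"],
    "user_id":   ["a", "a", "b", "c", "a", "d"],
})

# Unique authors divided by tweet count, per candidate per debate.
pct_distinct = (
    tweets.groupby(["debate", "candidate"])["user_id"]
          .agg(lambda s: 100 * s.nunique() / s.size)
)
print(pct_distinct.round(1))
```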

    Percent of Distinct Tweeters by Candidate by Debate

    Examining tweet sentiment

    It’s important to reiterate here that sentiment analysis does not provide us with a tweeter’s opinion of a particular candidate. Rather it strives to reveal the attitude of the writer through the tone of the text.

Our sentiment analysis algorithm divided tweets into positive, negative, and neutral categories; in our analysis, we examined only the positive and negative ones. Their counts followed the same per-debate trends as the overall tweet volumes, as the numbers of positive and negative tweets below show. A trend specific to the sentiment analysis was that in some cases the number of negative tweets was as much as ten times the number of positive tweets. For each candidate, and even when both candidates are mentioned together, negative tweets outweigh positive ones. Also, for each candidate, the percentage of negative tweets decreased with each debate. And when both candidates were mentioned in a tweet, the tweet was more likely to be negative than when either candidate was mentioned alone.

    Number of Positive and Negative Tweets by Debate by Candidate
    Percent Positive and Negative Tweets by Debate by Candidate

With the data, we calculated what we call a Running Sentiment Score (RSS), which assesses the net positivity or negativity of the tweets mentioning each candidate for each minute of each debate. In other words, a negative tweet mentioning a candidate drops that candidate's RSS by 1; likewise, a positive tweet raises it by 1. This skews the graph below because the majority of the political tweets mentioned Trump. The RSS does, however, track the declining popularity of the debates for both candidates. The one departure from this trend is Trump's second debate, where the RSS runs slightly lower than in the more popular first debate for the first 90 minutes. The RSS for both candidates in all three debates decreases as the negative tweets mentioning them outweigh the positive ones.
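Because the RSS is just a signed running count, it is nearly a one-liner once tweets are labeled; a minimal sketch:

```python
import pandas as pd

def running_sentiment_score(labels):
    """+1 per positive tweet, -1 per negative, accumulated in time
    order; neutral tweets contribute 0."""
    steps = pd.Series(labels).map({"positive": 1, "negative": -1}).fillna(0)
    return steps.cumsum()

labels = ["negative", "negative", "positive", "negative", "neutral"]
print(running_sentiment_score(labels).tolist())
# [-1.0, -2.0, -1.0, -2.0, -2.0]
```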

    Running Sentiment Score of Each Debate for Each Candidate