Predicting Game Outcomes through Tweet Volume
Tweets can tell us a lot about how people feel during a particular period or event. Besides replying or tweeting with hashtags to contribute to a topic, a user can browse specific areas for ideas or simply see what’s trending through Twitter’s popular subjects and hashtags. Taking advantage of Twitter’s accessibility, we decided to look at tweets about the Army-Navy football game on December 10th of this year.
Following the Army-Navy Game Flow in Tweets
While a lot can be learned by reading tweets or analyzing their sentiment, by making a few simple assumptions about the people who tweet, we can extract a lot from a simple count of tweets from specific groups of tweeters. First, we start with a base set of tweets that mention relevant handles or Army-Navy game phrases, including, but not limited to, #GoArmyBeatNavy and #GoNavyBeatArmy. Second, we determine which team each tweet supports: if a tweet contains a hashtag like #GoArmyBeatNavy, it supports Army (likewise for Navy). If a tweet contains nothing team-specific, or contains hashtags or phrases for both teams, it is considered “not team supporting,” or general.
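The labeling step above can be sketched as simple phrase matching. This is a minimal sketch, not the authors’ code: only the two hashtags come from the article, and the phrase lists `ARMY_PHRASES` and `NAVY_PHRASES` and the function `classify_tweet` are hypothetical names; a real run would fill the lists with every handle and phrase that was collected.

```python
# Hypothetical phrase lists: only these two hashtags come from the article.
ARMY_PHRASES = ("#goarmybeatnavy",)
NAVY_PHRASES = ("#gonavybeatarmy",)


def classify_tweet(text: str) -> str:
    """Label a tweet 'army', 'navy', or 'general' by phrase matching only."""
    lowered = text.lower()  # hashtags are matched case-insensitively
    army = any(p in lowered for p in ARMY_PHRASES)
    navy = any(p in lowered for p in NAVY_PHRASES)
    if army and not navy:
        return "army"
    if navy and not army:
        return "navy"
    # Nothing team-specific, or phrases for both teams: count as general.
    return "general"
```

For example, `classify_tweet("Let's go! #GoArmyBeatNavy")` returns `"army"`, while a tweet carrying both hashtags falls into `"general"`, matching the rule described above.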
By monitoring tweets per minute over the roughly four-hour window from 2 PM (when TV coverage of the game began) to around 6 PM (the game ended at about 6:15 PM), we can see that the total volume of game-related tweets reflects events in the game. The only information pulled from these tweets is whether they contain one of the Army-Navy phrases mentioned earlier, and their timestamp; the tweets themselves are never read.
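Because only timestamps are used, the per-minute series is just a histogram with one-minute bins. The sketch below is one way to build it, assuming naive `datetime` timestamps in a single time zone and a window `start` that falls on a whole minute; `tweets_per_minute` is a hypothetical helper, not part of any library.

```python
from collections import Counter
from datetime import datetime, timedelta


def tweets_per_minute(timestamps, start: datetime, end: datetime):
    """Count tweets in one-minute bins from start (inclusive) to end (exclusive).

    Assumes `start` is aligned to a whole minute and all timestamps share its
    time zone. Returns a list of (minute, count) pairs, including empty minutes.
    """
    counts = Counter(
        ts.replace(second=0, microsecond=0)  # truncate to the minute
        for ts in timestamps
        if start <= ts < end
    )
    n_minutes = int((end - start).total_seconds() // 60)
    minutes = (start + timedelta(minutes=i) for i in range(n_minutes))
    return [(m, counts[m]) for m in minutes]
```

Emitting zero-count minutes explicitly keeps the series evenly spaced, which makes later smoothing and plotting simpler.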
Tweets per Minute with Game Events in Real Time
Tweet volume rises around significant events in the game. The picture becomes even clearer when the tweets are split by team.
Comparing the teams, we see considerably more volume from Army supporters when Army is ahead, and the same holds for Navy supporters during the short time frame in which Navy is ahead. We see peaks for Army at their 7-0 lead, their 14-0 lead, their 21-17 lead, and their official victory. For Navy, we see peaks at their first score and at their only lead, 17-14. Supporters of both teams tweet at the beginning of the game and throughout. The main takeaway is that support is stronger when winning seems more likely, which gives us a larger peak at 14-0 than at 7-0, and an enormous peak right before victory is certain. We can also pull information from general game tweets (those not supporting a specific team).
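The peaks above were identified on the graph. If you wanted to flag them automatically, one simple approach (an assumption on our part, not the method used here) is to mark minutes that are local maxima and exceed a multiple of the median per-minute volume. The function `flag_peaks` and its `factor` parameter are hypothetical.

```python
from statistics import median


def flag_peaks(counts, factor=2.0):
    """Return indices of local maxima that are at least `factor` x the median count."""
    if not counts:
        return []
    threshold = factor * median(counts)  # baseline volume for "unusually busy"
    peaks = []
    for i in range(1, len(counts) - 1):
        # Strictly above the previous minute, not below the next one.
        is_local_max = counts[i] > counts[i - 1] and counts[i] >= counts[i + 1]
        if is_local_max and counts[i] >= threshold:
            peaks.append(i)
    return peaks
```

Using the median rather than the mean as the baseline keeps a few huge spikes, like the one at the end of the game, from raising the threshold so much that smaller scoring peaks are missed.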
General Game Tweets and Team Specific Tweets
Green (Army supporters) and blue (Navy supporters) appear only after key moments in the game. General game tweets increase in density during exciting moments that do not directly affect who will win (or do not result in a score). The largest peak in general tweets, which do not support either team, coincides with President-elect Donald Trump’s visit to the announcing booth at halftime.
Looking at where blue and green appear, the largest volumes of tweets do not follow every success, only the successes that give a team an advantage likely to lead to victory. We assume that supporters do not change (flip) their support during the game, and that they express themselves more frequently when victory is imminent, as the peaks mentioned above and the surge at the end of the game suggest. Simply put, we assume that most people who support a team on social media only do so when they are confident their support will not be discredited. For this reason, we determined that the tweet volumes of each team’s supporters can be used to produce a Win Confidence for each team throughout the game.
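The article does not give the exact formula for Win Confidence. A natural reading, and the assumption behind this sketch, is each team’s share of the team-supporting tweets in a time bin, with general tweets ignored. The function `win_confidence` and the choice to treat a bin with no team tweets as a toss-up (0.5) are both hypothetical.

```python
def win_confidence(army_count: int, navy_count: int) -> float:
    """Army's share of team-supporting tweets in one time bin.

    Assumption: Win Confidence = army / (army + navy); Navy's value is 1 minus
    this. General tweets are excluded, and an empty bin is treated as 0.5.
    """
    total = army_count + navy_count
    if total == 0:
        return 0.5  # no team-specific tweets: no evidence either way
    return army_count / total
```

For instance, a minute with 7 Army tweets and 3 Navy tweets gives Army a Win Confidence of 0.7 and Navy 0.3.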
Win Confidence of Tweeting Supporters
The resulting “Win Confidence” of tweeting fans is comparable to the “Win Probability” shown on sports websites, and it can be computed in real time over the course of the game. The key difference is that “Win Probability” is derived from game scores, time, and situations, whereas “Win Confidence” is computed without watching the game, knowing any game details, or even reading the tweets being counted.
Before the game, the Win Confidence (represented below in yellow, with a moving average shown in blue when Navy’s fans are more confident and in green when Army’s fans are more confident) bounces back and forth. Then, because Army scores twice without Navy scoring, Army’s fans reach 70% confidence. When Navy scores for the first time, even though it does not immediately take the lead, its supporters’ confidence in a win rises above 50%. After Navy scores twice more without a response, that confidence grows to almost 90%. But when Army takes the lead and closes out the game, the Army Win Confidence approaches 100%. So as not to violate ESPN’s intellectual property rights, we did not overlay their Win Probability on our graph, although it would have been cool to see (the correlation was pretty significant!). Instead, we created our own Win Probability using a simple linear regression model, shown as the thin red line.
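The smoothed blue/green curve is described only as a moving average. The sketch below assumes a trailing simple moving average over a hypothetical `window` of minutes, applied to the per-minute Win Confidence values; `moving_average` is an illustrative helper, and the window size used for the graph is not stated in the article.

```python
from collections import deque


def moving_average(values, window=5):
    """Trailing simple moving average.

    The first few points average over however many values exist so far,
    so the output has the same length as the input.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)
    total = 0.0
    smoothed = []
    for v in values:
        if len(recent) == window:
            total -= recent[0]  # this value is about to fall out of the window
        recent.append(v)
        total += v
        smoothed.append(total / len(recent))
    return smoothed
```

Under the coloring described above, a smoothed Army value above 0.5 would be drawn green and one below 0.5 blue. A trailing (rather than centered) window only uses past minutes, so it can be updated in real time as new tweets arrive.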
By observing the tweeting fan bases in real time throughout a game, you can see that their confidence reflects the current state of the game and is a good predictor of its probable outcome. We find this result exciting because it shows us that individual people, through their various experiences, build internal “machine learning” algorithms to establish confidence in their team; when aggregated, their collective knowledge can compete with expert opinion and computer-based models.