Challenges in Crowdsourcing

One of our small Twitter-based projects captured each score during a game, along with the time it happened, and deduced how it was scored (when there is more than one way to score). We felt that crowdsourcing this task could reduce human error and decrease the number of employees assigned to it (after all, there are not yet sensors that detect which team crosses a goal line, interpret the officials, and link the clock to the scoreboard).


We began testing our theory by tracking college football games and found that Twitter reported the score accurately, but our algorithm detected it about a minute after the touchdown. (In football, a touchdown is followed by a try that can add one extra point, or two, depending on the team, that team's approach, and the situation. Most tweets did not mention the touchdown until after the extra point was made.)

Because kicked extra points are near certainties (success rates in the high 90s, in percent), a touchdown is commonly treated as worth 7 points. Scores are therefore reported as 7-0 and 14-0 rather than 6-0, 7-0, 13-0, and then 14-0. This is second nature to football fans, and because football fans are the ones tweeting, they express touchdowns as the full 7 points. Being correct matters more to them than staying consistent with the general trend of what others are tweeting. When the subject is the score, the pull toward accuracy is strong. Together, these habits led the tweeting crowd to wait almost a minute, until the extra point was made, before tweeting the score.
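Because the crowd reports a touchdown as a single 7-point jump, the size of a score change is also the main clue to how the points were scored. The sketch below is a hypothetical helper, not the project's actual code: `PLAYS`, `explain_delta`, and the `max_plays` limit are illustrative assumptions, and the point values are the standard ones for common American-football plays (rare plays are left out on purpose).

```python
from itertools import combinations_with_replacement

# Standard point values for common scoring plays (assumption: rare plays,
# such as defensive returns on a try, are deliberately ignored).
PLAYS = {
    "touchdown + extra point": 7,
    "touchdown + two-point conversion": 8,
    "touchdown (try failed)": 6,
    "field goal": 3,
    "safety": 2,
}


def explain_delta(delta: int, max_plays: int = 2) -> list[tuple[str, ...]]:
    """Return every combination of up to `max_plays` scoring plays whose
    points add up to `delta` (the change in one team's score)."""
    names = sorted(PLAYS)
    matches = []
    for k in range(1, max_plays + 1):
        for combo in combinations_with_replacement(names, k):
            if sum(PLAYS[name] for name in combo) == delta:
                matches.append(combo)
    return matches
```

A jump of 7 has a single explanation (a touchdown plus the extra point), while `explain_delta(6)` returns both a touchdown with a failed try and two field goals, which is where the timing of the tweets has to break the tie.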

We also noticed that tweet volume dropped for less meaningful scores. The score could still be extracted when it changed from 7-0 to 7-7, but fewer people tweeted it than when the score became 7-0. That score meant less: the game was merely tied again, one team lost its lead without the other taking one, and it was early. Tweeters like to tweet something that means something.

Gameplay extracted from tweets

Other Issues in Crowdsourcing Information

The delay had an easy fix: subtract a minute from the detection time to estimate when the score actually happened. Beyond the fix itself, the exercise taught us that most people like to tweet confidently, accurately, and emphatically about sports scores.
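The one-minute correction can be sketched as follows, assuming score strings such as "7-0" have already been extracted from tweet text. The `min_count` threshold, the function name `estimate_score_times`, and the timestamps in the example are all hypothetical; only the roughly one-minute lag comes from our observations.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed reporting lag: tweets trailed touchdowns by about a minute
# because fans waited for the extra point before tweeting the score.
REPORTING_LAG = timedelta(minutes=1)


def estimate_score_times(mentions, min_count=3):
    """Estimate when each score actually happened.

    mentions: iterable of (timestamp, score) pairs, where `score` is a string
    such as "7-0" already extracted from a tweet. A score counts as reported
    once `min_count` tweets mention it; the estimate is that moment minus
    REPORTING_LAG.
    """
    counts = Counter()
    estimates = {}
    for ts, score in sorted(mentions):
        counts[score] += 1
        if counts[score] == min_count:
            estimates[score] = ts - REPORTING_LAG
    return estimates


# Made-up example: three tweets report 7-0, a single tweet reports 7-7.
kickoff = datetime(2016, 9, 3, 12, 0)
mentions = [
    (kickoff + timedelta(minutes=9, seconds=s), "7-0") for s in (5, 12, 30)
] + [(kickoff + timedelta(minutes=20), "7-7")]
print(estimate_score_times(mentions))
```

With `min_count=3`, the lone 7-7 tweet never crosses the threshold. This mirrors the lower volume we saw for less meaningful scores, so a fixed threshold may need tuning to the game situation.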

Other subjects show other behavioral trends, and each must be understood before interpreting the results or the emotions of the masses. Across our experiments, we found other issues that can produce errors or misrepresentations in crowdsourced information. Beyond Twitter-specific dilemmas, such as processing volume or reconciling the geographic tags of tweets with those of users, there are elements of crowd behavior that can contribute to errors or even fraudulent representations.

"That last retweet, some of you wouldn't understand."

Retweets (increase in volume for unknown reasons)

Retweets can add useful volume behind an accurate, factual message. However, people also retweet sarcastically, to send the opposite message; to indicate that something is not factual; or simply because the source is popular. This is unlikely for a sports score but frequent for political issues. Because a retweet quotes the original verbatim, nothing distinguishes it from the message it copies, so there is no indication of whether it was retweeted for accuracy, for sarcasm, to flag a falsehood, or for no reason at all.
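Since the text of a retweet cannot reveal why it was shared, one modest step is to at least separate raw volume into originals and retweets, so that a spike driven by a single popular source becomes visible. This is a minimal sketch assuming each tweet is a dict with hypothetical `"id"` and `"retweet_of"` keys; it says nothing about intent.

```python
from collections import Counter


def volume_breakdown(tweets):
    """Split raw tweet volume into original tweets and retweets.

    tweets: list of dicts with hypothetical keys "id" and "retweet_of"
    ("retweet_of" is None for an original tweet, else the id it copies).
    """
    retweeted = Counter(
        t["retweet_of"] for t in tweets if t["retweet_of"] is not None
    )
    n_retweets = sum(retweeted.values())
    return {
        "raw_volume": len(tweets),
        "originals": len(tweets) - n_retweets,
        "retweet_share": n_retweets / len(tweets) if tweets else 0.0,
        # (source id, retweet count) of the most-copied tweet, if any
        "most_retweeted": retweeted.most_common(1),
    }
```

A high `retweet_share` concentrated on one source is a cue to inspect that source by hand rather than read the volume as independent agreement.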

"who's going to win? retweet so more people can vote"

Marketing (increase in volume from a specific sample only)

In an attempt to attract viewers and clicks, many marketing campaigns link to Twitter. General marketing tweets are not difficult to remove, so we set them aside here; however, some campaign tweets can carry meaningful information. For example, during a political event, a reputable newspaper let survey takers tweet how they would vote in the upcoming election. Tweeted surveys also appear before sporting events. Each such tweet includes a link to the survey, so, much like retweets, these tweets spread quickly and produce a jump in volume. That jump can be problematic: the campaign might not reach everyone, and it could come from a non-neutral party (when an opinion is sought) or from a geographically bounded source. As a result, the number and sentiment of these tweets might misrepresent the overall opinion of users.
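One way to keep a survey-driven jump from swamping the overall picture is to flag links shared by an unusually large fraction of tweets and report that volume separately. The sketch below assumes raw tweet text as input; the 20% `max_share` cutoff and the simple URL regex are illustrative choices, and because links on Twitter are often shortened, a real pipeline might need to compare expanded URLs instead.

```python
import re
from collections import Counter

# Rough URL pattern; good enough for a sketch, not a full URL grammar.
URL_RE = re.compile(r"https?://\S+")


def flag_link_bursts(texts, max_share=0.2):
    """Return the URLs that appear in more than `max_share` of the tweets."""
    counts = Counter()
    for text in texts:
        # Count each URL at most once per tweet.
        for url in set(URL_RE.findall(text)):
            counts[url] += 1
    total = len(texts)
    return {url for url, n in counts.items() if total and n / total > max_share}
```

Tweets carrying a flagged link can then be analyzed as their own sample, with the caveat that the campaign's reach, not the crowd, determined who is in it.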

"Can you recommend anyone for this job?"

Uniformity of Tweets (not using the appropriate semantics)

Another issue is that people's tweeting styles differ. One example is the character limit: in many cases people use several tweets to send one message. This is fairly easy to resolve on its own, but when a large volume of tweets is imported, retweets can magnify the problem. A message can be stated and then qualified or completely reversed by a follow-up tweet, which may take time to arrive. Another example is emojis, which can convey an accurate emotional tone or be used sarcastically, depending on the user and the situation. Understanding the sentiment of specific emojis, in combination with other emojis and with the surrounding words, can become extremely important. Hashtags, too, can categorize comments or even express a point of view, but their meaning is not understood uniformly across the population. A final example is language: tweets come in many languages, and some mix several. Strictly speaking, each must be understood to capture every message.
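The multi-tweet problem can be approached by stitching consecutive tweets from the same user into one message before analyzing them, so a follow-up that reverses the first tweet is read together with it. This sketch groups by time proximity only; the tuple layout and the two-minute `gap` are assumptions, and Twitter's own reply metadata, where available, would be a more reliable signal.

```python
from datetime import datetime, timedelta


def stitch_messages(tweets, gap=timedelta(minutes=2)):
    """Merge consecutive tweets from the same user into single messages.

    tweets: list of (user, timestamp, text) tuples.
    A tweet is appended to the user's most recent message when it was posted
    within `gap` of that message's last tweet; otherwise it starts a new one.
    """
    messages = []
    latest = {}  # user -> index of that user's most recent message
    for user, ts, text in sorted(tweets, key=lambda t: t[1]):
        idx = latest.get(user)
        if idx is not None and ts - messages[idx]["end"] <= gap:
            messages[idx]["text"] += " " + text
            messages[idx]["end"] = ts
        else:
            messages.append({"user": user, "start": ts, "end": ts, "text": text})
            latest[user] = len(messages) - 1
    return messages
```

Sentiment can then be scored on each stitched message rather than on its fragments.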

"When you typo in your hashtag"

Typos (unknowingly not using appropriate semantics)

Without quality control, the typo is a big issue. A typo can turn the sentence in the writer's mind into a different sentence in the reader's mind. The result can be meaningless gibberish or a perfectly comprehensible statement of the exact opposite. Again, retweets can magnify the issue. Typos overlap with the uniformity of meaning of emojis and words; they are listed separately because in many cases users themselves are unaware that what they wrote differs from what they meant.
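Misspelled hashtags are one kind of typo that can be partly recovered, by mapping each hashtag to the closest one being tracked. This minimal sketch uses the standard library's `difflib.get_close_matches`; the `KNOWN_TAGS` list and the 0.8 cutoff are hypothetical choices for illustration.

```python
import difflib

# Hypothetical list of hashtags being tracked for an event.
KNOWN_TAGS = ["#gameday", "#touchdown", "#election2016"]


def normalize_hashtag(tag, known=KNOWN_TAGS, cutoff=0.8):
    """Map a possibly misspelled hashtag to the closest tracked hashtag.

    Returns None when no tracked hashtag is similar enough
    (difflib similarity ratio below `cutoff`).
    """
    tag = tag.lower()
    if tag in known:
        return tag
    matches = difflib.get_close_matches(tag, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Here "#touchdwon" maps back to "#touchdown". String similarity cannot catch a typo that produces a different valid word with the opposite meaning, which is the harder case described above.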

"so y'all gone RT the tweet with the typo... ok."

Twitter Population (samples might not represent population)

The next issue in crowdsourcing through tweets is that the people who tweet are not representative of the population as a whole. The people who actively tweet do not represent all Twitter account holders, and account holders do not represent the general population in age, race, geography, or social status. Even if everyone had the means and the desire to tweet, not everyone allows their account to share geographic indicators. For these reasons, the opinions and ideas expressed through tweets might not be indicative of the entire population.
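A standard survey technique for this kind of skew is post-stratification: estimate the result within each demographic group, then weight each group by its share of the target population instead of its share of the tweets. The sketch below is that generic technique, not our pipeline, and it assumes each tweet can be assigned to a group, which is often impossible because most accounts do not disclose age or location. All numbers in the example are made up.

```python
def poststratify(sample_counts, sample_positive, population_share):
    """Reweight a positive-sentiment rate so each group counts in proportion
    to its share of the target population (post-stratification).

    sample_counts:    {group: number of tweets from that group}
    sample_positive:  {group: how many of those tweets were positive}
    population_share: {group: the group's population share, summing to 1}
    Returns (raw_rate, reweighted_rate).
    """
    raw_rate = sum(sample_positive.values()) / sum(sample_counts.values())
    reweighted_rate = sum(
        share * sample_positive[group] / sample_counts[group]
        for group, share in population_share.items()
    )
    return raw_rate, reweighted_rate


# Made-up numbers: younger users dominate the sample but not the population.
raw, adjusted = poststratify(
    sample_counts={"18-29": 80, "30+": 20},
    sample_positive={"18-29": 60, "30+": 5},
    population_share={"18-29": 0.2, "30+": 0.8},
)
```

In this example the raw rate is 0.65 but the reweighted rate is about 0.35. Reweighting only corrects for who is in the sample; it cannot fix the case where people who tweet differ in opinion from people in the same group who do not.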

ambiguous tweet


Mining Twitter and other social media outlets allows factual information to be extracted efficiently, and it can also be used to interpret sentiment about a topic. However, users must be aware of many factors in order to distill the imported wealth of data into an accurate, digestible message. Through experience, we have been able to eliminate a few of these issues, and we continue to reduce others with each project. In general, these issues do not distort the representations, but understanding them can further help portray specific moments of importance during significant events.