(Credit: wavebreakmedia/Shutterstock)
Drunk tweets, long considered an unfortunate, yet ubiquitous, byproduct of the social media age, have finally been put to good use. With the help of a machine-learning algorithm, researchers from the University of Rochester cross-referenced tweets mentioning alcohol consumption with geo-tagging information to broadly analyze human drinking behavior. They were able to estimate where and when people imbibed, and, to a limited extent, how they behaved under the influence. The experiment is more than a social critique: the algorithm helps researchers spot drinking patterns that could inform public health decisions, and it could be applied to a range of other human behaviors.
To begin with, the researchers sorted through a selection of tweets from both New York City and rural New York with the help of Amazon's Mechanical Turk. Workers on the platform identified tweets related to drinking and picked out keywords, such as "drunk," "vodka" and "get wasted," to train an algorithm. They put each relevant tweet through a series of increasingly stringent questions to home in on tweets that not only referenced the author drinking, but also indicated that the author was doing so while sending the tweet. That way, the researchers could determine whether a person was actually tweeting and drinking, or just sending tweets about drinking.
Once they had built up a dependable database of keywords, the researchers were able to fine-tune their algorithm so it could recognize words and locations that likely indicated people were drinking. To get tweeters' locations, they used only tweets that had been geo-tagged with Twitter's "check-in" feature. They then approximated users' home locations by checking where they were when they sent tweets in the evenings, in addition to tweets containing words like "home" or "bed." This let them know whether users preferred to drink at home or out at bars or restaurants.
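The two steps described above, flagging drinking tweets and guessing each user's home, can be sketched roughly as follows. This is only an illustration, not the researchers' code: a plain keyword match stands in for their trained classifier, and the `Tweet` fields, the 8 p.m. evening cutoff and all function names are assumptions made for the example.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime

# Keywords quoted in the article; the real training set was far larger.
DRINKING_KEYWORDS = ("drunk", "vodka", "get wasted")
HOME_WORDS = {"home", "bed"}
EVENING_START_HOUR = 20  # assumed cutoff; the article does not give one


@dataclass(frozen=True)
class Tweet:
    user: str
    text: str
    sent_at: datetime
    place: str  # hypothetical block-level identifier from a check-in


def mentions_drinking(tweet: Tweet) -> bool:
    """Crude stand-in for the trained classifier: substring keyword match."""
    text = tweet.text.lower()
    return any(keyword in text for keyword in DRINKING_KEYWORDS)


def is_home_signal(tweet: Tweet) -> bool:
    """True for evening tweets or tweets containing words like 'home' or 'bed'."""
    words = set(re.findall(r"[a-z]+", tweet.text.lower()))
    return tweet.sent_at.hour >= EVENING_START_HOUR or bool(words & HOME_WORDS)


def estimate_homes(tweets):
    """Map each user to the place where most of their home-signal tweets were sent."""
    counts = defaultdict(Counter)
    for tweet in tweets:
        if is_home_signal(tweet):
            counts[tweet.user][tweet.place] += 1
    return {user: places.most_common(1)[0][0] for user, places in counts.items()}


def drinking_locations(tweets):
    """Label each drinking tweet 'home' or 'out' relative to the estimated home."""
    homes = estimate_homes(tweets)
    labels = []
    for tweet in tweets:
        if mentions_drinking(tweet):
            at_home = homes.get(tweet.user) == tweet.place
            labels.append((tweet.user, "home" if at_home else "out"))
    return labels
```

A keyword match like this cannot tell "I am drinking right now" from "I hate vodka," which is the distinction the researchers' series of increasingly stringent questions was meant to capture.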
Heat maps show where people were drinking and tweeting. In New York City, the drinking hot spots are Lower Manhattan and its surroundings. In Monroe County, they are downtown Rochester (center) and the city of Brockport (left). (Credit: Hossain, et al)
Combining the drinking tweets with the location data gave the researchers a broad idea of how many people in a given area or at a given time were drinking. Not surprisingly, they found a correlation between the number of bars and how much people drank: more bars meant more drunk people. New York City saw a stronger correlation between the two, suggesting that people in the big city really do like to drink more. Rather paradoxically, their data also showed that city dwellers were more likely to tweet about drinking at home as well.
Their work builds on previous studies that attempted to tie people's tweets to specific activities and locations. By using the check-in feature, they say, their system is much more accurate than others and can reliably place people within a block of their actual location. They published their work on the pre-print server arXiv.
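To make the bars-versus-drinking comparison concrete, here is a minimal sketch of a correlation between per-neighborhood bar counts and drinking-tweet counts. The article does not say which statistic the researchers used, so the Pearson coefficient is an assumption here, and the numbers in the example are invented purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical counts for five neighborhoods (not data from the study).
bars_per_area = [2, 5, 9, 14, 20]
drinking_tweets_per_area = [11, 19, 42, 55, 90]

r = pearson(bars_per_area, drinking_tweets_per_area)
print(f"correlation between bars and drinking tweets: {r:.2f}")
```

A value near 1 would mean that areas with more bars also produce more drinking tweets. On its own, it says nothing about which one drives the other.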
Twitter Offers a Rich Dataset
Knowing just how many people are drunk at a given time may be entertaining, but the researchers say this experiment was meant to prove that an algorithm could track a broad range of behaviors using widely available data. Other actions that people record on Twitter, such as eating, shopping or exercising, are possible targets for machine-learning algorithms to comb through and analyze. Potentially, anything with an associated hashtag or keyword could be tracked.
There are a few obvious drawbacks to using Twitter as a source of behavioral information, however. As the researchers note, Twitter users tend to be younger and to include a larger share of minorities than the United States population as a whole, meaning that any dataset drawn from the service will disproportionately represent those groups. Also, certain behaviors that may be of interest to public health officials, such as drug use, are much less likely to show up in the researchers' data, because the algorithm relies on self-reporting.
Still, the model shows promise as a means of gathering candid information about our habits. Twitter is a notoriously unfiltered environment that can offer a close (too close, some would say) look at our thoughts and actions. Combined with the range of tools available for data analysis, from geo-tagging to demographic breakdowns, Twitter may be a social scientist's best friend.