For the hundreds of millions of dollars spent on producing movies every year, predicting how they'll perform at the box office is still more art than science. The best metric, at the moment, is nothing more than counting the number of theaters carrying the film on opening weekend. But a new method that takes the pulse of the general public via Wikipedia activity can predict how a movie will fare up to a month before it hits theaters.
From Monitoring to Prediction
The Web has produced lots of interesting real-time analytics, including the current moods of Twitter users. But what's less well understood is how to use so-called "big data" for prediction. Some research teams have explored using Twitter or Google keyword volumes to predict stock market changes; Google Flu Trends uses similar data to predict where a viral outbreak might occur. For movies, though, these methods haven't worked as well. The closest researchers have come is using Twitter activity the night before a film's release to gauge its subsequent earnings. The method was highly accurate for the small sample of movies studied. But more than 24 hours of advance warning would be more useful to the film moguls, marketers and critics who rely on these trends. For instance, a film exec might decide to change the movie's rollout strategy, or a marketer might decide some last-minute advertising is in order, based on predictions made a few weeks before opening day.
A Better Model for Movies
For this retrospective study, researchers focused on films released in the United States in 2010. They found a total of 312 films with Wikipedia pages and, using the freely available data from Wikimedia Toolserver, extracted three main data points:
number of pageviews from the time of the entry's creation until the film's release date
number of editors who modified the article
number of edits made to the article
For each film they also obtained first-weekend box office earnings via IMDb. By building a mathematical model using these factors alongside the number of opening theaters, the researchers were able to predict box office earnings with much greater precision than by using theater count alone. Their model matched real-world data with 77 percent accuracy, versus 57 percent for theater count alone. What's more, these more accurate predictions could be made as much as a month before the release date, the researchers report in PLOS One. The method has some limitations: it is, for instance, much better at predicting the performance of blockbusters than B-movies, because a higher volume of data leads to more accurate predictions. But because everything has a Wikipedia page these days, the authors say the use of Wikipedia activity to predict future outcomes could be applied to a wide range of products, from a new television series, to a new variety of soda, to whatever freaky flavor of potato chips they come up with next. So if you need to know whether the next superhero smash is going to live up to expectations, look no further than the Wikipedia buzz. Just, no spoilers, please.
Image by Visionstyler Press via Flickr
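For readers curious what "building a mathematical model" from these signals might look like, here is a minimal sketch. The article does not say what form the researchers' model took, so this assumes an ordinary least-squares regression on log-scaled counts purely for illustration. The data are synthetic and randomly generated, not the study's, and every name and coefficient below is hypothetical. The sketch compares how well opening weekend earnings are explained by theater count alone versus theater count plus the three Wikipedia signals.

```python
import random

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(A[idx][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for idx in range(k):
            if idx != col:
                factor = A[idx][col] / A[col][col]
                A[idx] = [a - factor * b for a, b in zip(A[idx], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(coef, X):
    """Apply fitted coefficients (intercept first) to each feature row."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in X]

def r_squared(y, y_hat):
    """Share of the variance in y that the predictions account for."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Synthetic films (an assumption for illustration, NOT the study's data).
# A hidden "buzz" level drives every signal; all values are on a log scale.
random.seed(0)
theaters_only, all_features, log_revenue = [], [], []
for _ in range(200):
    buzz = random.gauss(0, 1)
    log_theaters = 7 + 0.5 * buzz + random.gauss(0, 0.5)
    log_views = 10 + 1.0 * buzz + random.gauss(0, 0.5)
    log_editors = 4 + 0.6 * buzz + random.gauss(0, 0.5)
    log_edits = 5 + 0.7 * buzz + random.gauss(0, 0.5)
    theaters_only.append([log_theaters])
    all_features.append([log_theaters, log_views, log_editors, log_edits])
    # Hypothetical ground truth: revenue depends on theaters and pageviews.
    log_revenue.append(2 + 0.8 * log_theaters + 0.6 * log_views
                       + random.gauss(0, 0.5))

coef_theaters = fit_linear(theaters_only, log_revenue)
coef_all = fit_linear(all_features, log_revenue)
r2_theaters = r_squared(log_revenue, predict(coef_theaters, theaters_only))
r2_all = r_squared(log_revenue, predict(coef_all, all_features))

print(f"R^2, theater count alone:       {r2_theaters:.2f}")
print(f"R^2, theaters + Wikipedia data: {r2_all:.2f}")
```

Because the synthetic earnings were constructed to depend on pageviews, the richer model fits better here by design. The sketch shows the mechanics of the comparison and says nothing about the study's actual figures.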