Wikipedia Predicts How Movies Will Perform at the Box Office

D-brief | By Lisa Raffensperger | August 23, 2013 1:16 AM




For the hundreds of millions of dollars spent on producing movies every year, predicting how they'll perform at the box office is still more art than science. The best metric, at the moment, is nothing more than counting the number of theaters carrying the film on opening weekend. But a new method that takes the pulse of the general public via Wikipedia activity can predict how a movie will fare up to a month before it hits theaters.

From Monitoring to Prediction

The Web has produced lots of interesting real-time analytics, everything from the current moods of Twitter users to the locations of every plane in the sky above you. But what's less well understood is how to use so-called "big data" for prediction. Some research teams have explored using Twitter or Google keyword volumes to predict stock market changes; Google Flu Trends uses similar data to predict where a viral outbreak might occur. For movies, though, these methods haven't worked as well. The closest researchers have come is using Twitter activity the night before a film release to gauge its subsequent earnings. The method was highly accurate for a small sample of movies studied. But more than 24 hours' advance warning would be more useful for the film moguls, marketers and critics who rely on these trends. For instance, a film exec might decide to change the movie's rollout strategy, or a marketer might decide some last-minute advertising is in order, based on predictions a few weeks before opening day.

A Better Model for Movies

For this retrospective study, researchers focused on films released in the United States in 2010. They found a total of 312 films with Wikipedia pages, and using the freely available data from Wikimedia Toolserver, they extracted three main data points:

  • number of pageviews from the time of the entry's creation until the film's release date

  • number of editors who modified the article

  • number of edits made to the article

For each film they also obtained first-weekend box office earnings via IMDb. By building a mathematical model using these factors alongside the number of opening theaters, the researchers were able to predict box office earnings with much greater precision than by using theater count alone. Their model matched up with real-world data with 77 percent accuracy, versus the 57 percent accuracy of theater count alone. What's more, these more accurate predictions could be made as much as a month before release date, the researchers report in PLOS One.

The method has some limitations. It is, for instance, much better at predicting the performance of blockbusters than B-movies, because a higher volume of data leads to more accurate predictions. But because everything has a Wikipedia page these days, the authors say use of Wikipedia activity to predict future outcomes could be applied to a wide range of products, from a new television series, to a new variety of soda, to whatever freaky flavor of potato chips they come up with next. So if you need to know whether the next superhero smash is going to live up to expectations, look no further than the Wikipedia buzz. Just, no spoilers, please.

Image by Visionstyler Press via Flickr
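For readers curious what "building a mathematical model" from these inputs might look like, here is a minimal sketch. It is not the researchers' actual method: the article does not name the model type, so ordinary least-squares regression on log-scaled counts is an assumption, the data are synthetic, and every function name is hypothetical. The sketch compares a theater-count-only fit against one that also uses the three Wikipedia measures.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def r_squared(X, y, coef):
    """Share of the variance in y explained by the fitted linear model."""
    A = np.column_stack([np.ones(len(X)), X])
    resid = y - A @ coef
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

def make_synthetic_films(n=312, seed=0):
    """Made-up data for illustration only; NOT the study's data.

    Columns: log theaters, log pageviews, log editors, log edits.
    The target stands in for log opening-weekend earnings.
    """
    rng = np.random.default_rng(seed)
    theaters = rng.normal(7.0, 1.0, n)
    pageviews = rng.normal(10.0, 1.5, n)
    editors = rng.normal(4.0, 1.0, n)
    edits = rng.normal(5.0, 1.0, n)
    noise = rng.normal(0.0, 1.0, n)
    # Arbitrary illustrative weights, chosen by hand.
    earnings = (1.0 * theaters + 0.8 * pageviews
                + 0.3 * editors + 0.2 * edits + noise)
    X = np.column_stack([theaters, pageviews, editors, edits])
    return X, earnings

if __name__ == "__main__":
    X, y = make_synthetic_films()
    baseline = X[:, :1]  # theater count only
    r2_base = r_squared(baseline, y, fit_linear(baseline, y))
    r2_full = r_squared(X, y, fit_linear(X, y))
    print(f"theaters only:        R^2 = {r2_base:.2f}")
    print(f"theaters + Wikipedia: R^2 = {r2_full:.2f}")
```

The comparison here is in-sample, so adding predictors can never lower the fit. A real evaluation would score predictions on films held out of the fitting step. Whether the "accuracy" figures quoted above correspond to this kind of explained-variance measure is also an assumption.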
