
Discover Magazine Logo


How Data Mining Visualizes Story Lines in the Twittersphere

A vast new dataset reveals the popularity of words and phrases on Twitter and how they change over time.

Credit: PopTika/Shutterstock

One curious side effect of the work to digitize books and historical texts is the ability to search the resulting databases for words and see when they first appeared and how their frequency of use has changed over time.

The Google Books n-gram corpus is a good example (an n-gram is a sequence of n words). Enter a word or phrase and it’ll show you its relative usage frequency since 1800. For example, the word “Frankenstein” first appeared in the late 1810s and has grown in popularity ever since.
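
To make the idea of an n-gram and its relative frequency concrete, here is a minimal Python sketch. It is a toy illustration only, not how the Google Books corpus is actually built: the function names `ngrams` and `relative_frequency` and the sample sentence are invented for this example, and words are split naively on whitespace.

```python
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive words in text, lowercased."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def relative_frequency(text, phrase):
    """Share of all n-grams in text that match phrase (n = words in phrase)."""
    target = " ".join(phrase.lower().split())
    n = len(target.split())
    counts = Counter(ngrams(text, n))
    total = sum(counts.values())
    return counts[target] / total if total else 0.0


# Hypothetical sample text, chosen only to show a repeated 2-gram.
sample = "the monster spoke and the monster wept"
print(ngrams(sample, 2))
print(relative_frequency(sample, "the monster"))
```

In this sample, "the monster" accounts for two of the six 2-grams. A corpus-scale tool would presumably run the same kind of count separately for each year of publication, which is what lets it plot a phrase's usage over time.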

By contrast, the phrase “Harry Potter” appeared in the late 1990s and quickly gained popularity, but never overtook Frankenstein, or Dracula for that matter. That may be something of a surprise given the unprecedented global popularity of J.K. Rowling’s teenage wizard.

And therein lies the problem with a database founded on an old-fashioned, paper-based technology. The Google Books corpus records “Harry Potter” once for each ...
