Do-It-Yourself Linguistics

Inkfish | By Elizabeth Preston | December 21, 2010 2:19 PM


You may have heard about a massive new database that Google has provided to academia. Happily, they've also shared their new toy with us armchair nerds. 

Over the past several years, Google and its university partners have been scanning every book they can get their hands on into the searchable Google Books resource. Despite the lawsuits, they've collected over 15 million books. Meanwhile, a team at Harvard led by researchers Jean-Baptiste Michel and Erez Lieberman Aiden has been digging through this immense trove of data and pulling out all kinds of gems.

For their first study, published last week in Science, the authors pared down the data set to only the most reliable books, excluding, for example, those with blurry scans or uncertain dates of publication. The resulting data set was 5 million books. By searching the database for words and phrases (n-grams) and counting how often each one appears over time, the researchers were able to track patterns and changes in the English language. You can read their whole study, and see all their graphs, at the link above (with a free registration).
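For the curious, the basic counting idea can be sketched in a few lines of Python. This is only a toy illustration, not the researchers' actual method or Google's tool: the function names (`ngrams`, `yearly_frequency`) and the tiny two-year "library" are invented here, and a real corpus would need far more careful cleanup of the text.

```python
from collections import Counter

def ngrams(text, n):
    """Return the list of n-word sequences (n-grams) in text, lowercased."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def yearly_frequency(corpus, phrase):
    """Map each year to the share of that year's n-grams that equal phrase.

    corpus is a dict of {year: [book text, ...]}; n is taken from the
    number of words in phrase.
    """
    phrase = phrase.lower()
    n = len(phrase.split())
    result = {}
    for year, books in sorted(corpus.items()):
        counts = Counter()
        for text in books:
            counts.update(ngrams(text, n))
        total = sum(counts.values())
        result[year] = counts[phrase] / total if total else 0.0
    return result

# Made-up two-year "library", purely for illustration.
toy_corpus = {
    1900: ["the men went to sea", "the men and the boys"],
    2000: ["the women and the men", "the women went to sea"],
}

print(yearly_frequency(toy_corpus, "women"))
```

Dividing by each year's total matters: more books are printed in some years than others, so a raw count would mostly track how much was published rather than how popular a word was.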

Among other findings, they showed how the number of English words has been steadily increasing...


When verbs with irregular forms were replaced with more regular words...



And how effectively the Nazis were able to erase Jewish artist Marc Chagall from public awareness.


Want to try it yourself? You can make your own word graphs with Google's n-gram tool. Here are a few things I've found:

While "men" vastly exceeded "women" until the 1980s, "boys" and "girls" have been better matched. The kids saw an increase in popularity in the mid-20th century, maybe when a lot of child-raising books were being written. But around the time "women" surpassed "men," "girls" also edged out "boys."


Genetics has been an increasingly popular way to explain our traits and tendencies over the past century. Before that, what did we have? Head bumps, for one thing.


Newly discovered scientific principles have a steep learning curve, then plateau once people have caught on. It remains to be seen where global warming will level off.


Luckily, we're not a generation that sits back and assumes that what happens on this planet is outside of our control.

