When Charles Dickens wrote "It was the of, it was the of," the immortal first words of A Tale of Two Cities, he can't have imagined that 21st-century computer scientists would parse his prepositions and pronouns as part of vast literary data sets. But today's researchers are studying the unimportant words in books to find important literary trends. With the meaty words taken out, language becomes a numbers game.
To see how literary styles evolve over time, researchers led by James Hughes at Dartmouth College applied stylometry, the statistical study of writing style, to texts from Project Gutenberg. The site contains the full text of more than 38,000 out-of-copyright books. The researchers began their mining expedition by digging out every author who wrote after 1550, had a known date of birth and (when relevant) death, and had at least five English-language books digitized.
These criteria gave the researchers a set of 537 authors with 7,733 published works. But they weren't interested in every word of those books. Nouns and adjectives were out: no Kareninas or Lolitas, nothing nice or bad or beautiful, no roads or homes or people. Most verbs were out, except for forms of the utilitarian "to be." No one could speak or walk or "Fly, good Fleance!"
It may seem that the researchers were stripping all the information-containing words out of the sentences, and in fact that was their goal: "Content-free" words were all they wanted. The 307-word vocabulary that remained from the books was mostly prepositions, conjunctions, and articles.
This linguistic filler, the little stitches that hold together the good stuff, is known to contain a kind of authorial fingerprint. We may not think much about these words when we're writing or speaking, but scientists can use them to define our style.
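The filtering step can be illustrated with a short Python sketch. It assumes a simple regular-expression tokenizer and uses a tiny, hypothetical word list (`FUNCTION_WORDS`) as a stand-in for the study's 307-word vocabulary. The paper's actual preprocessing may differ. The sketch keeps only the listed words and turns them into a fixed-length vector of relative frequencies, one simple way to represent the "fingerprint" described above.

```python
import re
from collections import Counter

# Hypothetical stand-in for the study's 307-word vocabulary of
# "content-free" words; the actual list is not reproduced here.
FUNCTION_WORDS = ("the", "of", "it", "was", "and", "a", "to", "in", "is", "be", "that", "but")
FUNCTION_WORD_SET = frozenset(FUNCTION_WORDS)


def strip_content_words(text):
    """Lowercase and tokenize `text`, keeping only the function words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t in FUNCTION_WORD_SET]


def function_word_profile(text):
    """Relative frequency of each vocabulary word among the kept words.

    The result has one slot per entry in FUNCTION_WORDS, so profiles of
    different texts line up and can be compared position by position.
    """
    counts = Counter(strip_content_words(text))
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(FUNCTION_WORDS)
    return [counts[w] / total for w in FUNCTION_WORDS]


opening = "It was the best of times, it was the worst of times"
print(strip_content_words(opening))
# ['it', 'was', 'the', 'of', 'it', 'was', 'the', 'of']
print(function_word_profile(opening)[:4])
# [0.25, 0.25, 0.25, 0.25]
```

Applied to Dickens's full opening clause, the filter leaves exactly the skeleton quoted at the top of this piece.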
Hughes and his team used computer analysis to score each author's stylistic similarity to every other author. They found that before the late 18th century, authors' stylistic similarity didn't depend on how close to each other they lived in time. (Each author was represented by a single year, the midpoint between his or her birth and death.) During this period, authors of the same generation didn't influence each other's styles much more than authors who lived hundreds of years apart.
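The article doesn't say which similarity measure the team used. The sketch below is therefore only illustrative: it substitutes cosine similarity between function-word frequency vectors and scores every unordered pair of authors. The author names and three-word profiles are made up. The midpoint rule follows the parenthetical above. Dickens (1812 to 1870), for example, would be placed at 1841.

```python
import math
from itertools import combinations


def midpoint_year(born, died):
    """Place an author at a single year: halfway between birth and death."""
    return (born + died) / 2


def cosine_similarity(p, q):
    """Cosine similarity of two equal-length frequency vectors.

    For non-negative vectors the result lies between 0 and 1.
    """
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def pairwise_similarities(profiles):
    """Score every unordered pair of authors; `profiles` maps name -> vector."""
    return {
        (a, b): cosine_similarity(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }


# Made-up three-word profiles for three hypothetical authors.
profiles = {
    "author_a": [0.5, 0.3, 0.2],
    "author_b": [0.5, 0.3, 0.2],
    "author_c": [0.0, 0.0, 1.0],
}
scores = pairwise_similarities(profiles)
print(midpoint_year(1812, 1870))  # 1841.0
```

Identical profiles score 1.0, and profiles with no shared words score 0.0. A real study would also need to control for text length and genre, which this sketch ignores.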
But from the late 18th century to today, it was a different story. Stylistically, authors were more similar to their contemporaries than to other writers. By the late 19th century, writers closely matched the style of other writers who lived at the same time (at least according to the computers tallying up their non-content words). This influence dropped off beyond 30 years. In other words, authors who lived more than three decades apart may as well have lived centuries apart, for all the similarity between their writing.
In more recent books, that window of influence seems to grow even tighter. Among authors from the first half of the 20th century, stylistic similarity drops off beyond just 23 years.
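The article doesn't explain how the 30-year and 23-year cutoffs were estimated. The sketch below shows one simple way such a window could be read off pairwise scores. It groups them by the gap between authors' midpoint years, averages each group, and reports the first gap at which similarity falls to a chosen "background" level. This is a crude threshold rule chosen for illustration, not the paper's statistical method. The authors, years, scores, `baseline` and `tolerance` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def similarity_by_gap(midpoints, similarities, bin_width=10):
    """Average pairwise similarity, grouped by the gap between midpoint years.

    `midpoints` maps author -> year; `similarities` maps (author, author) -> score.
    Each gap is floored to a multiple of `bin_width` years.
    """
    bins = defaultdict(list)
    for (a, b), score in similarities.items():
        gap = abs(midpoints[a] - midpoints[b])
        bins[int(gap // bin_width) * bin_width].append(score)
    return {gap: mean(scores) for gap, scores in sorted(bins.items())}


def influence_window(curve, baseline, tolerance=0.01):
    """First gap bin whose mean similarity is within `tolerance` of `baseline`.

    Returns None if similarity never falls that close to the baseline.
    """
    for gap, score in sorted(curve.items()):
        if score - baseline <= tolerance:
            return gap
    return None


# Toy data: hypothetical authors with made-up midpoint years and scores.
midpoints = {"a": 1850, "b": 1860, "c": 1875, "d": 1950}
similarities = {
    ("a", "b"): 0.90, ("b", "c"): 0.85, ("a", "c"): 0.80,
    ("c", "d"): 0.50, ("b", "d"): 0.50, ("a", "d"): 0.50,
}
curve = similarity_by_gap(midpoints, similarities)
print(influence_window(curve, baseline=0.50))  # 70
```

The window is reported as the lower edge of a bin, so 70 here means "a gap of 70 to 79 years". Narrower bins give finer resolution but fewer pairs per bin.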
Over time, authors have become more and more influenced by other authors writing at the same time. The researchers say this may simply be due to the number of books published. In the early part of their data set, there were few enough books around that a studious person could read, well, most of them. But as more and more books were published, contemporary books made up a larger share of what was available to read. Authors have filled more and more of their library shelves with books by their peers, and this has made them more likely to echo each other's styles.
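The arithmetic behind this explanation can be shown with a toy model. It assumes annual book output grows by a fixed percentage each year and asks what fraction of everything ever published appeared in the most recent few decades. The 300-year span, 30-year window and 3% growth rate are hypothetical round numbers. None of them comes from the article or the paper.

```python
def contemporary_share(years, window, growth_rate, first_year_output=1.0):
    """Fraction of all books published over `years` years that appeared in
    the final `window` years, if annual output grows by `growth_rate` per year."""
    output = [first_year_output * (1 + growth_rate) ** y for y in range(years)]
    return sum(output[-window:]) / sum(output)


flat = contemporary_share(years=300, window=30, growth_rate=0.0)
growing = contemporary_share(years=300, window=30, growth_rate=0.03)
print(round(flat, 3), round(growing, 3))
```

With flat output, the last 30 of 300 years hold exactly a tenth of all books. With 3% annual growth, that same 30-year window holds about 59% of everything ever published, so a reader's shelves fill up with contemporaries.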
Because Project Gutenberg relies on public-domain material, few authors from after the mid-20th century were included in this study. As for the future, "You would expect a continued diminishing of influence," says Daniel Rockmore, the paper's senior author. Contemporary books take up an ever greater portion of what's available to read. In addition to the huge number of books published each year (more than 288,000 in the United States in 2009), there are now e-books and e-readers and Japanese Twitter novels.
A century from now, we may be able to look back and see that today's authors had an ever-condensing frame of influence. Of course, by then literary styles might only last a week. Most books will be forgotten, but every author will be a revolutionary.
James M. Hughes, Nicholas J. Foti, David C. Krakauer, & Daniel N. Rockmore (2012). Quantitative patterns of stylistic influence in the evolution of literature. PNAS. DOI: 10.1073/pnas.1115407109
Image: Library of Congress from ep_jhu/Flickr