When Charles Dickens wrote "It was the of, it was the of," the immortal first words in A Tale of Two Cities, he can't have imagined that 21st-century computer scientists would parse his prepositions and pronouns as part of vast literary data sets. But today's researchers are studying the unimportant words in books to find important literary trends. With the meaty words taken out, language becomes a numbers game.
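To make the "numbers game" concrete, here is a minimal Python sketch of what stripping out the meaty words might look like. It is an illustration only, not the researchers' actual method: the `FUNCTION_WORDS` list and the `function_words_only` helper are hypothetical, and a real study would use a much longer list of function words.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of "unimportant" function words.
# This is an assumption for illustration, not the list used in the study.
FUNCTION_WORDS = {"it", "was", "the", "of", "a", "an", "and", "in", "to", "that"}

def function_words_only(text):
    """Return the function words in `text`, lowercased, in their original order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t in FUNCTION_WORDS]

opening = "It was the best of times, it was the worst of times"
kept = function_words_only(opening)

print(" ".join(kept))   # it was the of it was the of
print(Counter(kept))    # each of the four surviving words appears twice
```

Once only these small words remain, a book reduces to a table of counts, and tables of counts can be compared across authors and centuries.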
To see how literary styles evolve over time--a science dubbed "stylometry"--researchers led by James Hughes at Dartmouth College turned to Project Gutenberg. The site contains the full text of more than 38,000 out-of-copyright books. The researchers began their mining expedition by digging out every author who wrote after 1550, had a known date of birth and (when relevant) death, and had at least five English-language books digitized.
These criteria gave the researchers a set of 537 authors with 7,733 published works. But ...