The Genome As Word Puzzle: Who’s Ready to Play?

The Loom
By Carl Zimmer
Feb 1, 2008 9:05 PM | Nov 5, 2019 4:49 AM


I'm always learning something from the readers of the Loom. Yesterday, I wrote about how scientists had inserted their names into a synthetic genome, and how such signatures would erode away like graffiti inside real organisms. But how about the opposite case--what if evolution has produced sequences of DNA that happen to form words? In the comment thread, Peter Ellis asked,

What actually is the longest word (in any language) encoded by the reference human genome? If I had the time and computer power I'd have a look... Guesstimate - it'll be somewhere in the 4-5 letter range, depending on letter frequency in the target language.

Bear in mind the rules of this game: the letters are the amino acids specified by codons (three bases of DNA). There are 20 amino acids in most living things, each with a one-letter abbreviation, which leaves six letters of the alphabet (B, J, O, U, X and Z) unused. So you can't spell every word, unless you use alternatives, like using V for U. (Here's a table.) Ron then replied:

Just wander over to NCBI and blast to your heart's content. Taking "gvesstimate" (note the classical spelling) and checking against the protein refseq database finds:

>ref|NP_939322.1| Putative peptide ABC transport system ATP-binding protein [Corynebacterium diphtheriae NCTC 13129] Length=560
GENE ID: 2649530 DIP0959 | protein coding [Corynebacterium diphtheriae NCTC 13129] (10 or fewer PubMed links)
Score = 26.1 bits (54), Expect = 215, Method: Composition-based stats.
Identities = 9/11 (81%), Positives = 10/11 (90%), Gaps = 0/11 (0%)

Query  1    GVESSTIMATE  11
            GVESS I+ATE
Sbjct  278  GVESSEILATE  288

(sorry about the lack of proper formatting) Knock yourself out. I do have vague recollections of someone doing something similar a long time ago, when the database was much, much smaller.
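For anyone who wants to check a word before pasting it into BLAST, the rules of the game can be sketched in a few lines of Python. This is only an illustration: the function name `to_peptide` and the substitution table are made up for this example, and the only substitution included is the V-for-U swap mentioned above.

```python
# The 20 standard amino acids, by their one-letter abbreviations.
AMINO_LETTERS = set("ACDEFGHIKLMNPQRSTVWY")

# Stand-ins for letters that have no amino acid (an assumption for this
# sketch; only the V-for-U swap comes from the post itself).
SUBSTITUTIONS = {"U": "V"}


def to_peptide(word, substitutions=SUBSTITUTIONS):
    """Return the word spelled in amino acid letters, or None if impossible."""
    letters = []
    for ch in word.upper():
        ch = substitutions.get(ch, ch)
        if ch not in AMINO_LETTERS:
            return None  # e.g. B, J, O, X or Z with no stand-in
        letters.append(ch)
    return "".join(letters)


if __name__ == "__main__":
    for word in ["guesstimate", "Darwin", "Zimmer"]:
        print(word, "->", to_peptide(word))
```

A word that comes back as `None` contains a letter with no amino acid and no agreed stand-in, so there is nothing to search for.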

I had not heard about anyone trying this before, but it sounds like a lot of fun. I'm a complete novice when it comes to reading genomes with BLAST, so I won't try. But if anyone wants to post the longest word they can find, let's see what you get. (Maybe I'll get my word-guru brother to team up with a geneticist...that would be interesting.) If you think about it, life on Earth is probably coming up with stray words in its many genomes, which then turn to gibberish (to our eyes), only to produce new words for us to find. The four-billion-year word search, as it were.

Update: Stephen Matheson offers easy step-by-step instructions. Thanks! Without a Z in the genetic code, I can't make an egotistic search for Zimmer. But here's Darwin lurking in bacteria.
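If you have a stretch of raw DNA rather than a protein database, the same word search can be tried by hand. The sketch below translates a DNA string with the standard genetic code in each of the three forward reading frames and looks for words in the result. It is a toy, not a replacement for BLAST: the names `translate` and `find_words` are invented for this example, it ignores the reverse strand, and it only finds exact matches.

```python
# Standard genetic code, built from the usual TCAG ordering of bases.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def translate(dna, frame=0):
    """Translate DNA to one-letter amino acids, starting at offset `frame`."""
    dna = dna.upper()
    # Codons with unexpected characters (such as N) become "X".
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def find_words(dna, words):
    """Return (word, frame, position) for the first exact hit in each frame."""
    hits = []
    for frame in range(3):
        protein = translate(dna, frame)
        for word in words:
            pos = protein.find(word.upper())
            if pos != -1:
                hits.append((word, frame, pos))
    return hits


if __name__ == "__main__":
    # A made-up sequence: two filler bases, then codons spelling D-A-R-W-I-N.
    dna = "CC" + "GATGCTCGTTGGATTAAT"
    print(find_words(dna, ["darwin", "zimmer"]))
```

Because each amino acid can be written by several different codons, many different DNA sequences spell the same word, which is one reason stray words turn up at all.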

Copyright © 2025 LabX Media Group