Living things, from bacteria to humans, depend on a workforce of proteins to carry out essential tasks within their cells. Proteins are chains of amino acids that are strung together according to instructions encoded within that most important of molecules - DNA.
The string of "letters" that makes up DNA corresponds to chains of amino acids. The letters are read in threes, with each three-letter combination representing one amino acid. Until now, scientists believed that this relationship was unambiguous - within any single genome, every three-letter combination maps to one and only one amino acid. This strict rule is a tenet of genetics, but new research shows that it's not an absolute one.
A team of American scientists have found a surprising exception to this rule, within a sea microbe called Euplotes crassus. In its genome, one particular triplet of DNA letters can stand for one of two different amino acids - cysteine or selenocysteine - even within the same gene. It all depends on context. This is the first time that such dual-coding has been spotted in the genes of any living thing.
Before I go any further, it's probably a good idea to have a quick primer on the genetic code for non-scientists. Anyone with prior knowledge of genetics can just skip the next four paragraphs.
DNA is a chain of four molecules called nucleotides - adenine, cytosine, guanine and thymine, represented by the letters A, C, G and T. These sequences are transcribed into a similar molecule called messenger RNA (mRNA), which contains three of the same nucleotides, but replaces thymine with uracil (U). It's the information coded by mRNA that is finally translated into proteins.
Proteins are built from 20 different amino acids, chained together in various combinations. In mRNA, each set of three letters corresponds to a specific amino acid. These three-letter combinations are called "codons", the genetic equivalent of words. For example, the codon CCC (three cytosines in a row) corresponds to the amino acid proline, while AAA (three adenines) corresponds to lysine. And some codons act as full-stops, indicating that the amino acid chain has come to an end.
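For readers who think in code, the two steps described so far (DNA to mRNA, then mRNA to amino acids) can be sketched in a few lines of Python. This is only an illustration: the table holds a handful of the 64 codons, the function names and the example sequence are made up, and the DNA string is assumed to be the strand whose letters match the mRNA.

```python
# Partial codon table: a small, illustrative subset of the 64 codons.
CODON_TABLE = {
    "AUG": "Met",
    "CCC": "Pro",
    "AAA": "Lys",
    "UGU": "Cys",
    "UGC": "Cys",
    "UAA": "STOP",
    "UAG": "STOP",
    "UGA": "STOP",
}


def transcribe(dna):
    """Turn DNA letters into mRNA letters by swapping T for U.

    Assumption: `dna` is the strand that reads the same as the mRNA.
    """
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Read the mRNA three letters at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        amino_acid = CODON_TABLE[codon]  # KeyError for codons not in this subset
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


mrna = transcribe("ATGCCCAAATGTTAA")  # made-up example sequence
print(mrna)             # AUGCCCAAAUGUUAA
print(translate(mrna))  # ['Met', 'Pro', 'Lys', 'Cys']
```

The final UAA is the full-stop: it ends the chain and adds no amino acid of its own.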
This genetic code is almost universal. The same codons almost always match up to the same amino acids in tiny bacteria, tall trees and thoughtful humans. There are a few deviations from the universal template, but even then, the differences are relatively minor. Think about computer keyboards - almost all have the same configuration of keys for various letters and symbols, but some will have the @ key in a different place.
The genetic code is redundant, so that several codons represent the same single amino acid, but there are no ambiguities. There are no examples of a single codon within any genome that represents more than one amino acid. That is, until now.
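The difference between redundancy and ambiguity maps neatly onto a lookup table. In the rough Python sketch below (a partial table, for illustration only), several keys can share the same value, which is redundancy. But a key can only ever hold one value, which is the lack of ambiguity.

```python
from collections import defaultdict

# Partial codon table, for illustration only.
CODON_TABLE = {
    "CCU": "Pro", "CCC": "Pro", "CCA": "Pro", "CCG": "Pro",
    "AAA": "Lys", "AAG": "Lys",
    "UGU": "Cys", "UGC": "Cys",
    "UGG": "Trp",
}

# Group codons by the amino acid they stand for.
codons_for = defaultdict(list)
for codon, amino_acid in CODON_TABLE.items():
    codons_for[amino_acid].append(codon)

# Redundant: proline can be spelled four different ways.
print(sorted(codons_for["Pro"]))  # ['CCA', 'CCC', 'CCG', 'CCU']

# Unambiguous: looking up any one codon gives exactly one answer.
print(CODON_TABLE["CCC"])  # Pro
```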
The Euplotes crassus Code
Anton Turanov, Alexey Lobanov and Vladimir Gladyshev from the University of Nebraska have discovered that in Euplotes crassus, the UGA codon can mean either cysteine or selenocysteine, depending on its location in the gene.
In the universal code, UGA is a stop signal but many species use it to signify selenocysteine, an amino acid that isn't represented in the universal code. This alternative translation of UGA into selenocysteine hinges on a structure called a SECIS element. The SECIS is part of the mRNA molecule itself but sits outside the region that actually codes for amino acids. It's like a genetic Shift key - its presence changes the meaning of UGA codons that sit before it.
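The Shift-key idea can be written as a toy decoder in which the same codon gives a different result depending on a flag. This is a deliberately crude sketch: the `has_secis` flag, the tiny codon table and the example message are all invented for illustration, and the real machinery is far more involved.

```python
# Tiny illustrative codon table; UGA is handled separately below.
CODON_TABLE = {
    "AUG": "Met",
    "CCC": "Pro",
    "AAA": "Lys",
    "UAA": "STOP",
    "UAG": "STOP",
}


def translate(coding_region, has_secis=False):
    """Translate codons, treating UGA as 'Sec' only if a SECIS is present.

    Assumption: `has_secis` stands in for a SECIS element sitting
    elsewhere on the same mRNA, outside the coding region.
    """
    protein = []
    for i in range(0, len(coding_region) - 2, 3):
        codon = coding_region[i:i + 3]
        if codon == "UGA":
            if not has_secis:
                break              # plain reading: UGA is a stop signal
            protein.append("Sec")  # shifted reading: selenocysteine
            continue
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein


message = "AUGCCCUGAAAAUAA"  # made-up example
print(translate(message))                  # ['Met', 'Pro']
print(translate(message, has_secis=True))  # ['Met', 'Pro', 'Sec', 'Lys']
```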
What makes E. crassus unique is the fact that its UGA codons can mean either selenocysteine or cysteine - a choice between two amino acids rather than one amino acid and a stop signal.
Turanov and Lobanov analysed the microbe's tRNAs - molecules with one end that recognises a specific codon and another that sticks to its corresponding amino acid. These are the decoders that translate strings of codons into strings of amino acids. It turned out that E. crassus has different tRNAs that recognise UGA - one of these matches the codon with cysteine and another matches it with selenocysteine.
Turanov and Lobanov also purified a protein from E. crassus called TR1. Its mRNA has a SECIS element and five UGA codons, and the duo found that the first four of these are translated into cysteines and the fifth into selenocysteine. Location is all-important when it comes to working out which interpretation comes out top. When Turanov and Lobanov added lots of UGA codons at sites throughout the TR1 gene, they found the vast majority were translated into cysteines. Only those inserted at the end of the gene, within its final 20 codons and near the SECIS element, were interpreted as selenocysteines.
So the SECIS element, in its Shift-key role, affects the fate of nearby UGAs. To confirm that, Turanov and Lobanov replaced the entire SECIS element in the TR1 gene with an equivalent element from a different gene and a different species. They found that this new SECIS element had a wider zone of influence; when it was introduced, UGA codons that sat outside the final 20 were translated into selenocysteines instead of cysteines.
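The location rule from these experiments can be boiled down to a toy model: a UGA is read as selenocysteine if it falls inside the SECIS element's zone of influence at the end of the gene, and as cysteine otherwise. The sketch below assumes a hard cut-off measured in codons, which is a simplification. The 60-codon gene, the UGA positions and the wider zone of 35 are invented for illustration; only the figure of 20 comes from the TR1 result.

```python
def read_uga_codons(codons, sec_zone=20):
    """Return {position: reading} for every UGA in a list of codons.

    Toy rule: a UGA within the final `sec_zone` codons is read as 'Sec',
    any earlier UGA as 'Cys'. The hard cut-off is an assumption.
    """
    cutoff = len(codons) - sec_zone
    return {
        position: ("Sec" if position >= cutoff else "Cys")
        for position, codon in enumerate(codons)
        if codon == "UGA"
    }


# Hypothetical 60-codon gene with UGAs at positions 5, 30 and 50.
gene = ["AAA"] * 60
for position in (5, 30, 50):
    gene[position] = "UGA"

# Zone of 20 codons, as reported for TR1's own SECIS element.
print(read_uga_codons(gene))
# {5: 'Cys', 30: 'Cys', 50: 'Sec'}

# A swapped-in SECIS with a wider reach (35 is an invented number).
print(read_uga_codons(gene, sec_zone=35))
# {5: 'Cys', 30: 'Sec', 50: 'Sec'}
```

Widening the zone flips the UGA at position 30 from cysteine to selenocysteine, which mirrors what happened when the SECIS element was swapped.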
So in E. crassus, the UGA codon is not tied to a single fate - it has a choice. It can be interpreted in two different ways, depending on its location and that of the SECIS element that influences it. One codon, two amino acids - it's a unique set-up and further proof that the genetic code, universal though it almost is, is open to expansion and evolutionary change.
Reference: A. A. Turanov, A. V. Lobanov, D. E. Fomenko, H. G. Morrison, M. L. Sogin, L. A. Klobutcher, D. L. Hatfield, V. N. Gladyshev (2009). Genetic Code Supports Targeted Insertion of Two Amino Acids by One Codon. Science, 323 (5911), 259-261. DOI: 10.1126/science.1164748