There's a new paper in Nature (OPEN ACCESS), Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project:
...First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome....
From EurekAlert, New findings challenge established views on human genome:
The ENCODE consortium's major findings include the discovery that the majority of DNA in the human genome is transcribed into functional molecules, called RNA, and that these transcripts extensively overlap one another. This broad pattern of transcription challenges the long-standing view that the human genome consists of a relatively small set of discrete genes, along with a vast amount of so-called junk DNA that is not biologically active. The new data indicate the genome contains very little unused sequences and, in fact, is a complex, interwoven network. In this network, genes are just one of many types of DNA sequences that have a functional impact. "Our perspective of transcription and genes may have to evolve," the researchers state in their Nature paper, noting the network model of the genome "poses some interesting mechanistic questions" that have yet to be answered.
From an evolutionary viewpoint it also seemed a bit peculiar to relegate most of the genome to non-functional status. After all, why was it still around after all this time? Evolution is a noisy process that is predicated on "good enough" local solutions, but it seemed a bit of a stretch to believe that this was the best that various evolutionary dynamics could come up with. Speaking of which:
Other surprises in the ENCODE data have major implications for our understanding of the evolution of genomes, particularly mammalian genomes. Until recently, researchers had thought that most of the DNA sequences important for biological function would be in areas of the genome most subject to evolutionary constraint - that is, most likely to be conserved as species evolve. However, the ENCODE effort found about half of functional elements in the human genome do not appear to have been obviously constrained during evolution, at least when examined by current methods used by computational biologists. According to ENCODE researchers, this lack of evolutionary constraint may indicate that many species' genomes contain a pool of functional elements, including RNA transcripts, that provide no specific benefits in terms of survival or reproduction. As this pool turns over during evolutionary time, researchers speculate it may serve as a "warehouse for natural selection" by acting as a source of functional elements unique to each species and of elements that perform the similar functions among species despite having sequences that appear dissimilar.
The old view promoted by R.A. Fisher was that most of the genome (OK, they didn't know about the "genome" then, but you get the picture) would be constrained by selective forces, as new mutants would invariably be deleterious. On occasion a selectively favored mutation would arise, increase in frequency, and quickly "substitute" for the previous allele at that locus, resulting in a slow and gradual turnover of the genome. Neutral and nearly neutral theory supplemented or overturned (depending on your perspective and scale of focus) the classical model by positing that mutations with little selective import were responsible for the preponderance of substitutions at any given locus over evolutionary time. The implication here is that evolutionary change would be roughly proportional to the rate of mutation. My posts on genetic draft add another process to the toolkit of evolutionary dynamics, as selective sweeps drive reorganizations of the genome adjacent to the region favored by selection. Now this finding that much of the functionally relevant genome is not under strong constraint will surely be fodder for many hypotheses. Perhaps selection is more pluralistic than we thought? Or perhaps the long arm of evolution implicitly sweeps across the contingencies of adaptive peaks over the horizon? In any case, my first instinct is to infer that Fisher was wrong to assume that one fitness peak dominated the landscape and that only a very precise genetic conformation would yield the optimal phenotype. We know that this seems untrue for human skin color, as multiple alternative genetic events converged upon the same physical outcome.
Update: To clear up some confused prose above, from the paper itself:
Instead, we hypothesize five biological reasons to account for the presence of large amounts of unconstrained functional elements. The first two are particular to certain biological assays in which the elements being measured are connected to but do not coincide with the analysed region. An example of this is the parent transcript of an miRNA, where the current assays detect the exons (some of which are not under evolutionary selection), whereas the intronic miRNA actually harbours the constrained bases. Nevertheless, the transcript sequence provides the critical coupling between the regulated promoter and the miRNA. The sliding of transcription factors (which might bind a specific sequence but then migrate along the DNA) or the processivity of histone modifications across chromatin are more exotic examples of this. A related, second hypothesis is that delocalized behaviours of the genome, such as general chromatin accessibility, may be maintained by some biochemical processes (such as transcription of intergenic regions or specific factor binding) without the requirement for specific sequence elements. These two explanations of both connected components and diffuse components related to, but not coincident with, constrained sequences are particularly relevant for the considerable amount of unannotated and unconstrained transcripts. The other three hypotheses may be more general--the presence of neutral (or near neutral) biochemical elements, of lineage-specific functional elements, and of functionally conserved but non-orthologous elements. We believe there is a considerable proportion of neutral biochemically active elements that do not confer a selective advantage or disadvantage to the organism. This neutral pool of sequence elements may turn over during evolutionary time, emerging via certain mutations and disappearing by others. 
The size of the neutral pool would largely be determined by the rate of emergence and extinction through chance events; low information-content elements, such as transcription factor-binding sites (ref. 110) will have larger neutral pools. Second, from this neutral pool, some elements might occasionally acquire a biological role and so come under evolutionary selection. The acquisition of a new biological role would then create a lineage-specific element. Finally, a neutral element from the general pool could also become a peer of an existing selected functional element and either of the two elements could then be removed by chance. If the older element is removed, the newer element has, in essence, been conserved without using orthologous bases, providing a conserved function in the absence of constrained sequences. For example, a common HNF4A binding site in the human and mouse genomes may not reflect orthologous human and mouse bases, though the presence of an HNF4A site in that region was evolutionarily selected for in both lineages. Note that both the neutral turnover of elements and the 'functional peering' of elements has been suggested for cis-acting regulatory elements in Drosophila (refs 115, 116) and mammals (ref. 110). Our data support these hypotheses, and we have generalized this idea over many different functional elements. The presence of conserved function encoded by conserved orthologous bases is a commonplace assumption in comparative genomics; our findings indicate that there could be a sizable set of functionally conserved but non-orthologous elements in the human genome, and that these seem unconstrained across mammals. Functional data akin to the ENCODE Project on other related species, such as mouse, would be critical to understanding the rate of such functionally conserved but non-orthologous elements.
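To make the "neutral pool" idea in the passage above a bit more concrete, here is a toy sketch in Python. It is not from the ENCODE paper; it assumes a deliberately simple two-state model in which each candidate site gains a biochemically active element with some per-generation probability and loses it with another. The function names, the parameters `gain` and `loss`, and all the numbers are hypothetical, chosen only for illustration. Under those assumptions the pool settles near `n_sites * gain / (gain + loss)`, so elements that are easy to create by chance (low information content) end up with a larger standing pool.

```python
import random


def expected_pool_fraction(gain, loss):
    """Stationary fraction of active sites in a two-state gain/loss chain.

    gain: per-generation probability that an inactive site becomes active.
    loss: per-generation probability that an active site becomes inactive.
    Both are illustrative assumptions, not measured quantities.
    """
    return gain / (gain + loss)


def simulate_pool(n_sites, gain, loss, generations, seed=0):
    """Simulate neutral turnover and return the final number of active sites."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    active = [False] * n_sites
    for _ in range(generations):
        for i in range(n_sites):
            if active[i]:
                # An existing element disappears by chance.
                if rng.random() < loss:
                    active[i] = False
            elif rng.random() < gain:
                # A new element emerges by chance.
                active[i] = True
    return sum(active)


if __name__ == "__main__":
    n_sites, loss = 2000, 0.02
    cases = [
        ("low-information element (easy to create)", 0.02),
        ("high-information element (hard to create)", 0.002),
    ]
    for label, gain in cases:
        simulated = simulate_pool(n_sites, gain, loss, generations=500)
        predicted = n_sites * expected_pool_fraction(gain, loss)
        print(f"{label}: simulated={simulated}, predicted={predicted:.0f}")
```

Nothing in this model involves selection: the pool size is set entirely by the balance of chance emergence and chance loss, which is the point the authors are making about low information-content elements.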
After reading the whole paper more closely, I feel like there need to be 5 or 6 titles; there's so much stuff packed into it. Related: Keep track of this via Google News; it'll be big. John Timmer at Ars Technica is not happy.