Credit: Karl Magnacca
The Pith: In this post I review some findings on patterns of natural selection within the Drosophila fruit fly genome. I relate them to very similar findings, though in the opposite direction, in human genomics. Different forms of natural selection and their impact on the structure of the genome are also spotlighted in the course of the review. In particular, I explore how specific methods to detect adaptation at the genomic level may be biased by the assumptions of classical evolutionary genetic models. Finally, I try to place these details in the broader framework of how best to understand evolutionary process in the "big picture."
A few days ago I titled a post "The evolution of man is no cartoon." The reason I titled it such is that as the methods become more refined and our data sets more robust it seems that previously held models of how humans evolved, and evolution's impact on our genomes, are being refined. Evolutionary genetics at its most elegantly spare can be reduced down to several general parameters: drift, selection, migration, etc. Exogenous phenomena such as the flux in census size, or environmental variation, have a straightforward relationship to these parameters. But to some extent the broadest truths are nearly trivial. Getting down to brass tacks, what are these general assertions telling us? We don't know yet. We're in a time of transitions, though not troubles.
Going back to cartoons, starting around 1970 there was a series of debates which hinged on the role of deterministic adaptive forces and random neutral ones in the domain of evolutionary process. You have probably heard terms like "adaptationist," "ultra-Darwinian," and "evolution by jerks" thrown around. All great fun, and certainly ripe "hooks" to draw the public in, but ultimately that phase in the scientific discourse seems to have been beside the point. It was a transient between the age of Theory, when there was too little of the empirics, and now the age of Data, when there is too little theory.
Biology is a very contingent discipline, and it may be that questions of the power of selection or the relevance of neutral forces will loom large or small dependent upon the particular tip of the tree of life to which the question is being addressed.
Evolution may not be a unitary oracle, but rather a cacophony from which we have to construct a harmonious symphony for our own mental sanity. Nature is one, and the joints which we carve out of nature's wholeness are for our own benefit. The age of molecular evolution, ushered in by the work on allozymes in the 1960s, was just a preface to the age of genomics. If Stephen Jay Gould and Richard Dawkins were in their prime today I wonder if the complexities of the issues at hand would be too much even for their verbal fluency in terms of formulating a concise quip with which to skewer one's intellectual antagonists. Complexity does not make fodder for honest quips and barbs. You're just as liable to inflict a wound upon your own side through clumsiness of rhetoric in the thicket of the data, which fires in all directions.
In any case, on this weblog I may focus on human genomics, but obviously there are other organisms in the cosmos. Because scientific funding favors biomedical application, humans have now come to the fore, but there is still utility in surveying the full taxonomic landscape. As it happens a paper in PLoS Genetics, which I noticed last week, is a perfect complement to the recent work on human selective sweeps.
Pervasive Adaptive Protein Evolution Apparent in Diversity Patterns around Amino Acid Substitutions in Drosophila simulans
In Drosophila, multiple lines of evidence converge in suggesting that beneficial substitutions to the genome may be common. All suffer from confounding factors, however, such that the interpretation of the evidence—in particular, conclusions about the rate and strength of beneficial substitutions—remains tentative. Here, we use genome-wide polymorphism data in D. simulans and sequenced genomes of its close relatives to construct a readily interpretable characterization of the effects of positive selection: the shape of average neutral diversity around amino acid substitutions. As expected under recurrent selective sweeps, we find a trough in diversity levels around amino acid but not around synonymous substitutions, a distinctive pattern that is not expected under alternative models. This characterization is richer than previous approaches, which relied on limited summaries of the data (e.g., the slope of a scatter plot), and relates to underlying selection parameters in a straightforward way, allowing us to make more reliable inferences about the prevalence and strength of adaptation. Specifically, we develop a coalescent-based model for the shape of the entire curve and use it to infer adaptive parameters by maximum likelihood. Our inference suggests that ~13% of amino acid substitutions cause selective sweeps. Interestingly, it reveals two classes of beneficial fixations: a minority (approximately 3%) that appears to have had large selective effects and accounts for most of the reduction in diversity, and the remaining 10%, which seem to have had very weak selective effects. These estimates therefore help to reconcile the apparent conflict among previously published estimates of the strength of selection. More generally, our findings provide unequivocal evidence for strongly beneficial substitutions in Drosophila and illustrate how the rapidly accumulating genome-wide data can be leveraged to address enduring questions about the genetic basis of adaptation.
Figure 1 C shows the top line. As you can see, there's a "trough" around non-synonymous substitutions. Non-synonymous simply means that a base pair substitution at that position within the codon changes the amino acid encoded. In contrast, a synonymous change does not. A substitution is not just a mutant variant though. It is rather an assessment of a population level shift from one allele to another. Neutral theory posited that most substitutions were not driven by natural selection, but rather by random walk processes. Ergo, most evolutionary change was not adaptive. A simple way to check the power of selection against this background of stochastic variation is to measure the ratio of substitution between non-synonymous and synonymous bases. But this sort of thing is more appropriate when comparing closely related species. In the paper on selective sweeps in humans obviously that's not going on; they were looking within one species. Instead the authors looked at reduction of variation across regions which may have been targets of natural selection. The reduction occurs because when one particular allele becomes the target of strong positive selection it pulls along adjacent linked regions in a "hitchhiking" process. Recombination works against this, resulting in decay over time of the linkage disequilibrium which spikes in the wake of selection.
But these conceptions are predicated on a simple model of the emergence of variants, and the way selection does, or doesn't, target these variants. One imagines a new mutant which arises against the ancestral genetic background. In a single-gene model the probability of fixation of a new neutral mutant, that is, of its going to ~100% and becoming a substitution in the population, is 1/N (or 1/(2N) for diploids). In plain English the fixation probability for a neutral mutant is inversely proportional to the population size. In contrast, the probability of fixation of a mutant which is selectively favored is proportional to its selection coefficient, which simply measures its fitness advantage relative to the population mean. The fixation of neutral variants is a random walk, and the time until fixation is directly proportional to population size. In contrast, selectively favored variants can sweep to fixation rather quickly. Being very conservative one can infer that the fixation of lactose tolerance in Northern Europeans due to a mutation on the LCT gene took ~7,000 years, or a little less than 300 generations. Because of this rapidity recombination has far less leisure with which to "chop" apart the physical associations of variants on the ancestral mutant genetic background. No wonder the LCT locus has one of the longest "haplotype blocks," that is, sequences of associated markers, in the European genome.
But let's modify our mental model a bit. Imagine that a genetic variant has been floating around at a low frequency for a long time. There may be many copies of the mutant, associated with different genetic backgrounds due to the impact of recombination. We can for example imagine a recessively deleterious allele which persists at low frequency because of the lack of efficacy of selection (most copies are found in heterozygous individuals with normal fitness). Many variants have multiple effects.
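As an aside, the arithmetic behind the single-origin model can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything taken from the paper: the population size, selection coefficient, recombination fraction, and 25-year generation time are hypothetical values chosen for the example, and the fixation probability uses Kimura's standard diffusion approximation with effective size assumed equal to census size.

```python
import math


def fixation_probability(s, n):
    """Kimura's diffusion approximation for a single new mutant copy.

    n is the number of diploid individuals (assumed equal to the effective
    size) and s is the heterozygote's selective advantage under additive
    selection. s = 0 is the neutral case, 1/(2N).
    """
    if s == 0:
        return 1.0 / (2 * n)
    # expm1 keeps precision when s is very small
    return math.expm1(-2 * s) / math.expm1(-4 * n * s)


def fraction_still_linked(r, generations):
    """Chance that a marker at recombination fraction r per generation has
    never been recombined away from the mutant's original background."""
    return (1 - r) ** generations


# All values below are assumptions for illustration only.
N = 10_000                 # diploid population size
S = 0.01                   # selection coefficient of the favored mutant
R = 0.001                  # recombination fraction between mutant and marker
YEARS_PER_GENERATION = 25  # assumed human generation time

neutral = fixation_probability(0.0, N)
favored = fixation_probability(S, N)

sweep_generations = 7000 / YEARS_PER_GENERATION  # the LCT timescale above
neutral_generations = 4 * N  # classic expectation for a neutral fixation

print("P(fix), neutral:", neutral)
print("P(fix), favored:", favored)
print("generations in 7,000 years:", sweep_generations)
print("linked after sweep:", fraction_still_linked(R, sweep_generations))
print("linked after neutral fixation:",
      fraction_still_linked(R, neutral_generations))
```

With these assumed numbers the favored mutant's chance of fixation comes out close to 2s, far above the neutral 1/(2N). Roughly three quarters of copies still carry the original linked marker after a 280-generation sweep, while essentially none would after the ~4N generations a neutral fixation is expected to take. That contrast is why a fast sweep leaves a long haplotype block and an old variant does not.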
Imagine that this allele has a dominant phenotypic effect which goes from being neutral to being very selectively favored. Now you have a situation where the genomic region will be dragged upward in frequency during adaptation, but there will be many genetic backgrounds, not just one. Concretely, if the selective event occurred only a few generations after the original mutant arose, the impact on the local genome would be much stronger in terms of generating homogenization than if the event occurred dozens of generations later, as by then the original genetic background would have been recombined and so lost its distinctive coherency. This is a form of natural selection from "standing variation": old mutants floating around in the background noise, rather than new mutants.
In the paper above the authors find a fair number of conventional selective sweeps, but they suggest that the higher estimates of the proportion of the genome under natural selection found by some researchers in Drosophila may be due to the fact that some methods catch the whole basket of selection, while others focus on more tractable "cartoon" models. Of the selection which can be modeled as a classic selective sweep the authors also found a "power law" effect. There was a combination of a few hits of powerful selection, and more numerous bouts of weak selection. This is not totally unexpected according to theory. Some of the human traits which have been amenable to genome-wide association, such as pigmentation, probably fall under this category. Most of the trait variance is due to a few genes of large effect, but there are a larger number of loci which account for the minority balance of variance. The same no doubt can hold across evolutionary time with the dynamics of natural selection. But we also shouldn't get lost in the genomic trees and lose sight of the forest.
Not only are evolutionary processes subject to molecular scale parameters such as recombination and mutation rates, but they are also impacted by organism and population scale parameters. One presumes that fruit flies are subject to different pressures and have had a different history from human beings, just as both have from philopatric amphibians. Humans have an enormous census size, huge populations, and we've undergone a massive change in lifestyle over the last 10,000 years. But as land-bound mammals we may exhibit more population substructure than some species, for example birds with a wide range. Additionally, because of a low long-term effective population size we have only so much genic variation to work with. Such a welter of details distorts attempts at elegance, but they need to be kept in mind. The authors conclude:
In summary, our findings establish a distinctive, genome-wide signature of adaptation in D. simulans, suggesting that many amino acid substitutions are beneficial and are driven by two classes of selective effects. Enabled by a richer summary of diversity patterns that avoids an a priori choice of scale, these conclusions offer a coherent interpretation of the results of previous inferences.
It will now be interesting to see whether similar findings emerge in other Drosophila species, which vary in their recombination rates, effective population sizes, and ecology.
I wouldn't limit this just to Drosophila. Because the different fruit fly species have different distributions, natural histories, as well as common ancestral traits and genes, they're an excellent laboratory of evolution. But eventually we'll start sweeping our gazes across all the multitudinous branches of the tree of life. Soon.
Citation:
Sattath S, Elyashiv E, Kolodny O, Rinott Y, & Sella G (2011). Pervasive Adaptive Protein Evolution Apparent in Diversity Patterns around Amino Acid Substitutions in Drosophila simulans PLoS Genetics : 10.1371/journal.pgen.100130