A month ago I pointed to a short communication in Nature Genetics which highlighted differences in the patterns of variation between the X chromosome and the autosomes. I thought it would be of interest to revisit this, because it's a relatively short piece with precise and crisp results which we can ruminate upon.
Sometimes there is a disjunction between how evolutionary biologists and molecular biologists use terms like "gene." The issue is explored in depth in Andrew Brown's The Darwin Wars. Brown observes that one of the problems with Richard Dawkins' style of exposition is that it did not translate well to the American context. Dawkins spoke of genes as units of analysis, from which logical inferences could be made. This was the classical Oxford style of evolutionary biology which Ernst Mayr objected to. In contrast, American biologists were used to thinking of genes in more concrete biophysical terms, and tended to miss the theoretical context which Dawkins was alluding to in his arguments. In Dawkins' defense, it must be remembered that the gene originated as an abstract entity whose biophysical substrate, DNA, was not known for decades. In my post Simple rules for inclusive fitness I outlined a paper which is very much in keeping with the analytic tradition: start with an abstract model, follow the chain of inferences, and see where it takes you.
But biology is obviously more than just armchair analysis. Though there were always quantitative thinkers such as R. A. Fisher who made great contributions to the field, naturalists and anatomists such as Charles Darwin and Thomas Huxley were the predecessors of the vast majority of working biologists today. Even molecular biologists arguably descend from the laboratories of the physiological geneticists of the early 20th century.
With a more robust understanding of the biophysical embeddedness of genetic inheritance in genomic structures, a new dimension has been added to analysis from first principles. The fact that genetics is mediated through chromosomes matters. One obvious aspect in mammals is that males are the heterogametic sex: males have one X chromosome, while females have two. With equal numbers of males and females, two of every three X chromosomes sit in females, so an X chromosomal lineage will "spend" 2/3 of its "time" in females, all else being equal.
This physical reality has been spun out to fascinating effect by evolutionary biologists, as outlined in Matt Ridley's The Red Queen: Sex and the Evolution of Human Nature. In this way the concrete nature of genetics has yielded another axiom to insert into the analytic engine of evolutionary biology. Of late, one finding emerging from human evolutionary genomics is that the X chromosome may experience selective dynamics differently from the autosomes. This possibility turns out to be critical in explaining some strange results reviewed in the communication. The ratio of human X chromosome to autosome diversity is positively correlated with genetic distance from genes:
The ratio of X-linked to autosomal diversity was estimated from an analysis of six human genome sequences and found to deviate from the expected value of 0.75. However, the direction of this deviation depends on whether a particular sequence is close to or far from the nearest gene. This pattern may be explained by stronger locally acting selection on X-linked genes compared with autosomal genes, combined with larger effective population sizes for females than for males.
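The expected value of 0.75 in the abstract comes from simple chromosome counting: with equal numbers of breeding males and females, a population carries three X chromosomes for every four copies of each autosome. When the sexes differ in breeding numbers, the expectation shifts. The sketch below uses the standard textbook formulas for effective population size with unequal numbers of breeding males and females; they are an assumption on my part and are not taken from the communication itself, and the function names are my own.

```python
def ne_autosome(n_f: float, n_m: float) -> float:
    """Textbook autosomal effective size for n_f breeding females and n_m breeding males."""
    return 4.0 * n_f * n_m / (n_f + n_m)


def ne_x(n_f: float, n_m: float) -> float:
    """Textbook X-linked effective size for the same breeding numbers."""
    return 9.0 * n_f * n_m / (4.0 * n_m + 2.0 * n_f)


def x_to_a_ratio(n_f: float, n_m: float) -> float:
    """Expected NeX/NeA under neutrality, given only the breeding sex ratio."""
    return ne_x(n_f, n_m) / ne_autosome(n_f, n_m)


if __name__ == "__main__":
    # Illustrative breeding numbers only; they are not estimates from any paper.
    for n_f, n_m in [(1000, 1000), (2000, 1000), (1000, 2000)]:
        print(f"females={n_f} males={n_m} NeX/NeA={x_to_a_ratio(n_f, n_m):.3f}")
```

Under these formulas an excess of breeding females pushes the ratio above 0.75 (toward a ceiling of 9/8), while an excess of breeding males pushes it below (toward a floor of 9/16). That is why a ratio above or below 0.75 gets read as a statement about the relative number of breeding females.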
At issue are some statistics presented in two papers, Accelerated genetic drift on chromosome X during the human dispersal out of Africa and Sex-biased evolutionary forces shape genomic patterns of human diversity. The first group found much less nucleotide diversity, π, on the X chromosome than the second group. From π and D, the divergence between humans and a primate outgroup, one can derive a rough proxy for effective population size. The smaller the effective population, the lower the nucleotide diversity, as drift will tend to expunge variation from the genome. The ratio of effective population sizes inferred from the X and the autosomes is denoted NeX/NeA. The value fell in the interval 0.65-0.75 for the first group (depending on the human samples used), and 0.75-1.08 for the second group. From this the first group concluded that female effective population sizes were smaller for our species. Recall that the X spends more time in females. Naturally, from their results the second group concluded that there were more breeding females.
In science it is not optimal when two groups come into conflict while looking at ostensibly the same question with similar methods. But there were subtle differences in their methods which may have biased the results. The first group looked at large regions of the genome, while the second group focused on intergenic regions with recombination. In the second case the aim was to look for patterns of variation away from genes which might have been targets of natural selection (recombination would break apart associations). The logic of the first group was presumably that increasing the proportion of the genome surveyed would mitigate the distorting effect of a few genes which had been subject to natural selection. To check for this they examined the statistics when constraining the data set to regions far away from genes. This did not change their finding.
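The proxy described above can be sketched in a few lines. Under neutrality, diversity scales with the product of effective population size and mutation rate, while divergence from an outgroup scales with mutation rate, so dividing π by D is a common way to cancel out mutation-rate differences between the X and the autosomes. The function name is hypothetical and the input values are invented placeholders, not figures from either paper.

```python
def normalized_x_to_a(pi_x: float, d_x: float, pi_a: float, d_a: float) -> float:
    """Rough proxy for NeX/NeA: the ratio of divergence-normalized diversities.

    pi_x, pi_a: nucleotide diversity on the X and on the autosomes.
    d_x, d_a:   divergence from an outgroup for the same regions.
    """
    if d_x <= 0 or d_a <= 0:
        raise ValueError("divergence values must be positive")
    return (pi_x / d_x) / (pi_a / d_a)


if __name__ == "__main__":
    # Placeholder inputs chosen purely for illustration.
    ratio = normalized_x_to_a(pi_x=0.0006, d_x=0.010, pi_a=0.0010, d_a=0.0125)
    print(f"normalized X/A diversity ratio: {ratio:.2f}")
```

The point of the normalization is that a raw π ratio could differ from 0.75 simply because the X and the autosomes mutate at different rates; dividing by D is meant to remove that effect before any demographic reading.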
But the second group, which submitted this communication to Nature Genetics, observes that it is not physical distance which is the appropriate variable, but genetic distance. To explore this question they looked at the dependence of π upon genetic distance from genes, and it is clear that the difference in π between the X and the autosomes decreases as a function of genetic distance. The X chromosome has sharply reduced genetic variation near genes. Why?
The general answer is that the X chromosome experiences selective pressures differently from the autosomes. That is a function of the cytogenetics of mammals. Unlike the autosomes, a substantial minority of X chromosomes are exposed to the full force of natural selection in the hemizygous, effectively haploid, state. To make this concrete, males have only one copy of the X, so positive or negative fitness implications have a much stronger immediate impact. The negative aspect is famous from "sex-linked diseases," where sons inherit defective genetic variants from their mothers, who are carriers. Since the mothers have two X chromosomes they do not manifest illness. The positive aspect is that if there is a favored allele which expresses only recessively, then natural selection is going to be much more efficacious on the X chromosome, because a substantial minority of copies will express even at low frequencies. The problem with recessive traits being the primary target of positive selection is that at low frequency the traits almost never express. If, on the other hand, the frequency of the allele rose because of its exposure in males, then there would be a positive feedback loop, as more and more females would also express the trait in the homozygous state.
In sum, the authors conclude that different regions of the X chromosome are telling us different stories. Genic regions are witness to the powerful impact of natural selection upon the genome. In contrast, neutral sites are representative of the demographic history of the species, and in particular its females. I'll let them finish:
If this hypothesis is correct, multiple evolutionary processes may confound inferences based on wholesale comparisons of full genome sequence data. If we wish to disentangle the history of selection, recombination and demography, a targeted set of carefully chosen regions at sufficient genetic distances from functional elements is needed. Intriguingly, at least for the human X chromosome, the signature left solely by demographic history may be hidden in the small fraction of selectively neutral polymorphisms that reside far from genes.
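To put rough numbers on the earlier argument about recessive alleles, the sketch below compares the share of copies of a recessive allele that are visible to selection on an autosome versus on the X. It assumes Hardy-Weinberg proportions, an equal sex ratio, and the same allele frequency in both sexes; none of these assumptions comes from the communication, and the function names are mine.

```python
def exposed_fraction_autosome(q: float) -> float:
    """Share of copies of a recessive autosomal allele that sit in homozygotes.

    Under Hardy-Weinberg a given copy is paired with a second copy
    with probability q, the allele frequency.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return q


def exposed_fraction_x(q: float) -> float:
    """Share of copies of a recessive X-linked allele that are expressed.

    With an equal sex ratio, 1/3 of X copies are in males and always
    expressed; the remaining 2/3 are in females and expressed only in
    homozygotes, which happens with probability q.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return 1.0 / 3.0 + (2.0 / 3.0) * q


if __name__ == "__main__":
    for q in (0.01, 0.1, 0.5):
        print(f"q={q}: autosome={exposed_fraction_autosome(q):.3f} "
              f"X={exposed_fraction_x(q):.3f}")
```

Under these assumptions, at an allele frequency of 1% only about 1% of autosomal copies are exposed, against roughly 34% of X-linked copies. That gap is the "substantial minority" that lets selection act on a rare recessive variant on the X.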
Hammer MF, Woerner AE, Mendez FL, Watkins JC, Cox MP, & Wall JD (2010). The ratio of human X chromosome to autosome diversity is positively correlated with genetic distance from genes. Nature Genetics. PMID: 20802480