It is well known that average levels of population structure are higher on the X chromosome compared to autosomes in humans. However, there have been surprisingly few analyses on the spatial distribution of population structure along the X chromosome. With publicly available data from the HapMap Project and Perlegen Sciences, we show a strikingly punctuated pattern of X chromosome population structure. Specifically, 87% of X-linked HapMap SNPs within the top 1% of FST values cluster into five distinct loci. The largest of these regions spans 5.4 Mb and contains 66% of the most highly differentiated HapMap SNPs on the X chromosome. We demonstrate that the extreme clustering of highly differentiated SNPs on the X chromosome is not an artifact of ascertainment bias, nor is it specific to the populations genotyped in the HapMap Project. Rather, additional analyses and resequencing data suggest that these five regions have been substrates of recent and strong adaptive evolution. Finally, we discuss the implications that patterns of X-linked population structure have on the evolutionary history of African populations.
Remember that Fst measures how genetic variance is partitioned between and within populations. As Fst approaches 1, all the variance can be attributed to differences between groups. For example:

Population A: frequency of allele 1 = 1.0, frequency of allele 2 = 0.0
Population B: frequency of allele 1 = 0.0, frequency of allele 2 = 1.0

All the variance is between the populations, not within them. Neither population has any internal variation, so Fst is 1 by definition. By contrast, Fst approaches 0 when all the variance is within populations and none is between them. For example:

Population A: frequency of allele 1 = 0.5, frequency of allele 2 = 0.5
Population B: frequency of allele 1 = 0.5, frequency of allele 2 = 0.5

There's a lot of variance within both populations, but none between. In other words, Fst is telling you whether there's any point in looking at population substructure. In the latter case you can obviously throw everything into one big bin and not lose any information (assuming Hardy-Weinberg equilibrium, HWE, in both). In the first case, pooling the populations together would mask the fact that there's a lot of between-population variance, which might be important.

In the paper they note that between-population variance, in the form of higher Fst, has a larger baseline value on the X chromosome, likely because the X has a smaller long-term effective population size. Remember that males have only one X while females have two, so with equal numbers of males and females there are only three copies of the X for every four copies of an autosome (a chromosome which is not a sex chromosome). This naturally reduces the long-term effective population size, and so makes the X more susceptible to stochastic fluctuations in allele frequency such as random genetic drift. When populations are separated and there is minimal gene flow, genetic drift will generally increase between-population variance.
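To make the two toy examples and the point about the X's smaller effective population size concrete, here is a minimal Python sketch. It is not the estimator used in the paper. It assumes a single biallelic locus, two equally weighted populations, and the textbook definition Fst = (H_T - H_S) / H_T. The drift function assumes an idealized population with no selection, mutation, or migration. The function names, the population size of 10,000, and the 2,000 generations are hypothetical choices for illustration only.

```python
def fst_two_pops(p_a: float, p_b: float) -> float:
    """Fst at one biallelic locus for two equally weighted populations.

    Uses Fst = (H_T - H_S) / H_T, where H = 2p(1 - p) is the expected
    heterozygosity at allele frequency p.
    """
    p_bar = (p_a + p_b) / 2                # allele frequency after pooling
    h_t = 2 * p_bar * (1 - p_bar)          # heterozygosity of the pooled population
    h_s = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2  # mean within-population value
    if h_t == 0:
        # No variation anywhere: Fst is undefined, 0.0 is returned by convention here.
        return 0.0
    return (h_t - h_s) / h_t


def expected_fst_under_drift(gene_copies: float, generations: int) -> float:
    """Expected divergence from the ancestral population under pure drift.

    Idealized model: F_t = 1 - (1 - 1/copies)^t, where `copies` is the number
    of gene copies transmitted each generation.
    """
    return 1 - (1 - 1 / gene_copies) ** generations


# The two examples from the text.
print(fst_two_pops(1.0, 0.0))  # 1.0: all variance is between populations
print(fst_two_pops(0.5, 0.5))  # 0.0: all variance is within populations

# Hypothetical population of 10,000 breeding individuals, half of them male.
n_individuals = 10_000
autosomal_copies = 2 * n_individuals   # two copies per person
x_copies = 1.5 * n_individuals         # two per female, one per male
generations = 2_000                    # arbitrary illustrative time span

print("autosome:", expected_fst_under_drift(autosomal_copies, generations))
print("X:       ", expected_fst_under_drift(x_copies, generations))
```

Under these assumptions the X value is always the larger of the two, which is the elevated baseline the authors have to account for before arguing that particular regions are outliers. Real analyses use estimators that correct for sample size and unequal population weights, so treat this only as an aid to intuition.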
There's a large space to "random walk" across in terms of gene frequency, and turnover of neutral alleles will produce very different patterns of variation (consider the random patterns generated by scattershot firing of a gun; noise is diverse). But the authors of this paper felt that they saw something else: natural selection acting upon genomic regions, fixing particular alleles, and so producing between-population variation. Here's a figure which illustrates the variation in Fst across the X chromosome. The top two panels are for the HapMap dataset, while the bottom two are for the Perlegen dataset. Additionally, the second panel of each pair shows the clusters of loci above the 99th percentile in Fst across the genome.
And here are the genes around the high-Fst clusters:
Many of these genes sit in regions which exhibit haplotypes on the order of 500 kb long, so it's not surprising that some SNPs within these genes have popped up on tests for detecting natural selection based on haplotype structure. All but one of the genes above are at higher frequency in the derived form in Eurasians than in Africans. Derived means that the younger mutant variant has increased in frequency and replaced the older variant. Interestingly, in Africans the centromeric variant is derived. Here are the frequencies for an SNP at that locus from the HGDP dataset:
Black = ancestral, white = derived.

The authors note that the derived variant in Africans is not a function of Bantu ancestry. In other words, there isn't a simple demographic explanation of this pattern. Here are the authors in the discussion:
The modern Recent African Origin model for human evolution explains the high genetic variation in contemporary African populations, relative to genomic regions with sharply reduced variation in non-Africans, by presupposing that human migrations out of Africa involved strong founder effects. Hence, a combination of genetic drift and local adaptation can readily account for the existence of derived alleles at high frequencies in non-African populations but low frequencies within Africa. Much less is known about African population history, particularly in the past 50,000-100,000 years during which founders of contemporary non-African populations emigrated into Europe and Asia. Our results suggest that a single African population, ancestral to contemporary Africans, may have remained a relatively coherent and local entity long enough for natural selection to sweep the cluster of derived alleles we describe to near fixation. This process would have occurred either after the initial out-of-Africa migrations or, equally as plausible based on current data, in an African population different than the one from which these out-of-Africa migrations occurred. Under this model, the ancestral African population would necessarily have been large to account for both the levels of variation and substructure evident in contemporary African populations.
It is common to say that "we are all Africans," or that Bushmen, for example, are the most "ancient humans." This seems to presuppose that Africans have been genetically stationary, while other groups have gone their own way. But the frequency of the Duffy allele in Africa, a response to malaria which emerged in the last 10,000 years, falsifies this simplistic narrative. All human populations are equally ancient, and all have derived from ancestral populations. There are no living fossils. It is genes, in the form of ancestral alleles, which may be envisaged as "living fossils," not peoples (though some of these genes are subject to great functional constraint, which means that you want to fossilize the good).

Citation: Lambert, Charla A.; Connelly, Caitlin F.; Madeoy, Jennifer; Qiu, Ruolan; Olson, Maynard V.; Akey, Joshua M. doi:10.1016/j.ajhg.2009.12.002