A few days ago I discussed a new paper which explores the patterns of natural selection in the genome of the X chromosome. As you know the X is "carried" disproportionately by females, as males have only one copy, so it offers up an interesting window into evolutionary dynamics (see The Red Queen for a popular treatment). Today Dienekes points me to a new paper in Genome Biology which puts the focus on the X chromosome again, Characterization of X-Linked SNP genotypic variation in globally-distributed human populations:
Background The transmission pattern of the human X chromosome reduces its population size relative to the autosomes, subjects it to disproportionate influence by female demography, and leaves X-linked mutations exposed to selection in males. As a result, the analysis of X-linked genomic variation can provide insights into the influence of demography and selection on the human genome. Here we characterize the genomic variation represented by 16,297 X-linked SNPs genotyped in the CEPH human genome diversity project samples. Results We found that X chromosomes tend to be more differentiated between human populations than autosomes with several notable exceptions. Comparisons between genetically distant populations also showed an excess of X-linked SNPs with large allele frequency differences. Combining information about these SNPs with results from tests designed to detect selective sweeps, we identified two regions that were clear outliers from the rest of the X chromosome for haplotype structure and allele frequency distribution. We were also able to more precisely define the geographical extent of some previously described X-linked selective sweeps. Conclusions The relationship between male and female demographic histories is likely to be complex as evidence supporting different conclusions can be found in the same dataset. Although demography may have contributed to the excess of SNPs with large allele frequency differences observed on the X chromosome, we believe that selection is at least partially responsible. Finally, our results reveal the geographical complexities of selective sweeps on the X chromosome and argue for the use of diverse populations in studies of selection.
The low effective population size of the X chromosome, and the power of drift to produce greater between-population differences, comes up in this paper again, as it did in the one I discussed a few days ago. What's going on here is that random drift has no specific direction, so the random genetic variation which accumulates within the genomes of different populations will tend to be different. A given locus in a large mixed population may have many alleles, a1, a2...an. If you divide the population into smaller clusters which no longer have any contact, and start each cluster with the same allele proportions as the parental population, the frequencies will begin to drift in different directions. The probability of any given allele fixing (reaching 100%) is the same in all populations, but the populations will likely fix different alleles. Ergo, they will start to exhibit greater between-population differences.

This is easily illustrated visually. The colors below represent different alleles. In the parental population three alleles are extant at 1/3, and in the initial daughter populations they are also at 1/3. Over time one notes that in these smaller populations different alleles fix, and the variance between the populations increases. If the X chromosome is assumed to always have a smaller effective population size, then it would be more strongly shaped by these dynamics than the autosomes.
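For readers who would rather run the thought experiment than squint at colors, here is a minimal sketch of it as a Wright-Fisher style simulation. Everything in it is my own illustration rather than anything from the paper: the function names, the population sizes, the number of generations, and the choice of 90 versus 120 gene copies to stand in for an X chromosome at 3/4 the autosomal effective size are all assumptions.

```python
import random
from collections import Counter

ALLELES = ("red", "green", "blue")  # three alleles, as in the illustration


def drift(n_copies, generations, rng=None):
    """Let one isolated population drift; return its final allele frequencies.

    n_copies is the number of gene copies in the population (hypothetical
    values; pick a multiple of 3 to start at exactly 1/3 each).
    """
    rng = rng or random.Random()
    # Start every daughter population at the parental proportions (1/3 each).
    pool = [ALLELES[i % len(ALLELES)] for i in range(n_copies)]
    for _ in range(generations):
        # Each copy in the next generation is drawn at random, with
        # replacement, from the current generation. No selection, no mutation.
        pool = rng.choices(pool, k=n_copies)
    counts = Counter(pool)
    return {allele: counts[allele] / n_copies for allele in ALLELES}


def between_population_variance(freqs, allele):
    """Variance across populations in the frequency of one allele."""
    values = [f[allele] for f in freqs]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


if __name__ == "__main__":
    rng = random.Random(1)
    # Assumed sizes: 120 autosomal copies vs. 90 X copies (3/4 as many).
    autosome = [drift(120, 50, rng) for _ in range(200)]
    x_chrom = [drift(90, 50, rng) for _ in range(200)]
    print("autosome:", between_population_variance(autosome, "red"))
    print("X:       ", between_population_variance(x_chrom, "red"))
```

With enough daughter populations the smaller pool should tend to show the larger between-population variance, though any single run is noisy, which is rather the point.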
In this paper they confirmed that the X chromosome exhibits greater between-population variance, more or less. The table below uses the HGDP data set and clusters populations by region to produce within- and between-population genetic variance statistics. Since the font is small, I will tell you that in general it confirms that the X chromosome shows more between-population variance than the autosomal genome when comparing between continents, but the pattern was less clear within continents, and for East Asian populations the X chromosome is actually less differentiated than the autosomes.
There are patterns in the data here, but it's a little more complex and less stark than the assertion that the X is always more variant between population groups. The authors wisely advise caution in overly general pronouncements on the nature of demographic processes due to inferences made from genomic data, since those inferences may be highly sensitive to the populations sampled. They also wondered if the X chromosome showed different patterns of population genetic substructure than the autosomes, so they compared the X with chromosome 16 using frappe. The X is shown in the top two panels, chromosome 16 in the bottom two. The plot below shows K = 7, that is, 7 putative ancestral groups.
Specific populations are less relevant than the fact that the X chromosome and chromosome 16 seem to exhibit pretty much the same pattern. There are some differences between populations, which might reflect sex-mediated migration or mating. The 7 K clusters seem to map onto 7 geographical regions: Africa, the Middle East, Europe, Central Asia, East Asia, Oceania, and America. I can't understand from history why the Hazara or Uyghur would be more East Asian on the X than on the autosomes, though. In particular in the case of the Hazara one assumes that this admixed group derives from the mating of West Eurasian (Persian) women and East Asian (Mongol) men (an inverse Saberi). There may be limitations of the sample size or the SNPs in their data set.

Next they looked at pairwise allele frequency differences, δ. In short, the bigger the allele frequency differences, the bigger the δ. Our prior assumption is that there will be more high δ results on the X than the autosomes. This is correct, in particular for African vs. non-African pairs. The table below shows the high δ values for three highly distinct and differentiated populations, the French, Han and Yoruba.
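As a rough illustration of the δ statistic described above, here is a small sketch that takes δ to be the absolute difference in a SNP's allele frequency between two populations. The SNP names and frequencies are invented purely for the example and do not come from the paper or the HGDP data; the function names are hypothetical too.

```python
from itertools import combinations


def delta(p1, p2):
    """Absolute allele frequency difference between two populations."""
    return abs(p1 - p2)


def pairwise_delta(freqs):
    """Compute delta for every pair of populations at every SNP.

    freqs maps SNP name -> {population: allele frequency}.
    Returns SNP name -> {(pop_a, pop_b): delta}.
    """
    result = {}
    for snp, by_pop in freqs.items():
        pairs = combinations(sorted(by_pop), 2)
        result[snp] = {(a, b): delta(by_pop[a], by_pop[b]) for a, b in pairs}
    return result


if __name__ == "__main__":
    # Made-up frequencies for two made-up SNPs, for illustration only.
    toy = {
        "snp_1": {"French": 0.10, "Han": 0.15, "Yoruba": 0.95},
        "snp_2": {"French": 0.50, "Han": 0.45, "Yoruba": 0.55},
    }
    for snp, pairs in pairwise_delta(toy).items():
        for (a, b), d in pairs.items():
            print(snp, a, b, round(d, 2))
```

In the toy data the first SNP would count as a high δ marker for the pairs involving the Yoruba, and the second would not stand out for any pair.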
They note that the high δ alleles on the X come in clusters. This is what was reported in the other paper as well. Additionally, it is evident from comparing the number of high δ SNPs with the total number of SNPs on the autosomes and the X that the X is enriched for between-population differences in allele frequency. Not surprising, but nice to be validated.

But the next part is a little complicated. They wondered if the between-population differences were simply due to differences in sex-specific effective population size and sex ratio of migration, Nf/N and mf/m. Remember that if the female effective population is low, that will reduce the effective population of the X, because 2/3 of X chromosomes are in females. Similarly, a strong bias toward male or female migration, and the resulting gene flow across populations, will influence the ratio of δ values on the autosomes and the X. They conclude from their model that demography cannot on its own explain the between-population differences. Rather, they strongly suggest that between-population differences may be due to natural selection. The second table above shows evidence that X high δ markers are overrepresented as genic SNPs; that is, mutations which might actually produce coding changes. This is strongly suggestive of selection. Additionally, they found that there was a skew toward derived SNPs among high δ regions on the X for Africans in relation to the autosomal regions.

Finally, they looked at their data set for signatures of natural selection using haplotype-based tests: iHS, CLR, and XP-EHH. The latter two detect selective sweeps which are almost complete, that is, where the adaptive allele is nearly at 100%. By contrast iHS tends to be better at detecting alleles where the sweep is only partially complete. On the X chromosome they found an association between high δ regions and positive results for the haplotype-based tests of natural selection.
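To put a number on the effective population size point made earlier in this passage, here is a sketch using the standard textbook approximations for a population with Nm breeding males and Nf breeding females: 4·Nm·Nf/(Nm+Nf) for autosomes and 9·Nm·Nf/(4·Nm+2·Nf) for the X. This is not the model the authors fit (theirs also includes sex-biased migration), and the function names and head counts are my own.

```python
def ne_autosome(n_males, n_females):
    """Textbook effective size for autosomes with unequal sex numbers."""
    return 4 * n_males * n_females / (n_males + n_females)


def ne_x(n_males, n_females):
    """Textbook effective size for the X (males carry one copy, females two)."""
    return 9 * n_males * n_females / (4 * n_males + 2 * n_females)


def x_to_autosome_ratio(n_males, n_females):
    """Ratio of X to autosomal effective size; 0.75 when the sexes are equal."""
    return ne_x(n_males, n_females) / ne_autosome(n_males, n_females)


if __name__ == "__main__":
    # Hypothetical breeding numbers, chosen only to show the direction of
    # the effect as the female share of breeders changes.
    for n_males, n_females in [(90, 10), (50, 50), (10, 90)]:
        ratio = x_to_autosome_ratio(n_males, n_females)
        print(n_males, "males,", n_females, "females ->", round(ratio, 3))
```

The ratio falls below 3/4 when breeding females are scarce relative to males and rises above it when they are plentiful, which is why the X/autosome comparison is informative about Nf/N in the first place.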
After they homed in on specific regions where the various methods intersected, they surveyed the literature for genes in those regions which might be of adaptive and/or functional significance. I will leave it to you to look over the genes in detail, but it is interesting to note that one of the genes is a relation of EDAR, though the significance is left rather fuzzy.

The main upshot of this paper seems to be that there are multiple pointers that the peculiarities of the X chromosome cannot be laid entirely at the feet of demographic parameters. That is, some researchers have assumed that the prevalence of patrilocality and polygyny in relation to matrilocality and polyandry, combined with the structural fact that the X is disproportionately carried in females, can explain the differences in patterns of genetic variation. The data here suggest that natural selection may be a necessary supplement to explain what we see. Specifically, the authors point to one way in which the X is exposed to selection to a greater extent than the autosomal genome: males are effectively haploid for most X chromosomal genes because they have only one copy, so recessive alleles are always expressed in males. The sex linkage of traits such as color blindness is the best known result of this phenomenon.

Let's make this concrete. Assume that a gene comes in two flavors, a and b, and that a homozygote b produces a lethal trait. So, assuming random mating:

Frequency of b = 50% in parents, 25% of offspring die
Frequency of b = 10% in parents, 1% of offspring die
Frequency of b = 1% in parents, 0.01% of offspring die

Frequency of b = 50% in parents, 50% of offspring are heterozygous carriers
Frequency of b = 10% in parents, 18% of offspring are heterozygous carriers
Frequency of b = 1% in parents, 2% of offspring are heterozygous carriers

As the frequency of the allele decreases, more and more of the copies of the allele are "masked" from selection in the form of heterozygotes: the share of b copies sitting in homozygotes, and so exposed to selection, equals the frequency of b itself (50%, 10%, 1%).
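The arithmetic above can be checked with a few lines of code. This is only a sketch under Hardy-Weinberg assumptions (random mating, the same allele frequency in both sexes, an equal sex ratio), with function names of my own choosing. It reproduces the 25%, 1% and 0.01% figures as q squared and the 50%, 18% and 2% figures as the heterozygote frequency 2pq, and it adds the share of b copies that are actually exposed. The second function anticipates the X-linked case taken up in the next paragraph.

```python
def autosomal_breakdown(q):
    """For a recessive allele b at frequency q (0 < q <= 1) on an autosome.

    Returns (frequency of bb offspring, frequency of heterozygous carriers,
    share of all b copies that sit in bb homozygotes and are thus exposed).
    """
    p = 1.0 - q
    homozygous_bb = q * q
    carriers = 2 * p * q
    # Each bb individual holds two b copies; each carrier holds one.
    exposed_share = (2 * homozygous_bb) / (2 * homozygous_bb + carriers)
    return homozygous_bb, carriers, exposed_share


def x_linked_exposed_share(q):
    """Share of b copies exposed to selection if the locus is on the X.

    One third of X copies are in males, where b is always expressed; the
    other two thirds are in females, where only the homozygous share (q)
    is exposed.
    """
    return (1 / 3) * 1.0 + (2 / 3) * q


if __name__ == "__main__":
    for q in (0.5, 0.1, 0.01):
        dead, carriers, exposed = autosomal_breakdown(q)
        print(
            f"q={q}: die={dead:.4f}, carriers={carriers:.4f}, "
            f"exposed (autosome)={exposed:.4f}, "
            f"exposed (X)={x_linked_exposed_share(q):.4f}"
        )
```

However rare b becomes, at least a third of its copies on the X remain visible to selection, whereas on an autosome the exposed share shrinks along with the allele frequency.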
By contrast, on the X chromosome 1/3 of the copies are not masked because they are carried by males, who are operationally haploid. This means that on the X chromosome conventionally recessive traits do not always behave as recessive, and so selection is potentially more efficacious in driving allele frequencies to fixation. Ergo, some of the clusters and regions of between-population genomic difference are likely due to local adaptation.

Note: This paper extensively references the framework outlined in two papers which came out of the Pritchard lab earlier this year.

Citation: Characterization of X-Linked SNP genotypic variation in globally-distributed human populations, Amanda M Casto, Jun Z Li, Devin Absher, Richard Myers, Sohini Ramachandran, Marcus W Feldman, Genome Biology 2010, 11:R10. doi:10.1186/gb-2010-11-1-r10