One of the great things about ADMIXTURE is that the population elements shake out of the data through the logic of the program. The worst thing is that it is then left up to you to make sense of the elements. A useful way to use ADMIXTURE and avoid excessive interpretive fogginess is to estimate individual proportions of contribution from X ancestral groups when you have a pretty good idea that an admixture event did occur between very distinct and distantly related population groups.

To some extent the whole New World is a good laboratory for this process. Consider, for example, someone from the Dominican Republic or Puerto Rico. There is a good chance that their ancestry will fractionate into three elements:

- An African one
- An Amerindian one
- A European one

These three elements are sampled from very different locations geographically. The ancestral populations have been separated for tens of thousands of years, with little to no gene flow across them. This means that the allele frequencies of the "source" populations should be relatively different (maximizing Fst). Mapping the inferred allele frequencies of the abstract ancestral populations generated by ADMIXTURE onto the concrete allele frequencies of known source populations is then rather straightforward.

So here's an experiment. I have 40 individuals with non-trivial African admixture. Most of them are African Americans, though some are of Latino heritage, and several are of Ethiopian or Somali origin. A minority are people who have a small quantum of African ancestry, but well above the "noise" threshold. Let's take four populations from the HapMap: Yoruba, Utah whites, Maasai, and Chinese from Beijing. I merged the data (removing problem individuals), and added the aforementioned 40 individuals. I pruned the data set so that no SNP was missing in more than 0.5% of individuals. I was left with ~120,000 markers. Then I did two runs of ADMIXTURE: supervised and unsupervised.
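To make the pruning step concrete, here is a minimal sketch of a per-SNP missingness filter at the 0.5% threshold. It assumes genotypes are held in a NumPy array (individuals by SNPs) with missing calls coded as NaN; the function name and the toy data are made up for illustration. This is not necessarily how the filtering was actually done for this post. In practice a filter like this is usually applied with a dedicated tool (PLINK's `--geno` option, for instance, filters variants by missing call rate).

```python
import numpy as np

def filter_snps_by_missingness(genotypes, max_missing=0.005):
    """Keep SNPs (columns) whose missing-call rate is at most max_missing.

    genotypes: 2-D array, individuals x SNPs, coded 0/1/2, NaN = missing.
    Returns the filtered matrix and a boolean mask of the SNPs kept.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    # Fraction of individuals with a missing call, computed per SNP.
    missing_rate = np.isnan(genotypes).mean(axis=0)
    keep = missing_rate <= max_missing
    return genotypes[:, keep], keep

# Toy data (hypothetical): 400 individuals, 3 SNPs, all homozygous reference.
toy = np.zeros((400, 3))
toy[:1, 1] = np.nan   # SNP 1: 1 of 400 missing (0.25%), should be kept
toy[:3, 2] = np.nan   # SNP 2: 3 of 400 missing (0.75%), should be dropped

filtered, kept_mask = filter_snps_by_missingness(toy)
print(kept_mask, filtered.shape)
```

The threshold is applied per SNP rather than per individual, which matches the description above: a marker is dropped when too many people lack a call for it.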
In the supervised run the HapMap populations were fixed as "pure" references, while in the unsupervised run the HapMap populations also had their ancestries inferred. Here are the population breakdowns for the HapMap populations in the unsupervised run:
OK, so how did the admixed set that I have vary across the two runs? There were four ancestral components, which I labeled:

- West African
- European
- Chinese
- East African

Here are the correlations between the two runs for the 40 individuals:

- West African, 0.9995
- European, 0.9997
- Chinese, 0.9957
- East African, 0.9988

Not too shabby. Here are the barplots side by side:
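For anyone who wants to run the same kind of comparison, here is a rough sketch of how per-component correlations between two runs could be computed. It assumes each run's ancestry fractions have been loaded into a NumPy array with one row per individual and one column per component (ADMIXTURE writes these to a `.Q` file, which `np.loadtxt` can read). The function names and all the numbers below are invented for illustration; they are not the data from this post. One wrinkle the sketch handles: an unsupervised run returns its components in arbitrary column order, so the columns have to be matched up before correlating.

```python
from itertools import permutations

import numpy as np

def align_columns(q_ref, q_other):
    """Reorder q_other's columns to best match q_ref's, by total correlation."""
    k = q_ref.shape[1]
    # cross[i, j] = Pearson r between q_ref column i and q_other column j.
    cross = np.corrcoef(q_ref.T, q_other.T)[:k, k:]
    # Brute force over column orderings; fine for small K (24 orderings at K=4).
    best = max(permutations(range(k)),
               key=lambda p: sum(cross[i, p[i]] for i in range(k)))
    return q_other[:, list(best)]

def component_correlations(q_a, q_b, labels):
    """Pearson r between matching columns of two aligned Q matrices."""
    return {label: float(np.corrcoef(q_a[:, i], q_b[:, i])[0, 1])
            for i, label in enumerate(labels)}

labels = ["West African", "European", "Chinese", "East African"]

# Made-up "supervised" fractions for five individuals (each row sums to 1).
q_sup = np.array([
    [0.80, 0.18, 0.01, 0.01],
    [0.10, 0.85, 0.03, 0.02],
    [0.45, 0.50, 0.04, 0.01],
    [0.20, 0.30, 0.02, 0.48],
    [0.75, 0.15, 0.08, 0.02],
])

# Made-up "unsupervised" result: slightly perturbed, columns shuffled.
delta = np.array([
    [ 0.01, -0.01,  0.00,  0.00],
    [-0.01,  0.01,  0.00,  0.00],
    [ 0.00, -0.01,  0.01,  0.00],
    [ 0.01,  0.00,  0.00, -0.01],
    [ 0.00,  0.00, -0.01,  0.01],
])
q_noisy = q_sup + delta
q_unsup = q_noisy[:, [2, 0, 3, 1]]

aligned = align_columns(q_sup, q_unsup)
corrs = component_correlations(q_sup, aligned, labels)
for label, r in corrs.items():
    print(f"{label}: {r:.4f}")
```

Matching columns by correlation is only one way to do it, and it leans on the components being as distinct as they are here. With closely related ancestral populations the matching itself could become ambiguous.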
This seems like a best-case scenario for ADMIXTURE smoking out population structure. For all that ADMIXTURE is just a "dumb program," when used judiciously it can be very illuminating.