A best case scenario for unsupervised ADMIXTURE?

Gene Expression | By Razib Khan | April 7, 2011 11:59 PM

One of the great things about ADMIXTURE is that the population elements shake out of the data through the logic of the program. The worst thing is that it is then left up to you to make sense of those elements. A useful way to use ADMIXTURE and avoid excessive interpretive fogginess is to estimate individual proportions of contribution from X ancestral groups when you have a pretty good idea that an admixture event did occur between very distinct and distantly related population groups. To some extent the whole New World is a good laboratory for this process. Consider, for example, someone from the Dominican Republic or Puerto Rico. There is a good chance that their ancestry will fractionate into three elements:

- An African one
- An Amerindian one
- A European one

These three elements are sampled from very different locations geographically. The ancestral populations have been separated for tens of thousands of years, with little to no gene flow across them. This means that the allele frequencies of the "source" populations should be relatively different (maximizing Fst). As a result, mapping the inferred allele frequencies of the abstract ancestral populations generated by ADMIXTURE onto the concrete allele frequencies of known source populations is rather straightforward.

So here's an experiment. I have 40 individuals with non-trivial African admixture. Most of them are African Americans, though some are of Latino heritage, and several are of Ethiopian or Somali origin. A minority are people who have only a small quantum of African ancestry, but well above the "noise" threshold. Let's take four populations from the HapMap: Yoruba, Utah whites, Maasai, and Chinese from Beijing. I merged the data (removing problem individuals) and added the aforementioned 40 individuals. I pruned the data set so that no SNP was missing in more than 0.5% of the individuals, which left me with ~120,000 markers. Then I did two runs of ADMIXTURE: supervised and unsupervised.
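The pruning step (dropping any SNP missing in more than 0.5% of individuals) can be sketched in a few lines of Python. This is only an illustration of the filter, not the pipeline used for this post: the function names and the list-of-lists genotype layout (with `None` for a missing call) are assumptions made for the example. In practice a tool such as PLINK would usually do this; its `--geno` option filters variants by missing call rate.

```python
def snp_missing_rates(genotypes):
    """Per-SNP fraction of missing calls.

    genotypes: list of rows, one per individual; each row holds one
    genotype per SNP, with None marking a missing call (assumed layout).
    """
    n_individuals = len(genotypes)
    n_snps = len(genotypes[0])
    return [
        sum(row[j] is None for row in genotypes) / n_individuals
        for j in range(n_snps)
    ]


def prune_snps(genotypes, max_missing=0.005):
    """Keep only SNPs whose missing rate is at most max_missing."""
    rates = snp_missing_rates(genotypes)
    keep = [j for j, rate in enumerate(rates) if rate <= max_missing]
    pruned = [[row[j] for j in keep] for row in genotypes]
    return pruned, keep


if __name__ == "__main__":
    # Toy data: 4 individuals x 3 SNPs (made-up values, not real genotypes).
    toy = [
        [0, 1, None],
        [1, None, 2],
        [2, 0, None],
        [0, 1, 1],
    ]
    pruned, kept = prune_snps(toy, max_missing=0.25)
    print("kept SNP indices:", kept)
```

With only four toy individuals the threshold is loosened to 25% so that the filter has something to keep and something to drop; the default of `0.005` matches the 0.5% cutoff described above.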
In the supervised run the HapMap populations were fixed as "pure" reference groups, while in the unsupervised run the HapMap populations also had their ancestries inferred. Here are the population breakdowns for the HapMap populations in the unsupervised run:

Unsuppops.png

The Maasai are the only group with much intrapopulation variance:

masvar.png
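For readers who want to reproduce the supervised run: as far as I understand the ADMIXTURE manual, supervised mode (`--supervised`) reads a `.pop` file with one line per individual, in the same order as the `.fam` file, giving a population label for each reference individual and `-` for each individual whose ancestry is to be estimated. The sketch below builds such a file's contents. The helper name and the HapMap-style codes (`YRI`, `CEU`, `MKK`, `CHB`) are assumptions chosen for illustration, so check the file format against the manual before relying on it.

```python
def build_pop_file_text(sample_populations, reference_pops):
    """Return the text of a .pop file for a supervised run.

    sample_populations: population tag for each individual, in .fam order
                        (hypothetical tags; anything not in reference_pops
                        is treated as an individual of unknown ancestry).
    reference_pops:     set of tags to hold fixed as "pure" references.
    """
    lines = [
        pop if pop in reference_pops else "-"
        for pop in sample_populations
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed labels for the four HapMap groups named in the post.
    references = {"YRI", "CEU", "MKK", "CHB"}
    # Toy sample order: four reference individuals, then two admixed ones.
    samples = ["YRI", "CEU", "MKK", "CHB", "ADMIXED", "ADMIXED"]
    print(build_pop_file_text(samples, references), end="")
```

The unsupervised run needs no such file, which is why the HapMap individuals get their own inferred proportions there.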

OK, so how did the admixed set that I have vary across the two runs? There were four ancestral components, which I labeled:

- West African
- European
- Chinese
- East African

Here are the correlations between the two runs for the 40 individuals:

- West African, 0.9995
- European, 0.9997
- Chinese, 0.9957
- East African, 0.9988

Not too shabby. Here are the barplots side by side:

sup3.jpg
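A per-component comparison between two runs like the one reported above can be sketched as follows. The post does not say which correlation was used, so Pearson's r is an assumption here, as are the function names and the toy numbers. The sketch also assumes the columns of the unsupervised output have already been matched to the same labels as the supervised output, since an unsupervised run returns its components in arbitrary order.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def component_correlations(run_a, run_b, labels):
    """Correlate each ancestry component across two runs.

    run_a, run_b: lists of rows, one per individual, each row holding the
                  K ancestry proportions in the same column order as labels.
    """
    return {
        label: pearson([row[k] for row in run_a], [row[k] for row in run_b])
        for k, label in enumerate(labels)
    }


if __name__ == "__main__":
    # Made-up proportions for three individuals and two components.
    supervised = [[0.80, 0.20], [0.50, 0.50], [0.10, 0.90]]
    unsupervised = [[0.78, 0.22], [0.52, 0.48], [0.12, 0.88]]
    result = component_correlations(
        supervised, unsupervised, ["West African", "European"]
    )
    for label, r in result.items():
        print(label, round(r, 4))
```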

Here are the runs so you can see them:

sup1.jpg

sup2.jpg

This seems like a best-case scenario for ADMIXTURE smoking out population structure. For all that ADMIXTURE is just a "dumb program," when used judiciously it can be very illuminating.
