
Visualization of genetic distances, part n

Gene Expression | By Razib Khan | April 22, 2011 1:56 AM


Zack Ajmal has been taking his Reference 3 data set for a stroll over at the Harappa Ancestry Project. Or, more accurately, he's been driving his computer to crunch out ADMIXTURE results ascending a ladder of K's. Because it is the Harappa Ancestry Project, Zack's populations are overloaded a touch on South Asians. He managed to get hold of the data set from Reconstructing Indian Population History. If you recall, this paper showed that the South Asian component which falls out of ancestry structure inference algorithms may actually be a stabilized hybrid of two ancient populations, "Ancestral North Indian" (ANI) and "Ancestral South Indian" (ASI). The ANI are a population which can be compared pretty easily to other West Eurasians. There are no "pure" groups of ASI, but the indigenous peoples of the Andaman Islands are the closest, having diverged from the mainland ASI populations tens of thousands of years ago. At K = 11, that is, 11 inferred ancestral populations, Zack seems to have now stumbled onto the patterns which one would expect from this hybrid model of South Asians. Let me quote him:

Now let’s take all the reference populations with an Onge component between 10% to 50% and use the equation above to calculate their ASI percentage. The results are in a spreadsheet. There are several populations with an even higher Ancestral South Indian than any of the Reich et al groups, with Paniya being the highest at 67.4%.

The r-squared between % ASI and % Onge, an Andaman group, is 0.994. That means 99.4% of the variation in the former can be explained by variation in the latter. The % ASI is consistently higher than the % Onge. Why? The last common ancestors of the Andaman Islanders and the ASI diverged on the order of tens of thousands of years ago. Dienekes has observed that ADMIXTURE needs good reference populations, and the Onge have been diverged from the mainland ASI populations for so long that they are not a perfect proxy for this ancient group. But the underestimate seems to be systematically biased in the same direction, which explains the good fit between the two trends.

Zack naturally generated a pairwise matrix of Fsts between these inferred ancestral populations. Remember, Fst is the proportion of the total genetic variance across two populations that lies between them rather than within them. So it's a rough measure of genetic distance. Here's the matrix. I've renamed some populations:


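| | S Asian | Andaman | E Asian | SW Asian | European | Siberian | W African | Papuan | Amerindian | Khoisan/Pygmy | E African |
|---|---|---|---|---|---|---|---|---|---|---|---|
| S Asian | 0 | 0.165 | 0.121 | 0.09 | 0.071 | 0.134 | 0.184 | 0.21 | 0.175 | 0.261 | 0.15 |
| Andaman | 0.165 | 0 | 0.122 | 0.161 | 0.152 | 0.144 | 0.224 | 0.209 | 0.207 | 0.304 | 0.195 |
| E Asian | 0.121 | 0.122 | 0 | 0.152 | 0.137 | 0.067 | 0.216 | 0.205 | 0.139 | 0.294 | 0.187 |
| SW Asian | 0.09 | 0.161 | 0.152 | 0 | 0.048 | 0.163 | 0.179 | 0.235 | 0.208 | 0.257 | 0.143 |
| European | 0.071 | 0.152 | 0.137 | 0.048 | 0 | 0.143 | 0.186 | 0.223 | 0.178 | 0.261 | 0.148 |
| Siberian | 0.134 | 0.144 | 0.067 | 0.163 | 0.143 | 0 | 0.232 | 0.228 | 0.141 | 0.311 | 0.203 |
| W African | 0.184 | 0.224 | 0.216 | 0.179 | 0.186 | 0.232 | 0 | 0.286 | 0.281 | 0.123 | 0.059 |
| Papuan | 0.21 | 0.209 | 0.205 | 0.235 | 0.223 | 0.228 | 0.286 | 0 | 0.29 | 0.367 | 0.26 |
| Amerindian | 0.175 | 0.207 | 0.139 | 0.208 | 0.178 | 0.141 | 0.281 | 0.29 | 0 | 0.364 | 0.252 |
| Khoisan/Pygmy | 0.261 | 0.304 | 0.294 | 0.257 | 0.261 | 0.311 | 0.123 | 0.367 | 0.364 | 0 | 0.133 |
| E African | 0.15 | 0.195 | 0.187 | 0.143 | 0.148 | 0.203 | 0.059 | 0.26 | 0.252 | 0.133 | 0 |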
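To make the definition of Fst given before the matrix concrete, here is a minimal Python sketch of the textbook heterozygosity-based calculation for a single biallelic locus in two equally weighted populations. The function names and the example allele frequencies are illustrative assumptions, and this is not necessarily the estimator that produced the values in the matrix.

```python
def expected_het(p):
    """Expected heterozygosity 2p(1-p) for an allele at frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_pops(p1, p2):
    """Textbook Fst = (H_T - H_S) / H_T for one biallelic locus.

    p1, p2: frequency of the same allele in each population.
    Assumes the two populations are weighted equally (an assumption
    made for this sketch, not something stated in the article).
    """
    h_within = (expected_het(p1) + expected_het(p2)) / 2.0  # H_S
    p_bar = (p1 + p2) / 2.0
    h_total = expected_het(p_bar)                           # H_T
    if h_total == 0.0:
        # The locus is fixed for the same allele in both populations,
        # so there is no variance to partition.
        return 0.0
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    for p1, p2 in [(0.5, 0.5), (0.2, 0.8), (1.0, 0.0)]:
        print(p1, p2, round(fst_two_pops(p1, p2), 3))
```

Identical allele frequencies give an Fst of 0 (all variance is within populations), while populations fixed for different alleles give an Fst of 1 (all variance is between them).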


The South Asian population above is very different from the components you've seen before. It seems equivalent to ANI more than anything else. This is a good reminder that the labels we're giving to these ancestral groups are mnemonics; they're not to be taken literally or concretely. Personally I find Fst matrices hard to read, so I've generated a number of multidimensional scaling plots illustrating the relationships within the matrix. Clarity can be achieved by mixing & matching the populations, so that's what I did. Also, I only display dimension 1 and dimension 2. Remember that dimension 1 is the one with more weight.
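For readers who want to try something similar, here is a minimal sketch of classical (metric) multidimensional scaling applied to the Fst matrix, written in Python with NumPy. It is an assumption that this resembles how the plots were made; the post does not say which tool or MDS variant was used. The sketch treats Fst directly as a distance, enters the lower triangle of the matrix and mirrors it, and uses helper names (`build_symmetric`, `classical_mds`) that are purely illustrative.

```python
import numpy as np

LABELS = ["S Asian", "Andaman", "E Asian", "SW Asian", "European", "Siberian",
          "W African", "Papuan", "Amerindian", "Khoisan/Pygmy", "E African"]

# Lower triangle of the Fst matrix, one row per population (diagonal included).
LOWER = [
    [0.0],
    [0.165, 0.0],
    [0.121, 0.122, 0.0],
    [0.09, 0.161, 0.152, 0.0],
    [0.071, 0.152, 0.137, 0.048, 0.0],
    [0.134, 0.144, 0.067, 0.163, 0.143, 0.0],
    [0.184, 0.224, 0.216, 0.179, 0.186, 0.232, 0.0],
    [0.21, 0.209, 0.205, 0.235, 0.223, 0.228, 0.286, 0.0],
    [0.175, 0.207, 0.139, 0.208, 0.178, 0.141, 0.281, 0.29, 0.0],
    [0.261, 0.304, 0.294, 0.257, 0.261, 0.311, 0.123, 0.367, 0.364, 0.0],
    [0.15, 0.195, 0.187, 0.143, 0.148, 0.203, 0.059, 0.26, 0.252, 0.133, 0.0],
]


def build_symmetric(lower):
    """Mirror a lower-triangular list of rows into a full symmetric matrix."""
    n = len(lower)
    d = np.zeros((n, n))
    for i, row in enumerate(lower):
        for j, value in enumerate(row):
            d[i, j] = d[j, i] = value
    return d


def classical_mds(d, k=2):
    """Classical MDS: double-center the squared distances, then eigendecompose.

    Returns (coords, eigvals) for the k largest eigenvalues, largest first.
    """
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    vals, vecs = np.linalg.eigh(b)           # eigh returns ascending order
    order = np.argsort(vals)[::-1][:k]       # indices of the k largest
    top_vals = vals[order]
    # Clip guards against small negative eigenvalues (Fst is not exactly Euclidean).
    coords = vecs[:, order] * np.sqrt(np.clip(top_vals, 0.0, None))
    return coords, top_vals


D = build_symmetric(LOWER)
coords, eigvals = classical_mds(D, k=2)

for label, (x, y) in zip(LABELS, coords):
    print(f"{label:14s} dim1={x: .3f} dim2={y: .3f}")
```

The eigenvalues come back sorted from largest to smallest, which is the sense in which dimension 1 carries more weight than dimension 2. Dropping rows and columns from the matrix before calling `classical_mds` is one way to mix and match subsets of populations.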

Do not think of these as real, concrete populations from which all modern populations emerged. These eleven populations are abstractions which fulfill the dictates of the algorithm. But I do think that, with that caveat in mind, there are suggestive patterns. First, the "SW Asian" component isn't that much closer to "W Africans" than the other West Eurasian groups are. Yet we know that in reality Southwest Asian populations are closer to Africans. What's going on? Southwest Asian populations have African admixture, and that admixture is recent enough that it shakes out rather easily. This is in contrast to the usual South Asian modal components, which reflect a greater time since admixture: the mixing was thorough enough that it is not trivial to tease the two ancestral groups out of each other's genetic background. Fission and fusion are normal parts of the history of any geographically expansive species. ADMIXTURE will capture the early stages of a fusion. But after a long enough period of time that fusion becomes its own distinctive element.

There is the conventional east-west division you see in Eurasia on PCA, but you see evidence of a secondary north-south component on these plots too. The Andaman populations are closer to East Eurasians than to West Eurasians, but they also occupy their own position, which highlights a north-south axis.

Finally, the S. Asian/ANI population seems somewhat closer to "Europeans" than to "SW Asians." That is interesting. But this is where you have to be very careful and remember that these "pure" ancestral components can themselves fractionate into constituent elements at higher K's or when you constrain the data set appropriately (Africans and inbred groups tend to hog clusters in ADMIXTURE). If you've read all the genome bloggers you will be aware that the "European" and "SW Asian" components themselves break apart upon closer inspection. The "SW Asian" component usually divides into a northern and a southern branch. The northern branch is often positioned closer to the other "European" groups than it is to the southern branch in terms of genetic distance. Here is a selection of West Eurasian groups sorted by their "S Asian" proportion:


Also observe that the distance between SW Asians and Europeans is smaller than that between Europeans and S Asians. Cranking up the K's, or limiting the data set to West Eurasian groups, would probably show more fine-grained relationships.

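| Population | South Asian % |
|---|---|
| Iranians | 30% |
| Lezgins (Caucasian) | 29% |
| Georgians (Caucasian) | 26% |
| Adygei (Caucasian) | 24% |
| Armenians | 22% |
| Turks | 21% |
| Syrians | 19% |
| Druze | 18% |
| Lebanese | 17% |
| Samaritans | 16% |
| Palestinian | 15% |
| Cypriots | 14% |
| Saudis | 14% |
| Yemenis | 14% |
| Russian | 8% |
| Tuscans | 7% |
| Hungarians | 7% |
| Utah whites | 7% |
| Orcadian | 5% |
| British | 5% |
| French | 5% |
| Italian | 5% |
| Finnish | 4% |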
