Zack Ajmal has been taking his Reference 3 data set for a stroll over at the Harappa Ancestry Project. Or, more accurately, he's been driving his computer to crunch out ADMIXTURE results up a ladder of K's. Because it is the Harappa Ancestry Project, Zack's populations are weighted a touch toward South Asians. He managed to get a hold of the data set from Reconstructing Indian History. If you will recall, this paper showed that the South Asian component which falls out of ancestry structure inference algorithms may actually be a stabilized hybrid of two ancient populations, "Ancestral North Indian" (ANI) and "Ancestral South Indian" (ASI). ANI are a population which can be compared pretty easily to other West Eurasians. There are no "pure" groups of ASI, but the indigenous peoples of the Andaman Islands are the closest, having diverged from the mainland ASI populations tens of thousands of years ago. At K = 11, that is, 11 inferred ancestral populations, Zack seems to have now stumbled onto the patterns which one would expect from this hybrid model of South Asians. Let me quote him:
Now let’s take all the reference populations with an Onge component between 10% to 50% and use the equation above to calculate their ASI percentage. The results are in a spreadsheet. There are several populations with an even higher Ancestral South Indian than any of the Reich et al groups, with Paniya being the highest at 67.4%.
The r-squared between % ASI and % Onge, an Andaman group, is 0.994. That means 99.4% of the variation in the former can be explained by variation in the latter. The % ASI is consistently higher than the % Onge. Why? The last common ancestors of Andaman Islanders and the ASI diverged on the order of tens of thousands of years ago. Dienekes observed that ADMIXTURE needs good reference populations, and the Onge have been diverged for so long from the last common ancestor with the mainland ASI populations that they are not a perfect proxy for this ancient group. But it seems that the underestimate is systematically biased in the same direction, which explains the good fit between the two trends.

Zack naturally generated a pairwise matrix of Fsts between these inferred ancestral populations. Remember, the Fst value shows the proportion of the genetic variance in the two populations which can be partitioned between them, rather than within them. So it's a rough measure of genetic distance. Here's the matrix. I've renamed some populations:
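| | S Asian | Andaman | E Asian | SW Asian | European | Siberian | W African | Papuan | Amerindian | Khoisan/Pygmy | E African |
|---|---|---|---|---|---|---|---|---|---|---|---|
| S Asian | 0 | 0.165 | 0.121 | 0.09 | 0.071 | 0.134 | 0.184 | 0.21 | 0.175 | 0.261 | 0.15 |
| Andaman | 0.165 | 0 | 0.122 | 0.161 | 0.152 | 0.144 | 0.224 | 0.209 | 0.207 | 0.304 | 0.304 |
| E Asian | 0.121 | 0.122 | 0 | 0.152 | 0.137 | 0.067 | 0.216 | 0.205 | 0.139 | 0.294 | 0.187 |
| SW Asian | 0.09 | 0.161 | 0.152 | 0 | 0.048 | 0.163 | 0.179 | 0.235 | 0.208 | 0.257 | 0.143 |
| European | 0.071 | 0.152 | 0.137 | 0.048 | 0 | 0.143 | 0.186 | 0.223 | 0.178 | 0.261 | 0.148 |
| Siberian | 0.134 | 0.144 | 0.067 | 0.163 | 0.143 | 0 | 0.232 | 0.228 | 0.141 | 0.311 | 0.203 |
| W African | 0.184 | 0.224 | 0.216 | 0.179 | 0.186 | 0.232 | 0 | 0.286 | 0.281 | 0.123 | 0.059 |
| Papuan | 0.21 | 0.209 | 0.205 | 0.235 | 0.223 | 0.228 | 0.286 | 0 | 0.29 | 0.367 | 0.26 |
| Amerindian | 0.175 | 0.207 | 0.139 | 0.208 | 0.178 | 0.141 | 0.281 | 0.29 | 0 | 0.364 | 0.252 |
| Khoisan/Pygmy | 0.261 | 0.304 | 0.294 | 0.257 | 0.261 | 0.311 | 0.123 | 0.367 | 0.364 | 0 | 0.133 |
| E African | 0.15 | 0.195 | 0.187 | 0.143 | 0.148 | 0.203 | 0.059 | 0.26 | 0.252 | 0.133 | 0 |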
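For readers who, like me, find it easier to query a matrix than to stare at it, here is a minimal Python sketch that loads the values exactly as printed above and looks up pairwise distances. The helper names (`fst`, `nearest`, `asymmetric_cells`) are my own and purely illustrative; nothing here comes from Zack's pipeline. Since an Fst matrix should be symmetric, the last helper lists any cells that disagree with their mirror image. As transcribed in this post, the Andaman / E African pair reads 0.304 in one direction and 0.195 in the other.

```python
# Fst matrix copied verbatim from the table in this post.
LABELS = ["S Asian", "Andaman", "E Asian", "SW Asian", "European", "Siberian",
          "W African", "Papuan", "Amerindian", "Khoisan/Pygmy", "E African"]
ROWS = [
    [0, 0.165, 0.121, 0.09, 0.071, 0.134, 0.184, 0.21, 0.175, 0.261, 0.15],
    [0.165, 0, 0.122, 0.161, 0.152, 0.144, 0.224, 0.209, 0.207, 0.304, 0.304],
    [0.121, 0.122, 0, 0.152, 0.137, 0.067, 0.216, 0.205, 0.139, 0.294, 0.187],
    [0.09, 0.161, 0.152, 0, 0.048, 0.163, 0.179, 0.235, 0.208, 0.257, 0.143],
    [0.071, 0.152, 0.137, 0.048, 0, 0.143, 0.186, 0.223, 0.178, 0.261, 0.148],
    [0.134, 0.144, 0.067, 0.163, 0.143, 0, 0.232, 0.228, 0.141, 0.311, 0.203],
    [0.184, 0.224, 0.216, 0.179, 0.186, 0.232, 0, 0.286, 0.281, 0.123, 0.059],
    [0.21, 0.209, 0.205, 0.235, 0.223, 0.228, 0.286, 0, 0.29, 0.367, 0.26],
    [0.175, 0.207, 0.139, 0.208, 0.178, 0.141, 0.281, 0.29, 0, 0.364, 0.252],
    [0.261, 0.304, 0.294, 0.257, 0.261, 0.311, 0.123, 0.367, 0.364, 0, 0.133],
    [0.15, 0.195, 0.187, 0.143, 0.148, 0.203, 0.059, 0.26, 0.252, 0.133, 0],
]
# Nested dict keyed by row label, then column label.
FST = {a: dict(zip(LABELS, row)) for a, row in zip(LABELS, ROWS)}


def fst(a, b):
    """Look up the Fst printed in row a, column b."""
    return FST[a][b]


def nearest(label):
    """Other components sorted from smallest to largest Fst (row-wise lookup)."""
    others = [(other, FST[label][other]) for other in LABELS if other != label]
    return sorted(others, key=lambda pair: pair[1])


def asymmetric_cells():
    """Pairs whose two mirrored cells disagree, as (a, b, row-a value, row-b value)."""
    found = []
    for i, a in enumerate(LABELS):
        for b in LABELS[i + 1:]:
            if FST[a][b] != FST[b][a]:
                found.append((a, b, FST[a][b], FST[b][a]))
    return found


print(nearest("S Asian")[:3])
print(fst("SW Asian", "European"))
print(asymmetric_cells())
```

Sorting the "S Asian" row this way puts "European" (0.071) first and "SW Asian" (0.09) second.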
The South Asian population above is very different from the components you've seen before. It seems equivalent to ANI more than anything else. This is a good reminder that the labels we're giving to these ancestral groups are mnemonics; they're not to be taken literally or as concrete entities. Personally I find Fst matrices hard to read, so I've generated a number of multidimensional scaling plots illustrating the relationships within the matrix. Clarity can be achieved by mixing & matching the populations, so that's what I did. Also, I only display dimension 1 and dimension 2. Remember that dimension 1 is the one with more weight.
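If you want to reproduce something like these plots, here is a rough sketch of classical (Torgerson) multidimensional scaling on a Eurasian subset of the matrix. To be clear about the assumptions: I'm not claiming this is the exact procedure or software behind the plots, the function names are made up for illustration, and the eigenvectors come from a simple power iteration so that the example needs nothing beyond the Python standard library. In practice you'd reach for a numerical library instead.

```python
import math

# Eurasian subset of the Fst matrix in this post (this block is symmetric).
LABELS = ["S Asian", "Andaman", "E Asian", "SW Asian", "European", "Siberian"]
FST = [
    [0.000, 0.165, 0.121, 0.090, 0.071, 0.134],
    [0.165, 0.000, 0.122, 0.161, 0.152, 0.144],
    [0.121, 0.122, 0.000, 0.152, 0.137, 0.067],
    [0.090, 0.161, 0.152, 0.000, 0.048, 0.163],
    [0.071, 0.152, 0.137, 0.048, 0.000, 0.143],
    [0.134, 0.144, 0.067, 0.163, 0.143, 0.000],
]


def double_center(dist):
    """Gram matrix B = -1/2 * J * D^2 * J used by classical MDS."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def top_eigenpairs(matrix, k=2, iterations=1000):
    """Leading k eigenpairs of a symmetric matrix via power iteration and deflation."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary deterministic start
        for _ in range(iterations):
            w = matvec(work, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigenvalue = sum(a * b for a, b in zip(v, matvec(work, v)))
        pairs.append((eigenvalue, v))
        # Remove the component just found so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                work[i][j] -= eigenvalue * v[i] * v[j]
    return pairs


def classical_mds(dist, k=2):
    """Coordinates are eigenvectors scaled by the square root of their eigenvalues."""
    pairs = top_eigenpairs(double_center(dist), k)
    return [[math.sqrt(max(lam, 0.0)) * vec[i] for lam, vec in pairs]
            for i in range(len(dist))]


coords = classical_mds(FST)
for label, (x, y) in zip(LABELS, coords):
    print(f"{label:10s} dim1={x:+.3f} dim2={y:+.3f}")
```

The "weight" of a dimension here is its eigenvalue, which is why dimension 1 matters most. The sign of each axis is arbitrary, so only relative positions are meaningful. With these six components, dimension 1 should separate the West Eurasian-like components from the East Eurasian ones.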
Do not think of these as real concrete populations from which all modern populations emerged. These eleven populations are abstractions which fulfill the dictates of the algorithm. But I do think that, with that caveat in mind, there are suggestive patterns.

First, the "SW Asian" component isn't that much closer to "W Africans" than the other West Eurasian groups. Yet we know in reality that Southwest Asian populations are closer to Africans. What's going on? Southwest Asian populations have African admixture. And that admixture is recent enough that it shakes out rather easily. This is in contrast to the normal South Asian modal components, which are indicative of a greater time since admixture, which was thorough enough that it is not trivial to tease out the two ancestral groups from each other's genetic background. Fission and fusion are normal parts of the history of any geographically expansive species. ADMIXTURE will capture the earlier parts of fusion. But after a long enough period of time that fusion becomes its own distinctive element.

There is the conventional east-west division you see in Eurasia on PCA, but you see evidence of the north-south secondary component on these plots too. The Andaman populations are closer to East Eurasians than West Eurasians, but they also occupy their own position which highlights a north-south axis.

Finally, the S. Asian/ANI population seems somewhat closer to "Europeans" than "SW Asians." That is interesting. But this is where you have to be very careful and remember that these "pure" ancestral components can themselves fractionate into constituent elements at higher K's or when you constrain the data set appropriately (Africans and inbred groups tend to hog clusters in ADMIXTURE). If you've read all the genome bloggers you will be aware that "European" and "SW Asian" components themselves break apart upon closer inspection. The "SW Asian" component usually divides into a northern and southern branch.
The northern branch is often positioned closer to the other "European" groups than it is to the southern branch in terms of genetic distance. Also observe that the distance between SW Asians and Europeans (0.048) is smaller than that between Europeans and S Asians (0.071). Cranking up the K's, or limiting the data set to West Eurasian groups, would probably show more fine-grained relationships.

Here is a selection of West Eurasian groups sorted by their "S Asian" proportion:
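| Population | South Asian % |
|---|---|
| Iranians | 30% |
| Lezgins (Caucasian) | 29% |
| Georgians (Caucasian) | 26% |
| Adygei (Caucasian) | 24% |
| Armenians | 22% |
| Turks | 21% |
| Syrians | 19% |
| Druze | 18% |
| Lebanese | 17% |
| Samaritans | 16% |
| Palestinian | 15% |
| Cypriots | 14% |
| Saudis | 14% |
| Yemenis | 14% |
| Russian | 8% |
| Tuscans | 7% |
| Hungarians | 7% |
| Utah whites | 7% |
| Orcadian | 5% |
| British | 5% |
| French | 5% |
| Italian | 5% |
| Finnish | 4% |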