There's a new paper out in AJHG, Whole-Genome Genetic Diversity in a Sample of Australians with Deep Aboriginal Ancestry, which I'll hit later. It doesn't have anything too surprising, but the supplements include a figure showing frappe and Structure plots for the HGDP populations as well as the Australian Aboriginal sample. These methods take an individual's genome and assign elements of it to one of K ancestral populations. For African Americans this is highly illuminating, as K = 2 simply breaks down along European/African ancestral lines. The mean turns out to be ~20% for the minority quantum, which matches what had previously been ascertained through genealogy, classical autosomal markers (e.g., Duffy), and the average of uniparental lineages for European ancestry (African Americans tend to be enriched for European Y chromosomal markers and have fewer European mtDNA markers than expected; again, totally intelligible in light of the history of relations in the old South).
These abstractions extract visually intelligible information out of the hundreds of thousands of concrete variant bases within human populations. They have clear and immediate utility when you have some inkling of the population history of a given sample. But when you attempt the same with populations whose histories are less clear and distinct, or which do not have such an obvious and well-known genesis as African Americans, things get murkier. So when it comes to the higher values of K in many of these papers, I avoid reading too much into the results: the human mind is a pattern recognition machine, and it's very easy to tell stories which have no way of being validated or falsified. Most of the authors of these papers seem to agree, as the higher K plots are usually tucked away in the supplements, not the main paper itself.
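To make the K = 2 case concrete, here is a minimal Python sketch of the underlying idea: given reference-allele frequencies for K ancestral populations, estimate one individual's ancestry fractions by expectation-maximization. This is a toy under stated assumptions, not the actual frappe or Structure implementation. It treats the ancestral frequencies as known (the real programs estimate them jointly with the ancestry fractions), and the function name `estimate_admixture` and the marker values are made up for illustration.

```python
def estimate_admixture(genotypes, freqs, n_iter=100):
    """EM estimate of one individual's ancestry fractions.

    genotypes: list of 0/1/2 counts of a reference allele, one per SNP.
    freqs: freqs[k][j] is the (assumed known) reference-allele frequency
           of SNP j in ancestral population k.
    Returns a list q of K ancestry fractions that sum to 1.
    """
    K = len(freqs)
    q = [1.0 / K] * K              # start from equal ancestry
    n_alleles = 2 * len(genotypes)  # two allele copies per SNP
    for _ in range(n_iter):
        counts = [0.0] * K          # expected allele copies from each population
        for j, g in enumerate(genotypes):
            # Joint weight of "copy came from population k and is ref/alt"
            ref = [q[k] * freqs[k][j] for k in range(K)]
            alt = [q[k] * (1.0 - freqs[k][j]) for k in range(K)]
            ref_total, alt_total = sum(ref), sum(alt)
            for k in range(K):
                if g > 0:           # g copies of the reference allele
                    counts[k] += g * ref[k] / ref_total
                if g < 2:           # 2 - g copies of the alternate allele
                    counts[k] += (2 - g) * alt[k] / alt_total
        q = [c / n_alleles for c in counts]
    return q


# Hypothetical toy data: five fully diagnostic SNPs, where the reference
# allele is fixed in population 0 and absent in population 1.
toy_genotypes = [1, 1, 0, 0, 0]
toy_freqs = [[1.0] * 5, [0.0] * 5]
toy_q = estimate_admixture(toy_genotypes, toy_freqs)
```

With fully diagnostic markers like these, the estimate reduces to the share of allele copies that carry the population 0 allele. Real markers are rarely that clean, which is part of why the results get harder to interpret as K goes up.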
But with all that caution entered into the record, I thought that K = 8 in supplemental figure 1 was of some interest, and I want to focus on it just a little bit. I re-edited the figure, removing many populations and placing the frappe and Structure plots at K = 8 next to each other. I also added some population labels for clarity, though if you're familiar with the HGDP data set it's clear what the abbreviations are.
First, at K = 8 it seems pretty clear, even without the known history, that the non-indigenous ancestry of the Australian Aboriginal sample is Western European (Dienekes noted this as well). The only question is which Western Eurasian populations the contribution came from, and this is of some interest because of a possible connection between India and Australia. Many South Asians have a vague resemblance to Australian Aboriginals, and many Indian tribal groups are termed "Australoid." More recently a very distant mtDNA link between Indian tribal groups and Aboriginals has been validated. But that's totally expected, as all populations to the east of South Asia probably went through that region on the way out of Africa. A coalescence time on the order of 50,000 years ago suggests that this is the connection, rather than the more recent migration some have hypothesized, which could have given a phylogenetic basis to the morphological similarities. In the frappe plot, to the right, note that the South Asians are enriched for the orange shaded ancestral group. It's residual in most Europeans, and almost absent in Australian Aboriginals. In the Structure plot, to the left, it's the blue segment which is enriched in South Asians and residual in Europeans. Again, it's nearly absent in Aboriginals. That, combined with the attested high frequency of European diagnostic markers, such as the blue-eye OCA2 SNPs, should seal the deal on whether the current indigenous stock has admixed, since its initial settlement, with any group but Europeans.
But the reason I'm posting isn't Aboriginal genetics. There are a few coarse clusters of human populations: roughly, Amerindians, East Asians, Oceanians, West Eurasians + North Africans, and Sub-Saharan Africans. But within these clusters there are further differences.
Among the Mozabites (an Algerian Berber group with substantial Sub-Saharan African admixture), the Basque, and Sardinians, there seems to be an element which is nearly absent, but which increases in frequency as one goes east toward the heart of Eurasia. I am referring here to the aforementioned segments, the components whose lack suggests that Aboriginals received their non-indigenous ancestry from Europeans. It makes me think about Li et al.'s argument that skewed population coverage has resulted in the omission of a major Central Eurasian ancestral population cluster between those of the west and east. If there was a major demographic pulse out of the center of Eurasia, it would make sense that groups on the western fringe of the World Island, those in the western Mediterranean region, would show the least sign of it. I have no model for what such a pulse would be. Perhaps it wasn't a pulse at all, just isolation-by-distance and clinal variation which pops out in a discrete fashion if one cranks up K. My initial thought was that it tracks the spread of the Indo-European languages, but the component is well represented in the Levant, and the Adygei (ADY) are not Indo-European anyway (though they could be distantly related to Indo-European speakers and so exhibit some of the same genetic variation as the original population). I think there's a good chance that here I'm confusing the analytical methods, frappe and Structure, for reality. But I thought I'd throw it out there since I've noticed this pattern for several years now...