School girls in Hunza, Pakistan

A few days ago I observed that pseudonymous blogger Dienekes Pontikos seemed intent on throwing as much data and interpretation into the public domain via his Dodecad Ancestry Project as possible. What are the long-term implications of this? I know that Dienekes has been cited in the academic literature, but it seems more plausible that this sort of project will simply distort the nature of academic investigation. Distort has negative connotations, but it need not be deleterious at all. Academic institutions have legal constraints on what data they can use and how they can use it (see why Genomes Unzipped started). Not so with Dienekes' project. He began soliciting data ~2 months ago, and Dodecad has already yielded a rich set of results (granted, it would not be possible without academically funded public domain software, such as ADMIXTURE). Even if researchers don't cite his results (and no doubt some will), he's reshaping the broader framework. In other words, he's implicitly updating everyone's priors.

Sometimes it isn't even a matter of new information, as much as putting a spotlight on information which was already there. Below is a slice of a bar plot from "Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation." It uses STRUCTURE with K = 7. To the right of the STRUCTURE slice are two plots of individual data on French and French Basque from the same HGDP data set using ADMIXTURE at K = 10 from Dodecad.
Repeated runs and higher K's make it clear that the French Basque lack a "West Asian" aspect which other French, and Iberians as well, have.
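To make concrete what it means for a population to "lack" a component, here is a minimal sketch of the kind of summary one could compute from ADMIXTURE-style output, where each individual is a row of K ancestry fractions summing to 1. This is not Dienekes' actual pipeline: the function names, the 1% threshold, the component index, the population labels, and every number below are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def mean_component_by_population(q_rows, pop_labels, component):
    """Average one ancestry component's fraction over each population's individuals."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row, pop in zip(q_rows, pop_labels):
        totals[pop] += row[component]
        counts[pop] += 1
    return {pop: totals[pop] / counts[pop] for pop in totals}


def populations_lacking(means, threshold=0.01):
    """Return populations whose mean fraction falls below the (assumed) threshold."""
    return sorted(pop for pop, mean in means.items() if mean < threshold)


# Invented toy data, NOT real results: four individuals, K = 3 components.
toy_q = [
    [0.90, 0.10, 0.00],
    [0.88, 0.12, 0.00],
    [0.80, 0.12, 0.08],
    [0.78, 0.12, 0.10],
]
toy_pops = ["pop_a", "pop_a", "pop_b", "pop_b"]  # hypothetical labels
COMPONENT_OF_INTEREST = 2  # hypothetical column index for the component being examined

means = mean_component_by_population(toy_q, toy_pops, COMPONENT_OF_INTEREST)
lacking = populations_lacking(means)
print(means)
print(lacking)
```

With real data the rows would come from an ADMIXTURE `.Q` file and the labels from the matching sample list. A component that sits at zero for every individual in a group is a stronger signal than a low average, so in practice one would also want to inspect the individual-level values rather than only the means.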
Some of this is clear in the paper I referenced above as well; the key is that you have to look at the supplements at K = 6. Because the Basque are the only native non-Indo-European speakers in Western Europe, their origin and relationship to nearby populations has always been of interest (they also have the highest Rh- frequency of world populations). Granted, the French Basque are very similar genetically to the French as a whole. But it is obviously highly informative that they lack an ancestral component in totality which seems to exist at low but consistent levels across Western European populations. The only other European population at K = 15 that lacks the West Asian component in totality is the Finns (the Lithuanians come very close).

This is all preamble to a discussion of a post Dienekes put up today, "A solution to the problem of Indo-Aryan origins." Remember that Dienekes has been "playing" with ADMIXTURE for only a few months. To claim to have found a 'solution' to a problem as intellectually and politically intractable and explosive as this is rather bold. The crux of the matter is that at certain confluences of K's and population sets Dienekes has discovered a distinctive signature of ancestry which seems to be modal on the north slope of the Caucasus, and spans India and Europe. He terms this "Dagestani," due to the fact that among a population sample from this province in Russia this ancestral component is overwhelmingly dominant. The patterns of Dagestani admixture in Europe and India are curious and suggestive.

1 - In Europe the frequencies are low, but irregularly distributed (excepting around the North Caucasus). Scandinavians and British have appreciable fractions, Finns and Southern Europeans do not. Here's Dienekes:
Interpreting this pattern is not easy, but it does seem that this component seems to have a V-like distribution, achieving its maximum in Caucasus and its environs, then undergoing a diminution, and achieving a secondary (lower) frequency mode in NW Europe. The surprising appearance of the homonymous Dagestan component in India suggests a widespread presence of a common ancestry element. The West Asian element, by comparison seems to have a more normal /-like distribution around its center in Anatolia-Caucasus-Iran region. It does reach the Atlantic coast, but is lacking in Scandinavia and Finland, and also in India itself.
2 - South Indian Brahmins have appreciable fractions, but non-Brahmins in the same region do not. In contrast, those who come from Indo-Aryan speaking backgrounds do seem to have Dagestani ancestral components, irrespective of other aspects of ancestry. For example, Pakistanis don't have that much more Dagestani than South Indian Brahmins or Gujaratis. Also compare the relatively narrow window of Dagestani ancestry variance among Dodecad South Asians (I'm DOD075). DOD088 is, from what I recall, a Reddy from Andhra Pradesh, a non-Brahmin but non-low caste. It is interesting that they have a high proportion of "Pakistan," but no Dagestani. I have ~10% Dagestani, but no Pakistani.

Below is K = 10 for a selection of populations. Dienekes has now included two non-Indo-European speaking Pakistani populations: the Brahui (Dravidian) and Burusho (linguistic isolate in the mountains of Pakistan):