The Pith: There is a very tight correlation between language and genes in the Caucasus region.
If the Soviet Union was "The Prisonhouse of Nations," then the Caucasus region must be the refuge of the languages. Not only is this region linguistically diverse on a fine-grained scale, but there are multiple broader language families which are found nowhere else in the world. The widespread Indo-European languages are represented by Armenians, Greeks, and Iranians. The similarly expansive Altaic languages are represented by the Turkic dialects. But in addition to these well known groups which span Eurasia there are the Northwest Caucasian, Northeast Caucasian, and Kartvelian families. These have only a local distribution despite their distinctiveness. On the one hand we probably shouldn't be that surprised by the prominence of small and diverse language families in this rugged region between Russia and the Near East. Mountains often serve as the last refuges of peoples and cultures being submerged elsewhere. For example, in the mountains of northern Pakistan you have the linguistic isolate of Burushaski, the language of the Burusho, which has no known affinity with other languages. Likely it once had relatives, but they were assimilated, leaving only this last representative isolated in its alpine fastness. The once extensive Sogdian dialects (Sogdian was once the lingua franca between Iran and China) are now only represented by Yaghnobi, which persists in an isolated river valley in Tajikistan. How the mighty have fallen! But the mountains are always the last fortresses to succumb.
But the Caucasus are peculiar for another reason: they're so close to the "action" of history. In fact, history as we know it started relatively near the Caucasus, to the south on the Mesopotamian plain ~5,000 years ago. Therefore we have shadows and glimmers of what occurred on the south Caucasian fringe early on, such as the rise and fall of the kingdom of Urartu ~3,000 years ago. The ancient ancestors of the Georgians even show up in Greek myth, as the Colchis of Medea. And this was a busy part of the world. Hittite, Greek, Roman, and Arab came and went. The rise of Turkic resulted in the marginalization of many of its predecessors. Some scholars even argue that the Indo-European and Semitic language families issue from the north and south fringes of the Fertile Crescent, respectively. And it isn't as if history has skirted the Caucasians. The Georgians faced the brunt of the Mongol armies, while the Circassians have famously been present across the greater Middle East as soldiers and slaves. Ultimately it seems that geography can explain much of the sui generis character of the Caucasus in relation to adjacent regions. The homogenizing impact of large political units such as Byzantium, Persia, the great Arab Caliphates, Russia, and the Ottomans was dampened by the fact that the Caucasus was often administered indirectly. The cost of conquering valley after valley was presumably prohibitive, and the natives could always retreat to the mountains (as the Chechens did most recently in the 1990s).
A new paper in Molecular Biology and Evolution illuminates the genetic relationship of Caucasian peoples, both within the region, and to groups outside of it. Parallel Evolution of Genes and Languages in the Caucasus Region:
We analyzed 40 SNP and 19 STR Y-chromosomal markers in a large sample of 1,525 indigenous individuals from 14 populations in the Caucasus and 254 additional individuals representing potential source populations.
We also employed a lexicostatistical approach to reconstruct the history of the languages of the North Caucasian family spoken by the Caucasus populations. We found a different major haplogroup to be prevalent in each of four sets of populations that occupy distinct geographic regions and belong to different linguistic branches. The haplogroup frequencies correlated with geography and, even more strongly, with language. Within haplogroups, a number of haplotype clusters were shown to be specific to individual populations and languages. The data suggested a direct origin of Caucasus male lineages from the Near East, followed by high levels of isolation, differentiation and genetic drift in situ. Comparison of genetic and linguistic reconstructions covering the last few millennia showed striking correspondences between the topology and dates of the respective gene and language trees, and with documented historical events. Overall, in the Caucasus region, unmatched levels of gene-language co-evolution occurred within geographically isolated populations, probably due to its mountainous terrain.
In some ways this is a paper which would have been more in keeping with the early 2000s. It focuses on Y chromosomal markers, so the direct male lineage. This is in contrast to the sort of analyses which focus on hundreds of thousands of autosomal markers across the genome. But there are some benefits to focusing on Y chromosomal lineages, which are highlighted within this paper. First, one can construct very precise trees based on the mutational distance of individuals. Haplogroups can be subdivided cleanly into haplotypes with treelike phylogenetic relationships by comparing mutational differences. Second, one can use molecular clock methodologies to peg the timing of the separation between two clades. I don't have a good natural grasp of the ethnography of the region, nor am I very well versed in the phylogeography of Y chromosomal lineages (at least in relation to some of the readers of this weblog), so I won't go into specifics much (see Dienekes Pontikos' comments). The main step forward here is the enormous sample size and fine-grained coverage of the ethnic groups across the Caucasus. In a region of such linguistic diversity and geographic fragmentation this is of the essence. They found a 0.64 correlation between variance in genes and language, and a 0.60 correlation between variance in genes and geography. Because geography and language are so tightly linked in the Caucasus they couldn't obtain statistically significant results when one variable was controlled, but language seems to be a bigger predictor than geography.
The following two maps show the distribution of haplogroups across Caucasian populations, as well as how they relate to other groups. A general affinity with Near Eastern groups is evident in these simply through inspection:
In classic fashion the authors found a very tight correlation between the phylogenetic trees generated from Y chromosomes and linguistics (the Dargins being the exception):
Many researchers, such as Marcus Feldman, assume that this sort of correspondence is a natural outgrowth of the fact that gene flow tends to be demarcated by dialect continuums. By this I mean that intermarriage between two groups, all things equal, is going to be favored if there is linguistic comprehensibility. In the pre-modern era, before "standard" languages were codified from on high, this means that genes would flow from tribe to tribe, with subtle differences of dialect, which nevertheless would remain intelligible. That is until you encounter a language family barrier, where despite borrowings across the chasm intelligibility is simply not possible. In the Balkans the Slavic languages of Bulgarian and Macedonian reputedly exhibit a dialect continuum. But the barrier between these two languages and Greek is not just one of subtle shading, but deep differences. This seems to be at work in the Caucasus, where the chasm is even greater in linguistic terms (Greek and Slavic languages are both Indo-European, though I suspect that at that level of distance there isn't much of a difference if it was Greek to Georgian or Slavic to Azeri). There are lots of details in the paper, ranging from a synthesis with archaeological evidence for the development of Caucasian cultural complexes derived from Near Eastern sources, to the timing of the separation between the major language families or sub-families. The weeds here are beyond me to be frank. So what can we generalize from this specific case? At some point in the near future we'll have thick and robust data sets like this for many regions of the world, so this may be a preview of what is to come. This is focusing on the Y chromosomal lineages, and we must remember that male mediated ancestry can exhibit consistent differences from female mediated ancestry.
I am no longer very confident of the finding from comparisons of mtDNA and Y chromosomal variation that the majority of human gene flow has been female mediated because of patrilocality. But this may be at work in some areas. In general the scholars, such as Bryan Sykes, who have looked at the phylogeography of uniparental lineages tend to notice a difference between Y chromosomal and mtDNA patterns, whereby the former were subject to much clearer partitioning between groups (e.g., the Wales-England border) than the latter. The natural inference is that this is a hallmark of "man the warrior," as male lineages eliminate and marginalize each other in the "great game" of genetic competition. Over the short term in the pre-modern world there is a zero sum aspect to this: populations are relatively constant, and so for Genghis Khan to be fruitful other men must be pushed aside. This does not necessarily entail slaughter. Bonded or landless men may not reproduce their genes, or their reproduction may be sharply diminished. A few generations of differential fertility can quickly lead to major differences in the distribution of ancestry. Assume for example that at generation 1 population A outnumbers population B by a factor of 20. Assuming that A has a replication rate of 0.95 per generation and B 1.20 per generation, how many generations would it take for B to overtake A in total numbers? 13 generations. We have examples from the New World where Iberian Y chromosomal lineages have totally replaced Amerindian ones among the racially mixed population, while preserving Amerindian mtDNA. In areas with generations of European male migration the total genome content has become overwhelmingly European, but the mtDNA still shows the signature of the founding Amerindian population. I am willing to bet that for the Caucasus we would see much less distinction on the mtDNA if the same study was replicated with the same individuals.
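The 13-generation figure in the differential fertility example can be checked with a few lines of Python. This is only a sketch under the simplest reading of the example: each population is multiplied by a constant factor every generation, with no mixing between them. The function name and its parameters are my own labels for illustration, not anything from the paper.

```python
import math


def generations_to_overtake(initial_ratio, rate_a, rate_b):
    """Count generations until population B outnumbers population A.

    initial_ratio: how many times larger A is than B at the start (assumed > 1).
    rate_a, rate_b: per-generation multiplication factors (hypothetical names).
    Assumes constant rates and no gene flow between the two populations.
    """
    if rate_b <= rate_a:
        raise ValueError("B never overtakes A unless it grows faster than A")
    size_a, size_b = float(initial_ratio), 1.0
    generations = 0
    while size_b <= size_a:
        size_a *= rate_a
        size_b *= rate_b
        generations += 1
    return generations


# The example from the text: A starts 20 times larger,
# A multiplies by 0.95 and B by 1.20 each generation.
print(generations_to_overtake(20, 0.95, 1.20))  # 13

# Closed form of the same calculation: the smallest whole n with
# 20 * (0.95 / 1.20) ** n < 1.
print(math.ceil(math.log(20) / math.log(1.20 / 0.95)))  # 13
```

The closed form works out to about 12.8 generations, which rounds up to the 13 quoted above.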
The major explanation for why this would not be so from my perspective would be if the original male Near Eastern groups arrived and intermarried with sharply distinctive local female lineages, and these distinctions have been preserved over time through endogamy, whether culturally conditioned (language barriers) or geographically necessitated.
Finally, on the broadest canvas these sorts of findings should make us question the contention that nationality is a totally modern invention. These language and genetic clusters clearly denote populations which have deep differences which have persisted and emerged over thousands of years. This has resulted in a "Balkan powder-keg" in our time (e.g., the Russian government backing the Ossetes against Chechens, and so on). To some extent contemporary conflicts are rooted in the exigencies of the present. But they often also utilize preexistent differences and allegiances which have deep time roots. Dismissing these differences as purely socially constructed epiphenomena is, I think, the wrong way to approach the question.
Citation:
Oleg Balanovsky, Khadizhat Dibirova, Anna Dybo, Oleg Mudrak, Svetlana Frolova, Elvira Pocheshkhova, Marc Haber, Daniel Platt, Theodore Schurr, Wolfgang Haak, Marina Kuznetsova, Magomed Radzhabov, Olga Balaganskaya, Alexey Romanov, Tatiana Zakharova, David F. Soria Hernanz, Pierre Zalloua, Sergey Koshel, Merritt Ruhlen, Colin Renfrew, R. Spencer Wells, Chris Tyler-Smith, Elena Balanovska, & The Genographic Consortium (2011). Parallel Evolution of Genes and Languages in the Caucasus Region. Mol Biol Evol. DOI: 10.1093/molbev/msr126