If you have not read my post "To the antipode of Asia", this might be a good time to do so if you are unfamiliar with the history, prehistory, and ethnography of mainland Southeast Asia. In this post I will focus on mainland Southeast Asia, and how it relates implicitly to India and China genetically, and what inferences we can make about demography and history. Though I will touch upon the Malay peninsula in the preliminary results, I have removed the Indonesian and Philippine samples from the data set entirely. This means that in this post I will not touch upon the spread of the Austronesians.
I present before you two tentative questions:
- What was the relationship of the spread of Indic culture to Indic genes in mainland Southeast Asia before 1000 A.D.?
- What was the relationship of the spread of Tai culture to Tai genes in mainland Southeast Asia after 1000 A.D.?
The two maps above show the distribution of Austro-Asiatic and Tai languages in mainland Southeast Asia. Observe that when you join the two together in a union they cover much of the eastern 2/3 of mainland Southeast Asia. The fragmented nature of Austro-Asiatic languages in the northern region, edging into the People's Republic of China, implies to us immediately that it is likely that in the past there was a continuous zone of Austro-Asiatic speech in this region. From the histories and mythologies of the Tai people we know that this group migrated from the southern fringes of China around 1000 A.D. This is obvious when we note that there are still Tai people in southern China, and the expansion of the Tai across what is today Thailand is to some extent historically attested. Between 1000 and 1500 there was a wholesale ethnic reorganization of the Chao Phraya river basin. Was that a matter of demographic replacement, or cultural assimilation, or some of both?
Second, what was the impact of Indians upon mainland Southeast Asia? One of the easiest ways to ascertain Indian influence is script. Burmese, Thai and Cambodian scripts all derive from Grantha, an archaic Tamil script (non-Islamic scripts in island Southeast Asia, such as Javanese and Balinese, are also derived from South Indian precursors). The Indian religious influences also are more southern than northern, manifesting in the southern forms of Shaivite Hinduism and Sri Lankan Theravada Buddhism.
There are three data sets which I looked at. I ran most of them from K = 2 to K = 12. This means that I threw all the individuals into a common pool and told the ADMIXTURE program to estimate their individual proportions of K number of populations. In this way we can get a general sense of the relationships of the populations. Remember that these aren't necessarily real populations, and, the nature of the variation thrown into the pool impacts the nature of the inferred components greatly. I'm not reporting clear, distinctive, and objective entities extracted out of the data set. We're looking at human intelligible interpretations of the patterns dependent upon the inputs and parameters. They're telling us something real, but this isn't like measuring the acceleration of a falling ball. It's like describing the position of the ball in relation to a different set of reference objects. There's a real ball with a specific position, but the descriptions are going to vary depending on what references you use (e.g., to the left of object A and below B, to the right of object C and above object D, etc.).
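To make the "K = 2 to K = 12" procedure concrete, here is a minimal sketch of how such a sweep could be scripted. It only builds the command lines, one per K, using the basic `admixture <file.bed> <K>` form of the ADMIXTURE command line. The file name `pan_asian.bed` and the helper `admixture_commands` are hypothetical, and whether cross-validation (`--cv`) was used for these runs is not stated, so it is off by default here.

```python
from typing import List


def admixture_commands(bed_file: str, k_min: int = 2, k_max: int = 12,
                       cross_validate: bool = False) -> List[List[str]]:
    """Build one ADMIXTURE command per K, from k_min to k_max inclusive."""
    commands = []
    for k in range(k_min, k_max + 1):
        cmd = ["admixture"]
        if cross_validate:
            cmd.append("--cv")  # optional; an assumption, not something the post states
        cmd += [bed_file, str(k)]
        commands.append(cmd)
    return commands


# Hypothetical input file name, for illustration only.
commands = admixture_commands("pan_asian.bed")
for cmd in commands:
    print(" ".join(cmd))
    # To actually run each one: subprocess.run(cmd, check=True)
```

Each run estimates, for every individual, the proportions assigned to K inferred components. Nothing in the sweep itself says which K is "right", which is why the components have to be read as interpretations rather than fixed entities.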
Here are the sets:
1) A "large" set which includes the mainland Pan-Asian populations, the white Americans from the HapMap, and some Malay peninsular groups.
2) A "medium" set which prunes most of the North Asian groups, Malaysian groups, and the white Americans. So it's mostly mainland Southeast Asia, southern China, and India.
3) A "small" set, which removes many of the Southeast Asian populations, but keeps the Indian ones. I purposely overloaded this set with Indians to examine possibilities of Indian admixture in a few Southeast Asian groups.
Some notes. The Pan-Asian data set has ~56,000 markers. This is tolerable, but not optimal. It's definitely good enough for European vs. Indian vs. East Asian vs. Negrito, but it is less suited to intra-regional variation. So take it with a grain of salt. But since I'm looking at Indian vs. East Asian, I'm mildly confident of that finding in relation to this data set. Second, the intersection of white Americans with the Pan-Asian set was ~30,000 markers. For Cambodians it was only ~22,000. There were ~100 white Americans, but only ~11 Cambodians. Be very cautious of the Cambodian results for this reason. Finally, remember that the ancestral components are abstractions: a stable, long-admixed hybrid population can show up as its own distinct component, and so can a highly inbred isolate.
There are three analyses and visualizations I will display below.
1) ADMIXTURE bar plots, which show the ancestral proportions of groups or individuals of a particular ancestral element.
2) Fst estimates across ancestral elements. This is a rough summary of genetic distance. I'll also show you a two dimensional visualization on occasion, but remember that this removes some relationship information. The table is more accurate, though the visualization is easier to read.
3) Finally, I used EIGENSOFT to run some PCAs. This means that I took the pool of data and allowed the program to extract out the independent dimensions of variation. I ran it so that it pulled out the top 6 dimensions. The west-east dimension is always the largest by many multiples. Remember that the plots are not scaled.
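For readers who want a feel for what "extracting the independent dimensions of variation" means, here is a minimal NumPy sketch of a genotype PCA. It is not EIGENSOFT and not the pipeline used for the plots in this post. The per-SNP scaling is only in the spirit of what smartpca does, the function `top_pcs` is a hypothetical helper, and the genotypes are simulated toy data rather than anything from the Pan-Asian set.

```python
import numpy as np


def top_pcs(genotypes: np.ndarray, n_components: int = 6):
    """PCA on an individuals x SNPs matrix of 0/1/2 allele counts.

    Returns per-individual coordinates and the fraction of total variance
    carried by each of the top n_components dimensions.
    """
    g = genotypes.astype(float)
    p = g.mean(axis=0) / 2.0                 # per-SNP allele frequency
    keep = (p > 0) & (p < 1)                 # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(p * (1 - p))   # center and scale each SNP
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    variance = s ** 2
    explained = variance[:n_components] / variance.sum()
    return coords, explained


# Simulated toy data: two groups whose allele frequencies differ.
rng = np.random.default_rng(0)
n_snps = 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_snps), 0.05, 0.95)
group_a = rng.binomial(2, freq_a, size=(20, n_snps))
group_b = rng.binomial(2, freq_b, size=(20, n_snps))

coords, explained = top_pcs(np.vstack([group_a, group_b]))
print(explained)  # share of variance per dimension, largest first
```

The `explained` vector is the thing to keep in mind when reading unscaled plots: two axes drawn at the same length on the page can carry very different amounts of variance.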
I should also say that the K's I'm showing are the highest I could go before inbred subgroups within the reported populations started breaking out into their own components (this happened especially within the Indians).
Starting at the beginning, I have noticed in the Pan-Asian data set that some groups, particularly Mons and Malays, seem to show Indian admixture. My question: is this really Indian admixture, or perhaps recent European admixture? That's why I had the large data set, with white Americans. Here are the results:
So it seems unlikely that the Mon and Malay admixture with a West Eurasian element is from Europeans. Rather, it is consistent with Indians. In fact, I'm pretty confident it isn't West Asian either, as is a possibility in the case of the Malays, because that component tends to align with Europeans at this scale. Finally, I will tell you that the admixture in both Mon and Malays is relatively even. In other words, the group estimates aren't being shifted by one or two highly admixed individuals, which would be a good tell as to recent intermarriage. That is not unheard of: Mahathir Mohamad's paternal grandfather was a Kerala Muslim.
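One simple way to check whether a group average is "even" or is being driven by a couple of individuals is to compare the mean with the median, and to ask how much of the total component sits in the top two individuals. The sketch below does that. The helper `summarize` is hypothetical, and both lists are made-up numbers chosen to have the same mean, not values from these ADMIXTURE runs.

```python
from statistics import mean, median


def summarize(proportions):
    """Summarize one ancestral component across the individuals in a group."""
    total = sum(proportions)
    top_two = sum(sorted(proportions, reverse=True)[:2])
    return {
        "mean": mean(proportions),
        "median": median(proportions),
        "top_two_share": top_two / total if total else 0.0,
    }


# Made-up illustrative values: same group mean, very different structure.
even = [0.08, 0.10, 0.09, 0.11, 0.07, 0.10, 0.09, 0.08]
skewed = [0.00, 0.01, 0.00, 0.35, 0.00, 0.01, 0.34, 0.01]

even_summary = summarize(even)
skewed_summary = summarize(skewed)
print(even_summary)
print(skewed_summary)
```

In the even case the median sits close to the mean and no pair of individuals dominates. In the skewed case the median collapses toward zero while two people carry nearly all of the component, which is the pattern you would expect from recent intermarriage.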
Now let's look at the PCA. I'll focus on dimensions 1, 2, and 3. Remember that these are the three largest dimensions of genetic variance rank ordered. Dimension one is by far the largest, by a factor of at least five usually in these plots. It's the west vs. east Eurasian dimension.
I've highlighted the important bits. Two notes. First, I think you do see the suggestion that the Mon & Malay are shifted toward the Indians, not the Europeans. This is in perfect alignment with the ADMIXTURE result. Second, please note that the "Indian Singapore" population is heterogeneous. It is mostly Tamil, but there are clearly other Indians in the sample, and, some individuals who have Malay or Chinese ancestry.
Additionally, please note in the ADMIXTURE result above the similarity between the Tai and the Zhuang. The Zhuang are China's second largest ethnic group, and reputedly the source population for the Tai migrations into mainland Southeast Asia. Before I move on, you should have some sense of the locations and ethno-linguistic affinities of some of the more obscure groups:
One aspect which isn't listed here is the classification of some of these populations as "hill tribes" or not. The Mon and the H'tin are both Austro-Asiatic, but the former are in some ways analogous to the Greeks on mainland Southeast Asia, while the latter are a tribal isolate which has preserved its identity in the hills of northern Thailand. By Greeks, I mean that the Mon have been assimilated or dominated by the Bamar in Burma and the Tai in Thailand, but in both cases have imparted to these groups the essence of Southeast Asian Indic high culture. The Mon were at one point ascendant from the lower Irrawaddy in southern Burma to the lower Chao Phraya basin in Thailand, the terminus of which today is Bangkok. In contrast, groups like the H'tin and Lawa were presumably relatively insulated from Indic influence. The Hmong are relative newcomers to Southeast Asia, which explains their status as animists for example. Finally, you have groups like the Wa which are technically not even Southeast Asian, but are Austro-Asiatic. They should give us a sense of Austro-Asiatics without an Indic imprint.
Let's move on to step two, the medium data set. I'm removing the white Americans, Malaysians, and North Asian groups. And now I'm including the Cambodians.
Again, the Mon have the Indian component. And so do the Cambodians. Remember that while everyone else has 56,000 SNPs, the Cambodians only have 22,000, so we need to be careful. That said, you see this element, an Indian affiliated component, in the HGDP runs as well. It's relatively evenly distributed among the Cambodians, so you can't chalk it up to a few admixed individuals. Again, you see the similarity between the Zhuang and the Tai. The main difference is that the Tai seem to have admixed with various Southeast Asian groups. That's to be expected. What surprised me though is that from these results it seems that the Tai expansion was demographically, not just linguistically, dominant. This is clear even in the Bangkok sample. More on this later.
Below are the genetic distances between the inferred ancestral groups. The labels give the modal population, and then the language family:
In this plot you see both the Mon and Cambodians shifted toward the Indians, again. Also, note the Zhuang and the Tai mostly overlap rather well. The y-axis is defined it seems by Austro-Asiatic hill tribes, then the Tibeto-Burman groups, and a gap until you hit the Tai cluster, which eventually merges with the Hmong. There's a reasonable language family affinity here, insofar as the Yao are between the Tai and the Hmong.
Finally, we move to the Indo-centric run. I've removed a lot of the Southeast Asian groups now. Some of the hill tribes are obviously relatively isolated, and so throw up their own clusters or diverge on PCA rather easily. That's a function of genetic differences which build up if you are relatively insulated from gene flow. Because I removed so many populations I'm only left with three K's before you get quasi-family clusters showing up as K's. Also, I'm going to show you individual bar plots for Cambodians and Mon to illustrate that the Indian component isn't just isolated instances of admixture:
OK, first, since this is an Indian focused set, you see that there's more than the standard west-east dimension. You have several lower order dimensions which separate Indians! I had previously assumed that the Indian component which always shows up in the Cambodians in the HGDP was a function of deep ancient ancestry shared with the "Ancestral South Indians" of Reich et al. This ancient population may have had affinities with many groups out toward Southeast Asia, and so the residual cluster in Cambodians may have been part of the deep Ice Age ancestry of this group. These results convince me that the explanation is not so straightforward. In this sample the group with the highest ASI is the Bhils, a tribal population. In one of the plots you see that the Bhils form one end of the distribution, and Gujarat Vaishyas the other. It is clear that this is an Ancestral North Indian-Ancestral South Indian cline. The Mon and Cambodians don't deviate much from the center, suggesting to me that they aren't too skewed toward the ASI! Additionally, the "center" of the distribution is weighted toward caste South Indians. This then is a nice resolution, because it dovetails perfectly with the historical evidence for a South Indian specific influence on Southeast Asia in the early historic period.
This isn't a slam dunk. There need to be estimates of the time since admixture. It should post-date the ANI-ASI admixture event, and be in the same range as the Uyghurs. Unfortunately with only 56,000 SNPs I'm not sure this estimate is possible, but I'll look into it. Additionally, a deeper survey of Y and mtDNA lineages needs to be done in Southeast Asia. They may show sex-biased migration. I did look for the West Eurasian specific SLC24A5 variant, which goes no lower than ~50% in South India, but that's not in the Pan-Asian SNP data set. It is in the HGDP, and none of the 11 Cambodians have it. This would lean toward the ASI hypothesis, but seeing as how the West Eurasian variant may only be at ~50%, and the Cambodians are less than 10% South Asian, it isn't totally implausible that it wouldn't show up in 22 gene copies (using realistic assumptions I get a ~50% probability that a West Eurasian copy of SLC24A5 wouldn't be found in the Cambodians with N = 11).
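The back-of-the-envelope calculation in that parenthesis can be written out. This is a sketch under simplifying assumptions that are mine, not necessarily the ones used above: the 22 gene copies are treated as independent draws, the variant is taken to be absent from the non-South Asian part of Cambodian ancestry, and the South Asian fraction and source frequency are plugged in as round numbers. The function name `prob_no_copies` is hypothetical.

```python
def prob_no_copies(admixture_fraction: float,
                   variant_freq_in_source: float,
                   n_individuals: int) -> float:
    """Probability of seeing zero copies of a variant in a diploid sample.

    Assumes each of the 2 * n_individuals gene copies independently carries
    the variant with probability admixture_fraction * variant_freq_in_source.
    """
    per_copy = admixture_fraction * variant_freq_in_source
    return (1.0 - per_copy) ** (2 * n_individuals)


# Source frequency of ~50% and N = 11 come from the text above;
# the South Asian fractions are assumed values below the stated 10% ceiling.
for fraction in (0.05, 0.06, 0.10):
    print(fraction, round(prob_no_copies(fraction, 0.5, 11), 3))
```

Under these assumptions a South Asian fraction of about 6% gives a probability close to one half of finding no copies at all, and even at the full 10% it is roughly one in three. So the absence of the variant in 11 Cambodians is weak evidence either way.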
I've not devoted too much space to the Tai-Zhuang connection in this post, because it's obvious in the plots. The Tai are obviously somewhat shifted toward Austro-Asiatic groups, but far less than I would have expected. In fact, taking the ADMIXTURE components too literally you might infer that there's been more Tai admixture into the Mon and Khmer than the other way around! This might not be totally implausible when you consider that Thailand's population is nearly five times that of Cambodia. But the standard model I've read suggests that Tai warrior bands conquered the Mon-Khmer indigenes, and absorbed much of their high culture. These results don't cohere easily with that in terms of demographics.
I have a possible explanation for what occurred. Much of Thailand may not have been too populous until the past ~1,000 years, with lowland agriculture being driven by elite direction. The Tai may have brought superior agricultural techniques, and so entered into a phase of rapid population expansion into the lowland frontier, which had no parallel during the Mon and Khmer period of dominance. In other words, the Tai bands were small and initially outnumbered by the Mon and Khmer. But through favorable resource direction and priority allocation of newly arable land to co-ethnics the small Tai population might quickly have come to dominate the previous inhabitants. This is the model which is outlined in the Rise of Islam and the Bengal Frontier. In it the author basically argues that eastern Bengal was lightly populated until large scale Muslim elite driven projects to open up the agricultural frontier. The recruited peasants were either Muslim or converted to Islam, because the cultural landscape was relatively fluid and unsettled, in contrast to the more static peasant economy of western Bengal, which remained Hindu. The Islamicization of eastern Bengal in this model had less to do with the conversion of native tribes, and more to do with the rapid demographic expansion of Bengali peasant colonies which were enabled by agricultural projects, colonies which were Islamicized or were drawn from the minority Muslim peasantry of the western zone by Mughal elites intent on creating a region where the Hindu upper castes were marginalized. Similarly, the Tai expansion in Southeast Asia may have been into a de facto "empty" landscape. During the period when Mon and Khmer high culture was absorbed the Tai may have been the smaller element in terms of numbers. The current ratios are a function of later social and demographic processes.