Human population structure, part n

I still remember when L. L. Cavalli-Sforza's The History and Geography of Human Genes was a candle in the dark, illuminating human history with slivers of genetic data laboriously gathered and analyzed over decades. We've come a long way. Dienekes points me to a new paper, Fine-scaled human genetic structure revealed by SNP microarrays:

We report an analysis of more than 240,000 loci genotyped using the Affymetrix SNP microarray in 554 individuals from 27 worldwide populations in Africa, Asia, and Europe. To provide a more extensive and complete sampling of human genetic variation, we have included caste and tribal samples from two states in South India, Daghestanis from eastern Europe, and the Iban from Malaysia. Consistent with observations made by Charles Darwin, our results highlight shared variation among human populations and demonstrate that much genetic variation is geographically continuous. At the same time, principal components analyses reveal discernible genetic differentiation among almost all identified populations in our sample, and in most cases, individuals can be clearly assigned to defined populations on the basis of SNP genotypes. All individuals are accurately classified into continental groups using a model-based clustering algorithm, but between closely related populations, genetic and self-classifications conflict for some individuals. The 250K data permitted high-level resolution of genetic variation among Indian caste and tribal populations and between highland and lowland Daghestani populations. In particular, upper-caste individuals from Tamil Nadu and Andhra Pradesh form one defined group, lower-caste individuals from these two states form another, and the tribal Irula samples form a third. Our results emphasize the correlation of genetic and geographic distances and highlight other elements, including social factors that have contributed to population structure.
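
For anyone who hasn't run one, the principal components analyses mentioned in the abstract boil down to something like the following. This is only a minimal sketch on simulated genotypes, not the paper's pipeline: the two populations, the SNP count, the allele frequencies, and the function names are all made up for illustration.

```python
import numpy as np


def pca_scores(genotypes, n_components=2):
    """Project individuals onto the top principal components.

    genotypes: (individuals x SNPs) array of allele counts (0, 1, or 2).
    Returns an (individuals x n_components) array of PC scores.
    """
    # Center each SNP column so the PCs describe variation around the mean.
    centered = genotypes - genotypes.mean(axis=0)
    # SVD of the centered matrix; rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Scores are the left singular vectors scaled by the singular values.
    return u[:, :n_components] * s[:n_components]


def simulate_two_populations(n_per_pop=20, n_snps=2000, max_shift=0.1, seed=0):
    """Simulate two hypothetical populations with slightly different allele frequencies."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(0.2, 0.8, size=n_snps)
    # Each SNP drifts a little in opposite directions in the two populations.
    delta = rng.uniform(-max_shift, max_shift, size=n_snps)
    pop_a = rng.binomial(2, base + delta, size=(n_per_pop, n_snps))
    pop_b = rng.binomial(2, base - delta, size=(n_per_pop, n_snps))
    genotypes = np.vstack([pop_a, pop_b]).astype(float)
    labels = np.array(["A"] * n_per_pop + ["B"] * n_per_pop)
    return genotypes, labels


genotypes, labels = simulate_two_populations()
scores = pca_scores(genotypes)
pc1_a = scores[labels == "A", 0]
pc1_b = scores[labels == "B", 0]
print("PC1 range, population A:", pc1_a.min(), pc1_a.max())
print("PC1 range, population B:", pc1_b.min(), pc1_b.max())
```

Plotting the first column of `scores` against the second gives the kind of PC chart discussed in this post. Even modest per-SNP frequency differences add up across thousands of loci, which is why dense SNP arrays can pull apart closely related groups.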

Here are a few charts from the supplement which I've reformatted and labeled for clarity:

A few comments:

Take a look at the differences between the HGDP and non-HGDP samples. These analyses have a fine scale, but they aren't always representative. I've made the complaint before that "South Asia" in the HGDP samples oversamples from the northwest fringe of the region, and you can see that on these plots. Increased sample sizes will probably just bring the coarse patterns into focus, but fine-grained details which surprise us will no doubt be unearthed in the process.

The African PC chart confirms earlier findings on Pygmies and Africa as a whole. I am intrigued, though, by the possibility of fine-grained differences on the "frontier" of the Bantu expansion in South Africa. The Sotho-Tswana group includes populations dominant in Botswana, while the Nguni are common across South Africa. Though the Khoisan populations were extant across both regions, it seems plausible that the ratio of immigrants to natives would be higher in the latter case than the former. In South Africa's climate, particularly the wetter east, the Bantu would have leveraged their cultural advantages (specifically, the type of agriculture which they practiced) into demographic expansion to a greater extent than in relatively arid Botswana. These particular data may, or may not, pan out in the long term, but there may be results which shed light on the ethnogenesis of the groups along the Bantu frontier of expansion, a topic which has had previous political salience (in the form of whites attempting to claim that the Bantu groups were still expanding after the European settlement at the Cape was founded). Due to the lack of written records, genetics can serve as a critical supplement to archaeology.

In the East Asian PC chart I want to point out that L. L. Cavalli-Sforza suggested nearly two decades ago that South Chinese clustered with Southeast Asians, while North Chinese clustered with Northeast Asians. This is obviously a controversial claim for a variety of reasons, and other avenues of data seem to contradict it (e.g., similarities of Y lineages across China proper). It seems to me that looking at the HGDP sample, which Cavalli-Sforza used in his original analysis, might have led to this conclusion, but a larger data set exhibits more continuity. Southeast Asian groups such as the Vietnamese and Thai have a historic origin within what is today southern China, where linguistic relatives of these peoples still remain as national minorities. The question then is how much amalgamation with the locals these populations engaged in after their emigration, and how much amalgamation occurred in China between natives and Han who migrated from the north. Whatever the "final answer" is, I doubt we'll have a scenario where we can dispense with a quantitative qualification of admixture between putative ancestral populations in South China, or for the Thai and Vietnamese (the large-scale assimilation of ethnic Chinese from Fujian into the Thai population will serve as an important confound that one must be cautious of).

It is interesting to note that the Khmer, whose own cultural domain once covered much of what is today Thailand and Vietnam, seem to be more genetically distant from the South Chinese populations, as you would expect, but the sample sizes here are small. Mainland Southeast Asia (Burma to Vietnam) has many relict "Mon-Khmer" populations. In Burma and Thailand the northern populations to a great extent adopted the high culture of the indigenous populations, in particular Indic-flavored Theravada Buddhism. In Vietnam, by contrast, the northern populations retained more of their own cultural uniqueness, as evidenced by their Mahayana Buddhism (there are in Vietnam Malay-speaking Chams who are Saivite Hindus, attesting to the previous ascendancy of Indian culture in that region).

The South Asian PC chart illustrates what I noted above as to the sampling of the HGDP populations; they tend to be as Iranian as they are Indian, and so skew perceptions of the relationship of South Asian populations to other groups. What is new in these data are the relationships of caste populations in Southern India. The non-Brahmin groups are Dalits (Untouchables), while the Irula are an ancient South Indian tribal population. Social historians often assert that the difference between Untouchables and tribals is a temporal one, insofar as the former are ex-tribals who have been integrated in a marginal manner into the mainstream Hindu South Asian culture, while the latter remain outside of it. The Brahmins in South India have historical memory of migration from the north of India. Physically Tamil Brahmins do seem to be different from the general population of Tamil Nadu, with a higher frequency of individuals who exhibit a phenotype more common in northern India. These data confirm previous results which show that caste stratification has a genetic reality. On the other hand they also support our intuition that there is a great deal of similarity between the South Asian groups. The Brahmins of South India have a much larger proportion of ancestry assigned to the "European"* cluster than Dalits. In fact, the Brahmin with the lowest proportion of that ancestry has a higher fraction than the Dalit with the highest proportion. But the Dalits do have some of the European ancestry, while the tribals have very little. Going back to the previous model of assimilation of tribals into the Indian social structure as Dalits, small amounts of intermarriage after the point of integration would easily result in this sort of gene flow.

Though the rough correlation between genetic and social structure exists, there's obviously been gene flow (e.g., Namboodiri Sambandham being a relatively recent instantiation of formalized intercaste relations), and there is also a record of communities going into "uplift" in terms of caste status in the recent past, while other groups have legends of decline. Most of the legends are no doubt myths whose role is to concoct an auspicious origin for a particular group, but there certainly were high-status individuals and groups who fell from power and eminence.

Dienekes alludes to the fact that some Southeast Asian groups seem to share more genetically with Indians than you would expect. There are two obvious explanations for this. One is deep common ancestry: an ancient shared substrate upon which were overlain the genes of populations which moved in from the north (in Southeast Asia) and the northwest (in India). The other is more recent gene flow. In the Indian case some Indo-Aryan South Asian groups in the northeast have obviously assimilated populations which migrated recently from Southeast Asia (the originally Tibeto-Burman Chakma of Bangladesh remain physically distinct and culturally Theravada Buddhists, but now speak a Bengali dialect). But we also have strong historical evidence of trade networks between India and Maritime Southeast Asia, as well as of the powerful Indian cultural influence on Mainland Southeast Asia. If the ancestry was recent, presumably the Indian component in Southeast Asia would be less diverse than in India (a subset), while if it was deep common ancestry then it would be relatively diverse (though if serial founder effects from the Out of Africa migration were operative, perhaps somewhat less diverse in any case). Obviously something for further exploration.

Related: Genetic map of Europe, Genetic map of East Asia, Population Substructure in Japan and South Indian Phylogeography.

* If this is exogenous, then as Dienekes noted it might not be strictly European if the populations originated east of the Urals. "West Eurasian" might be more accurate.
