Proper methods and false results

Gene Expression, by Razib Khan, May 23, 2011 4:07 AM


The Pith: Honorable intent and punctilious adherence to proper form and method do not guarantee a set of results which flesh out a genuine phenomenon. Much of science is tragic.

Most of the time I point to and review papers on this weblog which excite me. But in the interests of "balance," and to dampen the bias toward material I find interesting and salient, I thought it would be worthwhile to look at a paper which I didn't find too interesting. It's in the Journal of Human Genetics, part of the Nature Publishing Group empire. Also, it is open access, so you can read it yourself and make your own individual judgments. The Soliga, an isolated tribe from Southern India: genetic diversity and phylogenetic affinities:

India's role in the dispersal of modern humans can be explored by investigating its oldest inhabitants: the tribal people. The Soliga people of the Biligiri Rangana Hills, a tribal community in Southern India, could be among the country's first settlers. This forest-bound, Dravidian speaking group, lives isolated, practicing subsistence-level agriculture under primitive conditions. The aim of this study is to examine the phylogenetic relationships of the Soligas in relation to 29 worldwide, geographically targeted, reference populations. For this purpose, we employed a battery of 15 hypervariable autosomal short tandem repeat loci as markers. The Soliga tribe was found to be remarkably different from other Indian populations including other southern Dravidian-speaking tribes. In contrast, the Soliga people exhibited genetic affinity to two Australian aboriginal populations. This genetic similarity could be attributed to the ‘Out of Africa’ migratory wave(s) along the southern coast of India that eventually reached Australia. Alternatively, the observed genetic affinity may be explained by more recent migrations from the Indian subcontinent into Australia.

To be blunt about it I think the researchers here just randomly stumbled onto a weird result which happened to align with some plausible preconceptions. This happens all the time, and is responsible for the unfortunate confirmation bias which plagues science. Researchers know very well what the expected results are, and may unconsciously or consciously sift through their data for a set of facts which align well with their theoretical preconceptions. In this case it isn't quite so bald, as there are no orthodoxies, but a set of alternative hypotheses which go back a century or so.

The back story is the idea of the Australoid race, first conceived of by Thomas H. Huxley. To the left is a map which illustrates the original divisions of mankind as inferred by Huxley from his catalog of human characters. I haven't included the labels because they should be rather intuitive. Observe the similar shading of Australia and a portion of India. This is, as economists might say, a 'stylized fact': it captures the basic nugget of truth, but shouldn't be taken as a strict concrete representation of reality. The fact is that upon visual inspection many South Asians, especially those termed adivasi, the "tribal" population which has customarily existed on the margins or outside of the Hindu caste system, obviously bear some resemblance to Australian Aborigines. Additionally some anatomists adduced that there were similarities in the skeletal morphology and the like. I can't evaluate that, but there's a long tradition in biological anthropology which asserts that there is some connection between the peoples of Australia and a substrate element in South Asia. Many South Asians I know can see this resemblance as well, so it isn't as if this was "invented" by Thomas H. Huxley from his fertile mind.

More recently there has been the idea that the Out of Africa migration was characterized by a "southern wave" which skirted the coastlines of the Indian Ocean, and pushed all the way to Australia. The reason that this rapid maritime migration has been posited is that the residence of modern humans in Australia is of long standing, on the order of ~50,000 years. In a traditional genetic model of the emergence of modern humanity that left barely any time between the rise of modern humans in Africa and their arrival in Australia (in contrast, anatomically modern humans didn't arrive in Europe until after 40,000 years before the present, and perhaps a bit later). Obviously any migration of humans from Africa to Australia would have had to touch base in India.
Therefore genetic anthropologists went looking; in particular, they focused on the mitochondrial and Y chromosomal lineages. Eventually they found what they were looking for. At low frequencies in India they detected possible connections to Australian haplogroups. In other words, the ancestors of Australian Aborigines, who had no doubt touched down in India, left some descendants in India. The idea of a southern migration of neo-Africans ~50,000 years ago naturally allowed one to bridge Huxley's model of an "Australoid race," derived from pre-cladistic taxonomy, to the methods of modern genetics. And conveniently for the purposes of time depth, the features of the Australoid race are more clearly represented amongst the tribal and low caste populations which are also presumed to have deeper roots in South Asia.

There are two major problems which jump out at me here though. The first is somewhat theoretical: how exactly does phenotypic continuity get maintained between populations which diverged ~50,000 years ago? According to the older model of modern human origins this isn't really that much later than the last common divergence between all non-Africans, and perhaps even Africans. Did the Australian Aborigines and Indian tribal populations enter into a period of phenotypic stasis? There is a rejoinder here: the connections between Indian tribal populations and Australian Aborigines are far more recent. The arguments, theses, and data to support this conjecture are all laid out in the paper. The most extreme adherents have suggested that in fact a migration occurred to Australia within the last ~5,000 years, which brought the dingo, and that that migration is the common source population of Australian Aborigines and Indian tribes. Both the genetic and archaeological data which might support this model are tendentious. The discussion in the text of the paper doesn't go into the contention and frank politicization which occurred in regards to these theories in Australia.
And why should they? It's a journal of human genetics, not one of the social construction of science. But it's important to keep in mind. But the big issue is that, as they note, surveys of hundreds of thousands of SNPs don't really show a connection between Aborigines and South Asians which is particularly supportive of any strong affinity between the two groups. Projects such as the Harappa Ancestry Project have huge data sets of South Asians, including tribal Indians. At low K's there is some affinity between Papuans and South Asians, but this tends to go away at higher K's. I do think there is some continuity and relationship between Oceanians (Australian Aborigines & Melanesians) and the genetic substrate of South and Southeast Asia, but it is far too attenuated to substantiate the persistence of an Australoid race.

So what's going on with the results in this paper? As I note in the title, the methods are in my opinion kosher from what I can tell. But the conclusion just doesn't seem credible. How to explain the failure of valid methods? First, they use 15 loci. Granted, these are hypervariable regions of the genome which should be ancestrally informative. But it's still 15 markers! Very importantly, the authors note in regards to the Australian Aborigine affiliated Indian tribe:

For example, they possess the lowest number of alleles (115) of all the reference worldwide populations examined...They also display the lowest average observed heterozygosity (0.75643)...The high degree of genetic homogeneity observed could also have been caused, in part, by their low status in the social hierarchy.

I think a plausible explanation for their genetic homogeneity is that like many Indian tribes they have low effective population sizes, and so lost most of their genetic variation because of drift. Take 15 markers, crank them through drift, and I don't think it is implausible that you could random walk a population far away from its neighbors. Indian tribal populations in other analyses seem to exhibit a repeated pattern of strange results because of excessive inbreeding or some sort of population bottleneck in the recent past (think about how the Kalash of Pakistan often break out in their own genetic cluster).

This brings me back to my suspicion that this is just a false positive which bubbled up at the confluence of a preconceived model and the noise which is going to be an issue in any of these statistical genetic analyses. The authors know that Indian tribes should cluster with Australian Aborigines in some models. So when they see one of their several Indian tribal populations clustering with Aborigines on their 15 marker diagnostic, naturally this result is slotted into the prefab model. But as I have hinted before, if you "mix & match" the populations in your data, modulate the marker thickness, and tweak parameters enough, you can "stumble" upon many explanatory models using these algorithms which infer genetic distance and ancestry. I suspect that other research teams using other tribal populations with other STRs may have stumbled onto weirder results, such as a cluster of Indian tribals with Sami or Greenlanders, which were just assumed to be ridiculous on the face of it. This particular result is obviously not ridiculous on the face of it, but I think looking at the full sweep of other genetic results we can discard it as being a good representation of the total genome affinity between these two populations. A reductio ad absurdum of this emphasis on a small marker set was the old attempts to construct races based on blood group distributions!
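The drift argument can be made concrete with a toy simulation. What follows is a minimal sketch, not a reconstruction of the paper's analysis: it assumes a simple Wright-Fisher model, 15 unlinked loci with 8 alleles each, one shared ancestral population, and made-up population sizes (`n_small`, `n_large`) and generation counts. The function names and the crude squared-difference distance are illustrative choices, not anything taken from the authors.

```python
import random


def drift(freqs, n_diploid, generations, rng):
    """Wright-Fisher drift at one locus: redraw 2N gene copies each generation."""
    copies = 2 * n_diploid
    alleles = range(len(freqs))
    for _ in range(generations):
        sample = rng.choices(alleles, weights=freqs, k=copies)
        freqs = [sample.count(a) / copies for a in alleles]
    return freqs


def mean_heterozygosity(pop):
    """Average expected heterozygosity (1 - sum of p^2) over all loci."""
    return sum(1.0 - sum(p * p for p in locus) for locus in pop) / len(pop)


def distance(pop_a, pop_b):
    """Crude distance: mean over loci of half the squared frequency difference."""
    per_locus = [
        sum((p - q) ** 2 for p, q in zip(la, lb)) / 2
        for la, lb in zip(pop_a, pop_b)
    ]
    return sum(per_locus) / len(per_locus)


def simulate(seed=1, loci=15, alleles=8, generations=40,
             n_small=25, n_large=500):
    """Split one ancestor into two large neighbors and one small isolate.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    ancestor = []
    for _ in range(loci):
        weights = [rng.random() for _ in range(alleles)]
        total = sum(weights)
        ancestor.append([w / total for w in weights])
    sizes = {"neighbor_a": n_large, "neighbor_b": n_large, "isolate": n_small}
    return {
        name: [drift(locus, n, generations, rng) for locus in ancestor]
        for name, n in sizes.items()
    }


pops = simulate()
mean_h = {name: mean_heterozygosity(pop) for name, pop in pops.items()}
d_neighbors = distance(pops["neighbor_a"], pops["neighbor_b"])
d_isolate = distance(pops["isolate"], pops["neighbor_a"])

for name, h in mean_h.items():
    print(f"{name}: mean heterozygosity {h:.3f}")
print(f"neighbor_a vs neighbor_b distance: {d_neighbors:.3f}")
print(f"isolate vs neighbor_a distance:    {d_isolate:.3f}")
```

Under these assumed settings all three populations share exactly the same ancestry, yet the small isolate should come out with lower heterozygosity and a larger distance from its neighbors than they show from each other. The sketch says nothing about which distant reference population such an isolate would happen to land near; on only 15 markers that is plausibly a matter of chance.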

Finally, what about old Thomas H. Huxley and his Australoid race? I think that it's probably convergent evolution. Humans come in a range of colors from pink to very dark brown. They don't come in red or yellow or green. They're tall or short. Their hair is curly or straight. And so on. In the finite set of possible variables you're going to have many human populations which arrive at a convergence of traits, and so resemble each other despite lack of particularly recent common ancestry. The Ainu of Japan were once assumed to be a distant branch of the family of European peoples because of their lack of the distinctive characteristics of their Japanese neighbors. Even the early classical genetic markers disabused scientists of this possibility, and more recent genetic work seems to point to a broad affinity with other Siberian populations. Similarly, despite superficial similarities between Melanesians and Africans, the two groups are not particularly close (in fact, most genetic distance measures seem to place Melanesians as more distant from Africans than West Eurasian populations, probably due to greater long term isolation). These sorts of complications are why I'm so obsessed with emphasizing a caution about relying on a particular figure or paper as definitive on a given genetic question. In some domains results can be taken out of their proper context, but in the case of a statistical science there's just a lot of randomness, and our pattern matching intuitions and culturally preconditioned expectations strongly predispose us to anchor onto confirming results. This is a major reason why I'm pretty dismissive and hostile to attempts to "win" arguments by dragging out a few citations. The unfortunate reality is that most results are either trivial or false, and with a search engine you can construct an argument with five supporting facts elementary school style within a few minutes. This may "win" the argument, but you lose the war to "win" an understanding of reality.

Addendum: The undersampling of Australian Aborigine populations and South Asians in surveys of genetic variation softens the force of my critique here. It may be that the Soliga are a particularly distinctive Dravidian tribe, which preserves a very ancient element in South Asian genetic history. Honestly I kind of doubt it after seeing the rampant admixture results among all South Asians in the most recent waves of SNP-chip studies (including the amateurs who are genome blogging). A bigger issue for me is the undersampling of Australian Aborigines. There may be variation which we're just not aware of. I doubt that that variation will be too surprising, but who knows?

Citation:

Morlote DM, Gayden T, Arvind P, Babu A, & Herrera RJ (2011). The Soliga, an isolated tribe from Southern India: genetic diversity and phylogenetic affinities. Journal of Human Genetics, 56 (4), 258-69. PMID: 21307856

Copyright © 2021 Kalmbach Media Co.