PCA plots and trees

Jun 2, 2010 12:52 AM; Nov 20, 2019 2:29 AM


A few years ago I had the pleasure of asking the famed geneticist L. L. Cavalli-Sforza some questions. Here's part of the Q & A which is germane to my post from a few days ago:

7) Question #3 hinted at the powerful social impact your work has had in reshaping how we view the natural history of our species. One of the most contentious issues of the 20th, and no doubt of the unfolding 21st century, is that of race. In 1972 Richard Lewontin offered his famous observation that 85% of the variation across human populations was within populations and 15% was between them. Regardless of whether this level of substructure is of note or not, your own work on migrations, admixtures and waves of advance depicts patterns of demographic and genetic interconnectedness, and so refutes typological conceptions of race. Nevertheless, A.W.F. Edwards, a fellow student of R.A. Fisher, has recently argued that Richard Lewontin's argument neglects the importance of differences in correlation structure across the genome between populations, and focuses only on variance at a single locus. Edwards' argument about the informativeness of correlation structure, and therefore the statistical salience of between-population differences, was echoed by Richard Dawkins in his most recent book. Considering the social import of the question of interpopulational differences as well as the esoteric nature of the mathematical arguments, what do you believe the "take home" message of this should be for the general public?

Edwards and Lewontin are both right. Lewontin said that the between-population fraction of variance is very small in humans, and this is true, as it should be given present knowledge from archeology and genetics alike that the human species is very young. It has in fact since been shown that it is one of the smallest among mammals. Lewontin probably hoped, for political reasons, that it is TRIVIALLY small, and he has never shown, to my knowledge, any interest in evolutionary trees, at least of humans, so he did not care about their reconstruction.
In essence, Edwards has objected that it is NOT trivially small, because it is enough for reconstructing the tree of human evolution, as we did, and he is obviously right.
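To make the Lewontin/Edwards contrast concrete, here is a minimal simulation sketch under loudly stated assumptions: two hypothetical populations whose allele frequencies differ by a small, made-up amount (plus or minus 0.1) at each of 500 independent loci, with Hardy-Weinberg genotypes and no linkage. The per-locus "between-population share" is a simple variance ratio (an F_ST-like analogue, not Lewontin's original diversity statistic), and the assignment step is a likelihood-ratio classifier that is handed the true allele frequencies, which a real analysis would have to estimate. All function names are hypothetical.

```python
import math
import random


def simulate(n_loci=500, n_per_pop=100, delta=0.1, seed=1):
    """Diploid genotypes (0/1/2 allele counts) for two hypothetical populations.

    At every locus population B's allele frequency differs from A's by +/- delta,
    so any single locus is only weakly differentiated.
    """
    rng = random.Random(seed)
    freqs_a, freqs_b = [], []
    for _ in range(n_loci):
        p = rng.uniform(0.2, 0.8)
        freqs_a.append(p)
        freqs_b.append(p + (delta if rng.random() < 0.5 else -delta))

    def individual(freqs):
        # Two independent allele draws per locus: Hardy-Weinberg, no linkage.
        return [(rng.random() < f) + (rng.random() < f) for f in freqs]

    pop_a = [individual(freqs_a) for _ in range(n_per_pop)]
    pop_b = [individual(freqs_b) for _ in range(n_per_pop)]
    return freqs_a, freqs_b, pop_a, pop_b


def between_fraction(pop_a, pop_b, locus):
    """Share of genotype variance at one locus that lies between the two populations."""
    ga = [ind[locus] for ind in pop_a]
    gb = [ind[locus] for ind in pop_b]
    pooled = ga + gb
    mean = sum(pooled) / len(pooled)
    total = sum((g - mean) ** 2 for g in pooled) / len(pooled)
    ma, mb = sum(ga) / len(ga), sum(gb) / len(gb)
    between = (len(ga) * (ma - mean) ** 2 + len(gb) * (mb - mean) ** 2) / len(pooled)
    return between / total if total > 0 else 0.0


def log_lik_ratio(genotype, freqs_a, freqs_b, loci):
    """Sum over loci of log P(genotype | A) - log P(genotype | B); positive favours A."""
    llr = 0.0
    for j in loci:
        p, q, g = freqs_a[j], freqs_b[j], genotype[j]
        llr += g * math.log(p / q) + (2 - g) * math.log((1 - p) / (1 - q))
    return llr


def accuracy(pop_a, pop_b, freqs_a, freqs_b, loci):
    """Fraction of individuals assigned to their own population using the given loci."""
    correct = sum(log_lik_ratio(ind, freqs_a, freqs_b, loci) > 0 for ind in pop_a)
    correct += sum(log_lik_ratio(ind, freqs_a, freqs_b, loci) < 0 for ind in pop_b)
    return correct / (len(pop_a) + len(pop_b))


freqs_a, freqs_b, pop_a, pop_b = simulate()
n_loci = len(freqs_a)
mean_between = sum(between_fraction(pop_a, pop_b, j) for j in range(n_loci)) / n_loci
# Average accuracy when only one locus at a time is used (first 50 loci).
single_locus_acc = sum(accuracy(pop_a, pop_b, freqs_a, freqs_b, [j]) for j in range(50)) / 50
all_loci_acc = accuracy(pop_a, pop_b, freqs_a, freqs_b, range(n_loci))
print(f"mean between-population share of variance per locus: {mean_between:.3f}")
print(f"mean single-locus assignment accuracy: {single_locus_acc:.2f}")
print(f"assignment accuracy using all {n_loci} loci: {all_loci_acc:.2f}")
```

With these invented parameters the per-locus between-population share should come out at only a few percent and single loci should barely beat a coin flip, while the summed evidence across all loci should assign nearly everyone correctly. That is the sense in which both men can be right at once.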

PCA plots show you variation that occurs in a correlated fashion across a set of genes. In other words, they pick out large systematic signals within the sea of noisy genetic variation. In concert with other techniques, they can tell us a great deal about the history of our species, and about the nature and extent of the relationships between populations within it. The reason that variation is correlated across a subset of genes that are highly informative with regard to population identity is simple: members of a human population group generally share a common history. They have been subject to the same evolutionary dynamics, and those dynamics, from drift to selection, have particular effects on the nature of genomic variation (or the lack thereof).

My point in my previous post was to emphasize that this information needs to be integrated into the bigger picture in a nuanced fashion. Broad, systematic, population-wide patterns of variation and between-population variation are important and of great evolutionary interest. But the genetic uniqueness within families, arising from recent, unique de novo mutations (operationally, family-scale private alleles), is also of great interest and importance. PCA plots such as the ones above are naturally not going to tell us much about this aspect of human variation. In the "thought experiment" I presented, I indicated that focusing on the largest signals of between-population variation alone can miss a great deal.
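The idea that PCA picks out variation correlated across many loci can be sketched in a few lines. This is a minimal, dependency-free illustration under assumptions of my own: the same kind of hypothetical two-population simulation (800 independent loci, frequency differences of plus or minus 0.15, 20 individuals per population), a genotype matrix that is only column-centred (many genotype PCA pipelines also standardize each SNP, which is skipped here), and only the first principal component, found by power iteration rather than a full SVD. Function names and parameters are hypothetical.

```python
import math
import random

N_PER_POP = 20


def simulate(n_loci=800, n_per_pop=N_PER_POP, delta=0.15, seed=7):
    """Genotypes (0/1/2) for two hypothetical populations whose allele
    frequencies differ by +/- delta at every locus; A's rows come first."""
    rng = random.Random(seed)
    freqs_a, freqs_b = [], []
    for _ in range(n_loci):
        p = rng.uniform(0.2, 0.8)
        freqs_a.append(p)
        freqs_b.append(p + (delta if rng.random() < 0.5 else -delta))

    def individual(freqs):
        # Two independent allele draws per locus (Hardy-Weinberg, no linkage).
        return [(rng.random() < f) + (rng.random() < f) for f in freqs]

    return ([individual(freqs_a) for _ in range(n_per_pop)]
            + [individual(freqs_b) for _ in range(n_per_pop)])


def first_pc(rows, iterations=50, seed=0):
    """PC1 scores and PC1's share of total variance, via power iteration
    on X X^T where X is the column-centred (individuals x loci) matrix."""
    n, m = len(rows), len(rows[0])
    col_means = [sum(r[j] for r in rows) / n for j in range(m)]
    x = [[r[j] - col_means[j] for j in range(m)] for r in rows]
    cols = list(zip(*x))
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(a * b for a, b in zip(v, col)) for col in cols]  # X^T v
        v = [sum(a * b for a, b in zip(row, w)) for row in x]     # X (X^T v)
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]                                 # keep unit length
    w = [sum(a * b for a, b in zip(v, col)) for col in cols]
    eigenvalue = sum(t * t for t in w)           # v^T X X^T v for unit v
    total = sum(t * t for row in x for t in row)  # trace of X X^T
    scores = [math.sqrt(eigenvalue) * t for t in v]
    return scores, eigenvalue / total


genotypes = simulate()
scores, share = first_pc(genotypes)
pop_a, pop_b = scores[:N_PER_POP], scores[N_PER_POP:]
# The sign of a principal component is arbitrary, so check both orderings.
separated = max(pop_a) < min(pop_b) or max(pop_b) < min(pop_a)
print(f"PC1 explains {share:.1%} of total genotype variance")
print("PC1 separates the two populations:", separated)
```

The point of the toy is that PC1 can account for only a modest slice of the total variance and still split the two groups cleanly, because it aggregates many weak, correlated differences. For real data one would use an SVD from a linear-algebra library rather than this hand-rolled power iteration, which is here only to keep the sketch self-contained.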
