PCA plots and trees

Gene Expression | By Razib Khan | June 2, 2010 12:52 AM



A few years ago I had the pleasure of asking the famed geneticist L. L. Cavalli-Sforza some questions. Here's part of the Q & A which is germane to my post from a few days ago:

7) Question #3 hinted at the powerful social impact your work has had in reshaping how we view the natural history of our species. One of the most contentious issues of the 20th, and no doubt of the unfolding 21st century, is that of race. In 1972 Richard Lewontin offered his famous observation that 85% of the variation across human populations was within populations and 15% was between them. Regardless of whether this level of substructure is of note or not, your own work on migrations, admixtures and waves of advance depicts patterns of demographic and genetic interconnectedness, and so refutes typological conceptions of race. Nevertheless, recently A.W.F. Edwards, a fellow student of R.A. Fisher, has argued that Richard Lewontin's argument neglects the importance of differences of correlation structure across the genome between populations and focuses on variance only across a single locus. Edwards' argument about the informativeness of correlation structure, and therefore the statistical salience of between-population differences, was echoed by Richard Dawkins in his most recent book. Considering the social import of the question of interpopulational differences as well as the esoteric nature of the mathematical arguments, what do you believe the "take home" message of this should be for the general public?

Edwards and Lewontin are both right. Lewontin said that the between populations fraction of variance is very small in humans, and this is true, as it should be on the basis of present knowledge from archeology and genetics alike, that the human species is very young. It has in fact been shown later that it is one of the smallest among mammals. Lewontin probably hoped, for political reasons, that it is TRIVIALLY small, and he has never shown to my knowledge any interest for evolutionary trees, at least of humans, so he did not care about their reconstruction.

In essence, Edwards has objected that it is NOT trivially small, because it is enough for reconstructing the tree of human evolution, as we did, and he is obviously right.
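
The contrast between the single-locus view and the many-loci view can be illustrated with a toy simulation. This is only a sketch under invented assumptions, not anything from Cavalli-Sforza's or Edwards' own analyses: two hypothetical populations, independent loci, and allele frequencies of 0.3 versus 0.7 at every locus (chosen so that the per-locus between-population fraction of variance is about 0.16, close to Lewontin's 15%). The function names `fst_two_pops`, `draw_genotype`, `log_likelihood` and `accuracy` are made up for this example.

```python
import math
import random

def fst_two_pops(p1, p2):
    """Between-population share of allele-frequency variance for two equal-sized populations."""
    p_bar = (p1 + p2) / 2
    var_between = ((p1 - p_bar) ** 2 + (p2 - p_bar) ** 2) / 2
    return var_between / (p_bar * (1 - p_bar))

def draw_genotype(freqs, rng):
    """Diploid genotype: 0, 1 or 2 copies of the reference allele at each locus."""
    return [int(rng.random() < p) + int(rng.random() < p) for p in freqs]

def log_likelihood(genotype, freqs):
    """Log-probability of a genotype given allele frequencies.

    The binomial coefficient is dropped because it is identical for both
    populations and cancels when the two likelihoods are compared.
    """
    return sum(g * math.log(p) + (2 - g) * math.log(1 - p)
               for g, p in zip(genotype, freqs))

def accuracy(n_loci, n_trials=2000, seed=1):
    """Fraction of simulated individuals assigned to their true population."""
    rng = random.Random(seed)
    # Assumption: frequencies are 0.3 vs 0.7, with the direction alternating by locus.
    freqs_a = [0.3 if i % 2 == 0 else 0.7 for i in range(n_loci)]
    freqs_b = [1.0 - p for p in freqs_a]
    correct = 0
    for t in range(n_trials):
        from_a = t % 2 == 0
        genotype = draw_genotype(freqs_a if from_a else freqs_b, rng)
        guess_a = log_likelihood(genotype, freqs_a) > log_likelihood(genotype, freqs_b)
        correct += guess_a == from_a
    return correct / n_trials

if __name__ == "__main__":
    print("per-locus between-population fraction:", fst_two_pops(0.3, 0.7))
    for n in (1, 10, 100):
        print(n, "loci -> assignment accuracy", accuracy(n))
```

With a single locus the assignment is right only about 70% of the time in this setup, which is Lewontin's point. With a hundred such loci the small per-locus differences add up and assignment becomes close to certain, which is Edwards' point. Real loci are neither independent nor uniformly differentiated, so the numbers here are illustrative only.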

PCA plots show you variation that occurs in a correlated fashion across a set of genes. In other words, they pick out the large systematic signals within the sea of noise in genetic variation. They can tell us a great deal, in concert with other techniques, about the history of our species, and the nature and extent of the relationships between populations within our species. The reason that there is correlated variation across a subset of genes which are highly informative with regard to population identity is simple: human population groups generally have a shared history. They have been subject to the same evolutionary dynamics, and those dynamics, from drift to selection, have particular effects on the nature of genomic variation (or lack thereof).

My point in my previous post was to emphasize that this information needs to be integrated into the bigger picture in a nuanced fashion. Broad, systematic, population-wide patterns of variation, and between-population variation, are important and of great evolutionary interest. But the genetic uniqueness within families, from recent de novo mutations (operationally, family-scale private alleles), is also of great interest and importance. PCA plots such as the ones above are naturally not going to tell us much about this aspect of human variation. In the "thought experiment" I presented, I indicated that a focus on the largest signals of between-population variation alone can miss a great deal.
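
To see how a PCA picks up correlated variation of this kind, here is a minimal sketch on simulated data, using only the Python standard library. Everything in it is an assumption made for illustration: two hypothetical populations of 20 individuals, 300 independent loci, and allele frequencies of 0.35 versus 0.65. The first principal component is found by power iteration on the individual-by-individual matrix of centered genotypes, which is one standard way to get it but not necessarily how published plots are produced. The names `simulate` and `first_pc_scores` are invented for this example.

```python
import math
import random

def simulate(n_per_pop=20, n_loci=300, seed=0):
    """Genotype matrix (individuals x loci) for two hypothetical populations."""
    rng = random.Random(seed)
    rows, labels = [], []
    for pop in (0, 1):
        for _ in range(n_per_pop):
            row = []
            for locus in range(n_loci):
                # Assumption: each locus is at 0.65 in one population and 0.35 in
                # the other, with the direction alternating by locus.
                high_in_pop0 = locus % 2 == 0
                p = 0.65 if high_in_pop0 == (pop == 0) else 0.35
                row.append(int(rng.random() < p) + int(rng.random() < p))
            rows.append(row)
            labels.append(pop)
    return rows, labels

def first_pc_scores(rows, n_iter=100, seed=0):
    """Leading eigenvector of the centered Gram matrix (proportional to PC1 scores)."""
    n, m = len(rows), len(rows[0])
    # Center each locus (column) on its mean across individuals.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Individual-by-individual inner products of centered genotypes.
    gram = [[sum(a * b for a, b in zip(ri, rk)) for rk in centered] for ri in centered]
    # Power iteration from a random start; a constant start vector would be
    # annihilated because centering makes every row of the Gram matrix sum to zero.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(n_iter):
        w = [sum(g * x for g, x in zip(gram_row, v)) for gram_row in gram]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

if __name__ == "__main__":
    rows, labels = simulate()
    for label, score in zip(labels, first_pc_scores(rows)):
        print(label, round(score, 3))
```

In this setup the two simulated populations fall on opposite sides of the first component, even though no single locus separates them. A family-specific de novo mutation, by contrast, would show up in only one or two rows of the matrix and would contribute almost nothing to the leading component, which is why plots like this say little about that kind of variation.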
