The residual of the genes & geography correlation

Gene Expression | By Razib Khan | February 28, 2011 3:14 PM




David of the Eurogenes Genetic Ancestry Project has a cautionary post up, When is a genetic map also a geographic map? Always and never. In it, he uses one peculiar pattern as a launching point into a broader exploration of the relationship between visualizations of genetic variation and geography. That pattern is that Russians, the geographically easternmost of European peoples, are closer to the Slavs of Central Europe than the Balts are when plotted on the two largest dimensions of variation. I've highlighted this pattern from a PCA David extracted from a paper on northeast European genetics.

This disjunction between geography and genetics has a pretty straightforward possible explanation: the current distribution of Russian-speaking peoples is a function of a massive demographic expansion to the east by Slavic farmers within the last 2,000 years. We already know that the borderlands between the steppe and the forest were long dominated by North Iranian peoples, from the Scythians to the Sarmatians, while further north the Great Russians absorbed a Finnic substrate (clear because some of the absorption is attested down to the early modern period).

With that duly noted, I think there's definitely some value in more rigorously estimating the deviations from expectation when one attempts to generate a correspondence between a PCA and a geographic map. What I'm imagining is that you simply enter the positions of various ethnic groups on a real map, then overlay the PCA with its ethnic labels on top of that map and shift it until you maximize the correlations. When the correlations are maximized, stop, and then note where the greatest deviations from expectation are. Taking the example above, a vast swath of eastern Europe would show up as a major deviation.

Some of these peculiarities will be due to geography. The chasm between Africans and non-Africans will probably be greater than one would expect as a function of distance, but the intervening Sahara presents itself as a good cause. But when you look at the genetic data, sometimes strange and unexpected correspondences emerge. If one can't immediately spot a reason, then that bears further investigation.

As I've given this some thought, I guess I should admit that I've fiddled with R's mapping functions, and also looked for other applications. But the labor input is such that I've put off getting deeper into this topic. I'd be curious if anyone else is interested in this sort of intersection between genetic and geographic data visualization. I think maps are pretty much informational gold.
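
For anyone who wants to try the overlay-and-shift idea, here is a rough sketch in Python. It is only one possible reading of the procedure: it swaps in a least-squares Procrustes fit (translation, uniform scaling, and rotation or reflection) in place of "shift until the correlations are maximized", and that choice is an assumption, not something the post specifies. The function name `procrustes_residuals`, the group labels, and all coordinates are hypothetical; the data are synthetic, built so that one group is deliberately displaced.

```python
import numpy as np

def procrustes_residuals(geo, pcs):
    """Fit PCA coordinates onto map coordinates and measure the misfit.

    geo: (n, 2) array of map positions, one row per group.
    pcs: (n, 2) array of positions on the first two principal components.
    Returns the aligned PCA coordinates and each group's residual distance,
    expressed in the units of geo.
    """
    geo = np.asarray(geo, dtype=float)
    pcs = np.asarray(pcs, dtype=float)
    geo_mean = geo.mean(axis=0)
    g = geo - geo_mean              # centre the map coordinates
    p = pcs - pcs.mean(axis=0)      # centre the PCA coordinates
    # Orthogonal Procrustes: the rotation/reflection minimising ||s * p @ R - g||
    u, s, vt = np.linalg.svd(p.T @ g)
    rotation = u @ vt
    scale = s.sum() / (p ** 2).sum()
    aligned = scale * (p @ rotation) + geo_mean
    residuals = np.linalg.norm(aligned - geo, axis=1)
    return aligned, residuals

# Synthetic example (made-up numbers): nine groups on a 3 x 3 grid.
labels = [f"group_{c}" for c in "ABCDEFGHI"]
geo = np.array([[x, y] for y in (0.0, 10.0, 20.0) for x in (0.0, 10.0, 20.0)])

# Fake "PCA" positions: the map rotated, shrunk, and shifted ...
theta = np.deg2rad(40.0)
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
pcs = 0.01 * geo @ rot + np.array([0.3, -0.2])
# ... with one group (group_E) pushed away from where geography predicts.
pcs[4] += 0.01 * np.array([5.0, 0.0]) @ rot

aligned, residuals = procrustes_residuals(geo, pcs)
for i in np.argsort(residuals)[::-1]:
    print(f"{labels[i]}: residual {residuals[i]:.2f}")
```

The groups at the top of the printed list are the ones whose position on the genetic plot departs most from their position on the map, which is where one would start looking for a historical or geographic explanation. A least-squares fit can be pulled around by extreme outliers, so a more robust fitting criterion might be worth trying on real data.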
