Beyond visualization of data in genetics

Gene Expression
By Razib Khan
May 31, 2010 4:08 PM



Hopefully by now the image to the left is familiar to you. It's from a paper in Human Genetics, "Self-reported ethnicity, genetic structure and the impact of population stratification in a multiethnic study." The paper is interesting in and of itself, as it combines a wide set of populations and puts the focus on the extent of disjunction between self-identified ethnic identity and the population clusters which fall out of patterns of genetic variation. In particular, the authors note that the "Native Hawaiian" identification in Hawaii is characterized by a great deal of admixture, and within their sample only ~50% of the ancestral contribution within this population was Polynesian (the balance split between European and Asian). The figure suggests that subjective self-assessment of ancestral quanta is generally accurate, though there is a non-trivial number of outliers. Dienekes points out that the same dynamic holds (less dramatically) for the European and Japanese populations within their data set.

All well and good. And I like these sorts of charts because they're pithy summations of a lot of relationships in a comprehensible geometrical fashion. But they're not reality, they're a stylized representation of a slice of reality, abstractions which distill the shape and processes of reality. More precisely, the x-axis is an independent dimension of correlations of variation across genes which can account for ~7% of the total population variance. This is the dimension with the largest magnitude. The y-axis is the second largest dimension, accounting for ~4%. The magnitudes decline precipitously as you descend the rank order of the principal components. The fifth component accounts for ~0.2% of the variance.

The first two components in these sorts of studies usually conform to our intuitions, and add a degree of precision to various population scale relations. Consider this supplementary chart from a 2008 paper (I've rotated and reedited for clarity):

The first component separates Africans from non-Africans, the latter being a derived population from a subset of the former. The second component distinguishes West Eurasians from East Eurasians & Amerindians. These two dimensions and the distribution of individuals from the Human Genome Diversity Project reiterate what we know about the evolutionary history of our species. And yet I wonder if we should be careful about the power of these two-dimensional representations in constraining us excessively when we think about genetic variation and dynamics. Naturally, the character of the dimensions is sensitive to the nature of the underlying data set on which they rely. But consider this thought experiment:

Father = Japanese
Mother = Norwegian
Child = Half Japanese & Half Norwegian

If you projected these three individuals upon the two-dimensional representation above of the worldwide populations, the father would cluster with East Asians, the mother with Europeans, and the child with the groups who span the divide, Uyghurs and Hazaras. So on the plot the child would be far closer to these Central Asian populations than to the groups from which its parents derive. And here's a limitation of focusing too much on two-dimensional plots derived from population level data: is the child interchangeable with a Uyghur or Hazara genetically in relation to their parents? Of course not! If the child were female, and the father impregnated her, the consequence (or probability of a negative consequence) would be very different than if he impregnated a Uyghur or Hazara woman. The reason for this difference is obvious (if not, ask in the comments, many readers of this weblog know the ins & outs at an expert level).

Abstractions which summarize and condense reality are essential, but they have their uses and limitations. Unlike physics, biology cannot rely too long on elegance, beauty, and formal clarity. Rather, it always has to dance back and forth between rough & ready heuristics informed by the empirics and theoretical systems which emerge from axioms. Usually a picture has its own sense. But the key is to be precise in understanding what sense it makes to you.
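For readers who want to poke at the thought experiment, here is a toy simulation in Python. It is a rough sketch under loud assumptions, not an analysis of real data: the allele frequencies are invented, loci are treated as unlinked, the "axis" is simply the difference in allele frequency between two hypothetical source populations (a stand-in for a principal component, not an actual PCA), and the "stranger" is drawn from a hypothetical population whose frequencies sit halfway between the two. All function and variable names are made up for illustration.

```python
import random


def axis_score(genotype, freq_a, freq_b):
    """Project a genotype (0/1/2 allele counts) onto the A-versus-B axis.

    Scores near +0.5 look like population A, near -0.5 like population B,
    and near 0 like something in between.
    """
    num = sum((g / 2 - (a + b) / 2) * (a - b)
              for g, a, b in zip(genotype, freq_a, freq_b))
    den = sum((a - b) ** 2 for a, b in zip(freq_a, freq_b))
    return num / den


def opposite_homozygotes(g1, g2):
    """Count loci where two people share no allele at all (0 versus 2)."""
    return sum(1 for x, y in zip(g1, g2) if abs(x - y) == 2)


def simulate(seed=0, n_loci=2000):
    rng = random.Random(seed)

    # Invented allele frequencies for two hypothetical source populations,
    # plus an intermediate population halfway between them.
    freq_a = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]
    freq_b = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]
    freq_mid = [(a + b) / 2 for a, b in zip(freq_a, freq_b)]

    def haplotype(freqs):
        return [1 if rng.random() < p else 0 for p in freqs]

    def person(freqs):
        return haplotype(freqs), haplotype(freqs)

    def genotype(individual):
        return [x + y for x, y in zip(*individual)]

    father = person(freq_a)
    mother = person(freq_b)
    stranger = person(freq_mid)  # unrelated, from the intermediate population

    # The child receives one randomly chosen allele per locus from each parent.
    child = (
        [rng.choice((father[0][i], father[1][i])) for i in range(n_loci)],
        [rng.choice((mother[0][i], mother[1][i])) for i in range(n_loci)],
    )

    g_father, g_mother = genotype(father), genotype(mother)
    g_child, g_stranger = genotype(child), genotype(stranger)

    return {
        "score_father": axis_score(g_father, freq_a, freq_b),
        "score_mother": axis_score(g_mother, freq_a, freq_b),
        "score_child": axis_score(g_child, freq_a, freq_b),
        "score_stranger": axis_score(g_stranger, freq_a, freq_b),
        "mismatch_child_father": opposite_homozygotes(g_child, g_father),
        "mismatch_child_mother": opposite_homozygotes(g_child, g_mother),
        "mismatch_stranger_father": opposite_homozygotes(g_stranger, g_father),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value}")
```

On the axis the child and the stranger should land close together, roughly midway between the two parents. The mismatch counts tell a different story: a child necessarily carries one of its father's alleles at every locus, so the child-father count is zero by construction, whereas an unrelated person from the intermediate population will typically show many loci with no shared allele. The plot position and the genealogical relationship are simply different kinds of information.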

Copyright © 2023 Kalmbach Media Co.