The history and geography of genomes

A new paper in PLoS Genetics, Ancestral Components of Admixed Genomes in a Mexican Cohort, sheds some light on issues with which we were already familiar through conventional history. What we already know: Mexicans and people of Mexican descent predominantly derive from one or more admixture events between Europeans and Amerindians, with a minor African component. The last is often a surprise to Mexicans themselves, but it is no surprise to those who are aware of the nature of Spanish colonialism in the New World. In some cases, such as in Cuba, the African slave economy which we're familiar with from the United States was the norm, but in many instances African slaves accompanied Spaniards as secondaries in their conquest of the indigenous populations. New Spain was a caste society with a Spaniard and Creole elite, and a productive base of indios from whom they extracted rents. But Africans served as junior partners to the European elites, and were a substantial demographic presence down to the 19th century. Their near total genetic absorption, though, seems to have resulted in their near elimination from the cultural folk memory of Mexico.

Most of the techniques in the paper should be somewhat familiar to you. In particular, there's a lot of PCA, as well as some model-based clustering methods. The PCA takes all the genetic variation in the data set and reduces it down to the largest independent dimensions, which you can visualize on a two-dimensional plot (e.g., PC 1 vs. PC 2 plots the largest explanatory dimension against the second largest). It turns out that most of the largest dimensions of variation are pretty well explained by our intuitions of genetic distance.

The model-based approaches are different. Instead of letting the algorithm generate the clusters hypothesis free (i.e., you put labels on the clusters after the fact) you specify a number of populations, K, and the method forces the data you input to fit that parameter. In other words, it's kind of like a sausage. Sometimes the fit is good, and sometimes not so good (if you try to divide Swedes into 20 distinct populations, the algorithm will try to comply, but it should really tell you that you're being crazy).

But another way to go is to look at the structure of the genome itself, with methods which focus on correlations across the chromosomes. While PCA and model-based methods can give you an intuition as to the average admixture of an individual, more fine-grained genomic methods which assign ancestry to segments across an individual's genotype yield more information. To get a better sense, here are two graphics generated from 23andMe's Ancestry Painting.

Both individuals shake out as ~50% European and ~50% East Asian. On PCA and model-based clustering they're not distinguishable. But when you look at the patterns on a more fine-grained chromosomal scale you see clearly that the individual to the left, an F1 hybrid, seems to show no evidence of recombination between "European" and "East Asian" segments, while the individual to the right, a Uygur, shows many. That's because Uygurs as a genetic group emerged at least 1,000 years ago, and perhaps earlier. There are literally dozens of generations over which recombination could break apart association. In contrast, someone who is an F1 hybrid won't manifest that, because their parents are from "pure" populations. Recombination events in each parent will only result in the swapping of segments of the same ancestral origin.

What does this have to do with this paper? The authors extracted the different ancestral segments (European, African, and Amerindian) from the Mexican genomes, and constructed "virtual genomes" out of this raw material. And then they used the other methods to analyze these virtual genomes! You can see the result below.
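To make the contrast concrete, here is a minimal toy sketch in Python. Everything in it is an assumption made up for illustration: the two individuals, the 12-site chromosome, the labels "EUR" and "EAS", and the helper names `ancestry_fraction`, `count_switches`, and `virtual_genome`. It is not the authors' pipeline or 23andMe's algorithm. It just shows that two people can have the same average ancestry while differing in how often ancestry switches along a chromosome, and one simple way to picture a "virtual genome" (keep the alleles on segments assigned to one ancestry, treat the rest as missing).

```python
# Toy per-site ancestry calls along one chromosome, two haplotypes each.
# Labels are hypothetical: "EUR" = European, "EAS" = East Asian.
f1_hybrid = {
    "hap1": ["EUR"] * 12,  # inherited whole from the European parent
    "hap2": ["EAS"] * 12,  # inherited whole from the East Asian parent
}
old_admixture = {
    "hap1": ["EUR"] * 2 + ["EAS"] * 3 + ["EUR"] * 1 + ["EAS"] * 2 + ["EUR"] * 4,
    "hap2": ["EAS"] * 1 + ["EUR"] * 2 + ["EAS"] * 4 + ["EUR"] * 3 + ["EAS"] * 2,
}


def ancestry_fraction(individual, label):
    """Genome-wide share of sites assigned to `label` (the 'average admixture')."""
    calls = [call for hap in individual.values() for call in hap]
    return calls.count(label) / len(calls)


def count_switches(individual):
    """Number of places where ancestry changes between adjacent sites."""
    return sum(
        sum(1 for left, right in zip(hap, hap[1:]) if left != right)
        for hap in individual.values()
    )


def virtual_genome(alleles, ancestry, label):
    """Keep alleles on segments assigned to `label`; mask the rest as missing."""
    return [a if anc == label else None for a, anc in zip(alleles, ancestry)]


# Hypothetical allele calls (0/1) for the first haplotype of the admixed person.
alleles_hap1 = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print(ancestry_fraction(f1_hybrid, "EUR"), count_switches(f1_hybrid))          # 0.5 0
print(ancestry_fraction(old_admixture, "EUR"), count_switches(old_admixture))  # 0.5 8
print(virtual_genome(alleles_hap1, old_admixture["hap1"], "EUR"))
```

Both toy individuals come out at 50% "EUR", which is all an averaging method would see, but only the long-admixed one has ancestry switches within a haplotype. The masked list from `virtual_genome` is the kind of partial, single-ancestry genome that could then be fed into PCA or clustering, assuming the downstream method tolerates missing data.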

As you would expect, the segments assigned to Europeans, Africans, and Amerindians diverge from each other, and Mexicans span the gamut in proportion to assumed admixtures. There's no Amerindian population in the HapMap, but the position of the "virtual Amerindians" seems about right. Observe here though the importance of the nature of the input sample in PCA: the largest dimension of variation separates Amerindians from Europeans and Africans, not Africans from everyone else. The latter is the norm in most studies, but this data set is biased toward large numbers of Mexicans, so the Amerindian vs. European difference is what dominates. If your population is mostly Amerindian and European, then Amerindian vs. European differences will explain more variance than African vs. Amerindian or African vs. European differences. On the other hand, if you balanced the proportions somehow, the African vs. everyone else difference should take its position as the largest independent dimension again (PC 1).
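The dependence of PC 1 on sample composition can be sketched with a toy calculation. The marker panel below is invented for illustration (6 markers that distinguish "Africans" from everyone else, 3 that distinguish "Amerindians"), there is no within-population variation and there are no admixed individuals, and PC 1 is found by plain power iteration rather than a PCA library. None of this comes from the paper; it is only meant to show the direction of the effect.

```python
import math

# Hypothetical allele counts (0 or 2) at 9 markers: the first 6 differ only
# in Africans, the last 3 differ only in Amerindians.
GENOTYPES = {
    "African":    [2] * 6 + [0] * 3,
    "European":   [0] * 6 + [0] * 3,
    "Amerindian": [0] * 6 + [2] * 3,
}


def first_pc(rows, iterations=200):
    """Return (column means, unit-length PC 1) using power iteration."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    v = [1.0 / (j + 1) for j in range(m)]  # arbitrary deterministic start
    for _ in range(iterations):
        # Multiply by X^T X (proportional to the covariance matrix).
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(m)]
        norm = math.sqrt(sum(w * w for w in v))
        v = [w / norm for w in v]
    return means, v


def pc1_gaps(counts):
    """Distance between population positions along PC 1 for a given sample mix."""
    rows = [GENOTYPES[pop] for pop, k in counts.items() for _ in range(k)]
    means, v = first_pc(rows)
    score = {
        pop: sum((g - mu) * w for g, mu, w in zip(geno, means, v))
        for pop, geno in GENOTYPES.items()
    }
    return {
        "African vs European": abs(score["African"] - score["European"]),
        "Amerindian vs European": abs(score["Amerindian"] - score["European"]),
    }


balanced = pc1_gaps({"African": 10, "European": 10, "Amerindian": 10})
skewed = pc1_gaps({"African": 2, "European": 14, "Amerindian": 14})
print("balanced sample:", balanced)
print("skewed sample:  ", skewed)
```

With equal numbers from each group, the African vs. European gap along PC 1 is the larger one, as in most studies. With a sample that is almost entirely European and Amerindian, the Amerindian vs. European gap becomes the larger one, even though the underlying genotypes have not changed.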

But the authors did find something interesting using this method that we didn't quite know. The virtual genomes go where you'd expect on the coarse continental level, but here you see that Mexicans seem to have two sources of Amerindian ancestry: one from southwest Mexico, and another from the Maya of the Yucatan. Observe the clear admixture among some of the indigenous groups, while the virtual genomes budge far less. This supports the validity of their method, as their assignments obviously did avoid labeling European segments as Amerindian.

In any case, this pattern of ancestry is in sharp contrast with African Americans, who seem to exhibit little inter-individual difference in African ancestry. Why the contrast? Obviously the stories about the anomie of slave family life are correct, as ethnic cohesion and family integrity were rapidly destroyed in the New World. In contrast, in Mexico the indios may have been helots, but they weren't quite chattel. Their native communities persisted and maintained integrity in a way not possible with enslaved Africans.

In the future these techniques are going to get better, especially with whole genome analysis. This means we'll be able to explore in more detail the contributions of various groups to any given population. Specific elements of history will come into sharper focus. For example, we could ascertain the genetic impact of the Dutch, French (descended from Protestant refugees), and Germans on the genomes of Afrikaners. This group clearly has non-trivial African and Asian ancestry, but perhaps a more intriguing anthropological issue is why Dutch culture dominates over that of the French and Germans if the latter contributed so much biologically, as some historians have maintained.

Citation: Johnson NA, Coram MA, Shriver MD, Romieu I, Barsh GS, et al. (2011) Ancestral Components of Admixed Genomes in a Mexican Cohort. PLoS Genet 7(12): e1002410. doi:10.1371/journal.pgen.1002410

Razib Khan
