The coincidental intersection of sociology & genetics

Gene Expression | By Razib Khan | April 21, 2011 8:57 AM



Hispanic - Definitions in the United States:

The 1970 Census was the first time that a "Hispanic" identifier was used and data collected with the question. The definition of "Hispanic" has been modified in each successive census. The 2000 Census asked if the person was "Spanish/Hispanic/Latino". The U.S. Office of Management and Budget currently defines "Hispanic or Latino" as "a person of Mexican, Puerto Rican, Cuban, South or Central American, or other Spanish culture or origin, regardless of race."

Because Hispanics can be of any race, you need to look at their own self-identification. According to the American census, somewhat over 50% of American Hispanics/Latinos identify as white, most of the rest as "some other race," and a small minority as black, Native American, etc. This came to mind when I saw this paper in BMC Genetics, "Comparing self-reported ethnicity to genetic background measures in the context of the Multi-Ethnic Study of Atherosclerosis (MESA)." The issue is that when you're doing association studies between genes and diseases you want to control for population structure. For example, if disease X is found in Chinese Americans to a higher degree than in the general population, then all the alleles distinctive to Chinese Americans would correlate with disease X in an aggregated pool, whether or not they have anything to do with the disease. Self-reports are pretty good, but on the margin there is now some juice to squeeze out of the data sets by using ancestry informative markers to "clean up" the outliers within the populations. Here are the results:

Four clusters are identified using 96 ancestry informative markers. Three of these clusters are well delineated, but 30% of the self-reported Hispanic-Americans are misclassified. We also found that MESA SRE provides type I error rates that are consistent with the nominal levels. More extensive simulations revealed that this finding is likely due to the multi-ethnic nature of the MESA. Finally, we describe situations where SRE may perform as well as a GBMA in controlling the effect of population stratification and admixture in association tests.
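To make the population structure problem concrete, here is a minimal sketch in Python. It is not the paper's method, and every number in it is invented for illustration: two hypothetical groups, A and B, differ both in how common an allele is and in their baseline disease risk, while within each group the allele has no effect on disease at all. The `Population` class and `risk_ratio` function are names made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Population:
    """A hypothetical ancestry group; all values are made up for illustration."""
    name: str
    share: float         # fraction of the pooled sample drawn from this group
    carrier_freq: float  # P(carries the allele) within the group
    disease_risk: float  # P(disease) within the group, independent of the allele


def risk_ratio(groups):
    """Return P(disease | carrier) / P(disease | non-carrier) for the pooled groups.

    Inside each group the allele does not change disease risk, so any ratio
    different from 1 is produced purely by mixing groups together.
    """
    carriers = sum(g.share * g.carrier_freq for g in groups)
    noncarriers = sum(g.share * (1 - g.carrier_freq) for g in groups)
    diseased_carriers = sum(
        g.share * g.carrier_freq * g.disease_risk for g in groups
    )
    diseased_noncarriers = sum(
        g.share * (1 - g.carrier_freq) * g.disease_risk for g in groups
    )
    return (diseased_carriers / carriers) / (diseased_noncarriers / noncarriers)


# Group A: allele common, disease common. Group B: allele rare, disease rare.
group_a = Population("A", share=0.5, carrier_freq=0.8, disease_risk=0.20)
group_b = Population("B", share=0.5, carrier_freq=0.2, disease_risk=0.05)

pooled = risk_ratio([group_a, group_b])
within_a = risk_ratio([group_a])
within_b = risk_ratio([group_b])

print(f"pooled risk ratio:   {pooled:.3f}")
print(f"within group A only: {within_a:.3f}")
print(f"within group B only: {within_b:.3f}")
```

With these made-up inputs, carriers in the pooled sample appear to have roughly twice the disease risk of non-carriers, while the ratio inside each group is 1. That spurious pooled signal is what controlling for population structure, whether by self-report or by genetic markers, is meant to remove.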

Below is a principal component analysis plot which illustrates the largest dimensions of genetic variation in their data set for individuals from four different populations: African Americans, European Americans, Hispanic Americans, and Chinese Americans. I thought of the above census results when I saw the distributions on the plot:


Granted, there is a big difference between genetic admixture, which can vary over a continuous range within populations, and the artificial binning you see in census categories. But the 50% white vs. 50% non-white (white + other) corresponds reasonably well to the PCA in my mind....
