

Genomics to complex traits

Gene Expression, by Razib Khan, February 18, 2009, 3:01 PM



My post below outlining the possible future of genomics and intelligence made me recall a paper from last fall, Predicting Unobserved Phenotypes for Complex Traits from Whole-Genome SNP Data:

Results from recent genome-wide association studies indicate that for most complex traits, there are many loci that contribute to variation in observed phenotype and that the effect of a single variant (single nucleotide polymorphism, SNP) on a phenotype is small. Here, we propose a method that combines the effects of multiple SNPs to make a prediction of a phenotype that has not been observed. We apply the method to data on mice, using phenotypic and genomic data from some individuals to predict phenotypes in other, either related or unrelated, individuals. We find that correlations between predicted and actual phenotypes are in the range of 0.4 to 0.9. The method also shows that the SNPs used in the prediction appear in regions that are known to contain genes associated with the traits studied. The prediction of unobserved phenotypes from high-density SNP data and appropriate statistical methodology is feasible and can be applied in human medicine, forensics, or artificial breeding programs.
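The abstract gives the idea only in outline: estimate effects for many SNPs at once, then add them up to predict a phenotype that was never measured. Below is a minimal sketch of that general idea in Python, assuming ridge regression as the way to estimate all SNP effects jointly. This is a simple stand-in and not necessarily the statistical method the paper used. The data are simulated, and the settings (200 SNPs, 15 causal loci, heritability 0.5, penalty `lam`) and the function names `simulate`, `fit_ridge`, and `predict` are all hypothetical.

```python
import numpy as np

def simulate(n_ind, n_snp, n_qtl, h2, rng):
    """Toy data (hypothetical): 0/1/2 genotypes and a phenotype driven by n_qtl causal SNPs."""
    freqs = rng.uniform(0.1, 0.9, n_snp)
    X = rng.binomial(2, freqs, size=(n_ind, n_snp)).astype(float)
    effects = np.zeros(n_snp)
    causal = rng.choice(n_snp, n_qtl, replace=False)
    effects[causal] = rng.normal(0.0, 1.0, n_qtl)
    g = X @ effects  # true genetic value of each individual
    # Residual variance chosen so var(g) / var(y) is close to h2 in expectation.
    noise_sd = np.sqrt(g.var() * (1 - h2) / h2)
    y = g + rng.normal(0.0, noise_sd, n_ind)
    return X, y

def fit_ridge(X, y, lam):
    """Estimate all SNP effects jointly by ridge regression (a stand-in, not the paper's method)."""
    mu = X.mean(axis=0)
    Xc = X - mu
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y - y.mean()))
    return mu, y.mean(), beta

def predict(X, mu, y_mean, beta):
    """Predicted phenotype = mean + sum of estimated SNP effects times centered genotypes."""
    return y_mean + (X - mu) @ beta

rng = np.random.default_rng(0)
X, y = simulate(n_ind=2500, n_snp=200, n_qtl=15, h2=0.5, rng=rng)
train, test = slice(0, 2000), slice(2000, None)

# Fit on the first 2,000 individuals, then predict the 500 whose phenotypes were held out.
mu, y_mean, beta = fit_ridge(X[train], y[train], lam=10.0)
y_hat = predict(X[test], mu, y_mean, beta)
r = np.corrcoef(y_hat, y[test])[0, 1]
print(f"correlation between predicted and actual phenotype: {r:.2f}")
```

Fitting on one group and scoring a held-out group loosely mirrors the paper's use of some individuals' data to predict others'. Because the simulated SNPs are independent and few, this toy case is much easier than real genomes, where markers are correlated and far more numerous.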

The number of QTLs for the traits here is rather small, on the order of 15. Here are some interesting numbers:

For the data set on mice (~2200 individuals and ~10,000 SNP), it took ~15 minutes with a single CPU (~2 GHz), which compares favourably to a number of other computing strategies on the same data set...Assuming that computing time increases linearly with the number of individuals and markers, the method would run within one week even if the data set was large (e.g. 10,000 individuals with 1,000,000 SNPs). More time may be required to adequately monitor convergence, however parallel computing strategies would be useful here...Therefore, the methods described in this study can scale up to much larger data sets.
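The one-week figure is easy to check. Reading "increases linearly with the number of individuals and markers" as linear in each, so that run time scales with their product, a back-of-the-envelope calculation (the function name is illustrative only) gives:

```python
def projected_minutes(base_minutes, base_ind, base_snp, new_ind, new_snp):
    """Scale run time linearly in individuals and in markers (the quoted assumption)."""
    return base_minutes * (new_ind / base_ind) * (new_snp / base_snp)

# ~15 minutes for ~2,200 mice x ~10,000 SNPs, scaled to 10,000 individuals x 1,000,000 SNPs.
minutes = projected_minutes(15, 2_200, 10_000, 10_000, 1_000_000)
days = minutes / (60 * 24)
print(f"{minutes:,.0f} minutes, about {days:.1f} days")  # prints: 6,818 minutes, about 4.7 days
```

That is under five days, consistent with "within one week" before any extra time spent monitoring convergence.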

There's obviously a big difference between 2,200 mice and 2,200 humans. And it looks like these traits had relatively large-effect loci underlying the variation, with moderate to very high heritabilities (~0.50 to almost 1.0).
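One rough way to see why the heritabilities matter: under a simple additive model, phenotype = genetic value + uncorrelated noise, even a predictor that captured the genetic value perfectly could correlate with the observed phenotype at most at the square root of the heritability. The sketch below just evaluates that ceiling; the model and the function name are illustrative assumptions, not taken from the paper.

```python
import math

def max_phenotype_correlation(h2):
    """Ceiling on corr(predicted, observed phenotype) for a purely genetic predictor.

    Assumes y = g + e with g and e uncorrelated and a predictor equal to g,
    so corr(g, y) = sqrt(var(g) / var(y)) = sqrt(h2).
    """
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("heritability must lie in [0, 1]")
    return math.sqrt(h2)

for h2 in (0.5, 0.75, 1.0):
    # prints 0.71, 0.87 and 1.00 for the three heritabilities
    print(f"h2 = {h2:.2f} -> ceiling r = {max_phenotype_correlation(h2):.2f}")
```

Under these assumptions, traits with heritability near 0.5 leave little room for correlations much above 0.7, while traits near 1.0 can in principle support the upper end of the reported range.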
