My post below outlining the possible future of genomics and intelligence made me recall a paper from last fall, Predicting Unobserved Phenotypes for Complex Traits from Whole-Genome SNP Data:
Results from recent genome-wide association studies indicate that for most complex traits, there are many loci that contribute to variation in observed phenotype and that the effect of a single variant (single nucleotide polymorphism, SNP) on a phenotype is small. Here, we propose a method that combines the effects of multiple SNPs to make a prediction of a phenotype that has not been observed. We apply the method to data on mice, using phenotypic and genomic data from some individuals to predict phenotypes in other, either related or unrelated, individuals. We find that correlations between predicted and actual phenotypes are in the range of 0.4 to 0.9. The method also shows that the SNPs used in the prediction appear in regions that are known to contain genes associated with the traits studied. The prediction of unobserved phenotypes from high-density SNP data and appropriate statistical methodology is feasible and can be applied in human medicine, forensics, or artificial breeding programs.
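To make "combining the effects of multiple SNPs" concrete, here is a minimal sketch on simulated data. It is not the paper's method: I'm using plain ridge regression as a stand-in, and the simulated genotypes, the penalty value, and every function name below are my own assumptions for illustration.

```python
import numpy as np

def simulate(n_individuals=1000, n_snps=500, n_causal=15, heritability=0.8, seed=0):
    """Simulate 0/1/2 genotypes and a phenotype driven by a few causal SNPs (toy data)."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=n_snps)  # allele frequency per SNP
    genotypes = rng.binomial(2, freqs, size=(n_individuals, n_snps)).astype(float)
    effects = np.zeros(n_snps)
    causal = rng.choice(n_snps, size=n_causal, replace=False)
    effects[causal] = rng.normal(size=n_causal)
    genetic = genotypes @ effects
    # Choose the noise variance so that var(genetic) / var(phenotype) ~ heritability.
    noise_var = genetic.var() * (1.0 - heritability) / heritability
    phenotype = genetic + rng.normal(scale=np.sqrt(noise_var), size=n_individuals)
    return genotypes, phenotype

def fit_ridge(genotypes, phenotype, penalty):
    """Estimate all SNP effects jointly, shrinking each one toward zero."""
    snp_means = genotypes.mean(axis=0)
    pheno_mean = phenotype.mean()
    centered = genotypes - snp_means
    # Dual form of ridge regression: solve an (individuals x individuals) system,
    # which is the cheaper route when SNPs outnumber individuals.
    kinship = centered @ centered.T
    weights = np.linalg.solve(kinship + penalty * np.eye(len(phenotype)),
                              phenotype - pheno_mean)
    snp_effects = centered.T @ weights
    return snp_effects, snp_means, pheno_mean

def predict(genotypes, snp_effects, snp_means, pheno_mean):
    """Predict phenotypes for individuals whose phenotype was never observed."""
    return (genotypes - snp_means) @ snp_effects + pheno_mean

if __name__ == "__main__":
    X, y = simulate()
    train, test = slice(0, 800), slice(800, None)
    model = fit_ridge(X[train], y[train], penalty=50.0)  # penalty is an arbitrary choice
    predicted = predict(X[test], *model)
    print("correlation, predicted vs. actual:", np.corrcoef(predicted, y[test])[0, 1])
```

The held-out 200 individuals play the role of the "unobserved" animals: their phenotypes are never shown to the fit, only compared against the predictions afterwards.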
The number of QTLs for the traits here is rather small, on the order of 15. Here are some interesting numbers:
For the data set on mice (~2200 individuals and ~10,000 SNP), it took ~15 minutes with a single CPU (~2 GHz), which compares favourably to a number of other computing strategies on the same data set...Assuming that computing time increases linearly with the number of individuals and markers, the method would run within one week even if the data set was large (e.g. 10,000 individuals with 1,000,000 SNPs). More time may be required to adequately monitor convergence, however parallel computing strategies would be useful here...Therefore, the methods described in this study can scale up to much larger data sets.
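A quick back-of-the-envelope check of that one-week figure. I'm assuming "linearly with the number of individuals and markers" means run time is proportional to individuals × SNPs, which is my reading rather than something the quoted passage spells out. The function name is made up for illustration, and the inputs are the approximate figures from the quote.

```python
def projected_minutes(base_minutes, base_individuals, base_snps,
                      target_individuals, target_snps):
    """Scale a measured run time by the ratio of data-set sizes (individuals x SNPs)."""
    scale = (target_individuals * target_snps) / (base_individuals * base_snps)
    return base_minutes * scale

# Quoted benchmark: ~15 minutes for ~2,200 mice and ~10,000 SNPs.
# Quoted target: 10,000 individuals with 1,000,000 SNPs.
minutes = projected_minutes(15, 2_200, 10_000, 10_000, 1_000_000)
days = minutes / (60 * 24)
print(f"projected run time: {days:.1f} days")
```

The data set grows by a factor of roughly 455, so 15 minutes becomes a bit under five days, which is consistent with the authors' "within one week" on a single CPU.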
There's obviously a big difference between 2,200 mice and 2,200 humans. And it looks like the variation in these traits was controlled by relatively large-effect loci, with moderate to very high heritabilities (~0.50 to almost 1.0).