Last week Luke Jostins (soon to be Dr. Luke Jostins) published an interesting paper in Nature. To be fair, this paper has an extensive author list, but from what I understand it is the fruit of the first author's Ph.D. project. In any case, you may know Luke because I have used his loess curve on hominin encephalization for years. His bread & butter is statistical genetics, and it shows in this Nature paper. God knows how he managed to cram so much density into ~5.5 pages of plain text.
Luke is also a contributor to Genomes Unzipped, and has put up a post over there on one implication of the paper, "Dozens of new IBD genes, but can they predict disease?" The short answer is that for individual prediction complex traits are going to be a hard haul over the long term.* They are subject to what Jim Manzi would term "high causal density." A simple way to state this is that outcome X is dependent on a host of variables, and if you capture only a small number of those variables, you aren't going to explain much of the outcome in a general fashion. This is obvious from the text of Luke's paper. Let's look at the abstract of "Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease":
Crohn's disease and ulcerative colitis, the two common forms of inflammatory bowel disease (IBD), affect over 2.5 million people of European ancestry, with rising prevalence in other populations...Genome-wide association studies and subsequent meta-analyses...have implicated previously unsuspected mechanisms...Here we expand on the knowledge of relevant pathways by undertaking a meta-analysis of Crohn's disease and ulcerative colitis genome-wide association scans, followed by extensive validation of significant findings, with a combined total of more than 75,000 cases and controls. We identify 71 new associations, for a total of 163 IBD loci, that meet genome-wide significance thresholds. Most loci contribute to both phenotypes, and both directional (consistently favouring one allele over the course of human history) and balancing (favouring the retention of both alleles within populations) selection effects are evident. Many IBD loci are also implicated in other immune-mediated disorders, most notably with ankylosing spondylitis and psoriasis. We also observe considerable overlap between susceptibility loci for IBD and mycobacterial infection. Gene co-expression network analysis emphasizes this relationship, with pathways shared between host responses to mycobacteria and those predisposing to IBD.
The numbers tell the tale here. This is a massive GWAS, with more than 75,000 cases and controls. And yet what does that gain us? I'll let the text speak here: "We have increased the total disease variance explained (variance being subject to fewer assumptions than heritability) from 8.2% to 13.6% in Crohn’s disease and from 4.1% to 7.5% in ulcerative colitis." This is not trivial. But it is exactly the kind of incremental increase in knowledge that systems characterized by high causal density will yield, even granting herculean efforts at data collection. I believe that studies like this, with "best-of-breed" methods, are important, because cohorts of tens of thousands, and perhaps hundreds of thousands, are not going to be unusual in the near future. The hope is that geneticists keep pushing the boulder up the hill, ever so slightly.
If not individual prediction, then is there another value to this sort of work? First, one can still drive drug discovery from small genetic effects. And a major aspect of the paper above is that the authors are localizing classes of genes likely to be implicated in these illnesses. Not only that, they report that many of the pathogenic variants may not be SNPs, but structural variants of some sort. In other words, massively scaled-up GWAS holds not the promise of individual prediction, but a fuller and better systematic knowledge of the human organism in the aggregate.
Finally, there is one aspect of the paper which jumped out at me because I'm not a practical person with biomedical interests first and foremost. Jostins et al. report that many of these loci seem to be subject to either directional or balancing selection. The latter is not unexpected to me. Many of the loci have immunological associations, and host-pathogen coevolution is assumed to be governed by negative frequency dependence. In other words, when slow-reproducing organisms develop an effective anti-pathogen strategy and it becomes common, the pathogens adapt to it very quickly.
But at this point the lower-frequency strategies are now more fit, and effective against the pathogens, which are localized on a narrow adaptive peak.
But what about directional selection? My working assumption here is that high-density living and the protean conditions of the post-hunter-gatherer world have reshaped the genome of most humans a great deal. Now recall that immediate adaptations often have deleterious consequences. They're kludges. When a problem confronts you, you reach for the closest and easiest solution, even if in the infinite space of possibilities there are better solutions. You don't have the time, energy, or choice, frankly. For what it's worth, Crohn's is more frequent in Ashkenazi Jews than the population-wide average (though one can posit environmental rationales for this; there's high causal density popping up again!).
The moral of the story is that many complex traits and diseases may simply be the wages of adaptation itself. Even in an environmentally unperturbed context it is difficult to imagine a situation where endemic host-pathogen coevolution wouldn't result in fluctuations in gene frequencies which might have deleterious consequences. This may be the best of all possible worlds, though even the best possible worlds may be characterized by a familiar mediocrity in physiological fitness.
Citation: doi:10.1038/nature11582
* IBD here = inflammatory bowel disease, not identical by descent!
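Addendum: the "high causal density" point can be made concrete with a toy calculation. This is only a sketch under invented assumptions, not the paper's model or its data: the function name variance_explained, the 5,000 independent causal variables, the 1/i decay in their variance contributions, and the 50/50 split between causal variables and noise are all hypothetical. What it shows is that once the largest effects have been found, each additional batch of captured variables buys progressively less explained variance, which is the same flavor of return as the paper's gains of 5.4 and 3.4 percentage points.

```python
def variance_explained(effect_variances, noise_variance, k):
    """Fraction of total trait variance captured by the k largest effects.

    Assumes the causal variables are independent, so their variance
    contributions simply add (a simplifying assumption).
    """
    ranked = sorted(effect_variances, reverse=True)
    total = sum(ranked) + noise_variance
    return sum(ranked[:k]) / total


# Hypothetical architecture: 5,000 causal variables whose variance
# contributions decay as 1/i, so a few matter a lot and most matter little.
n_variables = 5000
effects = [1.0 / i for i in range(1, n_variables + 1)]

# Assumption: everything else (environment, chance) contributes as much
# variance as all of the causal variables combined.
noise = sum(effects)

for k in (10, 100, 1000, n_variables):
    pct = 100 * variance_explained(effects, noise, k)
    print(f"top {k:>4} variables captured: {pct:.1f}% of variance explained")
```

Under these made-up numbers, going from 10 to 100 captured variables takes ten times the discovery effort and adds far less than ten times the explained variance, and even capturing everything tops out at the assumed 50% ceiling.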