Today I noticed an interesting paper in Genetics by Simon Gravel, "Population Genetics Models of Local Ancestry." As the title indicates, this is a general paper where the method is the main course. But there was an interesting empirical result which I want to highlight:
Comparing the ancestry variance from the African-American data to those predicted by the demographic models, we find that the pulse model predicts a genealogy variance of 0.0005, whereas the variance in the model with two distinct pulses is 0.002. The total variance in the African-American sample is 0.0047, of which we infer that 0.0041 is due to genealogy variance (using the method described in Appendix 3). Thus the model with two pulses of migration is again more realistic than the single pulse model; the fact that it still underestimates the variance can be due to a combination of factors that have not been modeled: our demographic model may be underestimating low level, very recent migration because of the parameterization as two discrete pulses of migration, and both population structure and errors in ancestry assignment may be adding to the observed variance.
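To make the comparison in that passage concrete, here is a quick back-of-the-envelope sketch in Python. The four variance figures come straight from the quote; the variable names and the `fraction_explained` helper are my own illustration, not anything from the paper, and this is only arithmetic on the reported numbers, not a reimplementation of the method.

```python
# Variance figures as quoted from the paper; the names are illustrative only.
total_variance = 0.0047      # total ancestry variance in the African-American sample
genealogy_variance = 0.0041  # portion the paper infers is due to genealogy variance
predicted = {
    "single pulse": 0.0005,  # genealogy variance predicted by the pulse model
    "two pulses": 0.002,     # genealogy variance predicted by the two-pulse model
}


def fraction_explained(predicted_var, observed_var):
    """Share of the observed variance that a model's prediction accounts for."""
    return predicted_var / observed_var


# Remainder of the total variance that is not attributed to genealogy variance.
residual = total_variance - genealogy_variance

fractions = {
    name: fraction_explained(value, genealogy_variance)
    for name, value in predicted.items()
}

for name, frac in fractions.items():
    print(f"{name}: {frac:.0%} of the inferred genealogy variance")
```

On these numbers the single pulse model accounts for roughly 12% of the inferred genealogy variance and the two-pulse model for roughly 49%. So the second model is about four times closer, but it still leaves about half of the variance unexplained, which is the shortfall the author attributes to unmodeled factors.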
To the left is a screenshot which represents a slice of the technical meat of the paper. Most people aren't going to be able to penetrate this. So how should one evaluate it? The author presents an empirical prediction. I read a bit about American slavery a few years back, and I don't recall any mention of two pulses of migration. This isn't too surprising, as there wasn't that much cliometrics. But if this is attested in the literature, it would certainly increase my confidence in the utility and power of the paper's method. By their fruits you shall know them! More generally, this sort of analysis of phased data sets is obviously the future. A more detailed topography of genomic variation is going to open up a huge window onto the human past.