Early this year I received an email from Dr. Peter Ralph, inquiring if I might discuss some interesting statistical genetic results from analyses of the POPRES data set which might have historical relevance. I've been excitedly waiting for the preprint to be made public so it could trigger some wider discussion. I believe that the methods outlined in the paper perhaps show us a path into the near future, where we might gain a much sharper perspective upon the recent past. So it's finally out, and you can read it in full. Ralph and Dr. Graham Coop have put it up at arXiv: The geography of recent genetic ancestry across Europe. The paper uses ~500,000 SNPs from the POPRES data set individuals, and looks at patterns of identity by descent as a function of geography. By identity by descent, we're talking about segments of the genome which are derived from a common ancestor. Because of recombination the length of the segments can give us a sense of the date of the last common ancestor; long segments indicate more recent ancestry because fewer recombination events have chopped up the sequence.
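To make the length-versus-age intuition concrete, here is a minimal back-of-the-envelope sketch. It is not the method used in the paper. It assumes the textbook simplification that crossovers fall at a rate of one per Morgan (100 cM) per meiosis, so two people who share an ancestor g generations back are separated by 2g meioses, and a shared block has a mean length of roughly 100/(2g) cM with an approximately exponential distribution. The function names are my own, purely for illustration.

```python
import math


def expected_segment_length_cm(generations: int) -> float:
    """Approximate mean IBD block length (cM) for an ancestor g generations back.

    Assumes one crossover per 100 cM per meiosis and 2*g meioses
    separating the two present-day individuals.
    """
    if generations < 1:
        raise ValueError("generations must be at least 1")
    meioses = 2 * generations
    return 100.0 / meioses


def prob_segment_longer_than(length_cm: float, generations: int) -> float:
    """Chance that a block exceeds length_cm, under an exponential model
    with the mean given by expected_segment_length_cm."""
    mean_cm = expected_segment_length_cm(generations)
    return math.exp(-length_cm / mean_cm)


if __name__ == "__main__":
    # Older ancestors leave shorter blocks, and long blocks become rare.
    for g in (5, 25, 50):
        mean_cm = expected_segment_length_cm(g)
        p_long = prob_segment_longer_than(5.0, g)
        print(f"g={g:>2}: mean ~{mean_cm:.1f} cM, P(block > 5 cM) ~{p_long:.3f}")
```

Under these assumptions an ancestor 25 generations back leaves blocks averaging about 2 cM, which is why only long blocks are informative about the most recent centuries. Real inference has to deal with detection power, false positives, and variation in recombination rate, all of which this sketch ignores.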
Here's the big takeaway of the paper: ...There is substantial regional variation in the number of shared genetic ancestors: especially high numbers of common ancestors between many eastern populations likely date to the Slavic and/or Hunnic expansions, while much lower levels of common ancestry in the Italian and Iberian peninsulas may indicate weaker demographic effects of Germanic expansions into these areas and/or more stably structured populations. Recent shared ancestry in modern Europeans is ubiquitous, and clearly shows the impact of both small-scale migration and large historical events....
When I first saw the panels above, which illustrate the proportion of IBD sharing between populations, with the starred population being the focal one, I immediately thought of the early medieval Slavic expansion. There is already evidence of this in the genetic data. Dienekes noted that modern Greeks seem to have a significant component of "northern" ancestry, which is attenuated in Turks, and nearly absent in Greek Cypriots. These results suggest that the peoples of eastern Europe share a very large number of common ancestors within the past 1,500 years, irrespective of geographic distance (note that it is difficult to observe a decay in the size of the circles, which is more evident in other panels).
The other surprising pattern, at least to me, is the deep structure of the Italian population. These results imply that Italian relatedness has a notably deeper time depth than that of other European nation-states. I'll quote the authors here: This suggests significant substructure and large population sizes within Italy, strong enough that different groups within Italy, share as little recent common ancestry as other distinct, modern-day countries, substructure that was not homogenized during the migration period. These patterns could also reflect in part a history of settlement of Italy from various sources, including: settlement of Greeks in southern Italy, settlement of Illyrians in eastern Italy, and an influx of people from across the Roman empire, including gene flow from Africa...but is unlikely to be entirely due to these effects. Spain seems to exhibit the same distinctness from the rest of Europe as Italy, but has a much more normal pattern of IBD, with shallower time depth to common ancestry.
There are plenty of other possible inferences one could make. For example, is the negative correlation between IBD tracts in individuals of UK origin affiliated with Germany and Ireland a function of a difference in Celtic and Germanic ancestry dating to the Dark Ages, or is it simply due to the fact that the United Kingdom received a wave of Irish migrants in the 19th century, or perhaps just a natural result of a geographic continuum and isolation by distance? The last is an issue which will need addressing in the future. The authors make the case that because of the power of the IBD method one can make inferences without a finer geographic granularity, but what is sufficient for statistical genetics is not sufficient for historical-demographic inference. The POPRES data set was collected in London and Lausanne, and there are limits to how much geographic information you can squeeze out of this. I assume in the near future these sorts of methods which infer IBD tracts will be applied extensively, so this is just here to whet our appetites.
This paper has a wealth of results. You can create many stories. But to create credible stories you need "thick" and "deep" knowledge. So I invite readers to dig through the results and see what jumps out at them. It's no cost to you, and I don't think the time spent pursuing this material is going to be time wasted.