In yesterday's post on African genetics I tried to work with a large set of populations, but narrowed the SNPs down to ~40,000. Today I thought I'd go another route and focus on a thicker marker set, but with fewer populations. So I did a bunch of runs with 400,000 SNPs. Here's K = 8. Please note, I did some "trial" runs and pulled out individuals whose admixture was obviously recent, or who were outliers within their population (e.g., Mozabites with a lot of Sub-Saharan African ancestry, or San who obviously had European ancestry).
Notice that there are three non-Sub-Saharan modal components. South of the Sahara the European one is absent. But here's the weird thing. Below are MDS representations of genetic distance between the ancestral groups inferred above:
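For anyone who hasn't seen how this kind of plot is made, here's a rough sketch of classical (metric) MDS in plain Python. It takes a symmetric matrix of pairwise distances and returns 2-D coordinates whose straight-line distances approximate the originals. Everything in it is an assumption for illustration: the function name `classical_mds`, the labels A through D, and the toy distances are all made up, and this is not necessarily the routine that produced the plots here.

```python
import math

def classical_mds(dist, n_dims=2, n_iter=500):
    """Embed points in n_dims dimensions from a symmetric distance matrix."""
    n = len(dist)
    # Double-center the squared distances: B = -1/2 * J * D^2 * J
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]

    coords = [[0.0] * n_dims for _ in range(n)]
    for d in range(n_dims):
        # Power iteration: converges to the eigenvector of B whose
        # eigenvalue is largest in absolute value (assumed positive here,
        # which holds when the distances are roughly Euclidean).
        v = [float(i + 1) for i in range(n)]
        for _ in range(n_iter):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                     for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][d] = v[i] * scale
        # Deflate: remove this component so the next pass finds the next one
        b = [[b[i][j] - eigval * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords

# Made-up distances between four hypothetical components (not real data).
labels = ["A", "B", "C", "D"]
toy_dist = [
    [0.0, 3.0, 4.0, 5.0],
    [3.0, 0.0, 5.0, 4.0],
    [4.0, 5.0, 0.0, 3.0],
    [5.0, 4.0, 3.0, 0.0],
]
coords = classical_mds(toy_dist)
for name, (x, y) in zip(labels, coords):
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

The toy matrix is the corners of a 3-by-4 rectangle, so two dimensions reproduce it exactly. With real genetic distances the fit is only approximate, and the axes themselves have no meaning beyond "things that are close are similar."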
All of these "ancestral" groups are abstractions. More plainly, they're fake but useful (physicists would say "toy models," economists "stylized facts"). But the Nilotic one seems kind of crazy here. I told the program to go look for 8 populations. It went and looked, and came back with a weird one. I guess that means I'll have to do cross-validation from now on, even though that slows everything down.
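As a sketch of what that cross-validation step might look like in practice: run the program at several values of K, collect the cross-validation error for each, and prefer the K where the error bottoms out. The snippet below assumes the program writes lines of the form `CV error (K=8): 0.525` to its log. That format, the helper names `parse_cv_errors` and `best_k`, and the numbers are all assumptions for illustration, not results from these runs.

```python
import re

# Assumed log format: "CV error (K=8): 0.525"
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")

def parse_cv_errors(log_text):
    """Map each K found in the log text to its cross-validation error."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}

def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    return min(cv_errors, key=cv_errors.get)

# Made-up log lines (not real output from any run in this post).
example_log = """\
CV error (K=6): 0.530
CV error (K=7): 0.520
CV error (K=8): 0.525
"""
errors = parse_cv_errors(example_log)
chosen = best_k(errors)
print(f"Lowest CV error at K = {chosen}")
```

The lowest error is a guide rather than a verdict. If the curve is nearly flat around the minimum, neighboring values of K are about equally defensible.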