A few weeks ago I added a new data set to my repository. As is my usual practice now, the populations can be found in the .fam file, but I've added more to it. I have to rewrite my ADMIXTURE tutorial soon, so I thought I would bring up an important issue in interpreting these data sets with clustering methods: conclusions cannot rest on a single result. Rather, one must attempt to ascertain the statistical robustness of the results. If you arrive at an expected result this is obviously less of a concern, but if you arrive at a novel and surprising result, you have to make sure that it isn't simply a fluke. To do this I have been running my PHYLOCORE data set with cross-validation (regular 5-fold). In theory you should be able to see where the value is ...
Confidence in inference in phylogenetic data sets
Understanding the statistical robustness of results is crucial in interpreting genetic data. Learn more about replicates and guidelines.
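As a rough illustration of checking robustness across cross-validation runs, here is a minimal Python sketch. It assumes each ADMIXTURE run was started with the `--cv` flag and that its log contains a line of the form `CV error (K=3): 0.51234`. The function names and the sample log values are hypothetical, invented only for this example; this is not the workflow used for PHYLOCORE.

```python
import re

# Assumed log line format from a run with --cv, e.g. "CV error (K=3): 0.51234"
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.eE+-]+)")


def parse_cv_errors(log_texts):
    """Collect CV errors per K from the text of one or more run logs."""
    errors = {}
    for text in log_texts:
        for k, err in CV_LINE.findall(text):
            errors.setdefault(int(k), []).append(float(err))
    return errors


def summarize(errors):
    """Return {K: (mean CV error, max - min across replicates)}."""
    summary = {}
    for k, values in sorted(errors.items()):
        mean = sum(values) / len(values)
        spread = max(values) - min(values)
        summary[k] = (mean, spread)
    return summary


def best_k(summary):
    """K with the lowest mean CV error across replicates."""
    return min(summary, key=lambda k: summary[k][0])


if __name__ == "__main__":
    # Hypothetical log snippets (made-up numbers), two replicates per K.
    logs = [
        "CV error (K=2): 0.560\n",
        "CV error (K=2): 0.562\n",
        "CV error (K=3): 0.541\n",
        "CV error (K=3): 0.549\n",
        "CV error (K=4): 0.545\n",
        "CV error (K=4): 0.552\n",
    ]
    summary = summarize(parse_cv_errors(logs))
    for k, (mean, spread) in summary.items():
        print(f"K={k}: mean CV error {mean:.4f}, spread {spread:.4f}")
    print("Lowest mean CV error at K =", best_k(summary))
```

Reporting the spread next to the mean is a deliberate choice: if the replicates for a given K disagree by about as much as neighboring values of K differ from each other, then the choice of K rests on weak evidence.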