
Do you want your genotype in a public data set?

Gene Expression, by Razib Khan, January 16, 2013 12:54 PM



In the near future one of my projects is revising and expanding the "PHYLO" pedigree file which I put up a week ago. Basically I want there to be a public data set which has a modest number of SNPs useful for phylogenetic analysis (100,000-200,000) with wide population coverage. Additionally, I am going to do a few things like rename the family IDs to populations, and also release it with scripts to help in running Admixture (for example, shell scripts which will automate replication and later analysis of replicates).

Finally, I'm planning on running ~50 replicates of K = 2 to K = 20 with 10-fold cross-validation (yes, this will take a while) to get a good sense of the "best" K's. The reality is that most people are probably only interested in the "most informative" K, +/- 1, so there's no need for everyone to run K = 2 to K = 20. The time saved should be used on running replicates, and then CLUMPP to merge the results.

I would say that this is for 'amateurs' only, but I don't think it's betraying confidence to observe that several academic researchers at prominent institutions have ended up asking me how to get good public data sets. This sort of information still hasn't percolated to the general public, including scientists who don't work on population genomics. After a few trial runs with public data sets, people with academic access could move to things like the POPRES data set.

But the ultimate point of this post is to ask: do you want to be in this data set? If so, I need the file (23andMe format is fine; otherwise, pedigree files only), your name, and some minimal ethnic information. I'm not going to add everyone. I just want to diversify the public data set a little. But I am going to put names in the sample sheet, so you won't have anonymity. As you know I don't particularly care about this personally, but your mileage may vary. Researchers might need to contact people or check that they are who they say they are.

Email: contactgnxp -at- gmail -dot- com
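For readers who want a feel for the replicate plan described above, here is a minimal Python sketch, not the scripts that will ship with the data set. It builds one Admixture command line per (K, replicate) pair and then picks the K with the lowest mean cross-validation error from the logs. The file name `phylo.bed`, the function names, and the use of the replicate number as the random seed are all assumptions made for illustration; the `--cv=` and `--seed=` flags and the `CV error (K=...)` log line are written as I understand them from the Admixture manual, so check them against your installed version.

```python
import re
from statistics import mean

# Matches the cross-validation line Admixture is assumed to print, e.g.
# "CV error (K=3): 0.52140"
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def build_commands(bed_file, k_values, n_replicates, cv_folds=10):
    """Return one argument list per (K, replicate) pair.

    Each replicate gets its own seed (here simply the replicate number,
    an arbitrary choice). Admixture writes its output files into the
    current directory, so each replicate should be run in its own
    directory to avoid overwriting earlier results.
    """
    commands = []
    for k in k_values:
        for rep in range(1, n_replicates + 1):
            commands.append(
                ["admixture", f"--cv={cv_folds}", f"--seed={rep}", bed_file, str(k)]
            )
    return commands


def mean_cv_error(log_texts):
    """Average the reported CV error per K across all replicate logs."""
    errors = {}
    for text in log_texts:
        for k, err in CV_LINE.findall(text):
            errors.setdefault(int(k), []).append(float(err))
    return {k: mean(values) for k, values in errors.items()}


def best_k(mean_errors):
    """Return the K with the lowest mean CV error."""
    return min(mean_errors, key=mean_errors.get)
```

Each argument list can be passed to `subprocess.run(cmd, cwd=replicate_dir)` with the captured standard output saved as that replicate's log. Once the lowest-error K is known, further replicates can be limited to that K and its neighbors, and the resulting Q matrices merged with CLUMPP.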
