Using your 23andMe data: exploring with MDS

Gene Expression | By Razib Khan | January 8, 2013 3:09 PM

pca.jpg

Note: please read the earlier post on this topic if you haven't. The above image is from 23andMe. It's from a feature which seems to have been marginalized a bit with their ancestry composition. Basically it projects 23andMe customers onto a visualization of genetic variation from the HGDP data set. This is actually a rather informative representation of variation. But there has always been an issue with the 23andMe representation: you are projected onto their invariant data set. In other words, you can't mix & match the populations so as to explore different relationships. The nature of the algorithm and representation produces strange results, so varying the population sets is often useful in smoking out the true shape of things.

With the MDS feature I wrote about yesterday you can now compute positions with different weights of populations and mixes. This post will focus on how to manipulate the overall data set. You should have PHYLO from the earlier post. Open up the .fam file. It should look like this:

    Malayan A382 0 0 1 -9
    Paniya D36 0 0 1 -9
    BiakaPygmies HGDP00479 0 0 1 -9
    BiakaPygmies HGDP00985 0 0 1 -9
    BiakaPygmies HGDP01094 0 0 1 -9
    MbutiPygmies HGDP00982 0 0 1 -9
    Mandenkas HGDP00911 0 0 1 -9
    Mandenkas HGDP01202 0 0 1 -9
    Yorubas HGDP00927 0 0 1 -9
    BiakaPygmies HGDP00461 0 0 1 -9
    BiakaPygmies HGDP00986 0 0 1 -9
    MbutiPygmies HGDP00449 0 0 1 -9
    Mandenkas HGDP00912 0 0 1 -9
    Mandenkas HGDP01283 0 0 1 -9
    Yorubas HGDP00928 0 0 2 -9

And so forth. PHYLO has 1,500+ individuals. This is a bit much, which is why the --genome command took so long. To ask particular questions it is often useful to prune the population down. I have a friend who is 1/4 Filipino who is curious as to whether his ancestry was more Chinese or native Filipino. How to answer this?

- You want a range of East Asian populations, north to south.
- You want a good outgroup. I'll use the Utah whites.
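Picking those populations out of a 1,500-line .fam file by hand gets tedious, so here is a minimal Python sketch of the selection. It assumes, as in the listing above, that the first column of the .fam file is the population label and the second is the individual ID, and it writes just those two columns for every matching line. The function name and the population labels in the usage comment are my own illustrations, not anything from PLINK or PHYLO; check the labels against your own .fam file.

```python
def write_keep_file(fam_path, keep_path, populations):
    """Write 'family_id individual_id' for every .fam line whose first
    column (the population label in this data set) is in `populations`.
    Returns the number of individuals written."""
    wanted = set(populations)
    kept = 0
    with open(fam_path) as fam, open(keep_path, "w") as keep:
        for line in fam:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            if fields[0] in wanted:
                keep.write(f"{fields[0]} {fields[1]}\n")
                kept += 1
    return kept


# Example call (hypothetical labels; they must match column one exactly):
#   write_keep_file("PHYLO.fam", "keep.txt", ["Dai", "Cambodians"])
```

If the function returns 0, the most likely cause is a label that does not match the spelling used in the .fam file.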
All you need to do is go through the .fam file, keep only the lines you want, and put them into a new file, keep.txt. Then you run this command:

    plink --noweb --bfile PHYLO --keep keep.txt --make-bed --out PHYLONARROW

So I've now made a new pedigree data set which is a subset of the original. I then merged my friend's and my daughter's genotypes into this data set.

What if I wanted to remove some individuals, for example, the ones in keep.txt? You do it like so:

    plink --noweb --bfile PHYLO --remove keep.txt --make-bed --out PHYLOAFEWGONE

With --keep and --remove, and files drawn from the .fam file(s), you can customize your own data set for your own purposes. Again you want to produce an MDS, so run:

    plink --noweb --bfile PHYLONARROW --genome
    plink --noweb --bfile PHYLONARROW --read-genome plink.genome --mds-plot 6

This time --genome will run very fast, because there are far fewer individuals. Here is my plot of the result (my friend is "RF," my daughter is "RD"):

final.jpg

Note that RF is aligned straight toward the "Dai" population, an ethnic group from South China, but not Han (they are related to the Thai). It seems plausible that my friend is of mixed Chinese and Filipino background. My daughter's minimal East Asian ancestry is indeed Southeast Asian, and this is clear from this plot, as she is shifted further toward the Cambodians (this may be due to South Asian affinities as well). The point is not to rely on one plot, but to generate many so as to explore the possibilities, and develop an intuition.
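If you are going to generate many plots, it may help to script the reading of the MDS output. The sketch below is in Python and assumes the default output file plink.mds with its usual whitespace-separated header (FID, IID, SOL, C1, C2, and so on). It groups the first two dimensions by population label and averages them per population. The function names are my own, and the centroids are only a crude summary to compare across runs, not a substitute for looking at the plot.

```python
from collections import defaultdict


def read_mds(mds_path):
    """Return {population: [(C1, C2), ...]} from a PLINK .mds file,
    using the FID column as the population label."""
    points = defaultdict(list)
    with open(mds_path) as handle:
        header = handle.readline().split()
        fid = header.index("FID")
        c1 = header.index("C1")
        c2 = header.index("C2")
        for line in handle:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            points[fields[fid]].append((float(fields[c1]), float(fields[c2])))
    return dict(points)


def centroids(points):
    """Average the first two MDS dimensions within each population."""
    return {
        pop: (sum(x for x, _ in xy) / len(xy), sum(y for _, y in xy) / len(xy))
        for pop, xy in points.items()
    }


# Example call, assuming plink.mds is in the working directory:
#   for pop, (x, y) in sorted(centroids(read_mds("plink.mds")).items()):
#       print(pop, round(x, 4), round(y, 4))
```

The per-population coordinate lists returned by read_mds can be handed to whatever plotting tool you prefer.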
