Analyzing ancestry with ADMIXTURE, step by step

By Razib Khan
Mar 14, 2011 10:55 PM | Oct 7, 2019 7:32 PM

Over the past few months I was hoping more people would start doing what Zack Ajmal, Dienekes, and David have been doing. There are public data sets and open source software, so anyone with a nerdy inclination can explore their own questions out of curiosity. That way you can see the power and the limitations of genomics on your own desktop. I wonder if one of the biggest reasons more people haven’t started doing this is formatting. It can be a pain to convert matrix-formatted files into pedigree format, for example. But the data gusher isn’t ending; look at what’s coming out (and has come out) of the 1000 Genomes project!
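To make the formatting pain concrete, here is a minimal sketch of a matrix-to-pedigree conversion. It assumes a matrix with one row per SNP and one column per individual, with two-letter genotype calls such as "AG" and "00" for a missing call; real matrix files vary, so treat that layout, the function name `matrix_to_ped`, and the choice to reuse the individual ID as the family ID as illustrative assumptions. The six leading columns and the 0 and -9 missing codes follow the usual PLINK .ped conventions.

```python
def matrix_to_ped(snp_rows, sample_ids, missing="00"):
    """Transpose a SNP-by-individual genotype matrix into PED-style lines.

    snp_rows:   list of (snp_id, [genotype per individual]) tuples, where each
                genotype is a two-letter call such as "AG" (assumed layout).
    sample_ids: individual IDs, in the same order as the genotype columns.
    """
    ped_lines = []
    for col, sample in enumerate(sample_ids):
        # Six leading columns: family ID, individual ID, father, mother,
        # sex, phenotype. 0 means unknown; -9 means missing phenotype.
        # Reusing the individual ID as the family ID is an assumption.
        fields = [sample, sample, "0", "0", "0", "-9"]
        for _snp_id, genotypes in snp_rows:
            call = genotypes[col]
            if len(call) != 2 or call == missing:
                fields += ["0", "0"]  # missing genotype: both alleles coded 0
            else:
                fields += [call[0], call[1]]
        ped_lines.append(" ".join(fields))
    return ped_lines


# Toy input: two SNPs typed in two individuals (made-up data).
toy_matrix = [("rs1", ["AA", "AG"]), ("rs2", ["CT", "00"])]
toy_ped = matrix_to_ped(toy_matrix, ["ind1", "ind2"])
for line in toy_ped:
    print(line)
```

A matching .map file (chromosome, SNP ID, genetic distance, base-pair position, one line per SNP) would also be needed before PLINK-style tools could read the result; that part is omitted here.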

I’ve been thinking I need to write up a post which is a “soft landing” for people, so that we can reduce the “activation energy” for this sort of thing…once you get hooked, you only go deeper. Luckily an anonymous tipster has sent me a link to a huge data set which has already been merged and pedigree formatted. Here are the populations:

!Kung, Adygei, African Americans, Algeria, Altaians, Alur, Ap Brahmin, Ap Madiga, Ap Mala, Armenians, Armenians B, Ashkenazy Jews, Azerbaijan Jews, Balochi, Bambaran, Bamoun, Bantukenya, South Africa, Basque, Bedouin, Beijing Chinese, Belorussian, Biaka, Bnei Menashe, Bolivian, Brahui, Brong, Bulala, Burusho, Buryat, Buryats, Cambodian, Chinese, Chinese Americans, Chukchis, Chuvashs, Cochin Jews, Colombian, Cypriots, Dai, Daur, Dogon, Dolgans, Druze, Greenlanders, Egypt, Egyptans, Ethiopian Jews, Ethiopians, Evenkis, Fang, French, Fulani, Georgia Jews, Georgians, Gujaratis, Gujaratis B, Hadza, Han, Han Nchina, Hausa, Hazara, Hema, Hezhen, Hungarians, Iban, Igbo, Iranian Jews, Iranians, Iraq Jews, Irula, Italian, Japanese, Jordanians, Kaba, Kalash, Karitiana, Kets, Khmer, Kongo, Koryaks, Kurd, Kyrgyzstani, Lahu, Lebanese, Lezgins, Libya, Lithuanians, Luhya, Maasai, Mada, Makrani, Malayan, Mandenka, Maya, Mbuti, Melanesian, Mexicans, Miao, Mongola, Mongolians, Moroccans, Morocco Jews, Morocco N, Morocco S, Mozabite, N European, Naxi, Nepalese, Nganassans, Nguni, North Kannadi, Orcadian, Oroqen, Palestinian, Paniya, Papuan, Pathan, Pedi, Pima, Punjabi Arain, Pygmy, Romanians, Russian, Sahara Occ, Sakilli, Samaritians, Samoan, San, San Nb, Sandawe, Sardinian, Saudis, Selkups, Sephardic Jews, She, Sindhi, Singapore Chinese, Singapore Indians, Singapore Malay, Slovenian, Sotho/Tswana, Spaniards, Stalskoe, Surui, Syrians, Thai, Tamil Brahmin, Tamil Dalit, Tongan, Totonac, Tu, Tujia, Tunisia, Turks, Tuscans, Tuvinians, Urkarah, Utahn Whites, Uygur, Uzbekistan Jews, Uzbeks, Vietnamese, Greenlanders, Xhosa, Xibo, Yakut, Yemen Jews, Yemenese, Yi, Yoruba, Yukaghirs

The data set has ~4,000 individuals and ~30,000 markers. The binary file is ~25 MB. The download has four files. The .bed, .bim, and .fam files are pedigree formatted (in the binary version of the format, they hold the genotypes, the marker information, and the individual information, respectively). The .csv is a “master list” of information on each individual (population, region, etc., tied to a specific identification number). This is important because once you have some output files…you need to figure out what they mean and visualize them, and that’s only informative if you have a master list with more than just family and individual information.
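As a rough illustration of why the master list matters, the sketch below reads individual IDs from .fam-style lines (whitespace-separated: family ID, individual ID, father, mother, sex, phenotype) and looks each one up in a master-list CSV to count individuals per population. The CSV column names `id` and `population`, the helper names, and the toy data are all assumptions made for the example; the real master list may label its columns differently, so check its header row first.

```python
import csv
import io
from collections import Counter


def read_fam(lines):
    """Return (family_id, individual_id) pairs from .fam-style lines."""
    samples = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            samples.append((parts[0], parts[1]))
    return samples


def read_master(handle, id_col, pop_col):
    """Map individual ID -> population from a CSV with a header row.

    id_col and pop_col are hypothetical column names; adjust them to
    match the actual master list.
    """
    return {row[id_col]: row[pop_col] for row in csv.DictReader(handle)}


def count_populations(samples, id_to_pop):
    """Count individuals per population; IDs missing from the list are 'unlisted'."""
    return Counter(id_to_pop.get(iid, "unlisted") for _fid, iid in samples)


# Tiny in-memory stand-ins for the real .fam and .csv files (made-up data).
fam_text = "fam1 ind1 0 0 1 -9\nfam2 ind2 0 0 2 -9\nfam3 ind3 0 0 0 -9\n"
master_text = (
    "id,population,region\n"
    "ind1,Yoruba,Africa\n"
    "ind2,French,Europe\n"
    "ind3,Yoruba,Africa\n"
)

samples = read_fam(fam_text.splitlines())
id_to_pop = read_master(io.StringIO(master_text), id_col="id", pop_col="population")
counts = count_populations(samples, id_to_pop)
print(counts.most_common())
```

With the real download, the same functions could be fed open file handles for the .fam and .csv files in place of the in-memory strings. The order of individuals in the .fam file is the order that matters later, since per-individual output from downstream tools generally follows it.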

Here is the link to the file to download with all the above populations. I’ve pulled it down and run it, so I know it’s not malware.

Copyright © 2025 LabX Media Group