I got my daughter a netbook, so now my computer is doing Harappa Project work 24×7. Also, Simranjit was nice enough to offer me the use of a server. For privacy reasons, I am not going to upload any of the participants’ data there, but it is much faster than my machine and hence very useful for running Admixture on the reference data (especially with cross-validation).
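Why does cross-validation eat so much CPU? With ADMIXTURE's `--cv` option you run the program once per candidate number of ancestral populations K (e.g. `admixture --cv refdata.bed 3`, where `refdata.bed` is a hypothetical file name), and each run prints a line like `CV error (K=3): 0.54012`. You then pick the K with the lowest CV error. Here is a minimal sketch of that last step, assuming the run logs have been saved as text. The helper names and the numbers in the example are made up for illustration, not real Harappa results.

```python
import re

# Matches the summary line ADMIXTURE prints when run with --cv,
# e.g. "CV error (K=3): 0.54012"
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def parse_cv_errors(log_text):
    """Return a dict mapping K -> cross-validation error found in log text."""
    errors = {}
    for match in CV_LINE.finditer(log_text):
        errors[int(match.group(1))] = float(match.group(2))
    return errors


def best_k(errors):
    """Pick the K with the lowest cross-validation error."""
    if not errors:
        raise ValueError("no CV error lines found")
    return min(errors, key=errors.get)


if __name__ == "__main__":
    # Hypothetical log excerpts from runs at K = 2..4 (illustrative numbers only).
    sample_logs = """
    CV error (K=2): 0.55120
    CV error (K=3): 0.54012
    CV error (K=4): 0.54390
    """
    errors = parse_cv_errors(sample_logs)
    print(errors)
    print("best K:", best_k(errors))
```

Every extra K means another full ADMIXTURE run over the whole reference set, which is why a fast server matters so much here.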
As for setbacks, I downloaded the current 1000genomes data (1,212 samples, 2.4 million SNPs). It’s in VCF format, and using vcftools to convert it to PED format will take about 3 weeks.
Yes, you heard that right. BTW, the good stuff from a South Asian point of view will come later this year: 100 Assamese Ahom, 100 Kayastha from Calcutta, 100 Reddys from Hyderabad, 100 Maratha from Bombay and 100 Lahori Punjabis.

Also, I spent most of Sunday evening and night in the ER and got a diagnosis of ureterolithiasis for my efforts. All I can say is: Three cheers for Percocet!!
First, wish Zack well. Second, he has over 70 individuals in the Harappa Ancestry Project database (in addition to the public data sets). If you're South Asian, Iranian, Burmese, or Tibetan, here are the details on how to participate.