Effort will map SNPs and structural variants in the genomes of more than 1,000 people.
An international research consortium initiated the 1000 Genomes Project to create the most detailed and medically useful picture to date of human genetic variation, the NIH reports. The project will receive support from the Wellcome Trust Sanger Institute, the Beijing Genomics Institute (BGI), and the NHGRI.
The scientific goal of the 1000 Genomes Project is to produce a catalog of variants that are present at a frequency of 1% or greater in the human population across most of the genome, and down to 0.5% or lower within genes. The effort will sequence the genomes of at least a thousand people from around the world.
Going beyond the HapMap, the 1000 Genomes Project will not only map SNPs but also produce a high-resolution map of structural variants.
In the first phase of the 1000 Genomes Project, lasting about a year, researchers will conduct three pilots. The results of the pilots will be used to decide how to produce the project’s map of human genetic variation most efficiently and cost-effectively. The first pilot will involve sequencing the genomes of two nuclear families (both parents and an adult child) at deep coverage that averages 20 passes of each genome.
The second pilot will involve sequencing the genomes of 180 people at low coverage that averages two passes of each genome. The third pilot will involve sequencing the exons of about 1,000 genes in about 1,000 people. This is aimed at exploring how best to obtain an even more detailed catalog in the approximately 2% of the genome that is composed of protein-coding genes.
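To give a sense of scale for the first two pilots, the raw sequencing volume can be estimated as people × coverage × genome size. The short sketch below is an illustration only: it assumes a round figure of 3 billion bases per human genome, which is not stated in the report, and it leaves out the third pilot because the total exon length of the targeted genes is not given. The function and variable names are hypothetical.

```python
# Rough estimate of raw bases sequenced in pilots 1 and 2.
# ASSUMPTION: ~3 billion bases per human genome (round figure, not from the report).
GENOME_SIZE_BASES = 3_000_000_000


def bases_sequenced(people: int, coverage: int, genome_size: int = GENOME_SIZE_BASES) -> int:
    """Return the approximate number of bases read for a group of people."""
    return people * coverage * genome_size


# Pilot 1: two nuclear families of three people each, at 20 passes per genome.
pilot1_bases = bases_sequenced(people=2 * 3, coverage=20)

# Pilot 2: 180 people at two passes per genome.
pilot2_bases = bases_sequenced(people=180, coverage=2)

print(f"Pilot 1: about {pilot1_bases:.2e} bases")
print(f"Pilot 2: about {pilot2_bases:.2e} bases")
```

Under these assumptions the low-coverage pilot reads about three times as many bases in total as the deep-coverage pilot, while spending one-tenth as much sequencing on each person.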
During its two-year production phase, the 1000 Genomes Project expects to deliver sequence data at an average rate of about 8.2 billion bases per day. The first thousand samples for the 1000 Genomes Project will come from those used for the HapMap and from additional samples in the extended HapMap set. The donors will be anonymous, and no medical information will be collected on them.
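The stated production rate of about 8.2 billion bases per day can be put in perspective with a little arithmetic. The sketch below is illustrative and rests on two assumptions that are not in the report: a genome size of roughly 3 billion bases, and a production phase of exactly two 365-day years at a constant rate. All names are hypothetical.

```python
# Back-of-the-envelope view of the production-phase throughput.
BASES_PER_DAY = 8_200_000_000       # average rate quoted in the report
GENOME_SIZE_BASES = 3_000_000_000   # ASSUMPTION: round figure for one human genome
PRODUCTION_DAYS = 2 * 365           # ASSUMPTION: two 365-day years at a constant rate

# How many single-pass genome equivalents the daily output represents.
genome_equivalents_per_day = BASES_PER_DAY / GENOME_SIZE_BASES

# Total bases delivered over the whole production phase.
total_bases = BASES_PER_DAY * PRODUCTION_DAYS

print(f"Genome equivalents per day: {genome_equivalents_per_day:.2f}")
print(f"Total bases over two years: {total_bases:.3e}")
```

Under these assumptions the daily output corresponds to between two and three single-pass human genomes, and the two-year total is on the order of six trillion bases.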
Among the populations whose DNA will be sequenced in the 1000 Genomes Project are Yoruba in Ibadan, Nigeria; Japanese in Tokyo; Chinese in Beijing; Utah residents with ancestry from Northern and Western Europe; Luhya in Webuye, Kenya; Maasai in Kinyawa, Kenya; Toscani in Italy; Gujarati Indians in Houston; Chinese in metropolitan Denver; people of Mexican ancestry in Los Angeles; and people of African ancestry in the southwestern United States.
The data generated by the 1000 Genomes Project will be held by and distributed from the European Bioinformatics Institute and the National Center for Biotechnology Information. There will also be a mirror site for data access at BGI. In addition to a catalog of variants, the data will include information about surrounding variation that can speed identification of the most important variants.
The sequencing work will be carried out at the Sanger Institute, BGI, and NHGRI’s Large-Scale Sequencing Network, which includes the Broad Institute, the Washington University Genome Sequencing Center, and the Human Genome Sequencing Center at the Baylor College of Medicine. The consortium may add other participants over time.