Researchers say pilot phase has generated largest catalog of human genetic variation ever assembled.
Researchers involved in the international 1000 Genomes Project have published data from the three studies that constituted the pilot phase of the four-year initiative. The results, from the sequencing of 800 individuals, describe the location, allele frequency, and local haplotype structure of approximately 15 million SNPs, 1 million short insertions and deletions, and 20,000 structural variants. All data is being made publicly available so that research groups can use the findings alongside their own genome-wide association studies. The pilot-phase results are available at www.1000genomes.org and are detailed in a Nature article titled “A map of human genome variation from population-scale sequencing.”
“Having a systematic catalogue of human variation changes the way we can study human genetics, much in the same way as having a catalogue of human genes did,” says Paul Flicek, Ph.D., who led the team at EMBL-EBI. “Among other things, it also gives us a platform for analyzing the connections between genes and an individual’s disease risks.” The collated findings from this first phase suggest that on average, each person carries approximately 250 to 300 loss-of-function variants in annotated genes, and 50 to 100 variants previously implicated in inherited disorders.
The overall aim of the 1000 Genomes Project is to discover, genotype, and provide accurate haplotype information on all forms of human DNA polymorphism in multiple human populations. By sequencing and analyzing the genomes of 2,500 individuals, the project ultimately aims to characterize over 95% of the variants that lie in genomic regions accessible to current high-throughput sequencing technologies and that are found at a frequency of at least 1% in each of five major population groups. These are populations in or with ancestry from Europe, East Asia, South Asia, West Africa, and the Americas.
The aim of the now-completed pilot phase of the 1000 Genomes endeavor was to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. Three studies were undertaken: low-coverage whole-genome sequencing of 179 individuals; deep sequencing of six individuals in two mother-father-child trios; and targeted sequencing of 8,140 exons in 697 individuals.
The pilot phase was completed in less than two years and identified about 8 million genetic variants that had never been described before, states 1000 Genomes Project co-chair Richard Durbin, Ph.D., from the Wellcome Trust Sanger Institute in the U.K. “The amount of information delivered by this first stage of the project is remarkable,” he notes. “This is the largest catalogue of its kind, and having it in the public domain will help maximize the efficiency of human genetics research.”
The pilot phase was financed by the U.K.’s Wellcome Trust and a number of national funding bodies, including the U.S. National Institutes of Health and agencies in China and Germany. The 1000 Genomes Project is also co-chaired by David Altshuler, M.D., Ph.D., associate professor of genetics and medicine at Harvard Medical School and Massachusetts General Hospital.