Researchers at the J. Craig Venter Institute (JCVI) sequenced the entire genome of one individual, J. Craig Venter, Ph.D., covering both sets of chromosomes, one inherited from each parent. This new genome, known as the HuRef version, represents the first time a true diploid genome from one individual has been published, according to the team.
Currently, two other versions of the human genome exist, including one from Celera. Neither represents a single individual; each is a composite of DNA from several people.
The goal of the investigators at the JCVI was to construct a reference human genome based on one individual. Building on reanalyzed data from Dr. Venter’s genome, which constituted 60% of the previously published Celera genome, the team produced additional data, bringing the final total to 32 million sequences. They used whole genome shotgun sequencing and long reads from automated Sanger dideoxy DNA sequencing.
From the combined data set of more than 20 billion bp, the researchers were able to assemble the human genome with an overall length of 2.810 billion bp. The genome was covered 7.5 times, ensuring that each set of contributing chromosomes was covered over 3.2 times for greater than 96% coverage of the two parental genomes, according to JCVI.
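As a rough sanity check on these figures, the relationship between sequencing depth and the fraction of a genome covered can be sketched with the classic Poisson (Lander-Waterman) approximation. This model is an assumption made here for illustration and is not described by JCVI as its method; the function name and the even split of depth between the two parental chromosome sets are likewise hypothetical. Under those assumptions, a per-haplotype depth of 3.2 corresponds to roughly 96% of bases being covered at least once.

```python
import math


def expected_coverage_fraction(depth: float) -> float:
    """Fraction of bases expected to be covered at least once.

    Assumes reads land uniformly at random (Poisson model), which is a
    simplification and not necessarily how JCVI derived its figures.
    """
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return 1.0 - math.exp(-depth)


# Figures quoted in the article.
total_depth = 7.5          # overall coverage of the assembled genome
quoted_haplotype_depth = 3.2  # "covered over 3.2 times" per parental set

# Hypothetical: depth per parental set if reads split exactly evenly.
even_split_depth = total_depth / 2

for label, depth in [
    ("quoted per-haplotype depth", quoted_haplotype_depth),
    ("even split of total depth", even_split_depth),
]:
    fraction = expected_coverage_fraction(depth)
    print(f"{label}: {depth:.2f}x -> {fraction:.1%} of bases covered")
```

The idealized model ignores repeats, cloning bias, and assembly gaps, so it should be read only as an order-of-magnitude check on the reported coverage.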
The team compared the new HuRef diploid genome sequence with earlier published versions of the human genome and reports that the HuRef version provided more base pairs, correctly oriented.
Because the HuRef genome is diploid, the two parental copies of each chromosome could be compared directly with each other. One finding was the high degree of genetic variation between the two chromosome sets within a single individual, according to the investigators.
“With this publication, we have shown that human-to-human variation is more than seven-fold greater than earlier estimates, proving that we are in fact very unique individuals at the genetic level,” says Dr. Venter. In the analysis of Dr. Venter’s genome, the team reports finding a total of 4.1 million variants covering 12.3 million bp of DNA, including more than 1.2 million newly discovered variants.
Of the 4.1 million variants between chromosome sets, 3.2 million were SNPs, while nearly one million were other kinds of variants such as insertions/deletions, copy number variants, block substitutions, and segmental duplications. Although the SNPs outnumbered the non-SNP variants, the non-SNP variants involved a larger portion of the genome.
The team also used the 4.1 million variant set and new algorithms to build haplotype assemblies that, when compared to those of the HapMap project, represented longer and more complete linkages. The JCVI researchers expect these results to improve as additional sequence coverage is added to HuRef.
This research was performed in collaboration with The Hospital for Sick Children and the University of California San Diego. It is published in the latest issue of PLoS Biology.