Technique Developed for Refining Multi-Source Genomic Data
A scientific team from the National Institute of Standards and Technology (NIST), Harvard, and the Virginia Bioinformatics Institute says it has developed new methods to integrate data from different sequencing platforms, producing a reliable set of genotypes for benchmarking human genome sequencing.
“Understanding the human genome is an immensely complex task and we need great methods to guide this research,” says Justin Zook, Ph.D., of NIST. “By establishing reference materials and gold-standard datasets, scientists are one step closer to bringing genome sequencing into clinical practice.”
The techniques put forth by the researchers are designed to make it increasingly possible to use an individual's genetic profile to guide medical decisions to prevent, diagnose, and treat diseases. Their report (“Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls”) appears in Nature Biotechnology.
“We present methods to make high-confidence, single-nucleotide polymorphism, indel and homozygous reference genotype calls for NA12878, the pilot genome for the Genome in a Bottle Consortium,” write the investigators.
“We minimize biases toward any sequencing platform or dataset by comparing and integrating 11 whole human genome and three exome datasets from five sequencing platforms,” says Dr. Zook.
NIST organized the Genome in a Bottle Consortium to make well-characterized, whole-genome reference materials available to research, commercial, and clinical laboratories.
The team drew on the expertise of David Mittelman, Ph.D., an associate professor of biological sciences at the Virginia Bioinformatics Institute. He and his group create tools that analyze vast amounts of genomic information.
The researchers developed a metric to assess the accuracy of variant calls and to understand biases and sources of error in sequencing and bioinformatics methods. Their findings are publicly available on the Genome Comparison and Analytic Testing (GCAT) website, which enables real-time benchmarking of any DNA-sequencing method. The free, collaborative online resource compares multiple analysis tools across a variety of crowd-sourced metrics and datasets.
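To illustrate what benchmarking against a high-confidence genotype set can look like in practice, here is a minimal Python sketch. It is not the researchers' metric and not how GCAT works; the `Call` record, the `benchmark` function, and the example variants are all hypothetical, made up for illustration. It simply counts how many genotype calls from a pipeline agree with a benchmark set and reports precision and recall.

```python
from typing import Iterable, NamedTuple


class Call(NamedTuple):
    """Hypothetical record for one genotype call (illustrative only)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    genotype: str  # e.g. "0/1" (heterozygous) or "1/1" (homozygous alt)


def benchmark(test_calls: Iterable[Call], truth_calls: Iterable[Call]) -> dict:
    """Compare a pipeline's calls with a benchmark set by exact match.

    A call counts as a true positive only if position, alleles, and
    genotype all match a benchmark call.
    """
    test, truth = set(test_calls), set(truth_calls)
    tp = len(test & truth)   # calls found in both sets
    fp = len(test - truth)   # calls the benchmark does not support
    fn = len(truth - test)   # benchmark calls the pipeline missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Made-up example data, not real NA12878 calls.
truth = [
    Call("chr1", 1000, "A", "G", "0/1"),
    Call("chr1", 2000, "C", "T", "1/1"),
    Call("chr2", 1500, "G", "GA", "0/1"),
]
test = [
    Call("chr1", 1000, "A", "G", "0/1"),   # matches the benchmark
    Call("chr1", 2000, "C", "T", "0/1"),   # right site, wrong genotype
    Call("chr2", 1500, "G", "GA", "0/1"),  # matches the benchmark
]

result = benchmark(test, truth)
print(result)
```

In this toy comparison, the call with the wrong genotype counts as both a false positive and a missed benchmark call. A real comparison would also need to restrict the analysis to regions where the benchmark is confident and to reconcile different representations of the same variant, which this sketch leaves out.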