Ready for Medical Grade?
“One of the big problems with big data is lack of quality standards and, therefore, lack of performance metrics such as accuracy of assemblers, accuracy of genotyping calls, detection limits of variants, etc.,” said Justin Johnson, director of bioinformatics, EdgeBio. “How do we know that the next-gen sequencing results are, indeed, accurate?”
EdgeBio leads the development of the validation protocol underwritten by the X Prize Foundation, a nonprofit organization that creates and manages global competitions to solve challenges facing humanity. The Archon Genomics Xprize presented by Express Scripts, a $10 million award, will be given to the first team to sequence the genomes of 100 centenarians in 30 days cheaply, accurately, and completely.
The genomes must be sequenced with an error rate of no more than one in one million bases. At this level of quality, the resulting sequences approach “medical grade,” meaning that the data could be used in clinical care decisions. The validation protocol has two goals: first, to develop an answer key, and second, to create an automated system that scores submitted sequences against that key.
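To put the one-in-a-million target in familiar terms, the short sketch below converts it to a Phred quality score and estimates how many erroneous bases it still allows per genome. It assumes a uniform per-base error probability and a haploid human genome of roughly 3.1 billion bases; both are simplifying assumptions, and the helper names are illustrative.

```python
import math


def phred_quality(error_rate: float) -> float:
    """Convert a per-base error probability p to a Phred score, Q = -10 * log10(p)."""
    return -10 * math.log10(error_rate)


def expected_errors(genome_length: int, error_rate: float) -> float:
    """Expected number of wrong bases, assuming every base has the same error probability."""
    return genome_length * error_rate


# Accuracy target stated for the competition: one error per million bases.
TARGET_ERROR_RATE = 1e-6
# Assumption: a haploid human genome of roughly 3.1 billion bases.
HAPLOID_GENOME_BASES = 3_100_000_000

q = phred_quality(TARGET_ERROR_RATE)
errors = expected_errors(HAPLOID_GENOME_BASES, TARGET_ERROR_RATE)
print(f"Phred quality: Q{q:.0f}")
print(f"Expected errors per haploid genome: {errors:,.0f}")
```

This prints Q60 and about 3,100 errors; counting both copies of a diploid genome roughly doubles the error count, which is why even "medical-grade" data still needs downstream validation.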
To create the answer key, EdgeBio made 5,000 fosmids (cloned fragments of the genome totaling about 200 Mb) from two well-known reference samples, a Yoruba male and a CEU female. The fosmids were sequenced with three different methods to reveal the extent of platform-specific bias.
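As a rough consistency check on the 200 Mb figure, assume a typical fosmid insert of about 40 kb (a standard fosmid capacity, not a number stated in the source):

```latex
5{,}000\ \text{fosmids} \times 40\ \tfrac{\text{kb}}{\text{fosmid}} = 200{,}000\ \text{kb} = 200\ \text{Mb}
```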
“About 15% of single nucleotide polymorphisms (SNPs) can be attributed to sequencing technologies,” continued Johnson. “We evaluated the discordance between platforms and used multiple statistical algorithms to annotate true positives and true negatives.”
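The cross-platform comparison can be illustrated with a simple vote count. This is a minimal sketch assuming each platform's output has been reduced to a set of variants keyed by chromosome, position, reference allele, and alternate allele; the function name, the majority-vote rule, and the example call sets are hypothetical and do not represent EdgeBio's statistical algorithms.

```python
from collections import Counter

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


def classify_by_platform_support(call_sets: dict[str, set[Variant]]) -> dict[str, set[Variant]]:
    """Group variants by how many platforms called them (a simple vote, not a statistical model)."""
    # Each call set is a set, so a platform contributes at most one vote per variant.
    support = Counter(v for calls in call_sets.values() for v in calls)
    n_platforms = len(call_sets)
    groups: dict[str, set[Variant]] = {"unanimous": set(), "majority": set(), "minority": set()}
    for variant, votes in support.items():
        if votes == n_platforms:
            groups["unanimous"].add(variant)
        elif votes > n_platforms / 2:
            groups["majority"].add(variant)
        else:
            # Called by half or fewer of the platforms; with three platforms, exactly one.
            groups["minority"].add(variant)
    return groups


# Illustrative call sets from three hypothetical platforms.
calls = {
    "platform_a": {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 500, "G", "A")},
    "platform_b": {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T")},
    "platform_c": {("chr1", 1000, "A", "G"), ("chr3", 42, "T", "C")},
}
groups = classify_by_platform_support(calls)
total = sum(len(g) for g in groups.values())
print(f"Minority (possibly platform-specific) fraction: {len(groups['minority']) / total:.2f}")
```

Calls seen on only one platform are candidates for technology-specific artifacts, loosely analogous to the share Johnson describes; a real answer key would weigh read depth, base quality, and other evidence rather than counting votes alone.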
Next, EdgeBio developed software that compares the answer key with other sequencing results from the same two reference samples, scores those results, and produces a quality report. The company integrated uploading of test sequences, comparison, scoring, and reporting into a single workflow with an intuitive interface (www.validationprotocol.org).
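Scoring a submission against an answer key usually comes down to counting agreements and disagreements. The sketch below assumes both the key and the submission are sets of simple variant tuples; the `Score` class, `score_against_key` function, and example data are hypothetical and are not the validation protocol's actual scoring code.

```python
from dataclasses import dataclass

# Hypothetical variant key: (chromosome, position, reference allele, alternate allele).
Variant = tuple[str, int, str, str]


@dataclass
class Score:
    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def precision(self) -> float:
        """Fraction of submitted calls that are in the answer key."""
        called = self.true_positives + self.false_positives
        return self.true_positives / called if called else 0.0

    @property
    def recall(self) -> float:
        """Fraction of answer-key variants that the submission recovered."""
        expected = self.true_positives + self.false_negatives
        return self.true_positives / expected if expected else 0.0


def score_against_key(answer_key: set[Variant], submitted: set[Variant]) -> Score:
    """Count agreements and disagreements between a submission and the answer key."""
    return Score(
        true_positives=len(submitted & answer_key),
        false_positives=len(submitted - answer_key),
        false_negatives=len(answer_key - submitted),
    )


answer_key = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 500, "G", "A"),
    ("chr2", 900, "T", "C"),
    ("chrX", 100, "G", "T"),
}
submitted = {
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 500, "G", "A"),
    ("chr3", 42, "T", "C"),
}
score = score_against_key(answer_key, submitted)
print(score)
print(f"precision={score.precision:.2f} recall={score.recall:.2f}")
```

Here the submission recovers three of five key variants and adds one spurious call, giving precision 0.75 and recall 0.60. A production scorer would also need to normalize variant representation (especially indels), compare genotypes, and restrict scoring to regions where the answer key is confident.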
Even before the Xprize, EdgeBio was deeply invested in clinical sequencing and received CLIA certification in 2012.
“While the significance of whole-genome sequencing is still not quite established, medical-grade sequencing of exomes, targeted gene panels, or transcriptomes may provide clinically actionable information,” said Johnson. “Development of performance metrics will speed up the incorporation of next-gen technologies into clinical diagnostics.”
“Simply annotating and aligning DNA sequences is not enough to discover their biomedical value,” said Martin Seifert, Ph.D., CEO, Genomatix. “To perform meaningful analysis of their sequencing data, researchers need to view it in combination with existing biological knowledge.”
The Genomatix Genome Analyzer (GGA) enables visualization of NGS data in the context of multiple databases containing a comprehensive compilation of information on transcriptional regulation, DNA binding sites, epigenomic spots, and signaling networks.
“Knowledge datasets are available for 33 different organisms, adding up to several terabytes of data. Cross-organism comparisons help assign meaning to genetic elements whose function is not yet understood.”
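Viewing sequencing data in the context of existing annotations, as the GGA is described as doing, typically starts with an overlap step: which known regions does each observed variant or read fall into? The sketch below is a generic illustration of that idea, not the GGA's method or API; the `Region` type, `annotate` function, and region labels are hypothetical, and coordinates are assumed to be 0-based, half-open intervals as in BED files.

```python
from collections import defaultdict
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive (BED-style half-open interval)
    label: str


def annotate(variants: list[tuple[str, int]], regions: list[Region]) -> dict[tuple[str, int], list[str]]:
    """Map each (chrom, 0-based position) to the labels of all regions that contain it.

    Uses a linear scan per chromosome, which is fine for a sketch; a real tool
    would use an interval index to handle genome-scale annotation sets.
    """
    by_chrom: dict[str, list[Region]] = defaultdict(list)
    for region in regions:
        by_chrom[region.chrom].append(region)
    return {
        (chrom, pos): [r.label for r in by_chrom.get(chrom, []) if r.start <= pos < r.end]
        for chrom, pos in variants
    }


# Hypothetical annotation regions and observed variant positions.
regions = [
    Region("chr1", 900, 1100, "hypothetical_tf_binding_site"),
    Region("chr1", 1050, 1500, "hypothetical_enhancer"),
]
variants = [("chr1", 1000), ("chr1", 1080), ("chr2", 10)]
for key, labels in annotate(variants, regions).items():
    print(key, labels or "no overlapping annotation")
```

Positions that overlap several annotations (such as the second variant here) are where context from multiple knowledge sources becomes most informative.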