March 1, 2016 (Vol. 36, No. 5)
Purpose-Built for Genomics, Dragen Processor Could Form Core of Clinic-Ready Data Systems
The world’s first processor expressly designed to perform secondary analysis of next-generation sequencing data is relieving the bottleneck between sequencing and practical application. Setting a new standard for speedy analysis, the processor recently allowed a provider of clinical sequencing services to analyze sequenced genomes in just 26 minutes each, a task that normally takes around 30 hours with conventional analysis. Moreover, it did so while maintaining high sensitivity and specificity.
The Dragen™ Bio-IT Processor, developed by Edico Genome, performs reference-based mapping, aligning, sorting, deduplication, and variant calling using optimized algorithms. During the 26-minute-long analysis just mentioned, Dragen processed raw sequence data from a single whole genome, converting the data from base call format (BCL), the sequencing instrument’s native format, to variant call format (VCF) at 30× coverage. By maintaining this level of performance, the processor was able to analyze 50 whole human genomes within 24 hours. Earlier, when the processor was used to enable a record-breaking 26-hour diagnosis of critically ill newborns at Children’s Mercy Kansas City, it exhibited a sensitivity and specificity of 99.5%.
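The throughput figures above can be checked with simple arithmetic. The sketch below uses only numbers quoted in this article (26 minutes per genome, roughly 30 hours for conventional analysis) and assumes, purely for illustration, that genomes are processed back to back on a single processor with no idle time between runs. The function names are hypothetical and are not part of any Edico Genome software.

```python
# Back-of-the-envelope check of the throughput figures quoted in the article.
# Assumption (not from the article): genomes run back to back on one
# processor, with no idle time between runs.

MINUTES_PER_GENOME = 26      # quoted Dragen run time, BCL to VCF
CONVENTIONAL_HOURS = 30      # quoted run time for conventional analysis
MINUTES_PER_DAY = 24 * 60


def genomes_per_day(minutes_per_genome: float) -> int:
    """Whole genomes that fit into 24 hours at a fixed per-genome run time."""
    return int(MINUTES_PER_DAY // minutes_per_genome)


def speedup(conventional_hours: float, minutes_per_genome: float) -> float:
    """Ratio of the conventional run time to the accelerated run time."""
    return conventional_hours * 60 / minutes_per_genome


def hours_for_batch(n_genomes: int, minutes_per_genome: float) -> float:
    """Total wall-clock hours needed to process n_genomes sequentially."""
    return n_genomes * minutes_per_genome / 60


if __name__ == "__main__":
    print(genomes_per_day(MINUTES_PER_GENOME))
    print(round(speedup(CONVENTIONAL_HOURS, MINUTES_PER_GENOME), 1))
    print(round(hours_for_batch(50, MINUTES_PER_GENOME), 1))
```

Under these assumptions, 50 genomes at 26 minutes apiece need a little under 22 hours, which is consistent with the claim of 50 whole human genomes within 24 hours, and the per-genome speedup over a 30-hour conventional run is roughly 70-fold.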
Reduced Computational Overhead
Because the Dragen processor sits near the genome sequencer, there is no need to upload massive quantities of data to other systems for analysis. This reduces the computational overhead for both analysis and storage.
Data from a single genome, straight from the genome sequencer, typically requires more than 200 GB of storage. “Our analytic output is a couple of magnitudes smaller, at about 200 MB,” notes Pieter van Rooyen, Ph.D., Edico Genome’s president and CEO. “Researchers, therefore, can analyze data locally, while also retaining raw data from joint genotyping analyses.”
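To put the storage figures in perspective, the following sketch computes the reduction ratio from the two sizes quoted above (about 200 GB of raw data and about 200 MB of analytic output). It assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes), and the 50-genome batch is an illustrative choice borrowed from the throughput figure earlier in the article, not a storage figure reported by Edico Genome.

```python
import math

# Sizes quoted in the article, converted with decimal units (an assumption).
RAW_BYTES_PER_GENOME = 200 * 10**9      # about 200 GB from the sequencer
OUTPUT_BYTES_PER_GENOME = 200 * 10**6   # about 200 MB of analytic output


def reduction_ratio(raw_bytes: int, output_bytes: int) -> float:
    """How many times smaller the analytic output is than the raw data."""
    return raw_bytes / output_bytes


def orders_of_magnitude(ratio: float) -> float:
    """Base-10 logarithm of the reduction ratio."""
    return math.log10(ratio)


def batch_storage_gb(n_genomes: int, bytes_per_genome: int) -> float:
    """Total storage in decimal gigabytes for a batch of genomes."""
    return n_genomes * bytes_per_genome / 10**9


if __name__ == "__main__":
    ratio = reduction_ratio(RAW_BYTES_PER_GENOME, OUTPUT_BYTES_PER_GENOME)
    print(ratio, orders_of_magnitude(ratio))
    # Illustrative batch of 50 genomes: raw data versus analytic output.
    print(batch_storage_gb(50, RAW_BYTES_PER_GENOME))
    print(batch_storage_gb(50, OUTPUT_BYTES_PER_GENOME))
```

Taken at face value, the quoted sizes differ by a factor of about 1,000, or roughly three orders of magnitude. For a batch of 50 genomes, that is the difference between about 10 TB of raw data and about 10 GB of output.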
“Ultimately,” declares Dr. van Rooyen, “we plan to enable results to be uploaded to a hybrid cloud.”
In their work, Edico Genome researchers took account of converging developments in genomics and IT. Genomic sequencing platforms have become so powerful, and genomic data sets so extensive, that better analytical platforms have become increasingly necessary. At the same time, technologies such as consumer electronics, semiconductors, and big data keep improving. By bringing these genomic and IT trends together, the company reasoned, it might be possible to achieve a breakthrough.
The use of big data often depends on computational clouds to provide the processing power and storage, but many companies aren’t comfortable storing sensitive data in the public cloud. Edico Genome means to resolve that concern through the hybrid cloud. Specifically, the company proposes that researchers perform their analyses locally with the Dragen Bio-IT Processor on their own servers and, eventually, in the IBM Watson cloud environment using Dragen’s Platform as a Service (PaaS) option.
According to Edico Genome, researchers using Dragen could perform whole-genome analysis on site and leverage the speed and scalability of the cloud or the privacy and security of their in-house computing platforms without sacrificing power, efficiency, or analytic quality. The idea, says Dr. van Rooyen, is to strike a “good balance.”
Regardless of where the analysis is performed, researchers can use the same file names and access data the same way. Storage is transparent to the user. Consequently, Edico Genome suggests, researchers can be assured of always accessing the most current data.
Scalability
The Dragen Bio-IT Processor is integrated on a PCIe card, an arrangement meant to achieve the fastest possible data transfer. Because the system is available on a preconfigured server, workflow integration near the point of initial data generation is possible. Ultimately, says Edico Genome, availability through IBM’s Watson will make it easier to share data with colleagues, to run in-house or third-party apps atop the platform, and to scale for testing. Scalability can range from small three-person studies to large population studies.
Dragen recently was chosen by Macrogen, one of the world’s largest sequencing centers, for its large-scale genomic analysis and clinical analysis services. Using Dragen, Macrogen analyzed each genome sequenced by the HiSeq X Ten platform at 30× coverage in 26 minutes, including the time needed to convert files from BCL to standard FASTQ format and process through to VCF.
Updates are available on a “pull” basis, so researchers can update the processor at their convenience and ensure that analyses are completed using the same version of the platform. “People think hardware platforms are limited, but our ability to roll out updates far outpaces that of software-based systems,” asserts Gavin Stone, vice president, marketing. “For example, upgrades to a software pipeline take 26 to 30 hours, but we can update the field-programmable gate array (FPGA)-based Dragen platform (essentially, firmware) in 26 minutes.”
“All data is encrypted,” he continues. “Also, data is compliant with standards set by the Global Alliance for Genomics and Health (GA4GH).”
Edico Genome’s development team combines backgrounds in consumer electronics and bioinformatics, as well as genomics and genetics. Access to such wide-ranging expertise, contends Dr. van Rooyen, is advantageous for the company and its customers. “Typically, people in genomics have backgrounds in biology, not hardware,” he explains. “We took the principles of consumer IT and applied them to a substantial problem in genomics, working closely with genome sequencing centers to ensure our platform performs the way they need it to.”
The Future of Healthcare
“Genomics is the future of healthcare,” Dr. van Rooyen declares. “But to make genomics widely available, there must be a fundamental infrastructure. Our vision is to provide that.”
Dragen provides the secondary analytics platform. Edico Genome also is working with other companies to develop or optimize apps that run atop that platform. Dragen currently is used by the U.K.’s Genome Analysis Centre (TGAC), the Centers for Disease Control and Prevention, HudsonAlpha Institute for Biotechnology, PerkinElmer, Harvard University, Stanford University, and other leading organizations. Edico Genome has worked closely with the Broad Institute to offer the institute’s genome analysis algorithm, Genome Analysis Toolkit (GATK), and MuTect 2, a cancer-specific GATK-based variant caller, on Dragen.
“The inflection point is coming,” predicts Dr. van Rooyen. “The infrastructure for genomics is building, and standards are being put in place. When these activities reach their culmination, genomics data will make its way quickly from research labs to routine clinical use.”
Edico Genome
Location: 3344 North Torrey Pines Court, La Jolla, CA 92037
Phone: (858) 260-5234
Website: www.edicogenome.com
Principal: Pieter van Rooyen, Ph.D., President and CEO
Number of Employees: 37
Focus: Edico Genome has created a bioinformatics processor to analyze sequencing data. The company’s platform as a service (PaaS) option is designed to enable swift whole-genome analysis for clinical applications and research studies from individual to population scale.