February 1, 2015 (Vol. 35, No. 3)
A More Embedded, Pervasive Genomics Sets the Stage for Increasingly Ambitious Applications
Many factors drive the next-generation sequencing (NGS) market in regulated environments. Only a few years ago, the throughput and price point did not allow for easy transition from existing technologies.
The launch of benchtop instruments has significantly reduced the capital equipment costs and simplified the skill sets required for operation, expanding common near-term applications in noninvasive prenatal testing (NIPT), oncology, virology, drug-resistance testing, and infectious disease.
Scientists from around the world gathered to discuss the changing dynamics of NGS at a VIB event, Revolutionizing Next-Generation Sequencing: Tools and Technologies, which took place January 15–16 in Leuven, Belgium.
Understanding how a genetic profile can help to streamline clinical diagnostic methods also drives the NGS market. Sequencing offers an opportunity to revolutionize diagnostics through noninvasive methods, and to reduce unnecessary medical treatments that may not fit a genetic profile.
According to Paul Schaffer, lifecycle leader, sequencing platforms, Roche Diagnostics, each of the existing commercial sequencing platforms has limitations that the user needs to assess prior to platform purchase, such as the need for workflow automation, bioinformatics support, and data analysis. Roche’s recently announced collaborations and acquisitions of long-read sequencing technologies will allow for phasing and the ability to sequence across repetitive regions, simplifying data analysis, a current bottleneck.
The quick turnaround time of the platforms under development also allows sequencing results to be integrated in clinical reports with results generated by other technologies. If the sample-to-result turnaround time is less than one week for the other technologies, then the NGS workflow needs to be completed in the same time frame. In addition, single-molecule sequencing allows for the detection of nucleotide base modifications, eliminates GC sequencing bias, and streamlines workflow.
The clinical market will grow significantly from where it is today, according to Schaffer. Not only are near-term applications driving demand, but current NGS technologies are maturing and becoming more user-friendly while costs continue to decrease. The adoption rate in the U.S. clinical markets will depend on the implementation of the guidelines for laboratory-developed tests (LDTs) and concerns about reimbursement and ethics.
A New Mindset
In a research environment, it used to be acceptable for an NGS analysis to take 20 hours. Today, with enormous data stores being generated and a genomics clinical diagnostics environment being developed, there is a need for high-speed, low-cost NGS processing. Specifically developed for NGS, DRAGEN (Dynamic Read Analysis for Genomes) is a digital bio-IT processor.
DRAGEN performs the reference-based mapping, aligning, sorting, deduplication, and variant calling that are currently performed by software on cluster or cloud-based platforms. The processor accurately analyzes over 50 whole human genomes in less than a day, reducing the need for clusters of large servers, thereby lowering costs related to storage space and IT infrastructure.
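As a rough back-of-the-envelope check, the figures quoted above (an analysis that once took 20 hours, versus more than 50 genomes in less than a day) work out to under half an hour per genome. The Python sketch below does only that arithmetic. The function name is illustrative, and treating the 20-hour figure as the time for one genome is an assumption, not a number supplied by Edico Genome.

```python
def minutes_per_genome(genomes: int, hours: float) -> float:
    """Average wall-clock minutes per genome for a batch of analyses."""
    if genomes <= 0:
        raise ValueError("genomes must be positive")
    return hours * 60.0 / genomes

# Figures quoted in the article, used here as bounds rather than benchmarks.
dragen_minutes = minutes_per_genome(genomes=50, hours=24)
# Assumption: the 20-hour research-era analysis refers to a single genome.
legacy_minutes = minutes_per_genome(genomes=1, hours=20)
speedup = legacy_minutes / dragen_minutes

print(f"{dragen_minutes:.1f} min/genome, roughly {speedup:.0f}x faster")
```

Because the article says "over 50" genomes in "less than a day," the per-genome time computed here is an upper bound and the speedup a lower bound under the stated assumption.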
“My background is consumer electronics and high-volume electronics production. This is the type of mindset that will take genomics to the next level,” stated Pieter van Rooyen, Ph.D., CEO, Edico Genome.
“In the mid-80s, only a few people had cell phones, and there was one big company in the space. That is where we are in genomics today,” Dr. van Rooyen continued. “The equipment is big and clunky. The applications are promising, but no one is exactly sure how it is all going to turn out.
“I believe it is going to play a huge role in all of our lives, and our technology is just one small step in making that a reality. Genomics is a fundamental key in making the healthcare system more affordable. But for clinical applications, real-time processing is necessary, so results are immediately actionable by a physician.”
The goal is to reduce the dependence on IT infrastructure, to avoid relying on multiple servers, and to make sequencing a stand-alone box. DRAGEN allows the use of one server, and ultimately the processor will be integrated into the sequencer.
The Illumina HiSeq X Ten can sequence 18,000 whole human genomes a year at $1,000 each. “The HiSeq X Ten is ushering in the global genomic concept that we term geonomics. This is very large-scale sequencing which puts the clinical work first and then the research flows from that,” commented Alex Dickinson, Ph.D., senior vice president, strategic initiatives, Illumina.
An example of the clinic-to-laboratory approach, said Dr. Dickinson, is the UK 100,000 Genome Project. Run and funded by the National Health Service, this project is focused on rare diseases, cancer, and infectious disease.
“Although it is fundamentally a clinical project, it is accumulating data that has huge research potential,” observed Dr. Dickinson. “Applications like the 100,000 Genome Project require a big investment up front; the economic advantages come later.”
More sequencing generates more information. The task is to take the exponentially increasing flood of scientific papers on sequencing, which describe relationships to disease and medicine, and channel it into useful applications. Data interpretation, turning whole genome sequencing into helpful health recommendations, remains a technical hurdle.
“The true utility of large-scale sequencing will not emerge until significant numbers of genomes are available,” predicted Dr. Dickinson. “Leaps of faith are required to jump that gap. The first country to do that has the potential to be at the forefront of an entire new industry. Although sequencing is enabling, it is by no means a complete solution to make whole genome sequencing clinically useful.
“The genome does not change throughout your life; it is the map for your entire biology. You can imagine after someone is born that they would be genome sequenced, the data entered into their medical record, and almost any medical decision would be filtered through the genome including preventive care.”
Detecting All Genetic Variation
Despite developments in targeted gene sequencing and whole genome analysis techniques, the robust detection of all genetic variation in and around genes of interest remains a challenge. A potentially useful method called targeted locus amplification (TLA) selectively amplifies and sequences entire genes on the basis of the cross-linking of physically proximal sequences.
TLA enables the targeted sequencing as well as the haplotyping of any region of interest in any genome, irrespective of the sequence changes, and it provides comprehensive sequencing information about the presence of single nucleotide variants in addition to structural variants. The technology is highly suited for transgene analysis to determine integration sites, and to verify whether transgenes have really integrated in the appropriate manner.
TLA uses one primer pair, specific for the region of interest, for the targeted amplification. For the transgene application, primers specific for sequences that occur only in the transgene are used. Then the protocol provides complete sequencing information including any genetic alterations that may have occurred in the transgene and integration site.
According to Max van Min, CEO, Cergentis, current applications based on the whole-cell protocol involve the complete sequencing of regions that may be diagnostically relevant or linked to phenotypes of interest.
Examples include using TLA technology with cancer cells to sequence genes of interest and detect variation potentially linked to prognosis and drug response, and supporting pharmaceutical companies in the characterization of transgene model organisms and cell lines for disease or drug-response modeling, or cell lines for the production of proteins and primary antibodies.
Since TLA provides more genetic information than other technologies, the classical NGS challenge applies—how to interpret this additional genetic information. In some cases, the genetic variants of clinical interest are already known, but occasions exist in which it is difficult to classify a variant and determine whether it is linked to the trait of interest or is just benign.
The first version of the TLA protocol uses whole cells as the input material. A newly developed protocol is compatible with isolated DNA and opens up new opportunities in genetic diagnostics, enabling the targeted sequencing of any region of interest in microbial, plant, or animal genomes.
AmpliSeq technology allows massively parallel multiplex PCR for target enrichment of a genome or transcriptome subset in 2–3 hours with only 10–15 ng of input material. New assays are available for exomes, whole transcriptomes, smaller gene and RNA panels, chromosomal translocations or gene fusions, and copy number variation. Up to 20,000 primers or transcripts can be placed in a single tube or, in the case of the entire exome, 300,000 primers in 12 tubes.
Using AmpliSeq Transcriptome, certain RNA molecules of interest can be specifically amplified; the quantitative levels can be maintained after the amplification; and samples can be compared, allowing the quantitative detection of changes in RNA or gene abundance between the samples.
“The Ion sequencer does not use a termination reaction,” discussed Mike Lelivelt, Ph.D., senior director, bioinformatics products, Thermo Fisher Scientific (Life Technologies). “This saves time and gives us fast sequencing, but the historical cost was that sometimes, when we were sequencing the same stretch of nucleotides, exact quantitation could be difficult to measure.
“The power of NGS is that you get a lot of reads and redundancies. The collection of reads screens through nonsystematic errors and gives the right answer. Redundancy is your friend. We leverage that capability to clean up some of these error profiles, but we want to improve accuracy.”
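The idea that redundant reads screen out nonsystematic errors can be illustrated with a simple majority vote over the bases that cover one position. The Python sketch below is a toy illustration only, not Thermo Fisher's base-calling or variant-calling method. The function name and the 0.8 threshold are hypothetical.

```python
from collections import Counter

def consensus_base(pileup: str, min_fraction: float = 0.8) -> str:
    """Return the majority base at one position, or 'N' if none is dominant.

    pileup: the base reported at this position by every read covering it.
    min_fraction: hypothetical threshold for accepting the majority call.
    """
    if not pileup:
        return "N"
    base, count = Counter(pileup.upper()).most_common(1)[0]
    return base if count / len(pileup) >= min_fraction else "N"

# Twenty reads cover the site; two of them carry random (nonsystematic) errors.
reads_at_site = "A" * 18 + "C" + "G"
print(consensus_base(reads_at_site))
```

A vote of this kind only helps with random errors. A systematic error appears in most reads at the same position, so it would win the vote, which is why lowering systematic error at the enzyme level matters.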
Hi-Q Sequencing Kits resulted from an enzyme evolution program; the polymerase was mutagenized in a controlled manner to select mutants. Resultant enzymes gave fundamentally lower systematic error, and false positives have been lowered 50%, lessening the need for increased validation and providing a more accurate sequencing system.
Publicly available data comparing the AmpliSeq technology using the Ion PGM with Hi-Q relative to other systems demonstrated that Ion PGM plus AmpliSeq was more accurate for targeted panels using a controlled cancer sample, the AcroMetrix Oncology HotSpot panel, with over 500 COSMIC variants spiked in at known concentrations.
Analyzing Single-Cell and FFPE DNA Variants
Advances in next-generation sequencing (NGS) technologies have made it possible to analyze variants in DNA from single cells and from formalin-fixed, paraffin-embedded (FFPE) samples. However, assessing the quality of FFPE samples before sequencing and working with the limited amount of DNA in a single cell remain challenging.
In two studies, Qiagen researchers highlighted two approaches that address these challenges and demonstrated how these approaches are used in Qiagen’s sample-to-insight solution for NGS applications.
In the first study (“Single cell mutation detection with multiplex PCR-based targeted enrichment sequencing”), a multiplex polymerase chain reaction (PCR) targeted enrichment approach was used to demonstrate the feasibility of single nucleotide variant (SNV) detection on DNA from single cells. “This approach is ideal for such an application since a targeted enrichment approach makes the analysis of rare SNVs very efficient and robust,” said Eric Lader, an author on both of the Qiagen papers.
In the second study (“Advanced qualification and quantification of amplifiable genomic DNA for PCR-based targeted enrichment prior to next-generation sequencing”), a simple quantitative real-time PCR (qPCR) approach was used to assess the quality of FFPE samples and optimize NGS conditions to rescue low-quality samples that otherwise might not be amenable to sequencing based on other assessment methods. “The use of two qPCR assays targeting 40 genomic loci reduces bias compared to approaches that target few genomic locations,” pointed out Lader.
Aligning with Diploid Reference Sequences
The advent of next-generation sequencing has enabled the use of genomics data to diagnose human disease. But even the low false positive and overall error rate of this technology can be problematic for rare disease diagnoses, in part because of the use of a standard, monoploid reference to study every varied, diploid human genome.
The NIH Undiagnosed Diseases Program (NIH UDP) hypothesized that aligning to a diploid reference sequence personalized with individual, parental, and population information would improve the accuracy of alignment and genotype calling. The method, called DiploidAlign, utilizes genomic SNP chip data from family members along with population haplotype data from the 1000 Genomes Project, creating a personalized diploid reference sequence that can be used to align NGS data.
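To make the idea of a personalized diploid reference concrete, the Python sketch below applies already-phased SNP alleles to a standard monoploid sequence to produce two haplotype sequences. It is a minimal illustration, not the NIH UDP's DiploidAlign code. The function name, the data layout, and the assumption that phasing from family and population data has already been done are all hypothetical.

```python
def build_diploid_reference(
    reference: str,
    phased_snps: dict[int, tuple[str, str]],
) -> tuple[str, str]:
    """Apply phased SNP alleles to a monoploid reference, giving two haplotypes.

    phased_snps maps a 0-based position to (maternal_allele, paternal_allele).
    Assumption: phasing was resolved upstream; indels are not handled here.
    """
    maternal = list(reference)
    paternal = list(reference)
    for pos, (maternal_allele, paternal_allele) in phased_snps.items():
        if not 0 <= pos < len(reference):
            raise IndexError(f"SNP position {pos} is outside the reference")
        maternal[pos] = maternal_allele
        paternal[pos] = paternal_allele
    return "".join(maternal), "".join(paternal)

# Toy data: one heterozygous site (position 2) and one homozygous
# non-reference site (position 7).
reference = "ACGTACGTAC"
snps = {2: ("G", "T"), 7: ("C", "C")}
maternal_hap, paternal_hap = build_diploid_reference(reference, snps)
print(maternal_hap)
print(paternal_hap)
```

Reads would then be aligned against both haplotype sequences rather than the single standard reference, so a read carrying the individual's own allele matches one haplotype exactly instead of showing a mismatch.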
The pipelines were developed and tested on the NIH Biowulf supercomputing cluster and were optimized and run on the Ayrris platform on the Appistry cloud.
“DiploidAlign helps align short reads to the correct location, compared with alignment to a standard, monoploid reference,” said William A. Gahl, M.D., Ph.D., director of the NIH UDP. “Sanger validation showed that DiploidAlign resulted in more correctly called genotypes.”
The use of a personalized reference sequence saves analysis time by reducing false positives and helps the NIH UDP discover causal variants, continued Dr. Gahl. The pipeline has been utilized as a common platform for alignment and genotyping and has contributed to several diagnoses at the NIH UDP since September 2014, he added.