September 1, 2009 (Vol. 29, No. 15)
Researchers Are Finding Myriad Applications for This Rapidly Evolving Technology
Next-generation sequencing (NGS) has arrived. “The expansion of users and applications shows no signs of abating,” notes Michael Rhodes, Ph.D., senior manager, SOLiD Sequencing at Applied Biosystems, a division of Life Technologies. “The uptick of transcriptomics applications has been faster than expected. Human genome resequencing projects will increase, and the existing eight published genomes will vastly increase.”
“NGS is an important area right now,” agrees Fred Ernani, Ph.D., product manager at Agilent Technologies. “Scientists look for solid tools and improved workflows. Making tools more efficient is crucial. Many lower throughput systems don’t lend themselves to automation so variability in results needs to be addressed.”
In addition to de novo sequencing and resequencing, instrument manufacturers see expansion into amplicon, transcriptome, and metagenomic sequencing. The response has been swift: product rollouts start at CHI’s upcoming “Exploring Next Generation Sequencing Conference,” to be held later this month, and continue through the fourth quarter and into next year.
“This conference is more of a pharmaceutical venue with an eye on clinical research,” says Timothy Harkins, Ph.D., director of marketing at Roche Applied Science.
“We were first to market with 454 sequencing, resulting in an undue amount of scrutiny, as we were first to challenge traditional Sanger technologies,” continues Dr. Harkins. “Ultimately, this was good for the industry, as we were held to a higher standard and thus set the pace. Downside, it slowed adaptation. Our platform is usually the first adopted for new applications.
“What we initially observed was a good snapshot of what happened on the biological level, as you need a lot of sequencing reads to identify transcription site binding events within the genome. As our competition increases the number of short reads for their respective platforms, we see that they are better suited for this specific application.”
By using Roche NimbleGen’s Sequence Capture technologies coupled with 454’s NGS, it is possible to readily sequence over 170,000 exons within the human genome in a single instrument run, adds Dr. Harkins. “We present a series of projects demonstrating performance of combining sequence capture arrays with the Genome Sequencer FLX. These projects include analysis of publicly available reference genomes to assess what coverage models are needed to identify genetic variants within the human exome.” [See the Assay tutorial on page 36 for more information on the combination of these products.]
Dr. Harkins notes that sequencing the whole human genome is neither practical nor feasible for labs outside of the genome centers.
“Sequencing the whole exome provides a practical approach to sequencing the complete protein coding portion of the human genome,” he says. “We have done several projects sequencing the whole exome and shown it to be effective in population studies and disease models in identifying all genetic variants within the coding regions.”
Abizar Lakdawalla, Ph.D., senior product manager, sequencing systems, Illumina, describes the Genome Analyzer as a massively parallel next-generation DNA sequencer. “It reveals information on the genome, epigenome, transcriptome, and the protein-nucleic acid interactome that had not even been imagined before,” he says. “In a few years, personal genomics will be relatively routine.”
Illumina’s sequencing technology relies on attaching randomly fragmented genomic DNA to a planar, optically transparent surface. Attached DNA fragments are extended and bridge amplified to create an ultrahigh density sequencing flow cell with hundreds of millions of clusters, each containing ~1,000 copies of the same template.
This platform uses the bridge-amplification process, eliminating the substantial challenges associated with emulsion PCR, explains Dr. Lakdawalla. These amplified templates are sequenced using a four-color DNA sequencing-by-synthesis technology that employs reversible terminators with removable fluorescent dyes. This approach ensures high accuracy and true base-by-base sequencing, reduces sequence-context-specific errors, and enables sequencing through homopolymers and repetitive sequences, he claims.
“Most of the scientific excitement is derived from the application of the sequencing technology in gene expression, protein-nucleic acid interactions, and genomics. New data, new algorithms, novel insights, a change in our understanding of basic biology; a lot has changed in a short time,” says Dr. Lakdawalla. “True systems biology is now on the horizon. We already see exciting clinical applications—prenatal detection of chromosomal abnormalities at much higher effectiveness and lower cost than any other current technology.”
DNA Library Preparation
Nicholas Caruccio, Ph.D., director of market development at Epicentre Biotechnologies, points out that DNA library preparation is a common entry point for NGS. The company’s Nextera technology is a single-tube method for preparing fragmented and tagged DNA libraries in less than eight hours. This technique can generate libraries for multiple NGS platforms, whole-genome amplification, and related applications, explains Dr. Caruccio.
“Our core competency is molecular biology,” continues Dr. Caruccio. “We’ve primarily dealt with cloning and manipulation of genomes and RNA, so we’re a good fit with the library preparation market. Researchers performing Sanger sequencing used our transposon technology to get coverage across large fragments. Recently, we realized we could exploit this method and modify the transposons; we ended up creating an NGS Library.”
Library preparation currently has three distinct steps: fragment the DNA, repair the ends, and ligate on adaptors. “We combine these steps into one transposon fragmentation and tagging reaction. The end result is a flexible library that is compatible with either Illumina or 454 sequencing,” notes Dr. Caruccio.
Genomic Loci Enrichment
RainDance Technologies’ platform was designed to enable PCR for high-throughput amplification of targeted regions of the genome, according to James Brayer, manager of commercial scientific applications.
“The RainDance Sequence Enrichment Solution allows researchers to generate targeted resequencing data necessary for the characterization of rare variants associated with human health and disease,” he says.
RainDance launched its first commercial application earlier this year, leveraging the company’s RainStorm™ microdroplet-based technology. “This application facilitates the follow-up of genome-wide association studies and the resequencing of candidate genes and certain pathways of interest,” he notes.
“Because we use PCR, we can target those genome areas that are of interest to researchers—areas that are often inaccessible using alternative enrichment methods due to repetitive sequence or regions of high homology. The droplets produced are uniform, resulting in consistent amplification of the greater than one million droplets generated per sample. Consequently, targeted regions are uniformly represented. The process is scalable.
“Researchers apply this technology to accurately identify rare variants, which is critical to understand the genetic basis of disease. Via collaboration with Expression Analysis, we work with eight researchers in the cancer community, each with a slightly different target in mind, to analyze genome content.”
Patrice M. Milos, Ph.D., vp and CSO, Helicos BioSciences, says her company focuses on scientists who sequence directly. She notes that Helicos takes a different approach to cell biology research because its method requires no PCR, ligation, or gel purification.
“Genomic DNA is sheared, tailed with poly A, and hybridized to the flow cell surface containing oligo dT for initiating sequencing by synthesis,” says Dr. Milos. “Our technology has been used successfully to sequence an array of bacteria representing the diverse genomic content of microorganisms, C. elegans, and, just recently, a near complete human genome.”
Dr. Milos explains that the nature of sample preparation has allowed for direct sequencing of nucleic acid from a variety of sample types including formalin-fixed paraffin embedded tissue and archival tissue samples. “We also optimize our sample preparation to allow preparation and sequencing from picogram quantities of nucleic acid,” she continues.
Helicos addresses what to do with the data, too. “We provide pipelines that help process data. We have SNP Sniffer software for identifying nucleotide variants,” says Dr. Milos. “Other companies’ work allows us to layer our sequence data with their data-visualization tools.”
Target enrichment is useful when a researcher is interested in sequencing a particular genome segment. Agilent’s SureSelect platform can capture a subset of exons or other genome targets and wash away the rest of the genome prior to sequencing. SureSelect replaces other labor-intensive methods of targeted resequencing, such as PCR techniques that cause major bottlenecks in most NGS workflows, according to Dr. Ernani.
Consequently, Agilent recently introduced the SureSelect DNA Capture Array “as a way to do smaller scale target enrichment,” he says. “This complements Agilent’s in-solution SureSelect Target Enrichment System, designed for a broad range of NGS study sizes including automated high-throughput workflows.”
The Agilent SureSelect DNA Capture Array was the result of work done in collaboration with the Gregory Hannon lab at Cold Spring Harbor Laboratory. Probes were designed through eArray, Agilent’s web-based design application, and high-fidelity microarrays were then custom synthesized using the SurePrint fabrication platform.
In addition, Dr. Ernani notes that Agilent acquired Velocity 11, now called Agilent Automation Systems, and is integrating these new capabilities into the next-generation workflow, particularly the BRAVO system. Agilent also provides tools for quality control of samples using the microfluidic 2100 Bioanalyzer with its High Sensitivity DNA Kit.
“This combination allows our customers to use less PCR in their target enrichment strategy, thus reducing potential for sequencing bias,” explains Dr. Ernani.
Targeted Sequence Capture
Despite NGS advances, notes Michelle Lyles, Ph.D., vp, marketing and sales, febit, “it still takes weeks to sequence a typical mammalian-sized genome at an acceptable depth of coverage for most applications (such as rare mutation discovery).”
Certain high-value genome regions are associated with certain disease states, phenotypic traits, or responses to drug treatment or other environmental stimuli. Dr. Lyles explains that targeted resequencing of genomes can identify variants faster and at a much lower cost than whole-genome sequencing.
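A back-of-the-envelope calculation illustrates the scale of that difference. The sketch below relies only on the standard relation that average depth of coverage equals total sequenced bases divided by target size. The genome size, region size, read length, and depth are illustrative assumptions rather than figures from febit, and the calculation ignores capture efficiency and off-target reads.

```python
import math

def reads_needed(target_bp, depth, read_len_bp):
    """Reads required so that average depth = reads * read_len / target size."""
    return math.ceil(target_bp * depth / read_len_bp)

# Illustrative assumptions only (not taken from the article):
GENOME_BP = 3_000_000_000   # approximate size of a mammalian genome
TARGET_BP = 1_000_000       # a hypothetical 1 Mb region of interest
DEPTH = 30                  # hypothetical average depth of coverage
READ_LEN = 50               # hypothetical short-read length in bases

whole_genome_reads = reads_needed(GENOME_BP, DEPTH, READ_LEN)
targeted_reads = reads_needed(TARGET_BP, DEPTH, READ_LEN)

print(f"Whole genome:  {whole_genome_reads:,} reads")
print(f"Targeted 1 Mb: {targeted_reads:,} reads")
print(f"Fold reduction: {whole_genome_reads / targeted_reads:,.0f}x")
```

Under these assumed numbers the targeted experiment needs roughly three thousand times fewer reads. In practice the saving is smaller, because no capture method is perfectly specific.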
“What febit’s technology, called HybSelect, can do is find the needle in the haystack,” maintains Dr. Lyles. “Essentially, HybSelect focuses your sequencing efforts on the areas you want to sequence. febit integrates microfluidics and temperature control with classic microarrays into an instrument called the Geniom RT Analyzer. The instrument, with our bioinformatics approach for capture probe design, allows capture of regions of interest on Geniom Biochips.”
Genomic hot-spots ranging between tens and thousands of kilobases are targets for HybSelect. “You can load your sequencing libraries, hybridize them to capture probes on the Biochip, and conduct wash steps all in an automated fashion and all on a single instrument,” Dr. Lyles notes.
“HybSelect automates capture by hybridization and elution of the DNA of interest. Our customers are looking for capture-enrichment technologies for particular applications, and we are aligning our goals to address that.”
Applied Biosystems’ Dr. Rhodes says the company aligns for NGS expansion by providing end-to-end solutions for each application, with the goal of making sure that the entire process (sample preparation, library construction, sequencing reactions, and bioinformatics) is “as simple as possible.”
At the CHI meeting, Dr. Rhodes will present the latest data from the SOLiD system, emphasizing how the product enables researchers to use a single system to carry out systems biology experiments. “I’ll show examples in which genome sequencing data and transcriptomics data are combined,” he says. “I’ll also highlight some of the new analysis tools available as well as the results obtained with them.”
The biggest challenge, according to Dr. Rhodes, is bioinformatics. “The other parts of the NGS process are improving at such a pace that our users have found the bioinformatics hard to keep up with,” he explains. “To alleviate this, we make sure analysis is simple to execute and, as much as possible, in a preconfigured pipeline so that the user needs minimal interaction.”
“Many people bought machines and produced data; now analyzing that data is the issue,” adds Martin Seifert, Ph.D., GM Genomatix. “We start where hardware ends—we bring all the data analysis into one place and give it context.”
Dr. Seifert’s presentation will focus on live data analysis to show Genomatix’ core strength. “The large amounts of data derived from NGS projects make efficient data-mining strategies necessary if we are to keep pace with the platform data flow,” he explains.
“I’ll show strategies for analyzing epigenetic markers such as methylation. These strategies begin with high-efficiency mapping of raw sequence tags, rapid clustering and peak finding, and data integration into an up-to-date genome annotation database for downstream analysis. I’ll show seamless mining of enriched regions for epigenetic markers in correlation with RNA-seq data.”
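The clustering and peak-finding step can be pictured with a deliberately simple sketch. This is not Genomatix’ algorithm. It assumes tags have already been mapped to start positions on one chromosome, counts them in fixed windows, and merges adjacent windows that pass a threshold. The function name, window size, threshold, and tag positions are all hypothetical.

```python
from collections import Counter

def find_enriched_windows(read_starts, window=200, min_reads=5):
    """Return (start, end, count) for fixed windows holding >= min_reads tags.

    Adjacent enriched windows are merged into a single region.
    """
    counts = Counter(pos // window for pos in read_starts)
    regions = []
    for idx in sorted(counts):
        if counts[idx] < min_reads:
            continue
        start, end = idx * window, (idx + 1) * window
        if regions and regions[-1][1] == start:
            # Extend the previous region when the windows are contiguous.
            prev_start, _, prev_count = regions[-1]
            regions[-1] = (prev_start, end, prev_count + counts[idx])
        else:
            regions.append((start, end, counts[idx]))
    return regions

# Hypothetical mapped tag start positions on one chromosome.
tags = [105, 120, 130, 150, 190, 210, 220, 230, 260, 390, 1500, 2900]
print(find_enriched_windows(tags))
```

Production peak callers typically also model background signal and control samples. A fixed count threshold is used here only to keep the idea visible.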
DNAStar also tackles the data dilemma. “If you go back five years, getting data was expensive, and analysis was base by base,” says Tom Schwei, vp and GM. “Now, sequencing has become so inexpensive that it’s easy to produce a lot of data. Helping scientists filter through that data is crucial.”
DNAStar provides an array of tools to help scientists rein in their data. Their latest version of visualization software is GenVision v2.0, with expanded capabilities for creating publication-quality images depicting large quantities of genomic information. According to Schwei, “NGS volumes are trending toward shorter read lengths, which is a whole different animal, and we handle these aspects well.”
The company’s desktop software is designed to work with its Lasergene software to permit easy flow of larger data sets and to make real-time visual and data edits. “The analysis of data is a field that’s wide open,” says Schwei. “Templated and de novo assemblies are both challenging, each with its own issues. But developing solutions for both types of projects and more (e.g., transcriptome analysis) is complex. We develop tools to keep up with the trends.”
NGS vs. Microarrays
NGS improvements have enabled more detailed views of classic gene-expression experiments typically run on microarrays. “Typically, microarray experiments produce lists of genes with differential expression across a collection of samples,” says Michael Lelivelt, Ph.D., vp, genomics, Partek.
“However, several technical developments enable analysis of NGS mRNA-seq data to produce estimates of levels of individual transcripts differentially expressed rather than simply differentially expressed genes.”
Partek’s software is a desktop application that compares different genomes and transcriptomes. “Instead of mapping sequencing reads to genes, we map the reads to specific transcripts and then define global changes in the transcriptome across various conditions,” explains Dr. Lelivelt.
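One widely used way to put transcript-level read counts on a comparable scale is RPKM (reads per kilobase of transcript per million mapped reads). The sketch below is not Partek’s method. It assumes reads have already been assigned to transcripts, and the transcript names, lengths, and counts are invented for illustration.

```python
def rpkm(read_count, transcript_len_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_len_bp * total_mapped_reads)

# Hypothetical example: two isoforms of one gene in two conditions.
lengths = {"geneA_isoform1": 2000, "geneA_isoform2": 1000}
counts = {
    "control": {"geneA_isoform1": 400, "geneA_isoform2": 100},
    "treated": {"geneA_isoform1": 100, "geneA_isoform2": 250},
}
total_mapped = {"control": 10_000_000, "treated": 10_000_000}

for condition, per_transcript in counts.items():
    for name, n in per_transcript.items():
        value = rpkm(n, lengths[name], total_mapped[condition])
        print(f"{condition:8s} {name}: {value:.1f} RPKM")
```

In this made-up case the gene-level read count changes only modestly (500 to 350), while the two isoforms move in opposite directions. A gene-level list would miss that kind of change.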
“With NGS, one can look at gene expression in a less biased way than with microarrays. The ability to correlate changes in RNA abundance with changes in DNA template abundance helps corroborate scientific findings. Researchers want more integration of multiple genomic technologies.”
Giving researchers a good desktop tool to answer these questions makes personalized medicine more feasible. “More researchers use large-scale genomic technologies, because it is so much cheaper now to sequence transcriptomes and easier to define a set of differentially expressed transcripts through streamlined analysis. Lowering both economic and technological barriers will increase NGS’ impact on personalized medicine,” notes Dr. Lelivelt.