A paper published online in March in the New England Journal of Medicine describes the use of Life Technologies’ SOLiD™ System to perform whole-genome sequencing, resulting in the identification of the genetic mutation associated with Charcot-Marie-Tooth disease, an inherited neurological disorder that impairs motor and sensory nerve function. The SOLiD system is an example of amplification-based gen-2 sequencing technology.
Researchers at the Institute for Systems Biology relied on the genome sequencing services of Complete Genomics to sequence the genomes of four members of a nuclear family in which the two children have Miller syndrome and ciliary dyskinesia, a recessive genetic disorder that causes lung disease similar to cystic fibrosis.
The company sequenced the four genomes to a depth of 51X to 88X, making a call on 85%–92% of the bases. Comparative analysis of the four genomes led to the identification of four genes consistent with recessive inheritance of rare variations, and one of these genes, called DHODH, was concurrently identified as the cause of Miller syndrome, a finding published in Nature Genetics.
“Last year at ‘AGBT’ we presented our first sequenced genome,” recalled Rade Drmanac, Ph.D., Complete Genomics’ CSO. “During that year we sequenced more than 50 genomes.” For 2010, the company’s goal is to sequence 5,000 genomes. Earlier this year, it sequenced eight genomes for a customer for a total price of $160,000, or $20,000/genome, and that price continues to drop. “We will get to the point where the cost of sequencing will not be a limiting factor for large studies,” said Dr. Drmanac. The company’s business model centers on providing a large-scale human genome sequencing service for genomics researchers.
The company’s sequencing technology is based on self-assembling DNA nanoarrays and ligation-based sequencing. When developing its technology, Complete Genomics intentionally steered clear of single-molecule detection methods, noted Dr. Drmanac, in order to keep both the cost of sequencing and the error rate as low as possible. Yet the company sought to build in some of the advantages that an SMS format offers. The result was a high-density rectangular nanoarray in which 200 nm diameter DNA spots, separated by a distance of 0.7 microns, are arrayed on a silicon chip. About three billion DNA spots can be arrayed on a single 1 inch x 3 inch chip in a precise grid.
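The spot count quoted above can be sanity-checked with simple grid arithmetic. The sketch below is only an illustration: the function name `grid_capacity` is hypothetical, and it assumes the whole 1 inch x 3 inch surface is patterned. The article does not say whether the 0.7 micron separation is measured center to center or edge to edge, so both readings are computed.

```python
NM_PER_INCH = 25_400_000  # 1 inch = 25.4 mm = 25,400,000 nm


def grid_capacity(width_in, height_in, pitch_nm):
    """Upper bound on spots in a rectangular grid with the given pitch.

    Assumes the entire chip surface is usable, which a real chip
    (edges, fiducials, fluidics) would not allow.
    """
    cols = round(width_in * NM_PER_INCH) // pitch_nm
    rows = round(height_in * NM_PER_INCH) // pitch_nm
    return cols * rows


# Reading 1: 0.7 micron is the center-to-center pitch.
center_to_center = grid_capacity(1, 3, 700)

# Reading 2: 0.7 micron is the gap between 200 nm spots,
# so the pitch is 700 nm + 200 nm = 900 nm.
edge_to_edge = grid_capacity(1, 3, 900)

print(f"pitch 700 nm: {center_to_center / 1e9:.2f} billion spots")
print(f"pitch 900 nm: {edge_to_edge / 1e9:.2f} billion spots")
```

Under these assumptions the two readings bracket the quoted figure of about three billion spots.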
The company developed a method for amplifying short (200–300 base) single-stranded fragments of circularized genomic DNA templates in solution using a traditional enzymatic reaction. This rolling circle replication strategy produces a concatemer of as many as 500 copies of the DNA fragment in one long strand. To prevent these molecules from becoming entangled as they increase in length, Complete Genomics devised a strategy by which the palindromic sequences present along each DNA chain fold back on each other as the replication progresses, forming a nanoball. These DNA nanoballs are about 200 nm in size and are highly polar particles that naturally repel each other in solution.
“In 30 minutes, a 1 mL sequencing reaction can create up to 50 billion DNA nanoballs, each composed of 500 copies of a genomic DNA sequence,” noted Dr. Drmanac. “We then flood the surface of a silicon chip with the solution of DNA nanoballs and they self-assemble into grids, with one nanoball per spot.” A single chip will contain enough amplified DNA to sequence an entire genome.
The fluorescent signals emitted from the chip are detected with a CCD camera. “We are currently able to measure a DNA spot using two pixels,” added Dr. Drmanac. The detection efficiency of the system will continue to increase as camera speeds improve.
Following the introduction of its first Genome Sequencer™ (GS) System—the GS 20—nearly five years ago, 454 Life Sciences, a Roche company, brought the GS FLX System to market in January 2007. Now, during the second quarter of this year, the company will launch the GS Junior System, a smaller, lower-cost version of the GS FLX.
Christopher McLeod, president and CEO of 454, described the benchtop GS Junior as “the personal computer for life sciences.” It has a throughput of >35 million bases per run, with a run time (including sequencing and data processing) of 12 hours. Average read lengths are 400 bases, with 99% accuracy. In addition, later this year the company plans to announce increased read-length capability of up to 1,000 bases for the GS FLX System.
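As a rough check on these specifications, the quoted throughput and read length imply a read count and an hourly rate. The sketch below uses only the figures given above; the function names are illustrative, and the result is a lower bound because the throughput is stated as greater than 35 million bases.

```python
def reads_per_run(bases_per_run, mean_read_length):
    """Approximate number of reads implied by total bases and read length."""
    return bases_per_run // mean_read_length


def bases_per_hour(bases_per_run, run_hours):
    """Average output rate over the whole run, data processing included."""
    return bases_per_run / run_hours


gs_junior_bases = 35_000_000   # stated as a minimum (>35 million bases per run)
gs_junior_read_length = 400    # average read length in bases
gs_junior_run_hours = 12       # sequencing plus data processing

print(reads_per_run(gs_junior_bases, gs_junior_read_length))
print(round(bases_per_hour(gs_junior_bases, gs_junior_run_hours)))
```

The implied read count, in the high tens of thousands per run, is the same order of magnitude as the "tens of thousands of clonal reads" McLeod cites.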
The 454 sequencing technology is a high-resolution, high-throughput genotyping tool, and the company intends to move it into the clinical setting for diagnostic applications in the future. Applications would include infectious-disease diagnostics, identification of drug-resistant variants of HIV, or detecting one or more alleles or sets of genes associated with a particular medical disorder.
At “AGBT”, Henry Erlich, Ph.D., director of human genetics and vp of discovery research at Roche Molecular Systems, described an application in which the GS FLX was used for human leukocyte antigen typing. The GS FLX is also currently being used for emerging pathogen surveillance in Africa as part of a joint project with Google.org, the philanthropic arm of Google.com.
Sample processing with 454 sequencing technology begins with random fragmentation of genomic DNA into 300–800 base-pair fragments and attachment of a DNA adaptor to the fragments to facilitate amplification, sequencing, and quantification. Each single-stranded DNA fragment in the library is immobilized on a capture bead and amplified in parallel via emulsion PCR, producing several million copies of the original fragment per bead.
Individual beads are loaded into the wells of a PicoTiterPlate device for sequencing. Earlier this year, 454 Life Sciences introduced the REM e System for the GS FLX, a robotic accessory module that automates liquid-handling functions for performing emulsion PCR and reportedly eliminates about five hours of manual lab work.
“In a single run we generate tens of thousands of clonal reads and can achieve unambiguous allele calling in less time and for less cost than conventional capillary electrophoresis genotyping methods,” said McLeod.
Doing More With Less
Taking advantage of the ability of single-molecule sequencing technology to generate data from small quantities of nucleic-acid template and to generate epigenetic information, Helicos is collaborating with researchers at Massachusetts General Hospital and the Broad Institute to perform chromatin immunoprecipitation (ChIP) studies to identify epigenetic markers of chromatin structure. The model system being used for this research is inner cell mass from preimplantation mouse blastocysts, which gives rise to early stem cells. ChIP DNA derived from the inner cell mass is present in only picogram quantities.
In Helicos’ tSMS technology, labeled nucleotides are mixed with nucleic-acid templates immobilized on a flow cell. Detection of the fluorescent signals emitted as a result of each base addition is performed in the HeliScope™ Genetic Analysis System. The Helicos system can sequence 105–180 megabases/hour with average read lengths of 33–36 bases from templates ranging in length from 25 to 5,000 bases.
For direct RNA sequencing, the system can produce 300–400 million aligned reads/run with an average read length of 34 nucleotides (range 25–55) and a <5% per nucleotide error rate. Dr. Milos presented qualitative and quantitative data from RNA studies using tSMS to map the 3´ ends of RNA transcripts from yeast and human liver cells, producing a high-resolution map of 3´ polyadenylation sites. Another project under way is using direct RNA sequencing to study a pool of micro-RNAs and generate miRNA count distribution. Early results suggest that the technique yields greater quantitative accuracy than conventional cDNA-based methods.
In the system introduced by Pacific Biosciences, sequencing takes place on SMRT cells, each of which contains thousands of zero-mode waveguides (ZMWs). Each ZMW is a hole tens of nanometers in diameter in a metal film that has been deposited on a silicon dioxide substrate.
The company describes each individual ZMW as a “nanophotonic visualization chamber” that serves as a window for observing the activity of the lone DNA polymerase molecule immobilized on the bottom surface of the ZMW. In this way the instrument can record the addition of each individual fluorophore-labeled phospholinked nucleotide in real time with a high signal-to-noise ratio despite the dense fluorogenic background.
Each labeled nucleotide emits a fluorescent signal that identifies the base being added to the growing DNA strand. The polymerase subsequently cleaves off the fluorophore, allowing the signal to return to baseline before the next nucleotide addition. Pacific Biosciences developed a method for nucleotide labeling in which the fluorophore is attached without interfering with the intrinsic speed, read length, and accuracy of the DNA polymerase.
Alex Parker, Ph.D., principal scientist for molecular sciences at Amgen, described the ongoing challenge of detecting multiple target genes in the often-limited amounts of DNA present in patient samples. He presented the company’s nanoliter-scale strategy for creating highly multiplexed sequencing libraries: target genes from multiple patient samples are amplified to enable massively parallel sequencing with gen-2 technology. Amgen developed an adaptor-mediated reverse-nested PCR technique using the Fluidigm Access Array PCR platform to create a multiplexed sequencing library. This strategy eliminates the need for large panels of PCR primers.
The method involves attaching a universal tag to the 5´ end of the PCR primers, which allows them to mate with another set of primers that contain a multiplexing identifier and an oligonucleotide linker sequence. Amgen devised this method for use with the 454 Life Sciences sequencing platform and thus selected a specific linker sequence that would attach each member of the sequencing library to the beads used in the 454 system. The multiplexing identifier is, in essence, a barcode specific for each sample being sequenced that identifies the sample of origin for each DNA fragment sequenced.
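To illustrate how a multiplexing identifier ties each read back to its sample, here is a minimal sketch of barcode demultiplexing. It is not Amgen's or 454's software: the barcode sequences, sample names, and the `demultiplex` function are hypothetical, and it assumes the barcode sits at the very start of each read and must match exactly.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Group reads by sample using an exact-match barcode prefix.

    Returns a dict mapping sample name to the list of reads with the
    barcode trimmed off. Reads matching no barcode go to "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcode_to_sample.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            bins["unassigned"].append(read)
    return dict(bins)


# Hypothetical 4-base barcodes, for illustration only.
barcodes = {"ACGT": "patient_1", "TGCA": "patient_2"}
reads = ["ACGTGGGA", "TGCATTTC", "ACGTCCCA", "GGGGAAAA"]

for sample, sample_reads in demultiplex(reads, barcodes).items():
    print(sample, sample_reads)
```

A production pipeline would typically also tolerate sequencing errors in the barcode; exact matching is used here only to keep the idea visible.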
By using a universal adaptor sequence, only one set of barcode primers is needed and each primer pair only needs to be synthesized once. The barcode primers can be reused and mixed with different sample DNAs in various combinations. For a multiplexed sequencing assay designed to analyze 100 different genes from 100 samples, for example, 200 primer pairs would be needed: 100 gene-specific pairs plus 100 barcode pairs. This paradigm allowed Amgen to carry out the PCR reactions in small volumes on the Fluidigm instrument, with 35 nL reaction volumes, compared to a typical 10 µL reaction volume using conventional PCR technology.
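The primer and volume figures above reduce to simple arithmetic, sketched below. One reading of the 200-pair figure is one gene-specific pair per gene plus one barcode pair per sample. The comparison case, in which every gene-sample combination needs its own barcoded pair, is an assumption added for contrast and is not stated in the article; the function names are illustrative.

```python
def primer_pairs_universal(n_genes, n_samples):
    """Pairs needed when a universal tag links gene primers to barcode primers."""
    return n_genes + n_samples


def primer_pairs_combined(n_genes, n_samples):
    """Assumed alternative: a dedicated barcoded pair per gene-sample combination."""
    return n_genes * n_samples


def volume_fold_reduction(conventional_nl, nanoliter_scale_nl):
    """How many times smaller the nanoliter-scale reaction is."""
    return conventional_nl / nanoliter_scale_nl


print(primer_pairs_universal(100, 100))
print(primer_pairs_combined(100, 100))

# 10 µL = 10,000 nL conventional reaction versus 35 nL on the Fluidigm instrument.
print(round(volume_fold_reduction(10_000, 35)))
```

Under these assumptions, the universal-tag design needs 200 pairs where the combined design would need 10,000, and each reaction uses roughly 286 times less volume.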
“We have been able to scale down the amount of input DNA,” using fewer “nonreplaceable research samples. Yet we typically get 5 percent more data per experiment compared to conventional PCR,” said Dr. Parker. This method yields faster results and reduces reagent volumes as well.
Dr. Parker and colleagues have also demonstrated that a high level of multiplexing is possible without significantly impacting the uniformity of the representation of PCR products in the final library. They are now applying this strategy to pathology samples, in which the DNA is often degraded due to the process of fixing the samples, and are working toward a standardized platform suitable for clinical applications.