What Does It Take to Get There?
The ultimate goal continues to be a de novo sequence assembled with 99.99% accuracy, or no more than one error per 10,000 nucleotides, with essentially no gaps. The Sanger method remains the gold standard of sequencing, with read lengths and data quality that reportedly exceed those of its competitors. In the past three years the cost of Sanger sequencing has fallen to less than $0.60 per 1,000 bases, but this efficiency is enjoyed only by large genome centers.
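The accuracy target can be checked with simple arithmetic. The sketch below converts an error rate into a percentage accuracy and, as an added convenience not mentioned in the article, into the Phred-style quality score widely used to describe sequence data. The function names are illustrative only.

```python
import math

def accuracy_from_error_rate(errors: int, per_bases: int) -> float:
    """Per-base accuracy, in percent, for `errors` mistakes in `per_bases` nucleotides."""
    return 100.0 * (1.0 - errors / per_bases)

def phred_quality(errors: int, per_bases: int) -> float:
    """Phred-scaled quality: Q = -10 * log10(error probability)."""
    return -10.0 * math.log10(errors / per_bases)

for per_bases in (1_000, 10_000):
    acc = accuracy_from_error_rate(1, per_bases)
    q = phred_quality(1, per_bases)
    print(f"1 error per {per_bases:,} bases -> {acc:.2f}% accuracy, Q{q:.0f}")
```

One error per 1,000 bases corresponds to 99.9% accuracy, while one error per 10,000 bases corresponds to 99.99%.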
Most of the new approaches use methods other than Sanger. Direct methods determine each base of DNA individually. Indirect methods assemble DNA sequences based on the experimental determination of oligonucleotide content.
“Any upcoming technology has to demonstrate that it is cheaper and better than Sanger or has unique advantages,” adds Chad Nusbaum, Ph.D., co-director of the genome sequencing and analysis program at the Broad Institute (www.genome.wi.mit.edu). “Data types coming from new-generation sequencers vary quite significantly. Read length as well as quality varies dramatically, and because reads are short, assembly is a challenge. 454, for example, requires 20-fold coverage or more of their 100–200 bp reads to be able to assemble contigs of useful size.
“Shorter reads require even deeper coverage to assemble. By comparison, a typical draft assembly of 700-base reads (as generated by an Applied Biosystems instrument) requires only sevenfold coverage. On the other hand, one advantage of 454 is that with a single instrument you can set up a small-scale genome center with minimal investment in infrastructure and personnel.” According to the company, the estimated cost of 454 sequencing is about $0.50 per 1,000 bases.
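To see what these coverage figures imply, the sketch below estimates how many reads, and how much raw sequence, each approach would need for the same genome. It uses the coverage, read-length, and per-1,000-base prices quoted in the article; the 5 Mb genome size is a hypothetical value chosen for illustration, and the cost covers raw bases only, not library preparation, labor, or instruments.

```python
import math

def reads_needed(genome_size: int, coverage: float, read_length: int) -> int:
    """Reads required so that reads * read_length = coverage * genome_size."""
    return math.ceil(genome_size * coverage / read_length)

def raw_cost_usd(genome_size: int, coverage: float, usd_per_kb: float) -> float:
    """Cost of the raw bases alone: total sequenced kilobases times price per kilobase."""
    return genome_size * coverage / 1_000 * usd_per_kb

GENOME_SIZE = 5_000_000  # hypothetical 5 Mb bacterial genome (assumption, not from the article)

scenarios = [
    # (label, fold coverage, read length in bases, USD per 1,000 bases)
    ("Sanger, 700-base reads", 7, 700, 0.60),
    ("454, 100-base reads", 20, 100, 0.50),
    ("454, 200-base reads", 20, 200, 0.50),
]

for label, coverage, read_length, price in scenarios:
    n_reads = reads_needed(GENOME_SIZE, coverage, read_length)
    cost = raw_cost_usd(GENOME_SIZE, coverage, price)
    print(f"{label}: {n_reads:,} reads, about ${cost:,.0f} in raw sequence")
```

Under these assumptions the 454 scenarios need 10 to 20 times as many reads as the Sanger scenario, and the deeper coverage outweighs the slightly lower price per base.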
“It is good to have different technologies, and eventually we may see some specialization according to the advantages of each technology,” emphasizes Dr. Peterson. “Sanger sequencing is able to read the homopolymeric regions that are still problematic with other methods. However, if DNA is not clonable then Sanger cannot be used. It is important to continue improving the Sanger process.”
The installed base of Sanger-based sequencers is estimated to be close to 10,000 instruments. According to the Association of Biomolecular Resource Facilities (www.abrf.org), the majority of the installed base resides at core labs and university sequencing centers. At the same time, perhaps only 25% of all Sanger reactions are run at these facilities.
The cost of core sequencing has progressively decreased, and in 2005 was estimated to be approximately $7 per combined reaction and run (2006 General Survey of DNA Sequencing Facilities).
“This market is not going to disappear for a long time,” says Stevan Jovanovich, Ph.D., president/CEO of Microchip Biotechnologies (MBI; www.microchipbiotech.com). “Core facilities run a range of projects from simple clone verification to large-scale sequencing, and they are not going to spend millions of dollars to replace the existing infrastructure. However, they would look favorably at decreasing costs while continuing to use a well-established Sanger method.”
MBI is developing a miniaturized version of cycle sequencing and integrated clean-up on a reusable chip. Eventually, the cycle-sequencing reaction would take place in a total volume of 25 nL. Microfluidic circuits for DNA amplification and purification would be operated by microrobotic valves and pumps, also integrated into the chip. The resulting reaction could be run on the existing instrumentation.
“If core facilities were able to use 1/1,000 of the current materials, their sequencing cost could be less than a cent per base, bringing them to par with genome centers,” says Dr. Jovanovich. In the future MBI plans to etch the actual capillary array onto the chip.
Many recognize the value of inexpensive technologies for high-quality resequencing. Sequencing-by-hybridization (SBH) is based on high-throughput miniaturized sequencing on microarrays of specially designed overlapping oligos. SBH does not determine nucleotide positions experimentally, but instead derives the sequence information indirectly from oligonucleotide content.
Other advantages include high parallelism, long reads, and the ability to sequence heterogeneous mixtures. While it is costly to make an initial array, the sequencing process itself is inexpensive and can be used efficiently for repeated interrogation of the same DNA region.
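The idea of deriving a sequence from oligonucleotide content alone can be illustrated with a small sketch. It uses a textbook approach (an Eulerian path through a graph of overlapping k-mers), which is an assumption for illustration and not necessarily how any SBH platform described here works. It also assumes an idealized, error-free set of k-mers in which every (k-1)-base overlap is unique.

```python
from collections import defaultdict

def kmer_spectrum(sequence: str, k: int) -> list[str]:
    """All overlapping k-mers of a sequence: the 'oligonucleotide content'."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def reconstruct(kmers: list[str]) -> str:
    """Rebuild a sequence from its k-mers by walking an Eulerian path."""
    graph = defaultdict(list)   # (k-1)-mer prefix -> list of (k-1)-mer suffixes
    balance = defaultdict(int)  # out-degree minus in-degree for each node
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        balance[prefix] += 1
        balance[suffix] -= 1

    # The path starts at the node with one more outgoing than incoming edge.
    start = next((node for node, b in balance.items() if b == 1), kmers[0][:-1])

    # Hierholzer's algorithm: follow unused edges, backtrack when stuck.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()

    # Each step along the path contributes one new base.
    return path[0] + "".join(node[-1] for node in path[1:])

target = "ATGCCGTAAC"
spectrum = sorted(kmer_spectrum(target, 4))  # sorting discards positional information
print(reconstruct(spectrum) == target)
```

When the same (k-1)-base overlap occurs more than once, several sequences can share one k-mer set and the reconstruction becomes ambiguous.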
“One of the areas where SBH would be very effective is the analysis of isolates of pathogenic bacteria,” offers Dr. Nusbaum. “With SBH you can efficiently sequence thousands of almost identical isolates. Genetic results combined with the data on virulence would give an insight into the molecular mechanism of pathogenicity.”
“However, SBH could run into problems if a target is substantially different from the reference,” adds Dr. Gibbs. “SBH also continues to struggle with signal-to-noise ratio. Adequate discrimination of all sequences under the same hybridization conditions is still a problem.”
Sequence information can also be extracted from single DNA molecules without amplifying the DNA or incorporating labels. The advantages of this method include high sensitivity, minimal use of reagents, and high parallelism. Nanopore sequencing is the most familiar model for this approach. “Theoretically, we can sequence at the speed of one million bases per second. This means that the whole human genome could be sequenced in less than one hour,” says Scott Collins, Ph.D., professor, University of Maine.
“In the electronic world such speeds are actually very low. A computer microprocessor operates at one hundred times higher speed.” Dr. Collins’ group is developing a silicon-based inorganic nanopore. As a DNA strand is threaded through the nanopore, four microscopic electrodes are used to detect the differences in tunneling current of each individual nucleotide.
“Our major obstacle lies in machining components in the nanometer range: a 1.8-nm nanopore and electrodes. At this scale the device components are only 6–7 atoms wide,” says Dr. Collins. Eventually all necessary electronic equipment could fit on a 1 mm × 1 mm disposable chip, which may cost only a few hundred dollars. Only a connection to a computer would be required to read the entire genome.
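Dr. Collins' estimate of less than one hour per genome follows directly from the quoted speed. The short calculation below assumes a human genome of roughly three billion bases, a commonly cited approximation that does not appear in the article, and a single pass with no redundant coverage.

```python
HUMAN_GENOME_BASES = 3_000_000_000  # approximate size; an assumption, not from the article

def sequencing_time_seconds(total_bases: int, bases_per_second: float) -> float:
    """Time to read `total_bases` once at a constant rate."""
    return total_bases / bases_per_second

seconds = sequencing_time_seconds(HUMAN_GENOME_BASES, 1_000_000)
print(f"{seconds:.0f} s = {seconds / 60:.0f} min for a single pass")
# prints "3000 s = 50 min for a single pass"
```

Any requirement for multiple passes over the same sequence would multiply this time accordingly.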
While nanopore technology is still in the future, a single-molecule sequencing technology developed by Helicos Biosciences (www.helicosbio.com) is close to commercialization. According to the company, its first-generation system, available in the second half of 2007, will be able to provide a 10^3 (1,000-fold) price advantage over current Sanger sequencing methods and will produce 10^9 (one billion) bases per day using sequencing-by-synthesis.
With reads of only 25 bases, the assembly of contigs is still a challenge, at least initially limiting the scope of possible applications. But because each DNA strand is analyzed as a separate sequence, Helicos was able to develop a paired-end read strategy that involves sequencing the initial 25 bases of the template, followed by an extension with a predetermined number of dark nucleotides. The extended fragment, still bound to the template, then undergoes another round of sequencing by synthesis. This process is a focus of a recently awarded NHGRI grant.
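The paired-end strategy can be pictured as two short windows on the same template separated by a gap of known length. The sketch below is a simplified model only: the 100-base dark gap is a hypothetical value, the function name is illustrative, and the reads are returned as template bases rather than as the complementary strand that synthesis would actually produce.

```python
def paired_end_reads(template: str, read_length: int = 25,
                     dark_gap: int = 100) -> tuple[str, str]:
    """Return two reads from one template strand.

    The first read covers the initial `read_length` bases. The next
    `dark_gap` bases are extended with unlabeled ("dark") nucleotides and
    therefore not read. The second read covers the `read_length` bases
    that follow the gap.
    """
    second_start = read_length + dark_gap
    if len(template) < second_start + read_length:
        raise ValueError("template too short for two reads plus the dark gap")
    first = template[:read_length]
    second = template[second_start:second_start + read_length]
    return first, second

template = "ACGT" * 50  # toy 200-base template (assumption for illustration)
first, second = paired_end_reads(template)
print(first, second)
```

Because the two reads are known to lie a fixed distance apart on one molecule, they give an assembler positional information that two independent 25-base reads would not.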