GEN UPDATES in biotechnology:
Next-Generation Sequencing
Novel Methodologies for Rapid, Low-Cost Sequencing
Kate Marusina, Ph.D.
DNA sequencing is a powerful technique firmly entrenched in every aspect of biological research and drug discovery. Recent advances in high-throughput capillary sequencing, combined with the shotgun method of sample preparation, made sequencing of the whole human genome a reality. Sequencing of whole genomes leads to the identification of novel genes that may influence the development of human diseases, enable pathogens to avoid cellular defenses, or make some viruses lethal. Sequencing of the same genes across the population maps polymorphisms that may correlate with susceptibility to certain diseases or with adverse reactions to certain medications. Sequencing of individual genomes may someday become a part of routine medical care, as therapies will be tailored to a specific genetic profile. Thus, high accuracy combined with high throughput and low cost per run remains the holy grail of sequencing.
454 Life Sciences, a majority-owned subsidiary of CuraGen, has a vision of sequencing whole genomes in days, not months, at a cost 100 times lower than that of the currently available Sanger sequencing method. The company’s strategy rests on a "microchip for sequencing," a miniaturized and highly parallel version of sequencing based on detection of light.
Sanger-based Sequencing
The Sanger method (dideoxynucleotide sequencing) is the gold standard of DNA sequencing. This method utilizes fluorescently labeled 2',3'-dideoxynucleotide triphosphates (ddNTPs) that terminate DNA chain elongation because they cannot form a phosphodiester bond with the next dNTP. A particular ddNTP constitutes only 1% of the regular dNTP mix, enabling some DNA polymerization to proceed. Thus, the polymerase reaction produces a mixture of fluorescent products of various lengths that can be resolved by either slab gel or capillary electrophoresis. Applied Biosystems (AB) is a market leader in the development and distribution of instrumentation and BigDye® Terminator chemistries based on the Sanger technology. Innovative optics, automation, and proprietary reagents enabled AB to increase sequencing throughput and minimize reagent consumption. For a user of an average core facility, the cost of sequencing, including the BigDye Terminator reaction, running the sample, and generation of data, varies from $6 to $23, with an average cost of about $9 to $12, or approximately $0.01/base (1).
Pyrophosphate-based sequencing relies on DNA elongation (sequencing by synthesis) and is, in principle, the opposite of the Sanger termination reaction. Each dNTP incorporation event into the DNA strand is accompanied by the release of pyrophosphate (PPi) in a quantity equimolar to the amount of incorporated nucleotide. PPi is quantitatively converted by sulfurylase into ATP, which in turn drives the luciferase-mediated conversion of luciferin into oxyluciferin. The light generated during the reaction can be detected and quantified by a CCD camera. Even though the pyrosequencing reaction is faster and less laborious, it generates average reads of less than 100 nucleotides and is considered an insignificant competitor to the Sanger method, which can routinely generate 700-bp reads.
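The sequencing-by-synthesis principle described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the nucleotides are assumed to be flowed one at a time in a fixed repeating order (the order "TACG" and the function names are illustrative choices, not details given in this article), and the light signal is idealized as exactly proportional to the number of bases incorporated in that flow, with no noise.

```python
def flowgram(read, flow_order="TACG", max_flows=400):
    """Simulate idealized pyrosequencing signals for the strand being synthesized.

    `read` is the sequence of the newly synthesized strand. Each flow offers one
    nucleotide; the signal is the number of consecutive bases incorporated,
    mirroring the equimolar release of PPi per incorporated nucleotide.
    `flow_order` and `max_flows` are assumptions made for illustration.
    """
    signals = []
    pos = 0
    for i in range(max_flows):
        if pos >= len(read):
            break  # the whole read has been synthesized
        base = flow_order[i % len(flow_order)]
        run = 0
        # A homopolymer run is incorporated within a single flow.
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
    return signals


def call_bases(signals):
    """Rebuild the sequence from (nucleotide, signal) pairs."""
    return "".join(base * count for base, count in signals)


if __name__ == "__main__":
    flows = flowgram("TTAGC")
    print(flows)
    print(call_bases(flows))
```

In this idealized model a run of identical bases produces one proportionally larger signal rather than several separate ones, so any uncertainty in measuring the light intensity translates directly into uncertainty about the length of the run.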
Table. Strategies for Genome Sequencing
| 454 Life Sciences Emulsion-Based Sample Preparation | Hierarchical Shotgun | Whole-Genome Shotgun |
| Nebulizing of the genome into 200–500-bp fragments* | Mapping of the genome by a series of overlapping BACs (150 kb) | Shearing of the genome |
| Ligation of the adaptors, bead capture, and clonal amplification of each captured fragment | Subcloning of 1.5-kb fragments | Cloning of 2-kb and 10-kb fragments |
| Sequencing of 100 bp from one end of each clone | Sequencing of 500 bp from one end of each clone | Sequencing of 500 bp from each end of each clone |
| Assembly of contigs from a pool of overlapping fragments | Joining fragments together, based on sequence overlap (PHRAP) | Assembly of contigs from a pool of overlapping fragments |
| Close the gaps by directed sequencing or by sequencing of both ends of the template | Close the gaps by additional cloning and sequencing | Close the gaps by additional cloning and sequencing |
*Data published in Nature online edition, July 31, 2005. According to Dr. Rothberg, with the average read length of 500 bp, the fragment size can be increased to 400 bp.
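All three strategies in the table end with the assembly of contigs from overlapping fragments. The sketch below shows the basic idea with a toy greedy merge: repeatedly join the two sequences that share the longest suffix-to-prefix overlap. It is an illustration only, not PHRAP or the assembler used by any of the groups mentioned; the function names and the `min_len` threshold are hypothetical, and sequencing errors, reverse complements, and repeats are ignored.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are treated as chance matches and return 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads into contigs by always joining the best-overlapping pair.

    Toy model: error-free reads, one strand only. Reads that never overlap
    another sequence remain as separate contigs, i.e. gaps in the assembly.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
    print(greedy_assemble(["AAACC", "GGGTT"]))  # no overlap: two contigs remain
```

Reads that share no overlap stay apart, which corresponds to the gaps that each strategy in the table closes with additional sequencing.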
454 Life Sciences
Biotage utilizes pyrophosphate-based sequencing primarily for re-sequencing, SNP analysis, and epigenetic studies (CpG methylation). In August 2003, 454 Life Sciences secured a five-year exclusive license from Biotage to use pyrophosphate sequencing for whole-genome applications. 454 Life Sciences developed a miniaturized version of the reaction that requires only picoliters of reaction volume. The firm’s unique emulsion-based method for sample preparation allows the entire genome to be captured onto a bead array, followed by simultaneous solid-phase sequencing of all genomic fragments.
Jonathan Rothberg, Ph.D., founder and chairman of 454 Life Sciences, believes that the company’s technology will create the same ripple effect in the life sciences as the introduction of PCs created in the computer industry. "Our single instrument generates the same amount of data as 100 capillary sequencing machines. Therefore, a $500,000 investment in our Genome Sequencer 20 replaces a $30-million investment in the sequencing instruments and a $20-million investment in the supporting robotics. Capital requirement is the biggest bottleneck of genomic advancements. We democratize sequencing and enable everyone to have a sequencing machine on their bench."
Using the company’s sample preparation kit, a single individual is able to analyze a 2-Mb genome in less than 24 hours, at an average cost of 0.1 cent/base, one-tenth of the current cost. A single run yields consensus accuracy better than 99.98% in the nonrepeat parts of the genome. The installed base of the Genome Sequencer 20 includes Baylor University, Washington University in St. Louis, Sanger Center, the Joint Genome Institute, and Broad Institute.
Sample preparation for solid-phase sequencing. (A) Fragments of the whole genome are ligated to specialized adapters. (B) The fragments are captured on Sepharose beads. DNA is amplified in aqueous microreactors suspended in oil. (C) Clonally amplified fragments are loaded in a PicoTiter Plate™ (fiber-optic slide). (D) Beads with immobilized sequencing and detection enzymes are added to the wells. (E) The plates are loaded into the Genome Sequencer 20. All fragments are sequenced simultaneously.
New Applications, New Markets
"In the same way that the introduction of the PC created hundreds of different applications, the introduction of 454 technology has already generated applications that were not cost-effective or technologically possible before," continues Dr. Rothberg.
"Even though the impact on medical research is still to be understood, 454 brings us a step forward toward a visionary goal of $1,000 genome sequencing," adds Sharon Sheridan, director of sales and marketing for the sequencing business at Roche Diagnostics. "We see a lot of applications in industrial markets, agriculture, and bioterrorism. There is a lot of interest in studies of microbial ecosystems in chronic human diseases and mechanisms of drug resistance."
As described in the December 2004 issue of Science Express, scientists at Johnson & Johnson Pharmaceutical Research & Development (J&JPRD) identified a new target by sequencing and comparing the genomes of the drug-resistant Mycobacterium tuberculosis strain and the two resistant M. smegmatis strains, as well as the parental M. smegmatis. The whole-genome comparison pinpointed two mutations in the membrane domain of the proton pump of ATP synthase. J&JPRD is now developing a new drug from the diarylquinoline family that inhibits ATP synthase function. The distinct target of the new drug means that there is no cross-resistance with existing anti-TB drugs. Other applications in development include identification of signaling pathways responsible for cancer resistance, drug-resistant HIV isolates, microRNAi, and expression profiling.
In May 2005, 454 Life Sciences entered a five-year exclusive worldwide
agreement with Roche Diagnostics for the development and distribution
of the instruments and the proprietary reagents. The agreement includes
$62 million in license fees, milestones royalties, and research
funding.
Toward a $1,000 Genome
Working at the nanoscale drastically reduces the preparation time, handling, inventory, and storage costs associated with traditional whole-genome sequencing approaches. The 454 technology does not require subcloning, bacterial propagation, or handling of individual clones. The whole genome is nebulized, and the library is subjected to several enzymatic steps, followed by ligation to the specialized adaptors. The library is fractionated, and the individual fragments are captured on Sepharose beads. Each captured molecule is PCR amplified on the bead within a droplet of buffer (microreactor) suspended in an oil emulsion. This unique miniaturization approach generates approximately 1,000 microreactors per microliter. DNA-carrying beads are arrayed in the wells of the fiber-optic slides, and the individual fragments are sequenced all at the same time.
"This is like moving from a vacuum tube to a transistor," says Dr. Rothberg. "We have no limits on the problems we can solve. It is a disruptive technology."
"It is a very exciting technology," agrees Lori Murray, senior manager of public relations at AB. "However, it would be premature to talk about total displacement of capillary sequencing. The 454 technology is confined to whole-genome sequencing, and it is not able to satisfy the strong demand for genotype discovery, that is, re-sequencing of the same gene across the population. The technology is also limited to sequencing of smaller organisms. At present, capillary sequencing is the most versatile technology, able to work with whole genomes as well as with one gene at a time."
The current strategy for human genome sequencing is based on cloning of individual genome fragments (the shotgun approach), with cost estimates anywhere from $10 million to $25 million. The publicly sponsored Human Genome Project (HGP) utilized a hierarchical shotgun sequencing approach, whereas Celera Genomics used a whole-genome shotgun approach (Table).
It took four to five years for the HGP to complete the preparatory work of crude mapping, followed by 15 months of intense sequencing efforts involving 20 centers around the world working around the clock. By bypassing the genome mapping stage, Celera was able to reduce the overall time required to produce the equivalent data to nine months. 454 Life Sciences aims to sequence the human genome by the end of 2006 at a cost of $100,000. "Our approach reduces the operating costs to less than 10% of any other method," says Dr. Rothberg.
The company packages all the necessary reagents into a $5,000 kit, enough for the sequencing of a minimum of 20 million bases, which will provide 10x oversampling for a 2-Mb genome. For instance, the 2.1-Mb Streptococcus pneumoniae genome can be prepared and sequenced in about 15 hours using just one microchip, even though the 454 method requires 50% more reads to yield the same number of contigs as does the Sanger method. Data processing, including de novo assembly, is performed on an off-line computer and takes another 1.5 hours to complete.
The human genome comprises approximately 3,200 Mb. The completion of the de novo sequencing of a mammalian genome may take as many as 30 of the 454 microchips. Since the reads are very short, bioinformatics presents a fundamental constraint on the ability to assemble complex genomes. To reach an accurate consensus, the genome has to be sampled more than 10 times, which generates millions of short sequence reads that have to be assembled into contigs. As Sheridan states, "interpretation of the collected data on this scale is still a challenge."
According to 454 Life Sciences, one of the main advantages of the company’s technology is the remarkable uniformity of the distribution of the genomic fragments, both as a function of genome position and as a function of GC content.
Both shotgun approaches have to deal with inevitable clone biases, requiring additional cloning and sequencing of under-represented regions. The uniform distribution of genomic fragments would result in fewer gaps than cloning provides. However, the accurate sequencing of repeats and homopolymers remains a challenge for 454. In humans, coding sequences comprise less than 5% of the genome, whereas repeat sequences account for at least 50%. To be able to span the repeats in complex genomes, the company is developing methods for paired-end sequencing in a single well. Other improvements will be directed toward achieving longer reads while speeding up the fluidics to maintain the same overall sequencing speed.
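The oversampling figures quoted above follow from simple arithmetic, sketched here using the article's numbers for one kit (20 million bases, 100-bp reads, a 2-Mb genome). The function names are illustrative. The gap estimate additionally assumes that reads land uniformly at random along the genome, a textbook idealization that is not a figure from 454 Life Sciences and that ignores repeats and read-length effects.

```python
import math


def coverage(total_bases, genome_size):
    """Average number of times each genome position is sampled."""
    return total_bases / genome_size


def runs_needed(genome_size, target_coverage, bases_per_run):
    """Number of runs (kits) required to reach a target oversampling."""
    return math.ceil(genome_size * target_coverage / bases_per_run)


def expected_uncovered_fraction(cov):
    """Fraction of positions expected to receive no read at all.

    Assumes uniformly random read placement (an idealization).
    """
    return math.exp(-cov)


if __name__ == "__main__":
    bases_per_kit = 20_000_000   # minimum bases per kit, from the article
    read_length = 100            # approximate 454 read length, from the article
    genome = 2_000_000           # 2-Mb bacterial genome

    print("reads per kit:", bases_per_kit // read_length)
    print("coverage:", coverage(bases_per_kit, genome))
    print("kits for 10x:", runs_needed(genome, 10, bases_per_kit))
    print("uncovered fraction at 10x:", expected_uncovered_fraction(10))
```

Under the random-placement assumption, 10x oversampling leaves only a tiny fraction of positions unsampled, which is why uniform fragment distribution matters: any bias against particular regions raises the local chance of a gap well above this idealized figure.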
Reference
1. R. Persha, et al. DNA Sequencing Research Group General Survey. Journal of Biomolecular Techniques 14:231-235 (2003).
Kate Marusina, Ph.D., is a business development and marketing consultant based in Sacramento, CA. Phone: (530) 979-1522.
This article previously appeared in the September 1, 2005, issue of GEN. Web: www.genengnews.com.

