DNA sequencing is a powerful technique firmly entrenched in every aspect of biological research and drug discovery. Recent advances in high-throughput capillary sequencing, combined with the shotgun method of sample preparation, have made sequencing of the whole human genome a reality.
Sequencing of whole genomes leads to identification of novel genes that may influence development of human diseases, enable pathogens to avoid cellular defenses, or make some viruses lethal.
Sequencing of the same genes across the population maps polymorphisms that may correlate with susceptibility to certain diseases or with adverse reaction to certain medications. Sequencing of individual genomes may some day become a part of routine medical care, as the therapies will be tailored to a specific genetic profile.
Thus, high accuracy combined with high throughput and low cost per run remains the holy grail of sequencing. 454 Life Sciences (Branford, CT), a majority-owned subsidiary of Curagen, has a vision of sequencing whole genomes in days, not months, at one-hundredth of the cost of the currently available Sanger sequencing method.
The company's strategy rests on a sequencing microchip, a miniaturized and highly parallel version of sequencing based on the detection of light.
The Sanger method (dideoxynucleotide sequencing) is the gold standard of DNA sequencing. This method utilizes fluorescently labeled 2',3'-dideoxynucleotide triphosphates (ddNTPs) that terminate DNA chain elongation because they cannot form a phosphodiester bond with the next dNTP.
A particular ddNTP constitutes only 1% of regular dNTP mix, enabling some DNA polymerization to proceed. Thus, the polymerase reaction produces a mixture of fluorescent products of various lengths that can be resolved by either slab gel or capillary electrophoresis.
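The effect of the 1% terminator fraction can be illustrated with a short calculation. The sketch below is a simplification, not a description of any vendor's chemistry: it assumes that at every position the polymerase incorporates a terminator with a fixed probability equal to the ddNTP fraction, independent of sequence context. The function names are illustrative.

```python
def termination_probability(length, dd_fraction=0.01):
    """Probability that a chain terminates exactly at position `length`.

    Assumption: each incorporation independently picks a ddNTP with
    probability `dd_fraction`, so chain lengths follow a geometric
    distribution.
    """
    if length < 1:
        raise ValueError("length must be >= 1")
    return (1 - dd_fraction) ** (length - 1) * dd_fraction


def fraction_extending_past(length, dd_fraction=0.01):
    """Fraction of chains still growing after `length` incorporations."""
    if length < 0:
        raise ValueError("length must be >= 0")
    return (1 - dd_fraction) ** length


if __name__ == "__main__":
    # Print the two quantities for a few product lengths.
    for n in (1, 100, 700):
        print(n, termination_probability(n), fraction_extending_past(n))
```

Under this assumption some chains stop at every position, which is why the reaction yields a ladder of products of all lengths rather than a single fragment.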
Applied Biosystems (ABI; Foster City, CA) is a market leader in development and distribution of instrumentation and BigDye Terminator chemistries based on the Sanger technology. Innovative optics, automation, and proprietary reagents enabled ABI to increase sequencing throughput and minimize reagent consumption.
For a user of an average core facility, the cost of sequencing, including the BigDye Terminator reaction, running the sample, and generation of data, varies from $6 to $23, with an average cost of about $9 to $12, or approximately $0.01/base.1
Pyrophosphate-based sequencing relies on DNA elongation (sequencing-by-synthesis) and is, in principle, the opposite of the Sanger termination reaction. Each dNTP incorporation event into the DNA strand is accompanied by release of pyrophosphate (PPi) in a quantity equimolar to the amount of incorporated nucleotide.
PPi is quantitatively converted by ATP sulfurylase into ATP, which in turn drives the luciferase-catalyzed conversion of luciferin into oxyluciferin. The light generated during the reaction can be detected and quantified by a CCD camera.
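Because the light signal is proportional to the amount of nucleotide incorporated, a run of identical bases gives a proportionally stronger signal when the matching dNTP is supplied. The sketch below illustrates this idea in idealized, noise-free form. It assumes the four dNTPs are supplied one at a time in a repeating order; the order "TACG", the function names, and the integer signal units are illustrative assumptions, not details taken from the article.

```python
def simulate_flowgram(read, flow_order="TACG", max_flows=400):
    """Return (nucleotide, signal) pairs for sequential dNTP flows.

    `read` is the strand being synthesized. The signal for each flow is
    the number of bases incorporated, standing in for the light produced
    from the released PPi (idealized, no noise).
    """
    unknown = set(read) - set(flow_order)
    if unknown:
        raise ValueError(f"bases not in flow order: {sorted(unknown)}")

    signals = []
    pos = 0
    flow = 0
    while pos < len(read) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        run = 0
        # Incorporate every consecutive matching base in this single flow.
        while pos < len(read) and read[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


def call_bases(signals):
    """Rebuild the sequence by rounding each signal to a base count."""
    return "".join(base * int(round(signal)) for base, signal in signals)


if __name__ == "__main__":
    flows = simulate_flowgram("TTAG")
    print(flows)
    print(call_bases(flows))
```

In a real instrument the signals are noisy, so distinguishing a run of, say, seven identical bases from a run of eight is harder than this idealized version suggests.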
Even though the pyrosequencing reaction is fast and less laborious, it generates average reads of less than 100 nucleotides and is considered an insignificant competitor to the Sanger method, which can routinely generate 700 bp reads.
Biotage (formerly Pyrosequencing; Uppsala, Sweden) utilizes the technology primarily for re-sequencing, SNP analysis, and epigenetic studies (CpG methylation).
454 Life Sciences
In August 2003, 454 Life Sciences secured a five-year exclusive license from Pyrosequencing to use pyrophosphate sequencing for whole genome applications.
454 Life Sciences developed a miniaturized version of the reaction, requiring only picoliters of reaction volume. The firm's unique emulsion-based method for sample preparation allows the entire genome to be captured on a bead array, followed by simultaneous solid-phase sequencing of all genomic fragments.
Jonathan Rothberg, Ph.D., founder and chairman of 454 Life Sciences, believes that the company's technology will create the same ripple effect in the life sciences as the introduction of PCs created in the computer industry.
"Our single instrument generates the same amount of data as 100 capillary sequencing machines. Therefore a $500,000 investment in our Genome Sequencer 20 replaces a $30 million investment in the sequencing instruments and a $20 million investment in the supporting robotics.
"Capital requirement is the biggest bottleneck of genomic advancements. We democratize sequencing, and enable everyone to have a sequencing machine on their bench."
Using the company's sample preparation kit, a single individual is able to analyze a 2 Mb genome in less than 24 hours, at an average cost of 0.1 cent/base, one-tenth of the current cost. A single run yields consensus accuracy better than 99.98% in the nonrepeat parts of the genome.
The installed base of the Genome Sequencer 20 includes Baylor University, Washington University in St. Louis, the Sanger Center, the Joint Genome Institute, and the Broad Institute.
New Applications, New Markets
"In the same way that introduction of the PC created hundreds of different applications, introduction of 454 technology has already generated applications that were not cost-effective or technologically possible before," continues Dr. Rothberg.
"Even though the impact on medical research is still to be understood, 454 brings us a step forward toward a visionary goal of $1,000 genome sequencing," adds Sharon Sheridan, director of sales and marketing for the Sequencing Business at Roche Diagnostics (Indianapolis).
"We see a lot of applications in industrial markets, agriculture, and bioterrorism. There is a lot of interest in studies of microbial ecosystems in chronic human diseases and mechanisms of drug resistance."
As described in the December 2004 issue of Science Express, scientists at Johnson & Johnson Pharmaceutical Research & Development (J&JPRD; Raritan, NJ) identified a new target by sequencing and comparing the genomes of the drug-resistant Mycobacterium tuberculosis strain and the two resistant M. smegmatis strains, as well as the parental M. smegmatis.
The whole-genome comparison pinpointed two mutations in the membrane domain of the proton pump of ATP synthase. J&JPRD is now developing a new drug from the diarylquinoline family that inhibits ATP synthase function.
The distinct target of the new drug means that there is no cross-resistance with existing anti-TB drugs. Other applications in development include identification of signaling pathways responsible for cancer resistance, drug-resistant HIV isolates, microRNAi, and expression profiling.
In May 2005, 454 Life Sciences entered a five-year exclusive worldwide agreement with Roche Diagnostics for the development and distribution of the instruments and the proprietary reagents. The agreement includes $62 million in license fees, milestones, royalties, and research funding.
Toward a $1,000 Genome
Working at the nanoscale drastically reduces the preparation time, handling, inventory, and storage costs associated with traditional whole-genome sequencing approaches. The 454 technology does not require subcloning, bacterial propagation, or handling of individual clones.
The whole genome is nebulized, and the library is subjected to several enzymatic steps, followed by ligation to the specialized adaptors. The library is fractionated, and the individual fragments are captured on Sepharose beads.
Each captured molecule is PCR amplified on the bead within a droplet of the buffer (microreactor) suspended in an oil emulsion. This unique miniaturization approach generates approximately 1,000 microreactors per microliter.
DNA-carrying beads are arrayed in the wells of the fiber-optic slides, and the individual fragments are sequenced all at the same time. "This is like moving from a vacuum tube to a transistor," says Dr. Rothberg. "We have no limits on the problems we can solve. It is a disruptive technology."
"It is a very exciting technology," agrees Lori Murray, senior manager of public relations at ABI. "However, it would be premature to talk about total displacement of capillary sequencing. The 454 technology is confined to whole genome sequencing, and it is not able to satisfy the strong demand for genotype discovery, that is, re-sequencing of the same gene across the population."
The technology is also limited to sequencing of smaller organisms. At present, capillary sequencing is the most versatile technology, able to work with whole genomes as well as with one gene at a time.
The current strategy for human genome sequencing is based on cloning of individual genome fragments (the shotgun approach), with cost estimates anywhere from $10 million to $25 million. The publicly sponsored Human Genome Project (HGP) utilized a hierarchical shotgun sequencing approach, whereas Celera Genomics (Rockville, MD) used a whole-genome shotgun approach (Table).
It took four to five years for the HGP to complete the preparatory work of the crude mapping followed by 15 months of intense sequencing efforts involving 20 centers around the world, working around the clock. By bypassing the genome mapping stage, Celera was able to reduce the overall time required to produce the equivalent data to nine months.
454 Life Sciences aims to sequence the human genome by the end of 2006 at a cost of $100,000. "Our approach reduces the operating costs to less than 10% of any other method," says Dr. Rothberg.
The company packages all the necessary reagents into a $5,000 kit, enough for the sequencing of a minimum of 20 million bases, which will provide 10x oversampling for a 2 Mb genome.
For instance, the 2.1 Mb Streptococcus pneumoniae genome can be prepared and sequenced in about 15 hours using just one microchip, even though the 454 method requires 50% more reads to yield the same number of contigs as does the Sanger method. Data processing, including de novo assembly, is performed on an off-line computer and takes another 1.5 hours to complete.
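The kit figures quoted above can be checked with simple arithmetic. The sketch below uses only the numbers given in the article ($5,000 per kit, 20 million bases, a 2 Mb genome); the function and constant names are illustrative, and the per-base figure covers kit reagents only, not instrument or labor costs.

```python
KIT_PRICE_USD = 5_000        # price of one reagent kit, from the article
KIT_BASES = 20_000_000       # minimum bases sequenced per kit
GENOME_SIZE = 2_000_000      # 2 Mb genome


def coverage(total_bases, genome_size):
    """Average number of times each genome position is sampled."""
    return total_bases / genome_size


def cost_per_base(kit_price, bases):
    """Reagent cost in dollars per raw base sequenced."""
    return kit_price / bases


if __name__ == "__main__":
    print("oversampling:", coverage(KIT_BASES, GENOME_SIZE))
    print("USD per raw base:", cost_per_base(KIT_PRICE_USD, KIT_BASES))
```

Dividing 20 million bases by a 2 Mb genome reproduces the 10x oversampling stated for the kit.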
The human genome comprises approximately 3,200 Mb. The completion of the de novo sequencing of a mammalian genome may take as many as thirty 454 microchips. Since the reads are very short, bioinformatics presents a fundamental constraint on the ability to assemble complex genomes.
To reach an accurate consensus, the genome has to be sampled over 10 times, which generates millions of short sequence reads that have to be assembled into contigs. As Sheridan states, "Interpretation of the collected data on this scale is still a challenge."
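The scale of the assembly problem follows from a simple relation: the number of reads is roughly the genome size times the oversampling factor, divided by the read length. The sketch below assumes a read length of 100 nucleotides, which is the upper bound the article gives for pyrosequencing reads, so the results should be read as lower bounds. The function name is illustrative.

```python
def reads_needed(genome_size, oversampling, read_length):
    """Minimum number of reads to reach the requested oversampling."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    total_bases = genome_size * oversampling
    # Ceiling division with integers, so a partial read counts as one read.
    return -(-total_bases // read_length)


if __name__ == "__main__":
    # 2 Mb bacterial genome and the 3,200 Mb human genome at 10x.
    print(reads_needed(2_000_000, 10, 100))
    print(reads_needed(3_200_000_000, 10, 100))
```

With shorter actual reads the counts rise proportionally, which is the sense in which read length constrains assembly.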
According to 454 Life Sciences, one of the main advantages of the company's technology is the remarkable uniformity of the distribution of the genomic fragments, both as a function of genome position and as a function of GC content. Both shotgun approaches have to deal with inevitable clone biases, requiring additional cloning and sequencing of under-represented regions.
The uniform distribution of genomic fragments would result in fewer gaps than cloning produces. However, the accurate sequencing of repeats and homopolymers remains a challenge for 454. In humans, coding sequences comprise less than 5% of the genome, whereas repeat sequences account for at least 50%.
To be able to span the repeats in complex genomes, the company is developing methods for paired-end sequencing in a single well. Other improvements will be directed toward achieving longer reads while speeding up the fluidics to maintain the same overall sequencing speed.