Toward a $1,000 Genome
Working at the nanoscale drastically reduces the preparation time, handling, inventory, and storage costs associated with traditional whole-genome sequencing approaches. The 454 technology does not require subcloning, bacterial propagation, or handling of individual clones.
The whole genome is nebulized, and the library is subjected to several enzymatic steps, followed by ligation to the specialized adaptors. The library is fractionated, and the individual fragments are captured on Sepharose beads.
Each captured molecule is PCR-amplified on the bead within a droplet of buffer (a microreactor) suspended in an oil emulsion. This miniaturization approach generates approximately 1,000 microreactors per microliter.
DNA-carrying beads are arrayed in the wells of the fiber-optic slides, and the individual fragments are sequenced simultaneously. "This is like moving from a vacuum tube to a transistor," says Dr. Rothberg. "We have no limits on the problems we can solve. It is a disruptive technology."
"It is a very exciting technology," agrees Lori Murray, senior manager of public relations at ABI. However, it would be premature to talk about total displacement of capillary sequencing. The 454 technology is confined to whole-genome sequencing, and it is not able to satisfy the strong demand for genotype discovery, that is, resequencing of the same gene across a population.
The technology is also limited to sequencing smaller organisms. At present, capillary sequencing is the most versatile technology, able to work with whole genomes as well as with one gene at a time.
The current strategy for human genome sequencing is based on cloning individual genome fragments (the shotgun approach), with cost estimates ranging from $10 million to $25 million. The publicly sponsored Human Genome Project (HGP) used a hierarchical shotgun sequencing approach, whereas Celera Genomics (Rockville, MD) used a whole-genome shotgun approach (Table).
It took four to five years for the HGP to complete the preparatory work of the crude mapping followed by 15 months of intense sequencing efforts involving 20 centers around the world, working around the clock. By bypassing the genome mapping stage, Celera was able to reduce the overall time required to produce the equivalent data to nine months.
454 Life Sciences aims to sequence the human genome by the end of 2006 at a cost of $100,000. "Our approach reduces the operating costs to less than 10% of any other method," says Dr. Rothberg.
The company packages all the necessary reagents into a $5,000 kit, enough for the sequencing of a minimum of 20 million bases, which will provide 10x oversampling for a 2 Mb genome.
For instance, the 2.1 Mb Streptococcus pneumoniae genome can be prepared and sequenced in about 15 hours using just one microchip, even though the 454 method requires 50% more reads than the Sanger method to yield the same number of contigs. Data processing, including de novo assembly, is performed on an off-line computer and takes another 1.5 hours to complete.
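The kit figures above imply a simple relationship between bases sequenced, genome size, and fold coverage. The short Python sketch below works through that arithmetic using only the numbers quoted in the article ($5,000 per kit, a minimum of 20 million bases, 2 Mb and 2.1 Mb genomes). The function and constant names are illustrative, not part of any 454 software, and the cost figure covers reagents per raw base only.

```python
# Back-of-the-envelope coverage arithmetic from the quoted kit figures.
# All names here are illustrative assumptions, not vendor software.

KIT_PRICE_USD = 5_000        # quoted kit price
KIT_BASES = 20_000_000       # quoted minimum bases per kit


def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average number of times each genome position is sampled."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def cost_per_base(kit_price: float, total_bases: float) -> float:
    """Reagent cost per raw (not consensus) base."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return kit_price / total_bases


if __name__ == "__main__":
    genomes = [
        ("2 Mb genome", 2_000_000),
        ("S. pneumoniae (2.1 Mb)", 2_100_000),
    ]
    for name, size in genomes:
        print(f"{name}: {fold_coverage(KIT_BASES, size):.1f}x coverage")
    print(f"Reagent cost per raw base: ${cost_per_base(KIT_PRICE_USD, KIT_BASES):.5f}")
```

On these figures, one kit gives exactly 10x for a 2 Mb genome and slightly less than 10x for the 2.1 Mb S. pneumoniae genome.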
The human genome comprises approximately 3,200 Mb. Completing the de novo sequencing of a mammalian genome may take as many as thirty 454 microchips. Since the reads are very short, bioinformatics presents a fundamental constraint on the ability to assemble complex genomes.
To reach an accurate consensus, the genome has to be sampled more than 10 times, which generates millions of short sequence reads that have to be assembled into contigs. As Sheridan states, "Interpretation of the collected data on this scale is still a challenge."
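One way to see why roughly tenfold sampling is needed is the idealized Lander-Waterman model, which is brought in here for illustration and is not described in the article. Assuming reads land uniformly at random, the fraction of the genome left unsampled at coverage c is about e^(-c). The sketch below applies that formula; the 100-base read length is a hypothetical placeholder, since the article says only that reads are "very short", and the model ignores repeats and minimum overlap requirements, so real assemblies will have more gaps.

```python
import math

# Idealized Lander-Waterman estimates. Assumes uniformly random reads and
# ignores repeats and minimum-overlap requirements.

ASSUMED_READ_LENGTH = 100  # hypothetical value; not stated in the article


def uncovered_fraction(coverage: float) -> float:
    """Expected fraction of genome positions sampled by no read."""
    return math.exp(-coverage)


def reads_needed(genome_size: int, coverage: float, read_length: int) -> int:
    """Number of reads required to reach the requested fold coverage."""
    return math.ceil(coverage * genome_size / read_length)


def expected_gaps(n_reads: int, coverage: float) -> float:
    """Approximate expected number of gaps between contigs."""
    return n_reads * math.exp(-coverage)


if __name__ == "__main__":
    genome_size = 2_100_000  # S. pneumoniae, as quoted in the article
    for c in (1, 5, 10):
        n = reads_needed(genome_size, c, ASSUMED_READ_LENGTH)
        print(
            f"{c:>2}x: {n:,} reads, "
            f"{uncovered_fraction(c):.2e} of genome unsampled, "
            f"about {expected_gaps(n, c):.0f} gaps"
        )
```

Under these assumptions the unsampled fraction falls exponentially with coverage, which is why a modest increase in sampling depth closes most gaps in a non-repetitive genome.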
According to 454 Life Sciences, one of the main advantages of the company's technology is the remarkably uniform distribution of the genomic fragments, both as a function of genome position and as a function of GC content. Both shotgun approaches have to deal with inevitable clone biases, requiring additional cloning and sequencing of under-represented regions.
A uniform distribution of genomic fragments would result in fewer gaps than cloning produces. However, the accurate sequencing of repeats and homopolymers remains a challenge for 454. In humans, coding sequences comprise less than 5% of the genome, whereas repeat sequences account for at least 50%.
To span the repeats in complex genomes, the company is developing methods for paired-end sequencing in a single well. Other improvements will be directed toward achieving longer reads while speeding up the fluidics to maintain the same overall sequencing speed.