Upon completion of the human genome sequence in 2003, a vision for the future of genomics research was put forth by the U.S. National Human Genome Research Institute (NHGRI) and published in Nature. In it, the NHGRI called for researchers to develop technology that would allow sequencing of a human genome for $1,000.
This is no small task. Using today's state-of-the-art capillary-based DNA sequencers, it still costs well over $10 million to sequence three billion base pairs, the amount of DNA in the human genome. But many different groups, both commercial and academic, have taken on the challenge of drastically reducing the cost of DNA sequencing. With one race barely finished, a new one is already under way.
Last fall, the NHGRI awarded grants to spur the development of lower-cost DNA sequencing technologies. Representatives from Microchip Biotechnologies (Fremont, CA), Agencourt Bioscience (Beverly, MA), 454 Life Sciences (Branford, CT), LI-COR (Lincoln, NE), and Stephen R. Quake, Ph.D., of Stanford University were among the awardees developing near-term methods to sequence a human-sized genome for $100,000.
According to the NHGRI, "there is strong potential that, five years from now, some of these technologies will be at or near commercial availability."
The approach of Microchip Biotechnologies is unique among awardees in that it plans to extend current Sanger sequencing methods. "Big capillary-based DNA sequencing analyzers are nearing the end of their technological life cycle. The next major move is to replace big clumsy capillary sequence analyzers with a chip-based system, which solves several problems simultaneously.
"What we're doing is an evolutionary progression of the same chemistry. The key innovation is in reducing volumes and sizes, and in automating," says Roger McIntosh, vp engineering. Compared with efforts based on new technologies, "the risk is much reduced," he adds.
"To get to a $100,000 genome, you have to eliminate all typical consummables. Even if you use only one pipette tip per read, that adds up. Another major cost is labor, so the system has to be totally automated. This matches up nicely with the microfluidic approach," says Stevan Jovanovich, Ph.D., president and CEO.
In Microchip's technology, individual DNA fragments are attached to a bead surface and amplified. Sample preparation, separation, and detection will be completely integrated and automated, taking place on a microfluidic chip (a six-inch round glass wafer with etched channels).
"Lots of little pieces and technology elements need to be developed and tied together. Some are being worked on by the Mathies lab at the University of California.
"Another element is to develop new gels or separation matrices suitable for use in tiny channels. People have used gels in capillary machines, but they're not entirely appropriate for use on chips. This is being addressed by the Barron lab at Northwestern," says McIntosh.
"When people think of automated equipment, they think of robotic arms. But now when we talk about it at this level, actuating tiny elements on a chip under computer control, the inherent reliability goes up. The technological leap is comparable to going from vacuum tubes to transistors in electronics. We're going from big, large-scale robots to automated systems on a chip," says McIntosh.
Agencourt Bioscience claims it has one of the largest commercial sequencing facilities in the world. Gina L. Costa, Ph.D., director of new technology development, says that the company's extensive DNA sequencing infrastructure and existing customer base provide advantages for advancing internal technology development.
Sequencing by Synthesis
Like most of the other awardees, Agencourt is taking a sequencing-by-synthesis approach, in which measurement takes place as nucleotides are incorporated into DNA. The firm's technology is based on polony sequencing (polony is short for polymerase-generated colony), originally developed in the laboratory of George Church, Ph.D., at Harvard University.
Agencourt has optimized methods for bead-based polony sequencing. In this system, clonal populations of short DNA fragments are amplified onto beads, then packed tightly onto a slide containing a gel matrix. In a flow cell, reactants flow over the slide to allow DNA synthesis to take place.
"Hundreds of millions of DNA beads can be sequenced in each run," says Dr. Costa. "Our technology speeds up conventional sequencing by 100-fold, and will be the equivalent of 500 to 1,000 ABI 3730xl instruments a day."
Agencourt's methodology, like other non-single-molecule sequencing-by-synthesis approaches, affords read lengths of only 50 to 150 bases. This makes sequence assembly more challenging. To get around this, Dr. Costa says the company uses "tricks" to get paired-end information across short reads, which allows researchers to put short segments of nucleotides "together like a jigsaw puzzle."
"What really sets us apart is our simple and robust biology," Dr. Costa says. "Our system is not complicated. Simplicity and consistency are important attributes."
454 Life Sciences is "refining and advancing the performance of sequencing by synthesis," according to Marcel Margulies, vp engineering. Margulies says 454's technology is extremely high throughput, both in terms of sequencing reactions and front-end sample preparation.
"We perform sequencing by synthesis on solid support and on 400,000, 500,000, and even 700,000 clonally amplified fragments simultaneously," he says. "Entire floors of robots" can be devoted to sample preparation for conventional sequencing.
"With our approach, irrespective of the size of the genome, you can do library preparation with one person, with a single sample preparation, in a couple hours," says Margulies.
DNA is amplified and bound to beads, which are then deposited into a fiber-optic plate with 1.6 million wells. Because of size exclusion and a 2.5-fold oversupply of wells, no more than one bead is deposited into each well. Nucleotides are then added sequentially. When a nucleotide is incorporated, inorganic pyrophosphate is released.
"That release is observed through emission of light that is captured. Measurement takes place at each incorporation. The beauty is that if you have a homopolymeric stretch, all will incorporate at the same time, and you get five times as much light," says Margulies.
Margulies says 454 generates a tremendous amount of data because of the number of reads performed simultaneously. To address this, they have outfitted computers with FPGA (field programmable gate array) chips.
"The entire computer takes advantage of this to do data processing in real time. The way we designed the system, at the end of the run, we have reads that have been corrected and quality scored."
454 has also developed its own assembly software. "Typically people convert signal to nucleotide letters, and then use the letters to do all the manipulations. We generate that as well, so one could use that information in conventional assemblers or mappers.
"But this may not ultimately be the best way of doing things. We have found that if you use signals themselves to find homologies as opposed to the letters, you can do a better job," Margulies explains.
LI-COR says that it pioneered the development of infrared fluorescence labeling and detection systems for DNA sequencing. The company is now pursuing single molecule DNA sequencing using charge switch dNTPs.
"When a nucleotide is incorporated into DNA, pyrophosphate is cleaved from the nucleotide. While dyes are typically attached to the base moiety, in our case the label is on the gamma phosphate so that the released pyrophosphate is labeled," says John Williams, Ph.D., principal scientist. One of the greatest challenges in developing single molecule sequencing is in imaging.
"With a single dye molecule, the signal-to-noise ratio is low. To get around error, you have to oversequence. This drives the cost back up," says Dr. Williams. One way LI-COR is working to get around this problem is by sequencing "using brighter labels giving a stronger signal," explains Dr. Williams.
LI-COR's methodology, while still under development, holds advantages over other approaches, because "there is no limit to read length except the length of the template," adds Dr. Williams.
With electrophoresis-based sequencing, read lengths are currently limited to about 1,000 bases, because electrophoresis cannot resolve a single-nucleotide size difference between longer DNA fragments.
Non-single-molecule sequencing-by-synthesis technologies have inherent limitations to read length as well: since sequencing reactions are not 100% efficient, the population of lengthening DNA strands eventually gets too far out of phase for accurate sequence determination.
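A back-of-the-envelope model shows why incomplete reactions cap read length. The sketch below assumes that every strand in a clonal population extends in a given cycle with the same independent probability (the "efficiency"), and that a strand that misses a cycle never catches up. Under those assumptions the fraction of strands still in phase after n cycles is the efficiency raised to the power n. The efficiency and threshold values in the usage example are illustrative, not figures from any of the companies mentioned.

```python
def in_phase_fraction(efficiency, cycles):
    """Fraction of strands that extended successfully in every cycle so far."""
    return efficiency ** cycles


def max_in_phase_cycles(efficiency, min_fraction):
    """Largest cycle count for which the in-phase fraction stays at or above
    `min_fraction`, under the simple independent-failure model."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must be strictly between 0 and 1")
    if not 0.0 < min_fraction <= 1.0:
        raise ValueError("min_fraction must be in the interval (0, 1]")
    cycles = 0
    fraction = 1.0
    # Keep cycling while one more cycle would still leave enough strands in phase.
    while fraction * efficiency >= min_fraction:
        fraction *= efficiency
        cycles += 1
    return cycles


if __name__ == "__main__":
    # Hypothetical per-cycle efficiencies, with at least half the strands in phase.
    for eff in (0.95, 0.99, 0.999):
        print(eff, max_in_phase_cycles(eff, 0.5))
```

The loss compounds geometrically, so even small gains in per-cycle efficiency translate into much longer usable reads.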
Long read lengths are important for two reasons. For one, they make sequence assembly of the genome easier, since there are fewer pieces to the puzzle. For another, they provide haplotype information.
"The single molecule approach lets you sequence each chromosome. You are able to see both chromosomes and how they differ from each other. If you had 100 base pair reads, you wouldn't know which polymorphisms go with which chromosomes," adds Dr. Williams.
Stephen R. Quake, Ph.D., professor of bioengineering at Stanford University, was another grant recipient focused on single molecule sequencing by synthesis.
Practical Sequencing Platform
Helicos BioSciences (Cambridge, MA), founded in May 2003, is working to create "a practical sequencing platform from my group's technology," Dr. Quake boasts. Stan Lapidus, president and CEO of Helicos, believes that "moving away from the Sanger sequencing technology breaks the price and throughput bottleneck" for whole-genome sequencing.
The Quake/Helicos technology relies on the detection of fluorescence resonance energy transfer on a total internal reflection microscope.
"On a single 10-cm proprietary substrate, we sequence billions of single strands of DNA in a single experiment," says Lapidus. The hope is to be able to sequence a whole genome in days using the methodology; eventually it may take only hours.
In simple terms, labeled nucleotides of one type (e.g., G's) are added to immobilized DNA and then washed away. A picture is taken, and strands with the brightest spots have added that particular base at that point in the sequence. The process is then repeated with each of the other nucleotides, again and again.
In a sense, the pictures become "cross sections" of the ultimate DNA strand, in that the full sequence is realized by "stacking" all the pictures on top of each other.
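The "stacking" idea can be sketched in a few lines of code. The example below is a toy model, not the Quake/Helicos software: it assumes each picture has already been reduced to the base that was added and the set of spots that lit up, that each strand adds at most one base per cycle, and that detection is perfect. The spot identifiers and the function name are hypothetical.

```python
from collections import defaultdict


def stack_images(images):
    """Reconstruct one read per spot from an ordered series of pictures.

    `images` is a list of (base, lit_spots) pairs in the order the pictures
    were taken; `lit_spots` is the set of spot identifiers that were bright
    after that base was added.
    """
    reads = defaultdict(list)
    for base, lit_spots in images:
        for spot in lit_spots:
            # A bright spot means this strand incorporated `base` in this cycle.
            reads[spot].append(base)
    return {spot: "".join(bases) for spot, bases in reads.items()}


if __name__ == "__main__":
    pictures = [
        ("G", {"spot1"}),
        ("A", {"spot1", "spot2"}),
        ("T", {"spot2"}),
        ("C", set()),
        ("G", {"spot2"}),
    ]
    print(stack_images(pictures))
```

Each resulting string is the sequence of the strand being synthesized at that spot, which is the complement of the immobilized template.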
"Our advantage is that we can do high densities. We expect to routinely reach a density of one million molecules per square millimeter and ultimately reach 10 million molecules per square millimeter." The company plans to have a demonstration project by the end of 2005.
It's not yet clear whether or when any of these technologies will allow consumers to get a human genome sequenced for $100,000. And maybe more than one group will need to succeed to really get prices to drop. "Competition and market pressure are going to push the price down," Dr. Jovanovich says.
What is clear is that many groups are hard at work, and if DNA sequencing does become more affordable, modern biological research and the practice of human medicine will continue to undergo rapid-fire change.