GEN UPDATES in biotechnology:
Next-Generation Sequencing
New DNA-Sequencing Technologies Advancing
Richard A. Gibbs, Ph.D.
For the first time in nearly a decade there is a new choice in commercially available DNA-sequencing platforms. As a result there has been a flurry of development activity that promises to lead to other new platforms being generally available. This has energized the sequencing community and, more than ever before, encouraged new entrants into the field.
The human genome project was carried out entirely with fluorescent Sanger dideoxy sequencing, on instruments supplied almost exclusively by Applied Biosystems.
Other vendors also supported fluorescent Sanger-based methods, including Molecular Dynamics (GE Healthcare), Beckman Coulter, and LiCor Biosciences, but the Applied Biosystems AB 3700 was the workhorse for the first mammalian genome. As a consequence there are more AB machines in modern laboratories than any equivalent devices.
As the human sequence was being generated it was recognized that there would be an ongoing and significant reduction in sequencing costs, but it was felt that the data was urgently needed and there should be no waiting for costs to come down before at least starting the main thrust of the project. In addition, it was recognized that ongoing activities would actually drive down future sequencing costs, so in a sense the human genome project (HGP) would be responsible for increased efficiencies as well as take advantage of them.
The sense of urgency and the prospect of large-scale production being a cost-driver supported the model of going with the best at the time and helped to perpetuate the current near monopoly of technologies.
Development of Other Methods
In the pre-Applied Biosystems/HGP era, there was active development of other methods. For example, unlabelled mixtures of Sanger sequence products were transferred to nylon membranes and repetitively probed to reveal sequence ladders, and microchannel devices were formed to simplify electrophoresis. Others shaped microfabricated platforms to more easily manage molecular manipulations, and oligonucleotide hybridization was explored.
In addition to improved Sanger chemistry, pyrosequencing was developed to measure the release of pyrophosphate during each base addition. More ambitious innovations included sucking DNA strands down small holes while measuring their profile, and degrading bases from the ends so that each could be detected in a flowstream.
Scientists at Baylor College of Medicine’s Human Genome Sequencing Center and others proposed sequencing-by-synthesis so that stepwise additions of individual bases could be detected before reporter groups were released and a new base added. The enzymes and nucleotides available at the time prevented success for those schemes.
Much of the activity and interest in these schemes subsided through the course of the HGP as resources shifted to data production and more incremental refinements of the Sanger technologies. This was a good thing: it allowed the generation of HGP data and moved the field of genomics ahead rapidly. It also meant that the standard of Sanger data that could be produced became very high; currently, 800-base reads with accuracy of >99.5% are not uncommon. A wealth of accompanying data-management tools was also developed so these Sanger data could be assembled and manipulated with relative ease.
The one downside of this productive period was the reduced speed of development of alternative sequencing methods.
The human sequence has now been joined by more than 20 other mammals as well as innumerable model organisms and smaller genomes. There remains an urgency for data production, but the situation is different from that in the early 1990s.
First, there are fewer large projects where the data should be obtained at all costs. Put simply, consumers of data are now less inclined to pay any price for large data sets, but are instead in a better position to shop around for data that is of the very highest value.
Second, the high-quality individual DNA-sequence reads produced by Sanger methods are not always the best data in the new models. With plenty of reference sequences for complex genomes, and a need to analyze simple ones, shorter reads have become more useful and tolerance for higher error rates has increased.
In this climate the emphasis has shifted back to development of new methods and platforms and examination of their competitive performance.
Emphasis Shifts to New Methods
454 Life Sciences, Solexa, Helicos Biosciences, Visigen Biotechnologies, and Nanofluidics, among others, stepped into this arena. During the HGP period these groups continued to innovate and now they are increasing their levels of activity in order to challenge the status quo.
454 has a product for sale, and for the usual price of a DNA sequencing instrument (several hundred thousand dollars) you can have one in your own laboratory. The 454 technology is able to take advantage of emulsion PCR protocols and highly parallelized formats to generate sequences from hundreds of thousands of unique samples simultaneously.
Meanwhile Solexa is reporting impressive machine-performance statistics and expects to soon be delivering machines. It is not clear how far behind the other innovators are, but reports of beta devices and soon-to-be-deployed strategies abound. They are eagerly awaited.
Whatever happens in this sequencing race, there is little doubt that new kinds of data will be commonplace. The expected main feature of these data is that they will be less expensive on a per-base basis, but they may be of lower quality and shorter length than the reads produced by current Sanger methods. This disparity may not prevail, but 800-base reads with >99% accuracy are hard to beat. So at least in the short term there will be many opportunities to make use of shorter reads.
There are many examples where these shorter reads have been effectively assembled into smaller genomes. Larger projects are being tested. But the real opportunity will be in the arena of repeat-sequence applications where the methods are applied to the detection of subtle genetic variation.
In these experiments the reads are either randomly sampled and mapped to known reference sequences or else directed PCR is used to amplify specific segments of complex genomes and to sequence those fragments deeply. These sequences are easy to align as we know where they should match, and the possibility of errors in the base calls is minimal, since when the same fragments are analyzed many times the normal sequence is simply the pattern that recurs most of the time.
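The majority-rule idea described above can be sketched in a few lines of Python. This is only an illustration, not the method of any particular platform or vendor: it assumes the reads have already been aligned to the reference and padded with "-" where they give no coverage, and the function names (consensus, variant_positions) and the toy sequences are hypothetical.

```python
from collections import Counter


def consensus(aligned_reads):
    """Return the most frequent base in each column of equal-length aligned reads.

    '-' marks positions a read does not cover and is ignored in the vote.
    Columns with no coverage are reported as 'N'. Ties go to the base seen first.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be padded to the same length")
    calls = []
    for column in zip(*aligned_reads):
        counts = Counter(base for base in column if base != "-")
        calls.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(calls)


def variant_positions(reference, called):
    """List (index, reference_base, called_base) wherever the consensus differs."""
    return [
        (i, ref_base, call)
        for i, (ref_base, call) in enumerate(zip(reference, called))
        if call != "N" and call != ref_base
    ]


# Toy data (made up for illustration): five reads of the same amplified fragment.
reference = "ACGTTGCA"
reads = [
    "ACGTTGTA",
    "ACGTAGTA",  # isolated base-call error at index 4
    "ACGTTGTA",
    "ACCTTGTA",  # isolated base-call error at index 2
    "--GTTGTA",  # shorter read, no coverage at indices 0 and 1
]

called = consensus(reads)
print(called)                                # ACGTTGTA
print(variant_positions(reference, called))  # [(6, 'C', 'T')]
```

In this toy case the two isolated base-call errors are outvoted by the other reads, while the difference at index 6 recurs in every read and so survives as a candidate variant. Real pipelines also weigh base-quality scores and coverage depth, which this sketch leaves out.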
In human genetics, as well as many other fields, the discovery of new variation is the most important current endeavor. Without a doubt the recent platform innovations will have a role in the discovery of new variations. The precise pathway to discover all human sequence variation is not yet clear, but it is likely that methods other than fluorescent Sanger dideoxy sequencing will have a prominent role.
Richard A. Gibbs, Ph.D., is a professor at Baylor College of Medicine (BCM) and is also director of BCM’s Human Genome Sequencing Center. Web: www.hgsc.bcm.tmc.edu.

