When asked about the daily reality of developing DNA sequencing technology for a high-throughput sequencing center—a job I’ve enjoyed every minute of for the past 20 years—I typically respond, “Never a dull moment!”
When I find time to reflect on the amazing trajectory of innovation that has brought the field to its current state, I marvel at what has been accomplished in a relatively short time. Think about it.
Fred Sanger and colleagues first began publishing DNA sequencing methods in the mid-1970s, culminating in their description of the dideoxynucleotide chain termination method in PNAS in 1977.1 I learned DNA sequencing in the mid-1980s from Bruce Roe, my Ph.D. mentor, who learned from Fred Sanger and Alan Coulson.
At that time, the state of the art included labeling four separate reactions for each template with 32P-dATP and the Klenow fragment of E. coli DNA polymerase, separation on hand-assembled and hand-poured polyacrylamide-urea gels, and autoradiography. Sequence data were entered into the computer with one hand (thankfully, the QWERTY keyboard has the A, C, G, and T keys on the same half!) while the other hand kept track of position on the gel as it lay on a fluorescent lightbox.
It sounds like ancient history, but that was less than 30 years ago. In retrospect, the changes in DNA sequencing have compounded at an amazing rate over that time: the scale of its execution, the interdisciplinary efforts required to produce, analyze, and interpret the data, and the expanding impact of sequencing on biological research and, ultimately, on our economy.
This acceleration seems likely to continue, as the pace of sequencing and its reach across the biological sciences show no sign of slowing. Research in several areas that have been transformed by next-generation sequencing technology illustrates this trajectory and its predicted acceleration.
In human metagenomics studies, for example, the surveying of bacterial, viral, and eukaryotic organisms taken from specific areas of the human body by gross sampling methods and DNA isolation has been greatly accelerated by next-generation sequencing (NGS) technology.
Generating NGS data from these metagenomic isolates addressed the two major issues—cost and ease of data generation—that kept early metagenomic studies from achieving their full potential.
With the decreased cost of NGS reads, each population isolate can be sampled significantly more deeply, so that minor species can be detected. The digital nature of these reads further provides a measure of the relative proportions of each population member.
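The value of "digital" reads can be illustrated with a toy calculation (the taxon names and counts below are entirely hypothetical): tallying the reads assigned to each taxon yields relative proportions directly, and deeper sampling simply grows the counts until minor community members clear the detection threshold.

```python
# Toy sketch, assuming hypothetical data: because NGS reads are discrete
# counts, relative abundance falls out of a simple tally.
from collections import Counter

# Hypothetical taxonomic assignments for a batch of metagenomic reads,
# e.g. as produced by mapping each read to a reference database.
read_assignments = (
    ["Bacteroides"] * 620
    + ["Firmicutes_spp"] * 310
    + ["Escherichia_coli"] * 60
    + ["rare_virus_X"] * 10
)

counts = Counter(read_assignments)
total = sum(counts.values())

# Relative proportion of each population member.
abundance = {taxon: n / total for taxon, n in counts.items()}

for taxon, frac in sorted(abundance.items(), key=lambda kv: -kv[1]):
    print(f"{taxon:16s} {frac:.1%}")
```

Note that the rare virus is visible at all only because 1,000 reads were sampled; at the shallow sampling depths typical of pre-NGS studies, a 1% community member would often be missed entirely.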
The simplicity of NGS library preparation, the ability to make these libraries from tiny input amounts of DNA, and the elimination of a bacterial cloning intermediate have been invaluable to improving the representation of all species present in the populations sampled and to expanding the types of samples we can address.
The development of data-analysis approaches that can accommodate the ever-larger sequence datasets produced by deep NGS sampling, and that can accurately mine information from them, has re-engaged computational biologists and spurred a staggering amount of innovation.
Resulting “big science” projects such as the NIH’s Human Microbiome Project, Europe’s MetaHit, and other international projects, as well as the research of independent investigators, have begun to define bacterial diversity in human health and disease.2-3
Interestingly, we now have a bacterial census of the intestines of cats and dogs4, pigs5, and the octopus.6 Novel pathogenic viruses have been discovered by mining metagenomic datasets, and etiologic agents have been identified in disease outbreaks.7-8
Studying RNA isolates from various sources, converted to cDNA, has produced a new experimental approach called “metatranscriptomics,” which can be applied to characterize the metabolic potential of each population.
Metagenomics also has been applied to characterize environments unrelated to human health, such as soils9-10, lakes11, and thermal springs in Russia12, among myriad others. In fact, a quick search of PubMed with “metagenomic” reveals 1,382 references. Most have been published since 2005, across an incredible breadth of topics that reflect the explosion of this scientific endeavor in basic biological discovery, data analysis, data mining, and methods development.
As the transformation of metagenomics by NGS and advanced analytical approaches continues, it will be interesting to see its impact on diverse areas such as food-safety monitoring (witness the need for continuing vigilance as yet another E. coli strain impacts human health in Europe) and pharmaceutical product development and quality control (such as in vaccines or other live-cell products).
Another possible use is diagnosis to guide optimal antibiotic treatment in patients infected with pathogens known to harbor a spectrum of antibiotic resistance, such as methicillin-resistant Staphylococcus aureus (MRSA). There are numerous other applications.