In the next-gen sequencing (NGS) arena, the focus over the past several years has been on technological advances, moving from second-generation to third-generation sequencing strategies and producing research instruments capable of delivering whole-genome sequences in parallel at increasing speed. More recently, as read lengths and coverage continue to increase, throughputs rise, and costs decline, the expanding range of applications of NGS has taken center stage.
Concurrently, broader accessibility and affordability of NGS and its promise in the clinical arena have captured the spotlight with the emergence of two new “personal” sequencing systems, opening the door to sequence-based diagnostic and prognostic applications, tumor profiling, treatment selection, and patient stratification for clinical trials. The ability of the NGS technology available today to deliver the goods is evident in the steady stream of whole-genome sequences being reported across microbial, plant, and animal species.
Genomic Health announced results from its next-gen sequencing-driven biomarker discovery program in breast cancer at the recent “Advances in Genome Biology and Technology” (AGBT) meeting. Based on sequencing of the whole human transcriptome in formalin-fixed paraffin-embedded (FFPE) tumor and normal breast tissue samples, the company found hundreds of differences in both coding and noncoding transcripts between the two sample populations. Genomic Health reported an association between specific genes and some noncoding RNAs and risk of breast cancer recurrence.
As recently reported in the New England Journal of Medicine, a collaborative effort between researchers at Harvard Medical School and Pacific Biosciences produced the genome sequence for the strain of Vibrio cholerae responsible for the recent cholera epidemic in Haiti. The strain is related to a South Asian cholera variant and had not previously been documented in the Caribbean region or Latin America.
PacBio applied its single molecule real-time (SMRT™) DNA sequencing technology to decode two samples from the recent Haitian outbreak and three other strains of V. cholerae and compared them to DNA sequence information for 23 cholera strains available in public databases. Sequencing of the five sample genomes was completed in less than two days, demonstrating the potential for using NGS for rapid pathogen identification in outbreak situations.
PacBio is in the late stages of its limited production release (LPR) program, optimizing and upgrading the chemistry and software for its beta version RS system and performing validation studies in preparation for a commercial launch of the instrument in the first half of 2011.
Eric Schadt, Ph.D., CSO, presented the cholera study data at AGBT. He highlighted two key advantages of the RS system that have become evident during the LPR period: rapid sample turnaround and the value of long read lengths. Quick turnaround is especially important in the infectious disease space, noted Dr. Schadt. It will usher in “a new era in molecular epidemiology,” allowing a shift from phenotype-based to sequence-based determination of infectious strains.
Long read lengths help uncover large-scale structural variation such as copy-number variation or gene rearrangements, which may have a greater impact on function than SNPs or indels. Referring to the cholera example, Dr. Schadt said, “we were able to achieve 15x coverage in 90 minutes” and to identify large structural variations that contributed to unambiguous differentiation of the bacterial strains.
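To put the “15x coverage in 90 minutes” figure in perspective, the arithmetic behind mean coverage can be sketched in a few lines of Python. This is a rough illustration only, not PacBio's method: coverage is taken as total sequenced bases divided by genome size, and the genome size of roughly 4 million base pairs for V. cholerae is an approximate value assumed here, not a number reported in the study. The function names are hypothetical.

```python
# Back-of-the-envelope coverage arithmetic (illustrative sketch only).
# ASSUMPTION: a V. cholerae genome of roughly 4 million base pairs;
# this approximate figure is not taken from the article.
ASSUMED_GENOME_SIZE_BP = 4_000_000


def bases_required(genome_size_bp: int, coverage: float) -> float:
    """Total sequenced bases needed to reach a given mean coverage."""
    return genome_size_bp * coverage


def mean_coverage(total_bases: float, genome_size_bp: int) -> float:
    """Mean coverage: total sequenced bases divided by genome size."""
    return total_bases / genome_size_bp


def bases_per_minute(total_bases: float, minutes: float) -> float:
    """Average sequencing output rate over the run."""
    return total_bases / minutes


if __name__ == "__main__":
    total = bases_required(ASSUMED_GENOME_SIZE_BP, coverage=15)
    rate = bases_per_minute(total, minutes=90)
    print(f"Bases needed for 15x: {total:,.0f}")
    print(f"Implied average output: {rate:,.0f} bases per minute")
```

Under that assumed genome size, 15x coverage corresponds to about 60 million sequenced bases. The same calculation applied to a mammalian genome of several billion base pairs shows why throughput matters for larger genomes.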
Dr. Schadt identified two main areas targeted for improvement: throughput and accuracy. The throughput of PacBio’s third-generation methodology does not yet match that of second-generation sequencing technology, and higher throughput will be needed to sequence larger mammalian genomes efficiently.
The first commercial RS system will contain a SMRT Cell with two sets of 75,000 zero-mode waveguides (ZMWs), which are nanometer-sized holes that function as windows for observing DNA polymerase-driven nucleic acid synthesis at the single-molecule level. As the density of ZMWs increases, the throughput of a chip will increase.
In the short term, Dr. Schadt expects hybrid assembly strategies to help overcome the throughput limitation. These strategies combine second-generation sequencing, which generates highly accurate short reads, with third-generation sequencing, which achieves full coverage and assembles the short contigs into a complete genome. With regard to improving accuracy, PacBio’s beta instrument has an average raw sequence read accuracy of 86%, and the company is aiming for 85%–90% accuracy for the commercial system.
Applications in progress or in development at PacBio include identifying epigenetic modifications at full-genome scale, performing targeted resequencing of medically relevant genes to stratify patient populations, enabling direct transcriptome sequencing, and generating “disease weather maps” that can be used to identify and track changes in human pathogenic viruses present in the sewage system, water supply, and food supply of populated areas.