As recently reported in the New England Journal of Medicine, a collaborative effort between researchers at Harvard Medical School and Pacific Biosciences produced the genome sequence for the strain of Vibrio cholerae responsible for the recent cholera epidemic in Haiti. The strain is related to a South Asian cholera variant and had not previously been documented in the Caribbean region or Latin America.
PacBio applied its single molecule real-time (SMRT™) DNA sequencing technology to decode two samples from the recent Haitian outbreak and three other strains of V. cholerae, and compared them to DNA sequence information for 23 cholera strains available in public databases. Sequencing of the five sample genomes was completed in less than two days, demonstrating the potential of next-generation sequencing (NGS) for rapid pathogen identification in outbreak situations.
PacBio is in the late stages of its limited production release (LPR) program, optimizing and upgrading the chemistry and software for its beta version RS system and performing validation studies in preparation for launching the commercial instrument in the first half of 2011.
Eric Schadt, Ph.D., CSO, presented the cholera study data at AGBT. He highlighted two key advantages of the RS system that have become evident during the LPR period: rapid sample turnaround and the value of long read lengths. Quick turnaround is especially important in the infectious disease space, noted Dr. Schadt. It will usher in “a new era in molecular epidemiology,” allowing a shift from phenotype-based to sequence-based determination of infectious strains.
Long read lengths help uncover large-scale structural variation such as copy-number variation or gene rearrangements, which may have a greater impact on function than SNPs or indels. Referring to the cholera example, Dr. Schadt said, “We were able to achieve 15x coverage in 90 minutes” and to identify large structural variations that contributed to unambiguous differentiation of the bacterial strains.
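To put the 15x figure in perspective, coverage is simply the total number of sequenced bases divided by the genome size. The short sketch below works through that arithmetic. The genome size of roughly 4 million base pairs for V. cholerae is an assumption brought in for illustration and does not come from the study as reported here, and the function names are hypothetical. The quote also does not say whether the 90 minutes refers to one genome or to all five samples, so the sketch treats it as a single genome.

```python
# Assumption (not from the article): V. cholerae has a genome of roughly 4 Mb.
ASSUMED_GENOME_SIZE_BP = 4_000_000


def total_bases_needed(genome_size_bp: int, coverage: float) -> float:
    """Bases that must be sequenced to reach the given average coverage."""
    return genome_size_bp * coverage


def bases_per_minute(genome_size_bp: int, coverage: float, minutes: float) -> float:
    """Average sequencing rate implied by reaching that coverage in that time."""
    return total_bases_needed(genome_size_bp, coverage) / minutes


if __name__ == "__main__":
    total = total_bases_needed(ASSUMED_GENOME_SIZE_BP, 15)
    rate = bases_per_minute(ASSUMED_GENOME_SIZE_BP, 15, 90)
    print(f"Bases for 15x coverage: {total:,.0f}")
    print(f"Implied rate: {rate:,.0f} bases per minute")
```

Under these assumptions, 15x coverage corresponds to about 60 million sequenced bases per genome. A different genome size or a different reading of the 90-minute figure would change the numbers proportionally.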
Dr. Schadt identified two main areas targeted for improvement: throughput and accuracy. The throughput of PacBio’s third-generation methodology does not yet match that of second-generation sequencing technology, and higher throughput will be needed to sequence larger mammalian genomes efficiently.
The first commercial RS system will contain a SMRT Cell with two sets of 75,000 zero-mode waveguides (ZMWs), nanometer-sized holes that function as windows for observing DNA polymerase-driven nucleic acid synthesis at the single-molecule level. As the density of ZMWs increases, the throughput of a chip will increase.
In the short term, Dr. Schadt expects hybrid assembly strategies to help overcome the throughput limitation: second-generation sequencing generates highly accurate short reads, and third-generation sequencing provides full coverage and assembles the short contigs into a complete genome. With regard to improving accuracy, PacBio’s beta instrument has an average raw sequence read accuracy of 86%, and the company is aiming for 85%–90% accuracy for the commercial system.
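Raw read accuracy and coverage interact, which is one reason a per-read figure in the 80s need not translate into a comparably inaccurate assembled genome. The sketch below is a deliberately simplified illustration, not PacBio's consensus method: it assumes each read is correct at a given position independently with the same probability, and it counts the consensus as correct only when a strict majority of reads agree on the right base. Real sequencing errors include insertions and deletions and are not necessarily independent, so the result is only a rough guide. The function name is hypothetical.

```python
from math import comb


def majority_correct_probability(per_read_accuracy: float, depth: int) -> float:
    """Probability that more than half of `depth` independent reads are correct.

    Simplified binomial model: every read is right at a position with the
    same probability, independently of the other reads.
    """
    p = per_read_accuracy
    needed = depth // 2 + 1  # smallest strict majority
    return sum(
        comb(depth, k) * p**k * (1 - p) ** (depth - k)
        for k in range(needed, depth + 1)
    )


if __name__ == "__main__":
    # 86% raw accuracy and 15x coverage are the figures quoted in the article.
    for depth in (1, 5, 15):
        prob = majority_correct_probability(0.86, depth)
        print(f"depth {depth:>2}: majority correct with probability {prob:.5f}")
```

Under this toy model, a strict majority of 15 reads at 86% per-read accuracy is correct well over 99.9% of the time, which illustrates why deeper coverage can compensate for lower raw accuracy when errors are random rather than systematic.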
Applications in progress or in development at PacBio include identifying epigenetic modifications at full-genome scale, performing targeted resequencing of medically relevant genes to stratify patient populations, enabling direct transcriptome sequencing, and generating “disease weather maps” that can be used to identify and track changes in human pathogenic viruses present in the sewage system, water supply, and food supply of populated areas.