As reported May 5 online in the journal Nature Methods, a collaboration between the Department of Energy Joint Genome Institute (DOE JGI), Pacific Biosciences (PacBio) and the University of Washington has resulted in an improved workflow for genome assembly that the scientists describe as “a fully automated process from DNA sample preparation to the determination of the finished genome.”
While DNA sequencing has become cheaper and faster, efficient reconstruction of entire genomes remains challenging because the short DNA sequence reads produced by second-generation sequencers must be assembled into finished genomes before they are informative about sequence order and function. Because the reads are short, long repeats present in multiple copies in the bacterial chromosome, or on separate genomic elements like phages or plasmids, often can’t be resolved, resulting in unfinished, fragmented draft assemblies.
And, the authors point out, gaps in draft genome assemblies can also be caused by extreme sequence context such as high GC or AT-rich regions, or palindromic sequences, both of which are frequently not covered by second-generation sequencing methods. While Sanger sequencing has been used to finish genomes and resolve these issues, its laborious and low-throughput nature makes the process of finishing slow and expensive.
More recently, hybrid assembly processes that combine short and relatively longer reads generated by so-called second- and third-generation sequencers have been applied successfully to close the genomes of a variety of microbes and also eukaryotic organisms. But these approaches have required the preparation of at least two different sequencing libraries, several types of sequencing runs and sometimes several different sequencing methods.
In the current Nature Methods paper, scientists describe the development of a technique known as the hierarchical genome assembly process (HGAP) for high-quality de novo microbial genome assemblies using a single, long-insert shotgun DNA library in conjunction with Pacific Biosciences’ Single Molecule, Real-Time (SMRT) DNA sequencing platform. The authors say the technique eliminates the need for the additional sample preparation and sequencing data sets required by previously described hybrid assembly strategies.
With HGAP, “only a single, long-insert shotgun DNA library is prepared and subjected to automated continuous long-read SMRT sequencing, and the assembly is performed without the need for circular consensus sequencing,” the team reported.
The authors describe the application of a fully automated, non-hybrid HGAP process to the de novo construction of several microbial genomes into finished, single-contig assemblies. A key part of this workflow, the investigators say, is a new consensus algorithm that takes advantage of SMRT sequencing quality values, resulting in highly accurate genome sequence results.
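To illustrate the general idea of letting quality values inform a consensus call, here is a minimal Python sketch. It is not the HGAP consensus algorithm, whose details are not given here; it is a toy per-column vote in which each base is weighted by its quality value. The function name `weighted_consensus`, the pre-aligned input format and the example reads are all assumptions made for illustration.

```python
from collections import defaultdict


def weighted_consensus(aligned_reads):
    """Toy quality-weighted consensus (illustrative only, not HGAP).

    aligned_reads: list of (bases, quals) pairs that are already aligned to
    the same coordinates, so every `bases` string has the same length.
    A '-' character marks a gap in that read.
    """
    length = len(aligned_reads[0][0])
    consensus = []
    for i in range(length):
        # Sum the quality values supporting each symbol in this column.
        scores = defaultdict(float)
        for bases, quals in aligned_reads:
            scores[bases[i]] += quals[i]
        # Pick the best-supported symbol; sorting makes ties deterministic.
        best = max(sorted(scores), key=scores.get)
        if best != "-":  # a winning gap contributes no base
            consensus.append(best)
    return "".join(consensus)


# Hypothetical reads: two low-quality 'T' calls disagree with one
# high-quality 'G' call at the third position.
reads = [
    ("ACGT", [30, 30, 30, 30]),
    ("ACTT", [30, 30, 5, 30]),
    ("ACTT", [30, 30, 5, 30]),
]
print(weighted_consensus(reads))
```

In this example a simple majority vote would choose T at the third position, while the quality-weighted vote chooses G because the single confident call outweighs the two uncertain ones.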
This de novo assembly method was tested using three microbes previously sequenced by the DOE JGI. The data collected were compared against the reference sequences for these microbes and the team found that the HGAP method produced final assemblies with >99.999% accuracy.
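For a sense of scale, the reported accuracy can be converted into a Phred-style quality value and an expected number of errors per genome. The short Python sketch below does that arithmetic using the standard Phred definition, Q = -10 log10(error rate). The 5-million-base genome length is a hypothetical figure chosen for illustration, not one taken from the paper, and the function names are likewise assumptions.

```python
import math


def phred_from_accuracy(accuracy):
    """Convert per-base accuracy (e.g. 0.99999) to a Phred quality value."""
    return -10 * math.log10(1 - accuracy)


def expected_errors(genome_length, accuracy):
    """Expected number of erroneous bases in an assembly of this length."""
    return genome_length * (1 - accuracy)


accuracy = 0.99999           # the 99.999% threshold quoted above
genome_length = 5_000_000    # hypothetical bacterial genome size (assumption)

print(round(phred_from_accuracy(accuracy), 1))
print(round(expected_errors(genome_length, accuracy)))
```

At exactly 99.999% accuracy this works out to roughly Q50, or about 50 erroneous bases in a 5-million-base genome. Since the reported accuracy exceeds that threshold, the real error count would be lower.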
The U.S. Department of Energy Joint Genome Institute (DOE JGI) is among the world leaders in microbial genome sequencing, focusing on the potential applications of microbes in the fields of bioenergy and the environment. As a national user facility, the DOE JGI is also focused on developing tools that enable more cost-effective assembly and analysis of the sequence that it, as well as other genome centers, generates.