January 15, 2012 (Vol. 32, No. 2)
Using Systems Efficiently and Economically to Obtain Desired Data at Reasonable Cost
The efficiency of next-generation sequencing (NGS) depends on the quality of the libraries used. Biased libraries with artificially joined segments can complicate genome assembly and may cause incorrect conclusions. Existing NGS sample-preparation kits are not optimized for A-tailing, resulting in libraries that are not efficiently ligated to T-tailed adapters.
In addition, because nonproofreading enzymes such as Taq DNA polymerase add nontemplated purines (both A and G residues) to the 3´ ends of templates, these kits produce both A- and G-tailed templates, which also reduce the ligation efficiency to T-tailed adapters. Inefficient ligation will lead to libraries contaminated with chimeras and concatemers, containing incorrect, artifactually joined segments, which greatly complicates assembly.
Lucigen has developed NxSeq™ technology, an optimized library construction system that repairs template ends and maximizes A-tailing and subsequent ligation to T-tailed adapters. Optimization was accomplished with a molecular assay that allows one to quantify every step of the library-construction process.
The assay uses a synthetic 90 bp target DNA with a mixture of random 3´ ends that require end repair and phosphorylation before ligation is possible. Efficient A-tailing prevents self-ligation and chimera formation. Subsequent ligation of the A-tailed target DNA to fluorescently labeled T-tailed or C-tailed adapters allows tailing and ligation to be quantified.
Library-preparation results demonstrated an essentially chimera-free product, with over 80% ligation efficiency to T-tailed adapters in a single step, compared with significant concatemer formation and low T-tailed adapter ligation (7% to 48%) with other systems. This procedure is universal and can be used in the construction of Roche 454, Illumina, and Ion Torrent libraries.
Evaluating Library Creation Efficiency
Sample preparation for the majority of NGS platforms follows a similar protocol. Genomic DNA is sheared to an appropriate size, end repaired, and in most cases A-tailed. Adapters are ligated and the final product is purified or size selected. If needed, the library will be subjected to limited amplification by PCR. Because shearing results in DNA fragments with overhangs, ends must be repaired prior to the addition of 3´ A tails or the ligation of blunt-end adapters. A-tailing of target DNA in combination with ligation to T-tailed adapters is designed to prevent the formation of chimeras and concatemers formed by self-ligation of both target DNA and adapters.
Although kits that perform end repair and tailing in a single tube are simple to use and fast, they are generally not optimized for A-tailing and will add both 3´ adenine and guanine residues at approximately equal rates. Inefficient tailing produces a mixture of DNA with blunt, A-tailed, and G-tailed ends and, consequently, inefficient ligation of adapters and low-complexity libraries.
Lucigen developed a molecular assay for end repair and tailing that allows the efficiency of tailing and ligation to be quantified. In the absence of tailing, a ladder of concatemers will be visible in the assay, indicating blunt ligation. The target DNA is a double-stranded synthetic 90-mer that must be end repaired and kinased before it is ligatable. Adapters were designed with either a T-tail or a C-tail to assess ligation to A-tailed target or G-tailed target, respectively.
Before end repair, the target DNA will not self-ligate to form concatemers. After end repair, but prior to A-tailing, the target DNA will self-ligate and form a ladder of concatemers (Figure 1). A target that is A-tailed will ligate to the T-tailed adapter, and a target that is G-tailed will ligate to the C-tailed adapter. The T-tailed and C-tailed adapters will not self-ligate and will ligate only to target DNA with A- or G-overhangs, respectively.
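The assay logic described above can be summarized as a small lookup. The following Python sketch is illustrative only: the end-state labels and the function name are hypothetical, and it simply encodes the outcomes stated in the text rather than modeling any ligation chemistry.

```python
# Hypothetical model of the tailing/ligation assay outcomes described above.
# The end-state labels and names are illustrative assumptions.

# 3' end state of the target -> adapter tail that can pair with it
COMPATIBLE_ADAPTER = {"A-tailed": "T", "G-tailed": "C"}
END_STATES = ("unrepaired", "blunt", "A-tailed", "G-tailed")


def expected_product(target_end: str, adapter_tail: str) -> str:
    """Return the ligation outcome the assay description predicts."""
    if target_end not in END_STATES:
        raise ValueError(f"unknown end state: {target_end}")
    if adapter_tail not in ("T", "C"):
        raise ValueError(f"unknown adapter tail: {adapter_tail}")
    if target_end == "unrepaired":
        # Not yet end repaired or phosphorylated: nothing is ligatable.
        return "no ligation"
    if target_end == "blunt":
        # End repaired but untailed: the target self-ligates.
        return "concatemer ladder"
    if COMPATIBLE_ADAPTER[target_end] == adapter_tail:
        return "adapter-ligated target"
    # Tailed target with a non-matching adapter: the overhangs do not pair.
    return "no ligation"


for end in END_STATES:
    for tail in ("T", "C"):
        print(f"{end:>10} + {tail}-tailed adapter: {expected_product(end, tail)}")
```

Reading the printed table lane by lane mirrors how the gel is interpreted: a ladder indicates untailed target, and product in the C-tailed adapter lane indicates G-tailing.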
To benchmark the optimized NxSeq™ technology, it was compared with two leading NGS library kits, representing two different sequencing platforms, using the assay described above. In Figure 2, an ethidium bromide-stained gel demonstrates the efficiency of A-tailing and subsequent ligation to T- or C-tailed adapters for the three systems.
The expected products are denoted on the left side of the gel, with the blue bar representing the 90-mer target and the red bars representing ligated adapter.
If A-tailing is occurring at a higher rate than G-tailing, the desired band should be more intense in the “T-adapter” lanes than in the corresponding C-tailed adapter lanes. Examination of the existing NGS library kits reveals equal amounts of A- and G-tailing, along with a notable presence of bands that can arise only from the formation of concatemers and chimeras. In contrast, the Lucigen library exhibits highly efficient ligation of the T-tailed adapter with essentially no side products (concatemers or C-tailed adapter products).
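One way such a gel comparison could be turned into numbers is sketched below in Python. This is an assumption-laden illustration, not the authors' procedure: the function names are hypothetical, the band intensities are placeholder values in arbitrary units, and the sketch assumes the intensities have already been normalized so that they are proportional to the molar amount of target in each band.

```python
# Illustrative sketch only. Band intensities are hypothetical inputs in
# arbitrary units, assumed to be proportional to molar amount of target.


def ligation_efficiency(ligated: float, unligated: float,
                        side_products: float = 0.0) -> float:
    """Fraction of the target signal in a lane found in the adapter-ligated band."""
    total = ligated + unligated + side_products
    if total <= 0:
        raise ValueError("lane contains no signal")
    return ligated / total


def a_tail_fraction(t_lane_ligated: float, c_lane_ligated: float) -> float:
    """Share of tailed target carrying an A, inferred from the two adapter lanes."""
    total = t_lane_ligated + c_lane_ligated
    if total <= 0:
        raise ValueError("no adapter-ligated product in either lane")
    return t_lane_ligated / total


# Placeholder numbers, not measurements from the article.
print(ligation_efficiency(ligated=80.0, unligated=15.0, side_products=5.0))
print(a_tail_fraction(t_lane_ligated=50.0, c_lane_ligated=50.0))
```

Under these assumptions, a kit that tails A and G at equal rates would give an A-tail fraction near 0.5, while a kit optimized for A-tailing would approach 1.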
The Effect of Chimeras
If chimeras and concatemers are eliminated, improvements in the quality of NGS data should follow. This hypothesis can be tested by directly comparing library-preparation methods with replicate samples, generating sequencing data, and running statistical analyses for bias and complexity.
Genomic E. coli DH10B DNA was sheared and prepared using Lucigen’s NxSeq technology and that of a leading supplier.
Figure 3 graphically represents the sequence bias obtained from these two libraries when compared to the reference genome. A tighter line with less deviation from the mean represents the library with lower bias, which more accurately matches the reference sequence. Along with lower bias, higher library-creation efficiency results in greater library complexity: higher percentages of genome coverage are achieved with fewer total reads, and fewer repeat reads are generated.
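The bias and complexity comparison described above can be expressed with a few simple summary statistics. The Python sketch below is a minimal illustration under stated assumptions, not the analysis used for Figure 3: it assumes per-position read depths are already available as a list, treats reads sharing the same start position and strand as repeat reads, and uses hypothetical function names and toy inputs.

```python
from collections import Counter
from statistics import mean, pstdev

# Minimal sketch with hypothetical names and toy inputs. Assumes per-position
# read depths have already been computed against the reference genome.


def coverage_cv(depths):
    """Coefficient of variation of depth; lower means less deviation from the mean."""
    m = mean(depths)
    if m == 0:
        raise ValueError("no coverage")
    return pstdev(depths) / m


def fraction_covered(depths, min_depth=1):
    """Fraction of reference positions covered by at least min_depth reads."""
    return sum(d >= min_depth for d in depths) / len(depths)


def duplicate_fraction(read_keys):
    """Fraction of reads that repeat an already seen (start, strand) key."""
    counts = Counter(read_keys)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(read_keys)


# Toy inputs for illustration only, not data from the article.
even_library = [10, 10, 10, 10]
skewed_library = [2, 18, 0, 20]
print(coverage_cv(even_library), coverage_cv(skewed_library))
print(fraction_covered(even_library), fraction_covered(skewed_library))
print(duplicate_fraction([(1, "+"), (1, "+"), (2, "-"), (3, "+")]))
```

In this toy example both libraries have the same mean depth, but the skewed one shows a higher coefficient of variation and leaves a position uncovered, which is the pattern a biased, lower-complexity library would be expected to produce.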
Despite the advent of less expensive platforms for next-gen sequencing, the cost of sequencing, analysis, and data storage continues to be a major constraint for researchers. Existing commercial kits are inherently inefficient in their ability to A-tail and ligate adapters, resulting in lower library complexity and greater bias. This inefficiency increases the time and cost of the data analysis required to complete a genome and can even make it impossible to obtain the desired data at all. By improving the quality of the library, the return on investment for sequencing projects can be significantly increased.
Michael Lodes, Ph.D., is senior scientist at Lucigen. Web: www.lucigen.com.