January 15, 2018 (Vol. 38, No. 2)
Madelyn Light, Scientist, Integrated DNA Technologies
Strategies for Rescuing DNA from Preserved Samples
Next-generation sequencing (NGS) has brought about a revolution in the detection of genetic abnormalities. Its unprecedented accuracy, combined with speed and affordability, has made NGS a commonly used technique in both clinical and research environments.
The success of NGS in diagnostics and research has made it an attractive analysis option, not just for use with fresh tissue samples, but also with older, archived specimens. Many labs have extensive archives containing well-documented tissue specimens, which can be a valuable resource.
A potential issue with archived specimens is that they are often processed for histology directly after retrieval. These formalin-fixed, paraffin-embedded (FFPE) samples retain their structure for long periods of time, but the process may also damage or alter the state of the DNA inside the cells. Here, we discuss how measuring DNA quality can help optimize coverage in NGS, to get the most out of your FFPE samples.
Damaged by the Process
The process of tissue fixation and embedding requires subjecting a sample to a range of processing steps that can affect DNA quality. These steps include formalin fixation, dehydration (with potential for contamination), and exposure to high temperatures for several hours. Furthermore, once samples are embedded, long-term storage can lead to spontaneous degradation of DNA.
The most critical step when retrieving DNA from FFPE samples is breaking the crosslinks between DNA and other molecules. These crosslinks, introduced during formalin fixation, bind DNA molecules either to themselves or to proteins. Inadequate de-crosslinking means less DNA will be available for downstream analysis. DNA extraction kits provide protocols with detailed guidance on choosing the right parameters for lysis and de-crosslinking.
The main parameter affected by the quality and quantity of DNA available for sequencing is library complexity. Adequate library complexity is essential for sufficient target coverage, to achieve sensitive detection of somatic variants. PCR-based amplification helps to increase the quantity of DNA, but not necessarily the complexity of the final library. Figure 1 shows the effects that the quality and quantity of the retrieved DNA have on the sequencing library.
Figure 1. Quality and quantity of DNA affect library complexity. (A) Starting with sufficient high-quality DNA produces a representative, high-complexity library. (B) Low amounts of starting material produce a low-complexity library that is not completely representative of the starting sample. (C) A poor-quality sample produces a low-complexity library, because damaged DNA is not converted during library construction. (D) Increasing the amount of DNA can potentially compensate for low quality and improve library complexity.
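The point that amplification adds copies but not complexity can be illustrated with a toy model. The sketch below is an idealized illustration only: it assumes every molecule doubles perfectly in each PCR cycle, and the fragment names and counts are hypothetical rather than taken from the experiments described here.

```python
def amplify(molecules, cycles):
    """Idealized PCR: every molecule is doubled in each cycle (assumption)."""
    copies_per_molecule = 2 ** cycles
    return [m for m in molecules for _ in range(copies_per_molecule)]


def complexity(library):
    """Library complexity, taken here as the number of distinct molecules."""
    return len(set(library))


# Hypothetical starting material: 50 distinct converted fragments.
starting_library = [f"fragment_{i}" for i in range(50)]

# Five idealized PCR cycles give 32 copies of each fragment.
amplified_library = amplify(starting_library, cycles=5)

total_molecules = len(amplified_library)
unique_molecules = complexity(amplified_library)
```

In this toy model the total number of molecules grows 32-fold, while the number of distinct molecules stays at 50. Real PCR is less uniform than this, which can only reduce how evenly the original fragments are represented.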
A range of commercially available DNA extraction kits can be used to lyse FFPE samples and extract the DNA. To investigate whether the quality of extracted DNA can be improved through the choice of kit, we assessed the DNA retrieved from five FFPE samples using four different kits (Figure 2). We looked at both the amount of DNA extracted and the DNA integrity number (DIN), which is used as a measure of DNA quality.
Our results show that both yield and DIN are unaffected by the choice of extraction kit, indicating that fixation and the state of the sample are the factors that most affect the quality of the DNA and, in turn, the quality of the resulting sequencing library. So, how can one rescue a low-quality sample?
How Quality and Quantity Affect Target Coverage
After checking the integrity of the retrieved DNA, we investigated how this translated to target coverage. We took 10 ng of DNA from each sample, prepared libraries using the KAPA Hyper Prep Kit (Kapa Biosystems), and performed target enrichment using the xGen® AML Cancer Panel (Integrated DNA Technologies). The results are plotted as maximum mean coverage vs. DIN (Figure 3A).
Figure 3A shows that the maximum mean coverage tracks the DIN score closely. This means that the measured condition of the extracted DNA is a good predictor of library complexity.
Maximum mean coverage is an estimate of how many unique molecules will align to a target region when the library undergoes deep sequencing, and is defined as the average target coverage depth that would be expected when sequencing deeply enough to observe 75% PCR duplicates. Highly complex libraries have many unique molecules and can achieve very deep coverage. Low complexity libraries, which contain mostly PCR duplicates, will have very shallow unique coverage. Unique coverage is an important factor in determining an assay’s ability to detect low-frequency mutations.
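To make the 75% duplicate threshold more concrete, the following sketch uses a simple Poisson sampling model, in which reads are drawn uniformly at random from the unique molecules in a library. This model is an assumption chosen for illustration and is not necessarily how maximum mean coverage was calculated for the figures. The function names are hypothetical.

```python
import math


def duplicate_fraction(reads_per_molecule):
    """Expected duplicate fraction under uniform (Poisson) sampling.

    reads_per_molecule is N / C: reads sequenced divided by the number of
    unique molecules in the library. The expected number of distinct
    molecules observed is C * (1 - exp(-N / C)).
    """
    if reads_per_molecule <= 0:
        return 0.0
    unique_per_read = (1.0 - math.exp(-reads_per_molecule)) / reads_per_molecule
    return 1.0 - unique_per_read


def ratio_for_duplicate_fraction(target, tol=1e-9):
    """Find N / C that yields the target duplicate fraction by bisection."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    # Expand the upper bound until it brackets the target.
    while duplicate_fraction(hi) < target:
        hi *= 2.0
    # duplicate_fraction increases with depth, so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if duplicate_fraction(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


ratio_at_75 = ratio_for_duplicate_fraction(0.75)
library_fraction_seen = 1.0 - math.exp(-ratio_at_75)
```

Under this model, 75% duplicates is reached at a little under four reads per unique molecule, by which point about 98% of the unique molecules in the library have been observed. That is why coverage measured at this depth is a reasonable proxy for the complexity of the library itself.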
Is it possible to improve coverage by increasing the amount of input DNA? To find out, we varied the amount of input DNA between 1 and 100 ng, compensated for the difference in input by adjusting the number of preamplification PCR cycles, and plotted maximum mean coverage against input DNA.
The dotted lines in Figure 3B represent linear regressions for DNA samples with different levels of integrity. The lines show that there is a near-linear relationship between coverage and input amount, which means that the coverage of low-quality DNA can be improved by using more starting material. Figure 3C shows how much DNA we needed to get 500× mean coverage, and Figure 3D shows how much coverage we got from 50 ng of starting material. Our results indicate that if DNA quality is poor, even 100 ng might not be sufficient to get to 500× coverage.
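A near-linear relationship of this kind can be used to estimate how much input a given sample would need. The sketch below fits a straight line by ordinary least squares and inverts it for a target coverage. The data points are hypothetical, made up to resemble a low-integrity sample, and are not the measurements behind Figure 3. The function names are likewise illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("input amounts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def input_needed(slope, intercept, target_coverage):
    """Invert the fitted line to estimate the input (ng) for a target coverage."""
    if slope <= 0:
        raise ValueError("coverage must increase with input for this estimate")
    return (target_coverage - intercept) / slope


# HYPOTHETICAL titration for one low-integrity sample (not real data).
input_ng = [1, 10, 50, 100]
max_mean_coverage = [6, 42, 202, 402]

slope, intercept = fit_line(input_ng, max_mean_coverage)
needed_ng = input_needed(slope, intercept, target_coverage=500)
coverage_at_50_ng = slope * 50 + intercept
```

With these made-up points, the estimated input for 500× coverage comes out at about 124.5 ng, beyond the 100 ng upper limit of the titration. Extrapolating past the measured range assumes the linear trend continues, which should be checked before relying on such an estimate.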
DNA Quality a Good Predictor of Library Complexity
Archived FFPE samples can be a valuable resource for studying genetic abnormalities with NGS, but the condition of the extracted DNA is a limiting factor for target coverage and, therefore, data quality. Measuring the DNA quality after extraction is a good predictor of the complexity of the resulting library, and this information can be used to optimize the sequencing run. This can be done by using DIN, but other commercially available quality measures, such as the Q score (Kapa Biosystems), work just as well.
It is clear that fixation, embedding, and recovery have a negative impact on DNA integrity and, therefore, library complexity. We have demonstrated here that measuring DNA quality is an important step in sequencing from FFPE samples. Samples with less-than-optimal DNA can be rescued by increasing the amount of starting material. It is even possible to estimate coverage based on the quality measurement and the chosen input amount. These results can help clinicians and researchers harness the power of their FFPE samples in their quest to advance understanding of genetic abnormalities.