Commonly used sequencing approaches do not capture complete genetic and epigenetic information at the same time. Some rely on separate, parallel workflows and sequencing runs to obtain both, which can increase sample requirements, cost, and time, or yield data without phased information.

Researchers from the University of Cambridge and Cambridge Epigenetix Ltd. present a method for acquiring accurate, phased genetic and epigenetic information in a single experimental and data workflow. Notably, the approach is compatible with any sequencing platform. It will likely offer numerous advantages in research and diagnostics by providing more comprehensive biological information, with greater ease of use and at reduced cost.

The article “Simultaneous sequencing of genetic and epigenetic bases in DNA” was published in Nature Biotechnology.

A short history of genetic and epigenetic sequencing

Most DNA sequencing methods address either genetics or epigenetics, so they must be run in parallel and each yields only partial information. Combining the resulting datasets is also difficult: measurement errors accumulate, and coverage gaps can differ between workflows.

At the heart of this problem, many of the methods used to detect epigenetic DNA bases fall short because they cannot detect C-to-T mutations (the most common mutation in the mammalian genome and in cancer) or distinguish 5-methylcytosine (5mC) from 5-hydroxymethylcytosine (5hmC). This ambiguity increases the number of false-positive matches in the search space, making the computational alignment and mapping of converted reads slower, more expensive, and less accurate.

Several base-conversion chemistries have been developed to distinguish unmodified C from its epigenetic variants 5mC and 5hmC. These include bisulfite-based approaches such as whole-genome bisulfite sequencing (WGBS) and bisulfite-free approaches such as enzymatic-methyl sequencing (EM-seq) and TET-assisted pyridine borane sequencing. However, an important shortcoming of all such methods is that converting either the C base or one of its epigenetic derivatives to a U (read as T) compromises the direct detection of genetic C-to-T changes.
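The ambiguity described above can be illustrated with a minimal sketch. It assumes a simplified bisulfite-style readout in which unmodified C is read as T while 5mC and 5hmC are both read as C; the function name and the toy sequences are hypothetical and are not taken from the article.

```python
# Simplified bisulfite-style readout (an assumption for illustration):
# unmodified C is converted and read as T; 5mC and 5hmC are protected and read as C.
READOUT = {"A": "A", "G": "G", "T": "T", "C": "T", "5mC": "C", "5hmC": "C"}


def converted_read(template):
    """Return the sequence a sequencer would report for a list of base labels."""
    return "".join(READOUT[base] for base in template)


# Toy templates (hypothetical): the second carries a genetic C-to-T change at index 1.
reference = ["A", "C", "G", "5mC", "T"]
variant = ["A", "T", "G", "5mC", "T"]

# Same as `reference`, but with 5hmC instead of 5mC at index 3.
hydroxy = ["A", "C", "G", "5hmC", "T"]

# All three templates yield the same read, so the read alone cannot reveal
# the C-to-T variant or tell 5mC from 5hmC.
reads = {converted_read(t) for t in (reference, variant, hydroxy)}
print(reads)
```

In this toy model, three biologically different molecules collapse to a single read, which is the loss of information the article attributes to conversion-based methods.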

These existing methods also cannot distinguish 5mC from 5hmC in a single workflow. Methods have been developed that distinguish the two either by exclusively converting one base (for example, oxidative bisulfite sequencing, TET-assisted pyridine borane sequencing-beta, Tet-assisted bisulfite sequencing, and APOBEC-coupled epigenetic sequencing) or by selectively copying 5mC across strands of DNA.

Third-generation sequencing systems, such as single-molecule real-time (SMRT) sequencing and nanopore sequencing, can also measure genetic and epigenetic information in the same workflow by detecting signal differences caused directly by epigenetic modifications. However, because modifications such as 5mC and 5hmC are lost during PCR, these methods cannot amplify the starting material. They are therefore severely limited by the read depth that can be achieved with low DNA inputs. This restricts their use for sampling circulating tumor DNA (which is only a small fraction of cell-free DNA, cfDNA) or for quantifying epigenetic changes at specific loci beyond calling them 0% or 100% methylated.

True simultaneous sequencing

Sequencing DNA with a single-base coding system and a four-state readout (G, C, T, and A) can report at most four genetic or epigenetic information states in a single run. A two-base coding system, in which combinations of two bases encode a state, can unambiguously decode up to 16 states. This makes it possible to read all four genetic states and multiple epigenetic states in a single run.
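The counting argument can be sketched as follows. The codebook here is deliberately arbitrary (the first six base pairs in enumeration order) and is an assumption made purely for illustration; it is not the encoding used in the paper, and the names `ARBITRARY_CODEBOOK` and `decode` are hypothetical.

```python
from itertools import product

BASES = "ACGT"

# One base per position: 4 distinguishable states.
single_states = list(BASES)

# Two bases per position: 4 * 4 = 16 distinguishable states.
pair_states = ["".join(p) for p in product(BASES, repeat=2)]

# Six letters to report: the four genetic bases plus 5mC and 5hmC.
LETTERS = ["A", "C", "G", "T", "5mC", "5hmC"]

# Arbitrary, illustrative assignment (NOT the paper's encoding):
# map the first six pairs in enumeration order to the six letters.
ARBITRARY_CODEBOOK = dict(zip(pair_states, LETTERS))


def decode(pair):
    """Return the letter assigned to a two-base pair, or None if unassigned."""
    return ARBITRARY_CODEBOOK.get(pair)


unassigned = [p for p in pair_states if p not in ARBITRARY_CODEBOOK]

print(len(single_states), len(pair_states), len(unassigned))
```

Whatever assignment is chosen, six letters occupy only six of the 16 pair states, so ten remain available for other purposes.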

The research article in Nature Biotechnology presents a whole-genome sequencing method that sequences the four genetic letters plus 5mC and 5hmC, giving an accurate digital readout of all six letters in a single workflow. The DNA sample is processed using only enzymes, which avoids DNA degradation and biased genome coverage. Decoding the bases on the original and copy strands at the same time yields a phased digital readout and a more complete picture of the information stored in genomes. The researchers apply the method to human genomic DNA and to cell-free DNA extracted from a cancer patient’s blood sample.

The authors conclude that future work will focus on measuring additional modifications by using the further information states built into the technology.