Caroline Seydel Contributor GEN
Tiny Device Fills in Some Gaps in the Sequence of the Human Genome
For the first time, researchers have used a nanopore sequencer to assemble a human genome with the help of ultra-long reads. This achievement, described today in Nature Biotechnology, marks a milestone for the technology, as well as a considerable step toward eventually completing the human genome.
“The ability to do this is significant,” Matthew Loose, Ph.D., associate professor of medicine at the University of Nottingham and one of the lead authors on the paper, tells GEN. “This is the first time it’s been possible to generate enough data on a nanopore sequencer to perform a de novo assembly of the human genome.” Using the Oxford Nanopore Technologies (ONT) MinION sequencer, the team generated 91.2 Gb of sequence data, or roughly 30x coverage of the genome. They achieved single reads of up to 882 kb, and their ultra-long reads had an N50 above 100 kb, meaning that more than half of those sequenced bases came from reads longer than 100 kb.
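As a rough check on the 30x figure, coverage is simply total sequence divided by genome size. The sketch below assumes a haploid human genome of about 3.1 Gb, a figure not given in the article. It also computes read N50, a common read-length summary that is weighted by bases rather than by read count. The function names and the toy read lengths are illustrative only.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total bases sequenced divided by genome size."""
    return total_bases / genome_size


def n50(read_lengths: list[int]) -> int:
    """Length of the read at which the running total, summed from the longest
    read down, first reaches half of all sequenced bases."""
    total = sum(read_lengths)
    running = 0
    for length in sorted(read_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("read_lengths must not be empty")


# Assumption: haploid human genome size of ~3.1 Gb (not stated in the article).
print(f"{coverage(91.2e9, 3.1e9):.1f}x")  # 29.4x

# Hypothetical read lengths in kb, invented for illustration.
toy_reads_kb = [200, 120, 110, 30, 20, 10, 10]
print(n50(toy_reads_kb))  # 120
```

Only 3 of the 7 toy reads are longer than 100 kb, yet the N50 is 120 kb, because the statistic counts bases rather than reads.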
These ultra-long sequences allowed for an impressively contiguous assembly. They also closed 12 gaps that remained in the reference genome, in highly repetitive regions where shorter reads won’t do the trick. “If you imagine a jigsaw where you have two regions of the jigsaw that have exactly the same image on them, there’s just no way of resolving which bit goes where,” Loose explains. “As you get longer and longer reads, you start to get more unique elements in those reads that allow you to see differences as to where they should be placed.”
The longer reads also help “phase” the sequence, or sort out which single nucleotide variants sit together on the same copy of each chromosome. Here, the authors phased the entire major histocompatibility complex (MHC) region, one of the most gene-dense and highly variable regions of the genome. The long stretches of sequence data allowed the team to reconstruct both haplotypes, the two inherited copies of the region.
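Long reads help with phasing because a single read that spans two heterozygous sites shows directly which alleles travel together. The toy sketch below, which is not the authors’ pipeline, tallies read support for the two possible ways of pairing alleles at two sites; the function name, inputs and example reads are hypothetical.

```python
from collections import Counter


def phase_two_sites(read_alleles: list[tuple[str, str]],
                    site1_alleles: tuple[str, str],
                    site2_alleles: tuple[str, str]):
    """Pick the allele pairing at two heterozygous sites with the most read support.

    read_alleles: (allele at site 1, allele at site 2) seen on each read spanning both sites.
    Returns (haplotypes, supporting reads, conflicting reads).
    """
    a1, a2 = site1_alleles
    b1, b2 = site2_alleles
    counts = Counter(read_alleles)
    # Option 1: haplotypes a1-b1 and a2-b2.
    option1 = counts[(a1, b1)] + counts[(a2, b2)]
    # Option 2: haplotypes a1-b2 and a2-b1.
    option2 = counts[(a1, b2)] + counts[(a2, b1)]
    if option1 >= option2:  # ties go to option 1 arbitrarily
        return [(a1, b1), (a2, b2)], option1, option2
    return [(a1, b2), (a2, b1)], option2, option1


# Hypothetical reads; ("A", "C") stands in for a read carrying a sequencing error.
reads = [("A", "T"), ("G", "C"), ("A", "T"), ("A", "C"), ("G", "C")]
print(phase_two_sites(reads, ("A", "G"), ("C", "T")))
# ([('A', 'T'), ('G', 'C')], 4, 1)
```

Real phasing has to chain many sites and weigh error rates, but the principle is the same: the longer the read, the more heterozygous sites it links at once.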
“It’s definitely opening up regions of the genome that are intractable to short reads,” says Shawn Baker, genomics advisor and consultant at SanDiegOmics.com, who was not involved in the work. “In terms of helping to understand the genome, it’s a win.”
Accuracy may still be a sticking point, however. Baker points out that, measured against the reference genome, the nanopore data alone were less accurate than data from other sequencing techniques. “There appears to be substantial systematic error that can’t be corrected with nanopore reads alone,” he says. “They needed to bring in Illumina’s high quality short reads to improve the accuracy.”
Post-processing software such as nanopolish, which helps verify the identity of each base, can boost accuracy. Jared Simpson, Ph.D., computational biologist at the Ontario Institute for Cancer Research and one of the paper’s coauthors, points out that the raw signal contains a fair amount of ambiguity. Nanopore sequencing works by threading a strand of DNA through a nanopore, a tiny hole created in a membrane by a specially designed protein. An electrical current is passed through the pore, and each nucleotide base disrupts the current differently. Reading the sequence depends on deciphering these electrical disruptions.
“The observed signal depends on multiple bases in and around the pore, not a single base,” Simpson says. For instance, an A surrounded by C’s will look different from an A flanked by a G or a T. Because of that ambiguity, the initial interpretation by the base-calling software may be flawed.
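Simpson’s point can be illustrated with a toy pore model in which the current depends on a short window of bases (a k-mer) rather than on one base. This is a sketch under invented assumptions, not the model nanopolish uses: the pore here reads 3-mers (real pore models typically use longer contexts), and the per-position current contributions in picoamperes are made up. `PORE_MODEL` and `expected_signal` are hypothetical names.

```python
from itertools import product

# Hypothetical contribution (pA) of each base at each position of a 3-mer.
# All values are invented for illustration.
CONTRIBUTION = [
    {"A": 10.0, "C": 4.0, "G": 12.0, "T": 7.0},    # first base of the 3-mer
    {"A": 60.0, "C": 52.0, "G": 66.0, "T": 55.0},  # middle base
    {"A": 15.0, "C": 9.0, "G": 18.0, "T": 21.0},   # last base
]
K = len(CONTRIBUTION)

# Toy pore model: expected mean current for every possible 3-mer.
PORE_MODEL = {
    "".join(kmer): sum(CONTRIBUTION[i][base] for i, base in enumerate(kmer))
    for kmer in product("ACGT", repeat=K)
}


def expected_signal(sequence: str) -> list[float]:
    """Expected current for each k-mer window as the strand moves through the pore."""
    return [PORE_MODEL[sequence[i:i + K]] for i in range(len(sequence) - K + 1)]


# Same middle base, different neighbours, different current:
print(PORE_MODEL["CAC"], PORE_MODEL["GAT"])     # 73.0 93.0
# Different 3-mers, same current: the ambiguity a base caller must resolve.
print(PORE_MODEL["ACG"] == PORE_MODEL["TTG"])   # True
print(expected_signal("GCACG"))                 # [79.0, 73.0, 80.0]
```

Because a single level cannot be traced back to one k-mer, signal-level tools score whole candidate sequences against the observed current rather than calling each base in isolation.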
To minimize those errors, Dr. Simpson created nanopolish, which considers the context of the surrounding bases to decode the signal data correctly. In this case, nanopolish brought the sequence accuracy up from 95.74% to 99.44%, and further polishing with Illumina short reads pushed that figure to 99.96%, the authors write.
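Percentages this close to 100 hide large differences, so it can help to express them as error counts or as Phred-scaled quality, Q = -10 log10(error rate). The short calculation below applies that standard conversion to the three accuracy figures reported above; the helper names are hypothetical.

```python
import math


def phred_quality(accuracy: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(1 - accuracy)."""
    return -10 * math.log10(1 - accuracy)


def errors_per(bases: int, accuracy: float) -> float:
    """Expected number of erroneous bases in a stretch of the given length."""
    return bases * (1 - accuracy)


for label, acc in [("before nanopolish", 0.9574),
                   ("after nanopolish", 0.9944),
                   ("with Illumina data", 0.9996)]:
    print(f"{label}: Q{phred_quality(acc):.1f}, "
          f"~{errors_per(10_000, acc):.0f} errors per 10 kb")
# before nanopolish: Q13.7, ~426 errors per 10 kb
# after nanopolish: Q22.5, ~56 errors per 10 kb
# with Illumina data: Q34.0, ~4 errors per 10 kb
```

On this scale, the step from 99.44% to 99.96% cuts the error count roughly fourteenfold, even though the percentage barely moves.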
Epigenetic markings could introduce another source of inaccuracy. Because nanopore sequencers read native DNA taken directly from the cell, without PCR amplification, these DNA molecules retain their epigenetic modifications, which the nanopore detects. Bases with modifications, such as methylation, create a different electrical signature than unmodified bases.
“We naïvely think that current change just tells you about A, T, G, and C, but in reality, that current change is telling you a lot more,” Dr. Loose says. Because a modified base produces a different signal from an unmodified one, standard base-calling software could misinterpret it.
Nanopolish has already been trained to recognize the telltale signal of methylation on a nucleotide, and the ability to decode other modifications may not be far behind. “Being able to read epigenetic information directly could be revolutionary,” Dr. Loose says.
So far, nanopore technology has been used primarily to sequence microbial genomes, for instance to track the Ebola virus outbreak in Africa and the spread of Zika virus in the Americas. Proving it capable of human genome sequencing could boost the popularity of nanopore sequencing for other applications. “It will definitely get some researchers to start thinking about using the MinION for larger genomes,” says Baker.
Though it’s still prohibitively slow to routinely sequence the entire human genome with the pocket-sized device, some work has already shown that nanopore sequencing works better than short-read techniques for detecting complex chromosomal rearrangements, as in some cancers and rare inherited diseases.
ONT has also developed larger-scale nanopore sequencers that generate piles of data in a fraction of the time of the MinION, though these remain considerably more expensive to start using than the portable version. But, says an ONT representative, “The GridION and PromethION come capital-free, like the MinION. GridION cost-per-base is actually the same as, or better than, MinION. We expect [PromethION] to be dramatically cheaper in terms of cost-per-base too, again without having to commit capital.”
Getting the full benefit of these long reads requires high-quality, high-molecular-weight DNA, which means the samples must be handled much more carefully than those used for short reads. It also requires huge amounts of starting material, in the neighborhood of 10 micrograms to produce 2 Gb of sequence. That much DNA may be hard to come by in some applications. Work is underway to optimize these techniques and reduce the amount of material needed to get good data, but it remains a challenge.
Although nanopore technology isn’t quite as mature yet as that available for short-read sequencing, Baker says, “the situation should start to improve as the platform rises in popularity.”
*Updated on 1/29/18 to include a comment from ONT.