Science fair projects presenting complicated genomic analyses. Smartphones displaying personal genome sequences. Treatment plans specifying genomic modifications. These are just a few of the “bold predictions” outlined in the new strategic vision developed by the National Human Genome Research Institute (NHGRI). These, in addition to the more attainable near-term goals, hinge on the fundamental technology of DNA sequencing.
There is little doubt that genomic sequencing technologies will continue to advance, scrutinizing ever longer base sequences, operating ever more accurately, and generating information ever more quickly. And all these boons will be delivered at ever lower costs. During 2020, encouraging developments included the arrival of the “$100 genome” (as announced by MGI, a company that introduced a new high-throughput sequencing platform) as well as a study reporting the first complete, gapless reconstruction of a human chromosome sequence.
But when some genomicists look into their crystal balls, increasingly they see beyond the strings of As, Ts, Gs, and Cs. Rather, they focus on other, perhaps underappreciated aspects of DNA structure that can manifest as changes in health or phenotype. These include the role of structural variants (SVs), the phasing of chromosomes, and DNA modifications—the epigenome. These aspects of genome biology are of tremendous importance for obtaining a more complete understanding of the information held in DNA.
Up until now, the excitement in the genomics field, explain Trey Foskett and Brian Kudlow, PhD, co-founders of Watchmaker Genomics, has largely centered on the development of tools for generating high-quality sequences. Those tools have arrived. As a result, Foskett and Kudlow suggest, more, and more diverse, areas of biology will be tagged and sequenced. The two scientists propose the idea of “sequencing as a universal readout.” They suggest that sequencing could be used to measure different aspects of biology in a more high-throughput and multiplexed way. Examples of this can be seen in the work SomaLogic does to measure protein concentrations using aptamers, or in one of the hottest new technologies to enter the genomics space—spatial transcriptomics. According to Watchmaker’s founders, sequencers producing more varied biological readouts will drive the most exciting developments in genomic analysis.
Using a sequencer to probe different aspects of biology requires the development of a new, complementary set of tools. One of the toolmakers is Ivan Liachko, PhD, CEO of Phase Genomics. Liachko tells GEN that he “fell in love” with the work he does using Hi-C—a technique that identifies DNA interactions in the cell—because the technique opens up new ways to explore the genome. Liachko explains that success in this type of genomic discovery does not depend on additional horsepower. Rather, it relies on “clever tricks.” That, Liachko adds, “is what invention is all about.”
Cutting and running with DNA-binding proteins
A 2017 eLife paper described a new way, called CUT&RUN (Cleavage Under Targets and Release Using Nuclease), to profile where proteins interact with DNA. The method described in the paper, “An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites,” serves the same purpose as chromatin immunoprecipitation followed by sequencing (ChIP-seq), the current standard for protein-DNA interaction analysis, but without some of ChIP-seq’s pitfalls.
The paper’s authors, Peter J. Skene, PhD, director of bioinformatics and computational biology at the Allen Institute in Seattle, and Steven Henikoff, PhD, a scientist at the Fred Hutchinson Cancer Research Center and a Howard Hughes Medical Institute Investigator, described how they developed CUT&RUN around a micrococcal nuclease—an enzyme that is brought to the DNA when an antibody binds to a known target. In CUT&RUN, the nuclease cleaves DNA to release fragments that can be sequenced. The paper’s authors noted that CUT&RUN allows for both quantitative, high-resolution chromatin mapping and probing of the local chromatin environment.
“I do think CUT&RUN will overtake ChIP and is already doing so,” Mary Gehring, PhD, associate professor of biology at the Whitehead Institute, tells GEN. As researchers become more focused on understanding transcription and epigenetic dynamics in specific cell types or even individual cells, methods that scale down to really low input become increasingly important. “CUT&RUN works really well with few cells, which is a major advantage over ChIP,” she says. For Gehring’s work with Arabidopsis thaliana seeds, the new method proved to be “much easier and more fruitful than ChIP ever was for us.”
The CUT&RUN technology was licensed by the North Carolina–based company EpiCypher, founded in 2012 with a focus on epigenetics. Bryan Venters, PhD, director of genomic technologies at EpiCypher, tells GEN that the founders’ goal was to address the longstanding problem of poor antibody specificity to chromatin modifications. EpiCypher focuses on antibodies—the “coin of the realm” in the epigenetics field.
The company moved into the genomics space with these immune-tethering approaches when it licensed the CUT&RUN technology. CUT&RUN sidesteps the problems of ChIP-seq, reduces sequencing costs, and requires less starting material. By commercializing the reagents needed for CUT&RUN, EpiCypher would like to increase the accessibility of the technology and disrupt the ChIP-seq market.
Earlier this year, the first end-to-end human chromosome sequence was published—an enormous effort by the Telomere-to-Telomere consortium that relied heavily on long-read sequencing, the forte of the Pacific Biosciences and Oxford Nanopore sequencing platforms. With short reads, many DNA fragments need to be assembled to build the full chromosome sequence. And in humans, who carry two copies of each gene, knowing which version (allele) belongs on which chromosome matters. This process, known as haplotype phasing, is particularly important in the clinic to establish, for example, whether an allele in a child came from mom or from dad.
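To make the idea concrete, here is a toy sketch of phasing with made-up alleles (not any consortium’s production algorithm): long reads that span several heterozygous sites reveal which alleles travel together on the same chromosome, so overlapping reads can be chained into two haplotypes.

```python
# Toy haplotype phasing: each read is a dict mapping a variant site
# to the allele observed there (0 or 1). Reads that overlap at a
# site anchor each other; conflicting reads are flipped to the
# complementary haplotype. (Hypothetical data; real phasers use
# graph-based and statistical methods.)

def phase(reads):
    """Greedily build haplotype A; haplotype B is its complement."""
    hapA = {}
    for read in reads:
        overlap = [s for s in read if s in hapA]
        # If the read disagrees with hapA at most anchor sites,
        # it belongs to the other haplotype, so flip its alleles.
        flip = bool(overlap) and \
            sum(read[s] != hapA[s] for s in overlap) > len(overlap) / 2
        for site, allele in read.items():
            hapA.setdefault(site, 1 - allele if flip else allele)
    hapB = {site: 1 - a for site, a in hapA.items()}
    return hapA, hapB
```

Given reads `[{1: 0, 2: 0}, {2: 0, 3: 1}, {3: 1, 4: 0}, {4: 1, 5: 1}]`, the last read conflicts with haplotype A at site 4 and is flipped, landing its allele at site 5 on the correct haplotype.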
But not everyone who needs phased chromosomes has the technological power of large, multi-institutional sequencing programs. Creating enough long reads to be able to choose the longest, best ones requires a lot of sequencing horsepower—which is not economically viable for many users. An alternative is to use the molecular biology technique known as Hi-C, which identifies sequences of DNA that are physically close to each other in the cell.
Hi-C has been particularly useful in the microbiome field, as researchers sift through large mixtures of chromosomal and plasmid DNA from hundreds of species in a single analysis. Hi-C can identify which DNA came from the same cell, allowing for the tracking of antibiotic resistance and other traits. Originally launched to make Hi-C easier for people to use, Phase Genomics recently discovered, in collaboration with Pacific Biosciences, that its Hi-C kits can be used to create high-quality, phased diploid genome assemblies. Phasing is difficult to do alone. Hi-C, says Liachko, is the “little thing at the end that snaps it all together.”
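The binning idea behind this microbiome application can be sketched in a few lines (the contact counts below are hypothetical, and real tools apply far more sophisticated normalization and probabilistic clustering): contigs joined by many Hi-C read pairs were likely inside the same cell, so they are grouped into one genome bin.

```python
# Toy Hi-C metagenomic binning: contigs connected by many
# proximity-ligation contacts are merged into the same bin
# using a simple union-find. (Hypothetical data and threshold.)

def bin_contigs(contacts, threshold=10):
    """contacts: {(contig_a, contig_b): read-pair count}.
    Returns bins (sets of contigs) linked above the threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), count in contacts.items():
        if count >= threshold:
            parent[find(a)] = find(b)      # union the two groups

    bins = {}
    for contig in parent:
        bins.setdefault(find(contig), set()).add(contig)
    return list(bins.values())
```

With contacts `{("c1","c2"): 50, ("c2","c3"): 30, ("c3","c4"): 2, ("c4","c5"): 40}`, the weak c3–c4 link falls below the threshold, so the contigs split into two bins—two putative cells.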
The technique, called FALCON-Phase, integrates long-read sequencing data and ultra-long-range Hi-C chromatin interaction data from a diploid individual. The method was recently published in Nature Communications in the paper, “Extended haplotype-phasing of long-read de novo genome assemblies using Hi-C.” In the paper, the method was evaluated on three datasets—human, cattle, and zebra finch. The result? High-quality, fully haplotype-resolved assemblies.
Will more accessible, long-read sequencing make a tool like FALCON-Phase unnecessary? Liachko maintains that the opposite is the case—more long-read sequencing has increased the need for scaffolding technologies. Even though some sequencing technologies produce reads that are very long, “the chromosomes still need to be finished,” insists Liachko. One reason why people are adopting Hi-C technology more for phasing, he points out, is that these long, “platinum genomes” are within their reach.
Large genomic rearrangements hint at COVID-19 susceptibility
From the start of the COVID-19 pandemic, multiple large, international working groups have been established to analyze the host response to a SARS-CoV-2 infection. These include the COVID-19 Host Genetics Initiative, which was organized by researchers at the Broad Institute, and the COVID Human Genetic Effort, which was established by Jean-Laurent Casanova, MD, PhD, professor at the Rockefeller University, and his colleagues. These groups are working to uncover the genetic basis for the severity of COVID-19.
Another group seeking answers to this question is taking a different approach. This group, the COVID-19 Host Genome Structural Variant Consortium, was formed by Ravindra Kolhe, MD, PhD, director of the Georgia Esoteric and Molecular Laboratory at Augusta University. The goal is to bring together researchers working on the contributions of genomic SVs to immunity and response to infection. This COVID-19 SV consortium, with members from more than 50 institutions, hopes to uncover genomic factors that contribute to SARS-CoV-2 infection, progression, or recovery by analyzing SVs in the genomes of COVID-19 patients.
Alka Chaubey, PhD, chief medical officer at Bionano Genomics, a genome imaging company that uses optical mapping to identify SVs, tells GEN that previous work has associated SVs with host genome immune responses in other infectious diseases. This served as the basis to investigate the role of large SVs in the host genome immune responses to COVID-19.
SVs include deletions, duplications, inversions, translocations, mobile element insertions, and complex alterations that can span tens of thousands of bases. They account for a large amount of human genomic variation and are “a subject of intense interest in the sequencing world,” according to Keith Robison, PhD, a genomics blogger.
Because they are large and can occur in repetitive regions, SVs are often undetectable using typical sequencing methods. As Robison explains on his blog, “Experimentalists develop new protocols for DNA isolation to drive read lengths higher, and the computationalists generate new algorithms to use that data to more precisely and sensitively detect SVs.”
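The read-pair logic those algorithms start from can be illustrated with a toy deletion caller (the positions and insert sizes below are hypothetical; real SV callers combine read-pair, split-read, coverage, and assembly evidence): paired-end reads that map much farther apart than the sequencing library’s expected insert size suggest that a stretch of DNA between them has been deleted.

```python
# Toy read-pair deletion detection: if a mapped pair spans far more
# of the reference than the library's expected insert size, flag a
# candidate deletion. (Hypothetical numbers; not a production caller.)

def call_deletions(pairs, expected=400, slack=200):
    """pairs: list of (left_pos, right_pos) for mapped read pairs.
    Returns (left, right, estimated_deletion_size) tuples."""
    calls = []
    for left, right in pairs:
        span = right - left
        if span > expected + slack:
            # the extra distance approximates the deleted length
            calls.append((left, right, span - expected))
    return calls
```

A pair mapping 1,600 bases apart against a 400-base expected insert, for example, points to a deletion of roughly 1,200 bases.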
“When it comes to genome analysis,” notes Erik Holmlin, PhD, CEO of Bionano Genomics, “sequencing or even microarray technology leaves a ton of information uncovered.” The missed information includes structural details such as location, quantity, and orientation. Such information, Holmlin asserts, reveals critical aspects of a genome that a sequencer alone cannot measure.
“There are fewer SVs than single nucleotide variants (SNVs), but the impact of SVs is generally greater,” Kolhe tells GEN. For example, he continues, an entire gene can be deleted or broken by an SV, while most SNVs have no measurable effect unless they change the coding sequence.
Clinically, SVs are known to play a role in intellectual disabilities, neurodegenerative diseases, and cancer. Holmlin tells GEN that the traditional techniques used to analyze them in patients, such as chromosomal microarray analysis and karyotyping, “really suck.” Bionano Genomics’ mission is to make Saphyr the next-generation equivalent of cytogenetics. “If you’re doing a genome analysis without getting an accurate structural picture,” he advises, “you’re not doing a deep enough analysis for 2020.”
The Saphyr platform, developed by Bionano Genomics to detect SVs, images DNA fragments that are, on average, 350,000 base pairs long—long enough to span repetitive sections—capturing the whole genome at high resolution. The process starts by enzymatically attaching fluorescent labels to high-molecular-weight DNA at specific sequences throughout the genome. The labeled DNA is put into a chip where it is linearized—threaded into nanochannels that hold the molecules as elongated, straight strands—and imaged, capturing labeling patterns. Imagine “taking a bowl of rotini and tricking it into being dried spaghetti,” says Robison. This builds a map of the entire genome, a process that, Robison declares, is “pretty amazing.”
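The map-building step can be illustrated with a toy alignment over label spacings (the coordinates below are hypothetical; production optical-map aligners use dynamic programming to tolerate missing labels and sizing error): a molecule’s pattern of distances between fluorescent labels is slid along the reference pattern until it matches.

```python
# Toy optical-map placement: compare the spacing between labels on
# an imaged molecule to the spacing on a reference map, allowing a
# sizing tolerance. (Hypothetical positions; not Bionano's software.)

def label_intervals(positions):
    """Distances between consecutive label positions."""
    return [b - a for a, b in zip(positions, positions[1:])]

def place_molecule(molecule, reference, tolerance=500):
    """Slide the molecule's interval pattern along the reference's;
    return the first reference interval index that matches, or None."""
    mol = label_intervals(molecule)
    ref = label_intervals(reference)
    for i in range(len(ref) - len(mol) + 1):
        if all(abs(m - r) <= tolerance for m, r in zip(mol, ref[i:])):
            return i
    return None
```

A molecule with label spacings of 7,200 and 17,800 bases, for instance, matches a reference stretch with spacings of 7,000 and 18,000 bases within the tolerance, anchoring the molecule to that position on the map.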
The genomics field, observes Liachko, has long been using two-dimensional readouts such as genomic sequences to discover three-dimensional information. But new technologies are starting to unpack more of the information—and the different types of information—held in the genome. In doing so, these technologies are ushering in the next generation of genomic analysis.