While high-throughput DNA sequencing has made genome sequencing almost routine, it leaves scientists with an unwelcome, time-consuming chore: assembling a plethora of DNA scraps into a coherent whole. To accomplish this chore, often called genome scaffolding, scientists must resort to labor-intensive, low-throughput techniques. Even then, it may be difficult or impossible to determine where a particular DNA fragment belongs, because genomes are full of highly repetitive sequences that appear in a multitude of places.
In an advance that promises to lessen the tedium of genome assembly, scientists at the University of Massachusetts Medical School have devised a new method for piecing together the short DNA reads produced by next-generation sequencing technologies. Job Dekker, Ph.D., and colleagues have shown that entire genomes can be assembled faster and more accurately by measuring the frequency of interactions between DNA segments and by using the genome's three-dimensional shape as a guide. Employing this technique, they have been able to place 65 previously unaccounted-for DNA fragments in incomplete regions of the human genome.
The scientists describe their approach in a paper that appeared November 24 in Nature Biotechnology. The paper, entitled “High-throughput genome scaffolding from in vivo DNA interaction frequency,” shows how they looked to the three-dimensional structure of the genome as a guide for assembling linear DNA sequences.
The key, the scientists write, is a technology called Hi-C: “Hi-C is an experimental technique that measures the in vivo spatial interaction frequency between chromatin segments over the whole genome, by cross-linking loci that are in close physical proximity and quantifying them with high-throughput, paired-end sequencing.”
In other words, Hi-C measures how frequently each DNA fragment in the genome interacts with every other fragment. DNA sequences that lie near each other in the three-dimensional genome tend to interact more frequently, while sequences that are farther apart interact less frequently. Computational methods then estimate the linear genomic position of each fragment by finding the location that best fits that fragment's 3D interaction frequency data.
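The idea can be illustrated with a toy calculation. The Python sketch below is not the method from the paper; it is a minimal illustration that assumes contact counts fall off with genomic distance as a simple power law and that counts are Poisson-distributed. The function names, the decay parameters, and the synthetic data are all hypothetical. Given the contact counts between one unplaced fragment and a set of fragments whose positions are already known, it scores each candidate position and keeps the best fit.

```python
import math

def expected_contacts(distance, scale=1000.0, exponent=1.0):
    """Toy decay model (an assumption): expected contacts drop as a power law of distance."""
    return scale / (1.0 + distance) ** exponent

def log_likelihood(candidate_pos, anchor_positions, observed_counts):
    """Poisson log-likelihood, up to a constant, of the counts if the fragment sat at candidate_pos."""
    total = 0.0
    for pos, count in zip(anchor_positions, observed_counts):
        lam = expected_contacts(abs(candidate_pos - pos))
        total += count * math.log(lam) - lam
    return total

def estimate_position(anchor_positions, observed_counts, candidates):
    """Return the candidate position whose predicted contact profile best fits the observed counts."""
    return max(
        candidates,
        key=lambda c: log_likelihood(c, anchor_positions, observed_counts),
    )

# Synthetic example: ten fragments with known positions (arbitrary units)
# and one unplaced fragment whose true position is hidden from the estimator.
anchors = list(range(0, 100, 10))
true_pos = 42
observed = [round(expected_contacts(abs(true_pos - a))) for a in anchors]

estimate = estimate_position(anchors, observed, range(0, 100))
print("estimated position:", estimate, "| true position:", true_pos)
```

In this toy setting the fragments nearest the true position contribute the largest counts, so they dominate the score, which mirrors the article's point that strong, consistent short-range interactions are what make position estimation possible. Real Hi-C data would also require normalization and a decay model fitted to the data, which this sketch omits.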
Assessing one particular advantage of their technique, the scientists state that “the features which make the canonical Hi-C interaction patterns a hindrance for the analysis of specific looping interactions, namely their ubiquity, strength, and consistency, make them a powerful tool for estimating the genomic position of contigs.”
One of the study’s authors, Noam Kaplan, Ph.D., a postdoctoral research fellow in Dekker’s lab, elaborated on the technique’s usefulness: “While a particular sequence may fit in many places in a linear genome, we can determine if a particular sequence is a better fit, three dimensionally, in one location versus another, based on interaction data.”
Reflecting on the implications of his lab’s genome assembly technique, Dr. Dekker commented, “This new approach to genome assembly can help produce higher-quality genome sequences faster and easier than current methods. It will be especially interesting to apply this method to identify chromosomal aberrations, which are a hallmark of cancer.”