Over the past few years, the Telomere-to-Telomere (T2T) consortium, a collaborative project funded by the National Human Genome Research Institute (NHGRI), part of the NIH, used multiple DNA sequencing technologies and analytical methods to generate and manually assemble the remaining 8–10% of the human genome sequence. The resolution of the most complex repeats relied on the integration of long reads from Oxford Nanopore sequencing instruments with a high-resolution assembly graph built from PacBio HiFi reads.
Now, researchers from the NIH have developed and released an innovative software tool to assemble truly complete (gapless) genome sequences from a variety of species in days. This software, called Verkko, which means “network” in Finnish, makes the process of assembling complete genome sequences more affordable and accessible.
This work is published in Nature Biotechnology, in the article, “Telomere-to-telomere assembly of diploid chromosomes with Verkko.”
“Verkko can democratize generating gapless genome sequences,” said Adam Phillippy, PhD, an NHGRI senior investigator who was involved in both the T2T project and the development of Verkko. “This new software will make assembling complete genome sequences as affordable and routine as possible.”
Verkko grew out of the effort to assemble the first gapless human genome sequence, which the T2T consortium finished last year.
“We took everything we learned in the T2T project and automated the process,” said NHGRI associate investigator Sergey Koren, PhD. “Now with Verkko, we can essentially push a button and automatically get a complete genome sequence.”
Verkko is an iterative, graph-based pipeline for assembling complete, diploid genomes. The authors write that “Verkko begins with a multiplex de Bruijn graph built from long, accurate reads and progressively simplifies this graph by integrating ultra-long reads and haplotype-specific markers.”
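For readers unfamiliar with graph-based assembly, the short Python sketch below shows the general idea of building a graph from reads and then simplifying it. It is a toy example only: it builds a plain de Bruijn graph with a single k-mer size and merges non-branching paths, whereas Verkko uses a multiplex de Bruijn graph and further resolves it with ultra-long reads and haplotype markers. The function names and the example reads are hypothetical and are not taken from Verkko.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Toy de Bruijn graph: each k-mer adds an edge from its prefix
    (k-1)-mer to its suffix (k-1)-mer. Not Verkko's implementation."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def compact_unitigs(graph):
    """Merge non-branching paths into single sequences (unitigs).
    Isolated cycles with no branch point are ignored in this sketch."""
    indegree = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1
    nodes = set(graph) | set(indegree)

    def is_internal(node):
        # Exactly one way in and one way out: safe to merge through.
        return indegree[node] == 1 and len(graph.get(node, ())) == 1

    unitigs = []
    for node in nodes:
        if is_internal(node):
            continue  # unitigs start only at branch points or path ends
        for nxt in graph.get(node, ()):
            seq = node + nxt[-1]
            cur = nxt
            while is_internal(cur):
                cur = next(iter(graph[cur]))
                seq += cur[-1]
            unitigs.append(seq)
    return sorted(unitigs)


# Two overlapping hypothetical reads collapse into one sequence.
linear = compact_unitigs(build_de_bruijn(["ATGGCGT", "GGCGTGC"], k=4))

# A third read that diverges after "CGT" creates a branch, so the
# graph can only be simplified into three shorter pieces.
branched = compact_unitigs(
    build_de_bruijn(["ATGGCGT", "GGCGTGC", "GGCGTTA"], k=4)
)
```

In the second case the graph cannot be reduced to a single sequence from these reads alone. Branches like this are the kind of ambiguity that, according to the authors, Verkko resolves by integrating ultra-long reads and haplotype-specific markers.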
The researchers tested Verkko with human and non-human genome sequencing data. The software quickly and precisely assembled the sequences of whole chromosomes. They noted that running Verkko on the HG002 human genome resulted in 20 of 46 diploid chromosomes being assembled without gaps at 99.9997% accuracy.
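To put the accuracy figure in perspective, assembly accuracy is often expressed on the Phred quality scale, where QV = -10 × log10(error rate). The article reports only the percentage, so the conversion below is an illustrative calculation rather than a number quoted from the study, and the function names are hypothetical.

```python
import math


def phred_qv(accuracy_percent):
    """Convert a percent accuracy into a Phred-scaled quality value."""
    error_rate = 1.0 - accuracy_percent / 100.0
    return -10.0 * math.log10(error_rate)


def bases_per_error(accuracy_percent):
    """Average number of bases between errors at a given accuracy."""
    return 1.0 / (1.0 - accuracy_percent / 100.0)


qv = phred_qv(99.9997)                  # roughly 55
spacing = bases_per_error(99.9997)      # roughly 333,000 bases
```

Under this conversion, 99.9997% accuracy corresponds to a quality value of about 55, or about one error in every 333,000 bases.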
As Verkko leads to more complete human genome sequences, researchers can better assess human genomic diversity. With only one gapless human genome sequence, scientists currently lack knowledge about the diversity of many portions of the genome, such as regions of highly repetitive DNA, across the human population.
Verkko will also accelerate efforts to generate gapless genome sequences of species commonly used in research, such as mice, fruit flies, and zebrafish, improving their usefulness to scientists. Additionally, generating gapless genome sequences from a variety of plants, animals, and other organisms will aid in comparative genomics, the study of the differences and similarities among the genomes of diverse species.