Single-cell Hi-C can identify cell-to-cell variability in 3D chromatin organization, critical information for understanding the roles of genome folding and gene expression in the cell. But single-cell Hi-C data have been challenging to analyze because the measured interactions are so sparse. Now, a new algorithm developed by a team in Carnegie Mellon University’s computational biology department offers a powerful tool for illustrating chromatin folding in single cells at an unprecedented resolution.
The work is published in Nature Biotechnology in the paper “Multiscale and integrative single-cell Hi-C analysis with Higashi.”
The algorithm, known as Higashi, is based on hypergraph representation learning, a form of machine learning that can recommend music in an app and perform 3D object recognition. According to the paper, it can “incorporate the latent correlations among single cells to enhance overall imputation of contact maps.”
The algorithm is the first tool to use sophisticated neural networks on hypergraphs to provide a high-definition analysis of genome organization in single cells. Where each connection in an ordinary graph, known as an edge, joins exactly two vertices, a single connection in a hypergraph, known as a hyperedge, can join any number of vertices.
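The difference between a graph and a hypergraph can be sketched in a few lines of Python. This is only an illustration, not code from Higashi: the vertex names, the helper functions and the idea of writing one single-cell Hi-C contact as a hyperedge over a cell and two genomic bins are assumptions made for the example.

```python
from itertools import combinations

# Ordinary graph: every edge joins exactly two vertices.
graph_edges = {frozenset({"A", "B"}), frozenset({"B", "C"})}

# Hypergraph: a single hyperedge may join any number of vertices.
hyperedges = {frozenset({"A", "B", "C"}), frozenset({"C", "D"})}


def is_ordinary_graph(edges):
    """Return True when every edge joins exactly two vertices."""
    return all(len(edge) == 2 for edge in edges)


def clique_expansion(edges):
    """Flatten hyperedges into pairwise edges.

    The result is an ordinary graph, but it no longer records which
    vertices were grouped together in one hyperedge.
    """
    pairs = set()
    for edge in edges:
        pairs.update(frozenset(pair) for pair in combinations(sorted(edge), 2))
    return pairs


def contact_hyperedge(cell, bin_i, bin_j):
    """Hypothetical encoding (an assumption for illustration): one observed
    contact becomes a hyperedge joining a cell and two distinct genomic bins."""
    return frozenset({("cell", cell), ("bin", bin_i), ("bin", bin_j)})


pairwise = clique_expansion(hyperedges)
contact = contact_hyperedge("cell_1", 10, 42)
```

Here `pairwise` holds four two-vertex edges, while `contact` keeps the cell and both bins together in one three-vertex relation, the kind of grouping a pairwise graph cannot express directly.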
The Higashi algorithm works with single-cell Hi-C, which creates snapshots of the chromatin interactions occurring simultaneously in a single cell. Higashi provides a more detailed analysis of how chromatin is organized in the single cells of complex tissues and biological processes, and of how its folding and interactions vary from cell to cell, including variations that are subtle yet may be important in identifying health implications.
“The variability of genome organization has strong implications in gene expression and cellular state,” said Jian Ma, PhD, professor of computational biology in the School of Computer Science at Carnegie Mellon University.
Ruochi Zhang, a doctoral student in the School of Computer Science, together with graduate student Tianming Zhou and Ma, named Higashi after a traditional Japanese sweet, continuing a tradition Zhang began with other algorithms he developed. “He approaches the research with passion but also with a sense of humor sometimes,” said Ma.
The Higashi algorithm also allows scientists to analyze other genomic signals jointly profiled with single-cell Hi-C. This feature will eventually allow Higashi’s capabilities to be expanded, which is timely given the growth in single-cell data that Ma expects in the coming years from projects such as the NIH 4D Nucleome Program, to which his center belongs. This flow of data will create additional opportunities to design more algorithms that advance scientific understanding of how the human genome is organized within the cell and how it functions in health and disease.
“This is a fast-moving area,” Ma said. “The experimental technology is advancing rapidly, and so is the computational development.”
The authors wrote that Higashi “outperforms existing methods for embedding and imputation of single-cell Hi-C data and is able to identify multiscale 3D genome features in single cells, such as compartmentalization and TAD-like domain boundaries, allowing refined delineation of their cell-to-cell variability.” In addition, Higashi can incorporate epigenomic signals jointly profiled in the same cell into the hypergraph representation learning framework, rather than analyzing the two modalities separately, which leads to improved embeddings for single-nucleus methyl-3C data.
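To give a rough sense of what imputing a sparse contact map means, the toy Python sketch below pools contact counts from the cells closest to a target cell in an embedding space. It is not Higashi's method, which trains a neural network on a hypergraph. The cell names, the two-dimensional embeddings, the contact counts and both helper functions are hypothetical, chosen only to show how information from similar cells can fill gaps in one cell's data.

```python
import math
from collections import Counter

# Hypothetical low-dimensional embeddings, one vector per cell (assumed values).
embeddings = {
    "cell_a": (0.0, 0.0),
    "cell_b": (0.1, 0.0),
    "cell_c": (5.0, 5.0),
}

# Sparse contact maps: {(bin_i, bin_j): count}; most bin pairs are unobserved.
contacts = {
    "cell_a": {(0, 1): 1},
    "cell_b": {(0, 1): 1, (1, 2): 2},
    "cell_c": {(3, 4): 4},
}


def nearest_cells(embeddings, target, k):
    """Return the k cells closest to `target` by Euclidean distance."""
    distances = [
        (math.dist(embeddings[target], vector), name)
        for name, vector in embeddings.items()
        if name != target
    ]
    return [name for _, name in sorted(distances)[:k]]


def pool_contacts(contacts, embeddings, target, k=1):
    """Add the contact counts of the k nearest cells to the target's own."""
    pooled = Counter(contacts[target])
    for name in nearest_cells(embeddings, target, k):
        pooled.update(contacts[name])
    return dict(pooled)


pooled = pool_contacts(contacts, embeddings, "cell_a", k=1)
```

In this toy data, `cell_a` picks up the (1, 2) contact seen in its neighbor `cell_b`, while the distant `cell_c` contributes nothing.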
The work was conducted as part of a multi-institution research center seeking a better understanding both of the 3D structure of cell nuclei and how changes in that structure affect cell functions in health and disease. The $10 million center was funded by the National Institutes of Health and is directed by Carnegie Mellon University, with Ma as its lead principal investigator.