Although an orchestra warming up before a performance may produce a meaningless mixture of sounds, the individual musicians are probably playing bits and pieces from the same score. If only listeners could isolate the fragmentary themes and motifs, and move them backwards and forwards in the imagination, order would emerge from chaos, and the sense of the musical arrangement would become clear.
Something like this organizational power has been needed in single-cell genomics. Although individual cells all play from the same score—the genome—they don’t necessarily act as though they are following a conductor’s baton. Even cells of the same type may appear to be transcriptionally distinct simply because they are at different stages of the cell cycle, or are different ages. Confounding factors such as these can obscure deep commonalities or—to return to the orchestra analogy—unheard harmonies.
Attuned to the hidden music of cell populations is a new statistical method, the single-cell latent variable model (scLVM). It was introduced January 19 in the journal Nature Biotechnology, in an article entitled “Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells.”
The article, which was prepared by scientists at the European Bioinformatics Institute (EMBL-EBI), describes a refinement to the analysis of single-cell RNA-sequencing data. Single-cell RNA sequencing is a relatively new technology that probes how genes are expressed in different types of healthy tissue and in cancers. It provides data on the gene-expression profiles of hundreds of individual cells in a single experiment, producing a detailed picture of the individual cell types. However, the fundamental complexity of single-cell transcriptome profiles has posed a major challenge to making sense of the data.
“With single-cell genomics, we take cells from a tissue and group them into different types based on their expression profile, identifying subtypes that may have a range of functional roles. But to do that properly, we need to deal with confounding factors, and until now we haven’t had robust methods for doing that,” explained John Marioni, Ph.D., research group leader at EMBL-EBI. To account for confounding factors and identify hidden variables, the EMBL-EBI scientists developed a computational approach that uses latent variable models.
“We show that our [scLVM] allows the identification of otherwise undetectable sub-populations of cells that correspond to different stages during the differentiation of naïve T cells into Th2 cells,” wrote the authors of the Nature Biotechnology article. “Our approach can be used not only to identify cellular sub-populations but also to tease apart different sources of gene expression heterogeneity in single-cell transcriptomes.”
The authors also explained how their technique relates to other kinds of transcriptome analysis. For example, they described protocols in which the amplification of small quantities of mRNA may be combined with microfluidics to isolate individual cells. Although such protocols allow the entire transcriptome of large numbers of single cells to be assayed in an unbiased way, methods that can identify subpopulations of cells and reveal detailed gene regulatory patterns have started to emerge only recently.
“If all you have is gene expression data from single cells, you need a way to identify and correct for the underlying factors that differentiate individual cells, so you can reveal the underlying biology,” explained Oliver Stegle, Ph.D., research group leader at EMBL-EBI. “Our model accounts for relatedness between single cells, for example whether they are at the same stage of the cell cycle, identifies potentially confounding variables and removes them. It also makes it easier to find new subtypes—variables you might not have known existed—and correct for them, all at one go.”
“We’ve defined how factors such as cell-cycle stage, measurement noise, or biological processes can be taken into account, making it possible to create a more accurate picture of gene expression in different cell types and subtypes,” asserted Florian Büttner, Ph.D., who led the research at EMBL-EBI as an EMBO visiting scientist from the Institute of Computational Biology at Helmholtz Zentrum München. “Combining single-cell analyses with statistical methods lets us identify cell types that would otherwise remain undetected.”
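The general idea of correcting for a confounding factor can be illustrated with a small simulation. The sketch below is not the scLVM implementation and does not reproduce the authors' latent variable model. It assumes, for illustration only, that a per-cell cell-cycle score is already known, and it swaps in ordinary least-squares regression to remove that score's linear effect from each gene. The data are simulated, and the names `regress_out`, `first_pc`, `cell_cycle`, and `subtype` are hypothetical.

```python
import numpy as np

def regress_out(expr, factor):
    """Remove the linear effect of a known per-cell factor from each gene.

    expr:   (cells, genes) array of expression values
    factor: (cells,) array, here a hypothetical cell-cycle score
    """
    # Design matrix with an intercept column and the confounding factor.
    design = np.column_stack([np.ones_like(factor), factor])
    # Least-squares fit for all genes at once; coef has shape (2, genes).
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    # Subtract only the factor's contribution, keeping each gene's baseline.
    return expr - np.outer(factor, coef[1])

def first_pc(matrix):
    """Return each cell's score on the first principal component."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]

# Simulated data (an assumption for illustration, not real measurements).
rng = np.random.default_rng(0)
n_cells, n_genes = 200, 50
cell_cycle = rng.normal(size=n_cells)        # strong confounding factor
subtype = rng.integers(0, 2, size=n_cells)   # weaker hidden subpopulation label
expr = (
    np.outer(cell_cycle, rng.normal(scale=3.0, size=n_genes))
    + np.outer(subtype, rng.normal(scale=1.0, size=n_genes))
    + rng.normal(scale=0.5, size=(n_cells, n_genes))
)

corrected = regress_out(expr, cell_cycle)

# Before correction the dominant axis of variation tracks the cell cycle;
# after correction it tracks the hidden subpopulation.
corr_cc_before = abs(np.corrcoef(first_pc(expr), cell_cycle)[0, 1])
corr_sub_after = abs(np.corrcoef(first_pc(corrected), subtype)[0, 1])
print(f"PC1 vs cell cycle, before correction: {corr_cc_before:.2f}")
print(f"PC1 vs subtype, after correction:     {corr_sub_after:.2f}")
```

In this toy setting the confounder is given directly. The approach described in the article goes further by inferring hidden factors from the expression data, which a plain regression like this one cannot do.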