Scientists at the Wellcome Sanger Institute and their collaborators say they have created a new computational method for assigning cells to their donor in single-cell RNA sequencing experiments, providing an accurate way to unravel data from a mixture of people. The Souporcell technique, described in the article “Souporcell: robust clustering of single-cell RNA-seq data by genotype without reference genotypes” in Nature Methods, could help researchers study how genetic variants in different people affect which genes are expressed during infection or in response to drugs, according to the team. The researchers believe the software could increase the efficiency of single-cell experiments, assisting research into transplants, personalized medicine, and malaria.
Single-cell RNA sequencing (RNAseq) can show exactly which genes are switched on in each individual cell, revealing cell types and what they do. Pooling multiple people’s cells into a single-cell RNAseq experiment helps to identify how different genomes affect this gene expression. However, the resulting data must then be separated by individual, which can be very difficult. The researchers tested Souporcell against three other computational methods using placental cells, pluripotent stem cell lines, and malaria parasites.
“Methods to deconvolve single-cell RNA-sequencing (scRNA-seq) data are necessary for samples containing a mixture of genotypes, whether they are natural or experimentally combined. Multiplexing across donors is a popular experimental design that can avoid batch effects, reduce costs, and improve doublet detection. By using variants detected in scRNA-seq reads, it is possible to assign cells to their donor of origin and identify cross-genotype doublets that may have highly similar transcriptional profiles, precluding detection by transcriptional profile. More subtle cross-genotype variant contamination can be used to estimate the amount of ambient RNA,” write the investigators.
“Ambient RNA is caused by cell lysis before droplet partitioning and is an important confounder of scRNA-seq analysis. Here we develop souporcell, a method to cluster cells using the genetic variants detected within the scRNA-seq reads. We show that it achieves high accuracy on genotype clustering, doublet detection and ambient RNA estimation, as demonstrated across a range of challenging scenarios.”
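The idea of assigning a cell to a donor from the variants seen in its reads can be illustrated with a toy example. The sketch below is not the Souporcell algorithm: it assumes each donor's expected alternate-allele fraction at every variant site is already known, which is precisely the information Souporcell does not require. The function names, the donor profiles, the 1% error rate, and the allele counts are all hypothetical values chosen for illustration.

```python
import math


def log_likelihood(alt_counts, ref_counts, alt_fractions, error=0.01):
    """Binomial log-likelihood (up to a constant) of one cell's allele
    counts, given one donor's expected alt-allele fraction per site."""
    total = 0.0
    for alt, ref, frac in zip(alt_counts, ref_counts, alt_fractions):
        # Clamp the fraction so a single sequencing error never yields
        # a probability of exactly zero (assumed error rate: 1%).
        p = min(max(frac, error), 1.0 - error)
        total += alt * math.log(p) + ref * math.log(1.0 - p)
    return total


def assign_cell(alt_counts, ref_counts, donor_profiles):
    """Return the best-scoring donor and the per-donor scores."""
    scores = {
        donor: log_likelihood(alt_counts, ref_counts, fractions)
        for donor, fractions in donor_profiles.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical expected alt-allele fractions at four variant sites
# (0.0 = homozygous reference, 0.5 = heterozygous, 1.0 = homozygous alt).
donor_profiles = {
    "donor_A": [0.0, 0.5, 1.0, 0.0],
    "donor_B": [1.0, 0.5, 0.0, 0.5],
}

# Hypothetical read counts for one cell: alt and ref reads per site.
cell_alt = [0, 1, 2, 0]
cell_ref = [3, 1, 0, 2]

best_donor, scores = assign_cell(cell_alt, cell_ref, donor_profiles)
print(best_donor, scores)
```

In this toy case the cell's reads match the first profile far better than the second, so it is assigned to `donor_A`. A reference-free method has the harder task of inferring the donor profiles and the cell assignments together, and must also account for doublets and ambient RNA.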
Haynes Heaton, MD, the first author from the Wellcome Sanger Institute, said: “Our method…is able to separate mixtures of individuals’ cells in scRNAseq experiments without knowing each individual’s full genome sequence beforehand, unlike previous methods. One of the key features of the method is that it estimates the amount of background RNA from dead cells, which is often referred to as the soup. This then allows the removal of that source of noise, and hence the name Souporcell.”
Combining cells from multiple individuals into a single experiment increases accuracy, allows more information to be recovered, and reduces the cost of these experiments, he added.
According to Martin Hemberg, PhD, a senior author from the Wellcome Sanger Institute, “The exact genetic sequence of each person can affect their response to infections, or to drug treatments. The new method enables single cell expression data from multiple people to be analyzed, to show links between genotype and phenotype, in diseases and in the presence of drugs. This will have implications for personalized medicine.”
In addition, some samples inherently have a mix of cells with different genomes, including samples from transplant patients who have their original cells and cells from the donor, or populations of parasites, such as malaria, from an infected individual.
“This method is helping us understand malaria,” explained Mara Lawniczak, PhD, a senior author from the Wellcome Sanger Institute. “People get infected with multiple strains of malaria at once, but we don’t know how these strains are competing with each other to reproduce. To even ask the question we have to be able to split out cells of different malaria strains, and Souporcell is enabling this.”