Whitehead Institute researchers have created the first information-rich genotype-phenotype map that reveals a multidimensional portrait of gene and cellular function. By performing genome-scale Perturb-seq (CRISPR-based screens with single-cell RNA-sequencing readouts) targeting all expressed genes in millions of human cells, Jonathan Weissman, PhD, and colleagues have mapped the transcriptional effects of genetic perturbations to predict the function of genes at genome scale.

This research presents a blueprint for constructing and analyzing rich genotype-phenotype maps to serve as a driving force for systematically exploring genetic and cellular functions. “I think this dataset is going to enable all sorts of analyses that we haven’t even thought up yet by people who come from other parts of biology, and suddenly they just have this available to draw on,” said Tom Norman, PhD, former Weissman Lab postdoc and a co-senior author of the paper.

The findings were published in Cell in a paper titled “Mapping information-rich genotype-phenotype landscapes with genome-scale Perturb-seq.” The data are available for other scientists to use on the Weissman Lab website.

There are few questions more fundamental to genetics and other fields of biology than, “What is the function of every gene?” Researchers have chipped away at this problem for decades, first by dissecting individual genes and more recently with genome-wide approaches like Perturb-seq, albeit at limited scales.

“We often take all the cells where ‘gene X’ is knocked down and average them together to look at how they changed,” said Weissman, a professor of biology at Massachusetts Institute of Technology (MIT) and investigator with the Howard Hughes Medical Institute. “But sometimes when you knock down a gene, different cells that are losing that same gene behave differently, and that behavior may be missed by the average.”
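
Weissman's point about averaging can be illustrated with a small sketch. The numbers below are hypothetical toy values chosen for illustration, not data from the study, and the variable names are assumptions. The sketch shows a knockdown in which half the cells raise expression of a readout gene and half lower it, so the average across cells looks unchanged while the cell-to-cell spread grows sharply.

```python
from statistics import mean, pvariance

# Hypothetical toy values (NOT data from the study): expression of one
# readout gene measured in individual cells.
control_cells = [10.0, 10.5, 9.5, 10.2, 9.8, 10.0]

# After knocking down a hypothetical "gene X", half of the cells go up
# and half go down, so the population splits into two groups.
knockdown_cells = [15.0, 5.0, 14.8, 5.2, 15.2, 4.8]


def summarize(cells):
    """Return (mean, population variance) across single cells."""
    return mean(cells), pvariance(cells)


control_mean, control_var = summarize(control_cells)
kd_mean, kd_var = summarize(knockdown_cells)

# The averaged (bulk-like) view: the two means are essentially identical.
print(f"control:   mean={control_mean:.2f} variance={control_var:.2f}")
# The single-cell view: the variance exposes the split the mean hides.
print(f"knockdown: mean={kd_mean:.2f} variance={kd_var:.2f}")
```

In this toy case, a comparison of averages alone would report no effect of the knockdown, whereas a per-cell summary such as the variance flags that the cells have diverged.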

The Whitehead researchers turned this concept on its head, using Perturb-seq to reveal a multidimensional portrait of cellular behavior, gene function, and regulatory networks. The massive Perturb-seq map was made possible by foundational work from Joseph Replogle, an MD-PhD student in Weissman’s lab and co-first author of the present paper.

The comprehensive genotype-phenotype maps enabled the discovery of gene functions and the dissection of cellular phenotypes, from RNA splicing to differentiation to chromosomal instability (CIN). This approach paved the way for the first genome-wide screen of factors that are required for the correct segregation of DNA. Norman said the approach is well suited to studying aneuploidy because it captures a phenotype that can only be obtained using a single-cell readout.

The genotype-phenotype maps also helped pry open the longstanding question of why mitochondria still have their own DNA. The analysis revealed how nuclear and mitochondrial DNA are coordinated and regulated in different cellular conditions, especially when a cell is stressed. “A big-picture takeaway from our work is that one benefit of having a separate mitochondrial genome might be having localized or very specific genetic regulation in response to different stressors,” said Replogle.

These findings are only the beginning of what could be revealed by these genotype-phenotype maps, a massive resource that enables researchers to go in and do discovery-based research. Weissman said, “Rather than defining ahead of time what biology you’re going to be looking at, you have this map of the genotype-phenotype relationships, and you can go in and screen the database without having to do any experiments.”

To conclude the article, Weissman and colleagues emphasized that single-cell CRISPR screens require only a fraction of the number of cells used by other approaches and thus are well suited to the study of iPSC-derived cells and in vivo samples. At present, the major limitation of single-cell CRISPR screens is cost. To address this, the last experiments of the study involved sequencing some of the genome-scale libraries on a lower-cost, ultra-high-throughput sequencing platform developed by Ultima Genomics, generating results equivalent to those obtained on Illumina instruments.