July 1, 2016 (Vol. 36, No. 13)
Richard A. A. Stein M.D., Ph.D.
To Conduct Cellular Censuses, Sequence Deeply and Appreciate Transcriptomic Heterogeneity
The contribution of individual cells to aggregate phenotypes is of increasing interest in biology, as is evident in the rise of technologies that can resolve cell-level details from the blur of population-level generalities. Among these technologies is RNA sequencing (RNA-seq), which can take snapshots of the transcriptome, capturing fleeting expression profiles, zooming in on single cells, and exposing cell-cell variations.
At present, RNA-seq is helping researchers take a fresh look at biological phenomena such as development, malignant transformation, and the behavior of microbial populations. Already, the new views are yielding conceptual advances.
“Over the years, we have become increasingly aware that what we once treated as homogeneous populations of cells are, in fact, heterogeneous,” says Alex K. Shalek, Ph.D., a core member of the Institute for Medical Engineering and Science at MIT. “In parallel, we developed a greater appreciation for the different cellular players that are involved in shaping systems-level phenotypes.”
One of the fundamental problems in dissecting the link between genotype and phenotype is that biological samples are usually complex mixtures of cells. In certain cases, such as with blood, there is a relatively good agreement as to the identity of the major cellular components. “In other cases, such as with tumors, there are unknown mixtures of different cell types and states that drive the ensemble behaviors we observe,” adds Dr. Shalek.
One of the promises of single-cell genomics is the possibility of performing transcriptome-wide analyses of the genes expressed by each cell. By uncovering patterns in gene expression and co-variation, researchers can identify what cell types are present and which pathways are active or silent.
A t-distributed stochastic neighbor embedding (t-SNE) plot showing the clustering of diverse cell types. Each dot represents a cell and is colored according to the cell’s density clustering assignment. This plot, which summarizes RNA-seq data obtained from cells within a clinical isolate, was generated at the MIT laboratory of Alex Shalek, Ph.D.
Survey the Tumor Microenvironment
In a recent analysis, Dr. Shalek and colleagues used single-cell RNA-seq to profile 430 cells from five primary glioblastomas. This approach unveiled considerable intratumor variability in transcriptional programs related to processes fundamental for cancer biology, such as oncogene signaling, hypoxia, and proliferation. “By initially focusing on just tumor cells, we missed a lot of essential information,” notes Dr. Shalek. “Nonmalignant cells, such as immune infiltrate or stroma, can make up a large fraction of a tumor.”
Subsequently, Dr. Shalek’s team participated in a study that was focused on characterizing the cellular microenvironment of melanoma. Other study participants included Levi Garraway, M.D., Ph.D., an associate professor of medicine at the Dana-Farber Cancer Institute, and Aviv Regev, Ph.D., a computational biologist at the Broad Institute of MIT and Harvard.
“This was our first foray toward understanding the diversity of the cells that are implicated in a tumor,” relates Dr. Shalek. During this work, Dr. Shalek and colleagues profiled immune, stromal, and endothelial cells in addition to malignant ones. Looking at both cancerous and noncancerous cell states provided opportunities to understand the interplay among cellular phenotypes in the tumor microenvironment. “Going forward,” states Dr. Shalek, “we hope to leverage this deeper understanding to guide more effective therapies.”
Get to the Heart of Development
“During our studies on mesoderm patterning in the early embryo, we became particularly interested in the specification of the cells that form the second heart field,” says Michael Kyba, Ph.D., the Carrie Ramey/CCRF Endowed Professor in Pediatric Cancer Research at the University of Minnesota.
One of the features of the mammalian heart is its multichamber organization, an evolutionary innovation that allows much more efficient circulation. Unlike the single-chambered heart found in less-developed species, the multichambered heart evolved to facilitate predation. Paralleling its evolution was the coordinated development of new muscle types that met the requirements of predatory biting and a new type of food intake.
In the four-chambered human heart, the left ventricle is the ancestral part of the heart, while the atria and the right ventricle are the more evolved part and originate from the second heart field. “The facial muscles, used for eating, also originate from those new founder cells that produce the second heart field,” elaborates Dr. Kyba. “The evolutionary innovation that enabled predatory feeding involved specifying a new common progenitor population that gives rise to facial muscles and to the new parts of the heart.”
The mesoderm-promoting transcription factor MESP1 is expressed in these cells and in much of the early mesoderm. Previously, MESP1 was thought to be a heart-specific master regulator. “Several years ago,” notes Dr. Kyba, “we showed that MESP1 promotes the development of not only the heart, but also of other lineages, such as skeletal muscle and blood, which are different types of mesoderm.”
Recently, Dr. Kyba and colleagues proposed to interrogate whether individual MESP1-producing cells have both potentials. “We wanted to know whether the early cells that produce cardiac and blood cells are intrinsically different, so that a cell could become either one of the two—and retrospectively, whether they still had those developmental opportunities open,” says Dr. Kyba.
RNA-seq opened the possibility of examining the intrinsic differences between cells that superficially appear similar but might have different developmental potentials. In cells intrinsically programmed to become one lineage or another, single-cell expression patterns would reveal different clusters for the genes involved in cardiovascular development than for those associated with blood development.
An alternative possibility involves cells that do not intrinsically have these different fates but are still in a plastic state. “Such cells would have the same expression pattern of the detectable genes,” proposes Dr. Kyba. “One would not be able to cluster them.”
In whole-transcriptome single-cell RNA-seq analyses, Dr. Kyba and colleagues found that the cells did not cluster into two or more different groups. “We were surprised because the cells looked as if they were homogeneous,” recalls Dr. Kyba. However, particularly during early development, when gene expression profiles are set to change dramatically, capturing key transcription factors is known to be more important than surveying all the genes in the genome.
“When we selected the key master regulators of the blood and the cardiac lineages and looked at those specific genes, we found that cells segregated,” reports Dr. Kyba. “It was possible to discern cells destined to become cardiac from the ones destined to become blood lineages.” This analysis revealed that in cells appearing superficially homogeneous, whole-transcriptome analysis might obscure very small subsets of genes that are the key drivers of later developmental decisions.
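The effect described above can be illustrated with a small simulation. The following is a minimal sketch, not the analysis Dr. Kyba's group performed: the gene names (`cardiac_tf`, `blood_tf`, `gene_0` and so on), the expression values, and the `separation` score are all hypothetical, and the toy data assigns each cell a known fate only so that the separation can be scored. It shows how two regulators that differ between fates can be swamped by hundreds of genes that vary independently of fate.

```python
import random
from itertools import combinations
from math import dist

def separation(cells, labels, genes):
    """Mean between-fate distance divided by mean within-fate distance.

    A value near 1 means the two fates are indistinguishable on these genes;
    larger values mean the cells segregate by fate.
    """
    within, between = [], []
    for i, j in combinations(range(len(cells)), 2):
        a = [cells[i][g] for g in genes]
        b = [cells[j][g] for g in genes]
        (within if labels[i] == labels[j] else between).append(dist(a, b))
    return (sum(between) / len(between)) / (sum(within) / len(within))

rng = random.Random(0)  # fixed seed so the toy data is reproducible

# Hypothetical gene sets: 200 background genes plus two fate regulators.
background = [f"gene_{k}" for k in range(200)]
regulators = ["cardiac_tf", "blood_tf"]

cells, labels = [], []
for fate in ("cardiac", "blood"):
    for _ in range(10):
        # Background genes vary from cell to cell but carry no fate signal.
        profile = {g: rng.gauss(5.0, 1.0) for g in background}
        # Only the two regulators differ between the fates (assumed values).
        profile["cardiac_tf"] = rng.gauss(3.0 if fate == "cardiac" else 0.0, 0.3)
        profile["blood_tf"] = rng.gauss(3.0 if fate == "blood" else 0.0, 0.3)
        cells.append(profile)
        labels.append(fate)

score_all = separation(cells, labels, background + regulators)
score_key = separation(cells, labels, regulators)
print(f"all genes: {score_all:.2f}  key regulators only: {score_key:.2f}")
```

In this toy setting the score on all genes stays close to 1, while the score on the two regulators alone is several times larger. In real data the fates are not known in advance, so the choice of regulators has to come from prior biological knowledge.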
The image, provided by Michael Kyba, Ph.D., and Sunny Chan, Ph.D., of the University of Minnesota, illustrates a single Mesp1-induced cell (green) captured by the Fluidigm C1 microfluidic system. The cell is colored for clarity. There are 96 of these fluid microcells in the device. After cell capture, the RNA is extracted and collected from each cell. Ultimately, an RNA-seq library is created.
“We initially used RNA-seq to study gene function and genome organization in individual organisms,” says Karsten Zengler, Ph.D., associate professor of pediatrics at the University of California, San Diego. “We recently started using it more in the community context.”
Researchers in Dr. Zengler’s lab have performed several microbiome studies to understand the dynamics of individual members of the microbiome over time. One of the major challenges in microbiology has been the fact that the vast majority of the microbial species cannot be cultured using routine laboratory methods, but studies on microbial populations promise to circumvent the need to culture individual microorganisms.
“The idea is not so much to isolate microorganisms, but to elucidate what they do in their natural environment,” explains Dr. Zengler. Studies of microbial populations are well positioned to collect information that would otherwise be difficult or impossible to collect from the individual constituents of the population. “The most challenging aspect of using RNA-seq is that we have very little material to work with,” notes Dr. Zengler, who also works on the human skin microbiome.
In a recent study, Dr. Zengler and colleagues interrogated complex microbial communities. The scientists were able to extract the genomes of individual community members and transcriptomic information to unveil details about metabolic interactions between species. This work relied on a combined systems biology and molecular biology approach that helped the scientists characterize the metabolic flux at the community level. The scientists were also able to predict the effects of metabolic perturbations.
This proof-of-concept study promises experimental strategies in which individual community members can be depleted or enriched to achieve a desired effect, such as therapeutic benefits. “Using this concept, which we developed with our collaborators, we can now identify microorganisms that control the microbial community on skin,” asserts Dr. Zengler. “By placing selected microbes into an ointment, we might be able to treat skin diseases such as atopic dermatitis.”
Deliberately adding specific microorganisms back into microbial communities promises therapeutic avenues for a variety of medical conditions. “Ultimately, the goal is to explore how the microbiome is originally assembled,” concludes Dr. Zengler. “If we understand why an organism is present or absent, this would allow us to intervene in a targeted manner to modulate health and disease outcome.”
Edge toward Co-Expression Networks
“Co-expression is the ideal approach to look at how gene sets interact and to determine functional output,” says Jesse Gillis, Ph.D., assistant professor at the Cold Spring Harbor Laboratory. Major efforts in Dr. Gillis’ lab are focusing on understanding gene variants that are relevant for disease and on using single-cell gene expression to capture the function of genes and their interactions in complex networks.
In the first major single-cell co-expression analysis, Dr. Gillis and colleagues examined single-cell RNA-seq data from 31 studies, including 163 different cell types, to characterize co-expression replicability. “Surprisingly, we found that one of the ways genes interact in cells is not very cell-specific,” informs Dr. Gillis. “While some cell specificity is involved, the interaction tends to be fairly binary.”
In this analysis, single-cell network connectivity emerged as a significant predictor of function. To further explore the factors shaping this connectivity, Dr. Gillis and colleagues performed RNA-seq analyses on 126 cortical interneurons. This experimental design captured co-expression patterns and provided an ideal setting to identify factors that interfere with measurements and to implement approaches to control for them.
“A major challenge that is being recognized, but has not been fully solved, is that single-cell data has fairly distinguishable properties, and the technical noise may be high,” notes Dr. Gillis. A second challenge is related to the fact that the interpretation of single-cell datasets is often performed in an unbiased manner, but unbiased approaches are limited by their suboptimal ability to distinguish technical artifacts from the data.
“We always use preexisting biological knowledge to distinguish the biological signal from the technical signal,” insists Dr. Gillis. “As a research team, I would like to see more of this exploited in single-cell RNA-seq experiments.”
At the Cold Spring Harbor laboratory of Jesse Gillis, Ph.D., the derivation of gene co-expression networks goes through the following steps: (A) Measure gene expression across many individual cells and take special note of genes that share expression profiles. Such genes are known to preferentially share function. (B) Capture genome-scale data in network plots. Each node (circle) represents a gene; each link (line), the strength of co-expression between a gene pair. (C) Evaluate whether genes that share function (color) tend to be linked. Hidden information can be reconstructed by local similarity in the network. (D) Process multiple datasets to strengthen gene-function predictions. Very substantial amounts of data are needed, particularly since single-cell data is noticeably noisier than bulk, despite showing similar performance trends (inset).
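Steps A through C of the caption above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the Gillis lab's pipeline: the gene names, expression values, function labels, the Pearson correlation measure, the 0.8 threshold, and the `build_network` and `predict_function` helpers are all hypothetical. Step D, pooling many datasets, is omitted.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

# Step A: expression of each gene measured across six cells (toy values).
expression = {
    "gene_a": [1, 2, 3, 4, 5, 6],
    "gene_b": [2, 4, 5, 8, 10, 13],
    "gene_c": [1, 3, 3, 5, 5, 7],
    "gene_d": [6, 5, 4, 3, 2, 1],
    "gene_e": [7, 5, 5, 3, 2, 2],
}

def build_network(expression, threshold=0.8):
    """Step B: link gene pairs whose co-expression meets the threshold."""
    edges = {}
    for g1, g2 in combinations(expression, 2):
        r = pearson(expression[g1], expression[g2])
        if r >= threshold:
            edges[(g1, g2)] = r
    return edges

def predict_function(gene, edges, known):
    """Step C: let annotated neighbors vote, weighted by link strength."""
    votes = {}
    for (g1, g2), weight in edges.items():
        if gene in (g1, g2):
            neighbor = g2 if g1 == gene else g1
            if neighbor in known:
                label = known[neighbor]
                votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get) if votes else None

# Hypothetical annotations; gene_c and gene_e are treated as unannotated.
known = {"gene_a": "function_x", "gene_b": "function_x", "gene_d": "function_y"}
edges = build_network(expression)
for gene in ("gene_c", "gene_e"):
    print(gene, "->", predict_function(gene, edges, known))
```

Here the unannotated genes inherit the label of the annotated genes they are linked to, which is the sense in which hidden information is reconstructed from local similarity in the network. A hard threshold is only one way to define links; ranking the correlations is a common alternative.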
Evaluating Sequencing Tools
“We have assembled a complete solution under the QIAseq brand for applications that include whole-transcriptome sequencing and targeted sequencing,” says Vikram Devgan, Ph.D., director and head of biological research content marketing at Qiagen. The QIAseq Targeted RNA Panels were developed to meet the needs for quantitative gene-expression analyses, and the QIAseq Targeted RNAScan Panel is positioned to detect fusion genes, which have been increasingly incorporated in diagnostic and therapeutic decisions in multiple malignancies.
“Both these products were developed using our innovative molecular barcode technology,” details Dr. Devgan. In both applications, molecular barcodes assigned to individual complementary DNA (cDNA) templates provide true quantification, and they are used to correct amplification or library biases and increase detection sensitivity.
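The idea behind barcode-based quantification can be shown with a short sketch. It is a generic illustration of the principle, not Qiagen's implementation: the `count_molecules` helper, the gene names, and the four-base barcodes are hypothetical, and the sketch assumes that barcodes are read without sequencing errors. Reads that share a gene and a barcode are treated as amplification copies of one original cDNA template and counted once.

```python
from collections import defaultdict

def count_molecules(reads):
    """Return raw read counts and barcode-collapsed molecule counts per gene.

    `reads` is an iterable of (gene, barcode) pairs. Reads with the same gene
    and barcode are assumed to be copies of a single original cDNA molecule.
    """
    read_counts = defaultdict(int)
    barcodes = defaultdict(set)
    for gene, barcode in reads:
        read_counts[gene] += 1
        barcodes[gene].add(barcode)
    molecule_counts = {gene: len(seen) for gene, seen in barcodes.items()}
    return dict(read_counts), molecule_counts

# Toy data: one gene_a template was amplified four times as often as the other.
reads = [
    ("gene_a", "AACG"), ("gene_a", "AACG"), ("gene_a", "AACG"), ("gene_a", "AACG"),
    ("gene_a", "TTGC"),
    ("gene_b", "GGAT"), ("gene_b", "CCTA"), ("gene_b", "ATAT"),
]

read_counts, molecule_counts = count_molecules(reads)
print("reads:    ", read_counts)
print("molecules:", molecule_counts)
```

By raw reads, gene_a appears more abundant than gene_b (5 versus 3). After collapsing on barcodes the order reverses (2 versus 3 molecules), which is the kind of amplification bias the barcodes are meant to correct.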
“These products have been developed to provide a complete solution,” asserts Dr. Devgan. “They enable consumers to use their RNA and generate libraries ready for sequencing using the same kit.” Additional advantages of the QIAseq products are their compatibility with any sequencing platform and the possibility of customizing them to target genes or fusion gene junctions of interest.
Qiagen also promises solutions in the single-cell genomics space, which has attracted increasing interest with the recognition of the critical role that heterogeneities in cellular populations play in disease and development. “Under the brand name REPLI-g, we established one of the most widely used single-cell solutions in genomics and transcriptomics,” remarks Colin Baron, senior director and head of product management at Qiagen’s NGS Life Sciences unit. “We see our role as democratizing single-cell sequencing.”
One of the typical challenges that researchers face in single-cell studies is that platforms have traditionally been associated with a high cost. “REPLI-g allows easy access to single-cell sequencing,” offers Baron. The REPLI-g Single Cell RNA Library Kit includes reagents needed to generate libraries from single cells, leverages robust amplification, and provides high fidelity and quantitative accuracy.
“We are also working on launching a very low-cost solution for single-cell isolation and recovery for downstream genomic analysis or functional characterization,” adds Baron. The device promises to facilitate experiments in which capturing the genome of single cells is essential, such as studies on circulating tumor cells.
The RNA-seq Explorer Solution is a new tool that integrates Ingenuity® Pathway Analysis™, Biomedical Genomics Workbench®, and other Qiagen bioinformatics solutions to generate insights for research into improved detection, diagnosis, and treatment of cancer. The solution was demonstrated by Jean-Noel Billaud, Ph.D., principal scientist, Qiagen Bioinformatics, at the annual meeting of the American Association for Cancer Research (AACR) in New Orleans.
RNA-seq Explorer Solution can facilitate simple, accurate discovery and validation of biomarkers. It is designed to enable researchers to go from raw data in FASTQ format to significant insights that home in on the genetic drivers of cancer.
The solution draws upon Qiagen’s Ingenuity Pathway Analysis (IPA), an all-in-one, web-based software application that can enable analysis, integration, and understanding of expression data. IPA is backed by the expert-curated Ingenuity Knowledge Base of highly structured, detail-rich biological and chemical findings. RNA-seq Explorer Solution also integrates Qiagen’s Biomedical Genomics Workbench, a comprehensive data analysis platform that offers end-to-end workflows and tools for the alignment, normalization, and statistical analysis of NGS experimental results.
Target Validation of RNA-Seq
To search for novel and known genes that show tissue-specific expression and potentially drive disease, scientists perform next-generation sequencing of RNA (RNA-seq) and compile transcriptome information. That is, scientists build a library of all the messenger RNAs (mRNAs) expressed in a tissue. Such a library provides an initial list of targets.
Subsequently, bioinformatic approaches can be used to identify the most interesting targets based on gene expression patterns as well as the presence of various transcript-level alterations, including mutations, splice variants, and gene fusions. However, RNA-seq cannot indicate exactly which cells in a tissue sample are responsible for differences in gene expression and which cells carry the transcript-level alterations.
Other common techniques, such as immunohistochemistry and Western blot, depend on antibodies, and when appropriate antibodies are lacking they can yield false results. These limitations are often attributed to cross-reactivity and low levels of protein expression, problems seen, for example, with G-protein-coupled receptors.
To overcome these limitations, researchers have been taking advantage of RNAscope® from ACDbio (Advanced Cell Diagnostics).
“Scientists from the University of Miami, for example, used the technology to understand expression of multiple olfactory signaling genes in the eye,” explains Chris Silva, vp of marketing at ACDbio. “The ability to see whether the gene of interest is expressed in a tissue or not is critical.
“In addition, spatial information about gene expression indicates which specific cell types express this gene, allowing an understanding of tissue heterogeneity,” adds Silva, noting that RNAscope offers the ability to multiplex—to visualize expressions of two or three genes at the same time.
The technology can also provide quantitative information. The assay yields information on relative abundance of the target mRNA, down to the level of a single molecule of RNA. “RNAscope also offers relative simplicity and speed,” continues Silva. “Results can be obtained in a day.”
ACDbio has more than 9,000 RNAscope target probes for 100+ species, and these probes are available “off the shelf.”