March 15, 2016 (Vol. 36, No. 6)

Like Big-City Transit Systems, Gene Expression Networks Are Multimodal

Efforts to interrogate the link between genotypes and phenotypes have catalyzed some of the most fundamental advances in science. Novel tools and frameworks for studying gene expression and its regulation have been at the core of these developments. By revealing intricate details about biological systems, and often changing paradigms, gene expression analysis has become an integral and increasingly vital part of addressing virtually every biomedical question.

“What surprised me in our recent study is how large a transcriptional change we were able to see after deleting the microRNA control in the cells,” says Phillip A. Sharp, Ph.D., professor of biology at MIT and co-recipient of the 1993 Nobel Prize in Physiology or Medicine.

MicroRNAs have been implicated in many biological processes that are critical during development, homeostasis, and disease. Each microRNA regulates multiple mRNA targets, and each mRNA is regulated by several distinct microRNAs, making it particularly challenging to dissect the regulatory networks.

Studies that examined individual microRNAs indicated that most modulate their target mRNAs only modestly, but approaches that interrogated network-wide function revealed that microRNAs can affect gene regulatory networks more powerfully when they act in combination rather than in isolation.

To gain insight into the effect of microRNA perturbations on gene expression, Dr. Sharp and colleagues systematically profiled and integrated transcriptional, post-transcriptional, and epigenetic changes in murine fibroblast cell lines in the presence and then in the absence of the Dicer protein. This strategy allowed transcriptional changes to be decoupled from post-transcriptional changes, and enabled their integration with epigenetic modifications.

Cellular microRNA depletion caused a global rewiring of gene transcription networks, and this rewiring explained most of the change in cellular mRNA expression levels.

“We set out to see if we could identify the transcriptional factors that were primarily responsible for these changes,” explains Dr. Sharp. Network modeling helped identify specific microRNA-regulated transcription factors, several of which were tagged and overexpressed in wild-type cells and silenced in cells lacking Dicer. This approach allowed individual contributions to transcriptional changes to be explored and validated.
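Network modeling of this kind often rests on a simple idea: if a transcription factor’s activity changes, its known targets should shift together. The sketch below illustrates that idea by scoring each factor as the mean log2 fold-change of its targets minus that of all other measured genes. It is a hypothetical, deliberately simple stand-in, not the model used by Dr. Sharp’s group; the function name, gene names, target sets, and values are all invented for illustration.

```python
from statistics import mean

def tf_activity_scores(log2fc, targets):
    """Score each transcription factor (TF) by the mean log2 fold-change of its
    targets minus the mean log2 fold-change of all other measured genes.

    log2fc  -- dict: gene -> log2 fold-change (e.g., Dicer-null vs. wild type)
    targets -- dict: TF -> set of target genes (hypothetical network edges)
    """
    scores = {}
    for tf, tf_targets in targets.items():
        inside = [fc for gene, fc in log2fc.items() if gene in tf_targets]
        outside = [fc for gene, fc in log2fc.items() if gene not in tf_targets]
        if not inside or not outside:
            continue  # no contrast is possible without both groups
        scores[tf] = mean(inside) - mean(outside)
    return scores

# Invented example values; not data from the study.
log2fc = {"g1": 2.0, "g2": 1.5, "g3": -0.1, "g4": 0.2, "g5": -1.8}
targets = {"TF_A": {"g1", "g2"}, "TF_B": {"g5"}}

scores = tf_activity_scores(log2fc, targets)
# Rank by magnitude: large positive or negative scores flag candidate factors.
for tf, s in sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{tf}\t{s:+.2f}")
```

How a score is read depends on the factor: a positive score suggests increased activity for an activator but decreased activity for a repressor. A score like this flags candidates; it cannot by itself link a factor back to a specific microRNA, which is the gap Dr. Sharp describes.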

“We identified several transcription factors that were changing in activity, but we were not able to make the connection between microRNAs and the transcription factors straightforward,” notes Dr. Sharp. “This is a limitation of the computational system program.”

Connecting microRNAs to their targets, and those targets to transcription, remains a gap in the mechanistic understanding of how microRNA perturbations cause global transcriptional changes. “If we could perform a high-throughput sampling of transcription factors and analyze highly redundant data to look at the coordinate cellular changes using computational models,” surmises Dr. Sharp, “that should give us a much better resolution of what factors are responsible for transcriptional changes in those cells.”

Maintaining Genomic Stability

“We looked for long noncoding RNAs (lncRNAs) that are regulated by DNA damage,” says Joshua Mendell, M.D., Ph.D., professor of molecular biology at the University of Texas Southwestern Medical Center and investigator of the Howard Hughes Medical Institute. “This led us to notice a particular lncRNA that had been annotated but not studied before.”

This finding was made in Dr. Mendell’s lab as part of an effort to identify lncRNAs involved in the DNA damage response. Several thousand lncRNAs have been described to date, and a major gap in the field is understanding the extent to which they are functional entities within cells. In their search for lncRNAs with functional roles, Dr. Mendell and colleagues hypothesized that functional lncRNAs are likely to be more abundant and more evolutionarily conserved than most.

“We looked for lncRNAs that are important in cancer-relevant pathways,” says Dr. Mendell. This effort led to the identification of a new lncRNA that was termed “noncoding RNA activated by DNA damage,” or NORAD. “With 1,000 copies of the RNA present in mammalian cell lines under some conditions, NORAD is more abundant than most lncRNAs,” Dr. Mendell continues. “It is similar in abundance to housekeeping mRNAs.” NORAD is also highly conserved, with the human and mouse versions sharing about 60% sequence identity.

“When we inactivated NORAD using a genome-editing approach, we noticed that some knockout cells gained a tetraploid DNA content, which was very surprising,” reports Dr. Mendell. Even knockout cells that retained a diploid DNA content had an unstable number of chromosomes. This phenotype, known as chromosomal instability, is marked by frequent chromosome gains and losses and is a very common feature of cancer cells.

“Prior to this finding, a lncRNA that is essential for maintaining chromosomal stability had not been described,” notes Dr. Mendell. Dissection of the molecular pathways responsible for the genomic instability phenotype revealed that NORAD acts as a molecular decoy that sequesters the PUMILIO proteins PUM1 and PUM2, which repress mRNAs encoding key proteins involved in the cell cycle, mitosis, and DNA replication and repair. By acting as a negative regulator of the PUMILIO proteins, NORAD limits their ability to repress these target mRNAs.

In the laboratory of Joshua Mendell, M.D., Ph.D., at the University of Texas Southwestern Medical Center, investigations of genomic instability have led to the identification of a new lncRNA called NORAD. In this image’s left panel, a wild-type nucleus with intact NORAD activity is labeled for chromosomes 7 and 20. As expected for a diploid cell, each chromosome probe gives two dots, one per chromosome copy. In the right panel, a NORAD knockout nucleus shows four dots per probe, evidence of tetraploid DNA content.

Predicting Successful Implantations

“We performed transcriptome analyses of samples consisting of a few human trophectoderm cells,” says Karin Lykke-Hartmann, Ph.D., associate professor of biomedicine at Aarhus University. “Our aim was to identify gene-expression profiles associated with successful implantations and live births.”

For several years, data from animal models have pointed toward the existence of specific gene-expression profiles that distinguish viable from nonviable embryos. Although insight into gene-expression profiles in human embryos is crucial, it has historically been elusive for multiple reasons, including limited access to research material and the small amount of genetic material available from embryos.

“Currently, selection of the embryo that is most likely to lead to a successful pregnancy is based on morphological considerations, including how sharp and healthy the cells look,” explains Dr. Lykke-Hartmann. “But differences at the molecular level are not accessible by this approach.”

In a study that enrolled infertile couples who presented to Aarhus University Hospital for fertility treatment between 2011 and 2013, Dr. Lykke-Hartmann used embryonic gene-expression profiles to distinguish nonimplanted blastocysts from blastocysts that resulted in successful pregnancies. After oocytes were fertilized by in vitro fertilization or intracytoplasmic sperm injection, the resulting embryos developed in an incubator, where their progress was recorded for several days. Three to five trophectoderm cells were then removed from a single day-6 blastocyst.

These cells were subjected to next-generation sequencing, and gene-expression profiles were generated for the blastocysts. Profiles for the blastocysts that failed to implant successfully were then compared to those for the blastocysts that led to live births.
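The comparison step can be pictured as a per-gene test between two groups. The minimal sketch below assumes normalized expression values, one per blastocyst, and computes a log2 fold-change of group means (with a pseudocount to avoid dividing by zero) and Welch’s t statistic for each gene. It is not the study’s pipeline: the function names, gene names, and values are hypothetical, and real sequencing analyses typically use count-based statistical models and correct for testing thousands of genes at once.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two independent samples with possibly unequal variances."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))  # sample variances (n - 1)
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se

def compare_groups(live, failed, pseudocount=1.0):
    """Return {gene: (log2 fold-change, Welch t)} comparing live-birth with
    nonimplanted blastocysts.

    live, failed -- dict: gene -> list of normalized expression values,
                    one value per blastocyst (at least two per group)
    """
    results = {}
    for gene in sorted(live.keys() & failed.keys()):
        a, b = live[gene], failed[gene]
        lfc = math.log2((mean(a) + pseudocount) / (mean(b) + pseudocount))
        results[gene] = (lfc, welch_t(a, b))
    return results

# Invented values for two hypothetical genes; not data from the study.
live = {"geneX": [30, 34, 32], "geneY": [10, 12, 11]}
failed = {"geneX": [14, 16, 15], "geneY": [11, 10, 12]}

for gene, (lfc, t) in compare_groups(live, failed).items():
    print(f"{gene}\tlog2FC={lfc:+.2f}\tt={t:+.2f}")
```

Here geneX would stand out as higher in the live-birth group, while geneY shows no difference; a real screen would then apply a significance threshold across all genes.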

“A significant number of genes were found to be up- or downregulated between the two groups,” reports Dr. Lykke-Hartmann. Many of the 37 differentially expressed genes had not previously been implicated in implantation.

“This approach promises to provide a complement to the morphological evaluation of blastocysts,” asserts Dr. Lykke-Hartmann. “If we are able to define a subset of genes that can be tested, we can provide results within a few hours.”

Previous studies had examined gene expression during early development, but this strategy allowed molecular signatures to be captured from single blastocysts for the first time. “This strategy has limitations in terms of the number of patients enrolled,” concludes Dr. Lykke-Hartmann. “But it provides initial insights into developing gene-expression profiles that could be used to predict successful implantations.”

Gaining Post-Transcriptional Insights

“We initially created a shotgun library to identify mRNA-fate regulators in trypanosomes,” says Esteban D. Erben, Ph.D., a research scientist at the University of Heidelberg. “The data helped generate a second collection of full-length proteins.”

One of the challenges in studying the biology of African trypanosomes is that they rely mostly on post-transcriptional mechanisms to control gene expression. In this parasite, open reading frames are organized into long polycistronic arrays that are processed into individual monocistronic mRNAs by trans-splicing and polyadenylation, and RNA polymerase II transcription initiation is not regulated at the level of individual genes.

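To gain insight into the proteins implicated in post-transcriptional gene regulation in the African trypanosome, Dr. Erben and colleagues generated an inducible plasmid library of random trypanosome genomic fragments fused to the lambda-N peptide, which binds a specific RNA stem-loop called boxB. In tethering screens, boxB sequences placed in a reporter mRNA recruit each fusion protein to that mRNA, so its effect on the reporter’s fate can be tested.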

The plasmid library provided 10-fold coverage of the parasite genome. The plasmids were transfected into cells expressing positive or negative selection reporters, and the resulting tethering screen identified approximately 300 proteins that could orchestrate post-transcriptional mRNA regulation.
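What “10-fold coverage” buys can be estimated with standard library-statistics arithmetic. The sketch below assumes fragments start at uniformly random positions, so the chance that a given position appears in at least one fragment is 1 − e^(−coverage), and it uses the Clarke-Carbon formula for the number of clones needed to reach a target probability. The in-frame fraction of 1/6 (three reading frames times two orientations) and the genome and insert sizes are illustrative assumptions, not figures from the study.

```python
import math

def fraction_covered(coverage):
    """Expected fraction of genome positions present in at least one fragment,
    assuming fragments start at uniformly random positions (Poisson model)."""
    return 1.0 - math.exp(-coverage)

def clones_needed(genome_size, insert_size, probability):
    """Clarke-Carbon estimate of how many clones are needed so that any given
    sequence is present in the library with the stated probability."""
    f = insert_size / genome_size  # fraction of the genome carried by one insert
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - f))

print(f"10x coverage: {fraction_covered(10):.5f} of positions represented")

# Assumption: a random fragment yields an in-frame, sense-oriented lambda-N
# fusion about 1 time in 6 (3 reading frames x 2 orientations).
print(f"in-frame fusions: {fraction_covered(10 / 6):.2f} of positions")

# Hypothetical sizes, chosen only for round numbers.
print(clones_needed(genome_size=30_000_000, insert_size=1_000, probability=0.99))
```

Under these idealized assumptions, 10-fold coverage represents nearly every genomic position, but noticeably fewer positions appear as productive in-frame fusions. This is one reason a screen of protein fragments is best treated as a first pass.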

One of the limitations of this approach is the potential for protein fragments to generate false hits. To circumvent this limitation, Dr. Erben and colleagues refined their work by generating a small-scale library of open reading frames that encode full-length proteins thought to be involved in mRNA metabolism.

The tethering screen identified putative candidates implicated in post-transcriptional control. To capture the mRNA-bound proteome that is relevant in vivo, the researchers crosslinked bloodstream forms of the parasite, isolated the proteins bound to poly(A) mRNA, and examined the enriched RNA-bound proteins by mass spectrometry.

“These experiments are not truly informative about mechanisms and functions,” notes Dr. Erben. “But the approaches complement one another to help identify regulators, and this is a valuable first step.”

This strategy helped identify a nonredundant group of 155 high-confidence candidates, many of which had not previously been annotated as RNA-binding proteins. Candidates included nuclear, cytoplasmic, nucleolar, and mitochondrial proteins, illustrating the diversity of the mRNA-binding proteome. “Our next step,” says Dr. Erben, “will be to generate a complete library that contains the 8,000 genes that are present.”

BIOVIA’s Variant Annotator

The collection and reporting of data relevant to genomic alterations are key steps in genome sequence data analysis. Researchers rely on annotations to filter large numbers of variants to a subset of alterations most important to disease states and potential targets.

According to officials at Dassault Systèmes BIOVIA, the company’s Variant Annotator is a user-friendly, web-based application for annotating genomic point mutations and indels. “The application finds information from all imported data sources for a collection of input mutations,” says Tim Moran, product manager, life science research.

Inputs are formatted as [Chromosome], [Start Position], [End Position], [Reference Base], [Observed Base]. Data sources are databases or lists of annotation information selected by the user, such as dbSNP, COSMIC, the Exome Variant Server (EVS), Ensembl, and HUGO.
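An input list in this five-field layout is easy to validate before annotation. The sketch below is a hypothetical parser, not BIOVIA’s API: it assumes comma-separated fields and uses “-” to mark an empty allele in insertions and deletions. Other conventions exist; VCF, for example, anchors indels to a preceding reference base instead.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    chrom: str
    start: int
    end: int
    ref: str
    obs: str

    @property
    def kind(self):
        # Assumed convention: "-" marks an empty allele in indels.
        if self.ref == "-":
            return "insertion"
        if self.obs == "-":
            return "deletion"
        if len(self.ref) == 1 and len(self.obs) == 1:
            return "SNV"
        return "complex"

def parse_line(line):
    """Parse 'Chromosome, Start, End, Reference, Observed' into a Variant."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    chrom, start, end, ref, obs = fields
    v = Variant(chrom, int(start), int(end), ref.upper(), obs.upper())
    if v.end < v.start:
        raise ValueError(f"end position precedes start: {line!r}")
    return v

# Hypothetical input lines.
for line in ["chr1, 1000, 1000, G, A", "chr2, 500, 502, ACG, -", "chr3, 200, 201, -, T"]:
    v = parse_line(line)
    print(v.chrom, v.start, v.end, v.ref, v.obs, v.kind)
```

Rejecting malformed rows up front keeps the later annotation lookups from silently returning nothing for a mistyped variant.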

“Outputs are formatted as a table of annotation information culled from the specified version of each Data Source and saved behind an organization’s firewall,” continues Moran. “The application allows version control of the annotation source files, ensuring reproducibility and traceability.”
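Version control of annotation sources amounts to recording which snapshot of each source produced each row. The minimal sketch below illustrates that idea with hypothetical in-memory snapshots keyed by (source, version). The snapshot contents, identifiers, and function names are invented and do not reflect how BIOVIA’s application stores its data.

```python
import csv
import io

# Hypothetical local snapshots: {(source, version): {variant_key: annotation}}.
# A variant key is (chromosome, start, end, reference, observed).
SNAPSHOTS = {
    ("dbSNP", "v1"): {("chr1", 1000, 1000, "G", "A"): "rs_example_1"},
    ("dbSNP", "v2"): {("chr1", 1000, 1000, "G", "A"): "rs_example_1",
                      ("chr2", 500, 502, "ACG", "-"): "rs_example_2"},
    ("COSMIC", "v1"): {("chr2", 500, 502, "ACG", "-"): "COSV_example"},
}

def annotate(variants, pinned_versions):
    """Return one row per (variant, source), recording the exact source version used."""
    rows = []
    for key in variants:
        for source, version in pinned_versions.items():
            table = SNAPSHOTS[(source, version)]  # KeyError if a pinned version is missing
            rows.append({
                "variant": ":".join(map(str, key)),
                "source": source,
                "version": version,
                "annotation": table.get(key, ""),
            })
    return rows

def to_tsv(rows):
    """Render annotation rows as a tab-separated table with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["variant", "source", "version", "annotation"],
                            delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

variants = [("chr2", 500, 502, "ACG", "-")]
print(to_tsv(annotate(variants, {"dbSNP": "v1", "COSMIC": "v1"})))
print(to_tsv(annotate(variants, {"dbSNP": "v2", "COSMIC": "v1"})))
```

Because every row names its source version, rerunning with the same pinned versions reproduces the same table. Changing only the dbSNP version shows exactly which annotations changed, which is the traceability the quote describes.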
