Studying the complete collection of messenger RNA produced by a cell, known as the transcriptome, allows researchers to better understand gene expression and protein formation, because RNA is the molecular bridge between DNA and the production of proteins. RNA exists in a variety of forms, each with a particular role and purpose, some of which are not entirely understood.
A team of scientists from the Perelman School of Medicine at the University of Pennsylvania has devised a new method for mapping the transcriptome, which they believe will shed additional light on the role of RNAs in cells.
The research team identified RNA variants in mammals that had been largely invisible to previous techniques and demonstrated that these so-called “dark” variations in RNA are remarkably common in mammalian cells and likely play roles in gene regulation in tissues, development, and human disease. The Penn investigators plan to use their newly developed software to analyze aberrant cells in neurodegenerative disorders, cancers, and other illnesses.
“It's very exciting for us, and I think for the research community in general, among other reasons, because we can now go back through the vast amount of existing transcriptome data, knowing that new and important things will emerge,” remarked senior study author Yoseph Barash, Ph.D., assistant professor of genetics and senior fellow at the Penn Institute for Biomedical Informatics.
Dr. Barash’s laboratory studies RNA splicing using machine learning and computational modeling. Because a single gene may code for multiple forms of the same protein, each with its own distinct biological role, it has been difficult for scientists to study all the molecular variants at once. Moreover, splicing patterns that deviate from normal are known to contribute to many diseases, making their analysis even more critical.
The advent of RNA-seq has allowed scientists to explore the transcriptome even further, though the technique typically yields sequences of only fragments of messenger RNA. Those fragment sequences have to be stitched back together, with the aid of sophisticated software and existing RNA databases, to reconstruct the transcriptome, and the resulting picture isn't necessarily complete. Researchers have long been searching for an easy, error-free way to identify and quantify all the distinct messenger-RNA splice variants within a sample.
“The reads from RNA-seq are sparse and also short compared to actual messenger-RNA transcripts, so you don't directly know what transcripts those reads came from,” Dr. Barash explained. “Therefore, you also don't directly know the abundance of those transcripts.”
Dr. Barash and his team took a new approach, beginning with the mapping of what they call local splice variations (LSVs). These are essentially the variable junctions between exons, which can be detected from sequence reads that span more than one exon.
“These are places where the splicing machinery of a cell makes a choice about which exon is spliced to another,” Dr. Barash explained.
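As a rough illustration of that idea, the Python sketch below groups splice junctions by the exon they leave from, so that any exon with more than one outgoing junction marks a point where the splicing machinery has a choice. The exon names, the junction pairs, and the function `group_local_splice_variations` are hypothetical and are not taken from MAJIQ itself; this is a simplification for illustration, not the team's method.

```python
from collections import defaultdict


def group_local_splice_variations(junctions):
    """Group junctions by source exon and keep exons with 2+ possible targets.

    `junctions` is an iterable of (source_exon, target_exon) pairs.
    Hypothetical helper for illustration only; not part of MAJIQ.
    """
    targets_by_source = defaultdict(set)
    for source_exon, target_exon in junctions:
        targets_by_source[source_exon].add(target_exon)
    # Only exons that can be spliced to more than one target represent a choice.
    return {
        source: sorted(targets)
        for source, targets in targets_by_source.items()
        if len(targets) > 1
    }


# Made-up junctions, as might be inferred from reads spanning two exons.
junctions = [("E1", "E2"), ("E1", "E3"), ("E2", "E3"), ("E3", "E4")]
lsvs = group_local_splice_variations(junctions)
print(lsvs)
```

In this toy input, only exon E1 has a choice of partners (E2 or E3), so it is the only variation reported.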
The Penn researchers developed novel software, dubbed modeling alternative junction inclusion quantification, or MAJIQ, to generate LSV maps from RNA-seq data and combine those data with existing RNA databases to yield pictures that include common, known splice variants, as well as complex splice variants that other methods fail to detect.
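To give a sense of what quantifying junction inclusion can mean, the sketch below computes each junction's share of the reads that support a single splice variation. This naive ratio is an assumption made for illustration: the article does not describe MAJIQ's statistical model, and the read counts and the function `junction_inclusion` are hypothetical.

```python
def junction_inclusion(read_counts):
    """Return each junction's fraction of the reads supporting one variation.

    `read_counts` maps a junction label to its number of supporting reads.
    Naive ratio for illustration only; not MAJIQ's actual model.
    """
    total = sum(read_counts.values())
    if total == 0:
        # No supporting reads: report zero inclusion rather than dividing by zero.
        return {junction: 0.0 for junction in read_counts}
    return {junction: count / total for junction, count in read_counts.items()}


# Made-up read counts for two junctions leaving the same exon.
counts = {"E1-E2": 30, "E1-E3": 10}
inclusion = junction_inclusion(counts)
print(inclusion)
```

A simple ratio like this becomes unreliable when read counts are low, which is one reason real tools rely on more careful statistical modeling.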
The scientists used their new software to analyze RNA-seq data from a variety of species, including lizards, mice, and humans. The analysis revealed that complex splice variants are much more frequent than previously thought—comprising, for example, about 37 percent of the transcriptome variations in human samples.
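A figure like this can be thought of as a simple proportion. The sketch below assumes, since the article does not define the term, that a "complex" variation is one with more than two alternative junctions; the junction counts and the function `complex_fraction` are made up for illustration and do not reproduce the study's data.

```python
def complex_fraction(junctions_per_variation, threshold=2):
    """Return the fraction of variations with more than `threshold` junctions.

    Assumes "complex" means more than two alternative junctions, which is
    an illustrative definition and not one given in the article.
    """
    if not junctions_per_variation:
        return 0.0
    n_complex = sum(1 for n in junctions_per_variation if n > threshold)
    return n_complex / len(junctions_per_variation)


# Made-up number of alternative junctions for eight variations.
junction_counts = [2, 2, 3, 2, 4, 2, 2, 5]
fraction = complex_fraction(junction_counts)
print(f"{fraction:.1%} of variations are complex")
```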
“These variations are a bit like the dark side of the moon,” stated Dr. Barash. “They were known to exist, yet we lacked the ability to shine a light on them—and now they turn out to make up a third of the variations in human messenger RNAs.”
The findings from this study were published recently in eLife in an article titled “A new view of transcriptome complexity and regulation through the lens of local splicing variations.”
In one example, a complex variant detected with MAJIQ in the human synapse-related gene CAMK2D turned out to be expressed about 40 percent less in brain tissue from Alzheimer's patients compared with controls. Upon wider analysis, the team identified approximately 200 cases of altered splicing in Alzheimer's patients that were reproducible in two independent studies.
“We think that findings like those are just the tip of the iceberg,” said Dr. Barash, who also plans to conduct further MAJIQ-based investigations of complex splice variants in other disorders.