The Central Dogma of molecular biology states that DNA encodes RNA and RNA encodes protein. While this holds true for the vast majority of gene regulatory functions, scientists have discovered in recent years that it is by no means the only process at work. In fact, studies examining the association between genes and diseases have shown that a growing number of disease variants are found outside of protein-coding genes.
Now, a team of researchers led by investigators at the RIKEN FANTOM consortium has generated a comprehensive atlas of human long noncoding RNAs (lncRNAs) with substantially improved gene models, allowing them to better assess the diversity and functionality of these RNAs. The researchers have made their data accessible in an extensive searchable resource that they anticipate will have wide research applications. The findings from the study were published recently in Nature in an article entitled “An Atlas of Human Long Non-Coding RNAs with Accurate 5′ Ends.”
“There is strong debate in the scientific community on whether the thousands of lncRNAs generated from our genomes are functional or simply by-products of a noisy transcriptional machinery,” explained senior study investigator Alistair Forrest, Ph.D., professor at the Harry Perkins Institute of Medical Research at the University of Western Australia and senior visiting scientist at the RIKEN Center for Life Science Technologies (CLST). “By integrating the improved gene models with data from gene expression, evolutionary conservation, and genetic studies, we find compelling evidence that the majority of these lncRNAs appear to be functional, and for nearly 2000 of them we reveal their potential involvement in diseases and other genetic traits.”
Most current attempts to draw maps of RNA transcription rely on sequencing technologies that do not always accurately identify the beginnings, or 5′ ends, of the RNA transcripts. To overcome this limitation, the research team used a technology known as Cap Analysis of Gene Expression (CAGE), which was developed at RIKEN, to build an atlas of human lncRNAs with accurate 5′ ends, precisely pinpointing where in the genome their transcription is initiated.
“Intriguingly, the majority of lncRNAs appear to be generated from enhancer elements,” noted lead study investigator Chung-Chau Hon, Ph.D., senior scientist at CLST. “It deepens our understanding toward the largely heterogeneous origins of lncRNAs.”
The atlas, which contains 27,919 lncRNAs, summarizes for the first time their expression patterns across major human cell types and tissues. By intersecting the atlas with genomic and genetic data, the researchers found evidence that 19,175 of these RNAs may be functional, hinting that there could be as many functional noncoding RNAs as the approximately 20,000 protein-coding genes in the human genome, or even more.
“The improved gene models and the broad functional hints of human lncRNAs derived from this atlas could serve as a Rosetta Stone for us to experimentally investigate their functional relevance as part of our ongoing work for the upcoming edition of the FANTOM consortium,” concluded senior study investigator Piero Carninci, Ph.D., director of the division of genomic technologies at CLST. “We anticipate that these results could further push the boundary of our understanding of the functions of the noncoding portion of our genome.”