Patricia F. Fitzpatrick Dimond, Ph.D., Technical Editor of Clinical OMICs; President of BioInsight Communications
Noncoding gene sequences control gene expression and influence disease processes.
In the August 1 issue of Cell, researchers from the Gene and Stem Cell Therapy Program at Sydney’s Centenary Institute revealed another function of introns, or noncoding nucleotide sequences, in DNA. They reported that gene-sequencing techniques and computer analysis allowed them to demonstrate how granulocytes use noncoding DNA to regulate the activity of a group of genes that determines the cells’ shape and function.
Their report adds to growing experimental support for the idea that the extra material in human genes, once referred to as “junk DNA,” is more than functionless filler that happens to make up nearly 98% of the genome. It also contributes to a body of knowledge establishing a considerable role for this material in regulating gene expression and, potentially, in human disease.
For most genes, the initial RNA that is transcribed from a gene’s DNA template requires processing into mRNA before it can become translated into protein. One step in the processing, RNA splicing, involves the removal or “splicing out” of certain sequences, or introns (the aforementioned junk DNA). The final mRNA consists of protein-coding sequences, or exons, joined to one another through the splicing process.
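The splicing step described above can be pictured with a small sketch. It is a toy model only: the transcript, the intron coordinates, and the function name `splice` are illustrative assumptions, and real splicing depends on splice-site signals and the spliceosome rather than on a fixed list of positions.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove intron intervals (0-based, end-exclusive) and join the remaining exons."""
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        # Reject intervals that overlap, run backwards, or fall outside the transcript.
        if start < cursor or end < start or end > len(pre_mrna):
            raise ValueError("intron intervals must be non-overlapping and in range")
        exons.append(pre_mrna[cursor:start])  # exon sequence before this intron
        cursor = end                          # skip over the intron
    exons.append(pre_mrna[cursor:])           # final exon after the last intron
    return "".join(exons)


# Hypothetical transcript: exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "GAAUUC" + "guaugcccuuag" + "UGA"
introns = [(6, 18), (24, 36)]  # assumed intron positions for this toy example
mature = splice(pre_mrna, introns)
print(mature)  # AUGGCCGAAUUCUGA
```

In this toy case the two lower-case stretches are spliced out and the three exons are joined into one continuous coding message.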
Back in the old days, the general wisdom had it that the introns loaded into the human genome were basically useless. While some noncoding DNA is transcribed into noncoding RNA, such as transfer RNA, ribosomal RNA, regulatory RNA, or endogenous retroviruses, other noncoding sequences produce RNA with no known function or identified utility to the cell.
But over the past few years, as high-powered analytical tools and genomic information have become available, the function of sequences within introns, such as transcription factor recognition sequences, has become better understood. And, as John Stamatoyannopoulos, M.D., associate professor of genome sciences and medicine at the University of Washington, points out, while only about 2% of the human genome codes for proteins, “Hidden in the remaining 98 percent are instructions that basically tell the genes how to switch on and off.” His laboratory focuses on disease-associated variants in regulatory regions of DNA.
Mapping Noncoding Regions
Propelled by the publication of initial results in 2012 from the Encyclopedia of DNA Elements (ENCODE), a public research consortium launched by the U.S. National Human Genome Research Institute (NHGRI), scientists are increasingly identifying regulatory functions for intronic DNA.
Publishing its initial findings in a set of 30 papers in Nature, Genome Biology, and Genome Research, ENCODE indicated that the biologically active portion of human DNA was “considerably higher” than any previous estimates. In an overview paper, ENCODE reported that its members could assign biochemical functions to over 80% of the genome. Many functions involved controlling the expression levels of exons, or protein-coding material, which makes up less than 1% of the genome. ENCODE systematically mapped regions of transcription, transcription factor association, chromatin structure, and histone modification. The consortium’s participants were particularly interested in characterizing parts of the genome “outside of the well-studied protein-coding regions.”
The scientists said that many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a “statistical correspondence” to sequence variants linked to human disease, and can thereby, they concluded, guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research, the authors said.
The researchers note, however, that although noncoding sequences are associated with variants linked to human disease, predicting the consequences of genetic variation in these regions, especially those involved in gene regulation, will be challenging.
A chief goal of ENCODE has been to find the instructions, hidden in the noncoding 98% of the genome, that tell genes how to switch on and off in different kinds of cells, and to understand how they are written in the genome.
“In essence, these instructions are organized into millions of DNA ‘switches.’ These switches consist of strings of genetic letters, maybe 100 to 200 letters long, that can be thought of as sentences made up of short DNA words. The DNA words function as docking sites for special regulatory proteins,” said Dr. Stamatoyannopoulos.
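The “sentences made up of short DNA words” picture can be sketched as a simple text search. This is an illustration only: the word-to-protein table, the factor names, and the function `find_docking_sites` are hypothetical, the example switch is far shorter than the 100 to 200 letters mentioned in the quote, and real regulatory proteins tolerate variation in their docking sites that exact matching ignores.

```python
def find_docking_sites(switch: str, words: dict[str, str]) -> list[tuple[int, str, str]]:
    """Return (position, word, factor) for every exact occurrence of each word."""
    hits = []
    for word, factor in words.items():
        pos = switch.find(word)
        while pos != -1:
            hits.append((pos, word, factor))
            pos = switch.find(word, pos + 1)  # keep looking for later occurrences
    return sorted(hits)  # order the hits by position along the switch


# Hypothetical table mapping short DNA words to the proteins that dock on them.
words = {"TGACTCA": "factor_A", "GATA": "factor_B"}
# Hypothetical, shortened switch sequence.
switch = "CCGATAAGTTGACTCAGGCTGATAGC"

sites = find_docking_sites(switch, words)
for pos, word, factor in sites:
    print(f"{factor} docks at position {pos} on word {word}")
```

Here the toy switch carries three docking sites, two for one factor and one for the other, which is the sense in which a switch reads like a sentence built from a few short words.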
Splicing Enhancers and Silencers
A key study published earlier this year in Nature Structural & Molecular Biology provided evidence that introns are “vital” for RNA splicing to create protein-coding transcripts. The study’s authors, including Zefeng Wang, Ph.D., associate professor of pharmacology at the Lineberger Comprehensive Cancer Center, and Christopher B. Burge, Ph.D., professor of biology and biological engineering at MIT, explained how they used a cell-based screen to identify 10 diverse motifs that inhibit splicing from introns.
All motifs, validated in human cell types, showed exonic splicing enhancer (ESE) or exonic splicing silencer (ESS) activity, and grouping these motifs according to their distributions yielded clusters with distinct patterns of context-dependent activity. The investigators identified candidate regulatory factors associated with each motif, recovering 24 splicing regulators, some of which were previously shown to regulate splicing.
Specific domains in selected factors were sufficient to confer intronic splicing silencer (ISS) activity. Many factors bound multiple distinct motifs with similar affinity, and all motifs were recognized by multiple factors, which revealed a complex overlapping network of protein-RNA interactions.
Associating Noncoding Variants with Disease Processes
As Dr. Stamatoyannopoulos and his colleagues pointed out in a 2012 paper in Science, the average individual is expected to harbor thousands of variants within noncoding genomic regions involved in gene regulation. But, they noted, it is currently not possible to reliably interpret the functional consequences of genetic variation within any given transcription factor recognition sequence.
According to these authors, genome-wide association studies have identified many noncoding variants associated with common diseases and traits. In their own study, the investigators showed that these variants are concentrated in regulatory DNA marked by deoxyribonuclease I (DNase I) hypersensitive sites (DHSs).
Eighty-eight percent of such DHSs are active during fetal development and are enriched in variants associated with gestational exposure-related phenotypes. The investigators further identified distant gene targets for hundreds of variant-containing DHSs that may explain phenotype associations. Disease-associated variants, they noted, systematically perturb transcription factor recognition sequences, frequently alter allelic chromatin states, and form regulatory networks.
The investigators also demonstrated tissue-selective enrichment of more weakly disease-associated variants within DHSs, and they identified de novo the pathogenic cell types for Crohn’s disease, multiple sclerosis, and an electrocardiogram trait, without prior knowledge of physiological mechanisms.
Their results, the authors concluded, suggest pervasive involvement of regulatory DNA variation in common human disease and provide pathogenic insights into diverse disorders.
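What it means for variants to be “concentrated” in DHSs can be shown with a back-of-the-envelope calculation: compare the share of variants that fall inside DHS intervals with the share of the genome those intervals cover. The sketch below is not the authors’ statistical method; the coordinates, variant positions, and genome length are made-up toy values, and the function names are hypothetical.

```python
from bisect import bisect_right


def count_in_intervals(positions, intervals):
    """Count positions inside sorted, non-overlapping, half-open (start, end) intervals."""
    starts = [start for start, _ in intervals]
    count = 0
    for p in positions:
        i = bisect_right(starts, p) - 1  # last interval starting at or before p
        if i >= 0 and p < intervals[i][1]:
            count += 1
    return count


def fold_enrichment(positions, intervals, genome_length):
    """Observed fraction of positions in intervals divided by the fraction expected by chance."""
    covered = sum(end - start for start, end in intervals)
    observed = count_in_intervals(positions, intervals) / len(positions)
    expected = covered / genome_length
    return observed / expected


# Toy values for illustration only; not data from the study.
dhs = [(100, 200), (500, 600), (900, 1000)]
variants = [150, 180, 520, 700, 950]
genome_length = 2000

in_dhs = count_in_intervals(variants, dhs)
fold = fold_enrichment(variants, dhs, genome_length)
print(f"{in_dhs} of {len(variants)} variants fall in DHSs; fold enrichment = {fold:.2f}")
```

A fold value well above 1 in such a comparison is what “concentrated in regulatory DNA” means in practice, although a real analysis would also need to test whether the difference could arise by chance.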
According to Dr. Stamatoyannopoulos, the research revealed that, with diseases, “It’s not necessarily the gene but probably a network of genes that are working together.” He added that regulatory DNA sequences, or “switches,” may orchestrate entire networks. Such thoughts were affirmed by Eric Schadt, Ph.D., professor and chair of genetics and genomic sciences at Mount Sinai School of Medicine in New York. Dr. Schadt, who co-authored a perspective that accompanied the article, said, “They are affecting regions in the DNA that regulate whether genes should be expressed or not, and at what level. They are playing more of a regulatory role versus a protein-function role.”
The footnote to all of this is probably that knowing the individual gene sequences that encode specific proteins is only the beginning of understanding the complexity of the human genome. Learning how introns and other noncoding mechanisms control gene expression may ultimately shed light on many human diseases.
Patricia Fitzpatrick Dimond, Ph.D., is technical editor at Genetic Engineering & Biotechnology News.