Superimposed on the genetic sequences coding for amino acids is a second genetic code. This second genetic code, which makes use of dual-use codons, or duons, specifies how genes are controlled.
The discovery emerged from the Encyclopedia of DNA Elements Project, also known as ENCODE. Besides suggesting an explanation for codon usage bias, the discovery raises the possibility that mutations may cause disease by deranging gene control programs. Alternatively, genetic diseases may involve a combination of altered protein sequences and gene control changes.
Evidence for dual-coding DNA was presented December 13 in Science, in an article entitled “Exonic Transcription Factor Binding Directs Codon Choice and Affects Protein Evolution.” The article’s authors, based largely at the University of Washington, used genomic deoxyribonuclease I footprinting to map transcription factor (TF) occupancy at nucleotide resolution across the human exome in 81 diverse cell types. They found that about 15% of human codons are duons that simultaneously specify both amino acids and TF recognition sites.
“For over 40 years we have assumed that DNA changes affecting the genetic code solely impact how proteins are made,” said John Stamatoyannopoulos, M.D., an associate professor of genome sciences and medicine at the University of Washington. “Now we know that this basic assumption about reading the human genome missed half of the picture. These new findings highlight that DNA is an incredibly powerful information storage device, which nature has fully exploited in unexpected ways.”
The genetic code uses an alphabet of 64 three-nucleotide “letters” called codons. The University of Washington team discovered that some codons can have two meanings, one related to protein sequence and one related to gene control. These two meanings seem to have evolved in concert with each other. The gene control instructions appear to help stabilize certain beneficial features of proteins and how they are made.
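To make the idea of a codon carrying two meanings concrete, the following is a minimal Python sketch, not the method used in the study. It builds the standard 64-codon table and then flags the codons of a coding sequence that overlap a TF footprint. The function names, the toy sequence, and the footprint coordinates are all hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(amino_acid):
    """Return every codon that encodes the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def codons_in_footprints(cds, footprints):
    """Flag codons that overlap any footprint interval.

    `cds` is a coding sequence read in frame from position 0.
    `footprints` is a list of half-open (start, end) intervals in the same
    coordinates. Both are hypothetical inputs for this illustration.
    """
    hits = []
    usable_length = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for codon_start in range(0, usable_length, 3):
        codon_end = codon_start + 3
        overlaps = any(
            start < codon_end and codon_start < end
            for start, end in footprints
        )
        if overlaps:
            codon = cds[codon_start:codon_end]
            hits.append((codon_start // 3, codon, CODON_TABLE[codon]))
    return hits


# Toy example: a five-codon sequence and one made-up footprint.
toy_cds = "ATGGCTGAAAAGTAA"
toy_footprints = [(4, 9)]
duon_candidates = codons_in_footprints(toy_cds, toy_footprints)
```

In this toy case the footprint covers positions 4 through 8, so the second and third codons are flagged. Because several codons encode the same amino acid (leucine has six, for example), a cell could in principle keep the protein unchanged while choosing the codon that best matches a TF recognition site, which is how TF binding could shape codon usage bias.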
In their article, the scientists indicated that duons are highly conserved. They also noted that TF-imposed constraint appears to be a major driver of codon usage bias. “Conversely, the regulatory code has been selectively depleted of TFs that recognize stop codons,” they observed. “More than 17% of single-nucleotide variants within duons directly alter TF binding.”
While evaluating the implications of their work for determining the genetic origins of disease, the scientists focused on common disease- and trait-associated single-nucleotide variations identified by genome-wide association studies (GWAS) in coding sequences. They found that 13.5% fall within duons. “GWAS single-nucleotide polymorphisms in duons encompass both synonymous (12%) and nonsynonymous (88%) substitutions and may directly affect pathogenetic mechanisms,” they wrote in their study. “As such, disease-associated variants within duons may compromise both regulatory and/or protein-structural functions. These findings have substantial practical implications for the interpretation of genetic variation in coding regions.”