Oryza sativa is the most common type of rice used as a food crop. [C T Johansson, CC BY 3.0, via Wikimedia Commons]
A team led by scientists from the University of Chicago (UChicago) has published a study ("Rapid evolution of protein diversity by de novo origination in Oryza") in Nature Ecology and Evolution that challenges one of the classic assumptions about how new proteins evolve. The research shows that random, noncoding sections of DNA can quickly evolve to produce new proteins. According to the scientists, these de novo, or "from scratch," genes represent a new, previously unexplored route by which proteins evolve and contribute to biodiversity.

“Using a big genome comparison, we show that noncoding sequences can evolve into completely novel proteins. That’s a huge discovery,” said Manyuan Long, PhD, the Edna K. Papazian distinguished service professor of ecology and evolution at UChicago and senior author of the new study.

“New protein-coding genes that arise de novo from noncoding DNA sequences contribute to protein diversity. However, de novo gene origination is challenging to study as it requires high-quality reference genomes for closely related species, evidence for ancestral noncoding sequences, and transcription and translation of the new genes. High-quality genomes of 13 closely related Oryza species provide unprecedented opportunities to understand de novo origination events. Here, we identify a large number of young de novo genes with discernible recent ancestral noncoding sequences and evidence of translation,” wrote the investigators.

“Using pipelines examining the synteny relationship between genomes and reciprocal-best whole-genome alignments, we detected at least 175 de novo open reading frames in the focal species O. sativa subspecies japonica, which were all detected in RNA sequencing-based transcriptomes. Mass spectrometry-based targeted proteomics and ribosomal profiling show translational evidence for 57% of the de novo genes. In recent divergence of Oryza, an average of 51.5 de novo genes per million years were generated and retained. We observed evolutionary patterns in which excess indels and early transcription were favored in origination with a stepwise formation of gene structure. These data reveal that de novo genes contribute to the rapid evolution of protein diversity under positive selection.”

For decades, scientists believed that new genes evolved in only two ways: duplication and divergence, or recombination. During the normal processes of DNA replication and repair, a section of DNA can be copied, creating a duplicate version of a gene. One of the copies may then acquire mutations that change its function enough that it diverges and becomes a distinct new gene. In recombination, pieces of genetic material are reshuffled to create new combinations and new genes. However, these two mechanisms account for only a relatively small number of proteins, given the vast number of possible combinations of the amino acids that make them up.

Scientists have long wondered about a third mechanism, in which de novo genes evolve from scratch. All organisms have long stretches of genetic material that do not encode proteins, in some cases up to 97% of the genome. Is it possible for these noncoding sections to acquire mutations that suddenly make them functional?

This has been difficult to study because it requires high-quality reference genomes from several closely related species that show both the ancestral noncoding sequences and the new genes that subsequently evolved from them. Without this clear, visible line of descent, there is no way to prove that a gene truly arose de novo. Genes previously reported as de novo could instead be "orphan genes" that diverged from existing genes or were transferred from unrelated organisms at some point, after which all traces of their predecessors disappeared.

To overcome these challenges, Long's team took advantage of 13 genomes recently sequenced and annotated from 11 closely related species of rice plants, including Oryza sativa, the most common food crop. Long worked with groups headed by Rod Wing, PhD, at the University of Arizona. Yidan Ouyang, PhD, of Huazhong Agricultural University, China, also led a team that cultivated its own rice plants in Hainan, a tropical island off the southern coast of China, and harvested them for proteomics sampling.

After analyzing the genomes of these plants, the researchers detected at least 175 de novo genes. A further mass spectrometry analysis of the plants' proteins was conducted by another group, led by Siqi Liu, PhD, at BGI-Shenzhen, a genome sequencing center in Shenzhen, China. Together with ribosome profiling, this work provided evidence that 57% of these genes are actually translated into new proteins, and it identified more than 300 new peptides.

With this first large dataset of authentic de novo genes, Long's team detected a pattern in their evolution. For almost all of the de novo genes, expression (transcription) evolved first, and mutations that conferred protein-coding potential followed.

“This makes sense given the widely observed expression of intergenic regions in various organisms,” said Li Zhang, PhD, a postdoctoral researcher at UChicago and lead author of the article.

Long said that Oryza plants are good candidates in which to search for de novo genes because the species are relatively young, so evidence of their evolution is still visible in their existing genomes.

“The 11 species diverged from each other only about three to four million years ago, so they are all young species,” he said. “For that reason, when we sequence the genomes, all the sequences are highly similar. They haven’t accumulated multiple generations of changes, so all the previous non-coding sections are still there.”

Long and his team next want to study the new proteins to better understand their function and evolution, and to see whether there is something unique about their structure. If de novo genes open up an unexplored path for evolution, they could reveal mechanisms for creating new and improved cellular functions. For instance, the researchers detected evidence that natural selection acted to fix insertions and deletions in the genome that generated new protein sequences, and that it drove those sequences to evolve toward improved functions.

“The new proteins may make certain functions better, or help regulate the genes better,” he said. “Each step of the way, they can bring some kind of benefit to the organism until it gradually becomes fixed in the genome.”
