Researchers from 20 institutions worldwide have produced a standardized catalog of more than 7,200 human gene segments that may encode proteins. These gene segments, called open reading frames (ORFs), were identified using ribosome profiling (Ribo-seq), a technique that provides a global snapshot of protein synthesis in a cell.

“These ORFs almost certainly will be contributing factors to many human traits and diseases, both rare diseases and common ones such as cancer,” said John Prensner, MD, PhD, a physician and postdoctoral fellow from the Broad Institute of MIT and Harvard. “The challenge is now to figure out which ones have which roles in which diseases.”

The work was published in Nature Biotechnology in a paper titled, “Standardized annotation of translated open reading frames.”

In recent years, the Ribo-seq technique has led to the surprising discovery of prevalent translation in regions of the genome that were previously assumed to be inactive. These regions include presumed untranslated regions (UTRs) and long noncoding RNAs (lncRNAs). The identified ORFs are often very small.

Several authors of the current paper previously identified ORFs using Ribo-seq and described them in various scientific journals, including Cell, Science, and Nature Chemical Biology. Certain Ribo-seq ORFs are known to mediate gene regulation, and several have medical implications. Yet none of these ORFs were included in reference databases after initial publication.

“Ultimately, the absence of standardized ORF annotation has created a circular problem: while Ribo-seq ORFs remain unrecognized by reference annotation databases, this lack of recognition will thwart studies examining their roles,” the authors wrote.

“Here, as ‘Phase I’ of this work, we present a consolidated catalog of Ribo-seq ORFs from seven publications annotated onto GENCODE version 35,” they continued. “We hope community usage of this catalog will help address the key technical and biological questions necessary to move this work into ‘Phase II,’ where we aim to create a more comprehensive resource.”

During Phase II, the researchers plan to incorporate a greater diversity of human cell types and tissues so that they can identify which Ribo-seq ORFs are functionally important.

“It is especially remarkable that most of these 7,200 ORFs are exclusive to primates and might represent evolutionary innovations unique to our species,” added Jorge Ruiz-Orera, PhD, an evolutionary biologist from the Max Delbrück Center for Molecular Medicine in the Helmholtz Association in Germany. “These elements can provide important hints of what makes us humans.”

The effort was co-led by Prensner and Ruiz-Orera, along with Sebastiaan van Heesch, PhD, from the Princess Máxima Center for Pediatric Oncology in the Netherlands, and Jonathan Mudge, PhD, from the European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI) in the United Kingdom.