Artificial intelligence (AI) keeps growing more powerful as a tool for predicting protein structures. It is no longer limited to the study of protein monomers; it is beginning to take on protein complexes. However, AI has been better at modeling protein complexes in prokaryotes than in eukaryotes. Why? When AI tries to identify protein pairs that may interact with each other, it performs better if evolutionary information is available, and lots of it. That is certainly the case with prokaryotes.
Prokaryotic species vastly outnumber eukaryotic species, and far more prokaryotic genomes have been sequenced. Consequently, with prokaryotic species, there are many more opportunities to detect proteins that have coevolved. Consider the case of a two-protein complex. Each protein in such a complex presumably has an interaction surface that complements an interaction surface on the other protein, so any mutation affecting one protein's interface must be accompanied by a compensating mutation in the other protein's interface. Otherwise, the proteins would eventually cease to interact. Across many species, these paired changes leave a statistical signature that can be detected in sequence alignments.
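To make the coevolution idea concrete, here is a minimal Python sketch; it is not the study's method. It assumes a paired alignment in which row k of protein A's alignment and row k of protein B's alignment come from the same species, and it scores every pair of positions by mutual information, a simple measure of covariation swapped in for illustration. The study relied on far more sophisticated statistical and deep-learning analyses. The function names and the four-species toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns.

    Assumes col_a[k] and col_b[k] come from the same species, i.e. row k
    of a paired alignment of protein A and protein B.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must come from the same paired rows")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), c in count_ab.items():
        p_xy = c / n
        p_x = count_a[x] / n
        p_y = count_b[y] / n
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi


def top_coupled_pair(msa_a, msa_b):
    """Return (score, i, j) for the most strongly covarying position pair,
    where i indexes protein A and j indexes protein B."""
    best = None
    for i in range(len(msa_a[0])):
        col_a = [row[i] for row in msa_a]
        for j in range(len(msa_b[0])):
            col_b = [row[j] for row in msa_b]
            score = column_mutual_information(col_a, col_b)
            if best is None or score > best[0]:
                best = (score, i, j)
    return best


# Hypothetical toy paired alignment: one row per species.
# Position 1 of protein A and position 2 of protein B swap charges together
# (K/E versus E/K), mimicking a compensating mutation at an interface.
msa_a = ["AKL", "AKV", "AEL", "AEV"]
msa_b = ["GSE", "GTE", "GTK", "GSK"]

score, i, j = top_coupled_pair(msa_a, msa_b)
print(f"A position {i} <-> B position {j}: MI = {score:.3f} nats")
```

On this toy input the sketch should flag A position 1 and B position 2. Plain mutual information is easy to compute, but it is also inflated by shared ancestry and by indirect couplings, which is one reason large-scale screens need many diverse sequences and more careful statistical models.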
Besides the sheer number of prokaryotic species, other factors simplify the analysis of prokaryotic protein complexes. For example, prokaryotic genomes encode fewer proteins, and alternative splicing and rounds of whole-genome duplication, which create closely related paralogs that make it harder to pair interacting proteins correctly, are rare. These factors reduce the "noise" that scientists must deal with when they attempt to model prokaryotic protein complexes computationally.
Although the computational analysis of eukaryotic protein complexes is more challenging, researchers based at the University of Washington's Institute for Protein Design and at the University of Texas Southwestern Medical Center were undaunted. They decided to tackle the challenge using two deep-learning-based structure prediction methods, RoseTTAFold and AlphaFold. RoseTTAFold, which was developed at the University of Washington, was used to compute contact probabilities for protein pairs. AlphaFold, which was developed by the Alphabet subsidiary DeepMind, was used to re-evaluate interaction probabilities and to model complex structures.
By taking advantage of proteome-wide amino acid coevolution analysis and deep-learning-based structure modeling, the scientists systematically identified and built accurate models of core eukaryotic protein complexes within the proteome of the yeast Saccharomyces cerevisiae. The scientists detailed this work in an article ("Computed structures of core eukaryotic protein complexes") that appeared November 11 in the journal Science.
“[We screened] through paired multiple sequence alignments for 8.3 million pairs of yeast proteins,” the article’s authors wrote. “[We identified] 1,505 likely to interact and [built] structure models for 106 previously unidentified assemblies and 806 that have not been structurally characterized. These complexes, which have as many as five subunits, play roles in almost all key processes in eukaryotic cells and provide broad insights into biological function.”
These findings have implications for understanding the biochemical processes that are common to all animals, plants, and fungi.
As part of a multi-institutional collaboration, the lab of David Baker at the UW Medicine Institute for Protein Design helped lead this work.
“To really understand the cellular conditions that give rise to health and disease, it’s essential to know how different proteins in a cell work together,” said David Baker, PhD, the director of the Institute for Protein Design and one of the current study’s two senior authors. “In this paper, we provide detailed information on protein interactions for nearly every core process in eukaryotic cells. This includes over a hundred interactions that have never been seen before.”
Proteins, the workhorses of all cells, rarely act alone. Different proteins often must fit together to form precise complexes that carry out specific tasks. These can include reading genes, digesting nutrients, and responding to signals from neighboring cells and the outside world. When protein complexes malfunction, disease can result.
“This work shows that deep learning can now generate real insights into decades-old questions in biology—not just what a particular protein looks like, but also which proteins come together to interact,” added Qian Cong, PhD, assistant professor of biophysics at the University of Texas Southwestern Medical Center and the study’s other senior author.
The hundreds of protein complexes for which detailed structures were generated provide rich insights into how cells function. For example, one complex contains the protein RAD51, which is known to play a key role in DNA repair and cancer progression in humans. Another includes the poorly understood enzyme glycosylphosphatidylinositol transamidase, which has been implicated in neurodevelopmental disorders and cancer in humans. Understanding how these and other proteins interact may open the door to the development of new medications for a wide range of health disorders.
The protein structures generated in this work are available to download from the ModelArchive. The researchers thank and remember the late John Westbrook at the Protein Data Bank for his support in establishing formats and software code to allow efficient deposition of the models into the archive. The Science paper reporting the results was in preparation at the time of Westbrook’s death.