Scientists from the Stowers Institute for Medical Research say they have created a new way to quickly and efficiently define individual protein associations. Their study ("Topological scoring of protein interaction networks"), published in Nature Communications, shows how the topological scoring (TopS) algorithm, developed by Stowers researchers, can combine data sets to identify proteins that come together.

A cluster map showing the profiles of bait proteins (rows) that associate with human DNA repair and epigenetic proteins (columns) based on high topological scoring (TopS) values. Yellow (high TopS score) indicates a higher protein interaction preference. [Washburn Lab, Stowers Institute for Medical Research]
Not only does this help researchers identify how proteins perform biological functions or carry out biological processes, but the algorithm can also be applied to previously generated biological data, and potentially to other areas of science, to glean new information, the scientists explained.

“It remains a significant challenge to define individual protein associations within networks where an individual protein can directly interact with other proteins and/or be part of large complexes, which contain functional modules. Here we demonstrate the topological scoring (TopS) algorithm for the analysis of quantitative proteomic datasets from affinity purifications,” they wrote.

“Data is analyzed in a parallel fashion where a prey protein is scored in an individual affinity purification by aggregating information from the entire dataset. Topological scores span a broad range of values indicating the enrichment of an individual protein in every bait protein purification. TopS is applied to interaction networks derived from human DNA repair proteins and yeast chromatin remodeling complexes. TopS highlights potential direct protein interactions and modules within complexes. TopS is a rapid method for the efficient and informative computational analysis of datasets, is complementary to existing analysis pipelines, and provides important insights into protein interaction networks.”

The approach is similar to observing the activities and interactions of every individual in a community and then picking out the most meaningful interactions, some of which may be very rare. The researchers are looking for the biological equivalent of two individuals who may be the only pair in the entire community to take part in an important interaction.

“It’s a form of big data analysis that we are applying to proteomics data to identify and understand protein interaction networks,” said Michael Washburn, PhD, director of the Stowers Proteomics Center. “It’s complementary to a lot of techniques already in use so it can be used to ask and answer new questions.”

Protein data sets can be challenging to examine for meaningful information because they are so large. “You have thousands of proteins to look at,” said Mihaela Sardiu, PhD, a senior research specialist at Stowers. Understanding how a wide variety of proteins come together to do something, like repair DNA, is a difficult problem. “We wanted to simplify the problem.”

That meant that instead of taking an overall view of everything, they hunted for less common events. The researchers did this by examining baits (proteins already known to be involved in processes of interest) and preys (proteins that could interact with bait proteins) to see how they interacted in human DNA repair and yeast chromatin remodeling complexes. With TopS, data is analyzed in parallel, meaning that data from several biologically related baits are considered at the same time.
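To make the idea of parallel analysis concrete, the short Python sketch below gathers results from several bait purifications into a single bait-by-prey table, so that each prey can be read across all baits at once. It is an illustration only, not the Stowers code: the record format (bait, prey, quantitative value such as a spectral count), the function names build_bait_prey_matrix and prey_profile, and the choice to sum replicate purifications are all hypothetical assumptions.

```python
from collections import defaultdict


def build_bait_prey_matrix(records):
    """Collect (bait, prey, value) records from many purifications into one table.

    Hypothetical sketch: `value` is assumed to be a quantitative measurement
    such as a spectral count. Replicate purifications of the same bait are
    summed here, which is an assumption; averaging or normalizing could be
    used instead.
    """
    matrix = defaultdict(lambda: defaultdict(float))
    for bait, prey, value in records:
        matrix[bait][prey] += value

    preys = sorted({prey for _, prey, _ in records})
    # Give every bait a value for every prey (0.0 if the prey was never
    # detected with that bait), so all baits share the same columns.
    return {
        bait: {prey: matrix[bait].get(prey, 0.0) for prey in preys}
        for bait in sorted(matrix)
    }


def prey_profile(matrix, prey):
    """Return one prey's values across every bait (a column of the table)."""
    return {bait: row[prey] for bait, row in matrix.items()}


if __name__ == "__main__":
    example_records = [
        ("BAIT_A", "PREY_1", 40),
        ("BAIT_A", "PREY_2", 10),
        ("BAIT_B", "PREY_2", 12),
        ("BAIT_B", "PREY_2", 8),  # a replicate purification of BAIT_B
    ]
    table = build_bait_prey_matrix(example_records)
    print(prey_profile(table, "PREY_1"))
```

Filling missing entries with zero is a design choice that keeps every bait on the same set of columns, which is what allows a prey to be compared across baits side by side.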

A key attribute of TopS is the ability to evaluate the preference of a prey protein for a bait relative to other baits. “Instead of calculating a score by concentrating only information of a single bait, we now aggregate information from the entire data set,” explained Sardiu.
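The notion of scoring a prey's preference for one bait relative to all the others can be illustrated with a standard contingency-table statistic. The Python sketch below compares each observed value with the value expected if the prey were spread across baits in proportion to each bait's total signal, then scales the difference as a Pearson-style residual. This is a generic stand-in chosen for illustration, not the published TopS formula; the function name preference_scores and the example counts are hypothetical.

```python
import math


def preference_scores(matrix):
    """Score every (bait, prey) cell against what the whole dataset predicts.

    Hypothetical sketch using a Pearson-style residual, NOT the published
    TopS formula. `matrix` maps bait -> {prey: quantitative value}; preys
    missing from a bait count as 0. Positive scores mean the prey is
    over-represented in that bait relative to the other baits.
    """
    baits = list(matrix)
    preys = sorted({prey for row in matrix.values() for prey in row})
    bait_totals = {b: sum(matrix[b].values()) for b in baits}
    prey_totals = {p: sum(matrix[b].get(p, 0) for b in baits) for p in preys}
    grand_total = sum(bait_totals.values())
    if grand_total == 0:
        raise ValueError("matrix contains no signal")

    scores = {}
    for b in baits:
        scores[b] = {}
        for p in preys:
            observed = matrix[b].get(p, 0)
            # Expected value if this prey's total signal were divided among
            # baits in proportion to each bait's overall signal.
            expected = bait_totals[b] * prey_totals[p] / grand_total
            scores[b][p] = (observed - expected) / math.sqrt(expected) if expected > 0 else 0.0
    return scores


if __name__ == "__main__":
    counts = {
        "BAIT_A": {"PREY_1": 40, "PREY_2": 10, "PREY_3": 10},
        "BAIT_B": {"PREY_1": 5, "PREY_2": 10, "PREY_3": 12},
        "BAIT_C": {"PREY_2": 10, "PREY_3": 11},
    }
    scores = preference_scores(counts)
    print(scores["BAIT_A"]["PREY_1"])  # 3.0: PREY_1 strongly prefers BAIT_A
```

Because every expected value depends on row and column totals, each score draws on the entire data set rather than on a single bait in isolation, which is the property Sardiu describes.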

Washburn and Sardiu believe that TopS can be applied to a wide range of data sets beyond proteomics, both in basic research and in other fields. Sardiu sees potential in using it for healthcare data, where physicians might be able to compare one patient's health with that of others, for example to tell whether a patient's disease is "really advanced compared to others or not," she said.

The team has also made the TopS algorithm available on GitHub, an online code repository, so that other researchers can test it and see how they might apply it to their own projects.

“We’re excited to see how far this can go. It’s a potentially high impact tool and we want to see what other creative and innovative people can come up with,” said Washburn. “We think this is a really valuable potential tool for a lot of people out there who struggle with the challenge of sorting through very large-scale data.”
