Approach published in PNAS combines covariance with a technique called message passing.

Researchers have devised a computational technique that better predicts how bacterial proteins fold and interact. They used the established method of covariance analysis but went a step further by feeding the covariance data into a message-passing program. The result is more accurate information with fewer reported indirect interactions.

The message-passing technique is dependent on the availability of extensive genomic data, and some 800 or so bacterial genomes have been fully sequenced. Applying message passing to animals will have to wait until a similar volume of genomic data is available for them. Ultimately, some form of the technique could identify important protein interactions in humans, which would open a wealth of new drug-targeting possibilities.

Covariance is a statistical method that involves studying the amino acids found at specific locations on various protein sequences culled from genomics data. For a pair of proteins, it distinguishes residue positions that vary together from residue positions that vary at random.
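As a rough illustration of what "varying together" means, the sketch below scores two alignment columns with mutual information. This is an assumption made for the example: the article does not say which covariance statistic the researchers used, and the function name and toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns.

    col_a[i] and col_b[i] are the amino acids seen at two residue
    positions in the i-th organism's pair of protein sequences.
    Illustrative statistic only; not necessarily the one used in the study.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must come from the same set of sequences")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))

    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Toy columns (made up): the first pair changes in lockstep,
# the second pair shows no relationship.
covarying = mutual_information("AAVVLL", "DDEEKK")
independent = mutual_information("AAVV", "DEDE")
```

Positions that change in lockstep score high and unrelated positions score near zero. A high score alone does not show that two residues touch, which is the gap the message-passing step is meant to close.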

In about a week of computing time, the message-passing program analyzed a mass of information and identified the highest-ranking patterns. Continued analysis ultimately yielded predictions about which pairings were in fact direct interactions.

Though covariance has proven quite effective at identifying critical residues that bind directly with other proteins or other spots on the same protein, the method also identifies a high percentage of residues that turn out not to be involved in these direct interactions.

To winnow out such indirect interactions, the group focused on the proteins involved in the well-studied two-component signaling system, which is responsible for a range of critical functions in bacteria. The first step of the work was to analyze the many proteins involved in this system by applying standard covariance techniques to available genomics data. The full analysis included about 2,500 different protein pairings and considered the potential interactions between about 100 residues on each protein in a pair.

For a given protein binding site, the message-passing program identified, on average, 10 direct interactions accurately before giving a single false positive. Given that researchers can identify the active binding site for proteins by knowing as few as three directly interacting residues, this success rate is more than enough to identify a new drug target, according to the investigators. In the case of proteins that interact with themselves, there were 23 correct pairings identified before a first false positive.
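The "correct pairings before a first false positive" figure can be read as a simple count over a ranked list of predictions. The sketch below shows one way to compute it, assuming the predictions are sorted from most to least confident and a reference set of known direct contacts is available. The function name and the residue pairs are hypothetical and are not data from the study.

```python
def hits_before_first_false_positive(ranked_pairs, true_contacts):
    """Count correct predictions at the top of a ranked list.

    ranked_pairs: predicted residue pairs, most confident first.
    true_contacts: set of pairs known to interact directly.
    Counting stops at the first prediction not in true_contacts.
    """
    hits = 0
    for pair in ranked_pairs:
        if pair not in true_contacts:
            break
        hits += 1
    return hits


# Made-up example: residue index pairs (position on protein A, position on protein B).
known = {(12, 47), (15, 50), (19, 53), (22, 60)}
predicted = [(12, 47), (19, 53), (15, 50), (30, 71), (22, 60)]
score = hits_before_first_false_positive(predicted, known)
```

In this toy case the first three predictions are correct and the fourth is not, so the count is three, which matches the minimum the article says is needed to locate a binding site.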

“It’s really the last frontier in proteins,” says team leader James Hoch, Ph.D., a professor at The Scripps Research Institute, “figuring out who they interact with and the structures they make.”

The article was written by scientists from Scripps Research Institute and the University of California, San Diego. It appears in this week’s early edition of the Proceedings of the National Academy of Sciences.
