A team of investigators at Dartmouth College has developed a new process that they believe could sharply reduce the work involved in computational protein design. The new technique, described recently in a PNAS article entitled “A general-purpose protein design framework based on mining sequence-structure relationships in known protein structures,” uses 3D structural models to project how novel combinations of molecular blocks might work together to achieve the desired effect. The advancement, which focuses on a relatively small number of protein substructures rather than the virtually infinite number of atomic-level combinations, could ease the development of new medications and materials.

“When you design a building, you don’t necessarily need to understand how grains of sand interact with each other within one brick,” explained senior study investigator Gevorg Grigoryan, PhD, an associate professor of computer science at Dartmouth. “Because you know what a brick is and what its properties are, you can instead focus on how bricks come together to form the desired shape. That’s the same approach we are taking. We only focus on protein sub-structures that we know work.”

The study authors added that the “current state-of-the-art approaches to computational protein design (CPD) aim to capture the determinants of structure from physical principles. While this has led to many successful designs, it does have strong limitations associated with inaccuracies in physical modeling, such that a reliable general solution to CPD has yet to be found.”

For years, researchers have focused on building custom proteins that can be useful in the human body. For example, custom proteins can be used to develop therapeutic drugs to fight disease. However, while many therapeutics, like insulin, are produced from naturally occurring proteins, the field has not advanced far enough to allow widespread development of synthetic proteins.

Among the barriers to developing synthetic proteins is the overwhelming number of possible amino acid combinations. Sorting through combinations to find one that would be helpful in any given scenario is a time-intensive and resource-heavy process.
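To give a sense of scale, the short sketch below counts the possible sequences for a chain of a given length. It assumes only the 20 standard amino acids and ignores everything else about real proteins; the function name is illustrative and does not come from the study.

```python
# Assumption: each position can hold any of the 20 standard amino acids.
NUM_AMINO_ACIDS = 20


def sequence_space_size(length: int) -> int:
    """Count the distinct amino acid sequences of the given length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return NUM_AMINO_ACIDS ** length


for n in (10, 50, 100):
    size = sequence_space_size(n)
    # The digit count minus one gives the order of magnitude.
    print(f"length {n}: about 10^{len(str(size)) - 1} sequences")
```

Even a modest 100-residue chain has on the order of 10^130 possible sequences, which is why exhaustive searching is not an option.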

Researchers developing new drugs currently focus on how specific atoms interact. This approach requires labs to build large libraries of variants to find one that will complete the specified task. While this can produce useful results, researchers have found it challenging to build atomic models that have high levels of accuracy.

“The number of sequences is virtually infinite. This really complicates the process of finding a correct combination to fill a specific therapeutic need,” remarked lead study investigator Jianfu Zhou, a PhD student at Dartmouth.

To develop an optimized approach to protein design, the research team scanned a database of the 3D models of 150,000 known proteins. The team discovered that a small number of structural patterns frequently recurred in proteins and that much of the diversity in protein structure comes from how these building blocks are combined.

“We proposed a design framework—one based on identifying and applying patterns of sequence-structure compatibility found in known proteins, rather than approximating them from models of interatomic interactions,” the authors wrote. “We carry out extensive computational analyses and experimental validation for our method. Our results strongly argue that the Protein Data Bank is now sufficiently large to enable proteins to be designed by using only examples of structural motifs from unrelated proteins. Because our method is likely to have orthogonal strengths relative to existing techniques, it could represent an important step toward removing remaining barriers to robust CPD.”

This basic discovery led the team to hypothesize that rather than modeling proteins as complex networks of interacting atoms, they can instead represent them much more simply as groupings of a limited set of structural building blocks.

With the new method, novel protein structures can be more easily judged against established patterns. The approach allows researchers to easily experiment with more creative designs by affording the chance to check them against a library of known structures.
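The idea of checking a design against a library of known structures can be illustrated with a toy sketch. This is not the authors' method: the motif labels, the tiny "database," and the `min_support` threshold are all hypothetical, and real structural motifs are 3D arrangements rather than text labels.

```python
from collections import Counter

# Hypothetical toy "database": each known protein reduced to a list of motif labels.
KNOWN_PROTEINS = {
    "protein_a": ["helix_pair", "beta_hairpin", "helix_pair"],
    "protein_b": ["beta_hairpin", "helix_turn", "helix_pair"],
    "protein_c": ["helix_pair", "beta_hairpin"],
}


def motif_counts(proteins):
    """Count how many known proteins contain each motif at least once."""
    counts = Counter()
    for motifs in proteins.values():
        counts.update(set(motifs))  # set() so each protein counts once per motif
    return counts


def unsupported_motifs(design, counts, min_support=2):
    """Return motifs in a candidate design found in fewer than min_support known proteins."""
    return sorted({m for m in design if counts[m] < min_support})


counts = motif_counts(KNOWN_PROTEINS)
candidate = ["helix_pair", "helix_turn", "novel_loop"]
print(unsupported_motifs(candidate, counts))
```

In this toy setup, a design built only from frequently seen motifs returns an empty list, while rare or unseen motifs are flagged for closer inspection.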

“This technique takes the challenge away from getting the physics absolutely right at the atomic scale, potentially making computational protein design a much more robust process,” said Grigoryan. “Our findings should throw the doors for machine learning in protein design wide open.”

The new process focuses on the larger blocks of atoms that occur in proteins, known as tertiary motifs, to design functioning proteins. These are recurring structural arrangements—similar to an archway or column in a building—that can be applied to designing novel proteins without regard to their atomic-level composition.

Since the structures only come together in certain ways, researchers would no longer need to do the atomic-level guesswork. Researchers only focus on the blocks that fit together, ignoring those structures that would not form a functioning protein.

According to the research paper, the results “strongly argue that the Protein Data Bank is now sufficiently large to enable proteins to be designed by using only examples of structural motifs from unrelated proteins.”

By applying the new technique, the research team hopes to cut out the redundancy of rediscovering physical principles in protein structure by simply relying on those principles in the first place.
