Protein designers want to venture boldly into uncharted protein space, but they tend to explore only so far as their computational models and experimental screens allow. Essentially, protein designers hug the shore of what is known about the folding of native proteins. Unfortunately, this approach can take protein designers in the wrong direction. Native proteins have evolved for function, not stability, so “natural” protein space may give a limited view of the connections between amino acid sequences and folded, three-dimensional protein structures.
A broader view, one that would survey natural and unnatural protein structures, would be instructive, but to date, screens of computationally derived structures have been limited to 50 to 100 designs. Screens of thousands, however, have been demonstrated by scientists based at the University of Toronto and the University of Washington. These scientists, led by Toronto’s Cheryl Arrowsmith, Ph.D., and Washington’s David Baker, Ph.D., have taken data-driven, high-throughput protein design well beyond the old protein-folding coastline.
The researchers tested more than 15,000 newly designed miniproteins that do not exist in nature to see whether they form folded structures. “We learned a huge amount at this new scale, but the taste has given us an even larger appetite,” said Gabriel Rocklin, Ph.D., a researcher in Dr. Baker’s laboratory. “We're eager to test hundreds of thousands of designs in the next few years.”
Details of this work appeared July 14 in the journal Science, in an article entitled “Global Analysis of Protein Folding Using Massively Parallel Design, Synthesis, and Testing.” This article describes a testing project that led to the design of 2788 stable protein structures and could have many bioengineering and synthetic biology applications.
“We combined computational protein design, next-generation gene synthesis, and a high-throughput protease susceptibility assay to measure folding and stability for more than 15,000 de novo designed miniproteins, 1000 natural proteins, 10,000 point mutants, and 30,000 negative control sequences,” wrote the article’s authors. “This analysis identified more than 2500 stable designed proteins in four basic folds—a number sufficient to enable us to systematically examine how sequence determines folding and stability in uncharted protein space.”
For decades, researchers have studied protein folding by examining the structures of naturally occurring proteins. However, natural protein structures are typically large and complex, with thousands of interactions that collectively hold the protein in its folded shape. With so many interactions acting at once, measuring the contribution of any single one is very difficult.
The scientists addressed this problem by computationally designing their own much simpler proteins. These simpler proteins made it easier to analyze the different types of interactions that hold all proteins in their folded structures.
“Still, even simple proteins are so complicated that it was important to study thousands of them to learn why they fold,” explained Dr. Rocklin. “This had been impossible until recently, due to the cost of DNA. Each designed protein requires its own customized piece of DNA so that it can be made inside a cell. This has limited previous studies to testing only tens of designs.”
To encode their short protein designs, the researchers used a technology known as DNA oligo library synthesis, which was originally developed for other laboratory protocols, such as large gene assembly. One of the companies that provided their DNA is CustomArray in Bothell, WA. The researchers also used DNA libraries made by Agilent and Twist Bioscience.
By repeating the cycle of computation and experimental testing over several iterations, the researchers learned from their design failures and progressively improved their modeling. Their design success rate rose from 6% to 47%. They also produced stable proteins in shapes where all of their first designs failed.
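The success rate quoted here is simply the fraction of tested designs that turned out to be stable in a given round. As a minimal sketch of that arithmetic, the Python below computes the rate for two rounds. The function name `success_rate` and the per-round counts are hypothetical, chosen only so that the output matches the 6% and 47% figures reported above; they are not the study's actual data.

```python
def success_rate(stable_flags):
    """Return the fraction of designs flagged as stable (True)."""
    if not stable_flags:
        raise ValueError("need at least one tested design")
    return sum(stable_flags) / len(stable_flags)


# Hypothetical outcomes, one boolean per tested design.
# The counts are illustrative only; they are picked to reproduce
# the 6% and 47% rates quoted in the article.
rounds = {
    "first_round": [True] * 6 + [False] * 94,
    "last_round": [True] * 47 + [False] * 53,
}

rates = {name: success_rate(flags) for name, flags in rounds.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.0%} of designs stable")
```

In a real design loop, the failed designs from each round would also be inspected to adjust the computational model before the next round, which is the part of the cycle this sketch leaves out.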
Their large set of stable and unstable miniproteins enabled them to quantitatively analyze which protein features correlated with folding. They also compared the stability of their designed proteins to similarly sized, naturally occurring proteins.
The most stable natural protein the researchers identified was a much-studied protein from the bacterium Bacillus stearothermophilus. This organism thrives at high temperatures, like those in hot springs and ocean thermal vents. Most proteins lose their folded structures under such high-temperature conditions. Organisms that thrive there have evolved highly stable proteins that stay folded even when hot.
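“A total of 774 designed proteins had higher stability scores than this most protease-resistant monomeric protein,” the researchers noted. Proteases, enzymes that break down proteins, were the essential tools the researchers used to measure stability for their thousands of proteins. A stably folded protein resists being cut by a protease better than an unfolded one does, so resistance to digestion serves as a readout of stability.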
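The comparison behind a figure like 774 amounts to counting how many designs score above a reference protein. The sketch below shows that count in Python, assuming each design has a single numeric stability score where higher means more protease-resistant. The function name `count_more_stable`, the design names, and all score values are hypothetical placeholders, not measurements from the study.

```python
def count_more_stable(design_scores, reference_score):
    """Count designs whose stability score exceeds the reference score."""
    return sum(1 for score in design_scores if score > reference_score)


# Hypothetical stability scores (arbitrary units, higher = more stable).
designed = {
    "design_a": 2.1,
    "design_b": 0.4,
    "design_c": 1.8,
    "design_d": 1.2,
}
# Hypothetical score for the most stable natural protein in the set.
natural_reference = 1.5

n_better = count_more_stable(designed.values(), natural_reference)
print(f"{n_better} of {len(designed)} designs exceed the reference")
```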
The researchers predicted that, as DNA synthesis technology continues to improve, high-throughput protein design will become possible for larger, more complex protein structures.
“We are moving away from the old style of protein design, which was a mix of computer modeling, human intuition, and small bits of evidence about what worked before,” asserted Dr. Rocklin. “Protein designers were like master craftsmen who used their experience to hand-sculpt each piece in their workshop. Sometimes things worked, but when they failed it was hard to say why. Our new approach lets us collect an enormous amount of data on what makes proteins stable. This data can now drive the design process.”