Researchers from Northwestern University and the Institute of Industrial Science at the University of Tokyo say they have developed a new high-throughput approach, known as cDNA display proteolysis, to evaluate the folding stability of nearly a million proteins in a single experiment.
A protein that fails to fold properly or to maintain its three-dimensional structure can lose its function and cause disease. Insight into how protein folding stability is maintained will shed further light on diseases involving misfolded proteins.
However, it has previously been difficult to evaluate protein folding stability efficiently and at large scale, according to the research team. They therefore sought to develop a platform to assess protein folding stability in a reproducible, high-throughput way. Their study, “Mega-scale experimental analysis of protein folding stability in biology and design,” appears in Nature.
“Advances in DNA sequencing and machine learning are providing insights into protein sequences and structures on an enormous scale. However, the energetics driving folding are invisible in these structures and remain largely unknown. The hidden thermodynamics of folding can drive disease, shape protein evolution, and guide protein engineering, and new approaches are needed to reveal these thermodynamics for every sequence and structure,” the investigators wrote.
“Here we present cDNA display proteolysis, a method for measuring thermodynamic folding stability for up to 900,000 protein domains in a one-week experiment. From 1.8 million measurements in total, we curated a set of around 776,000 high-quality folding stabilities covering all single amino acid variants and selected double mutants of 331 natural and 148 de novo designed protein domains 40–72 amino acids in length. Using this extensive dataset, we quantified (1) environmental factors influencing amino acid fitness, (2) thermodynamic couplings (including unexpected interactions) between protein sites, and (3) the global divergence between evolutionary amino acid usage and protein folding stability.
Identifying stability determinants
“We also examined how our approach could identify stability determinants in designed proteins and evaluate design methods. The cDNA display proteolysis method is fast, accurate, and uniquely scalable, and promises to reveal the quantitative rules for how amino acid sequences encode folding stability.”
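For readers unfamiliar with the terms in the abstract, the textbook definitions can be written out as follows. This is a general sketch, not notation taken from the study: the symbols (K_U, ΔG, and the A/B mutation labels) are assumptions chosen for illustration. Folding stability is conventionally expressed as the free energy of unfolding, and a thermodynamic coupling between two sites is the part of a double mutant's stability change that the two single mutants do not account for.

```latex
% Folding stability as the free energy of unfolding.
% K_U is the unfolding equilibrium constant, [U] and [F] are the
% unfolded and folded populations, R is the gas constant, T is temperature.
\[
  K_U = \frac{[U]}{[F]}, \qquad
  \Delta G_{\mathrm{unfold}} = -RT \ln K_U = RT \ln \frac{[F]}{[U]}
\]

% Effect of a single mutation A relative to wild type (WT).
\[
  \Delta\Delta G_{A} = \Delta G_{A} - \Delta G_{\mathrm{WT}}
\]

% Double mutant cycle: coupling between sites A and B.
% Zero means the two mutations act additively.
\[
  \Delta\Delta G_{\mathrm{coupling}}
    = \Delta G_{AB} - \Delta G_{A} - \Delta G_{B} + \Delta G_{\mathrm{WT}}
\]
```

With this sign convention, a larger positive unfolding free energy means a more stable folded state, and a nonzero coupling term indicates that the two sites influence each other's contribution to stability.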
“We began with a technique in which proteins are attached to their own DNA,” said Kotaro Tsuboyama, PhD, lead author of the study. “Using DNA libraries, we generated a large number of these protein–DNA complexes and treated them with enzymes that destroy unfolded proteins. The intact proteins, which were able to maintain their folded structures during enzyme treatment, were then identified using DNA sequencing.”
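The readout Tsuboyama describes can be illustrated with a toy calculation: count how often each sequence appears in the DNA sequencing data before and after enzyme treatment, then compare the two. The sketch below is a simplified illustration only, not the authors' analysis pipeline. The function name, the pseudocount, the variant labels, and the read counts are all hypothetical, and turning such data into thermodynamic stabilities requires considerably more modeling than a single enrichment ratio.

```python
import math


def survival_scores(counts_before, counts_after, pseudocount=1.0):
    """Return a log2 enrichment score per sequence after protease treatment.

    counts_before / counts_after map a sequence label to its read count.
    A pseudocount (an assumption for this sketch) avoids division by zero
    for sequences that disappear entirely after treatment.
    """
    n = len(counts_before)
    total_before = sum(counts_before.values()) + pseudocount * n
    total_after = sum(counts_after.get(s, 0) for s in counts_before) + pseudocount * n

    scores = {}
    for seq, reads_before in counts_before.items():
        # Convert raw counts to frequencies so sequencing depth cancels out.
        freq_before = (reads_before + pseudocount) / total_before
        freq_after = (counts_after.get(seq, 0) + pseudocount) / total_after
        scores[seq] = math.log2(freq_after / freq_before)
    return scores


# Hypothetical read counts for two variants (made-up numbers).
before = {"stable_variant": 1000, "unstable_variant": 1000}
after = {"stable_variant": 900, "unstable_variant": 100}

scores = survival_scores(before, after)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:+.2f}")
```

In this toy example, the variant that keeps most of its reads after treatment receives a positive score and the one that loses most of its reads receives a negative score, mirroring the idea that folded proteins survive the enzymes while unfolded ones are destroyed.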
This method allowed the research team to evaluate the stability of up to 900,000 protein sequences in a single test tube. To examine how individual elements within a protein sequence affect folding stability, the researchers used this method to analyze a series of natural and designed protein domains.
“We were able to identify a number of factors that contribute to protein stability,” said senior author Gabriel J. Rocklin, PhD, at Northwestern University. “We also used our approach to analyze the effects of specific mutations in protein sequences, and to identify determinants of stability in designed proteins, providing insight that can help advance protein design methods in the future.”
While previous methods for assessing protein stability have been limited to evaluating individual protein sequences, the cDNA display proteolysis method permits the evaluation of many proteins in a single experiment, supplying an unprecedented amount of information about protein stability. This approach may advance the development of new predictive models of protein folding, which in turn could further our understanding of diseases involving protein misfolding.