Scientists at Scripps Research have adapted a computational approach usually used to pinpoint the best spot for an oil well to help design potential new treatments for rare genetic diseases such as cystic fibrosis (CF). By using the method to analyze the spatial relationships between different variants of a protein, rather than the relationships between test wells across a landscape, the researchers found that they could obtain valuable information on how disease affects a protein’s underlying shape, and how drugs can restore that shape to normal.
The new method requires only a handful of gene sequences collected from people with the disease. It then determines how the structure of each corresponding protein variant is associated with its function, and how this functional structure can affect pathology and be repaired by therapeutics. To demonstrate its utility, the Scripps Research team used the method to explain why existing drugs for CF fall short of curing the disease.
“This is an important step forward for treating rare diseases,” said senior author William Balch, PhD, professor of molecular medicine at Scripps Research. “The fact that we can get so much information from a few gene sequences is really unprecedented.”
Balch, together with colleagues Chao Wang, PhD, and Frédéric Anglès, PhD, reported on their study in Structure, in a paper titled, “Triangulating variation in the population to define mechanisms for precision management of genetic disease.”
Variation is foundational for driving evolvability and diversity in biology, the authors wrote. “…understanding genetic variation in the population is critical to reveal the functional mechanisms of wild-type (WT) protein fold, which, when disrupted by variation, strongly impacts the individual’s susceptibility and clinical presentation of disease.”
Studies on inherited diseases often rely on techniques that determine the precise three-dimensional shape of a protein affected by disease. But genetic diseases can be caused by dozens—or even hundreds or thousands—of different variants of the same gene. Some of these variants destabilize or change the protein shape in ways that make isolating the protein for further investigation much more difficult than usual.
Balch, with Scripps Research senior staff scientist Wang and staff scientist Anglès, wanted to use natural variation to their advantage. For most genes in the human genome, numerous variants exist in the human population; some of these variants cause disease, while others have little impact on biology and go unnoticed. The team had previously shown that genetic diversity contributing to protein sequence diversity can be framed through “variation spatial profiling,” or VSP, which enables a better understanding of the “sequence-to-structure” relationships that drive the function of a protein fold. VSP applies a form of probabilistic machine learning known as Gaussian process regression (GPR) to construct “phenotypic landscapes” based on spatial covariance (SCV) in biological systems, they explained. “SCV defines the covariant connections between the genetic variants found in the population by linking sequence to the multi-dimensional functional and structural roles of the protein fold in biology.”
Building on this, the team’s newly developed variation-capture (VarC) mapping analyzes the natural array of gene sequences and determines the mechanism by which each variant changes a protein’s structure to cause disease. Balch’s group integrated GPR machine learning and statistical tools into VarC, which, with only a few gene sequences, let the researchers determine the most likely structural mechanisms driving function for each disease-causing variant, as well as model how drugs affect those structural functions.
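To give a feel for how Gaussian process regression builds a landscape from a few measured points, here is a minimal sketch in plain Python. It is not the authors’ VSP or VarC code: the kernel choice (squared exponential), the length scale, the two-dimensional coordinates, and all numbers are made-up assumptions for illustration only. Each hypothetical variant is placed at a coordinate, a phenotype value is attached to it, and GPR predicts the value and its uncertainty at an unmeasured location, just as kriging estimates what lies between test wells.

```python
import math


def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential covariance: nearby points are strongly correlated."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return variance * math.exp(-0.5 * sq_dist / length_scale ** 2)


def cholesky(matrix):
    """Lower-triangular factor L of a symmetric positive-definite matrix (L L^T = matrix)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def cholesky_solve(lower, rhs):
    """Solve (L L^T) x = rhs by forward then backward substitution."""
    n = len(rhs)
    z = [0.0] * n
    for i in range(n):
        z[i] = (rhs[i] - sum(lower[i][k] * z[k] for k in range(i))) / lower[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(lower[k][i] * x[k] for k in range(i + 1, n))) / lower[i][i]
    return x


def gpr_predict(points, values, query, noise=1e-6):
    """Return the GPR predictive mean and variance at `query`.

    Assumption: a constant prior mean equal to the sample mean of `values`.
    """
    n = len(points)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    # Covariance between all measured points, with a small noise term on the diagonal.
    cov = [[rbf_kernel(points[i], points[j]) + (noise if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    lower = cholesky(cov)
    # Covariance between the query location and each measured point.
    k_star = [rbf_kernel(p, query) for p in points]
    alpha = cholesky_solve(lower, centered)
    beta = cholesky_solve(lower, k_star)
    pred_mean = mean + sum(k * a for k, a in zip(k_star, alpha))
    pred_var = rbf_kernel(query, query) - sum(k * b for k, b in zip(k_star, beta))
    return pred_mean, max(pred_var, 0.0)


# Hypothetical toy data: five variants at made-up (x, y) coordinates,
# each with a made-up phenotype score. None of this comes from the study.
VARIANT_COORDS = [(0.1, 0.2), (0.4, 0.9), (0.5, 0.1), (0.8, 0.6), (0.9, 0.3)]
PHENOTYPE = [0.2, 0.9, 0.1, 0.7, 0.4]

if __name__ == "__main__":
    value, uncertainty = gpr_predict(VARIANT_COORDS, PHENOTYPE, (0.6, 0.5))
    print(f"predicted phenotype: {value:.3f}, variance: {uncertainty:.3f}")
```

In this sketch the predicted variance shrinks near measured variants and grows far from them, which is what lets a sparse set of inputs still yield a landscape with a built-in measure of confidence.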
“To understand mechanistically how the protein fold is shaped by therapeutics to inform precision management of disease, we developed VarC mapping,” they noted. “VarC triangulates sparse sequence variation information found in the population using GPR-based machine learning to define the combined pairwise-residue interactions contributing to dynamic protein function in the individual in response to therapeutics.”
To demonstrate the utility of the system, the researchers applied VarC mapping to CF, a disease that results from genetic variants in the cystic fibrosis transmembrane conductance regulator (CFTR) gene and leads to a buildup of mucus in the lungs. More than 2,000 variants of the CFTR gene have been identified and described in a patient database. Researchers knew that many of these variants have very different effects on the CFTR protein, but it has been difficult to compare and contrast them to guide how patients with different variants should be treated in the clinic.
“When you want to treat patients, you really have to appreciate that different therapeutics might target different variants in completely different ways, and that’s why our approach that looks at many different variants all at once is so powerful,” said Wang. “Our approach not only reveals how these variants contribute to each patient’s biology, but also connects them in a way that each variant can inform how to manage the others.”
The researchers entered information on about 60 genetic variants found in the CF population into their VarC program. The computational analysis captured how each amino acid residue interacts with every other residue to generate function, and revealed that for most CF patients there was the same net effect on the protein: an unstable inner core.
When the Scripps Research team then used the program to model how existing CF drugs affect the structures, they discovered that although the drugs act on the CFTR protein in different ways, none of them effectively stabilized the protein’s inner core. That core is hidden from view, much like an oil reservoir in a complex landscape whose location is revealed only by test wells.
“Using an SCV principled approach, where a sparse collection of variant input derived from the worldwide population can be interpreted by GPR to yield as output a quantitative view of sequence-to-function-to-structure relationships spanning the entire protein fold, we have discovered a key energetic core of the CFTR fold that is critical for CFTR function but only weakly impacted by current therapeutics,” the team noted.
With a better understanding of the structural deficiencies in CFTR in CF patients, Balch and colleagues say that the job of developing an effective drug will be much easier. Potential compounds can be modeled in advance of lab experiments for their effect on the inner core of the CFTR protein.
“In most drug discovery, you throw thousands of compounds at a protein and see which ones change it, often without fully understanding the mechanism,” said Balch. “To fix a thing, you must first understand the problem.”
Balch suggests that CF isn’t the only disease likely to be solved with their VarC approach. Any genetic disease can be analyzed in the same way, using knowledge of patient variants found in the population along with the information on symptoms triggered by each variant. “We really think we can do this for any protein out there,” said Balch. “It’s a fast track toward drug discovery for rare diseases that have been very hard and slow to study in the past.”
The authors suggest that VarC effectively links genome variation to proteome function in the population, providing a generalizable tool that can triangulate information from genetic variation to discover, mechanistically, therapeutic strategies that guide precision management for individuals. The team concluded, “These results demonstrate that VarC, through a dynamic GPR-based covariance matrix, provides a new tool to describe sequence-to-function-to-structure relationships that are compromised in human genetic disease. VarC provides a new approach that is distinct from algorithms that predict only structures, because VarC, as a genome-based tool, captures structural relationships in the context of native function of the fold framed by physical, chemical, and/or cell biologic properties found in the evolving human population.”
The team is already applying the method to other rare genetic diseases, as well as pursuing new drugs to treat CF.