Understanding how the wealth of genetic variation in the human genome affects disease could transform healthcare. But while we know the consequences of a handful of specific genetic mutations, interpreting the millions of genetic variations identified through genome sequencing remains a challenge.

Researchers at Harvard Medical School and Oxford University have now developed an artificial intelligence (AI) tool called EVE (evolutionary model of variant effect), which uses a sophisticated type of machine learning to detect patterns of genetic variation across hundreds of thousands of nonhuman species and then uses them to make predictions about the meaning of variations in human genes.

In a study published in Nature, the team used EVE to assess 36 million protein sequences and 3,219 disease-associated genes across multiple species. Their results suggested that 256,000 previously identified human gene variants currently of unknown significance should, in fact, be reclassified as either benign or disease causing. While the researchers emphasize that EVE is not a diagnostic test, they say it could augment current clinical tools used by geneticists and other physicians to make diagnoses, predict disease progression, and even choose treatment based on the presence of certain disease-causing genetic mutations. “Increasingly, people have access to sequencing their genomes, but making sense of the data is not always straightforward,” said study senior author Debora Marks, PhD, associate professor of systems biology in the Blavatnik Institute at HMS. “There is very little information about what it even means for likelihood of disease or disease progression … We believe our approach can be used as an added tool in current clinical assessments and offers a powerful new way to reduce uncertainty and clarify decision-making, particularly in the clinical setting.”

Marks co-led the reported research, alongside colleague Yarin Gal, PhD, at Oxford University, co-first authors Jonathan Frazer, PhD, and Mafalda Dias, PhD, at Harvard Medical School, and Pascal Notin at Oxford. In the scientists’ report, titled, “Disease variant prediction with deep generative models of evolutionary data,” they concluded, “Our work suggests that models of evolutionary information can provide valuable independent evidence for variant interpretation that will be widely useful in research and clinical settings.”

No two human beings are the same, and this is a biologic singularity encoded in the unique arrangement of each person’s DNA. But while this genetic variation is a cardinal feature of biology that drives diversity, and represents the engine of evolution, it also has a dark side.

Alterations in DNA sequences and the resulting proteins that build our cells can sometimes lead to profound disruptions in physiologic function and cause disease. But understanding which variants affect disease is a huge challenge. Relating specific changes in the human genome to disease continues to bedevil the field of clinical genetics because the number of variants in the human population dwarfs the number that scientists can investigate.

Even though only a tiny fraction of the human population has been sequenced, researchers are already seeing millions of variants whose significance and meaning are unclear. Of those variants, only 2% are classified as benign, neutral, or pathogenic. The remaining 98% of the identified gene variants are currently deemed of “unknown significance.” The authors commented, “The exponential growth in human genome sequencing has underlined the substantial genetic variation in the human population … Quantifying the pathogenicity of protein variants in human disease-related genes would have a marked effect on clinical decisions, yet the overwhelming majority (over 98%) of these variants still have unknown consequences … relating specific changes in the genome to disease phenotypes remains an open challenge as the number of variants in the human population exceeds the number that we are able to investigate.”

The stakes of accurately interpreting the meaning of genetic variation are enormous. Reading a benign variation as disease-causing could lead to erroneous diagnosis, fueling a cascade of further testing and potentially unnecessary medical interventions. Conversely, misinterpreting a disease-promoting DNA variant as benign could provide false reassurance when observation, further testing, or preventive measures may be warranted.

In the human genome, protein-coding regions alone contain large variation between people, and to date, 6.5 million missense variants have been observed, the team noted. These so-called missense mutations may have no effect on the function of a protein, or they may render the protein dysfunctional, causing disease. In fact, researchers estimate there may be a variant for every protein position—save for lethal ones—in the genomes of the 8 billion people inhabiting the planet.

“There’s many ways in which one person doesn’t just have one genome,” Marks said. “You may have a different variant on one copy of a gene and, as we age, there are all sorts of somatic variations that occur—not only related to cancer development but to neurodegeneration, both of which are age-related processes driven by mutation.”

There are a number of disease-associated genes for which researchers have identified mutations that carry high risk of clinical disease. These include BRCA1 and BRCA2 for breast and ovarian cancers, and the tumor-suppressor gene p53 for a range of cancers. But even those genes have shown other unstudied mutations, the significance of which remains unclear. All of this creates an urgent need to clarify the significance of genetic variations in humans—a process in which computation is going to play an increasingly important role in providing answers, Marks said.

A defining feature of neural networks is their capacity to continually reassess and update the probability of a hypothesis as new data become available. This means that neural networks can reevaluate evidence using new knowledge and therefore can detect patterns and meanings missed by traditional methods.

In the current study, the researchers used a sophisticated type of analysis known as unsupervised machine learning, a form of artificial intelligence that is not based on predefined parameters and rules but instead involves adaptive learning. What this means is that when presented with new data, a machine learning algorithm will become better at recognizing patterns over time. By contrast, in supervised machine learning, the algorithm learns to detect patterns from prelabeled data—its training has been supervised.

In a classic example of supervised learning given by informaticians, the algorithm is presented with cat and dog images and told which ones are which before it is challenged to recognize unlabeled images of cats and dogs. In unsupervised machine learning, the algorithm is given a set of cat and dog images and not told which ones are which. It must discern the patterns on its own. “Because the algorithm doesn’t need to know in advance which images are cats, which images are dogs—it just needs a bunch of images of cats and dogs—there’s no way of using information that it shouldn’t know,” Gal noted.
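The contrast between the two approaches can be sketched in a few lines of Python. This is only a toy illustration: the single numeric feature, its values, and the function names are invented here, and neither function reflects how EVE itself is built. The supervised version is told the labels; the unsupervised version (a simple two-group clustering) only sees the numbers.

```python
from statistics import mean

# Hypothetical 1-D feature values (say, an "ear pointiness" measurement).
# All numbers are invented for illustration.
cats = [0.80, 0.90, 0.85, 0.95]
dogs = [0.20, 0.30, 0.25, 0.10]
samples = cats + dogs
labels = ["cat"] * len(cats) + ["dog"] * len(dogs)

def fit_supervised(samples, labels):
    """Supervised: use the provided labels to compute one centroid per class."""
    classes = sorted(set(labels))
    return {c: mean(x for x, y in zip(samples, labels) if y == c) for c in classes}

def predict(centroids, x):
    """Assign x to whichever centroid is nearest."""
    return min(centroids, key=lambda name: abs(x - centroids[name]))

def fit_unsupervised(samples, iterations=10):
    """Unsupervised: two-group k-means clustering. No labels are ever seen."""
    lo, hi = min(samples), max(samples)  # initial centroids
    if lo == hi:  # degenerate case: every sample is identical
        return {"cluster_0": lo, "cluster_1": hi}
    for _ in range(iterations):
        group_lo = [x for x in samples if abs(x - lo) <= abs(x - hi)]
        group_hi = [x for x in samples if abs(x - lo) > abs(x - hi)]
        lo, hi = mean(group_lo), mean(group_hi)
    return {"cluster_0": lo, "cluster_1": hi}

supervised_model = fit_supervised(samples, labels)
unsupervised_model = fit_unsupervised(samples)
```

Both models end up with the same two group centers, but only the supervised one knows that one group is called "cat" and the other "dog". The unsupervised one has found the structure without ever seeing a label, which is the property Gal describes.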

Both types of machine learning offer advantages for specific tasks. One advantage of unsupervised models is that there is no chance of biasing their learning by feeding them prelabeled data. Also, they can adapt as the data change to perform more complex analyses. Most current computational methods used to assess the significance of gene variants employ supervised training based on clinical labels, which may bias these tools and inflate estimates of their real-world predictive accuracy, the researchers said. “In principle, computational methods could support the large-scale interpretation of genetic variants. However, state-of-the-art methods have relied on training machine learning models on known disease labels. As these labels are sparse, biased and of variable quality, the resulting models have been considered insufficiently reliable,” they wrote.

It is precisely the ability of unsupervised machine learning to detect new patterns from never-before-encountered data that renders this approach especially suitable for analyzing genetic sequences from non-humans. Scientists have used comparative genetics for many years to detect regions of similarity across DNA or protein sequences to draw meaning. The Harvard-Oxford team used a neural network to do so on a much greater scale.

For their reported study, the researchers revisited the concept that, by studying genetic variation across multiple species, they might glean clues about the significance of variation in humans. “… we revisit the clinical value of evolutionary information in light of recent developments in unsupervised generative modeling,” they noted. Evolution tends to preserve features that are critical, or at least important, to function and survival across species. Thus, amino acid arrangements that recur across species are markers of biologic importance to an organism’s function and its evolutionary fitness. So, alterations to such highly conserved sequences may spell trouble and be linked to pathogenicity.
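The idea of conservation can be made concrete with a small sketch. The Python below scores each position of a toy protein alignment by its Shannon entropy: a position where every species carries the same amino acid scores 0 bits (fully conserved), and more variable positions score higher. The five sequences are invented, and this per-position measure is a far simpler stand-in than the deep generative model the paper's title refers to. It is meant only to illustrate why conserved positions stand out.

```python
import math
from collections import Counter

# Toy alignment of one short protein region across five hypothetical species.
# The sequences are invented for illustration.
alignment = [
    "MKVLA",
    "MKILA",
    "MKVLS",
    "MRVLA",
    "MKLLG",
]

def column_entropy(column):
    """Shannon entropy (in bits) of the amino acids seen at one position."""
    counts = Counter(column)
    total = len(column)
    return sum((n / total) * math.log2(total / n) for n in counts.values())

def conservation_profile(sequences):
    """Entropy at every aligned position; lower means more conserved."""
    return [column_entropy(column) for column in zip(*sequences)]

profile = conservation_profile(alignment)

# Positions (0-based) where every species carries the same amino acid.
fully_conserved = [i for i, h in enumerate(profile) if h == 0.0]
```

In this toy alignment, the first and fourth positions never vary, so under the reasoning above a change there would be more suspicious than a change at the third or fifth position, where several amino acids are already tolerated.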

The computational method analyzed data from 140,000 species, including endangered and extinct organisms, and effectively looked for evolutionarily conserved patterns to draw conclusions. “Our method—EVE—learns the propensity of human missense variants to be pathogenic from the distribution of sequence variation across species,” the team wrote. “These species are a long way away evolutionarily speaking, and there are many genetic differences, but taken together, they give us information,” Marks said. “This is why the model is so powerful about patterns that are relevant for humans and human variation.”

After training on 250 million protein sequences, EVE estimated the likelihood of each single amino acid variant being either benign or pathogenic. To determine whether EVE was making accurate predictions, the researchers compared its scores with established human mutations for which the significance is already known. The tool’s results were remarkably consistent with the clinical data, the team found.
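A comparison of this kind can be summarized in a single number. The article does not say which metric the team used; one common choice, shown here purely as an assumption, is the area under the ROC curve (AUC), which is the probability that a randomly chosen known-pathogenic variant receives a higher score than a randomly chosen known-benign one. The scores and labels below are invented.

```python
def roc_auc(scores, labels):
    """AUC computed by comparing every pathogenic/benign pair.

    labels: 1 for known pathogenic, 0 for known benign.
    A tie between a pathogenic and a benign score counts as half a win.
    """
    pathogenic = [s for s, y in zip(scores, labels) if y == 1]
    benign = [s for s, y in zip(scores, labels) if y == 0]
    if not pathogenic or not benign:
        raise ValueError("need at least one variant of each class")
    wins = sum(
        1.0 if p > b else 0.5 if p == b else 0.0
        for p in pathogenic
        for b in benign
    )
    return wins / (len(pathogenic) * len(benign))

# Hypothetical model scores and clinical labels for six variants.
toy_scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
toy_labels = [1, 1, 0, 1, 0, 0]

auc = roc_auc(toy_scores, toy_labels)
```

An AUC of 1.0 would mean every pathogenic variant outscored every benign one, and 0.5 would be no better than chance. Because the labels are used only to grade the scores, not to train the model, this kind of check does not reintroduce the label bias discussed earlier.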

Next, the researchers applied EVE to a set of 3,219 human genes associated with disease. EVE made the right call on whether the mutation was pathogenic or benign across all genes, including 60 “clinically actionable” genes, the researchers said. When the researchers compared EVE’s performance with that of supervised tools and other unsupervised tools, it showed notably greater accuracy of prediction. Indeed, the analysis showed that EVE outperformed other computational prediction models in predicting clinical effect and also scored as high as or better than current gold-standard high-throughput experiments that test the effect of a mutation on biologic function. “EVE outperforms all supervised and unsupervised methods at predicting known clinical labels,” the team stated.

But how would EVE’s predictions fare compared with findings made from actual clinical experiments, the gold standard of assessing how a genetic mutation affects physiologic function? To answer this question, the team compared EVE’s scores against results from clinical experiments involving well-studied mutations in five genes, among them genes related to various forms of cancer, several cancer syndromes, and heart rhythm disorders. EVE’s predictions overlapped with current labels from experimental data. “Our model EVE … not only outperforms computational approaches that rely on labelled data but also performs on par with, if not better than, predictions from high-throughput experiments, which are increasingly used as evidence for variant classification … The primary advantage of our approach over experimental approaches is significant gain in scope at a negligible fraction of the cost,” the authors wrote.

“Our results turned out to be far better than we expected,” Marks said. “It seems that by simply training a model to fit the distribution of sequences across evolution we extract information which enables us to make unexpectedly precise predictions about disease risk arising from a given genetic variant.”

A notable advantage that EVE has over current methods is that it assigns a continuous score rather than a binary score. This is because even when gene variants are labeled as benign or pathogenic, how a mutation might manifest physiologically is more nuanced.

“There’s a whole continuum of pathogenicity,” Marks said. “The continuous score is very important for predicting what the level of pathogenicity is. Does the mutation mean I am going to get pain in my little toe, or am I going to die tomorrow?”

Another important aspect of the tool is that it assigns a confidence-of-prediction score on a gene-by-gene basis. This can help clinicians contextualize the degree of certainty for any prediction. In other words, for each genetic variant, EVE tells the expert how much they can trust its call. This is a matter of trustworthiness, of confidence in the model, the researchers said.
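One way to picture a continuous score paired with a confidence measure is sketched below. The article does not describe how EVE computes either quantity, so everything here is an assumption made for illustration: a raw model output is converted to a probability of pathogenicity using two Gaussian curves with invented parameters, and the uncertainty of that probability is measured with binary entropy. Calls that are too uncertain are left unclassified rather than forced into a pile.

```python
import math

# Hypothetical (mean, standard deviation) of raw model outputs for each class.
BENIGN = (2.0, 1.0)
PATHOGENIC = (8.0, 1.5)
PRIOR_PATHOGENIC = 0.5  # assumed prior probability that a variant is pathogenic

def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def pathogenicity_score(raw):
    """Continuous score in [0, 1]: posterior probability of 'pathogenic'."""
    p = PRIOR_PATHOGENIC * gaussian_pdf(raw, *PATHOGENIC)
    b = (1.0 - PRIOR_PATHOGENIC) * gaussian_pdf(raw, *BENIGN)
    if p + b == 0.0:  # raw value far from both curves: treat as undecided
        return 0.5
    return p / (p + b)

def uncertainty(score):
    """Binary entropy in bits: 0 = certain, 1 = maximally uncertain."""
    if score <= 0.0 or score >= 1.0:
        return 0.0
    return -(score * math.log2(score) + (1.0 - score) * math.log2(1.0 - score))

def classify(raw, max_uncertainty=0.5):
    """Return (label, score, uncertainty); withhold a call when too unsure."""
    score = pathogenicity_score(raw)
    unsure = uncertainty(score)
    if unsure > max_uncertainty:
        label = "uncertain"
    else:
        label = "pathogenic" if score >= 0.5 else "benign"
    return label, score, unsure

examples = {raw: classify(raw) for raw in (2.0, 4.5, 8.0)}
```

With these invented parameters, a raw value of 2.0 is called benign and 8.0 pathogenic, both with low uncertainty, while 4.5 falls between the two curves and is reported as uncertain. The threshold of 0.5 bits is arbitrary; in practice it would be a clinical judgment about how much doubt is acceptable.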

“What we hope this approach will do is generate powerful data that can empower the clinicians on the frontlines to make the right diagnostic, prognostic, and treatment decisions,” Gal said. “We’re not providing clinicians merely with a number but also giving them the degree of uncertainty that comes with it. This is something that the expert can take and use in the decision-making process. The tool can say, ‘I think that variant belongs to that pile, but I’ve never seen any variants like that before so take that with a grain of salt.’ Or the tool can also say, ‘I think that that other variant belongs to this pile, and I’ve seen very similar variants to that in the past, and I saw them belonging to this pile and therefore I’m going to assign it to this pile with high confidence.’ Building trust between the tool and the expert is an important aspect of this work.”

This type of modeling is still in its infancy, and it’s clear that evolution and genetic variation still have much to teach us about disease, the researchers said, adding that they plan to extend the work to other parts of the genome beyond protein-coding regions. Nevertheless, they concluded, “An appealing prospect is that our method may be useful in guiding future experimental efforts, essentially acting as a means of identifying which variants and which genes would be most informative to probe.”

One urgent task for the immediate future is to make clinical use of the genetic variation for which we do have some understanding. To do so, the researchers have already teamed up with a genome-sequencing company and are collaborating with various groups via the Chan Zuckerberg Initiative.

The team is also participating in the Atlas of Variant Effects Alliance, a global research effort whose mission is to map the effects of variation across the genome and create a comprehensive atlas of all possible human gene variants and their effects on protein function and physiology. The ultimate goal of the effort is to improve the diagnosis, prognosis, and treatment of human disease.