Tom Chittenden, PhD, DPhil, PStat, Chief A.I. Scientist at WuXi NextCODE

Sponsored content brought to you by WuXi NextCODE

Researchers at U.S.-based WuXi NextCODE and the Yale School of Medicine published two new studies in the Journal of Experimental Medicine and Nature Metabolism on novel artificial intelligence (A.I.) approaches that breathe new life into big data for complex diseases. A deep dive into these publications demonstrates breakthrough, biologically validated A.I. approaches with the potential to understand virtually any disease in much greater detail using cost-effective designs of therapeutics.

With precision medicine poised to transform patient care, the big data revolution in healthcare and drug development has taken center stage. Yet, the promise of precision medicine hinges on the ability to reduce the complex interconnections of large, multi-omic data sets into useful biological information.

One of the long-term promises of precision medicine is to understand underlying disease mechanisms to ultimately improve our ability to diagnose, treat, and prevent a diverse array of conditions. But unlike using A.I. to recognize faces in a photo or words in an audio file, this is not a simple classification problem. Identifying a single genetic variation that is correlated with a disease state, for example, requires understanding that the 3.2 billion positions of the human genome do not operate independently of one another: it’s all connected.

Traditional A.I.—the Old Guard

Traditional analytical approaches search for correlations between data points, such as between a gene variant and a disease phenotype or between gene regulation and disease severity, looking for areas of overlap to infer a potential disease pathway. While these traditional approaches are employed to accurately separate a disease state from a healthy state, closer inspection shows that the genes associated with these classes are not consistently represented between runs of the algorithm. Furthermore, these approaches produce black-box predictions that offer no causal insight into the biomarkers that emerge at the top of the list. Therapies designed and tested on this basis have a higher likelihood of failure, given that unvalidated, non-replicable correlations between biomarkers and phenotypes provide a lower level of confidence for making R&D investment decisions. Often the output reflects nothing more than statistical artifacts.

What is required to increase confidence and understanding is an approach capable of mimicking the complex signaling cascades of cellular networks that receive a signal, translate it, and transmit the output message to the next cell to produce a healthy or disease state.

This problem is too big and too open-ended for hypothesis-driven biology, but it is potentially well suited to the ensemble artificial intelligence (A.I.) approaches used in two recent landmark publications and developed by the WuXi NextCODE Advanced A.I. Research Laboratory.

According to its website, WuXi NextCODE is a genomic data and insights company headquartered in Cambridge, MA, in the U.S. It collects disease-specific genomic and deep phenotypic data across more than 60 diseases and leverages a proprietary platform, tools, and advanced analytics to help biopharma and academic partners identify disease drivers, drug targets, and biomarkers. Yale School of Medicine is a preeminent academic medical center based in New Haven, CT.

In the collaboration, Yale worked with WuXi NextCODE’s novel A.I. approaches to understand the etiology of cardiovascular disease (CVD), resulting in the identification of a novel approach to treating atherosclerosis, a disease in which plaques build up in vessel walls and underlie heart attack, stroke, and peripheral artery disease, collectively the leading causes of death worldwide. For this disease, as with many complex diseases, anomalous molecular signals result in aberrant cell populations driven by largely unknown pathways.

The results of this work were published in the Journal of Experimental Medicine (JEM) and Nature Metabolism by the Yale researchers, led by Michael Simons, MD, and WuXi NextCODE’s Advanced A.I. Research Laboratory, led by Tom Chittenden, PhD, DPhil. These landmark publications outline the use of novel ensemble A.I. approaches not only to produce robust in silico mechanistic predictions but also to provide experimental validation of major new therapeutic approaches that may slow and even reverse cardiovascular disease.

Identifying Causal Drivers of Cardiovascular Disease
Research Collaboration with WuXi NextCODE and Yale Cardiovascular Research Center. Deep Learning, BBN Analysis, and NLP of Single Cell RNA-seq Data. Modified from Figure S5, Ricard et al., JEM 2019.

Phenotype Projection

In the first study, published in the Journal of Experimental Medicine, the researchers showed that a family of proteins known as transforming growth factor beta (TGFβ) activates a signaling cascade in response to the disruption of another signaling molecule, ERK1/2. The mechanism leading to this cascade of signals had remained elusive. To unravel the biology, the A.I. team first reduced the dimensionality of the large gene feature space by applying a deep artificial neural network (DANN) to identify the most informative genes that differentiate bulk vasculature cells with and without functional ERK1/2. This is where traditional approaches might have stopped, challenged by how to order the list of genes into a meaningful cellular program. Dr. Chittenden and his team overcame this challenge by employing probabilistic programming to test the causal dependencies of millions of possible interactions between these proteins. The output of this analysis revealed a causal gene network driven by TGFβ2 and accurately predicted the observed vascular pathologies of hypertension and renal dysfunction, providing a causal structure of the network that links genotype to disease. Coined phenotype projection, this efficient A.I.-driven approach can predict complex phenotypes by teasing out the causal molecular underpinnings of disease. This study demonstrates the applicability of phenotype projection to derive the drivers and causal network of vascular regulation.
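To make the idea of testing causal dependencies concrete, here is a minimal toy sketch. It is not the method used in the study; it only illustrates the kind of conditional-independence check that network structure learning generally relies on. The data are simulated, and the names `driver`, `gene_a`, and `gene_b` are hypothetical: two genes appear strongly correlated, but the correlation largely disappears once their shared upstream driver is accounted for.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical simulated data: one upstream driver regulates two genes.
rng = random.Random(0)
n_samples = 2000
driver = [rng.gauss(0, 1) for _ in range(n_samples)]
gene_a = [d + rng.gauss(0, 0.5) for d in driver]
gene_b = [d + rng.gauss(0, 0.5) for d in driver]

# Marginally the two genes look linked; conditioned on the driver they do not.
marginal = pearson(gene_a, gene_b)
conditional = partial_corr(gene_a, gene_b, driver)
print(f"marginal={marginal:.2f} conditional={conditional:.2f}")
```

A correlation-only analysis would report `gene_a` and `gene_b` as associated. The conditional check points instead to the shared driver, which is the distinction between correlation and causal structure that the paragraph above describes.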

From the JEM study press release:

“It is a real milestone to be able to draw out and validate a causal biological network using such an efficient and replicable AI approach,” said Dr. Simons, Professor of Medicine and Cell Biology at Yale and senior author on the paper. “We have become quite good at making observations that correlate a genotype and a phenotype, but tracing the biology that lies between has always been elusive because it is so complex. The promise of AI is that it is powerful enough to bring together all the biology and genetic data we now have to unravel this complexity, and Tom’s group is leading the way in showing how this can be done.”

“Today we are providing a first concrete look at what we call phenotype projection: an efficient AI-driven approach that can predict complex phenotypes by teasing out the causal molecular underpinnings of disease,” said Dr. Chittenden, Ph.D., DPhil, co-senior author. “By furthering our collective understanding of biology, such approaches hold the potential to be truly transformative. It means that we can understand virtually any disease in much greater detail using cost-effective experimental designs, a fundamental capability for creating precision medicine. The result is a range of validated potential points for developing therapeutic interventions; validated markers for designing smaller clinical trials with a greater chance of success; and a wealth of information for identifying patients likely to respond to approved compounds.”

Single Cell A.I.

With a better understanding of the underlying gene network driving vascular regulation, the team next performed single-cell analysis with generative deep-learning models to unmask a new mechanism of atherosclerosis. The findings, published in Nature Metabolism, demonstrate that in endothelial cells, which form the lining of blood vessels, the activity of TGFβ is critical to the establishment and growth of atherosclerotic plaques. Unveiling the complex interconnections driving this condition required high-resolution, single-cell analysis of aortic endothelial cells in which the TGFβ signaling pathway was intact or disabled, taken from mice exposed to risk factors similar to those for atherosclerosis in humans. To accomplish this, WuXi NextCODE’s statistical machine learning was applied to the single-cell data. Using unsupervised approaches, it established how transcription was clustered in the different test and control groups, working through the potential bias and noise of the datasets to identify two clusters that were driving the effect of TGFβ signaling and suppression. This work establishes a new mechanistic role of endothelial cell vs. smooth muscle cell TGFβ signaling, which drives vascular inflammation and atherosclerosis. Interestingly, this result reveals a biological mechanism similar to that of tumors, where a clonal subpopulation drives the aberrant cellular outgrowth. To biologically validate these findings, the teams then silenced TGFβ signaling in the aortic endothelium of mice with high plaque burden, after which the plaques were significantly reduced, effectively demonstrating a mechanism for reversing the disease process.
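As a rough illustration of what unsupervised clustering of single-cell data means, the sketch below groups simulated cells without using their labels. It is a stand-in, not the study's pipeline: plain k-means with farthest-first initialization replaces the generative deep-learning models mentioned above, and the two-marker expression values and the `intact` and `disabled` group names are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Pick k starting centers that are spread far apart."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, iters=25):
    """Plain k-means: returns the final centers and one label per point."""
    centers = farthest_first(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    labels = [min(range(k), key=lambda i: sq_dist(p, centers[i])) for p in points]
    return centers, labels


# Hypothetical simulated cells: expression of two marker genes per cell.
rng = random.Random(1)
intact = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(100)]
disabled = [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(100)]
cells = intact + disabled

centers, labels = kmeans(cells, k=2)
print("cluster sizes:", labels.count(0), labels.count(1))
```

The group membership is never passed to `kmeans`; the two populations are recovered from the expression values alone. Real single-cell data are far noisier and higher-dimensional, which is why the study relied on more sophisticated models.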

According to the Nature Metabolism study press release:

“Our AI has played a key role in finding and validating in vivo a promising new mechanism for combatting the disease that kills most people in the world today,” said Dr. Tom Chittenden, co-senior author on the paper. “Mike Simons’ team has the single-cell biology know-how to generate exactly the right data, and our AI enables us to interpret it and overcome the bias that plagues so much of the field. The proof is in the pudding, and the success of the therapeutic approach shows that we have put our finger on a major driver of disease.”

Dr. Simons, co-senior author on the paper, said: “As we develop precision medicine going forward, I believe the precision of the research is going to be of the essence. We need to be able to look not only at single cells but focus on specific cell types and aberrant cell populations within these. In this study, our group did the former, but Tom Chittenden’s statistical machine learning was key to homing in on the latter, pinpointing and giving us confidence in the signal we needed.”

This novel A.I. approach informs the understanding of the vastly complex molecular underpinnings driving cellular behavior in any given experimental design or condition. The in vivo confirmation of the in silico predictions demonstrates a novel, biologically validated A.I. approach with the potential to understand virtually any disease in much greater detail using cost-effective designs.
