A new AI system has been developed to predict gene activity in human cells with high precision. Researchers at Columbia University Vagelos College of Physicians and Surgeons, led by Raul Rabadan, PhD, created the model to better investigate and understand the inner workings of cells, both healthy and diseased.

“Predictive generalizable computational models allow [us] to uncover biological processes in a fast and accurate way,” Rabadan said. “These methods can effectively conduct large-scale computational experiments, boosting and guiding traditional experimental approaches.” 

The model and analysis were published in Nature in a report titled, “A foundation model of transcription across human cell types.” 

The newly developed model addresses a gap in current cellular biology research methods. Conventional methods are retrospective, analyzing what has already happened or, in some experiments, what is currently happening in a cell. These experiments often involve incremental changes in conditions and assess how cells respond to them.

Current methodologies do not precisely predict the multitude of potential changes that could occur. Rabadan and his team developed an AI model that is prospective rather than retrospective.

“Having the ability to accurately predict a cell’s activities would transform our understanding of fundamental biological processes,” Rabadan said. He posits that the impact of this new technology could “turn biology from a science that describes seemingly random processes into one that can predict the underlying systems that govern cell behavior.” 

Rabadan and his team aimed to apply AI models to normal cells, but they recognized that most existing models focus on specific cell types and diseases. “Previous models have been trained on data in particular cell types, usually cancer cell lines or something else that has little resemblance to normal cells,” Rabadan said.

The Columbia team developed their AI model—GET (general expression transformer)—to “uncover regulatory grammars across 213 human fetal and adult cell types.” GET uses chromatin accessibility data and genome sequences to predict gene expression patterns from millions of cells obtained from normal human tissues. These data trained GET to understand how cells function generally, which allows for predictive capabilities in normal or diseased cells.  

GET functions similarly to how ChatGPT and other language-based AIs work. The model developed rules for how cells function, like the grammar rules at the root of large language models (LLMs). “Here it’s exactly the same thing: we learn the grammar in many different cellular states, and then we go into a particular condition—it can be a diseased or it can be a normal cell type—and we can try to see how well we predict patterns from this information,” explained Rabadan. 

In describing the novel functionality of their model, the authors wrote, “GET also shows remarkable adaptability across new sequencing platforms and assays, enabling regulatory inference across a broad range of cell types and conditions, and uncovers universal and cell-type-specific transcription factor interaction networks.” 

The model was also evaluated in diseased cells, with pediatric leukemia as a test case. GET was used to predict mutations that disrupt the interaction of transcription factors in lymphocytes, a result that “explains the functional significance of a leukemia risk predisposing germline mutation,” the authors wrote. Laboratory experiments confirmed GET’s prediction.

In addition to its use in understanding the functional impacts of mutations in diseased cells, GET can also be extended to exploratory research into noncoding, often regulatory, regions of the genome.  

“The vast majority of mutations found in cancer patients are in so-called dark regions of the genome. These mutations do not affect the function of a protein and have remained mostly unexplored,” said Rabadan. “The idea is that using these models, we can look at mutations and illuminate that part of the genome.” 

GET’s ability to generate accurate predictions of cellular function in both healthy and unhealthy cells opens doors to new ways of experimentation and may improve the speed and accuracy of understanding complex diseases. With the continued advances in AI generally, Rabadan envisions grander applications of this and similar models in future research.  

“It’s really a new era in biology that is extremely exciting; transforming biology into a predictive science.”
