The past few years have seen several flashy demonstrations of how artificial intelligence (AI) algorithms may transform biomedical research, particularly with respect to drug discovery. This past November, for example, Google’s AI subsidiary, DeepMind, announced that its AlphaFold program could deliver computational predictions of protein structure that approach the quality of those provided by gold-standard experimental techniques such as X-ray crystallography.1

Such high-profile announcements have elicited justifiable excitement about the future of algorithmically guided drug development, but AI’s champions in the industry remain wary about overselling the technology’s current capabilities. “I still feel that there’s a lot of hype around it,” says Paul Nioi, PhD, senior director of research at Alnylam Pharmaceuticals. “Companies are springing up that claim to solve all the issues of drug discovery, target discovery, and development using AI. I think that’s yet to be proven.”

Nevertheless, a growing number of companies now recognize the value that AI—and more specifically, the subset of algorithmic techniques known as “machine learning”—can deliver at various stages in the drug discovery process. “There’s an increase in investment across all of the companies that I’ve talked to,” relates Peter Henstock, PhD, machine learning and AI lead at Pfizer.

“The capabilities are still being sorted out,” Henstock adds. “In most cases, we’re still negotiating how to go about using it effectively.” But the opportunities are clear, and machine learning–based techniques are already finding a place in early-stage target discovery and drug development workflows—and offering a glimpse of the gains in efficiency and success rates that the future could bring.

A deep dive into the literature

The vast majority of biomedical data are locked in unstructured formats that, in their raw form, are inaccessible to computational analysis. Data are trapped within publications, patents, clinical records, and other documents written for human readers. Natural language processing (NLP) algorithms offer a powerful solution to this problem. Many of these algorithms employ a machine learning technique known as deep learning to analyze documents and other datasets and to identify biologically relevant text elements such as the names of genes, proteins, drugs, or clinical manifestations of disease.
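
To make "identifying biologically relevant text elements" concrete, the sketch below tags entity mentions in a sentence. It deliberately swaps in the simplest possible technique, a dictionary (gazetteer) lookup, rather than the deep learning models described above; the lexicon and the function name are hypothetical and chosen only for illustration.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical hand-built lexicon mapping surface forms to entity types.
# A production system would use a trained deep learning model instead.
LEXICON: Dict[str, str] = {
    "BRCA1": "GENE",
    "TP53": "GENE",
    "tamoxifen": "DRUG",
    "breast cancer": "DISEASE",
}

def tag_entities(text: str, lexicon: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Return (matched text, entity type, start offset) for each lexicon hit."""
    hits = []
    for term, label in lexicon.items():
        # Word boundaries stop "TP53" from matching inside a longer token.
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.group(0), label, m.start()))
    # Report entities in the order they appear in the text.
    return sorted(hits, key=lambda h: h[2])
```

A dictionary lookup cannot recognize names it has never seen or resolve ambiguous abbreviations, which is precisely the gap that learned models are meant to close.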

NLP algorithms can rapidly comb through vast collections of data and identify previously overlooked patterns and relationships that might be relevant to a disease’s etiology and pathology. Henstock’s team used such an approach to scrutinize PubMed’s tens of millions of abstracts. “Just by text mining,” he points out, “we could basically take some genes and figure out what diseases might be related to them.”
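
A common baseline for turning text mining into gene-disease hypotheses is to count how often a gene and a disease are mentioned in the same abstract. The sketch below is only an illustration of that general idea, not a description of Pfizer's pipeline: it assumes the abstracts are already available as plain strings and that the gene and disease name lists are supplied by the caller (the example names are hypothetical).

```python
from collections import Counter
from itertools import product
from typing import Iterable, Set

def cooccurrence_counts(abstracts: Iterable[str], genes: Set[str], diseases: Set[str]) -> Counter:
    """Count, for each (gene, disease) pair, the abstracts that mention both."""
    counts: Counter = Counter()
    for text in abstracts:
        lowered = text.lower()
        # Plain substring matching: simple, but short symbols can cause false hits.
        found_genes = {g for g in genes if g.lower() in lowered}
        found_diseases = {d for d in diseases if d.lower() in lowered}
        for pair in product(sorted(found_genes), sorted(found_diseases)):
            counts[pair] += 1
    return counts
```

Pairs that co-occur far more often than chance would predict become candidate gene-disease links for expert review; raw counts alone do not establish a biological relationship.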

The Pfizer group subsequently incorporated other dimensions into its analysis, looking at publication patterns to identify “trending” areas of disease research where rapid scientific progress might offer a solid foundation for quickly moving a drug development program into the clinic. According to Henstock, this approach achieved a success rate greater than 70% in identifying patterns of heightened disease research activity that ultimately gave rise to clinical trials.
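
One simple way to flag a "trending" research area is to compare recent publication output against an earlier baseline. The sketch assumes per-year publication counts for each disease area have already been extracted. The year windows, the 1.5x threshold, and the add-one smoothing are illustrative assumptions, not Pfizer's published criteria.

```python
from typing import Dict, List

def trending_topics(
    counts_by_year: Dict[str, Dict[int, int]],
    recent_years: List[int],
    baseline_years: List[int],
    min_ratio: float = 1.5,  # hypothetical growth threshold
) -> List[str]:
    """Return topics whose mean yearly publication count grew by at least min_ratio."""
    trending = []
    for topic, by_year in counts_by_year.items():
        recent = sum(by_year.get(y, 0) for y in recent_years) / len(recent_years)
        baseline = sum(by_year.get(y, 0) for y in baseline_years) / len(baseline_years)
        # Add-one smoothing keeps brand-new topics (zero baseline) from dividing by zero.
        ratio = (recent + 1) / (baseline + 1)
        if ratio >= min_ratio:
            trending.append(topic)
    return sorted(trending)
```

A real system would likely normalize for overall growth in publication volume, so that a field does not appear to be "trending" only because the literature as a whole is expanding.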

The value of NLP-mined data can be greatly amplified by threading together structured data from multiple sources into a densely interconnected “knowledge graph.” “We have huge amounts of data in different areas, like omics—chemical-related, drug-related, and disease-related data,” says Natnael Hamda, an AI specialist at Astellas Pharma. “Ingesting that data and integrating those complex networks of biological or chemical information into one collection is tricky.”

The rewards, however, can be considerable, as the resulting interconnections tell a much richer biological story than any one dataset on its own. For example, these interconnections can enable more sophisticated predictions of how to develop therapeutic agents that safely and effectively target a particular medical condition.
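
The sketch below shows, under simplifying assumptions, how merging sources produces connections that no single dataset contains. Each source is a list of (subject, relation, object) triples. A drug-target table and a disease-gene table are merged into one graph, and a two-hop query (disease, associated gene, drug targeting that gene) surfaces candidate drugs. All entity names, relation labels, and functions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)
Graph = Dict[str, Set[Tuple[str, str]]]

def build_graph(*sources: List[Triple]) -> Graph:
    """Merge triples from several sources into one adjacency map; duplicates collapse."""
    graph: Graph = defaultdict(set)
    for source in sources:
        for subj, rel, obj in source:
            graph[subj].add((rel, obj))
    return graph

def candidate_drugs(graph: Graph, disease: str) -> Set[str]:
    """Drugs that target any gene associated with the disease (a two-hop path)."""
    genes = {obj for rel, obj in graph.get(disease, set()) if rel == "associated_with"}
    return {
        node
        for node, edges in graph.items()
        if any(rel == "targets" and obj in genes for rel, obj in edges)
    }
```

Neither the drug table nor the disease table alone links drug_X to disease_Z. The link emerges only after the two are merged, which is the "richer biological story" a knowledge graph is meant to tell.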

AI-focused startup Healx relies heavily on knowledge graphs assembled from a diverse range of public and proprietary sources to gain new insights into rare genetic diseases. The company’s chief scientific officer, Neil Thompson, PhD, notes that this category encompasses roughly 7,000 diseases affecting on the order of 400 million patients in total—but treatments are available for only 5% of these disorders.

“Our main focus is on finding new uses for old drugs,” says Thompson. “We are working with all the data on the 4,000 FDA-registered drugs, and we are building data on drugs registered elsewhere in the world.” This information is complemented by data from the various patient organizations with which Healx collaborates.

According to Thompson, this approach has been highly successful, with multiple disease programs producing candidates that demonstrated efficacy in animal models. One of these programs, a treatment for fragile X syndrome, an inherited intellectual developmental disorder, is on track to enter clinical trials later this year.

Getting a clearer picture

Machine learning is also effective for extracting interesting and informative features from image data. Alnylam has been using computer vision algorithms to profile vast repositories of magnetic resonance imaging (MRI) data collected from different parts of the body in tens of thousands of patients with a range of medical conditions. “We’re training a model based on people that we know have a certain disease or don’t have a certain disease,” says Nioi, “and we’re asking the model to differentiate those two categories based on features it can pick up.”
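
The training setup Nioi describes, with labeled "has disease" and "doesn't have disease" examples and a model that learns to separate them, is supervised binary classification. The sketch below substitutes plain logistic regression, fit by gradient descent, for the computer vision models used in practice. It assumes each scan has already been reduced to a few numeric features (the single-feature toy data are hypothetical).

```python
import math
from typing import List

def predict_proba(w: List[float], x: List[float]) -> float:
    """Probability of the 'has disease' class; w[0] is the bias term."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X: List[List[float]], y: List[int], lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Fit logistic regression weights (bias first) by batch gradient descent on log loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # gradient of log loss w.r.t. z
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Average the gradient over the batch and take one step downhill.
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w
```

In an imaging pipeline, a convolutional network would typically learn the features directly from pixels. The labeled-example training loop, however, follows the same pattern.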

One of the lead indications for this approach is nonalcoholic steatohepatitis (NASH), a hard-to-treat condition in which fat accumulation in the liver contributes to inflammation and scarring—and ultimately, cirrhosis. NASH is a chronic condition that gradually worsens over time, and MRI analysis could reveal early indicators of onset and progression as well as biomarkers that demonstrate the extent to which a therapy is preventing the disease from worsening.

“The idea is to put the disease on a spectrum, so that you can look for these different points of intervention,” explains Nioi. He notes that this approach has already led to some promising drug targets, and that his company is now looking to apply a similar approach to neurological disease.

Several other companies are using machine learning–based image analysis to go deeper into their analyses of disease pathology. For example, cancer immunotherapies can vary widely in their efficacy because of differences in the structure and cellular composition of the “microenvironment” within a tumor, including the strength of the local immune response.

“We can now apply computer vision to identify the spatial condition of the tumor microenvironment,” asserts Hamda. “We are getting huge amounts of omics data at the cellular level, and we can characterize the microenvironment of a tumor cell by applying deep neural networks.” This kind of approach is proving valuable in analyses of the extensive (and publicly available) datasets that are being generated by the Human Cell Atlas Project, The Cancer Genome Atlas Project, and other initiatives.

Progress has been slower in applying AI to the design of drugs themselves, but machine learning continues to be explored as a means of improving the performance of existing drug candidates. “Small-molecule drugs are a multivariate optimization problem, and humans are not very good at doing that,” observes Henstock. His team is working with self-training algorithms that can tweak such compounds based on a variety of physicochemical criteria, and he believes a similar approach should also be suitable for antibody drugs, a class of proteins whose structural and biochemical features are particularly well defined.
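
Part of what makes compound optimization "multivariate" is that improving one property often worsens another. A standard way to reason about such tradeoffs is the Pareto front: the set of candidates that no other candidate beats on every objective at once. The sketch below only filters a fixed set of candidates and does not show the self-training algorithms Henstock mentions. The property names, directions, and values are hypothetical.

```python
from typing import Dict, List

# Hypothetical objectives: True means "higher is better", False means "lower is better".
OBJECTIVES = {"potency": True, "solubility": True, "toxicity": False}

def dominates(a: Dict[str, float], b: Dict[str, float]) -> bool:
    """a dominates b if it is no worse on every objective and strictly better on at least one."""
    no_worse = all(
        (a[k] >= b[k]) if higher_is_better else (a[k] <= b[k])
        for k, higher_is_better in OBJECTIVES.items()
    )
    strictly_better = any(
        (a[k] > b[k]) if higher_is_better else (a[k] < b[k])
        for k, higher_is_better in OBJECTIVES.items()
    )
    return no_worse and strictly_better

def pareto_front(candidates: Dict[str, Dict[str, float]]) -> List[str]:
    """Names of candidates not dominated by any other candidate."""
    return sorted(
        name
        for name, props in candidates.items()
        if not any(dominates(other, props) for o, other in candidates.items() if o != name)
    )
```

The front does not pick a single winner. It narrows the field to compounds that represent genuinely different tradeoffs, leaving the final choice to chemists or to a weighting scheme.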

Alnylam is also using machine learning to fine-tune its therapeutics, which are based on chemically modified RNA sequences that directly interfere with the expression of disease-related genes. “We’re thinking about designing molecules that optimally knock down a target—[and] only the target that you’re interested in—without broad effects on the transcriptome,” says Nioi.
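
One widely discussed source of unintended effects for RNA interference therapeutics is seed-mediated off-targeting. In this process, nucleotides 2 through 8 of the guide strand pair with partially complementary sites in other transcripts, much as a microRNA would. The sketch below scans hypothetical transcript sequences for exact matches to the site complementary to that seed. It is a deliberately simplified heuristic for illustration and does not describe Alnylam's design methods.

```python
from typing import Dict

# RNA base-pairing complements.
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def seed_site(guide: str) -> str:
    """Target-site sequence (5'->3') complementary to guide positions 2-8."""
    seed = guide[1:8]  # positions 2-8 (1-based) of the guide, read 5'->3'
    # Complement, then reverse, because the two strands pair antiparallel.
    return seed.translate(COMPLEMENT)[::-1]

def seed_matches(guide: str, transcripts: Dict[str, str]) -> Dict[str, int]:
    """Non-overlapping seed-site counts for each transcript with at least one match."""
    site = seed_site(guide)
    return {name: seq.count(site) for name, seq in transcripts.items() if site in seq}
```

A guide whose seed site appears in many unrelated transcripts would, by this crude measure, carry a higher risk of broad effects on the transcriptome. Real design tools weigh many more factors, including site context and partial matches.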

AI on the rise

After 20 years at Pfizer, Henstock is seeing unprecedented enthusiasm around making AI a core part of the company’s processes. “I ran an AI course three years ago … that seemed to change a lot of the conversations,” he recalls. “We had executive meetings that actually put themselves on hold so they could attend this session.”

According to MarketsandMarkets, the market for AI in drug discovery is projected to reach a global value of $1.4 billion by 2024, up from $259 million in 2019, a compound annual growth rate of 40.8%. This growth is driven by factors such as the increasing number of cross-industry collaborations and partnerships, the need to develop drugs more quickly and cost-effectively, the rising adoption of cloud-based computing, and the impending patent expiry of blockbuster drugs.
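
As a quick arithmetic check, the compound annual growth rate over the five years from 2019 to 2024 can be recomputed from the two quoted market values (in billions of dollars):

```latex
\mathrm{CAGR} = \left(\frac{V_{2024}}{V_{2019}}\right)^{1/5} - 1
             = \left(\frac{1.4}{0.259}\right)^{1/5} - 1
             \approx 1.401 - 1 \approx 40.1\%

0.259 \times (1.408)^{5} \approx 1.43
```

The rounded figures give about 40.1%. Working backward, a 40.8% rate implies a 2024 value of roughly $1.43 billion, which rounds to the quoted $1.4 billion, so the reported numbers are consistent.
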
A similar transition is playing out elsewhere. At Astellas, for example, investments in AI are extensive. “There are so many big AI or machine learning initiatives here,” says Hamda, who began working with Astellas as a research fellow in 2020.

It takes skilled experts who know their way around the dense forest of available algorithms to make the most of these capabilities. Hamda notes that choosing the wrong computational approach for a given research question can be disastrous in terms of wasted time and resources.

“I fear that some companies are developing ‘black box’ software,” he continues. “[It’s possible that] people are just entering input and collecting output without knowing what’s going on inside.” He emphasizes the importance of planning and building explainable and reproducible computational workflows that allow the users to trust the quality of the resulting models.

Although many machine learning techniques are generalizable across fields, effective implementation in the context of drug discovery also requires deep expertise in the relevant scientific disciplines. Henstock recalls starting at Pfizer as an engineer with a doctorate in AI.

“I couldn’t understand the chemists or the biologists—they speak their own language and have their own concepts,” he says. “And if you can’t understand what they’re trying to get at, you can’t really do your job very well.” This disconnect motivated him to return to school for a biology degree.

Building out such capacity is costly and labor intensive, and some companies are opting for a hybrid model in which some AI-oriented projects are contracted out to smaller startups for which computational biology is the primary focus. For example, Alnylam has partnered with a company called Paradigm4 for key aspects of its machine learning–guided drug development efforts. “It’s really down to resources,” declares Nioi. “There are people that do this for a living and spend their entire time focused on one problem, whereas we’re juggling many things at the same time.”

But in the long run, the gains from bringing AI on board could be huge. In a 2020 article, Henstock cited projections indicating that the pharmaceutical industry could boost earnings by more than 45% by making strong investments in AI.2 “This means making some interesting tradeoffs in how we do science, how we approach problems, and how we approach our processes,” he explains. “But it’s kind of critical because you can do better experiments with greater richness.”


1. Callaway E. ‘It will change everything’: DeepMind’s AI makes gigantic leap in solving protein structures. Nature 2020; 588: 203–204.
2. Henstock P. Artificial Intelligence in Pharma: Positive Trends but More Investment Needed to Drive a Transformation. Arch. Pharmacol. Ther. 2020; 2(2): 24–28.
