When Insilico Medicine’s CEO Alex Zhavoronkov, PhD, and his co-authors wrote their 2023 opinion paper on AI-powered target discovery,1 he was thinking about how often target discovery is misunderstood. In his experience, most large pharmaceutical or biotechnology companies already have ways of selecting therapeutic targets, and in some cases, they already know which ones to pursue. What they are looking for with target discovery solutions is reassurance.

It’s like selecting which stocks to invest money in, he told GEN magazine. Pharmaceutical companies “want to be convinced that the targets that they are going after are the right ones.” They also need to know when to move on those targets. “If you go after them too early, nobody’s going to buy them,” he explained. “If you go after them too late, everybody [else is] going to have a molecule. So, you need to have this kind of balance.”

This position is understandable given the significant investment required to move drugs from discovery through clinical trials and all the way to regulatory approval, and the oft-repeated statistic that 90% of drugs fail during clinical trials. Larger pharmaceutical companies may have the resources to bet on entirely novel targets, but smaller companies have a lot more to lose if their chosen candidate fails.

Rising hopes

Given the challenges of drug development, it makes sense for pharmaceutical companies to invest in solutions that they are convinced will increase their probability of success or, at the very least, lower their costs. And the success of tools like DeepMind’s AlphaFold protein structure prediction software, as well as generative AI like OpenAI’s ChatGPT, has boosted scientists’ interest and confidence in AI as the next big technology for drug discovery.

Step onto the floor of almost any scientific conference these days, and “AI is very much on everyone’s lips,” Timothy Cheeseright, PhD, chief technology officer at Cresset Biomolecular Discovery, told GEN during a conversation at this year’s Bio-IT World Conference held in Boston. Although Cresset does not focus specifically on target identification, the company provides computational chemistry solutions that use some AI-based methods for drug discovery. Cheeseright believes that some forms of AI like large language models can help medicinal chemists and biologists work more productively. In fact, Cresset is working on integrating generative AI into its portfolio of drug design solutions and recently hired Mutlu Dogruel, PhD, an expert in applying large language models, from Microsoft to drive its strategy.

It goes without saying that part of what’s driving the current interest in AI is the diversity and size of biomedical research datasets. As a neuroscientist, Jonathan Witztum, PhD, understands the big data problem all too well. He now works as the chief technology officer of the US subsidiary of Syntekabio, a publicly traded AI-based drug discovery company headquartered in South Korea with offices in New York. But he previously worked in a neuroscience laboratory where he generated terabytes of images and video of the brain and behavior that quickly outpaced the capacity of regular analysis techniques. The scientific community is “very good at collecting tons of data,” he says. AI technologies, he adds, are “a very good way to deal with tons of data in a fairly quick way [that’s] also consistent.”

Cresset Biomolecular Discovery’s computational chemistry solutions and service offerings can expedite drug discovery. For example, the company recently released the latest version of its Flare drug discovery platform, which aims to improve ligand-based and structure-based drug design and lead optimization. Flare can be integrated with the KNIME analytics platform to enable ligand comparisons and investigations, docking and scoring experiments, the building of 3D models, and the scoring of new molecular designs.

And there is promising evidence about the potential of drugs found with AI’s help. Companies that use AI-driven drug discovery, such as BenevolentAI, Insilico Medicine, and Recursion Pharmaceuticals, have brought drug candidates to clinical trials, although whether these candidates will make it through to approval remains to be seen. One recent analysis found that as of December 2023, “24 AI-discovered molecules have completed Phase I trials, of which 21 were successful,” an 80–90% success rate. That same analysis found that 10 AI-discovered molecules had completed Phase II trials as of late last year with a 40% success rate.2 And in their paper, Zhavoronkov and colleagues point to the possibility of combining synthetic data and AI models to find targets and drugs for rare diseases where there is limited patient data available.

Lingering doubts

But whether AI can deliver remains an open question. Many scientists agree on the need for caution about touting AI’s benefits while the community works out exactly which techniques work best for drug discovery. The technology is evidently here to stay, and there are specific applications where it seems to work well. However, as Cheeseright observes, “We are still in the hype curve.”

The degrees to which distinct forms of AI can have an impact are vastly different. For years, scientists have used machine learning, a branch of AI, in drug discovery, but in Cheeseright’s view, applying things such as deep neural networks requires “stacks of data—and that data does not exist in preclinical research.”

And of course, AI models are not magical things. No matter how smart an algorithm is, as with any computational system, it is human-made and can make mistakes. Complex wet lab testing and validation studies are still needed to ensure that drugs and targets bind as expected, safely and with enough affinity to be turned into new drugs. And what of the notion that a purely in silico approach could predict the success of a compound? Perhaps, Witztum says, “It’s something we are all aspiring to and working hard to accomplish, but there is a long way to go.”

Potential solutions

Algorithms that were once discussed primarily in the halls of academia and at select scientific conferences have now made their way far beyond those spaces. Pharmaceutical companies across the board are moving forward with large investments aimed at integrating AI into their processes. And whether this is due to clever rebranding or intentional investment, a growing list of software and services companies now claim to have AI-based solutions for almost all aspects of drug discovery. The solutions for target identification, for example, range from machine learning to generative AI. With these solutions, models are trained extensively using various forms of omics data as well as information on drug-target interactions, clinical trials, pharmacokinetics predictions, textual data, and much more to capture relationships between genes, proteins, pathways, and drugs.

AstraZeneca, for example, has worked with BenevolentAI to identify novel targets for chronic kidney disease and idiopathic pulmonary fibrosis. The list also includes companies like Atomwise, whose AI-based software, AtomNet, uses structure-based drug design to find novel small molecules for protein targets. Companies such as Insilico Medicine have also licensed internally developed preclinical candidates identified using AI to pharmaceutical companies for large payouts—about half a billion dollars in Insilico’s case.

Atomwise uses AI for its structure-based drug design technology, which is designed to enable scientists to predict how well a small molecule will bind to a target protein of interest. The technology can also lessen reliance on empirical screening. The company asserts that its technology can screen billions of compounds, and has demonstrated success using homology-modeled proteins.

Zhavoronkov’s company also offers an AI-based target discovery solution, PandaOmics, that combines as many as 60 learning models to create an “orchestra of models,” as Zhavoronkov puts it. These models can work independently or in unison to paint a detailed picture of protein targets. “The key is, never trust just one method,” he says. “You need to look at many, many methods.”
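One generic way to act on the advice to consult many methods is to merge the rankings produced by several models into a single consensus list. The sketch below uses simple average-rank aggregation; it is an illustration only and not a description of how PandaOmics combines its models. The function name, model names, and target labels are all hypothetical.

```python
from statistics import mean


def consensus_ranking(model_scores):
    """Rank targets by their average position across several models.

    model_scores maps a (hypothetical) model name to a dict of
    {target: score}, where a higher score means the model favors
    that target more strongly. Only targets scored by every model
    are ranked. At least one model must be supplied.
    """
    # Keep only the targets that every model has scored.
    targets = set.intersection(*(set(s) for s in model_scores.values()))
    ranks = {target: [] for target in targets}
    for scores in model_scores.values():
        # Position 1 is the model's top pick; ties break alphabetically.
        ordered = sorted(targets, key=lambda t: (-scores[t], t))
        for position, target in enumerate(ordered, start=1):
            ranks[target].append(position)
    # A lower average position means stronger agreement that the target is good.
    return sorted(targets, key=lambda t: (mean(ranks[t]), t))
```

A target that one model loves and the others rank poorly ends up mid-list, which is the practical effect of not trusting a single method.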

For target discovery, PandaOmics’ models learn from gene expression, disease association, methylation, proteomics, and microRNA data, as well as textual data from published literature and other resources to identify targets. And the scope of the information it covers is substantial. It includes five million omics data samples, three million grants, nearly four million patents, 30 million publications, and 1.3 million compounds and biologics. It also includes a large language model, akin to ChatGPT, that facilitates conversations between researchers and the platform.

For a given disease or disorder, the system prioritizes protein targets based on factors such as the novelty of the target, the degree of confidence in the target’s potential, and the target’s commercial tractability. Users can also evaluate information pertaining to druggability—information about the protein target’s structure, drug-binding ability, and accessibility to small molecules or antibodies. Users can also explore the types of supporting data for each target including the number of samples included in each experimental dataset, the platforms used in previous studies, and any data from any trials for the target. It’s even possible to see details on how much money was invested in generating the datasets and how much it would cost to replicate a similar study.
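To make the idea of multi-factor prioritization concrete, here is a minimal sketch of a weighted score over the three factors named above: novelty, confidence, and commercial tractability. It assumes each factor has already been scaled to a value between 0 and 1. The class, function, and weight names are hypothetical, and the weighting scheme is an assumption for illustration rather than the platform's actual method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetProfile:
    """Hypothetical per-target factors, each assumed to lie in [0, 1]."""
    name: str
    novelty: float
    confidence: float
    tractability: float


def prioritize(profiles, weights):
    """Sort targets by a weighted average of their factors, best first.

    weights is a dict with the keys "novelty", "confidence", and
    "tractability"; the values express how much the user cares
    about each factor and need not sum to 1.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    def score(profile):
        return (
            weights["novelty"] * profile.novelty
            + weights["confidence"] * profile.confidence
            + weights["tractability"] * profile.tractability
        ) / total

    # Highest score first; ties break alphabetically by target name.
    return sorted(profiles, key=lambda p: (-score(p), p.name))
```

Shifting the weights changes the order: a team willing to take risks on new biology might weight novelty heavily, while a team seeking reassurance might weight confidence instead.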

Insilico Medicine has developed PandaOmics, a cloud-based software platform that applies artificial intelligence and bioinformatics techniques to multimodal omics and biomedical text data for therapeutic target and biomarker discovery. In a recent paper, the company’s scientists presented several case studies in which PandaOmics’ target identification capabilities were validated (Kamya et al. J. Chem. Inf. Model. 2024; doi: 10.1021/acs.jcim.3c01619).

Insilico Medicine launched PandaOmics about two years ago. Since then, the platform has been tried by thousands of users. And Insilico continues to find ways to expand PandaOmics’ capabilities. Some of its expansion plans emerge from interactions with academics, Zhavoronkov says. In fact, PandaOmics was inspired in part by the work of Open Targets, a consortium of academic and pharmaceutical partners that developed an open AI-based platform for identifying and prioritizing drug targets. “If the community comes up with ideas and we see that one of its methods is interesting,” Zhavoronkov points out, “we add it to the platform with the requisite permission.”

Further down the drug discovery pipeline are companies such as Syntekabio that focus on finding and testing viable candidates for clinical development. When the company was launched in 2009, it planned to provide rare mutation detection services for newborns. It has since pivoted away from that to focus on the drug discovery market and is betting big on AI’s potential. It’s worth noting at this juncture that the computational power required to run AI-based algorithms is not trivial. Syntekabio has invested heavily in its infrastructure by building a computation center that has about 5,000 servers currently, and that will soon have up to 10,000 servers. Syntekabio plans to build at least two more such centers by 2026, meaning it will have a grand total of 30,000 servers to support its computational needs.

“All our modeling is based on the physics of interactions,” Witztum declares. “This enables us to work with almost any target and disease.” For example, if the company wants to find small molecules for a target protein, its AI first considers all of the potential biophysical interactions occurring between the protein’s residues as well as its interactions with ions and water molecules that enter the pocket. It then designs structures that can fit in the binding pocket of the protein.

And once the AI platform learns the core principles of the interaction, it can apply them to new compounds, antibodies, or neoantigens. Syntekabio offers its clients candidates that it believes have the best chance of securing regulatory approval. Each of these candidates is thoroughly vetted. Witztum states, “We have high confidence that it’s going to work well because we tested it and validated it and got good results.”



  1. Pun FW, Ozerov IV, Zhavoronkov A. AI-powered therapeutic target discovery. Trends Pharmacol. Sci. 2023; 44(9): 561–572. DOI: 10.1016/j.tips.2023.06.010.
  2. Jayatunga MKP, Ayers M, Bruens L, et al. How successful are AI-discovered drugs in clinical trials? A first analysis and emerging lessons. Drug Discov. Today 2024; 29(6): 104009. DOI: 10.1016/j.drudis.2024.104009.