Despite the buzz around artificial intelligence (AI), most industry insiders know that the use of machine learning (ML) in drug discovery is nothing new. For more than a decade, researchers have used computational techniques for many purposes, such as finding hits, modeling drug-protein interactions, and predicting reaction rates. What is new is the hype. As AI has taken off in other industries, countless start-ups have emerged promising to transform drug discovery and design with AI-based technologies for things like virtual screening, physics-based biological activity assessment, and drug crystal-structure prediction.

Investors have made huge bets that these start-ups will succeed. Investment reached $13.8 billion in 2020, and more than one-third of large-pharma executives report using AI technologies. While a few “AI-native” candidates are in clinical trials, around 90% remain in discovery or preclinical development, so it will take years to see whether the bets pay off.

Along with big investments come high expectations: drug the undruggable, drastically shorten timelines, virtually eliminate wet lab work. Insider Intelligence projects that discovery costs could be reduced by as much as 70% with AI. Unfortunately, it’s just not that easy. The complexity of human biology precludes AI from becoming a magic bullet. On top of this, data must be plentiful and clean enough to use. Models must be reliable. Prospective compounds need to be synthesizable. And drugs must pass real-life safety and efficacy tests.

While this harsh reality hasn’t slowed investment, it has led to fewer companies receiving funding, to devaluations, and to the discontinuation of some loftier programs, like IBM’s Watson AI for drug discovery. This raises the question: Is AI for drug discovery more hype than hope? Absolutely not. Do we need to adjust our expectations and position for success? Absolutely. But how? Implementing AI in drug discovery requires reasonable expectations, clean data, and collaboration. Let’s take a closer look.

Reasonable expectations

AI can be a valuable part of a company’s drug discovery program. But, for now, it’s best thought of as one option in a box of tools. Clarifying when, why, and how AI is used is crucial, albeit challenging. Investment has largely fallen to companies developing small molecules, which lend themselves to AI because they’re relatively simple compared to biologics, and because there are decades of data upon which to build models.1,2 There is also great variance in the ease of applying AI across discovery, with models for early screening and physical-property prediction seemingly easier to implement than those for target prediction and toxicity assessment.3,4

While the potential impact of AI is incredible, we should remember that good things take time. Pharmaceutical Technology recently asked its readers to project how long it might take for AI to reach its peak in drug discovery, and by far, the most common answer was “more than nine years.”

Clean data

“The main challenge to creating accurate and applicable AI models is that the available experimental data is heterogeneous, noisy, and sparse, so appropriate data curation and data collection is of the utmost importance.”

This quote from a 2021 Expert Opinion on Drug Discovery article speaks to the importance of collecting clean data. While it refers to ADMET and activity prediction models, the assertion also holds true more generally. AI requires good data, and lots of it.

But good data are hard to come by. Publicly available data can be inadequate, forcing companies to rely on their own experimental data and domain knowledge. Unfortunately, many companies struggle to capture, federate, mine, and prepare their data, perhaps due to skyrocketing data volumes, outdated software, incompatible lab systems, or disconnected research teams. Success with AI will likely elude these companies until they implement technology and workflow processes that let them capture data error-free without relying on manual processing, handle the volume and variety of data produced by different teams and partners, ensure data integrity, and standardize data for model readiness.
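To make "standardize data for model readiness" a little more concrete, here is a minimal sketch of one curation step, written in Python. It is an illustration only: the record fields (`compound_id`, `ic50`, `unit`), the unit table, and the `curate` function are all hypothetical, not taken from any particular vendor's system, and a real pipeline would need far more than this.

```python
from statistics import median

# Conversion factors to nanomolar. The set of units is an assumption
# made for this illustration, not an exhaustive list.
TO_NM = {"nM": 1.0, "uM": 1_000.0, "mM": 1_000_000.0}


def curate(records):
    """Return {compound_id: median IC50 in nM} from raw assay records."""
    by_compound = {}
    for rec in records:
        value, unit = rec.get("ic50"), rec.get("unit")
        # Skip incomplete, unrecognized-unit, or non-positive records
        # rather than guessing what they meant.
        if value is None or unit not in TO_NM or value <= 0:
            continue
        # Normalize the identifier so "cmpd-1" and "CMPD-1 " are one compound.
        key = rec["compound_id"].strip().upper()
        by_compound.setdefault(key, []).append(value * TO_NM[unit])
    # Collapse replicate measurements into a single value per compound.
    return {cid: median(vals) for cid, vals in by_compound.items()}


# Hypothetical records from two teams with different conventions.
raw = [
    {"compound_id": "cmpd-1", "ic50": 0.5, "unit": "uM"},
    {"compound_id": "CMPD-1 ", "ic50": 700.0, "unit": "nM"},
    {"compound_id": "CMPD-2", "ic50": None, "unit": "nM"},
    {"compound_id": "CMPD-3", "ic50": 2.0, "unit": "mg/mL"},
]
clean = curate(raw)
```

In this toy example, the two measurements of the first compound are converted to a common unit and merged, while the record with a missing value and the one with an unconvertible unit are set aside. Whether to drop, flag, or repair such records is a judgment call each organization has to make for itself.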


Collaboration

Companies hoping to leverage AI need a full view of all their data, not just bits and pieces. This demands a research infrastructure that lets computational and experimental teams collaborate, uniting workflows and sharing data across domains and locations. Careful process and methodology standardization is also needed to ensure that results obtained with the help of AI are repeatable.

Beyond collaboration within organizations, key industry players are also collaborating to help AI reach its full potential, making security and confidentiality key concerns. For example, many large pharmaceutical companies have partnered with start-ups to help drive their AI efforts. Collaborative initiatives, such as the MELLODDY Project, have formed to help companies leverage pooled data to improve AI models. And vendors like Dotmatics are building AI models using customers’ collective experimental data.



  1. Halford, B. Chemical and Engineering News. 2022, 100 (2). New drug approvals held steady in 2021.
  2. Buvalia, A. BiopharmaTrend.com. 2022. Will Biologics Surpass Small Molecules In The Pharmaceutical Race?
  3. R&D Software Market Study; McKinsey & Co.; Jan 2022.
  4. Life Science’s Lab Informatics Digital Criteria to Separate Vendor Leaders From Laggards; Gartner; 2022,12,20.
  5. Bigal, M. The Medicine Maker. 2021. A Reawakening of Small Molecule Drug Development.
  6. Trager, R. Chemistry World. 2014. Almost half of US researchers’ time goes on admin.


Haydn Boehm, PhD, is director of product marketing at Dotmatics.
