The Next Generation of Drugs Will Be Enhanced by Machine Learning

In 1950, Alan Turing discussed the mathematical possibility of artificial intelligence (AI) in his paper “Computing Machinery and Intelligence.” The paper was truly revolutionary and ahead of its time. Computers predating 1950 lacked the capability to store commands, making learning impossible. However, as technology progressed, scientists were able to work toward an AI proof of concept.

In 1956, the RAND Corporation (Research AND Development) presented Logic Theorist, a program designed to mimic the problem-solving skills of a human. In the nearly 70 years since, AI has continued to evolve, overcoming significant challenges, and today it has the benefit of a world full of data, novel techniques, and dramatically greater processing power.

AI is no longer just the stuff of science fiction lore about humanoid robots. In drug development, chemists and machine learning (ML) systems can work together to solve complex drug discovery problems. ML systems aren’t here to replace the established medicinal chemistry processes, but to allow medicinal chemists to operate at a vastly different scale than they could using traditional approaches. With ML tools, medicinal chemists can search much larger universes of potential compounds.

Medicinal chemists who rely on traditional drug design processes must compromise creativity to deal with practicalities such as complex syntheses, low-throughput processes, and tight budgets. A medicinal chemist must first generate a lot of good ideas and then prioritize them. Only a few ideas can progress to testing.

Grant Wishart, Charles River Laboratories

Using ML models to expand the idea space and to virtualize testing allows chemists to evaluate several orders of magnitude more compounds. These models therefore give chemists a better chance of identifying structures that possess the multiple (often competing) desired properties that make a drug a drug.

For instance, a drug that needs to have high affinity for the target of interest might also need to minimize binding to other similar off-targets that cause toxicity, all while having specific permeability, metabolic stability, molecular weight, and plasma protein binding properties. Given a target, an ML model can extract the critical features needed to hit that target, and similarly learn from existing data for drug successes and failures to converge on a drug much more rapidly.
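One common way to express this kind of trade-off in code is a multi-parameter desirability score. The sketch below is only an illustration, not a description of any particular platform: the `Compound` fields, the `ramp` helper, and every threshold are hypothetical values chosen for the example, and a real program would use more properties and carefully tuned ranges.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Compound:
    name: str
    target_pic50: float      # predicted potency at the intended target
    off_target_pic50: float  # predicted potency at a related off-target
    mol_weight: float        # molecular weight in daltons


def ramp(value: float, low: float, high: float) -> float:
    """Map value linearly onto [0, 1]: 0 at or below low, 1 at or above high."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def desirability(c: Compound) -> float:
    """Geometric mean of per-property scores; any zero score vetoes the compound."""
    # All thresholds below are illustrative assumptions, not recommended values.
    scores = [
        ramp(c.target_pic50, 5.0, 8.0),                       # more potent is better
        ramp(c.target_pic50 - c.off_target_pic50, 0.0, 2.0),  # selectivity window
        1.0 - ramp(c.mol_weight, 400.0, 600.0),               # lighter is better
    ]
    return prod(scores) ** (1.0 / len(scores))
```

Because the score is a geometric mean, a compound that fails badly on one property (for example, no selectivity over the off-target) scores zero no matter how potent it is, which mirrors the idea that the desired properties must all be satisfied at once.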

The exciting part is that such methods have the potential to provide new chemistries and to optimize these chemistries far more efficiently, improving the ability of medicinal chemists to generate safe, effective medicines. Ultimately, this is not a case of AI displacing medicinal chemists, but of AI empowering medicinal chemists—by helping them adopt more efficient processes.

Traditional discovery gets an upgrade

The traditional drug discovery process has increased in scale, but it has not fundamentally changed in several decades. Most projects still involve brute-force screening followed by chemical-intuition-driven workflows. Computation’s primary role has been to aid the intuition of chemists. For a given target, early discovery groups rely upon their knowledge and experience to design a biological assay suitable for a high-throughput screen with the goal of getting starting points for future compounds. Given these initial hits, there is a subsequent trial-and-error process to refine compound sets until, in the successful cases, there are one or more series of compounds to push forward.

It may seem obvious, but AI enables the ingestion and analysis of far more information than humans can handle on their own. However, the benefits go well beyond simple information gathering. The data can be abstracted, and the AI can start decoding patterns across billions of compounds. For instance, a specific chemical feature may be responsible for binding affinity or for catalyzing a particular reaction. The AI tools now available let chemists evaluate chemistry spaces that are thousands of times larger than the ones that could have been evaluated before.

This then allows researchers to push more quickly through each iteration of the design-make-test-analyze (DMTA) cycle, and to use fewer iterations and fewer compounds per iteration, ultimately leading to a more efficient optimization trajectory.

Guido Lanza, Valo Health

Over the past 20 years, ML tools have become increasingly mainstream, enhancing our ability to benefit from the vast amounts of data, public and private, generated by the scientists who came before us. It would of course be impossible for a chemist to hold every known, publicly accessible chemical paper in their mind at one time. It’s equally impractical for a chemist to remember every piece of data they have personally generated for even a single program, or to trade off more than a few competing objectives in a lead optimization program. Machine learning, however, can do this relatively easily, allowing us to benefit from the decades of work that have gone into digitizing the medicinal chemistry data in the public domain.

These public datasets come in various forms. Some are specialized, well-characterized datasets, others are from published literature, and others are from patents. Each has different degrees of labeling, standardization, and utility, but all can eventually be incorporated into novel AI tools.

Humans and machines together create a new discovery landscape

To achieve integrated human/AI drug discovery, we need to think differently. We need to know how we can generate data to fuel the AI models, and how we can translate those models in the laboratory to make actual compounds. The novelty is now in the process. We can decide how best to use AI alongside traditional chemistry and human intelligence to increase efficiency. We can let the computer virtualize as many activities as possible—synthesis, testing, or even the entire DMTA cycle—and give ourselves the ability to work at much larger scales.

As an example, consider high-throughput screening. A typical screen will run thousands or hundreds of thousands of compounds against a given target, but with AI-driven virtual screens, we have the option of searching billions or trillions of reasonable drug-like compounds. The key, then, is to ask, “How can we best scan and score billions of compounds?”
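One simple answer, at least at the level of data handling, is to stream compounds through a scoring function and retain only the best few, so that the full library never has to sit in memory. The following is a minimal sketch under stated assumptions: `top_k_hits` and `toy_score` are hypothetical names, and `toy_score` is a deliberately trivial stand-in for a trained model.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def top_k_hits(
    compounds: Iterable[str],
    score: Callable[[str], float],
    k: int,
) -> List[Tuple[float, str]]:
    """Return the k best (score, compound) pairs, highest score first."""
    # The generator scores compounds one at a time as they are consumed,
    # so the input can be a file or database cursor rather than a list.
    scored = ((score(c), c) for c in compounds)
    return heapq.nlargest(k, scored)


def toy_score(smiles: str) -> float:
    """Placeholder for a trained model: counts 'N' characters in the string."""
    return float(smiles.count("N"))
```

A call such as `top_k_hits(library, toy_score, 2)` works the same way whether `library` is a short list or a generator over a very large file; in practice the scoring step, not the bookkeeping, dominates the cost.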

For a drug discovery service like Logica, this human/AI partnership is crucial. Logica draws on the chemical research capabilities of Charles River Laboratories and the AI technology of Valo Health. It leverages both worlds to enhance early discovery and even to allow a fundamental rethink of how the entire drug discovery process is conducted.

To see how, consider this example: A target is suggested by a partner that intends to hit that target with a new drug candidate. If there is a lot of data already available on the target, initial AI models can be generated to predict binding. A high-throughput screen may not be necessary, significantly reducing project timelines. If a lot of data is not available, the process can be applied more broadly using high-throughput screening or DNA-encoded library approaches to generate data to seed model building. Novel compounds are suggested, analyzed, and tested with the aid of human and automated chemical research, generating more target-specific data for further refinement.

This iterative process, which is a tightly coupled mix of predictions and experimental validations, eventually culminates in the selection of advanceable leads. These leads undergo further refinement to ensure that they meet the additional criteria to support in vivo testing and, eventually, IND-enabling studies. Here, the goal for the AI algorithms is to generate models that have been localized, meaning models that are tuned to predict properties very accurately in the specific chemistries being advanced. The AI-enhanced optimization can result in fewer iterations in the DMTA cycle, and in better decisions about which compounds to progress, enhancing the development of preclinical candidates.

It is important to note that for the process to work, we need to insert a different level of intentionality into the process of laboratory data generation. Am I making and testing a compound to build an ML model, or am I making a compound to exploit the models to look for solutions to my discovery process?

For example, early on, a program might focus on data generation that provides as many diverse scaffolds as possible to build initial ML models. Later, those models might be used to select compounds, not just to optimize within a known chemical space, but to provide an advanceable lead series that is both patentable and able to test the biological hypothesis in the laboratory.
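The distinction between making compounds to build models and making compounds to exploit them can be illustrated with a simple selection rule. The sketch below assumes that a model supplies both a predicted activity and an uncertainty for each compound; the `Prediction` type, the `select_batch` function, and the `kappa` weight are hypothetical and are not taken from the Logica workflow. It ranks compounds by predicted activity plus `kappa` times uncertainty, a rule often called an upper confidence bound.

```python
from typing import List, NamedTuple


class Prediction(NamedTuple):
    compound: str
    mean: float         # predicted activity
    uncertainty: float  # model's uncertainty about that prediction


def select_batch(preds: List[Prediction], batch_size: int, kappa: float) -> List[str]:
    """Pick compounds to make next, ranked by mean + kappa * uncertainty."""
    # kappa = 0 purely exploits the model; larger kappa favors compounds
    # the model knows little about, which generates data to improve it.
    ranked = sorted(preds, key=lambda p: p.mean + kappa * p.uncertainty, reverse=True)
    return [p.compound for p in ranked[:batch_size]]
```

With `kappa` set high early in a program, the selection leans toward unfamiliar scaffolds that teach the model the most; lowering it later shifts the same code toward the compounds predicted to perform best.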

In each iteration of the DMTA cycle, the models learn by incorporating new data, building on existing knowledge and incrementally filling gaps. This level of data intentionality represents a set of capabilities that allow the combined team of drug hunters, from both Charles River and Valo Health, to drive toward a candidate much more efficiently.

An example of the benefits of this approach is a study conducted by Logica to find inhibitors of a protein tyrosine kinase. The target posed significant challenges, common among kinases, around both selectivity and a crowded intellectual property landscape. Using a strategy of applying high-throughput screening, DNA-encoded library, and AI approaches in parallel, the team identified hit matter from all approaches across a number of chemotypes. Subsequent AI-guided chemistry optimization cycles very quickly led to the identification of novel advanceable chemotypes suitable for further optimization.

In the end, what does this mean for the role of AI in drug discovery? What is the point of all this effort? The goal is not just to save time, money, or resources, or even to simply get drugs to patients faster—though any one of these goals would justify the AI-driven approach.

By integrating data, AI, and human ingenuity in this way, you change the risk profile of entering a small-molecule program. Currently, there is a daunting uncertainty that accompanies small-molecule discovery, where a negative experimental result could force the team to backtrack for months or years. With this approach, you enter every new phase of discovery with a quantifiable estimate of risk—a real understanding of how successful you are going to be. These approaches are only the beginning but show that AI, coupled with data generation, will irreversibly change the way you prosecute drug design problems.


Grant Wishart, PhD, is senior director, small molecule drug discovery and Logica lead at Charles River Laboratories. Guido Lanza is vice president of integrated research and Logica general manager at Valo Health.