Patterns and commonalities that escape human notice simply because they are beyond human comprehension needn’t remain hidden. They can be revealed with the help of artificial intelligence (AI) and machine learning (ML), as has been demonstrated in many disciplines.

However, in the world of drug discovery, it seems that AI and ML have taken off only recently. Hardly a year has passed since Insilico Medicine’s INS018_055 became the first AI-generated drug to enter Phase II trials. Nonetheless, through the work of pioneering scientists, drug discovery is gaining on other disciplines in realizing the benefits of AI and ML. Indeed, as this article relates, there are several outstanding examples of scientists who are generating creative ideas and producing innovative solutions in AI-enabled drug discovery.

Creating drugs in computational space

Nicolas Tilmans, PhD
Founder and CEO, Anagenex

For many biological problems, the size and quality of the available data sets fundamentally limit the application of state-of-the-art AI models.

“When you look at the kinds of datasets that exist in pharma, the biggest screening decks tend to be on the order of a single-digit million compounds,” Nicolas Tilmans, PhD, CEO and founder of Anagenex, told GEN. “If you look at what people tend to do with ChatGPT, you’re training on the entire internet.”

Jen Nwankwo, PhD
Co-founder and CEO, 1910 Genetics

At the very front end of the drug discovery process for Anagenex is a custom-built library of several billion compounds, tested in a lab under dozens of conditions. Anagenex scientists are generating huge amounts of data to feed an AI engine capable of designing small-molecule oncology medicines, specifically in the context of synthetic lethality.

At 1910 Genetics, the starting point is synthetic data. Jen Nwankwo, PhD, founder and CEO of 1910 Genetics, says, “In the biological context, when tech people talk about synthetic data, they are talking about data that AI can create. You can find new ways to increase your corpus of data so that your ML models can have greater depth and breadth on which to perform.”

But designing molecules with AI comes with the major hurdle of making relevant molecules that can actually be synthesized and follow the rules of nature.

AI predictions often generate many false positives. For example, when performing molecular docking predictions, if the parameters aren’t right, the software will fit a ligand into a binding site regardless of whether it will actually bind experimentally and have the predicted function. Even under the best of circumstances, purely computational approaches frequently miss the mark.

“We’ve seen models that create these crazy-looking chemical molecules that have carbon connected to five bonds—it’s organic chemistry 101,” Nwankwo remarks. “Or sometimes you have AI create molecules that look like they could exist in nature, but when you try to make them, you find that you have a 25% yield, or that you can’t even make them at all.”
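The "carbon connected to five bonds" failure that Nwankwo describes is the kind of error a simple valence check can catch before a generated structure goes any further. The sketch below is only an illustration, not a description of any company's pipeline: the function name, the input format (a list of element symbols plus bond tuples), and the simplified valence table are all assumptions made for this example, and charged or hypervalent atoms are ignored.

```python
# Maximum total bond order allowed per neutral atom.
# Simplified textbook values (an assumption for this sketch);
# charged atoms and hypervalent species are not handled.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


def valence_violations(atoms, bonds):
    """Return (index, element, bond order sum) for atoms exceeding their valence.

    atoms: list of element symbols, e.g. ["C", "H", "H"]
    bonds: list of (i, j, order) tuples referring to indices in `atoms`
    """
    totals = [0] * len(atoms)
    for i, j, order in bonds:
        totals[i] += order
        totals[j] += order
    return [
        (idx, element, total)
        for idx, (element, total) in enumerate(zip(atoms, totals))
        if total > MAX_VALENCE[element]
    ]


# Methane: one carbon bonded to four hydrogens (valid).
methane = (["C", "H", "H", "H", "H"], [(0, k, 1) for k in range(1, 5)])
# Hypothetical generated structure: one carbon bonded to five hydrogens (invalid).
five_bond_carbon = (["C"] + ["H"] * 5, [(0, k, 1) for k in range(1, 6)])

print(valence_violations(*methane))           # []
print(valence_violations(*five_bond_carbon))  # [(0, 'C', 5)]
```

A check like this catches only the "organic chemistry 101" class of errors. It says nothing about the second problem Nwankwo raises, whether a chemically valid molecule can be made at a useful yield.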

Anchoring compounds in reality

The validation of AI-generated compounds cannot be done entirely in the computational ether. These compounds must also be validated experimentally. However, experimental validation is not purely for the sake of giving the green light to whatever candidates were originally predicted and moving them forward to clinical investigation. Experimental validation is essential for building a loop in which experimental data serves as new data for the AI tools and drives iterative improvements in successive predictions.
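The loop described above, in which predictions pick what to test and test results become new training data, can be sketched as a toy in a few lines. Everything here is hypothetical and chosen only to make the loop runnable: `assay` stands in for a laboratory measurement with a made-up hidden activity function, `predict` is a deliberately crude nearest-neighbor model, and compounds are represented by plain integers.

```python
def assay(x):
    """Stand-in for a wet-lab measurement (hypothetical hidden activity)."""
    return -(x - 70) ** 2


def predict(x, measured):
    """Toy model: predict the activity of the nearest already-measured compound."""
    nearest = min(measured, key=lambda m: abs(m - x))
    return measured[nearest]


def design_test_loop(candidates, seed, rounds, batch_size):
    """Alternate between predicting, testing the top picks, and updating the data."""
    measured = {x: assay(x) for x in seed}  # initial experimental data
    for _ in range(rounds):
        untested = [x for x in candidates if x not in measured]
        if not untested:
            break
        # Rank untested candidates by predicted activity, best first.
        untested.sort(key=lambda x: predict(x, measured), reverse=True)
        # "Synthesize and test" the top batch; results feed the next round.
        for x in untested[:batch_size]:
            measured[x] = assay(x)
    return measured


seed = [0, 50, 100]
results = design_test_loop(range(101), seed, rounds=5, batch_size=4)
best_seed = max(assay(x) for x in seed)
best_found = max(results.values())
print(len(results), best_found > best_seed)
```

In this toy, each round's measurements change which candidates the model ranks highest in the next round, which is the point of the loop. Real platforms would replace both the model and the assay with far richer components.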

According to Tilmans, “ChatGPT applications work not only because they trained on the whole internet, but because they have a bunch of people who are doing this thing called reinforced learning with human feedback to make it so it doesn’t go nuts—well, ChatGPT still says crazy things, but less than it would otherwise as a result of just a lot of curated data and very large volumes of it.”

That’s why Tilmans believes that the only way to tackle these drug discovery problems is by having a really strong laboratory in addition to AI, and by having the laboratory and computational facilities work together like synergistic gears in a transmission.

Anagenex’s scientists experimentally test physical, real-world compounds using a mix of technologies. For example, the scientists feed DNA-encoded library (DEL) entries and affinity selection mass spectrometry (AS-MS) measurements into proprietary ML algorithms to design the next “evolved” generation of compounds. Then the scientists synthesize and test the compounds at the company’s Massachusetts laboratory. Tilmans asserts that in the span of a month, Anagenex can synthesize 100 million small molecules through AI-designed DELs.

“You can’t just be a computational company,” Tilmans insists. “About two-thirds of our employees are working in the laboratory. We have this initial set of two billion compounds that we have built ourselves that allows us to get a first idea of a big dataset, and then we can build it. We can refine at the order of millions of data points on the back end.”

At 1910 Genetics, scientists use a similar approach, one that involves what Nwankwo calls “wet lab proxy biological data.” Experiments are run that are surrogates for ground truth assays done at scale. For example, the most fundamental assay for measuring protein expression is a western blot, but no one can run enough western blots to generate the scale of data needed to train ML models.

In such a situation, scientists at 1910 Genetics consider how they could design a proxy assay that would, as Nwankwo puts it, get the company “about 70–80% of the way” to knowing whether the protein is expressed. “Ideally, it should be an assay that we can scale up using things like next-generation sequencing,” she continues. “We do that and generate even more data. We are talking about millions of data points per day using the proxy assay.”

To test these massive data sets and create an iterative loop between experimental data and synthetic data, 1910 Genetics built an automated laboratory in the Seaport District of Boston.

Finding medicines hidden in the human body

Not everyone is using AI to synthesize compounds out of thin air. Some are using AI to extract drug development solutions from nature. For the latter approach, the number of samples and the volume of sequencing data available have increased by orders of magnitude in the past few years.

Jason Park, PhD
Co-founder and CEO, Empress Therapeutics; Operating Partner, Flagship Pioneering

According to Jason Park, PhD, operating partner at Flagship Pioneering and co-founder and CEO of Empress Therapeutics, there are tons of human samples—biopsies, stool, swabs, and spit—that may be hiding some relevant “chemistry” for developing drugs. And some of this chemistry may be found in the human microbiome.

“The clues are in the DNA,” Park emphasizes. “The reason why looking at microbes in the body is really interesting is that if you’re going to make a drug, it’s got to be safe, it’s got to be compatible with human physiology, and it’s got to act on something.” He adds that Empress is confident about its approach, which involves “actually starting with compounds and DNA sequences that are already inside the body.”

Empress scientists are using AI to help recreate natural biosynthetic pathways to make new medicines. To do so, these scientists are decoding DNA’s language for generating enzymes that drive biosynthetic pathways.

“If you look at some of the most important chemotherapeutics, they are actually a bunch of compounds synthesized in nature by biosynthetic processes: enzymes catalyze chemical reactions to make a specific chemical compound,” Park points out. “Since enzymes are in DNA, you can now look at a compound and figure out the genes that made it or vice versa. You can look at gene sequences and make predictions based on the chemistry they make.”

“If there is a molecule that interacts with that protein, we are going to find it,” he declares. “Then we will ask, ‘Is this good enough to become a drug or not?’ If the answer is no, we will move on to the next target.”

But recreating biosynthetic pathways is only half the battle. The other half is figuring out what these pathways do and in what context. To do that, Empress is using AI to make a genetic association between a particular chemical compound and a disease state. Empress uses ML techniques such as causal inference, causal discovery, and causal enrichment, where computational platforms generate various models to find connections between genetic variants and phenotypes to explain health and disease outcomes.

To advance small- and large-molecule drug discovery, 1910 Genetics employs a multimodal AI platform powered by laboratory automation. The platform’s name—Input, Transform, Output (ITO)—reflects the company’s comprehensive approach. ITO activities include the generation of proprietary data streams and the orchestration of data, AI, and laboratory information in iterative training and testing loops.

Park elaborates, “We’re asking, ‘What chemistry has a variation between a healthy person and a diseased person?’ We can tell you there’s chemistry in here that explains whether that patient responded or not or whether they have disease or not. If so, you might make the hypothesis that chemical compounds protect you against that disease.”

If nature has already figured out a way to hit a target, there’s no need to tackle challenging tasks such as designing a compound from the ground up and determining whether it has the right capabilities. For example, a compound may need to assume the right conformation or be suitable for systemic delivery.

People are excited about AI because it outperforms humans at certain drug discovery tasks, but it is not going to beat nature at making medicines across the board. Whether all the solutions already exist in nature remains to be seen. The question is how to use AI tools most effectively, whether by synthesizing artificial compounds or by plucking compounds out of the rich natural world.
