Researchers in New Jersey, working in collaboration with AstraZeneca, say they have developed a machine learning model that estimates the viscosity of monoclonal antibody (mAb) formulations. According to Pin-Kuang Lai, PhD, a professor of chemical engineering and materials science at the Stevens Institute of Technology, the model is designed to let companies screen out drug candidates before the formulation stage.
The work is also an attempt to overcome a common obstacle to applying artificial intelligence in biomanufacturing: a shortage of training data.
“Antibody production and formulation is expensive, so it’s costly to generate datasets,” Lai explains. “A second challenge is that data can’t be easily shared, even if a drug candidate is not in the pipeline, because of intellectual property considerations.”
AstraZeneca project
To help overcome this problem, AstraZeneca shared 229 mAb sequences and their viscosity data with Lai’s team under a confidentiality agreement. The team used this data to train a deep learning model that predicts which mAbs would be too viscous for subcutaneous injection, a route of administration that is popular because it allows patients to administer drugs at home.
“By developing this model, we can screen out drug candidates that are going to be highly viscous during drug development, and then we don’t need to expend effort finding formulations,” explains Lai. “It’s about streamlining process development by solving problems with the process upfront.”
After training the model on the AstraZeneca sequences, the team tested it on two independent datasets: one of commercial mAbs and one from Pfizer.
“The [model] works,” Lai continues. “The results have an accuracy close to 90%, and that’s given us lots of confidence.” Lai adds that a major benefit of the work is that the team can share the model without releasing the training data.
Several consortia are running similar projects in which companies can contribute data to model development while keeping confidential information private.
“Companies don’t need to share data but can adopt a protocol to train and share a model without revealing their proprietary data,” points out Lai.