Regulators favor processes designed with drug quality in mind. The challenge is that as production methods become more complex, established modeling techniques are struggling to cope.

So says Ian Walsh, PhD, staff scientist at the Bioprocessing Technology Institute, A*STAR, in Singapore, who looked at evolving quality-by-design (QbD) challenges in a study published earlier this year.

“There is vast complexity in the omics data we can derive from the cell culture media, the physicochemical properties of the bioprocess, and other bioreactor read-outs that can be derived from improved characterization,” he explains. “What we are seeing now is the number of CPPs [critical process parameters] is growing beyond the small number of variables that were used in industry even a couple of years ago. Who knows how many CPPs there will be in five years’ time?”

Machine learning

To cope with the CPP increase, industry needs an alternative to multivariate data analysis (MVDA) techniques, Walsh says, citing machine learning (ML) as a potential solution.

“MVDA techniques can mathematically model the relationships between the input CPPs and output variables like titer, cell growth, and critical quality attributes. MVDA methods are popular because of their simplicity and ease of use,” he continues. “However, as the number of CPPs increases with an increasing number of sensors, increasing quality of sensors, and ‘deeper-quicker’ assays of the cell culture media, the relationships that exist between CPPs and the bioprocess output variables will likely be nonlinear and require more sophisticated modeling algorithms such as machine learning.”

An ML algorithm can automatically build a model of a real-world problem without being explicitly programmed. It achieves this by examining sample data and optimizing itself in such a way that it can predict outcomes when faced with new data.
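
As a rough illustration of learning from sample data, the sketch below fits two models to synthetic batches and scores them on batches neither has seen. Everything in it is an assumption made for illustration: the single scaled process parameter, the made-up titer relationship with an optimum at zero, and the choice of a k-nearest-neighbors regressor as a stand-in for a nonlinear ML model. The straight-line fit is only a simplified proxy for a linear modeling approach, not an implementation of any particular MVDA method.

```python
import random


def make_batches(n, seed):
    """Generate synthetic (parameter, titer) pairs with a nonlinear optimum."""
    rng = random.Random(seed)
    batches = []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)             # hypothetical scaled process parameter
        y = 5.0 - x * x + rng.gauss(0.0, 0.1)  # made-up titer curve peaking at x = 0
        batches.append((x, y))
    return batches


def fit_line(data):
    """Ordinary least squares for y = a + b * x."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_knn(data, k=5):
    """k-nearest-neighbors regressor: average the k closest training outputs."""
    def predict(x):
        nearest = sorted(data, key=lambda point: abs(point[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict


def mean_squared_error(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


train = make_batches(200, seed=0)  # sample data the models learn from
test = make_batches(100, seed=1)   # new data used only for evaluation

linear_error = mean_squared_error(fit_line(train), test)
knn_error = mean_squared_error(fit_knn(train), test)

print(f"linear model error on new batches:           {linear_error:.3f}")
print(f"nearest-neighbor model error on new batches: {knn_error:.3f}")
```

Because the synthetic relationship is curved, the straight line cannot follow it, while the nearest-neighbor model adapts to whatever shape the sample data shows. Real process data would involve many more parameters and far noisier measurements.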

And the ability to predict outcomes is where biopharma can benefit, according to Walsh, who adds that “ML can often do better than humans in that one particular task, for example, predicting if a bioprocess is producing a drug that has substandard quality.”
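
The kind of prediction task Walsh describes can be sketched as a simple classifier. The example below is hypothetical throughout: the two features (scaled pH and temperature deviations), the rule that labels a synthetic batch substandard when it drifts too far from the set point, and the nearest-neighbor vote are all assumptions chosen to keep the sketch short, not a description of any model used in the study.

```python
import random


def make_batches(n, seed):
    """Generate synthetic batches as ((ph_dev, temp_dev), is_substandard)."""
    rng = random.Random(seed)
    batches = []
    for _ in range(n):
        ph_dev = rng.uniform(-2.0, 2.0)    # hypothetical scaled pH deviation
        temp_dev = rng.uniform(-2.0, 2.0)  # hypothetical scaled temperature deviation
        # Made-up rule: quality fails outside a circular operating window.
        is_substandard = ph_dev ** 2 + temp_dev ** 2 > 2.5
        batches.append(((ph_dev, temp_dev), is_substandard))
    return batches


def predict_substandard(train, features, k=7):
    """Majority vote among the k training batches closest to `features`."""
    def squared_distance(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    nearest = sorted(train, key=lambda item: squared_distance(item[0], features))[:k]
    votes = sum(1 for _, label in nearest if label)
    return votes > k // 2


train = make_batches(400, seed=0)  # historical batches with known outcomes
test = make_batches(200, seed=1)   # unseen batches used to check the predictions

correct = sum(
    1 for features, label in test
    if predict_substandard(train, features) == label
)
accuracy = correct / len(test)
print(f"accuracy on unseen batches: {accuracy:.2f}")

# Classify one new batch running exactly at the set point.
print("set-point batch flagged as substandard:",
      predict_substandard(train, (0.0, 0.0)))
```

The classifier never sees the labeling rule itself. It infers the operating window from labeled examples, which is why the size and diversity of the training data matter so much.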

Coding easy, data requirements less so

And there is good news for companies interested in using ML to model processes. Much of the code needed to build the models has already been written, points out Walsh.

“Creating and training the ML algorithm is the easy part: there are many open-source libraries available to do so,” he says. “The hard part is developing a large, diverse dataset with high-quality process data. However, with new sensors such as Raman, high-throughput LC-MS workflows, the development of real-time assays, and the ability to deeply characterize omics, we can derive this data for modeling and/or training processes.”

At present, ML algorithms are process specific. However, if industry is willing to invest in expertise and collaborate, it may be able to create models that are useful to multiple parties, notes Walsh.

“The holy grail would be ML algorithms that have general predictive power, meaning that the algorithm could be used across different plants without retraining it. This would be a challenge, but possible,” he tells GEN.

Whether drug makers would work together on ML development is unclear, Walsh says, as “the data is extremely valuable to each company.” However, the benefits of such collaboration have already been demonstrated elsewhere.

“In other biological domains some interesting algorithms have been developed because all the data was shared,” he explains. “For example, the protein data available in the Protein Data Bank and UniProt has led to interesting ML algorithms such as AlphaFold, developed by Google’s DeepMind.”