The environmental impact of drug production has been under the spotlight in recent years. Initially, the small-molecule sector was the focus with reports linking the poor treatment of wastewater at plants making antimicrobial APIs to the spread of antibiotic resistance.
While environmental concerns remain, the emphasis recently has expanded to broader ideas of sustainability and economics. Much of the focus has fallen on process development, with scientists working to understand the impact ingredients, raw materials, and energy needs have on costs and waste production.
For small-molecule production processes with few inputs and outputs, sustainability modeling is straightforward, according to Oliver Fisher, PhD, research fellow at the University of Nottingham in the U.K. However, for firms making complex biopharmaceuticals, the challenge is much greater.
“The outputs from a biopharmaceutical process can include products, co-products, by-products, and waste streams,” Fisher says. “Each output can have several, and often contradictory, sustainability implications.
“Therefore, a model whose sole aim is to maximize a single output, for example, product yield, cannot evaluate the implications to the overall economic or environmental performance of the process without understanding what effect maximizing the product yield has on the rest of the process outputs.”
Neural network
To try and address this, Fisher and colleagues used a neural network—linked nodes able to receive and analyze multiple data inputs—to generate a predictive sustainability model they claim is more accurate than established approaches.
“Traditional approaches like first-principles modeling require a high level of understanding of the underlying physics of how the systems work. Data-driven models, however, are derived from fitting process data to algorithms like neural networks and require less knowledge of the underlying detailed mechanisms of the process,” he explains.
“Data-driven models can actually capitalize on the increasing volume of data being produced, combined with increasing computational power. There is huge opportunity to expand the scope of modeling across the process and supply chain to better assess process sustainability.”
Neural networks have been used for process sustainability assessment before. However, the approach developed by Fisher and colleagues uses a network of networks which, he says, makes for a more accurate model.
“Rather than using one neural network to simultaneously predict all outputs, which suffered from overfitting the data, we explored developing a chain of models using the ensemble of regressor chains method,” he tells GEN. “Overfitting is a real challenge in predictive modeling. [Things are fine where the fit is great for a limited set of data], but when the model has additional data added it fails or is unable to reliably predict future scenarios.
“A regressor chain builds a series of models where each model is built using the output of the previous model as input for the next. The ensemble of regressor chains works by creating multiple regressor chains for every permutation of the output sequence order. This method was able to capture the relationships between the process outputs, thus providing an understanding of the knock-on effects of changing one output on the others.”
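To make the idea concrete, the following is a minimal sketch of an ensemble of regressor chains, not the model Fisher and colleagues built. Several things here are assumptions made purely for illustration: the data are synthetic, the output names (`product`, `by_product`, `waste`) are hypothetical, a simple k-nearest-neighbors averager stands in for the neural networks described in the article, and the chains are combined by plain averaging.

```python
from itertools import permutations

import numpy as np


def knn_predict(X_train, y_train, X_query, k=5):
    """Base learner (an assumption): average the targets of the k nearest training rows."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def chain_predict(X_train, Y_train, X_query, order, k=5):
    """One regressor chain: outputs are predicted in `order`, and each
    prediction is appended to the inputs of the next model in the chain."""
    aug_train, aug_query = X_train, X_query
    preds = np.empty((X_query.shape[0], Y_train.shape[1]))
    for j in order:
        preds[:, j] = knn_predict(aug_train, Y_train[:, j], aug_query, k)
        # Training rows get the measured output; query rows get the prediction.
        aug_train = np.column_stack([aug_train, Y_train[:, j]])
        aug_query = np.column_stack([aug_query, preds[:, j]])
    return preds


def ensemble_predict(X_train, Y_train, X_query, k=5):
    """Average the predictions of one chain per permutation of the outputs."""
    orders = list(permutations(range(Y_train.shape[1])))
    all_preds = [chain_predict(X_train, Y_train, X_query, o, k) for o in orders]
    return np.mean(all_preds, axis=0), len(orders)


# Synthetic stand-in data (hypothetical): 2 process inputs, 3 coupled outputs.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
product = 2.0 * X[:, 0] + X[:, 1] + rng.normal(0, 0.05, 200)
by_product = 0.5 * product ** 2 + rng.normal(0, 0.05, 200)
waste = 1.0 + by_product - 0.3 * X[:, 1] + rng.normal(0, 0.05, 200)
Y = np.column_stack([product, by_product, waste])

# Fit on the first 150 rows, predict the remaining 50.
Y_hat, n_chains = ensemble_predict(X[:150], Y[:150], X[150:])
print(n_chains, Y_hat.shape)  # 6 (50, 3)
```

With three outputs there are six possible orderings, so six chains are built. The number of permutations grows factorially with the number of outputs, so for larger problems a random subset of orderings is a common compromise. scikit-learn offers a `RegressorChain` class with an `order` parameter that could replace the hand-written chain here.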
Monitoring requirements
High-quality data are key, according to Fisher, who says biopharmaceutical companies that want to employ a neural network to predict sustainability will need to ensure they can monitor manufacturing processes in detail.
“As with all data-driven models, the process boundaries and variability need to be defined in the input and output data from which the multi-target models are derived, as rubbish in equals rubbish out,” he continues. “You need to look at what data are available, and [determine] if the volume and granularity of data are sufficient to describe the process.”
Drug companies may need to install additional in-process monitoring systems, points out Fisher, who notes that “a multi-target model may well require more output data being measured to label the input data, which may prove initially more expensive if this output data is not originally being collected. However, the model used to improve process economic and environmental sustainability would prove this worthwhile in the long run.”