In machine learning (ML), machines, meaning computer programs, learn from historical data and improve without being explicitly programmed to do so. This process allows them to improve the accuracy of the predictions or decisions they make.

ML is part of the wider field of artificial intelligence (AI). But unlike AI as a whole, which seeks to mimic human intelligence, ML is focused on a limited range of specific tasks. ML is already being used in areas like drug discovery [1]. For example, last year GSK [2] shared details of its use of ML in vaccine development. Likewise, in July the Gates Foundation awarded A-Alpha Bio $800,000 to use machine learning to optimize protein therapeutics for infectious diseases [3].

But ML also has potential in the process development suite and on the factory floor, according to Moritz von Stosch [4], chief innovation officer at analytics and process modeling firm DataHow.

“ML can be used for development, monitoring, control and optimization. ML is better at learning complex relationships—for example between process parameters and process performance—than humans and it can make better predictions of what might be happening for slightly different scenarios.”

Quality data costs

Data quality is key to any ML strategy and this is the biggest challenge for the biopharmaceutical industry, von Stosch says.

“Data quality is typically poor, data pre-processing anecdotally requiring 80% of the total effort in any machine learning project. Data quantity is rather low as in bioprocess development the generation of data costs money,” he continues. “Generally in bioprocess development we face the curse of dimensionality because of the large number of parameters and getting informative data for all possible parameter combinations is impossible, wherefore we need to add process knowledge to ML.”
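The scale of the problem von Stosch describes can be shown with simple arithmetic. The sketch below assumes a full-factorial study in which every parameter is tested at the same number of levels; the function name and the choice of three levels are illustrative assumptions, not figures from DataHow.

```python
def full_factorial_runs(n_parameters: int, levels_per_parameter: int) -> int:
    """Experiments needed to test every combination of parameter settings."""
    return levels_per_parameter ** n_parameters


# Hypothetical study sizes: each parameter tested at 3 levels (low, mid, high).
for n_parameters in (2, 5, 10):
    runs = full_factorial_runs(n_parameters, 3)
    print(f"{n_parameters} parameters -> {runs} experiments")
```

With 10 parameters at three levels each, covering every combination would already take 59,049 experiments, which is why exhaustive data generation is not practical.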

Combining process knowledge with ML, which von Stosch calls [5] “hybrid modeling,” aims to generate more insight than ML alone, to increase process understanding, and to develop models that can better forecast system behavior.
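One simple way to picture a hybrid model is a knowledge-based equation whose remaining error is corrected by a term learned from data. The sketch below is an illustration only and is not von Stosch's or DataHow's implementation: the Monod-type rate law, its constants, the measurements, and all function names are assumptions made up for this example.

```python
from statistics import mean

# Assumed process knowledge: a Monod-type rate law with hypothetical constants.
MU_MAX = 0.5  # maximum specific growth rate in 1/h (assumed)
K_S = 2.0     # half-saturation constant in g/L (assumed)


def mechanistic_rate(substrate):
    """Knowledge-based prediction of growth rate from substrate concentration."""
    return MU_MAX * substrate / (K_S + substrate)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up measurements, for illustration only.
substrates = [0.5, 1.0, 2.0, 4.0, 8.0]
measured_rates = [0.09, 0.15, 0.23, 0.30, 0.34]

# Data-driven part: learn what the mechanistic equation gets wrong.
residuals = [r - mechanistic_rate(s) for s, r in zip(substrates, measured_rates)]
slope, intercept = fit_line(substrates, residuals)


def hybrid_rate(substrate):
    """Mechanistic prediction plus the learned correction."""
    return mechanistic_rate(substrate) + slope * substrate + intercept


def sum_squared_error(model):
    """Total squared error of a model on the measurements above."""
    return sum((model(s) - r) ** 2 for s, r in zip(substrates, measured_rates))


print("mechanistic only:", sum_squared_error(mechanistic_rate))
print("hybrid:          ", sum_squared_error(hybrid_rate))
```

Because the rate law already supplies the overall shape of the curve, the learned part only has to capture a small correction, which is the reason a hybrid approach can work with fewer experiments than a purely data-driven model.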

As well as high-quality data, biopharmaceutical companies thinking about using ML need to be clear about which parameters they want to model.

“The key,” von Stosch said, is “framing the problem, such that it is concise and can be solved by a machine.” He explained that providing the learning algorithm with the correct parameters is critical. “For instance, if you train the machine with data of sugar content of grape juice and the alcohol content after it has been fermented, then the machine is able to predict the alcohol content if you provide [it] with the sugar content of a novel grape juice,” he pointed out. “However, it will not be able to predict the alcohol content based on the mass of grapes that was used to make the juice.”
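The grape juice example can be sketched in a few lines of code. This is a minimal illustration that assumes a straight-line relationship between sugar and alcohol; the numbers are made up for the example, and the function names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up training data: sugar in the juice (g/L) and alcohol after fermentation (% vol).
sugar_g_per_l = [180.0, 200.0, 220.0, 240.0]
alcohol_percent = [10.6, 11.8, 13.0, 14.2]

slope, intercept = fit_line(sugar_g_per_l, alcohol_percent)


def predict_alcohol(sugar):
    """Predict alcohol content for a new juice from its sugar content."""
    return slope * sugar + intercept


# A novel juice with 210 g/L of sugar gives a prediction of about 12.4 % vol.
print(round(predict_alcohol(210.0), 2))
```

The fitted function accepts sugar content and nothing else. The mass of grapes never appears in the training data, so the model has no basis for turning it into a prediction, which is the point von Stosch makes about framing the problem.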




  1. Réda C, et al. Machine learning applications in drug development. Comput Struct Biotechnol J. 2020;18:241–252.
  2. Smyth P, et al. Machine learning in research and development of new vaccines products: opportunities and challenges. ESANN 2019.
  3. A-Alpha Bio Awarded $800K to Develop Infectious Disease Drugs using AlphaSeq Platform.
  4. Narayanan H, et al. Bioprocessing in the Digital Age: The Role of Process Models. Biotechnol J. 2019;15:1900172.
  5. von Stosch M, et al. Hybrid modeling for quality by design and PAT-benefits and challenges of applications in biopharmaceutical industry. Biotechnol J. 2014;9:719–726.