Even as bioprocessors collect ever more data and analyze it with AI-based methods, the industry continues to face a crucial hurdle: data integration. “In most industrial biotech companies or CDMOs with bioprocessing workflows, data integration remains a challenge due to data being stored across various tools and formats and the lack of standardized data models,” says Guru Singh, founder and CEO of Scispot. Rather than being consolidated in one place, the data are scattered across electronic lab notebooks, spreadsheets, laboratory information management systems, internal databases, and cloud storage, Singh says. “This fragmentation, compounded by the absence of a standardized data model and the inability to operate as a data lake, complicates data unification and comprehensive analysis, hindering decision-making and process optimization.”
For any bioprocessor, data integration is well worth the effort. “Integrating bioprocessing data improves the consistency of product quality by maintaining uniform process parameters across batches,” Singh says. “This involves connecting various types of data, including bioreactor parameters, experiment metadata, quality control results, and environmental conditions.” Then, tracking that integrated information in real time “helps identify discrepancies, such as deviations in temperature, pH levels, or oxygen levels, enabling faster decision-making and the ability to fail fast, which speeds up R&D and production cycles,” Singh explains. “Comprehensive data integration also simplifies regulatory compliance by ensuring all data is thoroughly documented and easily accessible for audits, ensuring full traceability of process parameters.”
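To make the real-time check Singh describes more concrete, here is a minimal Python sketch that compares each incoming bioreactor reading against setpoints and flags temperature, pH, or dissolved-oxygen excursions. The `Reading` record, the setpoints, and the tolerances are hypothetical placeholders, not values from any real process or from Scispot's software.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """One time-stamped bioreactor sample (hypothetical schema)."""
    batch_id: str
    minute: int
    temperature_c: float
    ph: float
    dissolved_o2_pct: float


# Hypothetical (target, allowed deviation) pairs; real limits are process-specific.
SETPOINTS = {
    "temperature_c": (37.0, 0.5),
    "ph": (7.0, 0.1),
    "dissolved_o2_pct": (40.0, 10.0),
}


def check_reading(reading):
    """Return (batch, minute, parameter, value) for every out-of-tolerance parameter."""
    flagged = []
    for param, (target, tolerance) in SETPOINTS.items():
        value = getattr(reading, param)
        if abs(value - target) > tolerance:
            flagged.append((reading.batch_id, reading.minute, param, value))
    return flagged


def find_deviations(stream):
    """Check readings one at a time, as they would arrive from a live feed."""
    deviations = []
    for reading in stream:
        deviations.extend(check_reading(reading))
    return deviations


stream = [
    Reading("B001", 0, 37.1, 7.02, 41.0),
    Reading("B001", 5, 37.2, 6.85, 38.0),
    Reading("B001", 10, 38.0, 7.01, 25.0),
]
for deviation in find_deviations(stream):
    print(deviation)
```

In practice the readings would stream in from a bioreactor controller or data historian, and the tolerances would come from the validated process description; a fixed-band rule like this is only the simplest possible check.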
Trying new tools
With some tools, such as OpenAI’s GPT-4, “modern companies are now interacting with their data in natural language, making it easier for scientists to analyze and interpret complex datasets in real time,” Singh says. “For example, scientists might ask, ‘what were the pH and temperature fluctuations in the last batch?’ or ‘how can we optimize oxygen levels to improve cell growth rates?’”
Academic scientists, meanwhile, are developing better tools for integrating data. Shila Ghazanfar, PhD, a statistician at the University of Sydney in Australia, explains why that matters: “Better data integration would enable better validation of new and emerging biotechnologies, as with every new technology, we must stress-test and relate to existing data to ensure robustness of the platforms.”
Ghazanfar and her colleagues developed StabMap, a new tool for mosaic data integration. “Mosaic data integration is the task of bringing together data stemming from multiple different types of biotechnologies, by capitalizing on there being some, potentially small, common biomolecules that are captured in the biotechnologies,” Ghazanfar says. “The main motivation behind developing an algorithm for mosaic data integration is that previous approaches must reduce the complexity of data down to only the biomolecules that are captured in all datasets.” That approach, however, breaks down as datasets accumulate. “With more and more datasets, this set of biomolecules can become vanishingly small, and furthermore, is a waste of the information that was so laboriously captured in the high-throughput biotechnologies,” Ghazanfar says.
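The shrinking-intersection problem Ghazanfar describes can be illustrated in a few lines of Python. The sketch below uses four invented feature panels; the gene names are placeholders and the panels do not come from StabMap or any real study. It shows that no feature is shared by all four datasets, even though each dataset overlaps with at least one other, so the datasets still form a single connected chain. This is a simplified illustration of the overlap structure that mosaic integration relies on, not StabMap's algorithm.

```python
from itertools import combinations

# Invented feature panels for four datasets from different technologies.
datasets = {
    "rna_seq": {"GATA1", "TAL1", "CD34", "KIT", "HBB", "SPI1"},
    "spatial_panel": {"GATA1", "TAL1", "CD34", "PECAM1"},
    "protein_panel": {"CD34", "KIT", "PECAM1", "CDH5"},
    "chromatin_panel": {"PECAM1", "CDH5", "SOX17"},
}


def shared_by_all(panels):
    """Features measured in every dataset (the 'intersect first' approach)."""
    return set.intersection(*panels.values())


def overlap_edges(panels):
    """Pairs of datasets that share at least one feature, with the shared features."""
    edges = {}
    for a, b in combinations(panels, 2):
        shared = panels[a] & panels[b]
        if shared:
            edges[(a, b)] = shared
    return edges


def is_connected(panels, edges):
    """True if every dataset can be reached from every other through shared features."""
    neighbors = {name: set() for name in panels}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    start = next(iter(panels))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbors[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == set(panels)


edges = overlap_edges(datasets)
print(len(set.union(*datasets.values())))  # → 9 features measured overall
print(shared_by_all(datasets))             # → set(): nothing is common to all four
print(is_connected(datasets, edges))       # → True: pairwise overlaps still link them
```

Restricting the analysis to the global intersection would discard all nine measured features in this toy example, whereas an approach that works through pairwise overlaps can, in principle, still relate every dataset to every other.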
StabMap is designed to address these problems. “The main algorithmic advance for the StabMap approach is the flexibility of the algorithm: any combination of datasets can be used to perform integration,” Ghazanfar says. “That being said, flexible models like StabMap are only as good as their input data, and indirectly assume that there really is some commonality between the cells that are being integrated.” In other words, the familiar computing mantra, garbage in, garbage out, still applies.
Whatever approach a bioprocessor takes, the “integration should automate data capture, transformation, and QC checks,” Singh says. “Harmonizing data and storing it in graph databases and vector databases prepares it for advanced analysis, machine learning, and AI applications, leading to more efficient operations and better decision-making.”
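Singh's prescription of automated capture, transformation, and QC can be sketched as a small pipeline that maps records from two differently formatted sources onto one standardized data model and then runs rule-based checks. Everything here is hypothetical: the ELN and LIMS field names, the unit conventions, and the QC thresholds are invented for illustration, and the sketch stops before any graph or vector database step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchRecord:
    """Hypothetical standardized data model shared by all sources."""
    batch_id: str
    temperature_c: float
    ph: float
    titer_g_per_l: float
    source: str


def from_eln_row(row):
    """Transform a hypothetical ELN export row (strings, Fahrenheit, mg/L)."""
    return BatchRecord(
        batch_id=row["Batch"].strip().upper(),
        temperature_c=(float(row["Temp (F)"]) - 32) * 5 / 9,
        ph=float(row["pH"]),
        titer_g_per_l=float(row["Titer (mg/L)"]) / 1000,
        source="ELN",
    )


def from_lims_record(rec):
    """Transform a hypothetical LIMS record (already numeric, metric units)."""
    return BatchRecord(
        batch_id=rec["batch_id"].strip().upper(),
        temperature_c=rec["temp_c"],
        ph=rec["ph"],
        titer_g_per_l=rec["titer_g_l"],
        source="LIMS",
    )


def qc_issues(record):
    """Rule-based QC; the thresholds are illustrative assumptions."""
    issues = []
    if not 0.0 <= record.ph <= 14.0:
        issues.append("pH outside 0-14")
    if not 20.0 <= record.temperature_c <= 45.0:
        issues.append("temperature implausible for cell culture")
    if record.titer_g_per_l < 0:
        issues.append("negative titer")
    return issues


# Captured inputs from two sources with different conventions.
eln_rows = [{"Batch": " b001 ", "Temp (F)": "98.6", "pH": "7.05", "Titer (mg/L)": "2500"}]
lims_records = [{"batch_id": "b002", "temp_c": 36.9, "ph": 15.2, "titer_g_l": 3.1}]

records = [from_eln_row(r) for r in eln_rows] + [from_lims_record(r) for r in lims_records]
for rec in records:
    print(rec.batch_id, rec.source, round(rec.temperature_c, 1), qc_issues(rec))
```

Keeping the source-specific adapters separate from the QC rules means a new source, such as a spreadsheet export, only needs its own adapter, while every record still passes through the same checks.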