By applying machine learning (ML) to multivariate data analysis, Lumen Bioscience improved its therapeutic protein production method by 70 to 100 percent while needing only a small fraction of the runs that a traditional design-of-experiments (DOE) approach would have required for the same number of variables, according to company officials.

The challenge was to maximize the growth rate of spirulina, the photosynthetic microbe Lumen uses to produce recombinant proteins for its vaccines and therapeutics, while simultaneously achieving high expression of those proteins.

To optimize those factors, the company collaborated with Google Accelerated Science (GAS) to investigate the relationships between production outcomes and 17 environmental variables, using 96 photobioreactors. By designing an adaptive, iterative model, the company was able to run only 245 configurations versus the more than 130,000 (2^17 = 131,072) that would have been needed for a two-level, full factorial DOE approach.
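The full-factorial figure follows from simple arithmetic: a two-level design over 17 variables needs one run for every combination of settings. The short Python sketch below reproduces that count and the share of it that 245 runs represents. The variable names are illustrative only; the numbers come from the article.

```python
# Run count for a two-level, full factorial design versus the runs reported.
N_VARIABLES = 17      # environmental variables studied (from the article)
LEVELS = 2            # two-level design: each variable is set low or high
RUNS_PERFORMED = 245  # configurations actually run (from the article)

# Every combination of levels across all variables must be tested once.
full_factorial_runs = LEVELS ** N_VARIABLES
fraction = RUNS_PERFORMED / full_factorial_runs

print(full_factorial_runs)   # 131072
print(f"{fraction:.2%}")     # 0.19%
```

In other words, the adaptive study used roughly one five-hundredth of the runs a full factorial design would have called for.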

That model allowed resources to be shifted to more promising areas and models to be refined as new data became available. “This adaptability helped de-risk our study design phase, allowing for greater flexibility than many DOE-based studies,” says Caitlin Gamble, PhD, director of informatics at Lumen Bioscience.
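To make the idea of shifting runs toward promising regions concrete, here is a minimal Python sketch of an adaptive batch search. It is not the model Lumen and GAS built, which the article does not describe in detail. It swaps in a simple explore/exploit scheme: random sampling plus small perturbations of the best configurations seen so far. The batch size, round count, toy productivity function, and all function names are hypothetical, chosen only so that the total comes to 245 runs.

```python
import random

N_VARIABLES = 17  # from the article
BATCH_SIZE = 49   # hypothetical batch size
N_ROUNDS = 5      # hypothetical: 5 rounds * 49 runs = 245 runs in total


def toy_productivity(config):
    """Hypothetical stand-in for a bioreactor readout; peaks at 0.6 per variable."""
    return -sum((x - 0.6) ** 2 for x in config)


def random_config(rng):
    """Sample every variable uniformly on a normalized 0-to-1 scale."""
    return [rng.random() for _ in range(N_VARIABLES)]


def perturb(config, rng, scale=0.1):
    """Nudge each variable with Gaussian noise, clipped to the 0-to-1 range."""
    return [min(1.0, max(0.0, x + rng.gauss(0, scale))) for x in config]


def adaptive_search(seed=0):
    rng = random.Random(seed)
    history = []  # list of (score, config) pairs from all rounds so far
    for _ in range(N_ROUNDS):
        if not history:
            # First round: no data yet, so sample the space at random.
            batch = [random_config(rng) for _ in range(BATCH_SIZE)]
        else:
            # Later rounds: keep a quarter of the batch exploratory and
            # spend the rest near the five best configurations so far.
            top = sorted(history, key=lambda r: r[0], reverse=True)[:5]
            n_explore = BATCH_SIZE // 4
            batch = [random_config(rng) for _ in range(n_explore)]
            batch += [perturb(rng.choice(top)[1], rng)
                      for _ in range(BATCH_SIZE - n_explore)]
        history += [(toy_productivity(c), c) for c in batch]
    return history


history = adaptive_search()
first_round_best = max(score for score, _ in history[:BATCH_SIZE])
overall_best = max(score for score, _ in history)
print(len(history), first_round_best, overall_best)
```

Each round reuses everything learned so far, which is the property the article credits with reducing the number of runs. A real study would replace the toy function with measured outcomes and would likely use a statistical model rather than simple perturbation.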

Optimal parameters

Furthermore, ML revealed optimal parameters that a DOE built around existing preconceptions likely would not have identified. Two examples are the optimal temperature range, which fell in a narrow band between the typical middle and high points, and the optimal pH range, which fell below the values reported in the literature.

Achieving such results required a significant influx of accurate data, notes Gamble, “so we focused on a high-throughput fluorescent protein assay for our readout. Applying ML also required careful consideration of how to structure our more complex, light-schedule parameters and reward function.

“ML has a way of arriving at solutions that you may not have anticipated, so it was important for us to fully consider whether optimization of our reward function would lead to practical solutions. The approach was relatively robust to observation noise and non-linear interactions, and it allowed us to explore a complex subspace with light ramping and cyclic lighting schedules.

“Between weeks 5 and 15, we discovered multiple bioreactor setting configurations that approximately doubled productivity. During the process, we added a number of controls and normalizations that we will continue to apply to help ensure observed outcomes arise from defined, rather than hidden, variables.”

Lumen continues to explore multiple media parameters, but Gamble says it hasn’t needed to introduce any major changes to the strain background. The team is currently introducing genetic and environmental variables for further optimization.