By Mike May, PhD

Data serves as the foundation of today’s biotechnology and pharmaceutical industries, and that foundation keeps expanding. “The appreciation of the value of data and need for quality data has grown in recent years,” says Anastasia Christianson, PhD, vice president and global head of AI, machine learning and data, Pfizer. She notes that the concept of FAIR data (Findable, Accessible, Interoperable, and Reusable data)¹ is becoming more widely accepted and more closely achieved.

Part of the transition in data use arises almost philosophically. “There has been a cultural shift or mindset change from data management for the purpose of storage and archiving to data management for the purpose of data analysis and reuse,” Christianson explains. “This is probably the most significant advance. The exponential growth of analytics capabilities and artificial intelligence have probably raised both the expectations for and appreciation of the value of data and the need for good data management and data quality.”

A recent project at Recursion Pharmaceuticals highlights a key data-related challenge: size. The company’s scientists connected biology and chemistry by predicting the protein binding of 36 billion chemical compounds. According to Recursion, “This screen digitally evaluated more than 2.8 quadrillion small molecule–target pairs.” That’s an awful lot of data to create and manage. Consequently, Recursion relied on its own supercomputer to run the simulations and capture the data.

Despite the crucial need to make the most of data, various challenges stand in the way. “Data quality and metadata tagging remain a challenge even though they have improved and are continuing to improve,” Christianson observes. “Multimodal data integration also remains a challenge, not least due to continued exponential growth of data volume, velocity, and variety, and to greater scrutiny of data veracity.”

Integrating data has been difficult for some time. “Companies that develop hardware typically don’t have the right competency to build the best software, especially outside of their platforms, and vice versa,” says Veerle d’Haenens, general manager, global therapeutic systems and cell therapy technologies, Terumo Blood and Cell Technologies. Companies address interoperability challenges in various ways. “Some solutions specific to cell and gene therapy have come to market to meet these needs,” d’Haenens remarks, “but many manufacturers have invested in their own solutions.”

This article explores how some companies already address and overcome this collection of data-related challenges.

Seeking solutions in cell and gene therapies

Data management plays a crucial role from start to finish in cell and gene therapies (CGTs). “In many sectors, it can be the number of data points that causes challenges, but in the CGT sector, it tends to be the depth of the data needed for each patient’s therapy that offers the biggest challenge,” says Matt Lakelin, PhD, co-founder and vice president of scientific affairs and product development, TrakCel.

TrakCel cellular orchestration systems
To serve the cell and gene therapy industry, TrakCel develops what it calls cellular orchestration systems. The company’s flagship product, OCELLOS, is designed to manage commercial-stage therapies from patient enrollment to final product delivery. Simpler, more streamlined versions of OCELLOS include OCELLOS Lite and OCELLOS Core. These solutions can manage supply chain processes and preserve chain of identity and chain of custody in early-stage clinical development.

Another challenge in the sector is the urgency with which CGTs are needed. “The often personalized nature of the therapies paired with their relatively delicate nature causes tight timelines,” Lakelin explains.

Various companies work on ways to speed up the production of CGTs. Terumo Blood and Cell Technologies, for example, takes several approaches that improve the management of data in developing and manufacturing cell therapies.

“The first benefits we’ve enabled with enhanced data management capabilities are for our automated Quantum Flex Cell Expansion System and our Finia Fill and Finish system with fleet management,” d’Haenens points out. “These include remote process monitoring and alarming, which can provide more insight to issues before entering the cleanroom space.” Such capabilities meet a crucial therapeutic need, she says, because “cell expansion is the most time-consuming part of cell therapy manufacturing.”

Keeping data confidential

CGT production depends on data sharing, but there is a proviso: Protect the privacy of patients. “With personalized medicine,” Lakelin stresses, “the patient’s data is inextricably linked to the drug product to ensure that the correct therapy reaches the correct person.”

Various companies, though, must see patient-related data and manage it to manufacture CGTs. “Data flowing between critical partners is vital to keep time-sensitive therapies progressing through the supply chain and to optimize drug product supply,” Lakelin says. “However, these time pressures can lead to data breaches or errors if manual processes are relied on.”

Such data breaches can arise through connecting data points. “As an example, single patient-specific data points may not reveal a patient’s identity but in combination may indicate their identity, and thus be construed as a data breach,” Lakelin explains. “Where paper records would have to be redacted, data flows within an automated system can select only the correct information to share, as well as the correct partner to share it with.”
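The selective sharing Lakelin describes can be pictured with a minimal Python sketch. Everything named here (the partner roles, the field names, and the `SHARING_POLICY` table) is a hypothetical illustration, not a description of OCELLOS or any other real orchestration system.

```python
# Hypothetical allow-list: the fields each partner role may receive.
# Role and field names are illustrative assumptions only.
SHARING_POLICY = {
    "courier": {"therapy_id", "pickup_site", "delivery_site"},
    "manufacturer": {"therapy_id", "collection_date", "cell_count"},
}


def select_fields_for(partner, record):
    """Return a copy of the record holding only fields this partner may see."""
    if partner not in SHARING_POLICY:
        # Refuse to share anything with a partner that has no policy.
        raise ValueError(f"no sharing policy defined for partner {partner!r}")
    allowed = SHARING_POLICY[partner]
    return {key: value for key, value in record.items() if key in allowed}


# A made-up record mixing logistics data with identifying data.
record = {
    "therapy_id": "T-0001",
    "patient_name": "Jane Example",
    "date_of_birth": "1970-01-01",
    "pickup_site": "Clinic A",
    "delivery_site": "Plant B",
    "collection_date": "2024-03-01",
    "cell_count": 2500000,
}

courier_view = select_fields_for("courier", record)
```

Because each partner sees only an allow-listed subset, data points that are harmless alone but identifying in combination never travel together unless the policy explicitly permits it.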

But how does a company ensure that data is safe and secure? According to Lakelin, a company must use industry-standard encryption. “Developing an in-depth understanding of the value chain can help when deciding when and where it is possible to use pseudoanonymized identifiers instead of personally identifiable information, or PII,” he elaborates. “And restricting access to PII can be undertaken simply by electronic orchestration systems.”
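One common way to produce a pseudonymized identifier is a keyed hash. The Python sketch below uses HMAC-SHA256 from the standard library. The function name, the example key, and the example patient ID are assumptions made for illustration, and Lakelin does not say which technique TrakCel uses.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from a patient ID using HMAC-SHA256.

    The same ID and key always give the same pseudonym, so records can
    still be linked, but the ID cannot be read back from the pseudonym.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


# Illustrative values only; a real key would come from a secrets manager.
example_key = b"replace-with-a-managed-secret"
pseudonym = pseudonymize("PATIENT-12345", example_key)
```

A keyed hash is used here rather than a plain hash so that someone without the key cannot rebuild the mapping by hashing guessed IDs. The party that holds the key can still match a pseudonym to its patient, which matters when the correct therapy has to reach the correct person.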

For example, TrakCel’s orchestration system, OCELLOS, can be used for real-time data exchange. “This facilitates more efficient and safer value chain management than using paper-based systems,” Lakelin asserts. “Introducing integrations can further improve the efficiency and safety of advanced therapy supply.”

Still, additional CGT solutions will be needed in the future. “To date, the CGT industry has been focused on data use in a ‘micro’ form,” Lakelin explains. “As the industry develops and partner technology progresses, considerably larger data volumes will require consideration.” The challenges ahead include tracking data from a particular therapy journey and creating larger pools of cross-patient or cross-therapy data that can be used for predictive analysis.

“There are some very exciting artificial intelligence techniques that are being used in other industry supply chains to assist with forecasting and resource management,” Lakelin says. “Up to now, the quantity of CGT data available to power this has been a challenge as the industry is relatively new and still small, but as it grows, these methods have the potential to unlock powerful possibilities.”

The potential in pictures

Whereas a picture is worth a thousand words, a medical image can summarize billions of data points. “When analyzed properly, scans from different imaging modalities—X-ray imaging, magnetic resonance imaging, computed tomography imaging, and so on—can be applied to better characterize disease mechanisms and therapeutic responses, thereby improving predictive models,” says Costas Tsougarakis, vice president, life sciences solutions, Flywheel. “Such approaches can inform clinical trials by supporting patient-related decisions, such as selection and classification, and novel biomarkers that can become objective study endpoints.”

Data management of images, however, must overcome several obstacles. For example, medical imaging data is unstructured. “Images are acquired with a variety of protocols across imaging sites and modalities,” Tsougarakis points out. “Data classification is primarily based on human input, which leads to inconsistencies and errors.” These challenges make it difficult to develop standardized analytic approaches.

In addition, the size of datasets from imaging creates other complications. “Imaging data must be de-identified, harmonized, and uniformly curated,” Tsougarakis insists. “When traditional methods are used, completing these tasks at scale can take many weeks or even months.” Plus, running computational analyses of such large files—a gigabyte for one magnetic resonance imaging scan, for example—requires high-performance and often cloud-based computing.
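To make the de-identification and harmonization steps concrete, here is a deliberately simplified Python sketch. The field names, the list of identifying fields, and the alias table are hypothetical. Real imaging headers (DICOM, for example) carry far more attributes, and this is not a description of how Flywheel's platform works.

```python
# Hypothetical set of header fields treated as identifying.
IDENTIFYING_FIELDS = {"patient_name", "patient_birth_date", "patient_address"}

# Hypothetical mapping from site-specific labels to one canonical code.
MODALITY_ALIASES = {
    "mr": "MR",
    "mri": "MR",
    "magnetic resonance": "MR",
    "ct": "CT",
    "computed tomography": "CT",
    "xr": "XR",
    "x-ray": "XR",
    "xray": "XR",
}


def curate(header):
    """Drop identifying fields and map the modality label to a canonical code."""
    cleaned = {k: v for k, v in header.items() if k not in IDENTIFYING_FIELDS}
    raw_label = str(header.get("modality", "")).strip().lower()
    # Labels not in the alias table are flagged rather than guessed.
    cleaned["modality"] = MODALITY_ALIASES.get(raw_label, "UNKNOWN")
    return cleaned


# Two made-up headers from sites that label the same modality differently.
site_a = {"patient_name": "Jane Example", "modality": " MRI ", "site": "A"}
site_b = {"patient_birth_date": "1970-01-01", "modality": "magnetic resonance", "site": "B"}

curated = [curate(site_a), curate(site_b)]
```

Applying one rule table to every header is what lets the same curation run unattended across sites, instead of relying on the manual classification that Tsougarakis says leads to inconsistencies and errors.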

Screen showing Flywheel's medical imaging AI platform
Flywheel, a medical imaging company, provides data management solutions to biotechnology and pharmaceutical companies, clinical researchers, and academic medical centers. Flywheel says that its medical imaging AI platform can streamline data discovery, aggregation, and curation; automate research workflows; and scale on demand. The platform includes Flywheel Enterprise, an imaging research data platform; Flywheel Data Exchange, a catalog of curated datasets; and Flywheel Discovery, a cohort discovery tool.

Flywheel is a medical imaging data and artificial intelligence platform that automates data processing and workflows to help scientists get the most from their medical images. “The ultimate goal of modern data management strategies,” Tsougarakis declares, “is to turn complex medical imaging data into analysis-ready datasets that can be reused over time for accelerated algorithm development and clinical research.”

According to Tsougarakis, Flywheel is working toward this goal by partnering “with one of the largest pharmaceutical companies in the world to enable the aggregation and management of medical imaging and associated data to accelerate drug discovery.” He adds that Flywheel is using automated pipelines to organize and process imaging data while “minimizing the potential for human error and saving significant time.”

A dream of simpler solutions

The complexity of biotechnology’s data environment notwithstanding, scientists and companies look forward to simpler solutions to data management. “We can see a future where all systems can be managed through a single software in a single location, which is powerful on its own,” d’Haenens says. “But it will also make possible the kind of iterative data analyses that lead to true advances, helping individual developers and the field writ large to understand what can be done to improve product quality and efficacy, reduce manufacturing time, and drive productivity to increase patient access.”

Where possible, biopharmaceutical companies already take advantage of advances in data management. As Christianson explains, “The technology stack for data management has evolved, making more use of cloud-based solutions, automated pipelines, data cataloging, master data management, and federated learning solutions—all of which lower the barriers to and costs of enhanced data management solutions.”

Although data management makes up the foundation of today’s biotechnology and pharmaceutical industries, patients depend on even more improvements in these processes. “As the whole industry gains a better understanding of each element of the production process,” d’Haenens maintains, “we expect it will help improve safety and efficacy.”



  1. Wilkinson MD, Dumontier M, Aalbersberg IJ, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016; 3: 160018.