The complexity of science and volume of research data have grown tremendously over the past decade. In today’s biopharma lab, robots fill 384-well plates in seconds, and data are generated on the petabyte scale. To manage these data, scientists interact daily with many systems: apps, data systems, databases, external sources, and email.

But these siloed, stone-age tools slow pipelines to a crawl. Scientists lose significant time converting between formats, moving data between apps, pivoting tables, aggregating statistics, and coloring graphs. This zoo of systems keeps them from what they should be spending their days on: thinking about problems, looking at data, developing hypotheses, executing experiments, and coordinating research.

To address this time sink, a growing number of biopharma companies, from startups to industry giants, are entering the computational informatics space and moving to the cloud.

How to get on the cloud

As academic and industry researchers begin to move data from hardware-based storage systems to the cloud, the apps are following. Today, there’s an app for everything: microscopy, flow cytometry, genomics, and so forth. So, how should researchers move data to the cloud? What should biopharma scientists look for?

Mike Tarselli, PhD

“Do you call your local service provider and say, ‘Excuse me, can I get a cloud?’” joked Mike Tarselli, PhD, chief scientific officer of TetraScience. More seriously, Tarselli explains that a cloud-native partner utilizes cloud computing to build scalable applications in modern, dynamic environments, including public, private, and hybrid clouds.

That’s why TetraScience created a scientific data cloud—the Tetra Scientific Data Cloud—that integrates with various instruments and software programs and can logically chain together data transformations and analysis pipelines.

TetraScience’s application programming interface (API) is instrument-agnostic, so developers can plug data from any instrument into these integrations. The platform ingests data from a wide range of sources, including instruments such as mass spectrometers, next-generation sequencers, flow cytometers, and microscopes. It also brings in data from sensors and from web-based systems such as Illumina’s BaseSpace.

Then the data are harmonized. “If you have a bunch of different plate readers, mass specs, or HPLCs, they should all have very common things, yet there are 50 different plate reader formats!” Tarselli exclaims. In essence, he adds, TetraScience takes data in different formats and file types, and from different sensors, and makes them compatible with one another.
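The Python sketch below shows what harmonization can look like in a few lines. It is not TetraScience’s code or data model. The two vendor formats, the `WellReading` schema, and the `harmonize` dispatcher are all hypothetical. They were chosen to show how plate-reader exports with different shapes can be mapped onto one common record type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WellReading:
    """Hypothetical common schema that every plate-reader export is mapped onto."""
    plate_id: str
    well: str      # e.g. "A1"
    value: float   # the numeric reading
    unit: str


def from_vendor_a(plate_id: str, rows: list) -> list:
    # Hypothetical vendor A: one dict per well, values stored as strings,
    # e.g. {"Well": "A1", "OD600": "0.42"}
    return [WellReading(plate_id, r["Well"], float(r["OD600"]), "OD600") for r in rows]


def from_vendor_b(plate_id: str, grid: list) -> list:
    # Hypothetical vendor B: a row-major grid of numbers; row 0 is "A", column 0 is "1"
    readings = []
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            well = f"{chr(ord('A') + r)}{c + 1}"
            readings.append(WellReading(plate_id, well, float(value), "OD600"))
    return readings


# Registry of parsers; supporting a new format means adding one entry here.
PARSERS = {"vendor_a": from_vendor_a, "vendor_b": from_vendor_b}


def harmonize(source: str, plate_id: str, payload) -> list:
    """Dispatch a raw export to the parser registered for its source format."""
    try:
        parser = PARSERS[source]
    except KeyError:
        raise ValueError(f"no parser registered for {source!r}") from None
    return parser(plate_id, payload)
```

Both exports end up as identical `WellReading` objects. For example, `harmonize("vendor_a", "P1", [{"Well": "A1", "OD600": "0.42"}])[0]` equals the first reading from `harmonize("vendor_b", "P1", [[0.42, 0.10]])`. Downstream tools therefore see one schema, whichever instrument produced the file.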

Finally, the Tetra Scientific Data Cloud can push data where it needs to go, whether to an electronic lab notebook, a long-term storage database, or a visualization tool like Tableau. “If you think about where the data sources are and then target where you want to publish the data, you can easily see why our data platform fits in the middle,” Tarselli says.
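The “platform in the middle” pattern chains transformations together and then fans the result out to several destinations. It can be sketched as follows. This illustrates the pattern under stated assumptions and is not the Tetra Data Platform’s API. The transforms `add_source` and `flag_outlier`, the outlier threshold, and the in-memory lists standing in for an electronic lab notebook, an archive, and a dashboard are all hypothetical.

```python
from typing import Callable

Record = dict
Transform = Callable[[Record], Record]
Target = Callable[[Record], None]


def chain(*steps: Transform) -> Transform:
    """Compose transforms left to right into a single pipeline step."""
    def run(record: Record) -> Record:
        for step in steps:
            record = step(record)
        return record
    return run


def publish(record: Record, targets: dict) -> list:
    """Deliver the same processed record to every registered destination."""
    for deliver in targets.values():
        deliver(record)
    return list(targets)  # names of the destinations that received the record


# Hypothetical transforms; a real platform would define many more.
def add_source(record: Record) -> Record:
    return {**record, "source": "plate_reader_01"}


def flag_outlier(record: Record, threshold: float = 3.0) -> Record:
    # Hypothetical QC rule: flag readings above an arbitrary threshold
    return {**record, "outlier": record["value"] > threshold}


# In-memory lists stand in for an ELN, a long-term archive, and a dashboard.
eln, archive, dashboard = [], [], []
targets = {"eln": eln.append, "archive": archive.append, "dashboard": dashboard.append}

pipeline = chain(add_source, flag_outlier)
delivered_to = publish(pipeline({"well": "A1", "value": 4.2}), targets)
```

Adding a new destination means registering one more callable in `targets`. The upstream pipeline stays unchanged, which is the design point of keeping the platform between sources and consumers.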

Tetra Scientific Data Cloud
TetraScience asserts that life sciences companies can benefit from the Tetra Scientific Data Cloud, an open, cloud-native system that is designed to “replatform and reengineer” the world’s scientific data, and to leverage machine learning and artificial intelligence in drug discovery. The system’s main components are the Tetra Partner Network (a source of API-based integrations and use case–based scientific applications) and the Tetra Data Platform (which reengineers raw or primary data into harmonized Tetra Data in accordance with FAIR principles—that is, the principles of findability, accessibility, interoperability, and reusability).
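In practice, one way to apply the FAIR principles is to treat them as a checklist of metadata that a harmonized record should carry. The sketch below assumes a hypothetical `DatasetRecord` with one or two fields per principle. The field choices are an illustrative interpretation, not TetraScience’s schema or an official FAIR specification.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    # Hypothetical metadata fields; real platforms define their own schemas.
    identifier: str = ""                               # persistent, unique ID (Findable)
    keywords: list = field(default_factory=list)       # searchable descriptors (Findable)
    access_url: str = ""                               # retrievable over a standard protocol (Accessible)
    schema: str = ""                                   # shared, documented data model (Interoperable)
    license: str = ""                                  # terms of use (Reusable)
    provenance: str = ""                               # origin and processing history (Reusable)


def fair_gaps(rec: DatasetRecord) -> dict:
    """Return, per FAIR principle, the metadata fields that are still empty."""
    checks = {
        "findable": ["identifier", "keywords"],
        "accessible": ["access_url"],
        "interoperable": ["schema"],
        "reusable": ["license", "provenance"],
    }
    gaps = {}
    for principle, names in checks.items():
        missing = [name for name in names if not getattr(rec, name)]
        if missing:
            gaps[principle] = missing
    return gaps


record = DatasetRecord(
    identifier="ds-0001",  # hypothetical ID
    keywords=["HPLC", "stability"],
    access_url="https://example.org/datasets/ds-0001",
    schema="plate-reader-v1",
)
print(fair_gaps(record))  # {'reusable': ['license', 'provenance']}
```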

How to cross-collaborate on the cloud

The research and development process involves workflows of highly specialized teams functioning in parallel and in series, either within or outside a company’s walls. Consequently, collaboration across a fragmented software ecosystem is often needed.

Benchling is trying to modernize and digitize scientific processes and help customers get their products to market faster. “Our R&D cloud, which is our product offering, is used by over a thousand industry customers from biopharma companies to AgTech (agricultural technology) to CPGs (consumer packaged goods) to industrials, giving us this super-wide lens into how the industry as a whole operates,” says Ashu Singhal, co-founder and president of Benchling.

The Benchling R&D Cloud
The Benchling R&D Cloud offers a platform of interconnected applications and allows R&D teams to share key data to accelerate time to market while meeting compliance requirements. The applications can bring flexibility to early research and rigor to development, and they can improve the tracking of experiments and workflows. Benchling’s permissions, data types, and workflows can be configured and reconfigured by designated administrators without requiring additional code writing or vendor involvement.

Notably, Benchling has partnered with Verve Therapeutics, a pioneer in base editing, to create a single repository and software layer in which Verve’s scientists across R&D can work. The arrangement has helped Verve get its product to market faster.

Ashu Singhal

Benchling’s R&D platform has three main layers. The first layer is called the productivity hub. “Our R&D platform contains a suite of applications that scientists ‘live in’ as they work,” Singhal relates. “The applications help with everything from capturing notebook entries to designing and editing DNA sequences to tracking physical samples in a lab.”

Beneath the productivity hub is the cloud-native platform. “The goal is to allow our customers to configure and standardize,” Singhal explains, “and then give them all of the robust security access controls and audit trails somebody needs in the cloud.”
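Singhal describes configurable access controls backed by an audit trail. A sketch of that idea keeps role definitions in data rather than code and logs every access decision. The role names, actions, and the `AccessController` class below are hypothetical and do not reflect Benchling’s implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Roles live in configuration data, so an administrator can change them
# without code changes. Role names and actions here are hypothetical.
ROLE_PERMISSIONS = {
    "scientist": {"read", "write"},
    "reviewer": {"read", "sign"},
    "viewer": {"read"},
}


@dataclass(frozen=True)
class AuditEvent:
    timestamp: str
    user: str
    action: str
    resource: str
    allowed: bool


class AccessController:
    def __init__(self, role_permissions: dict):
        self.role_permissions = role_permissions
        self.user_roles = {}
        self._audit = []  # private list, appended to but never edited

    def assign(self, user: str, role: str) -> None:
        if role not in self.role_permissions:
            raise ValueError(f"unknown role {role!r}")
        self.user_roles[user] = role

    def check(self, user: str, action: str, resource: str) -> bool:
        role = self.user_roles.get(user)
        allowed = role is not None and action in self.role_permissions[role]
        # Denied attempts are logged too, which is what makes the trail useful
        self._audit.append(AuditEvent(
            datetime.now(timezone.utc).isoformat(), user, action, resource, allowed))
        return allowed

    @property
    def audit_trail(self) -> tuple:
        # Hand out an immutable copy so callers cannot rewrite history
        return tuple(self._audit)
```

For example, after `ac = AccessController(ROLE_PERMISSIONS)` and `ac.assign("ana", "viewer")`, a `read` on `"entry-42"` is allowed and a `write` is refused. Both decisions appear in `ac.audit_trail`.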

The third layer is an ecosystem. Singhal says Benchling’s larger pharma customers are building their own in-house software and highly specialized tools for the broader biopharma industry. A growing ecosystem of third-party startups is also working on niche tools. Benchling offers several ways to bring these custom solutions onto its platform, whether through collaboration and partnership or through integrations.

How to test and analyze data on the cloud

Rami Mehio, Illumina

Some biopharma researchers don’t need broad, customizable computational tools such as those provided by TetraScience and Benchling. Sometimes, all a project needs is access to secondary analysis, the stage of genomic analysis that turns raw sequencing output into aligned reads and variant calls.

DRAGEN is Illumina’s internal informatics engine, and the company also offers it to outside users for their own genomic data analysis. “Everybody who has a server or wants to purchase a server can download DRAGEN and run it,” says Rami Mehio, head of global software and informatics at Illumina. “They can also run it on different types of clouds. We can run them on Microsoft [Azure] and AWS [Amazon Web Services] today, and … in the future, we will include other clouds.”

Illumina’s DRAGEN Platform
Illumina’s DRAGEN Platform uses highly reconfigurable field-programmable gate array (FPGA) technology to provide hardware-accelerated implementations of genomic analysis algorithms, such as BCL conversion, mapping, alignment, sorting, duplicate marking, and haplotype-based variant calling.
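Duplicate marking, one of the steps listed above, is simple enough to sketch in software. The Python below is a simplified illustration, not DRAGEN’s FPGA implementation. It assumes single-end reads and ignores soft clipping and mate information. It treats reads with the same chromosome, 5' position, and strand as duplicates of one original molecule, and it keeps the read with the highest summed base quality.

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    chrom: str
    start: int        # leftmost aligned position (0-based)
    end: int          # one past the rightmost aligned position
    reverse: bool     # True if the read aligned to the reverse strand
    quals: list       # per-base Phred quality scores
    duplicate: bool = False


def five_prime(read: Read) -> int:
    # A reverse-strand read's 5' end is its rightmost aligned base
    return read.end - 1 if read.reverse else read.start


def mark_duplicates(reads: list) -> list:
    """Flag reads sharing chromosome, 5' position, and strand as duplicates,
    leaving the read with the highest total base quality unflagged."""
    groups = {}
    for read in reads:
        key = (read.chrom, five_prime(read), read.reverse)
        groups.setdefault(key, []).append(read)
    for group in groups.values():
        best = max(group, key=lambda r: sum(r.quals))  # ties keep the first read
        for read in group:
            read.duplicate = read is not best
    return reads
```

Consider two forward reads that both start at position 100 on chr1. The lower-quality one is flagged. A reverse-strand read covering the same span has a different 5' end and strand, so it is kept.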

Illumina’s DRAGEN gets updated on a regular cadence. With the latest release, version 4.0, Illumina uses machine learning to provide greater accuracy for pharmacogenomic profiling. “There are 20 CPIC (Clinical Pharmacogenetics Implementation Consortium) genes, and all of them are pretty difficult genes to analyze,” Mehio points out. “DRAGEN v4.0 allows end users to gather [sequencing] data … to understand blood interactions and the effectiveness of a drug for specific individuals and so on.”
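Pharmacogenomic profiling ultimately turns a genotype into a phenotype that clinicians can interpret. For CYP2D6, one of the CPIC genes, CPIC uses an activity-score approach. Each star allele is assigned a function value, the two values in a diplotype are summed, and the sum is binned into a metabolizer phenotype. The sketch below shows only that downstream step, not how DRAGEN calls star alleles. The allele table is a small illustrative subset, and the current CPIC tables should be consulted before any real use.

```python
# Illustrative subset of CYP2D6 allele activity values following CPIC's
# activity-score approach; verify against current CPIC tables before real use.
CYP2D6_ACTIVITY = {"*1": 1.0, "*2": 1.0, "*4": 0.0, "*10": 0.25, "*41": 0.5}


def activity_score(diplotype: str) -> float:
    """Sum the activity values of the two alleles in a diplotype such as '*1/*4'."""
    alleles = diplotype.split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected two alleles, got {diplotype!r}")
    return sum(CYP2D6_ACTIVITY[a] for a in alleles)


def phenotype(score: float) -> str:
    # Score-to-phenotype bins used in CPIC's CYP2D6 guidance:
    # 0 poor, 0.25-1 intermediate, 1.25-2.25 normal, >2.25 ultrarapid
    if score == 0:
        return "poor metabolizer"
    if score < 1.25:
        return "intermediate metabolizer"
    if score <= 2.25:
        return "normal metabolizer"
    return "ultrarapid metabolizer"
```

With these values, `*1/*1` scores 2.0 (normal metabolizer), `*1/*4` scores 1.0 (intermediate metabolizer), and `*4/*4` scores 0 (poor metabolizer). Gene duplications, which CPIC also scores, are left out of this sketch.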

DRAGEN v4.0 also introduces two new pipelines: single-cell ATAC-Seq and single-cell multiomics. The single-cell ATAC-Seq pipeline enables single-cell resolution profiling of chromatin accessibility. It can now be run in combination with DRAGEN’s single-cell RNA-Seq pipeline as part of the single-cell multiomics pipeline. “We have a focus on multiomics and have made a significant improvement to our single-cell pipeline,” Mehio says. “For the first time, we provide support both on the cloud and on site.”
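A central data product of single-cell ATAC-Seq analysis is the cell-by-peak matrix. It records how many sequenced fragments from each cell fall in each accessible region. The sketch below builds a sparse version of that matrix from hypothetical in-memory fragments and peaks. It assumes half-open coordinates and non-overlapping peaks. It is not DRAGEN’s pipeline, which also has to handle steps such as barcode correction and peak calling.

```python
from bisect import bisect_left
from collections import defaultdict


def cell_by_peak_counts(fragments, peaks):
    """Count, for each cell barcode, the fragments overlapping each peak.

    fragments: iterable of (barcode, chrom, start, end), half-open intervals
    peaks: list of (chrom, start, end), half-open and non-overlapping
    Returns a sparse matrix as {barcode: {peak_index: count}}.
    """
    # Group peaks by chromosome and sort by start so we can binary-search
    by_chrom = defaultdict(list)
    for idx, (chrom, start, end) in enumerate(peaks):
        by_chrom[chrom].append((start, end, idx))
    for plist in by_chrom.values():
        plist.sort()
    starts = {chrom: [s for s, _, _ in plist] for chrom, plist in by_chrom.items()}

    counts = defaultdict(lambda: defaultdict(int))
    for barcode, chrom, f_start, f_end in fragments:
        plist = by_chrom.get(chrom, [])
        # Peaks before index k start before the fragment ends
        k = bisect_left(starts.get(chrom, []), f_end)
        # Sorted, non-overlapping peaks also have increasing ends, so walk
        # left and stop at the first peak that ends at or before f_start
        for p_start, p_end, idx in reversed(plist[:k]):
            if p_end <= f_start:
                break
            counts[barcode][idx] += 1
    return {bc: dict(row) for bc, row in counts.items()}
```

Each row of the result is one cell’s accessibility profile. Downstream steps such as clustering, or joint analysis with single-cell RNA-Seq in a multiomics setting, typically start from this matrix.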

How to apply machine learning to the cloud

Alfredo Andere, LatchBio

According to Alfredo Andere, cofounder and CEO at LatchBio, the pace of data generation is only accelerating, and it’s not getting easier to handle the volume. Andere expects that in the next three years, every biopharma company will want to plug into a cloud without having to know what AWS is. “They can plug in … and start doing all their analysis without having to build any infrastructure,” Andere explains.

LatchBio provides a web-based platform that lets researchers store and analyze data without touching code or cloud infrastructure. From any browser, researchers can quickly import files from existing data stacks and access dozens of popular bioinformatics pipelines and data visualization tools, including RNA-Seq pipelines, CRISPResso2, and AlphaFold. Scientists at organizations including the Innovative Genomics Institute (IGI), BitBio, and Eligo Bioscience are using the LatchBio platform to advance their work.

LatchBio Bio Development Framework
LatchBio proposes a Bio Development Framework intended to simplify the management of data infrastructure and workflows. Specific activities supported by the framework include (from lower left to upper right) cloud deployment, end-to-end data provenance, interface access, workflow automation, system updating, and ecosystem development.

Andere sees the field moving toward machine learning and artificial intelligence models that autonomously solve problems and propose next steps. “Once you’ve tested your data and gotten results, you can plug those results into a machine learning model,” he elaborates. “Then the model generates recommendations, such as the next set of experiments you should do.

“I see the whole modality moving into automated cloud labs, but that’s more like a 10-year timeline. Suddenly, you’re doing [experiments] without a single human in the loop without taking care of the whole testing and learning space. That’s what I’m super excited about.”
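In the loop Andere describes, test results feed a model that proposes the next experiments. This resembles the approach often called active learning, and a deliberately small Python sketch of the idea follows. The kernel-regression model, the distance-based uncertainty proxy, and the `explore` weight are assumptions for illustration, not LatchBio’s method. A real system would use a properly calibrated model.

```python
import math


def predict(x, observed, bandwidth=1.0):
    """Gaussian-kernel weighted average of observed results (Nadaraya-Watson
    regression), plus a crude uncertainty proxy that grows with the distance
    to the nearest tested condition. `observed` is a non-empty list of (x, y)."""
    weights = [math.exp(-((x - xi) / bandwidth) ** 2 / 2) for xi, _ in observed]
    total = sum(weights)
    if total < 1e-12:
        # Far from all data: fall back to the overall mean
        mean = sum(y for _, y in observed) / len(observed)
    else:
        mean = sum(w * y for w, (_, y) in zip(weights, observed)) / total
    uncertainty = min(abs(x - xi) for xi, _ in observed) / bandwidth
    return mean, uncertainty


def recommend(candidates, observed, k=2, explore=1.0, bandwidth=1.0):
    """Rank untested candidates by an upper-confidence-bound-style score
    (predicted mean + explore * uncertainty) and return the top k."""
    tested = {xi for xi, _ in observed}
    scored = []
    for x in candidates:
        if x in tested:
            continue
        mean, unc = predict(x, observed, bandwidth)
        scored.append((mean + explore * unc, x))
    scored.sort(reverse=True)
    return [x for _, x in scored[:k]]


# Hypothetical results: (condition, measured response)
observed = [(1.0, 0.2), (2.0, 0.5), (3.0, 0.9)]
next_batch = recommend([0.0, 1.0, 3.5, 4.0], observed, k=2, explore=0.0)
```

Setting `explore` to zero recommends the conditions the model predicts will score best. Raising it favors untested conditions far from any previous experiment, trading exploitation for exploration.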