In the near future, clinical trials will use machine learning routinely to analyze data. Today, however, machine learning is still at an adoption midpoint—no longer pioneering, but not yet ubiquitous.
For example, David Craford, president and chief executive officer of Cytobank (now a Beckman Coulter Life Sciences company), says, “In a clinical trials data analysis session at the CYTO 2019 Conference on flow cytometry in June, a live, 104-person survey indicated that 53% are using ‘an unsupervised approach such as FlowSOM’ (a machine learning algorithm) to analyze and visualize cytometry data sets.” That share is likely to grow as datasets continue to get larger and more numerous while the cost of generating them falls. Machine learning, in which self-training algorithms sift through massive amounts of data looking for unknown signals in the noise, has become a necessity for making sense of all that data. And Cytobank is well positioned to respond to that market need.
Tangible benefits from the cloud
Cytobank’s fully integrated, cloud-based machine learning platform is designed to analyze high-dimensional single-cell data from flow cytometry studies. Now incorporating FlowSOM, the platform lets users build self-organizing maps for automated clustering and dimensionality reduction.
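For readers unfamiliar with self-organizing maps, the idea can be sketched in a few lines of Python. This is an illustrative toy, not Cytobank's code or the FlowSOM implementation: the function names, the grid size, the decay schedule, and the synthetic eight-marker data are all assumptions made for the example. The published FlowSOM algorithm also groups the map nodes into meta-clusters, a step omitted here.

```python
import numpy as np

def train_som(data, grid=(4, 4), n_iter=2000, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small self-organizing map; returns node weights, shape (rows*cols, n_markers)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Position of each node on the 2D grid, used by the neighborhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Start each node at a randomly chosen event.
    weights = data[rng.choice(len(data), size=rows * cols, replace=False)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)               # learning rate decays linearly to zero
        sigma = sigma0 * (1.0 - frac) + 0.1   # neighborhood radius shrinks over time
        x = data[rng.integers(len(data))]     # pick one event at random
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))   # best-matching node
        grid_dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))        # pull neighbors less than the winner
        weights += lr * h[:, None] * (x - weights)
    return weights

def assign_clusters(data, weights):
    """Label each event with the index of its nearest map node."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Synthetic stand-in for cytometry events: two populations measured on 8 markers.
rng = np.random.default_rng(42)
events = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(500, 8)),
    rng.normal(loc=3.0, scale=0.3, size=(500, 8)),
])

weights = train_som(events)
labels = assign_clusters(events, weights)
print(np.bincount(labels, minlength=len(weights)))  # events per map node
```

Each event ends up assigned to one node of a 4 x 4 grid, so eight-dimensional measurements are summarized both as clusters and as positions on a two-dimensional map, which is the sense in which one technique serves clustering and dimensionality reduction at once.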
Cytobank’s management and collaboration application differs from most competing applications, Craford says. Key differentiators are its cloud computing platform and algorithms that are fully integrated with an intuitive graphical user interface. Those algorithms can be linked to create analysis pipelines, he adds.
Increasing the speed of discovery and achieving more complete discovery are key benefits of integrating machine learning and cytometry analysis. “One top-10 pharma had spent over a year analyzing a complex cytometry dataset and found one biomarker,” Craford recalls. “Then it asked for our analysis of the dataset. Within a couple of weeks we found that same biomarker and also uncovered another putative biomarker.”
Such discovery is one of machine learning’s most frequent applications but, as scientists—and not just data scientists—become more familiar with the technology, Craford expects that usage will expand to include reporting defined endpoints.
Transformation of knowledge generation
The application of machine learning catalyzes two important transformations in the process of generating knowledge from complex, single-cell data. By moving data off individual desktops, cloud computing serves as a foundational technology that enables machine learning. The cloud’s near-instant scalability lets scientists access more computing power to complete big data analyses quickly.
Access to data also expands. “Collaboration amongst scientists is enabled because the data is easily accessible anywhere by a web-enabled device,” Craford points out. “No data are lost in translation. No time is spent shipping hard drives or using FTP sites where researchers can share final data but not annotations or intermediate steps in the analysis. We have one customer where the principal investigator checks the analysis work from the lab on her iPad at night and can provide detailed feedback. We have many multisite pharma customers where scientists at different sites share analysis between sites. We also routinely host example analyses and accompanying datasets where scientists can reference and learn from others’ work.”
Perhaps more to the point, the availability of machine learning is moving cytometry analysis toward automation and away from biaxial gating analysis, which has been the standard for the past 20 years. “Machine learning,” Craford notes, “can provide a more comprehensive and reproducible analysis of larger datasets…and can remove much of the subjectivity involved in manual gating.”
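The subjectivity Craford describes can be illustrated with a small Python sketch. Everything in it is assumed for the example: the two unnamed markers, the synthetic populations, the gate thresholds attributed to two hypothetical analysts, and the simple two-cluster k-means standing in for an automated method. It is not how any particular platform gates or clusters data.

```python
import numpy as np

rng = np.random.default_rng(7)
# Synthetic two-marker data: 600 "negative" and 400 "positive" events (hypothetical).
events = np.vstack([
    rng.normal([1.0, 1.0], 0.5, size=(600, 2)),
    rng.normal([3.0, 3.0], 0.5, size=(400, 2)),
])

def biaxial_gate(events, x_min, y_min):
    """Fraction of events inside an upper-right quadrant gate drawn at (x_min, y_min)."""
    inside = (events[:, 0] > x_min) & (events[:, 1] > y_min)
    return inside.mean()

def two_means(events, n_iter=50):
    """Two-cluster k-means with a fixed start (per-marker minimum and maximum)."""
    centers = np.vstack([events.min(axis=0), events.max(axis=0)])
    for _ in range(n_iter):
        d2 = ((events[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        centers = np.vstack([events[labels == k].mean(axis=0) for k in (0, 1)])
    return labels

# Two analysts place the same gate at slightly different positions.
analyst_a = biaxial_gate(events, 2.0, 2.0)
analyst_b = biaxial_gate(events, 2.3, 2.3)
# The algorithm has no hand-placed boundary and repeats exactly on the same data.
automated = two_means(events).mean()

print(f"analyst A: {analyst_a:.3f}  analyst B: {analyst_b:.3f}  automated: {automated:.3f}")
```

The two hand-drawn gates report different frequencies for the same population simply because the boundary moved, while the algorithm returns the same assignment every time it is run on the same data.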
Opportunities for machine learning
One of the biggest opportunities, Craford says, is to integrate multiple types of single-cell data into dataset analysis: “We have been working with CITE-seq data (scRNA-seq and protein information) and are making progress with this application. There is still more work to do to provide easy-to-use normalization solutions, as well as to develop more complete pipelines to analyze, compare, and integrate multiple types of single-cell data.”
Cytobank is in the second year of a $1.3 million, Phase II, Small Business Innovation Research (SBIR) grant designed to speed the discovery of clinically informative biomarkers for immunotherapy and thus better predict patient responses to specific therapies. The research is based on analyzing the high-dimensional single-cell datasets captured during immunotherapy clinical trials. Researchers who have used machine learning say the automated analysis of large datasets has identified biomarkers that otherwise may have been overlooked.
“We’re making good progress regarding adding functionality to the platform that enables it to scale and work with multiple types of single-cell data,” Craford says. His goal is to increase today’s analysis capacity by between 10- and 100-fold.
Cytobank was formed before 2010 as an outgrowth of the founders’ Stanford University investigations into why some cancer cells resist certain therapies. Co-founder Nikesh Kotecha, PhD, the company’s original chief executive officer, was developing a diagnostic test for juvenile myelomonocytic leukemia. Co-founder Jonathan Irish, PhD, Cytobank’s chief scientific officer, was using clinical signaling profiles to identify therapy-resistant cells in aggressive cancers. Both were working in the laboratory headed by Garry Nolan, PhD, at Stanford University. Nolan, now a Cytobank advisor, was the first to use flow cytometry to investigate cells’ internal signaling. When the three put their heads together, a flow cytometry database, called a cytobank, emerged.
Still, Craford tells GEN, “They needed a better, faster, more scalable solution to analyze complex phospho flow cytometry data.” Since then, Cytobank has scaled the platform to add functionality and built awareness of machine learning’s potential and its applications in flow cytometry.
Best of both worlds
Last June, Cytobank was acquired by Beckman Coulter Life Sciences and placed under the Danaher Life Sciences platform. The company culture still has an entrepreneurial feel, though. “I think we could catalyze some changes in the Beckman Coulter culture,” says Craford, “especially related to what is required to innovate in relatively fast-moving informatics-based businesses.”
While joining Beckman Coulter Life Sciences hasn’t changed Cytobank’s mission, it has extended its resources. Some of those are targeting customer education. Cytobank’s original customers knew they wanted an analysis platform for single-cell data, Craford explains. Now, the company is conducting more customer education to show how machine learning can be used to more quickly and accurately make sense of big data.
“These algorithms are no longer just in the realm of computational biologists and computer-science PhDs,” Craford stresses. “We aim to democratize machine learning and make it accessible to life science researchers of all abilities.”
Part of Beckman Coulter Life Sciences
Location: 2811 Mission College Blvd., 7th Floor, Santa Clara, CA 95054
Phone: (650) 918-7966
Principal: David Craford, President and CEO
Focus: Cytobank is a software-as-a-service company that provides an integrated, cloud-based platform for the analysis of single-cell flow cytometry data.