Today, nine out of ten drugs fail in clinical trials. Furthermore, it takes over a decade and an average of $2 billion to develop each medicine and win its approval.1 An important underlying reason is the gap between cell-based in vitro research and clinical research, often referred to as the “valley of death.” Promising in vitro candidates often fail in the clinic because in vitro models turn out to be insufficiently predictive of, and translatable to, the clinical setting.

Cells are complex. When cells are exposed to a drug in vitro, unexpected subcellular changes can occur. The more accurately in vivo toxicity and potential adverse reactions can be predicted, the greater the likelihood of success in clinical trials and of getting drugs to market.

Angeline Lim and David Egan
Angeline Lim, PhD, applications scientist and bioimaging specialist at Molecular Devices (left), and David Egan, PhD, co-founder and CEO of Core Life Analytics (right), are the authors of this article.

Taking a phenotypic approach to drug discovery

To help improve their chances of success, the drug discovery community is increasingly embracing phenotypic profiling assays, combined with analysis powered by artificial intelligence (AI). Phenotypic profiling is a strategy that uses rich image-based data to quantify cellular profiles, enabling, for example, systematic evaluation of cellular morphologies in response to drug exposure. It helps researchers identify novel drug candidates or targets that are capable of inducing phenotypes of interest more efficiently and effectively, cutting the cost and time it takes to develop a drug.

High-content screening (HCS)-based phenotypic profiling, such as cell painting, is key to advancing drug discovery. One example of a lab that is already succeeding with this approach is Recursion, a clinical-stage biotechnology company.2 Recursion’s workflow, which integrates innovations in engineering, laboratory automation, and machine learning, enables it to quantify and process hundreds to thousands of phenotypic features. Its machine learning algorithms enable it to uncover links that would otherwise have remained hidden. Since it began work in 2013, Recursion has built a pipeline of four clinical-stage programs and 41 early discovery programs.3

Examples of cells stained with dyes in a Cell Painting assay to detect key cellular components.

While the targeted approach to drug discovery is based on existing knowledge of a drug target or a specific hypothesis on its role in disease, the phenotypic profiling approach casts a wider net by capturing all the measurable features related to the cell’s phenotype. The cell painting assay uses up to six fluorescent dyes to label up to eight cellular components and organelles. Cells are cultured and treated with the experimental conditions of interest, either via transfection (genetic screening) or compound addition (small molecule), incubated, and then stained with cell paint dyes. Images are then captured using high-content imaging systems. AI-driven software is used to segment these raw images into individual cells and specific cellular components, and extract quantitative measures of cellular features such as size, intensity and texture. These data are then used to generate detailed cell profiles which enable researchers to assess a compound’s effect on the entire phenotype. This ability to relate changes in the phenotypic profile to a specific drug exposure helps researchers to narrow down the mechanism of action of each drug.
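The segmentation and feature-extraction step can be illustrated with a toy example. The sketch below is not how high-content analysis software works internally: it swaps the AI-driven segmentation described above for a simple intensity threshold, runs on a small synthetic image, and uses hypothetical function names (`label_objects`, `extract_features`). It only shows the general idea of turning pixels into per-object measurements of size, intensity, and a crude texture proxy.

```python
import numpy as np


def label_objects(mask):
    """Label 4-connected foreground regions 1..n (0 = background)."""
    labels = np.zeros(mask.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue  # pixel already belongs to a labeled object
        current += 1
        labels[start] = current
        stack = [start]
        while stack:  # flood fill from the seed pixel
            r, c = stack.pop()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                if inside and mask[nr, nc] and not labels[nr, nc]:
                    labels[nr, nc] = current
                    stack.append((nr, nc))
    return labels, current


def extract_features(image, threshold):
    """Return one dict of simple measurements per segmented object."""
    labels, n_objects = label_objects(image > threshold)
    rows = []
    for obj in range(1, n_objects + 1):
        pixels = image[labels == obj]
        rows.append({
            "object": obj,
            "area_px": int(pixels.size),             # size
            "mean_intensity": float(pixels.mean()),  # intensity
            "intensity_sd": float(pixels.std()),     # crude texture proxy
        })
    return rows


# Synthetic single-channel "image" with two objects (illustrative values only).
image = np.zeros((8, 8))
image[1:3, 1:3] = 10.0  # small, uniformly bright object
image[4:7, 4:7] = 5.0   # larger, dimmer object...
image[5, 5] = 9.0       # ...with one bright spot inside it

features = extract_features(image, threshold=1.0)
for row in features:
    print(row)
```

In a real cell painting experiment, measurements like these would be computed per channel and per cellular compartment, which is how the feature count climbs into the hundreds or thousands per cell.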

Colchicine-treated HEK293 cells are phenotypically different from untreated control cells. Wells with colchicine contain significantly fewer cells after 24 hr (646±37) compared to the control wells (4355±264), p<0.001.

Because the phenotypic approach delivers quantitative data representing a cell’s complete response to specific drugs, researchers are no longer limited to a single specific measurement. This allows for unexpected novel drug discoveries. What’s more, this multiparametric dataset collected during the HCS process is not influenced by existing knowledge or hypotheses, thus reducing the human bias associated with the targeted approach.

The importance of high-performance imaging technology 

The evident power of HCS, together with technological advances, has led to its increased adoption for phenotypic profiling, but researchers still face several key challenges. The first is image acquisition. If researchers rely on subpar imaging technology that is not optimized for high-throughput image acquisition, the experiment becomes difficult to scale. Image quality is also at risk, and artifacts can appear, if an imager isn’t engineered to handle highly multiplexed cell-based assays. When the imaging process is performed manually, interruptions can also cause issues. The good news is that technological improvements and the increased availability of laboratory automation help overcome these inefficiencies.

The ImageXpress® Confocal High-Content Imaging System from Molecular Devices features AgileOptix technology, the combination of a powerful solid-state light engine, custom optics, scientific CMOS sensor, and the ability to change between 5 different disk geometries.

To maximize imaging speed, scale, and quality, the perfect high-content imager needs several key features. At a minimum, it should have a premium camera for images with improved resolution. Integrating a multi-channel laser light source will generate brighter images with higher signal-to-background. Add water immersion capabilities and researchers have higher signal at lower exposure times for greater sensitivity and image clarity. Automation capabilities are increasing in popularity—both at the instrument and workflow levels—due to their time-saving nature. An automated imager or a work cell centered around a robotic arm that moves plates seamlessly between instruments can help reduce workflow interruptions and speed up the imaging of multi-well plates. Together, these innovations generate high-quality images faster, shortening the time between image acquisition and analysis.

The second key challenge is the sheer amount of data generated. An advanced high-content imager can produce gigabytes or even terabytes of content.4 When a cell painting assay is scaled up, the amount of data generated will also rise accordingly. Gleaning insights from such massive datasets can be incredibly daunting and researchers can understandably struggle with data overload.

Illustrating the scale of data involved, the Carpenter-Singh Lab, which is based at the Broad Institute of MIT and Harvard and best known for creating the famed cell painting assay, recommends that researchers image nine positions in each well. Most screens take place in a microtiter plate with 384 wells, which, at nine positions per well, would generate a total of 3,456 images.

Each of these images is then analyzed. If a researcher wanted to look at 1,000 features per cell and there were 100 cells in each well, that’s a total of 100,000 datapoints per well. Multiply that by 384 wells, and they are capturing 38,400,000 pieces of information from one plate. And naturally, the experiment will need to be replicated, pushing the total up even further.
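The arithmetic above is easy to reproduce and to adapt to other plate formats. The short sketch below uses the figures quoted in the text; the replicate count of three is an assumption added purely for illustration, since no specific number is given here.

```python
# Back-of-the-envelope data volume for one plate, using the figures quoted above.
WELLS_PER_PLATE = 384
SITES_PER_WELL = 9          # imaging positions per well
FEATURES_PER_CELL = 1_000
CELLS_PER_WELL = 100
REPLICATES = 3              # assumption for illustration only

images_per_plate = WELLS_PER_PLATE * SITES_PER_WELL
datapoints_per_well = FEATURES_PER_CELL * CELLS_PER_WELL
datapoints_per_plate = datapoints_per_well * WELLS_PER_PLATE
datapoints_with_replicates = datapoints_per_plate * REPLICATES

print(f"images per plate:        {images_per_plate:,}")
print(f"datapoints per well:     {datapoints_per_well:,}")
print(f"datapoints per plate:    {datapoints_per_plate:,}")
print(f"with {REPLICATES} replicate plates: {datapoints_with_replicates:,}")
```

Changing the constants shows how quickly the totals grow when more cells per well, more features, or more plates are involved.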

It’s not hard to see how this volume of data can pose a limitation. At best, it requires the expertise of a data scientist familiar with advanced statistics and machine learning or AI. At worst, only a few features are cherry-picked for analysis, leaving the rest of the data unused. Traditionally, statisticians, data scientists, and software developers were needed to effectively analyze the vast amounts of data produced by HCS. This problem was further illustrated by a recent Drug Target Review survey, where some 70% of respondents said that training and talent were the main barriers to them applying AI and analytics to drug screening.5

To make sense of big data, researchers need advanced analytics. As a result, powerful analysis software is becoming increasingly mainstream. AI-driven platforms are becoming more readily available, democratizing the process and enabling scientists with little or no data analytics experience to glean valuable insights from the data they collect.

A) StratoMineR from Core Life Analytics is a web-based platform that guides users through a typical workflow for the analysis of high-content, multiparametric data. B) Principal component analysis (PCA) can be used for data reduction. Feature contributions to the second principal component (PC2) are shown as a polar plot. C) A 3D scatter plot shows the relationships between data points along three principal components. Note that the three separate clusters correspond to the three treatment conditions.
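The data reduction step described in the caption can be sketched in a few lines. The example below is a generic PCA computed with NumPy on synthetic data, not StratoMineR’s implementation: the well counts, feature counts, and the three simulated treatments are all assumptions made for illustration. It produces the two ingredients the figure shows, namely per-feature contributions to the second component and three-dimensional coordinates for each well.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for well-level profiles: 3 treatments x 20 wells x 50 features.
n_wells, n_features = 20, 50
centers = rng.normal(0.0, 3.0, size=(3, n_features))  # one mean profile per treatment
profiles = np.vstack(
    [center + rng.normal(0.0, 1.0, size=(n_wells, n_features)) for center in centers]
)
treatment = np.repeat([0, 1, 2], n_wells)  # treatment label for each well

# Standardize each feature so that no single measurement dominates.
z = (profiles - profiles.mean(axis=0)) / profiles.std(axis=0)

# PCA via singular value decomposition of the standardized matrix.
u, s, vt = np.linalg.svd(z, full_matrices=False)
explained = s**2 / np.sum(s**2)  # fraction of variance per component
scores = u[:, :3] * s[:3]        # well coordinates for a 3D scatter plot
loadings_pc2 = vt[1]             # feature contributions to the second component

print("variance explained by the first three components:", explained[:3].round(3))
for t in range(3):
    print(f"treatment {t} mean position:", scores[treatment == t].mean(axis=0).round(2))
```

Because the simulated treatments differ in their mean profiles, wells from the same treatment land close together in the reduced space, which is the kind of clustering the scatter plot in the figure illustrates.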

For example, a team from Google Research and the New York Stem Cell Foundation Research Institute recently carried out a deep learning and cell painting experiment that revealed Parkinson’s disease-specific signatures in primary patient fibroblasts. This was possible thanks to ImageNet, an object recognition dataset, and CellProfiler, open-source software designed to measure and analyze cell images.6


Taming the data deluge

The availability of platforms and tools like these means that phenotypic profiling is no longer limited to data scientists. Advancements in high-content instrumentation and automation are boosting the flexibility needed to meet the growing demands of increasingly complex cellular disease models, pushing the adoption of HCS.

And with the right data analysis software available, every researcher can cut through massive amounts of information generated by their assays. The availability of advanced, user-friendly data mining tools allows even the non-expert to leverage the power of AI, and gives them the ability to structure, re-structure and view the data through a variety of lenses. This helps to glean further insights from large datasets and to bring new findings to the surface.

Interactive graphs with StratoMineR enable exploration of data in great detail. Users can quickly produce and export sophisticated visualizations of specific data subsets in the form of hierarchical clustering visualizations, 3D scatter plots, spider plots, and heat maps.

With HCS and an AI-powered analysis platform in place, scientists will no longer be overwhelmed by the data deluge and will instead be able to harness their creativity and curiosity to discover and develop new treatments.



  1. Steedman, M, et al. (2019). Ten years on: Measuring the return from pharmaceutical innovation. Deloitte (p. 14). London: Deloitte UK.
  2. Mullard, A. (Aug 16 2019). Machine learning brings cell imaging promises into focus. Nature Reviews Drug Discovery 18, 653-655.
  3. Pipeline. Recursion. (2022). Retrieved 1 May 2022.
  4. Rees, V, Burnham, R. (Aug 11 2021). Exclusive Report: AI & Informatics: Drug discovery and development. Drug Target Review.
  5. Rees, V. (Oct 4 2021). AI & Informatics: Drug discovery and development – industry survey infographic. Drug Target Review.
  6. Schiff, L, Migliori, B, Chen, Y, Carter, D, Bonilla, C, Hall, J, Fan, M, Tam, E, Ahadi, S, Fischbacher, B, Geraschenko, A, Hunter, C, Venugopalan, S, DesMarteau, S, Narayanaswamy, A, Jacob, S, Armstrong, Z, Ferrarotto, P, Williams, B, Johannesson, B. (2020). Deep learning and automated Cell Painting reveal Parkinson’s disease-specific signatures in primary patient fibroblasts. bioRxiv. 10.1101/2020.11.13.380576.