Researchers at the New York Stem Cell Foundation Research Institute have unveiled a new platform for discovering cellular signatures of disease, which integrates robotic systems for studying patient cells with artificial intelligence methods for image analysis. Using their automated cell culture platform, the scientists collaborated with Google Research to successfully identify new cellular hallmarks of Parkinson’s disease by creating and profiling over a million images of skin cells from a cohort of 91 patients and healthy controls.
“Traditional drug discovery isn’t working very well, particularly for complex diseases like Parkinson’s,” noted NYSCF CEO Susan L. Solomon, JD. “The robotic technology NYSCF has built allows us to generate vast amounts of data from large populations of patients, and discover new signatures of disease as an entirely new basis for discovering drugs that actually work.”
Marc Berndl, Software Engineer at Google Research, added, “This is an ideal demonstration of the power of artificial intelligence for disease research. We have had a very productive collaboration with NYSCF, especially because their advanced robotic systems create reproducible data that can yield reliable insights.”
Notably, the new platform is disease agnostic, requiring only easily accessible skin cells from patients. It can also be applied to other cell types, including derivatives of induced pluripotent stem cells that NYSCF creates to model a variety of diseases.
Solomon, Berndl and colleagues described the technology in Nature Communications, in a paper titled “Integrating deep learning and unbiased automated high-content screening to identify complex disease signatures in human fibroblasts.” In their report, the team concluded that the platform “… represents a powerful, unbiased approach that may facilitate the discovery of precision drug candidates undetectable with traditional target and hypothesis-driven methods.”
A major challenge in discovering effective therapies for complex diseases is defining robust disease phenotypes that are useful for high-throughput drug screening, the authors noted. “The increasing availability of patient cells through biobanking and induced pluripotent stem cell (iPSC) models presents an excellent opportunity for cell-based drug discovery, but in the absence of reliable drug targets, new methods to discover unbiased, quantitative cellular phenotypes are still needed.” Emerging techniques in artificial intelligence (AI) and deep learning–based analysis could provide new avenues for speeding drug discovery, they suggested, by “distinguishing drug-induced cellular phenotypes, elucidating mechanisms of action, and gaining insights into drug repurposing.”
Parkinson’s disease (PD) is the second most prevalent progressive neurodegenerative disease, and affects 2–3% of individuals over the age of 65 years, the investigators noted. While variants in many genes, including LRRK2, GBA and SNCA, have been associated with PD risk, more than 90% of cases are sporadic, and caused by unknown genetic and environmental factors. And although substantial progress has been made in clarifying the pathological mechanisms underlying PD, the authors pointed out, “… the failure of recent clinical trials targeting established pathological pathways suggests that current drug discovery strategies remain inadequate.”
The newly reported approach leveraged NYSCF’s vast repository of patient cells and its state-of-the-art robotic system, the NYSCF Global Stem Cell Array®, to profile images of millions of cells from 91 Parkinson’s patients and healthy controls. Scientists used the Array® to isolate and expand fibroblasts (“a readily accessible cell type that reflects donor genetics and environmental exposure history”) from skin punch biopsy samples, label different parts of these cells using a technique called Cell Painting, and create thousands of high-content optical microscopy images. The resulting images were fed into an unbiased, artificial intelligence–driven image analysis pipeline, which identified image features specific to patient cells that could be used to distinguish them from healthy controls. The authors further explained, “… we combined scalable automation and deep learning to develop a high-throughput and high-content screening platform for unbiased population-scale morphological profiling of cellular phenotypes … We use fixed weights from a convolutional deep neural network trained on ImageNet to generate deep embeddings from each image and train machine learning models to detect morphological disease phenotypes. Our platform’s robustness and sensitivity allow the detection of individual-specific variation with high fidelity across batches and plate layouts.”
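To make the second half of that pipeline concrete, the short Python sketch below shows one way a model could be trained on per-image embeddings and scored on donors it has never seen. It is an illustration under stated assumptions, not the authors' implementation: the embeddings are synthetic random vectors standing in for the output of an ImageNet-trained network, the nearest-centroid classifier is a deliberately simple substitute for whatever models the team trained, and every function name and parameter (including the `shift` that separates the two synthetic groups) is hypothetical.

```python
import random
from statistics import fmean


def synthetic_embeddings(n_donors=20, images_per_donor=5, dim=16, shift=1.0, seed=0):
    """Return (donor_id, label, vector) rows.

    Placeholder data only: random vectors stand in for embeddings that a
    fixed-weight, ImageNet-trained network would produce for each image.
    """
    rng = random.Random(seed)
    rows = []
    for donor in range(n_donors):
        label = donor % 2  # 0 = control, 1 = patient (synthetic assignment)
        for _ in range(images_per_donor):
            vec = [rng.gauss(shift * label, 1.0) for _ in range(dim)]
            rows.append((donor, label, vec))
    return rows


def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def leave_one_donor_out_accuracy(rows):
    """Classify each donor using centroids fitted on all other donors."""
    donors = sorted({donor for donor, _, _ in rows})
    correct = 0
    for held_out in donors:
        train = [r for r in rows if r[0] != held_out]
        test = [r for r in rows if r[0] == held_out]
        # One centroid per class, computed without the held-out donor.
        centroids = {
            label: mean_vector([v for _, lab, v in train if lab == label])
            for label in (0, 1)
        }
        # Average the held-out donor's image embeddings into one profile.
        profile = mean_vector([v for _, _, v in test])
        predicted = min(
            centroids, key=lambda lab: squared_distance(profile, centroids[lab])
        )
        correct += predicted == test[0][1]
    return correct / len(donors)


if __name__ == "__main__":
    rows = synthetic_embeddings()
    print(f"held-out donor accuracy: {leave_one_donor_out_accuracy(rows):.2f}")
```

The split is made by donor rather than by image because, as the authors note, the embeddings carry individual-specific variation; if images from one donor appeared in both training and test sets, a model could score well by recognizing the person rather than the disease. Any accuracy printed here reflects only the artificial separation built into the synthetic data and says nothing about the study's results.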
“These artificial intelligence methods can determine what patient cells have in common that might not be otherwise observable,” said co-corresponding author Samuel J. Yang, Research Scientist at Google Research. “What’s also important is that the algorithms are unbiased—they do not rely on any prior knowledge or preconceptions about Parkinson’s disease, so we can discover entirely new signatures of disease.”
“Excitingly, we were able to distinguish between images of patient cells and healthy controls, and between different subtypes of the disease,” noted co-corresponding author Bjarki Johannesson, PhD, a NYSCF Senior Investigator on the study. “We could even predict fairly accurately which donor a sample of cells came from.” As the scientists reported in their paper, “Importantly, our unbiased profiling approach also identified generalizable PD disease signatures, which allowed us to distinguish both sporadic PD and LRRK2 PD cells from those of healthy controls.”
The Parkinson’s disease signatures identified by the team can now be used as a basis for conducting drug screens on patient cells, to discover which drugs can reverse these features. The researchers are also hopeful that the platform can open new therapeutic avenues for many diseases where traditional drug discovery has been unsuccessful. “Our ability to identify Parkinson’s-specific disease signatures using standard cell labeling and deep learning–based image analysis highlights the generalizable potential of this platform to identify complex disease phenotypes in a broad variety of cell types,” the team stated.
“This is the first tool to successfully identify disease features with this much precision and sensitivity,” said NYSCF senior vice president of discovery and platform development Daniel Paull, PhD. “Its power for identifying patient subgroups has important implications for precision medicine and drug development across many intractable diseases.”
As the authors concluded, “To our knowledge, this is the first successful demonstration in which automated, unbiased deep learning–based phenotypic profiling is able to discriminate between primary cells from PD patients (both sporadic and LRRK2) and healthy controls … The scale of this unbiased high-content profiling experiment is, to our knowledge, unprecedented: it provides the scientific community with the largest publicly available Cell Painting dataset to date (in terms of pixel count) at 48 terabytes in size.” The Cell Painting dataset is available to the research community, at https://nyscf.org/nyscf-adpd/.