Unlike the doppelgängers of folklore or the evil twins of popular entertainment, digital twins don’t just appear seemingly out of nowhere. They are painstakingly developed and refined. And neither are they uncanny interlopers or harbingers of doom. They are meant to be helpful.

In biotechnology, digital twins are probably best known as virtual counterparts of real-world biomanufacturing systems. They gather information from monitoring devices, run analyses, and dynamically adjust process parameters to optimize production. Digital twins of this type are part of a broader movement toward digital biomanufacturing. According to a Research and Markets report, digital biomanufacturing already represented a $15 billion market in 2023. The report added that overall, digital biomanufacturing was expected to grow at an annual rate of 11% over the next decade. The corresponding estimate for the digital twins portion of this market was 19%.

Besides establishing themselves in biomanufacturing, digital twins are starting to make their mark in drug discovery and development. To do so, they model biological entities.

“The concept of digital twins translated to drug development and clinical trials describes virtual representations of systems of various complexities, ranging from individual cells to entire humans, and enables in silico simulations and experiments,” a recent perspective article noted (Expert Opin. Drug Discov. 2024; 19: 33–42). “Digital twins increase the efficiency of drug discovery and development by digitalizing processes associated with high economic, ethical, or social burden.”

This article, which was prepared by scientists affiliated with Roche Innovation Center, Ludwig Maximilian University of Munich, and other institutions, highlighted the challenges of using digital twins to model biological entities. These challenges include the need for large multimodal data sets and interpretive frameworks, as well as regulatory uncertainties.

The scientists suggested that these challenges could be overcome by augmenting well-established mechanistic modeling technology with generative artificial intelligence (AI) technology. “The current state of digital twins in drug discovery and clinical trials does not exploit the entire power of generative AI yet,” the scientists admitted. “Nonetheless, generative AI has the potential to transform the field by leveraging recent developments in deep learning and customizing models for the needs of scientists, physicians, and patients.”

Finally, the scientists noted that multiple digital twins of various biological entities are available as commercial offerings. In general, as the scientists complained, these offerings tend to come from companies that disclose few methodological and technical details. However, the scientists seemed reassured that open source technologies similar to the commercial technologies have been successful.

A few representative commercial offerings are mentioned in this article. They include digital twins that model diseases, cells, and even the prioritization mechanisms of the human brain. All these offerings promise to transform drug discovery and development.

Leveraging causal AI

Aitia, a word derived from the Greek word for causality, is the fitting name of a digital twin company that is based in Somerville, MA. The company emphasizes that it leverages causal AI to comb through human multiomic and clinical data and reverse-engineer hidden biological circuitry. Whereas generative AI recognizes correlations, causal AI uncovers cause-and-effect relationships.

Aitia asserts that its Gemini Digital Twins are computational representations of disease that capture genetic and molecular interactions that causally drive clinical and physiological outcomes. The company’s core technology is called REFS, for Reverse Engineering and Forward Simulation. Rather than sift through data asking questions of the “Is A related to B?” type, REFS asks questions of the “Does A cause B?” type.
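The difference between the two kinds of question can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not a depiction of how REFS works: the model, the function names, and the numbers are all hypothetical. A hidden factor C is assumed to drive both A and B, while A has no effect on B.

```python
import random
from statistics import fmean


def simulate(n, intervene_a=None, seed=0):
    """Sample (a, b) pairs from a toy model in which hidden factor c drives both."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        c = rng.gauss(0.0, 1.0)  # hidden common cause (hypothetical)
        if intervene_a is None:
            a = c + rng.gauss(0.0, 0.1)  # observational: a follows c
        else:
            a = intervene_a  # intervention: a is set by hand, ignoring c
        b = c + rng.gauss(0.0, 0.1)  # b depends on c only, never on a
        samples.append((a, b))
    return samples


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    mean_x, mean_y = fmean(xs), fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# "Is A related to B?" Measure the correlation in observational data.
observed = simulate(5000, seed=0)
observed_correlation = pearson(observed)

# "Does A cause B?" Force A to two different values and compare the mean of B.
mean_b_low = fmean(b for _, b in simulate(5000, intervene_a=-2.0, seed=1))
mean_b_high = fmean(b for _, b in simulate(5000, intervene_a=2.0, seed=2))
effect_of_a_on_b = mean_b_high - mean_b_low
```

In this toy model, A and B are strongly correlated in the observational data, yet forcing A to different values leaves the average of B essentially unchanged. A correlation-only analysis would flag A as related to B, while the intervention shows that A is not a cause of B.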

According to Aitia, Gemini can simulate gene and protein knockdowns to enable the discovery and validation of biological mechanisms and drug targets. The company also says that it can identify and advance molecular entities into development candidates and optimize clinical trial designs.

Aitia recently announced that it had partnered with Servier to help identify subpopulations of patients who could respond favorably to Servier’s drug candidate for Parkinson’s disease. In this collaboration, Gemini will be used to simulate the mechanisms of action of the candidate—an inhibitor of leucine-rich repeat kinase 2—to highlight biomarkers in patients. Previously, Aitia and Servier indicated that they were working together to clarify the molecular mechanisms underlying multiple myeloma, and to develop novel drug targets and drug candidates relevant to pancreatic cancer.

Aitia is also collaborating with Charles River Laboratories. The companies announced that Aitia would gain access to Charles River’s Logica, an AI-powered drug solution platform, to create and advance drug candidates for neurological indications, including Alzheimer’s, Parkinson’s, and Huntington’s diseases, and for cancers, including prostate cancer and multiple myeloma. The companies added that they intended to develop a patient-derived xenograft digital twin to predict the best tumor models for in vivo oncology research.

Modeling cellular responses to perturbation

One of the companies alluded to earlier—that is, one of the companies said to have developed technology for which open source counterparts offered at least partial corroboration—is DeepLife, a digital biotech company based in Paris. The company, like academic researchers affiliated with Helmholtz Munich, the Technical University of Munich, and other institutions, has demonstrated that computational modeling can be used to accurately predict cellular responses to perturbations.

One of the open source technologies is a compositional perturbation autoencoder, or CPA (Mol. Syst. Biol. 2023; 19(6): e11517). It reportedly combines the interpretability of linear models with the flexibility of deep learning approaches for single-cell response modeling. “CPA learns to in silico predict transcriptional perturbation response at the single-cell level for unseen dosages, cell types, time points, and species,” CPA’s creators stated. “We envision CPA will facilitate efficient experimental design and hypothesis generation by enabling in silico response prediction at the single-cell level and thus accelerate therapeutic applications using single-cell technologies.”

Another of the open source technologies is called scGen (Nat. Methods 2019; 16: 715–721). It is a model combining variational autoencoders and latent space vector arithmetics for high-dimensional single-cell gene expression data. “We show that scGen accurately models perturbation and infection response of cells across cell types, studies, and species,” scGen’s creators reported. “With the upcoming availability of large-scale atlases of organs in a healthy state, we envision scGen to become a tool for experimental design through in silico screening of perturbation response in the context of disease and drug treatment.”
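The idea of latent space vector arithmetic can be illustrated with a toy sketch. It assumes that cells have already been encoded into a latent space (in the published model, by a variational autoencoder). The two-dimensional vectors and the function names below are hypothetical and are not part of the tool's actual API.

```python
from statistics import fmean


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [fmean(column) for column in zip(*vectors)]


def perturbation_delta(control_latents, perturbed_latents):
    """Difference between the mean perturbed and mean control latent vectors."""
    control_mean = mean_vector(control_latents)
    perturbed_mean = mean_vector(perturbed_latents)
    return [p - c for p, c in zip(perturbed_mean, control_mean)]


def predict_perturbed(latents, delta):
    """Shift each unperturbed latent vector by the estimated delta."""
    return [[z + d for z, d in zip(vector, delta)] for vector in latents]


# Hypothetical latent codes for a cell type observed in both conditions.
control_a = [[0.0, 1.0], [0.2, 1.2]]
perturbed_a = [[1.0, 3.0], [1.2, 3.2]]

# Hypothetical latent codes for a second cell type seen only without perturbation.
control_b = [[5.0, 5.0]]

# Estimate the perturbation direction from cell type A, then apply it to cell type B.
delta = perturbation_delta(control_a, perturbed_a)
predicted_b = predict_perturbed(control_b, delta)
```

In a full pipeline, the shifted latent vectors would be passed through a decoder to obtain predicted gene expression profiles. That step is omitted here because it depends on a trained model.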

For its part, DeepLife asserts that its Digital Twin of Cells functions as a numerical representation of cells that enables scientists to rapidly evaluate how unhealthy cells respond to drug candidates in silico; to decipher therapeutic mechanisms of action; and to identify new targets and biomarkers. “Built on DeepLife’s OMICS Catalog and atlases, which provide an extensive repository of single-cell RNA sequencing data covering a plethora of cell types and tissues, this Digital Twin utilizes state-of-the-art large language models for processing and interpreting vast biological datasets,” the company explains. “DeepLife’s Digital Twin technology serves as a foundation model, which can simulate the biological behavior of real-world cells under varying conditions or disease states.”

DeepLife recently entered a research collaboration with the Mechanisms of Inherited Kidney Disorders (MIKADO) group at the University of Zurich to reconstruct high-resolution model systems of cystinosis using digital twins of diseased cells. When the collaboration was announced, the MIKADO group’s team lead, Alessandro Luciani, PhD, offered this observation: “Employing virtual ‘replicas’ of cystinosis-affected cells, tissues, and organs that incorporate detailed mechanistic data, disease manifestations, electronic health records, and lifestyle traits could help uncover disease signatures that predict drug efficacy and elucidate drug mode of action to improve clinical outcomes.”

Optimizing protocols for cell replacement therapy

Damaged or diseased cells in the body can be replaced by functional cells or tissues; however, many cell types have proven difficult to produce, limiting the prospects for cell replacement therapy. A possible solution lies in the production of somite-derived cells. Somites, which are transient embryonic structures that give rise to various structures of the musculoskeletal system, can be used to produce muscle, brown adipose, cartilage, bone, tendon, and dermis cells. However, this possibility cannot be realized unless efficient somite differentiation protocols can be developed.

A company with expertise in such protocols is the aptly named Somite Therapeutics. The company, which is based in Boston, MA, has combined AI technology and data-rich sources (such as scRNA-Seq, scATAC-seq, and gene expression databases) to create a digital twin of the embryo that can facilitate the rapid identification of novel protocols for generating new cell types, the discovery of new regulators of cell differentiation, and the execution of rapid protocol optimization cycles.

To illustrate how digital twins could be used to advance protocol discovery, Somite has presented a white paper that includes several case studies. For example, Somite describes how it used its digital twin technology to improve upon an established protocol for producing human satellite cells.

“Initial protocols based on expert knowledge of the embryo yielded stem cells from induced pluripotent stem cells with a purity of 25%,” the white paper detailed. “Computational analysis of digital twin scRNA-seq revealed signatures of ligand-mediated signaling with different pathways from those used in the established protocol. A resulting optimized protocol using these predictions generates cultures containing up to about 75% human stem cells.”

The paper noted that these stem cells were shown to be functional when they were used to regenerate injured muscles in mice and restore force production as compared to uninjected controls. “This work,” the paper added, “demonstrates the utility of digital twins for optimization and establishes a working cell replacement therapy protocol for muscle stem cell therapy with potential application to diseases such as Duchenne muscular dystrophy.” Several other somite-derived cell types (and the conditions that they could be used to treat) have been listed by Somite. They include brown adipocytes (metabolic disease), tendon cells and chondrocytes (connective tissue syndromes), and dermis cells (severe burns). The company recently announced that it had raised $5.3 million in preseed funding.

Creating a digital twin ecosystem

Whereas a digital twin can represent biological reality at the level of molecules, cells, tissues, organs, patients, or populations, an ecosystem of connected digital twins can transcend scale, capturing interactions and chains of contingency that range, for example, from the genomic to the phenotypic. So, if a digital twin can present an uncanny double, a digital twin ecosystem can present an alternative, parallel universe. Unlike most of the alternative universes of science fiction, digital twin ecosystems are benign.

This illustration of a cancer patient’s digital twin depicts the bidirectional feedback flow between the real and the virtual. Data from the patient are used to update the virtual models, and the virtual models are used in turn to inform treatment planning, which typically involves a human in a decision-making role. That is, the digital twin provides decision support. A human may also play a crucial role in designing, managing, and operating elements of the digital twin, including selecting sensors and data sources, managing the models underlying the virtual representation, and implementing algorithms and analytics tools. [National Academy of Sciences]

Digital twin ecosystems have been the subject of recent workshops organized by the National Academy of Sciences. At these workshops, the participants suggested that digital twin ecosystems could model multidimensional and multiscale biological complexity while overcoming bioinformatic challenges such as dark data and hidden biases. The participants also acknowledged that digital twin ecosystems would have to deploy a “combination of data-driven and mechanistic models,” and develop new techniques to “harmonize, aggregate, and assimilate heterogeneous data.” Nonetheless, the participants expressed optimism that new digital twin technology would emerge from “ideas and principles drawn from an understanding of life, rather than on direct harnessing of life’s mechanisms or hardware.”

This optimism appears warranted in light of a recent announcement from Genzeva, a molecular diagnostic laboratory, and Rylti, a provider of advanced analytics solutions. These companies participated in a study that described how “an innovative groundbreaking application of AI Knowledge Engineering and use of a biomimetic digital twin ecosystem for advanced genomic research” can help researchers “understand the mechanisms of disease.” The disease that was the subject of this particular study was endometriosis (J. Mol. Diagn. 2024; 26: 543–551).

Patient samples and matched controls were sequenced (with Illumina’s NovaSeq 6000, and with a DRAGEN pipeline for secondary analysis), and gene-disease associations were uncovered (with QIAGEN’s Clinical Insight Interpret platform for phenotype-driven analysis). Then, a digital twin ecosystem was created by uploading all patient metadata, medical history, pathology reports, and transcriptomics.

“Clinical exome sequencing study on patients with endometriosis indicated four variants of unknown clinical significance potentially associated with endometriosis-related disorders in nearly all patients analyzed,” the study’s authors reported. “One variant of unknown clinical significance was identified in all patient samples and could be a biomarker for diagnostics.”

Fittingly, the study encompassed multiple dualities. Less a mirror than a kaleidoscope, the study revolved around terms that could be interpreted in different ways. For example, “ecosystem” could be understood to encompass a group of biological entities, a group of digital twins, or a group of technology platforms. Similarly, “biomimetic” could refer to the emulation of biological processes, or of human thought, that is, of human-style prioritization processes. The biomimetic AI engine used in the study, Rylti’s RKE platform, was designed using ideas and principles drawn from research in cognitive science, neuroscience, and psychology.

Multiomics technologies were used in concert with a biomimetic digital twin ecosystem to uncover new DNA variants associated with the development of endometriosis-related disorders. In this work, which was accomplished by scientists affiliated with Genzeva, LumaGene, Rylti, Brigham and Women’s Hospital of Harvard University, and QIAGEN Digital Insights, patient and matched control samples underwent exome sequencing, secondary analysis, and tertiary analysis. All DNA variants and phenotype-ranked variants were exported to the digital twin ecosystem’s data lake. This image, which is shown here under CC BY 4.0, originally appeared in the Journal of Molecular Diagnostics (2024; 26: 543–551).