Key challenges accompanied the emergence of systems biology. First, although genome sequencing and DNA microarray technologies were readily available, their cost at the time was extremely high, limiting the amount of quality data available to build and validate models. Second, available computing power was insufficient to churn through all the omics data being produced and reconstruct the underlying biological system.
Over time, technological advances made DNA sequencing dramatically faster and less expensive. In parallel, advances in supercomputing began to give researchers the processing capacity they needed to extract knowledge from the crush of data in a timely manner.
But major stumbling blocks to a holistic view of the system remained. Newly developed mechanistic models were being applied to data, and while they could put some of that data into context, the approach was (and remains) limited because such models are built on known biology, which covers perhaps at most 5–10% of the potentially billions of interactions among our roughly 25,000 genes and hundreds of thousands of proteins.
The ability of these models to create meaningful simulations and enable new discoveries was further limited by their reliance on literature-based knowledge rather than raw data, which inherently biases their results; assessing data with such incomplete models undermines both meaningful simulation and accurate prediction.
Furthermore, the development of these mechanistic models itself poses a significant challenge to researchers: it is a painstakingly slow, manual process, often requiring months or even years.
A data-driven, reverse-engineering approach to modeling holds significant promise for bringing greater predictive power to systems biology and drug development. Starting with raw data, this new approach enables the creation of unbiased in silico “blueprints” of disease states and normal human biology that allow researchers to understand the driving perturbations underlying disease and ways of countering those perturbations.
Gene Network Sciences (GNS) has developed a Reverse Engineering/Forward Simulation platform (REFS™, Figure) that rapidly conducts billions of calculations on raw biological data (genetic and epigenetic, molecular profiling, and clinical) to reveal how the biomolecules interact with one another in the complete biological system.
Using IBM Blue Gene supercomputers with more than 30,000 processors, the REFS platform produces a blueprint that defines the most likely causal network connections among all of the measured variables and that best describes the underlying system giving rise to the raw biological data.
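The REFS platform itself is proprietary, but the core idea of scoring candidate causal structures against raw data and keeping the most likely one can be illustrated in miniature. The sketch below, a toy assumption rather than GNS's actual algorithm, compares two candidate structures for a pair of synthetic gene-expression variables using a Bayesian Information Criterion (BIC) score over simple Gaussian linear models:

```python
import random
import math

# Toy illustration (not the REFS algorithm): score two candidate causal
# structures against synthetic "raw data" and keep the higher-scoring one.

random.seed(0)

# Synthetic data: gene B's expression is driven by gene A plus noise.
n = 200
a = [random.gauss(0.0, 1.0) for _ in range(n)]
b = [2.0 * x + random.gauss(0.0, 0.5) for x in a]

def gauss_log_lik(residuals):
    """Log-likelihood of residuals under a fitted zero-mean Gaussian."""
    m = len(residuals)
    var = sum(r * r for r in residuals) / m
    return -0.5 * m * (math.log(2 * math.pi * var) + 1)

def bic(log_lik, k, m):
    """BIC score: likelihood penalized by the number of parameters k."""
    return log_lik - 0.5 * k * math.log(m)

# Candidate 1: A and B independent (model B by its mean alone).
mean_b = sum(b) / n
ll_indep = gauss_log_lik([y - mean_b for y in b])
score_indep = bic(ll_indep, k=2, m=n)  # mean + variance

# Candidate 2: A -> B (model B as a linear function of A).
mean_a = sum(a) / n
cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
var_a = sum((x - mean_a) ** 2 for x in a) / n
slope = cov / var_a
intercept = mean_b - slope * mean_a
ll_edge = gauss_log_lik([y - (slope * x + intercept) for x, y in zip(a, b)])
score_edge = bic(ll_edge, k=3, m=n)  # slope + intercept + variance

best = "A->B" if score_edge > score_indep else "independent"
print(best)  # the data-supported structure wins
```

Scaled up to thousands of variables and an ensemble of candidate networks, this kind of data-driven structure scoring is what demands supercomputing resources.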
In a recent project with Johnson & Johnson, the REFS platform identified seven new genes predicted to play a role in the efficacy of a cancer drug in development. siRNA knockdown experiments subsequently validated six of the seven predictions, establishing these biomarkers as key to stratifying patient responders from nonresponders.
REFS-based models can also be queried through in silico experiments to discover the highest impact molecular targets for the disease being studied, markers related to specific drug treatments, and the effects of drug dosages or combinations at different time points. Such results, obtained in hours or days rather than months or years, can be validated easily at the lab bench.
Another application of the REFS platform is the evaluation of SNP and gene-expression data in conjunction with Phase II clinical outcomes data for the purpose of reverse engineering a model of disease in patient cohorts. REFS-based simulations can be used to identify key molecules responsible for nonresponse in cohorts. These markers can then be used to stratify patient populations for Phase III trials.
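The stratification step can be sketched as follows. Everything here, patient IDs, marker levels, and the threshold rule, is a hypothetical illustration of using a Phase II-derived marker to enrich a Phase III cohort, not GNS's procedure:

```python
# Hypothetical Phase II observations: (marker level, responded to drug?).
phase2 = [
    (0.2, False), (0.3, False), (0.4, False),
    (1.1, True), (1.3, True), (1.6, True),
]

# Illustrative rule: place the cutoff midway between the highest
# nonresponder and the lowest responder marker level seen in Phase II.
hi_nonresp = max(level for level, responded in phase2 if not responded)
lo_resp = min(level for level, responded in phase2 if responded)
threshold = (hi_nonresp + lo_resp) / 2

# Stratify a hypothetical Phase III cohort by predicted response.
phase3 = {"P1": 0.25, "P2": 1.4, "P3": 0.9, "P4": 1.2}
likely_responders = sorted(p for p, m in phase3.items() if m >= threshold)
print(threshold, likely_responders)
```

Enrolling only the predicted responders (or analyzing the strata separately) is what gives a marker-guided Phase III trial its improved chance of showing efficacy.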