Sagar Anisingaraju, Chief Strategy Officer, Saama Technologies

Applying Machine Learning and Robotic Process Automation to Plan Clinical Trials

Randomized clinical trials (RCTs) are the “Gold Standard” for testing new therapeutics for safety and efficacy in human subjects. However, their success rates range between 40–80% across phases.1 Significant failure rates can be attributed to patient recruitment, which is influenced by a number of factors.2 About 90% of successful trials are delayed by at least six weeks because enrollment timelines are not met.1 At today’s high cost of conducting RCTs, extending a study timeline by as little as a month can result in significant budget overruns, not to mention the potential revenue and opportunity loss from delayed drug commercialization. The number of patients willing to participate in a study and patient retention3 in a clinical trial are, therefore, pivotal factors that determine the likelihood of timely and successful completion of any study. Other factors that impact RCTs include the availability of principal investigators (PIs) and clinical trial sites, protocol complexity, and study design.

How to challenge the status quo to move the needle on clinical trial operations?

The pharma industry is actively beginning to explore continuous learning models. These models benefit from understanding the data that are integrated into clinical-trial processes. The entire RCT lifecycle from the study design to study start to study close is a complex data environment riddled with numerous clinical, technical, business, and compliance processes. Applying machine learning (ML) and possibly robotic process automation (RPA) methods to optimize these processes is going to be disruptive, but should improve the efficiency and success rates of clinical trials.

Let’s take a look at a few critical business endpoints that can be optimized by ML and RPA. 

1. Patient matching: the holy grail of successful trials
How can ML help in identifying, screening, and enrolling the right patient population?
To start with, we establish an eligible patient population using dynamic inclusion and exclusion criteria to evaluate the impact of each criterion on the suitable patient cohort. This allows a study designer to know if there are any modifications required to the criteria to ensure a larger eligible patient population.
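As a rough illustration of evaluating the impact of each criterion, the sketch below drops one criterion at a time and counts how many additional patients become eligible. The patient records, field names, and thresholds are hypothetical and chosen only for the example; a real study would draw them from the protocol and from governed clinical data.

```python
# Hypothetical patient records; fields and values are illustrative only.
patients = [
    {"id": 1, "age": 34, "egfr": 95, "prior_chemo": False},
    {"id": 2, "age": 71, "egfr": 55, "prior_chemo": False},
    {"id": 3, "age": 58, "egfr": 80, "prior_chemo": True},
    {"id": 4, "age": 45, "egfr": 62, "prior_chemo": False},
    {"id": 5, "age": 66, "egfr": 48, "prior_chemo": True},
]

# Each criterion is a named predicate that returns True when a patient passes.
# Exclusion criteria are written as "not excluded" so all rules share one form.
criteria = {
    "age 18-65": lambda p: 18 <= p["age"] <= 65,
    "eGFR >= 60": lambda p: p["egfr"] >= 60,
    "no prior chemotherapy": lambda p: not p["prior_chemo"],
}


def eligible(patients, criteria):
    """Return the patients who pass every criterion."""
    return [p for p in patients if all(rule(p) for rule in criteria.values())]


def criterion_impact(patients, criteria):
    """Count the patients gained when each criterion alone is dropped."""
    baseline = len(eligible(patients, criteria))
    impact = {}
    for name in criteria:
        relaxed = {k: v for k, v in criteria.items() if k != name}
        impact[name] = len(eligible(patients, relaxed)) - baseline
    return baseline, impact


baseline, impact = criterion_impact(patients, criteria)
print("eligible with all criteria:", baseline)
for name, gained in sorted(impact.items(), key=lambda kv: -kv[1]):
    print(f"dropping '{name}' adds {gained} patient(s)")
```

A criterion whose removal adds many patients is a candidate for review by the study designer; whether it can actually be relaxed remains a clinical decision, not a computational one.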

This process can be automated by using ML and RPA. Studies indicate that ML reduces manual intervention by almost 85% in screening patients per site.4,5 This equates to significant time and cost savings, especially for large, multicenter trials. Moreover, recruitment outcomes can be fed back into these models and the RPA execution so that patient matching improves with each subsequent study.

From historical data analysis, we can look at the underlying root causes of patient acceptance and decline rates for trials. Analyzing and understanding social media and patient discussions on forums provides additional leading indicators of potential study issues. This analysis will also help in identifying clusters of patient conversations so that recruitment drives can be mapped to those interactions. Objective and subjective data such as age, race, socioeconomic levels, family, influencers, seasonality, etc. can be stack-ranked for “similar” studies from historical learning. This functionality will enable us to predict the likelihood of meeting the required recruitment number at sites and to find the best match to make the study a success in terms of time and budget. These efforts would also lead us to ML-assisted patient enrollment by better identifying and mapping physicians and patients.

ML techniques such as association rules and decision trees will help find trends associated with patient acceptance, adherence, and other metrics. RPA bots can then help speed recruitment by executing initial interactions with prospective subjects before final follow-up by clinical associates. The ultimate goal is to influence patient matching and recruitment strategies to increase clinical participation and success rate.
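To make the association-rule idea concrete, here is a minimal sketch that scores a single rule of the form "patient attributes imply acceptance" by its support, confidence, and lift. The records and attribute names (`near_site`, `referred_by_physician`, `accepted`) are invented for illustration; this is not a full rule-mining algorithm such as Apriori, only the scoring step for one candidate rule.

```python
# Hypothetical screening outcomes: each record is the set of attributes
# observed for one approached patient. Names and data are illustrative only.
records = [
    {"near_site", "referred_by_physician", "accepted"},
    {"near_site", "accepted"},
    {"referred_by_physician", "accepted"},
    {"near_site", "referred_by_physician", "accepted"},
    {"near_site"},
    {"referred_by_physician"},
    set(),
    {"near_site", "referred_by_physician", "accepted"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    return sum(itemset <= r for r in records) / len(records)


def rule_metrics(antecedent, consequent, records):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    s_ante = support(antecedent, records)
    s_cons = support(consequent, records)
    if s_ante == 0 or s_cons == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    s_both = support(antecedent | consequent, records)
    confidence = s_both / s_ante   # P(consequent | antecedent)
    lift = confidence / s_cons     # > 1 suggests a positive association
    return {"support": s_both, "confidence": confidence, "lift": lift}


metrics = rule_metrics({"near_site", "referred_by_physician"}, {"accepted"}, records)
print(metrics)
```

A lift above 1 only indicates that the attributes and acceptance co-occur more often than chance in the sample; with small or biased historical data it should be read as a lead to investigate rather than a finding.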

2. Site selection: the known unknown
Researchers start with a known set of clinical trial sites that have relevant experience in the therapeutic area of a proposed study. A learning model can then be built by analyzing site-specific failures and by ranking them in the order of relevance to the specific target study being planned. This learning model can predict the success probability of a specific site for the target study. Factoring in other trials that may compete for patients at the same sites allows investigators to compute the overall probability of success in patient recruitment at each site.
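One simple way to picture the effect of competing trials is a binomial model, sketched below. It is a stand-in for the learned model described above, not that model itself, and it rests on two loud assumptions: each patient in a site's pool enrolls independently with a fixed rate, and competing trials split patient interest evenly. The site names, pool sizes, and rates are hypothetical.

```python
from math import comb


def prob_meet_target(pool_size, target, enroll_rate, competing_trials=0):
    """P(at least `target` of `pool_size` patients enroll) under a binomial model.

    Assumption (illustrative only): competing trials dilute the per-patient
    enrollment rate evenly, so the effective rate is rate / (1 + competitors).
    """
    p = enroll_rate / (1 + competing_trials)
    return sum(
        comb(pool_size, k) * p**k * (1 - p) ** (pool_size - k)
        for k in range(target, pool_size + 1)
    )


# Hypothetical candidate sites for one target study.
sites = [
    {"name": "Site A", "pool_size": 40, "enroll_rate": 0.30, "competing_trials": 1},
    {"name": "Site B", "pool_size": 25, "enroll_rate": 0.35, "competing_trials": 0},
    {"name": "Site C", "pool_size": 60, "enroll_rate": 0.20, "competing_trials": 2},
]
TARGET_PER_SITE = 8  # hypothetical enrollment target

# Rank sites from most to least likely to reach the target.
ranked = sorted(
    sites,
    key=lambda s: prob_meet_target(
        s["pool_size"], TARGET_PER_SITE, s["enroll_rate"], s["competing_trials"]
    ),
    reverse=True,
)
for s in ranked:
    prob = prob_meet_target(
        s["pool_size"], TARGET_PER_SITE, s["enroll_rate"], s["competing_trials"]
    )
    print(f"{s['name']}: {prob:.2f}")
```

In practice the per-site enrollment rate would itself be a model output learned from historical site performance, and the even-split assumption would be replaced by whatever the data supports.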

3. Principal investigators: the human elements
There is a finite set of individuals from which to choose PIs and patients’ treating physicians. Advanced techniques help in analyzing availability, relevant training, and prior experience for the target study within the needed therapeutic area, which can be used to build algorithmic models to predict investigator failures and delays. Understanding and connecting with PIs on focused social forums might give additional data about PIs.

4. Competition: the race for dominance
ML models can analyze the complex competitive landscape for the proposed investigational new product based on several clinical and commercial dimensions. Models built along the lines described in the preceding sections can be run on publicly available drug data to build probabilistic predictors of the product’s competitive success.

5. What is the financial outcome?
Operational and financial performance data of past trials can be used to build a financial model for evaluating the potential costs and timelines of future trials. This model can analyze scenarios to predict the dollar outcomes for predicted delays and failures of target studies during the next five years.
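The scenario analysis can be pictured as an expected-cost calculation over delay scenarios. The sketch below is only that arithmetic, with made-up probabilities and a made-up monthly cost; in a real financial model the scenario probabilities would come from the predictive models and the cost figures from historical trial data.

```python
def expected_delay_cost(scenarios, monthly_cost):
    """Expected cost of delay across scenarios.

    `scenarios` is a list of (probability, delay_in_months) pairs whose
    probabilities must sum to 1; `monthly_cost` is the cost of one month
    of delay. Both inputs are assumptions supplied by the caller.
    """
    total_prob = sum(p for p, _ in scenarios)
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(p * months * monthly_cost for p, months in scenarios)


# Hypothetical scenarios for one target study: (probability, months of delay).
scenarios = [
    (0.5, 0),  # on time
    (0.3, 2),  # moderate delay
    (0.2, 6),  # severe delay
]
MONTHLY_COST = 1_000_000  # hypothetical cost per month of delay, in dollars

cost = expected_delay_cost(scenarios, MONTHLY_COST)
print(f"expected delay cost: ${cost:,.0f}")
```

Here the expected delay is 0.3 × 2 + 0.2 × 6 = 1.8 months, so the expected cost is 1.8 times the monthly figure. Repeating the calculation per study and per year gives a portfolio view over a planning horizon.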

A recent McKinsey Global Institute (MGI) report,6 “The Age of Analytics: Competing in a Data-Driven World,” explains the role of analytics for enhanced decision making, disruptive business models, and organizational challenges. The MGI report highlights several pharma business-use cases where ML will have a high impact across industries. It is no surprise that amongst MGI’s survey responses, the highest ranked use case is “Optimize design of clinical trials, including label writing and patient selection”.

Building predictive models is an iterative process. They will improve in quality and prediction accuracy as more data is processed and analyzed. One key byproduct of this ML and RPA approach is the identification of repetitive processes and early indication of failure points. A properly designed change-management approach can take the machine-assisted learnings and dramatically alter the existing business processes to optimize behaviors. A combination of cognitive learning, RPA, and properly executed change management is the dose that pharma needs to minimize time delays and reduce cost overruns of RCTs. 

How to implement these models?

As explained earlier, the entire RCT lifecycle is a complex data supply chain with data coming from a variety of providers and ecosystem partners. Conducting specific and targeted ML modeling experiments will need access to clean, orchestrated, and governed data for meaningful results. The pharmaceutical industry should address this in two steps. The first is to conduct these experiments in a clinical innovation lab environment; the second is to use a cloud environment in which models can be deployed easily and data from multiple vendors can be processed. Clinical Data as a Service (CDaaS) with an orchestration and governance model would be the long-term solution to seek.


Applying ML and RPA methods to optimize clinical operation processes is disruptive. For the transformational efficiencies that the pharmaceutical industry needs, it is a great first step. Apart from the financial benefits, if ML helps in optimizing clinical trials and helps the industry accelerate the journey of a critical drug from lab to market-shelf by weeks, months, or years, there are patients out there who could benefit. That alone is enough justification for the industry to explore and invest in this emerging science.

1. Translational research: 4 ways to fix the clinical trial
2. Recruitment Challenges in Clinical Trials for Different Diseases and Conditions
3. More Patient Interest in Clinical Trials an Opportunity to Boost Enrollment
4. Increasing the efficiency of trial-patient matching: automated clinical trial eligibility pre-screening for pediatric oncology patients
5. Automated clinical trial eligibility prescreening: increasing the efficiency of patient identification for clinical trials in the emergency department
6. The age of analytics: Competing in a data-driven world
