Intelligent use of large-scale data has become fundamental to other industries: finance, insurance, even sports. In pharma, data analytics already matters in some areas of research, but it has the potential to do much more if applied across the whole enterprise.
In the past 15 years, biology has been transformed by the availability of large-scale genetic and genomic data. The first ten years of work on the human genome yielded one draft genome. The last ten years have yielded over ten thousand. Advances in technology have enabled high-throughput gene expression profiling, cancer genome analysis, and other disciplines to change the way biology is studied. Cheminformatics allows companies like Numerate to screen millions of compounds for activity by purely computational prediction. However, there are much greater opportunities for data-driven transformation across the broader pharmaceutical enterprise.
These opportunities arise, in part, because of the broad trend toward data being tracked and recorded in new and far-reaching ways. Importantly, many of these data sources lie outside the pharma industry. Medical records are collected electronically on an unprecedented scale, driven in part by federal “meaningful use” programs. These records reveal how diseases manifest and how treatments are used in the real world. Social media also contains vast amounts of information on real patient experiences with both diseases and treatments. And in the sales and marketing of drugs, reps collect data on program effectiveness in real time, and companies like Aktana are interpreting it to understand where physicians perceive value.
Getting data is only the first step. The true value arises from analytics that generate actionable insights. In many cases, this means predictive modeling: developing algorithms that reveal what drives an outcome of interest (such as response to therapy or drug choice) and allow that outcome to be predicted in the future. The data scientists who can carry out this type of analysis are multidisciplinary experts with skills from statistics, computer science, biology, chemistry and other fields, and they are highly sought after.
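To make the idea of predictive modeling concrete, here is a minimal sketch, not a real analysis. It assumes a handful of invented patient records with two hypothetical features (`age` and `biomarker`) and a yes/no outcome (response to therapy). It fits the simplest possible model, a single-feature threshold rule, which both points to the feature that appears to drive the outcome and gives a way to predict that outcome for a new record. Every name and number is illustrative only.

```python
from typing import Dict, List, Tuple

# One record = (feature values, outcome), where outcome 1 means "responded".
Record = Tuple[Dict[str, float], int]

# Hypothetical toy data, invented purely for illustration.
TOY_RECORDS: List[Record] = [
    ({"age": 40, "biomarker": 0.2}, 0),
    ({"age": 62, "biomarker": 0.9}, 1),
    ({"age": 55, "biomarker": 0.3}, 0),
    ({"age": 47, "biomarker": 0.8}, 1),
    ({"age": 70, "biomarker": 0.1}, 0),
    ({"age": 35, "biomarker": 0.7}, 1),
]


def fit_stump(records: List[Record]) -> Tuple[str, float, float]:
    """Find the best rule of the form 'predict 1 when feature > threshold'.

    Returns (feature name, threshold, accuracy on the given records).
    """
    best: Tuple[str, float, float] = ("", 0.0, -1.0)
    for name in records[0][0]:
        # Try every observed value of this feature as a candidate threshold.
        for row, _ in records:
            threshold = row[name]
            correct = sum(
                int((r[name] > threshold) == bool(y)) for r, y in records
            )
            accuracy = correct / len(records)
            if accuracy > best[2]:
                best = (name, threshold, accuracy)
    return best


def predict(stump: Tuple[str, float, float], row: Dict[str, float]) -> int:
    """Apply a fitted rule to a new record."""
    name, threshold, _ = stump
    return int(row[name] > threshold)


stump = fit_stump(TOY_RECORDS)
new_patient = {"age": 50, "biomarker": 0.85}
print(stump[0], predict(stump, new_patient))
```

Real predictive models use far more data, more flexible algorithms, and held-out data to check that a rule generalizes rather than merely fitting the records it was built from.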
The conventional ways to engage data analysts involve building internal teams of scientists or buying time from consultants. However, data analytics is also particularly well-suited to crowdsourcing, which opens up a problem for many people to address. It’s inevitable that most of the world’s experts in any domain are outside any single pharma company. Even a company with strong internal teams faces the point Bill Joy of Sun Microsystems insightfully made: “Most of the smartest people work for someone else.” Crowdsourcing allows those people to be tapped in a highly flexible and cost-effective manner. A team of experts can coalesce around a problem, work on it only as long as necessary, and then move on.
In 2006, Netflix used crowdsourcing to improve its ability to suggest movies to its customers. Rather than just inviting people to work in isolation, the company set up an online competition in which people submitted entries in real time and vied to come up with the best solution. This is a particularly effective approach to predictive modeling. Seeing their rivals above them on a leaderboard drives people to continuously generate better results. In the Netflix competition, the company’s internal method was surpassed within six days, and the eventual winner improved on it by more than 10%.
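The leaderboard mechanism is simple to sketch. The example below is an assumption-laden illustration, not any competition's actual scoring code: it assumes entries are scored by root-mean-square error (RMSE) against held-out true values, ranks the entries from lowest error to highest, and expresses an entry's gain as a percentage improvement over a baseline. The team names and numbers are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error between predictions and true values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def leaderboard(
    entries: Dict[str, Sequence[float]], actual: Sequence[float]
) -> List[Tuple[str, float]]:
    """Score every entry and sort so the lowest error ranks first."""
    scores = [(team, rmse(preds, actual)) for team, preds in entries.items()]
    return sorted(scores, key=lambda item: item[1])


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Fraction by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical held-out ratings and two made-up submissions.
actual_ratings = [3.0, 4.0, 5.0]
submissions = {
    "baseline": [4.0, 5.0, 6.0],
    "team_a": [3.0, 4.0, 4.0],
}

board = leaderboard(submissions, actual_ratings)
baseline_score = dict(board)["baseline"]
for team, score in board:
    gain = relative_improvement(baseline_score, score)
    print(f"{team}: RMSE {score:.3f}, improvement {gain:.1%}")
```

Because every submission is scored against the same hidden values with the same metric, participants can see exactly how far they stand from the top, which is what makes the leaderboard such a strong motivator.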
The competition approach is equally applicable to pharmaceutical industry problems. For example, Boehringer Ingelheim sponsored a contest to develop methods for predicting small-molecule safety, and the result was a 25% improvement over an industry-standard approach. In the Heritage Health Prize competition, methods are being developed to predict which patients will require hospitalization, and for how long, over the next twelve months. Other competitions have been used to predict patient outcomes, sales patterns, and clinical outcomes. In each case, the results were better than any method that had previously existed.