Merrilyn Datta, Ph.D. Definiens

Datafication could change healthcare.

Big data and personalized medicine have been industry buzzwords for quite some time. While it’s widely recognized that the two are connected, many organizations still struggle to leverage massive amounts of data effectively to improve efficiency, reduce costs, and advance patient-centric treatments.

With healthcare costs in the U.S. increasing steadily over the last 20 years to 17% of GDP, healthcare experts are looking for every path possible for efficiency and reform. Many believe that a long-term source of savings could be the use of big data in healthcare; in fact, the McKinsey Global Institute estimates that applying big data strategies to better inform decision making in U.S. healthcare could generate up to $100 billion in value annually.

The creation of this value lies in collecting, combining, and analyzing clinical data, claims data, and pharmaceutical R&D data to be able to assess and predict the most efficacious treatment for an individual patient. Many have envisioned this as a physician’s portal, which would enable clinicians to query similar patients and see what treatments worked for others, and thereby more effectively choose the best treatment option. Before the true realization of big data integration in healthcare, however, three key areas will need to be addressed that will both enable this vision and create value from big data strategies.

#1. R&D Data Integration for Collaboration

The ability to manage, integrate, and link data across R&D stages in pharma might enable comprehensive data search and mining that identify better leads, related applications, and potential safety issues. However, the practicalities of dealing with big data are substantial. The amount of data generated is growing significantly: In 2007 a single next-generation sequencing machine run could produce a maximum of one gigabyte of data, but by 2011 a run could produce nearly a terabyte, a roughly 1,000-fold increase. The sequence becomes much more useful when it is correlated with phenotypes and other types of data. This has naturally affected the way companies think about data storage and structure, with centralized data repositories and cloud solutions becoming more popular. The two leaders in next-gen sequencing technologies, Illumina and Life Technologies, both now offer cloud solutions for data storage and analysis to meet this growing need.

No less important is the enormous opportunity in data from images. A few years ago, high-content analysis drove the need for simple storage solutions. Today, digital pathology is leading the way with pioneering solutions for datafication of tissue, so that it can be mined and correlated with other types of data such as clinical outcomes or genomic data. While in the past it was impossible to effectively mine image data, researchers such as Andy Beck at Harvard have used image analysis solutions to analyze thousands of image features to discover new biomarkers that correlate with clinical outcomes.

In the case of both next-generation sequencing and image analysis, the most value is achieved when researchers are able not only to merge different datasets, but also to conduct advanced analytics and correlations between data types. For example, advanced statistics might show that when a particular gene of interest is mutated, the level of its phenotypic effect can be correlated with a particular tissue marker, and conclusions might be drawn about its increased effectiveness or safety. Or it might show that a particular tissue lesion is always associated with a safety risk when statistics are performed across lead compound studies. It is at this level of integration and correlation that big data will provide the most benefit.
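To make the idea of merging datasets and correlating across data types concrete, here is a minimal Python sketch. The sample IDs, field names (`phenotype_score`, `marker_intensity`), and values are all invented for illustration; a real pipeline would draw on actual sequencing and image analysis outputs and would need far more careful statistics.

```python
from math import sqrt

# Hypothetical per-sample records from two separate systems, keyed by sample ID.
# All IDs, field names, and values are made up for illustration.
genomic = {
    "S1": {"phenotype_score": 0.10},
    "S2": {"phenotype_score": 0.35},
    "S3": {"phenotype_score": 0.50},
    "S4": {"phenotype_score": 0.80},
    "S5": {"phenotype_score": 0.95},  # no matching image record
}
tissue = {
    "S1": {"marker_intensity": 12.0},
    "S2": {"marker_intensity": 30.0},
    "S3": {"marker_intensity": 41.0},
    "S4": {"marker_intensity": 77.0},
}


def merge_on_sample_id(left, right):
    """Inner join: keep only samples present in both datasets."""
    return {sid: {**left[sid], **right[sid]} for sid in left if sid in right}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


merged = merge_on_sample_id(genomic, tissue)
r = pearson(
    [rec["phenotype_score"] for rec in merged.values()],
    [rec["marker_intensity"] for rec in merged.values()],
)
print(f"{len(merged)} linked samples, correlation r = {r:.2f}")
```

The join step is where most of the practical difficulty lies: the correlation is only possible once both systems agree on a shared sample identifier.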

#2. More Efficient and Effective Clinical Trials

Clinical trials are of course necessary for every drug to get to market, and the gold standard is currently a randomized clinical trial backed up by a published paper. However, drugs like Herceptin have shown us that the addition of a companion diagnostic test can promise efficacy to a higher percentage of patients, since we can determine which patients will best respond to treatment. In the clinical trial of the future, big data could enable enrollees to be not only monitored for response but also tracked to see if specific subgroups respond differently. Big data approaches are complementary to traditional clinical trials because they provide the ability to analyze population variability and to conduct analytics in real time. Imagine being able to create dynamic sample size estimations in response to emerging clinical trial data. The key to the value of big data is the ability to enable improvements in clinical trial design that will allow shorter and more efficient trials.
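As a rough illustration of what a dynamic sample size estimation could look like, the sketch below recomputes the required enrollment per arm as interim response rates come in. It uses the standard normal-approximation formula for comparing two proportions; the response rates and interim counts are hypothetical, and a real adaptive trial would follow pre-specified rules to protect its error rates.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Patients needed per arm to detect a difference between two response
    rates (two-sided test, normal approximation, unpooled variance)."""
    if p_control == p_treatment:
        raise ValueError("response rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Planning assumption (hypothetical): 30% response on control, 50% on treatment.
planned = sample_size_per_arm(0.30, 0.50)

# Interim data (hypothetical): responders / enrolled in each arm so far.
control_responders, control_enrolled = 18, 60
treatment_responders, treatment_enrolled = 27, 60

# Re-estimate using the observed rates instead of the planning assumption.
updated = sample_size_per_arm(
    control_responders / control_enrolled,
    treatment_responders / treatment_enrolled,
)
print(f"planned per arm: {planned}, re-estimated per arm: {updated}")
```

In this made-up scenario the observed treatment effect is smaller than planned, so the re-estimated enrollment rises; a larger observed effect would shrink it.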

#3. Correlations to Clinical Outcomes

Once R&D data and clinical trial data are indexed for big data analysis, the third piece of the big data puzzle is to fully leverage clinical routine data. Ideally, big data approaches can be used to combine patient diagnostic profiles, claims, and clinical outcomes to understand which treatments are most effective. In some sense this will drive new thinking about the standard of care, as providers will be able to understand what drives quality of care, cost, and outcomes in healthcare. Ultimately, personalized medicine is about this correlation of diagnostics and outcomes, tailored to each and every patient.
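A toy version of that combination might look like the following Python sketch, which links diagnostic profiles, claims, and outcomes by patient ID and reports the response rate for each diagnostic group and treatment. The patient IDs, marker labels, drug names, and outcomes are all invented; the sample is far too small to support any real conclusion.

```python
from collections import defaultdict

# Three hypothetical sources, each keyed by patient ID (all values invented).
profiles = {  # diagnostic profile per patient
    "P1": "marker_positive", "P2": "marker_positive", "P3": "marker_positive",
    "P4": "marker_negative", "P5": "marker_negative", "P6": "marker_negative",
    "P7": "marker_positive",
}
claims = [  # (patient ID, treatment billed)
    ("P1", "drug_a"), ("P2", "drug_a"), ("P3", "drug_b"),
    ("P4", "drug_a"), ("P5", "drug_b"), ("P6", "drug_a"),
    ("P7", "drug_a"),
]
outcomes = {  # True if the patient responded; P7 has no recorded outcome
    "P1": True, "P2": True, "P3": False,
    "P4": False, "P5": True, "P6": False,
}


def response_rates(profiles, claims, outcomes):
    """Response rate for each (diagnostic profile, treatment) pair,
    using only patients found in all three sources."""
    counts = defaultdict(lambda: [0, 0])  # key -> [responders, total]
    for patient_id, treatment in claims:
        if patient_id not in profiles or patient_id not in outcomes:
            continue  # cannot link this claim across all sources
        key = (profiles[patient_id], treatment)
        counts[key][0] += int(outcomes[patient_id])
        counts[key][1] += 1
    return {key: resp / total for key, (resp, total) in counts.items()}


rates = response_rates(profiles, claims, outcomes)
for (profile, treatment), rate in sorted(rates.items()):
    print(f"{profile:16} {treatment}: {rate:.0%} responded")
```

Note that the claim for patient P7 is silently dropped because no outcome was recorded, which is exactly the kind of gap that disconnected systems create.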


While big data has already been used successfully in consumer markets, challenges remain to its implementation in healthcare. The primary challenge in moving to big data approaches is simply the vast amount of data held in existing systems that currently don’t “talk” to one another and that store data in different file types. One of the reasons that image datafication is an important key to the future is that it will enable images to be linked to other types of data for mining.

The second challenge for data in the clinical space is how to store and share these large amounts of data while maintaining standards for patient privacy. While many institutions are still struggling with these challenges, some, such as Beth Israel Deaconess Medical Center, have taken the plunge, not only by setting up a private cloud to HIPAA standards, but also by moving into next-level analytics that enable doctors to query patient populations for clinical trials.

Achieving better outcomes at lower costs has become imperative for healthcare, and big data is certainly part of the solution in reaching that goal. Although we are in the early days of healthcare big data, it is clear that strategies for big data in the integration of R&D data, efficient clinical trials, and finally in clinical outcomes are foundational to building that solution, lowering costs, and realizing the future of personalized medicine.

Merrilyn Datta, Ph.D., is CMO of Definiens.
