Diana Kwon, Contributing Editor
Looking beyond the Hype of Machine Learning in Medicine
Artificial intelligence (AI) is starting to make its mark on medicine. By capturing information from a variety of sources, including electronic health records, genomics, and wearable devices, scientists and engineers hope to use machine learning (ML) to aid diagnoses and predict patient outcomes. Some have already begun to demonstrate the potential of this technology. For example, a team at Google recently reported a deep learning algorithm that detects diabetic retinopathy with a high level of specificity and sensitivity. Other groups have developed AI programs to accurately predict breast cancer risk, identify colonic polyps, and classify lung cancers based on prognosis.
Despite its potential, this technology is far from perfect. For instance, Google Flu Trends—an algorithm that uses internet searches to estimate flu prevalence—vastly overestimated flu prevalence in the U.S. in 2013 despite an otherwise good track record. Earlier this year, an investigation by STAT News found that IBM’s Watson for Oncology was struggling to live up to expectations.
As AI further penetrates the field of medicine, some are concerned about unexpected consequences. Federico Cabitza, Ph.D., a professor of human-data interaction at the University of Milano–Bicocca in Italy, and his colleagues, Raffaele Rasoini and Gian Franco Gensini, both at the Center for Advanced Medicine in Florence, Italy, outlined some of their concerns in a JAMA article published earlier this year.
According to Cabitza, he and his co-authors share the feeling that the attention, expectations, and hopes that some people currently have toward the role of machine-learning decision support systems (ML-DSS) are “extremely off-balance,” due to the hype around the topic as well as the trivialization of what ML is and what it can actually do for our health and well-being.
“None of us really aimed to deny the potential advantages that these systems could bring into medical practice, nor do we believe that their introduction should be blocked or hindered in virtue of a prejudicial opposition to innovation,” Cabitza said. “In the same vein though, [ML] should not be adopted nor advocated in medicine on the basis of a sheer pro-innovation bias.”
One possible unforeseen consequence of integrating machine learning into medicine, according to Cabitza and his colleagues, is an increased reliance on data at the expense of other factors, such as psychological or organizational issues, that are more difficult to describe. For example, in a 2015 case study, researchers reported that when an AI-based decision support system analyzed roughly 14,000 patients with pneumonia, it came to the conclusion that those who also had asthma had a lower risk of mortality. However, the algorithm had failed to take into account a critical confounding factor—asthmatics were also more likely to be admitted directly into intensive care units.
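The confounding effect in that case study is easy to reproduce in a toy simulation. The sketch below uses entirely made-up probabilities (not the study's data) in which asthma has no protective effect at all, but asthmatic patients are routed to intensive care more often and ICU care lowers mortality. Looking only at raw outcomes, asthma appears protective; stratifying by ICU admission—the confounder the algorithm missed—makes the illusion disappear.

```python
# Illustrative simulation of confounding (hypothetical numbers, not real data).
import random

random.seed(0)

def simulate_patient():
    asthma = random.random() < 0.2
    # Confounder: asthmatic pneumonia patients are far more likely
    # to be admitted directly to an intensive care unit.
    icu = random.random() < (0.9 if asthma else 0.2)
    # ICU care itself lowers mortality; asthma has no direct effect here.
    died = random.random() < (0.05 if icu else 0.15)
    return asthma, icu, died

patients = [simulate_patient() for _ in range(100_000)]

def mortality(group):
    group = list(group)
    return sum(d for _, _, d in group) / len(group)

overall_asthma = mortality(p for p in patients if p[0])
overall_no_asthma = mortality(p for p in patients if not p[0])
# Naive view: asthma looks "protective" because ICU admission is hidden.
print(f"asthma: {overall_asthma:.3f}  no asthma: {overall_no_asthma:.3f}")

# Stratifying by the confounder removes the apparent protection.
for icu_flag in (True, False):
    a = mortality(p for p in patients if p[0] and p[1] == icu_flag)
    b = mortality(p for p in patients if not p[0] and p[1] == icu_flag)
    print(f"ICU={icu_flag}: asthma {a:.3f}  no asthma {b:.3f}")
```

A learning algorithm trained without the ICU variable would pick up the same spurious pattern the raw rates show, which is exactly the failure mode described above.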
“I think this issue of context is really important,” said Ash Damle, the CEO of Lumiata, a medical AI company. “We at Lumiata take huge amounts of time to think about how it is that we actually construct the training environment so that if the right context [is not there], we can determine that and say, ‘I’m sorry, we cannot provide any output.’”
Context is a key concern for precision medicine, said Christine Cournoyer, CEO of N-of-One, a molecular decision support company. For example, she explained, when her company matches a patient’s profile to a therapy, a scientist reads through and analyzes all the appropriate published articles to evaluate the validity and strength of the evidence about a specific drug. “It’s the context that’s coming into the evaluation that wouldn’t be there with AI,” Cournoyer added.
Although ML algorithms can be trained with giant pools of data, applying this to precision medicine, where each patient’s unique molecular profile is taken into consideration, is still a challenge. “There’s a role for AI in things that are very clearly defined, [but] once you get past the FDA-approved drugs and well-documented hot-spot mutations, it’s all judgment—and that’s where the oncologist really is [key],” Cournoyer said. Physicians need to decide which clinical trials are appropriate for their patients while accounting for variables such as how sick the patient is, how far they can travel, and which complexities in their molecular profile are important, she added. “That level of uncertainty and complexity is something that we haven’t codified yet—certainly not in the electronic medical record.”
In general, the inherent uncertainties in healthcare pose an issue for applying AI, Cabitza and his colleagues argue, pointing out that the effect of observer variability on the accuracy of machine learning algorithms is often underrated. “I think the article did a great job of highlighting the problem of uncertainty in medicine,” Ziad Obermeyer, an assistant professor of health care policy at Harvard Medical School, noted. “In most other applications of machine learning we know the truth—when we detect cats in videos, we know what a cat is. But on a chest x-ray, how do we know what cancer is? If we want to predict which patients are at high risk of developing serious infections like sepsis, how do we know who has sepsis when even the definition of sepsis is controversial?”
To avoid developing machine learning algorithms that reflect human errors and biases, developers need to take the challenge of working with medical data more seriously, Obermeyer says. “Relying on diagnostic codes or doctors’ opinions is hugely problematic when we’re dealing with complex medical realities, filtered through the inequalities and biases that plague our current health care system. There is no one-size-fits-all solution here, so each prediction problem requires very careful work to define the outcome and ensure no bias creeps in.”
Another issue raised by Cabitza and his colleagues is that many machine learning algorithms are black box models.
“Recent ML-DSS are interesting for their ‘oracular’ nature—that is, they can exhibit, under some circumstances, very high discriminative accuracy,” Cabitza said. “At the same time they give no explanation for their advice, that is, they are sort of ‘black boxes,’ even for their builders. These systems are based on the data used to ‘train’ them, not on explicit rules that humans can build to account for the data they collect of the observed phenomena.”
Dealing with the “black box” is one of the key concerns for Lumiata, Damle noted. “We spend a lot of time trying to figure out how to actually make deep learning interpretable.” In order to do this, he added, the company provides clinical rationale with each data-based assessment, which outlines a detailed chain of reasoning for the results.
Jonathan Chen, a physician-scientist at Stanford University, said that in addition to the issues raised by Cabitza and colleagues, it is also important to consider the utility of the prediction a person is provided with. “An accurate prediction of what’s going to happen to somebody does not tell you what to do about it, and it also doesn’t mean it’s even possible to change the outcome,” he added.
Looking forward, Cabitza says that he hopes to see more research comparing the accuracy of teams of diagnosticians using ML-DSS to those not using this technology to see whether there is a significant difference in effect size, and at what expense. “I believe that [machine learning] experts should become more aware of [the research providing] insights about the information and automation bias that their system can induce in their users, and consequently reconsider some priorities in their research agenda,” Cabitza said. “Only in doing so [can] they succeed in building systems that have a positive impact on medicine and the health system, and therefore also on the health, safety and well-being of all of us.”
This article was originally published in the September/October 2017 issue of Clinical OMICs. For more content like this and details on how to get a free subscription to this digital publication, go to www.clinicalomics.com.