June 1, 2016 (Vol. 36, No. 11)

Vicki Glaser Writer GEN

Computer Scientist Pedro Domingos Is Possessed Of an Ambition That Is Archimedean

Pedro Domingos, Ph.D., professor of computer science and engineering at the University of Washington, believes that cancer can be eliminated if we get serious about machine learning, an approach to artificial intelligence (AI) that gives computers the ability to “think” for themselves. Dr. Domingos argues that a thinking machine—one that learns from experience, not programming—will draw inferences, generate discoveries, and offer ever more accurate predictions about each cancer’s origins and vulnerabilities. The ultimate output, suggests Dr. Domingos, will consist of treatments and cures.

Dr. Domingos is the author of “The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World.” This book holds that, with access to enough data, a Master Algorithm could derive all knowledge, including scientific knowledge.

More and more scientific data is becoming available to “learners,” algorithms that can figure things out on their own and essentially program themselves. Much of this scientific data is relevant to bioinformatics.

When the field of bioinformatics was new, data was relatively scarce, and there was a clearly perceived disconnect between the black-and-white nature of computational analysis and the gray, fuzzy nature of biology. Now bioinformatics is maturing, and large amounts of data are being produced by high-throughput screening, DNA microarrays, and next-generation sequencing. And so, as Dr. Domingos explains in this interview, bioinformatics is gaining the heft needed to take on cancer.

GEN: How is machine learning different from what is commonly known as AI?

Dr. Domingos: Machine learning is a subfield of artificial intelligence. Different subfields of AI deal with different aspects of intelligence: reasoning, language, vision, problem-solving, etc. Learning is arguably the most important one. If a computer were as intelligent as a human but had no ability to learn, it would immediately fall behind and never recover. Machine learning is what’s driving the current wave of progress in AI.

GEN: What do you view as the biggest misconceptions about machine learning at present and the greatest flaws in the arguments of machine-learning skeptics?

Dr. Domingos: People think that machine learning is much more limited than it really is—that it’s just about summarizing data, that all it does is discover simple correlations, that it can’t predict previously unseen events (a.k.a. “black swans”), that it can’t be combined with preexisting knowledge, or that it can’t replace intuition. All of these are false, and arise because for most people machine learning is a black box: what goes on inside is a mystery. But if you think of a learning algorithm as a miniature brain—which is what it really is, at some level—you’ll realize that it can do all of those things.
 

GEN: In your book, you present a scenario in which a Master Algorithm could cure cancer. To realize this scenario, would all aspects of biology and pathology have to be translated into digital information? Is this an achievable goal now or in the near future?

Dr. Domingos: Absolutely. These days every learning algorithm worth its salt can deal with gray, fuzzy knowledge, and with big, noisy data. This is one of the key changes from the earlier days of AI when things didn’t work very well.
 
How much digitalization of biology and pathology needs to occur remains to be seen. The best-case scenario is that a very basic machine-learning approach will suffice: computers learn to predict which drugs work for which cancers by generalizing from past cases, with minimal knowledge of biology.
 
While this approach has been surprisingly successful with many diseases, I think it’s unlikely to work for cancer. We will need to model how cells work at a fairly fine granularity, and it will be harder, but we’ll get there.
 

GEN: You describe machine learning as “the scientific method on steroids,” being able to generate, test, discard, and refine hypotheses in silico. The Master Algorithm you propose is a single, universal learning algorithm that would have access to all of the information in the biomedical literature and to patient records. What would it look like, and what would it be capable of in terms of discovering a cure for cancer? How would that work?

Dr. Domingos: It would provide a detailed model of how cells work, both healthy and cancerous. We would then be able to instantiate that model to each particular patient and cancer, and probe it with different drugs until we were able to find one that works, or even design a new drug, if needed. All of this would be done at high speed, giving results the same day that the tumor is sequenced.

GEN: In fact, is it more appropriate to talk about a Master Algorithm discovering “cures” for cancer, since cancer is not one disease, has many different causes and presentations, and tumors can mutate as they grow and spread?

Dr. Domingos: Exactly. There is no single cure for cancer in the sense of a single drug that cures all cancers. The real cure is a machine learning system that inputs the cancer’s details as well as the patient’s genome, medical history, etc., and outputs the recommended treatment.

GEN: You write that machine learning alone will not cure cancer. Instead, you suggest that machine learning will do so in concert with cancer patients who will share their data for the benefit of future patients. Why is access to patient data and clinical outcomes so important? How does it contribute to “inverse deduction,” which you describe as the first step in curing cancer?

Dr. Domingos: Machine learning is powerful because it generalizes from data. The more data we have, the more we can learn. Conversely, if we have no data, there’s nothing we can learn. It is by generalizing from patients’ data and their outcomes that we will find a cure for cancer. And precisely because cancer is a very complex, multifaceted disease, it is unlikely that a small amount of data will do. Because every cancer is different, there is something to learn from every patient.
 
Inverse deduction is the process of figuring out what general rules are needed to infer the consequences we see in the data from the premises we also see in the data, so that in the future we can infer those consequences when we don’t know them. In the case of curing cancer, the premises are the patient’s and cancer’s genomes, etc., and the consequences are the recommended treatments.
 
By generalizing from which treatments worked for which cancers and which didn’t in the past, we can predict which will work in the future. That’s how machine learning works, in a nutshell.
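The generalization step described above can be illustrated with a toy sketch. This is not Dr. Domingos's code or the CanceRx system. It is a minimal Python example that assumes invented patient records, hypothetical feature names (`gene_a`, `gene_b`), and hypothetical drug names. It induces single-condition rules that are consistent with every past case they cover, then applies them to a new case, which is a very simplified stand-in for inverse deduction.

```python
from collections import defaultdict

# Hypothetical toy records: premises (tumor features) paired with the
# consequence (the treatment that worked). All names and values are invented.
CASES = [
    ({"gene_a": "mutated", "gene_b": "normal"}, "drug_x"),
    ({"gene_a": "mutated", "gene_b": "mutated"}, "drug_x"),
    ({"gene_a": "normal", "gene_b": "mutated"}, "drug_y"),
    ({"gene_a": "normal", "gene_b": "normal"}, "drug_z"),
]


def induce_rules(cases, min_support=2):
    """Return {(feature, value): treatment} for each single-condition rule
    that covers at least min_support cases and agrees with all of them."""
    outcomes = defaultdict(list)
    for premises, treatment in cases:
        for feature, value in premises.items():
            outcomes[(feature, value)].append(treatment)
    return {
        condition: seen[0]
        for condition, seen in outcomes.items()
        if len(seen) >= min_support and len(set(seen)) == 1
    }


def predict(rules, premises):
    """Apply induced rules to a new case.

    Returns None when no rule fires or when the fired rules disagree."""
    fired = {rules[(f, v)] for f, v in premises.items() if (f, v) in rules}
    return fired.pop() if len(fired) == 1 else None


RULES = induce_rules(CASES)
NEW_PATIENT = {"gene_a": "mutated", "gene_b": "mutated"}
print(RULES)
print(predict(RULES, NEW_PATIENT))
```

With only four records, a single rule survives the consistency check, which echoes the point made in the interview: a small amount of data supports very little generalization, and every additional patient record can confirm, refine, or eliminate candidate rules.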

GEN: You give the name CanceRx to the complex, machine learning-based program that will one day be able to input a cancer’s genome and output a drug to kill the tumor, and you write that it is now possible to picture what that program will look like. Can you describe it? Can you tell us how far along it is in development?

Dr. Domingos: As I mentioned, CanceRx could be as simple as a system that directly predicts which drug to use from the patient’s data, or as complex as using a detailed model of how cells work to test candidate drugs in silico. Rapid progress is being made across the full spectrum, from assembling patient data and learning from it to modeling metabolic and gene regulatory networks, but there is still a long way to go.

GEN: Can some of the same concepts and uses of machine learning discussed above be applied to vaccine development, and is machine learning already being used for vaccine discovery? When an infectious agent such as Zika virus emerges and rapidly spreads, do you envision that a Master Algorithm could relatively quickly identify an effective vaccine?

Dr. Domingos: Yes. For example, David Heckerman’s group at Microsoft Research has used machine learning to develop an AIDS vaccine, and they are now gearing up for clinical trials in humans. The AIDS virus is a tough adversary because it mutates very quickly, so a single point of attack is unlikely to stop it for long.
 
David’s approach was to discover many different points of attack from data, and to develop a vaccine that works against enough of them at once that the virus is unlikely to escape. Currently, this process still takes a long time, but the ultimate goal is to have a vaccine available the same day that a new virus is sequenced.