Biopharmas are warming up to artificial intelligence (AI), but a series of challenges will need to be addressed before it becomes widely used by drug developers, a panel of industry executives agreed.
Speaking at the 2019 Annual Meeting of NewYorkBIO in New York City yesterday, panelists identified those challenges as finding more and better data, integrating data from multiple sources, and creating partnerships to gather and analyze that data.
The panel also cited challenges that go beyond data, including attracting a new generation of professionals capable of applying AI and related technologies such as machine learning, and adapting biopharmas to the new technologies.
Those observations are in line with a study released today by The Pistoia Alliance, a global not-for-profit organization of more than 150 members established by executives from AstraZeneca, GlaxoSmithKline (GSK), Novartis, and Pfizer. The Alliance surveyed 190 life sciences professionals in the US and Europe, with 52% citing access to data, and 44% a lack of skills, as the two key barriers to adoption of AI and machine learning.
“There’s so much culture change that we have to implement in order to structure our industry to take advantage of analytics, to drive it. Today, we don’t incentivize our scientists to annotate, curate their data in order to make it ready for analytics,” said Ramesh Durvasula, PhD, information officer for research at Eli Lilly. “We don’t have a system of bringing incentives to chemists to annotate their electronic lab notebooks sufficiently, such that we can do text mining and your other AI capabilities.
“They write whatever is the minimum required for compliance, and then they move on to the next experiment, so that when you go back and try to apply text mining and image recognition to that lab notebook, it’s a mess,” Durvasula added. “It’s very dirty data because we don’t pay chemists to curate the data. We pay chemists to make compounds.”
AI use will increase as more professionals enter biopharma with backgrounds that combine mastery of science and data. Karen Akinsanya, PhD, SVP and chief biomedical scientist at Schrödinger, cited the growth of dual MD/PhD and computer programming/computer science programs at many universities as increasingly shaping the education of the emerging generation of drug discovery scientists.
“We have folks who are both medicinal chemists and Python programmers. It’s really fascinating for the older ones of us to see how they approach problems in drug discovery. I think it’s surely going to lead to further exploration over time,” Akinsanya said.
The additional technology background will help drive the cultural change needed to advance AI and related technologies, added Paolo Guarnieri, MD, a principal scientist in the cardiometabolic disease research department at Boehringer Ingelheim.
“They think in ontologies”
“You can see that the newer generation of scientists, the youngest ones, they come with a computer,” Guarnieri said. “The way that they approach describing a phenomenon is different from a scientist from an older generation that is more verbal. I can see that they think in ontologies.”
Achieving culture change, Durvasula said, will start with improving on the data collected, and ultimately on the science itself: “I’ve stopped talking about AI. I’ve started talking significantly at Lilly about how are we going to assemble the killer dataset that answers the scientific question that will accelerate our pipeline?
“When you focus everybody in the room, whether it’s the chemist, the biologist, the clinician, the IT guy, the data scientist, etc., on the killer dataset, the scientific question that a killer dataset needs, the culture will start to break down a little bit more, and we’ll understand how do we get to that dataset. Then, the way that we use AI will be improved.”
Flatiron Health, the healthcare technology and services company whose offerings are designed to support cancer care providers and life science companies, has worked to improve its data—much of it extracted from electronic health records—through collaborations and partnerships. Sarah Kramarz, director of business development at Flatiron Health, cited the company’s partnership with Foundation Medicine to develop the Clinico-Genomic Database, aimed at helping researchers and biopharmas speed up the development of targeted therapeutics.
“In oncology—this is probably similar for many diseases—the interesting stuff lives in unstructured data. And so, we’ve developed a process for pulling out clinical insights from unstructured data,” Kramarz said. “We also fully recognize that there’s enormous amounts of information that doesn’t exist within the electronic health record that is really of importance for drug development and discovery and finding insights in real-world data. Through partnerships, we have found sources of information that we don’t have, and have been able to sort-of link our datasets with other datasets.”
Launched in 2016, the Clinico-Genomic Database links Foundation’s genomic profiling data from patients sequenced through its comprehensive genomic profiling assays with Flatiron’s longitudinal data detailing clinical treatments and outcomes. Last month, the two Roche subsidiaries published a study in the Journal of the American Medical Association that offered validation that using real-world data obtained from routine clinical care to generate a multi-institution clinico-genomic database is feasible and can yield novel, clinically meaningful insights.
The study, “Association of Patient Characteristics and Tumor Genomics With Clinical Outcomes Among Patients With Non–Small Cell Lung Cancer Using a Clinicogenomic Database,” concluded that the data can not only serve as real-world evidence to advance research and discovery in oncology, but can also ultimately inform clinical guidelines.
Flatiron had 2.2 million active patient records available for research through its tech solutions, used by more than 280 community oncology practices nationwide. A constant challenge for Flatiron, Kramarz said, is generating data from clinical physicians when they are pressed for time, seeing dozens of patients a day.
“A spilled coffee stain”
“Our incentives on the provider side of our business is to reduce the amount of time you spend on your EMR, and then our incentives on the research side of the business is to try to get as structured data as you possibly can. Those two are always at odds,” Kramarz acknowledged. “We have taken the view of our number-one priority as patients and as physicians, and so we’ll do the hard work, and what needs to happen on the back end. But 90% of what we get is dictated. It’s a scanned document with a spilled coffee stain on it. And that’s the reality, because that’s all that our physicians have time to do.”
Regeneron Pharmaceuticals has successfully integrated large-scale genomic analysis with biobank data review in a collaboration with GSK to discover rare, novel, and clinically actionable gene variants within the population. Drawing from the UK Biobank, a repository of genetic and phenotypic data from over 500,000 people, the team selected 49,960 individuals and performed whole-exome sequencing (WES) covering 39 megabases of the genome, including 19,396 autosomal genes and 82 sex chromosome genes.
Manuel Ferreira, PhD, Regeneron’s director of statistical genetics, said machine learning and AI can help researchers integrate clinical and genetic data where brain images are concerned, given the thousands of images captured per patient. “For me, it’s about finding the bottlenecks that we currently have. Which of those could potentially be fixed with AI? It’s just being pragmatic: There’s a tool. There’s a problem. Do they match?”
As companies address AI bottlenecks, Durvasula of Eli Lilly said, they will be best able to integrate the technologies into their drug discovery and development efforts.
“My hope is that in the next decade, we’re going to shift to a compute-first research environment, a model-first research environment, rather than run as many of these experiments as you can, and then do the modeling and figure out what the heck just happened,” Durvasula said. “It’s got to be a compute-first or a model-first research environment. That requires focusing everybody in the room with all their skills, and with all the multi-domain skills even, on the common purpose, the common scientific question.”