By Jonathan D. Grinstein, PhD
Patricia Brennan speaks of the mission for science technology at the Chan Zuckerberg Initiative (CZI) with a conviction that echoes President John F. Kennedy’s legendary proposal in May 1961 that the United States commit to “landing a man on the Moon and returning him safely to the Earth” before the end of the decade.
“Our mission is to cure, prevent, and manage all disease by the end of the century,” Brennan, CZI’s vice president for science technology, told GEN Edge.
The line isn’t exactly new. It is a variation on a tagline found throughout CZI’s website and literature: “Is it possible to cure, prevent, or manage all diseases by the end of this century? We think so.”
On September 19, 2023, CZI announced that it is funding and building one of the world’s largest computing systems dedicated to nonprofit life science research. The system is planned to comprise more than 1,000 GPUs and will enable artificial intelligence (AI) and large language models (LLMs) to be applied to biomedicine at scale.
CZI’s announcement didn’t reach nearly as many people as another historic quote tied to the lunar landing: Neil Armstrong’s “One small step for man, one giant leap for mankind,” spoken in July 1969 as millions around the world watched with their faces glued to TV sets. Its potential successes, however, may reach many more.
The AI triad
Machine learning (ML) tools, such as generative AI, LLMs, and foundation models, have been taking the worlds of science, technology, and even popular culture by storm, as exemplified by OpenAI’s ChatGPT. In biology and medicine, these tools already perform tasks important to scientists, such as identifying complex patterns and key variables within large amounts of information, supporting meaningful advances in drug discovery, genetics, and precision medicine.
“Some say biology is, in many respects, a computational challenge or endeavor,” said Brennan. The line echoes something said years ago by the American biophysicist and origin-of-life expert Harold Morowitz, PhD: “Computer science is to biology what calculus is to physics. It’s the natural mathematical technique that best maps the character of the subject.”
Generally, all AI applications, whether for biomedicine, robotics, or economics, rely on the same triad: data, algorithms, and computing power. Over the past few years, CZI has made progress on algorithms and data, as exemplified by its work on imaging and single-cell biology.
With the CELL by GENE (CELLxGENE) platform, CZI has been working with grantees and the broader scientific community to aggregate, standardize, integrate, curate, and update single-cell data. The result is a growing dataset of more than 50 million single cells that researchers can draw on, rather than devoting tremendous resources to building their own datasets, which may not even be comparable with others.
“What we’ve seen is that not just this notion of aggregating data and normalizing it, but actually making it available, making the entire corpus available and easy to use queryable mode has really spurred some of the development of new models and research across these different areas,” said Brennan. “We’re seeing researchers go from what we call data level analysis to atlas level analysis, where they’re looking across tissue atlases or other aggregated datasets.”
Other data sources include resources generated by CZ Science research institutes. The Chan Zuckerberg Biohub San Francisco created the protein localization and interaction atlas OpenCell as well as the cell atlas Tabula Sapiens. Meanwhile, the Chan Zuckerberg Institute for Advanced Biological Imaging (CZ Imaging Institute) will be creating large datasets of cells at molecular resolution.
The CZI science technology team has been, and will continue to be, engaged in developing AI software. One such tool is CellGuide, a free, interactive encyclopedia that gives researchers key information about more than 700 distinct cell types and subtypes. Entries include definitions produced by ChatGPT, relevant datasets, an expandable ontology tree that visualizes a cell’s lineage, and computational and canonical marker genes.
In addition, in collaboration with CZI’s science technology team, the CZ Imaging Institute is developing an open-source, cloud-based portal for querying organized data from cryo-electron tomography (cryoET) experiments.
“In the past five years, I’ve seen this tremendous amount of progress in predicting the properties of individual molecules,” said Nicholas Sofroniew, PhD, director of product technology at the CZI. “We can fold a single protein, but how do these proteins fit together in cells? That is still much less known there. So, measurements with cryoET, where you are seeing native proteins in that native environment, could be part of the next wave of AI and ML algorithms that the whole community can then work on and develop as we make these things available.”
But according to Sofroniew, the gap between the computational resources available to individual academics and those available to a very small number of tech research labs is very large. Sofroniew said that CZI can fill that gap and help researchers tackle problems they might not be willing to take on right now.
“There is a really unique place that we can position ourselves that spans across the types of problems that we can take on and the ways that we might want to solve them, as well as the types and amounts of compute that we can bring to bear on these sorts of problems, which is becoming increasingly a constraint for computational biology,” said Sofroniew.
Empowering basic science
In some ways, CZI has been building toward this moment on both the grantmaking and technology fronts, having already gained ground in understanding complex imaging and single-cell data. Last year, it brought in Stanford University biophysicist Stephen Quake, PhD, as its new head of science, after his predecessor, Cori Bargmann, PhD, decided to return to her lab at Rockefeller University.
While most of CZI’s focus is on basic research, it does have projects with researchers who are exploring applications and disease mechanisms. However, CZI’s contribution to curing, preventing, and managing disease by the end of the century will not come from discovering or developing therapeutics directly.
“Lots of the pharma companies are very interested in AI and biology right now, and there are lots of investments in drug-related design,” said Sofroniew. “We’re not going to work on the same sorts of problems. The CZI is focused on the enabling tools, technologies, and data that will advance and empower additional scientific discovery. We are not in pharma, nor are we getting into that.”
Instead, CZI’s goal is to keep that long-term mission in view and to build technology across data, models, and applications that helps science move faster and empowers scientists to do more of it.
“[Biology’s] challenges take a long time and are best met by incorporating a mix of software development, basic science, and scientific research alongside grantmaking and identifying opportunities for specialized funding,” said Brennan. “With all the developments, whether it’s in large language models or just computational and compute power in recent months or years, we see that there’s an opportunity to bring this all together.”