The Broad Institute of MIT and Harvard will partner with Intel in a 5-year, $25 million collaboration aimed at improving researchers' ability to analyze massive amounts of genomic data from diverse sources.
The collaboration, announced yesterday, is designed to optimize best practices in hardware and software for genome analytics, making it possible to combine and use research datasets that reside on private, public, and hybrid clouds.
The partners will create an Intel–Broad Center for Genomic Data Engineering, where researchers and software engineers plan to build, optimize, and widely share new tools and infrastructure that will help scientists integrate and process genomic data. The Center will focus on three goals:
- Optimizing the hardware recommendations in Broad's Genome Analysis Toolkit (GATK) Best Practices for genomic workloads in on-premises, public cloud, and hybrid cloud use cases.
- Optimizing industry-standard Intel-based platforms, GATK, and other genomics software tools, such as the Broad’s workflow execution engine Cromwell, and GenomicsDB, a Broad-Intel solution for patient variant data storage and fast processing.
- Promoting more collaboration by healthcare providers, pharmaceutical companies, and academic research organizations through partnerships to develop workflow execution models across complex and distributed datasets.
The institute and Intel said they hope to enable researchers worldwide to run more data-intensive studies and generate robust results more quickly by accessing data that may have previously been unavailable to them.
“The size of genomic datasets doubles about every 8 months and, as it does, the challenge of acquiring, processing, storing, and analyzing this information increases as well,” Eric Banks, Ph.D., director of the Data Sciences and Data Engineering group at the Broad Institute, said in a statement. “Our work is a step toward building something analogous to a superhighway to connect disparate databases of genomic information for the advancement of research and precision medicine.”
Intel and the Broad Institute are building on a 2.5-year-old partnership, through which the partners announced plans earlier this year to co-develop new tools and advance fundamental capabilities so that large genomic workflows can run at cloud scale. The new tools aim to simplify the execution of large genomic workflows such as GATK, and to improve the storage, scalability, and processing of genomic data.
At the same time, Broad launched collaborations with Intel and with cloud providers, including Google, IBM, and Microsoft, to enable cloud-based access to GATK.