The Broad Institute of MIT and Harvard and Google Genomics said they will partner to develop computing infrastructure to store and process enormous datasets, as well as create tools to analyze such data in biomedical research.
No value was disclosed for the collaboration, which is nonexclusive.
In the partnership’s first step, Google will offer the Broad Institute’s Genome Analysis Toolkit (GATK) via Google Genomics on the Google Cloud Platform.
GATK will be offered as a managed service, initially as an alpha release for a limited set of users. Researchers will be able to upload genetic data and run GATK-powered analyses on Google Cloud Platform, and may also use GATK to analyze genetic data already available for research via Google Genomics.
The partners have made available “GATK Best Practices,” workflow descriptions with step-by-step recommendations for getting the best analysis results from high-throughput sequencing data. Workflows are available for variant discovery in both DNA-Seq and RNA-Seq.
The GATK service is intended to make best-practice genomic analysis readily available to researchers who don’t have access to the dedicated compute infrastructure and engineering teams required for analyzing genomic data at scale, the institute said.
GATK is a software package developed at the Broad to analyze high-throughput genomic sequencing data. GATK offers analysis tools focused primarily on genetic variant discovery and genotyping, as well as on data quality assurance.
To date, the Broad said, more than 20,000 users have processed genomic data using GATK, which is already available as a free download to academic and nonprofit users. Business users can use GATK through a license from the institute.
The institute said it plans to continue to support and upgrade GATK for all users, both on site and in the cloud, and will continue to offer the software directly.
Services available through the collaboration will be designed to align with existing and emerging standards of the Global Alliance for Genomics and Health (GA4GH), established in 2013 to build a shared framework to enable genomic and clinical data sharing while ensuring data privacy and security. The Broad is a founding host institution of GA4GH; Google joined the alliance last year.
Google Genomics offers a web-based application programming interface (API) designed to store, process, explore, and share DNA sequence reads, reference-based alignments, and variant calls, using Google's cloud infrastructure.
“We are excited to work with Google’s talented and experienced engineers to develop ways to empower researchers around the world by making it easier to access and use genomic information,” Eric Lander, Ph.D., president and director of Broad Institute, said in a statement.
Added Google Genomics director David Glazer: “Google Genomics is helping scientists make genomic information more accessible and useful. By making Broad’s GATK available through the Google Cloud Platform, we hope to accelerate great science.”