The Human Genome Sequencing Center (HGSC) at Baylor College of Medicine has adopted the DNAnexus enterprise cloud platform as part of a three-way collaboration between the HGSC, DNAnexus, and Amazon Web Services (AWS). The collaboration aims to advance large-scale clinical analysis of genomic data by processing and analyzing more than 14,000 human genomes in the cloud.
The platform has been used to power HGSC’s Mercury pipeline, a semiautomated, modular set of tools for analyzing next-generation sequencing data in both research and clinical contexts. The Mercury pipeline identifies mutations in genomic data, setting the stage for determining whether those mutations cause serious disease. It is also the core variant-calling pipeline for the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium.
More than 400 TB of data from the project will support a consortium of 300 researchers around the world who study heart disease and aging. Data from the project are being presented today at the American Society of Human Genetics annual meeting in Boston.
“Working with DNAnexus and Amazon Web Services, we were able to rapidly deploy a cloud-based solution that allows us to scale up our support to researchers at the HGSC, and make our Mercury pipeline analysis data accessible to the CHARGE Consortium, enabling what will be the largest genomic analysis project to have ever taken place in the cloud,” said Jeffrey Reid, Ph.D., assistant professor of molecular and human genetics at the Baylor College of Medicine.
CHARGE, which seeks a better understanding of how human genetics contributes to heart disease and aging, has a longstanding collaboration with the HGSC focused on advancing disease gene discovery. The collaborators worked with AWS to process data from CHARGE using the Mercury pipeline.
CHARGE involves more than 300 researchers across five institutions worldwide who are analyzing the genome sequence data of more than 14,000 individuals (3,751 whole genomes and 10,771 exomes). The analysis required approximately 2.4 million core-hours of computational time and some 860 TB of storage. At the project’s peak, the HGSC used the DNAnexus platform to spin up more than 20,000 cores on demand to run the CHARGE data through the Mercury analysis pipeline. During this period, the HGSC was running the largest genomics analysis cluster in the world, hosted by AWS.
DNAnexus provides an enterprise-focused, API-based platform as a service (PaaS) that lets clinical and research enterprises move their analysis pipelines into the cloud. Users can run their own algorithms alongside industry-recognized tools and reference resources, with the goal of building customized workflows in a secure, cost-effective, and compliant environment.
“Through this collaboration with the HGSC and Amazon Web Services, 300 scientists can now perform downstream analyses on these invaluable health and aging data at a scale not previously possible,” DNAnexus CEO Richard Daly said.