Big data and data-enabled science have found their way into the life sciences, bringing challenges that can be summed up in five Vs: volume, veracity, velocity, variety, and value. Life sciences data are challenging chiefly for their variety and value and, above all, for the critical need for veracity. In banking, by contrast, volume and velocity are the biggest challenges.
A newly published study details efforts by the laboratory of Eugene Kolker, Ph.D., head of the Bioinformatics and High-Throughput Analysis Laboratory at Seattle Children’s Research Institute (SCRI) and chief data officer at Seattle Children’s Hospital, to create partnerships aimed at addressing these and other challenges wrought by life sciences big data. That study, “Unraveling the Complexities of Life Sciences Data,” appears in the preview issue of Big Data, a new open-access journal from GEN publisher Mary Ann Liebert Inc.
Dr. Kolker, the study’s corresponding author, and eight colleagues also noted how annual Data-Intensive Science Workshops in 2010 and 2011 led to the creation of DELSA (Data-Enabled Life Sciences Alliance International) Global, which is working to provide a voice and a coordinating framework for collective innovation in data-enabled science for the life sciences community. SCRI and the National Science Foundation (NSF) supported the workshops.
Another Kolker Lab initiative, a survey of U.S. proteomics researchers and other life scientists conducted by the University of Washington Business School, identified an immediate need for tools and resources that provide easy access to publicly available proteomics experiments. Important criteria included reliable data, statistically valid results, user-friendly data analysis tools, transparent reporting of results, and the ability to share data.
The survey led to the development of MOPED (Model Organism Protein Expression Database), which provides concise summaries of protein identification, relative and absolute expression, and other quantitative data. NSF support also helped Kolker’s lab develop the Systematic Protein Investigative Research Environment (SPIRE), which processes and analyzes proteomics data for MOPED. MOPED contains more than 43,000 proteins with at least one spectral match.