The Data Deluge
Information technology has spurred the growth of global collaborations throughout society by connecting people in meaningful ways and providing them with new information tools. By comparison, the life sciences have trailed in this regard. Cancer research, an undeniably vast and intricate field, involves a wide variety of complex processes, technologies, and materials that must converge for researchers to gain a complete and illustrative view of specific cancer types.
For example, clinical records on large patient populations can be combined with genomic and proteomic data to yield information leading to a new generation of diagnostic tests and targeted therapies. Repositories of such data, along with other biorepositories, are now at the nexus of clinical and basic research.
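As a rough illustration of what combining clinical records with molecular data can mean in practice, the sketch below joins two small record sets on a shared patient identifier. Every field name and value here (patient_id, diagnosis, gene, expression_level) is invented for illustration and is not drawn from any real dataset or from caBIG itself.

```python
# Hypothetical, made-up records; the field names are assumptions for illustration.
clinical_records = [
    {"patient_id": "P001", "diagnosis": "type A", "age": 54},
    {"patient_id": "P002", "diagnosis": "type B", "age": 61},
    {"patient_id": "P003", "diagnosis": "type A", "age": 47},
]

genomic_records = [
    {"patient_id": "P001", "gene": "GENE_X", "expression_level": 2.4},
    {"patient_id": "P003", "gene": "GENE_X", "expression_level": 0.7},
    {"patient_id": "P999", "gene": "GENE_X", "expression_level": 1.1},
]


def combine_records(clinical, genomic):
    """Join genomic rows to clinical rows that share the same patient_id.

    Genomic rows with no matching clinical record are skipped.
    """
    # Index clinical records by patient so each lookup is a dictionary access.
    clinical_by_patient = {rec["patient_id"]: rec for rec in clinical}

    combined = []
    for genomic_rec in genomic:
        clinical_rec = clinical_by_patient.get(genomic_rec["patient_id"])
        if clinical_rec is None:
            continue  # no clinical counterpart for this sample
        # Merge the two dictionaries into one integrated record.
        combined.append({**clinical_rec, **genomic_rec})
    return combined


if __name__ == "__main__":
    for row in combine_records(clinical_records, genomic_records):
        print(row)
```

The join only works because both sources agree on how a patient is identified. When they do not, that agreement has to be established first.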
Today, biomedical research is generating copious genomic, proteomic, medical record, and biospecimen datasets at disparate locations and in varied formats; this makes it difficult, at best, to leverage the data’s full value through integration and comparison. Individual biomedical research labs can generate up to 100 terabytes of data, akin to one million encyclopedias of information. These data, which are doubling in volume every 15 months, are limited in their utility because the systems and technology needed to manage them properly are lacking.
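To put the 15-month doubling time in more familiar terms, the short sketch below converts it to a yearly growth factor and projects a volume forward. It assumes steady exponential growth, and it applies the doubling rate to a single 100-terabyte starting volume purely for illustration; the function names are hypothetical.

```python
def projected_volume_tb(initial_tb, months, doubling_months=15.0):
    """Volume after `months`, assuming steady doubling every `doubling_months`."""
    return initial_tb * 2 ** (months / doubling_months)


def annual_growth_factor(doubling_months=15.0):
    """Multiplier applied to the volume over 12 months of steady growth."""
    return 2 ** (12 / doubling_months)


if __name__ == "__main__":
    # Assumed starting point: the 100-terabyte figure quoted for a single lab.
    start_tb = 100.0
    print(f"Yearly growth factor: {annual_growth_factor():.2f}")
    for years in (1, 2, 5):
        volume = projected_volume_tb(start_tb, months=12 * years)
        print(f"After {years} year(s): {volume:,.0f} TB")
```

Under these assumptions the volume grows by roughly 74 percent a year, so 100 terabytes would become 1,600 terabytes after five years, or four doublings.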
In addition to the technical complexity, the culture of biomedical scientific investigation itself presents a challenge. Academic and industrial research in the life sciences is charged with fierce independence, creativity, and entrepreneurialism, necessary ingredients for achieving the innovations that have benefited many patients, but potential obstacles to a synchronized effort to compile comparable data from many sources for analysis.
Although a growing population of bench and clinical researchers within the wider life sciences community has expressed a willingness and desire to share and combine data, as well as the tools used to analyze those data, the large number of different formats, conventions, and applications makes this an overwhelming task for any individual or group to undertake.
Responding to the need to create greater connectivity throughout the cancer research community, the NCI established caBIG to help align the many disparate parts of the cancer research process. caBIG includes software tools and systems to enable integration of clinical information with molecular information.
To help create a networked community of researchers, caBIG incorporates tools to bridge existing databases; establishes common data standards to ensure that the entire caBIG community is speaking the same language; provides a common set of statistical and visualization tools to analyze data from microarrays and other high-throughput technologies; and facilitates the exchange of biospecimens, reagents, and ideas. Collectively, these technologies and the growing community of users are establishing caBIG as a World Wide Web of cancer research.