Programs to Boost Informatics
To that end, as part of the initiative, NSF will work with universities to develop interdisciplinary graduate programs that train students for careers as data scientists and engineers. NSF will also award a $2 million grant to a research group to help train undergraduates in graphical and visualization techniques for complex data.
On the research side, NSF will spend $1.4 million to support a group of statisticians and biologists who will collaborate to determine the structures of proteins and biological pathways. NSF will also award a $10 million “Expeditions in Computing” grant to researchers at the University of California, Berkeley, whose AMPLab combines machine learning, cloud computing, and crowdsourcing to tackle big data problems.
Additionally, NSF and NIH will jointly award grants under a new program to advance core techniques and technologies for managing, analyzing, visualizing, and extracting useful information from large and diverse datasets. NIH is especially interested in imaging, molecular, cellular, electrophysiological, chemical, behavioral, epidemiological, clinical, and other datasets related to health and disease, Karin Remington, Ph.D., director of the division of biomedical technology, bioinformatics, and computational biology at NIH’s National Institute of General Medical Sciences (NIGMS), told GEN.
The agencies will award “mid-scale” grants of $250,001 to $1 million per year for up to five years to groups of three or more investigators, as well as smaller-scale project grants of up to $250,000 per year for up to three years to one or two investigators. Application deadlines are June 13 for the mid-scale grants and July 11 for the smaller grants.
In another key project of the initiative, NIH opted to store the 200 terabytes of data yielded so far by the 1000 Genomes project on the Amazon Web Services cloud, giving the public free access to all of it. Researchers will, however, be charged for downloading the data or computing with it, Lisa D. Brooks, Ph.D., program director for the Genetic Variation Program at the National Human Genome Research Institute, told GEN. Even so, she said, that is a bargain compared with the hundreds of thousands of dollars universities would otherwise have to spend on computing equipment to obtain the needed capacity.
At present, with two phases of work completed, the 1000 Genomes project consists of DNA sequenced from about 1,700 individuals, a number set to grow to 2,661 individuals in 26 populations by year’s end, Dr. Brooks said.
Under another piece of the big data initiative, CDC’s Special Bacteriology Reference Laboratory (SBRL) is developing species-identification tools designed to let multiple analyses of a new or rapidly emerging pathogen be completed in hours rather than days or weeks.
CDC will also upgrade its nearly decade-old BioSense program, a national public health surveillance system for early detection and rapid assessment of potential bioterrorism-related illness. BioSense 2.0 will be expanded to connect with state and local health departments and to contribute information that supports public health awareness, routine public health practice, and improved health outcomes.