Meghan F. Coakley
Maarten R. Leerkes
Jason Barnett
Andrei E. Gabrielian
Karlynn Noble
M. Nick Weber
Yentram Huyen
In order to capitalize on Big Data’s benefits, significant challenges in data analytics must be overcome.
Introduction
The exponential growth of digital data presents both opportunities and obstacles in the domains of science, economics, business, healthcare, media, and virtually all aspects of our lives. Generating vast volumes of data in a multitude of formats is no longer a significant challenge: an estimated 2.5 quintillion bytes of data are created every day, and 90% of the world’s data has been created in the last 2 years alone.1 The obstacle to making big data truly useful lies in the general shortage of tools and infrastructure for making valuable discoveries. Some federal government agencies have also been slower than the private sector to adapt to the new realities of the era of big data. According to predictive analytics expert Nate Silver, “In some cases, the government has the best data in the world, but not always the ability to use it.”2
To address this shortfall, the Federal “Big Data” Initiative was launched in March 2012 by the White House Office of Science and Technology Policy, with an announcement of a $200 million investment in new research and development projects for big data analytics.3 Several federal departments and agencies have committed to improving tools and techniques needed to access, organize, and make discoveries from huge volumes of digital data. Based on the recent emphasis on harnessing the power of big data, the Office of Cyber Infrastructure and Computational Biology at the National Institute of Allergy and Infectious Diseases (NIAID) used its third annual Bioinformatics Festival to address the current challenges in data science and big data analytics.
Held at the National Institutes of Health (NIH) in Bethesda, Maryland, on February 6, 2013, “Data Science: Unlocking the Power of Big Data” featured a diverse group of experts from academia and the public and private sectors. Presentation topics included analytics management and governance of big data generated from areas such as astronomy, protein mass spectrometry analysis, and clinical data mining. The speakers presented caveats to leveraging innovative strategies such as crowdsourcing and Twitter analytics. The symposium also featured a noteworthy presentation consisting solely of tweets, delivered by Dr. Michael Rappa of the Institute for Advanced Analytics at North Carolina State University; his talk sparked a flurry of tweets from the audience, a real-time demonstration of the accumulation of big data in the social media sphere.
Meghan F. Coakley, Maarten R. Leerkes, Jason Barnett, Andrei E. Gabrielian, Karlynn Noble, M. Nick Weber, and Yentram Huyen are affiliated with the Bioinformatics and Computational Biosciences Branch of the Office of Cyber Infrastructure and Computational Biology at the National Institute of Allergy and Infectious Diseases, National Institutes of Health in Bethesda, Maryland.
References:
1 IBM. What is big data? – Bringing big data to the enterprise. Available online at www-01.ibm.com/software/data/bigdata/ (Accessed Feb. 2, 2013).
2 Konkel F. February 25, 2013. Nate Silver on big data’s future: It’s about attitude. FCW.com. Available online at http://fcw.com/articles/2013/02/25/nate-silver-data-insights.aspx (Accessed Feb. 27, 2013).
3 Office of Science and Technology Policy, Executive Office of the President. Press Release: Obama Administration Unveils “Big Data” Initiative: Announces $200 Million in New R&D Investments. March 29, 2012. Available online at www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf (Accessed March 1, 2013).
Big Data, published by Mary Ann Liebert, Inc., is an open access peer-reviewed journal that provides a unique forum for world-class research exploring the challenges and opportunities in collecting, analyzing, and disseminating vast amounts of data, including data science, big data infrastructure and analytics, and pervasive computing. The above article was published in June 2013 ahead of print.