Scientists working to understand the mechanics of biological systems now commonly carry out OMICs studies to map the roles of and interplay between genes, proteins, RNA, and other molecules. Sophisticated computational platforms are available to analyze the data from large-scale OMICs screens, but using such complex tools may be beyond the capabilities of laboratory researchers and often requires highly trained computational scientists.
To help make this wealth of data more accessible and informative to researchers at the bench, collaborating scientists at Sanford Burnham Prebys, the Genomics Institute of the Novartis Research Foundation (GNF), and the University of California, San Diego (UCSD) have released an open-access, web-based portal called Metascape, which integrates more than 40 advanced bioinformatics data sources and can answer key biological queries in just a few clicks.
Metascape has been developed to make the interpretation of OMICs data more seamless and intuitive. “Even for computational scientists, compiling and analyzing large OMICs datasets can be a difficult and time-consuming task,” commented Yingyao Zhou, director of data science and data engineering at GNF, and first author of the team’s report in Nature Communications, which describes the new system. “Metascape provides biologists with a platform from which they can access the power of numerous analysis tools all within a simple interface and generate an easy-to-interpret report.”
The researchers, headed by Sumit Chanda, PhD, director of the immunity and pathogenesis program at Sanford Burnham Prebys, described the development of Metascape in a paper titled “Metascape provides a biologist-oriented resource for the analysis of systems-level datasets.”
OMICs assays are now “standard practice” when investigating the molecular mechanisms that underpin biological systems, the authors wrote. Such studies can result in lists of potentially hundreds of gene candidates, and the job then becomes one of analyzing often huge datasets to provide molecular context. As the researchers stated, “Common queries include: What pathways or biochemical complexes are enriched? What are the functional roles of identified protein complexes? Which candidate proteins are secreted, contain a transmembrane domain, or are otherwise druggable? Or, are there any chemical probes available for a rapid candidate validation?”
A number of gene-list analysis portals have been developed to help researchers interrogate their OMICs data, but as the team commented, “multiple portals are required to accomplish a complete systems-level analysis workflow, leading to a fragmented user experience.” They give as an example a situation in which a user analyzing proteomics data may need to employ one tool to convert protein identifiers into gene symbols, another to perform pathway enrichment analysis, a third to assess protein interaction networks, and yet more tools to generate visualizations of the data. Users need to learn how to use each platform, and how to integrate data from different types of analyses and file formats.
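The first two steps of that fragmented workflow, identifier conversion followed by pathway enrichment, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lookup tables and identifiers are hypothetical placeholders, and the hypergeometric test is one common choice for enrichment analysis, not a method the article attributes to Metascape or to any specific portal.

```python
from math import comb

# Hypothetical lookup tables for illustration only; real portals draw
# these from curated databases.
PROTEIN_TO_GENE = {"PROT_1": "GENE_A", "PROT_2": "GENE_B", "PROT_3": "GENE_C"}


def to_gene_symbols(protein_ids, mapping):
    """Step 1: convert protein identifiers to gene symbols, skipping unknown IDs."""
    return {mapping[p] for p in protein_ids if p in mapping}


def enrichment_p_value(hits, pathway, background):
    """Step 2: one-sided hypergeometric test for pathway over-representation.

    Returns the probability of seeing at least the observed overlap between
    the hit list and the pathway if hits were drawn at random from the
    background gene set.
    """
    hits = hits & background
    pathway = pathway & background
    total = len(background)          # N: genes in the background
    in_pathway = len(pathway)        # K: background genes in the pathway
    drawn = len(hits)                # n: genes in the hit list
    overlap = len(hits & pathway)    # k: hits that fall in the pathway
    numerator = sum(
        comb(in_pathway, i) * comb(total - in_pathway, drawn - i)
        for i in range(overlap, min(in_pathway, drawn) + 1)
    )
    return numerator / comb(total, drawn)
```

A user would call `to_gene_symbols` on the proteomics hit list and then pass the resulting symbols to `enrichment_p_value` once per pathway. In practice each step typically lives in a different tool with its own file format, which is the fragmentation the authors describe.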
“For the inexperienced user, this can pose a significant barrier to entry,” the authors stated. This degree of complexity means that significant amounts of biological knowledge may be missed because of the fragmented nature of the OMICs datasets and their analysis tools. “Ideally, a broad range of biological relationships and classifications can be assessed within one integrated portal.”
The team has developed Metascape to harness what they describe as “the best practices of OMICs data analyses to have emerged over the last decade into one integrated portal.” Users can access Metascape as a web-based portal that integrates more than 40 open-access bioinformatics databases and enables integrated, simplified workflows for searches, gene annotation, and comparative analyses.
In their published paper, the team demonstrated the features and capabilities of Metascape using three prior genetic screens that investigated factors involved in influenza (flu) virus replication. The Metascape workflow integrated and analyzed data from the more than 40 included databases, spanning 10 common model organisms, and produced an easily understood report in about a minute. The team acknowledges that larger datasets may take longer to process.
Analysis tools can be accessed through a one-click Express Analysis interface, and results are communicated by way of an “article-like analysis report,” the team explained. Key data can be presented automatically in PowerPoint, Excel, and other reporting formats. To ensure that Metascape’s data remains as up to date as possible, the workflow incorporates a two-phase approach based on an initial, automated crawl of data sources, followed by manual quality control. “In the first phase, individual data sources are automatically crawled, wrangled, and assembled according to a predefined topological order of dependence, where gene identifier resources are processed before dependent annotation resources.” Any significant changes or unexpected errors trigger an alert to the Metascape quality control developers.
“In the second phase a Metascape quality control member inspects the graphical report for suspicious changes in record count,” the team continued. “Alerts received during the whole process are manually examined and addressed in parallel. Taken together, this workflow ensures that Metascape users are not adversely impacted by outdated data sources.”
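The two-phase update described above can be illustrated with a minimal Python sketch. It is not Metascape's actual code: the source names, the dependency graph, and the 20% change threshold are all hypothetical assumptions chosen for illustration. The first function orders sources so that each is processed after the resources it depends on. The second flags the kind of suspicious record-count change that a quality control reviewer would then inspect by hand.

```python
from graphlib import TopologicalSorter

# Hypothetical data sources, each mapped to the sources it depends on.
DEPENDENCIES = {
    "gene_identifiers": set(),
    "gene_annotations": {"gene_identifiers"},
    "pathway_membership": {"gene_identifiers"},
    "interaction_network": {"gene_identifiers", "gene_annotations"},
}


def crawl_order(dependencies):
    """Phase 1: return an order in which every source follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def record_count_alerts(previous, current, max_change=0.2):
    """Phase 2 aid: flag sources whose record count moved by more than max_change.

    max_change is a fraction of the previous count; the default of 0.2 is an
    assumed value, not one taken from the article.
    """
    alerts = []
    for source, new_count in current.items():
        old_count = previous.get(source)
        if not old_count:
            # Missing or zero baseline: a relative change cannot be computed.
            alerts.append(f"{source}: no usable baseline count")
            continue
        change = abs(new_count - old_count) / old_count
        if change > max_change:
            alerts.append(f"{source}: record count changed by {change:.0%}")
    return alerts
```

Processing in topological order means a gene identifier resource is always refreshed before any annotation resource that relies on it, so dependent sources are never assembled against stale identifiers.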
“Biologists seek answers to some of today’s most devastating diseases—from cancer to Alzheimer’s to infectious diseases, such as HIV or influenza (flu),” said Chanda. “By developing Metascape, we hope to help biologists to better understand their own data so they can uncover information that will lead to novel disease targets, improved vaccines, and new drugs to treat challenging diseases.”
Metascape has been in development since 2014, and a beta version of the software was initially released in late 2015. “Metascape has already facilitated the analysis and interpretation of large OMICs datasets in more than 330 published scientific studies,” added co-author Lars Pache, PhD, a research assistant professor at Sanford Burnham Prebys. “Due to its ease of use, we expect that it will soon become an indispensable platform that will help scientists decipher critical results in the era of big data.”
The Metascape developers are now turning to artificial intelligence to further increase the depth of knowledge and insights that can be provided by the platform. “By applying new machine learning tools to Metascape, we can help biologists uncover more nuances in their data that help scientists even better prioritize the direction they want to take their research,” said Zhou.