Although bench scientists understand the value of using gene-expression microarray technology, the full value of this technology is frequently inaccessible due to persistent roadblocks in microarray data analysis.
Ingenuity® iReport™ from Ingenuity Systems is an interactive visual report for researchers who need to quickly understand gene-expression data, identify novel biological insights, and generate testable hypotheses to drive the experiment-to-experiment cycle.
This tutorial describes the best practices used by iReport for statistical and biological analysis of genomics data, including novel tools for improved data visualization and insight discovery. The result is an easy-to-use, one-step solution for statistical and biological analysis of gene-expression data.
Pipeline for Statistical Analysis and Quality Control of Gene-Expression Data
Statistical analysis reduces instrument-level, per-sample measurements to a set of significantly differentially expressed genes (DEGs). iReport includes a robust, fully automated statistical analysis pipeline for microarray data, based on industry-standard open-source components, primarily the widely used Bioconductor software for the R statistical programming language. It employs quantile normalization, RMA summarization and background correction, empirical Bayes methods for batch effect correction (ComBat), and empirical Bayes linear models for statistical analysis (limma), which maximize its ability to detect differentially expressed genes. When sufficient experimental replicates are available, the pipeline corrects for multiple testing by controlling the false discovery rate with the Benjamini-Hochberg procedure.
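Two of the steps named above, quantile normalization and Benjamini-Hochberg adjustment, are simple enough to illustrate directly. The following is a minimal sketch in plain Python, not iReport's implementation (which, per the description above, builds on Bioconductor in R). The function names and the toy inputs are hypothetical, and ties in the quantile step are handled naively.

```python
def quantile_normalize(samples):
    """Force every sample onto the same empirical distribution.

    `samples` is a list of arrays (one list of intensities per array),
    all of equal length. Illustrative only: tied values are not averaged,
    unlike in production implementations.
    """
    n_probes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean intensity at each rank across arrays.
    rank_means = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_probes)
    ]
    normalized = []
    for s in samples:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in rank.
    for rank in range(m, 0, -1):
        gene = order[rank - 1]
        running_min = min(running_min, pvalues[gene] * m / rank)
        adjusted[gene] = running_min
    return adjusted


if __name__ == "__main__":
    arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]  # toy intensities
    print(quantile_normalize(arrays))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # toy p-values
```

Genes whose adjusted p-value falls below a chosen FDR threshold would then be reported as DEGs; the threshold itself is an analysis choice that this sketch does not fix.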
The statistical analysis pipeline helps the researcher control data quality by identifying outlier arrays, recognizing and evaluating the impact of batch effects, and alerting researchers to potential experimental design and statistical power problems (Figure 1).
The statistical pipeline currently supports gene-expression data from most human, mouse, and rat microarrays from Affymetrix, Illumina, and Agilent, as well as RNA-Seq data. It is extensible to other omics platforms, and support for qPCR is currently being developed.
Content and Algorithms for Biological Analysis of DEGs
Biological analysis of DEGs is often overlooked, but it is the critical step in getting rapid, complete value from microarray data and identifying insights for validation. Once the iReport statistical pipeline has identified a list of DEGs, the DEGs are sent through a series of automated biological analyses, whose output forms the basis of iReport.
This biological analysis begins with a series of queries to the content in the Ingenuity Knowledge Base, a database of over 3.5 million highly descriptive findings curated from the biomedical literature and structured for computation. This content allows iReport to relate the expression of individual genes and gene sets to known and experimentally demonstrated information on signaling and metabolic pathways, biological processes, cellular functions, diseases, and experimentally demonstrated molecular interactions (both physical and functional).
iReport then identifies the subset of most significantly overrepresented biological and cellular functions, pathways, and diseases from those queries using a standard bioinformatics enrichment tool, namely Fisher's exact test. (This process has been vetted in over 5,500 publications that cite IPA®, which uses the same statistical-enrichment process.) This helps researchers to understand putatively affected cellular functions and pathways and to identify potential markers of disease and key molecular interactions, all without relying on bioinformatics support or labor-intensive literature searches.
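To make the enrichment step concrete, the sketch below computes a one-sided Fisher's exact test p-value for over-representation of a gene set among the DEGs. It is an illustration under stated assumptions, not iReport's code: the function name, the choice of a one-sided test, and the counts in the example are all hypothetical, and the choice of background gene universe is left to the caller.

```python
from math import comb


def enrichment_pvalue(n_universe, n_in_set, n_degs, n_overlap):
    """One-sided Fisher's exact test for over-representation.

    n_universe: genes in the background (hypothetical choice of universe)
    n_in_set:   background genes annotated to the pathway or function
    n_degs:     differentially expressed genes
    n_overlap:  DEGs that are annotated to the pathway or function

    Returns the probability of seeing at least `n_overlap` annotated genes
    when `n_degs` genes are drawn at random from the background
    (the upper tail of the hypergeometric distribution).
    """
    max_overlap = min(n_in_set, n_degs)
    total = comb(n_universe, n_degs)
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_degs - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 20,000 background genes, 150 in the pathway,
    # 400 DEGs, 12 of which fall in the pathway.
    print(enrichment_pvalue(20000, 150, 400, 12))
```

In practice one such p-value would be computed per pathway, function, or disease, and the smallest values would identify the most significantly overrepresented categories.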