Pipeline for Statistical Analysis and Quality Control of Gene-Expression Data
Statistical analysis reduces instrument-level, per-sample measurements to a set of significantly differentially expressed genes (DEGs). iReport includes a robust, fully automated statistical analysis pipeline for microarray data. It is built on industry-standard open-source components, primarily the widely used Bioconductor software for the R statistical programming language. The pipeline applies RMA background correction and summarization, quantile normalization, empirical Bayes batch-effect correction (ComBat), and empirical Bayes linear models for differential expression (Limma), a combination chosen to improve its ability to detect differentially expressed genes. When sufficient experimental replicates are available, the pipeline corrects for multiple testing by controlling the false discovery rate (FDR) with the Benjamini-Hochberg procedure.
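Two of the steps named above, quantile normalization and the Benjamini-Hochberg adjustment, can be illustrated with a minimal sketch. This is not iReport's code (the pipeline itself relies on R and Bioconductor); it is a plain-Python illustration under simplifying assumptions. The function names `quantile_normalize` and `benjamini_hochberg` are hypothetical, the data layout (one list of expression values per sample) is assumed, and tied values are handled naively by sort order rather than by averaging.

```python
def quantile_normalize(samples):
    """Give every sample the same empirical distribution.

    `samples` is a list of samples, each a list of expression values
    for the same genes in the same order (an assumed layout).
    Ties are broken by sort order, which is a simplification.
    """
    n_genes = len(samples[0])
    if any(len(s) != n_genes for s in samples):
        raise ValueError("all samples must cover the same genes")

    # Mean of the k-th smallest value across samples, for each rank k.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[k] for s in sorted_samples) / len(samples)
        for k in range(n_genes)
    ]

    normalized = []
    for sample in samples:
        # Gene indices ordered from lowest to highest expression.
        order = sorted(range(n_genes), key=lambda i: sample[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        gene = order[rank - 1]
        running_min = min(running_min, pvalues[gene] * m / rank)
        adjusted[gene] = running_min
    return adjusted


if __name__ == "__main__":
    print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Genes whose adjusted p-value falls below a chosen FDR threshold would then be reported as DEGs; the threshold itself is an analysis choice and is not specified here.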
The statistical analysis pipeline also helps researchers control data quality by identifying outlier arrays, detecting batch effects and evaluating their impact, and alerting researchers to potential problems with experimental design and statistical power (Figure 1).
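The text does not say how outlier arrays are identified, so the following is only a hedged illustration of one common idea: an array whose expression profile correlates poorly with most other arrays is a candidate outlier. The function name `flag_outlier_arrays`, the use of the median Pearson correlation, and the cutoff of 0.8 are all assumptions made for this sketch, not details of iReport.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def flag_outlier_arrays(arrays, min_median_corr=0.8):
    """Return indices of arrays that correlate poorly with the rest.

    `arrays` holds one list of expression values per array.
    `min_median_corr` is a hypothetical cutoff chosen for illustration.
    """
    flagged = []
    for i, array in enumerate(arrays):
        # Correlation of this array with every other array.
        corrs = [
            pearson(array, other)
            for j, other in enumerate(arrays)
            if j != i
        ]
        # The median is less affected by a single bad partner array.
        if median(corrs) < min_median_corr:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    arrays = [
        [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],
        [0.9, 2.1, 2.9, 4.2, 4.8, 6.1],
        [6.0, 1.0, 5.0, 2.0, 4.0, 3.0],  # scrambled profile
    ]
    print(flag_outlier_arrays(arrays))
```

Using the median rather than the mean keeps one aberrant array from dragging down the score of every well-behaved array in a small experiment.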
The statistical pipeline currently supports gene-expression data from most human, mouse, and rat microarrays from Affymetrix, Illumina, and Agilent, as well as RNA-Seq data. It is extensible to other omics platforms, and support for qPCR is currently being developed.