RNA sequencing (RNA-seq), a means of profiling the transcriptome, is being used with increasing frequency to characterize a growing array of conditions, everything from prenatal birth defects to disorders of the elderly. Yet the technique, a relatively new application of next-generation sequencing, has not yet won the full confidence of patients, clinicians, and researchers. Just how accurate is this form of sequencing?
To answer that question, two initiatives, one led by the Sequencing Quality Control (SEQC) Consortium and the other by the Association of Biomolecular Resource Facilities (ABRF), undertook a series of studies. Both took up the challenge of rigorously defining the scope and sources of variation in RNA-seq data.
Many of the findings from these initiatives appeared in the September issue of Nature Biotechnology, which focused on the performance of RNA-seq. In particular, the issue emphasized large-scale studies involving data generated at multiple sequencing sites or with multiple platforms or protocols:
- “A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium”: This study assessed RNA-seq performance for junction discovery and differential expression profiling, and compared it with microarray and quantitative PCR (qPCR) data using complementary metrics. It concluded that RNA-seq can be “a versatile tool for relative expression profiling, with comparable or superior performance to microarrays in many applications given sufficient read depth and appropriate choice of analysis pipeline.”
- “Multi-platform assessment of transcriptome profiling using RNA-seq in the ABRF next-generation sequencing study”: In this study, researchers carried out replicate experiments at 15 laboratory sites, using reference RNA standards to test four protocols on five sequencing platforms. The results: “high intraplatform and inter-platform concordance for expression measures across the deep-count platforms, but highly variable efficiency and cost for splice junction and variant detection between all platforms.”
- “The concordance between RNA-seq and microarray data depends on chemical treatment and transcript abundance”: Noting that the concordance of RNA-seq with microarrays for genome-wide analysis of differential gene expression had not been rigorously assessed across a range of chemical treatment conditions, the authors generated Illumina RNA-seq and Affymetrix microarray data from the same liver samples of rats exposed, in triplicate, to varying degrees of perturbation by 27 chemicals representing multiple modes of action (MOAs). They reported that RNA-seq outperformed microarrays in verifying differentially expressed genes by qPCR (93% versus 75%), with the gain due mainly to its improved accuracy for low-abundance transcripts. “Nonetheless, classifiers to predict MOAs perform similarly when developed using data from either platform. Therefore, the endpoint studied and its biological complexity, transcript abundance and the genomic application are important factors in transcriptomic research and for clinical and regulatory decision making.”
Although the issue also carried two analyses focused on computational biology and two News and Views articles, these three research articles formed the core of its RNA-seq coverage.
Major contributions to these articles came from the Mayo Clinic in Florida, the Beijing Genomics Institute, and Weill Cornell Medical College. Laboratory groups at these institutions sequenced the same RNA samples multiple times.
Each site generated more than 1 billion nucleotides of sequencing data. The data were then analyzed under the direction of the FDA with the assistance of a large group of academic and industrial statisticians. The researchers also examined the current technologies and major biochemical methods used across 30 RNA-sequencing labs, involving hundreds of researchers. In addition, they found that RNA can be accurately extracted and analyzed from severely degraded samples, such as tissue samples that have been stored for many years.
“It was determined that there is very strong agreement between the sequence data generated by experienced sequencing laboratories,” said E. Aubrey Thompson, Ph.D., a professor of cancer biology at the Mayo Clinic. “The studies now establish the best practice for all laboratories to use, so that results are reliable and reproducible across laboratories.”