Massively parallel cDNA sequencing, or RNA-seq, offers the potential to rapidly characterize and quantify transcriptomes. It is a young and evolving field, however, and its advances and opportunities come with challenges.
“RNA-seq is currently in its early stages much like the way microarrays were 10 years ago,” notes Kellie J. Archer, Ph.D., associate professor, department of biostatistics, Virginia Commonwealth University.
“It is a wonderful tool with exciting possibilities. Aside from gene expression, we can look also at exon expression, identify microRNA precursors, etc. Even one run provides an enormous amount of such information. But we first must address a number of important issues.”
Dr. Archer says one such challenge is mapping. “How do we map RNA sequences to a reference genome?
“Mapping sequences in which introns have been removed by cis-splicing can be accomplished, but how do we effectively handle alternative splicing? How do we take quality of reads into account in downstream analyses? There is a lot of research as to the most efficient method to use for mapping, and there are many tools emerging. But it is not yet clear which is the best and most accurate.”
A second issue is how to perform statistical analysis.
“With RNA-seq, the assay returns number of reads per sequence, not a continuous variable reflecting relative abundance (as is the case with traditional gene-expression microarrays). If one merges data across samples, several sequences will have zero counts and the data range can be quite large, so the normal distribution no longer holds. Therefore, we can’t employ commonly used statistical tests such as t-tests.
“Earlier papers examining technical replicates used a Poisson distribution, but more recent studies involving biological replicates suggest a negative binomial model may handle the overdispersion more accurately.”
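The overdispersion Dr. Archer describes can be illustrated with a small sketch. Under a Poisson model the variance of the read counts equals their mean, whereas one common negative binomial parameterization allows variance = mean + alpha × mean², with alpha as an extra dispersion term. The read counts below are invented for illustration, the function name `dispersion_summary` is hypothetical, and the method-of-moments estimate of alpha is a simple stand-in rather than the procedure used in any particular study.

```python
from statistics import mean, variance


def dispersion_summary(counts):
    """Summarize read counts for one sequence across several samples.

    Returns (mean, sample variance, variance/mean ratio, alpha), where alpha
    is a method-of-moments estimate of the negative binomial dispersion in
    the parameterization variance = mean + alpha * mean**2.
    """
    mu = mean(counts)
    var = variance(counts)  # sample variance, n - 1 denominator
    if mu == 0:
        # All-zero counts carry no information about dispersion.
        return mu, var, float("nan"), 0.0
    ratio = var / mu  # roughly 1 if the counts behave like Poisson draws
    # Clamp at zero: a negative estimate means no overdispersion was detected.
    alpha = max(0.0, (var - mu) / mu ** 2)
    return mu, var, ratio, alpha


# Hypothetical counts for the same sequence, both sets with a mean of 100.
technical_replicates = [88, 110, 104, 92, 106]   # same library, re-sequenced
biological_replicates = [40, 180, 95, 12, 173]   # different individuals

for label, counts in [("technical", technical_replicates),
                      ("biological", biological_replicates)]:
    mu, var, ratio, alpha = dispersion_summary(counts)
    print(f"{label}: mean={mu}, variance={var}, "
          f"variance/mean={ratio:.2f}, alpha={alpha:.3f}")
```

With these made-up numbers, the technical replicates have a variance close to their mean, consistent with a Poisson model, while the biological replicates have a variance far above the mean, which is the situation a negative binomial model is meant to absorb.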
A third issue is the presence of technical artifacts. “We don’t yet know how to address the fact that different RNA-seq technologies aren’t directly comparable. Initially it was expected that RNA-seq would reveal the truth about number of transcripts in a sample, but we see artifacts from different high-throughput sequencing technologies.”
Dr. Archer believes these problems will be solved, just as they were for microarrays. “The field is definitely moving from the traditional microarray platform in the direction of RNA-seq. Aside from the cost and instrumentation needed, the technical challenges will be solved as the field progresses. Companies are constantly improving their platforms and seeking to give the best sequencing performance at the lowest costs. This field will progressively provide a fuller and complete knowledge of both the qualitative and the quantitative aspects of RNA biology and thus gene expression.”