For more than a decade, microarrays have been the tool of choice for analyzing the transcriptome. With the advent of next-generation sequencing technologies and decreasing costs per base, a new approach has emerged: mRNA-Seq, the sequencing of full-length cDNAs. Because this approach does not rely on nucleic acid hybridization, researchers gain a more complete view of the transcriptome and can detect rare transcripts and isoforms.
mRNA-Seq from Illumina uses the Genome Analyzer to evaluate RNA. This assay quantifies the expression levels of all mRNA molecules. Because this method is based on sequencing and not hybridization, it provides an unbiased, probe-less measurement of all mRNA molecules in a sample—even with organisms for which microarrays are not available.
Besides profiling RNA expression levels, each individual sequence measurement can be used to discover new transcripts, annotate gene structure predictions, study patterns of alternative splicing, and characterize sequence polymorphisms in the transcriptome. mRNA-Seq with the Genome Analyzer is already being used by researchers to study disease processes and to learn more about the structure and regulation of transcription across the genome.
The mRNA-Seq sample-prep protocol converts RNA into cDNA libraries to be sequenced using Illumina’s ultrahigh-throughput DNA sequencing platform. The protocol starts with as little as one microgram of total RNA (without any preamplification), purified from any eukaryotic organism.
The first step in the process isolates mRNA molecules using a poly-A selection method. After the mRNA molecules are purified, the assay uses a random priming process to create cDNA fragments evenly across the entire length of the RNA molecule, without any bias toward the 5' or 3' end of the mRNA. These short cDNA fragments are prepared for sequencing on the Genome Analyzer, which is currently capable of producing up to 80 million sequences from the mRNA-Seq kit every two days.
To put that in perspective, one run of the mRNA-Seq assay produces more sequence tags than all the expressed sequence tag (EST) data, for all organisms, ever deposited in GenBank. This level of data output allows researchers to study the transcriptome at a level of detail that was previously not possible.
Like EST sequencing of the past, the data produced by the mRNA-Seq assay can be used for many different purposes. The assay provides a sensitive, accurate digital count—across a large dynamic range—of all expressed mRNA molecules in any sample. Because of the parallel nature of the Illumina Genome Analyzer, it is possible to reproducibly detect counts of mRNA molecules that are present at less than one copy per cell. Since the signals in the assay are based upon DNA sequencing and not hybridization, the assay is specific, with few false positives.
Analyzing Expression Levels
As an example of the quantitative ability of the assay, we have used mRNA-Seq to analyze expression levels of all genes in two heavily studied samples that were part of the Microarray Quality Control (MAQC) project. These two samples, the universal human reference RNA and a mixed whole human brain sample, have been intensely studied using all of the major microarray platforms as well as quantitative PCR.
Figure 1 compares the fold-change levels calculated using the digital counts from the mRNA-Seq assay with the same results from quantitative PCR assays for about 750 genes. The overall correlation of the data between these two different assays is a confirmation of the accuracy and range of the mRNA-Seq assay.
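The comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the analysis used to produce Figure 1: the function names, the reads-per-million normalization, the pseudocount, and the sample values are all assumptions made for the example.

```python
import math

def log2_fold_change(count_a, count_b, total_a, total_b, pseudocount=1.0):
    """Log2 ratio of library-size-normalized digital counts.

    count_a, count_b: reads assigned to one gene in samples A and B.
    total_a, total_b: total reads in each sample (assumed normalization).
    pseudocount: hypothetical guard against division by zero.
    """
    rpm_a = (count_a + pseudocount) / total_a * 1e6
    rpm_b = (count_b + pseudocount) / total_b * 1e6
    return math.log2(rpm_a / rpm_b)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up counts for three genes in two samples of one million reads each,
# plus made-up qPCR log2 fold changes for the same genes.
seq_fc = [log2_fold_change(a, b, 1_000_000, 1_000_000)
          for a, b in [(399, 99), (49, 199), (999, 999)]]
qpcr_fc = [2.1, -1.9, 0.1]
agreement = pearson(seq_fc, qpcr_fc)
```

A correlation close to 1 across many genes would indicate that the two assays report similar fold changes.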
In addition to measuring digital gene-expression levels for all transcripts, the mRNA-Seq assay can be used to study transcript structure in any gene that is covered by many reads spread along the length of the original mRNA molecule. These reads provide information about alternative splicing, since many of them span exon-exon junctions formed during normal mRNA processing. Because such sequences occur only as a result of mRNA splicing, reads that cross a splice junction usually do not fully align back to the genome. Instead, these reads provide specific evidence of the order of exons within a given transcript.
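One simple way to find such junction reads is to build the sequences that would result from joining annotated exons and then look for reads that cross the join. The sketch below assumes exact matching and a hypothetical minimum overhang on each side of the junction. It is not how GenomeStudio or any particular aligner works, and the function names and toy sequences are invented for illustration.

```python
def build_junctions(exons, read_len):
    """Build candidate exon-exon junctions for one gene.

    exons: list of (name, sequence) tuples in transcript order.
    Returns {(upstream, downstream): (left_flank, right_flank)}, where each
    flank is at most read_len - 1 bases, so any read that matches the joined
    flanks must cross the junction.
    """
    flank = read_len - 1
    junctions = {}
    for i in range(len(exons)):
        for j in range(i + 1, len(exons)):
            name_i, seq_i = exons[i]
            name_j, seq_j = exons[j]
            junctions[(name_i, name_j)] = (seq_i[-flank:], seq_j[:flank])
    return junctions

def match_junction(read, junctions, min_overhang=4):
    """Return the junctions that a read spans with an exact match.

    min_overhang is an assumed threshold: the read must cover at least
    this many bases on both sides of the junction.
    """
    hits = []
    for key, (left, right) in junctions.items():
        pos = (left + right).find(read)
        if pos == -1:
            continue
        bases_upstream = len(left) - pos
        bases_downstream = len(read) - bases_upstream
        if bases_upstream >= min_overhang and bases_downstream >= min_overhang:
            hits.append(key)
    return hits

# Toy gene with three exons; the second read skips exon E2.
exons = [("E1", "AAAACCCCGG"), ("E2", "TTTTGGGGAA"), ("E3", "CATCATCATC")]
junctions = build_junctions(exons, read_len=8)
constitutive = match_junction("CCGGTTTT", junctions)
exon_skipping = match_junction("CCGGCATC", junctions)
```

In this toy case the first read supports the E1-E2 junction and the second supports an E1-E3 junction, which is the kind of evidence that points to an exon-skipping isoform.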
Figure 2 is a visualization of junction reads that shows how they can be used to study alternative splicing. This figure is a screenshot taken from the GenomeStudio™ Software suite developed by Illumina to help users analyze and interpret the data from the mRNA-Seq assay. The software displays tables of quantitative data in the form of SNPs and digital counts associated with known genes, exons, and splice junctions. In addition, the visualization tools can be used to help annotate new transcripts and understand the complexities of alternative splicing.
The mRNA-Seq assay can be used to study polymorphisms in the transcriptome, and as a tool to gain insight into the genetics of transcription. The GenomeStudio software reports every position in the sample where the called consensus base differs from the human reference genome.
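The idea of comparing a consensus base with the reference can be illustrated with a small sketch. It assumes the reads are already aligned and summarized as a per-position pileup of observed bases. The function name, the depth and fraction thresholds, and the data layout are hypothetical and do not describe how GenomeStudio calls bases.

```python
from collections import Counter

def call_differences(reference, pileup, min_depth=5, min_fraction=0.8):
    """Report positions where the consensus base differs from the reference.

    reference: reference sequence as a string (0-based positions).
    pileup: {position: string of bases observed in reads at that position}.
    min_depth, min_fraction: assumed thresholds for trusting a consensus.
    Returns a list of (position, reference_base, consensus_base, depth).
    """
    differences = []
    for pos, bases in sorted(pileup.items()):
        depth = len(bases)
        if depth < min_depth:
            continue  # too few reads to call a consensus
        base, count = Counter(bases).most_common(1)[0]
        if count / depth < min_fraction:
            continue  # no clear consensus at this position
        if base != reference[pos]:
            differences.append((pos, reference[pos], base, depth))
    return differences

# Toy example: position 3 shows a consistent T-to-G difference.
reference = "ACGTAC"
pileup = {1: "CCCCCC", 3: "GGGGGT", 4: "AAA"}
putative_snps = call_differences(reference, pileup)
```

For a pooled sample, a single consensus per position hides minor alleles, so a lower `min_fraction` or a per-allele report may be more appropriate.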
The software automatically creates an Allele Table of putative coding SNPs from every mRNA-Seq assay. Figure 3 shows the results of analyzing SNPs in the MAQC human brain sample, a mixture of whole-brain RNA from 23 individuals. The figure shows a clear example of a novel SNP that occurs in the coding region of a known gene.
The software color-codes sequence differences, and represents newly discovered SNPs with red characters. After all SNPs have been characterized across each transcript, it is straightforward to use these differences to study allele-specific expression patterns for each gene across individual samples. This method will eventually be used to improve our understanding of the relationship between genetics and gene expression.
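As one possible illustration of allele-specific expression analysis, the sketch below tests whether reads covering a heterozygous coding SNP are split evenly between the two alleles. The choice of an exact two-sided binomial test against a 50:50 expectation is an assumption made for this example, not a method described for the assay, and the function name and read counts are hypothetical.

```python
from math import comb

def allelic_imbalance_p(ref_reads, alt_reads):
    """Two-sided exact binomial p-value against a 50:50 allele ratio.

    ref_reads, alt_reads: reads carrying the reference and alternate allele
    at one heterozygous SNP. Because the null distribution is symmetric, the
    two-sided p-value is twice the smaller tail, capped at 1.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    k = min(ref_reads, alt_reads)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Made-up read counts at two SNPs in one individual sample.
balanced = allelic_imbalance_p(52, 48)
skewed = allelic_imbalance_p(90, 10)
```

A small p-value suggests that one allele is preferentially expressed, although real analyses would also need to account for mapping bias toward the reference allele and for testing many SNPs at once.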
mRNA-Seq is a product that combines all the benefits of microarrays, quantitative PCR, and EST sequencing into one powerful new assay. It can be used to study the transcriptome of any organism and is not biased by what is or is not known about any genome. The assay offers a combination of accurate and precise quantification coupled with hypothesis-free, open-ended discovery—all in one experiment.