Apr 1, 2012 (Vol. 32, No. 7)

Exploiting Gene-Expression Data

  • RNA-Seq

    The emerging field of massively parallel cDNA sequencing, or RNA-seq, provides exciting potential to rapidly characterize and quantify transcriptomes. It is a young and evolving field, however, with challenges accompanying advances and opportunities.

    “RNA-seq is currently in its early stages much like the way microarrays were 10 years ago,” notes Kellie J. Archer, Ph.D., associate professor, department of biostatistics, Virginia Commonwealth University.

    “It is a wonderful tool with exciting possibilities. Aside from gene expression, we can look also at exon expression, identify microRNA precursors, etc. Even one run provides an enormous amount of such information. But we first must address a number of important issues.”

    Dr. Archer says one such challenge is the issue of mapping. “How do we map RNA sequences to a reference genome?

    “Mapping sequences in which introns have been removed by cis-splicing can be accomplished, but how do we effectively handle alternative splicing? How do we take quality of reads into account in downstream analyses? There is a lot of research as to the most efficient method to use for mapping, and there are many tools emerging. But it is not yet clear which is the best and most accurate.”

    A second issue is how to perform statistical analysis.

    “With RNA-seq, the assay returns number of reads per sequence, not a continuous variable reflecting relative abundance (as is the case with traditional gene-expression microarrays). If one merges data across samples, several sequences will have zero counts and the data range can be quite large, so the normal distribution no longer holds. Therefore, we can’t employ commonly used statistical tests such as t-tests.

    “Earlier papers examining technical replicates used a Poisson distribution, but more recent studies involving biological replicates suggest a negative binomial model may handle the overdispersion more accurately.”
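
    The overdispersion Dr. Archer describes can be illustrated with a minimal sketch. It assumes the usual negative binomial parameterization, in which the variance equals the mean plus a dispersion term times the squared mean, and estimates that dispersion by the method of moments. The function name and the read counts are hypothetical, made up for illustration; they do not come from any study cited here.

```python
from statistics import mean, variance


def dispersion_estimate(counts):
    """Method-of-moments estimate of the negative binomial dispersion (alpha).

    Poisson model:            variance == mean
    Negative binomial model:  variance == mean + alpha * mean**2
    Solving for alpha gives (variance - mean) / mean**2, floored at zero.
    """
    m = mean(counts)
    v = variance(counts)  # sample variance (n - 1 denominator)
    if m == 0:
        return 0.0
    return max(0.0, (v - m) / m ** 2)


# Hypothetical read counts for a single gene (illustrative values only).
technical_replicates = [98, 103, 95, 110, 101]    # same library, re-sequenced
biological_replicates = [40, 180, 95, 260, 12]    # different individuals

alpha_technical = dispersion_estimate(technical_replicates)
alpha_biological = dispersion_estimate(biological_replicates)

print("technical alpha: ", round(alpha_technical, 3))
print("biological alpha:", round(alpha_biological, 3))
```

    In this toy example the technical replicates show no variance beyond the mean, which is consistent with a Poisson model, while the biological replicates show a clearly positive dispersion. Production tools estimate dispersion far more carefully, typically by sharing information across genes.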

    A third issue is the presence of technical artifacts. “We don’t yet know how to address the fact that different RNA-seq technologies aren’t directly comparable. Initially it was expected that RNA-seq would reveal the truth about number of transcripts in a sample, but we see artifacts from different high-throughput sequencing technologies.”

    Dr. Archer believes these problems will be solved, just as they were for microarrays. “The field is definitely moving from the traditional microarray platform in the direction of RNA-seq. Aside from the cost and instrumentation needed, the technical challenges will be solved as the field progresses. Companies are constantly improving their platforms and seeking to give the best sequencing performance at the lowest costs. This field will progressively provide a fuller and more complete knowledge of both the qualitative and the quantitative aspects of RNA biology and thus gene expression.”

  • Diverse Techniques for Studying Gene Expression

    To detect mutations in cancer samples for their gene expression study, researchers from Foundation Medicine made use of the HiSeq 2000 system from Illumina.

    In a recent study in Respiratory Research entitled “Systems-level comparison of host responses induced by pandemic and seasonal influenza A H1N1 viruses in primary human type I-like alveolar epithelial cells in vitro,” a research team from China and Canada utilized gene-expression analysis to compare transcriptional responses to infection with a seasonal H1N1 influenza virus or a pandemic H1N1 influenza virus isolated during the 2009 influenza pandemic.

    Based on the published data, scientists at Ingenuity Systems tested the ability of a new web-based report to correctly identify expected results and gain biological insights. Using iReport, the Ingenuity Systems researchers not only validated the published results, they also identified additional genes, pathways, and processes involved in seasonal H1N1 influenza compared to pandemic influenza infection from the same data, said Megan Laurance, Ph.D., product manager at Ingenuity Systems.

    “Forty-three other genes encoding zinc finger proteins as well as nine other genes encoding small nucleolar RNAs were observed to be downregulated,” explained Dr. Laurance. “In addition, cytosolic pattern-recognition receptors were activated in response to seasonal H1N1 infection.

    “In less than two days, we were able to confirm the presence of particular pathways and processes using a single tool that tackles both the statistical and biological analyses of gene expression data. iReport expounded upon the findings presented by Lee et al. and provided additional genes of interest for future studies in the areas of transcription and mRNA transport, which are downregulated upon seasonal H1N1 infection but not pandemic H1N1 infection.”

    GeneGo, a Thomson Reuters Business

    Current approaches to deriving genomic biomarkers can produce reasonably accurate biomarkers, but these lack robustness and cannot generally be linked biologically to the endpoint, according to scientists at Thomson Reuters.

    One barrier to the more extensive use of these genomic biomarkers is the difficulty in determining the biological relevance of the signatures from the classifying genes identified, limiting their utility for risk assessment, they said.

    In a poster entitled “A Novel Method for Deriving Mechanistically-Anchored Gene Expression Biomarkers,” Richard J. Brennan, Ph.D., et al. at Thomson Reuters described the development of a new technique for deriving genomic signatures using discrete modules of genes representing a variety of biological pathways and functional categories. They also discussed a two-stage machine-learning approach that identifies individual modules with classification power and combines them into a meta-signature to optimize predictive performance.

    “These ‘functional descriptors’ have comparable performance to gene signatures for the same endpoint generated using other supervised machine-learning methods,” explained Dr. Brennan. “A functional descriptor predicting renal tubule injury was derived with an estimated sensitivity of 81.7% and specificity of 98.0%, comparable to the performance of a standard gene signature on the same training set (83% and 94%, respectively).”
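
    For readers less familiar with these metrics, sensitivity and specificity follow directly from a classifier's confusion-matrix counts. The sketch below assumes a simple binary classifier; the function names and the counts are hypothetical and are not taken from the Thomson Reuters poster.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly positive samples that the classifier flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of truly negative samples that the classifier clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical confusion-matrix counts (illustrative values only).
true_pos, false_neg = 45, 5     # samples with tubule injury
true_neg, false_pos = 90, 10    # samples without tubule injury

sens = sensitivity(true_pos, false_neg)
spec = specificity(true_neg, false_pos)

print(f"sensitivity: {sens:.1%}")
print(f"specificity: {spec:.1%}")
```

    A signature with high specificity but lower sensitivity, as reported above, rarely raises a false alarm but misses some true cases.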

    Functional descriptors also encompass information about the pathways and metabolic processes involved, leading to an understanding of the biological relevance of the signature, he continued. Classification of tubule toxicity was based on perturbation of pathways involved in cytoskeletal remodeling, lipid metabolism, vitamin D signaling, and amino acid metabolism among others.

    “Functional descriptors therefore hold the promise of combining predictive and mechanistic systems toxicology,” added Dr. Brennan.

    “The Functional Descriptor™ approach leverages a manually curated knowledge base of functional categories to derive a series of signatures for an endpoint using these predefined gene sets as features. Querying the descriptor set for a class prediction permits an investigation into the contributing biological properties.”

    Expression Analysis

    Officials at Expression Analysis (EA) maintain that RNA-Seq has gained much interest due to the potential performance benefits relative to gene-expression microarrays. They cite a number of expected advantages, including unbiased content, more precise quantification, detection of novel isoforms, and detection of structural variation.

    “Each of these measures is dependent on the read length, number of reads generated, and other factors that comprise the sequencing strategy,” they said.

    “There has been some reluctance to switch from array to sequencing-based expression studies because of cost and the availability of bioinformatic tools to support RNA-Seq datasets,” explained Steve McPhail, CEO. “Most importantly, there has been no way to compare years of existing array-based datasets to RNA-Seq datasets.”

    EA recently completed a performance comparison of various RNA-Seq strategies to microarrays in a real-world experimental scenario. The experiment consisted of 15 breast cancer cell lines (five unique lines representing each of three breast cancer subtypes). It revealed that, at 12 million sequencing reads, 25–35% more genes were found to be differentially expressed than with microarrays.

    The study reportedly also demonstrated a 50% increase in the detection of genes and a 500% increase in isoform detection. At 25 million sequencing reads, the numbers rose to a 40–50% increase in the number of genes found to be differentially expressed, a 67% increase in the detection of genes, and a 550% increase in isoform detection.

    “EA has developed a tool to map a portion of the sequencing data to Affymetrix probe sets thus enabling researchers to directly compare their existing array-based datasets to RNA-Seq data,” said McPhail. “This tool presents data in a CEL file format also allowing researchers to utilize their existing array-based data-analysis pipelines to become familiar with the power of RNA-Seq datasets.”


    Detailed in a poster entitled “Detection of cancer-associated mutations, rearrangements and gene-expression changes by targeted deep sequencing of FFPE RNA and DNA,” a research team described the development of FFPE-compatible targeted RNA sequencing and analysis methods for the study of over 200 cancer-associated genes.

    Geoff Otto and colleagues from Foundation Medicine and the Albany Medical College used protocols, validated on cell lines where known mutations and gene fusions (e.g., BCR-ABL1) were detected, to characterize 49 FFPE non-small-cell lung cancer tumors.

    Technical reproducibility in digital expression profiling was r > 0.99 for cell lines and r > 0.95 for FFPE RNA, according to the scientists, who added that RNA-seq provided evidence of alterations in the genome, including point mutations and novel rearrangements involving known oncogenes.
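
    A reproducibility figure of this kind is typically a correlation between expression values from replicate runs. The sketch below assumes a Pearson correlation on log2-transformed counts, which is a common convention but is not confirmed as the method used in the poster. The function name and the read counts are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-gene read counts from two replicate runs of one sample.
run_a = [12, 250, 33, 1800, 4, 760]
run_b = [15, 240, 30, 1750, 6, 790]

# Log-transform with a pseudocount of 1 so that zero counts stay defined
# and highly expressed genes do not dominate the correlation.
log_a = [math.log2(c + 1) for c in run_a]
log_b = [math.log2(c + 1) for c in run_b]

r = pearson_r(log_a, log_b)
print("replicate correlation r =", round(r, 4))
```

    Correlations computed this way tend to be high whenever expression spans several orders of magnitude, so they are best read alongside other quality measures.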

    Differential expression of oncogenes including EGFR, KIT, and RET was also revealed, ranging from 2- to 50-fold across different tumors. Combination of RNA and DNA sequencing data on identical FFPE samples corroborated functional consequences of genomic alterations. Examples included expression of mutated KRAS and TP53 alleles and reduced STK11 expression in a tumor that had a homozygous deletion at the DNA level.

    Foundation Medicine used Illumina’s HiSeq 2000 system for detection of mutations in cancer samples. The scientists reported that they were able to achieve a high level of sensitivity for detecting mutations in cancer-related genes without any a priori assumptions about the specific mutations.

    “Application of [Illumina’s] next-generation sequencing technologies to FFPE RNA and integration with extant DNA sequencing methods is anticipated to expand our understanding of clinically relevant cancer biology and improve patient care,” noted the researchers, who conclude that targeted RNA-seq of FFPE RNA is highly reproducible and preserves transcript abundance.

    “RNA-seq results were highly concordant with DNA-seq, and 93% of somatic hotspot mutations were detected along with several gene fusions,” the scientists pointed out. “Integration with DNA-seq enabled comprehensive molecular profiling of an FFPE tumor sample with high sensitivity and specificity to mutations, copy number alterations, gene fusions and changes in gene expression.”
