April 1, 2013 (Vol. 33, No. 7)

Richard A. A. Stein M.D., Ph.D.

The complex and dynamic transcriptional patterns unveiled by the ENCyclopedia of DNA Elements (ENCODE) project, together with the finding that less than 2% of the transcriptional output of the human genome encodes proteins while approximately 98% consists of noncoding RNAs, are among the advances that reshaped the field and even required us to revisit the definition of the gene.

While insights into the genome have repeatedly been a source of thought-provoking findings, the transcriptome, with its unprecedented and unexpected levels of complexity, promises to be even more intriguing. The emergence of RNA-Seq allowed quantitative and high-throughput analyses of the transcriptome to be performed in different cell types and under various conditions, and with the massive amounts of data that have been generated, computational analysis is emerging as one of the most critical challenges.

“Two of the basic problems in transcriptome analysis are identifying the true sets of transcripts in a given tissue at a given time, and defining the dynamics of gene expression,” says Zhong Wang, Ph.D., staff scientist and group lead for genome analysis at the DOE Joint Genome Institute.

The superior sensitivity and accuracy of RNA-Seq, along with its ability to measure transcript isoform levels and to reconstruct transcriptomes even in the absence of a reference genome, made it a method of choice for transcriptome analysis. However, sequence reads generated by existing platforms are often short, and this represents one of the challenges in the field.

“Pairwise gene expression analyses are routinely performed to compare genes or gene sets that are differentially expressed between cancer tissues and normal tissues,” says Dr. Wang. Among the numerous statistics and bioinformatics challenges, one relates to the difficulties that accompany choosing the most appropriate algorithms. “The statistics very much depend on the types of data-analysis software one wants to use,” he says.

There are situations when the expression level of most genes does not change between two conditions and, in this case, certain assumptions and specific types of statistical analyses are more applicable. In other instances, when the expression levels of most genes change, a different type of analysis might be recommended, and more biological replicates are needed to increase the statistical power.
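One widely used way of acting on the first assumption is median-of-ratios normalization, which estimates a scaling factor for each sample on the premise that most genes are unchanged. The following is a minimal Python sketch of that idea, not a method attributed to Dr. Wang; the function name `size_factors` and the toy counts are invented for illustration.

```python
import math
from statistics import median

def size_factors(counts):
    """Estimate per-sample scaling factors by the median-of-ratios idea.

    counts: dict mapping sample name -> list of raw read counts,
            with genes in the same order in every list.
    Assumes most genes are NOT differentially expressed.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Keep only genes with a nonzero count in every sample so logs are defined.
    usable = [g for g in range(n_genes)
              if all(counts[s][g] > 0 for s in samples)]
    # Per-gene reference: log of the geometric mean across samples.
    log_ref = {g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
               for g in usable}
    # A sample's factor is the median of its ratios to the reference.
    return {s: math.exp(median([math.log(counts[s][g]) - log_ref[g]
                                for g in usable]))
            for s in samples}

# Toy data (invented): the tumor library is sequenced twice as deeply,
# and only the last gene truly changes.
toy = {"normal": [10, 20, 30, 40, 50],
       "tumor":  [20, 40, 60, 80, 500]}
factors = size_factors(toy)
normalized = {s: [c / factors[s] for c in toy[s]] for s in toy}
```

In this toy example four of the five genes differ by the same twofold ratio, so the two factors differ by a factor of two and only the fifth gene stands out after scaling. When most genes do change, the median no longer tracks sequencing depth, which is one reason a different analysis is called for in that situation.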

“Different algorithms are needed for different datasets, and this requirement is best defined by the specific biological question that is being addressed. I don’t think there is a simple solution to address this challenge, it is more like an art,” explains Dr. Wang.

In many industrialized countries, cancer and heart disease are the two most prevalent causes of mortality. Head and neck cancers constitute the sixth most frequent malignancy worldwide, and squamous cell carcinomas, the vast majority of malignancies in this group, represent a significant concern, particularly due to their dire prognosis.

Most patients with head and neck cancers have a history of smoking and drinking. While the incidence of head and neck cancers at most sites has dramatically dropped in the U.S. since World War II, along with a decrease in smoking, oropharyngeal cancer appears to be an exception because its incidence has increased in people who are younger and lack a history of smoking and drinking.

“A potential explanation for this trend appears to be the infection with human papilloma viruses,” says David I. Smith, Ph.D., professor of laboratory medicine and pathology at Mayo Clinic. Human papilloma viruses were also implicated in cervical cancer, where viral integration into the host genome represents an essential step during malignant transformation.

To better understand the involvement of human papilloma viruses in oropharyngeal cancer, Dr. Smith and colleagues used RNA-Seq in combination with exome sequencing to perform a whole-transcriptome analysis in oropharyngeal carcinoma patients, including current smokers, never smokers, and ex-smokers with at least 10–15 years of smoking cessation.

“RNA-Seq provides more information and, overall, is a much more comprehensive approach than microarrays to explore the transcriptome,” says Dr. Smith. The analysis revealed that certain genes are differentially expressed among the three groups, and the increased expression of genes involved in DNA repair in human papilloma virus-negative current smokers, as compared to the two other groups, emerged as a distinguishing feature.

While transcriptomics is increasingly becoming routine in the clinic, the bottleneck of data analysis is emerging as one of its most acute challenges. “In the next couple of years we will see an absolute revolution in understanding alterations that occur in cancer and, most importantly, we will be able to design therapies targeting those specific alterations,” says Dr. Smith.

Graphic illustration of a cancer cell. While scientists have been using DNA microarrays to yield information about the molecular heterogeneity of cancer, analyses that evaluate cancer transcriptome information alongside other data will be able to extract deeper biological insights. [Mopic/Fotolia]

Cancer Stem Cells

“Understanding the involvement of stem cells in malignant transformation is of great interest,” says Rolf I. Skotheim, Ph.D., group leader in genome biology and member of the Cancer Stem Cell Centre at Oslo University Hospital.

The cancer stem cell theory proposes that a small subpopulation of cells in malignant tumors, which possess stem-cell-like properties, is able to sustain the cancer owing to an unlimited self-renewing capacity, a characteristic shared with stem cells from nonmalignant tissues. The ability of cancer stem cells to evade conventional chemotherapy makes them of additional interest for therapeutic reasons.

While putative cancer stem cells have been identified for several types of malignant tumors, and surface markers associated with the stem cell phenotype have been defined, one of the challenges lies in the difficulty of separating stem cells from the other malignant tumor cells.

“Testicular cancer provides a unique model to perform genetic analyses of malignant stem cells,” says Dr. Skotheim. The uniqueness of testicular germ cell tumors lies in the fact that the embryonal carcinoma cell type in testicular cancer has stem cell characteristics very similar to those of pluripotent embryonic stem cells. Thus, researchers have a sufficient supply of both true cancer stem cells and relevant nonmalignant control stem cells.

To identify malignancy-specific gene expression differences between cancer cells and stem cells, Dr. Skotheim and colleagues compared exon-level transcriptomic profiles between several embryonal carcinoma cell lines and nonmalignant embryonic stem cell lines cultured under comparable growth conditions.

“If you take a view at the transcriptomes, they are virtually identical, but a few genes are different, and those are the ones that could help understand key changes that make one cell malignant and other one nonmalignant,” says Dr. Skotheim.

The goal of new technologies is to enable scientists to study entire sets of genes, uncover cancer interaction networks, and elucidate regulatory mechanisms encoded in cancer gene expression. [4designersart/Fotolia]

Transposable Elements

“It is important to analyze transcriptome and genome data together because some aberrations in the transcriptome originate from aberrations in the DNA copy number,” says Peter J. Park, Ph.D., associate professor of pediatrics at Harvard Medical School.

As an example, Dr. Park and colleagues recently analyzed the involvement of transposable elements in human malignancies. Transposable elements, which abound in the human genome, were associated in previous reports with tumor development, but a comprehensive study was lacking.

In somatic genomes, the activity of transposable elements is normally suppressed epigenetically and at post-transcriptional levels, but the disruption of these mechanisms during malignancies is thought to facilitate their retrotransposition, a process in which these elements are copied and then inserted into new sites in the genome. These insertions can then disrupt the normal function of the genome.

“Whole-genome sequencing data offers an unprecedented opportunity for characterizing transposable elements, but they have not been studied in great detail because it is very difficult to work with repetitive sequences,” explains Dr. Park. Genomic reads from whole-genome sequencing data containing transposable elements have often been disregarded due to difficulties in assigning them to specific chromosomal regions.

“We developed a computational pipeline to analyze these reads and compared tumor and normal genomes from the same patient,” says Dr. Park. This analysis, performed in dozens of cancer samples, allowed identification of many somatic insertions of transposable elements at a single-nucleotide resolution, including a colorectal sample in which more than 100 such elements were found.
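The tumor-versus-normal comparison at the heart of such an analysis can be illustrated with a small Python sketch. This is not Dr. Park's pipeline, which is not described in detail here; it only shows the general idea of keeping insertion calls that appear in the tumor but not in the matched normal genome. The function name `somatic_insertions`, the `window` tolerance, and the coordinates are all hypothetical.

```python
def somatic_insertions(tumor_calls, normal_calls, window=0):
    """Return tumor insertion calls with no matching call in the normal genome.

    tumor_calls, normal_calls: iterables of (chromosome, position) tuples.
    window: hypothetical tolerance in base pairs; a tumor call within this
            distance of a normal call is treated as germline and dropped.
    """
    # Index normal calls by chromosome for quick lookup.
    normal_by_chrom = {}
    for chrom, pos in normal_calls:
        normal_by_chrom.setdefault(chrom, []).append(pos)

    somatic = []
    for chrom, pos in tumor_calls:
        seen_in_normal = any(abs(pos - n) <= window
                             for n in normal_by_chrom.get(chrom, []))
        if not seen_in_normal:
            somatic.append((chrom, pos))
    return sorted(somatic)

# Invented coordinates for one patient's tumor and matched normal sample.
tumor = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
normal = [("chr1", 1002), ("chr3", 300)]
candidates = somatic_insertions(tumor, normal, window=5)
```

Here the call at chr1:1000 is discarded because the normal sample has a call two bases away, while the other two tumor calls are retained as candidate somatic insertions. A real analysis would also need to handle the read-mapping ambiguity of repetitive sequences, which this sketch ignores.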

“We have several reasons to think that these insertions are biologically important. One of them is that they are not randomly inserted into the genome, but appear to target genes that are also frequently mutated in cancer,” explains Dr. Park. After correlating the insertion sites with DNA methylation and gene expression data, Dr. Park and colleagues found that genes affected by insertions were, on average, downregulated, supporting their hypothesis.

Analyzing microRNAs

“We wanted to more rationally examine the link between disease-related microRNAs and cancer transcriptomes,” says Hiroshi I. Suzuki, M.D., Ph.D., project assistant professor of molecular pathology at the University of Tokyo. While microRNAs impact protein levels primarily by destabilizing their target mRNA molecules, it has been challenging to test this in dynamic biological systems that contain multiple microRNA molecules whose levels fluctuate.

“Understanding this interaction helps us analyze the cancer transcriptome,” explains Dr. Suzuki. By taking advantage of two analytical pipelines, GSEA (Gene Set Enrichment Analysis) and FAME (Functional Assessment of miRNAs via Enrichment), Dr. Suzuki and colleagues developed a new approach, GFA (GSEA-FAME Analysis), which allows microRNA activities to be predicted from mRNA expression data, including microRNA perturbation experiments, and provides the proof of concept for mRNA destabilization by microRNAs in the disease transcriptomes.
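The principle of inferring microRNA activity from mRNA data can be illustrated with a simplified, unweighted running-sum enrichment statistic of the kind that underlies GSEA. The Python sketch below is not GFA or FAME, and it omits weighting and significance testing; the function name `enrichment_score`, the gene labels, and the fold-change values are invented for illustration.

```python
def enrichment_score(ranked_genes, target_set):
    """Unweighted running-sum statistic over a ranked gene list.

    ranked_genes: genes ordered from most upregulated to most downregulated.
    target_set:   predicted targets of one microRNA (hypothetical here).
    Returns the running sum's largest deviation from zero, with its sign.
    """
    hits = sum(1 for g in ranked_genes if g in target_set)
    misses = len(ranked_genes) - hits
    if hits == 0 or misses == 0:
        return 0.0
    running, extreme = 0.0, 0.0
    for g in ranked_genes:
        # Step up at each target gene, step down at each non-target gene.
        running += 1.0 / hits if g in target_set else -1.0 / misses
        if abs(running) > abs(extreme):
            extreme = running
    return extreme

# Invented log fold changes (disease versus control) for six genes.
log_fc = {"A": 2.0, "B": 1.5, "C": 0.5, "D": -0.2, "E": -1.0, "F": -1.8}
ranked = sorted(log_fc, key=log_fc.get, reverse=True)
score = enrichment_score(ranked, {"E", "F"})
```

A strongly negative score means the predicted targets cluster among the downregulated mRNAs, which is the pattern expected if the microRNA is active and destabilizing its targets.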

By using GFA to mine the multidimensional data from The Cancer Genome Atlas (TCGA), Dr. Suzuki and colleagues identified several microRNAs that can serve as robust prognostic markers for cancer survival.

“Many previous datasets used either microRNA profiling or mRNA profiling, but in our analysis, we showed that it is the combined microRNA and mRNA profiling. This provides exceptional opportunities to identify and develop robust biomarkers,” says Dr. Suzuki.

Looking at the Big Picture

“Historically, investigators have frequently focused on specific genes or pathways but are now realizing that the whole transcriptome, as opposed to single genes, may be involved in any response,” says Hua Lu, M.D., Ph.D., professor and chair of biochemistry and molecular biology at Tulane University School of Medicine.

Researchers in Dr. Lu’s lab recently described and characterized inauhzin, a small molecule that activates and stabilizes p53 by increasing its acetylation and, as a result, suppresses tumor growth. Microarray analyses combined with RT-qPCR performed by Dr. Lu and colleagues revealed that the induction of p53 target genes occurs at a much larger scale than previously thought.

Over 320 genes were overexpressed at least 2.3-fold, and over 260 genes were downregulated at least twofold by inauhzin, in a p53-dependent manner. “This finding provided opportunities to see more genes that are involved, globally, in the p53 response, and to obtain a much better image than by examining a single pathway,” says Dr. Lu.
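The thresholds quoted above translate into a simple filter on expression ratios. The Python sketch below uses the 2.3-fold and twofold cutoffs from the text; the function name `split_by_fold_change`, the gene labels, and the ratios are invented, and the sketch does not model the p53-dependence test or any statistical filtering the study may have applied.

```python
def split_by_fold_change(fold_change, up_min=2.3, down_min=2.0):
    """Split genes into up- and downregulated lists by fold-change cutoffs.

    fold_change: dict mapping gene -> treated/untreated expression ratio.
    up_min:      minimum ratio to count as overexpressed (2.3 in the text).
    down_min:    minimum fold decrease to count as downregulated (2 in the
                 text), i.e. a ratio of 1/down_min or lower.
    """
    up = sorted(g for g, fc in fold_change.items() if fc >= up_min)
    down = sorted(g for g, fc in fold_change.items()
                  if 0 < fc <= 1.0 / down_min)
    return up, down

# Invented ratios for five placeholder genes.
toy = {"GENE_A": 4.1, "GENE_B": 2.2, "GENE_C": 0.45,
       "GENE_D": 0.8, "GENE_E": 2.3}
up, down = split_by_fold_change(toy)
```

Note that a twofold decrease corresponds to a ratio of 0.5 or lower, so GENE_D at 0.8 is not counted as downregulated.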

This strategy also unveiled multiple genes that are regulated by inauhzin in a p53-independent manner. “This is what should be done when examining a drug, to understand the response, because a number of classical drugs have been revealed to have additional targets,” explains Dr. Lu.

Reaping the benefits of technological advances, transcriptomics is catalyzing the emergence of new paradigms in molecular and clinical oncology. The increasing focus on surveying global cellular perturbations, and the integration of the data with other systems-level approaches, including genomics and proteomics, define new conceptual frameworks.

These interdependent technological and research developments are paving the path toward the time when the systematic analysis of cancer transcriptomes, still in its infancy, will become a routine part of clinical medicine.
