April 15, 2013 (Vol. 33, No. 8)

Rhea U. Vallente, Ph.D.

The generation of high-throughput data from sequencing and gene expression profiling experiments has resulted in an overabundance of information, making analysis quite overwhelming.

Pathway analysis offers an approach that involves grouping thousands of molecules based on similarities, interactions, and components, thus simplifying the task of finding meaning in all the gathered information.

Pathway analysis also allows identification of minor changes that might occur within biological systems under varying conditions, which may help explain a particular response to a stimulus. At the recent “BIO-IT World Conference and Expo,” several computational scientists and researchers shared their analytical tools, experiences, and expertise in using pathway analysis in their investigations.

An enrichment analysis platform allows the enumeration and interpretation of genes that have been identified from genomics and gene-ChIP studies, according to Gary Bader, Ph.D., associate professor at The Donnelly Centre, University of Toronto.

“Genomics researchers often generate massive gene expression data, so it’s important to be able to identify which genes are differentially expressed, know which ones are highly expressed, and those that are least expressed. In addition, these experimental data should also be compared to that of the control.”

Dr. Bader also explained that genomics studies now produce so much information that the raw data contain considerable redundancy. It is therefore important to analyze these datasets further to delineate smaller groups within a huge network of genes.

“We have developed the Enrichment Map, which is a visualization method that allows the user to identify functional themes within gene expression data. These gene sets, or pathways, reduce the complexity of the analysis because they simplify the interpretation of the dataset,” said Dr. Bader.

“This approach thus generates a map of biological processes that one can view, together with p-values based on pathway (gene set) enrichment. This network-based approach is interactive, allowing the user to see nodes, edges, and overlaps between gene sets. For example, within the cell cycle alone, genes could be further grouped into specific mechanisms within this biological event, such as anaphase, metaphase, and DNA replication, and thus the user is provided with a ranked pathway list, which helps in prioritizing target genes for further analysis.”
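As a rough illustration of the two ingredients mentioned here, an enrichment p-value per gene set and the overlap between gene sets, the following minimal Python sketch may help. It is not the Enrichment Map implementation: it assumes a one-sided hypergeometric test as the enrichment statistic and the overlap coefficient as the edge weight, and the gene identifiers, set memberships, and universe size are hypothetical toy values.

```python
from math import comb


def enrichment_pvalue(hits, set_size, n_selected, n_universe):
    """One-sided hypergeometric p-value: probability of seeing at least
    `hits` gene-set members among `n_selected` genes drawn at random
    from a universe of `n_universe` genes, `set_size` of which belong
    to the gene set."""
    upper = min(set_size, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(set_size, k) * comb(n_universe - set_size, n_selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


def overlap_coefficient(set_a, set_b):
    """Shared genes divided by the size of the smaller set (0.0 if either is empty)."""
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / min(len(set_a), len(set_b))


# Hypothetical toy data: 20 measured genes, 4 of them differentially expressed.
universe_size = 20
selected = {"G1", "G2", "G3", "G4"}
gene_sets = {
    "anaphase": {"G1", "G2", "G3", "G9"},
    "metaphase": {"G2", "G3", "G10", "G11"},
    "DNA replication": {"G4", "G12", "G13", "G14", "G15"},
}

# Ranked pathway list: smallest p-value (strongest enrichment) first.
ranked = sorted(
    (
        (name, enrichment_pvalue(len(selected & genes), len(genes),
                                 len(selected), universe_size))
        for name, genes in gene_sets.items()
    ),
    key=lambda item: item[1],
)

# Edge weight between two nodes (gene sets) of the map.
edge = overlap_coefficient(gene_sets["anaphase"], gene_sets["metaphase"])

for name, p in ranked:
    print(f"{name}: p = {p:.4f}")
print(f"anaphase/metaphase overlap = {edge:.2f}")
```

In a real analysis the p-values would also need correction for multiple testing, and an edge would typically be drawn only when the overlap exceeds a chosen threshold.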

Dr. Bader’s group plans to further zoom in to mechanistic details of biological events, including mutations from cancer genomics. “We would like to look into ChIP-seq data and mutation data to identify specific mechanisms for particular types of cancers. It would be interesting to determine which regulators are responsible for a certain gene expression profile in a specific cancer type.”

Pathway analysis has helped Novartis perform research in two major aspects: patient selection and identification of drug combinations that may be useful for addressing unmet clinical needs, especially relating to drug resistance, according to Joseph Lehar, Ph.D., director of bioinformatics, oncology translational research.

“In patient selection, we identify gene sets from networks of association data, and these sets of genes often belong to the same functional unit,” Dr. Lehar explained. “The approach allows us to have a predictive advantage in terms of drug response and reduces the noise that is usually observed when examining individual genetic features. It may also be possible that based on relationships established using pathway analysis, we may know whether one gene affects the expression of another.”

Furthermore, Dr. Lehar said that pathway analysis has facilitated their work in assessing drug resistance and identifying useful drug combinations. “It is possible that one drug works on a specific mechanism, and pathway analysis can help us find that a second drug works on an escape mechanism, and perhaps find ways to pulse the treatment—that is, treating cells with the second drug before the cells develop resistance to the first.”

Dr. Lehar is also a member of a research consortium that has developed the Cancer Cell Line Encyclopedia, which is a compilation of cell line-specific data on gene expression, sequencing, and copy number that could be utilized in identifying various predictors for drug resistance and sensitivity. His group envisions that this unique dataset will enhance the design and development of personalized treatment schemes in cancer.

“One challenge is that drug research in cancer involves the identification of mechanisms that could be targeted during therapy. Some cancers are dependent on a simple dominant mechanism of activation and progression, but other cancers are very heterogeneous, meaning that any one type of drug may not be effective for all patients.

“We are now figuring out how to select patients that would respond to specific drugs based on markers in pathways of the targeted proteins. We and most drug companies no longer rely on the classical approach of treating large undifferentiated cancer populations and instead are using a personalized approach, identifying groups of patients with likely drug responses based on their specific cancer genotypes.”


Enrichment map of a microtubule cluster after estrogen treatment for 24 hours showing gene sets from various partitions grouped together. The red nodes indicate enrichment or upregulation after treatment with estrogen, whereas the blue color denotes downregulation after treatment with estrogen or enrichment in untreated cells. [University of Toronto]

Visualizing Datasets

Alexander Lex, Ph.D., a research scientist in the visual computing group at the Harvard School of Engineering and Applied Sciences, said that visualizing large sets of experimental data has allowed his group to identify changes in specific biological pathways when looking, for example, at cancer data.

“Our Caleydo project was designed to bring biomolecular data into a visual form that helps researchers in finding relationships within large sets of data. Our approach can tackle various types of data, ranging from mutation, methylation patterns, gene expression, and microRNAs, to clinical data and pathways.

“This enables, for example, looking at subtypes of glioblastoma. Distinguishing factors of cancer subtypes could be derived from experimental data, but integrating a wide variety of datasets enables researchers to characterize subtypes better, to find supporting evidence in the clinical data, or to reason about causes based on pathways.” He added that Caleydo has been applied to various datasets and has been successfully used to analyze data from The Cancer Genome Atlas (TCGA).

Dr. Lex has also focused on the modulation of behavior of genes for potential application in drug discovery. “It would be interesting not only to find out how a certain gene works, but also how a drug affects the gene and how this affects the rest of the pathway it is associated with,” explained Dr. Lex.

“One of the challenges in our research involves the size of the network during analysis. Some think that using smaller pathways is good, but this may not be true at all times, since small pathways remove the complexity we observe in reality. One challenge therefore is to find a compromise between what to show and what to hide, so a researcher can make informed decisions without being drowned in information. Finding this balance also stresses the importance of collaborations with various research disciplines.”


Stratification of cancer subtype data using StratomeX: The first three columns from the left represent tabular data from methylation and mRNA expression analyses; the fourth column represents copy number variation of EGFR. The right-most column represents Kaplan-Meier plots for “days to death” based on the EGFR data. The heatmap assists the user in judging composition (number of cancer subtypes) of the dataset, as well as predicting treatment outcomes of patients. [Harvard School of Engineering and Applied Sciences]
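
For readers unfamiliar with the Kaplan-Meier plots mentioned in the caption, the estimator behind them can be sketched in a few lines of Python. This is a generic textbook version, not code from StratomeX or Caleydo, and the follow-up times and event flags are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient (e.g., days to death or last contact)
    observed:  True if the event (death) was seen, False if the patient was censored
    """
    data = sorted(zip(durations, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            # Multiply by the fraction of at-risk patients surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical follow-up data for one patient group (second patient censored).
days = [1, 2, 3, 4]
died = [True, False, True, True]
curve = kaplan_meier(days, died)
print(curve)
```

Computing one such curve per patient group, for example per EGFR copy number status, yields the kind of side-by-side comparison the figure describes.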

Metabolism Studies

Pathway analysis has also been used in studies involving secondary metabolism. “We look for families of genes that are associated with the biosynthesis of chemical compounds,” said Daniel Udwary, Ph.D., assistant professor, biomedical and pharmaceutical sciences, University of Rhode Island.

“We are interested in identifying new drugs from natural products and looking at various bacterial species based on their metabolomes. One particular feature of working with microbial metabolomes is that their genes are often clustered within an area in the genome. It is thus simply a matter of looking around that gene and the rest of the cluster is there; this also occurs in fungi, but not in plants and other higher-order organisms.”
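
The idea of "looking around" a known biosynthetic gene can be illustrated with a short Python sketch that collects every gene within a fixed distance of an anchor gene on the same contig. The gene names, coordinates, and window size below are hypothetical, and this is only a simplified picture, not the method used by Dr. Udwary's group.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    contig: str
    start: int  # start coordinate in base pairs
    end: int    # end coordinate in base pairs


def genes_near(genes, anchor_name, window_bp):
    """Names of genes on the anchor's contig that lie within `window_bp`
    of the anchor gene (the anchor itself included), ordered by position."""
    anchor = next(g for g in genes if g.name == anchor_name)
    low = anchor.start - window_bp
    high = anchor.end + window_bp
    return [
        g.name
        for g in sorted(genes, key=lambda g: g.start)
        if g.contig == anchor.contig and g.end >= low and g.start <= high
    ]


# Hypothetical annotation for a small bacterial genome fragment.
annotation = [
    Gene("geneA", "contig1", 1000, 2000),
    Gene("geneB", "contig1", 2500, 3500),
    Gene("geneC", "contig1", 50000, 51000),
    Gene("geneD", "contig2", 1200, 1800),
]

cluster = genes_near(annotation, "geneA", window_bp=5000)
print(cluster)
```

A fixed window is the crudest possible rule; it is used here only to show why physical clustering makes candidate pathways easy to collect in microbial genomes.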

Dr. Udwary’s research has concentrated on identifying gene clusters that are associated with the synthesis of secondary metabolites. “We have currently identified 3,892 gene clusters with specific pathways, and it is interesting to know that each pathway is different. It is almost the same as snowflakes, in which each one is unique. Now our challenge after identifying specific pathways is to be able to predict the mechanism of a specific gene based on its DNA sequence.”

Dr. Udwary plans to conduct comparative analyses of gene clusters among various microbial species to establish natural products-based drug discovery roadmaps.

“Unfortunately, drug discovery using natural products has diminished in the last few decades and a lot of potential mechanisms have been overlooked. It is critical for us to recognize that the horizontal transfer of genes plays an important role in the biosynthesis of new drugs, and thus revisiting these operons in microbial species can help in establishing trends in secondary metabolism.”

Pathway analysis has also helped scientists in elucidating mechanisms of drug action. According to Joshua Apgar, Ph.D., principal scientist of systems biology, department of immunology and inflammation at Boehringer Ingelheim Pharmaceuticals, their use of well-described pathway models has helped them understand drug selectivity and functionality.

“We are inspired by the fact that some compounds show functional selectivity in vivo but are not selective in vitro. We are interested in identifying on-target and off-target mechanisms of new drugs and how systems-level processes can affect these mechanisms,” explained Dr. Apgar.

These processes are quite complicated and may be influenced by a variety of feedback processes that only exist in vivo. Reconstruction of all these processes in vitro may be impossible and thus the use of pathway models has assisted their investigations of drug target effects.

Pathway Analysis Reveals Exercise May Not Slow Muscle Aging

Ingenuity Systems says its IPA technology helped elucidate a network of genetic drivers for aging. The analysis determined, among other things, that muscle age was not primarily related to the biology of physical activity.

Previously, research had hypothesized that resistance exercise contributed to a reversal of the aging process, yet work carried out by the laboratory of Professor James A. Timmons at Loughborough University in the U.K. demonstrated that response to exercise is highly variable in humans and that pre-existing gene expression levels can predict future response to exercise. The paper also identified biological molecules that may drive the response to physical activity.

The new work coupled analysis using Ingenuity’s IPA software with microarray data generated by Dr. Timmons’ laboratory and with datasets from the published literature. The analysis determined that the genetic regulators of age-related genes were distinct from and unrelated to the regulators of exercise-influenced genes.

“Several papers in the recent past have asserted that exercise ‘reverses’ the aging process, which is a very attractive proposition—if we could simply exercise more, we would slow the aging process and potentially have fewer chronic problems,” notes Dr. Timmons. “In this study, we took a different approach, measuring the variation in the products being made from the genetic code using Ingenuity’s IPA, and determined that it just isn’t that simple.”

The research was published in the journal PLOS Genetics in a paper titled “Molecular Networks of Human Muscle Adaptation to Exercise and Age.” IPA (Ingenuity Pathway Analysis) is a web-based functional analysis tool for comprehensive omics data.


Exercise isn’t the key to the fountain of youth: the answer lies in our genes.
