April 1, 2011 (Vol. 31, No. 7)

Dynamic Field Shaped by Both Genetic and Epigenetic Factors

One of the captivating debates in the field of genetics and genomics revolved, until recently, around the number of genes that make up human chromosomes. Prior to the Human Genome Project, some predictions placed the number of genes at 100,000. It came as a surprise when the analysis of the human genome revealed that we have only approximately 20,000 to 25,000 protein-encoding genes, a number comparable to those of the roundworm Caenorhabditis elegans and the flowering plant Arabidopsis thaliana, which have approximately 20,000 and 28,000 genes, respectively.

This intriguing find illustrates that there is much more to the complexity of an organism than the number of protein-encoding genes. In addition, alternative splicing, which allows a single pre-mRNA to generate multiple protein isoforms, has emerged as the major contributor to protein diversity and biological complexity in humans. Over 70% of human genes are thought to undergo alternative splicing, and many human diseases have been linked to mutations at splice sites.


False-color scanning optical micrograph of the soil-dwelling bisexual nematode Caenorhabditis elegans: It came as a surprise to many that C. elegans has approximately the same number of protein-encoding genes as humans. This humbling finding illustrates that there is much more to the complexity of an organism than the number of protein-encoding genes. [James King-Holmes/Photo Researchers]

Alternative Splicing

While generating a stable cell line to study an alternatively spliced gene involved in frontotemporal dementia, a splicing disease, Tom Misteli, Ph.D., head of the cell biology of genomes group at the National Cancer Institute, and colleagues noticed that splicing occurred in some clones but not in others, and they also identified a third class of clones in which splicing occurred only in some RNA molecules.

“This did not make a lot of sense initially,” recalls Dr. Misteli, “but it was reminiscent of observations dating back 10–15 years when investigators who generated transgenic mice noticed that some animals expressed the gene at high levels, others expressed it at low levels, and still others did not express it at all.”

This is explained by the fact that the genomic site of integration shapes transgene-expression levels. Integration near a heterochromatin region, which is not expressed, leaves a transgene silent, while integration into an open chromatin domain causes high expression. “With the splicing reporter, we saw a very similar phenomenon, and this unexpected finding opened the possibility that chromatin could play a role in alternative splicing,” explains Dr. Misteli.

Subsequently, Dr. Misteli and colleagues used the FGFR2 gene as an experimental model and revealed, for the first time, that histone post-translational modifications are causally linked to alternative splice site selection. Based on this finding, the authors proposed an adaptor system in which a histone mark, a chromatin-binding protein that reads the histone mark, and a splicing regulator form a complex that transmits the epigenetic information to the pre-mRNA processing machinery and shapes the process of splicing.

“This was the first demonstration that the DNA talks to the RNA,” reveals Dr. Misteli. While many previous studies have described epigenetic changes in terms of their quantitative effect on gene expression, this finding provided an additional important layer of information. “Our results reveal that epigenetic marks also determine how a gene is expressed, or which combination of exons is used, and this represents a whole different level of regulation.”

“We are looking at epigenetic modifications that are linked to aberrant gene regulation in different contexts, such as cell differentiation and autoimmune disease,” says Esteban Ballestar, Ph.D., head of the chromatin and disease group at the Bellvitge Biomedical Research Institute.

One of the research efforts in Dr. Ballestar’s group is to understand alterations that occur in autoimmune conditions such as systemic lupus erythematosus (SLE) and rheumatoid arthritis. Over the years, substantial efforts from many research teams have focused on elucidating genetic variants associated with pathology in these diseases, but the growing contribution of epigenetic factors to gene regulation has become apparent in recent years.

In the first high-throughput DNA methylation analysis conducted for autoimmune diseases, Dr. Ballestar and collaborators examined the genomes of monozygotic twins who are discordant for SLE, rheumatoid arthritis, and dermatomyositis, and identified a set of genes that are differentially methylated between individuals discordant for SLE.

Gene ontology analyses revealed that genes involved in the immune response were overrepresented in this differentially methylated set, and the findings reinforced the idea that, for a specific genotype, the influence of environmental factors can shape predisposition to disease.
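As a rough illustration of what "overrepresented" means in a gene ontology analysis, the sketch below computes a one-sided hypergeometric p-value: the probability of seeing at least the observed number of annotated genes in a selected set by chance. The article does not say which statistical test the authors used, so this is an assumption; the function name and every count in the example are hypothetical and invented for illustration.

```python
from math import comb

def hypergeom_enrichment_p(population: int, annotated: int,
                           selected: int, overlap: int) -> float:
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `population` genes, of which `annotated` carry the term of interest."""
    max_overlap = min(annotated, selected)
    total = comb(population, selected)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total

# Hypothetical numbers: 20,000 genes assayed, 1,000 annotated as immune-response,
# 50 differentially methylated genes, 10 of which carry the annotation.
p_value = hypergeom_enrichment_p(20_000, 1_000, 50, 10)
print(f"enrichment p-value: {p_value:.2e}")
```

With these invented counts, chance alone would put only about 2.5 immune-response genes in the set, so finding 10 gives a small p-value. Real analyses also correct for testing many ontology terms at once, which this sketch omits.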

“Genetic information is very important in defining the function of a cell, but alterations in the profile of epigenetic marks that may occur through different mechanisms, including environmental effects, can also affect gene function,” Dr. Ballestar explains.

The importance of transcription factors in shaping tissue identity has been appreciated for many years. However, transcription factors commonly recognize relatively short binding motifs, which occur by chance many thousands of times throughout the genome. “One of the most intriguing aspects is to understand how specificity is achieved, because a cell needs to turn on or off the correct set of genes,” says Berthold Göttgens, Ph.D., principal investigator in the department of hematology at the University of Cambridge. Dr. Göttgens and colleagues recently conducted a comprehensive genome-wide binding pattern analysis of 10 key transcriptional regulators from hematopoietic progenitor cells, and revealed a previously unrecognized combinatorial interaction between distinct transcription factors.
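A back-of-the-envelope calculation shows why short motifs cannot confer specificity on their own. The sketch below estimates how often one exact motif would appear by chance on a single strand, under the simplifying assumptions (not taken from the article) that the four bases are equally frequent and independent, and that the genome is roughly three billion base pairs long. The function name is hypothetical.

```python
# Expected number of chance matches to a short DNA motif in a large genome,
# assuming equally likely, independent bases (a deliberate simplification).

GENOME_SIZE_BP = 3_000_000_000  # rough human genome size; an assumption

def expected_chance_matches(motif_length: int,
                            genome_size: int = GENOME_SIZE_BP) -> float:
    """Expected exact matches of one motif on a single strand."""
    per_position_probability = 0.25 ** motif_length
    positions = genome_size - motif_length + 1
    return positions * per_position_probability

for length in (6, 8, 10, 12):
    matches = expected_chance_matches(length)
    print(f"{length}-bp motif: ~{matches:,.0f} chance matches")
```

Under these assumptions, a 6- to 8-bp motif is expected to occur tens to hundreds of thousands of times, which is far more sites than any factor functionally regulates. Real genomes have skewed base composition and degenerate motifs, so actual counts differ.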

This work provided the most comprehensive dataset, to date, of transcription factor interaction from blood stem cells, and the results, in addition to offering an important resource for investigators interested in these specific transcription factors, also provided a model to dissect combinatorial interactions among transcription factors in other cell types.

To test the predictions that emerged from the high-throughput genome analysis, Dr. Göttgens and colleagues conducted biological experiments by using a mouse model. This work revealed that even though mice heterozygous for two of these transcription factors, GATA2 and RUNX1, exhibited only weak hematopoietic manifestations, compound heterozygous mice were not viable, indicating that the two genes function as synthetic lethal alleles and providing new biological insights into the functional architecture of transcriptional processes in hematopoietic progenitor cells.

“That was a genetic proof of the importance of these transcription factor interactions that came out from our genome-wide analysis,” emphasizes Dr. Göttgens.

As new advances enable genome-wide datasets to be generated faster and at lower costs, understanding the significance of the vast amounts of data remains one of the challenges for many biomedical areas. “Mapping binding sites for transcription factors became so much easier with the availability of next-generation sequencing, but the big issue currently is to understand what can be learned from it.”

Fitting Datasets Together

With the advent of new high-throughput approaches to generate vast datasets, one of the challenges is how to integrate different types of information that are sometimes generated by using distinct techniques. For example, various types of genome-scale data are currently available, particularly from cancer patients, and include microarray analyses and specific amplifications or deletions on the chromosome.

“The question is how these datasets fit together, how one can make sense out of it. That is really important, because if we find the connections between these data types, we may in fact identify the genes that drive certain diseases,” says Gábor Balázsi, Ph.D., assistant professor in the department of systems biology at the University of Texas MD Anderson Cancer Center.

At the recent CHI “Molecular Medicine Tri Conference” in San Francisco, Dr. Balázsi talked about work that he and collaborators conducted to integrate mRNA and gene amplification/deletion datasets in an effort to identify genes and sets of genes that drive specific subtypes of breast cancer.

“We designed a method to put together the two types of data and, also, used existing information on protein-protein interaction and gene regulation,” he explains. Based on this approach, Dr. Balázsi and collaborators, including postdoctoral fellow Bhaskar Dutta, identified gene networks that are important in breast cancer and unveiled specific subnetworks that they termed “driver networks” to illustrate the putative importance of the participating genes in the appearance of different breast cancer subtypes.
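The article does not describe the algorithm itself, so the following is only a toy sketch of the general idea of overlaying two data types on an interaction network. It is not the authors' method. It assumes one simple rule: keep genes whose expression shifts in the same direction as their copy number, then report groups of such genes that are connected in the network. All gene names, values, thresholds, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy inputs; gene names and values are invented for illustration.
expression_log2fc = {"A": 2.1, "B": 1.8, "C": -0.2, "D": -1.9, "E": 1.6}
copy_number_change = {"A": 1, "B": 1, "C": 0, "D": -1, "E": 0}  # +1 gain, -1 loss
interactions = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E")]

def concordant_genes(expr, cn, threshold=1.0):
    """Genes whose expression shifts in the same direction as their copy number."""
    return {
        g for g in expr
        if (cn.get(g, 0) > 0 and expr[g] >= threshold)
        or (cn.get(g, 0) < 0 and expr[g] <= -threshold)
    }

def candidate_subnetworks(genes, edges):
    """Connected components (size > 1) of the network restricted to `genes`."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u in genes and v in genes:
            neighbors[u].add(v)
            neighbors[v].add(u)
    seen, components = set(), []
    for start in sorted(genes):
        if start in seen:
            continue
        # Depth-first traversal from an unvisited gene.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return [c for c in components if len(c) > 1]

selected = concordant_genes(expression_log2fc, copy_number_change)
subnetworks = candidate_subnetworks(selected, interactions)
print(sorted(selected), [sorted(c) for c in subnetworks])
```

In this toy example, gene E is overexpressed but has no copy number change, so it is excluded. The point is only that requiring agreement between data types, plus connectivity in a known network, narrows a long gene list to a small candidate set that can be tested experimentally.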

In breast cancer, triple negative tumors pose some of the most significant therapeutic challenges. This is in contrast to estrogen-receptor positive tumors, which often respond to tamoxifen or estrogen receptor antagonists, and to HER2 positive tumors, which usually respond to Herceptin.

“One of the most interesting aspects of our study is that we identified, in collaboration with Dr. Lajos Pusztai’s laboratory, the gene sets for the triple negative subset of breast cancers,” explains Dr. Balázsi. Furthermore, by knocking down the genes from this network in triple negative cell lines established from patients, Dr. Pusztai and colleagues experimentally confirmed that genes identified by computational analysis play a role in the survival of triple negative breast cancer cells. “The driver networks we defined from gene expression and CGH data of human breast cancer patients provided directly testable therapeutic hypotheses that suggest treatment strategies and in particular combination therapies that could and should be tested in the clinic,” concludes Dr. Balázsi.


Integrating diverse data types: Gene-expression data and gene copy number aberration data are overlaid on a genome-scale regulatory network to extract breast cancer subtype-specific “driver” networks. [University of Texas MD Anderson Cancer Center]

Tumor Microenvironment

One of the most important and clinically relevant aspects related to gene expression is the need to visualize it as a highly dynamic process. “Overall, gene expression changes as tumors progress. The tumor microenvironment affects gene expression, and epigenetic modifications bring a significant contribution to this,” says David S. Hoon, Ph.D., director of the department of molecular oncology at the John Wayne Cancer Institute. One of the research efforts in Dr. Hoon’s laboratory, particularly over the past eight years, has focused on studying gene-expression changes that occur during cancer progression and metastasis, particularly from an epigenetic perspective.

Dr. Hoon and colleagues recently reported that RUNX3, a gene that exists in most cell types and appears to be important for development and cell differentiation, shows abnormal expression in primary and metastatic cutaneous melanoma. This work revealed that two epigenetic mechanisms, miRNA and promoter CpG hypermethylation, suppress RUNX3 mRNA levels in primary tumors as compared to untransformed cells, and an intriguing finding was that this suppression was even stronger in metastatic melanoma.

“An important aspect to remember is that we often assume that the gene-expression profile of a metastatic tumor will be the same as in the primary tumor, and we often treat based on the primary tumor, but this is not always correct,” emphasizes Dr. Hoon. The existence of gene-expression differences between primary and metastatic tumors represents a clinically relevant aspect that underscores the necessity to explore the epigenetic and gene-expression profile for both the primary and metastatic tumor, because they may be different in the same patient.

The interplay between genetic and epigenetic changes is emerging as one of the most exciting and thought-provoking developments from recent years. Increasingly, findings from multiple biomedical fields have converged to illustrate the concerted contribution of genetic and epigenetic factors as they shape gene expression. The reversible nature of epigenetic changes opens the possibility to monitor or modulate their impact on gene expression and promises attractive prophylactic, diagnostic, and therapeutic applications.
