One of the captivating debates in genetics and genomics revolved, until recently, around the number of genes on human chromosomes. Before the Human Genome Project, some predictions placed the number at 100,000. It came as a surprise when analysis of the human genome revealed that we have only approximately 20,000 to 25,000 protein-encoding genes, a number comparable to that of the roundworm Caenorhabditis elegans and the flowering plant Arabidopsis thaliana, which have about 20,000 and 28,000 genes, respectively.
This intriguing finding illustrates that an organism's complexity depends on much more than its number of protein-encoding genes. Alternative splicing, which allows a single pre-mRNA to generate multiple protein isoforms, has emerged as the major contributor to protein diversity and biological complexity in humans. Over 70% of human genes are thought to undergo alternative splicing, and many human diseases have been linked to mutations at splice sites.
While generating a stable cell line to study an alternatively spliced gene involved in frontotemporal dementia, a disease linked to splicing defects, Tom Misteli, Ph.D., head of the cell biology of genomes group at the National Cancer Institute, and his colleagues noticed that splicing occurred in some clones but not in others. They also identified a third class of clones in which only some of the RNA molecules were spliced.
“This did not make a lot of sense initially,” recalls Dr. Misteli, “but it was reminiscent of observations dating back 10–15 years when investigators who generated transgenic mice noticed that some animals expressed the gene at high levels, others expressed it at low levels, and still others did not express it at all.”
The explanation is that the genomic site of integration shapes transgene expression levels. Integration near heterochromatin, a tightly packed and largely transcriptionally silent region, leaves a transgene silent, whereas integration into an open chromatin domain leads to high expression. “With the splicing reporter, we saw a very similar phenomenon, and this unexpected finding opened the possibility that chromatin could play a role in alternative splicing,” explains Dr. Misteli.
Subsequently, using the FGFR2 gene as an experimental model, Dr. Misteli and colleagues showed for the first time that histone post-translational modifications are causally linked to alternative splice-site selection. Based on this finding, the authors proposed an adaptor system in which a histone mark, a chromatin-binding protein that reads the mark, and a splicing regulator form a complex that transmits epigenetic information to the pre-mRNA processing machinery and thereby shapes splicing.
“This was the first demonstration that the DNA talks to the RNA,” says Dr. Misteli. Many previous studies had described epigenetic changes in terms of their quantitative effect on gene expression, that is, how much of a gene's product is made; this finding added an important new layer. “Our results reveal that epigenetic marks also determine how a gene is expressed, or which combination of exons is used, and this represents a whole different level of regulation.”
“We are looking at epigenetic modifications that are linked to aberrant gene regulation in different contexts, such as cell differentiation and autoimmune disease,” says Esteban Ballestar, Ph.D., head of the chromatin and disease group at the Bellvitge Biomedical Research Institute.
One of the research efforts in Dr. Ballestar’s group aims to understand the epigenetic alterations that occur in autoimmune conditions such as systemic lupus erythematosus (SLE) and rheumatoid arthritis. Over the years, many research teams have worked to identify genetic variants associated with these diseases, but in recent years the contribution of epigenetic mechanisms to gene regulation has become increasingly apparent.
In the first high-throughput DNA methylation analysis conducted for autoimmune diseases, Dr. Ballestar and collaborators examined the genomes of monozygotic twin pairs discordant for SLE, rheumatoid arthritis, or dermatomyositis, and identified a set of genes that are differentially methylated between twins discordant for SLE.
Gene ontology analyses revealed that genes involved in the immune response were overrepresented in this differentially methylated set. Because monozygotic twins share essentially the same DNA sequence, the findings reinforced the idea that, for a given genotype, environmental factors can shape predisposition to disease.
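Saying that immune-response genes were "overrepresented" is a statistical statement: more of them turn up among the differentially methylated genes than chance alone would predict. One common way to quantify this is a one-sided hypergeometric test, sketched below with only Python's standard library. The study's actual statistical method is not described here, and every gene count in the example is hypothetical.

```python
from math import comb


def enrichment_p_value(total: int, in_category: int, selected: int, overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    total       -- genes in the background (e.g., all genes tested)
    in_category -- background genes annotated to the category of interest
    selected    -- genes in the hit list (e.g., differentially methylated)
    overlap     -- hit-list genes that also carry the annotation
    """
    max_overlap = min(in_category, selected)
    # Add up the probability of every outcome at least as extreme as observed.
    tail = sum(
        comb(in_category, k) * comb(total - in_category, selected - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / comb(total, selected)


# Hypothetical numbers: 20,000 genes tested, 1,000 annotated "immune response",
# 100 differentially methylated, 20 of them immune genes (about 5 expected by chance).
p = enrichment_p_value(total=20_000, in_category=1_000, selected=100, overlap=20)
print(f"enrichment p-value: {p:.2e}")
```

Real analyses usually test many ontology categories at once, so the resulting p-values are typically corrected for multiple testing, for example with the Benjamini-Hochberg procedure.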
“Genetic information is very important in defining the function of a cell, but alterations in the profile of epigenetic marks that may occur through different mechanisms, including environmental effects, can also affect gene function,” Dr. Ballestar explains.
The importance of transcription factors in shaping tissue identity has been appreciated for many years. However, transcription factors commonly recognize relatively short binding motifs, which occur by chance many thousands of times throughout the genome. “One of the most intriguing aspects is to understand how specificity is achieved, because a cell needs to turn on or off the correct set of genes,” says Berthold Göttgens, Ph.D., principal investigator in the department of hematology at the University of Cambridge. Dr. Göttgens and colleagues recently conducted a comprehensive genome-wide analysis of the binding patterns of 10 key transcriptional regulators in hematopoietic progenitor cells and revealed previously unrecognized combinatorial interactions among distinct transcription factors.
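A back-of-the-envelope calculation shows why short motifs pose a specificity problem. If the four bases occurred at equal frequency and independently, an exact motif of length k would match any given position with probability (1/4)^k, so a 6-bp motif would be expected about once every 4,096 bp on each strand. The sketch below makes those simplifying assumptions and uses an approximate human genome size of 3.1 billion base pairs to estimate chance matches across both strands.

```python
def expected_chance_hits(motif_length: int, genome_bp: float, strands: int = 2) -> float:
    """Expected occurrences of one exact DNA motif in a random sequence.

    Simplifying assumptions: all four bases are equally frequent and
    positions are independent. Real genomes violate both, so treat the
    result as an order-of-magnitude estimate.
    """
    match_probability = 0.25 ** motif_length        # chance one window matches
    start_positions = genome_bp - motif_length + 1  # windows per strand
    return strands * start_positions * match_probability


HUMAN_GENOME_BP = 3.1e9  # approximate haploid human genome size

for k in (6, 8, 10, 12):
    hits = expected_chance_hits(k, HUMAN_GENOME_BP)
    print(f"{k}-bp motif: ~{hits:,.0f} chance matches")
```

Under these assumptions a 6-bp motif is expected more than a million times and even a 10-bp motif several thousand times, which is why binding specificity has to come from more than the motif alone, for example from combinations of factors binding together.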
This work provided the most comprehensive dataset to date of transcription factor interactions in blood stem cells. Besides offering an important resource for investigators interested in these specific transcription factors, the results also provided a model for dissecting combinatorial interactions among transcription factors in other cell types.
To test the predictions that emerged from the genome-wide analysis, Dr. Göttgens and colleagues turned to a mouse model. Mice heterozygous for either of two of these transcription factors, GATA2 or RUNX1, showed only mild hematopoietic defects, but compound heterozygous mice, carrying one mutant copy of each gene, were not viable. The two genes therefore behave as synthetic lethal alleles, a result that offers new insight into the functional architecture of transcriptional regulation in hematopoietic progenitor cells.
“That was a genetic proof of the importance of these transcription factor interactions that came out from our genome-wide analysis,” emphasizes Dr. Göttgens.
As new technologies generate genome-wide datasets ever faster and more cheaply, interpreting these vast amounts of data remains a major challenge in many areas of biomedicine. “Mapping binding sites for transcription factors became so much easier with the availability of next-generation sequencing, but the big issue currently is to understand what can be learned from it.”