A remarkable scientific feat of the past century, the discovery of the DNA double helix, set in motion advances that reshaped the biomedical, physical, social, and behavioral sciences more powerfully than any other event in history.
The ensuing decades marked a vibrant period, one in which existing research areas were redefined and reshaped, new disciplines were born, and fields that had previously been perceived as unrelated converged to forge inter- and multidisciplinary endeavors.
Our progress in characterizing genes, followed by the interest and the need to learn about their organization into genomes, was assisted and paralleled by key developments in biotechnology. In 1977, when the first viral genome, that of bacteriophage ΦX174, was reported, approximately 1,000 base pairs could be sequenced annually. Later estimates projected that over a millennium would be needed to sequence the Escherichia coli chromosome, and over a million years to sequence the human genome, using similar approaches.
Yet, 1995 witnessed the first fully sequenced genome of a free-living organism, that of the bacterium Haemophilus influenzae, and at the end of 2009, when sequencing and annotating a bacterial genome took less than 24 hours, the 1,000th bacterial genome sequence was published.
While approximately 13 years and $3 billion were required to sequence the first human genome, most recently this task has become possible within a day or less, for $1,000.
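As a rough check on the projections quoted earlier in this section, the 1977 rate of about 1,000 base pairs per year can be applied to approximate genome sizes. The sizes used below (about 4.6 million base pairs for Escherichia coli and about 3.1 billion for the haploid human genome) are assumptions supplied for illustration and are not taken from the text, and the function name is hypothetical.

```python
# Back-of-the-envelope sketch: how long would sequencing take at the 1977 rate?
# Genome sizes are approximate, assumed values (not stated in the text).

BASES_PER_YEAR_1977 = 1_000  # sequencing rate quoted in the text

GENOME_SIZES_BP = {
    "Escherichia coli": 4_600_000,     # ~4.6 million bp (assumed)
    "Homo sapiens": 3_100_000_000,     # ~3.1 billion bp, haploid (assumed)
}


def years_to_sequence(genome_bp, bases_per_year=BASES_PER_YEAR_1977):
    """Return the number of years needed at a constant sequencing rate."""
    return genome_bp / bases_per_year


estimates = {
    name: years_to_sequence(size) for name, size in GENOME_SIZES_BP.items()
}

for name, years in estimates.items():
    print(f"{name}: about {years:,.0f} years")
```

Under these assumptions, the estimates come to roughly 4,600 years for Escherichia coli and 3.1 million years for the human genome, in line with the "over a millennium" and "over a million years" projections.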
Genome-Wide Association Studies
As the Human Genome Project revealed that people are 99.9% identical at the DNA level, the remaining 0.1% emerged as one of the most exciting components for further study. Many of the estimated 3–10 million single nucleotide polymorphisms, and the more recently unveiled copy number variants, started to shed light on inter-individual differences in traits, disease susceptibilities, and therapeutic responses.
Subsequently, genome-wide association studies established increasing numbers of links between genes and phenotypes. However, despite the strength of this approach, many of the links were not causal, others were not statistically significant, and the ones that conferred an enhanced risk often explained only a small proportion of the heritability for specific complex diseases.
For example, 32 loci associated with Crohn’s disease explain approximately 20% of the heritability for this condition, and about 47 loci linked to type 2 diabetes and glycemic traits account, collectively, for only approximately 10% of the heritability.
This phenomenon is known as “missing heritability.” The phenotypic discordance that is often apparent within monozygotic twin pairs for many medical conditions is consistent with studies showing that DNA methylation, histone acetylation, and gene expression patterns diverge increasingly between monozygotic twins as they age. Collectively, these findings point toward the importance of additional, nongenetic factors in shaping phenotypes.
Epigenetics During Differentiation
The notion that DNA sequence changes are not the only factor shaping gene expression and phenotypes is neither new nor unexpected. It has, in fact, been apparent for decades. In 1957, Conrad Hal Waddington introduced the term “epigenetic landscape” to refer to the causal interaction between genes and their products, as phenotypes are being shaped in a differentiating embryo.
To illustrate the multitude of interconnected choices that a differentiating cell can make, Waddington depicted it as a ball rolling down a landscape of ridges and valleys that branch at different points, representing the alternative fates that it could assume.
As the ball moves downhill, its options are progressively narrowed, and by the time it reaches the valley, it becomes a differentiated cell. It has become increasingly apparent that epigenetic modifications can explain the ability of totipotent cells to generate the over 220 cell types of an adult organism that, with small exceptions, share the same DNA, but nevertheless exhibit significantly different gene expression patterns and perform widely distinct functions. Thus, development provides an ideal system to visualize gene expression changes that are epigenetically shaped.
Over the years, the definition of epigenetics as a field has shifted and, most recently, the term has been used to describe potentially heritable gene expression changes that occur without alterations in the DNA nucleotide sequence.
The groundbreaking discovery that overexpressing four embryonic transcription factors is sufficient to convert terminally differentiated fibroblasts into induced pluripotent stem (iPS) cells, which resemble embryonic stem cells, demonstrated that cellular differentiation can be reversed, a process known as reprogramming. This was first reported in 2006 for mouse fibroblasts and in 2007 for human fibroblasts, and subsequently for additional cell types.
The iPS cells exhibit the ability to differentiate into many cell types and the capacity for indefinite self-renewal. In addition to unveiling details about the molecular basis of differentiation and about mechanisms that allow cells to maintain their identity, this finding opened novel therapeutic opportunities, including the possibility of generating patient-specific pluripotent stem cells for use in regenerative medicine and in the treatment of various conditions.
The ability of physical, chemical, biological, and socio-emotional factors to change gene expression by epigenetic modifications has opened a fascinating chapter in biology. Ultraviolet B exposure, previously known to induce mutations, was shown to cause DNA hypermethylation and to transcriptionally downregulate tumor-suppressor genes.
In many instances, these links provide the mechanistic basis for epidemiological observations made decades ago. For example, divalent nickel compounds have long been implicated in carcinogenesis based on animal and human epidemiological studies, even though they do not appear to be strong mutagens in vitro, suggesting that carcinogenesis may also occur by nongenotoxic pathways.
Divalent nickel salts were shown to cause epigenetic changes that include aberrant DNA methylation and post-translational histone modifications, leading to changes in chromosomal condensation and to gene silencing. These findings also illustrate that, for a long time, we erroneously viewed carcinogens solely as mutagens, and neglected to consider the possibility that gene expression changes may occur by mechanisms that do not involve mutagenesis.
This fallacious strategy, similar to searching for lost keys only under the lamp because that is where the light reaches, was aptly called the “lamp post effect” by Trosko and Upham.
In 2012, it was reported that slightly over 20% of all human cancers are causally linked to infectious diseases. This link was first reported over a century ago but subsequently fell into oblivion for decades, and epigenetics played a pivotal role in rekindling interest in it and in providing a mechanistic understanding of it. These advances helped characterize the “epigenetic field for cancerization” or “epigenetic field defect,” a region with aberrant CpG methylation that has a higher likelihood of undergoing malignant transformation.
An epigenetic field for cancerization has been observed after exposure to various carcinogens, and has been reported for multiple malignancies, including stomach, breast, liver, and colon cancer.