If you were to imagine the genome as a long-contested frontier, you would realize that the landscape, so scarred by countless molecular battles, could tell a war story worthy of Caesar. In fact, the genomic landscape has many Caesars. They are called transcription factors. And if transcription factors could speak, they would say: we came, we saw, we conquered … and we enslaved.
Transcription factors, the proteins that turn genes on or off, bind to precise sites on DNA. Despite their important roles in development and disease, transcription factors have remained largely unexplored because they pose a formidable challenge for researchers. But now a group of researchers from the University of Toronto has completed a systematic study of the largest group of human transcription factors, the Cys2-His2 zinc finger proteins, or C2H2-ZFs. These transcription factors are incredibly diverse, comprising over 700 proteins and representing about 3% of human genes.
The researchers, led by Tim Hughes, Ph.D., have determined that C2H2-ZFs grew to be so abundant and diverse because many of them evolved to defend our ancestral genome from damage caused by unending skirmishes with “selfish DNA.” This finding appeared February 18 in Nature Biotechnology, in an article entitled “C2H2 zinc finger proteins greatly expand the human regulatory lexicon.”
“Natural C2H2-ZFs encoded in the human genome bind DNA both in vitro and in vivo, and we infer the DNA recognition code using DNA-binding data for thousands of natural C2H2-ZF domains,” wrote the authors. “We provide direct evidence that most KRAB-containing C2H2-ZF proteins bind specific endogenous retroelements (EREs), ranging from currently active to ancient families. The majority of C2H2-ZF proteins, including KRAB proteins, also show widespread binding to regulatory regions.”
EREs are the remnants of selfish DNA that probably came from ancient retroviruses. Similar to modern retroviruses, the ancient interlopers inserted their DNA into the host’s genome. When this happened in an egg or sperm, the viral DNA got passed on to the next generation. Once they secured a DNA beachhead, the EREs would multiply, inserting copies of themselves randomly across the genome.
These depredations, however, did not go unanswered. Eventually, legions of C2H2-ZFs were on the march. Initially, they evolved to switch off EREs. Then they began using the EREs scattered across the genome as DNA docking sites, from which they could take control of nearby genes. The EREs were first conquered, and then enslaved.
Hughes described a neat example of this process. One C2H2-ZF family member, a transcription factor called ZNF189, evolved to silence an ancient retroelement known as LINE L2, which is a staggering 100 million years old. L2 is now inactive, but ZNF189 still binds to it because it uses L2 remnants to reach other genes.
Relics of L2 sequences happen to be near genes that drive brain and heart development. And so ZNF189 could take on a new role in shaping these organs, an arrangement preserved by natural selection because it was beneficial to the embryo.
ZNF189 likely puts “brakes” on the “brain genes,” similar to its ancient role with L2. But in heart cells, it may actually turn genes on because it lacks the part that makes the “off switch.”
“What I think was not appreciated until this study is that retro-elements are really a driving force in the evolution of transcription factors themselves. All mammals have a whole bunch of custom transcription factors that came about to silence the EREs,” said Dr. Hughes. “But the EREs and these new transcription factors are different even for different vertebrates.”
Dr. Hughes’ team combined data from a modified bacterial one-hybrid system with protein-binding microarray and chromatin immunoprecipitation analyses to show that C2H2-ZF proteins are primarily DNA-binding proteins. “We estimate that the full collection of sequence preferences of full-length human C2H2-ZF proteins is likely to encompass 450 distinct motifs,” added the authors of the Nature Biotechnology article. “In contrast, the combination of all other transcription factor classes is expected to contain just over 350 independent motifs.
“The prevalence of C2H2-ZFs, frequent conservation of their binding sites, and association of their binding sites with genes of diverse functions suggest that they are as integral to metazoan genome function and evolution as the more classical ‘evo-devo’ and other conserved pathways that currently dominate the literature on gene regulation.”