March 1, 2014 (Vol. 34, No. 5)
Charles A. Gersbach, Ph.D. Assistant Professor Institute for Genome Science & Policy, Duke University
Thomas Gaj, Ph.D. Postdoctoral Research Associate University of California
Carlos F. Barbas III, Ph.D. Kellogg Professor & Chair The Skaggs Institute for Chemical Biology
The Genomic Revolution has promised to advance medicine and biotechnology by providing scientists with enormous amounts of data that can be converted into useful information.
Over 10 years ago, the Human Genome Project produced the first draft of the more than 3 billion base pairs of DNA that make up the genetic code in each of our cells.
More recent efforts like the 1000 Genomes and HapMap Projects have since focused on identifying the differences within these billions of base pairs of DNA between individuals, while genome-wide association studies have pinpointed specific sequences that determine health and disease. The ENCODE Project and other studies have annotated chromatin states, regulatory elements, transcription factor binding sites, and other epigenetic states throughout the genome.
Dozens of other species have since undergone similar analyses, with the number of sequenced genomes continuously growing. Collectively, these efforts have generated an incredibly rich source of data that promises to aid our understanding of the function and evolution of any genome. However, until recently, scientists have been lacking the tools necessary to interrogate the structure and function of these elements.
While conventional genetic engineering methods could be used to add extra genes to cells, they cannot be easily used to modify the sequences or control the expression of genes that already exist within these genomes. These types of tools are necessary to determine not only the function of genes, but also the role of genetic variants and regulatory elements. They can also be used to overcome longstanding challenges in the field of gene therapy. Without these technologies, it has been difficult—and in some cases impossible—for scientists to capitalize on the Genomic Revolution.
A potential route for introducing precise changes into the genome was suggested by the discovery of homologous recombination and its application in creating transgenic and knockout mice, for which the Nobel Prize in Physiology or Medicine was awarded in 2007. This method, however, was only efficient in mouse embryonic stem cells and not immediately applicable to cells from other species or even other mouse cell types.
But in 1994 a breakthrough study showed that creating a double-strand break at a specific site in the genome could stimulate the cellular DNA repair pathway and increase the frequency of homologous recombination at the break point by many orders of magnitude. Critically, this study also suggested a direct approach for editing virtually any gene sequence in many different cell types and species using homologous recombination—but only if DNA breaks could be targeted to specific sequences.
The re-engineering of molecular machinery to recognize new sequences in complex genomes is a daunting task. However, in 1991, the crystal structure of Zif268, a naturally occurring zinc finger protein, provided insight into how Nature solved this problem. This structure led to the discovery that zinc finger proteins, among the most common classes of DNA-binding proteins across all domains of life, recognize DNA using independent and modular domains that each make specific contacts with three base pairs of DNA (Figure 1). This work suggested that these domains could be redesigned to recognize new base pair combinations and linked together to form new proteins.
Subsequent research by several laboratories led to the development of technologies for making custom synthetic zinc finger proteins that can be targeted to a broad range of sites in almost any genome. This constituted the first technology for targeting and regulating specific endogenous genes.
In the late 1990s, the catalytic domain of the FokI endonuclease, which nonspecifically cleaves DNA, was fused to custom zinc finger proteins to generate the first zinc finger nuclease (ZFN). Because the DNA-binding specificity of zinc finger proteins could be reprogrammed, new ZFNs could be rapidly formed and used to introduce targeted double-strand breaks to almost any gene in the genome.
Critically, because FokI acts as a dimer, it’s necessary to engineer two ZFNs that target opposite strands of DNA in a head-to-head configuration (Figure 1). Thus, when two FokI catalytic domains assemble together at the targeted DNA site, a double-strand break is created and genome editing is initiated.
New and Improved Tools
Despite the powerful potential of ZFN-mediated genome editing and the many notable achievements made using this technology, the scientific community was slow to broadly adopt it. This was largely due to the technical expertise necessary to engineer new zinc finger proteins, as well as the need to screen many ZFNs in order to uncover enzymes with high activity and low toxicity. Arguably, the lack of cheap and widespread DNA synthesis and public plasmid repositories in the early 2000s also slowed the development of this technology.
But in 2009, the DNA recognition code of another family of modular DNA-binding proteins, the transcription activator-like effectors (TALEs), was reported. TALEs are proteins produced by plant pathogenic bacteria and, in contrast to zinc fingers, recognize a single base pair of DNA using only a single protein module (Figure 1). Therefore, new TALE proteins capable of recognizing almost any DNA sequence could be assembled from only four types of modules, one for each base.
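The one-module-per-base-pair logic can be sketched in a few lines of Python. This is only an illustration: the module names (the two-letter "repeat-variable diresidues" NI, HD, NN, and NG) and their base pairings are the commonly cited ones and are an assumption here, since the article does not list them, and the function name is hypothetical.

```python
# Illustrative sketch only: map a DNA target to a TALE repeat array.
# Assumption (not stated in the article): the commonly cited pairings
# NI -> A, HD -> C, NN -> G, NG -> T.
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def design_tale_repeats(target: str) -> list[str]:
    """Return one module per base of the target, in 5'-to-3' order."""
    target = target.upper()
    unknown = set(target) - set(RVD_FOR_BASE)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return [RVD_FOR_BASE[base] for base in target]


if __name__ == "__main__":
    # A 7-base target needs 7 modules, drawn from only 4 module types.
    print("-".join(design_tale_repeats("GATTACA")))
```

The point of the sketch is that designing the binding domain reduces to a lookup, which is why assembly, rather than protein engineering, became the main practical step.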
Within two years of these first reports, several groups developed methods to quickly and economically assemble DNA sequences encoding new TALE proteins, as well as methods for fusing them to the FokI catalytic domain to create TALE nucleases, or TALENs. Most remarkably, almost every new report agreed that the majority of assembled TALENs were highly active at their intended target sites, in contrast to ZFNs, which often required screening or selection methods.
Consequently, many of the applications previously developed for genome editing with ZFNs were quickly replicated with TALENs. However, two main limitations remained. First, a new TALEN dimer must be constructed for every new target site. As a result, assembling TALENs requires some expertise with recombinant DNA techniques. Second, TALENs are large proteins with many highly repetitive sequences. Thus, their delivery into cells with size-restricted vectors or lentiviral systems can be challenging.
Following quickly on the heels of TALEN technology, in 2012 it was shown that the type II Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated protein (Cas) system that naturally functions as an adaptive immunity system in bacteria and archaea could be engineered to serve as an RNA-guided nuclease outside of its natural host. This CRISPR/Cas system consists solely of the Cas9 nuclease and a single guide RNA (gRNA) that binds to Cas9 and directs it to a specific genomic target site. Targeting is achieved by complementary hybridization of the gRNA to a 20-base-pair sequence known as the protospacer (Figure 1).
The only restriction on target site selection is that the protospacer must be located directly upstream of a short DNA sequence known as the protospacer-adjacent motif (PAM), which is specific to the bacterial species from which the Cas9 and gRNA are derived. The ability to redirect the CRISPR/Cas system to new target sites by swapping only the 20-base-pair targeting sequence of the gRNA is a significant advantage compared to ZFN and TALEN systems.
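The target-site rule described above (a 20-base protospacer followed immediately by a PAM) is simple enough to express as a short search. The sketch below is a minimal illustration under stated assumptions: it uses the PAM "NGG", which is the motif commonly associated with the widely used Streptococcus pyogenes Cas9 and is not named in this article, it scans only the strand it is given, and the function name is hypothetical.

```python
import re


def find_target_sites(seq: str, pam_pattern: str = "[ACGT]GG", length: int = 20):
    """Yield (start, protospacer, pam) for each candidate site on this strand.

    A candidate is `length` bases followed directly by a PAM match.
    Assumption: the default PAM is NGG, which the article does not specify.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{length}}})({pam_pattern}))")
    for match in pattern.finditer(seq):
        yield match.start(), match.group(1), match.group(2)


if __name__ == "__main__":
    example = "GACCTGAAGCTGATCCGTAATGGCT"
    for start, protospacer, pam in find_target_sites(example):
        print(start, protospacer, pam)
```

A real design tool would also scan the reverse complement and score potential off-target matches elsewhere in the genome; both are omitted here to keep the rule itself visible.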
As this technology requires no protein engineering expertise, its simplicity has enabled genome editing for a much wider audience of users. Furthermore, the CRISPR/Cas system has proven to be remarkably robust across many cell types and species with both high success rates and genome-editing frequencies.
Another distinct advantage of the CRISPR/Cas system is that many sites can be simultaneously targeted with a single enzyme and multiple gRNAs, in contrast to needing to deliver a separate pair of ZFNs or TALENs for each target. Early studies suggested that the CRISPR/Cas system might be limited by lack of specificity. However, more recent studies have reported a variety of approaches to improve specificity, including using Cas9 nickases that cut one strand of DNA, rather than two, and using shorter protospacer targets that limit off-target sites.
Genome Editing at Work
Genome editing allows for the introduction of a wide range of alterations to almost any DNA sequence. By creating double-strand breaks and inducing the nonhomologous end joining DNA repair pathway, genes can be disrupted or short fragments of DNA can be captured at the break point (Figure 2). If two nucleases that cut the same chromosome are delivered into cells simultaneously, the intervening segment can be deleted or inverted. Alternatively, if a DNA repair template is co-delivered into cells with the nuclease, it can be copied into the genome by homology-directed repair. This process allows for the exchange of just a few base pairs or the integration of whole gene expression cassettes (Figure 2).
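The two-nuclease outcomes described above can be pictured with simple string operations. The following is an idealized sketch, not a model of the biology: it assumes both cuts are blunt and rejoin exactly, whereas repair by nonhomologous end joining often adds or removes a few bases at the junction. The function names and cut coordinates are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def delete_between(seq: str, cut1: int, cut2: int) -> str:
    """Drop the segment between two cut positions and rejoin the ends."""
    left, right = sorted((cut1, cut2))
    return seq[:left] + seq[right:]


def invert_between(seq: str, cut1: int, cut2: int) -> str:
    """Reinsert the segment between two cuts in the opposite orientation."""
    left, right = sorted((cut1, cut2))
    return seq[:left] + reverse_complement(seq[left:right]) + seq[right:]


if __name__ == "__main__":
    chromosome = "AAAA" + "CCGT" + "TTTT"
    print(delete_between(chromosome, 4, 8))  # intervening segment removed
    print(invert_between(chromosome, 4, 8))  # intervening segment flipped
```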
The simplicity of implementing the CRISPR/Cas system has dramatically accelerated the pace of scientific advances using genome engineering in academic labs in the last year. However, all three genome-editing platforms fundamentally act through the same mechanism and therefore, at this time, there is no clear single “best” platform to use when comparing high-quality ZFNs, TALENs, and CRISPR/Cas9.
Rather, the optimal platform to work with depends on available resources and application-specific criteria, such as delivery methods and target cells or tissue types. Nonetheless, TALENs and CRISPR/Cas9 are now being rapidly implemented for many of the same applications that ZFNs were developed for over the last 15 years. This includes using the nucleases to knock out or modify gene sequences in diverse species, including human cell lines and stem cells, Drosophila, C. elegans, zebrafish, mice, rats, pigs, cows, insects, plants, and most recently monkeys.
Genome modifications have been used for applications as diverse as investigating gene function, generating pesticide-resistant crops, creating transgenic animals, engineering cell lines to produce biopharmaceuticals, correcting mutations that cause genetic disease, and inhibiting HIV infection of human cells, including several ongoing clinical trials.
Thus far, zinc finger proteins, TALEs, and CRISPR/Cas9 have been most widely used as nucleases for genome editing. However, an equally significant application of these technologies is in the regulation of gene expression and manipulation of the epigenome. Rather than fusing zinc fingers or TALEs to the FokI endonuclease, they can be attached to activator or repressor domains that modulate gene expression, or even enzymatic domains that manipulate DNA methylation or histone modifications. Similarly, the nuclease activity of Cas9 can be ablated by two point mutations and the resulting RNA-guided protein scaffold can be fused to gene regulation domains to activate or repress target genes.
Thus, these technologies provide a complete toolbox for custom redesign of almost any property of the genome for both basic science and medicine.
The ZFN, TALEN, and CRISPR/Cas genome engineering technologies have provided scientists with the tools necessary to dissect the vast amount of information accumulated through the Genomic Revolution. These approaches will catalyze basic science studies to identify the genetic basis of complex diseases, such as diabetes, cardiovascular disease, and neurological disorders. The technologies also have immediate implications for enhancing agricultural and biopharmaceutical production.
Finally, these methods have the potential to address technical challenges that have been obstacles to the field of gene therapy for decades. Whereas the first phase of the Genomic Revolution informed us of the biology of how genomes function, this next phase will be characterized by engineering genomes to advance science, biotechnology, and medicine.
Charles Gersbach, Ph.D., is an assistant professor in the department of biomedical engineering and at the Institute for Genome Science and Policy, Duke University. Thomas Gaj, Ph.D., is a postdoctoral research associate at the University of California, Berkeley. Carlos F. Barbas III, Ph.D., is Kellogg Professor and Chair at The Skaggs Institute for Chemical Biology and professor of chemistry and cell and molecular biology at The Scripps Research Institute-La Jolla.