A protein called integration host factor (blue) creates a sharp turn in the DNA upstream of the CRISPR repeat, allowing Cas1-Cas2 (green and yellow) to recognize and bind the insertion site. [Addison Wright/UC Berkeley]

New data from Jennifer Doudna, Ph.D., co-pioneer of the CRISPR/Cas9 genome-editing technology, and her research group at the University of California, Berkeley, show that the CRISPR integration complex relies on structural DNA cues rather than direct sequence recognition to achieve site-selective expansion of the CRISPR array during bacterial adaptive immunity. The research team looked specifically at the Cas1-Cas2 enzymes and how they integrate viral DNA into a host's genome, creating a kind of molecular file that can be referenced during subsequent infections to mount targeted counterattacks. Findings from the new study were published recently in Science in an article entitled “Structures of the CRISPR Genome Integration Complex.”

The Cas1-Cas2 proteins rely on the unique flexibility of the CRISPR DNA to recognize it as the site where viral DNA should be inserted, ensuring that “memories” of prior viral infections are properly stored. Lead study investigator Addison Wright, a graduate student in Dr. Doudna’s lab, told GEN that “the discovery that recognition depends more on structure than direct sequence recognition has some pretty significant implications for the repurposing of Cas1-Cas2 as a tool. Their most obvious functions take advantage of their evolved role as a sort of information storage device, as we saw with George Church's recent work encoding a movie in E. coli genomes.”

Importantly, the investigators found that a third protein, called integration host factor (IHF), binds near the insertion site and bends the DNA into a U-shape, allowing Cas1-Cas2 to bind two recognition sites simultaneously. Moreover, the researchers discovered that the reaction requires target DNA to bend and partly unwind, something that only occurs at the appropriate target site.

CRISPR is an acronym that stands for clustered regularly interspaced short palindromic repeats, which refers to the unique region of DNA where snippets of viral DNA are stored for future reference, allowing the cell to recognize any virus that tries to reinfect it. The viral DNA alternates with short palindromic repeats, which serve as the recognition signal that directs Cas1-Cas2 to add new viral sequences. Specific recognition of these repeats by Cas1-Cas2 restricts integration of viral DNA to the CRISPR array, allowing it to be used for immunity and avoiding the potentially fatal effects of inserting viral DNA in the wrong place.

In contrast to DNA-binding proteins that directly “read out” the nucleotides of their recognition sequence, Cas1-Cas2 recognizes the CRISPR repeat through more indirect means: its shape and flexibility. In addition to coding for proteins, the nucleotide sequence of a stretch of DNA also determines the molecule's physical properties, with some sequences acting as flexible hinges and others forming rigid rods. The sequence of the CRISPR repeat allows it to bend and flex in just the right way to be bound by Cas1-Cas2, allowing the protein complex to recognize its target by shape.

The discovery of how Cas1-Cas2 recognizes its target opens the door for modification of the proteins themselves. By tweaking the proteins, researchers might be able to create complexes that could be redirected to sequences other than the CRISPR repeat. Such complexes could be useful in organisms that lack their own CRISPR locus.

“The fact that recognition depends on the structure of the DNA changes how we go about looking for potential target sites,” Mr. Wright pointed out in correspondence with GEN. “Rather than looking for sequences that have a G at position 4, for example, we might look for ones that have a potential 'hinge' sequence in the middle.”
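
The contrast Wright draws between the two search strategies can be made concrete with a small Python sketch. It is illustrative only and not drawn from the study: the window size, the function names, and especially the `TATA` "hinge" motif are hypothetical placeholders, standing in for whatever a real DNA-flexibility model would flag.

```python
from typing import Iterator

# Hypothetical placeholder for a flexible "hinge" motif; NOT taken from the study.
HINGE_MOTIF = "TATA"


def windows(seq: str, size: int) -> Iterator[tuple[int, str]]:
    """Yield (start, window) for every window of the given size in seq."""
    for start in range(len(seq) - size + 1):
        yield start, seq[start:start + size]


def sites_by_base(seq: str, size: int, position: int, base: str) -> list[int]:
    """Direct readout: keep windows with a fixed base at a 1-based position."""
    return [s for s, w in windows(seq, size) if w[position - 1] == base]


def sites_by_hinge(seq: str, size: int, motif: str = HINGE_MOTIF) -> list[int]:
    """Shape-style readout: keep windows whose middle third contains the motif."""
    third = size // 3
    return [s for s, w in windows(seq, size) if motif in w[third:size - third]]
```

On a toy sequence, the first function would select windows by a single base at a fixed position, while the second ignores individual positions and asks only whether the assumed hinge falls in the center of the window, so the two searches can return different candidate sites.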

Wright concluded that “we might want to alter the proteins themselves to target them to a new system. Since recognition seems so dependent on the overall structure of the complex, the best strategy would be to make mutations over the protein as a whole, with the aim of subtly altering active site positioning, rather than focusing mutations at a particular sequence-readout region of the protein.”