April 15, 2015 (Vol. 35, No. 8)
Edward Perello, CBO and founder, Desktop Genetics
Novel Algorithm Seeks to Streamline Complex, Multiparameter Process
CRISPR gene editing technology offers life scientists unparalleled capabilities in functional genomics and cell-line engineering, and it serves as a novel therapeutic tool to address and prevent disease. However, planning a gene editing experiment is a complex process that requires scientists to consider a multitude of experimental parameters, including:
- The nature of the cellular outcome (functional knock-out, N-terminal knock-in, coding mutation/silent mutation knock-in)
- The sequence of the targeted loci
- The availability of guide RNAs (gRNA) relevant to the target (distance/strand)
- The PAM sequence to use (NGG vs. NAG or others)
- The Cas9 variant to use (determines use of single guides vs. paired)
- The position of gRNAs relative to the target
- The position of gRNAs relative to each other
- The sequence and nature of a knock-in donor (oligo vs. plasmid donor, length of homology arms)
- The on-target activity scores
- The off-target activity scores
- The off-target cutting locations
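To make a few of these parameters concrete, the following is a minimal Python sketch of candidate gRNA enumeration: it scans both strands of a target sequence for 20-nt protospacers followed by a PAM. The function names, the dictionary layout, and the coordinate convention are assumptions made for illustration; this is not how Guidebook or any of the tools discussed here is implemented.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_guides(seq, pam_pattern="[ACGT]GG", guide_len=20):
    """List candidate gRNA sites (protospacer + PAM) on both strands.

    Hypothetical helper: `start` is the leftmost forward-strand coordinate
    (0-based) of the protospacer, whichever strand the guide targets.
    """
    seq = seq.upper()
    n = len(seq)
    # A lookahead is used so that overlapping candidate sites are all found.
    site = re.compile("(?=([ACGT]{%d})(%s))" % (guide_len, pam_pattern))
    guides = []
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for match in site.finditer(template):
            i = match.start()
            start = i if strand == "+" else n - i - guide_len
            guides.append({
                "strand": strand,
                "start": start,
                "protospacer": match.group(1),
                "pam": match.group(2),
            })
    return guides


if __name__ == "__main__":
    for guide in find_guides("CCA" + "T" * 20 + "ACGTTGCAAGCTAGCTAGGATCGG"):
        print(guide)
```

Passing `pam_pattern="[ACGT][AG]G"` would accept NAG as well as NGG sites. Scoring, pairing, and donor design would all build on a candidate list of this kind.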
There is a growing host of commercial and academic tools available for use by the CRISPR community. Each tool affords users the ability to design gRNAs for some, but not all, of the parameters described above.
For instance, while the MIT Broad Tool allows users to design single and paired guides, users can submit only 250 bp at a time, and can only obtain off-target activity scores. On the other hand, the Doench tool allows users to input genomic regions up to 10 kb, but they can only obtain on-target activity scores.
With these tools, and others, users must work with ENSEMBL and NCBI to slowly extract the correct nucleotide sequence for the transcript they want to target, reassembling the coding sequence in a DNA editing package (painstakingly annotating it as they go), before submitting it to one of these tools and waiting hours for their results.
Further, many tools take shortcuts in their off-target searches, typically using nonexhaustive, heuristic search methods (e.g., BLAST or Bowtie), repeat masking, or intron exclusion. Such approximations are generally unsuitable for CRISPR applications that require precise knowledge of risky off-target cuts in essential locations, especially CRISPR-based therapeutics.
The effects of these poor user experiences and suboptimal algorithms include unnecessarily slow planning of genome editing projects, wasted time for countless scientists, lost revenues for companies, and major delays in foundational advances and life-saving treatments.
Desktop Genetics seeks to give life scientists the power to expertly design genome-editing experiments in any cell line, right from their desktop. To do this, Desktop Genetics built Guidebook, a dedicated CRISPR tool, in partnership with Horizon Discovery.
Guidebook combines elements of all public tools, expands upon them with deeper algorithms, and overlays results in an information-rich format. The tool has two modes, a Wizard mode that picks optimal gRNAs for a given genome editing experiment, and an Advanced mode that provides control over all parameters.
Case Study—Advanced Mode
Guidebook’s Advanced mode pulls in data for genes, transcripts, and exon annotations directly from the latest annotation set available at ENSEMBL.org. CCDS annotations are taken from the latest data available at NCBI, and all transcripts are presented in line with one another so that users can see which gRNAs cut across multiple transcripts, or just the one they care about (Figure 1).
gRNAs can be scored for predicted on-target activity, using the training set published by Doench et al. (2014) for human and mouse genomes. The platform scores all guides for on-target activity using these parameters, providing results in a few seconds. Other genomes are also available as a separate customizable service.
Guidebook also supports whole-genome, explicit searches for off-target sites with up to three mismatches. Off-target scores are calculated from mismatch locations, mismatch density, mismatch identity, and base-position weights as published by Hsu et al. (2013). The number of mismatches is customizable, and the tool will find guides with both NGG and NAG PAM sites. The platform reports all mismatch sites found, the coordinates of each site, and whether it falls in a coding DNA sequence.
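As a rough illustration of what an explicit (exhaustive) off-target search involves, the Python sketch below checks every window on both strands of a sequence, keeps sites with an NGG or NAG PAM and at most three mismatches, and records where the mismatches fall. All names are hypothetical. The `toy_site_score` weights are a linear placeholder chosen only to show position-dependent weighting; they are not the values published by Hsu et al., and the sketch ignores mismatch density and identity.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_off_targets(guide, genome, max_mismatches=3, pam_suffixes=("GG", "AG")):
    """Exhaustively scan both strands; no seeding, masking, or heuristics."""
    guide, genome = guide.upper(), genome.upper()
    k, n = len(guide), len(genome)
    hits = []
    for strand, template in (("+", genome), ("-", reverse_complement(genome))):
        for i in range(n - k - 3 + 1):
            pam = template[i + k:i + k + 3]
            if pam[1:] not in pam_suffixes:  # first PAM base is N (any base)
                continue
            site = template[i:i + k]
            # Mismatch positions are 1-based, counted from the guide's 5' end.
            mismatches = [p + 1 for p, (g, s) in enumerate(zip(guide, site)) if g != s]
            if len(mismatches) <= max_mismatches:
                hits.append({
                    "strand": strand,
                    # Leftmost forward-strand coordinate of the 20-nt site.
                    "start": i if strand == "+" else n - i - k,
                    "pam": pam,
                    "mismatches": mismatches,
                })
    return hits


def toy_site_score(mismatches, guide_len=20):
    """Placeholder weighting: PAM-proximal mismatches reduce the score more.

    NOT the published Hsu et al. weights; for illustration only.
    """
    score = 1.0
    for pos in mismatches:
        score *= 1.0 - pos / guide_len
    return score


if __name__ == "__main__":
    guide = "GACCTGAATCGTTAGCCATA"
    genome = "TT" + guide + "AGG" + "CC" + "TACCTGAATCGTTAGCCATC" + "TAG" + "TT"
    for hit in find_off_targets(guide, genome):
        print(hit, toy_site_score(hit["mismatches"]))
```

A linear scan like this is far too slow for a whole mammalian genome in pure Python; the point is only that every site is examined. Checking each hit's coordinates against coding-sequence annotations would be a separate step.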
Guidebook designs gRNAs for use with wild-type Streptococcus pyogenes Cas9 in a “single guide mode” as well as the nickase D10A mutant for “pairs mode,” which generates gRNA designs for 5´ overhangs. Paired guides’ on-target activity is based on the Doench-Root activity score of the lowest-scoring gRNA. The distance between the 5´ ends of the guides in a pair is configured at a maximum of 100 bp and a minimum of -4 bp. Other Cas9 nucleases are also available on request.
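The pairing rules just described can be sketched in a few lines of Python. The `Guide` record, the function name, and in particular the sign convention for the offset (5´ end of the plus-strand guide minus 5´ end of the minus-strand guide) are assumptions made for this example; only the -4 bp and 100 bp limits and the "lowest score wins" rule come from the text.

```python
from dataclasses import dataclass
from itertools import combinations

MIN_OFFSET = -4   # bp, minimum distance between 5' ends (from the text)
MAX_OFFSET = 100  # bp, maximum distance between 5' ends (from the text)


@dataclass(frozen=True)
class Guide:
    name: str
    strand: str        # "+" or "-"
    five_prime: int    # forward-strand coordinate of the guide's 5' end
    on_target: float   # an on-target activity score, higher is better


def valid_pairs(guides):
    """Return (minus_name, plus_name, offset, pair_score), best pair first.

    Assumed convention: offset = plus 5' end - minus 5' end, so a negative
    offset means the two guides overlap slightly.
    """
    pairs = []
    for a, b in combinations(guides, 2):
        if a.strand == b.strand:
            continue  # a nickase pair needs one guide on each strand
        minus, plus = (a, b) if a.strand == "-" else (b, a)
        offset = plus.five_prime - minus.five_prime
        if MIN_OFFSET <= offset <= MAX_OFFSET:
            # The pair is only as good as its weaker guide.
            score = min(minus.on_target, plus.on_target)
            pairs.append((minus.name, plus.name, offset, score))
    return sorted(pairs, key=lambda pair: pair[3], reverse=True)


if __name__ == "__main__":
    candidates = [
        Guide("g1", "-", 100, 0.8),
        Guide("g2", "+", 150, 0.6),
        Guide("g3", "+", 250, 0.9),
        Guide("g4", "+", 97, 0.7),
    ]
    for pair in valid_pairs(candidates):
        print(pair)
```

Taking the minimum rather than the mean reflects the idea that a paired-nickase edit needs both nicks, so one weak guide limits the whole pair.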
Currently under development is an advanced knock-in mode (Figure 2), which automatically designs donor sequences for oligo donors, considering homology arms and allowing users to introduce silent mutations to avoid retargeting.
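For readers unfamiliar with oligo donors, here is a minimal Python sketch of the basic construction: the sequence to knock in is flanked by homology arms copied from either side of the cut site. The function name, the 40-nt default arm length, and the coordinate convention are illustrative assumptions. The sketch does not attempt the silent-mutation step, and it does not describe Guidebook's knock-in mode.

```python
def build_oligo_donor(reference, cut_index, insert, arm_len=40):
    """Return left homology arm + insert + right homology arm.

    `cut_index` is the 0-based position in `reference` where the insert
    should land (hypothetical convention for this example).
    """
    left_start = cut_index - arm_len
    right_end = cut_index + arm_len
    if left_start < 0 or right_end > len(reference):
        raise ValueError("reference too short for the requested homology arms")
    left_arm = reference[left_start:cut_index]
    right_arm = reference[cut_index:right_end]
    return left_arm + insert + right_arm


if __name__ == "__main__":
    locus = "A" * 50 + "C" * 50
    donor = build_oligo_donor(locus, cut_index=50, insert="GGATCC")
    print(len(donor), donor)
```

A fuller design would also alter the PAM or protospacer within the donor, using synonymous codons where the site is coding, so that the repaired allele is not cut again.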
Case Study—Wizard Mode
There are many nuances in picking the right gRNAs to cut a given gene, and novices are likely to choose suboptimal guides. While building Guidebook, we observed the behaviors of Horizon Discovery’s scientists: some chose guides with low on-target scores in certain cases, while others chose guides in particular exons for certain gene families. We collected this industry know-how through ethnographic study and translated it into new algorithms, which now wrap those described above.
In Wizard mode (Figure 3), users simply input a gene. The tool considers the biological context based on data from ENSEMBL and NCBI and generates five gRNAs that represent the final panel that Horizon’s own gene-editing specialists would use for human and mouse cell lines.
In the summer of 2014, Horizon gave away over 5,000 genome editing vectors targeting 1,000 human genes, all designed with the Wizard Library. The data we have collected so far indicate that >93% of researchers were able to knock out their targets successfully.
The Right Genome
To date, Guidebook has designed over 10,000 gene editing experiments for reference human and mouse genomes. While Desktop Genetics continues to add more features and improve our gRNA design algorithms with a long-term goal of achieving predictive gRNA design, CRISPR is about more than gRNA scores. No score is fully predictive, and 100% of guides cutting in the wrong part of a gene, or not cutting due to SNPs disrupting PAM sites, will fail.
It is therefore important to consider other factors that influence CRISPR outcomes, such as the unique genome of the cell line that you work with, chromatin structure, or cell state.
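The PAM-disruption case is simple enough to check in code. The Python sketch below applies a cell line's known single-nucleotide variants to the reference PAM and tests whether it still reads NGG. The function name and the variant format (a dictionary mapping 0-based position to alternate base) are assumptions for this example, and it handles only plus-strand sites and single-base substitutions.

```python
def pam_intact(reference, pam_start, variants, pam_suffixes=("GG",)):
    """Return True if the 3-nt PAM at `pam_start` survives the given SNVs.

    `variants` maps 0-based reference positions to alternate bases
    (hypothetical format; indels are not handled).
    """
    pam = list(reference[pam_start:pam_start + 3].upper())
    for offset in range(3):
        alt = variants.get(pam_start + offset)
        if alt is not None:
            pam[offset] = alt.upper()
    # The first PAM base is N, so only the last two bases matter.
    return "".join(pam[1:]) in pam_suffixes


if __name__ == "__main__":
    site = "A" * 20 + "TGG"
    print(pam_intact(site, 20, {}))          # reference genome
    print(pam_intact(site, 20, {21: "A"}))   # cell line carries a G>A SNP
```

A guide that scores well against the reference genome fails this check in any cell line carrying such a variant, which is why the cell line's own sequence matters.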
To achieve the goal of fully predictive in silico genome editing for all cell lines, we recently announced our Free CRISPR Libraries Initiative. In exchange for data on your experiments, Desktop Genetics will design CRISPR libraries targeting any genome for any cell line. The data will be used to advance our machine-learning approaches to optimize CRISPR for the community.