Multiple CRISPR technologies are being used to specifically alter or silence genes and inhibit protein production. One of these tools is CRISPRi (CRISPR interference), which blocks gene expression without modifying the DNA sequence. As in the traditional CRISPR mechanism, a guide RNA directs a Cas nuclease to its target. In CRISPRi, however, the nuclease binds to the DNA without cutting it, which downregulates the corresponding gene. CRISPRi has become a leading technique for silencing gene expression in bacteria, but its design rules remain poorly defined.
Until now, it has been challenging to predict how well CRISPRi will perform for a specific gene. Researchers have now developed a machine learning approach, based on data integration and artificial intelligence (AI), to improve such predictions. The scientists trained their model on data from multiple genome-wide CRISPRi essentiality screens, with the goal of better predicting the efficacy of the engineered guide RNAs deployed in the CRISPRi system.
The authors found that gene-specific characteristics of targeted genes have a significant impact on guide RNA depletion in genome-wide screens. In addition, combining data from multiple CRISPRi screens significantly improves the accuracy of prediction models and enables more reliable estimates of guide RNA efficiency. The study provides valuable insights for designing more effective CRISPRi experiments by predicting guide RNA efficiency, enabling precise gene-silencing strategies.
This work is published in Genome Biology in the paper “Improved prediction of bacterial CRISPRi guide efficiency from depletion screens through mixed-effect machine learning and data integration.”
“Unfortunately, genome-wide screens only provide indirect information about guide efficiency. Hence, we have applied a new machine learning method that disentangles the efficacy of the guide RNA from the impact of the silenced gene,” explained Lars Barquist, PhD, research group leader at the Würzburg Helmholtz Institute for RNA-based Infection Research (HIRI) and junior professor in the faculty of medicine at the University of Würzburg.
The team developed a “mixed-effect random forest regression model that provides better estimates of guide efficiency.” In doing so, they established comprehensible design rules for future CRISPRi experiments. The study authors validated their approach by conducting an independent screen targeting essential bacterial genes, showing that their predictions were more accurate than previous methods.
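To make the idea of separating gene-level from guide-level effects concrete, here is a minimal sketch in Python. It is not the authors' model: the study uses a mixed-effect random forest, while this toy version swaps in a much simpler linear random-intercept decomposition with a shrunken per-gene offset. The function name `split_gene_and_guide_effects`, the `variance_ratio` parameter, and all numbers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def split_gene_and_guide_effects(observations, variance_ratio=1.0):
    """Split guide depletion into a gene-level and a guide-level part.

    observations: list of (gene, guide, log2_fold_change) tuples.
    variance_ratio: assumed ratio of residual variance to gene-level
        variance (hypothetical; a real mixed model would estimate it).
    """
    grand_mean = mean(lfc for _, _, lfc in observations)

    # Group the depletion values by targeted gene.
    by_gene = defaultdict(list)
    for gene, _, lfc in observations:
        by_gene[gene].append(lfc)

    # Random-intercept estimate per gene: the gene's mean deviation,
    # shrunk toward zero when the gene has few guides.
    gene_effect = {}
    for gene, values in by_gene.items():
        n = len(values)
        shrink = n / (n + variance_ratio)
        gene_effect[gene] = shrink * (mean(values) - grand_mean)

    # What remains after removing the gene-level part is attributed
    # to the individual guide (plus noise).
    guide_effect = {
        guide: lfc - grand_mean - gene_effect[gene]
        for gene, guide, lfc in observations
    }
    return grand_mean, gene_effect, guide_effect


# Toy data (made up): two genes, three guides each.
toy_screen = [
    ("geneA", "A1", -4.0), ("geneA", "A2", -3.0), ("geneA", "A3", -2.0),
    ("geneB", "B1", -1.0), ("geneB", "B2", 0.0), ("geneB", "B3", -2.0),
]
grand_mean, gene_effect, guide_effect = split_gene_and_guide_effects(toy_screen)
```

In this toy screen, guides A3 and B3 show the same raw depletion (-2.0), yet they receive different guide-level scores once each gene's overall tendency to deplete is removed. That is the intuition behind treating the targeted gene as a separate effect rather than reading guide efficiency directly off the screen.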
“The results have shown that our model outperforms existing methods and provides more reliable predictions of CRISPRi performance when targeting specific genes,” said Yanying Yu, a PhD student in Barquist’s research group.
The scientists were particularly surprised to find that the guide RNA itself is not the primary factor in determining CRISPRi depletion in essentiality screens. “Certain gene-specific characteristics related to gene expression appear to have a greater impact than previously assumed,” explained Yu.
The study also reveals that integrating data from multiple data sets significantly improves the predictive accuracy and enables a more reliable assessment of the efficiency of guide RNAs.
“Expanding our training data by pulling together multiple experiments is essential to create better prediction models. Prior to our study, lack of data was a major limiting factor for prediction accuracy,” noted Barquist. “Our study provides a blueprint for developing more precise tools to manipulate bacterial gene expression and ultimately help to better understand and combat pathogens.”