Scientists at the Wellcome Sanger Institute say they have developed a technique to predict the exact mutations that CRISPR-Cas9 gene editing can introduce into a cell. The team edited 40,000 different pieces of DNA and analyzed a thousand million resulting DNA sequences to reveal the effects of the editing and to build a machine-learning tool that predicts its outcomes.

The group maintains that its work (“Predicting the mutations generated by repair of Cas9-induced double-strand breaks”), published in Nature Biotechnology, will help researchers who use CRISPR-Cas9 to study disease mechanisms and drug targets choose the best sequences to target, making CRISPR-Cas9 gene editing more reliable and therefore cheaper and more efficient.

“The DNA mutation produced by cellular repair of a CRISPR–Cas9-generated double-strand break determines its phenotypic effect. It is known that the mutational outcomes are not random, but depend on DNA sequence at the targeted location. Here we systematically study the influence of flanking DNA sequence on repair outcome by measuring the edits generated by >40,000 guide RNAs (gRNAs) in synthetic constructs,” write the investigators.

“We performed the experiments in a range of genetic backgrounds and using alternative CRISPR–Cas9 reagents. In total, we gathered data for >10⁹ mutational outcomes. The majority of reproducible mutations are insertions of a single base, short deletions or longer microhomology-mediated deletions. Each gRNA has an individual cell-line-dependent bias toward particular outcomes.

“We uncover sequence determinants of the mutations produced and use these to derive a predictor of Cas9 editing outcomes. Improved understanding of sequence repair will allow better design of gene editing experiments.”

The researchers created more than 40,000 different pairs of target DNA and guide RNA and carried out CRISPR-Cas9 gene editing on each. By deep sequencing each pair in different cell lines, they were able to analyze in detail how the DNA was cut and rejoined. They found that the repair outcome depended on the exact sequence of the DNA and guide, and that it was reproducible for a given sequence.
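To make the outcome classes named by the investigators more concrete, here is a minimal Python sketch (not FORECasT and not the authors' method) that enumerates candidate repair products around a cut: the four possible single-base insertions, and every short deletion that removes or abuts the cut, each tagged with the length of its flanking microhomology. The function names `single_base_insertions` and `deletion_outcomes`, the `max_len` cutoff, the toy sequence, and the assumption of a blunt cut at a caller-supplied index (for SpCas9, typically 3 nt upstream of the NGG PAM) are illustrative choices, not taken from the study.

```python
from collections import defaultdict


def single_base_insertions(seq: str, cut: int) -> list[str]:
    """Hypothetical helper: the four products of inserting one base at the cut."""
    return [seq[:cut] + base + seq[cut:] for base in "ACGT"]


def deletion_outcomes(seq: str, cut: int, max_len: int = 10) -> dict[str, tuple[int, int]]:
    """Hypothetical helper: map each distinct deletion product that removes or
    abuts the cut to (deletion length, microhomology length).

    Assumes a blunt cut between seq[cut - 1] and seq[cut]. The microhomology
    length is the number of extra, equally valid placements of the same
    deletion, which equals the length of the repeat shared by both flanks.
    """
    outcomes: dict[str, tuple[int, int]] = {}
    for length in range(1, max_len + 1):
        # Group every placement of a deletion of this length by the product it yields.
        placements: dict[str, list[int]] = defaultdict(list)
        for start in range(len(seq) - length + 1):
            product = seq[:start] + seq[start + length:]
            placements[product].append(start)
        for product, starts in placements.items():
            # Keep the product if any placement of the deletion touches the cut site.
            if any(s <= cut <= s + length for s in starts):
                outcomes[product] = (length, len(starts) - 1)
    return outcomes


if __name__ == "__main__":
    # Toy target: "GCA" appears on both sides of a cut between index 4 and 5.
    target = "TTGCAACGCATT"
    print(single_base_insertions(target, cut=5))
    for product, (length, mh) in deletion_outcomes(target, cut=5, max_len=6).items():
        if mh >= 2:
            print(f"{length}-bp deletion, {mh}-bp microhomology -> {product}")
```

The sketch only lists candidates. Assigning each one a probability, which is what a learned predictor trained on sequencing data would do, is outside its scope.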

The team then used this large body of sequence data to train a machine-learning tool that derived general rules governing the outcome of repair. The program, called FORECasT, enabled them to predict the repaired sequence from the targeted DNA sequence alone.

“We have carried out the largest, most comprehensive study on CRISPR-Cas9 action to date, and analyzed more than a thousand million DNA sequences to allow us to study the process,” said Luca Crepaldi, Ph.D., joint first author on the study from the Wellcome Sanger Institute. “We demonstrated that specific target sequences were repaired by the cell in the same way, proving that the action of the cell mechanisms is reproducible.”

“CRISPR-Cas9 is an extremely important system for introducing mutations into DNA for research, and prospective therapeutic purposes,” added Leopold Parts, Ph.D., the senior author who is also from the Wellcome Sanger Institute. “Our research allows scientists to understand its workings much better, and our transformational method enables people to predict the effects of each CRISPR-Cas9 edit in a cell. This allows better design of editing experiments, and may lead to future therapeutic applications.”