Efficiency Prediction and Mechanism Discovery for the CRISPR-Cas9 System
Abstract
CRISPR-Cas9 has been employed as a genome editing tool in a wide range of cells from different organisms. One of the biggest challenges it faces is maintaining the efficiency of gene regulation. To address this challenge, we designed a data-driven approach based on machine learning to predict the efficiency of CRISPR-Cas9 and to discover its mechanism. We developed Bayesian network models of the relationships between sequence features of the target DNA and the efficiency of the CRISPR-Cas9 system. We first replicated the results of two studies and explained why naive Bayes, a generative model, works better than logistic regression. We then addressed the false assumption that the nucleotides are conditionally independent by changing the dummy encoding to k-mer encoding. We also adopted Bayesian network structure learning and inference to assess the predictive power of the model. Finally, we used D-separation analysis to study the mechanism of CRISPR-Cas9. Combining the latest CRISPR-Cas9 structure with our D-separation results, we found that the locations of the Cas9 active site and of the scissile bonds are consistent with our D-separation findings.
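The difference between dummy encoding and k-mer encoding mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the choice of k = 2, and the example sequence are all hypothetical, chosen only to show how a k-mer feature ties adjacent nucleotides together instead of treating each position as an independent indicator.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def dummy_encode(seq):
    """Dummy (one-hot) encoding: four binary indicators per position.

    Each position is encoded on its own, so a model such as naive Bayes
    treats neighbouring nucleotides as conditionally independent.
    """
    return [1 if base == n else 0 for base in seq for n in NUCLEOTIDES]


def kmer_encode(seq, k=2):
    """Overlapping k-mer encoding (hypothetical helper, k=2 is an assumption).

    Each window of k adjacent bases becomes one categorical feature, so
    dependence between neighbouring nucleotides is carried by the feature.
    """
    # Map every possible k-mer to an integer index in lexicographic order.
    index = {"".join(p): i for i, p in enumerate(product(NUCLEOTIDES, repeat=k))}
    return [index[seq[i:i + k]] for i in range(len(seq) - k + 1)]


# Hypothetical 6-nt fragment, for illustration only.
guide = "ACGTTG"
print(dummy_encode(guide))       # 24 binary features (6 positions x 4 bases)
print(kmer_encode(guide, k=2))   # 5 categorical features, one per dinucleotide
```

With k = 2 each feature takes one of 16 values rather than 4, which trades a larger state space per feature for the ability to represent dependence between adjacent bases.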
Citation
Yan, Yi (2018). Efficiency Prediction and Mechanism Discovery for the CRISPR-Cas9 System. Master's thesis, Texas A&M University. Available electronically from https://hdl.handle.net/1969.1/173628.