A tool for CRISPR-Cas9 sgRNA evaluation based on computational models of gene expression

Cohen, Shai; Bergman, Shaked; Lynn, Nicolas; Tuller, Tamir

doi:10.1186/s13073-024-01420-6

Software
Open access
Published: 23 December 2024

Shai Cohen¹^na1,
Shaked Bergman¹^na1,
Nicolas Lynn¹ &
…
Tamir Tuller ORCID: orcid.org/0000-0003-4194-7068^1,2

Genome Medicine volume 16, Article number: 152 (2024)

Abstract

Background

CRISPR is widely used to silence genes by inducing mutations expected to nullify their expression. While numerous computational tools have been developed to design single-guide RNAs (sgRNAs) with high cutting efficiency and minimal off-target effects, only a few tools focus specifically on predicting gene knockouts following CRISPR. These tools consider factors like conservation, amino acid composition, and frameshift likelihood. However, they neglect the impact of CRISPR on gene expression, which can dramatically affect the success of CRISPR-induced gene silencing attempts. Furthermore, information regarding gene expression can be useful even when the objective is not to silence a gene. Therefore, a tool that considers gene expression when predicting CRISPR outcomes is lacking.

Results

We developed EXPosition, the first computational tool that combines models predicting gene knockouts after CRISPR with models that forecast gene expression, offering more accurate predictions of gene knockout outcomes. EXPosition leverages deep-learning models to predict key steps in gene expression: transcription, splicing, and translation initiation. We showed our tool performs better at predicting gene knockout than existing tools across 6 datasets, 4 cell types and ~207k sgRNAs. We also validated our gene expression models using the ClinVar dataset by showing enrichment of pathogenic mutations in high-scoring mutations according to our models.

Conclusions

We believe EXPosition will enhance both the efficiency and accuracy of genome editing projects, by directly predicting CRISPR’s effect on various aspects of gene expression. EXPosition is available at http://www.cs.tau.ac.il/~tamirtul/EXPosition. The source code is available at https://github.com/shaicoh3n/EXPosition.

Background

Over the past decade, significant progress has been achieved in the field of genome editing, largely attributed to the utilization of CRISPR (clustered regularly interspaced short palindromic repeats) and its Cas (CRISPR-associated) proteins (reviewed in [1]). The Cas9 protein creates double-stranded breaks (DSBs) that are subsequently repaired by the cell’s repair mechanisms, usually through non-homologous end joining (NHEJ), leading to the potential introduction of indels. One of the areas where these advancements have occurred pertains to gene silencing, with the primary objective being the selective inhibition of a specific gene without affecting others. Various methods for gene silencing using CRISPR exist, including expression inhibition through CRISPRi (CRISPR interference), the introduction of point mutations, and the insertion of premature stop codons. Another common approach involves the use of Cas9 proteins to induce mutations in start codons, thereby inhibiting translation initiation [2,3,4].

In most computational models predicting CRISPR activity, researchers have shown interest in the DSB’s location and likelihood, as well as the identity of the resulting mutation [5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22]. This paradigm assumes that if a mutation is induced, then the gene’s expression will significantly decrease; however, this is not always the case. For example, a gene with a mutated start codon can still be translated due to an alternative in-frame start codon, so the resulting protein remains functional [23] (Fig. 1A). Furthermore, transcription and splicing are influenced by various properties of the DNA sequence [24, 25] and not every change to the original sequence will result in a measurable change in the gene’s expression. Therefore, mutated off-target genes, i.e., genes mutated by CRISPR even though the single-guide RNA (sgRNA) was not meant to target them, do not necessarily have their expression affected. On the other hand, even if each exon’s DNA sequence remains unmutated, the gene’s expression could be affected by intronic or intergenic mutations (Fig. 1B). Thus, mutations in all positions require careful evaluation to determine whether they exert a discernible effect on expression.

Recently, a few tools designed to predict gene knockout following CRISPR were created [26, 27]. These tools focus on sgRNAs that target the coding region of a gene and use amino acid composition, conservation and frameshift likelihood. However, a tool designed to address sgRNAs’ effect on gene expression, that can evaluate sgRNAs targeting non-coding regions of genes, is needed.

Here we present a tool that addresses these issues by explicitly considering CRISPR’s effect on the target site’s phenotype rather than the genotypic change. Our tool predicts the impact of CRISPR’s action on three aspects of gene expression: transcription, splicing, and translation initiation. It then incorporates these estimations with predicted cutting efficiency and predictions from tools designed to predict gene knockout to better assess gene knockout. Since not every mutation will significantly affect gene expression, researchers using our tool can save time and money when deciding which sgRNA is most likely to achieve the desired change in gene expression (e.g., silencing) without affecting the expression of off-target genes.

Implementation

General pipeline of the tool

Our tool, called EXPosition, accepts a CRISPR target site location and predicts whether targeting that site will silence the target gene. It accomplishes this by combining post-CRISPR gene expression estimations with gene knockout predictions from other models that rely on other features. Importantly, EXPosition differs from previous tools by also providing information about the likely phenotypical effect of CRISPR at that site, i.e., the effect of the induced mutation on gene expression. The tool’s modules are summarized in Fig. 1C. In short, EXPosition accepts an sgRNA and a gene of interest; predicts the likelihood of the guide’s cut and NHEJ-induced mutations using CRISPRedict [28] and Lindel [29], respectively; predicts the mutations’ effect on transcription, splicing, and translation initiation using Xpresso [24], Oncosplice [30], and TITER [31], respectively; and combines them with the scores from VBC [26], GuidePro [27], and CRISPRedict [28] to provide a prediction of gene knockout using that sgRNA. The source code for our tool is available at https://github.com/shaicoh3n/EXPosition [32].

Running EXPosition proceeds as follows. First, the user chooses which sub-models to run (Fig. 1C:a): transcription, splicing, and translation initiation. Then, the user inputs a target site location (Fig. 1C:b). Using CRISPRedict [28] and Lindel [29], the tool predicts the most likely mutations to be induced by CRISPR, along with their probabilities (Fig. 1C:I). Alternatively, the user can specify the mutations and their probabilities (Fig. 1C:c). In both cases, these mutations are then analyzed in the chosen sub-models. In each sub-model (Fig. 1C:III–V), we evaluate the effect of each mutation on one aspect of expression for each human gene, thereby finding the genes affected by the mutation. We then multiply each mutation’s phenotypic score by the probability of the mutation, and sum over all mutations to arrive at the expected effect on that aspect of expression for each affected gene (Fig. 1C:f–h). If multiple genes are affected by the mutations, the sub-model outputs the gene with the highest score (i.e., the most detrimental effect on expression). Thus, the final score for each sub-model is:

$$\text{Sub-model score}=\max_{j\in\{1,\dots,N_{\mathrm{affected}\;\mathrm{genes}}\}}\sum_{i=1}^{N_{\mathrm{mutations}}}p_i\ast s_{ij}$$

(1)

where $j$ ranges from 1 to ${N}_{\text{affected genes}}$, which is the number of genes affected by the predicted mutations and is usually 1–2 (Fig. 1C:e); ${N}_{\text{mutations}}$ is the number of top mutations (i.e., most probable mutations) considered (by default, ${N}_{\text{mutations}}=4$); ${p}_{i}$ is the probability of mutation $i$ occurring; and ${s}_{ij}$ represents the expected effect of mutation $i$ on gene $j$ according to the sub-model, i.e., its effect on transcription, splicing, or translation initiation. ${s}_{ij}$ is normalized to range between 0 and 1 in the following way:

$$s_{ij}=\min\left(1,\frac{{s'}_{ij}}{M_{\mathrm{sub}-\mathrm{model}}}\right)$$

(2)

where ${s{\prime}}_{ij}$ represents the highest raw sub-model score for mutation $i$ over all of gene $j$’s transcripts, and ${M}_{\text{sub}-\text{model}}$ is the maximal score predicted by the sub-model on all ClinVar mutations (see the section “Mutations with high gene expression scores are overrepresented in ClinVar-designated pathogenic mutations”).
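Eqs. 1 and 2 can be illustrated with a minimal sketch, assuming the raw sub-model scores have already been computed for each mutation and gene. The function names, gene labels, and numbers below are hypothetical and are not taken from the EXPosition source code.

```python
def normalize_score(raw_score, max_clinvar_score):
    """Eq. 2: scale a raw sub-model score by the ClinVar maximum and cap it at 1."""
    return min(1.0, raw_score / max_clinvar_score)


def sub_model_score(mutation_probs, raw_scores_by_gene, max_clinvar_score):
    """Eq. 1: expected effect per gene, then the maximum over affected genes.

    mutation_probs: list of p_i, one entry per considered mutation.
    raw_scores_by_gene: dict mapping gene -> list of raw scores s'_ij,
        in the same order as mutation_probs.
    Returns (gene, score) for the most strongly affected gene.
    """
    expected = {}
    for gene, raw_scores in raw_scores_by_gene.items():
        expected[gene] = sum(
            p * normalize_score(s, max_clinvar_score)
            for p, s in zip(mutation_probs, raw_scores)
        )
    best_gene = max(expected, key=expected.get)
    return best_gene, expected[best_gene]


# Hypothetical input: four mutations affecting two genes.
probs = [0.4, 0.3, 0.2, 0.1]
raw = {
    "GENE_A": [0.5, 0.0, 2.0, 0.0],  # the raw score 2.0 is capped at 1 by Eq. 2
    "GENE_B": [0.1, 0.1, 0.1, 0.1],
}
gene, score = sub_model_score(probs, raw, max_clinvar_score=1.0)
print(gene, round(score, 3))
```

In this toy input, GENE_A is reported because its probability-weighted score is higher than that of GENE_B.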

Finally, the scores from EXPosition are fed, together with the predicted cutting efficiency (Fig. 1C:i), VBC score (Fig. 1C:j), and GuidePro score (Fig. 1C:k), into a Support Vector Machine (SVM) classifier (Fig. 1C:VIII) to provide a binary classification of whether the sgRNA would cause a gene knockout (Fig. 1C:I).

Importantly, instead of evaluating every human coding gene/transcript affected by the mutations, the user can input a gene or transcript of interest (Fig. 1C:d); if they do so, the mutation’s effects in the selected sub-models will be checked only against that gene or transcript. If a transcript of interest was set, ${s{\prime}}_{ij}$ is the score of that transcript (instead of the maximal score over all the associated gene’s transcripts).

In the transcription sub-model (Fig. 1C:III), the score (${s}_{ij}$) represents the predicted relative change in mRNA levels caused by a mutation; this change is predicted using Xpresso [24]. In the splicing sub-model (Fig. 1C:IV), the score signifies the predicted loss of functionality of the gene’s proteins (based on evolutionary conservation) caused by a mutation, while considering any mis-splicing events and changes in the position of their start codon; this is accomplished using Oncosplice [30]. In the translation initiation sub-model (Fig. 1C:V), the score denotes the predicted relative change in the start codon’s efficiency of translation initiation, while considering any mis-splicing events; this is achieved using TITER [31]. The following sections detail the different parts of EXPosition.

Predicting the genotypic outcome of CRISPR’s DSB

As a first step, based on the user’s cut site location, we extract the target site along with its flanking sequences to predict the DSB’s probability and resulting indels using CRISPRedict [28] and Lindel [29], respectively (Fig. 1C:I). CRISPRedict is a linear regression model that predicts the probability of cutting by sgRNAs; it takes as input a 30-nt-long sequence surrounding the target site: 4 nt upstream of the site, the 20 nt of the site itself, and the following 6 nt downstream of the site. Lindel is a logistic regression model that predicts the likelihood of NHEJ mutations induced by CRISPR; its input is a 60-nt sequence centered around the cut site, and its output consists of the predicted probabilities of 557 possible mutations: deletions around the cut site of up to 30 nt, every possible insertion of 1–2 nt, and a single collective mutation for any insertions of ≥ 3 nt.
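The two input windows described above can be sketched as simple slices. This is only an illustration under stated assumptions: the sequence is given on the strand of the sgRNA, `protospacer_start` is the 0-based index of the first nucleotide of the 20-nt target site, and the cut is placed 3 nt upstream of the PAM (between target site positions 17 and 18), as is standard for SpCas9. The function names are hypothetical and are not the CRISPRedict or Lindel APIs.

```python
def crispredict_window(seq, protospacer_start):
    """30-nt context: 4 nt upstream + 20-nt target site + 6 nt downstream."""
    start = protospacer_start - 4
    end = protospacer_start + 20 + 6
    if start < 0 or end > len(seq):
        raise ValueError("not enough flanking sequence for the 30-nt window")
    return seq[start:end]


def lindel_window(seq, protospacer_start):
    """60-nt context centered on the assumed cut site (30 nt on each side)."""
    cut = protospacer_start + 17  # assumption: cut 3 nt upstream of the PAM
    start, end = cut - 30, cut + 30
    if start < 0 or end > len(seq):
        raise ValueError("not enough flanking sequence for the 60-nt window")
    return seq[start:end]


# Hypothetical 120-nt sequence with a target site starting at index 40.
example_seq = "ACGT" * 30
window_30 = crispredict_window(example_seq, 40)
window_60 = lindel_window(example_seq, 40)
print(len(window_30), len(window_60))
```

For sites on the reverse strand, the sequence would first need to be reverse-complemented so that the same indexing applies.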

We analyze only the ${N}_{\text{mutations}}$ (by default ${N}_{\text{mutations}}=4$) most likely mutations predicted by Lindel in our tool, as checking all possibilities is not feasible within a practical running time. We exclude insertions longer than 2 nt, as Lindel does not provide explicit mutations for such cases, which have been demonstrated to be exceedingly rare [22, 29]. We then normalize the Lindel probabilities of the retained mutations so that they sum to 1, and weight them by the predicted cutting probability.

Thus, the probability for each mutation is calculated as follows:

$$p_i=\frac{{p_{\mathrm{Lindel}}}_i}{\sum_{k=1}^{N_{\mathrm{mutations}}}{p_{\mathrm{Lindel}}}_k}\ast{p_{\mathrm{CRISPRedict}}}_i$$

(3)

where ${{p}_{\text{Lindel}}}_{i}$ and ${p}_{{\text{CRISPRedict}}_{i}}$ represent the probabilities from Lindel and CRISPRedict of mutation $i$, respectively; and ${N}_{\text{mutations}}$ is the number of most probable mutations taken from Lindel.
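A minimal sketch of Eq. 3 follows, assuming the Lindel probabilities are available as a dictionary (with insertions longer than 2 nt already removed) and that a single CRISPRedict cutting probability applies to all mutations of a given sgRNA. The function name and the mutation labels are hypothetical.

```python
def top_mutation_probabilities(lindel_probs, cut_prob, n_mutations=4):
    """Eq. 3: keep the most probable outcomes, renormalize, weight by cutting.

    lindel_probs: dict mapping a mutation label -> Lindel probability.
    cut_prob: predicted probability that the sgRNA cuts (assumed to be one
        value per sgRNA in this sketch).
    """
    top = sorted(lindel_probs.items(), key=lambda kv: kv[1], reverse=True)
    top = top[:n_mutations]
    total = sum(p for _, p in top)
    if total == 0:
        raise ValueError("all retained mutation probabilities are zero")
    return {mutation: (p / total) * cut_prob for mutation, p in top}


# Hypothetical Lindel output for one target site.
lindel = {"del_1": 0.30, "ins_A": 0.20, "del_2": 0.10, "del_3": 0.05, "del_7": 0.02}
weighted = top_mutation_probabilities(lindel, cut_prob=0.8)
print(weighted)
```

With this weighting the retained probabilities sum to the cutting probability rather than to 1, so the remaining probability mass corresponds to the sgRNA not cutting.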

The predicted mutations and their probabilities serve as input for the three sub-models described in the following sections. Alternatively, users can manually input specific mutations of interest along with their probabilities, which can include both indels and substitutions.

Transcription sub-model

To predict the effect of a mutation on transcription (Fig. 1C:III), we employ Xpresso [24], a fast and accurate deep learning model (Additional file 1: Fig. S1) that predicts mRNA steady-state abundance based on the nucleotide context around a transcription start site (TSS). Its input consists of a 10.5-kb context around the TSS, while the output is the log₁₀ mRNA expression level of the respective gene (Fig. 2A). Although more accurate models exist, such as Enformer [33], which can consider mutations up to 100k base pairs away, they are too slow and computationally heavy to be incorporated into EXPosition: Enformer can take ~ 5 min to evaluate a mutation, compared with less than a second for Xpresso. More details about Xpresso can be found in supplementary Sect. 1.

For a given mutation $i$ and gene $j$, we examine whether the mutation could potentially impact the gene’s transcription levels by considering its transcripts’ TSSs and checking if the mutation falls within 7 kb upstream or 3.5 kb downstream of them (i.e., in the region that Xpresso considers when evaluating a TSS). For each potentially impacted transcript, we calculate the Xpresso score of the 10.5 kb sequence around its TSS before and after the mutation (denoted ${r}_{\text{WT}}$ and ${r}_{\text{mutated}}$, respectively). Each transcript’s final transcription score reflects the relative change in mRNA transcription levels following the mutation:

$$s_{\mathrm{Transcription}}=\frac{\left|r_{\mathrm{mutated}}-r_{\mathrm{WT}}\right|}{r_{\mathrm{WT}}}$$

(4)

${s{\prime}}_{ij}$ in Eq. 2 is the maximal ${s}_{\text{Transcription}}$ caused by mutation $i$ over gene $j$’s transcripts. If a specific transcript of interest was provided, the output score pertains solely to that transcript.
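The window check and Eq. 4 can be sketched as follows. This is an illustration only: the predicted expression values are assumed to be supplied by the caller (no Xpresso call is made), the wild-type prediction is assumed to be positive, and all function names are hypothetical.

```python
def in_tss_window(mutation_pos, tss_pos, strand, upstream=7000, downstream=3500):
    """True if the mutation lies in the 10.5-kb region evaluated around a TSS."""
    # Signed offset relative to the TSS, measured in the direction of transcription.
    if strand == "+":
        offset = mutation_pos - tss_pos
    else:
        offset = tss_pos - mutation_pos
    return -upstream <= offset <= downstream


def transcription_score(r_wt, r_mutated):
    """Eq. 4: relative change in the predicted expression level."""
    if r_wt == 0:
        raise ValueError("Eq. 4 is undefined for a wild-type prediction of 0")
    return abs(r_mutated - r_wt) / r_wt


def gene_transcription_score(predictions):
    """Raw gene-level score: maximum of Eq. 4 over the gene's transcripts.

    predictions: list of (r_wt, r_mutated) pairs, one per affected transcript.
    """
    return max(transcription_score(wt, mut) for wt, mut in predictions)


# Hypothetical predictions for two transcripts of one gene.
gene_score_example = gene_transcription_score([(2.0, 1.5), (1.0, 0.9)])
print(gene_score_example)
```

Because the window is asymmetric, a mutation 4 kb from the TSS is evaluated only if it lies upstream of the TSS on the gene's strand.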

Finding mRNA isoforms following a mutation

Both the splicing model (Fig. 1C:IV) and the translation initiation model (Fig. 1C:V) analyze isoforms of transcripts following mutations and potential aberrant splicing events. We obtained splice site annotations from Ensembl [34]. To predict mis-splicing events, we utilize SpliceAI [25], a deep learning tool (Additional file 1: Fig. S2) that predicts the change in a position’s probability to function as a splicing donor/acceptor site following a mutation. More information can be found in supplementary Sect. 2. We use these annotated splice sites and the splicing changes predicted by SpliceAI to generate all possible mRNA isoforms following the mutation by concatenating donor and acceptor splice sites (Additional file 1: Fig. S3). Further details are available in supplementary Sect. 3.

Splicing sub-model

The splicing sub-model (Fig. 1C:IV) assesses the impact of a mutation on a gene’s viability by examining the isoforms generated for each of the gene’s transcripts. To gauge the effect of a mutation on a protein’s functionality, we use Oncosplice [30]. This model receives a mutation and a gene as input and predicts how much the mutation disrupts the gene’s protein function. This disruption is scored using evolutionary conservation information (Fig. 2B).

For each isoform, a sliding window is employed to identify the most conserved area that is affected by the mutation; the window’s length is set to the average domain length of all human proteins. The score of each transcript is the average score of its isoforms (Additional file 1: Fig. S4); and finally, the gene’s score is the maximal transcript score, i.e., the transcript whose function was most significantly disrupted. If a specific transcript of interest was provided, the output score pertains solely to that transcript. For further information, please refer to supplementary Sect. 4.
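The aggregation from isoform scores to a gene score described above can be sketched as follows, assuming the per-isoform scores have already been computed. The function name and transcript labels are hypothetical.

```python
from statistics import mean


def aggregate_gene_score(isoform_scores_by_transcript, transcript_of_interest=None):
    """Average isoform scores per transcript, then take the maximum over transcripts.

    isoform_scores_by_transcript: dict mapping transcript ID -> list of
        isoform scores for that transcript.
    transcript_of_interest: if given, return only that transcript's score.
    """
    transcript_scores = {
        transcript: mean(scores)
        for transcript, scores in isoform_scores_by_transcript.items()
        if scores  # skip transcripts without any predicted isoform
    }
    if transcript_of_interest is not None:
        return transcript_scores[transcript_of_interest]
    return max(transcript_scores.values())


# Hypothetical isoform scores for two transcripts of one gene.
isoform_scores = {"TRANSCRIPT_1": [0.2, 0.6], "TRANSCRIPT_2": [0.9, 0.3]}
whole_gene = aggregate_gene_score(isoform_scores)
single = aggregate_gene_score(isoform_scores, transcript_of_interest="TRANSCRIPT_1")
print(whole_gene, single)
```

Averaging within a transcript and taking the maximum across transcripts means that a single strongly disrupted transcript is enough for the gene to receive a high score.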

Translation initiation sub-model

The translation initiation sub-model (Fig. 1C:V, Fig. 2C) assesses the ability of mutant variants to initiate translation by searching for suitable start codons within the isoforms identified by the splicing model (see the section “Finding mRNA isoforms following a mutation”). The translation initiation score of the suitable codons is determined using TITER [31], a deep learning tool (Additional file 1: Fig. S5) that combines a neural network with known codon compositions of translation initiation sites (TISs) to predict TIS functionality. For additional information, please consult supplementary Sect. 5.

For each isoform, we locate the start of the coding sequence through local alignment with the original transcript’s coding sequence’s start. An iterative process is then initiated where TITER examines all in-frame NUG and ACG codons (where N can be any nucleotide) within a window surrounding the coding sequence’s start, with the window size increasing in each iteration. The process concludes when either the best new codon is discovered (with a TITER score sufficiently close to that of the canonical start codon) or when the maximum window size is reached. Further information can be found in supplementary Sect. 6. We then calculate the WT start codon’s TITER score rank, compared to the TITER scores of all human canonical start codons; this rank is denoted as ${i}_{\text{WT}}$. We repeat this calculation for the best new start codon found for the isoform, whose rank is denoted ${i}_{\text{mutated}}$. The isoform's score is defined as the relative change in the isoform start codon's TITER score rank:

$$s_{\mathrm{Initiation}}=\frac{\left|i_{\mathrm{mutated}}-i_{\mathrm{WT}}\right|}{i_{\mathrm{WT}}}$$

(5)

Like the splicing sub-model, we calculate the transcript’s initiation score as the average of the isoform scores and the gene’s initiation score (i.e., $s'_{ij}$ from Eq. 2) as the highest score among all its transcripts (Additional file 1: Fig. S4). Likewise, if a specific transcript of interest is provided, we provide the score exclusively for that transcript.
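One iteration of the start codon search and Eq. 5 can be sketched as follows. This is an illustration under stated assumptions: the isoform is given as an RNA string, the position of the coding sequence start is already known, the ranks are supplied by the caller (no TITER call is made), and the function names are hypothetical.

```python
def candidate_start_codons(mrna, cds_start, window):
    """Positions of in-frame NUG and ACG codons within +/- window nt of the CDS start."""
    first = max(0, cds_start - window)
    last = min(len(mrna) - 3, cds_start + window)
    hits = []
    for pos in range(first, last + 1):
        if (pos - cds_start) % 3 != 0:
            continue  # not in frame with the original start codon
        codon = mrna[pos:pos + 3]
        if codon[1:] == "UG" or codon == "ACG":
            hits.append(pos)
    return hits


def initiation_score(rank_wt, rank_mutated):
    """Eq. 5: relative change in the start codon's score rank."""
    return abs(rank_mutated - rank_wt) / rank_wt


# Hypothetical isoform whose canonical AUG starts at index 7.
isoform = "GGCUGAAAUGGCCACGUAA"
candidates = candidate_start_codons(isoform, cds_start=7, window=9)
print(candidates)                   # in-frame AUG and ACG positions
print(initiation_score(100, 250))   # hypothetical ranks
```

In the tool, the window is enlarged iteratively and each candidate is scored with TITER; the sketch only shows how candidates could be enumerated for a single window size.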

Vienna Bioscore CRISPR (VBC)

VBC [26] (Fig. 1C:VI) is a tool that predicts gene knockout given an sgRNA and the position of the target site. It outputs its score using linear regression with the following features: (A) indel formation predictions from inDelphi [12]; (B) a “Bioscore” calculated using protein features like Pfam domains, DNA and amino acid conservation, amino acid identity, and gene structure; and (C) sgRNA activity prediction obtained using predictors like Azimuth and similar models. Together, these components form a comprehensive score that captures key processes in CRISPR–Cas9 mutagenesis and can be used to estimate gene knockout effectiveness following CRISPR. For more information about this tool and its performance, please review the original paper and the results in this paper.

GuidePro

GuidePro [27] (Fig. 1C:VII) is another tool that predicts gene knockout of sgRNAs targeting protein-coding exons. Knockout efficiency is governed by three key factors: (A) sgRNA activity score attained with DeepHF [35], Azimuth [36] and SSC [37]; (B) frameshift probability acquired with inDelphi [12], Lindel [29], and FORECasT [14]; and (C) amino acid sensitivity score which is evaluated using conservation, Pfam domain annotations, post-translational modifications (PTMs), and secondary structures. Each of these scores is created with an SVM, and these scores are then fed into another SVM to estimate gene knockout. For more information about this tool and its performance, please review the original paper and the results in this paper.

SVM sgRNA gene knockout classifier

EXPosition utilizes an SVM classifier (Fig. 1C:VIII) with an RBF kernel using EXPosition’s gene expression estimations, along with the predicted cutting likelihood and the scores from VBC and GuidePro as features. The model was trained using all sgRNAs from all the datasets in this paper. When the scores from VBC and/or GuidePro are not available for an sgRNA, EXPosition uses one of several similarly trained SVM classifiers that do not require GuidePro and/or VBC.
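The fallback between classifiers can be sketched as a lookup keyed by the features that are available for a given sgRNA. The feature names below are hypothetical, the trained classifiers themselves are not shown, and the RBF kernel is included only to illustrate the similarity measure such a classifier uses.

```python
import math

# Hypothetical feature names; the four core features are always required here.
CORE_FEATURES = ("transcription", "splicing", "initiation", "cut_probability")
OPTIONAL_FEATURES = ("vbc", "guidepro")


def select_feature_set(scores):
    """Return the ordered feature names that are available for this sgRNA."""
    names = list(CORE_FEATURES)
    for name in OPTIONAL_FEATURES:
        if scores.get(name) is not None:
            names.append(name)
    return tuple(names)


def feature_vector(scores):
    """Build the classifier input; the feature set doubles as the model key."""
    names = select_feature_set(scores)
    return names, [scores[name] for name in names]


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical sgRNA for which no GuidePro score is available.
sgrna_scores = {
    "transcription": 0.1, "splicing": 0.7, "initiation": 0.0,
    "cut_probability": 0.8, "vbc": 0.6, "guidepro": None,
}
model_key, vector = feature_vector(sgrna_scores)
print(model_key, vector)
```

A dictionary mapping each possible `model_key` to a separately trained classifier would then give one model per combination of available scores.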

Mutations with high gene expression scores are overrepresented in ClinVar-designated pathogenic mutations

We wanted to further validate each of our gene expression sub-models, even though their main components were already validated elsewhere, and demonstrate our ability to predict functional effect of mutations. Thus, we used our tool’s gene expression component to analyze mutations from the ClinVar dataset [38], which contains mutations and their phenotypes accumulated from laboratories and researchers globally. We analyzed ~ 325k mutations tagged as benign (192k) or pathogenic (133k). Although using these data is not ideal, and not every mutation would lead to a functional effect, we expect a monotonic association between the score of the model and the number of pathogenic mutations.

For each of EXPosition’s gene expression sub-models, we examined the 3254 mutations in each percentile range (i.e., 99–100%, 98–99%, etc.) and calculated the fraction of pathogenic mutations in that set (Fig. 3; the score thresholds for each percentile are detailed in supplementary Sect. 7, Additional file 1: Table S1). We denote this fraction ${S}_{\text{ClinVar}}$. A higher fraction indicates better recognition of pathogenic mutations by the sub-model. All sub-models provided meaningful rankings on the ClinVar dataset from a certain threshold. The thresholds of the transcription, splicing, and translation initiation sub-models correspond to the 92nd, 67th, and 97th percentiles, respectively. ${S}_{\text{ClinVar}}$ values for the higher percentiles exhibited significant enrichment of pathogenic mutations in almost all of the top 5 percentiles ($p<10^{-25}$, $p<10^{-324}$, and $p<10^{-308}$ for the transcription, splicing, and translation initiation sub-models, respectively, using the hypergeometric test).
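The percentile analysis can be sketched as follows, assuming one score and one benign/pathogenic label per mutation. The function names are hypothetical, and the enrichment test is written as the upper tail of the hypergeometric distribution using exact integer arithmetic; the authors' exact procedure may differ.

```python
from fractions import Fraction
from math import comb


def pathogenic_fraction_by_bin(scores, is_pathogenic, n_bins=100):
    """Fraction of pathogenic mutations per equal-sized score bin (top bin first)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    bin_size = len(order) // n_bins  # leftover lowest-scoring items are ignored
    if bin_size == 0:
        raise ValueError("fewer mutations than bins")
    fractions = []
    for b in range(n_bins):
        members = order[b * bin_size:(b + 1) * bin_size]
        fractions.append(sum(is_pathogenic[i] for i in members) / bin_size)
    return fractions


def hypergeom_upper_tail(population, pathogenic_total, bin_size, pathogenic_in_bin):
    """Exact P(X >= pathogenic_in_bin) when drawing bin_size items without replacement."""
    upper = min(bin_size, pathogenic_total)
    favorable = sum(
        comb(pathogenic_total, k) * comb(population - pathogenic_total, bin_size - k)
        for k in range(pathogenic_in_bin, upper + 1)
    )
    return Fraction(favorable, comb(population, bin_size))


# Tiny hypothetical example: four mutations split into two bins.
fractions = pathogenic_fraction_by_bin([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 1], n_bins=2)
p_value = hypergeom_upper_tail(10, 4, 3, 2)
print(fractions, p_value)
```

The p-value is returned as an exact `Fraction` because values as small as those reported here would underflow to zero if converted to a double-precision float.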

Following this analysis, each sub-model’s ClinVar threshold was set as the lowest percentile in which we observed a significant enrichment of pathogenic mutations (e.g., the 97th percentile for the translation initiation sub-model). The thresholds are used to inform the user that a mutation’s effect on a certain aspect of expression is potentially disease-causing. We then calculated the maximal score predicted by each sub-model for the whole set of ClinVar mutations and normalized the thresholds by these maximal values, making them easier to interpret. To assess the impact of an sgRNA on a specific expression aspect, we compute an average score across the corresponding sub-model for all predicted mutations, weighted by their probabilities. If the mean sub-model score surpasses its designated threshold, EXPosition informs the user that this aspect is affected. Alternatively, if a mutation was input manually, its score is checked against the sub-model’s threshold and the user is notified accordingly. The maximal scores from the ClinVar analysis for each sub-model used to normalize the scores (Eq. 2), as well as the raw and normalized final thresholds, can be found in supplementary Sect. 7, Additional file 1: Table S2.

Technical details regarding data for validation of EXPosition

To validate EXPosition, we used four functional screening datasets published by Doench et al. [39, 40], Doench et al. [36, 41], Shalem et al. [42, 43], and Xu et al. [37, 44]. The Doench (2016) and Shalem studies performed negative selection screening using sgRNAs that targeted multiple sites with the aim of identifying essential genes in the A375 and HT29 cell lines. The data from Xu contain sgRNAs marked as efficient/non-efficient at performing gene knockout in the KBM7 and HL60 cell lines, while the data from Doench (2014) consist only of sgRNAs that demonstrated gene knockouts in A375 cells.

sgRNAs that did not have scores for VBC or GuidePro, were unanalyzable due to bugs in EXPosition, or were not found in the genome were omitted from the analysis. The Xu dataset contained ~ 2k sgRNAs (for both cell lines), of which 1.8k were used. The Doench (2014) dataset consisted of ~ 1.3k sgRNAs, of which 1.2k were used. The Doench (2016) dataset contained ~ 113k and ~ 77k sites for the A375 and HT29 cell lines, respectively, of which ~ 89k and ~ 61k sites were used, respectively. The Shalem dataset contained ~ 65k sites, of which ~ 53k were used.

The Doench (2016) and Shalem datasets include measurements from both lentiCRISPRv2 and lentiGuide, with multiple repeats; these results were averaged as described in supplementary Sect. 8.

Results

EXPosition is a tool that predicts gene knockout by considering gene expression estimations

We have developed EXPosition (EXPosing CRISPR's Impact on EXPression and Position), a computational tool designed to assess CRISPR sites and determine the likelihood of successfully silencing a target gene. It does this by integrating post-CRISPR gene expression estimations with gene knockout evaluations from models that consider various other factors such as frameshift likelihood, amino acid composition, and conservation. EXPosition’s gene expression sub-models estimate the sgRNA’s impact on three key stages of gene expression: transcription, splicing, and translation initiation. The tool begins by predicting the most probable mutations and their associated probabilities resulting from CRISPR utilization. Subsequently, EXPosition assesses the impact of each mutation on these three aspects of gene expression and outputs a score for each gene affected by the mutations. Finally, EXPosition utilizes previous tools to predict scores pertaining to gene knockout, and combines their scores along with the predicted cutting efficiency and gene expression estimates in an SVM classifier to predict potential gene knockout by the input sgRNA.

EXPosition (depicted in Fig. 1C) employs deep-learning algorithms for its gene expression component, all of which have undergone independent validation to execute their tasks. The transcription sub-model (Fig. 1C:III) predicts the relative change in mRNA levels following the mutation. As for the splicing and translation initiation sub-models (Fig. 1C:IV-V), we consider all possible isoforms generated by the mutation through alternative splicing. These isoforms can lead to a distinct protein from the original one due to the following factors:

1. Aberrant splicing events: These can arise by either creating new donor/acceptor sites or deleting existing ones, resulting in an altered splicing pattern of the mRNA.
2. Alterations in initiation site usage: Mutations in the start codon or its context can modify the initiation capability. Factors such as the nucleotide context and folding energy of the start codon play pivotal roles in determining initiation efficiency (as reviewed in [45]). Any change in these aspects could impact the initiation capability of the original start codon. Additionally, other potential start codons, such as ATGs in the same reading frame, could serve as alternative initiation sites, preserving the transcript's initiation capability [23] and potentially retaining the protein’s function. Consequently, an assessment of the initiation capability of all potential start codons, including the original ATG if unaltered, is necessary to predict whether translation initiation can occur.
3. Elimination of the gene's stop codon: This type of mutation can lead to the addition of potentially unnecessary amino acids to the translated protein.

Therefore, our tool’s gene expression component comprehensively evaluates the potential effects of mutations on these three aspects of gene expression, utilizing deep-learning algorithms that have undergone rigorous validation.

The mRNA’s isoforms are constructed by predicting the mutation’s effect on alternative splicing and concatenating relevant exons, i.e., exons with viable splicing donor/acceptor pairs. These isoforms are then passed to the splicing and translation initiation sub-models, which assess the viability of these isoforms based on amino acid conservation and the translation initiation capability of the isoforms, respectively. Details regarding each gene expression sub-model appear in the “Implementation” section.

Our tool also incorporates the sgRNA’s VBC (Fig. 1C:VI) and GuidePro (Fig. 1C:VII) scores and uses them along with the predicted cutting efficiency (from CRISPRedict; Fig. 1C:II) and gene expression estimates in an SVM classifier (Fig. 1C:VIII) to determine if the input sgRNA will silence the gene. Details regarding VBC, GuidePro, CRISPRedict, and the final SVM classifier also appear in the “Implementation” section.

The tool is written in Python 3.9 and can be accessed at http://www.cs.tau.ac.il/~tamirtul/EXPosition. The tool’s GUI is shown in Fig. 4.

EXPosition’s gene expression component provides additional information that improves prediction of gene knockout with CRISPR

To validate our tool’s gene expression component, i.e., the gene expression estimates post-CRISPR, we searched for datasets containing measurements of gene expression following CRISPR editing. Since no such explicit data were found, we decided to use four functional screening datasets published by Doench et al. [39], Doench et al. [36], Xu et al. [37], and Shalem et al. [42]. Our reasoning was that we could validate our tool’s gene expression component by comparing its predictions with quantifications of gene knockout (more information about the data can be found in the section “Technical details regarding data for validation of EXPosition”).

In each of the studies, cells were infected with lentiviruses so that they expressed sgRNAs and Cas9, and the fold change in sgRNA levels following CRISPR’s action was measured (Fig. 5A). The underlying assumption was that sgRNAs targeting essential genes would negatively impact the cell's fitness, leading to a reduction in the abundance of these sgRNAs. We believe that this assumption generally holds true to some extent for any gene [46, 47]. Furthermore, we believe that this relationship is, to some extent, attributable to alterations in gene expression. Consequently, we anticipated observing a stronger correlation between the impact on fitness and the predicted effect on gene expression than between the impact on fitness and the efficiency of DNA cutting alone.

We performed regression analysis on each dataset, comparing existing models’ performance with their performance after adding EXPosition’s gene expression results as features. In addition, we conducted classification analysis for each dataset by labeling sgRNAs with depletion rates of at least half as silencing and the rest as non-silencing (as done by the authors of VBC) (Fig. 5B). The results pertaining to each dataset were obtained using regressors/classifiers that were trained and tested (on withheld data) using data solely from that dataset. Additional details regarding the training of these models can be found in supplementary Sect. 9.

When added to VBC or GuidePro, EXPosition’s gene expression component improved prediction of gene knockout on the test sets in all cases, across different functional screening libraries and cell types; together, these comprise 6 functional screening experiments, 4 cell types, and 207k sites (Table 1). This demonstrates that additional information is embedded in EXPosition’s gene expression features, thus validating the gene expression estimations (which were already validated, both independently and in our analysis of the ClinVar dataset).

Table 1 EXPosition’s gene expression features improve Spearman correlations with sgRNA depletion rates. “VBC vs. VBC + EXP. gene expression” and “GuidePro vs. GuidePro + EXP. gene expression” are the comparisons between the regressor trained with VBC/GuidePro and the regressor trained with VBC/GuidePro and EXPosition’s gene expression models. Each cell contains the median Spearman correlation obtained from 100 cross-validations of training an XGBoost regressor on a randomly chosen 80% of the data and testing on the remaining 20%. Cells with an asterisk are cases where the added gene expression features from EXPosition improved performance with statistical significance (p < 0.05, Wilcoxon rank-sum test).

To verify that this improvement was not a result of random chance, we repeated the analysis after shuffling the training data labels; with this change, no improvement was observed for any of the comparisons, indicating that the improvement is indeed not due to random chance. Similar results were obtained using classification (Additional file 1: Tables S3-S5, see supplementary Sect. 10).
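The evaluation protocol and the label-shuffling control described above can be sketched as follows. This is an illustrative, dependency-free Python example on synthetic data, not the authors' pipeline: an ordinary least-squares fit stands in for the XGBoost regressor, 30 random splits replace the 100 used in the paper, and every name and value (`median_spearman`, `baseline`, `expression`, `depletion`) is hypothetical.

```python
import random
from statistics import median

def ranks(values):
    """Average 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(a, b):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5

def fit_ols(X, y):
    """Least-squares weights (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y], solved by Gauss-Jordan elimination.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [v - A[r][c] * w for v, w in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]

def median_spearman(X, y, n_splits=30, shuffle_labels=False, seed=0):
    """Median test-set Spearman correlation over random 80/20 splits."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        cut = int(0.8 * len(idx))
        train, test = idx[:cut], idx[cut:]
        y_train = [y[i] for i in train]
        if shuffle_labels:
            rng.shuffle(y_train)  # control: break the feature-label link
        w = fit_ols([X[i] for i in train], y_train)
        scores.append(spearman(predict(w, [X[i] for i in test]),
                               [y[i] for i in test]))
    return median(scores)

# Synthetic stand-ins: a baseline score, an extra expression feature, and a
# depletion value that depends on both (hypothetical data only).
rng = random.Random(1)
baseline = [rng.gauss(0, 1) for _ in range(300)]
expression = [rng.gauss(0, 1) for _ in range(300)]
depletion = [b + e + rng.gauss(0, 0.5) for b, e in zip(baseline, expression)]

X_base = [[b] for b in baseline]
X_aug = [[b, e] for b, e in zip(baseline, expression)]

rho_base = median_spearman(X_base, depletion)
rho_aug = median_spearman(X_aug, depletion)
rho_aug_shuffled = median_spearman(X_aug, depletion, shuffle_labels=True)
print(rho_base, rho_aug, rho_aug_shuffled)
```

In this toy setting the extra feature is informative by construction, so the augmented model should score higher than the baseline when trained on the true labels, while a model trained on shuffled labels has no reliable way to exploit it.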

We note that the main goal of EXPosition’s gene expression component is to predict CRISPR’s effect on various aspects of gene expression, which may be of interest even when the aim is not to silence a gene (e.g., to evaluate CRISPR’s effect on an off-target). VBC and GuidePro, in contrast, are designed to predict gene knockout following CRISPR by focusing on the amino acid content of the gene and the protein’s functionality; thus, EXPosition’s gene expression predictions by themselves do not exactly compete with them in terms of performance (Additional file 1: Tables S9-S10, see supplementary Sect. 10).

It is important to mention that VBC and GuidePro cannot analyze a general sgRNA, but rather only specific guides that were already analyzed and that target coding sequences; whereas EXPosition’s gene expression component deals with any guide sequence, which can affect any region of the gene, including UTRs and introns. In addition, the datasets we analyze are biased towards VBC and GuidePro (compared to our gene expression models), since similarly to these tools, the datasets are designed to target the gene’s coding sequence and change its amino acid sequence, rather than directly affecting aspects of gene expression such as transcription, splicing, and translation initiation, which is what EXPosition’s gene expression component predicts. Despite all this, we show that EXPosition’s gene expression component provides valuable information that is not contained in the other tools.

EXPosition outperforms previous tools in predicting gene knockout following CRISPR usage

We wanted to compare EXPosition to existing tools that predict gene knockout following CRISPR usage, such as VBC [27] and GuidePro [28]. Therefore, we performed regression and classification analyses similar to those in the previous section, only this time we used all of EXPosition’s outputs (gene expression estimates, predicted cutting likelihood, VBC score, GuidePro score) as features to train and test the regressors and classifiers, and compared their performance against regressors/classifiers trained solely with VBC or GuidePro. Additional details regarding the training of these models can be found in supplementary Sect. 9.

The results can be seen in Table 2. EXPosition outperforms VBC in 5 of 6 cases and outperforms GuidePro in all 6 cases. The classification analysis yielded similar results (Additional file 1: Tables S6-S8, see supplementary Sect. 10).
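The asterisks in Tables 1 and 2 are based on a Wilcoxon rank-sum test comparing the per-split correlations of two models. A minimal sketch of such a test is given below, using the large-sample normal approximation without tie or continuity correction. The function names and the two lists of correlations are hypothetical and serve only to illustrate the calculation; they are not results from this study.

```python
from math import erfc, sqrt

def average_ranks(values):
    """Average 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (z, p). Assumption: no tie or continuity correction is applied.
    """
    n1, n2 = len(a), len(b)
    pooled_ranks = average_ranks(list(a) + list(b))
    w = sum(pooled_ranks[:n1])               # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2          # expected rank sum under H0
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = erfc(abs(z) / sqrt(2))               # two-sided normal tail
    return z, p

# Hypothetical per-split Spearman correlations for two models.
tool_a = [0.41, 0.43, 0.40, 0.44, 0.42, 0.45, 0.39, 0.43, 0.42, 0.44]
tool_b = [0.47, 0.49, 0.46, 0.50, 0.48, 0.51, 0.47, 0.49, 0.48, 0.50]

z, p = rank_sum_test(tool_a, tool_b)
print("z =", round(z, 2), "significant at 0.05:", p < 0.05)
```

A negative `z` here means the first sample tends to have lower values than the second; the comparison would be marked as significant when `p` falls below 0.05.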

Table 2 EXPosition score produces better Spearman correlations with sgRNA depletion rates than previous tools’ scores. “VBC vs. EXP” and “GuidePro vs. EXP” are the comparisons between the regressor trained with VBC/GuidePro and the regressor trained with predictions from VBC, GuidePro, CRISPRedict and EXPosition’s gene expression models. Each cell contains the median Spearman correlation obtained from 100 cross-validations of training an XGBoost regressor on a randomly chosen 80% of the data and testing on the remaining 20%. Cells with an asterisk are cases where EXPosition outperformed the compared tool with statistical significance (p < 0.05, Wilcoxon rank-sum test).

Examples of silencing sites found with EXPosition

Finally, we provide a few examples where EXPosition improved on previous models to predict sgRNAs that will cause gene knockout (Table 3, Fig. 6). We used classifiers trained and tested on the same data from Doench et al. [36], as defined in the section “EXPosition outperforms previous tools in predicting gene knockout following CRISPR usage” and Fig. 5B. In each example, using classifiers whose only features are the cutting efficiencies and the scores from VBC or GuidePro would misclassify the sgRNA, while a classifier incorporating EXPosition’s gene expression predictions classifies it correctly.

Table 3 Examples of sites where EXPosition improved on existing models. Each column contains the score of its respective gene expression model. In each case, EXPosition correctly classified the site as silencing, while the classifiers trained only using CRISPRedict, VBC and GuidePro, or just CRISPRedict, misclassified it as non-silencing.

Discussion

In recent years, CRISPR has been used to edit genes and specifically to silence them. The prevalent paradigm is to use computational tools which estimate the likelihood of a mutation following the use of CRISPR and choose a site for gene silencing by taking the site with the highest mutation likelihood. However, when using CRISPR, we are usually interested in affecting the expression of the target gene without affecting any other gene’s expression. Thus, the current common approaches for designing sgRNAs do not optimize the right objective.

Recently, a few tools designed to predict gene knockout post-CRISPR usage, using features other than predicted cutting efficiency, were developed. These tools are helpful but they have limitations: they do not consider alterations in gene expression when forming their predictions; they cannot grade every given sgRNA, but rather are limited to a subset of already pre-processed sgRNAs; and they are focused on sgRNAs targeting coding regions.

Therefore, we created EXPosition, a tool that circumvents these limitations by combining gene knockout predictions (VBC and GuidePro) and predicted cutting efficiency (CRISPRedict) with gene expression estimates post-CRISPR to predict gene knockout. Our tool can evaluate any sgRNA (not just pre-processed ones), including sgRNAs not in coding regions, and it considers the effect of CRISPR usage on gene expression.

EXPosition’s gene expression component predicts the most likely mutations following CRISPR use and their effect on transcription, splicing, and translation initiation. In addition, EXPosition can analyze manually inserted mutations, regardless of their origin, and assess their effects on gene expression. This versatility allows users to assess the effects of mutations that may not have been generated via CRISPR or other specific methods, expanding the tool’s applicability to a wider range of scenarios. Since our tool’s gene expression component is composed of various algorithms which predict various aspects of gene expression that were validated and compared to measurements of gene expression, we expect predictions used in our tool to be relevant and correspond with actual expression measurements.

We validated our tool’s gene knockout predictions using experimental data from 6 functional screening experiments, on 4 cell types, encompassing 207k sites. EXPosition’s predictions produced better Spearman correlations with sgRNA depletion rates than GuidePro in 6 of 6 cases and than VBC in 5 of 6 cases. In addition, EXPosition’s gene expression component was validated by showing that performance decreases significantly when VBC or GuidePro is used without it. Similar results were obtained using classification analysis.

We also gave additional information about our gene expression outputs by providing the fraction of pathogenic mutations from the ClinVar dataset out of all pathogenic and benign mutations that received certain values of EXPosition’s gene expression score. Analysis of the ClinVar dataset also validated our gene expression models by showing enrichment of pathogenic mutations in subsets of mutations that have high-scoring gene expression estimates. We hope that the user-friendly GUI will encourage people to use our tool in their scientific endeavors. We believe that since EXPosition is modular, it will be possible to update each part of the tool with newer and better models, including models that are specific for different cell types and/or Cas proteins. We believe that once robust models of gene expression steps, likely mutations post-CRISPR cleavage and likelihood for cleavage post-CRISPR are available for non-human organisms, we will be able to extend EXPosition’s gene expression component to apply for these organisms as well.

The study reported here highlights two important gaps in the field of CRISPR research: (1) we should carefully design better objective functions to correctly evaluate sgRNAs and (2) we should conduct more experiments that include the target gene in its endogenous genomic context while measuring the effect on gene expression in addition to cutting efficiency. Studies including this type of data will facilitate better understanding of a given sgRNA’s phenotypical effect on its target site, rather than only its genotypical effect; they can also be used to further improve our tool.

Conclusions

EXPosition is a user-friendly tool for the classification of sgRNAs into silencing/non-silencing that considers the effects of predicted gene expression alongside known factors such as conservation, amino acid composition, and frameshift likelihood. Validated on several datasets of different human cell types, it offers the scientific community a better tool to assess the functionality of sgRNAs than before and, for the first time, reveals the likely gene expression outcomes following CRISPR usage. These outcomes complement prediction of cleavage likelihood by predicting the actual objective of CRISPR usage: changing gene expression. With research on CRISPR ever growing, we hope that datasets with gene expression measurements post-CRISPR cleavage will be published to improve our understanding of the interplay between CRISPR and gene expression.

Data availability

For the version of EXPosition available at the time of this publication, please refer to the EXPosition citation [32] or use the following link: https://doi.org/10.5281/zenodo.14228618.

The latest developments to EXPosition can be found here: https://github.com/shaicoh3n/EXPosition.

Human genome annotations were downloaded from Ensembl [34].

The data from Doench et al. [36] that we analyzed can be found in Table S11 at [41].

The Shalem dataset can be found in Table S10 at [43].

The data from Doench et al. [39] that we analyzed can be found in Supplementary Table 10 at [40].

The data analyzed by Xu et al. can be found in Supplementary Table_1 at [44].

The ClinVar dataset can be found in the ClinVar FTP server (https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz) [38].

References

Pickar-Oliver A, Gersbach CA. The next generation of CRISPR–Cas technologies and applications. Nat Rev Mol Cell Biol. 2019;20(8):490–507. https://doi.org/10.1038/s41580-019-0131-5.
Uehara H, Zhang X, Pereira F, Narendran S, Choi S, Bhuvanagiri S, et al. Start codon disruption with CRISPR/Cas9 prevents murine Fuchs’ endothelial corneal dystrophy. Elife. 2021;10:e55637.
Si X, Zhang H, Wang Y, Chen K, Gao C. Manipulating gene translation in plants by CRISPR–Cas9-mediated genome editing of upstream open reading frames. Nat Protoc. 2020;15(2):338–63.
Whitworth KM, Benne JA, Spate LD, Murphy SL, Samuel MS, Murphy CN, et al. Zygote injection of CRISPR/Cas9 RNA successfully modifies the target gene without delaying blastocyst development or altering the sex ratio in pigs. Transgenic Res. 2017;26(1):97–107.
Chari R, Yeo NC, Chavez A, Church GM. sgRNA Scorer 2.0: a species-independent model to predict CRISPR/Cas9 activity. ACS Synth Biol. 2017;6(5):902–4.
Listgarten J, Weinstein M, Kleinstiver BP, Sousa AA, Joung JK, Crawford J, et al. Prediction of off-target activities for the end-to-end design of CRISPR guide RNAs. Nat Biomed Eng. 2018;2(1):38–47.
Lei Y, Lu L, Liu HY, Li S, Xing F, Chen LL. CRISPR-P: a web tool for synthetic single-guide RNA design of CRISPR-system in plants. Mol Plant. 2014;7(9):1494–6.
Liu H, Wei Z, Dominguez A, Li Y, Wang X, Qi LS. CRISPR-ERA: a comprehensive design tool for CRISPR-mediated gene editing, repression and activation. Bioinformatics. 2015;31(22):3676–8.
Concordet JP, Haeussler M. CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Res. 2018;46(W1):W242–5.
Peng D, Tarleton R. EuPaGDT: a web tool tailored to design CRISPR guide RNAs for eukaryotic pathogens. Microb Genom. 2015;1(4):e000033.
Montague TG, Cruz JM, Gagnon JA, Church GM, Valen E. CHOPCHOP: a CRISPR/Cas9 and TALEN web tool for genome editing. Nucleic Acids Res. 2014;42(W1):W401–7.
Shen MW, Arbab M, Hsu JY, Worstell D, Culbertson SJ, Krabbe O, et al. Predictable and precise template-free CRISPR editing of pathogenic variants. Nature. 2018;563(7733):646–51.
Stemmer M, Thumberger T, del Sol KM, Wittbrodt J, Mateo JL. CCTop: an intuitive, flexible and reliable CRISPR/Cas9 target prediction tool. PLoS ONE. 2015;10(4):e0124633.
Allen F, Crepaldi L, Alsinet C, Strong AJ, Kleshchevnikov V, De Angeli P, et al. Predicting the mutations generated by repair of Cas9-induced double-strand breaks. Nat Biotechnol. 2019;37(1):64–72.
Labuhn M, Adams FF, Ng M, Knoess S, Schambach A, Charpentier EM, et al. Refined sgRNA efficacy prediction improves large-and small-scale CRISPR–Cas9 applications. Nucleic Acids Res. 2018;46(3):1375–85.
Moreno-Mateos MA, Vejnar CE, Beaudoin JD, Fernandez JP, Mis EK, Khokha MK, et al. CRISPRscan: designing highly efficient sgRNAs for CRISPR-Cas9 targeting in vivo. Nat Methods. 2015;12(10):982–8.
Pulido-Quetglas C, Aparicio-Prat E, Arnan C, Polidori T, Hermoso T, Palumbo E, et al. Scalable design of paired CRISPR guide RNAs for genomic deletion. PLoS Comput Biol. 2017;13(3): e1005341.
Heigwer F, Kerr G, Boutros M. E-CRISP: fast CRISPR target site identification. Nat Methods. 2014;11(2):122–3.
Li VR, Zhang Z, Troyanskaya OG. CROTON: an automated and variant-aware deep learning framework for predicting CRISPR/Cas9 editing outcomes. Bioinformatics. 2021;37(Supplement_1):i342–8. https://doi.org/10.1093/bioinformatics/btab268.
Molla KA, Yang Y. Predicting CRISPR/Cas9-induced mutations for precise genome editing. Trends Biotechnol. 2020;38(2):136–41. Available from https://www.sciencedirect.com/science/article/pii/S0167779919302069.
Chuai G, Ma H, Yan J, Chen M, Hong N, Xue D, et al. DeepCRISPR: optimized CRISPR guide RNA design by deep learning. Genome Biol. 2018;19(1):80. https://doi.org/10.1186/s13059-018-1459-4.
Leenay RT, Aghazadeh A, Hiatt J, Tse D, Roth TL, Apathy R, et al. Large dataset enables prediction of repair after CRISPR–Cas9 editing in primary T cells. Nat Biotechnol. 2019;37(9):1034–7. https://doi.org/10.1038/s41587-019-0203-2.
Ben-Yehezkel T, Zur H, Marx T, Shapiro E, Tuller T. Mapping the translation initiation landscape of an S. cerevisiae gene using fluorescent proteins. Genomics. 2013;102(4):419–29.
Agarwal V, Shendure J. Predicting mRNA abundance directly from genomic sequence using deep convolutional neural networks. Cell Rep. 2020;31(7):107663.
Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, et al. Predicting splicing from primary sequence with deep learning. Cell. 2019;176(3):535–548.e24.
Michlits G, Jude J, Hinterndorfer M, de Almeida M, Vainorius G, Hubmann M, et al. Multilayered VBC score predicts sgRNAs that efficiently generate loss-of-function alleles. Nat Methods. 2020;17(7):708–16.
He W, Wang H, Wei Y, Jiang Z, Tang Y, Chen Y, et al. GuidePro: a multi-source ensemble predictor for prioritizing sgRNAs in CRISPR/Cas9 protein knockouts. Bioinformatics. 2021;37(1):134–6.
Konstantakos V, Nentidis A, Krithara A, Paliouras G. CRISPRedict: a CRISPR-Cas9 web tool for interpretable efficiency predictions. Nucleic Acids Res. 2022;50(W1):W191–8. https://doi.org/10.1093/nar/gkac466.
Chen W, McKenna A, Schreiber J, Haeussler M, Yin Y, Agarwal V, et al. Massively parallel profiling and predictive modeling of the outcomes of CRISPR/Cas9-mediated double-strand break repair. Nucleic Acids Res. 2019;47(15):7989–8003. https://doi.org/10.1093/nar/gkz487.
Lynn N, Tuller T. Detecting and understanding meaningful cancerous mutations based on computational models of mRNA splicing. NPJ Syst Biol Appl. 2024;10(1):25.
Zhang S, Hu H, Jiang T, Zhang L, Zeng J. TITER: predicting translation initiation sites by deep learning. Bioinformatics. 2017;33(14):i234–42. https://doi.org/10.1093/bioinformatics/btx247.
Cohen* S, Bergman* S, Lynn N, Tuller T. EXPosition. Zenodo. 2024. https://doi.org/10.5281/zenodo.14228618. Cited 2024 Nov 25.
Avsec Ž, Agarwal V, Visentin D, Ledsam JR, Grabska-Barwinska A, Taylor KR, et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat Methods. 2021;18(10):1196–203. https://doi.org/10.1038/s41592-021-01252-x.
Cunningham F, Allen JE, Allen J, Alvarez-Jarreta J, Amode MR, Armean IM, et al. Ensembl 2022. Nucleic Acids Res. 2022;50(D1):D988–95. https://doi.org/10.1093/nar/gkab1049.
Wang D, Zhang C, Wang B, Li B, Wang Q, Liu D, et al. Optimized CRISPR guide RNA design for two high-fidelity Cas9 variants by deep learning. Nat Commun. 2019;10(1):4284.
Doench JG, Fusi N, Sullender M, Hegde M, Vaimberg EW, Donovan KF, et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Nat Biotechnol. 2016;34(2):184–91. https://doi.org/10.1038/nbt.3437.
Xu H, Xiao T, Chen CH, Li W, Meyer CA, Wu Q, et al. Sequence determinants of improved CRISPR sgRNA design. Genome Res. 2015;25(8):1147–57.
Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018;46(D1):D1062–7. https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz.
Doench JG, Hartenian E, Graham DB, Tothova Z, Hegde M, Smith I, et al. Rational design of highly active sgRNAs for CRISPR-Cas9–mediated gene inactivation. Nat Biotechnol. 2014;32(12):1262–7. https://doi.org/10.1038/nbt.3026.
Doench JG, Hartenian E, Graham DB, Tothova Z, Hegde M, Smith I, et al. Rational design of highly active sgRNAs for CRISPR-Cas9–mediated gene inactivation. Supplementary Table 10. Nat Biotechnol. 2014. https://staticcontent.springer.com/esm/art%3A10.1038%2Fnbt.3026/MediaObjects/41587_2014_BFnbt3026_MOESM10_ESM.xlsx.
Doench JG, Fusi N, Sullender M, Hegde M, Vaimberg EW, Donovan KF, et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Table S11. Nat Biotechnol. 2016;34(2):184–91. https://staticcontent.springer.com/esm/art%3A10.1038%2Fnbt.3437/MediaObjects/41587_2016_BFnbt3437_MOESM8_ESM.zip.
Shalem O, Sanjana NE, Hartenian E, Shi X, Scott DA, Mikkelsen TS, et al. Genome-scale CRISPR-Cas9 knockout screening in human cells. Science. 2014;343(6166):84–7. https://doi.org/10.1126/science.1247005.
Doench JG, Fusi N, Sullender M, Hegde M, Vaimberg EW, Donovan KF, et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. Table S10. Nat Biotechnol. 2016;34(2):184–91. https://staticcontent.springer.com/esm/art%3A10.1038%2Fnbt.3437/MediaObjects/41587_2016_BFnbt3437_MOESM8_ESM.zip.
Xu H, Xiao T, Chen CH, Li W, Meyer CA, Wu Q, et al. Sequence determinants of improved CRISPR sgRNA design. Supplementary Table_1. Genome Res. 2015:1147–57. https://genome.cshlp.org/content/suppl/2015/06/12/gr.191452.115.DC1/Supplemental_Table_1.xlsx.
Tuller T, Zur H. Multiple roles of the coding sequence 5′ end in gene expression regulation. Nucleic Acids Res. 2015;43(1):13–28. https://doi.org/10.1093/nar/gku1313.
Lang GI, Murray AW, Botstein D. The cost of gene expression underlies a fitness trade-off in yeast. Proc Natl Acad Sci. 2009;106(14):5755–60. https://doi.org/10.1073/pnas.0901620106.
Keren L, Hausser J, Lotan-Pompan M, Vainberg Slutskin I, Alisar H, Kaminski S, et al. Massively parallel interrogation of the effects of gene expression levels on fitness. Cell. 2016;166(5):1282–1294.e18. Available from: https://www.sciencedirect.com/science/article/pii/S009286741630931X.

Acknowledgements

The authors would like to thank Nadav Kra-Oz for contributing to the GUI. This study was supported in part by a fellowship from the Edmond J. Safra Center for Bioinformatics at Tel-Aviv University.

Funding

SC, SB, and NL are supported by a fellowship from the Edmond J. Safra Center for Bioinformatics at Tel Aviv University. The study was also supported by the CRISPR-IL consortium grant from the Israeli Innovation Authority.

Author information

Shai Cohen and Shaked Bergman contributed equally.

Authors and Affiliations

Department of Biomedical Engineering, Tel Aviv University, Tel-Aviv, 6997801, Israel
Shai Cohen, Shaked Bergman, Nicolas Lynn & Tamir Tuller
Sagol School of Neuroscience, Tel Aviv University, Tel-Aviv, 6997801, Israel
Tamir Tuller

Authors

Shai Cohen
Shaked Bergman
Nicolas Lynn
Tamir Tuller

Contributions

SC, SB, and TT conceived the project. All authors analyzed the data. SC and SB wrote the software. TT supervised the project. All authors interpreted the results. All authors wrote and revised the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Tamir Tuller.

Ethics declarations

Ethics approval and consent to participate

This study only utilizes data that has been previously published [38, 40, 41, 43, 44].

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplementary methods.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Cohen, S., Bergman, S., Lynn, N. et al. A tool for CRISPR-Cas9 sgRNA evaluation based on computational models of gene expression. Genome Med 16, 152 (2024). https://doi.org/10.1186/s13073-024-01420-6

Received: 22 December 2023
Accepted: 02 December 2024
Published: 23 December 2024
DOI: https://doi.org/10.1186/s13073-024-01420-6
