PRODE recovers essential and context-essential genes through neighborhood-informed scores

Cantore, Thomas; Gasperini, Paola; Bevilacqua, Riccardo; Ciani, Yari; Sinha, Sanju; Ruppin, Eytan; Demichelis, Francesca

doi:10.1186/s13059-025-03501-0

Method
Open access
Published: 28 February 2025

PRODE recovers essential and context-essential genes through neighborhood-informed scores

Thomas Cantore¹,
Paola Gasperini¹,
Riccardo Bevilacqua¹,
Yari Ciani¹,
Sanju Sinha^2,3,
Eytan Ruppin² &
…
Francesca Demichelis ORCID: orcid.org/0000-0002-8266-8631¹

Genome Biology volume 26, Article number: 42 (2025) Cite this article

1499 Accesses
3 Altmetric
Metrics details

Abstract

Gene context-essentiality assessment supports precision oncology opportunities. The variability of gene effects inference from loss-of-function screenings across models and technologies limits identifying robust hits. We propose a computational framework named PRODE that integrates gene effects with protein–protein interactions to generate neighborhood-informed essential (NIE) and neighborhood-informed context essential (NICE) scores. It outperforms the canonical gene effect approach in recovering missed essential genes in shRNA screens and prioritizing context-essential hits from CRISPR-KO screens, as supported by in vitro validations. Applied to Her2 + breast cancer tumor samples, PRODE identifies oxidative phosphorylation genes as vulnerabilities with prognostic value, highlighting new therapeutic opportunities.

Background

The identification of genes essential for cancer cell survival has been a major goal in cancer research, as they may unravel key biological processes of tumoral cells [1, 2] and represent putative therapeutic targets [3]. Genes that exhibit context-specific essentiality have a potentially high clinical impact, as they may lead to the development of cancer therapies tailored to treating tumors that display context-specific genotypes and are more selective [3, 4]. Synthetic lethality is an example of context-specific essentiality that manifests when a gene is essential for cell survival only in the presence of a functional mutation on a second gene (representing the context) [5, 6]. One of the most remarkable synthetic lethal (SL) relationships involves the gene PARP1 which turns essential in the context of BRCA1/2 gene mutations [7]. After this discovery, PARP1 pharmacological inhibitors demonstrated great clinical utility in treating tumors with specific DNA repair defects [8]. Moreover, besides the PARP1-BRCA1/2 relationship, SL interactions have emerged as valuable predictors of patients’ clinical responses to targeted and immunotherapy treatments [9,10,11]. Beyond SL interactions, context-specific essential genes in cancer cells can also stem from more intricate biological scenarios as, for instance, the essentiality of the WRN gene in the context of microsatellite unstable tumors [12], mitotic checkpoint genes in aneuploid cells [13], and IL-6R in chromosomal-unstable tumors [14].

Collections of loss of function (LoF) screens—based on shRNA (knock-down) or sgRNA (knock-out)—have become increasingly available in recent years thanks to projects such as DepMap [15], providing a valuable resource for studying essential and context-specific essential genes. Current methods for quantifying gene essentiality rely on the computation of gene effects as a measure of cell-fitness changes that occur after the LoF of each gene within the screening [16,17,18,19]. To identify context-essential genes, common approaches test for differential gene effects between two groups of cell-lines as part or not of the context of interest [20]. However, the joint analysis of gene effects computed from screens performed on different cell lines is complicated by biological and technical variation [21, 22], thus limiting the ability to identify reproducible and robust context-specific essential genes [11, 23]. Additionally, different screening technologies, such as shRNA, sgRNA based, or using different guide RNAs libraries [24], have been shown to produce different estimates of gene essentiality [25, 26].

Reasoning that the LoF of a gene may result in cell fitness changes through the impairment of the biological processes in which it takes part, we hypothesized that combining gene effects of genes that display direct protein–protein interactions (PPIs)—hence participate in common biological functions—may prove a robust indicator of gene essentiality (or context-essentiality). The use of PPIs alone or combined with other omics has been previously tested to predict gene effects or classify known essential genes [27,28,29,30,31,32]. However, despite PPIs being informative when predicting gene essentiality or shortlisting context-specific essentiality [33,34,35], there is currently no method that systematically integrates gene effects at the PPI level.

With this goal, we developed PRODE (PPIs Recovery Of neighborhooD Essentiality), an analytical framework that interrogates directly interacting partners of each gene—participating in its first-level neighborhood—to (I) robustly quantify gene essentiality of screened models and (II) prioritize context-specific essential genes by comparing two populations of screened cell-lines. PRODE provides a novel approach that proved more robust than gene effects alone when identifying reference essential and context-essential genes. It also demonstrates better discriminative power compared to alternative methodologies adapted to solve the same task [36,37,38]. We further applied PRODE for the prioritization of essential genes in the context of Her2 + breast cancer and recovered candidates with clinically relevant implications displaying subtype-specific associations to patients’ survival and anti-Her2 therapy response. We propose that PRODE’s integration of gene effects and PPIs has the potential to drive the design of therapeutic strategies for the targeted treatment of tumors.

Results

PRODE integrates gene effects and protein–protein interactions (PPIs) to quantify gene essentiality and context-specific essentiality

To quantify essentiality or context-specific essentiality of each gene across a group of cancer cell-line models, PRODE computes a neighborhood-informed essentiality (NIE) score and a neighborhood-informed context essentiality (NICE) score. While the procedures for calculating NIE and NICE scores are identical, the two derive from different types of input data. In particular, NIE scores are computed starting from gene effects of a screened model (or the average gene effects across models, representative of an essential-like signal). In contrast, NICE scores originate from differential gene effects between two groups of cell-line models (representative of a context-essential signal). For the NIE scores (Fig. 1a), PRODE starts by identifying the list of directly interacting partners of each gene, using a reference list of PPIs. Next, for each gene within the neighborhood, it retrieves its average gene effects previously turned into percentiles according to the complete screened gene list (low percentiles are linked to higher essentiality). To further quantify the neighborhood-level essentiality, it applies the Robust Rank Aggregation (RRA) algorithm [39] and tests whether the percentiles of genes in the collected neighborhood are significantly skewed towards zero. For a given gene (${g}_{i})$, the final NIE score is computed as:

$$NI{E}_{{g}_{i}}=\text{log}\left({u}_{{{NS}_{g}}_{i}}\times {u}_{G{E}_{{g}_{i}}}\right)$$

where ${u}_{{{NS}_{g}}_{i}}$ is the statistical significance of the neighborhood skew of ${g}_{i}$ (computed through the RRA algorithm and turned into percentile) and ${u}_{G{E}_{{g}_{i}}}$ is the ${g}_{i}$ percentile of average gene effect. In the case of NICE scores (Fig. 1b), PRODE computes differential gene effects by fitting a linear model for each gene comparing a subset of screened cell-lines part of the context of interest against a subset not part of that context. In this case, the RRA algorithm computes the statistical significance of skew of differential gene effects of each gene’s neighborhood and the final NICE scores are computed following the same procedure as NIE scores. To ensure interpretability and consistency with the original gene effects, NIE and NICE scores are logarithmically transformed, resulting in lower negative values indicative of stronger essentiality and context-specific essentiality for NIE and NICE, respectively.

PRODE robustly identifies essential genes and biological processes across different screening technologies

We assessed the performance and robustness of PRODE NIE scores in various settings. First, we investigated the ability of NIE scores to discriminate a set of 1028 common essential genes derived from previous publications as well as participating in essential biological processes (Fig. 2a; Additional file 2: Table S1; Methods) from 849 genes that are unexpressed or previously characterized as context-essential, expected to display low to no broad essentiality across cell-lines (Additional file 2: Table S1, [40]). Specifically, we applied PRODE to a collection of shortlisted cell line screening datasets (461 knock-out and 388 knock-down screenings composing the sgRNA and shRNA datasets, respectively; Fig. 2a as described in Methods; Additional file 2: Table S2), and we compared its performance to that of gene effects and other heuristics [36, 38] that integrate gene effects and PPIs. In particular, we examined neighborhood sum (NSUM) and neighborhood max (NMAX) [36] (Table 1), which consider the sum and maximum gene effects among the first-level neighborhood of each gene, focusing on directly interacting partners as PRODE. Additionally, we evaluated scores obtained through random walk with restart (RWR) [38], which diffuses gene effects across nodes in a protein–protein interaction network constructed from the collection of PPIs (Table 1). For each methodology, the obtained essentiality scores were normalized, controlling for the size of each neighborhood to avoid the ascertainment bias inherent in PPI collections (Methods). We tested the discrimination of essential vs. not essential genes in two different settings, considering the average gene effect across cell-lines (Fig. 2b) or gene effects profiles of each cancer cell-line separately (Additional file 1: Fig. S1a) as input. When computing NIE scores using gene knock-out effects (sgRNA dataset), PRODE exhibited a significantly higher ROC AUC (Fig. 2b—left; p value < 10⁻⁵, DeLong’s test) compared to every other approach except for NSUM (p value = 0.11, DeLong’s test) and similar area under the precision-recall curve (PR AUC; Fig. 2b—right). Notably, when utilizing gene knock-down effects (shRNA dataset) for the same task, PRODE outperformed every other methodology (p value < 10⁻⁷, DeLong’s test). When tested on each single cancer cell-line (Additional file 1: Fig. S1a), we observed consistent results (p value < 0.001, paired Wilcoxon test) and noticed that NIE scores preserve tumor-type specific essentialities (Additional file 1: Fig. S1c). Compared to average gene effects in the absence of PPI information, PRODE displayed a significant improvement when classifying genes involved in biological processes that are known to play pivotal essential roles in cell survival (Additional file 1: Fig. S1d; Additional file 2: Table S1; Methods, DeLong test p value < 10⁻³).

Table 1 Overview of the different benchmarked methodologies

Full size table

Notably, PRODE’s scores revealed consistent performance when making use of different PPI collections and of decreased number of interactions (Additional file 1: Fig. S2a–c) while preserving cancer cell-lines unique identity (Additional file 1: Fig. S2d). Combining the neighborhood component with gene effects leads to better performances compared to the single components alone (Additional file 1: Fig. S3a, ROC AUC of 0.97 vs. 0.95 and 0.87 for average gene effects alone or PRODE’s neighborhood component alone for sgRNA dataset and 0.94 vs. 0.87 and 0.92 for shRNA dataset). NIE scores demonstrated improved reproducibility when subjected to increasing levels of noise in the input gene effects at the cell-line level (Fig. 2c; average PRODE Spearman’s correlation = 0.85 and 0.79, average gene effects correlation = 0.72 and 0.69 for sgRNA and shRNA-based screenings, respectively) and exhibited greater consistency than average gene effects across different screening technologies (Fig. 2d; Spearman’s correlation = 0.53 vs. 0.26). Furthermore, in line with the expectation that essential genes exhibit low variation in expression levels across different contexts, we observed higher Spearman’s correlations between NIE scores and the coefficient of variation of gene expression across the collection of cell lines under study (0.60 and 0.39 for NIE scores within sgRNA and shRNA datasets respectively vs. 0.45 and 0.22 in the case of average gene effects). The lower the coefficient of variation, the lower the NIE scores (higher essentiality). As essential genes are more conserved across species compared to not-essential ones [2], we inspected the concordance of each computed essentiality measure with respect to gene-level conservation scores obtained from phylogenetic data across multiple species (Methods) with NIE scores resulting in the highest positive Spearman’s correlation (0.36 compared to average of 0.24 for sgRNA dataset; Table 2). Taken together, these results support the relevance of NIE scores when prioritizing essential genes across a collection of cancer-cell lines.

Table 2 Summary of each method performance across different scenarios

Full size table

NIE scores are robust across screening libraries and loss of function datasets

To qualify PRODE performance, we performed a set of experiments in additional settings. For instance, in the setting of different knock-out screenings performed using different libraries, therefore prone to batch effects, with the expectation that the integration of PPI would ameliorate those effects. We therefore applied PRODE on knock-out and knock-down screens performed on a common set of cancer cell-lines (14 for sgRNA, 226 for shRNA datasets) using different libraries (DEPMAP, SCORE—sgRNA [41]; ACHILLE [15], DRIVE—shRNA [42]) and processed with identical tools (CERES for sgRNAs and DEMETER for shRNAs). We ran PRODE on a single cell-line level, and we observed how the average correlation between PRODE scores across different datasets was significantly higher compared to the one of gene effects alone (average 0.82 vs. 0.62, p value < 2e − 16, paired Wilcoxon test; an example for a cancer cell-line reported Fig. 3a). PRODE scores displayed also higher average performance (ROC AUC = 0.97 vs. 0.93 for sgRNAs—top; 0.92 vs. 0.81 for shRNAs—bottom) when classifying reference essential vs. not essential genes (Fig. 3b) and, importantly, it preserved cancer cell-lines specific signal, as demonstrated by the significantly higher correlation between matched cancer cell-lines compared to unmatched ones, preserving a similar trend to the use of gene effects alone (Additional file 1: Fig. S3b).

Next, we challenged the robustness of the PRODE results investigating how NIE scores could be robust to screenings performed with reduced library sizes, as a small number of guides targeting a gene could lead to imprecise gene effect estimates [43]. Taking advantage of a collection of four different screens derived from DeWeirdt et al. [44] (Methods), we systematically subsampled the number of guide RNAs targeting each gene across four different cancer cell lines (Fig. 3c). Compared to the computation of gene effects alone, gene effects weighted by neighborhood effects (i.e., NIE scores) demonstrated greater consistency (Fig. 3c—top panel, average PCC 0.96 vs. 0.91) as well as better discrimination ability between reference essential and not essential scores (average ROC AUC 0.97 vs. 0.92).

Taken together, we believe that these results highlight the benefits of using PPIs—weighted gene effects, as they lead to more robust and consistent essentiality scores in challenging contexts (e.g., small library size).

Neighborhood essentiality uncovers functionally redundant essential genes

PRODE evaluates essentiality by analyzing the local neighborhood of each gene within a PPI network, capturing the influence of immediate interaction partners. However, this approach may encounter challenges with functionally redundant genes—such as those in multi-protein complexes (e.g., the SWI/SNF complex)—where neighboring genes can compensate for the loss of one another, masking the essentiality of individual components [45]. To address this, we focused on functionally redundant essential paralogs, constructing a benchmark dataset of paralog pairs known to exhibit essentiality only when co-deleted (e.g., Parrish et al. [46], Anvar et al. [47], Additional file 2: Table S7, Methods). By comparing PRODE’s NIE scores to average gene effects, we observed that NIE scores significantly outperformed gene effects in identifying functionally redundant essential genes (Fig. 3d–e, Wilcoxon p < 0.001), with AUCs of 0.88 and 0.74 versus 0.7 and 0.6, respectively. Further analysis revealed that essential paralogs displayed a higher fraction of interactions with reference essential genes (10% and 14% vs. 3% and 5% in Parrish and Anvar datasets; Wilcoxon p < 0.0001), suggesting that local PPI interactions enhance PRODE’s ability to detect compensatory relationships, thereby capturing subtle but critical functional roles. In line with this result, we also observed that top essential paralogs, according to PRODE, displayed a significantly greater gain in essentiality upon co-deletion than their single knock-out (Additional file 1: Fig. S4a–b).

PRODE recovers essential genes missed by shRNA technology

After assessing the robustness of NIE scores across different screening technologies, we sought to test the ability of PRODE to detect essential genes otherwise missed by knock-down (shRNA) LoF screenings compared to knock-out based (sgRNA) screens (Fig. 4a). Previous studies have reported lower detection rates of essential genes in shRNA screenings compared to sgRNA screenings [48], primarily attributed to the presence of off-target effects [26, 49,50,51] and lower consistency between cell-lines [49]. To validate our findings and provide experimental evidence, we focused on the LoF data available for a bladder cancer cell line part of the DepMap collection that we recently tested for the investigation of new treatment opportunities in bladder cancer [52], the HT1197. Starting from a list of 939 common essential genes (present in both sgRNA and shRNA datasets, Fig. 2a) and by introducing data-driven thresholds on gene effects (Additional file 1: Fig. S5a; Methods), we observed that the sgRNA and shRNA screenings performed on HT1197 jointly identified 840 out of 939 genes as essential (89%) of which 198 (21%) were identified by sgRNA technology only (Fig. 4b). Notably, NIE scores computed through the integration of HT1197 shRNA gene effects and PPIs nominated 83 (41%) of the 198 essential genes previously missed. Importantly, even if the remaining undetected essential genes did not pass the pre-computed threshold to call for an essential event, we observed that their NIE score distribution was significantly lower compared to the one of unexpressed genes, expected to be non-essential (Fig. 4c). To experimentally validate the performance of PRODE, we selected three representative genes from the reference essential gene list; RIOK1 (RIO kinase 1), which displayed an essential-like signal with and without PPIs integration; CINP (cyclin dependent kinase 2 interacting protein), deemed not-essential by both approaches; and DIS3 (exosome complex exonuclease RRP44), a gene recovered by PRODE but missed by standard shRNA gene effects (Fig. 4d). Interestingly, DIS3 and RIOK1 exhibited a higher number of PPIs with the set of reference essential genes (Fig. 4e), suggesting their involvement in essential gene processes. Further inspection among their directly interacting partners revealed an enrichment for ribosomal genes (Additional file 1: Fig. S6c–d). To experimentally verify PRODE predictions, HT1197 cells were transduced with inducible lentiviral vectors carrying two different shRNA constructs targeting the gene of interest and a non-targeting shRNA to be used as control (shNT); all vectors had a puromycin resistance cassette that allowed positive selection to ensure homogeneous lentiviral integration. Cells were then seeded in equal numbers to evaluate in vitro viability after 7–14 days of induction of shRNA expression. Crystal violet assay confirmed the predictions of PRODE, as the induced downregulation of DIS3 and RIOK1 through shRNA guides resulted in a reduction in cell viability higher than 50%, while the reduction of expression of CINP did not induce noticeable fitness changes in the HT1197 cell-line (Fig. 4f–g; Additional file 1: Fig. S7; Additional file 2: Table S3). These validations suggest that NIE scores may recover essential genes missed by genome-wide screens, as exemplified by the DIS3 gene. Indeed, despite the DIS3 average shRNA gene effect being compatible with a non-essential gene, ad hoc silencing through viability assays revealed its essentiality. To gain a more comprehensive overview of the essential genes nominated by PRODE and missed by gene effects alone, we selected a set of 179 genes displaying essential-like NIE scores in both sgRNA and shRNA datasets (Additional file 1: Fig. S8a). Notably, the composed set showed significant enrichment for RNA-splicing genes (Additional file 1: Fig. S8b) an essential process for cell survival [53].

PRODE recovers experimentally derived synthetic lethal context-essential genes

Assessing the performance of a computational tool designed to detect context-essential relationships is a non-trivial task due to the lack of consensus reference sets. To address this challenge comprehensively, we constructed three different test sets, each containing gene pairs expected to show a SL interaction with varying degrees of evidence. The datasets consisted of context-essential gene pairs derived from isogenic sgRNA screenings (n = 115), double-KO sgRNA screenings (n = 498) as well as pairs collected in an external database (SLdb; n = 82) prioritized for their experimental evidence [54] (Additional file 2: Table S4). To provide robust negative controls, we included context-gene pairs where the gene is expected to be non-essential and, therefore, hypothesized to show no differential essentiality between cell-lines part or not of the context of interest (Methods). We first tested the detection of context essentialities in each cell-lines lineage (requiring the number of per-group screened cell lines higher or equal 3) and then averaged the scores. Across all three tested scenarios, PRODE NICE scores displayed higher areas under the precision and recall curves (for the three datasets respectively, PR AUCs 0.8, 0.78, 0.8 versus the average of other approaches 0.71, 0.7, 0.71, Fig. 5a–b, Additional file 1: Fig. S9a, Table 3). Moreover, when computing ROC curves, PRODE resulted in higher ROC AUCs compared to other methodologies (PRODE ROC AUCs: 0.68, 0.65, 0.71 for three datasets compared to an average of 0.57, 0.56, and 0.59 for the other approaches respectively; DeLong’s test p value < 0.05, Fig. 5d–f, Table 3). Furthermore, as context-essential relationships may be lineage-specific [55], we investigated the ability of PRODE to recover context-essential interactions nominated by the isogenic screenings while performing a lineage-matched analysis (Methods, Fig. 5c). On top of significantly higher average ROC AUCs when discriminating top vs. bottom 100 context-essential genes nominated in each isogenic screen (Additional file 1: Fig. S9b), NICE scores prioritized the 22% of the top 100 candidate genes nominated by each experiment among its top 10% of hits. This fraction was significantly higher compared to that of differential gene effects (Fig. 5d, paired Wilcoxon test, p value < 10⁻⁵). Moreover, as DNA-damage gene sets are expected to enrich among the top candidates when prioritizing context-essential genes in the presence of PARP1 loss, NICE scores displayed similar enrichment compared to the results emerging from the isogenic-screens and higher enrichment compared to differential gene effects (Additional file 1: Fig. S9c). Overall, these results suggest that PRODE’s NICE scores may recover context-essential genes when analyzing collections of publicly available LoF screenings data.

Table 3 Overview of performances of each methodology across three different datasets collecting reference SL pairs

Full size table

PRODE identifies OXPHOS genes as context-essential in HER2 + breast cancer

Context-specific essential relationships are of high clinical relevance as they may translate into novel therapies and reveal uncharted biomarkers. To investigate the translational potential of context-specific essential genes prioritized by PRODE, we focused on Her2 + breast cancer, an aggressive molecular subtype [56] characterized by the amplification of the gene ERBB2 which encodes for the Her2 protein (Fig. 6a) and compared the top 100 candidates prioritized by NICE scores, with the ones nominated by differential gene effects. Among the genes nominated by both approaches, we found candidates that have been previously reported to be involved in the same pathways as ERBB2, as well as genes associated with resistance to anti-Her2 therapy (Additional file 1: Fig. S10a; Additional file 2: Table S5). Interestingly, by focusing on 56 genes uniquely identified by PRODE and not by differential gene effects, we quantified a significant enrichment for genes involved in oxidative phosphorylation (OXPHOS genes) (Fig. 6d; Additional file 2: Table S6). By further inspection, we noticed that PRODE scored 32% (29 out of 97) of all OXPHOS genes within its 56 private-genes hits, a significantly higher proportion compared to differential gene effects (5%, proportion test p value = 10⁻⁶). Intriguingly, we also observed a trend for significantly higher expression of OXPHOS in HER2 + TCGA patient tumors compared to other molecular subtypes (Fig. 6e; HER2 vs. others p value < 0.05, Wilcoxon test), supporting the relevance of OXPHOS within the HER2 + tumors. To explore the clinical implications of the OXPHOS genes nominated among the PRODE private calls, we investigated their association with overall survival in the TCGA breast cancer cohort and tested whether the interaction between the Her2 + subtype and the expression of OXPHOS genes might be associated with different survival outcomes. The 29 OXPHOS genes analyzed exhibited significantly lower hazard ratios compared to randomly sampled genes (Wilcoxon test p value = 10⁻⁴, Additional file 1: Fig. S10b—eight showed a trend for a significant negative association; p value < 0.1) suggesting that low expression of OXPHOS genes is associated with better overall survival exclusively within Her2 + patients supporting their potential as critical players in Her2 + tumor fitness. Strikingly, when averaging the expression of the OXPHOS genes identified by PRODE, we noticed a better overall survival in Her2 + patients (log rank test p = 0.017, Fig. 6g) in the presence of low OXPHOS expression, but no significant association was found when jointly tested in the other subtypes, highlighting their context-specific clinical relevance. To delve into the clinical implications of OXPHOS gene expression in response to therapy, we evaluated five cohorts of Her2 + breast cancer patients treated with trastuzumab, a monoclonal antibody that targets and inhibits the Her2 protein. Despite that Her2 amplification sensitizes to Her2 inhibitors, patients’ responses to drugs such as trastuzumab have been reported to vary significantly and be largely dependent on the expression levels of Her2 itself, even if characterized by genomic amplification. Our analysis revealed that low expression levels of the 29 investigated OXPHOS genes display an average association with better clinical response, independent of the expression of Her2 itself (Fig. 6h–i). This finding suggests that the OXPHOS genes may serve as putative biomarkers for sensitivity to Her2-targeted therapy.

Discussion

In recent years, the availability of publicly accessible collections of LoF screening data, such as the one generated by the DepMap project, revolutionized the study of essential and context-essential genes. However, the migration from shRNA-based screenings to sgRNA-based ones and the use of different screening libraries and technologies has led to challenges in achieving reproducible results. Investigating the context-specific nature of gene essentiality has proven particularly complicated, limiting the translation of context-essential relationships into clinical applications [23]. This work presents PRODE, a novel analysis framework that provides robust insights into gene and context-specific essentiality. By leveraging the knowledge that essential genes participating in PPIs share common biological functions, display high interconnectivity, and cluster together, PRODE computes NIE and NICE scores that capture gene-level signals and incorporate information from gene neighborhoods, resulting in robust and reliable predictions of essential genes and context-specific essential relationships. The robustness of PRODE NIE scores is demonstrated by its ability to discriminate reference essential genes from non-essential ones effectively both on average and on a per-cell-line basis. Essential genes identified by PRODE display independently assessed biological properties, such as higher conservation scores across species and stable expression levels, further validating the accuracy of the predictions. The essential candidates identified by PRODE and not by gene effects display properties expected to be a primer of essentiality, such as enrichment for vital biological pathways (e.g., ribosome, splicing) and a greater number of interactions with synthetic lethal genes (Additional file 1: Fig. S5b–c, [57]). Remarkably, by weighting gene effects through local neighborhood signals, PRODE scores demonstrated robustness across LoF datasets from diverse sources and knockout libraries (Fig. 3a, [41]). PRODE also proved resilient to downscaled screening data, maintaining stable performance and consistency even when the number of guide RNAs per gene was reduced to one. Taken together, these results highlight PRODE as a reliable tool for analyzing screening data under challenging technical conditions.

Given the recent focus on functionally redundant gene pairs, such as paralogs (see [46, 47, 58, 59]), we found that evaluating essentiality within the context of a gene’s local neighborhood, rather than as an isolated property, improves the identification of essential buffering genes (Fig. 3d–e). Beyond paralogs, functional redundancy may more broadly characterize protein complexes composed of multiple genes. Recognizing this as a potential limitation of the current PRODE implementation, we foresee future updates allowing signal aggregation across entire protein complexes, treating them as cohesive functional units rather than isolated gene components.

Furthermore, PRODE’s efficacy in rescuing genes missed by shRNA-based gene effects highlights its value in detecting essential genes by leveraging information linked to the gene’s biological function rather than solely relying on the screening signal, limited by the technology in use. In the context-essentiality analysis, PRODE outperforms alternative methodologies and successfully identifies experimentally derived context-essential genes from isogenic cell lines and double knockout screenings. When comparing gene effects distributions between two groups of cell lines, PRODE consistently outperforms differential gene effects, reinforcing its superiority in detecting context-essential relationships. With ad hoc LoF screenings (e.g., in isogenic settings), NICE scores proved to be a better choice in prioritizing novel findings for further validation than differential gene effects, as PRODE identifies a higher fraction of hits derived from isogenic experiments at the lineage level. Our results suggest that the use of standard approaches for hit prioritization of in-house experiments (such as by leveraging differential gene effects computed on publicly available collections of screens) may result in the loss of potentially relevant context-specific essential genes.

While PRODE demonstrates improved performances over the use of gene effects alone, there are potential limitations to consider. As with any method leveraging PPI data, the reliability of PRODE’s predictions may be influenced by the quality and coverage of the available PPI networks [60]. Encouragingly, PRODE’s scores remained highly consistent across multiple PPI collections, showing robustness to variations in input interactions while effectively preserving cell-line specificity (Additional file 1: Figs. S2–S3). Notably, when we assessed the use of gene regulatory networks (GRNs) in place of PPIs, we observed a reduction in performance (Additional file 1: Fig. S12). This finding underscores the unique utility of PPIs in gene essentiality studies, as essential genes not only interact more frequently among themselves but also tend to cluster more closely than non-essential genes (groups of essential genes vs. randomly sampled gene sets, Wilcoxon test p value < 0.0001). To further enhance confidence in PRODE’s predictions, the implemented tool now includes an option to incorporate PPI weights, allowing users to calculate a confidence interval for NIE and NICE scores based on the reliability of each interaction (Additional file 1: Fig. S13b–c).

When applied to a clinically relevant scenario like Her2 + breast cancer tumors, PRODE recapitulated top hits prioritized by differential gene effects and nominated novel candidates otherwise undetected. Indeed, the expression of OXPHOS genes displayed key features of high clinical relevance, such as their link to patient overall survival in Her2 + tumors only and their role as biomarkers associated with sensitivity to treatment. Interestingly, the relevance of OXPHOS genes in breast cancer has recently been independently highlighted in a metastatic setting field [50, 51], and Her2 + tumors were also demonstrated to display sensitivity to a potent OXPHOS inhibitor [61]. While systematic experimental validations are needed to further confirm the role of OXPHOS genes in the context of Her2 + breast cancer tumors, this analysis represents proof of the potential clinical impact of context-essential genes prioritized by PRODE and otherwise missed by standard analysis workflows.

Conclusions

In this work, we introduced PRODE, an analytical framework that integrates gene effects of directly interacting partners of a gene and computes neighborhood-informed estimates of essentiality and context-essentiality by integrating gene effects data and PPIs.

Methods

Data collection and curation

Gene effects and cell lines molecular data used in this manuscript were downloaded from DepMap (https://depmap.org/portal/). Gene knock-out effects derived from sgRNA screening were computed by Chronos tool [17] while shRNA-derived gene knock-down effects were generated through DEMETER tool [15] and both datasets downloaded as part of the 22Q1 DepMap release. Starting from the collections of all screened models, we retained only the ones with matched expression data, annotation for microsatellite instability, and cell-culture type information for a total of 461 for the sgRNA dataset and 388 cell lines for the shRNA dataset. DEPMAP, SCORE (CERES processed), ACHILLE, and DRIVE (DEMETER processed) datasets were downloaded from the DepMap data portal (https://depmap.org/portal/data_page/?tab=customDownloads). The collection of human PPIs was downloaded from the STRING version 11 (https://string-db.org/, [27]), HUMANNET [62], and BIOGRID (preserving only human interactions [63]) databases. As we used STRING throughout the majority of our analysis, except when performing comparisons across PPIs (Additional file 1: Fig. S3), only the interactions that displayed a score higher or equal to 400 were used in the analysis allowing to preserve a high fraction of genes present within the gene effects datasets (95%; Additional file 1: Fig. S11) while improving the quality of connections compared to the lack of any threshold. Phylogenetic conservation scores across 1627 species were obtained from Nair et al. [64]. As each score consists of a sequence similarity between a human gene and its homologs in each of the species [64], the final used score was computed as the average conservation scores across all the species. The final measure was multiplied by − 1, so that low negative scores correspond to higher conservation and may be directly compared to NIE scores expecting higher essentiality of conserved genes. Datasets of double knock out of paralog pairs were derived from Parrish et al. [46] and Anvar et al. [47]. Briefly, to create Parrish et al. dataset, we first considered only those gene pairs with expected double KO effect greater than 0 both in PC9 and Hela cell-lines, then, divided them into positive and negative controls according to their genetic interaction flag (GI flag). For Anvar et al., we selected as negative controls those gene pairs with expected and observed double knock-out effects greater than 0.1 and used as positive controls the reference essential paralogs (n = 18). The TCGA-BRCA expression dataset used within the analysis of Her2 + context-specific dependencies was downloaded from recount3 [65] along with patients’ clinical data. GEO datasets of patients treated with trastuzumab were downloaded from Dinstag et al. [9].

PRODE method outline

PRODE computes neighborhood-informed essentiality and context-essentiality estimates starting from a single (or multiple) profile (s) of gene effects and a list of PPIs as input, with gene effects being a measure of cell-fitness changes after a gene-induced LoF (knock-out or knock-down). A lower (negative) gene effect points to an essential gene, which LoF causes a severe reduction in cell-viability. Both PRODE scores combine, for each gene, neighborhood and gene level signal.

NIE scores

Starting from a matrix $G{E}^{n\times m}$ of gene effects composed by $n$ genes and $m$ cell-lines, PRODE first converts the input gene effects profile (or average gene effects in case of multiple profiles) into percentiles${U}_{GE}=\{{u}_{{g}_{1}},..., {u}_{{g}_{i}}, .., {u}_{{g}_{n}}\}\in [\text{0,1})$. To quantify the neighborhood-level essentiality for a gene${g}_{i}$, PRODE retrieves its directly interacting partners from the input reference PPIs along with the corresponding percentiles ${U}_{{g}_{i}}\in {U}_{GE}$ with the length of ${U}_{{g}_{i}}$ being the size of${g}_{i}$’s neighborhood. Then, by leveraging the Robust Rank Aggregation (RRA) algorithm, PRODE quantifies the skewness of the obtained percentiles towards 0. The expectation is that a neighborhood populated by essential genes displays positive skewness and low percentiles. Briefly, the RRA algorithm computes a score representative of the neighborhood skewness as:

$$\rho ({U}_{{g}_{i}})=mi{n}_{k = 1,...,s}({\beta }_{k, s}({U}_{{g}_{i}}))$$

where $s$ is the total number of genes of ${g}_{i}$’s neighborhood, and ${\beta }_{k, s}$ is the probability, computed through a beta distribution, of observing at least $k$ over $s$ genes with lower or equal percentiles than the one of the ${k}^{th}$ gene. Since $\rho$ is dependent on the size of each gene’s neighborhood, PRODE computes $\rho {\prime}$ as an empirical p value according to a background distribution obtained through random shuffling of average gene effects percentiles and maintaining the neighborhood size of each gene ${g}_{i}$. In order to cut computational costs related to the computation of the background distributions, these are pre-computed and approximated by fitting a set of Weibull’s distributions. The final NIE score is computed as:

$$NI{E}_{{g}_{i}}=log({u}_{\rho {^{\prime}}_{i}}\times {u}_{{g}_{i }})$$

where ${u}_{\rho {^{\prime}}_{i}}$ is the percentile of $\rho {^{\prime}}_{i}$ of ${g}_{i}$’s neighborhood.

NICE scores

In order to assess neighborhood-informed context essentiality, PRODE implements a procedure similar to the NIE scores except for the type of data integrated with PPIs. Context-specific essentiality is computed by comparing two groups of collections of gene effects, in a case–control fashion. Considering a matrix $G{E}^{n\times m}$ composed by $n$ genes and $m$ cell-lines, PRODE computes differential gene effects for each gene of the matrix by fitting a linear model of the form:

$$g{e}_{i}\sim {\beta }_{0,i}+{\beta }_{1,i }group+{\beta }_{2,i}M$$

where $g{e}_{i}$ is the vector of gene effects of a gene${g}_{i}$, $group$ is a binary vector being 0 for cell lines part of the control population, and $M$ being the matrix of putative confounding factors included in the analysis. In this case, the percentiles ${U}_{GE}=\{{u}_{{g}_{1}},..., {u}_{{g}_{i}}, .., {u}_{{g}_{n}}\}\in [\text{0,1})$ computed to assess neighborhood and gene-level context-essentiality signal are obtained from the vector ${\widehat{\beta }}$ of the rescaled coefficients ${\beta }_{1}$ for every gene included in the experiments. Before combining neighborhood and gene-level signals, we exclude from the overall gene lists those that display an average gene effect higher or equal to 0 in the control population. This is supported by the observation that positive gene effects are linked to a fitness improvement after gene depletion and, when compared to gene effects within the case population, they may result in differential signals even if their depletion does not cause any relevant fitness reduction within the case population. When PPI weights are present, PRODE allows for the computation of confidence intervals (CIs) around the NIE and NICE scores. To do so, it runs over the input PPI collection multiple times. In each run, it removes interactions with the lowest scores, progressively refining the input PPI collection by retaining only the top-scoring interactions at increasingly stringent levels. This approach allows to compute CIs on a user-specified level (default being 95th) and the NIE scores result as the average scores across the iterations.

Validation datasets and performance assessment

Essentiality—validation datasets

In order to assess the performance of PRODE when distinguishing between essential and not essential genes, we established validation datasets composed of 991 positive control genes and 665 negative controls for sgRNA and 976 and 724 for shRNA (with respectively 939 and 540 positive and negative controls present in both sgRNA and shRNA; Additional file 2: Table S1). Negative control genes were selected as in Vinceti et al. [40] by including genes that show a lack of expression across cell lines—average expression of 0—as well as genes that previously displayed associations with specific contexts (only in this case, cell-lines expression data used consisted in the CMP dataset used by Vinceti et al. [40]). As positive controls, we shortlisted a diverse collection of genes expected to be essential for cell survival as (1) already identified by three independent works that analyzed loss of functions screens [3, 4, 66] and (2) part of biological processes essential for cell survival such as spliceosomal, proteasomal, ribosomal genes and genes participating in DNA replication retrieved from KEGG and GSEA databases.

Essentiality—performance assessment

We compared the performance of PRODE to the one of the average gene effects, as well as other heuristics that combine signal of genes part of the same local neighborhood, named as NSUM (neighborhood sum), NMAX (neighborhood max), and RWR (random walk with restart). In the case of NSUM and NMAX, we took inspiration from MUFFINN, a previously published tool designed for the identification of cancer driver genes based on mutational frequencies. MUFFINN comes with two modules (sum and max), which compute, respectively, scores capturing the sum and maximum mutational frequencies within a gene local neighborhood. Adapted to our problem, we maintained their scoring system, adapting it to gene effects data (i.e., gene effects were first multiplied by − 1, so that higher value indicated more essential genes, similarly to higher mutational frequencies in the original tools). To account for neighborhood size and for a direct comparison with PRODE, we turned each score into an empirical p value by computing a background distribution of 10^⁴ data points through random shuffling of the average gene effects. Similarly, in the case of random walk with restart [38], we computed the background distributions with 10^³ random iterations. To compute a data-driven threshold and perform a discrete call for an essential or not-essential gene, we identified the NIE score or the average gene effect corresponding to the maximal sensitivity and specificity when classifying essential and not-essential genes. We observed slightly different thresholds for sgRNA and shRNA datasets. In the case of PRODE, genes that showed NIE scores < − 4.14 and − 3.48 for sgRNA and shRNA datasets were deemed as essential, while we used − 0.43 and − 0.22 for average gene effects, respectively (Additional file 1: Fig. S2).

Context-essentiality—validation datasets

As context-specific essential gene sets, we collected context-gene pairs from different data sources, including experimentally and literature-derived sets. These pairs are formed by a context (i.e., the presence of a loss of function or gain of function mutation) and a gene, expected to display essential-like behavior in that particular context. We assembled three validation sets spanning context-gene pairs from (1) a collection of 11 different isogenic loss-of-function screenings performed in three different research works [44, 58, 67] and in-house, (2) a collection of six double KO screenings across five works [59, 68,69,70,71], and (3) a collection of SL interactions retrieved from SLdb database. For the first dataset, we included the top 10 hits as positive controls for each isogenic screen performed. Notably, we re-analyzed each single screening following a consistent workflow, as described by DeWeirdt et al. [44]. Among the isogenic screenings performed, we also included a previously unpublished in-house screening performed on the HT1197 urinary-tract cell-line. The experiment was performed in an isogenic setting by performing a KO screen on a wild-type clone and on a clone engineered with a bi-allelic knock out of CDKN2A, CDKN2B, and MTAP genes (3KO clone), part of the 9p21.3 genomic locus. Similar to the isogenic dataset, the double KO dataset was populated by the top 100 hits identified in each of the screenings produced in four research works. Finally, we composed the SLdb dataset by retaining only gene pairs that presented experimental-based evidence and kept only those gene-context pair in which gene A (the context) was annotated as a tumor suppressor gene according to a list of known tumor suppressors downloaded from OncoKB database [72]. In each of the datasets, we included negative controls by keeping the same context of positive controls and adding, as partners, lowly expressed genes (average gene expression < 0.01 quantile) expected to display no-essentiality in different contexts.

Context essentiality performance assessment

Similar to the essentiality performance assessment, we compared the NICE scores computed by PRODE to the ones of other methodologies. Importantly, whenever possible, for each analysis, we included MSI and cell culture type as covariates. In this case, we included an additional heuristic, Np-value (neighborhood p value). Similarly to MUFFINN, we computed Np-value scores by adapting a previously published methodology, NetSig [37], originally designed for the identification of cancer driver genes. Within our framework, Np-value computes the context-essentiality score by aggregating p values from the first-level neighborhood of each gene. p values are obtained from the linear model fit for each gene. To assess the performance of each tool, we performed an analysis at lineage-level and computed context-essentiality scores when a sufficient number of cell-line models was available (at least 3 cell-lines displaying the context modification and 3 cell-lines being wild-type). While in the case of isogenic screens and double KO screens datasets, we stratified cell-lines based on the expression level of the gene(s) part of the context (z-score < − 1.5 for loss of function, − 0.5 < z-score < 0.5 for control cell-lines), in the case of SLdb validation dataset we leveraged genomic information. Cell lines carrying a LoF of the tumor suppressor gene defined as the context were identified when displaying (1) z-scored expression < − 1.5 or (2) copy-number = 0 or (3) presence of a deleterious point mutation, annotated as loss-of-function by OncoKB. The final score used to compute precision, recall, and AUCs resulted as the averaged context-essentiality scores of each tool across tested lineages.

Gene silencing and crystal violet assays

For gene specific silencing, we used GPP Web Portal (http://portals.broadinstitute.org/gpp/public/) to design two shRNA per target gene to be cloned into a doxycycline inducible lentiviral vector. To produce the vectors, lenti pLKO.1 puro back-bones with the cloned shRNA were co-transfected into HEK293T cells with the packaging plasmids pVSVg, and psPAX2. HT1197 cells were transduced with filtered viral media plus polybrene 5 μg/mL and then selected by puromycin exposure. Cells were seeded in the presence of doxycycline in 24 well plates at a concentration of 5000 cells/well to perform a proliferation assay at day 7 and 14. Cells were fixed with 4% formaldehyde and stained with a crystal violet 0.1% solution (w/v). After rinsing, 10% acetic acid solution was introduced to release the crystal violet from the cells of every well, and the absorbance was read at 595 nm for a semi-quantitation.

Genome-wide CRISPR screening on HT1197 cell-line

We performed a genome-wide CRISPR screening with the Gecko v2 library, a 2-vector library system [73] that consists of over 120,000 unique gRNAs for gene knock-out human genome (6 sgRNAs per gene, 4 sgRNAs per miRNA, 1000 control sgRNAs non-targeting). The library was expanded in-house; full library representation post expansion was verified by NGS (loss of 64 sgRNA on 123,411 and Gini index of 0.10). To make lentivirus, lentiCRISPR backbones with sgRNA cloned were co-transfected into HEK293T cells with the packaging plasmids pVSVg and psPAX2. Culture supernatant was titrated as described in [74]. Since an m.o.i. of 0.3 is critical to maximize the integration of 1 sgRNA per cell, transduction efficiency experiments were performed on HT1197, with increasing volumes of virus. A HT1197 WT clone and a 3KO clone (already expressing Cas9) were transduced with the GecKo v2 library, selected with puromycin for 6 days, and then cultured for 28 days (34 days post infection) equivalent to 13 replication cycles. A library coverage of 300 × per sgRNA was maintained at every step. Genomic DNA was extracted from the equivalent of 40 million cells at day 6 and 34 post transduction. A two-round PCR was performed for the amplification of the genome integrated sgRNA sequences and the addition of multiplexing barcodes for NGS sequencing.

Assessment of clinical relevance of OXPHOS genes within HER2 + breast cancer tumors

We investigated the clinical relevance of the OXPHOS genes identified by PRODE in two different ways. First, following a similar procedure as previous works [9,10,11], we fitted, for each gene, a multivariate Cox regression model including a multiplicative term between two binary variables encoding (1) the low/high average of OXPHOS (stratified according to the median) and (2) the presence of the HER2 + subtype. We also included covariates such as age and genomic instability index. The rationale behind the analysis is that low expression of an essential gene in the context of Her2 + tumors may result in lower tumoral fitness and, therefore, linked to better patient survival. A negative hazard ratio reported in Additional file 1: Fig. S10b reflects a worse survival of patients with HER2 + tumors and high OXPHOS gene expression. Kaplan–Meier curves in Fig. 6e were obtained by averaging the expression of 29 OXPHOS genes in each patient. Finally, we evaluated the association of each OXPHOS gene expression to the response to trastuzumab inhibitor across 5 patients’ cohorts. In order to do that, the response to treatment was binarized as in the original publications, as explained in [9]. To test the association between patients’ response and gene expression depicted in Fig. 6f, we fitted logistic regression models, including the expression of each OXPHOS gene and the expression of ERBB2 as a covariate. Before model fitting, expression levels were log-transformed and z-scored across patients.

Data availability

The source code is publicly available under the MIT open-source license compliant with Open Source Initiative. PRODE’s source code is deposited as an R package in Zenodo (version used throughout this manuscript; https://doiorg.publicaciones.saludcastillayleon.es/10.5281/zenodo.14697551 [75]) while up-to-date versions are available on GitHub (https://github.com/demichelislab/prodeTool). The code and the data to generate the figures are available on Zenodo (https://doiorg.publicaciones.saludcastillayleon.es/10.5281/zenodo.14026954 [76]).

References

Bartha I, di Iulio J, Venter JC, Telenti A. Human gene essentiality. Nat Rev Genet. 2018;19:51–62.
Article CAS PubMed Google Scholar
Rancati G, Moffat J, Typas A, Pavelka N. Emerging and evolving concepts in gene essentiality. Nat Rev Genet. 2018;19:34–49.
Article CAS PubMed Google Scholar
Behan FM, Iorio F, Picco G, Gonçalves E, Beaver CM, Migliardi G, et al. Prioritization of cancer therapeutic targets using CRISPR-Cas9 screens. Nature. 2019;568:511–6.
Article CAS PubMed Google Scholar
Hart T, Chandrashekhar M, Aregger M, Steinhart Z, Brown KR, MacLeod G, et al. High-resolution CRISPR screens reveal fitness genes and genotype-specific cancer liabilities. Cell. 2015;163:1515–26.
Article CAS PubMed Google Scholar
O’Neil NJ, Bailey ML, Hieter P. Synthetic lethality and cancer. Nat Rev Genet. 2017;18:613–23.
Article PubMed Google Scholar
Topatana W, Juengpanich S, Li S, Cao J, Hu J, Lee J, et al. Advances in synthetic lethality for cancer therapy: cellular mechanism and clinical translation. J Hematol Oncol. 2020;13:118.
Article PubMed PubMed Central Google Scholar
Hu Y, Petit SA, Ficarro SB, Toomire KJ, Xie A, Lim E, et al. PARP1-driven poly-ADP-ribosylation regulates BRCA1 function in homologous recombination-mediated DNA repair. Cancer Discov. 2014;4:1430–47.
Article CAS PubMed PubMed Central Google Scholar
Lord CJ, Ashworth A. PARP inhibitors: synthetic lethality in the clinic. Science. 2017;355:1152–8.
Article CAS PubMed PubMed Central Google Scholar
Dinstag G, Shulman ED, Elis E, Ben-Zvi DS, Tirosh O, Maimon E, et al. Clinically oriented prediction of patient response to targeted and immunotherapies from the tumor transcriptome. Med. 2023;4:15-30.e8.
Article CAS PubMed Google Scholar
Lee JS, Das A, Jerby-Arnon L, Arafeh R, Auslander N, Davidson M, et al. Harnessing synthetic lethality to predict the response to cancer treatment. Nat Commun. 2018;9:2546.
Article PubMed PubMed Central Google Scholar
Lee JS, Nair NU, Dinstag G, Chapman L, Chung Y, Wang K, et al. Synthetic lethality-mediated precision oncology via the tumor transcriptome. Cell. 2021;184:2487-2502.e13.
Article CAS PubMed PubMed Central Google Scholar
Chan EM, Shibue T, McFarland JM, Gaeta B, Ghandi M, Dumont N, et al. WRN helicase is a synthetic lethal target in microsatellite unstable cancers. Nature. 2019;568:551–6.
Article CAS PubMed PubMed Central Google Scholar
Cohen-Sharir Y, McFarland JM, Abdusamad M, Marquis C, Bernhard SV, Kazachkova M, et al. Aneuploidy renders cancer cells vulnerable to mitotic checkpoint inhibition. Nature. 2021;590:486–91.
Article CAS PubMed PubMed Central Google Scholar
Hong C, Schubert M, Tijhuis AE, Requesens M, Roorda M, van den Brink A, et al. cGAS–STING drives the IL-6-dependent survival of chromosomally instable cancers. Nature. 2022;607:366–73.
Article CAS PubMed Google Scholar
Tsherniak A, Vazquez F, Montgomery PG, Weir BA, Kryukov G, Cowley GS, et al. Defining a cancer dependency map. Cell. 2017;170:564-576.e16.
Article CAS PubMed PubMed Central Google Scholar
Meyers RM, Bryan JG, McFarland JM, Weir BA, Sizemore AE, Xu H, et al. Computational correction of copy number effect improves specificity of CRISPR-Cas9 essentiality screens in cancer cells. Nat Genet. 2017;49:1779–84.
Article CAS PubMed PubMed Central Google Scholar
Dempster JM, Boyle I, Vazquez F, Root DE, Boehm JS, Hahn WC, et al. Chronos: a cell population dynamics model of CRISPR experiments that improves inference of gene fitness effects. Genome Biol. 2021;22:343.
Article PubMed PubMed Central Google Scholar
Li W, Xu H, Xiao T, Cong L, Love MI, Zhang F, et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 2014;15:554.
Article PubMed PubMed Central Google Scholar
Iorio F, Behan FM, Gonçalves E, Bhosle SG, Chen E, Shepherd R, et al. Unsupervised correction of gene-independent cell responses to CRISPR-Cas9 targeting. BMC Genomics. 2018;19:604.
Article PubMed PubMed Central Google Scholar
Pacini C, Duncan E, Gonçalves E, Gilbert J, Bhosle S, Horswell S, et al. A comprehensive clinically informed map of dependencies in cancer cells and framework for target prioritization. Cancer Cell. 2024;42:301-316.e9.
Article CAS PubMed Google Scholar
Lord CJ, Quinn N, Ryan CJ. Integrative analysis of large-scale loss-of-function screens identifies robust cancer-associated genetic interactions. Elife. 2020;9. https://doiorg.publicaciones.saludcastillayleon.es/10.7554/eLife.58925.
Pacini C, Dempster JM, Boyle I, Gonçalves E, Najgebauer H, Karakoc E, et al. Integrated cross-study datasets of genetic dependencies in cancer. Nat Commun. 2021;12:1661.
Article CAS PubMed PubMed Central Google Scholar
Ryan CJ, Bajrami I, Lord CJ. Synthetic lethality and cancer - penetrance as the major barrier. Trends Cancer Res. 2018;4:671–83.
Article CAS Google Scholar
Heo S-J, Enriquez LD, Federman S, Chang AY, Mace R, Shevade K, et al. Compact CRISPR genetic screens enabled by improved guide RNA library cloning. Genome Biol. 2024;25:25.
Article CAS PubMed PubMed Central Google Scholar
Smith I, Greenside PG, Natoli T, Lahr DL, Wadden D, Tirosh I, et al. Evaluation of RNAi and CRISPR technologies by large-scale gene expression profiling in the connectivity map. PLoS Biol. 2017;15: e2003213.
Article PubMed PubMed Central Google Scholar
Morgens DW, Deans RM, Li A, Bassik MC. Systematic comparison of CRISPR/Cas9 and RNAi screens for essential genes. Nat Biotechnol. 2016;34:634–6.
Article CAS PubMed PubMed Central Google Scholar
Azhagesan K, Ravindran B, Raman K. Network-based features enable prediction of essential genes across diverse organisms. PLoS One. 2018;13: e0208722.
Article PubMed PubMed Central Google Scholar
Benstead-Hume G, Wooller SK, Renaut J, Dias S, Woodbine L, Carr AM, et al. Biological network topology features predict gene dependencies in cancer cell-lines. Bioinform Adv. 2022;2:vbac084.
Article PubMed PubMed Central Google Scholar
Zhong J, Tang C, Peng W, Xie M, Sun Y, Tang Q, et al. A novel essential protein identification method based on PPI networks and gene expression data. BMC Bioinformatics. 2021;22:248.
Article CAS PubMed PubMed Central Google Scholar
Zhang X, Xiao W, Xiao W. DeepHE: accurately predicting human essential genes based on deep learning. PLoS Comput Biol. 2020;16: e1008229.
Article CAS PubMed PubMed Central Google Scholar
Jiang P, Wang H, Li W, Zang C, Li B, Wong YJ, et al. Network analysis of gene essentiality in functional genomics experiments. Genome Biol. 2015;16:239.
Article PubMed PubMed Central Google Scholar
Wang W, Malyutina A, Pessia A, Saarela J, Heckman CA, Tang J. Combined gene essentiality scoring improves the prediction of cancer dependency maps. EBioMedicine. 2019;50:67–80.
Article CAS PubMed PubMed Central Google Scholar
Ku AA, Hu H-M, Zhao X, Shah KN, Kongara S, Wu D, et al. Integration of multiple biological contexts reveals principles of synthetic lethality that affect reproducibility. Nat Commun. 2020;11:2375.
Article CAS PubMed PubMed Central Google Scholar
Sharma S, Dincer C, Weidemüller P, Wright GJ, Petsalaki E. CEN-tools: an integrative platform to identify the contexts of essential genes. Mol Syst Biol. 2020;16: e9698.
Article CAS PubMed PubMed Central Google Scholar
Kelly MR, Kostyrko K, Han K, Mooney NA, Jeng EE, Spees K, et al. Combined proteomic and genetic interaction mapping reveals new RAS effector pathways and susceptibilities. Cancer Discov. 2020;10:1950–67.
Article CAS PubMed PubMed Central Google Scholar
Cho A, Shim JE, Kim E, Supek F, Lehner B, Lee I. MUFFINN: cancer gene discovery via network analysis of somatic mutation data. Genome Biol. 2016;17:129.
Article PubMed PubMed Central Google Scholar
Horn H, Lawrence MS, Chouinard CR, Shrestha Y, Hu JX, Worstell E, et al. NetSig: network-based discovery from cancer genomes. Nat Methods. 2018;15:61–6.
Article CAS PubMed Google Scholar
Guney E, Oliva B. Exploiting protein-protein interaction networks for genome-wide disease-gene prioritization. PLoS One. 2012;7: e43557.
Article CAS PubMed PubMed Central Google Scholar
Kolde R, Laur S, Adler P, Vilo J. Robust rank aggregation for gene list integration and meta-analysis. Bioinformatics. 2012;28:573–80.
Article CAS PubMed PubMed Central Google Scholar
Vinceti A, Karakoc E, Pacini C, Perron U, De Lucia RR, Garnett MJ, et al. CoRe: a robustly benchmarked R package for identifying core-fitness genes in genome-wide pooled CRISPR-Cas9 screens. BMC Genomics. 2021;22:828.
Article CAS PubMed PubMed Central Google Scholar
Dempster JM, Pacini C, Pantel S, Behan FM, Green T, Krill-Burger J, et al. Agreement between two large pan-cancer CRISPR-Cas9 gene dependency data sets. Nat Commun. 2019;10:5817.
Article CAS PubMed PubMed Central Google Scholar
McDonald ER 3rd, de Weck A, Schlabach MR, Billy E, Mavrakis KJ, Hoffman GR, et al. Project DRIVE: a compendium of cancer dependencies and synthetic lethal relationships uncovered by large-scale, deep RNAi screening. Cell. 2017;170:577-592.e10.
Article CAS PubMed Google Scholar
Bradford J, Perrin D. A benchmark of computational CRISPR-Cas9 guide design methods. PLoS Comput Biol. 2019;15: e1007274.
Article CAS PubMed PubMed Central Google Scholar
DeWeirdt PC, Sangree AK, Hanna RE, Sanson KR, Hegde M, Strand C, et al. Genetic screens in isogenic mammalian cell lines without single cell cloning. Nat Commun. 2020;11:752.
Article CAS PubMed PubMed Central Google Scholar
Singh A, Modak SB, Chaturvedi MM, et al. SWI/SNF chromatin remodelers: structural, functional and mechanistic implications. Cell Biochem Biophys. 2023;81:167–87.
Article CAS PubMed Google Scholar
Parrish PCR, Thomas JD, Gabel AM, Kamlapurkar S, Bradley RK, Berger AH. Discovery of synthetic lethal and tumor suppressor paralog pairs in the human genome. Cell Rep. 2021;36: 109597.
Article CAS PubMed PubMed Central Google Scholar
Esmaeili Anvar N, Lin C, Ma X, Wilson LL, Steger R, Sangree AK, et al. Efficient gene knockout and genetic interaction screening using the in4mer CRISPR/Cas12a multiplex knockout platform. Nat Commun. 2024;15:3577.
Article CAS PubMed PubMed Central Google Scholar
Munoz DM, Cassiani PJ, Li L, Billy E, Korn JM, Jones MD, et al. CRISPR screens provide a comprehensive assessment of cancer vulnerabilities but generate false-positive hits for highly amplified genomic regions. Cancer Discov. 2016;6:900–13.
Article CAS PubMed Google Scholar
Evers B, Jastrzebski K, Heijmans JPM, Grernrum W, Beijersbergen RL, Bernards R. CRISPR knockout screening outperforms shRNA and CRISPRi in identifying essential genes. Nat Biotechnol. 2016;34:631–3.
Article CAS PubMed Google Scholar
Jackson AL, Bartz SR, Schelter J, Kobayashi SV, Burchard J, Mao M, et al. Expression profiling reveals off-target gene regulation by RNAi. Nat Biotechnol. 2003;21:635–7.
Article CAS PubMed Google Scholar
Bhinder B, Djaballah H. Systematic analysis of RNAi reports identifies dismal commonality at gene-level and reveals an unprecedented enrichment in pooled shRNA screens. Comb Chem High Throughput Screen. 2013;16:665–81.
Article CAS PubMed PubMed Central Google Scholar
UROsource. Available from: https://urosource.uroweb.org/resource-centres/ESUR23/255854/abstract. Cited 2023 Dec 19.
Wang E, Aifantis I. RNA splicing and cancer. Trends Cancer. 2020;6:631–44.
Article CAS PubMed Google Scholar
Wang J, Wu M, Huang X, Wang L, Zhang S, Liu H, et al. SynLethDB 2.0: a web-based knowledge graph database on synthetic lethality for novel anticancer drug discovery. Database. 2022;2022. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/database/baac030.
Setton J, Zinda M, Riaz N, Durocher D, Zimmermann M, Koehler M, et al. Synthetic lethality in cancer therapeutics: the next generation. Cancer Discov. 2021;11:1626–35.
Article CAS PubMed PubMed Central Google Scholar
Yersal O, Barutca S. Biological subtypes of breast cancer: prognostic and therapeutic implications. World J Clin Oncol. 2014;5:412–24.
Article PubMed PubMed Central Google Scholar
Costanzo M, VanderSluis B, Koch EN, Baryshnikova A, Pons C, Tan G, et al. A global genetic interaction network maps a wiring diagram of cellular function. Science. 2016;353:aaf1420–aaf1420.
Article PubMed Google Scholar
Feng X, Tang M, Dede M, Su D, Pei G, Jiang D, et al. Genome-wide CRISPR screens using isogenic cells reveal vulnerabilities conferred by loss of tumor suppressors. Sci Adv. 2022;8: eabm6638.
Article CAS PubMed PubMed Central Google Scholar
Thompson NA, Ranzani M, van der Weyden L, Iyer V, Offord V, Droop A, et al. Combinatorial CRISPR screen identifies fitness effects of gene paralogues. Nat Commun. 2021;12:1–11.
Article Google Scholar
Peng X, Wang J, Peng W, Wu F-X, Pan Y. Protein-protein interactions: detection, reliability assessment and applications. Brief Bioinform. 2017;18:798–819.
CAS PubMed Google Scholar
Eldad S, Hertz R, Vainer G, Saada A, Bar-Tana J. Treatment of ErbB2 breast cancer by mitochondrial targeting. Cancer Metab. 2020;8. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s40170-020-00223-8.
Kim CY, Baek S, Cha J, Yang S, Kim E, Marcotte EM, et al. HumanNet v3: an improved database of human gene networks for disease research. Nucleic Acids Res. 2022;50:D632–9.
Article CAS PubMed Google Scholar
Stark C, Breitkreutz B-J, Reguly T, Boucher L, Breitkreutz A, Tyers M. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006;34:D535–9.
Article CAS PubMed Google Scholar
Nair NU, Cheng K, Naddaf L, Sharon E, Pal LR, Rajagopal PS, et al. Cross-species identification of cancer resistance–associated genes that may mediate human cancer risk. Sci Adv. 2022;8. https://doiorg.publicaciones.saludcastillayleon.es/10.1126/sciadv.abj7176.
Wilks C, Zheng SC, Chen FY, Charles R, Solomon B, Ling JP, et al. recount3: summaries and queries for large-scale RNA-seq expression and splicing. Genome Biol. 2021;22:323.
Article CAS PubMed PubMed Central Google Scholar
Kim E, Hart T. Improved analysis of CRISPR fitness screens and reduced off-target effects with the BAGEL2 gene essentiality classifier. Genome Med. 2021;13:2.
Article CAS PubMed PubMed Central Google Scholar
Gallo D, Young JTF, Fourtounis J, Martino G, Álvarez-Quilón A, Bernier C, et al. CCNE1 amplification is synthetic lethal with PKMYT1 kinase inhibition. Nature. 2022;604:749–56.
Article CAS PubMed PubMed Central Google Scholar
Horlbeck MA, Xu A, Wang M, Bennett NK, Park CY, Bogdanoff D, et al. Mapping the genetic landscape of human cells. Cell. 2018;174:953-967.e22.
Article CAS PubMed PubMed Central Google Scholar
Han K, Jeng EE, Hess GT, Morgens DW, Li A, Bassik MC. Synergistic drug combinations for cancer identified in a CRISPR screen for pairwise genetic interactions. Nat Biotechnol. 2017;35:463–74.
Article CAS PubMed PubMed Central Google Scholar
Najm FJ, Strand C, Donovan KF, Hegde M, Sanson KR, Vaimberg EW, et al. Orthologous CRISPR–Cas9 enzymes for combinatorial genetic screens. Nat Biotechnol. 2018;36:179–89.
Article CAS PubMed Google Scholar
Shen JP, Zhao D, Sasik R, Luebeck J, Birmingham A, Bojorquez-Gomez A, et al. Combinatorial CRISPR–Cas9 screens for de novo mapping of genetic interactions. Nat Methods. 2017;14:573–6.
Article CAS PubMed PubMed Central Google Scholar
Chakravarty D, Gao J, Phillips SM, Kundra R, Zhang H, Wang J, et al. OncoKB: a precision oncology knowledge base. JCO Precis Oncol. 2017;2017. https://doiorg.publicaciones.saludcastillayleon.es/10.1200/PO.17.00011.
Sanjana NE, Shalem O, Zhang F. Improved vectors and genome-wide libraries for CRISPR screening. Nat Methods. 2014;11:783–4.
Article CAS PubMed PubMed Central Google Scholar
Pizzato M, Erlwein O, Bonsall D, Kaye S, Muir D, McClure MO. A one-step SYBR green I-based product-enhanced reverse transcriptase assay for the quantitation of retroviruses in cell culture supernatants. J Virol Methods. 2009;156:1–7.
Article CAS PubMed Google Scholar
Cantore T, Laboratory of Computational and Functional Oncology-CIBIO - Unitn. PRODE: neighborhood informed essential and context essential scores. Zenodo; 2025. https://doiorg.publicaciones.saludcastillayleon.es/10.5281/ZENODO.14697551.
Cantore T, Laboratory of Computational and Functional Oncology-CIBIO - Unitn. Code for “PRODE recovers essential and context-essential genes through neighborhood-informed scores”. Zenodo; 2025. https://doiorg.publicaciones.saludcastillayleon.es/10.5281/ZENODO.14026954.

Download references

Acknowledgements

The authors thank Francesco Iorio and Pasquale Rescigno for their constructive input; Francesca Lorenzin and Gian Marco Franceschini for fruitful discussions during the development and implementation of PRODE; and the NGS facility of the CIBIO Department at the University of Trento.

Review history

The review history is available as Additional file 3.

Peer review information

Tim Sands was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Funding

This project received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement no. 648670; SPICE) to F.D. and from the Italian Ministry of University and Research (FARE Program) to F.D., and was supported by Dipartimenti di Eccellenza 2023–2027 (Legge 232/2016) funded by the Italian Ministry of University and Research.

Author information

Authors and Affiliations

Laboratory of Computational and Functional Oncology, Department of Cellular, Computational, and Integrative Biology, University of Trento, Via Sommarive 9, Trento, 38123, Italy
Thomas Cantore, Paola Gasperini, Riccardo Bevilacqua, Yari Ciani & Francesca Demichelis
Cancer Data Science Laboratory (CDSL), Center for Cancer Research (CCR), National Cancer Institute (NCI), National Institutes of Health (NIH), Bethesda, MD, USA
Sanju Sinha & Eytan Ruppin
Currently at Sanford Burnham Prebys Medical Discovery Institute, San Diego, CA, USA
Sanju Sinha

Authors

Thomas Cantore
View author publications
You can also search for this author inPubMed Google Scholar
Paola Gasperini
View author publications
You can also search for this author inPubMed Google Scholar
Riccardo Bevilacqua
View author publications
You can also search for this author inPubMed Google Scholar
Yari Ciani
View author publications
You can also search for this author inPubMed Google Scholar
Sanju Sinha
View author publications
You can also search for this author inPubMed Google Scholar
Eytan Ruppin
View author publications
You can also search for this author inPubMed Google Scholar
Francesca Demichelis
View author publications
You can also search for this author inPubMed Google Scholar

Contributions

TC developed the PRODE framework. TC and FD conceptualized the work. YC, SS, and ER contributed technical input for PRODE development and performance assessment. PG and RB generated experimental screening and in vitro validation data. FD supervised the work. TC prepared the manuscript draft, and all authors contributed to the final version.

Corresponding author

Correspondence to Francesca Demichelis.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

E.R. is (non-paid) member of the scientific advisory boards of Pangea Biomed (divested), GSK Oncology, the WIN consortium, and the ProCan project. E.R. is a founder of MedAware Ltd and Pangea Biomed.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1. PDF document containing supplementary figures

Additional file 2. Excel document collecting supplementary tables

Additional file 3. Peer review history

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Cantore, T., Gasperini, P., Bevilacqua, R. et al. PRODE recovers essential and context-essential genes through neighborhood-informed scores. Genome Biol 26, 42 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13059-025-03501-0

Download citation

Received: 19 March 2024
Accepted: 05 February 2025
Published: 28 February 2025
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13059-025-03501-0

PRODE recovers essential and context-essential genes through neighborhood-informed scores

Abstract

Background

Results

PRODE integrates gene effects and protein–protein interactions (PPIs) to quantify gene essentiality and context-specific essentiality

PRODE robustly identifies essential genes and biological processes across different screening technologies

NIE scores are robust across screening libraries and loss of function datasets

Neighborhood essentiality uncovers functionally redundant essential genes

PRODE recovers essential genes missed by shRNA technology

PRODE recovers experimentally derived synthetic lethal context-essential genes

PRODE identifies OXPHOS genes as context-essential in HER2 + breast cancer

Discussion

Conclusions

Methods

Data collection and curation

PRODE method outline

NIE scores

NICE scores

Validation datasets and performance assessment

Essentiality—validation datasets

Essentiality—performance assessment

Context-essentiality—validation datasets

Context essentiality performance assessment

Gene silencing and crystal violet assays

Genome-wide CRISPR screening on HT1197 cell-line

Assessment of clinical relevance of OXPHOS genes within HER2 + breast cancer tumors

Data availability

References

Acknowledgements

Review history

Peer review information

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Competing interests

Additional information

Publisher’s Note

Supplementary Information

Additional file 1. PDF document containing supplementary figures

Additional file 2. Excel document collecting supplementary tables

Additional file 3. Peer review history

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Genome Biology

Contact us