Considerations in the search for epistasis

Balvert, Marleen; Cooper-Knock, Johnathan; Stamp, Julian; Byrne, Ross P.; Mourragui, Soufiane; van Gils, Juami; Benonisdottir, Stefania; Schlüter, Johannes; Kenna, Kevin; Abeln, Sanne; Iacoangeli, Alfredo; Daub, Joséphine T.; Browning, Brian L.; Taş, Gizem; Hu, Jiajing; Wang, Yan; Alhathli, Elham; Harvey, Calum; Pianesi, Luna; Schulte, Sara C.; González-Domínguez, Jorge; Garrisson, Erik; Snyder, Michael P.; Schönhuth, Alexander; Sng, Letitia M. F.; Twine, Natalie A.

doi:10.1186/s13059-024-03427-z

Review
Open access
Published: 19 November 2024


Marleen Balvert¹^na1,
Johnathan Cooper-Knock ORCID: orcid.org/0000-0002-0873-8689²^na1,
Julian Stamp³,
Ross P. Byrne⁴,
Soufiane Mourragui⁵,
Juami van Gils⁶,
Stefania Benonisdottir^7,19,
Johannes Schlüter⁸,
Kevin Kenna⁹,
Sanne Abeln²¹,
Alfredo Iacoangeli^10,11,12,
Joséphine T. Daub²¹,
Brian L. Browning¹³,
Gizem Taş^1,9,
Jiajing Hu¹⁰,
Yan Wang⁹,
Elham Alhathli²,
Calum Harvey²,
Luna Pianesi⁸,
Sara C. Schulte¹⁴,
Jorge González-Domínguez¹⁵,
Erik Garrisson¹⁶,
Lorentz workshop on epistasis,
Michael P. Snyder¹⁷,
Alexander Schönhuth⁸^na2,
Letitia M. F. Sng¹⁸^na2 &
Natalie A. Twine¹⁸^na2

Genome Biology volume 25, Article number: 296 (2024)


An Author Correction to this article was published on 20 January 2025

This article has been updated

Abstract

Epistasis refers to changes in the effect on phenotype of a unit of genetic information, such as a single nucleotide polymorphism or a gene, dependent on the context of other genetic units. Such interactions are both biologically plausible and good candidates to explain observations which are not fully explained by an additive heritability model. However, the search for epistasis has so far largely failed to recover this missing heritability. We identify key challenges and propose that future works need to leverage idealized systems, known biology and even previously identified epistatic interactions, in order to guide the search for new interactions.

Introduction

Epistasis refers to changes in the effect of a unit of genetic information (such as a single nucleotide polymorphism or a gene) on a phenotype, dependent on the context of other genetic units. Such interactions are biologically plausible and offer a potential explanation for phenomena not fully accounted for by an additive heritability model. Heritability is a measure of the extent to which phenotypic variation is genetically determined. Broad-sense heritability refers to heritability measured by comparison of concordance rates for phenotype between monozygotic and dizygotic twins who share 100% or 50% of their genetics, respectively [1]. Missing heritability commonly refers to the gap between measured broad-sense heritability and heritability calculated by adding together the individual contributions of phenotype-associated SNPs genome-wide (i.e., narrow-sense heritability). Missing heritability is important because it implies that we have an incomplete understanding of the genetic basis of health and disease. A number of possibilities could explain this missing heritability, including gene-environment interactions. Epistatic interactions are another candidate to explain a proportion of missing heritability, but an alternative explanation is that current studies simply lack the statistical power to discover all important additive effects. However, there is good observational evidence for epistasis, for example, from large-scale screens in yeast studying the effect of combinations of individual gene knockouts [2, 3].

A meta-analysis of twin studies concluded that for 69% of traits the data was consistent with an additive model whereby monozygotic twin correlations were almost exactly double dizygotic twin correlations [4]. However, even this study provides evidence for non-additive genetic effects in a subset of traits. For traits such as depressive disorder, hyperkinetic disorders, and atopic dermatitis, the authors observed monozygotic twin correlations which were greater than double the dizygotic twin correlations, consistent with a non-additive genetic effect. Moreover, even observations consistent with an additive model are not equivalent to actually demonstrating an additive model and the presence of an additive model does not necessarily rule out the possibility of an underlying epistatic model. Interestingly, the effect sizes of a majority of SNPs vary between genetic backgrounds [5], suggesting the presence of interactions between the genetic background and the SNP. Finally, in simulations of epistasis, additive models used to measure narrow-sense heritability fail to account for non-linear interactions between genetic variants and thus dramatically underestimate true heritability [6].
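The twin-correlation argument can be made explicit with the classical variance-component decomposition. As a sketch, assume a trait whose phenotypic variance splits into additive ($a^2$), dominance ($d^2$), and shared-environment ($c^2$) proportions; this notation is an assumption introduced here for illustration and is not taken from [4]. The expected twin correlations are then:

$$r_{MZ}=a^{2}+d^{2}+c^{2},\qquad r_{DZ}=\tfrac{1}{2}a^{2}+\tfrac{1}{4}d^{2}+c^{2},$$

so that

$$r_{MZ}-2\,r_{DZ}=\tfrac{1}{2}d^{2}-c^{2}.$$

With purely additive genetics and no shared environment, $r_{MZ}=2\,r_{DZ}$ exactly. Shared environment can only push the difference below zero, so $r_{MZ}>2\,r_{DZ}$ requires a non-additive genetic component. Epistatic variance components behave like the dominance term in this respect: they are fully shared by monozygotic twins but shared at less than one half by dizygotic twins.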

The problem is that previous searches for epistasis have so far largely failed to recover missing heritability [7]. Various computational approaches using statistics, combinatorics, and machine learning have been applied to try to detect epistasis. Each of these approaches tries to address the issue of identifying relevant potential epistatic interactions from an enormous search space, either by enumerating all possibilities or by finding an efficient way to move through the search space. Consideration of epistasis inherently leads to a combinatorial explosion: the number of potential interactions increases exponentially with the number of genetic characteristics involved in each interaction.

During a workshop entitled “A multidisciplinary approach to epistasis detection,” held at the Lorentz Center in The Netherlands in July 2023, 41 experts on epistasis detection from a variety of fields came together. Through interactions and discussions, we identified challenges that need to be addressed in order to advance epistasis detection. We consider the central combinatorial challenge of epistasis identification through two perspectives: statistical and mathematical approaches to case–control studies versus leveraging biological knowledge and models (Fig. 1). Each of the two perspectives is addressed through three subtopics. For the statistical and mathematical perspective, we start by reviewing specific problems with popular model assumptions and pose the question of whether it is possible to avoid assuming any mathematical form. Next, we discuss the potential of novel generative AI models for the analysis of case–control cohort data. Third, we show empirically the importance of accounting for population structure in case–control cohort studies, which unfortunately is often overlooked. In the second half of this review, we discuss biological observations of epistasis. We start with the idea that the search for epistasis should always start with biological models. Second, we discuss whether one should consider inter- and intragenic epistasis separately. Finally, we propose the use-case for a “database of epistasis” and provide guidelines for the characteristics that such a database should have.

What assumptions of epistasis are being made and what are their implications?

Epistasis is a natural expectation of a complex system, but the search for epistasis is challenging primarily due to the combinatorial explosion of possibilities. In this section, we delve into the assumptions, mathematical or otherwise, that underpin current methods of epistasis detection and posit the use of state-of-the-art machine learning approaches in a new generation of data-driven epistasis detection methods. Many existing approaches have been recently reviewed [8]; here we extend this analysis by considering the conceptual limitations of current works and more novel approaches.

Generalizing the functional form of epistasis

If epistasis is taken in its statistical sense as the deviation from the additive/linear baseline, then all other terms—namely quadratic and higher order interactions—are epistasis [9]. The relation between genotype and phenotype can then be represented as a function that maps a discrete sequence space onto one or more binary- or real-valued traits. Extending the formulation used in [10], a phenotype impacted by epistasis can be formulated mathematically as:

$$y=\sum_{a\in A}\beta_{\alpha\left(a\right)}\prod_{i\in\left(1,\cdots,N\right)}x_i^{a_i},$$

(1)

where $N$ is the total number of SNPs in the data, $x_i$ encode the SNP information (e.g., allelic dosage), $y$ symbolizes the phenotype, and

$$A:=\{a\in\{0,1\}^{N}: 1^{T}a\le d\}$$

with $d$ the order of the highest-order interaction. The parameter $d$ allows one to choose a maximum order for the epistatic interaction, which can be at most $N$. $\beta_{\alpha(a)}$ are the parameters to be estimated denoting the magnitude of the epistatic effect of the variants corresponding to the vector $a$, where $\alpha(a)$ is the index corresponding to the vector $a$ if one were to order all elements of $A$. The vector $a$ thus indicates which variants are included in the $\alpha(a)$-th interaction. For example, in the case of $N=3$ and $d=2$ this would give:

$$\begin{aligned} y &= \beta_{0}x_{1}^{0}x_{2}^{0}x_{3}^{0} + \beta_{1}x_{1}^{1}x_{2}^{0}x_{3}^{0} + \beta_{2}x_{1}^{0}x_{2}^{1}x_{3}^{0} + \beta_{3}x_{1}^{0}x_{2}^{0}x_{3}^{1} + \beta_{4}x_{1}^{1}x_{2}^{1}x_{3}^{0} + \beta_{5}x_{1}^{1}x_{2}^{0}x_{3}^{1} + \beta_{6}x_{1}^{0}x_{2}^{1}x_{3}^{1}\\ &= \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + \beta_{3}x_{3} + \beta_{4}x_{1}x_{2} + \beta_{5}x_{1}x_{3} + \beta_{6}x_{2}x_{3}.\end{aligned}$$

(2)

Note that since $d=2$, the interaction between all three variants is not included.

In other words, epistasis is the combined effect of any combination of SNPs up to a certain order. For binary traits, one can use a logit link, i.e., model the log-odds of the trait with the right-hand side of (1). Note that explicitly using formulation (1) leads to a combinatorial explosion in the number of terms, and hence of parameters to be estimated, as the number of SNPs and the degree $d$ increase; explicitly estimating the effect of all interaction terms is therefore infeasible.
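To make formulation (1) concrete, the following minimal sketch enumerates the index set $A$ as tuples of SNP indices and builds the corresponding matrix of product terms. The function names (`index_set`, `design_matrix`, `n_terms`) are hypothetical and chosen for illustration; they do not come from any of the cited tools.

```python
from itertools import combinations
from math import comb

import numpy as np


def index_set(n_snps: int, max_order: int) -> list[tuple[int, ...]]:
    """Enumerate A: every subset of SNP indices of size at most max_order."""
    return [
        subset
        for order in range(max_order + 1)
        for subset in combinations(range(n_snps), order)
    ]


def design_matrix(x: np.ndarray, terms: list[tuple[int, ...]]) -> np.ndarray:
    """One column per term: the product of the SNP columns in that term.

    The empty term () yields a column of ones, i.e. the intercept.
    """
    return np.column_stack([np.prod(x[:, list(t)], axis=1) for t in terms])


def n_terms(n_snps: int, max_order: int) -> int:
    """Number of parameters in formulation (1) without enumerating them."""
    return sum(comb(n_snps, k) for k in range(max_order + 1))


terms = index_set(n_snps=3, max_order=2)   # the N=3, d=2 example of Eq. (2)
dosages = np.array([[1, 2, 0], [2, 1, 1]])  # two individuals, three SNPs
features = design_matrix(dosages, terms)
```

For $N=3$ and $d=2$ this reproduces the seven terms of Eq. (2) in the same order. Calling `n_terms(1000, 2)` gives 500,501 parameters for only 1000 SNPs and pairwise interactions, which illustrates why exhaustive estimation quickly becomes infeasible.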

Any function can be represented as a series expansion; the Taylor and Fourier series are the most commonly used [11]. For modeling epistasis, the difference between the two representations is the reference frame: the Taylor series uses the wild type as the reference against which epistatic interactions are quantified, whereas in the Fourier series epistatic effects are averages over all backgrounds [12]. A wide body of literature suggests that many different mathematical formulations of epistasis can be linked using the weighted Walsh-Hadamard transform [12].
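As an illustration of the background-averaged (Fourier) view, the sketch below applies a plain, unweighted Walsh-Hadamard transform to a complete biallelic landscape. It is not the weighted transform of [12]; the genotype ordering, the sign convention, and the function names are assumptions made for this example, and other conventions differ by signs and scaling factors.

```python
import numpy as np


def walsh_hadamard_matrix(n_loci: int) -> np.ndarray:
    """Unnormalized 2^n x 2^n Walsh-Hadamard matrix built by Kronecker products."""
    base = np.array([[1.0, 1.0], [1.0, -1.0]])
    h = np.array([[1.0]])
    for _ in range(n_loci):
        h = np.kron(h, base)
    return h


def fourier_coefficients(phenotypes: np.ndarray) -> np.ndarray:
    """Background-averaged coefficients of a complete biallelic landscape.

    phenotypes[i] is the phenotype of the genotype whose 0/1 alleles are the
    binary digits of i. Coefficient j involves the loci set in the bits of j,
    so indices with two or more bits set are interaction (epistatic) terms.
    """
    n_genotypes = len(phenotypes)
    n_loci = n_genotypes.bit_length() - 1
    if 2 ** n_loci != n_genotypes:
        raise ValueError("need one phenotype per genotype (length 2^n)")
    return walsh_hadamard_matrix(n_loci) @ phenotypes / n_genotypes


# Two loci, genotypes ordered 00, 01, 10, 11.
additive = fourier_coefficients(np.array([0.0, 1.0, 1.0, 2.0]))
epistatic = fourier_coefficients(np.array([0.0, 1.0, 1.0, 3.0]))
```

In the first landscape the double mutant equals the sum of the single-mutant effects and the pairwise coefficient (index 3) is zero; in the second, the extra unit of phenotype in the double mutant shows up as a pairwise coefficient of 0.25.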

Models that identify epistatic interactions from genotype data nearly always make assumptions on the form of the epistatic relationship (Table 1). There is a gradient in current approaches of epistasis detection: from models assuming a specific form of epistasis (e.g., BOOST [13], BitEpi [14], Fiuncho [15], IRELAND [16, 17], MDR [17]) to models that learn an epistatic relationship of any form (e.g., [12, 18,19,20]). These assumptions are described in the paragraphs below.

Table 1 Table summarizing the assumptions made by various methodological approaches and tools for detecting epistasis


Approaches that assume a specific form of epistatic interaction, for example, pairwise, triplet, or quadruplet interactions, are often easier to understand and can provide directly interpretable outcomes. However, they still suffer from the combinatorial explosion if there is no constraint on the type and number of interacting variants. Consequently, several methods focus on two-way interactions only [13, 21, 22] while other more exhaustive search methods are limited by the computational complexity of the approach and often do not go beyond four-way interactions [14, 15, 23, 24]. Fiuncho and IRELAND do go beyond four-way interactions [14,15,16,17, 23, 24], though they are limited in the number of SNPs they can analyze simultaneously. Whether going beyond four-way interactions is clinically relevant and can be validated beyond statistical evidence remains an open question. It is, however, biologically plausible that many SNPs are involved in the same epistatic interaction [25, 26].

On the other hand, freeform approaches such as deep neural networks (DNNs) [19, 20, 27, 28] can approximate arbitrary functional relationships, as supported by mathematical theorems (the universal approximation theorem, e.g., [29, 30]); in theory, they therefore avoid the need to impose any assumption on the functional relationships driving epistasis. This makes them more flexible and less prone to computational limits. In practice, however, these approaches require additional steps, perhaps yet to be developed, to not only implicitly capture but also provide an explicit description of the epistatic interaction [18]. Because DNNs tend to use a large number of input variables for phenotype prediction, they arguably assume a highly polygenic or even omnigenic trait [29] in practice.

Many classical machine learning (ML) approaches sit between these two extremes, such as decision tree ensembles, i.e., random forest [31,32,33,34,35], boosting [36], and support vector machines [37,38,39,40]. These approaches make mild assumptions on the functional form of the epistatic interaction, allowing them to deal with higher-order interactions. For random forests, these assumptions include that SNPs forming interactions must be independent of each other. Random forests also assume that the relationship between epistasis and genetic variants can be described as a combination of decision trees, while support vector machines assume that one can separate cases from controls by using a hyperplane in the (transformed) variant space.

The standard formulations of epistasis presented above link genotype to phenotype by means of statistical models, machine learning approaches, or combinatorics, all based on large datasets. However, the joint probability structure of the dataset is never explicitly exploited during inference. Modeling and subsequent dissection of the joint probability between genotypic features, and between genotypic features and a phenotype of interest, offer another approach to study epistasis.

Using generative approaches to model and explore epistasis

Apart from the aforementioned classical ML approaches, recent algorithmic and computational advances have offered insight into the potential of deep generative models in genetics [41, 42]. Inspired by many successful applications in other scientific fields [43,44,45], we envision that leveraging generative approaches could offer a transformative approach to identify genetic interactions. Note that, just as for DNN-based classifiers, universal approximation theorems for probability distributions [46] support the utmost flexibility of generative deep learning approaches. A large part of the advantage of this approach rests on the ability to perturb a latent space representation of genetic interactions and make observations regarding the effect on phenotypes: in effect providing an experimental system with a tractable number of variables. We explain and explore this idea below.

A deep generative model aims to construct a condensed representation of the genetic information that accurately describes the distribution of genetic variance in the population from which observed genetic data is sampled. That is, the generative model learns how to represent an individual’s genetic information in a condensed manner with minimal loss of information, meaning that it can reconstruct the original genetic information from this condensed representation with high accuracy. Deep generative models are typically composed of three components: the encoder, the latent space, and the decoder (Fig. 2). Firstly, the encoder maps a sample to a space of much lower dimensionality. This so-called latent space offers an intriguing property: it is continuous, unlike the binary nature of genetic profiles (wild-type or mutated). Finally, the decoder takes a point within this latent space, whether it originates from the encoder or is chosen randomly, and reconstructs the corresponding sample. Note that this sample could be non-existent if the point in the latent space is chosen randomly. Deep generative models come in various styles [47,48,49], and the roles of the encoder and decoder may differ between them, but they all share these fundamental characteristics.

We foresee three kinds of applications that require the development of models with a disentangled latent space [48, 50]. Firstly, extrapolating from protein structure work, we expect that well-designed models will exhibit emergent information in the latent space [51]. In more concrete detail, using interpretable labels (such as diseased/non-diseased), one trains the encoder to map differently labeled data to distant parts of the latent space, while placing identically labeled data near to each other. Such training procedures can be implemented by the integration of contrastive loss functions [52] into the training of the encoder-decoder architecture. As per their definition, contrastive loss functions are exactly the drivers that keep similar things together, while keeping separate things apart when embedding data into latent space. As a result, the interpretability of the model obtained by an appropriately structured continuously valued latent space could be instrumental in increasing the power of standard analysis (Fig. 2, Interpretability). For example, linking parts of the latent space to known phenotypes (e.g., disease risk) could aid in identifying new disease-risk regions. This expansion of the available dataset would enhance the power of standard epistasis detection analyses. A second, more direct application of such a model involves using it as an “oracle” that provides quantitative insights into the perturbation caused by a pair of genetic alterations (Fig. 2, Perturbation). For instance, given two alleles A and B at different loci, one could measure the perturbation in the latent space induced by each mutation and compare it to the perturbation in the latent space caused by A and B combined. If the combined mutation leads to the same perturbation as the two individual mutations together, then there is no indication of epistasis; otherwise there is. Lastly, the model can be used in a more exploratory manner through the design of optimization routines (Fig. 2, Optimization). Using the decoder’s gradients enables the identification of genetic pairs that lead to maximal perturbations in the latent space, indicating interaction within these pairs and hence identifying potential epistatic interactions. These three directions, far from being exhaustive, showcase the potential of deep generative models in the detection of epistatic interactions. Since the use of these methods in genomic applications is still in its infancy but highly promising, extensive further research along these lines is necessary.
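The “oracle” idea can be sketched in a few lines. The example below assumes binary (0/1) genotype vectors and an already trained encoder that maps a genotype to a latent vector; `latent_interaction_score` and both toy encoders are hypothetical stand-ins written for illustration, not part of any published model.

```python
from typing import Callable

import numpy as np

Encoder = Callable[[np.ndarray], np.ndarray]


def latent_interaction_score(
    encode: Encoder, genotype: np.ndarray, locus_a: int, locus_b: int
) -> float:
    """Norm of the part of the joint latent shift not explained by the
    sum of the two single-mutation shifts (zero means no indication of
    epistasis under this criterion)."""

    def mutate(*loci: int) -> np.ndarray:
        g = genotype.copy()
        g[list(loci)] = 1 - g[list(loci)]  # flip the 0/1 allele at each locus
        return g

    z0 = encode(genotype)
    shift_a = encode(mutate(locus_a)) - z0
    shift_b = encode(mutate(locus_b)) - z0
    shift_ab = encode(mutate(locus_a, locus_b)) - z0
    return float(np.linalg.norm(shift_ab - (shift_a + shift_b)))


# Toy stand-ins for a trained encoder (assumptions for illustration only).
rng = np.random.default_rng(0)
weights = rng.normal(size=(2, 6))


def linear_encoder(g: np.ndarray) -> np.ndarray:
    return weights @ g


def interacting_encoder(g: np.ndarray) -> np.ndarray:
    z = weights @ g
    z[0] += 3.0 * g[1] * g[4]  # loci 1 and 4 interact by construction
    return z


wild_type = np.zeros(6, dtype=int)
score_linear = latent_interaction_score(linear_encoder, wild_type, 1, 4)
score_pair = latent_interaction_score(interacting_encoder, wild_type, 1, 4)
score_other = latent_interaction_score(interacting_encoder, wild_type, 0, 2)
```

With the linear encoder the single-mutation shifts add up exactly, so the score is zero for every pair. With the second encoder only the pair (1, 4) receives a non-zero score. In a real application the score would also reflect the approximation error of the trained model, so a threshold or a null distribution would be needed before calling an interaction.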

Population structure confounds regression-based epistasis detection

In addition to the assumptions on the form of the epistatic interaction, there are underlying assumptions of genetic data that should inform epistasis detection models, particularly linkage disequilibrium (LD). However, many epistasis detection datasets and tools fail to account for LD structures which means they will be particularly vulnerable to population mismatch. Here we include a detailed consideration of this failure within epistasis detection and how this could be addressed.

Events in human evolutionary history such as migration and admixture [53] are reflected in differences in allele frequencies (AFs) between different populations [54]. The concept of genetic populations is a simplified description of these genetic patterns [55]. The differences in AFs between populations are called population structure and have been described as a confounding factor in genome-wide association studies (GWAS) [56,57,58,59]. In a naive association test, the samples are modeled as independent, an assumption that cannot hold when there are such systematic genetic trends within the data. Epistasis analysis, similar to GWAS, is vulnerable to confounding from population structure, which if uncorrected, can result in substantial p value inflation and false positives in analyses with no true epistatic interactions (Fig. 3). Furthermore, previous research has shown that a slight change in the AF of a SNP results in a substantial decrease in power to replicate the main effects of said SNP when there is an underlying epistatic model [60]. Detection and correction of population structure are thus of core importance to the study of epistasis. We propose that solutions to this problem can be informed by common practices from GWAS analysis.

In GWAS, there are two main approaches to correcting for population structure. The first includes principal components of genetic similarity as additional covariates in a linear model. Principal component analysis (PCA) aims to explain the variance–covariance structure of a high-dimensional data set with a relatively small number of linear combinations of the original variables [61]. The first few principal components of genetic data often capture population structure and are suitable covariates for correcting this source of confounding [62]. The second includes a random effect that is informed by the genetic covariance between samples in a linear mixed model (LMM) approach. By including a random effect that covaries with the genetic similarity, the samples are no longer modeled as independent. This method, while computationally more costly, is able to account for population structure without overfitting; in the presence of cryptic relatedness LMMs outperform principal component-based correction methods [57, 63].

Analogous methods to those used in GWAS have been adopted to account for population structure in some epistasis detection approaches, including methods adopting PCA correction (e.g., MBMDR-PC [64]) and LMMs (e.g., REMMA-epi [65] and FaST-LMM-epi [66, 67]). Indeed, LMM approaches have been shown to produce significantly lower statistical inflation than PLINK’s pairwise epistasis method [65, 68]. Surprisingly, several commonly used methods for epistasis detection including PLINK epistasis and BOOST [13] do not have a built-in option for including covariates or otherwise correcting for population structure.

Our simulation analysis suggests that simply ignoring population structure in these cases is unwise and would lead to substantial statistical inflation and false positives (Fig. 3, Additional file 1: Supplementary Methods). Here we simulated traits with no true epistatic effects (only additive effects) in a structured population and performed PLINK epistasis detection to evaluate the impact of population structure on the resulting test statistics. We expect that if epistasis tests were inherently robust to confounding from population structure there would be no significant hits or p value inflation, as no epistasis was simulated. For population structure correction, the phenotype was adjusted using multiple linear regression on the first 20 PCs prior to analysis. The simulation code is open source at https://github.com/jdstamp/leiden_paper. QQ plots of our simulations show evidence of statistical inflation and large numbers of false positives only in the simulations with no correction for population structure (Fig. 3, “unadjusted” panels on right), indicating that population structure can confound epistasis detection methods, while analyses corrected for structure were well-controlled (Fig. 3, “PCA adjusted” panels on left). We thus recommend that researchers using methods without built-in population structure correction for epistasis analysis address population structure, for example, by first adjusting their phenotype using principal components (taking the residuals from a multiple regression), or alternatively using a suitable LMM approach.
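The phenotype adjustment recommended above can be sketched as follows. This is a minimal NumPy illustration on a toy two-group population, not the simulation code from the linked repository; the function names, the allele frequency shift, and the use of two rather than 20 PCs are assumptions made to keep the example small.

```python
import numpy as np


def top_principal_components(genotypes: np.ndarray, n_components: int) -> np.ndarray:
    """Leading principal-component scores of a samples x SNPs dosage matrix."""
    g = genotypes.astype(float)
    g = g[:, g.std(axis=0) > 0]               # drop monomorphic SNPs
    g = (g - g.mean(axis=0)) / g.std(axis=0)  # standardize each SNP
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def residualize(phenotype: np.ndarray, covariates: np.ndarray) -> np.ndarray:
    """Residuals of a multiple linear regression of phenotype on covariates."""
    design = np.column_stack([np.ones(len(phenotype)), covariates])
    beta, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    return phenotype - design @ beta


# Toy structured population: two groups with shifted allele frequencies and
# a shifted phenotype mean (all values are illustrative assumptions).
rng = np.random.default_rng(1)
n_per_group, n_snps = 100, 500
base_freq = rng.uniform(0.2, 0.8, size=n_snps)
group = np.repeat([0, 1], n_per_group)
freqs = np.where(group[:, None] == 0, base_freq + 0.1, base_freq - 0.1)
genotypes = rng.binomial(2, freqs)
phenotype = 2.0 * group + rng.normal(size=group.size)

pcs = top_principal_components(genotypes, n_components=2)
adjusted = residualize(phenotype, pcs)  # pass this to the epistasis tool
```

The adjusted phenotype is, by construction, uncorrelated with the PCs used in the regression, so group-level differences captured by those PCs can no longer drive spurious interaction signals. The number of PCs to include is a choice that depends on the cohort.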

Leveraging biology in the search for epistasis

The first part of this review focused on using statistical and mathematical approaches to identify epistasis from data. In this second part, we focus on if and how biological information can be leveraged to look for epistasis. We pose two questions surrounding the use of model systems and intergenic versus intragenic mechanisms in the search for epistasis, followed by a discussion on the usefulness of a database of epistasis.

Should a search for epistasis start with biological observations?

We assert that conclusive evidence for the role of epistasis in determining disease heritability has come from interactions that have been identified in large case–control cohorts and model systems such as cell lines and organisms. We hypothesize that true epistatic interactions will be observable in both model systems and case–control cohorts. However, false positive epistatic interactions may be more likely in case–control datasets where, for example, population structure is imperfectly matched. On the other hand, the potential challenge with model systems is knowing whether the readout and the cell/tissue context are a correct approximation of disease. Indeed, a model system may suggest an epistatic interaction which is specific to the genetic background of the model organism and may not be important in an outbred population.

Many double mutant genetic knockout screens have been performed, both in model organisms and human cell lines [69]. The problem with the use of organisms for epistasis detection is, again, the size of the search space. Extensive large-scale screens have been performed in yeast, focusing on large-scale characterization of cells with combinations of two or more individual gene knockouts or temperature-sensitive alleles [2, 70, 71]. Such screens have proved useful, for example, in delineating biological pathways containing genes with similar interaction profiles. However, they have not, as yet, provided the scale necessary for an exhaustive search for epistasis. Indeed, higher organisms such as mice are not at all suitable for large-scale screens due to the practical undertaking involved in exhaustive characterization.

Cells are more tractable than whole organisms. In mammalian cells, many double mutant screens have been performed, mostly in the context of cancer where gene knockouts have important therapeutic implications since gene knockouts can represent drug-targeting conditions. For example, in one recent study double mutant screens were performed for ~34,937 gene pairs in MCF-10A breast cell lines, and their effects on tumor growth were examined in mice [72]. Statistically significant gene pairs were identified and grouped into interacting modules. Interestingly, the genes within a group exhibited epistatic effects on gene expression of other group members. Overall, this study revealed the gene interaction network of tumor growth and has important implications for therapeutic strategies. Another recent work examined 1191 putative functional gene pairs and/or paralogs in human melanoma lines and identified 109 pairs that affected fitness [73]. An important consideration in a biological experiment is context: for example, epistatic interactions may only be apparent in a specific environment. When that environment is the presence of a particular toxin or therapeutic, this observation can be used to identify epistatic interactions which have the potential to guide personalized medicine [74].

An important and well understood biological consideration is the separation between intergenic and intragenic mechanisms of epistasis. There is good evidence for intragenic interactions such as haplotypes associating with altered gene expression depleted for deleterious coding alleles [75]. Similarly, there is evidence for intergenic epistasis, particularly between genes of similar function [76], where an established example concerns mutations within different hemoglobin beta-chains [77]. Intergenic epistasis across different functional pathways also tends to be the result of compensatory adaptation [78] and typically genes within the same pathway have a similar interaction profile [70]. The problem with separating intergenic and intragenic epistasis is that both are combined in real-world biological systems. To circumvent this, we suggest a stepwise approach where intergenic epistasis is analyzed and identified before searching for intragenic epistasis within each intergenic interaction.

An alternative framing is that the search for epistasis should prioritize variants where biological evidence for an effect is provided by a case–control cohort instead of a model system. If there is an epistatic interaction, we might expect to be able to measure the association between phenotype and genotype even with only one of the involved variants. Intuitively, this will depend on the frequency of the alleles in question, the size of the study population, and the effect size of the genetic variants. If true, then we should be able to use prioritization methods based on independent models (i.e., an additive model) to reduce the search space size for epistasis. One study has already applied this principle [21] where the search for epistasis was focused on additional genetic variants which increased the effect of another, significant in isolation, genetic variant. Their results are promising, showing that in several datasets this method outperforms GBOOST and Lasso. A future extension for this approach might be to use symbolic regression [79, 80] to detect epistasis between genetic variants with nominal significance and to determine the mathematical formulation of the relationship without a pre-specification.
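The search-space reduction described above can be sketched as a two-stage procedure: rank SNPs by their marginal association, then test interactions only among the top-ranked SNPs. The code below is a generic illustration under simple assumptions (quantitative trait, least-squares fits, a hypothetical `keep` cutoff); it is not the method of [21].

```python
from itertools import combinations

import numpy as np


def marginal_scores(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Absolute Pearson correlation of each SNP column with the phenotype."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    return np.abs(xc.T @ yc) / (np.linalg.norm(xc, axis=0) * np.linalg.norm(yc))


def interaction_gain(x_i: np.ndarray, x_j: np.ndarray, y: np.ndarray) -> float:
    """Drop in residual sum of squares when the product term is added."""

    def rss(design: np.ndarray) -> float:
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        return float(np.sum((y - design @ beta) ** 2))

    additive = np.column_stack([np.ones(len(y)), x_i, x_j])
    full = np.column_stack([additive, x_i * x_j])
    return rss(additive) - rss(full)


def prescreened_pairs(x: np.ndarray, y: np.ndarray, keep: int) -> list[tuple[float, int, int]]:
    """Rank pairs among the `keep` SNPs with the strongest marginal signal."""
    top = np.argsort(marginal_scores(x, y))[::-1][:keep]
    pairs = combinations(sorted(top.tolist()), 2)
    return sorted(
        ((interaction_gain(x[:, i], x[:, j], y), i, j) for i, j in pairs),
        reverse=True,
    )


# Toy data: 30 SNPs, with SNPs 3 and 7 interacting (illustrative values).
rng = np.random.default_rng(2)
x = rng.binomial(2, 0.5, size=(500, 30)).astype(float)
y = x[:, 3] + x[:, 7] + 2.0 * x[:, 3] * x[:, 7] + rng.normal(size=500)
ranked = prescreened_pairs(x, y, keep=5)
```

With `keep=5`, only 10 of the 435 possible pairs are tested. The obvious limitation, noted in the text, is that an interaction whose variants show no marginal signal at the available sample size will be filtered out before it can be tested.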

A database of epistasis

An obvious approach to leveraging known information in the search for epistasis is to use the literature of known epistatic interactions. There have been efforts to collect large amounts of epistasis data in one platform. For example, using pre-selected gene-specific transcription factors in Saccharomyces cerevisiae [3], SynLetDB provides a database specifically for synthetic lethality cases. However, most studies report genome-wide epistasis that is restricted to one organism or is phenotype-specific, such as for Alzheimer’s disease [81, 82] or cancer [83], and their results are not collated and standardized into a single database. A database that does exist for epistasis across multiple diseases is driven by a single methodological approach (https://epistasis-disease-atlas.com, [84]) and is therefore limited in the types of epistatic interactions it contains. Furthermore, it remains difficult for researchers to reuse epistasis data because of the different definitions of epistasis, and the different experimental and computational techniques used to identify epistasis.

We argue that although studies on epistasis can be highly diverse, there is a core set of data and metadata that can and should be reported for all studies to be able to effectively leverage known epistatic interactions. This set of minimum reporting standards could be based on other guidelines available such as MIAME/MINSEQE [85] that are used in sequence-based platforms, for example, the EGA (https://web2.ega-archive.org/) or GEO (www.ncbi.nlm.nih.gov/geo/), and should include metadata per study, per sample, and per interaction, as outlined in Table 2. If all studies publishing epistasis information adhere to the same set of minimal reporting standards, referencing and using known epistasis would become much simpler. Additionally, it would facilitate the collection of epistasis information into one large database, which would benefit many researchers in the field of epistasis. Because epistasis is such a complex phenomenon, generating a database would be helpful to explore available data or validate new results (Fig. 4B). Additionally, a database could identify genes of interest, or other studies using a specific method of epistasis detection.

Table 2 The set of minimum reporting standards for a database of epistasis


It is important to consider which types of researchers may use a database of epistasis. For example, researchers studying a certain gene or SNP may want to query that gene/SNP to get an overview of potential epistatic interactions, while studies identifying epistatic interactions may query specific interactions to validate their findings, or search for orthogonal sources of validation, e.g., knock-out studies for specific interactions demonstrating a phenotypic effect or reduced expression level at the mRNA or protein level. On the other hand, researchers focusing on a specific disease or other phenotype may query that phenotype to find any associated interactions.
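The three query patterns described above (by genetic unit, by specific interaction, and by phenotype) can be sketched over a small in-memory collection. This is a hedged illustration: the `Interaction` record, its fields, and the function names are hypothetical, and a real database would sit behind a proper query layer.

```python
from typing import List, NamedTuple


class Interaction(NamedTuple):
    unit_a: str      # gene or SNP identifier (hypothetical naming)
    unit_b: str
    phenotype: str
    source: str      # study identifier, kept for provenance


def by_unit(records: List[Interaction], unit: str) -> List[Interaction]:
    """All interactions that involve a given gene or SNP."""
    return [r for r in records if unit in (r.unit_a, r.unit_b)]


def by_pair(records: List[Interaction], a: str, b: str) -> List[Interaction]:
    """All reports of one specific interaction, ignoring the order of the pair."""
    pair = frozenset((a, b))
    return [r for r in records if frozenset((r.unit_a, r.unit_b)) == pair]


def by_phenotype(records: List[Interaction], phenotype: str) -> List[Interaction]:
    """All interactions reported for a phenotype (case-insensitive match)."""
    wanted = phenotype.casefold()
    return [r for r in records if r.phenotype.casefold() == wanted]


# Made-up example records; identifiers are placeholders, not real findings.
records = [
    Interaction("GENE_A", "GENE_B", "Phenotype X", "study-1"),
    Interaction("GENE_B", "GENE_A", "Phenotype X", "study-2"),
    Interaction("GENE_A", "GENE_C", "Phenotype Y", "study-3"),
]
```

Treating the pair as unordered in `by_pair` is a design choice for this sketch: it lets two studies that list the same interaction in opposite order be retrieved together, which matters when the query is used for validation.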

However, because epistasis is such a diverse phenomenon, creating a comprehensive database to capture this information poses several challenges. For example, a database would ideally contain curated positive (i.e., epistatic interaction occurs) and negative (i.e., epistatic interaction does not occur) cases. Yet computational methods use a range of measures of confidence for epistatic interactions (e.g., statistical significance (p value/FDR), feature weight, knock-out experiments), so defining and standardizing positive and negative epistatic interactions is not straightforward. Likewise, these computational methods use different approaches (e.g., neural nets, random forests, regression) and thus different metrics to detect epistasis, which may not be directly comparable (Fig. 4A). Hence, the provenance of each epistatic interaction, as suggested in Table 2, is essential to filter data and to find epistatic interactions with multiple sources of evidence.
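One way to use provenance without comparing incompatible scores is to count how many distinct methods report the same interaction as positive. The sketch below assumes a hypothetical `Evidence` record and a hypothetical `multiply_supported` function; it deliberately never compares the raw confidence values across methods.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Evidence(NamedTuple):
    unit_a: str
    unit_b: str
    method: str     # e.g. "regression", "random forest", "knock-out"
    measure: str    # e.g. "p_value", "FDR", "feature_weight"
    value: float    # stored for provenance, not compared across methods
    positive: bool  # the study's own call: interaction present or absent


def multiply_supported(
    evidence: List[Evidence], min_methods: int = 2
) -> Dict[Tuple[str, ...], List[str]]:
    """Return pairs reported as positive by at least `min_methods` distinct methods."""
    methods_by_pair = defaultdict(set)
    for e in evidence:
        if e.positive:
            # The pair is unordered, so (A, B) and (B, A) are pooled.
            methods_by_pair[frozenset((e.unit_a, e.unit_b))].add(e.method)
    return {
        tuple(sorted(pair)): sorted(methods)
        for pair, methods in methods_by_pair.items()
        if len(methods) >= min_methods
    }
```

Counting distinct methods rather than distinct records means that two regression analyses of the same pair count as a single source of evidence here; whether that is the right notion of independence is itself an assumption that a curated database would need to settle.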

Conclusion

The search for epistatic interactions that influence disease heritability is challenging but essential to facilitate effective personalized medicine. We have outlined some of the challenges, particularly the intractability of modeling epistasis. While we have suggested some modeling approaches that may be fruitful in the future, we have also considered practical steps that could be used to improve models by integrating known biology. In particular, we suggest that all future models of epistasis should include an explicit correction for population structure, and we show the importance of this through simulation. We make the case for a database of epistasis to bring together what is already known, in the hope that this is a firmer foundation for discovery than strict model definitions, which may or may not be representative. Generative models may offer a way to summarize these biological observations in a meaningful fashion.

Data availability

The code for the simulated data is available on GitHub [86] and Zenodo [87] under an MIT license.

Change history

20 January 2025
A Correction to this paper has been published: https://doi.org/10.1186/s13059-025-03477-x

References

1. Verweij KJH, Yang J, Lahti J, Veijola J, Hintsanen M, Pulkki-Råback L, et al. Maintenance of genetic variation in human personality: testing evolutionary models by estimating heritability due to common causal variants and investigating the effect of distant inbreeding. Evolution. 2012;66:3238–51.
2. Segrè D, Deluna A, Church GM, Kishony R. Modular epistasis in yeast metabolism. Nat Genet. 2005;37:77–83.
3. Sameith K, Amini S, Groot Koerkamp MJA, van Leenen D, Brok M, Brabers N, et al. A high-resolution gene expression atlas of epistasis between gene-specific transcription factors exposes potential mechanisms for genetic interactions. BMC Biol. 2015;13:112.
4. Polderman TJC, Benyamin B, de Leeuw CA, Sullivan PF, van Bochoven A, Visscher PM, et al. Meta-analysis of the heritability of human traits based on fifty years of twin studies. Nat Genet. 2015;47:702–9.
5. Patel RA, Musharoff SA, Spence JP, Pimentel H, Tcheandjieu C, Mostafavi H, et al. Genetic interactions drive heterogeneity in causal variant effect sizes for gene expression and complex traits. Am J Hum Genet. 2022;109:1286–97.
6. Li J, Li X, Zhang S, Snyder M. Gene-environment interaction in the era of precision medicine. Cell. 2019;177:38–44.
7. Mackay TF, Moore JH. Why epistasis is important for tackling complex human disease genetics. Genome Med. 2014;6:124.
8. Russ D, Williams JA, Cardoso VR, Bravo-Merodio L, Pendleton SC, Aziz F, et al. Evaluating the detection ability of a range of epistasis detection methods on simulated data for pure and impure epistatic models. PLoS ONE. 2022;17:e0263390.
9. Mäki-Tanila A, Hill WG. Influence of gene interaction on complex trait variation with multilocus models. Genetics. 2014;198:355–67.
10. Cordell HJ. Epistasis: what it means, what it doesn’t mean, and statistical methods to detect it in humans. Hum Mol Genet. 2002;11:2463–8.
11. Epistasis and evolution. Evolutionary biology. Oxford University Press; 2021. Available from: https://oxfordbibliographies.com/view/document/obo-9780199941728/obo-9780199941728-0137.xml.
12. Poelwijk FJ, Krishna V, Ranganathan R. The context-dependence of mutations: a linkage of formalisms. PLoS Comput Biol. 2016;12:e1004771.
13. Wan X, Yang C, Yang Q, Xue H, Fan X, Tang NLS, et al. BOOST: a fast approach to detecting gene-gene interactions in genome-wide case-control studies. Am J Hum Genet. 2010;87:325–40.
14. Bayat A, Hosking B, Jain Y, Hosking C, Kodikara M, Reti D, et al. Fast and accurate exhaustive higher-order epistasis search with BitEpi. Sci Rep. 2021;11:1–12.
15. Ponte-Fernández C, González-Domínguez J, Martín MJ. Fiuncho: a program for any-order epistasis detection in CPU clusters. J Supercomput. 2022;78:15338–57.
16. Balvert M. Iterative rule extension for logic analysis of data: an MILP-based heuristic to derive interpretable binary classifiers from large data sets. INFORMS J Comput. 2024. Available from: https://doi.org/10.1287/ijoc.2021.0284.
17. Pattin KA, White BC, Barney N, Gui J, Nelson HH, Kelsey KT, et al. A computationally efficient hypothesis testing method for epistasis analysis using multifactor dimensionality reduction. Genet Epidemiol. 2009;33:87–94.
18. Aghazadeh A, Nisonoff H, Ocal O, Brookes DH, Huang Y, Koyluoglu OO, et al. Epistatic Net allows the sparse spectral regularization of deep neural networks for inferring fitness functions. Nat Commun. 2021;12:5225.
19. Motsinger-Reif AA, Fanelli TJ, Davis AC, Ritchie MD. Power of grammatical evolution neural networks to detect gene-gene interactions in the presence of error. BMC Res Notes. 2008;1:65.
20. Li X, Liu L, Zhou J, Wang C. Heterogeneity analysis and diagnosis of complex diseases based on deep learning method. Sci Rep. 2018;8:6155.
21. Slim L, Chatelain C, Azencott C-A, Vert J-P. Novel methods for epistasis detection in genome-wide association studies. PLoS ONE. 2020;15:e0242927.
22. Chang YC, Wu JT, Hong MY, Tung YA, Hsieh PH, Yee SW, et al. GenEpi: gene-based epistasis discovery using machine learning. BMC Bioinformatics. 2020;21:68.
23. Knijnenburg TA, Klau GW, Iorio F, Garnett MJ, McDermott U, Shmulevich I, et al. Logic models to predict continuous outputs based on binary inputs with an application to personalized cancer therapy. Sci Rep. 2016;6:1–14.
24. Sun Y, Gu Y, Ren Q, Li Y, Shang J, Liu JX, et al. MDSN: a module detection method for identifying high-order epistatic interactions. Genes. 2022;13. Available from: https://doi.org/10.3390/genes13122403.
25. Weinreich DM, Lan Y, Jaffe J, Heckendorn RB. The influence of higher-order epistasis on biological fitness landscape topography. J Stat Phys. 2018;172:208–25.
26. Weinreich DM, Lan Y, Wylie CS, Heckendorn RB. Should evolutionary geneticists worry about higher-order epistasis? Curr Opin Genet Dev. 2013;23:700–7.
27. Beam AL, Motsinger-Reif A, Doyle J. Bayesian neural networks for detecting epistasis in genetic association studies. BMC Bioinformatics. 2014;15:368.
28. Cui T, El Mekkaoui K, Reinvall J, Havulinna AS, Marttinen P, Kaski S. Gene-gene interaction detection with deep learning. Commun Biol. 2022;5:1238.
29. Cybenko G. Approximation by superpositions of a sigmoidal function. Math Control Signals Systems. 1992;5:455–455.
30. Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Netw. 1989;2:359–66.
31. Lunetta KL, Hayward LB, Segal J, Van Eerdewegh P. Screening large-scale association study data: exploiting interactions using random forests. BMC Genet. 2004;5:32.
32. Jiang R, Tang W, Wu X, Fu W. A random forest approach to the detection of epistatic interactions in case-control studies. BMC Bioinformatics. 2009;10(Suppl 1):S65.
33. Yoshida M, Koike A. SNPInterForest: a new method for detecting epistatic interactions. BMC Bioinformatics. 2011;12:469.
34. Botta V, Louppe G, Geurts P, Wehenkel L. Exploiting SNP correlations within random forest for genome-wide association studies. PLoS ONE. 2014;9:e93379.
35. Holliday JA, Wang T, Aitken S. Predicting adaptive phenotypes from multilocus genotypes in Sitka spruce (Picea sitchensis) using random forest. G3. 2012;2:1085–93.
36. Li J, Horstman B, Chen Y. Detecting epistatic effects in association studies at a genomic level based on an ensemble approach. Bioinformatics. 2011;27:i222–9.
37. Guyon I, Weston J, Barnhill S, Vapnik V. Gene selection for cancer classification using support vector machines. Mach Learn. 2002;46:389–422.
38. Chen S-H, Sun J, Dimitrov L, Turner AR, Adams TS, Meyers DA, et al. A support vector machine approach for detecting gene-gene interaction. Genet Epidemiol. 2008;32:152–67.
39. Shen Y, Liu Z, Ott J. Support vector machines with L1 penalty for detecting gene-gene interactions. Int J Data Min Bioinform. 2012;6:463–70.
40. Saha S, Perrin L, Röder L, Brun C, Spinelli L. Epi-MEIF: detecting higher order epistatic interactions for complex traits using mixed effect conditional inference forests. Nucleic Acids Res. 2022;50:e114.
41. Cheng J, Novati G, Pan J, Bycroft C, Žemgulytė A, Applebaum T, et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science. 2023;381:eadg7492.
42. Frazer J, Notin P, Dias M, Gomez A, Min JK, Brock K, et al. Disease variant prediction with deep generative models of evolutionary data. Nature. 2021;599:91–5.
43. Isacchini G, Walczak AM, Mora T, Nourmohammad A. Deep generative selection models of T and B cell receptor repertoires with soNNia. Proc Natl Acad Sci U S A. 2021;118. Available from: https://doi.org/10.1073/pnas.2023141118.
44. Lopez R, Regier J, Cole MB, Jordan MI, Yosef N. Deep generative modeling for single-cell transcriptomics. Nat Methods. 2018;15:1053–8.
45. Samanta B, De A, Jana G, Gomez V, Chattaraj PK, Ganguly N, et al. NEVAE: a deep generative model for molecular graphs. J Mach Learn Res. 2020;21:4556–88.
46. Lu Y, Lu J. A universal approximation theorem of deep neural networks for expressing probability distributions. arXiv [cs.LG]. 2020. Available from: http://arxiv.org/abs/2004.08867.
47. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial nets. Adv Neural Inf Process Syst. 2014;27. Available from: https://proceedings.neurips.cc/paper_files/paper/2014/hash/5ca3e9b122f61f8f06494c97b1afccf3-Abstract.html.
48. Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. Adv Neural Inf Process Syst. 2020;33:6840–51.
49. Kingma DP, Welling M. Auto-encoding variational Bayes. arXiv [stat.ML]. 2013. Available from: http://arxiv.org/abs/1312.6114v11.
50. Liu K, Cao G, Zhou F, Liu B, Duan J, Qiu G. Towards disentangling latent space for unsupervised semantic face editing. IEEE Trans Image Process. 2022;31:1475–89.
51. Rives A, Meier J, Sercu T, Goyal S, Lin Z, Liu J, et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc Natl Acad Sci U S A. 2021;118. https://doi.org/10.1073/pnas.2016239118.
52. Schroff F, Kalenichenko D, Philbin J. FaceNet: a unified embedding for face recognition and clustering. 2015 IEEE conference on computer vision and pattern recognition (CVPR). IEEE; 2015. https://doi.org/10.1109/cvpr.2015.7298682.
53. Nielsen R, Akey JM, Jakobsson M, Pritchard JK, Tishkoff S, Willerslev E. Tracing the peopling of the world through genomics. Nature. 2017;541:302–10.
54. 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature. 2015;526:68–74.
55. Coop G. Genetic similarity versus genetic ancestry groups as sample descriptors in human genetics. arXiv [q-bio.PE]. 2022. Available from: http://arxiv.org/abs/2207.11595
56. Bhatia G, Furlotte NA, Loh PR, Liu X, Finucane HK, Gusev A, et al. Correcting subtle stratification in summary association statistics. bioRxiv. 2016. p. 076133. Available from: https://www.biorxiv.org/content/10.1101/076133v1. [cited 2024 Feb 9].
57. Sul JH, Martin LS, Eskin E. Population structure in genetic studies: confounding factors and mixed models. PLoS Genet. 2018;14:e1007309.
58. Zhou X, Stephens M. Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 2012;44:821–4.
59. Hellwege JN, Keaton JM, Giri A, Gao X, Velez Edwards DR, Edwards TL. Population stratification in genetic association studies. Curr Protoc Hum Genet. 2017;95:1.22.1-1.22.23.
60. Greene CS, Penrod NM, Williams SM, Moore JH. Failure to replicate a genetic association may provide important clues about genetic architecture. PLoS ONE. 2009;4:e5639.
61. Johnson RA, Wichern DW. Applied multivariate statistical analysis. London: Pearson Prentice Hall; 2007.
62. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38:904–9.
63. Yao Y, Ochoa A. Limitations of principal components in quantitative genetic association models for human studies. Elife. 2023;12. Available from: https://doi.org/10.7554/eLife.79238.
64. Abegaz F, Van Lishout F, Mahachie John JM, Chiachoompu K, Bhardwaj A, Duroux D, et al. Performance of model-based multifactor dimensionality reduction methods for epistasis detection by controlling population structure. BioData Min. 2021;14:16.
65. Ning C, Wang D, Kang H, Mrode R, Zhou L, Xu S, et al. A rapid epistatic mixed-model association analysis by linear retransformations of genomic estimated values. Bioinformatics. 2018;34:1817–25.
66. Lippert C, Listgarten J, Liu Y, Kadie CM, Davidson RI, Heckerman D. FaST linear mixed models for genome-wide association studies. Nat Methods. 2011;8:833–5.
67. Lippert C, Listgarten J, Davidson RI, Baxter S, Poon H, Kadie CM, et al. An exhaustive epistatic SNP association analysis on expanded Wellcome Trust data. Sci Rep. 2013;3:1099.
68. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7.
69. Mackay TFC. Epistasis and quantitative traits: using model organisms to study gene-gene interactions. Nat Rev Genet. 2014;15:22–33.
70. Kuzmin E, VanderSluis B, Wang W, Tan G, Deshpande R, Chen Y, et al. Systematic analysis of complex genetic interactions. Science. 2018;360. Available from: https://doi.org/10.1126/science.aao1729.
71. Costanzo M, VanderSluis B, Koch EN, Baryshnikova A, Pons C, Tan G, et al. A global genetic interaction network maps a wiring diagram of cellular function. Science. 2016;353:aaf1420–aaf1420.
72. Zhao X, Li J, Liu Z, Powers S. Combinatorial CRISPR/Cas9 screening reveals epistatic networks of interacting tumor suppressor genes and therapeutic targets in human breast cancer. Cancer Res. 2021;81:6090–105.
73. Thompson NA, Ranzani M, van der Weyden L, Iyer V, Offord V, Droop A, et al. Combinatorial CRISPR screen identifies fitness effects of gene paralogues. Nat Commun. 2021;12:1302.
74. Han K, Jeng EE, Hess GT, Morgens DW, Li A, Bassik MC. Synergistic drug combinations for cancer identified in a CRISPR screen for pairwise genetic interactions. Nat Biotechnol. 2017;35:463–74.
75. Cisneros AF, Gagnon-Arsenault I, Dubé AK, Després PC, Kumar P, Lafontaine K, et al. Epistasis between promoter activity and coding mutations shapes gene evolvability. Sci Adv. 2023;9:eadd9109.
76. Mapping the genetic landscape of human cells. Available from: https://www.cell.com/cell/pdf/S0092-8674(18)30735-9.pdf.
77. Tufts DM, Natarajan C, Revsbech IG, Projecto-Garcia J, Hoffmann FG, Weber RE, et al. Epistasis constrains mutational pathways of hemoglobin adaptation in high-altitude pikas. Mol Biol Evol. 2015;32:287–98.
78. Rojas Echenique JI, Kryazhimskiy S, Nguyen Ba AN, Desai MM. Modular epistasis and the compensatory evolution of gene deletion mutants. PLoS Genet. 2019;15:e1007958.
79. Schmidt M, Lipson H. Distilling free-form natural laws from experimental data. Science. 2009;324:81–5.
80. Vladislavleva EY. Model-based problem solving through symbolic regression via Pareto genetic programming. CentER: Tilburg University; 2008.
81. Lundberg M, Sng LMF, Szul P, Dunne R, Bayat A, Burnham SC, et al. Novel Alzheimer’s disease genes and epistasis identified using machine learning GWAS platform. Sci Rep. 2023;13:17662.
82. Wang H, Bennett DA, De Jager PL, Zhang QY, Zhang HY. Genome-wide epistasis analysis for Alzheimer’s disease and implications for genetic risk prediction. Alzheimers Res Ther. 2021;13:55.
83. Park S, Lehner B. Cancer type-dependent genetic interactions between cancer driver alterations indicate plasticity of epistasis across cell types. Mol Syst Biol. 2015;11:824.
84. Hoffmann M, Poschenrieder JM, Incudini M, Baier S, Fritz A, Maier A, et al. Network medicine-based epistasis detection in complex diseases: ready for quantum computing. Nucleic Acids Res. 2024;52:10144–60.
85. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, et al. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet. 2001;29:365–71.
86. Balvert M, Cooper-Knock J, Stamp J, Byrne RP, Mourragui S, van Gils J, et al. Population structure confounds regression-based epistasis detection. 2024. GitHub. https://github.com/jdstamp/leiden_paper.
87. Balvert M, Cooper-Knock J, Stamp J, Byrne RP, Mourragui S, van Gils J, et al. Population structure confounds regression-based epistasis detection. 2024. Zenodo. https://doi.org/10.5281/zenodo.13940750.
88. Browning SR, Browning BL, Daviglus ML, Durazo-Arvizu RA, Schneiderman N, Kaplan RC, et al. Ancestry-specific recent effective population size in the Americas. PLoS Genet. 2018;14(5):e1007385. https://doi.org/10.1371/journal.pgen.1007385.
89. Gravel S, Henn BM, Gutenkunst RN, Indap AR, Marth GT, Clark AG, et al. Proc Natl Acad Sci U S A. 2011;108(29):11983–8. https://doi.org/10.1073/pnas.1019276108.


Acknowledgements

This work is the result of fruitful discussions that took place at the workshop “A multidisciplinary approach to epistasis detection” in July 2023, in Leiden, the Netherlands. We thank the Lorentz Center for their support in facilitating and co-funding the workshop.

Additional members of the Lorentz workshop on epistasis

Name	Affiliation
Ammar Al-Chalabi	King’s College London
Jorge Avila Cartes	Università degli Studi di Milano-Bicocca
Jasmijn Baaijens	Delft University of Technology
Joanna von Berg	Princess Maxima Center for Pediatric Oncology
Davide Bolognini	Fondazione Human Technopole
Paola Bonizzoni	Università degli Studi di Milano-Bicocca
Andrea Guarracino	University of Tennessee
Mehmet Koyuturk	Case Western Reserve University
Magda Markowska	University of Warsaw
Johannes Schlüter	Bielefeld University
Raghuram Dandinasivara	Bielefeld University
Jasper van Bemmelen	Delft University of Technology
Sebastian Vorbrugg	Max Planck Institute for Biology
Sai Zhang	University of Florida
Bogdan Pasanuic	University of Pennsylvania

Peer review information

Andrew Cosgrove was the primary editor of this article at Genome Biology and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Review history

The review history is available as Additional file 2.

Funding

We also thank the Artificial Intelligence Journal, the Company of Biologists, the Netherlands Organization for Scientific Research (Veni grant VI.Veni.192.043), Bielefeld University, Ronin, and funding from the European Union’s Horizon 2020 research and innovation programme under Marie Skłodowska-Curie grant agreements No 956229 (ALPACA) and No 872539 (PANGAIA) for providing funding for the workshop. R.P.B. is supported by funding from the Motor Neurone Disease Association (Byrne/Oct22/979–799). J.C.K. was supported by the Wellcome Trust (216596/Z/19/Z). C.H./J.C.K. are supported by the MNDA (899–792). M.B. is supported by the Netherlands Organization for Scientific Research (Veni grant VI.Veni.192.043). M.P.S. is supported by the National Institutes of Health (CEGS 5P50HG00773504, 1P50HL083800, 1R01HL101388, 1R01-HL122939, S10OD025212, P30DK116074, and UM1HG009442).

Author information

Marleen Balvert and Johnathan Cooper-Knock contributed equally to this work.
Alexander Schönhuth, Letitia M. F. Sng, and Natalie A. Twine contributed equally to this work.

Authors and Affiliations

Tilburg University, Tilburg, The Netherlands
Marleen Balvert & Gizem Taş
SITraN, University of Sheffield, Sheffield, UK
Johnathan Cooper-Knock, Elham Alhathli & Calum Harvey
Brown University, Providence, USA
Julian Stamp
Smurfit Institute of Genetics, Trinity College Dublin, Dublin, Ireland
Ross P. Byrne
Hubrecht Institute, Utrecht, The Netherlands
Soufiane Mourragui
Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Juami van Gils
University of Oxford, Oxford, UK
Stefania Benonisdottir
Bielefeld University, Bielefeld, Germany
Johannes Schlüter, Luna Pianesi & Alexander Schönhuth
UMC Utrecht, Utrecht, The Netherlands
Kevin Kenna, Gizem Taş & Yan Wang
Department of Biostatistics and Health Informatics, King’s College London, London, UK
Alfredo Iacoangeli & Jiajing Hu
Department of Basic and Clinical Neuroscience, King’s College London, London, UK
Alfredo Iacoangeli
NIHR BRC SLAM NHS Foundation Trust, London, UK
Alfredo Iacoangeli
University of Washington, Seattle, USA
Brian L. Browning
Algorithmic Bioinformatics and Center for Digital Medicine, Heinrich Heine University, Düsseldorf, Germany
Sara C. Schulte
CITIC, University of A Coruña, A Coruña, Spain
Jorge González-Domínguez
University of Tennessee, Knoxville, USA
Erik Garrisson
Department of Genetics, Stanford University, Stanford, USA
Michael P. Snyder
Commonwealth Scientific and Industrial Research Organisation, Westmead, Australia
Letitia M. F. Sng & Natalie A. Twine
University of Iceland, Reykjavik, Iceland
Stefania Benonisdottir
Department of Epidemiology, University of Florida, Gainesville, FL, USA
Sai Zhang
Utrecht University, Utrecht, The Netherlands
Sanne Abeln & Joséphine T. Daub

Consortia

Lorentz workshop on epistasis

Ammar Al-Chalabi, Jorge Avila Cartes, Jasmijn Baaijens, Joanna von Berg, Davide Bolognini, Paola Bonizzoni, Andrea Guarracino, Mehmet Koyuturk, Magda Markowska, Johannes Schlüter, Raghuram Dandinasivara, Jasper van Bemmelen, Sebastian Vorbrugg, Sai Zhang & Bogdan Pasanuic

Contributions

All authors contributed to the writing of the manuscript. All authors read and approved the final version.

Corresponding authors

Correspondence to Marleen Balvert, Johnathan Cooper-Knock, Letitia M. F. Sng or Natalie A. Twine.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original version of this article was revised: Affiliations for authors Joséphine Daub and Sanne Abeln have been corrected.

Supplementary Information

13059_2024_3427_MOESM1_ESM.docx

Additional file 1. Supplementary methods for simulation of the effect of population structure on detection of epistasis, based on models in [88] and [89].

Additional file 2. Review history.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Balvert, M., Cooper-Knock, J., Stamp, J. et al. Considerations in the search for epistasis. Genome Biol 25, 296 (2024). https://doi.org/10.1186/s13059-024-03427-z

Received: 01 March 2024
Accepted: 23 October 2024
Published: 19 November 2024
DOI: https://doi.org/10.1186/s13059-024-03427-z
