Fig. 1 | Genome Biology

Fig. 1

From: Neglecting the impact of normalization in semi-synthetic RNA-seq data simulations generates artificial false positives

Empirical FDR control against nominal FDR level. Results are averaged over 50 semi-synthetic datasets generated from the GTEx Heart atrial appendage vs. Heart left ventricle data. In each semi-synthetic dataset, fifty percent of the true differentially expressed (DE) genes are randomly sampled (i.e., 2889 genes remain unpermuted as true positives) and considered as gold-standard DE genes. Panel A reproduces the results from Li et al. [1] Fig. 2A when all methods are applied to the same data (first permuted to generate null gene expression and then normalized) on the full sample size (372 and 386 samples in the two groups, respectively). Panel B studies how the FDR control of the Wilcoxon test and of both the asymptotic and permutation tests from dearseq is affected by the sample size and by the order in which data normalization and the random permutations generating non-differentially expressed genes are applied. Of note, when dearseq is applied to non-normalized data, the heteroskedasticity weights it estimates should be interpreted with caution because observed values are then not comparable across samples.
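The quantities in this caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `make_semi_synthetic` and `empirical_fdr` are hypothetical, and the sketch assumes that null genes are generated by permuting each non-retained gene's values across samples independently, and that each method returns one adjusted p-value per gene. Normalization is left out on purpose, since whether it runs before or after the permutation step is exactly what Panel B varies.

```python
import numpy as np

def make_semi_synthetic(counts, true_de_idx, keep_fraction=0.5, seed=None):
    """Build one semi-synthetic dataset from a genes x samples matrix.

    Assumption: a random `keep_fraction` of the true DE genes is left
    untouched (gold standard); every other gene has its values permuted
    across samples, independently per gene, to make it null.
    """
    rng = np.random.default_rng(seed)
    n_keep = int(round(keep_fraction * len(true_de_idx)))
    kept = rng.choice(true_de_idx, size=n_keep, replace=False)

    is_gold = np.zeros(counts.shape[0], dtype=bool)
    is_gold[kept] = True

    out = counts.copy()
    for g in np.flatnonzero(~is_gold):
        out[g] = rng.permutation(counts[g])  # breaks any link with the groups
    return out, is_gold

def empirical_fdr(adj_pvalues, is_gold, alpha):
    """Share of genes called DE at level `alpha` that are not gold-standard DE."""
    called = adj_pvalues <= alpha
    n_called = int(called.sum())
    if n_called == 0:
        return 0.0  # convention: no discoveries means no false discoveries
    return float((called & ~is_gold).sum() / n_called)
```

Averaging `empirical_fdr` over repeated calls to `make_semi_synthetic` with different seeds, for a range of `alpha` values, would give one curve of the kind plotted against the nominal FDR level.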
