Fig. 1 | Genome Biology

Fig. 1

From: Response to "Neglecting normalization impact in semi-synthetic RNA-seq data simulation generates artificial false positives" and "Winsorization greatly reduces false positives by popular differential expression methods when analyzing human population samples"

Illustration of (a) the two strategies for generating semi-synthetic data from real data and (b) the generation of permuted data used in Fig. 1 of our published study [1]. (a) Permutation-based strategy and model-based strategy for generating semi-synthetic data. There are three schemes for the permutation-based strategy: scheme 1 (“permutation first”) generates semi-synthetic data by permuting the real count data, followed by normalization and then differential expression (DE) analysis (bottom-left); scheme 2 (“no normalization”) generates semi-synthetic data by permuting the real count data, followed by DE analysis directly, without normalization (top-left); scheme 3 (“normalization first”) generates semi-synthetic data by first normalizing the real data (bottom-middle), so that the values are no longer counts, followed by permutation and then DE analysis (bottom-right). In the model-based strategy (top-right), we fit a multi-gene negative binomial (NB) distribution to the real samples using the simulator scDesign3 [4]. For each true differentially expressed gene (DEG), we fit an NB distribution using the samples under each condition; for each true non-DEG, we fit one NB distribution by pooling the samples from both conditions; then, we generate semi-synthetic data by sampling from the fitted multi-gene NB distribution. (b) For the analysis in Fig. 1 of our published study [1], all genes are permuted and become true non-DEGs, followed by normalization and then DE analysis.
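
The three permutation-based schemes in panel (a) differ only in whether and where normalization happens relative to permutation. A minimal Python sketch of that ordering follows. It rests on assumptions that are not stated in the caption: permutation is taken to mean shuffling each gene's values independently across samples, normalization is taken to be a simple counts-per-million scaling, and all function names are hypothetical. The DE analysis step that follows each scheme is omitted.

```python
import numpy as np


def permute_per_gene(matrix, rng):
    """Shuffle each gene's (row's) values across samples, independently per gene.

    Assumption: this is one plausible reading of "permuting the real data".
    """
    return rng.permuted(matrix, axis=1)


def cpm(matrix):
    """Counts-per-million: scale every sample (column) to a total of 1e6.

    Assumption: a stand-in for whatever normalization a DE method applies.
    """
    library_sizes = matrix.sum(axis=0, keepdims=True)
    return matrix / library_sizes * 1e6


def scheme1_permutation_first(counts, rng):
    # Permute the raw counts, then normalize the permuted data.
    return cpm(permute_per_gene(counts, rng))


def scheme2_no_normalization(counts, rng):
    # Permute the raw counts and pass them on without normalization.
    return permute_per_gene(counts, rng)


def scheme3_normalization_first(counts, rng):
    # Normalize the real data (values are no longer counts), then permute.
    return permute_per_gene(cpm(counts), rng)


# Toy genes-by-samples count matrix; the last four samples are sequenced
# three times deeper, so library sizes differ between samples.
rng = np.random.default_rng(0)
depth = np.array([1, 1, 1, 1, 3, 3, 3, 3])
counts = rng.negative_binomial(5, 0.1, size=(200, 8)) * depth

s1 = scheme1_permutation_first(counts, np.random.default_rng(1))
s2 = scheme2_no_normalization(counts, np.random.default_rng(1))
s3 = scheme3_normalization_first(counts, np.random.default_rng(1))
```

Under these assumptions, scheme 1 returns samples that each sum to one million, scheme 2 returns the original integer counts in a new order, and scheme 3 returns the normalized values in a new order, so its samples are generally no longer on a common scale after permutation.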
