Fig. 1
From: Systematic evaluation of methylation-based cell type deconvolution methods for plasma cell-free DNA

Schematic overview of the benchmark study design. WGBS data from 182 samples representing 35 cell types are first randomly split into two halves: one half is used to build the reference methylation atlas, and the other is used to generate in silico cfDNA samples. Ground-truth cell type proportions are generated from a uniform distribution, a Dirichlet distribution, or a constrained random distribution with blood cells as the primary cell types. Deconvolution performance is then assessed under several influencing factors, including reference marker selection, sequencing depth, and reference completeness. Five metrics are used to compare the predicted proportions (\(P_p\)) with the ground-truth proportions (\(P_g\)): root-mean-square error (RMSE), Pearson’s correlation coefficient, Spearman’s rank correlation coefficient, Lin’s concordance correlation coefficient (CCC), and Jensen–Shannon divergence (JSD). In addition, two real-world datasets comprising cfDNA samples from both patients and controls are used to evaluate performance in clinical applications. For these datasets, deconvolution uses a reference methylation atlas built from all 182 samples, and performance is evaluated with two metrics: (1) the statistical difference in the cfDNA fraction of affected tissues between diseased and healthy individuals, and (2) the ROC-AUC of machine learning models that detect disease from the estimated fractions of all cell types.