Fig. 1

Overview of data processing and model evaluation. A Schematic overview of the data preprocessing and evaluation pipeline used in this study. Cell type-specific and ubiquitous peak sequences were annotated, and models were evaluated independently in these two classes of genomic regions. Models were evaluated on both “reference accuracy” (the models’ ability to predict experimentally measured accessibility from the reference genome sequence) and “variant effect accuracy” (the models’ ability to predict allele-specific differences in accessibility). B The four previously published datasets used in subsequent analyses. The experimental assays and the number of chromatin accessibility profiles are shown for each dataset. Only chromatin accessibility profiles from ATAC-seq or DNase-seq are analyzed in this work. C For each of the four datasets, the majority of test set sequences are cell type-specific. Distributions are shown over test set sequences that had a peak in at least one chromatin accessibility profile in the dataset.