Fig. 1

Number of detected core genes (genes present in all (100%) input genomes) relative to the average number of genes (y-axis), plotted against the average POCP value (x-axis) for each dataset, tool, and sequence similarity threshold. Roary, Panaroo, and PPanGGOLiN were run with different sequence similarity thresholds, as shown in the legend. Each tool’s default sequence similarity threshold is printed in bold. Filled symbols represent genus-level datasets, while non-filled symbols represent species-level datasets. For example, all tools show similar results for the Chlamydia trachomatis species-level dataset, where the core gene set covers ~86% of the average gene count. For the Chlamydia genus-level dataset, however, the share of the average gene count covered by the core genes ranges from ~0% (Roary 95%, Panaroo 98%, PPanGGOLiN 95%) to ~83% (RIBAP). Note that this comparison includes only genes detected in all input genomes (no shell or cloud genes). In the supplement, we additionally show the results for genes present in 99%, 95%, and 90% of the input genomes (Additional file 1: Fig. S1 and Additional file 4: Table S3). RIBAP uses the Roary 95% sequence similarity results to refine the gene groups (Fig. 2).
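
As a rough illustration of the y-axis quantity, the following minimal sketch computes the number of core gene groups divided by the average number of genes per genome. It is not the code used to produce the figure. The function name `core_gene_fraction`, the dictionary input format, and the `presence_threshold` parameter are hypothetical, and the sketch assumes that each genome is given as a set of gene group identifiers, so paralogs within one group are counted once.

```python
from typing import Dict, Set


def core_gene_fraction(
    genomes: Dict[str, Set[str]], presence_threshold: float = 1.0
) -> float:
    """Return (#core gene groups) / (average #gene groups per genome).

    genomes maps a genome name to the set of gene group IDs found in it
    (assumed input format). A group counts as core if it occurs in at
    least presence_threshold (a fraction, 1.0 = 100%) of all genomes.
    """
    if not genomes:
        raise ValueError("at least one genome is required")

    n_genomes = len(genomes)

    # Count in how many genomes each gene group occurs.
    occurrences: Dict[str, int] = {}
    for groups in genomes.values():
        for group in groups:
            occurrences[group] = occurrences.get(group, 0) + 1

    # Core groups: present in the required fraction of genomes.
    n_core = sum(
        1 for count in occurrences.values()
        if count / n_genomes >= presence_threshold
    )

    # Average number of gene groups per genome (proxy for the gene count).
    avg_genes = sum(len(groups) for groups in genomes.values()) / n_genomes
    return n_core / avg_genes if avg_genes else 0.0
```

Lowering `presence_threshold` to 0.99, 0.95, or 0.90 would correspond to the relaxed core definitions reported in the supplement, under the same assumptions.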