Fig. 2
From: aurora: a machine learning gwas tool for analyzing microbial habitat adaptation

Construction and visualization of simulated data. Four simulated datasets were constructed using methods with different assumptions: the Simulate_pan_genome.py script published with Scoary [22], the pangenome simulation tool Simurg [33], and two multiple state speciation and extinction (MuSSE) models. A Visualization of the construction of the two MuSSE models. The speciation (λ) and extinction (μ) rates are shown for each state, and the corresponding transition matrix is shown next to each model scheme. In the MuSSE1 model, a strain that finished the simulation in a blue state (S1, S2, S3) was assigned to the blue class of the phenotype, and a strain that finished in a red state (S4, S5, S6) was assigned to the red class. In the MuSSE2 model, once a strain had passed through a red state (S3 or S5), it was assigned to the red class even if it later returned to one of the blue states (S1, S2, S4). Each time a strain transitioned into a new state, it gained the causal gene that corresponds to that state's colored border ring. The colored rings around the states correspond to the colored circles in the phylogenetic trees below. Additionally, in the MuSSE2 simulation, when a strain gained a causal gene (S2, S3, S4, S5), its current extinction rate was reduced by 0.01. B Phylogenetic trees showing how the phenotype classes (inner color strip) and causal genes (outer ring with circles) were distributed. Causal genes are those that contribute to the red phenotype. The subsequent GWAS analysis was used to discover adaptation factors to the red phenotype.
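The two phenotype-assignment rules described for panel A can be illustrated with a minimal Python sketch. This is not the authors' simulation code: the function names and the representation of a strain as a list of visited states are assumptions made for illustration, and only the state groupings (red states S4, S5, S6 for MuSSE1; red states S3, S5 for MuSSE2) are taken from the caption. Speciation, extinction, and gene gain are not modeled here.

```python
# Red states for each model, as listed in the caption.
MUSSE1_RED = {"S4", "S5", "S6"}
MUSSE2_RED = {"S3", "S5"}


def musse1_phenotype(state_history):
    """MuSSE1 rule: the class depends only on the final state.

    state_history is a hypothetical list of state labels visited by one
    strain, in order, e.g. ["S1", "S4", "S2"].
    """
    return "red" if state_history[-1] in MUSSE1_RED else "blue"


def musse2_phenotype(state_history):
    """MuSSE2 rule: the strain is red if it ever visited a red state,
    even if it later returned to a blue state."""
    return "red" if any(s in MUSSE2_RED for s in state_history) else "blue"


# A strain that visits a red state and then returns to a blue one.
print(musse1_phenotype(["S1", "S4", "S2"]))  # final state S2 is blue
print(musse2_phenotype(["S1", "S3", "S1"]))  # S3 was visited, so red
```

The contrast between the two functions is the point of the sketch: under MuSSE1 the phenotype can be lost again, whereas under MuSSE2 it persists once acquired.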