Fig. 2 | Genome Biology

Fig. 2

From: MIDAA: deep archetypal analysis for interpretable multi-omic data integration based on biological principles

Multimodal deep archetypal analysis reconstructs an efficient and biologically meaningful latent space. A Archetype distribution plotted over the RNA UMAP. B A 2D projection of the simplex latent space, in which the weight vectors are plotted in 2D polar coordinates. Cells that closely resemble an archetype lie far from the center and close to that archetype on the outer circle; points in the interior are mixtures of different archetypes. The weight components can be read off by treating the direction of each point as a mixture of unit vectors pointing at the text labels on the outer circle. A detailed mathematical description of the projection can be found in the Methods section. C Heatmap of normalized [0–1] cell progenitor scores for cells with archetype probability ≥ 80% and K-means clustering in VAE and MOFA space. D–E GSEA for archetypes 1 and 3 using the cell progenitor gene sets from [29]. F–G UMAP and 2D simplex projection of the dataset in [30]. H Correlation of transcription factor motif deviation and archetype weights. GATA1 is an erythropoietic commitment marker and TCF3 is enriched in dendritic progenitors. I The generative nature of the model makes it easy to produce synthetic datasets from the latent space: the user samples weights from a Dirichlet distribution with a chosen concentration parameter, and the decoder generates realistic multimodal data from those samples. J–K Concordance of gene expression and promoter accessibility in a synthetic dataset consisting mainly of the erythropoietic and stem archetypes.
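
The projection in panel B and the sampling step in panel I can be illustrated with a minimal sketch. It is not the MIDAA implementation: the exact projection is defined in the Methods section of the article, and the sketch simply assumes that the archetypes sit at equally spaced angles on the unit circle and that each cell is placed at the weighted sum of those unit vectors. The function names `simplex_to_2d` and `sample_archetype_weights` are hypothetical, and the trained decoder that would turn sampled weights into multimodal data is omitted.

```python
import numpy as np


def simplex_to_2d(weights):
    """Project simplex weights of shape (n_cells, k) into the unit disc.

    Assumption: archetype j is placed at angle 2*pi*j/k on the unit circle,
    and each cell is the weighted sum of those k unit vectors.
    """
    weights = np.asarray(weights, dtype=float)
    k = weights.shape[1]
    angles = 2.0 * np.pi * np.arange(k) / k
    # One unit vector per archetype, shape (k, 2).
    vertices = np.column_stack([np.cos(angles), np.sin(angles)])
    return weights @ vertices


def sample_archetype_weights(concentration, n_cells, seed=0):
    """Draw n_cells weight vectors from a Dirichlet distribution.

    Each row is non-negative and sums to 1, so it lies on the simplex.
    """
    rng = np.random.default_rng(seed)
    return rng.dirichlet(concentration, size=n_cells)


# Illustrative concentration: mass concentrated on archetypes 0 and 2.
weights = sample_archetype_weights([5.0, 0.2, 5.0], n_cells=1000)
coords = simplex_to_2d(weights)  # (1000, 2) points inside the unit disc
```

Under this assumed layout a pure archetype maps onto the outer circle and an even mixture maps to the center, which matches the reading of panel B. With more than three archetypes the map is no longer one-to-one, so different mixtures can land on the same 2D point.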
