stDyer enables spatial domain clustering with dynamic graph embedding

Xu, Ke; Xu, Yu; Wang, Zirui; Zhou, Xin Maizie; Zhang, Lu

doi:10.1186/s13059-025-03503-y

Methodology
Open access
Published: 20 February 2025

stDyer enables spatial domain clustering with dynamic graph embedding

Ke Xu¹,
Yu Xu¹,
Zirui Wang¹,
Xin Maizie Zhou² &
…
Lu Zhang^1,3

Genome Biology volume 26, Article number: 34 (2025) Cite this article

1735 Accesses
2 Altmetric
Metrics details

Abstract

Spatially resolved transcriptomics (SRT) data provide critical insights into gene expression patterns within tissue contexts, necessitating effective methods for identifying spatial domains. We introduce stDyer, an end-to-end deep learning framework for spatial domain clustering in SRT data. stDyer combines Gaussian Mixture Variational AutoEncoder with graph attention networks to learn embeddings and perform clustering. Its dynamic graphs adaptively link units based on Gaussian Mixture assignments, improving clustering and producing smoother domain boundaries. stDyer’s mini-batch strategy and multi-GPU support facilitate scalability to large datasets. Benchmarking against state-of-the-art tools, stDyer demonstrates superior performance in spatial domain clustering, multi-slice analysis, and large-scale dataset handling.

Peer Review reports

Background

Identifying spatial domains is a vital task for analyzing spatially resolved transcriptomics (SRT). These domains represent distinct areas where the units involved (either spots or cells, depending on the SRT technologies) show similar gene expression patterns. Exploring these domains is crucial to understanding the spatial organization of gene activity [1, 2]. Traditional methods for clustering or community detection, such as K-means and the Leiden algorithm, typically rely only on gene expression data and disregard spatial information, often resulting in spatial domains that lack spatial continuity. To address this issue, various algorithms have been developed to incorporate both gene expression and spatial coordinates. These methods can be broadly classified into probabilistic approaches and deep learning models.

In probabilistic approaches, HMRF [3] groups units with similar gene expression profiles and takes into account their spatial coordinates using a Hidden Markov Random Field. HMRF employs the expectation-maximization algorithm and Gaussian density function to estimate model parameters. However, estimating a large number of parameters in the variable precision matrix across clusters can lead to unstable results. BayesSpace [4] employs a t-distributed error model to generate more robust spatial domains. It improves HMRF by utilizing Markov Chain Monte Carlo for parameter inference, allowing for more efficient exploration in the parameter space. BASS [5] implements a Bayesian hierarchical framework to simultaneously identify spatial domains and their featured cell types across multiple samples. SpaDo [6] first estimates the cell type distributions and then computes the distances between these distributions to perform spatial domain clustering. CellCharter [7] initially extracts features using scVI [8] and subsequently aggregates neighbors’ features for a Gaussian Mixture Model to identify spatial domains.

Deep learning models employ graph convolutional networks (GCNs) [9] on a spatial graph constructed using spatial coordinates of units, such as the K-nearest neighbor (KNN) graph. They group unit embeddings using unsupervised clustering methods to recognize spatial domains. SpaGCN [10] builds a graph by integrating the spatial coordinates and histological features of the units and applies GCNs to aggregate information from the neighboring units on this graph. STAGATE [11] introduces graph attention networks (GATs) to a spatial graph to capture intricate spatial dependencies on gene expression and incorporates a cell type-aware module from pre-clustered gene expressions. GraphST [12] enhances the graph embeddings of units by adopting a graph self-supervised contrastive learning model that considers both gene expression profiles and spatial information.

Although the existing methods have been successfully applied to identify spatial domains for various tissues and species, many studies have explored that their performance is still unsatisfactory [1, 13, 14]. All these methods rely on a fixed spatial graph and assume that a unit is derived from the same spatial domain as its neighboring units. However, this assumption may not hold true for units located at the boundaries of spatial domains, where neighboring units may belong to different domains. The misclassification of those boundary units may also extend its influence to units within the domain. Additionally, most existing deep learning models for spatial domain clustering [11, 15,16,17] are not designed to be end-to-end, resulting in suboptimal performance as they necessitate an independent clustering step. In addition, the application of existing tools to multi-slice and large-scale datasets poses another challenge with respect to scalability [18, 19]. Furthermore, most of existing tools that support multi-slice clustering treat multiple slices as a single large slice and ignore the spatial relationship for multiple slices that are vertically aligned. There are currently no methods tailored for spatial domain clustering that can effectively address all of these issues simultaneously.

We introduce stDyer, an end-to-end and scalable deep spatial domain clustering approach on SRT data. stDyer employs a Gaussian Mixture Variational AutoEncoder [20,21,22] (GMVAE) with GAT [23] and graph embedding in the latent space. Instead of using an independent clustering step, stDyer enables deep representation learning and clustering from Gaussian Mixture Models (GMMs) simultaneously. stDyer also introduces dynamic graphs to include more edges to a KNN spatial graph. These new edges link a unit to others that belong to the same Gaussian Mixture as it does and increase the likelihood that units at the domain boundaries establish connections with others belonging to the same spatial domain. Moreover, stDyer introduces mini-batch neighbor sampling strategy to enable its application to large-scale datasets. To our knowledge, stDyer is also the first method that could enable multi-GPU training for spatial domain clustering. We compared stDyer with 8 state-of-the-art tools on four different SRT technologies, including 10x Visium, osmFISH, STARmap and Stereo-seq. stDyer demonstrated excellent performance on these technologies and produced smooth domain boundaries. stDyer also improved the overall performance on the 10x Visium dataset without obvious batch effects for multi-slice joint analysis. In the ablation studies, we demonstrated that dynamic graphs could substantially improve spatial domain clustering when compared to the utilization of a KNN spatial graph (stDyer(KNN)). stDyer could also be extended to jointly analyze SRT data from multiple slices and scale up to large-scale datasets.

Results

Workflow of stDyer

stDyer employs GMVAE with GAT and graph embedding on dynamic graphs to improve spatial domain clustering using SRT data (Fig. 1a). We construct a KNN spatial graph using the spatial coordinates of units within SRT data. This graph, together with gene expression profiles, serves as input for the GMVAE (Additional file 1: Note S1 and Additional file 1: Fig. S1). The encoder layer of the GMVAE is structured as GATs (“Methods” section) to consider the spatial graph structure (“Methods” section). GAT aggregates information from a target unit’s neighbors to generate an aggregated representation (Fig. 1b), enhancing the accuracy of spatial domain inference (Additional file 1: Note S2). Besides GATs, stDyer also considers the graph structure in the latent space by encouraging the gene expression of a unit to be reconstructed by its neighbors in the graph. GMVAE assumes unit embeddings follow GMMs in the latent space, which could be used to generate temporary spatial domain labels for all units at each epoch. In the first epoch using dynamic graph for training, stDyer modifies the KNN spatial graph by connecting unit i to extra neighboring units (12 for the 10x Visium technology and 4 for the other technologies), where these additional units and unit i must share the same temporary spatial label (“Methods” section). This is the initial dynamic graph we generated. Starting from the next training epoch, dynamic graphs are continuously updated by substituting the extra units based on temporary spatial labels from GMMs. stDyer includes a post-processing module to remove outliers and smooth spatial domain boundaries by jointly considering the predicted spatial labels of all units (“Methods” section). stDyer also applies integrated gradient (IG) analysis (“Methods” section) to identify spatially variable genes (SVGs) with spatially correlated expression.

stDyer improves spatial domain identification on the DLPFC dataset from 10x Visium technology

We evaluated the performance of stDyer using the human dorsolateral prefrontal cortex (DLPFC) dataset [1] generated by 10x Visium technology. This dataset includes 12 slices, with unit numbers ranging from 3460 to 4789. Each slice has 5 or 7 annotated spatial domains (Additional file 1: Fig. S2). We benchmarked stDyer with 10 state-of-the-art methods for spatial domain clustering: HMRF, SpaGCN, BayesSpace, SEDR, DeepST, BASS, STAGATE, GraphST, SpaDo, and CellCharter based on adjusted rand index (ARI; “Methods” section). We evaluated these methods on all 12 slices (Fig. 2k) and observed stDyer achieved an average ARI of 0.612, which was significantly higher than the second-best method GraphST (average ARI = 0.538, p-value = 0.014, Wilcoxon rank-sum test). We also performed an ablation study to examine the efficacy of using dynamic graphs in stDyer. Our results indicated that stDyer (average ARI = 0.612, p-value = 0.002, Fig. 2k) substantially outperformed stDyer (KNN) (average ARI=0.486).

We first visualized slice 151673 (Fig. 2a and b), where most methods performed well. On this slice, stDyer, DeepST, STAGATE, SpaDo, and GraphST (Fig. 2c) successfully identified 6 laminar layers and white matter (WM) as indicated in the annotation (Fig. 2b). We then evaluated the embeddings from 6 methods (stDyer, SEDR, DeepST, STAGATE, GraphST, and CellCharter) using Silhouette score (Fig. 2d), with our method achieving the highest score, indicating the best cluster separateness. Visualization of these embeddings exhibited continuity and aligned with the adjacency pattern in the annotation from L1 to WM (Fig. 2e). However, GraphST and CellCharter exhibited worse separateness since L1 was also adjacent to L3 in addition to L2 in their embedding visualizations (Fig. 2e).

We also visualized slice 151674 (Fig. 2f and g), where only stDyer (Fig. 2h) could identify six laminar layers and WM in the annotation (Fig. 2g). The other methods struggled to distinguish between L2 to L6, e.g., GraphST failed to separate L4 and L5 domains (Fig. 2h). Similar patterns were observed on the other slices (Additional file 1: Fig. S2). For the 6 methods that could generate unit embeddings (stDyer, SEDR, DeepST, STAGATE, GraphST and CellCharter), stDyer still produced the highest Silhouette score (Fig. 2i) and it was the only one that could separate out L4 domain (highlighted by the red rectangle in Fig. 2j).

We selected 50 SVGs with the highest IG values for each domain (“Methods” section, Additional file 2: Tab. S1 and Additional file 1: Fig. S3) and found some of them were reported as marker genes of cortex layers. For example, a previous study identified HPCAL1, PCP4, and MBP as marker genes of L2, L5, and WM [1] (Fig. 2l), respectively. The same study also revealed CCK could distinguish L2, L3, and L6 from the other layers. We found that NEFM could be a marker gene of L4.

stDyer supports multi-slice spatial domain clustering

We divided 12 slices from the DLPFC dataset into three slice sections according to their spatial distance of the third dimension, thus each section includes four adjacent slices (Fig. 3a, Additional file 2: Tab. S2) [1]. The KNN spatial graph was constructed using three-dimensional coordinates of units (“Methods” section), which could include neighboring units from the same and adjacent slices. By jointly analyzing four slices from each section, we observed multi-slice stDyer (stDyer(M)) outperformed single-slice stDyer (stDyer(S)) on 8 out of 12 DLPFC slices (Additional file 1: Fig. S2). For example, with multi-slice clustering, the ARI for spatial clustering on slice 151510 improved from 0.496 using stDyer(S) to 0.572 using stDyer(M)(Fig. 3b). The 4 slices (151507, 151509, 151669, and 151675) that did not show improvement by multi-slice clustering had a higher average ARI score than the 8 slices (0.654 versus 0.591) that benefited from multi-slice clustering. This suggested that multi-slice clustering is particularly effective for slices with lower initial quality, as it leverages neighboring spots from adjacent slices to enhance unit embeddings for clustering. We also compared stDyer(M) with 4 methods (GraphST(M), STAGATE(M), SEDR(M), and CellCharter(M)) (Fig. 3c) that also supported multi-slice clustering and generated unit embeddings. The stDyer(M) achieved an average ARI of 0.650 for all three sections, substantially outperforming the second-best method, GraphST, which had an average ARI of 0.540 (p-value = 0.019, Fig. 3c). UMAP visualization (Fig. 3d) for the four slices suggested that unit embeddings from stDyer(M) were much more consistent with the spatial domain annotation. We also observed stDyer(M) provided more accurate unit embeddings of slender domains including L2, L4, L5, and L6 (Fig. 3d).

stDyer recognizes both tumor and interface domains on a zebrafish tumor dataset from 10x Visium technology

We further applied stDyer to a zebrafish melanoma dataset from 10x Visium technology [14] to assess its ability to identify highly heterogeneous domains. This dataset comprises 2179 units with 16,107 genes, including spatial domains from tumor, interface, and normal tissues. The normal tissue domains are further categorized into the brain, muscle, secondary muscle, skin, and others. We used Silhouette score (SS, [24]) to evaluate the performance of spatial domain clustering tools on the zebrafish tumor dataset. Unit annotations from a previous study, generated using Seurat [25], were based on gene expression similarities. The computation of SS does not rely on the ground truth labels and SS can measure how well each unit fits within its assigned cluster compared to other clusters, providing insight into cluster separability and cohesion (“Methods” section). stDyer achieved the best SS of 0.166 (Fig. 4a), outperforming the second best, BayesSpace (SS = 0.098). There were six methods that successfully identified both tumor and interface domains (Fig. 4a), including stDyer (SS = 0.166), SpaGCN (SS = 0.087), BASS (SS = 0.059), DeepST (SS = 0.042), CellCharter (SS = 0.011), and SpaDo (SS = 0.016). Meanwhile, three methods accurately identified the tumor domain but failed to distinguish the interface domain (STAGATE (SS = 0.083), GraphST (SS = 0.041), and SEDR (SS = − 0.029)). Additionally, stDyer identified the secondary muscle domain, which was not detected by stDyer (KNN) (SS = 0.151).

Based IG information, we observed SVG expression values exhibited clear correlations with the corresponding spatial domains (Fig. 4b, Additional file 2: Tab. S3 and Additional file 1: Fig. S4). We found ppt1 uniquely showed high expression profiles within the tumor domain (Fig. 4b). Its homeotic gene PPT1 has been found to be associated with tumor promotion and prognosis [26]. We noticed that the expression levels of rps18 progressively decreased as the units transitioned from the tumor domain through the interface domain and into the muscle domain. By computing the distances between each unit in the interface domain and its nearest unit in the tumor domain, we found that the Pearson correlation of −0.66 between the distance and the expression of rps18, with a p-value of $2.2 \times 10^{-16}$ (Additional file 1: Fig. S5). Notably, a previous study reported that human melanoma circulating tumor cells exhibit a gene signature involving ribosomal protein large/small subunits (RPL/RPS) associated with melanoma brain metastasis [27]. To extend this analysis, we performed the same correlation analysis for all genes and filtered out genes with absolute Pearson correlation values less than 0.6. This process identified 64 genes, 48 of which belonged to the rpl/rps family. These findings suggest that zebrafish melanoma tumor cells might also possess a similar gene signature associated with tumor progression. However, it is important to note that additional evidence from independent experiments is necessary to confirm this conclusion, given the highly heterogeneous features of tumor cells.

stDyer generates spatial domains with smooth boundaries on the mouse cortex dataset from osmFISH

We analyzed a mouse cortex dataset comprising 4839 units and 33 genes from osmFISH [13] to evaluate the performance of stDyer on in situ hybridization technology. These units can be classified into 11 spatial domains, including hippocampus (HPC), internal capsule and caudoputamen (ICC), neocortical layer 1 (L1), neocortical layer 2–3 lateral (L2-3l), neocortical layer 2–3 medial (L2-3m), neocortical layer 3–4 (L3–4), neocortical layer 4 (L4), neocortical layer (L5), neocortical layer (L6), ventricle (VT), and white matter (WM). stDyer achieved the best performance with ARI of 0.733 (Fig. 5a), which was higher than stDyer (KNN) (ARI = 0.632) and all 9 other state-of-the-art tools (the second-best tool SpaDo with ARI = 0.661). stDyer (ARI = 0.733), DeepST (ARI = 0.561), BASS (ARI = 0.573), GraphST (ARI = 0.373), SpaDo (ARI = 0.661), and CellCharter (ARI = 0.487) could all generate notably smooth spatial domain boundaries (Fig. 5a), whereas the remaining three methods preferred to label neighboring units with different domains. DeepST, BASS and SpaDo incorrectly merged L2–3l and L2–3m domains in their results. stDyer also obtained the highest Silhouette score (Fig. 5b) and generated highly separable unit embeddings from different domains (Fig. 5c). It was noted that stDyer generated unit embeddings of L2–3l domain as a line that could be easily distinguished from L2–3m (Fig. 5c).

stDyer detects clear laminar structures on the mouse cortex dataset from STARmap

stDyer was used to analyze a mouse cortex dataset [28] comprising 1207 units and 1020 genes from STARmap to evaluate the performance of stDyer on in situ sequencing technology. The units in this dataset were annotated to be from seven spatial domains, including corpus callosum (CC), hippocampus (HPC), neocortical layer 1 (L1), neocortical layer 2/3 (L2/L3), neocortical layer 4 (L4), neocortical layer 5 (L5), and neocortical layer 6 (L6). As shown in Fig. 6a, SpaDo (ARI = 0.730) showed superior performance in predicting the laminar structure of the mouse cortex. stDyer (ARI = 0.664) ranked as the second-best method, with some mixing observed between the HPC domain and the L5 domain. Additionally, when we applied a similar strategy to SpaDo, utilizing Louvain clustering followed by hierarchical clustering for initialization, stDyer successfully identified all 7 domains correctly, achieving an ARI of 0.742 (Additional file 1: Fig. S6). The other tools demonstrated issues in identifying different layers. There were only a few cells annotated in the second and sixth domains predicted by BASS (ARI = 0.645) and SEDR (ARI = 0.584), which could be due to incorrect labeling of units from the domain boundaries. CellCharter (ARI = 0.542) also mixed HPC domain and L5 domain together. DeepST (ARI = 0.531) and STAGATE (ARI = 0.527) divided L2/L3 into two vertical sections. SpaGCN (ARI = 0.443) failed to predict the hippocampus domain, while GraphST (ARI = 0.465) was unable to predict L1 and divided L2/L3 erroneously. BayesSpace (ARI = 0.219) exhibited poor performance on this dataset, yielding only four spatial domains with clear boundaries. We observed the unit embeddings from stDyer showed excellent separation between different spatial domains (Fig. 6b) and achieved the highest Silhouette score and the lowest Davies-Bouldin score (Fig. 6c, “Methods” section) compared to the other five methods. We observed that the IG analysis (Additional file 2: Tab. S4 and Additional file 1: Fig. S7) also revealed SVGs associated with L2/3 and L6 (Fig. 6d).

stDyer can be scaled up to accommodate large-scale mouse embryo datasets from Stereo-Seq

Stereo-seq has been used to generate SRT data with 53 slices from 8 stages of mouse embryos [2]. We first applied stDyer and 9 existing methods (including SpaGCN, BayesSpace, SEDR, DeepST, BASS, STAGATE, GraphST, SpaDo, and CellCharter) to the slice (Slice ID: 7) with the largest number of units (i.e., 155,047) out of all 13 slices of E16.5 mouse embryos from Stereo-seq. In this comparison, stDyer obtained the top ARI score of 0.359 (Fig. 7a), which surpassed those achieved by SpaGCN (ARI = 0.254), BayesSpace (ARI = 0.284), CellCharter (ARI = 0.262) and SpaDo (ARI = 0.247). Other methods failed to run on this slice due to its large unit number. Nonetheless, the ARI scores for slices of this dataset were not as high as those for other datasets. One reason could be the large number of spatial domains in this dataset, which increases the likelihood of mismatch between the predicted spatial domain labels and the annotated ones. Additionally, the increased geometric complexity of these spatial domains makes it more challenging to achieve a very high ARI score. We evaluated the geometric complexity of each benchmarked dataset with the Silhouette score by taking spatial coordinates and annotated labels as input (Additional file 1: Fig. S8). For each dataset, the ARI score in the figure is computed as the average ARI scores across all methods used in this study. We observed that slices from Stereo-seq indeed have higher geometry complexity (lower Silhouette score) and correspondingly lower ARI scores. The Pearson correlation value between ARI and each data geometric complexity metric is significant (R = 0.82, p = $1.1\times 10^{-7}$). We also investigated if stDyer could identify major organs and tissues in mouse embryo accurately (Fig. 7b and c). We found stDyer identified highly concordant spatial domains with annotations, and the SVGs (Fig. 7d) of these domains also matched with the corresponding marker genes [29] or their homologous genes [30]. Specifically, the gene Krt5 was identified for the epidermis, Myl1 and Itm2a for muscle, Nova1 and Hoxb8 for spinal cord, Cldn11 for meninges, Actg2 for smooth muscle, Plac8 for GI tract and thymus, Kif26b for jaw and tooth, Col2a1 for bone, Alb for liver, and Tnni3 for heart. We also found stDyer separated the mouse brain into three distinct domains (domains 2, 13, and 15) and identified some SVGs associated with these domains that are functionally linked to the development of brain areas. For example, domain 13 was characterized by the presence of Neurod6, a gene known to be selectively active in certain deep-layer pyramidal neurons of the mouse brain [31]. In domain 15, the gene Adgrv1 was found to be highly expressed, which is essential for the development of auditory functions [32]. Domain 2 worked in conjunction with the other two domains to define a particular brain area.

We also benchmarked stDyer with 9 existing methods on all 13 slices of E16.5 mouse embryo with unit numbers varying from 59,119 to 155,047 (Additional file 2: Tab. S5). We found DeepST, BASS and STAGATE failed to analyze the slice with the smallest number of units (Slice ID: 1) due to insufficient graphics memory (> 80 GB for DeepST and STAGATE) or encountering segfault from C stack overflow (BASS), suggesting they could be inappropriate to be applied to Stereo-seq data. Among the seven remaining methods, stDyer, SpaGCN, BayesSpace, CellCharter and SpaDo were the only ones able to analyze all 13 slices. stDyer also outperformed the other tools on spatial domain clustering across all 13 slices (Fig. 7e). It achieved an average ARI of 0.393 (Additional file 2: Tab. S5) compared to 0.352 (SpaGCN), 0.346 (BayesSpace), 0.338 (SpaDo), and 0.344 (CellCharter), respectively.

To further evaluate their performance on scalability, we sampled 100,000 to 500,000 units from the combined units from 13 slices. BayesSpace, however, encountered an error due to matrix size and could not process datasets larger than 200,000 units. SpaDo consumed more than 3 TB (2 TB of physical memory and 1 TB of virtual memory) of memory for datasets larger than 200,000 units. SpaGCN’s memory consumption was significant, requiring 195 GB, 762 GB, and 2048 GB of host CPU memory for the datasets including 100,000, 200,000, and 300,000 units, respectively (Fig. 7f). For the datasets with over 300,000 units, SpaGCN exhausted all 2919 GB of available host CPU memory and was terminated by the operating system’s out-of-memory management. In contrast, stDyer was more memory-efficient, requiring only about 25 GB of graphics memory for all five simulated datasets due to the implementation of mini-batch strategy. For each batch, stDyer only selects a small number of units as well as its neighbors to construct the spatial graph, avoiding loading the whole spatial graph from all units into the graphics memory (Fig. 7g). CellCharter was also capable of running on large-scale datasets as it adopted the mini-batch strategy.

We also evaluated the runtime of these methods on four datasets with different numbers of units and genes (Fig. 7h): the mouse cortex dataset from the STARmap technology (1207 units and 1020 genes), the mouse cortex dataset from the osmFISH technology (4939 units and 33 genes), the slice 151673 from the DLPFC dataset of 10x Visium technology (3639 units and 33,538 genes), and the E2_S8 slice from the MOSTA dataset of Stereo-seq technology (120,815 units and 28,701 genes). SpaGCN and SpaDo were the two fastest methods and finished clustering within 3 min for the first three datasets. stDyer consumed 1.33, 3.83, and 4.38 min for these datasets, respectively, while CellCharter took 4.15, 5.40, and 4.88 min. BayesSpace was the slowest, taking 9.16, 34.48, and 28.61 min, respectively. For the E2_S8 slice from the MOSTA dataset of Stereo-seq technology (120,815 units and 28,701 genes), CellCharter was the fastest, completing in 18.33 min, followed by SpaGCN, stDyer, and SpaDo at 75.66, 365.11, and 382.81 min, respectively. BayesSpace again took the longest, at 960.73 min (16.01 h). Overall, for datasets with small to medium number of units (STARmap, osmFISH, 10x Visium), stDyer completed clustering within 5 min. For large datasets, stDyer produced results in a few hours. The relatively slower speed of stDyer compared to SpaGCN and CellCharter may stem from its use of graph attention networks, which require intensive computation and increasing processing time. Although not the fastest, stDyer remains scalable to larger datasets, as its graphics memory requirement does not increase significantly with the number of units.

We also investigated the impact of batch sizes and the number of units on the runtime of stDyer and GPU memory usage with an NVIDIA A100 card on the simulated datasets (Fig. 7i, j and k). We found that setting the batch size to 1000 could significantly reduce the runtime for processing 100,000 units (Fig. 7i). It only required to occupy 17.64 GB of GPU memory (Fig. 7j), which could even be satisfied by consumer GPUs like RTX 3090. By varying the total unit numbers (batch size = 1000, Fig. 7k), we observed stDyer only required 1.87 and 26.11 h to process 100,000 and 1,000,000 units, respectively. stDyer’s performance can be sped up further through the utilization of multiple GPU cards. Our findings indicated that by employing eight NVIDIA A100 cards, the analysis time for 1,000,000 units could be cut down to just 5.53 h (Fig. 7l).

Discussion

The SRT technologies have revolutionized our understanding of the spatial dynamics of cells and gene expression, providing an efficient way in investigating the intricate relationship between gene activity and its spatial distribution within tissue development or tumor microenvironment. Based on SRT data, spatial domains could be identified to investigate spatial heterogeneity of gene expression patterns and their functional implications. The performance of existing tools for spatial domain clustering is still unsatisfactory and it is challenging to accurately delineate the boundaries of spatial domains. In this study, we introduced stDyer as an end-to-end and scalable graph-based deep learning model for spatial domain clustering using dynamic graphs. Instead of using a fixed spatial graph, the edges in dynamic graphs would be updated based on units’ temporary spatial domain labels from GMVAE. Thus, there is a high chance that units sharing the same temporary spatial domain labels could be connected in dynamic graphs. We evaluated the performance of stDyer on different SRT technologies and tissues and proved it could substantially improve spatial domain clustering. The ablation studies also highlighted the superiority of the dynamic graphs over a KNN spatial graph.

Some SRT technologies, such as 10x Visium and Stereo-seq, are capable of producing histological images of the same tissue slice, which is expected to be beneficial for spatial domain clustering. We tried to concatenate image patch embeddings with gene expression profiles of the 151674 slice from the DLPFC dataset as input for our model. We cut the whole image into small patches for each unit and extracted the image feature embeddings using the pre-trained ResNet50 [33]. We found the ARI decreased to 0.237 (ARI without image: 0.639) and the predicted spatial domains were influenced much by noises in the histological image (Additional file 1: Fig. S9). This suggested that image features could not guarantee to improve spatial domain clustering. To enhance clustering outcomes, the use of high-quality, biologically informative images or advanced image processing models will be desired.

Recent advancements in SRT technologies, such as Stereo-Seq, enable the measurement of a huge number of units for individual slices. Although most current available ST datasets contain no more than 200,000 units due to cost or technical limitations, the trend of the development of spatial transcriptomics is towards increasing the throughput of unit numbers and decreasing the cost per unit. For example, 10x Genomics company has developed the 10x Visium HD technology, which is now commercialized and can measure more than 500,000 units. Although 10x Visium HD is not widely used yet, it is promising that more researchers will apply it for various kinds of tissues to generate diverse ST datasets. Many graph-based approaches, such as STAGATE and GraphST, struggle to handle large spatial graphs that contain upwards of 100,000 units due to their reliance on a basic implementation of graph convolutional networks. These methods attempt to load the entire graph into the GPU graphics memory, which is inherently limited and unable to expand at the same rate as the growing number of units being measured. To address this issue, stDyer utilizes a mini-batch technique to efficiently analyze graphs with millions of units. Additionally, stDyer is also able to support multiple slices spatial domain clustering, where the slices could be aligned vertically (e.g., DLPFC) or horizontally adjacent (e.g., concatenated as a single large slice). It is worth noting that stDyer cannot handle batch effects, which might not be an issue if slices originate from the same donor or sample and are well-measured, thereby minimizing batch effects. For instance, in the 10x Visium DLPFC datasets, each of the three sections corresponds to a specific donor and consists of four adjacent slices. These vertically aligned slices exhibit no noticeable batch effects and have similar tissue structure, making them suitable for spatial domain detection using stDyer. In contrast, horizontally adjacent slices typically cover a large tissue area from the same donor, too extensive to be captured by a single slice. Since they originate from the same donor, the batch effects in these slices may also be minimal. However, for slices with comparable tissue structure but significant batch effects, it may be necessary to remove batch effects using relevant tools before performing spatial domain clustering analysis.

stDyer is an open-source tool with broad applicability for all existing SRT technologies. Although our experiments are limited to datasets produced by 10x Visium, osmFISH, STARmap, and Stereo-seq, stDyer is designed to handle datasets from any SRT technologies with gene expression profiles and spatial information. We have improved stDyer to enable the processing of multiple slices and incorporated both a mini-batch approach and multi-GPU support to efficiently handle the analysis of large-scale datasets. In the future, we would like to extend stDyer by incorporating histological images.

Conclusions

In conclusion, we present stDyer as an innovative framework for spatial domain clustering in spatial transcriptomics. Compared to existing tools, stDyer demonstrates superior performance in identifying spatial domains for both single-slice and multi-slice joint analysis while also enabling the detection of spatially variable genes for downstream analysis. Additionally, ablation studies on real datasets highlight the distinct advantages and robustness of stDyer.

Methods

Data preprocessing

A SRT dataset comprises two primary components: one matrix representing the gene expression profiles of m genes in n units ($\varvec{X}\in \mathbb {R}^{n\times m}$ ) and the other matrix containing two-dimensional spatial coordinates for n units ($\varvec{C}\in \mathbb {R}^{n\times 2}$). We performed data preprocessing on the gene expression matrix using functions provided by Scanpy [34]: (1) removed ERCC spike-ins; (2) removed units [scanpy.pp.filter_cells(min_counts=1)] and genes [scanpy.pp.filter_genes(min_counts=1)] with zero RNA counts; (3) selected top 3000 highly variable genes [scanpy.pp.highly_variable_genes]; (4) library normalization [scanpy.pp.normalize_total]; (5) logarithmic transformation [scanpy.pp.log1p]; (6) gene-wise z-score standardization [scanpy.pp.scale].

Construct dynamic graphs

We established the initial spatial graph by employing a KNN approach (K = 4 for osmFISH, STARmap and Stereo-seq; K = 6 for 10x Visium due to its distinctive honeycomb structure) on the matrix $\varvec{C}$ to link neighboring units based on Euclidean distance. During each training epoch of GMVAE, a temporary spatial domain label was assigned to each unit based on GMM (Additional file 1: Fig. S10). Subsequently, additional neighbors (12 for 10x Visium and 4 for osmFISH, STARmap and Stereo-seq) for each unit were incorporated into the KNN spatial graph to connect units sharing identical spatial domain labels. For a given unit, the extra neighbors were chosen at random from other units that share the same temporary spatial domain label with it unless this unit’s closest 18 (for 10x Visium technology) or 8 (for other SRT technologies) spatial neighbors sharing the same label with it. In such a case, these closest units would be selected as the unit’s extra neighbors. In a dynamic graph, each unit has a total of 18 (for 10x Visium technology) or 8 neighbors (Stereo-seq, osmFISH, and STARmap technologies). For technologies with patterned unit organization, specifically 10x Visium (Additional file 1: Fig. S11a) and Stereo-seq (Additional file 1: Fig. S11b), the configurations of 18 and 8 neighbors correspond to two-hop neighbors in graph theory. For STARmap and osmFISH technologies, which lack patterned unit organization, we performed ablation studies and found that 8 neighbors achieved the best ARI scores of 0.73 (Additional file 1: Fig. S12a) and 0.74 (Additional file 1: Fig. S12b), respectively. Based on these results, we selected 8 neighbors for STARmap and osmFISH technologies.

Preprocessing of SRT data from multiple slices

We divided 12 slices from the DLFPC dataset into three sections, each comprising four spatially contiguous slices, and applied the alignment method PASTE2 [35]. This method refines the two-dimensional spatial coordinates of units within each slice, aligning and transforming them into a unified coordinate system. The unified spatial coordinates by PASTE2 will replace original (unaligned) spatial coordinates across all evaluated methods to ensure fair comparisons in multi-slice clustering. Given that each slice contains only two-dimensional spatial information, we introduced an additional artificial spatial coordinate of the third dimension to merge the four slices in the same section, thus enabling the construction of a three-dimensional spatial graph that spans across four slices. Assuming perfect overlap of the slices, this strategy allows each unit to identify 6 spatial neighbors from its own slice, as well as from the slices immediately above and below it, providing a total of 18 nearest neighbors for the three-dimensional spatial graph construction (Additional file 1: Note S3).

The network structure of stDyer

To construct dynamic graphs, stDyer designs an end-to-end spatial domain clustering framework based on GMVAE with graph embedding (Additional file 1: Fig. S1). The framework includes an inference module (encoder) and a generative module (decoder). For the encoder, stDyer utilizes GATs (Fig. 1b) to aggregate gene expression profiles from the neighbors of a given unit in a spatial (dynamic) graph. The graph attention network is essential for aggregating information from neighboring units, enabling our model to incorporate contextual spatial information and more accurately identify the spatial domain of the target unit. This aggregation improves the smoothness of the predicted spatial domains (Additional file 1: Note S2). In the context of spatial transcriptomic analysis, the attention mechanism concepts (query, key, and value) are defined as follows: Query($Q_i$): a learnable vector $Q_i$ representing the features of target unit with index i. Keys($K_\cdot$): A set of $k+1$ learnable vectors representing the features of the target unit and its k neighbors. Values($V_\cdot$): A set of $k+1$ learnable vectors representing the information to be aggregated from the target unit and its neighbors. The attention mechanism works by computing the attention scores through the multiplication of the query vector $Q_i$ with each of the key vectors $K_\cdot$. These attention scores serve as weights for aggregating the value vectors $V_\cdot$ through a weighted average, resulting in the aggregated embedding $G_i$. $G_i$, the output of the GAT, integrates information from the target unit i and its neighbors and is subsequently used to infer the spatial domain of the target unit. The detailed implementation of the models is in Additional file 1: Note S1.

Probabilistic models of modified GMVAE in stDyer

We modified the GMVAE in stDyer by replacing its encoder from FC layers to GAT and utilized $\theta$ and $\phi$ to represent the parameters involved during generative (decoder) and inference (encoder) processes. The generative process could be represented as follows:

$$\begin{aligned} p(y_i=j)&=\frac{1}{k^{\text {cluster}}},j=1,2,\cdots ,k^{\text {cluster}}\end{aligned}$$

(1)

$$\begin{aligned} p_{\theta }(\varvec{z}_i|y_i=j)&=\mathcal {N}(\varvec{\mu }_{j}^{\text {prior}},{\varvec{\sigma }_{j}^{\text {prior}}}^2),j=1,2,\cdots ,k^{\text {cluster}}\end{aligned}$$

(2)

$$\begin{aligned} p_{\theta }(\varvec{x}_i|\varvec{z}_i)&=\mathcal {N}(\varvec{\mu }^{\varvec{x}}_{i,j},{\varvec{\sigma }^{\varvec{x}}_{i,j}}^2),j=1,2,\cdots ,k^{\text {cluster}}\end{aligned}$$

(3)

$$\begin{aligned} p_{\theta }(y_i=j,\varvec{z}_i,\varvec{x}_i)&=p(y_i=j)p_{\theta }(\varvec{z}_i|y_i=j)p_{\theta }(\varvec{x}_i|y_i=j,\varvec{z}_i),j=1,2,\cdots ,k^{\text {cluster}} \end{aligned}$$

(4)

$\varvec{x}_i$, ${y}_i$, and $\varvec{z}_i$ are three random variables that denote the gene expression profile, spatial domain label and latent embedding from the jth mode of GMM of the target unit i. $\varvec{\mu }_{j}^{\text {prior}}$ and $\varvec{\sigma }_{j}^{\text {prior}}$ denote the mean and standard deviation of the prior distribution of the jth Gaussian mixture, respectively. Similarly, $\varvec{\mu }^{\varvec{x}}_{i,j}$ and $\varvec{\sigma }^{\varvec{x}}_{i,j}$ represent the mean and standard deviation of the distribution of the output of the decoder. We adopt a multivariate Gaussian distribution as the output distribution for the generative process to facilitate neural network training. By using a multivariate Gaussian distribution, the neural networks reconstruct normalized expression values rather than discrete expression counts. This approach reduces the range of values that need to be reconstructed, which typically leads to a smaller range of gradients. This reduction mitigates the risk of gradient explosion and enhances training stability. Notably, the output distribution is unit-wise, meaning that each unit has its own unique multivariate Gaussian distribution. This approach is distinct from using a single model to represent all gene expression values of each unit. Given the expressive power of neural networks, reconstructing the expression values of a unit as a unit-specific n-dimensional vector is straightforward, regardless of whether these values are sparse or not. The inference process is formulated as follows:

$$\begin{aligned} q_{\phi }(y_{i}=j|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}})&=\hat{\varvec{Y}}_i(j),j=1,2,\cdots ,k^{\text {cluster}}\end{aligned}$$

(5)

$$\begin{aligned} q_{\phi }(\varvec{z}_i|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}},y_{i}=j)&=\mathcal {N}(\varvec{\mu }_{i,j}^{\text {post}},{\varvec{\sigma }_{i,j}^{\text {post}}}^2),j=1,2,\cdots ,k^{\text {cluster}}\end{aligned}$$

(6)

$$\begin{aligned} q_{\phi }(y_{i}=j,\varvec{z}_i|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}})&=q_{\phi }(y_{i}=j|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}})\cdot q_{\phi }(\varvec{z}_i|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}},y_{i}=j),\nonumber \\&\qquad j=1,2,\cdots ,k^{\text {cluster}} \end{aligned}$$

(7)

where $\hat{\varvec{Y}}_i(j)$ denotes the predicted probability that the target unit i belongs to the jth mode of GMM, $\{\varvec{x}_i\}^{\text {neigh}}$ represents the gene expression profiles of the neighbors of unit i. The spatial domain assignment vector $\hat{\varvec{Y}}_i(j)$ is inferred by GAT in Additional file 1: Eq. S3. $\varvec{\mu }_{j}^{\text {post}}$ and $\varvec{\sigma }_{j}^{\text {post}}$ denote the mean and standard deviation of the posterior distribution of the jth Gaussian mixture.

The objective function of stDyer

The objective function of stDyer is to maximize the log-likelihood of the marginal distribution across all involved n units:

$$\begin{aligned} \max _{\theta ,\phi }\sum _{i=1}^{n}{\log {p_{\theta }(\varvec{x}_i)}} \end{aligned}$$

(8)

stDyer incorporates a smoothness constraint to encourage neighboring units to share the same spatial domain labels and have contiguous embedding.

$$\begin{aligned}&\text {JS}_{\theta ,\phi }^{\text {smooth}}=\sum _{r=1}^{k_i^{\text {neigh}}}\text {JS}_{\theta ,\phi }^{\text {smooth}(r)}\end{aligned}$$

(9)

$$\begin{aligned} &\text {JS}_{\theta ,\phi }^{\text {smooth}(r)} \overset{\textrm{def}}{=}\text {JS}\left(q_{\phi }(y_{i},\varvec{z}_i|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}})\| q_{\phi }(y_{r},\varvec{z}_r|\varvec{x}_r,\{\varvec{x}_r\}^{\text {neigh}})\right),\nonumber \\ &\qquad \qquad \qquad \qquad r=1,2,\cdots ,k_i^{\text {neigh}} \end{aligned}$$

(10)

The objective function could be rewritten as:

$$\begin{aligned} \max _{\theta ,\phi }\sum _{i=1}^{n}\log {p_{\theta }(\varvec{x}_i)}-\text {JS}_{\theta ,\phi }^{\text {smooth}} \end{aligned}$$

(11)

where $\text {JS}$ denotes Jensen-Shannon divergence and $\varvec{x}_r$ refers to the gene expression profile of the neighboring unit r of unit i. The log-likelihood $\log {p_{\theta }(\varvec{x}_i)}$ for the target unit i can be decomposed as:

$$\begin{aligned} \log {p_{\theta }(\varvec{x}_i)}&=\text {KL}\left(q_{\phi }(y_{i},\varvec{z}_i|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}})\| p_{\theta }(y_i,\varvec{z}_i|\varvec{x}_i)\right)\nonumber \\&+E_{q_{\phi }(y_{i},\varvec{z}_i|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}})}\log {\frac{p_{\theta }(y_i,\varvec{z}_i,\varvec{x}_i)}{q_{\phi }(y_{i},\varvec{z}_i|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}})}} \end{aligned}$$

(12)

stDyer aims to reconstruct $\varvec{x}_i$ using its neighboring units $\varvec{x}_r$ to promote similarity in their spatial domain labels and latent embeddings. Thus, Eq. 12 could be rewritten as:

$$\begin{aligned} \log {p_{\theta }(\varvec{x}_i)}&=\text {KL}\left(q_{\phi }(y_{r},\varvec{z}_r|\varvec{x}_r,\{\varvec{x}_r\}^{\text {neigh}})\| p_{\theta }(y_i,\varvec{z}_i|\varvec{x}_i)\right)\nonumber \\&+E_{q_{\phi }(y_{r},\varvec{z}_r|\varvec{x}_r,\{\varvec{x}_r\}^{\text {neigh}})}\log {\frac{p_{\theta }(y_i,\varvec{z}_i,\varvec{x}_i)}{q_{\phi }(y_{r},\varvec{z}_r|\varvec{x}_r,\{\varvec{x}_r\}^{\text {neigh}})}},\nonumber \\&\quad r=1,2,\cdots ,k_i^{\text {neigh}} \end{aligned}$$

(13)

By taking the average of both sides of the two equations, Eq. 12 and the mean of Eq. 13 over $r=1,2,\cdots ,k_i^{\text {neigh}}$, we derived:

$$\begin{aligned} \log {p_{\theta }(\varvec{x}_i)}&=\frac{1}{2k_i^{\text {neigh}}}\sum _{r=1}^{k_i^{\text {neigh}}}\left[\vphantom{\frac{0}{1}}\text {KL}\left(q_{\phi }(y_{i},\varvec{z}_i|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}})\| p_{\theta }(y_i,\varvec{z}_i|\varvec{x}_i)\right)\right.\nonumber \\&\quad +\text {KL}\left(q_{\phi }(y_{r},\varvec{z}_r|\varvec{x}_r,\{\varvec{x}_r\}^{\text {neigh}})\| p_{\theta }(y_i,\varvec{z}_i|\varvec{x}_i)\right)\nonumber \\&\quad +E_{q_{\phi }(y_{i},\varvec{z}_i|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}})}\log {\frac{p_{\theta }(y_i,\varvec{z}_i,\varvec{x}_i)}{q_{\phi }(y_{i},\varvec{z}_i|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}})}}\nonumber \\&\quad +E_{q_{\phi }(y_{r},\varvec{z}_r|\varvec{x}_r,\{\varvec{x}_r\}^{\text {neigh}})}\log\left. {\frac{p_{\theta }(y_i,\varvec{z}_i,\varvec{x}_i)}{q_{\phi }(y_{r},\varvec{z}_r|\varvec{x}_r,\{\varvec{x}_r\}^{\text {neigh}})}}\right]\nonumber \\&=\frac{1}{k_i^{\text {neigh}}}\sum _{r=1}^{k_i^{\text {neigh}}}{G_{\theta ,\phi }^r}-\frac{1}{2}(L_{\text {self}}^{i}+L_{\text {neigh}}^{i})\end{aligned}$$

(14)

$$\begin{aligned} L_{\text {self}}^{i}&{\overset{\textrm{def}}{=}}E_{q_{\phi }(y_{i},\varvec{z}_i|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}})}\log {\frac{q_{\phi }(y_{i},\varvec{z}_i|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}})}{p_{\theta }(y_i,\varvec{z}_i,\varvec{x}_i)}}\end{aligned}$$

(15)

$$\begin{aligned} L_{\text {neigh}}^{i}&{\overset{\textrm{def}}{=}}E_{q_{\phi }(y_{r},\varvec{z}_r|\varvec{x}_r,\{\varvec{x}_r\}^{\text {neigh}})}\log {\frac{q_{\phi }(y_{r},\varvec{z}_r|\varvec{x}_r,\{\varvec{x}_r\}^{\text {neigh}})}{p_{\theta }(y_i,\varvec{z}_i,\varvec{x}_i)}}\end{aligned}$$

(16)

$$\begin{aligned} G_{\theta ,\phi }^r&{\overset{\textrm{def}}{=}}\frac{1}{2}\Bigg [E_{q_{\phi }(y_{i},\varvec{z}_i|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}})}\log {\frac{q_{\phi }(y_{i},\varvec{z}_i|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}})}{p_{\theta }(y_i,\varvec{z}_i|\varvec{x}_i)}}\Bigg .\nonumber \\&\quad \Bigg .+E_{q_{\phi }(y_{r},\varvec{z}_r|\varvec{x}_r,\{\varvec{x}_r\}^{\text {neigh}})}\log {\frac{q_{\phi }(y_{r},\varvec{z}_r|\varvec{x}_r,\{\varvec{x}_r\}^{\text {neigh}})}{p_{\theta }(y_i,\varvec{z}_i|\varvec{x}_i)}}\Bigg ] \end{aligned}$$

(17)

As $\text {JS}_{\theta ,\phi }^{\text {smooth}(r)}$ was the lower bound of $G_{\theta ,\phi }^r$ (Additional file 1: Note S4), the objective function Eq. 11 could be reformatted with Eq. 14 as:

$$\begin{aligned}&\max _{\theta ,\phi }\sum _{i=1}^{n}{\log {p_{\theta }(\varvec{x}_i)}-\text {JS}_{\theta ,\phi }^{\text {smooth}}}=\max _{\theta ,\phi }\sum _{i=1}^{n}{\log {p_{\theta }(\varvec{x}_i)}-G_{\theta ,\phi }}\nonumber \\ =&\min _{\theta ,\phi }\sum _{i=1}^{n}{\frac{1}{2}(L_{\text {self}}^{i}+L_{\text {neigh}}^{i})}{\overset{\textrm{def}}{=}}\min {L_{\text {GMVAE}}} \end{aligned}$$

(18)

As defined in Eqs. 15 and 16, $L_{\text {self}}^{i}$ and $L_{\text {neigh}}^{i}$ can be decomposed as:

$$\begin{aligned} L_{\text {self}}^{i}&=L_{\text {rec}}^{\text {self}}+L_{\text {gauss}}^{\text {self}}+L_{\text {cat}}^{\text {self}}\end{aligned}$$

(19)

$$\begin{aligned} L_{\text {neigh}}^{i}&=L_{\text {rec}}^{\text {neigh}}+L_{\text {gauss}}^{\text {neigh}}+L_{\text {cat}}^{\text {neigh}}=\frac{1}{k_i^{\text {neigh}}}\sum _{r=1}^{k_i^{\text {neigh}}}{L_{\text {rec}}^r+L_{\text {gauss}}^r+L_{\text {cat}}^r}\end{aligned}$$

(20)

$$\begin{aligned} L_{\text {rec}}^{\text {self}}&{\overset{\textrm{def}}{=}}E_{q_{\phi }(y_{i},\varvec{z}_i|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}})}{\log {\frac{1}{p_{\theta }(\varvec{x}_i|\varvec{z}_i)}}}\end{aligned}$$

(21)

$$\begin{aligned} L_{\text {gauss}}^{\text {self}}&\overset{\textrm{def}}{=}E_{q_{\phi }(y_{i},\varvec{z}_i|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}})}\log {\frac{q_{\phi }(\varvec{z}_i|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}},y_{i})}{p_{\theta }(\varvec{z}_i|y_i)}}\end{aligned}$$

(22)

$$\begin{aligned} L_{\text {cat}}^{\text {self}}&\overset{\textrm{def}}{=}E_{q_{\phi }(y_{i},\varvec{z}_i|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}})}\log {\frac{q_{\phi }(y_{i}|\varvec{x}_i,\{\varvec{x}_i\}^{\text {neigh}})}{p_{\theta }(y_i)}}\end{aligned}$$

(23)

$$\begin{aligned} L_{\text {rec}}^{\text {neigh}}&\overset{\textrm{def}}{=}\frac{1}{k_i^{\text {neigh}}}\sum _{r=1}^{k_i^{\text {neigh}}}{L_{\text {rec}}^r}\end{aligned}$$

(24)

$$\begin{aligned} L_{\text {gauss}}^{\text {neigh}}&\overset{\textrm{def}}{=}\frac{1}{k_i^{\text {neigh}}}\sum _{r=1}^{k_i^{\text {neigh}}}{L_{\text {gauss}}^r}\end{aligned}$$

(25)

$$\begin{aligned} L_{\text {cat}}^{\text {neigh}}&\overset{\textrm{def}}{=}\frac{1}{k_i^{\text {neigh}}}\sum _{r=1}^{k_i^{\text {neigh}}}{L_{\text {cat}}^r}\end{aligned}$$

(26)

$$\begin{aligned} L_{\text {rec}}^r&\overset{\textrm{def}}{=}E_{q_{\phi }(y_{r},\varvec{z}_r|\varvec{x}_r,\{\varvec{x}_r\}^{\text {neigh}})}{\log {\frac{1}{p_{\theta }(\varvec{x}_i|\varvec{z}_i)}}}\end{aligned}$$

(27)

$$\begin{aligned} L_{\text {gauss}}^r&\overset{\textrm{def}}{=}E_{q_{\phi }(y_{r},\varvec{z}_r|\varvec{x}_r,\{\varvec{x}_r\}^{\text {neigh}})}{\log {\frac{q_{\phi }(\varvec{z}_r|\varvec{x}_r,\{\varvec{x}_r\}^{\text {neigh}},y_{r})}{p_{\theta }(\varvec{z}_i|y_i)}}}\end{aligned}$$

(28)

$$\begin{aligned} L_{\text {cat}}^r&\overset{\textrm{def}}{=}E_{q_{\phi }(y_{r},\varvec{z}_r|\varvec{x}_r,\{\varvec{x}_r\}^{\text {neigh}})}{\log {\frac{q_{\phi }(y_{r}|\varvec{x}_r,\{\varvec{x}_r\}^{\text {neigh}})}{p_{\theta }(y_i)}}} \end{aligned}$$

(29)

Here, $L_{\text {rec}}^{\text {self}}$ represents the loss when reconstructing the target unit from its own embedding, while $L_{\text {rec}}^r$ represents the loss when reconstructing the target unit from the embedding of its rth neighboring unit. Both $L_{\text {gauss}}^{\text {self}}$ and $L_{\text {gauss}}^r$ regularize the parameters of the posterior distribution to ensure consistency with those of the prior distribution in each mixture for the target unit i and its rth neighboring unit, respectively. Finally, $L_{\text {cat}}^{\text {self}}$ and $L_{\text {cat}}^r$ encourage the alignment between the prior distribution and the posterior distribution for the predicted GMM assignment of the target unit and its rth neighbor unit.

Using previously defined generative and inference processes, $L_{\text {rec}}^{\text {self}}$, $L_{\text {gauss}}^{\text {self}}$, $L_{\text {cat}}^{\text {self}}$, $L_{\text {rec}}^{r}$, $L_{\text {gauss}}^{r}$, and $L_{\text {cat}}^{r}$ can be further expanded as:

$$\begin{aligned} L_{\text {rec}}^{\text {self}}&=\sum _{j=1}^{k^{\text {cluster}}}\frac{1}{2}\hat{\varvec{Y}}_{i}(j)*\left[\left(\frac{\hat{\varvec{X}}_{i,j}-\varvec{X}_{i}}{\varvec{\sigma }_{\varvec{x}_{i,j}}}\right)^2+\log {2\pi \varvec{\sigma }_{\varvec{x}_{i,j}}^2}\right]\end{aligned}$$

(30)

$$\begin{aligned} L_{\text {rec}}^r&=\sum _{j=1}^{k^{\text {cluster}}}\frac{1}{2}\hat{\varvec{Y}}_{r}(j)*\left[\left(\frac{\hat{\varvec{X}}_{r,j}-\varvec{X}_{i}}{\varvec{\sigma }_{\varvec{x}_{r,j}}}\right)^2+\log {2\pi \varvec{\sigma }_{\varvec{x}_{r,j}}^2}\right],r=1,2,\cdots ,k_i^{\text {neigh}}\end{aligned}$$

(31)

$$\begin{aligned} L_{\text {gauss}}^{\text {self}}&=\sum _{j=1}^{k^{\text {cluster}}}\frac{1}{2}\hat{\varvec{Y}}_{i}(j)*\left[\textrm{e}^{\log {{\varvec{\sigma }_{i,j}^{\text {post}}}^2}}-\textrm{e}^{\log {{\varvec{\sigma }_{j}^{\text {prior}}}^2}}+\frac{(\varvec{\mu }_{i,j}^{\text {post}}-\varvec{\mu }_{j}^{\text {prior}})^2}{\textrm{e}^{\log {{\varvec{\sigma }_{j}^{\text {prior}}}^2}}}\right.\nonumber \\&\quad \left.-\text {dim}(\varvec{\mu }_{j}^{\text {prior}})+\log {{\varvec{\sigma }_{j}^{\text {prior}}}^2}-\log {{\varvec{\sigma }_{i,j}^{\text {post}}}^2}\vphantom{\frac{1}{2}}\right]\end{aligned}$$

(32)

$$\begin{aligned} L_{\text {gauss}}^r&=\sum _{j=1}^{k^{\text {cluster}}}\frac{1}{2}\hat{\varvec{Y}}_{r}(j)*\left[\textrm{e}^{\log {{\varvec{\sigma }_{r,j}^{\text {post}}}^2}}-\textrm{e}^{\log {{\varvec{\sigma }_{j}^{\text {prior}}}^2}}+\frac{(\varvec{\mu }_{r,j}^{\text {post}}-\varvec{\mu }_{j}^{\text {prior}})^2}{\textrm{e}^{\log {{\varvec{\sigma }_{j}^{\text {prior}}}^2}}}\right.\nonumber \\&\quad \left.\vphantom{\frac{1^{1}}{2_{2}}}-\text {dim}(\varvec{\mu }_{j}^{\text {prior}})+\log {{\varvec{\sigma }_{j}^{\text {prior}}}^2}-\log {{\varvec{\sigma }_{r,j}^{\text {post}}}^2}\right],\nonumber \\&r=1,2,\cdots ,k_i^{\text {neigh}}\end{aligned}$$

(33)

$$\begin{aligned} L_{\text {cat}}^{\text {self}}&=\text {KL}(\hat{\varvec{Y}}_{i}\| \frac{1}{k^{\text {cluster}}})\end{aligned}$$

(34)

$$\begin{aligned} L_{\text {cat}}^r&=\text {KL}(\hat{\varvec{Y}}_{r}\| \frac{1}{k^{\text {cluster}}}),r=1,2,\cdots ,k_i^{\text {neigh}} \end{aligned}$$

(35)

where $\varvec{X}_i$ represents the expression profile of the target unit i and $\hat{\varvec{X}}_{i,j}$ denotes the predicted $\varvec{X}_i$ from the jth mode of GMM. The $\text {dim}(\varvec{\mu }_{j}^{\text {prior}})$ denotes the dimensionality of $\varvec{\mu }_{j}^{\text {prior}}$.

Training strategies of stDyer

stDyer adopted a two-stage training process. In the first stage (1–50 epochs), stDyer was trained based on the spatial KNN graph and the Gaussian losses ($L_{\text {gauss}}^{\text {self}}$ and $L_{\text {gauss}}^{\text {neigh}}$) was ignored to enable the model to focus more on the reconstruction losses. In the second stage trained on dynamic graphs, we applied either mclust [36] or Louvain and hierarchical clustering to group unit embeddings after the 50th epoch and generate pseudo spatial domain labels $\tilde{\varvec{Y}}$ for all units. We then initialized predicted spatial domain labels $\hat{\varvec{Y}}$ using $\tilde{\varvec{Y}}$ using cross-entropy loss (Additional file 1: Eq. S17) for 10 epochs (Additional file 1: Note S5). From the 61st to the 200th epoch, $L_{\text {GMVAE}}$ was employed for model training and spatial domain clustering. We utilized an Adam optimizer to optimize these objective functions with a learning rate of 0.0001 and training epochs of 200. The prior distribution of unit domain labels was initialized as a uniform distribution and updated based on unit temporary spatial domain labels from the last epoch with a learning rate of 0.1.

Spatial domain label refinement

To refine the spatial domains with outlier labels, we used sklearn.neural_network.MLPClassifier to construct the autoencoder with mean squared error (MSE) as the loss function. For model training, the input and output of this autoencoder are spatial coordinates after min-max normalization and spatial domain labels from the modified GMVAE. We up-sampled the units from the smallest domain during model training to avoid this domain disappearing after refinement. This was done by scaling them up to the unit number of the second-smallest domains.

Integrated gradient analysis

We incorporated the integrated gradient analysis [37] in stDyer to identify SVGs. The IG vector was used to represent the effect of gene g in predicting the spatial domain label of unit i.

$$\begin{aligned} \varvec{IG}_{ig}=\varvec{X}_{i\cdot }\odot \left( \frac{1}{2}\times \frac{\partial (\frac{1}{m}\hat{\varvec{Y}}_{ig})}{\partial \varvec{X}_{i\cdot }}+\sum _{p=2}^{m-1}\frac{\partial (\frac{p}{m}\hat{\varvec{Y}}_{ig})}{\partial \varvec{X}_{i\cdot }}+\frac{1}{2}\times \frac{\partial (\frac{m}{m}\hat{\varvec{Y}}_{ig})}{\partial \varvec{X}_{i\cdot }}\right) \times \frac{1}{m} \end{aligned}$$

(36)

where m (50 by default) is the number of steps needed in the summation procedure in Eq. 36 to estimate the definite integral. $\varvec{X}$ and $\hat{\varvec{Y}}$ are gene expression profiles of units and their predicted spatial domain labels, respectively. We averaged IG vectors of gene g across all units for each spatial domain to estimate its impact in prediction. Thus, SVGs for each domain were selected if their IGs within the top 50 largest IGs.

Evaluation metrics

We used Adjusted Rand Index (ARI) to evaluate the performance of spatial domain clustering [38]. ARI is a score computed between ground truth domain labels and predicted domain labels, ranging from − 1 to 1. All methods are required to predict the same number of spatial domains as the ground truth spatial domain number. The influences of the predicted spatial domain number on ARI scores are described in Additional file 1: Note S6. A higher ARI score indicates better clustering performance. ARI is defined as the following given the ground truth spatial domain $\varvec{A}=(a_1,a_2,\cdots ,a_t)$ and the predicted spatial domain $\varvec{B}=(b_1,b_2,\cdots ,b_p)$:

$$\begin{aligned} ARI={\frac{\sum _{i,j}{\left( \begin{array}{c}{c_{i,j}}\\ {2}\end{array}\right) }-\left[ \sum _{i}{\left( \begin{array}{c}{a_{i}}\\ {2}\end{array}\right) }\sum _{j}{\left( \begin{array}{c}{b_{j}}\\ {2}\end{array}\right) }\right]\left/ {\left( \begin{array}{c}{n}\\ {2}\end{array}\right) }\right.}{{\frac{1}{2}}\left[ \sum _{i}{\left( \begin{array}{c}{a_{i}}\\ {2}\end{array}\right) }+\sum _{j}{\left( \begin{array}{c}{b_{j}}\\ {2}\end{array}\right) }\right] -\left[ \sum _{i}{\left( \begin{array}{c}{a_{i}}\\ {2}\end{array}\right) }\sum _{j}{\left( \begin{array}{c}{b_{j}}\\ {2}\end{array}\right) }\right] \left/{\left( \begin{array}{c}{n}\\ {2}\end{array}\right) }\right.}} \end{aligned}$$

(37)

where $c_{i,j}=|a_i\cap b_j|$, and n is the number of units. We computed the p-value of ARI on the multi-slice DLPFC dataset using the Wilcoxon rank-sum test with stat_compare_means function in ggpubr package [39]. Silhouette score [24] is used to evaluate the separateness of embeddings. A higher Silhouette score indicates better separateness of embeddings, while a lower Silhouette score indicates worse separateness of embeddings (i.e., overlapping). Silhouette score is computed as the average Silhouette coefficients of all units. The Silhouette coefficient for a unit is computed as:

$$\begin{aligned} s = \frac{b - a}{max(a, b)} \end{aligned}$$

(38)

where a is the average distance between a unit and all other units in the same spatial domain and b is the average distance between a unit and all other units in the nearest spatial domain. We computed the p-value of the Pearson correlation coefficient using the stat_cor function in ggpubr package [39]. Davies-Bouldin score (DBS) [40] is used to evaluate the separateness between clusters and variation within clusters of embeddings. A lower score indicates higher separateness and lower variation, with the lowest possible DBS being 0. DBS is defined as the average similarity between a spatial domain i and its most similar spatial domain j. The similarity between two spatial domains i and j is computed as:

$$\begin{aligned} R_{ij} = \frac{s_i + s_j}{d_{ij}} \end{aligned}$$

(39)

where $s_i$ is the average distance between each unit in the spatial domain i and the centroid of the spatial domain i, and $d_{ij}$ is the distance between centroids of spatial domains i and j. Then DBS can be formulated as:

$$\begin{aligned} DBS = \frac{1}{k} \sum _{i=1}^k \max _{i \ne j} R_{ij} \end{aligned}$$

(40)

where k is the number of spatial domains. The settings of hyper-parameters of stDyer and the other involved methods were shown in Additional file 1: Note S5 and Note S7.

Data availability

The source code and processed datasets below together with reproducible Jupyter notebooks are available on Zenodo (https://doiorg.publicaciones.saludcastillayleon.es/10.5281/zenodo.14438018) [41] under the MIT license. The human dorsolateral prefrontal cortex dataset [1] is available at https://research.libd.org/spatialLIBD/ [42] in h5 format. The mouse cortex (osmFISH) dataset [13] can be downloaded at http://linnarssonlab.org/osmFISH/availability/ [43] in loom format. The mouse cortex (STARmap) dataset [28] is accessible at https://figshare.com/articles/dataset/STARmap_datasets/22565209 [44]. The zebrafish tumor dataset [14] is available at the Gene Expression Omnibus (GEO) under accession number “GSE159709” [45]. The Stereo-seq datasets [2] are downloaded from https://db.cngb.org/stomics/mosta/download/ [46]. The source code of stDyer is publicly available on GitHub (https://github.com/ericcombiolab/stDyer) [47] as well as in the “stDyer_repr” directory on Zenodo (https://doiorg.publicaciones.saludcastillayleon.es/10.5281/zenodo.14438018) [41] under the MIT license.

References

Maynard KR, Collado-Torres L, Weber LM, Uytingco C, Barry BK, Williams SR, et al. Transcriptome-scale spatial gene expression in the human dorsolateral prefrontal cortex. Nat Neurosci. 2021;24(3):425–36. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41593-020-00787-0.
Article CAS PubMed PubMed Central Google Scholar
Chen A, Liao S, Cheng M, Ma K, Wu L, Lai Y, et al. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell. 2022;0(0). https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.cell.2022.04.003.
Zhu Q, Shah S, Dries R, Cai L, Yuan GC. Identification of spatially associated subpopulations by combining scRNAseq and sequential fluorescence in situ hybridization data. Nat Biotechnol. 2018;36(12):1183–90. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/nbt.4260.
Article CAS Google Scholar
Zhao E, Stone MR, Ren X, Guenthoer J, Smythe KS, Pulliam T, et al. Spatial transcriptomics at subspot resolution with BayesSpace. Nat Biotechnol. 2021;1–10. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41587-021-00935-2.
Li Z, Zhou X. BASS: multi-scale and multi-sample analysis enables accurate cell type clustering and spatial domain detection in spatial transcriptomic studies. Genome Biol. 2022;23(1):168. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13059-022-02734-7.
Article CAS PubMed PubMed Central Google Scholar
Duan B, Chen S, Cheng X, Liu Q. Multi-slice spatial transcriptome domain analysis with SpaDo. Genome Biol. 2024;25(1):73. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13059-024-03213-x.
Article PubMed PubMed Central Google Scholar
Varrone M, Tavernari D, Santamaria-Martínez A, Walsh LA, Ciriello G. Cell Charter reveals spatial cell niches associated with tissue remodeling and cell plasticity. Nat Genet. 2024;56(1):74–84. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41588-023-01588-4.
Article CAS PubMed Google Scholar
Lopez R, Regier J, Cole MB, Jordan MI, Yosef N. Deep generative modeling for single-cell transcriptomics. Nat Methods. 2018;15(12):1053–8. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41592-018-0229-2.
Article CAS PubMed PubMed Central Google Scholar
Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. 2016. https://openreview.net/forum?id=SJU4ayYgl. Accessed 22 Jan 2022.
Hu J, Li X, Coleman K, Schroeder A, Ma N, Irwin DJ, et al. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat Methods. 2021;1–10. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41592-021-01255-8.
Dong K, Zhang S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat Commun. 2022;13(1):1739. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41467-022-29439-6.
Article CAS PubMed PubMed Central Google Scholar
Long Y, Ang KS, Li M, Chong KLK, Sethi R, Zhong C, et al. Spatially informed clustering, integration, and deconvolution of spatial transcriptomics with GraphST. Nat Commun. 2023;14(1):1155. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41467-023-36796-3.
Article CAS PubMed PubMed Central Google Scholar
Codeluppi S, Borm LE, Zeisel A, La Manno G, van Lunteren JA, Svensson CI, et al. Spatial organization of the somatosensory cortex revealed by osmFISH. Nat Methods. 2018;15(11):932–5. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41592-018-0175-z.
Article CAS PubMed Google Scholar
Hunter MV, Moncada R, Weiss JM, Yanai I, White RM. Spatially resolved transcriptomics reveals the architecture of the tumor-microenvironment interface. Nat Commun. 2021;12(1):6278. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41467-021-26614-z.
Article CAS PubMed PubMed Central Google Scholar
Cang Z, Ning X, Nie A, Xu M, Zhang J. SCAN-IT: domain segmentation of spatial transcriptomics images by graph neural network. vol. 32. 2021. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9552951/. Accessed 21 Jan 2023.
Ren H, Walker BL, Cang Z, Nie Q. Identifying multicellular spatiotemporal organization of cells with SpaceFlow. Nat Commun. 2022;13(1):4076. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41467-022-31739-w.
Article CAS PubMed PubMed Central Google Scholar
Xu C, Jin X, Wei S, Wang P, Luo M, Xu Z, et al. DeepST: identifying spatial domains in spatial transcriptomics by deep learning. Nucleic Acids Res. 2022;50(22):e131. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/nar/gkac901.
Article CAS PubMed PubMed Central Google Scholar
Yuan Z, Zhao F, Lin S, Zhao Y, Yao J, Cui Y, et al. Benchmarking spatial clustering methods with spatially resolved transcriptomics data. Nat Methods. 2024;1–11. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41592-024-02215-8.
Hu Y, Xie M, Li Y, Rao M, Shen W, Luo C, et al. Benchmarking clustering, alignment, and integration methods for spatial transcriptomics. Genome Biol. 2024;25(1):212. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13059-024-03361-0.
Article PubMed PubMed Central Google Scholar
Dilokthanakul N, Mediano PAM, Garnelo M, Lee MCH, Salimbeni H, Arulkumaran K, et al. Deep unsupervised clustering with gaussian mixture variational autoencoders. https://arxiv.org/pdf/1611.02648. Accessed 19 July 2022.
Jiang Z, Zheng Y, Tan H, Tang B, Zhou H. Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17. 2017. pp. 1965–72. https://doiorg.publicaciones.saludcastillayleon.es/10.24963/ijcai.2017/273.
Yang L, Cheung NM, Li J, Fang J. Deep Clustering by Gaussian Mixture Variational Autoencoders With Graph Embedding. In: 2019 International Conference on Computer Vision. Piscataway: IEEE; 2019. pp. 6439–48. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/ICCV.2019.00654.
Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio. Graph Attention Networks. 2022. https://openreview.net/forum?id=rJXMpikCZ. Accessed 4 Nov 2022.
Rousseeuw PJ. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53–65. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/0377-0427(87)90125-7.
Article Google Scholar
Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM, et al. Comprehensive Integration of Single-Cell Data. Cell. 2019;177(7):1888-1902.e21. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.cell.2019.05.031.
Article CAS PubMed PubMed Central Google Scholar
Rebecca VW, Nicastri MC, Fennelly C, Chude CI, Barber-Rotenberg JS, Ronghe A, et al. PPT1 Promotes Tumor Growth and Is the Molecular Target of Chloroquine Derivatives in Cancer. Cancer Discov. 2019;9(2):220–9. https://doiorg.publicaciones.saludcastillayleon.es/10.1158/2159-8290.CD-18-0706.
Article CAS PubMed Google Scholar
Bowley TY, Lagutina IV, Francis C, Sivakumar S, Selwyn RG, Taylor E, et al. The RPL/RPS gene signature of melanoma CTCs associates with brain metastasis. Cancer Res Commun. 2022;2(11):1436–48. https://doiorg.publicaciones.saludcastillayleon.es/10.1158/2767-9764.CRC-22-0337.
Article CAS PubMed PubMed Central Google Scholar
Wang X, Allen WE, Wright MA, Sylwestrak EL, Samusik N, Vesuna S, et al. Three-dimensional intact-tissue sequencing of single-cell transcriptional states. Science. 2018;361(6400). https://doiorg.publicaciones.saludcastillayleon.es/10.1126/science.aat5691.
Baldarelli RM, Smith CL, Ringwald M, Richardson JE, Bult CJ. Mouse Genome Informatics: an integrated knowledgebase system for the laboratory mouse. Genetics. 2024;227(1). https://doiorg.publicaciones.saludcastillayleon.es/10.1093/genetics/iyae031.
Safran M, Rosen N, Twik M, BarShir R, Stein TI, Dahary D, et al. The GeneCards Suite. In: Abugessaisa I, Kasukawa T, editors. Practical Guide to Life Science Databases. Singapore: Springer Nature Singapore. 2021. pp. 27–56. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/978-981-16-5812-9_2.
Tutukova S, Tarabykin V, Hernandez-Miranda LR. The Role of Neurod Genes in Brain Development, Function, and Disease. Front Mol Neurosci. 2021;14:662774. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fnmol.2021.662774.
Article CAS PubMed PubMed Central Google Scholar
Yan W, Long P, Chen T, Liu W, Yao L, Ren Z, et al. A Natural Occurring Mouse Model with Adgrv1 Mutation of Usher Syndrome 2C and Characterization of its Recombinant Inbred Strains. Cell Physiol Biochem Int J Exp Cell Physiol Biochem Pharmacol. 2018;47(5):1883–97. https://doiorg.publicaciones.saludcastillayleon.es/10.1159/000491068.
Article CAS Google Scholar
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. 2016. pp. 770–8. https://openaccess.thecvf.com/content_cvpr_2016/html/He_Deep_Residual_Learning_CVPR_2016_paper.html. Accessed 28 Jan 2023.
Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 2018;19(1):15. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13059-017-1382-0.
Article PubMed PubMed Central Google Scholar
Liu X, Zeira R, Raphael BJ. Partial alignment of multislice spatially resolved transcriptomics data. Genome Res. 2023;33(7):1124–32. https://doiorg.publicaciones.saludcastillayleon.es/10.1101/gr.277670.123.
Article CAS PubMed PubMed Central Google Scholar
Scrucca L, Fop M, Murphy TB, Raftery AE. mclust 5: Clustering, Classification and Density Estimation Using Gaussian Finite Mixture Models. The R Journal. 2016;8(1):289–317. https://doiorg.publicaciones.saludcastillayleon.es/10.32614/RJ-2016-021.
Article PubMed PubMed Central Google Scholar
Sundararajan M, Taly A, Yan Q. Axiomatic Attribution for Deep Networks. In: Precup D, Teh YW, editors. Proceedings of the 34th International Conference on Machine Learning. Proceedings of Machine Learning Research. vol. 70. New York: PMLR; 2017. pp. 3319–28. https://proceedings.mlr.press/v70/sundararajan17a.html.
Hubert L, Arabie P. Comparing partitions. J Classif. 1985;2(1):193–218. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/BF01908075.
Article Google Scholar
Kassambara A. ggpubr: ‘ggplot2’ Based Publication Ready Plots. R package version 0.6.0. 2023 https://rpkgs.datanovia.com/ggpubr/. Accessed 21 Feb 2023.
Davies DL, Bouldin DW. A Cluster Separation Measure. IEEE Trans Pattern Anal Mach Intell. 1979;PAMI–1(2):224–7. https://doiorg.publicaciones.saludcastillayleon.es/10.1109/TPAMI.1979.4766909.
Article Google Scholar
Xu K, Xu Y, Wang Z, Zhou XM, Zhang L. stDyer enables spatial domain clustering with dynamic graph embedding. Zenodo. 2024. https://doiorg.publicaciones.saludcastillayleon.es/10.5281/zenodo.14438018.
Collado-Torres L. spatialLIBD: an R/Bioconductor package to visualize spatially-resolved transcriptomics data. LIBD. Datasets. 2022. https://research.libd.org/spatialLIBD/. Accessed 5 May 2023.
Linnarsson S. Spatial organization of the somatosensory cortex revealed by osmFISH. Linnarsson Lab. Datasets. 2018. https://linnarssonlab.org/osmFISH/availability/. Accessed 28 Oct 2023.
Zhao F. STARmap* datasets. 2023. Figshare. Datasets. https://doiorg.publicaciones.saludcastillayleon.es/10.6084/m9.figshare.22565209.v1.
Hunter M. Spatially resolved transcriptomics reveals the architecture of the tumor-microenvironment interface. Gene Expression Omnibus. Datasets. 2021. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE159709. Accessed 21 Aug 2023.
Wang J. MOSTA: Mouse Organogenesis Spatiotemporal Transcriptomic Atlas. China National GeneBank. Datasets. 2022. https://db.cngb.org/stomics/mosta/download/. Accessed 29 Jan 2023.
Xu K, Xu Y, Wang Z, Zhou XM, Zhang L. stDyer enables spatial domain clustering with dynamic graph embedding. GitHub. 2024. https://github.com/ericcombiolab/stDyer. Accessed 26 Nov 2024.

Download references

Peer review information

Claudia Feng and George Inglis were the primary editors of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team. The peer-review history is available in the online version of this article.

Funding

This research was partially supported by Young Collaborative Research Grant (C2004-23Y), HMRF (11221026), HKBU Start-up Grant Tier 2 (RC-SGT2/19-20/SCI/007), and NIH NIGMS Maximizing Investigators’ Research Award (MIRA) R35 GM146960.

Author information

Authors and Affiliations

Department of Computer Science, Hong Kong Baptist University, Hong Kong, China
Ke Xu, Yu Xu, Zirui Wang & Lu Zhang
Department of Biomedical Engineering, Vanderbilt University, 2301 Vanderbilt Place, 37235, Nashville, Tennessee, USA
Xin Maizie Zhou
Institute of Systems Medicine and Health Sciences, School of Chinese Medicine, Hong Kong Baptist University, Hong Kong, China
Lu Zhang

Authors

Ke Xu
View author publications
You can also search for this author inPubMed Google Scholar
Yu Xu
View author publications
You can also search for this author inPubMed Google Scholar
Zirui Wang
View author publications
You can also search for this author inPubMed Google Scholar
Xin Maizie Zhou
View author publications
You can also search for this author inPubMed Google Scholar
Lu Zhang
View author publications
You can also search for this author inPubMed Google Scholar

Contributions

L.Z. and X.M.Z. conceived and supervised the project. K.X. and L.Z. designed the method, conducted the experiments, and wrote the original manuscript. Y.X. and Z.W. reviewed and polished the manuscript. K.X., X.M.Z., and L.Z. revised the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Xin Maizie Zhou or Lu Zhang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplementary Notes S1-S7 and Supplementary Figures S1-S17.

Additional file 2: Supplementary Tables S1-S5.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Xu, K., Xu, Y., Wang, Z. et al. stDyer enables spatial domain clustering with dynamic graph embedding. Genome Biol 26, 34 (2025). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13059-025-03503-y

Download citation

Received: 21 June 2024
Accepted: 12 February 2025
Published: 20 February 2025
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13059-025-03503-y

stDyer enables spatial domain clustering with dynamic graph embedding

Abstract

Background

Results

Workflow of stDyer

stDyer improves spatial domain identification on the DLPFC dataset from 10x Visium technology

stDyer supports multi-slice spatial domain clustering

stDyer recognizes both tumor and interface domains on a zebrafish tumor dataset from 10x Visium technology

stDyer generates spatial domains with smooth boundaries on the mouse cortex dataset from osmFISH

stDyer detects clear laminar structures on the mouse cortex dataset from STARmap

stDyer can be scaled up to accommodate large-scale mouse embryo datasets from Stereo-Seq

Discussion

Conclusions

Methods

Data preprocessing

Construct dynamic graphs

Preprocessing of SRT data from multiple slices

The network structure of stDyer

Probabilistic models of modified GMVAE in stDyer

The objective function of stDyer

Training strategies of stDyer

Spatial domain label refinement

Integrated gradient analysis

Evaluation metrics

Data availability

References

Peer review information

Funding

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher’s Note

Supplementary Information

Additional file 1: Supplementary Notes S1-S7 and Supplementary Figures S1-S17.

Additional file 2: Supplementary Tables S1-S5.

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Genome Biology

Contact us