Fig. 2 | Genome Biology

Fig. 2

From: Polygraph: a software framework for the systematic assessment of synthetic regulatory DNA elements

Sequence analysis of native and designed yeast promoters.
A Box plots showing the distribution of the edit distance from the most similar strong native yeast promoter, for weak native yeast promoters (Native (Weak)), and synthetic promoter sequences designed by editing Native (Weak) promoters via directed evolution (Directed Evolution), gradient-based optimization (Gradient), or guided evolution (Guided Evolution).
B Box plots showing the distribution of GC content in each group of sequences. Native (Strong) represents strong native yeast promoters.
C Bar plot showing the number of differentially abundant 4-mers (Mann–Whitney U test FDR-adjusted p value < 0.01) in each group of promoters compared to Native (Strong).
D Number of differentially abundant transcription factor binding motifs (Mann–Whitney U test FDR-adjusted p value < 0.01) in each group of promoters compared to Native (Strong).
E Histogram of SFP1 motif start locations in each group of promoters.
F Non-negative matrix factorization (NMF) of the motif frequency matrix into 5 components. Each column represents a sequence and colors show the contribution of each factor to the motif composition of the sequence.
G Heatmap showing the top motifs and their contributions to each NMF component. The top 15 motifs ranked by maximum contribution to any NMF component are shown.
H Top 4 motifs enriched in factor 0.
I PCA visualization of sequence embeddings from the last convolutional layer of a sequence-to-expression predictive model, for all groups of promoter sequences. Ellipses represent the 95% confidence boundary of multivariate normal distributions fitted to the data.
J Box plots showing the distance of each sequence to its nearest neighbor in the Native (Strong) group, in the embedding space shown in I.
K Box plots showing the Euclidean distance of each sequence to its 5 nearest neighbors in the same group in the embedding space shown in I, a metric of within-group sequence diversity.
*: p < 0.05, **: p < 0.01, ***: p < 0.001
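
The per-sequence quantities behind panels A to C can be illustrated with a minimal sketch. This is not Polygraph's own API: the function names below (`edit_distance`, `min_edit_distance`, `gc_content`, `kmer_frequencies`) are hypothetical, the edit distance is assumed to be the Levenshtein distance, and the sequences in the usage lines are toy strings rather than real promoters. The Mann–Whitney U tests and FDR adjustment named in panels C and D are not reproduced here; the sketch stops at the per-sequence values those tests would compare.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete a character from a
                curr[j - 1] + 1,           # insert a character into a
                prev[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def min_edit_distance(seq: str, references: list[str]) -> int:
    """Distance from seq to the most similar reference (panel A idea)."""
    return min(edit_distance(seq, ref) for ref in references)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (panel B idea)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequencies(seq: str, k: int = 4) -> dict[str, float]:
    """Relative frequency of each overlapping k-mer in seq (panel C input)."""
    seq = seq.upper()
    n = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(n))
    return {kmer: count / n for kmer, count in counts.items()}


# Toy usage: stand-in strings, not real yeast promoters.
strong_refs = ["ACGTACGT", "GGGGCCCC"]
designed = "ACGTACGA"
nearest = min_edit_distance(designed, strong_refs)  # one substitution from the first reference
gc = gc_content(designed)
freqs = kmer_frequencies(designed, k=4)
```

Computing these values for every sequence in each group yields the distributions that box plots like those in panels A and B summarize; the per-sequence k-mer frequencies are the kind of input a group-versus-group rank test would take.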
