Fig. 5

Seqrutinator is robust and flexible. A Sequence fate in different pipelines. Top: Alluvial plot showing the fate of the initial BAHD, CYP and UGT representatives (2003, 6782 and 3994 sequences, respectively, from 16 species sets (Ath10 for A. thaliana) and the curated SwissProt set) in different pipelines or following Pfam scans with the indicated cut-off thresholds. Bottom: Schematic illustration of the applied pipelines (S: SSR (1); N: NHHR (2); G: GIR (3); C: CGSR (4); P: PR (5); PS: Pfam Scan; A: accepted). α2 and α5 indicate pipelines with the stricter 2.35σ cut-off in NHHR and PR, respectively. B HMMERCTTER clustering of BAHD sequence sets. Top: Cluster-wise colored maximum likelihood trees and HMMERCTTER partitions of five BAHD sequence sets: Input, partition of the initial sequences; Pfam, partition of the sequences obtained with the most stringent Pfam scan (E-value 1E-50); 12345 def, partition of the sequences accepted by the default Seqrutinator pipeline; 4235 and 134, partitions of the sequences accepted by alternative Seqrutinator pipelines. Each cluster is automatically assigned a different color; black leaves are unclustered sequences (orphans). Bottom: Numerical summary of the clustering analysis of all nine tested datasets, showing the total number of sequences, the number and percentage of clustered sequences, the number of clusters and the cluster score ((clustered sequences - orphans)/total sequences). C Boxplots of the cluster sizes of the obtained HMMERCTTER partitions. The dotted lines show the mean and the standard deviation.
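The summary statistics in panels B (bottom) and C can be computed from a partition's cluster sizes. Below is a minimal sketch under stated assumptions. The function `summarize_partition` and the `ClusteringSummary` fields are hypothetical names, not part of Seqrutinator or HMMERCTTER. The sketch also assumes that every sequence not assigned to a cluster counts as an orphan (orphans = total - clustered), and it uses the sample standard deviation for the cluster-size spread.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class ClusteringSummary:
    total: int                # total number of sequences in the set
    clustered: int            # sequences assigned to any cluster
    orphans: int              # sequences left unclustered
    n_clusters: int           # number of clusters in the partition
    percent_clustered: float  # 100 * clustered / total
    cluster_score: float      # (clustered - orphans) / total
    mean_size: float          # mean cluster size
    sd_size: float            # sample standard deviation of cluster sizes


def summarize_partition(cluster_sizes: list[int], total: int) -> ClusteringSummary:
    """Summarize a partition given the size of each cluster and the set size.

    Hypothetical helper: assumes orphans are exactly the sequences
    that fall outside every cluster.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    clustered = sum(cluster_sizes)
    if clustered > total:
        raise ValueError("clustered sequences cannot exceed total sequences")
    orphans = total - clustered
    # Score ranges from -1 (nothing clustered) to 1 (no orphans).
    score = (clustered - orphans) / total
    # Guard the size statistics for partitions with zero or one cluster.
    mean_size = mean(cluster_sizes) if cluster_sizes else 0.0
    sd_size = stdev(cluster_sizes) if len(cluster_sizes) > 1 else 0.0
    return ClusteringSummary(
        total=total,
        clustered=clustered,
        orphans=orphans,
        n_clusters=len(cluster_sizes),
        percent_clustered=100.0 * clustered / total,
        cluster_score=score,
        mean_size=mean_size,
        sd_size=sd_size,
    )
```

For example, a made-up partition of 100 sequences into clusters of 40, 30 and 20 leaves 10 orphans, which gives 90% clustered and a cluster score of (90 - 10)/100 = 0.8. Because orphans are subtracted rather than ignored, the score penalizes unclustered sequences more strongly than the plain clustered percentage does.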