Fig. 4 | Genome Biology

Fig. 4

From: Seqrutinator: scrutiny of large protein superfamily sequence datasets for the identification and elimination of non-functional homologues

Non-functional homologue sequences negatively affect MSA processing. A The number of reliable columns was computed with BMGE, entropy setting 0.8, for the MSAs made with the sequences before Seqrutinator (input) or the sequences retained after each module in the default pipeline. Species codes are as in Fig. 3. B Bar, boxplot, and raincloud density representations of the number of reliable columns of the final output MSAs and the pseudo-input MSAs of the BAHD, CYP, and UGT cases. The pseudo-input MSA was obtained by removing all NFH sequences and subsequently all gap columns from the input MSAs. Gray lines connect each output set with its corresponding pseudo-input set. * indicates a significant difference with p < 0.001 (Wilcoxon signed-rank test). C Density and boxplot showing the differences in the number of reliable columns between the pseudo-input and output MSAs of the BAHD, CYP, and UGT cases
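
The pseudo-input construction described in the caption can be illustrated with a minimal sketch. This is not the authors' code: the function name `pseudo_input_msa`, the dictionary representation of the alignment, and the reading of "gap columns" as columns that contain only gap characters once the NFH sequences are gone are all assumptions made for illustration.

```python
def pseudo_input_msa(msa, nfh_ids, gap_chars="-."):
    """Sketch of a pseudo-input MSA (hypothetical helper, not from the paper).

    msa      -- dict mapping sequence id -> aligned sequence (equal lengths)
    nfh_ids  -- set of ids flagged as non-functional homologues (NFHs)
    Assumption: a "gap column" is a column holding only gap characters
    in the sequences that remain after NFH removal.
    """
    # Step 1: drop every sequence flagged as an NFH.
    kept = {sid: seq for sid, seq in msa.items() if sid not in nfh_ids}
    if not kept:
        return {}

    length = len(next(iter(kept.values())))
    if any(len(seq) != length for seq in kept.values()):
        raise ValueError("aligned sequences must all have the same length")

    # Step 2: keep only columns with at least one non-gap residue.
    keep_cols = [
        i for i in range(length)
        if any(seq[i] not in gap_chars for seq in kept.values())
    ]
    return {sid: "".join(seq[i] for i in keep_cols) for sid, seq in kept.items()}


# Toy alignment: removing "nfh" leaves the third column as gaps only.
toy = {"s1": "MK-A-", "s2": "M--AV", "nfh": "MKLA-"}
result = pseudo_input_msa(toy, {"nfh"})
```

Because the remaining sequences are not realigned, the pseudo-input MSA keeps the original alignment of the retained sequences, which is what makes it a paired control for the output MSA built from the same sequences.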
