Fig. 2 | Genome Biology

Fig. 2

From: Seqrutinator: scrutiny of large protein superfamily sequence datasets for the identification and elimination of non-functional homologues


Schematic of the procedure with the default Seqrutinator pipeline. The workflow for protein superfamily sequence mining consists of three blocks (left). Block 1 prepares the input for the automated Seqrutinator pipeline of block 2. Block 2 shows Seqrutinator's modules in their default order, including possible iterations, which are indicated by circular arrows and described in the main text and Additional file 1: Supplemental Document 1. “NH Hit” means non-homologous hit. The various “_removed.fsa” files contain the sequences removed by each module; they can be analyzed in block 3, which is directed at the identification and recovery of inadvertently removed FH (functional homologue) sequences. The schematic multiple sequence alignments (MSAs) on the right show the truncation of the MSA in block 1 and, for each of the five modules of the automated pipeline, which sequences (indicated by triangles) are removed and why.
