Fig. 1 | Genome Biology

Fig. 1

From: PRESCOTT: a population aware, epistatic, and structural model accurately predicts missense effects

Conceptual building blocks in ESCOTT and PRESCOTT. The ESCOTT model is described in panels A–J and PRESCOTT in panels K–L. A Given a protein sequence, for example a human one, ESCOTT examines its homologs and reconstructs a phylogenetic tree of the sequences, on which it evaluates the mutated position in the other species. The position of the query sequence in the tree is indicated by the red edge. B The effect of a mutation, such as P-to-R at position i, is predicted from sequences (panels C, F, and G) and from the structural model of the query sequence, if available (panels D and E). C ESCOTT estimates an evolutionary conservation term (TJET): for position j in the query sequence, occupied by the amino acid S, ESCOTT considers the level in the tree (dashed line, gray subtree) where the amino acid at that position appeared and remained conserved thereafter. ESCOTT defines the evolutionary conservation [23] of position j by looking at the height of maximal subtrees within the whole tree where the position is conserved, not necessarily with the same amino acid: here, five such maximal subtrees are shown, colored gray, cyan, brown and orange to differentiate amino acids. Two subtrees of maximum height (gray and cyan) are used to set the evolutionary conservation of the position. D, E Positions 500–756 of the human MLH1 protein are colored on the structural model according to ESCOTT’s model terms, which combine evolutionary conservation (TJET), physico-chemical properties (PC), and structural core positions (CV). F ESCOTT estimates an epistatic term that evaluates the effect of a P-to-R mutation as the minimal global number of changes needed to “accept” R within species in the tree. The further the sequences accepting R in the tree (green) are from the reference one (red), the greater the mutational effect.
G ESCOTT compares sequence positions by favoring residues that are more conserved in the tree, as shown for residue S at position j (panel G) vs residue P at position i (panel F). H The ESCOTT mutational map records mutational effects for all positions 500–756 of the human MLH1 protein. Mutation P-to-R at position 603 is highlighted and its score is reported with the corresponding color code. I Averages of the scores by column (across the 19 possible mutations) are reported for positions 500–756 of MLH1. J MLH1 structure colored with the average scores in I. K PRESCOTT combines ESCOTT scores and allele frequency in human populations, depending on whether the allele frequency is higher than a fixed threshold. L Allele frequencies are computed for the eight populations in gnomAD and a PopMax model is employed. Each missense mutation is analyzed independently with respect to the eight populations.
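
The arithmetic behind panels I and L can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the data layout (one dictionary of scores per position), the population labels, and the threshold value are all hypothetical. It also assumes that "PopMax" means taking the highest allele frequency across populations. How PRESCOTT then adjusts the ESCOTT score is not specified in this caption, so the sketch only reports whether the threshold is exceeded.

```python
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def position_averages(score_map, wild_type):
    """Panel I: average each position's scores over its 19 substitutions.

    score_map[k] maps each amino acid to a score at position k (hypothetical
    layout); wild_type[k] is the reference residue, which is excluded.
    """
    averages = []
    for k, wt in enumerate(wild_type):
        scores = [score_map[k][aa] for aa in AMINO_ACIDS if aa != wt]
        averages.append(mean(scores))
    return averages


def popmax(allele_freqs):
    """Panel L (assumed meaning): highest allele frequency across populations."""
    return max(allele_freqs.values())


def exceeds_threshold(allele_freqs, threshold):
    """Panel K: is the variant's PopMax frequency above a fixed threshold?"""
    return popmax(allele_freqs) > threshold


# Usage with made-up numbers (population labels and threshold are illustrative).
toy_scores = {aa: float(i) for i, aa in enumerate(AMINO_ACIDS)}
print(position_averages([toy_scores], "A"))
print(exceeds_threshold({"pop_1": 0.001, "pop_2": 0.02}, threshold=0.01))
```

Each variant is passed to `exceeds_threshold` on its own, mirroring the caption's statement that every missense mutation is analyzed independently with respect to the populations.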
