Selectome © 2008/2017   

Filtering steps

Selectome applies several quality filters to improve CodeML computations and to discard low-quality families and MSA regions:
  • Discard families without targeted taxa sequences (e.g. without Euteleostomi taxa)
  • Discard families with fewer than 6 sequences
    • Increases the statistical power of codeml and of the methods that follow
  • Filter by sequence with MaxAlign
    • Replace 'X' amino acids by gaps '-' in all sequences for this step
    • Then MaxAlign removes short sequences that disrupt the alignment
  • Repeat step 2 (discard families with fewer than 6 sequences) on the MaxAlign-filtered families
  • Align sequences with Pagan, a phylogeny(-indel)-aware alignment method using partial-order graphs
    • Isolation of non-homologous regions
  • Compute MCoffee scores on the Pagan alignment
    • Replace residues with a low score (0 to 7) by 'x' and keep residues scoring 8 or 9 (lowercase 'x' distinguishes masked residues from the original 'X'/undefined residues)
    • Follow the same MCoffee methods used by Ensembl Compara
  • Compute Guidance scores on the Pagan alignment, using Pagan as the aligner
    • Replace residues with low score (≤ 0.93) by 'x'
  • Merge MCoffee and Guidance scores
  • Map masking on related nucleotide MSA
    • Replace selenocysteine and stop codons by 'x' because codeml cannot handle them
  • TrimAl removes columns with fewer than 4 residues
    • Discard poor alignment columns
  • Run M0 model
    • Pre-compute some parameters for faster branch-site model convergence
    • Shift to mitochondrial genetic code if required
  • Run Branch-site model with slimcodeml
    • From this point on, use the branch lengths estimated by the M0 model, and use the M0 parameter estimates as initial parameters for the branch-site model computations
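
The size filter, masking and column-trimming steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not Selectome's actual pipeline code: every function name and the codon_mask parameter are hypothetical, the per-residue MCoffee and Guidance scores are assumed to be already parsed into lists, the merge of the two scores is assumed to mask a residue when either score is low, and masked 'x' positions are assumed not to count as residues when trimming columns.

```python
from typing import Dict, List

# Thresholds taken from the filtering steps described above.
MIN_SEQUENCES = 6            # families with fewer sequences are discarded
MCOFFEE_MIN_KEPT = 8         # MCoffee scores 0-7 are masked, 8-9 are kept
GUIDANCE_MAX_MASKED = 0.93   # Guidance scores <= 0.93 are masked
MIN_RESIDUES_PER_COLUMN = 4  # columns with fewer residues are removed


def has_enough_sequences(msa: Dict[str, str]) -> bool:
    """Family size filter: keep families with at least 6 sequences."""
    return len(msa) >= MIN_SEQUENCES


def undefined_to_gaps(msa: Dict[str, str]) -> Dict[str, str]:
    """Replace 'X' amino acids by gaps before the MaxAlign step."""
    return {name: seq.replace("X", "-") for name, seq in msa.items()}


def mask_low_scores(seq: str, mcoffee: List[int], guidance: List[float]) -> str:
    """Mask a residue with lowercase 'x' if either score is low.

    Assumption: "merging" the two scores means taking the union of
    both masks. Gaps are left untouched.
    """
    masked = []
    for residue, m_score, g_score in zip(seq, mcoffee, guidance):
        low = m_score < MCOFFEE_MIN_KEPT or g_score <= GUIDANCE_MAX_MASKED
        masked.append("x" if residue != "-" and low else residue)
    return "".join(masked)


def map_mask_to_codons(protein: str, cds: str, codon_mask: str = "xxx") -> str:
    """Copy the amino-acid mask onto the aligned nucleotide sequence.

    Assumption: the nucleotide alignment has exactly 3 columns per
    protein column, and a masked codon is written as `codon_mask`.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(
        codon_mask if aa == "x" else codon for aa, codon in zip(protein, codons)
    )


def trim_sparse_columns(msa: Dict[str, str]) -> Dict[str, str]:
    """Drop columns with fewer than 4 residues.

    Assumption: gaps '-' and masked 'x' do not count as residues.
    """
    if not msa:
        return {}
    length = len(next(iter(msa.values())))
    keep = [
        i for i in range(length)
        if sum(seq[i] not in "-x" for seq in msa.values()) >= MIN_RESIDUES_PER_COLUMN
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in msa.items()}
```

For example, under these assumptions mask_low_scores("MK-L", [9, 7, 9, 8], [1.0, 1.0, 1.0, 0.93]) masks the second residue (MCoffee score 7) and the last one (Guidance score 0.93), while leaving the gap in place.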