bio info statistical-methods[1]
Dot Matrix
•First described by Gibbs and McIntyre (1970)
•Dot matrix analysis of DNA sequence (W=11, S=7)
[Figure: dot plot, phage lambda cI vs. phage P22 c2 repressor]
Dot Matrix
•Dot matrix analysis of amino acid sequence (W=1, S=1)
[Figure: dot plot, phage lambda cI vs. phage P22 c2 repressor]
Filtering in Dot Matrix
•Filtering can be applied using sliding windows

          Window size   Match requirement (stringency)
DNA       15            10
Protein   2 or 3        2

•For DNA: long windows, high stringency
•For proteins: short windows, low stringency
•For protein domains: long windows, low stringency
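The sliding-window filter can be sketched in a few lines of Python. This is a minimal illustration, not the original Gibbs and McIntyre program: the function names dot_matrix and render are hypothetical, and it assumes a dot is placed at the start position of each window in which at least `stringency` of the `window` aligned characters match.

```python
def dot_matrix(seq_a, seq_b, window=1, stringency=1):
    """Return the set of (i, j) start positions that receive a dot.

    Assumption: a dot is placed at (i, j) when at least `stringency`
    of the `window` characters starting at seq_a[i] and seq_b[j] match.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(seq_a[i + k] == seq_b[j + k] for k in range(window))
            if matches >= stringency:
                dots.add((i, j))
    return dots


def render(seq_a, seq_b, dots):
    """Draw the matrix as text: one row per seq_a position, '*' for a dot."""
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(len(seq_b)))
        for i in range(len(seq_a))
    )


# Unfiltered (W=1, S=1) versus filtered (W=3, S=2) plot of two short DNA strings.
a, b = "GATTACAGATTACA", "GATCACAGATTGCA"
print(render(a, b, dot_matrix(a, b, window=1, stringency=1)))
print()
print(render(a, b, dot_matrix(a, b, window=3, stringency=2)))
```

With W=1 and S=1 every chance match between single characters produces a dot; raising the window and stringency removes most of that background and leaves the diagonal runs that indicate similarity.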
Dynamic programming
•Compares every pair of characters in the two sequences and generates an alignment
•Alignment includes matches, mismatches and gaps
•Alignments obtained depend on the choice of scoring system
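One way to make these points concrete is a small global-alignment routine. The slides do not name a specific algorithm; the sketch below assumes a Needleman-Wunsch style recurrence with a linear gap penalty, and the function name global_align and the default scores (match=1, mismatch=-1, gap=-2) are illustrative choices only.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming (assumed linear gap penalty).

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: every pair of characters is compared once.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # match or mismatch
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(global_align("ACGT", "AGT"))           # default scoring
print(global_align("ACGT", "AGT", gap=-1))   # cheaper gaps change the score
```

Changing the match, mismatch, or gap values changes which alignment scores best, which is why the choice of scoring system matters.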
Scoring matrices
•Certain amino acid substitutions are common in related proteins from different species
•Proteins still function with these substitutions
PAM (Percent Accepted Mutation)
•Based on evolutionary principles
•Each matrix gives the changes expected for a given period of evolutionary time
•Each change at a particular site is assumed to be independent of previous mutational events
•Estimates are based on 1572 observed changes in 71 groups of protein sequences that were at least 85% similar
PAM (Percent Accepted Mutation)
•The PAM1 matrix estimates the rate of substitution expected if 1% of the amino acids had changed
Similarity   Matrix used
40%          PAM120
50%          PAM80
60%          PAM60
14-27%       PAM250
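Because each change is assumed to be independent of earlier ones, matrices for longer evolutionary periods are obtained by multiplying the PAM1 mutation probability matrix by itself: PAM250 is PAM1 raised to the 250th power. The sketch below shows this on a toy example. TOY_PAM1 is an invented three-state matrix (not real amino acid data) in which each state has a 1% chance of changing per step, and the helper names mat_mul and pam_n are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def pam_n(pam1, n):
    """Raise a PAM1-style probability matrix to the n-th power."""
    size = len(pam1)
    # Start from the identity matrix (zero evolutionary steps).
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, pam1)
    return result


# Toy assumption: three residue classes, each with a 1% chance of change per step.
TOY_PAM1 = [
    [0.990, 0.005, 0.005],
    [0.005, 0.990, 0.005],
    [0.005, 0.005, 0.990],
]

toy_pam250 = pam_n(TOY_PAM1, 250)
for row in toy_pam250:
    print([round(p, 4) for p in row])
```

In this toy model the probability of finding the original state after 250 steps remains above the 1/3 expected by chance, because repeated changes at the same site can restore the original state.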
BLOSUM (Blocks Amino acid Substitution Matrices)
Matrix values are based on amino acid substitutions in a large set of ~2000 conserved amino acid patterns (blocks)
Note: patterns are found by the MOTIF program
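The slides do not spell out how block columns become matrix values. A common description is a log-odds calculation: count the amino acid pairs seen in each column, compare the observed pair frequencies with those expected from the overall amino acid frequencies, and scale the log ratio to half-bit units. The sketch below follows that description on an invented toy block. The function name blosum_like_scores is hypothetical, only pairs that actually occur receive a score, and the clustering of similar sequences used for real BLOSUM matrices is left out.

```python
from collections import Counter
from itertools import combinations
from math import log2


def blosum_like_scores(block):
    """Log-odds scores (half-bit units, rounded) from one ungapped block.

    `block` is a list of equal-length aligned sequences. Sketch only:
    no sequence clustering, and unobserved pairs get no score.
    """
    # Count every pair of residues within each column of the block.
    pair_counts = Counter()
    for column in zip(*block):
        for x, y in combinations(column, 2):
            pair_counts[tuple(sorted((x, y)))] += 1

    total = sum(pair_counts.values())
    observed = {pair: count / total for pair, count in pair_counts.items()}

    # Background frequency of each residue, derived from the pair frequencies.
    background = Counter()
    for (x, y), freq in observed.items():
        if x == y:
            background[x] += freq
        else:
            background[x] += freq / 2
            background[y] += freq / 2

    scores = {}
    for (x, y), freq in observed.items():
        if x == y:
            expected = background[x] ** 2
        else:
            expected = 2 * background[x] * background[y]
        scores[(x, y)] = round(2 * log2(freq / expected))
    return scores


# Invented toy block: five aligned sequences, two columns.
toy_block = ["AS", "AS", "AS", "AS", "SA"]
print(blosum_like_scores(toy_block))
```

In the toy block identical pairs occur more often than expected by chance and score positive, while the A/S substitution occurs less often than expected and scores negative.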