Calculating substitution matrices


Upload: lucius-farmer

Post on 01-Jan-2016


• http://www.techfak.uni-bielefeld.de/bcd/Curric/PrwAli/nodeD.html#wm5

Two models for sequence alignment: one random (R) and one match (M).

The random model assumes that letter a occurs independently with some frequency $q_a$, so the probability of the two sequences is just the product of the probabilities of each amino acid:

$P(x,y|R) = \prod_i q_{x_i} \prod_j q_{y_j}$


Odds ratio

• The match model aligns residues with a joint probability $p_{ab}$

– $P(x,y|M) = \prod_i p_{x_i y_i}$

• The ratio of match to random is known as the odds ratio:

$\frac{P(x,y|M)}{P(x,y|R)} = \prod_i \frac{p_{x_i y_i}}{q_{x_i} q_{y_i}}$
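The step from the two model probabilities to a single per-position product can be written out as follows. This assumes an ungapped alignment in which both sequences have the same length n, so that the index j of the random model runs over the same positions as i:

$$
\frac{P(x,y \mid M)}{P(x,y \mid R)}
= \frac{\prod_{i=1}^{n} p_{x_i y_i}}{\prod_{i=1}^{n} q_{x_i} \, \prod_{j=1}^{n} q_{y_j}}
= \prod_{i=1}^{n} \frac{p_{x_i y_i}}{q_{x_i} \, q_{y_i}}
$$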


Log odds ratio

• $s(a,b) = \log\left(\frac{p_{ab}}{q_a q_b}\right)$

• $S = \sum_i s(x_i, y_i)$

• The second equation is the sum of the individual scores for each aligned pair of residues. The first equation gives the entries of a matrix: for proteins this is a 20 × 20 matrix known as a score or substitution matrix (e.g. BLOSUM, PAM).
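The log-odds scoring described here can be sketched in Python. The two-letter alphabet and the values in `q` and `p` are invented purely for illustration (they are not taken from BLOSUM or PAM), and the natural logarithm is an assumed choice of base.

```python
import math

# Hypothetical toy model over a two-letter alphabet (illustrative values only).
q = {"A": 0.6, "B": 0.4}                     # background frequencies q_a
p = {("A", "A"): 0.5, ("A", "B"): 0.1,       # joint match probabilities p_ab
     ("B", "A"): 0.1, ("B", "B"): 0.3}       # (marginals of p equal q)

def s(a, b):
    """Log-odds score s(a, b) = log(p_ab / (q_a * q_b))."""
    return math.log(p[(a, b)] / (q[a] * q[b]))

def alignment_score(x, y):
    """S = sum_i s(x_i, y_i) for an ungapped alignment."""
    if len(x) != len(y):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return sum(s(a, b) for a, b in zip(x, y))

def odds_ratio(x, y):
    """P(x,y|M) / P(x,y|R) as a product over aligned positions."""
    ratio = 1.0
    for a, b in zip(x, y):
        ratio *= p[(a, b)] / (q[a] * q[b])
    return ratio

x, y = "AABA", "ABBA"
print(alignment_score(x, y))        # additive score S
print(math.log(odds_ratio(x, y)))   # log of the odds ratio, equal up to rounding
```

Taking logarithms is what turns the product of per-position ratios into a sum, which is why the substitution matrix can be used additively along an alignment.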


Significance of scores using alignment algorithms

• Calculate a raw score

– Sum of scores for each letter-to-letter and letter-to-null position

• Calculate a bit score

– Normalizes for the scoring system used

• Calculate an E-value

– Calculated from the bit score to account for the probability that the hit arose by chance


Raw score

• Calculated from substitution matrices (PAM, BLOSUM) and gap costs

• There are substitution matrices for nucleotides also:

– States, D.J., Gish, W. & Altschul, S.F. (1991) "Improved sensitivity of nucleic acid database searches using application-specific scoring matrices." Methods 3:66-70.


Bit score

• $S' = (\lambda S - \ln K) / \ln 2$

• $\lambda$ and K are parameters that depend on the scoring system (substitution matrix and gap costs) employed

– Karlin, S. & Altschul, S.F. (1990) "Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes." Proc. Natl. Acad. Sci. USA 87:2264-2268.

– http://www.ncbi.nlm.nih.gov/BLAST/matrix_info.html#lambda

• Gap costs – the standard cost associated with a gap of length g
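A minimal Python sketch of the raw-to-bit-score conversion, using the Karlin-Altschul form S' = (λS − ln K) / ln 2. The values of `lam` and `K` below are placeholders chosen only for illustration; real values depend on the substitution matrix and gap costs in use.

```python
import math

def bit_score(raw_score, lam, K):
    """Normalized score S' = (lambda * S - ln K) / ln 2, in bits."""
    return (lam * raw_score - math.log(K)) / math.log(2)

# Placeholder parameters (assumed, not taken from any real scoring system).
lam, K = 0.3, 0.1

for raw in (50, 100, 150):
    print(raw, round(bit_score(raw, lam, K), 2))
```

Because λ and K absorb the scale of the scoring system, bit scores obtained with different matrices can be compared directly, which raw scores cannot.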


Gap costs

• Can be linear – like we did in our matrix

$\gamma(g) = -gd$

• Can be an “affine” score – most prevalent now

$\gamma(g) = -d - (g-1)e$

where d is called the gap-open penalty and e is called the gap-extension penalty. The gap-extension penalty e is usually smaller than d, so long insertions and deletions are penalized less than they would be under a linear cost.
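The two gap-cost schemes can be compared side by side with a short Python sketch. The penalty values `d = 10` and `e = 1` are hypothetical and serve only to show how the two costs diverge as the gap grows.

```python
def linear_gap(g, d):
    """Linear gap cost: -g * d for a gap of length g."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -g * d

def affine_gap(g, d, e):
    """Affine gap cost: -d - (g - 1) * e (open once, then extend)."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -d - (g - 1) * e

d, e = 10, 1  # hypothetical gap-open and gap-extension penalties
for g in (1, 2, 5, 10):
    print(g, linear_gap(g, d), affine_gap(g, d, e))
```

For a gap of length 1 the two schemes agree; with e smaller than d, every additional gap position costs less under the affine scheme.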


E-value

• $E = N / 2^{S'}$

• This is an approximation for the number (E) of distinct HSPs (high-scoring segment pairs) with normalized score at least S' expected to occur by chance when two random protein sequences of sufficient lengths m and n are compared

• N = mn (search space size)
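As a rough Python sketch of the relation E = N / 2^S' with N = mn: the sequence lengths and bit scores below are arbitrary illustration values, not data from any search.

```python
def e_value(bit_score, m, n):
    """Expected number of chance HSPs: E = (m * n) / 2**S'."""
    return (m * n) / 2 ** bit_score

m, n = 250, 1000  # hypothetical sequence lengths
for bits in (20, 21, 22):
    print(bits, e_value(bits, m, n))
```

Each additional bit of score halves the E-value, and doubling either sequence length doubles it.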


Database searching

• If a protein is compared to a whole database, n is the database length in residues

• The equation can be rearranged to:

– $S' = \log_2(N/E)$

• If a protein of length 250 is compared to a protein database of $5 \times 10^7$ residues, a normalized score of about 38 bits is necessary to achieve a marginally significant E-value of 0.05
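The arithmetic behind this example can be written out, assuming a database of $5 \times 10^7$ residues (the size for which the quoted 38 bits is obtained):

$$
N = mn = 250 \times 5 \times 10^{7} = 1.25 \times 10^{10},
\qquad
S' = \log_2\frac{N}{E} = \log_2\frac{1.25 \times 10^{10}}{0.05} = \log_2\left(2.5 \times 10^{11}\right) \approx 37.9
$$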


Significance of E-value

• The E-value is an expected count, so it is never negative but can exceed 1; significant matches have E-values close to 0

• The lower the E-value, the more significant the match

• Note that the E-value depends on the length of the query sequence through N = mn: the same bit score S' gives a smaller E-value for a query of 100 amino acids than for a query of 200 amino acids
