Pairwise Alignment
Alexei Drummond
CS369 2007
Week 1 Learning Outcomes
• Have an appreciation of what Computational Biology is
• Know what DNA, RNA and Protein sequences are :-)
• Understand that sequence evolution can be modeled with a stochastic model of evolution, so that the probability of evolving from one character to another in a certain time can be calculated
• Know what the Jukes Cantor and General time-reversible models of molecular evolution imply in terms of rates and base frequencies.
Week 2 Learning Outcomes
• Understand the basic principles of dynamic programming
• Be familiar with the application of dynamic programming to a variety of simple examples such as
  – Knapsack problem
  – RNA secondary structure problem
Dynamic Programming
• method for solving combinatorial optimization problems
• guaranteed to give optimal solution
• generalization of “divide-and-conquer”
• relies on “Principle of Optimality”, i.e. a sub-optimal solution of a sub-problem cannot be part of an optimal solution of the original problem instance.
Principle of Optimality
[Figure: Auckland, Te Kuiti, Wellington]
Key to efficiency
• computation is carried out bottom-up
• store solutions to sub-problems in a table
• all possible sub-problems solved once each, beginning with smallest sub-problems
• work up to original problem instance
• only optimal solutions to sub-problems are used to compute solution to problem at next level
• DO NOT carry out computation in recursive, top-down manner
  – same sub-problems would be solved many times
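The bottom-up, table-filling idea can be sketched on the knapsack problem from the Week 2 outcomes. This is a minimal illustration, not code from the course: the function name `knapsack`, the `(value, weight)` item format and the sample items are assumptions made for the example, and weights are assumed to be non-negative integers.

```python
def knapsack(items, capacity):
    """Best total value of a subset of (value, weight) items within capacity."""
    n = len(items)
    # table[k][c] = best value using only the first k items with capacity c.
    # Row 0 (no items) is all zeros: the smallest sub-problems.
    table = [[0] * (capacity + 1) for _ in range(n + 1)]
    for k in range(1, n + 1):
        value, weight = items[k - 1]
        for c in range(capacity + 1):
            # Option 1: leave item k out.
            table[k][c] = table[k - 1][c]
            # Option 2: take item k, reusing the optimal sub-solution.
            if weight <= c:
                table[k][c] = max(table[k][c],
                                  table[k - 1][c - weight] + value)
    return table[n][capacity]


# Hypothetical items given as (value, weight) pairs.
print(knapsack([(60, 10), (100, 20), (120, 30)], 50))
```

Each table entry is computed once from entries in the previous row, which is what keeps the work proportional to the table size.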
Pairwise alignment
Sequences
x = a c g g t s
y = a w g c c t t
Alignment
x = a - c g g - t s
y = a w - g c c t t
Scoring
• Numeric score associated with each column
• Total score = sum of column scores
• Column types:
  (1) Identical (+ve)
  (2) Conservative (+ve)
  (3) Non-conservative (-ve)
  (4) Gap (-ve)
x = a - c g g - t s
y = a w - g c c t t
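As a rough illustration of "total score = sum of column scores", the sketch below scores the alignment shown on this slide. The numeric values (match +1, mismatch -1, gap -2) and the function name `score_alignment` are assumptions chosen for the example; the slide only fixes the signs, and the sketch does not distinguish conservative from non-conservative substitutions.

```python
def score_alignment(row_x, row_y, match=1, mismatch=-1, gap=-2):
    """Sum of per-column scores for two equal-length gapped rows."""
    if len(row_x) != len(row_y):
        raise ValueError("alignment rows must have equal length")
    total = 0
    for a, b in zip(row_x, row_y):
        if a == "-" or b == "-":
            total += gap        # gap column
        elif a == b:
            total += match      # identical column
        else:
            total += mismatch   # substitution column
    return total


# The alignment of x and y from the slide, written without spaces.
print(score_alignment("a-cgg-ts", "aw-gcctt"))
```

With these assumed values the eight columns contribute three matches, two mismatches and three gaps.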
Scoring
• Model-based
  – Log-odds scoring
• Empirical
  – Often used for amino acid alignments
  – PAM matrices
  – BLOSUM matrices
  – JTT
  – WAG
• Different matrices are used depending on the level of similarity of the sequences.
  – How do you know the similarity before doing the alignment?
Log-odds matrices
“What we want to know is whether two sequences are homologous (evolutionarily related) or not, so we want an alignment score that reflects that. Theory says that if you want to compare two hypotheses, a good score is the log-odds score: the logarithm of the ratio of the likelihoods of your two hypotheses. If we assume that each aligned residue pair is statistically independent of the others (biologically dubious, but mathematically convenient), the alignment score is the sum of the individual log-odds score for each aligned residue pair.”
Sean R Eddy, 2004
Log-odds matrices
“The numerator (pab) is the likelihood of the hypothesis we want to test: that these two residues are correlated because they’re homologous. Thus, pab are the target frequencies: the probability that we expect to observe residues a and b aligned in homologous sequence alignments. The denominator is the likelihood of a null hypothesis: that these two residues are uncorrelated and unrelated, occurring independently”
Sean R Eddy, 2004
$$s(a,b) = \frac{1}{\lambda} \log \frac{p_{ab}}{f_a f_b}$$
Evolutionary interpretation of match/mismatch scores
[Figure: two diagrams relating residues x and y]
• a, b homologous: x and y are joined through a common ancestor by branches of length t/2, so $d = \mu t$
• a, b not homologous: $d = \infty$
d = average number of changes per site (d = 0.1 is roughly 90% similarity)
Jukes Cantor Model
• All mutations are equally likely
  – x → y at the same rate for all x, y
• All nucleotides are equally likely (equal base frequencies):
  – {0.25, 0.25, 0.25, 0.25} for DNA
  – {0.05, …, 0.05} for Proteins
DNA: $x, y \in \{A, C, G, T\}$
Proteins: $x, y \in \{A, R, N, D, C, E, Q, G, H, I, L, K, M, F, P, S, T, W, Y, V\}$
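Evolutionary interpretation of match/mismatch scores (DNA)
[Figure: x and y at distance $d = \mu t$, and x and y at distance $d = \infty$]
d = average number of changes per site (d = 0.1 is roughly 90% similarity)
$$P_{xx}(d) = \frac{1}{4} + \frac{3}{4} e^{-\frac{4}{3} d}$$
$$P_{x \neq y}(d) = 1 - P_{xx}(d) = \frac{3}{4} - \frac{3}{4} e^{-\frac{4}{3} d}$$
$$\lim_{d \to \infty} P_{xx}(d) = 0.25 \qquad \lim_{d \to \infty} P_{x \neq y}(d) = 0.75$$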
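A short numerical check of these Jukes Cantor probabilities may help. The function names `p_same` and `p_diff` are illustrative choices, not names from the course.

```python
import math


def p_same(d):
    """P_xx(d): probability of ending in the same state after distance d."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)


def p_diff(d):
    """P_x!=y(d): probability of ending in any different state."""
    return 1.0 - p_same(d)


for d in (0.0, 0.1, 0.5, 2.0, 50.0):
    print(f"d={d:5.1f}  P_same={p_same(d):.4f}  P_diff={p_diff(d):.4f}")
```

At d = 0.1 the match probability comes out close to 0.91, in line with the "roughly 90% similarity" remark, and for large d the two values approach 0.25 and 0.75.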
Log-odds match score
$$s_{xx} = \frac{1}{\lambda} \log \frac{P_{xx}(d)}{\lim_{d \to \infty} P_{xx}(d)} = \frac{1}{\lambda} \log \frac{P_{xx}(d)}{0.25}$$
Numerator: probability of ending in the same state after time d
Denominator: probability of ending in the same state after infinite time
Log-odds mismatch score
$$s_{x \neq y} = \frac{1}{\lambda} \log \frac{P_{x \neq y}(d)}{\lim_{d \to \infty} P_{x \neq y}(d)} = \frac{1}{\lambda} \log \frac{P_{x \neq y}(d)}{0.75}$$
Numerator: probability of ending in y (different from x) after time d
Denominator: probability of ending in y (different from x) after infinite time
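The match and mismatch scores can be evaluated for a given distance as in the sketch below. It assumes the natural logarithm and a scale factor λ = 1 by default, neither of which is fixed by the slides, and the function name `jc_log_odds` is illustrative. The distance must be strictly positive, since the mismatch probability is 0 at d = 0.

```python
import math


def jc_log_odds(d, lam=1.0):
    """Return (match score, mismatch score) under Jukes Cantor at distance d > 0."""
    if d <= 0:
        raise ValueError("d must be positive")
    p_match = 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)
    p_mismatch = 1.0 - p_match
    # Each score compares the probability at distance d with its
    # infinite-distance limit (0.25 for a match, 0.75 for a mismatch).
    s_match = math.log(p_match / 0.25) / lam
    s_mismatch = math.log(p_mismatch / 0.75) / lam
    return s_match, s_mismatch


for d in (0.1, 0.5, 1.0, 2.0):
    s_match, s_mismatch = jc_log_odds(d)
    print(f"d={d:.1f}  match={s_match:+.3f}  mismatch={s_mismatch:+.3f}")
```

For closely related sequences the match score is positive and the mismatch score strongly negative; both shrink towards 0 as d grows.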
Evolutionary interpretation of match/mismatch scores (DNA)
[Figure: "Match and mismatch probabilities". Curves P(x=y,d) and P(x!=y,d); x-axis: evolutionary distance (substitutions per site), 0 to 2; y-axis: probability, 0 to 1]
[Figure: "LogOdds Scores (Jukes Cantor model)". Curves LogOdds(match) and LogOdds(mismatch); x-axis: evolutionary distance (substitutions per site), 0 to 2; y-axis: scores, -2.50 to 2.00]
BLOSUM50 matrix
[Figure: BLOSUM50 matrix (image not available)]
Gap penalties
• Linear score: $\gamma(g) = -gd$, where d is the gap penalty
• Affine score: $\gamma(g) = -d - (g-1)e$, where d is the gap-open penalty and e is the gap-extension penalty
[Figure: a gap of length g in the alignment of x and y]
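The two gap scores can be compared directly. In this sketch the function names and the sample penalty values are assumptions for illustration; g is the gap length.

```python
def linear_gap(g, d):
    """Linear gap score: every gap position costs d."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -g * d


def affine_gap(g, d, e):
    """Affine gap score: d to open the gap, e for each further position."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -d - (g - 1) * e


# Hypothetical penalties: linear d = 8; affine d = 12, e = 2.
for g in (1, 2, 3, 10):
    print(g, linear_gap(g, 8), affine_gap(g, 12, 2))
```

With these values a single-position gap is cheaper under the linear score, while long gaps are cheaper under the affine score.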
Needleman & Wunsch algorithm
• Dynamic programming algorithm for global alignment
• Needleman & Wunsch (‘70), modified by Gotoh (‘82)
Assumptions:
Linear gap score d
Symmetric scoring matrix S
s(a,b) = s(b,a) score from lining up a and b
s(a,-) = s(-,a) = -d score from lining up a with -
Principle of Optimality
Given sequences: $X = (x_1, x_2, \ldots, x_m)$ and $Y = (y_1, y_2, \ldots, y_n)$
Define: $F(i,j)$ = score of best alignment between $(x_1, x_2, \ldots, x_i)$ and $(y_1, y_2, \ldots, y_j)$
Principle of Optimality
The optimal alignment of $x_1, x_2, x_3, \ldots, x_i$ with $y_1, y_2, y_3, \ldots, y_j$, with score $F(i,j)$, looks like one of:
• $x_i$ aligned with $y_j$, after an alignment of $x_1, \ldots, x_{i-1}$ with $y_1, \ldots, y_{j-1}$: score $F(i-1, j-1) + s(x_i, y_j)$
• or $y_j$ aligned with a gap, after an alignment of $x_1, \ldots, x_i$ with $y_1, \ldots, y_{j-1}$: score $F(i, j-1) - d$
• or $x_i$ aligned with a gap, after an alignment of $x_1, \ldots, x_{i-1}$ with $y_1, \ldots, y_j$: score $F(i-1, j) - d$
so
$$F(i,j) = \max \begin{cases} F(i-1, j-1) + s(x_i, y_j) \\ F(i, j-1) - d \\ F(i-1, j) - d \end{cases}$$
Principle of Optimality
Basis:
• $x_1, x_2, x_3, \ldots, x_i$ aligned entirely with gaps: $F(i,0) = F(i-1,0) + s(x_i, -)$
• $y_1, y_2, y_3, \ldots, y_j$ aligned entirely with gaps: $F(0,j) = F(0,j-1) + s(-, y_j)$
• $F(0,0) = 0$
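The recurrence and its basis translate almost line for line into a table-filling routine. The sketch below is one possible rendering: the function name `nw_score_table`, the toy scoring function `simple_score` (+1 match, -1 mismatch) and the gap penalty d = 1 are assumptions for the example, not values from the slides.

```python
def nw_score_table(x, y, s, d):
    """Fill the (m+1) x (n+1) table F for global alignment with linear gap d."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    # Basis: a prefix aligned entirely against gaps, s(a,-) = s(-,a) = -d.
    for i in range(1, m + 1):
        F[i][0] = F[i - 1][0] - d
    for j in range(1, n + 1):
        F[0][j] = F[0][j - 1] - d
    # Recurrence: best of the three ways the last column can look.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # x_i aligned with y_j
                F[i][j - 1] - d,                          # y_j aligned with a gap
                F[i - 1][j] - d,                          # x_i aligned with a gap
            )
    return F


def simple_score(a, b):
    """Toy symmetric scoring function used only for this example."""
    return 1 if a == b else -1


F = nw_score_table("AC", "AGC", simple_score, d=1)
for row in F:
    print(row)
print("optimal score:", F[-1][-1])
```

Python strings are 0-indexed, so `x[i - 1]` plays the role of $x_i$.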
Filling up table
[Figure sequence: the F matrix, with rows 0, 1, 2, ..., m indexed by X and columns 0, 1, 2, ..., n indexed by Y, starting from the entry 0 at F(0,0); the last figure marks the optimal alignment score]
Constructing alignment
[Figure: the F matrix with the optimal alignment score marked]
Example
F matrix (rows: X = P A W H E A E; columns: Y = H E A G A W G H E E). The optimal alignment score is the bottom-right entry, F(m,n) = 1.

        H   E   A   G   A   W   G   H   E   E
    0  -8 -16 -24 -32 -40 -48 -56 -64 -72 -80
P  -8  -2  -9 -17 -25 -33 -41 -49 -57 -65 -73
A -16 -10  -3  -4 -12 -20 -28 -36 -44 -52 -60
W -24 -18 -11  -6  -7 -15  -5 -13 -21 -29 -37
H -32 -14 -18 -13  -8  -9 -13  -7  -3 -11 -19
E -40 -22  -8 -16 -16  -9 -12 -15  -7   3  -5
A -48 -30 -16  -3 -11 -11 -12 -12 -15  -5   2
E -56 -38 -24 -11  -6 -12 -14 -15 -12  -9   1

Alignment (built one column at a time from the right, starting at the optimal alignment score)
Y  H E A G A W G H E - E
X  - - P - A W - H E A E
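The construction of the alignment can be sketched as a traceback through the F matrix. The code below is an illustration under stated assumptions: the pair scores are a handful of BLOSUM50 entries (only the residue pairs this example needs), the gap penalty d = 8 is read off the first row of the table, and ties are broken in the order diagonal, up, left. The names `PAIR_SCORES`, `s` and `needleman_wunsch` are hypothetical.

```python
# Assumed BLOSUM50 entries for the residue pairs that occur in this example.
PAIR_SCORES = {
    ("A", "A"): 5, ("A", "E"): -1, ("A", "G"): 0, ("A", "H"): -2,
    ("A", "P"): -1, ("A", "W"): -3, ("E", "E"): 6, ("E", "G"): -3,
    ("E", "H"): 0, ("E", "P"): -1, ("E", "W"): -3, ("G", "H"): -2,
    ("G", "P"): -2, ("G", "W"): -3, ("H", "H"): 10, ("H", "P"): -2,
    ("H", "W"): -3, ("P", "W"): -4, ("W", "W"): 15,
}


def s(a, b):
    """Symmetric lookup: s(a, b) = s(b, a)."""
    return PAIR_SCORES[(a, b) if a <= b else (b, a)]


def needleman_wunsch(x, y, d=8):
    """Return (score, aligned x, aligned y) for a global alignment."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = F[i - 1][0] - d
    for j in range(1, n + 1):
        F[0][j] = F[0][j - 1] - d
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i][j - 1] - d,
                          F[i - 1][j] - d)
    # Traceback from F(m, n): at each cell, step to a predecessor
    # that could have produced its value.
    row_x, row_y = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            row_x.append(x[i - 1]); row_y.append(y[j - 1])
            i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            row_x.append(x[i - 1]); row_y.append("-")
            i -= 1
        else:
            row_x.append("-"); row_y.append(y[j - 1])
            j -= 1
    return F[m][n], "".join(reversed(row_x)), "".join(reversed(row_y))


score, aligned_x, aligned_y = needleman_wunsch("PAWHEAE", "HEAGAWGHEE")
print(score)
print("Y", aligned_y)
print("X", aligned_x)
```

Several cells on the path have tied predecessors, so a different tie-breaking order could return a different alignment with the same score.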
Time and space
[Figure: F matrix with rows 0, 1, 2, ..., m and columns 0, 1, 2, ..., n]
$(m+1) \times (n+1)$ table entries $\Rightarrow \Theta(mn)$ space
Each entry computed in constant time $\Rightarrow \Theta(mn)$ time
Smith & Waterman algorithm
Computes local alignment, i.e. looks for the best alignment of subsequences of X and Y, ignoring scores of regions on either side.
[Figure: best subsequence alignment within X and Y]
Principle of Optimality
Given sequences $X = (x_1, x_2, \ldots, x_m)$ and $Y = (y_1, y_2, \ldots, y_n)$
Define $F(i,j)$ = score of best suffix alignment between $(x_s, x_{s+1}, \ldots, x_i)$ where $s \le i$ and $(y_r, y_{r+1}, \ldots, y_j)$ where $r \le j$
N.B. Includes empty alignment with score 0
Dynamic Programming recurrences
The optimal alignment of $x_s, x_{s+1}, x_{s+2}, \ldots, x_i$ with $y_r, y_{r+1}, y_{r+2}, \ldots, y_j$, with score $F(i,j)$, looks like one of:
• $x_i$ aligned with $y_j$, after an alignment of $x_s, \ldots, x_{i-1}$ with $y_r, \ldots, y_{j-1}$: score $F(i-1, j-1) + s(x_i, y_j)$
• or $y_j$ aligned with a gap, after an alignment of $x_s, \ldots, x_i$ with $y_r, \ldots, y_{j-1}$: score $F(i, j-1) - d$
• or $x_i$ aligned with a gap, after an alignment of $x_s, \ldots, x_{i-1}$ with $y_r, \ldots, y_j$: score $F(i-1, j) - d$
• or the empty alignment: score 0
Principle of Optimality
so
$$F(i,j) = \max \begin{cases} 0 \\ F(i-1, j-1) + s(x_i, y_j) \\ F(i, j-1) - d \\ F(i-1, j) - d \end{cases}$$
Basis: $F(i,0) = F(0,j) = 0$
Example
F matrix (rows: X = P A W H E A E; columns: Y = H E A G A W G H E E)

        H   E   A   G   A   W   G   H   E   E
    0   0   0   0   0   0   0   0   0   0   0
P   0   0   0   0   0   0   0   0   0   0   0
A   0   0   0   5   0   5   0   0   0   0   0
W   0   0   0   0   2   0  20  12   4   0   0
H   0  10   2   0   0   0  12  18  22  14   6
E   0   2  16   8   0   0   4  10  18  28  20
A   0   0   8  21  13   5   0   4  10  20  27
E   0   0   6  13  18  12   4   0   4  16  26

Alignment
Y  A W G H E
X  A W - H E
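A local alignment like this one can be reproduced with a short sketch of the recurrence with the extra 0 option, followed by a traceback from the largest table entry until a 0 is reached. The pair scores are assumed BLOSUM50 entries for the residues in this example, d = 8 is an assumed gap penalty consistent with the table, and the names `PAIR_SCORES`, `s` and `smith_waterman` are hypothetical.

```python
# Assumed BLOSUM50 entries for the residue pairs that occur in this example.
PAIR_SCORES = {
    ("A", "A"): 5, ("A", "E"): -1, ("A", "G"): 0, ("A", "H"): -2,
    ("A", "P"): -1, ("A", "W"): -3, ("E", "E"): 6, ("E", "G"): -3,
    ("E", "H"): 0, ("E", "P"): -1, ("E", "W"): -3, ("G", "H"): -2,
    ("G", "P"): -2, ("G", "W"): -3, ("H", "H"): 10, ("H", "P"): -2,
    ("H", "W"): -3, ("P", "W"): -4, ("W", "W"): 15,
}


def s(a, b):
    """Symmetric lookup: s(a, b) = s(b, a)."""
    return PAIR_SCORES[(a, b) if a <= b else (b, a)]


def smith_waterman(x, y, d=8):
    """Return (best score, aligned x, aligned y) for a local alignment."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]  # basis: F(i,0) = F(0,j) = 0
    best, best_pos = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(0,  # empty alignment
                          F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i][j - 1] - d,
                          F[i - 1][j] - d)
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)
    # Traceback from the best cell until a cell with score 0 is reached.
    row_x, row_y = [], []
    i, j = best_pos
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            row_x.append(x[i - 1]); row_y.append(y[j - 1])
            i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] - d:
            row_x.append(x[i - 1]); row_y.append("-")
            i -= 1
        else:
            row_x.append("-"); row_y.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(row_x)), "".join(reversed(row_y))


best, aligned_x, aligned_y = smith_waterman("PAWHEAE", "HEAGAWGHEE")
print(best)
print("Y", aligned_y)
print("X", aligned_x)
```

Unlike the global case, the traceback starts at the largest entry anywhere in the table rather than at F(m,n).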
Repeated (local) matches
Long sequences: interested in all local alignments with significant score, > threshold T,
e.g. copies of a repeated domain or motif in a protein.
X = sequence containing motif
Y = target sequence
Method is asymmetric
[Figure: Y with several regions matching parts of X]
Principle of Optimality
Given sequences $X = (x_1, x_2, \ldots, x_m)$ and $Y = (y_1, y_2, \ldots, y_n)$
Define $F(i,j)$ $(i \ge 1)$ = best sum of match scores in $(x_1, x_2, \ldots, x_i)$ and $(y_1, y_2, \ldots, y_j)$, assuming $y_j$ is in a matched region and match ends in $x_i$ or $y_j$
Ends of matches
$F(0,0) = 0$
$F(0,j)$ = best sum of completed match scores to $(y_1, y_2, \ldots, y_j)$, assuming that $y_j$ is not in a matched region
$$F(0,j) = \max \begin{cases} F(0, j-1) \\ F(i, j-1) - T, \quad i = 1, \ldots, m \end{cases}$$
Row 0 therefore marks unmatched regions and ends of matches in Y.
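General recurrence
$$F(i,j) = \max \begin{cases} F(0, j) & \text{start of new match} \\ F(i-1, j-1) + s(x_i, y_j) & \text{extension of previous match} \\ F(i, j-1) - d & \text{extension of previous match} \\ F(i-1, j) - d & \text{extension of previous match} \end{cases}$$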
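The two recurrences for repeated matches can be combined into a column-by-column fill: row 0 of column j is computed from column j-1, then the rest of column j follows. The sketch below makes several assumptions not stated on the slides: a threshold T = 20 and gap penalty d = 8, a few BLOSUM50 pair scores for the residues of the running example (X = PAWHEAE, Y = HEAGAWGHEE), and F(i,0) = 0. The names `PAIR_SCORES`, `s` and `repeated_matches` are hypothetical, and X is assumed non-empty.

```python
# Assumed BLOSUM50 entries for the residue pairs that occur in this example.
PAIR_SCORES = {
    ("A", "A"): 5, ("A", "E"): -1, ("A", "G"): 0, ("A", "H"): -2,
    ("A", "P"): -1, ("A", "W"): -3, ("E", "E"): 6, ("E", "G"): -3,
    ("E", "H"): 0, ("E", "P"): -1, ("E", "W"): -3, ("G", "H"): -2,
    ("G", "P"): -2, ("G", "W"): -3, ("H", "H"): 10, ("H", "P"): -2,
    ("H", "W"): -3, ("P", "W"): -4, ("W", "W"): 15,
}


def s(a, b):
    """Symmetric lookup: s(a, b) = s(b, a)."""
    return PAIR_SCORES[(a, b) if a <= b else (b, a)]


def repeated_matches(x, y, d=8, T=20):
    """Return (F, total) where total is the value of the extra final cell."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for j in range(1, n + 1):
        # Row 0: stay unmatched, or close a match that ended in column j-1
        # and pay the threshold T.
        F[0][j] = max(F[0][j - 1],
                      max(F[i][j - 1] for i in range(1, m + 1)) - T)
        for i in range(1, m + 1):
            F[i][j] = max(F[0][j],                                  # start of new match
                          F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # extension
                          F[i][j - 1] - d,                          # extension
                          F[i - 1][j] - d)                          # extension
    # Extra cell: same rule as row 0, applied once more after the last column.
    total = max(F[0][n], max(F[i][n] for i in range(1, m + 1)) - T)
    return F, total


F, total = repeated_matches("PAWHEAE", "HEAGAWGHEE")
print(F[0])
print("final total score:", total)
```

Because the threshold is subtracted each time a match is closed, only matches scoring more than T can raise the values in row 0.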
Example
F matrix (rows: X = P A W H E A E; columns: Y = H E A G A W G H E E)

        H   E   A   G   A   W   G   H   E   E
    0   0   0   0   1   1   1   1   1   3   9
P   0   0   0   0   1   1   1   1   1   3   9
A   0   0   0   5   1   6   1   1   1   3   9
W   0   0   0   0   2   1  21  13   5   3   9
H   0  10   2   0   1   1  13  19  23  15   9
E   0   2  16   8   1   1   5  11  19  29  21
A   0   0   8  21  13   6   1   5  11  21  28
E   0   0   6  13  18  12   4   1   5  17  27

Extra cell for final total score: 9

Alignment
Y  H E A G A W G H E E
X  H E A . A W - H E .
Overlap matches
[Figure: possible overlap configurations of X and Y]
Don’t penalise overhanging ends, i.e. set $F(i,0) = F(0,j) = 0$
Otherwise
$$F(i,j) = \max \begin{cases} F(i-1, j-1) + s(x_i, y_j) \\ F(i, j-1) - d \\ F(i-1, j) - d \end{cases}$$
Example
F matrix (rows: X = P A W H E A E; columns: Y = H E A G A W G H E E)

        H   E   A   G   A   W   G   H   E   E
    0   0   0   0   0   0   0   0   0   0   0
P   0  -2  -1  -1  -2  -1  -4  -2  -2  -1  -1
A   0  -2  -3   4  -1   3  -4  -4  -4  -3  -2
W   0  -3  -5  -4   1  -4  18  10   2  -6  -6
H   0  10   2  -6  -6  -1  10  16  20  12   4
E   0   2  16   8   0  -7   2   8  16  26  18
A   0  -2   8  21  13   5  -3   2   8  18  25
E   0   0   4  13  18  12   4  -4   2  14  24

Alignment
Y  G A W G H E E
X  P A W - H E A
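An overlap alignment like the one above can be sketched by keeping the global recurrence, zeroing the first row and column, and starting the traceback from the best entry in the last row or last column. Taking the best entry on that border is the usual way to leave trailing overhangs unpenalised, but it is an assumption here, as are the BLOSUM50 pair scores, the gap penalty d = 8, and the names `PAIR_SCORES`, `s` and `overlap_align`.

```python
# Assumed BLOSUM50 entries for the residue pairs that occur in this example.
PAIR_SCORES = {
    ("A", "A"): 5, ("A", "E"): -1, ("A", "G"): 0, ("A", "H"): -2,
    ("A", "P"): -1, ("A", "W"): -3, ("E", "E"): 6, ("E", "G"): -3,
    ("E", "H"): 0, ("E", "P"): -1, ("E", "W"): -3, ("G", "H"): -2,
    ("G", "P"): -2, ("G", "W"): -3, ("H", "H"): 10, ("H", "P"): -2,
    ("H", "W"): -3, ("P", "W"): -4, ("W", "W"): 15,
}


def s(a, b):
    """Symmetric lookup: s(a, b) = s(b, a)."""
    return PAIR_SCORES[(a, b) if a <= b else (b, a)]


def overlap_align(x, y, d=8):
    """Return (best score, aligned x, aligned y) for an overlap alignment."""
    m, n = len(x), len(y)
    # Leading overhangs are free: F(i,0) = F(0,j) = 0.
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          F[i][j - 1] - d,
                          F[i - 1][j] - d)
    # Trailing overhangs are free: the alignment may end anywhere on the
    # last column or the last row.
    border = [(F[i][n], i, n) for i in range(m + 1)]
    border += [(F[m][j], m, j) for j in range(n + 1)]
    best, i, j = max(border)
    # Traceback until the first row or first column is reached.
    row_x, row_y = [], []
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            row_x.append(x[i - 1]); row_y.append(y[j - 1])
            i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] - d:
            row_x.append(x[i - 1]); row_y.append("-")
            i -= 1
        else:
            row_x.append("-"); row_y.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(row_x)), "".join(reversed(row_y))


best, aligned_x, aligned_y = overlap_align("PAWHEAE", "HEAGAWGHEE")
print(best)
print("Y", aligned_y)
print("X", aligned_x)
```

The overhanging residues (the start of Y and the end of X in this example) are left out of the returned rows.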