
Pairwise Alignment

Alexei Drummond

CS369 2007

Week 1 Learning Outcomes

• Have an appreciation of what Computational Biology is
• Know what DNA, RNA and protein sequences are :-)
• Understand that sequence evolution can be modeled with a stochastic model of evolution, so that the probability of evolving from one character to another in a certain time can be calculated
• Know what the Jukes-Cantor and general time-reversible models of molecular evolution imply in terms of rates and base frequencies

Week 2 Learning Outcomes

• Understand the basic principles of dynamic programming
• Be familiar with the application of dynamic programming to a variety of simple examples such as
  – Knapsack problem
  – RNA secondary structure problem

Dynamic Programming

• method for solving combinatorial optimization problems
• guaranteed to give optimal solution
• generalization of “divide-and-conquer”
• relies on “Principle of Optimality”, i.e. a sub-optimal solution of a sub-problem cannot be part of an optimal solution of the original problem instance

Principle of Optimality

[Figure: route map showing Auckland, Te Kuiti and Wellington.]

Key to efficiency

• computation is carried out bottom-up
• store solutions to sub-problems in a table
• all possible sub-problems solved once each, beginning with smallest sub-problems
• work up to original problem instance
• only optimal solutions to sub-problems are used to compute solution to problem at next level
• DO NOT carry out computation in recursive, top-down manner
  – same sub-problems would be solved many times
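The bottom-up, table-filling idea can be sketched on the knapsack problem mentioned in the Week 2 outcomes. This is an illustrative sketch only: the function name `knapsack` and the choice of the 0/1 variant with integer weights are assumptions, not taken from the slides.

```python
def knapsack(values, weights, capacity):
    """Best total value of items that fit in `capacity` (0/1 knapsack)."""
    n = len(values)
    # best[i][c] = best value using only the first i items with capacity c.
    # Row 0 (no items) is the smallest sub-problem and is all zeros.
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(capacity + 1):
            # Option 1: leave item i out.
            best[i][c] = best[i - 1][c]
            # Option 2: take item i, if it fits, on top of an optimal
            # solution to the smaller sub-problem.
            if weights[i - 1] <= c:
                take = best[i - 1][c - weights[i - 1]] + values[i - 1]
                best[i][c] = max(best[i][c], take)
    return best[n][capacity]


print(knapsack([60, 100, 120], [10, 20, 30], 50))
```

Each table entry is computed once from entries already stored, which is the point of working bottom-up.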

Pairwise alignment

Sequences

x = a c g g t s
y = a w g c c t t

Alignment

x = a – c g g – t s
y = a w – g c c t t

Scoring

• Numeric score associated with each column
• Total score = sum of column scores
• Column types:
  (1) Identical (+ve)
  (2) Conservative (+ve)
  (3) Non-conservative (-ve)
  (4) Gap (-ve)

x = a – c g g – t s
y = a w – g c c t t

Scoring

• Model-based
  – Log-odds scoring
• Empirical
  – Often used for amino acid alignments
  – PAM matrices
  – BLOSUM matrices
  – JTT
  – WAG
• Different matrices used depending on the level of similarity of the sequences.
  – How do you know the similarity before doing the alignment?

Log-odds matrices

“What we want to know is whether two sequences are homologous (evolutionarily related) or not, so we want an alignment score that reflects that. Theory says that if you want to compare two hypotheses, a good score is the log-odds score: the logarithm of the ratio of the likelihoods of your two hypotheses. If we assume that each aligned residue pair is statistically independent of the others (biologically dubious, but mathematically convenient), the alignment score is the sum of the individual log-odds score for each aligned residue pair.” (Sean R Eddy, 2004)

Log-odds matrices

“The numerator (pab) is the likelihood of the hypothesis we want to test: that these two residues are correlated because they’re homologous. Thus, pab are the target frequencies: the probability that we expect to observe residues a and b aligned in homologous sequence alignments. The denominator is the likelihood of a null hypothesis: that these two residues are uncorrelated and unrelated, occurring independently” (Sean R Eddy, 2004)

$$s(a,b) = \frac{1}{\lambda} \log \frac{p_{ab}}{f_a f_b}$$
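As a small illustration of the log-odds formula, the sketch below evaluates s(a,b) from a target frequency and two background frequencies. The function name `log_odds_score`, the default scale `lam=1.0` and the use of the natural logarithm are assumptions made for this example; the slides do not fix a log base.

```python
import math


def log_odds_score(p_ab, f_a, f_b, lam=1.0):
    """s(a,b) = (1/lambda) * log(p_ab / (f_a * f_b)), natural log assumed."""
    return math.log(p_ab / (f_a * f_b)) / lam


# A pair seen exactly as often as chance predicts scores 0;
# a pair seen twice as often as chance scores log(2).
print(log_odds_score(0.0625, 0.25, 0.25))
print(log_odds_score(0.125, 0.25, 0.25))
```

Pairs that are more frequent in homologous alignments than expected by chance get positive scores, and rarer pairs get negative scores.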

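Evolutionary interpretation of match/mismatch scores

[Figure: sequences x and y in two scenarios. If a, b are homologous, x and y are joined through a common ancestor by branches of length t/2, so $d = \mu t$. If a, b are not homologous, $d = \infty$.]

d = average number of changes per site (d = 0.1 is roughly 90% similarity)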

Jukes Cantor Model

• All mutations are equally likely
  – x → y at the same rate for all x, y
• All nucleotides are equally likely (equal base frequencies):
  – {0.25, 0.25, 0.25, 0.25} for DNA, $x, y \in \{A,C,G,T\}$
  – {0.05, …, 0.05} for proteins, $x, y \in \{A,R,N,D,C,E,Q,G,H,I,L,K,M,F,P,S,T,W,Y,V\}$

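Evolutionary interpretation of match/mismatch scores (DNA)

[Figure: sequences x and y separated by $d = \mu t$ (homologous) or $d = \infty$ (not homologous).]

d = average number of changes per site (d = 0.1 is roughly 90% similarity)

$$P_{xx}(d) = \frac{1}{4} + \frac{3}{4} e^{-\frac{4}{3} d}$$

$$P_{x \neq y}(d) = 1 - P_{xx}(d) = \frac{3}{4} - \frac{3}{4} e^{-\frac{4}{3} d}$$

$$\lim_{d \to \infty} P_{xx}(d) = 0.25, \qquad \lim_{d \to \infty} P_{x \neq y}(d) = 0.75$$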
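The two Jukes-Cantor probabilities for DNA can be evaluated directly. This is a minimal sketch; the function names `jc_match_prob` and `jc_mismatch_prob` are illustrative and not from the slides.

```python
import math


def jc_match_prob(d):
    """P_xx(d): probability of ending in the same state after distance d."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)


def jc_mismatch_prob(d):
    """P_{x != y}(d): probability of ending in a different state."""
    return 1.0 - jc_match_prob(d)


for d in (0.0, 0.1, 1.0, 10.0):
    print(d, round(jc_match_prob(d), 4), round(jc_mismatch_prob(d), 4))
```

At d = 0.1 the match probability is about 0.906, in line with the note that d = 0.1 is roughly 90% similarity, and for large d it approaches 0.25.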

Log-odds match score

$$s_{xx} = \frac{1}{\lambda} \log \frac{P_{xx}(d)}{\lim_{d \to \infty} P_{xx}(d)} = \frac{1}{\lambda} \log \frac{P_{xx}(d)}{0.25}$$

Numerator: probability of ending in the same state after time d
Denominator: probability of ending in the same state after infinite time

Log-odds mismatch score

$$s_{x \neq y} = \frac{1}{\lambda} \log \frac{P_{x \neq y}(d)}{\lim_{d \to \infty} P_{x \neq y}(d)} = \frac{1}{\lambda} \log \frac{P_{x \neq y}(d)}{0.75}$$

Numerator: probability of ending in y (different from x) after time d
Denominator: probability of ending in y (different from x) after infinite time
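Putting the match and mismatch formulas together, the sketch below computes both log-odds scores for a given distance d under the Jukes-Cantor DNA model. The function name `jc_log_odds`, the default `lam=1.0` and the natural logarithm are assumptions for illustration; d must be greater than 0, since the mismatch probability is 0 at d = 0.

```python
import math


def jc_log_odds(d, lam=1.0):
    """Return (s_match, s_mismatch) for distance d > 0, natural log assumed."""
    p_match = 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)
    p_mismatch = 1.0 - p_match
    # Divide by the d -> infinity limits 0.25 and 0.75 from the slides.
    s_match = math.log(p_match / 0.25) / lam
    s_mismatch = math.log(p_mismatch / 0.75) / lam
    return s_match, s_mismatch


for d in (0.1, 0.5, 1.0, 2.0):
    s_match, s_mismatch = jc_log_odds(d)
    print(d, round(s_match, 3), round(s_mismatch, 3))
```

Closely related sequences (small d) give a modest positive match score and a strongly negative mismatch score; both scores shrink towards 0 as d grows.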


Match and mismatch probabilities

[Figure: P(x=y, d) and P(x!=y, d) plotted against evolutionary distance (substitutions per site), 0 to 2; vertical axis: probability, 0 to 1.]


LogOdds Scores (Jukes Cantor model)

[Figure: LogOdds(match) and LogOdds(mismatch) plotted against evolutionary distance (substitutions per site), 0 to 2; vertical axis: scores, -2.50 to 2.00.]

BLOSUM50 matrix

[Figure: BLOSUM50 matrix (image not available).]

Gap penalties

• Linear score: $\gamma(g) = -gd$, where d is the gap penalty
• Affine score: $\gamma(g) = -d - (g-1)e$, where d is the gap-open penalty and e is the gap-extension penalty

[Figure: a gap of length g in the alignment of x and y.]
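The two gap scores are easy to compare numerically. A minimal sketch, with illustrative function names `linear_gap` and `affine_gap` and example penalty values that are assumptions rather than values from the slides:

```python
def linear_gap(g, d):
    """Linear gap score: every gap position costs d."""
    return -g * d


def affine_gap(g, d, e):
    """Affine gap score: d to open the gap, e for each further position."""
    return -d - (g - 1) * e


# With d = 12 and e = 2, one long gap is much cheaper under the affine
# score than under a linear score with the same d.
for g in (1, 2, 5):
    print(g, linear_gap(g, 12), affine_gap(g, 12, 2))
```

With e smaller than d, the affine score penalises opening a gap more heavily than lengthening one.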

Needleman & Wunsch algorithm

• Dynamic programming algorithm for global alignment

• Needleman & Wunsch (‘70), modified by Gotoh (‘82)

Assumptions:

• Linear gap score d
• Symmetric scoring matrix S
  – s(a,b) = s(b,a): score from lining up a and b
  – s(a,-) = s(-,a) = -d: score from lining up a with -


Given sequences:

$$X = (x_1, x_2, \ldots, x_m), \qquad Y = (y_1, y_2, \ldots, y_n)$$

Define:

F(i,j) = score of best alignment between $(x_1, x_2, \ldots, x_i)$ and $(y_1, y_2, \ldots, y_j)$


Optimal alignment

The optimal alignment of $x_1, x_2, x_3, \ldots, x_i$ with $y_1, y_2, y_3, \ldots, y_j$, with score $F(i,j)$, looks like one of three cases.

$x_i$ aligned with $y_j$, following the best alignment of $x_1, \ldots, x_{i-1}$ with $y_1, \ldots, y_{j-1}$:

$$F(i-1, j-1) + s(x_i, y_j)$$

or a gap aligned with $y_j$, following the best alignment of $x_1, \ldots, x_i$ with $y_1, \ldots, y_{j-1}$:

$$F(i, j-1) - d$$

or $x_i$ aligned with a gap, following the best alignment of $x_1, \ldots, x_{i-1}$ with $y_1, \ldots, y_j$:

$$F(i-1, j) - d$$

so

$$F(i,j) = \max \begin{cases} F(i-1, j-1) + s(x_i, y_j) \\ F(i, j-1) - d \\ F(i-1, j) - d \end{cases}$$

Principle of Optimality

Basis:

$x_1, x_2, x_3, \ldots, x_i$ aligned entirely with gaps:

$$F(i, 0) = F(i-1, 0) + s(x_i, -)$$

$y_1, y_2, y_3, \ldots, y_j$ aligned entirely with gaps:

$$F(0, j) = F(0, j-1) + s(-, y_j)$$

$$F(0, 0) = 0$$
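The recurrence and basis can be sketched as a table fill. This is an illustrative sketch, not the lecture's own code: `nw_fill` and `simple_score` are hypothetical names, and the +1/-1 match/mismatch scoring is a toy stand-in for a real substitution matrix.

```python
def simple_score(a, b):
    """Toy scoring (an assumption): +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


def nw_fill(x, y, score, d):
    """Fill the (m+1) x (n+1) global alignment table F with linear gap cost d."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    # Basis: a prefix aligned entirely against gaps, using s(a,-) = -d.
    for i in range(1, m + 1):
        F[i][0] = F[i - 1][0] - d
    for j in range(1, n + 1):
        F[0][j] = F[0][j - 1] - d
    # Recurrence: every cell depends only on cells already filled.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # x_i with y_j
                F[i][j - 1] - d,                              # gap with y_j
                F[i - 1][j] - d,                              # x_i with gap
            )
    return F


F = nw_fill("ACG", "AG", simple_score, 2)
print(F[-1][-1])  # F(m, n), the optimal global alignment score
```

Rows are indexed by X and columns by Y, and the bottom-right entry F(m,n) is the score of the best global alignment.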

Filling up table

[Figure: the F matrix, with rows 0, 1, 2, ..., m indexed by X and columns 0, 1, 2, ..., n indexed by Y, filled in cell by cell over successive slides, starting from the entry 0 at F(0,0). The final entry, F(m,n), is the optimal alignment score.]

Constructing alignment

[Figure: the F matrix, with rows 0, 1, 2, ..., m indexed by X and columns 0, 1, 2, ..., n indexed by Y.]

Example

F matrix (rows: X = PAWHEAE, columns: Y = HEAGAWGHEE)

          H    E    A    G    A    W    G    H    E    E
     0   -8  -16  -24  -32  -40  -48  -56  -64  -72  -80
P   -8   -2   -9  -17  -25  -33  -42  -49  -57  -65  -73
A  -16  -10   -3   -4  -12  -20  -28  -36  -44  -52  -60
W  -24  -18  -11   -6   -7  -15   -5  -13  -21  -29  -37
H  -32  -14  -18  -13   -8   -9  -13   -7   -3  -11  -19
E  -40  -22   -8  -16  -16   -9  -12  -15   -7    3   -5
A  -48  -30  -16   -3  -11  -11  -12  -12  -15   -5    2
E  -56  -38  -24  -11   -6  -12  -14  -15  -12   -9    1

Alignment, built up one column at a time from right to left over successive slides:

Y  H E A G A W G H E - E
X  - - P - A W - H E A E
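Constructing the alignment by tracing back through the filled table can be sketched as follows. This is a hedged illustration: `needleman_wunsch` and `simple_score` are hypothetical names, and the toy +1/-1 scoring is an assumption, so the sketch does not reproduce the F matrix shown in the example, which uses a substitution matrix.

```python
def simple_score(a, b):
    """Toy scoring (an assumption): +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


def needleman_wunsch(x, y, score, d):
    """Return (optimal score, aligned x, aligned y) for linear gap cost d."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        F[i][0] = -i * d
    for j in range(1, n + 1):
        F[0][j] = -j * d
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                          F[i][j - 1] - d,
                          F[i - 1][j] - d)
    # Traceback from F(m, n) to F(0, 0), emitting columns right to left.
    ax, ay = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1])   # x_i aligned with a gap
            ay.append("-")
            i -= 1
        else:
            ax.append("-")        # gap aligned with y_j
            ay.append(y[j - 1])
            j -= 1
    return F[m][n], "".join(reversed(ax)), "".join(reversed(ay))


print(needleman_wunsch("ACG", "AG", simple_score, 2))
```

When several predecessors tie, this sketch prefers the diagonal move, then the move up, then the move left; other tie-breaking rules give different alignments with the same optimal score.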

Time and space

$(m+1) \times (n+1)$ table entries $\Rightarrow \Theta(mn)$ space

Each entry computed in constant time $\Rightarrow \Theta(mn)$ time

Smith & Waterman algorithm

Computes local alignment, i.e. look for best alignment of subsequences of X and Y, ignoring scores of regions on either side.

[Figure: best subsequence alignment between a region of X and a region of Y.]


Given sequences

$$X = (x_1, x_2, \ldots, x_m), \qquad Y = (y_1, y_2, \ldots, y_n)$$

Define F(i,j) = score of best suffix alignment between $(x_s, x_{s+1}, \ldots, x_i)$ where $s \le i$ and $(y_r, y_{r+1}, \ldots, y_j)$ where $r \le j$

N.B. Includes empty alignment with score 0

Dynamic Programming recurrences

The optimal alignment of $x_s, x_{s+1}, x_{s+2}, \ldots, x_i$ with $y_r, y_{r+1}, y_{r+2}, \ldots, y_j$, with score $F(i,j)$, looks like one of four cases.

$x_i$ aligned with $y_j$, following the best alignment of $x_s, \ldots, x_{i-1}$ with $y_r, \ldots, y_{j-1}$:

$$F(i-1, j-1) + s(x_i, y_j)$$

or a gap aligned with $y_j$, following the best alignment of $x_s, \ldots, x_i$ with $y_r, \ldots, y_{j-1}$:

$$F(i, j-1) - d$$

or $x_i$ aligned with a gap, following the best alignment of $x_s, \ldots, x_{i-1}$ with $y_r, \ldots, y_j$:

$$F(i-1, j) - d$$

or the empty alignment:

$$0$$

so

$$F(i,j) = \max \begin{cases} 0 \\ F(i-1, j-1) + s(x_i, y_j) \\ F(i, j-1) - d \\ F(i-1, j) - d \end{cases}$$

Basis:

$$F(i,0) = F(0,j) = 0$$
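The local alignment recurrence differs from the global one only in the extra 0 option and the zero basis. The sketch below is illustrative: `smith_waterman` and `toy_score` are hypothetical names, the +2/-1 scoring is an assumption, and the rule of starting the traceback at the highest-scoring cell and stopping at a 0 is the standard Smith-Waterman procedure rather than something spelled out on the slides.

```python
def toy_score(a, b):
    """Toy scoring (an assumption): +2 for a match, -1 for a mismatch."""
    return 2 if a == b else -1


def smith_waterman(x, y, score, d):
    """Return (best local score, aligned part of x, aligned part of y)."""
    m, n = len(x), len(y)
    F = [[0] * (n + 1) for _ in range(m + 1)]  # basis: F(i,0) = F(0,j) = 0
    best, bi, bj = 0, 0, 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(0,
                          F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                          F[i][j - 1] - d,
                          F[i - 1][j] - d)
            if F[i][j] > best:
                best, bi, bj = F[i][j], i, j
    # Trace back from the best cell until a cell with score 0 is reached.
    ax, ay = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and F[i][j] > 0:
        if F[i][j] == F[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("TTACGTT", "GGACGGG", toy_score, 2))
```

Because no cell can fall below 0, poorly matching regions on either side of the best local alignment do not drag its score down.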

Example

F     H   E   A   G   A   W   G   H   E   E
   0   0   0   0   0   0   0   0   0   0   0
P  0   0   0   0   0   0   0   0   0   0   0
A  0   0   0   5   0   5   0   0   0   0   0
W  0   0   0   0   2   0  20  12   4   0   0
H  0  10   2   0   0   0  12  18  22  14   6
E  0   2  16   8   0   0   4  10  18  28  20
A  0   0   8  21  13   5   0   4  10  20  27
E  0   0   6  13  18  12   4   0   4  16  26

Alignment

Y  A W G H E
X  A W - H E

Repeated (local) matches

Long sequences: interested in all local alignments with significant score, > threshold T, e.g. copies of a repeated domain or motif in a protein.

X = sequence containing motif
Y = target sequence

Method is asymmetric.

[Figure: sequence Y with several regions, each matching parts of X.]


Given sequences

$$X = (x_1, x_2, \ldots, x_m), \qquad Y = (y_1, y_2, \ldots, y_n)$$

Define F(i,j) (i ≥ 1) = best sum of match scores in $(x_1, x_2, \ldots, x_i)$ and $(y_1, y_2, \ldots, y_j)$, assuming $y_j$ is in a matched region and match ends in $x_i$ or $y_j$

Ends of matches

$$F(0,0) = 0$$

F(0,j) = best sum of completed match scores to $(y_1, y_2, \ldots, y_j)$, assuming that $y_j$ is not in a matched region

$$F(0,j) = \max \begin{cases} F(0, j-1) \\ F(i, j-1) - T, & i = 1, \ldots, m \end{cases}$$

Row 0 therefore marks unmatched regions and ends of matches in Y.

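General recurrence

$$F(i,j) = \max \begin{cases} F(0, j) & \text{start of new match} \\ F(i-1, j-1) + s(x_i, y_j) & \text{extension of previous match} \\ F(i, j-1) - d & \text{extension of previous match} \\ F(i-1, j) - d & \text{extension of previous match} \end{cases}$$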
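A score-only sketch of the repeated-match table fill follows. It is an illustration under stated assumptions: `repeated_match_score` and `toy_score` are hypothetical names, the +5/-4 scoring is a toy choice, column 0 is taken to be all zeros, and the final total is read from an extra cell in row 0 one column past the end of Y.

```python
def toy_score(a, b):
    """Toy scoring (an assumption): +5 for a match, -4 for a mismatch."""
    return 5 if a == b else -4


def repeated_match_score(x, y, score, d, T):
    """Total score of all matches of x found in y that beat threshold T."""
    m, n = len(x), len(y)
    # Columns 0..n follow y; column n+1 is the extra cell for the total.
    F = [[0] * (n + 2) for _ in range(m + 1)]
    for j in range(1, n + 2):
        # Row 0: either y_j stays unmatched, or a match ended in column
        # j-1 and is accepted at a cost of T.
        ends = [F[i][j - 1] - T for i in range(1, m + 1)]
        F[0][j] = max([F[0][j - 1]] + ends)
        if j <= n:
            for i in range(1, m + 1):
                F[i][j] = max(
                    F[0][j],                                      # new match
                    F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # extend
                    F[i][j - 1] - d,
                    F[i - 1][j] - d,
                )
    return F[0][n + 1]


# Two copies of the motif "AB" in the target: each scores 10, minus T = 6.
print(repeated_match_score("AB", "ABCCAB", toy_score, 8, 6))
```

Each accepted match contributes its score minus T to the total, so only matches scoring above the threshold increase it.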


Example

F     H   E   A   G   A   W   G   H   E   E
   0   0   0   0   1   1   1   1   1   3   9
P  0   0   0   0   1   1   1   1   1   3   9
A  0   0   0   5   1   6   1   1   1   3   9
W  0   0   0   0   2   1  21  13   5   3   9
H  0  10   2   0   1   1  13  19  23  15   9
E  0   2  16   8   1   1   5  11  19  29  21
A  0   0   8  21  13   6   1   5  11  21  28
E  0   0   6  13  18  12   4   1   5  17  27

Extra cell for final total score: 9

Alignment

Y  H E A G A W G H E E
X  H E A . A W - H E .

Overlap matches

[Figure: four possible overlap configurations of X and Y.]

Don’t penalise overhanging ends, i.e. set F(i,0) = F(0,j) = 0

Otherwise

$$F(i,j) = \max \begin{cases} F(i-1, j-1) + s(x_i, y_j) \\ F(i, j-1) - d \\ F(i-1, j) - d \end{cases}$$
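A score-only sketch of overlap matching follows. The names `overlap_score` and `simple_score` and the toy +1/-1 scoring are assumptions for illustration. Reading the result as the maximum over the last row and last column is the usual convention for overlap alignment; the slides only state the zero boundary, so treat that step as an assumption too.

```python
def simple_score(a, b):
    """Toy scoring (an assumption): +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


def overlap_score(x, y, score, d):
    """Best overlap alignment score of x and y with linear gap cost d."""
    m, n = len(x), len(y)
    # Zero boundary: leading overhangs of either sequence cost nothing.
    F = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                          F[i][j - 1] - d,
                          F[i - 1][j] - d)
    # Trailing overhangs are free as well, so the alignment may end
    # anywhere on the bottom row or the rightmost column.
    last_row = F[m]
    last_col = [F[i][n] for i in range(m + 1)]
    return max(last_row + last_col)


# The suffix "AB" of x overlaps the prefix "AB" of y.
print(overlap_score("CCCAB", "ABGGG", simple_score, 2))
```

Inside the table the recurrence is the same as for global alignment; only the boundary and the choice of end cell change.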


F     H   E   A   G   A   W   G   H   E   E
   0   0   0   0   0   0   0   0   0   0   0
P  0  -2  -1  -1  -2  -1  -4  -2  -2  -1  -1
A  0  -2  -2   4  -1   3  -4  -4  -4  -3  -2
W  0  -3  -5  -4   1  -4  18  10   2  -6  -6
H  0  10   2  -6  -6  -1  10  16  20  12   4
E  0   2  16   8   0  -7   2   8  16  26  18
A  0  -2   8  21  13   5  -3   2   8  18  25
E  0   0   4  13  18  12   4  -4   2  14  24

Alignment

Y  G A W G H E E
X  P A W - H E A
