Sequence Alignment Oct 9, 2002 Joon Lee Genomics & Computational Biology

Upload: roary-brennan

Post on 30-Dec-2015

37 views

Category:

Documents


3 download

DESCRIPTION

Sequence Alignment. Oct 9, 2002 Joon Lee Genomics & Computational Biology. Dynamic Programming. Optimization problems: find the best decision one after another Subproblems are not independent Subproblems share subsubproblems Solve subproblem, save its answer in a table. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Sequence Alignment

Sequence Alignment

Oct 9, 2002

Joon Lee

Genomics & Computational Biology

Page 2: Sequence Alignment

2002-10-09 Genomics & Computational Biology 2

Dynamic Programming

• Optimization problems: find the best sequence of decisions, one after another

• Subproblems are not independent

• Subproblems share subsubproblems

• Solve each subproblem once and save its answer in a table
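
As a minimal sketch of the "save its answer in a table" idea, the snippet below computes a global alignment score top-down and caches each subproblem. The function names `best_score` and `score` are hypothetical, and the scoring values (match +2, mismatch -1, gap -2) are taken from the scoring slide later in this deck. It is an illustration of memoization, not the method used on the slides, which fill the table bottom-up.

```python
from functools import lru_cache

MATCH, MISMATCH, GAP = 2, -1, -2  # values from the scoring slide of this deck


def best_score(a: str, b: str) -> int:
    """Best global alignment score of a and b, computed top-down with a cache."""

    @lru_cache(maxsize=None)  # the "table": one saved answer per (i, j)
    def score(i: int, j: int) -> int:
        # score(i, j) is the best score for aligning a[:i] with b[:j]
        if i == 0:
            return j * GAP  # only gaps remain
        if j == 0:
            return i * GAP
        sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
        return max(
            score(i - 1, j - 1) + sub,  # align a[i-1] with b[j-1]
            score(i - 1, j) + GAP,      # a[i-1] against a gap
            score(i, j - 1) + GAP,      # b[j-1] against a gap
        )

    return score(len(a), len(b))


print(best_score("GAATTCAGTTA", "GGATCGA"))  # 3
```

Without the cache, the three recursive calls would recompute the same (i, j) pairs many times; with it, each pair is solved once.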

Page 3: Sequence Alignment

Four Steps of DP

1. Characterize the structure of an optimal solution

2. Recursively define the value of an optimal solution

3. Compute the value of an optimal solution in a bottom-up fashion

4. Construct an optimal solution from computed information

Page 4: Sequence Alignment

Sequence Alignment

Sequence 1: G A A T T C A G T T A

Sequence 2: G G A T C G A

Page 5: Sequence Alignment

Align or insert gap

G A A T T C A G T T A
|   |   | |   |     |
G G A _ T C _ G _ _ A

G _ A A T T C A G T T A
|     |   | |   |     |
G G _ A _ T C _ G _ _ A

Page 6: Sequence Alignment

Three Steps of Sequence Alignment

1. Initialization: gap penalty

2. Scoring: matrix fill

3. Alignment: trace back

Page 7: Sequence Alignment

Step 1: Initialization

        G   A   A   T   T   C   A   G   T   T   A
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18 -20 -22
G  -2
G  -4
A  -6
T  -8
C -10
G -12
A -14
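
The initialization step can be sketched as follows. The helper name `init_matrix` and the `top`/`side` parameter names are hypothetical; the gap penalty of -2 is the one the first row and column above imply. Cell (0, j) holds j gaps and cell (i, 0) holds i gaps.

```python
GAP = -2  # gap penalty implied by the first row and column on the slide


def init_matrix(top, side, gap=GAP):
    """Create the score matrix with the first row and column pre-filled."""
    rows, cols = len(side) + 1, len(top) + 1
    S = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        S[0][j] = j * gap  # j leading gaps in the side sequence
    for i in range(1, rows):
        S[i][0] = i * gap  # i leading gaps in the top sequence
    return S


S = init_matrix("GAATTCAGTTA", "GGATCGA")
print(S[0])                   # first row: 0, -2, ..., -22
print([row[0] for row in S])  # first column: 0, -2, ..., -14
```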

Page 8: Sequence Alignment

Step 2: Scoring

• A = a_1 a_2 … a_n, B = b_1 b_2 … b_m

• S_{i,j}: score at (i, j)

• s(a_i, b_j): matching score between a_i and b_j

• w: gap penalty
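
The figure for this slide did not survive in the transcript. A sketch of the recurrence these definitions lead to, assuming the usual global-alignment form with a linear gap penalty w (consistent with the worked calculations on the following slides, where w = -2) and taking i to index A and j to index B:

```latex
S_{0,0} = 0, \qquad S_{i,0} = i\,w, \qquad S_{0,j} = j\,w

S_{i,j} = \max
\begin{cases}
S_{i-1,j-1} + s(a_i, b_j) & \text{align } a_i \text{ with } b_j \\
S_{i-1,j} + w             & \text{align } a_i \text{ with a gap} \\
S_{i,j-1} + w             & \text{align } b_j \text{ with a gap}
\end{cases}
```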

Page 9: Sequence Alignment

Step 2: Scoring

• Match: +2

• Mismatch: -1

• Gap: -2
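
To see how these three values combine, here is a small sketch that scores an already gapped alignment column by column. The function name `alignment_score` is hypothetical, and `_` marks a gap as on the "Align or insert gap" slide. The two inputs are the two alignments shown on that slide.

```python
MATCH, MISMATCH, GAP = 2, -1, -2  # values from this slide


def alignment_score(row1: str, row2: str) -> int:
    """Sum the column scores of two equal-length gapped rows ('_' is a gap)."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row1, row2):
        if x == "_" or y == "_":
            total += GAP
        elif x == y:
            total += MATCH
        else:
            total += MISMATCH
    return total


# The two alignments from the "Align or insert gap" slide
print(alignment_score("GAATTCAGTTA", "GGA_TC_G__A"))    # 3
print(alignment_score("G_AATTCAGTTA", "GG_A_TC_G__A"))  # 0
```

The first alignment has six matches, one mismatch and four gaps (12 - 1 - 8 = 3); the second has six matches and six gaps (12 - 12 = 0), so under this scheme the first is the better of the two.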

Page 10: Sequence Alignment

Step 2: Scoring

        G   A   A   T   T   C   A   G   T   T   A
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18 -20 -22
G  -2   2
G  -4
A  -6
T  -8
C -10
G -12
A -14

Diagonal (match): 0 + 2 = 2

From above (gap): -2 + (-2) = -4

From the left (gap): -2 + (-2) = -4

Page 11: Sequence Alignment

Step 2: Scoring

        G   A   A   T   T   C   A   G   T   T   A
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18 -20 -22
G  -2   2   0
G  -4
A  -6
T  -8
C -10
G -12
A -14

Diagonal (mismatch): -2 + (-1) = -3

From above (gap): -4 + (-2) = -6

From the left (gap): 2 + (-2) = 0

Page 12: Sequence Alignment

Step 2: Scoring

        G   A   A   T   T   C   A   G   T   T   A
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18 -20 -22
G  -2   2   0
G  -4   0
A  -6
T  -8
C -10
G -12
A -14

Diagonal (match): -2 + 2 = 0

From above (gap): 2 + (-2) = 0

From the left (gap): -4 + (-2) = -6

Page 13: Sequence Alignment

Step 2: Scoring

        G   A   A   T   T   C   A   G   T   T   A
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18 -20 -22
G  -2   2   0  -2  -4  -6  -8 -10 -12 -14 -16 -18
G  -4   0   1  -1  -3  -5  -7  -9  -8 -10 -12 -14
A  -6  -2   2   3   1  -1  -3  -5  -7  -9 -11 -10
T  -8  -4   0   1   5   3   1  -1  -3  -5  -7  -9
C -10  -6  -2  -1   3   4   5   3   1  -1  -3  -5
G -12  -8  -4  -3   1   2   3   4   5   3   1  -1
A -14 -10  -6  -2  -1   0   1   5   3   4   2   3
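
The complete matrix fill can be sketched in a few lines. The function name `fill_matrix` and the `top`/`side` parameter names are hypothetical; the scores are those from the scoring slide. Each cell takes the maximum of the diagonal, upper and left candidates, exactly as in the single-cell calculations on the preceding slides.

```python
MATCH, MISMATCH, GAP = 2, -1, -2  # values from the scoring slide


def fill_matrix(top, side):
    """Fill the global alignment score matrix row by row (bottom-up DP)."""
    rows, cols = len(side) + 1, len(top) + 1
    S = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        S[0][j] = j * GAP
    for i in range(1, rows):
        S[i][0] = i * GAP
    for i in range(1, rows):
        for j in range(1, cols):
            sub = MATCH if side[i - 1] == top[j - 1] else MISMATCH
            S[i][j] = max(
                S[i - 1][j - 1] + sub,  # diagonal: align the two letters
                S[i - 1][j] + GAP,      # from above: gap in the top sequence
                S[i][j - 1] + GAP,      # from the left: gap in the side sequence
            )
    return S


S = fill_matrix("GAATTCAGTTA", "GGATCGA")
for label, row in zip(" " + "GGATCGA", S):
    print(label, " ".join(f"{v:3d}" for v in row))
print("alignment score:", S[-1][-1])  # 3, the bottom-right cell
```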

Page 14: Sequence Alignment

Step 3: Trace back

        G   A   A   T   T   C   A   G   T   T   A
    0  -2  -4  -6  -8 -10 -12 -14 -16 -18 -20 -22
G  -2   2   0  -2  -4  -6  -8 -10 -12 -14 -16 -18
G  -4   0   1  -1  -3  -5  -7  -9  -8 -10 -12 -14
A  -6  -2   2   3   1  -1  -3  -5  -7  -9 -11 -10
T  -8  -4   0   1   5   3   1  -1  -3  -5  -7  -9
C -10  -6  -2  -1   3   4   5   3   1  -1  -3  -5
G -12  -8  -4  -3   1   2   3   4   5   3   1  -1
A -14 -10  -6  -2  -1   0   1   5   3   4   2   3

Page 15: Sequence Alignment

Step 3: Trace back

G A A T T C A G T T A
G G A _ T C _ G _ _ A

G A A T T C A G T T A
G G A T _ C _ G _ _ A
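
A minimal sketch of the trace back, assuming the matrix was filled as on the scoring slides. The names `fill_matrix` and `traceback` are hypothetical. Starting at the bottom-right cell, it repeatedly steps to a neighbour that could have produced the current score, preferring the diagonal, then the left, then the cell above. That tie-breaking order is an arbitrary choice made here; a different order can return the other alignment shown above, which has the same score.

```python
MATCH, MISMATCH, GAP = 2, -1, -2  # values from the scoring slide


def fill_matrix(top, side):
    """Bottom-up fill of the global alignment score matrix."""
    rows, cols = len(side) + 1, len(top) + 1
    S = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        S[0][j] = j * GAP
    for i in range(1, rows):
        S[i][0] = i * GAP
    for i in range(1, rows):
        for j in range(1, cols):
            sub = MATCH if side[i - 1] == top[j - 1] else MISMATCH
            S[i][j] = max(S[i - 1][j - 1] + sub,
                          S[i - 1][j] + GAP,
                          S[i][j - 1] + GAP)
    return S


def traceback(top, side, S):
    """Rebuild one optimal alignment by walking from bottom-right to top-left."""
    i, j = len(side), len(top)
    row_top, row_side = [], []
    while i > 0 or j > 0:
        sub = None
        if i > 0 and j > 0:
            sub = MATCH if side[i - 1] == top[j - 1] else MISMATCH
        if sub is not None and S[i][j] == S[i - 1][j - 1] + sub:
            row_top.append(top[j - 1])    # diagonal: letters aligned
            row_side.append(side[i - 1])
            i, j = i - 1, j - 1
        elif j > 0 and S[i][j] == S[i][j - 1] + GAP:
            row_top.append(top[j - 1])    # left: gap in the side sequence
            row_side.append("_")
            j -= 1
        else:
            row_top.append("_")           # up: gap in the top sequence
            row_side.append(side[i - 1])
            i -= 1
    return "".join(reversed(row_top)), "".join(reversed(row_side))


top, side = "GAATTCAGTTA", "GGATCGA"
aligned_top, aligned_side = traceback(top, side, fill_matrix(top, side))
print(aligned_top)   # GAATTCAGTTA
print(aligned_side)  # GGA_TC_G__A
```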

Page 16: Sequence Alignment

Exercise

G C A T C C G

G

A

T

C

G

• Match: +2

• Mismatch: -1

• Gap: -2

Page 17: Sequence Alignment

Exercise

        G   C   A   T   C   C   G
    0  -2  -4  -6  -8 -10 -12 -14
G  -2   2   0  -2  -4  -6  -8 -10
A  -4   0   1   2   0  -2  -4  -6
T  -6  -2  -1   0   4   2   0  -2
C  -8  -4   0  -2   2   6   4   2
G -10  -6  -2  -1   0   4   5   6

• Match: +2

• Mismatch: -1

• Gap: -2

G C A T C C G
G A T C G
G A T C G
G A T C G
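
The gap positions of the exercise alignments are not visible in the lines above. As a hedged sketch, the snippet below enumerates every alignment that reaches the optimal score by following all tied trace-back steps instead of just one. The names `fill_matrix` and `all_optimal_alignments` are hypothetical, and the approach is a straightforward recursive walk, not necessarily how the exercise was solved in class.

```python
MATCH, MISMATCH, GAP = 2, -1, -2  # values given for the exercise


def fill_matrix(top, side):
    """Bottom-up fill of the global alignment score matrix."""
    rows, cols = len(side) + 1, len(top) + 1
    S = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        S[0][j] = j * GAP
    for i in range(1, rows):
        S[i][0] = i * GAP
    for i in range(1, rows):
        for j in range(1, cols):
            sub = MATCH if side[i - 1] == top[j - 1] else MISMATCH
            S[i][j] = max(S[i - 1][j - 1] + sub,
                          S[i - 1][j] + GAP,
                          S[i][j - 1] + GAP)
    return S


def all_optimal_alignments(top, side):
    """Return (score, alignments), following every tied trace-back step."""
    S = fill_matrix(top, side)
    found = []

    def walk(i, j, rev_top, rev_side):
        # rev_top / rev_side hold the alignment built so far, right to left
        if i == 0 and j == 0:
            found.append((rev_top[::-1], rev_side[::-1]))
            return
        if i > 0 and j > 0:
            sub = MATCH if side[i - 1] == top[j - 1] else MISMATCH
            if S[i][j] == S[i - 1][j - 1] + sub:
                walk(i - 1, j - 1, rev_top + top[j - 1], rev_side + side[i - 1])
        if j > 0 and S[i][j] == S[i][j - 1] + GAP:
            walk(i, j - 1, rev_top + top[j - 1], rev_side + "_")
        if i > 0 and S[i][j] == S[i - 1][j] + GAP:
            walk(i - 1, j, rev_top + "_", rev_side + side[i - 1])

    walk(len(side), len(top), "", "")
    return S[-1][-1], found


score, alignments = all_optimal_alignments("GCATCCG", "GATCG")
print("score:", score)  # 6, the bottom-right cell of the exercise matrix
for a_top, a_side in alignments:
    print(a_top)
    print(a_side)
    print()
```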

Page 18: Sequence Alignment

Amino acids

• Match/mismatch → Substitution matrix
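
A sketch of what replacing the match/mismatch rule with a substitution matrix looks like in code. The table `TOY_MATRIX` and the function `substitution_score` are hypothetical, and the numbers are made up for illustration; they are not taken from BLOSUM, PAM, or any other published matrix. The point is only that s(a_i, b_j) becomes a table lookup, so similar residues (here leucine and isoleucine) can score better than dissimilar ones.

```python
# Hypothetical toy scores for three amino acids (A, L, I).
# These values are invented for illustration, not from a published matrix.
TOY_MATRIX = {
    ("A", "A"): 4,
    ("L", "L"): 4,
    ("I", "I"): 4,
    ("L", "I"): 2,   # similar residues: mild reward instead of a flat mismatch
    ("A", "L"): -1,
    ("A", "I"): -1,
}


def substitution_score(x, y, matrix=TOY_MATRIX):
    """Look up s(x, y); each pair is stored once, so try both orders."""
    if (x, y) in matrix:
        return matrix[(x, y)]
    return matrix[(y, x)]


print(substitution_score("I", "L"))  # 2
print(substitution_score("A", "I"))  # -1
```

In the matrix fill, this lookup would take the place of the fixed +2 / -1 choice; the gap penalty and the rest of the procedure stay the same.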

Page 19: Sequence Alignment

Global & Local alignment

• Global: Needleman-Wunsch Algorithm

• Local: Smith-Waterman Algorithm
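
A minimal sketch of how the local (Smith-Waterman) score differs from the global fill used on the earlier slides: the first row and column start at 0, every cell is clamped at 0, and the result is the largest cell anywhere in the matrix rather than the bottom-right one. The function name `local_score` and the example sequences are hypothetical; the default scores reuse the match +2, mismatch -1, gap -2 scheme from this deck.

```python
def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (score only, no trace back)."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]  # borders stay 0
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                      # start a fresh local alignment here
                H[i - 1][j - 1] + sub,  # extend diagonally
                H[i - 1][j] + gap,      # gap in b
                H[i][j - 1] + gap,      # gap in a
            )
            best = max(best, H[i][j])
    return best


# Illustrative sequences sharing the block GATTACA with unrelated flanks
print(local_score("CCGATTACACC", "TTGATTACATT"))  # 14 = 7 matches * 2
```

The mismatching flanks are simply left out of the local alignment, whereas a global alignment of the same two sequences would have to pay for them.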

From Mount, Bioinformatics, Chap. 3

Page 20: Sequence Alignment

References

• Sequence alignment with Java applet: http://linneus20.ethz.ch:8080/5_4_5.html