Sequence Alignment
Oct 9, 2002
Joon Lee
Genomics & Computational Biology
2002-10-09 Genomics & Computational Biology 2
Dynamic Programming
• Optimization problems: find the best solution by making one decision after another
• Subproblems are not independent
• Subproblems share subsubproblems
• Solve each subproblem once and save its answer in a table
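To make the "shared subsubproblems" and "table" points concrete, here is a minimal sketch in Python, assuming the match +2 / mismatch -1 / gap -2 scheme the deck introduces in Step 2. The names `global_score` and `best` are hypothetical. It solves the alignment recursion top-down and uses `functools.lru_cache` as the table, so each prefix pair (i, j) is computed once even though many recursive paths reach it.

```python
from functools import lru_cache

# Scoring scheme used throughout the deck: match +2, mismatch -1, gap -2.
MATCH, MISMATCH, GAP = 2, -1, -2


def global_score(a: str, b: str) -> int:
    """Optimal global alignment score of a and b, computed top-down."""

    @lru_cache(maxsize=None)  # the cache plays the role of the DP table
    def best(i: int, j: int) -> int:
        # best(i, j) = optimal score for aligning the prefixes a[:i] and b[:j]
        if i == 0:
            return j * GAP  # the rest of b lines up against gaps
        if j == 0:
            return i * GAP  # the rest of a lines up against gaps
        pair = MATCH if a[i - 1] == b[j - 1] else MISMATCH
        # These three calls overlap heavily with the calls made by neighbouring
        # cells; the cache makes sure each (i, j) is computed only once.
        return max(
            best(i - 1, j - 1) + pair,  # align a[i-1] with b[j-1]
            best(i - 1, j) + GAP,       # a[i-1] against a gap
            best(i, j - 1) + GAP,       # b[j-1] against a gap
        )

    return best(len(a), len(b))


print(global_score("GAATTCAGTTA", "GGATCGA"))  # 3
```

The slides instead fill the table bottom-up (step 3 of the four steps below); both orders compute the same values.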
Four Steps of DP
1. Characterize the structure of an optimal solution
2. Recursively define the value of an optimal solution
3. Compute the value of an optimal solution in a bottom-up fashion
4. Construct an optimal solution from computed information
Sequence Alignment
Sequence 1: G A A T T C A G T T A
Sequence 2: G G A T C G A
Align or insert a gap
G A A T T C A G T T A
|   |   | |   |     |
G G A _ T C _ G _ _ A

G _ A A T T C A G T T A
|     |   | |   |     |
G G _ A _ T C _ G _ _ A
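The two candidates can be compared by adding up their column scores. This is a small sketch using the values given later in Step 2 (match +2, mismatch -1, gap -2); the helper name `alignment_score` is hypothetical, and `_` marks a gap as on the slide.

```python
# Slide scoring scheme; '_' marks a gap.
MATCH, MISMATCH, GAP = 2, -1, -2


def alignment_score(top: str, bottom: str) -> int:
    """Sum the column scores of an alignment written as two equal-length rows."""
    top, bottom = top.replace(" ", ""), bottom.replace(" ", "")
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "_" or y == "_":
            total += GAP        # letter against a gap
        elif x == y:
            total += MATCH      # identical letters
        else:
            total += MISMATCH   # different letters
    return total


print(alignment_score("G A A T T C A G T T A", "G G A _ T C _ G _ _ A"))      # 3
print(alignment_score("G _ A A T T C A G T T A", "G G _ A _ T C _ G _ _ A"))  # 0
```

Scoring every possible alignment this way is hopeless for real sequences; the dynamic programming steps that follow find the best score without enumerating them.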
Three Steps of Sequence Alignment
1. Initialization: fill the first row and column with gap penalties
2. Scoring: matrix fill
3. Alignment: trace back
Step 1: Initialization
         G   A   A   T   T   C   A   G   T   T   A
     0  -2  -4  -6  -8 -10 -12 -14 -16 -18 -20 -22
G   -2
G   -4
A   -6
T   -8
C  -10
G  -12
A  -14
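The initialization can be sketched in Python as follows, assuming a list-of-lists matrix with Sequence 2 (GGATCGA) down the rows and Sequence 1 (GAATTCAGTTA) across the columns, as on the slide. The function name `init_matrix` is hypothetical. Cell (i, 0) holds i gaps and cell (0, j) holds j gaps, each costing -2.

```python
from typing import List

GAP = -2  # gap penalty from the slides


def init_matrix(rows: str, cols: str, gap: int = GAP) -> List[List[int]]:
    """Empty score matrix whose first row and column hold cumulative gap penalties."""
    matrix = [[0] * (len(cols) + 1) for _ in range(len(rows) + 1)]
    for j in range(1, len(cols) + 1):
        matrix[0][j] = j * gap  # first j letters of `cols` against gaps
    for i in range(1, len(rows) + 1):
        matrix[i][0] = i * gap  # first i letters of `rows` against gaps
    return matrix


S = init_matrix("GGATCGA", "GAATTCAGTTA")
print(S[0])                   # [0, -2, -4, -6, -8, -10, -12, -14, -16, -18, -20, -22]
print([row[0] for row in S])  # [0, -2, -4, -6, -8, -10, -12, -14]
```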
Step 2: Scoring
• $A = a_1 a_2 \dots a_n$, $B = b_1 b_2 \dots b_m$
• $S_{i,j}$: score at $(i, j)$
• $s(a_i, b_j)$: matching score between $a_i$ and $b_j$
• $w$: gap penalty
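The recurrence itself appeared on the original slide as a figure that did not survive. The standard Needleman-Wunsch form below is a reconstruction consistent with these definitions and with the worked cells on the following slides (for example 0 + 2 = 2 for the first G/G cell, with $w = -2$ added once per gap):

```latex
S_{0,0} = 0, \qquad S_{i,0} = i\,w, \qquad S_{0,j} = j\,w

S_{i,j} = \max\begin{cases}
S_{i-1,j-1} + s(a_i, b_j) & \text{align } a_i \text{ with } b_j \\
S_{i-1,j} + w & a_i \text{ against a gap} \\
S_{i,j-1} + w & b_j \text{ against a gap}
\end{cases}
```

The optimal global alignment score is $S_{n,m}$, the bottom-right cell of the matrix.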
Step 2: Scoring
• Match: +2
• Mismatch: -1
• Gap: -2
Step 2: Scoring
         G   A   A   T   T   C   A   G   T   T   A
     0  -2  -4  -6  -8 -10 -12 -14 -16 -18 -20 -22
G   -2   2
G   -4
A   -6
T   -8
C  -10
G  -12
A  -14
0 + 2 = 2 (diagonal: G/G match)
-2 + (-2) = -4 (from above: gap)
-2 + (-2) = -4 (from the left: gap)
max = 2
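Each cell is the maximum of the three candidates listed above. A minimal sketch of that single-cell update, assuming the slide scoring scheme; the function and parameter names are hypothetical, with `row_letter` from Sequence 2 (down the side) and `col_letter` from Sequence 1 (across the top).

```python
MATCH, MISMATCH, GAP = 2, -1, -2  # slide scoring scheme


def cell_score(diag: int, up: int, left: int, row_letter: str, col_letter: str) -> int:
    """Fill one cell from its diagonal, upper and left neighbours."""
    pair = MATCH if row_letter == col_letter else MISMATCH
    return max(
        diag + pair,  # align row_letter with col_letter
        up + GAP,     # row_letter against a gap
        left + GAP,   # col_letter against a gap
    )


print(cell_score(0, -2, -2, "G", "G"))  # 2, the cell computed on this slide
```

The same function reproduces the next two worked cells: `cell_score(-2, -4, 2, "G", "A")` and `cell_score(-2, 2, -4, "G", "G")` both give 0.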
Step 2: Scoring
         G   A   A   T   T   C   A   G   T   T   A
     0  -2  -4  -6  -8 -10 -12 -14 -16 -18 -20 -22
G   -2   2   0
G   -4
A   -6
T   -8
C  -10
G  -12
A  -14
-2 + (-1) = -3 (diagonal: G/A mismatch)
-4 + (-2) = -6 (from above: gap)
2 + (-2) = 0 (from the left: gap)
max = 0
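Because each cell needs only the row above it and the cell to its left, the matrix can be filled one row at a time. This sketch (hypothetical `next_row`, slide scoring assumed) keeps just two rows. That is enough to get the final score, although not enough to do the trace back later.

```python
from typing import List

MATCH, MISMATCH, GAP = 2, -1, -2


def next_row(prev: List[int], row_letter: str, cols: str, i: int) -> List[int]:
    """Compute row i of the score matrix from row i-1 only."""
    row = [i * GAP]  # first column: i leading gaps
    for j, col_letter in enumerate(cols, start=1):
        pair = MATCH if row_letter == col_letter else MISMATCH
        row.append(max(prev[j - 1] + pair,  # diagonal
                       prev[j] + GAP,       # from above
                       row[j - 1] + GAP))   # from the left
    return row


cols = "GAATTCAGTTA"
row0 = [j * GAP for j in range(len(cols) + 1)]
row1 = next_row(row0, "G", cols, 1)
print(row1)  # [-2, 2, 0, -2, -4, -6, -8, -10, -12, -14, -16, -18]

current = row0
for i, letter in enumerate("GGATCGA", start=1):
    current = next_row(current, letter, cols, i)
print(current[-1])  # 3, the optimal global score
```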
Step 2: Scoring
         G   A   A   T   T   C   A   G   T   T   A
     0  -2  -4  -6  -8 -10 -12 -14 -16 -18 -20 -22
G   -2   2   0
G   -4   0
A   -6
T   -8
C  -10
G  -12
A  -14
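-2 + 2 = 0 (diagonal: G/G match)
2 + (-2) = 0 (from above: gap)
-4 + (-2) = -6 (from the left: gap)
max = 0 (tie between diagonal and from above)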
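In this cell two candidates tie at 0. A sketch, assuming we want to remember every move that reaches the maximum as a trace-back pointer (the name `best_moves` is hypothetical): keeping all tied moves, rather than an arbitrary one, is what allows the trace back to recover more than one optimal alignment.

```python
from typing import Set

MATCH, MISMATCH, GAP = 2, -1, -2


def best_moves(diag: int, up: int, left: int, row_letter: str, col_letter: str) -> Set[str]:
    """Return every move that attains the cell's maximum (ties are kept)."""
    candidates = {
        "diag": diag + (MATCH if row_letter == col_letter else MISMATCH),
        "up": up + GAP,
        "left": left + GAP,
    }
    top = max(candidates.values())
    return {move for move, value in candidates.items() if value == top}


# The cell on this slide: diagonal -2 + 2 and from-above 2 + (-2) both give 0.
print(sorted(best_moves(-2, 2, -4, "G", "G")))  # ['diag', 'up']
```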
Step 2: Scoring
         G   A   A   T   T   C   A   G   T   T   A
     0  -2  -4  -6  -8 -10 -12 -14 -16 -18 -20 -22
G   -2   2   0  -2  -4  -6  -8 -10 -12 -14 -16 -18
G   -4   0   1  -1  -3  -5  -7  -9  -8 -10 -12 -14
A   -6  -2   2   3   1  -1  -3  -5  -7  -9 -11 -10
T   -8  -4   0   1   5   3   1  -1  -3  -5  -7  -9
C  -10  -6  -2  -1   3   4   5   3   1  -1  -3  -5
G  -12  -8  -4  -3   1   2   3   4   5   3   1  -1
A  -14 -10  -6  -2  -1   0   1   5   3   4   2   3
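The whole matrix fill fits in a few lines. This is a minimal bottom-up sketch under the slide scoring scheme; `fill_matrix` is a hypothetical name, with Sequence 2 as `rows` and Sequence 1 as `cols`. It reproduces the matrix above, including the final score 3 in the bottom-right cell.

```python
from typing import List

MATCH, MISMATCH, GAP = 2, -1, -2


def fill_matrix(rows: str, cols: str) -> List[List[int]]:
    """Needleman-Wunsch score matrix; rows/cols are the vertical/horizontal sequences."""
    S = [[0] * (len(cols) + 1) for _ in range(len(rows) + 1)]
    for j in range(1, len(cols) + 1):
        S[0][j] = j * GAP
    for i in range(1, len(rows) + 1):
        S[i][0] = i * GAP
        for j in range(1, len(cols) + 1):
            pair = MATCH if rows[i - 1] == cols[j - 1] else MISMATCH
            S[i][j] = max(S[i - 1][j - 1] + pair,  # diagonal: align the two letters
                          S[i - 1][j] + GAP,       # from above: row letter vs gap
                          S[i][j - 1] + GAP)       # from the left: column letter vs gap
    return S


S = fill_matrix("GGATCGA", "GAATTCAGTTA")
for label, row in zip(" GGATCGA", S):
    print(label, " ".join(f"{v:3d}" for v in row))
print("score:", S[-1][-1])  # score: 3
```

Time and memory are both proportional to the product of the two sequence lengths.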
Step 3: Trace back
• Start at the bottom-right cell (the optimal score, 3)
• Step to the neighbour that produced the current score: diagonal = align the two letters, up or left = gap
• Stop at the top-left cell (0)
• A tie between neighbours branches the path, giving more than one optimal alignment
Step 3: Trace back
G A A T T C A G T T A
G G A _ T C _ G _ _ A

G A A T T C A G T T A
G G A T _ C _ G _ _ A
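A sketch of the trace back in Python, assuming we want every optimal alignment rather than just one. `all_optimal_alignments` is a hypothetical name, the slide scoring scheme is assumed, and `_` marks gaps. At each cell it follows every neighbour whose value plus the move's cost equals the current score, so the tie in the matrix yields exactly the two alignments shown above.

```python
from typing import List, Tuple

MATCH, MISMATCH, GAP = 2, -1, -2


def all_optimal_alignments(rows: str, cols: str) -> List[Tuple[str, str]]:
    """Fill the matrix, then trace back every optimal path; '_' marks a gap."""
    n, m = len(rows), len(cols)

    def pair(i: int, j: int) -> int:
        return MATCH if rows[i - 1] == cols[j - 1] else MISMATCH

    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        S[i][0] = i * GAP
    for j in range(m + 1):
        S[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + pair(i, j),
                          S[i - 1][j] + GAP, S[i][j - 1] + GAP)

    results: List[Tuple[str, str]] = []

    def trace(i: int, j: int, top: str, bottom: str) -> None:
        # top collects the column sequence, bottom the row sequence (as on the slides)
        if i == 0 and j == 0:
            results.append((top, bottom))
            return
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + pair(i, j):
            trace(i - 1, j - 1, cols[j - 1] + top, rows[i - 1] + bottom)  # diagonal
        if i > 0 and S[i][j] == S[i - 1][j] + GAP:
            trace(i - 1, j, "_" + top, rows[i - 1] + bottom)              # up
        if j > 0 and S[i][j] == S[i][j - 1] + GAP:
            trace(i, j - 1, cols[j - 1] + top, "_" + bottom)              # left

    trace(n, m, "", "")
    return results


for top, bottom in all_optimal_alignments("GGATCGA", "GAATTCAGTTA"):
    print(" ".join(top))
    print(" ".join(bottom))
    print()
```

Enumerating every path is fine for slide-sized examples, but the number of co-optimal alignments can grow very quickly for long sequences; practical tools usually report one.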
Exercise
G C A T C C G
G
A
T
C
G
• Match: +2
• Mismatch: -1
• Gap: -2
Exercise
         G   C   A   T   C   C   G
     0  -2  -4  -6  -8 -10 -12 -14
G   -2   2   0  -2  -4  -6  -8 -10
A   -4   0   1   2   0  -2  -4  -6
T   -6  -2  -1   0   4   2   0  -2
C   -8  -4   0  -2   2   6   4   2
G  -10  -6  -2  -1   0   4   5   6
• Match: +2
• Mismatch: -1
• Gap: -2
Optimal score: 6

G C A T C C G
G _ A T _ C G

G C A T C C G
G _ A T C _ G
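To check the exercise, here is a sketch that fills the score matrix and, in a parallel table, counts how many optimal paths reach each cell. The count in the bottom-right cell is the number of distinct optimal alignments. `score_and_count` is a hypothetical name and the slide scoring scheme is assumed.

```python
from typing import List, Tuple

MATCH, MISMATCH, GAP = 2, -1, -2


def score_and_count(rows: str, cols: str) -> Tuple[int, int]:
    """Return (optimal global score, number of optimal alignments)."""
    n, m = len(rows), len(cols)
    S: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    C: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]  # optimal paths into (i, j)
    C[0][0] = 1
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []  # (candidate score, paths arriving by that move)
            if i > 0 and j > 0:
                pair = MATCH if rows[i - 1] == cols[j - 1] else MISMATCH
                options.append((S[i - 1][j - 1] + pair, C[i - 1][j - 1]))
            if i > 0:
                options.append((S[i - 1][j] + GAP, C[i - 1][j]))
            if j > 0:
                options.append((S[i][j - 1] + GAP, C[i][j - 1]))
            best = max(value for value, _ in options)
            S[i][j] = best
            # only moves that reach the maximum contribute optimal paths
            C[i][j] = sum(count for value, count in options if value == best)
    return S[n][m], C[n][m]


print(score_and_count("GATCG", "GCATCCG"))  # (6, 2)
```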
Amino acids
• Match/mismatch scores → a substitution matrix that gives a score for every pair of amino acids
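Swapping in a substitution matrix only changes where $s(a_i, b_j)$ comes from; the matrix fill stays the same. In this sketch the scorer is passed in as a function (`nw_score`, `dna` and `toy_protein` are hypothetical names). The protein scores are made up for illustration and are NOT real BLOSUM or PAM values; a real analysis would look pairs up in a published table.

```python
from typing import Callable

GAP = -2  # linear gap penalty, as on the slides


def nw_score(a: str, b: str, sub: Callable[[str, str], int], gap: int = GAP) -> int:
    """Global alignment score where s(a_i, b_j) comes from a substitution function."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            cur.append(max(prev[j - 1] + sub(a[i - 1], b[j - 1]),  # diagonal
                           prev[j] + gap,                          # from above
                           cur[j - 1] + gap))                      # from the left
        prev = cur
    return prev[-1]


def dna(x: str, y: str) -> int:
    # The slides' match/mismatch scheme, written as a substitution function.
    return 2 if x == y else -1


# Hypothetical toy protein scores for illustration only (NOT BLOSUM/PAM values):
# identical residues +5, pairs within {I, L, V} +2, anything else -1.
SIMILAR = set("ILV")


def toy_protein(x: str, y: str) -> int:
    if x == y:
        return 5
    if x in SIMILAR and y in SIMILAR:
        return 2
    return -1


print(nw_score("GAATTCAGTTA", "GGATCGA", dna))    # 3
print(nw_score("IV", "LV", toy_protein, gap=-4))  # 7
```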
Global & Local alignment
• Global: Needleman-Wunsch Algorithm
• Local: Smith-Waterman Algorithm
From Mount, Bioinformatics, Chapter 3
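The earlier slides compute a global (Needleman-Wunsch) alignment. Smith-Waterman changes two things, sketched below under the same +2 / -1 / -2 scoring: cells are floored at 0, and the answer is the largest cell anywhere in the matrix instead of the bottom-right one. The name `local_score` is hypothetical, and this sketch computes only the score, not the trace back.

```python
MATCH, MISMATCH, GAP = 2, -1, -2


def local_score(a: str, b: str) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    best = 0
    prev = [0] * (len(b) + 1)  # first row is 0: a local alignment may start anywhere
    for i in range(1, len(a) + 1):
        cur = [0]              # first column is 0 for the same reason
        for j in range(1, len(b) + 1):
            pair = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            value = max(0,                   # start a fresh local alignment here
                        prev[j - 1] + pair,  # diagonal
                        prev[j] + GAP,       # from above
                        cur[j - 1] + GAP)    # from the left
            cur.append(value)
            best = max(best, value)  # a local alignment can end in any cell
        prev = cur
    return best


print(local_score("AAACCCGGG", "TTTCCCTTT"))  # 6, from the shared "CCC"
```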