A Sub-quadratic Sequence Alignment Algorithm
Global alignment
[Figure: alignment graph for S = aacgacga, T = ctacgaga]
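Complexity: $O(n^2)$

$$V(i,j) = \max \begin{cases} V(i-1,j-1) + \delta(S[i], T[j]) \\ V(i-1,j) + \delta(S[i], -) \\ V(i,j-1) + \delta(-, T[j]) \end{cases}$$

Here $\delta(x, y)$ is the score of aligning $x$ with $y$, and $-$ is a gap.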
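The recurrence for V(i,j) can be sketched as a plain quadratic-time dynamic program. The scoring values used here (match +1, mismatch -1, gap -1) and the function name are assumptions for illustration only; the algorithm itself places no restriction on the scoring function.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Fill the (len(s)+1) x (len(t)+1) table V and return V[len(s)][len(t)]."""
    n, m = len(s), len(t)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    # First column and first row: a prefix aligned against gaps only.
    for i in range(1, n + 1):
        V[i][0] = V[i - 1][0] + gap
    for j in range(1, m + 1):
        V[0][j] = V[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(
                V[i - 1][j - 1] + sub,  # align S[i] with T[j]
                V[i - 1][j] + gap,      # align S[i] with a gap
                V[i][j - 1] + gap,      # align T[j] with a gap
            )
    return V[n][m]


print(global_alignment_score("acg", "ag"))  # 1
```

Every cell is visited once, which is the O(n^2) cost that the rest of the deck tries to beat.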
FOUR RUSSIAN ALGORITHM
UNRESTRICTED SCORING FUNCTION
Main idea: Compress the sequences
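• S = aacgacga
• T = ctacgaga
LZ-78: Divide the sequence into distinct words
• S: a | ac | g | acg | a (words 1 to 4, followed by the remainder a)
• T: c | t | a | cg | ag | a (words 1 to 5, followed by the remainder a)
[Figure: trie of the words of S and trie of the words of T]
The number of distinct words: $O(n/\log n)$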
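The LZ-78 division into words can be sketched with a trie built on the fly. This is a minimal sketch; the function name `lz78_words` and the nested-dict trie are illustrative choices, not taken from the slides.

```python
def lz78_words(seq):
    """Split seq into LZ-78 words.

    Each word is the shortest prefix of the remaining text that has not
    appeared as a word before, i.e. a known word plus one character.
    """
    trie = {}            # nested dicts: one node per word seen so far
    words = []
    node, start = trie, 0
    for pos, ch in enumerate(seq):
        if ch in node:
            node = node[ch]              # still inside a known word
        else:
            node[ch] = {}                # register the new word in the trie
            words.append(seq[start:pos + 1])
            node, start = trie, pos + 1  # restart from the trie root
    if start < len(seq):
        words.append(seq[start:])        # leftover that repeats an earlier word
    return words


print(lz78_words("aacgacga"))  # ['a', 'ac', 'g', 'acg', 'a']
print(lz78_words("ctacgaga"))  # ['c', 't', 'a', 'cg', 'ag', 'a']
```

Because every new word extends an existing trie node by one character, the trie doubles as the index of "prefix" words that later slides rely on.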
Main idea
[Figure: alignment graph of S and T partitioned into blocks by their LZ-78 words, shown with the trie for S and the trie for T]
• Compute the alignment score in each block
• Propagate the scores between the adjacent blocks
Main idea
• Compress the sequence into words
• Pre-compute the score for each block
• Do alignment between blocks
• Note:
– Replace normal characters by words
– Operate on blocks
COMPRESS THE SEQUENCE: LZ-78
LZ-78
• Theorem (Lempel and Ziv):
– Constant alphabet sequence S
– The maximal number of distinct phrases in S is $O(n/\log n)$.
• Tighter upper bound: $O(hn/\log n)$
– h is the entropy factor, a real number, 0 < h ≤ 1
– Entropy is small ⇒ sequence is repetitive
COMPUTE THE ALIGNMENT SCORE IN EACH BLOCK
Compute the alignment score in each block
[Figure: alignment graph partitioned into blocks by the words of S and T]
• Given
– Input border: I
– Block
• Compute
– Output border: O
[Figure: block G with its input border I and output border O]
Matrices
• I[i]: the input border value
• DIST[i,j]: weight of the optimal path
– From entry i of the input border
– To entry j of its output border
• OUT[i,j]: merges the information from input row I and DIST
– OUT[i,j] = I[i] + DIST[i,j]
• O[j] = max{OUT[i,j] for i=1..n}
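To make the definition of DIST concrete, the sketch below computes it from scratch for one block by running a small dynamic program from every input-border entry. Several things here are assumptions rather than statements from the slides: the scoring (match +1, mismatch -1, gap -1, chosen because it reproduces the example values for the block of "acg" and "ag" in this deck), the border ordering (input border up the left side and then along the top, output border along the bottom and then up the right side), and all function names. Undefined entries (drawn as △ in the deck) are represented by negative infinity.

```python
NEG_INF = float("-inf")  # stands in for the undefined (triangle) entries


def border_in(rows, cols):
    # Assumed order: bottom-left corner, up the left side, then along the top.
    return [(r, 0) for r in range(rows, -1, -1)] + [(0, c) for c in range(1, cols + 1)]


def border_out(rows, cols):
    # Assumed order: along the bottom left to right, then up the right side.
    return [(rows, c) for c in range(cols + 1)] + [(r, cols) for r in range(rows - 1, -1, -1)]


def dist_from_scratch(s_word, t_word, match=1, mismatch=-1, gap=-1):
    """DIST[i][j] = best path weight from input entry i to output entry j."""
    rows, cols = len(t_word), len(s_word)  # T word labels rows, S word labels columns
    dist = []
    for r0, c0 in border_in(rows, cols):
        best = [[NEG_INF] * (cols + 1) for _ in range(rows + 1)]
        best[r0][c0] = 0
        # Paths only move down, right, or diagonally, so start at (r0, c0).
        for r in range(r0, rows + 1):
            for c in range(c0, cols + 1):
                if (r, c) == (r0, c0):
                    continue
                options = []
                if r > r0:
                    options.append(best[r - 1][c] + gap)
                if c > c0:
                    options.append(best[r][c - 1] + gap)
                if r > r0 and c > c0:
                    sub = match if t_word[r - 1] == s_word[c - 1] else mismatch
                    options.append(best[r - 1][c - 1] + sub)
                best[r][c] = max(options)
        dist.append([best[r][c] for r, c in border_out(rows, cols)])
    return dist


for row in dist_from_scratch("acg", "ag"):
    print(row)
```

This naive version costs one full block DP per input entry; the later slides on updating a single DIST column are what remove that overhead.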
DIST and OUT matrix example
Block: sub-sequences "acg", "ag"
[Figure: block G with input border I and output border O]

I (input borders): I0=1, I1=2, I2=3, I3=2, I4=1, I5=3

DIST matrix

| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| I0 | 0 | -1 | -2 | -3 | △ | △ |
| I1 | -1 | -1 | -2 | -1 | -3 | △ |
| I2 | -2 | 0 | 0 | 1 | -1 | -3 |
| I3 | △ | -2 | -2 | 0 | -2 | -2 |
| I4 | △ | △ | -2 | 0 | -1 | -1 |
| I5 | △ | △ | △ | -2 | -1 | 0 |

OUT matrix

| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| I0 | 1 | 0 | -1 | -2 | -∞ | -∞ |
| I1 | 1 | 1 | 0 | 1 | -1 | -∞ |
| I2 | 1 | 3 | 3 | 4 | 2 | 0 |
| I3 | -12 | 0 | 0 | 2 | 0 | 0 |
| I4 | -13 | -13 | -1 | 1 | 0 | 0 |
| I5 | -14 | -14 | -14 | 1 | 2 | 3 |

O (max of each column of OUT)

| O0 | O1 | O2 | O3 | O4 | O5 |
|---|---|---|---|---|---|
| 1 | 3 | 3 | 4 | 2 | 3 |
• For each block, given two sub-sequences S1, S2
• Compute (from scratch) DIST in O(n*m) time
• Given I and DIST, compute OUT in O(n*m) time
• Given OUT[i,j], compute O in O(m*n) time
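The naive propagation step, building OUT explicitly and then taking column maxima, can be sketched as follows. The data are the example values for the block of "acg" and "ag" given in this deck; representing the undefined △ entries by negative infinity (instead of the padded values shown in the OUT matrix) is a simplifying assumption that does not change the column maxima here.

```python
NEG_INF = float("-inf")


def output_border(I, DIST):
    """Return (OUT, O) with OUT[i][j] = I[i] + DIST[i][j], O[j] = max_i OUT[i][j]."""
    t = len(I)
    OUT = [[I[i] + DIST[i][j] for j in range(t)] for i in range(t)]
    O = [max(OUT[i][j] for i in range(t)) for j in range(t)]
    return OUT, O


U = NEG_INF  # undefined DIST entry (no path from that input to that output)
DIST = [
    [0, -1, -2, -3, U, U],
    [-1, -1, -2, -1, -3, U],
    [-2, 0, 0, 1, -1, -3],
    [U, -2, -2, 0, -2, -2],
    [U, U, -2, 0, -1, -1],
    [U, U, U, -2, -1, 0],
]
I = [1, 2, 3, 2, 1, 3]

OUT, O = output_border(I, DIST)
print(O)  # [1, 3, 3, 4, 2, 3]
```

Both loops touch every entry of the border-by-border matrix, which is the quadratic per-block cost that the SMAWK-based step is meant to avoid.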
Review
• Compress the sequence
• Pre-compute DIST[i,j] for each block
• Compute border values of each block
• Remaining questions
– How to compute DIST[i,j] efficiently?
– How to compute O[j] from I[i] and DIST[i,j] efficiently?
[Figure: alignment graph partitioned into blocks by the words of S and T]
COMPUTE O[J] EFFICIENTLY
Compute O[j] efficiently
• For each block of two sub-sequences S1, S2
• Given
– I[i]
– DIST[i,j]
• Compute
– O[j]
Compute O without explicit OUT
Block: sub-sequences "acg", "ag"
[Figure: block G with input border I and output border O]

I (input borders): I0=1, I1=2, I2=3, I3=2, I4=1, I5=3

DIST matrix

| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| I0 | 0 | -1 | -2 | -3 | △ | △ |
| I1 | -1 | -1 | -2 | -1 | -3 | △ |
| I2 | -2 | 0 | 0 | 1 | -1 | -3 |
| I3 | △ | -2 | -2 | 0 | -2 | -2 |
| I4 | △ | △ | -2 | 0 | -1 | -1 |
| I5 | △ | △ | △ | -2 | -1 | 0 |

SMAWK produces O directly from I and DIST:

| O0 | O1 | O2 | O3 | O4 | O5 |
|---|---|---|---|---|---|
| 1 | 3 | 3 | 4 | 2 | 3 |
• Given DIST[i,j] and I[i], we can compute O[j] in O(n+m)
– Without creating OUT[i,j]
• How? Why?
Why?
• Aggarwal, Park and Schmidt observed that DIST and OUT matrices are Monge arrays.
• Definition: a matrix M[0…m,0…n] is totally monotone if either condition 1 or 2 below holds for all a,b=0…m; c,d=0…n:
1. Convex condition: $M[a,c] \ge M[b,c] \Rightarrow M[a,d] \ge M[b,d]$ for all a<b and c<d.
2. Concave condition: $M[a,c] \le M[b,c] \Rightarrow M[a,d] \le M[b,d]$ for all a<b and c<d.
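A brute-force checker makes the two conditions concrete. It reads the convex condition as "M[a,c] ≥ M[b,c] implies M[a,d] ≥ M[b,d]" and the concave condition as the same statement with ≤, in both cases for all a<b and c<d. The function names are illustrative, and the check is quartic in the matrix size, so it is only meant for small examples.

```python
from itertools import combinations


def is_convex_tm(M):
    """Convex condition: M[a][c] >= M[b][c] implies M[a][d] >= M[b][d]."""
    rows, cols = range(len(M)), range(len(M[0]))
    return all(
        not (M[a][c] >= M[b][c]) or M[a][d] >= M[b][d]
        for a, b in combinations(rows, 2)   # all pairs with a < b
        for c, d in combinations(cols, 2)   # all pairs with c < d
    )


def is_concave_tm(M):
    """Concave condition: M[a][c] <= M[b][c] implies M[a][d] <= M[b][d]."""
    rows, cols = range(len(M)), range(len(M[0]))
    return all(
        not (M[a][c] <= M[b][c]) or M[a][d] <= M[b][d]
        for a, b in combinations(rows, 2)
        for c, d in combinations(cols, 2)
    )


def is_totally_monotone(M):
    return is_convex_tm(M) or is_concave_tm(M)


peak_on_diagonal = [[-(i - j) ** 2 for j in range(4)] for i in range(4)]
print(is_concave_tm(peak_on_diagonal))        # True
print(is_concave_tm([[0, 1], [1, 0]]))        # False
```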
How?
• Aggarwal et al. gave a recursive algorithm, called SMAWK, which can find all row and column maxima of a totally monotone matrix by querying only O(n) elements of the matrix.
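SMAWK itself is fairly intricate, so the sketch below deliberately uses a simpler divide-and-conquer search instead. It is not SMAWK: it needs roughly (rows + cols) · log(cols) lookups rather than a linear number, but it relies on the same fact, namely that under the concave condition the bottom-most row holding a column's maximum never moves up as the column index grows. The function name is illustrative, and the OUT values are the ones from this deck's example.

```python
def column_maxima(M):
    """Column maxima of a matrix that satisfies the concave condition
    (M[a][c] <= M[b][c] implies M[a][d] <= M[b][d] for a < b, c < d)."""
    n_rows, n_cols = len(M), len(M[0])
    result = [None] * n_cols

    def solve(c_lo, c_hi, r_lo, r_hi):
        if c_lo > c_hi:
            return
        mid = (c_lo + c_hi) // 2
        # Bottom-most row in [r_lo, r_hi] holding the maximum of column mid.
        best = r_lo
        for r in range(r_lo, r_hi + 1):
            if M[r][mid] >= M[best][mid]:
                best = r
        result[mid] = M[best][mid]
        solve(c_lo, mid - 1, r_lo, best)  # earlier columns peak at or above best
        solve(mid + 1, c_hi, best, r_hi)  # later columns peak at or below best

    solve(0, n_cols - 1, 0, n_rows - 1)
    return result


NEG_INF = float("-inf")
OUT = [
    [1, 0, -1, -2, NEG_INF, NEG_INF],
    [1, 1, 0, 1, -1, NEG_INF],
    [1, 3, 3, 4, 2, 0],
    [-12, 0, 0, 2, 0, 0],
    [-13, -13, -1, 1, 0, 0],
    [-14, -14, -14, 1, 2, 3],
]
print(column_maxima(OUT))  # [1, 3, 3, 4, 2, 3]
```

In the actual algorithm the entries OUT[i][j] = I[i] + DIST[i][j] would be evaluated on demand instead of being stored, which is what allows O to be computed without creating OUT.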
• Why is DIST[i,j] totally monotone?
[Figure: block G with input border entries a, b and output border entries c, d]
The concave condition
If the path from b to c is better than the path from a to c, then the path from b to d is better than the path from a to d.
Other problem
• Rectangle problem of DIST
• Set upper right corner of OUT to -∞
• Set lower left corner of OUT to -(n+i-1)*k
• Preserve the totally monotone property of OUT

| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| I0 | 0 | -1 | -2 | -3 | △ | △ |
| I1 | -1 | -1 | -2 | -1 | -3 | △ |
| I2 | -2 | 0 | 0 | 1 | -1 | -3 |
| I3 | △ | -2 | -2 | 0 | -2 | -2 |
| I4 | △ | △ | -2 | 0 | -1 | -1 |
| I5 | △ | △ | △ | -2 | -1 | 0 |
COMPUTE DIST[I,J] EFFICIENTLY
Compute DIST[i,j] for block (5/4)
[Figure: alignment graph partitioned into blocks, with block 5/4 (words "acg" and "ag") marked, next to the trie for S and the trie for T]
[Figure: DIST matrix of the block for "acg" and "ag", drawn beside the block with input border entries I0..I5 (I0=1, I1=2, I2=3, I3=2, I4=1, I5=3) and output entry O3 labeled; the values are those of the DIST matrix example]
• Only column m in DIST[i,j] is new
• DIST block can be updated in O(m+n)
MAINTAINING DIRECT ACCESS TO DIST TABLE
DIST
[Figure: the DIST column -3, -1, 1, 0, 0, -2 (column 3 of the example DIST matrix) and further DIST columns drawn on the block grid of S = aacgacga and T = ctacgaga, with the trie for S and the trie for T]
Complexity
• Assume |S| = |T| = n
• Number of words in S, T = $O(hn/\log n)$
• Number of blocks in alignment graph: $O(h^2n^2/(\log n)^2)$
• For each block
– Update new DIST block: O(t), where t = size of the border
– Create direct access table: O(t)
• Propagating I/O across blocks
– SMAWK: O(t)
• Sum of the sizes of all borders is $O(hn^2/\log n)$
• Total complexity: $O(hn^2/\log n)$
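The bound on the sum of the border sizes follows from a short calculation. The notation is introduced here for illustration and is not from the slides: S is divided into words $x_1, \dots, x_p$ and T into words $y_1, \dots, y_q$ with $p, q = O(hn/\log n)$, and the border of block $(i,j)$ is assumed to have size $t_{ij} = O(|x_i| + |y_j|)$.

```latex
\sum_{i=1}^{p}\sum_{j=1}^{q} t_{ij}
  = O\Big(\sum_{i=1}^{p}\sum_{j=1}^{q}\big(|x_i| + |y_j|\big)\Big)
  = O\Big(q\sum_{i=1}^{p}|x_i| \;+\; p\sum_{j=1}^{q}|y_j|\Big)
  = O\big((p+q)\,n\big)
  = O\Big(\frac{h n^{2}}{\log n}\Big)
```

Since each block costs O(t) for the DIST update, the direct access table, and the SMAWK step, the total running time has the same order.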
Other extensions
• Trace
• Reducing the space complexity for discrete scoring
• Local alignment
References
• Crochemore, M.; Landau, G. M. & Ziv-Ukelson, M. A sub-quadratic sequence alignment algorithm for unrestricted cost matrices. ACM-SIAM, 2002, 679-688
• Some pictures from 葉恆青