Algorithms @ YMU, 2005 (2005/6/3)
呂學一
http://www.csie.ntu.edu.tw/~hil/
Today
Exact string matching in linear time.
The Exact String Matching Problem
Input
– a string P: the pattern
– a string S: the text
Output
– all the occurrences of P in S.
Illustration
i  =  1  2  3  4  5  6  7  8  9 10 11 12 13
S  =  t  a  t  a  t  t  a  t  a  t  a  t  a
P  =  t  a  t  a
Output: 1, 6, 8, 10
B. Why?
Computer science
– Dictionaries, databases
– Search engines: Yahoo!, Google, …
Biology
– BLAST
Warm-up for this course
– A well-studied problem
– The ideas and techniques behind it
C. How?
Notation for strings
S is a string (positions are numbered from 1).
– |S| = the length of S.
– Substring: S[i…j] = S[i] S[i+1] … S[j].
A naïve algorithm
Input: S and P. Output: all occurrences of P in S.
for i = 1 to |S| - |P| + 1
  if S[i…i+|P|-1] equals P
    output i;
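The naïve algorithm can be sketched in Python (our choice of language; the slides give only pseudocode). This is a minimal sketch: positions are reported 1-based to match the slides, and the name `naive_match` is hypothetical.

```python
def naive_match(s: str, p: str) -> list[int]:
    """Return all 1-based positions i with S[i..i+|P|-1] == P (brute force)."""
    n, m = len(s), len(p)
    occurrences = []
    # Only start positions 1..n-m+1 leave room for a full copy of p.
    for i in range(1, n - m + 2):
        # S[i..i+m-1] in 1-based notation is the slice s[i-1:i-1+m].
        if s[i - 1:i - 1 + m] == p:
            occurrences.append(i)
    return occurrences


print(naive_match("tatattatatata", "tata"))  # [1, 6, 8, 10]
```

Each shift compares up to |P| characters, so the worst case is O(|S|·|P|) time.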
Another approach
Dan Gusfield’s Z values
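The Z values of a string S
Z(i) of a string S is the largest integer d such that S[1…d] = S[i…i+d-1], i.e., the length of the longest substring starting at position i that is also a prefix of S (d = 0 if S[i] ≠ S[1]).
(Figure: S with positions i and i+d-1 marked.)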
Clearly, …
If Z(1), Z(2), …, Z(|S|) are the Z values of S, then
– Z(1) = |S|;
– Z(i) ≥ 0 for each i = 1, 2, …, |S|.
For example, …
i  =  1  2  3  4  5  6  7  8  9 10 11 12
S  =  a  a  g  c  a  a  t  a  a  a  g  c
Z  = 12  1  0  0  2  1  0  2  4  1  0  0
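A direct transcription of the definition is handy for checking small examples such as the one above. `z_by_definition` is a hypothetical helper; it tries every candidate d, so it is deliberately slow. Index 0 of the returned list is unused so that `z[i]` is Z(i).

```python
def z_by_definition(s: str) -> list[int]:
    """Z values straight from the definition; z[i] = Z(i) for i = 1..|s| (index 0 unused)."""
    n = len(s)
    z = [0] * (n + 1)
    for i in range(1, n + 1):
        # Largest d with S[1..d] == S[i..i+d-1]; d = 0 (two empty strings) always qualifies.
        z[i] = max(d for d in range(n - i + 2) if s[:d] == s[i - 1:i - 1 + d])
    return z


print(z_by_definition("aagcaataaagc")[1:])  # [12, 1, 0, 0, 2, 1, 0, 2, 4, 1, 0, 0]
```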
Question
How do we find all occurrences of P in S using Z values (of what)?
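Exact String Matching with Z values
compute the Z values of PS (P followed by S);
for i = 1 to |S|
  if Z(i+|P|) >= |P| then
    output i;
Position i of S is position i+|P| of PS, and PS begins with P, so Z(i+|P|) ≥ |P| exactly when S[i…i+|P|-1] = P.
(Figure: P followed by S; the match starting at position i+|P| of PS extends to position i+|P|+d-1.)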
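A sketch of this matching loop, assuming the Z values of PS are available. Here `z_values` is a hypothetical stand-in that computes them by direct extension (quadratic time); any linear-time Z routine could replace it without changing `z_match`.

```python
def z_values(t: str) -> list[int]:
    """Z values of t by direct extension (quadratic); z[i] = Z(i), index 0 unused."""
    n = len(t)
    z = [0] * (n + 1)
    for i in range(1, n + 1):
        while i + z[i] <= n and t[z[i]] == t[i - 1 + z[i]]:
            z[i] += 1
    return z


def z_match(s: str, p: str) -> list[int]:
    """All 1-based occurrences of p in s, read off the Z values of PS."""
    m = len(p)
    z = z_values(p + s)  # P followed by S, with no separator
    # S[i] sits at position i + |P| of PS.
    return [i for i in range(1, len(s) + 1) if z[i + m] >= m]


print(z_match("tatattatatata", "tata"))  # [1, 6, 8, 10]
```

No separator is placed between P and S: a Z value at a position inside S may exceed |P|, but the test Z(i+|P|) ≥ |P| only asks whether P itself occurs at i.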
Time complexity?
compute the Z values of PS;
for i = 1 to |S|
  if Z(i+|P|) >= |P| then
    output i;
The loop takes O(|S|) time, so the total is O(|S|) + the time for computing the Z values of PS.
Computing Z(i) naively
Z(1) = |S|;
for i = 2 to |S| {
  let j = i;
  let Z(i) = 0;
  while (j ≤ |S| and S[j] == S[j-i+1]) {
    Z(i)++;
    j++;
  }
}
Time complexity? O(|S|²).
Is it tight? Yes: for S = 000…000, Z(i) = |S| - i + 1, so the loop performs |S|(|S| - 1)/2 comparisons in total.
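The naive loop can be run with a comparison counter added to make the quadratic bound visible. This is a sketch; `naive_z_with_count` is a hypothetical name, and the counter is not part of the slides.

```python
def naive_z_with_count(s: str) -> tuple[list[int], int]:
    """Naive Z values of s (index 0 unused) and the number of character comparisons."""
    n = len(s)
    z = [0] * (n + 1)
    comparisons = 0
    if n > 0:
        z[1] = n  # Z(1) = |S|; no comparisons needed
    for i in range(2, n + 1):
        j = i  # 1-based position in S compared against S[j-i+1]
        while j <= n:
            comparisons += 1
            if s[j - 1] != s[j - i]:  # S[j] vs S[j-i+1], shifted to 0-based indices
                break
            z[i] += 1
            j += 1
    return z, comparisons


print(naive_z_with_count("0" * 6)[1])  # 15
```

For S = 000000 every comparison succeeds and the loop stops only at the end of S, giving 5 + 4 + 3 + 2 + 1 = 15 = 6·5/2 comparisons.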
Observation: for a fixed i, the while loop above computes Z(i) in O(Z(i)+1) time: Z(i) successful comparisons plus at most one failing comparison. We need this observation later.
Z values in linear time
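Notation
右護法(i) ("right guardian") = max{j + Z(j) - 1 | 1 < j ≤ i}.
– Abbreviated as 右(i): the rightmost position reached by a prefix match starting at some j with 1 < j ≤ i.
左護法(i) ("left guardian") = min{j | 右(j) = 右(i)}.
– Abbreviated as 左(i): the starting position of that match.
Observation: both guardians, 左 and 右, are nondecreasing.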
Illustration
(Figure: S with the intervals from i to i + Z(i) - 1 and from i - 1 to i - 1 + Z(i - 1) - 1 marked.)
Illustration
(Figure: S with positions 1, 2, i, 左(i), and 右(i) marked.)
For example, … (values start at i = 2)
i         =  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
S         =  a  a  b  a  a  b  c  a  x  a  a  b  a  a  b  c  y
Z(i)      =     1  0  3  1  0  0  1  0  7  1  0  3  1  0  0  0
i+Z(i)-1  =     2  2  6  5  5  6  8  8 16 11 11 15 14 14 15 16
右(i)     =     2  2  6  6  6  6  8  8 16 16 16 16 16 16 16 16
左(i)     =     2  2  4  4  4  4  8  8 10 10 10 10 10 10 10 10
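To check this table, the two guardians can be computed straight from their definitions. This quadratic sketch is only for verifying small examples and does not use the incremental strategy; the names `z_values` and `guardians` are hypothetical, and `right`/`left` stand for 右/左.

```python
def z_values(s: str) -> list[int]:
    """Z values of s by direct extension (quadratic); index 0 unused."""
    n = len(s)
    z = [0] * (n + 1)
    for i in range(1, n + 1):
        while i + z[i] <= n and s[z[i]] == s[i - 1 + z[i]]:
            z[i] += 1
    return z


def guardians(z: list[int]) -> tuple[list[int], list[int]]:
    """右(i) and 左(i) for i >= 2, straight from the definitions (indices 0, 1 unused)."""
    n = len(z) - 1
    right = [0] * (n + 1)  # right[i] = 右(i)
    left = [0] * (n + 1)   # left[i] = 左(i)
    for i in range(2, n + 1):
        right[i] = max(j + z[j] - 1 for j in range(2, i + 1))
        left[i] = min(j for j in range(2, i + 1) if right[j] == right[i])
    return right, left


right, left = guardians(z_values("aabaabcaxaabaabcy"))
print(right[2:])  # [2, 2, 6, 6, 6, 6, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16]
print(left[2:])   # [2, 2, 4, 4, 4, 4, 8, 8, 10, 10, 10, 10, 10, 10, 10, 10]
```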
Strategy
Compute Z(i), 右(i), and 左(i) from
– Z(1), Z(2), …, Z(i - 1);
– 右(i - 1);
– 左(i - 1).
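Case 1: 右(i-1) ≤ i-1.
右(i-1) does not cover i (S[i] is not protected by the right guardian).
Compute Z(i) naively in O(1+Z(i)) time. If Z(i) > 0, then 左(i) = i and 右(i) = i + Z(i) - 1; if Z(i) = 0, 左 and 右 are unchanged.
Observation (needed later): 1 + Z(i) ≤ 右(i) - 右(i-1) + 1. If Z(i) > 0, then 1 + Z(i) = 右(i) - i + 2 ≤ 右(i) - 右(i-1) + 1 because 右(i-1) ≤ i - 1; if Z(i) = 0, both sides equal 1.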
Case 2: 右(i-1) ≥ i and Z(j) < 右(i-1) - i + 1, where j = i - 左(i-1) + 1.
(Figure: S with positions 左(i-1), i, and 右(i-1) marked, and the match of length Z(j) starting at j.)
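Z(i) = Z(j), 左(i) = 左(i-1), 右(i) = 右(i-1).
Why? S[左(i-1)…右(i-1)] = S[1…右(i-1) - 左(i-1) + 1], so S[i…右(i-1)] = S[j…j + 右(i-1) - i]. Since Z(j) < 右(i-1) - i + 1, the match at j fails inside this block, and the same characters appear at i, so the match at i fails at the same offset.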
Case 3: 右(i-1) ≥ i and Z(j) ≥ 右(i-1) - i + 1, where j = i - 左(i-1) + 1.
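Finding Z(i) by comparisons starting from 右(i-1)+1. Why?
As in Case 2, S[i…右(i-1)] = S[j…j + 右(i-1) - i], and Z(j) ≥ 右(i-1) - i + 1 means this block equals S[1…右(i-1) - i + 1]. Hence Z(i) ≥ 右(i-1) - i + 1, and only S[右(i-1)+1], S[右(i-1)+2], … still need to be compared (against S[右(i-1)-i+2], …).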
Computing 左(i) and 右(i).
右(i) = i + Z(i) - 1 and 左(i) = i.
How many comparisons?
– At most 右(i) - 右(i-1) + 1: one successful comparison for each position from 右(i-1)+1 to 右(i), plus at most one failing comparison.
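Putting Cases 1-3 together gives a linear-time routine. The sketch below is our Python rendering, not code from the slides: `z_linear` is a hypothetical name, `left`/`right` hold 左(i-1)/右(i-1), and starting with `right = 0` is our convention for "nothing covered yet". When Case 1 finds Z(i) = 0, the sketch leaves 左 and 右 unchanged (matching their definitions); in Case 3 it resets 左(i) to i.

```python
def z_linear(s: str) -> list[int]:
    """Z values of s in O(|s|) time via Cases 1-3; z[i] = Z(i), index 0 unused."""
    n = len(s)
    z = [0] * (n + 1)
    if n == 0:
        return z
    z[1] = n
    left = right = 0  # 左(i-1) and 右(i-1); right = 0 means no position is covered yet
    for i in range(2, n + 1):
        if right <= i - 1:
            # Case 1: i is not covered, so extend from scratch.
            k = 0
            while i + k <= n and s[k] == s[i - 1 + k]:
                k += 1
            z[i] = k
            if k > 0:
                left, right = i, i + k - 1
        else:
            j = i - left + 1           # position in the prefix copy that mirrors i
            remaining = right - i + 1  # length of S[i..右(i-1)]
            if z[j] < remaining:
                # Case 2: copy Z(j); 左 and 右 stay the same.
                z[i] = z[j]
            else:
                # Case 3: S[i..右(i-1)] is known to match; compare from 右(i-1)+1 on.
                k = remaining
                while i + k <= n and s[k] == s[i - 1 + k]:
                    k += 1
                z[i] = k
                left, right = i, i + k - 1
    return z


print(z_linear("aabaabcaxaabaabcy")[1:])
# [17, 1, 0, 3, 1, 0, 0, 1, 0, 7, 1, 0, 3, 1, 0, 0, 0]
```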
Time complexity is linear.
Case 1:
– O(Z(i)+1) = O(右(i) - 右(i-1) + 1).
Case 2:
– O(1) = O(右(i) - 右(i-1) + 1).
Case 3:
– O(右(i) - 右(i-1) + 1).
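Overall time complexity
$$\sum_{i=1}^{|S|} O\bigl(\text{右}(i) - \text{右}(i-1) + 1\bigr) = O(|S|) + O\bigl(\text{右}(|S|)\bigr) = O(|S|),$$
since the differences 右(i) - 右(i-1) telescope and 右(|S|) ≤ |S|.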
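The linear bound can be checked empirically by counting character comparisons in the same three-case routine, which can then be used for exact matching on PS. Again a hypothetical sketch (`z_linear_count`, `z_match_linear`); by the analysis above, case i costs at most 右(i) - 右(i-1) + 1 comparisons, so the total is at most 2|S| - 1 for nonempty S.

```python
def z_linear_count(s: str) -> tuple[list[int], int]:
    """Linear-time Z values (index 0 unused) and the number of character comparisons."""
    n = len(s)
    z = [0] * (n + 1)
    comparisons = 0
    if n == 0:
        return z, comparisons
    z[1] = n
    left = right = 0  # 左(i-1) and 右(i-1)

    def extend(i: int, k: int) -> int:
        # Compare S[i+k] with S[k+1] onward (1-based) until a mismatch or the end of S.
        nonlocal comparisons
        while i + k <= n:
            comparisons += 1
            if s[k] != s[i - 1 + k]:
                break
            k += 1
        return k

    for i in range(2, n + 1):
        if right < i:                            # Case 1
            z[i] = extend(i, 0)
            if z[i] > 0:
                left, right = i, i + z[i] - 1
        elif z[i - left + 1] < right - i + 1:    # Case 2: no comparisons
            z[i] = z[i - left + 1]
        else:                                    # Case 3
            z[i] = extend(i, right - i + 1)
            left, right = i, i + z[i] - 1
    return z, comparisons


def z_match_linear(s: str, p: str) -> list[int]:
    """All 1-based occurrences of p in s using the linear-time Z values of PS."""
    z, _ = z_linear_count(p + s)
    m = len(p)
    return [i for i in range(1, len(s) + 1) if z[i + m] >= m]


print(z_match_linear("tatattatatata", "tata"))  # [1, 6, 8, 10]
print(z_linear_count("0" * 1000)[1])           # 999
```

On S = 000…000 of length 1000 this performs 999 comparisons, versus 499500 for the naive loop.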