11 string matching
String Matching
String Matching Algorithms
Finding Patterns in a given Text
Data structures: Tries, Suffix Tries, Suffix Arrays
Algorithms: Naive Approach, Boyer-Moore, Rabin-Karp, Knuth-Morris-Pratt (KMP)
Literature:
Dan Gusfield, Algorithms on Strings, Trees, and Sequences
CLRS (Cormen et al.), Introduction to Algorithms
Naive Approach
n = text.size();
m = pattern.size();
// try every shift s; pattern and text are indexed from 1 here
for s = 0 to n - m {
    if (pattern[1 .. m] == text[s+1 .. s+m]) add_result(s);
}
For T = a^n, P = a^m and m = n/2 the worst case occurs: each of the n − m + 1 = n/2 + 1 shifts needs m = n/2 character comparisons, yielding a running time of Θ(n^2).
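The naive approach and its worst case can be sketched in Python as follows. This is only an illustration: the function name `naive_search` and the comparison counter are assumptions added here, and indices are 0-based, unlike the pseudocode on the slide.

```python
def naive_search(text, pattern):
    """Return (shifts, comparisons) for the naive matcher, 0-based shifts."""
    n, m = len(text), len(pattern)
    shifts, comparisons = [], 0
    for s in range(n - m + 1):
        matched = True
        for j in range(m):
            comparisons += 1  # one character comparison
            if text[s + j] != pattern[j]:
                matched = False
                break
        if matched:
            shifts.append(s)
    return shifts, comparisons


# Worst case from the slide with n = 8, m = n/2 = 4:
# (n - m + 1) * m = 5 * 4 = 20 comparisons.
shifts, comparisons = naive_search("a" * 8, "a" * 4)
```

For T = a^8 and P = a^4 every shift is a match, so no comparison loop can stop early.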
Rabin-Karp
n = text.length();
m = pattern.length();
hpattern = hash(pattern[0 .. m-1]);
htext = hash(text[0 .. m-1]);
for s = 0 to n - m {
    if (htext == hpattern)
        if (pattern[0 .. m-1] == text[s .. s+m-1])
            add_result(s);
    if (s < n - m)
        htext = hash(text[s+1 .. s+m]);  // rolling hash: computed from the old htext in O(1)
}
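A minimal Python sketch of the pseudocode above, assuming a polynomial rolling hash. The parameters `base` and `mod` are illustrative choices, not taken from the slides; any hash that can be updated in O(1) per shift would do.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return all 0-based shifts at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character
    hpattern = htext = 0
    for j in range(m):
        hpattern = (hpattern * base + ord(pattern[j])) % mod
        htext = (htext * base + ord(text[j])) % mod
    shifts = []
    for s in range(n - m + 1):
        # Equal hashes may be a collision, so verify character by character.
        if htext == hpattern and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Roll the window: drop text[s], append text[s + m].
            htext = ((htext - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return shifts
```

The explicit verification step is what keeps the result correct even with a very small modulus; a small modulus only costs time through spurious hash matches.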
Properties of Rabin-Karp
Worst case running time (as for the naive approach) is O((n − m + 1)m).
Good on average, i.e. O(n + m) expected, as long as spurious hash matches and true matches are rare.
Boyer-Moore
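Compare pattern and text from right to left.
It is possible that some text characters are never compared.
Good explanation in Dan Gusfield, Algorithms on Strings, Trees, and Sequences
Bad character shifts: on a mismatch, shift the pattern so that its rightmost occurrence of the mismatched text character lines up with that character.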
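The right-to-left comparison and the bad character shift can be sketched as below. This is a simplified variant that uses only the bad character rule (no good suffix rule); the function name `bad_character_search` and the table `last` are names chosen here for illustration.

```python
def bad_character_search(text, pattern):
    """Boyer-Moore style search using only the bad character rule."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Rightmost index of each character in the pattern.
    last = {c: i for i, c in enumerate(pattern)}
    shifts = []
    s = 0
    while s <= n - m:
        j = m - 1
        # Compare right to left.
        while j >= 0 and pattern[j] == text[s + j]:
            j -= 1
        if j < 0:
            shifts.append(s)
            s += 1  # conservative shift after a full match
        else:
            # Line up the rightmost occurrence of the mismatched text
            # character with it; always move forward by at least 1.
            s += max(1, j - last.get(text[s + j], -1))
    return shifts
```

When the mismatched text character does not occur in the pattern at all, the pattern jumps completely past it, which is how some text characters end up never being compared.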
Boyer-Moore, strong good suffix rule
T: prstabstubabvqxrst
            *
P:   qcabdabdab
P:         qcabdabdab
The suffix ab of P has matched; the mismatch (*) is between d in P and u in T.
P is shifted right by 6 positions, so that the rightmost other copy of ab in P that is not preceded by d lines up with the matched ab in T (second P line).
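To make the shift in this example concrete, here is a brute-force sketch that computes the strong good suffix shift for one mismatch position. It is not the linear-time preprocessing used in real implementations; the function name `strong_good_suffix_shift` is an assumption made here, and the index `i` is 0-based, so the mismatch at d in the example is `i = 7`.

```python
def strong_good_suffix_shift(pattern, i):
    """Shift for a mismatch at pattern[i] after pattern[i+1:] has matched."""
    m = len(pattern)
    start = i + 1          # where the matched suffix t begins
    t = pattern[start:]
    if not t:
        return 1           # nothing matched, the rule gives no information
    # Rightmost earlier copy of t whose preceding character differs
    # from the mismatched pattern character (or that has no predecessor).
    for j in range(start - 1, -1, -1):
        if pattern[j:j + len(t)] == t and (j == 0 or pattern[j - 1] != pattern[i]):
            return start - j
    # Otherwise: longest prefix of the pattern that is a suffix of t.
    for k in range(len(t) - 1, 0, -1):
        if pattern[:k] == pattern[m - k:]:
            return m - k
    return m               # no overlap possible, shift past the window


shift = strong_good_suffix_shift("qcabdabdab", 7)  # the example above
```

In the example the copy of ab directly before the matched suffix is skipped because it is also preceded by d, so the earlier copy (preceded by c) is used.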
Properties of Boyer-Moore
Worst case O(n) if the pattern does not occur in the text.
Best case O(n/m) running time.
In practice one of the best known algorithms for string matching.
For details see e.g. http://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string_s
Properties of KMP
Worst case running time is O(n).
In practice usually slower than Boyer-Moore, but easier to code.
no details here
extension: Aho-Corasick for matching multiple strings in one pass
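Since the slides skip the details, the following is only a compact sketch of the usual KMP formulation with a failure table, to show why it is considered easy to code. The names `kmp_search` and `fail` are chosen here and do not come from the slides.

```python
def kmp_search(text, pattern):
    """Return all 0-based shifts at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # fail[j]: length of the longest proper prefix of pattern[:j+1]
    # that is also a suffix of it.
    fail = [0] * m
    k = 0
    for j in range(1, m):
        while k > 0 and pattern[j] != pattern[k]:
            k = fail[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        fail[j] = k
    shifts = []
    k = 0  # number of pattern characters currently matched
    for i in range(n):
        while k > 0 and text[i] != pattern[k]:
            k = fail[k - 1]
        if text[i] == pattern[k]:
            k += 1
        if k == m:
            shifts.append(i - m + 1)
            k = fail[k - 1]
    return shifts
```

The text index `i` never moves backwards, which is where the linear worst-case bound comes from.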
Tries
data structure for a set of strings
each node corresponds to a prefix of some string
each edge corresponds to a character
example stolen from wikipedia: to, tea, ten, i, in, and inn
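[Figure: trie for these keys. Edges are labeled with single characters; nodes are labeled with the prefixes t, i, te, to, in, tea, ten, inn. Values stored at the keys: to = 7, tea = 3, ten = 12, i = 11, in = 5, inn = 9.]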
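A small Python sketch of such a trie for the example keys. The class and method names (`Trie`, `insert`, `contains`, `has_prefix`) are illustrative assumptions; the stored values from the figure are omitted and only key membership is tracked.

```python
class Trie:
    def __init__(self):
        self.children = {}    # character (edge label) -> child node
        self.is_key = False   # True if the prefix ending here is a stored string

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_key = True

    def _find(self, prefix):
        """Return the node for prefix, or None if no stored string starts with it."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._find(word)
        return node is not None and node.is_key

    def has_prefix(self, prefix):
        return self._find(prefix) is not None


trie = Trie()
for w in ["to", "tea", "ten", "i", "in", "inn"]:
    trie.insert(w)
```

Here "te" is a node of the trie (a prefix of tea and ten) but not a stored key, which is why a separate `is_key` flag is needed.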
Suffix-Trees/Tries/Arrays
Suffix-Tries/Trees
preprocessing the text not the pattern
tree containing every suffix of a text (size?)
Fast searching for any substring
trie → tree: each path without branches becomes a single edge
there are linear-time construction algorithms for suffix trees (which clearly have linear size)
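As a rough illustration of the suffix trie idea (not the compressed suffix tree, and not a linear-time construction), the sketch below inserts every suffix into a trie of nested dictionaries. The function names are assumptions made here. Counting the nodes gives a feel for the size question: the uncompressed trie has one node per distinct substring and can grow quadratically in the text length.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a trie of nested dicts (quadratic time)."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def is_substring(trie, pattern):
    """Walk down from the root; every substring is a prefix of some suffix."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


def count_nodes(trie):
    """Number of nodes including the root (recursive, fine for short texts)."""
    return 1 + sum(count_nodes(child) for child in trie.values())


suffix_trie = build_suffix_trie("banana")
```

A query takes time proportional to the pattern length only, independent of the text length, because the text was preprocessed.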
Suffix Arrays
array of length |S| listing the suffixes of S in ascending lexicographic order
(simple) search in O(m log n) time
simple implementation in O(n^2 log n) time and O(n) space is often sufficient
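The simple variant can be sketched in Python as below. The function names are hypothetical. Note one deviation from the slide: sorting with sliced suffixes as keys copies the suffixes, so this sketch uses more than O(n) space while sorting; it is meant to show the idea, not to meet the space bound.

```python
def build_suffix_array(s):
    """Start indices of the suffixes of s in ascending lexicographic order.

    Sorting n suffixes with O(n)-time comparisons costs O(n^2 log n).
    The slice keys copy the suffixes (assumption: acceptable for a sketch).
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


def find_occurrences(s, sa, pattern):
    """All start positions of pattern in s via two binary searches, O(m log n)."""
    m = len(pattern)
    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


sa = build_suffix_array("banana")
```

All suffixes that start with the pattern form one contiguous block in the sorted order, which is why two binary searches are enough to find every occurrence.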