Finding Patterns in Trees and Strings
Philip Bille
Agenda
• Background for PhD
• Tree Matching
• Tree Inclusion Problem
• String Matching
• Regular Expression Matching Problem
• Core Techniques and Future Research
• Short Break
• Question Session
Background for PhD
• Worked on data structures.
• Labeling Schemes for Small Distances in Trees. Stephen Alstrup, Philip Bille, and Theis Rauhe. SIAM J. Disc. Math., 2005 and SODA 2003.
• PhD funded by the EU project “Deep Structure, Singularities, and Computer Vision”, working on tree matching problems.
• A Survey on Tree Edit Distance and Related Problems. Philip Bille. Theoret. Comp. Sci., 2005.
• The Tree Inclusion Problem: In Optimal Space and Faster. Philip Bille and Inge Li Gørtz. ICALP 2005.
• Matching Subsequences in Trees. Philip Bille and Inge Li Gørtz. CIAC 2006.
• From a 2D Shape to a String Structure using the Symmetry Set. Arjan Kuijper, Ole Fogh Olsen, Peter Giblin, Philip Bille, and Mads Nielsen. ECCV 2004.
• Matching 2D Shapes using their Symmetry Sets. Arjan Kuijper, Ole Fogh Olsen, Peter Giblin, and Philip Bille. ICPR 2006.
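Basic Setup
• Trees are rooted, labeled, and ordered.
• Rooted: A specific node is designated to be the root.
• Labeled: Each node is assigned a label from an alphabet $\Sigma$.
• Ordered: There is a given left-to-right ordering among siblings.
• We compare trees by deleting nodes. Deleting a node $v$ means making the children of $v$ children of the parent of $v$ (keeping their left-to-right order) and then removing $v$.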
Deleting a Node
[Figure: an example tree with node labels a, b, and c, shown before and after a node is deleted.]
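A minimal Python sketch of node deletion on a rooted, labeled, ordered tree. The `Node` class with `label`, `children`, and `parent` fields and the helpers `delete` and `show` are hypothetical names, not from the talk. Deleting the root is left out, since it would turn the tree into a forest.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # eq=False: nodes compare and hash by identity
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        """Append child as the rightmost child and return it."""
        child.parent = self
        self.children.append(child)
        return child


def delete(v: Node) -> None:
    """Delete a non-root node v: its children replace it, in order, among its parent's children."""
    p = v.parent
    if p is None:
        raise ValueError("this sketch does not delete the root")
    i = next(k for k, c in enumerate(p.children) if c is v)
    for c in v.children:
        c.parent = p
    p.children[i:i + 1] = v.children   # splice the children into v's position
    v.children, v.parent = [], None


def show(v: Node) -> str:
    """Render a tree as label(child,child,...)."""
    if not v.children:
        return v.label
    return v.label + "(" + ",".join(show(c) for c in v.children) + ")"


root = Node("a")
b = root.add(Node("b"))
b.add(Node("c"))
b.add(Node("a"))
root.add(Node("b"))
print(show(root))  # a(b(c,a),b)
delete(b)
print(show(root))  # a(c,a,b)
```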
Tree Inclusion
• $P$ is included in $T$ if $P$ can be obtained from $T$ by deleting nodes in $T$.
• $P$ is minimally included in $T$ if $P$ is included in $T$ but not in any proper subtree of $T$.
• The tree inclusion problem is to decide if $P$ is included in $T$, and if so, compute all subtrees of $T$ which minimally include $P$.
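The definitions can be tested directly with a naive memoized recursion over ordered forests. This is a sketch for small inputs only, not one of the algorithms in this talk, and `Node`, `N`, `forest_included`, `included`, and `minimal_inclusions` are hypothetical names. The recursion relies on one observation: the root of the first tree in the target forest is either deleted, or it is the image of the root of the first tree in the pattern forest.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(eq=False)  # identity-based equality and hashing
class Node:
    label: str
    children: Tuple["Node", ...] = ()


def N(label: str, *children: Node) -> Node:
    return Node(label, tuple(children))


Forest = Tuple[Node, ...]


def forest_included(F: Forest, G: Forest, memo: Dict) -> bool:
    """Can the ordered forest F be obtained from the ordered forest G by deleting nodes?"""
    if not F:
        return True   # delete everything that is left in G
    if not G:
        return False
    key = (F, G)
    if key not in memo:
        f, g = F[0], G[0]
        # Case 1: delete the root of g; its children take its place.
        ok = forest_included(F, g.children + G[1:], memo)
        # Case 2: the root of g is the image of the root of f. Then f's children
        # must embed below g's root, and the rest of F in the rest of G.
        if not ok and f.label == g.label:
            ok = (forest_included(f.children, g.children, memo)
                  and forest_included(F[1:], G[1:], memo))
        memo[key] = ok
    return memo[key]


def included(P: Node, T: Node) -> bool:
    return forest_included((P,), (T,), {})


def minimal_inclusions(P: Node, T: Node) -> List[Node]:
    """Roots v such that T(v) includes P but no child subtree of v does
    (inclusion in a deeper subtree implies inclusion in some child subtree)."""
    out: List[Node] = []

    def visit(v: Node) -> None:
        if not included(P, v):
            return  # then no subtree of T(v) includes P either
        if not any(included(P, c) for c in v.children):
            out.append(v)
        for c in v.children:
            visit(c)

    visit(T)
    return out


P = N("a", N("b"), N("c"))
T = N("a", N("c", N("b")), N("c"))
print(included(P, T))                       # True
print(included(N("a", N("c"), N("b")), T))  # False
print(minimal_inclusions(P, T) == [T])      # True
```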
Example
[Figure: a pattern tree $P$ and a target tree $T$ with node labels a, b, and c, illustrating inclusion and minimal inclusion.]
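Application: Querying XML Data Bases
[Figure: an XML catalog drawn as a tree $T$ with nodes such as catalog, book, author, title, chapter, name, and section, and text leaves such as “Muthukrishnan”, “Basic Math Ideas”, “New Directions”, and “Sampling”; next to it, a query tree $P$ built from the nodes book, author, chapter, Muthukrishnan, and Sampling.]
• Query: “Find all books written by Muthukrishnan with a chapter that has something to do with sampling”.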
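As a hedged illustration of this application, the sketch below parses a small hypothetical catalog (invented data loosely modeled on the figure) with Python's standard `xml.etree.ElementTree`, turns each element into a node labeled by its tag and each non-empty text into a leaf labeled by that text, and then finds the minimal inclusions of the query with a naive memoized recursion for ordered inclusion. The conversion convention and all helper names are assumptions. Inclusion is ordered, so the query lists author before chapter, as the catalog does.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(eq=False)  # identity-based equality and hashing
class Node:
    label: str
    children: Tuple["Node", ...] = ()


def N(label: str, *children: Node) -> Node:
    return Node(label, tuple(children))


def from_element(e: ET.Element) -> Node:
    """Tag becomes the label; non-empty text becomes a leftmost leaf child (tail text is ignored)."""
    text = (e.text or "").strip()
    kids = [N(text)] if text else []
    kids += [from_element(c) for c in e]
    return Node(e.tag, tuple(kids))


def forest_included(F, G, memo) -> bool:
    """Naive ordered-forest inclusion: delete g's root, or map f's root onto it."""
    if not F:
        return True
    if not G:
        return False
    if (F, G) not in memo:
        f, g = F[0], G[0]
        ok = forest_included(F, g.children + G[1:], memo)
        if not ok and f.label == g.label:
            ok = (forest_included(f.children, g.children, memo)
                  and forest_included(F[1:], G[1:], memo))
        memo[(F, G)] = ok
    return memo[(F, G)]


def included(P: Node, T: Node) -> bool:
    return forest_included((P,), (T,), {})


def minimal_inclusions(P: Node, T: Node) -> List[Node]:
    out: List[Node] = []

    def visit(v: Node) -> None:
        if included(P, v):
            if not any(included(P, c) for c in v.children):
                out.append(v)
            for c in v.children:
                visit(c)

    visit(T)
    return out


CATALOG = """
<catalog>
  <book>
    <author>Muthukrishnan</author>
    <chapter><title>Basic Math Ideas</title></chapter>
    <chapter><title>New Directions</title><section>Sampling</section></chapter>
  </book>
  <book>
    <author>Someone Else</author>
    <chapter><section>Sampling</section></chapter>
  </book>
</catalog>
"""

T = from_element(ET.fromstring(CATALOG.strip()))
P = N("book", N("author", N("Muthukrishnan")), N("chapter", N("Sampling")))
hits = minimal_inclusions(P, T)
print([b.children[0].children[0].label for b in hits])  # ['Muthukrishnan']
```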
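Results
• [KM92]: time $O(n_P n_T)$, space $O(n_P n_T)$.
• [Che98]: time $O(l_P n_T)$, space $O(l_P \min(d_T, l_T))$.
• [Here]: time $O(l_P n_T)$, space $O(n_P + n_T)$.
• [Here]: time $O(n_P l_T \log\log n_T + n_T)$, space $O(n_P + n_T)$.
• [Here]: time $O\left(\frac{n_P n_T}{\log n_T} + n_T \log n_T\right)$, space $O(n_P + n_T)$.
• Notation: $n_S$, $l_S$, and $d_S$ denote the number of nodes, the number of leaves, and the depth of a tree $S$.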
Practical Implications
• Significant space reduction:
• Feasible to query large XML databases.
• Faster query time since more computation can be kept in main memory.
Algorithm Overview
• Reduce tree inclusion to tree embedding.
• Compute tree embeddings using a simple general framework.
• Implement the framework in 3 different ways to get the results.
Tree Inclusion and Embeddings
• An injective function $f$ from the nodes of $P$ to the nodes of $T$ is an embedding if for all nodes $v$ and $w$ in $P$:
1. $\mathrm{label}(v) = \mathrm{label}(f(v))$,
2. $v$ is a proper ancestor of $w$ if and only if $f(v)$ is a proper ancestor of $f(w)$,
3. $v$ is to the left of $w$ if and only if $f(v)$ is to the left of $f(w)$.
• $P$ is included in $T$ if and only if there is an embedding from $P$ to $T$.
• $P$ is minimally included in $T$ if and only if there is an embedding from $P$ to $T$ and $P$ cannot be embedded in a proper subtree of $T$.
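Conditions 1 to 3 can be checked mechanically for a given mapping. In this sketch (hypothetical `Node` class and helper names), “proper ancestor” and “to the left of” are read off preorder and postorder numbers: $x$ is a proper ancestor of $y$ when $x$ comes before $y$ in preorder and after it in postorder, and $x$ is to the left of $y$ when it comes before $y$ in both. It compares all pairs of pattern nodes, so it takes time quadratic in $n_P$.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(eq=False)  # identity-based hashing so nodes can be dict keys
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def numbering(root: Node) -> Tuple[Dict[Node, int], Dict[Node, int]]:
    """Preorder and postorder numbers of every node in the tree."""
    pre: Dict[Node, int] = {}
    post: Dict[Node, int] = {}

    def dfs(v: Node) -> None:
        pre[v] = len(pre)
        for c in v.children:
            dfs(c)
        post[v] = len(post)

    dfs(root)
    return pre, post


def is_embedding(P: Node, T: Node, f: Dict[Node, Node]) -> bool:
    """Check that f is injective and satisfies conditions 1-3."""
    pre_p, post_p = numbering(P)
    pre_t, post_t = numbering(T)
    nodes = list(pre_p)
    if set(f) != set(nodes) or any(f[v] not in pre_t for v in nodes):
        return False
    if len({f[v] for v in nodes}) != len(nodes):  # injectivity
        return False

    def anc(pre, post, x, y) -> bool:   # x is a proper ancestor of y
        return pre[x] < pre[y] and post[x] > post[y]

    def left(pre, post, x, y) -> bool:  # x is to the left of y
        return pre[x] < pre[y] and post[x] < post[y]

    for v in nodes:
        if v.label != f[v].label:                                   # condition 1
            return False
        for w in nodes:
            if anc(pre_p, post_p, v, w) != anc(pre_t, post_t, f[v], f[w]):    # condition 2
                return False
            if left(pre_p, post_p, v, w) != left(pre_t, post_t, f[v], f[w]):  # condition 3
                return False
    return True


pb, pc = Node("b"), Node("c")
P = Node("a", [pb, pc])
tb, tc = Node("b"), Node("c")
T = Node("a", [Node("x", [tb]), tc])
print(is_embedding(P, T, {P: T, pb: tb, pc: tc}))  # True
print(is_embedding(P, T, {P: T, pb: tc, pc: tb}))  # False: labels differ
```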
Computing Embeddings: P is a Path
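[Figure: a path $P$ and a tree $T$ with node labels a, b, and c; successive frames mark the current active set and, at the end, the roots of the minimal subtrees of $T$ including $P$.]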
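A minimal Python sketch of the path case in the spirit of the active-set description: start from the leaves of $T$, replace every node in the active set by its nearest ancestor carrying the next label of $P$ (working from the leaf of $P$ up to its root), move to parents between steps, and finally keep only the deepest nodes. Treating a node as its own ancestor in the nearest-ancestor step is an assumption of this sketch, and the names are hypothetical. It walks parent pointers naively, so it does not achieve the $O(n_T)$ bound discussed next.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass(eq=False)  # identity-based hashing so nodes can live in sets
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def minimal_path_roots(path_labels: List[str], T: Node) -> Set[Node]:
    """path_labels lists the labels of the path P from root to leaf.
    Returns the roots of the minimal subtrees of T that include P."""
    parent: Dict[Node, Optional[Node]] = {T: None}
    order = [T]
    for v in order:                      # the list grows while we scan it: a BFS
        for c in v.children:
            parent[c] = v
            order.append(c)

    def nearest(v: Optional[Node], label: str) -> Optional[Node]:
        # Nearest ancestor of v (v itself included) with the given label.
        while v is not None and v.label != label:
            v = parent[v]
        return v

    active: Set[Node] = {v for v in order if not v.children}  # start at the leaves
    for i, label in enumerate(reversed(path_labels)):
        if i > 0:  # the next pattern node must map to a proper ancestor
            active = {parent[v] for v in active if parent[v] is not None}
        found = (nearest(v, label) for v in active)
        active = {u for u in found if u is not None}

    # Keep only the deepest nodes: drop every node with a proper descendant in the set.
    has_desc: Set[Node] = set()
    for v in active:
        u = parent[v]
        while u is not None:
            has_desc.add(u)
            u = parent[u]
    return active - has_desc


c1, c2, c3 = Node("c"), Node("c"), Node("c")
a1 = Node("a", [c1])
a2 = Node("a", [Node("b", [c2])])
T = Node("b", [a1, a2, c3])
print(minimal_path_roots(["a", "c"], T) == {a1, a2})  # True
```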
Time Complexity
• At each step of the algorithm the active set “moves up”.
• Each parent pointer in $T$ is traversed a constant number of times.
• Using a simple data structure and exploiting the ordering of the nodes, we get a total running time of $O(n_T)$.
Computing Embeddings: P is not a Path
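[Figure: a pattern $P$ that is not a path and a tree $T$ with node labels a, b, and c; successive frames illustrate how the embeddings are computed.]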
Time Complexity
• Time complexity is bounded by the time used to compute embeddings for each root-to-leaf path in $P$.
• => Time: $O(l_P n_T)$.
Algorithm 2
• Reconsider the case when $P$ is a path:
• Let $\mathrm{firstlabel}(v, l)$ denote the nearest ancestor of node $v$ in $T$ labeled $l$.
• At each step we “essentially” compute $\mathrm{firstlabel}(v, l)$ for each node $v$ in the active set.
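For intuition, the sketch below answers a batch of $\mathrm{firstlabel}$ queries offline in one depth-first traversal, keeping for each label a stack of the nodes with that label on the current root path. This is a simpler offline substitute, not the online data structure of [Dietz1989] cited next. The names are hypothetical, and counting $v$ as one of its own ancestors is an assumption about the definition.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)  # identity-based hashing so nodes can be dict keys
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def firstlabel_offline(root: Node, queries: List[Tuple[Node, str]]) -> List[Optional[Node]]:
    """Answer firstlabel(v, l) for every query (v, l) in one DFS.
    Assumption of this sketch: v counts as its own ancestor."""
    at: Dict[Node, List[Tuple[int, str]]] = defaultdict(list)
    for idx, (v, lab) in enumerate(queries):
        at[v].append((idx, lab))
    answers: List[Optional[Node]] = [None] * len(queries)
    on_path: Dict[str, List[Node]] = defaultdict(list)  # label -> its nodes on the root path

    def dfs(v: Node) -> None:
        on_path[v.label].append(v)
        for idx, lab in at.get(v, []):
            stack = on_path.get(lab)
            answers[idx] = stack[-1] if stack else None  # deepest such ancestor
        for c in v.children:
            dfs(c)
        on_path[v.label].pop()

    dfs(root)
    return answers


deep_c = Node("c")
inner_a = Node("a", [deep_c])
b = Node("b", [inner_a])
right_c = Node("c")
T = Node("a", [b, right_c])
ans = firstlabel_offline(T, [(deep_c, "a"), (deep_c, "b"), (right_c, "b"), (T, "a")])
print([x.label if x else None for x in ans])  # ['a', 'b', None, 'a']
```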
Algorithm 2
• Idea: Use a fast data structure supporting $\mathrm{firstlabel}$ queries. This is known as the tree color problem.
• Theorem [Dietz1989]: For any tree $T$ there is a data structure using $O(n_T)$ space and $O(n_T)$ expected preprocessing time which supports $\mathrm{firstlabel}$ queries in $O(\log\log n_T)$ time.
Time Complexity
• For each node in $P$ we have an active set of size at most $l_T$, and for each node in this active set we have to compute a $\mathrm{firstlabel}$ query.
• => Time: $O(n_P l_T \log\log n_T + n_T)$.
Algorithm 3: Idea
• Divide $T$ into $O(n_T / \log n_T)$ micro trees of size $O(\log n_T)$ which overlap in at most 2 nodes. Based on the clustering technique from [AHLT1997].
• We represent each micro tree by a constant number of nodes in a macro tree and connect them according to the overlap of the micro trees.
Algorithm 3: Idea
• Active sets are represented compactly in $O(n_T / \log n_T)$ space as small bit strings for each micro tree.
• We preprocess micro trees using a “Four Russians technique” such that we can update the active set in constant time for each micro tree.
• Leads to an $O\left(\frac{n_P n_T}{\log n_T} + n_T \log n_T\right)$ time algorithm.
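A toy Python illustration of the tabulation idea behind the Four Russians technique, assuming a micro tree with $k$ nodes numbered $0, \dots, k-1$ and a set of its nodes stored as a $k$-bit mask. Precomputing one table entry per mask turns a set operation, here the hypothetical “replace each node by its parent”, into a single lookup; with $k = O(\log n)$ the table stays polynomial in size. The actual micro-tree operations of the algorithm are more involved, so this only shows the pattern.

```python
from typing import List


def parent_table(parent: List[int]) -> List[int]:
    """For a micro tree with k nodes (parent[i] is i's parent, -1 for the root),
    return table with table[mask] = bit mask of the parents of the nodes in mask."""
    k = len(parent)
    table = [0] * (1 << k)
    for mask in range(1, 1 << k):
        low = mask & -mask                 # lowest set bit of mask
        i = low.bit_length() - 1           # index of the node it represents
        p = parent[i]
        # mask ^ low < mask, so its entry is already filled in.
        table[mask] = table[mask ^ low] | ((1 << p) if p >= 0 else 0)
    return table


# Micro tree: node 0 is the root, 1 and 2 are its children, 3 is a child of 1.
table = parent_table([-1, 0, 0, 1])
active = 0b1110                 # nodes {1, 2, 3}
print(bin(table[active]))       # 0b11, i.e. nodes {0, 1}
```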
Space Complexity
• Linear Space?
• No!
The Problem: Algorithm 1 and 2
[Figure: a pattern $P$ of depth $d_P$ whose stored active sets each have size $l_T$.]
• Storing all active sets can use $\Omega(l_T d_P)$ space.
Trick 1: Recurse to subtree with the most leaves
[Figure: a pattern $P$ with stored active sets of size $l_T$.]
• The number of active sets stored does not exceed $O(\log l_P)$.
• => Total space for stored active sets is $O(l_T \log l_P)$.
Trick 2: Strengthen Analysis
• Nodes in the active set for $v$ are roots of (disjoint) subtrees of $T$ that embed $P(v)$, the subtree of $P$ rooted at $v$.
• => Each of these subtrees has at least $l_P(v)$ leaves, the number of leaves of $P(v)$.
• => The size of the active set for $v$ is $O(l_T / l_P(v))$.
Space Complexity: Algorithm 1 and 2
• Tricks 1 and 2 combined give exponentially decreasing sizes of the stored active sets.
• => Total size of the stored active sets is $O(l_T)$.
• Space complexity is $O(n_P + n_T)$.
• Trick 2 shows that Algorithm 2 in fact runs in $O(l_P l_T \log\log n_T + n_T)$ time.
Space Complexity: Algorithm 3
• Each active set is represented in $O(n_T / \log n_T)$ space.
• Trick 1 gives us that the total space for the stored active sets is $O\left(\frac{n_T}{\log n_T} \log l_P\right) = O(n_T)$.
Summary
• Time: $\min\left\{ O(l_P n_T),\ O(l_P l_T \log\log n_T + n_T),\ O\left(\frac{n_P n_T}{\log n_T} + n_T \log n_T\right) \right\}$.
• Space: $O(n_P + n_T)$.
String Matching
• Fast and Compact Regular Expression Matching. Philip Bille and Martin Farach-Colton, 2005, submitted.
• New Algorithms for Regular Expression Matching. Philip Bille, ICALP 2006.
• Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts. Philip Bille, Rolf Fagerberg, and Inge Li Gørtz, CPM 2007.
Regular Expressions
• The regular expressions are defined recursively:
• A character $\alpha \in \Sigma$ is a regular expression.
• If $S$ and $T$ are regular expressions, then so is
• the concatenation $ST$,
• the union $S \mid T$, and
• the Kleene star $S^*$.
Regular Expressions
• The language $L(R)$ of a regular expression $R$ is defined by:
• For any $\alpha \in \Sigma$, $L(\alpha) = \{\alpha\}$.
• For regular expressions $S$ and $T$:
$L(ST) = L(S)L(T)$
$L(S \mid T) = L(S) \cup L(T)$
$L(S^*) = \{\epsilon\} \cup L(S) \cup L(S)^2 \cup L(S)^3 \cup \cdots$
Example
$R = ac \mid a^*b$
$L(R) = \{ac, b, ab, aab, aaab, aaaab, \ldots\}$
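The recursive definition of $L(R)$ translates directly into code if we only ask for the strings of length at most $k$ (otherwise $L(S^*)$ is infinite). In this sketch a regular expression is a hypothetical nested-tuple AST: `("chr", a)`, `("cat", S, T)`, `("alt", S, T)`, and `("star", S)`; the function name `lang` is also illustrative.

```python
from typing import Set


def lang(r: tuple, k: int) -> Set[str]:
    """The strings of L(r) of length at most k."""
    kind = r[0]
    if kind == "chr":                                  # L(a) = {a}
        return {r[1]} if len(r[1]) <= k else set()
    if kind == "cat":                                  # L(ST) = L(S)L(T)
        left, right = lang(r[1], k), lang(r[2], k)
        return {x + y for x in left for y in right if len(x) + len(y) <= k}
    if kind == "alt":                                  # L(S|T) = L(S) ∪ L(T)
        return lang(r[1], k) | lang(r[2], k)
    if kind == "star":                                 # {ε} ∪ L(S) ∪ L(S)^2 ∪ ...
        base = {y for y in lang(r[1], k) if y}         # empty pieces add nothing
        result, frontier = {""}, {""}
        while frontier:                                # extend by one more piece
            frontier = {x + y for x in frontier for y in base
                        if len(x) + len(y) <= k} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node type {kind!r}")


R = ("alt", ("cat", ("chr", "a"), ("chr", "c")),
            ("cat", ("star", ("chr", "a")), ("chr", "b")))
print(sorted(lang(R, 5), key=lambda s: (len(s), s)))
# ['b', 'ab', 'ac', 'aab', 'aaab', 'aaaab']
```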
Regular Expression Matching
• Given a regular expression $R$ and a string $Q$, the regular expression matching problem is to decide if $Q \in L(R)$.
• Example: $R = ac \mid a^*b$ matches $Q = aaaab$.
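For a quick sanity check of the example, Python's standard `re` module can decide $Q \in L(R)$ with `re.fullmatch`, which succeeds only if the whole string matches. Note that `re` is a backtracking engine, not the state-set simulation discussed in this talk, so it serves only as a convenient oracle here; `matches` is a hypothetical helper name.

```python
import re


def matches(pattern: str, q: str) -> bool:
    """Decide whether the whole string q is in the language of pattern."""
    return re.fullmatch(pattern, q) is not None


print(matches(r"ac|a*b", "aaaab"))  # True
print(matches(r"ac|a*b", "aac"))    # False
```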
Applications:
• Lexical analysis phase in compilers.
• Protein searching.
• Text editing and programming languages (e.g. EMACS and Perl).
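Results
• [Tho68]: time $O(nm)$, space $O(m)$.
• [NR04]: time $O((n + 2^m)\lceil m/w \rceil)$, space $O((2^m + |\Sigma|)\lceil m/w \rceil)$.
• [Mye92, Here]: time $O\left(\frac{nm}{\log n} + n + m \log m\right)$, space $O(n)$.
• [Here]: space $O(m)$ and time
$O\left(\frac{nm \log w}{w} + m \log w\right)$ if $m > w$,
$O(n \log m + m \log m)$ if $\sqrt{w} < m \le w$,
$O(\min(n + m^2,\ n \log m + m \log m))$ if $m \le \sqrt{w}$.
• Notation: $n$ is the length of $Q$, $m$ is the size of $R$, and $w$ is the word length of the word RAM.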
Practical Implications
• Except for Thompson’s algorithm, all previous algorithms use large tables and perform a long series of lookups in the tables.
• => Many expensive cache misses.
• The new algorithm does not require the large tables.
Algorithm Overview
• Construct a non-deterministic finite automaton (NFA) using Thompson’s classical algorithm.
• Decompose the NFA into small subautomata.
• Simulate each subautomaton using the arithmetic and logical instructions of the word RAM.
• Use the simulations of the subautomata to simulate the entire NFA.
Thompson’s Algorithm
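[Figure: Thompson’s recursive construction of $N(ST)$, $N(S \mid T)$, and $N(S^*)$ from $N(S)$ and $N(T)$, plus the base case for a single symbol, using $\epsilon$-transitions.]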
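A compact Python sketch of the construction over a hypothetical nested-tuple AST (`("chr", a)`, `("cat", S, T)`, `("alt", S, T)`, `("star", S)`). States are integers; `eps[u]` lists the $\epsilon$-successors of $u$, and `char[u] = (a, v)` records an $a$-transition from $u$ to $v$. Concatenation is joined with an $\epsilon$-transition rather than by merging the two states, a common variant that still gives a linear number of states and transitions. The helper `incoming` lets one check the TNFA properties stated on the following slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TNFA:
    eps: Dict[int, List[int]] = field(default_factory=dict)         # u -> ε-successors of u
    char: Dict[int, Tuple[str, int]] = field(default_factory=dict)  # u -> (a, v): u --a--> v
    n: int = 0                                                       # number of states

    def new_state(self) -> int:
        self.n += 1
        return self.n - 1

    def add_eps(self, u: int, v: int) -> None:
        self.eps.setdefault(u, []).append(v)


def build(r: tuple, A: TNFA) -> Tuple[int, int]:
    """Add Thompson's N(r) to A and return its (start, accept) states."""
    kind = r[0]
    if kind == "chr":                     # single character a: s --a--> t
        s, t = A.new_state(), A.new_state()
        A.char[s] = (r[1], t)
    elif kind == "cat":                   # N(S) then N(T), joined by an ε-transition
        s, t1 = build(r[1], A)
        s2, t = build(r[2], A)
        A.add_eps(t1, s2)
    elif kind == "alt":                   # new start and accept around N(S) and N(T)
        s, t = A.new_state(), A.new_state()
        for sub in r[1:]:
            si, ti = build(sub, A)
            A.add_eps(s, si)
            A.add_eps(ti, t)
    elif kind == "star":                  # skip, enter, repeat, or leave N(S)
        s, t = A.new_state(), A.new_state()
        s1, t1 = build(r[1], A)
        for u, v in ((s, t), (s, s1), (t1, s1), (t1, t)):
            A.add_eps(u, v)
    else:
        raise ValueError(f"unknown node type {kind!r}")
    return s, t


def incoming(A: TNFA) -> Dict[int, List[Optional[str]]]:
    """state -> labels of its incoming transitions (None stands for ε)."""
    inc: Dict[int, List[Optional[str]]] = {}
    for u, succs in A.eps.items():
        for v in succs:
            inc.setdefault(v, []).append(None)
    for u, (a, v) in A.char.items():
        inc.setdefault(v, []).append(a)
    return inc


R = ("alt", ("cat", ("chr", "a"), ("chr", "c")),
            ("cat", ("star", ("chr", "a")), ("chr", "b")))
A = TNFA()
start, accept = build(R, A)
print(A.n)  # 12
```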
Thompson NFA
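[Figure: the TNFA $N(R)$ for $R = ac \mid a^*b$, with transitions labeled a, b, c, and $\epsilon$.]
• Thompson-NFA (TNFA) $N(R)$ for $R = ac \mid a^*b$.
• $N(R)$ accepts $Q$ if and only if there is a path from the start state $\theta$ to the accepting state $\phi$ that “spells” out $Q$.
• $Q \in L(R)$ if and only if $N(R)$ accepts $Q$.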
Properties of TNFAs
• Linear number of states and transitions.
• All incoming transitions to a state have the same label.
• States with an incoming transition labeled $\alpha \in \Sigma$ ($\alpha$-states) have exactly 1 predecessor.
Simulating TNFAs
• Let $A$ be a TNFA with $m$ states. To test acceptance we use the following operations. For a state-set $S$ and $\alpha \in \Sigma$:
• $\mathrm{Move}(S, \alpha)$: Find the set of states reachable from $S$ via a single $\alpha$-transition.
• $\mathrm{Close}(S)$: Find the set of states reachable from $S$ via a (possibly empty) path of $\epsilon$-transitions.
• $O(m)$ time for both operations.
Simulating TNFAs
• Let $Q$ be a string of length $n$.
• The state-set simulation of $A$ on $Q$ produces state-sets $S_0, S_1, \ldots, S_n$ as follows:
$$S_0 := \mathrm{Close}(\{\theta\}), \qquad S_i := \mathrm{Close}(\mathrm{Move}(S_{i-1}, Q[i])) \text{ for } i = 1, \ldots, n.$$
• $S_i$ is the set of states reachable from $\theta$ through a path that spells out $Q[1..i]$.
• $Q \in L(R)$ if and only if $\phi \in S_n$.
• $O(nm)$ time and $O(m)$ space.
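The state-set simulation can be sketched directly on top of set-based Move and Close. A minimal sketch, assuming a hypothetical 10-state numbering of a Thompson-style NFA for $ac|a^*b$ in which state 1 plays the role of $\theta$ and state 10 the role of $\phi$; `accepts` is an illustrative name. Each step costs $O(m)$, giving $O(nm)$ time, and only the current state-set is kept, giving $O(m)$ space.

```python
from collections import deque

# Hypothetical TNFA for ac|a*b (assumed numbering).
EPS = {1: [2, 5], 4: [10], 5: [6, 8], 7: [6, 8], 9: [10]}   # ε-transitions
CHAR = {2: ("a", 3), 3: ("c", 4), 6: ("a", 7), 8: ("b", 9)}  # state -> (label, target)
START, ACCEPT = 1, 10                                        # theta and phi


def move(states, alpha):
    return {CHAR[s][1] for s in states if s in CHAR and CHAR[s][0] == alpha}


def close(states):
    seen, queue = set(states), deque(states)
    while queue:
        for t in EPS.get(queue.popleft(), []):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen


def accepts(q):
    """S_0 = Close({theta}); S_i = Close(Move(S_{i-1}, q[i])); accept iff phi in S_n."""
    s = close({START})
    for ch in q:
        s = close(move(s, ch))
    return ACCEPT in s


for q in ["ac", "aaab", "b", "a", "abc"]:
    print(q, accepts(q))
# prints: ac True / aaab True / b True / a False / abc False
```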
Four-Russian Speedup
[Figure: the TNFA decomposed into sub-automata $A_1$, $A_2$, $A_3$.]
• Decompose the TNFA into sub-automata with $O(\log n)$ states.
• Preprocess the sub-automata so that Move and Close take constant time for each sub-automaton. The sub-automata are made “deterministic”.
• => $O\left(\frac{nm}{\log n} + n + m \log m\right)$ time and $O(n)$ space algorithm [Myers92, BFC05].
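The tabulation idea behind the Four-Russian speedup can be sketched for a single small automaton: precompute Move and Close for all $2^k$ state-sets of a $k$-state automaton, so that each operation becomes one table lookup. This is a simplified sketch under stated assumptions: it tabulates one hypothetical 10-state TNFA for $ac|a^*b$ as a whole, whereas the algorithm on the slide decomposes a large TNFA into many $O(\log n)$-state sub-automata and combines their tables, which is not shown. `build_tables`, `CLOSE` and `MOVE` are illustrative names, and here state $i$ is stored in bit $i - 1$.

```python
# Hypothetical 10-state TNFA for ac|a*b (assumed numbering): start 1, accept 10.
# In this sketch state i is stored in bit i - 1 of an integer mask.
K = 10
START, ACCEPT = 1, 10
EPS = {1: [2, 5], 4: [10], 5: [6, 8], 7: [6, 8], 9: [10]}
CHAR = {2: ("a", 3), 3: ("c", 4), 6: ("a", 7), 8: ("b", 9)}
ALPHABET = "abc"


def single_closure(s):
    """Mask of the states reachable from state s via zero or more ε-transitions."""
    mask, stack = 1 << (s - 1), [s]
    while stack:
        for t in EPS.get(stack.pop(), []):
            if not mask & (1 << (t - 1)):
                mask |= 1 << (t - 1)
                stack.append(t)
    return mask


def build_tables():
    """Tabulate Close and Move for all 2^K state-sets."""
    single = [single_closure(s) for s in range(1, K + 1)]
    close_tab = [0] * (1 << K)
    move_tab = {alpha: [0] * (1 << K) for alpha in ALPHABET}
    for mask in range(1, 1 << K):
        low = (mask & -mask).bit_length() - 1   # bit index of the lowest state in mask
        rest = mask & (mask - 1)                # mask with that state removed
        close_tab[mask] = close_tab[rest] | single[low]
        label, target = CHAR.get(low + 1, (None, None))
        for alpha, tab in move_tab.items():
            step = 1 << (target - 1) if label == alpha else 0
            tab[mask] = tab[rest] | step
    return close_tab, move_tab


CLOSE, MOVE = build_tables()


def accepts(q):
    """Each Move and Close is now a single table lookup."""
    s = CLOSE[1 << (START - 1)]
    for ch in q:
        s = CLOSE[MOVE[ch][s]] if ch in MOVE else 0
    return bool(s & (1 << (ACCEPT - 1)))
```

The tables take $O(2^k)$ entries per operation, which is why the sub-automata are kept to $O(\log n)$ states.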
Word-Level Parallel Algorithm
• Idea: Use essentially the same decomposition into sub-automata.
• Simulate Move and Close using the arithmetic and logical instructions of the word RAM.
Simple Algorithm for small TNFAs
• Suppose $A$ is a TNFA with $m = O(\sqrt{w})$ states.
• Order the states such that the (unique) predecessor of $\alpha$-state $i$ is $i - 1$.
• Represent state-sets as bit strings of length $m$.
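Representation of State-Sets
[Figure: an 8-state TNFA with states numbered 1 to 8; state 4 has an incoming $a$-transition, state 6 has an incoming $b$-transition, and all other transitions are $\varepsilon$-transitions.]
$S = 00101000$, with one bit per state from state 1 (left) to state 8 (right), so $S = \{3, 5\}$.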
Move Operation: Preprocessing
• For each $\alpha \in \Sigma$, represent the $\alpha$-states using a bit string $D_\alpha$:
$D_a = 00010000$ (state 4) and $D_b = 00000100$ (state 6).
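The masks $D_\alpha$ can be built in one pass over the character transitions. A minimal sketch, assuming the 8-state example is the TNFA for $(a|b)^*$ whose only character transitions are $3 \xrightarrow{a} 4$ and $5 \xrightarrow{b} 6$ (a reconstruction consistent with $D_a$ and $D_b$ above, not stated on the slide). State $i$ is stored in bit $m - i$, so printing the integer with leading zeros reproduces the slide's left-to-right bit strings; `bit` and `build_d` are illustrative names.

```python
M = 8  # number of states in the example TNFA

# Assumed character transitions of the example: source -> (label, target).
CHAR = {3: ("a", 4), 5: ("b", 6)}


def bit(i):
    """Mask for state i; state 1 is the leftmost (most significant) of the M bits."""
    return 1 << (M - i)


def build_d(char_transitions):
    """D[alpha] has a 1 exactly at the alpha-states (targets of alpha-transitions)."""
    d = {}
    for label, target in char_transitions.values():
        d[label] = d.get(label, 0) | bit(target)
    return d


def to_string(mask):
    return format(mask, f"0{M}b")


D = build_d(CHAR)
print(to_string(D["a"]))  # 00010000
print(to_string(D["b"]))  # 00000100
```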
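Move Operation: Simulation
• We compute $\mathrm{Move}(S, \alpha)$ as $S' := (S \gg 1) \,\&\, D_\alpha$.
• This works because the unique predecessor of $\alpha$-state $i$ is $i - 1$: shifting right by one moves the bit of state $i - 1$ to position $i$, and $D_\alpha$ keeps only the $\alpha$-states.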
Example:
• Compute $\mathrm{Move}(S, a)$ where $S = 01101000$:
$$S \gg 1 = 00110100, \qquad D_a = 00010000, \qquad S' = (S \gg 1) \,\&\, D_a = 00010000.$$
• Hence $\mathrm{Move}(\{2, 3, 5\}, a) = \{4\}$.
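The example can be replayed with integer bit operations. A minimal sketch: $D_a$ and $D_b$ are copied from the slide, state 1 is the most significant of the $m = 8$ bits (so Python's binary formatting matches the slide's notation), and `move` is an illustrative name.

```python
M = 8
# D_a and D_b from the slide; state 1 is the leftmost (most significant) bit.
D = {"a": int("00010000", 2), "b": int("00000100", 2)}


def move(s, alpha):
    """Move(S, alpha) = (S >> 1) & D_alpha.

    Shifting right by one moves the bit of state i - 1 onto state i. Because the
    unique predecessor of every alpha-state i is i - 1, masking with D_alpha keeps
    exactly the states entered from S by an alpha-transition.
    """
    return (s >> 1) & D.get(alpha, 0)


S = int("01101000", 2)
print(format(S >> 1, f"0{M}b"))        # 00110100
print(format(move(S, "a"), f"0{M}b"))  # 00010000
```

A character with no $\alpha$-states (such as `"c"` here) gets an all-zero mask, so the result is the empty set.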
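Close Operation: Preprocessing
• Encode $\varepsilon$-paths compactly in a single bit string of length $m(m+1)$:
$$E = 0E_1\,0E_2\,0E_3\,0E_4\,0E_5\,0E_6\,0E_7\,0E_8,$$
where $E_i$ is the bit string of the states from which state $i$ can be reached via a path of $\varepsilon$-transitions (including $i$ itself). For example, $E_2 = 11010110$.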
Close Operation: Preprocessing
• Three constant bit strings for the word tricks:
$$I = (10^m)^m, \qquad X = 1(0^m1)^{m-1}, \qquad C = 1(0^{m-1}1)^{m-1}$$
Close Operation: Simulation
• $\mathrm{Close}(S)$ is computed as:
$$Y := (S \times X) \,\&\, E$$
$$Z := ((Y \mid I) - (I \gg m)) \,\&\, I$$
$$S' := ((Z \times C) \ll (w - m(m+1))) \gg (w - m)$$
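The three formulas can be checked on the 8-state example. A minimal sketch under stated assumptions: Python integers stand in for $w$-bit words with a hypothetical $w = 128$ (the method needs $m(m+1) \le w$); the $\varepsilon$-transitions in `EPS` reconstruct the example as the TNFA for $(a|b)^*$, which is consistent with $E_2 = 11010110$ but not stated on the slides; and the constants are taken as $X = 1(0^m1)^{m-1}$ and $C = 1(0^{m-1}1)^{m-1}$, the forms under which $S \times X$ yields $m$ copies of $S$ and $Z \times C$ gathers the test bits. `eps_reach` and `close` are illustrative names.

```python
M = 8                  # number of states in the example TNFA
W = 128                # hypothetical word size; the method needs M * (M + 1) <= W
L = M * (M + 1)        # length in bits of E, I and S * X
WORD = (1 << W) - 1    # mask emulating truncation to a W-bit word

# Assumed ε-transitions of the 8-state example, reconstructed as a TNFA for (a|b)*.
EPS = {1: [2, 8], 2: [3, 5], 4: [7], 6: [7], 7: [2, 8]}


def eps_reach(j):
    """States reachable from state j via zero or more ε-transitions."""
    seen, stack = {j}, [j]
    while stack:
        for t in EPS.get(stack.pop(), []):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


# Preprocessing. Bit strings are written with state 1 leftmost, as on the slides.
REACH = {j: eps_reach(j) for j in range(1, M + 1)}
e_bits = ""
for i in range(1, M + 1):
    # Block i is 0E_i; E_i marks every state j that can reach state i.
    e_bits += "0" + "".join("1" if i in REACH[j] else "0" for j in range(1, M + 1))
E = int(e_bits, 2)
I = int(("1" + "0" * M) * M, 2)                    # (1 0^m)^m
X = int("1" + ("0" * M + "1") * (M - 1), 2)        # 1 (0^m 1)^(m-1)
C = int("1" + ("0" * (M - 1) + "1") * (M - 1), 2)  # 1 (0^(m-1) 1)^(m-1)


def close(s):
    y = (s * X) & E                    # M copies of S; block i becomes S & E_i
    z = ((y | I) - (I >> M)) & I       # test bit of block i is 1 iff S & E_i != 0
    # Z * C lines the M test bits up in bits L-1 .. L-M; the shifts isolate them.
    return (((z * C) << (W - L)) & WORD) >> (W - M)


S = int("10011000", 2)
print(e_bits[M + 2:2 * M + 2])       # E_2 = 11010110
print(format(close(S), f"0{M}b"))    # 11111011
```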
Example:
• Compute $\mathrm{Close}(S)$ where $S = 10011000$.
Step 1: $Y := (S \times X) \,\&\, E$
$$S \times X = 0S\,0S\,0S\,0S\,0S\,0S\,0S\,0S$$
$$Y = (S \times X) \,\&\, E = 0Y_1\,0Y_2\,\ldots\,0Y_8, \quad \text{where } Y_i = S \,\&\, E_i$$
• For example, $Y_2 = S \,\&\, E_2 = 10011000 \,\&\, 11010110 = 10010000$.
Step 2: $Z := ((Y \mid I) - (I \gg m)) \,\&\, I$
$$Y \mid I = 1Y_1\,1Y_2\,\ldots\,1Y_8, \qquad I \gg m = (0^m1)^m$$
• Subtracting $I \gg m$ subtracts 1 from each block $1Y_i$; the leading test bit of a block survives if and only if $Y_i \neq 0$, and masking with $\&\, I$ keeps only the test bits.
• For example, for block 2: $1\,10010000 - 0\,00000001 = 1\,10001111$, so its test bit stays 1.
Step 3: $S' := ((Z \times C) \ll (w - m(m+1))) \gg (w - m)$
• $Z \times C$ produces a bit string containing the test bits of $Z$ as a consecutive substring.
• The shifts clear the remaining bits and align the substring.
Complexity
• Lemma: For TNFAs with $O(\sqrt{w})$ states we can support Move and Close in constant time using $O(m)$ space and $O(m^2)$ preprocessing.
• => For a string $Q$ and a regular expression $R$ of lengths $n$ and $m = O(\sqrt{w})$, regular expression matching can be solved in $O(n + m^2)$ time and $O(m)$ space.
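Putting the pieces together, a small-TNFA matcher performs one Move and one Close per character of $Q$ after building $D_\alpha$, $E$, $I$, $X$ and $C$. A minimal sketch under stated assumptions: Python integers stand in for $w$-bit words with a hypothetical $w = 128$; the TNFA is a hypothetical 10-state numbering of a Thompson-style NFA for $ac|a^*b$ that satisfies the ordering requirement (each $\alpha$-state $i$ has predecessor $i - 1$); and the constants use $X = 1(0^m1)^{m-1}$ and $C = 1(0^{m-1}1)^{m-1}$. The preprocessing is a plain graph search, not tuned to the $O(m^2)$ bound.

```python
M = 10                 # number of TNFA states
W = 128                # hypothetical word size; needs M * (M + 1) <= W
L = M * (M + 1)
WORD = (1 << W) - 1
START, ACCEPT = 1, 10

# Hypothetical TNFA for ac|a*b, numbered so every alpha-state i has predecessor i - 1.
EPS = {1: [2, 5], 4: [10], 5: [6, 8], 7: [6, 8], 9: [10]}
CHAR = {2: ("a", 3), 3: ("c", 4), 6: ("a", 7), 8: ("b", 9)}


def bit(i):
    """State i is stored in bit M - i, so state 1 is the leftmost of the M bits."""
    return 1 << (M - i)


def eps_reach(j):
    seen, stack = {j}, [j]
    while stack:
        for t in EPS.get(stack.pop(), []):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def preprocess():
    """Build D_alpha (for Move) and E = 0E_1 ... 0E_M (for Close)."""
    d = {}
    for src, (label, dst) in CHAR.items():
        assert src == dst - 1, "the predecessor of alpha-state i must be i - 1"
        d[label] = d.get(label, 0) | bit(dst)
    reach = {j: eps_reach(j) for j in range(1, M + 1)}
    e_bits = "".join(
        "0" + "".join("1" if i in reach[j] else "0" for j in range(1, M + 1))
        for i in range(1, M + 1))
    return d, int(e_bits, 2)


D, E = preprocess()
I = int(("1" + "0" * M) * M, 2)                    # (1 0^m)^m
X = int("1" + ("0" * M + "1") * (M - 1), 2)        # 1 (0^m 1)^(m-1)
C = int("1" + ("0" * (M - 1) + "1") * (M - 1), 2)  # 1 (0^(m-1) 1)^(m-1)


def move(s, alpha):
    return (s >> 1) & D.get(alpha, 0)


def close(s):
    y = (s * X) & E
    z = ((y | I) - (I >> M)) & I
    return (((z * C) << (W - L)) & WORD) >> (W - M)


def accepts(q):
    """A constant number of word operations per character of q."""
    s = close(bit(START))
    for ch in q:
        s = close(move(s, ch))
    return bool(s & bit(ACCEPT))


for q in ["ac", "aaab", "b", "a", "abc"]:
    print(q, accepts(q))
# prints: ac True / aaab True / b True / a False / abc False
```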
Another Algorithm
• Main bottleneck: We need a string of length $\Theta(m^2)$ to represent the transitive closure of $\varepsilon$-transitions.
• Idea: Compute a “good” separator for TNFAs and use a divide-and-conquer strategy.
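Separator Property of TNFA
• There exist two states $\theta_{P_I}$ and $\phi_{P_I}$ whose removal partitions a TNFA into two subgraphs, $P_O$ and $P_I$, of roughly equal size such that:
• Any path from $P_O$ to $P_I$ goes through $\theta_{P_I}$.
• Any path from $P_I$ to $P_O$ goes through $\phi_{P_I}$.
[Figure: a TNFA partitioned into subgraphs $P_O$ and $P_I$ by the separator states $\theta_{P_I}$ and $\phi_{P_I}$.]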
Recursive Closure Algorithm
1. Determine which of $\theta_{P_I}$ and $\phi_{P_I}$ are $\varepsilon$-reachable.
2. Update the state-set accordingly.
3. Recurse in parallel on $P_I$ and $P_O$.
Complexity
• Each of the $O(\log m)$ levels of recursion can be handled in parallel in constant time.
• => Lemma: For TNFAs with $m = O(w)$ states we can support Move and Close in $O(\log m)$ time using $O(m)$ space and $O(m \log m)$ preprocessing.
• => For a string $Q$ and a regular expression $R$ of lengths $n$ and $m = O(w)$, resp., regular expression matching can be solved in $O(n \log m + m \log m)$ time and $O(m)$ space.
Plug and Play
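• Time:
$$\begin{cases} O\left(\frac{nm \log w}{w} + m \log w\right) & \text{if } m > w, \\ O(n \log m + m \log m) & \text{if } \sqrt{w} < m \le w, \\ O(\min(n + m^2,\; n \log m + m \log m)) & \text{if } m \le \sqrt{w}. \end{cases}$$
• Space: $O(m)$.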
Core Techniques
• Data Structures: Organize information efficiently.
• Nearest common ancestors, firstlabel, dictionaries, dynamic perfect hashing, predecessors.
• Tree Techniques: Use combinatorial properties of trees.
• Heavy-path decomposition, varieties of tree clusterings with or without macro trees.
• Word-Level Parallelism: Encode and simulate algorithms using arithmetic and logical instructions of the word RAM.
• Four-Russian technique, word-level parallelism.
Future Research
• Bring state-of-the-art techniques to combinatorial pattern matching and related areas. Many important problems need them!
• Use the developed algorithms to improve practical applications (e.g., bioinformatics, XML databases).
• Word-parallel regular expression matching looks promising.
Thanks!