A New Top-down Algorithm for Tree Inclusion
Dr. Yangjun Chen
Dept. Applied Computer Science,
University of Winnipeg
515 Portage Ave.
Winnipeg, Manitoba, Canada R3B 2E9
Outline
• Motivation
• Basic algorithm for tree inclusion problem
- Definition
- Algorithm description
• Improvements
• Summary
Motivation
Given two ordered labeled trees P and T, called the pattern and the target, respectively, an interesting problem is: can we obtain pattern P by deleting some nodes from target T? That is, is there a sequence v1, ..., vk of nodes such that, for
T_0 = T and T_{i+1} = delete(T_i, v_{i+1}) for i = 0, ..., k - 1,
we have T_k = P? If this is the case, we say that P is included in T, that T contains P, or that T covers P.
[Figure: a tree T containing a node c, and the tree delete(T, c) obtained by removing c.]
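The deletion-based definition above can be tried out directly. The following is a minimal sketch, not the algorithm presented in these slides: it assumes the usual meaning of delete (the children of the removed node take its place, in order) and checks inclusion with a naive exponential recursion. The tuple encoding `(label, children)` and the names `delete` and `included` are illustrative assumptions.

```python
# A tree is (label, children) and a forest is a tuple of trees.
# Assumption: delete(T, v) replaces v by the ordered sequence of its children.

def delete(forest, target_label):
    """Delete the first node (in preorder) labeled target_label from a forest."""
    for idx, (label, children) in enumerate(forest):
        if label == target_label:
            # The children of the deleted node take its place.
            return forest[:idx] + children + forest[idx + 1:]
        pruned = delete(children, target_label)
        if pruned != children:
            return forest[:idx] + ((label, pruned),) + forest[idx + 1:]
    return forest  # label not found: nothing to delete


def included(target, pattern):
    """Naive check: can forest `pattern` be obtained from forest `target` by deletions?"""
    if not pattern:
        return True   # everything that is left in the target can be deleted
    if not target:
        return False  # pattern nodes remain, but no target nodes
    (t_label, t_children), t_rest = target[0], target[1:]
    (p_label, p_children), p_rest = pattern[0], pattern[1:]
    # Option 1: keep the first target root and match it with the first pattern root.
    if (t_label == p_label and included(t_children, p_children)
            and included(t_rest, p_rest)):
        return True
    # Option 2: delete the first target root; its children move up.
    return included(t_children + t_rest, pattern)


leaf = lambda label: (label, ())
T = ("a", (("c", (leaf("b"), ("d", (leaf("e"), leaf("f"))))),))
after_delete = delete((T,), "c")
P_ok = ("a", (leaf("b"), leaf("f")))
P_wrong_order = ("a", (leaf("f"), leaf("b")))
print(after_delete)
print(included((T,), (P_ok,)), included((T,), (P_wrong_order,)))
```

The second pattern is rejected because the trees are ordered: f may not appear to the left of b.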
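Motivation
[Figure: Linguistic analysis. A pattern parse tree (s over vp, with v "reads", n "book" and adv below it) is matched against the parse tree of the sentence "The student reads the interesting book again and again" (s over np and vp; vp over v, np and adv).]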
Tree inclusion algorithm
Definition
Definition 1. Let F and G be labeled ordered forests. We define an ordered embedding (f, G, F) as an injective function f: V(G) → V(F) such that for all nodes v, u ∈ V(G),
i) label(v) = label(f(v)) (label preservation condition);
ii) v is an ancestor of u iff f(v) is an ancestor of f(u) (ancestor condition);
iii) v is to the left of u iff f(v) is to the left of f(u) (sibling condition).
[Figure: example forests G (root a with two children labeled b) and F, illustrating an ordered embedding of G in F.]
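The three conditions of Definition 1 can be checked mechanically for a given candidate mapping. The sketch below is illustrative only: it assumes forests encoded as nested `(label, children)` tuples, identifies nodes by their preorder index, and uses the standard preorder/postorder characterization of "ancestor" and "to the left of". The function names are hypothetical.

```python
from itertools import count


def number(forest):
    """Return a list of (label, preorder, postorder) records, indexed by preorder."""
    nodes, post = [], count()

    def visit(tree):
        label, children = tree
        idx = len(nodes)
        nodes.append(None)            # reserve the preorder slot
        for child in children:
            visit(child)
        nodes[idx] = (label, idx, next(post))

    for tree in forest:
        visit(tree)
    return nodes


def is_ancestor(a, b):
    return a[1] < b[1] and a[2] > b[2]   # earlier in preorder, later in postorder


def is_left_of(a, b):
    return a[1] < b[1] and a[2] < b[2]   # earlier in both orders


def is_ordered_embedding(f, G, F):
    """f maps preorder indices of G to preorder indices of F."""
    g, fn = number(G), number(F)
    if set(f) != set(range(len(g))):
        return False                      # f must be defined on all of V(G)
    if any(not 0 <= x < len(fn) for x in f.values()):
        return False
    if len(set(f.values())) != len(f):
        return False                      # f must be injective
    for v in g:
        fv = fn[f[v[1]]]
        if v[0] != fv[0]:
            return False                  # label preservation condition
        for u in g:
            fu = fn[f[u[1]]]
            if is_ancestor(v, u) != is_ancestor(fv, fu):
                return False              # ancestor condition
            if is_left_of(v, u) != is_left_of(fv, fu):
                return False              # sibling condition
    return True


leaf = lambda label: (label, ())
G = (("a", (leaf("b"), leaf("b"))),)          # preorder: a=0, b=1, b=2
F = (("a", (("d", (leaf("b"),)), leaf("b"))),)  # preorder: a=0, d=1, b=2, b=3
F_chain = (("a", (("b", (leaf("b"),)),)),)       # preorder: a=0, b=1, b=2
print(is_ordered_embedding({0: 0, 1: 2, 2: 3}, G, F))
print(is_ordered_embedding({0: 0, 1: 3, 2: 2}, G, F))
print(is_ordered_embedding({0: 0, 1: 1, 2: 2}, G, F_chain))
```

The second mapping swaps the two b nodes and so breaks the sibling condition; the third maps two siblings onto a parent and its child and so breaks the ancestor condition.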
Algorithm
1. Let T = <t; T1, ..., Tk> (k ≥ 1) be a tree and G = <P1, ..., Pl> (l ≥ 1) be a forest. We handle G as a tree P = <pv; P1, ..., Pl>, where pv represents a virtual node, matching any node in T.
2. Consider a node v in P with children v1, ..., vj. We use a pair <i, v> (i ≤ j) to represent an ordered forest containing the first i subtrees of v: <P[v1], ..., P[vi]>. Then, <j, pv> represents the first j trees in G.
[Figure: a node v in P with children v1, ..., vi, ..., vk; the pair <i, v> covers the subtrees rooted at v1, ..., vi.]
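One possible way to represent the virtual root pv and the pairs <i, v> is sketched below. The class and function names (`Node`, `Pair`, `as_tree`, `forest_of`, `labels_match`) are assumptions made for illustration, as is the choice of `None` as the label of pv.

```python
from typing import List, NamedTuple, Optional


class Node:
    def __init__(self, label: Optional[str], children=()):
        self.label = label              # None marks the virtual root pv (assumption)
        self.children: List["Node"] = list(children)


class Pair(NamedTuple):
    i: int      # number of leading subtrees
    v: Node     # node whose first i subtrees are meant


def as_tree(G: List[Node]) -> Node:
    """Handle the forest G = <P1, ..., Pl> as a tree rooted at a virtual node pv."""
    return Node(None, G)


def labels_match(p: Node, t_label: str) -> bool:
    """pv matches any node in T; every other node needs an equal label."""
    return p.label is None or p.label == t_label


def forest_of(pair: Pair) -> List[Node]:
    """The ordered forest <P[v1], ..., P[vi]> denoted by <i, v>."""
    if not 0 <= pair.i <= len(pair.v.children):
        raise ValueError("i must not exceed the number of children of v")
    return pair.v.children[:pair.i]


P1, P2, P3 = Node("a", [Node("b")]), Node("c"), Node("d")
pv = as_tree([P1, P2, P3])
first_two = forest_of(Pair(2, pv))   # the first two trees of G
print([n.label for n in first_two])
```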
3. In addition, h(v) represents the height of v in a tree, and δ(v) represents a link from v in P to the leaf node on the left-most path in P[v].
Let v' be a leaf node in P. We denote by δ^-1(v') the set of nodes x such that for each v ∈ x, δ(v) = v'.
[Figure: a pattern tree P with nodes v1, ..., v5 and the links δ(v1) and δ(v2); in it, δ^-1(v3) = {v1, v2, v3}.]
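The link to the left-most leaf and its inverse are easy to compute. The sketch below assumes a small `Node` class with parent pointers and a tree shape v1(v2(v3, v4), v5), which is only a guess at the figure that happens to reproduce the set {v1, v2, v3} stated above; `delta` and `delta_inverse` are illustrative names.

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def delta(v):
    """Follow first children down to the leaf on the left-most path of P[v]."""
    while v.children:
        v = v.children[0]
    return v


def delta_inverse(leaf):
    """All nodes whose left-most leaf is `leaf`, listed from the leaf upwards."""
    nodes, v = [leaf], leaf
    # Climb while the current node is the first child of its parent.
    while v.parent is not None and v.parent.children[0] is v:
        v = v.parent
        nodes.append(v)
    return nodes


# Assumed shape of the example tree: v1(v2(v3, v4), v5).
v3, v4, v5 = Node("v3"), Node("v4"), Node("v5")
v2 = Node("v2", [v3, v4])
v1 = Node("v1", [v2, v5])
print(delta(v1).name, [n.name for n in delta_inverse(v3)])
```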
The tree inclusion check is done by calling two functions recursively:
top-down(T, G),
bottom-up(T', G),
where T is a tree, and T' and G are two forests.
Each of the two functions returns a pair <i, v> with v being pv or a node on the left-most path in P1.
Notation: T = <t; T1, ..., Tk>, T' = <T1', ..., Tk'>, G = <P1, ..., Pl>.
Function: top-down(T, G)
In top-down(T, G), two cases will be handled.
Case 1: G = <P1>; or G = <P1, ..., Pl> (l > 1), but |T| ≤ |P1| + |P2|.
In this case, we try to find a pair <i, v> such that T contains the first i subtrees of v, where v = pv, or v ∈ δ^-1(v') and v' is the leaf node on the left-most path in P1.
[Figure: the two situations of Case 1: G = <P1>, and G = <P1, ..., Pl> with |T| ≤ |P1| + |P2|. In both, T is rooted at t and the pattern trees hang under the virtual root pv; p1 is the root of P1.]
i) If t is a leaf node, we check whether label(t) = label(δ(p1)), where p1 is the root of P1. If this is the case, return <1, parent of δ(p1)>. Otherwise, return <0, parent of δ(p1)>.
ii) If |T| < |P1| or height(t) < height(p1), we make a recursive call top-down(T, <P11, ..., P1j>), where <P11, ..., P1j> is the forest of the subtrees of p1. The return value of this call is used as the return value of top-down(T, G).
[Figure: Case 1 with |T| < |P1|: T is checked against the subtrees P11, ..., P1j of p1.]
iii) If |T| ≥ |P1| (but |T| ≤ |P1| + |P2|) and height(t) ≥ height(p1), two sub-cases need to be considered:
• label(t) = label(p1). Call bottom-up(<T1, ..., Tk>, <P11, ..., P1j>).
• label(t) ≠ label(p1). Call bottom-up(<T1, ..., Tk>, <P1>).
[Figure: two panels. For label(t) = label(p1), the subtrees T1, ..., Tk of t are checked against the subtrees P11, ..., P1j of p1; for label(t) ≠ label(p1), they are checked against P1 itself.]
In both sub-cases, assume that the return value is <i, v>. A further check needs to be conducted:
• If label(t) = label(v) and i = the outdegree of v, the return value should be <1, v's parent>.
• Otherwise (i is smaller than the outdegree of v, or label(t) ≠ label(v)), the return value is the same as <i, v>.
[Figure: T rooted at t and P1 rooted at p1, with v on the left-most path of P1.]
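The final check can be read as follows: if the subtrees of t cover all subtrees of v and t itself matches v, then T contains the whole of P[v], which is the first subtree of v's parent because v lies on the left-most path. A minimal sketch of just this step is given below; the `Node` class and the name `finish` are assumptions, and the guard for a node without a parent (the virtual root) is an added assumption that the slides do not spell out.

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def finish(t_label, i, v):
    """Turn the pair <i, v> returned by bottom-up into the result of top-down."""
    covers_all = i == len(v.children)          # i = the outdegree of v
    # Assumption: a node without a parent (the virtual root) is never lifted.
    if v.parent is not None and v.label == t_label and covers_all:
        return 1, v.parent                     # T contains P[v]
    return i, v                                # result is passed on unchanged


p1 = Node("a", [Node("b"), Node("c")])
pv = Node(None, [p1])                          # virtual root above P1
lifted = finish("a", 2, p1)                    # label and outdegree both match
partial = finish("a", 1, p1)                   # only the first subtree is covered
mismatch = finish("x", 2, p1)                  # label(t) differs from label(v)
print(lifted[0], lifted[1] is pv, partial[0], mismatch[0])
```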
Case 2: G = <P1, ..., Pl> (l > 1), and |T| > |P1| + |P2|. In this case, we call bottom-up(<T1, ..., Tk>, G). Assume that the return value is <i, v>. The following checks are then conducted:
iv) If v = p1's parent, the return value is the same as <i, v>.
v) If v ≠ p1's parent, check whether label(t) = label(v) and i = the outdegree of v. If so, the return value is changed to <1, v's parent>. Otherwise, the return value remains <i, v>.
[Figure: Case 2, with T = <t; T1, ..., Tk> and G = <P1, ..., Pl> under pv, showing the two outcomes v = p1's parent = pv and v ≠ p1's parent.]
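The case analysis of top-down(T, G) described above depends only on sizes, heights and whether t is a leaf. The sketch below shows that dispatch alone, without the recursive calls; it assumes trees encoded as `(label, children)` tuples, a height of 0 for leaves, and hypothetical names `size`, `height` and `select_case`.

```python
def size(tree):
    """Number of nodes, written |T| on the slides."""
    return 1 + sum(size(child) for child in tree[1])


def height(tree):
    """Height of the root; a leaf has height 0 (assumed convention)."""
    return 0 if not tree[1] else 1 + max(height(child) for child in tree[1])


def select_case(T, G):
    """Return which branch of top-down(T, G) applies: '1-i', '1-ii', '1-iii' or '2'."""
    P1 = G[0]
    if len(G) > 1 and size(T) > size(P1) + size(G[1]):
        return "2"        # call bottom-up(<T1, ..., Tk>, G)
    if not T[1]:
        return "1-i"      # t is a leaf: compare with the left-most leaf of P1
    if size(T) < size(P1) or height(T) < height(P1):
        return "1-ii"     # T is too small for P1: descend into the subtrees of p1
    return "1-iii"        # compare labels, then call bottom-up on the subtrees of t


leaf = lambda label: (label, ())
P1 = ("a", (leaf("b"), leaf("c")))
cases = [
    select_case(leaf("a"), (leaf("a"),)),
    select_case(("a", (leaf("b"),)), (P1,)),
    select_case(P1, (P1,)),
    select_case(("r", (leaf("a"), leaf("b"), leaf("c"))), (leaf("a"), leaf("b"))),
]
print(cases)
```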
Function: bottom-up(T', G)
bottom-up(T', G) is designed to handle the case that both T' and G are forests. Let T' = <T1, ..., Tk> and G = <P1, ..., Pq>. In bottom-up(T', G), we make a series of calls top-down(T_l, <P_{j_l}, ..., P_q>), where l = 1, ..., k, j1 = 0, and j1 ≤ j2 ≤ ... ≤ jh ≤ q (for some h ≤ k), controlled as follows.
[Figure: the forests T' = <T1, ..., Tk> and G = <P1, ..., Pq>, with a call top-down(T_l, <P_{j_l}, ..., P_q>).]
1. Two index variables l and j are used to scan T1, ..., Tk and P1, ..., Pq, respectively.
2. Let <il, vl> be the return value of top-down(Tl, <Pj, ..., Pq>). If vl = pj's parent, set j to j + il - 1. Otherwise, j is not changed. Set l to l + 1. Then repeat step 2.
3. The loop terminates when all Tl's or all Pj's have been examined.
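The scanning loop can be sketched with top-down passed in as a callable. This is a hedged illustration, not the author's code. It assumes that j counts the pattern trees already covered, so that j starts at 0, grows by il, and a final j > 0 means some pattern trees were found. That reading is an assumption and does not reproduce the index arithmetic of step 2 literally. The names `scan`, `parent` and `fake_top_down` are hypothetical.

```python
def scan(T_forest, G, top_down, parent):
    """Scan T1, ..., Tk left to right and count how many leading trees of G are covered."""
    j = 0                                   # assumption: number of pattern trees covered so far
    for T_l in T_forest:
        if j >= len(G):
            break                           # all Pj's have been examined
        i_l, v_l = top_down(T_l, G[j:])     # check T_l against the remaining pattern trees
        if v_l is parent:
            j += i_l                        # T_l covers i_l further whole pattern trees
    return j


# Stand-in for top-down, for demonstration only: a target tree "covers" the first
# remaining pattern tree exactly when the two trees are equal.
PV = object()                               # plays the role of p1's parent
OTHER = object()                            # any node below the top level


def fake_top_down(tree, remaining):
    return (1, PV) if tree == remaining[0] else (0, OTHER)


leaf = lambda label: (label, ())
covered = scan((leaf("x"), leaf("y"), leaf("z")), (leaf("x"), leaf("z")), fake_top_down, PV)
print(covered)
```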
• If j > 0 when the loop terminates, bottom-up(T', G) returns <j, p1's parent>.
[Figure: the forests T' = <T1, ..., Tk> and G = <P1, ..., Pq>, with Pj marked in G.]
Function: bottom-up(T’, G)
Tree inclusion algorithm
i) Let <i1, v1>, <i2, v2>, ..., <ik, vk> be the respective return values of
top-down(T1, <P1, ..., Pq>),
top-down(T2, <P1, ..., Pq>), ... ...
top-down(Tk, <P1, ..., Pq>).
Since j = 0, each vl -1(v’) (l = 1, ..., k).
• Otherwise, j = 0. In this case, we will continue to searching for a pair<i, v> such that T’ contains the first i subtrees of v, where v -1(v’) andv’ is the leaf node on the left-most path in P1, as described below.
• If j > 0 when the loop terminates, bottom-up(T’, G) returns<j, p1’s parent>.
P1
v1
v2
vk
…
ii) If each il = 0, return <0, ,>, where is considered to be a descendant ofany node in G. Otherwise, find the first vg with children w1, ..., wh such thatvg is not a descendant of any other vj, and ig > 0. Call
bottom-up(<Tg+1, ..., Tk>, <P[wig+1], ..., P[wh]>).
• Let <x, y> be the return value of that call. If y = vg, then the return value of bottom-up(T', G) is set to <ig + x, vg>.
• Otherwise, the return value is <ig, vg>.
[Figure: T1, ..., Tg are checked against P1 and yield vg with its first ig subtrees covered; T_{g+1}, ..., Tk are then checked against the remaining subtrees of vg.]
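The choice of vg in step ii) can be illustrated in isolation. Because every vl lies on the left-most path of P1, "not a descendant of any other vj" amounts to being the shallowest candidate. The sketch below makes two assumptions that the slides leave open: only pairs with il > 0 are compared, and ties between equal nodes go to the earliest index. The name `select_g` and the use of a depth table are hypothetical.

```python
def select_g(results, depth):
    """Pick the index g of the pair <ig, vg> used in step ii), or None if every il is 0.

    results: list of (i_l, v_l) pairs returned by the top-down calls
    depth:   mapping from a node to its depth on the left-most path of P1
    """
    # Assumption: only pairs with i_l > 0 take part in the comparison.
    candidates = [(depth[v], idx) for idx, (i, v) in enumerate(results) if i > 0]
    if not candidates:
        return None                      # each il = 0
    # Shallowest node first; the earliest index wins among equal nodes.
    return min(candidates)[1]


# Left-most path p1 -> u -> w, with depths 0, 1, 2.
depth = {"p1": 0, "u": 1, "w": 2}
results = [(0, "p1"), (1, "w"), (2, "u"), (1, "u")]
g = select_g(results, depth)
print(g, results[g])
```

In this example the pair <2, u> at index 2 is chosen: <0, p1> covers nothing, and w is a descendant of u.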
Further improvements
In the case j = 0:
Let <i1, v1>, ..., <ik, vk> be the return values of top-down(T1, <P1, ..., Pq>), ..., top-down(Tk, <P1, ..., Pq>). We find the first vg such that it is not a descendant of any other vj and ig > 0. Then,
bottom-up(<T_{g+1}, ..., T_k>, <P[w_{ig+1}], ..., P[w_h]>)
is invoked. This shows that none of the return values except <ig, vg> is used in the subsequent computation. Thus, the work spent computing those values should be avoided.
[Figure: T1, ..., Tg, T_{g+1}, ..., Tk on the target side and the nodes v1, ..., vg, ..., vk on the left-most path of P1.]
Let <ij, vj> be the return value of top-down(Tj, <P1, ..., Pq>) such that ij > 0 and vj is p1 or a descendant of p1. Then, during the execution of top-down(T_{j+1}, <P1, ..., Pq>), once we detect that it can only produce a return value <i_{j+1}, v_{j+1}> with v_{j+1} being a descendant of vj, we should stop the computation immediately, since this return value will not be used in the subsequent search. For this purpose, we extend top-down(T_{j+1}, <P1, ..., Pq>) to top-down(T_{j+1}, <P1, ..., Pq>, vj), where vj is used to transfer this information and is called a controlling node.
Assume that in the execution of top-down(T_{j+1}, <P1, ..., Pq>, vj), we have the following function calls:
top-down(T_{j+1,1}, <P1, ..., Pq>, u1) returns <a1, u1>,
top-down(T_{j+1,2}, <P1, ..., Pq>, u2) returns <a2, u2>,
… …
with all ui's being proper descendants of vj. Then the bottom-up function call with some ui as a controlling node,
bottom-up(<T_{j+1,i}, ... >, <… …>, ui),
should not be conducted.
Summary
• An efficient method for the tree inclusion problem
- O(|T| · min{DP, |leaves(P)|}) time and
- O(|T| + |P|) space,
where DP is the height of P and leaves(P) is the set of the leaf nodes of P.
• Future work
- adapt the algorithm to a data stream environment
- adapt the algorithm to an indexing environment
Thank you.