A New Top-down Algorithm for Tree Inclusion

Dr. Yangjun Chen, Dept. Applied Computer Science, University of Winnipeg, 515 Portage Ave., Winnipeg, Manitoba, Canada R3B 2E9

Upload: jackson-daniel

Post on 31-Dec-2015


Page 1: A New Top-down Algorithm for Tree Inclusion

A New Top-down Algorithm for Tree Inclusion

Dr. Yangjun Chen

Dept. Applied Computer Science,

University of Winnipeg

515 Portage Ave.

Winnipeg, Manitoba, Canada R3B 2E9

Page 2: A New Top-down Algorithm for Tree Inclusion

Outline

- Motivation
- Basic algorithm for the tree inclusion problem
  - Definition
  - Algorithm description
- Improvements
- Summary

Page 3: A New Top-down Algorithm for Tree Inclusion

Motivation

We are given two ordered labeled trees P and T, called the pattern and the target, respectively. An interesting problem is: can we obtain the pattern P by deleting some nodes from the target T? That is, is there a sequence v1, ..., vk of nodes such that for

T0 = T and Ti+1 = delete(Ti, vi+1) for i = 0, ..., k - 1,

we have Tk = P? If so, we say that P is included in T, that T contains P, or that T covers P.

[Figure: a tree T with root a, shown before and after delete(T, c).]
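The deletion step can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual reading of delete for ordered trees: the removed node's children take its place, in order, in its parent's child list. The `Node` class, the helper names and the example tree are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield every node of the tree in preorder."""
    yield node
    for child in node.children:
        yield from walk(child)


def delete(root: Node, target: Node) -> Node:
    """Remove `target` (a non-root node) and splice its children into its place."""
    for parent in walk(root):
        for pos, child in enumerate(parent.children):
            if child is target:
                # The children of the deleted node keep their left-to-right order.
                parent.children[pos:pos + 1] = child.children
                return root
    raise ValueError("target must be a non-root node of the tree")


def to_text(node: Node) -> str:
    """Render a tree as label(child child ...)."""
    if not node.children:
        return node.label
    return node.label + "(" + " ".join(to_text(c) for c in node.children) + ")"


# Hypothetical example: delete the inner node c from a(b c(e f) d).
c = Node("c", [Node("e"), Node("f")])
T = Node("a", [Node("b"), c, Node("d")])
print(to_text(delete(T, c)))
```

Under this reading, P is included in T exactly when some sequence of such deletions turns T into P.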

Page 4: A New Top-down Algorithm for Tree Inclusion

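Motivation

[Figure: Linguistic analysis. A small parse tree (s; vp; v, n, adv; words “reads” and “book”) next to a larger parse tree (s; np, vp; det, n, v, np, adv; det, adj, n) for the words “The”, “student”, “reads”, “the”, “interesting”, “book”, “again and again”.]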

Page 5: A New Top-down Algorithm for Tree Inclusion

Tree inclusion algorithm

Definition

Definition 1. Let F and G be labeled ordered forests. We define an ordered embedding (φ, G, F) as an injective function φ: V(G) → V(F) such that for all nodes v, u ∈ V(G),

i) label(v) = label(φ(v)) (label preservation condition);
ii) v is an ancestor of u iff φ(v) is an ancestor of φ(u) (ancestor condition);
iii) v is to the left of u iff φ(v) is to the left of φ(u) (sibling condition).

[Figure: example forests G and F, with nodes labeled a, b, d and e.]
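The three conditions of Definition 1 can be checked mechanically. The sketch below is only an illustration: it assumes forests are given as nested `(label, children)` tuples, identifies each node by its preorder rank, and uses the standard preorder/postorder characterisation of "ancestor" and "left of". The function names and the example forests are hypothetical.

```python
def index_forest(forest):
    """Return (labels, post): labels[v] and postorder rank post[v] for preorder rank v."""
    labels, post = [], []
    counter = [0]

    def visit(tree):
        label, children = tree
        me = len(labels)          # preorder rank of this node
        labels.append(label)
        post.append(None)
        for child in children:
            visit(child)
        post[me] = counter[0]     # postorder rank, assigned after all descendants
        counter[0] += 1

    for tree in forest:
        visit(tree)
    return labels, post


def is_ancestor(post, v, u):
    return v < u and post[v] > post[u]


def is_left_of(post, v, u):
    return v < u and post[v] < post[u]


def is_ordered_embedding(phi, G, F):
    """phi[v] is the node of F (preorder rank) assigned to node v of G."""
    g_labels, g_post = index_forest(G)
    f_labels, f_post = index_forest(F)
    nodes = range(len(g_labels))
    if len({phi[v] for v in nodes}) != len(g_labels):
        return False                                   # not injective
    for v in nodes:
        if g_labels[v] != f_labels[phi[v]]:
            return False                               # label preservation
        for u in nodes:
            if is_ancestor(g_post, v, u) != is_ancestor(f_post, phi[v], phi[u]):
                return False                           # ancestor condition
            if is_left_of(g_post, v, u) != is_left_of(f_post, phi[v], phi[u]):
                return False                           # sibling condition
    return True


# Hypothetical example: G = a(b b), F = a(d b(e) b).
G = [("a", [("b", []), ("b", [])])]
F = [("a", [("d", []), ("b", [("e", [])]), ("b", [])])]
print(is_ordered_embedding([0, 2, 4], G, F))
print(is_ordered_embedding([0, 4, 2], G, F))
```

The first mapping keeps the two b nodes in their left-to-right order; the second swaps them and therefore violates the sibling condition.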

Page 6: A New Top-down Algorithm for Tree Inclusion

Algorithm

1. Let T = <t; T1, ..., Tk> (k ≥ 1) be a tree and G = <P1, ..., Pl> (l ≥ 1) be a forest. We handle G as a tree P = <pv; P1, ..., Pl>, where pv represents a virtual node that matches any node in T.

2. Consider a node v in P with children v1, ..., vj. We use a pair <i, v> (i ≤ j) to represent the ordered forest containing the first i subtrees of v: <P[v1], ..., P[vi]>. Thus <j, pv> represents the first j trees in G.

[Figure: a node v of P with children v1, ..., vi, ..., vk; the pair <i, v> covers the subtrees rooted at v1, ..., vi.]

Page 7: A New Top-down Algorithm for Tree Inclusion

Algorithm

3. In addition, h(v) represents the height of v in a tree, and δ(v) represents a link from v in P to the leaf node on the left-most path in P[v].

Let v’ be a leaf node in P. We denote by δ⁻¹(v’) the set of nodes v such that δ(v) = v’. For example:

δ⁻¹(v3) = {v1, v2, v3}

[Figure: a pattern P with nodes v1, ..., v5; the links δ(v1) and δ(v2) point to the leaf v3.]
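The notation introduced so far (the virtual root pv, the pair <i, v>, the height h(v), the link δ(v) and its inverse) maps onto a handful of small helpers. The following is a sketch under stated assumptions: the `PNode` class and all function names are hypothetical, height is counted in edges (a leaf has height 0), and the example tree v1(v2(v3 v4) v5) is one possible shape consistent with δ⁻¹(v3) = {v1, v2, v3}, not necessarily the one drawn on the slide.

```python
class PNode:
    """Pattern node with a parent pointer (hypothetical helper type)."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def virtual_root(forest):
    """Wrap a forest <P1, ..., Pl> under a virtual node pv (label None here)."""
    return PNode(None, forest)


def first_subtrees(i, v):
    """The forest denoted by the pair <i, v>: the first i subtrees of v."""
    return v.children[:i]


def height(v):
    """h(v), counted in edges (assumption: a leaf has height 0)."""
    return 0 if not v.children else 1 + max(height(c) for c in v.children)


def delta(v):
    """delta(v): the leaf at the end of the left-most path in P[v]."""
    while v.children:
        v = v.children[0]
    return v


def delta_inverse(leaf):
    """All nodes x with delta(x) == leaf, listed from the top-most one down."""
    chain = [leaf]
    v = leaf
    while v.parent is not None and v.parent.children[0] is v:
        v = v.parent
        chain.append(v)
    return chain[::-1]


# Hypothetical pattern v1(v2(v3 v4) v5).
v3, v4, v5 = PNode("v3"), PNode("v4"), PNode("v5")
v2 = PNode("v2", [v3, v4])
v1 = PNode("v1", [v2, v5])
print([x.label for x in delta_inverse(v3)])
```

Because δ follows first-child links only, δ⁻¹(v’) is always a single chain of nodes, which is why the return values of the two functions described next can be compared by depth on the left-most path.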

Page 8: A New Top-down Algorithm for Tree Inclusion

Algorithm

Tree inclusion checking is done by calling two functions recursively:

- top-down(T, G),
- bottom-up(T’, G),

where T is a tree, and T’ and G are two forests:

T = <t; T1, ..., Tk>
T’ = <T1’, ..., Tk’>
G = <P1, ..., Pl>

Each of the two functions returns a pair <i, v>, with v being pv or a node on the left-most path in P1.

Page 9: A New Top-down Algorithm for Tree Inclusion

Function: top-down(T, G)

In top-down(T, G), two cases are handled.

Case 1: G = <P1>; or G = <P1, ..., Pl> (l > 1), but |T| ≤ |P1| + |P2|.

In this case, we try to find a pair <i, v> such that T contains the first i subtrees of v, where v = pv, or v ∈ δ⁻¹(v’) and v’ is the leaf node on the left-most path in P1.

[Figure: the two situations of Case 1. In the first, T is rooted at t and G = <P1> hangs under pv. In the second, T is rooted at t and G = <P1, P2, ..., Pl> hangs under pv with |T| ≤ |P1| + |P2|. In both, p1 is the root of P1.]

Page 10: A New Top-down Algorithm for Tree Inclusion

Function: top-down(T, G)

Case 1:

i) If t is a leaf node, we check whether label(t) = label(δ(p1)), where p1 is the root of P1. If so, return <1, parent of δ(p1)>. Otherwise, return <0, parent of δ(p1)>.

[Figure: the two situations of Case 1 for T = <t; T1, ..., Tk>: G = <P1> under pv, and G = <P1, P2, ..., Pl> under pv with |T| ≤ |P1| + |P2|.]

Page 11: A New Top-down Algorithm for Tree Inclusion

Function: top-down(T, G)

Case 1:

ii) If |T| < |P1| or height(t) < height(p1), we make a recursive call top-down(T, <P11, ..., P1j>), where <P11, ..., P1j> is the forest of the subtrees of p1. The return value of top-down(T, <P11, ..., P1j>) is used as the return value of top-down(T, G).

[Figure: T rooted at t with |T| < |P1|, and G under pv, where p1 has the subtrees P11, ..., P1i, ..., P1j.]

Page 12: A New Top-down Algorithm for Tree Inclusion

Function: top-down(T, G)

Case 1:

iii) If |T| ≥ |P1| (but |T| ≤ |P1| + |P2|) and height(t) ≥ height(p1), two sub-cases need to be considered:

• label(t) = label(p1). Call bottom-up(<T1, ..., Tk>, <P11, ..., P1j>).
• label(t) ≠ label(p1). Call bottom-up(<T1, ..., Tk>, <P1>).

[Figure: for each sub-case, t with subtrees T1, ..., Ti, ..., Tk next to p1 with subtrees P11, ..., P1i, ..., P1j.]

Page 13: A New Top-down Algorithm for Tree Inclusion

Function: top-down(T, G)

Case 1:

In both sub-cases, assume that the return value is <i, v>. A further check is needed:

• If label(t) = label(v) and i = the outdegree of v, the return value is <1, v’s parent>.
• Otherwise, the return value is <i, v>.

[Figure: T rooted at t and P1 rooted at p1, with a node v in P1; either label(t) = label(v) or label(t) ≠ label(v).]
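The further check reads naturally as a small adjustment function: if the children of t cover all subtrees of v and t itself carries v's label, then T contains the whole tree P[v], which is the first subtree of v's parent. The sketch below is illustrative only; `PNode` and `adjust` are hypothetical names, and the guard for a node without a parent (the virtual root pv) is an assumption added so that the function is total.

```python
class PNode:
    """Pattern node with a parent pointer (hypothetical helper type)."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def adjust(t_label, i, v):
    """Apply the further check to a pair <i, v> obtained for the children of t."""
    covers_all_children = i == len(v.children)   # i equals the outdegree of v
    if v.parent is not None and v.label == t_label and covers_all_children:
        return 1, v.parent                       # T contains all of P[v]
    return i, v


# Hypothetical pattern: pv -> p1 = a(b c).
p1 = PNode("a", [PNode("b"), PNode("c")])
pv = PNode(None, [p1])

promoted = adjust("a", 2, p1)    # both subtrees of p1 found below a node labeled a
unchanged = adjust("a", 1, p1)   # only the first subtree found
print(promoted[0], promoted[1] is pv, unchanged[0], unchanged[1] is p1)
```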

Page 14: A New Top-down Algorithm for Tree Inclusion

Function: top-down(T, G)

Case 2: G = <P1, ..., Pl> (l > 1), and |T| > |P1| + |P2|. In this case, we call bottom-up(<T1, ..., Tk>, G). Assume that the return value is <i, v>. Further checks are then carried out on this value.

[Figure: G = <P1, P2, ..., Pl> under pv and T rooted at t with subtrees T1, T2, ..., Tk, where |T| > |P1| + |P2|.]
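The conditions that separate Case 1 (i), (ii), (iii) and Case 2 depend only on sizes, heights and whether t is a leaf, so the dispatch can be written down directly. This sketch covers the case selection only, not the recursive calls; the `Tree` class and the `classify` function are hypothetical, and height is counted in edges.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tree:
    label: str
    children: List["Tree"] = field(default_factory=list)


def size(t: Tree) -> int:
    """|T|: the number of nodes of the tree."""
    return 1 + sum(size(c) for c in t.children)


def height(t: Tree) -> int:
    """Height in edges (assumption: a leaf has height 0)."""
    return 0 if not t.children else 1 + max(height(c) for c in t.children)


def classify(T: Tree, G: List[Tree]) -> str:
    """Name the case of top-down(T, G) that applies to a tree T and a forest G."""
    p1 = G[0]
    if len(G) > 1 and size(T) > size(p1) + size(G[1]):
        return "case 2"
    if not T.children:
        return "case 1 (i)"
    if size(T) < size(p1) or height(T) < height(p1):
        return "case 1 (ii)"
    return "case 1 (iii)"


# Hypothetical inputs.
leaf = Tree("a")
small = Tree("a", [Tree("b"), Tree("c")])
wide = Tree("a", [Tree("b"), Tree("c"), Tree("d"), Tree("e")])
deep = Tree("x", [Tree("y", [Tree("z")])])

print(classify(leaf, [Tree("a")]))
print(classify(small, [deep]))
print(classify(small, [Tree("a", [Tree("b")])]))
print(classify(wide, [Tree("b"), Tree("c")]))
```

Testing Case 2 first is safe here: a leaf t has |T| = 1, which can never exceed |P1| + |P2|.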

Page 15: A New Top-down Algorithm for Tree Inclusion

Function: top-down(T, G)

Case 2 (continued): let <i, v> be the return value of bottom-up(<T1, ..., Tk>, G).

iv) If v = p1’s parent, the return value is <i, v>.
v) If v ≠ p1’s parent, check whether label(t) = label(v) and i = the outdegree of v. If so, the return value is changed to <1, v’s parent>. Otherwise, the return value remains <i, v>.

[Figure: G = <P1, P2, ..., Pl> under pv in two situations: v = p1’s parent = pv, and v ≠ p1’s parent, where v lies inside P1.]

Page 16: A New Top-down Algorithm for Tree Inclusion

Function: bottom-up(T’, G)

bottom-up(T’, G) is designed to handle the case that both T’ and G are forests. Let T’ = <T1, ..., Tk> and G = <P1, ..., Pq>. In bottom-up(T’, G), we make a series of calls top-down(Tl, <Pjl, ..., Pq>), where l = 1, ..., k, j1 = 0, and j1 ≤ j2 ≤ ... ≤ jh ≤ q (for some h ≤ k), controlled as follows.

[Figure: the forest T’ = <T1, T2, ..., Ti, ..., Tk> next to the forest G = <P1, ..., Pi, ..., Pq>, linked by a call top-down(Tl, <Pjl, ..., Pq>).]

Page 17: A New Top-down Algorithm for Tree Inclusion

Function: bottom-up(T’, G)

1. Two index variables l and j are used to scan T1, ..., Tk and P1, ..., Pq, respectively.

2. Let <il, vl> be the return value of top-down(Tl, <Pj, ..., Pq>). If vl = pj’s parent, set j to j + il - 1. Otherwise, j is not changed. Set l to l + 1. Go to (2).

3. The loop terminates when all Tl’s or all Pj’s have been examined.

Page 18: A New Top-down Algorithm for Tree Inclusion

Function: bottom-up(T’, G)

• If j > 0 when the loop terminates, bottom-up(T’, G) returns <j, p1’s parent>.

[Figure: the forests T’ = <T1, T2, ..., Ti, ..., Tk> and G = <P1, ..., Pi, ..., Pq>, with the scan position Pj marked in G.]
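The scanning loop of bottom-up can be sketched with top-down passed in as a parameter. This is not the author's code: it assumes a 0-based counter `j` that holds the number of pattern trees already covered, so the update is written as `j += i` rather than with the slide's own index arithmetic, and it returns `None` to signal the j = 0 situation. `PV`, `scan` and the stub `leaf_top_down` (which only handles single-node trees given as label strings) are hypothetical.

```python
PV = object()   # stands in for the virtual root pv, i.e. p1's parent


def scan(t_forest, g_forest, top_down, pv=PV):
    """Scan T1..Tk and P1..Pq in step, as in steps 1-3 of bottom-up."""
    l, j = 0, 0
    while l < len(t_forest) and j < len(g_forest):
        i_l, v_l = top_down(t_forest[l], g_forest[j:])
        if v_l is pv:
            j += i_l            # T_l covers i_l further trees of G
        l += 1                  # move on to the next tree of T'
    return (j, pv) if j > 0 else None


def leaf_top_down(tree_label, pattern_labels):
    """Stub for single-node trees: a tree covers P_j iff the labels agree."""
    return (1, PV) if tree_label == pattern_labels[0] else (0, PV)


# Hypothetical forests of single-node trees, written as label strings.
print(scan(["a", "x", "b", "c"], ["a", "b"], leaf_top_down))
print(scan(["x"], ["a"], leaf_top_down))
```

The loop stops as soon as either forest is exhausted, which mirrors the termination condition in step 3.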

Page 19: A New Top-down Algorithm for Tree Inclusion

Function: bottom-up(T’, G)

• Otherwise, j = 0. In this case, we continue searching for a pair <i, v> such that T’ contains the first i subtrees of v, where v ∈ δ⁻¹(v’) and v’ is the leaf node on the left-most path in P1, as described below.

i) Let <i1, v1>, <i2, v2>, ..., <ik, vk> be the respective return values of

top-down(T1, <P1, ..., Pq>),
top-down(T2, <P1, ..., Pq>),
... ...
top-down(Tk, <P1, ..., Pq>).

Since j = 0, each vl ∈ δ⁻¹(v’) (l = 1, ..., k).

[Figure: the nodes v1, v2, ..., vk on the left-most path of P1.]

Page 20: A New Top-down Algorithm for Tree Inclusion

Function: bottom-up(T’, G)

ii) If each il = 0, return a pair whose first component is 0 and whose second component is a special node that is considered to be a descendant of any node in G. Otherwise, find the first vg with children w1, ..., wh such that vg is not a descendant of any other vj, and ig > 0. Call

bottom-up(<Tg+1, ..., Tk>, <P[wig+1], ..., P[wh]>),

that is, match the remaining trees of T’ against the subtrees of vg that follow the first ig ones.

• Let <x, y> be its return value. If y = vg, then the return value of bottom-up(T’, G) is set to <ig + x, vg>.
• Otherwise, the return value is <ig, vg>.

[Figure: the forest T’ = <T1, T2, ..., Tg, Tg+1, ..., Tk> and the tree P1 with v1, ..., vg, ..., vk on its left-most path; the first ig subtrees of vg are marked.]
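The choice of vg can be sketched as a simple selection over the return values. The code below is an illustration under one reading of the rule: among the results with ig > 0, take the first whose node is not a proper descendant of the node of any other result with a positive count. `PNode`, `is_proper_descendant` and `pick_g` are hypothetical names.

```python
class PNode:
    """Pattern node with a parent pointer (hypothetical helper type)."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def is_proper_descendant(u, v):
    """True if v lies strictly above u on the path from u to the root."""
    node = u.parent
    while node is not None:
        if node is v:
            return True
        node = node.parent
    return False


def pick_g(results):
    """Return the index g of the chosen pair <ig, vg>, or None if every il is 0."""
    for g, (i_g, v_g) in enumerate(results):
        if i_g == 0:
            continue
        dominated = any(
            i_o > 0 and is_proper_descendant(v_g, v_o)
            for o, (i_o, v_o) in enumerate(results)
            if o != g
        )
        if not dominated:
            return g
    return None


# Hypothetical left-most path p1 -> u -> w.
w = PNode("w")
u = PNode("u", [w])
p1 = PNode("p1", [u])

print(pick_g([(0, w), (1, u), (2, p1), (1, p1)]))
print(pick_g([(0, w), (0, u)]))
```

Since every vl lies in δ⁻¹(v’), all candidates sit on one root-to-leaf chain, so the selection amounts to taking the first result whose node is highest on that chain.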

Page 21: A New Top-down Algorithm for Tree Inclusion

Further improvements

In the case j = 0:

Let <i1, v1>, ..., <ik, vk> be the return values of top-down(T1, <P1, ..., Pq>), ..., top-down(Tk, <P1, ..., Pq>). We find the first vg such that it is not a descendant of any other vj and ig > 0. Then,

bottom-up(<Tg+1, ..., Tk>, <P[wig+1], ..., P[wh]>)

is invoked. This shows that none of the return values except <ig, vg> is used in the subsequent computation. Thus, the work of computing those other values should be avoided.

[Figure: the forest T’ = <T1, T2, ..., Tg, Tg+1, ..., Tk> and the tree P1 with v1, ..., vg, ..., vk on its left-most path.]

Page 22: A New Top-down Algorithm for Tree Inclusion

Further improvements

Let <ij, vj> be the return value of top-down(Tj, <P1, ..., Pq>) such that ij > 0 and vj is p1 or a descendant of p1. Then, during the execution of top-down(Tj+1, <P1, ..., Pq>), once we have detected that it can only produce a return value <ij+1, vj+1> with vj+1 being a descendant of vj, we should stop the corresponding computation immediately, since this return value will not be used in the subsequent search. For this purpose, we extend top-down(Tj+1, <P1, ..., Pq>) to top-down(Tj+1, <P1, ..., Pq>, vj), where vj is used to transfer information and is called a controlling node.

Assume that in the execution of top-down(Tj+1, <P1, ..., Pq>, vj), we have the following function calls:

top-down(Tj+1,1, <P1, ..., Pq>, u1) returns <a1, u1>,
top-down(Tj+1,2, <P1, ..., Pq>, u2) returns <a1, u2>,
... ...

with every ui being a proper descendant of vj. Then the bottom-up function call with some ui as a controlling node,

bottom-up(<Tj+1,i, ...>, <... ...>, ui),

should not be conducted.

Page 23: A New Top-down Algorithm for Tree Inclusion

Summary

• An efficient method for the tree inclusion problem
  - O(|T| · min{DP, |leaves(P)|}) time and
  - O(|T| + |P|) space,
  where DP is the height of P and leaves(P) is the set of the leaf nodes of P.

• Future work
  - adapt the algorithm to a data stream environment
  - adapt the algorithm to an indexing environment
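To get a feel for the time bound, the factor min{DP, |leaves(P)|} can be computed for a concrete pattern. The sketch below assumes patterns given as nested `(label, children)` tuples and a height counted in edges; both conventions, and the function names, are assumptions made for illustration.

```python
def height(tree):
    """DP for a pattern given as (label, children); a leaf has height 0."""
    _, children = tree
    return 0 if not children else 1 + max(height(c) for c in children)


def leaf_count(tree):
    """|leaves(P)|: the number of leaf nodes of the pattern."""
    _, children = tree
    return 1 if not children else sum(leaf_count(c) for c in children)


def bound_factor(pattern):
    """The factor min{DP, |leaves(P)|} that multiplies |T| in the time bound."""
    return min(height(pattern), leaf_count(pattern))


# Hypothetical patterns: a bushy tree a(b(c) d) and a chain a(b(c(d))).
bushy = ("a", [("b", [("c", [])]), ("d", [])])
chain = ("a", [("b", [("c", [("d", [])])])])
print(bound_factor(bushy), bound_factor(chain))
```

A chain-shaped pattern has a single leaf, so the factor stays at 1 however deep the pattern is; a shallow, wide pattern is bounded by its height instead.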

Page 24: A New Top-down Algorithm for Tree Inclusion

Thank you.