a new top-down algorithm for tree inclusion

Post on 31-Dec-2015


A New Top-down Algorithm for Tree Inclusion

Dr. Yangjun Chen

Dept. Applied Computer Science,

University of Winnipeg

515 Portage Ave.

Winnipeg, Manitoba, Canada R3B 2E9

Outline

- Motivation
- Basic algorithm for tree inclusion problem
  - Definition
  - Algorithm description
- Improvements
- Summary

Motivation

Given two ordered labeled trees P and T, called the pattern and the target, respectively, an interesting problem is: can we obtain pattern P by deleting some nodes from target T? That is, is there a sequence v1, ..., vk of nodes such that for

T0 = T and Ti+1 = delete(Ti, vi+1) for i = 0, ..., k - 1,

we have Tk = P? If this is the case, we say that P is included in T, that T contains P, or that T covers P.

[Figure: a tree T and the tree obtained from it by delete(T, c).]
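The delete operation can be illustrated with a minimal sketch. It assumes the usual meaning of deleting a node in an ordered tree: the node disappears and its children take its place, in order, among the children of its parent. The names `Node`, `delete`, and `show`, as well as the example tree, are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def delete(root: Node, target: Node) -> Node:
    """Return a copy of the tree rooted at `root` in which the non-root node
    `target` is removed and replaced, in order, by its own children."""
    new_children: List[Node] = []
    for child in root.children:
        reduced = delete(child, target)
        if child is target:
            new_children.extend(reduced.children)  # splice the grandchildren in
        else:
            new_children.append(reduced)
    return Node(root.label, new_children)


def show(n: Node) -> str:
    """Render a tree as label(child child ...)."""
    if not n.children:
        return n.label
    return f"{n.label}({' '.join(show(c) for c in n.children)})"


# Assumed example: T = a(c(b e) d); deleting c lifts b and e up to a.
c = Node("c", [Node("b"), Node("e")])
t = Node("a", [c, Node("d")])
result = delete(t, c)
```

Here `show(t)` gives `a(c(b e) d)` and `show(result)` gives `a(b e d)`; the original tree is left unchanged because the function builds a copy.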

[Figure: Linguistic analysis. A pattern tree (s over vp, with children v, n, adv and leaves "reads" and "book") and a target parse tree of the sentence "The student reads the interesting book again and again".]

Tree inclusion algorithm

Definition

Definition 1. Let F and G be labeled ordered forests. We define an ordered embedding (φ, G, F) as an injective function φ: V(G) → V(F) such that for all nodes v, u ∈ V(G),

i) label(v) = label(φ(v)); (label preservation condition)
ii) v is an ancestor of u iff φ(v) is an ancestor of φ(u); (ancestor condition)
iii) v is to the left of u iff φ(v) is to the left of φ(u). (sibling condition)

[Figure: example forests G and F illustrating an ordered embedding of G in F.]
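For small inputs, the existence of an ordered embedding can be checked with a naive reference recursion. The sketch below is not the top-down algorithm of these slides; it is a plain memoized recursion, useful only as a baseline to compare against. It rests on one observation: the root of the first tree of F is either not an image (so it can be deleted) or it is the image of the root of the first tree of G. Trees are assumed to be nested tuples `(label, children)`, and the names `included` and `leaf` are hypothetical.

```python
from functools import lru_cache
from typing import Tuple

Tree = Tuple[str, tuple]      # (label, tuple of child trees)
Forest = Tuple[Tree, ...]


def leaf(label: str) -> Tree:
    return (label, ())


@lru_cache(maxsize=None)
def included(g: Forest, f: Forest) -> bool:
    """True if forest g has an ordered embedding into forest f."""
    if not g:
        return True
    if not f:
        return False
    (g_label, g_kids), g_rest = g[0], g[1:]
    (f_label, f_kids), f_rest = f[0], f[1:]
    # Option 1: the root of f's first tree is not an image; delete it.
    if included(g, f_kids + f_rest):
        return True
    # Option 2: map the root of g's first tree to the root of f's first tree.
    return (g_label == f_label
            and included(g_kids, f_kids)
            and included(g_rest, f_rest))


# Assumed example: target a(c(b d) e).
target = ("a", (("c", (leaf("b"), leaf("d"))), leaf("e")))
pattern_ok = ("a", (leaf("b"), leaf("d"), leaf("e")))   # obtained by deleting c
pattern_bad = ("a", (leaf("d"), leaf("b")))             # left-to-right order violated
```

With these inputs, `included((pattern_ok,), (target,))` is true and `included((pattern_bad,), (target,))` is false, since condition iii) forbids swapping b and d.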

Algorithm

1. Let T = <t; T1, ..., Tk> (k ≥ 1) be a tree and G = <P1, ..., Pl> (l ≥ 1) be a forest. We handle G as a tree P = <pv; P1, ..., Pl>, where pv represents a virtual node, matching any node in T.

2. Consider a node v in P with children v1, ..., vj. We use a pair <i, v> (i ≤ j) to represent an ordered forest containing the first i subtrees of v: <P[v1], ..., P[vi]>. Then, <j, pv> represents the first j trees in G.

[Figure: a node v in P with children v1, ..., vi, ..., vk; the pair <i, v> covers the subtrees rooted at v1, ..., vi.]

3. In addition, h(v) represents the height of v in a tree, and δ(v) represents a link from v in P to the leaf node on the left-most path in P[v].

Let v’ be a leaf node in P. We denote by δ⁻¹(v’) the set of nodes x such that for each v ∈ x, δ(v) = v’.

Example: δ⁻¹(v3) = {v1, v2, v3}.

[Figure: a pattern tree P with nodes v1, ..., v5, in which the links δ(v1) and δ(v2) both lead to the leaf v3.]
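The links to the left-most leaf and their inverse sets can be computed in one traversal. The following is a minimal sketch under assumptions: node names are unique, the class `Node` and the function `leftmost_leaf_links` are hypothetical, and the example tree shape (v1 with children v2 and v5, v2 with children v3 and v4) is only a guess that agrees with the set {v1, v2, v3} given above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class Node:
    def __init__(self, name: str, children=()):
        self.name = name
        self.children = list(children)


def leftmost_leaf_links(root: Node) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Return (delta, inverse): delta[v] is the leaf on the left-most path
    of the subtree rooted at v; inverse[leaf] lists all v with delta[v] == leaf."""
    delta: Dict[str, str] = {}

    def visit(v: Node) -> None:
        for child in v.children:
            visit(child)
        # A leaf links to itself; an inner node inherits from its first child.
        delta[v.name] = delta[v.children[0].name] if v.children else v.name

    visit(root)
    inverse: Dict[str, List[str]] = defaultdict(list)
    for name, leaf_name in delta.items():
        inverse[leaf_name].append(name)
    return delta, dict(inverse)


# Assumed tree shape: v1(v2(v3 v4) v5).
p = Node("v1", [Node("v2", [Node("v3"), Node("v4")]), Node("v5")])
delta, inverse = leftmost_leaf_links(p)
```

For this tree, `sorted(inverse["v3"])` is `['v1', 'v2', 'v3']`, while v4 and v5 each link only to themselves.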

Tree inclusion is checked by two mutually recursive functions:

- top-down(T, G),
- bottom-up(T’, G),

where T = <t; T1, ..., Tk> is a tree, and T’ = <T1’, ..., Tk’> and G = <P1, ..., Pl> are two forests.

Each of the two functions returns a pair <i, v>, with v being pv or a node on the left-most path in P1.

Function: top-down(T, G)

In top-down(T, G), two cases are handled.

Case 1: G = <P1>; or G = <P1, ..., Pl> (l > 1), but |T| ≤ |P1| + |P2|.

In this case, we try to find a pair <i, v> such that T contains the first i subtrees of v, where v = pv, or v ∈ δ⁻¹(v’) and v’ is the leaf node on the left-most path in P1.

[Figure: Case 1. The tree T with root t is compared either with a single pattern tree P1 (root p1) under pv, or with a forest P1, P2, ..., Pl under pv where |T| ≤ |P1| + |P2|.]

i) If t is a leaf node, we check whether label(t) = label(δ(p1)), where p1 is the root of P1. If so, return <1, parent of δ(p1)>. Otherwise, return <0, parent of δ(p1)>.

ii) If |T| < |P1| or height(t) < height(p1), we make a recursive call top-down(T, <P11, ..., P1j>), where <P11, ..., P1j> is the forest of the subtrees of p1. The return value of top-down(T, <P11, ..., P1j>) is used as the return value of top-down(T, G).

[Figure: Case 1 with |T| < |P1|. T (root t) is compared with the subtrees P11, ..., P1j of p1.]

iii) If |T| ≥ |P1| (but |T| ≤ |P1| + |P2|) and height(t) ≥ height(p1), two sub-cases need to be considered:

• label(t) = label(p1). Call bottom-up(<T1, ..., Tk>, <P11, ..., P1j>).

• label(t) ≠ label(p1). Call bottom-up(<T1, ..., Tk>, <P1>).

[Figure: the two sub-cases. With label(t) = label(p1), the subtrees T1, ..., Tk of t are compared with the subtrees P11, ..., P1j of p1; with label(t) ≠ label(p1), they are compared with the whole of P1.]

In both sub-cases, assume that the return value is <i, v>. A further check is needed:

• If label(t) = label(v) and i = the outdegree of v, the return value should be <1, v’s parent>.

• Otherwise, the return value is <i, v> unchanged.

[Figure: T with root t and P1 with root p1, where v is a node in P1; either label(t) = label(v) or label(t) ≠ label(v).]
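This final check can be written as a small helper. The sketch below is only an illustration: the class `PNode`, the function `lift`, and the example labels are hypothetical, and the guard that leaves a parentless node (such as the virtual root pv) unchanged is an added assumption rather than something stated on the slide.

```python
from typing import Tuple


class PNode:
    """Pattern node with a parent pointer (None for the virtual root)."""

    def __init__(self, label: str, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def lift(t_label: str, i: int, v: PNode) -> Tuple[int, PNode]:
    """If t matches v and all subtrees of v are covered (i equals the
    outdegree of v), report one covered subtree of v's parent instead."""
    if v.parent is not None and t_label == v.label and i == len(v.children):
        return 1, v.parent
    return i, v


# Assumed example: pv -> b -> (c, d).
v = PNode("b", [PNode("c"), PNode("d")])
pv = PNode("pv", [v])

full = lift("b", 2, v)      # both subtrees of v covered and labels agree
partial = lift("b", 1, v)   # only the first subtree covered
other = lift("x", 2, v)     # labels differ
```

Only `full` is promoted to the pair (1, pv); `partial` and `other` keep the pair returned by the recursive call.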

Case 2: G = <P1, ..., Pl> (l > 1), and |T| > |P1| + |P2|. In this case, we call bottom-up(<T1, ..., Tk>, G). Assume that the return value is <i, v>. The following checks are then carried out.

[Figure: Case 2. The forest G = P1, P2, ..., Pl under pv and the tree T with root t and subtrees T1, T2, ..., Tk, where |T| > |P1| + |P2|.]

iv) If v = p1’s parent, the return value is <i, v> unchanged.

v) If v ≠ p1’s parent, check whether label(t) = label(v) and i = the outdegree of v. If so, the return value is changed to <1, v’s parent>. Otherwise, the return value remains <i, v>.

[Figure: the two outcomes of Case 2: v = p1’s parent = pv, or v ≠ p1’s parent, with v a node inside P1.]
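The case analysis of top-down(T, G) depends only on sizes, heights, and whether t is a leaf, so the dispatch can be sketched separately from the recursive calls. The code below only classifies an input; it does not perform the inclusion check. The names `Node`, `size`, `height`, and `top_down_case` are hypothetical, and the height convention (a leaf has height 0) is an assumption that does not affect the comparisons.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


def size(n: Node) -> int:
    return 1 + sum(size(c) for c in n.children)


def height(n: Node) -> int:
    return max((height(c) + 1 for c in n.children), default=0)


def top_down_case(t: Node, g: List[Node]) -> str:
    """Name the case of top-down(T, G) that applies to tree t and forest g."""
    p1 = g[0]
    if len(g) > 1 and size(t) > size(p1) + size(g[1]):
        return "case 2"            # call bottom-up on the subtrees of t and G
    if not t.children:
        return "case 1 (i)"        # t is a leaf
    if size(t) < size(p1) or height(t) < height(p1):
        return "case 1 (ii)"       # recurse into the subtrees of p1
    return "case 1 (iii)"          # compare label(t) with label(p1)


def chain(*labels: str) -> Node:
    """Build a path-shaped tree, e.g. chain('x', 'y') is x(y)."""
    node = Node(labels[-1])
    for label in reversed(labels[:-1]):
        node = Node(label, [node])
    return node


wide = Node("a", [Node("b"), Node("c"), Node("d"), Node("e")])
```

For example, `top_down_case(wide, [Node("p"), Node("q")])` falls into Case 2 because |T| = 5 exceeds |P1| + |P2| = 2, whereas `top_down_case(Node("a", [Node("b"), Node("c")]), [chain("x", "y", "z", "w")])` falls into Case 1 (ii) because |T| = 3 is smaller than |P1| = 4.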

Function: bottom-up(T’, G)

bottom-up(T’, G) is designed to handle the case that both T’ and G are forests. Let T’ = <T1, ..., Tk> and G = <P1, ..., Pq>. In bottom-up(T’, G), we make a series of calls top-down(Tl, <Pjl, ..., Pq>), where l = 1, ..., k, j1 = 0, and j1 ≤ j2 ≤ ... ≤ jh ≤ q (for some h ≤ k), controlled as follows.

[Figure: the forests T’ = T1, T2, ..., Ti, ..., Tk and G = P1, ..., Pi, ..., Pq, with a call top-down(Tl, <Pjl, ..., Pq>).]

1. Two index variables l and j are used to scan T1, ..., Tk and P1, ..., Pq, respectively.

2. Let <il, vl> be the return value of top-down(Tl, <Pj, ..., Pq>). If vl = pj’s parent, set j to be j + il - 1. Otherwise, j is not changed. Set l to be l + 1 and repeat this step.

3. The loop terminates when all Tl’s or all Pj’s are examined.
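The scanning loop can be sketched with top-down passed in as a callback. This is a simplified illustration under stated assumptions, not the exact procedure: here j counts how many leading pattern trees have been covered so far (starting at 0 and growing by il), which differs from the index convention used in step 2 above. The names `scan`, `PV`, and `toy_top_down` are hypothetical, and the toy callback treats every tree as a single labeled node.

```python
from typing import Any, Callable, Sequence, Tuple

PV = object()  # stand-in for the virtual parent of P1, ..., Pq


def scan(targets: Sequence[Any],
         patterns: Sequence[Any],
         top_down: Callable[[Any, Sequence[Any]], Tuple[int, Any]]) -> int:
    """Scan the target trees left to right and return j, the number of
    leading pattern trees covered when the loop terminates."""
    l, j = 0, 0
    while l < len(targets) and j < len(patterns):
        i_l, v_l = top_down(targets[l], patterns[j:])
        if v_l is PV:      # T_l covers the first i_l trees of the remaining forest
            j += i_l
        l += 1
    return j


def toy_top_down(t: str, forest: Sequence[str]) -> Tuple[int, Any]:
    """Toy callback for single-node trees: t covers the first remaining
    pattern tree exactly when the labels agree."""
    return (1, PV) if t == forest[0] else (0, None)


covered_all = scan(["x", "a", "b", "y", "c"], ["a", "b", "c"], toy_top_down)
covered_some = scan(["a", "c"], ["a", "b", "c"], toy_top_down)
```

With the toy callback, `covered_all` is 3 and `covered_some` is 1. A result j > 0 corresponds to the return value <j, p1’s parent>, and j = 0 corresponds to the remaining case in which the search continues inside P1.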


• If j > 0 when the loop terminates, bottom-up(T’, G) returns <j, p1’s parent>.

[Figure: the target trees T1, T2, ..., Ti, ..., Tk cover the pattern trees P1, ..., Pj of G = P1, ..., Pq.]

• Otherwise, j = 0. In this case, we continue searching for a pair <i, v> such that T’ contains the first i subtrees of v, where v ∈ δ⁻¹(v’) and v’ is the leaf node on the left-most path in P1, as described below.

i) Let <i1, v1>, <i2, v2>, ..., <ik, vk> be the respective return values of top-down(T1, <P1, ..., Pq>), top-down(T2, <P1, ..., Pq>), ..., top-down(Tk, <P1, ..., Pq>). Since j = 0, each vl ∈ δ⁻¹(v’) (l = 1, ..., k).

[Figure: the nodes v1, v2, ..., vk lie on the left-most path of P1.]

ii) If each il = 0, return a pair whose first component is 0 and whose second component is a special value considered to be a descendant of any node in G. Otherwise, find the first vg with children w1, ..., wh such that vg is not a descendant of any other vj, and ig > 0. Call

bottom-up(<Tg+1, ..., Tk>, <P[wig+1], ..., P[wh]>).

• Let <x, y> be the return value of the call bottom-up(<Tg+1, ..., Tk>, <P[wig+1], ..., P[wh]>). If y = vg, then the return value of bottom-up(T’, G) is set to be <ig + x, vg>.

• Otherwise, the return value is <ig, vg>.

[Figure: T1, ..., Tg are matched against the first ig subtrees of vg, a node on the left-most path of P1; Tg+1, ..., Tk are matched against the remaining subtrees of vg.]

Further improvements

In the case j = 0:

Let <i1, v1>, ..., <ik, vk> be the return values of top-down(T1, <P1, ..., Pq>), ..., top-down(Tk, <P1, ..., Pq>). We find the first vg such that it is not a descendant of any other vj and ig > 0. Then,

bottom-up(<Tg+1, ..., Tk>, <P[wig+1], ..., P[wh]>)

is invoked. This shows that no return value other than <ig, vg> is used in the subsequent computation. Thus, the work spent on computing the other values should be avoided.

[Figure: T1, ..., Tg, Tg+1, ..., Tk and the nodes v1, ..., vg, ..., vk on the left-most path of P1.]

Let <ij, vj> be the return value of top-down(Tj, <P1, ..., Pq>) such that ij > 0 and vj is p1 or a descendant of p1. Then, during the execution of top-down(Tj+1, <P1, ..., Pq>), once we have detected that it can only produce a return value <ij+1, vj+1> with vj+1 being a descendant of vj, we should stop the corresponding computation immediately, since this return value will not be used in the subsequent search. For this purpose, we extend top-down(Tj+1, <P1, ..., Pq>) to top-down(Tj+1, <P1, ..., Pq>, vj), where vj is used to transfer information and is called a controlling node.

Assume that in the execution of top-down(Tj+1, <P1, ..., Pq>, vj), we have the following function calls:

top-down(Tj+1,1, <P1, ..., Pq>, u1) returns <a1, u1>,
top-down(Tj+1,2, <P1, ..., Pq>, u2) returns <a2, u2>,
… …

with all ui’s being proper descendants of vj. Then the bottom-up function call with some ui as a controlling node should not be conducted:

bottom-up(<Tj+1,i, ...>, <… …>, ui).

Summary

• An efficient method for the tree inclusion problem
- O(|T| · min{DP, |leaves(P)|}) time and
- O(|T| + |P|) space
where DP is the height of P, and leaves(P) is the set of the leaf nodes of P.

• Future work
- adapt the algorithm to a data stream environment
- adapt the algorithm to an indexing environment
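To get a feeling for the factor min{DP, |leaves(P)|} in the time bound, it can be computed for concrete patterns. The sketch below assumes trees are nested tuples `(label, children)` and that the height of a single node is 1; the slides do not state the height convention, and the names `height`, `leaf_count`, and `bound_factor` are hypothetical.

```python
from typing import Tuple

Tree = Tuple[str, tuple]  # (label, tuple of child trees)


def height(p: Tree) -> int:
    """Number of nodes on the longest root-to-leaf path (assumed convention)."""
    _, kids = p
    return 1 + max((height(k) for k in kids), default=0)


def leaf_count(p: Tree) -> int:
    _, kids = p
    return 1 if not kids else sum(leaf_count(k) for k in kids)


def bound_factor(p: Tree) -> int:
    """The factor min{D_P, |leaves(P)|} that multiplies |T| in the time bound."""
    return min(height(p), leaf_count(p))


bushy = ("a", (("b", (("c", ()),)), ("d", ())))        # a(b(c) d)
path = ("a", (("b", (("c", (("d", ()),)),)),))         # a(b(c(d)))
```

For `bushy` the height is 3 and there are 2 leaves, so the factor is 2; for `path` the height is 4 but there is a single leaf, so the factor is 1 and the bound is linear in |T|.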

Thank you.
