
Post on 21-Dec-2015


Recursive Graph Deduction and Reachability Queries

Yangjun Chen

Dept. Applied Computer Science,

University of Winnipeg

515 Portage Ave.

Winnipeg, Manitoba, Canada R3B 2E9


Outline

• Motivation
• Graph deduction
  - Basic definitions
  - Critical nodes and critical subgraphs
  - Evaluation of reachability queries
• Recursive graph deduction (RGD)
  - Recursive deduction
  - Evaluation of reachability queries based on RGD
• Conclusion


Motivation

Goal: an efficient method to evaluate graph reachability queries. Given a directed acyclic graph (DAG) G, check whether a node v is reachable from another node u through a path in G.

Applications: XML data processing, gene-regulatory networks, and metabolic networks. XML documents are usually represented as trees. However, an XML document may contain IDREF/ID references that turn it into a directed but sparse graph: a tree structure plus a few reference links. In a gene-regulatory or metabolic network, reachability models whether two genes interact with each other or whether two proteins participate in a common pathway. Many such graphs are sparse.


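Motivation

A simple method: store the transitive closure as a matrix.

[Figure: an example DAG G over the nodes a, b, c, d, e, and its transitive closure G*.]

M (matrix of G):

     a  b  c  d  e
  a  0  0  0  0  0
  b  1  0  0  0  0
  c  1  0  1  0  0
  d  0  0  1  0  0
  e  1  0  0  0  0

M* (matrix of G*):

     a  b  c  d  e
  a  0  0  0  0  0
  b  1  0  0  0  0
  c  1  0  1  0  0
  d  1  0  1  0  0
  e  1  0  0  0  0

Space: O(n^2). Query time: O(1).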
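The matrix approach can be sketched in a few lines. The following is a minimal illustration using Warshall's algorithm on a hypothetical five-node DAG; the edge list and the names `transitive_closure` and `reachable` are assumptions made for this example and are not taken from the slides.

```python
from itertools import product

def transitive_closure(nodes, edges):
    """Return reach[u][v] == True iff v is reachable from u by a non-empty path."""
    reach = {u: {v: False for v in nodes} for u in nodes}
    for u, v in edges:
        reach[u][v] = True
    # Warshall's algorithm: k is the outermost loop variable, so paths are
    # extended by allowing one more intermediate node at a time.
    for k, i, j in product(nodes, repeat=3):
        if reach[i][k] and reach[k][j]:
            reach[i][j] = True
    return reach

# Hypothetical five-node DAG (not necessarily the graph drawn on the slide).
NODES = ["a", "b", "c", "d", "e"]
EDGES = [("a", "b"), ("a", "c"), ("c", "d"), ("a", "e")]
M_STAR = transitive_closure(NODES, EDGES)

def reachable(u, v):
    """Constant-time lookup in the precomputed n x n matrix."""
    return M_STAR[u][v]
```

The lookup is a single dictionary access, but the table holds n^2 entries regardless of how sparse G is, which is the cost the rest of the talk tries to reduce.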


Motivation

Question: Is it possible to reduce the size of M*, but still have a constant query time?


Graph deduction: Basic definitions

[Figure: an example graph G over the nodes a, b, c, d, e, f, g, h, i, j, k, r. Solid arrows mark the spanning tree T.]

Let G be a sparse graph. We first find a spanning tree T of G.

The spanning tree of G is drawn with solid arrows and covers all nodes of G.


Graph deduction: Edge classification

• tree edges (Etree): edges appearing in T.
• cross edges (Ecross): any edge (u, v) such that u and v are not on the same path in T.
• forward edges (Eforward): any edge (u, v) not appearing in T, but for which there exists a path from u to v in T.
• back edges (Eback): any edge (u, v) not appearing in T, but for which there exists a path from v to u in T.

[Figure: the example graph G with its edges classified with respect to T.]


Graph deduction: Tree encoding

• Let G be a DAG. We first find a spanning tree T of G.
• Each node v in T is assigned an interval [start, end), where start is v's preorder number and end - 1 is the largest preorder number among all the nodes in T[v]. So another node u labeled [start', end') is a descendant of v (with respect to T) iff start' ∈ [start, end).

[Figure: the spanning tree T with interval labels: a [0, 12), b [1, 5), c [2, 4), k [3, 4), d [4, 5), r [5, 9), e [6, 9), f [7, 8), g [8, 9), h [9, 12), i [10, 11), j [11, 12).]
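The labeling can be sketched as a single preorder traversal. In this minimal example the tree shape in `CHILDREN` is an assumption: it is reconstructed from the interval labels in the slides, which do not list the tree edges explicitly. The function names are hypothetical.

```python
# Spanning tree T, reconstructed from the interval labels shown on the slides.
CHILDREN = {
    "a": ["b", "r", "h"], "b": ["c", "d"], "c": ["k"],
    "r": ["e"], "e": ["f", "g"], "h": ["i", "j"],
}

def label_tree(children, root):
    """Assign [start, end) to each node: start is the preorder number,
    end - 1 is the largest preorder number in the node's subtree."""
    interval = {}
    counter = 0

    def visit(v):
        nonlocal counter
        start = counter
        counter += 1
        for child in children.get(v, []):
            visit(child)
        interval[v] = (start, counter)

    visit(root)
    return interval

INTERVAL = label_tree(CHILDREN, "a")

def is_descendant(u, v):
    """True iff u lies in T[v], i.e. start(u) falls inside v's interval."""
    start_u, _ = INTERVAL[u]
    start_v, end_v = INTERVAL[v]
    return start_v <= start_u < end_v
```

With this tree the computed labels agree with the intervals in the slides, for example [5, 9) for r and [3, 4) for k.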


Graph deduction: Tree encoding

• Let v and u be two nodes in T, labeled [a, b) and [a', b'), respectively. If a ∈ [a', b'), v is a descendant of u. In this case, we say that [a, b) is subsumed by [a', b').
• In that case we must also have b ≤ b'. Therefore, if v and u are not on the same path in T, we have either a' ≥ b or a ≥ b'.
• In the former case, we say that [a, b) is smaller than [a', b'), denoted [a, b) ≺ [a', b'). In the latter case, [a', b') is smaller than [a, b).

[Figure: the spanning tree T with its interval labels.]
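The subsumption and ordering tests translate directly into interval comparisons, which in turn give a constant-time edge classifier. The sketch below hard-codes the interval labels from the slides; the tree-edge set is reconstructed from those labels, and the non-tree edges used in the usage note are illustrative assumptions.

```python
# Interval labels as given in the slides.
INTERVAL = {
    "a": (0, 12), "b": (1, 5), "c": (2, 4), "k": (3, 4), "d": (4, 5),
    "r": (5, 9), "e": (6, 9), "f": (7, 8), "g": (8, 9),
    "h": (9, 12), "i": (10, 11), "j": (11, 12),
}
# Tree edges of T, reconstructed from the interval labels (an assumption).
TREE_EDGES = {
    ("a", "b"), ("a", "r"), ("a", "h"), ("b", "c"), ("b", "d"), ("c", "k"),
    ("r", "e"), ("e", "f"), ("e", "g"), ("h", "i"), ("h", "j"),
}

def subsumes(outer, inner):
    """[a, b) is subsumed by [a', b') iff a lies in [a', b')."""
    return outer[0] <= inner[0] < outer[1]

def smaller(x, y):
    """[a, b) is smaller than [a', b') iff b <= a'."""
    return x[1] <= y[0]

def classify(u, v):
    """Classify the edge (u, v) with respect to T."""
    if (u, v) in TREE_EDGES:
        return "tree"
    if subsumes(INTERVAL[u], INTERVAL[v]):
        return "forward"   # v is a tree descendant of u
    if subsumes(INTERVAL[v], INTERVAL[u]):
        return "back"      # u is a tree descendant of v
    return "cross"         # u and v are not on the same path in T
```

For instance, `classify("f", "d")` yields `"cross"` because [4, 5) is smaller than [7, 8), while a hypothetical extra edge (a, e) would be classified as `"forward"`. Back edges cannot occur when G is a DAG, so that branch only matters for general directed graphs.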


Graph deduction: Critical nodes and critical subgraph

• We denote by E' the set of all cross edges and by V' the set of all end points of the cross edges. That is, V' = Vstart ∪ Vend, where Vstart contains all the start nodes and Vend all the end nodes of the cross edges.

In the example:

Vstart = {d, f, g, h}
Vend = {c, k, e, d, g}

[Figure: the example graph G with its cross edges.]


Graph deduction: Critical nodes and critical subgraph

Definition 1 (anti-subsuming subset) A subset S ⊆ Vstart is called an anti-subsuming subset iff |S| > 1 and no two nodes in S are related by the ancestor-descendant relationship with respect to T.

Anti-subsuming subsets in the example:

{d, f}, {d, g}, {d, h}, {f, g}, {f, h}, {g, h},
{d, f, g}, {d, f, h}, {d, g, h}, {f, g, h}, {d, f, g, h}

[Figure: the example graph G.]


Graph deduction: Critical nodes and critical subgraph

Definition 2 (critical node) A node v in a spanning tree T of G is critical if v ∈ Vstart or there exists an anti-subsuming subset S = {v1, v2, ..., vk} with k ≥ 2 such that v is the lowest common ancestor of v1, v2, ..., vk. We denote by Vc the set of all critical nodes.

In the example graph, node e is the lowest common ancestor of {f, g}, and node a is the lowest common ancestor of {d, f, g, h}. So e and a are critical nodes. In addition, each v ∈ Vstart is a critical node. So the critical nodes of G with respect to T are

Vc = {d, f, g, h, e, a}.

[Figure: the example graph G.]
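Definition 2 can be checked by brute force on the example. The sketch below relies on one observation that is not stated in the slides: the lowest common ancestor of any anti-subsuming subset is also the lowest common ancestor of some pair in that subset, so enumerating pairs is enough. The parent map is reconstructed from the interval labels and the function names are hypothetical.

```python
from itertools import combinations

# Parent pointers of the spanning tree T (reconstructed from the slides).
PARENT = {
    "b": "a", "r": "a", "h": "a", "c": "b", "d": "b", "k": "c",
    "e": "r", "f": "e", "g": "e", "i": "h", "j": "h",
}
V_START = {"d", "f", "g", "h"}

def ancestors(v):
    """Return the path v, parent(v), ..., root."""
    path = [v]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def lca(u, v):
    """Lowest common ancestor of u and v in T."""
    on_path = set(ancestors(u))
    return next(w for w in ancestors(v) if w in on_path)

def critical_nodes(v_start):
    """Vstart plus the LCA of every pair that is not ancestor-related."""
    critical = set(v_start)
    for u, v in combinations(sorted(v_start), 2):
        w = lca(u, v)
        if w not in (u, v):      # {u, v} is an anti-subsuming pair
            critical.add(w)
    return critical

V_C = critical_nodes(V_START)
```

This quadratic enumeration is only meant to mirror the definition; it is not the bottom-up recognition algorithm presented in the talk.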


Graph deduction: Critical node recognition

Algorithm critical-node-recognition(T)

1. Mark every node in T that belongs to Vstart.

2. Let v be the first marked node encountered during the bottom-up search of T. Create the first node of Gc for v.

3. Let u be the currently encountered node in T. Let u' be the node in T for which a node in Gc was created most recently before u is met. Do (4) or (5), depending on whether u is a marked node or not.

4. If u is a marked node, then do the following.

(a) If u' is not a child (descendant) of u, create a link from u to u', called a left-sibling link and denoted left-sibling(u) = u'.


Graph deduction: Critical node recognition

Algorithm critical-node-recognition(T) (continued)

(b) If u' is a child (descendant) of u, first create a link from u' to u, called a parent link and denoted parent(u') = u. Then go along the left-sibling chain starting from u' until a node u'' is met which is not a child (descendant) of u. For each encountered node w except u'', set parent(w) ← u. Set left-sibling(u) ← u''. Remove left-sibling(w) for each child w of u.

5. If u is a non-marked node, then do the following.

(c) If u' is not a child (descendant) of u, no node is created.

(d) If u' is a child (descendant) of u, go along the left-sibling chain starting from u' until a node u'' is met which is not a child (descendant) of u. If the number of nodes encountered during the chain navigation (not including u'') is more than 1, create a new node in Gc for u and do the same operation as in (4.b). Otherwise, no node is created.


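Graph deduction: Sample trace

[Figure: the two cases for the left-sibling chain of u (u'' is not a child of u; link to the left sibling), followed by six snapshots (a) to (f) of the structure built for the example graph: it starts with d, then adds f and g, then e, then h, and finally a.]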


Graph deduction: Tree deduction

• Let T be a spanning tree of G. Denote by Tr the reduction of T obtained by removing all those nodes v ∉ Vc ∪ Vend. Deleting a node v entails connecting v's parent to each of v's children. So removing a node in this way corresponds to the elimination of a tree edge.

• Example: Tr is obtained by removing the nodes b, r, i, and j one by one. (Note that none of them belongs to Vc ∪ Vend, where Vc = {a, d, e, f, g, h} and Vend = {c, d, e, g, k}.)

[Figure: the example graph G and the reduced tree Tr over the nodes a, c, d, e, f, g, h, k.]


Graph deduction: Critical subgraph

Definition 4 (critical subgraph) Let G(V, E) be a DAG. Let T be a spanning tree of G. The critical subgraph Gc of G with respect to T is the graph with node set V(Tr) and edge set E(Tr) ∪ Ecross.

[Figure: the critical subgraph Gc over the nodes a, c, d, e, f, g, h, k.]

The reachability of any two nodes can be checked by using T or Gc.
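The construction of Tr and Gc can be sketched as follows. Two inputs are assumptions: the parent map is reconstructed from the interval labels, and the cross-edge list `CROSS` is not given explicitly in the slides but is reconstructed so that it agrees with the stated Vstart, Vend, non-tree labels, and chains. The function name `reduce_tree` is hypothetical.

```python
# Parent pointers of T (reconstructed from the interval labels).
PARENT = {
    "b": "a", "r": "a", "h": "a", "c": "b", "d": "b", "k": "c",
    "e": "r", "f": "e", "g": "e", "i": "h", "j": "h",
}
V_C = {"a", "d", "e", "f", "g", "h"}      # critical nodes, from the slides
V_END = {"c", "d", "e", "g", "k"}         # end nodes of cross edges, from the slides
# Cross edges: an assumption, reconstructed to be consistent with the slides.
CROSS = {("f", "d"), ("d", "k"), ("h", "g"), ("g", "c"), ("h", "e")}

def reduce_tree(parent, keep):
    """Edges of Tr: link every kept node to its nearest kept proper ancestor.
    This has the same effect as deleting the other nodes one by one and
    reconnecting each deleted node's parent to its children."""
    edges = set()
    for v in keep:
        p = parent.get(v)
        while p is not None and p not in keep:
            p = parent.get(p)
        if p is not None:
            edges.add((p, v))
    return edges

KEEP = V_C | V_END
TR_EDGES = reduce_tree(PARENT, KEEP)
GC_NODES = KEEP
GC_EDGES = TR_EDGES | CROSS
```

On the example this removes exactly b, r, i, and j, so Gc has 8 of the 12 nodes of G.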


Graph deduction

Example queries: Is f reachable from r? Is d reachable from r?

[Figure: the spanning tree T with its interval labels, next to the critical subgraph Gc.]


Graph deduction: Evaluation of reachability queries

Definition 5 (anchor nodes) Let G be a DAG and T a spanning tree of G. Let v be a node in T. Denote by Cv the set of all critical nodes in T[v]. We associate two anchor nodes with v as follows.

i) A node u ∈ Cv is called an anchor node (of the first kind) of v if u is closest to v. u is denoted v*.
ii) A node w is called an anchor node (of the second kind) of v if it is the lowest ancestor of v (in T) which has an incoming cross edge. w is denoted v**.


Graph deduction: Evaluation of reachability queries

Example. r* = e, because node e is critical and is the critical node closest to r in T[r]. But r** does not exist, since r does not have an ancestor with an incoming cross edge. e* = e** = e; that is, both the first and the second kind of anchor node of e are e itself. Similarly, f** = e.

[Figure: the example graph G.]


Graph deduction: Evaluation of reachability queries

Definition 6 (non-tree labels) Let v be a node in G. The non-tree label of v is a pair <x, y>, where

- x = v* if v* exists. If v* does not exist, let x be the special symbol "-".

- y = v** if v** exists. If v** does not exist, let y be "-".
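The two anchor nodes can be computed directly from their definitions. In this sketch the tree is reconstructed from the interval labels, `None` stands for the symbol "-", and a node counts as its own ancestor, as the example e** = e in the slides suggests. The function names are hypothetical.

```python
from collections import deque

# Spanning tree T (reconstructed from the interval labels in the slides).
CHILDREN = {
    "a": ["b", "r", "h"], "b": ["c", "d"], "c": ["k"],
    "r": ["e"], "e": ["f", "g"], "h": ["i", "j"],
}
PARENT = {c: p for p, cs in CHILDREN.items() for c in cs}
V_C = {"a", "d", "e", "f", "g", "h"}     # critical nodes, from the slides
V_END = {"c", "d", "e", "g", "k"}        # nodes with an incoming cross edge

def first_anchor(v):
    """v*: the critical node in T[v] closest to v (breadth-first search)."""
    queue = deque([v])
    while queue:
        w = queue.popleft()
        if w in V_C:
            return w
        queue.extend(CHILDREN.get(w, []))
    return None

def second_anchor(v):
    """v**: the lowest ancestor of v (v included) with an incoming cross edge."""
    w = v
    while w is not None and w not in V_END:
        w = PARENT.get(w)
    return w

NODES = sorted(set(PARENT) | {"a"})
LABEL = {v: (first_anchor(v), second_anchor(v)) for v in NODES}
```

For example, `LABEL["r"]` evaluates to `("e", None)`, which corresponds to the non-tree label <e, -> of r.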


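Graph deduction: Example

Non-tree labels of the nodes of G:

a <a, ->    b <d, ->    c <-, c>    d <d, d>
e <e, e>    f <f, e>    g <g, g>    h <h, ->
i <-, ->    j <-, ->    k <-, k>    r <e, ->

Query: is d reachable from r?

The tree labels are [5, 9) for r and [4, 5) for d, so d is not a descendant of r in T. The anchor nodes are r* = e and d** = d. d is reachable from e through a path in Gc. So d is reachable from r.

[Figure: the example graph G with its non-tree labels, next to the critical subgraph Gc.]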
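The query procedure illustrated above can be sketched end to end. The intervals and non-tree labels are copied from the slides; the edge list of Gc is an assumption, reconstructed to be consistent with the slides. Reachability inside Gc is answered here by a plain depth-first search, which is a simplification: the talk uses a chain-based index for that step.

```python
INTERVAL = {
    "a": (0, 12), "b": (1, 5), "c": (2, 4), "k": (3, 4), "d": (4, 5),
    "r": (5, 9), "e": (6, 9), "f": (7, 8), "g": (8, 9),
    "h": (9, 12), "i": (10, 11), "j": (11, 12),
}
# Non-tree labels (v*, v**) from the slides; None stands for "-".
LABEL = {
    "a": ("a", None), "b": ("d", None), "c": (None, "c"), "d": ("d", "d"),
    "e": ("e", "e"), "f": ("f", "e"), "g": ("g", "g"), "h": ("h", None),
    "i": (None, None), "j": (None, None), "k": (None, "k"), "r": ("e", None),
}
# Edges of Gc: reduced tree edges plus cross edges (reconstructed, an assumption).
GC_EDGES = [
    ("a", "c"), ("a", "d"), ("a", "e"), ("a", "h"), ("c", "k"),
    ("e", "f"), ("e", "g"),
    ("f", "d"), ("d", "k"), ("h", "g"), ("g", "c"), ("h", "e"),
]
GC_SUCC = {}
for u, v in GC_EDGES:
    GC_SUCC.setdefault(u, []).append(v)

def reaches_in_gc(u, v):
    """Depth-first search in Gc (stand-in for the chain index)."""
    stack, seen = [u], {u}
    while stack:
        w = stack.pop()
        if w == v:
            return True
        for x in GC_SUCC.get(w, []):
            if x not in seen:
                seen.add(x)
                stack.append(x)
    return False

def reachable(u, v):
    """True iff v is reachable from u in G (every node reaches itself)."""
    start_u, end_u = INTERVAL[u]
    if start_u <= INTERVAL[v][0] < end_u:
        return True                      # v is in T[u]: tree labels suffice
    first, second = LABEL[u][0], LABEL[v][1]
    return (first is not None and second is not None
            and reaches_in_gc(first, second))
```

`reachable("r", "d")` follows the path of the worked example: the tree test fails, then e = r* is found to reach d = d** in Gc.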


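Graph deduction: Evaluation of reachability queries

Reachability checking over Gc:

Decompose Gc into chains:

1st chain: a, e, f, d, k
2nd chain: h, g, c

[Figure: the critical subgraph Gc and its two chains.]

Index table (one row per chain):

            a      c      d      e      f      g      h      k
1           1      -      4      2      3      -      2      3
2           1      3      -      2      -      2      1      -
Index(v)  (1, 1) (2, 3) (1, 4) (1, 2) (1, 3) (2, 2) (2, 1) (1, 5)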
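A chain-based index of this kind can be sketched as follows. The semantics used here are an assumption in the style of the classic chain-decomposition technique: for every node and every chain, store the smallest position on that chain that the node can reach (counting the node itself). The exact meaning of the table entries in the slides is not spelled out, so this sketch is not guaranteed to reproduce that table. The edge list of Gc is reconstructed, and the names are hypothetical.

```python
INF = float("inf")

# Chains as listed in the slides.
CHAINS = [["a", "e", "f", "d", "k"], ["h", "g", "c"]]
# Edges of Gc (reconstructed to be consistent with the slides, an assumption).
GC_EDGES = [
    ("a", "c"), ("a", "d"), ("a", "e"), ("a", "h"), ("c", "k"),
    ("e", "f"), ("e", "g"),
    ("f", "d"), ("d", "k"), ("h", "g"), ("g", "c"), ("h", "e"),
]

def build_index(chains, edges):
    """pos[v] = (chain number, position); index[v][j] = smallest position
    on chain j + 1 reachable from v, or INF if none is reachable."""
    pos = {v: (ci, pi)
           for ci, chain in enumerate(chains, 1)
           for pi, v in enumerate(chain, 1)}
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    index = {}

    def visit(v):
        if v in index:
            return index[v]
        best = [INF] * len(chains)
        ci, pi = pos[v]
        best[ci - 1] = pi                      # v reaches itself
        for w in succ.get(v, []):
            for j, p in enumerate(visit(w)):
                best[j] = min(best[j], p)
        index[v] = best
        return best

    for v in pos:
        visit(v)
    return pos, index

POS, INDEX = build_index(CHAINS, GC_EDGES)

def reaches(u, v):
    """u reaches v iff u reaches a position on v's chain at or before v."""
    chain, position = POS[v]
    return INDEX[u][chain - 1] <= position
```

Because consecutive nodes of a chain are connected, reaching an earlier position on a chain implies reaching every later one, which is what makes a single comparison per query sufficient. The pairs in `POS` coincide with the Index(v) pairs listed in the slides.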


Graph deduction: Evaluation of reachability queries

Reachability checking over G:

[Figure: G decomposed into five chains. Each node carries its Index(v) pair together with one (chain, position) pair for every other chain, where "-" marks a missing entry.]

Index table (one row per chain):

     a  b  c  d  e  f  g  h  i  j  k  r
1    1  1  2  3  2  3  2  2  -  -  3  2
2    1  4  -  4  2  3  2  2  -  -  -  1
3    1  -  -  -  2  -  2  3  -  -  -  2
4    1  -  -  -  -  -  -  2  2  -  -  -
5    1  -  -  -  -  -  -  1  -  1  -  -


Recursive graph decomposition: Recursive deduction

From the above discussion, we can see that Gc is much smaller than G. However, Gc itself can be reduced further, leading to a further reduction of the space requirement.

Using the above method, we can find a series of graph reductions:

G0 = G, G1, ..., Gk (k ≥ 1),

where Gi is a critical subgraph of Gi-1 (i = 1, ..., k). In order to construct such critical subgraphs, a series of spanning trees has to be established:

T0, T1, ..., Tk-1,

where each Ti is a spanning tree of Gi (i = 0, ..., k - 1), used to construct Gi+1.
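One deduction step, applied twice, can be sketched as below. Several inputs are assumptions: the spanning trees T0 and T1 are reconstructed from the two interval sequences in the slides, the cross edges of G0 are reconstructed to agree with the stated Vstart and Vend, and the helper `reduce_once` is a hypothetical name. Critical nodes are found by enumerating pairs of start nodes rather than by the bottom-up algorithm of the talk.

```python
from itertools import combinations

def reduce_once(children, root, edges):
    """Return (nodes, edges) of the critical subgraph for one spanning tree."""
    parent = {c: p for p, cs in children.items() for c in cs}
    interval, counter = {}, 0

    def visit(v):                       # preorder interval labeling
        nonlocal counter
        start = counter
        counter += 1
        for c in children.get(v, []):
            visit(c)
        interval[v] = (start, counter)

    visit(root)

    def inside(u, v):                   # v lies in the subtree of u
        return interval[u][0] <= interval[v][0] < interval[u][1]

    cross = {(u, v) for u, v in edges if not inside(u, v) and not inside(v, u)}
    v_start = {u for u, _ in cross}
    v_end = {v for _, v in cross}

    def ancestors(v):
        path = [v]
        while path[-1] in parent:
            path.append(parent[path[-1]])
        return path

    critical = set(v_start)
    for u, v in combinations(sorted(v_start), 2):
        if not inside(u, v) and not inside(v, u):
            on_path = set(ancestors(u))
            critical.add(next(w for w in ancestors(v) if w in on_path))

    keep = critical | v_end
    tree_edges = set()
    for v in keep:                      # nearest kept proper ancestor
        p = parent.get(v)
        while p is not None and p not in keep:
            p = parent.get(p)
        if p is not None:
            tree_edges.add((p, v))
    return keep, tree_edges | cross

# T0 and the cross edges of G0 (reconstructed from the slides).
T0 = {"a": ["b", "r", "h"], "b": ["c", "d"], "c": ["k"],
      "r": ["e"], "e": ["f", "g"], "h": ["i", "j"]}
CROSS0 = {("f", "d"), ("d", "k"), ("h", "g"), ("g", "c"), ("h", "e")}
G0_EDGES = {(p, c) for p, cs in T0.items() for c in cs} | CROSS0
G1_NODES, G1_EDGES = reduce_once(T0, "a", G0_EDGES)

# T1: a spanning tree of G1 matching the second interval of each node.
T1 = {"a": ["h"], "h": ["e"], "e": ["f", "g"], "f": ["d"], "d": ["k"], "g": ["c"]}
G2_NODES, G2_EDGES = reduce_once(T1, "a", G1_EDGES)
```

With these inputs the node counts shrink from 12 (G0) to 8 (G1) to 2 (G2), and G2 consists of c, k, and the single edge from c to k. The result of each step depends on which spanning tree is chosen, so a different T1 could give a different G2.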


Recursive graph decomposition: Recursive deduction

To check reachability efficiently, each node v in G is associated with two sequences, an interval sequence and an anchor node sequence:

1) [start_0(v), end_0(v)), ..., [start_j(v), end_j(v)) (j ≤ k - 1),

where each [start_i(v), end_i(v)) is the interval generated for v by labeling Ti;

2) (x_0(v), y_0(v)), ..., (x_j(v), y_j(v)),

where each x_i(v) is a pointer to an anchor node of the first kind (a node appearing in Gi+1), while each y_i(v) is a pointer to an anchor node of the second kind (also a node in Gi+1).


Recursive graph decomposition: Recursive deduction

[Figure: the graphs G0, G1, ..., Gj drawn as layers. In each layer Gi, sample nodes u, v, w, z carry their level-i intervals [start_i(u), end_i(u)), [start_i(v), end_i(v)), [start_i(w), end_i(w)), [start_i(z), end_i(z)). Pointers marked * and ** lead from each layer to the anchor nodes in the next layer.]


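Recursive graph decomposition: Recursive deduction

Example

[Figure: G0 (the example graph G), G1 (its critical subgraph, with the non-tree labels of its nodes), and G2 (the nodes c and k).]

Non-tree labels of the nodes of G1:

a <c, ->    c <c, ->    d <-, ->    e <c, ->
f <-, ->    g <c, ->    h <c, ->    k <-, k>

Index table of G2 (a single chain c, k):

            c      k
1           1      2
Index(v)  (1, 1) (1, 2)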


Recursive graph decomposition: Recursive deduction

Example

Node   Interval sequence     Anchor node sequence
a      [0, 12) [0, 8)        <a, -> <c, ->
b      [1, 5)                <d, ->
c      [2, 4) [7, 8)         <-, c> <c, ->
d      [4, 5) [4, 6)         <d, d> <-, ->
e      [6, 9) [2, 8)         <e, e> <c, ->
f      [7, 8) [3, 6)         <f, e> <-, ->
g      [8, 9) [6, 8)         <g, g> <c, ->
h      [9, 12) [1, 8)        <h, -> <c, ->
i      [10, 11)              <-, ->
j      [11, 12)              <-, ->
k      [3, 4) [5, 6)         <-, k> <-, k>
r      [5, 9)                <e, ->


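Recursive graph decomposition: Evaluation of reachability queries

Anchor node sequences:

a <a, -> <c, ->    b <d, ->           c <-, c> <c, ->    d <d, d> <-, ->
e <e, e> <c, ->    f <f, e> <-, ->    g <g, g> <c, ->    h <h, -> <c, ->
i <-, ->           j <-, ->           k <-, k> <-, k>    r <e, ->

[Figure: the nodes of G annotated with markers of the form {level, kind}, such as {1, *}, {1, **}, {2, *}, and {2, **}.]

Query: is k reachable from g?

[start_0(g), end_0(g)) = [8, 9); [start_0(k), end_0(k)) = [3, 4);
[start_1(g), end_1(g)) = [6, 8); [start_1(k), end_1(k)) = [5, 6).

At neither level does the interval of g subsume the interval of k, so the anchor node sequences are followed: g leads to c (the first-kind anchor in its level-1 label <c, ->), and k leads to k (the second-kind anchor in its level-1 label <-, k>). In G2, k is reachable from c, which shows that k is reachable from g.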


Summary

• Transitive closure compression based on graph deduction
  - DAG decomposition: a spanning tree and a subgraph
  - Reachability checking: tree labels and reachability of anchor nodes in the subgraph

• Transitive closure compression based on recursive graph deduction
  - DAG decomposition: a series of spanning trees and a subgraph
  - Reachability checking: interval sequences and anchor node sequences


Summary

• Computational complexities
  - labeling time: O(ke + b_k^1.5 n_k)
  - space overhead: O(kn + b_k n_k)
  - query time: O(k)

where n is the number of nodes of G, e is the number of edges of G, n_k is the number of nodes of Gk, and b_k is the width of Gk.


Thank you.