Fast Direction-Aware Proximity for Graph Mining KDD 2007, San Jose Hanghang Tong, Yehuda Koren, Christos Faloutsos


Page 1: Fast Direction-Aware Proximity for Graph Mining

Fast Direction-Aware Proximity for Graph Mining

KDD 2007, San Jose

Hanghang Tong, Yehuda Koren, Christos Faloutsos

Page 2: Fast Direction-Aware Proximity for Graph Mining

Defining Direction-Aware Proximity (DAP): escape probability

• Define Random Walk (RW) on the graph
• Esc_Prob(A->B)
  – Prob (starting at A, reaches B before returning to A)

Esc_Prob = Pr (smile before cry)

[Figure: nodes A and B, connected through the remaining graph]

Page 3: Fast Direction-Aware Proximity for Graph Mining

$\mathrm{Esc\_Prob}(1 \to 5) = P(1,\mathcal{I})\,\bigl(I - P(\mathcal{I},\mathcal{I})\bigr)^{-1}\,P(\mathcal{I},5) + P(1,5)$, where $\mathcal{I} = \{2, 3, 4, 6\}$ indexes the remaining graph

$P = [p_{i,j}]$, $i, j = 1, \dots, 6$

[Figure: example graph with 6 nodes (1 to 6); edges labeled with transition probabilities 0.5 and 1]

P: Transition matrix (row norm.)
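The definition of the escape probability can be turned into a small direct solver. The sketch below is only an illustration under stated assumptions: the function name `esc_prob_direct` and the 4-node graph are hypothetical (they are not the 6-node example of the slide), and the code simply solves the linear system implied by "reach the target before returning to the source".

```python
import numpy as np

def esc_prob_direct(P, i, j):
    """Probability that a walk starting at i reaches j before returning to i."""
    n = P.shape[0]
    rest = [k for k in range(n) if k not in (i, j)]   # the remaining graph
    # v[k]: probability that a walk starting at rest[k] hits j before i
    v = np.linalg.solve(np.eye(len(rest)) - P[np.ix_(rest, rest)], P[rest, j])
    # Either step to j directly, or enter the remaining graph and hit j first.
    return P[i, j] + P[i, rest] @ v

# Hypothetical 4-node graph (0-indexed); each row holds outgoing probabilities.
P = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0]])

p_03 = esc_prob_direct(P, 0, 3)   # 0 -> 1 -> 3 escapes, 0 -> 2 -> 0 returns
p_30 = esc_prob_direct(P, 3, 0)   # the only edge out of node 3 leads to node 0
```

Note that each (source, target) pair needs its own linear solve here, which is the cost the rest of the talk tries to avoid.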

Page 4: Fast Direction-Aware Proximity for Graph Mining

Intuition of Formula

1. $Q = (I - P)^{-1} = I + P + P^2 + P^3 + \cdots$
2. $P^2_{i,j}$ tells the probability that we start from $i$ and take two steps to arrive at $j$.
3. $Q$ gives the stationary distribution.
4. $Q_{i,j}$ tells the probability that we started from $i$ and ended with $j$.

[Figure: the 6×6 transition matrix $P = [p_{i,j}]$ multiplied by itself, P*P]
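The series view of Q can be checked numerically. The sketch below is a minimal illustration, assuming a hypothetical 3-node walk whose rows are scaled to sum to 0.9 so that the series converges (for a strictly row-stochastic P the matrix I - P is singular). It compares the truncated sum I + P + P^2 + ... with the explicit inverse and reads off one two-step probability.

```python
import numpy as np

# Hypothetical 3-node walk; rows sum to 0.9, so 0.1 of the mass leaks out per step.
P = 0.9 * np.array([[0.0, 0.5, 0.5],
                    [1.0, 0.0, 0.0],
                    [0.5, 0.5, 0.0]])

Q_exact = np.linalg.inv(np.eye(3) - P)

Q_series = np.zeros((3, 3))
term = np.eye(3)                 # P^0
for _ in range(300):
    Q_series += term             # accumulate I + P + P^2 + ...
    term = term @ P              # next power of P

max_err = np.abs(Q_exact - Q_series).max()
two_step_00 = (P @ P)[0, 0]      # probability of going 0 -> (1 or 2) -> 0
```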

Page 5: Fast Direction-Aware Proximity for Graph Mining

$\mathrm{Esc\_Prob}(1 \to 5) = P(1,\mathcal{I})\,\bigl(I - P(\mathcal{I},\mathcal{I})\bigr)^{-1}\,P(\mathcal{I},5) + P(1,5)$, where $\mathcal{I} = \{2, 3, 4, 6\}$ indexes the remaining graph

$P = [p_{i,j}]$, $i, j = 1, \dots, 6$

[Figure: example graph with 6 nodes (1 to 6); edges labeled with transition probabilities 0.5 and 1]

P: Transition matrix (row norm.)

Page 6: Fast Direction-Aware Proximity for Graph Mining

Challenges

• Case 1: Medium Size Graph
  – Matrix inversion is feasible, but…
  – What if we want many proximities?
  – Q: How to get all (n^2) proximities efficiently?
  – A: FastAllDAP!

• Case 2: Large Size Graph
  – Matrix inversion is infeasible
  – Q: How to get one proximity efficiently?
  – A: FastOneDAP!

Page 7: Fast Direction-Aware Proximity for Graph Mining

FastAllDAP

• Q1: How to efficiently compute all possible proximities on a medium size graph?
  – a.k.a. how to efficiently solve multiple linear systems simultaneously?
• Goal: reduce # of matrix inversions!

Page 8: Fast Direction-Aware Proximity for Graph Mining

FastAllDAP: Observation

[Figure: example graph with 6 nodes (edge probabilities 0.5 and 1) and two copies of the 6×6 transition matrix $P = [p_{i,j}]$]

Need two different matrix inversions!

Page 9: Fast Direction-Aware Proximity for Graph Mining

FastAllDAP: Rescue

Redundancy among different linear systems!

[Figure: two copies of the 6×6 transition matrix $P = [p_{i,j}]$ with gray parts marked, one for Prox(1->5) and one for Prox(1->6)]

Overlap between two gray parts!

Page 10: Fast Direction-Aware Proximity for Graph Mining

FastAllDAP: Theorem

• Theorem:

• Proof: by SM Lemma

• Example:

Page 11: Fast Direction-Aware Proximity for Graph Mining

FastAllDAP: Algorithm

• Alg.
  – Compute Q
  – For i, j = 1, …, n, compute Prox(i->j)

• Computational Save: O(1) matrix inversions instead of O(n^2)!

• Example
  – w/ 1000 nodes,
  – 1M matrix inversions vs. 1 matrix inversion!
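The idea of paying for one inversion and then reading every pairwise proximity off Q can be sketched as follows. The theorem's formula is not preserved in this transcript, so the closed form used here, Q[i,j] / (Q[i,i] Q[j,j] - Q[i,j] Q[j,i]), is an assumption: it is an identity for escape probabilities that uses only four elements of Q and agrees numerically with the pair-by-pair solver. The names `esc_prob_direct` and `all_pairs_from_Q` and the random test graph are hypothetical.

```python
import numpy as np

def esc_prob_direct(P, i, j):
    """Baseline: one linear solve per (i, j) pair."""
    n = P.shape[0]
    rest = [k for k in range(n) if k not in (i, j)]
    v = np.linalg.solve(np.eye(len(rest)) - P[np.ix_(rest, rest)], P[rest, j])
    return P[i, j] + P[i, rest] @ v

def all_pairs_from_Q(P):
    """All pairwise escape probabilities from a single inversion (assumed identity)."""
    n = P.shape[0]
    Q = np.linalg.inv(np.eye(n) - P)          # the only matrix inversion
    d = np.diag(Q)
    denom = np.outer(d, d) - Q * Q.T          # Q_ii Q_jj - Q_ij Q_ji
    np.fill_diagonal(denom, 1.0)              # avoid 0/0 for i == j
    prox = Q / denom
    np.fill_diagonal(prox, 0.0)               # i == j is not a pair of interest
    return prox

# Hypothetical random graph; rows scaled to sum to 0.9 so that I - P is invertible.
rng = np.random.default_rng(0)
W = rng.random((6, 6))
np.fill_diagonal(W, 0.0)
P = 0.9 * W / W.sum(axis=1, keepdims=True)

fast = all_pairs_from_Q(P)
slow = np.array([[0.0 if i == j else esc_prob_direct(P, i, j)
                  for j in range(6)] for i in range(6)])
max_diff = np.abs(fast - slow).max()
```

The baseline performs n(n-1) solves on sub-matrices of P, while the second function inverts once and then does O(1) work per pair.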

Page 12: Fast Direction-Aware Proximity for Graph Mining

FastOneDAP

• Q1: How to efficiently compute one single proximity on a large size graph?
  – a.k.a. how to solve one linear system efficiently?
• Goal: avoid matrix inversion!

Page 13: Fast Direction-Aware Proximity for Graph Mining

FastOneDAP: Observation

[Figure: example graph with 6 nodes (1 to 6); edges labeled with transition probabilities 0.5 and 1]

Partial Info. (4 elements / 2 cols) of Q is enough!

Page 14: Fast Direction-Aware Proximity for Graph Mining

FastOneDAP: Observation

• Q: How to compute one column of Q?
• A: Taylor expansion

Reminder: $Q = I + P + P^2 + \cdots$

i-th col of Q = $Q\,[0, \dots, 0, 1, 0, \dots, 0]^T$

Page 15: Fast Direction-Aware Proximity for Graph Mining

FastOneDAP: Observation

[Figure: the i-th col of Q obtained by repeatedly multiplying P with the vector $[0, \dots, 0, 1, 0, \dots, 0]^T$]

Sparse matrix-vector multiplications!

Page 16: Fast Direction-Aware Proximity for Graph Mining

FastOneDAP: Iterative Alg.

• Alg. to estimate the i-th col of Q
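The algorithm listing itself is not preserved in this transcript. As a hedged illustration of the idea from the preceding slides (Taylor expansion plus sparse matrix-vector products), the sketch below estimates one column of Q without forming any inverse. The function name `q_column`, the edge-list representation, and the 4-node graph are assumptions made for this example, not the authors' code.

```python
import numpy as np

def q_column(src, dst, prob, n, i, iters=50):
    """Estimate column i of Q = I + P + P^2 + ..., where P[src, dst] = prob."""
    x = np.zeros(n)
    x[i] = 1.0                      # e_i = [0, ..., 0, 1, 0, ..., 0]^T
    col = x.copy()                  # running sum, starts with the I term
    for _ in range(iters):
        # Sparse mat-vec x <- P x: each edge is touched exactly once.
        x = np.bincount(src, weights=prob * x[dst], minlength=n)
        col += x
    return col

# Hypothetical 4-node graph as an edge list; 0.1 of the mass leaks out per step.
src = np.array([0, 0, 1, 2, 2, 3])
dst = np.array([1, 2, 2, 0, 3, 0])
out_degree = np.bincount(src, minlength=4)
prob = 0.9 / out_degree[src]

estimate = q_column(src, dst, prob, n=4, i=2, iters=300)

# Dense reference, affordable only because this example is tiny.
P = np.zeros((4, 4))
P[src, dst] = prob
reference = np.linalg.inv(np.eye(4) - P)[:, 2]
```

Each iteration costs time proportional to the number of edges, so the total work is (number of iterations) × (number of edges) rather than a cubic-time inversion.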

Page 17: Fast Direction-Aware Proximity for Graph Mining

FastOneDAP: Property

• Convergence guaranteed!

• Computational Save
  – Example:
    • 100K nodes and 1M edges (50 iterations)
    • 10,000,000x faster!

• Footnote: 1 col is enough!
  – (details in paper)

Page 18: Fast Direction-Aware Proximity for Graph Mining

Esc_Prob is good, but…

• Issue #1:
  – 'Degree-1 node' effect

• Issue #2:
  – Weakly connected pair

Need some practical modifications!

Page 19: Fast Direction-Aware Proximity for Graph Mining

Issue #1: 'degree-1 node' effect [Faloutsos+] [Koren+]

• no influence for degree-1 nodes (E, F)!
  – known as 'pizza delivery guy' problem in undirected graph

• Solution: Universal Absorbing Boundary!

[Figure: path A, D, B with edge weights 1 and 1; Esc_Prob(a->b)=1]

[Figure: the same path with degree-1 nodes E and F attached to D; edge weights 1, 1/3, 1/3, 1/3, 1, 1; Esc_Prob(a->b)=1]

Page 20: Fast Direction-Aware Proximity for Graph Mining

Universal Absorbing Boundary

U-A-B is a black-hole!

[Figure: path A, D, B with edge weights 1 and 1]

[Figure: the same path after adding the U-A-B node; edge labels 0.9, 0.9, 0.1, 0.1, 0.1 and 1]

Footnote: fly-out probability = 0.1

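Page 21: Fast Direction-Aware Proximity for Graph Mining

Introducing Universal-Absorbing-Boundary

[Figure: without U-A-B, path A, D, B with edge weights 1 and 1; Esc_Prob(a->b)=1]

[Figure: without U-A-B, the same path with E and F attached to D; edge weights 1, 1/3, 1/3, 1/3, 1, 1; Esc_Prob(a->b)=1]

[Figure: with U-A-B, path A, D, B with edge weights 0.9 and 0.9, plus 0.1 edges to U-A-B; Prox(a->b)=0.91]

[Figure: with U-A-B, the graph with E and F; edge weights 0.9, 0.3, 0.3, 0.3, 0.9, 0.9, plus 0.1 edges to U-A-B; Prox(a->b)=0.74]

Footnote: fly-out probability = 0.1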
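The effect of the universal absorbing boundary can be illustrated with a small experiment. The sketch below is an assumption-laden reading of the slide: the edge directions of the two example graphs are guessed, the boundary is modeled by scaling every transition by (1 - fly-out probability), and the helper names are hypothetical. The values it produces depend on these assumptions and are not meant to reproduce the numbers printed on the slide; the point is only that the degree-1 nodes E and F start to matter once the boundary is added.

```python
import numpy as np

def esc_prob_direct(P, i, j):
    """Probability that a walk starting at i reaches j before returning to i."""
    n = P.shape[0]
    rest = [k for k in range(n) if k not in (i, j)]
    v = np.linalg.solve(np.eye(len(rest)) - P[np.ix_(rest, rest)], P[rest, j])
    return P[i, j] + P[i, rest] @ v

def add_absorbing_boundary(P, fly_out=0.1):
    """Scale every transition by (1 - fly_out); the lost mass goes to U-A-B."""
    return (1.0 - fly_out) * P

# Assumed reading of the two example graphs (edge directions are a guess).
A, D, B, E, F = range(5)
P_small = np.zeros((3, 3))
P_small[A, D] = P_small[D, B] = 1.0           # A -> D -> B
P_big = np.zeros((5, 5))
P_big[A, D] = 1.0
P_big[D, [B, E, F]] = 1.0 / 3.0               # D -> B, D -> E, D -> F
P_big[[E, F], D] = 1.0                        # E -> D, F -> D

before = [esc_prob_direct(P, A, B) for P in (P_small, P_big)]
after = [esc_prob_direct(add_absorbing_boundary(P), A, B)
         for P in (P_small, P_big)]
```

Under these assumptions both graphs score 1 before the change, while afterwards the graph with the extra nodes E and F scores lower than the plain path.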

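Page 22: Fast Direction-Aware Proximity for Graph Mining

Issue #2: Weakly connected pair

[Figure: chain of three edges between A and B, edge weights 1, 1, 1]

Prox(A->B) = Prox(B->A) = 0

Solution: Partial symmetry!

[Figure: an edge between i and j with weight w becomes a pair of opposite edges with weights a·w and (1-a)·w]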

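Page 23: Fast Direction-Aware Proximity for Graph Mining

Practical Modifications: Partial Symmetry

[Figure: chain of three edges between A and B, edge weights 1, 1, 1]

Prox(A->B) = Prox(B->A) = 0

[Figure: the same chain with weights 0.9, 0.9, 0.9 in one direction and 0.1, 0.1, 0.1 in the other]

Prox(A->B) = 0.081 > Prox(B->A) = 0.009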
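Partial symmetry can be sketched as a one-line matrix operation. The example below assumes that an edge i->j of weight w keeps weight a·w in its original direction and gains a reverse edge of weight (1-a)·w, with a = 0.9; the function name `partial_symmetry`, the intermediate nodes X and Y, and the edge orientation A->X, Y->X, Y->B are hypothetical choices for a weakly connected pair. Under these assumptions the product of edge weights along the chain happens to equal the 0.081 and 0.009 shown on the slide.

```python
import numpy as np

def partial_symmetry(W, a=0.9):
    """Keep a*w on each original edge and add (1 - a)*w on its reverse."""
    return a * W + (1.0 - a) * W.T

# Assumed weakly connected chain: A -> X <- Y -> B (no directed path either way).
A, X, Y, B = range(4)
W = np.zeros((4, 4))
W[A, X] = W[Y, X] = W[Y, B] = 1.0

W_sym = partial_symmetry(W)

# Product of edge weights along the chain in each direction.
forward = W_sym[A, X] * W_sym[X, Y] * W_sym[Y, B]    # 0.9 * 0.1 * 0.9
backward = W_sym[B, Y] * W_sym[Y, X] * W_sym[X, A]   # 0.1 * 0.9 * 0.1
```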

Page 24: Fast Direction-Aware Proximity for Graph Mining

Efficiency: FastAllDAP

[Figure: Time (sec) vs. Size of Graph, comparing Straight-Solver and FastAllDAP]

1,000x faster!

Page 25: Fast Direction-Aware Proximity for Graph Mining

Efficiency: FastOneDAP

[Figure: Time (sec) vs. Size of Graph, comparing FastOneDAP and Straight-Solver]

1,0000x faster!

Page 26: Fast Direction-Aware Proximity for Graph Mining

Link Prediction: direction

• Q: Given the existence of the link, what is the direction of the link?

• A: Compare prox(i->j) and prox(j->i)

>70%

[Figure: density of Prox(i->j) - Prox(j->i)]
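The comparison rule for link direction fits in a few lines. This is a minimal sketch: the function name `predict_direction` is hypothetical, and breaking ties toward i->j is an arbitrary choice that the slides do not specify.

```python
def predict_direction(prox_ij, prox_ji):
    """Guess the link direction from the two direction-aware proximities."""
    # Tie-breaking toward i -> j is an arbitrary choice for this sketch.
    return "i->j" if prox_ij >= prox_ji else "j->i"

# Example with the two proximities from the partial-symmetry slide.
guess = predict_direction(0.081, 0.009)
```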

Page 27: Fast Direction-Aware Proximity for Graph Mining