k-Nearest Neighbors in Uncertain Graphs Michalis Potamias Francesco Bonchi Aristides Gionis George Kollios


Post on 05-Aug-2015


Page 1: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

k-Nearest Neighbors in Uncertain Graphs

Michalis Potamias Francesco Bonchi

Aristides Gionis George Kollios

Page 2: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

Nearest Neighbors in Uncertain Graphs @ VLDB 2010


Thesis

• Many complex networks are modeled as probabilistic (i.e., uncertain) graphs.

• The probabilistic treatment of such graphs leads to better understanding of real data.

Page 3: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

Probabilistic Protein-Protein Interaction Networks

Possible interactions between proteins are established through biological experiments that entail uncertainty. The edge probability represents that uncertainty.

[Figure: example probabilistic graph on nodes A, B, C, D with edge probabilities 0.2, 0.4, 0.6, 0.3, 0.7]

Source: Asthana et al., Genome Research 2004

Page 4: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

Probabilistic Protein-Protein Interaction Networks

• Neighbors of a given node in a standard graph?
– Nodes close in terms of shortest path distance!

• How do we define neighbors in probabilistic graphs?

• How do we define the distance?
– Treat them as weighted graphs (N06)
– Nodes with high reliability (GR04)
– Most probable path (BI03)
– …shortest paths? (VLDB10)

[Figure: example probabilistic graph on nodes A, B, C, D with edge probabilities 0.2, 0.4, 0.6, 0.3, 0.7]

Page 5: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

Probabilistic Protein-Protein Interaction Networks

• Why is it important to find good neighbors of proteins in PPI networks?
– Detection of candidate co-complex relationships.
– Actual co-complex relationships can be established through experiments in the lab.

Page 6: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)


Outline

• Thesis

• Probabilistic PPI Networks

• Distance Definition

• Sampling Algorithms

• kNN Pruning

• Experiments


Page 8: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

Distance Definition

[Figure: the probabilistic graph on nodes A, B, C, D (edge probabilities 0.2, 0.4, 0.6, 0.3, 0.7) followed by its 32 possible worlds, one for each subset of the five edges]

Page 9: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

Distance Definition

[Figure: the graph, on nodes A, B, C, D with edge probabilities 0.2, 0.4, 0.6, 0.3, 0.7]

Page 10: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

Distance Definition

[Figure: the graph (nodes A, B, C, D; edge probabilities 0.2, 0.4, 0.6, 0.3, 0.7) next to a world, in which only the edges (A,B) and (B,D) are present]

$$\Pr(\text{world}) = p(A,B)\, p(B,D)\, (1 - p(B,C))\, (1 - p(C,D))\, (1 - p(A,D))$$

Page 11: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

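Distance Definition

[Figure: the graph (nodes A, B, C, D; edge probabilities 0.2, 0.4, 0.6, 0.3, 0.7) next to a world, in which only the edges (A,B) and (B,D) are present]

$$\Pr(\text{world}) = p(A,B)\, p(B,D)\, (1 - p(B,C))\, (1 - p(C,D))\, (1 - p(A,D))$$

[Figure: PDF of the shortest path length d(B,D) over all possible worlds: 0.3 at d = 1, 0.26 at d = 2, 0.44 at d = inf]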
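The possible-world semantics above can be sketched in a few lines of Python: enumerate every subset of the five edges, weight each world by its probability, and accumulate the PDF of d(B,D). The assignment of the five probabilities to specific edges is an assumption, since the figure does not survive in this transcript; it was chosen because it reproduces the PDF values shown on the slide. The function names are illustrative only.

```python
import math
from collections import defaultdict, deque
from itertools import product

# Assumed mapping of the slide's probabilities to edges (not given in the
# transcript); it reproduces the PDF 0.3 / 0.26 / 0.44 for d(B,D).
EDGES = {
    ("A", "B"): 0.2,
    ("A", "D"): 0.6,
    ("B", "C"): 0.4,
    ("C", "D"): 0.7,
    ("B", "D"): 0.3,
}

def hop_distance(present_edges, source, target):
    """Unweighted shortest path length via BFS; math.inf if disconnected."""
    adj = defaultdict(list)
    for u, v in present_edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return math.inf

def distance_pdf(prob_edges, source, target):
    """Exact PDF of the shortest path length, enumerating every world."""
    items = list(prob_edges.items())
    pdf = defaultdict(float)
    for mask in product((False, True), repeat=len(items)):
        world_prob = 1.0
        present = []
        for keep, (edge, p) in zip(mask, items):
            # Present edges contribute p, absent edges contribute 1 - p.
            world_prob *= p if keep else 1.0 - p
            if keep:
                present.append(edge)
        pdf[hop_distance(present, source, target)] += world_prob
    return dict(pdf)

pdf = distance_pdf(EDGES, "B", "D")
```

The loop visits 2^m worlds for m edges, so this only works for toy graphs; that cost is what motivates the sampling approach later in the talk.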

Page 12: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

Distance Definition

• Use well-known statistics of the shortest path PDF:
– Median
– Majority (mode)
– ExpectedReliable

• Infinity problem: the distance is infinite in every world where the two nodes are disconnected

• Hard! They require explicit enumeration of possible worlds: resort to sampling!

[Figure: PDF of the shortest path length d(B,D): 0.3 at d = 1, 0.26 at d = 2, 0.44 at d = inf, annotated with d_med = 2, d_maj = inf, d_exp = 1.46]
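As a rough illustration, the three statistics can be computed directly from the PDF of d(B,D) shown on the slide. The definitions below are assumptions that agree with the annotated values (median 2, majority inf, expected-reliable 1.46): the median is taken as the smallest distance whose cumulative probability reaches 1/2, and the expected-reliable distance is the mean conditioned on the nodes being connected.

```python
import math

# PDF of d(B,D) as read off the slide.
PDF = {1: 0.30, 2: 0.26, math.inf: 0.44}

def median_distance(pdf):
    """Smallest distance whose cumulative probability reaches 1/2."""
    cumulative = 0.0
    for d in sorted(pdf):
        cumulative += pdf[d]
        if cumulative >= 0.5:
            return d
    return math.inf

def majority_distance(pdf):
    """Most probable distance (the mode of the PDF)."""
    return max(pdf, key=pdf.get)

def expected_reliable_distance(pdf):
    """Expected distance, conditioned on the two nodes being connected."""
    reliability = sum(p for d, p in pdf.items() if d != math.inf)
    if reliability == 0.0:
        return math.inf
    finite_mass = sum(d * p for d, p in pdf.items() if d != math.inf)
    return finite_mass / reliability

d_med = median_distance(PDF)
d_maj = majority_distance(PDF)
d_exp = expected_reliable_distance(PDF)
```

Conditioning on connectivity is what keeps the expectation finite here: the plain expected distance would be infinite because the disconnected worlds carry probability 0.44.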

Page 13: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)


Outline

• Thesis

• Probabilistic PPI Networks

• Distance Definition

• Sampling Algorithms

• kNN Pruning

• Experiments

Page 14: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

Sampling Algorithms

1. Sample (a small number of) worlds

2. Compute the sample median (approximation)

3. Output the result
– Median (Chernoff bound)
– ExpectedReliable (Hoeffding inequality)
– Majority (no bound)
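The three steps above might look as follows for the median distance. This is a sketch under assumptions, not the authors' implementation: distances are hop counts computed with BFS, the sample median is the lower median, and the edge-probability mapping of the example graph is guessed as before. All names are hypothetical.

```python
import math
import random
from collections import defaultdict, deque

def sample_world(prob_edges, rng):
    """Draw one world: keep each edge independently with its probability."""
    return [edge for edge, p in prob_edges.items() if rng.random() < p]

def hop_distance(present_edges, source, target):
    """Unweighted shortest path length via BFS; math.inf if disconnected."""
    adj = defaultdict(list)
    for u, v in present_edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return math.inf

def sampled_median_distance(prob_edges, source, target, num_worlds, rng):
    """Sample median of the shortest path length over num_worlds worlds."""
    distances = sorted(
        hop_distance(sample_world(prob_edges, rng), source, target)
        for _ in range(num_worlds)
    )
    # Lower median; math.inf sorts after every finite distance.
    return distances[(num_worlds - 1) // 2]

# Assumed mapping of the example probabilities to edges (illustrative only).
EDGES = {("A", "B"): 0.2, ("A", "D"): 0.6, ("B", "C"): 0.4,
         ("C", "D"): 0.7, ("B", "D"): 0.3}
estimate = sampled_median_distance(EDGES, "B", "D", 200, random.Random(7))
```

Passing the random generator in explicitly keeps runs reproducible and makes it easy to compare estimates across sample sizes.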

Page 15: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

Sampling Algorithms

BIOMINE: database of biological entities and uncertain interactions from the University of Helsinki. 1M nodes, 10M edges.

FLICKR: users from flickr.com. Edges have been created assuming homophily, based on the Jaccard coefficient of Flickr groups. 77K nodes, 20M edges.

Page 16: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)


Outline

• Thesis

• Probabilistic PPI Networks

• Distance Definition

• Sampling Algorithms

• kNN Pruning

• Experiments

Page 17: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

kNN Pruning

• Query: Given a probabilistic graph and a source node, find the set of k nodes closest to the source.

• Naïve algorithm:
1. Sample worlds
2. Run Dijkstra traversals and compute a PDF of the shortest path distance per node
3. Calculate the median distance to all nodes using the PDFs
4. Compute the k-NN
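The naive algorithm can be sketched as below. Since the example graphs in this talk carry no edge weights, BFS stands in for Dijkstra here; that substitution, the lower-median convention, the tie-break by node name, and all function names are assumptions made for illustration.

```python
import math
import random
from collections import defaultdict, deque

def sample_world(prob_edges, rng):
    """Draw one world: keep each edge independently with its probability."""
    return [edge for edge, p in prob_edges.items() if rng.random() < p]

def all_hop_distances(present_edges, source):
    """BFS over one world; nodes missing from the result are unreachable."""
    adj = defaultdict(list)
    for u, v in present_edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def naive_knn(prob_edges, source, k, num_worlds, rng):
    """k nodes with the smallest sample-median distance from source."""
    nodes = {n for edge in prob_edges for n in edge} - {source}
    samples = {n: [] for n in nodes}
    for _ in range(num_worlds):
        # Steps 1 and 2: sample a world and traverse all of it.
        dist = all_hop_distances(sample_world(prob_edges, rng), source)
        for n in nodes:
            samples[n].append(dist.get(n, math.inf))
    # Step 3: median distance per node from its sampled distances.
    median = {n: sorted(ds)[(num_worlds - 1) // 2] for n, ds in samples.items()}
    # Step 4: rank by median distance, breaking ties by node name.
    return sorted(nodes, key=lambda n: (median[n], n))[:k]
```

Every sampled world is traversed in full and every node gets a histogram, which is the cost the pruning algorithm is designed to avoid.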

Page 18: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

kNN Pruning

1-NN, median; node: A; sample: 5 worlds

[Figure sequence, pages 18 to 25: the naive algorithm on an example graph with nodes A to G and edge probabilities 0.9, 0.3, 0.4, 0.6, 0.8, 0.5, 0.3, 0.7. Five worlds are sampled one after another, and the distance from A to each of B, C, D, E, F, G is recorded in every world before the median distances are compared.]

Page 26: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

kNN Pruning

1-NN, median; node: A; sample: 5 worlds

[Figure: the example graph with nodes A to G and edge probabilities 0.9, 0.3, 0.4, 0.6, 0.8, 0.5, 0.3, 0.7]

• Algorithm
– Sample worlds on the fly
– Increase the horizon of each Dijkstra traversal one hop at a time
– Maintain truncated PDF histograms
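A possible reading of these three bullets, for the median distance, is sketched below. It is an illustration under assumptions rather than the authors' algorithm: distances are hop counts, so a level-by-level BFS plays the role of the Dijkstra traversal; an edge is sampled the first time a traversal needs it; and a node's median is considered fixed once it has been reached in at least half of the worlds. The function name and tie-break are hypothetical.

```python
import random
from collections import defaultdict

def pruned_knn(prob_edges, source, k, num_worlds, rng):
    """k nearest nodes by sample-median hop distance, expanding lazily."""
    adj = defaultdict(list)
    for (u, v), p in prob_edges.items():
        adj[u].append((v, p))
        adj[v].append((u, p))

    # Traversal state kept in memory for every world.
    visited = [{source} for _ in range(num_worlds)]
    frontier = [[source] for _ in range(num_worlds)]
    reached = defaultdict(int)          # truncated histogram: worlds with d <= horizon
    needed = (num_worlds - 1) // 2 + 1  # worlds required to fix the median
    result = []

    while len(result) < k and any(frontier):
        newly_reached = set()
        for w in range(num_worlds):
            next_frontier = []
            for u in frontier[w]:
                for v, p in adj[u]:
                    # Each edge is examined at most once per world, so it
                    # can be sampled when the traversal first needs it.
                    if v not in visited[w] and rng.random() < p:
                        visited[w].add(v)
                        next_frontier.append(v)
                        reached[v] += 1
                        newly_reached.add(v)
            frontier[w] = next_frontier
        # Nodes reached in at least half of the worlds have their median
        # distance fixed at the current horizon.
        done = sorted(n for n in newly_reached
                      if reached[n] >= needed and n not in result)
        result.extend(done)
    return result[:k]
```

The loop stops as soon as k nodes have a fixed median, so edges beyond the current horizon are never sampled. Because edges are drawn lazily, a run with the same seed will not consume random numbers in the same order as a version that instantiates whole worlds up front.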

Page 27: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

kNN Pruning

1-NN, median; node: A; sample: 5 worlds

[Figure sequence, pages 27 to 33: the pruning algorithm on the same example graph. The five worlds are instantiated one hop around A, one world at a time; only B and C are discovered, and truncated distance histograms with bins "1" and ">1" are maintained for those two nodes.]

Page 34: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

kNN Pruning

1-NN, median; node: A; sample: 5 worlds

[Figure: the example graph, the five partially instantiated worlds around A, and the truncated histograms for B and C with bins "1" and ">1"]

• B has distance 1
• C has distance greater than 1
• D, E, F, G, … were not discovered (d > 1)
• The 1-NN set is complete with B, so there is no need to continue

• Just 2 nodes visited (and 2 histograms maintained)
• Worlds were only partially instantiated
• Same answer as the naive algorithm

• With a small cost: the Dijkstra state needs to be maintained in memory for all worlds

Page 35: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

kNN Pruning

BIOMINE: database of biological entities and uncertain interactions from the University of Helsinki. 1M nodes, 10M edges.

FLICKR: users from flickr.com. Edges have been created assuming homophily, based on the Jaccard coefficient of Flickr groups. 77K nodes, 20M edges.

DBLP: authors from DBLP. Probabilities have been assigned based on the number of coauthored papers. 226K nodes, 1.4M edges.

For 200 worlds and 5-NN the speedups were: 247x (BIOMINE), 111x (FLICKR), 269x (DBLP).


Page 37: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)


Less uncertainty, more pruning

Page 38: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

Less uncertainty, more pruning

[Figure: the example graph on nodes A, B, C, D (edge probabilities 0.2, 0.4, 0.6, 0.3, 0.7) and its boosted version, with edge probabilities 1 − 0.8^d, 1 − 0.6^d, 1 − 0.4^d, 1 − 0.7^d, 1 − 0.3^d]

• Boost the probabilities of edges by giving each edge d chances: p becomes 1 − (1 − p)^d
• d = 1: original graph
• Increasing d, p goes to 1
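The boosting step is small enough to write out. In the sketch below, an edge with probability p that gets d independent chances appears with probability 1 − (1 − p)^d, which matches the labels in the figure (for example 1 − 0.8^d for p = 0.2). The function name is illustrative, and d is the number of chances, not a distance.

```python
def boost(p, d):
    """Probability that an edge appears in at least one of d independent tries."""
    return 1.0 - (1.0 - p) ** d

# Edge probabilities of the example graph, boosted with d = 3 chances each.
EDGE_PROBS = [0.2, 0.4, 0.6, 0.3, 0.7]
boosted = [boost(p, 3) for p in EDGE_PROBS]
```

With d = 1 the graph is unchanged, and as d grows every edge probability approaches 1, so the graph becomes nearly deterministic.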


Page 41: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)


Outline

• Thesis

• Probabilistic PPI Networks

• Distance Definition

• Sampling Algorithms

• kNN Pruning

• Experiments

Page 42: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

Experiments

• Dataset
– Probabilistic PPI network [Krogan et al., Nature 06]
– Protein co-complex relationships (ground truth) [Mewes et al., Nuc. Acids Res. 04]

• Experiment
– Choose a ground truth edge (A,B)
– Choose a node C s.t. there is no ground truth edge (A,C)
– Classification task: distinguish between the two types of edges, (A,B) and (A,C)


Page 44: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)


Conclusion

• Probabilistic graph analysis benefits from possible-world semantics.

– Extended standard graph concepts to probabilistic graphs and designed approximation algorithms to compute them

– Introduced novel pruning algorithms for kNN in probabilistic graphs

– Confirmed the efficacy of our framework on real data.

Page 45: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)

Future Work

• Enrich the model
– Node probabilities
– Arbitrary PDFs

• Explore random walks further

Page 46: k-Nearest Neighbors in Uncertain Graphs (Michalis Potamias, Francesco Bonchi, Aristides Gionis, George Kollios)


Thank you!

?