GraphScope: Parameter-Free Mining of Large Time-Evolving Graphs

Jimeng Sun (CMU), Spiros Papadimitriou (IBM), Philip S. Yu (IBM), Christos Faloutsos (CMU)

Post on 03-Jan-2016


Page 1: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

GraphScope: Parameter-Free Mining of Large Time-Evolving Graphs

Jimeng Sun CMU

Spiros Papadimitriou IBM

Philip S. Yu IBM

Christos Faloutsos CMU

Page 2: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Motivation of GraphScope

Time-evolving graphs:
- Network traffic graphs
- Email networks
- Customer-product relationships
- Call detail records in telecom networks
- Financial transaction data

Key questions:
1. How to monitor community structures?
2. How to detect the change points?

Page 3: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

1. Community discovery

[Figure: a customer-product graph and its adjacency matrix (rows: Customers, columns: Products; both axes run from 5 to 25), with rows and columns arranged into customer groups and product groups. Example group labels: Books, CEOs, Researchers, BMWs. Block annotations: 289/300, 48/50, 5/200, 2/75; 97%, 96%, 3%, 3%, 54%.]

Simultaneously group customers and products, or source-destination traffic graphs, or sender-recipient communication, etc.

Page 4: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

2. Change detection

Find change points in group structure

[Figure: Customers-by-Products adjacency matrices along a time axis; label: holiday season.]

Page 5: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Problem definition

Given graphs G1, G2, …, Gt, where each Gi is n-by-m:

1. partition them into time segments G(1), G(2), …
2. for each segment, identify the groups

1. Scalable, 2. Parameter-free, 3. Incremental

[Figure: a time axis divided into segments G(1), G(2).]

Page 6: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Outline

Motivation
GraphScope
  Community discovery
  Change detection
Experiments

Page 7: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Community detection

Clustering problem; compression problem

[Figure: graph snapshots at t = 0, t = 1, t = 2.]

Page 8: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

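Cost objective within a time segment

[Figure: the adjacency matrix for a segment of duration d, partitioned into k = 3 row groups (sizes n1, n2, n3) and ℓ = 3 col. groups (sizes m1, m2, m3); block (i, j) has density of ones (edges) p_{i,j}.]

Code cost: block (i, j) takes $d\, n_i m_j\, H(p_{i,j})$ bits, e.g. $d\, n_1 m_2\, H(p_{1,2})$ bits for block (1,2), for a total of

$$\sum_{i,j} d\, n_i m_j\, H(p_{i,j}) \text{ bits}$$

Description cost:

$$\sum_{i,j} \log (d\, n_i m_j) + \dots + \log^* d$$

Total cost = code cost + description cost.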
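The cost terms on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names (`segment_cost`, `log_star`), the use of base-2 logarithms, and the simplified form of log* (a sum of iterated logarithms, without the usual normalizing constant) are assumptions. Only the description terms that are legible on the slide are included.

```python
import math
import numpy as np

def binary_entropy(p):
    """H(p) in bits for a block whose density of ones is p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def log_star(x):
    """Assumed simplified log*: sum of the positive iterated base-2 logs of x."""
    total = 0.0
    v = math.log2(x)
    while v > 0:
        total += v
        v = math.log2(v)
    return total

def segment_cost(graphs, row_groups, col_groups):
    """Return (code_cost, description_cost) in bits for one time segment.

    graphs: list of d binary n-by-m arrays (the snapshots in the segment).
    row_groups, col_groups: integer group labels for each row and column.
    """
    row_groups = np.asarray(row_groups)
    col_groups = np.asarray(col_groups)
    d = len(graphs)
    stacked = np.sum(graphs, axis=0)       # ones per cell over the d snapshots
    code = 0.0
    desc = log_star(d)                     # the log* d term
    for i in range(row_groups.max() + 1):
        rows = row_groups == i
        for j in range(col_groups.max() + 1):
            cols = col_groups == j
            size = d * int(rows.sum()) * int(cols.sum())   # d * n_i * m_j
            if size == 0:
                continue
            ones = stacked[np.ix_(rows, cols)].sum()
            code += size * binary_entropy(ones / size)     # d n_i m_j H(p_ij)
            desc += math.log2(size)                        # log(d n_i m_j)
    return code, desc

# Two identical snapshots of a block-diagonal 4-by-4 graph.
A = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]])
grouped = segment_cost([A, A], [0, 0, 1, 1], [0, 0, 1, 1])
ungrouped = segment_cost([A, A], [0, 0, 0, 0], [0, 0, 0, 0])
print(grouped, ungrouped)
```

In this toy case the two-group partition makes every block homogeneous, so its code cost drops to zero while its description cost grows slightly.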

Page 9: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Cost objective within a time segment

Total cost = code cost (blocks) + description cost (blocks' model)

One row group, one col group: code cost high, description cost low.
n row groups, m col groups: code cost low, description cost high.

Page 10: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Cost objective within a time segment

Total cost = code cost (blocks) + description cost (blocks' model)

k = 3 row groups, ℓ = 3 col groups: code cost low, description cost low.

Page 11: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Search for the optimum grouping

The problem is NP-hard even for one timestamp and column permutation only (reduction from the TSP problem [Johnson+ 03]).

Heuristics:
- Search: Split, Merge, Shuffle
- Initialization: Resume, Restart
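The slides name the Shuffle heuristic but do not spell it out. The sketch below shows one plausible form of a row shuffle for a single binary matrix, in the spirit of co-clustering methods: each row moves to the row group whose current block densities encode it in the fewest bits. The function name `shuffle_rows`, the `eps` clipping, and the one-hot matrix formulation are assumptions made for illustration.

```python
import numpy as np

def shuffle_rows(A, row_groups, col_groups, k, l, eps=1e-9):
    """One shuffle pass over rows; returns the new row-group labels.

    A: binary n-by-m array. row_groups, col_groups: current integer labels.
    k, l: number of row and column groups.
    """
    R = np.eye(k)[np.asarray(row_groups)]      # n x k one-hot row membership
    C = np.eye(l)[np.asarray(col_groups)]      # m x l one-hot column membership
    ones = R.T @ A @ C                         # ones per block, k x l
    sizes = np.outer(R.sum(axis=0), C.sum(axis=0))
    # Block densities, clipped away from 0 and 1 so the logs stay finite.
    p = np.clip(ones / np.maximum(sizes, 1), eps, 1 - eps)
    row_ones = A @ C                           # ones of each row per col group
    row_zeros = C.sum(axis=0) - row_ones       # zeros of each row per col group
    # bits[r, i] = cost of encoding row r with the densities of row group i.
    bits = -(row_ones @ np.log2(p).T + row_zeros @ np.log2(1 - p).T)
    return bits.argmin(axis=1)

# Block-diagonal toy matrix; row 2 starts in the wrong group.
A = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]])
new_groups = shuffle_rows(A, [0, 0, 0, 1], [0, 0, 1, 1], k=2, l=2)
print(new_groups.tolist())
```

A column shuffle would be the same computation applied to the transpose. In practice such passes are repeated until the labels stop changing.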

Page 12: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Outline

Motivation
GraphScope
  Community discovery
  Change detection
Experiments

Page 13: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Change point detection

Option 1: Append to current segment

Page 14: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Change point detection

Option 2: Start new segment (the boundary is a change point)

Page 15: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Change point detection

Option 1: append
Option 2: split (time)

In both cases, we do row and column shuffles, splits, and/or merges.

Choose the most parsimonious option.
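The append-versus-split decision can be sketched as a comparison of total encoding costs. This is a simplified illustration under stated assumptions: `one_block_cost` is a hypothetical toy cost that treats a whole segment as a single block, and it skips the regrouping (shuffles, splits, merges) that the slide says is performed in both options. The names `choose_option` and `make_graph` are also illustrative.

```python
import math

def binary_entropy(p):
    """H(p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def one_block_cost(segment):
    """Toy cost in bits: the whole segment is encoded as one block (assumption)."""
    cells = sum(len(g) * len(g[0]) for g in segment)
    ones = sum(sum(row) for g in segment for row in g)
    return cells * binary_entropy(ones / cells) + math.log2(cells)

def choose_option(segment, new_graph, cost_fn=one_block_cost):
    """Return 'append' or 'split', whichever has the lower total cost."""
    append_cost = cost_fn(segment + [new_graph])            # option 1
    split_cost = cost_fn(segment) + cost_fn([new_graph])    # option 2
    return "append" if append_cost <= split_cost else "split"

def make_graph(n, m, ones):
    """n-by-m 0/1 matrix (list of lists) with the first `ones` cells set."""
    flat = [1] * ones + [0] * (n * m - ones)
    return [flat[r * m:(r + 1) * m] for r in range(n)]

sparse = make_graph(4, 4, 1)
dense = make_graph(4, 4, 15)
same = choose_option([sparse, sparse], sparse)    # new graph resembles the segment
changed = choose_option([sparse, sparse], dense)  # new graph looks very different
print(same, changed)
```

A snapshot that resembles the current segment is cheaper to append, while a very different one is cheaper to encode as the start of a new segment.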

Page 16: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Outline

Motivation
GraphScope
  Single timestamp
  Multiple timestamp
Experiments

Page 17: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Objectives

- Effectiveness on community discovery and change detection
- Compression benefit
- Scalable, incremental computation

Page 18: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Evolving communities: NETWORK

- 29K hosts (nodes)
- 12K edges per hour (on avg)
- 1,220 hours
- ~14.6M edges total

[Figure: evolving communities along a time axis.]

Page 19: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Community change points: ENRON

- 34K email addresses
- 12K emails per week (on avg)
- 165 weeks
- ~2M emails total

Key change points correspond to key events.

Page 20: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Compression gain

GraphScope gives 10%-150% compression gain.

Page 21: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Graph stream clustering: Scalability (NETWORK)

- 29K hosts (nodes)
- 12K edges per hour (on average)
- 1,220 hours (timestamps)
- ~14.6M edges total

< 2 sec / snapshot on avg

Page 22: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Related work

Co-clustering [Dhillon+ KDD03] [Chakrabarti+ KDD04]

Graph partitioning [Karypis+ 99]

Time-evolving graphs [Chakrabarti+ KDD06] [Chi+ KDD07] [Asur+ KDD07]

Page 23: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Summary

- Organize into few, homogeneous communities
- Find changes in community structure
- Scalable, parameter-free, incremental

Page 24: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

GraphScope: Parameter-Free Mining of Large Time-Evolving Graphs

Jimeng Sun

Spiros Papadimitriou

Philip S. Yu

Christos Faloutsos

Page 25: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Graph stream clustering

[Figure: graph snapshots at t = 0, t = 1, t = 2.]

Page 26: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Graph clustering – [Chakrabarti+ KDD’04]

[Figure: two arrangements of the same matrix into row groups and column groups, shown side by side ("versus").]

Why is this better?

Good clustering:
1. Similar nodes are grouped together
2. As few groups as necessary

This gives a few, homogeneous blocks, which implies good compression.

Page 27: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Graph clustering – [Chakrabarti+ KDD’04]

Good clustering implies good compression.
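A small numeric check makes the link between homogeneous blocks and compression concrete. The sketch below compares only the code cost (block size times entropy) of one mixed block against the same cells split into dense and sparse blocks. The matrix size and densities are hypothetical, and the description cost is ignored.

```python
import math

def binary_entropy(p):
    """Bits per cell for a block whose cells are 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def block_bits(cells, density):
    """Code cost of one block: number of cells times entropy of its density."""
    return cells * binary_entropy(density)

# Hypothetical 20-by-20 matrix (400 cells), half of them ones.
mixed = block_bits(400, 0.5)
# Same 200 ones, arranged as two dense blocks (0.95) and two sparse blocks (0.05).
homogeneous = 2 * block_bits(100, 0.95) + 2 * block_bits(100, 0.05)
print(f"mixed: {mixed:.1f} bits, homogeneous: {homogeneous:.1f} bits")
```

The grouping that produces near-uniform blocks needs far fewer bits for the same set of edges, which is the sense in which a good clustering compresses well.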

Page 28: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

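Cost objective

[Figure: an n × m adj. matrix partitioned into k = 3 row groups (sizes n1, n2, n3) and ℓ = 3 col. groups (sizes m1, m2, m3); block (i, j) has density of ones (edges) p_{i,j}.]

Code cost (assumes group partitionings, sizes, and densities are given): block (i, j) takes block size times entropy, $n_i m_j\, H(p_{i,j})$ bits, e.g. $n_1 m_2\, H(p_{1,2})$ bits for block (1,2), for a total of

$$\sum_{i,j} n_i m_j\, H(p_{i,j}) \text{ bits}$$

Description cost:

$$\sum_i (\text{row-partition}_i \text{ description}) + \sum_j (\text{col-partition}_j \text{ description}) + \sum_{i,j} \log (n_i m_j)$$

where the last sum is the cost to transmit the number of edges $e_{i,j}$ of each block.

Total cost = code cost + description cost.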

Page 29: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Graph clustering: Scalability

[Figure: Time vs. Size. Time (sec) against number of edges, with curves for Splits and Shuffles.]

Linear in the number of edges: scalable.

Page 30: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Cost objective

Total cost = code cost (blocks) + description cost (blocks' model)

One row group, one col group: code cost high, description cost low.
n row groups, m col groups: code cost low, description cost high.

Page 31: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Cost objective

Total cost = code cost (blocks) + description cost (blocks' model)

k = 3 row groups, ℓ = 3 col groups: code cost low, description cost low.

Page 32: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Search for optimum

[Figure: Cost vs. number of groups. Bit cost plotted against k, with three labeled points: one row group, one col group; k = 3 row groups, ℓ = 3 col groups; n row groups, m col groups.]

Page 33: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Search for optimum: Summary

Starting from k = 1, ℓ = 1, alternate split and shuffle steps:

k=1, ℓ=1 → k=1, ℓ=2 → k=2, ℓ=2 → k=2, ℓ=3 → k=3, ℓ=3 → k=3, ℓ=4 → k=4, ℓ=4 → k=4, ℓ=5 → k=5, ℓ=5

- Split: Increase k or ℓ
- Shuffle: Rearrange rows and cols
- Merge: Decrease k or ℓ

Page 34: GraphScope : Parameter-Free Mining of Large Time-Evolving Graphs

Graph clustering – [Chakrabarti+ KDD’04]

Given a graph of interactions or associations:
- Customers to products
- Documents to terms
- People to people
- Computer communications
- Financial transactions

Find simultaneously:
- Communities (source and destination)
- Their number