jure leskovec, cornell/stanford university - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfthe network...

48
Jure Leskovec, Cornell/Stanford University Joint work with Kevin Lang, Anirban Dasgupta and Michael Mahoney, Yahoo! Research

Upload: phungtruc

Post on 20-Mar-2018

224 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Jure Leskovec, Cornell/Stanford UniversityJoint work with Kevin Lang, Anirban Dasgupta and Michael Mahoney, Yahoo! Research

Page 2: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Network: an interaction graph: • Nodes represent “entities”• Edges represent “interaction” between pairs of entities

2

Page 3: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Are there natural clusters, communities, partitions, etc.?

Concept-based clusters, link-based clusters, density-based clusters, …

3

Page 4: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Bid, click and impression information for “keyword x advertiser” pair

Mine information at query-time to provide new ads

Maximize CTR, RPS, advertiser ROI

4

Page 5: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Find micro-markets by partitioning the “query x advertiser” graph:

advertiser

qu

ery

5

Page 6: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Linear (low-rank) methods: If Gaussian, then low-rank space is good

Kernel (non-linear) methods: If low-dimensional manifold, then kernels are

good Hierarchical methods: Top-down and bottom-up – common in social

sciences Graph partitioning methods: Define “edge counting” metric – conductance,

expansion, modularity, etc. – and optimize!

“It is a matter of common experience that communities exist in networks ... Although not precisely defined, communities are usually thought of as sets of nodes with better connections amongst its members than with the rest of the world.”

6

Page 7: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Communities:

Sets of nodes with lots of connections inside and few to outside (the rest of the network)

Assumption:

Networks are (hierarchically) composed of communities Communities, clusters,

groups, modules7

Page 8: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Communities:

Sets of nodes with lots of connections inside and few to outside (the rest of the network)

Assumption:

Networks are (hierarchically) composed of communities

8

Question:Are large networks

really like this?

Hierarchical community structure

Page 9: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Let A be the adjacency matrix of G=(V,E).

The conductance of a set S of nodes is:

The Network Community Profile (NCP) plot of the graph is:

How community like is a set of nodes? S

S’

9

Page 10: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Score: Φ(S) = # edges cut / # edges inside

What is “best” community of

5 nodes?

10

Page 11: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Score: Φ(S) = # edges cut / # edges inside

Bad community

Φ=5/6 = 0.83

What is “best” community of

5 nodes?

11

Page 12: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Score: Φ(S) = # edges cut / # edges inside

Better community

Φ=5/7 = 0.7

Bad community

Φ=2/5 = 0.4

What is “best” community of

5 nodes?

12

Page 13: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Score: Φ(S) = # edges cut / # edges inside

Better community

Φ=5/7 = 0.7

Bad community

Φ=2/5 = 0.4

Best community

Φ=2/8 = 0.25

What is “best” community of

5 nodes?

13

Page 14: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Network community profile (NCP) plot

Plot the score of best community of size k

14

Community size, log k

log Φ(k)Φ(5)=0.25

Φ(7)=0.18

k=5 k=7

Page 15: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Idea: Use approximation algorithms for NP-hard graph partitioning problems as experimental probes of network structure. Spectral (quadratic approx): confuses “long paths” with “deep

cuts” Multi-commodity flow (log(n) approx): difficulty with expanders SDP (sqrt(log(n)) approx): best in theory Metis (multi-resolution heuristic): common in practice X+MQI: post-processing step on, e.g., MQI of Metis

Local Spectral - connected and tighter sets (empirically) Metis+MQI - best conductance (empirically)

We are not interested in partitions per se, but in probing network structure

15

Page 16: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

d-dimensional meshes California road network

16

Page 17: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Zachary’s university karate club social network

During the study club split into 2

The split (squares vs. circles) corresponds to cut B

17

Page 18: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Collaborations between scientists in Networks [Newman, 2005]

18

Page 19: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

19

[Ravasz&Barabasi, 2003]

[Clauset,Moore&Newman, 2008]

Page 20: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Previously researchers examined community structure of small networks (~100 nodes)

We examined more than 100 different largenetworks

Large networks look very different!

20

Page 21: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Typical example:General relativity collaboration network (4,158 nodes, 13,422 edges)

21

Page 22: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Φ(k

), (c

on

du

ctan

ce)

k, (community size)

Better and better communities

Communities get worse and worse

Best community has ~100 nodes

22

Page 23: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Definition: Whisker is a maximal set of nodes connected to the network by a single edge

Whiskers are responsible for downward slope of NCP plot

NCP plot

Largest whisker

23

Page 24: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Network structure: Core-periphery

(jellyfish, octopus)

Whiskers are responsible for

good communities

Denser and denser core of

the network

Core contains ~60% nodes and

~80% edges

24

Page 25: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Each new edge inside the community costs more

NCP plot

Φ=2/4 = 0.5

Φ=8/6 = 1.3

Φ=64/14 = 4.5

Each node has twice as many children

Φ=1/3 = 0.33

25

Page 26: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

26

Whiskers:

Edge to cut

Whiskers in real networks are non-trivial (richer than trees)

Page 27: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

27

WhiskersWhiskers in real networks are larger than

expected based on density and degree sequence

Page 28: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

28

Page 29: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

29

Nothing happens!Now we have 2-edge connected whiskers to

deal with. Indicates the recursiveness of our core-periphery structure: as we remove the

periphery, the core itself breaks into core and the periphery

Page 30: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

What if we allow cuts that give disconnected communities?

•Cut all whiskers •Compose communities out of whiskers•How good “community” do we get?

30

Page 31: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

LiveJournal

Rewired network

Local spectral

Bag-of-whiskers

Metis+MQI

31

Page 32: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Regularization properties: spectral embeddings stretch along directions in which the random-walk mixes slowly Resulting hyperplane cuts have "good" conductance

cuts, but may not yield the optimal cuts

spectral embedding flow based embedding

32

Page 33: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Metis+MQI (red) gives sets with better conductance.

Local Spectral (blue) gives tighter and more well-rounded sets.

33

ext/

int

Do

ts a

re c

on

nec

ted

clu

ster

s

Page 34: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Two ca. 500 node communities from Local Spectral:

Two ca. 500 node communities from Metis+MQI:

34

Page 35: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

... can be computed from:

Spectral embedding (independent of balance)

SDP-based methods (for volume-balanced partitions)

35

Page 36: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

What is a good model that explains such network structure?

None of the existing models work

Pref. attachment Small World Geometric Pref. Attachment

FlatDown and Flat

Flat and Down

36

Page 37: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

37

Note: Sparsity is the issue, not heavy-tails per se. (Power laws with 2< <3 give us the appropriate sparsity)

Page 38: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Forest Fire [LKF05]:connections spread like a fire New node joins the network Selects a seed node Connects to some of its neighbors Continue recursively

38

Notes:• Preferential attachment flavor - second neighbor is not uniform at random.• Copying flavor - since burn seed’s neighbors.• Hierarchical flavor - seed is parent.• “Local” flavor - burn “near” --in a diffusion sense -- the seed vertex.

As community grows it blendsinto the core of

the network

Page 39: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

rewired

network

Bag of whiskers

39

Page 40: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Whiskers:

Largest whisker has ~100 nodes

Whisker size is independent of network size

Core:

60% of the nodes, 80% edges

Core has little structure (hard to cut)

Still more structure than the random network

40

Page 41: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

The Dunbar number 150 individuals is maximum community size On-line communities have 60 members and break down at around

80, military, churches, divisions, etc. – all close to the Dunbar's 150

Common bond vs. common identity theory Common bond (people are attached to individual community

members) are smaller and more cohesive Common identity (people are attached to the group as a whole)

focused around common interest and tend to be larger and more diverse

What edges “mean” and community identification social networks - reasons an individual adds a link to a friend can

vary enormously citation networks or web graphs - links are more “expensive” and

are more semantically uniform41

Page 42: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Networks with “ground truth” communities:

LiveJournal12: users create and explicitly join on-line groups

DBLP co-authorships: publication venues can be viewed as communities

Amazon product co-purchasing: each item belongs to one or more hierarchically organized

categories, as defined by Amazon

IMDB collaboration: countries of production and languages may be viewed as

communities42

Page 43: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

LiveJournal DBLP

Amazon IMDB

Rewired

NetworkGround truth

43

Page 44: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

NCP plot is a way to analyze network community structure

Our results agree with previous work on small networks (people did not hit the Dunbar’s limit)

But large networks are different:

Whiskers + Core (core-periphery) structure

Small well isolated communities blend into the core of the networks as they grow

44

Page 45: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

45

Page 46: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Assume a recursive Kronecker model. Fit it to G. We get K =

What does this tell about the network structure?

46

0.9 0.5

0.5 0.1

Core0.9 edges

Periphery0.1 edges

0.5 edges

0.5 edges

As opposed to:

0.9 0.1

0.1 0.9

which gives a hierarchyCore-peripheryNo communities

No good cuts 0.9 0.9

0.1

0.1

Page 47: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

Assume a recursive Kronecker model. Fit it to G. We get K =

What does this tell about the network structure?

47

0.9 0.5

0.5 0.1

Core0.9 edges

Periphery0.1 edges

0.5 edges

0.5 edges

As opposed to:

0.9 0.1

0.1 0.9

which gives a hierarchy

0.9 0.9

0.1

0.1

Page 48: Jure Leskovec, Cornell/Stanford University - cs.cmu.edujure/pub/ncp-cornell-oct08.pdfThe Network Community Profile (NCP) plot of the graph is: How community like is a set of nodes?

48