Statistical properties of network community structure

Jure Leskovec, CMU
Kevin Lang, Anirban Dasgupta and Michael Mahoney, Yahoo! Research

Post on 23-Feb-2016

Page 1: Statistical properties of network community structure

Statistical properties of network community structure
Jure Leskovec, CMU
Kevin Lang, Anirban Dasgupta and Michael Mahoney, Yahoo! Research

Page 2: Statistical properties of network community structure

Network communities

Communities: sets of nodes with lots of connections inside and few to the outside (the rest of the network)

Assumption: networks are (hierarchically) composed of communities (modules)

Communities, clusters, groups, modules

Our question: Are large networks really like this?

Page 3: Statistical properties of network community structure

Community score (quality)

How community-like is a set of nodes?
Need a natural, intuitive measure

Conductance (normalized cut): Φ(S) = # edges cut / # edges inside
Small Φ(S) corresponds to more community-like sets of nodes

[Figure labels: S, S’]
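The score on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the toy graph (two triangles joined by one edge) are assumptions made for the example. The second function shows the textbook form of conductance, which divides the cut by the smaller of the two edge volumes (sums of degrees) rather than by the number of inside edges, so the two scores differ in scale.

```python
def community_score(edges, S):
    """Slide-style score: (# edges cut) / (# edges inside S)."""
    S = set(S)
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    inside = sum(1 for u, v in edges if u in S and v in S)
    return cut / inside if inside else float("inf")


def conductance(edges, S):
    """Textbook conductance: cut / min(vol(S), vol(rest)), vol = sum of degrees."""
    S = set(S)
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    vol_S = sum((u in S) + (v in S) for u, v in edges)
    vol_rest = 2 * len(edges) - vol_S
    denom = min(vol_S, vol_rest)
    return cut / denom if denom else float("inf")


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
slide_score = community_score(edges, {0, 1, 2})  # 1 cut edge / 3 inside edges
textbook = conductance(edges, {0, 1, 2})         # 1 cut edge / volume 7
```

Both scores agree that the triangle is a good community; lower values mean a more community-like set under either definition.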

Page 4: Statistical properties of network community structure

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

What is the “best” community of 5 nodes?

Page 5: Statistical properties of network community structure

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

What is the “best” community of 5 nodes?

Bad community: Φ = 5/6 = 0.83

Page 6: Statistical properties of network community structure

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

What is the “best” community of 5 nodes?

Bad community: Φ = 5/7 = 0.7
Better community: Φ = 2/5 = 0.4

Page 7: Statistical properties of network community structure

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

What is the “best” community of 5 nodes?

Bad community: Φ = 5/7 = 0.7
Better community: Φ = 2/5 = 0.4
Best community: Φ = 2/8 = 0.25

Page 8: Statistical properties of network community structure

Network Community Profile Plot

We define the network community profile (NCP) plot:
plot the score of the best community of size k

Search over all subsets of size k and find the best: Φ(k=5) = 0.25

The NCP plot is intractable to compute exactly, so we use an approximation algorithm
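To make the definition concrete, here is a brute-force sketch that enumerates every subset of size k and keeps the lowest score. It is only feasible on tiny graphs, which is exactly why the slide calls for an approximation algorithm; it is not the approximation the authors used. The function names and the toy graph are assumptions for illustration.

```python
from itertools import combinations


def score(edges, S):
    """Slide-style score: (# edges cut) / (# edges inside S)."""
    cut = sum(1 for u, v in edges if (u in S) != (v in S))
    inside = sum(1 for u, v in edges if u in S and v in S)
    return cut / inside if inside else float("inf")


def ncp_brute_force(nodes, edges, max_k):
    """Best (lowest) score over all node subsets of each size k = 1..max_k."""
    profile = {}
    for k in range(1, max_k + 1):
        # Exhaustive search: the number of subsets grows combinatorially in k.
        profile[k] = min(score(edges, set(c)) for c in combinations(nodes, k))
    return profile


# Hypothetical toy graph: two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
profile = ncp_brute_force(range(6), edges, max_k=3)
```

On this toy graph the profile drops sharply at k = 3, where a whole triangle can be cut off by a single edge.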

Page 9: Statistical properties of network community structure

NCP plot: Small Social Network

Dolphin social network
Two communities of dolphins

[Figure panels: Network; NCP plot]

Page 10: Statistical properties of network community structure

NCP plot: Zachary’s karate club

Zachary’s university karate club social network
During the study the club split into 2
The split (squares vs. circles) corresponds to cut B

[Figure panels: Network; NCP plot]

Page 11: Statistical properties of network community structure

NCP plot: Network Science

Collaborations between scientists in network science

[Figure panels: Network; NCP plot]

Page 12: Statistical properties of network community structure

Geometric and Hierarchical graphs

[Figure panels: Hierarchical network; Geometric (grid-like) network]

– Small social networks,
– geometric and
– hierarchical networks
have a downward NCP plot

Page 13: Statistical properties of network community structure

Our work: Large networks

Previously, researchers examined the community structure of small networks (~100 nodes)
We examined more than 70 different large social and information networks

Large real-world networks look very different!

Page 14: Statistical properties of network community structure

Example of our findings

Typical example: General relativity collaboration network (4,158 nodes, 13,422 edges)

Page 15: Statistical properties of network community structure

NCP: LiveJournal (N=5M, E=42M)

[Figure: NCP plot; x-axis: community size; y-axis: community score]

Better and better communities
Best communities get worse and worse
Best community has 100 nodes

Page 16: Statistical properties of network community structure

Explanation: Downward part

Whiskers are responsible for the downward slope of the NCP plot
A whisker is a set of nodes connected to the network by a single edge

[Figure labels: NCP plot; Largest whisker]
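One simple way to find whiskers under this definition is to test every edge: if removing it disconnects the graph, the smaller side hangs off the network by that single edge. The sketch below does this by brute force, assuming a connected, simple, undirected graph. The function names and the toy graph are hypothetical, and a linear-time bridge-finding algorithm would be the practical choice on large networks.

```python
from collections import defaultdict, deque


def reachable(adj, start, banned):
    """Nodes reachable from start without crossing the banned edge."""
    a, b = banned
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if (u, v) in ((a, b), (b, a)):
                continue  # pretend this edge was removed
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def find_whiskers(edges):
    """Smaller side of every single-edge cut (assumes a connected graph)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    n = len(adj)
    whiskers = []
    for u, v in edges:
        side = reachable(adj, u, (u, v))
        if v in side:
            continue  # graph stays connected, so (u, v) is not a single-edge cut
        if len(side) > n - len(side):
            side = set(adj) - side  # keep the smaller side
        whiskers.append(frozenset(side))
    return whiskers


# Hypothetical toy graph: a 5-cycle "core" with a 3-node whisker hanging off node 4.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (4, 5), (5, 6), (5, 7)]
whiskers = find_whiskers(edges)
largest = max(whiskers, key=len)
```

Nested whiskers are reported separately here (nodes 6 and 7 are each whiskers inside the larger one), so taking the largest recovers the maximal whisker in this example.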

Page 17: Statistical properties of network community structure

Explanation: Upward part

Each new edge inside the community costs more

[Figure: NCP plot and example communities; each node has twice as many children]
Φ = 1/3 = 0.33
Φ = 2/4 = 0.5
Φ = 8/6 = 1.3
Φ = 64/14 = 4.6

Page 18: Statistical properties of network community structure

Suggested network structure

Network structure: core-periphery (jellyfish, octopus)

Whiskers are responsible for good communities
Denser and denser core of the network
Core contains 60% of the nodes and 80% of the edges

Page 19: Statistical properties of network community structure

Caveat: Bag of whiskers

What if we allow cuts that give disconnected communities?

• Cut all whiskers
• Compose communities out of whiskers
• How good a “community” do we get?
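The bullet points above can be sketched as a small knapsack-style search. Assuming the whiskers are disjoint and each is attached by exactly one edge, a bag of c whiskers has c cut edges, and its inside edges are the sum of the whiskers' internal edges. The function name, the (nodes, internal edges) input format, and the sample whiskers are all assumptions for illustration; this is not the authors' procedure.

```python
def best_bag_scores(whiskers, max_k):
    """Best slide-style score of a bag of whiskers for each total size k.

    whiskers: list of (num_nodes, num_internal_edges) pairs, assumed disjoint
    and each connected to the rest of the network by a single edge.
    """
    # best[(k, c)] = most internal edges using c whiskers with k nodes in total
    best = {(0, 0): 0}
    for n_i, m_i in whiskers:
        # Iterate over a snapshot so each whisker is used at most once.
        for (k, c), m in list(best.items()):
            key = (k + n_i, c + 1)
            if key[0] <= max_k and best.get(key, -1) < m + m_i:
                best[key] = m + m_i
    scores = {}
    for (k, c), m in best.items():
        if c > 0 and m > 0:
            # c cut edges (one per whisker) over m inside edges
            scores[k] = min(scores.get(k, float("inf")), c / m)
    return scores


# Hypothetical whiskers: a 3-node path, a triangle, and a 4-node cycle.
whiskers = [(3, 2), (3, 3), (4, 4)]
scores = best_bag_scores(whiskers, max_k=10)
```

Sizes that cannot be formed from whole whiskers (5 in this example) simply have no entry in the result.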

Page 20: Statistical properties of network community structure

Communities made of whiskers

[Figure: community score vs. community size; curves: Connected communities; Bag of whiskers]

We get better community scores when composing disconnected sets of whiskers

Page 21: Statistical properties of network community structure

Comparison to a rewired network

Take a real network G
Rewire edges for a long time
We obtain a random graph with the same degree distribution as the real network G
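A common way to do this kind of rewiring is the degree-preserving edge swap: pick two edges (a, b) and (c, d) and replace them with (a, d) and (c, b), rejecting swaps that would create a self-loop or a duplicate edge. The sketch below assumes this technique; the slides do not say which procedure was used. The function names, the swap count, and the toy graph are illustrative.

```python
import random
from collections import Counter


def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def rewire(edges, num_swaps, seed=0):
    """Degree-preserving rewiring of a simple undirected graph by edge swaps."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    while done < num_swaps and attempts < 100 * num_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # try both orientations of the second edge
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present or new1 == new2:
            continue  # would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
        done += 1
    return edges


# Hypothetical toy graph: a 6-cycle with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
rewired = rewire(edges, num_swaps=20, seed=1)
```

Every accepted swap leaves each node's degree unchanged, so the rewired graph has the same degree sequence as the input while the rest of its structure is randomized.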

Page 22: Statistical properties of network community structure

Comparison to a rewired network

Rewired network: random network with the same degree distribution

Page 23: Statistical properties of network community structure

What is a good model?

What is a good model that explains such network structure?
None of the existing models work

[Figure panels: Pref. attachment; Small World; Geometric Pref. Attachment]
[NCP plot shape labels: Flat; Down and Flat; Flat and Down]

Page 24: Statistical properties of network community structure

Forest Fire model works

Forest Fire: connections spread like a fire
• New node joins the network
• Selects a seed node
• Connects to some of its neighbors
• Continue recursively

As a community grows it blends into the core of the network
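The four steps above can be sketched as a simplified, undirected variant of the model. The burn probability p, the geometric draw for how many neighbors to burn, and the function name are assumptions made for this illustration; the full Forest Fire model has additional parameters that the slide does not spell out.

```python
import random


def forest_fire(n, p=0.3, seed=0):
    """Grow an undirected graph of n nodes with a simplified forest-fire process."""
    rng = random.Random(seed)
    adj = {0: set()}
    for new in range(1, n):
        seed_node = rng.randrange(new)  # pick a seed among existing nodes
        burned = {seed_node}
        frontier = [seed_node]
        while frontier:
            u = frontier.pop()
            # Number of neighbors to burn: geometric, mean p / (1 - p).
            x = 0
            while rng.random() < p:
                x += 1
            candidates = sorted(v for v in adj[u] if v not in burned)
            rng.shuffle(candidates)
            for v in candidates[:x]:
                burned.add(v)
                frontier.append(v)  # the fire continues recursively from v
        # The new node links to every node the fire reached.
        adj[new] = set()
        for v in burned:
            adj[new].add(v)
            adj[v].add(new)
    return adj


adj = forest_fire(50, p=0.3, seed=1)
```

Larger values of p let the fire spread further, so each new node links deeper into the existing graph.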

Page 25: Statistical properties of network community structure

Forest Fire NCP plot

[Figure: NCP plot; curve labels: rewired network; Bag of whiskers]

Page 26: Statistical properties of network community structure

Conclusion and connections

Whiskers:
• Largest whisker has ~100 nodes
• Independent of network size
• Dunbar number: a person can maintain social relationships with at most 150 people
• Bond vs. identity communities

Core:
• Core has little structure (hard to cut)
• Still more structure than the random network

Page 27: Statistical properties of network community structure

Conclusion and connections

NCP plot is a way to analyze network community structure
Our results agree with previous work on small networks (that are commonly used for testing community finding algorithms)
But large networks are different

Large networks:
• Whiskers + Core structure
• Small, well-isolated communities blend into the core of the network as they grow