statistical properties of network community structure

26
Statistical properties of network community structure Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research

Upload: cade

Post on 23-Feb-2016

30 views

Category:

Documents


0 download

DESCRIPTION

Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research. Statistical properties of network community structure. Network communities. Communities: Sets of nodes with lots of connections inside and few to outside (the rest of the network) Assumption: - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Statistical properties of network community structure

Statistical properties of network community structureJure Leskovec, CMUKevin Lang, Anirban Dasgupta and Michael MahoneyYahoo! Research

Page 2: Statistical properties of network community structure

Network communities

Communities: Sets of nodes with lots

of connections inside and few to outside (the rest of the network)

Assumption: Networks are

(hierarchically) composed of communities (modules)

Communities, clusters, groups,

modules

Page 3: Statistical properties of network community structure

Community score (quality) How community like is a

set of nodes? Want a measure that

corresponds to intuition Conductance

(normalized cut):Φ(S) = # edges cut / # edges inside

Small Φ(S) corresponds to more community-like sets of nodes

Page 4: Statistical properties of network community structure

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

Page 5: Statistical properties of network community structure

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

Bad communit

yΦ=5/7 = 0.7

Page 6: Statistical properties of network community structure

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

Better communit

y

Φ=5/7 = 0.7

Bad communit

y

Φ=2/5 = 0.4

Page 7: Statistical properties of network community structure

Community score (quality)

Score: Φ(S) = # edges cut / # edges inside

Better communit

y

Φ=5/7 = 0.7

Bad communit

y

Φ=2/5 = 0.4

Best communit

yΦ=2/8 = 0.25

Page 8: Statistical properties of network community structure

Network Community Profile Plot We define:

Network community profile (NCP) plotPlot the score of best community of size k

Search over all subsets of size k and find best: Φ(k=5) = 0.25

NCP plot is intractable to compute Use approximation algorithm

Page 9: Statistical properties of network community structure

NCP plot: Small Social Network Dolphin social network

Two communities of dolphins

NCP plotNetwork

Page 10: Statistical properties of network community structure

NCP plot: Zachary’s karate club

Zachary’s university karate club social network During the study club split into 2 The split (squares vs. circles) corresponds to cut B

NCP plotNetwork

Page 11: Statistical properties of network community structure

NCP plot: Network Science Collaborations between scientists in Networks

NCP plotNetwork

Page 12: Statistical properties of network community structure

Geometric and Hierarchical graphs

Hierarchical network

Geometric (grid-like) network – Small social

networks– Geometric and– Hierarchical networkhave downward NCP plot

Page 13: Statistical properties of network community structure

Our work: Large networks Previously researchers examined community

structure of small networks (~100 nodes) We examined more than 70 different large

social and information networks

Large real-world networks look

completely different!

Page 14: Statistical properties of network community structure

Example of our findings Typical example:

General relativity collaboration network (4,158 nodes, 13,422 edges)

Page 15: Statistical properties of network community structure

NCP: LiveJournal (N=5M, E=42M)

Better and better

communities

Worse and worse

communities

Best community

has 100 nodes

Com

mun

ity sc

ore

Community size

Page 16: Statistical properties of network community structure

Explanation: Downward part Whiskers are responsible for

downward slope of NCP plot

Whisker is a set of nodes connected to the network by a single

edge

NCP plot

Largest whisker

Page 17: Statistical properties of network community structure

Explanation: Upward part Each new edge inside the

community costs moreNCP plot

Φ=2/1 = 2Φ=8/3 = 2.6

Φ=64/11 = 5.8

Each node has twice as many

children

Page 18: Statistical properties of network community structure

Suggested Network Structure

Network structure: Core-

periphery, jellyfish, octopus

Whiskers are responsible for

good communities

Denser and denser core

of the network

Core contains 60%

node and 80% edges

Page 19: Statistical properties of network community structure

Caveat: Bag of whiskersWhat if we allow

cuts that give disconnected communities?

Cut all whiskers and compose

communities out of them

Page 20: Statistical properties of network community structure

Caveat: Bag of whiskersCo

mm

unity

scor

e

Community size

We get better community scores when

composing disconnected sets of

whiskersConnected communitie

sBag of whiskers

Page 21: Statistical properties of network community structure

Comparison to a rewired network

Rewired network: random

network with same degree distribution

Page 22: Statistical properties of network community structure

What is a good model?

What is a good model that explains such network structure?

None of the existing models work

Pref. attachment Small World Geometric Pref. Attachment

FlatDown and Flat

Flat and Down

Page 23: Statistical properties of network community structure

Forest Fire model works Forest Fire: connections spread like a fire

New node joins the network Selects a seed node Connects to some of its neighbors Continue recursively

As community grows it

blends into the core of

the network

Page 24: Statistical properties of network community structure

Forest Fire NCP plot

rewired

network

Bag of whisk

ers

Page 25: Statistical properties of network community structure

Conclusion and connections Whiskers:

Largest whisker has ~100 nodes Independent of network size Dunbar number: a person can maintain social

relationship to 150 people Bond vs. identity communites

Core: Newman et al. analyzed 400k node product network▪ Largest community has 50% nodes▪ Community was labeled “miscelaneous”

Page 26: Statistical properties of network community structure

Conclusion and connections NCP plot is a way to analyze network

community structure Our results agree with previous work on small

networks But large networks are fundamentally

different Large networks have core-periphery structure

Small well isolated communities blend into the core of the networks as they grow