mtaap12: scalable community detection

35
Scalable Multi-threaded Community Detection in Social Networks E. Jason Riedy 1 , David A. Bader 1 , and Henning Meyerhenke 2 1 School of Comp. Science and Engineering, Georgia Inst. of Technology 2 Inst. of Theoretical Informatics, Karlsruhe Inst. of Technology (KIT) 25 May 2012

Upload: jason-riedy

Post on 29-Aug-2014

1.252 views

Category:

Technology


0 download

DESCRIPTION

Presented at MTAAP12 in Shanghai.

TRANSCRIPT

Page 1: MTAAP12: Scalable Community Detection

Scalable Multi-threaded Community Detectionin Social Networks

E. Jason Riedy1, David A. Bader1, and Henning Meyerhenke2

1School of Comp. Science and Engineering, Georgia Inst. of Technology2Inst. of Theoretical Informatics, Karlsruhe Inst. of Technology (KIT)

25 May 2012

Page 2: MTAAP12: Scalable Community Detection

Exascale data analysis

Health care Finding outbreaks, population epidemiology

Social networks Advertising, searching, grouping

Intelligence Decisions at scale, regulating algorithms

Systems biology Understanding interactions, drug design

Power grid Disruptions, conservation

Simulation Discrete events, cracking meshes

• Graph clustering is common in all application areas.

MTAAP 2012—Scalable Community Detection—Jason Riedy 2/35

Page 3: MTAAP12: Scalable Community Detection

These are not easy graphs.Yifan Hu’s (AT&T) visualization of the in-2004 data set

http://www2.research.att.com/~yifanhu/gallery.html

MTAAP 2012—Scalable Community Detection—Jason Riedy 3/35

Page 4: MTAAP12: Scalable Community Detection

But no shortage of structure...

Protein interactions, Giot et al., “A ProteinInteraction Map of Drosophila melanogaster”,Science 302, 1722-1736, 2003.

Jason’s network via LinkedIn Labs

• Locally, there are clusters or communities.

• First pass over a massive social graph:• Find smaller communities of interest.• Analyze / visualize top-ranked communities.

• Our part: Community detection at massive scale. (Or kindalarge, given available data.)

MTAAP 2012—Scalable Community Detection—Jason Riedy 4/35

Page 5: MTAAP12: Scalable Community Detection

Outline

Motivation

Defining community detection and metrics

Shooting for massive graphs

Our parallel method

Implementation and platform details

Performance

Conclusions and plans

MTAAP 2012—Scalable Community Detection—Jason Riedy 5/35

Page 6: MTAAP12: Scalable Community Detection

Community detection

What do we mean?• Partition a graph’s

vertices into disjointcommunities.

• A community locallymaximizes some metric.• Modularity,

conductance, ...

• Trying to capture thatvertices are more similarwithin one communitythan betweencommunities. Jason’s network via LinkedIn Labs

MTAAP 2012—Scalable Community Detection—Jason Riedy 6/35

Page 7: MTAAP12: Scalable Community Detection

Community detection

Assumptions

• Disjoint partitioning ofvertices.

• There is no one uniqueanswer.• Many metrics are

NP-complete tooptimize (Brandes, etal.[1]).

• Graph is lossyrepresentation.

• Want an adaptabledetection method. Jason’s network via LinkedIn Labs

MTAAP 2012—Scalable Community Detection—Jason Riedy 7/35

Page 8: MTAAP12: Scalable Community Detection

Common community metric: Modularity

• Modularity: Deviation of connectivity in the communityinduced by a vertex set S from some expected backgroundmodel of connectivity.

• We take Newman [2]’s basic uniform model.

• Let m count all edges in graph G, mS count of edges withboth endpoints in S, and xS count the edges with anyendpoint in S. Modularity QS :

QS = (mS − x2S/4m)/m

• Total modularity: sum of modularities of disjoint subsets.

• A sufficiently positive modularity implies some structure.

• Known issues: Resolution limit, NP-complete opt. prob.

MTAAP 2012—Scalable Community Detection—Jason Riedy 8/35

Page 9: MTAAP12: Scalable Community Detection

Can we tackle massive graphs now?

Parallel, of course...• Massive needs distributed memory, right?

• Well... Not really. Can buy a 2 TiB Intel-based Dell serveron-line for around $200k USD, a 1.5 TiB from IBM, etc.

Image: dell.com.

Not an endorsement, just evidence!

• Publicly available “real-world” data fits...

• Start with shared memory to see what needs done.

• Specialized architectures provide larger shared-memory viewsover distributed implementations (e.g. Cray XMT).

MTAAP 2012—Scalable Community Detection—Jason Riedy 9/35

Page 10: MTAAP12: Scalable Community Detection

Multi-threaded algorithm design points

A scalable multi-threaded graph analysis algorithm

• ... avoids global locks and frequent global synchronization.

• ... distributes computation over edges rather than only vertices.

• ... works with data as local to an edge as possible.

• ... uses compact data structures that agglomerate memoryreferences.

MTAAP 2012—Scalable Community Detection—Jason Riedy 10/35

Page 11: MTAAP12: Scalable Community Detection

Sequential agglomerative method

A

B

C

D

E

FG

• A common method (e.g. Clauset, etal. [3]) agglomerates vertices intocommunities.

• Each vertex begins in its owncommunity.

• An edge is chosen to contract.• Merging maximally increases

modularity.• Priority queue.

• Known often to fall into an O(n2)performance trap with modularity(Wakita & Tsurumi [4]).

MTAAP 2012—Scalable Community Detection—Jason Riedy 11/35

Page 12: MTAAP12: Scalable Community Detection

Sequential agglomerative method

A

B

C

D

E

FG

C

B

• A common method (e.g. Clauset, etal. [3]) agglomerates vertices intocommunities.

• Each vertex begins in its owncommunity.

• An edge is chosen to contract.• Merging maximally increases

modularity.• Priority queue.

• Known often to fall into an O(n2)performance trap with modularity(Wakita & Tsurumi [4]).

MTAAP 2012—Scalable Community Detection—Jason Riedy 12/35

Page 13: MTAAP12: Scalable Community Detection

Sequential agglomerative method

A

B

C

D

E

FG

C

B

D

A

• A common method (e.g. Clauset, etal. [3]) agglomerates vertices intocommunities.

• Each vertex begins in its owncommunity.

• An edge is chosen to contract.• Merging maximally increases

modularity.• Priority queue.

• Known often to fall into an O(n2)performance trap with modularity(Wakita & Tsurumi [4]).

MTAAP 2012—Scalable Community Detection—Jason Riedy 13/35

Page 14: MTAAP12: Scalable Community Detection

Sequential agglomerative method

A

B

C

D

E

FG

C

B

D

A

B

C

• A common method (e.g. Clauset, etal. [3]) agglomerates vertices intocommunities.

• Each vertex begins in its owncommunity.

• An edge is chosen to contract.• Merging maximally increases

modularity.• Priority queue.

• Known often to fall into an O(n2)performance trap with modularity(Wakita & Tsurumi [4]).

MTAAP 2012—Scalable Community Detection—Jason Riedy 14/35

Page 15: MTAAP12: Scalable Community Detection

Parallel agglomerative method

A

B

C

D

E

FG

• We use a matching to avoid the queue.

• Compute a heavy weight, largematching.• Simple greedy algorithm.• Maximal matching.• Within factor of 2 in weight.

• Merge all matched communities atonce.

• Maintains some balance.

• Produces different results.

• Agnostic to weighting, matching...• Can maximize modularity, minimize

conductance.• Modifying matching permits easy

exploration.

MTAAP 2012—Scalable Community Detection—Jason Riedy 15/35

Page 16: MTAAP12: Scalable Community Detection

Parallel agglomerative method

A

B

C

D

E

FG

C

D

G

• We use a matching to avoid the queue.

• Compute a heavy weight, largematching.• Simple greedy algorithm.• Maximal matching.• Within factor of 2 in weight.

• Merge all matched communities atonce.

• Maintains some balance.

• Produces different results.

• Agnostic to weighting, matching...• Can maximize modularity, minimize

conductance.• Modifying matching permits easy

exploration.

MTAAP 2012—Scalable Community Detection—Jason Riedy 16/35

Page 17: MTAAP12: Scalable Community Detection

Parallel agglomerative method

A

B

C

D

E

FG

C

D

G

E

B

C

• We use a matching to avoid the queue.

• Compute a heavy weight, largematching.• Simple greedy algorithm.• Maximal matching.• Within factor of 2 in weight.

• Merge all matched communities atonce.

• Maintains some balance.

• Produces different results.

• Agnostic to weighting, matching...• Can maximize modularity, minimize

conductance.• Modifying matching permits easy

exploration.

MTAAP 2012—Scalable Community Detection—Jason Riedy 17/35

Page 18: MTAAP12: Scalable Community Detection

Platform: Cray XMT2

Tolerates latency by massive multithreading.

• Hardware: 128 threads per processor• Context switch on every cycle (500 MHz)• Many outstanding memory requests (180/proc)• “No” caches...

• Flexibly supports dynamic load balancing• Globally hashed address space, no data cache

• Support for fine-grained, word-level synchronization• Full/empty bit on with every memory word

• 64 processor XMT2 at CSCS,the Swiss NationalSupercomputer Centre.

• 500 MHz processors, 8192threads, 2 TiB of sharedmemory

Image: cray.com

MTAAP 2012—Scalable Community Detection—Jason Riedy 18/35

Page 19: MTAAP12: Scalable Community Detection

Platform: Intel R© E7-8870-based server

Tolerates some latency by hyperthreading.

• “Westmere:” 2 threads / core, 10 cores / socket, four sockets.• Fast cores (2.4 GHz), fast memory (1 066 MHz).• Not so many outstanding memory requests (60/socket), but

large caches (30 MiB L3 per socket).

• Good system support• Transparent hugepages reduces TLB costs.• Fast, user-level locking. (HLE would be better...)• OpenMP, although I didn’t tune it...

• mirasol, #17 on Graph500(thanks to UCB)

• Four processors (80 threads),256 GiB memory

• gcc 4.6.1, Linux kernel3.2.0-rc5

Image: Intel R© press kit

MTAAP 2012—Scalable Community Detection—Jason Riedy 19/35

Page 20: MTAAP12: Scalable Community Detection

Platform: Other Intel R©-based servers

Different design points

• “Nehalem” X5570: 2.93 GHz, 2 threads/core, 4 cores/socket,2 sockets, 8 MiB cache/socket

• “Westmere” X5650: 2.66 GHz, 2 threads/core, 6 cores/socket,2 sockets, 12 MiB cache/socket

• All with 1 066 MHz memory.

• Does the Westmere E7-8870’s scale affect performance?

• Nodes in Georgia Tech CSEcluster jinx

• 24-48 GiB memory, smalltests

Image: Intel R© press kit

MTAAP 2012—Scalable Community Detection—Jason Riedy 20/35

Page 21: MTAAP12: Scalable Community Detection

Implementation: Data structures

Extremely basic for graph G = (V,E)

• An array of (i, j;w) weighted edge pairs, each i, j stored onlyonce and packed, uses 3|E| space

• An array to store self-edges, d(i) = w, |V |• A temporary floating-point array for scores, |E|• A additional temporary arrays using 4|V |+ 2|E| to store

degrees, matching choices, offsets...

• Weights count number of agglomerated vertices or edges.

• Scoring methods (modularity, conductance) need onlyvertex-local counts.

• Storing an undirected graph in a symmetric manner reducesmemory usage drastically and works with our simple matcher.

MTAAP 2012—Scalable Community Detection—Jason Riedy 21/35

Page 22: MTAAP12: Scalable Community Detection

Implementation: Data structures

Extremely basic for graph G = (V,E)

• An array of (i, j;w) weighted edge pairs, each i, j stored onlyonce and packed, uses 3|E| space

• An array to store self-edges, d(i) = w, |V |• A temporary floating-point array for scores, |E|• A additional temporary arrays using 4|V |+ 2|E| to store

degrees, matching choices, offsets...

• Original ignored order in edge array, killed OpenMP.

• New: Roughly bucket edge array by first stored index.Non-adjacent CSR-like structure.

• New: Hash i, j to determine order. Scatter among buckets.

MTAAP 2012—Scalable Community Detection—Jason Riedy 22/35

Page 23: MTAAP12: Scalable Community Detection

Implementation: Routines

Three primitives: Scoring, matching, contracting

Scoring Trivial.

Matching Repeat until no ready, unmatched vertex:

1 For each unmatched vertex in parallel, find thebest unmatched neighbor in its bucket.

2 Try to point remote match at that edge (lock,check if best, unlock).

3 If pointing succeeded, try to point self-match atthat edge.

4 If both succeeded, yeah! If not and there wassome eligible neighbor, re-add self to ready,unmatched list.

(Possibly too simple, but...)

MTAAP 2012—Scalable Community Detection—Jason Riedy 23/35

Page 24: MTAAP12: Scalable Community Detection

Implementation: Routines

Contracting

1 Map each i, j to new vertices, re-order by hashing.

2 Accumulate counts for new i′ bins, prefix-sum for offset.

3 Copy into new bins.

• Only synchronizing in the prefix-sum. That could be removed ifI don’t re-order the i′, j′ pair; haven’t timed the difference.

• Actually, the current code copies twice... On short list forfixing.

• Binning as opposed to original list-chasing enabledIntel/OpenMP support with reasonable performance.

MTAAP 2012—Scalable Community Detection—Jason Riedy 24/35

Page 25: MTAAP12: Scalable Community Detection

Performance summary

Two moderate-sized graphs, one large

Graph |V | |E| Reference

rmat-24-16 15 580 378 262 482 711 [5, 6]soc-LiveJournal1 4 847 571 68 993 773 [7]

uk-2007-05 105 896 555 3 301 876 564 [8]

Peak processing rates in edges/second

Platform rmat-24-16 soc-LiveJournal1 uk-2007-05

X5570 1.83× 106 3.89× 106

X5650 2.54× 106 4.98× 106

E7-8870 5.86× 106 6.90× 106 6.54× 106

XMT 1.20× 106 0.41× 106

XMT2 2.11× 106 1.73× 106 3.11× 106

MTAAP 2012—Scalable Community Detection—Jason Riedy 25/35

Page 26: MTAAP12: Scalable Community Detection

Performance: Time to solution

Threads (OpenMP) / Processors (XMT)

Tim

e (s

) 3

10

32

100

316

1000

3162

3

10

32

100

316

1000

3162

rmat−24−16

●●●●●●

●●●●●●

●●●●●●●●●●●●

●●●●●●

1 2 4 8 16 32 64 128

soc−LiveJournal1

●●●●●●

●●●●●●

●●●●●●

●●●●●●●●●●●●

1 2 4 8 16 32 64 128

IntelC

ray XM

T

Platform

● X5570 (4−core) X5650 (6−core) E7−8870 (10−core) XMT XMT2

MTAAP 2012—Scalable Community Detection—Jason Riedy 26/35

Page 27: MTAAP12: Scalable Community Detection

Performance: Rate (edges/second)

Threads (OpenMP) / Processors (XMT)

Edg

es p

er s

econ

d

105

105.5

106

106.5

107

107.5

105

105.5

106

106.5

107

107.5

rmat−24−16

●●●●●●

●●●●●●

●●●●●●

●●●●●●

●●●●●●

1 2 4 8 16 32 64 128

soc−LiveJournal1

●●

●●

●●

●●

●●

●●●

●●●●●●

●●●●

●●●●●

1 2 4 8 16 32 64 128

IntelC

ray XM

T

Platform

● X5570 (4−core) X5650 (6−core) E7−8870 (10−core) XMT XMT2

MTAAP 2012—Scalable Community Detection—Jason Riedy 27/35

Page 28: MTAAP12: Scalable Community Detection

Performance: Modularity

step

Mod

ular

ity

0.0

0.2

0.4

0.6

0.8

●●

0 10 20 30

Graph

● coAuthorsCiteseer ● eu−2005 ● uk−2002

Termination metric

● Coverage Max Average

• Timing results: Stop when coverage ≥ 0.5 (communities cover 1/2 edges).

• More work ⇒ higher modularity. Choice up to application.

MTAAP 2012—Scalable Community Detection—Jason Riedy 28/35

Page 29: MTAAP12: Scalable Community Detection

Performance: Small-scale speedup

Threads (OpenMP) / Processors (XMT)

Spe

ed−

up o

ver

one

thre

ad (

Ope

nMP

) or

pro

cess

or (

XM

T)

1

2

4

8

16

32

1

2

4

8

16

32

rmat−24−16

●●

●●●

●●●●●●

●●●●●●

●●●●●●

1 2 4 8 16 32 64 128

soc−LiveJournal1

●●●●●●

●●●

●●●

●●●●●●

●●●●●●

1 2 4 8 16 32 64 128

IntelC

ray XM

T

Platform

● X5570 (4−core) X5650 (6−core) E7−8870 (10−core) XMT XMT2

MTAAP 2012—Scalable Community Detection—Jason Riedy 29/35

Page 30: MTAAP12: Scalable Community Detection

Performance: Large-scale time

Threads (OpenMP) / Processors (XMT)

Tim

e (s

)

210

212

214

● ● ●

●● ●

● ●

504.9s

1063s

6917s

31568s

1 2 4 8 16 32 64

Platform

● E7−8870 (10−core) XMT2

MTAAP 2012—Scalable Community Detection—Jason Riedy 30/35

Page 31: MTAAP12: Scalable Community Detection

Performance: Large-scale speedup

Threads (OpenMP) / Processors (XMT)

Spe

ed u

p ov

er s

ingl

e pr

oces

sor/

thre

ad

21

22

23

24

● ● ●

● ●

● ●

13.7x

29.6x

1 2 4 8 16 32 64

Platform

● E7−8870 (10−core) XMT2

MTAAP 2012—Scalable Community Detection—Jason Riedy 31/35

Page 32: MTAAP12: Scalable Community Detection

Conclusions and plans

• Code:http://www.cc.gatech.edu/~jriedy/community-detection/

• Some low-hanging fruit remains:• Eliminate one unnecessary copy during contraction.• Deal with stars.

• Then... Practical experiments.• How volatile are modularity and conductance to perturbations?• What matching schemes work well?• How do different metrics compare in applications?

• Extending to streaming graph data!• Includes developing parallel refinement...• And possibly de-clustering or manipulating the dendogram....• Very much WIP, more tricky than anticipated.

MTAAP 2012—Scalable Community Detection—Jason Riedy 32/35

Page 33: MTAAP12: Scalable Community Detection

Acknowledgment of support

MTAAP 2012—Scalable Community Detection—Jason Riedy 33/35

Page 34: MTAAP12: Scalable Community Detection

Bibliography I

U. Brandes, D. Delling, M. Gaertler, R. Gorke, M. Hoefer,Z. Nikoloski, and D. Wagner, “On modularity clustering,” IEEETrans. Knowledge and Data Engineering, vol. 20, no. 2, pp.172–188, 2008.

M. Newman, “Modularity and community structure innetworks,” Proc. of the National Academy of Sciences, vol. 103,no. 23, pp. 8577–8582, 2006.

A. Clauset, M. Newman, and C. Moore, “Finding communitystructure in very large networks,” Physical Review E, vol. 70,no. 6, p. 66111, 2004.

K. Wakita and T. Tsurumi, “Finding community structure inmega-scale social networks,” CoRR, vol. abs/cs/0702048, 2007.

MTAAP 2012—Scalable Community Detection—Jason Riedy 34/35

Page 35: MTAAP12: Scalable Community Detection

Bibliography II

D. Chakrabarti, Y. Zhan, and C. Faloutsos, “R-MAT: Arecursive model for graph mining,” in Proc. 4th SIAM Intl.Conf. on Data Mining (SDM). Orlando, FL: SIAM, Apr. 2004.

D. Bader, J. Gilbert, J. Kepner, D. Koester, E. Loh,K. Madduri, W. Mann, and T. Meuse, HPCS SSCA#2 GraphAnalysis Benchmark Specifications v1.1, Jul. 2005.

J. Leskovec, “Stanford large network dataset collection,” Athttp://snap.stanford.edu/data/, Oct. 2011.

P. Boldi, B. Codenotti, M. Santini, and S. Vigna, “Ubicrawler:A scalable fully distributed web crawler,” Software: Practice &Experience, vol. 34, no. 8, pp. 711–726, 2004.

MTAAP 2012—Scalable Community Detection—Jason Riedy 35/35