Efficient Graph Processing with Distributed Immutable View. Rong Chen+, Xin Ding+, Peng Wang+, Haibo Chen+, Binyu Zang+ and Haibing Guan*. Institute of Parallel and Distributed Systems (+), Department of Computer Science (*), Shanghai Jiao Tong University. HPDC 2014.


Page 1: Efficient Graph Processing with Distributed Immutable View Rong Chen Rong Chen +, Xin Ding +, Peng Wang +, Haibo Chen +, Binyu Zang + and Haibing Guan

Efficient Graph Processing

with Distributed Immutable View

Rong Chen+, Xin Ding+, Peng Wang+, Haibo Chen+, Binyu Zang+ and Haibing Guan*

Institute of Parallel and Distributed Systems +

Department of Computer Science *

Shanghai Jiao Tong University

HPDC 2014

[Figure labels: Communication, Computation]

Page 2

Big Data Everywhere

□ 100 Hrs of Video every minute
□ 1.11 Billion Users
□ 6 Billion Photos
□ 400 Million Tweets/day

How do we understand and use Big Data?

Page 3

Big Data, Big Learning

□ 100 Hrs of Video every minute
□ 1.11 Billion Users
□ 6 Billion Photos
□ 400 Million Tweets/day

Machine Learning and Data Mining, NLP

Page 4

It’s about the graphs ...

Page 5

Example: PageRank

A centrality analysis algorithm that measures the relative rank of each element of a linked set

Characteristics
□ Linked set → data dependence
□ Rank of who links it → local accesses
□ Convergence → iterative computation

$$R_i = \alpha + (1 - \alpha) \sum_{(j,i) \in E} \omega_{ij} R_j$$

[Figure: example graph with vertices 1 to 5]

Page 6

Existing Graph-parallel Systems

“Think as a vertex” philosophy
1. aggregate values of neighbors
2. update its own value
3. activate neighbors

compute (v)    // PageRank
    double sum = 0
    double value, last = v.get ()
    foreach (n in v.in_nbrs)
        sum += n.value / n.nedges;    // 1. aggregate
    value = 0.15 + 0.85 * sum;        // 2. update
    v.set (value);
    activate (v.out_nbrs);            // 3. activate

[Figure: example graph with vertices 1 to 5]
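The vertex program above can be tried out on a single machine. The following is a minimal Python sketch, not code from Cyclops or any existing system: the toy edge list `EDGES` and the helper names `build` and `pagerank` are assumptions made for illustration, and for brevity it recomputes every vertex in every iteration instead of tracking activations.

```python
# Hypothetical toy graph: each pair is (source, target).
EDGES = [(1, 2), (1, 3), (2, 3), (3, 1)]

def build(edges):
    """Return in-neighbor lists and out-degree counts for every vertex."""
    vertices = sorted({v for e in edges for v in e})
    in_nbrs = {v: [] for v in vertices}
    out_deg = {v: 0 for v in vertices}
    for src, dst in edges:
        in_nbrs[dst].append(src)
        out_deg[src] += 1
    return in_nbrs, out_deg

def pagerank(edges, iterations=50):
    """Synchronous PageRank using the 0.15 / 0.85 constants from the slide."""
    in_nbrs, out_deg = build(edges)
    rank = {v: 1.0 for v in in_nbrs}
    for _ in range(iterations):
        # Every vertex reads only the values of the previous iteration.
        new_rank = {}
        for v, nbrs in in_nbrs.items():
            total = sum(rank[n] / out_deg[n] for n in nbrs)  # 1. aggregate
            new_rank[v] = 0.15 + 0.85 * total                 # 2. update
        rank = new_rank
    return rank
```

Because each vertex reads only its in-neighbors' previous values, the loop body never writes to data that another vertex reads in the same iteration.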

Page 7

Existing Graph-parallel Systems

“Think as a vertex” philosophy
1. aggregate values of neighbors
2. update its own value
3. activate neighbors

Execution Engine
□ sync: BSP-like model
□ async: dist. sched_queues

Communication
□ message passing: push value
□ dist. shared memory: sync & pull value

[Figure: computation (comp.) and communication (comm.) phases separated by a barrier; push, pull, and sync between vertices]

Page 8

Issues of Existing Systems

Pregel [SIGMOD’09]
→ Sync engine
→ Edge-cut + Message Passing
Issues: w/o dynamic comp.; high contention

GraphLab [VLDB’12]

PowerGraph [OSDI’12]

[Figure: message flow on the example graph; labels: keep alive, master, replica, msg, x1]

Page 9

Issues of Existing Systems

Pregel [SIGMOD’09]
→ Sync engine
→ Edge-cut + Message Passing
Issues: w/o dynamic comp.; high contention

GraphLab [VLDB’12]
→ Async engine
→ Edge-cut + DSM (replicas)
Issues: high contention; hard to program; duplicated edges; heavy comm. cost

PowerGraph [OSDI’12]

[Figure: message flow on the example graph; labels: keep alive, master, replica, msg, dup, x1, x2]

Page 10

Issues of Existing Systems

Pregel [SIGMOD’09]
→ Sync engine
→ Edge-cut + Message Passing
Issues: w/o dynamic comp.; high contention

GraphLab [VLDB’12]
→ Async engine
→ Edge-cut + DSM (replicas)
Issues: high contention; hard to program; duplicated edges; heavy comm. cost

PowerGraph [OSDI’12]
→ (A)Sync engine
→ Vertex-cut + GAS (replicas)
Issues: high contention; heavy comm. cost

[Figure: message flow on the example graph; labels: keep alive, master, replica, msg, dup, x1, x2, x5]

Page 11

Contributions

Distributed Immutable View
□ Easy to program/debug
□ Support dynamic computation
□ Minimized communication cost (x1 /replica)
□ Contention (comp. & comm.) immunity

Multicore-based Cluster Support
□ Hierarchical sync. & deterministic execution
□ Improve parallelism and locality

Page 12

Outline

Distributed Immutable View
→ Graph organization
→ Vertex computation
→ Message passing
→ Change of execution flow

Multicore-based Cluster Support
→ Hierarchical model
→ Parallelism improvement

Evaluation

Page 13

General Idea

Observation: For most graph algorithms, a vertex only aggregates neighbors’ data in one direction and activates in another direction
□ e.g. PageRank, SSSP, Community Detection, …

Local aggregation/update & distributed activation
□ Partitioning: avoid duplicate edges
□ Computation: one-way local semantics
□ Communication: merge update & activate messages

Page 14

Graph Organization

Partition the graph and build local sub-graphs
□ Normal edge-cut: randomized (e.g., hash-based) or heuristic (e.g., Metis)
□ Only create one-direction edges (e.g., in-edges)
  → Avoid duplicated edges
□ Create read-only replicas for edges spanning machines

[Figure: example graph partitioned across machines M1, M2, and M3; legend: master, replica]
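The organization described on this slide can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the Cyclops ingress code: the function name `partition`, the dictionary layout, and the modulo-based owner function are all hypothetical stand-ins for a hash-based edge-cut.

```python
def partition(edges, num_machines):
    """Hash-based edge-cut that stores each edge once, as an in-edge of its target."""
    def owner(v):
        # Hypothetical placement rule: vertex id modulo machine count.
        return v % num_machines

    parts = [{"masters": set(), "replicas": set(), "in_edges": []}
             for _ in range(num_machines)]
    for v in {v for e in edges for v in e}:
        parts[owner(v)]["masters"].add(v)
    for src, dst in edges:
        part = parts[owner(dst)]
        part["in_edges"].append((src, dst))   # stored only on the target's machine
        if owner(src) != owner(dst):
            part["replicas"].add(src)         # read-only copy of a remote source
    return parts
```

Since an edge lives only where its target vertex is mastered, no edge is duplicated, and a replica exists only where some local vertex needs to read a remote in-neighbor.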

Page 15

Vertex Computation

Local aggregation/update
□ Support dynamic computation
  → one-way local semantics
□ Immutable view: read-only access to neighbors
  → Eliminate contention on vertex

[Figure: example graph partitioned across M1, M2, and M3; replicas are marked read-only]

Page 16

Communication

Sync. & Distributed Activation
□ Merge update & activate messages
  1. Update value of replicas
  2. Invite replicas to activate neighbors

msg: v|m|s, e.g. 8 4 0

[Figure: example graph partitioned across M1, M2, and M3; vertex state labels "rlist: W1, l-act: 1, value: 8, msg: 4" and "l-act: 3, value: 6, msg: 3"; label: active]
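One way to picture the merged message is a single record that both refreshes a replica and tells it whether to activate its local out-neighbors. The sketch below is only an illustration: the class `SyncMessage`, its field names, and the helper `apply_sync` are hypothetical and are not a decoding of the slide's `v|m|s` format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SyncMessage:
    vertex: int      # id of the master vertex whose value changed
    value: float     # new value to install on the replica
    activate: bool   # whether the replica should activate local out-neighbors

def apply_sync(msg, replica_values, local_out_nbrs, active):
    """Apply one merged message on the receiving machine."""
    replica_values[msg.vertex] = msg.value                   # 1. update the replica
    if msg.activate:
        active.update(local_out_nbrs.get(msg.vertex, ()))    # 2. activate neighbors
```

Under this assumption a master sends one message per replica, and the activation of remote neighbors happens locally on the machine that holds the replica.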

Page 17

Communication

Distributed Activation
□ Unidirectional message passing
  → Replica will never be activated
  → Always master → replicas
  → Contention immunity

[Figure: example graph partitioned across M1, M2, and M3]

Page 18

Change of Execution Flow

Original Execution Flow (e.g. Pregel)

[Figure: stages computation, sending, receiving, and parsing on M1, with out-queues and in-queues to M2 and M3; annotations: high overhead, high contention; legend: thread, vertex, message]

Page 19

Change of Execution Flow

Execution Flow on Distributed Immutable View

[Figure: stages computation, sending, and receiving on M1, with out-queues to M2 and M3; annotations: lock-free, low overhead, no contention; legend: thread, master, replica]

Page 20

Outline

Distributed Immutable View
→ Graph organization
→ Vertex computation
→ Message passing
→ Change of execution flow

Multicore-based Cluster Support
→ Hierarchical model
→ Parallelism improvement

Evaluation

Page 21

Multicore Support

Two Challenges
1. Two-level hierarchical organization
   → Preserve synchronous and deterministic computation nature (easy to program/debug)
2. Original BSP-like model is hard to parallelize
   → High contention to buffer and parse messages
   → Poor locality in message parsing

Page 22

Hierarchical Model

Design Principle
□ Three levels: iteration → worker → thread
□ Only the last-level participants perform actual tasks
□ Parents (i.e. higher-level participants) just wait until all children finish their tasks

[Figure: Level-0 iteration (loop), Level-1 worker, Level-2 thread (task); workers meet at a global barrier, threads at a local barrier]
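The three-level principle can be imitated on one machine with nested barriers. The following Python sketch is an assumption-laden illustration rather than the CyclopsMT implementation: workers are modelled as groups of threads in a single process, and the function name `run` and the shared `log` list are hypothetical.

```python
import threading

def run(num_workers=2, threads_per_worker=3, iterations=4):
    """Run tasks under a local barrier per worker and one global barrier."""
    global_barrier = threading.Barrier(num_workers)
    log = []                      # (iteration, worker, thread) in completion order
    log_lock = threading.Lock()

    def thread_main(worker, tid, local_barrier):
        for it in range(iterations):
            with log_lock:
                log.append((it, worker, tid))   # stand-in for the level-2 task
            # Local barrier: all threads of this worker finished the iteration.
            if local_barrier.wait() == 0:
                # Exactly one thread represents its worker at the global barrier.
                global_barrier.wait()
            # Second local barrier releases the worker's threads together.
            local_barrier.wait()

    threads = []
    for w in range(num_workers):
        local_barrier = threading.Barrier(threads_per_worker)
        for t in range(threads_per_worker):
            th = threading.Thread(target=thread_main, args=(w, t, local_barrier))
            threads.append(th)
            th.start()
    for th in threads:
        th.join()
    return log
```

No thread starts iteration i+1 before every thread of every worker has finished iteration i, so the iteration numbers in the returned log never decrease. This is the property that keeps the computation synchronous.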

Page 23

Parallelism Improvement

Original BSP-like model is hard to parallelize

[Figure: stages computation, sending, receiving, and parsing on M1, with out-queues and in-queues to M2 and M3; legend: thread, vertex, message]

Page 24

Parallelism Improvement

Original BSP-like model is hard to parallelize

[Figure: stages computation, sending, receiving, and parsing on M1, with priv. out-queues and in-queues to M2 and M3; annotations: high contention, poor locality; legend: thread, vertex, message]

Page 25

Parallelism Improvement

Distributed immutable view opens an opportunity

[Figure: stages computation, sending, and receiving on M1, with out-queues to M2 and M3; legend: thread, master, replica]

Page 26

Parallelism Improvement

Distributed immutable view opens an opportunity

[Figure: stages computation, sending, and receiving on M1, with priv. out-queues to M2 and M3; annotations: lock-free, poor locality; legend: thread, master, replica]

Page 27

Parallelism Improvement

Distributed immutable view opens an opportunity

[Figure: stages computation, sending, and receiving on M1, with priv. out-queues to M2 and M3; annotations: lock-free, no interference; legend: thread, master, replica]

Page 28

Parallelism Improvement

Distributed immutable view opens an opportunity

[Figure: stages computation, sending, and receiving on M1, with priv. out-queues to M2 and M3; annotations: lock-free, sorted, good locality; legend: thread, master, replica]
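The figure labels (private out-queues, lock-free, sorted) suggest a flow that can be imitated sequentially. The sketch below is a guess at the idea, not the CyclopsMT code: the function names `send_phase` and `receive_phase` and the mapping `replica_sites` (master vertex to the machines that hold its replicas) are hypothetical.

```python
from collections import defaultdict

def send_phase(updates_by_thread, replica_sites):
    """Each compute thread fills its own per-destination queues; none is shared."""
    private_queues = []
    for updates in updates_by_thread:
        queues = defaultdict(list)
        for vertex, value in updates:
            for machine in replica_sites.get(vertex, ()):
                queues[machine].append((vertex, value))
        private_queues.append(queues)
    return private_queues

def receive_phase(private_queues, machine):
    """Collect one machine's incoming updates and apply them in vertex-id order."""
    incoming = [m for q in private_queues for m in q.get(machine, [])]
    replicas = {}
    for vertex, value in sorted(incoming):
        # A replica has a single master, so each vertex id occurs at most once.
        replicas[vertex] = value
    return replicas
```

Because a replica is written only by the message from its one master, two receivers never update the same replica, which is why the slide can label the receiving stage lock-free. Sorting by vertex id is one plausible way to obtain the "good locality" the slide mentions.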

Page 29

Outline

Distributed Immutable View
→ Graph organization
→ Vertex computation
→ Message passing
→ Change of execution flow

Multicore-based Cluster Support
→ Hierarchical model
→ Parallelism improvement

Implementation & Experiment

Page 30

Implementation

Cyclops(MT)
□ Based on Hama (Java & Hadoop)
□ ~2,800 SLOC
□ Provide mostly compatible user interface
□ Graph ingress and partitioning
  → Compatible I/O-interface
  → Add an additional phase to build replicas
□ Fault tolerance
  → Incremental checkpoint
  → Replication-based FT [DSN’14]

Page 31

Experiment Settings

Platform
□ 6 x 12-core AMD Opteron (64 GB RAM, 1 GigE NIC)

Graph Algorithms
□ PageRank (PR), Community Detection (CD), Alternating Least Squares (ALS), Single Source Shortest Path (SSSP)

Workload
□ 7 real-world datasets from SNAP [1]
□ 1 synthetic dataset from GraphLab [2]

Dataset     |V|     |E|
Amazon      0.4M    3.4M
GWeb        0.9M    5.1M
LJournal    4.8M    69M
Wiki        5.7M    130M
SYN-GL      0.1M    2.7M
DBLP        0.3M    1.0M
RoadCA      1.9M    5.5M

[1] http://snap.stanford.edu/data/
[2] http://graphlab.org

Page 32

Overall Performance Improvement

[Figure: normalized speedup (0 to 10) of Hama, Cyclops, and CyclopsMT on Amazon, GWeb, LJournal, Wiki, SYN-GL, DBLP, and RoadCA, grouped by PageRank, ALS, CD, and SSSP; push-mode; annotations: 8.69X, 2.06X; configurations: 48 workers, 6 workers (8)]

Page 33

Performance Scalability

[Figure: normalized speedup (0 to 35) of Hama, Cyclops, and CyclopsMT at 6, 12, 24, and 48 workers or threads on Amazon, GWeb, LJournal, Wiki, SYN-GL, DBLP, and RoadCA; one off-scale value is annotated 50.2]

Page 34

Performance Breakdown

[Figure: ratio of execution time (0.0 to 1.0) spent in PARSE, SEND, COMP, and SYNC on Amazon, GWeb, LJournal, Wiki, SYN-GL, DBLP, and RoadCA, grouped by PageRank, ALS, CD, and SSSP]

[Figure: number of messages (K, 0 to 6000) per iteration (0 to 30)]

[Figure: number of vertices (K, 0 to 1000) per iteration (0 to 30)]

Series labels across the panels: Hama, Cyclops, CyclopsMT

Page 35

Comparison with PowerGraph [1]

Cyclops-like engine on GraphLab [1] Platform

[Figure: execution time (sec, 0 to 120) of CyclopsMT and PowerGraph on Amazon, GWeb, LJournal, and Wiki]

[Figure: number of messages (M, 0 to 2000) on Amazon, GWeb, LJournal, and Wiki]

Dataset     COMP%
Amazon      11%
GWeb        15%
LJournal    25%
Wiki        39%

Preliminary Results

[Figure: execution time (sec, 0 to 12) on Regular and Natural graphs [2]]

[1] http://graphlab.org (C++ & Boost RPC lib.)
[2] synthetic 10-million vertex regular (even edge) and power-law (α=2.0) graphs

Page 36

Conclusion

Cyclops: a new synchronous vertex-oriented graph processing system
□ Preserve synchronous and deterministic computation nature (easy to program/debug)
□ Provide efficient vertex computation with significantly fewer messages and contention immunity by distributed immutable view
□ Further support multicore-based cluster with hierarchical processing model and high parallelism

Source Code: http://ipads.se.sjtu.edu.cn/projects/cyclops

Page 37

Questions

Thanks

Cyclops
http://ipads.se.sjtu.edu.cn/projects/cyclops.html

IPADS
Institute of Parallel and Distributed Systems

Page 38

What’s Next?

PowerLyra: differentiated graph computation and partitioning on skewed natural graphs
□ Hybrid engine and partitioning algorithms
□ Outperform PowerGraph by up to 3.26X for natural graphs

http://ipads.se.sjtu.edu.cn/projects/powerlyra.html

Power-law: “most vertices have relatively few neighbors while a few have many neighbors”

Preliminary Results

[Figure: execution time (sec, 0 to 16) of PL, PG, and Cyclops on R and N graphs; other labels: Low, High]

Page 39

Generality

Algorithms: aggregate/activate all neighbors
□ e.g. Community Detection (CD)
□ Transform into an undirected graph and duplicate edges

[Figure: example graph partitioned across M1, M2, and M3, before and after edge duplication]

Page 40

Generality

Algorithms: aggregate/activate all neighbors
□ e.g. Community Detection (CD)
□ Transform into an undirected graph and duplicate edges
□ Still aggregate in one direction (e.g. in-edges) and activate in another direction (e.g. out-edges)
□ Preserve all benefits of Cyclops
  → x1 /replica & contention immunity & good locality

[Figure: example graph partitioned across M1, M2, and M3]
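The transformation named on this slide amounts to adding the reverse of every edge, so that a vertex's in-neighbors become all of its neighbors. A minimal Python sketch, with the hypothetical helper name `to_bidirectional`:

```python
def to_bidirectional(edges):
    """Add the reverse of every edge so in-neighbors cover all neighbors."""
    doubled = set(edges)
    doubled.update((dst, src) for src, dst in edges)
    return sorted(doubled)
```

After this step an algorithm such as Community Detection can keep aggregating over in-edges only and activating along out-edges only, which is what lets it run on the same one-direction organization.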

Page 41

Generality

Difference between Cyclops and GraphLab
1. How to construct local sub-graph
2. How to aggregate/activate neighbors

[Figure: example graph partitioned across M1, M2, and M3, shown for both systems]

Page 42

Improvement of CyclopsMT

[Figure: execution time (sec, 0.0 to 30.0) split into SEND, COMP, and SYNC for Cyclops and CyclopsMT. Configurations are written MxWxT/R (#[M]achines x #[W]orkers x #[T]hreads / #[R]eceivers): 6x1x1/1, 6x2x1/1, 6x4x1/1, 6x8x1/1; 6x1x1/1, 6x1x2/2, 6x1x4/4, 6x1x8/8; 6x1x8/1, 6x1x8/2, 6x1x8/4, 6x1x8/8]

Page 43

Communication Efficiency

[Figure: execution time (sec, 0.1 to 1,000.0) of SEND and PARSE for Hama and Cyclops at 50M, 25M, and 5M; annotations: 25.6X, 16.2X, 12.6X, 55.6%, 25.0%, 31.5%; setup: workers W0 to W5, message: (id, data)]

Hama: Hadoop RPC lib (Java)
PowerGraph: Boost RPC lib (C++)
Cyclops: Hadoop RPC lib (Java)

Labels: send + buffer + parse (contention); send + update (contention)

Page 44

Using Heuristic Edge-cut (i.e. Metis)

[Figure: normalized speedup (0 to 25) of Hama, Cyclops, and CyclopsMT on Amazon, GWeb, LJournal, Wiki, SYN-GL, DBLP, and RoadCA, grouped by PageRank, ALS, CD, and SSSP; annotations: 23.04X, 5.95X; configurations: 48 workers, 6 workers (8)]

Page 45

Memory Consumption

Memory Behavior [1] per Worker (PageRank with Wiki dataset)

Configuration    Max Cap (GB)    Max Usage (GB)    Young GC [2] (#)    Full GC [2] (#)
Hama/48          1.7             1.5               132                 69
Cyclops/48       4.0             3.0               45                  15
CyclopsMT/6x8    12.6/8          11.0/8            268/8               32/8

[1] jStat
[2] GC: Concurrent Mark-Sweep

Page 46

Ingress Time

            LD            REP           INIT          TOT
Dataset     H      C      H      C      H      C      H      C
Amazon      6.2    5.9    0.0    2.5    1.7    1.5    7.9    9.9
GWeb        7.1    6.8    0.0    2.8    2.6    1.9    9.7    11.4
LJournal    27.1   31.0   0.0    44.7   17.9   9.2    45.0   84.9
Wiki        46.7   46.7   0.0    62.2   33.4   20.4   80.0   129.3
SYN-GL      4.2    4.0    0.0    2.6    2.4    1.8    6.6    8.4
DBLP        4.1    4.1    0.0    1.5    1.3    0.9    5.4    6.5
RoadCA      6.4    6.2    0.0    3.9    0.9    0.6    7.3    10.7

H: Hama, C: Cyclops

Page 47

Selective Activation

Sync. & Distributed Activation
□ Merge update & activate messages
  1. Update value of replicas
  2. Invite replicas to activate neighbors

msg: v|m|s, e.g. 8 4 0

*Selective Activation (e.g. ALS)
msg: v|m|s|l
Option: Activation_List

[Figure: example graph partitioned across M1, M2, and M3; vertex state labels "rlist: W1, l-act: 1, value: 8, msg: 4" and "l-act: 3, value: 6, msg: 3"; label: active]

Page 48

Parallelism Improvement

Distributed immutable view opens an opportunity

comp. threads vs. comm. threads: separate configuration

[Figure: stages computation, sending, and receiving on M1, with out-queues to M2 and M3; annotations: lock-free, sorted, good locality; legend: thread, master, replica]

Page 49

Existing graph-parallel systems (e.g., Pregel, GraphLab, PowerGraph)
□ w/o dynamic comp.
□ high contention
□ hard to program
□ duplicated edges
□ heavy comm. cost

Cyclops(MT) → Distributed Immutable View
□ w/ dynamic comp.
□ no contention
□ easy to program
□ duplicated edges
□ low comm. cost

[Figure: example graph with replica vertices; labels: replica, x1]

Page 50

What’s Next?

BiGraph: bipartite-oriented distributed graph partitioning for big learning
□ A set of online distributed graph partition algorithms designed for bipartite graphs and applications
□ Partition graphs in a differentiated way and load data according to the data affinity
□ Outperform PowerGraph with default partition by up to 17.75X, and save up to 96% network traffic

http://ipads.se.sjtu.edu.cn/projects/powerlyra.html

Page 51

Page 52

Multicore Support

Two Challenges
1. Two-level hierarchical organization
   → Preserve synchronous and deterministic computation nature (easy to program/debug)
2. Original BSP-like model is hard to parallelize
   → High contention to buffer and parse messages
   → Poor locality in message parsing
   → Asymmetric degree of parallelism for CPU and NIC