Ling Liu, Part 02: Big Graph Processing
TRANSCRIPT
Ling Liu
School of Computer Science, College of Computing
Part II: Distributed Graph Processing
Big Data Trends
Big Data: Volume, Velocity, Variety
- 1 zettabyte = a trillion gigabytes (10^21 bytes) (Cisco, 2012)
- 500 million tweets per day
- 100 hours of video are uploaded every minute
Why Graphs?
Graphs are everywhere: social network graphs, road networks, national security, business analytics, biological graphs.
Figure examples:
- Friendship graph (Facebook Engineering, 2010)
- Brain network (The Journal of Neuroscience, 2011)
- US road network (www.pavementinteractive.org)
- Web security graph (McAfee, 2013)
- Intelligence data model (NSA, 2013)
How Big?
(Figures from Paul Burkhardt and Chris Waring, "An NSA Big Graph experiment," NSA-RD-2013-056001v1)

Social scale: 1 billion vertices, 100 billion edges
- 111 PB adjacency matrix
- 2.92 TB adjacency list
- 2.92 TB edge list
(Twitter graph from the Gephi dataset, http://www.gephi.org)

Web scale: 50 billion vertices, 1 trillion edges
- 271 EB adjacency matrix
- 29.5 TB adjacency list
- 29.1 TB edge list
(Internet graph from the Opte Project, http://www.opte.org/maps; web graph from the SNAP database, http://snap.stanford.edu/data)

Brain scale: 100 billion vertices, 100 trillion edges
- 1.1 ZB adjacency matrix (2.08 mN_A bytes, "molar bytes," where N_A = 6.022 x 10^23 mol^-1)
- 2.84 PB adjacency list
- 2.84 PB edge list
(Human connectome. Gerhard et al., Frontiers in Neuroinformatics 5(3), 2011)
Big Graph Data: Technical Challenges
Huge and growing size
- Requires massive storage capacities
- Graph analytics usually requires much bigger computing and storage resources
Complicated correlations among data entities (vertices)
- Make it hard to parallelize graph processing (hard to partition)
- Most existing big data systems are not designed to handle such complexity
Skewed distribution (i.e., high-degree vertices)
- Makes it hard to ensure load balancing
Parallel Graph Processing: Challenges
• Structure-driven computation
  – Storage and data transfer issues
• Irregular graph structure and computation model
  – Storage and data/computation partitioning issues
  – Partitioning vs. load/resource balancing
Parallel Graph Processing: Opportunities
• Extend existing paradigms
  – Vertex-centric
  – Edge-centric
• Build new frameworks for parallel graph processing
  – Single-machine solutions: GraphLego [ACM HPDC 2015], GraphTwist [VLDB 2015]
  – Distributed approaches: GraphMap [IEEE SC 2015], PathGraph [IEEE SC 2014]
Build New Graph Frameworks: Key Requirements/Challenges
• Less pre-processing
• Low and load-balanced computation
• Low and load-balanced communication
• Low memory footprint
• Scalable w.r.t. cluster size and graph size
• A general graph processing framework for large collections of graph computation algorithms and applications
Graph Operations: Two Distinct Classes
Iterative graph algorithms
- Each execution consists of a set of iterations
- In each iteration, vertex (or edge) values are updated
- All (or most) vertices participate in the execution
- Examples: PageRank, shortest paths (SSSP), connected components
- Systems: Pregel, GraphLab, GraphChi, X-Stream, GraphX, Pregelix
Graph pattern queries
- Subgraph matching problem
- Requires fast query response time
- Explores a small fraction of the entire graph
- Examples: friends-of-friends, triangle patterns
- Systems: RDF-3X, TripleBit, SHAPE
(Our systems for the two classes: SHAPE [VLDB 2014] for graph pattern queries, GraphMap [IEEE SC 2015] for iterative graph algorithms)
Distributed Approaches to Parallel Graph Processing
SHAPE: distributed RDF system with semantic hash partitioning
- Graph pattern queries
- Semantic hash partitioning
- Distributed RDF query processing
- Experiments
GraphMap: scalable iterative graph computations
- Iterative graph computations
- GraphMap approaches
- Experiments
What Are Iterative Graph Algorithms?
Iterative graph algorithms
- Each execution consists of a set of iterations
- In each iteration, vertex (or edge) values are updated
- All (or most) vertices participate in the operations
- Examples: PageRank, shortest paths (SSSP), connected components
- Systems: Google's Pregel, GraphLab, GraphChi, X-Stream, GraphX, Pregelix
[Figures: SSSP and Connected Components examples. Source: AMPLab]
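To make the vertex-centric iterative model concrete, here is a minimal SSSP sketch written against GraphX's Pregel operator (illustrative, not taken from the slides; it assumes a graph with non-negative Double edge weights):

```scala
import org.apache.spark.graphx._

// Single-source shortest paths with GraphX's Pregel operator.
def sssp(graph: Graph[Long, Double], sourceId: VertexId): Graph[Double, Double] = {
  // Initialize distances: 0 at the source, infinity everywhere else.
  val init = graph.mapVertices((id, _) =>
    if (id == sourceId) 0.0 else Double.PositiveInfinity)

  init.pregel(Double.PositiveInfinity)(
    // Vertex program: keep the smaller of the current and incoming distance.
    (id, dist, newDist) => math.min(dist, newDist),
    // Send phase: relax every edge that can improve its destination.
    triplet =>
      if (triplet.srcAttr + triplet.attr < triplet.dstAttr)
        Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
      else Iterator.empty,
    // Merge concurrent messages by taking the minimum.
    (a, b) => math.min(a, b))
}
```

Only vertices that receive a message remain active in the next superstep, so the set of active vertices changes from iteration to iteration; GraphMap's dynamic access methods (later in this talk) exploit exactly this behavior.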
Why Is Iterative Graph Processing So Difficult?
Huge and growing size of graph data
- Makes it hard to store and handle the data on a single machine
Poor locality (many random accesses)
- Each vertex depends on its neighboring vertices, recursively
Huge size of intermediate data for each iteration
- Requires additional computing and storage resources
Heterogeneous graph algorithms
- Different algorithms have different computation and access patterns
High-degree vertices
- Make it hard to ensure load balancing
The Problems of Current Computation Models
• "Ghost" vertices maintain adjacency structure and replicate remote data.
• Too many interactions among partitions
[Figure: a partitioned graph with "ghost" vertices replicated across partitions]
Iterative Graph Algorithm Example: Connected Components
[Figure source: Apache Flink]
Why Don’t We Use MapReduce?
Of course, we can use MapReduce!
The first iteraEon of Connected Components for this graph would be …
Map
K V 2 1
K V 1 2 3 2 4 2
K V 2 4 3 4 K V
2 3 4 3
Reduce
K Values 1 2
Min 1
2 1,3,4 3 2,4 4 2,3
1 2 2
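A minimal sketch of one such iteration, in plain Scala collections standing in for an actual Hadoop job (names are illustrative):

```scala
// One Connected Components iteration in map/reduce form.
// labels: the current component label of each vertex (initially its own ID).
def ccIteration(labels: Map[Long, Long], edges: Seq[(Long, Long)]): Map[Long, Long] = {
  // "Map" phase: each edge emits each endpoint's label to the other endpoint.
  val messages = edges.flatMap { case (u, v) =>
    Seq(u -> labels(v), v -> labels(u))
  }
  // "Reduce" phase: each vertex keeps the minimum of its own label
  // and all labels it received.
  (messages ++ labels.toSeq)
    .groupBy { case (vertex, _) => vertex }
    .map { case (vertex, msgs) => vertex -> msgs.map(_._2).min }
}
```

Running this on the graph above (labels initialized to vertex IDs) reproduces the reduce output shown; repeating it until the labels stop changing yields the connected components. The catch is that every iteration is a full MapReduce job, which is exactly the problem discussed next.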
Why We Shouldn’t Use MapReduce
But …
In a typical MapReduce job, disk IOs are performed in four places
So… 10 iteraEons mean…
Figure source: http://arasan-blog.blogspot.com/
Disk IOs in 40 places
18 18
Related Work
Distributed memory-based systems
- Messaging-based: Google Pregel, Apache Giraph, Apache Hama
- Vertex mirroring: GraphLab, PowerGraph, GraphX
- Dynamic load balancing: Mizan, GPS
- Graph-centric view: Giraph++
Disk-based systems using a single machine
- Vertex-centric model: GraphChi
- Edge-centric model: X-Stream
- Vertex-edge-centric: GraphLego
With external memory
- Out-of-core capabilities (Apache Giraph, Apache Hama, GraphX)
- Not optimized for graph computations
- Users need to configure several parameters
Two Research Directions
Iterative graph processing systems:

Disk-based systems on a single machine
- Load a part of the input graph in memory
- Include a set of data structures and techniques to efficiently load graph data from disk
- GraphChi, X-Stream, ...
- Disadvantages: 1) relatively slow, 2) resource limitations of a single machine

Distributed memory-based systems on a cluster
- Load the whole input graph in memory
- Load all intermediate results and messages in memory
- Pregel, Giraph, Hama, GraphLab, GraphX, ...
- Disadvantages: 1) very high memory requirement, 2) coordination of distributed machines
Main Features
Develop GraphMap
- A distributed iterative graph computation framework that effectively utilizes secondary storage
- Goal: reduce the memory requirement of iterative graph computations while ensuring competitive (or better) performance
Main contributions
- Clear separation between mutable and read-only data
- Two-level partitioning technique for locality-optimized data placement
- Dynamic access methods based on the workloads of the current iteration
Clear Data Separation
Graph data consists of:
- Vertices and their data (mutable)
- Edges and their data (read-only)
Read edge data from disk in each iteration! (A minimal sketch of this separation follows.)
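A minimal sketch of what this separation looks like in code (illustrative types, not GraphMap's actual classes):

```scala
import scala.collection.mutable

// Per-partition state: only the vertex values mutate across iterations;
// the edge structure is immutable, so it can live on disk and be re-read
// (streamed) in each iteration instead of being held in memory.
final class GraphPartition(
    val vertexValues: mutable.Map[Long, Double],   // mutable, kept in memory
    val edges: IndexedSeq[(Long, Long)])           // read-only, loaded from disk
```

Because the edge data never changes, it never has to be written back to disk, which is what makes keeping it on secondary storage cheap.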
Locality-Based Data Placement on Disk
Edge access locality
- All edges (out-edges, in-edges, or bi-edges) of a vertex are accessed together to update its vertex value
  → We place all connected edges of a vertex together on disk
Vertex access locality
- All vertices in a partition are accessed by the same worker (processor) in every iteration
  → We store all vertices in a partition, and their edges, into contiguous disk blocks to utilize sequential disk accesses
How can you access disk efficiently for each iteration? (See the row-key sketch below.)
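One concrete way to get both locality properties in a key-value store such as HBase is a composite row key that sorts first by partition and then by vertex; this is a sketch in the spirit of the slide, not necessarily GraphMap's exact key format:

```scala
import java.nio.ByteBuffer

// Composite row key: (partition ID, vertex ID). HBase keeps rows sorted by
// key, so all vertices of one partition, together with the edge lists stored
// in their rows, end up in contiguous disk blocks and can be fetched with a
// single sequential scan per partition.
def rowKey(partitionId: Int, vertexId: Long): Array[Byte] =
  ByteBuffer.allocate(12).putInt(partitionId).putLong(vertexId).array()
```

Storing all edges of a vertex in that vertex's row gives edge access locality; the partition-ID prefix gives vertex access locality.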
Dynamic Access Methods
Workloads vary across iterations. Is the current workload larger than the threshold?
- YES → sequential disk accesses
- NO → random disk accesses
The threshold is dynamically configured based on actual access times, for each iteration and for each worker. (A sketch of this selection logic follows the figure below.)
[Figure: number of active vertices (x1000) over iterations 0-9 for PageRank, CC, and SSSP]
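A sketch of the selection logic (the names and the linear cost model are assumptions for illustration, not GraphMap's exact implementation):

```scala
sealed trait AccessMethod
case object Sequential extends AccessMethod  // scan the whole partition
case object Random extends AccessMethod      // fetch only the active vertices

// Per iteration, per worker: scan sequentially only when enough vertices
// are active to amortize the cost of a full-partition scan.
def chooseAccess(activeVertices: Long, threshold: Long): AccessMethod =
  if (activeVertices > threshold) Sequential else Random

// Break-even point, assuming a flat sequential scan cost per partition and
// a fixed cost per random lookup, both measured at runtime per the slide.
def estimateThreshold(seqScanTimeMs: Double, randomLookupTimeMs: Double): Long =
  (seqScanTimeMs / randomLookupTimeMs).toLong
```

This matches the active-vertex curves above: PageRank keeps nearly all vertices active, so sequential scans win throughout, while SSSP and CC have frontiers that grow and shrink, crossing the threshold mid-run.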
Experiments
First prototype of GraphMap
- BSP engine & messaging engine: utilize Apache Hama
- Disk storage: utilize Apache HBase (a two-dimensional key-value store)
Settings
- Cluster of 21 machines on Emulab
  - 12GB RAM, Xeon E5530, 500GB and 250GB SATA disks
  - Connected via a 1 GigE network
- HBase (ver. 0.96) on HDFS of Hadoop (ver. 1.0.4)
- Hama (ver. 0.6.3)
Iterative graph algorithms
- 1) PageRank (10 iter.), 2) SSSP, 3) CC
Execution Time
Analysis
- Hama fails for large graphs with more than 900M edges, while GraphMap still works
- Note that, in all cases, GraphMap is faster (up to 6 times) than Hama, which is the in-memory system
Breakdown of GraphMap Execution Time
[Figures: PageRank, SSSP, and CC on uk-2005]
Analysis
- For PageRank, all iterations have similar results except the first and last
- For SSSP, iterations 5-15 utilize sequential disk accesses based on our dynamic selection
- For CC, random disk accesses are selected from iteration 24
Effects of Dynamic Access Methods
Analysis
- GraphMap chooses the optimal access method in most of the iterations
- Possible further improvement through fine-tuning in iterations 5 and 15
- For cit-Patents, GraphMap always chooses random accesses because only 3.3% of vertices are reachable from the start vertex, so the number of active vertices is always small
[Figure: computation time (sec) over iterations 0-40, comparing Sequential, Random, and Dynamic access methods]
Comparing GraphMap with Other Systems
• Existing distributed graph systems are all in-memory systems. In addition to Hama, we give a relative comparison with a few other representative systems.
• GraphMap: 12GB DRAM per node on a cluster of 21 nodes (252GB distributed shared memory); the compared systems use 5x the DRAM per node.
LiveJournal (LJ) Social Graph Dataset
- Vertices: members; Edges: friendship
Graph datasets (stored in HDFS):
- cit-Patents (raw size: 268MB): 3.8M vertices, 16.5M edges
- soc-LiveJournal1 (raw size: 1.1GB): 4.8M vertices, 69M edges
Our Initial Experience with Spark / GraphX
• Cluster setting
  – 6 machines (1 master & 5 slaves)
• Spark setting
  – Spark shell (i.e., we did NOT implement any Spark application yet); built-in PageRank function of GraphX
  – All 40 cores (= 8 cores x 5 slaves)
  – Portion of memory for RDD storage: 0.52 (by default)
    • If we assign 512MB to each executor, about 265MB (≈ 0.52 x 512MB) is dedicated to RDD storage
User-Defined Partitioning (soc-LiveJournal1)
• Implemented two hash partitioning techniques using the GraphX API: 1) by source vertex IDs, 2) by destination vertex IDs (a sketch of both follows the tables below)
5GB slave RAM & 40 partitions:
                         | loading               | 1st PageRank (10 iter.)    | 2nd PageRank (10 iter.)    | 3rd PageRank (10 iter.)
Hashing by src. vertices | 10s (2.1GB)           | 196s (7.2GB) (12.3/13.2GB) | 156s (7.2GB) (12.1/13.2GB) | OOM
Hashing by dst. vertices | 10s (2.1GB)           | 168s (5.8GB) (12.5/13.2GB) | OOM                        | -
EdgePartition2D          | 10s (2.1GB)           | 199s (8GB) (12.1/13.2GB)   | OOM                        | -
EdgePartition1D          | 13s (2.1GB, outlier?) | 188s (6.9GB) (12.2/13.2GB) | 173s (7.2GB) (12.3/13.2GB) | OOM

10GB slave RAM & 40 partitions:
                         | loading     | 1st PageRank (10 iter.)     | 2nd PageRank (10 iter.)   | 3rd PageRank (10 iter.)
Hashing by src. vertices | 10s (2.1GB) | 189s (13.2GB) (17.5/26.1GB) | 186s (18.2GB) (23/26.1GB) | OOM
Hashing by dst. vertices | 11s (2.1GB) | 223s (15.9GB) (21.1/26.1GB) | OOM                       | -
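The two user-defined strategies can be written against GraphX's PartitionStrategy interface; a minimal sketch (the hashing used in the actual experiments may have differed, e.g. by adding a mixing prime as GraphX's built-in EdgePartition1D does):

```scala
import org.apache.spark.graphx._

// 1) Hash edges by source vertex ID: all out-edges of a vertex co-located.
object HashBySrc extends PartitionStrategy {
  override def getPartition(src: VertexId, dst: VertexId,
                            numParts: PartitionID): PartitionID =
    (math.abs(src) % numParts).toInt
}

// 2) Hash edges by destination vertex ID: all in-edges co-located.
object HashByDst extends PartitionStrategy {
  override def getPartition(src: VertexId, dst: VertexId,
                            numParts: PartitionID): PartitionID =
    (math.abs(dst) % numParts).toInt
}

// Usage, mirroring the experiment: repartition the edges, then run the
// built-in PageRank for 10 iterations.
// val ranks = graph.partitionBy(HashBySrc).staticPageRank(10).vertices
```

EdgePartition1D and EdgePartition2D in the tables above are GraphX's built-in strategies; the 2D strategy partitions the adjacency matrix into a roughly sqrt(numParts) x sqrt(numParts) grid of blocks, which bounds vertex replication.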
Spark/GraphX Experience and Messaging Cost
• Our initial experience with Spark
  – Spark performs well with large per-node memory (>= 68GB DRAM), as reported in the Spark/GraphX paper
  – It does not perform well on clusters whose nodes have smaller DRAM
• Messaging overheads
  – Distributed graph processing systems do not scale as the number of nodes increases, due to the messaging cost among compute nodes in the cluster to synchronize the computation in each iteration round
Summary
GraphMap
- A distributed iterative graph computation framework that effectively utilizes secondary storage
- Clear separation between mutable and read-only data
- Locality-based data placement on disk
- Dynamic access methods based on the workloads of the current iteration
Ongoing research
- Disk and worker colocation to improve disk access performance
- Efficient and lightweight partitioning techniques, incorporating our work on GraphLego for single-PC graph processing [ACM HPDC 2015]
- Comparing with Spark/GraphX on a larger-DRAM cluster
General-Purpose Distributed Graph System
Existing state of the art
- Separate efforts for the two representative graph operations
- Separate efforts for scale-up and scale-out systems
Challenges for developing a general-purpose graph processing system
- Different data access patterns / graph computation models
- Different inter-node communication effects
Possible directions
- Graph summarization techniques
- Lightweight graph partitioning techniques
- Optimized data storage systems and access methods

VISIT:
https://sites.google.com/site/gtshape/
https://sites.google.com/site/git_GraphLego/
https://sites.google.com/site/git_GraphTwist/