Ling Liu, Part 02: Big Graph Processing
TRANSCRIPT
Ling Liu
School of Computer Science, College of Computing
Part II: Distributed Graph Processing
Big Data Trends
Big Data: Volume, Velocity, Variety
- 1 zettabyte = a trillion gigabytes (10^21 bytes) (Cisco, 2012)
- 500 million tweets per day
- 100 hours of video are uploaded every minute
Why Graphs?
Graphs are everywhere: social network graphs, road networks, national security, business analytics, biological graphs.
Figure examples:
- Friendship graph (Facebook Engineering, 2010)
- Brain network (The Journal of Neuroscience, 2011)
- US road network (www.pavementinteractive.org)
- Web security graph (McAfee, 2013)
- Intelligence data model (NSA, 2013)
How Big?
(Figures from Paul Burkhardt and Chris Waring, "An NSA Big Graph experiment," NSA-RD-2013-056001v1)

Social scale: 1 billion vertices, 100 billion edges
- 111 PB adjacency matrix
- 2.92 TB adjacency list
- 2.92 TB edge list
(Twitter graph from the Gephi dataset, http://www.gephi.org)

Web scale: 50 billion vertices, 1 trillion edges
- 271 EB adjacency matrix
- 29.5 TB adjacency list
- 29.1 TB edge list
(Internet graph from the Opte Project, http://www.opte.org/maps; web graph from the SNAP database, http://snap.stanford.edu/data)

Brain scale: 100 billion vertices, 100 trillion edges
- 1.1 ZB adjacency matrix (2.08 mN_A bytes, "molar bytes," where N_A = 6.022 x 10^23 mol^-1)
- 2.84 PB adjacency list
- 2.84 PB edge list
(Human connectome. Gerhard et al., Frontiers in Neuroinformatics 5(3), 2011)
Big Graph Data: Technical Challenges
Huge and growing size
- Requires massive storage capacities
- Graph analytics usually requires much bigger computing and storage resources
Complicated correlations among data entities (vertices)
- Make it hard to parallelize graph processing (hard to partition)
- Most existing big data systems are not designed to handle such complexity
Skewed distribution (i.e., high-degree vertices)
- Makes it hard to ensure load balancing
Parallel Graph Processing: Challenges
• Structure-driven computation
  – Storage and data transfer issues
• Irregular graph structure and computation model
  – Storage and data/computation partitioning issues
  – Partitioning vs. load/resource balancing
Parallel Graph Processing: Opportunities
• Extend existing paradigms
  – Vertex-centric
  – Edge-centric
• Build new frameworks for parallel graph processing
  – Single-machine solutions: GraphLego [ACM HPDC 2015], GraphTwist [VLDB 2015]
  – Distributed approaches: GraphMap [IEEE SC 2015], PathGraph [IEEE SC 2014]
Build New Graph Frameworks: Key Requirements/Challenges
• Less pre-processing
• Low and load-balanced computation
• Low and load-balanced communication
• Low memory footprint
• Scalable w.r.t. cluster size and graph size
• A general graph processing framework for large collections of graph computation algorithms and applications
Graph Operations: Two Distinct Classes
Iterative graph algorithms
- Each execution consists of a set of iterations
- In each iteration, vertex (or edge) values are updated
- All (or most) vertices participate in the execution
- Examples: PageRank, shortest paths (SSSP), connected components
- Systems: Pregel, GraphLab, GraphChi, X-Stream, GraphX, Pregelix
Graph pattern queries
- Subgraph matching problem
- Requires fast query response time
- Explores a small fraction of the entire graph
- Examples: friends-of-friends, triangle patterns
- Systems: RDF-3X, TripleBit, SHAPE
(Our systems for the two classes: SHAPE [VLDB 2014] for graph pattern queries, GraphMap [IEEE SC 2015] for iterative graph algorithms)
Distributed Approaches to Parallel Graph Processing
SHAPE: distributed RDF system with semantic hash partitioning
- Graph pattern queries
- Semantic hash partitioning
- Distributed RDF query processing
- Experiments
GraphMap: scalable iterative graph computations
- Iterative graph computations
- GraphMap approaches
- Experiments
What Are Iterative Graph Algorithms?
Iterative graph algorithms
- Each execution consists of a set of iterations
- In each iteration, vertex (or edge) values are updated
- All (or most) vertices participate in the operations
- Examples: PageRank, shortest paths (SSSP), connected components
- Systems: Google's Pregel, GraphLab, GraphChi, X-Stream, GraphX, Pregelix
[Figures: SSSP and Connected Components examples. Source: AMPLab]
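To make the vertex-centric iterative model concrete, here is a minimal SSSP sketch written against GraphX's Pregel operator (illustrative, not taken from the slides; it assumes a graph with non-negative Double edge weights):

```scala
import org.apache.spark.graphx._

// Single-source shortest paths with GraphX's Pregel operator.
def sssp(graph: Graph[Long, Double], sourceId: VertexId): Graph[Double, Double] = {
  // Initialize distances: 0 at the source, infinity everywhere else.
  val init = graph.mapVertices((id, _) =>
    if (id == sourceId) 0.0 else Double.PositiveInfinity)

  init.pregel(Double.PositiveInfinity)(
    // Vertex program: keep the smaller of the current and incoming distance.
    (id, dist, newDist) => math.min(dist, newDist),
    // Send phase: relax every edge that can improve its destination.
    triplet =>
      if (triplet.srcAttr + triplet.attr < triplet.dstAttr)
        Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
      else Iterator.empty,
    // Merge concurrent messages by taking the minimum.
    (a, b) => math.min(a, b))
}
```

Only vertices that receive a message remain active in the next superstep, so the set of active vertices changes from iteration to iteration; GraphMap's dynamic access methods (later in this talk) exploit exactly this behavior.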
Why Is Iterative Graph Processing So Difficult?
Huge and growing size of graph data
- Makes it hard to store and handle the data on a single machine
Poor locality (many random accesses)
- Each vertex depends on its neighboring vertices, recursively
Huge size of intermediate data for each iteration
- Requires additional computing and storage resources
Heterogeneous graph algorithms
- Different algorithms have different computation and access patterns
High-degree vertices
- Make it hard to ensure load balancing
The Problems of Current Computation Models
• "Ghost" vertices maintain adjacency structure and replicate remote data.
• Too many interactions among partitions
[Figure: a partitioned graph with "ghost" vertices replicated across partitions]
Iterative Graph Algorithm Example: Connected Components
[Figure source: Apache Flink]
Why Don’t We Use MapReduce?
Of course, we can use MapReduce!
The first iteraEon of Connected Components for this graph would be …
Map
K V 2 1
K V 1 2 3 2 4 2
K V 2 4 3 4 K V
2 3 4 3
Reduce
K Values 1 2
Min 1
2 1,3,4 3 2,4 4 2,3
1 2 2
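A minimal sketch of one such iteration, in plain Scala collections standing in for an actual Hadoop job (names are illustrative):

```scala
// One Connected Components iteration in map/reduce form.
// labels: the current component label of each vertex (initially its own ID).
def ccIteration(labels: Map[Long, Long], edges: Seq[(Long, Long)]): Map[Long, Long] = {
  // "Map" phase: each edge emits each endpoint's label to the other endpoint.
  val messages = edges.flatMap { case (u, v) =>
    Seq(u -> labels(v), v -> labels(u))
  }
  // "Reduce" phase: each vertex keeps the minimum of its own label
  // and all labels it received.
  (messages ++ labels.toSeq)
    .groupBy { case (vertex, _) => vertex }
    .map { case (vertex, msgs) => vertex -> msgs.map(_._2).min }
}
```

Running this on the graph above (labels initialized to vertex IDs) reproduces the reduce output shown; repeating it until the labels stop changing yields the connected components. The catch is that every iteration is a full MapReduce job, which is exactly the problem discussed next.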
Why We Shouldn’t Use MapReduce
But …
In a typical MapReduce job, disk IOs are performed in four places
So… 10 iteraEons mean…
Figure source: http://arasan-blog.blogspot.com/
Disk IOs in 40 places
18 18
Related Work
Distributed memory-based systems
- Messaging-based: Google Pregel, Apache Giraph, Apache Hama
- Vertex mirroring: GraphLab, PowerGraph, GraphX
- Dynamic load balancing: Mizan, GPS
- Graph-centric view: Giraph++
Disk-based systems using a single machine
- Vertex-centric model: GraphChi
- Edge-centric model: X-Stream
- Vertex-edge-centric: GraphLego
With external memory
- Out-of-core capabilities (Apache Giraph, Apache Hama, GraphX)
- Not optimized for graph computations
- Users need to configure several parameters
Two Research Directions
Iterative graph processing systems:

Disk-based systems on a single machine
- Load a part of the input graph in memory
- Include a set of data structures and techniques to efficiently load graph data from disk
- GraphChi, X-Stream, ...
- Disadvantages: 1) relatively slow, 2) resource limitations of a single machine

Distributed memory-based systems on a cluster
- Load the whole input graph in memory
- Load all intermediate results and messages in memory
- Pregel, Giraph, Hama, GraphLab, GraphX, ...
- Disadvantages: 1) very high memory requirement, 2) coordination of distributed machines
Main Features
Develop GraphMap
- A distributed iterative graph computation framework that effectively utilizes secondary storage
- Goal: reduce the memory requirement of iterative graph computations while ensuring competitive (or better) performance
Main contributions
- Clear separation between mutable and read-only data
- Two-level partitioning technique for locality-optimized data placement
- Dynamic access methods based on the workloads of the current iteration
Clear Data Separation
Graph data consists of:
- Vertices and their data (mutable)
- Edges and their data (read-only)
Read edge data from disk in each iteration! (A minimal sketch of this separation follows.)
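A minimal sketch of what this separation looks like in code (illustrative types, not GraphMap's actual classes):

```scala
import scala.collection.mutable

// Per-partition state: only the vertex values mutate across iterations;
// the edge structure is immutable, so it can live on disk and be re-read
// (streamed) in each iteration instead of being held in memory.
final class GraphPartition(
    val vertexValues: mutable.Map[Long, Double],   // mutable, kept in memory
    val edges: IndexedSeq[(Long, Long)])           // read-only, loaded from disk
```

Because the edge data never changes, it never has to be written back to disk, which is what makes keeping it on secondary storage cheap.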
Locality-Based Data Placement on Disk
Edge access locality
- All edges (out-edges, in-edges, or bi-edges) of a vertex are accessed together to update its vertex value
  → We place all connected edges of a vertex together on disk
Vertex access locality
- All vertices in a partition are accessed by the same worker (processor) in every iteration
  → We store all vertices in a partition, and their edges, into contiguous disk blocks to utilize sequential disk accesses
How can you access disk efficiently for each iteration? (See the row-key sketch below.)
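One concrete way to get both locality properties in a key-value store such as HBase is a composite row key that sorts first by partition and then by vertex; this is a sketch in the spirit of the slide, not necessarily GraphMap's exact key format:

```scala
import java.nio.ByteBuffer

// Composite row key: (partition ID, vertex ID). HBase keeps rows sorted by
// key, so all vertices of one partition, together with the edge lists stored
// in their rows, end up in contiguous disk blocks and can be fetched with a
// single sequential scan per partition.
def rowKey(partitionId: Int, vertexId: Long): Array[Byte] =
  ByteBuffer.allocate(12).putInt(partitionId).putLong(vertexId).array()
```

Storing all edges of a vertex in that vertex's row gives edge access locality; the partition-ID prefix gives vertex access locality.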
Dynamic Access Methods
Workloads vary across iterations. Is the current workload larger than the threshold?
- YES → sequential disk accesses
- NO → random disk accesses
The threshold is dynamically configured based on actual access times, for each iteration and for each worker. (A sketch of this selection logic follows the figure below.)
[Figure: number of active vertices (x1000) over iterations 0-9 for PageRank, CC, and SSSP]
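A sketch of the selection logic (the names and the linear cost model are assumptions for illustration, not GraphMap's exact implementation):

```scala
sealed trait AccessMethod
case object Sequential extends AccessMethod  // scan the whole partition
case object Random extends AccessMethod      // fetch only the active vertices

// Per iteration, per worker: scan sequentially only when enough vertices
// are active to amortize the cost of a full-partition scan.
def chooseAccess(activeVertices: Long, threshold: Long): AccessMethod =
  if (activeVertices > threshold) Sequential else Random

// Break-even point, assuming a flat sequential scan cost per partition and
// a fixed cost per random lookup, both measured at runtime per the slide.
def estimateThreshold(seqScanTimeMs: Double, randomLookupTimeMs: Double): Long =
  (seqScanTimeMs / randomLookupTimeMs).toLong
```

This matches the active-vertex curves above: PageRank keeps nearly all vertices active, so sequential scans win throughout, while SSSP and CC have frontiers that grow and shrink, crossing the threshold mid-run.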
Experiments
First prototype of GraphMap
- BSP engine & messaging engine: utilize Apache Hama
- Disk storage: utilize Apache HBase (a two-dimensional key-value store)
Settings
- Cluster of 21 machines on Emulab
  - 12GB RAM, Xeon E5530, 500GB and 250GB SATA disks
  - Connected via a 1 GigE network
- HBase (ver. 0.96) on HDFS of Hadoop (ver. 1.0.4)
- Hama (ver. 0.6.3)
Iterative graph algorithms
- 1) PageRank (10 iter.), 2) SSSP, 3) CC
Execution Time
Analysis
- Hama fails for large graphs with more than 900M edges, while GraphMap still works
- Note that, in all cases, GraphMap is faster (up to 6 times) than Hama, which is the in-memory system
Breakdown of GraphMap Execution Time
[Figures: PageRank, SSSP, and CC on uk-2005]
Analysis
- For PageRank, all iterations have similar results except the first and last
- For SSSP, iterations 5-15 utilize sequential disk accesses based on our dynamic selection
- For CC, random disk accesses are selected from iteration 24
Effects of Dynamic Access Methods
Analysis
- GraphMap chooses the optimal access method in most of the iterations
- Possible further improvement through fine-tuning in iterations 5 and 15
- For cit-Patents, GraphMap always chooses random accesses because only 3.3% of vertices are reachable from the start vertex, so the number of active vertices is always small
[Figure: computation time (sec) over iterations 0-40, comparing Sequential, Random, and Dynamic access methods]
Comparing GraphMap with Other Systems
• Existing distributed graph systems are all in-memory systems. In addition to Hama, we give a relative comparison with a few other representative systems.
• GraphMap: 12GB DRAM per node on a cluster of 21 nodes (252GB distributed shared memory); the compared systems use 5x the DRAM per node.
LiveJournal (LJ) Social Graph Dataset
- Vertices: members; Edges: friendship
Graph datasets (stored in HDFS):
- cit-Patents (raw size: 268MB): 3.8M vertices, 16.5M edges
- soc-LiveJournal1 (raw size: 1.1GB): 4.8M vertices, 69M edges
Our Initial Experience with Spark / GraphX
• Cluster setting
  – 6 machines (1 master & 5 slaves)
• Spark setting
  – Spark shell (i.e., we did NOT implement any Spark application yet); built-in PageRank function of GraphX
  – All 40 cores (= 8 cores x 5 slaves)
  – Portion of memory for RDD storage: 0.52 (by default)
    • If we assign 512MB to each executor, about 265MB (≈ 0.52 x 512MB) is dedicated to RDD storage
User-Defined Partitioning (soc-LiveJournal1)
• Implemented two hash partitioning techniques using the GraphX API: 1) by source vertex IDs, 2) by destination vertex IDs (a sketch of both follows the tables below)
5GB slave RAM & 40 partitions:
                         | loading               | 1st PageRank (10 iter.)    | 2nd PageRank (10 iter.)    | 3rd PageRank (10 iter.)
Hashing by src. vertices | 10s (2.1GB)           | 196s (7.2GB) (12.3/13.2GB) | 156s (7.2GB) (12.1/13.2GB) | OOM
Hashing by dst. vertices | 10s (2.1GB)           | 168s (5.8GB) (12.5/13.2GB) | OOM                        | -
EdgePartition2D          | 10s (2.1GB)           | 199s (8GB) (12.1/13.2GB)   | OOM                        | -
EdgePartition1D          | 13s (2.1GB, outlier?) | 188s (6.9GB) (12.2/13.2GB) | 173s (7.2GB) (12.3/13.2GB) | OOM

10GB slave RAM & 40 partitions:
                         | loading     | 1st PageRank (10 iter.)     | 2nd PageRank (10 iter.)   | 3rd PageRank (10 iter.)
Hashing by src. vertices | 10s (2.1GB) | 189s (13.2GB) (17.5/26.1GB) | 186s (18.2GB) (23/26.1GB) | OOM
Hashing by dst. vertices | 11s (2.1GB) | 223s (15.9GB) (21.1/26.1GB) | OOM                       | -
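The two user-defined strategies can be written against GraphX's PartitionStrategy interface; a minimal sketch (the hashing used in the actual experiments may have differed, e.g. by adding a mixing prime as GraphX's built-in EdgePartition1D does):

```scala
import org.apache.spark.graphx._

// 1) Hash edges by source vertex ID: all out-edges of a vertex co-located.
object HashBySrc extends PartitionStrategy {
  override def getPartition(src: VertexId, dst: VertexId,
                            numParts: PartitionID): PartitionID =
    (math.abs(src) % numParts).toInt
}

// 2) Hash edges by destination vertex ID: all in-edges co-located.
object HashByDst extends PartitionStrategy {
  override def getPartition(src: VertexId, dst: VertexId,
                            numParts: PartitionID): PartitionID =
    (math.abs(dst) % numParts).toInt
}

// Usage, mirroring the experiment: repartition the edges, then run the
// built-in PageRank for 10 iterations.
// val ranks = graph.partitionBy(HashBySrc).staticPageRank(10).vertices
```

EdgePartition1D and EdgePartition2D in the tables above are GraphX's built-in strategies; the 2D strategy partitions the adjacency matrix into a roughly sqrt(numParts) x sqrt(numParts) grid of blocks, which bounds vertex replication.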
Spark/GraphX Experience and Messaging Cost
• Our initial experience with Spark
  – Spark performs well with large per-node memory (>= 68GB DRAM), as reported in the Spark/GraphX paper
  – It does not perform well on clusters whose nodes have smaller DRAM
• Messaging overheads
  – Distributed graph processing systems do not scale as the number of nodes increases, due to the messaging cost among compute nodes in the cluster to synchronize the computation in each iteration round
Summary
GraphMap
- A distributed iterative graph computation framework that effectively utilizes secondary storage
- Clear separation between mutable and read-only data
- Locality-based data placement on disk
- Dynamic access methods based on the workloads of the current iteration
Ongoing research
- Disk and worker colocation to improve disk access performance
- Efficient and lightweight partitioning techniques, incorporating our work on GraphLego for single-PC graph processing [ACM HPDC 2015]
- Comparing with Spark/GraphX on a larger-DRAM cluster
General-Purpose Distributed Graph System
Existing state of the art
- Separate efforts for the two representative graph operations
- Separate efforts for scale-up and scale-out systems
Challenges for developing a general-purpose graph processing system
- Different data access patterns / graph computation models
- Different inter-node communication effects
Possible directions
- Graph summarization techniques
- Lightweight graph partitioning techniques
- Optimized data storage systems and access methods

VISIT:
https://sites.google.com/site/gtshape/
https://sites.google.com/site/git_GraphLego/
https://sites.google.com/site/git_GraphTwist/