![Page 1: Processing graph/relational data with Map-Reduce and Bulk Synchronous Parallel](https://reader033.vdocuments.site/reader033/viewer/2022051312/54620e4eb4af9f581c8b45b1/html5/thumbnails/1.jpg)
Processing graph/relational data with Map-Reduce and Bulk Synchronous Parallel (v. 1.1)

Tomasz Chodakowski
1st Bristol Hadoop Workshop, 08-11-2010
Irregular Algorithms
● Map-reduce – a simplified model for “embarrassingly parallel” problems
– Easily separable into independent tasks
– Captured by static dependence graph
● Most graph algorithms are irregular, i.e.:
– Dependencies between tasks arise during execution
– “don't care non-determinism” - tasks can be executed in arbitrary order yet still yield correct results.
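The “don't care” property can be demonstrated with a small sketch (a hypothetical toy graph; all names are illustrative): a worklist-based shortest-path relaxation reaches the same fixed point no matter which order tasks are popped in, even though new dependencies arise only while it runs.

```python
import random

# Hypothetical toy graph: vertex -> list of (neighbour, edge weight)
GRAPH = {
    "a": [("b", 1), ("c", 6)],
    "b": [("d", 3)],
    "c": [("d", 2)],
    "d": [],
}

def sssp_worklist(graph, source, rng=None):
    """Relax edges from a worklist; the pop order is deliberately randomised."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    worklist = [source]
    while worklist:
        if rng is not None:
            rng.shuffle(worklist)        # arbitrary task order
        v = worklist.pop()
        for nbr, w in graph[v]:
            if dist[v] + w < dist[nbr]:  # relaxations only ever improve distances
                dist[nbr] = dist[v] + w
                worklist.append(nbr)     # dependency discovered during execution
    return dist

# Twenty different execution orders, one fixed point
results = {tuple(sorted(sssp_worklist(GRAPH, "a", random.Random(s)).items()))
           for s in range(20)}
assert len(results) == 1
```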
Irregular Algorithms
● Often operate on data structures with complex topologies:
– Graphs, trees, grids, ...
– Where “data elements” are connected by “relations”
● Computations on such structures depend strongly on relations between data elements
– primary source of dependencies between tasks
more in [ADP] “Amorphous Data-parallelism in Irregular Algorithms”
Relational Data
● Example relations between elements:
– social interactions (co-authorship, friendship)
– web links, document references
– linked data or semantic network relations
– geo-spatial relations
– ...
● Different from a relational model
– in that relations are arbitrary
Graph Algorithms – a Rough Classification
● Aggregation, feature extraction
– Not leveraging latent relations
● Network analysis (matrix-based, single-relational)
– Geodesic (radius, diameter etc.)
– Spectral (eigenvector-based, centrality)
● Algorithmic/node-based algorithms
– Recommender systems, belief/label propagation
– Traversal, path detection, interaction networks, etc.
Iterative Vertex-based Graph Algorithms
● Iteratively:
– Compute local function of a vertex that depends on the vertex state and local graph structure (neighbourhood)
– and/or Modify local state
– and/or Modify local topology
– and/or Pass messages to neighbouring nodes
● -> “vertex-based computation”
Amorphous Data-Parallelism [ADP] operator formulation: “repeated application of neighbourhood operators in a specific order”
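The iterative loop above can be sketched as a synchronous message-passing kernel (plain Python; the graph representation and names are illustrative, not any particular framework's API):

```python
def bsp_sssp(graph, source):
    """Vertex-based SSSP: each superstep, active vertices take the minimum
    incoming distance, update local state, and signal their neighbours."""
    dist = {v: float("inf") for v in graph}
    inbox = {source: [0]}                 # the source starts active
    supersteps = 0
    while inbox:                          # stop when no vertex is active
        outbox = {}
        for v, msgs in inbox.items():     # compute local function of vertex
            best = min(msgs)
            if best < dist[v]:            # modify local state
                dist[v] = best
                for nbr, w in graph[v]:   # pass messages to neighbours
                    outbox.setdefault(nbr, []).append(best + w)
        inbox = outbox                    # barrier: delivered next superstep
        supersteps += 1
    return dist, supersteps

# Hypothetical toy graph: vertex -> list of (neighbour, edge weight)
GRAPH = {"a": [("b", 1), ("c", 6)], "b": [("d", 3)], "c": [("d", 2)], "d": []}
```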
Recent applications/developments
● Google work on graph-based YouTube recommendations:
– Leveraging latent information
– Diffusing interest in sparsely labeled video clips
● User profiling, sentiment analysis
– Facebook likes, Hunch, Gravity, MusicMetric ...
Single Source Shortest Path

[Figure: a directed graph with positive integer edge weights and a source vertex labelled 0, split into two partitions, shown beside a time-space diagram of P1 and P2 with one work phase]

● Directed graph labelled with positive integers
● Graph structure split into two partitions (P1, P2)
● This time-space view shows workload and communication between partitions
● Turquoise rectangles show computational workload for a partition (work)
Single Source Shortest Path

[Figure: the source vertex (distance 0) sends signals 0+6, 0+1 and 0+9 along its out-edges; the time-space diagram gains a comm phase after the work phase]

● Active vertices are in turquoise
● Signals being passed along relations are in light green
● Thick green lines show costly inter-partition communications
Single Source Shortest Path

[Figure: as before, with a barrier drawn after the work and comm phases in the time-space diagram]

● Vertical grey line is a barrier synchronisation to avoid race conditions
Single Source Shortest Path

[Figure: the signalled vertices now hold tentative distances 6, 1 and 9; a second work phase begins in the time-space diagram]

● Work, comm and barrier form a BSP superstep
● Vertices become active upon receiving a signal in the previous superstep
Single Source Shortest Path

[Figure: the newly active vertices send signals 1+1, 1+3 and 6+2; a second comm phase follows the second work phase]

● After performing local computation they send signals to their neighbouring vertices
Single Source Shortest Path

[Figure: the second superstep closes with another barrier in the time-space diagram]
Single Source Shortest Path

[Figure: the delivered signals improve two tentative distances to 4 and 8; a third work phase begins]
Single Source Shortest Path

[Figure: the vertex updated to 4 sends the signal 4+2 to its neighbour; a third comm phase follows]
Single Source Shortest Path

[Figure: a third barrier closes the superstep]
Single Source Shortest Path

[Figure: the last signal improves the final tentative distance from 8 to 6; a fourth work phase begins]
Single Source Shortest Path

[Figure: no new signals are produced, so the final superstep ends after its barrier with all vertices inactive]

● Computation ends when there are no active vertices left
Bulk Synchronous Parallel
[Figure: time-space diagram of partitions P1, P2, ..., Pn across supersteps 0, 1, 2, 3, ..., each superstep consisting of work (w0..w3), communication (h0..h3) and a barrier (l0..l3)]

superstep n cost = wn + hn + ln

= time to finish work on the slowest partition + cost of bulk communication + barrier synchronisation time
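In code, the per-superstep cost might be computed as follows (a minimal sketch of the slide's w/h/l terms, assuming per-partition work and communication times are known):

```python
def superstep_cost(work, comm, barrier):
    """Cost of superstep n = w_n + h_n + l_n:
    time to finish work on the slowest partition (w),
    plus the cost of bulk communication (h),
    plus barrier synchronisation time (l)."""
    return max(work) + max(comm) + barrier

# One straggler partition dominates the whole superstep:
balanced   = superstep_cost(work=[5, 5, 5],  comm=[2, 2, 2], barrier=1)  # 8
unbalanced = superstep_cost(work=[1, 13, 1], comm=[2, 2, 2], barrier=1)  # 16
assert unbalanced > balanced
```

This makes the "well balanced partitions" requirement on the next slide concrete: the barrier means every partition waits for the slowest one.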
Bulk Synchronous Parallel
● Advantages
– Simple and portable execution model
– Clear cost model
– No concurrency control, no data races, deadlocks, etc.
● Disadvantages
– Coarse grained
● Depends on a large “parallel slack”
– Requires well-partitioned problem space for efficiency (well balanced partitions)
more in [BSP] “A bridging model for parallel computation”
Bulk Synchronous Parallel - extensions
● Combiners
– minimizing inter-node communication (h factor)
● Aggregators
– Computing global state (e.g. map/reduce)
And other extensions...
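Both extensions can be sketched for the SSSP example (plain Python, illustrative; not any framework's actual API):

```python
def combine(outbox):
    """Combiner: for SSSP only the minimum proposed distance per target
    vertex needs to cross a partition boundary, shrinking the h factor."""
    return {target: min(msgs) for target, msgs in outbox.items()}

def aggregate(values, op=min):
    """Aggregator: fold a single global value over all vertices between
    supersteps (e.g. the smallest tentative distance, or a sum)."""
    result = values[0]
    for v in values[1:]:
        result = op(result, v)
    return result

# Three proposals for "d" collapse to one message:
assert combine({"d": [4, 8, 9], "b": [1]}) == {"d": 4, "b": 1}
# Global minimum over the partitions' tentative distances:
assert aggregate([6, 1, 9], min) == 1
```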
Sample code

```java
public void superStep() {
    // Choose the minimum proposed distance among incoming messages
    int minDist = this.isStartingElement() ? 0 : Integer.MAX_VALUE;
    for (DistanceMessage msg : messages()) {
        minDist = Math.min(minDist, msg.getDistance());
    }
    // If it improves the path, store and propagate
    if (minDist < this.getCurrentDistance()) {
        this.setCurrentDistance(minDist);
        IVertex v = this.getElement();
        for (IEdge r : v.getOutgoingEdges(DemoRelationshipTypes.KNOWS)) {
            IElement recipient = r.getOtherElement(v);
            int rDist = this.getLengthOf(r);
            this.sendMessage(new DistanceMessage(minDist + rDist, recipient.getId()));
        }
    }
}
```
SSSP - Map-Reduce Naive
● Idea [DPMR]:
– In map phase:
● emit both signals and the local vertex structure and state
– In reduce phase:
● gather signals and local vertex structure messages
● reconstruct vertex structure and state
SSSP - Map-Reduce Naive
```
def map(Id nId, Node N):
    # emit state and structure
    emit(nId, N.graphStateAndStruct)
    if N.isActive:
        for nbr in N.adjacencyL:
            # local computation
            dist := N.currDist + distToNbr
            # emit signals
            emit(nbr.id, dist)

def reduce(Id rId, {m1, m2, ...}):
    new M; M.deActivate
    minDist := MAX_VALUE
    for m in {m1, m2, ...}:
        if m is Node: M := m            # state
        else if m is Distance:          # signals
            minDist := min(minDist, m)
    if M.currDist > minDist:
        M.currDist := minDist
        M.activate
    emit(rId, M)
```
SSSP - Map Reduce Naive - issues
● Cost associated with marshaling intermediate <k,v> pairs for combiners (which are optional)
– -> in-line combiner
● Need to pass the whole graph state and structure around
– -> “Shimmy trick” -- pin down the structure
● Partitions vertices without regard to graph topology
– -> cluster highly connected components together
Inline Combiners
● In job configure:
– Initialize a map<NodeId, Distance>
● In job map operation:
– Do not emit intermediate pairs ( emit(nbr.id, dist) )
– Store them in the local map instead
– Combine values in the same slots
● In job close:
– Emit a value from each slot in the map to the corresponding neighbour
● emit(nbr.id, map[nbr.id])
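The steps above might look like this (illustrative Python; the node records and the emit callback are assumptions for the sketch, not Hadoop's actual API):

```python
class InlineCombinerMapper:
    """In-mapper combining: buffer per-neighbour distances in a local map
    during map() and emit one combined pair per slot in close()."""

    def configure(self):
        self.slots = {}                       # map<NodeId, Distance>

    def map(self, n_id, node, emit):
        emit(n_id, node)                      # still pass structure/state through
        if node["active"]:
            for nbr, w in node["adj"]:
                dist = node["dist"] + w
                # do not emit(nbr, dist); combine into the local slot instead
                self.slots[nbr] = min(self.slots.get(nbr, float("inf")), dist)

    def close(self, emit):
        for nbr, dist in self.slots.items():  # one combined value per neighbour
            emit(nbr, dist)

pairs = []
emit = lambda k, v: pairs.append((k, v))
m = InlineCombinerMapper()
m.configure()
m.map("b", {"active": True, "dist": 1, "adj": [("d", 3)]}, emit)
m.map("c", {"active": True, "dist": 6, "adj": [("d", 2)]}, emit)
m.close(emit)
# only the combined minimum for "d" leaves the mapper
```

Compared to emitting one intermediate pair per edge, only one pair per distinct neighbour is marshalled and shuffled.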
“Shimmy trick”
● Store graph structure in a file system (no shuffle)
● Inspired by a parallel merge join
[Figure: two datasets side by side, each split into partitions p1, p2, p3; the left is sorted by join key, the right sorted and partitioned by join key, so matching partitions can be merge-joined pairwise]
“Shimmy trick”
● Assume:
– Graph G representation sorted by node ids
– G partitioned into n parts: G1, G2, ..., Gn
– Use the same partitioner as in MR
– Set the number of reducers to n
● The above gives us:
– Reducer Ri receives the same intermediate keys as those in graph partition Gi (in sorted order)
“Shimmy trick”
```
def reduce(Id rId, {m1, m2, ...}):
    # advance through the partition file up to the reduce key,
    # re-emitting the nodes passed over unchanged
    repeat:
        (id nId, node N) <- P.read()
        if nId != rId: N.deactivate; emit(nId, N)
    until: nId == rId
    minDist := MAX_VALUE
    for m in {m1, m2, ...}:
        minDist := min(minDist, m)
    if N.currDist > minDist:
        N.currDist := minDist
        N.activate
    emit(rId, N)

def configure():
    P.openGraphPartition()

def close():
    # flush the remainder of the partition file
    repeat:
        (id nId, node N) <- P.read()
        N.deactivate
        emit(nId, N)
```
“Shimmy trick”
● Improvements:
– Files containing graph structure reside on dfs
– Reducers arbitrarily assigned to cluster machines
● -> remote reads.
● -> change the scheduler to assign key ranges to the same machines consistently.
Topology-aware Partitioner
● Choose a partitioner that:
– minimizes inter-block traffic;
– maximizes intra-block traffic;
– places adjacent nodes in the same block
● Difficult to achieve particularly with many real world datasets:
– Power-law distributions
– State-of-the-art partitioners (e.g. ParMETIS) are reported to fail in such cases
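The partitioner's objective can be made concrete as an edge-cut fraction (a small illustrative sketch; graph and block assignments are hypothetical):

```python
def cut_fraction(edges, block_of):
    """Fraction of edges whose endpoints fall in different blocks --
    the inter-block traffic a good partitioner tries to minimise."""
    crossing = sum(1 for u, v in edges if block_of[u] != block_of[v])
    return crossing / len(edges)

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
good = {"a": 0, "b": 0, "c": 1, "d": 1}   # adjacent nodes kept together
bad  = {"a": 0, "b": 1, "c": 0, "d": 1}   # adjacency ignored
assert cut_fraction(edges, good) == 0.5
assert cut_fraction(edges, bad) == 1.0
```

Under a power-law degree distribution, any block containing a hub vertex tends to have edges into most other blocks, which is why balanced low-cut partitions are hard to find.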
MR Graph Processing Design Pattern
● [DPMR] reports a 60%-70% improvement over the naive implementation
● Solution closely resembles the BSP model
BSP (inspired) implementations
● Google Pregel
– classic BSP, C++, production
● CMU GraphLab
– inspired by BSP, Java, multi-core
– consistency models, custom schedulers
● Apache Hama
– scientific computation package that runs on top of Hadoop; BSP, MS Dryad (?)
● Signal/Collect (Zurich University)
– Scala, not yet distributed
● ...
Open questions
● What problems are particularly suitable for MR and which ones for BSP – where are the boundaries?
– Topology-based centrality algorithms (PageRank):
● Algebraic, matrix-based methods vs. vertex-based ones?
● When considering graph algorithms:
– MR user base vs. BSP ergonomics?
– Performance overheads?
● Relaxing the BSP synchronous schedule -> “Amorphous data parallelism”
POC, Sample Code
● Project Masuria (early stages, 2011-02)
– http://masuria-project.org/
– As much a POC of a BSP framework as it is a (distributed) OSGi playground
● Sample code:
– https://github.com/tch/Cloud9 *
– [email protected]:tch_sandbox.git
– RunSSSPNaive.java
– RunSSSPShimmy.java *
* - expect (my) bugs
Based on Jimmy Lin and Michael Schatz's Cloud9 library
References
● [ADP] “Amorphous Data-parallelism in Irregular Algorithms”, Keshav Pingali et al.
● [BSP] “A bridging model for parallel computation”, Leslie G. Valiant
● [DPMR] “Design Patterns for Efficient Graph Algorithms in MapReduce”, Jimmy Lin and Michael Schatz