
GraphLab: A New Framework For Parallel Machine Learning

Amir H. Payberah ([email protected])
Amirkabir University of Technology (Tehran Polytechnic)


Data-Parallel Model for Large-Scale Graph Processing

- The platforms that have worked well for developing parallel applications are not necessarily effective for large-scale graph problems.


Graph-Parallel Processing

- Restricts the types of computation.

- New techniques to partition and distribute graphs.

- Exploits graph structure.

- Executes graph algorithms orders of magnitude faster than more general data-parallel systems.


Data-Parallel vs. Graph-Parallel Computation


Pregel

- Vertex-centric.

- Bulk Synchronous Parallel (BSP).

- Runs as a sequence of iterations (supersteps).

- A vertex in superstep S can (see the sketch below):
  - read messages sent to it in superstep S-1.
  - send messages to other vertices, to be received in superstep S+1.
  - modify its state.
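A minimal Python sketch of this superstep loop, assuming a toy in-memory graph; run_bsp, pagerank_compute, and the dict-based message passing are illustrative, not Pregel's actual API:

    def run_bsp(state, out_nbrs, compute, supersteps):
        # state: {vertex: value}; out_nbrs: {vertex: [out-neighbors]}
        inbox = {v: [] for v in state}
        for _ in range(supersteps):
            outbox = {v: [] for v in state}
            for v in state:
                # a vertex reads messages from superstep S-1, updates its own
                # state, and sends messages delivered in superstep S+1
                state[v] = compute(v, state[v], inbox[v], out_nbrs[v], outbox)
            inbox = outbox  # the synchronization barrier between supersteps
        return state

    def pagerank_compute(v, rank, msgs, nbrs, outbox):
        # textbook PageRank with damping 0.85, for illustration only
        if msgs:
            rank = 0.15 + 0.85 * sum(msgs)
        for u in nbrs:
            outbox[u].append(rank / max(len(nbrs), 1))
        return rank

    # e.g. run_bsp({1: 1.0, 2: 1.0}, {1: [2], 2: [1]}, pagerank_compute, 10)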


Pregel Limitations

- Inefficient if different regions of the graph converge at different speeds.

- Can suffer if one task is more expensive than the others.

- The runtime of each phase is determined by the slowest machine.


Data Model

- A directed graph that stores the program state, called the data graph.


Vertex Scope

- The scope of vertex v is the data stored in vertex v, in all adjacent vertices, and in all adjacent edges.


Programming Model (1/3)

- Rather than adopting message passing as in Pregel, GraphLab allows the user-defined function of a vertex to read and modify any of the data in its scope.


Programming Model (2/3)

- Update function: a user-defined function, similar to Compute in Pregel.

- It can read and modify the data within the scope of a vertex.

- It schedules the future execution of other update functions.


Programming Model (3/3)

- Sync function: similar to aggregate in Pregel.

- Maintains global aggregates (see the sketch below).

- Runs periodically in the background.
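A minimal sketch of what a sync-style aggregator computes; the names sync, fold, and zero are illustrative, not GraphLab's actual API:

    def sync(vertex_data, fold, zero):
        # fold over all vertex data to maintain a global aggregate,
        # e.g. the total rank or a convergence residual
        acc = zero
        for v, data in vertex_data.items():
            acc = fold(acc, data)
        return acc

    # e.g. total_rank = sync(R, lambda acc, r: acc + r, 0.0)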


Execution Model

- Each task in the set of tasks T is a tuple (f, v) consisting of an update function f and a vertex v.

- After an update function f(v) executes, the modified data in the scope Sv is written back to the data graph (see the sketch below).
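A minimal sketch of this execution loop, assuming a graph object with illustrative scope and write_back helpers; none of these names are GraphLab's actual API:

    from collections import deque

    def run(graph, initial_tasks):
        tasks = deque(initial_tasks)   # queue of (update_function, vertex) pairs
        while tasks:
            f, v = tasks.popleft()
            scope = graph.scope(v)     # data on v, its adjacent edges and vertices
            new_tasks = f(v, scope)    # may read/modify scope data, returns tasks
            graph.write_back(v, scope) # modified Sv written back to the data graph
            tasks.extend(new_tasks)    # schedule future update executions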


Example: PageRank

GraphLab_PageRank(i)
  // compute sum over neighbors
  total = 0
  foreach (j in in_neighbors(i)):
    total = total + R[j] * w_ji

  // update the PageRank
  R[i] = 0.15 + total

  // trigger neighbors to run again
  foreach (j in out_neighbors(i)):
    signal vertex-program on j

R[i] = 0.15 + Σ_{j ∈ Nbrs(i)} w_ji R[j]
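A runnable Python version of this update under dynamic scheduling; the tolerance tol and the dict-based graph representation are assumptions, not part of the slide:

    from collections import deque

    def pagerank_update(i, R, w, in_nbrs, out_nbrs, tol=1e-3):
        # the update function above: recompute R[i] from in-neighbors
        old = R[i]
        R[i] = 0.15 + sum(R[j] * w[(j, i)] for j in in_nbrs[i])
        # signal out-neighbors only if R[i] changed noticeably
        return out_nbrs[i] if abs(R[i] - old) > tol else []

    def pagerank(vertices, R, w, in_nbrs, out_nbrs):
        queue, queued = deque(vertices), set(vertices)
        while queue:
            i = queue.popleft()
            queued.discard(i)
            for j in pagerank_update(i, R, w, in_nbrs, out_nbrs):
                if j not in queued:      # avoid duplicate tasks in the queue
                    queue.append(j)
                    queued.add(j)

Scheduling only the neighbors whose input actually changed is what lets regions of the graph converge at their own pace, unlike the BSP supersteps above.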


Data Consistency (1/3)

- Overlapping scopes: a race condition can arise when two update functions execute simultaneously.

- Full consistency: during the execution of f(v), no other function reads or modifies data within the scope of v.


Data Consistency (2/3)

- Edge consistency: during the execution of f(v), no other function reads or modifies any of the data on v or on any of the edges adjacent to v.


Data Consistency (3/3)

- Vertex consistency: during the execution of f(v), no other function will be applied to v.


Sequential Consistency (1/2)

- Proving the correctness of a parallel algorithm relies on sequential consistency.

- Sequential consistency: for every parallel execution, there exists a sequential execution of update functions that produces an equivalent result.


Sequential Consistency (2/2)

- A simple method to achieve serializability is to ensure that the scopes of concurrently executing update functions do not overlap. This holds when:
  - the full consistency model is used, or
  - the edge consistency model is used and update functions do not modify data in adjacent vertices, or
  - the vertex consistency model is used and update functions only access local vertex data.


Consistency vs. Parallelism

[Low, Y., GraphLab: A Distributed Abstraction for Large Scale Machine Learning, Doctoral dissertation, University of California, 2013.]


GraphLab Implementation

- Shared memory implementation

- Distributed implementation


Task Schedulers (1/2)

- In what order should the tasks (vertex/update-function pairs) be executed?
  - A collection of base schedulers, e.g., round-robin and synchronous.
  - Set scheduler: enables users to compose custom update schedules.


Task Schedulers (2/2)

- How are new tasks added to the queue? (see the sketch below)
  - FIFO: permits task creation but does not permit task reordering.
  - Prioritized: permits task reordering at the cost of increased overhead.
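A minimal sketch contrasting the two disciplines (illustrative class names, not GraphLab's API); both hold (update_function, vertex) tasks, and the heap is the source of the prioritized scheduler's extra overhead:

    import heapq
    from collections import deque

    class FIFOScheduler:
        def __init__(self):
            self.q = deque()
        def add(self, task):
            self.q.append(task)        # task creation only; no reordering
        def pop(self):
            return self.q.popleft()

    class PrioritizedScheduler:
        def __init__(self):
            self.q, self.n = [], 0
        def add(self, task, priority):
            # lower priority value runs first; the counter breaks ties so
            # tasks themselves are never compared
            heapq.heappush(self.q, (priority, self.n, task))
            self.n += 1
        def pop(self):
            return heapq.heappop(self.q)[2]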


Consistency

- Implemented in C++, using PThreads for parallelism.

- Consistency is enforced with read/write locks:
  - Vertex consistency: write-lock on the central vertex.
  - Edge consistency: write-lock on the central vertex, read-locks on adjacent vertices.
  - Full consistency: write-locks on the central vertex and on adjacent vertices.

- Deadlocks are avoided by acquiring locks sequentially, following a canonical order (see the sketch below).
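A minimal sketch of the canonical-order idea, assuming vertex ids define the order and using plain Lock objects (Python's threading module has no built-in readers-writer lock, so this approximates the write-lock case):

    import threading

    def make_locks(vertices):
        # one lock per vertex
        return {v: threading.Lock() for v in vertices}

    def acquire_scope(locks, v, neighbors):
        # canonical order = ascending vertex id; since every thread acquires
        # in the same global order, no cyclic wait (deadlock) can form
        for u in sorted(set(neighbors) | {v}):
            locks[u].acquire()

    def release_scope(locks, v, neighbors):
        for u in sorted(set(neighbors) | {v}, reverse=True):
            locks[u].release()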



Distributed Implementation

- Graph partitioning: how to efficiently load, partition, and distribute the data graph across machines?

- Consistency: how to achieve consistency in the distributed setting?

- Fault tolerance.


Graph Partitioning


Graph Partitioning - Phase 1 (1/2)

- Two-phase partitioning.

- Phase 1 partitions the data graph into k parts, called atoms, with k ≫ the number of machines.

- Meta-graph: the graph of atoms (one vertex for each atom).

- Atom weight: the amount of data the atom stores.

- Edge weight: the number of edges crossing between atoms.


Graph Partitioning - Phase 1 (2/2)

- Each atom is stored as a separate file on a distributed storage system, e.g., HDFS.

- Each atom file is a simple binary that stores the interior and the ghosts of its partition.

- Ghosts: the set of vertices and edges adjacent to the partition boundary.


Graph Partitioning - Phase 2

- The meta-graph is very small.

- Perform a fast, balanced partition of the meta-graph over the physical machines.

- This assigns the graph atoms to machines (see the sketch below).
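A minimal sketch of phase 1's outputs, assuming hash partitioning into atoms (any balanced k-way cut would do) and counting weights the way the slides define them; the resulting small weighted meta-graph is what phase 2 partitions across machines:

    from collections import Counter

    def build_meta_graph(edges, k):
        # phase 1: assign each vertex to one of k atoms
        atom = lambda v: hash(v) % k
        atom_weight = Counter()   # amount of data stored per atom
        edge_weight = Counter()   # number of edges crossing each atom pair
        for u, v in edges:
            a, b = atom(u), atom(v)
            atom_weight[a] += 1
            atom_weight[b] += 1
            if a != b:
                edge_weight[min(a, b), max(a, b)] += 1
        return atom_weight, edge_weight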


Consistency


Consistency

- Goal: achieve a serializable parallel execution of a set of dependent tasks. Two engines are provided:
  - Chromatic engine
  - Distributed locking engine


Consistency - Chromatic Engine

- Construct a vertex coloring: assign a color to each vertex such that no adjacent vertices share the same color.

- Edge consistency: synchronously execute all update tasks associated with vertices of one color before proceeding to the next color (see the sketch below).

- Full consistency: require that no vertex shares a color with any of its distance-two neighbors.

- Vertex consistency: assign all vertices the same color.
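A minimal sketch of the chromatic engine under edge consistency, using greedy coloring; the within-color parallelism is only indicated by a comment here, and update's signature is illustrative:

    def greedy_coloring(adj):
        # give each vertex the smallest color unused by already-colored neighbors
        color = {}
        for v in adj:
            used = {color[u] for u in adj[v] if u in color}
            color[v] = next(c for c in range(len(adj) + 1) if c not in used)
        return color

    def chromatic_engine(adj, data, update):
        color = greedy_coloring(adj)
        for c in sorted(set(color.values())):
            # vertices of one color are pairwise non-adjacent, so their edge
            # scopes never conflict; these updates could run in parallel
            for v in [v for v in adj if color[v] == c]:
                update(v, data, adj[v])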


Consistency - Distributed Locking Engine

- Associates a readers-writer lock with each vertex.

- Vertex consistency: write-lock on the central vertex.

- Edge consistency: write-lock on the central vertex, read-locks on adjacent vertices.

- Full consistency: write-locks on the central vertex and on adjacent vertices.

- Deadlocks are avoided by acquiring locks sequentially, following a canonical order, as in the shared-memory sketch above.


Fault Tolerance


Fault Tolerance - Synchronous

- The system periodically signals all computation activity to halt.

- It then synchronizes all caches (ghosts) and saves to disk all data modified since the last snapshot.

- Simple, but it eliminates the system's advantage of asynchronous computation.


Fault Tolerance - Asynchronous

- Based on the Chandy-Lamport snapshot algorithm.

- The snapshot function is implemented as an update function on vertices (see the sketch below).

- The snapshot update takes priority over all other update functions.

- Edge consistency is used on all update functions.
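A minimal sketch of the snapshot-as-update-function idea (all names are illustrative; the real engine runs this under edge consistency, with snapshot tasks scheduled at the highest priority):

    def snapshot_update(v, data, neighbors, marked, saved, schedule):
        # first visit: persist this vertex's state as of the snapshot and
        # propagate the snapshot to neighbors, Chandy-Lamport style
        if v in marked:
            return
        marked.add(v)
        saved[v] = dict(data[v])
        for u in neighbors[v]:
            if u not in marked:
                # hypothetical scheduler call; priority=0 means "run first"
                schedule(u, snapshot_update, priority=0)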



GraphLab Summary

- Asynchronous model.

- Vertex-centric.

- Communication: distributed shared memory.

- Three consistency levels: full, edge-level, and vertex-level.

- Partitioning: two-phase partitioning.

- Consistency engines: chromatic engine (graph coloring) and distributed locking engine (readers-writer locks).

- Fault tolerance: synchronous and asynchronous (Chandy-Lamport) snapshots.


GraphLab Limitations

- Poor performance on natural graphs, i.e., graphs with power-law degree distributions.


Questions?
