Carnegie Mellon University
GraphLab Tutorial
Yucheng Low

TRANSCRIPT
GraphLab Team
Yucheng Low
Aapo Kyrola
Jay Gu
Joseph Gonzalez
Danny Bickson
Carlos Guestrin
Development History
GraphLab 0.5 (2010): Internal experimental code. Insanely templatized.
GraphLab 1 (2011): Nearly everything is templatized. First open source release (LGPL before June 2011, APL from June 2011 onward).
GraphLab 2 (2012): Many things are templatized. Shared memory: Jan 2012. Distributed: May 2012.
GraphLab 2 Technical Design Goals
- Improved usability
- Decreased compile time
- As good or better performance than GraphLab 1
- Improved distributed scalability
… other abstraction changes … (come to the talk!)
Development History
Ever since GraphLab 1.0, all active development has been open source (APL):
code.google.com/p/graphlabapi/
(Even current experimental code, activated with a --experimental flag on ./configure.)
Guaranteed Target Platforms
- Any x86 Linux system with gcc >= 4.2
- Any x86 Mac system with gcc 4.2.1 (OS X 10.5 ??)
- Other platforms? … We welcome contributors.
Tutorial Outline
- GraphLab in a few slides + PageRank
- Checking out GraphLab v2
- Implementing PageRank in GraphLab v2
- Overview of different GraphLab schedulers
- Preview of Distributed GraphLab v2 (may not work in your checkout!)
- Ongoing work… (as much as time allows)
Warning
A preview of code still in intensive development!
Things may or may not work for you!
The interface may still change!
GraphLab 2 still has a number of performance regressions relative to GraphLab 1 that we are ironing out.
PageRank Example
Iterate: R[i] = α + (1 - α) Σ_{j ∈ N[i]} W_ji R[j]
Where: α is the random reset probability and L[j] is the number of links on page j (so W_ji = 1/L[j] for a link from j to i).
[Figure: a six-page example link graph, vertices 1 through 6.]
The GraphLab Framework
- Graph-Based Data Representation
- Update Functions (User Computation)
- Scheduler
- Consistency Model
Data Graph
A graph with arbitrary data (C++ objects) associated with each vertex and edge.
Vertex Data: webpage, webpage features
Edge Data: link weight
Graph: link graph
The GraphLab Framework
- Graph-Based Data Representation
- Update Functions (User Computation)
- Scheduler
- Consistency Model
Update Functions

An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of the vertex.

```
pagerank(i, scope) {
  // Get neighborhood data (R[i], W_ji, R[j]) from scope

  // Update the vertex data:
  R[i] = α + (1 - α) * Σ_{j ∈ N[i]} W_ji * R[j];

  // Reschedule neighbors if needed
  if R[i] changes then
    reschedule_neighbors_of(i);
}
```
Dynamic Schedule
[Figure: CPU 1 and CPU 2 pull vertex update tasks (a, h, a, b, b, i, …) from a shared scheduler queue over a graph with vertices a through k.]
The process repeats until the scheduler is empty.
Source Code Interjection 1
Graph, update functions, and schedulers
--scope=vertex
--scope=edge
Consistency Trade-off
Consistency vs. "throughput" (# "iterations" per second).
Goal of an ML algorithm: converge.
This is a false trade-off.
Ensuring Race-Free Code
How much can computation overlap?
The GraphLab Framework
- Graph-Based Data Representation
- Update Functions (User Computation)
- Scheduler
- Consistency Model
Importance of Consistency
Fast ML algorithm development cycle: Build -> Test -> Debug -> Tweak Model.
Consistency is necessary for the framework to behave predictably and to avoid problems caused by non-determinism. Otherwise: is the execution wrong, or is the model wrong?
Full Consistency
Guaranteed safety for all update functions
Full Consistency
Parallel updates are only allowed on vertices at least two apart, which reduces opportunities for parallelism.
Obtaining More Parallelism
Not all update functions will modify the entire scope!
- Belief Propagation: only uses edge data
- Gibbs Sampling: only needs to read adjacent vertices

Edge Consistency
Obtaining More Parallelism
"Map" operations, e.g. feature extraction on vertex data.

Vertex Consistency
The GraphLab Framework
- Graph-Based Data Representation
- Update Functions (User Computation)
- Scheduler
- Consistency Model
Shared Variables
Global aggregation through the Sync operation: a global parallel reduction over the graph data. Synced variables are recomputed at defined intervals while update functions are running.
Examples: Sync: HighestPageRank; Sync: Loglikelihood.
Source Code Interjection 2
Shared variables
What can we do with these primitives?
…many many things…
Matrix Factorization
Netflix Collaborative Filtering: Alternating Least Squares matrix factorization.
Model: 0.5 million nodes, 99 million edges.
[Figure: bipartite Netflix graph between Users and Movies with latent dimension d.]

Netflix Speedup
[Figure: speedup while increasing the size of the matrix factorization.]
Video Co-Segmentation
Discover "coherent" segment types across a video (extends Batra et al. '10).
1. Form super-voxels from the video
2. EM & inference in a Markov random field
Large model: 23 million nodes, 390 million edges.
[Figure: scaling plot comparing GraphLab against ideal speedup.]
Many More
- Tensor factorization
- Bayesian matrix factorization
- Graphical model inference/learning
- Linear SVM
- EM clustering
- Linear solvers using GaBP
- SVD
- Etc.
Distributed Preview
GraphLab 2 Abstraction Changes
(an overview of a couple of them)
(Come to the talk for the rest!)
Exploiting Update Functors
(for the greater good)
1. Update functors store state.
2. The scheduler schedules update functor instances.
3. We can use update functors as controlled asynchronous message passing to communicate between vertices!
Delta Based Update Functors

```cpp
struct pagerank : public iupdate_functor<graph, pagerank> {
  double delta;
  pagerank(double d) : delta(d) { }
  void operator+=(pagerank& other) { delta += other.delta; }
  void operator()(icontext_type& context) {
    vertex_data& vdata = context.vertex_data();
    vdata.rank += delta;
    if (abs(delta) > EPSILON) {
      double out_delta = delta * (1 - RESET_PROB) /
                         context.num_out_edges();
      context.schedule_out_neighbors(pagerank(out_delta));
    }
  }
};
// Initial Rank:     R[i] = 0;
// Initial Schedule: pagerank(RESET_PROB);
```
Asynchronous Message Passing
Obviously not all computation can be written this way, but when it can, it can be extremely fast.
Factorized Updates
PageRank in GraphLab

```cpp
struct pagerank : public iupdate_functor<graph, pagerank> {
  void operator()(icontext_type& context) {
    vertex_data& vdata = context.vertex_data();
    double sum = 0;
    foreach(edge_type edge, context.in_edges())
      sum += context.const_edge_data(edge).weight *
             context.const_vertex_data(edge.source()).rank;
    double old_rank = vdata.rank;
    vdata.rank = RESET_PROB + (1 - RESET_PROB) * sum;
    double residual = abs(vdata.rank - old_rank) / context.num_out_edges();
    if (residual > EPSILON)
      context.reschedule_out_neighbors(pagerank());
  }
};
```
PageRank in GraphLab

```cpp
struct pagerank : public iupdate_functor<graph, pagerank> {
  void operator()(icontext_type& context) {
    vertex_data& vdata = context.vertex_data();

    // Parallel "Sum" Gather
    double sum = 0;
    foreach(edge_type edge, context.in_edges())
      sum += context.const_edge_data(edge).weight *
             context.const_vertex_data(edge.source()).rank;

    // Atomic Single Vertex Apply
    double old_rank = vdata.rank;
    vdata.rank = RESET_PROB + (1 - RESET_PROB) * sum;

    // Parallel Scatter [Reschedule]
    double residual = abs(vdata.rank - old_rank) / context.num_out_edges();
    if (residual > EPSILON)
      context.reschedule_out_neighbors(pagerank());
  }
};
```
Decomposable Update Functors
Decompose update functions into 3 phases:
- Gather: user-defined Gather(Y, edge) -> Δ, computed over each edge in the scope of Y and combined with a parallel sum: Δ = Δ1 + Δ2 + Δ3 + …
- Apply: user-defined Apply(Y, Δ) applies the accumulated value Δ to the center vertex Y.
- Scatter: user-defined Scatter(Y, edge) updates adjacent edges and vertices.
Factorized PageRank

```cpp
struct pagerank : public iupdate_functor<graph, pagerank> {
  double accum = 0, residual = 0;
  void gather(icontext_type& context, const edge_type& edge) {
    accum += context.const_edge_data(edge).weight *
             context.const_vertex_data(edge.source()).rank;
  }
  void merge(const pagerank& other) { accum += other.accum; }
  void apply(icontext_type& context) {
    vertex_data& vdata = context.vertex_data();
    double old_value = vdata.rank;
    vdata.rank = RESET_PROB + (1 - RESET_PROB) * accum;
    residual = fabs(vdata.rank - old_value) / context.num_out_edges();
  }
  void scatter(icontext_type& context, const edge_type& edge) {
    if (residual > EPSILON)
      context.schedule(edge.target(), pagerank());
  }
};
```
Demo of *everything*
PageRank
Ongoing Work
Extensions to improve performance on large graphs (see the GraphLab talk later!):
- Better distributed graph representation methods
- Possibly better graph partitioning
- Off-core graph storage
- Continually changing graphs
An all-new rewrite of distributed GraphLab (come back in May!)