TRANSCRIPT

Carnegie Mellon University
Parallel Machine Learning for Large-Scale Graphs
The GraphLab Team: Danny Bickson, Yucheng Low, Aapo Kyrola, Carlos Guestrin, Joe Hellerstein, Alex Smola, Jay Gu, Joseph Gonzalez
Parallelism is Difficult
Wide array of different parallel architectures: GPUs, Multicore, Clusters, Clouds, Supercomputers. Each architecture presents different challenges.
High-level abstractions make things easier.
How will we design and implement parallel learning systems?
... a popular answer: Map-Reduce / Hadoop
Build learning algorithms on top of high-level parallel abstractions.
Map-Reduce for Data-Parallel ML
Excellent for large data-parallel tasks!
Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics
Graph-Parallel: Belief Propagation, Label Propagation, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
Example of Graph Parallelism
PageRank Example
Iterate: R[i] = alpha + (1 - alpha) * sum_{j in N(i)} W_ji * R[j]
where alpha is the random reset probability and L[j] is the number of links on page j (W_ji = 1/L[j]).
(Figure: a six-vertex example graph.)
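The iteration above can be sketched in Python as plain power iteration. This is a minimal single-threaded sketch, not GraphLab code; the toy graph used in any example is illustrative.

```python
def pagerank(out_links, alpha=0.15, iters=50):
    # out_links: vertex -> list of pages it links to
    ranks = {v: 1.0 for v in out_links}
    for _ in range(iters):
        new = {v: alpha for v in out_links}
        for j, targets in out_links.items():
            # W_ji = 1 / L[j], where L[j] is the number of links on page j
            share = (1 - alpha) * ranks[j] / len(targets)
            for i in targets:
                new[i] += share
        ranks = new
    return ranks
```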
Properties of Graph Parallel Algorithms
• Dependency graph
• Local updates ("my rank" depends on my friends' ranks)
• Iterative computation
Addressing Graph-Parallel ML
We need alternatives to Map-Reduce.
Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics
Graph-Parallel (Map Reduce? Pregel/Giraph?): Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
Pregel (Giraph)
Bulk Synchronous Parallel model: Compute, Communicate, Barrier
Problem: bulk synchronous computation can be highly inefficient.
BSP Systems Problem: Curse of the Slow Job
(Figure: data partitioned across CPUs 1-3; iterations are separated by barriers, so every iteration waits for the slowest CPU.)
The Need for a New Abstraction
If not Pregel, then what?
Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics
Graph-Parallel (Pregel/Giraph?): Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
The GraphLab Solution
Designed specifically for ML needs:
• Expresses data dependencies
• Iterative
Simplifies the design of parallel programs:
• Abstracts away hardware issues
• Automatic data synchronization
• Addresses multiple hardware architectures: multicore, distributed, cloud computing (GPU implementation in progress)
What is GraphLab?
The GraphLab Framework
• Graph Based Data Representation
• Update Functions (User Computation)
• Scheduler
• Consistency Model
Data Graph
A graph with arbitrary data (C++ objects) associated with each vertex and edge.
• Graph: social network
• Vertex data: user profile text, current interest estimates
• Edge data: similarity weights
Update Functions
An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of the vertex.

pagerank(i, scope) {
  // Get neighborhood data (R[i], W_ji, R[j]) from scope

  // Update the vertex data:
  // R[i] = alpha + (1 - alpha) * sum_{j in N(i)} W_ji * R[j]

  // Reschedule neighbors if needed
  if R[i] changes then reschedule_neighbors_of(i);
}

Dynamic computation.
The Scheduler
The scheduler determines the order in which vertices are updated.
(Figure: CPU 1 and CPU 2 pull vertices a, b, c, ... from the scheduler; updates may add new vertices to it.)
The process repeats until the scheduler is empty.
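The scheduler loop described above can be sketched in Python. This is a single-threaded sketch under assumed names (`run_scheduler`, `propagate`, and the toy chain graph are illustrative, not GraphLab API).

```python
from collections import deque

def run_scheduler(update_fn, initial):
    # The scheduler holds vertices waiting to be updated; an update
    # function may return neighbors to reschedule. The loop repeats
    # until the scheduler is empty.
    queue, pending, order = deque(initial), set(initial), []
    while queue:
        v = queue.popleft()
        pending.discard(v)
        order.append(v)
        for u in update_fn(v):          # reschedule returned vertices
            if u not in pending:
                pending.add(u)
                queue.append(u)
    return order

# Toy use: propagate the maximum value along a chain a -> b -> c.
graph = {"a": ["b"], "b": ["c"], "c": []}
values = {"a": 1, "b": 0, "c": 0}

def propagate(v):
    changed = []
    for u in graph[v]:
        if values[u] < values[v]:
            values[u] = values[v]
            changed.append(u)
    return changed

update_order = run_scheduler(propagate, ["a"])
```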
The GraphLab Framework
• Graph Based Data Representation
• Update Functions (User Computation)
• Scheduler
• Consistency Model
Ensuring Race-Free Code
How much can computation overlap?
Need for Consistency?
No consistency means higher throughput (#updates/sec), but potentially slower convergence of ML.
Inconsistent ALS
(Figure: Train RMSE vs. number of updates, Netflix data, 8 cores; dynamic consistent execution compared with dynamic inconsistent execution.)
Even Simple PageRank can be Dangerous

GraphLab_pagerank(scope) {
  ref sum = scope.center_value
  sum = 0
  forall (neighbor in scope.in_neighbors)
    sum = sum + neighbor.value / neighbor.num_out_edges
  sum = ALPHA + (1 - ALPHA) * sum
  ...
Inconsistent PageRank
Even Simple PageRank can be Dangerous

GraphLab_pagerank(scope) {
  ref sum = scope.center_value
  sum = 0
  forall (neighbor in scope.in_neighbors)
    sum = sum + neighbor.value / neighbor.num_out_edges
  sum = ALPHA + (1 - ALPHA) * sum
  ...

Read-write race: CPU 1 reads a bad PageRank estimate while CPU 2 computes the value.
Race Condition Can Be Very Subtle

Unstable:
GraphLab_pagerank(scope) {
  ref sum = scope.center_value
  sum = 0
  forall (neighbor in scope.in_neighbors)
    sum = sum + neighbor.value / neighbor.num_out_edges
  sum = ALPHA + (1 - ALPHA) * sum
  ...

Stable:
GraphLab_pagerank(scope) {
  sum = 0
  forall (neighbor in scope.in_neighbors)
    sum = sum + neighbor.value / neighbor.num_out_edges
  sum = ALPHA + (1 - ALPHA) * sum
  scope.center_value = sum
  ...

This was actually encountered in user code.
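The stable pattern can be sketched in Python: accumulate into a local variable and publish the center value with a single write, so a concurrent reader never observes a half-computed sum. The dict-based `scope` here is an assumed stand-in for GraphLab's scope object.

```python
ALPHA = 0.15

def pagerank_update(scope):
    # Stable: accumulate locally, then publish once at the end.
    s = 0.0
    for nbr in scope["in_neighbors"]:
        s += nbr["value"] / nbr["num_out_edges"]
    scope["center_value"] = ALPHA + (1 - ALPHA) * s

# Illustrative neighborhood:
scope = {"center_value": 0.0,
         "in_neighbors": [{"value": 1.0, "num_out_edges": 2},
                          {"value": 0.5, "num_out_edges": 1}]}
pagerank_update(scope)
```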
GraphLab Ensures Sequential Consistency
For each parallel execution, there exists a sequential execution of update functions which produces the same result.
(Figure: a parallel execution on CPUs 1 and 2 is equivalent to some single-CPU sequential execution over time.)
Consistency Rules
Guaranteed sequential consistency for all update functions
Full Consistency
Obtaining More Parallelism
Edge Consistency
(Figure: CPU 1 and CPU 2 update vertices far enough apart that their concurrent reads are safe.)
The GraphLab Framework
• Graph Based Data Representation
• Update Functions (User Computation)
• Scheduler
• Consistency Model
Carnegie Mellon University
What algorithms are implemented in GraphLab?
Implemented algorithms include: Bayesian Tensor Factorization, Gibbs Sampling, Dynamic Block Gibbs Sampling, Splash Sampler, Matrix Factorization, Alternating Least Squares, Lasso, SVM, Belief Propagation, PageRank, CoEM, K-Means, SVD, LDA, Linear Solvers, and many others.
GraphLab Libraries
• Matrix factorization: SVD, PMF, BPTF, ALS, NMF, Sparse ALS, Weighted ALS, SVD++, time-SVD++, SGD
• Linear solvers: Jacobi, GaBP, Shotgun Lasso, sparse logistic regression, CG
• Clustering: K-means, fuzzy K-means, LDA, K-core decomposition
• Inference: discrete BP, NBP, kernel BP
Carnegie Mellon University
Efficient Multicore Collaborative Filtering
LeBuSiShu team, 5th place in track 1
Yao Wu, Qiang Yan, Qing Yang (Institute of Automation, Chinese Academy of Sciences); Danny Bickson, Yucheng Low (Machine Learning Dept, Carnegie Mellon University)
ACM KDD CUP Workshop 2011
ACM KDD CUP 2011
• Task: predict music ratings
• Two main challenges: data magnitude (260M ratings) and the taxonomy of the data
Data taxonomy
Our approach
• Use an ensemble method
• Custom SGD algorithm for handling the taxonomy
Ensemble method
• Solutions are merged using linear regression
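Merging solutions by linear regression amounts to least-squares blending of per-model predictions on a validation set. A minimal two-model sketch, solving the 2x2 normal equations directly (the data in any example is illustrative, not KDD Cup data):

```python
def blend_two(p1, p2, y):
    # Find w1, w2 minimizing ||w1*p1 + w2*p2 - y||^2
    # via the 2x2 normal equations.
    a = sum(x * x for x in p1)
    b = sum(x * z for x, z in zip(p1, p2))
    d = sum(z * z for z in p2)
    e = sum(x * t for x, t in zip(p1, y))
    f = sum(z * t for z, t in zip(p2, y))
    det = a * d - b * b
    w1 = (e * d - b * f) / det
    w2 = (a * f - e * b) / det
    return w1, w2
```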
Performance results
Blended Validation RMSE: 19.90
Classical Matrix Factorization
(Figure: a sparse users x items rating matrix factorized into rank-d user and item factor matrices.)
MFITR
(Figure: a sparse users x items matrix where the "effective feature of an item" combines features of the artist, features of the album, and item-specific features.)
Intuitively, the features of an artist and the features of his/her albums should be "similar". How do we express this?
(Figure: Artist, Album, Track hierarchy.)
• Penalty terms ensure that Artist/Album/Track features are "close"
• The strength of the penalty depends on the "normalized rating similarity" (see the neighborhood model)
Fine Tuning Challenge
The dataset has around 260M observed ratings; 12 different algorithms with a total of 53 tunable parameters. How do we train and cross-validate all these parameters?
USE GRAPHLAB!
16 Cores Runtime
Speedup plots
Carnegie Mellon University
Who is using GraphLab?
Universities using GraphLab
Companies trying out GraphLab
2400+ unique downloads tracked (possibly many more from direct repository checkouts)
Startups using GraphLab
User community
Performance results
GraphLab vs. Pregel (BSP)
Multicore PageRank (25M vertices, 355M edges)
(Figures: histogram of updates per vertex, showing 51% of vertices updated only once; L1 error vs. runtime and vs. number of updates for GraphLab and Pregel (simulated via GraphLab).)
CoEM (Rosie Jones, 2005)
Named Entity Recognition task: is "Dog" an animal? Is "Catalina" a place?
(Figure: bipartite graph linking noun phrases such as "the dog", "Australia", "Catalina Island" to contexts such as "<X> ran quickly", "travelled to <X>", "<X> is pleasant".)
Vertices: 2 million; Edges: 200 million
Hadoop: 95 cores, 7.5 hrs
CoEM (Rosie Jones, 2005)
(Figure: near-linear speedup of GraphLab CoEM up to 16 CPUs.)
Hadoop: 95 cores, 7.5 hrs
GraphLab: 16 cores, 30 min
15x faster, with 6x fewer CPUs!
Carnegie Mellon
GraphLab in the Cloud
CoEM (Rosie Jones, 2005)
(Figure: speedup vs. number of CPUs for small and large problem sizes.)
Hadoop: 95 cores, 7.5 hrs
GraphLab: 16 cores, 30 min
GraphLab in the Cloud: 32 EC2 machines, 80 secs (0.3% of the Hadoop time)
Cost-Time Tradeoff
(Figure: video co-segmentation results; more machines mean higher cost and faster runtime. A few machines help a lot, then returns diminish.)
Netflix Collaborative Filtering
Alternating Least Squares matrix factorization
Model: 0.5 million nodes, 99 million edges
(Figure: scaling of GraphLab vs. Hadoop and MPI for D=20 and D=100, against the ideal curve.)
Multicore Abstraction Comparison
Netflix matrix factorization
(Figure: log test error vs. number of updates; dynamic scheduling converges faster than round-robin.)
Dynamic computation, faster convergence.
The Cost of Hadoop
Carnegie Mellon University
Fault Tolerance
Fault Tolerance
Larger problems mean an increased chance of machine failure.
GraphLab 2 introduces two fault-tolerance (checkpointing) mechanisms: synchronous snapshots and Chandy-Lamport asynchronous snapshots.
Synchronous Snapshots
(Figure: timeline alternating "Run GraphLab" phases with "Barrier + Snapshot" phases.)
Curse of the slow machine
(Figure: progress over time with a synchronous snapshot vs. no snapshot.)
Curse of the Slow Machine
(Figure: timeline; one slow machine delays the barrier + snapshot for all machines.)
Curse of the slow machine
(Figure: progress over time: no snapshot, synchronous snapshot, delayed synchronous snapshot.)
Asynchronous Snapshots
The Chandy-Lamport algorithm is implementable as a GraphLab update function! It requires edge consistency.

struct chandy_lamport {
  void operator()(icontext_type& context) {
    save(context.vertex_data());
    foreach (edge_type edge, context.in_edges()) {
      if (edge.source() was not marked as saved) {
        save(context.edge_data(edge));
        context.schedule(edge.source(), chandy_lamport());
      }
    }
    ... // Repeat for context.out_edges
    Mark context.vertex() as saved;
  }
};
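How such a snapshot spreads through the graph can be sketched single-threaded in Python. This is only a sketch of the scheduling pattern: the real update function runs these steps concurrently under edge consistency, and the `snapshot`/`in_edges` names are illustrative.

```python
def snapshot(in_edges, start):
    # Save a vertex, save each in-edge whose source is not yet saved,
    # and schedule that source for its own snapshot (depth-first here).
    saved, log, stack = set(), [], [start]
    while stack:
        v = stack.pop()
        if v in saved:
            continue
        log.append(("vertex", v))
        for u in in_edges.get(v, []):
            if u not in saved:
                log.append(("edge", (u, v)))
                stack.append(u)  # stands in for context.schedule(u, ...)
        saved.add(v)
    return log
```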
Snapshot Performance
(Figure: progress over time: no snapshot, asynchronous snapshot, synchronous snapshot.)
Snapshot with 15s fault injection
Halt 1 out of 16 machines for 15s.
(Figure: progress over time: no snapshot, asynchronous snapshot, synchronous snapshot.)
New challenges
Natural Graphs Follow a "Power Law"
Yahoo! Web Graph: 1.4B vertices, 6.7B edges
The top 1% of vertices is adjacent to 53% of the edges!
Problem: High Degree Vertices
High degree vertices limit parallelism: they touch a large amount of state, require heavy locking, and are processed sequentially.
High Communication in Distributed Updates
Split gather and scatter across machines:
(Diagram: a vertex Y split between Machine 1 and Machine 2.)
Data from neighbors is transmitted separately across the network.
![Page 78: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/78.jpg)
High Degree Vertices are Common
Netflix (Users × Movies): “social” people and popular movies are high degree.
LDA (Docs × Words): hyperparameters and common words (e.g., “Obama”) are high degree.
(Diagrams: Netflix bipartite graph; LDA plate model with θ, Z, w, β, α.)
![Page 79: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/79.jpg)
Two Core Changes to the Abstraction
1. Factorized Update Functors: Monolithic Updates → Decomposed Updates (Gather, Apply, Scatter)
2. Delta Update Functors: Monolithic Updates → Composable Update “Messages”: (f1 ∘ f2)(Y)
![Page 80: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/80.jpg)
Decomposable Update Functors
Locks are acquired only for the region within a scope → relaxed consistency.
Gather (user defined): parallel sum over the scope, Δ = Δ1 + Δ2 + … + Δn
Apply (user defined): Apply(Y, Δ) applies the accumulated value Δ to the center vertex Y
Scatter (user defined): updates adjacent edges and vertices
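The three phases can be sketched as a single-machine runner (a hedged illustration, not the GraphLab C++ API; `run_vertex` and its parameter names are made up for this sketch):

```python
def run_vertex(in_nbrs, out_nbrs, values, v, gather, merge, apply_, scatter):
    """Run one decomposed update on vertex v.

    in_nbrs/out_nbrs: dicts mapping vertex -> list of neighbors.
    values: dict vertex -> value (mutated by apply_ / scatter).
    Returns the set of vertices rescheduled by scatter.
    """
    # Gather phase: user-defined gather on each in-edge, combined with the
    # commutative-associative merge (the slide's "parallel sum").
    acc = None
    for u in in_nbrs.get(v, []):
        g = gather(values, u, v)
        acc = g if acc is None else merge(acc, g)
    # Apply phase: write the accumulated value into the center vertex.
    # (acc is None when v has no in-neighbors.)
    apply_(values, v, acc)
    # Scatter phase: user-defined work on each out-edge; a truthy return
    # value reschedules that neighbor.
    rescheduled = set()
    for w in out_nbrs.get(v, []):
        if scatter(values, v, w):
            rescheduled.add(w)
    return rescheduled
```

Because gather results are only combined through `merge`, the gather phase can be split across machines or threads and the partial sums merged afterwards, which is exactly what makes high-degree vertices parallelizable.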
![Page 81: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/81.jpg)
Factorized PageRank
double gather(scope, edge) {
  return edge.source().value().rank /
         scope.num_out_edge(edge.source());
}
double merge(acc1, acc2) { return acc1 + acc2; }
void apply(scope, accum) {
  old_value = scope.center_value().rank;
  scope.center_value().rank = ALPHA + (1 - ALPHA) * accum;
  scope.center_value().residual =
      abs(scope.center_value().rank - old_value);
}
void scatter(scope, edge) {
  if (scope.center_vertex().residual > EPSILON)
    reschedule(edge.target());
}
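The listing above is slide pseudocode. As a sanity check, here is a hedged single-machine Python rendering of the same gather/merge/apply/scatter logic, using the slide's unnormalized `rank = ALPHA + (1 - ALPHA) * sum` formulation (the set-based scheduler is an assumption of this sketch):

```python
ALPHA, EPSILON = 0.15, 1e-9

def pagerank(out_edges, max_sweeps=100):
    """out_edges: dict vertex -> list of out-neighbors. Returns ranks."""
    # Build the in-edge lists needed by gather.
    in_edges = {v: [] for v in out_edges}
    for u, targets in out_edges.items():
        for v in targets:
            in_edges[v].append(u)
    rank = {v: 1.0 for v in out_edges}
    active = set(out_edges)
    for _ in range(max_sweeps):
        if not active:
            break
        next_active = set()
        for v in active:
            # gather + merge: sum of rank / out-degree over in-neighbors
            acc = sum(rank[u] / len(out_edges[u]) for u in in_edges[v])
            # apply: update the rank and compute the residual
            old = rank[v]
            rank[v] = ALPHA + (1 - ALPHA) * acc
            # scatter: reschedule out-neighbors if the residual is large
            if abs(rank[v] - old) > EPSILON:
                next_active.update(out_edges[v])
        active = next_active
    return rank
```

On a directed cycle every vertex converges to rank 1.0, the fixed point of r = ALPHA + (1 - ALPHA) * r.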
![Page 82: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/82.jpg)
Factorized Updates: Significant Decrease in Communication
Split gather and scatter across machines: (F1 ∘ F2)(Y)
Only a small amount of data is transmitted over the network.
![Page 83: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/83.jpg)
Factorized Consistency
Neighboring vertices may be updated simultaneously:
(Diagram: A and B gather at the same time.)
![Page 84: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/84.jpg)
Factorized Consistency Locking
Gather on an edge cannot occur during Apply:
Vertex B gathers on its other neighbors while A is performing Apply.
![Page 85: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/85.jpg)
Decomposable Loopy Belief Propagation
Gather: accumulates the product of in-messages
Apply: updates the central belief
Scatter: computes out-messages and schedules adjacent vertices
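A toy sketch of these phases for one binary variable (the helper names are assumptions, not GraphLab's BP code; edge potentials, out-message computation, and scheduling are elided):

```python
def bp_gather(in_message):
    """Gather: each in-edge contributes its incoming message (a 2-vector)."""
    return list(in_message)

def bp_merge(m1, m2):
    """Merge: elementwise product accumulates the product of in-messages."""
    return [a * b for a, b in zip(m1, m2)]

def bp_apply(prior, acc):
    """Apply: combine the prior with the accumulated product and
    renormalize to obtain the central belief."""
    belief = [p * a for p, a in zip(prior, acc)]
    z = sum(belief)
    return [b / z for b in belief]
```

Because the elementwise product is commutative and associative, the gather over a high-degree factor-graph vertex decomposes just like the sums in the PageRank example.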
![Page 86: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/86.jpg)
Decomposable Alternating Least Squares (ALS)
Netflix: Users × Movies ratings matrix ≈ User Factors (W) × Movie Factors (X)
Update function for a user factor wi, gathering over the rated movie factors xj:
Gather: sum terms
Apply: matrix inversion & multiply
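One decomposed ALS user update can be sketched for latent dimension k = 2 (a hedged illustration, not the GraphLab ALS code; `LAMBDA` and all function names are assumptions). Gather emits the per-edge normal-equation terms x_j x_j^T and r_ij x_j, merge sums them, and apply solves (A + λI) w = b:

```python
LAMBDA = 0.1  # regularization weight (illustrative value)

def als_gather(x_j, r_ij):
    """Per-edge term for a rated movie: (x_j x_j^T, r_ij * x_j)."""
    A = [[x_j[0] * x_j[0], x_j[0] * x_j[1]],
         [x_j[1] * x_j[0], x_j[1] * x_j[1]]]
    b = [r_ij * x_j[0], r_ij * x_j[1]]
    return A, b

def als_merge(t1, t2):
    """Commutative-associative sum of gathered (A, b) terms."""
    A = [[t1[0][i][j] + t2[0][i][j] for j in range(2)] for i in range(2)]
    b = [t1[1][i] + t2[1][i] for i in range(2)]
    return A, b

def als_apply(acc):
    """Solve (A + LAMBDA * I) w = b; for k = 2, Cramer's rule suffices
    (this is the slide's 'matrix inversion & multiply' step)."""
    A, b = acc
    a00, a01 = A[0][0] + LAMBDA, A[0][1]
    a10, a11 = A[1][0], A[1][1] + LAMBDA
    det = a00 * a11 - a01 * a10
    return [(b[0] * a11 - b[1] * a01) / det,
            (a00 * b[1] - a10 * b[0]) / det]
```

Since gather terms only meet in `als_merge`, the expensive sum over a user's many rated movies can be computed in parallel pieces; only the small k × k apply step is sequential.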
![Page 87: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/87.jpg)
Comparison of Abstractions
Multicore PageRank (25M vertices, 355M edges)
(Chart: L1 error vs. runtime (s) for GraphLab 1 and Factorized Updates.)
![Page 88: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/88.jpg)
Need for Vertex-Level Asynchrony
Exploit the commutative-associative “sum”: Y is the sum of its neighbors’ contributions.
A costly full gather is triggered by a single change!
![Page 89: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/89.jpg)
Commut-Assoc Vertex-Level Asynchrony
Exploit the commutative-associative “sum”: Y = sum of neighbor contributions.
![Page 90: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/90.jpg)
Commut-Assoc Vertex-Level Asynchrony
Exploit the commutative-associative “sum”: a single change contributes a Δ to the existing sum.
![Page 91: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/91.jpg)
Delta Updates: Vertex-Level Asynchrony
Exploit the commutative-associative “sum”: add Δ to the old (cached) sum.
![Page 92: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/92.jpg)
Delta Updates: Vertex-Level Asynchrony
Exploit the commutative-associative “sum”: deltas are applied directly to the old (cached) sum.
Delta Update
void update(scope, delta) {
  scope.center_value() = scope.center_value() + delta;
  if (abs(delta) > EPSILON) {
    out_delta = delta * (1 - ALPHA) / scope.num_out_edges();
    reschedule_out_neighbors(out_delta);
  }
}
double merge(delta1, delta2) { return delta1 + delta2; }
Program starts with: schedule_all(ALPHA)
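As a sanity check, here is a hedged single-machine Python rendering of the delta update above (the `pending` dict stands in for GraphLab's scheduler and plays the role of `merge`, adding up deltas destined for the same vertex; names are illustrative):

```python
ALPHA, EPSILON = 0.15, 1e-6

def delta_pagerank(out_edges):
    """out_edges: dict vertex -> list of out-neighbors. Returns ranks."""
    rank = {v: 0.0 for v in out_edges}
    # "Program starts with: schedule_all(ALPHA)"
    pending = {v: ALPHA for v in out_edges}
    while pending:
        v, delta = pending.popitem()
        rank[v] += delta                       # apply delta to the cached sum
        if abs(delta) > EPSILON and out_edges[v]:
            # forward the damped, degree-normalized delta to out-neighbors
            out_delta = delta * (1 - ALPHA) / len(out_edges[v])
            for w in out_edges[v]:
                # merge: deltas headed for the same vertex simply add up
                pending[w] = pending.get(w, 0.0) + out_delta
    return rank
```

No gather over in-neighbors ever runs: each change arrives as a small Δ and is folded into the cached sum, which is exactly the asynchrony the slides motivate for high-degree vertices. On a directed cycle the ranks converge to 1.0 (up to the EPSILON truncation).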
![Page 94: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/94.jpg)
Multicore Abstraction Comparison
Multicore PageRank (25M vertices, 355M edges)
(Chart: L1 error vs. runtime (s) for Delta, Factorized, GraphLab 1, and Simulated Pregel.)
![Page 95: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/95.jpg)
Distributed Abstraction Comparison
Distributed PageRank (25M vertices, 355M edges)
(Charts: runtime (s) and total communication (GB) vs. number of machines (8 CPUs per machine), for GraphLab1 and GraphLab2 (Delta Updates).)
![Page 96: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/96.jpg)
PageRank: Altavista Webgraph 2002
1.4B vertices, 6.7B edges
Hadoop: 9000 s on 800 cores
Prototype GraphLab2: 431 s on 512 cores
Known inefficiencies: a further 2x gain is possible.
![Page 97: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/97.jpg)
Summary of GraphLab2
Decomposed Update Functions: expose parallelism in high-degree vertices (Gather, Apply, Scatter)
Delta Update Functions: expose asynchrony in high-degree vertices
![Page 98: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/98.jpg)
Lessons Learned
Machine Learning:
Asynchronous is often much faster than synchronous; dynamic computation is often faster than static.
However, it can be difficult to define optimal thresholds: science to do!
Consistency can improve performance, and is sometimes required for convergence, though there are cases where relaxed consistency is sufficient.
System:
Distributed asynchronous systems are harder to build, but no distributed barriers means better scalability and performance.
Scaling up by an order of magnitude requires rethinking design assumptions, e.g., the distributed graph representation.
High-degree vertices and natural graphs can limit parallelism and need further assumptions on update functions.
![Page 99: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/99.jpg)
Summary
An abstraction tailored to Machine Learning:
Targets graph-parallel algorithms
Naturally expresses data/computational dependencies and dynamic, iterative computation
Simplifies parallel algorithm design
Automatically ensures data consistency
Achieves state-of-the-art parallel performance on a variety of problems
![Page 100: Carnegie Mellon University Danny Bickson Yucheng Low Aapo Kyrola Carlos Guestrin Joe Hellerstein Alex Smola Parallel Machine Learning for Large-Scale Graphs](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649c3e5503460f948e89cd/html5/thumbnails/100.jpg)
Carnegie Mellon
Parallel GraphLab 1.1: Multicore available today
GraphLab2 (in the Cloud): soon…
http://graphlab.org
Documentation… Code… Tutorials…