A New Parallel Framework for Machine Learning
![Page 1: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/1.jpg)
![Page 2: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/2.jpg)
Carnegie Mellon
A New Parallel Framework for Machine Learning
Joseph Gonzalez, joint work with Yucheng Low, Aapo Kyrola, Danny Bickson, Carlos Guestrin, Guy Blelloch, Joe Hellerstein, David O’Hallaron, and Alex Smola.
![Page 3: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/3.jpg)
In ML we face BIG problems
• 48 hours of video a minute: YouTube
• 24 Million Wikipedia Pages
• 750 Million Facebook Users
• 6 Billion Flickr Photos
![Page 4: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/4.jpg)
Massive data provides opportunities for rich probabilistic structure …
![Page 5: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/5.jpg)
Social Network
(Diagram: Shopper 1 likes cameras; Shopper 2 likes cooking; a social network connects them.)
![Page 6: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/6.jpg)
What are the tools for massive data?
![Page 7: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/7.jpg)
Parallelism: Hope & Challenges
Wide array of different parallel architectures: GPUs, Multicore, Clusters, Mini Clouds, Clouds.
New challenges for designing machine learning algorithms: race conditions and deadlocks; managing distributed model state.
New challenges for implementing machine learning algorithms: parallel debugging and profiling; hardware-specific APIs.
![Page 8: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/8.jpg)
Massive Structured Problems ←?→ Advances in Parallel Hardware
Thesis: “Parallel Learning and Inference in Probabilistic Graphical Models”
![Page 9: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/9.jpg)
Thesis: “Parallel Learning and Inference in Probabilistic Graphical Models”
Massive Structured Problems → Probabilistic Graphical Models → Parallel Algorithms for Probabilistic Learning and Inference → GraphLab → Advances in Parallel Hardware
![Page 10: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/10.jpg)
![Page 11: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/11.jpg)
![Page 12: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/12.jpg)
How will we design and implement parallel learning systems?
![Page 13: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/13.jpg)
We could use …
Threads, Locks, & Messages: “low-level parallel primitives”
![Page 14: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/14.jpg)
Threads, Locks, and Messages
ML experts (often graduate students) repeatedly solve the same parallel design challenges:
• Implement and debug a complex parallel system
• Tune for a specific parallel platform
• Two months later the conference paper contains: “We implemented ______ in parallel.”
The resulting code:
• is difficult to maintain
• is difficult to extend
• couples the learning model to the parallel implementation
![Page 15: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/15.jpg)
... a better answer:
Map-Reduce / Hadoop
Build learning algorithms on top of high-level parallel abstractions.
![Page 16: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/16.jpg)
MapReduce – Map Phase
Embarrassingly parallel, independent computation; no communication needed.
(Diagram: CPUs 1–4 each independently compute a value: 12.9, 42.3, 21.3, 25.8.)
![Page 17: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/17.jpg)
MapReduce – Map Phase
Each CPU independently computes features for its next record (e.g., image features).
(Diagram: CPUs 1–4 produce the next batch of values: 24.1, 84.3, 18.4, 84.4.)
![Page 18: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/18.jpg)
MapReduce – Map Phase
Embarrassingly parallel, independent computation; no communication needed.
(Diagram: CPUs 1–4 produce a third batch of values: 17.5, 67.5, 14.9, 34.3.)
![Page 19: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/19.jpg)
MapReduce – Reduce Phase
The mapped image features are folded into summary statistics, e.g., attractive-face statistics versus not-attractive-face statistics.
(Diagram: CPU 1 aggregates the values labeled attractive, CPU 2 the values labeled not attractive.)
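The two phases on these slides can be written down compactly. Below is a minimal single-process C++ sketch of the pattern (the feature function, labels, and data values are illustrative, not from the slides): an embarrassingly parallel map over records, followed by a reduce that folds the mapped values into per-class statistics.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Record { std::string label; double pixels; };

// Map phase: independent per-record computation, parallelizable with no communication.
double extract_feature(const Record& r) { return r.pixels * 0.5; }  // stand-in feature

int main() {
    std::vector<Record> data = {{"A", 12.9}, {"N", 42.3}, {"A", 21.3}, {"N", 25.8}};

    // Map: one (label, feature) pair per record.
    std::vector<std::pair<std::string, double>> mapped;
    for (const auto& r : data) mapped.emplace_back(r.label, extract_feature(r));

    // Reduce: fold features into per-class sums ("A"ttractive vs. "N"ot attractive).
    std::map<std::string, double> stats;
    for (const auto& [label, feature] : mapped) stats[label] += feature;

    for (const auto& [label, sum] : stats)
        std::cout << label << ": " << sum << '\n';
}
```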
![Page 20: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/20.jpg)
Map-Reduce for Data-Parallel ML
Excellent for large data-parallel tasks!
Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics
Graph-Parallel (Is there more to Machine Learning?): Belief Propagation, Label Propagation, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
![Page 21: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/21.jpg)
Concrete Example: Label Propagation
![Page 22: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/22.jpg)
Label Propagation Algorithm
Social Arithmetic: my estimate is a weighted combination of my own profile and my friends’ likes:
  50% × (50% Cameras, 50% Biking) — what I list on my profile
+ 40% × (80% Cameras, 20% Biking) — what Sue Ann likes
+ 10% × (30% Cameras, 70% Biking) — what Carlos likes
= I Like: 60% Cameras, 40% Biking
Recurrence Algorithm: Likes[i] = Σ_{j ∈ Friends[i]} W_ij × Likes[j]; iterate until convergence.
Parallelism: compute all Likes[i] in parallel.
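To make the recurrence concrete, here is a minimal synchronous C++ sketch (the graph and weights reproduce the slide’s example; holding Sue Ann and Carlos fixed via unit self-loops is an illustrative simplification):

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdio>
#include <vector>

using Interests = std::array<double, 2>;   // {Cameras, Biking}
struct Edge { int j; double w; };          // neighbor j with weight W_ij

int main() {
    // Vertex 0 = me, 1 = Sue Ann, 2 = Carlos (held fixed via self-loops).
    std::vector<Interests> likes = {{0.5, 0.5}, {0.8, 0.2}, {0.3, 0.7}};
    std::vector<std::vector<Edge>> nbrs = {
        {{0, 0.5}, {1, 0.4}, {2, 0.1}},    // 50% my profile, 40% Sue Ann, 10% Carlos
        {{1, 1.0}},
        {{2, 1.0}}};

    double diff = 1.0;
    while (diff > 1e-9) {                  // iterate until convergence
        diff = 0.0;
        auto next = likes;
        for (size_t i = 0; i < likes.size(); ++i) {   // every Likes[i] is independent
            Interests acc = {0.0, 0.0};
            for (const Edge& e : nbrs[i])
                for (int k = 0; k < 2; ++k) acc[k] += e.w * likes[e.j][k];
            for (int k = 0; k < 2; ++k)
                diff = std::max(diff, std::fabs(acc[k] - likes[i][k]));
            next[i] = acc;
        }
        likes = next;
    }
    // A single step reproduces the slide's 60%/40%; the fixed point is 70%/30%.
    std::printf("Me: %.0f%% Cameras, %.0f%% Biking\n",
                100 * likes[0][0], 100 * likes[0][1]);
}
```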
![Page 23: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/23.jpg)
Properties of Graph-Parallel Algorithms
• Dependency graph (what I like depends on what my friends like)
• Factored computation
• Iterative computation
![Page 24: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/24.jpg)
Map-Reduce for Data-Parallel ML
Excellent for large data-parallel tasks!
Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics
Graph-Parallel (Map Reduce?): Belief Propagation, Label Propagation, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
![Page 25: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/25.jpg)
Why not use Map-Reduce for Graph-Parallel Algorithms?
![Page 26: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/26.jpg)
Data Dependencies
Map-Reduce does not efficiently express dependent data:
• The user must code substantial data transformations
• Costly data replication
(Diagram: Map-Reduce operates on independent rows of data.)
![Page 27: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/27.jpg)
Iterative Algorithms
Map-Reduce does not efficiently express iterative algorithms: each iteration is a separate Map-Reduce job, with a barrier between iterations, and a single slow processor stalls every barrier.
(Diagram: three iterations; in each, CPUs 1–3 process their data partitions, then wait at a barrier.)
![Page 28: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/28.jpg)
MapAbuse: Iterative MapReduce
Only a subset of the data needs computation, yet every iteration reprocesses all of it.
(Diagram: the same three barrier-separated iterations, recomputing unchanged data.)
![Page 29: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/29.jpg)
MapAbuse: Iterative MapReduce
The system is not optimized for iteration: every iteration pays a startup penalty and a disk penalty for rereading and rewriting the data.
(Diagram: the same three iterations, each bracketed by a startup penalty and a disk penalty.)
![Page 30: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/30.jpg)
Map-Reduce for Data-Parallel ML
Excellent for large data-parallel tasks!
Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics
Graph-Parallel (Map Reduce? Pregel (Giraph)?): Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
![Page 31: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/31.jpg)
Pregel (Giraph)
Bulk Synchronous Parallel model: compute, communicate, barrier.
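A minimal C++20 sketch of a bulk synchronous superstep (the per-element computation is illustrative): each thread computes on its own slice of the state, then all threads meet at a barrier before the next superstep sees the new values.

```cpp
#include <barrier>
#include <iostream>
#include <thread>
#include <utility>
#include <vector>

int main() {
    const int num_threads = 4, supersteps = 3;
    std::vector<double> cur(16, 1.0), nxt(16, 0.0);
    std::barrier sync(num_threads);

    auto worker = [&](int tid) {
        for (int step = 0; step < supersteps; ++step) {
            // Compute phase: each thread updates its own slice from the current state.
            for (size_t i = tid; i < cur.size(); i += num_threads)
                nxt[i] = 0.5 * cur[i] + 0.5 * cur[(i + 1) % cur.size()];
            sync.arrive_and_wait();             // barrier: wait for every thread's writes
            if (tid == 0) std::swap(cur, nxt);  // communicate: publish the new state
            sync.arrive_and_wait();             // barrier: nobody starts the next step early
        }
    };

    {
        std::vector<std::jthread> pool;
        for (int t = 0; t < num_threads; ++t) pool.emplace_back(worker, t);
    }                                           // jthreads join here
    std::cout << "after " << supersteps << " supersteps: cur[0] = " << cur[0] << '\n';
}
```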
![Page 32: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/32.jpg)
Problem: bulk synchronous computation can be highly inefficient.
Example: Loopy Belief Propagation
![Page 33: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/33.jpg)
Loopy Belief Propagation (Loopy BP)
• Iteratively estimate the “beliefs” about vertices:
– Read in messages
– Update the marginal estimate (belief)
– Send updated messages out
• Repeat for all variables until convergence
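For a pairwise MRF with binary variables, the read-messages / update-belief / send-messages loop reduces to a few lines. A minimal C++ sketch (the potentials and evidence values are illustrative):

```cpp
#include <array>
#include <cstdio>
#include <vector>

using Msg = std::array<double, 2>;                        // distribution over a binary variable
static const double PSI[2][2] = {{1.0, 0.5}, {0.5, 1.0}}; // illustrative pairwise potential

// m_{i->j}(x_j) = sum_{x_i} phi_i(x_i) * PSI(x_i, x_j) * prod_{k in N(i)\{j}} m_{k->i}(x_i)
Msg update_message(const Msg& phi_i, const std::vector<Msg>& in_except_j) {
    Msg out = {0.0, 0.0};
    for (int xj = 0; xj < 2; ++xj)
        for (int xi = 0; xi < 2; ++xi) {
            double p = phi_i[xi] * PSI[xi][xj];
            for (const Msg& m : in_except_j) p *= m[xi];
            out[xj] += p;
        }
    double z = out[0] + out[1];                           // normalize for numerical stability
    return {out[0] / z, out[1] / z};
}

// b_i(x_i) is proportional to phi_i(x_i) * prod_{k in N(i)} m_{k->i}(x_i)
Msg belief(const Msg& phi_i, const std::vector<Msg>& incoming) {
    Msg b = phi_i;
    for (const Msg& m : incoming) { b[0] *= m[0]; b[1] *= m[1]; }
    double z = b[0] + b[1];
    return {b[0] / z, b[1] / z};
}

int main() {  // one vertex with evidence and two neighbors: read, update belief, send
    Msg phi = {0.9, 0.1};
    std::vector<Msg> incoming = {{0.6, 0.4}, {0.5, 0.5}};
    Msg b = belief(phi, incoming);
    Msg m = update_message(phi, {incoming[0]});           // outgoing message to neighbor 1
    std::printf("belief = (%.3f, %.3f), m = (%.3f, %.3f)\n", b[0], b[1], m[0], m[1]);
}
```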
![Page 34: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/34.jpg)
Bulk Synchronous Loopy BP
• Often considered embarrassingly parallel:
– Associate a processor with each vertex
– Receive all messages
– Update all beliefs
– Send all messages
• Proposed by: Brunton et al. CRV’06; Mendiburu et al. GECC’07; Kang et al. LDMTA’10; …
![Page 35: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/35.jpg)
Sequential Computational Structure
![Page 36: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/36.jpg)
Hidden Sequential Structure
![Page 37: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/37.jpg)
Hidden Sequential Structure
• Running time = (time for a single parallel iteration) × (number of iterations)
(Diagram: a chain with evidence at both ends; information must propagate across the whole chain.)
![Page 38: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/38.jpg)
Optimal Sequential Algorithm
(Plot: running time vs. number of processors p, on a chain of n vertices.)
• Bulk synchronous: 2n²/p, for p ≤ 2n
• Forward–Backward (sequential): 2n, at p = 1
• Optimal parallel: n, at p = 2
There is a large gap between the bulk synchronous running time and the optimal parallel algorithm.
![Page 39: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/39.jpg)
The Splash Operation
• Generalize the optimal chain algorithm to arbitrary cyclic graphs:
1) Grow a BFS spanning tree of fixed size
2) Forward pass computing all messages at each vertex
3) Backward pass computing all messages at each vertex
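Step 1 is an ordinary size-bounded BFS. A minimal C++ sketch (the adjacency list and size bound are illustrative); the returned ordering is swept forward and then in reverse for steps 2 and 3:

```cpp
#include <cstdio>
#include <queue>
#include <vector>

// Grow a BFS spanning tree of at most max_size vertices rooted at root.
// The returned order is the forward pass; iterate it in reverse for the backward pass.
std::vector<int> grow_splash(const std::vector<std::vector<int>>& adj,
                             int root, size_t max_size) {
    std::vector<bool> visited(adj.size(), false);
    std::vector<int> order;
    std::queue<int> frontier;
    frontier.push(root);
    visited[root] = true;
    while (!frontier.empty() && order.size() < max_size) {
        int v = frontier.front(); frontier.pop();
        order.push_back(v);
        for (int u : adj[v])
            if (!visited[u]) { visited[u] = true; frontier.push(u); }
    }
    return order;
}

int main() {
    std::vector<std::vector<int>> adj = {{1, 2}, {0, 3}, {0, 3}, {1, 2, 4}, {3}};
    for (int v : grow_splash(adj, 0, 4)) std::printf("%d ", v);  // e.g. 0 1 2 3
    std::printf("\n");
}
```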
![Page 40: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/40.jpg)
Data-Parallel Algorithms can be Inefficient
The limitations of the Map-Reduce abstraction can lead to inefficient parallel algorithms.
(Plot: runtime in seconds vs. number of CPUs, 1–8; the optimized in-memory bulk synchronous implementation is far slower than asynchronous Splash BP.)
![Page 41: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/41.jpg)
The Need for a New Abstraction
Map-Reduce is not well suited for graph-parallelism.
Data-Parallel (Map Reduce): Feature Extraction, Cross Validation, Computing Sufficient Statistics
Graph-Parallel (Pregel (Giraph)): Belief Propagation, SVM, Kernel Methods, Deep Belief Networks, Neural Networks, Tensor Factorization, PageRank, Lasso
![Page 42: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/42.jpg)
What is GraphLab?
![Page 43: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/43.jpg)
The GraphLab Framework
• Graph-based data representation
• Update functions (user computation)
• Scheduler
• Consistency model
![Page 44: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/44.jpg)
Data Graph
A graph with arbitrary data (C++ objects) associated with each vertex and edge.
• Graph: social network
• Vertex data: user profile text; current interest estimates
• Edge data: similarity weights
![Page 45: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/45.jpg)
Implementing the Data Graph
Multicore setting (in memory):
• Challenge: fast lookup, low overhead
• Solution: dense data structures; fixed Vdata and Edata types; immutable graph structure
Cluster setting (in memory):
• Partition the graph: ParMETIS or random cuts
• Cached ghosting: boundary vertices are cached (ghosted) on the neighboring node
(Diagram: vertices A, B, C, D partitioned across Node 1 and Node 2, with ghost copies of the boundary.)
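A minimal C++ sketch of what the multicore representation might look like (the field names and layout are illustrative assumptions, not GraphLab’s actual types): dense arrays of fixed vertex/edge data over an immutable adjacency structure.

```cpp
#include <string>
#include <vector>

// Illustrative fixed vertex/edge data types (cf. the label-propagation example).
struct VertexData { std::string profile_text; double likes_cameras, likes_biking; };
struct EdgeData   { double similarity; };

// Dense data structures over an immutable graph: fast lookup, low overhead.
struct DataGraph {
    std::vector<VertexData> vdata;            // one entry per vertex
    std::vector<EdgeData>   edata;            // one entry per edge
    std::vector<std::vector<int>> out_edges;  // out_edges[v] = edge indices leaving v
    std::vector<int> target;                  // target[e] = destination vertex of edge e

    int add_vertex(VertexData d) {
        vdata.push_back(std::move(d));
        out_edges.emplace_back();
        return static_cast<int>(vdata.size()) - 1;
    }
    void add_edge(int u, int v, EdgeData d) {
        edata.push_back(d);
        target.push_back(v);
        out_edges[u].push_back(static_cast<int>(edata.size()) - 1);
    }
};

int main() {
    DataGraph g;
    int me  = g.add_vertex({"cameras, hiking", 0.5, 0.5});
    int sue = g.add_vertex({"photography",     0.8, 0.2});
    g.add_edge(me, sue, {0.4});               // similarity weight W_ij
}
```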
![Page 46: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/46.jpg)
![Page 47: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/47.jpg)
Update Functions
An update function is a user-defined program which, when applied to a vertex, transforms the data in the scope of the vertex.

label_prop(i, scope) {
  // Get neighborhood data
  (Likes[i], W_ij, Likes[j]) ← scope;

  // Update the vertex data
  Likes[i] ← Σ_{j ∈ Friends[i]} W_ij × Likes[j];

  // Reschedule neighbors if needed
  if Likes[i] changes then
    reschedule_neighbors_of(i);
}
![Page 48: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/48.jpg)
![Page 49: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/49.jpg)
The Scheduler
The scheduler determines the order in which vertices are updated: CPUs repeatedly pull the next vertex from the scheduler and apply its update function, and an update may in turn schedule more vertices. The process repeats until the scheduler is empty.
(Diagram: CPUs 1 and 2 draw vertices a, b, c, … from a shared scheduler queue over the data graph.)
![Page 50: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/50.jpg)
Choosing a Schedule
GraphLab provides several different schedulers:
• Round Robin: vertices are updated in a fixed order
• FIFO: vertices are updated in the order they are added
• Priority: vertices are updated in priority order
The choice of schedule affects the correctness and parallel performance of the algorithm. Obtain different algorithms by simply changing a flag!
--scheduler=roundrobin --scheduler=fifo --scheduler=priority
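Putting update functions and a scheduler together, here is a minimal sequential C++ sketch of the execution model (the names are illustrative, not the GraphLab API): a FIFO queue drives label propagation, and an update that changes a vertex reschedules its neighbors. Swapping the deque for a priority queue keyed on the magnitude of change would give the priority schedule.

```cpp
#include <cmath>
#include <cstdio>
#include <deque>
#include <vector>

struct Edge { int j; double w; };   // neighbor j with weight W_ij

// Update function: recompute Likes[i]; if it changed, reschedule the neighbors of i.
void label_prop(int i, std::vector<double>& likes,
                const std::vector<std::vector<Edge>>& nbrs, std::deque<int>& fifo) {
    double old = likes[i], acc = 0.0;
    for (const Edge& e : nbrs[i]) acc += e.w * likes[e.j];  // Likes[i] = sum_j W_ij*Likes[j]
    likes[i] = acc;
    if (std::fabs(acc - old) > 1e-6)                        // reschedule_neighbors_of(i)
        for (const Edge& e : nbrs[i]) fifo.push_back(e.j);
}

int main() {
    std::vector<double> likes = {0.5, 0.8, 0.3};            // scalar "likes" for brevity
    std::vector<std::vector<Edge>> nbrs = {
        {{0, 0.5}, {1, 0.4}, {2, 0.1}}, {{1, 1.0}}, {{2, 1.0}}};
    std::deque<int> fifo = {0, 1, 2};                       // add initial vertices
    while (!fifo.empty()) {                                 // run until scheduler is empty
        int v = fifo.front(); fifo.pop_front();
        label_prop(v, likes, nbrs, fifo);
    }
    std::printf("Likes[0] = %.4f\n", likes[0]);
}
```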
![Page 51: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/51.jpg)
![Page 52: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/52.jpg)
Ensuring Race-Free Code
How much can computation overlap?
![Page 53: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/53.jpg)
Common Problem: Write-Write Race
Processors running adjacent update functions simultaneously modify shared data.
(Diagram: CPU 1 and CPU 2 both write the shared value; one write is lost in the final value.)
![Page 54: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/54.jpg)
Importance of Consistency
Example: Alternating Least Squares.
![Page 55: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/55.jpg)
Importance of Consistency
The development loop: Build → Test → Debug → Tweak Model.
![Page 56: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/56.jpg)
GraphLab Ensures Sequential Consistency
For each parallel execution, there exists a sequential execution of update functions which produces the same result.
(Diagram: a parallel execution on CPUs 1 and 2 is equivalent to some single-CPU sequential execution over time.)
![Page 57: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/57.jpg)
Consistency Rules
Sequential consistency is guaranteed for all update functions.
![Page 58: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/58.jpg)
Full Consistency
The update function has read/write access to its entire scope, so the scopes of concurrently executing updates must not overlap.
![Page 59: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/59.jpg)
Obtaining More Parallelism
Weaker consistency models permit more update functions to run simultaneously.
![Page 60: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/60.jpg)
Edge Consistency
The update function has write access to its vertex and adjacent edges, but only read access to adjacent vertices; updates whose center vertices are not adjacent can safely run in parallel.
(Diagram: CPU 1 and CPU 2 safely perform overlapping reads on a shared neighbor.)
![Page 61: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/61.jpg)
Consistency Through R/W Locks
Read/write locks:
• Full consistency: write locks on the center vertex and all adjacent vertices
• Edge consistency: a write lock on the center vertex, read locks on adjacent vertices
Locks are acquired in a canonical order to avoid deadlock.
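A minimal C++ sketch of how edge consistency can be enforced with reader/writer locks (one std::shared_mutex per vertex is an illustrative assumption, not GraphLab’s implementation): write-lock the center vertex, read-lock its neighbors, and acquire all locks in ascending vertex-id order so that no two updates can deadlock.

```cpp
#include <algorithm>
#include <shared_mutex>
#include <vector>

// One reader/writer lock per vertex.
std::vector<std::shared_mutex> vlocks(8);

// Edge-consistent update of `center` with neighbors `nbrs` (assumed distinct from
// center and duplicate-free): write lock on center, read locks on neighbors,
// all acquired in ascending vertex-id order (canonical lock ordering: no deadlock).
template <typename UpdateFn>
void edge_consistent_update(int center, std::vector<int> nbrs, UpdateFn update) {
    nbrs.push_back(center);
    std::sort(nbrs.begin(), nbrs.end());
    for (int v : nbrs)
        if (v == center) vlocks[v].lock();          // exclusive: we write the center
        else             vlocks[v].lock_shared();   // shared: we only read neighbors
    update();                                       // safe: scope cannot change under us
    for (auto it = nbrs.rbegin(); it != nbrs.rend(); ++it)
        if (*it == center) vlocks[*it].unlock();
        else               vlocks[*it].unlock_shared();
}

int main() {
    std::vector<double> data(8, 1.0);
    edge_consistent_update(3, {2, 4}, [&] { data[3] = 0.5 * (data[2] + data[4]); });
}
```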
![Page 62: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/62.jpg)
Consistency Through R/W Locks
Multicore setting: pthread R/W locks.
Distributed setting: distributed locking over the partitioned data graph (Node 1, Node 2), with a lock pipeline that prefetches locks and data, allowing computation to proceed while locks and data are requested.
![Page 63: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/63.jpg)
![Page 64: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/64.jpg)
Anatomy of a GraphLab Program:
1) #include <graphlab.hpp>
2) Define a C++ update function
3) Build the data graph using the C++ graph object
4) Set engine parameters: scheduler type and consistency model
5) Add initial vertices to the scheduler
6) Run the engine on the graph [blocking call]
7) The final answer is stored in the graph
![Page 65: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/65.jpg)
Algorithms Implemented
PageRank, Loopy Belief Propagation, Gibbs Sampling, CoEM, Graphical Model Parameter Learning, Probabilistic Matrix/Tensor Factorization, Alternating Least Squares, Lasso with Sparse Features, Support Vector Machines with Sparse Features, Label Propagation, …
![Page 66: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/66.jpg)
Shared Memory Experiments
Shared memory setting: a 16-core workstation.
![Page 67: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/67.jpg)
Loopy Belief Propagation
3D retinal image denoising: 1 million vertices, 3 million edges.
• Update function: loopy BP update equation
• Scheduler: SplashBP
• Consistency model: edge consistency
![Page 68: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/68.jpg)
Loopy Belief Propagation
(Plot: speedup vs. number of CPUs, 1–16; SplashBP tracks close to optimal, reaching a 15.5x speedup on 16 CPUs.)
![Page 69: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/69.jpg)
Gibbs Sampling
Protein-protein interaction networks [Elidan et al. 2006]: a discrete MRF with 14K vertices and 100K edges (backbone, side-chains, and protein interactions).
Provably correct parallelization via edge consistency.
![Page 70: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/70.jpg)
Gibbs Sampling
(Plot: speedup vs. number of CPUs, 1–16, comparing the Chromatic Gibbs sampler against optimal.)
![Page 71: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/71.jpg)
Carnegie Mellon
Splash Gibbs Sampler
An asynchronous Gibbs sampler that adaptively addresses strong dependencies.
![Page 72: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/72.jpg)
Splash Gibbs Sampler
Step 1: Grow multiple splashes in parallel (conditionally independent).
![Page 73: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/73.jpg)
Splash Gibbs Sampler
Step 1: Grow multiple splashes in parallel (conditionally independent); tree-width = 1.
![Page 74: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/74.jpg)
Splash Gibbs Sampler
Step 1: Grow multiple splashes in parallel (conditionally independent); tree-width = 2.
![Page 75: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/75.jpg)
Splash Gibbs Sampler
Step 2: Calibrate the trees in parallel
![Page 76: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/76.jpg)
Splash Gibbs Sampler
Step 3: Sample trees in parallel
![Page 77: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/77.jpg)
Experimental Results
The Splash sampler outperforms the Chromatic sampler on models with strong dependencies.
• Benchmark: a Markov logic network with strong dependencies, 10K variables and 28K factors.
(Plots: likelihood of the final sample, “mixing,” and speedup in sample generation; Splash beats Chromatic on all three.)
![Page 78: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/78.jpg)
CoEM (Rosie Jones, 2005)
Named entity recognition task: is “Dog” an animal? Is “Catalina” a place?
A bipartite graph links noun phrases (“the dog,” “Australia,” “Catalina Island”) to the contexts they appear in (“<X> ran quickly,” “travelled to <X>,” “<X> is pleasant”).
Vertices: 2 million; edges: 200 million.
Hadoop: 95 cores, 7.5 hrs.
![Page 79: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/79.jpg)
CoEM (Rosie Jones, 2005)
(Plot: GraphLab CoEM speedup vs. number of CPUs, 1–16, approaching optimal.)
Hadoop: 95 cores, 7.5 hrs. GraphLab: 16 cores, 30 min.
15x faster using 6x fewer CPUs!
![Page 80: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/80.jpg)
Lasso: Regularized Linear Model
• Data matrix X: n × d; weights w: d × 1; observations y: n × 1; plus an L1 regularization term.
• Bipartite data graph: 5 feature (weight) vertices connected to 4 example (observation) vertices.
• Shooting algorithm [coordinate descent]: updates on weight vertices modify the losses on the observation vertices they touch, so the algorithm requires the full consistency model.
• Financial prediction dataset from Kogan et al. [2009].
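For reference, a full sweep of the shooting algorithm fits in a few lines. A minimal dense C++ sketch (the data matrix and λ are illustrative): each coordinate update soft-thresholds the feature’s residual correlation, which is exactly why updating one weight vertex perturbs the losses on the observation vertices it touches.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Soft-threshold operator: S(rho, lambda) = sign(rho) * max(|rho| - lambda, 0).
double soft(double rho, double lambda) {
    if (rho >  lambda) return rho - lambda;
    if (rho < -lambda) return rho + lambda;
    return 0.0;
}

int main() {
    // Illustrative data: n=4 examples, d=3 features.
    // Objective: minimize 0.5 * ||Xw - y||^2 + lambda * ||w||_1.
    std::vector<std::vector<double>> X = {{1, 0, 1}, {0, 1, 1}, {1, 1, 0}, {1, 0, 0}};
    std::vector<double> y = {2, 1, 3, 2};
    const double lambda = 0.1;
    size_t n = X.size(), d = X[0].size();
    std::vector<double> w(d, 0.0);

    for (int sweep = 0; sweep < 100; ++sweep)       // shooting: cycle over coordinates
        for (size_t j = 0; j < d; ++j) {
            double rho = 0.0, z = 0.0;
            for (size_t i = 0; i < n; ++i) {
                double pred = 0.0;                  // prediction excluding feature j
                for (size_t k = 0; k < d; ++k)
                    if (k != j) pred += X[i][k] * w[k];
                rho += X[i][j] * (y[i] - pred);     // residual correlation with feature j
                z   += X[i][j] * X[i][j];
            }
            w[j] = soft(rho, lambda) / z;           // closed-form 1-D lasso update
        }
    for (size_t j = 0; j < d; ++j) std::printf("w[%zu] = %.3f\n", j, w[j]);
}
```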
![Page 81: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/81.jpg)
Full Consistency
(Plot: speedup vs. number of CPUs, 1–16, under full consistency; the sparse dataset scales better than the dense one.)
![Page 82: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/82.jpg)
Relaxing Consistency
(Plot: speedup vs. number of CPUs, 1–16, with consistency relaxed; both the dense and sparse problems scale better than under full consistency.)
Why does this work? (See the Shotgun ICML paper.)
![Page 83: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/83.jpg)
Experiments: Amazon EC2
High-performance nodes.
![Page 84: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/84.jpg)
Video Cosegmentation
Segments that mean the same thing are linked across frames.
Model: 10.5 million nodes, 31 million edges.
Gaussian EM clustering + BP on a 3D grid.
![Page 85: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/85.jpg)
Video Cosegmentation Speedups
![Page 86: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/86.jpg)
Matrix Factorization
Netflix collaborative filtering: alternating least squares matrix factorization.
Model: 0.5 million nodes, 99 million edges (a bipartite users × movies graph with latent dimension d).
![Page 87: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/87.jpg)
Netflix Speedup
(Plot: speedup for increasing sizes d of the matrix factorization.)
![Page 88: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/88.jpg)
The Cost of Hadoop
![Page 89: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/89.jpg)
Summary
An abstraction tailored to machine learning:
• Targets graph-parallel algorithms
• Naturally expresses data/computational dependencies and dynamic iterative computation
• Simplifies parallel algorithm design
• Automatically ensures data consistency
• Achieves state-of-the-art parallel performance on a variety of problems
![Page 90: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/90.jpg)
Carnegie Mellon
Check out GraphLab: http://graphlab.org
Documentation… Code… Tutorials…
Questions & Feedback
![Page 91: A New Parallel Framework for Machine Learning](https://reader033.vdocuments.site/reader033/viewer/2022051402/56815e06550346895dcc533d/html5/thumbnails/91.jpg)
Current/Future Work
• Out-of-core storage
• Hadoop/HDFS integration: graph construction, graph storage, launching GraphLab from Hadoop, fault tolerance through HDFS checkpoints
• Sub-scope parallelism: address the challenge of very high-degree vertices
• Improved graph partitioning
• Support for dynamic graph structure