Distributed Machine Learning and Graph Processing with Sparse Matrices
Shivaram Venkataraman*, Erik Bodzsar#
Indrajit Roy+, Alvin AuYoung+, Rob Schreiber+
*UC Berkeley, #U Chicago, +HP Labs
Big Data, Complex Algorithms

PageRank (dominant eigenvector)
Recommendations (matrix factorization)
Anomaly detection (top-k eigenvalues)
User importance (vertex centrality)

Machine learning + graph algorithms
Large-Scale Processing Frameworks

Data-parallel frameworks – MapReduce/Dryad (2004)
– Process each record in parallel
– Use case: computing sufficient statistics, analytics queries

Graph-centric frameworks – Pregel/GraphLab (2010)
– Process each vertex in parallel
– Use case: graphical models

Array-based frameworks – MadLINQ (2012)
– Process blocks of an array in parallel
– Use case: linear algebra operations
PageRank Using Matrices

Power method for computing the dominant eigenvector:
M = web graph matrix, p = PageRank vector

Simplified algorithm: repeat { p = M*p }

Linear algebra operations on sparse matrices
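As a concrete sketch of the power method in plain single-machine R — the function name, damping factor, and tolerance are illustrative assumptions, not from the slides:

```r
# Power method for PageRank: repeatedly apply p = M*p (with damping)
# until the vector stops changing. M is a column-stochastic web-graph matrix.
pagerank_power <- function(M, d = 0.85, tol = 1e-8) {
  n <- nrow(M)
  p <- rep(1 / n, n)                            # start from the uniform vector
  repeat {
    p_new <- d * as.vector(M %*% p) + (1 - d) / n
    if (sum(abs(p_new - p)) < tol) return(p_new)
    p <- p_new
  }
}
```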
Presto

Large-scale machine learning and graph processing on sparse matrices

Extend R – make it scalable and distributed
Challenge 1 – Sparse Matrices
Challenge 1 – Sparse Matrices
[Chart: normalized block density (log scale, 1–10000) vs. block ID for the LiveJournal, Netflix, and ClueWeb-1B datasets]

1000x more data in some blocks → computation imbalance
Challenge 2 – Data Sharing
Sharing data through pipes or the network is time-inefficient (sending copies) and space-inefficient (extra copies).

[Diagram: processes on Server 1 and Server 2 each holding their own copy of the data, produced via local and network copies]

Challenges: sparse matrices, communication overhead
Outline
• Motivation
• Programming model
• Design
• Applications and Results
darray – create a distributed array, partitioned into blocks
foreach f (x) – execute function f on array partitions, in parallel across the cluster
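A minimal sketch of the two primitives working together, modeled on the PageRank code on the next slides; `len` (the number of partitions) and `update()` (publishing a partition's new contents back to the darray) are assumptions beyond what the slides show:

```r
# Sketch: create a distributed array and initialize every partition in parallel.
A <- darray(dim = c(N, N), blocks = c(s, N))  # N x N array, split into s-row blocks
foreach(i, 1:len,                             # len = number of partitions (assumed)
  init <- function(a = splits(A, i)) {        # each task receives one partition
    a <- matrix(0, nrow = nrow(a), ncol = ncol(a))  # overwrite it with zeros
    update(a)                                 # publish the change (assumed API)
  })
```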
PageRank Using Presto

M <- darray(dim=c(N,N), blocks=c(s,N))    # Create Distributed Array
P <- darray(dim=c(N,1), blocks=c(s,1))
while(..) {
  foreach(i, 1:len,
    calculate <- function(m = splits(M,i),
                          x = splits(P), p = splits(P,i)) {
      p <- m * x
    })
}

[Diagram: M split into N/s row blocks; each block times the full vector p yields one partition P1 … PN/s]
PageRank Using Presto

M <- darray(dim=c(N,N), blocks=c(s,N))
P <- darray(dim=c(N,1), blocks=c(s,1))
while(..) {
  foreach(i, 1:len,                          # Execute function in a cluster
    calculate <- function(m = splits(M,i),   # Pass array partitions
                          x = splits(P), p = splits(P,i)) {
      p <- m * x
    })
}

[Diagram: each task multiplies one row block of M by the full vector p to produce its partition of P]
Presto Architecture

[Diagram: a Master coordinating multiple Workers; each Worker runs several R instances that share the node's DRAM]
Repartitioning Matrices

Profile execution → repartition

Partition if max(t) / median(t) > δ, where t is the vector of per-partition execution times.
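A minimal sketch of that trigger in plain R, assuming `times` holds the profiled per-partition execution times; the δ value shown is illustrative:

```r
# Repartition when the slowest partition dominates the typical one.
needs_repartition <- function(times, delta = 1.5) {  # delta is illustrative
  max(times) / median(times) > delta
}

needs_repartition(c(1.0, 1.1, 0.9, 4.2))  # TRUE: one straggler partition
```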
Maintaining Size Invariants

invariant(mat, vec, type=ROW)

Declares that the row partitioning of mat must stay aligned with the partitioning of vec, so repartitioning one keeps the two conformable.
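Applied to the PageRank example, the call would look like this (a usage sketch; the alignment semantics follow the description above):

```r
# Tie M's row partitioning to P's partitioning: if the runtime splits a dense
# row block of M, the matching block of P is split the same way, keeping
# every per-task m * x conformable.
invariant(M, P, type = ROW)
```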
Sharing Distributed Arrays

Goal: zero-copy sharing across cores

Versioned distributed arrays
Immutable partitions → safe sharing
Data Sharing Challenges

1. Garbage collection
2. Header conflicts

[Diagram: two R instances referencing one R object, made up of an R object header and an R object data part]
Overriding R’s allocator

Allocate process-local headers; map the data part in shared memory.

[Diagram: each R instance keeps its own local R object header; the shared R object data part is mapped at a page boundary]
Outline
• Motivation
• Programming model
• Design
• Applications and Results
Demo

5-node cluster, 8 cores per node
PageRank on 1.5B-edge Twitter data
Applications Implemented in Presto

| Application | Algorithm | Presto LOC |
| --- | --- | --- |
| PageRank | Eigenvector calculation | 41 |
| Triangle counting | Top-k eigenvalues | 121 |
| Netflix recommendation | Matrix factorization | 130 |
| Centrality measure | Graph algorithm | 132 |
| k-path connectivity | Graph algorithm | 30 |
| k-means | Clustering | 71 |
| Sequence alignment | Smith-Waterman | 64 |

Each application takes fewer than 140 lines of code
Evaluation Overview

Setup: 25-machine cluster; each machine has 24 cores, 96GB RAM, and a 10Gbps network

Data-sharing benefits – 1.5B-edge Twitter graph
Repartitioning analysis – 6B-edge web graph
Collaborative filtering – Netflix dataset; faster than Spark and Hadoop using in-memory data
Data sharing benefits
Compute and transfer time at 10, 20, and 40 cores, with and without data sharing:

| Cores | Compute (sharing) | Transfer (sharing) | Compute (no sharing) | Transfer (no sharing) |
| --- | --- | --- | --- | --- |
| 10 | 4.45 | 0.71 | 4.38 | 1.22 |
| 20 | 2.49 | 0.70 | 2.21 | 2.12 |
| 40 | 1.63 | 0.72 | 1.22 | 4.16 |

With sharing, transfer time stays flat as cores are added; without sharing, it grows with the number of processes.
Repartitioning Progress

[Chart: split size (GB, 0–30) vs. iteration count (1–15)]
Repartitioning benefits

[Charts: transfer and compute time per worker (scale 0–160), without and with repartitioning]
Related Work

Large-scale data processing frameworks – MapReduce, Dryad, Spark, GraphLab
Matrix computations – Ricardo, MadLINQ
HPC systems – ARPACK, Combinatorial BLAS
Multi-core R packages – doMC, snow, Rmpi
Presto

Co-partitioning matrices
Locality-based scheduling
Caching partitions
Conclusion

Presto: a large-scale array-based framework that extends R
Tackles the challenges of sparse matrices via repartitioning and shared, versioned arrays
Backup Slides
Netflix Collaborative Filtering
Total time (seconds) by number of cores; each run breaks down into Load, t(R)×R, and R×t(R)×R phases:

| Cores | Time (s) |
| --- | --- |
| 8 | 755.112 |
| 16 | 380.985 |
| 24 | 256.1 |
| 32 | 234.236 |
| 40 | 202.725 |
| 48 | 155.299 |
Repartitioning benefits
[Chart: time to convergence (2000–8000 s) and cumulative partitioning time (0–400 s) vs. number of repartitions (0–20)]