Piccolo: Building fast distributed programs with partitioned tables
Russell Power Jinyang Li New York University
Motivating Example: PageRank
for each node X in graph:
    for each edge X→Z:
        next[Z] += curr[X]
Repeat until convergence
| Input Graph | Curr    | Next    |
|-------------|---------|---------|
| A→B,C,D     | A: 0.25 | A: 0.25 |
| B→E         | B: 0.17 | B: 0.17 |
| C→D         | C: 0.22 | C: 0.22 |
| …           | …       | …       |
Fits in memory!
PageRank in MapReduce
[Diagram: three map/reduce workers read a graph stream (A→B,C; B→D) and a rank stream (A: 0.1, B: 0.2) from distributed storage and write a new rank stream (A: 0.1, B: 0.2) back to it, so every iteration passes through storage.]
Data flow models do not expose global state.
PageRank With MPI/RPC
[Diagram: three workers, each storing a Graph partition (A→B,C; B→D; C→E,F) and a Ranks partition (A: 0; B: 0; C: 0), exchanging messages over distributed storage.]

User explicitly programs communication.
Piccolo’s Goal: Distributed Shared State
[Diagram: kernel instances 1, 2, 3 read/write a shared Graph table (A→B,C; B→D; …) and Ranks table (A: 0; B: 0; …).]

Distributed in-memory state.
Piccolo’s Goal: Distributed Shared State
[Diagram: three workers, each holding a Graph partition (A→B,C; B→D; C→E,F) and a Ranks partition (A: 0; B: 0; C: 0).]

Piccolo runtime handles communication.
Ease of use ↔ Performance
Talk outline
Motivation
Piccolo's Programming Model
Runtime Scheduling
Evaluation
Programming Model
[Diagram: kernel instances 1, 2, 3 access the Graph (A→B,C; B→D; …) and Ranks (A: 0; B: 0; …) tables through get/put and update/iterate operations rather than raw read/write.]
Implemented as library for C++ and Python
Naïve PageRank with Piccolo

    curr = Table(key=PageID, value=double)
    next = Table(key=PageID, value=double)

    def pr_kernel(graph, curr, next):      # jobs run by many machines
        i = my_instance
        n = len(graph) / NUM_MACHINES
        for s in graph[(i-1)*n : i*n]:
            for t in s.out:
                next[t] += curr[s.id] / len(s.out)

    def main():                            # run by a single controller
        for i in range(50):
            # the controller launches jobs in parallel
            launch_jobs(NUM_MACHINES, pr_kernel, graph, curr, next)
            swap(curr, next)
            next.clear()
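As a sanity check, the kernel's rank-propagation step can be run as a single-machine toy. This sketch uses plain dicts in place of Piccolo tables; the names `graph`, `curr`, and `next_` are illustrative, not Piccolo's API.

```python
# Hypothetical single-machine sketch of the PageRank kernel: each source
# page divides its current rank evenly among its out-links and adds that
# share into the next-iteration table.

def pr_kernel(graph, curr, next_):
    # graph maps page id -> list of out-link targets
    for s, out in graph.items():
        for t in out:
            next_[t] = next_.get(t, 0.0) + curr[s] / len(out)
    return next_

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
curr = {"A": 0.4, "B": 0.3, "C": 0.3}
ranks = pr_kernel(graph, curr, {})
# A receives all of C's rank; C receives half of A's plus all of B's
```

In the real system this loop body runs in parallel on each machine over its own slice of the graph, with the table library routing the `next_` writes.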
Naïve PageRank is Slow
[Diagram: each of three workers holds a Graph partition and a Ranks partition, but kernel instances issue get and put calls against partitions stored on other workers, so most table accesses cross the network.]
PageRank: Exploiting Locality

Control table partitioning, co-locate tables, and co-locate execution with tables:

    curr = Table(…, partitions=100, partition_by=site)
    next = Table(…, partitions=100, partition_by=site)
    group_tables(curr, next, graph)

    def pr_kernel(graph, curr, next):
        for s in graph.get_iterator(my_instance):
            for t in s.out:
                next[t] += curr[s.id] / len(s.out)

    def main():
        for i in range(50):
            launch_jobs(curr.num_partitions, pr_kernel,
                        graph, curr, next, locality=curr)
            swap(curr, next)
            next.clear()
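A minimal sketch of what partitioning by site could look like: pages from the same web site hash to the same partition, so grouped tables keep a site's graph and rank entries together. The `site()` helper and the partition count are assumptions for illustration, not Piccolo's actual implementation.

```python
# Illustrative partition function: co-locate all pages of one web site
# by hashing only the host part of the page id.

def site(page_id):
    # take the host part of a URL-like page id, e.g. "nyu.edu/home" -> "nyu.edu"
    return page_id.split("/", 1)[0]

def partition_of(page_id, num_partitions=100):
    # same site -> same hash -> same partition, in every grouped table
    return hash(site(page_id)) % num_partitions

p1 = partition_of("nyu.edu/home")
p2 = partition_of("nyu.edu/research")
# p1 == p2: both pages land on the same partition (and thus the same worker)
```

Because `curr`, `next`, and `graph` are grouped, a kernel instance that runs where its graph partition lives finds the matching rank entries locally.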
Exploiting Locality
[Diagram: with tables placed arbitrarily, each kernel instance's gets and puts travel to partitions on remote workers.]
Exploiting Locality
[Diagram: with grouped, co-located partitions, each kernel instance's gets and puts stay on its own worker.]
Synchronization
[Diagram: two kernel instances concurrently write the same key in one Ranks partition: put(a=0.3) races with put(a=0.2).]
How to handle synchronization?
Synchronization Primitives
Avoid write conflicts with accumulation functions: NewValue = Accum(OldValue, Update), e.g. sum, product, min, max.

Global barriers are sufficient; tables provide release consistency.
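The accumulation rule NewValue = Accum(OldValue, Update) can be sketched as a tiny table class; this is an illustrative toy, not Piccolo's implementation.

```python
# Sketch of an accumulating table: writes to the same key are merged with
# a user-supplied commutative, associative function instead of
# overwriting, so concurrent updates never conflict.

class AccumTable:
    def __init__(self, accumulate):
        self.accumulate = accumulate   # e.g. sum, product, min, max
        self.data = {}

    def update(self, key, value):
        if key in self.data:
            self.data[key] = self.accumulate(self.data[key], value)
        else:
            self.data[key] = value

    def get(self, key):
        return self.data[key]

ranks = AccumTable(accumulate=lambda old, new: old + new)
ranks.update("a", 0.2)   # two workers update the same key...
ranks.update("a", 0.3)   # ...and the table sums the contributions
```

Because the merge function is commutative and associative, update order across workers does not affect the final value.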
PageRank: Efficient Synchronization

    curr = Table(…, partition_by=site, accumulate=sum)   # accumulation via sum
    next = Table(…, partition_by=site, accumulate=sum)
    group_tables(curr, next, graph)

    def pr_kernel(graph, curr, next):
        for s in graph.get_iterator(my_instance):
            for t in s.out:
                # update() invokes the accumulation function
                next.update(t, curr.get(s.id) / len(s.out))

    def main():
        for i in range(50):
            handle = launch_jobs(curr.num_partitions, pr_kernel,
                                 graph, curr, next, locality=curr)
            barrier(handle)   # explicitly wait between iterations
            swap(curr, next)
            next.clear()
Efficient Synchronization
[Diagram: instead of racing puts, the two kernel instances issue update(a, 0.2) and update(a, 0.3); the runtime computes the sum at the owning partition.]

Workers buffer updates locally (release consistency).
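Under release consistency a worker can combine its own updates locally and ship only the merged value at a release point such as a barrier. A sketch, with the hypothetical `BufferedWriter` class and `send` callback standing in for the runtime's networking:

```python
# Sketch of worker-side update buffering: updates to the same key are
# merged with the accumulation function in a local buffer, and only the
# combined value crosses the network when the buffer is flushed.

class BufferedWriter:
    def __init__(self, accumulate, send):
        self.accumulate = accumulate
        self.send = send          # callable that ships (key, value) remotely
        self.buf = {}

    def update(self, key, value):
        if key in self.buf:
            self.buf[key] = self.accumulate(self.buf[key], value)
        else:
            self.buf[key] = value

    def flush(self):              # called at release points (barriers)
        for key, value in self.buf.items():
            self.send(key, value)
        self.buf.clear()

sent = []
w = BufferedWriter(lambda a, b: a + b, lambda k, v: sent.append((k, v)))
w.update("a", 0.2)
w.update("a", 0.3)
w.flush()
# one combined message is sent instead of two
```

This trades update visibility (remote reads may see stale values until the flush) for far fewer network messages, which is safe because barriers delimit when other workers may depend on the values.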
Table Consistency
[Diagram: the same buffered update(a, 0.2) / update(a, 0.3) traffic, illustrating the table consistency model.]
PageRank with Checkpointing
    curr = Table(…, partition_by=site, accumulate=sum)
    next = Table(…, partition_by=site, accumulate=sum)
    group_tables(curr, next)

    def pr_kernel(graph, curr, next):
        for s in graph.get_iterator(my_instance):
            for t in s.out:
                next.update(t, curr.get(s.id) / len(s.out))

    def main():
        curr, userdata = restore()      # restore previous computation
        last = userdata.get('iter', 0)
        for i in range(last, 50):
            handle = launch_jobs(curr.num_partitions, pr_kernel,
                                 graph, curr, next, locality=curr)
            cp_barrier(handle, tables=(next,), userdata={'iter': i})
            swap(curr, next)
            next.clear()
Restore previous computation
User decides which tables to checkpoint and when
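The checkpoint/restore pattern in `main()` can be mimicked on a single machine. This sketch invents a JSON file format and simplified `cp_barrier`/`restore` signatures purely for illustration; Piccolo's real checkpoints are taken by the runtime across workers.

```python
# Toy user-driven checkpointing: at each barrier the table contents plus
# a small userdata dict (here the next iteration number) are written out,
# and a restarted controller resumes from the last checkpoint.
import json
import os

CKPT = "pagerank.ckpt"   # hypothetical checkpoint file name

def cp_barrier(table, userdata):
    with open(CKPT, "w") as f:
        json.dump({"table": table, "userdata": userdata}, f)

def restore():
    if not os.path.exists(CKPT):
        return {}, {}                 # fresh start: empty table, no userdata
    with open(CKPT) as f:
        state = json.load(f)
    return state["table"], state["userdata"]

curr, userdata = restore()
last = userdata.get("iter", 0)        # resume where the last run stopped
for i in range(last, 3):
    curr = {"a": i * 0.1}             # stand-in for one PageRank iteration
    cp_barrier(curr, {"iter": i + 1})
```

The key property is that the iteration counter is saved atomically with the table, so a crash between iterations never replays or skips work.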
Recovery via Checkpointing

[Diagram: three workers with Graph and Ranks partitions; checkpoints are written to distributed storage and restored from it after a failure.]

Runtime uses the Chandy-Lamport snapshot protocol.
Talk Outline
Motivation
Piccolo's Programming Model
Runtime Scheduling
Evaluation
Load Balancing
[Diagram: a master coordinates work-stealing among three workers. Each worker owns table partitions (P1–P6) and runs kernel instances (J1–J6); when one worker falls behind, the master reassigns an unstarted instance (J6) and first pauses updates to its partition (P6), since other workers are still updating P6.]

The master coordinates work-stealing.
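The master's stealing decision can be sketched as picking a not-yet-started job from the most loaded worker's queue; the data structures and names here are assumptions for illustration, not Piccolo's scheduler.

```python
# Toy master-coordinated work stealing: an idle worker asks the master
# for work, and the master moves an unstarted job from the worker with
# the longest pending queue. (In Piccolo, updates to the stolen job's
# partition must be paused while ownership migrates.)

def steal(queues, idle_worker):
    # pick the worker with the most pending jobs as the victim
    victim = max(queues, key=lambda w: len(queues[w]))
    if victim == idle_worker or not queues[victim]:
        return None                      # nothing worth stealing
    job = queues[victim].pop()           # take an unstarted job
    queues[idle_worker].append(job)
    return job

queues = {"w1": ["J1", "J2"], "w2": ["J3", "J4", "J5", "J6"], "w3": []}
stolen = steal(queues, "w3")
# "J6" moves from w2's queue to w3's
```

Centralizing the decision at the master keeps partition ownership unambiguous, which is what makes it possible to pause and redirect updates safely.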
Talk Outline
Motivation
Piccolo's Programming Model
Runtime Scheduling
Evaluation
Piccolo is Fast
NYU cluster: 12 nodes, 64 cores; 100M-page graph.

Main Hadoop overheads: sorting, HDFS, serialization.
[Chart: PageRank iteration time (seconds) vs. number of workers (8, 16, 32, 64), comparing Hadoop and Piccolo.]
Piccolo Scales Well
EC2 cluster; input graph scaled linearly with cluster size.

[Chart: PageRank iteration time (seconds) vs. number of workers (12, 24, 48, 100, 200) for a 1-billion-page graph, tracking the ideal scaling line.]
Other Applications

Iterative: N-Body Simulation, Matrix Multiply
Asynchronous: Distributed web crawler (no straightforward Hadoop implementation)
Related Work
Data flow: MapReduce, Dryad
Tuple spaces: Linda, JavaSpaces
Distributed shared memory: CRL, TreadMarks, Munin, Ivy; UPC, Titanium
Conclusion
Distributed shared table model

User-specified policies provide for: effective use of locality, efficient synchronization, and robust failure recovery.
Gratuitous Cat Picture
I can haz kwestions?
Try it out: piccolo.news.cs.nyu.edu