2013 12 02 sparrow - eecs at uc berkeleykeo/talks/sparrow-spark-summit... · sparrow distributed...
TRANSCRIPT
![Page 1: 2013 12 02 Sparrow - EECS at UC Berkeleykeo/talks/sparrow-spark-summit... · Sparrow Distributed Low-Latency Spark Scheduling Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica](https://reader031.vdocuments.site/reader031/viewer/2022021901/5b82f4217f8b9a47588c2134/html5/thumbnails/1.jpg)
Sparrow Distributed Low-Latency Spark Scheduling
Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica
Outline
The Spark scheduling bottleneck
Sparrow’s fully distributed, fault-tolerant technique
Sparrow’s near-optimal performance
Spark Today
Diagram: Users 1, 2, and 3 share a single Spark Context that handles query compilation, storage, and scheduling for six workers.
Job Latencies Rapidly Decreasing
Latency axis: 10 min, 10 sec, 100 ms, 1 ms
2004: MapReduce batch job
2009: Hive query
2010: Dremel query
2012: Impala query
2010: In-memory Spark query
2013: Spark streaming
Job latencies rapidly decreasing +
Spark deployments growing in size
Scheduling bottleneck!
Spark scheduler throughput: 1500 tasks / second

| Task duration | Largest cluster the scheduler can keep busy (# 16-core machines) |
|---|---|
| 10 seconds | 1000 |
| 1 second | 100 |
| 100 ms | 10 |
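The cluster sizes in the table follow from simple arithmetic: in steady state, every busy core needs a new task once per task duration, so a scheduler launching 1500 tasks per second can keep at most 1500 × (task duration) cores busy. The sketch below, a back-of-the-envelope model assuming one task per core and the scheduler as the only bottleneck (`max_machines` is a hypothetical helper), reproduces the table's rounded values (937.5, 93.75, and 9.375 machines).

```python
# Back-of-the-envelope model: how many machines can a centralized scheduler keep busy?
# Assumptions (not from the slides' code): one task per core at a time,
# and the scheduler's launch rate is the only limit.

SCHEDULER_THROUGHPUT = 1500  # tasks launched per second (figure from the slide)
CORES_PER_MACHINE = 16


def max_machines(task_duration_s: float,
                 throughput: float = SCHEDULER_THROUGHPUT,
                 cores: int = CORES_PER_MACHINE) -> float:
    """Number of machines whose cores the scheduler can keep fully busy.

    Each busy core needs a new task every `task_duration_s` seconds, so the
    scheduler sustains at most throughput * task_duration_s busy cores.
    """
    busy_cores = throughput * task_duration_s
    return busy_cores / cores


if __name__ == "__main__":
    for duration in (10.0, 1.0, 0.1):
        print(f"{duration:>5} s tasks -> ~{max_machines(duration):.1f} machines")
```

Halving task duration halves the supportable cluster size, which is why shorter jobs on larger deployments hit the scheduling bottleneck.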
Optimizing the Spark Scheduler
0.8: Monitoring code moved off the critical path
0.8.1: Result deserialization moved off the critical path
Future improvements may yield 2-3x higher throughput
Is the scheduler the bottleneck in my cluster?
tinyurl.com/sparkdemo
Diagram: a cluster scheduler sends task launches to six workers and receives task completions from them; the final build of the slide marks the scheduler delay.
tinyurl.com/sparkdemo
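One rough way to answer the slide's question is to compare how long each task took from launch to completion with how long it actually ran. The sketch below assumes hypothetical per-task timestamps (`TaskRecord`, `launch_time`, `completion_time`, and `run_time` are illustrative names, not a Spark API). The gap also includes network transfer, so treat it as an upper bound on pure scheduling overhead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskRecord:
    """Hypothetical per-task timing record; field names are illustrative only."""
    launch_time: float      # when the scheduler sent the task-launch message (s)
    completion_time: float  # when the scheduler processed the task completion (s)
    run_time: float         # time the worker spent executing the task (s)


def scheduler_delay(task: TaskRecord) -> float:
    """Launch-to-completion time not spent running the task."""
    return (task.completion_time - task.launch_time) - task.run_time


def delay_fraction(tasks: List[TaskRecord]) -> float:
    """Share of total launch-to-completion time that is overhead rather than work."""
    total = sum(t.completion_time - t.launch_time for t in tasks)
    overhead = sum(scheduler_delay(t) for t in tasks)
    return overhead / total if total > 0 else 0.0


if __name__ == "__main__":
    tasks = [TaskRecord(0.0, 0.12, 0.10), TaskRecord(1.0, 1.15, 0.10)]
    print([round(scheduler_delay(t), 3) for t in tasks], round(delay_fraction(tasks), 3))
```

If the fraction stays high as the cluster or the job rate grows, the scheduler, not the workers, is the likely bottleneck.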
Future Spark
Diagram: Users 1, 2, and 3 are served by three independent schedulers, each with its own query compilation, all sharing six workers.
Benefits: high throughput, fault tolerance
Storage: Tachyon
Scheduling with Sparrow
Diagram: four independent schedulers share six workers; one scheduler has a stage to place.
Batch Sampling
Diagram: a scheduler with a stage of two tasks probes four of the six workers.
Place m tasks on the least loaded of d·m workers
4 probes (d = 2)
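The batch-sampling rule can be sketched in a few lines. This is an illustration under stated assumptions, not Sparrow's code: `batch_sample` is a hypothetical helper, and each probe reply is modeled as the worker's queue length. With m = 2 and d = 2 it sends the slide's 4 probes.

```python
import random
from typing import Dict, List, Optional


def batch_sample(queue_lengths: Dict[str, int], m: int, d: int = 2,
                 rng: Optional[random.Random] = None) -> List[str]:
    """Place m tasks on the m least-loaded of d*m randomly probed workers.

    queue_lengths maps worker id -> queued tasks (stands in for each probe reply).
    """
    if rng is None:
        rng = random.Random()
    workers = list(queue_lengths)
    n_probes = min(d * m, len(workers))
    probed = rng.sample(workers, n_probes)  # one probe per sampled worker
    # Rank all probed workers together (the "batch"); break ties by worker id.
    ranked = sorted(probed, key=lambda w: (queue_lengths[w], w))
    return ranked[:m]


if __name__ == "__main__":
    cluster = {"w1": 3, "w2": 0, "w3": 2, "w4": 1, "w5": 4, "w6": 0}
    print(batch_sample(cluster, m=2, d=2, rng=random.Random(42)))
```

Ranking the whole batch at once, rather than picking the better of d probes separately for each task, lets a lightly loaded worker found by one task's probes benefit the others.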
Queue length poor predictor of wait time
Diagram: two workers, one with queued tasks of 80 ms and 155 ms, the other with a single 530 ms task.
Poor performance on heterogeneous workloads
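The numbers in the diagram show why picking the shortest queue can backfire. The sketch assumes the 80 ms and 155 ms tasks sit on one worker and the 530 ms task on the other, and that each worker runs one task at a time. Under those assumptions, choosing by queue length picks the worker with a 530 ms wait over the one with a 235 ms wait. In practice a scheduler often does not know task durations in advance, so it cannot simply rank by the sum.

```python
from typing import Dict, List

# Queued task durations (ms) per worker, as read from the diagram.
# Assumption: which durations belong to which worker is our reading of the slide.
queues: Dict[str, List[int]] = {
    "worker_a": [80, 155],
    "worker_b": [530],
}


def pick_by_queue_length(q: Dict[str, List[int]]) -> str:
    """What a probe that only reports queue length would choose."""
    return min(q, key=lambda w: len(q[w]))


def pick_by_wait_time(q: Dict[str, List[int]]) -> str:
    """What an oracle that knows every queued task's duration would choose."""
    return min(q, key=lambda w: sum(q[w]))


if __name__ == "__main__":
    for pick in (pick_by_queue_length, pick_by_wait_time):
        w = pick(queues)
        print(f"{pick.__name__}: {w}, new task waits {sum(queues[w])} ms")
```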
Late Binding
Diagram (three builds): a scheduler with a stage of two tasks places probes on four of the six workers; each probe waits in its worker's queue, and when it reaches the front, the worker requests a task from the scheduler.
Place m tasks on the least loaded of d·m workers
4 probes (d = 2)
Worker requests task
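Late binding defers the choice of worker until a worker is actually ready: the m tasks go to the first m probed workers that ask for one. The simulation below is a minimal sketch. It assumes the simulator knows when each probed worker frees up, which a real scheduler does not; removing the need for that knowledge is the point of late binding. `LateBindingScheduler` and `simulate` are hypothetical names, and `None` stands in for whatever message tells a late requester that no task is left.

```python
import heapq
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class LateBindingScheduler:
    """Scheduler side of late binding: a task is bound only when a worker asks."""

    def __init__(self, tasks: List[str]) -> None:
        self.pending: Deque[str] = deque(tasks)

    def on_worker_request(self, worker: str) -> Optional[str]:
        # The first m requesters get the m tasks; later ones get None
        # (their probe is simply discarded).
        return self.pending.popleft() if self.pending else None


def simulate(free_at: Dict[str, float], tasks: List[str]) -> Dict[str, Optional[str]]:
    """Each probed worker requests a task when its probe reaches the front of
    its queue, modeled here as the time (ms) the worker becomes free."""
    scheduler = LateBindingScheduler(tasks)
    events: List[Tuple[float, str]] = [(t, w) for w, t in free_at.items()]
    heapq.heapify(events)  # process requests in order of arrival time
    assignment: Dict[str, Optional[str]] = {}
    while events:
        _, worker = heapq.heappop(events)
        assignment[worker] = scheduler.on_worker_request(worker)
    return assignment


if __name__ == "__main__":
    # d * m = 4 probes for a stage of m = 2 tasks.
    print(simulate({"w1": 235.0, "w2": 530.0, "w3": 0.0, "w4": 100.0}, ["t1", "t2"]))
```

Because the tasks follow whichever workers free up first, queue length no longer needs to predict wait time.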
What about constraints?
Per-Task Constraints
Diagram: four schedulers, six workers, and a stage whose tasks each have their own constraints.
Probe separately for each task
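When tasks in a stage can run only on particular workers, one shared batch of probes no longer fits, so each task probes on its own. The sketch assumes each task's constraint is given as a set of eligible workers (for example, the machines holding its input data; this is an illustrative choice). `probe_per_task` is a hypothetical helper that sends up to d probes per task, drawn only from that task's eligible set.

```python
import random
from typing import Dict, List, Optional, Set


def probe_per_task(allowed: Dict[str, Set[str]], d: int = 2,
                   rng: Optional[random.Random] = None) -> Dict[str, List[str]]:
    """For each task, sample up to d workers that satisfy that task's constraint."""
    if rng is None:
        rng = random.Random()
    probes: Dict[str, List[str]] = {}
    for task, workers in allowed.items():
        candidates = sorted(workers)  # sort so a seeded rng gives repeatable results
        k = min(d, len(candidates))
        probes[task] = rng.sample(candidates, k)
    return probes


if __name__ == "__main__":
    constraints = {"t1": {"w1", "w2"}, "t2": {"w3", "w4", "w5"}}
    print(probe_per_task(constraints, d=2, rng=random.Random(1)))
```

The per-task probes can still be combined with late binding: each task is bound to whichever of its own probed workers requests it first.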
Technique Recap
Batch sampling
+ Late binding
+ Constraints
Diagram: four schedulers share six workers.
How well does Sparrow perform?
How does Sparrow compare to Spark’s native scheduler?
Plot: response time (ms) vs. task duration (ms), both axes from 0 to 6000, for the Spark native scheduler, Sparrow, and an ideal scheduler.
100 16-core EC2 nodes, 10 tasks/job, 10 schedulers, 80% load
TPC-H Queries: Background
TPC-H: common benchmark for analytics workloads
Stack: Shark (SQL execution engine) on Spark, scheduled by Sparrow
TPC-H Queries
Plot: response time (ms, axis 0 to 4000) for TPC-H queries q3, q4, q6, and q12 under random placement, Sparrow, and an ideal scheduler; boxes show the 5th, 25th, 50th, 75th, and 95th percentiles.
100 16-core EC2 nodes, 10 schedulers, 80% load
Within 12% of ideal
Median queuing delay of 9 ms
Policy Enforcement
Diagram: one worker keeps separate High Priority and Low Priority queues; another keeps separate queues for User A (75%) and User B (25%).
Fair shares: serve queues using weighted fair queuing
Priorities: serve queues based on strict priorities
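Both policies come down to how a worker picks its next task from its local queues. The sketch below is illustrative, not Sparrow's implementation. Strict priority always drains the highest-priority non-empty queue. For fair shares, the sketch uses a simplified stand-in for weighted fair queuing: it serves the non-empty queue with the fewest tasks served relative to its weight, assuming equal-size tasks. The 75%/25% split from the slide becomes weights 3 and 1. `serve_strict_priority` and `WeightedFairServer` are hypothetical names.

```python
from collections import deque
from fractions import Fraction
from typing import Deque, Dict, List, Optional


def serve_strict_priority(queues: Dict[str, Deque[str]], order: List[str]) -> Optional[str]:
    """Run the next task from the highest-priority non-empty queue."""
    for name in order:
        if queues[name]:
            return queues[name].popleft()
    return None


class WeightedFairServer:
    """Simplified weighted fair sharing over per-user queues (unit-size tasks)."""

    def __init__(self, weights: Dict[str, int]) -> None:
        self.weights = weights
        self.served = {name: 0 for name in weights}

    def next_task(self, queues: Dict[str, Deque[str]]) -> Optional[str]:
        ready = [name for name in self.weights if queues[name]]
        if not ready:
            return None
        # Serve the queue furthest behind its share; Fraction avoids float ties.
        name = min(ready, key=lambda n: Fraction(self.served[n], self.weights[n]))
        self.served[name] += 1
        return queues[name].popleft()


if __name__ == "__main__":
    users = {"A": deque(f"a{i}" for i in range(100)),
             "B": deque(f"b{i}" for i in range(100))}
    server = WeightedFairServer({"A": 3, "B": 1})  # User A 75%, User B 25%
    print([server.next_task(users) for _ in range(8)])
```

Over any long run, User A receives about three tasks for every one of User B's while both have work queued; if one queue empties, the other gets the whole worker.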
Weighted Fair Sharing
Plot: running tasks vs. time (s) for User 0 and User 1.
Fault Tolerance
Diagram: Spark Client 1 and Spark Client 2 with Scheduler 1 and Scheduler 2; a ✗ marks the failure.
Timeout: 100 ms
Failover: 5 ms
Re-launch queries: 15 ms
Plot: query response time (ms) vs. time (s) for Spark Client 1 and Spark Client 2, with the failure marked.
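Because no scheduler holds state that others depend on, recovering from a scheduler failure is mostly the client's job. The sketch below assumes a client that detects failure (modeled as an exception standing in for the 100 ms timeout), switches to the next scheduler in its list, and re-launches its unfinished queries. `FakeScheduler`, `SchedulerUnavailable`, and `FailoverClient` are hypothetical. Adding up the slide's numbers, a query in flight at failure time sees roughly 100 + 5 + 15 = 120 ms of extra delay.

```python
from typing import List

TIMEOUT_MS, FAILOVER_MS, RELAUNCH_MS = 100, 5, 15  # figures from the slide
RECOVERY_MS = TIMEOUT_MS + FAILOVER_MS + RELAUNCH_MS


class SchedulerUnavailable(Exception):
    """Stands in for a request that timed out against a failed scheduler."""


class FakeScheduler:
    """Hypothetical scheduler stub that records the queries it accepts."""

    def __init__(self, name: str, alive: bool = True) -> None:
        self.name = name
        self.alive = alive
        self.launched: List[str] = []

    def launch(self, query: str) -> None:
        if not self.alive:
            raise SchedulerUnavailable(self.name)
        self.launched.append(query)


class FailoverClient:
    """Client that moves to the next scheduler and re-launches unfinished queries."""

    def __init__(self, schedulers: List[FakeScheduler]) -> None:
        self.schedulers = schedulers
        self.current = 0
        self.outstanding: List[str] = []

    def submit(self, query: str) -> None:
        self.outstanding.append(query)
        try:
            self.schedulers[self.current].launch(query)
        except SchedulerUnavailable:
            self._fail_over()

    def complete(self, query: str) -> None:
        self.outstanding.remove(query)

    def _fail_over(self) -> None:
        # The failed scheduler's state is lost, so re-launch every unfinished query.
        while True:
            self.current += 1
            if self.current >= len(self.schedulers):
                raise RuntimeError("no live scheduler left")
            try:
                for q in self.outstanding:
                    self.schedulers[self.current].launch(q)
                return
            except SchedulerUnavailable:
                continue


if __name__ == "__main__":
    s1, s2 = FakeScheduler("scheduler-1"), FakeScheduler("scheduler-2")
    client = FailoverClient([s1, s2])
    client.submit("q1")
    s1.alive = False  # scheduler 1 fails
    client.submit("q2")
    print(s2.launched, f"~{RECOVERY_MS} ms added delay")
```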
Making Sparrow feature-complete
Interfacing with UI
Delay scheduling
Speculation
(1) Diagnosing a Spark scheduling bottleneck
(2) Distributed, fault-tolerant scheduling with Sparrow
www.github.com/radlab/sparrow