making sense of spark performance kay ousterhout uc berkeley in collaboration with ryan rasti,...
TRANSCRIPT
![Page 1: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/1.jpg)
Making Sense of Spark Performance
Kay Ousterhout
UC Berkeley
In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun
eecs.berkeley.edu/~keo/traces
![Page 2: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/2.jpg)
About Me
PhD student in Computer Science at UC Berkeley
Thesis work centers around performance of large-scale distributed systems
Spark PMC member
![Page 3: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/3.jpg)
About This Talk
Overview of how Spark works
How we measured performance bottlenecks
In-depth performance analysis for a few workloads
Demo of performance analysis tool
![Page 4: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/4.jpg)
…
I am SamI am Sam
Sam I amDo you like
Green eggs and ham?
Clu
ster
of
mac
hine
sCount the # of words in the document
Thank you, Sam I am
6
6
4
5
Spark driver:6+6+4+5 = 21
Spark (or Hadoop/Dryad/etc.) task
![Page 5: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/5.jpg)
…
Count the # of occurrences of each word
{I: 4,you: 2,
…}
{am: 4,Green: 1,
…}
{Sam: 4,…}
{Thank: 1,eggs: 1,
…}
{I: 2,am: 2,
…}
{Sam: 1,I: 1,… }
{Green: 1,eggs: 1,
… }
{Thank: 1, you: 1,… }
I am SamI am Sam
Sam I amDo you like
Green eggs and ham?
Thank you, Sam I am
MAP REDUCE
![Page 6: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/6.jpg)
…
Performance considerations
(1) Caching input data
(2) Scheduling: assigning tasks to machines
(3) Straggler tasks
(4) Network performance (e.g., during shuffle)
I am SamI am Sam
Sam I amDo you like
Green eggs and ham?
Thank you, Sam I am
![Page 7: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/7.jpg)
Stragglers Scarlett [EuroSys ‘11], SkewTune [SIGMOD ‘12], LATE [OSDI ‘08], Mantri [OSDI ‘10], Dolly [NSDI ‘13], GRASS [NSDI ‘14], Wrangler [SoCC ’14]
Caching PACMan [NSDI ’12], Spark [NSDI ’12], Tachyon [SoCC ’14]
Scheduling Sparrow [SOSP ‘13], Apollo [OSDI ’14], Mesos [NSDI ‘11], DRF [NSDI ‘11], Tetris [SIGCOMM ’14], Omega [Eurosys ’13], YARN [SoCC ’13], Quincy [SOSP ‘09], KMN [OSDI ’14]
Generalized programming model Dryad [Eurosys ‘07], Spark [NSDI ’12]
Network VL2 [SIGCOMM ‘09], Hedera [NSDI ’10], Sinbad
[SIGCOMM ’13], Orchestra [SIGCOMM ’11], Baraat [SIGCOMM ‘14], Varys [SIGCOMM ’14], PeriSCOPE [OSDI ‘12], SUDO [NSDI ’12], Camdoop [NSDI ’12], Oktopus [SIGCOMM ‘11]), EyeQ [NSDI ‘12], FairCloud [SIGCOMM ’12]
![Page 8: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/8.jpg)
Stragglers Scarlett [EuroSys ‘11], SkewTune [SIGMOD ‘12], LATE [OSDI ‘08], Mantri [OSDI ‘10], Dolly [NSDI ‘13], GRASS [NSDI ‘14], Wrangler [SoCC ’14]
Caching PACMan [NSDI ’12], Spark [NSDI ’12], Tachyon [SoCC ’14]
Scheduling Sparrow [SOSP ‘13], Apollo [OSDI ’14], Mesos [NSDI ‘11], DRF [NSDI ‘11], Tetris [SIGCOMM ’14], Omega [Eurosys ’13], YARN [SoCC ’13], Quincy [SOSP ‘09], KMN [OSDI ’14]
Generalized programming model Dryad [Eurosys ‘07], Spark [NSDI ’12]
Network VL2 [SIGCOMM ‘09], Hedera [NSDI ’10], Sinbad
[SIGCOMM ’13], Orchestra [SIGCOMM ’11], Baraat [SIGCOMM ‘14], Varys [SIGCOMM ’14], PeriSCOPE [OSDI ‘12], SUDO [NSDI ’12], Camdoop [NSDI ’12], Oktopus [SIGCOMM ‘11]), EyeQ [NSDI ‘12], FairCloud [SIGCOMM ’12]
Network and disk I/O are bottlenecks
Stragglers are a major issue withunknown causes
![Page 9: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/9.jpg)
(1) Methodology for quantifying performance bottlenecks
(2) Bottleneck measurement for 3 SQL workloads (TPC-DS and 2
others)
This Work
![Page 10: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/10.jpg)
Network optimizations
can reduce job completion time by at most 2%
CPU (not I/O) often the bottleneck
Most straggler causes can be identified and fixed
![Page 11: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/11.jpg)
network read
compute
disk write
time: time to handle one record
Example Spark task:
Fine-grained instrumentation needed to understand performance
![Page 12: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/12.jpg)
How much faster would a job run if the network were infinitely fast?
What’s an upper bound on the improvement from network
optimizations?
![Page 13: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/13.jpg)
network read
compute
disk write
Original task runtime
How much faster could a task run if the network were infinitely fast?
compute
Task runtime with infinitely fast network
: blocked on disk: blocked on network
![Page 14: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/14.jpg)
How much faster would a job run if the network were infinitely fast?
Task 0
Task 1
Task 2
time
2 sl
ots
to: Original job completion time
Task 0
Task 1 Task 22 sl
ots
tn: Job completion time with infinitely fast network
: time blocked on network
![Page 15: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/15.jpg)
SQL Workloads
TPC-DS (20 machines, 850GB; 60 machines, 2.5TB)
www.tpc.org/tpcds
Big Data Benchmark (5 machines, 60GB)amplab.cs.berkeley.edu/benchmark
Databricks (9 machines, tens of GB)databricks.com
2 versions of each: in-memory, on-disk
![Page 16: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/16.jpg)
How much faster could jobs get from optimizing network performance?
Median improvement at most 2%
5
95
75
25
50
Percentiles
![Page 17: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/17.jpg)
How can we sanity check these numbers?
![Page 18: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/18.jpg)
How much data is transferred per CPU second?
Microsoft ’09-’10: 1.9–6.35 Mb / task second
Google ’04-‘07: 1.34–1.61 Mb / machine second
![Page 19: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/19.jpg)
How can this be true?
Shuffle Data < Input Data
![Page 20: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/20.jpg)
What kind of hardware should I buy?
10Gbps networking hardware likely not necessary!
![Page 21: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/21.jpg)
How much faster would jobs complete if the disk were infinitely fast?
![Page 22: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/22.jpg)
How much faster could jobs get from optimizing disk performance?
Median improvement at most 19%
![Page 23: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/23.jpg)
Disk Configuration
Our instances: 2 disks, 8 cores
Cloudera:– At least 1 disk for every 3 cores– As many as 2 disks for each core
Our instances are under provisioned results are upper bound
![Page 24: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/24.jpg)
How much data is transferred per CPU second?
Google: 0.8-1.5 MB / machine second
Microsoft: 7-11 MB / task second
![Page 25: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/25.jpg)
What does this mean about Spark versus Hadoop?
Faster
serialized + compressed on-disk
data
serialized + compressed
in-memory data
This work:19%
![Page 26: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/26.jpg)
This work says nothing about Spark vs. Hadoop!
(on-disk data)
deserialized in-memory
data
up to 10xspark.apache.org
6x or moreamplab.cs.berkeley.e
du/benchmark/
Faster
serialized + compressed on-disk
data
serialized + compressed
in-memory data
This work:19%
![Page 27: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/27.jpg)
What causes stragglers?
Takeaway: causes depend on the workload, but disk and garbage collection common
Fixing straggler causes can speed up other tasks too
![Page 28: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/28.jpg)
Live demo
![Page 29: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/29.jpg)
eecs.berkeley.edu/~keo/traces
![Page 31: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/31.jpg)
Network optimizations
can reduce job completion time by at most 2%
CPU (not I/O) often the bottleneck19% reduction in completion time from optimizing disk
Many straggler causes can be identified and fixed
Project webpage (with links to paper and tool): eecs.berkeley.edu/~keo/traces
Contact: [email protected], @kayousterhout
![Page 32: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/32.jpg)
Backup Slides
![Page 33: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/33.jpg)
How do results change with scale?
![Page 34: Making Sense of Spark Performance Kay Ousterhout UC Berkeley In collaboration with Ryan Rasti, Sylvia Ratnasamy, Scott Shenker, and Byung-Gon Chun eecs.berkeley.edu/~keo/traces](https://reader035.vdocuments.site/reader035/viewer/2022070323/56649d315503460f94a09a7f/html5/thumbnails/34.jpg)
How does the utilization compare?