TRANSCRIPT
Dynamic Resource Allocation for Distributed Dataflows
Lauritz Thamsen, Technische Universität Berlin
04.05.2018
Distributed Dataflows
• E.g. MapReduce, SCOPE, Spark, and Flink
• Used for scalable processing in many domains, e.g.
  • Search and data aggregation
  • Relational processing
  • Graph analysis
  • Machine learning
04.05.18 Dynamic Resource Allocation for Distributed Dataflows 2
Distributed Dataflow Jobs
[Figure: a distributed dataflow job as a sequence of stages (e.g. Map or Filter, then Reduce) separated by synchronization barriers]
Compared to HPC
• High-level programming abstractions
• Comprehensive frameworks and distributed execution engines
• Usable even without extensive knowledge of parallel and distributed programming
Yet Users Need to Allocate Resources
[Figure: a cluster manager places a job at a chosen scale-out onto containers (C) on the worker nodes]
Constraints of Production Jobs
• Users need to reserve resources for runtime targets:
  • Usability constraints
  • Service level agreements
  • Cluster reservations only valid for a certain time
• But do not necessarily understand workload dynamics
Two-Fold Problem
• Runtime Prediction: Estimating performance upfront is difficult as it depends on many factors, e.g. programs, data, systems, and infrastructure
• Runtime Variance: Significant performance variance due to data locality, interference, and failures
Given a runtime target for a recurring batch job,
how can minimal resources be allocated automatically?
Methods & Results
• L. Thamsen, I. Verbitskiy, F. Schmidt, T. Renner, and O. Kao. “Selecting Resources for Distributed Dataflow Systems According to Runtime Targets”. IPCCC 2016. IEEE.
• J. Koch, L. Thamsen, F. Schmidt, and O. Kao. “SMiPE: Estimating the Progress of Recurring Iterative Distributed Dataflows”. PDCAT 2017. IEEE.
• L. Thamsen, I. Verbitskiy, J. Beilharz, T. Renner, A. Polze, and O. Kao. “Ellis: Dynamically Scaling Distributed Dataflows to Meet Runtime Targets”. CloudCom 2017. IEEE.
Dynamic Resource Allocation
[Figure: overview of the approach with three components: Similarity Matching, Scale-out Modeling, and Resource Allocation, built around a model of runtime over scale-out]
Dynamic Resource Allocation
[Figure: given a job and a runtime target, similarity matching selects comparable executions from the workload history, from which scale-out models are derived]
Assumptions
• Distributed dataflow jobs with multiple stages
• Dedicated homogeneous clusters
• Recurring batch processing jobs
Dynamic Resource Allocation
[Figure: approach overview, now highlighting the Scale-out Modeling component]
Scale-out Modeling
• Find a function that predicts the runtimes of a job based on previous runs
[Figure: runtime over scale-out for two example jobs, A and B, showing different scaling behavior]
Parametric Regression
• Non-Negative Least Squares (NNLS) for a model of distributed data-parallel processing [Ernest]:

runtime = θ₀ + θ₁ · (1 / scale-out) + θ₂ · log(scale-out) + θ₃ · scale-out

  • θ₀: sequential portion
  • θ₁ · (1 / scale-out): parallel portion
  • θ₂ · log(scale-out): tree-like communication patterns
  • θ₃ · scale-out: collect-like communication patterns
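As a minimal sketch, this model can be fitted with `scipy.optimize.nnls`; the function names and the synthetic data below are illustrative, not the talk's implementation:

```python
import numpy as np
from scipy.optimize import nnls

def features(scale_outs):
    """Feature matrix for runtime = t0 + t1/s + t2*log(s) + t3*s."""
    s = np.asarray(scale_outs, dtype=float)
    return np.column_stack([np.ones_like(s), 1.0 / s, np.log(s), s])

def fit_scaleout_model(scale_outs, runtimes):
    """Non-negative least squares keeps all model coefficients >= 0."""
    theta, _residual = nnls(features(scale_outs), np.asarray(runtimes, dtype=float))
    return theta

def predict_runtime(theta, scale_out):
    """Predicted runtime of the job at a given scale-out."""
    return float(features([scale_out]) @ theta)
```

Fitted on (scale-out, runtime) pairs from previous runs, the model can then be queried at scale-outs that were never tried.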
Non-parametric Regression
• Assuming locally defined behavior with Local Linear Regression (LLR)
[Figure: runtime over scale-out, with a line fitted locally around the queried scale-out]
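A minimal sketch of local linear regression: training points are weighted by a Gaussian kernel around the queried scale-out, and a weighted line is fitted. The kernel choice and bandwidth are assumptions for illustration:

```python
import numpy as np

def llr_predict(scale_outs, runtimes, query, bandwidth=4.0):
    """Fit a weighted line around the queried scale-out and evaluate it."""
    x = np.asarray(scale_outs, dtype=float)
    y = np.asarray(runtimes, dtype=float)
    # Gaussian kernel weights: nearby previous runs dominate the local fit
    w = np.exp(-((x - query) ** 2) / (2.0 * bandwidth ** 2))
    X = np.column_stack([np.ones_like(x), x])
    W = np.diag(w)
    # Weighted least squares for the local line's intercept and slope
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return float(beta[0] + beta[1] * query)
```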
Automatic Model Selection
• Goal: a model with high accuracy when possible, otherwise a robust fallback
• Approach:
  • Use the parametric model for extrapolation
  • Dynamically select the prediction model for interpolation with k-fold cross-validation
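The selection step can be sketched as picking, among candidate models, the one with the lowest k-fold cross-validation error; the `(name, fit, predict)` interface here is an assumption for illustration:

```python
import numpy as np

def cv_error(fit, predict, xs, ys, k=3, seed=0):
    """Mean relative prediction error under k-fold cross-validation."""
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    idx = np.random.default_rng(seed).permutation(len(xs))
    errors = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)        # all indices not in this fold
        model = fit(xs[train], ys[train])
        errors += [abs(predict(model, xs[i]) - ys[i]) / ys[i] for i in fold]
    return float(np.mean(errors))

def select_model(candidates, xs, ys):
    """candidates: list of (name, fit, predict); pick the lowest CV error."""
    return min(candidates, key=lambda c: cv_error(c[1], c[2], xs, ys))
```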
Evaluation Setup
• Using up to 60 commodity nodes (8 cores, 16 GB RAM) and exemplary distributed dataflow jobs

Job             System   Dataset    Input Size
WordCount       Flink    Wiki       250 GB
TPC-H Query 10  Flink    Tables     200 GB
K-Means         Flink    Points     50 GB
Grep            Spark    Wiki       250 GB
SGD             Spark    Features   10 GB
PageRank        Spark    Graph      3.4 GB
Cross-Validation of Models
• Repeated random sub-sampling with increasing amounts of training data, n = …
[Figure: runtime over scale-out, with training runs selected at random from the previous executions]
Relative Prediction Error
[Figure: relative prediction error over the amount of training data for Grep (Spark) and SGD (Spark)]
Dynamic Resource Allocation
[Figure: approach overview, now highlighting the Similarity Matching component: Scale-out Modeling addresses the problem of runtime prediction, Similarity Matching the problem of runtime variance]
Select Executions for Model Training
• Recurring jobs run on updated data, code, and parameters, yielding different runtimes
• Idea: similar executions → similar runtimes
[Figure: the current execution is compared to previous executions in the job history; similar runs are used for model training, dissimilar ones are discarded]
Different Similarity Measures
• Functions s(ex_current, ex_previous) ∈ [0, 1]

Measure                Type
Input size             Offline
Scale-out              Offline
Runtime of stages      Online
Convergence            Online
Resource utilization   Online
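As an illustration of an offline measure, input-size similarity can be defined as a ratio in [0, 1]; the function names and the threshold are assumptions, not the talk's exact definitions:

```python
def input_size_similarity(current_size, previous_size):
    """Offline similarity in [0, 1]: ratio of smaller to larger input size."""
    if max(current_size, previous_size) == 0:
        return 1.0  # two empty inputs are trivially similar
    return min(current_size, previous_size) / max(current_size, previous_size)

def select_similar_runs(current_size, history, threshold=0.8):
    """Keep previous executions similar enough to train the scale-out model."""
    return [run for run in history
            if input_size_similarity(current_size, run["input_size"]) >= threshold]
```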
Evaluation of Similarity Measures
[Figure: accuracy, share of selected runs, and average accuracy plotted over the similarity of previous executions]
Normalized Similarity Quality
• Accuracy normalized over the share of selected points
[Figure: average accuracy over the percentage of most similar runs (from 100% down to 0%), e.g. evaluated at the top 25% and top 5% most similar runs]
Evaluation of Similarity Measures
2704.05.18 Dynamic Resource Allocation for Distributed Dataflows
0,3
0,5
0,7
0,9
0 1
Aver
age
Accu
racy
Percentage of Most Similar Runs
0,3
0,5
0,7
0,9
0,010,3
0,5
0,7
0,9
0
Input Convergence Network-I/O
100% 0% 100% 0% 100% 0%
ConnectedComponents
% of Most Similar Runs
0,3
0,5
0,7
0,9
0 1
Aver
age
Accu
racy
Percentage of Most Similar Runs
0,3
0,5
0,7
0,9
0,010,3
0,5
0,7
0,9
0
Input Convergence Network-I/O
100% 0% 100% 0% 100% 0%
Page Rank
0,3
0,5
0,7
0,9
0 1
Aver
age
Accu
racy
Percentage of Most Similar Runs
0,3
0,5
0,7
0,9
0,010,3
0,5
0,7
0,9
0
Input Convergence Network-I/O
100% 0% 100% 0% 100% 0%% of Most Similar Runs
Stochastic Gradient Descent
0,3
0,5
0,7
0,9
0 1
Aver
age
Accu
racy
Percentage of Most Similar Runs
0,3
0,5
0,7
0,9
0,010,3
0,5
0,7
0,9
0
Input Convergence Network-I/O
100% 0% 100% 0% 100% 0%% of Most Similar Runs
Dynamic Resource Allocation
[Figure: approach overview, now highlighting the Resource Allocation component, which builds on both Scale-out Modeling and Similarity Matching]
Resource Allocation
• Search for the minimal scale-out that meets a runtime target
[Figure: the fitted runtime model is evaluated at candidate scale-outs until the smallest scale-out with a predicted runtime below the target is found]
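The search can be sketched as a linear scan from the smallest scale-out upward, returning the first candidate whose predicted runtime meets the target; the scale-out bounds are assumed parameters:

```python
def minimal_scaleout(predict_runtime, target, min_s=1, max_s=60):
    """Smallest scale-out whose predicted runtime is within the target."""
    for s in range(min_s, max_s + 1):
        if predict_runtime(s) <= target:
            return s
    return None  # target unreachable within the allowed scale-out range
```

A linear scan also stays correct when the model is non-monotone, e.g. when collect-like communication makes very large scale-outs slower again.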
Stage-wise Dynamic Allocations
[Figure: given a job and a runtime target, the resource allocation is re-computed between stages, so the scale-out can be adjusted while the job runs]
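A sketch of this per-stage re-planning, assuming one runtime model per stage and a simple linear scan over candidate scale-outs; all names are illustrative:

```python
def next_stage_scaleout(stage_models, stages_done, elapsed, target, max_s=40):
    """Pick the smallest scale-out so that the predicted runtime of the
    remaining stages fits into the remaining time budget."""
    remaining = stage_models[stages_done:]
    budget = target - elapsed
    for s in range(1, max_s + 1):
        if sum(model(s) for model in remaining) <= budget:
            return s
    return max_s  # budget cannot be met; fall back to the maximum scale-out
```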
Prototype System: Ellis
• Integrated and usable on a per-job basis with YARN
[Figure: YARN architecture with a Resource Manager and containers running on the cluster; Ellis runs with the App Master and uses the workload history to adjust the number of App Worker containers]
Evaluation Setup
• Using up to 40 commodity nodes (8 cores, 16 GB RAM) and exemplary iterative Spark jobs (MLlib)

Job                                 Dataset       Input Size
Multi-Layer Perceptron (MLP)        Multiclass    29 GB
Gradient Boosted Trees (GBT)        Vandermonde   111 GB
Stochastic Gradient Descent (SGD)   Tables        37 GB
K-Means                             Points        50 GB
Results for Gradient Boosted Trees
[Figure: runtimes relative to the target for Ellis static (allocating only initially) and Ellis dynamic (re-allocating per stage)]
Results for All Benchmark Jobs
• Comparison of Ellis dynamic to Ellis static (ratio of dynamic to static) in three metrics:

Job       Violation Count   Violation Sum   Resources
SGD       25%               7%              94%
K-Means   35%               5%              80%
GBT       47%               29%             100%
MLP       69%               13%             127%
Dynamic Resource Allocation
[Figure: approach overview: Scale-out Modeling addresses the problem of runtime prediction, Similarity Matching the problem of runtime variance, and Resource Allocation builds on both]
Summary
• New methods for dynamic resource allocation that are applicable to distributed dataflow frameworks
• Prototype integrated and usable with YARN
• Promising evaluation results