TRANSCRIPT
Herodotos Herodotou, Fei Dong, Shivnath Babu
Duke University
Analysis in the Big Data Era
Data Parallel Systems: scalability, fault-tolerance, reliability, flexibility
Cloud Computing: elasticity, reduced costs, disaster recovery, highly automated
10/28/2011 2 Duke University
Data Parallel Systems in the Cloud
Power to the (non-expert) users:
- Can provision a cluster of choice in minutes
- Can avoid the middleman (system administrator)
- Pay only for the resources used
Burden to the (non-expert & expert) users:
- Make decisions at various levels (resources → job)
- Meet workload requirements
Executing MR Workloads on the Cloud
Goal: Execute MapReduce workload in < 2 hours and < $10
Decisions:
- Number of nodes, node type (~machine specs)
- Number of M/R slots, memory settings, …
- Number of M/R tasks, use compression, …
[Figure: cluster nodes each running TaskTrackers (TT) and DataNodes (DN); map and reduce tasks read input and write output]
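To make the decision space concrete, here is a minimal Python sketch of the provisioning choice: enumerate candidate (node type, number of nodes) pairs and keep those that meet both requirements. All names and the timing/pricing numbers here are illustrative assumptions, not Elastisizer's implementation; `predict_time_hours` stands in for a what-if style estimator.

```python
# Hypothetical sketch: search the cluster provisioning space under
# a time constraint (< 2 hours) and a cost constraint (< $10).
from itertools import product

NODE_TYPES = {"m1.small": 0.085, "m1.large": 0.34, "m1.xlarge": 0.68}  # $/hour

def predict_time_hours(node_type, num_nodes):
    # Placeholder model: assume near-linear speedup from a made-up
    # single-node baseline (purely illustrative numbers).
    base = {"m1.small": 12.0, "m1.large": 4.0, "m1.xlarge": 2.4}[node_type]
    return base / num_nodes

def feasible_choices(max_hours=2.0, max_dollars=10.0):
    out = []
    for node_type, n in product(NODE_TYPES, range(1, 21)):
        t = predict_time_hours(node_type, n)
        cost = t * n * NODE_TYPES[node_type]  # node-hours x price/hour
        if t <= max_hours and cost <= max_dollars:
            out.append((node_type, n, round(t, 2), round(cost, 2)))
    return out
```

In practice the search also ranges over job-level configuration, and the time estimates come from the What-if Engine rather than a fixed speedup model.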
Multi-objective Cluster Provisioning
[Figure: running time (min) and cost ($) for the same workload on different Amazon EC2 instance types (different machine specs): m1.small, m1.large, m1.xlarge, c1.medium, c1.xlarge]
Moving from Development to Production
Development Cluster (smaller, less powerful than the production cluster): develop & test MapReduce jobs
Production Cluster: How will the jobs behave? What settings to use for running the jobs?
Cluster Sizing Problem
Determine cluster resources & job-level configuration parameters to meet requirements on execution time & cost for a given workload
Solution: Elastisizer
Elastisizer, part of the Starfish system: http://www.cs.duke.edu/starfish
Components: Cluster Sizing Query Interface, Resource Optimizer, Configuration Optimizer, What-if Engine, Profiler
Cluster Sizing Query Interface
1. Workload: MapReduce jobs and input data properties, abstracted using job profiles
2. Cluster Resources: number of nodes, type of nodes (either specify or ask for recommendations)
3. Job Configurations: number of map/reduce tasks, task-level memory allocation (either specify or ask for recommendations)
4. Performance Requirements: execution time and monetary cost (optimize for one, possibly subject to a constraint on the other)
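The four-part query above can be sketched as a small data structure. The field names below are assumptions chosen for illustration, not Elastisizer's actual API; `None` stands for "ask for a recommendation".

```python
# Minimal sketch of a cluster sizing query (assumed field names).
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClusterSizingQuery:
    # 1. Workload: jobs abstracted by job profiles + input data properties
    job_profiles: list
    input_data: dict
    # 2. Cluster resources: specify, or None to ask for a recommendation
    num_nodes: Optional[int] = None
    node_type: Optional[str] = None
    # 3. Job configuration: e.g., reduce task count, task-level memory
    num_reduce_tasks: Optional[int] = None
    task_memory_mb: Optional[int] = None
    # 4. Performance requirements: optimize one, constrain the other
    objective: str = "cost"              # "cost" or "time"
    max_time_hours: Optional[float] = None
    max_cost_dollars: Optional[float] = None

# Example: minimize running time subject to a $10 budget
q = ClusterSizingQuery(job_profiles=[], input_data={"size_gb": 60},
                       objective="time", max_cost_dollars=10.0)
```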
Cluster Sizing Example Queries
How will the jobs behave? Specify the workload, the production cluster's resources, and particular configuration settings; ask for performance.
What settings to use? Specify the workload and the production cluster's resources; ask for configuration settings.
[Figure: what-if questions and optimization requests about the production cluster, posed from the development cluster, are answered by the What-if Engine]
What-if Question
Inputs: job profiles, input data properties, cluster resources, and configuration settings (the latter three possibly hypothetical)
Output: properties of the hypothetical job
Job Profile
- Concise representation of program execution
- Records information at the level of "task phases"
- Generated by the Profiler through measurement, or by the What-if Engine through estimation
[Figure: map task phases — Read (input split from DFS), Map, Collect (serialize, partition into memory buffer), Spill (sort, [combine], [compress]), Merge]
Job Profile Fields
- Dataflow: amount of data flowing through task phases (e.g., map output bytes, number of spills, number of records in buffer per spill)
- Costs: execution times at the level of task phases (e.g., read, map, and spill phase times in the map task)
- Dataflow Statistics: statistical information about the dataflow (e.g., width of input key-value pairs, map selectivity in terms of records, map output compression ratio)
- Cost Statistics: statistical information about resource costs (e.g., I/O cost for reading from local disk per byte, CPU cost for executing the Mapper per record, CPU cost for uncompressing the input per byte)
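The four field groups can be pictured as a nested record. The sketch below uses the example fields from the slide with made-up values; it is an illustration of the profile's shape, not the actual on-disk format.

```python
# Illustrative map-task profile: the four job-profile field groups.
map_task_profile = {
    "dataflow": {              # amounts of data through task phases
        "map_output_bytes": 512 * 1024 * 1024,
        "num_spills": 4,
        "records_per_buffer_spill": 1_000_000,
    },
    "costs": {                 # execution times per task phase (ms)
        "read_phase_ms": 800,
        "map_phase_ms": 5_200,
        "spill_phase_ms": 1_900,
    },
    "dataflow_statistics": {   # statistical info about the dataflow
        "input_pair_width_bytes": 120,
        "map_selectivity_records": 3.5,   # output records per input record
        "map_output_compression_ratio": 0.4,
    },
    "cost_statistics": {       # resource costs per unit of work
        "local_io_cost_per_byte": 17e-6,
        "map_cpu_cost_per_record": 2e-3,
        "uncompress_cpu_cost_per_byte": 5e-7,
    },
}
```

The split matters because the groups behave differently under change: dataflow and costs depend on the configuration, dataflow statistics depend mostly on the data, and cost statistics depend mostly on the cluster resources.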
What-if Engine
Virtual Profile Estimation produces virtual job profiles, which are fed to a Task Scheduler Simulator.
Virtual Profile Estimation
Given the profile for job j, estimate the profile for job j' based on new input data, cluster resources, & configuration.
[Figure: the dataflow statistics of j pass through cardinality models (driven by the input data) to yield the dataflow statistics of j'; the cost statistics of j pass through relative black-box models (driven by the cluster resources) to yield the cost statistics of j'; white-box models then combine these with the configuration to compute the dataflow and costs of the (virtual) profile for j']
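The model composition above can be sketched in a few lines. The function signature and model interfaces below are assumptions for illustration, not the engine's actual API; the point is the order in which the three model families are applied.

```python
# Sketch of how virtual profile estimation composes its models:
# cardinality models handle new input data, relative black-box models
# handle new cluster resources, white-box models handle configuration.
def estimate_virtual_profile(profile_j, new_input, new_resources, new_config,
                             cardinality_model, relative_model, white_box):
    # 1. Dataflow statistics of j -> j' via cardinality models (input data)
    stats = cardinality_model(profile_j["dataflow_statistics"], new_input)
    # 2. Cost statistics of j -> j' via relative black-box models (resources)
    cost_stats = relative_model(profile_j["cost_statistics"], new_resources)
    # 3. White-box models derive dataflow and costs under the configuration
    dataflow = white_box.dataflow(stats, new_config)
    costs = white_box.costs(dataflow, cost_stats, new_config)
    return {"dataflow_statistics": stats, "cost_statistics": cost_stats,
            "dataflow": dataflow, "costs": costs}
```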
Relative Models for Cost Statistics (CS)
Given CS_dev for job j on dev cluster resources r1, predict CS_prod for j on prod cluster resources r2:

Field                   Avg. (dev)   Avg. (prod)
Local I/O cost / byte   17           ?
Sort CPU cost / key     6            ?
…

A supervised learning algorithm is trained on jobs j1, j2, …, jn profiled on both clusters.
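The relative-model idea can be sketched with a one-feature least-squares fit: from training jobs profiled on both clusters, learn a mapping from each dev-cluster cost statistic to its prod-cluster counterpart. The simple linear regressor and all numbers below are illustrative assumptions; the actual system can use richer learners.

```python
# Sketch: fit a per-statistic relative model dev -> prod by least squares.
def fit_relative_model(dev_values, prod_values):
    n = len(dev_values)
    mx = sum(dev_values) / n
    my = sum(prod_values) / n
    sxx = sum((x - mx) ** 2 for x in dev_values)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dev_values, prod_values))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

# Training jobs j1..jn: local I/O cost per byte measured on both clusters
dev = [17.0, 12.0, 20.0, 9.0]
prod = [8.5, 6.0, 10.0, 4.5]   # made-up values: prod is roughly 2x faster
predict = fit_relative_model(dev, prod)
```

Given the fitted model, the "?" entries in the table above are filled in by applying `predict` to the dev-cluster measurements of a new job.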
Generating Training Samples
Requirements: good coverage of the prediction space; generate samples efficiently
Idea 1: Try to cover the space based on the workload
Apriori Training Benchmark: run a sample of the jobs in the workload
- Assumes knowledge of the full workload
- Running time grows with workload & data
- New jobs are not represented in the training data
Idea 2: Focus on the spectrum of resource usage
Fixed Training Benchmark: run representative MapReduce jobs with different resource usage behavior
- Independent of the actual workload & data
Idea 3: Focus on the spectrum of resource usage + exploit the job profile abstraction
Custom Training Benchmark: small, synthetic workload
- Tasks within jobs behave differently in terms of CPU, I/O, memory, and network → diverse training samples
- No knowledge of the actual workload/input data needed
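One way to picture the custom benchmark is as a grid over resource-usage behavior: generate a small set of synthetic jobs whose tasks span the CPU/I/O/memory/network spectrum, so the training data covers the prediction space without knowing the real workload. The two-level grid below is an illustrative assumption, not the benchmark actually used.

```python
# Sketch: enumerate synthetic training jobs spanning low/high usage
# of each of the four resources (16 combinations in total).
from itertools import product

def synthetic_jobs():
    levels = ("low", "high")
    jobs = []
    for cpu, io, mem, net in product(levels, repeat=4):
        jobs.append({"cpu": cpu, "io": io, "memory": mem, "network": net})
    return jobs
```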
Experimental Evaluation
Different scenarios: Cluster × Workload × Data

EC2 Node Type  CPU (EC2 units)  Mem     I/O Perf.  Cost/hour  #Maps/node  #Reds/node  MaxMem/task
m1.small       1 (1 x 1)        1.7 GB  moderate   $0.085     2           1           300 MB
m1.large       4 (2 x 2)        7.5 GB  high       $0.34      3           2           1024 MB
m1.xlarge      8 (4 x 2)        15 GB   high       $0.68      4           4           1536 MB
c1.medium      5 (2 x 2.5)      1.7 GB  moderate   $0.17      2           2           300 MB
c1.xlarge      20 (8 x 2.5)     7 GB    high       $0.68      8           6           400 MB
cc1.4xlarge    33.5 (8)         23 GB   very high  $1.60      8           6           1536 MB
Abbr.  MapReduce Program                    Domain                 Dataset
CO     Word Co-occurrence                   Natural Lang. Proc.    Wikipedia (10GB – 22GB)
WC     WordCount                            Text Analytics         Wikipedia (30GB – 180GB)
TS     TeraSort                             Business Analytics     TeraGen (30GB – 1TB)
LG     LinkGraph                            Graph Processing       Wikipedia (compressed ~6x)
JO     Join                                 Business Analytics     TPC-H (30GB – 1TB)
TF     Term Freq. - Inverse Document Freq.  Information Retrieval  Wikipedia (30GB – 180GB)
Multi-objective Cluster Provisioning
[Figure: actual vs. predicted running time (min) and cost ($) for each EC2 instance type of the target cluster: m1.small, m1.large, m1.xlarge, c1.medium, c1.xlarge]
Workload: all jobs, 22GB – 180GB per job. Instance type for the source cluster: m1.large.
Moving from Development to Production
Profile a job on the development cluster, optimize it for the production cluster.
- Development cluster: 10 m1.large nodes (40 EC2 units), ~60 GB
- Production cluster: 30 m1.xlarge nodes (240 EC2 units), ~180 GB
[Figure: running time (min) of the MapReduce jobs TS, WC, LG, JO, TF, CO using rules-of-thumb settings vs. settings chosen with the profile from development vs. the profile from production]
Tuning Cluster Size
Profile a job on a 10-node cluster, predict for other cluster sizes.
- Profile cluster: 10 m1.large nodes (40 EC2 units), ~60 GB
- Prediction clusters: 5–30 m1.large nodes, ~60 GB
[Figure: actual vs. predicted running time (min) for LinkGraph and Word Co-occurrence on 5, 10, 20, and 30 nodes]
Training Benchmarks Evaluation
[Figure: training time (min) per instance type (m1.small, m1.large, m1.xlarge, c1.medium, c1.xlarge) for the Apriori, Fixed, and Custom benchmarks; and relative prediction error over the Apriori benchmark for each instance type]
More info: www.cs.duke.edu/starfish
Starfish covers:
- Job-level MapReduce configuration
- Workflow optimization
- Workload management
- Data layout tuning
- Cluster sizing