University of Minnesota
Optimizing MapReduce Provisioning in the Cloud
Michael Cardosa, Aameek Singh†, Himabindu Pucha†, Abhishek Chandra
http://www.cs.umn.edu/~cardosa
Department of Computer Science, University of Minnesota
†IBM Almaden Research Center

MapReduce Provisioning Problem
Platform: Virtualized Cloud Environment, which enables Virtualized MapReduce Clusters
Several MapReduce Jobs from different users
Goal: Optimize system-wide metrics, such as throughput, energy, load distribution, user costs
Problem: At the Cloud Service Provider level, how can we harvest opportunities to increase performance, save energy, or reduce user costs?

MapReduce Platform: Hadoop
Open-source implementation of the MapReduce distributed computing framework
Used widely: Yahoo, Facebook, NYT, (Google)
[Figure: diagram labeled Input Data]

Hadoop Clusters
Distributed data: replicated chunks
Distributed computation: map/reduce tasks
Traditional: dedicated physical nodes

Virtual Hadoop Clusters
Run Hadoop on top of VMs
E.g.: Amazon Elastic MapReduce = Hadoop + Amazon EC2
[Figure: layers labeled Server Pool, VM Pool, Hadoop Processes]

Roadmap
Intro & Problem
Platform Overview
Spatio-Temporal Insights for Provisioning
Building Blocks for MapReduce Provisioning
Case Study: Performance optimization
Case Study: Energy optimization

Spatio-Temporal Insights for Provisioning
Initial Focus: Energy Savings
Goal: Minimize energy usage
Energy + cooling ~ 42% of total cost [Hamilton08]
Problem: How to place the VMs on available physical servers to minimize energy usage?
Minimize Cumulative Machine Uptime (CMU)
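
The CMU objective can be made concrete with a small sketch. The slides do not give a formula, so this assumes that all VMs start at time 0 and that a server can be shut down as soon as its longest-running VM finishes; under that assumption, CMU is the sum over servers of the longest VM runtime on each server. The function name, server names, and runtimes are hypothetical.

```python
from typing import Dict, List


def cumulative_machine_uptime(placement: Dict[str, List[float]]) -> float:
    """Total minutes that servers must stay powered on.

    Assumptions (not from the slides): all VMs start at time 0, and a
    server is shut down as soon as its longest-running VM finishes.
    """
    return sum(max(runtimes) for runtimes in placement.values() if runtimes)


# Hypothetical VM runtimes in minutes, chosen only for illustration.
mixed = {"server1": [20, 100], "server2": [20, 100]}
grouped = {"server1": [20, 20], "server2": [100, 100]}

print(cumulative_machine_uptime(mixed))    # 200: both servers run for 100 min
print(cumulative_machine_uptime(grouped))  # 120: one server shuts down after 20 min
```

Under these assumptions, the same four VMs cost less machine uptime when VMs with similar runtimes share a server.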

VM Placement: Spatial Fit
[Figure: Job 1, Job 2, Job 3, Job 4]
Co-place complementary workloads
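
One way to read "complementary" is that two workloads stress different resources, so their combined demand still fits on one server. The sketch below is an illustration under that assumption only: the two resource dimensions (CPU and I/O), the normalized demands, and the greedy pairing rule are all hypothetical and are not taken from the slides.

```python
from typing import List, Tuple

# (name, cpu_demand, io_demand); demands are hypothetical fractions of one server.
Job = Tuple[str, float, float]


def fits(a: Job, b: Job, capacity: float = 1.0) -> bool:
    """True if both jobs together stay within capacity on every resource."""
    return a[1] + b[1] <= capacity and a[2] + b[2] <= capacity


def co_place(jobs: List[Job]) -> List[List[str]]:
    """Greedy pairing: match the most CPU-hungry job with the least
    CPU-hungry remaining job that still fits on the same server."""
    remaining = sorted(jobs, key=lambda j: j[1], reverse=True)
    servers: List[List[str]] = []
    while remaining:
        first = remaining.pop(0)
        partner = next((c for c in reversed(remaining) if fits(first, c)), None)
        if partner is None:
            servers.append([first[0]])
        else:
            remaining.remove(partner)
            servers.append([first[0], partner[0]])
    return servers


jobs = [
    ("job1", 0.8, 0.1),  # CPU-heavy
    ("job2", 0.2, 0.7),  # I/O-heavy
    ("job3", 0.7, 0.2),  # CPU-heavy
    ("job4", 0.1, 0.8),  # I/O-heavy
]
print(co_place(jobs))  # [['job1', 'job4'], ['job3', 'job2']]
```

Each CPU-heavy job ends up sharing a server with an I/O-heavy one, so four jobs fit on two servers instead of four.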

Which placement is better?
[Figure: candidate placements A and B; VM runtime labels of 10, 20, and 100 min; SHUTDOWN markers]

Time Balancing
[Figure: time balancing illustration; runtime labels include 20, 25, 90, and 30, with a "Time Balance" label]

Building Blocks for Provisioning
[Figure: diagram with components labeled MapReduce Jobs, Objective-driven resource provisioning, Job profiling, Cluster scaling, Migration, and Cloud Execution Environment; phases labeled Initial Provisioning and Continuous Optimization]

Building Blocks for Provisioning
Job Profiling: MapReduce job runtime estimation, based on the number of VMs allocated to the job and on input data size; offline and online profiling
Cluster Scaling: Changing the number of VMs allocated to a particular MapReduce job; affects the runtime of the job; relies on the Job Profiling model
Migration: Useful for continuous optimization; load balancing, VM consolidation

Job Profiling: Runtime Estimation Based on Number of VMs
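
The plot for this slide did not survive in the transcript, so the following is only a sketch of how runtime could be estimated from the number of VMs. It assumes a model of the form T(n) = a + b / n (a fixed overhead plus perfectly divisible work), which is an assumption and not the authors' stated model. The profiling samples are made up.

```python
from typing import List, Tuple


def fit_runtime_vs_vms(samples: List[Tuple[int, float]]) -> Tuple[float, float]:
    """Least-squares fit of T(n) = a + b / n from (num_vms, runtime) samples.

    The model form is an assumption made for illustration.
    """
    xs = [1.0 / n for n, _ in samples]
    ys = [t for _, t in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_runtime(a: float, b: float, num_vms: int) -> float:
    """Predicted runtime in minutes for a given VM count."""
    return a + b / num_vms


# Hypothetical offline profiling runs: (number of VMs, runtime in minutes).
samples = [(1, 125.0), (2, 65.0), (4, 35.0), (8, 20.0)]
a, b = fit_runtime_vs_vms(samples)
print(round(predict_runtime(a, b, 16), 2))  # 12.5
```

The fitted constant term acts as a floor on runtime, which is one way to capture the diminishing returns of adding VMs.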

Job Profiling: Runtime Estimation Based on Input Data Size
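
The plot for this slide is also missing. As a rough sketch, assume runtime grows in proportion to input size for a fixed cluster size, ignoring any fixed startup cost; that proportionality is an assumption, and the sample measurements are hypothetical. A least-squares fit through the origin then gives a single minutes-per-GB rate.

```python
from typing import List, Tuple


def fit_minutes_per_gb(samples: List[Tuple[float, float]]) -> float:
    """Least-squares slope through the origin for (input_gb, minutes) samples.

    Assumes runtime is proportional to input size (an illustrative assumption).
    """
    numerator = sum(size * minutes for size, minutes in samples)
    denominator = sum(size * size for size, _ in samples)
    return numerator / denominator


def estimate_runtime(rate: float, input_gb: float) -> float:
    """Extrapolate runtime in minutes to a new input size."""
    return rate * input_gb


# Hypothetical profiling runs on small inputs: (input size in GB, runtime in minutes).
samples = [(1.0, 12.0), (2.0, 24.0), (4.0, 48.0)]
rate = fit_minutes_per_gb(samples)
print(estimate_runtime(rate, 10.0))  # 120.0
```

Profiling on small inputs and extrapolating keeps offline profiling cheap, at the cost of missing effects that only appear at scale.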

Job Profiling: Runtime Estimation
Online Profiling: Additional refinement
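
The slide does not say how the online refinement works. One possible sketch, under stated assumptions, blends the offline estimate with a projection from observed progress. The function name, the blending weight, and the idea of using a reported completion fraction are all hypothetical.

```python
def refine_estimate(
    offline_estimate: float,
    elapsed: float,
    fraction_done: float,
    weight: float = 0.5,
) -> float:
    """Blend an offline runtime estimate with a progress-based projection.

    `fraction_done` is the share of the job completed so far (0 to 1);
    `weight` is how much to trust the online projection. Both the blend
    and the default weight are illustrative assumptions.
    """
    if not 0.0 < fraction_done <= 1.0:
        return offline_estimate  # no usable progress signal yet
    projected_total = elapsed / fraction_done
    return weight * projected_total + (1.0 - weight) * offline_estimate


# Offline profiling predicted 60 min, but after 20 min only 25% is done.
print(refine_estimate(60.0, 20.0, 0.25))  # 70.0
```

Here the observed pace projects to 80 minutes, so the refined estimate moves from 60 toward 80.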

Cluster Scaling
Increasing allocated resources (typical): Add VMs to the virtualized Hadoop cluster
Job performance increases, runtime decreases
E.g., for Time Balancing: Energy reasons
E.g., for Load Balancing and Deadlines: Performance

Cluster Scaling: Time Balancing
[Figure: time balancing illustration; runtime labels include 20, 25, 90, and 30, with a "Time Balance" label]
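
Time balancing through cluster scaling can be sketched as picking a VM count that brings a long job's runtime down to a target shared with its neighbors. The sketch assumes ideal scaling, where runtime is inversely proportional to the number of VMs; that is an assumption, and a real profile would likely scale less well. Reading the figure labels as a 90-minute job being brought down to 30 minutes is also an interpretation, not something the transcript states.

```python
import math


def vms_for_target_runtime(
    current_vms: int, current_runtime: float, target_runtime: float
) -> int:
    """VM count needed to reach `target_runtime`, assuming runtime is
    inversely proportional to the number of VMs (ideal scaling)."""
    if target_runtime <= 0:
        raise ValueError("target_runtime must be positive")
    return max(1, math.ceil(current_vms * current_runtime / target_runtime))


# A job that takes 90 min on 1 VM, balanced against a 30 min target.
print(vms_for_target_runtime(1, 90.0, 30.0))  # 3
# A tighter 25 min target needs one more VM under the same assumption.
print(vms_for_target_runtime(1, 90.0, 25.0))  # 4
```

The same function also covers scaling down: a job that finishes well before its neighbors could give up VMs without extending server uptime.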

Roadmap
Intro & Problem
Platform Overview
Spatio-Temporal Insights for Provisioning
Building Blocks for MapReduce Provisioning
Case Study: Performance optimization
Case Study: Energy optimization

Case Study: Performance & Deadlines
Goal: Meet deadlines for MapReduce jobs
Determine initial allocation accurately
Dynamically adjust allocation to meet deadline if necessary
Monitoring: Use offline profiling to estimate number of VMs needed based on past performance
Actuation: Online profiling: Trigger points to invoke cluster scaling
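
The two steps on this slide, an initial allocation from offline profiling and a trigger for cluster scaling, can be sketched as follows. The runtime model T(n) = a + b / n, its coefficients, and the trigger rule (scale when the projected finish time exceeds the deadline) are assumptions for illustration; the slides do not define the trigger points.

```python
from typing import Optional


def min_vms_for_deadline(
    a: float, b: float, deadline: float, max_vms: int
) -> Optional[int]:
    """Smallest VM count whose predicted runtime a + b / n meets the deadline.

    Returns None if even `max_vms` VMs are predicted to miss it.
    """
    for n in range(1, max_vms + 1):
        if a + b / n <= deadline:
            return n
    return None


def needs_scaling(elapsed: float, fraction_done: float, deadline: float) -> bool:
    """Hypothetical trigger point: the job is projected to miss its deadline
    if it continues at the pace observed so far."""
    if fraction_done <= 0.0:
        return False  # no progress signal yet
    return elapsed / fraction_done > deadline


# Hypothetical profile: 5 min fixed overhead plus 120 VM-minutes of work.
print(min_vms_for_deadline(5.0, 120.0, deadline=40.0, max_vms=16))  # 4
# Halfway done after 30 min, against a 50 min deadline.
print(needs_scaling(elapsed=30.0, fraction_done=0.5, deadline=50.0))  # True
```

When the trigger fires, the first function can be rerun on the remaining work to size the scaled cluster.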

Case Study: Energy Savings
Goal: Minimize energy consumption from the execution of a large batch of MapReduce jobs
Energy + cooling ~ 42% of total cost [Hamilton08]
Pass energy savings on to users
Problem: How to place the VMs on available physical servers to minimize energy usage?
Minimize Cumulative Machine Uptime (CMU)

Case Study: Energy Savings
Use Job Profiling to place similar-runtime VMs together for initial provisioning
Use Job Profiling to adjust number of VMs in each cluster to adjust runtimes if needed
Monitoring: Online profiling to determine when energy could be saved by using migration or cluster scaling
Actuation: Use Cluster Scaling or Migration to dynamically adjust for inaccuracies/unknowns in initial provisioning
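
Placing similar-runtime VMs together can be sketched as a simple heuristic: sort VMs by estimated runtime and fill servers in that order. This is one possible reading of the initial provisioning step, not the authors' algorithm. It assumes a fixed number of VM slots per server, that all VMs start together, and that a server shuts down when its last VM finishes. The runtimes are hypothetical.

```python
from typing import List


def place_by_runtime(
    runtimes: List[float], slots_per_server: int
) -> List[List[float]]:
    """Group VMs with similar estimated runtimes onto the same server by
    sorting (longest first) and filling servers slot by slot."""
    ordered = sorted(runtimes, reverse=True)
    return [
        ordered[i:i + slots_per_server]
        for i in range(0, len(ordered), slots_per_server)
    ]


def total_uptime(placement: List[List[float]]) -> float:
    """Each server stays on until its longest-running VM finishes."""
    return sum(max(server) for server in placement)


# Hypothetical runtime estimates in minutes, as job profiling might produce.
estimated = [20, 100, 20, 100, 25, 90]
arrival_order = [estimated[i:i + 2] for i in range(0, len(estimated), 2)]
grouped = place_by_runtime(estimated, slots_per_server=2)

print(total_uptime(arrival_order))  # 290
print(total_uptime(grouped))        # 210
```

Errors in the runtime estimates would break the grouping, which is where the monitoring and actuation steps on this slide come in.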

Conclusion
Framework: Building blocks (STEAMEngine) for the optimization of MapReduce provisioning from a cloud service provider perspective
Preliminary evaluations to validate usefulness of each building block
Approaches for applying building blocks to meet specific goals, e.g., performance, energy

Thank you! Questions?