Post on 27-Mar-2015
Enabling Cost-Effective Resource Leases with Virtual Machines
Borja Sotomayor, University of Chicago
borja@cs.uchicago.edu
Ian Foster, Argonne National Laboratory / University of Chicago
keahey@mcs.anl.gov
Tim Freeman, Argonne National Laboratory / University of Chicago
tfreeman@mcs.anl.gov
Kate Keahey, Argonne National Laboratory / University of Chicago
keahey@mcs.anl.gov
HPDC 2007 Hot Topics Session
Motivation
- Leasing resources for short periods of time can be of great value to many applications, such as workflows, real-time applications, and applications requiring resource co-scheduling.
- Leasing semantics:
  - The glidein approach: Condor glideins, MyCluster, and Falkon.
  - Advance reservations (ARs): used for meta-scheduling, deadlines, and demos, but they cause utilization problems.
We argue that virtualization can make resource leasing cost-effective, despite the overhead of using VMs, thus:
- Providing an incentive for resource providers to allow short-term leasing of resources.
- Creating an opportunity for scientific applications (resource consumers) that require multi-level scheduling.
Approach
- Separate resource provisioning from execution management:
  - Resource provisioning is handled by a new component called the Lease Manager.
  - Execution management can continue to be handled by a site's current scheduler (PBS/Maui, SGE, Condor, ...).
- All provisioning is handled through VMs, including provisioning resources for a batch job.
- Use the suspend/resume mechanisms of VMs to backfill non-interactive/batch applications and to suspend them around leases.
Figure: Scheduling a short-term lease on Nodes 1-3 without using virtualization.
Figure: Scheduling the same short-term lease on Nodes 1-3 using virtualization.
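The contrast between the two schedules can be sketched for a single VM slot. This is a minimal model under assumptions that are not in the slides: one batch job of `duration` minutes competes with one AR that occupies the whole slot during `[start, end)`, the VM runtime overhead is a uniform 10%, and the suspend/resume cost is a free parameter (zero by default). The names `AdvanceReservation`, `finish_without_vms`, and `finish_with_vms` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdvanceReservation:
    """Hypothetical AR that needs the whole slot during [start, end), in minutes."""
    start: float
    end: float


def finish_without_vms(duration: float, ar: AdvanceReservation, now: float = 0.0) -> float:
    """Non-preemptible job: it may be backfilled before the AR only if it fits."""
    if now >= ar.end or now + duration <= ar.start:
        return now + duration               # no conflict with the AR
    return ar.end + duration                # the gap stays idle; start after the AR


def finish_with_vms(duration: float, ar: AdvanceReservation, now: float = 0.0,
                    overhead: float = 0.10, suspend_resume_cost: float = 0.0) -> float:
    """Job runs in a VM: fill the gap, suspend at the AR start, resume at its end."""
    work = duration * (1.0 + overhead)      # assumed uniform VM runtime overhead
    if now >= ar.end or now + work <= ar.start:
        return now + work                   # no conflict with the AR
    if now >= ar.start:
        return ar.end + work                # arrived during the AR; wait for it
    done_before = ar.start - now            # work completed in the gap
    return ar.end + suspend_resume_cost + (work - done_before)


if __name__ == "__main__":
    ar = AdvanceReservation(start=30, end=90)
    print(finish_without_vms(60, ar))         # 150
    print(round(finish_with_vms(60, ar), 6))  # 126.0
```

In this toy case the job finishes 24 minutes earlier with VMs despite the overhead, because the 30-minute gap before the AR is used instead of being left idle.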
Experiment Setting
- Simulated testbed of 8 nodes connected by a 100 Mbps network, with at most two VMs running simultaneously on each node.
- We consider best-case and worst-case traces.
- Traces:
  - Artificial traces, combining serial batch requests and ARs.
  - Each trace would require 10 h to run on the testbed (assuming perfect utilization).
- VM runtime overhead is assumed to be 10%.
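A quick back-of-the-envelope check of this setting, as a sketch: it assumes that each of the two VMs per node counts as one unit of capacity and that the 10% overhead stretches every job's running time uniformly, so even a perfectly packed schedule inside VMs cannot finish in less than 11 h. The function name `capacity_summary` is hypothetical.

```python
def capacity_summary(nodes: int = 8, vms_per_node: int = 2,
                     trace_hours: float = 10.0, overhead: float = 0.10):
    """Return (VM slots, trace work in slot-hours, lower bound on makespan in VMs)."""
    slots = nodes * vms_per_node                     # concurrent VM slots on the testbed
    work_slot_hours = slots * trace_hours            # work that fills 10 h at 100% use
    min_makespan_h = trace_hours * (1.0 + overhead)  # same work, stretched by VM overhead
    return slots, work_slot_hours, min_makespan_h


slots, work, makespan = capacity_summary()
print(slots, work, round(makespan, 2))  # 16 160.0 11.0
```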
Experiments
Experiment I
Is using VMs for suspend/resume backfilling worth the overhead?
- Assumption: we are using only one VM image.
- Prototype scheduler supporting serial batch requests and advance reservations, using either backfilling or suspend/resume to plan around the ARs.
A Resource Management Model for VM-Based Virtual Workspaces, B. Sotomayor, Master's paper, University of Chicago, February 2007.
Best-case trace
- Trace characteristics:
  - Duration of batch requests: average 15 min.
  - AR resource consumption: 75%-100%.
  - Proportion of batch/AR: 75%/25%.
- This trace benefits from suspend/resume because the large number of relatively long batch requests limits the efficiency of backfilling.
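A trace with these characteristics can be approximated by a small generator. This is a hypothetical sketch, not the generator used in the experiments: it assumes batch durations are exponentially distributed with the stated 15-minute mean, that the 75%/25% proportion refers to request counts, and that an AR's 75%-100% resource consumption maps to 12-16 of the 16 VM slots. The AR duration `ar_minutes` is a placeholder, because the slides do not give one.

```python
import random
from dataclasses import dataclass

TOTAL_SLOTS = 16  # 8 nodes x 2 VMs per node


@dataclass(frozen=True)
class Request:
    kind: str            # "batch" (serial) or "ar" (advance reservation)
    duration_min: float  # requested running time in minutes
    slots: int           # number of VM slots the request occupies


def best_case_trace(n_requests: int, seed: int = 0, batch_share: float = 0.75,
                    mean_batch_min: float = 15.0, ar_minutes: float = 60.0) -> list[Request]:
    rng = random.Random(seed)  # fixed seed for reproducible traces
    trace = []
    for _ in range(n_requests):
        if rng.random() < batch_share:
            # Serial batch request: one slot, exponential duration (assumed).
            duration = rng.expovariate(1.0 / mean_batch_min)
            trace.append(Request("batch", duration, 1))
        else:
            # AR consuming 75%-100% of the slots; duration is a placeholder.
            slots = rng.randint(int(0.75 * TOTAL_SLOTS), TOTAL_SLOTS)
            trace.append(Request("ar", ar_minutes, slots))
    return trace
```

For the worst-case trace, one would lower `mean_batch_min` to 5.0 and raise `n_requests` so that the total work still fills the 10-hour trace.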
One Image (best case)
- Baseline: not using VMs (no runtime overhead) and backfilling instead of suspend/resume.
- Add runtime overhead: running inside a VM adds runtime overhead, but it is not a big hit, since images are predeployed.
- Use suspend/resume: allows better resource utilization than backfilling, even better than the baseline (because of the long batch requests).
Worst-case trace
- Same as the previous trace, but with shorter batch requests (average 5 min).
- This also means there are more batch requests, since the total running time of the trace is still 10 h.
- With a large number of relatively short requests, backfilling is already very effective, and little is gained from suspend/resume.
- Furthermore, many more images have to be deployed in this case, which increases the preparation overhead.
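Why shorter batch requests make backfilling more effective can be seen with a deliberately simplified model (an assumption, not the paper's scheduler): every queued batch job has the same length, there are always enough of them, and a gap before an AR is filled only with whole, non-preemptible jobs. The leftover is `gap % job_length` idle minutes, whereas suspend/resume can use the whole gap (ignoring suspend/resume cost). Averaging over gaps of 0 to 59 minutes, assumed equally likely, gives the expected waste per gap. The function names are hypothetical.

```python
def idle_with_backfill(gap_min: int, job_min: int) -> int:
    """Idle minutes left in a gap when only whole, non-preemptible jobs fit."""
    return gap_min % job_min


def mean_idle_with_backfill(job_min: int, max_gap_min: int = 60) -> float:
    """Average idle minutes over gaps of 0..max_gap_min-1 minutes (uniform, assumed)."""
    gaps = range(max_gap_min)
    return sum(idle_with_backfill(g, job_min) for g in gaps) / len(gaps)


for job_min in (15, 5):  # best-case vs worst-case average batch length
    print(job_min, mean_idle_with_backfill(job_min))
# 15 7.0
# 5 2.0
```

Suspend/resume would leave 0 idle minutes in both cases, so in this model its advantage shrinks from about 7 to about 2 idle minutes per gap as batch requests get shorter.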
One Image (worst case)
- Baseline: not using VMs (no runtime overhead) and backfilling instead of suspend/resume.
- Add runtime overhead: running inside a VM adds runtime overhead, but it is not a big hit, since images are predeployed.
- Use suspend/resume: does not provide any significant advantage over backfilling, because of the short batch requests.
Experiment II
How much do we pay for the added flexibility of operating in multiple virtualized environments?
- Assumption: we are using multiple images.
  - The scheduler also has application-specific knowledge (i.e., it knows it is scheduling VMs), so it is also able to schedule timely VM image transfers.
  - Image reuse strategies: realistically, not all images will be different.
- Modification of Experiment I: use 37 possible 600 MB VM images, of which 7 images account for 70% of requests.
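The image workload of Experiment II can be sketched as follows. Assumptions not stated in the slides: the 7 popular images are equally likely among themselves, as are the remaining 30; each request lands on a random node; a node keeps every image it has received (disk space is not modeled); and a 600 MB image (taken as 600 x 10^6 bytes) crosses the 100 Mbps link in an ideal 48 s. The per-node cache is only one possible reuse strategy, and all names are hypothetical.

```python
import random

N_IMAGES = 37
N_POPULAR = 7
POPULAR_SHARE = 0.70   # 7 images account for 70% of requests


def transfer_seconds(size_mb: float = 600, link_mbps: float = 100) -> float:
    """Ideal transfer time: megabytes -> megabits, divided by link bandwidth."""
    return size_mb * 8 / link_mbps


def image_weights() -> list[float]:
    """Request probability of each image (uniform within each group, assumed)."""
    popular = [POPULAR_SHARE / N_POPULAR] * N_POPULAR
    others = N_IMAGES - N_POPULAR
    return popular + [(1 - POPULAR_SHARE) / others] * others


def count_transfers(n_requests: int, n_nodes: int = 8, seed: int = 0) -> tuple[int, int]:
    """Return (transfers without reuse, transfers with per-node image reuse)."""
    rng = random.Random(seed)
    images = rng.choices(range(N_IMAGES), weights=image_weights(), k=n_requests)
    cached = [set() for _ in range(n_nodes)]  # images already present on each node
    with_reuse = 0
    for image in images:
        node = rng.randrange(n_nodes)         # hypothetical: random node placement
        if image not in cached[node]:
            with_reuse += 1                   # image must be transferred to this node
            cached[node].add(image)
    return n_requests, with_reuse             # without reuse, every request transfers


without, with_reuse = count_transfers(1000)
print(transfer_seconds())                     # 48.0
print(without * transfer_seconds(), with_reuse * transfer_seconds())
```

With this reuse strategy no node ever fetches the same image twice, so at most 37 x 8 = 296 transfers occur regardless of how many requests arrive.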
Multiple Images (best case)
- Baseline: not using VMs (no runtime overhead) and backfilling instead of suspend/resume.
- Transferring images: adds deployment overhead, which delays the starting time of batch requests.
- Adding runtime overhead: makes the running time even longer.
- Use suspend/resume: better resource utilization compensates for the deployment overhead.
- Image reuse: improves performance slightly.
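The deployment overhead from transferring images can be sketched by assuming, hypothetically, that all image transfers share one 100 Mbps link and are served first-come, first-served, each taking the ideal 48 s for a 600 MB image. A batch request cannot start before its image arrives, so a burst of requests queues up behind earlier transfers. For an AR, whose start time is known in advance, a scheduler that knows it is scheduling VMs can instead start the transfer just in time. The function names are hypothetical.

```python
def transfer_completion_times(submit_times: list[float],
                              transfer_s: float = 48.0) -> list[float]:
    """Completion time of each transfer on a single FIFO link (input sorted by time)."""
    if submit_times != sorted(submit_times):
        raise ValueError("submit_times must be sorted")
    link_free = 0.0
    done = []
    for t in submit_times:
        start = max(t, link_free)      # wait for earlier transfers to finish
        link_free = start + transfer_s
        done.append(link_free)
    return done


def latest_transfer_start(ar_start_s: float, transfer_s: float = 48.0) -> float:
    """Latest moment to start an AR's image transfer so it arrives on time."""
    return ar_start_s - transfer_s


submits = [0.0, 0.0, 0.0, 0.0]
finished = transfer_completion_times(submits)
print([f - s for f, s in zip(finished, submits)])  # [48.0, 96.0, 144.0, 192.0]
print(latest_transfer_start(3600.0))               # 3552.0
```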
Multiple Images (worst case)
- Baseline: not using VMs (no runtime overhead) and backfilling instead of suspend/resume.
- Transferring images: adds deployment overhead, which delays the starting time of batch requests.
- Adding runtime overhead: a relatively small performance hit (the least of our concerns here).
- Use suspend/resume: does not improve significantly over backfilling, which already does a good job thanks to the presence of short batch requests.
- Image reuse: compensates for the deployment overhead. Still not as good as the baseline, but the difference is relatively small.
Conclusions
- Using virtualization can make short-term leasing with interesting semantics cost-effective, even in the presence of runtime overhead.
- Given reasonable strategies for managing deployment overhead, the cost of using multiple images is acceptable.
- However, only artificial stress traces have been used so far. Preliminary results with real traces suggest that short-term leases can be integrated into real workloads and still be cost-effective (we will release these results as soon as they are solid).
Ongoing Work
- Develop a better scheduler:
  - Handle parallel batch submissions.
- Integrate this virtualized resource manager with existing LRMs (local resource managers):
  - This work is our top-down effort; we also have a bottom-up effort.
- Better modeling of traces:
  - Based on real-world batch submissions.
  - Non-uniform overhead.
- Understanding VM overhead in practice:
  - Virtualization in Practice: http://press.mcs.anl.gov/virtualization/
Questions?