
Enabling Cost-Effective Resource Leases with Virtual Machines

Borja Sotomayor
University of Chicago
borja@cs.uchicago.edu

Ian Foster
Argonne National Laboratory / University of Chicago
keahey@mcs.anl.gov

Tim Freeman
Argonne National Laboratory / University of Chicago
tfreeman@mcs.anl.gov

Kate Keahey
Argonne National Laboratory / University of Chicago
keahey@mcs.anl.gov

HPDC 2007 Hot Topics Session

Motivation

Leasing resources for short periods of time can be of great value to many applications: workflows, real-time applications, and applications requiring resource co-scheduling.

Leasing semantics:

The glidein approach: Condor glideins, MyCluster, and Falkon

Advance reservations: meta-scheduling, deadlines, demos

Utilization problems

We argue that virtualization can make resource leasing cost-effective, despite the overhead of using VMs, thus:

Providing an incentive for resource providers to allow short-term leasing of resources.

Creating an opportunity for scientific applications (resource consumers) that require multi-level scheduling.

Approach

Separate resource provisioning from execution management.

Resource provisioning is handled by a new component called the Lease Manager.

Execution management can continue to be handled by a site's current scheduler (PBS/Maui, SGE, Condor, ...).

All provisioning is handled via the use of VMs, including provisioning resources for a batch job.

Use the VM suspend/resume mechanisms to backfill and suspend non-interactive/batch applications.

[Figure: Scheduling the lease without using virtualization. Timeline of Node 1, Node 2, and Node 3 showing the short-term lease.]

[Figure: Scheduling the lease using virtualization. Timeline of Node 1, Node 2, and Node 3 showing the short-term lease.]
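The difference between the two schedules can be illustrated with a toy model. The sketch below is not the prototype scheduler from the talk: it assumes a single node, one advance reservation (the lease) occupying a fixed interval, batch jobs run in queue order, and a suspend/resume cost of zero. The function names `backfill_only` and `suspend_resume` are hypothetical.

```python
def backfill_only(jobs, ar_start, ar_end):
    """Without VMs: a job may start before the reservation only if it can
    also finish before it; every other job waits until the lease ends.
    Returns (idle minutes before the reservation, finish time)."""
    t = 0
    deferred = []
    for d in jobs:
        if t + d <= ar_start:
            t += d                  # fits in the gap before the lease
        else:
            deferred.append(d)      # must wait until the lease is over
    idle = ar_start - t
    if deferred:
        t = ar_end + sum(deferred)
    return idle, t


def suspend_resume(jobs, ar_start, ar_end):
    """With VMs: a job that would overlap the reservation is started anyway,
    suspended when the lease begins, and resumed when it ends.
    Assumes suspending and resuming cost no time."""
    t = 0
    for d in jobs:
        if ar_start <= t < ar_end:
            t = ar_end              # node is occupied by the lease
        if t < ar_start < t + d:
            # Run until ar_start, suspend, finish the remainder after ar_end.
            t = ar_end + (t + d - ar_start)
        else:
            t += d
    idle = max(0, ar_start - t)
    return idle, t


if __name__ == "__main__":
    long_jobs = [15, 15, 15]        # minutes
    short_jobs = [5] * 9
    for name, jobs in (("long", long_jobs), ("short", short_jobs)):
        print(name, backfill_only(jobs, 40, 60), suspend_resume(jobs, 40, 60))
```

With three 15-minute jobs and a lease from minute 40 to minute 60, backfilling alone leaves the node idle for 10 minutes and finishes at minute 75, while suspend/resume leaves no gap and finishes at minute 65. With nine 5-minute jobs both approaches finish at minute 65, because short jobs already fill the gap before the lease.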

Experiment Setting

Simulated testbed of 8 nodes connected by a 100 Mbps network, such that at most two VMs can run simultaneously on one node.

We consider the best and worst cases.

Traces

Artificial traces, combining serial batch requests and advance reservations (ARs).

Would require 10h to run on the testbed (assuming perfect utilization).

VM runtime overhead assumed to be 10%.
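As a rough sanity check on these numbers, the arithmetic can be written out. The sketch assumes that the 10% runtime overhead applies uniformly to every request in the trace, which the slides do not state explicitly; the function names are illustrative only.

```python
NODES = 8               # simulated testbed size
VMS_PER_NODE = 2        # at most two VMs run simultaneously on one node
TRACE_HOURS = 10.0      # running time of the trace at perfect utilization
VM_OVERHEAD = 0.10      # assumed VM runtime overhead


def concurrent_vm_slots(nodes=NODES, vms_per_node=VMS_PER_NODE):
    """Maximum number of VMs that can run at the same time on the testbed."""
    return nodes * vms_per_node


def hours_with_overhead(hours=TRACE_HOURS, overhead=VM_OVERHEAD):
    """Lower bound on running time if all work is slowed by the VM overhead
    (assumption: the overhead is uniform across requests)."""
    return hours * (1.0 + overhead)


if __name__ == "__main__":
    print("concurrent VM slots:", concurrent_vm_slots())
    print("hours with overhead: %.1f" % hours_with_overhead())
```

Under that assumption the testbed offers 16 concurrent VM slots, and the 10h trace needs at least about 11h when everything runs inside VMs, so any gain from suspend/resume has to recover roughly one hour to break even with the non-virtualized baseline.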

Experiments

Experiment I

Is using VMs for suspend/resume backfill worth the overhead?

Assumption: we are using only one VM image.

Prototype scheduler supporting batch serial requests and advance reservations, using backfilling or suspend/resume to plan around the ARs.

A Resource Management Model for VM-Based Virtual Workspaces, B. Sotomayor, Master's paper, University of Chicago, February 2007.

Best-case trace

Trace characteristics

Duration of batch requests: avg = 15 min

AR resource consumption: 75% - 100%

Proportion of batch/AR: 75%/25%

Benefits from suspend/resume because the large number of relatively long batch requests limits the efficiency of backfilling.

One Image (best case)

Baseline: not using VMs (no runtime overhead) and backfilling instead of suspend/resume.

Add runtime overhead: running inside a VM adds runtime overhead, but it is not a big hit since images are predeployed.

Use suspend/resume: allows for better resource utilization than backfilling, even better than the baseline (because of long batch requests).

Worst-case trace

Same as previous trace, but with shorter batch requests (avg=5 minutes)

This also entails that there are more batch requests, since the total running time of the trace is still 10h

With a large number of relatively short requests, backfilling is already very effective, and little is gained from suspend/resume.

Furthermore, many more images have to be deployed in this case, which increases the preparation overhead.

One Image (worst case)

Baseline: not using VMs (no runtime overhead) and backfilling instead of suspend/resume.

Add runtime overhead: running inside a VM adds runtime overhead, but it is not a big hit since images are predeployed.

Use suspend/resume: doesn't provide any significant advantage over backfilling because of short batch requests.

Experiment II

How much do we pay for the added flexibility of operating in multiple virtualized environments?

Assumption: we are using multiple images.

The scheduler also has application-specific knowledge (i.e., it knows it is scheduling VMs), so it is also able to schedule timely VM image transfers.

Image reuse strategies: realistically not all images will be different

Modification of Experiment I

Use 37 possible 600MB VM images. 7 images account for 70% of requests.
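To get a feel for what these parameters imply, the sketch below estimates the time to transfer one image and how often a requested image could be reused. It rests on several assumptions that are not in the slides: 1 MB is taken as 10^6 bytes, the full 100 Mbps link is available to a single transfer, requests are spread uniformly within the popular and unpopular groups, and reuse is modelled as a small least-recently-used cache on a single node. The actual image reuse strategy used in the experiments is not described here.

```python
import random

IMAGE_SIZE_MB = 600     # size of each VM image
LINK_MBPS = 100         # testbed network bandwidth
NUM_IMAGES = 37         # distinct images
POPULAR_IMAGES = 7      # images that account for most requests
POPULAR_SHARE = 0.70    # fraction of requests going to the popular images


def transfer_seconds(size_mb=IMAGE_SIZE_MB, link_mbps=LINK_MBPS):
    """Ideal transfer time: megabytes -> megabits, divided by the link rate.
    Ignores protocol overhead and contention between transfers."""
    return size_mb * 8 / link_mbps


def sample_image(rng):
    """Pick an image id: 70% of requests hit one of the 7 popular images
    (assumption: uniform within each group)."""
    if rng.random() < POPULAR_SHARE:
        return rng.randrange(POPULAR_IMAGES)
    return rng.randrange(POPULAR_IMAGES, NUM_IMAGES)


def reuse_rate(num_requests, cache_slots, seed=0):
    """Fraction of requests whose image is already cached on the node.
    Hypothetical LRU cache holding cache_slots images (cache_slots >= 1)."""
    rng = random.Random(seed)
    cache = []                      # least recently used image first
    hits = 0
    for _ in range(num_requests):
        img = sample_image(rng)
        if img in cache:
            hits += 1
            cache.remove(img)       # refreshed below as most recent
        elif len(cache) >= cache_slots:
            cache.pop(0)            # evict the least recently used image
        cache.append(img)
    return hits / num_requests


if __name__ == "__main__":
    print("seconds per image transfer:", transfer_seconds())
    for slots in (1, 7, 37):
        print(slots, "cached images ->", round(reuse_rate(10000, slots), 3))
```

Under these assumptions a single image takes 48 seconds to transfer, which is small next to a 15-minute batch request but a noticeable fraction of a 5-minute one. That is consistent with deployment overhead mattering more for the trace with short batch requests.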

Multiple Images (best case)

Baseline: not using VMs (no runtime overhead) and backfilling instead of suspend/resume.

Transferring images: adds deployment overhead, which delays the starting time of batch requests.

Adding runtime overhead: makes running time even larger.

Use suspend/resume: better resource utilization compensates for deployment overhead.

Image reuse: improves performance slightly.

Multiple Images (worst case)

Baseline: not using VMs (no runtime overhead) and backfilling instead of suspend/resume.

Transferring images: adds deployment overhead, which delays the starting time of batch requests.

Adding runtime overhead: relatively small performance hit (the least of our concerns here).

Use suspend/resume: doesn't improve significantly over backfilling, which already does a good job thanks to the presence of small batch requests.

Image reuse: compensates for deployment overhead. Still not as good as the baseline, but the difference is relatively small.

Conclusions

Using virtualization can make short-term leasing with interesting semantics cost-effective, even in the presence of runtime overhead.

Given reasonable strategies for managing deployment overhead, the cost of using multiple images is acceptable.

However, only artificial stress traces have been used so far. Preliminary results with real traces suggest that short-term leases can be integrated into real workloads and still be cost-effective (we will release these results as soon as they're solid).

Ongoing Work

Develop a better scheduler: handle parallel batch submissions.

Integrate this virtualized resource manager with existing LRM. This work is our top-down effort; we also have a bottom-up effort.

Better modeling of traces: based on real-world batch submissions, non-uniform overhead.

Understanding VM overhead in practice. Virtualization in Practice:

http://press.mcs.anl.gov/virtualization/

Questions?
