Cloud Auto-Scaling with Deadline and Budget Constraints Ming Mao, Jie Li, Marty Humphrey eScience Group CS Department, University of Virginia Grid 2010 – Oct 27, 2010


Cloud Auto-Scaling with Deadline and Budget Constraints

Ming Mao, Jie Li, Marty Humphrey

eScience Group

CS Department, University of Virginia

Grid 2010 – Oct 27, 2010

Cloud Computing

A fast-growing computing platform

IDC: cloud spending grows 27.4% a year to $56 billion (compared with 5% a year for traditional IT)

$16.5 billion (2009) -> $55.5 billion (2014)

src: Worldwide and Regional Public IT Cloud Service 2010-2014 Forecast

Two most quoted benefits

Scalable computing and storage

Reduced cost

Concerns

Security, availability, cost management, integration, interoperability, etc.

Cost

Q1. Cost – the most important factor in practice?

Q2. Moving into Cloud == Reduced Cost ?

Figure: Rate the benefits commonly ascribed to the cloud on-demand model

Seems like the way of the future: 54.00%
Sharing systems with partners simpler: 63.90%
Always offers latest functionality: 64.60%
Requires less in-house IT staff, costs: 67.00%
Encourages standard systems: 68.50%
Monthly payments: 75.30%
Easy/fast to deploy to end-users: 77.70%
Pay only for what you use: 77.90%

Source: IDC Enterprise Panel, 3Q09, n = 263, Sep 2009

Figure: How important is it that Cloud service providers...

Have local presence, can come to my offices: 72.90%
Are a technology and business model innovator: 78.30%
Offer both on-premise and public cloud services: 79.20%
Support many of my IT needs: 81.00%
Allow managing on-premise & cloud together: 82.10%
Understand my business and industry: 84.50%
Provide a complete solution: 86.00%
Option to move cloud offerings back on premise: 87.80%
Offer Service Level Agreements: 88.60%
Offer competitive pricing: 91.60%

Source: IDC Enterprise Panel, 3Q09, n = 263, Sep 2009

Current Auto-Scaling Mechanisms

Triggers based on resource utilization information (e.g. AWS Auto Scaling, RightScale, enStratus, Scalr, etc.)

Where does the gap exist?

Multiple instance types

Current billing models: full-hour billing

Non-negligible instance acquisition time: 7-15 min in Windows Azure

More specific performance goals

Budget awareness (e.g. dollars/month, dollars/job)
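The full-hour billing point can be made concrete with a minimal sketch, assuming every started hour is charged in full. The function name and the price of 12 cents per hour are illustrative assumptions, not part of the talk's implementation.

```python
import math


def billed_cost_cents(runtime_seconds, price_cents_per_hour):
    """Cost of one instance when every started hour is billed in full."""
    if runtime_seconds <= 0:
        return 0
    billed_hours = math.ceil(runtime_seconds / 3600)
    return billed_hours * price_cents_per_hour


# An instance used for 61 minutes pays for two hours (assumed price: 12 cents/hour).
cost_61_min = billed_cost_cents(61 * 60, 12)
# An instance used for 10 minutes still pays for one full hour.
cost_10_min = billed_cost_cents(10 * 60, 12)
```

Under this assumption, releasing an instance shortly after a new billing hour has started saves nothing, which is why the release time matters for cost.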

Problem Statement

Problem statement: how to enable cloud applications to finish all submitted jobs before a user-specified deadline, with as little money as possible, using auto-scaling.

[Figure labels: Users, Job, Cloud Application, Cloud Server, Deadline (job finish time), Cost]

Cloud Application Performance Model

The workload consists of independent jobs submitted to the job queue

Jobs are served in FCFS order and distributed fairly among instances

Different classes of jobs

Same performance goal (e.g. 1 hour deadline)

VM instances take time to start up

Problem Formalization (1)

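Key variables used in the model

J_j: job class j

n_j: number of jobs of class J_j in the queue

I_i: instance i; type(I_i) is its VM type

t_{j,type(I_i)}: average processing time of a class-j job on the VM type of I_i

d_{type(I_i)}: average startup delay of the VM type of I_i

s_i: time instance I_i has already spent starting up

c_{type(I_i)}: cost per hour of the VM type of I_i

D: deadline; C: budget; W: workload; P_i: computing power of instance I_i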

Problem Formalization (2)

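Workload (one entry per job class j)

\[ W = (J_j,\; n_j) \]

Computing power of instance I_i

Running instance

\[ P_i = \left( J_j,\; \frac{D \times n_j}{\sum_k t_{k,\,type(I_i)} \times n_k} \right) \]

Pending instance

\[ P_i = \left( J_j,\; \frac{\left( D - (d_{type(I_i)} - s_i) \right) \times n_j}{\sum_k t_{k,\,type(I_i)} \times n_k} \right) \]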
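The computing power of an instance can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function and parameter names are hypothetical, times are in seconds, and the pending case treats the remaining startup time as the startup delay minus the time already spent starting.

```python
def computing_power(deadline, job_counts, proc_times,
                    startup_delay=0.0, time_pending=0.0):
    """Jobs of each class one instance can finish before the deadline.

    deadline      -- D, seconds available
    job_counts    -- n_j, queued jobs per class
    proc_times    -- average seconds per job of each class on this VM type
    startup_delay -- average startup delay of this VM type (0 if running)
    time_pending  -- seconds already spent starting (assumed meaning of s_i)
    """
    usable = deadline - (startup_delay - time_pending)
    # Time needed to process the whole queue once on this VM type.
    mix_time = sum(t * n for t, n in zip(proc_times, job_counts))
    if mix_time <= 0 or usable <= 0:
        return [0.0 for _ in job_counts]
    # The instance works through the queue in proportion to the job mix.
    return [usable * n / mix_time for n in job_counts]


# Running instance: one hour available, 300 s per job for every class.
running = computing_power(3600, [10, 20, 30], [300, 300, 300])
# Pending instance of the same type that still needs 600 s to start.
pending = computing_power(3600, [10, 20, 30], [300, 300, 300],
                          startup_delay=600, time_pending=0)
```

With these made-up numbers the running instance covers 12 jobs in the hour and the pending one covers 10, because 600 s of the deadline are lost to startup.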

Problem Formalization (3)

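Scale up, sufficient budget

\[ \min \sum_i c_{type(I'_i)} \quad \text{subject to} \quad P' \ge W - \sum_i P_i \]

Scale up, insufficient budget

\[ \max\, P' \quad \text{subject to} \quad \sum_i c_{type(I'_i)} \le C - \sum_i c_{type(I_i)} \]

Scale down

\[ \sum_i P_i - P_s \ge W \]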
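The scale-down condition can be read as a simple check: an instance may be released only if the remaining instances still cover the workload for every job class. A minimal sketch, assuming computing power and workload are given as per-class lists; the function name and the sample numbers are hypothetical.

```python
def can_release(instance_powers, s, workload):
    """True if the instances other than index s still cover the workload."""
    remaining = [
        sum(power[j] for k, power in enumerate(instance_powers) if k != s)
        for j in range(len(workload))
    ]
    return all(r >= w for r, w in zip(remaining, workload))


# Three instances, each with a per-class computing power (illustrative values).
powers = [[10, 5, 20], [10, 20, 5], [10, 10, 10]]
workload = [20, 25, 25]

release_third = can_release(powers, 2, workload)   # others supply [20, 25, 25]
release_first = can_release(powers, 0, workload)   # others supply [20, 30, 15]
```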

An example

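Workload -> Required Computing Power

Required computing power, given workload W and the running instances I_1 and I_2:

\[
P' \ge W - I_1 - I_2: \quad
\begin{cases}
j_1: & x \ge 60 - 10 - 10 = 40 \\
j_2: & y \ge 60 - 5 - 20 = 35 \\
j_3: & z \ge 60 - 20 - 5 = 35
\end{cases}
\]

Computing power supplied by n'_1, n'_2, n'_3 new instances of VM types V_1, V_2, V_3:

\[
P' = n'_1 V_1 + n'_2 V_2 + n'_3 V_3: \quad
\begin{cases}
j_1: & x = 10\,n'_1 + 10\,n'_2 + 10\,n'_3 \ge 40 \\
j_2: & y = 5\,n'_1 + 20\,n'_2 + 10\,n'_3 \ge 35 \\
j_3: & z = 20\,n'_1 + 5\,n'_2 + 10\,n'_3 \ge 35
\end{cases}
\]

Objective:

\[ \min\,(c_1 n'_1 + c_2 n'_2 + c_3 n'_3) \]

where

\[ c_1 n'_1 + c_2 n'_2 + c_3 n'_3 \le C - c_{type(I_1)} - c_{type(I_2)} \]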
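For a problem this small, the cheapest plan can be found by exhaustive search. The sketch below is an assumption-laden illustration, not the solver used in the talk: it uses the required power (40, 35, 35) and the per-type power vectors from the example, while the hourly costs (3, 3, 2) and the cap of 10 instances per type are made up.

```python
from itertools import product


def cheapest_plan(required, vm_power, vm_cost, max_per_type=10):
    """Brute-force the instance counts that meet `required` at minimum cost.

    Returns (cost, counts), or None if no plan within the cap is feasible.
    """
    best = None
    for counts in product(range(max_per_type + 1), repeat=len(vm_power)):
        supplied = [
            sum(n * power[j] for n, power in zip(counts, vm_power))
            for j in range(len(required))
        ]
        if all(s >= r for s, r in zip(supplied, required)):
            cost = sum(n * c for n, c in zip(counts, vm_cost))
            if best is None or cost < best[0]:
                best = (cost, counts)
    return best


required = (40, 35, 35)                                 # x, y, z from the example
vm_power = [(10, 5, 20), (10, 20, 5), (10, 10, 10)]     # V_1, V_2, V_3
vm_cost = (3, 3, 2)                                     # hypothetical c_1, c_2, c_3

plan = cheapest_plan(required, vm_power, vm_cost)
```

Every type contributes 10 units to j_1, so at least four instances are needed; with these assumed costs, four instances of the cheapest type are also enough for j_2 and j_3. A budget limit could be added by skipping plans whose cost exceeds the remaining budget.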

Windows Azure Implementation

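Cloud Cruise Control

[Architecture diagram. Components: Decider, Monitor, Repository, VM Manager, Config, Historical Data, VM instances. Actors: admin, users. Flow labels: workload (enqueue, dequeue), update, vm info, vm plan (+, –), dynamic configuration, notify.]

Decider objective:

\[ \min \sum_i c_{type(I'_i)} \quad \text{subject to} \quad P' \ge W - \sum_j P_j \]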

Evaluation - Simulation

Workload & VM simulation parameters

Job arrival rate: Mix, Computing Intensive and IO Intensive each average 30 jobs/hour with STD 5 jobs/hour

Average job processing time per VM type and job class:

VM type (price, startup delay) | Mix | Computing Intensive | IO Intensive
General (0.085$/hour, delay 600s) | Average 300s, STD 50s | Average 300s, STD 50s | Average 300s, STD 50s
High-CPU (0.17$/hour, delay 720s) | Average 210s, STD 25s | Average 75s, STD 15s | Average 300s, STD 50s
High-IO (0.17$/hour, delay 720s) | Average 210s, STD 25s | Average 300s, STD 50s | Average 75s, STD 15s

Stable workload & changing deadline

[Figure: Stable Workload & Changing Deadline. X axis: Time (hour). Y axes: Utilization (%) and Response (sec). Series: utilization, deadline, avg, max, min.]

Changing workload & fixed deadline

[Figure: Changing Workload & Fixed Deadline. X axis: Time (hour). Y axes: Workload (job/h) and Response (sec). Series: deadline, avg, max, min, workload.]

Cost

Choice | VM Types | Total Cost ($) | % more than optimal
Choice #1 | General | 98.52$ | 43%
Choice #2 | High-CPU | 128.86$ | 87%
Choice #3 | High-IO | 129.71$ | 88%
Choice #4 | General, High-CPU, High-IO | 78.62$ | 14%
Optimal | General, High-CPU, High-IO | 68.85$ |
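The "% more than optimal" figures follow directly from the total costs. A short sketch that recomputes them from the reported totals; the function and variable names are illustrative.

```python
def percent_over_optimal(cost, optimal):
    """Extra cost relative to the optimal plan, rounded to a whole percent."""
    return round((cost - optimal) / optimal * 100)


OPTIMAL_COST = 68.85  # total cost of the optimal plan, in dollars

total_costs = {
    "Choice #1": 98.52,   # General only
    "Choice #2": 128.86,  # High-CPU only
    "Choice #3": 129.71,  # High-IO only
    "Choice #4": 78.62,   # General, High-CPU, High-IO
}

overheads = {
    name: percent_over_optimal(cost, OPTIMAL_COST)
    for name, cost in total_costs.items()
}
```

The recomputed values match the percentages reported for the four choices.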

Evaluation - MODIS

MODIS data set naming: 200X is the year; Terra & Aqua are the satellites; (X - Y) means day X to day Y; 15 images / day

* C.H.: computing hour; 1 C.H. = 0.12$ in Windows Azure

Moderate scale test (up to 20 instances)

Data set | Jobs | Theoretical cost | 1 hour deadline | 2 hour deadline | 3 hour deadline
Terra 2004 (10-12) | 45 | 4 C.H.* or 0.48$ | 18 min late, 9 C.H. or 1.08$ | 8 min early, 6 C.H. or 0.72$ | 20 min early, 5 C.H. or 0.6$
Aqua 2008 (30-32) | 45 | 4 C.H. or 0.48$ | 15 min late, 10 C.H. or 1.2$ | 20 min early, 7 C.H. or 0.84$ | 29 min early, 5 C.H. or 0.6$

Large scale test (up to 90 instances)

Data set | Jobs | Theoretical cost | 2 hour deadline | 4 hour deadline
Terra & Aqua 2006 (1-75) | 1125 | 93 C.H. or 11.16$ | 20 min late, 170 C.H. or 20.4$ | 6 min early, 132 C.H. or 15.84$
Terra & Aqua 2006 (1-150) | 2250 | 185 C.H. or 22.2$ | Admission Denied | 22 min early, 243 C.H. or 29.16$
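The dollar amounts in the MODIS results are the computing hours multiplied by the Windows Azure rate of 0.12$ per C.H. A minimal sketch that reproduces that conversion and the overhead of actual over theoretical cost for the Terra & Aqua 2006 (1-75) test; the helper names are hypothetical.

```python
CENTS_PER_COMPUTING_HOUR = 12  # 1 C.H. = 0.12$ as stated in the results


def cost_dollars(computing_hours):
    """Convert computing hours (C.H.) into dollars."""
    return computing_hours * CENTS_PER_COMPUTING_HOUR / 100


def overhead_percent(actual_ch, theoretical_ch):
    """Extra computing hours relative to the theoretical minimum, in percent."""
    return round((actual_ch - theoretical_ch) / theoretical_ch * 100)


theoretical = cost_dollars(93)           # Terra & Aqua 2006 (1-75), theoretical
actual_4h = cost_dollars(132)            # 4 hour deadline, 6 min early
overhead_4h = overhead_percent(132, 93)  # extra cost under the 4 hour deadline
overhead_2h = overhead_percent(170, 93)  # extra cost under the 2 hour deadline
```

In this run the tighter 2 hour deadline costs roughly twice the overhead of the 4 hour deadline and is still missed by 20 minutes.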

Evaluation - MODIS

Test: Terra & Aqua 2006 (1-75), total 1125 jobs, 6 min early

Theoretical cost: 93 C.H. or 11.16$

Actual cost: 132 C.H. or 15.84$

[Figure: Instance Acquisition and Release. X axis: Time (hour). Y axis: Instance Number. Series: Released, Acquiring, Ready.]

Conclusions & Future works

Conclusions

More cost-efficient than a fixed-size instance choice

VM startup delay can have a large effect in practice

Future work

A more general cloud application model

Multiple job classes

Other instance types (e.g. spot instances & reserved instances)

Data transfer performance and storage cost

Thank you