dynamic fractional resource scheduling practical issues and future directions -- 2011, cetraro


Dynamic Fractional Resource Scheduling: Practical Issues and Future Directions

Mark Stillwell Frederic Vivien Henri Casanova

INRIA, Universite de Lyon, LIP

International Research Workshop on Advanced High Performance Computing Systems

Cetraro, Italy, June 27–29, 2011

M. Stillwell, F. Vivien, H. Casanova INRIA

DFRS in Practice


Outline

Introduction

Implementation

Concluding Remarks




Dynamic Fractional Resource Scheduling

- HPC Scheduling Problem:
  - homogeneous nodes with high-speed interconnect
  - network file system
  - release date and number of tasks known at submission
  - computation time known only after completion
  - goal: minimize maximum stretch
- approach:
  - based on virtual machine technology
    - fractional resource allocations
    - preemption / migration
  - on-line scheduling problem → sequence of off-line resource allocation problems
  - computable, on-line performance metric related to max stretch
  - heuristics for task placement/resource allocation
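As a rough illustration of the objective, the sketch below computes the maximum stretch of a finished schedule. It assumes the usual definition of stretch (time spent in the system divided by the computation time the job would need on a dedicated machine); the function names and the tuple layout are hypothetical and not taken from the slides.

```python
def stretch(release, completion, dedicated_runtime):
    """Stretch of one job: flow time divided by its dedicated computation time."""
    return (completion - release) / dedicated_runtime


def max_stretch(jobs):
    """Maximum stretch over (release, completion, dedicated_runtime) tuples."""
    return max(stretch(r, c, p) for r, c, p in jobs)


# Hypothetical example: two jobs, times in seconds.
jobs = [
    (0, 100, 50),   # spent 100 s in the system for 50 s of work
    (10, 40, 30),   # spent 30 s in the system for 30 s of work
]
worst = max_stretch(jobs)
```

Because computation time is only known after completion, this quantity cannot be evaluated while jobs are still running, which is why an on-line metric related to it is needed.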




Task Placement and Resource Allocation Problem

- nodes comprise a number of resources (CPU cores, memory, bandwidth, etc.)
- tasks have rigid requirements and fluid needs
  - job tasks have the same requirements and needs
  - rigid requirements specify minimum resource allocations
  - fluid needs specify minimum allocations without performance degradation
- yield of a job is a value between 0 and 1
  - correlated with performance
  - jobs allocated product of yield and need in each resource
- Goal: find an allocation that maximizes the minimum yield
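To make the yield concrete, here is a minimal sketch for a placement that is already fixed. It assumes a single fluid resource (CPU) with unit capacity per node and that rigid memory requirements are already satisfied; under those assumptions every task on a node can share the same yield, and the best common value is the capacity divided by the total need, capped at 1. All names are hypothetical.

```python
def node_yield(cpu_needs, cpu_capacity=1.0):
    """Largest common yield for the tasks on one node (assumes CPU is the only fluid need)."""
    total = sum(cpu_needs)
    if total <= cpu_capacity:
        return 1.0  # every task can receive its full need
    return cpu_capacity / total


def min_yield(placement, cpu_capacity=1.0):
    """Minimum yield over all nodes for a fixed placement {node: [cpu needs]}."""
    return min(node_yield(needs, cpu_capacity) for needs in placement.values())


# Hypothetical placement: n1 is underloaded, n2 is oversubscribed by a factor of two.
placement = {"n1": [0.5, 0.25], "n2": [1.0, 1.0]}
y = min_yield(placement)
# Each task on n2 is then allocated yield * need of the CPU.
allocations_n2 = [node_yield(placement["n2"]) * need for need in placement["n2"]]
```

Changing the placement changes the minimum yield, which is what the placement heuristics try to exploit.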




Task Placement Heuristics

- Greedy Task Placement – incremental, puts each task on the node with the lowest computational load on which it can fit without violating memory constraints
- MCB Task Placement – global, iteratively applies multi-capacity (vector) bin-packing heuristics during a binary search for the maximized minimum yield
  - achieves higher minimum yield values than Greedy
  - can potentially cause lots of migration
- what if the system is oversubscribed?
  - use a priority function to decide which jobs to run
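The two placement styles can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes unit-capacity nodes, tasks described by a hypothetical (cpu_need, mem_requirement) pair, and a plain first-fit-decreasing packer standing in for the multi-capacity bin-packing heuristics. Since first-fit is only a heuristic, the binary search returns a yield it could pack, not necessarily the optimum.

```python
def greedy_place(task, nodes):
    """Put one task on the least CPU-loaded node that still has enough free memory.

    nodes is a list of dicts with "cpu_load" and "mem_free"; returns the node
    index, or None if the task fits nowhere.
    """
    cpu_need, mem_req = task
    candidates = [i for i, n in enumerate(nodes) if n["mem_free"] >= mem_req]
    if not candidates:
        return None
    best = min(candidates, key=lambda i: nodes[i]["cpu_load"])
    nodes[best]["cpu_load"] += cpu_need
    nodes[best]["mem_free"] -= mem_req
    return best


def fits_at_yield(tasks, n_nodes, y, eps=1e-9):
    """First-fit-decreasing vector packing of (y * cpu_need, mem_req) items."""
    items = sorted(((y * c, m) for c, m in tasks), key=max, reverse=True)
    bins = [[0.0, 0.0] for _ in range(n_nodes)]
    for cpu, mem in items:
        for b in bins:
            if b[0] + cpu <= 1.0 + eps and b[1] + mem <= 1.0 + eps:
                b[0] += cpu
                b[1] += mem
                break
        else:
            return False  # no node can take this item
    return True


def best_min_yield(tasks, n_nodes, tol=1e-3):
    """Binary search for the largest yield at which the packer succeeds."""
    if not fits_at_yield(tasks, n_nodes, 0.0):
        return None  # memory requirements alone cannot be packed
    if fits_at_yield(tasks, n_nodes, 1.0):
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fits_at_yield(tasks, n_nodes, mid):
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical workload: four CPU-bound tasks, each needing 40% of a node's memory.
tasks = [(1.0, 0.4)] * 4
found = best_min_yield(tasks, n_nodes=2)
```

Note that the global search repacks every task from scratch, so consecutive runs may produce very different placements; this is where the migration cost mentioned above comes from.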




Priority Function?

- Virtual Time: subjective time experienced by a job
- first idea: 1 / VIRTUAL TIME
  - informed by ideas about fairness
  - led to good results
  - theoretically prone to starvation
- second idea: FLOW TIME / VIRTUAL TIME
  - addresses starvation problem
  - led to poor performance
- third idea: FLOW TIME / (VIRTUAL TIME)^2
  - combines idea #1 and idea #2
  - addresses starvation
  - performs about the same as first priority function
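The three candidate priority functions are easy to compare side by side. The sketch below assumes that a larger value means a higher priority and that a job with zero virtual time (one that has not yet run) gets infinite priority; both conventions, and the function names, are assumptions made for illustration.

```python
import math


def priority_inverse_vt(flow_time, virtual_time):
    """First idea: 1 / virtual time (flow_time is ignored)."""
    return math.inf if virtual_time == 0 else 1.0 / virtual_time


def priority_flow_over_vt(flow_time, virtual_time):
    """Second idea: flow time / virtual time."""
    return math.inf if virtual_time == 0 else flow_time / virtual_time


def priority_flow_over_vt_squared(flow_time, virtual_time):
    """Third idea: flow time / (virtual time)^2."""
    return math.inf if virtual_time == 0 else flow_time / virtual_time ** 2


# Hypothetical jobs as (flow_time, virtual_time) in seconds.
job_a = (100, 50)     # recent job that has received half of its time
job_b = (1000, 100)   # old job that has received very little service

for fn in (priority_inverse_vt, priority_flow_over_vt, priority_flow_over_vt_squared):
    print(fn.__name__, fn(*job_a), fn(*job_b))
```

With these numbers the first function favors job_a, while the other two favor the long-waiting job_b. Under the first function a job's priority does not grow while it waits, which is the starvation risk noted above; the flow-time term makes priority increase with waiting time.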




When to apply Heuristics

We consider a number of different options:

- Job Submission – heuristics can use greedy or bin packing approaches
- Job Completion – as above, can help with throughput when there are lots of short running jobs
- Periodically – some heuristics periodically apply vector packing to improve overall job placement




Methodology

- experiments conducted using discrete event simulator
- mix of synthetic and real trace data
- considered only memory requirements and CPU needs
- preemption/migration of jobs causes 300 second penalty to runtime
- periodic approaches use a 600 second (10 minute) period
- absolute bound on max stretch computed for each instance
- performance comparison based on degradation from max stretch bound




Simulation Algorithms

- FCFS
- EASY
- Greedy*
- GreedyPM*
- /per
- GreedyPM*/per
- MCB*/per
- GreedyPM*/per/mvt
- MCB*/per/mvt
- /stretch-per




Max Stretch Degradation vs. Load, 5 minute penalty

[Figure: max stretch degradation from bound (y-axis, ticks from 1 to 10000) versus load (x-axis, 0.1 to 0.9) for FCFS, EASY, Greedy*, GreedyPM*, /per, GreedyPM*/per, MCB*/per, /stretch-per, MCB*/per/mvt, and GreedyPM*/per/mvt.]




Bandwidth vs. Period

[Figure: average bandwidth (GB/s), y-axis from 0.05 to 0.55, versus period (seconds), x-axis from 2000 to 12000.]




Max Stretch Degradation vs. Period

[Figure: average max stretch degradation, y-axis from 5 to 14, versus period (seconds), x-axis from 2000 to 12000.]




Implementation




Development Platform

- Grid5000 (http://www.grid5000.fr)
  - Kadeploy images
  - Genepi Nodes: Quad Core Xeon 2.5GHz w/ 8GB RAM
- Open-Source Software
  - Xen 4.0
  - Debian Squeeze / Linux 2.6.32




Goals

- “small” experiment topics:
  - VM overhead
  - resource requirement/needs estimation
  - time sharing performance impact
  - preemption/migration costs
    - bandwidth consumption
    - performance impact
  - migration planning strategies
- “large” experiment topic: DFRS in practice




Estimating Resource Needs

- Techniques:
  - repeated runs for benchmarking / model building
  - online model building
  - introspection / configuration variation
- Models:
  - constant values
  - time-varying values
  - random variables / probability distributions
- Questions:
  - how accurate can we be?
  - how accurate do we need to be?




Benchmark Task VM Memory Requirements

bench/tasks   1       2       3       4
CG            477M    251M    –       147M
EP            <64M    <64M    <64M    <64M
FT            1930M   977M    –       499M
IS            637M    329M    –       176M
LU            218M    125M    –       80M
MG            485M    259M    –       147M
SP            369M    –       –       123M




Time Sharing Performance Impact

- Process thrashing!!!
  - Gang Scheduling
  - Flexible Co-scheduling
- Uncoordinated might not be so bad...




VM Consolidation

[Figure: slowdown, y-axis from 2 to 14, versus consolidation factor (maximum tasks/core), x-axis from 1 to 8, for CG, EP, FT, IS, LU, and MG, with the reference line y = x.]




Parallel Task Preemption/Migration

- preemption of parallel distributed jobs is theoretically difficult
  - in the general case this may require coordinated shutdown
  - on the target platform everything happens quickly enough that it isn’t an issue
- migration can be done by preempt/restore or “live”




Preemption/Migration Timing Results

[Figure: time (seconds), y-axis from 0 to 100, versus memory allocation (MB), x-axis from 400 to 2000, for Preempt, Restore, Preempt+Restore, and Migrate.]




Migration Planning

- using only live migration is NP-hard
- could fail (circular dependencies)
- heuristic: try to move the highest priority job
  - if it can’t move, try next highest
  - if none can move, preempt the lowest priority job scheduled for migration and try again
  - restore any preempted jobs at the end
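The heuristic above can be sketched as a short planning loop. This is a simplified illustration under stated assumptions: each job is a single VM described by a hypothetical dict (name, mem, src, dst, priority), memory is the only constraint on whether a live migration can proceed, a larger priority value means more important, and the final placement is assumed feasible so that every restore succeeds.

```python
def plan_migrations(free_mem, jobs):
    """Return an ordered list of ("migrate" | "preempt" | "restore", job name) steps.

    free_mem maps node name -> currently free memory; it is copied, not mutated.
    """
    free = dict(free_mem)
    # Highest priority first, so the last element is always the lowest priority.
    pending = sorted(jobs, key=lambda j: j["priority"], reverse=True)
    preempted = []
    plan = []
    while pending:
        # Try the highest-priority job first, then the next highest, and so on.
        movable = next((j for j in pending if free[j["dst"]] >= j["mem"]), None)
        if movable is not None:
            free[movable["dst"]] -= movable["mem"]
            free[movable["src"]] += movable["mem"]
            plan.append(("migrate", movable["name"]))
            pending.remove(movable)
        else:
            # Nothing can move: preempt the lowest-priority pending job to free memory.
            victim = pending.pop()
            free[victim["src"]] += victim["mem"]
            preempted.append(victim)
            plan.append(("preempt", victim["name"]))
    # Restore preempted jobs on their destination nodes at the end.
    for job in preempted:
        free[job["dst"]] -= job["mem"]
        plan.append(("restore", job["name"]))
    return plan


# Hypothetical circular dependency: x and y must swap nodes, and both nodes are full.
free_mem = {"A": 0, "B": 0}
jobs = [
    {"name": "x", "mem": 4, "src": "A", "dst": "B", "priority": 2},
    {"name": "y", "mem": 4, "src": "B", "dst": "A", "priority": 1},
]
plan = plan_migrations(free_mem, jobs)
```

Every iteration either migrates or preempts one pending job, so the loop always terminates, at the cost of preempting jobs that an exact (NP-hard) plan might have moved live.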




Planned Implementation Validation Experiment

- generate traces using an accepted model
- annotate them with CPU/memory values
- “fill in” the traces using benchmarks...
  - NAS parallel benchmarks (HPC)
  - TCPP benchmarks (Service Hosting)




Future Work

- complete development of practical implementation
- extend the model
  - nonlocal resources
  - heterogeneous nodes
  - varying communications delays
- new or arbitrary optimization targets
  - fault tolerance
  - energy or other costs
  - formal definitions of fairness




Summary

- DFRS performs well in simulation
  - orders of magnitude better than batch
  - close to a theoretical bound
- prototype in development
  - preliminary results are promising





