grid’2012 dubna july 19, 2012 dependable job-flow dispatching and scheduling in virtual...

30
GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov а, , Alexey Tselishchev b , Dmitry Yemelyanov a , Alexander Bobchenkov a a National Research University "MPEI” b CERN (European Organization for Nuclear Research)

Upload: bethany-whitehead

Post on 01-Jan-2016

227 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

GRID’2012 Dubna July 19, 2012

Dependable Job-flow Dispatching and Scheduling in Virtual

Organizations of Distributed Computing Environments

Victor Toporkovа,, Alexey Tselishchevb, Dmitry Yemelyanova, Alexander

Bobchenkova

a National Research University "MPEI”b CERN (European Organization for

Nuclear Research) 

Page 2: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

Introduction• This work presents dispatching strategies

based on methods of job-flow and application-level scheduling in VO of distributed computational environments with non-dedicated resources.

• Strategies are based on economic scheduling models and diverse administration policies inside resource domains.

GRID July 19, 2012 2

Page 3: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

Distributed Computational Environment

• Heterogeneity

• Changing composition

• Different resource owners

• Local and global job flows

• User-required quality of service (QoS)

• Pricing policies

GRID July 19, 2012 3

Page 4: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

4

Conflict of Interests

• Each user is interested in effective jobs execution with the lowest costs

• Resource owners are trying to make the highest income from their resources

• VO administrators are interested in maximizing the whole VO performance in the way that satisfies both users and owners

GRID July 19, 2012

Page 5: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

Two Trends in Distributed Computing• A resource broker modelPro: It is decentralized and application-

specific.Contra: Integral QoS rates may be

deteriorated.

• Virtual organizations Contra: VOs naturally restrict the scalability. Pro: It possible to improve the efficiency of

resource usage and find a tradeoff between contradictory interests of different participants.

GRID July 19, 2012 5

Page 6: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

6

Framework for Integrated Scheduling6

Job Manager (Si,Sk)

Job Manager (Sj)

Job Manager (Si) Local

Manager Local Manager

Local Manager

Local Manager

Local Manager

Task Queues

Processor nodes

Processor nodes

TaskQueues

Metascheduler(Si, Sj, Sk job-flows strategies)

Job-flows i j k

Jobs of flow

i

Jobs of flowFlow Jobs of flowk i

j

i

k j

User job-flows

GRID July 19, 2012

Page 7: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

Outline• Overview of model components and

metascheduling workflow.

• Scheduling strategy search.

• Simulation results.

GRID July 19, 2012 7

Page 8: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

Model Components• VO, that defines resource co-allocation

dispatching strategies, pricing policies and resource load-balancing mechanisms.

• Heterogeneous hierarchical computational environment (Grid nodes, CPUs or others). Each resource is considered as non-dedicated.

• Metascheduler, which implements resource management strategies and policies of VO.

• Application-level schedulers that analyze internal job structure and schedule single tasks.

GRID July 19, 2012 8

Page 9: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

9

Resource Request The resource requirements are arranged into

a resource request containing:

• - minimal performance requirement for computational nodes

• maximal price tag for a single timeslot• number of simultaneously reserved timeslots• minimal slot length• the internal structure of a job as a directed acyclic

graph • deadline for the job execution

minP

maxF

n

GRID July 19, 2012

Page 10: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

10

Slot

• A single slot is a time span that can be assigned to a task, which is a part of a multiprocessor job. Available resources are represented as slot sets.

GRID July 19, 2012

Page 11: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

11

Two-Tier Scheduling

• The metascheduler analyzes available slots and finds an optimal slot combination to accommodate every job in a batch using economic criteria

• Application-level schedulers analyze the job DAG and form a resulting schedule for every task according to the strategy

GRID July 19, 2012

Page 12: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

12

Job Batch SchedulingPerforming an optimization of the whole batch of

jobs allows to increase overall scheduling efficiency

Job1

ResourceRequest 1

Job2

ResourceRequest 2

Job n

ResourceRequest n

Job Batch

Criterium ACriteria A Criteria B Criteria N

GRID July 19, 2012

Page 13: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

13

Scheduling Cycles

RESOURCES

Cycle i-1 Cycle i

Job Batch Job Batch

GRID July 19, 2012

Page 14: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

14

Scheduling Cycle SchemeDuring every cycle of the job batch scheduling

two problems have to be solved:

• Selecting alternative sets of slots (alternatives) that meet the requirements (resource, time, and cost)

• Choosing a slot combination that would be the

efficient or optimal in terms of the whole job batch execution in the current cycle of scheduling

GRID July 19, 2012

Page 15: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

15

Optimization Scheme

Job 1Job 2 Job 3 Job n…

Alternative 2

Alternative 1 Alternative 1

Alternative 3

Alternative N1

Alternative 1

Alternative 2

Alternative 3 Alternative 3

Alternative N2

Alternative 3

Alternative N3

Job 2 Job 3 Job n

.

...

.

. ..

Alternative i

Time: TiCost: Ci

Best Alternativ

es

Alternative 1

Alternative 2 Alternative 2

Alternative Nn

GRID July 19, 2012

Page 16: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

16

The Functional Equation

GRID July 19, 2012

Page 17: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

17

Window Concept• In the case of homogeneous nodes, a set of slots

for a job execution is represented with a rectangle window, and in the case of processors with varying performance, that will be a window with a rough right edge

GRID July 19, 2012

Page 18: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

18

AMP Algorithms

We propose algorithms for slot selection that feature linear complexity O(m), where m is the number of available time-slots

AMP (an Algorithm based on Maximal job Price) searches for a set of slots effective for a given criterion which total cost will not exceed the maximal budget S

GRID July 19, 2012

Page 19: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

19

Slot Search Algorithms ConceptAll available time-slots are ordered by the start

time;

While there is at least one slot available {• Adding next available slot to the window list;• Checking all slots in the window considering the start

time of the new slot and removing the slots being late;• Selection of a n-slot window best by the given criterion;

• } • Selecting the best of the found interim windows;

GRID July 19, 2012

Page 20: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

20

Window Selection General Scheme

Ck, Tk, Zk

Cj, Tj, Zj

Ci, Ti, Zi

Minimize:

Constraints:

min2211 mmza...zazaSca...caca mm 2211

na...aa m 21

m,...,r,ar 1,10

Slot 1

Slot m

.

.

.

GRID July 19, 2012

Page 21: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

21

Job-Flow Scheduling Simulation Results

Mode Average jobs being processed per cycle (max 30)

Average total budget (slot

cost) per cycle, cost

units

Average total slot usage per

cycle, time units

GAIN, %

Problem 1: Maximize total budget, limit slot usage

OPT 20.0 11945.9 421.2+12.8

NO OPT 20.0 10588.5 459.4

Problem 2: Minimize slot usage, limit total budget

OPT 12.4 7980.4 300.9+10.6

NO OPT 12.4 7830.9 332.8

GRID July 19, 2012

Page 22: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

22

Application-level DataApplication-level schedulers receive

following input data.• The optimal slot set and the description of all

corresponding resources• The directed acyclic information graph (DAG);

vertices correspond to job tasks • The dispatching strategy , which defines the

criterion for a schedule expected• The deadline or the maximal budget for the job

GRID July 19, 2012

Page 23: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

23

Critical Jobs MethodThe critical jobs method consists of three

main steps:

• Forming and ranging a set of critical jobs (longest sets of connected tasks) in the DAG.

• Consecutive planning of each critical job using dynamic programming methods.

• Resolution of possible collisions.GRID July 19, 2012

Page 24: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

24

Critical Jobs Method Example

GRID July 19, 2012

Page 25: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

25

Application Level Scheduling Simulation Results

Proposed model provides 25% advantage on average job execution time

before the consecutive application-level scheduling approach

Parameter Application-level scheduling Two-tier model (k=0.75)

Jobs number 1000 1000

Execution time 531089 time units 399465 time units

Optimal schedules 687 703

Mean collision count 3.85 4.41

Mean load (fact) 0.1841 0.1830

Mean job cost 14.51 units 14.47 units

GRID July 19, 2012

Page 26: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

26

Simulation Studies: Job Distribution

0,2

0,25

0,3

0,35

0,4

0,45

0,5

0,55

0,6

0,65

1 1,5 2 2,5

Scheduling interval factor, h

Su

cces

sfu

l J

ob

dis

trib

uti

on

s p

rop

ort

ion

Dependence of the proportion of the successful job distributions on the length of the distribution interval

GRID July 19, 2012

Page 27: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

27

Resource Utilization Level Balancing

Utilization minimization and distribution cost maximization

Distribution cost minimization

GRID July 19, 2012

Page 28: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

28

Conclusions and Future Work The following scheduling schemes and algorithms

were proposed and considered:

• Job-flow level scheduling model and slot selection algorithms

• Application-level scheduling and node balancing scheme

Future research will include the simulation of connected job-flow and application levels experiments together with direct comparison with the existing systems.

GRID July 19, 2012

Page 29: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

29

Acknowledgements• This work was partially supported by the

Council on Grants of the President of the Russian Federation for State Support of Leading Scientific Schools (SS-316.2012.9), the Russian Foundation for Basic Research (grant no. 12-07-00042), and by the Federal Target Program “Research and scientific-pedagogical cadres of innovative Russia” (state contracts nos. 16.740.11.0038 and 16.740.11.0516).

GRID July 19, 2012

Page 30: GRID’2012 Dubna July 19, 2012 Dependable Job-flow Dispatching and Scheduling in Virtual Organizations of Distributed Computing Environments Victor Toporkov

30

Thank You!

GRID July 19, 2012