Hawk: Hybrid Datacenter Scheduling - Inria

Hawk: Hybrid Datacenter Scheduling. Pamela Delgado, Florin Dinu, Anne-Marie Kermarrec, Willy Zwaenepoel. USENIX ATC 2015.


Post on 16-Aug-2020


Hawk: Hybrid Datacenter Scheduling

Pamela Delgado, Florin Dinu, Anne-Marie Kermarrec, Willy Zwaenepoel

USENIX ATC 2015

Introduction: datacenter scheduling

[Diagram: jobs 1 to N, each made up of tasks, pass through a scheduler that places the tasks on a cluster.]

Introduction: centralized scheduling

[Diagram: jobs 1 to N queue at a single centralized scheduler, which places their tasks on the cluster.]

Good: placement

Not so good: scheduling latency

Introduction: distributed scheduling

[Diagram: jobs 1 to N are handled by distributed schedulers 1 to N, which place tasks on the same cluster.]

Good: scheduling latency

Not so good: placement

Outline

1) Introduction

2) Hawk hybrid scheduling

• Rationale

• Design

3) Evaluation

• Simulation

• Real cluster implementation

4) Conclusion

Hybrid scheduling

[Diagram: jobs are split between one centralized scheduler and distributed schedulers 1 to N, all placing tasks on the same cluster.]

Hawk: hybrid scheduling

Long jobs: centralized

Short jobs: distributed

Hawk: hybrid scheduling

[Diagram: long jobs 1 to M go to the centralized scheduler; short jobs 1 to N go to distributed schedulers 1 to N.]

Long/short: estimated execution time vs. cut-off
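The long/short split can be sketched in a few lines. This is only an illustration: the `Job` type, the field name, the 100-second cut-off and the choice to treat a job exactly at the cut-off as long are all assumptions, since the slides give neither a value nor a tie-breaking rule.

```python
from dataclasses import dataclass

# Hypothetical cut-off in seconds; the slides do not state a value.
CUTOFF_SECONDS = 100.0


@dataclass
class Job:
    name: str
    estimated_execution_time: float  # seconds; assumed to be supplied with the job


def route(job: Job, cutoff: float = CUTOFF_SECONDS) -> str:
    """Pick the scheduler type by comparing the estimate with the cut-off."""
    # Assumption: a job exactly at the cut-off counts as long.
    if job.estimated_execution_time >= cutoff:
        return "centralized"
    return "distributed"


print(route(Job("short-query", 5.0)))
print(route(Job("long-batch", 500.0)))
```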

Rationale for Hawk

Typical production workloads:

• Short jobs: many, using few resources

• Long jobs: few, using most resources

Rationale for Hawk (continued)

[Figure: two panels, "Percentage of long jobs" and "Percentage of task-seconds for long jobs", each on a 0 to 100 axis.]

Source: Design Insights for MapReduce from Diverse Production Workloads, Chen et al., 2012

Long jobs: a minority, but they take up most of the resources
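The two quantities named in the figure can be computed as follows. This is a sketch with made-up numbers, not data from the cited study; the job representation (average task runtime, number of tasks) and the cut-off are assumptions chosen for illustration.

```python
def long_job_shares(jobs, cutoff):
    """Return (% of jobs that are long, % of task-seconds used by long jobs).

    jobs: list of (avg_task_runtime_seconds, num_tasks) tuples (hypothetical format).
    A job is treated as long when its average task runtime reaches the cut-off.
    """
    total_task_seconds = sum(runtime * tasks for runtime, tasks in jobs)
    long_jobs = [(runtime, tasks) for runtime, tasks in jobs if runtime >= cutoff]
    long_task_seconds = sum(runtime * tasks for runtime, tasks in long_jobs)
    pct_jobs = 100 * len(long_jobs) / len(jobs)
    pct_task_seconds = 100 * long_task_seconds / total_task_seconds
    return pct_jobs, pct_task_seconds


# Made-up workload: nine short single-task jobs and one long two-task job.
jobs = [(10, 1)] * 9 + [(455, 2)]
print(long_job_shares(jobs, cutoff=100))
```

In this toy workload the single long job is 10% of the jobs but accounts for 91% of the task-seconds, which is the shape of skew the slide describes.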

Hawk: hybrid scheduling

Long jobs (centralized): few jobs, so reasonable scheduling latency; bulk of resources, so good placement

Short jobs (distributed 1 to N): latency sensitive, so fast scheduling; few resources, so can trade for not-so-good placement

BEST OF BOTH WORLDS

Good: scheduling latency for short jobs

Good: placement for long jobs

Hawk: distributed scheduling

• Sparrow

• Work-stealing

Sparrow

[Diagram: a distributed scheduler sends task reservations to randomly chosen nodes.]

Random reservation (power of two)
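A simplified sketch of random reservations with two probes per task is given below. It is not Sparrow's actual code: the function name, the per-node queues and the rule of picking distinct nodes are assumptions made for illustration.

```python
import random
from collections import deque


def place_reservations(job_id, num_tasks, queues, rng, probe_ratio=2):
    """Enqueue probe_ratio * num_tasks reservations on distinct random nodes.

    queues: one deque of pending reservations per node (hypothetical model).
    Returns the indices of the nodes that received a reservation.
    """
    num_probes = min(probe_ratio * num_tasks, len(queues))
    chosen = rng.sample(range(len(queues)), num_probes)
    for node in chosen:
        queues[node].append(job_id)
    return chosen


queues = [deque() for _ in range(10)]
chosen = place_reservations("job-1", num_tasks=3, queues=queues, rng=random.Random(0))
print(sorted(chosen))
```

With two reservations per task, the tasks can start on whichever of the probed nodes become free first, while the surplus reservations are simply dropped.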


Sparrow and high load

[Diagram: a distributed scheduler places a task on randomly chosen nodes.]

Random placement: low likelihood of finding a free node

High load + job heterogeneity: head-of-line blocking
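A back-of-the-envelope calculation shows why random probing struggles at high load. It assumes, purely for illustration, that each probed node is busy independently with probability equal to the cluster utilization; the slides do not state this model.

```python
def prob_at_least_one_free(utilization: float, probes: int) -> float:
    """Chance that at least one of `probes` independently sampled nodes is free.

    Assumption: every node is busy with probability `utilization`,
    independently of the others.
    """
    return 1.0 - utilization ** probes


for utilization in (0.5, 0.9, 0.99):
    p = prob_at_least_one_free(utilization, probes=2)
    print(f"utilization {utilization:.2f}: P(free node among 2 probes) = {p:.4f}")
```

Under this model, two probes find a free node three times out of four at 50% utilization, but only about one time in five at 90%.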

Hawk work-stealing

1. Free node: contacts a random node for probes

2. Random node: sends the short-task reservations in its queue

High load: high probability of contacting a node with a backlog
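The two steps above can be sketched as follows. This is a deliberately simplified model: the queue representation is hypothetical, and the victim hands over every short reservation it holds, whereas the exact stealing policy is one of the details the talk defers to the paper.

```python
import random


def steal(queues, thief, rng):
    """Idle node `thief` contacts one random other node and takes its short reservations.

    queues: one list of (kind, job_id) reservations per node, where kind is
    "long" or "short" (hypothetical representation).
    Returns (victim index, list of stolen reservations).
    """
    candidates = [n for n in range(len(queues)) if n != thief]
    victim = rng.choice(candidates)
    stolen = [r for r in queues[victim] if r[0] == "short"]
    # Long reservations stay where the centralized scheduler put them.
    queues[victim] = [r for r in queues[victim] if r[0] != "short"]
    queues[thief].extend(stolen)
    return victim, stolen


queues = [
    [],                                                  # node 0 is free
    [("long", "L1"), ("short", "S1"), ("short", "S2")],  # node 1 has a backlog
]
victim, stolen = steal(queues, thief=0, rng=random.Random(0))
print(victim, stolen)
```

Here the short tasks S1 and S2, which were stuck behind the long task L1, move to the free node and no longer suffer head-of-line blocking.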

Hawk cluster partitioning

[Diagram: the distributed schedulers and the centralized scheduler place tasks on one cluster, part of which is a small partition of reserved nodes.]

No coordination between schedulers. Challenge: no free nodes for mice (short jobs)!

Reserved nodes: small cluster partition

Short jobs schedule anywhere.

Long jobs only in non-reserved nodes.
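The partitioning rule amounts to a filter on node ids. In this sketch the reserved partition is assumed to be the first `num_reserved` nodes, and its size is a made-up parameter; the slides only say that the partition is small.

```python
def eligible_nodes(job_kind: str, num_nodes: int, num_reserved: int) -> list[int]:
    """Return the node ids a job may be placed on.

    Assumption: nodes 0 .. num_reserved - 1 form the reserved partition.
    """
    if job_kind == "short":
        # Short jobs schedule anywhere, including the reserved partition.
        return list(range(num_nodes))
    # Long jobs are kept out of the reserved nodes.
    return list(range(num_reserved, num_nodes))


print(eligible_nodes("short", num_nodes=10, num_reserved=2))
print(eligible_nodes("long", num_nodes=10, num_reserved=2))
```

Because long jobs never occupy the reserved nodes, short jobs keep some capacity even when long jobs fill the rest of the cluster, and no coordination between schedulers is needed to guarantee it.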

Hawk design summary

Hybrid scheduler: long centralized, short distributed

Work-stealing

Cluster partitioning

Evaluation: 1. Simulation

• Sparrow simulator

• Google trace

• Vary number of nodes to vary cluster utilization

• Measure: Job running time

• Report 50th and 90th percentiles for short and long jobs

• Normalized to Sparrow
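The reported metric, a percentile of job running time normalized to Sparrow, can be sketched as below. The nearest-rank percentile definition and the sample running times are assumptions for illustration; the slides do not say which percentile method was used, and the numbers are not results from the evaluation.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def normalized(hawk_times, sparrow_times, p):
    """Ratio of the p-th percentile running times; below 1 means Hawk is faster."""
    return percentile(hawk_times, p) / percentile(sparrow_times, p)


# Made-up running times in seconds for the same set of jobs.
hawk = [10, 12, 14, 20, 40]
sparrow = [20, 24, 28, 50, 80]
for p in (50, 90):
    print(f"{p}th percentile, Hawk/Sparrow: {normalized(hawk, sparrow, p):.2f}")
```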

Simulated results: short jobs

[Figure: Hawk/Sparrow (y-axis, 0 to 1.2) vs. number of nodes in the cluster in thousands (x-axis, 10 to 50); series: 50th and 90th percentile; lower is better.]

Better across the board

Simulated results: long jobs

[Figure: Hawk/Sparrow (y-axis, 0 to 1.2) vs. number of nodes in the cluster in thousands (x-axis, 10 to 50); series: 50th and 90th percentile; lower is better.]

Better except under high load

Very high utilization: partitioning keeps long jobs off the reserved nodes

Decomposing Hawk

1. Hawk minus centralized

2. Hawk minus stealing

3. Hawk minus partitioning

(normalized to Hawk)

Decomposing Hawk: no centralized

[Figure: "Hawk no centralized", normalized to Hawk (y-axis, 0 to 2); bars: 50th and 90th percentile short jobs, 50th and 90th percentile long jobs.]

Decomposing Hawk: no stealing

[Figure: "Hawk no stealing", normalized to Hawk (y-axis, 0 to 2); bars: 50th and 90th percentile short jobs, 50th and 90th percentile long jobs; one off-scale bar is labeled 19.6.]

Decomposing Hawk: no partitioning

[Figure: "Hawk no partition", normalized to Hawk (y-axis, 0 to 2); bars: 50th and 90th percentile short jobs, 50th and 90th percentile long jobs; one off-scale bar is labeled 11.9.]

Decomposing Hawk summary

[Figure: "Hawk no centralized", "Hawk no stealing" and "Hawk no partition" side by side, normalized to Hawk (y-axis, 0 to 2); bars: 50th and 90th percentile short jobs, 50th and 90th percentile long jobs; off-scale bars are labeled 11.9 and 19.6.]

Absence of any component reduces Hawk's performance!

Sensitivity analysis

1. Incorrect estimates of runtime

2. Cut-off long/short

3. Details of stealing

Bottom line: benefits of Hawk remain despite variation

See paper for details

Evaluation: 2. Implementation

[Diagram: a Hawk scheduler and Hawk daemons.]


Experiment

• 100-node cluster

• Subset of Google trace

• Vary inter-arrival time to vary cluster utilization

• Measure: Job running time

• Report 50th and 90th percentile for short and long jobs

• Normalized to Sparrow


Short jobs

[Figure: two panels, 90th percentile (real vs. simulated) and 50th percentile (real vs. simulated); Hawk/Sparrow (y-axis, 0 to 1.4) vs. inter-arrival time (x-axis, 1 to 2.25); lower is better.]

Long jobs

[Figure: two panels, 90th percentile (real vs. simulated) and 50th percentile (real vs. simulated); Hawk/Sparrow (y-axis, 0 to 1.4) vs. inter-arrival time (x-axis, 1 to 2.25); lower is better.]

Implementation

1. Hawk works well in a real cluster

2. Good correspondence between implementation and simulation

Related work

Centralized: Hadoop Fair Scheduler (EuroSys'10), Quincy (SOSP'09)

Two-level: YARN (SoCC'12), Mesos (NSDI'11)

Distributed schedulers: Omega (EuroSys'12), Sparrow (SOSP'13)

Hybrid schedulers: Mercury

Conclusion

• Hawk: hybrid scheduler

  long: centralized, short: distributed

  work-stealing

  cluster partitioning

• Hawk provides good results for short and long jobs

• Even under high cluster utilization