load balancing guardrails -...
TRANSCRIPT
Load Balancing GuardrailsKeeping Your Heavy Traffic on the Road to Low Response Times
Isaac Grosof (CMU)Ziv Scully (CMU)
Mor Harchol-Balter (CMU)
1
Goal: Optimal Load Balancing
Objective:Minimize mean response time E[T]
Q1: How to dispatch?
Q2: How to schedule?
2
Assumptions:Stochastic Arrivals
Known SizesPreempt-Resume
SRP
T
SRP
T
SRP
T
Theorem:SRPT is optimal
Dispatcher
Servers
SRPT: Very little prior work
Dispatcher
SRP
T
SRP
T
SRP
T
Prior Work on Dispatching
3
Dispatcher
FCFS
FCFS
FCFS
FCFS: Tons of prior work
Join-Shortest-Queue (JSQ): Winston, Weber,
Whitt, Lin, Raghavendra, Foley, McDonald, Bramson, Lu, Prabhakar, Eschenfeldt, Gamarnik, …
Join-Shortest-of-d-Queues (JSQ-d): Vvedenskaya, Dobrushin, Karpelevich, Mitzenmacher, Bramson, Ying, Srikant, Kang, Muckherjee, Borst, Leeuqaarden, ...
Least-Work-Left (LWL): Lee, Longton, Kingman,
Takahashi, Daly, Tijms, Van Hoorn, Ma, Mark, Breur, Hokstad, Kimura, Gupta, Harchol-Balter, Dai, Zwart, Osogami, Whitt, …
Size-Interval-Task-Assignment (SITA):Harchol-Balter, Crovella, Murta, Bachmat, Sarfati, Vesilo, Scheller-Wolf, …
Prior Work on Dispatching - FCFS
4
Dispatcher
FCFS
FCFS
FCFS
FCFS: Tons of prior work
Prior Work on Dispatching - SRPT
5
Random dispatch: Trivial
First Policy Iteration (FPI) Heuristic [Hyytiä & Aalto ‘12]
Multilayered Round Robin [Down & Wu ‘06]
SRPT: Very little prior work
Dispatcher
SRP
T
SRP
T
SRP
T
Good for FCFS ⇒ Good for SRPT
6
Distribution: Bounded Pareto [1, 106], α=1.5.
10 servers
Dispatcher
SRP
T
SRP
T
SRP
T
0.7 0.8 0.9 1
Mean response time E[T]for SRPT servers
200
150
100
50
0
Load (ρ)
Random SITA-E LWL
?
Good for FCFS ⇏ Good for SRPT
7
Distribution: Bounded Pareto [1, 106], α=1.5.
10 servers
Dispatcher
SRP
T
SRP
T
SRP
T
0.7 0.8 0.9 1
Mean response time E[T]for SRPT servers
200
150
100
50
0
Load (ρ)
Random SITA-E LWL
Our Contribution: Guardrails
8
→
Possiblyverybad
Disp. Policy P
SRP
T
SRP
T
SRP
T
Guaranteedheavy traffic
optimal
SRP
TDisp. Policy P+ Guardrails
SRP
T
SRP
T
SRP
T ≈
Our Contribution: Guardrails
9
→
Possiblyverybad
Disp. Policy P
SRP
T
SRP
T
SRP
T
Guaranteedheavy traffic
optimal
SRP
T
k
Disp. Policy P+ Guardrails
1
SRP
T
1
SRP
T
1SR
PT ≈
Good for FCFS ⇏ Good for SRPT
10
Distribution: Bounded Pareto [1, 106], α=1.5.
10 servers
Dispatcher
SRP
T
SRP
T
SRP
T
0.7 0.8 0.9 1
Mean response time E[T]for SRPT servers
200
150
100
50
0
Load (ρ)
Random SITA-E LWL
G-Random G-SITA-E G-LWL
Dispatching to SRPT Servers
Dispatcher
SRP
T
SRP
T
SRP
T
SRP
T
11
Dispatching to SRPT Servers
SRP
T
SRP
T
12
Dispatching to SRPT Servers
SRP
T
SRP
T
A small job needs me!
13
Leads to bad E[T]
Problem: Small Job Imbalance
SRP
T
SRP
T
14
Dispatcher
More small jobs left
More small jobs right
Balanced small jobs
Problem: Small Job Imbalance
SRP
T
SRP
T
15
Dispatcher
A small job needs me!
More small jobs left
More small jobs right
Balanced small jobs
Guardrails
SRP
T
SRP
T
16
Dispatcher
More small jobs left
More small jobs right
Balanced small jobs
Guardrails
SRP
T
SRP
T
17
Dispatcher
More small jobs left
More small jobs right
Balanced small jobs
Guardrails
SRP
T
SRP
T
18
Dispatcher
More small jobs left
More small jobs right
Balanced small jobs
Guardrails
SRP
T
SRP
T
19
Dispatcher
More small jobs left
More small jobs right
Balanced small jobs
Guardrails
20
SRP
T
SRP
T
DispatcherTo Do:• >2 job sizes?
• >2 servers?
Guardrails
21
SRP
T
SRP
T
Dispatcher>2 job sizes?
Prob.
Size
Guardrails: Bucketing
22
Dispatcher
SRP
T
SRP
T
Guardrails: Bucketing
23
[1, 10] [10, 100] [100, 1000]
… …
SRP
T
SRP
T
Guardrails: Bucketing
24
[1, 10] [10, 100] [100, 1000]
… …
SRP
T
SRP
T
SRP
T
SRP
T
Precise Dispatching Requirement
Job of size 𝑥 has rank 𝑟 ↔ 𝑥 ∈ [𝑐𝑟 , 𝑐𝑟+1)*
𝑉𝑖𝑟 𝑡 = Volume of rank 𝑟 work dispatched to server 𝑖 by time 𝑡.
Guardrail requirement:
∀ ranks 𝑟, ∀ servers 𝑖, 𝑗, ∀ times 𝑡,
25
* 𝑐 is chosen as a function of load 𝜌.
| 𝑉𝑖𝑟 𝑡 − 𝑉𝑗
𝑟 𝑡 | ≤ 𝑐𝑟+1
The Guardrail Theorem
26
PossiblyVery bad
Disp. Policy P
SRP
T
SRP
T
SRP
T
SRP
T
k≈w.r.t.E[T]
Guaranteed heavy traffic optimal
→Disp. Policy P +
Guardrails
1
SRP
T
1
SRP
T
1SR
PT
lim𝜌→1
𝐸[Resp. Time of Disp. Policy P with Guardrails]
𝐸[Resp. Time of Single SRPT Superserver]= 1, ∀𝑃