Post on 21-Dec-2015
Join-the-Shortest-Queue (JSQ) Routing in Web Server Farms
VARUN GUPTA
Joint with:
Mor Harchol-Balter (Carnegie Mellon Univ.)
Karl Sigman (Columbia Univ.)
Ward Whitt (Columbia Univ.)
2
Application: Web server farms
• Commodity web servers behind a local router (immediate dispatch)
• Each server timeshares service among its current requests
• JSQ: most popular routing policy
– Cisco Local Director
– IBM Network Dispatcher
– …
5
Model: PS server farm with JSQ
[Diagram: Poisson arrivals at rate λ reach a JSQ router (immediate dispatch) that feeds PS servers]
• K homogeneous processor-sharing (PS) servers
• Poisson arrivals
• Job sizes i.i.d. ~ G
≡ M/G/K/JSQ/PS
6
Why join the shortest queue?
• Dynamic load balancing
• Simple
• Greedy for PS server farm
– share server with minimum # of jobs
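The greedy rule above can be sketched in a few lines. This is a minimal illustration, not the dispatcher of any product named in the talk; the function name `jsq_dispatch` and the uniform random tie-breaking are assumptions, since the slides do not say how ties are resolved.

```python
import random

def jsq_dispatch(queue_lengths, rng=random):
    """Return the index of a server holding the fewest jobs.

    Ties are broken uniformly at random (an assumption, not from the slides).
    """
    shortest = min(queue_lengths)
    candidates = [i for i, n in enumerate(queue_lengths) if n == shortest]
    return rng.choice(candidates)

# Server 1 holds the fewest jobs, so the arriving job joins it.
queues = [3, 1, 2]
target = jsq_dispatch(queues)
queues[target] += 1
print(target, queues)  # 1 [3, 2, 2]
```

The router only needs the current number of jobs at each server, which is why the policy works without knowing job sizes.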
7
Prior Analysis of JSQ routing
2-server:
[Kingman 61] , [Flatto, McKean 77], [Cohen, Boxma 83], [Wessels, Adan, Zijm 91]
[Foschini, Salz 78], [Knessl, Matkowsky, Schuss, Tier 87]
[Conolly 84], [Rao, Posner 87], [Blanc 87], [Grassmann 80]
>2-server approximations:
[Nelson, Philips, Sigmetrics 89]
[Lin, Raghavendra, TPDS 96]
[Lui, Muntz, Towsley 95]
Prior work: limited to FCFS servers and mostly exponential job size distributions
[Diagram: JSQ router feeding FCFS servers]
OUR GOAL: Analyze JSQ with PS servers and general job size distributions; we are interested in the mean response time, E[T]
8
GOAL: Analysis of JSQ with PS servers
• Observe: with exponential job sizes, M/M/K/JSQ/FCFS and M/M/K/JSQ/PS have the same joint queue length distribution, and approximations exist for M/M/K/JSQ/FCFS
• How about general job sizes?
[Diagrams: JSQ router feeding FCFS servers; JSQ router feeding PS servers]
GOAL: Effect of job size variability on JSQ/PS
9
Goal: Effect of job size variability on JSQ/PS
Idea: Look at the H2*(μ, p) distribution
• 2 degrees of freedom
• can fix the mean and control the variance
THEOREM: E[T] is insensitive under H2* job sizes
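To make the "two degrees of freedom" concrete, here is a small sketch that fits the two parameters to a target mean and variance. It assumes that a job has size 0 with probability p and an Exp(μ) size with probability 1 - p; this is my reading of the slides, not a definition they state. The function names are hypothetical.

```python
def h2star_moments(mu, p):
    """Mean and variance of a size that is 0 w.p. p and Exp(mu) w.p. 1 - p (assumed form)."""
    mean = (1 - p) / mu
    second_moment = 2 * (1 - p) / mu ** 2
    return mean, second_moment - mean ** 2

def h2star_fit(mean, variance):
    """Pick (mu, p) that match a target mean and variance.

    Under the assumed form the squared coefficient of variation is
    (1 + p) / (1 - p), so only variance >= mean**2 is reachable.
    """
    scv = variance / mean ** 2
    if scv < 1:
        raise ValueError("this family needs variance >= mean**2")
    p = (scv - 1) / (scv + 1)
    mu = (1 - p) / mean
    return mu, p

mu, p = h2star_fit(mean=2.0, variance=12.0)
print(mu, p, h2star_moments(mu, p))
```

Holding the mean fixed and raising p increases the variance without bound, which is what lets the family probe the effect of variability.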
10
THEOREM: E[T] is insensitive under the H2* job size distribution
PROOF:
1. M/H2*/K/JSQ/PS: arrival rate λ, job sizes ~ H2*(μ, p)
2. M/M/K/JSQ/PS: arrival rate (1-p)λ, job sizes ~ Exp(μ)
   Same stationary queue length distribution as system 1.
   Q: What happens to 0-sized jobs? A: They disappear on arrival.
3. M/M/K/JSQ/PS: arrival rate λ, job sizes ~ Exp(μ/(1-p)), equal mean size to H2*(μ, p)
   Same stationary queue length distribution as system 2.
11
Insensitivity for general distributions?
Simulate M/G/K/JSQ/PS under the following 7 distributions (all with mean 2; the set includes heavy-tailed distributions):
1. Deterministic var=0
2. Erlang2 var=2
3. Exponential var=4
4. Bimodal(1,11) var=9
5. Weibull-1 var=20
6. Weibull-2 var=76
7. Bimodal(1,101) var=99
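The variances quoted for the non-Weibull entries can be checked with a short calculation. The sketch below assumes that Bimodal(a,b) denotes a two-point distribution on {a, b} with its weights chosen to give mean 2; that reading is an assumption, though it reproduces the stated variances. The Weibull entries are left out because the slides do not give their parameters.

```python
def two_point_variance(low, high, mean):
    """Variance of a distribution on {low, high} whose weights give the stated mean."""
    q = (mean - low) / (high - low)  # probability of the high value
    second_moment = (1 - q) * low ** 2 + q * high ** 2
    return second_moment - mean ** 2

def erlang_variance(stages, mean):
    """Variance of an Erlang distribution with the given number of stages and mean."""
    return mean ** 2 / stages

mean = 2.0
variances = {
    "Deterministic": 0.0,
    "Erlang2": erlang_variance(2, mean),
    "Exponential": erlang_variance(1, mean),  # Exponential is Erlang with 1 stage
    "Bimodal(1,11)": two_point_variance(1, 11, mean),
    "Bimodal(1,101)": two_point_variance(1, 101, mean),
}
for name, var in variances.items():
    print(f"{name}: var = {var:.2f}")
```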
12
Simulation results
[Figure: two panels (number of servers = 2 and number of servers = 8) plotting E[T] with 95% confidence intervals against job size distributions ordered by increasing variability. In both panels E[T] shows < 2% deviation from Exp.]
13
Goal: Effect of variability on JSQ/PS
Conclusion:
E[T] is “nearly insensitive” to variability of G
14
Why is JSQ/PS “near-insensitive”?
Maybe just because M/G/1/PS is insensitive.
Which of the following do you think are insensitive?
[Diagram: router with an unknown policy (???) feeding PS servers]
• RANDOM – randomly select one of K servers
• Round Robin – cyclic assignment
• Least Work Left – join the server with the smallest total remaining work
Maybe all routing policies are near-insensitive.
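For comparison, the routing policies named above can be written side by side. This is a minimal sketch under stated assumptions: each server is represented as a list of remaining job sizes, ties go to the lowest-numbered server, and all function names are hypothetical.

```python
import itertools
import random

def route_random(num_servers, rng=random):
    """RANDOM: pick one of the K servers uniformly at random."""
    return rng.randrange(num_servers)

def make_round_robin(num_servers):
    """Round Robin: cyclic assignment 0, 1, ..., K-1, 0, 1, ..."""
    order = itertools.cycle(range(num_servers))
    return lambda: next(order)

def route_least_work_left(servers):
    """Least Work Left: join the server with the smallest total remaining work."""
    totals = [sum(jobs) for jobs in servers]
    return totals.index(min(totals))

def route_jsq(servers):
    """JSQ: join the server with the fewest jobs."""
    counts = [len(jobs) for jobs in servers]
    return counts.index(min(counts))

# Server 0 holds one large job; server 1 holds two small jobs.
servers = [[5.0], [1.0, 1.5]]
next_rr = make_round_robin(len(servers))
print(route_jsq(servers), route_least_work_left(servers))  # 0 1
print(next_rr(), next_rr(), next_rr())                     # 0 1 0
```

The example state shows JSQ and Least Work Left disagreeing: JSQ looks only at job counts, while Least Work Left needs the remaining sizes.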
18
[Figure: E[T] (vertical axis, 10 to 20) for RANDOM, R-R, LWL, and JSQ routing to PS servers, across job size distributions Det, Exp, Bim-1, Weib-1, Weib-2, Bim-2; number of servers = 2]
“Near-insensitivity” of JSQ is non-trivial (but cool)!
19
Recap
JSQ/PS is “nearly insensitive” to variability
• M/M/K/JSQ/FCFS: approximations exist
• E[T] of M/M/K/JSQ/FCFS = E[T] of M/M/K/JSQ/PS
• E[T] of M/G/K/JSQ/PS ≈ E[T] of M/M/K/JSQ/PS
• THEOREM: equality for H2*
20
Outline
PART I: JSQ/PS is “nearly insensitive” to variability
– E[T] of M/G/K/JSQ/PS ≈ E[T] of M/M/K/JSQ/PS = E[T] of M/M/K/JSQ/FCFS (approximations exist)
– THEOREM: equality for H2*
PART II: Investigate new approaches for M/M/K/JSQ
PART III: Is JSQ the best routing policy for PS servers?
21
Single Queue Approximation (SQA)
M/M/K/JSQ/PS ≈ M_n/M/1/PS
Model queue 1 as an independent PS queue with state-dependent (queue-length-dependent) arrival rates: ??/M/1/PS becomes M_n/M/1/PS
λ(n) = (# arrivals into queue 1 finding n jobs) / (total time there are n jobs in queue 1)
λ(n) captures the effect of the other queues in the JSQ system
22
Single Queue Approximation (SQA)
Intuition test
M/M/K/JSQ/PS ≈ M_n/M/1/PS
λ(n) = (# arrivals into queue 1 finding n jobs) / (total time there are n jobs in queue 1)
Q1: Which is true?
a. λ(0) = λ/K
b. λ(0) < λ/K
c. λ(0) > λ/K
Q2: Which is true?
a. λ(0) = λ(1)
b. λ(0) < λ(1)
c. λ(0) > λ(1)
Q3: λ(n) as n→∞
a. 0
b. λ/K
c. (λ/K)^K
d. None of the above
THEOREM: lim_{n→∞} λ(n) = (λ/2)^2 when K=2.
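The quiz questions can be explored numerically by estimating λ(n) straight from its definition as a ratio of arrival counts to time spent. The sketch below simulates the queue-length process of an M/M/K/JSQ system; it assumes unit mean job size (μ = 1), uniform random tie-breaking, and hypothetical names throughout. With exponential job sizes a busy PS server completes jobs at total rate μ, so only queue lengths need tracking.

```python
import random
from collections import defaultdict

def estimate_lambda_n(lam, K=2, mu=1.0, num_events=200_000, seed=1):
    """Estimate lambda(n) = (# arrivals to queue 1 finding n jobs) / (time queue 1 holds n jobs)."""
    rng = random.Random(seed)
    queues = [0] * K
    time_in = defaultdict(float)  # time queue 1 (index 0) spends with n jobs
    arrivals = defaultdict(int)   # arrivals routed to queue 1 that found n jobs there
    for _ in range(num_events):
        busy = [i for i in range(K) if queues[i] > 0]
        total_rate = lam + mu * len(busy)
        time_in[queues[0]] += rng.expovariate(total_rate)
        if rng.random() < lam / total_rate:
            # Arrival: join a shortest queue, ties broken at random (assumption).
            shortest = min(queues)
            target = rng.choice([i for i in range(K) if queues[i] == shortest])
            if target == 0:
                arrivals[queues[0]] += 1
            queues[target] += 1
        else:
            # Departure from a uniformly chosen busy server (all complete at rate mu).
            queues[rng.choice(busy)] -= 1
    return {n: arrivals[n] / time_in[n] for n in sorted(time_in)}

lam = 1.0
estimates = estimate_lambda_n(lam)
for n in list(estimates)[:4]:
    print(n, round(estimates[n], 3))
print("lam/K =", lam / 2, " (lam/2)^2 =", (lam / 2) ** 2)
```

Estimates for larger n rest on fewer visits to those states and are therefore noisier, so a longer run is needed before comparing them with the limit in the theorem.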
23
Single Queue Approximation (SQA)
M/M/K/JSQ/PS ≡ M_n/M/1/PS
λ(n) = (# arrivals into queue 1 finding n jobs) / (total time there are n jobs in queue 1)
THEOREM: π_n = x_n, where π_n = Pr{n jobs in queue 1} in the JSQ system and x_n = Pr{n jobs} in the M_n/M/1/PS queue
Where is the approximation?
We don’t know the exact λ(n)’s!
24
Single Queue Approximation (SQA)
M/M/K/JSQ/PS ≈ M_n/M/1/PS
λ(n) = (# arrivals into queue 1 finding n jobs) / (total time there are n jobs in queue 1)
Approximations for λ(0), λ(1), …, λ(n)
• For n≥3, λ(n) ≈ (λ/K)^K
• Obtain closed-form functional approximations for λ(0), λ(1), λ(2)
Recall: λ(n) → (λ/K)^K as n→∞
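Once the λ(n) values are available, the single queue is a birth-death process whose stationary probabilities and mean response time can be computed directly. The sketch below assumes unit mean job size (so the departure rate is 1) and truncates the state space at `max_n`; the closed-form approximations for λ(0), λ(1), λ(2) are not reproduced in the slides, so `head` is a caller-supplied input and all names are hypothetical.

```python
def make_lambda_n(head, lam, K):
    """Arrival-rate function: head[n] for small n, then the tail value (lam/K)**K."""
    tail = (lam / K) ** K
    return lambda n: head[n] if n < len(head) else tail

def sqa_mean_response_time(lam, K, lambda_n, max_n=2000):
    """E[T] from the stationary distribution of the M_n/M/1/PS queue (unit service rate)."""
    # Birth-death balance: x_{n+1} = x_n * lambda(n) / 1.
    weights = [1.0]
    for n in range(max_n):
        weights.append(weights[-1] * lambda_n(n))
    total = sum(weights)
    mean_jobs = sum(n * w for n, w in enumerate(weights)) / total
    # Little's law for one queue, which by symmetry receives arrivals at rate lam / K.
    return mean_jobs / (lam / K)

# Sanity check with constant lambda(n) = lam / K (no JSQ effect): this is M/M/1 at load 0.9.
lam, K = 1.8, 2
print(sqa_mean_response_time(lam, K, lambda n: lam / K))  # close to 1 / (1 - 0.9) = 10
```

Passing `make_lambda_n(head, lam, K)` with three head values gives the form described on the slide: explicit rates for n = 0, 1, 2 and the limiting rate for n≥3.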
26
Results (SQA)
[Figure: E[T] (vertical axis, 1 to 6) versus number of servers K (0 to 60) at per-server load = 0.9, comparing Simulation and SQA]
< 2% error for E[T] for up to 64 servers
27
Outline
PART I: JSQ/PS is “nearly insensitive” to variability
– E[T] of M/G/K/JSQ/PS ≈ E[T] of M/M/K/JSQ/PS = E[T] of M/M/K/JSQ/FCFS (approximations exist)
– THEOREM: equality for H2*
PART II: Accurate approximation for M/M/K/JSQ
PART III: Is JSQ the best routing policy for PS servers?
29
To JSQ or not to JSQ, that is the question..
[Figure: E[T] (vertical axis, 10 to 20) for RANDOM, R-R, LWL, JSQ, and OPT-0 routing to PS servers, across job size distributions Det, Exp, Bim-1, Weib-1, Weib-2, Bim-2]
OPT-0 – minimize average response time given no more arrivals
30
To JSQ or not to JSQ, that is the question..
[Figure: E[T] (vertical axis, 10 to 20) for RANDOM, R-R, LWL, JSQ, and OPT-0 routing to PS servers, across job size distributions Det, Exp, Bim-1, Weib-1, Weib-2, Bim-2, with a marker labeled “Compare here for optimality”]
CONJEC: Minimum E[T] over all distributions, routing policies
31
To JSQ or not to JSQ, that is the question..
Conclusion:
JSQ is near optimal,
without knowing job sizes or distribution
32
Conclusions
• JSQ/PS exhibits near-insensitivity to job size variability
– M/G/K/JSQ/PS ≈ M/M/K/JSQ/PS
– THM: H2* equivalence
• SQA method to analyze M/M/K/JSQ/PS
– M/M/K/JSQ/PS = M_n/M/1/PS
– THM: λ(n) convergence
– THM: Single queue equivalence
• JSQ is near-optimal for all job size distributions