Join-the-Shortest-Queue (JSQ) Routing in Web Server Farms VARUN GUPTA Joint with: Mor Harchol-Balter Carnegie Mellon Univ. Karl Sigman Columbia Univ. Ward Whitt Columbia Univ.

Post on 21-Dec-2015


Join-the-Shortest-Queue (JSQ) Routing in Web Server Farms

VARUN GUPTA

Joint with:

Mor Harchol-Balter

Carnegie Mellon Univ.

Karl Sigman

Columbia Univ.

Ward Whitt

Columbia Univ.


Application: Web server farms

Commodity web servers

Local Router (Immediate Dispatch)

JSQ: most popular policy
– Cisco Local Director
– IBM Network Dispatcher
– …

Timeshare service among current requests



Model: PS server farm with JSQ

[Diagram: Poisson arrivals at rate λ reach a JSQ immediate-dispatch router that feeds K PS servers]

• K homogeneous processor-sharing (PS) servers
• Poisson arrivals
• Job sizes i.i.d. ~ G

≡ M/G/K/JSQ/PS


Why join the shortest queue?

• Dynamic load balancing

• Simple

• Greedy for PS server farm
– share server with minimum # of jobs


Prior Analysis of JSQ routing

2-server:

[Kingman 61] , [Flatto, McKean 77], [Cohen, Boxma 83], [Wessels, Adan, Zijm 91]

[Foschini, Salz 78], [Knessl, Matkowsky, Schuss, Tier 87]

[Conolly 84], [Rao, Posner 87], [Blanc 87], [Grassmann 80]

>2-server approximations:

[Nelson, Philips, Sigmetrics 89]

[Lin, Raghavendra, TPDS 96]

[Lui, Muntz, Towsley 95]

Limited to FCFS servers and mostly exponential job size distributions

OUR GOAL: Analyze JSQ with PS servers and general job size distributions; interested in mean response time, E[T]


GOAL: Analysis of JSQ with PS servers

• Observe: with exponential job sizes, M/M/K/JSQ/FCFS and M/M/K/JSQ/PS have the same joint queue length distribution, and approximations exist for M/M/K/JSQ/FCFS

• How about general job sizes?

GOAL: Effect of job size variability on JSQ/PS


Goal: Effect of job size variability on JSQ/PS

Idea: Look at H2*(μ, p) distribution
– 2 degrees of freedom
– can fix mean and control variance

THEOREM: E[T] insensitive under H2* jobs
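To make the "fix mean and control variance" remark concrete, here is a short worked calculation. It assumes the convention that the (1-p) labels on the proof slide suggest: a job has size 0 with probability p and size Exp(μ) with probability 1-p. The symbols X (job size) and C² (squared coefficient of variation) are introduced here for illustration.

```latex
\begin{align*}
E[X] &= \frac{1-p}{\mu}, \qquad E[X^2] = \frac{2(1-p)}{\mu^2},\\
\mathrm{Var}[X] &= \frac{2(1-p)}{\mu^2} - \frac{(1-p)^2}{\mu^2}
                 = \frac{(1-p)(1+p)}{\mu^2},\\
C^2 &= \frac{\mathrm{Var}[X]}{E[X]^2} = \frac{1+p}{1-p}.
\end{align*}
```

Under this assumption, holding the mean at m means choosing μ = (1-p)/m, and sweeping p from 0 toward 1 then moves C² from 1 (plain exponential) toward infinity, which is the sense in which the two parameters fix the mean and control the variance.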


THEOREM: E[T] insensitive under H2* job size distribution

PROOF:

M/H2*/K/JSQ/PS, arrival rate λ, job sizes H2*(μ, p)

has the same stationary queue length distribution as

M/M/K/JSQ/PS, arrival rate λ(1-p), job sizes Exp(μ)

has the same stationary queue length distribution as

M/M/K/JSQ/PS, arrival rate λ, job sizes Exp(μ/(1-p)) (equal mean size)

Q: What happens to 0-sized jobs?
A: Disappear on arrival
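The last step from equal queue length distributions to equal E[T] is not spelled out on the slide. One way to close it, sketched here as an assumption about the intended argument, is Little's law applied to the first and last systems, which share the arrival rate λ. N denotes the stationary number of jobs in the system; the subscripts are labels introduced for this sketch.

```latex
\begin{align*}
E[T]_{M/H_2^*/K/JSQ/PS}
  = \frac{E[N]_{M/H_2^*/K/JSQ/PS}}{\lambda}
  = \frac{E[N]_{M/M/K/JSQ/PS}}{\lambda}
  = E[T]_{M/M/K/JSQ/PS}
\end{align*}
```

The middle equality uses the equal stationary queue length distributions. The zero-sized jobs are counted among the arrivals of the first system, but they contribute zero response time and never occupy a queue.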


Insensitivity for general distributions?

Simulate M/G/K/JSQ/PS under the following 7 distributions (all with mean 2):

1. Deterministic var=0

2. Erlang2 var=2

3. Exponential var=4

4. Bimodal(1,11) var=9

5. Weibull-1 var=20

6. Weibull-2 var=76

7. Bimodal(1,101) var=99

Heavy-tailed
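The variances listed for the two bimodal cases can be checked by hand. The sketch below assumes that "Bimodal(a,b)" means a two-point distribution on the values a and b, with the probabilities chosen so that the mean is 2; the function name `two_point_moments` is made up for this illustration.

```python
def two_point_moments(low, high, mean):
    """Return (P[X = high], Var[X]) for a two-point distribution on {low, high}."""
    # The mean constraint pins down the probability of the larger value.
    p_high = (mean - low) / (high - low)
    variance = (1 - p_high) * (low - mean) ** 2 + p_high * (high - mean) ** 2
    return p_high, variance


for low, high in [(1, 11), (1, 101)]:
    p_high, variance = two_point_moments(low, high, mean=2)
    print(f"Bimodal({low},{high}): P[X={high}] = {p_high:.2f}, var = {variance:.1f}")
```

Under that reading the variances come out as 9 and 99, matching items 4 and 7. The same kind of check works for Erlang-2 (variance = mean²/2 = 2) and Exponential (variance = mean² = 4).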


Simulation results

[Figure: two panels, number of servers = 2 and number of servers = 8, each plotting E[T] (95% conf intervals) against increasing job size variability; in both panels E[T] shows < 2% deviation from Exp]


Goal: Effect of variability on JSQ/PS

Conclusion:

E[T] is “nearly insensitive” to variability of G


Why is JSQ/PS “near-insensitive”?

Maybe just because M/G/1/PS is insensitive.

Which of the following do you think are insensitive?

[Diagram: a router with an unspecified policy (???) feeding PS servers]

• RANDOM – randomly select one of K servers
• Round Robin – cyclic assignment
• Least Work Left – join the server with the smallest total remaining work

Maybe all routing policies are near-insensitive.
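The policies being compared differ only in what the router looks at when a job arrives. A minimal sketch of the four dispatch rules follows. It assumes each server is represented as a list of the remaining sizes of its jobs, and that ties go to the lowest-numbered server; the representation and all names here are illustrative, not taken from the talk.

```python
import random
from typing import Sequence


def route_random(num_servers: int, rng: random.Random) -> int:
    """RANDOM: pick one of the K servers uniformly at random."""
    return rng.randrange(num_servers)


class RoundRobin:
    """Round Robin: cyclic assignment 0, 1, ..., K-1, 0, ..."""

    def __init__(self, num_servers: int) -> None:
        self.num_servers = num_servers
        self.next_server = 0

    def route(self) -> int:
        chosen = self.next_server
        self.next_server = (chosen + 1) % self.num_servers
        return chosen


def route_lwl(servers: Sequence[Sequence[float]]) -> int:
    """Least Work Left: smallest total remaining work (needs job sizes)."""
    work = [sum(jobs) for jobs in servers]
    return min(range(len(work)), key=work.__getitem__)


def route_jsq(servers: Sequence[Sequence[float]]) -> int:
    """Join the Shortest Queue: fewest jobs (needs only queue lengths)."""
    counts = [len(jobs) for jobs in servers]
    return min(range(len(counts)), key=counts.__getitem__)
```

For example, with `servers = [[5.0], [1.0, 1.0]]`, JSQ picks server 0 (one job versus two) while Least Work Left picks server 1 (2 units of work versus 5), which shows that the two rules can disagree.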


[Figure: E[T] (axis ticks 10 to 20) for job size distributions Det, Exp, Bim-1, Weib-1, Weib-2, Bim-2 under RANDOM, R-R, LWL, and JSQ routing to PS servers; number of servers = 2]

“Near-insensitivity” of JSQ is non-trivial (but cool)!


Recap

JSQ/PS “nearly insensitive” to variability

• E[T] of M/G/K/JSQ/PS = E[T] of M/M/K/JSQ/PS (THEOREM: equality for H2*)
• M/M/K/JSQ/FCFS: approximations exist


Outline

PART I: JSQ/PS “nearly insensitive” to variability

PART II: Investigate new approaches for M/M/K/JSQ

PART III: Is JSQ the best routing policy for PS servers?


Single Queue Approximation (SQA)

M/M/K/JSQ/PS → ??/M/1/PS

Model queue 1 as an independent PS queue with state (queue length) dependent arrival rates: Mn/M/1/PS

λ(n) = (# arrivals into queue 1 finding n jobs) / (total time there are n jobs in queue 1)

Captures the effect of other queues in the JSQ system


Single Queue Approximation (SQA)

Intuition test

M/M/K/JSQ/PS → Mn/M/1/PS, with λ(n) = (# arrivals into queue 1 finding n jobs) / (total time there are n jobs in queue 1)

Q1: Which is true?
a. λ(0) = λ/K
b. λ(0) < λ/K
c. λ(0) > λ/K

Q2: Which is true?
a. λ(0) = λ(1)
b. λ(0) < λ(1)
c. λ(0) > λ(1)

Q3: λ(n) as n→∞
a. 0
b. λ/K
c. (λ/K)^K
d. None of the above

THEOREM: lim λ(n) = (λ/2)² as n→∞, when K=2.
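The intuition-test questions can be explored numerically. Below is a minimal simulation sketch that estimates λ(n) from its definition (arrivals into queue 1 that find n jobs, divided by the time queue 1 holds n jobs). It assumes exponential job sizes with mean 1, so each busy PS server completes jobs at rate 1, and it breaks JSQ ties uniformly at random; the function name and parameters are hypothetical.

```python
import random


def estimate_lambda_n(lam, num_servers, num_events, max_n, seed=0):
    """Estimate lambda(0..max_n) for queue 1 of an M/M/K/JSQ/PS system."""
    rng = random.Random(seed)
    queues = [0] * num_servers
    arrivals_seen = [0] * (max_n + 1)    # arrivals into queue 1 finding n jobs
    time_in_state = [0.0] * (max_n + 1)  # time queue 1 spends holding n jobs
    for _ in range(num_events):
        busy = [i for i, q in enumerate(queues) if q > 0]
        total_rate = lam + len(busy)     # assumed service rate 1 per busy server
        dwell = rng.expovariate(total_rate)
        n1 = queues[0]
        if n1 <= max_n:
            time_in_state[n1] += dwell
        if rng.random() < lam / total_rate:
            # Arrival: join a shortest queue, ties broken at random.
            shortest = min(queues)
            target = rng.choice([i for i, q in enumerate(queues) if q == shortest])
            if target == 0 and n1 <= max_n:
                arrivals_seen[n1] += 1
            queues[target] += 1
        else:
            # Departure: every busy server is equally likely to finish next.
            queues[rng.choice(busy)] -= 1
    return [a / t if t > 0 else float("nan")
            for a, t in zip(arrivals_seen, time_in_state)]


rates = estimate_lambda_n(lam=1.8, num_servers=2, num_events=200_000, max_n=3)
print([round(r, 2) for r in rates], "versus lam/K =", 1.8 / 2)
```

Running it for K=2 at per-server load 0.9 gives estimates that can be compared against λ/K for Q1 and against each other for Q2. Estimates for large n get noisy, because queue 1 rarely holds many jobs.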


Single Queue Approximation (SQA)

M/M/K/JSQ/PS → Mn/M/1/PS, with λ(n) = (# arrivals into queue 1 finding n jobs) / (total time there are n jobs in queue 1)

THEOREM: π_n = x_n, where π_n = Pr{n jobs in queue 1} of M/M/K/JSQ/PS and x_n = Pr{n jobs} in Mn/M/1/PS

Where is the approximation?

Don’t know the exact λ(n)’s!


Single Queue Approximation (SQA)

Approximations for λ(0), λ(1), …, λ(n)

• Recall: λ(n) → (λ/K)^K as n→∞
• For n≥3, λ(n) ≈ (λ/K)^K
• Obtain closed form functional approx for λ(0), λ(1), λ(2)
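Once the λ(n) values are available, the Mn/M/1/PS queue is a birth-death chain and E[T] follows from its stationary distribution. The sketch below shows that step only. It takes λ(0), λ(1), λ(2) as inputs, since the closed-form approximations for them are not given in the slides. It assumes mean job size 1 (service rate 1), uses the tail rate (λ/K)^K for all larger n, and converts the mean queue length to E[T] with Little's law and symmetry across the K queues. The function name is hypothetical.

```python
from typing import Sequence


def sqa_mean_response_time(lam: float, num_servers: int,
                           head_rates: Sequence[float]) -> float:
    """E[T] from state-dependent arrival rates, assuming service rate 1."""
    tail_rate = (lam / num_servers) ** num_servers
    if not 0 < tail_rate < 1:
        raise ValueError("need 0 < (lam/K)^K < 1 for a stable queue")
    # Unnormalized stationary weights: w[n+1] = w[n] * lambda(n) / 1.
    weights = [1.0]
    for rate in head_rates:
        weights.append(weights[-1] * rate)
    h = len(head_rates)
    w_h = weights[h]
    # States below h are summed directly; states from h upward form a
    # geometric series with ratio tail_rate.
    head_mass = sum(weights[:h])
    head_jobs = sum(n * w for n, w in enumerate(weights[:h]))
    tail_mass = w_h / (1 - tail_rate)
    tail_jobs = w_h * (h / (1 - tail_rate) + tail_rate / (1 - tail_rate) ** 2)
    mean_jobs_queue1 = (head_jobs + tail_jobs) / (head_mass + tail_mass)
    # Little's law on the whole system: K * E[N_1] jobs, arrival rate lam.
    return num_servers * mean_jobs_queue1 / lam


# Sanity check: K=1 with constant rates is a plain M/M/1 queue.
print(sqa_mean_response_time(lam=0.5, num_servers=1, head_rates=[0.5, 0.5, 0.5]))
```

With K=1 and all rates equal to λ = 0.5 the chain reduces to M/M/1, where E[T] = 1/(1-λ) = 2, which is a quick way to confirm the bookkeeping before plugging in JSQ-specific rates.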


Results (SQA)

[Figure: E[T] (axis ticks 1 to 6) versus number of servers K (axis 0 to 60) at per-server load = 0.9, comparing Simulation and SQA]

< 2% error for E[T] for up to 64 servers


Outline

PART I: JSQ/PS “nearly insensitive” to variability

PART II: Accurate approximation for M/M/K/JSQ

PART III: Is JSQ the best routing policy for PS servers?


To JSQ or not to JSQ, that is the question..

[Figure: E[T] (axis ticks 10 to 20) for job size distributions Det, Exp, Bim-1, Weib-1, Weib-2, Bim-2 under RANDOM, R-R, LWL, and JSQ routing to PS servers]

OPT-0 – minimize average response time given no more arrivals
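One possible reading of OPT-0 can be sketched in code. It assumes the router knows the remaining size of every job, which the slide does not state explicitly, and sends the arriving job to the server where it adds least to the total response time if nothing else ever arrives. For a PS server holding n jobs with remaining sizes sorted in ascending order, the j-th smallest job (counting from 0) contributes (2(n-j)-1) times its size to the sum of completion times. All names are illustrative.

```python
from typing import Sequence


def ps_total_response_time(sizes: Sequence[float]) -> float:
    """Sum of completion times on one PS server with no further arrivals."""
    ordered = sorted(sizes)
    n = len(ordered)
    return sum((2 * (n - j) - 1) * s for j, s in enumerate(ordered))


def route_opt0(servers: Sequence[Sequence[float]], new_size: float) -> int:
    """Pick the server where the new job adds least total response time."""
    def added_cost(jobs: Sequence[float]) -> float:
        with_job = ps_total_response_time(list(jobs) + [new_size])
        return with_job - ps_total_response_time(jobs)

    costs = [added_cost(jobs) for jobs in servers]
    return min(range(len(costs)), key=costs.__getitem__)


# Two long jobs on server 0, three tiny jobs on server 1, new job of size 5.
servers = [[10.0, 10.0], [0.1, 0.1, 0.1]]
print(route_opt0(servers, new_size=5.0))
```

In this example the rule under this reading picks server 1, even though JSQ would pick server 0 because it holds fewer jobs. The comparison with JSQ is meaningful precisely because JSQ has no access to job sizes.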


To JSQ or not to JSQ, that is the question..

[Figure: E[T] (axis ticks 10 to 20) for job size distributions Det, Exp, Bim-1, Weib-1, Weib-2, Bim-2 under RANDOM, R-R, LWL, JSQ, and OPT-0 routing to PS servers]

CONJEC: Minimum E[T] over all distributions, routing policies

Compare here for optimality


To JSQ or not to JSQ, that is the question..

Conclusion:

JSQ is near optimal,

without knowing job sizes or distribution


Conclusions

• JSQ/PS exhibits near-insensitivity to job size variability: M/G/K/JSQ/PS ≈ M/M/K/JSQ/PS
– THM: H2* equivalence

• SQA method to analyze M/M/K/JSQ/PS: M/M/K/JSQ/PS = Mn/M/1/PS
– THM: λ(n) convergence
– THM: Single queue equivalence

• JSQ is near-optimal for all job size distributions