TRANSCRIPT
Quantifying the Properties of SRPT Scheduling
Mingwei Gong and Carey Williamson
Department of Computer Science, University of Calgary
July 22, 2003
Outline
Introduction
Background
Web Server Scheduling Policies
Related Work
Research Methodology
Simulation Results
Defining/Refining Unfairness
Quantifying Unfairness
Summary, Conclusions, and Future Work
Introduction
Web: large-scale, client-server system
WWW: World Wide Wait!
User-perceived Web response time is composed of several components:
Transmission delay, propagation delay in the network
Queueing delays at busy routers
Delays caused by TCP protocol effects (e.g., handshaking, slow start, packet loss, retransmissions)
Queueing delays at the Web server itself, which may be servicing 100s or 1000s of concurrent requests
Our focus in this work: Web request scheduling
Example Scheduling Policies
FCFS: First Come First Serve
typical policy for a single shared resource ("unfair")
e.g., drive-thru restaurant; Sens playoff tickets
PS: Processor Sharing
time-sharing a resource amongst M jobs
each job gets 1/M of the resource (equal, "fair")
e.g., CPU; VM; multi-tasking; Apache Web server
SRPT: Shortest Remaining Processing Time
pre-emptive version of Shortest Job First (SJF)
give resources to the job that will complete quickest
e.g., ??? (express lanes in a grocery store, almost)
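The difference between these policies comes down to which job gets the link at each instant. A minimal sketch of the three selection rules (my own illustration; the job tuples and function names are not from the talk):

```python
# Each job is a (arrival_time, remaining_size) tuple.

def fcfs_pick(jobs):
    # FCFS: serve the job that arrived earliest.
    return min(jobs, key=lambda j: j[0])

def srpt_pick(jobs):
    # SRPT: serve the job with the smallest remaining size (preemptive).
    return min(jobs, key=lambda j: j[1])

def ps_rates(jobs, capacity=1.0):
    # PS: every job in the system gets an equal 1/M share of capacity.
    return {id(j): capacity / len(jobs) for j in jobs}

backlog = [(0.0, 3038), (0.3, 949), (1.0, 2240)]
assert fcfs_pick(backlog) == (0.0, 3038)   # earliest arrival
assert srpt_pick(backlog) == (0.3, 949)    # smallest remaining size
assert set(ps_rates(backlog).values()) == {1.0 / 3}
```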
Related Work
Theoretical work: SRPT is provably optimal in terms of mean response time and mean slowdown (“classical” results)
Practical work: CMU: prototype implementation in Apache Web server. The results are consistent with theoretical work.
Concern: unfairness problem ("starvation"): large jobs may be penalized (but not always true!)
Related Work (Cont’d)
Harchol-Balter et al. show theoretical results:
For the largest jobs, the slowdown asymptotically converges to the same value under any preemptive, work-conserving scheduling policy (i.e., for these jobs, SRPT, or even LRPT, is no worse than PS)
For sufficiently large jobs, the slowdown under SRPT is only marginally worse than under PS: by at most a factor of 1 + ε, for small ε > 0
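For context, the PS baseline these results are measured against (a standard M/GI/1 fact, added here rather than taken from the slide; ρ denotes the system load):

```latex
% Expected slowdown under Processor Sharing is independent of job size x:
E[S(x)]_{\mathrm{PS}} = \frac{1}{1-\rho}
% The convergence result above: for any preemptive, work-conserving
% scheduling policy P, the slowdown of the largest jobs approaches
% the same limit,
\lim_{x \to \infty} E[S(x)]_{P} = \frac{1}{1-\rho}
```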
M. Harchol-Balter, K. Sigman, and A. Wierman, "Asymptotic Convergence of Scheduling Policies with Respect to Slowdown", Proceedings of IFIP Performance 2002, Rome, Italy, September 2002
Related Work (Cont’d)
[Wierman and Harchol-Balter 2003]:
A. Wierman and M. Harchol-Balter, "Classifying Scheduling Policies with Respect to Unfairness in an M/GI/1" (Best Paper), Proceedings of ACM SIGMETRICS, San Diego, CA, June 2003
[Figure: classification of scheduling policies by fairness, with regions "Always Unfair", "Sometimes Unfair", and "Always Fair"; policies shown include FCFS, LAS, LRPT, FSP, PLCFS, SRPT, SJF, and PS]
A Pictorial View
[Figure: slowdown vs. job size for PS and SRPT. PS is flat at 1/(1-ρ); SRPT lies below PS for small jobs, exceeds it in a "crossover region" (the "mystery hump") between sizes x and y, and shows "asymptotic convergence" back to 1/(1-ρ) for the largest jobs]
Research Questions
Do these properties hold in practice for empirical Web server workloads? (e.g., general arrival processes, service time distributions)
What does "sufficiently large" mean?
Is the crossover effect observable?
If so, for what range of job sizes?
Does it depend on the arrival process and the service time distribution? If so, how?
Is PS (the "gold standard") really "fair"?
Can we do better? If so, how?
Overview of Research Methodology
Trace-driven simulation of a simple Web server
Empirical Web server workload trace (1M requests from WorldCup'98) for main experiments
Synthetic Web server workloads for the sensitivity study experiments
Probe-based sampling methodology
Estimate job response time distributions for different job sizes, load levels, and scheduling policies
Graphical comparisons of results
Statistical tests of results (t-test, F-test)
Simulation Assumptions
User requests are for static Web content
Server knows response size in advance
Network bandwidth is the bottleneck
All clients are in the same LAN environment
Ignores variations in network bandwidth and propagation delay
Fluid flow approximation: service time = response size
Ignores packetization issues
Ignores TCP protocol effects
Ignores network effects
(These are consistent with SRPT literature)
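Under these assumptions, a simulator reduces to fluid-flow bookkeeping on a unit-rate link, so service time equals response size. A hedged sketch of such an event-driven SRPT simulator (my own illustration; names and structure are not from the talk):

```python
import heapq

def srpt_fluid(jobs):
    # jobs: list of (arrival_time, size); unit-rate link, so a job's
    # ideal service time equals its size. Returns {job_index: response_time}.
    events = sorted((t, i, s) for i, (t, s) in enumerate(jobs))
    active = []                 # min-heap of [remaining_size, job_index]
    now, k, done = 0.0, 0, {}
    while k < len(events) or active:
        if not active:          # server idle: jump to the next arrival
            now = events[k][0]
        while k < len(events) and events[k][0] <= now:
            _, i, s = events[k]
            heapq.heappush(active, [s, i])
            k += 1
        rem, i = active[0]      # SRPT: serve smallest remaining size
        next_arr = events[k][0] if k < len(events) else float("inf")
        if now + rem <= next_arr:       # completes before the next arrival
            now += rem
            heapq.heappop(active)
            done[i] = now - jobs[i][0]  # response time
        else:                           # next arrival may preempt
            active[0][0] -= next_arr - now
            heapq.heapify(active)
            now = next_arr
    return done

# A size-2 job arrives at t=0; a size-1 job at t=1 ties and runs after.
print(srpt_fluid([(0.0, 2.0), (1.0, 1.0)]))
# A size-2 job arriving at t=1 preempts the size-10 job.
print(srpt_fluid([(0.0, 10.0), (1.0, 2.0)]))
```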
Performance Metrics
Number of jobs in the system
Number of bytes in the system
Normalized slowdown: the slowdown of a job is its observed response time divided by the ideal response time if it were the only job in the system
Ranges between 1 and ∞; lower is better
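In code, with the fluid model's ideal time (a trivial sketch; the `link_rate` parameter is my own addition):

```python
def slowdown(response_time, size, link_rate=1.0):
    # Slowdown = observed response time / ideal response time,
    # where "ideal" means the job is alone in the system.
    ideal = size / link_rate
    return response_time / ideal

print(slowdown(2.0, 1.0))   # job delayed by others: slowdown 2.0
print(slowdown(5.0, 5.0))   # alone in the system: slowdown 1.0
```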
Empirical Web Server Workload
1998 WorldCup: Internet Traffic Archive: http://ita.ee.lbl.gov/
Item Value
Trace Duration 861 sec
Total Requests 1,000,000
Unique Documents 5,549
Total Transferred Bytes 3.3 GB
Smallest Transfer Size (bytes) 4
Largest Transfer Size (bytes) 2,891,887
Median Transfer Size (bytes) 889
Mean Transfer Size (bytes) 3,498
Standard Deviation (bytes) 18,815
TIMESTAMP SIZE
0.000000 3038
0.000315 949
0.001048 2240
0.004766 2051
0.005642 366
0.005872 201
0.006380 298
0.006742 1272
0.007271 597
0.008008 283
Preliminaries: An Example
[Figure: two time series for the start of the trace above, showing the number of jobs in the system and the number of bytes in the system over time, alongside a table of Time, Jobs in System, and Bytes in System]
Observations:
The “byte backlog” is the same for each scheduling policy
The busy periods are the same for each policy.
The distribution of the number of jobs in the system is different
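The first two observations follow because every policy considered is work-conserving: the unfinished work jumps by the job size at each arrival and otherwise drains at the link rate, no matter which job is being served. A sketch of that invariant (my own illustration, not from the talk):

```python
def byte_backlog(jobs, t, rate=1.0):
    # Unfinished work (bytes in system) at time t under ANY
    # work-conserving policy on a rate-`rate` link: it jumps by
    # `size` at each arrival and drains at `rate` in between.
    w, now = 0.0, 0.0
    for arrival, size in sorted(jobs):
        if arrival > t:
            break
        w = max(0.0, w - rate * (arrival - now)) + size
        now = arrival
    return max(0.0, w - rate * (t - now))

jobs = [(0.0, 3.0), (1.0, 5.0)]
print(byte_backlog(jobs, 2.0))   # 6.0 bytes still queued
print(byte_backlog(jobs, 10.0))  # 0.0: the busy period has ended
```

Since this trajectory depends only on the arrivals, the byte backlog and the busy periods are identical for FCFS, PS, and SRPT; only the jobs-in-system count differs.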
General Observations (Empirical trace)
Marginal distribution (number of jobs in system) for PS and SRPT: differences are more pronounced at higher loads
[Figure: three panels at loads 50%, 80%, and 95%]
Objectives (Restated)
Compare PS policy with SRPT policy
Confirm theoretical results in previous work (Harchol-Balter et al.)
For the largest jobsFor sufficiently large jobs
Quantify unfairness properties
Probe-Based Sampling Algorithm
The algorithm is based on the PASTA (Poisson Arrivals See Time Averages) principle.
[Diagram, shown for PS: a probe job is inserted at a randomly chosen point in the request stream; the simulation yields one slowdown sample; this is repeated N times]
Probe-based Sampling Algorithm
For scheduling policy S = (PS, SRPT, FCFS, LRPT, ...) do
  For load level U = (0.50, 0.80, 0.95) do
    For probe job size J = (1B, 1KB, 10KB, 1MB, ...) do
      For trial I = (1, 2, 3, ..., N) do
        Insert probe job at randomly chosen point;
        Simulate Web server scheduling policy;
        Compute and record slowdown value observed;
      end of I;
      Plot marginal distribution of slowdown results;
    end of J;
  end of U;
end of S;
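A runnable sketch of one inner trial loop. The talk simulates PS and SRPT; a FCFS fluid model is substituted here only to keep the example short, since under FCFS the probe's response time is simply the backlog it finds plus its own size. Uniformly random insertion points stand in for Poisson probe arrivals (PASTA). All names are illustrative:

```python
import random

def probe_slowdown_fcfs(jobs, probe_size, t_end, n_trials, seed=0):
    # One (policy, load, probe-size) cell of the sampling loop,
    # sketched for FCFS on a unit-rate fluid link.
    rng = random.Random(seed)
    samples = []
    for _ in range(n_trials):
        t = rng.uniform(0.0, t_end)          # random insertion point
        w, now = 0.0, 0.0                    # unfinished work so far
        for arrival, size in sorted(jobs):
            if arrival > t:
                break
            w = max(0.0, w - (arrival - now)) + size
            now = arrival
        w = max(0.0, w - (t - now))          # backlog seen by the probe
        samples.append((w + probe_size) / probe_size)
    return samples

# Empty system: every probe sees slowdown exactly 1.
print(probe_slowdown_fcfs([], 1.0, 10.0, 5))
```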
Example Results for 3 KB Probe Job
[Figure: slowdown distributions at loads 50%, 80%, and 95%]
Example Results for 100 KB Probe Job
[Figure: slowdown distributions at loads 50%, 80%, and 95%]
Example Results for 10 MB Probe Job
[Figure: slowdown distributions at loads 50%, 80%, and 95%]
Statistical Summary of Results
Two Aspects of Unfairness
Endogenous unfairness: (SRPT) Caused by an intrinsic property of a job, such as its size. This aspect of unfairness is invariant.
Exogenous unfairness: (PS)Caused by external conditions, such as the number of other jobs in the system, their sizes, and their arrival times.
Analogy: showing up at a restaurant without a reservation, wanting a table for k people
Observations for PS: exogenous unfairness dominant
PS is "fair"... sort of!
Observations for SRPT: endogenous unfairness dominant
Asymptotic Convergence? Yes!
Illustrating the crossover effect (load = 95%)
[Figure: the same data on linear and log scales, with job-size ticks at 3M, 3.5M, and 4M]
Crossover Effect? Yes!
Summary and Conclusions
Trace-driven simulation of Web server scheduling strategies, using a probe-based sampling methodology (probe jobs) to estimate response time (slowdown) distributions
Confirms asymptotic convergence of the slowdown metric for the largest jobs
Confirms the existence of the "crossover effect" for some job sizes under SRPT
Provides new insights into SRPT and PS
Two types of unfairness: endogenous vs. exogenous
PS is not really a "gold standard" for fairness!
Ongoing Work
Synthetic Web workloads
Sensitivity to arrival process (self-similar traffic)
Sensitivity to heavy-tailed job size distributions
Evaluate novel scheduling policies that may improve upon PS (e.g., FSP, k-SRPT, …)
Sensitivity to Arrival Process
A bursty arrival process (e.g., self-similar traffic, with Hurst parameter H > 0.5) makes things worse for both PS and SRPT policies
A bursty arrival process has greater impact on the performance of PS than on SRPT
PS exhibits higher exogenous unfairness than SRPT for all Hurst parameters and system loads tested
Sensitivity to Job Size Distribution
SRPT loves heavy-tailed distributions: the heavier the tail the better!
For all Pareto parameter values and all system loads considered, SRPT provides better performance than PS with respect to mean slowdown and standard deviation of slowdown
At high system load (U = 0.95), SRPT has more pronounced endogenous unfairness than PS
Thank You!
Questions?
Email: {gongm,carey}@cpsc.ucalgary.ca
For more information:
M. Gong and C. Williamson, "Quantifying the Properties of SRPT Scheduling", to appear, Proceedings of IEEE MASCOTS, Orlando, FL, October 2003