964 ieee transactions on computers, vol. 54, no. 8, …zbo/publications/zhou-tc05.pdf · robust...

Robust Processing Rate Allocation forProportional Slowdown Differentiation on

Internet ServersJianbin Wei, Student Member, IEEE, Xiaobo Zhou, Member, IEEE Computer Society, and

Cheng-Zhong Xu, Senior Member, IEEE

Abstract—A desirable behavior of an Internet server is that a request’s queuing delay depends on its service time in a linear fashion.

Measuring the quality of service in terms of slowdown, the ratio of a request’s queuing delay to its service time, provides a simple way

to attain the objective. Moreover, it treats client requests equally regardless of their service time, whereas response time favors

requests that need more processing resources. In this paper, we propose a proportional slowdown differentiation (PSD) service model

on Internet servers. It aims to maintain prespecified slowdown ratios between different classes of client requests. To provide PSD

services, we first derive a closed-form expression of the expected slowdown in an M=G=1 FCFS queuing system with a typical heavy-

tailed service time distribution, the bounded Pareto distribution. Based on the closed-form expression, we design a queuing-theoretic

strategy of processing-rate allocation. The rate allocation is realized by deploying a virtual server for each class. Simulation results

show that the strategy can provide controllable PSD services on Internet servers. It, however, comes along with large variance and

weak predictability due to the dynamics of Internet traffic. To address these issues, we design an integral feedback controller and

integrate it into the queuing-theoretic strategy. Simulation results demonstrate that the integrated strategy is robust and can deliver

predictable PSD services at a superior fine-grained level. We modified the Apache Web server with an implementation of the

integrated processing-rate allocation strategy. Experimental results further demonstrate its effectiveness and feasibility in practice.

Index Terms—Quality of Service, slowdown, queuing theory, feedback control, rate allocation.

�

1 INTRODUCTION

INthe past decade, we have seen an increasing demand forprovisioning of different levels of quality of service (QoS)

to various network applications and customers. Differen-tiated services (DiffServ) [7], which aims to providedifferent QoS levels between multiple classes of aggregatedtraffic flows, is a major service architecture for thisrequirement. Previous studies on differentiated servicesare mainly in network core in terms of queuing delay [13],[22], [34] and packet loss ratio [19]. For example, inproportional average delay [13], a class’s priority is setaccording to the average delay of its departed packets.

The end-to-end performance of Internet services is not

only affected by network core, but also by Internet servers.

The approaches for differentiated delay services could be

applied to server-side request queuing delay differentiation;

see [20] for a recent example. On the server side, however, a

compelling performance metric is slowdown, the ratio of a

request’s queuing delay to its service time [17]. Although

queuing delay and response time are major performance

metrics, they are unsuitable to compare requests that have

very different resource demands. For example, it is

unreasonable to ensure the same average response time

for a 10K bytes JPEG file and also a 600K bytes PDF file.

Actually, clients are likely to anticipate short delays for

“small” requests and are willing to tolerate long delays for

“large” requests. Thus, it is desirable that a request’s

queuing delay depends on its service time in a linear

fashion. Measuring quality of service in terms of slowdown

facilitates attaining this objective. In addition, it treats client

requests equally, regardless of their service time, whereas

response time favors requests that need more processing

resources [17], [18].Slowdown and its variant have been adopted as

important metrics of quality of service in recently designed

QoS-aware resource management systems, including [6],

[8], [12], [15], [17], [18], [36]. For example, in [18], the

authors proposed a size-based scheduling algorithm that

assigned high priorities to requests with small service time

so as to improve the performance of Web servers in terms of

mean slowdown and mean response time. Although the

size-based QoS-aware resource management is able to

ensure that small requests experience short response time,

none of the systems can guarantee quality differences

between different request classes in terms of slowdown.

Existing algorithms for proportional delay differentiation

services in network core are unable to provide proportional

slowdown differentiation (PSD) services because the re-

quest’s service time is a priori unknown and hard to predict

in advance.

964 IEEE TRANSACTIONS ON COMPUTERS, VOL. 54, NO. 8, AUGUST 2005

. J. Wei and C.-Z. Xu are with the Department of Electrical and ComputerEngineering, Wayne State University, 5050 Anthony Wayne Drive,Detroit, MI 48202. E-mail: {jbwei, czxu}@wayne.edu.

. X. Zhou is with the Department of Computer Science, University ofColorado at Colorado Springs, 1420 Austin Bluffs Parkway, PO Box 7150,Colorado Springs, CO 80933. E-mail: [email protected].

Manuscript received 30 June 2004; revised 14 Dec. 2004; accepted 8 Mar.2005; published online 15 June 2005.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference IEEECS Log Number TC-0223-0604.

0018-9340/05/$20.00 � 2005 IEEE Published by the IEEE Computer Society

In this paper, we first propose a PSD service model onInternet servers. It aims to maintain prespecified slowdownratios between different classes of client requests. We thenmodel Internet servers as an M=G=1 queuing system withbounded Pareto service time distribution, as suggested in[17], [18]. We refer to this queuing model as M=GP=1 in therest of this paper. To provide PSD services, we first derive aclosed analytic form of the expected per-class slowdown.Based on the expression, we develop a queuing-theoreticstrategy for processing-rate allocation. The processing rate ofa class is realized by virtual servers. Each virtual serverprocesses requests from the same class in an FCFS discipline.Simulation results verify that the queuing-theoretic strategycan guarantee the controllability of PSD services on average.Such services, however, come along with large variance andweakpredictability due to the dynamics of Internet traffic. Asa remedy, we design an integral feedback controller andintegrate it into the queuing-theoretic strategy. Simulationresults under different system conditions demonstrate thatthe integrated strategy is robust and can deliver predictablePSD services at a fine-grained level. We also discuss thedesign issues of the feedback controller, including thedetermination of the upper bound of a class’s processingrate and the setting of the control gain, via comprehensivesimulations. To demonstrate the effectiveness of the inte-grated strategy in real systems,we implement the strategy bymodifying the Apache Web server. Experimental resultsagree with our findings in the simulations and show itsfeasibility in practice.

The structure of the paper is as follows: Section 2presents the PSD service model. Section 3 discusses theslowdown in an M=GP=1 queuing system. Section 4presents the queuing-theoretic strategy of processing-rateallocation and evaluates its performance. Section 5 intro-duces an integral feedback controller and its integration intothe queuing-theoretic strategy. Section 6 presents animplementation of the integrated strategy and experimentalresults. Section 7 reviews related work in differentiatedservices provisioning and Section 8 concludes the paper.

2 PROPORTIONAL SLOWDOWN DIFFERENTIATION

SERVICE MODEL

The PSD service model is to maintain prespecified slow-down ratios between different classes. In the model,incoming requests are classified into N classes. Note thatthe classification is based on service providers’ policies,such as service fee paid by clients. Such a classification isindependent of the workload characteristics of requests. LetSiðkÞ denote the average slowdown of class i computed atsampling period k and �i its prespecified differentiationparameter. The PSD service model requires

SiðkÞSjðkÞ

¼ �i�j; 1 � i; j � N: ð1Þ

Due to the constraint of the conservation law, the sum ofprocessing rates allocated to all classes must equal thesystem capacity. Let ci denote the normalized processingrate allocated to class i. The rate allocation subjects to theconstraint

XNi¼1

ci ¼ 1: ð2Þ

There are three requirements of the provided PSD

services: predictability, controllability, and fairness.

. Predictability. It requires the PSD services to beconsistent and independent of variations of classworkloads.

. Controllability. It means the quality differencesbetween classes should be adjustable via theirdifferentiation parameters.

. Fairness. It means low priority classes should not beovercompromised, especially when workloads arehigh.

3 SLOWDOWN MODELING IN AN M=GP=1 SYSTEM

As suggested in [17], [18], we assume that an Internet server

can be modeled as an M=G=1 queuing system. In practice,

the interarrivals of client sessions follow an exponential

distribution [23]; many measured computer workloads are

heavy-tailed, such as sizes of documents on a Web server

[3] and sizes of FTP transfers in the Internet [31]. In general,

a heavy-tailed distribution is one for which

PrfX � xg � x��; 0 < � < 2;

where X denotes the service time density distribution.A typical heavy-tailed distribution is Pareto distribution.

Its probability density function is

fðxÞ ¼ �p�x��1; �; p > 0; x � p; ð3Þ

where � is the shape parameter. With respect to the fact that

there is some upper bound on a client request’s required

processing resource in practice, a bounded Pareto distribu-

tion has been adopted to characterize the workloads on

Internet servers [4], [17]. Let x be a request’s service time.

The probability density function of the bounded Pareto

distribution is defined as

fðxÞ ¼ �p�

1� ðp=qÞ� x��1; �; p; q > 0;

where p � x � q. We refer to an M=G=1 queuing system

where the service time follows a bounded Pareto distribu-

tion as an M=GP=1 system.

3.1 Preliminaries of Slowdown

Since �, p, and q are parameters of the bounded Pareto

distribution, for simplicity in notation, we define a function

Kð�; p; qÞ ¼ �p�

1� ðp=qÞ� :

The probability density function fðxÞ is rewritten as

fðxÞ ¼ Kð�; p; qÞx��1: ð4Þ

Letm1,m�1,m2 be its first moment (mean), themoment of its

inverse, and the second moment, respectively. By (4), we

have:

WEI ET AL.: ROBUST PROCESSING RATE ALLOCATION FOR PROPORTIONAL SLOWDOWN DIFFERENTIATION ON INTERNET SERVERS 965

m1 ¼ E½X�

¼Kð�;p;qÞ

Kð��1;p;qÞ if � 6¼ 1;

ðln q � ln pÞKð�; p; qÞ if � ¼ 1;

(ð5Þ

m�1 ¼ E½X�1� ¼ Kð�; p; qÞKð�þ 1; p; qÞ ; ð6Þ

m2 ¼ E½X2� ¼ Kð�; p; qÞKð�� 2; p; qÞ : ð7Þ

According to the Pollaczek-Khinchin formula, we obtain

the expected slowdown shown in the following lemma.

Lemma 1. Given an M=GP=1 FCFS queue where the arrival

process has rate � on a server. Let w be a request’s queuing

delay. Then, its expected queuing delay and slowdown are

E½w� ¼ �m2

2ð1� �m1Þ; ð8Þ

S ¼ Ew

x

h i¼ E½w�E½x�1� ¼ �m2m�1

2ð1� �m1Þ: ð9Þ

Note that the slowdown formula follows the fact that w and x

are independent random variables in an FCFS queue.

3.2 Slowdown on Internet Servers

We assume that the processing rate of an Internet server can

beproportionally allocated toanumberofvirtual serversusing

the proportional-share resource scheduling mechanisms,

such as PGPS [29]. Each virtual server processes requests

from a class using an FCFS discipline. It is an abstract concept

in the sense that it can be a groupof child processes or threads

in a multiprocess or multithread server [9].

Lemma 2. Given an M=GP=1 FCFS queue on a virtual server i

with processing capacity ci, let mi1, m

i�1, and mi

2 be the first

moment (mean), the moment of the inverse, and the second

moment of class i’s service time, respectively. Then,

mi1 ¼

m1

ci; ð10Þ

mi�1 ¼ cim�1; ð11Þ

mi2 ¼

m2

c2i: ð12Þ

Proof. On a virtual server with processing capacity 1, the

lower bound and upper bound of the bounded Pareto

distribution are p and q, respectively. On virtual server i

with processing capacity ci, because each class receives

ci fraction of processing capacity, a job takes 1=ci as long.

The bounds become p=ci and q=ci, respectively. Accord-

ing to (3), we have

Z q=ci

p=ci

fðxÞdx ¼ c�i 1� ðp=qÞ�ð Þ:

Thus, on virtual server i, we have the probabilitydensity function of the bounded Pareto distribution as

fðxÞ ¼ �p�

c�i 1� ðp=qÞ�ð Þx��1;

where � > 0 and p=ci � x � q=ci. With the notation

Kð�; p; qÞ, the probability density function is rewritten as

fðxÞ ¼ Kð�; p; qÞc��i x��1:

It follows that

mi1 ¼

1ci

Kð�;p;qÞKð��1;p;qÞ if � 6¼ 1;

1ciðln q � ln pÞKð�; p; qÞ if � ¼ 1;

(ð13Þ

mi�1 ¼ ci

Kð�; p; qÞKð�þ 1; p; qÞ ; ð14Þ

mi2 ¼

1

c2i

Kð�; p; qÞKð�� 2; p; qÞ : ð15Þ

By substitution with m1, m�1, and m2 in (5), (6), and(7), we obtain the result of this lemma. This concludesthe proof. tu

According to Lemma 1 and Lemma 2, we have the

following theorem.

Theorem 1. Given an M=GP=1 FCFS queue on virtual server i

with processing capacity ci, and arrival rate �i, its expected

slowdown is

Si ¼ E½si� ¼�im2m�1

2ðci � �im1Þ: ð16Þ

Note that the M=GP=1 queue reduces to an M=D=1

queue when all requests have the same service time. In an

M=D=1 FCFS system, (16) becomes

Si ¼�id

2ðci � �idÞ; ð17Þ

where d is the constant service time. This fixed-time

queuing model is valid in session-based e-commerce

applications. A session is a sequence of requests of different

types made by a single customer during a single visit to a

site. Requests at some states such as home entry or register

take approximately the same service time.

4 QUEUING-THEORETIC RATE-ALLOCATION

STRATEGY

In this section, we present a queuing-theoretic strategy for

PSDserviceprovisioning.Weevaluate theeffectivenessof the

strategywith respect to the predictability, controllability, and

fairness requirements via comprehensive simulations.

4.1 Rate Allocation Based on Queuing Theory

Basedon theanalysis of a class’s slowdown inavirtual server,

we derive the processing rate ci that should be allocated to

class i. This is a resource-allocation problem. The set of (1), in

combinationwith the resource allocation constraint (2), leads

to a linear equation system. By Theorem 1 and substituting

(16) into (1), we have


ci ¼ �im1 þ�i=�iPNi¼1 �i=�i

1�XNi¼1

�im1

!: ð18Þ

The first term of (18) is a baseline that prevents the classfrom being overloaded. By (16), if the class is overloaded,then its average slowdown becomes infinite according toqueuing theory. The second term is a portion of surplusprocessing rate determined by the class’s differentiationparameter and the load condition (i.e., normalized arrivalrate). It controls the quality differences between classes.

By substituting (18) into (16), we obtain the expectedslowdown of class i as

Si ¼�im2m�1

PNi¼1 �i=�i

2ð1�m1

PNi¼1 �iÞ

: ð19Þ

Equation (19) implies the following three basic propertiesregarding the predictability and controllability of PSDservices due to the allocation strategy:

. Slowdown of a class increases with its requestarrival rate.

. With the increase of the differentiation parameter ofa class, its slowdown increases while slowdown ofall other classes decreases.

. With the increase of a class’s load (request arrivalrate), a higher priority class causes a larger increasein its slowdown than that of a lower priority class.

4.2 Simulation Model

We built a simulation model of an M=GP=1 queuingsystem. The model consisted of a number of requestgenerators, waiting queues, a rate predictor, a processing-rate allocator, and a number of virtual servers. Fig. 1outlines the basic structure of the simulation model. In thesimulations for the queuing-theoretic strategy, the feedbackcontroller is turned off. It is turned on in the simulations ofthe integrated strategy described in Section 5.

The request generators use the functions provided byGNU scientific library (GSL) to generate requests followingexponential interarrival distribution. The bounded Paretoservice time distribution is generated by modifying thebuilt-in Pareto function of GSL. We assume that a request’sservice time isproportional to its size.Note that thenumberofclasses in the PSD service model is usually rather limited. Itvaries from two to three in many similar experiments fordifferentiation service provisioning [13], [22], [37], [38]. Inaddition, as recommended in [27], a service provider shouldsupport two or three different levels of services: premium,assured, or best-effort services. In the simulations, eachrequest was sent to the server and stored in its corresponding

waiting queue. Requests from the same class wereprocessed by a virtual server in an FCFS discipline.

The rate predictor calculates the processing rate of a classusing predicted load of the class according to (18). As pointedout in [32], to provide differentiated services, we need tomanipulate the probability distribution of a random process,where it is the number of events (request departures) thataffects mean slowdown. Meanwhile, the number of eventsshould be reasonably large so that the scheduler candetermine the effect of resource allocation on the achievedslowdown ratio. Therefore, in the simulation, the load waspredicted for every sampling period,whichwasmeasured asa thousand request departures, as suggested in [32]. It canalso be converted from event to real time. The experimentalresults presented in Section 6 demonstrate that such aconversion will not affect the performance of the proposedstrategies. In these simulations, we applied the movingaverage method to predict a class’s load. A class’s predictedload is the average of its past five sampling periods. The rateallocator enforces the rate allocation set by the rate predictorfor every sampling period.

Simulation parameters were set as follows: The shapeparameter (�) of the bounded Pareto distribution was set to1.5, as suggested in [13]. As indicated in [17], its lowerbound and upper bound were set to 0.1 and 100,respectively. We also carried out experiments with differentshape parameters and larger upper bounds to evaluate theirimpact on the performance of the proposed strategies. Allclasses were assumed to have a similar load in long time-scales. As we shall see in Fig. 7, the load of a class isdynamic in short time-scales. In every experiment, thesimulator was first warmed up for 10 sampling periods, theslowdown of a class was then measured for every followingsampling period. After 60 sampling periods, the measuredslowdown was averaged. Each reported result is an averageof 100 runs.

4.3 Effectiveness of the Queuing-Theoretic Strategy

In this section, we show the effectiveness of the queuing-theoretic strategy by comparing the simulated slowdownwith the expected values calculated by (19) under variousload conditions. We have conducted experiments withdifferent numbers of classes. For brevity, we only presentthose with two classes because no qualitative difference isobserved. The results are depicted in Fig. 2. To show theresults in a comparable way, the logarithmic y-axis is used.From this figure, it is clear that there are very smalldifferences between the simulated and expected slowdownunder various load conditions. This indicates that thequeuing-theoretic strategy is able to guarantee the target


Fig. 1. The simulation model’s structure.Fig. 2. Simulated and expected slowdown (�2=�1 ¼ 2).

slowdown ratio between two classes. It provides the basefor the following discussions on the properties of thequeuing-theoretic strategy.

Notice that the slowdown of a class increases with theincrease of system load. This can be explained by (19). Withthe increase of request arrival rates (system load),(1�m1

PNi¼1 �i) decreases and

PNi¼1 �i=�i increases. There-

fore, Si increases.

4.4 Differentiation Predictability of theQueuing-Theoretic Strategy

Recall that the predictability of PSD services means theslowdown of a higher priority class is proportionallysmaller than that of a lower priority class under differentsystem load conditions and time-scales. We carried outexperiments to examine the capability of the queuing-theoretic strategy in supporting this property. Fig. 3 showsthe percentiles of the results.

In the figure, it is observed that the strategy guaranteesthe 50th percentiles of achieved slowdown ratios to bearound the target ones. This indicates that a higher priorityclass always experiences smaller average slowdown than alower priority class in long time-scales.

We also observe from Fig. 3 that the variance of theachieved slowdown ratios is large. For example, when thesystem load is 50 percent and the target slowdown ratio is 4,the difference between the 95th percentile and the fifthpercentile is 5. There are two reasons for this. First, althoughthe strategy is able to control the average slowdown of aclass by adjusting the processing rate based on queuingtheory, it provides no means to control the variance of theslowdown simultaneously. Second, the workload of a classis stochastic and may change abruptly [21]. In the queuing-theoretic strategy, the processing rate of a class is calculatedaccording to a class’s predicted load, which is difficult topredict accurately from history information. Such inaccu-racy in load prediction results in large variance along withthe achieved slowdown ratio. We shall further discuss thislimitation of the queuing-theoretic strategy in Section 4.7.

Fig. 3 also shows that the provided PSD services violatethe predictability requirement when the target slowdownratio is small. For example, when �2=�1 is 2 and the systemload is 10 percent, the fifth percentile of the achievedslowdown ratios is smaller than 1. It indicates that a higherpriority class occasionally receives worse services than alower priority class. This further motivates the design of theintegrated strategy presented in Section 5.

In order to demonstrate the differentiation predictabilitydue to the queuing-theoretic strategy in short time-scales,we present the slowdown of individual requests in Fig. 4. It

is observed from this figure that, although the targetslowdown ratio of class 2 to class 1 is 2, sometimes requestsfrom class 1 have larger slowdown than class 2 when thesystem load is heavy. Such behavior has also been observedin experiments under different load conditions. It suggeststhat the strategy is unable to provide predictablePSD services in short time-scales. This is because thequeuing-theoretic strategy determines the processing rateallocated to a class periodically according to its macrobehavior (predicted class load) rather than the microbehavior (individual requests’ slowdown).

4.5 Differentiation Controllability of theQueuing-Theoretic Strategy

In this section, we demonstrate the differentiation controll-ability due to the queuing-theoretic strategy. Fig. 5 plots theachieved slowdown ratios of two classes with differentdifferentiation parameters. It is seen that the target slow-down ratios are accurately achieved when they are small(e.g., �2=�1 ¼ 2 or 4). As the target slowdown ratio increases,the error becomes larger due to load-prediction errors.From (18), it is clear that the processing rate allocated to aclass is determined by its class load, its differentiationparameter, and the system load. According to (18), theprediction errors have a greater impact on the achievedslowdown ratio with the increase of the differentiationparameter. The results show that the strategy achieves thetarget slowdown ratios on average. They demonstrate thecapability of the strategy to control the slowdown ratiosbetween classes.


Fig. 3. Achieved slowdown ratios due to the queuing-theoretic strategy.

Fig. 4. Slowdown of individual requests (system load is 90 percent).

Fig. 5. Simulated slowdown ratios with different number of classes.

4.6 Impact of the Shape Parameter and the UpperBound

In this part, we examine the impact of the shape parameter� and the upper bound q of the bounded Pareto distributionon the performance of the strategy. The shape parameterdetermines the correlation of traffic requests. A large shapeparameter implies that the requests are independent witheach other in size. The upper bound reflects the heavy-tailproperty of the bounded Pareto distribution.

Fig. 6a depicts the slowdown of two classes due todifferent shape parameters (1:1 � � � 1:9). The first ob-servation is that the shape parameter has little impact on thedifferentiation predictability due to the proposed rate-allocation strategy. Both classes experienced the differen-tiated slowdown as expected. The difference between theexpected slowdown and the achieved one is independent ofthe shape parameter. This behavior can be explained by thefact that there is no assumption about the shape parameterin the rate-allocation strategy. The second importantobservation is that the slowdown of a class decreases asthe shape parameter increases. Intuitively, with the givenlower and upper bounds, the smaller the shape parameter �,the more bursty the traffic [21]. A request may experiencelarger queuing delay than that from a “smooth” traffic.Formally, by (6), (7), when the shape parameter decreases,itsm�1 decreases, its second momentm2 increases, and thenits expected slowdown increases.

The upper bound q of the bounded Pareto distributionalso affects the experienced slowdown of the M=GP=1traffic. Fig. 6b depicts the simulation results with differentupper bounds. It is seen that the upper bound also has littleimpact on the differentiation predictability due to the rate-allocation strategy; that is, the difference between theachieved slowdown and expected one is independent ofthe upper bound. Second, the higher the upper bound, thelarger the expected slowdown of the classes when the lower

bound p of the bounded Pareto distribution remains thesame. Intuitively, as the upper bound increases, thebounded Pareto distribution becomes more heavy-tailedand the slowdown then increases. By (6), (7), as the upperbound increases, m�1 remains almost unchanged and thesecond moment of the traffic m2 increases. We find that, inthe M=GP=1 traffic model, as the shape parameterincreases, the slowdown decreases; as the upper boundincreases, the slowdown increases.

We also observe that the variance of achieved slowdownratios is affected by the shape parameter and upper boundsettings. Notice that the y-axis is logarithmic. The varianceincreases with the decreasing of the shape parameter andthe increasing of the upper bound. On one hand, for givenlower and upper bounds, a small shape parameter causesmore heavy-tailed workloads [21]. On the other hand, as theupper bound increases, the bounded Pareto distributionbecomes more heavy-tailed. In both cases, the percentage ofrequests that demand a large amount of system resourceincreases. Recall that requests from the same class areprocessed using FCFS discipline. Small requests havehigher probability to be after a large request and experiencelarger delay and slowdown. This further demonstrates thatthe variance observed in Fig. 5 is caused by trafficdynamics.

We have evaluated the performance of the queuing-theoretic strategy under different workload settings. Wevaried the number of classes, the arrival rate ratio of theclasses, the differentiation parameters. We have alsoexamined different methods, such as exponential movingaverage, for load prediction. While we do not present all theresults due to space limitation, it is worth noting that wedid not find any qualitative differences between them.

4.7 Limitations of the Queuing-Theoretic Strategy

Despite the fact that the queuing-theoretic strategy canprovide PSD services in long time-scales, the inaccurateworkload prediction caused by traffic dynamics limits itsperformance in short time-scales. We carried out simula-tions to quantify the strategy’s limitations. Fig. 7 shows thesimulated workloads and the differences between the real


Fig. 6. Impact of the parameters of the bounded Pareto distribution.

(a) Impact of the shpae parameter. (b) Impact of the parameters of the

bounded Pareto distribution.

Fig. 7. The performance of different workload prediction methods.

workloads and predictions. In the moving average method,

the workload of a class was predicted as the average of

measured workloads in the previous WindowSize sampling

periods. We have also examined the performance of the

moving average method with different WindowSize set-

tings and other prediction methods, such as the exponential

moving average method. Because the results show no

qualitative differences, we only present those due to the

moving average method with WindowSize as 5.From this figure, we first observe that, although the

workload is stochastic and changes frequently, the sumof the

differences is very close to zero. Therefore, the queuing-

theoretic strategy is able to provide PSD services in long time-

scales. In short time-scales, however, there are large differ-

ences between the real workloads and the predictions. Such

differences cause inaccurate rate allocation in queuing-

theoretic strategy. Due to the dynamics of workloads, it is

very difficult to accurately predict the workload of a class

from history. Furthermore, queuing-theoretic strategy pro-

vides no means to compensate for the effect of the prediction

errors. Therefore, a large variance is observed in Fig. 3.The prediction errors are the main reason behind large

variance and weak predictability of the PSD services

provided by the queuing-theoretic strategy. As mentioned

before, it is difficult to predict a class’s load from history

information. How to reduce the impact of such prediction

errors motivates the design of our integrated processing-

rate allocation strategy.

5 INTEGRATED RATE-ALLOCATION STRATEGY WITH

FEEDBACK CONTROL

Feedback control provides a sound way to measure and

compensate for the effect of the disturbances introduced by

the prediction errors. Therefore, we take advantage of

feedback control theory to address the limitations of the

queuing-theoretic strategy so as to provide predictable and

fine-grained PSD services.

5.1 Structure of the Integrated Strategy

In our strategy, feedback control theory is integrated into

queuing theory. Its basic structure is presented in Fig. 8.

There are two major components in the system: a rate

predictor and a feedback controller. The rate predictor is to

estimate the processing rate of a class by (18). The feedback

controller is to adjust the rate allocation according to the

difference between the target slowdown ratio and the

achieved one using integral control.

The feedback controller adjusts a class’s processing rateusing integral control. Proportional, integral, and derivativecontrol are the most popular control design techniques [16].The advantage of the proportional controller lies in itssimplicity. It, however, cannot eliminate the steady-stateerror introduced in the process, which causes the targetslowdown ratio to be unachievable. Derivative control takesinto account the change of errors and has good responsive-ness. Its drawback is that it is sensitive to measurementnoises, which are common in the provisioning ofPSD services since we are essentially controlling a prob-ability. Integral control is able to eliminate the steady-stateerror and to avoid overreactions to measurement noises.

To translate the PSD service model into the feedbackcontrol framework shown in Fig. 8, the control loops mustbe determined first. In every loop, the following aspectsneed to be determined: 1) the reference input, 2) the error,and 3) the output of the feedback control. In the integratedstrategy, one class, such as class 1, is selected as the baseclass and a control loop is associated with every other class.This leads to a total of N � 1 control loops in the system.

In the control loop associated with class i, the referenceinput in the kth sampling period is

riðkÞ ¼�iðkÞ�1ðkÞ

: ð20Þ

The output of the server is the achieved slowdown ratio ofclass i to the base class

yiðkÞ ¼SiðkÞS1ðkÞ

: ð21Þ

If the rate allocation calculated by the rate predictor isaccurate enough, the slowdown ratio error can be reducedto zero. In practice, because of the variance of traffic, therate predictor cannot correct the error by itself. We definethe error associated with class i as

eiðkÞ ¼ riðkÞ � yiðkÞ ¼�iðkÞ�1ðkÞ

� SiðkÞS1ðkÞ

: ð22Þ

For class 1, we have

e1ðkÞ ¼ 0 for all k: ð23Þ

Recall that the PSD service model aims to maintainprespecified slowdown ratios between different classes. By(19), we have

Si

S1¼ �i

�1

c1 � �1m1

ci � �im1: ð24Þ

It indicates that slowdown ratios between classes can becontrolled by adjusting their processing-rate ratios. There-fore, the integral feedback controller sums up the errorscaused by previous rate allocation and adjusts the proces-sing-rate ratios of class i to class 1 accordingly. Thefeedback controller output then is

�uiðkÞ ¼ �c1ðkÞciðkÞ

¼ �c1ðk� 1Þciðk� 1Þ þ g � eiðkÞ; ð25Þ

where g is the control gain, which determines the adjust-ment of the processing-rate ratio corresponding to the error.


Fig. 8. The structure of the integrated strategy for PSD service

provisioning.

The adjustment is incorporated with the rate predictoroutput

~uuiðkÞ ¼c1ci:

Thus, the processing-rate allocation for the next samplingperiod kþ 1 is calculated as

c1ðkþ 1Þciðkþ 1Þ ¼ ~uuiðkÞ þ�uiðkÞ ¼

c1ciþ g

ZeiðkÞdk: ð26Þ

By (2), (23), and (26), the processing rate of class i ofsampling period kþ 1 thus is

ciðkþ 1Þ ¼ 1

c1ciþ g

ReiðkÞdk

� ��PN

i¼1 1=c1ciþ g

ReiðkÞdk

� � :ð27Þ

There are three points worth noting here. First, it indicatesthat the processing rate of a class is affected by all errorsincurred in the past, which is the feature of integral control.Second, a class’s processing rate is also affected by theerrors associated with other classes because of the require-ment of the conservation law. Third, it indicates that thecontrol gain g has great impact on the performance of theintegrated strategy, which we shall discuss in Section 5.7.

5.2 Design Issues

In this section, we discuss three design issues of theintegrated strategy. First, recall that PSD services should bepredictable and fair. They require that the maximalprocessing rate allocated to a class must be bounded sothat no other classes will be overloaded. Let bi denote thelimit of class i’s processing rate. The processing rate ofclass i is then subject to constraint

ci � bi; 1 � i � N: ð28Þ

The determination of the rate limit of a class will bepresented in Section 5.6.

Second, the control gain affects the performance of theintegrated strategy significantly. A controller with a largegain can respond quickly to errors. It, however, may causelarge oscillations. Therefore, to reduce the overshootintroduced by the integral control, a small gain is preferred[32]. As we shall demonstrate in Section 5.7, a small controlgain can reduce the overshoot and the target slowdownratio can be quickly achieved as well.

Finally, according to (24), the slowdown ratio betweentwo classes depends on their class workloads. Such anobservation implies that the processing rates should also beadjusted according to class workloads. In our strategy, thisis realized by the rate predictor and the integrated strategycan respond to the workload changes quickly.

5.3 Simulation Configurations

In the simulations, we turned on the feedback controllershown in Fig. 1 to perform the integrated rate-allocationstrategy. The feedback controller performs the integralcontrol. It measures the difference between the targetslowdown ratio and the achieved one for each class andthen calculates rate allocations based on (27). The result

then is compared with its rate limit. In the case that thecalculated rate is larger than its rate limit, the actualprocessing rate is set to the rate limit.

Simulation parameters were set as the same as those inqueuing-theoretic strategy except that the rate predictorestimates a class’s processing rate every 10 samplingperiods.

In order to prevent sudden spikes in the slowdown ratiosample from causing large reactions in the controller, theslowdown ratio is smoothed using an exponential movingaverage method. Since there is no qualitative differencebetween different settings of the weight of currentlymeasured slowdown ratio, in this paper, we only presentthe result where the weight is 0.25.

5.4 Effectiveness of the Integrated Strategy

We first discuss the effectiveness of the integrated strategy byinvestigating itsmacroscopic andmicroscopic behaviors. Thetarget slowdown ratios were set to 2, 4, and 8. Fig. 9 gives thepercentiles of the results due to the integrated strategy.

In comparison with the results due to the queuing-theoretic strategy as shown in Fig. 3, it is clear that theintegrated strategy has much finer-grained control overslowdown ratios than the queuing-theoretic strategy. Forexample, when the system load is 50 percent and the targetis set to 4, the difference between the 95th percentile and thefifth percentile is reduced from 5 in the queuing-theoreticstrategy to 0.5 in the integrated strategy.

Moreover, the observation that the slowdown ratios arealways larger than 1 means the PSD services are predictable,whereas the queuing-theoretic strategy fails to meet thisrequirement. These observations conclude that the integratedstrategy outperforms the queuing-theoretic strategy.

Wenext investigate themicroscopic behaviorsof these twostrategies under a system with a different number of classes.In these simulations, the system load was assumed to be 0.8.Fig. 10 presents the results with two classes. It shows that, inthe integrated strategy, the achieved slowdown ratio starts toconverge to the target value within 60 sampling periodsbecause the processing-rate allocation is adjusted adaptively.On the other hand, the queuing-theoretic strategy fails toprovide PSD services because of traffic dynamics. Note thatthis figure shows the microscopic behaviors (e.g., one run) ofthese two strategies. The queuing-theoretic strategy is able toachieve the target slowdown ratio on average, which isdemonstrated in Fig. 3.

We also examine the impact of both processing-rateallocation strategies on a class’s average queuing delay andshow the results in Fig. 10. It is observed that the integratedstrategy is able to provide proportional delay differentiation


Fig. 9. Achieved slowdown ratios due to the integrated strategy.

services. According to (9), the average slowdown of a classis proportional to its average queuing delay. Therefore,when two classes have the samem�1, their average queuingdelay ratio will be the same as their slowdown ratio. This isa salient property of the integrated strategy: providingdifferentiated services in terms of slowdown and delaysimultaneously. On the other hand, the queuing-theoreticstrategy cannot provide such differentiated services becauseof its inaccurate rate allocation in one run. Note that, whenrequests of different classes follow different service timedistributions, the differentiated delay services are notguaranteed. We have also examined the performance ofboth strategies with three classes. Simulation results showthat the performance of the integrated strategy is consis-tently better than that of the queuing-theoretic one.

5.5 Robustness of the Integrated Strategy

We carried out simulations to investigate the strategy’srobustness under changing workload conditions. In thesimulation, the initial workloads of both classes wereassumed to be 0.4. At the 65th sampling period, theworkload of class 1 was decreased to 0.2 and then changedback after the 115th sampling period. Fig. 11 shows theaverage slowdown of all processed requests. For example,at the 65th sampling period, the result is the achievedslowdown average over all requests processed between the10th and the 65th sampling periods. We observe from Fig. 11that such workload changes have little impact on theperformance of the integrated strategy. This is because the

rate predictor can quickly capture the changes of a class’sworkload and estimate its processing rate accordingly.Based on the estimation made by the rate predictor, thefeedback controller is able to reduce the errors caused bychanging workloads promptly.

We also investigated the integrated strategy’s robustnesswith real trace. In the simulation, the workload wasgenerated based on the World Cup Soccer 98 server logs[3], which is publicly available. The results are presented inFig. 12. Because the simulated workload does not strictlymeet the M=GP=1 model, we observe that the queuing-theoretic strategy fails to provide PSD services. On the otherhand, in the integrated strategy, although the variation isclearly larger than that with the accurate model, the integralcontroller compensates for the effect of the model inaccu-racy and keeps the achieved slowdown ratio around thetarget. Fig. 12 also depicts the results where the ratepredictor was turned off. In comparison with the resultsdue to the integrated strategy, there are small differencesbetween them. This further indicates that the robustness ofthe integrated strategy comes from the integral controller.

5.6 Effect of the Rate Limit

As discussed in Section 5.2, to provide predictable and fairPSD services, the processing rate allocated to a class shouldhave an upper bound. In this section, we examine the effectof the rate limit on the performance of the integratedstrategy and then discuss how to obtain an appropriate ratelimit of a class. To make analysis simple, in the simulations,we assumed there existed two classes that have the sameaverage load in the system with the same rate limit.Moreover, the same request stream was used in thesesimulations to make the results comparable. Fig. 13 showsthe simulation results.

From this figure we observe that, the larger the rate limit,the slower the achieved slowdown ratio converges to thetarget and the larger the variance is. For example, when therate limit was set to 0.8, it took more than 100 sampling


Fig. 10. Simulation results with two classes (�2=�1 ¼ 2).

Fig. 11. Performance under changing environments (�2=�1 ¼ 2).

Fig. 12. Simulation results with real traces (�2=�1 ¼ 4).

Fig. 13. Effect of rate limits (�2=�1 ¼ 2).

periods before the achieved slowdown ratio converged tothe target; when the rate limit was set to 0.6, it converged in30 sampling periods. This is because, with a large rate limitof one class, another class may be overloaded because itsallocated rate is not able to process the arrived requests intime and vice versa. Such an occasional overload causeslarge errors. In order to reduce the errors, the rate allocationin the next sampling period is adjusted, which leads to largeoscillations and slows the convergence.

As the rate limit decreases, the converging rate increases.For example, with the rate limit was set to 0.52, the targetslowdown ratio was reached more quickly than when thelimit was set to 0.6. It, however, yielded steady errors. Weargue that a too small rate limit can damage the capability ofthe integrated strategy to reduce the variance. For example,when the rate limit was set to 0.52, recall that both classes hadthe same rate limit and, due to the constraint of (2), theprocessing rate that could be allocated to a class ranged from0.52 to 0.48. Hence, when an error happened, the integratedstrategy was unable to correct it effectively.

Essentially, the rate limit of a class depends on theprocessing rate demanded by other classes. As mentionedbefore, the first term of (18) is the baseline required by a class.Therefore, it is necessary that the rate allocated to a classshould be small enough to prevent other classes from beingoverloaded. The rate limit of class i then is calculated as

bi ¼ 1�X

j¼1N;j 6¼i

�jm1: ð29Þ

Accordingly, the rate limit of both classes should be 0.6 inthe simulations because the average workload of a class is0.4. This constraint is verified by Fig. 13: With the rate limit0.6, the achieved slowdown ratio quickly converges to thetarget with small variance.

5.7 Determination of the Control Gain

As mentioned before, the control gain g has a significantimpact on the performance of the integrated strategy. Weconducted simulations with a different number of classesand workload conditions. Because there is no qualitativedifference in the results, we present the results of twoclasses and the average workload of a class is 0.25. Weexamined different settings of the control gain, including 1,2, 4, and 16. Due to space limitations, we only present theresults where the gain is set to 1 and 4 in Fig. 14. We observethat the larger the control gain, the faster the resultsconverge to the target slowdown ratio. With a large gain,the error is amplified. The feedback controller thusresponds more quickly than that with a small control gain.

In addition, due to the rate limit of every class, the varianceis small, even with large gains.

A large gain, however, may cause excessive slowdownoscillation of a class. In Fig. 15, the average slowdown ofclass 2 with different gains is presented. The results areaveraged over every five sampling periods. It is clear thatthe average slowdown becomes relative stable after the75th sampling period, when the control gain is set to 1. Inthe situations where the control gain is set to 4, a largevariance can be observed around the 175th and260th sampling periods. Our examination of the resourceallocation also reveals that, with a large control gain, theamount of resource allocated to a class oscillates for a longertime and more frequently than that with a small gain.

As pointed out in Section 5.2, to reduce the overshoots, asmall control gain is preferred. As mentioned before,slowdown is used to measure the service quality receivedby clients. In practice, a small slowdown oscillation isdesirable. Large and frequent changes of slowdown in shorttime-scales frustrate end users. With respect to theseconsiderations, a small control gain, such as 1, is preferred.

5.8 Impact of the Shape Parameter and the UpperBound

In this section, we examine the impact of the shapeparameter (�) and upper bound (q) of the bounded Paretodistribution on the performance of the integrated strategy.Fig. 16 shows the fifth, 50th, and 95th percentiles ofachieved slowdown ratios between two classes withdifferent shape parameter (0 < � < 2) and upper boundsettings (q ¼ 100; 1;000; 10;000).

The first observation is that, with different shapeparameter and upper bound settings, the integratedstrategy can always achieve the target, that is, the50th percentiles are always close to the target. This furtherdemonstrates the effectiveness and robustness of theintegrated strategy.

We also observe that the variance of achieved slowdownratios is affected by the shape parameter and upper bound


Fig. 14. The effect of the control gain on slowdown ratio. Fig. 15. The effect of the control gain on average slowdown.

Fig. 16. Impact of traffic characteristics (�2=�1 ¼ 4).

settings. Similar to the queuing-theoretic strategy’s behaviorshown in Fig. 6, the variance increases with the decrease ofthe shape parameter and the increase of the upper bound.How to further reduce the variance is part of our futurework. One possible solution is to use shortest job first orshortest remaining processing time first algorithms toschedule requests from the same class so as to reduce themean response time and the mean slowdown of smallrequests [11], [17].

6 IMPLEMENTATIONS AND EXPERIMENTAL RESULTS

We implemented the integrated strategy on Apache Webserver 1.3.31 running on Linux 2.6. Apache Web server isnormally executed with multiple processes (or threads). Atthe start-up, a parent server process sets up listeningsockets and creates a pool of child processes. The childprocesses then listen on and accept client requests fromthese sockets. Recall that, by (26), we can control theslowdown ratio between two classes by manipulating theirprocessing-rate ratio. As discussed in Section 2, a virtualserver can be a group of child processes in a multiprocessserver. Since every child process in Apache Web server isidentical, in the implementation, we realize the processing-rate allocation by controlling the number of child processesthat a class is allocated.

6.1 Implementation Structure

The implementation structure of the integrated strategy ispresented in Fig. 17. It consists of a classifier, a ratepredictor, a feedback controller, and a rate allocator.

The classifier determines a request’s class according torules defined by service providers. In our implementation,the rules are based on the request’s header information(e.g., IP address and port number). After being classified, arequest is stored in its corresponding waiting queue.Associated with each waiting queue is a record of trafficinformation, such as arrival rate and service time. Thearrival rate is measured directly from arrived clientrequests. The service time is returned from the Web serverand set by the rate allocator.

The rate predictor obtains the arrival rate and servicetime from waiting queues and estimates the processing ratethat a class should be allocated according to (18).

The feedback controller inferences the achieved slow-down ratios between different classes and calculates theprocessing rate of a class according to (27) by incorporatingthe rate estimation from the rate predictor. It then sets thenumber of allocated processes of a class in the rate allocator.

The rate allocator enforces the processing-rate allocation.In our implementation, the Apache Web server is modifiedto accept requests from a unix domain socket. When a child

process calls accept() on the unix domain socket, a signal issent to the rate allocator. Upon receiving such a signal, therate allocator scans the waiting queues to check if there is abacklogged class whose number of allocated processes islarger than that it is currently occupying. If such a classexists, its head-of-line request and the class type are passedto the child process through the unix domain socket;otherwise, a flag is set. Whenever a new request arrives, theflag is checked first and the waiting queues are rescanned.The rate allocator also increases a counter that records thenumber of occupied processes by the class. After handling arequest, the child process sends the request’s service timeand class type back to the rate allocator. The rate allocatorthen decreases the corresponding counter and stores theservice time in the corresponding waiting queue.

In the simulation, we define the slowdown with respectto client request for generality. In practice, client-perceivedperformance of a Web server is based on a whole Web page,which can be a single file or an html file with multipleembedded objects. According to HTTP/1.1, a client firstestablishes a persistent connection with the Web server andsends requests for a Web page and possible embeddedobjects. Thus, the queuing delay of a Web page is defined asthe time interval between the arrival of a connection requestand the time the established connection is passed to a childprocess of Apache Web server. The service time is definedas the time interval between the passing of the establishedconnection and the time the response of the whole Webpage is sent to the client. We determine the end of theresponse based on the “referer” header field of an HTTPrequest. An embedded object uses this field to indicatewhich Web page it contains. A similar method is alsoproposed in [28].

6.2 Experimental Results

We conducted experiments to measure the performance ofthe integrated strategy. The experimental environmentconsisted of four PCs running on a 100 Mbps Ethernet. EachPC was a Dell PowerEdge 2450 configured with dual-processor (1 GHz Pentium III) and 512 MB main memory.We installed one Apache Web server on one PC, while acommonlyusedWeb traffic generator SURGE [5] ranonotherPCs. In the generated Web objects, the maximum number ofembedded objects in a given page was 150 and thepercentages of base, embedded, and loner objects were30 percent, 38 percent, and 32 percent, respectively. Theworkload of emulated requests was controlled by adjustingthe total number of concurrent user equivalents (UEs) inSURGE.

The Apache Web server was set up with support ofHTTP/1.1. The number of maximal concurrent child serverprocesses was set to be 128. The system was first warmedup for 180 seconds. After that, the controller was turned on.Since the sampling period should be reasonably large sothat the scheduler can determine the effect of resourceallocation on the achieved slowdown ratio, it was set to3 seconds and the rate predictor estimates processing rateevery 10 sampling periods. The control gain in theseexperiments was set to 1, as suggested in Section 5.7.

We first conducted an experiment to evaluate theeffectiveness of the integrated strategy. Fig. 18 shows the


Fig. 17. The implementation structure of the integrated strategy.

workload configurations and corresponding results. In this

experiment, we assumed there were two classes and the

target slowdown ratio was set to 4. The number of UEs was

set to 300. The arrival ratio of class 1 to class 2 was

randomly changed within 1=9 and 9=1 for every 3 seconds.

Fig. 18a depicts the arrival rate of both classes for every

3 seconds. It is to emulate the frequent and abrupt changes

of a class’s workload. From Fig. 18b, we observe that,

although the workload generated by SURGE does not

strictly meet the M=GP=1 model, the achieved slowdown

ratios can still be kept around the target. This demonstrates

that the integrated strategy is effective in providing

PSD services under realistic workload configurations. It

also shows that the suggested small control gain is a good

choice in real systems.We further carried out experiments to evaluate the

performance of the integrated strategy under different

target slowdown ratios and workloads. Fig. 19 shows the

fifth, 50th, and 95th percentiles of achieved slowdown

ratios. In these experiments, the workload has a similar

configuration as shown in Fig. 18a. In Fig. 19, we observe

that the target ratios (2, 4, and 8) can be achieved under

different workload conditions. In the figure, we also

observe that the variance of achieved ratios becomes

relatively large with the increase of workloads. In the

implementation, the average slowdown of a class is

controlled by adjusting the number of its allocated

processes. Because all processes have the same priority,

this method has little impact on the service time of an

individual request. With the increase of workload, a class’s

average queuing delay increases and so do its slowdown

and variance. One possible solution is to control the service

time of an individual request by adjusting the priority of

child processes according to workload conditions. We will

evaluate this in our future implementations.

7 RELATED WORK

Providing differentiated services to network applications

and clients has become popular. In order to provide end-to-end differentiated services, issues on both the network-side

and server-side should be addressed. On the network side,previous efforts focused on providing queuing delay

differentiation and many packet scheduling algorithmshave been proposed to achieve proportional delay differ-

entiation in network routers. Representative algorithmsinclude proportional average delay [13], mean delay

proportional [26], VirtualLength [33], and Little’s averagedelay [34]. These kinds of algorithms can be tailored to

provide differentiated delay services on Internet servers[20]. Another important performance metric is loss rate. The

focus of [19] is on providing differentiated loss services todifferent classes. The authors proposed an algorithm in

which a packet’s drop probability is calculated based on itsarrival process so as to provide differentiated loss services.

On the server side, a primary focus has been on priority-

based request scheduling with admission control fordifferentiated services in terms of response time [2], [9],

[14]. For example, in early work [2], the authors addressedstrict priority scheduling strategies for controlling CPU

utilization in Web content hosting servers. Service differ-entiation was introduced by assigning strict priorities to

requests for different content. The results showed thatservice differentiation can be achieved, but the quality

differences among different classes cannot be quantitativelyguaranteed by the use of strict priority scheduling. One

objective of our work is to guarantee the quality differences.When a system is overloaded, admission control drops

requests from lowerpriority classes so as to guarantee servicequality receivedby those fromhigher priority classes [9], [10],

[20]. In this paper, we mainly focus on providing differ-entiated services when the resource demands of workloads

are within the resource capacity of the system. This iscomplementary to previous work on admission control for

overload protection and service differentiation.Feedback control theory has been applied as an analytic

method for providing differentiated services. In [1], [24],

[30], the authors used system identification to modelInternet servers based on their past behaviors. Then, a

closed-loop system with an integral control law was appliedto provide differentiated services in which different classes

had different average queue length or queuing delay. Incontrast, using feedback control theory aside, our strategy

takes advantage of queuing theory to predict the processing


Fig. 18. Performance of the integrated strategy (�2=�1 ¼ 4). (a) Workload

configuration. (b) Achieved slowdown ratios.

Fig. 19. Performance of the integrated strategy under different

environments.

rate that should be allocated to a class according to itspredicted load conditions.

The integration of queuing theory and feedback controlhas been applied in providing differentiated delay services.In [25], the authors extended the strategy presented in [32]to implement relative delay differentiation services. Thestrategy predicted the expected delay from a class’s loadusing queuing theory and a feedback controller was used toadjust the resource allocation based on the error betweenthe achieved delay ratio and the target one. A connectionscheduler was used as an actuator to control the relativedelays of different classes. The objective of this paper is toprovide differentiated services in terms of slowdown,which needs consideration of queuing delay as well as arequest’s service time.

Slowdown is an important performance metric ofresponsiveness because it is desirable that a request’s delaybe proportional to its processing requirement. In [17], theauthors evaluated online job assignment strategies in adistributed server system, where the workload was heavy-tailed and job size was unknown to the scheduler. Theprimary objective was to minimize mean slowdown of alljobs in the distributed system. In contrast, we want toprovide proportional slowdown differentiation services torequests from different classes. Slowdown is also commonlyknown as normalized response time or stretch factor [18],[38]. In [38], the authors adopted stretch factor as theperformance metric for service differentiation in a cluster ofInternet servers, which are modeled as an M=M=1 queuingsystem. We note that, for an M=M=1 FCFS queue with theunbounded exponential service time density distribution,there is no valid stretch factor or slowdown because m�1 isnonexistent. For an M=M=1 FCFS queue with a boundedexponential service time distribution, there is no closed-form expression for the stretch factor or slowdown becausem�1 only has a definite value when the lower bound andthe upper bound of service time are given. In addition,recent Internet workload measurements indicate that, formany Web applications, the exponential distribution is apoor model for service time distribution and that a heavy-tailed distribution is more accurate [3], [17]. In this paper,we investigate the problem of processing-rate allocation forPSD provisioning under a popular heavy-tailed trafficpattern (bounded Pareto).

Most recently, we developed a self-tuning fuzzy controlapproach for QoS-aware resource management on Internetservers [35]. It assumes no knowledge about the inputtraffic. The approach was implemented in an eQoS frame-work that assisted an Apache Web server in providing aclient-perceived and end-to-end QoS guarantees.

8 CONCLUSION

Slowdown is an important performance metric of Internetservices because it takes into account the delay and servicetime of a request simultaneously. Although delay and lossrate have been studies in QoS-aware design for a long time,there was little work done in providing differentiatedservices in terms of slowdown.

In this paper, we have investigated the problem ofprocessing-rate allocation for PSD provisioning on Internet

servers. We have derived a closed-form expression of theexpected slowdown in an M=G=1 FCFS queuing systemwith bounded Pareto service time distribution. We haveproposed a queuing-theoretic strategy for processing-rateallocation so as to achieve PSD services. Simulation resultshave shown that the queuing-theoretic strategy can guar-antee PSD services on average. In order to improve thepredictability of PSD services and reduce the varianceachieved by the queuing-theoretic strategy, we havedeveloped and integrated a feedback controller into thestrategy. The integrated strategy achieved the objective ofPSD provisioning at a fine-grained level. We futherimplemented the integrated strategy in an Apache Webserver. We carried out experiments using representativeworkloads generated by SURGE. Results have demon-strated that the proposed strategy is feasible and effective inpractice also.

There are several remaining issues and challenges thatwarrant further research. For example, in this paper,requests from the same class are scheduled usingFCFS discipline. The effect of other disciplines, such asshortest remaining processing time first discipline, alsoneeds further investigation.

ACKNOWLEDGMENTS

The authors would like to thank the anonymous reviewersfor their time and valuable suggestions. This work wassupported in part by US National Science Foundation grantACI-0203592 and NASA grant 03-OBPR-01-0049.

REFERENCES

[1] T.F. Abdelzaher, K.G. Shin, and N. Bhatti, “Performance Guaran-tees for Web Server End-Systems: A Control-Theoretical Ap-proach,” IEEE Trans. Parallel and Distributed Systems, vol. 13, no. 1,pp. 80-96, Jan. 2002.

[2] J. Almeida, M. Dabu, A. Manikutty, and P. Cao, “ProvidingDifferentiated Levels of Service in Web Content Hosting,” Proc.ACM SIGMETRICS Workshop Internet Server Performance, pp. 91-102, 1998.

[3] M. Arlitt and T. Jin, “A Workload Characterization Study of the1998 World Cup Web Site,” IEEE Network, vol. 14, no. 3, pp. 30-37,May-June 2000.

[4] M. Arlitt and C.L. Williamson, “Internet Web Servers: WorkloadCharacterization and Performance Implications,” IEEE/ACMTrans. Networking, vol. 5, no. 5, pp. 631-645, Oct. 1997.

[5] P. Barford and M. Crovella, “Generating Representative WebWorkloads for Network and Server Performance Evaluation,”Proc. ACM Sigmetrics, pp. 151-160, 1998.

[6] M.A. Bender, S. Chakrabarti, and S. Muthukrishnan, “Flow andStretch Metrics for Scheduling Continuous Job Streams,” Proc.ACM-SIAM Symp. Discrete Algorithms, pp. 270-279, 1998.

[7] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss,“An Architecture for Differentiated Services,” IETF, RFC 2475,Dec. 1998.

[8] C. Chekuri, A. Goel, S. Khanna, and A. Kumar, “Multi-ProcessorScheduling to Minimize Flow Time with � Resource Augmenta-tion,” Proc. ACM Symp. Theory of Computing, pp. 363-372, 2004.

[9] X. Chen and P. Mohapatra, “Performance Evaluation of ServiceDifferentiating Internet Servers,” IEEE Trans. Computers, vol. 51,no. 11, pp. 1368-1375, Nov. 2002.

[10] L. Cherkasova and P. Phaal, “Session-Based Admission Control: AMechanism for Peak Load Management of Commerical WebSites,” IEEE Trans. Computers, vol. 51, no. 6, pp. 669-685, June 2002.

[11] M.E. Crovella, R. Frangioso, and M. Harchol-Balter, “ConnectionScheduling in Web Servers,” Proc. USENIX Symp. InternetTechnologies and Systems, 1999.


[12] M.E. Crovella, M. Harchol-Balter, and C.D. Murta, “Task Assign-ment in a Distributed System: Improving Performance byUnbalancing Load,” Proc. ACM Sigmetrics, pp. 268-269, 1998.

[13] C. Dovrolis, D. Stiliadis, and P. Ramanathan, “ProportionalDifferentiated Services: Delay Differentiation and Packet Schedul-ing,” IEEE/ACM Trans. Networking, vol. 10, no. 1, pp. 12-26, 2002.

[14] L. Eggert and J. Heidemann, “Application-Level DifferentiatedServices for Web Servers,”World Wide Web J., vol. 2, no. 3, pp. 133-142, 1999.

[15] D.G. Feitelson and L. Rudolph, “Metrics and Benchmarking forParallel Job Scheduling,” Proc. Int’l Parallel Processing Symp.Workshop Job Scheduling Strategies for Parallel Processing, pp. 1-24,1998.

[16] G.F. Franklin, J.D. Powell, and A. Emami-naeini, Feedback Controlof Dynamic Systems, fourth ed. Prentice Hall, 2002.

[17] M. Harchol-Balter, “Task Assignment with Unknown Duration,”J. ACM, vol. 49, no. 2, pp. 260-288, 2002.

[18] M. Harchol-Balter, B. Schroeder, N. Bansal, and M. Agrawal,“Size-Based Scheduling to Improve Web Performance,” ACMTrans. Computer Systems, vol. 21, no. 2, pp. 207-233, May 2003.

[19] Y. Huang and R. Guerin, “A Simple FIFO-Based Scheme forDifferentiated Loss Guarantees,” Proc. Int’l Workshop Quality ofService, pp. 96-105, 2004.

[20] S.C. Lee, J.C. Lui, and D.K. Yau, “A Proportional-Delay Diffserv-Enabled Web Server: Admission Control and Dynamic Adap-tion,” IEEE Trans. Parallel and Distributed Systems, vol. 15, no. 5,pp. 385-400, May 2004.

[21] W.E. Leland, M.S. Taqqu, W. Willinger, and D.V. Wilson, “On theSelf-Similar Nature of Ethernet Traffic (Extended Version),” IEEE/ACM Trans. Networking, vol. 2, no. 1, pp. 1-15, Feb. 1994.

[22] M.K. Leung, J.C. Lui, and D.K. Yau, “Adaptive Proportional DelayDifferentiated Services: Characterization and Performance Eva-luation,” IEEE/ACM Trans. Networking, vol. 9, no. 6, pp. 801-817,Dec. 2001.

[23] Z. Liu, N. Niclausse, and C. Jalpa-Villanueval, “Traffic Model andPerformance Evaluation of Web Servers,” Performance Evaluation,vol. 46, nos. 2-3, pp. 77-100, Oct. 2001.

[24] C. Lu, T.F. Abdelzaher, J.A. Sankovic, and S.H. Son, “A FeedbackControl Approach for Guaranteeing Relative Delays in WebServers,” Proc. IEEE Real-Time and Embedded Technology andApplications Symp., 2001.

[25] Y. Lu, T.F. Abdelzaher, C. Lu, L. Sha, and X. Liu, “FeedbackControl with Queuing-Theoretic Prediction for Relative DelayGuarantees in Web Servers,” Proc. IEEE Real-Time and EmbeddedTechnology and Applications Symp., pp. 208-217, 2003.

[26] T. Nandagopal, N. Venkitaraman, R. Sivakumar, and V. Bhar-ghavan, “Delay Differentiation and Adaptation in Core StatelessNetworks,” Proc. IEEE Infocom, pp. 421-430, 2000.

[27] K. Nichols, V. Jacobson, and L. Zhang, “A Two-Bit DifferentiatedServices Architecture for the Internet,” IETF, RFC 2638, July 1999.

[28] D.P. Olshefski, J. Nieh, and E. Nahum, “ksniffer: Determining theRemote Client Perceived Response Time from Live PacketStreams,” Proc. Usenix Operating Systems Design and Implementa-tion, 2004.

[29] A.K. Parekh and R.G. Gallager, “A Generalized Processor SharingApproach to Flow Control in Integrated Services Networks: TheSingle-Node Case,” IEEE/ACM Trans. Networking, vol. 1, no. 3,pp. 344-357, 1993.

[30] S. Parekh, N. Gandhi, J. Hellerstein, D. Tilbury, T. Jayram, and J.Bigus, “Using Control Theory to Achieve Service Level Objectivesin Performance Management,” J. Real-Time Systems, pp. 127-141,2002.

[31] V. Paxson and S. Floyd, “Wide Area Traffic: The Failure of PossionModeling,” IEEE/ACM Trans. Networking, vol. 3, no. 3, pp. 226-244,June 1995.

[32] L. Sha, X. Liu, Y. Lu, and T.F. Abdelzaher, “Queuing Model BasedNetwork Server Performance Control,” Proc. IEEE Real-TimeSystems Symp., pp. 81-90, 2002.

[33] J. Wei, Q. Li, and C. Xu, “VirtualLength: A New PacketScheduling Algorithm for Proportional Delay Differentiation,”Proc. Int’l Conf. Computer Comm. and Network, pp. 331-336, 2003.

[34] J. Wei, C. Xu, and X. Zhou, “A Robust Packet SchedulingAlgorithm for Proportional Delay Differentiation Services,” Proc.IEEE Globecom, 2004.

[35] J. Wei and C. Xu, “A Self-Tuning Fuzzy Control Approach forEnd-to-End QoS Guarantees in Web Servers,” Proc. Int’l WorkshopQuality of Service, 2005.

[36] X. Zhou, J. Wei, and C. Xu, “Processing Rate Allocation forProportional Slowdown Differentiation on Internet Servers,” Proc.Int’l Parallel and Distributed Processing Symp., 2004.

[37] X. Zhou and C. Xu, “Harmonic Proportional Bandwidth Alloca-tion and Scheduling for Service Scheduling on StreamingServers,” IEEE Trans. Parallel and Distributed Systems, vol. 15,no. 9, pp. 835-848, Sept. 2004.

[38] H. Zhu, H. Tang, and T. Yang, “Demand-Driven ServiceDifferentiation in Cluster-Based Network Servers,” Proc. IEEEInfocom, pp. 679-688, 2001.

Jianbin Wei received the BS degree in compu-ter science from Huazhong University of Scienceand Technology, China, in 1997 and the MSdegree in computer engineering from WayneState University in 2003. He is a PhD candidatein the Department of Electrical and ComputerEngineering at Wayne State University, Detroit,Michigan. His research interests are in distrib-uted and Internet computing systems. He is astudent member of the IEEE.

Xiaobo Zhou received the BS, MS, and PhDdegrees in computer science from NanjingUniversity, China, in 1994, 1997, and 2000,respectively. He is an assistant professor in theDepartment of Computer Science, University ofColorado at Colorado Springs. His researchinterests are in distributed and Internet comput-ing systems, broadband multimedia applica-tions, and networking security. He hasauthored/coauthored more than 30 papers in

these areas. He was the founding program cochair of the IEEEInternational Workshop on Security in Systems and Networks (SSN).He was the guest coeditor of the Journal of Network and ComputerApplications for distributed denial of service and intrusion detection. Hehas served on the technical program committees of numerousinternational conferences, including the most recent IEEE ICCCN ’05and MDC ’05. His research was supported in part by the US Air ForceResearch Laboratory. He was a visiting scientist in 1999 and apostdoctoral research associate in 2000 at Paderborn Center forParallel Computing, University of Paderborn, Germany. From January2001 to August 2003, he was a visiting assistant professor in theDepartment of Computer Science, Wayne State University, Detroit. Heis a member of the IEEE Computer Society.

Cheng-Zhong Xu received the BS and MSdegrees from Nanjing University in 1986 and1989, respectively, and the PhD degree from theUniversity of Hong Kong in 1993, all in computerscience. He is currently an associate professorin the Department of Electrical and ComputerEngineer of Wayne State University, Detroit,Michigan. He has published more than 80 peer-reviewed journal and conference papers in theareas of distributed and parallel computing, and

scalable and secure Internet services. He is the author of the bookScalable and Secure Internet Services and Architecture (CRC Press,2005) and the leading coauthor of the book Load Balancing in ParallelComputers: Theory and Practice (Kluwer Academic, 1997). He was aguest coeditor of the Journal of Parallel and Distributed Computing onscalable Web services in October 2003. He has also served on theprogram committees of numerous international conferences andeditorial boards of international journals. His research was supportedin part by the US National Science Foundation and NASA. He is arecipient of the “Faculty Research Award” of Wayne State University in2000, the President’s Awards for Excellence in Teaching in 2002, andthe Career Development Chair Award in 2003. He is a senior member ofthe IEEE and a member of the ACM.

. For more information on this or any other computing topic,please visit our Digital Library at www.computer.org/publications/dlib.


964 ieee transactions on computers, vol. 54, no. 8, …zbo/publications/zhou-tc05.pdf · robust...

Documents