bahati et al. 2007 - policy-driven autonomic management of multi-component systems



    Policy-driven Autonomic Management of Multi-component Systems

    Raphael M. Bahati, Michael A. Bauer, Elvis M. Vieira

    Department of Computer Science

    The University of Western Ontario,

    London, ON N6A 5B7, CANADA

    Email: {rbahati;bauer;elvis}@csd.uwo.ca

    Abstract

    Policies have been proposed as a means to express required or desired behavior of systems and applications, and possible management actions for resolving violations, to an autonomic manager. In multi-component systems, such as e-commerce systems, independent sets of policies often deal with managing the behavior of the individual components. In turn, the autonomic management system uses the policies to make decisions on what actions to take per component when a policy is violated. During operation of these multi-component systems, however, these independent sets of policies may yield multiple directives from which the autonomic manager must select one or more appropriate actions. In this work we look at heuristics that an autonomic manager might use to select an action. We outline the design and implementation of an autonomic manager making use of these heuristics and describe our experiences with it in a dynamic Web server. Experimental results are reported comparing the effectiveness of the heuristics.

    Copyright © 2007 Raphael M. Bahati, Michael A. Bauer, and Elvis M. Vieira. Permission to copy is hereby granted provided the original copyright notice is reproduced in copies made.

    1 Introduction

    Today's systems are becoming increasingly complex, both in terms of the infrastructure under which they operate and what the users of these systems expect. As a result, managing such systems has become quite a challenge. Our interest is in the development of mechanisms for automating the management process to enable the efficient operation of systems and the utilization of services. Policy-based management offers significant benefit to this effect since it makes it easy to define and modify system behavior, at run-time, through policy manipulation rather than through re-engineering [11]. In autonomic management, policies are often used to express required or desired behavior of systems and applications. As such, policies can be input to or embedded within the autonomic management elements of the system to provide the kinds of directives which an autonomic manager could make use of in order to meet operational requirements.

    Within multi-component systems, such as e-commerce systems, policy specification is frequently component-based, that is, focused on the operational requirements of a particular system component, e.g. a Web server or a database. In such systems, it is quite common for multiple components to co-exist on a single server and cooperate to deliver a set of services. Hence, it is reasonable to expect that each component would have its own set of associated policies. (While it might be possible to define policies that deal with multi-component behavior, we note that these can be difficult to determine as the number of components and interactions grows.) It is therefore likely that the autonomic manager will have to deal with multiple directives from which one or more appropriate actions must be selected.

    In multi-component systems, when operational requirements are not met, several policies associated with the different components might be violated. In such situations, the autonomic manager must decide which policy (or policies) is most important at that time and/or which actions, e.g. adjustments to tuning parameters, should be taken. This task, however, is highly non-trivial, particularly in dynamic environments in which the workload characteristics change and where the number of components and their interactions change as well. This work looks at heuristics that an autonomic manager might use to select an action. We outline the design and implementation of an autonomic manager making use of these heuristics, describe its use in managing a dynamic Web server environment comprising Apache [1], PHP [5], and a database server, and report on experimental results on the effectiveness of the heuristics. The heuristics are based on the structure of the policies and should, therefore, be applicable to other domains.

    The remainder of this paper is organized as follows: In Section 2, we review related work. In Section 3, we describe the structure of policies that we assume in our work, provide some examples and outline heuristics for selecting a policy and actions when multiple policies are violated. In Section 4, we briefly describe the main components of our policy-driven autonomous management system. Section 5 describes the implementation of the proposed mechanisms and Section 6 reports on experiments. Finally, Section 7 provides a brief summary and outlines some future directions.

    2 Related Work

    Policy-based management has received significant interest in recent years. Within autonomic computing, policies can be considered to fall into three main categories [13]: Action policies (or Obligation policies [11]) take the form of if [conditions] then [actions] where the policy defines possible actions that could be taken whenever certain conditions are satisfied; that is, the policy is violated. Goal policies specify the desired state(s) and let the system itself figure out how to achieve the desired behavior. Utility function policies define an objective function that aims to model the behavior of the system at each possible state. While goal and utility function policies have attracted a lot of interest in the recent past (see, for example, [12, 13, 19]), they remain quite difficult to elicit [13], particularly because it is often quite difficult to develop models that accurately capture the dynamics of complex systems [18]. This is particularly true in e-commerce systems where multiple components may co-operate to deliver a set of services and where complex interactions may arise due to changes in configuration and workload characteristics. Our interest is in the use of action policies (referred to herein as expectation policies) since they can be defined and modified on a per-component basis and can provide useful information for autonomic managers.

    Most of the research on the use of action policies has focused on their specification and use as is within systems, where changes to the policies are only possible through manual intervention. In an environment where multiple sets of policies may co-exist and where, at run-time, multiple policies may be violated, policy selection is often based on statically configured policy priorities which an administrative user may have to explicitly specify. However, as system behavior becomes more complex, determining priorities among policies applicable to multiple components becomes more challenging. It is necessary, therefore, for autonomic systems to have mechanisms to dynamically adjust the way they use policies to deal with not only changes in the configuration of the managed environment, but also the unpredictability in workload characteristics.

    An autonomic system can change the way it uses policies in three broad ways (as outlined in [15]): (1) the system can adapt policies by making changes to the policy parameters; (2) the system can use runtime context to dynamically use the information provided by the policies, such as enabling/disabling a policy under certain circumstances or selecting actions based on context; or (3) the system can determine policy use through some learning mechanisms. The focus of this paper is on the second of these approaches and differs from most of the work in this area (see, for example, [15, 20]) in that our focus is on the selection of actions in situations where multiple policies could be violated.

    3 Policies and Heuristic Selection Strategies

    As with much of the previous work on policy-driven management ([11, 14]), we assume that our policies are essentially condition-action pairs. In particular, we assume that a policy $p_i$ associated with a particular component or the system itself is defined as follows:

    $$p_i = (C_i, A_i) \tag{1}$$

    where $C_i$ is a set of conditions and $A_i$ is an ordered list of actions, each of the form $(a_i, t_i)$, where $a_i$ is an action that makes an adjustment to some tuning parameter and $t_i$ is an invalidation test which evaluates to true or false. The test can be used to determine if the component state or context invalidates the particular action. An example would be if a particular tuning parameter could not be decreased beyond a preset limit, such as decreasing the maximum allowable request rate below 50 (see, for example, the policy with ID 10 in Table 1). We further assume that our conditions are all threshold conditions, i.e., of the form $c_i \; op \; V_t(c_i)$, where $op$ is a relational operator and $V_t(c_i)$ is the threshold of condition $c_i$. Several sample policies are illustrated in Table 1 where they are expressed in a stylized "if condition then action" form; alternative actions are separated by a vertical bar (|). Since the order is important, more drastic actions could be taken once it is no longer possible, for example, to meet the objectives through tuning application parameters. This is precisely the purpose of the action AdjustMaxRequestRate, which throttles requests to the server by reducing the rate at which the server accepts client requests.
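    The policy structure of Equation 1 can be illustrated with a small data model. The sketch below is not the authors' implementation: the class names, the `metrics` and `state` dictionaries, and the reading of each `test{...}` clause as a predicate on the value the parameter would take after the adjustment are all assumptions made for illustration. It encodes the policy with ID 10 from Table 1.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List

# Relational operators allowed in a threshold condition "c_i op V_t(c_i)".
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


@dataclass
class Condition:
    metric: str       # e.g. "APACHE:responseTime"
    op: str           # a key of OPS
    threshold: float  # V_t(c_i)

    def holds(self, metrics: Dict[str, float]) -> bool:
        """True when the metric was reported and satisfies the threshold test."""
        return self.metric in metrics and OPS[self.op](metrics[self.metric], self.threshold)


@dataclass
class Action:
    name: str                                 # e.g. "AdjustMaxClients"
    delta: float                              # adjustment applied to the parameter
    test: Callable[[Dict[str, float]], bool]  # assumed: True when the action is allowed


@dataclass
class Policy:
    policy_id: int
    conditions: List[Condition]  # C_i
    actions: List[Action]        # A_i, ordered from least to most drastic

    def is_violated(self, metrics: Dict[str, float]) -> bool:
        """A policy is violated when all of its conditions hold."""
        return all(c.holds(metrics) for c in self.conditions)


# Policy 10 from Table 1; `state` holds the current parameter values (hypothetical).
POLICY_10 = Policy(
    policy_id=10,
    conditions=[
        Condition("APACHE:responseTime", ">", 2000.0),
        Condition("APACHE:responseTimeTREND", ">", 0.0),
    ],
    actions=[
        Action("AdjustMaxClients", +25, lambda s: s["MaxClients"] + 25 < 151),
        Action("AdjustMaxKeepAliveRequests", -30, lambda s: s["MaxKeepAliveRequests"] - 30 > 1),
        Action("AdjustMaxRequestRate", -25, lambda s: s["MaxRequestRate"] - 25 > 49),
    ],
)
```

    With this model, the test attached to AdjustMaxRequestRate rejects the action once the adjusted rate would fall below 50, which is the preset limit mentioned above.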

    In an environment where multiple applications (possibly with multiple tuning parameters) exist, the cause of a violation may be difficult to identify, let alone resolve, since each application may define its own set of policies on how to deal with violations. For example, a violation in Apache's response time may be addressed in three different ways, as illustrated by the policies with ID 10, 20, and 30 in Table 1: first, by increasing the number of server processes (MaxClients), reducing the amount of time a client holds onto a server process (MaxKeepAliveRequests), or throttling requests to the server (MaxRequestRate); second, by increasing PHP's cache size (EaccMemSize); third, by increasing the amount of physical memory used to index database tables (KeyBufferSize) or the amount of physical memory used to cache query results (QueryCacheSize). A critical question then is, given these multiple directives, how should the autonomic manager go about selecting appropriate actions to try to enforce the desired behavior?

    In the simplest case where only a single policy is violated, the choice of actions is simpler, namely, the actions advocated by the violated policy; in many cases this may only be one. In more complex scenarios where multiple policies (possibly advocating conflicting actions) are violated, however, this task is not as trivial. While enforcing all the actions advocated by the violated policies might be a possibility, it is not the best strategy for several reasons. First, in our current approach, we permit only a single action within a single expectation policy to be executed. This is a strategy of doing something simple and seeing if there is a positive effect. If the change is not sufficient, then a violation is likely to occur again and a further action (which could be the same, e.g. increasing or decreasing the value of the parameter) can be taken. The management cycle in the implementation is short enough that this can happen quickly. It is also worth noting that, at any one point, there is only one bottleneck component. Second, taking multiple actions makes it difficult to understand the impact of the actions, e.g. were they all necessary, were some more effective than others, etc. By having the autonomic manager take a single action and log that action and other information, an analysis component can examine that information and possibly determine which action(s), or the order thereof, is better. We have begun addressing some of these issues [9].

    Our approach to evaluating policy actions requires an estimate of the value of an action. This is particularly useful when it is necessary to differentiate one action from another, given that multiple, and sometimes conflicting, actions may be advocated by the violated policies. This value may be derived from either the temporal characteristics of the environment or the long-term experience in the use of policy actions. This paper focuses mainly on the former, and considers several factors in computing this value, which is referred to hereafter as the strength of the action.

    The severity of the violation: Rather than treating each violation equally, more weight is assigned to those violations that are more severe. The severity measure is based on the value of the metric relative to the condition's threshold. For example, for a CPU utilization of 100% given the condition CPU:utilization > 85% (i.e., due to the violation of the policy with ID 31 in Table 1), the severity is computed based on the difference between the measured value and its threshold value (i.e., 15%). This is defined by Equation 4 and will be elaborated on shortly.

    The significance of the violation: In the case that multiple policies are violated, it may be desirable to assign a higher priority or weight to a particular event so that the management system can first respond to such a violation (i.e., by selecting appropriate policy actions) before dealing with other less important violations. For instance, it is quite reasonable to respond to CPU utilization violations before addressing violations related to response time since failure to address the former may result in more severe violations of the latter as a result of overutilization of CPU resources. This is done by allowing a weight to be associated with events, which then become weights on the conditions that become true in violated policies. The weight associated with a condition $c_i$ is defined in the following as $W_C(c_i)$; for experiments reported in this paper, all policy conditions were given an equal weight.

    The advocacy of the action: In the case that multiple policies are violated, it might be possible that more than one policy advocates the same action. For example, in our current test environment involving the Apache server and other components (see Section 5.1), different policies with different conditions (see, for example, policies with ID 10 and 11 in Table 1) may indicate that the same action be taken, e.g. AdjustMaxRequestRate, which adjusts the maximum number of requests the server can process. The number of policies advocating the action as well as the position of the action within each policy are also considered when evaluating its strength. The position is of particular interest since, in our experience, it is often the case that more drastic actions are not taken until other actions to adjust tuning parameters have been tried. This is precisely the case for the policy with ID 10, for example, where requests are throttled after it is no longer possible to adjust the MaxClients and MaxKeepAliveRequests parameters.

    The specificity of the policy: In a situation where several policies are violated, the number of conditions within each policy could be taken into consideration. For example, in the event both CPU utilization and response time are violated, the policy with both conditions violated would be considered more significant. As a result of violating policies with ID 10 and 11 in Table 1, for instance, additional weight will be given to policy 11.

    Then, the strength of a policy action, $a_i$, is computed as follows:

    $$Q(a_i) = \frac{\sum_{p_j \in [P_v]_{a_i}} S(p_j) \times W_A(a_i)}{n([P_v]_{a_i})} \tag{2}$$

    where $P_v$ is the set of policies that have been violated, $[P_v]_{a_i}$ is the subset of $P_v$ advocating action $a_i$, $n([P_v]_{a_i})$ is the size of that subset, $W_A(a_i)$ is the weight of action $a_i$ based on its position within policy $p_j$, and $S(p_j)$ is the strength of conditions of policy $p_j$ as specified by Equation 3:

    $$S(p_j) = \sum_{c_i \in p_j} W_C(c_i) \times V(c_i) \tag{3}$$

    where $W_C(c_i)$ is the weight of condition $c_i$ based on the significance of its violation, and $V(c_i)$ is the severity of its violation.


    Group   ID  Policy Rule
    ------  --  -----------
    Apache  10  expectation policy RESPONSETIMEViolation(PDP,PEP)
                if (APACHE:responseTime > 2000.0 ms) & (APACHE:responseTimeTREND > 0.0)
                then {AdjustMaxClients(+25) test{newMaxClients < 151} |
                      AdjustMaxKeepAliveRequests(-30) test{newMaxKeepAliveRequests > 1} |
                      AdjustMaxRequestRate(-25) test{newMaxRequestsRate > 49}}

            11  expectation policy CPUandRESPONSETIMEViolation(PDP,PEP)
                if (CPU:utilization > 85.0%) & (CPU:utilizationTREND > 0.0) &
                   (APACHE:responseTime > 2000.0 ms) & (APACHE:responseTimeTREND > 0.0)
                then {AdjustMaxKeepAliveRequests(-30) test{newMaxKeepAliveRequests > 1} |
                      AdjustMaxRequestRate(-25) test{newMaxRequestsRate > 49}}

    PHP     20  expectation policy RESPONSETIMEViolation(PDP,PEP)
                if (APACHE:responseTime > 2000.0 ms) & (APACHE:responseTimeTREND > 0.0)
                then {AdjustEaccMemSize(+1 M) test{availableEaccMemSize < 1 M} &
                      test{newEaccMemSize < 32 M}}

            21  expectation policy SERVERnormal(PDP,PEP)
                if (CPU:utilization < 10.0%) & (MEMORY:utilization < 25.0%) &
                   (APACHE:responseTime < 200.0 ms) & ! (APACHE:refusedRequests > 0.0)
                then {AdjustEaccMemSize(-1 M) test{newEaccMemSize > 16 M}}

    MySQL   30  expectation policy RESPONSETIMEViolation(PDP,PEP)
                if (APACHE:responseTime > 2000.0 ms) & (APACHE:responseTimeTREND > 0.0)
                then {AdjustKeyBufferSize(+1 M) test{availableKeyBlocks > 1000} &
                      test{newKeyBufferSize < 16 M} |
                      AdjustQueryCacheSize(+1 M) test{availableQueryCacheMem < 1 M} &
                      test{newQueryCacheSize < 32 M}}

            31  expectation policy CPUViolation(PDP,PEP)
                if (CPU:utilization > 85.0%) & (CPU:utilizationTREND > 0.0)
                then {AdjustThreadCacheSize(+50) test{newThreadCacheSize < 201}}

    Table 1: A subset of expectation policies used to manage the LAMP server.

    The severity $V(c_i)$ of the violation of condition $c_i$ is computed as follows:

    $$V(c_i) = \frac{V_c(c_i) - V_t(c_i)}{y} \tag{4}$$

    where $V_c(c_i)$ is the value of the metric in the current event responsible for violating condition $c_i$, $V_t(c_i)$ is the threshold value of condition $c_i$, and

    $$y = \begin{cases} 1, & V_t(c_i) = 0 \\ V_t(c_i), & \text{otherwise} \end{cases} \tag{5}$$
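    As a worked illustration of the severity measure and of the condition strength $S(p_j)$, the sketch below evaluates the CPU example from the text (a utilization of 100% against a threshold of 85%). It assumes the severity is the difference between the measured value and the threshold divided by $y$, and that all condition weights default to 1, as in the reported experiments. The function names and the trend reading of 2.0 are hypothetical.

```python
def severity(current: float, threshold: float) -> float:
    """V(c_i): distance of the measured value from the threshold, scaled by y.

    y is 1 when the threshold is 0 (avoiding a division by zero), otherwise
    y is the threshold itself.
    """
    y = 1.0 if threshold == 0 else threshold
    return (current - threshold) / y


def policy_strength(readings, weights=None) -> float:
    """S(p_j): weighted sum of the severities of a policy's conditions.

    readings maps a condition name to (current value, threshold).
    weights maps a condition name to W_C; missing names default to 1.0.
    """
    weights = weights or {}
    return sum(
        weights.get(name, 1.0) * severity(current, threshold)
        for name, (current, threshold) in readings.items()
    )


# Conditions of the policy with ID 31 in Table 1.
POLICY_31_READINGS = {
    "CPU:utilization": (100.0, 85.0),
    "CPU:utilizationTREND": (2.0, 0.0),  # hypothetical trend reading
}

if __name__ == "__main__":
    print(round(severity(100.0, 85.0), 3))                # 0.176
    print(round(policy_strength(POLICY_31_READINGS), 3))  # 2.176
```

    Because the sketch uses the signed difference, a "less than" condition such as CPU:utilization < 10% would yield a negative value; whether a magnitude is intended in that case is not settled by the text.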

    An immediate consequence of using an algorithm to select an action based on the above is that the autonomic system's behavior in the use of policies becomes dynamic. That is, the same set of violations could, for example, result in different actions being taken depending on their current strength. This is in contrast to static approaches where the order of the actions is always the same for the same set of violations.
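    The dynamic behavior described above can be seen in a small numerical sketch. It assumes that the strength of an action is the average, over the policies advocating it, of the policy strength multiplied by a position weight. The position weight used here, 1/(position + 1), is purely hypothetical, since the text does not give the values of $W_A$; the policy strengths in the two scenarios are also made up.

```python
from collections import defaultdict


def position_weight(position: int) -> float:
    """Hypothetical W_A: actions listed earlier in a policy weigh more."""
    return 1.0 / (position + 1)


def action_strengths(violated):
    """Compute Q(a_i) for every action advocated by the violated policies.

    violated maps a policy ID to (S(p_j), ordered list of action names).
    Each policy is assumed to list an action at most once.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for strength, actions in violated.values():
        for position, name in enumerate(actions):
            totals[name] += strength * position_weight(position)
            counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}


# Action lists of the policies with ID 10 and 11 in Table 1.
ACTIONS_10 = ["AdjustMaxClients", "AdjustMaxKeepAliveRequests", "AdjustMaxRequestRate"]
ACTIONS_11 = ["AdjustMaxKeepAliveRequests", "AdjustMaxRequestRate"]

# Same two violated policies, different (hypothetical) strengths.
scenario_a = {10: (0.5, ACTIONS_10), 11: (0.8, ACTIONS_11)}
scenario_b = {10: (1.0, ACTIONS_10), 11: (1.05, ACTIONS_11)}

if __name__ == "__main__":
    for label, scenario in (("A", scenario_a), ("B", scenario_b)):
        q = action_strengths(scenario)
        print(label, max(q, key=q.get))
```

    Under these assumptions, scenario A ranks AdjustMaxKeepAliveRequests first and scenario B ranks AdjustMaxClients first, although the same two policies are violated in both.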

    4 The Autonomous Management Framework

    A general framework for policy-driven autonomic management is depicted in Figure 1. The architecture was first proposed in [8] and further examined in [6] in the context of self-configuring and self-tuning the Apache Web server. We briefly highlight the key features of the architecture and the relevant interactions between the different components.

    Figure 1: A framework for policy-driven autonomic management.

    4.1 Architectural Components

    Knowledge Base: This component is a shared repository for system policies and other relevant information. This may include information for determining corrective actions for resolving QoS violations as well as configuring systems and applications. The information about policies is eventually distributed to other management components, and then realized as actions driving autonomic management.

    Monitor (M): Monitors gather performance metric information of interest to the management system, such as resource utilization, response time, throughput and other relevant information.

    Monitor Manager: This component deals with the management of Monitors, including instantiating (i.e., loading and starting) a monitor for a certain resource type to be monitored as well as providing the context of monitoring (i.e., monitoring frequency or time interval for periodic monitoring or monitoring times for scheduled monitoring). In addition, the Monitor Manager allows monitors to be reconfigured (i.e., enabling/disabling a monitor, adjusting the context of monitoring, etc.). It also processes monitor events and reports the details to the Event Handler.

    Event Handler: The Event Handler deals with the processing of events from the Monitor Manager to determine whether there are any QoS violations (based on the enabled policy conditions) and forwarding appropriate notifications to the interested components. This includes notifying the PDP of any violations as well as forwarding information to the Event Log for archiving.

    Policy Decision Point (PDP): This component is responsible for deciding what action(s) to take given one or more violation events from the Event Handler. This includes deciding which expectation policy, if any has been violated, is most important and then what action to take.

    Policy Enforcement Point (PEP): This component maps the actions subscribed by the PDP to the executable elements corresponding to the various Effectors.

    Effector (E): Effectors translate the policy decisions, i.e. corrective actions, into adjustments of configuration parameters to implement the corrective actions.

    Event Log: This component archives traces of the management system's events onto (1) an event log in memory for capturing recent short-term events, and (2) a persistent event log on disk for capturing long-term history events for later examination. Such events may include QoS requirements violations from the Event Handler, records of decisions made by the PDP in response to the violations, the actions enforced by the PEP, and other relevant management events.

    Event Analyzer: This component correlates the events with respect to the contexts, performs trend analysis based on the statistical information, and models complex situations for causality analysis and predictive outcomes of corrective actions, to enable the PDP to learn from the past, predict the future, make appropriate trade-offs and choose optimal corrective actions.

    4.2 Component Interactions

    Having described the building blocks of the policy-driven autonomous management system, we describe the interactions between the components and the steps taken to resolve violations. For the purpose of this section, it should be assumed that the components have already been initialized with appropriate policies. (A brief description of the initialization processes can be found in Section 5.3.)

    1. The Monitors collect and forward the metric information to the Monitor Manager at each polling interval.

    2. The Monitor Manager processes the received events (i.e., computes averages and trends based on historical data) and then forwards the processed information to the Event Handler.

    3. The Event Handler checks to see whether any of the QoS requirements may have been violated. The violation notifications are then forwarded to the PDP.

    4. During each management interval (10 seconds), the PDP collects all the violation messages from the Event Handler. The messages are then processed to determine whether any of its enabled expectation policies has been violated. The PDP then selects corrective actions, from the actions advocated by the violated policies, and forwards the selected actions to the PEP.

    5. On receiving the policy actions, the PEP performs tests associated with each action, if any, and if successful, invokes the appropriate Effector to perform the actual adjustment to the system's or application's parameter(s).
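    The last step of the interaction sequence can be sketched as follows. This is an illustration under stated assumptions rather than the authors' PEP: actions are modeled as dictionaries, each optional test is a predicate over the current parameter values and the proposed adjustment, and Effectors are plain functions that mutate a `state` dictionary. All of these names are hypothetical. Consistent with the single-action strategy of Section 3, the sketch executes only the first action whose test passes.

```python
def make_effector(parameter):
    """Build a hypothetical Effector that adjusts one tuning parameter."""
    def effector(state, delta):
        state[parameter] += delta
    return effector


def enforce(ranked_actions, state, effectors):
    """Execute the first action, in ranked order, whose test passes.

    Returns the name of the executed action, or None if every test failed.
    """
    for action in ranked_actions:
        test = action.get("test")
        if test is None or test(state, action["delta"]):
            effectors[action["name"]](state, action["delta"])
            return action["name"]
    return None


EFFECTORS = {
    "AdjustMaxClients": make_effector("MaxClients"),
    "AdjustMaxKeepAliveRequests": make_effector("MaxKeepAliveRequests"),
}

# Two actions of the policy with ID 10 in Table 1, already sorted by the PDP.
RANKED = [
    {"name": "AdjustMaxClients", "delta": 25,
     "test": lambda s, d: s["MaxClients"] + d < 151},
    {"name": "AdjustMaxKeepAliveRequests", "delta": -30,
     "test": lambda s, d: s["MaxKeepAliveRequests"] + d > 1},
]

if __name__ == "__main__":
    state = {"MaxClients": 150, "MaxKeepAliveRequests": 100}
    print(enforce(RANKED, state, EFFECTORS), state)
```

    In this run MaxClients is already at its limit, so the first test fails and the PEP falls through to AdjustMaxKeepAliveRequests.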

    A detailed look at the functionality of the PDP follows.

    4.3 Determining Corrective Policy Actions

    Central to the functionality of the PDP is the need to determine what actions to take given that policies are violated. Algorithm 1 describes the steps the PDP takes at each management interval to determine the corrective actions for resolving QoS requirements violations. The steps of the algorithm can be summarized as follows:

    1. Form a set $[P_v]$ of violated policies (from the enabled expectation policies) based on the violation events in $[E]$ (lines 1 - 12). A policy is said to be violated if all its conditions evaluate to true when matched against the violation events in $[E]$.

    Algorithm 1 Actions Selection Algorithm

    Input: $[E]$ - a set of violation events.
    Input: $[P]$ - a set of expectation policies.
    Output: $[A_v]$ - a set of unique policy actions sorted by $Q(a_i)$.

     1: for each policy $p_i \in [P]$ do
     2:   for each condition $c_j \in p_i$ do
     3:     for each event $e_k \in [E]$ do
     4:       if $c_j$ is TRUE then
     5:         break
     6:       end if
     7:     end for
     8:   end for
     9:   if all conditions $[C] \in p_i$ are TRUE then
    10:     $[P_v] \leftarrow p_i$
    11:   end if
    12: end for
    13: for each policy $p_i \in [P_v]$ do
    14:   for each condition $c_j \in p_i$ do
    15:     for each event $e_k \in [E]$ do
    16:       if $c_j$ is TRUE then
    17:         Compute $V(c_j)$ (see Equation 4)
    18:         break
    19:       end if
    20:     end for
    21:   end for
    22: end for
    23: for each policy $p_i \in [P_v]$ do
    24:   $[A_v] \leftarrow a_i \in p_i$
    25: end for
    26: for each action $a_i \in [A_v]$ do
    27:   Compute $Q(a_i)$ (see Equation 2)
    28: end for
    29: Sort $[A_v]$ by $Q(a_i)$
    30: Send $[A_v]$ to the PEP

    2. Compute the severity of each condition in $[P_v]$ using the values of the violation events in $[E]$ (lines 13 - 22). The set $[E]$ consists of unique violation events. (If the same violation occurs more than once during the interval, the average value is used.)


    3. Form a set $[A_v]$ of unique policy actions based on the actions advocated by the violated policies in $[P_v]$ (lines 23 - 25).

    4. Compute the strength of each policy action in $[A_v]$ (lines 26 - 28), by taking into account the factors listed in Section 3.

    5. Sort the actions in $[A_v]$ by their strength (line 29). The aim here is to ensure that actions with the highest strength value are tried first.

    6. Forward the sorted actions set $[A_v]$ to the PEP (line 30). Since only a single action is executed, assuming it passes the tests, the order in which the actions are arranged is of great importance.
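    The six steps above can be condensed into a short runnable sketch. It is an interpretation under several assumptions, not the authors' PDP: events are a dictionary of averaged metric values, policies are plain dictionaries, all condition weights are equal, the position weight is the hypothetical 1/(position + 1), and the action strength is taken to be the average of the weighted policy strengths over the advocating policies. The metric readings in the example are made up.

```python
import operator
from collections import defaultdict

OPS = {">": operator.gt, "<": operator.lt}


def severity(value, threshold):
    """Severity of a violated condition (difference scaled by the threshold)."""
    y = 1.0 if threshold == 0 else threshold
    return (value - threshold) / y


def select_actions(events, policies, cond_weight=1.0):
    """Return the names of the advocated actions, strongest first."""
    # Step 1: a policy is violated when all of its conditions hold.
    violated = [
        p for p in policies
        if all(m in events and OPS[op](events[m], t) for m, op, t in p["conditions"])
    ]
    # Step 2: severity of each condition, summed into the policy strength.
    strength = {
        p["id"]: sum(cond_weight * severity(events[m], t) for m, _, t in p["conditions"])
        for p in violated
    }
    # Steps 3 and 4: unique actions and their strengths.
    totals, counts = defaultdict(float), defaultdict(int)
    for p in violated:
        for position, action in enumerate(p["actions"]):
            totals[action] += strength[p["id"]] / (position + 1)  # assumed position weight
            counts[action] += 1
    q = {action: totals[action] / counts[action] for action in totals}
    # Step 5: strongest action first; step 6 would forward this list to the PEP.
    return sorted(q, key=q.get, reverse=True)


RT = [("APACHE:responseTime", ">", 2000.0), ("APACHE:responseTimeTREND", ">", 0.0)]
CPU = [("CPU:utilization", ">", 85.0), ("CPU:utilizationTREND", ">", 0.0)]

# Policies 10, 11 and 31 from Table 1 (tests omitted for brevity).
POLICIES = [
    {"id": 10, "conditions": RT,
     "actions": ["AdjustMaxClients", "AdjustMaxKeepAliveRequests", "AdjustMaxRequestRate"]},
    {"id": 11, "conditions": CPU + RT,
     "actions": ["AdjustMaxKeepAliveRequests", "AdjustMaxRequestRate"]},
    {"id": 31, "conditions": CPU, "actions": ["AdjustThreadCacheSize"]},
]

# Hypothetical averaged readings for one management interval.
EVENTS = {
    "APACHE:responseTime": 3000.0, "APACHE:responseTimeTREND": 0.5,
    "CPU:utilization": 100.0, "CPU:utilizationTREND": 0.2,
}

if __name__ == "__main__":
    print(select_actions(EVENTS, POLICIES))
```

    Folding the severity computation into the policy strength keeps the sketch short; Algorithm 1 performs it as a separate pass (lines 13 - 22).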

    5 Implementation and Experience

    In our previous work [7], we presented a study on the management of a basic system comprising an Apache Web server. The investigation mainly focused on the behavior of Apache while serving static content (i.e., static HTML pages). We extend this work by investigating performance behavior specific to a multi-tiered Web-server environment under the changes proposed in Section 4.3.

    5.1 Managed Applications

    Our investigation primarily focuses on the performance management of a LAMP server consisting of Linux, Apache, MySQL [3], and PHP. The PHP performance has been further enhanced with the eAccelerator [2] encoder. This module provides mechanisms for caching compiled scripts so that later requests invoking similar scripts do not incur a compilation penalty. Of particular interest is the EaccMemSize parameter, which controls the size of the memory cache. (The actual adjustment to the parameter is done by editing the PHP configuration file and gracefully restarting Apache.) We use the PHP Bulletin Board (phpBB) application [4] to generate dynamic Web pages. This application utilizes queries to display information stored inside a database, in our case, the MySQL database. The main database tables include forums, topics, posts, users, and groups. These tables are used to store information specific to discussions. In addition to viewing forum-related information, users may post messages using forms, which can be viewed through a Web browser.

    We have implemented mechanisms for managing the performance of the MySQL database by developing appropriate Effectors for adjusting important database tuning parameters. (The server has support for dynamic adjustment of these parameters and does not require restarting.) Our initial work has focused on a few parameters whose tuning has been shown to greatly impact the performance of the database. (For a comprehensive description of the MySQL tuning parameters, the reader is referred to [3].) The Effectors implemented include those for manipulating the following parameters: The KeyBufferSize corresponds to the maximum amount of physical memory used to index database tables. The ThreadCacheSize corresponds to the total number of threads the database server may cache for reuse. Thus, instead of creating a new thread for each request to the database, the server uses the available threads in the cache to satisfy the request. This has the advantage of improving the response time as well as the CPU utilization. The QueryCacheSize corresponds to the maximum amount of physical memory used to cache query results. Thus, a query similar to one whose results were previously cached will be serviced from memory and not from disk. Finally, the MaxConnections corresponds to the maximum number of simultaneous connections to the database.
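    Since the MySQL server accepts run-time changes to these parameters, an Effector could in principle issue a `SET GLOBAL` statement rather than edit a configuration file. The sketch below only builds the statement text. The mapping from the parameter names used in this paper to MySQL system variable names is an assumption, the function name is hypothetical, and executing the statement (which needs a database connection and sufficient privileges) is left out. Note that `query_cache_size` is not available in MySQL 8.0 and later, where the query cache was removed.

```python
# Assumed mapping from the paper's parameter names to MySQL system variables.
MYSQL_VARIABLES = {
    "KeyBufferSize": "key_buffer_size",
    "ThreadCacheSize": "thread_cache_size",
    "QueryCacheSize": "query_cache_size",
    "MaxConnections": "max_connections",
}


def set_global_statement(parameter: str, value: int) -> str:
    """Build the SQL statement that sets a tuning parameter at run time."""
    if parameter not in MYSQL_VARIABLES:
        raise KeyError(f"unknown tuning parameter: {parameter}")
    if value < 0:
        raise ValueError("tuning parameter values must be non-negative")
    return f"SET GLOBAL {MYSQL_VARIABLES[parameter]} = {int(value)}"


if __name__ == "__main__":
    # Grow the key buffer from 15 MB to 16 MB, i.e. AdjustKeyBufferSize(+1 M).
    print(set_global_statement("KeyBufferSize", 16 * 1024 * 1024))
```

    Restricting the variable name to a fixed mapping, instead of interpolating arbitrary input, keeps the generated statement limited to the four parameters discussed above.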

    5.2 Workload Generator

    To simulate the stochastic behavior of users, we have modified the Apache load generator tool (ab) in several respects: (1) It uses threading mechanisms to support concurrent and independent keep-alive requests to the server. (2) It emulates the stochastic behavior of users with a Poisson request inter-arrival distribution. The tool takes as its input the maximum value of the clients' think time. (3) It allows for dynamic adjustment of both the upper limit of the clients' think time and the number of concurrent requests to the server using distributions provided. (4) It emulates the actual behavior of users by traversing the Web graph of an actual Web site. Thus, for each response from the server, the tool randomly selects which subsequent link (among the links in the received Web page) to follow.
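    Two of these behaviors, stochastic think times and random traversal of the Web graph, can be sketched in a few lines. This is not the modified `ab` tool: the function names, the toy graph, and the choice to draw exponentially distributed think times (the inter-arrival times of a Poisson process) capped at the maximum think time are assumptions made for illustration. No HTTP requests are issued.

```python
import random


def think_time(rng, mean, maximum):
    """Exponentially distributed think time in seconds, capped at `maximum`."""
    return min(rng.expovariate(1.0 / mean), maximum)


def simulate_session(rng, graph, start, steps, mean_think, max_think):
    """Walk the Web graph, picking a random outgoing link after each page.

    graph maps a page to the list of links found on it.
    Returns a list of (page, think time before the next request).
    """
    page, trace = start, []
    for _ in range(steps):
        trace.append((page, think_time(rng, mean_think, max_think)))
        links = graph.get(page, [])
        if not links:
            break  # dead end: no link to follow
        page = rng.choice(links)
    return trace


# Toy graph loosely shaped like a bulletin board (hypothetical page names).
GRAPH = {
    "index": ["forum", "login"],
    "forum": ["topic", "index"],
    "topic": ["forum", "post"],
    "post": ["topic"],
    "login": ["index"],
}

if __name__ == "__main__":
    rng = random.Random(42)  # seeded so that runs are repeatable
    for page, pause in simulate_session(rng, GRAPH, "index", 5, 2.0, 6.0):
        print(f"{page:6s} think {pause:.2f}s")
```

    Passing the random generator in explicitly makes it possible to give each simulated client its own independent stream.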

    5.3 Prototype Implementation

    We make use of a Policy Tool to provide an interface to the autonomous management system. It allows users to manipulate policies stored in the Knowledge Base. This may, for example, include adding, enabling, disabling, or deleting groups, policies, conditions, actions and tests, as well as organizing policies into groups. We have extended this tool to provide remote configuration capability whereby the management components can be started and stopped remotely as shown in Figure 2. The tool is also used as a monitoring console for observing the behavior of the Web server.

    The policy-object model structure (see [6]) has also been extended to incorporate additional policy attributes proposed in Section 3. The Knowledge Base has been configured with a set of policies which are organized in the following four main groups:

    AM: consists of policies specific to the autonomous management system, including configuration policies for installing the components of the management system.

    Apache: consists of policies specific to the management of the Apache Web server. The use of these policies has been the focus of our previous work [6, 7, 8].

    PHP: consists of policies specific to the PHP module which deal, particularly, with the eAccelerator caching management.

    MySQL: consists of policies specific to the management of the MySQL database.

    Figure 2: The testbed environment, consisting of five workstations connected via a 10/100 Mbps Ethernet switch. One workstation hosts Apache (configured with a PHP module) as well as the MySQL database. This workstation also hosts the Knowledge Base containing configuration and expectation policies, a subset of which is shown in Table 1. It runs Linux (Fedora Core 4) on a 2.0 GHz processor with 2.0 GB of memory. The gold, silver, and bronze workstations are used to simulate user behavior.

    Briefly, the management system is instantiated by first invoking the Management Agent (not shown in Figure 1). The initial task of this agent is to query all the enabled configuration policies (from the Knowledge Base) whose subject matches the agent's name. These policies are used to install the management components, with the exception of Monitors, which are the responsibility of the Monitor Manager. The PDP, in turn, queries the Knowledge Base for all the enabled expectation policies and uses this information to decide how to respond to violations. Once the different management components have been installed, the manager's responsibility becomes ensuring that the appropriate components are notified of any changes to the policies governing the behavior of the system.
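    The two start-up queries described above can be sketched in Python as simple filters over the Knowledge Base. This is an illustration only: the `Policy` record, its field names, and the two query functions are hypothetical and do not reflect the actual policy-object model or Knowledge Base interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified policy record (not the real object model)."""
    name: str
    kind: str      # "configuration" or "expectation"
    subject: str   # component the policy applies to
    enabled: bool


def configuration_policies_for(knowledge_base, agent_name):
    """Enabled configuration policies whose subject matches the agent's name."""
    return [
        p for p in knowledge_base
        if p.enabled and p.kind == "configuration" and p.subject == agent_name
    ]


def enabled_expectation_policies(knowledge_base):
    """Enabled expectation policies, as the PDP would load them at start-up."""
    return [p for p in knowledge_base if p.enabled and p.kind == "expectation"]
```

    Keeping the two queries separate mirrors the division of labor in the text: the Management Agent only needs configuration policies addressed to it, while the PDP needs every enabled expectation policy regardless of subject.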

    6 Results

    In this section, we report on our initial evaluation of the performance of the different heuristics for selecting policy actions. Our comparisons focus on the behavior of the LAMP server with respect to several performance metrics: Apache's responsiveness (i.e., response time), throughput (i.e., number of requests processed and rejected), and resource utilization (i.e., CPU and memory). For the experiments reported, we considered three classes of users (gold, silver, bronze - best effort). For each class, the system could limit the number of requests from that class. In this paper, we only consider requests involving dynamic Web content. Measurements were taken every five seconds. For all the experiments, the load generator in each client workstation was configured to dynamically change the frequency at which requests were sent to the server. The request rate for the gold, silver, and bronze workstations was identical and followed a shape similar to a normal distribution: the server load was gradually increased to some maximum value and then decreased gradually to zero.

    Figure 3: Behavior under no policies.

    Figure 4: Behavior under heuristic 1.

                  CPU               Memory            Response          Accepted       Rejected
                  Utilization (%)   Utilization (%)   Time (ms)         Requests (#)   Requests (%)
    No policies   98.95 [2.91]      32.47 [1.27]      1894.33 [1266]    50 [41]        0.00
    Heuristic-1   72.60 [27.99]     30.50 [1.12]      1498.61 [1696]    48 [41]        48.94
    Heuristic-2   72.23 [23.01]     30.92 [0.82]      1526.15 [1394]    48 [41]        33.33
    Heuristic-3   78.26 [27.20]     31.32 [1.34]      1367.15 [1178]    45 [39]        34.25

    Table 2: Performance comparisons of the heuristics.

    In the first (base) experiment all requests were treated equally, i.e., there was no service differentiation, and we evaluated the behavior of the server without the benefit of the autonomous management system (Figure 3). This involved disabling all the expectation policies as well as setting the server's bandwidth arbitrarily large to prevent any requests from being rejected. For the remainder of the experiments, all the expectation policies were enabled, and we compared several heuristics: Heuristic 1 involved always selecting the actions of the first violated policy (Figure 4). Heuristic 2 involved selecting the actions of a single policy based on its priority as computed by Equation 3 (Figure 5). Heuristic 3 involved selecting the actions with the largest strength from those advocated by the violated policies as computed by Equation 2 (Figure 6). Due to space limitations, we do not comment further on service differentiation.
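    The three selection heuristics can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the `Action` and `ViolatedPolicy` records are hypothetical, `priority` and `strength` are treated as precomputed numbers standing in for the values of Equation 3 and Equation 2 respectively, and it is assumed that a larger number means a higher priority or a stronger action.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    name: str
    strength: float   # stand-in for the value given by Equation 2


@dataclass
class ViolatedPolicy:
    name: str
    priority: float   # stand-in for the value given by Equation 3
    actions: list = field(default_factory=list)


def heuristic_1(violated):
    """Actions of the first violated policy, in the order violations arrived."""
    return violated[0].actions if violated else []


def heuristic_2(violated):
    """Actions of the single violated policy with the highest priority."""
    if not violated:
        return []
    return max(violated, key=lambda p: p.priority).actions


def heuristic_3(violated):
    """Actions with the largest strength across all violated policies."""
    candidates = [a for p in violated for a in p.actions]
    if not candidates:
        return []
    best = max(a.strength for a in candidates)
    return [a for a in candidates if a.strength == best]
```

    Note the difference in granularity: heuristics 1 and 2 choose a policy and take all of its actions, whereas heuristic 3 pools the actions of every violated policy and chooses among the actions themselves.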

    We summarize the results of the experiments in Table 2 according to several performance metrics. The values were computed by averaging each metric's measurements during the overload period (i.e., from the 750 second mark through the 1550 second mark). A standard deviation (inside square brackets) is also listed beside each value. It should be pointed out that, while we use the mean value to assist in the comparisons, the main objective (as specified by the expectation policies) is to ensure that the thresholds specific to the different metrics are not exceeded: namely, 85% for CPU utilization, 50% for memory utilization, and 2000 ms for response time. The high standard deviations are due in part to fluctuations caused by actions taken by the management system, such as a graceful restart of the Apache server.
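    The way such a summary can be derived from the five-second samples is sketched below in Python. The function and constant names are hypothetical, the indexing convention (sample i taken at time i times the sampling period) is an assumption, and the population standard deviation is used only because the text does not say which estimator was applied.

```python
from statistics import mean, pstdev

# Thresholds stated by the expectation policies.
THRESHOLDS = {"cpu": 85.0, "memory": 50.0, "response_ms": 2000.0}
SAMPLE_PERIOD_S = 5                 # measurements were taken every five seconds
OVERLOAD_WINDOW_S = (750, 1550)     # overload period, in seconds


def summarize(samples, window=OVERLOAD_WINDOW_S, period=SAMPLE_PERIOD_S):
    """Mean and standard deviation of the samples inside the window.

    samples[i] is assumed to be the measurement taken at time i * period.
    """
    start, end = window
    inside = [v for i, v in enumerate(samples) if start <= i * period <= end]
    return mean(inside), pstdev(inside)


def exceeds_threshold(metric, value):
    """True if a summarized value is above the metric's policy threshold."""
    return value > THRESHOLDS[metric]
```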

    6.1 Response Time

    In terms of the server's response time, heuristic 3 performs better than heuristics 1 and 2 and the base experiment (1367.15 ms on average, versus 1498.61 ms, 1526.15 ms, and 1894.33 ms respectively; see Table 2). Also, as expected, all the heuristics performed better than the base experiment. This could be due to the fact that, in the base experiment, the CPU utilization increases to nearly 100% (see Figure 3), at which point the server is unable to respond to clients within the specified desired time limit. During this period (for the most part), the server's response time exceeds 2000 ms, as can be seen in the graph. This is in contrast to the behavior under, say, heuristic 2 or heuristic 3 whereby, as soon as a violation is observed, the management system is able to adjust the appropriate tuning parameters and, as a result, improve the response time.

    Figure 5: Behavior under heuristic 2.

    Figure 6: Behavior under heuristic 3.

    6.2 Throughput

    Overall, the average number of accepted requests under the base experiment is slightly higher than under the heuristics (50 versus 45 to 48; see Table 2). While this may seem desirable, note that most of these requests experience a much worse response time on average. It is, therefore, important to balance both objectives, and this is precisely the purpose of considering policies for improving both the server's response time and its throughput when deciding what action to take. This is reflected in the results for heuristics 2 and 3, which make use of such mechanisms. For the same reason, the percentage of rejected requests for heuristics 2 and 3 (33.33% and 34.25%) is much lower than that of heuristic 1 (48.94%).

    6.3 CPU Utilization

    In terms of CPU utilization, the behavior of the server is better under the heuristics than in the base experiment. However, heuristic 3 seems to perform slightly worse than heuristics 1 and 2 in terms of the average CPU utilization (see Table 2). This illustrates one of the key challenges in trying to balance conflicting objectives; we comment further on this in Section 6.5.

    6.4 Memory Utilization

    In the case of memory utilization, there is not much difference in the behavior of the server across the four experiments. Note, however, that while the utilization is constant throughout the run for the base experiment (Figure 3), there is a drop in memory utilization close to the end of the run in the case of the heuristics. This is the result of policies for configuring the server under normal behavior (i.e., when the load is low). For example, in such a situation, the value of MaxClients could be reduced, thereby freeing up resources.

    6.5 Exceeding Thresholds

    The graphs in Figures 3, 4, 5, and 6 illustrate the CPU and response time variation as a result of load during the experiment. The graphs suggest that the policies and heuristics do have a positive impact on the behavior of the system. However, averages can be misleading, especially when comparing the heuristics, since several peaks can distort the average. The averages and standard deviations (see Table 2) provide evidence to this effect. An alternative is to compute the area occupied by the curve beyond the threshold and divide it by the duration of the experiment (i.e., 2.0 × 10^3 seconds). Since the horizontal axis is time and the vertical axis is the value of the metric (CPU load or response time), the area is a measure of the magnitude of the violation.

    Figure 7: Performance beyond thresholds.
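    The area-based measure described above can be sketched in Python as follows. The function name `violation_magnitude` is hypothetical, and the rectangle rule (each sample is assumed to hold for one sampling period) is an assumption, since the text does not state how the area was integrated.

```python
def violation_magnitude(samples, threshold, period_s, duration_s):
    """Area of the curve above the threshold, divided by the run duration.

    Uses a rectangle rule: each sample is assumed to hold for period_s seconds.
    Samples at or below the threshold contribute nothing to the area.
    """
    area = sum(max(v - threshold, 0.0) * period_s for v in samples)
    return area / duration_s
```

    For example, CPU samples of 80, 90, 95, and 85 percent taken every 5 seconds over a 20-second run, with an 85% threshold, exceed the threshold by 5 and 10 points in two intervals, giving an area of 75 and a magnitude of 3.75. Unlike the mean, this measure ignores how far below the threshold the metric sits the rest of the time.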

    From Figure 7, one can see that, during the overload period, the time that the CPU exceeded the threshold was significantly lower for the heuristics than for the experiment with no policies. We also see that, while heuristic 3 performed slightly worse than heuristics 1 and 2 in terms of the magnitude of the violation beyond the threshold, it had a more positive impact on the server's response time than any of the other experiments. This could be a result of giving more weight to violations that are severe, in this case response time. It illustrates one of the key challenges facing autonomic systems, i.e., trying to negotiate between seemingly conflicting objectives. On the one hand, the system strives to meet customer needs, in this case by improving the server's response time; on the other hand, it tries to ensure efficient utilization of system resources. Striking a balance between these objectives, particularly for an autonomic manager making decisions based on violation events within a single management window, is extremely difficult. In order to begin addressing some of these challenges, policy selection mechanisms may need to consider past experience in the use of policies to make appropriate tradeoffs. This is especially important when policy selection decisions involve actions whose impact might be positive (or negative) but not immediate. We have begun addressing some of these issues in our preliminary implementation of the Event Analyzer (see, for example, [9]).

    7 Conclusion

    In this paper, we have presented mechanisms for selecting policy actions given that multiple policies, possibly advocating conflicting actions, are violated. Several heuristics were considered which provide directives for the autonomic manager to resolve violations of QoS requirements. An illustration of the effects of such mechanisms in managing the performance of a multi-tiered Web server consisting of Apache, MySQL, and PHP was also presented. The heuristics introduced are based only on the structure of the policies and, thus, they should also be applicable in other domains.

    While the results of the initial experiments are encouraging, there are several areas for improvement: (1) More extensive experiments that stress the different performance aspects of a dynamic Web server are needed. This may, for example, include evaluating the impact of database-write intensive transactions. (2) Our current work has focused exclusively on application adaptation, which involves tuning the behavior of applications to meet the constraints imposed by their environment. An avenue for future work is to incorporate environment adaptation mechanisms that deal with the adjustment of system parameters, such as CPU scheduling (see, for example, [16]). (3) More work is also needed in developing better policies for managing dynamic Web servers. Studies such as [10, 17] could provide a source of policies. (4) It is also important to evaluate the effectiveness of the policy actions based on long-term experience in their use. This is the focus of our future work on the Event Analyzer.

    About the Authors

    Raphael Bahati is a Ph.D. student in the Department of Computer Science at the University of Western Ontario. His current research interests include autonomic computing, distributed system management, and grid computing. Michael Bauer is a Professor of Computer Science at the University of Western Ontario with interests in distributed system management, autonomic computing, distributed resource allocation, and high performance computing grids. Elvis Vieira has just completed his Ph.D. in Computer Science at the University of Western Ontario and is pursuing postdoctoral studies. His research interests include quality of service, communication protocols, and autonomic systems.

    References

    [1] Apache. http://www.apache.org/

    [2] eAccelerator. http://eaccelerator.net/

    [3] MySQL. http://www.mysql.com/

    [4] phpBB. http://www.phpbb.com/

    [5] PHP. http://www.php.net/

    [6] R. M. Bahati, M. A. Bauer, C. Ahn, O. K. Baek, and E. M. Vieira. Mapping Policies into Autonomic Management Actions. In International Conference on Autonomic and Autonomous Systems (ICAS'06), page 38, Silicon Valley, CA, USA, July 2006.

    [7] R. M. Bahati, M. A. Bauer, C. Ahn, O. K. Baek, and E. M. Vieira. Policy-based Autonomic Management of an Apache Web Server. In International Conference on Self-Organization and Autonomous Systems in Computing and Communications (SOAS'06), volume 2, pages 21-30, Erfurt, Germany, September 2006.

    [8] R. M. Bahati, M. A. Bauer, C. Ahn, O. K. Baek, and E. M. Vieira. Using Policies to Drive Autonomic Management. In International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM'06), pages 475-479, Buffalo, NY, USA, June 2006.

    [9] R. M. Bahati, M. A. Bauer, and E. M. Vieira. Adaptation Strategies in Policy-Driven Autonomic Management. In International Conference on Autonomic and Autonomous Systems (ICAS'07), page 16, June 2007.

    [10] E. Cecchet, A. Chanda, S. Elnikety, J. Marguerite, and W. Zwaenepoel. Performance Comparison of Middleware Architectures for Generating Dynamic Web Content. In ACM/IFIP/USENIX International Middleware Conference, pages 242-261, Brazil, June 2003.

    [11] N. Damianou, N. Dulay, E. C. Lupu, and M. S. Sloman. Ponder: A Language for Specifying Security and Management Policies for Distributed Systems: The Language Specification (version 2.1). Technical Report, Imperial College, London, England, April 2000.

    [12] T. Kelly. Utility-directed Allocation. In Workshop on Algorithms and Architectures for Self-Managing Systems, San Diego, CA, USA, June 2003.

    [13] J. O. Kephart and W. E. Walsh. An Artificial Intelligence Perspective on Autonomic Computing Policies. In International Workshop on Policies for Distributed Systems and Networks (POLICY'04), pages 3-12, NY, USA, June 2004.

    [14] H. L. Lutfiyya, G. Molenkamp, M. J. Katchabaw, and M. A. Bauer. Issues in Managing Soft QoS Requirements in Distributed Systems Using a Policy-based Framework. In International Workshop on Policies for Distributed Systems and Networks (POLICY'01), pages 185-201, Bristol, UK, January 2001.

    [15] L. Lymberopoulos, E. C. Lupu, and M. S. Sloman. An Adaptive Policy-Based Framework for Network Services Management. In Journal of Network and Systems Management, volume 11, pages 277-303, 2003.

    [16] P. Pradhan, R. Tewari, S. Sahu, A. Chandra, and P. Shenoy. An Observation-based Approach Towards Self-managing Web Servers. In International Workshop on Quality of Service (IWQoS'02), pages 13-22, Miami, Florida, USA, May 2002.

    [17] U. V. Ramana and T. V. Prabhakar. Some Experiments with the Performance of LAMP Architecture. In International Conference on Computer and Information Technology (CIT'05), pages 916-920, Shanghai, China, September 2005.

    [18] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.

    [19] W. E. Walsh, G. Tesauro, J. O. Kephart, and R. Das. Utility Functions in Autonomic Systems. In International Conference on Autonomic Computing (ICAC'04), pages 70-77, New York, NY, USA, May 2004.

    [20] K. Yoshihara, M. Isomura, and H. Horiuchi. Distributed Policy-based Management Enabling Policy Adaptation on Monitoring using Active Network Technology. In IFIP/IEEE International Workshop on Distributed Systems: Operations and Management, Nancy, France, October 2001.
