
1

Support for dynamic adaptation of power-aware server clusters

Vinicius Petrucci, Orlando Loques
Fluminense Federal University, Brazil

Daniel Mossé
University of Pittsburgh, USA

March, 2009

2

Research context

• Dynamic computing environments
  – varying workloads
  – resource variability (including component failures)
  – changing user needs

• Applications have to cope with changes
  – adaptive behavior requirement

• Support for dynamic adaptations
  – reusable infrastructure
  – adaptation language

3

Application cases

• Server clusters
  – power optimization and QoS control
• Wireless sensor networks
  – bandwidth availability, data reliability (accuracy), power optimization
• Overlay networks
  – topology reconfiguration
• Grids
  – shared (heterogeneous) resources with varying quality and availability
• Pervasive / ubiquitous computing

4

Wireless sensor networks

Distributed autonomous devices which rely on sensors to cooperatively monitor physical or environmental conditions

* Wikipedia.org

Example: energy optimization can be achieved by turning some sensors on/off

5

Videoconferencing system

Set of servers (called reflectors) that route the audio/video streams to the participating clients.

Example: monitoring and control of the reflector configuration to meet the QoS

[Figure: reflectors refUFF, refLMPD, and refUERJ routing streams to the clients]

6

Server clusters

Example: power optimization while meeting performance / QoS requirements

Clients have a single view through ServerCluster component (load balancer) and requests are processed by back-end servers

7

Problem

• Adaptive policies for applications
  – implementation may be complex in itself
  – most of those are implemented in an ad-hoc fashion

• Code for adaptation policies is
  – mixed with the application code
  – costly and difficult to modify and maintain in a real operational environment

8

Approach

• Generic solution to support adaptations
  – external reusable infrastructure to monitor and adapt running applications
  – contract-based adaptation language for representing high-level policies

• Software architecture abstractions
  – representation of application configurations
  – stored as meta-level data (object model)

9

Related work

• Rainbow (CMU)
  – adaptation language + supporting infrastructure

• Autonomic managers (IBM)
  – provides a generic view of autonomic computing

• Jade (INRIA)
  – lack of adaptation knowledge representation

• CASA (Univ. of Zurich)
  – contract-based language using XML

• We propose a lightweight approach based on scripting/dynamic language facilities

10

Autonomic computing (IBM)


Knowledge: adaptation models, data, and scripts

General feedback control loop

11

Adaptation framework


13

Adaptation language

• Profiles
  – conditions for triggering adaptations
• Adaptations
  – steps to move an application away from an undesirable condition
• Negotiation clauses
  – particular order to deploy the adaptations
• Constructs: adapt_period, settling_time
  – cater for timing issues of adaptation


18

Scripting languages

• Scripting/dynamic language (Python)
  – high-level abstractions for expressing dynamic adaptation policies
  – built-in functions simplify infrastructure development (e.g., compile, exec)
• Abstract adaptation operators
  – mapped to application-level operations at run-time
  – may rely on APIs provided by the app support level (e.g., Apache modules API)
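The role of compile and exec can be illustrated with a minimal sketch. It assumes that a profile is a Python expression and an adaptation body is a block of Python statements, both held as strings. FakeCluster, PROFILE_SRC, ADAPTATION_SRC, and step are hypothetical names used only for illustration; they are not the framework's API.

```python
class FakeCluster:
    """Stand-in for the monitored application (hypothetical, for illustration)."""

    def __init__(self, load, max_load):
        self.load = load
        self._max_load = max_load
        self.actions = []  # records the operators that were invoked

    def maxLoad(self):
        return self._max_load

    def turnOn(self, server):
        self.actions.append(("on", server))


# Policy text as it might appear inside a contract (assumed syntax).
PROFILE_SRC = "webcluster.load / webcluster.maxLoad() > T_HIGH"
ADAPTATION_SRC = "webcluster.turnOn('server2')"

# Compile once when the contract is loaded, so syntax errors surface early.
profile_code = compile(PROFILE_SRC, "<profile util_high>", "eval")
adaptation_code = compile(ADAPTATION_SRC, "<adaptation>", "exec")


def step(cluster, t_high=0.85):
    """Evaluate the profile and run the adaptation body if it holds."""
    env = {"webcluster": cluster, "T_HIGH": t_high}
    if eval(profile_code, env):
        exec(adaptation_code, env)
        return True
    return False
```

Because the policy text is bound to the application object only through the env dictionary, the same policy could be reused with any object that offers the same operators.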

19

Multiple adaptation contracts

• Support for multiple domains of adaptation
  – each contract has one thread of control

• Simple concurrency model
  – global locking mechanism
  – First-Come, First-Serve approach

while contract.running:
    for a in contract.adaptations:
        if a.profile is True:
            execute adaptation code of "a"
            sleep for "settling_time" interval
    sleep for "adapt_period" interval
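The control loop above can be sketched as runnable Python. This is a minimal sketch under stated assumptions, not the framework's code: Adaptation, Contract, run_contract, GLOBAL_LOCK, and the max_rounds parameter are hypothetical names, and times are in seconds. It also assumes that the lock is held only while a profile is checked and its adaptation runs, not during the settling time.

```python
import threading
import time

# One lock shared by all contracts, so adaptations from different contracts
# never interleave. threading.Lock does not strictly guarantee FIFO wake-up
# order, so this only approximates a First-Come, First-Serve policy.
GLOBAL_LOCK = threading.Lock()


class Adaptation:
    def __init__(self, profile, action, settling_time):
        self.profile = profile              # callable returning True or False
        self.action = action                # callable performing the adaptation
        self.settling_time = settling_time  # seconds to wait after acting


class Contract:
    def __init__(self, adaptations, adapt_period):
        self.adaptations = adaptations
        self.adapt_period = adapt_period    # seconds between evaluation rounds
        self.running = True


def run_contract(contract, max_rounds=None):
    """Thread body for one contract; max_rounds is only a test convenience."""
    rounds = 0
    while contract.running:
        for a in contract.adaptations:
            fired = False
            with GLOBAL_LOCK:
                if a.profile():
                    a.action()
                    fired = True
            if fired:
                time.sleep(a.settling_time)
        time.sleep(contract.adapt_period)
        rounds += 1
        if max_rounds is not None and rounds >= max_rounds:
            contract.running = False
```

Each contract would be started with threading.Thread(target=run_contract, args=(contract,)), which gives one thread of control per contract.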

20

The case of server clusters

• Server utilization remains very low
  – average about 6%

• Energy consumption is high and growing
  – about 9% per year

• Carbon emissions are set to quadruple by 2012
  – projected to surpass the airline industry

• Great opportunity for dynamic adaptations

Source: Uptime Institute (McKinsey & Co. Report, http://uptimeinstitute.org)

21

Dynamic adaptations

• Dynamic adaptation capabilities
  – CPU DVFS (dynamic voltage/frequency scaling)
  – server on/off mechanisms (e.g., suspend-to-RAM + wake-on-LAN)

• Power and performance trade-off
  – servers' capacity management to reduce energy consumption
  – guarantee of QoS requirements (e.g., utilization or response time)

22

Configuration problem

N = number of servers
F_i = number of frequencies of server i
p_busy, p_idle = power cost
perf = servers' performance
x_ij = decision variable
demand = incoming workload

23

Objective: minimize the overall power consumption

24

Constraint: associate decision variable x_ij with objective function variables

25

Constraint: select only one frequency on a given server

26

Constraint: handle the incoming workload (given by demand)
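The four captions above can be read as the parts of a mixed-integer program. The following is a minimal sketch of one such program, not necessarily the authors' exact model. The utilization variable u_ij, the per-frequency indexing of p_busy, p_idle, and perf, and the reading of "only one frequency" as "at most one, with none selected meaning the server is off" are all assumptions made for illustration.

```latex
\begin{align}
\min_{x,\,u}\quad
  & \sum_{i=1}^{N}\sum_{j=1}^{F_i}
    \Bigl[\, p^{\mathrm{idle}}_{ij}\,x_{ij}
          + \bigl(p^{\mathrm{busy}}_{ij}-p^{\mathrm{idle}}_{ij}\bigr)\,u_{ij} \Bigr]
  && \text{(overall power)} \\
\text{s.t.}\quad
  & 0 \le u_{ij} \le x_{ij}
  && \forall i,\ j \quad \text{(link } u_{ij} \text{ to } x_{ij}\text{)} \\
  & \sum_{j=1}^{F_i} x_{ij} \le 1
  && \forall i \quad \text{(one frequency per server)} \\
  & \sum_{i=1}^{N}\sum_{j=1}^{F_i} \mathrm{perf}_{ij}\,u_{ij} \ge \mathrm{demand}
  && \text{(handle the workload)} \\
  & x_{ij} \in \{0,1\}
  && \forall i,\ j
\end{align}
```

Under this reading, x_ij = 1 selects frequency j on server i, and a server whose x_ij are all zero is turned off and contributes no power.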

27

Adaptation example

• Thresholds for cluster utilization
  – e.g., T_LOW = 0.70 and T_HIGH = 0.85

profile { webcluster.load / webcluster.maxLoad() < T_LOW } util_low;

profile { webcluster.load / webcluster.maxLoad() > T_HIGH } util_high;
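The two profiles leave a band between T_LOW and T_HIGH in which neither holds, so no adaptation is triggered there. A small sketch makes this explicit; active_profiles is a hypothetical helper, not part of the adaptation language.

```python
T_LOW, T_HIGH = 0.70, 0.85  # thresholds taken from the slide


def active_profiles(load, max_load):
    """Return the names of the profiles that hold for the given load."""
    utilization = load / max_load
    names = []
    if utilization < T_LOW:
        names.append("util_low")
    if utilization > T_HIGH:
        names.append("util_high")
    return names
```

For example, a cluster at 80% utilization activates no profile, while 50% activates util_low and 90% activates util_high.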

28

Adaptation example

contract {
  adaptation {
    demand = webCluster.load / T_HIGH
    changeConf = webCluster.bestConfig(demand)
    for (s, f) in changeConf:
      if f == 0:
        webCluster.turnOff(s)
      else:
        if s.status == 0:
          webCluster.turnOn(s)
        webCluster.adjustFreq(s, f)
  } adjustCluster with util_low or util_high \
      settling_time 6000/*ms*/;
} decision1 adapt_period 5000/*ms*/;
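The contract delegates the choice of configuration to webCluster.bestConfig(demand). A minimal sketch of what such an operator could compute is given below, using exhaustive search in place of whatever optimization method the authors use. The SERVERS table, its numbers, and the best_config function are hypothetical; level 0 stands for "off", matching the f == 0 test in the contract.

```python
from itertools import product

# Hypothetical per-server tables: index 0 means "off"; other indices are
# frequency levels given as (performance in req/s, power in watts).
SERVERS = {
    "s1": [(0, 0.0), (100, 80.0), (150, 110.0)],
    "s2": [(0, 0.0), (120, 90.0), (200, 140.0)],
}


def best_config(demand, servers=SERVERS):
    """Return {server: level} with the least power whose capacity covers demand.

    Returns None when even the largest configuration cannot meet the demand.
    """
    names = list(servers)
    best, best_power = None, float("inf")
    # Enumerate every combination of one level per server.
    for levels in product(*(range(len(servers[n])) for n in names)):
        chosen = [servers[n][lvl] for n, lvl in zip(names, levels)]
        capacity = sum(perf for perf, _ in chosen)
        power = sum(watts for _, watts in chosen)
        if capacity >= demand and power < best_power:
            best, best_power = dict(zip(names, levels)), power
    return best
```

Exhaustive search grows exponentially with the number of servers, so it only suits very small clusters; it is shown here because it is easy to check by hand.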


30

Adaptation example

• Common monitoring support
  – e.g., variable access: webcluster.load

• Reusable adaptation operators
  – e.g., webcluster.turnOn(), webcluster.turnOff()

• Some policy-specific operators can also be defined
  – e.g., webcluster.bestConfig()

• Different adaptation policies can be used

31

Application-specific layer

• Apache built-in load balancer module
  – mod_proxy_balancer

• New Apache module in C (mod_frontend)
  – exposes an API (XML-RPC) for
    • monitoring system properties
    • controlling the front-end web server
  – Example
    • sensors -> load (req/s), req. response time
    • actuators -> DVS, On/Off

32

Experimental evaluation

• Dedicated web cluster testbed


33

Controlling cluster utilization


34

Power/energy savings

* Energy consumption reduction of ~37% compared to not using adaptations

35

Using different quality metric


36

Supporting multiple contracts

* Running concurrent adaptation contracts: power management and fault tolerance

37

Different adaptation policies

Disruption: the number of server turn-on (and turn-off) adaptations, each of which may involve a switching cost.

What is the best way to minimize both disruption and energy consumption?

Future study: anticipatory adaptation model, risk-aware controller, ...

38

Adaptation time overhead

• Worst case measured (overall adaptation phase): 13,045.78 ms
• Operations: 1 on, 1 off, and 2 adj. freq. => 12,012 ms + 1,005 ms + 7 ms + 7 ms = 13,013 ms
• Framework overhead = 32.78 ms

39

Conclusion

• Framework-based approach to support dynamic adaptations
  – power and performance management for server clusters

• Re-usability of the adaptation infrastructure
  – simplifies both evaluation and management of different adaptation policies / requirements
  – helps to reduce the development cost of adaptive applications

40

Future work

• Improvements in the framework
  – forecasting for adaptation decisions

• Other power-aware adaptations
  – multi-core architecture / memory systems

• Optimization algorithms for adaptation
  – processor allocation among multiple services / applications

• Experimental evaluation
  – virtualization -> consolidation, live migration
  – more realistic/real workloads

41

Power and performance


42

Fault tolerance contract

contract {
  adaptation {
    srv = webcluster.getFailedServer()
    newsrv = webcluster.allocNewServer()
    if newsrv:
      webcluster.replaceServer(srv, newsrv)
    else:
      webcluster.log("could not allocate server")
  } repair with server_fail settling_time 4000/*ms*/;
} fault_tolerance adapt_period 1000/*ms*/;

profile { webcluster.failure > 0 } server_fail;


44

Filter modules


45

Holt's method
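Holt's method (double exponential smoothing) tracks a level and a trend, which makes it a candidate for forecasting load ahead of an adaptation decision. The sketch below is the textbook form of the method, not necessarily the variant used in this work. The function name holt_forecast, the smoothing factors alpha and beta, and the initialization from the first two samples are assumptions.

```python
def holt_forecast(series, alpha=0.5, beta=0.3, horizon=1):
    """Forecast `horizon` steps ahead with Holt's linear trend method.

    alpha smooths the level, beta smooths the trend (both in (0, 1]).
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Common initialization: first sample as level, first difference as trend.
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        previous_level = level
        # New level: blend the observation with the previous one-step forecast.
        level = alpha * y + (1 - alpha) * (previous_level + trend)
        # New trend: blend the latest level change with the previous trend.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level + horizon * trend
```

On a perfectly linear series the forecast simply continues the line, which gives a quick sanity check for the implementation.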
