1

Support for dynamic adaptation of power-aware server clusters

Vinicius Petrucci, Orlando Loques
Fluminense Federal University, Brazil

Daniel Mossé
University of Pittsburgh, USA

March 2009

2

Research context

• Dynamic computing environments
  – varying workloads
  – resource variability (including component failures)
  – changing user needs

• Applications have to cope with changes
  – adaptive behavior requirement

• Support for dynamic adaptations
  – reusable infrastructure
  – adaptation language

3

Application cases

• Server clusters
  – power optimization and QoS control
• Wireless sensor networks
  – bandwidth availability, data reliability (accuracy), power optimization
• Overlay networks
  – topology reconfiguration
• Grids
  – shared (heterogeneous) resources with varying quality and availability
• Pervasive / ubiquitous computing

4

Wireless sensor networks

Distributed autonomous devices which rely on sensors to cooperatively monitor physical or environmental conditions (source: Wikipedia.org)

Example: energy optimization can be achieved by turning some sensors on/off

5

Videoconferencing system

Set of servers (called reflectors) that route the audio/video streams to the participating clients.

Example: monitoring and control of the reflector configuration to meet the QoS requirements

[Figure: reflectors refUFF, refLMPD, refUERJ; clients]

6

Server clusters

Example: power optimization while meeting performance / QoS requirements

Clients see the cluster as a single system through the ServerCluster component (a load balancer); requests are processed by the back-end servers.

7

Problem

• Adaptive policies for applications
  – implementation may be complex in itself
  – most are implemented in an ad hoc fashion

• Code for adaptation policies is
  – mixed with the application code
  – costly and difficult to modify and maintain in a real operational environment

8

Approach

• Generic solution to support adaptations
  – external reusable infrastructure to monitor and adapt running applications
  – contract-based adaptation language for representing high-level policies

• Software architecture abstractions
  – representation of application configurations
  – stored as meta-level data (object model)

9

Related work

• Rainbow (CMU)
  – adaptation language + supporting infrastructure

• Autonomic managers (IBM)
  – provide a generic view of autonomic computing

• Jade (INRIA)
  – lacks a representation of adaptation knowledge

• CASA (Univ. of Zurich)
  – contract-based language using XML

• We propose a lightweight approach based on scripting/dynamic language facilities

10

Autonomic computing (IBM)

[Figure: general feedback control loop]

Knowledge: adaptation models, data, and scripts

11

Adaptation framework

[Figure: adaptation framework]

13

Adaptation language

• Profiles
  – conditions for triggering adaptations
• Adaptations
  – steps to move an application away from an undesirable condition
• Negotiation clauses
  – particular order in which to deploy the adaptations
• Constructs: adapt_period, settling_time
  – cater for timing issues of adaptation
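As a rough illustration of how these constructs relate, the sketch below models them as plain Python objects. The class, field, and method names (`Profile`, `Adaptation`, `Contract`, `due_adaptations`) are hypothetical and are not the framework's actual API; only `adapt_period` and `settling_time` come from the slide, and using list order in place of a negotiation clause is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Profile:
    """Named condition that triggers adaptations (hypothetical model)."""
    name: str
    condition: Callable[[], bool]

@dataclass
class Adaptation:
    """Steps that move the application away from an undesirable condition."""
    name: str
    action: Callable[[], None]
    triggers: List[Profile]          # the adaptation is due if any trigger holds
    settling_time_ms: int = 0        # wait after the action before re-evaluating

@dataclass
class Contract:
    """Groups adaptations; list order stands in for the negotiation clause."""
    name: str
    adapt_period_ms: int             # interval between evaluation rounds
    adaptations: List[Adaptation] = field(default_factory=list)

    def due_adaptations(self) -> List[Adaptation]:
        """Return, in deployment order, the adaptations whose profiles hold."""
        return [a for a in self.adaptations
                if any(p.condition() for p in a.triggers)]

# Usage with a made-up utilization reading.
utilization = 0.92
util_high = Profile("util_high", lambda: utilization > 0.85)
util_low = Profile("util_low", lambda: utilization < 0.70)
log = []
grow = Adaptation("grow", lambda: log.append("grow"), [util_high], 6000)
shrink = Adaptation("shrink", lambda: log.append("shrink"), [util_low], 6000)
contract = Contract("decision1", adapt_period_ms=5000, adaptations=[grow, shrink])

for a in contract.due_adaptations():
    a.action()
print(log)
```

With the reading above, only the `util_high` profile holds, so only the `grow` adaptation is deployed.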

18

Scripting languages

• Scripting/dynamic language (Python)
  – high-level abstractions for expressing dynamic adaptation policies
  – built-in functions simplify infrastructure development (e.g., compile, exec)
• Abstract adaptation operators
  – mapped to application-level operations at run-time
  – may rely on APIs provided by the app support level (e.g., Apache modules API)
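To make the role of `compile` and `exec` concrete, here is a minimal sketch of how a profile condition and an adaptation body, held as text, might be compiled once and then evaluated against a monitored object. `FakeCluster`, `spare_servers`, and the policy strings are made up for illustration; how the actual framework parses and runs contracts is not shown in the slides.

```python
class FakeCluster:
    """Stand-in for the monitored application (hypothetical)."""
    def __init__(self, load, max_load):
        self.load = load
        self._max_load = max_load
        self.actions = []

    def maxLoad(self):
        return self._max_load

    def turnOn(self, server):
        self.actions.append(("turnOn", server))

# Policy text as it might be extracted from a contract file.
profile_src = "webcluster.load / webcluster.maxLoad() > T_HIGH"
adaptation_src = """
for s in spare_servers:
    webcluster.turnOn(s)
"""

# Compile once; the resulting code objects can be evaluated every period.
profile_code = compile(profile_src, "<profile util_high>", "eval")
adaptation_code = compile(adaptation_src, "<adaptation grow>", "exec")

webcluster = FakeCluster(load=90.0, max_load=100.0)
env = {"webcluster": webcluster, "T_HIGH": 0.85, "spare_servers": ["srv3"]}

triggered = eval(profile_code, env)      # evaluate the condition
if triggered:
    exec(adaptation_code, env)           # run the adaptation body

print(triggered, webcluster.actions)
```

Because `exec` runs arbitrary code, a design along these lines assumes that contracts come from a trusted operator.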

19

Multiple adaptation contracts

• Support for multiple domains of adaptation
  – each contract has one thread of control

• Simple concurrency model
  – global locking mechanism
  – first-come, first-served approach

while contract.running:
    for a in contract.adaptations:
        if a.profile is True:
            execute adaptation code of "a"
            sleep for "settling_time" interval
    sleep for "adapt_period" interval
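The loop above, combined with one thread per contract and a global lock, could look roughly like the following. This is a sketch under stated assumptions: the function name `run_contract`, the tuple layout, and the choice to hold the lock through the settling time are all hypothetical, and the loop runs for a fixed number of rounds so that the example terminates.

```python
import threading
import time

global_lock = threading.Lock()   # one lock shared by all contracts

def run_contract(name, adaptations, adapt_period, rounds, trace):
    """Evaluate one contract in its own thread for a fixed number of rounds.

    `adaptations` is a list of (profile, action, settling_time) tuples;
    times are in seconds. A real loop would run until the contract stops.
    """
    for _ in range(rounds):
        for profile, action, settling_time in adaptations:
            if profile():
                # Assumption: the lock covers the action and its settling
                # time, so adaptations of different contracts never overlap.
                with global_lock:
                    trace.append((name, "start"))
                    action()
                    time.sleep(settling_time)
                    trace.append((name, "end"))
        time.sleep(adapt_period)

trace = []
state = {"servers": 2, "failed": 1}   # made-up cluster state

def add_server():
    state["servers"] += 1

def repair():
    state["failed"] = 0

power = [(lambda: state["servers"] < 3, add_server, 0.02)]
fault = [(lambda: state["failed"] > 0, repair, 0.02)]

threads = [
    threading.Thread(target=run_contract, args=("power", power, 0.01, 2, trace)),
    threading.Thread(target=run_contract, args=("fault_tolerance", fault, 0.01, 2, trace)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(state, trace)
```

Note that Python's `threading.Lock` does not guarantee the order in which waiting threads acquire it, so strict first-come, first-served ordering would need an explicit queue on top of this sketch.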

20

The case of server clusters

• Server utilization remains very low
  – about 6% on average

• Energy consumption is high and growing
  – about 9% per year

• Carbon emissions are set to quadruple by 2012
  – projected to surpass the airline industry

• Great opportunity for dynamic adaptations

Source: Uptime Institute (McKinsey & Co. report, http://uptimeinstitute.org)

21

Dynamic adaptations

• Dynamic adaptation capabilities
  – CPU DVFS (dynamic voltage/frequency scaling)
  – server on/off mechanisms (e.g., suspend-to-RAM + wake-on-LAN)

• Power and performance trade-off
  – servers' capacity management to reduce energy consumption
  – guarantee of QoS requirements (e.g., utilization or response time)
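As one example of the on/off mechanism, a suspended server can be woken with a wake-on-LAN "magic packet": six `0xFF` bytes followed by the target MAC address repeated 16 times, usually sent as a UDP broadcast. The sketch below builds such a packet; the function names, the MAC address, and the default port are illustrative assumptions, and the slides do not show how the framework itself issues the wake-up.

```python
import socket

def magic_packet(mac: str) -> bytes:
    """Build a wake-on-LAN magic packet: 6 bytes of 0xFF, then the MAC 16 times."""
    digits = mac.replace(":", "").replace("-", "")
    if len(digits) != 12:
        raise ValueError(f"invalid MAC address: {mac!r}")
    return b"\xff" * 6 + bytes.fromhex(digits) * 16

def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast (port 9 is a common choice)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))

packet = magic_packet("00:11:22:33:44:55")   # made-up address
print(len(packet))
```

Calling `wake(...)` requires a network on which broadcasts reach the target machine and a network card with wake-on-LAN enabled.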

22

Configuration problem

N = number of servers
F_i = number of frequencies of server i
p_busy, p_idle = power cost
perf = servers' performance
x_ij = decision variable
demand = incoming workload

23

Configuration problem

• Objective: minimize the overall power consumption
• Associate the decision variable x_ij with the objective function variables
• Select only one frequency on a given server
• Handle the incoming workload (given by demand)
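The formulas from these slides did not survive in the transcript. One mixed-integer formulation that is consistent with the four annotations is sketched below. It assumes that $x_{ij} = 1$ when server $i$ runs at frequency $j$ (all $x_{ij} = 0$ meaning the server is off) and introduces a continuous variable $u_{ij}$ for the fraction of time server $i$ is busy at frequency $j$. Both the variable $u_{ij}$ and the exact form of each term are assumptions, not taken from the slides.

```latex
\begin{align}
\min \quad & \sum_{i=1}^{N} \sum_{j=1}^{F_i}
    \bigl[\, p^{\mathrm{busy}}_{ij}\, u_{ij}
           + p^{\mathrm{idle}}_{ij}\,(x_{ij} - u_{ij}) \,\bigr]
    && \text{(overall power)} \\
\text{s.t.} \quad & 0 \le u_{ij} \le x_{ij}
    && \forall i, j \quad \text{(ties } x_{ij} \text{ to the objective)} \\
& \sum_{j=1}^{F_i} x_{ij} \le 1
    && \forall i \quad \text{(at most one frequency per server)} \\
& \sum_{i=1}^{N} \sum_{j=1}^{F_i} \mathit{perf}_{ij}\, u_{ij} \ge \mathit{demand}
    && \text{(serve the incoming workload)} \\
& x_{ij} \in \{0, 1\}
    && \forall i, j
\end{align}
```

Here $p^{\mathrm{busy}}_{ij}$ and $p^{\mathrm{idle}}_{ij}$ correspond to p_busy and p_idle, and $\mathit{perf}_{ij}$ to perf, each taken per server and frequency.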

27

Adaptation example

• Thresholds for cluster utilization
  – e.g., T_LOW = 0.70 and T_HIGH = 0.85

profile { webcluster.load / webcluster.maxLoad() < T_LOW } util_low;

profile { webcluster.load / webcluster.maxLoad() > T_HIGH } util_high;

28

Adaptation example

contract {
  adaptation {
    demand = webcluster.load / T_HIGH
    changeConf = webcluster.bestConfig(demand)
    for (s, f) in changeConf:
      if f == 0:
        webcluster.turnOff(s)
      else:
        if s.status == 0:
          webcluster.turnOn(s)
        webcluster.adjustFreq(s, f)
  } adjustCluster with util_low or util_high \
                  settling_time 6000 /*ms*/;
} decision1 adapt_period 5000 /*ms*/;
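Dividing the measured load by T_HIGH gives the capacity at which that load would sit exactly at the upper utilization threshold, so any configuration with at least that capacity keeps utilization at or below T_HIGH. A small worked example, with a made-up load value and a hypothetical helper name:

```python
T_LOW, T_HIGH = 0.70, 0.85

def target_capacity(load: float, t_high: float = T_HIGH) -> float:
    """Capacity at which `load` corresponds to a utilization of exactly t_high."""
    return load / t_high

load = 170.0                       # req/s, made-up measurement
demand = target_capacity(load)     # capacity requested from bestConfig
print(round(demand, 1))            # requested capacity in req/s
print(round(load / demand, 2))     # resulting utilization
```

Whether the resulting utilization also stays above T_LOW depends on how finely the available server and frequency combinations can match the requested capacity.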

30

Adaptation example

• Common monitoring support
  – e.g., variable access: webcluster.load

• Reusable adaptation operators
  – e.g., webcluster.turnOn(), webcluster.turnOff()

• Some policy-specific operators can also be defined
  – e.g., webcluster.bestConfig()

• Different adaptation policies can be used
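To illustrate what a policy-specific operator such as `bestConfig` might compute, the sketch below exhaustively searches the on/off and frequency choices for the cheapest configuration that covers a given demand. It is not the authors' implementation: the function name, the data layout, the power numbers, and the assumption that load is spread in proportion to capacity are all made up for illustration.

```python
from itertools import product

def best_config(demand, servers):
    """Exhaustively pick one operating point per server to minimize power.

    `servers` maps a server name to a list of (freq, perf, p_busy, p_idle)
    tuples; freq 0 means "off" and is added implicitly with zero perf and power.
    Returns (power, {server: freq}) or None if demand cannot be met.
    """
    names = list(servers)
    options = [[(0, 0.0, 0.0, 0.0)] + list(servers[n]) for n in names]
    best = None
    for choice in product(*options):
        capacity = sum(perf for _, perf, _, _ in choice)
        if capacity < demand:
            continue   # this combination cannot handle the workload
        # Assumption: load is spread in proportion to capacity, so every
        # active server has the same utilization u = demand / capacity.
        u = demand / capacity if capacity else 0.0
        power = sum(u * pb + (1 - u) * pi for _, _, pb, pi in choice)
        if best is None or power < best[0]:
            best = (power, {n: c[0] for n, c in zip(names, choice)})
    return best

# Made-up data: (frequency in GHz, capacity in req/s, busy W, idle W).
servers = {
    "s1": [(1.0, 100.0, 80.0, 60.0), (2.0, 200.0, 140.0, 90.0)],
    "s2": [(1.0, 100.0, 85.0, 65.0), (2.0, 200.0, 150.0, 95.0)],
}
power, conf = best_config(150.0, servers)
print(power, conf)
```

The search visits every combination, so its cost grows exponentially with the number of servers; it is only meant to show the shape of the decision, and a larger cluster would call for an optimization solver or a heuristic.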

31

Application-specific layer

• Apache built-in load balancer module
  – mod_proxy_balancer

• New Apache module in C (mod_frontend)
  – exposes an API (XML-RPC) for
    • monitoring system properties
    • controlling the front-end web server
  – example
    • sensors -> load (req/s), request response time
    • actuators -> DVS, on/off
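The sensor/actuator split over XML-RPC can be sketched with Python's standard library alone. The example below starts a tiny stand-in server on the loopback interface and calls it as the adaptation infrastructure might. The method names `getLoad` and `adjustFreq`, the state dictionary, and the values are hypothetical; the actual interface exposed by mod_frontend is not given in the slides.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Stand-in for the front-end module: one sensor and one actuator.
state = {"load": 120.0, "freq": {"srv1": 2000}}

def get_load():
    """Sensor: current load in req/s."""
    return state["load"]

def adjust_freq(server_name, mhz):
    """Actuator: record the requested CPU frequency for a server."""
    state["freq"][server_name] = mhz
    return True

# Port 0 lets the operating system pick a free port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(get_load, "getLoad")
server.register_function(adjust_freq, "adjustFreq")
threading.Thread(target=server.serve_forever, daemon=True).start()

host, port = server.server_address
with ServerProxy(f"http://{host}:{port}/") as frontend:
    load = frontend.getLoad()                 # read a sensor
    ok = frontend.adjustFreq("srv1", 1000)    # drive an actuator

server.shutdown()
server.server_close()
print(load, ok, state["freq"])
```

Keeping the application-specific part behind a remote interface like this is what lets the adaptation code refer only to abstract operators.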

32

Experimental evaluation

• Dedicated web cluster testbed

33

Controlling cluster utilization

34

Power/energy savings

* Energy consumption is reduced by about 37% compared with not using adaptations

35

Using different quality metric

36

Supporting multiple contracts

* Running concurrent adaptation contracts: power management and fault tolerance

37

Different adaptation policies

Disruption: the number of turn-on and turn-off adaptations, each of which may involve a switching cost.

What is the best way to minimize both disruption and energy consumption?

Future study: anticipatory adaptation model, risk-aware controller ...

38

Adaptation time overhead

• Worst case measured (overall adaptation phase): 13,045.78 ms
• Operations: 1 on, 1 off, and 2 frequency adjustments => 12,012 ms + 1,005 ms + 7 ms + 7 ms = 13,013 ms
• Framework overhead = 32.78 ms

39

Conclusion

• Framework-based approach to support dynamic adaptations
  – power and performance management for server clusters

• Reusability of the adaptation infrastructure
  – simplifies both evaluation and management of different adaptation policies / requirements
  – helps to reduce the development cost of adaptive applications

40

Future work

• Improvements in the framework
  – forecasting for adaptation decisions

• Other power-aware adaptations
  – multi-core architecture / memory systems

• Optimization algorithms for adaptation
  – processor allocation among multiple services / applications

• Experimental evaluation
  – virtualization -> consolidation, live migration
  – more realistic/real workloads

41

Power and performance

42

Fault tolerance contract

contract {
  adaptation {
    srv = webcluster.getFailedServer()
    newsrv = webcluster.allocNewServer()
    if newsrv:
      webcluster.replaceServer(srv, newsrv)
    else:
      webcluster.log("could not allocate server")
  } repair with server_fail settling_time 4000 /*ms*/;
} fault_tolerance adapt_period 1000 /*ms*/;

profile { webcluster.failure > 0 } server_fail;

44

Filter modules

45

Holt's method
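Only the title of this slide survives. Holt's method is double exponential smoothing with a linear trend, and it presumably relates to the load forecasting mentioned under future work. A minimal sketch follows; the function name, the initialization of the trend, the smoothing parameters, and the sample data are assumptions made for illustration.

```python
def holt_forecast(series, alpha, beta, horizon=1):
    """Holt's linear-trend exponential smoothing.

    level_t = alpha * y_t + (1 - alpha) * (level_{t-1} + trend_{t-1})
    trend_t = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}
    forecast(t + h) = level_t + h * trend_t
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]      # a common, simple initialization
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend

loads = [100.0, 110.0, 120.0, 130.0]   # made-up req/s samples with a linear trend
print(holt_forecast(loads, alpha=0.5, beta=0.5, horizon=1))
```

On a perfectly linear series like this one the forecast simply continues the line; on noisy data, `alpha` and `beta` control how quickly the level and the trend follow recent observations.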
