
Conclusions from the European Roadmap on

Control of Computing Systems

Karl-Erik Årzén, Anders Robertsson, Dan Henriksson

LTH, Lund University, Sweden

Mikael Johansson, Håkan Hjalmarsson, Karl Henrik Johansson

Royal Institute of Technology, Sweden

FeBID’06, Vancouver, April 3, 2006

Background:

Recent large research interest (academically as well as industrially initiated) in control-based methods for resource management in real-time computing and communication systems

In most cases, allocation of memory, computing and/or communication resources

Examples

• Performance control of web servers
• Dynamic resource management in embedded systems
• Traffic control in communication networks
• Transaction management in database servers
• Autonomic computing
• etc.

eBusiness

Many types of servers and applications

[Figure: eBusiness system architecture. Front end for online customer service; e-mail, DSS (decision support), hub, logging, and security servers on SUN, AIX, and Sybase platforms; web and application servers (presentation, business logic, gateway) connected through HTTP, JDBC, MQ, and SNA messaging; back-end systems on TPF and SYSPLEX mainframes with relational and hierarchical databases and transaction monitors.]

Multi-tier systems of Web browsers, business logic and databases

Feedback at various levels

Queue Control

IBM, HP, Microsoft, Amazon, ….

Challenges:
• Modeling formalisms (DES, ODEs, queuing theory, …)
• Design of software and computing systems for controllability

[courtesy J. Hellerstein]

ARTIST2

• Roadmap outcome from ARTIST2-workshop in Lund, Sweden, May 2005

• EU/IST FP6 Network of Excellence – Embedded Systems Design

• NSF-supported workshop on ”Future trends in control of computer systems” by Hellerstein, Tilbury & Abdelzaher, May 2005

www.artist-embedded.org/FP6

Roadmap

Available for download at

http://www.control.lth.se/user/karlerik/roadmap1.pdf

Experiment:

You have wireless network access – try the server!

…or not.

An admission (control) problem

Report from the Swedish Emergency Management Agency

How to handle the overload problem?

• Overprovision (more capacity than needed on average)
• Admission control: some requests are denied access, but the server continues to operate
• Change service (”sending text-only at high loads”)

Why is control of computing systems interesting?

• Multidisciplinary:
– Several ”new” challenges
– Not covered within one traditional ”research domain” (queueing theory, computer science, systems and control, …)
• Need systematic tools for design and analysis
– robustness to ”disturbances”
– better performance
• Cost of operating computing systems is rising/dominating (60-90%) [Hellerstein et al., 2005]

Outline
• Background & Motivation
• Computer systems in a control theoretic framework
– Modeling issues
• Roadmap: Research challenges in
– Control of server systems
– Control of CPU resources
– Feedback scheduling of control systems
– Control of communication networks
– Error control of software systems
– Control middleware

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

[4.15 pm] Panel – Top Three Challenges in Control of Networks and Systems

Contents of roadmap

Six research areas:

– Control of server systems
– Control of CPU resources
– Feedback scheduling of control systems
– Control of communication networks
– Error control of software systems
– Control middleware

”…how flexibility, adaptivity, performance and robustness can be achieved in a real-time computing or communication system through the use of control theory”

Modeling Formalisms

• Heuristic approach vs. model-based control
– ”Inherent robustness” from feedback control
– One reason why many ad hoc strategies work
– More can be gained (systematic design & analysis)

Basic principle: Use simple enough models for design and analysis
– Model should capture essential dynamics and show similar behavior as system for different distributions and load cases.

Modeling Formalisms

• Identification

• Sampling (SoH), ”noise”, inherent nonlinearities
– ”First principles” (conservation, queueing theory)
– Computing systems: discrete-event dynamic systems (DEDS) + real-time systems => timed automata or timed Petri nets
• Risk of state-space explosion (does not scale with arrival/service rates)
• Well-suited for safety and blocking properties, but how does it relate to stability and robustness?

Modeling of queueing-systems

• Discrete event models
• Queue theoretic model (Markov chains etc.)
• Flow models (cont. time / average models)
• Discrete time models

Modeling aspects

”Gain-scheduling”: (standard control principle):

– ”Choose among different control-parameters depending on e.g., operating condition”.

• ”Good” model structure of corresponding computing system may change with work load (e.g., for server systems)
– Flow models OK for high loads
– DEDS models feasible for low loads
• Interpolation between different model structures?!
• Transient vs. steady-state behavior
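The gain-scheduling principle quoted above can be sketched in a few lines. This is only an illustration under stated assumptions: the scheduling variable (server utilization), the breakpoints, and the gain values in `SCHEDULE` are hypothetical and do not come from the roadmap.

```python
# Hypothetical gain schedule: (upper utilization bound, controller gains).
# All numbers are illustrative assumptions, not values from the roadmap.
SCHEDULE = [
    (0.5, {"kp": 0.2, "ki": 0.05}),           # low load
    (0.8, {"kp": 0.5, "ki": 0.10}),           # medium load
    (float("inf"), {"kp": 1.0, "ki": 0.30}),  # high load
]


def select_gains(utilization, schedule=SCHEDULE):
    """Return the controller gains of the first region containing utilization."""
    for upper_bound, gains in schedule:
        if utilization < upper_bound:
            return gains
    return schedule[-1][1]
```

A controller would call `select_gains` once per sample with the measured operating condition and use the returned parameters in its control law.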

Actuator Mechanisms

• The difference between the service rate, µ, and the arrival rate, λ, determines the delay experienced by the requests.

• Enqueue actuators (changing the arrival rate):
– Admission control mechanism
– Change inter-arrival period of task ”upstream” in multi-tiered system
• Dequeue actuators (changing the service rate):
– Number of server threads
– Quality adaptation
– Dynamic voltage scaling
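How the difference between λ and µ drives the backlog, and thereby the delay, can be illustrated with a discrete-time flow model. The sketch below is an assumption for illustration: the function names, the sample interval `h`, and the delay approximation (backlog divided by service rate) are not taken from the slides.

```python
def simulate_queue(arrival_rates, service_rate, h=1.0, x0=0.0):
    """Flow model of a queue: x(k+1) = max(0, x(k) + h * (lambda(k) - mu))."""
    x = x0
    lengths = []
    for lam in arrival_rates:
        # The backlog grows when lambda > mu and drains when lambda < mu.
        x = max(0.0, x + h * (lam - service_rate))
        lengths.append(x)
    return lengths


def approx_delay(queue_length, service_rate):
    """Rough waiting time for a request joining the queue: backlog / mu."""
    return queue_length / service_rate
```

An enqueue actuator would act on `arrival_rates`, while a dequeue actuator would act on `service_rate`.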

Actuators - Implementation aspects

• Gate model:
– Call gapping: accept first u(kh) calls in control interval
– Percent blocking: preserves distribution

Crucial not to trigger network resend (lost packet), cf. ”Heracles and the hydra” [Sha ?]
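The two gate mechanisms can be sketched as follows. This is a minimal illustration under assumptions: requests are represented as a plain list per control interval, and the names `call_gapping` and `percent_blocking` as well as the `accept_fraction` parameter are hypothetical.

```python
import random


def call_gapping(requests, u):
    """Accept the first u requests of the control interval, reject the rest."""
    u = max(0, int(u))
    return requests[:u], requests[u:]


def percent_blocking(requests, accept_fraction, rng=None):
    """Accept each request independently with probability accept_fraction.

    Random thinning keeps the shape of the arrival pattern, which is the
    sense in which percent blocking preserves the distribution.
    """
    rng = rng or random.Random()
    accepted, rejected = [], []
    for request in requests:
        if rng.random() < accept_fraction:
            accepted.append(request)
        else:
            rejected.append(request)
    return accepted, rejected
```

In both cases the controller output (`u` or `accept_fraction`) is recomputed once per control interval.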

Related research areas

Similarities/differences

of the different domains

– Traffic flow control
– Manufacturing and supply chains
– Communication networks
– Power networks

with respect to

– Where does the congestion appear?
– Routing?
– Available information (dest.)?
– Time/distance matters?
– ”Packet dropping”: OK or not?
– Control action?

Example: Highway congestion in LA [Varaiya et al.]

Control of server systems

• Temporal control locally at server
– Direct or ”indirect” objective (service provider vs. customer)

• Queue-management and load balancing

• Inherent nonlinearities

• Multi-tiered systems including large eCommerce systems

Example: Admission control

Objective:
– Good transient behavior for traffic changes
– Preserve good performance for overload situations

Measure of admission
– queue length
– average time
– utilization
– CPU load / energy consumption
– memory
– …

Example: Feedforward + feedback
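One way to combine feedforward and feedback in admission control is sketched below. This is not the structure shown on the slide; it is an assumed design for illustration: the feedforward term admits at the estimated service rate, and a PI feedback term corrects the queue-length error. The class name, gains, and anti-windup rule are all hypothetical.

```python
class AdmissionController:
    """PI feedback on queue length plus feedforward from the service rate."""

    def __init__(self, kp, ki, h, u_max):
        self.kp = kp        # proportional gain (assumed value chosen by user)
        self.ki = ki        # integral gain
        self.h = h          # sampling interval
        self.u_max = u_max  # largest admissible rate (actuator limit)
        self.integral = 0.0

    def update(self, queue_ref, queue_len, service_rate_est):
        """Return the admitted arrival rate for the next control interval."""
        error = queue_ref - queue_len
        u_ff = service_rate_est  # feedforward: match the service rate
        u_unsat = u_ff + self.kp * error + self.integral
        u = min(max(u_unsat, 0.0), self.u_max)
        # Simple anti-windup: integrate only while the actuator is unsaturated.
        if u == u_unsat:
            self.integral += self.ki * self.h * error
        return u
```

The returned rate could then be realized by a gate such as call gapping or percent blocking.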

Control of server systems

• Prediction and state estimation based control
• State and actuator constraints
• Interesting region: When do the flow models cease to be valid?
• Changing models and criteria in different load situations...
• Very exciting new results on discrete-event based estimation and control
– DE-sampling vs. DT-sampling
• control: ratio 1/5
• bandwidth allocation: 1/2

Server systems - Research challenges

• Modeling issues (as discussed before)
• Control + queueing theory = ?
• Event-based control – theory gap
• Control objectives
– References (load, utilization)
– Performance metrics and cost functions
• (upcrossing probabilities)
• Security, reliability, availability, efficiency…
• Design patterns/Control patterns
– Software structure + control structure and analysis for software design
• Well known in e.g., process control (ratio control, cascade, midranging etc.)
– When should a queue problem be considered as
• an admission problem?
• a delay control problem?
• Large-scale distributed systems / multi-tier systems
– Distributed control, MPC, …

Control of CPU resources

• A large number of feedback-based or adaptive global QoS management systems have been proposed.
• From early ad hoc schemes of multi-level feedback queue scheduling to control-theoretical approaches using FC-EDF, EUCON [Stankovic, Lu, Buttazzo, …]

The FC-EDF scheme (from [Stankovic et al., 1999])

Control of CPU resources –The challenges and research directions

• Multiprocessor systems
• Power-aware CPU scheduling
– Dynamic Voltage Scaling
– joint optimization problem of minimizing energy while still meeting real-time constraints
• already receives considerable attention from the research community
• End-to-end resource management:
– Resource management in distributed systems where an activity spans multiple nodes
• Hierarchical resource allocation schemes
– Cascaded structures with local allocation
• Efficient feedback scheduling mechanisms
– Scheduling algorithm overhead – online optimization doable?

Feedback scheduling of control tasks

Actuation
– Task period h_i

Solve two different problems:
• Resource regulation
– Control the total utilization to avoid overloads
• Optimal resource distribution
– Assign individual task periods to optimize performance
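The resource regulation problem can be sketched with the standard utilization measure U = Σ C_i / h_i, where C_i is the execution time of task i. The sketch below makes a simplifying assumption that is not from the slides: when U exceeds the setpoint, all periods are stretched by the same factor. An optimal distribution would instead weight each task by its control performance.

```python
def utilization(exec_times, periods):
    """Total CPU utilization U = sum(C_i / h_i)."""
    return sum(c / h for c, h in zip(exec_times, periods))


def rescale_periods(exec_times, nominal_periods, u_ref):
    """Stretch all task periods uniformly so that utilization is at most u_ref.

    Uniform rescaling is an illustrative assumption; it regulates the total
    utilization but does not optimize the distribution between tasks.
    """
    u = utilization(exec_times, nominal_periods)
    if u <= u_ref:
        return list(nominal_periods)
    scale = u / u_ref
    return [h * scale for h in nominal_periods]
```

In a feedback scheduler the execution times would be measured or estimated online, and `rescale_periods` would be invoked periodically.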

Example: Dynamic Real-Time Scheduling of Model Predictive Controllers

• Based on on-line optimization of a cost function
– Convex optimization problem solved in each sample
– Iterative anytime algorithm
– Result gradually refined up to a certain bound
• Attractive control strategy
– Straightforward to use for multi-variable processes
– Ability to handle constraints
• Unattractive real-time properties
– High computational demands
– Very large variations in execution times

Henriksson et al. 2004

Example: Feedback scheduling of MPC control tasks

Main idea

A process in stationarity may need fewer resources than a process in a transient phase

Use feedback from the optimization algorithm to determine

• for each MPC task, when to terminate the optimization and output the control signal
– the optimization may be terminated early and still produce acceptable results
• which of several ready MPC tasks should be scheduled for execution

[Henriksson et al., 2004]

• Current values of the cost functions act as dynamic task priorities
– Constitutes an on-line QoS measure for the task
– Reflects the relative importance of the tasks
• Feedback scheduler distributes the computing resources
– Schedules MPC task with highest cost
– Invoked after each iteration
– Implemented as a separate task
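The scheduling rule described above (run the ready task with the highest current cost, re-evaluate after each iteration) can be sketched with a toy model. The class `ToyMpcTask` is a hypothetical stand-in: a real MPC task would run one iteration of its optimizer, whereas here each iteration simply halves the cost.

```python
class ToyMpcTask:
    """Hypothetical stand-in for an MPC task with an anytime optimizer."""

    def __init__(self, name, cost):
        self.name = name
        self.cost = cost  # current value of the task's cost function

    def iterate(self):
        # Assumption for illustration: one optimizer iteration halves the cost.
        self.cost *= 0.5


def feedback_schedule(tasks, iterations):
    """Run `iterations` optimizer steps, always on the task with highest cost."""
    trace = []
    for _ in range(iterations):
        task = max(tasks, key=lambda t: t.cost)  # cost acts as dynamic priority
        task.iterate()
        trace.append(task.name)
    return trace
```

With starting costs 8 and 3, the scheduler first serves the task in the larger transient and then alternates as the costs converge.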

• Cooperative robot task under resource constraints

• Master and slave configuration

• Ball and beam application

• Problems:
– MPC tasks exhibit very large variations in execution time
– Traditional scheduling theory not applicable
• Solutions:
– Premature termination of optimization
– Dynamic scheduling based on cost functions

The challenges and research directions for feedback scheduling of control tasks include all the challenges and research directions of control of CPU resources.

Additionally, the following items are important:

• Temporal robustness indices
• Formal performance guarantees
– open question whether it is possible to combine the flexibility implied by feedback scheduling with formal guarantees

Control of Communication Networks

Example:
• Feedback control is embedded in the TCP protocol in the form of a sliding window mechanism.
• Introduced in the 1980s to solve the congestive failure problems that had brought down the network.
• We have not experienced system-wide congestive failures again, even though the network has grown by orders of magnitude.
• This is a testament to the effectiveness of feedback control in a highly dynamic, decentralized, and fast-changing environment.
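The feedback loop in TCP congestion control is commonly summarized as additive increase, multiplicative decrease (AIMD) of the congestion window. The sketch below is a heavily simplified illustration, not a description of any particular TCP implementation: it omits slow start, timeouts, and fast recovery, and the parameter names are assumptions.

```python
def aimd_step(cwnd, loss, increase=1.0, decrease=0.5, min_cwnd=1.0):
    """One AIMD update of the congestion window (in segments).

    loss is the feedback signal: True if packet loss was detected during
    the last round trip, False otherwise.
    """
    if loss:
        return max(min_cwnd, cwnd * decrease)  # multiplicative decrease
    return cwnd + increase                     # additive increase
```

Each sender runs this update using only its own loss observations, which is what makes the scheme decentralized.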

Remark: [9.00] Robust yet Fragile: Intrinsic Tradeoffs in Layered Architectures

Control of Communication Networks

• Feedback control mechanisms are fundamental for the separation of communication layers

• Gives robustness and allows local optimization and refinements

Example
• Reliable data transfer over wireless link through suitable feedback control of
– transmission power
– modulation scheme
– channel coding

Research Challenges in Control of Communication Networks

• Architectures and model abstractions for network control
– Network models suitable for control and observer design
– Robustness of large scale and distributed systems
– Resource management in wireless networks
– Cross-layer adaptation for new services and optimized performance

Cross-layer adaptation for improved performance of cellular and wired networks

• Bandwidth variations in the radio link give performance degradations due to large end-to-end delay and improper transport protocol
• Proxy between cellular and wired networks adapts sending rate to bandwidth variations through available radio link state information

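[Figure: application server and Internet connected through a proxy to the 3G cellular network (3G-GGSN, 3G-SGSN, RNC, BTS) and the terminal, with a TCP connection on each side of the proxy and bandwidth variations on the radio link.]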

Proxy hybrid control law

• Controller in proxy regulates sending rate based on
– Events generated by bandwidth changes obtained from RNC
– Sampled measurements of queue length in RNC

[Möller et al., 2005]

Experimental evaluation

• Improved time-to-serve-user and link utilization compared to traditional end-to-end protocol

[Möller et al., 2005]

[Figure: bandwidth utilization for the end-to-end protocol and the new protocol]

• Stability and robustness analysis of new protocol
• Ongoing experimental evaluation and testing with

Network-aware control architecture

• Estimate network state
– Delay
– Data loss probability
– Bandwidth
• Adjust controller accordingly

[Figure: wireless network, network observer, plant observer, control law]

Network-aware controllers

Control algorithms to cope with communication imperfections
– Control under network delay
– Control under data loss
– Control under bandwidth limitation
– Control under topology constraints

Characteristics depend on network technology

[Figure: wireless network, network observer, plant observer, control law]

Delay estimation

• Internet round-trip time (RTT) data are noisy with piecewise constant average

• Complex network dynamics hard to model
• RTT estimation in TCP:
• Improved estimation through Kalman filter with hypothesis test (CUSUM filter)

[Jacobsson et al., 2004]
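For orientation, the sketch below shows the standard exponentially weighted RTT estimator used by TCP (as specified in RFC 6298, with α = 1/8 and β = 1/4) together with a basic two-sided CUSUM test for detecting a jump in the mean. It is not the Kalman-filter method of the cited work; the CUSUM parameters `drift` and `threshold` and the function names are assumptions chosen for illustration.

```python
class TcpRttEstimator:
    """Smoothed RTT and RTT variation, following the RFC 6298 update rules."""

    def __init__(self, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.srtt = None
        self.rttvar = None

    def update(self, r):
        """Feed one RTT sample r and return the smoothed estimate."""
        if self.srtt is None:
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - r)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * r
        return self.srtt


def cusum_alarm_index(samples, mean, drift, threshold):
    """Return the index of the first sample at which a mean shift is detected."""
    g_pos = g_neg = 0.0
    for k, r in enumerate(samples):
        g_pos = max(0.0, g_pos + (r - mean) - drift)  # upward shift
        g_neg = max(0.0, g_neg - (r - mean) - drift)  # downward shift
        if g_pos > threshold or g_neg > threshold:
            return k
    return None
```

A change detector of this kind lets an estimator react quickly to a jump in the piecewise constant average while still filtering noise between jumps.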

Control middleware

Middleware:
• a software abstraction layer that mediates the interactions between a component or application
• Commonly used in distributed systems to provide communication services.
– Java-RMI, Microsoft’s .COM, and CORBA…
• Networked embedded system applications, e.g., mobile systems and sensor systems.
– GAIA [Román et al., 2002], WSAMI [Issarny et al., 2005], and AURA

Control middleware

Research Directions
• The most important research item for control middleware is to develop these systems from research prototypes to something that may be used more widely.
• Middleware functionality: still an open question whether the middleware should
– be passive, i.e., provide sensing and actuation services that the application can use to itself implement the feedback control, or
– be active, i.e., be responsible for the actual control loop.

Both of these approaches have advantages and disadvantages.

Error control of software systems [L.Sha]

• The idea behind error control of software is to use ideas similar to those used in feedback control to detect malfunctioning software components and, in that case, fall back on a well-tested core software component that is able to provide the basic application service with guarantees on performance and safety.

• Provide techniques and tools that support making the semantic assumptions of each software component explicit and machine checkable.

• Simple and reliable core
• System remains in recoverable states
• SIMPLEX architecture [Sha]
– High assurance vs. high performance
– Need to stay in recoverable state
• Runs in parallel (cf. ”bumpless transfer”)

----------------------------------------------------------------------------

• ORTGA [FeBID’06]
– Maximum stability region
– How to detect conditions for switches? (FDI)
• False alarm vs. non-recovery risk of instability
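The switching idea behind a Simplex-style architecture can be sketched as a small decision function. This is an assumed illustration rather than the architecture's actual implementation: the one-step prediction `predict` and the recoverability test `in_recoverable_region` are hypothetical callbacks supplied by the user.

```python
def simplex_select(state, complex_output, safe_output, in_recoverable_region, predict):
    """Choose between the high-performance and the high-assurance controller.

    The high-performance output is used only if it exists and the predicted
    next state stays inside the recoverable region; otherwise the output of
    the simple, well-tested core is used.
    """
    if complex_output is not None:
        next_state = predict(state, complex_output)
        if in_recoverable_region(next_state):
            return complex_output
    return safe_output
```

For a scalar integrator x(k+1) = x(k) + u(k) with recoverable region |x| ≤ 1, one could pass `predict=lambda x, u: x + u` and `in_recoverable_region=lambda x: abs(x) <= 1.0`. Both controllers run in parallel so that the safe output is always available at the switch.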


Conclusions

• Thank you for your attention!

• Questions?

• Panel debate

Proposed solutions for wireless TCP

• Split connection
– Destroys end-to-end semantics
• End-to-end protocols
– Deployment issues
• Link-layer improvements
– Performance limitations

E.g., Balakrishnan et al., Ludwig and Katz, Xylomenos et al., Huang et al., Hossain et al., RFC 3135 and 3366, …
