
Control-Based Load Shedding in Data Stream Management Systems Yicheng Tu and Sunil Prabhakar Department of Computer Sciences, Purdue University April 3, 2006


Page 1: Title

Control-Based Load Shedding in Data Stream Management Systems

Yicheng Tu and Sunil Prabhakar
Department of Computer Sciences, Purdue University
April 3, 2006

Page 2: Data stream management systems

• Applications
  • Financial analysis
  • Mobile services
  • Sensor networks
  • Network monitoring
  • More …
• Continuous data, discarded after being processed
• Continuous query
• Data-active, query-passive model

[Figure: data streams flow into the DSMS, which returns query results to users]

Page 3: DSMS architecture

• Network of query operators (O1 – O3)
• Each operator has its own queue (q1 – q4)
• Scheduler decides which operator to execute
• Query results (Q1, Q2) pushed to clients
• Example systems:
  • Aurora/Borealis
  • STREAM

Page 4: Qualities in DSMS data processing

• Data processing in DSMS is quality-critical
  • Tuple delay
  • Data loss
  • Sampling rate, window size, …
• Overloading during spikes degrades quality (delay)
  • Tuple delay is the major concern: results generated from old data are useless!
• Solution: adjust data loss (i.e., load shedding)
  • On DSMS side
  • Eliminating excessive load by dropping data items
• The real problem is: how to maintain processing delays while minimizing data loss?
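The idea of eliminating excess load by dropping data items can be sketched as random dropping. This is only an illustration under stated assumptions: each tuple is dropped independently, the drop probability is chosen so that the admitted rate matches a known processing capacity, and the names `drop_probability` and `shed` are hypothetical (they are not part of Aurora, Borealis, or STREAM).

```python
import random


def drop_probability(input_rate, capacity):
    """Fraction of tuples to drop so the admitted rate does not exceed capacity."""
    if input_rate <= capacity:
        return 0.0  # no overload, nothing to shed
    return 1.0 - capacity / input_rate


def shed(tuples, p_drop, rng):
    """Keep each tuple independently with probability 1 - p_drop."""
    return [t for t in tuples if rng.random() >= p_drop]


rng = random.Random(0)  # seeded so the run is repeatable
p = drop_probability(input_rate=500.0, capacity=400.0)  # rates in tuples per second
kept = shed(list(range(10_000)), p, rng)
print(f"drop probability = {p:.2f}, kept {len(kept)} of 10000 tuples")
```

A fixed drop probability like this answers "how much" only for a known, steady input rate; deciding when to change it is the question the rest of the talk addresses.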

Page 5: Related work

• Accuracy of aggregate queries under load shedding (Babcock et al., ICDE04)
• Data triage (Reiss & Hellerstein, ICDE05)
  • Put data into an asylum upon overloading
• LoadStar (Chi et al., VLDB05)
• QoS-driven load shedding (Tatbul et al., VLDB03)
  • Key questions
    - When?
    - How much?
    - Where?
  • Use a load shedding roadmap to decide where
  • Simple, intuitive algorithm to decide when and how much

Page 6: What’s wrong?

• Highly dynamic environment is reality
  • Bursty data input
  • Variable unit processing cost
• Fail to capture current system status (queue length) and output (delay)
  • Delay positively related to queue length
• Example 1. Unbounded increase of delay
• Example 2. Unnecessary data loss

Page 7: Our approach

• View load shedding as a control problem
  • Control: manipulation of system behavior by adjusting system input
  • Cruise control of automobiles, room temperature control, etc.
  • Open-loop vs. closed-loop (feedback) control
• The feedback control loop:
  • Plant
  • Monitor
  • Controller
  • Actuator
• How it works
  • Error (e) = desired output (yr) - measured output (y)
  • Focal point: controller, which maps e to control signal u
  • Disturbances

Page 8: Why feedback control?

Open loop (controller 1/a):

$$y = (a + d_m)\left(\frac{1}{a}\,y_r + d_i\right) + d_o = y_r + \frac{d_m}{a}\,y_r + (a + d_m)\,d_i + d_o$$

Closed loop (controller gain K):

$$y = \frac{K(a + d_m)}{1 + K(a + d_m)}\,y_r + \frac{a + d_m}{1 + K(a + d_m)}\,d_i + \frac{1}{1 + K(a + d_m)}\,d_o$$

For large K:

$$y \approx y_r + \frac{1}{K}\,d_i + \frac{1}{K(a + d_m)}\,d_o$$
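A small numeric comparison can make the open-loop versus closed-loop argument concrete. The sketch below assumes a static plant with nominal gain `a`, a modeling error `d_m` (so the true gain is `a + d_m`), an input disturbance `d_i`, an output disturbance `d_o`, and a target output `y_r`. The open-loop controller is the inverse gain `1/a`; the closed-loop controller is a pure proportional gain `K`. This block structure and all numeric values are assumptions made for illustration.

```python
def open_loop_output(y_r, a, d_m, d_i, d_o):
    """Controller 1/a feeds a plant whose true gain is a + d_m."""
    u = y_r / a
    return (a + d_m) * (u + d_i) + d_o


def closed_loop_output(y_r, a, d_m, d_i, d_o, K):
    """Solve y = (a + d_m) * (K * (y_r - y) + d_i) + d_o for y."""
    g = a + d_m
    return (K * g * y_r + g * d_i + d_o) / (1.0 + K * g)


y_r, a = 2.0, 1.0
d_m, d_i, d_o = 0.3, 0.5, 0.2

print("open loop        :", open_loop_output(y_r, a, d_m, d_i, d_o))
for K in (1, 10, 100, 1000):
    y = closed_loop_output(y_r, a, d_m, d_i, d_o, K)
    print(f"closed loop K={K:<4d}:", y)
```

In the open-loop case every disturbance and the modeling error pass straight through to the output, while in the closed-loop case their effect shrinks as `K` grows. The plant here is static; in a dynamic system the gain cannot be raised arbitrarily without affecting stability.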

Page 9: Challenges

• Can we model the system?
  • Analytical model may not be easy to derive
  • System identification: experimental methods
• How to design the controller?
  • Use control theoretical tools for guaranteed performance
• DSMS-specific problems
  • Lack of real-time measurement of output signal (y)
  • How to set control period (T)
• Real system evaluation
  • We use Borealis in our study

Page 10: Modeling a DSMS

• Borealis data stream manager
  • Round robin operator scheduler
  • FIFO waiting queues
  • For now, fix the per-tuple processing cost c
• Proposed model: y = qc, where q is the number of outstanding data tuples
• Discrete form: y(k) = q(k-1)c
• Denote the input load as fi and system processing power as fo:

$$y(k) = c\,q(k-1) = cT \sum_{j<k} \left[ f_i(j) - f_o(j) \right]$$
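The queue-based delay model can be exercised with a short simulation. The sketch below uses the recurrence q(k) = q(k-1) + T[fi(k) - fo(k)] together with y(k) = c q(k-1). Two things are assumptions added for illustration: the system can process at most 1/c tuples per second, and it never processes more tuples than are available. The function name `simulate` and the sample rates are hypothetical.

```python
def simulate(f_in, c, T):
    """Simulate queue length and delay per control period.

    f_in: arrival rate (tuples/s) in each control period
    c:    per-tuple processing cost (s)
    T:    control period (s)
    """
    capacity = 1.0 / c  # assumed maximum processing rate (tuples/s)
    q_prev = 0.0        # queue starts empty
    queue, delay = [], []
    for rate in f_in:
        delay.append(c * q_prev)                  # y(k) = c * q(k-1)
        f_out = min(capacity, q_prev / T + rate)  # cannot process more than is available
        q_prev = q_prev + T * (rate - f_out)      # q(k) = q(k-1) + T * (f_i(k) - f_o(k))
        queue.append(q_prev)
    return queue, delay


# Three calm periods, a five-period burst above capacity, then five calm periods.
arrivals = [150.0] * 3 + [300.0] * 5 + [150.0] * 5
queue, delay = simulate(arrivals, c=0.005, T=1.0)
for k, (q, y) in enumerate(zip(queue, delay), start=1):
    print(f"k={k:2d}  queue={q:6.1f}  delay={y:5.2f} s")
```

Without any shedding the queue keeps growing for as long as the burst lasts, and the delay stays elevated well after the input rate has dropped back, which is the behavior a controller is meant to prevent.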

Page 11: Controller design

• Design based on pole placement
• Guaranteed performance targeting
  • Convergence rate: responsiveness
  • Damping: smoothness
• The controller:
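The controller formula itself did not survive in this transcript, so the following is not the authors' controller. It is a minimal sketch of pole placement for the simplest case, assuming the delay model y(k+1) = y(k) + cT u(k), where u(k) is the net inflow fi - fo that the load shedder allows, and a pure proportional law u(k) = Kp (yr - y(k)). Substituting gives y(k+1) - yr = (1 - cT Kp)(y(k) - yr), so the closed-loop pole is p = 1 - cT Kp and choosing Kp = (1 - p)/(cT) places it at p. Actuator limits (for example, not being able to drop more tuples than arrive) are ignored, and the names `proportional_gain` and `run_loop` are hypothetical.

```python
def proportional_gain(pole, c, T):
    """Gain that places the closed-loop pole of y(k+1) = y(k) + c*T*u(k) at `pole`."""
    if not 0.0 <= pole < 1.0:
        raise ValueError("pole must lie in [0, 1) for smooth, stable convergence")
    return (1.0 - pole) / (c * T)


def run_loop(y_r, c, T, pole, steps, y0=0.0):
    """Simulate the proportional feedback loop and return the delay trace."""
    k_p = proportional_gain(pole, c, T)
    y = y0
    trace = [y]
    for _ in range(steps):
        u = k_p * (y_r - y)  # desired net inflow f_i - f_o (tuples/s)
        y = y + c * T * u    # y(k+1) = y(k) + c*T*u(k)
        trace.append(y)
    return trace


trace = run_loop(y_r=2.0, c=0.005, T=1.0, pole=0.5, steps=10)
for k, y in enumerate(trace):
    print(f"k={k:2d}  delay={y:.4f} s  error={2.0 - y:.4f}")
```

With the pole at 0.5 the error halves every control period. A pole closer to 0 converges faster but reacts more sharply to noise, and a pole closer to 1 is smoother but slower, which mirrors the responsiveness and smoothness targets named on the slide.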

Page 12: Control period

• Provides a complete answer to the question “when to shed load?”
• Arbitrarily set in previous studies
• Case-by-case decision with some systematic rules
• In our problem, a tradeoff between:
  • Sampling theory (Nyquist-Shannon theorem): to capture the moving trends of the disturbances, a higher sampling frequency (shorter period) is preferred
  • Stochastic nature of the output (y) and the parameter (c): more samples are needed per period, so a longer period is preferred
• The first factor should be given more weight

Page 13: Experiments

• Controller and load shedder implemented in Borealis
• Synthetic (“pareto”) and real (“Web”) data streams
• Small query network with variable average processing cost

Page 14: Experimental results

• Experiments for comparison
  • Aurora: open-loop solution
  • Baseline: a simple feedback method
• Target delay: 2000 ms
• Control period: 1 second
• Total time: 400 seconds
• For both input types, data loss is almost the same for the three load shedding strategies

Page 15: Future work

• Time-varying DSMS model
  • For example, time-varying cost c
  • Possible solution: adaptive control
• Adaptation other than load shedding
  • New disturbances?
  • Model changes?
• Other database problems?

[Figure: block diagram with internal and external controllers, internal and external dynamics, and disturbances]

Page 16: Backup - 1

Page 17: Backup - 2

• Lack of robustness of open-loop solution
  • More optimistic policy adopted in Aurora
  • Unstable performance
• Our solution is robust
  • Under input streams with different burstiness

Page 18: Backup - 3

Page 19: Backup - 4: Model verification

• Feed Borealis with synthetic streams
  • Input rate: step function or sinusoidal function of time
  • Average processing cost is fixed

Page 20: Summary

• Load shedding is an important quality adaptation method
• Ad hoc solutions do not work under dynamic load and system features
• We propose an approach to guide load shedding in a highly dynamic environment based on feedback control theory
• Initial experimental results performed in a real-world DSMS show promising potential of our approach

Page 21: Acknowledgements

• Dr. Song Liu, Hurco Companies, Inc., Indianapolis, IN
• Prof. Bin Yao, School of Mechanical Engineering, Purdue University
• Ms. Nesime Tatbul, Profs. Ugur Cetintemel, Stan Zdonik, CS Department, Brown University