
MEASUREMENT TECHNIQUES AND TOOLS

Types, Selection, Characterisation and Forecasting of Workload

Instrumentation

Benchmarking

Representation of Measurement Data


Classification of Measurement Techniques -- Introduction

A real measurement experiment can be rather involved, but once we identify the measurement primitives needed to support it, the problem becomes much more tractable.

Before discussing these primitives, let us introduce the concepts of event and state.


State and Event

The state of the system is defined by the values contained in various storage elements, be they memory locations, registers, or flip-flops.

Depending on the measurement objective, some of these may be used to define the relevant states, while others provide further information about what is happening in those states.

The former are called primary variables and the others auxiliary variables.

We refer to a change in a relevant state variable as an event.

Events can likewise be classified as primary or auxiliary, depending on the type of state variables involved.


Primitive Measurements

There are three types of primitive measurements.

1. The number of times a given state is visited during a given time interval.

Examples are the number of times a data structure is referenced, the relative frequency of executing a given instruction, the number of times I/O is done from cylinder 0 of some disk, etc.

We call these type A measurements.


Primitive Measurements -- Types B and C

2. The values of auxiliary state variables whenever a relevant state is entered.

For example, we may wish to record the number of processes in the ready list whenever an I/O operation is initiated. We call these type B measurements.

3. The fraction (or amount) of time for which the system is in a given state.

As an example, we may want to know what fraction of the time the disk head stays on cylinder 0.

We call these type C measurements.


Conditions for Measurement

A relevant state can be detected in two ways:

1. As being in a relevant state. A natural way to do this is to sample the system and check whether the primary state variables have the desired values.

For example, to check if control is inside a given procedure, we sample the program counter and see if it contains an address that belongs to that procedure.

This leads to sampled monitoring.

2. As an event that brings the system to a relevant state. Thus, to check if control is inside a given procedure, we explicitly look for the events of entry to and exit from the procedure.

This leads to trace monitoring.
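As a rough illustration (not from the original notes), the sketch below contrasts the two approaches for a hypothetical procedure of interest: sampled monitoring estimates the fraction of time spent in the state, while trace monitoring records explicit entry/exit events. The function names, sampling count, and the 30% occupancy figure are assumptions made for the example.

```python
import random
import time

# Hypothetical "system": the program counter is inside the procedure of
# interest roughly 30% of the time (an assumed figure for this example).
def in_procedure_of_interest() -> bool:
    return random.random() < 0.3

# Sampled monitoring: periodically inspect the state (suited to type C).
def sampled_monitor(num_samples: int = 10_000) -> float:
    hits = sum(in_procedure_of_interest() for _ in range(num_samples))
    return hits / num_samples            # estimated fraction of time in the state

# Trace monitoring: record entry/exit events explicitly (supports types A, B and C).
events = []

def on_entry() -> None:
    events.append(("entry", time.perf_counter()))

def on_exit() -> None:
    events.append(("exit", time.perf_counter()))

if __name__ == "__main__":
    print("sampled estimate of the time fraction:", sampled_monitor())
    on_entry(); time.sleep(0.01); on_exit()          # one traced visit
    visits = sum(1 for kind, _ in events if kind == "entry")
    print("traced visits (a type A count):", visits)
```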


Some Points

It is easy to see that measurements of all three types can be done using trace monitoring.

Type A measurements cannot be done using sampling, since sampling would fail to count all instances when the system was in the desired state. The same is true for type B.

Type C measurements are possible by sampling, since the computed relative frequency gives an estimate of the fraction of time spent in the desired state.

Multiplying this by the measurement duration gives an estimate of the total time spent in the desired state. Of course, sampling only gives us an estimate, not the exact answer.


Another dimension for classifying the measurement techniques

This dimension has to do with the type of instrumentation used in monitoring.

The instrumentation can be broadly divided into the following three classes:

- Hardware monitoring
- Software monitoring
- Hybrid monitoring


Hardware monitoring-- Introduction

This technique employs additional monitoring hardware that is interfaced with the system under measurement in a nonintrusive way.

The main advantage of this technique is that the measurement does not interfere with the normal functioning of the monitored system and fast events can be captured.

However, it is expensive and has difficulty performing software-level measurements.

Typical applications of this technique are measurements of types A and C for fast-occurring events, e.g., measuring device utilizations, cache hit rate, and pipeline flush rate.


Software monitoring-- Introduction

This technique uses some measurement code either embedded in the existing software or as a separate set of routines.

The main advantage of this technique is its generality and flexibility.

The disadvantages are that it may seriously interfere with the normal functioning of the system and cannot be used to capture fast-occurring events.

This technique is most appropriate for obtaining user-program and operating-system related information, such as the time spent executing a particular routine, page-fault frequency, and the average number of processes in each possible state.


Hybrid monitoring-- Introduction

This technique draws upon the advantages of both hardware and software monitoring.

All relevant signals are collected under software control and sent to another machine for measurement and processing.

The advantages are that it is flexible and that its domain of application overlaps those of both hardware and software monitoring.

The disadvantages are that the synchronization requirements between the measuring and measured systems may cause some interference, and that it is expensive and cumbersome to obtain detailed program-level or OS-level measurements.


Issues in Selecting a Technique

1. Accessibility. The hardware may be unaware of software-level information and thus unable to obtain it.

2. Event frequency. If the events occur too rapidly, software may be unable to track them. Thus, we have to select either hardware monitoring or a sampled measurement in software.

3. Monitor artifact. The interference caused by measurement may perturb the workload significantly. For example, if we monitor the I/O traffic on a disk and record the information on that disk itself, the measurements are no longer accurate.


Issues -- Continued

4. Overhead. In some situations, the interference may either not affect the measurement accuracy or may be easy to compensate for. Nevertheless, the interference may be unacceptable because of a significant reduction in useful work.

5. Flexibility. This issue has to do with how easy it is to modify or upgrade the instrumentation and/or the information being collected. Generally, software instrumentation is easier to change than hybrid instrumentation, which in turn is easier to change than hardware instrumentation.


Hardware Monitor

A general-purpose hardware monitor consists of:

1. counters: incremented whenever a monitored event occurs.

2. logic elements (AND, OR, and other logic gates): signals from probes are combined, and the combinations are used to indicate events that may increment the counters.

3. comparators: compare counter or signal values with preset values.

4. timer: used for time stamping or for triggering a sampling operation.

5. tape/disk: for storing the data.


Workload Definition

Workload can be defined as the set of all inputs that the system receives from its environment during any given period of time.

- It is difficult to handle real workloads with a large number of elements, so we need to build a workload model that captures the most relevant characteristics of the real workload.
- The choice of characteristics and parameters depends on the goal of the study.

e.g., web server study:

Goal                                              Workload characteristics needed
(i)  cost/benefit of creating a proxy             frequency & concentration of document references,
     caching server                               document sizes, inter-reference times, etc.
(ii) impact of a faster CPU on response time      average CPU time & average number of I/O
                                                  operations per request, etc.

- In a distributed environment there are multiple, different workloads, e.g., the workloads of the client, server, and network in a client/server system.


Types

test workload
  - real
  - model
      - synthetic
          - natural (benchmark, trace)
          - hybrid
      - artificial
          - executable (instruction mix, synthetic programs, kernels, etc.)
          - non-executable (parametric, prob. distr.)


Types

- test workload: any workload used in performance studies.
- real workload: consists of all the programs, transactions, commands, etc., processed during a given period of time, i.e., the workload that the system processes during a measurement session. It cannot be repeated.
- workload model: its characteristics are similar to those of the real workload, and it can be repeatedly applied in a controlled manner.
- synthetic workloads: constructed using basic components (programs, interactive commands, etc.) of the real workload and specially developed components (synthetic programs, kernels, synthetic scripts, etc.).
- artificial workloads: test workloads implemented without making use of any real workload components.
- executable workload: the load in this case may be directly executed on a real system.
- non-executable workload: described by a set of mean parameter values (e.g., request inter-arrival time, service demand, request mix) that reproduce the resource usage of the real workload. It is not suitable for execution on a real system, but is suitable for analytical and simulation models.


Workload Selection

Improper selection of workloads results in misleading conclusions. There are four major considerations in selecting a workload:

- services exercised by the workload
- level of detail (most frequent request, frequency of request types, time-stamped sequence, average/distribution of resource demands)
- representativeness (test workload & real application should match in arrival rate, total demand on each resource, and resource usage profile)
- timeliness (should represent the latest usage pattern)

Other considerations:

- loading level (best, worst & typical)
- impact of external components
- repeatability (multiple alternatives to be tested)

Example of services - interface levels of various SUTs:

SUT                 Services exercised
Applications        Transactions
Operating System    O/S commands + services
CPU                 Instructions
ALU                 Arithmetic instructions


Representativeness of Workload Model

Representativeness indicates the accuracy with which the model represents the real workload; the accuracy of a workload model is defined in different ways depending on the modelling level adopted.

Given a workload W, some of the criteria that may be (and indeed have been) chosen to evaluate the representativeness of a model W' can be derived from the following definitions:

1. W' is a perfectly representative model of W if it demands the same physical resources in the same proportions as W.

2. W' is a perfectly representative model of W if it demands the same physical resources at the same rates as W.

3. W' is a perfectly representative model of W if it performs the same functions in the same proportions as W.

What do these mean?

Example: suppose the characterisation of the workload is based on (1) the total CPU time and (2) the total number of I/O operations. Then:


Representativeness of Workload Model

Criterion 1 states that the ratios between the values of these two parameters in W and W' must be the same.

Criterion 2 specifies that the mean durations of the CPU bursts and I/O bursts in W and W' must be equal.

Criterion 3: if the workload to be modelled consists of 400 hrs of compilation, 250 hrs of test runs, 700 hrs of data processing, and 300 hrs of scientific computation, then W' will have to consist of, say, 20 minutes of compilation, 12.5 minutes of test runs, 35 minutes of data processing, and 15 minutes of scientific computation.

However, there are times when building a model that satisfies all three criteria, or even one of them, is virtually impossible.

Another useful criterion is performance oriented:

Criterion 4: W' is a perfectly representative model of W if it produces the same values of the performance measures as W when running on the same system.


Representativeness of Workload Model

In summary, there is no absolute and unique criterion for building and evaluating workload models: the characterisation level and the characterising parameters have to be selected according to the objectives of the study for which the modelling effort is undertaken. One criterion can also be combined with another criterion based on a different level of characterisation.

(Fig. Criterion 4: Real Workload -> System -> Performance Measures (P_real); Workload Model -> System -> Performance Measures (P_model); with P_real = P_model.)


Workload Characterisation

- the process of selecting the workload or workloads on which to base the performance study
- the characterisation process analyses a workload and identifies its basic components and the features that have an impact on the system's performance
- a basic component refers to a generic unit of work that arrives at the system from an external source, e.g., a job, transaction, interactive command, process, etc.

Three levels of workload modelling:

- user-oriented characterisation: e.g., a business workload described by business quantities such as number of employees, invoices per customer, etc.
- functional characterisation: describes the programs, commands, or applications that make up the workload, and is system independent
- resource-oriented characterisation: based on the consumption of the system's hardware and software resources; examples: CPU time consumed, number of instructions executed, main memory and secondary storage space required, total I/O consumed, number of work files used, etc.


Workload Characterisation

Steps involved in constructing a workload model to be used as input to analytic/simulation models (focus on resource-oriented characterisation):

1. Establish the analysis standpoint (e.g., client, server, or network in a client/server system?)

2. Identify the basic components.

3. Choose the characterising parameters of each component; there are two groups of parameters:
   - workload intensity: e.g., arrival rate, number of clients and think time
   - service demands: specified by the K-tuple $D_i = (D_{i1}, D_{i2}, \ldots, D_{iK})$, where K is the number of resources considered and $D_{ij}$ is the service demand of basic component i at resource j (a sketch follows the list).

4. Collect data; this includes the following tasks: identify the time windows; monitor and measure the system activities during the defined windows; assign values to each characterising parameter of every component.

5. Partition the workload into classes of similar components; resource usage can be an important attribute for partitioning.
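As a small illustration of step 3 (my own sketch, not part of the original notes), each basic component below carries one workload-intensity value and a service-demand K-tuple with K = 3 assumed resources; the component names and numbers are invented.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed resources considered: K = 3 (CPU, disk, network), in this order.
RESOURCES = ("CPU", "Disk", "Network")

@dataclass
class BasicComponent:
    name: str                            # hypothetical component name
    arrival_rate: float                  # workload intensity (requests per second)
    service_demands: Tuple[float, ...]   # K-tuple (D_i1, ..., D_iK), seconds per request

components = [
    BasicComponent("trivial query", arrival_rate=5.0, service_demands=(0.01, 0.02, 0.005)),
    BasicComponent("report",        arrival_rate=0.2, service_demands=(0.50, 1.20, 0.100)),
]

for c in components:
    demands = ", ".join(f"{r}={d}s" for r, d in zip(RESOURCES, c.service_demands))
    print(f"{c.name}: lambda={c.arrival_rate}/s, D=({demands})")
```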


Workload Characterisation

6. Calculate the class parameters.

Each workload component is characterised by a p-dimensional vector $w_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$, where p is the number of parameters and $x_{ij}$ is the value of the j-th parameter. How do we calculate the parameter values that represent a class? Available techniques:

(i) averaging

Given a sample $\{x_1, x_2, \ldots, x_n\}$ of n observations corresponding to the n components of a class for some parameter j, the arithmetic mean $\bar{x}$ is the simplest way to characterise the parameter. Other alternatives are the geometric mean, harmonic mean, median, and mode.

(ii) specifying dispersion

If the variability in the data is large, the variance is often used. Other alternatives for specifying variability are the standard deviation, coefficient of variation, range (minimum and maximum), 10- and 90-percentiles, semi-interquartile range, and the mean absolute deviation.
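The sketch below (not from the notes) computes several of the averaging and dispersion statistics listed above for a hypothetical sample of CPU-time observations belonging to one class; the data values are made up.

```python
import statistics

# Hypothetical CPU-time observations (seconds) for the n components of one class.
x = [0.8, 1.1, 0.9, 4.5, 1.0, 1.2, 0.95, 1.05]

mean = statistics.fmean(x)
median = statistics.median(x)
stdev = statistics.stdev(x)                  # sample standard deviation
cv = stdev / mean                            # coefficient of variation
deciles = statistics.quantiles(x, n=10)      # cut points; [0] ~ 10-, [-1] ~ 90-percentile

print(f"mean={mean:.3f}  median={median:.3f}  stdev={stdev:.3f}  CV={cv:.2f}")
print(f"10-percentile={deciles[0]:.3f}  90-percentile={deciles[-1]:.3f}  "
      f"range=({min(x)}, {max(x)})")
```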


Workload Characterisation

(iii) single-parameter histograms: show the relative frequencies of the various values of a parameter; used when the variance is high and averages alone cannot be used; they ignore the correlation among different parameters.

(iv) multiparameter histograms: used when there is a significant correlation between different workload parameters; described using a multi-dimensional matrix or histogram.

(v) Markov models: sometimes it is important to model the dependence among various parameters; e.g., the next request may depend on the last few requests. Formally stated, if the next "system state" depends only on the current system state, the system follows a Markov model. It can be described by a transition matrix or a state transition diagram, which gives the probabilities of the next state given the current state.


Workload Characterisation

e.g., the transition probability matrix and the corresponding state transition diagram for a job's transitions between the CPU, disk, and terminal are shown below:

From/To     CPU    Disk   Terminal
CPU         0.6    0.3    0.1
Disk        0.9    0      0.1
Terminal    1      0      0

Transition probabilities measured directly on the real system can be used in the workload model.
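As an illustration only, the sketch below encodes the transition matrix above and generates a short synthetic trace of device visits from it; the trace length and starting state are arbitrary choices.

```python
import random

# Transition probability matrix from the example above.
P = {
    "CPU":      {"CPU": 0.6, "Disk": 0.3, "Terminal": 0.1},
    "Disk":     {"CPU": 0.9, "Disk": 0.0, "Terminal": 0.1},
    "Terminal": {"CPU": 1.0, "Disk": 0.0, "Terminal": 0.0},
}

def next_state(current: str) -> str:
    """Draw the next state according to the row of the transition matrix."""
    row = P[current]
    return random.choices(list(row), weights=list(row.values()), k=1)[0]

def generate_trace(start: str = "CPU", length: int = 20) -> list:
    """Generate a synthetic sequence of device visits (a simple workload model)."""
    trace, state = [start], start
    for _ in range(length - 1):
        state = next_state(state)
        trace.append(state)
    return trace

print(generate_trace())
```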

(vi) clustering: the main aim of clustering is to partition the components into groups so that the members of a group are as similar as possible and different groups are as dissimilar as possible; the goal is to minimise the intra-group variance or to maximise the inter-group variance (see the sketch below).

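One common way to perform such a partitioning is k-means clustering; the sketch below is my own minimal illustration (not the slides' prescribed algorithm), grouping components described by two invented resource-usage parameters, CPU seconds and I/O count.

```python
import random

# Each workload component described by two parameters: (CPU seconds, I/O count).
components = [(0.20, 10), (0.30, 12), (0.25, 9),        # interactive-like components
              (5.00, 300), (6.20, 280), (5.50, 310)]    # batch-like components

def kmeans(points, k=2, iterations=20, seed=0):
    """Very small k-means: assign points to the nearest centroid, then recompute."""
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        centroids = [tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters, centroids

clusters, centroids = kmeans(components)
for members, centre in zip(clusters, centroids):
    print("class centre", tuple(round(v, 2) for v in centre), "members", members)
```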


Workload Forecasting

- the process of predicting how the workload will vary in the future
- an important phase in capacity planning
- a workload forecasting strategy combines both quantitative and qualitative approaches


Workload Forecasting Techniques

There are several criteria for selecting a forecasting technique, such as the span (e.g., short-range, medium-range, and long-range), availability of historical data, data pattern, desired accuracy, etc.

(Fig. Historical data patterns: trend, cyclical, seasonal, stationary.)

The three most often applied techniques are:

1. Linear Regression: the simple linear regression formula is given by

   $y = a + bx$

   where the constants a and b are determined by the method of least squares as


   $b = \dfrac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2}, \qquad a = \bar{y} - b\bar{x}$

   where $(x_i, y_i)$, for $i = 1, \ldots, n$, are the n observed data points, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.

2. Moving Average: the forecast value is given by

   $f_{t+1} = \dfrac{y_t + y_{t-1} + \cdots + y_{t-n+1}}{n}$

   where $f_{t+1}$ : forecast value at time t+1
         $y_t$ : observed value at time t
         $n$ : number of observations used to calculate $f_{t+1}$
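A small sketch (with invented data) that applies the least-squares formulas for a and b given above under "1. Linear Regression":

```python
# Hypothetical observations: x = month index, y = workload (transactions per second).
xs = [1, 2, 3, 4, 5, 6]
ys = [110, 118, 131, 139, 152, 160]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / \
    (sum(x * x for x in xs) - n * x_bar ** 2)
a = y_bar - b * x_bar

print(f"least-squares fit: y = {a:.2f} + {b:.2f} x")
print("forecast for month 9:", round(a + b * 9, 1))
```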


Workload Forecasting Techniques

Moving average:
- advantage: for nearly stationary data, the accuracy achieved is usually high
- disadvantages:
  - only one forecast value can be calculated at a time
  - difficulty in deciding n; usually the minimum mean squared error criterion is used to select n, where the mean squared error is given by

    $MSE = \dfrac{1}{n}\sum_{t=1}^{n} (y_t - f_t)^2$

3. Exponential Smoothing: similar to the moving average in the sense that both set the forecast value to an average of the observed values; however, exponential smoothing places more weight on the most recent observation. The forecast value is given by

   $f_{t+1} = f_t + \alpha (y_t - f_t)$

   where $f_{t+1}$ : forecast value at time t+1
         $y_t$ : observed value at time t
         $\alpha$ : smoothing weight $(0 \le \alpha \le 1)$
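The sketch below (invented data, my own helper names) applies the moving-average and exponential-smoothing formulas above, and uses the minimum-MSE criterion to pick n for the moving average:

```python
# Hypothetical observed workload values (e.g., daily CPU hours).
y = [52, 49, 55, 53, 58, 54, 57, 60, 56, 59]

def moving_average_forecasts(y, n):
    """f_{t+1} = mean of the last n observations; defined from t = n-1 onwards."""
    return [sum(y[t - n + 1:t + 1]) / n for t in range(n - 1, len(y))]

def exponential_smoothing_forecasts(y, alpha):
    """f_{t+1} = f_t + alpha * (y_t - f_t); f_1 is seeded with the first observation."""
    f = [y[0]]
    for obs in y:
        f.append(f[-1] + alpha * (obs - f[-1]))
    return f[1:]

def mse(actual, forecast):
    pairs = list(zip(actual, forecast))
    return sum((a - f) ** 2 for a, f in pairs) / len(pairs)

# Minimum-MSE criterion for choosing n (forecasts compared against later observations).
best_n = min(range(2, 6),
             key=lambda n: mse(y[n:], moving_average_forecasts(y, n)[:-1]))
print("best n for the moving average:", best_n)
print("exponential smoothing (alpha = 0.3), next forecast:",
      round(exponential_smoothing_forecasts(y, 0.3)[-1], 2))
```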


Workload Forecasting Techniques

- the selected technique should be validated: use only part of the observations to exercise the model; the remaining observations are then compared to the forecast values; choose the technique that gives the smallest MSE
- forecasting should not be restricted to a single parameter of the workload


Benchmarking

- benchmarking is the process of comparing the performance of two or more systems by measurement (using standard, well-known benchmarks)
- benchmarking refers to running a set of representative programs on different systems and measuring the results
- applications: computer system procurement studies and comparative analyses of products
- benchmarks are used as monitoring and diagnostic tools; they help developers test new systems or assess the impact of modifications to a system
- benchmark results can be used to estimate input parameters for performance models
- to interpret benchmark results properly, one must understand the workload, the system under study, the tests, the measurements, and the results


Benchmarking

There are three types of popular benchmarks: kernel, synthetic, and application.

Application benchmarks:
- sets of programs taken from real workloads
- they make use of almost all system resources
- examples: benchmark suites from SPEC, TPC, etc.
- SPEC (System Performance Evaluation Cooperative), an organization of computing industry vendors, has developed a standardised set of benchmarks (the SPEC benchmark suite) drawn from various engineering & scientific applications
- as systems evolve, so must the benchmarks that are used to compare them => new versions are released periodically
- SPEC develops benchmarks and publishes performance results of CPU, file server, web server, and graphics benchmarks
- TPC (Transaction Processing Performance Council) is a nonprofit organization that defines transaction processing and database benchmarks


Benchmarking

Benchmarks can be grouped into two categories:

1. component-level: a set of tests and workloads is specified to measure component or subsystem performance, such as CPU speed, I/O time, file server throughput, etc.

2. system-level: consider the entire system - they measure the processor, the I/O subsystem, the network, the database, the compiler and the OS.

To be useful, a benchmark should have the following attributes:
- relevant: it must provide meaningful performance measures within a specific problem domain
- understandable: the benchmark results should be simple and easy to understand
- scalable: the tests must be applicable to a wide range of systems in terms of cost, performance and configuration
- acceptable: it should present unbiased results that are recognised by users and vendors


Examples

Example 1: SPEC CPU - a component-level benchmark

- designed to provide performance measures for comparing different systems on compute-intensive applications
- concentrates on the performance of the processor, memory architecture & compiler
- SPECxx specifies the generation; the current generation contains two suites:
  CINT - for integer performance
  CFT - for floating-point performance
- workload: CINT contains eight applications (see Table 1) and CFT ten
- the table also shows the SPEC reference time - the time on a 40-MHz SuperSPARC used as the reference machine

Table 1: SPEC CINT Benchmarks

Number Benchmark Ref Time (sec) Application Area

1 099.go 4,600 Artificial intelligence

2 124.m88ksim 1,900 Chip simulator

3 126.gcc 1,700 Programming

4 129.compress 1,800 Text file compression

5 130.li 1,900 List language interpreter

6 132.ijpeg 2,400 Image compression

7 134.perl 1,900 Shell interpreter

8 147.vortex 2,700 Object-oriented database


Examples

Results:
- typical SPEC CPU performance results are shown in Table 2
- the first four entries are for the CINT suite and the next four for CFT
- each entry for CINT is a geometric mean of the eight normalised ratios corresponding to the eight benchmarks listed in Table 1
- both speed and throughput are provided
- both peak and baseline results are given - baseline results are aggregates obtained with minimal compiler optimization, whereas peak results are obtained with heavy optimization

Measure Result

SPECint 15.0

SPECint_base 12.6

SPECint_rate 135.0

SPECint_rate_base 118.0

SPECfp 21.4

SPECfp_base 17.9

SPECfp_rate 180.0

SPECfp_rate_base 168.0


Examples

Example 2: TPC-C - a system-level benchmark

- an industry-standard benchmark for moderately complex online transaction processing systems
- models an application that manages orders for a wholesale supplier
- workload:
  - consists of five transactions - New-order, Payment, Delivery, Order-status, and Stock-level - that update, insert and delete
  - New-order and Payment transactions represent 45% and 43%, respectively, of the total load; the other three account for 4% each
  - the workload is database intensive, with substantial I/O & cache load
- results:
  - typical TPC-C results are shown in Table 3
  - throughput (in tpmC) is the maximum number of New-order transactions per minute that a system services
  - 90% of the New-order transactions should have a response time of less than 5 seconds


Examples

Table 3: TPC-C Results

Company X

System xyz

Processors 4

Disk capacity 708.33 GB

RAM 4GB

DBMS Microsoft SQL

Operating System Windows NT

Total system cost $460,220

TPC-C throughput (tpmC) 10,950.3

Price/performance $42.03

Other transactions have different response-time requirements. In the price/performance ratio, the price covers the computer system, terminals, communication devices, software & a five-year maintenance cost.


Representation of Measurement Data

- an important step in every performance evaluation study is the presentation of the final results
- graphic charts such as line charts, bar charts, pie charts, and histograms are commonly used
- in addition, a number of graphic charts have been developed specifically for computer systems performance analysis, e.g., Gantt charts (or utilisation profiles) and Kiviat graphs


Gantt Charts (Utilisation Profiles)

- used to show the relative duration of any number of Boolean conditions - conditions that are either true or false
- overlap among resources can be shown
- can be used effectively for quick consultation: evaluating the results of periodic measurements, providing an initial picture of the situation

(Fig. Sample Gantt chart: utilisation bars for CPU, I/O Channel, and Network on a 0-100% scale, with segments showing the overlap among the resources.)


Gantt Charts

Example:

Data for the Gantt chart:

A  B  C  D    time used (%)
0  0  0  0    5
0  0  0  1    5
0  0  1  0    0
0  0  1  1    5
0  1  0  0    10
0  1  0  1    5
0  1  1  0    10
0  1  1  1    5
1  0  0  0    10
1  0  0  1    5
1  0  1  0    0
1  0  1  1    5
1  1  0  0    10
1  1  0  1    10
1  1  1  0    5
1  1  1  1    10
              ----
              100

(Fig. Final Gantt chart: utilisation bars for A, B, C and D on a 0-100% scale, built from the table above; e.g., A is busy 55% of the time.)

(Fig. Draft of the Gantt chart: the individual conditions A/A', B/B', C/C', D and their durations laid out before being combined into the final chart.)
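As a small aside (not part of the original example), the utilisations behind the final Gantt chart can be recomputed directly from the truth table above; the sketch below does that and also reports a sample overlap figure.

```python
from itertools import product

# Percentage of time spent in each (A, B, C, D) combination, from the table above.
time_pct = [5, 5, 0, 5, 10, 5, 10, 5, 10, 5, 0, 5, 10, 10, 5, 10]
states = list(product((0, 1), repeat=4))        # (0,0,0,0), (0,0,0,1), ..., (1,1,1,1)

names = "ABCD"
util = {name: sum(p for s, p in zip(states, time_pct) if s[i])
        for i, name in enumerate(names)}
all_idle = next(p for s, p in zip(states, time_pct) if s == (0, 0, 0, 0))
a_and_b = sum(p for s, p in zip(states, time_pct) if s[0] and s[1])

print("individual utilisations (%):", util)     # A comes out at 55%, as in the chart
print("time with everything idle (%):", all_idle)
print("time with A and B both busy (%):", a_and_b)
```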


Kiviat Graphs

- introduced by Kolence & Kiviat (1973)
- a visual device that helps in quick identification of performance problems; the human ability to recognise shapes permits faster qualitative evaluation of a system's performance
- the best-known version of the conventions is due to Kent. Kent's conventions:

1. select an even number of variables to be studied, half of which are HB ("higher is better") performance metrics and half LB ("lower is better") performance metrics

2. subdivide the circle into as many sectors as there are variables

3. number the semiaxes sequentially, starting with the upward-pointing vertical semiaxis

4. associate HB performance indexes with odd semiaxes and LB indexes with even semiaxes

- the Kiviat graph for an ideal system is a star
- given two Kiviat graphs, it is easy to tell which system is more balanced by looking at their shapes


Kiviat Graphs

Example: for the given set of performance metrics, the Kiviat graph for a balanced system and the one for a CPU-bound system are shown below.

Performance metrics: the percentage of time spent in the following states

                              Balanced system    CPU-bound system
1. CPU busy                   90%                95%
2. CPU only busy              10%                85%
3. CPU/channel overlap        80%                10%
4. channel only busy          10%                5%
5. any channel busy           90%                15%
6. CPU wait                   10%                5%
7. CPU in problem state       80%                90%
8. CPU in supervisor state    10%                5%

(Fig. Kiviat graph for a balanced system and Kiviat graph for a CPU-bound system, drawn from the values above.)
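A Kiviat-style graph can be approximated with a radar chart on a polar axis; the sketch below (my own rendering, assuming matplotlib is available) plots the two systems from the example using the values in the table above.

```python
import math
import matplotlib.pyplot as plt

metrics = ["CPU busy", "CPU only busy", "CPU/channel overlap", "channel only busy",
           "any channel busy", "CPU wait", "CPU in problem state",
           "CPU in supervisor state"]
balanced = [90, 10, 80, 10, 90, 10, 80, 10]      # values from the table above
cpu_bound = [95, 85, 10, 5, 15, 5, 90, 5]

angles = [2 * math.pi * i / len(metrics) for i in range(len(metrics))]
ax = plt.subplot(polar=True)
ax.set_theta_zero_location("N")                  # start at the upward vertical semiaxis
for values, label in ((balanced, "balanced"), (cpu_bound, "CPU-bound")):
    ax.plot(angles + angles[:1], values + values[:1], label=label)  # close the polygon
ax.set_xticks(angles)
ax.set_xticklabels(metrics, fontsize=7)
ax.legend(loc="lower right")
plt.show()
```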


Summarizing Measured Data

Measured data can be summarized by stating:

- the average (index of central tendency): mean (arithmetic, geometric, harmonic), median, or mode
- the variability (index of dispersion): range, variance or standard deviation, coefficient of variation, SIQR, mean absolute deviation, etc.
- the type of distribution the data follow:
  - plot a histogram: determine the maximum and minimum of the observed values; divide the range into a number of subranges called cells or buckets; normalize (w.r.t. the total number of observations) the count of observations that fall into each cell; plot the resulting cell frequencies as a column chart
  - quantile-quantile plot: a better technique for small samples. It is a plot of observed quantiles versus theoretical quantiles. If the observations do agree with the selected theoretical distribution, the quantile-quantile plot will be linear.


Quantile-quantile plots

Let $y_{(i)}$ be the observed $q_i$-th quantile (the i-th value in the sorted sample) and $x_i$ the $q_i$-th quantile computed from the theoretical distribution;

that is, $q_i = F(x_i)$, or $x_i = F^{-1}(q_i)$, where $F(\cdot)$ is the CDF of the selected theoretical distribution.

For those distributions whose CDF cannot be inverted analytically, one can use tables.

For the unit Normal distribution N(0,1), the following approximation is used:

$x_i = 4.91\,[\,q_i^{0.14} - (1 - q_i)^{0.14}\,]$

For $N(\mu, \sigma)$, the values computed above are scaled to $\mu + \sigma x_i$ before plotting.

Recall that if the observations come from the Normal distribution, the Normal quantile-quantile plot will be linear.
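The sketch below (my own, with synthetic Normal data) builds a Normal quantile-quantile plot using the approximation quoted above; the choice q_i = (i - 0.5)/n for the quantile positions is a common convention and is an assumption here, since the slide does not state it.

```python
import random
import matplotlib.pyplot as plt

random.seed(1)
observations = sorted(random.gauss(10, 2) for _ in range(50))   # hypothetical sample
n = len(observations)

# Quantile positions q_i = (i - 0.5) / n (an assumed convention), mapped to unit-normal
# quantiles with the approximation x_i = 4.91 [ q_i^0.14 - (1 - q_i)^0.14 ].
q = [(i - 0.5) / n for i in range(1, n + 1)]
x = [4.91 * (qi ** 0.14 - (1 - qi) ** 0.14) for qi in q]

plt.scatter(x, observations, s=10)
plt.xlabel("Normal quantiles")
plt.ylabel("Observed quantiles")
plt.title("Normal quantile-quantile plot (roughly linear => approximately Normal)")
plt.show()
```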


Quantile-quantile plots

Interpretation of the Normal quantile-quantile plot

(Fig. Observed quantiles plotted against Normal quantiles for four cases: (a) Normal, (b) long tails, (c) short tails, (d) asymmetric.)


Confidence Interval

- a fundamental concept that every performance analyst needs to understand
- commonly used in comparing systems using sample data
- a definite statement about a system characteristic cannot be made based on sample data; the basic idea is to make a probabilistic statement about the range in which the characteristic of the system is likely to lie
- population characteristics hold for all possible realizations of the stochastic process describing the system behaviour, whereas sample estimates hold only for one particular realization
- population characteristics are called parameters while sample estimates are called statistics; e.g., the population mean $\mu$ is a parameter, while the sample mean $\bar{x}$ is a statistic
- parameters are fixed while statistics are random variables
- exact values of parameters can be computed only from the corresponding probability distributions, which is often difficult; however, estimates can be obtained from sample data


Confidence Interval for the Mean

- each sample mean is an estimate of the population mean
- given k samples, we have k different estimates
- we would like a single perfect estimate of the population mean from these k estimates, but the best we can do is to get probability bounds: two bounds $c_1$ and $c_2$ such that there is a high probability, $1 - \alpha$, that the population mean $\mu$ is in the interval $(c_1, c_2)$:

  $\Pr\{c_1 \le \mu \le c_2\} = 1 - \alpha$

- the interval $(c_1, c_2)$ is called the confidence interval for the population mean, $\alpha$ is called the significance level, $100(1 - \alpha)$ is called the confidence level (in %), and $1 - \alpha$ is called the confidence coefficient
- it is possible to determine the CI from a single sample: the CLT (Central Limit Theorem) allows us to determine the distribution of the sample mean


Confidence Interval for the Mean

CLT: if the n observations $\{x_1, x_2, \ldots, x_n\}$ in a sample are independent and the population has mean $\mu$ and standard deviation $\sigma$, then for large n the sample mean has a Normal distribution:

$\bar{x} \sim N(\mu,\ \sigma/\sqrt{n})$

The standard deviation of the sample mean, $\sigma/\sqrt{n}$, is called the standard error.

Using the CLT, we can state with $100(1 - \alpha)\%$ confidence that the population mean is between $\bar{x} - z_{1-\alpha/2}\, s/\sqrt{n}$ and $\bar{x} + z_{1-\alpha/2}\, s/\sqrt{n}$.

In other words, a $100(1 - \alpha)\%$ CI for the population mean is given by

$\left(\bar{x} - z_{1-\alpha/2}\, s/\sqrt{n},\ \ \bar{x} + z_{1-\alpha/2}\, s/\sqrt{n}\right)$

where $\bar{x}$ is the sample mean, s the sample standard deviation, n the sample size, and $z_{1-\alpha/2}$ is the $(1 - \alpha/2)$-quantile of a unit normal variate (available in tables).

(Fig. Unit normal density: the central area between $-z_{1-\alpha/2}$ and $z_{1-\alpha/2}$ equals $1 - \alpha$.)


Confidence Interval for the Mean

- the standard error decreases as n increases
- if the sample size is small (say, n < 30), the CI can be constructed only if the observations come from a population with a Normal distribution
- the procedure is the same as for large samples, except that we use the t-distribution in place of the standard normal; thus, the $100(1 - \alpha)\%$ confidence interval is given by

  $\left(\bar{x} - t_{[1-\alpha/2;\, n-1]}\, s/\sqrt{n},\ \ \bar{x} + t_{[1-\alpha/2;\, n-1]}\, s/\sqrt{n}\right)$

  where $t_{[1-\alpha/2;\, n-1]}$ is the $(1 - \alpha/2)$-quantile of the t-variate with n-1 degrees of freedom (these quantiles are also available in tables)
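A minimal sketch (invented sample, assuming Python 3.8+ for statistics.NormalDist) that computes a large-sample confidence interval for the mean using the z-quantile formula above; the small-sample t-quantile variant is noted in a comment.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical sample of n = 30 response-time measurements (seconds).
sample = [1.9, 2.3, 2.1, 2.6, 2.2, 2.4, 2.0, 2.5, 2.3, 2.2,
          2.1, 2.7, 2.4, 2.2, 2.3, 2.5, 2.0, 2.6, 2.1, 2.4,
          2.2, 2.3, 2.5, 2.4, 2.1, 2.2, 2.6, 2.3, 2.4, 2.2]

alpha = 0.10                                    # 90% confidence level
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)
n = len(sample)

z = NormalDist().inv_cdf(1 - alpha / 2)         # z_{1-alpha/2} of the unit normal
half_width = z * s / math.sqrt(n)
print(f"{100 * (1 - alpha):.0f}% CI for the mean: "
      f"({x_bar - half_width:.3f}, {x_bar + half_width:.3f})")

# For small samples (n < 30) from a Normal population, replace z with the t-quantile
# with n-1 degrees of freedom, e.g. scipy.stats.t.ppf(1 - alpha/2, n - 1).
```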