comparing systems powerpoints.pdf

7/25/2019 Comparing Systems Powerpoints.pdf

1/45

"The method that proceeds without analysis is like thegroping of a blind man. Socrates


2/45

9-2

Questions Addressed

What are the issues involved in comparing

two or more system configurations?

What is hypothesis testing?

How do common random numbers work?


3/45

Purpose

Purpose: comparison of alternative system designs.

Approach: discuss a few of many statistical

methods that can be used to compare two or more

system designs.

Statistical analysis is needed to discover whether observed

differences are due to:

Differences in design, or

The random fluctuation inherent in the models.

3


4/45

Outline

For two-system comparisons:

Independent sampling.

Correlated sampling (common random numbers).

For multiple system comparisons:

Bonferroni approach: confidence-intervalestimation, screening, and selecting the best.

4


5/45

9-5

Comparing Two Systems

Challenge is to determine if differences are

attributable to actual differences inperformance and not to statistical variation.

Running multiple replications, or batches,

is required.


6/45

9-6

Hypothesis Testing

Used when outcomes are close or high

precision is required.

A hypothesis is first formulated (e.g. thatmethods A and B both result in the same throughput)

and then a test is made to see whether the

results of the simulation lead us to rejectthe hypothesis.


7/459-7

What does it mean if you

fail to reject a hypothesis?

1. Our hypothesis is correct

OR

2. The variance in the observed outcomesare too high given the number ofreplications to be conclusive (run morereplications or use variance reduction techniques).


8/459-8

Types of Errors

Type I error: Reject True Hypothesis

Type II error: Accept False Hypothesis

True False

Accept

Reject III


9/459-9

Possible Confidence Intervals

(A)[------|------]

U1-U2=0

[------|------]

[------|------](C)

(B)

[------|------]denotes confidence Interval

Fail to

Reject H0

Reject H0

Reject H0


10/459-10

Comparison of two Buffer Strategies

Assume we have two candidate strategies believed to

maximize the throughput of the system.

We offer two methods based on the Confidence Interval

approach.

We seek to discover if the mean throughput of the systemdue to Strategy 1 and Strategy 2 are significantly different.

We begin by estimating the mean performance of the two

proposed strategies by simulating each strategy for 16 days

past the warm-up period.

The simulation experiment was replicated 10 times for

each strategy.

The throughput achieved by each strategy is shown next.


11/459-11

Comparison of two Buffer Strategies

(A)

Replication

(B)

Strategy 1

Throughput

(C)

Strategy 2

Throughput

1 54.48 56.01

2 57.36 54.08

3 54.81 52.14

4 56.20 53.49

5 54.83 55.49

6 57.69 55.00

7 58.33 54.88

8 57.19 54.47

9 56.84 54.93

10 55.29 55.84

mean 56.30 54.63

Standard Deviation 1.37 1.17

Variance 1.89 1.36


12/459-12

Welch Confidence Interval

Requires that the observations drawn

from each simulated system benormally distributed and independent

within a population and betweenpopulations.


13/459-13


14/459-14

17.5

= T.INV.2T(0.05,17.5) = 2.10


15/45

9-15


16/459-16


17/45

Paired-t CI for Comparing 2 Systems

Like the Welch CI method, the paired-t CI

method requires that the observations drawing

from each population be normally distr ibuted

and independentwithin a population.

However, the paired-t CI method does not

require that the observations betweenpopulations be independent.

9-17


18/45

Paired-t CI for Comparing 2 Systems

This allows us to use Common Random

Numbers to force a positive correlation between

the two populations in order to reduce the half-

width.

Finally, like the Welch method, the paired-t CI

method does not require that the population have

equal variances.

Given n observations (n1=n2=n), we pair the

observations from each population (x1and x2) to

define a new random variable: x(1-2)j = x1j- x2j9-18


19/459-19


20/45

9-20

Paired-t Comparison

Variance 3.42

(A)

Replicat ion (j)

(B )

Strategy 1Throughputx1j

(C)

Strategy 2Throughput

x2j

(D)

Difference (B C)

X(1-2)j

= x1j- x2j

1 54.48 56.01 -1.53

2 57.36 54.08 3.28

3 54.81 52.14 2.674 56.20 53.49 2.71

5 54.83 55.49 -0.66

6 57.69 55.00 2.69

7 58.33 54.88 3.45

8 57.19 54.47 2.72

9 56.84 54.93 1.91

10 55.29 55.84 -0.55

mean 1.67

Standard Dev. 1.85


21/45

9-21


22/45

9-22

Comparing More than 2 Systems

Bonferroni approach

ANOVA

Factorial design and

optimization experiments


23/45

9-23


24/45

9-24


25/45

9-25


26/45

9-26


27/45

9-27

Factorial Design

Input variables are the factors

Output measures are the responses

Tests system response(s) when

multiple factors are being manipulated.


28/45

9-28

Two-level Full-factorial Design

Define a high and low level setting for each factor.

Try every combination of factor settings.

Run detailed studies for factors that have the

greatest impact.


29/45

9-29

Fractional-factorial Design

Strategically "screen out" factors that have little or no

impact on system performance. On remaining factors, run a full-factorial experiment.

Run detailed studies for factors that have the greatest

impact.


30/45

9-30

Variance Reduction

Common Random Numbers (CRN)

Evaluates each system under the exact same

circumstances.Helps ensure that observed differences of

system designs are due to the differences in

the designs and not to differences inexperimental conditions.

Antithetic Random Numbers (ARN)


31/45

9-31

CRN ContinuedTable 7.10. Comparison of two buffer allocation strategies usincommon random numbers.(A)

Repl icat ion

(B)

Strategy 1

Throughput

(C)

Strategy 2

Throughput

(D)

Differenc e (B -C)

1 79.05 75.09 3.96

2 54.96 51.09 3.87

3 51.23 49.09 2.14

4 88.74 88.01 0.73

5 56.43 53.34 3.09

6 70.42 67.54 2.88

7 35.71 34.87 0.84

8 58.12 54.24 3.88

9 57.77 55.03 2.74

10 45.08 42.55 2.53

XDifference= 2.67

sDifference= 1.16


32/45

More Detail on Comparison of Two

System Designs

Goal: compare two possible system configurations, e.g.:

two possible ordering policies in a supply-chain system,

two possible scheduling rules in a job shop.

Approach: the method of replications is used to analyze the

output data.

The mean performance measure for system iis denoted by

i (i = 1, 2) to obtain point and interval estimates for the

difference in mean performance, namely q1q2.

32


33/45

Comparison of Two System Designs

Vehicle-safety inspection example:

The station performs 3jobs: (1) brake check, (2) headlight check, and (3)steering check.

Vehicles arrival: Possion with rate = 9.5/hour.

Present system:

Three stalls in parallel (one attendant makes all 3 inspections at each stall).

Service times for the 3jobs: normally distributed with means 6.5, 6.0and 5.5minutes, respectively.

Alternative system:

Each attendant specializes in a single task, each vehicle will pass throughthree work stations in series

Mean service times for each job decreases by 10%(5.85, 5.4, and 4.95minutes).

Performance measure:mean response time per vehicle (total time fromvehicle arrival to its departure).

33


34/45


From replication rof system i, the simulation analyst obtains an estimate

Yirof the mean performance measure qi.

Assuming that the estimators Yirare (at least approximately) unbiased:

q1= E(Y1r), r = 1, , R1; q2= E(Y2r), r = 1, , R2

Goal: compute a confidence interval for 1 2to compare thetwo system designs

Confidence interval for 1 2: If c.i. is totally to the left of 0, strong evidence for the hypothesis that q1q2 < 0 (q1< q2).

If c.i. is totally to the right of 0, strong evidence for the hypothesis that q1q2 > 0 (q1> q2).

If c.i. contains 0, no strong statistical evidence that one system is better than the other

If enough additional data were collected (i.e., Riincreased), the c.i. would

most likely shift, and definitely shrink in length, until conclusion of 1< 2 or

1> 2would be drawn.34


35/45


A two-sided 100(1-)% c.i. for q1q2always takes the

form of:

To calculate the Standard Error, the analyst uses one of

two statistical techniques.

Both techniques assume that the basic data Yir are

approximately normally distributed.

We will discuss these two methods next.

).(. 2.1.,2/2.1. YYestYY

35

freedom.ofdegresstheisand

ns,replicatioalloversystemformeasureeperformancmeansampletheiswhere .

iYi


36/45


Statistically significant versus practically significant

Statistical significance: is the observed difference

larger than the variability in ?

Practical significance: is the true difference q1q2large

enough to matter for the decision we need to make?

Confidence intervals do not answer the question of

practical significance directly, instead, they bound the true

difference within a range.

2.1. YY

36

2.1. YY


37/45

Independent Sampling with Equal Variances

[Comparison of 2 systems]

Different and independent random number streams are

used to simulate the two systems

All observations of simulated system 1are statistically

independent of all the observations of simulated system 2.

The variance of the sample mean, , is:

For independent samples:

21,2

.. ,i

RR

YVYV

i

i

i

ii

iY

.

37

2

22

1

21

2.1.2.1.

RR

YVYVYYV


38/45

Independent Sampling with Equal Variances


If it is reasonable to assume that 21 22(approximately) or ifR1= R2,a two-sample-t confidence-interval approach can be used:

The point estimate of the mean performance difference is:

The sample variance for system i is:

The pooled estimate of 2is:

C.I. is given by:

Standard error:

).(. 2.1.,2/2.1. YYestYY

2.1. YY

38

ii R

r

iirii

R

r

irii

i YRYR

YYR

S

1

2.

2

1

2

.2

1

1

1

1

freedomofdegrees221re whe,2)1()1(

21

2

22

2

112 -RRRR

SRSRSp

21

2.1.

11..

RRSYYes p


39/45

Independent Sampling with Unequal Variances


If the assumption of equal variances cannot safely be made, anapproximate 100(1-)%C.I. for can be computed as:

With degrees of freedom:

Minimum number of replicationsR1> 7andR2> 7is recommended.

39

intergerantoround,

1//1//

//

2

2

2221

2

121

2

2221

21

RRSRRS

RSRS

2

22

1

21

2.1...R

S

R

SYYes


40/45


[Comparison of 2 systems] For each replication, the same random numbers are used to simulate

both systems.

For each replication r, the two estimates, Yr1and Yr2, are correlated.

However, independent streams of random numbers are used on different

replications, so the pairs (Yr1,Ys2) are mutually independent.

Purpose: induce positive correlation between (for each r) to

reduce variance in the point estimator of .

Variance of arising from CRN is less than that of independent

sampling (withR1=R2).

RRR

YYYVYVYYV

2112

2

2

2

1

2.1.2.1.2.1.

2

,cov2

.1 .2,Y Y

40

12is positive

2.1. YY

2.1. YY


41/45

Common Random Numbers (CRN) [Comparison of 2 systems]

The estimator based on CRN is more precise, leading to a

shorter confidence interval for the difference.

Sample variance of the differences :

Standard error:

1-Rfreedomofdegresswith,1

andwhere

1

1

1

1

1

21

1

22

1

22

R

r

rrrr

R

r

r

R

r

rD

DR

D-YYD

DRDR

DDR

S

2.1. YYD

41

R

SYYesDes D 2.1...).(.


42/45


[Comparison of 2 systems] It is never enough to simply use the same seed for the

random-number generator(s):

The random numbers must be synchronized: each

random number used in one model for some purposeshould be used for the same purpose in the other

model.

e.g., if the ithrandom number is used to generate a

service time at work station 2 for the 5tharrival in

model 1, the ithrandom number should be used for

the very same purpose in model 2.

42


43/45

Comparison of Several System Designs

To compare K alternative system designsbased on some specific

performance measure, qi, of system i, for i = 1, 2, , K.

Procedures are classified as:

Fixed-sample-size procedures:predetermined sample size is used to

draw inferences via hypothesis tests of confidence intervals.

Sequential sampling (multistage): more and more data are collected

until an estimator with a pre-specified precision is achieved or until one

of several alternative hypotheses is selected.

Some goals/approaches of system comparison:Estimation of each parameter q.

Comparison of each performance measure qi, to control q1.

All pairwise comparisons, qi- qj, for all i not equal to j

Selection of the best qi. 43


44/45

Bonferroni Approach[Multiple Comparisons]

To make statements about several parameters simultaneously,(where all statements are true simultaneously).

Bonferroni inequality:

The smaller jis, the wider thejthconfidence interval will be.

Major advantage: inequality holds whether models are run with

independent sampling or CRN

Major disadvantage: width of each individual interval increases

as the number of comparisons increases.

C

jEji

, ...,CiSP1

11)1true,arestatementsall(

44

Overall error probability, provides an upper bound on the

probability of a false conclusion


45/45

Bonferroni Approach [Multiple Comparisons]

Should be used only when a small number of comparisons are made. Practical upper limit: about 10 comparisons

3 possible applications:

Individual c.i.s: Construct a 100(1- j)%c.i. for parameter qi,

where # of comparisons = K.

Comparison to an existing system: Construct a 100(1- j)%

c.i. for parameter qi

- q1

(i = 2,3, K), where # of comparisons

=K1.

All pairwise: For any 2 different system designs, construct a

100(1- j)%c.i. for parameter qi- qj. Hence, total # of

comparing systems powerpoints.pdf

Documents