TRANSCRIPT
Unit 5a: Comparisons via Simulation
Kwok Tsui (and Seonghee Kim)
School of Industrial and Systems Engineering, Georgia Institute of Technology
Motivation
• Simulations are typically run to compare 2 or more alternative system designs or scenarios.
• Simulations, as all models, provide better estimates of relative difference than they do absolute performance because the same simplifications go into all the models.
Types of Comparisons
• Determining which scenarios have similar performance.
• Determining which scenarios are better than a standard or default.
• Determining which scenario is the “best.”
• Determining how a system’s performance changes as a function of controllable parameters, or optimizing over the parameters.
When are scenarios “different?”
• There is a distinction between statistical and practical difference.
• A practically meaningful difference depends on the problem at hand:
– 5 minutes in cycle time
– $10,000 on a portfolio’s return
– 100 people being unable to connect
Continued…
• Statistical significance depends on how much sampling variability there is in the point estimate:
– A 95% confidence interval for the difference in expected cycle time between model A and B is 4 ± 5 minutes. What can we conclude?
– What if it is 4 ± 1 minute?
Controlling Significance
• We use statistical procedures to tell us whether we can believe the difference we see in the results from two or more simulations.
• We use the number of replications to control the size of the differences that are detectable; that is, to control the error in our estimates.
Special Opportunities
• In simulation, more so than in other statistical experiments, we control the source of randomness.
• By using the same random numbers to drive the simulation of each scenario we achieve sharper comparisons. This is known as “correlated sampling” or “common random numbers” (CRN).
Intuition behind CRN
• We want each scenario to see the same source of randomness (demands for product, service times, failed machines, customer arrivals, etc.).
• CRN implies that differences in observed performance will be primarily due to differences in the scenarios, not differences in the random inputs.
Impact of CRN
The outputs are variable, but CRN makes it easy to see that “two loaders” has smaller response time.
Example 12.2 Dump Truck
[Chart: average response time (y-axis, 0–30) by replication (x-axis, 1–6) for “Two Loaders” vs. “One Loader”; the two-loader line lies consistently below the one-loader line.]
Math behind CRN
Var(Y1 − Y2) = Var(Y1) + Var(Y2) − 2 Cov(Y1, Y2)
If scenarios are simulated independently (different random numbers), then Cov = 0. But if we use CRN then Cov > 0 (usually), reducing the variance of the difference.
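The variance identity can be checked numerically. Below is an illustrative toy model (not from the lecture, and all numbers are assumptions): each “replication” averages 100 exponential service times, with mean 1.0 in scenario 1 and 0.9 in scenario 2, driven once by shared uniforms (CRN) and once by independent ones.

```python
import numpy as np

def response(mean_service, u):
    # Inverse-transform sampling: -m*log(1-U) is Expo(m), so feeding the
    # same uniforms U to both scenarios makes them see "the same day".
    return np.mean(-mean_service * np.log1p(-u))

rng = np.random.default_rng(42)
d_crn, d_indep = [], []
for _ in range(2000):
    u = rng.random(100)                        # shared random numbers
    d_crn.append(response(1.0, u) - response(0.9, u))
    u1, u2 = rng.random(100), rng.random(100)  # independent random numbers
    d_indep.append(response(1.0, u1) - response(0.9, u2))

var_crn, var_indep = np.var(d_crn), np.var(d_indep)
print(var_crn < var_indep)  # True: CRN shrinks Var(Y1 - Y2)
```

Both estimators have the same expected difference (0.1); only the variance of the difference changes.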
CRN Happens
• Note that CRN is, essentially, the default experiment design unless we explicitly do something to cause each simulation to use different random numbers.
• However, there are things we can do to make the effect of CRN stronger.
Making CRN Work
• The effect of CRN is enhanced if the same random number is used for the same purpose in each simulated scenario.
• The primary way to make this happen is to assign a distinct random number stream to each distinct input process (interarrival times, service times, etc.)
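A sketch of dedicated streams outside Arena (an assumed setup, not Arena’s mechanism): numpy generators with fixed seeds stand in for numbered streams, one per input process, so both scenarios see identical interarrival times even though their service-time distributions differ.

```python
import numpy as np

def simulate(service_mean, n=5):
    arrivals = np.random.default_rng(seed=1)   # "stream 1": interarrival times
    services = np.random.default_rng(seed=2)   # "stream 2": service times
    inter = arrivals.exponential(4.0, n)
    serv = services.exponential(service_mean, n)
    return inter, serv

inter_a, serv_a = simulate(7.1)   # Model A
inter_b, serv_b = simulate(9.0)   # Model B
print(np.allclose(inter_a, inter_b))  # True: same arrivals in both models
```

If a single generator served both processes instead, changing the service-time distribution would shift which numbers the arrival process consumes, destroying the synchronization.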
What are Streams?
• Remember that pseudorandom numbers are provided by a generator with a (very) long period.
• Streams are just different starting places (very far apart) within this long sequence.
• Arena has many streams (about 1.8 × 10^19 of them)
Making CRN Work Better
• Use the same stream for an input process even if the distribution changes.
– Model A service time: EXPO(7.1, 9)
– Model B service time: TRIA(2, 6, 12, 9)
(the final argument, 9, is the stream number)
• If entities get any randomly assigned attributes, then assign them all at once when the entity is created.
Making it Work EVEN Better
• We want Models A and B to use the same random numbers for the same purpose on each replication of Model A and Model B (as much as possible).
• This is difficult because two models may consume different numbers of random numbers on each replication.
“Burning” Random Numbers
Model   Rep 1                 Rep 2                Rep 3 …
A       R1, …, R12593, burn   R100001, …, burn     R200001, …
B       R1, …, R12471, burn   R100001, …, burn     R200001, …

We can skip (“burn”) random numbers at the end of each replication so that both models start every replication at the same point in the sequence.
Arena does it automatically
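One way to get this behavior outside Arena (an assumed mechanism, not Arena’s implementation): jump the generator to a fresh, widely spaced starting point for each replication, so every model begins replication k at the same place no matter how many numbers the previous replication consumed.

```python
import numpy as np

def run_reps(service_mean, n_reps=3, n_draws=4):
    base = np.random.PCG64(seed=7)
    outputs = []
    for k in range(n_reps):
        # jumped(k) returns a generator advanced by k large, fixed jumps:
        # replication k's private block of the sequence, identical across models.
        rng = np.random.Generator(base.jumped(k))
        outputs.append(rng.exponential(service_mean, n_draws))
    return outputs

a = run_reps(1.0)   # Model A: mean service time 1.0
b = run_reps(2.0)   # Model B: mean service time 2.0
print(np.allclose(np.array(b) / np.array(a), 2.0))  # True: same numbers, same purpose
```

Because every replication starts at its own block boundary, Model B’s draws are exact scaled copies of Model A’s, replication by replication.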
Comparing Means
• A standard comparison of scenarios is via differences in their mean performance.
• A common way to compare means is to look for overlapping confidence intervals for each mean.
Box & Whisker Chart
[Chart: box-and-whisker plot of several scenarios. Each box shows the 95% c.i. for the mean, the whiskers show the max and min observations, and the boxes’ intervals overlap.]
Problems with Overlapping C.I.s
• If each individual interval has 95% confidence, then the overall confidence for all intervals simultaneously is < 95%.
• If the intervals don’t overlap then the scenarios are different, but they may be different even when the intervals do overlap.
• This approach does not exploit CRN.
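The second pitfall can be demonstrated with a small sketch (made-up paired response times, as CRN would produce; 2.262 is t_{0.025, 9} from a t table): the two individual 95% intervals overlap, yet the interval on the paired differences clearly excludes zero.

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical per-replication response times (minutes), 10 paired reps.
y1 = [22.1, 25.3, 19.8, 24.0, 21.5, 23.2, 20.9, 24.8, 22.7, 21.1]
y2 = [20.4, 23.9, 18.2, 22.5, 20.1, 21.8, 19.5, 23.0, 21.2, 19.8]
r, t = 10, 2.262                      # t_{0.025, 9}

def ci(data):
    """95% confidence interval for the mean of data."""
    hw = t * stdev(data) / sqrt(r)
    return mean(data) - hw, mean(data) + hw

lo1, hi1 = ci(y1)
lo2, hi2 = ci(y2)
lo_d, hi_d = ci([a - b for a, b in zip(y1, y2)])

print(lo2 < hi1 and lo1 < hi2)   # individual intervals overlap: True
print(lo_d > 0)                  # yet the paired difference excludes 0: True
```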
Better Methods
• We will start with the case of K=2 scenarios, numbered 1 and 2.
Scenario   Outputs from R Reps       Statistics
1          Y11, Y21, Y31, …, YR1     Ȳ1, S1²
2          Y12, Y22, Y32, …, YR2     Ȳ2, S2²
1 − 2      D1, D2, D3, …, DR         D̄, S_D²
Paired-t Interval
• Interval for difference in means θ1 - θ2
• Allows unequal variances, and exploits CRN.
• Assumes normally distributed data

D̄ ± t_{α/2, R−1} √(S_D² / R)
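A paired-t interval is a short computation. The sketch below uses only the standard library; the response-time data are made up, and 2.262 is t_{0.025, 9} from a t table.

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical paired outputs (the same random numbers drove both scenarios).
y1 = [12.4, 10.9, 11.8, 13.0, 12.1, 11.5, 12.8, 11.2, 12.6, 11.9]
y2 = [11.6, 10.2, 11.1, 12.1, 11.5, 10.8, 12.0, 10.7, 11.8, 11.3]
d = [a - b for a, b in zip(y1, y2)]   # within-replication differences

r = len(d)
d_bar, s_d = mean(d), stdev(d)
t = 2.262                             # t_{alpha/2, R-1} = t_{0.025, 9}
half_width = t * sqrt(s_d**2 / r)
print(f"{d_bar:.2f} +/- {half_width:.2f}")
```

Because the pairing removes most of the common variability, the half-width here is a small fraction of the difference itself.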
Two-Sample t Interval
• Assumes equal variances, no CRN.
• Assumes normally distributed data.
• Has double the degrees of freedom of the paired-t.

Ȳ1 − Ȳ2 ± t_{α/2, 2(R−1)} √((S1² + S2²) / R)
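For contrast, the two-sample interval ignores any pairing. Computed on the same kind of made-up data (here treated as if the runs were independently seeded), with 2.101 = t_{0.025, 2(10−1)=18} from a t table, the interval comes out far wider than a paired interval on correlated data would.

```python
from statistics import mean, variance
from math import sqrt

# Hypothetical outputs, treated as coming from independently seeded runs.
y1 = [12.4, 10.9, 11.8, 13.0, 12.1, 11.5, 12.8, 11.2, 12.6, 11.9]
y2 = [11.6, 10.2, 11.1, 12.1, 11.5, 10.8, 12.0, 10.7, 11.8, 11.3]
r = len(y1)

t = 2.101                                   # t_{alpha/2, 2(R-1)} = t_{0.025, 18}
half_width = t * sqrt((variance(y1) + variance(y2)) / r)
print(f"{mean(y1) - mean(y2):.2f} +/- {half_width:.2f}")
```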
Comparison
• We typically prefer paired t because we have no reason to believe variances will be equal.
• Provided the number of reps is 10 or more, even a little bit of positive correlation from CRN will overcome the loss of degrees of freedom.
Practical Significance
• When we construct confidence intervals for θ1 - θ2 we want to be able to detect differences that matter.
• If we want to detect differences of more than ±ε, then after R0 initial replications we set

R ≥ (t_{α/2, R0−1} S_D / ε)²
Example 12.1
• From 10 reps we get an estimate of the difference in response time between two configurations for vehicle inspection of 0.4 ± 0.9 minutes with 95% confidence.
• Suppose a difference of ± 0.5 minutes matters.
Example 12.1 continued
R ≥ (t_{0.05/2, 10−1} S_D / ε)² = (2.26)² (1.7) / (0.5)² ≈ 34.7, so 35 reps
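The same arithmetic as a sketch, with the example’s values (S_D² = 1.7 from the R0 = 10 initial reps, t_{0.025, 9} ≈ 2.26, ε = 0.5 minutes); ceil rounds up to a whole number of replications.

```python
from math import ceil, sqrt

t = 2.26        # t_{0.05/2, 10-1} from a t table
s_d_sq = 1.7    # sample variance of the paired differences, R0 = 10 reps
eps = 0.5       # difference (in minutes) that matters

r = ceil((t * sqrt(s_d_sq) / eps) ** 2)
print(r)  # 35
```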
Alternative Approach
• When S_D² is not available, use

R ≥ t²_{α/2, 2(R0−1)} (S1² + S2²) / ε²
Comparing More than Two
• When we compare more than two scenarios, looking at overlapping confidence intervals is even less appropriate.
• And looking at all differences θi - θj is not the most efficient way to compare scenarios when our goal is to identify the “best.”
Approaches for K > 2
• Form simultaneous confidence intervals for all differences. In this case we need to adjust for multiplicity.
• Identify a subset that contains the best; this is called subset selection.
• Run a multi-stage procedure specifically designed to find the best; this is called ranking (the book gives one procedure).
Simultaneous C.I.s
• Remember that if the confidence level is 1-α, then the chance of making an error is no more than α.
• The Bonferroni inequality says that if we form C intervals, each at level 1- α, then
Pr{all intervals cover} ≥ 1 − Cα
Example
• Suppose we have K=4 scenarios, and we want to estimate θi - θj for all C = K(K-1)/2 = 6 pairs of means with overall confidence level of 95%.
• Then we should form each confidence interval at the 1 – 0.05/6 = 0.99 level of confidence. Notice that this makes all intervals much wider.
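A quick sketch of how much wider the Bonferroni-adjusted intervals get, using standard-library normal quantiles in place of t quantiles just to show the effect.

```python
from statistics import NormalDist

K = 4
C = K * (K - 1) // 2                 # 6 pairwise differences
alpha = 0.05

z_single = NormalDist().inv_cdf(1 - alpha / 2)       # one interval at 95%
z_bonf = NormalDist().inv_cdf(1 - alpha / (2 * C))   # each at 1 - 0.05/6
print(C, round(z_bonf / z_single, 2))                # each interval ~35% wider
```

With t quantiles and small R the widening is even larger, since the adjusted tail probability sits further out in the t distribution’s heavier tails.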
Subset Selection Approach
• A subset selection procedure guarantees, with a given confidence level, to return a subset of scenarios that contains the true best (the scenarios in it are those that “may be the best”).
• One way to find the best is to keep increasing R until the subset only contains one scenario.
Identify the Best in PAN
• A check box causes PAN (Arena’s Process Analyzer) to identify all scenarios that might be the best.
• The error tolerance is how far you are willing to be off from including the true best.
Graphical Identification of Best
Error Tolerance
• The procedure guarantees, with 95% confidence, to provide a subset of scenarios that contains the best when Tolerance = 0.
• When Tolerance > 0, the subset will contain the best, or a scenario within Tolerance of the best, with 95% confidence.
In this case an error tolerance of 0.05 (5% utilization) causes one scenario to be identified as best. We are guaranteed (with high confidence) that this is the true best, or within 0.05 of it.
With the same data, an error tolerance of 0 causes 4 scenarios to be placed in the group that contains the best. Less risk, but less conclusive.
Intuition
• Compute the sample mean from each scenario.
• Keep the scenario with the best (largest or smallest) sample mean.
• Keep the other scenarios whose sample means are not too far from the best based on a type of confidence interval for the difference.
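The bullets above can be sketched as a simplified subset rule (illustrative only; the outputs are made up, 2.306 is t_{0.025, 2(5−1)=8} from a t table, and the book’s actual procedure differs in detail): keep the scenario with the best sample mean, plus any scenario whose mean lies within a difference-interval half-width of it.

```python
from statistics import mean, stdev
from math import sqrt

outputs = {                      # hypothetical replication outputs (smaller is better)
    "one loader": [25.0, 27.1, 24.3, 26.5, 25.8],
    "two loaders": [18.2, 19.5, 17.8, 19.0, 18.6],
    "three loaders": [17.9, 19.2, 18.4, 18.8, 18.1],
}

r = 5
t = 2.306                        # t_{0.025, 2(5-1)=8}
means = {k: mean(v) for k, v in outputs.items()}
best = min(means, key=means.get)

subset = []
for k, v in outputs.items():
    # Half-width of a two-sample-style interval for (this scenario - best).
    hw = t * sqrt((stdev(v) ** 2 + stdev(outputs[best]) ** 2) / r)
    if means[k] - means[best] <= hw:
        subset.append(k)
print(subset)
```

Here the clearly inferior scenario is screened out, while the two statistically indistinguishable ones stay in the subset; more replications would shrink the half-widths until only one remains.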
Controlling Error
• If our goal is to find the best, then we can increase the number of replications until the subset has only one scenario.
• There is no direct way to tell how many replications will be needed, but don’t add fewer than 10 replications at a time.
• The book contains a two-stage procedure that guarantees selecting a single scenario.