
Unit 5a: Comparisons via Simulation

Kwok Tsui (and Seonghee Kim)

School of Industrial and Systems Engineering, Georgia Institute of Technology


Motivation

• Simulations are typically run to compare 2 or more alternative system designs or scenarios.

• Simulations, like all models, provide better estimates of relative difference than of absolute performance because the same simplifications go into all the models.


Types of Comparisons

• Determining which scenarios have similar performance.

• Determining which scenarios are better than a standard or default.

• Determining which scenario is the “best.”

• Determining how a system’s performance changes as a function of controllable parameters, or optimizing over the parameters.


When are scenarios “different?”

• There is a distinction between statistical and practical difference.

• A practically meaningful difference depends on the problem at hand:
– 5 minutes in cycle time
– $10,000 on a portfolio’s return
– 100 people being unable to connect


Continued…

• Statistical significance depends on how much sampling variability there is in the point estimate:
– A 95% confidence interval for the difference in expected cycle time between model A and B is 4 ± 5 minutes. What can we conclude?
– What if it is 4 ± 1 minute?
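
One way to read the two intervals is to check whether the interval excludes zero (statistical significance) and, separately, whether the whole interval lies beyond a practically meaningful threshold. The sketch below is illustrative only: the function name `classify_difference` and the 2-minute threshold are assumptions, not part of the slides.

```python
def classify_difference(estimate, half_width, practical_threshold):
    """Classify a confidence interval estimate ± half_width for a difference."""
    lower, upper = estimate - half_width, estimate + half_width
    # Statistically different: zero lies outside the interval.
    statistically_different = lower > 0 or upper < 0
    # Practically different: the whole interval lies beyond the threshold.
    practically_different = (lower > practical_threshold
                             or upper < -practical_threshold)
    return statistically_different, practically_different


# The 2-minute threshold is a made-up value for illustration.
print(classify_difference(4, 5, practical_threshold=2))  # 4 ± 5 minutes
print(classify_difference(4, 1, practical_threshold=2))  # 4 ± 1 minute
```

With 4 ± 5 the interval contains zero, so the data cannot separate the two models; with 4 ± 1 it does not, and whether the difference matters then depends on the threshold chosen for the problem.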


Controlling Significance

• We use statistical procedures to tell us whether we can believe the difference we see in the results from two or more simulations.

• We use the number of replications to control the size of the differences that are detectable; that is, to control the error in our estimates.


Special Opportunities

• In simulation, more so than in other statistical experiments, we control the source of randomness.

• By using the same random numbers to drive the simulation of each scenario we achieve sharper comparisons. This is known as “correlated sampling” or “common random numbers” (CRN).


Intuition behind CRN

• We want each scenario to see the same source of randomness (demands for product, service times, failed machines, customer arrivals, etc.).

• CRN implies that differences in observed performance will be primarily due to differences in the scenarios, not differences in the random inputs.


Impact of CRN

The outputs are variable, but CRN makes it easy to see that “two loaders” has smaller response time.

[Figure: Example 12.2, Dump Truck. Average response time (axis from 0 to 30) plotted against replication (1 to 6) for Two Loaders and One Loader.]


Math behind CRN

$$\mathrm{Var}(Y_1 - Y_2) = \mathrm{Var}(Y_1) + \mathrm{Var}(Y_2) - 2\,\mathrm{Cov}(Y_1, Y_2)$$

If scenarios are simulated independently (different random numbers), then Cov = 0. But if we use CRN then Cov > 0 (usually), reducing the variance of the difference.
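
The effect of the covariance term can be seen in a toy experiment. The sketch below is not a real simulation model: `replicate` simply averages exponential service times, and the means 5.0 and 4.0, the seeds, and the function names are all made up for illustration. It compares the sample variance of the difference when both scenarios share random numbers against the independent case.

```python
import math
import random
import statistics


def replicate(mean_service, rng, n_customers=50):
    """Toy 'simulation': average of n exponential service times."""
    return statistics.fmean(
        -mean_service * math.log(1.0 - rng.random())
        for _ in range(n_customers)
    )


def variance_of_difference(use_crn, reps=200):
    """Sample variance of Y1 - Y2 across replications."""
    diffs = []
    for r in range(reps):
        rng1 = random.Random(r)
        # CRN: scenario 2 reuses the seed of scenario 1 on each replication.
        rng2 = random.Random(r if use_crn else 10_000 + r)
        diffs.append(replicate(5.0, rng1) - replicate(4.0, rng2))
    return statistics.variance(diffs)


var_independent = variance_of_difference(use_crn=False)
var_crn = variance_of_difference(use_crn=True)
print(var_independent, var_crn)
```

In this toy case the two outputs are perfectly correlated under CRN, so the reduction is far larger than one would expect from a realistic model.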


CRN Happens

• Note that CRN is, essentially, the default experiment design unless we explicitly do something to cause each simulation to use different random numbers.

• However, there are things we can do to make the effect of CRN stronger.


Making CRN Work

• The effect of CRN is enhanced if the same random number is used for the same purpose in each simulated scenario.

• The primary way to make this happen is to assign a distinct random number stream to each distinct input process (interarrival times, service times, etc.).


What are Streams?

• Remember that pseudorandom numbers are provided by a generator with a (very) long period.

• Streams are just different starting places (very far apart) within this long sequence.

• Arena has many streams (1.8 × 10^19).
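
The idea of streams as far-apart starting places can be sketched with a small generator that supports jumping ahead. The code below uses the classic Lehmer "minimal standard" generator (multiplier 16807, modulus 2^31 − 1) purely as a stand-in; it is not Arena's generator, its period is far too short for real work, and the spacing of 100,000 positions is an arbitrary choice for illustration.

```python
M = 2**31 - 1   # modulus of the Lehmer "minimal standard" generator
A = 16807       # its multiplier


def step(x):
    """Advance the generator by one position."""
    return (A * x) % M


def jump(x, k):
    """Advance by k positions at once: x_{n+k} = A^k * x_n mod M."""
    return (pow(A, k, M) * x) % M


def stream_start(seed, stream, spacing=100_000):
    """State at the start of a stream; streams sit `spacing` positions apart."""
    return jump(seed, stream * spacing)


# Jumping 1000 positions agrees with stepping one number at a time.
x = 12345
for _ in range(1000):
    x = step(x)
assert x == jump(12345, 1000)
```

Because the jump is computed by modular exponentiation, the start of any stream can be found without generating the numbers in between.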


Making CRN Work Better

• Use the same stream for an input process even if the distribution changes (the last argument, 9, is the stream):
– Model A service time: Expo(7.1, 9)
– Model B service time: Tria(2, 6, 12, 9)

• If entities get any randomly assigned attributes, then assign them all at once when the entity is created.
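
A minimal sketch of the first point, assuming both distributions are sampled by inverse transform so that each service time consumes exactly one uniform number. The helper names `expo` and `tria` and the use of seed 9 to stand in for stream 9 are assumptions for illustration; this is not how Arena implements its functions.

```python
import math
import random


def expo(mean, u):
    """Exponential variate by inverse transform from one uniform u in [0, 1)."""
    return -mean * math.log(1.0 - u)


def tria(low, mode, high, u):
    """Triangular variate by inverse transform from one uniform u in [0, 1)."""
    cut = (mode - low) / (high - low)
    if u < cut:
        return low + math.sqrt(u * (high - low) * (mode - low))
    return high - math.sqrt((1.0 - u) * (high - low) * (high - mode))


# One dedicated stream for service times, shared by both models.
SERVICE_SEED = 9  # hypothetical seed standing in for stream 9
stream_a = random.Random(SERVICE_SEED)
stream_b = random.Random(SERVICE_SEED)

service_a = [expo(7.1, stream_a.random()) for _ in range(5)]
service_b = [tria(2, 6, 12, stream_b.random()) for _ in range(5)]
```

Both transforms increase with `u`, so a customer who draws a long service time in Model A also draws a long one in Model B, which is what produces the positive correlation.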


Making it Work EVEN Better

• We want Models A and B to use the same random numbers for the same purpose on each replication of Model A and Model B (as much as possible).

• This is difficult because two models may consume different numbers of random numbers on each replication.


“Burning” Random Numbers

Model   Rep 1            Rep 2                Rep 3 …
A       R1, …, R12593    burn, R100001, …     burn, R200001, …
B       R1, …, R12471    burn, R100001, …     burn, R200001, …

We can skip random numbers at the end of each replication to synchronize them.

Arena does it automatically.
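
The same synchronization can be approximated without skipping numbers one by one. The sketch below swaps in a different technique: it re-seeds a generator from the pair (stream, replication), so every replication starts at a fixed place no matter how many numbers the previous one consumed. The function names and the seed formula are hypothetical, and this is not a description of what Arena does internally.

```python
import random


def replication_stream(stream, rep):
    """Generator positioned at a fixed start for this (stream, replication).

    Re-seeding per replication stands in for burning leftover numbers.
    """
    return random.Random(stream * 1_000_003 + rep)


def run_model(numbers_used, stream, reps):
    """Record the first random number each replication sees."""
    first_numbers = []
    for rep in range(1, reps + 1):
        rng = replication_stream(stream, rep)
        draws = [rng.random() for _ in range(numbers_used)]
        first_numbers.append(draws[0])
    return first_numbers


# Consumption counts taken from the table: 12593 for Model A, 12471 for Model B.
first_a = run_model(12593, stream=1, reps=3)
first_b = run_model(12471, stream=1, reps=3)
assert first_a == first_b  # every replication starts in step
```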


Comparing Means

• A standard comparison of scenarios is via differences in their mean performance.

• A common way to compare means is to look for overlapping confidence intervals for each mean.


Box & Whisker Chart

Box shows 95% c.i. for the mean

These intervals overlap

Whiskers show max and min observations


Problems with Overlapping C.I.s

• If each individual interval has 95% confidence, then the overall confidence for all intervals simultaneously is < 95%.

• If the intervals don’t overlap then the scenarios are different, but they may be different even when the intervals do overlap.

• This approach does not exploit CRN.


Better Methods

• We will start with the case of K=2 scenarios, numbered 1 and 2.

Scenario   Outputs from R Reps       Statistics
1          Y11, Y21, Y31, …, YR1     $\bar{Y}_1,\ S_1^2$
2          Y12, Y22, Y32, …, YR2     $\bar{Y}_2,\ S_2^2$
1 – 2      D1, D2, D3, …, DR         $\bar{D},\ S_D^2$


Paired-t Interval

• Interval for difference in means θ1 - θ2

• Allows unequal variances, and exploits CRN.

• Assumes normally distributed data

$$\bar{D} \pm t_{\alpha/2,\,R-1} \sqrt{\frac{S_D^2}{R}}$$
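
A minimal sketch of the paired-t computation, assuming the critical value is looked up from a t table by the caller. The function name `paired_t_interval` and the data are made up for illustration.

```python
import math
import statistics


def paired_t_interval(y1, y2, t_crit):
    """Return (mean difference, half-width) of the paired-t interval.

    t_crit is t_{alpha/2, R-1}, supplied by the caller from a t table.
    """
    diffs = [a - b for a, b in zip(y1, y2)]
    r = len(diffs)
    d_bar = statistics.fmean(diffs)
    s2_d = statistics.variance(diffs)  # sample variance of the differences
    return d_bar, t_crit * math.sqrt(s2_d / r)


# Made-up outputs from R = 5 replications run with CRN.
y1 = [10.2, 12.1, 9.8, 11.5, 10.9]
y2 = [9.1, 11.3, 8.9, 10.2, 10.1]
d_bar, half_width = paired_t_interval(y1, y2, t_crit=2.776)  # t_{0.025, 4}
print(d_bar, half_width)
```

Only the differences enter the variance estimate, which is why positive correlation between the two scenarios narrows the interval.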


Two-Sample t Interval

• Assumes equal variances, no CRN.

• Assumes normally distributed data.

• Has double the degrees of freedom of the paired-t.

$$\bar{Y}_1 - \bar{Y}_2 \pm t_{\alpha/2,\,2(R-1)} \sqrt{\frac{S_1^2 + S_2^2}{R}}$$
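
For comparison, a sketch of the two-sample computation with the same number of replications R in each scenario. The function name `two_sample_t_interval` and the data are assumptions for illustration; the critical value is again supplied by the caller.

```python
import math
import statistics


def two_sample_t_interval(y1, y2, t_crit):
    """Return (difference of means, half-width) for equal sample sizes.

    t_crit is t_{alpha/2, 2(R-1)}, supplied by the caller from a t table.
    """
    r = len(y1)
    if len(y2) != r:
        raise ValueError("this sketch assumes the same R for both scenarios")
    diff = statistics.fmean(y1) - statistics.fmean(y2)
    s2_1 = statistics.variance(y1)
    s2_2 = statistics.variance(y2)
    return diff, t_crit * math.sqrt((s2_1 + s2_2) / r)


# Made-up outputs from R = 5 replications of each scenario.
y1 = [10.2, 12.1, 9.8, 11.5, 10.9]
y2 = [9.1, 11.3, 8.9, 10.2, 10.1]
diff, half_width = two_sample_t_interval(y1, y2, t_crit=2.306)  # t_{0.025, 8}
print(diff, half_width)
```

Here each scenario's variance is estimated separately, so any correlation between the scenarios is ignored.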


Comparison

• We typically prefer paired t because we have no reason to believe variances will be equal.

• Provided the number of reps is 10 or more, even a little bit of positive correlation from CRN will overcome the loss of degrees of freedom.


Practical Significance

• When we construct confidence intervals for θ1 - θ2 we want to be able to detect differences that matter.

• If we want to detect differences of more than ±ε, then after R0 initial replications we set…

$$R \ge \left( \frac{t_{\alpha/2,\,R_0-1}\, S_D}{\varepsilon} \right)^2$$


Example 12.1

• From 10 reps we get an estimate of the difference in response time between two configurations for vehicle inspection of 0.4 ± 0.9 minutes with 95% confidence.

• Suppose a difference of ± 0.5 minutes matters.


Example 12.1 continued

$$R \ge \left( \frac{t_{0.05/2,\,(10-1)}\, S_D}{\varepsilon} \right)^2 = \frac{(2.26)^2 (1.7)}{(0.5)^2} \approx 34.7, \quad \text{so } R = 35 \text{ reps}$$
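
The arithmetic of this example can be checked with a short sketch. The function name `replications_needed` is an assumption; the inputs 2.26, 1.7, and 0.5 are the values that appear in the example.

```python
import math


def replications_needed(t_crit, s2_d, epsilon):
    """Smallest integer R with R >= (t_crit * S_D / epsilon)^2."""
    return math.ceil(t_crit**2 * s2_d / epsilon**2)


# t_{0.025, 9} = 2.26, S_D^2 = 1.7, epsilon = 0.5 minutes.
print(replications_needed(t_crit=2.26, s2_d=1.7, epsilon=0.5))  # 35
```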


Alternative Approach

• When $S_D^2$ is not available use…

$$R \ge \frac{t_{\alpha/2,\,2(R_0-1)}^2 \,(S_1^2 + S_2^2)}{\varepsilon^2}$$


Comparing More than Two

• When we compare more than two scenarios, looking at overlapping confidence intervals is even less appropriate.

• And looking at all differences θi - θj is not the most efficient way to compare scenarios when our goal is to identify the “best.”


Approaches for K > 2

• Form simultaneous confidence intervals for all differences. In this case we need to adjust for multiplicity.

• Identify a subset that contains the best; this is called subset selection.

• Run a multi-stage procedure specifically designed to find the best; this is called ranking (the book gives one procedure).


Simultaneous C.I.s

• Remember that if the confidence level is 1-α, then the chance of making an error is no more than α.

• The Bonferroni inequality says that if we form C intervals, each at level 1 - α, then

$$\Pr\{\text{all intervals cover}\} \ge 1 - C\alpha$$


Example

• Suppose we have K=4 scenarios, and we want to estimate θi - θj for all C = K(K-1)/2 = 6 pairs of means with overall confidence level of 95%.

• Then we should form each confidence interval at the 1 – 0.05/6 ≈ 0.992 level of confidence. Notice that this makes all intervals much wider.
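
The adjustment in this example can be written as a small helper. The function name `bonferroni_level` is an assumption for illustration.

```python
def bonferroni_level(k, overall_alpha):
    """Return (number of pairs C, per-interval confidence level)."""
    pairs = k * (k - 1) // 2
    return pairs, 1.0 - overall_alpha / pairs


pairs, level = bonferroni_level(k=4, overall_alpha=0.05)
print(pairs, level)
```

The per-interval level climbs quickly with K because the number of pairs grows quadratically, which is one reason all-pairs comparison becomes costly for many scenarios.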


Subset Selection Approach

• A subset selection procedure guarantees, with a given confidence level, to find a set of systems that contains the best; each system in the set “may be the best.”

• One way to find the best is to keep increasing R until the subset only contains one scenario.


Identify the Best in PAN

Check box causes PAN to identify all scenarios that might be the best.

The error tolerance is how far you are willing to be off from including the true best.


Graphical Identification of Best


Error Tolerance

• The procedure guarantees, with 95% confidence, to provide a subset of scenarios that contains the best when Tolerance = 0.

• When Tolerance > 0, the subset will contain the best, or a scenario within Tolerance of the best, with 95% confidence.


In this case an error tolerance of 0.05 (5% utilization) causes one scenario to be identified as best. We are guaranteed (with high confidence) that this is the true best, or within 0.05 of it.

With the same data, an error tolerance of 0 causes 4 scenarios to be placed in the group that contains the best. Less risk, but less conclusive.


Intuition

• Compute the sample mean from each scenario.

• Keep the scenario with the best (largest or smallest) sample mean.

• Keep the other scenarios whose sample means are not too far from the best based on a type of confidence interval for the difference.
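
The three steps above can be sketched as a screening rule. This is only an illustration of the intuition, not the procedure PAN or the book uses: the function name `select_subset`, the paired-difference width, and the way the tolerance shrinks that width are all assumptions, and the critical value `crit` that delivers a given overall confidence is left to the caller.

```python
import math
import statistics


def select_subset(outputs, crit, tolerance=0.0, larger_is_better=True):
    """Return indices of scenarios that might be the best.

    outputs[i] holds scenario i's results, one per replication (same R for all).
    crit is a critical value chosen by the caller for the desired confidence.
    """
    sign = 1.0 if larger_is_better else -1.0
    means = [sign * statistics.fmean(y) for y in outputs]
    r = len(outputs[0])
    keep = []
    for i, yi in enumerate(outputs):
        survives = True
        for j, yj in enumerate(outputs):
            if i == j:
                continue
            diffs = [a - b for a, b in zip(yi, yj)]
            width = crit * math.sqrt(statistics.variance(diffs) / r)
            allowance = max(0.0, width - tolerance)
            # Drop scenario i if it is clearly worse than scenario j.
            if means[i] < means[j] - allowance:
                survives = False
                break
        if survives:
            keep.append(i)
    return keep


# Made-up data: three scenarios, five replications each.
outputs = [
    [10, 11, 9, 10, 12],
    [11, 10, 10, 9, 12.5],
    [5, 6, 4, 5, 7],
]
print(select_subset(outputs, crit=2.776))
```

The scenario with the best sample mean always survives, and a larger tolerance shrinks the allowance, so the subset can only get smaller as the tolerance grows.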


Controlling Error

• If our goal is to find the best, then we can increase the number of replications until the subset has only one scenario.

• There is no direct way to tell how many replications will be needed, but don’t add fewer than 10 replications at a time.

• The book contains a two-stage procedure that guarantees selecting a single scenario.