comparing systems powerpoints.pdf
TRANSCRIPT
-
7/25/2019 Comparing Systems Powerpoints.pdf
1/45
"The method that proceeds without analysis is like thegroping of a blind man. Socrates
-
7/25/2019 Comparing Systems Powerpoints.pdf
2/45
9-2
Questions Addressed
What are the issues involved in comparing
two or more system configurations?
What is hypothesis testing?
How do common random numbers work?
-
7/25/2019 Comparing Systems Powerpoints.pdf
3/45
Purpose
Purpose: comparison of alternative system designs.
Approach: discuss a few of many statistical
methods that can be used to compare two or more
system designs.
Statistical analysis is needed to discover whether observed
differences are due to:
Differences in design, or
The random fluctuation inherent in the models.
3
-
7/25/2019 Comparing Systems Powerpoints.pdf
4/45
Outline
For two-system comparisons:
Independent sampling.
Correlated sampling (common random numbers).
For multiple system comparisons:
Bonferroni approach: confidence-intervalestimation, screening, and selecting the best.
4
-
7/25/2019 Comparing Systems Powerpoints.pdf
5/45
9-5
Comparing Two Systems
Challenge is to determine if differences are
attributable to actual differences inperformance and not to statistical variation.
Running multiple replications, or batches,
is required.
-
7/25/2019 Comparing Systems Powerpoints.pdf
6/45
9-6
Hypothesis Testing
Used when outcomes are close or high
precision is required.
A hypothesis is first formulated (e.g. thatmethods A and B both result in the same throughput)
and then a test is made to see whether the
results of the simulation lead us to rejectthe hypothesis.
-
7/25/2019 Comparing Systems Powerpoints.pdf
7/459-7
What does it mean if you
fail to reject a hypothesis?
1. Our hypothesis is correct
OR
2. The variance in the observed outcomesare too high given the number ofreplications to be conclusive (run morereplications or use variance reduction techniques).
-
7/25/2019 Comparing Systems Powerpoints.pdf
8/459-8
Types of Errors
Type I error: Reject True Hypothesis
Type II error: Accept False Hypothesis
True False
Accept
Reject III
-
7/25/2019 Comparing Systems Powerpoints.pdf
9/459-9
Possible Confidence Intervals
(A)[------|------]
U1-U2=0
[------|------]
[------|------](C)
(B)
[------|------]denotes confidence Interval
Fail to
Reject H0
Reject H0
Reject H0
-
7/25/2019 Comparing Systems Powerpoints.pdf
10/459-10
Comparison of two Buffer Strategies
Assume we have two candidate strategies believed to
maximize the throughput of the system.
We offer two methods based on the Confidence Interval
approach.
We seek to discover if the mean throughput of the systemdue to Strategy 1 and Strategy 2 are significantly different.
We begin by estimating the mean performance of the two
proposed strategies by simulating each strategy for 16 days
past the warm-up period.
The simulation experiment was replicated 10 times for
each strategy.
The throughput achieved by each strategy is shown next.
-
7/25/2019 Comparing Systems Powerpoints.pdf
11/459-11
Comparison of two Buffer Strategies
(A)
Replication
(B)
Strategy 1
Throughput
(C)
Strategy 2
Throughput
1 54.48 56.01
2 57.36 54.08
3 54.81 52.14
4 56.20 53.49
5 54.83 55.49
6 57.69 55.00
7 58.33 54.88
8 57.19 54.47
9 56.84 54.93
10 55.29 55.84
mean 56.30 54.63
Standard Deviation 1.37 1.17
Variance 1.89 1.36
-
7/25/2019 Comparing Systems Powerpoints.pdf
12/459-12
Welch Confidence Interval
Requires that the observations drawn
from each simulated system benormally distributed and independent
within a population and betweenpopulations.
-
7/25/2019 Comparing Systems Powerpoints.pdf
13/459-13
-
7/25/2019 Comparing Systems Powerpoints.pdf
14/459-14
17.5
= T.INV.2T(0.05,17.5) = 2.10
-
7/25/2019 Comparing Systems Powerpoints.pdf
15/45
9-15
-
7/25/2019 Comparing Systems Powerpoints.pdf
16/459-16
-
7/25/2019 Comparing Systems Powerpoints.pdf
17/45
Paired-t CI for Comparing 2 Systems
Like the Welch CI method, the paired-t CI
method requires that the observations drawing
from each population be normally distr ibuted
and independentwithin a population.
However, the paired-t CI method does not
require that the observations betweenpopulations be independent.
9-17
-
7/25/2019 Comparing Systems Powerpoints.pdf
18/45
Paired-t CI for Comparing 2 Systems
This allows us to use Common Random
Numbers to force a positive correlation between
the two populations in order to reduce the half-
width.
Finally, like the Welch method, the paired-t CI
method does not require that the population have
equal variances.
Given n observations (n1=n2=n), we pair the
observations from each population (x1and x2) to
define a new random variable: x(1-2)j = x1j- x2j9-18
-
7/25/2019 Comparing Systems Powerpoints.pdf
19/459-19
-
7/25/2019 Comparing Systems Powerpoints.pdf
20/45
9-20
Paired-t Comparison
Variance 3.42
(A)
Replicat ion (j)
(B )
Strategy 1Throughputx1j
(C)
Strategy 2Throughput
x2j
(D)
Difference (B C)
X(1-2)j
= x1j- x2j
1 54.48 56.01 -1.53
2 57.36 54.08 3.28
3 54.81 52.14 2.674 56.20 53.49 2.71
5 54.83 55.49 -0.66
6 57.69 55.00 2.69
7 58.33 54.88 3.45
8 57.19 54.47 2.72
9 56.84 54.93 1.91
10 55.29 55.84 -0.55
mean 1.67
Standard Dev. 1.85
-
7/25/2019 Comparing Systems Powerpoints.pdf
21/45
9-21
-
7/25/2019 Comparing Systems Powerpoints.pdf
22/45
9-22
Comparing More than 2 Systems
Bonferroni approach
ANOVA
Factorial design and
optimization experiments
-
7/25/2019 Comparing Systems Powerpoints.pdf
23/45
9-23
-
7/25/2019 Comparing Systems Powerpoints.pdf
24/45
9-24
-
7/25/2019 Comparing Systems Powerpoints.pdf
25/45
9-25
-
7/25/2019 Comparing Systems Powerpoints.pdf
26/45
9-26
-
7/25/2019 Comparing Systems Powerpoints.pdf
27/45
9-27
Factorial Design
Input variables are the factors
Output measures are the responses
Tests system response(s) when
multiple factors are being manipulated.
-
7/25/2019 Comparing Systems Powerpoints.pdf
28/45
9-28
Two-level Full-factorial Design
Define a high and low level setting for each factor.
Try every combination of factor settings.
Run detailed studies for factors that have the
greatest impact.
-
7/25/2019 Comparing Systems Powerpoints.pdf
29/45
9-29
Fractional-factorial Design
Strategically "screen out" factors that have little or no
impact on system performance. On remaining factors, run a full-factorial experiment.
Run detailed studies for factors that have the greatest
impact.
-
7/25/2019 Comparing Systems Powerpoints.pdf
30/45
9-30
Variance Reduction
Common Random Numbers (CRN)
Evaluates each system under the exact same
circumstances.Helps ensure that observed differences of
system designs are due to the differences in
the designs and not to differences inexperimental conditions.
Antithetic Random Numbers (ARN)
-
7/25/2019 Comparing Systems Powerpoints.pdf
31/45
9-31
CRN ContinuedTable 7.10. Comparison of two buffer allocation strategies usincommon random numbers.(A)
Repl icat ion
(B)
Strategy 1
Throughput
(C)
Strategy 2
Throughput
(D)
Differenc e (B -C)
1 79.05 75.09 3.96
2 54.96 51.09 3.87
3 51.23 49.09 2.14
4 88.74 88.01 0.73
5 56.43 53.34 3.09
6 70.42 67.54 2.88
7 35.71 34.87 0.84
8 58.12 54.24 3.88
9 57.77 55.03 2.74
10 45.08 42.55 2.53
XDifference= 2.67
sDifference= 1.16
-
7/25/2019 Comparing Systems Powerpoints.pdf
32/45
More Detail on Comparison of Two
System Designs
Goal: compare two possible system configurations, e.g.:
two possible ordering policies in a supply-chain system,
two possible scheduling rules in a job shop.
Approach: the method of replications is used to analyze the
output data.
The mean performance measure for system iis denoted by
i (i = 1, 2) to obtain point and interval estimates for the
difference in mean performance, namely q1q2.
32
-
7/25/2019 Comparing Systems Powerpoints.pdf
33/45
Comparison of Two System Designs
Vehicle-safety inspection example:
The station performs 3jobs: (1) brake check, (2) headlight check, and (3)steering check.
Vehicles arrival: Possion with rate = 9.5/hour.
Present system:
Three stalls in parallel (one attendant makes all 3 inspections at each stall).
Service times for the 3jobs: normally distributed with means 6.5, 6.0and 5.5minutes, respectively.
Alternative system:
Each attendant specializes in a single task, each vehicle will pass throughthree work stations in series
Mean service times for each job decreases by 10%(5.85, 5.4, and 4.95minutes).
Performance measure:mean response time per vehicle (total time fromvehicle arrival to its departure).
33
-
7/25/2019 Comparing Systems Powerpoints.pdf
34/45
Comparison of Two System Designs
From replication rof system i, the simulation analyst obtains an estimate
Yirof the mean performance measure qi.
Assuming that the estimators Yirare (at least approximately) unbiased:
q1= E(Y1r), r = 1, , R1; q2= E(Y2r), r = 1, , R2
Goal: compute a confidence interval for 1 2to compare thetwo system designs
Confidence interval for 1 2: If c.i. is totally to the left of 0, strong evidence for the hypothesis that q1q2 < 0 (q1< q2).
If c.i. is totally to the right of 0, strong evidence for the hypothesis that q1q2 > 0 (q1> q2).
If c.i. contains 0, no strong statistical evidence that one system is better than the other
If enough additional data were collected (i.e., Riincreased), the c.i. would
most likely shift, and definitely shrink in length, until conclusion of 1< 2 or
1> 2would be drawn.34
-
7/25/2019 Comparing Systems Powerpoints.pdf
35/45
Comparison of Two System Designs
A two-sided 100(1-)% c.i. for q1q2always takes the
form of:
To calculate the Standard Error, the analyst uses one of
two statistical techniques.
Both techniques assume that the basic data Yir are
approximately normally distributed.
We will discuss these two methods next.
).(. 2.1.,2/2.1. YYestYY
35
freedom.ofdegresstheisand
ns,replicatioalloversystemformeasureeperformancmeansampletheiswhere .
iYi
-
7/25/2019 Comparing Systems Powerpoints.pdf
36/45
Comparison of Two System Designs
Statistically significant versus practically significant
Statistical significance: is the observed difference
larger than the variability in ?
Practical significance: is the true difference q1q2large
enough to matter for the decision we need to make?
Confidence intervals do not answer the question of
practical significance directly, instead, they bound the true
difference within a range.
2.1. YY
36
2.1. YY
-
7/25/2019 Comparing Systems Powerpoints.pdf
37/45
Independent Sampling with Equal Variances
[Comparison of 2 systems]
Different and independent random number streams are
used to simulate the two systems
All observations of simulated system 1are statistically
independent of all the observations of simulated system 2.
The variance of the sample mean, , is:
For independent samples:
21,2
.. ,i
RR
YVYV
i
i
i
ii
iY
.
37
2
22
1
21
2.1.2.1.
RR
YVYVYYV
-
7/25/2019 Comparing Systems Powerpoints.pdf
38/45
Independent Sampling with Equal Variances
[Comparison of 2 systems]
If it is reasonable to assume that 21 22(approximately) or ifR1= R2,a two-sample-t confidence-interval approach can be used:
The point estimate of the mean performance difference is:
The sample variance for system i is:
The pooled estimate of 2is:
C.I. is given by:
Standard error:
).(. 2.1.,2/2.1. YYestYY
2.1. YY
38
ii R
r
iirii
R
r
irii
i YRYR
YYR
S
1
2.
2
1
2
.2
1
1
1
1
freedomofdegrees221re whe,2)1()1(
21
2
22
2
112 -RRRR
SRSRSp
21
2.1.
11..
RRSYYes p
-
7/25/2019 Comparing Systems Powerpoints.pdf
39/45
Independent Sampling with Unequal Variances
[Comparison of 2 systems]
If the assumption of equal variances cannot safely be made, anapproximate 100(1-)%C.I. for can be computed as:
With degrees of freedom:
Minimum number of replicationsR1> 7andR2> 7is recommended.
39
intergerantoround,
1//1//
//
2
2
2221
2
121
2
2221
21
RRSRRS
RSRS
2
22
1
21
2.1...R
S
R
SYYes
-
7/25/2019 Comparing Systems Powerpoints.pdf
40/45
Common Random Numbers (CRN)
[Comparison of 2 systems] For each replication, the same random numbers are used to simulate
both systems.
For each replication r, the two estimates, Yr1and Yr2, are correlated.
However, independent streams of random numbers are used on different
replications, so the pairs (Yr1,Ys2) are mutually independent.
Purpose: induce positive correlation between (for each r) to
reduce variance in the point estimator of .
Variance of arising from CRN is less than that of independent
sampling (withR1=R2).
RRR
YYYVYVYYV
2112
2
2
2
1
2.1.2.1.2.1.
2
,cov2
.1 .2,Y Y
40
12is positive
2.1. YY
2.1. YY
-
7/25/2019 Comparing Systems Powerpoints.pdf
41/45
Common Random Numbers (CRN) [Comparison of 2 systems]
The estimator based on CRN is more precise, leading to a
shorter confidence interval for the difference.
Sample variance of the differences :
Standard error:
1-Rfreedomofdegresswith,1
andwhere
1
1
1
1
1
21
1
22
1
22
R
r
rrrr
R
r
r
R
r
rD
DR
D-YYD
DRDR
DDR
S
2.1. YYD
41
R
SYYesDes D 2.1...).(.
-
7/25/2019 Comparing Systems Powerpoints.pdf
42/45
Common Random Numbers (CRN)
[Comparison of 2 systems] It is never enough to simply use the same seed for the
random-number generator(s):
The random numbers must be synchronized: each
random number used in one model for some purposeshould be used for the same purpose in the other
model.
e.g., if the ithrandom number is used to generate a
service time at work station 2 for the 5tharrival in
model 1, the ithrandom number should be used for
the very same purpose in model 2.
42
-
7/25/2019 Comparing Systems Powerpoints.pdf
43/45
Comparison of Several System Designs
To compare K alternative system designsbased on some specific
performance measure, qi, of system i, for i = 1, 2, , K.
Procedures are classified as:
Fixed-sample-size procedures:predetermined sample size is used to
draw inferences via hypothesis tests of confidence intervals.
Sequential sampling (multistage): more and more data are collected
until an estimator with a pre-specified precision is achieved or until one
of several alternative hypotheses is selected.
Some goals/approaches of system comparison:Estimation of each parameter q.
Comparison of each performance measure qi, to control q1.
All pairwise comparisons, qi- qj, for all i not equal to j
Selection of the best qi. 43
-
7/25/2019 Comparing Systems Powerpoints.pdf
44/45
Bonferroni Approach[Multiple Comparisons]
To make statements about several parameters simultaneously,(where all statements are true simultaneously).
Bonferroni inequality:
The smaller jis, the wider thejthconfidence interval will be.
Major advantage: inequality holds whether models are run with
independent sampling or CRN
Major disadvantage: width of each individual interval increases
as the number of comparisons increases.
C
jEji
, ...,CiSP1
11)1true,arestatementsall(
44
Overall error probability, provides an upper bound on the
probability of a false conclusion
-
7/25/2019 Comparing Systems Powerpoints.pdf
45/45
Bonferroni Approach [Multiple Comparisons]
Should be used only when a small number of comparisons are made. Practical upper limit: about 10 comparisons
3 possible applications:
Individual c.i.s: Construct a 100(1- j)%c.i. for parameter qi,
where # of comparisons = K.
Comparison to an existing system: Construct a 100(1- j)%
c.i. for parameter qi
- q1
(i = 2,3, K), where # of comparisons
=K1.
All pairwise: For any 2 different system designs, construct a
100(1- j)%c.i. for parameter qi- qj. Hence, total # of