best sy sytem nelson swann goldsman

Upload: amitghade

Post on 04-Jun-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/13/2019 Best Sy Sytem Nelson Swann Goldsman

    1/14

    SIMPLE PROCEDURES FOR SELECTING THE BEST SIMULATEDSYSTEM WHEN THE NUMBER OF ALTERNATIVES IS LARGE

    BARRY L. NELSON Department of Industrial Engineering and Management Sciences, Northwestern University,

    Evanston, Illinois 60208-3119, [email protected]

    JULIE SWANNSchool of Industrial and Systems Engineering, Georgia Institute of Technology,

    Atlanta, Georgia 30332, [email protected]

    DAVID GOLDSMANSchool of Industrial and Systems Engineering, Georgia Institute of Technology,

    Atlanta, Georgia 30332, [email protected]

    WHEYMING SONG Department of Industrial Engineering, National Tsing Hua University,

    Hsinchu R.O.C., Taiwan, [email protected]

    (Received January 1998; revisions received January 1999, November 1999; accepted July 2000)

    In this paper, we address the problem of nding the simulated system with the best (maximum or minimum) expected performance whenthe number of alternatives is nite, but large enough that ranking-and-selection (R&S) procedures may require too much computation tobe practical. Our approach is to use the data provided by the rst stage of sampling in an R&S procedure to screen out alternatives thatare not competitive, and thereby avoid the (typically much larger) second-stage sample for these systems. Our procedures represent acompromise between standard R&S procedureswhich are easy to implement, but can be computationally inefcientand fully sequentialprocedureswhich can be statistically efcient, but are more difcult to implement and depend on more restrictive assumptions. Wepresent a general theory for constructing combined screening and indifference-zone selection procedures, several specic procedures and aportion of an extensive empirical evaluation.

    1. INTRODUCTIONA central reason for undertaking manyperhaps moststochastic simulation studies is to nd a system design thatis the best, or near the best, with respect to some measureor measures of system performance. The statistical proce-dure that is most appropriate for this purpose depends onthe characteristics of the problem at hand, characteristicsthat include the number of alternative designs, the num-ber of performance measures, whether or not the alterna-tives are functionally related in some useful way, and what,if any, regularity conditions apply to the response surface.

    Comprehensive reviews of the available tools can be foundin Fu (1994) and Jacobson and Schruben (1989).When the number of alternative designs is relatively

    small, say 2 to 10, and there is not a strong func-tional relationship among them, then statistical proceduresbased on the theory of ranking and selection (R&S) arepopular because they are easy to apply and interpret.See, for instance, Bechhofer et al. (1995) for a treat-ment of the general topic of R&S, and Goldsman andNelson (1998) for a survey of R&S procedures applied to

    simulation. When mean performance is of interest, the typ-ical indifference-zone (IZ) selection procedure conforms tothe following recipe:

    1. For each alternative, obtain a (usually small) num-ber of observations of the system performance measure of interest and calculate a measure of the variability of theobservations.

    2. Based on the measure of variability, the number of alternatives, and the desired condence level, determine thetotal number of observations needed from each alternativeto guarantee that a user-specied practically signicant dif-ference in performance can be detected at the desired con-

    dence level.3. Obtain the prescribed number of additional observa-

    tions from each alternative and select the one with the bestsample performance.

    Why are IZ selection procedures well suited for the sim-ulation environment? First of all, many of these proceduresassume that the data from a particular competitor are inde-pendent and normally distributedassumptions that areroughly satised by appropriately batched data or by sam-ple averages of independent replications of the simulations.

    Subject classications: Simulation, design of experiments: two-stage procedures. Simulation, statistical analysis: nding the best alternative. Statistics, design of experiments. Area of review: Simulation.

    Operations Research 2001 INFORMSVol. 49, No. 6, NovemberDecember 2001, pp. 950963 950

    0030-364X/01/4906-0950 $05.001526-5463 electronic ISSN

  • 8/13/2019 Best Sy Sytem Nelson Swann Goldsman

    2/14

    Nelson, Swann, Goldsman, and Song / 951

    Second, we can manipulate the underlying pseudorandomnumber seeds to produce simulations that are also indepen-dent between competitors, possibly running the differentcompetitors on parallel processors.

    On the other hand, why are such procedures only recom-mended for a small number of alternatives? Consider Step 2

    for a specic procedure due to Rinott (1978). Rinotts pro-cedure species N i , the total number of independent, nor-mally distributed observations required from alternative i,to be

    N i =max n0hS i

    2

    (1)

    where n0 is the initial sample size; h is a constant thatdepends on the number of alternatives k, the desired con-dence level 1 , and n0; S 2i is the sample variance of the initial n0 observations; and is the practically sig-nicant difference specied by the user. Two features of Equation (1) argue against using it when the number of alternatives is large: (a) the constant h is an increasingfunction of k, and (b) the formula is based on the worst-case assumption that the true mean of the best alternative isexactly better than all of the others, and all of the othersare tied for second best. This assumption is worst casein the sense that it makes the best alternative as hard aspossible to separate from the others, given that it is at least

    better than anything else. The reason for assuming theworst case is that it allows formula (1) to be independentof the true or sample means.

    The focus of this paper is issue (b). However, it is worth

    noting that the growth of h as a function of k is typicallyquite slow. Figure 1 plots kh2 versus k for Rinotts pro-cedure when the condence level for correctly selectingthe best alternative is 95%, and the rst-stage sample size

    Figure 1. The expected total sample size kh2 in units/ 2 for Rinotts procedure as a function

    of the number of systems k for 95% con-dence of making a correct selection.

    number of alternatives, k

    t o t a l

    o b

    s e r v a

    t i o n s

    , k h ^ 2

    0 20 40 60 80 100

    0

    5 0 0

    1 0 0 0

    1 5 0 0

    2 0 0 0

    is large. Because N i , the number of observations requiredfrom alternative i, is proportional to h2, the gure showsthe expected total number of observations in the experimentin units of / 2 (under the assumption that the variancesacross all alternatives equal 2). Notice that within therange of k

    =2 to 100 systems, each additional system adds

    approximately 20 / 2 observations to the experiment; inother words, the computational effort increases linearly inthe number of systems. Thus, issue (a) is not as serious asit is often thought to be.

    The fact that IZ selection procedures are based on aworst-case analysis implies that they may prescribe moreobservations than needed in order to deliver the desiredcorrect-selection guarantees. 1 This is especially unfortunatewhen the number of alternatives is large and they differwidely in performance, because a great deal of simulationeffort may be wasted on alternatives that are not competi-tive with the best. One situation in which a large number

    of heterogeneous alternatives may arise is after terminationof a stochastic optimization procedure. Such procedureseither provide no statistical guarantee or only a guaranteeof asymptotic convergence. We have proposed using R&Sto clean up after stochastic optimizationideally withminimal additional samplingto insure that the alternativeselected as best is indeed the best or near the best amongall of those visited by the search (Boesel and Nelson 1998,Boesel et al. 2001).

    Our goal is to provide procedures that are more adaptivethan standard IZ selection procedures without losing theease of implementation and interpretation that make them

    attractive. Specically, we will provide simple screeningprocedures that can be used to eliminate noncompetitivesystems after Step 1, thereby saving the (possibly large)number of observations that would be taken at Step 3.The screening procedures we propose are based on thesubset selection branch of R&S. These procedures attemptto select a (possibly random-size) subset of the k compet-ing systems that contains the one with the largest or small-est expected performance. Gupta (1956, 1965) proposed asingle-stage procedure for this problem that is applicablewhen the samples from the competing alternatives are inde-pendent, equal-sized, and normally distributed with com-mon unknown variance. The fact that subset selection canbe done in a single stage of sampling is a feature weexploit, but we rst need to extend the existing methods toallow for unknown and unequal variances across systems.Unknown and unequal variance subset-selection proceduresdo exist, but they require two or more stages of samplingin and of themselves (see, for instance, Koenig and Law1985 and Sullivan and Wilson 1989).

    In this paper we combine subset selectionto screen outnoncompetitive systemswith IZ selectionto select thebest from among the survivors of screeningto obtain acomputationally and statistically efcient procedure. Pro-cedures of this type have appeared in the literature before,e.g., Gupta and Kim (1984), Hochberg and Marcus (1981),Santner and Behaxeteguy (1992), Tamhane (1976, 1980),

  • 8/13/2019 Best Sy Sytem Nelson Swann Goldsman

    3/14

    952 / Nelson, Swann, Goldsman, and Song

    and Tamhane and Bechhofer (1977, 1979). One purpose of our paper is to extend this work, which typically assumesknown variances, or unknown but equal variances, so as tobe more useful in simulation experiments.

    If the idea of sample-screen-sample-select is effective,then one might naturally push it to the limit to obtain a fullysequential procedure (see, for instance, Hartmann 1991 andPaulson 1964). Such procedures take a single observationfrom each alternative that is still in play at the current stageof sampling, eliminate noncompetitive systems, then con-tinue with a single observation from each remaining alter-native, and so on.

    One disadvantage of a fully sequential procedure is theoverhead required to repeatedly switch among alternativesimulated systems to obtain a vector of observations acrossall systems still in play. Another is that existing proceduresrequire equal variances across systems, an assumption thatis clearly violated in many systems-simulation problems

    (in related research we are working on removing thisrestriction). Finally, the situations in which fully sequentialprocedures such as Hartmann (1991) and Paulson (1964)tend to beat screen-and-select procedures occur when all of the alternatives are close in performance (Bechhofer et al.1990), while we are interested in situations characterizedby a large number of alternatives with heterogeneous per-formance. On the whole, though, we believe that the use of sequential procedures is ultimately a good ideathe onlyproblems lie in overcoming the above obstacles. Thus, weare also pursuing the development of such procedures (Kimand Nelson 2001).

    The paper is organized as follows: 2 presents a decom-position lemma that allows us to marry screening proce-dures to IZ selection procedures while maintaining overallstatistical error control. In 3 we extend the state of the artin single-stage screening to allow unequal variances acrossalternatives; then in 4, we combine these new screen-ing procedures with IZ-selection procedures for the caseof unequal variances. Section 5 extends our ideas to allowthe systems to be screened in groups, rather than all atonce, which is convenient in an exploratory study or inconjunction with an optimization/search algorithm, and canbe more efcient. The paper ends with a report on a large-

    scale empirical study in 6, and conclusions in 7.

    2. A DECOMPOSITION LEMMA

    In this section we present a key lemma that simpliesthe construction of combined screening and IZ selectionprocedures. The Bonferroni-like lemma, which generalizesa result in Hochberg and Marcus (1981), establishes thatunder very general conditions we can apply an IZ selectionprocedure to the survivors of a screening procedure and stillguarantee an overall probability of correct selection (CS)even if the selection procedure starts with the same datathat was used for screening.

    Let the alternative systems be numbered 1 2 k andlet k denote the unknown index of the best alternative.

    Suppose is a procedure that obtains data from each sys-tem, and based on that data determines a (possibly) ran-dom subset I of 1 2 k , such that Pr 1 0,where = k I is the event that I contains the bestalternative.

    We want to consider combining with a multiple-stage

    selection procedure that will take I and the data usedto determine I as its initial stage of sampling, and thenobtain additional data in an attempt to determine the bestsystem. The types of selection procedures we have in mindcan be applied independently of any screening procedureor, equivalently, can be viewed as retaining all systems afterthe rst stage of sampling.

    Let J = 1 2 s be the distinct subsets of 1 2 k that contain k . There are s =2k1 such sub-sets. We require that be a procedure that has the fol-lowing properties: determines a (possibly random) indexK J such that Pr J 1 1 for any such sub-set J that contains k , where J = K = k is theevent that K is the best alternative. Further, suppose that

    1 2 k J for all J . In other words, if acorrect selection would be made when is applied to theentire set, then a correct selection would be made if it wereapplied to any subset that contains k . This property willhold for any procedure whose sampling from system i, say,depends only on the data generated for system i. This willbe true, for instance, when takes an initial sample of n0observations from system i, then determines the total sam-ple from system i by a formula like

    N i =max n0hS i

    2

    where S 2i is the sample variance of the initial n0 observa-tions from system i and both or h are xed.

    Let =s

    =1 J , the event that k is selected for allsubsets J to which can be applied.Lemma 1. For the combined procedure + ,Pr CS Pr 1 0 + 1Proof. Any outcome belonging to the event results in acorrect selection, provided that the subset of systems con-sidered by contains k . The event only provides out-

    comes for which this is the case. Any outcome that satisesboth conditions will certainly result in a correct selection.

    Next notice that

    Pr =Pr +Pr PrPr +Pr 1 k Pr1 0 + 1 1 1

    where the rst inequality follows because Pr s=1 J Pr 1 2 k . Remark. The additive decomposition of the screening andIZ selection procedures in Lemma 1 will not be the moststatistically efcient in all cases. For some specic proce-dures, we have shown that Pr 1 0 1 1 .

  • 8/13/2019 Best Sy Sytem Nelson Swann Goldsman

    4/14

    Nelson, Swann, Goldsman, and Song / 953

    Notice that 1 0 + 1 1 0 1 1 = 0 1 < 0,implying that the additive decomposition is more conserva-tive (and therefore less statistically efcient) than the mul-tiplicative one. However, the difference is quite smallinthe third decimal place or beyond for standard condencelevels.

    3. SCREENING PROCEDURES

    The decomposition lemma allows us to derive screeningprocedures in isolation from the selection procedures withwhich they will be combined. In this section we presenta new screening procedure that yields a subset of randomsize that is guaranteed to contain k with probability 1 0. This procedure generalizes others in the literatureby permitting unequal variances, which certainly is the casein many simulation studies. The procedure also exploitsthe dependence induced by the use of common random

    numbers.We will use the following notation throughout the paper:Let X ij be the j th observation from alternative i, for i =1 2 k . We will assume that the X ij are i.i.d. N i 2i random variables, with both i and 2i unknown and(perhaps) unequal. These assumptions are reasonable whenthe output data are themselves averages of a large num-ber of more basic output data, either from different repli-cations or from batch means within a single replication.The ordered means are denoted by 1 2 k .We assume that bigger is better, implying that the goal of the experiment is to nd the system associated with k .

    In other words, we dene a CS to mean that we select asubset that contains the index k . We will require that thePr CS k k1 1 , where k k 1 is a reminder that the guarantee is valid, provided the dif-ference between the best and second-best true mean is atleast . Throughout the paper, we use i to denote theunknown index of the system with the ith smallest mean.

    The following procedure can be applied when an initialsample having common size has been obtained from allsystems. No further sampling is required, but as a resultthe size of the subset is random.

    Screen-to-the-Best Procedure.

    1. Select the overall condence level 1 0, practicallysignicant difference , sample size n0 2, and number of systems k. Set t =t 1 0 1k1 n 01, where t denotes the quantile of the t distribution with degrees of freedom.

    2. Sample X ij i =1 2 k j =1 2 n 0 .3. Compute the sample means and variances X i and S 2ifor i =1 2 k . Let

    W ij =tS 2in0 +

    S 2j n0

    1/ 2

    for all i =j .4. Set I =

    i 1 i k and

    X i

    X j

    W ij

    +,

    j =i , where y+ =max 0 y .5. Return I .

    The subset I can be thought of as containing those alter-natives whose sample means are not signicantly inferiorto the best of the rest. In addition, notice that the subset I will never be empty for this procedure. In the appendix weprove that Pr k I k k1 1 0. In the spe-cial case of

    =0 this is a generalization of Guptas (1965)

    subset-selection procedure to allow unequal variances.Remark. If common random numbers (CRN) are usedto induce dependence across systems, then the subset-selection procedure described above is still valid, providedS 2i /n 0 +S 2j /n 0 is replaced by S 2ij /n 0, where

    S 2ij = 1n0 1

    n0

    =1X i X j X i X j

    2

    and t =t1 0 / k 1 n 01.4. COMBINED PROCEDURES

    The decomposition lemma makes it is easy to apply an IZselection procedure to the systems retained by a screen-ing procedure, while still controlling the overall condencelevel. The key observations are as follows:

    For overall condence level 1 , choose condencelevels 1 0 for the screening procedure, and 1 1 for theIZ selection procedure such that 0 + 1 = . A convenientchoice is 0 = 1 = / 2. Choose the critical constant t for the screening proce-dure to be appropriate for k systems, n0 initial observations,

    and condence level 1 0. Choose the critical constant h for the IZ selection pro-

    cedure to be appropriate for k systems, n0 initial observa-tions, and condence level 1 1.Below we exhibit one such procedure that combinesthe screening procedure of 3 with Rinotts IZ selectionprocedure.

    Combined Procedure.1. Select overall condence level 1 , practically sig-nicant difference , rst-stage sample size n0 2, and

    number of systems k. Set t =t 1 0 1k1 n 01 and h =h 11 n 0 k , where h is Rinotts constant (see Wilcox 1984

    or Bechhofer et al. 1995 for tables), and 0 + 1 = .2. Sample X ij i =1 2 k j =1 2 n 0.3. Compute the rst-stage sample means and variancesX 1i and S 2i for i =1 2 k . LetW ij =t

    S 2in0 +

    S 2j n0

    1/ 2

    for all i =j .4. Set I = i 1 i k and X 1

    i X 1j W ij +, j =i .5. If I contains a single index, then stop and return thatsystem as the best.Otherwise, for all i I compute the second-stage samplesizeN i =max n0 hS i

    2

  • 8/13/2019 Best Sy Sytem Nelson Swann Goldsman

    5/14

    954 / Nelson, Swann, Goldsman, and Song

    6. Take N i n0 additional observations from all systemsi I and compute the overall sample means

    X 2i = 1N i

    N i

    j =1X ij

    for i I .7. Select as best the system i I with the largest X 2i .The conclusion that Pr CS k k1 1 for the combined procedure follows immediately from

    Lemma 1 because the screening procedure is calibrated toretain the best with probability 1 0, while Rinottsselection procedure guarantees a probability of correctselection 1 1 for k (or fewer) systems. Thus, the over-all probability of correct selection is 1 0 + 1 =1 .Remark. The goal of a procedure like the one above is tond the best system whenever it is at least better thanany other. However, we also obtain statistical inference thatholds no matter what the conguration of the true means.We claim that with probability greater than or equal to 1 all of the following hold simultaneously:

    For all i I c , we have i < maxj I j + ; that is, wecan claim with high condence that systems excluded byscreening are less than better than the best in the retainedset. If we use =0 for screening, this means that we canclaim with high condence that the systems excluded byscreening are not the best.

    For all i I ,

    i maxj I j =i j X 2i maxj I j =i X

    2j

    X 2i maxj I j =i X 2

    j ++

    where y =min 0 y and y+ =max 0 y . These con-dence intervals bound the difference between each alterna-tive and the best of the others in I .

    If K denotes the index of the system that we select asthe best, thenPr K maxi I i

    =K i 1

    In words, the system we select will be within of the bestsystem in I with high condence.

    The rst claim follows because the screening proce-dure is in fact one-sided multiple comparisons with acontrol with each system i taking a turn as the control(Hochberg and Tamhane 1987). The second claim followsfrom Proposition 1 of Nelson and Matejcik (1995), whilethe third claim follows from Nelson and Goldsman (2001),Corollary 1.

    Remark. Because we prefer working with sample meansas estimators of the true system means, the procedure pre-sented here is based on sample means. However, manysubset-selection and IZ selection procedures have been

    based on weighted sample means, and these proceduresare typically more efcient than corresponding proceduresbased on sample means. Examples include the restrictedsubset-selection procedures (RSSPs) 2 of Koenig and Law(1985), Santner (1975), and Sullivan and Wilson (1989);and the IZ selection procedure of Dudewicz and Dalal

    (1975). The decomposition lemma in 2 allows us to formcombined procedures based on weighted sample means if we desire. Two obvious combinations are:

    Combine the screening procedure of 3 with the IZselection procedure of Dudewicz and Dalal (1975) to obtaina two-stage procedure for selecting the best.

    Combine either the RSSP of Koenig and Law (1985)or Sullivan and Wilson (1989) with the IZ selection proce-dure of Dudewicz and Dalal (1975) to obtain a three-stageprocedure for selecting the best that bounds the number of systems that survive screening.

    We investigate one of these combinations later in the

    paper.

    5. A GROUP-SCREENING VERSION OFTHE COMBINED PROCEDURE

    One difculty with the combined screening and IZ selec-tion procedure presented in the previous section is that all kof the systems must receive rst-stage sampling before pro-ceeding to the second stage. In this section, we show thatit is possibleand sometimes advantageousto break theoverall screening-and-selection problem into a screening-and-selection problem over smaller groups of systems. Aswe will show, if one or more very good alternatives are

    found early in the process, then they can be used to elim-inate a greater number of noncompetitive systems thanwould be eliminated if all were screened at once.

    To set up the procedure, let G1 G 2 G m be groups of systems such that G 1 G 2 G m = 1 2 k G i G j = for i =j G i 1 for all i and G 1 2. When wescreen the th group in the experiment, the systems in Gwill be screened with respect to each other and all systemsretained from the previous groups. However, the systemsretained from previous screenings have already receivedsecond-stage sampling, so screening in group is based on

    W ij =tS 2iN i +

    S 2j N j

    1/ 2

    where

    N i =n0 if system i has only received

    rst-stage sampling,N i if system i has received

    second-stage sampling.

    The potential savings occur because typically N i n0,which shortens W ij , providing a tighter screening proce-dure.

    In the following description of the group-screeningprocedure, we let X si denote the sample mean of all obser-vations taken from system i through s =1 or 2 stages of

  • 8/13/2019 Best Sy Sytem Nelson Swann Goldsman

    6/14

    Nelson, Swann, Goldsman, and Song / 955

    sampling; and we use I to denote the set of all systems thathave survived screening after groups have been screened.

    Group-Screening Procedure.1. As in Step 1 of the combined procedure in 4.2. Let I 0 = .3. Do the following for

    =1 2 m :

    (a) Sample X ij j =1 2 n 0, for all i G ; com-pute X 1i and S 2i ; and set N i =n0.Comment: G is the current group of systems to bescreened.

    (b) Let I =I new I old , whereI new = i i G X

    1i X

    1j W ij + j G

    and X 1i X 2j W ij + j I 1and

    I old

    = i i I 1 X 2

    i X 1

    j W ij + j Gand X 2i X 2j W ij + j I 1

    Comment: I new is the set of newly screened systems thatmake it into the next screening, while I old is the set of previously retained systems that survive another round.At the cost of some additional data storage I old can sim-ply be set to I 1 so that all systems that survive oneround of screening survive to the end.

    (c) For all i I new compute the second-stage samplesize N i based on Rinotts procedure, sample N i

    n0 addi-

    tional observations, and compute the second-stage sam-ple mean X 2i . Set N i =N i .4. Select as best the system i I m with the largest sam-

    ple mean X 2i .In the appendix we show that Pr CS k k11 2 0 1. However, we conjecture that Pr CS k

    k 1 1 0 1 =1 . In the online companionto this paper we show why we believe that the conjectureis true, and henceforth, we operate under this assumption .

    Remark. There are several contexts in which this type of procedure might be useful.

    Consider exploratory studies in which not all systemdesigns of interest are initially known or available. Sup-pose the user can bound k so that the critical values canbe computed. Then the study can proceed in a relativelyinformal manner, with new alternatives added to the studyas they occur to the analyst, or as suggested by the per-formance of other alternatives. The analyst can terminatethe study at any point at which acceptable performance hasbeen achieved and still have the desired correct-selectionguarantees.

    Certain heuristic search procedures, such as geneticalgorithms, work with groups or populations of systemsat each iteration. The group-screening procedure allowsnew alternatives generated by the search to be compared

    with the best alternatives previously visited, while main-taining the overall condence level for the system nallychosen (provided, again, that the user can bound k inadvance). See, for instance, Boesel et al. (2001).

    Example. To illustrate the potential savings from group

    screening, consider a situation with k =3 systems havingthe following characteristics:System

    1 2 3 3 0 1

    2 10 10 10

    If the rst-stage sample size is n0 =10, and we want anoverall condence level of 1 =0 95, then the requiredcritical values are t 0 975 1/ 2 9 =2 68 and h 0 975 10 3 =3 72.Suppose that the practically signicant difference is =

    1, and (for simplicity) the sample averages and samplevariances perfectly estimate their population counterparts.Then, when all systems are screened at once we have

    W ij =2 681010 +

    1010

    1/ 2

    3 79 for all i =jClearly, System 1 survives because it is the sample best,System 2 is screened out, but System 3 also survivesbecause

    X 13 =1 > X 1

    1 W 31 + =3 3 79 1 + =0 21Thus, both Systems 1 and 3 receive second-stage sampling,in which case

    N i =3 72 10

    1

    2

    =139 for i =1 3Now suppose that the systems are grouped as G1 = 1 2and G2 = 3 . In the rst step of the group screening pro-cedure System 1 survives and System 2 is eliminated, asbefore. System 1 again receives a total sample size of 139observations. The value of W 31 then becomes

    W 31 =2 681010 +

    10139

    1/ 2

    2 77Thus, System 3 is screened out because

    X 13 =1 3 2 77 1 + =1 23In this case we save the 129 second-stage observations thatare not required for System 3. Of course, the results changeif the systems are encountered in a different order; in fact,group screening is less efcient than screening all systemsat once if G 1 = 2 3 and G2 = 1 . We explore this trade-off more fully in 6.4.4.6. EMPIRICAL EVALUATION

    In this section we summarize the results of an extensiveempirical evaluation of two of the screening procedures

  • 8/13/2019 Best Sy Sytem Nelson Swann Goldsman

    7/14

    956 / Nelson, Swann, Goldsman, and Song

    described above: The combined procedure in 4whichuses screening to deliver a random-sized subset to RinottsIZ selection procedureand the group-screening versionof this same procedure as presented in 5. We also reportone set of results from a smaller study that combined theRSSP of Sullivan and Wilson (1989) with the IZ selection

    procedure of Dudewicz and Dalal (1975). Recall that anRSSP controls the maximum number of systems that sur-vive screening at the cost of an additional stage of sam-pling. These procedures are based on weighted samplemeans; such procedures tend to be more statistically ef-cient than procedures based on standard sample means, butare also more difcult to justify to practitioners.

    Rather than use systems simulation examples, whichoffer less control over the factors that affect the perfor-mance of a procedure, we chose to represent the systems asvarious congurations of k normal distributions (to assessthe impact of departures from normality, we also repre-

    sented the systems as congurations of k lognormal dis-tributions). In all cases System 1 was the best (had thelargest true mean). We evaluated the screening procedureson different variations of the systems, examining factors,including the practically signicant difference , the initialsample size n0, the number of systems k, the congurationof the means i , and the conguration of the variances 2i .As a basis for comparison, we also ran Rinotts IZ selec-tion procedure without any screening. The congurations,the experiment design, and the results are described below.

    6.1. Congurations

    To examine a difcult scenario for the screening pro-cedures, we used the slippage conguration (SC) of themeans.3 In the SC, the mean of the best system was setexactly or a multiple of above the other systems, andall of the inferior systems had the same mean.

    To investigate the effectiveness of the screening pro-cedure in removing noncompetitive systems, monotone-decreasing means (MDM) were also used. In the MDMconguration, the means of all systems were spaced evenlyapart.

    Group-decreasing means (GDM) formed the nal con-guration. Here the inferior systems were divided into twogroups, with means common within groups, but with therst groups mean larger than the second groups. Thepercentage of systems in each group and the differencebetween the second groups mean and the best systemsmean changed as described below.

    In some cases the variances of all systems were equal( 2i =1); in others they changed as described below.6.2. Experiment Design

    For each conguration, we performed 500 macroreplica-tions (complete repetitions) of the entire screening-and-selection procedure. The number of macroreplications waschosen to allow comparison of the true probability of cor-rect selection (PCS) to the nominal level. In all experi-ments, the nominal PCS was 1 = 0 95 and we took

    0 = 1 = 0 025. If the procedures true PCS is closeto the nominal level, then the standard error of the esti-mated PCS, based on 500 macroreplications, is about

    0 95 0 05 / 500 0 0097. What we want to examineis how close to 1 we get. If PCS 1 for allcongurations of the means, then the procedure is overlyconservative.

    In preliminary experiments, the rst-stage sample sizewas varied over n0 = 5 10, or 30. Based on the resultsof these experiments, follow-up experiments used n0 =10.The number of systems in each experiment varied overk =2 5 10 20 25 50 100 500.The practically signicant difference was set to =d 1/ n0, where 21 is the variance of an observation fromthe best system. Thus, we made independent of the rst-stage sample size by making it a multiple of the standarddeviation of the rst-stage sample mean. In preliminaryexperiments, the value of d was d

    = 1/ 2 1, or 2. Based

    on the results of the preliminary experiments, subsequentexperiments were performed with d =1 standard deviationof the rst-stage sample mean.

    In the SC conguration, 1 was set as a multiple of ,while all of the inferior systems had mean 0. In the MDMconguration, the means of systems were spaced accord-ing to the following formula: i = 1 b i 1 , for i =2 3 k , where b = / . Values of considered in pre-liminary experiments were =1 2, or 3 (effectively spac-ing each mean / 2, or / 3 from the previous mean). Forlater experiments, the value =2 was used.In the GDM conguration, the experimental factors con-sidered were the fraction of systems in the rst group of inferior systems, , and the common mean for each group.The fraction in the rst group was examined at levels of

    =0 25 0 5 0 75. The means in the rst group were alli = 1 , while the means in the second group were alli = 1 . The spacing of the second group was variedaccording to =2 3 4.The majority of the experiments were executed with

    the mean of the best system from the next-best sys-tem. However, to examine the effectiveness of the screeningprocedure when the best system was clearly better, someexperiments were run with the mean distance as much as4 greater. On the other hand, in some cases of the MDMand GDM congurations the mean of the best was less than

    from the next-best system.For each conguration we examined the effect of unequal

    variances on the procedures. The variance of the best sys-tem was set both higher and lower than the variances of the other systems. In the SC, 21 = 2, with =0 5 2,where 2 is the common variance of the inferior systems.In the MDM and GDM congurations, experiments wererun with the variance directly proportional to the meanof each system, and inversely proportional to the mean of each system. Specically, 2i

    =i

    +1 to examine the

    effect of increasing variance as the mean decreases, and 2i =1/ i +1 to examine the effect of decreasing

  • 8/13/2019 Best Sy Sytem Nelson Swann Goldsman

    8/14

    Nelson, Swann, Goldsman, and Song / 957

    variance as the mean decreases. In addition, some experi-ments were run with means in the SC, but with variancesof all systems either monotonically decreasing or monoton-ically increasing as in the MDM conguration.

    To assess the impact of nonnormality, we also gener-ated data from lognormal distributions whose skewness andkurtosis (standardized third and fourth moments) differedfrom those of the normal distribution.

    In evaluating the group-screening procedure, we alsoconsidered the additional experimental factors of groupsize, g, and the placement of the best system. The sizes of groups considered were g =2 k/ 2. Experiments were runwith the best system in the rst group and in the last group.

    When employing restricted-subset selection for screen-ing, we xed the maximum subset size r to be the largerof r = 2 and r = k/ 10 (that is, 10%) of the total num-ber of systems, k. This seemed to be a reasonable imitationof what might be done in practice, because a practitioner

    is likely to make the subset size some arbitrary but smallfraction of k.

    6.3. Summary of Results

    Before presenting any specics, we briey summarize whatwas observed from the entire empirical study.

    The basic combined procedure performed very wellin some situations, whereas in others it did not offersignicant improvement over Rinotts procedure with-out screening. In congurations such as MDM, thescreening procedure was able to eliminate noncompeti-tive systems and reduce the total sample size dramatically.However, in the SC, when the practically signicant differ-ence and true difference between the best and other systemswas less than two standard deviations of the rst-stage sam-ple mean, the procedure eliminated few alternatives in therst stage. The key insight, which is not surprising, is thatscreening is very effective at eliminating systems that areclearly statistically different at the rst stage, but is unableto make ne distinctions. Thus, the combined procedure ismost useful when faced with a large number of systemsthat are heterogeneous in performance, not a small numberof very close competitors. Restricting the subset size couldbe more or less efcient than not restricting it, dependingon how well the restricted-subset size was chosen.

    The PCS can be close to the nominal level 1 insome situations, but nearly 1 in others. Fortunately, thosesituations in which PCS is close to 1 were also typicallysituations in which the procedures were very effective ateliminating inferior systems and thus saved signicantly onsampling.

    In the robustness study, we found the PCS of the basiccombined procedure to be robust to mild departures fromnormality. However, more extreme departures did leadto signicant degradation in PCS, sometimes well below1

    .

    The performance of the group-screening procedure wasvery sensitive to how early a good (or in our experiments,

    the best) system was encountered. Encountering a goodalternative in an early group resulted in savings, whileencountering it late (say, in the last group of systems exam-ined) substantially increased the total sampling requiredrelative to screening all systems together. Thus, if groupscan be formed so that suspected good candidates are in

    the rst group then this approach can be effective. Boeselet al. (2001) show that it is statistically valid to sort therst-stage sample means before forming the groups, whichmakes early detection of a good system more likely.

    6.4. Some Specic Results

    We do not attempt to present comprehensive results fromsuch a large simulation study. Instead, we present detailsof some typical examples. The performance measures thatwe estimated in each experiment include the probabilityof correct selection (PCS), the average number of sam-ples per system (ANS), and the percentage of systemsthat received second-stage sampling (PSS). Notice that PSSis a measure of the effectiveness of the screening proce-dure in eliminating inferior systems. We rst examine thebasic combined procedure from 4, then compare it tothe group-screening procedure from 5. We also comparethe basic combined procedure to a procedure that employsan RSSP for screening and an IZ selection procedure basedon weighted sample means for selection.

    6.4.1. Comparisons Among Congurations of theMeans. Comparing the performance of the basic com-bined procedure on all of the congurations of the means

    shows the areas of strength and weakness. Table 1 providesan illustration. The estimated ANS depends greatly on theconguration of the systems. In the worst case (the SC),the procedure obtains a large number of samples from eachsystem; the ANS grows as the number of systems increases,and it is less efcient than simply applying Rinotts pro-cedure without screening. However, for the MDM the pro-cedure obtains fewer samples per system as the number of systems increases because the additional inferior systemsare farther and farther away from the best for MDM. ThePSS values indicate that the procedure is able to screen outmany systems in the rst stage for the MDM conguration,thus reducing the overall ANS. These and other results leadus to conclude that the screening procedure is ineffectivefor systems within 2 from the best system (when the is exactly one standard deviation of the rst-stage samplemean), but quite effective for systems more than 2 away.

    The three congurations also indicate PCS values can beclose to or far from the nominal level. For instance, PCS isalmost precisely 0 95 for k =2 in the SC, but nearly 1 formost cases of MDM.

    The results in Table 2 illustrate the advantages and dis-advantages of using a restricted-subset-selection procedurefor screening, and also the gain in efciency from using aprocedure based on weighted means.

    Comparing the rst set of results from Table 1 (whichis Rinotts procedure without screening) to the rst set of

  • 8/13/2019 Best Sy Sytem Nelson Swann Goldsman

    9/14

    958 / Nelson, Swann, Goldsman, and Song

    Table 1. Comparisons across all congurations of the means. In all cases n0 = 10 2i = 1 for alli =1/ n0, and nominal PCS =0 95.k

    Procedure/ Conguration Measure 2 5 10 100 500

    no screening PCS 0.948 0.960 0.960 0.974 0.972SC ANS 69 135 184 356 554

    1 = PSS 100% 100% 100% 100% 100%screening PCS 0.960 0.964 0.984 0.978 0.968SC ANS 86 170 225 453 538

    1 = PSS 89% 91% 94% 99% 99%screening PCS 1.000 0.998 0.960 1.000 1.000MDM ANS 88 166 177 66 27 =2 PSS 90% 88% 70% 12% 3%screening PCS 0.960 0.982 0.994 0.974GDM ANS 128 173 393 505

    =1/ 2 =4 PSS 65% 69% 80% 89%

    results from Table 2 (which is Dudewicz and Dalals proce-dure without screening) shows that Dudewicz and Dalalstwo-stage indifference-zone procedure is always at least asefcient, in terms of ANS, as Rinotts procedure. 4 Rinott(1978) proved that this is true in general.

    When we compare the MDM results from Table 1 withthe second set of results in Table 2, we see that restrictingthe maximum subset size can be either more or less ef-cient than taking a random-sized subset after the rst stageof sampling. In this example, there are exactly 2 systemswithin of the best (the best itself and the second-bestsystem). Therefore, a subset size of r = 2 is ideal, mak-ing restricted-subset screening more efcient than random-size subset screening when k = 5 or 10, because r = 2.However, in the k =100 500 cases a maximum subset of size r =k/ 10 is larger than necessary; thus, the random-size subset screening is substantially more efcient in thesecases.

    6.4.2. Other Factors. In this section, we look briey atthe effect of varying the number of systems, the practicallysignicant difference, the initial sample size, and the sys-tems variances.

    Increasing the number of systems, k, causes an approx-imately linear increase in the ANS for the SC (as we

    saw in Table 1, ANS can decrease when the means are

    Table 2. Results for Dudewicz and Dalal, and Sullivan and Wilson combined with Dudewicz and Dalal,for the MDM conguration with =2 n 0 =10 2i =1 for all i =1/ n0, and nominalPCS =0 95.

    k

    Procedure Measure 2 5 10 100 500

    no screening PCS 1.000 1.000 0.998 1.000 1.000r =1 ANS 69 128 165 312 457(D&D) PSS 100% 100% 100% 100% 100%screening PCS 1.000 0.998 1.000 1.000

    r =max 2 k/ 10 ANS 106 144 175 209(S&W +D&D) PSS 40% 20% 10% 10%

    widely spaced as in MDM); see Table 3. For example, with= 1 =1/ n0 in the SC (and all other systems havingmean 0), an increase from 2 systems to 500 systems causes

    the estimated ANS to increase from 86 to 538.The practically signicant difference had a greater

    effect on the ANS (roughly proportional to 1 / 2) than didk (roughly linear), again with the most signicant effect inthe SC. For example (see Table 3), at k =500 systems with= 1 =1/ n0 the ANS = 538, but with = 1 =2/ n0the ANS = 134. It is worth noting that linking the and the

    true difference 1 = k k1 , as we did here, hampersthe screening procedure. If instead we x and increaseonly k k1 , then screening is more effective. Table 4shows such results when is xed at 1/ n0 and the gapbetween k and k 1 goes from one to four times thisamount.

    The estimated PCS varied widely with , again indicat-ing that the estimated PCS could be close to the nominalvalue of 95%, ranging from 94.8% to 98.4% in the SCconguration.

    The initial sample size, n0, also had a signicant effecton ANS and PCS, particularly in the SC (see Table 5). Forinstance, with k =100 systems, ANS is greater than 750for n0 =5 and n0 =30, but for n0 =10 the ANS = 453. Theconclusion is that we would like to have a large enough

  • 8/13/2019 Best Sy Sytem Nelson Swann Goldsman

    10/14

    Nelson, Swann, Goldsman, and Song / 959

    Table 3. The effect of number of systems k and the difference between the best and next-best systemfor the SC. In all cases n0 =10 2i =1 for all i 1 = 2 = 3 = =k =0, where ismeasured in units of 1 / n0, and nominal PCS =0 95.

    k

    /Procedure Measure 2 5 10 100 500

    1/ 2 PCS 0.952 0.966 0.976 0.984 0.964screening ANS 362 715 925 1820 2155

    PSS 94% 97% 98% 99% 100%1 PCS 0.960 0.964 0.984 0.978 0.968screening ANS 86 170 225 453 538

    PSS 89% 91% 94% 99% 99%1 PCS 0.948 0.960 0.960 0.974 0.972no screening ANS 69 135 184 356 554

    PSS 100% 100% 100% 100% 100%2 PCS 0.970 0.982 0.984 0.978 0.968screening ANS 16 34 48 109 134

    PSS 64% 64% 70% 91% 97%

    initial sample to obtain some sharpness for the screeningprocedure, while not taking so many observations that wehave more precision than really necessary.

    The effect of unequal variances was insignicant com-pared to other experimental factors in the cases weconsidered, so we limit ourselves to a few comments: Whenthe variance of the best system was increased and the vari-ances of the other systems held constant and equal, thenANS also increased in the SC. However, if all varianceswere unequal, then as the variance of the best systembecame larger than those of the inferior systems, the ANSdecreased in all congurations. The effect on PCS was not

    consistent with changes in variances. Overall, the values of the means were more important than the values of the vari-ances, showing that our procedure sacrices little to accom-modate unequal variances.

    6.4.3. Robustness. To assess the impact of nonnormaldata on the basic combined procedure, the procedure wasapplied to lognormally distributed data with increasing lev-els of skewness and kurtosis, relative to the normal distri-

    Table 4. The effect of number of systems k and the difference between the best and next-best systemfor the SC with xed. In all cases n0 =10 2i =1 for all i =1/ n0 2 = 3 = =

    k

    =0 1 is measured in units of 1/ n0, and nominal PCS

    =0 95.

    k

    1 /Procedure Measure 2 5 10 100 500

    1 PCS 0.960 0.964 0.984 0.978 0.988screening ANS 86 170 225 453 561

    PSS 89% 91% 94% 99% 99%1 PCS 0.948 0.960 0.960 0.974 0.972no screening ANS 69 135 184 356 554

    PSS 100% 100% 100% 100% 100%2 PCS 0.998 0.998 1.000 1.000 1.000screening ANS 64 155 221 451 537

    PSS 77% 82% 89% 98% 99%4 PCS 1.000 1.000 1.000 1.000 1.000

    screening ANS 22 89 147 422 532PSS 55% 48% 58% 88% 97%

    bution (which has skewness 0 and kurtosis 3). Parametersof the lognormal distribution were chosen to obtain thedesired variance, skewness, and kurtosis, then the distribu-tion was shifted to place the means in the SC.

    Table 6 shows the estimated PCS for three lognormalcases, with the corresponding normal case included forcomparison. When skewness and kurtosis differ somewhatfrom normality (1.780, 9.112), the procedure still main-tains a PCS 0 95. However, as the departure becomesmore dramatic, the achieved PCS drops well below thenominal level. A larger number of systems ( k = 100 vs.k =10) exacerbates the problem, and the degradation is notalleviated by increased sampling (which occurs when issmaller). Thus, the procedure should be applied with cau-tion when data are expected or known to differ substantiallyfrom the normal model; mild departures, however, shouldpresent no difculty.

    6.4.4. Effectiveness of Group Screening. In the secondportion of the empirical evaluation, the basic combined pro-cedure was compared to a procedure that separated systems

  • 8/13/2019 Best Sy Sytem Nelson Swann Goldsman

    11/14

    960 / Nelson, Swann, Goldsman, and Song

    Table 5. The effect of number of systems k and the rst-stage sample size n0 for the SC. In all cases 2i =1 for all i 1 = =1/ n0 2 = 3 = =k =0, and nominal PCS =0 95.

    k

    n0 Measure 2 5 10 100 500

    5 PCS 0.968 0.976 0.968 0.988 0.992ANS 69 162 246 765 1651PSS 92% 97% 99% 100% 100%

    10 PCS 0.960 0.964 0.984 0.978 0.968ANS 86 170 225 453 538PSS 89% 91% 94% 99% 99%

    30 PCS 0.952 0.958 0.952 0.978 0.980ANS 282 365 468 817 1074PSS 85% 86% 89% 94% 96%

    into groups and performed group screening. The perfor-mance of the group-screening procedure was mixed.

    In the MDM conguration, the performance of the group-

    screening procedure depended on the placement of the bestsystem, the group size, and interactions between the twofactors. When the distance between the best system meanand the inferior system mean was large enough in the SC,then analogous results were obtained.

    In general, if the best system was placed in the rstgroup, then the group-screening procedure outperformedthe basic combined procedure. For example, for k = 10systems, ANS = 152 and 154 for group screening (withgroup sizes 2 and k/ 2, respectively), while ANS =177 withno group screening and 184 with no screening at all; seeTable 7. However, if the best system was placed in the last

    group, then it was better to screen all systems at once.If the best system was discovered early, then a smallergroup size was better. For instance, with k =25 systems,if group size was g =2 (with the best in the rst group),then ANS = 105. But if group size was g =k/ 2 (with thebest in the rst group), then ANS = 126. However, if thebest system was in the last position, a larger group size wasbetter (ANS = 314 compared to ANS = 235).

    7. CONCLUSIONS

    In this paper we have presented a general methodologyand several specic procedures for reducing the sampling

    effort that is required by a two-stage indifference-zoneselection procedure. Our approach is to eliminate or screenout obviously inferior systems after the initial stage of

    Table 6. The effect of nonnormality on PCS for the SC. In all cases 2i =1 for all i 1 = 2 =3 = =k =0 n 0 =10 is measured in units of 1/ n0, and nominal PCS =0 95.

    =1/ 2 =1k =10 k =100 k =10 k =100Distribution (Skewness, Kurtosis) PCS PCS PCS PCS

    normal (0, 3) 0 976 0 984 0 984 0 978lognormal (1.780, 9.112) 0 958 0 966 0 962 0 954

    lognormal (4, 41) 0 850 0 774 0 900 0 814lognormal (6.169, 113.224) 0 796 0 562 0 848 0 648

    sampling, while still securing the desired overall guaranteeof a correct selection. Such procedures preserve the sim-ple structure of IZ selection, while being much more ef-

    cient in situations where there are many alternative systems,though some are not really competitive.We focused on procedures that can use screening after an

    initial stage of samplingbecause of our interest in largenumbers of systems with heterogeneous performanceandIZ selection procedures based on sample meansbecauseof our preference for standard sample means over weightedsample means. However, in some situations restricting themaximum subset size and using a procedure based onweighted sample means can be advantageous. As a roughrule of thumb, we use rst-stage screening when there area large number of systems and their means are expected

    to differ widely, but use restricted-subset screening whena substantial number of close competitors are anticipated.The development of more formal methods for choosingamong the two is the subject of ongoing research.

    All of the combined procedures presented in this paperare based on the assumption that all systems are simulatedindependently. However, it is well known in the simulationliterature that the use of common random numbers (CRN)to induce dependence across systems can sharpen the com-parison of two or more alternatives. In the case of R&S,sharpening means reducing the total number of observa-tions required to achieve the desired probability of correct

    selection. Two IZ selection procedures that exploit CRNshave been derived in Nelson and Matejcik (1995), and wehave derived combined procedures based on them. How-

  • 8/13/2019 Best Sy Sytem Nelson Swann Goldsman

    12/14

    Nelson, Swann, Goldsman, and Song / 961

    Table 7. The effect of group screening relative toscreening all at once in the MDM congura-tion with =2. In all cases 2i =1 for all i,n0 =10 1 = =1/ n0, and nominal PCS =0 95.

    Scenario k

    =10 k

    =25

    g =2 with PCS 0.992 0.996best in rst ANS 152 105group PSS 56% 28%

    g =2 with PCS 0.998 1.000best in last ANS 233 314group PSS 99% 100%

    g =k/ 2 with PCS 0.996 1.000best in rst ANS 154 126group PSS 59% 35%

    g =k/ 2 with PCS 1.000 1.000best in last ANS 225 235group PSS 94% 70%

    g

    =1, no PCS 0.960 1.000

    group screening ANS 177 129PSS 70% 36%PCS 1.000 1.000

    no screening ANS 184 264PSS 100% 100%

    ever, these procedures are not desirable when the number of alternatives is very large, which is the focus of this paper.One procedure is based on the Bonferroni inequality, andwill become increasingly conservative as k increases. Theother assumes that the variance-covariance matrix of thedata across all alternatives satises a condition known assphericity. This is an approximation that works well as longas the true variances do not differ too much from the aver-age variance, and the true correlations between systems donot differ too much from the average correlation. As thenumber of systems increases, the validity of this approxi-mation becomes more and more suspect.

    APPENDIX

    The following lemmas are used in proving the validity of the screen-to-the-best procedure and the group-screeningprocedure. A much more complete proof of Lemma 4, as

    well as a proof of the validity of the screen-to-the-best pro-cedure under common random numbers, and a proof of ourinference for systems in the eliminated set are contained inthe online companion to this paper.

    Lemma 2 (Banerjee 1961). Let Z be an N 0 1 randomvariable that is independent of Y 1 Y 2 Y k, which areindependent chi-squared random variables, with Y i hav-ing degrees of freedom i . Let 1 2 k be arbitraryweights such that ki=1 i =1 and all i 0. ThenPr Z2

    k

    i

    =1

    t 2i iY i

    i1

    when ti =t1/ 2 i .

    Lemma 3 (Tamhane 1977). Let V 1 V 2 V k be inde- pendent random variables, and let gj v 1 v 2 v k j =1 2 p , be nonnegative, real-valued functions, each onenondecreasing in each of its arguments. Then

    E p

    j =1gj V 1 V 2 V k

    p

    j =1E gj V 1 V 2 V k

    We are now in a position to prove that Pr k I k k1 1 0 for the screen-to-the-best procedure.In the proof we let ij = i j .

    Proof. Notice that

    Pr k I k k 1=Pr X k X j W k j + j =k k k 1

    =PrX k X j k j

    2k + 2jn0

    1/ 2 W k j +

    2k + 2jn0

    1/ 2 k j

    2k + 2jn0

    1/ 2

    j =k k k1 (A1)

    To simplify notation, let

    Z j =X k X j k j

    2k + 2jn0

    1/ 2

    and let

    vj = 2k

    + 2j

    n0

    1/ 2

    Now, by the symmetry of the normal distribution, we canrewrite (A1) as

    Pr Z j W k j +

    vj +k j

    vj j =k k k 1

    Pr Z j W k j

    vj j =k (A2)

    The inequality leading to (A2) arises because k j under the assumed condition, W k j

    +

    +W k j ,

    and Z j does not depend on 1 2 k .To further simplify notation, let

    Q j =W k j

    vj =t 1 0 1/ k 1 n 01S 2k +S 2j 2k + 2j

    1/ 2

    We now condition on S 21 S 22 S

    2k and rewrite (A2) as

    E Pr Z j Q j j =k S 21 S 2kE

    k1

    j =1Pr Z j Q j S

    21 S

    2k

    k

    1

    j =1E Pr Z j Q j (A3)

  • 8/13/2019 Best Sy Sytem Nelson Swann Goldsman

    13/14

    962 / Nelson, Swann, Goldsman, and Song

    where the rst inequality follows from Slepians inequal-ity (Tong 1980), since Cov Z i Z j 0, and the secondinequality follows from Lemma 3, since Pr Z j Q j isincreasing in Qj , and Q j is increasing in S 21 S

    2k .

    To complete the proof, we attack the individual productterms in (A3). Notice that Q j 0 and Z j is N 0 1 , so we

    can writePr Z j Q j =

    12 +Pr 0 Z j Q j

    = 12 +

    12

    Pr Z 2j Q2j

    = 12 +

    12

    Pr Z 2j t2 1

    S 2j 2j +t

    2 2S 2k 2k

    12 +

    12

    12 1 1 0 1k1

    = 1 0 1k1 (A4)

    where the inequality in (A4) follows from Lemma 2 with 1 =1 2 = 2j / 2k + 2j .

    Substituting this result into (A3) shows that

    Pr CS k k1k1

    j =11 0

    1k1 =1 0

    To establish the correct selection guarantee for the group-screening procedure, we rst prove the following lemma.Lemma 4. Let be the index such that k G , and dene F = G1 G 2 G 1 (with F = if = 1)and S =G +1 G +2 Gm (with S = if =m).Consider the event

    = X 1k X 1j W k j + j G and X 1k X

    2j W k j + j F and

    X 2k X 1

    j W k j + j S Then Pr k k1 1 2 0.Proof. The proof of this lemma is quite long and tedious,so we only provide a sketch here; the complete details canbe found in the online companion.

    Using the same initial steps as before, we can show that

    Pr k k 1Pr Z 1 1j

    W k j v 1 1j

    j G Z1 2

    j W k j v 1 2j

    j F Z2 1

    j W k j v 2 1j

    j S (A5)

    where

    v 1 1j = 2kn0 +

    2j n0

    1/ 2

    v 1 2j = 2kn0 +

    2j N j

    1/ 2

    v 2 1j = 2k

    N k + 2j n0

    1/ 2

    and

    Z a bj =X ak X

    bj k j

    v a bj

    The proof then proceeds as follows:

    1. Condition on S 21 S

    2k and use Slepians inequality

    to break (A5) into the product of three probabilities.2. Use Slepians inequality and Lemma 3 on each piece

    to break any joint probabilities into products of marginalprobabilities for the Z a bj . Altogether, there will be k1terms in the product.

    3. Show that each marginal probability is 12 0 1k1 .

    It is critical to the proof that W ij is based only on the rst-stage sample variances, S 2i , because S

    2i is independent of

    both the rst- and second-stage sample means. In the onlinecompanion to this paper we provide compelling evidence,but not a proof, that each marginal probability is in fact

    1 0 1k1 .

    We are now in a position to establish the probabilityrequirement for the group-screening procedure. We claimthat

    Pr CS k k1Pr and X 2k > X 2j j =k k k1

    First, the event implies that system k survives screen-ing the rst time it is evaluated and that it is not eliminatedby the rst-stage sample means of any systems evalu-

    ated after k . This event occurs with probability 12 0by Lemma 4. Then the event X 2k > X 2j j = kinsures that the second-stage sample mean of system k isnot eliminated by any other systems second-stage samplemean; this event occurs with probability 1 1 becausewe used Rinotts constant to determine the second-stagesample size. Therefore, using Lemma 1, the probability of the joint event is 12 0 1. ENDNOTES

    1 However, it can be shown that indifference-zone selec-tion procedures are quite efcient when the worst-casereally happens; see for instance Hall (1959), Eaton (1967),and Mukhopadhyay and Solanky (1994).

    2 An RSSP insures that the maximum size of the sub-set does not exceed a user-specied number of systems byusing two or more stages of sampling to form the subset.

    3 The slippage conguration is the least favorable con-guration for Rinotts procedure, which forms the secondstage of two of the procedures presented here. Least favor-able means that this conguration attains the lower boundon probability of correct selection over all congurationsof the means with k k1 .4 Notice that in the absence of screening the ANS of Rinott or Dudewicz and Dalal is independent of the con-guration of the true means.

  • 8/13/2019 Best Sy Sytem Nelson Swann Goldsman

    14/14

    Nelson, Swann, Goldsman, and Song / 963

    ACKNOWLEDGMENTThis research was partially supported by National ScienceFoundation Grant numbers DMI-9622065 and DMI-9622269, and by JGC Corporation, Symix Corporation, andRockwell Software. The reports of three referees and anassociate editor were very helpful.

    REFERENCES

    Banerjee, S. 1961. On condence interval for two-means problembased on separate estimates of variances and tabulated valuesof t -table. Sankhy a A23 359378.Bechhofer, R. E., C. W. Dunnett, D. M. Goldsman, M. Hartmann.1990. A comparison of the performances of procedures forselecting the normal population having the largest meanwhen the populations have a common unknown variance.Comm. Statist. B19 9711006., T. J. Santner, D. Goldsman. 1995. Design and Analysis of

    Experiments for Statistical Selection, Screening and MultipleComparisons . John Wiley, New York.

    Boesel, J., B. L. Nelson. 1998. Accounting for randomness inheuristic simulation optimization. Proc. 12th European Sim-ulation Multiconference . Manchester, U.K. Society for Com-puter Simulation International. 634638., , S. H. Kim. 2001. Using ranking and selection toclean up after a simulation search. Technical report, Depart-ment of Industrial Engineering and Management Sciences,Northwestern University, Evanston, IL.

    Dudewicz, E. J., S. R. Dalal. 1975. Allocation of observations inranking and selection with unequal variances. Sankhy a B372878.

    Eaton, M. L. 1967. Some optimum properties of ranking proce-dures. Ann. Math. Statist. 38 124137.Fu, M. 1994. Stochastic optimization via simulation: A review.

    Ann. Oper. Res. 53 199248.Goldsman, D., B. L. Nelson. 1998. Comparing systems via sim-

    ulation. J. Banks, ed. The Handbook of Simulation. JohnWiley, New York, 273306.

    Gupta, S. S. 1956. On a decision rule for a problem in rankingmeans. Ph.D. dissertation Institute of Statistics, University of North Carolina, Chapel Hill, NC.. 1965. On some multiple decision (selection and ranking)rules. Technometrics 7 225245., W.-C. Kim. 1984. A two-stage elimination-type procedurefor selecting the largest of several normal means with a com-mon unknown variance. T. J. Santner, A. C. Tamhane eds. Design of Experiments: Ranking and SelectionEssays in Honor of Robert E. Bechhofer Marcel Dekker, New York,7794.

    Hall, W. J. 1959. The most economical character of Bech-hofer and Sobel decision rules. Ann. Math. Statist. 30964969.

    Hartmann, M. 1991. An improvement on Paulsons procedure forselecting the population with the largest mean from k normal

    populations with a common unknown variance. Sequential Anal. 10 116.

    Hochberg, Y., R. Marcus. 1981. Three stage elimination type pro-cedures for selecting the best normal population when vari-ances are unknown. Comm. Statist. A10 597612., A. C. Tamhane. 1987. Multiple Comparison Procedures .John Wiley, New York.

    Jacobson, S., L. Schruben. 1989. A review of techniques for sim-ulation optimization. Oper. Res. Letters 8 19.

    Kim, S. H., B. L. Nelson. 2001. A fully sequential procedure forindifference-zone selection in simulation. ACM TOMACS 11 ,in press.

    Koenig, L. W., A. M. Law. 1985. A procedure for selecting a sub-set of size m containing the best of k independent normalpopulations, with applications to simulation. Comm. Statist.B14 719734.

    Mukhopadhyay, N., T. K. S. Solanky. 1994. Multistage Selectionand Ranking Procedures: Second-order Asymptotics . MarcelDekker, New York.

    Nelson, B. L., D. Goldsman. 2001. Comparisons with a standardin simulation experiments. Management Sci. 47 449463., F. J. Matejcik. 1995. Using common random numbers forindifference-zone selection and multiple comparisons in sim-ulation. Management Sci. 41 19351945.

    Paulson, E. 1964. A sequential procedure for selecting the popu-lation with the largest mean from k normal populations. Ann. Math. Statist. 35 174180.

    Rinott, Y. 1978. On two-stage selection procedures and relatedprobability-inequalities. Comm. Statist. A7 799811.

    Santner, T. J. 1975. A restricted subset selection approach to rank-ing and selection problems. Ann. Statist. 3 334349., M. Behaxeteguy. 1992. A two-stage procedure for selectingthe largest normal mean whose rst stage selects a bounded

    random number of populations. J. Statist. Planning and Infer-ence 31 147168.

    Sullivan, D. W., J. R. Wilson. 1989. Restricted subset selectionprocedures for simulation. Oper. Res. 37 5271.

    Tamhane, A. C. 1976. A three-stage elimination type procedurefor selecting the largest normal mean (common unknownvariance). Sankhy a B38 339349.. 1977. Multiple comparisons in model I one-way anova withunequal variances. Comm. Statist. A6 1532.. 1980. On a class of multistage selection procedures withscreening for the normal means problem. Sankhy a B42197216., R. E. Bechhofer. 1977. A two-stage minimax procedure

    with screening for selecting the largest normal mean. Comm.

    Statist. A6 10031033., . 1979. A two-stage minimax procedure with screen-ing for selecting the largest normal mean (ii): an improvedPCS lower bound and associated tables. Comm. Statist. A8337358.

    Tong, Y. L. 1980. Probability Inequalities in Multivariate Distri-butions . Academic Press, New York.

    Wilcox, R. R. 1984. A table for Rinotts selection procedure. J. Quality Tech. 16 97100.