STATISTICS IN MEDICINE. Statist. Med. 2002; 21:943–956 (DOI: 10.1002/sim.1053)

A quasi-exact method for the confidence intervals of the difference of two independent binomial proportions in small sample cases

Xun Chen∗,†

Clinical Biostatistics, Merck Research Laboratories, RY33-404, Rahway, P.O. Box 2000, NJ 07065-0900, U.S.A.

SUMMARY

In this paper we propose a quasi-exact alternative to the exact unconditional method of Chan and Zhang (1999) for estimating confidence intervals for the difference of two independent binomial proportions in small sample cases. The quasi-exact method is an approximation to a modified version of Chan and Zhang’s method, in which the two-sided p-value of an observation is defined by adding to the one-sided p-value the sum of the probabilities of all more ‘extreme’ events in the unobserved tail. We show that distinctly less conservative interval estimates can be derived from this modified definition of the two-sided p-value. The approximations used in the quasi-exact method simplify the computations greatly, while the resulting infringements of the nominal level are small. Compared with other approximate methods, including the mid-p quasi-exact methods and the Miettinen and Nurminen (M&N) asymptotic method, our quasi-exact method demonstrates much better reliability in small sample cases. Copyright © 2002 John Wiley & Sons, Ltd.

KEY WORDS: asymmetry; conservativeness; quasi-exact; reliability; small sample

1. INTRODUCTION

Estimation of the difference of two independent binomial proportions is a commonly encountered problem in medical and health care applications. The use of confidence intervals has been increasingly encouraged in addition to, or even in place of, tests of statistical significance [1]. Specifically, we use the following notation to describe the problem. Let $X_1 \sim \mathrm{Bin}(n_1, p_1)$ and $X_2 \sim \mathrm{Bin}(n_2, p_2)$ be two independent binomial random variables. The joint probability of a particular realization $(x_1, x_2)$ is thus

$$\Pr(X_1 = x_1, X_2 = x_2) = \binom{n_1}{x_1}\binom{n_2}{x_2} p_1^{x_1}(1-p_1)^{n_1-x_1}\, p_2^{x_2}(1-p_2)^{n_2-x_2} \qquad (1)$$

∗ Correspondence to: Xun Chen, Clinical Biostatistics, Merck Research Laboratories, RY33-404, Rahway, P.O. Box 2000, NJ 07065-0900, U.S.A.

† E-mail: xun [email protected]

Received June 2000; accepted June 2001. Copyright © 2002 John Wiley & Sons, Ltd.


for $x_1 = 0, 1, \ldots, n_1$ and $x_2 = 0, 1, \ldots, n_2$. The parameter of interest is the difference of the two binomial proportions

$$\Delta = p_1 - p_2$$

Here $\Delta$ must lie in $(-1, 1)$. For the $100(1-\alpha)$ per cent confidence interval for the proportion difference $\Delta$, most textbooks recommend the following approximate formula:

$$\hat p_1 - \hat p_2 \pm Z_{\alpha/2}\sqrt{\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2} \qquad (2)$$

where $\hat p_1 = x_1/n_1$ and $\hat p_2 = x_2/n_2$ are the unrestricted maximum likelihood estimates of $p_1$ and $p_2$, respectively, and $Z_{\alpha/2}$ is the usual critical value cutting off probability $\alpha/2$ in the upper tail of the standard normal distribution. It is known that when the sample size is small, this simple asymptotic formula is not reliable [2].

Exact methods are usually more reliable than asymptotic methods in small sample cases. In contrast to asymptotic methods, which are derived from large-sample approximations, the term exact refers to the use of the exact distribution in the derivation. It does not mean that the nominal level is met exactly: an exact method guarantees the prespecified nominal level by exceeding it. (In fact, it is known that in a discrete setting no non-randomized procedure can have a size exactly as specified, except coincidentally [3].) To use the exact distribution of $(X_1, X_2)$, one first has to eliminate the effect of the unknown nuisance parameter $p_1$ (with $p_2 = p_1 - \Delta$ in (1)). One way is to apply Fisher’s exact method, which eliminates the effect of the nuisance parameter $p_1$ by computing the p-value of an observation in a conditional sample space. This method, referred to as the exact ‘conditional’ method, was first proposed by Thomas and Gart [4] and corrected by Santner and Snell [5]. Santner and Snell [5] also found the exact conditional method to be unnecessarily conservative and suggested using an exact ‘unconditional’ method to derive confidence intervals for $\Delta$ in small sample cases. In contrast to the conditional method, the unconditional method eliminates the effect of the nuisance parameter $p_1$ by maximizing the significance level of an observation over the domain of $p_1$. StatXact [6] offers software for constructing confidence intervals for $\Delta$ with Santner and Snell’s exact method. Chan and Zhang [7] further improved Santner and Snell’s exact unconditional method by applying a different ordering statistic. Santner and Yamagami [8] and Coe and Tamhane [9] proposed another type of exact unconditional method, in which the effect of the nuisance parameter $p_1$ is eliminated by ‘combining’ acceptance regions over the domain of $p_1$. Santner and Yamagami [8] showed that their interval estimates also outperformed those of Santner and Snell.
(Our later comparisons will show that the interval estimates of Santner and Yamagami [8] and of Coe and Tamhane [9] are in fact both superior to those of the Chan and Zhang method.)

We note that the exact unconditional method of Chan and Zhang [7] has some very attractive merits. It not only provides a direct correspondence to the p-values of the relevant hypothesis tests, but is also very straightforward to derive and easy to implement programmatically. Following a similar derivation, in this paper we propose a quasi-exact alternative to the Chan and Zhang method. It retains the good aspects of the Chan and Zhang method while being distinctly less conservative and simpler to calculate. The quasi-exact method defines the two-sided p-value of an observation by adding to the one-sided p-value the sum of the probabilities of all more ‘extreme’ events in the unobserved tail. The quasi-exact method does not guarantee the nominal level, but the degree of infringement is found to be very low. Compared with other approximate methods, the quasi-exact method demonstrates much better reliability in small sample cases.

Three types of estimation methods, exact, quasi-exact and asymptotic, are presented in Section 2. The different methods are compared in terms of coverage probability and expected width in Section 3.
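As a baseline for the methods of Section 2, formulas (1) and (2) can be coded directly. The sketch below is illustrative only: `joint_pmf` and `wald_interval` are hypothetical names, not code from the paper. The usage lines show two small-sample failures of (2): a zero-width interval when $x_1 = x_2 = 0$, and a lower bound that falls outside $(-1, 1)$.

```python
from math import comb, sqrt
from statistics import NormalDist


def joint_pmf(x1, n1, x2, n2, p1, p2):
    """Equation (1): Pr(X1 = x1, X2 = x2) for independent binomials."""
    return (comb(n1, x1) * p1**x1 * (1 - p1) ** (n1 - x1)
            * comb(n2, x2) * p2**x2 * (1 - p2) ** (n2 - x2))


def wald_interval(x1, n1, x2, n2, alpha=0.05):
    """Equation (2): textbook asymptotic interval for Delta = p1 - p2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    p1_hat, p2_hat = x1 / n1, x2 / n2        # unrestricted MLEs
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    d = p1_hat - p2_hat
    return d - z * se, d + z * se


# The probabilities in (1) sum to one over the sample space.
total = sum(joint_pmf(a, 5, b, 5, 0.3, 0.6) for a in range(6) for b in range(6))
print(total)

# Two small-sample failures of (2): zero width, and a bound below -1.
print(wald_interval(0, 5, 0, 5))  # (0.0, 0.0)
print(wald_interval(0, 5, 4, 5))  # lower bound is below -1
```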

2. CONFIDENCE INTERVAL ESTIMATION METHODS

2.1. Asymptotic method

Newcombe’s review [2] of nine asymptotic methods showed that the method proposed by Miettinen and Nurminen [10] (referred to hereafter as the M&N asymptotic method) performed more reliably than many of the other asymptotic methods in small to moderate sample cases ($n_1$ and $n_2$ from 5 to 50). The M&N asymptotic method solves the equations

$$\frac{\hat p_1 - \hat p_2 - \Delta}{\sqrt{\tilde p_1(1-\tilde p_1)/n_1 + \tilde p_2(1-\tilde p_2)/n_2}}\,\sqrt{1 - \frac{1}{n_1+n_2}} = \pm Z_{\alpha/2} \qquad (3)$$

for the lower (+) and upper (−) bounds of $\Delta$, say $\Delta_l$ and $\Delta_u$, respectively. (Formula (3) is identical to that of Mee [11] except for the term $\sqrt{1 - 1/(n_1+n_2)}$. This term is negligible in large samples; Miettinen and Nurminen added it to obtain less anticonservative estimates in small sample cases.) In contrast to the simple asymptotic formula (2), the denominator of (3) uses the constrained maximum likelihood estimates of $p_1$ and $p_2$ under the restriction $p_1 - p_2 = \Delta$, denoted $\tilde p_1$ and $\tilde p_2$, respectively. The constrained likelihood equation is a cubic equation in $p_1$ (and thus in $p_2$). Miettinen and Nurminen [10] showed that $\tilde p_1$ (and thus $\tilde p_2$) can be obtained uniquely in closed form; the reader may refer to Miettinen and Nurminen [10] for the formulae.
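The sketch below solves (3) numerically under two stated assumptions. First, as a deliberate substitution for the closed-form cubic solution, $\tilde p_1$ is found by bisection on the constrained score, which decreases in $p_1$ on $\max(0, \Delta) < p_1 < \min(1, 1+\Delta)$ because the constrained log-likelihood is concave there. Second, as is usual for score intervals, the left-hand side of (3) is assumed to decrease in $\Delta$, so each bound can be found by bisection between the point estimate and ±1. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def constrained_mle(x1, n1, x2, n2, delta, iters=100):
    """(p1~, p2~) maximizing the likelihood subject to p1 - p2 = delta.

    Numerical stand-in for the closed-form solution: bisection on the score,
    which is decreasing in p1 on max(0, delta) < p1 < min(1, 1 + delta).
    """
    lo, hi = max(0.0, delta), min(1.0, 1.0 + delta)

    def score(p1):
        p2 = p1 - delta
        s = 0.0
        if x1:
            s += x1 / p1
        if n1 - x1:
            s -= (n1 - x1) / (1.0 - p1)
        if x2:
            s += x2 / p2
        if n2 - x2:
            s -= (n2 - x2) / (1.0 - p2)
        return s

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    p1 = min(max(0.5 * (lo + hi), 0.0), 1.0)
    return p1, min(max(p1 - delta, 0.0), 1.0)


def mn_statistic(x1, n1, x2, n2, delta):
    """Left-hand side of (3), including the sqrt(1 - 1/(n1 + n2)) factor."""
    p1t, p2t = constrained_mle(x1, n1, x2, n2, delta)
    var = p1t * (1 - p1t) / n1 + p2t * (1 - p2t) / n2
    num = x1 / n1 - x2 / n2 - delta
    if var <= 0.0:  # degenerate boundary case, e.g. delta = -1 with x = (0, n2)
        return 0.0 if num == 0 else (float("inf") if num > 0 else float("-inf"))
    return num / sqrt(var) * sqrt(1.0 - 1.0 / (n1 + n2))


def mn_interval(x1, n1, x2, n2, alpha=0.05, iters=100):
    """Solve (3) for Delta_l (= +Z) and Delta_u (= -Z) by bisection."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    d_hat = x1 / n1 - x2 / n2

    def solve(lo, hi, target):
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if mn_statistic(x1, n1, x2, n2, mid) > target:
                lo = mid  # statistic still above target: root lies to the right
            else:
                hi = mid
        return 0.5 * (lo + hi)

    return solve(-1.0, d_hat, z), solve(d_hat, 1.0, -z)


print(mn_interval(2, 5, 2, 5))  # symmetric about 0 when x1 = x2 and n1 = n2
```

The bisection for $\tilde p_1$ trades the closed-form formulae for code that is easy to check; any root finder on the score would serve equally well.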

2.2. Exact unconditional method

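Santner and Snell [5] and Chan and Zhang [7] derived the $100(1-\alpha)$ per cent confidence interval for $\Delta$ by inverting the p-value of the observation $(x_1, x_2)$ as follows:

$$\begin{cases} \Delta_l = \inf\{\Delta : \mathrm{Pvalue}_2(x_1, x_2 \mid T, \Delta) > \alpha\} \\ \Delta_u = \sup\{\Delta : \mathrm{Pvalue}_2(x_1, x_2 \mid T, \Delta) > \alpha\} \end{cases} \qquad (4)$$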

where $\mathrm{Pvalue}_2(x_1, x_2 \mid T, \Delta)$ denotes the two-sided p-value of the observation $(x_1, x_2)$ for given $T$ and $\Delta$, and $T = T(x_1, x_2, \Delta)$ is a prespecified statistic ordering the sample space of $(X_1, X_2)$. $D(\Delta)$ denotes the domain of $p_1$, which, unless otherwise specified, is

$$D(\Delta) = \begin{cases} (0,\ 1+\Delta), & -1 < \Delta < 0 \\ (\Delta,\ 1), & 0 \le \Delta < 1 \end{cases}$$

Both Santner and Snell [5] and Chan and Zhang [7] defined the two-sided p-value of an observation by doubling the one-sided p-value, that is

$$\mathrm{Pvalue}_2(x_1, x_2 \mid T, \Delta) = 2\,\mathrm{Pvalue}_1(x_1, x_2 \mid T, \Delta) \qquad (5)$$


with

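$$\begin{aligned}
\mathrm{Pvalue}_1(x_1, x_2 \mid T, \Delta) = {} & \max_{p_1 \in D(\Delta)} \Pr\{(X_1, X_2) : T(X_1, X_2, \Delta) \ge T(x_1, x_2, \Delta) \mid \Delta, p_1\} \\
& \wedge \max_{p_1 \in D(\Delta)} \Pr\{(X_1, X_2) : T(X_1, X_2, \Delta) \le T(x_1, x_2, \Delta) \mid \Delta, p_1\}
\end{aligned} \qquad (6)$$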

where $x \wedge y = \min(x, y)$. Note that the p-value in (6) does not depend on the nuisance parameter $p_1$, and, based on (5) and (6), the estimates from (4) satisfy

$$\Pr(\Delta_l \le \Delta \le \Delta_u) \ge 1 - \alpha$$

Comparing Santner and Snell’s ordering statistic $T(x_1, x_2, \Delta) = \hat p_1 - \hat p_2$ with three other ordering statistics, Chan and Zhang [7] concluded that (4) performed best in general in small sample cases when the M&N statistic was used as the ordering statistic, that is, when

$$T(x_1, x_2, \Delta) = \frac{\hat p_1 - \hat p_2 - \Delta}{\sqrt{\tilde p_1(1-\tilde p_1)/n_1 + \tilde p_2(1-\tilde p_2)/n_2}}$$

The exact unconditional method developed from (5) and (6), with the M&N statistic as the ordering statistic, will be referred to hereafter as the C&Z exact method. Unless otherwise specified, the M&N statistic is used as the ordering statistic throughout this paper.

It is worth noting a special ‘asymmetric’ feature of the exact distribution of $(X_1, X_2)$ when defining the two-sided p-value of an observation. Experience tells us that the unobserved tail of an event is usually much smaller than the observed tail (see, for example, Mantel [12]). As illustrated in Figure 1, suppose the observed outcome is $x_1 = 2$ and $x_2 = 0$ for $n_1 = n_2 = 4$, and rank the sample space by the absolute value of the M&N statistic. When $\Delta = -0.2$, the maximum significance level of the observation in the observed tail is 0.0268 following formula (6) (the one-sided p-value), while the corresponding significance level of the same observation in the unobserved tail is only 0.0150. (The maximum significance level of the observation in the unobserved tail is 0.0168, which is also much smaller than its p-value in the observed tail.) Defining the two-sided p-value as the sum of the probabilities of all more ‘extreme’ events in both tails, rather than doubling the one-sided p-value in the observed tail, may thus result in smaller two-sided p-values. In symbols, let

$$\mathrm{Pvalue}_2(x_1, x_2 \mid T, \Delta) = \max_{p_1 \in D(\Delta)} \Pr\{(X_1, X_2) : |T(X_1, X_2, \Delta)| \ge |T(x_1, x_2, \Delta)| \mid \Delta, p_1\} \qquad (7)$$

where an event $(X_1, X_2)$ is considered more ‘extreme’ than the observation $(x_1, x_2)$ if and only if $|T(X_1, X_2, \Delta)| \ge |T(x_1, x_2, \Delta)|$. In Figure 1, the two-sided p-value of the observation (2, 0) is 0.0430 based on (7), in contrast to $2 \times 0.0268 = 0.0536$ based on (5) and (6). Less conservative interval estimates may thus be derived following (7). Furthermore, it can be shown that the interval estimates $(\Delta_l, \Delta_u)$ based on the p-value definition (7) also guarantee at least $100(1-\alpha)$ per cent coverage probability:

$$\begin{aligned}
\Pr(\Delta_l \le \Delta \le \Delta_u) &= \Pr\big(\mathrm{Pvalue}_2(x_1, x_2 \mid T, \Delta) > \alpha\big) \\
&= \Pr\Big(\max_{p_1 \in D(\Delta)} \Pr\{(X_1, X_2) : |T(X_1, X_2, \Delta)| \ge |T(x_1, x_2, \Delta)| \mid \Delta, p_1\} > \alpha\Big) \\
&\ge \Pr\big(\Pr\{(X_1, X_2) : |T(X_1, X_2, \Delta)| \ge |T(x_1, x_2, \Delta)| \mid \Delta, p_1 \in D(\Delta)\} > \alpha\big) \\
&\ge 1 - \alpha
\end{aligned}$$

Figure 1. An illustration of asymmetric tail distribution, $n_1 = 4$, $n_2 = 4$, $x_1 = 2$, $x_2 = 0$.
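As a rough numerical companion to (5)–(7), the sketch below computes both two-sided p-values for one observation and one value of $\Delta$. Assumptions not in the text: each maximum over $D(\Delta)$ is approximated on a finite grid of $p_1$ values, near-ties in $T$ are resolved with a small tolerance `TOL`, and $\tilde p_1$ is found by bisection on the score rather than by the closed-form formulae. The function names are hypothetical. For the Figure 1 example the text reports $2 \times 0.0268 = 0.0536$ for (5)–(6) and 0.0430 for (7); the grid maxima printed here approximate those values and are not guaranteed to reproduce them exactly.

```python
from math import comb, sqrt

TOL = 1e-9  # |T| values closer than this are treated as ties (floating-point guard)


def constrained_mle(x1, n1, x2, n2, delta, iters=60):
    """(p1~, p2~) under p1 - p2 = delta, by bisection on the decreasing score."""
    lo, hi = max(0.0, delta), min(1.0, 1.0 + delta)
    for _ in range(iters):
        p1 = 0.5 * (lo + hi)
        p2 = p1 - delta
        s = ((x1 / p1 if x1 else 0.0) - ((n1 - x1) / (1 - p1) if n1 - x1 else 0.0)
             + (x2 / p2 if x2 else 0.0) - ((n2 - x2) / (1 - p2) if n2 - x2 else 0.0))
        lo, hi = (p1, hi) if s > 0 else (lo, p1)
    p1 = min(max(0.5 * (lo + hi), 0.0), 1.0)
    return p1, min(max(p1 - delta, 0.0), 1.0)


def mn_T(x1, n1, x2, n2, delta):
    """Ordering statistic T(x1, x2, delta): the M&N statistic of Section 2.2."""
    p1t, p2t = constrained_mle(x1, n1, x2, n2, delta)
    var = p1t * (1 - p1t) / n1 + p2t * (1 - p2t) / n2
    # For -1 < delta < 1, var is 0 only when the numerator is 0 as well.
    return 0.0 if var <= 0.0 else (x1 / n1 - x2 / n2 - delta) / sqrt(var)


def exact_pvalues(x1, n1, x2, n2, delta, grid=1000):
    """Return (C&Z two-sided p-value, (5)-(6); Chen two-sided p-value, (7)).

    Assumption: each max over p1 in D(delta) uses grid - 1 interior points.
    """
    t_obs = mn_T(x1, n1, x2, n2, delta)
    # T does not depend on p1, so the sample space is ranked once per delta.
    space = [(a, b, mn_T(a, n1, b, n2, delta))
             for a in range(n1 + 1) for b in range(n2 + 1)]
    lo, hi = max(0.0, delta), min(1.0, 1.0 + delta)  # D(delta)
    best_up = best_low = best_two = 0.0
    for k in range(1, grid):
        p1 = lo + (hi - lo) * k / grid
        p2 = p1 - delta
        up = low = two = 0.0
        for a, b, t in space:
            pr = (comb(n1, a) * p1**a * (1 - p1) ** (n1 - a)
                  * comb(n2, b) * p2**b * (1 - p2) ** (n2 - b))
            if t >= t_obs - TOL:
                up += pr   # upper tail in (6)
            if t <= t_obs + TOL:
                low += pr  # lower tail in (6)
            if abs(t) >= abs(t_obs) - TOL:
                two += pr  # both tails, as in (7)
        best_up = max(best_up, up)
        best_low = max(best_low, low)
        best_two = max(best_two, two)
    return 2 * min(best_up, best_low), best_two


cz, chen = exact_pvalues(2, 4, 0, 4, -0.2)  # the Figure 1 example
print(round(cz, 4), round(chen, 4))
```

Because the observed tail is contained in the two-tailed event of (7), the Chen p-value is never smaller than the one-sided value (6), but it can be smaller than the doubled value (5).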

The exact unconditional method developed from (7) will be referred to hereafter as the Chen exact method.

The special ‘asymmetric’ feature of the exact distribution of $(X_1, X_2)$ may be addressed in different ways. For example, when deriving confidence intervals for $\Delta$ by inverting acceptance regions of the parameter, Santner and Yamagami [8] (S&Y) and Coe and Tamhane [9] (C&T) defined the acceptance region of $\Delta$ based on the total area of the ‘unaccepted’ parts, rather than doubling the area of the ‘unaccepted’ part on the observed side. That is, a $1-\alpha$ level acceptance region for the parameter $\Delta$, say $A_\Delta$, is defined as a set of $(X_1, X_2)$ such that

$$P(A^c_\Delta; \Delta) = \sup_{p_1 \in D(\Delta)} P\{(X_1, X_2) \in A^c_\Delta \mid p_1, \Delta\} \le \alpha$$

where $A^c_\Delta$ denotes the complement of $A_\Delta$ in the sample space of $(X_1, X_2)$, that is, the unaccepted parts. (Additional constraints are needed to define the $1-\alpha$ level acceptance region uniquely for a given $\Delta$; Santner and Yamagami [8] and Coe and Tamhane [9] applied different constraints.) Table I compares the four exact unconditional methods (C&Z, S&Y, C&T and Chen) for $n_1 = n_2 = 5$. (The results of the S&Y and C&T methods are taken directly from Coe and Tamhane [9].)

Table I. Ninety-five per cent exact confidence intervals for $\Delta$, $(n_1, n_2) = (5, 5)$.

| $(x_1, x_2)$ | C&Z | C&T | S&Y | Chen |
|---|---|---|---|---|
| (0, 0) | (−0.522, 0.522) | (−0.451, 0.451)∗ | (−0.462, 0.462)† | (−0.451, 0.451)∗ |
| (1, 1) | (−0.562, 0.562) | (−0.488, 0.488)∗ | (−0.489, 0.489)† | (−0.488, 0.488)∗ |
| (2, 2) | (−0.626, 0.626) | (−0.555, 0.555)† | (−0.556, 0.556)‡ | (−0.554, 0.554)∗ |
| (0, 1) | (−0.716, 0.339) | (−0.657, 0.268)∗ | (−0.658, 0.268)† | (−0.657, 0.268)∗ |
| (1, 2) | (−0.730, 0.425) | (−0.700, 0.339)∗ | (−0.700, 0.462)‡ | (−0.671, 0.388)† |
| (2, 3) | (−0.757, 0.475) | (−0.645, 0.393)∗ | (−0.647, 0.462)‡ | (−0.700, 0.400)† |
| (0, 2) | (−0.853, 0.187) | (−0.811, 0.100)∗ | (−0.811, 0.180)‡ | (−0.811, 0.171)† |
| (1, 3) | (−0.866, 0.305) | (−0.825, 0.238)∗ | (−0.826, 0.239)† | (−0.825, 0.238)∗ |
| (0, 3) | (−0.947, 0.030) | (−0.924, 0.000)∗ | (−0.927, 0.000)† | (−0.924, 0.000)∗ |
| (1, 4) | (−0.950, 0.112) | (−0.926, 0.100)∗ | (−0.904, 0.189)‡ | (−0.926, 0.110)† |
| (0, 4) | (−0.995, −0.110) | (−0.990, −0.100)‡ | (−0.990, −0.189)∗ | (−0.990, −0.110)† |
| (0, 5) | (−1.000, −0.383) | (−1.000, −0.339)‡ | (−1.000, −0.462)∗ | (−1.000, −0.388)† |

∗ Shortest interval. † Second shortest interval. ‡ Third shortest interval.

We note that by properly addressing the ‘asymmetric’ feature of (1), the C&T, S&Y and Chen methods substantially outperform the C&Z method in all of the illustrated cases. The Chen method ties with the C&T method as the winner of the comparison in most cases. An in-depth investigation of the S&Y and C&T methods is beyond the scope of this paper. We note, however, that the definitions of the acceptance region in S&Y and C&T are very complicated. The Chen and C&Z methods are much more straightforward to derive and easier to implement programmatically. In addition, the Chen and C&Z methods provide a direct correspondence to the p-value of the relevant hypothesis test; such a correspondence is not very clear in the C&T and S&Y methods.

One precaution should be taken when applying the p-value definition (7). As pointed out by a referee, the region $I(\alpha) = \{\Delta : \mathrm{Pvalue}_2(x_1, x_2 \mid T, \Delta) > \alpha\}$ could be a union of disjoint intervals under definition (7). As a result, to locate the lower and upper bounds of $I(\alpha)$ correctly following (7) (that is, to estimate $\Delta_l$ and $\Delta_u$ based on (7)), a grid search is necessary, starting separately from the boundaries of the domain of $\Delta$, −1 and +1, and stopping at the first identifiable ‘turning’ point next to each boundary. For an observation $(x_1, x_2)$ and a prespecified nominal level $\alpha$, if the starting point, say $\Delta_0$, gives $\mathrm{Pvalue}_2(x_1, x_2 \mid T, \Delta_0) > \alpha$, then the first identifiable ‘turning’ point next to $\Delta_0$, say $\Delta_t$, is the first point on the search route satisfying $\mathrm{Pvalue}_2(x_1, x_2 \mid T, \Delta_t) \le \alpha$; if $\mathrm{Pvalue}_2(x_1, x_2 \mid T, \Delta_0) < \alpha$, the first identifiable ‘turning’ point is the first point on the search route satisfying $\mathrm{Pvalue}_2(x_1, x_2 \mid T, \Delta_t) \ge \alpha$. (If $\mathrm{Pvalue}_2(x_1, x_2 \mid T, \Delta_0) = \alpha$, then $\Delta_t = \Delta_0$.) The step size of the grid search is determined by the required degree of accuracy. The same grid search procedure should also be applied to the C&Z exact method, since $I(\alpha)$ based on (5) and (6) could also be a union of disjoint intervals when the M&N statistic is used. (The S&Y and C&T methods also require grid searches on the domain of $\Delta$.)
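The boundary-inward grid search described above can be sketched as follows. This is a simplified reading of the procedure, not the author’s program: on each side it returns the first grid point, walking in from −1 or from +1, whose p-value exceeds α, i.e. the grid version of the infimum and supremum in (4). The names `grid_bounds` and `toy_pvalue` are hypothetical, and the toy curve is chosen only to show that a small disjoint piece of $I(\alpha)$ near a boundary is found.

```python
from math import exp, log


def grid_bounds(pvalue, alpha=0.05, n_steps=2000):
    """Grid estimates of inf and sup of I(alpha) = {delta : pvalue(delta) > alpha}.

    Walks inward from -1 and from +1 and stops at the first grid point inside
    I(alpha), so disjoint pieces of I(alpha) near the boundaries are not missed.
    `pvalue` is any callable delta -> two-sided p-value.
    """
    grid = [-1.0 + 2.0 * k / n_steps for k in range(n_steps + 1)]
    lower = next((d for d in grid if pvalue(d) > alpha), None)
    upper = next((d for d in reversed(grid) if pvalue(d) > alpha), None)
    return lower, upper


def toy_pvalue(delta):
    """Hypothetical p-value curve: a main bump plus a small disjoint spike."""
    bump = exp(-((delta - 0.1) / 0.2) ** 2)
    spike = 0.2 if 0.70 <= delta <= 0.72 else 0.0
    return max(bump, spike)


lower, upper = grid_bounds(toy_pvalue)
expected_lower = 0.1 - 0.2 * log(1 / 0.05) ** 0.5  # where the bump crosses alpha = 0.05
print(lower, upper)  # about -0.246 and 0.72; the upper bound comes from the spike
```

A bisection started at the point estimate, as suggested for the quasi-exact method in Section 2.3, would instead settle near 0.446 on the upper side of this toy curve and miss the spike.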


2.3. Quasi-exact method

While it is quite feasible nowadays to perform the complex computations of exact unconditional methods, a slightly relaxed view may still allow us to enjoy the benefits of the procedure with much simpler computations. A procedure with simpler calculation and comparable reliability is clearly more attractive in practice, especially when the sample size is relatively large, say > 20, or when the procedure must be applied repeatedly, for example, in simulation studies. Approximations are therefore considered to ease the computational load of the exact unconditional methods.

First, instead of searching for the maximum significance level as in (6) and (7), we approximate it by the significance level at a single point, the constrained maximum likelihood estimate of $(p_1, p_2)$. For example, (7) can be approximated by

$$\mathrm{Pvalue}_2(x_1, x_2 \mid T, \Delta) = \Pr\{(X_1, X_2) : |T(X_1, X_2, \Delta)| \ge |T(x_1, x_2, \Delta)| \mid \Delta, \tilde p_1\} \qquad (7a)$$

One rationale for this approximation is that, for a given $\Delta$, the likelihood of the observation $(x_1, x_2)$ is maximized at $p_1 = \tilde p_1$ and $p_2 = \tilde p_2$. Storer and Kim [13] and Kang and Chen [14] applied the same approximation to simplify the computation of exact unconditional tests of hypotheses on $\Delta$. Because of the approximation, the coverage probabilities of the corresponding interval estimates are no longer guaranteed to meet the nominal level. Newcombe [2], however, found that the coverage probabilities seldom fall below the nominal level when the approximation is applied to (6) with $T(x_1, x_2, \Delta) = \hat p_1 - \hat p_2$. Our evaluation in Section 3 further confirms this finding.

When applying (7a) to estimate $\Delta_l$ and $\Delta_u$, in theory the same grid search procedure as in the Chen exact method should be used, since $I(\alpha)$ based on (7a) could be disjoint. In practice, however, we find that $I(\alpha)$ based on (7a) (or (7)) is seldom disjoint, and whenever it is, the ‘extra’ intervals are always fairly narrow compared with the primary part, which includes the observation. Hence, we suggest ignoring the potential disjointedness resulting from definition (7a) and replacing the grid search with a bisection search that starts from a set of initial estimates of $\Delta_l$ and $\Delta_u$ and stops at the first identifiable ‘turning’ points next to the initial estimates. Although the simplified search procedure may produce an ‘overestimated’ $\Delta_l$ or an ‘underestimated’ $\Delta_u$, we believe the consequent influence on the overall coverage probability will be negligible. The interval estimates derived from (7a) following the simplified search procedure will be referred to hereafter as the Chen quasi-exact estimates. We use ‘quasi-exact’ because the exact distribution of $(X_1, X_2)$ is still used to calculate the p-value in (7a), despite the approximations applied.

The confidence interval for $\Delta$ can also be derived by inverting the mid-p-value of the observation. There has been considerable discussion about the application of the mid-p-value to discrete distributions (see Berry and Armitage [15] for a review). In the two-sample binomial case, the one-sided mid-p-value of the observation $(x_1, x_2)$ is defined as one-half of the probability of the observation plus the probability of more extreme events, that is

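$$\mathrm{MidPvalue}_1(x_1, x_2 \mid T, \Delta) = \max_{p_1 \in D(\Delta)} \Big( \Pr\{(X_1, X_2) : T(X_1, X_2, \Delta) > (\text{or} <)\ T(x_1, x_2, \Delta) \mid \Delta, p_1\} + \tfrac{1}{2}\Pr(x_1, x_2 \mid \Delta, p_1) \Big) \qquad (8)$$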


where > or < is used, whichever corresponds to the smaller tail. As with (5) and (7), there are at least two ways to define the two-sided mid-p-value:

$$\mathrm{MidPvalue}_2(x_1, x_2 \mid T, \Delta) = 2\,\mathrm{MidPvalue}_1(x_1, x_2 \mid T, \Delta) \qquad (9)$$

or

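$$\mathrm{MidPvalue}_2(x_1, x_2 \mid T, \Delta) = \max_{p_1 \in D(\Delta)} \Big( \Pr\{(X_1, X_2) : |T(X_1, X_2, \Delta)| > |T(x_1, x_2, \Delta)| \mid \Delta, p_1\} + \tfrac{1}{2}\Pr(x_1, x_2 \mid \Delta, p_1) \Big) \qquad (10)$$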

Although the rationale or interpretation of the mid-p-value is not easy to see, the ‘benefit’ of replacing the p-value with the mid-p-value in a discrete setting is obvious: it yields less conservative hypothesis tests, and thus less conservative interval estimates. For example, the interval estimates based on (9) are always narrower than those based on (5), since (9) is always smaller than (5); the same holds for (10) and (7). A trade-off, however, is that the coverage probabilities of interval estimates derived from mid-p-values may fall below the required nominal level. The interval estimates based on (9) and (10) are thus quasi-exact estimates. We compare the Chen quasi-exact method with the mid-p quasi-exact methods in Section 3, where it is of interest to see how these quasi-exact methods balance anticonservativeness and reliability in small sample cases. To simplify the computation, we similarly approximate (9) and (10) by

$$2\Pr\{(X_1, X_2) : T(X_1, X_2, \Delta) > (\text{or} <)\ T(x_1, x_2, \Delta) \mid \Delta, \tilde p_1\} + \Pr(x_1, x_2 \mid \Delta, \tilde p_1) \qquad (9a)$$

and

$$\Pr\{(X_1, X_2) : |T(X_1, X_2, \Delta)| > |T(x_1, x_2, \Delta)| \mid \Delta, \tilde p_1\} + \tfrac{1}{2}\Pr(x_1, x_2 \mid \Delta, \tilde p_1) \qquad (10a)$$

respectively. We refer to the methods based on (9a) and (10a) as mid-p method 1 and mid-p method 2, respectively.
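A compact sketch of the Chen quasi-exact method and the two mid-p variants follows. Each p-value, (7a), (9a) or (10a), is evaluated at the single point $\tilde p_1$, and the interval is found by bisection from a starting value toward −1 and +1, ignoring possible disjoint pieces of $I(\alpha)$ as suggested above. Assumptions beyond the text: the point estimate $\hat p_1 - \hat p_2$ is used as the initial estimate for both bounds (the text allows any initial estimates), $\tilde p_1$ is found by bisection on the score, and near-ties in $T$ use a small tolerance. All names are hypothetical.

```python
from math import comb, sqrt

TOL = 1e-9  # floating-point tolerance for ties in T


def constrained_mle(x1, n1, x2, n2, delta, iters=60):
    """(p1~, p2~) under p1 - p2 = delta, by bisection on the decreasing score."""
    lo, hi = max(0.0, delta), min(1.0, 1.0 + delta)
    for _ in range(iters):
        p1 = 0.5 * (lo + hi)
        p2 = p1 - delta
        s = ((x1 / p1 if x1 else 0.0) - ((n1 - x1) / (1 - p1) if n1 - x1 else 0.0)
             + (x2 / p2 if x2 else 0.0) - ((n2 - x2) / (1 - p2) if n2 - x2 else 0.0))
        lo, hi = (p1, hi) if s > 0 else (lo, p1)
    p1 = min(max(0.5 * (lo + hi), 0.0), 1.0)
    return p1, min(max(p1 - delta, 0.0), 1.0)


def mn_T(x1, n1, x2, n2, delta):
    """M&N ordering statistic T(x1, x2, delta)."""
    p1t, p2t = constrained_mle(x1, n1, x2, n2, delta)
    var = p1t * (1 - p1t) / n1 + p2t * (1 - p2t) / n2
    return 0.0 if var <= 0.0 else (x1 / n1 - x2 / n2 - delta) / sqrt(var)


def quasi_pvalues(x1, n1, x2, n2, delta):
    """(7a), (9a) and (10a): tail sums evaluated at p1 = p1~ only."""
    p1, p2 = constrained_mle(x1, n1, x2, n2, delta)

    def prob(a, b):
        return (comb(n1, a) * p1**a * (1 - p1) ** (n1 - a)
                * comb(n2, b) * p2**b * (1 - p2) ** (n2 - b))

    t_obs = mn_T(x1, n1, x2, n2, delta)
    up = low = ge = gt = 0.0
    for a in range(n1 + 1):
        for b in range(n2 + 1):
            t, p = mn_T(a, n1, b, n2, delta), prob(a, b)
            if t > t_obs + TOL:
                up += p   # strictly beyond the observation, upper side
            if t < t_obs - TOL:
                low += p  # strictly beyond the observation, lower side
            if abs(t) >= abs(t_obs) - TOL:
                ge += p   # at least as extreme in |T| (includes the observation)
            if abs(t) > abs(t_obs) + TOL:
                gt += p   # strictly more extreme in |T|
    p_obs = prob(x1, x2)
    return ge, 2 * min(up, low) + p_obs, gt + 0.5 * p_obs  # (7a), (9a), (10a)


def quasi_exact_interval(x1, n1, x2, n2, alpha=0.05, method=0, iters=40):
    """Invert (7a) [method=0], (9a) [1] or (10a) [2] by bisection from the point
    estimate toward -1 and +1, ignoring disjoint pieces of I(alpha)."""
    d_hat = x1 / n1 - x2 / n2

    def accepted(d):
        return quasi_pvalues(x1, n1, x2, n2, d)[method] > alpha

    def bisect(inside, outside):
        for _ in range(iters):
            mid = 0.5 * (inside + outside)
            if mid in (inside, outside):  # bracket has collapsed, e.g. d_hat = -1
                break
            if accepted(mid):
                inside = mid
            else:
                outside = mid
        return inside

    return bisect(d_hat, -1.0), bisect(d_hat, 1.0)


print(quasi_exact_interval(1, 5, 3, 5))            # Chen quasi-exact
print(quasi_exact_interval(1, 5, 3, 5, method=2))  # mid-p method 2
```

Since (10a) never exceeds (7a), every $\Delta$ accepted by mid-p method 2 is also accepted by the Chen quasi-exact p-value, which is why the mid-p method 2 intervals are nested inside the Chen quasi-exact intervals.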

3. PERFORMANCE COMPARISON

In this section we compare six methods (the M&N asymptotic method, the C&Z exact method, the Chen exact method, the Chen quasi-exact method, mid-p method 1 and mid-p method 2) in terms of coverage probability and expected width. For any $n_1$ and $n_2$, at a specified nominal level of $100(1-\alpha)$ per cent, the coverage probability of a method is a function of $\Delta$ for given $p_1$, say $\mathrm{Coverage}(\Delta \mid p_1)$. It can be calculated by

$$\mathrm{Coverage}(\Delta \mid p_1) = \sum_{x_1=0}^{n_1}\sum_{x_2=0}^{n_2} I(x_1, x_2, \Delta)\binom{n_1}{x_1}\binom{n_2}{x_2} p_1^{x_1}(1-p_1)^{n_1-x_1}(p_1-\Delta)^{x_2}(1-p_1+\Delta)^{n_2-x_2}$$

where $I(x_1, x_2, \Delta)$ is an indicator function showing whether or not the confidence interval covers the value $\Delta$, that is, $I(x_1, x_2, \Delta) = 1$ if $\Delta \in (\Delta_l(x_1, x_2), \Delta_u(x_1, x_2))$ and $I(x_1, x_2, \Delta) = 0$ otherwise.
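The coverage sum above, and the expected-width sum defined next, can be evaluated by enumerating every outcome $(x_1, x_2)$. In this sketch the method under study is passed in as a function; the textbook interval (2) is used only as a convenient stand-in, and the names `coverage_and_width` and `wald_interval` are hypothetical. Following the indicator $I(x_1, x_2, \Delta)$ as written, an interval covers $\Delta$ only if $\Delta$ lies strictly inside it.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(x1, n1, x2, n2, alpha=0.05):
    """Stand-in method: the textbook interval (2)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    a, b = x1 / n1, x2 / n2
    se = sqrt(a * (1 - a) / n1 + b * (1 - b) / n2)
    return a - b - z * se, a - b + z * se


def coverage_and_width(interval, n1, n2, p1, delta):
    """Coverage(delta | p1) and E(w | p1, delta) by full enumeration.

    `interval(x1, n1, x2, n2)` must return (lower, upper); p2 = p1 - delta.
    """
    p2 = p1 - delta
    cover = width = 0.0
    for x1 in range(n1 + 1):
        for x2 in range(n2 + 1):
            prob = (comb(n1, x1) * p1**x1 * (1 - p1) ** (n1 - x1)
                    * comb(n2, x2) * p2**x2 * (1 - p2) ** (n2 - x2))
            lo, hi = interval(x1, n1, x2, n2)
            if lo < delta < hi:          # I(x1, x2, delta) = 1
                cover += prob
            width += (hi - lo) * prob    # w(x1, x2) weighted by Pr(x1, x2)
    return cover, width


cov, ew = coverage_and_width(wald_interval, 10, 5, 0.3, 0.1)
print(f"coverage {cov:.4f}, expected width {ew:.4f}")
```

Repeating this call over a grid of $(p_1, \Delta)$ values, as described below, gives the kind of coverage and width summaries compared in Figures 2 to 5.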


Here we use the notation $(\Delta_l(x_1, x_2), \Delta_u(x_1, x_2))$ for the confidence interval derived from the observation $(x_1, x_2)$. Similarly, the expected width, say $E(w \mid p_1, \Delta)$, may be calculated by

$$E(w \mid p_1, \Delta) = \sum_{x_1=0}^{n_1}\sum_{x_2=0}^{n_2} w(x_1, x_2)\binom{n_1}{x_1}\binom{n_2}{x_2} p_1^{x_1}(1-p_1)^{n_1-x_1}(p_1-\Delta)^{x_2}(1-p_1+\Delta)^{n_2-x_2}$$

where $w(x_1, x_2) = \Delta_u(x_1, x_2) - \Delta_l(x_1, x_2)$ is the width of the interval for given $x_1$ and $x_2$. Note that the expected width of a given method is a function of $\Delta$ and $p_1$.

The six estimation methods are compared over a grid of quadruplets $(n_1, n_2, p_1, \Delta)$ under different small sample settings. The nominal level is chosen to be the conventional 95 per cent. For the equal sample case, we consider $n_1 = n_2 = n = 2(1)20$ (2 to 20 by 1); for the unequal case, we let $n_1 = 10$ and $n_2 = 2(1)9$ (2 to 9 by 1). For each $(n_1, n_2)$ and each method compared here, the 95 per cent confidence intervals are obtained for all possible outcomes. For each pair $(n_1, n_2)$, we consider a $49 \times 49$ grid of $(p_1, p_2)$ with $p_1, p_2 = 0.02(0.02)0.98$; correspondingly, $\Delta = p_1 - p_2 = -0.96(0.02)0.96$. Note that when $n_1 = n_2 = n$, the pairs $(p_1, p_2)$, $(1-p_1, 1-p_2)$, $(p_2, p_1)$ and $(1-p_2, 1-p_1)$ correspond to the same coverage probability and expected width; when $n_1 \ne n_2$, the pairs $(p_1, p_2)$ and $(1-p_1, 1-p_2)$ correspond to the same coverage probability and expected width. Hence, for the equal sample case, the number of unduplicated grid points we actually tested is $19 \times 25 \times (1+49)/2 = 11875$; for the unequal sample case, it is $8 \times 49 \times (1+49)/2 = 9800$. The six methods compared here all share some important invariance properties: if the confidence interval for $\Delta$ based on the observation $(x_1, x_2)$ is $(\Delta_l, \Delta_u)$ for one method, then the confidence interval based on the observation $(n_1 - x_1, n_2 - x_2)$ is $(-\Delta_u, -\Delta_l)$ for the same method; and when $n_1 = n_2$, the confidence interval for the observation $(x, x)$ satisfies $\Delta_u = -\Delta_l$.

Figures 2, 3 and 4 compare the mean coverage probability, the proportion of coverage probabilities falling below the nominal level, and the minimum coverage probability of the six estimation methods, respectively. The top and bottom graphs show the equal and unequal sample cases, respectively, and similar patterns are observed in both. The Chen quasi-exact method performs close to its exact origin (the Chen exact method), but is less conservative owing to the approximations. Similar patterns are found between the C&Z exact method and the C&Z quasi-exact method (not shown here). The mean coverage probabilities of the six methods are all greater than the nominal level, except for mid-p method 2 at a couple of $(n_1, n_2)$ combinations. The C&Z exact method is substantially more conservative than the other five methods. The Chen quasi-exact method shows a mean coverage probability comparable with those of mid-p method 1 and the M&N asymptotic method. Mid-p method 2 tends to be the least conservative method based on Figure 2. However, it has the largest violation rates (the proportions of coverage probabilities falling below the nominal level) throughout. As shown in Figure 3, the violation rates of mid-p method 2 mostly exceed 40 per cent and reach as high as 60 per cent. In contrast, the Chen quasi-exact method shows very good reliability: its violation rates are consistently lower than 10 per cent, and mostly less than 5 per cent. The violation rates of mid-p method 1 and the M&N method are smaller than those of mid-p method 2, but still substantially larger than those of the Chen quasi-exact method. We have reason to believe that the smaller mean coverage probability of mid-p method 2 (compared with the Chen quasi-exact method and others) is mostly a result of its much more frequent infringements of the nominal level. Furthermore, in Figure 4, we note that the Chen quasi-exact method maintains at least 93.5 per cent coverage probability throughout. The minimum coverage probability of mid-p method 2 falls below 90 per cent, that of the M&N asymptotic method is even less than 85 per cent, and mid-p method 1 has a minimum coverage probability of around 91 per cent. As expected, the minimum coverage probabilities of the C&Z exact method and the Chen exact method never fall below the nominal level. We compare the average expected width of the different estimation methods (relative to the C&Z exact method) in Figure 5. The Chen quasi-exact method has an expected width comparable with those of mid-p method 1 and the M&N asymptotic method. Since the interval estimates derived from mid-p method 2 are contained in those from the Chen quasi-exact method, the expected width of mid-p method 2 is consistently less than that of the Chen quasi-exact method. However, this advantage of mid-p method 2 is offset by its distinctly lower reliability compared with the Chen quasi-exact method.

Figure 2. Mean coverage probability of 95 per cent CI.

Figure 3. Proportion of coverage probability under 95 per cent nominal level.

4. CONCLUSION AND DISCUSSION

In general, we recommend the Chen quasi-exact method for estimating confidence intervals for $\Delta$ in small sample cases. The Chen quasi-exact method is much less conservative than most existing exact methods and much simpler to calculate. The approximations applied in the Chen quasi-exact method result in only slight and infrequent infringements of the nominal level. Compared with many other approximate methods, it demonstrates much better reliability in small sample cases. In summary, we find that, of the six methods compared in this paper, the Chen quasi-exact method offers the best balance between anticonservativeness and reliability in small sample cases. The method can easily be implemented in S-plus, and the program is available from the author upon request.

The Chen quasi-exact method can easily be extended to provide confidence intervals for $p_1/p_2$ if the ratio of proportions is of interest.

Figure 4. Minimum coverage probability of 95 per cent CI.


Figure 5. Ratio of average expected width (versus C&Z exact method).

ACKNOWLEDGEMENTS

The author is grateful to the editor and two anonymous referees for their very helpful comments.

REFERENCES

1. Gardner MJ, Altman DG (eds). Statistics with Confidence. British Medical Journal: London, 1989.

2. Newcombe RG. Interval estimation for the difference between independent proportions: comparison of eleven methods. Statistics in Medicine 1998; 17:873–890.

3. Tocher KD. Extension of the Neyman-Pearson theory of tests to discontinuous variates. Biometrika 1950; 37:130–144.


4. Thomas DG, Gart JJ. A table of exact confidence limits for differences and ratios of two proportions and their odds ratios. Journal of the American Statistical Association 1977; 72:73–76.

5. Santner TJ, Snell MK. Small-sample confidence intervals for p1 − p2 and p1/p2 in 2 × 2 contingency tables. Journal of the American Statistical Association 1980; 75:386–394.

6. Cytel Software. StatXact Version 3. Cytel Software: Cambridge, MA, 1995.

7. Chan ISF, Zhang Z. Test-based exact confidence intervals for the difference of two binomial proportions. Biometrics 1999; 55:1202–1209.

8. Santner TJ, Yamagami S. Invariant small sample confidence intervals for the difference of two success probabilities. Communications in Statistics 1993; 22:33–59.

9. Coe PR, Tamhane AC. Small sample confidence intervals for the difference, ratio and odds ratio of two success probabilities. Communications in Statistics 1993; 22:925–938.

10. Miettinen O, Nurminen M. Comparative analysis of two rates. Statistics in Medicine 1985; 4:213–226.

11. Mee RW. Confidence bounds for the difference between two probabilities. Biometrics 1984; 40:1175–1176.

12. Mantel N. Yates’s correction for continuity and analysis of 2 × 2 contingency tables: comment. Statistics in Medicine 1990; 9:369–370.

13. Storer BE, Kim C. Exact properties of some exact test statistics for comparing two binomial proportions. Journal of the American Statistical Association 1990; 85:146–155.

14. Kang SH, Chen JJ. An approximate unconditional test of non-inferiority between two proportions. Statistics in Medicine 2000; 19:2089–2100.

15. Berry G, Armitage P. Mid-P confidence intervals: a review. The Statistician 1995; 44:417–423.
