
Journal of Statistical Planning and Inference 65 (1997) 109-127

A k-stage sequential sampling procedure for estimation of normal mean

Wei Liu, Department of Mathematics, University of Southampton, Southampton SO17 1BJ, UK

Received 6 December 1995; received in revised form 3 December 1996

Abstract

We study a k-stage (k ≥ 3) sequential estimation procedure which includes the three-stage procedure of Hall (1981) as a special case. With a suitable value of k, the k-stage procedure not only can be as efficient as the fully sequential procedure of Anscombe, Chow and Robbins in terms of sample size, but also requires at most k sampling operations. For the problem of constructing a fixed-width confidence interval for the mean of a normal population with unknown variance, the three-stage procedure of Hall always needs a few more observations than the fully sequential procedure. The five-stage procedure, however, requires almost the same number of observations as the fully sequential procedure. © 1997 Elsevier Science B.V.

Keywords: Confidence intervals; Normal distribution; Sequential methods

1. Introduction

Let N(μ, σ²) be a normal population with mean μ and variance σ², both unknown, and suppose we wish to construct a confidence interval for μ with predetermined coverage probability γ and width 2d. Let n_0 = (zσ/d)² be the sample size which would have been used to achieve this goal had σ² been known, where z = Φ⁻¹((1 + γ)/2) and Φ and φ are, respectively, the cdf and pdf of a N(0, 1) random variable. The fully sequential procedure of Anscombe (1953), Robbins (1959) and Chow and Robbins (1965), denoted as the ACR procedure hereafter, takes observations one by one, and makes a decision after each observation. Let Y_1, Y_2, ... be independent observations from the N(μ, σ²) population and let Ȳ_n and σ̂²_n be, respectively, the mean and variance of the n-sample Y_1, ..., Y_n:

    Ȳ_n = (1/n) Σ_{i=1}^n Y_i,    σ̂²_n = (1/(n-1)) Σ_{i=1}^n (Y_i - Ȳ_n)².

0378-3758/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0378-3758(97)00048-7


Then the ACR procedure continues sampling until

    N = inf{n ≥ m_0 : n ≥ λ l_n σ̂²_n},

where m_0 (≥ 2) is the size of the first sample, λ = (z/d)² and {l_n} is a sequence of constants with l_n = 1 + 2ξ/n + o(n⁻¹) as n → ∞, where ξ is a given constant. On stopping sampling, construct the confidence interval I_N ≡ Ȳ_N ± d for μ.
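As an illustration, one run of the ACR rule can be simulated with a short routine. This is a minimal sketch in our own notation, not code from the paper; the function name, defaults and the concrete choices γ = 0.95 (z = 1.96), d = 0.5, σ = 2 (so n_0 ≈ 61) are ours, and l_n = 1 + 2ξ/n as above.

```python
import random
import statistics


def acr_interval(sample, d, z, m0=10, xi=0.0):
    """One run of the (simulated) ACR fully sequential procedure.

    `sample` draws one new observation per call; `d` is the half-width,
    `z` the normal quantile, and l_n = 1 + 2*xi/n.
    """
    lam = (z / d) ** 2
    ys = [sample() for _ in range(m0)]
    while True:
        n = len(ys)
        var_n = statistics.variance(ys)          # divisor n - 1, as in the paper
        if n >= lam * (1 + 2 * xi / n) * var_n:  # stopping rule defining N
            break
        ys.append(sample())                      # one observation at a time
    mean_n = statistics.fmean(ys)
    return len(ys), (mean_n - d, mean_n + d)     # N and the interval Y_N ± d


random.seed(1)
n_obs, (lo, hi) = acr_interval(lambda: random.gauss(0.0, 2.0), d=0.5, z=1.96)
print(n_obs, lo < 0.0 < hi)
```

Note that every pass through the loop is one sampling operation, which is exactly the cost the multi-stage procedures below are designed to avoid.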

The ACR procedure is very efficient in terms of sample size, since after each new observation it updates the estimate of σ² and checks whether enough observations have already been drawn. The following asymptotic result has been proved (see e.g. Woodroofe, 1977).

Theorem 1. Suppose that m_0 ≥ 4; then as n_0 → ∞,

    E(N) = n_0 + ν + 2ξ - 2 + o(1),

where ν = 3/2 - Σ_{n≥1} n⁻¹ E{(χ²_n - 2n)⁺} ≈ 0.817; moreover, if m_0 ≥ 7, then

    P{μ ∈ I_N} = γ + (2n_0)⁻¹ z φ(z){2(ν + 2ξ - 2) - (1 + z²)} + o(n_0⁻¹).

Therefore, at least for large values of n_0, the difference between the expected sample size of the ACR procedure and the 'optimal' sample size n_0 is about ν + 2ξ - 2, and the coverage probability, although not precisely known, is close to γ. Despite its great efficiency, the ACR procedure can be expensive to carry out since it is fully sequential. In many real situations, significant savings can be achieved by gathering many observations together.

From Theorem 1 we choose ξ = ξ_0 > 0 satisfying 2(ν + 2ξ_0 - 2) - (1 + z²) = 0, so that the coverage probability is equal to γ + o(n_0⁻¹). The corresponding average sample size is then given by

    E(N) = n_0 + (1 + z²)/2 + o(1).    (1.1)
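To make the step from Theorem 1 to (1.1) explicit, the defining equation for ξ_0 can be substituted directly into the expansion of E(N):

```latex
2(\nu + 2\xi_0 - 2) - (1 + z^2) = 0
  \;\Longrightarrow\; \nu + 2\xi_0 - 2 = \tfrac{1}{2}(1 + z^2),
\qquad
E(N) = n_0 + (\nu + 2\xi_0 - 2) + o(1) = n_0 + \tfrac{1}{2}(1 + z^2) + o(1).
```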

Stein (1945) proposed a two-stage procedure which allows us to construct a 2d-width confidence interval for μ whose coverage probability is at least γ, and it requires only two sampling operations to achieve this end. Let m_0 be the size of the first sample, and let

    T_1 = max{[(t_{m_0-1}^{(1-γ)/2} σ̂_{m_0}/d)²] + 1, m_0},

where [x] denotes the integer part of x and t_{m_0-1}^{(1-γ)/2} is the upper (1 - γ)/2 point of Student's t-distribution with m_0 - 1 degrees of freedom. Then Stein's procedure draws a second sample of size T_1 - m_0, and constructs the confidence interval I_{T_1} ≡ Ȳ_{T_1} ± d for μ. Stein's procedure can be considerably less efficient than the ACR procedure in terms of sample size, since it takes the second sample together and the size of the second sample depends entirely on the first sample. If the size of the first sample, m_0, is considerably smaller than the optimal sample size n_0, then Stein's procedure


often leads to substantial oversampling. Cox (1952) showed that E(T_1) - n_0 → +∞ as n_0 → ∞ if m_0(n_0)/n_0 → 0.
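Stein's rule is likewise only a few lines to simulate. Again this is our own sketch, not the paper's code; the caller supplies the t quantile, and the value 2.262 below is the standard upper 2.5% point of t with 9 degrees of freedom (γ = 0.95, m_0 = 10).

```python
import math
import random
import statistics


def stein_two_stage(sample, d, t_quantile, m0=10):
    """Sketch of Stein's (1945) two-stage procedure (our own coding).

    `t_quantile` is the upper (1 - gamma)/2 point of Student's t with
    m0 - 1 degrees of freedom.
    """
    ys = [sample() for _ in range(m0)]
    s2 = statistics.variance(ys)                 # sigma-hat^2 from the first stage
    t1 = max(int((t_quantile * math.sqrt(s2) / d) ** 2) + 1, m0)
    ys += [sample() for _ in range(t1 - m0)]     # second (and final) batch
    mean = statistics.fmean(ys)
    return t1, (mean - d, mean + d)


random.seed(2)
t1, (lo, hi) = stein_two_stage(lambda: random.gauss(0.0, 2.0), d=0.5, t_quantile=2.262)
print(t1, lo < 0.0 < hi)
```

The oversampling risk is visible in the code: T_1 is fixed by the first m_0 observations alone, and nothing after that can reduce it.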

As a compromise between Stein's two-stage procedure and the fully sequential ACR procedure, Hall (1981) proposed a three-stage sampling procedure. It takes a second sample of size T_2 - m_0, where m_0 is the size of the first sample as before, and

    T_2 = max{[c_1 λ σ̂²_{m_0}] + 1, m_0},

where 0 < c_1 < 1 is a predetermined constant. Calculate the variance σ̂²_{T_2} for the pooled sample of size T_2, then take a third sample of size T_3 - T_2, where

    T_3 = max{[λ σ̂²_{T_2} + η] + 1, T_2},

where η is a predetermined constant. On stopping sampling, construct the confidence interval I_{T_3} ≡ Ȳ_{T_3} ± d for μ.

The three-stage procedure requires one more sampling operation than the two-stage procedure. It is more robust to oversampling when m_0 is small, however, since after the second sample there is one more chance to adjust the final sample size T_3, and the estimate of σ² calculated after the second sample is more accurate. The following result has been proved by Hall (1981).

Theorem 2. Assume that

    n_0 → ∞,  m_0 = m_0(n_0) → ∞,  lim sup m_0/n_0 < c_1  and  n_0 = O(m_0^r),    (1.2)

where r ≥ 1 is a fixed constant. Then

    E(T_3) = n_0 - 2/c_1 + 1/2 + η + o(1),
    P{μ ∈ I_{T_3}} = γ + (2n_0)⁻¹ z φ(z){1 + 2η - (5 + z²)/c_1} + o(n_0⁻¹).

Therefore, in contrast to the two-stage procedure, E(T_3) - n_0 is bounded for the three-stage procedure. To facilitate the comparisons with the ACR procedure, we choose η = η_0 > 0 satisfying 1 + 2η_0 - (5 + z²)/c_1 = 0, so that the coverage probability is also equal to γ + o(n_0⁻¹) and the corresponding average sample size is given by

    E(T_3) = n_0 + (1 + z²)/(2c_1) + o(1).    (1.3)

From (1.1), (1.3) and 0 < c_1 < 1, at least asymptotically, E(N) < E(T_3), as one would expect. Although asymptotically E(T_3) can be made arbitrarily close to E(N) by setting the value of c_1 close to one, this fails for smaller samples for the following reasons. If c_1 is close to one then a large proportion of the total observations is taken in the second sample and so not much ground is left for manoeuvre in the third sample; even if the second sample turns out to be larger than the required total sample size, it cannot be reduced. Note that c_1 = 1 corresponds to a two-stage procedure. On the other hand, if c_1 is close to zero then too few observations are taken in the second sample and so the goal of getting a more accurate estimate of σ² (and hence of the total sample size) cannot be achieved. Again, c_1 = 0 corresponds to a two-stage procedure. From these observations, the value of c_1 should be neither too large (close to one) nor too small (close to zero). Indeed, the value of c_1 = 1/2 was recommended by Hall (1981) based on simulation results.

When c_1 = 1/2, then from (1.1) and (1.3) we have, at least asymptotically, that E(T_3) - E(N) ≈ (1 + z²)/2, which is equal to 2.4 when γ = 0.95 and 3.8 when γ = 0.99. So, on average, the (best) three-stage procedure will need about three observations more than the ACR procedure when γ = 0.95; this figure increases to four when γ = 0.99.

The purpose of this paper is to demonstrate that a k-stage procedure with a small value of k (≥ 3) can be as efficient as the ACR procedure in terms of sample size for both larger and smaller samples. For the problem of constructing fixed-width confidence intervals considered in this paper, the simulation results indicate that k = 5 is sufficient. A k-stage procedure then has the extra advantage that the number of sampling operations, k, is known before observing data and is often much smaller than that of a fully sequential ACR procedure. A general k-stage procedure is defined in Section 2. The asymptotic theory is contained in Section 3, with the proofs given in the appendix. The results of a series of Monte Carlo simulations are presented in Section 4. Some concluding remarks are contained in Section 5.

2. The k-stage procedure

Fix k (≥ 3) and the constants 0 < c_1 < ... < c_{k-2} < 1. Take a first sample of size m_0, and take the next k - 2 samples sequentially, with the ith (i = 2, ..., k - 1) sample of size M_{i-1} - M_{i-2}, where

    M_j = max{[c_j λ σ̂²_{M_{j-1}}] + 1, M_{j-1}},    1 ≤ j ≤ k - 2,

and M_0 = m_0. Then take a final sample of size M_{k-1} - M_{k-2}, where

    M_{k-1} = max{[λ σ̂²_{M_{k-2}} + η] + 1, M_{k-2}},

where η is a predetermined constant. On stopping sampling, construct the confidence interval I_{M_{k-1}} ≡ Ȳ_{M_{k-1}} ± d for μ. Denote this procedure by P_k(c_1, ..., c_{k-2}). Hall's (1981) three-stage procedure corresponds to k = 3.
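The sampling scheme above can be sketched as follows. This is our own minimal coding, not the paper's; `cs` plays the role of c_1 < ... < c_{k-2}, λ = (z/d)², and the illustrative constants (γ = 0.95, so z = 1.96 and η_0 = 4.421/c_{k-2} - 0.5, as computed in Section 4) are assumptions of the example.

```python
import random
import statistics


def k_stage(sample, d, z, cs, eta, m0=10):
    """Sketch of the k-stage procedure P_k(c_1, ..., c_{k-2}); our own coding.

    Returns the final sample size M_{k-1} and the interval Y-bar ± d.
    """
    lam = (z / d) ** 2
    ys = [sample() for _ in range(m0)]           # first stage, M_0 = m0
    m = m0
    for c in cs:                                 # stages 2, ..., k - 1
        s2 = statistics.variance(ys)
        m_next = max(int(c * lam * s2) + 1, m)   # M_j = max{[c_j lam s2] + 1, M_{j-1}}
        ys += [sample() for _ in range(m_next - m)]
        m = m_next
    s2 = statistics.variance(ys)                 # final stage, M_{k-1}
    m_final = max(int(lam * s2 + eta) + 1, m)
    ys += [sample() for _ in range(m_final - m)]
    mean = statistics.fmean(ys)
    return m_final, (mean - d, mean + d)


random.seed(3)
# P_5(0.4, 0.7, 0.9) with gamma = 0.95: z = 1.96 and eta_0 = 4.421/0.9 - 0.5
m_final, (lo, hi) = k_stage(lambda: random.gauss(0.0, 2.0), d=0.5, z=1.96,
                            cs=(0.4, 0.7, 0.9), eta=4.421 / 0.9 - 0.5)
print(m_final, lo < 0.0 < hi)
```

With `cs=(c_1,)` the loop runs once and the sketch reduces to Hall's three-stage procedure, mirroring the remark that k = 3 is a special case.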

3. Asymptotic theory

The assumptions used to establish the asymptotic results are the same as those used by Hall (1981), which are given in (1.2).

Theorem 3. If (1.2) holds, then as n_0 → ∞,

    E(M_{k-1}) = n_0 - 2/c_{k-2} + 1/2 + η + o(1)

and

    P{μ ∈ I_{M_{k-1}}} = γ + (2n_0)⁻¹ z φ(z){1 + 2η - (5 + z²)/c_{k-2}} + o(n_0⁻¹).

The proof of the theorem is outlined in the appendix. It is interesting to note from Theorem 3 that the larger-sample behaviour of the k-stage procedure depends (mainly) on the value of c_{k-2} but not on k and the other c_i's. This, however, is not true for smaller samples, as has already been pointed out for the three-stage procedure in Section 1. By reparametrization, the following results on E(M_i), i = 2, ..., k - 2, follow directly from Theorem 3. The result on E(M_1) can be proved directly.

Corollary 4. If (1.2) holds, then as n_0 → ∞,

    E(M_i) = c_i n_0 - 2c_i/c_{i-1} + 1/2 + o(1),    i = 2, ..., k - 2,
    E(M_1) = c_1 n_0 + 1/2 + o(1).

From Theorem 3, we set η = η_0 = (5 + z²)/(2c_{k-2}) - 1/2 > 0, so that the coverage probability is equal to γ + o(n_0⁻¹); the corresponding average total sample size is then given by

    E(M_{k-1}) = n_0 + (1 + z²)/(2c_{k-2}) + o(1).    (3.1)
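The algebra behind (3.1) is the same as for (1.3): substituting η_0 into the expansion of E(M_{k-1}) in Theorem 3 gives

```latex
E(M_{k-1}) = n_0 - \frac{2}{c_{k-2}} + \frac{1}{2} + \eta_0 + o(1)
           = n_0 + \frac{(5 + z^2) - 4}{2c_{k-2}} + o(1)
           = n_0 + \frac{1 + z^2}{2c_{k-2}} + o(1).
```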

From (3.1), E(M_{k-1}) > E(N) asymptotically. Result (3.1) also suggests that the value of c_{k-2} should be set close to unity in order to reduce E(M_{k-1}) - E(N). This fails when k = 3, however, for the reasons given below (1.3). Nevertheless, we demonstrate in the next section that E(M_{k-1}) can be made arbitrarily close to E(N) by choosing a suitable combination of k and c_1, ..., c_{k-2} with c_{k-2} close to unity.

4. Simulation results

Although large-sample results suggest that the performance of a multi-stage procedure will not be improved by increasing the number of stages, this is not true for smaller samples. In a series of Monte-Carlo trials conducted, we set l_n = 1 + 2ξ_0/n and η = η_0, so that the coverage probabilities of the ACR and the k-stage procedures are both equal to γ + o(n_0⁻¹). Varying γ from 0.90 to 0.99 leads to similar results. So we shall report in detail only for γ = 0.95, which implies that z = 1.96, 2ξ_0 = 3.604 and η_0 = 4.421/c_{k-2} - 0.5. Note that both the ACR and the k-stage procedures depend on d and σ only through d/σ, and d/σ = z/√n_0. Therefore, the simulation results can be presented in terms of n_0, as we did in Table 1. It is not necessary to list the values of both n_0 and d as in Hall (1981). We used a wide range of values of n_0 and m_0 = 10, 20.

Table 1 contains the results of some of the Monte-Carlo trials conducted, with each entry based on 10000 trials. The four procedures presented are P_0, the ACR


Table 1
Results of 10000 Monte-Carlo trials with γ = 0.95, 2ξ_0 = 3.604 and η_0 = 4.421/c_{k-2} - 0.5

                    m_0 = 10                          m_0 = 20
  n_0         M̄    M̄ - n_0   s_M      p        M̄    M̄ - n_0   s_M      p

   24  P_0   26.0    2.0     7.2   0.946     27.1    3.1     5.7   0.955
       P_5   26.5    2.5     7.8   0.946     28.2    4.2     6.0   0.959
       P_4   27.2    3.2     8.0   0.949     28.9    4.9     6.2   0.959
       P_3   30.6    6.6     8.4   0.961     32.8    8.8     7.4   0.970
   43  P_0   44.9    1.9     9.9   0.942     45.2    2.2     9.4   0.945
       P_5   45.0    2.0    11.4   0.942     45.6    2.6    10.2   0.945
       P_4   45.3    2.3    12.0   0.942     46.2    3.2    10.6   0.948
       P_3   47.6    4.6    14.0   0.947     49.9    6.9    11.5   0.960
   61  P_0   63.1    2.1    11.7   0.946     63.2    2.2    11.4   0.948
       P_5   62.9    1.9    13.5   0.944     63.5    2.5    12.7   0.948
       P_4   63.0    2.0    14.7   0.946     63.7    2.7    13.4   0.945
       P_3   65.3    4.3    17.8   0.943     66.4    5.4    15.4   0.949
   76  P_0   78.2    2.2    12.8   0.946     78.2    2.2    12.8   0.946
       P_5   78.1    2.1    14.7   0.945     78.4    2.4    14.0   0.945
       P_4   78.1    2.1    16.2   0.942     78.5    2.5    14.8   0.944
       P_3   80.0    4.0    20.3   0.946     80.5    4.5    18.1   0.950
   96  P_0   98.2    2.2    14.2   0.947     98.2    2.2    14.1   0.947
       P_5   98.2    2.2    16.1   0.944     98.4    2.4    15.6   0.949
       P_4   98.5    2.5    17.7   0.946     98.5    2.5    16.9   0.944
       P_3  100.1    4.1    22.9   0.945    100.5    4.5    20.9   0.949
  125  P_0  127.3    2.3    15.9   0.948    127.3    2.3    15.9   0.948
       P_5  127.5    2.5    18.0   0.946    127.2    2.2    17.5   0.947
       P_4  127.8    2.8    20.0   0.942    127.4    2.4    18.7   0.947
       P_3  129.1    4.1    26.3   0.944    129.0    4.0    24.2   0.941
  171  P_0  173.3    2.3    18.7   0.947    173.3    2.3    18.7   0.947
       P_5  173.6    2.6    20.5   0.946    173.7    2.7    20.1   0.945
       P_4  174.4    3.4    22.5   0.945    173.8    2.8    21.6   0.946
       P_3  175.9    4.9    30.5   0.946    175.1    4.1    28.0   0.946
  246  P_0  248.5    2.5    22.3   0.947    248.5    2.5    22.3   0.947
       P_5  248.9    2.9    23.9   0.950    248.7    2.7    24.0   0.951
       P_4  249.8    3.8    26.8   0.948    249.0    3.0    25.6   0.950
       P_3  250.7    4.7    38.4   0.948    250.7    4.7    33.8   0.949
  384  P_0  386.4    2.4    27.9   0.947    386.4    2.4    27.9   0.947
       P_5  387.0    3.0    30.0   0.946    386.6    2.6    29.6   0.949
       P_4  388.8    4.8    34.9   0.950    387.4    3.4    31.6   0.948
       P_3  389.7    5.7    47.0   0.946    388.6    4.6    42.3   0.944

procedure, P_3 ≡ P_3(0.5), P_4 ≡ P_4(0.5, 0.8), and P_5 ≡ P_5(0.4, 0.7, 0.9). For each procedure we computed the average total sample size M̄, the standard deviation of the total sample size s_M, and the proportion of times p that μ is covered by the confidence intervals.


From Table 1, it can be seen that all the coverage probabilities are very close to the target value γ = 0.95. With respect to the expected sample sizes, P_3 always needs a few more observations than P_0. But the differences between P_5 and P_0 are negligible, and so P_5 is almost as efficient as P_0 in terms of total sample size.

Also note that, asymptotically, the ACR procedure requires on average 1 + n_0 + (1 + z²)/2 - m_0 sampling operations. The k-stage procedure, on the other hand, requires at most k sampling operations, which is substantially less than that of the ACR procedure, especially when n_0 - m_0 is large. Therefore, a k-stage procedure with suitable values of k and the c_i's is preferable to the ACR procedure.
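A small-scale version of the Table 1 experiment can be rerun with the sketch below. This is our own code, not the paper's: it uses 2000 trials rather than 10000 and only P_3 and P_5, so the figures are rough, but they should fall in the same range as the n_0 ≈ 61 row of Table 1.

```python
import random
import statistics


def k_stage_trial(mu, sigma, d, z, cs, eta, m0, rng):
    """One trial of P_k (our own compact coding of the procedure in Section 2)."""
    lam = (z / d) ** 2
    ys = [rng.gauss(mu, sigma) for _ in range(m0)]
    m = m0
    for c in list(cs) + [None]:                  # None marks the final stage
        s2 = statistics.variance(ys)
        target = lam * s2 * c if c is not None else lam * s2 + eta
        m_next = max(int(target) + 1, m)
        ys += [rng.gauss(mu, sigma) for _ in range(m_next - m)]
        m = m_next
    return m, abs(statistics.fmean(ys) - mu) <= d


rng = random.Random(4)
z, d = 1.96, 0.5                                 # gamma = 0.95; sigma = 2 gives n0 ≈ 61
for name, cs in [("P3", (0.5,)), ("P5", (0.4, 0.7, 0.9))]:
    eta = 4.421 / cs[-1] - 0.5                   # eta_0 = 4.421/c_{k-2} - 0.5
    results = [k_stage_trial(0.0, 2.0, d, z, cs, eta, 10, rng)
               for _ in range(2000)]
    avg_m = statistics.fmean(m for m, _ in results)
    cover = statistics.fmean(c for _, c in results)
    print(f"{name}: average M = {avg_m:.1f}, coverage = {cover:.3f}")
```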

5. Concluding remarks

We have demonstrated that, by choosing suitable values of k and the c_i's, the k-stage procedure can be as efficient as the ACR procedure in terms of sample size. But, of course, the k-stage procedure requires substantially fewer sampling operations than the ACR procedure. The problem of constructing fixed-width confidence intervals serves only to demonstrate the idea, which can be used to deal with many other problems, for example, sequential estimation (see e.g. Woodroofe, 1977), ranking and selection (see e.g. Mukhopadhyay and Solanky, 1994), hypothesis testing (see e.g. Wittes and Brittain, 1990), and simultaneous confidence intervals (see e.g. Liu, 1995).

Acknowledgements

I would like to thank a referee for reading the manuscript very carefully and for useful comments.

Appendix

Although we can proceed to prove a result corresponding to Theorem 2 of Hall (1981, p. 1233) under (1.2) and the assumption that E|Y_1|^{4r} < ∞ (the normality of Y_i is not necessary), the proof given below uses the normality of Y_i. This allows us to employ the Helmert transformation and work only with the partial sums of i.i.d. random variables, and hence simplifies the proofs; e.g., the conditional expectation result of Lemma 2 in Hall (1981, p. 1233) can be simplified considerably. Throughout the appendix, C_0 denotes a generic constant and I(S) denotes the indicator function of the set S.

Let X_i = {i(i + 1)}⁻¹ (i Y_{i+1} - Σ_{j=1}^i Y_j)²/σ², i = 1, 2, ..., which are independent χ²_1 random variables. Denote X̄_n = (1/n) Σ_{i=1}^n X_i and S_n = Σ_{i=1}^n X_i. Let c_0 = c_{k-1} = 1,


N_0 = m_0 - 1 and

    N_1 = max{[c_1 n_0 X̄_{N_0}], N_0},    L_1 = [c_1 n_0 X̄_{N_0}],
    ...
    N_{k-2} = max{[c_{k-2} n_0 X̄_{N_{k-3}}], N_{k-3}},    L_{k-2} = [c_{k-2} n_0 X̄_{N_{k-3}}],
    N_{k-1} = max{[n_0 X̄_{N_{k-2}} + η], N_{k-2}},    L_{k-1} = [n_0 X̄_{N_{k-2}} + η].

Then it is clear that M_i = N_i + 1, i = 1, ..., k - 1. We shall prove

    E(N_{k-1}) = n_0 - Var(X_1)/c_{k-2} - 1/2 + η + o(1),    (A.1)
    Var(N_{k-1}) = n_0 Var(X_1)/c_{k-2} + o(n_0),    (A.2)
    E|N_{k-1} - E N_{k-1}|³ = o(n_0²).    (A.3)
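The Helmert-transformation identity underlying the X_i (namely, that the sample variance of Y_1, ..., Y_{n+1} equals σ² X̄_n) can be checked numerically; the code below is our own sanity check, not part of the paper.

```python
import random
import statistics

# With X_i = {i(i+1)}^{-1} (i*Y_{i+1} - sum_{j<=i} Y_j)^2 / sigma^2, the sample
# variance (divisor n) of Y_1, ..., Y_{n+1} equals sigma^2 * Xbar_n.
random.seed(5)
sigma = 2.0
ys = [random.gauss(1.0, sigma) for _ in range(12)]   # n + 1 = 12 observations
n = len(ys) - 1
xs = [(i * ys[i] - sum(ys[:i])) ** 2 / (i * (i + 1)) / sigma ** 2
      for i in range(1, n + 1)]                      # ys[i] is Y_{i+1} (0-indexed)
lhs = statistics.variance(ys)                        # divisor n, as in the paper
rhs = sigma ** 2 * statistics.fmean(xs)              # sigma^2 * Xbar_n
print(abs(lhs - rhs) < 1e-9)
```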

We first state Lemma A.1, which collects all the asymptotically negligible sets that will be used in the sequel.

Lemma A.1. Let

    A_i(ε) = {|X̄_{N_i} - 1| ≥ ε},    i = 0, 1, ..., k - 2,
    B_i = {N_i = N_{i-1}},    i = 1, ..., k - 1,
    C_i(ε) = {|N_i - c_i n_0| ≥ ε c_i n_0},    i = 1, ..., k - 2.

Then for any ε > 0 there exist 0 < δ_0 < 1 and 0 < δ_1 = δ_1(ε) < 1 such that, for all sufficiently large n_0,

    B_1 ⊆ {|X̄_{N_0} - 1| ≥ δ_0} = A_0(δ_0),
    C_i(ε) ⊆ B_i ∪ A_{i-1}(δ_1),
    B_{i+1} ⊆ C_i(δ_0) ∪ {|X̄_{[(1-δ_0)c_i n_0]} - 1| ≥ δ_0},
    A_i(ε) ⊆ C_i(δ_1) ∪ {|X̄_{[(1-δ_1)c_i n_0]} - 1| ≥ δ_1} ∪ {|X̄_{[(1+δ_1)c_i n_0]} - 1| ≥ δ_1}

for i = 1, ..., k - 2.

From the assumption n_0 = O(m_0^r) of (1.2) and the Markov inequality, we have that as n_0 → ∞

    P{|X̄_{N_0} - 1| ≥ δ_0} = O(n_0^{-p})  and  P{|X̄_{[(1±ε)c_i n_0]} - 1| ≥ δ_1} = O(n_0^{-p})

for any p > 0, since E(X_1^{2p}) < ∞. It follows therefore from Lemma A.1 that, for all sufficiently small ε > 0 and any p > 0,

    P(A_i(ε)) = O(n_0^{-p}),  P(B_i) = O(n_0^{-p})  and  P(C_i(ε)) = O(n_0^{-p})  as n_0 → ∞

for all those A_i(ε), B_i and C_i(ε) defined in Lemma A.1.


Lemma A.2. We have that, for some constant C_0,

    E{N_{k-3}(X̄_{N_{k-3}} - 1)} = 0,
    E{N_{k-3}(X̄_{N_{k-3}} - 1)²} = Var(X_1) + E{((S_{N_{k-4}} - N_{k-4})² - N_{k-4} Var(X_1))/N_{k-3}},
    E{(√N_i (X̄_{N_i} - 1))⁴} ≤ C_0,    i = 0, ..., k - 1,
    E{(S_{N_i} - N_i)² - N_i Var(X_1)} = 0,    i = 0, ..., k - 1.

Proof. The expectation on the right-hand side of the second assertion is taken to be zero if k = 3. The first two assertions are clearly true if k = 3, and can be proved in a fairly straightforward way by using the conditional expectation formula, conditioning on X_1, ..., X_{N_{k-4}}, when k ≥ 4. We use mathematical induction to prove the third assertion. It is true when i = 0; assume that it is true when i = j (0 ≤ j ≤ k - 2). Then, when i = j + 1, we have

    E{(S_{N_{j+1}} - N_{j+1})⁴/N²_{j+1}}
      = E{N⁻²_{j+1}((S_{N_j} - N_j)⁴ + 6(S_{N_j} - N_j)²(N_{j+1} - N_j) Var(X_1)
          + 4(S_{N_j} - N_j)(N_{j+1} - N_j) E{(X_1 - 1)³} + (N_{j+1} - N_j) E{(X_1 - 1)⁴})}
      ≤ E{(√N_j (X̄_{N_j} - 1))⁴} + 6 E{(√N_j (X̄_{N_j} - 1))²} Var(X_1)
          + 4 E(√N_j |X̄_{N_j} - 1|) E{(X_1 - 1)³} + E{(X_1 - 1)⁴},

which is bounded by the Cauchy-Schwarz inequality and the inductive assumption that E{(√N_j (X̄_{N_j} - 1))⁴} is bounded. The fourth assertion also follows directly from a conditional argument. □

Lemma A.3. As n_0 → ∞, we have that, for any 0 ≤ j ≤ k - 1 and ε > 0,

    E(N_j³ I(A_i(ε))) = o(1),  E(N_j³ I(B_i)) = o(1)  and  E(N_j³ I(C_i(ε))) = o(1)

for all those A_i(ε), B_i and C_i(ε) defined in Lemma A.1.

Proof. Let H be either A_i(ε) or B_i or C_i(ε). We shall show E{N_j³ I(H)} = o(1) by using mathematical induction on j. This is true when j = 0 by Lemma A.1; assume this is true when j = l (0 ≤ l ≤ k - 2). Then, when j = l + 1, it follows from the inductive assumption and Lemma A.1 that

    E{N³_{l+1} I(H)} ≤ E{(c_{l+1} n_0 X̄_{N_l} + 1 + N_l)³ I(H)}
                    ≤ 9 c³_{l+1} n_0³ E{(X̄_{N_l})³ I(H)} + o(1).

It remains to show that

    (∗) ≡ n_0³ E{(X̄_{N_l})³ I(H)} = o(1),

which can be seen from Lemmas A.1 and A.2 and

    (∗) ≤ 4 n_0³ E{|X̄_{N_l} - 1|³ I(H)} + 4 n_0³ P(H)
        ≤ 4 n_0³ (E{(X̄_{N_l} - 1)⁴})^{3/4} (P(H))^{1/4} + o(1) = o(1).

The proof is thus completed. □

Lemma A.4. As n_0 → ∞, n_0 E(X̄_{N_{k-2}}) = n_0 - Var(X_1)/c_{k-2} + o(1).

Proof. Let

    D_{k-3} = B_{k-2} ∪ A_{k-3}(ε) ∪ C_{k-3}(ε)  if k ≥ 4,
    D_{k-3} = B_{k-2} ∪ A_{k-3}(ε)  if k = 3,

with ε > 0. We shall show that, as n_0 → ∞,

    n_0 E{X̄_{N_{k-2}} I(D_{k-3})} = o(1),
    n_0 E{X̄_{N_{k-2}} I(D^c_{k-3})} = n_0 - Var(X_1)/c_{k-2} + o(1),    (A.4)

from which the lemma follows. First we have

    n_0 E{X̄_{N_{k-2}} I(D_{k-3})} = n_0 E{I(D_{k-3}) E(X̄_{N_{k-2}} | X_1, ..., X_{N_{k-3}})}
      = n_0 E{I(D_{k-3})((N_{k-3}/N_{k-2})(X̄_{N_{k-3}} - 1) + 1)}
      ≤ n_0 E(I(D_{k-3}) |X̄_{N_{k-3}} - 1|) + n_0 P(D_{k-3})
      ≤ n_0 {P(D_{k-3}) E(√N_{k-3} |X̄_{N_{k-3}} - 1|)²}^{1/2} + n_0 P(D_{k-3}),

which is o(1) since n_0² P(D_{k-3}) = o(1) by Lemma A.1 and E(√N_{k-3} |X̄_{N_{k-3}} - 1|)² is bounded by Lemma A.2.

To prove (A.4), note that

    c_{k-2} n_0 E{X̄_{N_{k-2}} I(D^c_{k-3})}
      = c_{k-2} n_0 E{I(D^c_{k-3}) E(X̄_{N_{k-2}} | X_1, ..., X_{N_{k-3}})}
      = E{I(D^c_{k-3}) (c_{k-2} n_0 / N_{k-2}) N_{k-3}(X̄_{N_{k-3}} - 1)} + c_{k-2} n_0 P(D^c_{k-3})
      = E{I(D^c_{k-3})(1 - (X̄_{N_{k-3}} - 1) + R_{k-3}) N_{k-3}(X̄_{N_{k-3}} - 1)} + c_{k-2} n_0 + o(1),    (A.5)

where R_{k-3} ≡ c_{k-2} n_0 / N_{k-2} - 1 + (X̄_{N_{k-3}} - 1). Now, on D^c_{k-3}, we have R_{k-3} = c_{k-2} n_0 / [c_{k-2} n_0 X̄_{N_{k-3}}] - 1 + (X̄_{N_{k-3}} - 1), and it can be shown in a straightforward way that |R_{k-3}| ≤ C_0 {(X̄_{N_{k-3}} - 1)² + (c_{k-2} n_0)⁻¹} for some constant C_0. Consequently,

    |E{I(D^c_{k-3}) R_{k-3} N_{k-3}(X̄_{N_{k-3}} - 1)}|
      ≤ C_0 E{I(D^c_{k-3})((X̄_{N_{k-3}} - 1)² + (c_{k-2} n_0)⁻¹) N_{k-3} |X̄_{N_{k-3}} - 1|}
      ≤ C_0 ε E{I(D^c_{k-3})((X̄_{N_{k-3}} - 1)² + (c_{k-2} n_0)⁻¹) N_{k-3}}
      ≤ C_0 ε (E{N_{k-3}(X̄_{N_{k-3}} - 1)²} + (1 + ε) c_{k-3}/c_{k-2}) → 0  as ε → 0    (A.6)

since E{N_{k-3}(X̄_{N_{k-3}} - 1)²} is bounded by Lemma A.2.

We also have

    |E{I(D^c_{k-3}) N_{k-3}(X̄_{N_{k-3}} - 1)}|
      = |E{N_{k-3}(X̄_{N_{k-3}} - 1)} - E{I(D_{k-3}) N_{k-3}(X̄_{N_{k-3}} - 1)}|
      = |E{I(D_{k-3}) √N_{k-3} · √N_{k-3}(X̄_{N_{k-3}} - 1)}|
      ≤ (E{I(D_{k-3}) N_{k-3}} E{N_{k-3}(X̄_{N_{k-3}} - 1)²})^{1/2} = o(1)    (A.7)

by Lemmas A.2 and A.3. Finally, we show that

    E{I(D^c_{k-3}) N_{k-3}(X̄_{N_{k-3}} - 1)²} = Var(X_1) + o(1),    (A.8)

and then (A.4) follows from (A.5)-(A.8). It follows from Lemma A.2 that

    E{I(D^c_{k-3}) N_{k-3}(X̄_{N_{k-3}} - 1)²}
      = E{N_{k-3}(X̄_{N_{k-3}} - 1)²} - E{I(D_{k-3}) N_{k-3}(X̄_{N_{k-3}} - 1)²}
      = E{N_{k-3}(X̄_{N_{k-3}} - 1)²} + o(1)
      = Var(X_1) + E{((S_{N_{k-4}} - N_{k-4})² - N_{k-4} Var(X_1))/N_{k-3}} + o(1),

and so (A.8) is obviously true when k = 3. It remains to show that, when k ≥ 4,

    E{((S_{N_{k-4}} - N_{k-4})² - N_{k-4} Var(X_1))/N_{k-3}} = o(1).

Let

    H = C_{k-3}(ε)  if k = 4,
    H = C_{k-3}(ε) ∪ C_{k-4}(ε)  if k ≥ 5,

with ε > 0; then it follows from Lemmas A.1 and A.2 that

    |E{I(H)((S_{N_{k-4}} - N_{k-4})² - N_{k-4} Var(X_1))/N_{k-3}}|
      ≤ E{I(H)(N_{k-4}(X̄_{N_{k-4}} - 1)² + Var(X_1))}
      ≤ (E{(N_{k-4}(X̄_{N_{k-4}} - 1)²)²} P(H))^{1/2} + Var(X_1) P(H) = o(1).    (A.9)

We also have that, for all sufficiently large n_0,

    E{I(H^c)((S_{N_{k-4}} - N_{k-4})² - N_{k-4} Var(X_1))/N_{k-3}}
      ≤ E{I(H^c)(S_{N_{k-4}} - N_{k-4})²/((1 - ε) c_{k-3} n_0)}
          - E{I(H^c) N_{k-4} Var(X_1)/((1 + ε) c_{k-3} n_0)}
      = E{I(H^c)((S_{N_{k-4}} - N_{k-4})² - N_{k-4} Var(X_1))/((1 - ε) c_{k-3} n_0)}
          + E{I(H^c) N_{k-4} Var(X_1) 2ε/((1 - ε)(1 + ε) c_{k-3} n_0)}
      = -E{I(H)((S_{N_{k-4}} - N_{k-4})² - N_{k-4} Var(X_1))/((1 - ε) c_{k-3} n_0)}
          + E{I(H^c) N_{k-4} Var(X_1) 2ε/((1 - ε)(1 + ε) c_{k-3} n_0)}
      ≤ o(1) + c_{k-4} Var(X_1) 2ε/((1 - ε)(1 + ε) c_{k-3}) → 0  as ε → 0,    (A.10)

where the second equality follows from the fourth assertion of Lemma A.2, and the o(1) follows from the Cauchy-Schwarz inequality and Lemmas A.2 and A.3. A similar argument establishes that

    E{I(H^c)((S_{N_{k-4}} - N_{k-4})² - N_{k-4} Var(X_1))/N_{k-3}}
      ≥ o(1) - c_{k-4} Var(X_1) 2ε/((1 - ε)(1 + ε) c_{k-3}) → 0  as ε → 0.    (A.11)

Now (A.8) follows clearly from (A.9)-(A.11), and so the proof is completed. □

The following lemma can be proved easily by using Lemma A.3.

Lemma A.5. As n_0 → ∞, E(N_{k-1}) = E(L_{k-1}) + o(1).

Lemma A.6. As n_0 → ∞, U_{n_0} ≡ n_0 X̄_{L_{k-2}} + η - [n_0 X̄_{L_{k-2}} + η] is asymptotically uniform on (0, 1).

Proof. Let J ≡ [c_{k-2} n_0 X̄_{N_{k-3}}] and V ≡ c_{k-2} n_0 X̄_{N_{k-3}} - J. An argument similar to Hall (1981, p. 1237) shows that, for 0 < x < 1, J > N_{k-3} and V ∈ (0, 1),

    P{U_{n_0} ≤ x | J, V, N_{k-3}} = x + r_{4,n_0},

where

    |r_{4,n_0}| ≤ C_0 (J^{1/2}/n_0 + J/n_0²)  as n_0 → ∞

uniformly in V ∈ (0, 1) and J > (1 + ε) N_{k-3}, for ε > 0 and some constant C_0. Consequently,

    P{U_{n_0} ≤ x} = E{P(U_{n_0} ≤ x | J, V, N_{k-3})}
      = E{P(U_{n_0} ≤ x | J, V, N_{k-3}) I(J > (1 + ε) N_{k-3})}
          + E{P(U_{n_0} ≤ x | J, V, N_{k-3}) I(J ≤ (1 + ε) N_{k-3})}
      = E{(x + r_{4,n_0}) I(J > (1 + ε) N_{k-3})}
          + E{P(U_{n_0} ≤ x | J, V, N_{k-3}) I(J ≤ (1 + ε) N_{k-3})}
      = x + r_{5,n_0},

where

    |r_{5,n_0}| ≤ P{J ≤ (1 + ε) N_{k-3}} + C_0 E(J^{1/2}/n_0 + J/n_0²) + P{J ≤ (1 + ε) N_{k-3}}.

It remains to show that, as n_0 → ∞ and for small ε > 0,

    P{J ≤ (1 + ε) N_{k-3}} = o(1)  and  E(J/n_0²) = o(1).

Now

    E(J/n_0²) ≤ c_{k-2} E(X̄_{N_{k-3}})/n_0 = O(n_0⁻¹)

by Lemma A.2, and for all sufficiently large n_0, small ε > 0 and k ≥ 4,

    P{J ≤ (1 + ε) N_{k-3}} ≤ P{c_{k-2} n_0 X̄_{N_{k-3}} ≤ (1 + ε) N_{k-3} + 1}
      ≤ P{c_{k-2} n_0 X̄_{N_{k-3}} ≤ c_{k-2} n_0 (1 - ε)} + P{c_{k-2} n_0 (1 - ε) ≤ (1 + ε) N_{k-3} + 1}
      ≤ P{A_{k-3}(ε)} + P{C_{k-3}(ε)},    (A.12)

which is o(1) by Lemma A.1. When k = 3, it follows directly from (A.12) that

    P{J ≤ (1 + ε) N_{k-3}} = o(1)

by noting that lim sup N_0/(c_1 n_0) < 1. The proof is thus completed. □

We are now ready to prove (A.1). It follows from Lemmas A.4-A.6 that

    E(N_{k-1}) = E(L_{k-1}) + o(1) = E([n_0 X̄_{N_{k-2}} + η]) + o(1)
      = E([n_0 X̄_{L_{k-2}} + η] - (n_0 X̄_{L_{k-2}} + η)) + E(n_0 X̄_{L_{k-2}} - n_0 X̄_{N_{k-2}})
          + E(n_0 X̄_{N_{k-2}} + η) + E([n_0 X̄_{N_{k-2}} + η] - [n_0 X̄_{L_{k-2}} + η]) + o(1)
      = -1/2 + E(n_0 X̄_{L_{k-2}} - n_0 X̄_{N_{k-2}}) + n_0 - Var(X_1)/c_{k-2} + η
          + E([n_0 X̄_{N_{k-2}} + η] - [n_0 X̄_{L_{k-2}} + η]) + o(1),


and so (A.1) will follow if we show that

    E(n_0 X̄_{L_{k-2}} - n_0 X̄_{N_{k-2}}) = o(1)  and  E([n_0 X̄_{N_{k-2}} + η] - [n_0 X̄_{L_{k-2}} + η]) = o(1).

Let B_{k-2} be the set defined in Lemma A.1, and note that N_{k-2} = L_{k-2} off B_{k-2}. Then

    |E(n_0 X̄_{L_{k-2}} - n_0 X̄_{N_{k-2}})| = |∫_{B_{k-2}} (n_0 X̄_{L_{k-2}} - n_0 X̄_{N_{k-2}}) dP|
      ≤ ∫_{B_{k-2}} n_0 X̄_{L_{k-2}} dP + ∫_{B_{k-2}} n_0 X̄_{N_{k-2}} dP,

which is o(1) by the Cauchy-Schwarz inequality and Lemmas A.2 and A.3 as before. A similar argument establishes the other assertion, and hence (A.1) is proved.

Next we prove (A.2) and (A.3). First, Lemma A.7 can be proved by using Lemma A.3.

Lemma A.7. As n_0 → ∞, Var(N_{k-1}) = Var(L_{k-1}) + o(1).

A simple conditional argument, conditioning on X_1, ..., X_{N_{k-3}}, gives:

Lemma A.8.

    Var(X̄_{N_{k-2}}) = Var(N_{k-3}(X̄_{N_{k-3}} - 1)/N_{k-2}) - Var(X_1) E(N_{k-3}/N²_{k-2}) + Var(X_1) E(1/N_{k-2}).

Lemma A.9. The following are true as n_0 → ∞:

    E(N⁻¹_{k-2}) = (c_{k-2} n_0)⁻¹ (1 + o(1)),    (A.13)
    E(N_{k-3})/n_0 ≤ C_0,    (A.14)
    E(N⁻¹_{k-2} N_{k-3}(X̄_{N_{k-3}} - 1)) = o(n_0^{-1/2}),    (A.15)
    E(|X̄_{N_{k-2}} - 1|³) = o(n_0⁻¹).    (A.16)


Proof. We first prove (A.13). Let C_{k-2}(ε) be the set defined in Lemma A.1, with ε > 0. Then it follows from Lemma A.1 that

    |E(c_{k-2} n_0 / N_{k-2}) - 1|
      ≤ ∫_{C^c_{k-2}(ε)} |c_{k-2} n_0 / N_{k-2} - 1| dP + ∫_{C_{k-2}(ε)} c_{k-2} n_0 / N_{k-2} dP + P(C_{k-2}(ε))
      ≤ C_0 ε + c_{k-2} n_0 P(C_{k-2}(ε)) + P(C_{k-2}(ε)) → 0

as n_0 → ∞ and then ε → 0. This proves (A.13). The assertion (A.14) is clearly true when k = 3, and can be proved in a similar way to (A.13) by using Lemma A.3 when

k ≥ 4. Assertion (A.15) follows from Lemmas A.1-A.3 by noting that E{N_{k-3}(X̄_{N_{k-3}} - 1)} = 0 and hence

    √n_0 |E{N⁻¹_{k-2} N_{k-3}(X̄_{N_{k-3}} - 1)}|
      = √n_0 |E{(X̄_{N_{k-3}} - 1) N_{k-3} (1/N_{k-2} - 1/(c_{k-2} n_0))}|
      = √n_0 |E{(I(C^c_{k-2}(ε)) + I(C_{k-2}(ε)))(X̄_{N_{k-3}} - 1) N_{k-3} (1/N_{k-2} - 1/(c_{k-2} n_0))}|
      ≤ C_0 ε E(√N_{k-3} |X̄_{N_{k-3}} - 1|) + C_0 √n_0 {P(C_{k-2}(ε)) E(N_{k-3} |X̄_{N_{k-3}} - 1|²)}^{1/2}
          + C_0 √n_0 {E(N_{k-3} I(C_{k-2}(ε))) E(N_{k-3} |X̄_{N_{k-3}} - 1|²)}^{1/2}
      = C_0 ε E(√N_{k-3} |X̄_{N_{k-3}} - 1|) + o(1) → 0  as ε → 0.

To prove (A.16), note that

    E(|X̄_{N_{k-2}} - 1|³) = E(|X̄_{N_{k-2}} - 1|³ I(C^c_{k-2}(ε))) + E(|X̄_{N_{k-2}} - 1|³ I(C_{k-2}(ε)))
      ≤ ((1 - ε) c_{k-2} n_0)^{-3/2} E{(√N_{k-2} |X̄_{N_{k-2}} - 1|)³}
          + (E(X̄_{N_{k-2}} - 1)⁴)^{3/4} (P(C_{k-2}(ε)))^{1/4} = n_0⁻¹ o(1)

by Lemmas A.1 and A.2. This completes the proof of the lemma. □

Lemma A.10. As n_0 → ∞,

    Var(N⁻¹_{k-2} N_{k-3}(X̄_{N_{k-3}} - 1)) - Var(X_1) E(N_{k-3}/N²_{k-2}) = o(n_0⁻¹).


Proof. From Lemma A.9, the left-hand side of the equation above is equal to

/ , ~ ( x / - 1) X;2

- E ~ (All- 1) -- Var(X,)E ~N2 z j

_ _ ~ 1 ( 1 ~ ( i _ _ 1 ) -- VaF(XI )E + o

and so It remains to show that

{ )} () fNk-3 X~ (Nk 3~ o 1 ~ ( ~ < , - 1 ) -Var<.,)~t--~_~: ~o ' ( . )_--e ~ =-

It follows from Lemmas A.1-A.3 and Wald's equation that

1 E (x i - 1) ($¢) <. E{Nk-3(XN:,_3 - 1)2I(Ck-2(e))} + (-(1 - e)ck_2no) 2

+Var(XI)P(C/~_2(e) ) - ((1 V~)rto)2E{N'-3I(C~-2(~))}

- - ( ( 1 - - g)¢k 2 r t 0 ) 2 ( ( 1 + e,)ck_2no) 2E(Nk-3) + ~0

= 4 e V a r ( X l ) E ( N k - 3 ) 1 + o ( 1 ) , ((1 -- g)(l + F.)Ck_2) 2 no no

which is o(1/n0) since g can be arbitrary small and E(Nk-3)/no<~Co by Lemma A.9; a similar argument shows that

$$E\left\{\frac{1}{N_{k-2}}\sum_{i=1}^{N_{k-3}}(X_i - 1)\right\}^2 - \operatorname{Var}(X_1)\,E\left(\frac{N_{k-3}}{N_{k-2}^{2}}\right) \ge -\frac{4\varepsilon\operatorname{Var}(X_1)}{((1-\varepsilon)(1+\varepsilon)c_{k-2})^2}\,\frac{E(N_{k-3})}{n_0}\,\frac{1}{n_0} + o\left(\frac{1}{n_0}\right)$$

$$= o\left(\frac{1}{n_0}\right) \quad\text{as } \varepsilon \to 0.$$

The proof is thus completed. □
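For reference, "Wald's equation" as invoked in the proofs above is used in both its first- and second-moment forms. Stated here only as a reminder, under the usual integrability conditions: if the $X_i$ are i.i.d. with $E(X_1) = 1$ and $\operatorname{Var}(X_1) < \infty$, and $N$ is a stopping time with $E(N) < \infty$ (here $N = N_{k-3}$), then

```latex
% Wald's first and second equations for a stopping time N:
E\Bigl\{\sum_{i=1}^{N}(X_i - 1)\Bigr\} = 0 ,
\qquad
E\Bigl\{\Bigl(\sum_{i=1}^{N}(X_i - 1)\Bigr)^{2}\Bigr\}
  = \operatorname{Var}(X_1)\,E(N) .
```

The second identity is what equates $E\{(\sum_{i=1}^{N_{k-3}}(X_i - 1))^2\}$ with $\operatorname{Var}(X_1)E(N_{k-3})$ in the bounds above.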

Now we prove (A.2). From Lemma A.7 it suffices to show that

$$\operatorname{Var}(L_{k-1}) = n_0\operatorname{Var}(X_1)/c_{k-2} + o(n_0).$$


From Lemmas A.8–A.10, we have

$$n_0^{-2}\operatorname{Var}(n_0\bar X_{N_{k-2}} + \eta) = \operatorname{Var}(\bar X_{N_{k-2}})$$

$$= \left[\operatorname{Var}\left\{\frac{1}{N_{k-2}}\sum_{i=1}^{N_{k-3}}(X_i - 1)\right\} - \operatorname{Var}(X_1)\,E\left(\frac{N_{k-3}}{N_{k-2}^{2}}\right)\right] + \operatorname{Var}(X_1)\,E\left(\frac{1}{N_{k-2}}\right)$$

$$= o(n_0^{-1}) + \operatorname{Var}(X_1)\,\frac{1}{c_{k-2}n_0}(1 + o(1)) = \frac{\operatorname{Var}(X_1)}{c_{k-2}}\,\frac{1}{n_0} + o(n_0^{-1}),$$

and so it remains to show that

$$\operatorname{Var}(L_{k-1}) - \operatorname{Var}(n_0\bar X_{N_{k-2}} + \eta) = o(n_0).$$

This follows directly from the following three facts:

$$\operatorname{Var}(L_{k-1}) = \operatorname{Var}\{(n_0\bar X_{N_{k-2}} + \eta) + (L_{k-1} - n_0\bar X_{N_{k-2}} - \eta)\}$$

$$= \operatorname{Var}(n_0\bar X_{N_{k-2}} + \eta) + \operatorname{Var}(L_{k-1} - n_0\bar X_{N_{k-2}} - \eta) + 2\operatorname{Cov}(n_0\bar X_{N_{k-2}},\, L_{k-1} - n_0\bar X_{N_{k-2}} - \eta),$$

$$\operatorname{Var}(L_{k-1} - n_0\bar X_{N_{k-2}} - \eta) \le 1 \quad\text{since}\quad |L_{k-1} - n_0\bar X_{N_{k-2}} - \eta| \le 1,$$

and

$$\operatorname{Cov}(n_0\bar X_{N_{k-2}},\, L_{k-1} - n_0\bar X_{N_{k-2}} - \eta) \le \{\operatorname{Var}(n_0\bar X_{N_{k-2}})\,\operatorname{Var}(L_{k-1} - n_0\bar X_{N_{k-2}} - \eta)\}^{1/2}$$

$$\le \{\operatorname{Var}(n_0\bar X_{N_{k-2}})\}^{1/2} = \{n_0\operatorname{Var}(X_1)/c_{k-2} + o(n_0)\}^{1/2} = o(n_0).$$

The proof of (A.2) is thus completed. To prove (A.3), note that

$$E|N_{k-1} - EN_{k-1}|^3 = E\left|N_{k-1} - n_0 + \operatorname{Var}(X_1)/c_{k-2} + \tfrac{1}{2} - \eta + o(1)\right|^3,$$

and so it suffices to show that $E|N_{k-1} - n_0|^3 = o(n_0^2)$. Let $B_{k-1}$ be the set defined in Lemma A.1; then

$$\int_{B_{k-1}^{c}} |N_{k-1} - n_0|^3\,\mathrm{d}P \le \int_{B_{k-1}^{c}} (N_{k-1} + n_0)^3\,\mathrm{d}P = o(n_0^2)$$

by Lemmas A.1 and A.3, and

$$\int_{B_{k-1}} |N_{k-1} - n_0|^3\,\mathrm{d}P = \int_{B_{k-1}} \left|[n_0\bar X_{N_{k-2}} + \eta] - n_0\right|^3\,\mathrm{d}P$$

$$\le E\left|[n_0\bar X_{N_{k-2}} + \eta] - n_0\right|^3 \le E\{(n_0|\bar X_{N_{k-2}} - 1| + \eta + 1)^3\} = o(n_0^2)$$


since $E\{(n_0|\bar X_{N_{k-2}} - 1|)^3\} = o(n_0^2)$ by Lemma A.9. So $E|N_{k-1} - n_0|^3 = o(n_0^2)$ and the proof is completed.
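The moment results (A.1)–(A.3) control how far the final sample size strays from the ideal $n_0 = (z\sigma/d)^2$. Purely as an illustration of the fixed-width goal itself — not of the paper's $k$-stage rule — here is a minimal Monte Carlo sketch of the classical two-stage procedure of Stein (1945, cited in the references): a confidence interval of width $2d$ for a normal mean with unknown variance. The pilot size $m = 15$, the hard-coded $t$ quantile and all parameter values are assumptions of the sketch.

```python
import math
import random
import statistics

def stein_two_stage(mu, sigma, d, t_quant, m, rng):
    """One run of a Stein (1945)-type two-stage fixed-width procedure."""
    # Stage 1: pilot sample of size m; estimate sigma^2.
    y = [rng.gauss(mu, sigma) for _ in range(m)]
    s2 = statistics.variance(y)          # unbiased, m - 1 degrees of freedom
    # Final sample size: analogue of n0 = (z * sigma / d)^2 with sigma^2
    # replaced by s2 and z by a t quantile with m - 1 degrees of freedom.
    n = max(m, math.ceil((t_quant / d) ** 2 * s2))
    # Stage 2: take the remaining n - m observations in one batch.
    y += [rng.gauss(mu, sigma) for _ in range(n - m)]
    xbar = statistics.fmean(y)
    return xbar - d, xbar + d            # fixed-width interval, width 2d

rng = random.Random(12345)
mu, sigma, d, m = 3.0, 1.0, 0.5, 15
t_quant = 2.1448                         # t_{0.975} quantile, 14 d.f. (gamma = 0.95)
reps = 4000
hits = sum(lo <= mu <= hi
           for lo, hi in (stein_two_stage(mu, sigma, d, t_quant, m, rng)
                          for _ in range(reps)))
print(f"empirical coverage: {hits / reps:.3f}")
```

Stein's rule attains coverage at least $\gamma$ for every $n_0$ at the price of oversampling; the $k$-stage procedures studied in this paper pursue the same width-$2d$ guarantee asymptotically while keeping $E(M_{k-1})$ close to $n_0$.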

Proof of Theorem 3. The result on $E(M_{k-1})$ follows obviously from (A.1) since $M_{k-1} = N_{k-1} + 1$. The result on the coverage probability can be proved by using the Taylor expansion and (A.1)–(A.3) in the usual way (see e.g. Hall, 1981, p. 1231), except that the following result is required:

$$E\{\psi'''(\xi)(p^2 M_{k-1} - p^2 EM_{k-1})^3\} = o(p^2) \quad\text{as } n_0 \to \infty, \tag{A.17}$$

where $\psi(x) = 2\Phi(\sqrt{x}) - 1$, $p = d/\sigma = z/\sqrt{n_0}$, and $\xi$ is an intermediate point between $p^2 M_{k-1}$ and $p^2 EM_{k-1} = z^2 + (\frac{1}{2} + \eta - 2/c_{k-2})p^2 + o(p^2)$. To prove this, let $E = \{p^2 M_{k-1} \ge p^2 EM_{k-1} - \delta\}$, where $\delta > 0$ is sufficiently small such that $p^2 EM_{k-1} \ge 2\delta$ for all sufficiently large $n_0$. By noting that $\psi'''(\xi)$ is bounded on $E$, we have

$$|E\{\psi'''(\xi)(p^2 M_{k-1} - p^2 EM_{k-1})^3 I(E)\}|$$

$$\le C_0 p^6 E|M_{k-1} - EM_{k-1}|^3 = C_0 p^6 E|N_{k-1} - EN_{k-1}|^3 = C_0 p^6\, o(n_0^2) = o(p^2)$$

by (A.3). Note that $|\psi'''(x)| \le C_0 x^{-5/2}$ for all $x > 0$, and so

$$|E\{\psi'''(\xi)(p^2 M_{k-1} - p^2 EM_{k-1})^3 I(E^c)\}|$$

$$\le C_0 E\{(p^2 M_{k-1})^{-5/2}(p^2 EM_{k-1})^3 I(E^c)\}$$

$$\le C_0 (p^2 m_0)^{-5/2}(2z^2)^3 P(E^c)$$

$$\le 8C_0 z^6 (p^2 m_0)^{-5/2} P\{p^2 N_{k-1} < p^2 E(N_{k-1}) - \delta\}$$

$$= 8C_0 z^6 (p^2 m_0)^{-5/2} P\{N_{k-1} < n_0 - 2/c_{k-2} - \tfrac{1}{2} + \eta + o(1) - \delta n_0/z^2\}$$

$$\le 8C_0 z^6 (p^2 m_0)^{-5/2} P\{n_0\bar X_{N_{k-2}} + \eta < n_0 - 2/c_{k-2} - \tfrac{1}{2} + \eta + o(1) - \delta n_0/z^2\}$$

$$\le 8C_0 z^6 (p^2 m_0)^{-5/2} P\{\bar X_{N_{k-2}} < 1 - \delta_1\} \quad (\delta_1 > 0)$$

$$\le 8C_0 z^6 (p^2 m_0)^{-5/2} P\{A_{k-2}(\delta_1)\},$$

which is $o(1/n_0) = o(p^2)$ by Lemma A.1. This completes the proof of (A.17).
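The "usual way" Taylor step referred to in the proof of Theorem 3 can be sketched as follows, assuming, as in Hall (1981), that the stage sizes depend on the data only through sample variances, so that conditionally on $M_{k-1}$ the final sample mean $\bar Y_{M_{k-1}}$ is distributed as $N(\mu, \sigma^2/M_{k-1})$:

```latex
% Coverage identity: conditioning on the final sample size M_{k-1},
P(|\bar Y_{M_{k-1}} - \mu| \le d)
  = E\{\psi(p^2 M_{k-1})\}, \qquad \psi(x) = 2\Phi(\sqrt{x}) - 1 .

% Third-order Taylor expansion about the nonrandom point p^2 E M_{k-1};
% the first-order term vanishes because the expansion point is the mean:
E\{\psi(p^2 M_{k-1})\}
  = \psi(p^2 E M_{k-1})
  + \tfrac{1}{2}\,\psi''(p^2 E M_{k-1})\, p^4 \operatorname{Var}(M_{k-1})
  + \tfrac{1}{6}\, E\{\psi'''(\xi)\,(p^2 M_{k-1} - p^2 E M_{k-1})^3\} .
```

Here (A.1) and (A.2) supply the expansions of $EM_{k-1}$ and $\operatorname{Var}(M_{k-1}) = \operatorname{Var}(N_{k-1})$, while (A.17) shows that the remainder term is $o(p^2)$.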

References

Anscombe, F.J., 1953. Sequential estimation. J. Roy. Statist. Soc. Ser. B 15, 1-21.

Chow, Y.S., Robbins, H., 1965. On the asymptotic theory of fixed width confidence intervals for the mean. Ann. Math. Statist. 36, 457-462.

Cox, D.R., 1952. Estimation by double sampling. Biometrika 39, 217-227.

Hall, P., 1981. Asymptotic theory of triple sampling for sequential estimation of a mean. Ann. Statist. 9, 1229-1238.

Liu, W., 1995. Fixed-width simultaneous confidence intervals for all pairwise comparisons. Computat. Statist. Data Anal. 20(1), 35-44.


Mukhopadhyay, N., Solanky, T.K.S., 1994. Multistage Selection and Ranking Procedures: Second-Order Asymptotics. Marcel Dekker, Inc., New York.

Robbins, H., 1959. Sequential estimation of the mean of a normal population. In: Probability and Statistics (Harald Cramér Volume). Almqvist and Wiksell, Uppsala.

Stein, C., 1945. A two-sample test for a linear hypothesis whose power is independent of the variance. Ann. Math. Statist. 16, 243-258.

Wittes, J., Brittain, E., 1990. The role of internal pilot studies in increasing the efficiency of clinical trials. Statist. Med. 11, 55-66.

Woodroofe, M., 1977. Second order approximations for sequential point and interval estimation. Ann. Statist. 5, 984-995.