Journal of Statistical Planning and Inference 65 (1997) 109–127
A k-stage sequential sampling procedure for estimation of normal mean
Wei Liu, Department of Mathematics, University of Southampton, Southampton SO17 1BJ, UK
Received 6 December 1995; received in revised form 3 December 1996
Abstract
We study a $k$-stage ($k \ge 3$) sequential estimation procedure which includes the three-stage procedure of Hall (1981) as a special case. With a suitable value of $k$, the $k$-stage procedure not only can be as efficient as the fully sequential procedure of Anscombe, Chow and Robbins in terms of sample size, but also requires at most $k$ sampling operations. For the problem of constructing a fixed width confidence interval for the mean of a normal population with unknown variance, the three-stage procedure of Hall always needs a few more observations than the fully sequential procedure. The five-stage procedure, however, requires almost the same number of observations as the fully sequential procedure. © 1997 Elsevier Science B.V.
Keywords: Confidence intervals; Normal distribution; Sequential methods
1. Introduction
Let $N(\mu, \sigma^2)$ be a normal population with mean $\mu$ and variance $\sigma^2$, both unknown, and suppose we wish to construct a confidence interval for $\mu$ with predetermined coverage probability $\gamma$ and width $2d$. Let $n_0 = (z\sigma/d)^2$ be the sample size which would have been used to achieve this goal had $\sigma^2$ been known, where $z = \Phi^{-1}((1+\gamma)/2)$ and $\Phi$ and $\phi$ are, respectively, the cdf and pdf of a $N(0,1)$ random variable. The fully sequential procedure of Anscombe (1953), Robbins (1959) and Chow and Robbins (1965), denoted as the ACR procedure hereafter, takes observations one by one, and makes a decision after each observation. Let $Y_1, Y_2, \ldots$ be independent observations from the $N(\mu, \sigma^2)$ population and let $\bar Y_n$ and $\hat\sigma_n^2$ be, respectively, the mean and variance of the $n$-sample $Y_1, \ldots, Y_n$:
$$\bar Y_n = \frac{1}{n}\sum_{i=1}^{n} Y_i, \qquad \hat\sigma_n^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar Y_n)^2.$$
Then the ACR procedure continues sampling until
$$N = \inf\{n \ge m_0:\ n > \lambda l_n \hat\sigma_n^2\},$$
where $m_0\ (\ge 2)$ is the size of the first sample, $\lambda = (z/d)^2$ and $\{l_n\}$ is a sequence of constants with $l_n = 1 + 2\xi/n + o(n^{-1})$ as $n \to \infty$, where $\xi$ is a given constant. On stopping sampling, construct the confidence interval $I_N = \bar Y_N \pm d$ for $\mu$.
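To make the rule concrete, here is a minimal sketch of the ACR stopping rule (standard library only). The function name, the `draw` interface and the default $\xi = \xi_0 \approx 1.802$ for $\gamma = 0.95$ (taken from the constant $2\xi_0 = 3.604$ used in Section 4) are our own choices for illustration:

```python
from statistics import NormalDist, mean, variance

def acr_procedure(draw, d, gamma=0.95, m0=10, xi=1.802):
    """Fully sequential ACR rule: after a first sample of size m0, add one
    observation at a time and stop at the first n with
    n > lam * l_n * var_n, where lam = (z/d)^2, l_n = 1 + 2*xi/n and
    var_n is the current sample variance.  Returns (lower, upper, N)."""
    z = NormalDist().inv_cdf((1 + gamma) / 2)   # z = Phi^{-1}((1 + gamma)/2)
    lam = (z / d) ** 2
    ys = [draw() for _ in range(m0)]
    while True:
        n = len(ys)
        if n > lam * (1 + 2 * xi / n) * variance(ys):
            return mean(ys) - d, mean(ys) + d, n
        ys.append(draw())
```

With $d = z\sigma/\sqrt{n_0}$ the stopping time $N$ concentrates near $n_0$, in line with Theorem 1 below.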
The ACR procedure is very efficient in terms of sample size, since after each new observation it updates the estimate of $\sigma^2$ and checks whether enough observations have already been drawn. The following asymptotic result has been proved (see, e.g., Woodroofe, 1977).
Theorem 1. Suppose that $m_0 \ge 4$; then as $n_0 \to \infty$,
$$E(N) = n_0 + \nu + 2\xi - 2 + o(1),$$
where $\nu = \tfrac{3}{2} - \sum_{n=1}^{\infty} n^{-1} E\{(\chi_n^2 - 2n)^+\} \approx 0.817$; moreover, if $m_0 \ge 7$, then
$$P\{\mu \in I_N\} = \gamma + (2n_0)^{-1} z\phi(z)\{2(\nu + 2\xi - 2) - (1 + z^2)\} + o(n_0^{-1}).$$
Therefore, at least for large values of $n_0$, the difference between the expected sample size of the ACR procedure and the 'optimal' sample size $n_0$ is about $\nu + 2\xi - 2$, and the coverage probability, although not precisely known, is close to $\gamma$. Despite its great efficiency, the ACR procedure can be expensive to carry out since it is fully sequential. In many real situations, significant savings can be achieved by gathering many observations together.
From Theorem 1 we choose $\xi = \xi_0 > 0$ satisfying $2(\nu + 2\xi_0 - 2) - (1 + z^2) = 0$ so that the coverage probability is equal to $\gamma + o(n_0^{-1})$. The corresponding average sample size is then given by
$$E(N) = n_0 + (1 + z^2)/2 + o(1). \qquad (1.1)$$
Stein (1945) proposed a two-stage procedure which allows us to construct a $2d$-width confidence interval for $\mu$ whose coverage probability is at least $\gamma$, and it requires only two sampling operations to achieve this end. Let $m_0$ be the size of the first sample, and let
$$T_1 = \max\{[(t_{m_0-1}^{(1-\gamma)/2}\hat\sigma_{m_0}/d)^2] + 1,\ m_0\},$$
where $[x]$ denotes the integer part of $x$ and $t_{m_0-1}^{(1-\gamma)/2}$ is the upper $(1-\gamma)/2$ point of Student's $t$-distribution with $m_0 - 1$ degrees of freedom. Then Stein's procedure draws a second sample of size $T_1 - m_0$, and constructs the confidence interval $I_{T_1} = \bar Y_{T_1} \pm d$
for $\mu$. Stein's procedure can be considerably less efficient than the ACR procedure in terms of sample size, since it takes the second sample together and the size of the second sample depends entirely on the first sample. If the size of the first sample, $m_0$, is considerably smaller than the optimal sample size $n_0$, then Stein's procedure
often leads to substantial oversampling. Cox (1952) showed that $E(T_1) - n_0 \to +\infty$ as $n_0 \to +\infty$ if $m_0(n_0)/n_0 \to 0$.
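Stein's rule can be sketched in the same style. A minimal illustration (names are ours), assuming the caller supplies the upper $(1-\gamma)/2$ Student $t$ quantile as a plain number (e.g. $t_9^{0.025} \approx 2.262$ for $m_0 = 10$ and $\gamma = 0.95$):

```python
from statistics import mean, stdev

def stein_two_stage(draw, d, t_quantile, m0=10):
    """Stein (1945) two-stage rule: after a first sample of size m0, set
    T1 = max([ (t * s_m0 / d)^2 ] + 1, m0) and take the remaining
    T1 - m0 observations in one further batch.  Returns (lower, upper, T1)."""
    ys = [draw() for _ in range(m0)]
    s = stdev(ys)                     # sigma_hat_{m0}
    T1 = max(int((t_quantile * s / d) ** 2) + 1, m0)
    ys += [draw() for _ in range(T1 - m0)]
    return mean(ys) - d, mean(ys) + d, T1
```

Because $T_1$ is computed once, from the first $m_0$ observations only, a lucky or unlucky first-stage variance estimate translates directly into the oversampling described above.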
As a compromise between Stein's two-stage procedure and the fully sequential ACR
procedure, Hall (1981) proposed a three-stage sampling procedure. It takes a second sample of size $T_2 - m_0$, where $m_0$ is the size of the first sample as before, and
$$T_2 = \max\{[c_1\lambda\hat\sigma_{m_0}^2] + 1,\ m_0\},$$
where $0 < c_1 < 1$ is a predetermined constant. Calculate the variance $\hat\sigma_{T_2}^2$ for the pooled sample of size $T_2$, then take a third sample of size $T_3 - T_2$, where
$$T_3 = \max\{[\lambda\hat\sigma_{T_2}^2 + \eta] + 1,\ T_2\},$$
where $\eta$ is a predetermined constant. On stopping sampling, construct the confidence interval $I_{T_3} = \bar Y_{T_3} \pm d$ for $\mu$.

The three-stage procedure requires one more sampling operation than the two-stage
procedure. It is more robust to oversampling when $m_0$ is small, however, since after the second sample there is one more chance to adjust the final sample size $T_3$, and the estimate of $\sigma^2$ calculated after the second sample is more accurate. The following
result has been proved by Hall (1981).
Theorem 2. Assume that
$$n_0 \to \infty,\quad m_0 = m_0(n_0) \to \infty,\quad \limsup m_0/n_0 < c_1 \quad\text{and}\quad n_0 = O(m_0^r), \qquad (1.2)$$
where $r \ge 1$ is a fixed constant. Then
$$E(T_3) = n_0 - 2/c_1 + \tfrac{1}{2} + \eta + o(1),$$
$$P\{\mu \in I_{T_3}\} = \gamma + (2n_0)^{-1} z\phi(z)\{1 + 2\eta - (5 + z^2)/c_1\} + o(n_0^{-1}).$$
Therefore, in contrast to the two-stage procedure, $E(T_3) - n_0$ is bounded for the three-stage procedure. To facilitate the comparisons with the ACR procedure, we choose $\eta = \eta_0 > 0$ satisfying $1 + 2\eta_0 - (5 + z^2)/c_1 = 0$ so that the coverage probability is also equal to $\gamma + o(n_0^{-1})$, and the corresponding average sample size is given by
$$E(T_3) = n_0 + (1 + z^2)/(2c_1) + o(1). \qquad (1.3)$$
From (1.1), (1.3) and $0 < c_1 < 1$, at least asymptotically, $E(N) < E(T_3)$, as one would expect. Although asymptotically $E(T_3)$ can be made arbitrarily close to $E(N)$ by setting the value of $c_1$ close to one, this fails for smaller samples for the following reasons. If $c_1$ is close to one then a large proportion of the total observations is taken in the second sample and so not much room is left for manoeuvre in the third sample; even if $T_2$ is found to be larger than the required total sample size after the second sample is taken, it cannot be reduced. Note that $c_1 = 1$ corresponds to a two-stage procedure. On the other hand, if $c_1$ is close to zero then too few observations are taken in the second sample and so the goal of getting a more accurate estimate of $\sigma^2$ (and hence of the total sample size)
cannot be achieved. Again, $c_1 = 0$ corresponds to a two-stage procedure. From these observations, the value of $c_1$ should be neither too large (close to one) nor too small (close to zero). Indeed, the value $c_1 = \tfrac{1}{2}$ was recommended by Hall (1981) on the basis of simulation results.

When $c_1 = \tfrac{1}{2}$, then from (1.1) and (1.3) we have, at least asymptotically, that $E(T_3) - E(N) \approx (1 + z^2)/2$, which is equal to 2.4 when $\gamma = 0.95$ and 3.8 when $\gamma = 0.99$. So, on average, the (best) three-stage procedure will need about three observations more than the ACR procedure when $\gamma = 0.95$; this figure increases to four when $\gamma = 0.99$.
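The two figures just quoted can be checked with a couple of lines (standard library only):

```python
from statistics import NormalDist

for gamma in (0.95, 0.99):
    z = NormalDist().inv_cdf((1 + gamma) / 2)   # 1.960 and 2.576
    print(gamma, round((1 + z * z) / 2, 2))     # asymptotic E(T3) - E(N) for c1 = 1/2
```

This prints 2.42 for $\gamma = 0.95$ and 3.82 for $\gamma = 0.99$, matching the rounded values above.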
The purpose of this paper is to demonstrate that a $k$-stage procedure with a small value of $k\ (\ge 3)$ can be as efficient as the ACR procedure in terms of sample size for both larger and smaller samples. For the problem of constructing fixed width confidence intervals considered in this paper, the simulation results indicate that $k = 5$ is sufficient. A $k$-stage procedure then has the extra advantage that the number of sampling operations, $k$, is known before observing data and is often much smaller than that of a fully sequential ACR procedure. A general $k$-stage procedure is defined in Section 2. The asymptotic theory is contained in Section 3, with the proofs given in the appendix. The results of a series of Monte Carlo simulations are presented in Section 4. Some concluding remarks are contained in Section 5.
2. The k-stage procedure
Fix $k\ (\ge 3)$ and the constants $0 < c_1 < \cdots < c_{k-2} < 1$. Take a first sample of size $m_0$, and take the next $k - 2$ samples sequentially, with the $i$th ($i = 2, \ldots, k-1$) sample of size $M_{i-1} - M_{i-2}$, where
$$M_j = \max\{[c_j\lambda\hat\sigma_{M_{j-1}}^2] + 1,\ M_{j-1}\}, \qquad 1 \le j \le k-2,$$
and $M_0 = m_0$. Then take a final sample of size $M_{k-1} - M_{k-2}$, where
$$M_{k-1} = \max\{[\lambda\hat\sigma_{M_{k-2}}^2 + \eta] + 1,\ M_{k-2}\},$$
where $\eta$ is a predetermined constant. On stopping sampling, construct the confidence interval $I_{M_{k-1}} = \bar Y_{M_{k-1}} \pm d$ for $\mu$. Denote this procedure by $\mathcal{P}_k(c_1, \ldots, c_{k-2})$. Hall's (1981) three-stage procedure corresponds to $k = 3$.
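The definition translates directly into code. The sketch below is our own illustration of $\mathcal{P}_k(c_1, \ldots, c_{k-2})$ (the function name and `draw` interface are ours, not part of the original presentation):

```python
from statistics import NormalDist, mean, variance

def k_stage(draw, d, c, eta, gamma=0.95, m0=10):
    """Sketch of P_k(c_1, ..., c_{k-2}): middle stages extend the sample to
    M_j = max([c_j * lam * var] + 1, M_{j-1}); the final stage extends it to
    M_{k-1} = max([lam * var + eta] + 1, M_{k-2}), where lam = (z/d)^2 and
    var is the variance of the pooled sample so far."""
    z = NormalDist().inv_cdf((1 + gamma) / 2)
    lam = (z / d) ** 2
    ys = [draw() for _ in range(m0)]            # first stage: M_0 = m0
    M = m0
    for cj in c:                                # stages 2, ..., k-1
        M_next = max(int(cj * lam * variance(ys)) + 1, M)
        ys += [draw() for _ in range(M_next - M)]
        M = M_next
    M_final = max(int(lam * variance(ys) + eta) + 1, M)   # final stage k
    ys += [draw() for _ in range(M_final - M)]
    return mean(ys) - d, mean(ys) + d, M_final
```

Passing `c=(0.5,)` gives Hall's three-stage procedure, and `c=(0.4, 0.7, 0.9)` the five-stage procedure $\mathcal{P}_5$ used in Section 4; `eta` would be set to $\eta_0 = (5+z^2)/(2c_{k-2}) - \tfrac{1}{2}$ as derived in Section 3.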
3. Asymptotic theory
The assumptions used to establish the asymptotic results are the same as those used by Hall (1981), which are given in (1.2).
Theorem 3. If (1.2) holds, then as $n_0 \to \infty$,
$$E(M_{k-1}) = n_0 - 2/c_{k-2} + \tfrac{1}{2} + \eta + o(1)$$
and
$$P\{\mu \in I_{M_{k-1}}\} = \gamma + (2n_0)^{-1} z\phi(z)\{1 + 2\eta - (5 + z^2)/c_{k-2}\} + o(n_0^{-1}).$$
The proof of the theorem is outlined in the appendix. It is interesting to note from Theorem 3 that the larger-sample behaviour of the $k$-stage procedure depends (mainly) on the value of $c_{k-2}$ but not on $k$ and the other $c_i$'s. This, however, is not true for smaller samples, as has already been pointed out for the three-stage procedure in Section 1. By reparametrization, the following results on $E(M_i)$, $i = 2, \ldots, k-2$, follow directly from Theorem 3. The result on $E(M_1)$ can be proved directly.
Corollary 4. If (1.2) holds, then as $n_0 \to \infty$,
$$E(M_i) = c_i n_0 - 2c_i/c_{i-1} + \tfrac{1}{2} + o(1), \qquad i = 2, \ldots, k-2,$$
$$E(M_1) = c_1 n_0 + \tfrac{1}{2} + o(1).$$
From Theorem 3, we set $\eta = \eta_0 = (5 + z^2)/(2c_{k-2}) - \tfrac{1}{2} > 0$ so that the coverage probability is equal to $\gamma + o(n_0^{-1})$; the corresponding average total sample size is then given by
$$E(M_{k-1}) = n_0 + (1 + z^2)/(2c_{k-2}) + o(1). \qquad (3.1)$$
From (3.1), $E(M_{k-1}) > E(N)$ asymptotically. Result (3.1) also suggests that the value of $c_{k-2}$ should be set close to unity in order to reduce $E(M_{k-1}) - E(N)$. This fails when $k = 3$, however, for the reasons given below (1.3). Nevertheless, we demonstrate in the next section that $E(M_{k-1})$ can be arbitrarily close to $E(N)$ by choosing a suitable combination of $k$ and $c_1, \ldots, c_{k-2}$ with $c_{k-2}$ close to unity.
4. Simulation results
Although large-sample results suggest that the performance of a multi-stage procedure will not be improved by increasing the number of stages, this is not true for smaller samples. In a series of Monte-Carlo trials conducted, we set $l_n = 1 + 2\xi_0/n$ and $\eta = \eta_0$, so that the coverage probabilities of the ACR and the $k$-stage procedures are both equal to $\gamma + o(n_0^{-1})$. Varying $\gamma$ from 0.90 to 0.99 leads to similar results, so we shall report in detail only for $\gamma = 0.95$, which implies that $z = 1.96$, $2\xi_0 = 3.604$ and $\eta_0 = 4.421/c_{k-2} - 0.5$. Note that both the ACR and the $k$-stage procedures depend on $d$ and $\sigma$ only through $d/\sigma$, and $d/\sigma = z/\sqrt{n_0}$. Therefore the simulation results can be presented in terms of $n_0$, as in Table 1; it is not necessary to list the values of both $n_0$ and $d$ as in Hall (1981). We used a wide range of values of $n_0$ and $m_0 = 10, 20$.
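The constants just quoted can be reproduced from Theorems 1 and 3 (standard library only; Woodroofe's constant $\nu \approx 0.817$ is taken from Theorem 1):

```python
from statistics import NormalDist

gamma = 0.95
z = NormalDist().inv_cdf((1 + gamma) / 2)       # Phi^{-1}(0.975) = 1.95996...
nu = 0.817                                      # Woodroofe's constant (Theorem 1)
two_xi0 = (1 + z * z) / 2 + 2 - nu              # solves 2(nu + 2*xi_0 - 2) = 1 + z^2

def eta0(c):                                    # solves 1 + 2*eta_0 = (5 + z^2)/c
    return (5 + z * z) / (2 * c) - 0.5

print(round(z, 2), round(two_xi0, 3))           # 1.96 3.604
```

The coefficient $(5 + z^2)/2 = 4.421$ (to three decimals) is exactly the numerator appearing in $\eta_0 = 4.421/c_{k-2} - 0.5$.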
Table 1 contains the results of some of the Monte-Carlo trials conducted, with each entry based on 10 000 trials. The four procedures presented are $\mathcal{P}_0$, the ACR
Table 1
Results of 10 000 Monte-Carlo trials with $\gamma = 0.95$, $2\xi_0 = 3.604$ and $\eta_0 = 4.421/c_{k-2} - 0.5$ (left block: $m_0 = 10$; right block: $m_0 = 20$)

n_0   Proc.   M-bar  M-bar - n_0   s_M     p       M-bar  M-bar - n_0   s_M     p
 24   P_0      26.0     2.0        7.2   0.946      27.1     3.1        5.7   0.955
      P_5      26.5     2.5        7.8   0.946      28.2     4.2        6.0   0.959
      P_4      27.2     3.2        8.0   0.949      28.9     4.9        6.2   0.959
      P_3      30.6     6.6        8.4   0.961      32.8     8.8        7.4   0.970
 43   P_0      44.9     1.9        9.9   0.942      45.2     2.2        9.4   0.945
      P_5      45.0     2.0       11.4   0.942      45.6     2.6       10.2   0.945
      P_4      45.3     2.3       12.0   0.942      46.2     3.2       10.6   0.948
      P_3      47.6     4.6       14.0   0.947      49.9     6.9       11.5   0.960
 61   P_0      63.1     2.1       11.7   0.946      63.2     2.2       11.4   0.948
      P_5      62.9     1.9       13.5   0.944      63.5     2.5       12.7   0.948
      P_4      63.0     2.0       14.7   0.946      63.7     2.7       13.4   0.945
      P_3      65.3     4.3       17.8   0.943      66.4     5.4       15.4   0.949
 76   P_0      78.2     2.2       12.8   0.946      78.2     2.2       12.8   0.946
      P_5      78.1     2.1       14.7   0.945      78.4     2.4       14.0   0.945
      P_4      78.1     2.1       16.2   0.942      78.5     2.5       14.8   0.944
      P_3      80.0     4.0       20.3   0.946      80.5     4.5       18.1   0.950
 96   P_0      98.2     2.2       14.2   0.947      98.2     2.2       14.1   0.947
      P_5      98.2     2.2       16.1   0.944      98.4     2.4       15.6   0.949
      P_4      98.5     2.5       17.7   0.946      98.5     2.5       16.9   0.944
      P_3     100.1     4.1       22.9   0.945     100.5     4.5       20.9   0.949
125   P_0     127.3     2.3       15.9   0.948     127.3     2.3       15.9   0.948
      P_5     127.5     2.5       18.0   0.946     127.2     2.2       17.5   0.947
      P_4     127.8     2.8       20.0   0.942     127.4     2.4       18.7   0.947
      P_3     129.1     4.1       26.3   0.944     129.0     4.0       24.2   0.941
171   P_0     173.3     2.3       18.7   0.947     173.3     2.3       18.7   0.947
      P_5     173.6     2.6       20.5   0.946     173.7     2.7       20.1   0.945
      P_4     174.4     3.4       22.5   0.945     173.8     2.8       21.6   0.946
      P_3     175.9     4.9       30.5   0.946     175.1     4.1       28.0   0.946
246   P_0     248.5     2.5       22.3   0.947     248.5     2.5       22.3   0.947
      P_5     248.9     2.9       23.9   0.950     248.7     2.7       24.0   0.951
      P_4     249.8     3.8       26.8   0.948     249.0     3.0       25.6   0.950
      P_3     250.7     4.7       38.4   0.948     250.7     4.7       33.8   0.949
384   P_0     386.4     2.4       27.9   0.947     386.4     2.4       27.9   0.947
      P_5     387.0     3.0       30.0   0.946     386.6     2.6       29.6   0.949
      P_4     388.8     4.8       34.9   0.950     387.4     3.4       31.6   0.948
      P_3     389.7     5.7       47.0   0.946     388.6     4.6       42.3   0.944
procedure, $\mathcal{P}_3 \equiv \mathcal{P}_3(0.5)$, $\mathcal{P}_4 \equiv \mathcal{P}_4(0.5, 0.8)$, and $\mathcal{P}_5 \equiv \mathcal{P}_5(0.4, 0.7, 0.9)$. For each procedure we computed the average total sample size $\bar M$, the standard deviation $s_M$ of the total sample size, and the proportion of times $p$ that $\mu$ is covered by the confidence intervals.
From Table 1, it can be seen that all the coverage probabilities are very close to the target value $\gamma = 0.95$. With respect to the expected sample sizes, $\mathcal{P}_3$ always needs a few more observations than $\mathcal{P}_0$. But the differences between $\mathcal{P}_5$ and $\mathcal{P}_0$ are negligible, and so $\mathcal{P}_5$ is almost as efficient as $\mathcal{P}_0$ in terms of total sample size.

Also note that, asymptotically, the ACR procedure requires on average $1 + n_0 + (1 + z^2)/2 - m_0$ sampling operations. The $k$-stage procedure, on the other hand, requires at most $k$ sampling operations, which is substantially less than that of the ACR procedure, especially when $n_0 - m_0$ is large. Therefore, a $k$-stage procedure with suitable values of $k$ and the $c_i$'s is preferable to the ACR procedure.
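As a self-contained illustration of these comparisons, the following sketch reruns a scaled-down version of one cell of Table 1 (2000 trials of $\mathcal{P}_5(0.4, 0.7, 0.9)$ at $n_0 = 61$, $m_0 = 10$, true mean 0). The trial count, seed and all names are ours, so the printed figures are only indicative, not a reproduction of the table entries:

```python
from statistics import NormalDist, mean, variance
import random

def five_stage(rng, d, c=(0.4, 0.7, 0.9), m0=10, gamma=0.95):
    """One run of P_5(0.4, 0.7, 0.9) on N(0,1) data: returns the final
    sample size and whether the interval Ybar +/- d covers the true mean 0."""
    z = NormalDist().inv_cdf((1 + gamma) / 2)
    lam = (z / d) ** 2
    eta = (5 + z * z) / (2 * c[-1]) - 0.5       # eta_0 = 4.421/c_{k-2} - 0.5
    ys = [rng.gauss(0, 1) for _ in range(m0)]
    M = m0
    for cj in c:                                # middle stages
        M_next = max(int(cj * lam * variance(ys)) + 1, M)
        ys += [rng.gauss(0, 1) for _ in range(M_next - M)]
        M = M_next
    M_final = max(int(lam * variance(ys) + eta) + 1, M)   # final stage
    ys += [rng.gauss(0, 1) for _ in range(M_final - M)]
    return M_final, abs(mean(ys)) <= d

rng = random.Random(42)
n0 = 61
d = NormalDist().inv_cdf(0.975) / n0 ** 0.5     # d/sigma = z/sqrt(n0)
runs = [five_stage(rng, d) for _ in range(2000)]
mean_M = sum(M for M, _ in runs) / len(runs)
coverage = sum(cov for _, cov in runs) / len(runs)
print(round(mean_M, 1), round(coverage, 3))
```

With the table's entries ($\bar M \approx 62.9$, $p \approx 0.944$ for this cell) as a reference point, the average sample size should come out close to $n_0$ and the empirical coverage close to 0.95.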
5. Concluding remarks
We have demonstrated that, by choosing suitable values of $k$ and the $c_i$'s, the $k$-stage procedure can be as efficient as the ACR procedure in terms of sample size. But, of course, the $k$-stage procedure requires substantially fewer sampling operations than the ACR procedure. The problem of constructing fixed width confidence intervals serves only to demonstrate the idea, which can be used to deal with many other problems, for example sequential estimation (see, e.g., Woodroofe, 1977), ranking and selection (see, e.g., Mukhopadhyay and Solanky, 1994), hypothesis testing (see, e.g., Wittes and Brittain, 1990), and simultaneous confidence intervals (see, e.g., Liu, 1995).
Acknowledgements
I would like to thank a referee for reading the manuscript very carefully and for useful comments.
Appendix
Although we can proceed to prove a result corresponding to Theorem 2 of Hall (1981, p. 1233) under (1.2) and the assumption that $E|Y_i|^{4r} < \infty$ (the normality of $Y_i$ is not necessary), the proof given below uses the normality of $Y_i$. This allows us to employ the Helmert transformation and work only with partial sums of i.i.d. random variables, and hence simplifies the proofs; e.g., the conditional expectation result of Lemma 2 in Hall (1981, p. 1233) can be simplified considerably. Throughout the appendix, $C_0$ denotes a generic constant and $I(S)$ denotes the indicator function of the set $S$.
Let $X_i = \{i(i+1)\}^{-1}(iY_{i+1} - \sum_{j=1}^{i} Y_j)^2/\sigma^2$, $i = 1, 2, \ldots$, which are independent $\chi_1^2$ random variables. Denote $\bar X_n = (1/n)\sum_{i=1}^{n} X_i$ and $S_n = \sum_{i=1}^{n} X_i$. Let $c_0 = c_{k-1} = 1$,
N o = m o - 1 and
N, = max {[clnoXuo],No), L, = [c,no)(N,~],
Nk-2 = max{[ck_2noXN, 3] ,Nk-3}, L~-2 = [Ck-ZnO2N~_~],
N k - l = max{[no)(N~ _~ + q],Nk-2}, Lk , = [no~v~_2 + r/].
Then it is clear that $M_i = N_i + 1$, $i = 1, \ldots, k-1$. We shall prove
$$E(N_{k-1}) = n_0 - \mathrm{Var}(X_1)/c_{k-2} - \tfrac{1}{2} + \eta + o(1), \qquad (\mathrm{A.1})$$
$$\mathrm{Var}(N_{k-1}) = n_0\,\mathrm{Var}(X_1)/c_{k-2} + o(n_0), \qquad (\mathrm{A.2})$$
$$E|N_{k-1} - EN_{k-1}|^3 = o(n_0^2). \qquad (\mathrm{A.3})$$
We first state Lemma A.1, which collects all those asymptotically negligible sets that will be used in the sequel.
Lemma A.1. Let
$$A_i(\varepsilon) = \{|\bar X_{N_i} - 1| \ge \varepsilon\}, \qquad i = 0, 1, \ldots, k-2,$$
$$B_i = \{N_i = N_{i-1}\}, \qquad i = 1, \ldots, k-1,$$
$$C_i(\varepsilon) = \{|N_i - c_i n_0| \ge \varepsilon c_i n_0\}, \qquad i = 1, \ldots, k-2.$$
Then for any $\varepsilon > 0$ there exist $0 < \delta_0 < 1$ and $0 < \delta_1 = \delta_1(\varepsilon) < 1$ such that for all sufficiently large $n_0$,
$$B_1 \subseteq \{|\bar X_{N_0} - 1| \ge \delta_0\} = A_0(\delta_0),$$
$$C_i(\varepsilon) \subseteq B_i \cup A_{i-1}(\delta_1),$$
$$B_{i+1} \subseteq C_i(\delta_0) \cup \{|\bar X_{[(1-\delta_0)c_i n_0]} - 1| \ge \delta_0\},$$
$$A_i(\varepsilon) \subseteq C_i(\delta_1) \cup \{|\bar X_{[(1-\delta_1)c_i n_0]} - 1| \ge \delta_1\} \cup \{|\bar X_{[(1+\delta_1)c_i n_0]} - 1| \ge \delta_1\}$$
for $i = 1, \ldots, k-2$.
From the assumption $n_0 = O(m_0^r)$ of (1.2) and the Markov inequality we have that, as $n_0 \to \infty$,
$$P\{|\bar X_{N_0} - 1| \ge \delta_0\} = O(n_0^{-p}) \quad\text{and}\quad P\{|\bar X_{[(1\pm\varepsilon)c_i n_0]} - 1| \ge \delta_1\} = O(n_0^{-p})$$
for any $p > 0$, since $E(X_1^{2p}) < \infty$. It follows therefore from Lemma A.1 that for all sufficiently small $\varepsilon > 0$ and any $p > 0$,
$$P(A_i(\varepsilon)) = O(n_0^{-p}), \quad P(B_i) = O(n_0^{-p}) \quad\text{and}\quad P(C_i(\varepsilon)) = O(n_0^{-p}) \quad\text{as } n_0 \to \infty$$
for all those $A_i(\varepsilon)$, $B_i$ and $C_i(\varepsilon)$ defined in Lemma A.1.
Lemma A.2. We have that, for some constant $C_0$,
$$E\{N_{k-3}(\bar X_{N_{k-3}} - 1)\} = 0,$$
$$E\{N_{k-3}(\bar X_{N_{k-3}} - 1)^2\} = \mathrm{Var}(X_1) + E\{((S_{N_{k-4}} - N_{k-4})^2 - N_{k-4}\mathrm{Var}(X_1))/N_{k-3}\},$$
$$E\{(\sqrt{N_i}\,(\bar X_{N_i} - 1))^4\} \le C_0, \qquad i = 0, \ldots, k-1,$$
$$E\{(S_{N_i} - N_i)^2 - N_i\,\mathrm{Var}(X_1)\} = 0, \qquad i = 0, \ldots, k-1.$$
Proof. The expectation on the right-hand side of the second assertion is taken to be zero if $k = 3$. The first two assertions are clearly true if $k = 3$, and can be proved in a fairly straightforward way by using the conditional expectation formula, conditioning on $X_1, \ldots, X_{N_{k-4}}$, when $k \ge 4$. We use mathematical induction to prove the third assertion. It is true when $i = 0$; assume that it is true when $i = j$ ($0 \le j \le k-2$). Then when $i = j+1$ we have
$$E\{(\sqrt{N_{j+1}}\,(\bar X_{N_{j+1}} - 1))^4\}$$
$$= E\Big\{N_{j+1}^{-2}\big((S_{N_j} - N_j)^4 + 6(S_{N_j} - N_j)^2(N_{j+1} - N_j)\mathrm{Var}(X_1)$$
$$\qquad + 4(S_{N_j} - N_j)(N_{j+1} - N_j)E\{(X_1 - 1)^3\} + 3(N_{j+1} - N_j)(N_{j+1} - N_j - 1)(\mathrm{Var}(X_1))^2 + (N_{j+1} - N_j)E\{(X_1 - 1)^4\}\big)\Big\}$$
$$\le E\{(\sqrt{N_j}\,(\bar X_{N_j} - 1))^4\} + 6E\{(\sqrt{N_j}\,(\bar X_{N_j} - 1))^2\}\mathrm{Var}(X_1)$$
$$\qquad + 4E(\sqrt{N_j}\,|\bar X_{N_j} - 1|)E\{|X_1 - 1|^3\} + 3(\mathrm{Var}(X_1))^2 + E\{(X_1 - 1)^4\},$$
which is bounded by the Cauchy-Schwarz inequality and the inductive assumption that $E\{(\sqrt{N_j}\,(\bar X_{N_j} - 1))^4\}$ is bounded. The fourth assertion also follows directly from a conditional argument. $\square$
Lemma A.3. As $n_0 \to \infty$, we have that for any $0 \le j \le k-1$ and $\varepsilon > 0$,
$$E(N_j^3 I(A_i(\varepsilon))) = o(1), \quad E(N_j^3 I(B_i)) = o(1) \quad\text{and}\quad E(N_j^3 I(C_i(\varepsilon))) = o(1)$$
for all those $A_i(\varepsilon)$, $B_i$ and $C_i(\varepsilon)$ defined in Lemma A.1.
Proof. Let $H$ be either $A_i(\varepsilon)$ or $B_i$ or $C_i(\varepsilon)$. We shall show $E\{N_j^3 I(H)\} = o(1)$ by using mathematical induction on $j$. This is true when $j = 0$ by Lemma A.1; assume this is true when $j = l$ ($0 \le l \le k-2$). Then, when $j = l+1$, it follows from the inductive assumption and Lemma A.1 that
$$E\{N_{l+1}^3 I(H)\} \le E\{(c_{l+1}n_0\bar X_{N_l} + N_l + \eta + 1)^3 I(H)\} \le 9c_{l+1}^3 n_0^3 E\{(\bar X_{N_l})^3 I(H)\} + o(1).$$
It remains to show that
$$(*) \equiv n_0^3 E\{(\bar X_{N_l})^3 I(H)\} = o(1),$$
which can be seen from Lemmas A.1 and A.2 and
$$(*) \le 4n_0^3 E\{|\bar X_{N_l} - 1|^3 I(H)\} + 4n_0^3 P(H) \le 4n_0^3 (E\{(\bar X_{N_l} - 1)^4\})^{3/4}(P(H))^{1/4} + o(1) = o(1).$$
The proof is thus completed. $\square$
Lemma A.4. As $n_0 \to \infty$, $n_0 E(\bar X_{N_{k-2}}) = n_0 - \mathrm{Var}(X_1)/c_{k-2} + o(1)$.
Proof. Let
$$D_{k-3} = \begin{cases} B_{k-2} \cup A_{k-3}(\varepsilon) \cup C_{k-3}(\varepsilon) & \text{if } k \ge 4,\\ B_{k-2} \cup A_{k-3}(\varepsilon) & \text{if } k = 3,\end{cases}$$
with $\varepsilon > 0$. We shall show that as $n_0 \to \infty$,
$$n_0 E\{\bar X_{N_{k-2}} I(D_{k-3})\} = o(1),$$
$$n_0 E\{\bar X_{N_{k-2}} I(D_{k-3}^c)\} = n_0 - \mathrm{Var}(X_1)/c_{k-2} + o(1), \qquad (\mathrm{A.4})$$
from which the lemma follows. First we have
$$n_0 E\{\bar X_{N_{k-2}} I(D_{k-3})\} = n_0 E\{I(D_{k-3})\,E(\bar X_{N_{k-2}} \mid X_1, \ldots, X_{N_{k-3}})\}$$
$$= n_0 E\Big\{I(D_{k-3})\Big(\frac{N_{k-3}}{N_{k-2}}(\bar X_{N_{k-3}} - 1) + 1\Big)\Big\}$$
$$\le n_0 E(I(D_{k-3})|\bar X_{N_{k-3}} - 1|) + n_0 P(D_{k-3})$$
$$\le n_0 \{P(D_{k-3})\,E(\sqrt{N_{k-3}}\,|\bar X_{N_{k-3}} - 1|)^2\}^{1/2} + n_0 P(D_{k-3}),$$
which is $o(1)$ since $n_0^2 P(D_{k-3}) = o(1)$ by Lemma A.1 and $E(\sqrt{N_{k-3}}\,|\bar X_{N_{k-3}} - 1|)^2$ is bounded by Lemma A.2.
To prove (A.4), note that
$$c_{k-2} n_0 E\{\bar X_{N_{k-2}} I(D_{k-3}^c)\} = c_{k-2} n_0 E\{I(D_{k-3}^c)\,E(\bar X_{N_{k-2}} \mid X_1, \ldots, X_{N_{k-3}})\}$$
$$= E\Big\{I(D_{k-3}^c)\,\frac{c_{k-2} n_0}{N_{k-2}}\,N_{k-3}(\bar X_{N_{k-3}} - 1)\Big\} + c_{k-2} n_0 P(D_{k-3}^c)$$
$$= E\{I(D_{k-3}^c)(1 - (\bar X_{N_{k-3}} - 1) + R_{k-3})N_{k-3}(\bar X_{N_{k-3}} - 1)\} + c_{k-2} n_0 + o(1), \qquad (\mathrm{A.5})$$
where $R_{k-3} \equiv c_{k-2} n_0/N_{k-2} - 1 + (\bar X_{N_{k-3}} - 1)$. Now, on $D_{k-3}^c$, we have $R_{k-3} = c_{k-2} n_0/[c_{k-2} n_0 \bar X_{N_{k-3}}] - 1 + (\bar X_{N_{k-3}} - 1)$, and it can be shown in a straightforward way that $|R_{k-3}| \le C_0\{(\bar X_{N_{k-3}} - 1)^2 + (c_{k-2} n_0)^{-1}\}$ for some constant $C_0$. Consequently,
$$|E\{I(D_{k-3}^c)R_{k-3}N_{k-3}(\bar X_{N_{k-3}} - 1)\}|$$
$$\le C_0 E\{I(D_{k-3}^c)((\bar X_{N_{k-3}} - 1)^2 + (c_{k-2} n_0)^{-1})N_{k-3}|\bar X_{N_{k-3}} - 1|\}$$
$$\le C_0\varepsilon\,E\{I(D_{k-3}^c)((\bar X_{N_{k-3}} - 1)^2 + (c_{k-2} n_0)^{-1})N_{k-3}\}$$
$$\le C_0\varepsilon\,(E\{N_{k-3}(\bar X_{N_{k-3}} - 1)^2\} + (1 + \varepsilon)c_{k-3}/c_{k-2}) \to 0 \quad\text{as } \varepsilon \to 0, \qquad (\mathrm{A.6})$$
since $E\{N_{k-3}(\bar X_{N_{k-3}} - 1)^2\}$ is bounded by Lemma A.2.
We also have
$$|E\{I(D_{k-3}^c)N_{k-3}(\bar X_{N_{k-3}} - 1)\}|$$
$$= |E\{N_{k-3}(\bar X_{N_{k-3}} - 1)\} - E\{I(D_{k-3})N_{k-3}(\bar X_{N_{k-3}} - 1)\}|$$
$$= |E\{I(D_{k-3})\sqrt{N_{k-3}}\cdot\sqrt{N_{k-3}}\,(\bar X_{N_{k-3}} - 1)\}|$$
$$\le (E\{I(D_{k-3})N_{k-3}\}\,E\{N_{k-3}(\bar X_{N_{k-3}} - 1)^2\})^{1/2} = o(1) \qquad (\mathrm{A.7})$$
by Lemmas A.2 and A.3.
by Lemmas A.2 and A.3. Finally, we show that
c E{I(Dk 3)Nk 3(XN~_3 l ) 2 } ~ V a r ( X l ) + o ( l ) , (A.8)
and then (A.4) follows from (A.5) (A.8). It follows from Lemma A.2 that
E{](D~. 3)Nk 3(XNk_; - l ) 2}
E{Nk_3(2s,,, ~ - l ) 2} - E{I(DN,, 3)Nk-3()(N,. ~ - 1)21}
= E { N k 3(2,v~ ~ - 1 ) 2 } + o ( 1 )
= Var(Xl ) + E{((SN~_4 - N/,-4)2 N~_4Var(Xi ))/Nk-3 } + o(1 )
and so (A.8) is obviously true when k =--3. It remains to show that when k >~4
E{((S~-~ 4 - Nk-4): -- Nk-4 Var(X1 ) ) /Nk-3 } = o(1 ).
Let
$$H = \begin{cases} C_{k-3}(\varepsilon) & \text{if } k = 4,\\ C_{k-3}(\varepsilon) \cup C_{k-4}(\varepsilon) & \text{if } k \ge 5,\end{cases}$$
with $\varepsilon > 0$; then it follows from Lemmas A.1 and A.2 that
$$|E\{I(H)((S_{N_{k-4}} - N_{k-4})^2 - N_{k-4}\mathrm{Var}(X_1))/N_{k-3}\}|$$
$$\le N_0^{-1}E\{I(H)N_{k-4}^2(\bar X_{N_{k-4}} - 1)^2\} + \mathrm{Var}(X_1)P(H)$$
$$\le (E\{(N_{k-4}(\bar X_{N_{k-4}} - 1)^2)^2\}N_0^{-2}P(H))^{1/2} + \mathrm{Var}(X_1)P(H) = o(1). \qquad (\mathrm{A.9})$$
We also have that, for all sufficiently large $n_0$,
$$E\{I(H^c)((S_{N_{k-4}} - N_{k-4})^2 - N_{k-4}\mathrm{Var}(X_1))/N_{k-3}\}$$
$$\le E\{I(H^c)(S_{N_{k-4}} - N_{k-4})^2/((1-\varepsilon)c_{k-3}n_0)\} - E\{I(H^c)N_{k-4}\mathrm{Var}(X_1)/((1+\varepsilon)c_{k-3}n_0)\}$$
$$= E\{I(H^c)((S_{N_{k-4}} - N_{k-4})^2 - N_{k-4}\mathrm{Var}(X_1))/((1-\varepsilon)c_{k-3}n_0)\}$$
$$\qquad + E\{I(H^c)N_{k-4}\mathrm{Var}(X_1)2\varepsilon/((1-\varepsilon)(1+\varepsilon)c_{k-3}n_0)\}$$
$$= -E\{I(H)((S_{N_{k-4}} - N_{k-4})^2 - N_{k-4}\mathrm{Var}(X_1))/((1-\varepsilon)c_{k-3}n_0)\}$$
$$\qquad + E\{I(H^c)N_{k-4}\mathrm{Var}(X_1)2\varepsilon/((1-\varepsilon)(1+\varepsilon)c_{k-3}n_0)\}$$
$$\le o(1) + c_{k-4}\mathrm{Var}(X_1)2\varepsilon/((1-\varepsilon)(1+\varepsilon)c_{k-3}) \to 0 \quad\text{as } \varepsilon \to 0, \qquad (\mathrm{A.10})$$
where the second equality follows from the fourth assertion of Lemma A.2, and the $o(1)$ term follows from the Cauchy-Schwarz inequality and Lemmas A.2 and A.3. A similar argument establishes that
$$E\{I(H^c)((S_{N_{k-4}} - N_{k-4})^2 - N_{k-4}\mathrm{Var}(X_1))/N_{k-3}\}$$
$$\ge o(1) - c_{k-4}\mathrm{Var}(X_1)2\varepsilon/((1-\varepsilon)(1+\varepsilon)c_{k-3}) \to 0 \quad\text{as } \varepsilon \to 0. \qquad (\mathrm{A.11})$$
Now (A.8) follows clearly from (A.9)-(A.11), and so the proof is completed. $\square$
The following lemma can be proved easily by using Lemma A.3.

Lemma A.5. As $n_0 \to \infty$, $E(N_{k-1}) = E(L_{k-1}) + o(1)$.
Lemma A.6. As $n_0 \to \infty$, $U_{n_0} \equiv n_0\bar X_{L_{k-2}} + \eta - [n_0\bar X_{L_{k-2}} + \eta]$ is asymptotically uniform on $(0,1)$.
Proof. Let $J \equiv [c_{k-2}n_0\bar X_{N_{k-3}}]$, $V \equiv c_{k-2}n_0\bar X_{N_{k-3}} - J$. An argument similar to Hall (1981, p. 1237) shows that for $0 < x < 1$, $J > N_{k-3}$ and $V \in (0,1)$,
$$P\{U_{n_0} \le x \mid J, V, N_{k-3}\} = x + r_{4,n_0},$$
where
$$|r_{4,n_0}| \le C_0(J^{1/2}/n_0 + J/n_0^2) \quad\text{as } n_0 \to \infty$$
uniformly in $V \in (0,1)$ and $J > (1+\varepsilon)N_{k-3}$ for $\varepsilon > 0$ and some constant $C_0$. Consequently,
$$P\{U_{n_0} \le x\} = E\{P(U_{n_0} \le x \mid J, V, N_{k-3})\}$$
$$= E\{P(U_{n_0} \le x \mid J, V, N_{k-3})I(J > (1+\varepsilon)N_{k-3})\} + E\{P(U_{n_0} \le x \mid J, V, N_{k-3})I(J \le (1+\varepsilon)N_{k-3})\}$$
$$= E\{(x + r_{4,n_0})I(J > (1+\varepsilon)N_{k-3})\} + E\{P(U_{n_0} \le x \mid J, V, N_{k-3})I(J \le (1+\varepsilon)N_{k-3})\}$$
$$= x + r_{5,n_0},$$
where
$$|r_{5,n_0}| \le P\{J \le (1+\varepsilon)N_{k-3}\} + C_0 E(J^{1/2}/n_0 + J/n_0^2) + P\{J \le (1+\varepsilon)N_{k-3}\}.$$
It remains to show that, as $n_0 \to \infty$ and for small $\varepsilon > 0$,
$$P\{J \le (1+\varepsilon)N_{k-3}\} = o(1) \quad\text{and}\quad E(J^{1/2}/n_0 + J/n_0^2) = o(1).$$
Now
$$E(J/n_0^2) \le c_{k-2}E(\bar X_{N_{k-3}})/n_0 = o(1)$$
by Lemma A.2 (and hence also $E(J^{1/2}/n_0) \le (E(J)/n_0^2)^{1/2} = o(1)$), and for all sufficiently large $n_0$, small $\varepsilon > 0$ and $k \ge 4$,
$$P\{J \le (1+\varepsilon)N_{k-3}\} \le P\{c_{k-2}n_0\bar X_{N_{k-3}} \le (1+\varepsilon)N_{k-3} + 1\}$$
$$\le P\{c_{k-2}n_0\bar X_{N_{k-3}} \le c_{k-2}n_0(1-\varepsilon)\} + P\{c_{k-2}n_0(1-\varepsilon) \le (1+\varepsilon)N_{k-3} + 1\}$$
$$\le P\{A_{k-3}(\varepsilon)\} + P\{C_{k-3}(\varepsilon)\}, \qquad (\mathrm{A.12})$$
which is $o(1)$ by Lemma A.1. When $k = 3$, it follows directly from (A.12) that
$$P\{J \le (1+\varepsilon)N_{k-3}\} = o(1)$$
by noting that $\limsup N_0/(c_1 n_0) < 1$. The proof is thus completed. $\square$
We are now ready to prove (A.1). It follows from Lemmas A.4-A.6 that
$$E(N_{k-1}) = E(L_{k-1}) + o(1) = E([n_0\bar X_{N_{k-2}} + \eta]) + o(1)$$
$$= E([n_0\bar X_{L_{k-2}} + \eta] - (n_0\bar X_{L_{k-2}} + \eta)) + E(n_0\bar X_{L_{k-2}} - n_0\bar X_{N_{k-2}})$$
$$\qquad + E(n_0\bar X_{N_{k-2}} + \eta) + E([n_0\bar X_{N_{k-2}} + \eta] - [n_0\bar X_{L_{k-2}} + \eta]) + o(1)$$
$$= -\tfrac{1}{2} + E(n_0\bar X_{L_{k-2}} - n_0\bar X_{N_{k-2}}) + n_0 - \mathrm{Var}(X_1)/c_{k-2} + \eta$$
$$\qquad + E([n_0\bar X_{N_{k-2}} + \eta] - [n_0\bar X_{L_{k-2}} + \eta]) + o(1),$$
and so (A.1) will follow if we show that
$$E(n_0\bar X_{L_{k-2}} - n_0\bar X_{N_{k-2}}) = o(1) \quad\text{and}\quad E([n_0\bar X_{N_{k-2}} + \eta] - [n_0\bar X_{L_{k-2}} + \eta]) = o(1).$$
Let $B_{k-2}$ be the set defined in Lemma A.1; since $N_{k-2} = L_{k-2}$ on $B_{k-2}^c$,
$$|E(n_0\bar X_{L_{k-2}} - n_0\bar X_{N_{k-2}})| = \Big|\int_{B_{k-2}} (n_0\bar X_{L_{k-2}} - n_0\bar X_{N_{k-2}})\,\mathrm{d}P\Big|$$
$$\le \int_{B_{k-2}} n_0\bar X_{L_{k-2}}\,\mathrm{d}P + \int_{B_{k-2}} n_0\bar X_{N_{k-2}}\,\mathrm{d}P,$$
which is $o(1)$ by the Cauchy-Schwarz inequality and Lemmas A.2 and A.3 as before. A similar argument establishes the other assertion, and hence (A.1) is proved.
Next we prove (A.2) and (A.3). First, Lemma A.7 below can be proved by using Lemma A.3.

Lemma A.7. As $n_0 \to \infty$, $\mathrm{Var}(N_{k-1}) = \mathrm{Var}(L_{k-1}) + o(1)$.

A simple conditional argument, conditioning on $X_1, \ldots, X_{N_{k-3}}$, gives:

Lemma A.8.
$$\mathrm{Var}(\bar X_{N_{k-2}}) = \mathrm{Var}\Big(\frac{N_{k-3}}{N_{k-2}}(\bar X_{N_{k-3}} - 1)\Big) - \mathrm{Var}(X_1)E\Big(\frac{N_{k-3}}{N_{k-2}^2}\Big) + \mathrm{Var}(X_1)E\Big(\frac{1}{N_{k-2}}\Big).$$
Lemma A.9. The following are true as $n_0 \to \infty$:
$$E(N_{k-2}^{-1}) = (c_{k-2}n_0)^{-1}(1 + o(1)), \qquad (\mathrm{A.13})$$
$$E(N_{k-3})/n_0 \le C_0, \qquad (\mathrm{A.14})$$
$$E\Big(\frac{N_{k-3}}{N_{k-2}}(\bar X_{N_{k-3}} - 1)\Big) = o(n_0^{-1/2}), \qquad (\mathrm{A.15})$$
$$E(|\bar X_{N_{k-2}} - 1|^3) = o(n_0^{-1}). \qquad (\mathrm{A.16})$$
Proof. We first prove (A.13). Let $C_{k-2}(\varepsilon)$ be the set defined in Lemma A.1 with $\varepsilon > 0$. Then it follows from Lemma A.1 that
$$\Big|E\Big(\frac{c_{k-2}n_0}{N_{k-2}}\Big) - 1\Big| \le \int_{C_{k-2}^c(\varepsilon)} \Big|\frac{c_{k-2}n_0}{N_{k-2}} - 1\Big|\,\mathrm{d}P + \int_{C_{k-2}(\varepsilon)} \frac{c_{k-2}n_0}{N_{k-2}}\,\mathrm{d}P + P(C_{k-2}(\varepsilon))$$
$$\le \frac{\varepsilon}{1-\varepsilon} + c_{k-2}n_0 P(C_{k-2}(\varepsilon)) + P(C_{k-2}(\varepsilon)) \to 0$$
as $n_0 \to \infty$ and then $\varepsilon \to 0$. This proves (A.13). The assertion (A.14) is clearly true when $k = 3$, and can be proved in a similar way to (A.13) by using Lemma A.3 when $k \ge 4$. Assertion (A.15) follows from Lemmas A.1-A.3 by noting that $E\{N_{k-3}(\bar X_{N_{k-3}} - 1)\} = 0$ and
$$\sqrt{n_0}\,\Big|E\Big\{\Big(\frac{N_{k-3}}{N_{k-2}} - \frac{N_{k-3}}{c_{k-2}n_0}\Big)(\bar X_{N_{k-3}} - 1)\Big\}\Big|$$
$$\le C_0\varepsilon\,E(\sqrt{N_{k-3}}\,|\bar X_{N_{k-3}} - 1|) + C_0\sqrt{n_0}\,E\{I(C_{k-2}(\varepsilon))\sqrt{N_{k-3}}\,|\bar X_{N_{k-3}} - 1|\}$$
$$\le C_0\varepsilon\,E(\sqrt{N_{k-3}}\,|\bar X_{N_{k-3}} - 1|) + C_0\sqrt{n_0}\,\{P(C_{k-2}(\varepsilon))\,E(N_{k-3}|\bar X_{N_{k-3}} - 1|^2)\}^{1/2}$$
$$= C_0\varepsilon\,E(\sqrt{N_{k-3}}\,|\bar X_{N_{k-3}} - 1|) + o(1) \to 0 \quad\text{as } \varepsilon \to 0.$$
To prove (A.16), note that
$$E(|\bar X_{N_{k-2}} - 1|^3) = E(|\bar X_{N_{k-2}} - 1|^3 I(C_{k-2}^c(\varepsilon))) + E(|\bar X_{N_{k-2}} - 1|^3 I(C_{k-2}(\varepsilon)))$$
$$\le ((1-\varepsilon)c_{k-2}n_0)^{-3/2}E\{(\sqrt{N_{k-2}}\,|\bar X_{N_{k-2}} - 1|)^3\} + (E(\bar X_{N_{k-2}} - 1)^4)^{3/4}(P(C_{k-2}(\varepsilon)))^{1/4} = n_0^{-1}o(1)$$
by Lemmas A.1 and A.2. This completes the proof of the lemma. $\square$
Lemma A.10. As $n_0 \to \infty$,
$$\mathrm{Var}\Big(\frac{N_{k-3}}{N_{k-2}}(\bar X_{N_{k-3}} - 1)\Big) - \mathrm{Var}(X_1)E\Big(\frac{N_{k-3}}{N_{k-2}^2}\Big) = o(n_0^{-1}).$$
Proof. From Lemma A.9 (assertion (A.15)), the left-hand side of the equation above is equal to
$$E\Big\{\Big(\frac{N_{k-3}}{N_{k-2}}\Big)^2(\bar X_{N_{k-3}} - 1)^2\Big\} - \Big\{E\Big(\frac{N_{k-3}}{N_{k-2}}(\bar X_{N_{k-3}} - 1)\Big)\Big\}^2 - \mathrm{Var}(X_1)E\Big(\frac{N_{k-3}}{N_{k-2}^2}\Big)$$
$$= E\Big\{\Big(\frac{N_{k-3}}{N_{k-2}}\Big)^2(\bar X_{N_{k-3}} - 1)^2\Big\} - \mathrm{Var}(X_1)E\Big(\frac{N_{k-3}}{N_{k-2}^2}\Big) + o(n_0^{-1}),$$
and so it remains to show that
$$(**) \equiv E\Big\{\Big(\frac{N_{k-3}}{N_{k-2}}\Big)^2(\bar X_{N_{k-3}} - 1)^2\Big\} - \mathrm{Var}(X_1)E\Big(\frac{N_{k-3}}{N_{k-2}^2}\Big) = o(n_0^{-1}).$$
It follows from Lemmas A.1-A.3 and Wald's equation (the fourth assertion of Lemma A.2) that
$$(**) \le E\{N_{k-3}(\bar X_{N_{k-3}} - 1)^2 I(C_{k-2}(\varepsilon))\} + ((1-\varepsilon)c_{k-2}n_0)^{-2}E\{(S_{N_{k-3}} - N_{k-3})^2\}$$
$$\qquad + \mathrm{Var}(X_1)P(C_{k-2}(\varepsilon)) - ((1+\varepsilon)c_{k-2}n_0)^{-2}\mathrm{Var}(X_1)E\{N_{k-3}I(C_{k-2}^c(\varepsilon))\}$$
$$= \{((1-\varepsilon)c_{k-2}n_0)^{-2} - ((1+\varepsilon)c_{k-2}n_0)^{-2}\}\mathrm{Var}(X_1)E(N_{k-3}) + o(n_0^{-1})$$
$$= \frac{4\varepsilon\,\mathrm{Var}(X_1)}{((1-\varepsilon)(1+\varepsilon)c_{k-2})^2}\,\frac{E(N_{k-3})}{n_0}\,\frac{1}{n_0} + o\Big(\frac{1}{n_0}\Big),$$
which is $o(1/n_0)$ since $\varepsilon$ can be arbitrarily small and $E(N_{k-3})/n_0 \le C_0$ by Lemma A.9; a similar argument shows that
$$(**) \ge -\frac{4\varepsilon\,\mathrm{Var}(X_1)}{((1-\varepsilon)(1+\varepsilon)c_{k-2})^2}\,\frac{E(N_{k-3})}{n_0}\,\frac{1}{n_0} + o\Big(\frac{1}{n_0}\Big) \to 0 \quad\text{as } \varepsilon \to 0.$$
The proof is thus completed. $\square$
Now we prove (A.2). From Lemma A.7 it suffices to show that
$$\mathrm{Var}(L_{k-1}) = n_0\,\mathrm{Var}(X_1)/c_{k-2} + o(n_0).$$
From Lemmas A.8-A.10, we have
$$n_0^{-2}\,\mathrm{Var}(n_0\bar X_{N_{k-2}} + \eta) = \mathrm{Var}(\bar X_{N_{k-2}})$$
$$= \mathrm{Var}\Big(\frac{N_{k-3}}{N_{k-2}}(\bar X_{N_{k-3}} - 1)\Big) - \mathrm{Var}(X_1)E\Big(\frac{N_{k-3}}{N_{k-2}^2}\Big) + \mathrm{Var}(X_1)E\Big(\frac{1}{N_{k-2}}\Big)$$
$$= o(n_0^{-1}) + \mathrm{Var}(X_1)(c_{k-2}n_0)^{-1}(1 + o(1)) = \frac{\mathrm{Var}(X_1)}{c_{k-2}}\,\frac{1}{n_0} + o(n_0^{-1}),$$
and so it remains to show that
$$\mathrm{Var}(L_{k-1}) - \mathrm{Var}(n_0\bar X_{N_{k-2}} + \eta) = o(n_0).$$
This follows directly from the following three facts:
$$\mathrm{Var}(L_{k-1}) = \mathrm{Var}\{(n_0\bar X_{N_{k-2}} + \eta) + (L_{k-1} - n_0\bar X_{N_{k-2}} - \eta)\}$$
$$= \mathrm{Var}(n_0\bar X_{N_{k-2}} + \eta) + \mathrm{Var}(L_{k-1} - n_0\bar X_{N_{k-2}} - \eta) + 2\,\mathrm{Cov}(n_0\bar X_{N_{k-2}},\ L_{k-1} - n_0\bar X_{N_{k-2}} - \eta),$$
$$\mathrm{Var}(L_{k-1} - n_0\bar X_{N_{k-2}} - \eta) \le 1 \quad\text{since } |L_{k-1} - n_0\bar X_{N_{k-2}} - \eta| \le 1,$$
and
$$\mathrm{Cov}(n_0\bar X_{N_{k-2}},\ L_{k-1} - n_0\bar X_{N_{k-2}} - \eta) \le (\mathrm{Var}(n_0\bar X_{N_{k-2}})\,\mathrm{Var}(L_{k-1} - n_0\bar X_{N_{k-2}} - \eta))^{1/2}$$
$$\le (\mathrm{Var}(n_0\bar X_{N_{k-2}}))^{1/2} = (n_0\,\mathrm{Var}(X_1)/c_{k-2} + o(n_0))^{1/2} = o(n_0).$$
The proof of (A.2) is thus completed. To prove (A.3), note that
$$E|N_{k-1} - EN_{k-1}|^3 = E|N_{k-1} - n_0 + \mathrm{Var}(X_1)/c_{k-2} + \tfrac{1}{2} - \eta + o(1)|^3,$$
and so it suffices to show that $E|N_{k-1} - n_0|^3 = o(n_0^2)$. Let $B_{k-1}$ be the set defined in Lemma A.1; then
$$\int_{B_{k-1}} |N_{k-1} - n_0|^3\,\mathrm{d}P \le 4\int_{B_{k-1}} (N_{k-2}^3 + n_0^3)\,\mathrm{d}P = o(1)$$
by Lemmas A.1 and A.3, and
$$\int_{B_{k-1}^c} |N_{k-1} - n_0|^3\,\mathrm{d}P = \int_{B_{k-1}^c} |[n_0\bar X_{N_{k-2}} + \eta] - n_0|^3\,\mathrm{d}P$$
$$\le E|[n_0\bar X_{N_{k-2}} + \eta] - n_0|^3 \le E\{(n_0|\bar X_{N_{k-2}} - 1| + \eta + 1)^3\} = o(n_0^2)$$
since $E\{(n_0|\bar X_{N_{k-2}} - 1|)^3\} = n_0^3 E|\bar X_{N_{k-2}} - 1|^3 = o(n_0^2)$ by Lemma A.9. So $E|N_{k-1} - n_0|^3 = o(n_0^2)$ and the proof is completed.
Proof of Theorem 3. The result on $E(M_{k-1})$ follows obviously from (A.1) since $M_{k-1} = N_{k-1} + 1$. The result on the coverage probability can be proved by using the Taylor expansion and (A.1)-(A.3) in the usual way (see, e.g., Hall, 1981, p. 1231), except that the following result is required:
$$E\{\psi'''(\zeta)(p^2 M_{k-1} - p^2 EM_{k-1})^3\} = o(p^2) \quad\text{as } n_0 \to \infty, \qquad (\mathrm{A.17})$$
where $\psi(x) = 2\Phi(\sqrt{x}) - 1$, $p = d/\sigma = z/\sqrt{n_0}$ and $\zeta$ is an intermediate point between $p^2 M_{k-1}$ and $p^2 EM_{k-1} = z^2 + (\tfrac{1}{2} + \eta - 2/c_{k-2})p^2 + o(p^2)$. To prove this, let $E = \{p^2 M_{k-1} \ge p^2 EM_{k-1} - \delta\}$, where $\delta > 0$ is sufficiently small such that $p^2 EM_{k-1} \ge 2\delta$ for all sufficiently large $n_0$. By noting that $\psi'''(\zeta)$ is bounded on $E$, we have
$$|E\{\psi'''(\zeta)(p^2 M_{k-1} - p^2 EM_{k-1})^3 I(E)\}|$$
$$\le C_0 p^6 E|M_{k-1} - EM_{k-1}|^3 = C_0 p^6 E|N_{k-1} - EN_{k-1}|^3 = C_0 p^6 o(n_0^2) = o(p^2)$$
by (A.3). Note that $|\psi'''(x)| \le C_0 x^{-5/2}$ for all $x > 0$, and so
$$|E\{\psi'''(\zeta)(p^2 M_{k-1} - p^2 EM_{k-1})^3 I(E^c)\}|$$
$$\le C_0 E\{(p^2 M_{k-1})^{-5/2}(p^2 EM_{k-1})^3 I(E^c)\}$$
$$\le C_0 (p^2 m_0)^{-5/2}(2z^2)^3 P(E^c)$$
$$\le 8C_0 z^6 (p^2 m_0)^{-5/2} P\{p^2 N_{k-1} < p^2 E(N_{k-1}) - \delta\}$$
$$= 8C_0 z^6 (p^2 m_0)^{-5/2} P\{N_{k-1} < n_0 - 2/c_{k-2} - \tfrac{1}{2} + \eta + o(1) - \delta n_0/z^2\}$$
$$\le 8C_0 z^6 (p^2 m_0)^{-5/2} P\{n_0\bar X_{N_{k-2}} + \eta < n_0 - 2/c_{k-2} - \tfrac{1}{2} + \eta + o(1) - \delta n_0/z^2\}$$
$$\le 8C_0 z^6 (p^2 m_0)^{-5/2} P\{\bar X_{N_{k-2}} < 1 - \delta_1\} \qquad (\delta_1 > 0)$$
$$\le 8C_0 z^6 (p^2 m_0)^{-5/2} P\{A_{k-2}(\delta_1)\},$$
which is $o(1/n_0) = o(p^2)$ by Lemma A.1. This completes the proof of (A.17). $\square$
References
Anscombe, F.J., 1953. Sequential estimation. J. Roy. Statist. Soc. Ser. B 15, 1-21.
Chow, Y.S., Robbins, H., 1965. On the asymptotic theory of fixed width confidence intervals for the mean. Ann. Math. Statist. 36, 457-462.
Cox, D.R., 1952. Estimation by double sampling. Biometrika 39, 217-227.
Hall, P., 1981. Asymptotic theory of triple sampling for sequential estimation of a mean. Ann. Statist. 9, 1229-1238.
Liu, W., 1995. Fixed-width simultaneous confidence intervals for all pairwise comparisons. Comput. Statist. Data Anal. 20(1), 35-44.
Mukhopadhyay, N., Solanky, T.K.S., 1994. Multistage Selection and Ranking Procedures: Second-Order Asymptotics. Marcel Dekker, New York.
Robbins, H., 1959. Sequential estimation of the mean of a normal population. In: Probability and Statistics (Harald Cramér Volume). Almqvist and Wiksell, Uppsala.
Stein, C., 1945. A two-sample test for a linear hypothesis whose power is independent of the variance. Ann. Math. Statist. 16, 243-258.
Wittes, J., Brittain, E., 1990. The role of internal pilot studies in increasing the efficiency of clinical trials. Statist. Med. 11, 55-66.
Woodroofe, M., 1977. Second order approximations for sequential point and interval estimation. Ann. Statist. 5, 984-995.