
Statistics & Probability Letters 33 (1997) 109-115


Adaptive estimation of a function of a mean vector

Martin Fox *

Department of Statistics and Probability, A-430 Wells Hall, Michigan State University, East Lansing, MI 48824, USA

Received February 1996

Abstract

A sample is taken from some r-dimensional distribution with mean vector μ and the value of θ = P(μ) is to be estimated. Here P is a polynomial. An initial sample of size n₁ is taken and all components are observed. Assume that, after the initial sample, the remaining sampling budget is C and the cost of each new vector taken is c₀, with an additional cost c_j if component j is observed. A subset D ≠ ∅ of {1, …, r} is selected and a second sample of size n₂ is taken in which only those components with index j ∈ D are observed. It is desired to select D to minimize the MSE of the estimator θ̂ = P(X̄), where the components of X̄ are the sample means of observed components.

Consider the case P(μ) = ∏_{j=1}^r μ_j. Assume that E(∏_{j=1}^r X_j²) < ∞. Except for terms which are o(1/n₁), the MSE

of θ̂ depends only on the means, variances, and covariances. Without loss of generality, the MSE is computed for the case D = {1, …, m}. In the case r = 2 the inequalities are given determining when D = {1} yields smaller MSE than D = {1,2} and when D = {1} yields smaller MSE than D = {2}. In determining D, the means, variances and covariances will be estimated from the initial sample.

The cases of estimating ∑_{j=1}^r μ_j, more general polynomials in μ, and more general functions of μ are discussed briefly.

AMS classification: Primary 62L12; secondary 62L05; 62H12

Keywords: Function of mean vector; Minimize MSE; Multivariate; Observed subset of components; Second-stage budget; Two stage

1. Introduction and notation

Let X be an r-dimensional random vector with mean vector μ, variances σ_j² and covariances σ_jk. The major part of this paper is devoted to the two-stage adaptive estimation of θ = ∏_{j=1}^r μ_j. The sample size for the first stage is n₁ and every component of each vector in the first stage is observed.

Assume that, when the first stage of sampling is complete, the remaining budget is C and the cost of each vector in the second stage is c₀. It is not necessary that every component of the vectors in the second stage be observed, but the same components will be observed in each second-stage vector. The cost for each observation of the jth component is c_j.

* Tel.: (517) 332-3704; fax: (517) 432-1405; e-mail: [email protected].

0167-7152/97/$17.00 © 1997 Elsevier Science B.V. All rights reserved. PII S0167-7152(96)00117-4


Let 𝒟 be the class of all nonempty subsets of {1, …, r} and D_m = {1, …, m}. At the second stage only the components with index j ∈ D for some D ∈ 𝒟 will be observed, so that the second-stage sample size will be

$$n_2 = \left[\frac{C}{c_0 + \sum_{j \in D} c_j}\right], \tag{1}$$

where [·] is the greatest integer function. Let X_1, X_2, …, X_{n₁+n₂} be a sample from the distribution of X and let X_{ij} be the jth component of X_i. The objective will be to choose D so as to minimize the MSE of the usually biased estimator

$$\hat\theta = \prod_{j=1}^{r} \bar X_j,$$

where

$$\bar X_j = \begin{cases} \dfrac{1}{n_1+n_2}\sum_{i=1}^{n_1+n_2} X_{ij}, & \text{if } j \in D,\\[6pt] \dfrac{1}{n_1}\sum_{i=1}^{n_1} X_{ij}, & \text{otherwise.} \end{cases}$$
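The second-stage sample size (1) and the plug-in estimator can be sketched in a few lines (a minimal sketch with hypothetical helper names and 0-based component indices):

```python
import numpy as np

def second_stage_size(C, c0, costs, D):
    """n2 from Eq. (1): the remaining budget C divided by the cost of one
    second-stage vector (c0 plus the observation costs of the components
    in D), truncated to an integer."""
    return int(C // (c0 + sum(costs[j] for j in D)))

def product_estimator(first, second, D):
    """theta_hat = prod_j Xbar_j: component means pool both stages for
    j in D and use only the first stage otherwise.  `first` is the
    n1 x r first-stage sample, `second` the n2 x r second-stage sample
    (only its columns in D are meaningful)."""
    n1, r = first.shape
    n2 = second.shape[0]
    xbar = np.empty(r)
    for j in range(r):
        if j in D:
            xbar[j] = (first[:, j].sum() + second[:, j].sum()) / (n1 + n2)
        else:
            xbar[j] = first[:, j].mean()
    return float(np.prod(xbar))
```

For example, with remaining budget C = 1000, c₀ = 1, and observation costs 10, 10, 5 for three components, observing the first and third components gives a per-vector cost of 16 and hence n₂ = 62.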

Assume that

$$E\Big(\prod_{j=1}^{r} X_j^2\Big) < \infty. \tag{2}$$

Then, it will be shown (Section 2) that, ignoring the integer truncation of n₂ and up to terms which are o(1/n₁) as n₁ → ∞, the MSE of θ̂ depends only on the means, variances and covariances. The resulting formula for the MSE will be obtained in the case that D = D_m.

To apply the formula for the MSE and find the approximately optimal D, the moments will be replaced by their estimates from the first sample.

The result in Section 2 is specialized in Section 3 to the case r = 2 and inequalities are given for D₁ to be better than D₂ and for D₁ to be better than {2}. The inequalities in this case are for the ratio of the coefficients of variation.

Extensions to the estimation of other polynomials in μ are briefly discussed in Section 4, as are extensions to the estimation of other functions of μ. Simulation results are given for the estimation of μ₁ + μ₂ in some selected cases with r = 2. Suggestions for further work are in Section 5.

Although the term is much newer, adaptive designs are as old as sequential analysis. They are very much in the forefront of current work in, for example, clinical trials and quality control. The model assumed here is a generalization of those in Shapiro (1985) and Page (1990) where it was assumed that the components of X are independent. In those papers sampling involves one at a time observations of components of a sample from the distribution of X using a fixed sample size. See Flournoy and Rosenberger (1995) for more recent work in adaptive designs and additional references.

In Shapiro (1985) the main application, letting the Xij be Bernoulli random variables, is to estimate the reliability of a series system or, by the obvious generalization, any system.


2. Estimation of θ = ∏_{j=1}^r μ_j

Let Z_{ij} = X_{ij} − μ_j so that Z̄_j = X̄_j − μ_j. Further, let

$$U = \prod_{j=1}^{r} \bar X_j - \prod_{j=1}^{r} \mu_j = \prod_{j=1}^{r} (\bar Z_j + \mu_j) - \prod_{j=1}^{r} \mu_j. \tag{3}$$

Expand the first product in the second expression for U in (3) to obtain

$$U = \sum_{D \in \mathcal{D}} \Big(\prod_{j \notin D} \mu_j\Big) \Big(\prod_{j \in D} \bar Z_j\Big). \tag{4}$$

By (2), all mixed moments of the Z̄_j of order greater than two are o(1/n₁).

Now consider the two-stage sampling problem originally formulated and, without loss of generality, assume that only the X_{ij} for j ∈ D_m are observed in the second sample. For each j, let X̄_{1j} be the mean of the X_{ij} in the first sample. For each j ∈ D_m, let X̄_{2j} be the mean in the second sample. Then Z̄_j = (n₁Z̄_{1j} + n₂Z̄_{2j})/(n₁ + n₂) if j ∈ D_m and Z̄_j = Z̄_{1j} otherwise. Hence,

$$\operatorname{Var}(\bar Z_j) = \begin{cases} \dfrac{\sigma_j^2}{n_1+n_2}, & \text{if } j \in D_m,\\[6pt] \dfrac{\sigma_j^2}{n_1}, & \text{otherwise,} \end{cases} \tag{5}$$

and, for j ≠ k,

$$\operatorname{Cov}(\bar Z_j, \bar Z_k) = \begin{cases} \dfrac{\sigma_{jk}}{n_1+n_2}, & \text{if } j \text{ or } k \in D_m,\\[6pt] \dfrac{\sigma_{jk}}{n_1}, & \text{otherwise.} \end{cases} \tag{6}$$

Let M_m be the resulting MSE of θ̂. Use (4), (5), and (6) to obtain

$$M_m = \sum_{j=1}^{m} \Big(\prod_{k \ne j} \mu_k\Big)^2 \frac{\sigma_j^2}{n_1+n_2} + \sum_{j=m+1}^{r} \Big(\prod_{k \ne j} \mu_k\Big)^2 \frac{\sigma_j^2}{n_1} + 2\sum_{j=1}^{m} \sum_{k=j+1}^{r} \Big(\prod_{l \ne j} \mu_l\Big)\Big(\prod_{l \ne k} \mu_l\Big) \frac{\sigma_{jk}}{n_1+n_2} + 2\sum_{j=m+1}^{r} \sum_{k=j+1}^{r} \Big(\prod_{l \ne j} \mu_l\Big)\Big(\prod_{l \ne k} \mu_l\Big) \frac{\sigma_{jk}}{n_1} + o\Big(\frac{1}{n_1}\Big) \quad \text{as } n_1 \to \infty. \tag{7}$$

Let T_m = ∑_{j=0}^m c_j, the cost per vector observed in the second stage, and W = C/n₁, the ratio of the remaining budget to the first-stage sample size. From (1), ignoring the truncation, n₂ = C/T_m. Then, multiplying the numerator and the denominator of the appropriate terms of (7) by T_m/n₁ yields, as n₁ → ∞,

$$M_m = \frac{T_m}{n_1(T_m+W)} \bigg[\sum_{j=1}^{m} \Big(\prod_{k \ne j} \mu_k\Big)^2 \sigma_j^2 + 2\sum_{j=1}^{m} \sum_{k=j+1}^{r} \Big(\prod_{l \ne j} \mu_l\Big)\Big(\prod_{l \ne k} \mu_l\Big) \sigma_{jk}\bigg] + \frac{1}{n_1} \bigg[\sum_{j=m+1}^{r} \Big(\prod_{k \ne j} \mu_k\Big)^2 \sigma_j^2 + 2\sum_{j=m+1}^{r} \sum_{k=j+1}^{r} \Big(\prod_{l \ne j} \mu_l\Big)\Big(\prod_{l \ne k} \mu_l\Big) \sigma_{jk}\bigg] + o\Big(\frac{1}{n_1}\Big). \tag{8}$$
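For checking particular cases numerically, the leading term of (8) can be evaluated directly (a hypothetical helper; 0-based indices, with `Sigma` holding the σ_j² on the diagonal and the σ_jk off it):

```python
import numpy as np

def mse_leading_term(mu, Sigma, m, n1, c, W):
    """Leading term of Eq. (8): the approximate MSE M_m of theta_hat
    when D = D_m = {1, ..., m}.  `c` is the cost vector with c[0] = c0,
    and W = C / n1.  The o(1/n1) remainder is dropped."""
    r = len(mu)
    Tm = sum(c[:m + 1])                     # T_m = c_0 + c_1 + ... + c_m
    others = [np.prod(np.delete(mu, j)) for j in range(r)]
    total = 0.0
    for j in range(r):
        for k in range(j, r):
            weight = 1.0 if j == k else 2.0  # variance vs covariance terms
            term = weight * others[j] * others[k] * Sigma[j][k]
            if j < m:                        # the pair meets D_m (j <= k)
                total += term * Tm / (n1 * (Tm + W))
            else:                            # both indices outside D_m
                total += term / n1
    return total
```

With r = 2 this reproduces the explicit expressions for M₂ and M₁ given in Section 3.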


Remark 1. The first terms omitted are the third mixed moments. These terms are O(1/n₁²). Assuming normality, the third mixed moments are all 0 and the first omitted terms are the fourth mixed moments, which are also O(1/n₁²).

Remark 2. Suppose that all the σ_jk = 0. Dividing the numerator and denominator of each term in (8) by ∏_{j=1}^r μ_j² (assumed positive), it is seen that M_m depends on the parameters only through the coefficients of variation, as in Page (1990).

3. The case r = 2

Assume that μ₁, μ₂ > 0. Other cases can also be handled provided that at least one μ_i ≠ 0. Since application of the adaptive procedure involves replacement of μ_i by X̄_i, with probability one the procedure can be applied.

By (8),

$$M_2 = \frac{\mu_2^2\sigma_1^2 (c_0+c_1+c_2)}{n_1(c_0+c_1+c_2+W)} + \frac{\mu_1^2\sigma_2^2 (c_0+c_1+c_2)}{n_1(c_0+c_1+c_2+W)} + \frac{2\mu_1\mu_2\sigma_{12} (c_0+c_1+c_2)}{n_1(c_0+c_1+c_2+W)} + o\Big(\frac{1}{n_1}\Big)$$

and

$$M_1 = \frac{\mu_2^2\sigma_1^2 (c_0+c_1)}{n_1(c_0+c_1+W)} + \frac{\mu_1^2\sigma_2^2}{n_1} + \frac{2\mu_1\mu_2\sigma_{12} (c_0+c_1)}{n_1(c_0+c_1+W)} + o\Big(\frac{1}{n_1}\Big).$$

Let Δ = M₂ − M₁, so that the MSE of θ̂ for observing only the first component in the second stage is smaller than that for observing both if, and only if, Δ > 0. Set τ = μ₂σ₁/(μ₁σ₂), the ratio of the coefficients of variation, and let ρ be the correlation between X₁ and X₂. Multiply Δ by n₁(c₀+c₁+c₂+W)(c₀+c₁+W) and divide by Wμ₁²σ₂². It is then seen that, ignoring the o(1) terms, Δ > 0 is equivalent to

$$c_2\tau^2 + 2\rho c_2 \tau - (c_0 + c_1 + W) > 0,$$

which has positive solution

$$\tau > \sqrt{\rho^2 + \frac{c_0+c_1+W}{c_2}} - \rho. \tag{9}$$
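The selection rule in (9) is cheap to evaluate; a minimal sketch (hypothetical helper name):

```python
from math import sqrt

def h1(c0, c1, c2, W, rho):
    """Right side of (9): D = {1} yields smaller MSE than D = {1,2}
    when tau = (mu2 * sigma1) / (mu1 * sigma2) exceeds this value."""
    return sqrt(rho ** 2 + (c0 + c1 + W) / c2) - rho
```

With c₀ = 1, c₁ = c₂ = 10, W = 10, and ρ = 0.5 this evaluates to about 1.033, the value appearing in the simulations of Section 4.1.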

The MSE of θ̂ for D = {2} is smaller than that for D₂ if, and only if, τ is less than the reciprocal of the right side of (9) modified by interchanging c₁ and c₂.

Let M* be the MSE of θ̂ for D = {2} and Δ* = M* − M₁. An analog of (8) is

$$M^* = \frac{\mu_2^2\sigma_1^2}{n_1} + \frac{\mu_1^2\sigma_2^2 (c_0+c_2)}{n_1(c_0+c_2+W)} + \frac{2\mu_1\mu_2\sigma_{12} (c_0+c_2)}{n_1(c_0+c_2+W)} + o\Big(\frac{1}{n_1}\Big).$$

Multiply Δ* by n₁(c₀+c₂+W)(c₀+c₁+W) and again divide by Wμ₁²σ₂². Ignoring the o(1) terms, Δ* > 0 if, and only if,

$$(c_0 + c_2 + W)\tau^2 + 2(c_2 - c_1)\rho\tau - (c_0 + c_1 + W) > 0,$$

which has positive solution

$$\tau > \sqrt{\Big(\frac{c_2-c_1}{c_0+c_2+W}\Big)^2 \rho^2 + \frac{c_0+c_1+W}{c_0+c_2+W}} - \frac{(c_2-c_1)\rho}{c_0+c_2+W}. \tag{10}$$
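The right side of (10) can be evaluated the same way (a minimal sketch, hypothetical helper name):

```python
from math import sqrt

def h2(c0, c1, c2, W, rho):
    """Right side of (10): D = {1} yields smaller MSE than D = {2}
    when tau exceeds this value."""
    b = (c2 - c1) * rho / (c0 + c2 + W)
    return sqrt(b ** 2 + (c0 + c1 + W) / (c0 + c2 + W)) - b
```

When c₁ = c₂ the expression collapses to 1, consistent with the remark in Section 4.1.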


Let h₁ and h₂ be the right sides of (9) and (10), respectively. Then h₁ is increasing in c₀, c₁ and W and is decreasing in c₂ and in ρ. The only obvious behavior of h₂ is that it is decreasing in ρ if c₂ > c₁ and increasing if c₂ < c₁.

Computations of h₁, h₂, and the reciprocal of h₁ with c₁ and c₂ interchanged have been done. The ranges of the parameters are c₁ = 1(1)10, c₂ = c₁(1)10, W = 10(10)100, and ρ = −1(0.1)1. The results will be made available on request. The only interest in ρ = ±1 is as limiting cases.

When c₁ = c₂ the condition for D = D₁ to be better than D = {2} is that τ > 1. If h₁ < h₂, then there is no value of τ for which D₂ is optimal. This phenomenon has been observed, for example, when c₀ = 1, c₁ = c₂ = 10, and W = 10 for ρ > 0.5. Other cases of this phenomenon have been found, including cases with c₁ ≠ c₂. However, for most combinations of values of c₀, c₁, c₂ and W used in the computations, there are no values of ρ for which D₂ cannot be optimal.

4. Estimation of other functions of μ

4.1. Estimation of θ = ∑_{j=1}^r μ_j

If θ = ∑_{j=1}^r μ_j is to be estimated by the unbiased estimator θ̂ = ∑_{j=1}^r X̄_j, all the calculations in Sections 2 and 3 are correct except that:

1. U is replaced by ∑_{j=1}^r Z̄_j, so that the products of the μ_j in (4) are not present and these products disappear from (8) as well.

2. The third- and higher-order moments do not appear in M_m, so that there are no o(1/n₁) terms and (8) is exact. The existence of only the second moments is required.

For the case r = 2, the expressions in (9) and (10) are still valid, but with τ = σ₁/σ₂.

Table 1 contains the results of selected simulations. In each case σ₂ = 1, n₁ = 100, c₀ = 1, c₁ = c₂ = 10, C = 1000 (so that W = 10), and μ₁ = μ₂ = 0. Since c₁ = c₂, it follows that h₂ = 1 and the boundary between optimality of D = {1,2} and D = {2} is 1/h₁. The two values of ρ given straddle the boundary between existence and nonexistence of values of τ for which D = {1,2} is optimal. In each case 1000 replications were taken. The quantity R is the ratio of the observed mean square error, in this case the observed mean over replications of (X̄₁ + X̄₂)², to the mean square error E(X̄₁ + X̄₂)² for the optimal D.
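A single replication of this simulation might look as follows (a hypothetical sketch assuming normal data; the selection rule uses the c₁ = c₂ boundaries τ > h₁ for {1} and τ > 1/h₁ for {1,2} described in this section, and the seed and τ value are illustrative):

```python
import numpy as np
from math import sqrt

rng = np.random.default_rng(1)

# Settings mirroring Section 4.1: sigma2 = 1, sigma1 = tau,
# mu1 = mu2 = 0, n1 = 100, c0 = 1, c1 = c2 = 10, W = 10.
n1, c0, c1, c2, W = 100, 1.0, 10.0, 10.0, 10.0
rho, tau = 0.5, 1.2
cov = [[tau ** 2, rho * tau], [rho * tau, 1.0]]
x1 = rng.multivariate_normal([0.0, 0.0], cov, size=n1)   # first stage

# Plug-in choice of D from first-stage estimates of rho and tau.
s1, s2 = x1.std(axis=0, ddof=1)
rho_hat = float(np.corrcoef(x1[:, 0], x1[:, 1])[0, 1])
tau_hat = s1 / s2              # tau = sigma1/sigma2 when estimating a sum
h1_hat = sqrt(rho_hat ** 2 + (c0 + c1 + W) / c2) - rho_hat
# With c1 = c2 the threshold h2 equals 1, so compare tau_hat with
# h1_hat, 1, and 1/h1_hat to pick D.
if tau_hat > max(h1_hat, 1.0):
    D = {1}                    # observe only component 1
elif tau_hat < min(1.0 / h1_hat, 1.0):
    D = {2}                    # observe only component 2
else:
    D = {1, 2}                 # observe both components
```

Repeating this over many replications and tabulating the selected D and the squared error of X̄₁ + X̄₂ reproduces the structure of Table 1.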

4.2. Estimation of other polynomials in μ

The product

$$\theta = \prod_{j=1}^{r} \mu_j^{\gamma_j}$$

can be estimated by

$$\hat\theta = \prod_{j=1}^{r} \bar X_j^{\gamma_j}.$$


Table 1
Simulation results

              ρ = 0.5                               ρ = 0.6
              h₁ = 1.033, 1/h₁ = 0.968              h₁ = 0.968, 1/h₁ = 1.033
              Frequencies                           Frequencies
  τ     {1,2}    {1}    {2}      R        {1,2}    {1}    {2}      R
  1.0     322    337    341    1.153         52    485    463    1.176
  1.1     209    706     85    1.181         26    862    112    1.213
  1.2      62    927     11    1.162          3    988      9    1.242
  1.3      11    989      0    1.234          1    998      1    1.162
  1.4       1    999      0    1.190          0   1000      0    1.219
  1.5       0   1000      0    1.140          0   1000      0    1.075

If the γ_j are positive integers, then the term for D = D_m in the analog of (4), ignoring nonlinear terms in the Z̄_j, is

$$\Big(\prod_{j=1}^{r} \mu_j^{\gamma_j}\Big) \prod_{j=1}^{m} \frac{\gamma_j \bar Z_j}{\mu_j}.$$

The condition which yields the analog of (8) is that

$$E\Big(\prod_{j=1}^{r} X_j^{2\gamma_j}\Big) < \infty. \tag{11}$$

Polynomials in μ can be estimated by these procedures provided (11) is satisfied for all terms when μ is replaced by X̄. Under suitable regularity conditions, more general functions of μ can be estimated by replacing them by polynomial approximations.
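The plug-in estimator for such a power product is immediate (a hypothetical helper, assuming the component means have already been pooled as in Section 2):

```python
import numpy as np

def power_product_estimator(xbar, gamma):
    """Plug-in estimate of theta = prod_j mu_j^{gamma_j}: substitute
    the component sample means xbar for mu (Section 4.2)."""
    xbar = np.asarray(xbar, dtype=float)
    return float(np.prod(xbar ** np.asarray(gamma)))
```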

5. Further problems

Two related asymptotic problems are suggested by the simulation results in Section 4.1.

1. What are the asymptotic behaviors of the probabilities of selecting each of the D ∈ 𝒟? It is particularly interesting to examine the probability that the optimal D will be chosen.

2. How does the asymptotic behavior of the MSE of the adaptive procedure compare with the MSE of the optimal procedure when the parameters are known? The difference between the MSEs of the adaptive procedure and of the optimal procedure is due to the possibility that a nonoptimal D will be chosen.

Additionally, there is the question of optimal allocation of the entire budget between the two stages. A Bayesian model, as in Shapiro (1985), would also be a reasonable approach, particularly with regard to this last allocation question. A more complicated model of interest would allow more than two stages.

Acknowledgements

The author wishes to thank the referee for suggestions that considerably improved this paper.


References

Flournoy, N. and W.F. Rosenberger (eds.) (1995), Adaptive Designs: Selected Proceedings of a 1992 Joint AMS-IMS-SIAM Summer Conference, Institute of Mathematical Statistics Lecture Notes-Monograph Series, Vol. 25.

Page, C.F. (1990), Allocation proportional to coefficients of variation when estimating the product of parameters, J. Amer. Statist. Assoc. 85, 1134-1139.

Shapiro, C.P. (1985), Allocation schemes for estimating the product of positive parameters, J. Amer. Statist. Assoc. 80, 449-454.