Journal of Statistical Planning and Inference 24 (1990) 81-86
North-Holland
ESTIMATION OF PRIOR DISTRIBUTION AND EMPIRICAL
BAYES ESTIMATION IN A NONEXPONENTIAL FAMILY
B. PRASAD
Directorate of Economics & Statistics, Telhan Bhavan, Hyderabad, India
Radhey S. SINGH
Department of Mathematics & Statistics, University of Guelph, Guelph, Ontario, Canada
Received 3 February 1987; revised manuscript received 14 November 1988
Recommended by V.P. Godambe
Abstract: Empirical Bayes squared error loss estimation for a nonexponential family with densities $f(x\mid\theta) = e^{-(x-\theta)} I(x>\theta)$ for $\theta \in \Theta$, a subset of the real line, is considered. An almost surely consistent estimator of the prior distribution $G$ on $\Theta$, whatever it may be, is proposed. Based on this estimator an empirical Bayes estimator consistent for the minimum-risk optimal estimator is exhibited. Asymptotic optimality of this estimator is further established.
AMS Subject Classifications: Primary 62C12; secondary 62F05
Key words and phrases: Prior distribution; empirical Bayes; squared error loss; consistency; asymptotic optimality.
1. Introduction
The empirical Bayes approach to statistical problems was pioneered by Robbins (1955). It was later studied in detail in the context of various statistical estimation and/or hypothesis testing problems by a number of authors including Robbins (1963, 1964), Johns and Van Ryzin (1971, 1972), Fox (1978), Susarla and O'Bryan (1975) and Singh (1976, 1979, 1985). For example, Johns and Van Ryzin (1972), Singh (1976, 1979) and Lin (1975) studied the EB approach to squared error loss estimation (SELE) in certain exponential families of probability densities, and Fox (1978) studied EB SELE in some nonexponential families. The approach taken in these works is to estimate the Bayes estimator directly. In this paper EB SELE in a useful nonexponential family is considered, and the approach adopted is to use the Bayes estimator with respect to an almost surely consistent estimator of the prior, whatever it may be. The advantage of this approach is that it also provides, separately, a consistent estimator of the prior distribution, which is not available by the direct approach.
0378-3758/90/$3.50 © 1990, Elsevier Science Publishers B.V. (North-Holland)
In the EB context, the component problem considered here is the SELE of $\theta$ based on an observation from the density $f(x\mid\theta) = e^{-(x-\theta)} I(x>\theta)$, where the parameter $\theta$ is a random variable with an unknown prior distribution $G$ on $\Theta$, a subset of the real line. Based on observations $X_1, \ldots, X_n$ from the past $n$ independent repetitions of the component problem, where $X_i \sim f(\cdot\mid\theta_i)$ and the $\theta_i$'s are unobservable random variables i.i.d. according to $G$, a with-probability-one consistent estimator of $G$ is presented. This estimator of $G$ and the observation $X$ from the present problem are then used to exhibit an EB estimator, which is asymptotically optimal in the sense of Robbins (1955).
2. The probability model and a consistent estimator of the prior distribution function
As mentioned in Section 1, the random observation $X$ of our interest in the component problem is distributed according to the conditional probability density function $f(x\mid\theta) = e^{-(x-\theta)} I(x>\theta)$, where $\theta$ is an unobservable random variable with an unknown prior distribution function $G$ on $\Theta \subset (-\infty, \infty)$. The conditional cumulative distribution function (c.d.f.) of $X$ given $\theta$ at a point $t$ is therefore
$$F(t\mid\theta) = P[X \le t \mid \theta] = \int_{-\infty}^{t} f(x\mid\theta)\,dx = I(t>\theta) - f(t\mid\theta).$$
Since $G$ is the unconditional c.d.f. of $\theta$, the marginal p.d.f. of $X$ at $x$ is given by $f(x) = \int f(x\mid\theta)\,dG(\theta)$ and the marginal c.d.f. of $X$ at $x$ is given by
$$F(x) = \int F(x\mid\theta)\,dG(\theta) = \int [I(x>\theta) - f(x\mid\theta)]\,dG(\theta) = G(x) - f(x). \tag{2.1}$$
Thus we can write the c.d.f. $G$ of $\theta$ at a point $x$ as
$$G(x) = F(x) + f(x), \tag{2.2}$$
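The identity (2.2) can be checked in closed form for a simple prior. A minimal sketch in Python, assuming (purely for illustration, not from the paper) a prior $G$ degenerate at $\theta_0 = 0$, so that $F(x) = (1 - e^{-x})I(x>0)$, $f(x) = e^{-x}I(x>0)$ and $G(x) = I(x \ge 0)$:

```python
import math

# Degenerate prior G at theta0 = 0 (illustrative choice): the marginal of X
# is a standard exponential, so for x > 0
#   F(x) = 1 - exp(-x),  f(x) = exp(-x),  G(x) = 1,
# and the identity G(x) = F(x) + f(x) of (2.2) holds at every x != 0.
def F(x):
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def f(x):
    return math.exp(-x) if x > 0 else 0.0

def G(x):
    return 1.0 if x >= 0 else 0.0

for x in [0.5, 1.0, 2.0, 5.0, -1.0]:
    assert abs(F(x) + f(x) - G(x)) < 1e-12
```

At the atom $x = 0$ itself the identity fails (as it may on a $G$-null set); (2.2) is used only for almost all $x$.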
which can be estimated by estimating $F$ and $f$.

Let $X_1, X_2, \ldots, X_n$ be observations obtained from $n$ independent past experiences of the component problem in our empirical Bayes framework, where $X_i \mid \theta_i$ has p.d.f. $f(\cdot\mid\theta_i)$ and $\theta_1, \ldots, \theta_n$ are i.i.d. random variables with common c.d.f. $G$. Then the empirical distribution function of $X_1, \ldots, X_n$ at a point $x$ is
$$F_n(x) = n^{-1} \sum_{j=1}^{n} I(X_j \le x),$$
and, since $X_1, \ldots, X_n$ are i.i.d. according to the common c.d.f. $F$, by the Glivenko-Cantelli theorem, $\sup_x |F_n(x) - F(x)| \to 0$ as $n \to \infty$ w.p. 1. Further, since by the definition of a probability density function,
$$f(x) = \lim_{h \to 0} \frac{F(x+h) - F(x-h)}{2h},$$
we estimate $f(x)$ by
$$f_n(x) = \frac{F_n(x+h_n) - F_n(x-h_n)}{2h_n}, \tag{2.3}$$
where $0 < h_n \to 0$ as $n \to \infty$. An estimator of the type (2.3) of a p.d.f. was originally proposed by Rosenblatt (1956). If $h_n = n^{-1/3}$ (or even $h_n \sim n^{-1/5}$) then, from Rosenblatt (1956), $f_n$ is a mean square consistent estimator of $f$, i.e. $E(f_n(x) - f(x))^2 \to 0$ as $n \to \infty$ for almost all $x$, and from Nadaraya (1964), $f_n(x)$ is a strongly (with probability one) consistent estimator of $f(x)$ for almost all $x$. We have thus from (2.2) proved the following theorem.

Theorem 1. The estimator $G_n$ defined by
$$G_n(x) = F_n(x) + f_n(x) \tag{2.4}$$
with $h_n \sim n^{-1/3}$ is a strongly consistent estimator of the prior distribution $G(x)$ for almost all $x$, whatever $G$ may be on $\Theta$.
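Theorem 1 lends itself to a direct numerical check. A minimal sketch, assuming (for illustration only) a prior degenerate at $\theta_0 = 0$ and bandwidth $h_n = n^{-1/3}$; the names `Fn`, `fn`, `Gn` mirror (2.3)-(2.4) and are not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
h = n ** (-1 / 3)          # bandwidth h_n = n^(-1/3)

# Simulate the EB setup: theta_i i.i.d. from an illustrative prior G
# (degenerate at 0 here), and X_i | theta_i = theta_i + Exp(1).
theta = np.zeros(n)        # G puts all mass at 0
X = theta + rng.exponential(size=n)

def Fn(x):
    """Empirical c.d.f. of X_1, ..., X_n."""
    return float(np.mean(X <= x))

def fn(x):
    """Rosenblatt difference-quotient density estimator (2.3)."""
    return (Fn(x + h) - Fn(x - h)) / (2 * h)

def Gn(x):
    """Estimator (2.4) of the prior c.d.f.: G_n = F_n + f_n."""
    return Fn(x) + fn(x)

# For this prior, G(x) = 1 for x > 0 and G(x) = 0 for x < 0.
print(Gn(1.0), Gn(-0.5))   # close to 1 and 0 respectively
```

Note that $F_n(1)$ alone converges to $F(1) = 1 - e^{-1} \approx 0.632$, not to $G(1) = 1$; it is the added density term $f_n(1) \approx e^{-1}$ that recovers the prior, exactly as (2.2) prescribes.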
3. Proposed empirical Bayes estimator
Throughout the remainder of this paper we consider a squared error loss function for EB estimation of $\theta$ for the model under our study and restrict our attention to a parameter space $\Theta$ which is a subset of $(-a, a)$ for some $0 < a < \infty$. Then the Bayes optimal estimator of $\theta$ in the component problem, which minimizes the overall risk, is given by (see Ferguson (1967)) the posterior mean of $\theta$ given $X$,
$$d_G(X) = E(\theta \mid X) = \frac{\int \theta f(X\mid\theta)\,dG(\theta)}{\int f(X\mid\theta)\,dG(\theta)}, \tag{3.1}$$
which is not available to us for use since $G$ is unknown.

Since $G_n$ consistently estimates $G$, a natural EB estimator of $\theta$ in the $(n+1)$st component problem, evaluated at $X_{n+1} = X$, would be
$$Q_n(X) = \frac{\int \theta f(X\mid\theta)\,dG_n(\theta)}{\int f(X\mid\theta)\,dG_n(\theta)}. \tag{3.2}$$
However, since the parameter space under study is in $(-a, a)$ and hence $-a < E(\theta\mid X) = d_G(X) < a$ for almost all values of $X$, we propose to restrict (3.2) to $(-a, a)$ and use instead
$$d_n(X) = \begin{cases} -a & \text{if } Q_n(X) \le -a, \\ Q_n(X) & \text{if } -a < Q_n(X) < a, \\ a & \text{if } Q_n(X) \ge a. \end{cases} \tag{3.3}$$
We will now express $d_n(X)$ explicitly in terms of $X_1, X_2, \ldots, X_n$ and $X$ for computational purposes.
Notice that
$$\int f(x\mid\theta)\,dF_n(\theta) = n^{-1} \sum_{j=1}^{n} f(x\mid X_j) = e^{-x}\, n^{-1} \sum_{j=1}^{n} e^{X_j} I(X_j < x),$$
$$\int f(x\mid\theta)\,dF_n(\theta + h_n) = \int f(x\mid(u - h_n))\,dF_n(u) = n^{-1} \sum_{j=1}^{n} f(x\mid(X_j - h_n)) = e^{-x-h_n}\, n^{-1} \sum_{j=1}^{n} e^{X_j} I(X_j < x + h_n).$$
Similarly,
$$\int f(x\mid\theta)\,dF_n(\theta - h_n) = \int f(x\mid(u + h_n))\,dF_n(u) = e^{-x+h_n}\, n^{-1} \sum_{j=1}^{n} e^{X_j} I(X_j < x - h_n).$$
Thus we have from (2.4), (2.3) and the above expressions,
$$\int f(x\mid\theta)\,dG_n(\theta) = \int f(x\mid\theta)\,d(F_n(\theta) + f_n(\theta)) = e^{-x}\, n^{-1} \sum_{j=1}^{n} A_j(x),$$
where
$$A_j(x) = e^{X_j}\bigl[I(X_j < x) + (2h_n)^{-1}\{e^{-h_n} I(X_j < x + h_n) - e^{h_n} I(X_j < x - h_n)\}\bigr]. \tag{3.4}$$
Similarly, using the above techniques, if we write the expressions for $\int \theta f(x\mid\theta)\,dF_n(\theta)$, $\int \theta f(x\mid\theta)\,dF_n(\theta + h_n)$ and $\int \theta f(x\mid\theta)\,dF_n(\theta - h_n)$ in terms of the $X_j$, we get from (2.3) and (2.4),
$$\int \theta f(x\mid\theta)\,dG_n(\theta) = \int \theta f(x\mid\theta)\,d(F_n(\theta) + f_n(\theta)) = e^{-x}\, n^{-1} \sum_{j=1}^{n} B_j(x),$$
where
$$B_j(x) = e^{X_j}\bigl[X_j I(X_j < x) + (2h_n)^{-1}\{(X_j - h_n) e^{-h_n} I(X_j < x + h_n) - (X_j + h_n) e^{h_n} I(X_j < x - h_n)\}\bigr]. \tag{3.5}$$
Consequently, for computational purposes, $Q_n$ in (3.2), which appears in our proposed EB estimator (3.3), can be written as
$$Q_n(x) = \frac{\sum_{j=1}^{n} B_j(x)}{\sum_{j=1}^{n} A_j(x)}.$$
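The closed forms (3.4)-(3.5) make $d_n$ straightforward to vectorize. A minimal sketch, assuming (purely for illustration) a prior degenerate at $\theta_0 = 0.5$, bound $a = 2$, and $h_n = n^{-1/3}$; for this prior $d_G(x) = 0.5$ for every $x > 0.5$, which gives a known target:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50000
h = n ** (-1 / 3)
a = 2.0            # assumed bound on the parameter space (-a, a)
theta0 = 0.5       # illustrative degenerate prior: G puts all mass at 0.5
X = theta0 + rng.exponential(size=n)   # past observations X_1, ..., X_n

def d_n(x):
    """EB estimator (3.3): Q_n(x) = sum B_j(x) / sum A_j(x), clipped to [-a, a]."""
    w = np.exp(X)
    # A_j(x) and B_j(x) exactly as in (3.4) and (3.5)
    A = w * ((X < x)
             + (np.exp(-h) * (X < x + h) - np.exp(h) * (X < x - h)) / (2 * h))
    B = w * (X * (X < x)
             + ((X - h) * np.exp(-h) * (X < x + h)
                - (X + h) * np.exp(h) * (X < x - h)) / (2 * h))
    return float(np.clip(B.sum() / A.sum(), -a, a))

# For the degenerate prior, the Bayes estimate d_G(x) = 0.5 for any x > 0.5.
print(d_n(1.5))    # close to 0.5
```

The common factor $e^{-x} n^{-1}$ of the numerator and denominator cancels in the ratio, so only the sums of $A_j$ and $B_j$ need be computed.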
4. Consistency and asymptotic optimality of EB estimator d,
In this section we show that $d_n$ is not only a consistent estimator of $d_G$ but is also asymptotically optimal.
As we have indicated earlier, $F_n(x) \to F(x)$ with probability one uniformly in $x$, and $f_n(x) \to f(x)$ with probability one for almost all $x$. Hence $G_n(x) = F_n(x) + f_n(x)$ converges to $G(x) = F(x) + f(x)$ with probability one for almost all $x$. Further, since for each $x$, $f(x\mid\theta)$ and $\theta f(x\mid\theta)$ are continuous and bounded in $\theta$, by the Helly-Bray theorem, as $n \to \infty$,
$$\int f(x\mid\theta)\,dG_n(\theta) \to \int f(x\mid\theta)\,dG(\theta) = f(x) \quad \text{in probability},$$
and
$$\int \theta f(x\mid\theta)\,dG_n(\theta) \to \int \theta f(x\mid\theta)\,dG(\theta) \quad \text{in probability}.$$
Hence
$$Q_n(x) = \frac{\int \theta f(x\mid\theta)\,dG_n(\theta)}{\int f(x\mid\theta)\,dG_n(\theta)} \to \frac{\int \theta f(x\mid\theta)\,dG(\theta)}{\int f(x\mid\theta)\,dG(\theta)} = d_G(x) \tag{4.1}$$
in probability for almost all $x$ where $f(x) > 0$.

Now consider $(d_n(X) - d_G(X))^2$. From the definition of $d_n(X)$ it is easy to see that
$$|d_n(X) - d_G(X)| \le |Q_n(X) - d_G(X)| \wedge 2a \quad \text{for almost all } X, \tag{4.2}$$
since $-a \le d_G(X) \le a$ for almost all $X$. Hence from (4.1) and (4.2), it follows that $|d_n(X) - d_G(X)| \to 0$ in probability for almost all $X$.

Further, since by (4.2) $|d_n(X) - d_G(X)| \le 2a$, the risks due to $d_n$ and $d_G$ are respectively $R(d_n, G) = E(d_n(X) - \theta)^2$ and $R(G) = E(d_G(X) - \theta)^2$, and since
$$R(d_n, G) - R(G) = E(d_n(X) - d_G(X))^2 \tag{4.3}$$
(e.g., see Singh (1985)), by the Lebesgue dominated convergence theorem, (4.3) converges to zero. We have thus proved the following theorem.

Theorem 2. Let $d_n$ be as defined in (3.3). Then $d_n(x)$ is a consistent (in probability) estimator of the Bayes optimal estimator $d_G(x)$ for almost all $x$ in the set $\{x: f(x) > 0\}$. Further, $d_n$ is asymptotically optimal in the sense that its risk approaches the minimum possible risk $R(G)$ as $n \to \infty$.
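Theorem 2 can be probed by Monte Carlo. A minimal self-contained sketch, assuming (for illustration only) a Uniform(0, 1) prior, $a = 2$ and $h_n = n^{-1/3}$; the closed form for $d_G$ below is the posterior mean worked out for this particular prior, not a formula from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40000
h = n ** (-1 / 3)
a = 2.0   # assumed bound: Theta = (0, 1) is a subset of (-a, a)

# Illustrative smooth prior: G = Uniform(0, 1).
theta = rng.uniform(0.0, 1.0, size=n)
X = theta + rng.exponential(size=n)

def d_n(x):
    """EB estimator (3.3) computed via (3.4)-(3.5)."""
    w = np.exp(X)
    A = w * ((X < x)
             + (np.exp(-h) * (X < x + h) - np.exp(h) * (X < x - h)) / (2 * h))
    B = w * (X * (X < x)
             + ((X - h) * np.exp(-h) * (X < x + h)
                - (X + h) * np.exp(h) * (X < x - h)) / (2 * h))
    return float(np.clip(B.sum() / A.sum(), -a, a))

def d_G(x):
    """Exact Bayes estimator (3.1) for the Uniform(0,1) prior:
    posterior of theta given X = x is proportional to e^theta on (0, min(x,1))."""
    m = min(x, 1.0)
    return ((m - 1.0) * np.exp(m) + 1.0) / (np.exp(m) - 1.0)

# Estimate the risk difference (4.3): R(d_n, G) - R(G) = E (d_n(X) - d_G(X))^2,
# averaging over fresh draws from the marginal distribution of X.
Xnew = rng.uniform(0.0, 1.0, size=500) + rng.exponential(size=500)
gap = float(np.mean([(d_n(x) - d_G(x)) ** 2 for x in Xnew]))
print(gap)   # small, and tends to 0 as n grows, as Theorem 2 asserts
```

By (4.3) the quantity `gap` is exactly the excess risk of $d_n$ over the Bayes risk for this prior, so watching it shrink as $n$ increases is a direct empirical reading of the asymptotic optimality claim.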
References
Ferguson, T.S. (1967). Mathematical Statistics: A Decision Theoretic Approach. Academic Press, New York.
Fox, R. (1978). Solutions to empirical Bayes squared error loss estimation problems. Ann. Statist. 6, 846-853.
Johns, M.V. and J. Van Ryzin (1971). Convergence rates for empirical Bayes two-action problems, I. Discrete case. Ann. Math. Statist. 42, 1521-1539.
Johns, M.V. and J. Van Ryzin (1972). Convergence rates for empirical Bayes two-action problems, II. Continuous case. Ann. Math. Statist. 43, 934-947.
Lin, P.E. (1975). Rates of convergence in empirical Bayes estimation problems: Continuous case. Ann. Statist. 3, 155-164.
Nadaraya, E.A. (1964). On nonparametric estimates of density functions and regression curves. Theory Probab. Appl. 10, 186-190.
Robbins, H. (1955). An empirical Bayes approach to statistics. Proc. Third Berkeley Symp. Math. Statist. Probab., Vol. 1, 157-164.
Robbins, H. (1963). The empirical Bayes approach to testing statistical hypotheses. Rev. Internat. Statist. Inst. 31, 195-208.
Robbins, H. (1964). The empirical Bayes approach to statistical decision problems. Ann. Math. Statist. 35, 1-20.
Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. Ann. Math. Statist. 27, 832-837.
Singh, R.S. (1976). Empirical Bayes estimation with convergence rates in noncontinuous Lebesgue exponential families. Ann. Statist. 4, 431-439.
Singh, R.S. (1979). Empirical Bayes estimation in Lebesgue exponential families with rates near the best possible rate. Ann. Statist. 7, 890-902.
Singh, R.S. (1985). Empirical Bayes estimation in a multiple linear regression model. Ann. Inst. Statist. Math. 37(A), 71-86.
Susarla, V. and T. O'Bryan (1975). An empirical Bayes two action problem with nonidentical components for a translated exponential distribution. Comm. Statist. 4(8), 767-775.