Journal of Statistical Planning and Inference 24 (1990) 81-86
North-Holland
ESTIMATION OF PRIOR DISTRIBUTION AND EMPIRICAL
BAYES ESTIMATION IN A NONEXPONENTIAL FAMILY
B. PRASAD
Directorate of Economics & Statistics, Telhan Bhavan, Hyderabad, India
Radhey S. SINGH
Department of Mathematics & Statistics, University of Guelph, Guelph, Ontario, Canada
Received 3 February 1987; revised manuscript received 14 November 1988
Recommended by V.P. Godambe
Abstract: Empirical Bayes squared error loss estimation for a nonexponential family with densities $f(x\mid\theta) = e^{-(x-\theta)} I(x>\theta)$ for $\theta \in \Theta$, a subset of the real line, is considered. An almost surely consistent estimator of the prior distribution $G$ on $\Theta$, whatever it may be, is proposed. Based on this estimator an empirical Bayes estimator consistent for the minimum-risk optimal estimator is exhibited. Asymptotic optimality of this estimator is further established.
AMS Subject Classifications: Primary 62C12; secondary 62F05
Key words and phrases: Prior distribution; empirical Bayes; squared error loss; consistency; asymptotic optimality.
1. Introduction
The empirical Bayes approach to statistical problems was pioneered by Robbins (1955). It was later studied in detail in the context of various statistical estimation and/or hypothesis testing problems by a number of authors including Robbins (1963, 1964), Johns and Van Ryzin (1971, 1972), Fox (1978), Susarla and O'Bryan (1975) and Singh (1976, 1979, 1985). For example, Johns and Van Ryzin (1972), Singh (1976, 1979) and Lin (1975) studied the EB approach to squared error loss estimation (SELE) in certain exponential families of probability densities, and Fox (1978) studied EB SELE in some nonexponential families. The approach taken in these works is to estimate the Bayes estimator directly. In this paper EB SELE in a useful nonexponential family is considered, and the approach adopted is to use the Bayes estimator with respect to an almost surely consistent estimator of the prior, whatever it may be. The advantage of this approach is that it also provides, separately, a consistent estimator of the prior distribution, which is not available by the direct approach.
0378-3758/90/$3.50 © 1990, Elsevier Science Publishers B.V. (North-Holland)
In the EB context, the component problem considered here is the SELE of $\theta$ based on an observation from the density $f(x\mid\theta) = e^{-(x-\theta)} I(x>\theta)$, where the parameter $\theta$ is a random variable with an unknown prior distribution $G$ on $\Theta$, a subset of the real line. Based on observations $X_1, \ldots, X_n$ from the past $n$ independent repetitions of the component problem, where $X_i \sim f(\cdot\mid\theta_i)$ and the $\theta_i$'s are unobservable random variables i.i.d. according to $G$, a with-probability-one consistent estimator of $G$ is presented. This estimator of $G$ and the observation $X$ from the present problem are then used to exhibit an EB estimator, which is asymptotically optimal in the sense of Robbins (1955).
2. The probability model and a consistent estimator of the prior distribution function
As mentioned in Section 1, the random observation $X$ of our interest in the component problem is distributed according to the conditional probability density function $f(x\mid\theta) = e^{-(x-\theta)} I(x>\theta)$, where $\theta$ is an unobservable random variable with an unknown prior distribution function $G$ on $\Theta \subset (-\infty, \infty)$. The conditional cumulative distribution function (c.d.f.) of $X$ given $\theta$ at a point $t$ is therefore
$$F(t\mid\theta) = P[X \le t \mid \theta] = \int_{-\infty}^{t} f(x\mid\theta)\,dx = I(t>\theta) - f(t\mid\theta).$$
Since $G$ is the unconditional c.d.f. of $\theta$, the marginal p.d.f. of $X$ at $x$ is given by $f(x) = \int f(x\mid\theta)\,dG(\theta)$ and the marginal c.d.f. of $X$ at $x$ is given by
$$F(x) = \int F(x\mid\theta)\,dG(\theta) = \int [I(x>\theta) - f(x\mid\theta)]\,dG(\theta) = G(x) - f(x). \tag{2.1}$$
Thus we can write the c.d.f. $G$ of $\theta$ at a point $x$ as
$$G(x) = F(x) + f(x), \tag{2.2}$$
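The identity (2.2) can be checked in closed form for a simple prior. A minimal sketch in Python, assuming (purely for illustration, not from the paper) a prior $G$ degenerate at $\theta_0 = 0$, so that $F(x) = (1 - e^{-x})I(x>0)$, $f(x) = e^{-x}I(x>0)$ and $G(x) = I(x \ge 0)$:

```python
import math

# Degenerate prior G at theta0 = 0 (illustrative choice): the marginal of X
# is a standard exponential, so for x > 0
#   F(x) = 1 - exp(-x),  f(x) = exp(-x),  G(x) = 1,
# and the identity G(x) = F(x) + f(x) of (2.2) holds at every x != 0.
def F(x):
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def f(x):
    return math.exp(-x) if x > 0 else 0.0

def G(x):
    return 1.0 if x >= 0 else 0.0

for x in [0.5, 1.0, 2.0, 5.0, -1.0]:
    assert abs(F(x) + f(x) - G(x)) < 1e-12
```

At the atom $x = 0$ itself the identity fails (as it may on a $G$-null set); (2.2) is used only for almost all $x$.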
which can be estimated by estimating $F$ and $f$.

Let $X_1, X_2, \ldots, X_n$ be observations obtained from $n$ independent past experiences of the component problem in our empirical Bayes framework, where $X_i \mid \theta_i$ has p.d.f. $f(\cdot\mid\theta_i)$ and $\theta_1, \ldots, \theta_n$ are i.i.d. random variables with common c.d.f. $G$. Then the empirical distribution function of $X_1, \ldots, X_n$ at a point $x$ is
$$F_n(x) = n^{-1} \sum_{j=1}^{n} I(X_j \le x),$$
and, since $X_1, \ldots, X_n$ are i.i.d. according to the common c.d.f. $F$, by the Glivenko-Cantelli theorem, $\sup_x |F_n(x) - F(x)| \to 0$ as $n \to \infty$ w.p. 1. Further, since by the definition of a probability density function,
$$f(x) = \lim_{h \to 0} \frac{F(x+h) - F(x-h)}{2h},$$
we estimate $f(x)$ by
$$f_n(x) = \frac{F_n(x+h_n) - F_n(x-h_n)}{2h_n}, \tag{2.3}$$
where $0 < h_n \to 0$ as $n \to \infty$. An estimator of the type (2.3) of a p.d.f. was originally proposed by Rosenblatt (1956). If $h_n = n^{-1/3}$ (or even $h_n \sim n^{-1/5}$) then, from Rosenblatt (1956), $f_n$ is a mean square consistent estimator of $f$, i.e. $E(f_n(x) - f(x))^2 \to 0$ as $n \to \infty$ for almost all $x$, and from Nadaraya (1964), $f_n(x)$ is a strongly (with probability one) consistent estimator of $f(x)$ for almost all $x$. We have thus from (2.2) proved the following theorem.

Theorem 1. The estimator $G_n$ defined by
$$G_n(x) = F_n(x) + f_n(x) \tag{2.4}$$
with $h_n \sim n^{-1/3}$ is a strongly consistent estimator of the prior distribution $G(x)$ for almost all $x$, whatever $G$ may be on $\Theta$.
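Theorem 1 lends itself to a direct numerical check. A minimal sketch, assuming (for illustration only) a prior degenerate at $\theta_0 = 0$ and bandwidth $h_n = n^{-1/3}$; the names `Fn`, `fn`, `Gn` mirror (2.3)-(2.4) and are not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
h = n ** (-1 / 3)          # bandwidth h_n = n^(-1/3)

# Simulate the EB setup: theta_i i.i.d. from an illustrative prior G
# (degenerate at 0 here), and X_i | theta_i = theta_i + Exp(1).
theta = np.zeros(n)        # G puts all mass at 0
X = theta + rng.exponential(size=n)

def Fn(x):
    """Empirical c.d.f. of X_1, ..., X_n."""
    return float(np.mean(X <= x))

def fn(x):
    """Rosenblatt difference-quotient density estimator (2.3)."""
    return (Fn(x + h) - Fn(x - h)) / (2 * h)

def Gn(x):
    """Estimator (2.4) of the prior c.d.f.: G_n = F_n + f_n."""
    return Fn(x) + fn(x)

# For this prior, G(x) = 1 for x > 0 and G(x) = 0 for x < 0.
print(Gn(1.0), Gn(-0.5))   # close to 1 and 0 respectively
```

Note that $F_n(1)$ alone converges to $F(1) = 1 - e^{-1} \approx 0.632$, not to $G(1) = 1$; it is the added density term $f_n(1) \approx e^{-1}$ that recovers the prior, exactly as (2.2) prescribes.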
3. Proposed empirical Bayes estimator
Throughout the remainder of this paper we consider a squared error loss function for EB estimation of $\theta$ for the model under our study and restrict our attention to a parameter space $\Theta$ which is a subset of $(-a, a)$ for some $0 < a < \infty$. Then the Bayes optimal estimator of $\theta$ in the component problem, which minimizes the overall risk, is given by (see Ferguson (1967)) the posterior mean of $\theta$ given $X$,
$$d_G(X) = E(\theta \mid X) = \frac{\int \theta f(X\mid\theta)\,dG(\theta)}{\int f(X\mid\theta)\,dG(\theta)}, \tag{3.1}$$
which is not available to us for use since $G$ is unknown.

Since $G_n$ consistently estimates $G$, a natural EB estimator of $\theta$ in the $(n+1)$st component problem, evaluated at $X_{n+1} = X$, would be
$$Q_n(X) = \frac{\int \theta f(X\mid\theta)\,dG_n(\theta)}{\int f(X\mid\theta)\,dG_n(\theta)}. \tag{3.2}$$
However, since the parameter space under study is in $(-a, a)$ and hence $-a < E(\theta\mid X) = d_G(X) < a$ for almost all values of $X$, we propose to restrict (3.2) to $(-a, a)$ and use instead
$$d_n(X) = \begin{cases} -a & \text{if } Q_n(X) \le -a, \\ Q_n(X) & \text{if } -a < Q_n(X) < a, \\ a & \text{if } Q_n(X) \ge a. \end{cases} \tag{3.3}$$
We will now express $d_n(X)$ explicitly in terms of $X_1, X_2, \ldots, X_n$ and $X$ for computational purposes.
Notice that
$$\int f(x\mid\theta)\,dF_n(\theta) = n^{-1} \sum_{j=1}^{n} f(x\mid X_j) = e^{-x}\, n^{-1} \sum_{j=1}^{n} e^{X_j} I(X_j < x),$$
$$\int f(x\mid\theta)\,dF_n(\theta + h_n) = \int f(x\mid(u - h_n))\,dF_n(u) = n^{-1} \sum_{j=1}^{n} f(x\mid(X_j - h_n)) = e^{-x-h_n}\, n^{-1} \sum_{j=1}^{n} e^{X_j} I(X_j < x + h_n).$$
Similarly,
$$\int f(x\mid\theta)\,dF_n(\theta - h_n) = \int f(x\mid(u + h_n))\,dF_n(u) = e^{-x+h_n}\, n^{-1} \sum_{j=1}^{n} e^{X_j} I(X_j < x - h_n).$$
Thus we have from (2.4), (2.3) and the above expressions,
$$\int f(x\mid\theta)\,dG_n(\theta) = \int f(x\mid\theta)\,d(F_n(\theta) + f_n(\theta)) = e^{-x}\, n^{-1} \sum_{j=1}^{n} A_j(x),$$
where
$$A_j(x) = e^{X_j}\bigl[I(X_j < x) + (2h_n)^{-1}\{e^{-h_n} I(X_j < x + h_n) - e^{h_n} I(X_j < x - h_n)\}\bigr]. \tag{3.4}$$
Similarly, using the above techniques, if we write the expressions for $\int \theta f(x\mid\theta)\,dF_n(\theta)$, $\int \theta f(x\mid\theta)\,dF_n(\theta + h_n)$ and $\int \theta f(x\mid\theta)\,dF_n(\theta - h_n)$ in terms of the $X_j$, we get from (2.3) and (2.4),
$$\int \theta f(x\mid\theta)\,dG_n(\theta) = \int \theta f(x\mid\theta)\,d(F_n(\theta) + f_n(\theta)) = e^{-x}\, n^{-1} \sum_{j=1}^{n} B_j(x),$$
where
$$B_j(x) = e^{X_j}\bigl[X_j I(X_j < x) + (2h_n)^{-1}\{(X_j - h_n) e^{-h_n} I(X_j < x + h_n) - (X_j + h_n) e^{h_n} I(X_j < x - h_n)\}\bigr]. \tag{3.5}$$
Consequently, for computational purposes, $Q_n$ in (3.2), which appears in our proposed EB estimator (3.3), can be written as
$$Q_n(x) = \frac{\sum_{j=1}^{n} B_j(x)}{\sum_{j=1}^{n} A_j(x)}.$$
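The closed forms (3.4)-(3.5) make $d_n$ straightforward to vectorize. A minimal sketch, assuming (purely for illustration) a prior degenerate at $\theta_0 = 0.5$, bound $a = 2$, and $h_n = n^{-1/3}$; for this prior $d_G(x) = 0.5$ for every $x > 0.5$, which gives a known target:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50000
h = n ** (-1 / 3)
a = 2.0            # assumed bound on the parameter space (-a, a)
theta0 = 0.5       # illustrative degenerate prior: G puts all mass at 0.5
X = theta0 + rng.exponential(size=n)   # past observations X_1, ..., X_n

def d_n(x):
    """EB estimator (3.3): Q_n(x) = sum B_j(x) / sum A_j(x), clipped to [-a, a]."""
    w = np.exp(X)
    # A_j(x) and B_j(x) exactly as in (3.4) and (3.5)
    A = w * ((X < x)
             + (np.exp(-h) * (X < x + h) - np.exp(h) * (X < x - h)) / (2 * h))
    B = w * (X * (X < x)
             + ((X - h) * np.exp(-h) * (X < x + h)
                - (X + h) * np.exp(h) * (X < x - h)) / (2 * h))
    return float(np.clip(B.sum() / A.sum(), -a, a))

# For the degenerate prior, the Bayes estimate d_G(x) = 0.5 for any x > 0.5.
print(d_n(1.5))    # close to 0.5
```

The common factor $e^{-x} n^{-1}$ of the numerator and denominator cancels in the ratio, so only the sums of $A_j$ and $B_j$ need be computed.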
4. Consistency and asymptotic optimality of EB estimator d,
In this section we show that $d_n$ is not only a consistent estimator of $d_G$ but is also asymptotically optimal.
As we have indicated earlier, $F_n(x) \to F(x)$ with probability one uniformly in $x$, and $f_n(x) \to f(x)$ with probability one for almost all $x$. Hence $G_n(x) = F_n(x) + f_n(x)$ converges to $G(x) = F(x) + f(x)$ with probability one for almost all $x$. Further, since for each $x$, $f(x\mid\theta)$ and $\theta f(x\mid\theta)$ are continuous and bounded in $\theta$, by the Helly-Bray theorem, as $n \to \infty$,
$$\int f(x\mid\theta)\,dG_n(\theta) \to \int f(x\mid\theta)\,dG(\theta) = f(x) \quad \text{in probability},$$
and
$$\int \theta f(x\mid\theta)\,dG_n(\theta) \to \int \theta f(x\mid\theta)\,dG(\theta) \quad \text{in probability}.$$
Hence
$$Q_n(x) = \frac{\int \theta f(x\mid\theta)\,dG_n(\theta)}{\int f(x\mid\theta)\,dG_n(\theta)} \to \frac{\int \theta f(x\mid\theta)\,dG(\theta)}{\int f(x\mid\theta)\,dG(\theta)} = d_G(x) \tag{4.1}$$
in probability for almost all $x$ where $f(x) > 0$.

Now consider $(d_n(X) - d_G(X))^2$. From the definition of $d_n(X)$ it is easy to see that
$$|d_n(X) - d_G(X)| \le |Q_n(X) - d_G(X)| \wedge 2a \quad \text{for almost all } X, \tag{4.2}$$
since $-a \le d_G(X) \le a$ for almost all $X$. Hence from (4.1) and (4.2), it follows that $|d_n(X) - d_G(X)| \to 0$ in probability for almost all $X$.

Further, since by (4.2) $|d_n(X) - d_G(X)| \le 2a$, the risks due to $d_n$ and $d_G$ are respectively $R(d_n, G) = E(d_n(X) - \theta)^2$ and $R(G) = E(d_G(X) - \theta)^2$, and since
$$R(d_n, G) - R(G) = E(d_n(X) - d_G(X))^2 \tag{4.3}$$
(e.g., see Singh (1985)), by the Lebesgue dominated convergence theorem, (4.3) converges to zero. We have thus proved the following theorem.

Theorem 2. Let $d_n$ be as defined in (3.3). Then $d_n(x)$ is a consistent (in probability) estimator of the Bayes optimal estimator $d_G(x)$ for almost all $x$ in the set $\{x: f(x) > 0\}$. Further, $d_n$ is asymptotically optimal in the sense that its risk approaches the minimum possible risk $R(G)$ as $n \to \infty$.
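Theorem 2 can be probed by Monte Carlo. A minimal self-contained sketch, assuming (for illustration only) a Uniform(0, 1) prior, $a = 2$ and $h_n = n^{-1/3}$; the closed form for $d_G$ below is the posterior mean worked out for this particular prior, not a formula from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40000
h = n ** (-1 / 3)
a = 2.0   # assumed bound: Theta = (0, 1) is a subset of (-a, a)

# Illustrative smooth prior: G = Uniform(0, 1).
theta = rng.uniform(0.0, 1.0, size=n)
X = theta + rng.exponential(size=n)

def d_n(x):
    """EB estimator (3.3) computed via (3.4)-(3.5)."""
    w = np.exp(X)
    A = w * ((X < x)
             + (np.exp(-h) * (X < x + h) - np.exp(h) * (X < x - h)) / (2 * h))
    B = w * (X * (X < x)
             + ((X - h) * np.exp(-h) * (X < x + h)
                - (X + h) * np.exp(h) * (X < x - h)) / (2 * h))
    return float(np.clip(B.sum() / A.sum(), -a, a))

def d_G(x):
    """Exact Bayes estimator (3.1) for the Uniform(0,1) prior:
    posterior of theta given X = x is proportional to e^theta on (0, min(x,1))."""
    m = min(x, 1.0)
    return ((m - 1.0) * np.exp(m) + 1.0) / (np.exp(m) - 1.0)

# Estimate the risk difference (4.3): R(d_n, G) - R(G) = E (d_n(X) - d_G(X))^2,
# averaging over fresh draws from the marginal distribution of X.
Xnew = rng.uniform(0.0, 1.0, size=500) + rng.exponential(size=500)
gap = float(np.mean([(d_n(x) - d_G(x)) ** 2 for x in Xnew]))
print(gap)   # small, and tends to 0 as n grows, as Theorem 2 asserts
```

By (4.3) the quantity `gap` is exactly the excess risk of $d_n$ over the Bayes risk for this prior, so watching it shrink as $n$ increases is a direct empirical reading of the asymptotic optimality claim.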
References
Ferguson, T.S. (1967). Mathematical Statistics: A Decision Theoretic Approach. Academic Press, New York.
Fox, R. (1978). Solutions to empirical Bayes squared error loss estimation problems. Ann. Statist. 6, 846-853.
Johns, M.V. and J. Van Ryzin (1971). Convergence rates for empirical Bayes two-action problems, I. Discrete case. Ann. Math. Statist. 42, 1521-1539.
Johns, M.V. and J. Van Ryzin (1972). Convergence rates for empirical Bayes two-action problems, II. Continuous case. Ann. Math. Statist. 43, 934-947.
Lin, P.E. (1975). Rates of convergence in empirical Bayes estimation problems: Continuous case. Ann. Statist. 3, 155-164.
Nadaraya, E.A. (1964). On nonparametric estimates of density functions and regression curves. Theory Probab. Appl. 10, 186-190.
Robbins, H. (1955). An empirical Bayes approach to statistics. Proc. Third Berkeley Symp. Math. Statist. Probab., Vol. 1, 157-164.
Robbins, H. (1963). The empirical Bayes approach to testing statistical hypotheses. Rev. Internat. Statist. Inst. 31, 195-208.
Robbins, H. (1964). The empirical Bayes approach to statistical decision problems. Ann. Math. Statist. 35, 1-20.
Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. Ann. Math. Statist. 27, 832-837.
Singh, R.S. (1976). Empirical Bayes estimation with convergence rates in noncontinuous Lebesgue exponential families. Ann. Statist. 4, 431-439.
Singh, R.S. (1979). Empirical Bayes estimation in Lebesgue exponential families with rates near the best possible rate. Ann. Statist. 7, 890-902.
Singh, R.S. (1985). Empirical Bayes estimation in a multiple linear regression model. Ann. Inst. Statist. Math. 37(A), 71-86.
Susarla, V. and T. O'Bryan (1975). An empirical Bayes two action problem with nonidentical components for a translated exponential distribution. Comm. Statist. 4(8), 767-775.