
Lecture 10: Exponential Families

MATH 667-01 Statistical Inference

University of Louisville

October 22, 2019

Last corrected: 11/5/2019


Introduction

We define an exponential family and state a few general properties of exponential families (CB: Section 3.4; HMC: Section 7.5).

We will also discuss sums for a random sample from an exponential family (CB: Section 5.2; HMC: Section 7.5).

Finally, we show how to find a sufficient statistic based on a random sample from an exponential family (CB: Section 6.2; HMC: Section 7.5).


Exponential Families

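Definition L10.1 (CB: similar to page 111; HMC: similar to Definition 7.7.2 on p. 448): A family of pdfs or pmfs is called an exponential family if it can be expressed as

$$f(x;\boldsymbol{\theta}) = h(x)\,c(\boldsymbol{\theta})\exp\left(\sum_{i=1}^{k} w_i(\boldsymbol{\theta})\,t_i(x)\right) \quad \text{(form in CB)}$$

$$\phantom{f(x;\boldsymbol{\theta})} = \exp\left(\sum_{i=1}^{k} p_i(\boldsymbol{\theta})\,K_i(x) + H(x) + q(\boldsymbol{\theta})\right) \quad \text{(form in HMC)}$$

where $h(x) = e^{H(x)}$ and $t_i(x) = K_i(x)$ do not depend on $\boldsymbol{\theta}$, and $w_i(\boldsymbol{\theta}) = p_i(\boldsymbol{\theta})$ and $c(\boldsymbol{\theta}) = e^{q(\boldsymbol{\theta})}$ are nontrivial continuous functions of $\boldsymbol{\theta}$.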


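Example L10.1: Show that the normal distribution with mean $\mu$ and variance 1 can be expressed in the form of an exponential family.

Answer to Example L10.1: Its pdf is

$$f(x;\mu) = \frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{1}{2}(x-\mu)^2\right\} = \frac{1}{\sqrt{2\pi}}\exp\left\{-\frac{1}{2}x^2 + \mu x - \frac{1}{2}\mu^2\right\}$$

$$\phantom{f(x;\mu)} = \left(\frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}\right)e^{-\frac{1}{2}\mu^2}e^{\mu x} = h(x)\,c(\mu)\,e^{w_1(\mu)t_1(x)}$$

where $h(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}x^2}$, $c(\mu) = e^{-\frac{1}{2}\mu^2}$, $w_1(\mu) = \mu$, and $t_1(x) = x$.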
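The factorization in Example L10.1 can be checked numerically. The following is a minimal sketch, not part of the original lecture: the function names and the grid of test points are illustrative choices, and it only confirms that $h(x)c(\mu)e^{w_1(\mu)t_1(x)}$ agrees with the usual $N(\mu, 1)$ density at those points.

```python
import math

def normal_pdf(x, mu):
    """N(mu, 1) density in its usual form."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def h(x):
    # Factor that depends on x only.
    return math.exp(-0.5 * x ** 2) / math.sqrt(2 * math.pi)

def c(mu):
    # Factor that depends on mu only.
    return math.exp(-0.5 * mu ** 2)

def w1(mu):
    return mu

def t1(x):
    return x

def exp_family_pdf(x, mu):
    """The same density written as h(x) c(mu) exp(w1(mu) t1(x))."""
    return h(x) * c(mu) * math.exp(w1(mu) * t1(x))

# Illustrative grid of (x, mu) pairs; any other points would do.
max_abs_diff = max(
    abs(normal_pdf(x, mu) - exp_family_pdf(x, mu))
    for x in (-3.0, -1.0, 0.0, 0.5, 2.0)
    for mu in (-2.0, 0.0, 1.5)
)
print(max_abs_diff)  # expected to be at the level of floating-point rounding
```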



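Example L10.2: Show that the beta($\alpha$, $\beta$) distribution can be expressed in the form of an exponential family.

Answer to Example L10.2: Its pdf is

$$f(x;\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}\,I_{(0,1)}(x)$$

$$\phantom{f(x;\alpha,\beta)} = I_{(0,1)}(x)\,\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\,e^{(\alpha-1)\ln x + (\beta-1)\ln(1-x)}$$

$$\phantom{f(x;\alpha,\beta)} = h(x)\,c(\alpha,\beta)\,e^{w_1(\alpha,\beta)t_1(x) + w_2(\alpha,\beta)t_2(x)}$$

where $h(x) = I_{(0,1)}(x)$, $c(\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}$, $w_1(\alpha,\beta) = \alpha - 1$, $t_1(x) = \ln x$, $w_2(\alpha,\beta) = \beta - 1$, and $t_2(x) = \ln(1-x)$.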
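As a quick sanity check of the two-parameter factorization in Example L10.2, the sketch below compares the usual beta density with the product $h(x)c(\alpha,\beta)e^{w_1 t_1 + w_2 t_2}$. The function names and the test grid are assumptions made for illustration only.

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density in its usual form."""
    if not 0.0 < x < 1.0:
        return 0.0
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * x ** (a - 1) * (1 - x) ** (b - 1)

def beta_exp_family(x, a, b):
    """Same density as h(x) c(a, b) exp(w1 t1(x) + w2 t2(x))."""
    if not 0.0 < x < 1.0:
        return 0.0                      # h(x) = I_(0,1)(x) is zero here
    h = 1.0                             # indicator of (0, 1)
    c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    w1, w2 = a - 1, b - 1               # functions of the parameters only
    t1, t2 = math.log(x), math.log(1 - x)  # functions of x only
    return h * c * math.exp(w1 * t1 + w2 * t2)

# Illustrative grid; the relative difference should be rounding error only.
max_rel_diff = max(
    abs(beta_pdf(x, a, b) - beta_exp_family(x, a, b)) / beta_pdf(x, a, b)
    for x in (0.05, 0.3, 0.5, 0.9)
    for (a, b) in ((0.5, 3.0), (2.0, 2.0), (4.0, 1.5))
)
print(max_rel_diff)
```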



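Example L10.3: Consider the continuous distribution with density function

$$f(x;\theta) = \frac{(\theta+1)\,x^{\theta}}{\theta^{\theta+1}}, \quad 0 < x < \theta$$

$$\phantom{f(x;\theta)} = \frac{\theta+1}{\theta^{\theta+1}}\,e^{\theta\ln x}$$

where $\theta > 0$. Is this an exponential family? Why or why not?

Answer to Example L10.3: No. The support $(0, \theta)$ depends on $\theta$, so the indicator $I_{(0,\theta)}(x)$ cannot be absorbed into a factor $h(x)$ that is free of $\theta$. The support of a density in an exponential family cannot depend on $\theta$.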



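Theorem L10.1 (CB: special case of Thm 3.4.2 on p. 112; HMC: special case of Thm 7.5.1.2 on p. 436): If $X$ is a random variable with pdf or pmf of the form

$$f(x;\theta) = h(x)\,c(\theta)\,e^{w(\theta)t(x)} = e^{w(\theta)t(x) + H(x) + q(\theta)},$$

then

$$E[t(X)] = -\frac{1}{w'(\theta)}\,\frac{d}{d\theta}\bigl[\ln c(\theta)\bigr] = -\frac{q'(\theta)}{w'(\theta)}$$

and

$$\mathrm{Var}[t(X)] = -\frac{1}{w'(\theta)^2}\left(\frac{d^2}{d\theta^2}\bigl[\ln c(\theta)\bigr] + w''(\theta)\,E[t(X)]\right) = \frac{1}{w'(\theta)^3}\bigl\{w''(\theta)q'(\theta) - q''(\theta)w'(\theta)\bigr\}.$$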
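Both formulas in Theorem L10.1 can be recovered by differentiating the identity $\int f(x;\theta)\,dx = 1$. The sketch below treats the continuous case and assumes that differentiation under the integral sign is permitted, a regularity condition that the statement above does not spell out.

```latex
\begin{align*}
0 &= \frac{d}{d\theta}\int e^{w(\theta)t(x)+H(x)+q(\theta)}\,dx
   = \int \bigl[w'(\theta)t(x)+q'(\theta)\bigr] f(x;\theta)\,dx
   = w'(\theta)\,E[t(X)] + q'(\theta), \\
0 &= \frac{d^2}{d\theta^2}\int e^{w(\theta)t(x)+H(x)+q(\theta)}\,dx
   = \int \Bigl\{ w''(\theta)t(x)+q''(\theta)
       + \bigl[w'(\theta)t(x)+q'(\theta)\bigr]^2 \Bigr\} f(x;\theta)\,dx \\
  &= w''(\theta)\,E[t(X)] + q''(\theta) + w'(\theta)^2\,\mathrm{Var}[t(X)].
\end{align*}
```

The first line gives $E[t(X)] = -q'(\theta)/w'(\theta)$. The last step of the second identity uses $w'(\theta)t(x) + q'(\theta) = w'(\theta)\bigl(t(x) - E[t(X)]\bigr)$, and solving for $\mathrm{Var}[t(X)]$ gives the variance formula with $q(\theta) = \ln c(\theta)$.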



Example L10.4:
(a) Assuming $n$ is fixed, show that the binomial distribution with probability of success $p$ based on $n$ trials can be expressed in the form of an exponential family.
(b) Use Theorem L10.1 to show that $E[X] = np$ and $\mathrm{Var}[X] = np(1-p)$.

Answer to Example L10.4: (a) Its pmf is

$$f(x;p) = \binom{n}{x}p^x(1-p)^{n-x}\,I_{\{0,1,\ldots,n\}}(x)$$

$$\phantom{f(x;p)} = \left(\binom{n}{x}I_{\{0,1,\ldots,n\}}(x)\right)(1-p)^n\exp\left\{x\ln\left(\frac{p}{1-p}\right)\right\} = h(x)\,c(p)\,e^{w(p)t(x)}$$

where $h(x) = \binom{n}{x}I_{\{0,1,\ldots,n\}}(x)$, $c(p) = (1-p)^n$, $w(p) = \ln\left(\frac{p}{1-p}\right)$, and $t(x) = x$.



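Answer to Example L10.4 continued: Alternatively, the pmf can be expressed as

$$f(x;p) = \tilde{h}(x)\,\tilde{c}(p)\,e^{w(p)t(x)}$$

where $\tilde{h}(x) = \binom{n}{x}\left(\frac{1}{2}\right)^n I_{\{0,1,\ldots,n\}}(x)$, $\tilde{c}(p) = 2^n(1-p)^n$, $w(p) = \ln\left(\frac{p}{1-p}\right)$, and $t(x) = x$, so that $\tilde{h}(x)$ is one of the pmfs in the family (the one with $p = 1/2$).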
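The two factorizations of the binomial pmf in Example L10.4(a) can be compared numerically. This is a minimal sketch under illustrative assumptions: the value `n = 6`, the grid of `p` values, and the function names are not from the lecture.

```python
import math

n = 6  # illustrative number of trials (assumed fixed, as in the example)

def binom_pmf(x, p):
    """Binomial(n, p) pmf in its usual form, for x in {0, ..., n}."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def w(p):
    # Log-odds, shared by both factorizations.
    return math.log(p / (1 - p))

def form_one(x, p):
    """h(x) c(p) exp(w(p) x) with h(x) = C(n, x) and c(p) = (1 - p)^n."""
    return math.comb(n, x) * (1 - p) ** n * math.exp(w(p) * x)

def form_two(x, p):
    """h~(x) c~(p) exp(w(p) x) with h~(x) = C(n, x) / 2^n, c~(p) = 2^n (1 - p)^n."""
    h_tilde = math.comb(n, x) * 0.5 ** n
    c_tilde = 2 ** n * (1 - p) ** n
    return h_tilde * c_tilde * math.exp(w(p) * x)

grid = [(x, p) for x in range(n + 1) for p in (0.1, 0.5, 0.8)]
diff_one = max(abs(binom_pmf(x, p) - form_one(x, p)) for x, p in grid)
diff_two = max(abs(binom_pmf(x, p) - form_two(x, p)) for x, p in grid)

# h~ sums to one over {0, ..., n}, so it is itself a pmf (binomial with p = 1/2).
h_tilde_total = sum(math.comb(n, x) * 0.5 ** n for x in range(n + 1))
print(diff_one, diff_two, h_tilde_total)
```

The second form only moves the constant $2^n$ between the two factors, which is why $w(p)$ and $t(x)$ are unchanged.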



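Answer to Example L10.4(b) continued: Since $q(p) = \ln c(p) = n\ln(1-p)$ and $w(p) = \ln\left(\frac{p}{1-p}\right)$, the required derivatives are

$$q'(p) = -\frac{n}{1-p}, \quad q''(p) = -\frac{n}{(1-p)^2}, \quad w'(p) = \frac{1}{p(1-p)}, \quad w''(p) = \frac{2p-1}{p^2(1-p)^2}.$$

Therefore

$$E[X] = -\frac{q'(p)}{w'(p)} = -\frac{-n/(1-p)}{1/(p(1-p))} = np$$

and

$$\mathrm{Var}[X] = \frac{1}{w'(p)^3}\bigl\{w''(p)q'(p) - q''(p)w'(p)\bigr\}$$

$$\phantom{\mathrm{Var}[X]} = (p(1-p))^3\left[-\frac{n(2p-1)}{p^2(1-p)^2}\cdot\frac{1}{1-p} + \frac{n}{(1-p)^2}\cdot\frac{1}{p(1-p)}\right] = np(1-p).$$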
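The moment formulas of Theorem L10.1 can also be evaluated numerically for the binomial case. The sketch below approximates the derivatives of $w$ and $q$ by central finite differences (a substitute for the exact calculus used in the lecture); the step sizes, `n = 10`, and `p = 0.3` are arbitrary illustrative choices.

```python
import math

n = 10  # illustrative number of trials

def q(p):
    """q(p) = ln c(p) = n ln(1 - p)."""
    return n * math.log(1 - p)

def w(p):
    """w(p) = ln(p / (1 - p))."""
    return math.log(p / (1 - p))

def d1(f, x, eps=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def d2(f, x, eps=1e-4):
    """Central-difference approximation of f''(x)."""
    return (f(x + eps) - 2 * f(x) + f(x - eps)) / eps ** 2

def mean_and_variance(p):
    """E[t(X)] and Var[t(X)] from the formulas in Theorem L10.1."""
    w1, w2 = d1(w, p), d2(w, p)
    q1, q2 = d1(q, p), d2(q, p)
    mean = -q1 / w1
    var = (w2 * q1 - q2 * w1) / w1 ** 3
    return mean, var

p = 0.3
mean, var = mean_and_variance(p)
print(mean, n * p)            # both approximately 3
print(var, n * p * (1 - p))   # both approximately 2.1
```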



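Sometimes, an exponential family is reparametrized in terms of the natural parameter $\boldsymbol{\eta}$ and cumulant generating function $\psi(\boldsymbol{\eta})$:

$$f(x;\boldsymbol{\eta}) = h(x)\,c^*(\boldsymbol{\eta})\exp\left(\sum_{i=1}^{k}\eta_i t_i(x)\right)$$

$$\phantom{f(x;\boldsymbol{\eta})} = h(x)\exp\{-\psi(\boldsymbol{\eta})\}\exp\left\{\sum_{i=1}^{k}\eta_i t_i(x)\right\}$$

$$\phantom{f(x;\boldsymbol{\eta})} = \exp\left\{\sum_{i=1}^{k}\eta_i t_i(x) - \psi(\boldsymbol{\eta})\right\}h(x)$$

$$\phantom{f(x;\boldsymbol{\eta})} = \frac{\exp\left(\sum_{i=1}^{k}\eta_i t_i(x)\right)h(x)}{\int_{-\infty}^{\infty}\exp\left(\sum_{i=1}^{k}\eta_i t_i(x)\right)h(x)\,dx}$$

where $\psi(\boldsymbol{\eta}) = \ln\left(\int_{-\infty}^{\infty}\exp\left(\sum_{i=1}^{k}\eta_i t_i(x)\right)h(x)\,dx\right)$.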



Definition L10.2 (CB: page 114): The set

$$\mathcal{H} = \left\{\boldsymbol{\eta} = (\eta_1,\ldots,\eta_k) : \int_{-\infty}^{\infty} h(x)\exp\left(\sum_{i=1}^{k}\eta_i t_i(x)\right)dx < \infty\right\}$$

is called the natural parameter space for the family. (The integral is replaced with an appropriate sum if the random variable is discrete.)

Note that $e^{\psi(\boldsymbol{\eta})} = (c^*(\boldsymbol{\eta}))^{-1}$ is the moment generating function of $(t_1(X),\ldots,t_k(X))$ if $X$ has pdf $h(x)$.

For $k = 1$, taking $w(\eta) = \eta$ (so that $w'(\eta) = 1$ and $w''(\eta) = 0$), the formulas for the mean and variance from Theorem L10.1 reduce to

$$E[t(X)] = -\frac{d}{d\eta}\bigl[\ln c^*(\eta)\bigr] = \psi'(\eta)$$

and

$$\mathrm{Var}[t(X)] = -\frac{d^2}{d\eta^2}\bigl[\ln c^*(\eta)\bigr] = \psi''(\eta).$$


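Answer to Example L10.4 continued: In terms of the natural parameterization $\eta = \ln\left(\frac{p}{1-p}\right) \Leftrightarrow p = \frac{e^{\eta}}{1+e^{\eta}}$, the pmf of the binomial distribution can be expressed as

$$f(x;\eta) = e^{\eta t(x) - \psi(\eta)}\,\tilde{h}(x)$$

where $\tilde{h}(x) = \binom{n}{x}\left(\frac{1}{2}\right)^n I_{\{0,1,\ldots,n\}}(x)$, $\psi(\eta) = n\ln\left(\frac{1+e^{\eta}}{2}\right)$, and $t(x) = x$.

Then $E[X] = \psi'(\eta) = n\,\frac{e^{\eta}}{1+e^{\eta}} = np$ and $\mathrm{Var}[X] = \psi''(\eta) = n\,\frac{e^{\eta}}{(1+e^{\eta})^2} = np(1-p)$.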
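The natural-parameter form of the binomial can be checked the same way. The sketch below uses finite differences of $\psi$ in place of exact derivatives and also confirms that $e^{\eta x - \psi(\eta)}\tilde{h}(x)$ reproduces the binomial pmf; `n = 10`, `p = 0.3`, and the step sizes are assumptions for illustration.

```python
import math

n = 10  # illustrative number of trials

def psi(eta):
    """Cumulant generating function psi(eta) = n ln((1 + e^eta) / 2)."""
    return n * math.log((1 + math.exp(eta)) / 2)

def h_tilde(x):
    """Base pmf: Binomial(n, 1/2), for x in {0, ..., n}."""
    return math.comb(n, x) * 0.5 ** n

def natural_pmf(x, eta):
    """Binomial pmf in natural form exp(eta x - psi(eta)) h~(x)."""
    return math.exp(eta * x - psi(eta)) * h_tilde(x)

def d1(f, x, eps=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def d2(f, x, eps=1e-4):
    """Central-difference approximation of f''(x)."""
    return (f(x + eps) - 2 * f(x) + f(x - eps)) / eps ** 2

p = 0.3
eta = math.log(p / (1 - p))          # natural parameter (log-odds)

mean = d1(psi, eta)                  # psi'(eta), should be close to n p
var = d2(psi, eta)                   # psi''(eta), should be close to n p (1 - p)

pmf_diff = max(
    abs(natural_pmf(x, eta) - math.comb(n, x) * p ** x * (1 - p) ** (n - x))
    for x in range(n + 1)
)
print(mean, var, pmf_diff)
```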



Definition L10.3 (CB: page 115): A full exponential family is a family of pmfs/pdfs for which the dimension of $\boldsymbol{\theta}$ is equal to $k$.

Definition L10.4 (CB: page 115): A curved exponential family is a family of pmfs/pdfs for which the dimension of $\boldsymbol{\theta}$ is less than $k$.



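Example L10.5: Show that the normal family of densities with mean $\mu$ and variance $\sigma^2$ can be expressed as an exponential family. What is its natural parameter space? Is it a full exponential or a curved exponential family?

Answer to Example L10.5: The density can be written as

$$f(x;\boldsymbol{\eta}) = e^{\eta_1 t_1(x) + \eta_2 t_2(x) - \psi(\boldsymbol{\eta})}\,h(x)$$

where $h(x) = \frac{1}{\sqrt{2\pi}}$, $\psi(\boldsymbol{\eta}) = \frac{1}{2}\left(\frac{\eta_2^2}{\eta_1} - \ln\eta_1\right)$, $t_1(x) = -\frac{x^2}{2}$, and $t_2(x) = x$ with $\eta_1 = 1/\sigma^2$ and $\eta_2 = \mu/\sigma^2$.

The natural parameter space is

$$\{(\eta_1,\eta_2) : \eta_1 > 0,\ -\infty < \eta_2 < \infty\}$$

so this is a full exponential family.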
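The natural-parameter form in Example L10.5 can be verified against the usual normal density. This is a minimal sketch with illustrative function names and test values, none of which come from the lecture.

```python
import math

def normal_pdf(x, mu, sigma2):
    """N(mu, sigma2) density in its usual form."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def natural_form(x, mu, sigma2):
    """Same density as exp(eta1 t1(x) + eta2 t2(x) - psi(eta)) h(x)."""
    eta1 = 1.0 / sigma2                 # natural parameters
    eta2 = mu / sigma2
    psi = 0.5 * (eta2 ** 2 / eta1 - math.log(eta1))
    h = 1.0 / math.sqrt(2 * math.pi)    # free of the parameters
    t1 = -x ** 2 / 2                    # statistics, free of the parameters
    t2 = x
    return h * math.exp(eta1 * t1 + eta2 * t2 - psi)

# Illustrative grid; agreement should hold up to rounding error.
max_rel_diff = max(
    abs(normal_pdf(x, mu, s2) - natural_form(x, mu, s2)) / normal_pdf(x, mu, s2)
    for x in (-2.0, 0.0, 1.0, 3.0)
    for mu in (-2.0, 0.0, 1.5)
    for s2 in (0.25, 1.0, 4.0)
)
print(max_rel_diff)
```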



Example L10.6: Show that the normal family of densities with mean $\mu$ and standard deviation $\mu$ can be expressed as an exponential family. Is it a full exponential or a curved exponential family?

Answer to Example L10.6: With $\sigma = \mu$, we obtain a one-dimensional curved exponential family

$$f(x;\mu) = \frac{1}{\sqrt{2\pi\mu^2}}\,e^{-1/2}\exp\left(-\frac{x^2}{2\mu^2} + \frac{x}{\mu}\right)$$

with parameter space $\{(\mu, \mu^2) : \mu > 0\}$.


Random Sample from an Exponential Family

Theorem L10.2 (CB: Thm 5.2.11 on p. 217; HMC: extension of Thm 7.5.1.1 on p. 436): Suppose $X_1,\ldots,X_n$ is a random sample from a pdf/pmf $f(x;\theta)$ where

$$f(x;\theta) = h(x)\,c(\theta)\exp\left(\sum_{i=1}^{k} w_i(\theta)\,t_i(x)\right)$$

is a member of an exponential family. Define statistics $Y_1,\ldots,Y_k$ by

$$Y_i(X_1,\ldots,X_n) = \sum_{j=1}^{n} t_i(X_j), \quad i = 1,\ldots,k.$$

Suppose $\{(w_1(\theta),\ldots,w_k(\theta)) : \theta \in \Theta\}$ contains an open subset of $\mathbb{R}^k$. Then the distribution of $(Y_1,\ldots,Y_k)$ is an exponential family of the form

$$f_Y(y_1,\ldots,y_k;\theta) = H(y_1,\ldots,y_k)\,[c(\theta)]^n\exp\left(\sum_{i=1}^{k} w_i(\theta)\,y_i\right).$$


Example L10.7: Suppose that $X_1,\ldots,X_n$ are independent exponential random variables with pdf

$$f(x;\beta) = \frac{1}{\beta}\,e^{-x/\beta}\,I_{(0,\infty)}(x)$$

where $\beta > 0$. Does the pdf of $Y = \sum_{i=1}^{n} X_i$ belong to an exponential family? If so, what is the function $H(y)$?



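Answer to Example L10.7: Since $f(x;\beta)$ is an exponential family with $h(x) = I_{(0,\infty)}(x)$, $c(\beta) = \frac{1}{\beta}$, $w(\beta) = -\frac{1}{\beta}$, and $t(x) = x$, and $\{w(\beta) : \beta > 0\} = (-\infty, 0)$ contains an open subset of $\mathbb{R}$, Theorem L10.2 implies that the pdf of $Y$ belongs to an exponential family with $C(\beta) = [c(\beta)]^n = \frac{1}{\beta^n}$, $w(\beta) = -\frac{1}{\beta}$, and $Y(x_1,\ldots,x_n) = \sum_{i=1}^{n} x_i$ of the form

$$f_Y(y) = H(y)\,\frac{1}{\beta^n}\,e^{-y/\beta}.$$

The pdf of a Gamma random variable is

$$f(x;\alpha,\beta) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\,x^{\alpha-1}e^{-x/\beta}\,I_{(0,\infty)}(x),$$

so $Y \sim \text{Gamma}(n,\beta)$ and $H(y) = \frac{1}{\Gamma(n)}\,y^{n-1}\,I_{(0,\infty)}(y)$.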
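For $n = 2$ the claim in Example L10.7 can be checked directly, because the density of $X_1 + X_2$ is the convolution of two exponential densities. The sketch below approximates that convolution with a midpoint rule and compares it with the Gamma$(2, \beta)$ density and with the form $H(y)\,\beta^{-n}e^{-y/\beta}$. The restriction to $n = 2$, the value of `beta`, the grid of `y` values, and the number of integration steps are all illustrative assumptions.

```python
import math

def exp_pdf(x, beta):
    """Exponential density with mean beta."""
    return math.exp(-x / beta) / beta if x > 0 else 0.0

def gamma_pdf(y, alpha, beta):
    """Gamma(alpha, beta) density in the scale parameterization used above."""
    if y <= 0:
        return 0.0
    return y ** (alpha - 1) * math.exp(-y / beta) / (math.gamma(alpha) * beta ** alpha)

def density_of_sum_of_two(y, beta, steps=2000):
    """Midpoint-rule approximation of the convolution integral over (0, y)."""
    width = y / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * width
        total += exp_pdf(x, beta) * exp_pdf(y - x, beta)
    return total * width

def exp_family_form(y, n, beta):
    """H(y) [c(beta)]^n exp(w(beta) y) with H(y) = y^(n-1) / Gamma(n) on (0, inf)."""
    H = y ** (n - 1) / math.gamma(n) if y > 0 else 0.0
    return H * (1.0 / beta) ** n * math.exp(-y / beta)

beta = 1.7
ys = (0.2, 1.0, 3.5, 8.0)
conv_diff = max(abs(density_of_sum_of_two(y, beta) - gamma_pdf(y, 2, beta)) for y in ys)
form_diff = max(abs(exp_family_form(y, 2, beta) - gamma_pdf(y, 2, beta)) for y in ys)
print(conv_diff, form_diff)
```

The midpoint rule is essentially exact here because the integrand $f(x;\beta)f(y-x;\beta) = \beta^{-2}e^{-y/\beta}$ does not depend on $x$.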


Sufficiency

Theorem L10.3 (CB: Theorem 6.2.10 on p. 279; HMC: page 449): Let $X_1,\ldots,X_n$ be iid observations from a pdf or pmf $f(x;\boldsymbol{\theta})$ that belongs to an exponential family given by

$$f(x;\boldsymbol{\theta}) = h(x)\,c(\boldsymbol{\theta})\exp\left(\sum_{i=1}^{k} w_i(\boldsymbol{\theta})\,t_i(x)\right),$$

where $\boldsymbol{\theta} = (\theta_1,\theta_2,\ldots,\theta_d)$, $d \le k$. Then

$$\left(\sum_{j=1}^{n} t_1(X_j),\ \ldots,\ \sum_{j=1}^{n} t_k(X_j)\right)$$

is a sufficient statistic for $\boldsymbol{\theta}$.


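Example L10.8: Suppose that $X_1,\ldots,X_n$ is a random sample from a Normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$. Find a sufficient statistic for $(\mu,\sigma^2)$.

Answer to Example L10.8: Recall from Example L10.5 that the normal family of densities with mean $\mu$ and variance $\sigma^2$ can be expressed as

$$f(x;\boldsymbol{\eta}) = h(x)\,c^*(\boldsymbol{\eta})\,e^{\eta_1 t_1(x) + \eta_2 t_2(x)}$$

where $h(x) = \frac{1}{\sqrt{2\pi}}$, $c^*(\boldsymbol{\eta}) = \sqrt{\eta_1}\exp\left(-\frac{\eta_2^2}{2\eta_1}\right)$, $t_1(x) = -\frac{x^2}{2}$, and $t_2(x) = x$ with $\eta_1 = 1/\sigma^2$ and $\eta_2 = \mu/\sigma^2$.

Thus, by Theorem L10.3, $\left(-\frac{1}{2}\sum_{i=1}^{n} X_i^2,\ \sum_{i=1}^{n} X_i\right)$ is sufficient for $(\mu,\sigma^2)$.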



Any one-to-one function of a sufficient statistic is also a sufficient statistic, as shown below.

Suppose $T(X)$ is a sufficient statistic for $\boldsymbol{\theta}$, and suppose $r$ is a one-to-one function (with inverse $r^{-1}$) such that $T^*(x) = r(T(x))$ for all $x$.

By the Factorization Theorem (Theorem L9.2), there exist $g$ and $h$ such that

$$f(x;\boldsymbol{\theta}) = g(T(x);\boldsymbol{\theta})\,h(x) = g(r^{-1}(T^*(x));\boldsymbol{\theta})\,h(x).$$

Letting $g^*(t;\boldsymbol{\theta}) = g(r^{-1}(t);\boldsymbol{\theta})$, we have

$$f(x;\boldsymbol{\theta}) = g^*(T^*(x);\boldsymbol{\theta})\,h(x)$$

so that $T^*(X)$ is sufficient for $\boldsymbol{\theta}$.



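Answer to Example L10.8 continued: It was shown that $\left(-\frac{1}{2}\sum_{i=1}^{n} X_i^2,\ \sum_{i=1}^{n} X_i\right)$ is sufficient for $(\mu,\sigma^2)$. Let

$$r(t_1,t_2) = \left(\frac{t_2}{n},\ \frac{-2n\,t_1 - t_2^2}{n(n-1)}\right).$$

Since $r$ is one-to-one, $r\left(-\frac{1}{2}\sum_{i=1}^{n} X_i^2,\ \sum_{i=1}^{n} X_i\right) = \left(\bar{X}, S^2\right)$ is sufficient for $(\mu,\sigma^2)$.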
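The map $r$ can be applied to a concrete sample to confirm that it returns the sample mean and sample variance. In the sketch below the data values are hypothetical and the function names are illustrative; the reference values come from Python's standard `statistics` module.

```python
import statistics

def sufficient_statistic(sample):
    """(t1, t2) = (-1/2 * sum of squares, sum) from Example L10.8."""
    t1 = -0.5 * sum(x * x for x in sample)
    t2 = sum(sample)
    return t1, t2

def r(t1, t2, n):
    """One-to-one map from (t1, t2) to (sample mean, sample variance)."""
    mean = t2 / n
    s2 = (-2 * n * t1 - t2 ** 2) / (n * (n - 1))
    return mean, s2

sample = [2.0, 4.0, 4.0, 5.0, 7.0]   # hypothetical data, for illustration only
n = len(sample)

t1, t2 = sufficient_statistic(sample)
xbar, s2 = r(t1, t2, n)

print(xbar, statistics.mean(sample))      # both approximately 4.4
print(s2, statistics.variance(sample))    # both approximately 3.3
```

Here `statistics.variance` uses the divisor $n - 1$, which matches the definition of $S^2$ implied by $r$.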

