STAT:5100 (22S:193) Statistical Inference I
Week 14
Luke Tierney
University of Iowa
Fall 2015
Luke Tierney (U Iowa) STAT:5100 (22S:193) Statistical Inference I Fall 2015 1
Monday, November 30, 2015: Central Limit Theorem
Recap
• Distances between probability distributions
• Convergence in distribution
• Central limit theorem
Theorem (Central Limit Theorem)
Let X1, X2, ... be i.i.d. from a population with an MGF that is finite near the origin. Then X1 has finite mean µ and finite variance σ². Let

    Zn = (X̄n − µ)/(σ/√n) = √n (X̄n − µ)/σ

and let Z ∼ N(0, 1). Then Zn → Z in distribution, i.e.

    P(Zn ≤ z) → ∫_{−∞}^z (1/√(2π)) e^{−u²/2} du

for all z.
Notation
• If (Xn − an)/bn →D Z ∼ N(0, 1), then we sometimes write

    Xn ∼ AN(an, bn²)
• Read this as
Xn is approximately normal
or
Xn is asymptotically normal.
• bn² is sometimes called the asymptotic variance.
• bn is sometimes called the asymptotic standard deviation.
Examples
• The sample mean X̄n is approximately normally distributed with mean µ and SD σ/√n.
• Alternatively, X̄n ∼ AN(µ, σ²/n).
• The sum Yn = Σ_{i=1}^n Xi has an approximate normal distribution with mean nµ and SD √n σ.
• Alternatively, Yn ∼ AN(nµ, nσ²).
• If Yn ∼ Binomial(n, p) then Yn ∼ AN(np, np(1 − p)).
  • Yn can be viewed as the sum of n independent Bernoulli(p) random variables.
• If Yn ∼ χ²_n then Yn ∼ AN(n, 2n).
  • Yn can be viewed as a sum of n independent χ²_1 random variables.
• If Yα ∼ Gamma(α, β) then Yα ∼ AN(αβ, αβ²) as α → ∞.
• If Yλ ∼ Poisson(λ) then Yλ ∼ AN(λ, λ) as λ → ∞.
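These approximations are easy to check numerically. A small sketch (in Python with SciPy, used here only as an illustration; the course's own code is in R) measures the Kolmogorov distance between the χ²_n CDF and its AN(n, 2n) approximation:

```python
import numpy as np
from scipy import stats

def kolmogorov_gap(n, npts=1001):
    """Max CDF gap between chi-square(n) and its AN(n, 2n) approximation."""
    z = np.linspace(-4, 4, npts)
    x = n + z * np.sqrt(2 * n)   # grid points mapped to the chi-square scale
    return np.max(np.abs(stats.chi2.cdf(x, df=n) - stats.norm.cdf(z)))

# The gap shrinks as n grows, as the CLT suggests.
for n in (10, 100, 1000):
    print(n, round(kolmogorov_gap(n), 4))
```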
Proof of the Central Limit Theorem.
• Let Yi = (Xi − µ)/σ.
• Then E[Yi] = 0, Var(Yi) = 1, and Zn = √n Ȳn.
• The MGF of Zn is therefore

    M_Zn(t) = E[exp{t √n (1/n) Σ Yi}] = E[exp{(t/√n) Σ Yi}] = M_Y1(t/√n)^n

• Now by Taylor's theorem we can write

    M_Y(s) = M_Y(0) + s M'_Y(0) + (s²/2) M''_Y(0) + R_Y(s)

where R_Y(s) = o(s²) as s → 0.
Proof of the Central Limit Theorem (continued).
• Furthermore,

    M_Y(0) = 1
    M'_Y(0) = E[Y] = 0
    M''_Y(0) = E[Y²] = Var(Y) = 1

• Therefore

    M_Y(t/√n)^n = (1 + t²/(2n) + R_Y(t/√n))^n
                = (1 + ((1/2)t² + n R_Y(t/√n))/n)^n
                → e^{t²/2}

as n → ∞.
• This is the MGF of the standard normal distribution.
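A quick numerical illustration of the limit (a Python sketch; the standardized uniform is a hypothetical choice of Y, any mean-0 variance-1 distribution with an MGF would do). For Y uniform on [−√3, √3], M_Y(s) = sinh(√3 s)/(√3 s), and M_Y(t/√n)^n approaches e^{t²/2}:

```python
import numpy as np

# MGF of Y ~ Uniform[-sqrt(3), sqrt(3)], which has mean 0 and variance 1:
#   M_Y(s) = sinh(sqrt(3) s) / (sqrt(3) s)
def mgf_y(s):
    return np.sinh(np.sqrt(3) * s) / (np.sqrt(3) * s)

t = 1.0
for n in (10, 100, 10000):
    print(n, mgf_y(t / np.sqrt(n)) ** n)   # approaches exp(1/2) ≈ 1.6487
```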
Monday, November 30, 2015: Other Limiting Distributions
Other Limiting Distributions
Example
• Suppose X1, X2, ... are non-negative i.i.d. random variables from a distribution with CDF F.
• Let Yn = min{X1, ..., Xn}.
• Suppose F(x) ∼ a x^b for some positive a, b as x → 0; this means

    lim_{x→0} F(x)/(a x^b) = 1.

• This implies that F(y) > 0 for y > 0.
• The CDF of Yn is F_Yn(y) = 1 − (1 − F(y))^n.
• This converges to one for all y > 0, so Yn →D 0 and Yn →P 0.
• Some simulations for Xi ∼ Uniform[0, 1]: http://www.stat.uiowa.edu/~luke/classes/193/convergence.R
Example (continued)
• Can we find constants cn such that Zn = cn Yn converges to a non-degenerate limit?
• The CDF of Zn is

    F_Zn(z) = P(Yn ≤ z/cn) = 1 − (1 − F(z/cn))^n = 1 − (1 − nF(z/cn)/n)^n.

• As cn → ∞ we have

    nF(z/cn) ∼ n a (z/cn)^b = (n/cn^b) a z^b.

• So taking cn = n^{1/b} yields nF(z/cn) → a z^b and therefore

    F_Zn(z) → 1 − e^{−a z^b}

• This is the CDF of a Weibull distribution.
• For a Uniform[0, 1] distribution a = 1 and b = 1, so the limiting distribution is exponential with mean 1.
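A quick simulation sketch of the uniform case (a Python translation of the idea; the course's R simulations are at the link above). With a = b = 1 and cn = n, n·min(X1, ..., Xn) should look exponential with mean 1:

```python
import numpy as np

rng = np.random.default_rng(0)

# Z_n = c_n * Y_n with c_n = n (a = b = 1): the limit should be Exp(1).
n, reps = 1000, 5000
z = n * rng.uniform(size=(reps, n)).min(axis=1)
print(z.mean())            # near 1
print(np.mean(z <= 1.0))   # near 1 - exp(-1) ≈ 0.632
```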
Example (continued)
• If F is a Gamma(α, 1) CDF then F(x) ∼ x^α/(α Γ(α)) as x → 0.
• For α = 2 this means F(x) ∼ (1/2) x².
• This code provides a graphical comparison of the exact and approximate CDFs of Zn for n = 10:

n <- 10
x <- seq(0, 4, len = 101)
plot(x, 1 - exp(-0.5 * x^2), type = "l")
lines(x, 1 - (1 - pgamma(x / sqrt(n), 2))^n, lty = 2)

• You could also compare the true densities to the density of the approximation.
• On the original scale,

    F_Yn(y) = F_Zn(cn y) ≈ F_Z(cn y) = 1 − e^{−a(cn y)^b}.
Monday, November 30, 2015: Other Convergence Results
Other Convergence Results
Theorem (Slutsky’s Theorem)
Suppose Xn →D X and Yn →P a, where a is a constant. Then

    Xn Yn →D aX
    Xn + Yn →D X + a
Theorem (Continuous Mapping Theorem)
Suppose Xn → X in probability (in distribution) and f has the property that

    P(X ∈ {x : f is continuous at x}) = 1.

Then f(Xn) → f(X) in probability (in distribution).
Example
• Suppose X1, X2, ... are i.i.d. with mean µ and finite variance σ².
• Let

    X̄n = (1/n) Σ_{i=1}^n Xi

and

    Sn² = (1/(n−1)) Σ_{i=1}^n (Xi − X̄n)².

• We already know that
  • X̄n →P µ
  • X̄n → µ almost surely
  • X̄n ∼ AN(µ, σ²/n).
• What can we say about the behavior of Sn² as n → ∞?
Example (continued)
• If we assume that X1 has a finite fourth moment θ4 = E[(X1 − µ)⁴], then we have

    E[Sn²] = σ²
    Var(Sn²) = (1/n)(θ4 − ((n−3)/(n−1)) σ⁴) → 0.

• So Sn² → σ² in mean square and in probability.
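The moment formula is easy to check by simulation (a Python sketch with the hypothetical choices n = 20 and N(0, 1) data, for which θ4 = 3 and σ² = 1, so the formula gives Var(Sn²) = 2/(n − 1)):

```python
import numpy as np

rng = np.random.default_rng(1)

# For N(0,1) data: theta_4 = 3, sigma^2 = 1, so Var(S_n^2) = 2/(n - 1).
n, reps = 20, 50000
x = rng.standard_normal((reps, n))
s2 = x.var(axis=1, ddof=1)
print(s2.mean())              # near sigma^2 = 1
print(s2.var(), 2 / (n - 1))  # empirical vs formula
```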
Example (continued)
• Another approach is to decompose Sn² as

    Sn² = (1/(n−1)) [Σ_{i=1}^n (Xi − µ)² − n(X̄n − µ)²]
        = (n/(n−1)) (1/n) Σ_{i=1}^n (Xi − µ)² − (n/(n−1)) (X̄n − µ)²
        = (n/(n−1)) Un − (n/(n−1)) Vn

• By the WLLN

    Un = (1/n) Σ_{i=1}^n (Xi − µ)² →P σ².

• By the WLLN and the continuous mapping theorem Vn = (X̄n − µ)² →P 0.
• Slutsky's theorem then shows that

    Sn² = (n/(n−1)) Un − (n/(n−1)) Vn →P σ².
Example (continued)
• By the SLLN we also have Un → σ² almost surely.
• By the SLLN and continuity Vn → 0 almost surely as well.
• Continuity then also implies Sn² → σ² almost surely.
Example (continued)
• Since the variance of Sn² is O(n⁻¹) we might expect

    Wn = √n (Sn² − σ²)

to have a non-degenerate limiting distribution.
• Big picture:
  • After dropping smaller order terms, Sn² ≈ Un.
  • Un is an average of i.i.d. variables Yi = (Xi − µ)² with

      E[Yi] = E[(Xi − µ)²] = σ²
      Var(Yi) = E[Yi²] − E[Yi]² = E[(Xi − µ)⁴] − σ⁴ = θ4 − σ⁴.

  • By the CLT Un ∼ AN(σ², (θ4 − σ⁴)/n).
  • So Sn² ∼ AN(σ², (θ4 − σ⁴)/n).
Example (continued)
• To verify this carefully, write

    Sn² = Un + (1/(n−1)) Un − (n/(n−1)) Vn.

• Then

    √n (Sn² − σ²) = √n (Un − σ²) + (√n/(n−1)) Un − (n/(n−1)) √n Vn
                  = Zn + (√n/(n−1)) Un − (n/(n−1)) √n Vn.

• By the CLT Zn →D Z ∼ N(0, θ4 − σ⁴).
• To complete the proof we need to show that the remaining two terms converge to zero in probability.
Example (continued)
• First term, (√n/(n−1)) Un:
  • Un →P σ² by the WLLN, and √n/(n−1) → 0.
  • So by Slutsky's theorem (√n/(n−1)) Un →P 0.
• Second term, (n/(n−1)) √n Vn:
  • n/(n−1) → 1 and

      √n Vn = √n (X̄n − µ)² = [√n (X̄n − µ)](X̄n − µ).

  • By the CLT √n (X̄n − µ) →D V ∼ N(0, σ²).
  • By the WLLN X̄n − µ →P 0.
  • So by Slutsky's theorem (n/(n−1)) √n Vn →P 0.
• This completes the proof.
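A simulation sketch of the result (Python; the N(0, 1) population is a hypothetical choice, for which the limiting variance is θ4 − σ⁴ = 3 − 1 = 2):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# sqrt(n) (S_n^2 - sigma^2) should be approximately N(0, theta_4 - sigma^4);
# for N(0, 1) data the limiting variance is 2.
n, reps = 200, 20000
x = rng.standard_normal((reps, n))
w = np.sqrt(n) * (x.var(axis=1, ddof=1) - 1)
print(w.std())                                           # near sqrt(2)
print(stats.kstest(w, "norm", args=(0, np.sqrt(2))).statistic)
```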
Example
• From the CLT we know that

    Zn = (X̄n − µ)/(σ/√n) = √n (X̄n − µ)/σ →D Z ∼ N(0, 1).

• What happens if we replace σ by Sn to get

    Tn = (X̄n − µ)/(Sn/√n)?

• Write Tn as

    Tn = (σ/Sn)[√n (X̄n − µ)/σ] = (σ/Sn) Zn.

• By a homework problem σ/Sn →P 1.
• So Slutsky's theorem implies that Tn →D Z.
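A simulation sketch of this (Python; the Exponential(1) population is a hypothetical choice, any finite-variance population would do) showing that Tn is close to N(0, 1) even for skewed data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# T_n = sqrt(n) (Xbar_n - mu) / S_n for Exponential(1) data (mu = 1).
n, reps = 500, 10000
x = rng.exponential(size=(reps, n))
t = np.sqrt(n) * (x.mean(axis=1) - 1) / x.std(axis=1, ddof=1)
print(stats.kstest(t, "norm").statistic)   # small for large n
```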
Wednesday, December 2, 2015: Delta Method
Recap
• Central limit theorem
• Other limiting distributions
• Slutsky’s theorem
• Continuous mapping theorem
Example
• What can we say about the sampling distribution of Sn = √(Sn²)?
• We know that E[Sn] < σ (by Jensen's inequality, since the square root is strictly concave).
• Since Var(Sn) = E[Sn²] − E[Sn]² = σ² − E[Sn]² we have

    E[Sn] = √(σ² − Var(Sn)).

• Var(Sn) is typically O(n⁻¹), so typically

    E[Sn] = σ + O(n⁻¹).

• Since f(x) = √x is continuous and Sn² → σ² in probability and almost surely, we also have Sn → σ in probability and almost surely.
• Can we approximate the distribution of Sn?
• Some simulations: http://www.stat.uiowa.edu/~luke/classes/193/convergence.R
Example (continued)
• We know that the distribution of Sn² is concentrated near σ² for large n.
• Near σ² we can approximate f(x) = √x by its tangent:

    f(x) ≈ f(σ²) + f'(σ²)(x − σ²)
         = √(σ²) + (1/2)(1/√(σ²))(x − σ²)
         = σ + (1/(2σ))(x − σ²)

• So Sn ≈ σ + (1/(2σ))(Sn² − σ²).
• We know that Sn² − σ² ∼ AN(0, (θ4 − σ⁴)/n).
• This suggests that

    Sn ∼ AN(σ, (f'(σ²))² (θ4 − σ⁴)/n)
       = AN(σ, (1/(2σ))² (θ4 − σ⁴)/n)
       = AN(σ, (1/(4n))(θ4/σ² − σ²)).
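A simulation sketch of this approximation (Python; N(0, 1) data are a hypothetical choice, giving θ4/σ² − σ² = 2 and hence Sn ∼ AN(1, 1/(2n))):

```python
import numpy as np

rng = np.random.default_rng(4)

# For N(0,1) data the approximation is S_n ~ AN(1, 1/(2n)).
n, reps = 100, 50000
x = rng.standard_normal((reps, n))
s = x.std(axis=1, ddof=1)
print(s.mean())                      # near 1 (the bias is O(1/n))
print(s.std(), 1 / np.sqrt(2 * n))   # empirical SD vs asymptotic SD
```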
Example (continued)
• To verify this more carefully, Taylor's theorem applied to the continuous, differentiable function f(x) = √x in a neighborhood of σ² gives

    f(x) = f(σ²) + f'(σ²)(x − σ²) + o(x − σ²).

• Therefore

    √n (f(Sn²) − f(σ²)) = f'(σ²) √n (Sn² − σ²) + √n o(Sn² − σ²) = Un + Vn.

• Now √n (Sn² − σ²) →D Z ∼ N(0, θ4 − σ⁴) and f'(σ²) is a constant.
• So Un = f'(σ²) √n (Sn² − σ²) →D U ∼ N(0, (f'(σ²))² (θ4 − σ⁴)).
• The remainder is

    Vn = √n o(Sn² − σ²) = [√n (Sn² − σ²)] [o(Sn² − σ²)/(Sn² − σ²)].

• This converges to zero in probability by the continuous mapping theorem and Slutsky's theorem.
Example (continued)
• The same idea can be used to approximate the distribution of log Sn².
• The Taylor expansion near σ² produces

    log Sn² ≈ log σ² + (1/σ²)(Sn² − σ²).

• The resulting distribution approximation is

    log Sn² ∼ AN(log σ², (θ4 − σ⁴)/(nσ⁴)).

• The log transformation
  • reduces the skewness of the distribution of Sn²
  • spreads the distribution across the entire real line.
• This may improve the quality of the approximation.
Theorem (Delta Method, Propagation of Error)
Suppose Xn →P a, Yn(Xn − a) →D Z, and f is differentiable at a. Then

    Yn(f(Xn) − f(a)) →D f'(a) Z

If Yn is real-valued, Xn ∈ R^p, and f : R^p → R^q, then this holds with

    f'(a) = (∂fi/∂xj), i = 1, ..., q, j = 1, ..., p
• The delta method is a formal version of the approximation

    f(Xn) ≈ f(a) + f'(a)(Xn − a)

when f is differentiable at a and Xn is close to a with high probability.
• If Xn ∼ AN(a, bn²) with bn → 0, then the delta method can be expressed as

    f(Xn) ∼ AN(f(a), (|f'(a)| bn)²) = AN(f(a), (f'(a))² bn²)

• For f : R^p → R^q, if Xn ∼ AN(a, Bn) with Bn → 0, then the delta method becomes

    f(Xn) ∼ AN(f(a), f'(a) Bn f'(a)^T)
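The scalar version is easy to package. The sketch below (in Python; `delta_method` is a hypothetical helper name, not from the course notes) returns the mean and SD of the AN approximation, illustrated on the logit of a binomial proportion:

```python
import numpy as np

def delta_method(a, bn, f, fprime):
    """AN(f(a), (f'(a) b_n)^2) approximation for f(X_n) when X_n ~ AN(a, b_n^2)."""
    return f(a), abs(fprime(a)) * bn

# Example: Y_n ~ AN(p, p(1-p)/n) for a binomial proportion, and
# f(y) = log(y / (1 - y)) with f'(y) = 1 / (y (1 - y)).
p, n = 0.3, 400
mean, sd = delta_method(p, np.sqrt(p * (1 - p) / n),
                        lambda y: np.log(y / (1 - y)),
                        lambda y: 1 / (y * (1 - y)))
print(mean, sd)   # logit(0.3) and 1 / sqrt(n p (1 - p))
```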
Wednesday, December 2, 2015: Examples
Example
• Suppose X1, ..., Xn are i.i.d. positive random variables with mean µ and variance σ².
• Let Tn = log X̄n.
• Then, informally,

    Tn ≈ log µ + (1/µ)(X̄n − µ) ∼ AN(log µ, (1/n)(σ²/µ²)).

• More formally:
  • By the central limit theorem

      √n (X̄n − µ) →D Z ∼ N(0, σ²)

  • Take Yn = √n, a = µ, and f(x) = log x in the delta method.
  • Then f'(a) = 1/µ, and

      √n (log X̄n − log µ) →D (1/µ) Z ∼ N(0, σ²/µ²).
Example (continued)
• By the continuous mapping theorem

    1/X̄n →P 1/µ

if µ ≠ 0.
• So by Slutsky's theorem, if µ ≠ 0,

    Sn/X̄n →P σ/µ.

• By Slutsky's theorem therefore

    (√n X̄n/Sn)(log X̄n − log µ) →D N(0, 1)

• This is useful for forming approximate confidence intervals for log µ.
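A coverage check by simulation (a Python sketch; the Exponential(1) population and the 95% level are hypothetical choices):

```python
import numpy as np

rng = np.random.default_rng(5)

# Approximate 95% interval for log(mu): log(Xbar) ± 1.96 S_n / (Xbar sqrt(n)).
# For Exponential(1) data, log(mu) = 0.
n, reps = 200, 10000
x = rng.exponential(size=(reps, n))
xbar = x.mean(axis=1)
half = 1.96 * x.std(axis=1, ddof=1) / (xbar * np.sqrt(n))
cover = np.mean((np.log(xbar) - half <= 0) & (0 <= np.log(xbar) + half))
print(cover)   # should be near 0.95
```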
Example
• Let X ∼ N(10, (0.1)²) and Y ∼ N(15, (0.2)²) be independent.
• X and Y represent measurements of the sides of a rectangle.
• An estimate of the area of the rectangle based on these measurements is A = XY.
• We would like an approximation to the distribution of A.
• The function f(x, y) = xy is continuous and differentiable.
• The gradient of f is

    ∇f(x, y) = (∂f(x, y)/∂x, ∂f(x, y)/∂y) = (y, x).
Example (continued)
• The Taylor expansion of f near (10, 15) is

    f(x, y) ≈ 10 × 15 + 15(x − 10) + 10(y − 15)
            = 150 + 15(x − 10) + 10(y − 15).

• So

    A ≈ 150 + 15(X − 10) + 10(Y − 15)
      ∼ N(150, (15 × 0.1)² + (10 × 0.2)²)
      = N(150, (2.5)²).
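A direct Monte Carlo check of the linearization (a Python sketch):

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulate A = X Y directly and compare with the N(150, 2.5^2) approximation.
reps = 200000
a = rng.normal(10, 0.1, reps) * rng.normal(15, 0.2, reps)
print(a.mean(), a.std())   # near 150 and 2.5
```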
Example (continued)
• The exact density of A is

    f_A(a) = ∫ f_X(a/y) f_Y(y)/|y| dy.

• This is not available in closed form but can be computed numerically.
• A graphical comparison is produced by

g <- function(a) {
    f <- function(y) dnorm(a/y, 10, .1) * dnorm(y, 15, .2) / abs(y)
    integrate(f, 14, 16)$value ## these limits seem reasonable
}
a <- seq(140, 160, len = 101)
d <- sapply(a, g)
plot(a, d, type = "l")
lines(a, dnorm(a, 150, 2.5), col = "red")
Friday, December 4, 2015
Recap
• Delta method
• Examples of using limit theorems
Friday, December 4, 2015: Examples
Example
• A line is described by the equation

    y = A + Bx.

• A and B are jointly normally distributed with parameters

    µA = 2, σA = 0.2, µB = 1, σB = 0.1, ρAB = 0.5.

• The value of x where y = 0 is X = −A/B.
• We would like to approximate the distribution of X.
• X = f(A, B) with f(a, b) = −a/b.
• f is continuous and differentiable for b ≠ 0.
Example (continued)
• The gradient of f is

    ∇f(a, b) = (−1/b, a/b²).

• The first order Taylor approximation to f near (a, b) = (µA, µB) is

    f(a, b) ≈ −µA/µB − (1/µB)(a − µA) + (µA/µB²)(b − µB)
            = −µA/µB − (1/µB) a + (µA/µB²) b
            = −2 − a + 2b.

• Therefore

    X ≈ −2 − A + 2B
      ∼ N(−2 − µA + 2µB, σA² + 4σB² − 4σAσBρAB)
      = N(−2, (0.2)²).
Example (continued)
• The exact distribution is

    f_X(x) = ∫ f_AB(−xb, b) |b| db.

• This is available in closed form but is somewhat messy.
• We can use a simple simulation to check the approximation:

N <- 10000
z1 <- rnorm(N)
z2 <- rnorm(N, 0.5 * z1, sqrt(0.75))
A <- 2 + 0.2 * z1
B <- 1 + 0.1 * z2
plot(density(-A/B))
x <- seq(-3, -1, len = 100)
lines(x, dnorm(x, -2, 0.2), lty = 2)

• The normal approximation seems to work quite well.
• Are µX ≈ −2 and σX² ≈ (0.2)² good approximations to the mean and variance of X?
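The same check can be done in Python (a sketch; quantiles are compared instead of density plots):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Simulate correlated (A, B) and compare quantiles of X = -A/B
# with those of the N(-2, 0.2^2) approximation.
N = 100000
z1 = rng.standard_normal(N)
z2 = 0.5 * z1 + np.sqrt(0.75) * rng.standard_normal(N)
x = -(2 + 0.2 * z1) / (1 + 0.1 * z2)
probs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.quantile(x, probs))
print(stats.norm.ppf(probs, -2, 0.2))
```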
Friday, December 4, 2015: Normal Approximation to the Beta Distribution

Normal Approximation to the Beta Distribution
• Suppose
  • Xn ∼ Gamma(αn, 1) and Yn ∼ Gamma(βn, 1)
  • Xn, Yn are independent
  • αn → ∞ and βn → ∞
  • αn/(αn + βn) = p with 0 < p < 1.
• Then

    Xn ∼ AN(αn, αn)
    Yn ∼ AN(βn, βn)

• A useful result:

Theorem
If Xn ∼ AN(an, bn²), Yn ∼ AN(cn, dn²), and Xn, Yn are independent, then

    (Xn, Yn)ᵀ ∼ AN((an, cn)ᵀ, diag(bn², dn²))
• For this example, this implies

    (Vn, Wn)ᵀ = (Xn/(αn + βn), Yn/(αn + βn))ᵀ
              ∼ AN((αn/(αn + βn), βn/(αn + βn))ᵀ, diag(αn/(αn + βn)², βn/(αn + βn)²))
              ∼ AN((p, 1 − p)ᵀ, diag(p/(αn + βn), (1 − p)/(αn + βn)))
• Now set

    Un = Xn/(Xn + Yn) = (Xn/(αn + βn))/((Xn/(αn + βn)) + (Yn/(αn + βn))) = Vn/(Vn + Wn) ∼ Beta(αn, βn)

• Let

    f(x, y) = x/(x + y) = 1 − y/(x + y)

• Then Un = f(Vn, Wn).
• The function

    f(x, y) = x/(x + y) = 1 − y/(x + y)

is differentiable for x > 0, y > 0.
• The gradient of f is

    ∇f(x, y) = (∂f(x, y)/∂x, ∂f(x, y)/∂y) = (y/(x + y)², −x/(x + y)²)

• So

    ∇f(p, 1 − p) = (1 − p, −p)

• Also f(p, 1 − p) = p.
• Therefore

    Un ∼ AN(p, (1 − p, −p) diag(p/(αn + βn), (1 − p)/(αn + βn)) (1 − p, −p)ᵀ)
       ∼ AN(p, (p(1 − p)² + (1 − p)p²)/(αn + βn))
       ∼ AN(p, p(1 − p)/(αn + βn))

• Replacing p with its definition in terms of αn and βn:

    Un ∼ AN(αn/(αn + βn), αnβn/(αn + βn)³)
• Comparing exact and approximate densities:

x <- seq(0, 1, len = 100)
n <- 10
p <- 0.25
plot(x, dbeta(x, p * n, (1 - p) * n), type = "l")
lines(x, dnorm(x, p, sqrt(p * (1 - p) / n)), lty = 2)

• Standardized to make the approximation N(0, 1):

z <- seq(-3, 3, len = 100)
s <- sqrt(p * (1 - p) / n)
plot(z, s * dbeta(p + z * s, p * n, (1 - p) * n), type = "l")
lines(z, dnorm(z), lty = 2)
• Comparing standardized CDFs:

plot(z, pbeta(p + z * s, p * n, (1 - p) * n), type = "l")
lines(z, pnorm(z), lty = 2)

• A numerical summary (Kolmogorov distance):

max(abs(pbeta(p + z * s, p * n, (1 - p) * n) - pnorm(z)))

• Summary for several sample sizes:

nn <- c(10, 20, 40, 60, 100, 150, 200)
pp <- sapply(nn, function(n) {
    s <- sqrt(p * (1 - p) / n)
    max(abs(pbeta(p + z * s, p * n, (1 - p) * n) - pnorm(z)))
})
plot(nn, pp)
Friday, December 4, 2015 Normal Approximation to the Beta Distribution
• A probability plot:

F1 <- pbeta(p + z * s, p * n, (1 - p) * n)
F2 <- pnorm(z)
plot(F1, F2, type = "l")
abline(0, 1, lty = 2)

• A quantile plot:

plot(p + s * z, qbeta(pnorm(z), p * n, (1 - p) * n), type = "l")
abline(0, 1, lty = 2)

• Relative errors:

plot(z, (F1 - F2) / (F1 + F2), type = "l")
lines(z, (F1 - F2) / (1 - F1 + 1 - F2), lty = 2)

• A numerical summary for relative errors:

max(abs(F1 - F2) / (F1 + F2), abs(F1 - F2) / (1 - F1 + 1 - F2))

• A better approximation in the tails is obtained by approximating the distribution of log(U/(1 − U)).
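The Kolmogorov distance summary can also be computed in Python (a sketch using SciPy, paralleling the R code above):

```python
import numpy as np
from scipy import stats

def beta_gap(n, p=0.25):
    """Kolmogorov distance between the standardized Beta(np, n(1-p)) CDF
    and the standard normal CDF on a grid."""
    z = np.linspace(-3, 3, 601)
    s = np.sqrt(p * (1 - p) / n)
    return np.max(np.abs(stats.beta.cdf(p + z * s, p * n, (1 - p) * n)
                         - stats.norm.cdf(z)))

for n in (10, 50, 200):
    print(n, beta_gap(n))   # the distance shrinks as n grows
```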
Friday, December 4, 2015: Approximate Distribution of Order Statistics
Approximate Distribution of Order Statistics
• For 0 < p < 1 the p-th sample quantile for a U[0, 1] population satisfies

    U_({np}) ∼ Beta({np}, n − {np} + 1).

• The normal approximation for the Beta distribution therefore implies that

    U_({np}) ∼ AN(p, p(1 − p)/n).

• This means that U_({np}) →P p
• and

    √n (U_({np}) − p) →D V ∼ N(0, p(1 − p))
• Suppose F is a population CDF such that
  • F is a continuous distribution with density f
  • at the p-th population quantile F⁻¹(p),

      F'(F⁻¹(p)) = f(F⁻¹(p)) > 0.

• Since X_(k) ∼ F⁻¹(U_(k)) we can approximate the distribution of X_({np}) using
  • the normal approximation to the distribution of U_({np})
  • the delta method.
• The normal approximation to the distribution of U_({np}) is

    U_({np}) ∼ AN(p, p(1 − p)/n).

• The delta method then shows that

    X_({np}) = F⁻¹(U_({np})) ∼ AN(F⁻¹(p), ((d/dp) F⁻¹(p))² p(1 − p)/n)

• Now

    (d/dp) F⁻¹(p) = 1/f(F⁻¹(p))

• So

    X_({np}) ∼ AN(F⁻¹(p), p(1 − p)/(n f(F⁻¹(p))²))
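A simulation sketch of the formula (Python; the Exponential(1) median is a hypothetical choice). Here F⁻¹(1/2) = log 2 and f(F⁻¹(1/2)) = 1/2, so the sample median should be approximately AN(log 2, 1/n):

```python
import numpy as np

rng = np.random.default_rng(8)

# Sample median of Exponential(1) data: AN(log 2, 1/n) by the formula above,
# since p(1-p) / f(F^{-1}(p))^2 = (1/4) / (1/4) = 1.
n, reps = 401, 10000
x = rng.exponential(size=(reps, n))
med = np.median(x, axis=1)
print(med.mean(), np.log(2))        # close
print(med.std(), 1 / np.sqrt(n))    # close
```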
Example
• Suppose the population is N(µ, σ²) and p = 1/2. Then F⁻¹(1/2) = µ and

    f(F⁻¹(p)) = 1/(√(2π) σ)

• Then the sample median X̃ satisfies

    X̃ ∼ AN(µ, 2πσ²/(4n))

• Now X̄ ∼ N(µ, σ²/n) and

    √(2π/4) ≈ 1.2533 > 1,

so the asymptotic SD of X̃ is about 1.25 times the SD of X̄.
• So for a given sample size X̄ is a more accurate estimator of µ than X̃.
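A simulation sketch of the efficiency comparison for N(0, 1) samples (Python):

```python
import numpy as np

rng = np.random.default_rng(9)

# SD of the sample median over SD of the sample mean for N(0, 1) data;
# the asymptotic ratio is sqrt(2 pi / 4) ≈ 1.2533.
n, reps = 201, 20000
x = rng.standard_normal((reps, n))
ratio = np.median(x, axis=1).std() / x.mean(axis=1).std()
print(ratio)   # near 1.25
```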
Friday, December 4, 2015: Higher Order Delta Method
Higher Order Delta Method
• The first order delta method works well when the derivative of the function is not zero.
• For example, suppose Xn ∼ Binomial(n, p).
• Then Yn = Xn/n ∼ AN(p, p(1 − p)/n).
• Let Sn = Yn(1 − Yn).
• The first order delta method will produce a useful approximate distribution for 0 < p < 1 and p ≠ 1/2.
• For p = 1/2 a second order Taylor expansion of f(p) = p(1 − p) can be used.
• This is called the second order delta method.
• Other approaches, for example based on simulation, may be more effective.
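A sketch of the degenerate case (Python). Since f(y) = y(1 − y) = 1/4 − (y − 1/2)² exactly, and √n(Yn − 1/2) →D N(0, 1/4) when p = 1/2, the second order term gives 4n(1/4 − Sn) →D χ²_1:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)

# p = 1/2: 4n (1/4 - S_n) = (2 sqrt(n) (Y_n - 1/2))^2 -> chi-square(1).
n, reps = 1000, 20000
y = rng.binomial(n, 0.5, reps) / n
w = 4 * n * (0.25 - y * (1 - y))
print(w.mean())                                   # E[w] = 1 exactly
print(stats.kstest(w, "chi2", args=(1,)).statistic)
```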