
U.U.D.M. Project Report 2008:22

Degree project in mathematical statistics, 30 credits
Supervisor and examiner: Silvelyn Zwanzig
December 2008

Department of Mathematics
Uppsala University

A simulation method for skewness correction

Måns Eriksson

Page 2: A simulation method for skewness correction - DiVA portal302313/FULLTEXT01.pdf · The notion of skewness has been a part of statistics for a long time. It dates back to It dates back
Page 3: A simulation method for skewness correction - DiVA portal302313/FULLTEXT01.pdf · The notion of skewness has been a part of statistics for a long time. It dates back to It dates back

Abstract

Let X1, . . . , Xn be i.i.d. random variables with known variance and skewness. A one-sided confidence interval for the mean with approximate confidence level α can be constructed using normal approximation. For skew distributions the actual confidence level will then be α + o(1). We propose a method for obtaining confidence intervals with confidence level α + o(n^{−1/2}) using skewness correcting pseudo-random variables. The method is compared with a known method: Edgeworth correction.


Acknowledgements

I would like to thank my advisor Silvelyn Zwanzig for introducing me to the subject, for mathematical and stylistic guidance and for always encouraging me.

I would also like to thank my friends and teachers at and around the Department of Mathematics for having inspired me to study mathematics, and for continuing to inspire me.


Contents

1 Introduction
  1.1 Skewness
  1.2 Setting and notation

2 The Edgeworth expansion
  2.1 Definition and formal conditions
  2.2 Derivations
    2.2.1 Edgeworth expansion for Sn
    2.2.2 Edgeworth expansions for more general statistics
    2.2.3 Edgeworth expansion for Tn
    2.2.4 Some remarks; skewness correction
  2.3 Cornish-Fisher expansions for quantiles

3 Methods for skewness correction
  3.1 Coverages of confidence intervals
  3.2 Edgeworth correction
  3.3 The bootstrap
    3.3.1 The bootstrap and Sn
    3.3.2 Bootstrap confidence intervals
  3.4 A new simulation method
    3.4.1 Skewness correction through addition of a random variable
    3.4.2 Simulation procedure

4 Comparison
  4.1 Coverages of confidence intervals
  4.2 Criteria for the comparison
  4.3 Comparisons of the upper limits
  4.4 Simulation results
  4.5 Discussion

A Appendix: Skewness and kurtosis
  A.1 The skewness of a sum of random variables
  A.2 The kurtosis of a sum of random variables

B Appendix: P(θnew ≤ θEcorr)

C Appendix: Simulation results


1 Introduction

1.1 Skewness

The notion of skewness has been a part of statistics for a long time. It dates back to the 19th century and most notably to an article by Karl Pearson from 1895 ([26]). Skew distributions are found in all areas of application, ranging from finance to biology and physics. It has been seen both in theory and in practice that deviations from normality in the form of skewness can have effects too large to ignore on the validity and performance of many statistical methods and procedures.

This thesis discusses skewness in the context of the central limit theorem and normal approximation, in particular as applied to confidence intervals. Some methods for skewness correction are discussed and a new simulation method is proposed. We assume that the concept of skewness is known and refer to Appendix A for some basic facts about skewness.

1.2 Setting and notation

Throughout the thesis we assume that we have an i.i.d. univariate sample X1, . . . , Xn, with EX = µ, Var(X) = σ^2 and E|X|^3 < ∞, such that X satisfies Cramér's condition lim sup_{t→∞} |ϕ(t)| < 1, where ϕ is the characteristic function of X. At times we will also assume that EX^4 < ∞. We use X to denote a generic Xi, that is, X is a random variable with the same distribution as the Xi. Thus, for instance, EX is the mean of the distribution of the observations; EX = EXi for all i.

The α-quantile vα of the distribution of some random variable X is defined to be such that P(X ≤ vα) = α. When X ∼ N(0, 1) we denote the quantile λα, that is, Φ(λα) = α.

We use An to denote a general statistic, Sn = n^{1/2}(X̄ − µ)/σ to denote the standardized sample mean and Tn = n^{1/2}(X̄ − µ)/σ̂ to denote the studentized sample mean, where X̄ = (1/n) ∑ Xi and σ̂^2 = (1/n) ∑ (Xi − X̄)^2.

The skewness E(X − µ)^3/σ^3 of a random variable X is denoted Skew(X), γ or γX if we need to distinguish between different random variables. The kurtosis of X, E(X − µ)^4/σ^4 − 3, is denoted Kurt(X), κ or κX. Basic facts about skewness and kurtosis are stated in Appendix A.

As for asymptotic notation, for real-valued sequences an and bn we say that an = o(bn) if an/bn → 0 as n → ∞ and an = O(bn) if an/bn is bounded as n → ∞.

Finally, we say that a sequence Xn of random variables is bounded in probability if lim_{c→∞} lim sup_{n→∞} P(|Xn| > c) = 0. We write this as Xn = OP(1) and if, for some sequence an, anXn = OP(1), we write Xn = OP(1/an).
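The quantities just defined are straightforward to compute in practice. The following sketch (a Python illustration of ours, not part of the thesis; it uses an Exponential(1) sample, for which µ = σ = 1 and Skew(X) = 2) computes Sn, Tn and the moment plug-in estimate of the skewness:

```python
import math
import random

def standardized_mean(xs, mu, sigma):
    """S_n = n^(1/2) (Xbar - mu) / sigma: standardized sample mean (sigma known)."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(n) * (xbar - mu) / sigma

def studentized_mean(xs, mu):
    """T_n = n^(1/2) (Xbar - mu) / sigma_hat, with the 1/n variance estimator."""
    n = len(xs)
    xbar = sum(xs) / n
    sigma_hat = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)
    return math.sqrt(n) * (xbar - mu) / sigma_hat

def sample_skewness(xs):
    """Plug-in estimate of Skew(X) = E(X - mu)^3 / sigma^3."""
    n = len(xs)
    xbar = sum(xs) / n
    m2 = sum((x - xbar) ** 2 for x in xs) / n
    m3 = sum((x - xbar) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

random.seed(1)
xs = [random.expovariate(1.0) for _ in range(10000)]
# Exponential(1): mu = 1, sigma = 1, Skew(X) = 2.
print(standardized_mean(xs, 1.0, 1.0), studentized_mean(xs, 1.0), sample_skewness(xs))
```

For a sample this large, Sn and Tn nearly coincide (σ̂ ≈ σ) and the skewness estimate is close to 2.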


2 The Edgeworth expansion

In this section we introduce our main tool, the Edgeworth expansion. Later we will use it to determine the coverage of confidence intervals.

2.1 Definition and formal conditions

Theorem 1. Assume that X1, . . . , Xn is an i.i.d. sample from a univariate distribution, with mean µ, variance σ^2 and E|X|^{j+2} < ∞, that satisfies lim sup_{t→∞} |ϕ(t)| < 1. Let Sn = n^{1/2}(X̄ − µ)/σ. Then

P(Sn ≤ x) = Φ(x) + n^{−1/2}p1(x)φ(x) + . . . + n^{−j/2}pj(x)φ(x) + o(n^{−j/2}) (1)

uniformly in x, where Φ(x) and φ(x) are the standard normal distribution function and density function and pk is a polynomial of degree 3k − 1. In particular

p1(x) = −(1/6)γ(x^2 − 1) and

p2(x) = −x((1/24)κ(x^2 − 3) + (1/72)γ^2(x^4 − 10x^2 + 15)).

Proof. A proof is given in Section 2.2.1. See also [7] and [8].

(1) is called an Edgeworth expansion for Sn. The condition lim sup_{t→∞} |ϕ(t)| < 1 is known as Cramér's condition and was derived by Cramér in [8]. Note that the condition holds whenever X is absolutely continuous; this is an immediate consequence of the Riemann-Lebesgue lemma (Theorem 1.5 in Chapter 4 of [18]). Moreover, if we limit the expansion to P(Sn ≤ x) = Φ(x) + n^{−1/2}p1(x)φ(x) + o(n^{−1/2}), so that the remainder term is o(n^{−1/2}), then it suffices that X has a non-lattice distribution, which was shown by Esseen in [16].

The Edgeworth expansion was first developed for the statistic Sn but has since been extended to other statistics.

Definition 1. Let An denote a statistic. Then if

P(An ≤ x) = Φ(x) + n^{−1/2}a1(x)φ(x) + . . . + n^{−j/2}aj(x)φ(x) + o(n^{−j/2}), (2)

where Φ(x) and φ(x) are the standard normal distribution function and density function and ak is a polynomial of degree 3k − 1, (2) is called the Edgeworth expansion for An.

If ak = 0 for all k < i, the normal approximation of the distribution is said to be i:th order correct.

In general, the polynomials ak will depend on the moments of the statistic. We can therefore, in a sense, view the Edgeworth expansion as an extension of the central limit theorem, where information about the higher moments of the involved random variables is used to obtain a better approximation of the distribution function of An. The expansion gives an expression for the size of the remainder term depending on the sample size.


We might want to compare this to the Berry-Esseen theorem (see for instance [16] or Section 7.6 of [18]), which essentially, in our context, says that the error in the normal approximation is of order n^{−1/2}.

The Edgeworth expansion for simple statistics was first introduced in papers by Chebyshev in 1890 ([4]) and Edgeworth in 1894, 1905 and 1907 ([12, 13, 14]). The idea was made mathematically rigorous by Cramér in 1928 ([7]) and Esseen in 1945 ([16]). The expansions and their applications for more general statistics were then developed in several papers, including [2], by various authors in the mid-1900s.

A thorough treatment of the Edgeworth expansion is found in Chapter 2 of [22]. Chapter 13 of [10] gives a brief introduction to the Edgeworth expansion with some of the most important results, and Section 17.7 of [8] is a standard reference for the case where the expansion for Sn is considered.

Conditions for (2) to hold are given next. Although we will focus on expansions of quite simple statistics, the Edgeworth expansion can be used in very general circumstances. We state a theorem by Bhattacharya and Ghosh ([2]) that provides conditions for the Edgeworth expansion (2) to hold in a general case and illustrate how the theorem relates to the expansion for Sn.

Let X, X1, X2, . . . , Xn be i.i.d. random column vectors in R^d with mean µ and let X̄ = n^{−1} ∑_{i=1}^n Xi. Let A : R^d → R be a function of the form A_S(x) = (g(x) − g(µ))/h(µ) or A_T(x) = (g(x) − g(µ))/h(x), where g and h are known functions, θ̂ = g(X̄) is an estimator of the scalar θ = g(µ), h(µ)^2 is the asymptotic variance of n^{1/2}θ̂ and h(X̄) is an estimator of h(µ).

For t ∈ R^d, t = (t^(1), t^(2), . . . , t^(d)), define ||t|| = ((t^(1))^2 + . . . + (t^(d))^2)^{1/2} and, for a random d-vector X, ϕ(t) = E(exp(i ∑_{j=1}^d t^(j)X^(j))).

Theorem 2. Assume that A has j + 2 continuous derivatives in a neighbourhood of µ = E(X) and that A(µ) = 0. Furthermore, assume that E(||X||^{j+2}) < ∞ and that the characteristic function ϕ of X is such that lim sup_{||t||→∞} |ϕ(t)| < 1. Let σA be the asymptotic standard deviation of n^{1/2}A(X̄) and assume that σA > 0. Then, with An = n^{1/2}A(X̄)/σA and for j ≥ 1,

P(An ≤ x) = Φ(x) + n^{−1/2}a1(x)φ(x) + . . . + n^{−j/2}aj(x)φ(x) + o(n^{−j/2}) (3)

uniformly in x. ak is a polynomial of degree 3k − 1 with coefficients depending on A and on moments of X of order less than or equal to k + 2. ak is odd for even k and even for odd k.

Proof. See [2].

Note that the above theorem is a summary of the results in Bhattacharya and Ghosh's 1978 paper and is thus not stated as one single theorem in the original work. Our summary largely resembles that in Chapter 2 of [22].


The class of functions that satisfy the conditions in Theorem 2 contains many functions of great interest. In particular, we can write any moment estimator, i.e. an estimator based on sample moments, as a function A of vector means in the same way that we will do below for the mean.

The Edgeworth expansion is sometimes written as an infinite series, P(An ≤ x) = Φ(x) + n^{−1/2}a1(x)φ(x) + . . . + n^{−j/2}aj(x)φ(x) + . . .. However, for the series to converge it is required, for an absolutely continuous random variable, that E(exp((X − µ)^2/(4σ^2))) < ∞; a condition that fails even for exponentially distributed X (see [7]). Thus we prefer the truncated series used in (3), which also turns out to be more useful in practice.

Before we show how Theorem 2 relates to the expansion for Sn we state a slightly more general corollary.

Corollary 1. Let An be either AS = n^{1/2}(θ̂ − θ)/σ or AT = n^{1/2}(θ̂ − θ)/σ̂, where θ is some unknown scalar parameter, θ̂ is an asymptotically unbiased estimator of θ, σ^2 is the asymptotic variance of n^{1/2}θ̂ and σ̂^2 is some consistent estimator of σ^2. Then the first two polynomials in the Edgeworth expansion are

a1(x) = −(k_{1,2} + (1/6)k_{3,1}(x^2 − 1)) and

a2(x) = −x((1/2)(k_{2,2} + k_{1,2}^2) + (1/24)(k_{4,1} + 4k_{1,2}k_{3,1})(x^2 − 3) + (1/72)k_{3,1}^2(x^4 − 10x^2 + 15)),

where the k_{j,i} come from an expansion of the j:th cumulant of An:

κ_{j,n} = n^{−(j−2)/2}(k_{j,1} + n^{−1}k_{j,2} + n^{−2}k_{j,3} + . . .).

Proof. Details are given in Section 2.2.2.

Theorem 2 can be used to show that Sn as well as the studentized sample mean Tn = n^{1/2}(X̄ − µ)/σ̂ admit Edgeworth expansions. Assume that X1, . . . , Xn is a sample from a univariate distribution, that the unknown parameter θ is the mean µ of the distribution and that the distribution has variance σ^2. Take d = 2, Xi = (Xi, Xi^2)^T and µ = E(X) = (EX, EX^2)^T, and let g(x^(1), x^(2)) = x^(1) and h(x^(1), x^(2)) = x^(2) − (x^(1))^2. Then g(µ) = µ and g(X̄) = X̄. Furthermore h(µ) = σ^2 and

h(X̄) = n^{−1} ∑_{i=1}^n Xi^2 − (n^{−1} ∑_{i=1}^n Xi)^2 = n^{−1} ∑_{i=1}^n (Xi − X̄)^2 = σ̂^2.

AS = (g(x) − g(µ))/h(µ) and AT = (g(x) − g(µ))/h(x) both fulfill the conditions in Theorem 2 and the asymptotic standard deviation σA of n^{1/2}A(X̄) is 1. Furthermore, the condition E((X^2 + (X^2)^2)^{(j+2)/2}) < ∞ can be reduced to E(|X|^{j+2}) < ∞ and Cramér's condition reduces to lim sup_{t→∞} |ϕ(t)| < 1. Thus the conditions for existence of the expansions for Sn and Tn follow and turn out to be those in Theorem 1. Both statistics are of the form considered in Corollary 1 and the polynomials in their expansions can thus be found.


2.2 Derivations

Knowing under which conditions the Edgeworth expansion exists, we are ready to derive the expressions for the polynomials p1 and p2.

2.2.1 Edgeworth expansion for Sn

As before, let Sn = n^{1/2}(X̄ − µ)/σ. Assuming that E|X|^{j+2} < ∞ and that X fulfills Cramér's condition, by Corollary 1

P(Sn ≤ x) = Φ(x) + n^{−1/2}p1(x)φ(x) + . . . + n^{−j/2}pj(x)φ(x) + o(n^{−j/2}). (4)

To actually be able to make reasonable use of the expansion we need to find expressions for the polynomials pk. The case where j = 2 will prove to be of special interest to us, so although we consider general j we will in the end only derive explicit expressions for p1 and p2. Our exposition is mainly based on those in [8] and [22] but aims to be somewhat more thorough than those that we have seen in the existing literature.

Since we will assume the existence of moments of order higher than 2, Sn is asymptotically N(0, 1)-distributed, i.e. Sn converges in distribution to S ∼ N(0, 1), and thus the characteristic function ϕn of Sn converges to e^{−t^2/2}, the characteristic function of the standard normal distribution, as n tends to infinity. That is, as n → ∞,

ϕn(t) = E(exp(itSn)) −→ E(exp(itS)) = e^{−t^2/2}

for −∞ < t < ∞, where S ∼ N(0, 1).

Recall that if Y1, . . . , Yn are i.i.d. with sum Wn = Y1 + . . . + Yn, then ϕ_{Wn}(t) = (ϕ_{Y1}(t))^n (see Theorem 1.8 in Chapter 4 of [18]). Now, let Yi = (Xi − µ)/σ. Then Sn = n^{1/2}(X̄ − µ)/σ = n^{1/2}(1/n) ∑_{i=1}^n Yi = n^{−1/2} ∑_{i=1}^n Yi and thus

ϕn(t) = E(exp(itSn)) = E(exp(itn^{−1/2} ∑_{i=1}^n Yi)) = (ϕY(tn^{−1/2}))^n.

Next we define the cumulant generating function of Y as ln ϕY(t). A MacLaurin expansion of ln ϕY(t) shows that if E(|Y|^k) < ∞ then, after a rearrangement of the terms, we can write ln ϕY(t) in the form

ln ϕY(t) = ∑_{j=1}^k ((it)^j/j!) κj + o(|t|^k) as t → 0

for some {κj}. We call the coefficients κj the cumulants of Y; in particular κj is the j:th cumulant. See Section 15.10 of [8] for details.

By Theorem 4.2 in Chapter 4 of [18] we have that ϕY(t) = 1 + ∑_{j=1}^k ((it)^j/j!) EY^j + o(|t|^k) as t → 0 when E(|Y|^k) < ∞. If we for a moment don't worry about the existence of moments and convergence of the series, we conclude that

∑_{j=1}^∞ ((it)^j/j!) κj = ln ϕY(t) = ln(1 + ∑_{j=1}^∞ ((it)^j/j!) EY^j)

and by looking at the MacLaurin expansion of the right hand side (i.e. the MacLaurin expansion of the function ln(1 + x) where we replace x with ∑ EY^j (it)^j/j!) it follows that

∑_{j=1}^∞ ((it)^j/j!) κj = ∑_{k=1}^∞ (−1)^{k+1} (1/k) (∑_{j=1}^∞ ((it)^j/j!) EY^j)^k.

Comparing the coefficients of (it)^j we find that

κ1 = EY = 0,

κ2 = EY^2 − (EY)^2 = 1,

κ3 = EY^3 − 3EY EY^2 + 2(EY)^3 = EY^3,

κ4 = EY^4 − 3(EY^2)^2 − 4EY EY^3 + 12(EY)^2 EY^2 − 6(EY)^4 = EY^4 − 3.

The expression for κj holds whenever E(|Y|^j) < ∞. Note that the assumption that E(|X|^j) < ∞ implies that E(|Y|^j) < ∞.
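These moment identities can be sanity-checked numerically: for any sample, the κ3 formula in terms of raw moments equals the third central moment of the empirical distribution, and the κ4 formula equals its fourth central moment minus three times the squared variance. A small self-contained check of ours, using an arbitrary skewed sample:

```python
import random

def raw_moments(ys, k):
    """Raw moments m_1, ..., m_k of the empirical distribution of ys."""
    n = len(ys)
    return [sum(y ** j for y in ys) / n for j in range(1, k + 1)]

def central_moment(ys, j):
    """j:th central moment of the empirical distribution of ys."""
    n = len(ys)
    m = sum(ys) / n
    return sum((y - m) ** j for y in ys) / n

random.seed(3)
ys = [random.gauss(0.3, 1.7) ** 2 for _ in range(500)]  # an arbitrary, skewed sample

m1, m2, m3, m4 = raw_moments(ys, 4)
kappa3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
kappa4 = m4 - 3 * m2 ** 2 - 4 * m1 * m3 + 12 * m1 ** 2 * m2 - 6 * m1 ** 4

# The cumulant formulas agree (exactly, up to floating point) with central moments:
print(kappa3, central_moment(ys, 3))
print(kappa4, central_moment(ys, 4) - 3 * central_moment(ys, 2) ** 2)
```

The agreement is exact in real arithmetic, since these are algebraic identities in the moments; only floating-point rounding separates the two columns.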

Returning our attention to Sn, the relation ϕn(t) = (ϕY(tn^{−1/2}))^n and the fact that κ1 = 0 and κ2 = 1 now give us that

ϕn(t) = (exp(∑_{j=1}^∞ ((itn^{−1/2})^j/j!) κj))^n = exp(∑_{j=1}^∞ n^{−(j−2)/2} ((it)^j/j!) κj)

= exp(−(1/2)t^2 + n^{−1/2}(1/3!)κ3(it)^3 + . . . + n^{−(j−2)/2}(1/j!)κj(it)^j + . . .)

= e^{−t^2/2} exp(n^{−1/2}(1/3!)κ3(it)^3 + . . . + n^{−(j−2)/2}(1/j!)κj(it)^j + . . .)

= e^{−t^2/2}(1 + n^{−1/2}r1(it) + n^{−1}r2(it) + . . . + n^{−j/2}rj(it) + . . .),

where the last equality is obtained through the MacLaurin expansion e^x = 1 + x + x^2/2! + . . .

rj is a polynomial of degree 3j that depends on κ3, . . . , κ_{j+2}. By comparing the coefficients of n^{−j/2} on the last two lines we find that

r1(x) = (1/6)κ3 x^3

and

r2(x) = (1/24)κ4 x^4 + (1/72)κ3^2 x^6.

We can rewrite the expression for ϕn(t) above as

ϕn(t) = e^{−t^2/2} + n^{−1/2}r1(it)e^{−t^2/2} + n^{−1}r2(it)e^{−t^2/2} + . . . + n^{−j/2}rj(it)e^{−t^2/2} + . . . (5)


Now, since ϕn(t) = ∫_{−∞}^{∞} e^{itx} dP(Sn ≤ x) and e^{−t^2/2} = ∫_{−∞}^{∞} e^{itx} dΦ(x), it seems plausible that there is an inversion of (5) of the form

P(Sn ≤ x) = Φ(x) + n^{−1/2}R1(x) + . . . + n^{−j/2}Rj(x) + . . .

where Rk is a function such that ∫_{−∞}^{∞} e^{itx} dRk(x) = rk(it)e^{−t^2/2}, so that

ϕn(t) = ∫_{−∞}^{∞} e^{itx} dP(Sn ≤ x) = ∫_{−∞}^{∞} e^{itx} dΦ(x) + n^{−1/2} ∫_{−∞}^{∞} e^{itx} dR1(x) + . . . = (5).

We would thus like to try to find such Rk. By repeating integration by parts j times we find that

e^{−t^2/2} = ∫_{−∞}^{∞} e^{itx} dΦ(x) = (−it)^{−1} ∫_{−∞}^{∞} e^{itx} dΦ^(1)(x) = . . . = (−it)^{−j} ∫_{−∞}^{∞} e^{itx} dΦ^(j)(x),

where Φ^(k)(x) = (d^k/dx^k)Φ(x) = (d/dx)^k Φ(x) = D^k Φ(x). Hence ∫_{−∞}^{∞} e^{itx} d((−D)^k Φ(x)) = (it)^k e^{−t^2/2}. Interpreting rk(−D) as a polynomial in D, making rk(−D) a differential operator, we thus have that ∫_{−∞}^{∞} e^{itx} d(rk(−D)Φ(x)) = rk(it)e^{−t^2/2}. Thus

Rk(x) = rk(−D)Φ(x).

By differentiating Φ(x) we find that, for k ≥ 1, (−D)^k Φ(x) = −H_{k−1}(x)φ(x), where φ(x) is the density function for the standard normal distribution and the Hk are the Hermite polynomials:

H0(x) = 1,
H1(x) = x,
H2(x) = x^2 − 1,
H3(x) = x(x^2 − 3),
H4(x) = x^4 − 6x^2 + 3,
H5(x) = x(x^4 − 10x^2 + 15), . . .
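The listed polynomials are the probabilists' Hermite polynomials and satisfy the standard recurrence H_{k+1}(x) = xH_k(x) − kH_{k−1}(x); the sketch below (our own illustration, not part of the thesis) builds them as coefficient lists and reproduces H2, . . . , H5:

```python
def hermite(k):
    """Coefficients (ascending powers of x) of the probabilists' Hermite polynomial H_k,
    built from the recurrence H_{k+1}(x) = x H_k(x) - k H_{k-1}(x)."""
    h_prev, h = [1], [0, 1]  # H_0(x) = 1 and H_1(x) = x
    if k == 0:
        return h_prev
    for m in range(1, k):
        nxt = [0] + h                    # multiply H_m by x
        for i, c in enumerate(h_prev):   # subtract m * H_{m-1}
            nxt[i] -= m * c
        h_prev, h = h, nxt
    return h

# Reproduces the list above: H2 = x^2 - 1, ..., H5 = x^5 - 10x^3 + 15x.
print(hermite(4))  # [3, 0, -6, 0, 1], i.e. x^4 - 6x^2 + 3
print(hermite(5))  # [0, 15, 0, -10, 0, 1], i.e. x^5 - 10x^3 + 15x
```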

We concluded above that r1(x) = (1/6)κ3 x^3 and r2(x) = (1/24)κ4 x^4 + (1/72)κ3^2 x^6, and since Rk(x) = rk(−D)Φ(x) we thus have that

R1(x) = (1/6)κ3 (−D)^3 Φ(x) = −(1/6)κ3 H2(x)φ(x) = −(1/6)κ3 (x^2 − 1)φ(x)

and

R2(x) = −(1/24)κ4 H3(x)φ(x) − (1/72)κ3^2 H5(x)φ(x) = −x((1/24)κ4(x^2 − 3) + (1/72)κ3^2(x^4 − 10x^2 + 15))φ(x).


Thus

P(Sn ≤ x) = Φ(x) + n^{−1/2}R1(x) + . . . + n^{−j/2}Rj(x) + . . . = Φ(x) + n^{−1/2}p1(x)φ(x) + n^{−1}p2(x)φ(x) + . . . + n^{−j/2}pj(x)φ(x) + . . . (6)

with

p1(x) = −(1/6)κ3(x^2 − 1) and

p2(x) = −x((1/24)κ4(x^2 − 3) + (1/72)κ3^2(x^4 − 10x^2 + 15)).

κ3 is the skewness of X and κ4 the kurtosis.

It can be shown (see Section 2.4 of [22]) that the inversion of (5) leading to (6) is valid when X is nonsingular and E(|X|^{j+2}) < ∞ if we limit the series to j terms:

P(Sn ≤ x) = Φ(x) + n^{−1/2}R1(x) + . . . + n^{−j/2}Rj(x) + o(n^{−j/2}) = Φ(x) + n^{−1/2}p1(x)φ(x) + n^{−1}p2(x)φ(x) + . . . + n^{−j/2}pj(x)φ(x) + o(n^{−j/2}). (7)

This completes the proof of half of Corollary 1.

If X ∼ N(µ, σ^2) then Sn ∼ N(0, 1), so we'd expect pj to be 0 for all j. This is indeed the case, since κj = 0 for j ≥ 3 for the standard normal distribution. Thus the "expansion" still holds when P(Sn ≤ x) = Φ(x).

It is of interest to note that both the skewness and the kurtosis are scale and translation invariant, so that the third and fourth cumulants of X and Y coincide (we use Y in the calculations above because standardized variables are easier to handle). It can be shown, by looking at characteristic functions or by straightforward calculation (as is done in Appendix A), that the skewness of X̄ is n^{−1/2} times the skewness of X. Similarly, the kurtosis of X̄ is n^{−1} times the kurtosis of X and in general, for j ≥ 2, the j:th cumulant of X̄ will be the j:th cumulant of X times n^{−(j−2)/2}. Thus we can view the factors n^{−(j−2)/2} in the Edgeworth expansion for Sn as coming from the cumulants of X̄.
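The cumulant scaling is easy to see in simulation. For Exponential(1) data, Skew(X) = 2, so the mean of n = 4 observations should have skewness n^{−1/2} · 2 = 1 (indeed, the mean is then Gamma-distributed with shape parameter 4, whose skewness is 2/√4 = 1). A quick Monte Carlo check of ours:

```python
import random

def skewness(xs):
    """Plug-in estimate of Skew(X) = E(X - mu)^3 / sigma^3."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

random.seed(4)
n, reps = 4, 200000
# Exponential(1) has skewness gamma = 2; the mean of n = 4 observations
# should then have skewness n^(-1/2) * gamma = 1.
means = [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]
print(skewness(means))  # close to 1
```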

2.2.2 Edgeworth expansions for more general statistics

The procedure for finding the Edgeworth expansion for more general statistics An is essentially the same as that in the previous section. We briefly mention the result. Let An be either AS = n^{1/2}(θ̂ − θ)/σ or AT = n^{1/2}(θ̂ − θ)/σ̂, where θ is some unknown scalar parameter, θ̂ is an asymptotically unbiased estimator of θ, σ^2 is the asymptotic variance of n^{1/2}θ̂ and σ̂^2 is some consistent estimator of σ^2.

Denote by κ_{j,n} the j:th cumulant of An. Under the regularity conditions stated in Theorem 2, for j ≥ 1, we can expand κ_{j,n} as

κ_{j,n} = n^{−(j−2)/2}(k_{j,1} + n^{−1}k_{j,2} + n^{−2}k_{j,3} + . . .)


for some k_{j,i}, where k_{1,1} = 0 and k_{2,1} = 1. It can be shown, through calculations that are completely analogous to the Sn case, where the cumulants of An are replaced by their expansions, that the first two polynomials in the Edgeworth expansion for An are

a1(x) = −(k_{1,2} + (1/6)k_{3,1}H2(x)) = −(k_{1,2} + (1/6)k_{3,1}(x^2 − 1)) (8)

and

a2(x) = −((1/2)(k_{2,2} + k_{1,2}^2)H1(x) + (1/24)(k_{4,1} + 4k_{1,2}k_{3,1})H3(x) + (1/72)k_{3,1}^2 H5(x))

= −x((1/2)(k_{2,2} + k_{1,2}^2) + (1/24)(k_{4,1} + 4k_{1,2}k_{3,1})(x^2 − 3) + (1/72)k_{3,1}^2(x^4 − 10x^2 + 15)). (9)

As before the inversion is valid, giving the truncated expansion

P(An ≤ x) = Φ(x) + n^{−1/2}a1(x)φ(x) + . . . + n^{−j/2}aj(x)φ(x) + o(n^{−j/2})

when E|X|^{j+2} < ∞ and X satisfies Cramér's condition.

Thus the problem of finding the Edgeworth expansion for An of the form n^{1/2}(θ̂ − θ)/σ or n^{1/2}(θ̂ − θ)/σ̂ amounts to finding the terms k_{j,i} in the expansion

κ_{j,n} = n^{−(j−2)/2}(k_{j,1} + n^{−1}k_{j,2} + n^{−2}k_{j,3} + . . .)

of the cumulants of An. As we discussed in the previous section, if An = Sn then

κ_{j,n} = n^{−(j−2)/2} κj

for j ≥ 2, where κj is the j:th cumulant of Y = (X − µ)/σ. Thus k_{j,1} = κj and k_{j,i} = 0 for i ≥ 2. This reduces the expressions for a1 and a2 above to those for p1 and p2 in (6).

2.2.3 Edgeworth expansion for Tn

Finally, consider the statistic Tn = n^{1/2}(X̄ − µ)/σ̂ where σ̂^2 = (1/n) ∑ (Xi − X̄)^2. It can be shown that the k_{j,i} in the expansion of the cumulants of Tn are

k_{1,2} = −(1/2)γ,

k_{2,2} = (1/4)(7γ^2 + 12),

k_{3,1} = −2γ and

k_{4,1} = 12γ^2 − 2κ + 6.

Inserting these into (8) and (9) we get

q1(x) = a1(x) = −(−(1/2)γ − (2/6)γ(x^2 − 1)) = (1/6)γ(2x^2 + 1)

and

q2(x) = a2(x) = −x((1/2)((1/4)(7γ^2 + 12) + ((1/2)γ)^2) + (1/24)(12γ^2 − 2κ + 6 + 4(−(1/2)γ)(−2γ))(x^2 − 3) + (1/72)(−2γ)^2(x^4 − 10x^2 + 15))

= x((1/12)κ(x^2 − 3) − (1/18)γ^2(x^4 + 2x^2 − 3) − (1/4)(x^2 + 3)).

We require that EX^4 < ∞ and that X satisfies Cramér's condition for the expansion

P(Tn ≤ x) = Φ(x) + n^{−1/2}q1(x)φ(x) + n^{−1}q2(x)φ(x) + o(n^{−1})

to hold.
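Since two polynomials coincide if they agree at sufficiently many points, the algebra above can be verified by evaluating (8) and (9) with these k_{j,i} at random values of (γ, κ, x) and comparing with the stated q1 and q2. A self-contained check of our own:

```python
import random

def a1(x, k12, k31):
    # (8): a1(x) = -(k_{1,2} + (1/6) k_{3,1} (x^2 - 1))
    return -(k12 + k31 * (x * x - 1) / 6)

def a2(x, k12, k22, k31, k41):
    # (9): a2(x) = -x((1/2)(k_{2,2}+k_{1,2}^2) + (1/24)(k_{4,1}+4 k_{1,2} k_{3,1})(x^2-3)
    #              + (1/72) k_{3,1}^2 (x^4 - 10x^2 + 15))
    return -x * ((k22 + k12 ** 2) / 2
                 + (k41 + 4 * k12 * k31) * (x * x - 3) / 24
                 + k31 ** 2 * (x ** 4 - 10 * x * x + 15) / 72)

def q1(x, g):
    return g * (2 * x * x + 1) / 6

def q2(x, g, k):
    return x * (k * (x * x - 3) / 12
                - g * g * (x ** 4 + 2 * x * x - 3) / 18
                - (x * x + 3) / 4)

random.seed(5)
for _ in range(100):
    g, k, x = random.uniform(-3, 3), random.uniform(-2, 6), random.uniform(-4, 4)
    # The k_{j,i} for T_n in terms of gamma (g) and kappa (k):
    k12, k22, k31, k41 = -g / 2, (7 * g * g + 12) / 4, -2 * g, 12 * g * g - 2 * k + 6
    assert abs(a1(x, k12, k31) - q1(x, g)) < 1e-8
    assert abs(a2(x, k12, k22, k31, k41) - q2(x, g, k)) < 1e-8
print("cumulant coefficients reproduce q1 and q2")
```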

2.2.4 Some remarks; skewness correction

We’ve seen above that for both Sn and Tn the first polynomial a1 depends on theskewness κ3 = γ = E(X − µ)3/σ3 and that the second polynomial a2 depends on γ2

and the kurtosis κ4 = κ = E(X − µ)4/σ4 − 3. This is also true for many more generalstatistics An. In such cases, a1 is said to describe the primary effect of skewness whilea2 is said to describe the primary effect of kurtosis and the secondary effect of skewness.A skewness corrected statistic is thus a statistic that has been modified in some way sothat a1 = 0.

2.3 Cornish-Fisher expansions for quantiles

An interesting use of the Edgeworth expansion is asymptotic expansion of the quantiles of An, obtained by what is essentially an inversion of the Edgeworth expansion. Such expansions are called Cornish-Fisher expansions and first appeared in [6] and [17].

Let An be a statistic with the Edgeworth expansion

P(An ≤ x) = Φ(x) + n^{−1/2}a1(x)φ(x) + . . . + n^{−j/2}aj(x)φ(x) + . . . (10)

and let vα be the α-quantile of An, so that P(An ≤ vα) = α. Furthermore, let λα be the α-quantile of the N(0, 1)-distribution, i.e. let Φ(λα) = α. Then there exists an expansion of vα in terms of λα:

vα = λα + n^{−1/2}s1(λα) + n^{−1}s2(λα) + . . . + n^{−j/2}sj(λα) + . . . (11)

(11) is called the Cornish-Fisher expansion of vα. The functions sk are polynomials of degree at most k + 1, odd for even k and even for odd k, that depend on cumulants of order at most k + 2. They are determined by the polynomials ak in (10).

[22] contains a short introduction to the Cornish-Fisher expansion, where it is shown that s1(x) = −a1(x) and s2(x) = a1(x)a1′(x) − (1/2)a1(x)^2 − a2(x).


3 Methods for skewness correction

We discuss the need for skewness correction. Some methods for obtaining second order correct confidence intervals are described. We assume throughout the section that X is absolutely continuous.

3.1 Coverages of confidence intervals

Definition 2. Let Iθ(α) be a confidence interval for an unknown parameter θ with approximate confidence level α. We call α the nominal coverage of Iθ(α). Furthermore, we call α′ = P(θ ∈ Iθ(α)) the actual coverage of the confidence interval and define the coverage error as the difference between the actual coverage and the nominal coverage, α′ − α.

If Iθ(α) has been reasonably constructed then this difference will converge to zero as the sample size n increases. We will illustrate how the Edgeworth expansion can be used to estimate the order of coverage errors.

Let X1, . . . , Xn be an i.i.d. sample from a distribution, with mean µ and variance σ^2, such that the Edgeworth expansions for Sn and Tn exist. Consider the two-sided confidence interval Jµ(α) = (X̄ − n^{−1/2}σzα, X̄ + n^{−1/2}σzα) where P(|Z| ≤ zα) = α when Z ∼ N(0, 1). We find that

P(µ ∈ Jµ(α)) = P(Sn > −zα) − P(Sn > zα) = P(Sn ≤ zα) − P(Sn ≤ −zα)

= Φ(zα) − Φ(−zα) + n^{−1/2}(p1(zα)φ(zα) − p1(−zα)φ(−zα)) + n^{−1}(p2(zα)φ(zα) − p2(−zα)φ(−zα)) + n^{−3/2}(p3(zα)φ(zα) − p3(−zα)φ(−zα)) + o(n^{−3/2})

= α + 2n^{−1}p2(zα)φ(zα) + o(n^{−3/2}).

The last equality follows since p2 is odd and p1, p3 and φ are even. The same result holds if we have σ̂ instead of σ, with q2 instead of p2. Thus the coverage error for two-sided normal approximation confidence intervals is of order n^{−1}. In some sense we can think of the two-sided confidence intervals as containing an implicit skewness correction.

The situation is not as good for one-sided confidence intervals. Consider the one-sided normal approximation confidence interval Iµ(α) = (−∞, X̄ + n^{−1/2}σλα), where Φ(λα) = α. The coverage of Iµ(α) is

P(µ ∈ Iµ(α)) = P(µ ≤ X̄ + n^{−1/2}σλα) = P(Sn ≥ −λα)

= 1 − (Φ(−λα) + n^{−1/2}p1(−λα)φ(−λα) + o(n^{−1/2}))

= α − n^{−1/2}p1(λα)φ(λα) + o(n^{−1/2}).


The coverage for the interval I′µ(α) = (−∞, X̄ + n^{−1/2}σ̂λα) is analogously found to be

α − n^{−1/2}q1(λα)φ(λα) + o(n^{−1/2}).

Thus, for one-sided confidence intervals, normal approximation gives a coverage error of order n^{−1/2}.

The polynomials p1 and q1 both contain the skewness γ. Thus we see that the skewness of X affects the actual coverage of the normal approximation confidence intervals. In particular, when the skewness is zero the n^{−1/2} term of the coverage error disappears, and when the skewness is large the coverage error might be large. When X is skew it is possible to obtain confidence intervals with better coverage by correcting for skewness. Some methods for this are presented next.
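To get a feel for the size of this effect, the coverage error can be simulated. For Exponential(1) data (γ = 2) and the interval Iµ(0.95) with n = 10, the expansion predicts actual coverage α − n^{−1/2}p1(λα)φ(λα) ≈ 0.969 rather than the nominal 0.95. A Monte Carlo sketch of our own:

```python
import math
import random

def phi(x):  # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

random.seed(7)
n, reps, alpha = 10, 100000, 0.95
mu, sigma, gamma = 1.0, 1.0, 2.0     # Exponential(1)
lam = 1.6448536269514722             # lambda_alpha for alpha = 0.95
p1 = -(gamma / 6) * (lam * lam - 1)  # p1(lambda_alpha)

cover = 0
for _ in range(reps):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    cover += mu <= xbar + sigma * lam / math.sqrt(n)
actual = cover / reps
predicted = alpha - p1 * phi(lam) / math.sqrt(n)
print(actual, predicted)  # both well above the nominal 0.95
```

For right-skewed X the error has opposite signs for the two one-sided intervals: this upper-limit interval over-covers, while the corresponding lower-limit interval under-covers by a similar amount.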

We will assume that the variance σ^2 and the skewness γ are known. It might seem like a bit of a contradiction that the second and third central moments are known, but not the mean. An example where such a situation could occur is when a measuring instrument that has been used sufficiently much, so that the variance and skewness of its measurements are known, is used to measure something that has not been measured before. One could of course argue that in that case the distribution, or at least the quantiles, of the measurement errors might be known as well, and that a parametric confidence interval would make more sense. Let us however assume that the quantiles are unknown and that the density function is unknown or too complicated to work with for such procedures to be fruitful.

3.2 Edgeworth correction

In many cases we wish to derive a confidence interval using some statistic An. In cases where the Edgeworth expansion for An is known, we can obtain confidence intervals with a coverage error of smaller order than that of the normal approximation interval. In particular, we can make an explicit correction for skewness using the following theorem, various versions of which were proved in [27], [19], [29] and [1].

Theorem 3. Let An be either Sn = n1/2(X − µ)/σ or Tn = n1/2(X − µ)/σ̂ and assume that An admits the Edgeworth expansion

P(An ≤ x) = Φ(x) + n−1/2a1(x)φ(x) + o(n−1/2).

Then

P(An ≤ x − n−1/2a1(x)) = Φ(x) + o(n−1/2)    (12)

and

P(An ≤ x − n−1/2â1(x)) = Φ(x) + o(n−1/2),

where â1 is the polynomial a1 with population moments replaced by sample moments.


Under the assumptions of Theorem 3, if An = Sn and if the skewness of X is known, then IEcorr = (−∞, X − n−1/2σλ1−α + n−1σp1(λ1−α)) has nominal coverage α and, by (12),

P(µ ∈ IEcorr) = P(X − n−1/2σλ1−α + n−1σp1(λ1−α) > µ)
= P(Sn > λ1−α − n−1/2p1(λ1−α))
= 1 − Φ(λ1−α) + o(n−1/2) = α + o(n−1/2).

Thus the coverage error of the confidence interval is of order n−1. We note that, in the notation of Section 2.3, λ1−α − n−1/2a1(λ1−α) = λ1−α + n−1/2s1(λ1−α) = v1−α + o(n−1/2). Heuristically, we can therefore consider the idea behind the explicit correction to be to replace the quantile of the normal distribution with the truncated Cornish-Fisher expansion of the corresponding quantile of An.

The interval can be corrected further, by analogously correcting for kurtosis using the n−1 term in the Edgeworth expansion for An + n−1/2a1(An), to obtain an even smaller coverage error. This iteration will however sometimes result in an over-corrected interval, particularly when n is small; see for instance [19] or [1]. Moreover, the expression is a lot harder to derive analytically for such corrections.

3.3 The bootstrap

The bootstrap procedure for resampling is a popular tool for estimating the distributions of statistics. It was first introduced by Efron in 1979 in [15]. Some theoretical results, based on Edgeworth expansions and similar techniques, regarding the asymptotic performance of the bootstrap for the statistic Sn = n1/2(X − µ)/σ were provided in 1981 by Singh in [28] and Bickel and Freedman in [3]. Since then asymptotics and performance in more general situations have been studied. One of the most important situations is the construction of confidence intervals using the bootstrap; we will investigate the properties of bootstrap confidence intervals using the Edgeworth expansion.

3.3.1 The bootstrap and Sn

We briefly mention the results of [28] and [3] to motivate why the bootstrap might be used for skewness correction. We omit the details and only try to give a general idea about the result.

The statistic

Sn = n1/2(X − µ)/σ

has the bootstrap analogue

S∗n = n1/2(X∗ − x)/σ̂,

where X∗ ∼ F̂n, i.e. the distribution function of X∗ is the empirical distribution function, EX∗ = x, Var(X∗) = σ̂2 = (1/n)∑ni=1(xi − x)2 and Skew(X∗) = γ̂ = [(1/n)∑ni=1(xi − x)3]/σ̂3. Part D of Theorem 1 in [28] says that

n1/2 ||P(Sn ≤ x) − P∗(S∗n ≤ x)||∞ −→ 0 a.s. (13)

The main idea behind this is the following. It is shown that the conditional Edgeworth expansion for S∗n can be written as

P∗(S∗n ≤ x) = Φ(x) − n−1/2(1/6)γ̂(x2 − 1)φ(x) + Rn(x)

uniformly in x, where n1/2Rn(x) −→ 0 a.s. By looking at the Edgeworth expansions for Sn and S∗n and considering the difference between γ and γ̂, (13) follows.

In particular, this means that there is no n−1/2 error term (or term of lower order) present when the distribution of Sn is approximated with the bootstrap distribution. It thus seems plausible that a confidence interval based on the quantiles of S∗n will be second order correct. It has been shown that this indeed is the case.

3.3.2 Bootstrap confidence intervals

There are numerous bootstrap methods for constructing confidence intervals. A few papers in the 1980s have been important for the understanding of the theoretical properties of these methods. Abramovitch and Singh ([1]) and Hall ([20]) discussed the coverage of bootstrap confidence intervals, and in [21] Hall developed a unified framework for the theory for different types of bootstrap confidence intervals. This allowed a comparison of the different methods. The discussion in the literature has mainly focused on the case where σ is unknown, but as before we consider the case where σ is known. We use a method that we will call percentile-s, in which we try to estimate the quantiles of Sn by those of S∗n.

The idea is the following. Let v1−α be such that P(Sn ≤ v1−α) = 1 − α. A one-sided confidence interval with confidence level α is I′µ(α) = (−∞, X − v1−ασn−1/2). Since the distribution of Sn is unknown we would like to estimate v1−α somehow. Looking at (13) it seems reasonable to use S∗n for the estimation. The distribution function P∗(S∗n ≤ x) is however also unknown.

It can be estimated by using B bootstrap replications of Sn: S∗n,1, . . . , S∗n,B, where S∗n,i = n1/2(X∗i − x)/σ̂. Looking at the bootstrap sample order statistics S∗n,(1) ≤ . . . ≤ S∗n,(B) we can estimate v1−α by v∗1−α = S∗n,((1+B)(1−α)).

The coverage of the percentile-s confidence interval I(2)µ(α) = (−∞, X − v∗1−ασn−1/2) is

P(µ ∈ I(2)µ(α)) = α + o(n−1/2).

This is a consequence of (13) (see also [21]). We have thus obtained a skewness corrected confidence interval.


3.4 A new simulation method

Both the Edgeworth correction and the bootstrap are methods for getting skewness corrected confidence intervals by approximating the quantile v1−α for Sn. As we've seen, this is done by approximating v1−α via Cornish-Fisher expansion in the Edgeworth correction case and by estimating the quantiles using bootstrap replicates in the different bootstrap methods.

Another idea for obtaining skewness corrected confidence intervals is to somehow transform the problem into one relating to a statistic with zero skewness. A method for doing this by using transformations is discussed in [23]. Next we describe a simulation method that corrects for skewness by adding a simulated random variable. Inference can then be made about the obtained skewness corrected distribution.

3.4.1 Skewness correction through addition of a random variable

Lemma 1. Let X be a random variable with EX = µ, Var(X) = σ2 and E(X − EX)3 = µ3, so that Skew(X) = γ = µ3/σ3. Now let Y be a random variable, independent of X, such that EY = 0, Var(Y) = aσ2 and E(Y − EY)3 = −µ3. Then E(X − Y) = µ, Var(X − Y) = (1 + a)σ2 and Skew(X − Y) = 0.

Proof. The results for the mean and variance of X − Y are well known. The result for the skewness follows from Corollary 4 in Appendix A.

Let X1, . . . , Xn be n observations from the distribution of X and Y1, . . . , Yn be n observations from the distribution of Y. Define Zi = Xi − Yi, so that Z1, . . . , Zn become n observations from the distribution of Z = X − Y.

It was shown in Section 2.2 that p1(x) = −(1/6)γ(x2 − 1) in the Edgeworth expansion for Sn = n1/2(X − µ)/σ. We see that, since γZ = Skew(X − Y) = 0, the first term in the Edgeworth expansion for S′n = n1/2(Z − µ)/(√(1 + a)σ) will be zero.

Theorem 4. Let Zi = Xi − Yi where {Xi} and {Yi} are as in Lemma 1. Then the confidence interval Inew = (−∞, Z + n−1/2√(1 + a)σλα) is skewness corrected and has coverage

α + n−1λα((1/24)κZ(λ2α − 3))φ(λα) + o(n−1).

Proof. The Edgeworth expansion for S′n = n1/2(Z − µ)/(√(1 + a)σ) is P(S′n ≤ x) = Φ(x) + n−1p2(x)φ(x) + o(n−1). Hence

P(µ ∈ Inew) = α − n−1p2(λα)φ(λα) + o(n−1)
= α + n−1λα((1/24)κZ(λ2α − 3) + (1/72)γ2Z(λ4α − 10λ2α + 15))φ(λα) + o(n−1)
= α + n−1λα((1/24)κZ(λ2α − 3))φ(λα) + o(n−1)    (14)

and thus the coverage error of Inew is of order n−1.

A simulation procedure that gives us the observations zi will be described next.


3.4.2 Simulation procedure

Let X be as above and suppose that n observations x1, . . . , xn from the distribution of X are given and that σ2 and γ′ (and thus γ) are known. Then, if we can find a random variable Y, with the same properties as Y above, such that we can simulate pseudo-random numbers from the distribution of Y, we can obtain the confidence interval Inew as follows.

Simulate n numbers Y1, . . . , Yn from the distribution of Y and let Zi = xi − Yi. Then Inew = (−∞, Z + n−1/2√(1 + a)σλα) has a coverage error of order n−1.

A natural question at this point is whether or not there exists a distribution or a family of distributions that can be used for this kind of simulation in general circumstances. The answer is that yes, such distributions exist. A possible choice for the distribution of the simulation variable Y is (a shifted version of) the inverse Gaussian distribution. Its usefulness follows from the fact that its variance and skewness can easily be controlled by its two parameters, as we will show.

Definition 3. Y is inverse Gaussian distributed with parameters λ > 0 and µ > 0 if

FY(y) = P(Y ≤ y) = Φ(√(λ/y)(y/µ − 1)) + e2λ/µΦ(−√(λ/y)(y/µ + 1)), y > 0,

fY(y) = √(λ/(2πy3)) exp(−λ(y − µ)2/(2yµ2)), y > 0.

Note that this is not the inverse of the normal distribution; the name is not to be taken literally.
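Since Definition 3 gives FY in closed form, it can be evaluated using nothing beyond the standard normal distribution function. A small sketch (ours, for illustration; for very large λ/µ the second term should be computed on the log scale to avoid overflow):

```python
from math import sqrt, exp
from statistics import NormalDist

_Phi = NormalDist().cdf  # standard normal CDF

def inverse_gaussian_cdf(y, lam, mu):
    """F_Y(y) = Phi(sqrt(lam/y)*(y/mu - 1)) + exp(2*lam/mu) * Phi(-sqrt(lam/y)*(y/mu + 1)), y > 0.
    Direct evaluation of the distribution function in Definition 3."""
    if y <= 0:
        return 0.0
    r = sqrt(lam / y)
    return _Phi(r * (y / mu - 1.0)) + exp(2.0 * lam / mu) * _Phi(-r * (y / mu + 1.0))
```

This is the same function that R's SuppDists package exposes; the Python version here is only a convenience for readers without R.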

Lemma 2. Let Y be inverse Gaussian distributed with parameters λ > 0 and µ > 0. Then

EY = µ, Var(Y) = µ3/λ, Skew(Y) = 3(µ/λ)1/2 and Kurt(Y) = 15µ/λ.

See Chapter 2 of [5] for a proof and further discussion of the distribution. Using Lemma 2 we can now decide which values of λ and µ to choose for our skewness correcting simulation variable.

Lemma 3. Let µ3 > 0 and σ > 0. Then the inverse Gaussian distribution with parameters λ = 27·σ10·µ3−3 and µ = 3·σ4·µ3−1 has variance σ2, third central moment µ3 and skewness µ3/σ3.

Proof. By Lemma 2 the inverse Gaussian distribution has variance µ3/λ and skewness 3·√(µ/λ). We want the variance to equal σ2 and the skewness to equal µ3/σ3. The first equality gives us that λ = µ3·σ−2, and putting this into the second equality yields the expression µ3/σ3 = 3·µ1/2·µ−3/2·σ = 3·σ/µ, from which it follows that µ = 3·σ4·µ3−1 and therefore that λ = 27·σ10·µ3−3.
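Lemma 3 is easy to check numerically. The helper below (our naming) maps (σ, µ3) to (λ, µ) and recovers the variance and skewness via the formulas of Lemma 2:

```python
from math import sqrt, isclose

def ig_params_for(sigma, mu3):
    """Lemma 3: lambda = 27*sigma^10/mu3^3, mu = 3*sigma^4/mu3 (sigma > 0, mu3 > 0)."""
    lam = 27.0 * sigma**10 / mu3**3
    mu = 3.0 * sigma**4 / mu3
    return lam, mu

def ig_variance(lam, mu):
    """Lemma 2: Var(Y) = mu^3 / lambda."""
    return mu**3 / lam

def ig_skewness(lam, mu):
    """Lemma 2: Skew(Y) = 3 * sqrt(mu / lambda)."""
    return 3.0 * sqrt(mu / lam)
```

For any admissible (σ, µ3) the round trip returns variance σ2 and skewness µ3/σ3, confirming the algebra in the proof.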

Now assume that Y′ is inverse Gaussian distributed with parameters λ and µ. Then EY′ = µ. Since we wanted to use a random variable with mean 0 for our simulation, we choose Y = Y′ − µ as our simulation variable. Then EY = 0, Var(Y) = σ2 and Skew(Y) = µ3/σ3, where σ and µ3 can be chosen arbitrarily. Note that the skewness of Y always is positive, so that if the skewness of X is positive we would choose −Y as our simulation variable instead.

Next we state a lemma that will prove to be useful later in our discussion.

Lemma 4. Let Y1, . . . , Yn be i.i.d. inverse Gaussian distributed random variables with parameters λ > 0 and µ > 0. Then Y is inverse Gaussian with parameters nλ and µ.

See Section 2.4 of [5] for a proof. Finally, we mention that the inverse Gaussian distribution is implemented in R in the SuppDists library.
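Putting the pieces together, the whole procedure can be sketched as follows. This is our illustration in Python rather than R with SuppDists; the inverse Gaussian draws use the standard transformation method of Michael, Schucany and Haas, and the sign of the simulated variable is chosen so that the third central moments actually cancel, in the spirit of Lemma 1 and Section 3.4.2. All names are ours, and γ ≠ 0 is assumed.

```python
import random
from math import sqrt
from statistics import NormalDist

def rinvgauss(lam, mu, rng):
    """One inverse Gaussian draw via the Michael-Schucany-Haas transformation method."""
    nu = rng.gauss(0.0, 1.0) ** 2
    x = mu + mu * mu * nu / (2.0 * lam) \
        - (mu / (2.0 * lam)) * sqrt(4.0 * mu * lam * nu + (mu * nu) ** 2)
    if rng.random() <= mu / (mu + x):
        return x
    return mu * mu / x

def new_method_upper_limit(xs, sigma, gamma, alpha, a=1.0, rng=None):
    """Upper limit of I_new = (-inf, Zbar + n^(-1/2)*sqrt(1+a)*sigma*lambda_alpha).
    Z_i is built from x_i and a centred inverse Gaussian variable with Var = a*sigma^2,
    with the sign chosen so that the third central moments cancel. Requires gamma != 0."""
    rng = rng or random.Random()
    n = len(xs)
    mu3 = abs(gamma) * sigma**3          # |third central moment| to cancel
    sig_y2 = a * sigma**2                # Var(Y) = a * sigma^2
    lam = 27.0 * sig_y2**5 / mu3**3      # Lemma 3 with variance sig_y2
    mu = 3.0 * sig_y2**2 / mu3
    # Subtract the positively skewed centred part when Skew(X) > 0, add it when Skew(X) < 0,
    # so that Skew(Z) = 0 (sign convention: ours, chosen for cancellation).
    sign = 1.0 if gamma > 0 else -1.0
    zs = [x - sign * (rinvgauss(lam, mu, rng) - mu) for x in xs]
    zbar = sum(zs) / n
    lam_alpha = NormalDist().inv_cdf(alpha)
    return zbar + sqrt(1.0 + a) * sigma * lam_alpha / sqrt(n)
```

A usage sketch: with 30 observations from a distribution with σ = 1 and γ = −2, `new_method_upper_limit(xs, 1.0, -2.0, 0.95)` returns the upper endpoint of Inew for a = 1.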

4 Comparison

Having looked at different confidence intervals we now want to compare them. We only compare the one-sided intervals Inormal, Inew and IEcorr.

4.1 Coverages of confidence intervals

Assume that the variance σ2 and the skewness γ ≠ 0 of the i.i.d. random variables X1, . . . , Xn are known and that X satisfies Cramér's condition and is such that E|X|3 < ∞. We consider the following three confidence intervals for the mean µ:

The normal approximation confidence interval Inormal(α) = (−∞, X + n−1/2σλα), where Φ(λα) = α,

P(µ ∈ Inormal(α)) = α − n−1/2p1(λα)φ(λα) + o(n−1/2),

the Edgeworth corrected confidence interval IEcorr = (−∞, X − n−1/2σλ1−α + n−1σa1(λ1−α)),

P(µ ∈ IEcorr(α)) = α + o(n−1/2),

and the skewness corrected simulation confidence interval Inew = (−∞, Z + n−1/2σZλα), where Skew(Z) = 0,

P(µ ∈ Inew(α)) = α + o(n−1/2).


4.2 Criteria for the comparison

It seems that Inormal has the worst coverage. However, coverage is not the only measure of the performance of a confidence interval. Another important property is the length of the interval. In general, if two confidence intervals have the same coverage we prefer the one that is shorter. Alternatively, if two confidence intervals are of the same length we prefer the one with the coverage closest to α. Things get a bit trickier if we have one short interval with bad coverage and one wide interval with good coverage; we must somehow choose between good length and good coverage. And then there is of course the computational cost of the computer intensive methods versus the algebraic work that the analytical methods require. Regardless of how we weigh these properties, it is of interest to compare the lengths of the confidence intervals above. Since the intervals are one-sided their length, in the usual sense of the word, is in fact infinite. We therefore arrive at the following definition.

Definition 4. Given two intervals, I1 = (−∞, θ1) and I2 = (−∞, θ2), both with coverage α + o(n−j/2) for some j, we say that I2 is better than I1 if P(θ2 ≤ θ1) ≥ 1/2.

4.3 Comparisons of the upper limits

We will compare the upper limit obtained by our skewness correction simulation method with those obtained by the bootstrap and explicit skewness correction. Throughout we let γX denote the skewness of X.

Consider the Edgeworth corrected confidence interval IEcorr = (−∞, X − n−1/2σλ1−α + n−1σp1(λ1−α)) = (−∞, θEcorr) and our simulation skewness corrected interval Inew = (−∞, Z + n−1/2√(1 + a)σλα) = (−∞, θnew). Both intervals have coverage α + o(n−1/2). Recall that λ1−α = −λα and that p1(λ1−α) = −(1/6)γX(λ2α − 1). Thus

P(θnew ≤ θEcorr) = P(θnew − θEcorr ≤ 0)
= P(Z + n−1/2√(1 + a)σλα − X + n−1/2σλ1−α − n−1σp1(λ1−α) ≤ 0)
= P(Z − X + n−1/2σλα(√(1 + a) − 1) + n−1(1/6)σ(λ2α − 1)γX ≤ 0)
= P(Y ≤ −n−1/2σλα(√(1 + a) − 1) − n−1(1/6)σ(λ2α − 1)γX).

Since the distribution of Y is known this probability is fully known. It is clearly dependent on the distribution of Y, but some words can be said about its general behaviour.

Theorem 5. If the median of Y is smaller than −n−1/2σλα(√(1 + a) − 1) − n−1(1/6)σ(λ2α − 1)γX then Inew is better than IEcorr.

Proof. By the definition of the median, if −n−1/2σλα(√(1 + a) − 1) − n−1(1/6)σ(λ2α − 1)γX is larger than the median, the probability P(θnew ≤ θEcorr) will be at least 1/2.


For a large class of unimodal continuous densities, including the Pearson system and the inverse Gaussian distribution, it holds that

mean > median when γ > 0,
mean < median when γ < 0

(see [25]). Assume that Y has such a density and note that the sign of the skewness of X determines the sign of the skewness of Y. Then Inew is never better than IEcorr when γX > 0. It is sometimes better when γX < 0; in particular the method gets better as |γX| increases.

To see this, let γX > 0. Then γY < 0 and thus the mean of Y is smaller than the median. Since EY = 0 the median must thus be positive. But in that case −n−1/2σλα(√(1 + a) − 1) − n−1(1/6)σ(λ2α − 1)γX < 0 for reasonable values of α, and it therefore seems that the new method is not to be recommended in such cases.

On the other hand, if γX < 0 then γY > 0 and the mean of Y is larger than the median, so that the median must be negative. Moreover, the second part of the right hand expression, −n−1(1/6)σ(λ2α − 1)γX, is positive, and if |γX| is big enough the right hand side is therefore larger than the median of Y. We see that in this case it is indeed possible that the new method gives intervals that are shorter in general than those obtained using Edgeworth correction.

For the special case considered in Section 3.4.2 we have the following corollary.

Corollary 2. Assume that Y is generated using an inverse Gaussian distributed random variable Y′, as described in Section 3.4.2, and let λY and µY be the parameters in the distribution of Y′. If γX > 0 then the probability that Inew is better than IEcorr is

Φ(√(nλY/x1)(x1/µY − 1)) + e2nλY/µY Φ(−√(nλY/x1)(x1/µY + 1))    (15)

where

x1 = µY − n−1/2σλα(√(1 + a) − 1) − n−1(1/6)σ(λ2α − 1)γX,

and if γX < 0 the probability is

Φ(√(nλY/x2)(x2/µY − 1)) + e2nλY/µY Φ(−√(nλY/x2)(x2/µY + 1))    (16)

where

x2 = µY + n−1/2σλα(√(1 + a) − 1) + n−1(1/6)σ(λ2α − 1)γX.

Proof. First, assume that γX < 0. Then it follows from Lemma 4 that Y + µY is inverse Gaussian with parameters nλY and µY, and we can rewrite the probability as

P(Y + µY ≤ µY − n−1/2σλα(√(1 + a) − 1) − n−1(1/6)σ(λ2α − 1)γX) = P(Y + µY ≤ x1)
= Φ(√(nλY/x1)(x1/µY − 1)) + e2nλY/µY Φ(−√(nλY/x1)(x1/µY + 1)).


Second, assume that γX > 0 and that Y is generated using an inverse Gaussian distributed random variable Y′. Then Y will be defined as Y = −(Y′ − µY) and −Y + µY is inverse Gaussian. Thus

P(Y ≤ −n−1/2σλα(√(1 + a) − 1) − n−1(1/6)σ(λ2α − 1)γX)
= P(−Y + µY > µY + n−1/2σλα(√(1 + a) − 1) + n−1(1/6)σ(λ2α − 1)γX)
= 1 − P(−Y + µY ≤ x2)
= Φ(√(nλY/x2)(x2/µY − 1)) + e2nλY/µY Φ(−√(nλY/x2)(x2/µY + 1)).

Somewhat surprisingly, (15) and (16) do not depend on σ. Although not easily seen algebraically, this is an effect of the parametrization of the inverse Gaussian distribution and of how λY and µY were defined from σ and γX. Appendix B contains tables with explicit values of (15) and (16) for some combinations of α, n and γX, as well as some figures describing how (15) and (16) depend on α.

From the discussion above, as well as from Tables 1-2 and Figures 1-3 in the appendix, we deduce that the sign and magnitude of γX is of great importance for the theoretical performance of our simulation method. In particular, the new method seems to have a better performance when γX is large and negative and n is small. This effect is even clearer when α is close to 1.

4.4 Simulation results

Some simulation results regarding the performance of the new simulation method compared with Edgeworth correction and normal approximation are given in Appendix C.

The performance of the new method is quite good when the nominal coverage is 0.95 and γX is −10.4 or −20.1. When γX = −40.3 the usual normal approximation interval seems preferable. This might seem surprising if we bear in mind the part that skewness plays in the Edgeworth expansion, but for distributions with a large skew the skewness (and indeed the variance) will largely be due to infrequent big deviations from the mean. In small sample sizes such as those considered in the simulations there may not be such observations, in which case the sample skewness will be close to zero.

When the nominal coverage is 0.99 the Edgeworth correction method seems to be superior. The performance of the new method is however always better than that of the normal approximation.

The choice of method is not independent of the distribution of the Xi. It would thus be of interest to perform further simulations with different distributions.

4.5 Discussion

The theoretical investigation, as well as the simulation study, gives slightly encouraging results. There do seem to be cases, albeit admittedly somewhat unusual, where the new method performs better than Edgeworth correction and normal approximation. It would be of interest to compare the new method with the percentile-s bootstrap as well. It can be shown that θboots = θEcorr + OP(n−3/2), and thus we expect the result of such a comparison to be quite similar to the comparison with Edgeworth correction.

The assumption that both the second and the third central moments are known is perhaps somewhat unrealistic. It would therefore be desirable to extend the method to the situation where these are estimated from the xi-sample. A problem that must be handled is that this introduces a dependence between X and Y.

It is possible that the correction might be extended to higher moments as well, using addition formulae like those for skewness. One idea is to use a SIMEX-like method to correct for kurtosis, since the fact that the kurtosis always is greater than −2 makes it hard to get rid of it in the same way that the new method gets rid of skewness. This could then also be used for two-sided confidence intervals.

It might also be desirable to obtain skewness corrected intervals that are shorter than those given by the new method. Letting a → 0 does not seem to be a good way of doing this, so we might consider adding a second pseudo-random variable that is used in the SIMEX fashion to reduce the variance. We might also consider using different conditions on Y; for instance, it is perhaps possible not to require that EY = 0. Another idea is to try to construct Y using a U-statistic where the differences Xi − Xj are considered. Once more, this would introduce a dependence between X and Y.

Different kinds of comparisons between the methods can be made. We can compare the mean length of the intervals or let the length be fixed and study the coverages given by the different methods under that condition.

Finally, it is possible that the method will have a better performance for more general statistics An. The notion of skewness might have a somewhat different meaning for these, and the Edgeworth expansions have a different behaviour.


A Appendix: Skewness and kurtosis

Several measures of symmetry (or asymmetry) have been suggested throughout history. The most popular one is the third standardized moment, commonly known as the skewness.

Definition 5. The skewness of a random variable X, henceforth denoted Skew(X), is its standardized third moment:

Skew(X) = γ = µ3/σ3 = E(X − EX)3/(E(X − EX)2)3/2

(see for instance Section 15.8 of [8] or Section 4.6 of [18]).

Let us note that the third central moment of aX + b is E(aX + b − aEX − b)3 = a3E(X − EX)3. We notice that the skewness is translation invariant and that Skew(aX) = sign(a)·Skew(X). Furthermore, if X is symmetric then all odd central moments of X are zero and thus Skew(X) = 0. The converse does not necessarily hold; that is, skewness equal to zero does not imply symmetry.

The ”peakedness” of a distribution is measured using the kurtosis of the distribution. It describes the extent to which the variance of the distribution depends on uncommon large deviations from the mean; if the kurtosis is high then these are the reason for much of the variance, and if it is low most of the variance comes from frequent moderately sized deviations.

Definition 6. The kurtosis (or excess kurtosis) of a random variable X, henceforth denoted Kurt(X), is

Kurt(X) = κ = µ4/σ4 − 3 = E(X − EX)4/(E(X − EX)2)2 − 3

(see for instance Section 15.8 of [8] or Section 4.6 of [18]).

We see that Kurt(aX + b) = Kurt(X), and Kurt(X) = 0 when X is normal. Moreover, since µ4/σ4 ≥ 1, we have −2 ≤ Kurt(X) ≤ ∞.

A.1 The skewness of a sum of random variables

Although formulae for the expectation and variance of a sum of random variables are well known, the corresponding formulae for the third central moment or the skewness of a sum of random variables are rarely seen in the literature. The proofs amount to standard calculations of expectations, but nevertheless they are given here for reference.

In the following calculations we will use µZ3 to denote the third central moment of a random variable Z, whenever we need to be able to distinguish between moments of different random variables.


Lemma 5. Let µZ3 = E(Z − EZ)3 be the third central moment of the random variable Z and let X and Y be random variables. Then

µX+Y3 = µX3 + µY3 + 3·Cov(X2, Y) + 3·Cov(X, Y2) − 6·(EX + EY)·Cov(X, Y).    (17)

Proof. We expand the expression for the third central moment of X + Y as follows:

E(X + Y − E(X + Y))3 = E((X − EX) + (Y − EY))3
= E(X − EX)3 + E(Y − EY)3 + 3E(X − EX)2(Y − EY) + 3E(X − EX)(Y − EY)2.    (18)

Next we consider the third part of the last expression above:

E(X − EX)2(Y − EY) = E(X2 − 2XEX + (EX)2)(Y − EY)
= E(X2Y − 2XY EX + Y(EX)2 − X2EY + 2XEXEY − (EX)2EY)
= EX2Y − 2EXY EX + EY(EX)2 − EX2EY + 2(EX)2EY − (EX)2EY
= EX2Y − EX2EY + 2(EX)2EY − 2EXY EX
= EX2Y − EX2EY + 2EX(EXEY − EXY) = Cov(X2, Y) − 2EX Cov(X, Y).

In the exact same manner we conclude that the fourth part of (18) can be written as

E(X − EX)(Y − EY)2 = Cov(X, Y2) − 2EY Cov(X, Y),

and thus, since E(X − EX)3 = µX3, we have that

(18) = µX+Y3 = µX3 + µY3 + 3·Cov(X2, Y) + 3·Cov(X, Y2) − 6·(EX + EY)·Cov(X, Y).

Using that Var(X + Y) = Var(X) + Var(Y) + 2·Cov(X, Y), the addition formula for skewness follows:

Corollary 3. Let X and Y be random variables. Then

Skew(X + Y) = [µX3 + µY3 + 3·Cov(X2, Y) + 3·Cov(X, Y2) − 6·(EX + EY)·Cov(X, Y)] / (Var(X) + Var(Y) + 2·Cov(X, Y))3/2.    (19)

Two important special cases of Corollary 3 immediately follow.

Corollary 4. Let X and Y be independent random variables. Then

Skew(X + Y) = (µX3 + µY3)/(σ2X + σ2Y)3/2.    (20)

The extension to the sum of n independent variables is straightforward. It is of interest to note that Corollary 4 holds regardless of the values of the expected values of X and Y; in particular, it holds when EX and/or EY are unknown.
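Corollary 4 can be sanity-checked against a case where the answer is known in closed form: for X, Y i.i.d. Exp(1) we have µ3 = 2 and σ2 = 1, while X + Y ~ Gamma(2, 1) has skewness 2/√2. A small check (ours):

```python
from math import sqrt, isclose

def skew_sum_indep(mu3_x, var_x, mu3_y, var_y):
    """Corollary 4: Skew(X + Y) = (mu3_X + mu3_Y) / (var_X + var_Y)^(3/2), X and Y independent."""
    return (mu3_x + mu3_y) / (var_x + var_y) ** 1.5

# X, Y i.i.d. Exp(1): third central moment 2, variance 1.
# X + Y is Gamma(shape=2, rate=1), whose skewness is 2/sqrt(shape) = sqrt(2).
```

The same formula applied n − 1 times reproduces the 1/√n scaling of Corollary 5 below.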


Corollary 5. Let X1, X2, . . . , Xn be i.i.d. random variables. Then

Skew(X1 + X2 + . . . + Xn) = (1/√n)·(µX3/σ3X) = (1/√n)·Skew(X).    (21)

Corollary 5 is proved by using Corollary 4 n − 1 times and noticing that Skew(X1 + X2 + . . . + Xn) = (n·µX3)/(n·σ2X)3/2 = (n/n3/2)·Skew(X) = (1/√n)·Skew(X).

The i.i.d. condition can be somewhat weakened. It is clear from Corollary 3 that it is sufficient that the variables have equal variances and third central moments and that Cov(X2i, Xj) = Cov(Xi, X2j) = Cov(Xi, Xj) = 0 when i ≠ j.

A.2 The kurtosis of a sum of random variables

For higher moments the addition formulae become more complicated, which is one reason for introducing cumulants in the first place; they still have nice addition formulae. We are not as concerned about kurtosis as we are about skewness and therefore we mention only the following lemma, which follows from the remark at the end of Section 2.2.2.

Lemma 6. Let X and Y be independent random variables, with finite fourth central moments, such that Var(X) = Var(Y). Then

Kurt(X + Y) = (1/4)(Kurt(X) + Kurt(Y)).    (22)
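Lemma 6 admits the same kind of spot check: two i.i.d. Exp(1) variables have equal variances and excess kurtosis 6, and their sum is Gamma(2, 1) with excess kurtosis 6/shape = 3 = (6 + 6)/4. A one-line sketch (ours):

```python
from math import isclose

def kurt_sum_equal_var(kurt_x, kurt_y):
    """Lemma 6: Kurt(X + Y) = (Kurt(X) + Kurt(Y)) / 4 for independent X, Y with equal variances."""
    return (kurt_x + kurt_y) / 4.0
```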


B Appendix: P(θnew ≤ θEcorr)

This appendix contains some tables and figures relating to the comparison of our simulation method and the explicit correction method. Tables 1 and 2 give explicit values of (15) and (16) (see Subsection 4.3) for some combinations of α, n and γX. Figures 1-3 show these probabilities as functions of α for eight values of γX and three values of n. Finally, Tables 3 and 4 contain the coverage error terms of order n−1 for the confidence intervals obtained by the two methods.

Table 1: P(θnew ≤ θEcorr) when α = 0.95

γX          n = 15       n = 30       n = 75       n = 150      n = 500
0.1         0.2461585    0.2466465    0.2470814    0.2473014    0.2475421
0.5         0.2397286    0.2420325    0.2441252    0.2451973    0.2463814
1           0.2191781    0.2271931    0.2405524    0.2426289    0.2449492
2           0.1940548    0.2083596    0.2220317    0.2292854    0.2421463
4           0.1531201    0.1756957    0.1990799    0.2121904    0.2275920
6           0.1222838    0.1488952    0.1787612    0.1964792    0.2181277
10          0.08131108   0.10902326   0.14503125   0.16887748   0.20045674
15          0.05257063   0.07680105   0.11329497   0.14062359   0.18059000
20          0.03635416   0.05635855   0.09006719   0.11805326   0.16298357
50          0.00864711   0.01547122   0.03112579   0.04931212   0.09264032
−0.1        0.2495379    0.2490361    0.2485927    0.2483700    0.2481274
−0.5        0.2566349    0.2539837    0.2516825    0.2505409    0.2493081
−1          0.2802507    0.2703587    0.2556722    0.2533178    0.2508028
−2          0.3165029    0.2948042    0.2766470    0.2678903    0.2538560
−4          0.3996042    0.3493397    0.3085325    0.2894824    0.2698846
−6          0.4922321    0.4104844    0.3434120    0.3126079    0.2816008
−10         0.6701141    0.5427119    0.4208967    0.3632395    0.3064168
−15         0.8195909    0.6931626    0.5256882    0.4333398    0.3399659
−20         0.8963340    0.8001726    0.6269255    0.5075826    0.3761091
−50         0.9859162    0.9697800    0.9178636    0.8359562    0.6147019


Table 2: P(θnew ≤ θEcorr) when α = 0.99

γX          n = 15       n = 30       n = 75       n = 150      n = 500
0.1         0.1639229    0.1649940    0.1659530    0.1664393    0.1669728
0.5         0.1502835    0.1550681    0.1595138    0.1618278    0.1644116
1           0.1262885    0.1370188    0.1519811    0.1563255    0.1612902
2           0.0967257    0.1128146    0.1300353    0.1399258    0.1553075
4           0.06046637   0.07878596   0.10215891   0.11745456   0.13756958
6           0.04097709   0.05744783   0.08158069   0.09931798   0.12492985
10          0.02240486   0.03418956   0.05479085   0.07284335   0.10368843
15          0.01282831   0.02074561   0.03628517   0.05187652   0.08328521
20          0.008322046  0.013965375  0.025818012  0.038719667  0.067999284
50          0.001777641  0.003268604  0.006983243  0.011876996  0.026872164
−0.1        0.1714419    0.1703105    0.1693153    0.1688168    0.1682749
−0.5        0.1879753    0.1816847    0.1763341    0.1737183    0.1709231
−1          0.2240299    0.2057937    0.1856757    0.1801257    0.1743163
−2          0.2972269    0.2523213    0.2172884    0.2013779    0.1813850
−4          0.4814233    0.3693050    0.2804016    0.2417861    0.2049413
−6          0.6624912    0.5049659    0.3560668    0.2889614    0.2265836
−10         0.8704285    0.7391605    0.5270488    0.4005038    0.2759957
−15         0.9514307    0.8868428    0.7150916    0.5527714    0.3484026
−20         0.9763116    0.9437372    0.8343956    0.6875119    0.4293880
−50         0.9972413    0.9939490    0.9820008    0.9574301    0.8227825


[Figure 1: P(θnew ≤ θEcorr) as a function of α (0.5 ≤ α ≤ 1.0) when n = 30; left panel: γ = 1, 10, 20, 50, right panel: γ = −1, −10, −20, −50.]


[Figure 2: P(θnew ≤ θEcorr) as a function of α (0.5 ≤ α ≤ 1.0) when n = 75; left panel: γ = 1, 10, 20, 50, right panel: γ = −1, −10, −20, −50.]


[Figure: two panels plotting P against alpha over the range 0.5 to 1.0; left panel for gamma = 1, 10, 20, 50, right panel for gamma = −1, −10, −20, −50.]

Figure 3: P (θnew ≤ θEcorr) when n = 500


C Appendix: Simulation results

This appendix contains simulation results on the actual coverage of the confidence intervals for various combinations of α, n and γX. For each combination, R was used to simulate n random variables 1 000 000 times, and the confidence intervals were computed for each sample. The figures in the tables are the proportions of samples for which the mean was contained in the confidence interval.

The distribution used for the random variables was the normal-inverse Gaussian distribution (the normal variance-mixture obtained when the mixing density is inverse Gaussian). It is provided by the HyperbolicDist package in R. The inverse Gaussian variables were simulated using the SuppDists package.
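The coverage experiment described above can be sketched in a few lines. The following is a minimal Python illustration, not the thesis code (which used R with the normal-inverse Gaussian distribution and 1 000 000 replications): it substitutes a simple right-skewed Exp(1) sample, checks only the plain normal-approximation interval, and uses far fewer replications. The function name `coverage` and all parameter choices are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(n, alpha=0.95, reps=20_000, seed=1):
    """Monte Carlo estimate of the actual coverage of the one-sided
    normal-approximation interval [xbar - z_alpha * sigma / sqrt(n), inf)
    for the mean, with known variance, under a skewed distribution."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(alpha)
    mu, sigma = 1.0, 1.0          # Exp(1): mean 1, sd 1, skewness 2
    hits = 0
    for _ in range(reps):
        x = [rng.expovariate(1.0) for _ in range(n)]
        lower = fmean(x) - z * sigma / math.sqrt(n)
        hits += (mu >= lower)     # did the interval cover the true mean?
    return hits / reps

# For this right-skewed case the estimate typically lands near 0.94,
# below the nominal 0.95, illustrating the skewness error of order n^(-1/2).
print(coverage(30))
```

Replacing the interval bound with its Edgeworth-corrected or skewness-corrected counterpart in the same loop reproduces the comparisons in the tables below.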

α = 0.95              Normal      Edgeworth   New method  New method
                      approxim.   correction  a = 1       a = 0.1
γX = −10.4, n = 30    0.933507    0.961781    0.95385     0.938311
γX = −10.4, n = 75    0.933039    0.955999    0.951491    0.93839
γX = −20.1, n = 30    0.941777    0.973632    0.959583    0.945165
γX = −20.1, n = 75    0.934541    0.964937    0.954753    0.939006
γX = −40.3, n = 30    0.959887    0.986872    0.971116    0.961884
γX = −40.3, n = 75    0.947646    0.978591    0.96308     0.950522


α = 0.99              Normal      Edgeworth   New method  New method
                      approxim.   correction  a = 1       a = 0.1
γX = −10.4, n = 30    0.966634    0.991678    0.982543    0.96997
γX = −10.4, n = 75    0.971143    0.990596    0.985334    0.974616
γX = −20.1, n = 30    0.96562     0.994081    0.980019    0.968311
γX = −20.1, n = 75    0.965572    0.99233     0.981408    0.968813
γX = −40.3, n = 30    0.973474    0.997139    0.982817    0.975019
γX = −40.3, n = 75    0.967729    0.99532     0.980365    0.969895


References

[1] Abramovitch, L., Singh, K. (1985), Edgeworth Corrected Pivotal Statistics and the Bootstrap, The Annals of Statistics, Vol. 13, pp. 116-132

[2] Bhattacharya, R.N., Ghosh, J.K. (1978), On the Validity of the Formal Edgeworth Expansion, The Annals of Statistics, Vol. 6, pp. 434-451

[3] Bickel, P.J., Freedman, D.A. (1981), Some Asymptotic Theory for the Bootstrap, The Annals of Statistics, Vol. 9, pp. 1196-1217

[4] Chebyshev, P.L. (1890), Sur Deux Théorèmes Relatifs aux Probabilités, Acta Mathematica, Vol. 14, pp. 305-315

[5] Chhikara, R.S., Folks, J.L. (1989), The Inverse Gaussian Distribution, Marcel Dekker, ISBN 0-8247-7997-5

[6] Cornish, E.A., Fisher, R.A. (1938), Moments and Cumulants in the Specification of Distributions, Extrait de la Revue de l'Institut International de Statistique, Vol. 5, pp. 307-320

[7] Cramér, H. (1928), On the composition of elementary errors, Skandinavisk Aktuarietidskrift, Vol. 11, pp. 13-74 and 141-180

[8] Cramér, H. (1946), Mathematical Methods of Statistics, Princeton University Press, ISBN 0-691-00547-8

[9] Cramér, H. (1970), Random Variables and Probability Distributions, Cambridge University Press, ISBN 0-521-07685-4

[10] DasGupta, A. (2008), Asymptotic Theory of Statistics and Probability, Springer, ISBN 978-0-387-75970-8

[11] Dubkov, A.A., Malakhov, A.N. (1976), Properties and interdependence of the cumulants of a random variable, Radiophysics and Quantum Electronics, Vol. 19, pp. 833-839

[12] Edgeworth, F.Y. (1894), The Asymmetrical Probability Curve, Proceedings of the Royal Society of London, Vol. 56, pp. 271-272

[13] Edgeworth, F.Y. (1905), The Law of Error, Cambridge Philosophical Transactions, Vol. 20, pp. 36-65

[14] Edgeworth, F.Y. (1907), On the Representation of Statistical Frequency by a Series, Journal of the Royal Statistical Society, Vol. 70, pp. 102-106

[15] Efron, B. (1979), Bootstrap Methods: Another Look at the Jackknife, The Annals of Statistics, Vol. 7, pp. 1-26

[16] Esseen, C.-G. (1945), Fourier Analysis of Distribution Functions - A Mathematical Study of the Laplace-Gaussian Law, Acta Mathematica, Vol. 77, pp. 1-125

[17] Fisher, R.A., Cornish, E.A. (1960), The Percentile Points of Distributions Having Known Cumulants, Technometrics, Vol. 2, pp. 209-225

[18] Gut, A. (2005), Probability: A Graduate Course, Springer-Verlag, ISBN 978-0-387-22833-4

[19] Hall, P. (1983), Inverting an Edgeworth Expansion, The Annals of Statistics, Vol. 11, pp. 569-576

[20] Hall, P. (1986), On the Bootstrap and Confidence Intervals, The Annals of Statistics, Vol. 14, pp. 1431-1452

[21] Hall, P. (1988), Theoretical Comparison of Bootstrap Confidence Intervals, The Annals of Statistics, Vol. 16, pp. 927-953

[22] Hall, P. (1992), The Bootstrap and Edgeworth Expansion, Springer-Verlag, ISBN 0-387-97720-1

[23] Hall, P. (1992), On the Removal of Skewness by Transformation, Journal of the Royal Statistical Society B, Vol. 54, pp. 221-228

[24] Kreyszig, E. (1978), Introductory Functional Analysis with Applications, John Wiley & Sons, ISBN 0-471-50459-9

[25] MacGillivray, H.L. (1981), The Mean, Median, Mode Inequality and Skewness for a Class of Densities, Australian Journal of Statistics, Vol. 23, pp. 247-250

[26] Pearson, K. (1895), Skew Variation in Homogeneous Material, Philosophical Transactions of the Royal Society of London. A, Vol. 186, pp. 343-414

[27] Pfanzagl, J. (1979), Nonparametric Minimum Contrast Estimators, Selecta Statistica Canadiana, Vol. 5, pp. 105-140

[28] Singh, K. (1981), On the Asymptotic Accuracy of Efron's Bootstrap, The Annals of Statistics, Vol. 9, pp. 1187-1195

[29] Withers, C.S. (1983), Expansions for the Distribution and Quantiles of a Regular Functional of the Empirical Distribution with Applications to Nonparametric Confidence Intervals, The Annals of Statistics, Vol. 11, pp. 577-587
