
MSc QT: Statistics
Part II: Statistical Inference

(Weeks 3 and 4)

Sotiris Migkos
Department of Economics, Mathematics and Statistics

Malet Street, London WC1E 7HX

September 2015

MSc Economics & MSc Financial Economics (FT & PT2)
MSc Finance & MSc Financial Risk Management (FT & PT2)

MSc Financial Engineering (FT & PT1)
PG Certificate in Econometrics (PT1)


Contents

Introduction

1 Sampling Distributions
  Literature
  1.1 Introduction
  1.2 Sampling Distributions
  1.3 Sampling Distributions Derived from the Normal
    1.3.1 Chi-Square
    1.3.2 Student-t
    1.3.3 F-distribution
  Problems
  Proofs

2 Large Sample Theory
  Literature
  2.1 Law of Large Numbers
  2.2 The Central Limit Theorem
  2.3 The Normal Approximation to the Binomial Distribution
  Problems
  Proofs

3 Estimation
  Literature
  3.1 Introduction
  3.2 Evaluation Criteria for Estimators
  3.3 Confidence Intervals
  Problems

4 Hypothesis Testing
  Literature
  4.1 Introduction
  4.2 The Elements of a Statistical Test
  4.3 Duality of Hypothesis Testing and Confidence Intervals
  4.4 Attained Significance Levels: P-Values
  4.5 Power of the Test
  Problems

A Exercise Solutions
  A.1 Sampling Distributions
  A.2 Large Sample Theory
  A.3 Estimation
  A.4 Hypothesis Testing


Introduction

Course Content

This part of the course consists of five lectures, followed by a closed book exam. The topics covered are

1. Sampling distributions.

2. Large sample theory.

3. Estimation.

4. Hypothesis Testing.

The last lecture will cover exercises based on the topics above.

Textbooks

Lecture notes are provided; however, these notes are not a substitute for a textbook. The required textbook for this part of the course is:

• Wackerly, D., Mendenhall, W. and Schaeffer, R. (2008). Mathematical Statistics with Applications, 7th ed., Cengage. (Henceforth WMS)

Students who desire a more advanced treatment of the material might want to consider:

• Casella, G. and Berger, R. (2008). Statistical Inference, 2nd ed., Duxbury Press. (Henceforth CB)

• Rice, J. (2006). Mathematical Statistics and Data Analysis, 3rd ed., Cengage. (Henceforth R)

Furthermore, the following books are recommended for students that plan to take further courses in econometrics. The appendices of these books also contain summaries of the material covered in this class.

• Greene, W. (2011). Econometric Analysis, 7th ed., Prentice-Hall. (Henceforth G)

• Verbeek, M. (2012). A Guide to Modern Econometrics, 4th ed., Wiley. (Henceforth V)


Online Resources

The primary resources for this part of the course are contained in this syllabus. However, further resources can be found online at either www.ems.bbk.ac.uk/for_students/presess/ or the course page on the virtual learning environment Moodle (log in via moodle.bbk.ac.uk/).

Instructor

The instructor for this part of the course is

• Sotiris Migkos, [email protected]


Chapter 1

Sampling Distributions

Literature

Required Reading

• WMS, Chapter 7.1 – 7.2

Recommended Further Reading

• WMS, Chapters 4 and 12

• CB, Chapter 5.1 – 5.4

• R, Chapters 6 – 7

1.1 Introduction

A statistical investigation normally starts with some measures of interest of a distribution. The totality of elements about which some information is desired is called a population. Often we only use a small proportion of a population, known as a sample, because it is impractical to gather data on the whole population. We measure the attributes of this sample and draw conclusions or make policy decisions based on the data obtained. That is, with statistical inference we estimate the unknown parameters underlying the statistical distributions of the sample. We can then measure their precision, test hypotheses about them, and use them to generate forecasts.

Definition 1.1 Population
A population (of size N), x_1, x_2, . . . , x_N, is the totality of elements that we are interested in. The numerical characteristics of a population are called parameters. Parameters are often denoted by Greek letters such as θ.

Definition 1.2 Sample
A sample (of size n) is a set of random variables, X_1, X_2, . . . , X_n, that are drawn from the population. The realization of the sample is denoted by x_1, . . . , x_n.


The method of sampling, known sometimes as the design of the experiment, will affect the structure of the data that you measure, and thus the amount of information and the likelihood of observing a certain sample outcome. The type of sample you collect may have profound effects on the way you can make inferences based on that sample. For the moment we will concern ourselves only with the most basic of sampling methods: simple random sampling.

Definition 1.3 Random Sample
The random variables X_1, · · · , X_n are called a random sample of size n from the population f(x) if X_1, · · · , X_n are mutually independent random variables and the marginal pdf or pmf of each X_i is the same function f(x). Alternatively, X_1, · · · , X_n are called independent and identically distributed variables with pdf or pmf f(x). This is commonly abbreviated to iid random variables.

The joint density of the realized x_i's in a random sample has the form:

    f(x_1, x_2, · · · , x_n) = ∏_{i=1}^n f_{X_i}(x_i)   (by independence)
                             = ∏_{i=1}^n f(x_i)         (by identicality).   (1.1)

Of course, in economics and finance one normally does not have much control over how the data are collected, and the data at hand are often time-series data, which are in most cases neither independent nor identically distributed. Although addressing these issues is extremely important in empirical analysis, this course will ignore such considerations to focus on the basic issues.

1.2 Sampling Distributions

When drawing a sample from a population, a researcher is normally interested in reducing the data into some summary measures. Any well-defined measure may be expressed as a function of the realized values of the sample. As the function will be based on a vector of random variables, the function itself, called a statistic, will be a random variable as well.

Definition 1.4 Statistic and Sampling Distribution
Let X_1, . . . , X_n be a sample of size n and T(x_1, . . . , x_n) be a real-valued or vector-valued function whose domain includes the sample space of (X_1, . . . , X_n) and that does not depend on any unknown parameters. Then the random variable T(X_1, . . . , X_n) is called a statistic.

The probability distribution of a statistic is called the sampling distribution of that statistic.

The analysis of these statistics and their sampling distributions is at the very core of econometrics. As the definition of a statistic is very broad, it can include a wide range of different measures. The two most common statistics are probably the sample mean X̄ and the sample variance S². Other examples include order statistics such as the smallest observation in the sample, X_(1), the largest observation in the sample, X_(n), and the median, X_(n/2); correlations, Corr(X, Y), and covariances, Cov(X, Y), between two sequences of random variables are also common statistics. Statistics do not need to be scalar, but may also be vector-valued, returning for instance all the unique values observed in the sample or all the order statistics of the sample.


Note the important difference between the sampling distribution, which is the probability distribution of the statistic T(x_1, . . . , x_n), and the distribution of the population, which is the marginal distribution of each X_i.

The following two sections consider the sampling distributions of the two most important statistics, the sample mean and the sample variance, on the assumption that the sample is drawn from a normal population. The main features of these sampling distributions are summarized by the following theorem.

Theorem 1.1 The sample mean and the sample variance of a random normal sample have the following three characteristics:

1. E[X̄] = µ, and X̄ has the sampling distribution X̄ ∼ N(µ, σ²/n),

2. E[S²] = σ², and S² has the sampling distribution (n − 1)S²/σ² ∼ χ²_{n−1},

3. X̄ and S² are independent random variables.

As some of the most common statistics, such as the sample mean and the sample total, are linear combinations of the individual sample points, the following theorem is of great value in determining the sampling distribution of statistics.

Theorem 1.2 If X_1, . . . , X_n are random variables with defined means, E(X_i) = µ_i, and defined variances, Var(X_i) = σ_i², then a linear combination of those random variables,

    Z = a + Σ_{i=1}^n b_i X_i,   (1.2)

will have the following mean and variance:

    E(Z) = a + Σ_{i=1}^n b_i E(X_i),   (1.3)

    Var(Z) = Σ_{i=1}^n Σ_{j=1}^n b_i b_j Cov(X_i, X_j).   (1.4)

If the X_i are independent, the variance reduces to

    Var(Z) = Σ_{i=1}^n b_i² Var(X_i).   (1.5)

Sample Mean

Corollary 1.1 If X_1, . . . , X_n is a random sample drawn from a population with mean µ and variance σ², then, using Theorem 1.2, it can be shown that the mean of this sample,

    X̄_n = n⁻¹ Σ_{i=1}^n X_i,   (1.6)

will have expectation

    E(X̄_n) = µ,   (1.7)

and variance

    Var(X̄_n) = σ²_{X̄_n} = σ²/n.   (1.8)

Let us consider what a sampling distribution may look like. As an example, take the case of the sample mean X̄_n of a random sample drawn from a normally distributed population. Combined with the knowledge that linear combinations of normal variates are also normally distributed, the sampling distribution of X̄_n will be equal to

    X̄_n ∼ N(µ, σ²/n).   (1.9)

We can now go one step further and calculate the standardized sample mean. Subtracting the expected value, which is the population mean µ, and dividing by the standard error creates a random variable with a standard normal distribution:

    Z = (X̄_n − µ)/(σ/√n) ∼ N(0, 1).   (1.10)

Of course, in reality one does not generally know σ, in which case it is common practice to replace it with its sample counterpart S, which gives the following sampling distribution:

    T = (X̄_n − µ)/(S/√n) ∼ t_{n−1}.   (1.11)

The details of why the sampling distribution changes from a normal to a t-distribution are discussed in the next section.
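The relationship in equations (1.9)-(1.11) is easy to verify numerically. The following sketch (a minimal Python/NumPy illustration that is not part of the original notes; the population values µ = 5, σ = 2 and the sample size n = 25 are arbitrary choices) simulates many samples, standardizes the sample mean with the known σ and with the estimated S, and compares tail probabilities with the normal and t distributions.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    mu, sigma, n, reps = 5.0, 2.0, 25, 100_000      # arbitrary illustration values

    samples = rng.normal(mu, sigma, size=(reps, n))
    xbar = samples.mean(axis=1)
    s = samples.std(axis=1, ddof=1)                 # sample standard deviation S

    z = (xbar - mu) / (sigma / np.sqrt(n))          # known sigma: standard normal
    t = (xbar - mu) / (s / np.sqrt(n))              # estimated sigma: t with n-1 df

    # Simulated tail probabilities should match N(0,1) and t_{n-1}, respectively.
    print(np.mean(z > 1.96), 1 - stats.norm.cdf(1.96))
    print(np.mean(t > 1.96), 1 - stats.t.cdf(1.96, df=n - 1))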

Sample Variance

Corollary 1.2 If X_1, . . . , X_n is a random sample drawn from a population with mean µ and variance σ², then the sample variance

    S²_n = (n − 1)⁻¹ Σ_{i=1}^n (X_i − X̄_n)²,   (1.12)

will have the following expectation:

    E(S²) = σ².   (1.13)

Note that to calculate the sample variance we divide by n − 1 and not n; a proof of this is provided at the end of this chapter.

If the sample is random and drawn from a normal population, then it can also be shown that the sampling distribution is as follows:

    (n − 1)S²/σ² ∼ χ²_{n−1}.   (1.14)

An intuition for this result is provided in WMS; the proof can be found in, e.g., Casella and Berger, Chapter 5.

Finite Population Correction

As a short digression, notice that if the whole population is sampled, the estimation error of the sample mean will, logically, be equal to zero. Similarly, if a large proportion of the population is sampled without replacement, the standard error calculated above will over-estimate the true standard error. In such cases, the standard error should be adjusted using a so-called finite population correction. Taking the standard error of the sample mean as an example:

    σ_X̄ = √(1 − (n − 1)/(N − 1)) · σ/√n.   (1.15)

When the sampling fraction n/N approaches zero, the correction will approach 1. So for most applications,

    σ_X̄ ≈ σ/√n,   (1.16)

which is the definition of the standard error as given in the previous section. For most samples considered, the sampling fraction will be very small. Thus, the finite population correction will be neglected throughout most of this syllabus.

1.3 Sampling Distributions Derived from the Normal

The normal distribution plays a central role in econometrics and statistics, for reasons that we will explore in more depth in the next chapter. However, there are a number of other distributions that feature as sampling distributions for various (test) statistics. As it turns out, the three most common of these distributions can actually be derived from the normal distribution.

1.3.1 Chi-Square


Definition 1.5 Chi-Square distribution
Let Z_i ∼ iid N(0, 1). The distribution of U = Σ_{i=1}^n Z_i² is called the chi-square (χ²) distribution with n degrees of freedom. This is denoted by χ²_n.

Notice that the definition above implies that if U_1, U_2, . . . , U_n are independent chi-square random variables with 1 degree of freedom, the distribution of V = Σ_i U_i will be a chi-square distribution with n degrees of freedom. Also, for large degrees of freedom n the chi-square distribution converges to a normal distribution, but this convergence is relatively slow.

The moment generating function of a χ²_n distribution is

    M(t) = (1 − 2t)^(−n/2).   (1.17)

This implies that if V_n ∼ χ²_n, then

    E(V_n) = n, and   (1.18)

    Var(V_n) = 2n.   (1.19)

Like the other distributions that are derived from the normal distribution, the chi-square distribution often appears as the distribution of a test statistic, for instance when testing for the joint significance of two (or more) independent normally distributed variables. If Z_a ∼ N(µ_a, σ_a²) and Z_b ∼ N(µ_b, σ_b²), and V is defined as

    V = ((Z_a − µ_a)/σ_a)² + ((Z_b − µ_b)/σ_b)²,   (1.20)

then V ∼ χ²_2 (remember that (Z − µ)/σ ∼ N(0, 1)).

Also, if X_1, X_2, . . . , X_n is a sequence of independent normally distributed variables, then the estimated variance satisfies

    (n − 1)S²/σ² ∼ χ²_{n−1}.   (1.21)

1.3.2 Student-t

Definition 1.6 Student t distribution
Let Z ∼ N(0, 1) and U_n ∼ χ²_n, with Z and U_n independent. Then

    T_n = Z / √(U_n/n)   (1.22)

will have a t distribution with n degrees of freedom, often denoted by t_n.

The mean and variance of a t-distribution with n degrees of freedom are

    E(T_n) = 0, and   (1.23)

    Var(T_n) = n/(n − 2), for n > 2.   (1.24)


Like the normal, the expected value of the t-distribution is 0, and the distribution is symmetric around its mean, implying that f(t) = f(−t). In contrast to the normal, the t distribution has more probability mass in its tails, a property called fat-tailedness. As the degrees of freedom, n, increase, the tails become lighter.

Indeed, in appearance the Student t distribution is very similar to the normal distribution; in fact, in the limit n −→ ∞ the t distribution converges in distribution to a standard normal distribution. Already for values of n as small as 20 or 30, the t distribution is very similar to a standard normal.

Remember that for a random sample drawn from a normal distribution, Z = (X̄ − µ)/(σ/√n) ∼ N(0, 1). However, in reality we do not have information about σ; thus we normally substitute the sample estimate √(S²) = S for σ. Then T = (X̄ − µ)/(S/√n) will have a t-distribution with n − 1 degrees of freedom (a proof of this can be found at the end of the chapter).

1.3.3 F-distribution

Definition 1.7 F distribution
Let U_n ∼ χ²_n and V_m ∼ χ²_m, and let U_n and V_m be independent of each other. Then

    W_{n,m} = (U_n/n) / (V_m/m)   (1.25)

will have an F distribution with n (numerator) and m (denominator) degrees of freedom, often denoted by F_{n,m}.

The mean and variance of an F-distribution with n and m degrees of freedom are

    E(F_{n,m}) = m/(m − 2), for m > 2,   (1.26)

    Var(F_{n,m}) = 2 (m/(m − 2))² (n + m − 2)/(n(m − 4)), for m > 4.   (1.27)

Under specific circumstances, the F distribution reduces to either a t or a χ² distribution. In particular,

    F_{1,m} = t²_m,   (1.28)

and, as m −→ ∞,

    n F_{n,m} →d χ²_n.   (1.29)

The F-distribution often appears when investigating variances. Recall that the standardized variance of a normal sample has a chi-square distribution. Hence the ratio of two variances of independent samples can be expressed as an F-distribution:

    ([n S_1²/σ_1²]/n) / ([m S_2²/σ_2²]/m) = (U_n/n)/(V_m/m) = F_{n,m},

where the degrees of freedom are a function of the two sample sizes: n = n_1 − 1 and m = n_2 − 1.
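The variance-ratio result above lends itself to the same kind of numerical check. In the sketch below (illustrative sample sizes n_1 = 11 and n_2 = 16, equal population variances), the ratio S_1²/S_2² is compared with the F distribution with n_1 − 1 and n_2 − 1 degrees of freedom.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n1, n2, sigma, reps = 11, 16, 2.0, 200_000      # arbitrary illustration values

    x = rng.normal(0.0, sigma, size=(reps, n1))
    y = rng.normal(0.0, sigma, size=(reps, n2))
    ratio = x.var(axis=1, ddof=1) / y.var(axis=1, ddof=1)   # S1^2 / S2^2

    # With equal population variances the ratio follows F with (n1-1, n2-1) df.
    print(np.quantile(ratio, 0.95), stats.f.ppf(0.95, dfn=n1 - 1, dfd=n2 - 1))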


Problems

1. The fill of a bottle of soda, dispensed by a certain machine, is normally distributed with mean µ = 100 and variance σ² = 9 (measured in centiliters).

(a) Calculate the probability that a single bottle of soda contains less than 98 cl.

(b) Calculate the probability that a random sample of 9 soda bottles contains, on average, less than 98 cl.

(c) How does your answer in (b) change if the variance of 9 was an estimate (i.e. S² = 9), rather than a population parameter?

2. Let X_1, X_2, . . . , X_m and Y_1, Y_2, . . . , Y_n be two normally distributed independent random samples, with X_i ∼ N(µ_1, σ_1²) and Y_i ∼ N(µ_2, σ_2²). Suppose that µ_1 = µ_2 = 10, σ_1² = 2, σ_2² = 2.5, and m = n.

(a) Find E(X̄) and Var(X̄).

(b) Find E(X̄ − Ȳ) and Var(X̄ − Ȳ).

(c) Find the sample size n such that σ_(X̄−Ȳ) = 0.1.

3. Let S_1² and S_2² be the sample variances of two random samples drawn from a normal population with population variance σ² = 15. Let the sample size be n = 11.

(a) Find a such that Pr[S_1² ≤ a] = 0.95.

(b) Find b such that Pr[S_1²/S_2² ≤ b] = 0.95.

4. Let Z_1, Z_2, Z_3, Z_4 be a sequence of independent standard normal variables. Derive distributions for the following random variables.

(a) X_1 = Z_1 + Z_2 + Z_3 + Z_4.

(b) X_2 = Z_1² + Z_2² + Z_3² + Z_4².

(c) X_3 = Z_1² / [(Z_2² + Z_3² + Z_4²)/3].

(d) X_4 = Z_1 / √[(Z_2² + Z_3² + Z_4²)/3].

Proofs

Proof 1.1 To prove that the sample variance S² has expectation σ², note that

    S²_n = (1/(n − 1)) Σ_{i=1}^n (X_i − X̄_n)² = (1/(n − 1)) ( Σ_{i=1}^n X_i² − n X̄_n² ).

Therefore, by taking expectations we get

    E(S²_n) = E[ (1/(n − 1)) ( Σ_{i=1}^n X_i² − n X̄_n² ) ] = (1/(n − 1)) ( Σ_{i=1}^n E[X_i²] − n E[X̄_n²] ).

Recall that Var(Z) = E(Z²) − E(Z)², so

    E(X_i²) = σ² + µ²  and  E(X̄_n²) = n⁻¹σ² + µ².

Substituting gives

    E(S²_n) = (1/(n − 1)) [ n(σ² + µ²) − n(n⁻¹σ² + µ²) ] = ((n − 1)/(n − 1)) σ² = σ².

Proof 1.2 To prove T = (X̄ − µ)/(S/√n) ∼ t_{n−1}, rewrite T:

    (X̄ − µ)/(S/√n) = [ (X̄ − µ)/(σ/√n) ] / [ (S/√n)/(σ/√n) ]
                    = Z / (S/σ)
                    = Z / √(S²/σ²)
                    = Z / √( [(n − 1)S²/σ²] / (n − 1) )
                    = Z / √( U_{n−1}/(n − 1) ),

where

    Z = (X̄ − µ)/(σ/√n) ∼ N(0, 1)  and  U_{n−1} = (n − 1)S²/σ² ∼ χ²_{n−1}.

Thus

    (X̄ − µ)/(S/√n) ∼ t_{n−1}.


Chapter 2

Large Sample Theory

Literature

Required Reading

• WMS, Chapter 7.3 – 7.6

Recommended Further Reading

• CB, Chapter 5.5

• R, Chapter 5

2.1 Law of Large Numbers

In many situations it is not possible to derive exact distributions of statistics with the use of a random sample of observations. This problem disappears, in most cases, if the sample size is large, because we can then derive an approximate distribution. Hence the need for large sample, or asymptotic, distribution theory. Two of the main results of large sample theory are the Law of Large Numbers (LLN), discussed in this section, and the Central Limit Theorem, described in the next section.

As large sample theory builds heavily on the notion of limits, let us first define what they are.

Definition 2.1 Limit of a sequence
Suppose a_1, a_2, . . . , a_n constitute a sequence of real numbers. If there exists a real number a such that for every real ǫ > 0 there exists an integer N(ǫ) with the property that for all n > N(ǫ) we have |a_n − a| < ǫ, then we say that a is the limit of the sequence {a_n} and write lim_{n−→∞} a_n = a.

Intuitively, if a_n lies in an ǫ-neighborhood of a, (a − ǫ, a + ǫ), for all n > N(ǫ), then a is said to be the limit of the sequence {a_n}. Examples of limits are

    lim_{n−→∞} [1 + (1/n)] = 1, and   (2.1)

    lim_{n−→∞} [1 + (a/n)]^n = e^a.   (2.2)

The notion of convergence is easily extended to that of a function f (x).


Definition 2.2 Limit of a function
The function f(x) has the limit A at the point x_0 if for every ǫ > 0 there exists a δ(ǫ) > 0 such that |f(x) − A| < ǫ whenever 0 < |x − x_0| < δ(ǫ).

One of the core principles in statistics is that a sample estimator will converge to the 'true' value when the sample gets larger. For instance, if a coin is flipped enough times, the proportion of times it comes up tails should get very close to 0.5. The Law of Large Numbers is a formalization of this notion.

Weak Law of Large Numbers

The concept of convergence in probability can be used to show that, under very general conditions, the sample mean converges to the population mean, a result that is known as the Weak Law of Large Numbers (WLLN). This property of convergence is also referred to as consistency, which will be treated in more detail in the next chapter.

Theorem 2.1 (Weak Law of Large Numbers)
Let X_1, X_2, . . . , X_n be iid random variables with E(X_i) = µ and Var(X_i) = σ² < ∞. Define X̄_n = n⁻¹ Σ_{i=1}^n X_i. Then for every ǫ > 0,

    lim_{n−→∞} Pr(|X̄_n − µ| < ǫ) = 1;

that is, X̄_n converges in probability to µ.
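The convergence claimed by the theorem is easy to see in a small simulation. The sketch below (Python; a fair-coin population is an arbitrary choice echoing the coin-flip motivation above) tracks how the running sample mean settles around µ = 0.5 as n grows.

    import numpy as np

    rng = np.random.default_rng(3)
    p = 0.5                                         # population mean of a fair coin
    flips = rng.binomial(1, p, size=100_000)
    running_mean = np.cumsum(flips) / np.arange(1, flips.size + 1)

    # The deviation |X̄_n - µ| shrinks as n grows, as the WLLN predicts.
    for n in (10, 100, 1_000, 10_000, 100_000):
        print(n, abs(running_mean[n - 1] - p))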

As stated, the weak law of large numbers relies on the notion of convergence in probability. This type of convergence is relatively weak and so normally not too hard to verify.

Definition 2.3 Convergence in Probability
If

    lim_{n−→∞} Pr[|X_n − x| ≥ ǫ] = 0 for all ǫ > 0,

the sequence of random variables X_n is said to converge in probability to the real number x. We write

    X_n →p x  or  plim X_n = x.

Convergence in probability implies that it becomes less and less likely that the random variable (X_n − x) lies outside the interval (−ǫ, +ǫ) as the sample size gets larger and larger. There exist different equivalent definitions of convergence in probability; some are given below:

1. lim_{n−→∞} Pr[|X_n − x| < ǫ] = 1, for all ǫ > 0.

2. Given ǫ > 0 and δ > 0, there exists N(ǫ, δ) such that Pr[|X_n − x| > ǫ] < δ for all n > N.

3. Pr[|X_n − x| < ǫ] > 1 − δ for all n > N; that is, Pr[|X_{N+1} − x| < ǫ] > 1 − δ, Pr[|X_{N+2} − x| < ǫ] > 1 − δ, and so on.


Theorem 2.2 If X_n →p X and Y_n →p Y, then

(a) (X_n + Y_n) →p (X + Y),

(b) (X_n Y_n) →p XY, and

(c) (X_n/Y_n) →p X/Y (if Y_n, Y ≠ 0).

Theorem 2.3 If g(·) is a continuous function, then X_n →p X implies that g(X_n) →p g(X). In other words, convergence in probability is preserved under continuous transformations.

Strong Law of Large Numbers

Like in the case of convergence in probability, almost sure convergence can be used to prove the convergence (almost surely) of the sample mean to the population mean. This stronger result is known as the Strong Law of Large Numbers (SLLN).

Definition 2.4 Almost Sure Convergence
If

    Pr[ lim_{n−→∞} X_n = x ] = 1,

the sequence of random variables X_n is said to converge almost surely to the real number x, written as

    X_n →a.s. x.

In other words, almost sure convergence implies that the sequence X_n may not converge everywhere to x, but the points where it does not converge form a set of measure zero in the probability sense. More formally, given ǫ and δ > 0, there exists N such that Pr[|X_{N+1} − x| < ǫ, |X_{N+2} − x| < ǫ, . . .] > (1 − δ); that is, the probability of these events jointly occurring can be made arbitrarily close to 1. X_n is said to converge almost surely to the random variable X if (X_n − X) →a.s. 0.

Do not be fooled by the similarity between the definitions of almost sure convergence and convergence in probability. Although they look the same, convergence in probability is much weaker than almost sure convergence. For almost sure convergence to happen, the X_n must converge for all points in the sample space (that have a strictly positive probability). For convergence in probability all that is needed is for the likelihood of convergence to increase as the sequence gets larger.

Theorem 2.4 (Strong Law of Large Numbers)
Let X_1, X_2, . . . , X_n be iid random variables with

    E(X_i) = µ and Var(X_i) = σ² < ∞.

Define X̄_n = n⁻¹ Σ_{i=1}^n X_i. Then for every ǫ > 0,

    Pr[ lim_{n−→∞} |X̄_n − µ| < ǫ ] = 1;   (2.3)

that is, the strong law of large numbers states that X̄_n converges almost surely to µ:

    (X̄_n − µ) →a.s. 0.   (2.4)

The SLLN applies under fairly general conditions; some sufficient cases are outlined below.

Theorem 2.5 If the X's are iid, then a necessary and sufficient condition for (X̄_n − µ) →a.s. 0 is that E|X_i − µ| < ∞ for all i.

Theorem 2.6 (Kolmogorov's Theorem on SLLN) If the X's are independent (but not necessarily identical) with finite variances, and if Σ_{n=1}^∞ Var(X_n)/n² < ∞, then (X̄_n − E X̄_n) →a.s. 0.

A third form of point-wise convergence is the concept of convergence in mean.

Definition 2.5 Convergence in Mean (r)
The sequence of random variables X_n is said to converge in mean of order r to x (r ≥ 1), designated X_n →(r) x, if E[|X_n − x|^r] exists and lim_{n−→∞} E[|X_n − x|^r] = 0, that is, if the r-th moment of the difference tends to zero. The most commonly used version is mean squared convergence, which is when r = 2.

For example, the sample mean X̄_n converges in mean square to µ, because Var(X̄_n) = E[(X̄_n − µ)²] = σ²/n tends to zero as n goes to infinity. Like almost sure convergence, convergence in mean (r) is a stronger concept than convergence in probability.

2.2 The Central Limit Theorem

Perhaps the most important theorem in large sample theory is the central limit theorem, which implies, under quite general conditions, that the standardized mean of a sequence of random variables (for example the sample mean) converges in distribution to a standard normal distribution, even though the population is not normal. Thus, even if we do not know the statistical distribution of the population from which a sample is drawn, we can approximate the distribution of the sample mean quite well by the normal distribution when the sample is large.

In order to establish this result, we rely on the concept of convergence in distribution.

Definition 2.6 Convergence in Distribution
Let {X_n} be a sequence of random variables whose CDFs are F_n(x), and let the CDF F_X(x) correspond to the random variable X. We say that X_n converges in distribution to X if

    lim_{n−→∞} F_n(x) = F_X(x)

at all points x at which F_X(x) is continuous. This can be written as

    X_n →d X.

Sometimes, convergence in distribution is also referred to as convergence in law.

Intuitively, convergence in distribution occurs when the distribution of X_n comes closer and closer to that of X as n increases indefinitely. Thus, F_X(x) can be taken to be an approximation to the distribution of X_n when n is large. The following relations hold for convergence in distribution:

Theorem 2.7 If X_n →d X and Y_n →p c, where c is a non-zero constant, then

(a) (X_n + Y_n) →d (X + c), and

(b) (X_n/Y_n) →d (X/c).

Using the definition of convergence in distribution we can now formally introduce one version of the Central Limit Theorem.

Theorem 2.8 (Central Limit Theorem) Let X_1, X_2, . . . , X_n be iid random variables with mean E(X_i) = µ and a finite variance σ² < ∞. Define the standardized sample mean

    Z_n = (X̄_n − E(X̄_n)) / √(Var(X̄_n)).

Then, under a variety of alternative assumptions,

    Z_n →d N(0, 1).   (2.5)
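To see the theorem at work for a clearly non-normal population, the sketch below (an illustrative Python example; the exponential population and the sample size n = 50 are arbitrary choices) standardizes the sample means of many samples and compares a tail probability with that of a standard normal.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    n, reps = 50, 100_000                           # arbitrary illustration values

    # Exponential population with mean 1 and variance 1 (clearly non-normal).
    x = rng.exponential(1.0, size=(reps, n))
    z = (x.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n)) # standardized sample mean Z_n

    # The tail probability of Z_n should be close to that of N(0, 1).
    print(np.mean(z > 1.645), 1 - stats.norm.cdf(1.645))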

2.3 The Normal Approximation to the Binomial Distribution

The Bernoulli Distribution

The Bernoulli distribution is a binary distribution, with only two possible outcomes: success (X = 1) with probability p and failure (X = 0) with probability q = 1 − p. The probability density of a Bernoulli is

    Pr(X = x | p) = p^x (1 − p)^(1−x),   (2.6)

for x = 0, 1 (failure, success) and 0 ≤ p ≤ 1. The mean and variance of a Bernoulli distribution are given as:

    E(X) = p,   (2.7)

    Var(X) = p(1 − p) = pq.   (2.8)


The Binomial Distribution

The Binomial distribution can be thought of as the distribution of the number of successes in a sequence of n iid Bernoulli trials:

    Pr(X = x | n, p) = C(n, x) p^x (1 − p)^(n−x) = [n!/(x!(n − x)!)] p^x (1 − p)^(n−x),   (2.9)

for x = 0, 1, . . . , n (X is the number of successes in n trials) and 0 ≤ p ≤ 1. The mean and variance of a binomial distribution are given as:

    E(X) = np,   (2.10)

    Var(X) = npq.   (2.11)

Example 2.1 Assume a student is given a test with 10 true-false questions. Also assume that the student is totally unprepared for the test and guesses the answer to every question. What is the probability that the student will answer 7 or more questions correctly?

Let X be the number of questions answered correctly. The test represents a binomial experiment with n = 10 and p = 1/2, so X ∼ Bin(n = 10, p = 1/2).

    Pr(X ≥ 7) = Pr(X = 7) + Pr(X = 8) + Pr(X = 9) + Pr(X = 10)
              = Σ_{k=7}^{10} C(10, k) (1/2)^k (1/2)^(10−k)
              = Σ_{k=7}^{10} C(10, k) (1/2)^10
              = 0.172.
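The exact probability in Example 2.1 can be reproduced in a couple of lines; the sketch below simply sums the binomial pmf (the use of scipy here is a convenience assumption, not something the notes rely on).

    from scipy import stats

    # P(X >= 7) for X ~ Bin(n = 10, p = 1/2), as in Example 2.1.
    n, p = 10, 0.5
    exact = sum(stats.binom.pmf(k, n, p) for k in range(7, 11))
    print(round(exact, 3))                          # 0.172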

The Normal Approximation

For a large sample size n and number of successes k, it becomes cumbersome to calculate the exact probabilities of the binomial. However, we can obtain approximate probabilities by invoking the CLT.

As stated before, a Binomial(n, p) random variable can be thought of as the sum of n independent Bernoulli trials, each with success probability p. Consequently, when n is large, the sample average of the Bernoulli trials,

    (1/n) Σ_{i=1}^n X_i = X̄,

will be approximately normal with mean E(X̄) = p and variance Var(X̄) = p(1 − p)/n. Thus, approximately,

    (X̄ − p) / √(p(1 − p)/n) ∼ N(0, 1).

Even for fairly low values of n and k the normal approximation is surprisingly accurate. Wackerly et al. provide the useful rule of thumb that the approximation is adequate if

    n > 9 × (larger of p and q)/(smaller of p and q).   (2.12)


Example 2.2 Consider again the student from Example 2.1. What would be the approximate probability?

    Pr(X ≥ 7) = Pr(X/10 ≥ 0.7).

Define x̄ = X/10. Then

    Pr(x̄ ≥ 0.7) = Pr( (x̄ − p)/√(p(1 − p)/n) ≥ (0.7 − p)/√(p(1 − p)/n) )
                = Pr( Z ≥ 0.2/√0.025 )
                = Pr(Z ≥ 1.26)
                = 0.104.

If we compare the approximate probability of 0.104 with the exact probability of 0.172 from the previous exercise, it becomes clear that there may be a substantial approximation error. However, as n gets larger, this approximation error becomes progressively smaller.
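The comparison between the exact and approximate probabilities can also be scripted. The sketch below reproduces the two numbers from Examples 2.1 and 2.2 and then repeats the calculation for a larger, purely illustrative n to show the approximation error shrinking.

    import numpy as np
    from scipy import stats

    def exact_and_approx(n, p, threshold):
        """P(X >= threshold) exactly and via the CLT approximation of the sample proportion."""
        exact = stats.binom.sf(threshold - 1, n, p)             # exact P(X >= threshold)
        z = (threshold / n - p) / np.sqrt(p * (1 - p) / n)
        approx = stats.norm.sf(z)                               # P(Z >= z)
        return exact, approx

    print(exact_and_approx(10, 0.5, 7))    # roughly (0.172, 0.103)
    print(exact_and_approx(100, 0.5, 70))  # same proportion of successes, much smaller error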

Problems

1. Let X_1, X_2, . . . , X_n be an independent sample (i.e. independent but not necessarily identically distributed), with E(X_i) = µ_i and Var(X_i) = σ_i². Also, let n⁻¹ Σ_{i=1}^n µ_i −→ µ.

Show that if n⁻² Σ_{i=1}^n σ_i² −→ 0, then X̄ −→ µ in probability.

2. The service times for customers coming through a checkout counter in a retail store are independent random variables with mean 1.5 minutes and variance 1.0. Use the CLT to approximate the probability that 100 customers can be serviced in less than 2 hours of total service time.

3. Suppose that a measurement has mean µ and variance σ² = 25. Let X̄ be the average of n such independent measurements. If we are interested in measuring the sample mean with a degree of precision such that 95% of the time the sample mean lies within 1.5 units (in the absolute sense) from the true population mean, how large should we make our sample size? In other words, how large should n be so that Pr(|X̄ − µ| < 1.5) = 0.95?

Proofs


Proof 2.1 (Weak Law of Large Numbers) The Weak Law of Large Numbers can be proven by use of Chebychev's Inequality:

    Pr[g(X) ≥ ǫ] ≤ E[g(X)]/ǫ,   ǫ > 0.

For instance, let g(X) be |X − E(X)|; in this case Chebychev's inequality reduces to

    Pr[|X − E(X)| ≥ ǫ] ≤ E[|X − E(X)|]/ǫ.

Using Chebychev's inequality, for every ǫ > 0 we have

    Pr[|X̄ − E(X̄)| ≥ ǫ] ≤ E[(X̄ − µ)²]/ǫ²,

with

    E[(X̄ − µ)²]/ǫ² = Var(X̄)/ǫ² = σ²/(nǫ²).

As lim_{n−→∞} (σ²/(nǫ²)) = 0, we have

    lim_{n−→∞} Pr[|X̄ − E(X̄)| ≥ ǫ] = 0.


Chapter 3

Estimation

Literature

Required Reading

• WMS, Chapters 8 & 9.1 – 9.3

Recommended Further Reading

• R, Sections 8.6 – 8.8

• CB, Chapters 7, 9, & 10.1.

3.1 Introduction

The purpose of statistics is to use the information contained in a sample to make inferences about the parameters of the population that the sample is taken from. The key to making good inferences about the parameters is to have a good estimation procedure that produces good estimates of the quantities of interest.

Definition 3.1 Estimator
An estimator is a rule for calculating an estimate of a target parameter based on the information from a sample. To indicate the link between an estimator and its target parameter, say θ, the estimator is normally denoted by adding a hat: θ̂.

A point estimation procedure uses the information in the sample to arrive at a single number that is intended to be close to the true value of the target parameter in the population. For example, the sample mean

    X̄ = (Σ_{i=1}^n X_i)/n   (3.1)

is one possible point estimator of the population mean µ. There may be more than one estimator for a population parameter. The sample median, X_(n/2), for example, might be another estimator for the population mean. Alternatively one might provide a range of values as estimates for the mean, for example the range from 0.10 to 0.35. This case is referred to as interval estimation.


3.2 Evaluation Criteria for Estimators

As there are often multiple point estimators available for any given parameter, it is important to develop some evaluation criteria to judge the performance of each estimator and compare their relative effectiveness. The three most important criteria used in economics and finance are: unbiasedness, efficiency, and consistency.

Unbiasedness

Definition 3.2 Unbiasedness
An estimator θ̂ is called an unbiased estimator of θ if E(θ̂) = θ. The bias of an estimator is given by b(θ̂) = E(θ̂) − θ.

Definition 3.3 Asymptotic Unbiasedness
If an estimator θ̂_n has the property that its bias, E(θ̂_n) − θ, tends to zero as the sample size increases, then it is said to be asymptotically unbiased.

Efficiency

Definition 3.4 Mean Square Error (MSE)
A commonly used measure of the adequacy of an estimator is E[(θ̂ − θ)²], which is called the mean square error (MSE). It is a measure of how close θ̂ is, on average, to the true θ. The MSE can be decomposed into two parts:

    MSE = E[(θ̂ − θ)²]
        = E[(θ̂ − E(θ̂) + E(θ̂) − θ)²]
        = Var(θ̂) + bias²(θ̂).   (3.2)

Definition 3.5 Relative Efficiency
Let θ̂_1 and θ̂_2 be two alternative estimators of θ. Then the ratio of the respective MSEs, E[(θ̂_1 − θ)²]/E[(θ̂_2 − θ)²], is called the relative efficiency of θ̂_1 with respect to θ̂_2.

Consistency

Definition 3.6 Consistency
An estimator θ̂ is consistent if the sequence θ̂_n converges to θ in the limit, i.e. θ̂_n → θ.

There are different types of consistency, corresponding to different versions of the law of large numbers. Examples are:

1. θ̂_n →p θ (Weak Consistency)

2. θ̂_n →(2) θ (Squared-error Consistency)

3. θ̂_n →a.s. θ (Strong Consistency)

A sufficient condition for weak consistency is that

1. The estimator is asymptotically unbiased

2. The variance of the estimator goes to zero as n → ∞.

3.3 Confidence Intervals

An interval estimator is an estimation rule that specifies two numbers that form the endpoints of an interval, θ̂_L and θ̂_H. A good interval estimator is chosen such that (i) it will contain the target parameter θ most of the time and (ii) the interval chosen is as small as possible. Of course, as the estimators are random variables, one or both of the endpoints of the interval will vary from sample to sample, so one cannot guarantee with certainty that the parameter will lie inside the interval, but we can be fairly confident; as such, interval estimators are often referred to as confidence intervals. The probability (1 − α) that θ will lie in the confidence interval is called the confidence level, and the upper and lower endpoints are called, respectively, the upper and lower confidence limits.

Definition 3.7 Confidence Interval
Let θ̂_L and θ̂_H be interval estimators of θ s.t. Pr(θ̂_L ≤ θ ≤ θ̂_H) = 1 − α; then the interval [θ̂_L, θ̂_H] is called the two-sided (1 − α) × 100% confidence interval. Normally the interval is chosen such that a probability of α/2 falls outside the confidence interval on each side.

In addition to two-sided confidence intervals it is also possible to form single-sided confidence intervals. If θ̂_L is chosen s.t.

    Pr(θ̂_L ≤ θ) = 1 − α,

then the interval [θ̂_L, ∞) is the lower confidence interval. Additionally, if θ̂_H is chosen such that

    Pr(θ ≤ θ̂_H) = 1 − α,

the interval (−∞, θ̂_H] is the upper confidence interval.

Pivotal Method

A useful method for finding the endpoints of confidence intervals is the pivotal method, which relies on finding a pivotal quantity.

Definition 3.8 Pivotal Quantity
The random variable Q = q(X_1, . . . , X_n; θ) is said to be a pivotal quantity if the distribution of Q does not depend on θ.

For example, for a random sample drawn from N(µ, 1) the random variable Q = (X̄ − µ)/(1/√n) is a pivotal quantity, since Q ∼ N(0, 1). For the more general case of a random sample drawn from N(µ, σ²), the pivotal quantity associated with µ̂ will be Q = (X̄ − µ)/(S/√n), where S is the sample estimate of the standard deviation, as Q ∼ t_{n−1}.


Pr(q_1 ≤ Q ≤ q_2) is unaffected by a change of scale or a translation of Q. That is, if

    Pr(q_1 ≤ Q ≤ q_2) = (1 − α),   (3.3)

then, for constants a and b > 0,

    Pr(a + b q_1 ≤ a + b Q ≤ a + b q_2) = (1 − α).   (3.4)

Thus, if we know the pdf of Q, it may be possible to use the operations of addition and multiplication to find the desired confidence interval. Let us take as an example a sample drawn from a normal population with known variance (σ² = 1). To build a confidence interval around the mean, the pivotal quantity of interest is

    Q = (X̄ − µ)/(1/√n) ∼ N(0, 1),   where X̄ ∼ N(µ, 1/n).   (3.5)

To find the confidence limits µ̂_L and µ̂_H s.t.

    Pr(µ̂_L ≤ µ ≤ µ̂_H) = 1 − α,   (3.6)

we start by finding the confidence limits q_1 and q_2 of our pivotal quantity s.t.

    Pr( q_1 ≤ (x̄ − µ)/(1/√n) ≤ q_2 ) = 1 − α.   (3.7)

After we have found q_1 and q_2, we can manipulate the probability to find expressions for µ̂_L and µ̂_H:

    Pr( q_1 ≤ (x̄ − µ)/(1/√n) ≤ q_2 ) = Pr( (1/√n) q_1 ≤ x̄ − µ ≤ (1/√n) q_2 )
                                      = Pr( (1/√n) q_1 − x̄ ≤ −µ ≤ (1/√n) q_2 − x̄ )
                                      = Pr( x̄ − (1/√n) q_2 ≤ µ ≤ x̄ − (1/√n) q_1 ).   (3.8)

So,

    µ̂_L = x̄ − (1/√n) q_2,   (3.9)

    µ̂_H = x̄ − (1/√n) q_1,   (3.10)

and

    [ x̄ − (1/√n) q_2, x̄ − (1/√n) q_1 ]   (3.11)

is the (1 − α)100% confidence interval for µ.

Constructing Confidence Intervals

Confidence Intervals for the Mean of a Normal Population

Consider the case of a sample drawn from a normal population where both µ and σ² are unknown. We know that

    Q = (x̄ − µ)/(S/√n) ∼ t_{(n−1)}.   (3.12)

As the distribution of Q does not depend on any unknown parameters, Q is a pivotal quantity.

We start by finding the confidence limits q_1 and q_2 of the pivotal quantity. As the t-distribution is symmetric (just like the normal distribution), we can simplify the problem somewhat, since it can be shown that q_2 = −q_1 = q. So we need to find a number q s.t.

    Pr( −q ≤ (x̄ − µ)/(s/√n) ≤ q ) = 1 − α,   (3.13)

which reduces to finding q s.t.

    Pr(Q ≥ q) = α/2.   (3.14)

After we have retrieved q = t_{α/2,(n−1)}, we manipulate the quantities inside the probability to find

    Pr( x̄ − q s/√n ≤ µ ≤ x̄ + q s/√n ) = 1 − α,   (3.15)

to obtain the confidence interval

    [ x̄ − t_{α/2,(n−1)} s/√n, x̄ + t_{α/2,(n−1)} s/√n ].   (3.16)

Example 3.1 Consider a sample drawn from a normal population with unknown mean and variance. Let n = 10, x̄ = 3.22, s = 1.17, (1 − α) = 0.95. Filling in the numbers in the formula

    [ x̄ − t_{α/2,(n−1)} s/√n, x̄ + t_{α/2,(n−1)} s/√n ],

the 95% CI for µ equals

    [ 3.22 − (2.262)(1.17)/√10, 3.22 + (2.262)(1.17)/√10 ] = [2.38, 4.06].   (3.17)

Confidence Intervals for the Variance of a Normal Population

To find the confidence interval for the variance of a normal population, we start again by finding an appropriate pivotal quantity. In this case recall that

    Q = (n − 1) S²/σ² ∼ χ²_{(n−1)}.   (3.18)

Note that the distribution of Q does not depend on any unknown parameters, hence Q is a pivotal quantity. Therefore we can find limits q_1 and q_2 such that

    Pr(q_1 ≤ Q ≤ q_2) = 1 − α.   (3.19)

This is slightly more tricky, as the chi-square distribution is not symmetric. It is standard to select the thresholds such that

    Pr(Q ≤ q_1) = Pr(Q ≥ q_2) = α/2.   (3.20)

After retrieving q_1 = χ²_{1−α/2,(n−1)} and q_2 = χ²_{α/2,(n−1)} we manipulate the expression to find

    Pr(q_1 ≤ Q ≤ q_2) = Pr( q_1 ≤ (n − 1)S²/σ² ≤ q_2 )
                      = Pr( (n − 1)S²/q_2 ≤ σ² ≤ (n − 1)S²/q_1 ).   (3.21)

So,

    ( (n − 1)S²/q_2, (n − 1)S²/q_1 )

is a (1 − α)100% CI for σ².

Example 3.2 As in the previous example, let n = 10, x̄ = 3.22, s = 1.17, (1 − α) = 0.95.

The 95 percent CI for σ² is

    [ (n − 1)s²/q_2, (n − 1)s²/q_1 ],

with q_2 = χ²_{0.025,(9)} = 19.02 and q_1 = χ²_{0.975,(9)} = 2.70, so the 95% CI equals

    [ 9 × 1.17²/19.02, 9 × 1.17²/2.70 ] = [0.65, 4.56].   (3.22)
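The interval in Example 3.2 follows the same pattern, with chi-square quantiles in place of the t quantile; again a short sketch using the example's inputs.

    from scipy import stats

    n, s, alpha = 10, 1.17, 0.05                    # values from Example 3.2
    q1 = stats.chi2.ppf(alpha / 2, df=n - 1)        # roughly 2.70
    q2 = stats.chi2.ppf(1 - alpha / 2, df=n - 1)    # roughly 19.02
    print(((n - 1) * s**2 / q2, (n - 1) * s**2 / q1))   # roughly (0.65, 4.56)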

Problems

1. Let X_1, X_2, . . . , X_n be a random sample with mean µ and variance σ². Consider the following estimators:

(i) µ̂_1 = (X_1 + X_n)/2

(ii) µ̂_2 = X_1/4 + (1/2) Σ_{i=2}^{n−1} X_i/(n − 2) + X_n/4

(iii) µ̂_3 = Σ_{i=1}^n X_i/(n + k), where 0 < k ≤ 3

(iv) µ̂_4 = X̄

(a) Explain for each estimator whether it is unbiased and/or consistent.

(b) Find the efficiency of µ̂_1, µ̂_2, and µ̂_3 relative to µ̂_4. Assume n = 36, σ² = 20, µ = 15, and k = 3.

2. Consider the case in which two estimators are available for some parameter θ. Suppose that E(θ̂_1) = E(θ̂_2) = θ, Var(θ̂_1) = σ_1², and Var(θ̂_2) = σ_2². Consider now a third estimator, θ̂_3, defined as

    θ̂_3 = a θ̂_1 + (1 − a) θ̂_2.

How should the constant a be chosen in order to minimise the variance of θ̂_3?

(a) Assume that θ̂_1 and θ̂_2 are independent.

(b) Assume that θ̂_1 and θ̂_2 are not independent but are such that Cov(θ̂_1, θ̂_2) = γ ≠ 0.

3. Consider a random sample drawn from a normal population with unknown mean and variance. You have the following information about the sample: n = 21, x̄ = 10.15, and s = 2.34. Let α = 0.10 throughout this question.

(a) Calculate the (1 − α) two-sided, upper, and lower confidence intervals for µ.

(b) Calculate the (1 − α) two-sided, upper, and lower confidence intervals for σ².

(c) Calculate the (1 − α) two-sided, upper, and lower confidence intervals for σ.


Chapter 4

Hypothesis Testing

Literature

Required Reading

• WMS, Chapter 10

Recommended Further Reading

• R, Sections 9.1 – 9.3

• CB, Chapter 8.

• G, Chapter 5.

4.1 Introduction

Think for a second about a courtroom drama. A defendant is led down the aisle, the prosecution lays out all the evidence, and at the end the judge has to weigh the evidence and make his verdict: innocent or guilty. In many ways a legal trial follows the same logic as a statistical hypothesis test.

The testing of statistical hypotheses on the unknown parameters of a probability model is one of the most important steps of any empirical study. Examples of statistical hypotheses that are tested in economics include

• The comparison of two alternative models,

• The evaluation of the effects of a policy change,

• The testing of the validity of an economic theory.

4.2 The Elements of a Statistical Test

Broadly speaking there are two main approaches to hypothesis testing: the classical approach and the Bayesian approach. The approach followed in this chapter is the classical approach, which is most widely used in econometrics. The classical approach is best described by the Neyman-Pearson


methodology; it can be roughly described as a decision rule that follows the logic: ‘What type of data will lead me to reject the hypothesis?’ A decision rule that selects one of the inferences ‘reject the null hypothesis’ or ‘do not reject the null hypothesis’ is called a statistical test. Any statistical test of hypotheses is composed of the same three essential components:

1. Selecting a null hypothesis,H0, and an alternative hypothesis,H1,

2. Choosing a test statistic,

3. Defining the rejection region.

Null and Alternative Hypotheses

A hypothesis can be thought of as a binary partition of the parameter space Θ into two sets, Θ_0 and Θ_1, such that

    Θ_0 ∩ Θ_1 = ⊘ and Θ_0 ∪ Θ_1 = Θ.   (4.1)

The set Θ_0 is called the null hypothesis, denoted by H_0. The set Θ_1 is called the alternative hypothesis, denoted by H_1 or H_a.

Take as an example a political poll. Let's assume that the current prime minister declares that he has the support of more than half the population and we do not believe him. To test his statement we randomly select 100 voters and ask them if they approve of the prime minister. We can now formulate a null and an alternative hypothesis.

Let the null hypothesis be that the prime minister is correct; in that case the proportion of people supporting the prime minister will be at least 0.5, so

    H_0 : θ ≥ 0.5.   (4.2)

Conversely, if the prime minister is wrong then the alternative is true:

    H_1 : θ < 0.5.   (4.3)

Note that this partitioning of the null and the alternative is done such that there is no value of θ that lies both in the domain of the null and the alternative, and the union of the null and the alternative contains all possible values that θ can take.

Often the null hypothesis in the above case is simplified: we are really only interested in the endpoint of the interval described by the null hypothesis, in this case the point θ = 0.5, so often the null is written instead as

    H_0 : θ = 0.5,   (4.4)

where it is implicit that any value of θ larger than 0.5 is covered by this hypothesis by the way the alternative is formulated.

The above example outlines what is known as a single-sided hypothesis, as the alternative hypothesis lies to one side of the null hypothesis. Alternatively one can specify a two-sided hypothesis such as

    H_0 : θ = 0.5 vs. H_1 : θ ≠ 0.5.   (4.5)

In this case the alternative hypothesis includes values of θ that lie on both sides of the postulated null hypothesis.


Test Statistic

Once the null and alternative hypotheses have been defined, a procedure needs to be developed to decide whether the null hypothesis is a reasonable one. This test procedure usually contains a sample statistic T(x) called the test statistic, which summarizes the ‘evidence’ against the null hypothesis. Generally the test statistic is chosen such that its limiting distribution is known.

Take again the example of the popularity poll of the prime minister. We can exploit (i) the fact that the sample consists of an iid sequence of Bernoulli random variables and (ii) the CLT to show that, approximately,

    θ̂ = X̄ ∼ N( θ, θ(1 − θ)/n ).   (4.6)

If we standardize θ̂ and fill in our hypothesized value θ_0 = 0.5 for θ, we can create the test statistic

    Z(x) = (X̄ − 0.5)/√(0.25/100) ∼ N(0, 1).   (4.7)

Note that Z(x) does not rely on any unknown quantities and its limiting distribution is known.

Rejection Region

After a test statistic T has been selected, the researcher needs to define a range of values of T for which the test procedure recommends the rejection of the null. This range is called the rejection region or the critical region. Conversely, the range of values of T in which the null is not rejected is called the acceptance region. The cut-off point(s) that indicate the boundary between the rejection region and the acceptance region is called the critical value.

Going back to the example of the popularity poll, we could create the protocol: if the test statistic T is lower than the critical value τ_crit = −2, I reject the null H_0 : θ = 0.5 in favour of the alternative H_1 : θ < 0.5. In this case the rejection region consists of the set RR = {t < −2} and the acceptance region of the set AR = {t ≥ −2}.

To find the right critical value is an interesting problem. In the above example, we know that finding any value of θ̂ lower than 0.5 (and hence a test statistic lower than 0) is evidence against the null hypothesis. But how low should we set our threshold exactly? In order to better understand this dilemma, let us first take the decision rule as fixed and evaluate the possible outcomes of our statistical test.

Hopefully our test arrives at the correct conclusion: rejecting the null when it is not true, or not rejecting it when it is indeed true. However, there is the possibility that an erroneous conclusion has been made and one of two types of errors has been committed:

Type I error: Rejecting H_0 when it is true

Type II error: Not rejecting H_0 when it is false

Now that we have identified the two correct outcomes and the two errors we can commit, we can associate probabilities with these events.

Definition 4.1 Size of the test (α)
The probability of rejecting H_0 when it is actually true (i.e. committing a type I error) is called the size of the test. Sometimes it is also called the level of significance of the test. This probability is usually denoted as α.


Table 4.1: Decision outcomes and their associated probabilities

                      H0 rejected            H0 not rejected
    H0 true           α                      (1 − α)
                      Type I error
                      Level / Size
    H0 false          (1 − β)                β
                      Power                  Type II error
                                             Operating Char.

Common sizes that are used in hypothesis testing are α = 0.10, α = 0.05, and α = 0.01.

Definition 4.2 Power of the test (1 − β)
The probability of rejecting H0 when it is false is called the power of the test. This probability is normally denoted as (1 − β).

Definition 4.3 Operating Characteristic (β)
The probability of not rejecting H0 when it is actually false (i.e. committing a type II error) is known as the operating characteristic. This probability is usually denoted as β. This concept is widely used in statistical quality control theory.

Table 4.1 above summarizes these probabilities. Ideally a test is chosen such that both the probability of a type I error, α, and the probability of a type II error, β, are as low as possible. However, practically this is impossible because, given some fixed sample, reducing α increases β: there is a trade-off between the two. The only way to decrease both α and β is to increase the sample size, something that is often not feasible. The classical decision procedure therefore chooses an acceptable value for the level α.

Note that in small samples the empirical size associated with a critical value of a test statistic is often larger than the asymptotic size, because the approximation by the limiting distribution might not yet be very good. Thus if a researcher is not careful he risks choosing a test which rejects the null hypothesis more often than he realizes.

So then how do we select the critical value τcrit after fixing α? Let's consider once more our popularity poll. Recall that the test statistic associated with the hypothesis that θ = 0.5 was (X̄ − 0.5)/√(0.25/100) ∼ N(0, 1). Let's say that we are willing to reject the null hypothesis if there is less than 2.5% probability of committing a type I error, i.e. α = 0.025. Since we know the limiting distribution of T we can find the value τcrit such that

Pr[T < τcrit | θ = 0.5] = α = 0.025. (4.8)

This value can be found by looking up the CDF of a standard normal: Pr(T ≥ τ) = 1 − 0.025 = 0.975; in this case τ = −1.96. In any case, we have now found the relevant critical value, and can define the rejection region as RR = {t < −1.96} and the acceptance region as AR = {t ≥ −1.96}. If we map


the critical value of the test statistic back to a proportion, this translates to θ̂crit = θ0 − 1.96 × se = 0.5 − 1.96 × 0.05 = 0.402; i.e. we can reject the null (θ = 0.5) at the 2.5% level if we find a sample mean lower than 0.402.
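
The critical value and the implied cut-off of 0.402 for the sample proportion can be reproduced numerically; the following is a minimal sketch using scipy's normal quantile function.

    import numpy as np
    from scipy import stats

    alpha = 0.025
    n, theta0 = 100, 0.5
    se0 = np.sqrt(theta0 * (1 - theta0) / n)        # 0.05

    tau_crit = stats.norm.ppf(alpha)                # lower-tail critical value, approx. -1.96
    theta_crit = theta0 + tau_crit * se0            # cut-off on the proportion scale, approx. 0.402
    print(tau_crit, theta_crit)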

If a two-sided test of the form H0 : θ = 0.5 vs. H1 : θ ≠ 0.5 had been considered, the rejection region would have consisted of two parts: RR = {t : t < τl or t > τu}, where for a symmetric distribution like the normal τu = −τl = τ, which reduces the rejection region to RR = {t : |t| > τ}. Using the data from the popularity poll, we can easily construct a two-sided rejection region for the hypothesis H0 : θ = 0.5 vs. H1 : θ ≠ 0.5 at the 5% level by realizing that 5% / 2 = 2.5%. Hence the critical values for the two-sided test will be −1.96 and 1.96, with the associated rejection region RR = {t : |t| > 1.96}.

Example 4.1 Consider the hypothetical example in which a subject is asked to draw, 20 times, a card from a deck of 52 cards and identify, without looking, the suit (hearts, diamonds, clubs, spades). Let T be the number of correct identifications. Let the null hypothesis be that the subject guesses at random, with the alternative being that the person has extrasensory ability (also called ESP). If the maximum level of the test is set at α = 0.05, what should be the decision rule and associated rejection region?

T ∼ binomial(20, 0.25).

Find τ0.05 such that Pr[T ≥ τ0.05 | π = 0.25] ≤ 0.05.

P[T ≥ 8 | π = 0.25] = 0.102 > 0.05 and P[T ≥ 9 | π = 0.25] = 0.041 < 0.05.

Thus the critical value of this test is τ0.05 = 9 and the rejection region equals

RR: t ≥ 9.
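
The search for the critical value in Example 4.1 can be replicated directly from the binomial distribution; a sketch using scipy, where binom.sf(k − 1, n, p) returns Pr[T ≥ k].

    from scipy import stats

    n, p, alpha = 20, 0.25, 0.05
    # find the smallest k with upper-tail probability Pr[T >= k] <= alpha
    k = next(k for k in range(n + 1) if stats.binom.sf(k - 1, n, p) <= alpha)
    print(k)                                  # 9, the critical value
    print(stats.binom.sf(7, n, p))            # Pr[T >= 8], approx. 0.102
    print(stats.binom.sf(8, n, p))            # Pr[T >= 9], approx. 0.041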

Common Large-Sample Tests

Many hypothesis tests are based around test statistics that are approximately normal by virtue of the CLT, such as sample means X̄. We can exploit this fact to construct a test statistic that is commonly encountered in econometrics:

Z = (θ̂ − θ0)/σθ̂ ∼ N(0, 1). (4.9)

The standard error is often replaced with its sample estimate S/√n, which results in the following test statistic

T = (θ̂ − θ0)/(S/√n) ∼ t(n−1), (4.10)

with associated two-sided rejection region

RR: {t : |t| > τα/2} or RR: {θ̂ : θ̂ < θ0 − τα/2 σθ̂ or θ̂ > θ0 + τα/2 σθ̂}. (4.11)
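
Below is a sketch of the statistic in (4.10) for a sample mean, using simulated data. The simulated sample, the hypothesized value θ0 = 0 and the comparison with scipy.stats.ttest_1samp are illustrative assumptions, not part of the notes.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    x = rng.normal(loc=0.3, scale=1.0, size=50)     # hypothetical sample
    theta0 = 0.0                                    # hypothesized value under H0

    n = x.size
    t_manual = (x.mean() - theta0) / (x.std(ddof=1) / np.sqrt(n))   # statistic (4.10)
    res = stats.ttest_1samp(x, popmean=theta0)                      # same statistic, two-sided p-value
    tau = stats.t.ppf(1 - 0.05 / 2, df=n - 1)                       # two-sided critical value at the 5% level
    print(t_manual, res.statistic, abs(t_manual) > tau)             # reject H0 if |t| exceeds tau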


4.3 Duality of Hypothesis Testing and Confidence Intervals

Recall the concept of a (1 − α) two-sided confidence interval [θ̂l, θ̂h] as an interval that contains the true parameter θ with probability (1 − α). Also recall that if the sampling distribution of θ̂ is approximately normal then the (1 − α) confidence interval is given by

θ̂ ± zα/2 σθ̂, (4.12)

with σθ̂ the standard error of the estimator and zα/2 the value such that Pr(Z > zα/2) = α/2. Note the strong similarity between this confidence interval and the test statistic plus associated rejection region of a two-sided hypothesis test described in the previous section. This is no coincidence. Consider again the two-sided rejection region for a test with level α from the previous section: RR: {z : |z| > zα/2}. The complement of the rejection region is the acceptance region AR : {z : |z| ≤ zα/2}, which maps onto the parameter space as follows: do not reject ('accept') the null hypothesis at level α if the estimate lies in the interval

θ0 ± zα/2 σθ̂. (4.13)

Restated, for all θ0 that lie in the interval

θ̂ − zα/2 σθ̂ ≤ θ0 ≤ θ̂ + zα/2 σθ̂, (4.14)

the estimate θ̂ will lie inside the acceptance region and the null hypothesis cannot be rejected at level α. This interval is, as you will notice, exactly equal to the (1 − α) confidence interval outlined above. This is the duality between confidence intervals and hypothesis testing: if the hypothesized value θ0 lies inside the (1 − α) confidence interval, one cannot reject the null hypothesis H0 : θ = θ0 vs. H1 : θ ≠ θ0 at level α; if θ0 does not lie in the confidence interval then the null can be rejected at level α. A similar statement can be made for upper and lower single-sided confidence intervals.

Notice that any value inside the confidence interval would be an 'acceptable' value for the null hypothesis, in the sense that it cannot be rejected with a hypothesis test of level α. This explains why in statistics we usually only talk about rejecting the null vs. not rejecting the null, rather than saying we 'accept' the null. Even if we do not reject the null we recognize that there are probably many other values for θ that would be acceptable, and we should be hesitant to make statements about a single θ being the single true value. Likewise we do not commonly 'accept' the alternative when we reject the null hypothesis, as there are usually many potential values the parameter θ can take under the alternative.
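
The duality can be checked numerically. The sketch below (with an illustrative estimate and standard error, not taken from the text) confirms that a hypothesized value θ0 is rejected by the two-sided level-α test exactly when it falls outside the (1 − α) confidence interval.

    from scipy import stats

    theta_hat, se, alpha = 0.44, 0.05, 0.05          # illustrative estimate and standard error
    z_half = stats.norm.ppf(1 - alpha / 2)           # 1.96

    ci = (theta_hat - z_half * se, theta_hat + z_half * se)   # (1 - alpha) confidence interval

    def reject(theta0):
        z = (theta_hat - theta0) / se
        return abs(z) > z_half                        # two-sided test at level alpha

    for theta0 in (0.40, 0.50, 0.55):
        in_ci = ci[0] <= theta0 <= ci[1]
        print(theta0, "in CI:", in_ci, "rejected:", reject(theta0))   # rejected iff not in CI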

4.4 Attained Significance Levels: P-Values

Recall that the most common method of selecting a critical value for the test statistic and determining the rejection region is fixing the level of the test α. Of course we would like to have α as small as possible as it denotes the probability of committing a type I error. However, as discussed, choosing a low α comes at the cost of increasing β, the probability of a type II error. Choosing the correct value of α is thus important, but also rather arbitrary. While one researcher would be happy to conduct a test with level α = 0.10, another would insist upon only testing with levels lower than, say, α = 0.05. Furthermore, the levels of tests are often fixed at 10%, 5%, or 1% not as a result of long deliberations, but rather out of custom and tradition.


There is a way to partially sidestep this issue of selecting the right value for α by reporting the attained significance level or p-value. For example, let T be a test statistic for the hypothesis H0 : θ = θ0 vs. H1 : θ > θ0. If the realized value of the test statistic is t, based on our sample, then the p-value is calculated as the probability

p-val = Pr[T > t | θ0]. (4.15)

Definition 4.4 p-value
The attained significance level, or p-value, is the smallest level of significance α at which the null hypothesis can be rejected given the observed sample.

The advantage of reporting a p-value, rather than fixing the level of the test yourself, is that it permits each of your readers to draw their own conclusion about the strength of your results. The procedures for finding p-values are very similar to those for finding the critical value of a test statistic. However, instead of fixing the probability α and finding the critical value of the test statistic τ, we now fix the value of the test statistic t and find the associated probability p-val.

Example 4.2 A financial analyst believes that firms experience positive stock returns upon the announcement that they are targeted for a takeover. To test his hypothesis he has collected a data set comprising 300 take-over announcements with an average abnormal return of r̄ = 1.5% on the announcement date, with a standard error of 0.5%. Calculate the p-value of the null hypothesis H0 : r = 0 vs. H1 : r > 0.

Invoking the CLT, the natural test statistic to test this hypothesis is

Z = (r̄abn − 0)/Sr̄abn ∼ t(299) ≈ N(0, 1).

The value of the test statistic in this sample equals z = 1.5/0.5 = 3. Looking up the value 3 in the standard normal table yields p-val = Pr[Z > 3] = 0.0013. Thus the p-value of this test is 0.13%, implying that we can easily reject the null hypothesis of no news effect at the 10%, 5%, or 1% level.
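
The p-value in Example 4.2 can be reproduced with the survival function of the standard normal; a minimal sketch:

    from scipy import stats

    z = (1.5 - 0.0) / 0.5                  # realized test statistic: 3.0
    pval = stats.norm.sf(z)                # Pr[Z > 3], approx. 0.00135, i.e. about 0.13%
    print(z, pval)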

4.5 Power of the Test

In the previous sections we have primarily focused on the probability α of committing a type I error. However, it is at least as important for a test to also have a low probability β of committing a type II error. Remember that a type II error is committed if the testing procedure fails to reject the null when it was in fact false. In econometrics, rather than looking directly at β, many statistical tests are evaluated by their complement (1 − β): the probability that a statistical test rejects the null when it is indeed false; this probability (1 − β) is called the power of the test.

Before we can calculate the power of a test there are two issues that need to be addressed. Firstly, recall that the alternative hypothesis often contains a large range of potential values for θ. For instance in the single-sided hypothesis H0 : θ = θ0 vs. H1 : θ > θ0, all values of θ larger than θ0 are included in the alternative. However, the power will normally not be the same for all these different values included in Θ1. Therefore the power of a test is often evaluated at specific values for the alternative, say θ = θ1.


Secondly, as we have focused on type I errors and the associated α's, we have only considered what the sampling distribution looks like under the assumption that the null hypothesis is correct. This sampling distribution is referred to as the null distribution. However, if we are interested in making statements about the power of the test (or type II errors), then we have to consider what the sampling distribution of θ̂ looks like when θ = θ1. That is, we evaluate the sampling distribution for that specific alternative.

Consider once more the one-sided hypothesis H0 : θ = θ0 vs. H1 : θ > θ0 with the associated test statistic T and critical value τα. For a specific alternative θ = θ1 (with θ1 > θ0) the power of the test can be calculated as the conditional probability

(1− β) = Pr[T > τ | θ = θ1]. (4.16)

Note that the main difference from the definition of α,

α = Pr[T > τ | θ = θ0], (4.17)

is that the probability is conditioned on the alternative hypothesis being true, rather than on the assumption that the null hypothesis is true.

Example 4.3 Many American high-schoolers take the SAT (scholastic aptitude test). The average SAT score for mathematics is 633. Consider the following test: a school is considered 'excellent' if its students obtain an average SAT score of more than 650 (assume a class size of 40). School X believes that its own students will have an expected SAT score of 660 with a standard deviation of 113. Thus, school X feels it should be rated excellent; what is the probability that the school will actually be rated 'excellent'?

This problem is really all about the power of the test. Realize first that we can describe the above as a hypothesis test of the form H0 : θ = 633 vs. H1 : θ > 633 with a rejection region RR = {θ̂ : θ̂ > 650}. Because we are looking at the power of the test, we have to consider the alternative distribution, not the null distribution. In this case the school wants to evaluate this test at the specific alternative θ1 = 660 and find the probability Pr[θ̂ > 650 | θ1 = 660] = (1 − β), which is equal to the power of the test evaluated at θ1.

Invoking the CLT we have, under the alternative distribution, Z = (θ̂ − 660)/(113/√40) ∼ N(0, 1). We can use this to manipulate the probability from above to

(1− β) = Pr [Z ≥ z650] .

Filling in the numbers we find that

z650 = (650 − 660)/(113/√40) = −0.56.

Looking up z650 = −0.56 in the standard normal table yields the probability (1 − β) = 0.71.
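
The power calculation of Example 4.3 can be verified numerically; a minimal sketch (any difference in the last digit is due to rounding of the z value in the text):

    import numpy as np
    from scipy import stats

    mu1, sigma, n = 660.0, 113.0, 40       # alternative mean, standard deviation, class size
    cutoff = 650.0                         # the school is rated 'excellent' above this average

    se = sigma / np.sqrt(n)                # approx. 17.87
    z_cut = (cutoff - mu1) / se            # approx. -0.56
    power = stats.norm.sf(z_cut)           # Pr[average > 650 | mu = 660], approx. 0.71
    print(z_cut, power)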

Asymmetry between Null and Alternative Hypotheses

As should be clear by now, there is an asymmetry between the null and the alternative hypothesis. The testing procedure outlined heavily focuses on the null hypothesis, 'favouring' it over the


alternative: the decision rule and test statistic are based around the null distribution and the probability of falsely rejecting the null hypothesis; the conclusion drawn is mainly about the null (reject the null, do not reject the null). The test only rejects the null if there is a lot of evidence against it, even if the test has low power.

Therefore, the decision as to which is the null and which is the alternative is not merely a mathematical one, but depends on context and custom. There are no hard and fast rules on how to choose the null over the alternative, but often the 'logical' null can be deduced on the basis of one of several principles:

• Sometimes we have good information about the distribution under one of the two hypotheses, but not really about what the sampling distribution looks like under the other hypothesis. In this case it is standard to choose the 'simpler' hypothesis, for which we know the distribution, as the null hypothesis. For example, if you are interested in whether a certain sample is drawn from a normal population, you know what the distribution looks like under the null (i.e. normal), but have no clue what it might look like under the alternative (exponential, χ2, something else?), so the natural null is to assume normality.

• Sometimes the consequences of falsely rejecting one hypothesis are much graver than those of falsely rejecting the other hypothesis. In this case we should choose the former as the null hypothesis. For example: if you have to judge the safety of a bridge, it is more harmful to wrongly reject the hypothesis that it is unsafe (potentially killing many people) than it is to wrongly reject the hypothesis that the bridge is safe (which may cost money on spurious repairs). In this case the null should be: the bridge is deemed unsafe, unless proven otherwise.

• In scientific investigations it is common to approach the research question with a certain level of scepticism. If a new medicine is introduced, the appropriate null hypothesis would be to assume that it does not perform better than the current drug on the market. If you evaluate the effect of an economic policy, the natural null hypothesis would be to assume that it had no effect whatsoever. In both cases you put the burden of evidence on your new medicine/theory/policy.

Problems

1. The output voltage for a certain electric circuit is specified to be 130. A sample of 40 independent readings on the voltage for this circuit gave a sample mean of 128.6 and a standard deviation of 2.1.

(a) Test the hypothesis that the average output voltage is 130 against the alternative that it is less than 130, using a test with level α = 0.05.

(b) If the average voltage falls as low as 128, serious consequences may occur. Calculate the probability of committing a type II error for H1 : V = 128 given the decision rule outlined in (a).

2. Let Y1, Y2, ..., Yn be a random sample of size n = 20 from a normal distribution with unknown mean µ and known variance σ2 = 5. We wish to test H0 : µ ≤ 7 versus H1 : µ > 7.

(a) Find the uniformly most powerful test with significance level 0.05.


(b) For the test in (a), find the power at each of the following alternative values for µ: µ1 = 7.5, µ1 = 8.0, µ1 = 8.5, and µ1 = 9.0.

3. In a study to assess various effects of using a female model in automobile advertising, each of 100 male subjects was shown photographs of two automobiles matched for price, colour, and size but of different makes. Fifty of the subjects (group A) were shown automobile 1 with a female model and automobile 2 with no model. Both automobiles were shown without the model to the other 50 subjects (group B). In group A, automobile 1 (shown with the model) was judged to be more expensive by 37 subjects. In group B, automobile 1 was judged to be more expensive by 23 subjects. Do these results indicate that using a female model increases the perceived cost of an automobile? Find the associated p-value and indicate your conclusion for an α = .05 level test.


Appendix A

Exercise Solutions

A.1 Sampling Distributions

1. (a) As the population is normally distributed, we have (Xi − µ)/σ = Z ∼ N(0, 1). Here: Z = (98 − 100)/3 = −0.67. Look up Z = −0.67 in the standard normal table to find that Pr[Xi ≤ 98] = Pr[Z ≤ −0.67] = 0.2514.

(b) As the population is normally distributed, we have (X̄ − µ)/(σ/√n) = Z ∼ N(0, 1). Here: Z = (98 − 100)/(3/3) = −2. Look up Z = −2 in the standard normal table to find that Pr[X̄ ≤ 98] = Pr[Z ≤ −2] = 0.0228.

(c) If we use the sample variance to calculate the standard error, rather than the population variance, the resulting sampling distribution will be tn−1, rather than standard normal. That is, (X̄ − µ)/(S/√n) = T ∼ tn−1. Pr[X̄ ≤ 98] = Pr[T ≤ −2] = 0.0403. (If you use the student t tables, you'll only be able to find 0.025 < Pr < 0.05.)

2. (a)

E(X̄) = µ1,

Var(X̄) = σ1²/n.

(b)

E(X̄ − Ȳ) = 0,

Var(X̄ − Ȳ) = σ1²/n + σ2²/m
            = (σ1² + σ2²)/n     (with m = n)
            = 4.5/n.

(c)

σX̄−Ȳ = √(4.5/n) = 0.1
n = 4.5/(0.1)²
n = 450.


3. (a) Find Pr[S1² ≤ a].

We know that (n − 1)S1²/σ1² ∼ χ²n−1.

Pr[(n − 1)S1²/σ1² ≤ χ²(10),0.05],   with χ²(10),0.05 = 18.307
Pr[10 S1²/15 ≤ 18.307]
Pr[S1² ≤ 18.307 × 15/10]
Pr[S1² ≤ 27.46]

(b) Find Pr[S1²/S2² ≤ b].

We know that (S1²/σ1²)/(S2²/σ2²) ∼ F(n1−1,n2−1).

Pr[(S1²/σ1²)/(S2²/σ2²) ≤ F(n1−1,n2−1),0.05],   with F(n1−1,n2−1),0.05 = 2.98
Pr[(S1²/15)/(S2²/15) ≤ 2.98]
Pr[S1²/S2² ≤ 2.98]

4. (a) Σ_{i=1}^4 Zi ∼ N(0, 4)

(b) Σ_{i=1}^4 Zi² ∼ χ²4

(c) Z1² / (Σ_{i=2}^4 Zi²/3) ∼ F1,3

(d) Z1 / √(Σ_{i=2}^4 Zi²/3) ∼ t3


A.2 Large Sample Theory

1. First we show that the expectation of the sample mean equals the average of the population means.

E(X̄) = (1/n) E[X1 + . . . + Xn]
     = (1/n) Σ_{i=1}^n µi
     = µ̄.

Also, the variance of the sample mean will be

Var(X̄) = (1/n²) Σi σi².

Next we use Chebychev’s inequality to establish that

Pr[(X̄n − µ̄)² > ε²] ≤ Var(X̄n)/ε² ≤ (Σi σi² / n²) (1/ε²) → 0,

which concludes our proof.

2. Approximate Pr[S100 < 120] with Xi ∼ CDF(1.5, 1). The CLT states that

(Sn − E(Sn))/√(σ²Sn) →d N(0, 1),

with σ²Sn = nσ². Thus

(120 − 150)/√100 = −3,
Pr(Z < −3) = (1 − 0.9987) ≈ 0.13%.

3. Again, use the CLT to approximate the sampling distribution with a normal distribution. As the variance is known, the sampling distribution will be approximately

(X̄ − µ)/(σ/√n) ∼ N(0, 1),

with σ² = 25. Next we look up the quantile that is associated with an exceedance probability of 0.05/2 = 0.025: z0.025 = 1.96. So we solve:

1.5/(σ/√n) = 1.96
√n = 1.96/1.5 × √25
n = 1.96²/1.5² × 25
n = 42.7.

Normally n would be rounded up, in this case to 43, to ensure that the probability is at least the desired level.
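
A numerical check of this sample-size calculation (a sketch using scipy's normal quantile):

    import numpy as np
    from scipy import stats

    sigma, half_width, alpha = 5.0, 1.5, 0.05
    z = stats.norm.ppf(1 - alpha / 2)                      # 1.96
    n = (z * sigma / half_width) ** 2                      # approx. 42.7
    print(n, int(np.ceil(n)))                              # round up to 43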


A.3 Estimation

1. (a) (i) µ̂1 = (X1 + Xn)/2 is unbiased, but inconsistent.

Unbiasedness:

E[µ̂1] = E[(X1 + Xn)/2] = (EX1 + EXn)/2 = 2µ/2 = µ.

Consistency:

E(µ̂1) → µ,

but

Var(µ̂1) = σ²/2 → σ²/2 ≠ 0,

so µ̂1 ↛ µ.

(ii) µ̂2 = X1/4 + (1/2) Σ_{i=2}^{n−1} Xi/(n − 2) + Xn/4 is unbiased, but inconsistent.

Unbiasedness:

E(µ̂2) = E[(X1 + Xn)/4] + E[(1/2) Σ_{i=2}^{n−1} Xi/(n − 2)]
       = (1/2)µ + (1/2) Σ_{i=2}^{n−1} EXi/(n − 2)
       = (1/2)µ + (1/2) (n − 2)µ/(n − 2)
       = µ.

Consistency:

E(µ̂2) → µ,

but

Var(µ̂2) = σ²/8 + (1/4)σ²/(n − 2) → σ²/8 ≠ 0,

so µ̂2 ↛ µ.


(iii) µ̂3 = Σ_{i=1}^n Xi/(n + k), 0 < k ≤ 3, is biased, but consistent.

Unbiasedness:

E(µ̂3) = E[(n + k)⁻¹ Σ_{i=1}^n Xi]
       = (n + k)⁻¹ Σ_{i=1}^n E(Xi)
       = (n + k)⁻¹ n µ
       = n/(n + k) µ
       = µ − k/(n + k) µ
       ≠ µ.

Consistency:

lim_{n→∞} [E(µ̂3)] = lim_{n→∞} [n/(n + k) µ] = µ,

and

Var(µ̂3) = n/(n + k)² σ² → 0,

so µ̂3 → µ.

(iv) µ̂4 = X̄ is both unbiased and consistent, see lecture notes.

(b) Relative efficiency: MSEi/MSEj, with MSEi = Var(µ̂i) + bias²(µ̂i). For n = 36, σ² = 20, µ = 15, and k = 3 we have

(i) MSE1 = σ²/2 = 10

(ii) MSE2 = σ²/8 + (1/4)σ²/(n − 2) = 45/17

(iii) MSE3 = n/(n + k)² σ² + (k/(n + k) µ)² = 80/169 + 225/169 = 305/169

(iv) MSE4 = σ²/n = 20/36

So the relative efficiency of the sample mean (µ̂4) with respect to each of the other estimators will be

(i) MSE4/MSE1 = 0.056

(ii) MSE4/MSE2 = 0.210

(iii) MSE4/MSE3 = 0.308


2. Note that θ̂3 is unbiased:

E(θ̂3) = E(a θ̂1 + (1 − a) θ̂2)
       = a θ + (1 − a) θ
       = θ.

The variance of θ̂3 is defined as

σ3² = Σi Σj bi bj γi,j
    = a²σ1² + (1 − a)²σ2² + 2a(1 − a)γ.

Let's consider the general case with γ unconstrained.

To minimize σ3², find

arg min_a [a²σ1² + (1 − a)²σ2² + 2a(1 − a)γ].

∂σ3²/∂a = 2aσ1² − 2(1 − a)σ2² + 2(1 − 2a)γ = 0,
which can be rewritten as 2a(σ1² + σ2² − 2γ) − 2(σ2² − γ) = 0, so that

a = (σ2² − γ)/(σ1² + σ2² − 2γ).

Note that γ = ρσ1σ2.

(a) a = σ2²/(σ1² + σ2²)

(b) a = (σ2² − γ)/(σ1² + σ2² − 2γ)

3. (a) 90% confidence intervals for the mean (x̄ = 10.15)

(i) Two-sided: [x̄ ± t0.05,20 s/√n]

t0.05,20 = 1.725, s/√n = 2.34/√21 = 0.51

CI = [10.15 ± 1.725 × 0.51] = [9.27, 11.03]


(ii) Upper: (−∞, x̄ + t0.10,20 s/√n]

t0.10,20 = 1.325, s/√n = 2.34/√21 = 0.51

CIH = (−∞, 10.83]

(iii) Lower: [x̄ − t0.10,20 s/√n, ∞)

t0.10,20 = 1.325, s/√n = 2.34/√21 = 0.51

CIL = [9.47, ∞)

(b) 90% confidence intervals for the variance (s² = 2.34² = 5.48)

(i) Two-sided:

χ²0.05,20 = 31.41; χ²0.95,20 = 10.85

CI = [20 × 5.48/31.41, 20 × 5.48/10.85] = [3.49, 10.10]

(ii) Upper:

χ²0.90,20 = 12.44

CIH = [0, 8.80]

(iii) Lower:

χ²0.10,20 = 28.41

CIL = [20 × 5.48/28.41, ∞) = [3.86, ∞)

(c) To find the 90% confidence intervals for the standard deviation, take the square root of the CI for the variance:

(i) Two sided [1.87, 3.18]

(ii) Upper [0, 2.97]

(iii) Lower [1.96, ∞)
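
These intervals can be reproduced with the t and chi-square quantiles; a sketch (small discrepancies against the printed answers are table rounding):

    import numpy as np
    from scipy import stats

    n, xbar, s = 21, 10.15, 2.34
    df = n - 1
    se = s / np.sqrt(n)                                          # approx. 0.51

    # (a) mean
    t_two, t_one = stats.t.ppf(0.95, df), stats.t.ppf(0.90, df)  # 1.725 and 1.325
    ci_mean = (xbar - t_two * se, xbar + t_two * se)             # approx. [9.27, 11.03]
    upper_mean = xbar + t_one * se                               # approx. 10.83

    # (b) variance
    chi_lo, chi_hi = stats.chi2.ppf(0.05, df), stats.chi2.ppf(0.95, df)  # 10.85 and 31.41
    ci_var = (df * s**2 / chi_hi, df * s**2 / chi_lo)            # approx. [3.49, 10.10]

    # (c) standard deviation: square roots of the variance bounds
    ci_sd = tuple(np.sqrt(v) for v in ci_var)                    # approx. [1.87, 3.18]
    print(ci_mean, upper_mean, ci_var, ci_sd)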


A.4 Hypothesis Testing

1. (a) Test H0 : ν ≥ 130 vs. H1 : ν < 130 with α = 0.05. n = 40, ν̄ = 128.6, σ = 2.1. Using the CLT we can construct a test statistic with a known sampling distribution:

z = (ν̄ − ν)/(σ/√n) ∼ N(0, 1),
t = (ν̄ − ν)/(S/√n) ∼ t(n − 1),

tν = (128.6 − 130)/(2.1/√40) = −1.4/0.33 = −4.24.

Rejection region: tν < t0.05, t0.05(39) ≈ t0.05(40) = −1.684 (compare z0.05 = −1.645).

−4.24 < −1.684 ⇒ Reject H0: ν is significantly lower than 130 at the 5% level.

(b) Decision rule: Reject H0 if (V̄ − 130)/(2.1/√40) < −1.684 → V̄ < 129.44.

P[V̄ ≥ 129.44 | ν = 128] = P[(V̄ − 128)/0.33 ≥ (129.44 − 128)/0.33]

(129.44 − 128)/0.33 = 4.36; look this up in a t-table or z-table to find that this probability is well below 0.1%.
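
Both parts of this solution can be checked numerically; a sketch computing the test statistic, the rejection threshold, and the type II error probability at ν = 128:

    import numpy as np
    from scipy import stats

    n, xbar, s = 40, 128.6, 2.1
    se = s / np.sqrt(n)                                    # approx. 0.33

    # (a) one-sided test of H0: nu >= 130 vs H1: nu < 130 at the 5% level
    t_stat = (xbar - 130) / se                             # approx. -4.24
    t_crit = stats.t.ppf(0.05, df=n - 1)                   # approx. -1.685
    print(t_stat, t_crit, t_stat < t_crit)                 # True, so reject H0

    # (b) type II error if the true mean is 128, under the decision rule from (a)
    cutoff = 130 + t_crit * se                             # approx. 129.44
    beta = stats.norm.sf((cutoff - 128) / se)              # Pr[no rejection | nu = 128], well below 0.1%
    print(cutoff, beta)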

2. Consider H0 : µ ≤ 7 vs. H1 : µ > 7. Yi ∼ N(µ, 5), n = 20.

(a) Uniformly most powerful test:

arg max_mcrit [P(µ̂ > mcrit | µ1)]   s.t.   P(µ̂ > mcrit | µ0) ≤ α,

i.e. set mcrit s.t.

P(µ̂ > mcrit | µ0) = 0.05.

By the CLT we know that:

(µ̂ − µ)/(σ/√n) ∼ N(0, 1),   z0.95 = 1.645.

Thus:

(mcrit − µ0)/(σ/√n) = z0.95,
(mcrit − 7)/√(5/20) = 1.645,
mcrit = 7 + 1.645 √(5/20)
      = 7.8225.

i.e. rejection region: reject if µ̂ > 7.8225.


(b) Find the power of the test:

(1 − β) = P(µ̂ > mcrit | µ1),

when the alternative takes on the following values (again, use the CLT for the sampling distribution):

µ1 = 7.5: (7.8225 − 7.5)/0.5 = 0.645,   (1 − β) = 0.26.
µ1 = 8.0: (7.8225 − 8.0)/0.5 = −0.355,   (1 − β) = 0.64.
µ1 = 8.5: (7.8225 − 8.5)/0.5 = −1.355,   (1 − β) = 0.91.
µ1 = 9.0: (7.8225 − 9.0)/0.5 = −2.355,   (1 − β) = 0.99.
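
The critical value and the power at the four alternatives can be verified as follows; a minimal sketch:

    import numpy as np
    from scipy import stats

    n, sigma2, mu0, alpha = 20, 5.0, 7.0, 0.05
    se = np.sqrt(sigma2 / n)                               # 0.5

    m_crit = mu0 + stats.norm.ppf(1 - alpha) * se          # 7 + 1.645 * 0.5 = 7.8225
    for mu1 in (7.5, 8.0, 8.5, 9.0):
        power = stats.norm.sf((m_crit - mu1) / se)         # Pr[sample mean > m_crit | mu1]
        print(mu1, round(power, 2))                        # approx. 0.26, 0.64, 0.91, 0.99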

3. In effect there are two random samples which are both a sequence of Bernoulli trials, each with n = 50 and some parameter φ ∈ [0, 1]. Setting up the null and alternative hypotheses yields:

H0 : φ1 ≤ φ2,   H1 : φ1 > φ2

or alternatively

H0 : (φ1 − φ2) ≤ 0,   H1 : (φ1 − φ2) > 0

A Bernoulli distribution has mean φ and variance φ(1 − φ). Remember that the setup of this test is strongly reminiscent of exercise 7, implying that Var(φ̂1 − φ̂2) = σ1²/n1 + σ2²/n2 and that by the CLT it will be approximately normally distributed. Replacing the population variances with their sample equivalents yields the following test statistic, which will follow a t-distribution with approximately (n1 + n2 − 2)¹ degrees of freedom:

t = ((φ̂1 − φ̂2) − 0)/√((s1² + s2²)/n)

Filling in the numbers yields:

t = (0.74 − 0.46)/√([0.74(1 − 0.74) + 0.46(1 − 0.46)]/50) = 2.982

Checking the table for the t-distribution, it can be seen that the p-value < 0.005. The p-value is less than 0.05, so reject H0. Alternatively, the critical value τ0.95 = 1.661; note t > τ0.95, so reject H0. In both cases the conclusion is that, indeed, the inclusion of a female model significantly increases the probability that a car is perceived to be more expensive. Note also that the acceptance region for (φ̂1 − φ̂2) under H0 at this level is (−∞, 0.156]; observing (φ̂1 − φ̂2) = 0.28 falls outside these bounds, which likewise leads to the conclusion that H0 can be rejected.

¹ To be more precise, the degrees of freedom are estimated by

(σ1²/n1 + σ2²/n2)² / [ (σ1²/n1)²/(n1 − 1) + (σ2²/n2)²/(n2 − 1) ]