Class 2 Statistical Inference Lionel Nesta Observatoire Français des Conjonctures Economiques Lionel.nesta@ofce.sciences-po.fr CERAM February-March-April 2008


Class 2: Statistical Inference

Lionel Nesta

Observatoire Français des Conjonctures Economiques

Lionel.nesta@ofce.sciences-po.fr

CERAM February-March-April 2008


Hypothesis Testing

The Notion of Hypothesis in Statistics

Expectation
A hypothesis is a conjecture, an expected explanation of why a given phenomenon is occurring.

Operationality
A hypothesis must be precise, univocal and quantifiable.

Refutability
The result of a given experiment must give rise to either the refutation or the corroboration of the tested hypothesis.

Replicability
Exclude ad hoc, local arrangements from the experiment, and seek universality.

Examples of Good and Bad Hypotheses

« The Peugeot and Citroën shares have the same variance »

« God exists! »

« In general, the closure of a given production site in Europe is positively associated with the share price of a given company on financial markets. »

« Knowledge has a positive impact on economic growth »

Hypothesis Testing

In statistics, hypothesis testing aims at accepting or rejecting a hypothesis.

The statistical hypothesis is called the “null hypothesis” H0.

The null hypothesis proposes something initially presumed true.

It is rejected only when it becomes evidently false, that is, when the researcher has a certain degree of confidence, usually 95% to 99%, that the data do not support the null hypothesis.

The alternative hypothesis (or research hypothesis) H1 is the complement of H0.

Hypothesis Testing

There are two kinds of hypothesis testing:

A homogeneity test compares the means of two samples, or the mean of one sample with a given value (here 0).

H0 : Mean(x) = Mean(y) ; Mean(x) = 0
H1 : Mean(x) ≠ Mean(y) ; Mean(x) ≠ 0

A conformity test looks at whether the distribution of a given sample follows the properties of a distribution law (normal, Poisson, binomial).

H0 : ℓ(x) = ℓ*(x)
H1 : ℓ(x) ≠ ℓ*(x)

The Four Steps of Hypothesis Testing

1. Spelling out the null hypothesis H0 and the alternative hypothesis H1.

2. Computation of a statistic corresponding to the distance between two sample means (homogeneity test) or between the sample and the distribution law (conformity test).

3. Computation of the (critical) probability of observing what one observes if H0 is true.

4. Conclusion of the test according to an agreed threshold around which one arbitrates between H0 and H1.

The Logic of Hypothesis Testing

We need to say something about the reliability (or representativeness) of a mean:
Law of large numbers; Central Limit Theorem
The notion of confidence interval

Once done, we can test whether two means are alike:
If they are alike, their confidence intervals overlap; if they are not, their confidence intervals do not overlap.

Statistical Inference

In real life, calculating the parameters of populations is usually prohibitive because populations are very large.

Rather than investigating the whole population, we take a sample, calculate a statistic related to the parameter of interest, and make an inference about that parameter.

The sampling distribution of the statistic is the tool that tells us how close the statistic is to the parameter.

Prerequisite: Standard Normal Distribution

Two Prerequisites

Law of large numbers
The law of large numbers tells us that the sample mean converges to the population (true) mean as the sample size increases.

Central Limit Theorem
The Central Limit Theorem tells us that, for many samples of like and sufficiently large size, the histogram of these sample means will appear to be a normal distribution.
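Both prerequisites can be watched at work on a fair die. The sketch below is only an illustration: the seed, the sample sizes and the helper name `sample_mean_of_dice` are assumptions, not part of the course material.

```python
import random
import statistics

def sample_mean_of_dice(n, rng):
    """Mean of n rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

rng = random.Random(0)  # fixed seed (an assumption) so the run is reproducible

# Law of large numbers: the sample mean approaches 3.5 as n grows.
for n in (10, 1_000, 100_000):
    print(n, sample_mean_of_dice(n, rng))

# Central Limit Theorem: the means of many samples of size 30 cluster
# around 3.5, with a spread close to sigma / sqrt(30).
means = [sample_mean_of_dice(30, rng) for _ in range(5_000)]
print(statistics.mean(means), statistics.stdev(means))
```

Plotting a histogram of `means` would show the bell shape that the theorem describes.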

The Dice Experiment

Value x   P(X = x)
1         1/6
2         1/6
3         1/6
4         1/6
5         1/6
6         1/6

$E(X) = \sum_{x=1}^{6} x \, P(X = x) = \frac{1}{6} \sum_{x=1}^{6} x = \frac{21}{6} = 3.5$

[Figure: bar chart of P(X = x) for x = 1, ..., 6; vertical axis from 0.00 to 0.20, every bar at 1/6.]

The Dice Experiment (n = 2)

No.  Sample  Mean     No.  Sample  Mean     No.  Sample  Mean
1    1,1     1        13   3,1     2        25   5,1     3
2    1,2     1.5      14   3,2     2.5      26   5,2     3.5
3    1,3     2        15   3,3     3        27   5,3     4
4    1,4     2.5      16   3,4     3.5      28   5,4     4.5
5    1,5     3        17   3,5     4        29   5,5     5
6    1,6     3.5      18   3,6     4.5      30   5,6     5.5
7    2,1     1.5      19   4,1     2.5      31   6,1     3.5
8    2,2     2        20   4,2     3        32   6,2     4
9    2,3     2.5      21   4,3     3.5      33   6,3     4.5
10   2,4     3        22   4,4     4        34   6,4     5
11   2,5     3.5      23   4,5     4.5      35   6,5     5.5
12   2,6     4        24   4,6     5        36   6,6     6

$E(\bar{X}) = \frac{1}{36}\bar{X}_1 + \frac{1}{36}\bar{X}_2 + \dots + \frac{1}{36}\bar{X}_{36} = 3.5$
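The 36 samples of the dice experiment with n = 2 can also be enumerated by machine. This is a minimal sketch using exact fractions; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# All 36 equally likely ordered samples of size n = 2 and their means.
sample_means = [Fraction(a + b, 2) for a, b in product(faces, repeat=2)]

# Probability of each possible value of the sample mean.
distribution = {m: Fraction(c, 36) for m, c in sorted(Counter(sample_means).items())}

# Expected value of the sample mean: each sample has weight 1/36.
expected_mean = sum(sample_means) / len(sample_means)

for m, p in distribution.items():
    print(float(m), p)
print("E(sample mean) =", float(expected_mean))
```

The printed probabilities rise from 1/36 at a mean of 1 to 6/36 at a mean of 3.5 and fall back symmetrically, which is the triangular histogram of the sample mean.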

[Figure: probability distribution of the sample mean for n = 2. Horizontal axis: sample mean, from 1 to 6 in steps of 0.5. Vertical axis: probability, from 1/36 to 6/36.]

The Normal Distribution

In probability, a random variable follows a normal distribution law (also called Gaussian or Laplace-Gauss distribution law) of expectation μ and standard deviation σ if its probability density function (pdf) is such that

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$

This law is written N(μ, σ²). The density function of a normal distribution is symmetrical about μ.
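The density of the normal law can be written out directly from its definition, $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}((x-\mu)/\sigma)^2}$. The function name `normal_pdf` below is an illustrative choice; the last line cross-checks it against Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, written out from the formula."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Peak of the standard normal density, and a check of symmetry about mu.
print(normal_pdf(0, 0, 1))
print(normal_pdf(-1.3, 0, 1) == normal_pdf(1.3, 0, 1))
# Cross-check against the standard library.
print(abs(normal_pdf(1.0, 0.5, 1.1) - NormalDist(0.5, 1.1).pdf(1.0)) < 1e-12)
```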

Normal Distributions for Different Values of μ and σ

[Figure: three normal density curves for x from -5 to 5: (μ=0; σ=1), (μ=0.5; σ=1.1), (μ=-2; σ=0.5).]

The Standard Normal Distribution

The standard normal distribution, also called Z distribution, represents a probability density function with mean μ = 0 and standard deviation σ = 1. It is written as N(0,1).

Any random variable following a normal law can be standardized via the following transformation:

$z = \frac{x - \mu}{\sigma}$

The Standard Normal Distribution (μ=0; σ=1)

[Figure: density curve of the standard normal distribution for z from -5.0 to 5.0; vertical axis from 0.00 to 0.45.]

The Standard Normal Distribution

[Figure: standard normal density for z from -5 to 5, with three nested central areas marked: 68% of observations, 95% of observations and 99.7% of observations (roughly within 1, 2 and 3 standard deviations of the mean).]

The Standard Normal Distribution

[Figure: standard normal density for z from -5 to 5, with the central 95% of observations marked and 2.5% in each tail.]

The Standard Normal Distribution (z scores)

[Figure: standard normal density for z from -5 to 5, split at z = 0 into the two areas P(Z < 0) and P(Z ≥ 0).]

Probability of an event (z = 0.51)

[Figure: standard normal density for z from -5 to 5, with the area to the right of z = 0.51 marked as P(Z ≥ 0.51).]

Probability of an event (z = 0.51)

The z-score is used to compute the probability of obtaining an observed score.

Example

Let z = 0.51. What is the probability of observing z = 0.51?

Because z is continuous, this is taken to be the probability of observing a score at least as large, z ≥ 0.51: P(z ≥ 0.51) = ??

Standard Normal Distribution Table

Each entry is the upper-tail probability P(Z ≥ z); the row gives z to one decimal and the column adds the second decimal.

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.500 0.496 0.492 0.488 0.484 0.480 0.476 0.472 0.468 0.464

0.1 0.460 0.456 0.452 0.448 0.444 0.440 0.436 0.433 0.429 0.425

0.2 0.421 0.417 0.413 0.409 0.405 0.401 0.397 0.394 0.390 0.386

0.3 0.382 0.378 0.375 0.371 0.367 0.363 0.359 0.356 0.352 0.348

0.4 0.345 0.341 0.337 0.334 0.330 0.326 0.323 0.319 0.316 0.312

0.5 0.309 0.305 0.302 0.298 0.295 0.291 0.288 0.284 0.281 0.278

0.6 0.274 0.271 0.268 0.264 0.261 0.258 0.255 0.251 0.248 0.245

0.7 0.242 0.239 0.236 0.233 0.230 0.227 0.224 0.221 0.218 0.215

0.8 0.212 0.209 0.206 0.203 0.201 0.198 0.195 0.192 0.189 0.187

0.9 0.184 0.181 0.179 0.176 0.174 0.171 0.169 0.166 0.164 0.161

1.0 0.159 0.156 0.154 0.152 0.149 0.147 0.145 0.142 0.140 0.138

1.6 0.055 0.054 0.053 0.052 0.050 0.050 0.049 0.048 0.047 0.046

1.9 0.029 0.028 0.027 0.027 0.026 0.026 0.025 0.024 0.024 0.023

2.0 0.023 0.022 0.022 0.021 0.021 0.020 0.020 0.019 0.019 0.018

2.5 0.006 0.006 0.006 0.006 0.006 0.005 0.005 0.005 0.005 0.005

2.9 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.001 0.001

Probability of an event (Z = 0.51)

The Z-score is used to compute the probability of obtaining an observed score.

Example

Let z = 0.51. The probability of observing z ≥ 0.51 is read in the table at row 0.5 and column 0.01:

P(z ≥ 0.51) = 0.3050
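The table lookup can be reproduced numerically. This sketch relies on the identity P(Z ≥ z) = ½·erfc(z/√2) for a standard normal variable; the function name `upper_tail` is an illustrative choice.

```python
import math

def upper_tail(z):
    """P(Z >= z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

print(round(upper_tail(0.51), 3))  # row 0.5, column 0.01 of the table
print(round(upper_tail(1.96), 3))  # row 1.9, column 0.06 of the table
```

Rounded to three decimals, both values agree with the corresponding cells of the table.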

Example

Suppose that, for the population of students of a famous business school in Sophia-Antipolis, grades are normally distributed with an average of 10 and a standard deviation of 3. What proportion of them
Exceeds 12; Exceeds 15
Does not exceed 8; Does not exceed 12

Let the mean μ = 10 and standard deviation σ = 3:

Exceeds 12: $z = \frac{12 - 10}{3} \approx 0.66, \quad P(z \ge 0.66) = 0.255 = 25.5\%$

Exceeds 15: $z = \frac{15 - 10}{3} \approx 1.66, \quad P(z \ge 1.66) = 0.049 = 4.9\%$

Does not exceed 8: $z = \frac{8 - 10}{3} \approx -0.66, \quad P(z \le -0.66) = P(z \ge 0.66) = 0.255 = 25.5\%$

Does not exceed 12: $z = \frac{12 - 10}{3} \approx 0.66, \quad P(z \le 0.66) = 1 - P(z \ge 0.66) = 1 - 0.255 = 74.5\%$
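The four proportions of the grades example can be checked with Python's `statistics.NormalDist`. This is only a sketch; because it does not truncate z to 0.66 and 1.66 as the table lookup does, the percentages may differ slightly from the ones read in the table.

```python
from statistics import NormalDist

grades = NormalDist(mu=10, sigma=3)

exceeds_12 = 1 - grades.cdf(12)
exceeds_15 = 1 - grades.cdf(15)
at_most_8 = grades.cdf(8)    # equal to exceeds_12 by symmetry about 10
at_most_12 = grades.cdf(12)  # complement of exceeds_12

for label, p in [("exceeds 12", exceeds_12), ("exceeds 15", exceeds_15),
                 ("does not exceed 8", at_most_8), ("does not exceed 12", at_most_12)]:
    print(f"{label}: {p:.1%}")
```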


Confidence Interval

Inverting the way of thinking

Until now, we have thought in terms of observations x and of the values μ and σ to produce the z score.

Let us now imagine that we do not know x, but we know μ and σ. If we consider any interval, we can write:

$z = \frac{x - \mu}{\sigma} \iff x = \mu + z\sigma$

$-z \le \frac{x - \mu}{\sigma} \le z \iff \mu - z\sigma \le x \le \mu + z\sigma$

Inverting the way of thinking

We know that 99% of z-scores fall within the range z∈[-2.58;+2.58]

We know that 90% of z-scores fall within the range z∈[-1.64;+1.64]

Let us now consider an interval which comprises 95% of observations. Looking at the z table, we know that z=1.96

$\Pr\left(-1.96 \le \frac{x - \mu}{\sigma} \le 1.96\right) = 0.95$
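The bounds of such intervals do not have to be read from a table. A minimal sketch using the inverse cumulative distribution function of the standard library (the helper name `critical_z` is an assumption):

```python
from statistics import NormalDist

def critical_z(confidence):
    """z such that P(-z <= Z <= z) = confidence for a standard normal Z."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(critical_z(level), 2))
```

Half of the excluded probability sits in each tail, which is why the function looks up the quantile at 0.5 + confidence / 2.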

Confidence Interval

In statistics, a confidence interval is an interval within which the value of a parameter (here, the mean) is likely to be. Instead of estimating the parameter by a single value, an interval of likely estimates is given.

Confidence intervals are used to indicate the reliability of an estimate.

A1. The sample mean is a random variable following a normal distribution.

A2. The sample mean and standard deviation are good approximations of the population values μ and σ.

The Central Limit Theorem

If a random sample is drawn from any population, the sampling distribution of the sample mean is approximately normal for a sufficiently large sample size.

The larger the sample size, the more closely the sampling distribution of $\bar{x}$ will resemble a normal distribution.

Moments of Sample Mean: The Mean

$\bar{X} = \frac{1}{n}\left(X_1 + X_2 + \dots + X_n\right)$

$E(\bar{X}) = \frac{1}{n}\left[E(X_1) + E(X_2) + \dots + E(X_n)\right] = \frac{1}{n}\, n\mu = \mu$

On average, the sample mean will be on target, that is, equal to the population mean.

Moments of Sample Mean: The Variance

$\mathrm{var}(\bar{X}) = \mathrm{var}\left(\frac{1}{n}X_1 + \frac{1}{n}X_2 + \dots + \frac{1}{n}X_n\right)$

$\mathrm{var}(\bar{X}) = \frac{1}{n^2}\left[\mathrm{var}(X_1) + \mathrm{var}(X_2) + \dots + \mathrm{var}(X_n)\right]$ (the draws are independent)

$\mathrm{var}(\bar{X}) = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n}$

Standard error of $\bar{X}$: $\sqrt{\mathrm{var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}$

The standard deviation of the sample means represents the estimation error of the sample mean, and therefore it is called the standard error.
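The result var(X̄) = σ²/n can be verified exactly on the dice experiment by enumerating every possible sample. A small sketch with exact fractions (the helper `variance` is illustrative and computes a population variance over equally likely values):

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

def variance(values):
    """Population variance of equally likely values."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

sigma2 = variance([Fraction(f) for f in faces])  # variance of one roll

for n in (1, 2, 3):
    # Every ordered sample of size n is equally likely.
    means = [Fraction(sum(s), n) for s in product(faces, repeat=n)]
    print(n, variance(means), sigma2 / n)
```

On each line the two printed fractions coincide: the variance of the sample mean shrinks in proportion to 1/n.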

The Sampling Distribution of the Sample Mean

1. $\mu_{\bar{x}} = \mu_x$

2. $\sigma^2_{\bar{x}} = \frac{\sigma^2_x}{n}$

3. If x is normal, $\bar{x}$ is normal. If x is nonnormal, $\bar{x}$ is approximately normally distributed for sufficiently large sample size.

Confidence Interval

General definition

$\bar{X} - z_{pc}\frac{\sigma}{\sqrt{N}} \le \mu \le \bar{X} + z_{pc}\frac{\sigma}{\sqrt{N}}$

Definition for 90% CI

$\bar{X} - 1.64\frac{\sigma}{\sqrt{N}} \le \mu \le \bar{X} + 1.64\frac{\sigma}{\sqrt{N}}$

Definition for 95% CI

$\bar{X} - 1.96\frac{\sigma}{\sqrt{N}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{N}}$

Standard Normal Distribution and CI

[Figure: standard normal density for z from -5 to 5, with three nested central areas marked: 90% of observations, 95% of observations and 99.7% of observations.]

Application of Confidence Interval

Let us draw a sample of 25 students from CERAM (n = 25), with $\bar{X}$ = 10 and σ = 3. Let us build the 95% CI:

$10 - 1.96\frac{3}{\sqrt{25}} \le \mu \le 10 + 1.96\frac{3}{\sqrt{25}} \iff 8.8 \le \mu \le 11.2$

CERAM Average grades

[Figure: distribution of the CERAM average grade on a 0 to 20 scale, with the interval from 8.8 to 11.2 marked: 95% chance that the mean is indeed located within this interval.]

Application of Confidence Interval

Let us draw a sample of 30 students from HEC (n = 30), with $\bar{X}$ = 11.5 and σ = 4.7. Let us build the 95% CI:

$11.5 - 1.96\frac{4.7}{\sqrt{30}} \le \mu \le 11.5 + 1.96\frac{4.7}{\sqrt{30}} \iff 9.8 \le \mu \le 13.2$
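The two 95% confidence intervals (CERAM with n = 25 and HEC with n = 30) follow from the same formula, mean ± 1.96 σ/√N. A minimal sketch, in which the function name `z_confidence_interval` is an illustrative choice:

```python
import math

def z_confidence_interval(mean, sigma, n, z=1.96):
    """Interval mean +/- z * sigma / sqrt(n); z = 1.96 gives the 95% CI."""
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width

ceram = z_confidence_interval(10, 3, 25)
hec = z_confidence_interval(11.5, 4.7, 30)
print([round(b, 1) for b in ceram])
print([round(b, 1) for b in hec])
```

Passing `z=1.64` instead would give the 90% intervals.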

HEC Average grades

[Figure: distribution of the HEC average grade on a 0 to 20 scale, with the interval from 9.8 to 13.2 marked: 95% chance that the mean is indeed located within this interval.]

Hypothesis Testing

Hypothesis 1: Students from CERAM have an average grade which is not significantly different from 11

H0 : μ(CERAM) = 11
H1 : μ(CERAM) ≠ 11

I accept H0 and reject H1: 11 lies inside the 95% CI of CERAM, [8.8; 11.2].

Hypothesis 2: Students from CERAM have similar grades as students from HEC

H0 : μ(CERAM) = μ(HEC)
H1 : μ(CERAM) ≠ μ(HEC)

I accept H0 and reject H1: the 95% CIs of CERAM, [8.8; 11.2], and HEC, [9.8; 13.2], overlap.

Comparing the Means Using CI’s

[Figure: the distributions of the average grade on a 0 to 20 scale for HEC (μ=11.5; σ=4.7) and CERAM (μ=10; σ=3).]

The overlap of the two CIs means that, at the 95% level, the two means do not differ significantly.
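The two decision rules used with the CERAM and HEC intervals (is a value inside a CI, do two CIs overlap) are easy to state as code. This is a sketch of the rule of thumb used in these slides, with hypothetical helper names; it is not a substitute for a formal two-sample test.

```python
def contains(ci, value):
    """True if value lies inside the closed interval ci = (low, high)."""
    low, high = ci
    return low <= value <= high

def overlap(ci_a, ci_b):
    """True if the two closed intervals share at least one point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

ceram_ci = (8.8, 11.2)  # 95% CI from the slides
hec_ci = (9.8, 13.2)    # 95% CI from the slides

print(contains(ceram_ci, 11))     # Hypothesis 1: is 11 inside the CERAM CI?
print(overlap(ceram_ci, hec_ci))  # Hypothesis 2: do the two CIs overlap?
```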

The Student Test

Thus far, we have assumed that we know both the mean and the standard deviation of the population. But in fact, we do not know them: both μ and σ are unknown.

The Student t statistic is then preferred to the z statistic. Its distribution is similar (identical to z as n → +∞). The CI becomes

$\bar{X} \pm t^{df}_{pc}\frac{s}{\sqrt{N}}$

where s is the sample standard deviation and df = N - 1 is the number of degrees of freedom.

Application of Student t to CI’s

Let us draw a sample of 25 students from CERAM (n = 25), with $\bar{X}$ = 10 and s = 3. Let us build the 95% CI:

$10 - t^{24}_{2.5\%}\frac{3}{\sqrt{25}} \le \mu \le 10 + t^{24}_{2.5\%}\frac{3}{\sqrt{25}}$

$10 - 2.06\frac{3}{\sqrt{25}} \le \mu \le 10 + 2.06\frac{3}{\sqrt{25}} \iff 8.76 \le \mu \le 11.24$

Let us draw a sample of 30 students from HEC (n = 30), with $\bar{X}$ = 11.5 and s = 4.7. Let us build the 95% CI, now with 29 degrees of freedom:

$11.5 - 2.045\frac{4.7}{\sqrt{30}} \le \mu \le 11.5 + 2.045\frac{4.7}{\sqrt{30}} \iff 9.75 \le \mu \le 13.25$
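The Student t intervals can be sketched the same way as the z intervals. The critical values below are copied from a standard t table for a two-sided 95% interval, with df = n - 1 (24 for CERAM, 29 for HEC); the dictionary and the function name are assumptions made for this illustration.

```python
import math

# Two-sided 95% critical values of Student t, copied from a standard t table
# (assumption: only the two degrees of freedom needed here are listed).
T_CRITICAL_95 = {24: 2.064, 29: 2.045}

def t_confidence_interval(mean, s, n):
    """95% CI mean +/- t * s / sqrt(n) with df = n - 1."""
    t = T_CRITICAL_95[n - 1]
    half_width = t * s / math.sqrt(n)
    return mean - half_width, mean + half_width

print(t_confidence_interval(10, 3, 25))      # CERAM
print(t_confidence_interval(11.5, 4.7, 30))  # HEC
```

Both intervals are slightly wider than their z counterparts, which is the price of estimating σ from the sample.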

SPSS Application: Student t

Import CERAM_LMC into SPSS
Produce descriptive statistics for sales, labour, and R&D expenses
  Analyze > Descriptive Statistics > Descriptives (French menus: Analyse > Statistiques descriptives > Descriptive)
  Options: choose the statistics you may wish

A newspaper writes that, by and large, LMCs have 95,000 employees.
  Test statistically whether this is true at 1% level
  Test statistically whether this is true at 5% level
  Test statistically whether this is true at 10% and 20% level

Write out H0 and H1
  Analyze > Compare Means > One-Sample T Test (French menus: Analyse > Comparer les moyennes > Test t pour échantillon unique)
  Options: 99%, 95%, 90%

SPSS Application: t test at 99% level

One-sample statistics (Statistiques sur échantillon unique)

         N      Mean       Std. deviation   Std. error of the mean
labour   1634   91298.87   96400.957        2384.818

One-sample test (Test sur échantillon unique), test value = 95000

         t        df     Sig. (two-tailed)   Mean difference   99% CI of the difference
                                                               Lower      Upper
labour   -1.552   1633   .121                -3701.130         -9851.20   2448.94

The 99% CI of the difference contains 0: H0 is not rejected at the 1% level.

SPSS Application: t test at 95% level

One-sample test (Test sur échantillon unique), test value = 95000

         t        df     Sig. (two-tailed)   Mean difference   95% CI of the difference
                                                               Lower      Upper
labour   -1.552   1633   .121                -3701.130         -8378.75   976.50

The 95% CI of the difference contains 0: H0 is not rejected at the 5% level.

SPSS Application: t test at 80% level

One-sample test (Test sur échantillon unique), test value = 95000

         t        df     Sig. (two-tailed)   Mean difference   80% CI of the difference
                                                               Lower      Upper
labour   -1.552   1633   .121                -3701.130         -6758.63   -643.63

The 80% CI of the difference does not contain 0: H0 is rejected at the 20% level.

SPSS Results (at 1% level)

$\bar{X} - t_{0.01/2}\frac{s}{\sqrt{N}} \le \mu \le \bar{X} + t_{0.01/2}\frac{s}{\sqrt{N}}$

$91298.87 - 2.579 \times \frac{96400.957}{\sqrt{1634}} \le \mu \le 91298.87 + 2.579 \times \frac{96400.957}{\sqrt{1634}}$

$-9851.20 \le \mu - 95000 \le 2448.94$

$85148.8 \le \mu \le 97448.94$

$\Pr(85148.8 \le \mu \le 97448.94) = 0.99$

Since 95,000 lies inside this interval, H0 is not rejected at the 1% level.


Critical probability

The confidence interval is designed in such a way that each t statistic chosen defines the share of observations that the CI comprises.
For large n, when t = 1.96, we have the 95% CI
For large n, when t = 2.58, we have the 99% CI

Actually, to each t there corresponds a share of observations: http://www.socr.ucla.edu/Applets.dir/T-table.html

One can compute the t value directly from our observations as follows:

$t = \frac{\bar{X} - \mu}{\sqrt{s^2 / N}}$

In absolute value: $|t| = \frac{95000 - 91298}{96400 / \sqrt{1634}} = \frac{3702}{2384} = 1.552$
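The t value and its critical probability can be recomputed from the summary statistics reported by SPSS for labour (N = 1634, mean 91298.87, standard deviation 96400.957). The sketch below makes one loudly stated assumption: with 1633 degrees of freedom, the t distribution is approximated by the standard normal, since the Python standard library has no Student t distribution.

```python
import math
from statistics import NormalDist

n, mean, s = 1634, 91298.87, 96400.957  # SPSS one-sample statistics for labour
mu0 = 95000                              # value claimed by the newspaper

standard_error = s / math.sqrt(n)
t = (mean - mu0) / standard_error

# Two-sided critical probability. Assumption: with 1633 degrees of freedom the
# t distribution is replaced by the standard normal.
p_value = 2 * (1 - NormalDist().cdf(abs(t)))

print(round(standard_error, 3), round(t, 3), round(p_value, 3))
```

The sign of t is negative because the sample mean is below 95,000; only its absolute value matters for the two-sided test.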

Critical probability

With t = 1.552, I can conclude the following:

If the population mean really is 95,000 (H0 true), there is a 12% probability of observing a sample mean at least as far from 95,000 as ours.

If I reject H0, I have a 12% chance of wrongly rejecting it.

The remaining 88% is the share of sample means that would fall closer to 95,000 than ours if H0 were true.

Shall I then accept or reject H0?

[Figure: t distribution with the central 88.0% marked and 6.1% in each tail.]

Critical probability

With t = 1.552, the critical probability is 12%: rejecting H0 would carry a 12% chance of wrongly rejecting it, which is above the usual thresholds.

I accept H0 !!!

Critical probability

The practice is to reject H0 only when the critical probability is lower than 0.1, or 10%.

Some are even more cautious and prefer to reject H0 only at a critical probability level of 0.05, or 5%.

In any case, the philosophy of the statistician is to be conservative.

A Direct Comparison of Means Using Student t

Another way to compare two sample means is to calculate the CI of the mean difference. If 0 does not belong to the CI, then the two samples have significantly different means.

$(\bar{X}_1 - \bar{X}_2) \pm t_{pc} \, s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

$s_p^2$ is called the pooled variance; $s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ is the standard error of the mean difference.
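The pooled variance and the standard error of the mean difference can be computed from group statistics alone. The sketch below uses the figures that SPSS reports further on for labour in the two groups of firms (628 and 1006 observations). One assumption is labeled in the code: the t critical value is replaced by 1.96, which is very close with 1632 degrees of freedom.

```python
import math

# Group statistics for labour (US firms vs. others), as reported by SPSS.
n1, mean1, s1 = 628, 97808.99, 112765.1
n2, mean2, s2 = 1006, 87234.90, 84403.469

pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
standard_error = math.sqrt(pooled_variance) * math.sqrt(1 / n1 + 1 / n2)

difference = mean1 - mean2
t = difference / standard_error

# 95% CI of the difference. Assumption: 1.96 (normal) stands in for the
# t critical value, which is very close with 1632 degrees of freedom.
ci = (difference - 1.96 * standard_error, difference + 1.96 * standard_error)

print(round(standard_error, 1), round(t, 3), ci)
```

The standard error and t value can be compared with the "equal variances assumed" row of the SPSS output; the interval bounds differ slightly from SPSS because of the normal approximation.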

SPSS Application: t test comparing means

Another newspaper argues that US companies are much larger than those from the rest of the world. Is this true?

Produce descriptive statistics for labour comparing the two groups
  Produce a group variable which equals 1 for US firms, 0 otherwise
  This is called a dummy variable

Write out H0 and H1
  Analyze > Compare Means > Independent-Samples T Test (French menus: Analyse > Comparer les moyennes > Test t pour échantillons indépendants)
  What do you conclude at 5% level?
  What do you conclude at 1% level?

SPSS Application: t test comparing means

Group statistics (Statistiques de groupe)

labour   AM   N      Mean       Std. deviation   Std. error of the mean
         1    628    97808.99   112765.1         4499.817
         0    1006   87234.90   84403.469        2661.101

Independent-samples test (Test d'échantillons indépendants)

Levene's test for equality of variances: F = .024, Sig. = .877

t test for equality of means, with the 95% CI of the difference:

labour                        t       df         Sig. (two-tailed)   Mean difference   Std. error of the difference   Lower     Upper
Equal variances assumed       2.159   1632       .031                10574.084         4897.135                       968.751   20179.417
Equal variances not assumed   2.023   1061.268   .043                10574.084         5227.792                       316.102   20832.067

The 95% CI of the difference does not contain 0: the two means differ significantly at the 5% level.

SPSS Application: t test comparing means

Independent-samples test (Test d'échantillons indépendants)

Levene's test for equality of variances: F = .024, Sig. = .877

t test for equality of means, with the 99% CI of the difference:

labour                        t       df         Sig. (two-tailed)   Mean difference   Std. error of the difference   Lower       Upper
Equal variances assumed       2.159   1632       .031                10574.084         4897.135                       -2054.870   23203.038
Equal variances not assumed   2.023   1061.268   .043                10574.084         5227.792                       -2916.075   24064.243

The 99% CI of the difference contains 0: the two means do not differ significantly at the 1% level.