Andrea Beccarini · Summer 2011 · wiwi.uni-muenster.de

Econometrics I

Andrea Beccarini

Summer 2011

Outline

• Very brief review of statistical basics

• Simple linear regression model (specification, point estimation, interval estimation, hypothesis tests, forecasting, maximum likelihood estimation)

• Multiple linear regression model

• Violations of (some) model assumptions


Review of basic statistics

• Random experiment (Zufallsexperiment)

• Sample space (Ergebnismenge)

• Event (Ereignis)

• Set operations (Verknüpfungen von Ereignissen)

• Partition (Partition oder vollständige Zerlegung)


• Probability (Wahrscheinlichkeit)

• Kolmogorov’s axioms (Kolmogorovs Axiome)

• Conditional probability (bedingte Wahrscheinlichkeit)

• Total probability (Satz von der totalen Wahrscheinlichkeit)

• Bayes’ theorem (Satz von Bayes)

• Independence (Unabhängigkeit)

• Random variables (Zufallsvariable)

— Definition and intuition

— Distribution function and quantile function (Verteilungsfunktion und Quantilfunktion)

— Discrete and continuous random variables (diskrete und stetige Zufallsvariable)

— Density function (Dichtefunktion)

— Expectation (Erwartungswert)

— Variance (Varianz)

• Special discrete distributions, e.g. Bernoulli, binomial, Poisson, geometric, hypergeometric, ...

• Special continuous distributions, e.g. normal, standard normal, exponential, Pareto, $\chi^2$, F, t, ...

• There are many more special distributions

• Which distribution can be used when?


Simple linear regression model

• Econometrics: Application of statistical methods to empirical research in economics

• Econometric problems:

— Specification of an appropriate model

— Estimation of the model (Schätzung)

— Hypothesis testing

— Forecasting (Prognose)

Economic model

↓ SPECIFICATION: functional form (A-assumptions), error term (B-assumptions), variables (C-assumptions)

Econometric model

↓ ESTIMATION

Estimated model

↓ HYPOTHESIS TESTS and FORECASTING

Data

• Empirical research requires (high quality) data

• Often, collecting data is the main problem of empirical research

• There is no systematic approach

Kinds of data:

• Time series data (Zeitreihendaten), cross-sectional data (Querschnittsdaten), panel data (Paneldaten)

Specification

• Numeric illustration: Data of the gratuity example

Billing amount x_t and tip y_t (both in euros) of 20 observed guests:

 t    x_t    y_t  |   t    x_t    y_t
 1  10.00   2.00  |  11  60.00   7.00
 2  30.00   3.00  |  12  47.50   5.50
 3  50.00   7.00  |  13  45.00   7.00
 4  25.00   2.00  |  14  27.50   4.50
 5   7.50   2.50  |  15  15.00   1.50
 6  42.50   6.00  |  16  20.00   4.00
 7  35.00   5.00  |  17  47.50   9.00
 8  40.00   4.00  |  18  32.50   3.00
 9  25.00   6.00  |  19  37.50   6.50
10  12.50   1.00  |  20  20.00   2.50

• Functional dependence (generic)

y = f (x)

• More specifically, the functional dependence is assumed to be

y = α+ βx

• Other functional forms are of course possible; more on that later

• The econometric model is specified using the A-, B- and C-assumptions


• Economic model: yt = α+ βxt for t = 1, . . . , 20

[Figure: tip y (Trinkgeld) plotted against billing amount x (Rechnungsbetrag); the line R has intercept α and slope β]

• Econometric model: yt = α+ βxt + ut for t = 1, . . . , 20

[Figure: scatter plot of the 20 observations, y_t against x_t]

The A-assumptions (functional specification):

Assumption a1: No relevant exogenous variable is omitted from the econometric model, and the exogenous variable included in the model is relevant

Assumption a2: The true functional dependence between x_t and y_t is linear

Assumption a3: The parameters α and β are constant for all T observations (x_t, y_t)

The B-assumptions (error term specification):

Assumption b1: $E(u_t) = 0$ for $t = 1, \dots, T$

Assumption b2: Homoskedasticity: $Var(u_t) = \sigma^2$ for $t = 1, \dots, T$

Assumption b3: For all $t \neq s$ with $t = 1, \dots, T$ and $s = 1, \dots, T$ we have $Cov(u_t, u_s) = 0$

Assumption b4: The error terms $u_t$ are normally distributed.

Compact notation of all B-assumptions: $u_t \sim NID(0, \sigma^2)$ for $t = 1, \dots, T$

• Graphical illustration of the error term distribution


The C-assumptions (variable specification):

Assumption c1: The exogenous variable x_t is not stochastic, but can be controlled as in an experimental situation

Assumption c2: The exogenous variable x_t is not constant for all observations t

• Of course, many (or even all?) of the A-, B-, and C-assumptions are restrictive and unrealistic

• We will nevertheless suppose they are satisfied for the time being, and consider their violations later on

Point estimation

• The simple (two-variable) linear regression model is

yt = α+ βxt + ut

• Numeric illustration: The first three observations of the gratuity example

t   x_t   y_t
1    10     2
2    30     3
3    50     7

[Figure: the three observations plotted, y_t against x_t]

• Estimation: Compute estimated values $\hat{\alpha}$ and $\hat{\beta}$

• Distinguish between true and estimated values

• If the true econometric model is

$$y_t = \alpha + \beta x_t + u_t$$

then the corresponding estimated model is

$$\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$$

• How can we estimate the coefficients?

[Figure: the three observations with three candidate regression lines R1, R2 and R3]

Least squares method

• Sum of squared residuals

$$S_{\hat{u}\hat{u}} = \sum_{t=1}^{T} \hat{u}_t^2$$

where the residuals are

$$\hat{u}_t = y_t - \hat{y}_t = y_t - \hat{\alpha} - \hat{\beta} x_t$$

• Residual (Residuum): Difference between the observed value $y_t$ and the estimated (predicted) value $\hat{y}_t$

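• Choose $\hat{\alpha}$ and $\hat{\beta}$ such that the sum of squared residuals

$$S_{\hat{u}\hat{u}} = \sum_{t=1}^{T} \hat{u}_t^2 = \sum_{t=1}^{T} (y_t - \hat{\alpha} - \hat{\beta} x_t)^2$$

is minimized

• Derivation of estimators (Schätzer) [1]

$$\hat{\beta} = S_{xy}/S_{xx}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

with

$$S_{xx} = \sum (x_t - \bar{x})^2 = \sum x_t^2 - T\bar{x}^2$$

$$S_{xy} = \sum (x_t - \bar{x})(y_t - \bar{y}) = \sum y_t x_t - T\bar{x}\bar{y}$$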

• Numeric illustration for the three-points example

t   x_t   y_t
1    10     2
2    30     3
3    50     7

• Calculate {1}

$\hat{\alpha}$, $\hat{\beta}$

$\hat{y}_1, \hat{y}_2, \hat{y}_3$

$\hat{u}_1, \hat{u}_2, \hat{u}_3$

$S_{\hat{u}\hat{u}}$
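The quantities requested for the three-point example can be checked with a few lines of code. The following is a minimal sketch using only the least squares formulas stated earlier; the variable names are chosen for illustration.

```python
# Three-point example: x = billing amount, y = tip
x = [10.0, 30.0, 50.0]
y = [2.0, 3.0, 7.0]
T = len(x)

x_bar = sum(x) / T
y_bar = sum(y) / T

# Variation of x and covariation of x and y
S_xx = sum((xt - x_bar) ** 2 for xt in x)
S_xy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))

# Least squares estimators
beta_hat = S_xy / S_xx
alpha_hat = y_bar - beta_hat * x_bar

# Fitted values, residuals and sum of squared residuals
y_hat = [alpha_hat + beta_hat * xt for xt in x]
u_hat = [yt - yh for yt, yh in zip(y, y_hat)]
S_uu = sum(u ** 2 for u in u_hat)

print(alpha_hat, beta_hat)  # 0.25 0.125
print(y_hat)                # [1.5, 4.0, 6.5]
print(u_hat)                # [0.5, -1.0, 0.5]
print(S_uu)                 # 1.5
```

Note that the residuals sum to zero, as they must when the model contains an intercept.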

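The coefficient of determination $R^2$

• Variation of the endogenous variable $S_{yy} = \sum (y_t - \bar{y})^2$

[Figure: the three observations with the horizontal line $y = \bar{y}$ and the deviations $y_1 - \bar{y}$, $y_2 - \bar{y}$, $y_3 - \bar{y}$]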

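• Variation $S_{yy} = \sum (y_t - \bar{y})^2$ and sum of squared residuals $S_{\hat{u}\hat{u}} = \sum \hat{u}_t^2$

[Figure: the three observations with the regression line, the residuals $\hat{u}_1$, $\hat{u}_2$, $\hat{u}_3$ and the deviations $y_t - \bar{y}$]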

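• Decomposition of sum of squares (Streuungszerlegungssatz): [2]

$$S_{yy} = S_{\hat{y}\hat{y}} + S_{\hat{u}\hat{u}}$$

or

$$\sum (y_t - \bar{y})^2 = \sum (\hat{y}_t - \bar{y})^2 + \sum \hat{u}_t^2$$

• Coefficient of determination (Bestimmtheitsmaß)

$$R^2 = \frac{\text{explained variation}}{\text{total variation}} = \frac{S_{yy} - S_{\hat{u}\hat{u}}}{S_{yy}} = \frac{S_{\hat{y}\hat{y}}}{S_{yy}}$$

• Computation of $R^2$ {2}

$$R^2 = \frac{\hat{\beta} S_{xy}}{S_{yy}} = \frac{S_{xy}^2}{S_{xx} S_{yy}}$$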
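As a quick consistency check, the three expressions for $R^2$ can be evaluated on the three-point example. This is only a sketch; the variable names are illustrative.

```python
x = [10.0, 30.0, 50.0]
y = [2.0, 3.0, 7.0]
T = len(x)
x_bar, y_bar = sum(x) / T, sum(y) / T

S_xx = sum((xt - x_bar) ** 2 for xt in x)
S_yy = sum((yt - y_bar) ** 2 for yt in y)
S_xy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))

beta_hat = S_xy / S_xx
alpha_hat = y_bar - beta_hat * x_bar
S_uu = sum((yt - alpha_hat - beta_hat * xt) ** 2 for xt, yt in zip(x, y))

r2_from_residuals = (S_yy - S_uu) / S_yy    # (S_yy - S_uu) / S_yy
r2_from_slope = beta_hat * S_xy / S_yy      # beta_hat * S_xy / S_yy
r2_from_moments = S_xy ** 2 / (S_xx * S_yy) # S_xy^2 / (S_xx * S_yy)

print(r2_from_residuals, r2_from_slope, r2_from_moments)  # all about 0.8929
```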

Properties of the estimators

• The estimators

$$\hat{\beta} = S_{xy}/S_{xx}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

are random variables

• Thought experiment: repeated samples

• Computer simulation [experiment.R]
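The script experiment.R is not reproduced here. The thought experiment of repeated samples can be sketched as follows, assuming hypothetical true parameter values (α = 0.5, β = 0.125, σ = 1) and keeping the billing amounts of the gratuity example fixed across samples.

```python
import random

random.seed(1)  # fixed seed so that the run is reproducible

# Billing amounts of the gratuity example, held fixed (assumption c1)
x = [10.0, 30.0, 50.0, 25.0, 7.5, 42.5, 35.0, 40.0, 25.0, 12.5,
     60.0, 47.5, 45.0, 27.5, 15.0, 20.0, 47.5, 32.5, 37.5, 20.0]
T = len(x)
x_bar = sum(x) / T
S_xx = sum((xt - x_bar) ** 2 for xt in x)

# Hypothetical "true" parameters, chosen only for this simulation
alpha, beta, sigma = 0.5, 0.125, 1.0

def ols_slope(y):
    """Least squares slope S_xy / S_xx for the fixed regressor x."""
    y_bar = sum(y) / T
    S_xy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))
    return S_xy / S_xx

reps = 2000
slopes = []
for _ in range(reps):
    # Draw a new sample of error terms and hence a new sample of y
    y = [alpha + beta * xt + random.gauss(0.0, sigma) for xt in x]
    slopes.append(ols_slope(y))

mean_slope = sum(slopes) / reps
var_slope = sum((b - mean_slope) ** 2 for b in slopes) / (reps - 1)

print("mean of slope estimates:", mean_slope, "true beta:", beta)
print("variance of slope estimates:", var_slope, "sigma^2/S_xx:", sigma ** 2 / S_xx)
```

Across the repeated samples, the average of the slope estimates should lie close to the true β, and their sample variance close to σ²/S_xx.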


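• Under the a-, b- and c-assumptions (without b4) [3]

$$E(\hat{\alpha}) = \alpha, \qquad E(\hat{\beta}) = \beta$$

and [4]

$$Cov(\hat{\alpha}, \hat{\beta}) = -\sigma^2 \, \bar{x}/S_{xx}$$

$$Var(\hat{\alpha}) = \sigma^2 \left(1/T + \bar{x}^2/S_{xx}\right)$$

$$Var(\hat{\beta}) = \sigma^2/S_{xx}$$

• BLUE property: $\hat{\alpha}$ and $\hat{\beta}$ are the best linear unbiased estimators [5]

• If, additionally, b4 holds, then $\hat{\alpha}$ and $\hat{\beta}$ are the best unbiased estimators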

• How are yt, α and β distributed?

• Because of

$$u_t \sim NID(0, \sigma^2)$$

$y_t$ is normally distributed, $t = 1, \dots, T$

• The expectation of $y_t$ is

$$E(y_t) = E(\alpha + \beta x_t + u_t) = E(\alpha) + E(\beta x_t) + E(u_t) = \alpha + \beta x_t$$

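• The variance of $y_t$ is

$$Var(y_t) = E\left((y_t - E(y_t))^2\right) = E\left((y_t - \alpha - \beta x_t)^2\right) = E\left(u_t^2\right) = E\left((u_t - E(u_t))^2\right) = \sigma^2$$

• Further, for $t = 1, \dots, T$

$$y_t \sim NID(\alpha + \beta x_t, \sigma^2)$$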

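• Since

$$\hat{\beta} = S_{xy}/S_{xx}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

both $\hat{\alpha}$ and $\hat{\beta}$ are linear transformations of the $y_t$

• Linear transformations of independent normally distributed random variables are normally distributed

• Hence,

$$\hat{\alpha} \sim N\left(\alpha, \sigma^2 (1/T + \bar{x}^2/S_{xx})\right), \qquad \hat{\beta} \sim N\left(\beta, \sigma^2/S_{xx}\right)$$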

Interval estimation (Intervallschätzung)

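• We already know that $\hat{\beta}$ is a random variable and

$$\hat{\beta} \sim N\left(\beta, \sigma^2/S_{xx}\right)$$

• Instead of a point estimator $\hat{\beta}$ we now want an interval estimator

$$[\hat{\beta} - k \,;\, \hat{\beta} + k]$$

satisfying

$$P\left(\hat{\beta} - k \le \beta \le \hat{\beta} + k\right) = 1 - a$$

• The interval $[\hat{\beta} - k \,;\, \hat{\beta} + k]$ is called the $(1 - a)$-confidence interval (Konfidenzintervall)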

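Confidence interval when $\sigma^2$ is known

• Step 1: Standardization of $\hat{\beta}$

$$se(\hat{\beta}) = \sqrt{\sigma^2/S_{xx}}$$

$$z = \frac{\hat{\beta} - E(\hat{\beta})}{se(\hat{\beta})} = \frac{\hat{\beta} - \beta}{se(\hat{\beta})} \sim N(0, 1)$$

• The random variable $z = (\hat{\beta} - \beta)/se(\hat{\beta})$ is a pivot (Pivot), i.e. its distribution does not depend on unknown parameters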

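• Step 2: Find the $(1 - a/2)$-quantile $z_{a/2}$ of the standard normal distribution

$$P(-z_{a/2} \le z \le z_{a/2}) = 1 - a$$

• Step 3: Substitute $z$ by $(\hat{\beta} - \beta)/se(\hat{\beta})$

$$P\left(-z_{a/2} \le \frac{\hat{\beta} - \beta}{se(\hat{\beta})} \le z_{a/2}\right) = 1 - a$$

• Rewriting yields the $(1 - a)$-interval [6]{3}

$$\left[\hat{\beta} - z_{a/2} \cdot se(\hat{\beta}) \,;\, \hat{\beta} + z_{a/2} \cdot se(\hat{\beta})\right]$$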
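The three steps can be sketched in code. The numbers below are hypothetical inputs chosen for illustration (a slope estimate of 0.1257, S_xx = 4130 and an error variance of 1.68 that is treated as known); the standard normal quantile comes from `statistics.NormalDist` in the Python standard library.

```python
import math
from statistics import NormalDist

# Hypothetical inputs (sigma2 is assumed to be known here)
beta_hat = 0.1257
S_xx = 4130.0
sigma2 = 1.68
a = 0.05  # significance level

# Step 1: standard error of the slope estimator
se_beta = math.sqrt(sigma2 / S_xx)

# Step 2: (1 - a/2)-quantile of the standard normal distribution
z = NormalDist().inv_cdf(1 - a / 2)

# Step 3: confidence interval
lower = beta_hat - z * se_beta
upper = beta_hat + z * se_beta
print(z, (lower, upper))
```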

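Confidence interval when $\sigma^2$ is unknown

• Step 1: Estimation of $\sigma^2$ and $se(\hat{\beta})$:

$$\hat{\sigma}^2 = \frac{1}{T - 2} \sum_{t=1}^{T} \hat{u}_t^2$$

is a consistent and unbiased estimator of $\sigma^2$ and

$$\widehat{se}(\hat{\beta}) = \sqrt{\hat{\sigma}^2/S_{xx}}$$

is a consistent estimator of $se(\hat{\beta})$ (we postpone the proofs)

• Step 2: Standardization of $\hat{\beta}$

$$t = \frac{\hat{\beta} - E(\hat{\beta})}{\widehat{se}(\hat{\beta})} = \frac{\hat{\beta} - \beta}{\widehat{se}(\hat{\beta})} \sim t_{(T-2)}$$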

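• The random variable $t = (\hat{\beta} - \beta)/\widehat{se}(\hat{\beta})$ is a pivot

• Step 3: Find the $(1 - a/2)$-quantile $t_{a/2}$ of the $t_{(T-2)}$-distribution

$$P(-t_{a/2} \le t \le t_{a/2}) = 1 - a$$

• Step 4: Substitute and solve for $\beta$,

$$P\left(\hat{\beta} - t_{a/2} \cdot \widehat{se}(\hat{\beta}) \le \beta \le \hat{\beta} + t_{a/2} \cdot \widehat{se}(\hat{\beta})\right) = 1 - a$$

• The interval estimator is {4}

$$\left[\hat{\beta} - t_{a/2} \cdot \widehat{se}(\hat{\beta}) \,;\, \hat{\beta} + t_{a/2} \cdot \widehat{se}(\hat{\beta})\right]$$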

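• Interval estimator for the intercept $\alpha$

$$\left[\hat{\alpha} - t_{a/2} \cdot \widehat{se}(\hat{\alpha}) \,;\, \hat{\alpha} + t_{a/2} \cdot \widehat{se}(\hat{\alpha})\right]$$

where

$$\widehat{se}(\hat{\alpha}) = \sqrt{\hat{\sigma}^2 (1/T + \bar{x}^2/S_{xx})}$$

• Some terminology: The standard error (Standardfehler) is $se(\hat{\beta})$; the estimated standard error is $\widehat{se}(\hat{\beta})$

• Usually, both $se(\hat{\beta})$ and $\widehat{se}(\hat{\beta})$ are called standard error (Standardfehler)

• Interpretation of interval estimators?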

Hypothesis tests

• How can we test hypotheses about the regression coefficients(usually about the slope β)?

• Null hypothesis H0 and alternative hypothesis H1(Nullhypothese und Alternativhypothese)

• There are one-sided and two-sided tests

• We already know that

$$\hat{\beta} \sim N\left(\beta, \sigma^2/S_{xx}\right)$$

• If the null hypothesis $H_0: \beta = q$ is true, then $\beta$ can be substituted by $q$

$$\hat{\beta} \sim N\left(q, \sigma^2/S_{xx}\right)$$

• Then

$$P(\hat{\beta} - k \le q \le \hat{\beta} + k) = 1 - a$$

$$P(q - k \le \hat{\beta} \le q + k) = 1 - a$$

• With high probability $1 - a$, the estimator $\hat{\beta}$ will be inside the interval $[q - k \,;\, q + k]$ if $H_0$ is true

• If the estimator $\hat{\beta}$ is outside the interval, that is evidence against the null hypothesis

• Graphical illustration


• The analytical approach is slightly different

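• Step 1: Set up $H_0$ and $H_1$ and fix the significance level $a$

$$H_0: \beta = q \qquad H_1: \beta \neq q$$

• Step 2: Estimate $se(\hat{\beta})$

$$\widehat{se}(\hat{\beta}) = \sqrt{\hat{\sigma}^2/S_{xx}}$$

with $\hat{\sigma}^2 = S_{\hat{u}\hat{u}}/(T - 2)$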

• Step 3: Compute the t-test statistic

$$t = \frac{\hat{\beta} - q}{\widehat{se}(\hat{\beta})}$$

If $H_0: \beta = q$ is true, then

$$t \sim t_{(T-2)}$$

• Step 4: Find the critical value $t_{a/2}$

$$P(-t_{a/2} \le t \le t_{a/2}) = 1 - a$$

• Step 5: Compare $t_{a/2}$ and $t$. If $t$ is outside $[-t_{a/2} \,;\, t_{a/2}]$, i.e. if $|t| > t_{a/2}$, then reject $H_0$ {5}
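The five steps can be sketched for the gratuity data with the two-sided hypothesis $H_0: \beta = 0$ at $a = 0.05$. This is an illustration only; the critical value 2.101 for 18 degrees of freedom is taken from a standard t table rather than computed.

```python
import math

# Gratuity example: billing amount x and tip y of 20 guests
x = [10.0, 30.0, 50.0, 25.0, 7.5, 42.5, 35.0, 40.0, 25.0, 12.5,
     60.0, 47.5, 45.0, 27.5, 15.0, 20.0, 47.5, 32.5, 37.5, 20.0]
y = [2.0, 3.0, 7.0, 2.0, 2.5, 6.0, 5.0, 4.0, 6.0, 1.0,
     7.0, 5.5, 7.0, 4.5, 1.5, 4.0, 9.0, 3.0, 6.5, 2.5]
T = len(x)
x_bar, y_bar = sum(x) / T, sum(y) / T
S_xx = sum((xt - x_bar) ** 2 for xt in x)
S_xy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))

beta_hat = S_xy / S_xx
alpha_hat = y_bar - beta_hat * x_bar
S_uu = sum((yt - alpha_hat - beta_hat * xt) ** 2 for xt, yt in zip(x, y))

# Step 1: H0: beta = q against H1: beta != q, significance level a = 0.05
q = 0.0

# Step 2: estimated standard error of the slope
sigma2_hat = S_uu / (T - 2)
se_hat = math.sqrt(sigma2_hat / S_xx)

# Step 3: t-statistic
t_stat = (beta_hat - q) / se_hat

# Step 4: critical value t_{a/2} for T - 2 = 18 degrees of freedom (t table)
t_crit = 2.101

# Step 5: test decision
reject_h0 = abs(t_stat) > t_crit
print(beta_hat, se_hat, t_stat, reject_h0)
```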

Connections between hypothesis testing and confidence intervals

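• Under the (two-sided) null hypothesis $H_0$

$$P\left(q - t_{a/2} \cdot \widehat{se}(\hat{\beta}) \le \hat{\beta} \le q + t_{a/2} \cdot \widehat{se}(\hat{\beta})\right) = 1 - a$$

• The $(1 - a)$-confidence interval is

$$\left[\hat{\beta} - t_{a/2} \cdot \widehat{se}(\hat{\beta}) \,;\, \hat{\beta} + t_{a/2} \cdot \widehat{se}(\hat{\beta})\right]$$

• Conclusion: If $q$ is outside the confidence interval, $H_0$ is rejected {6}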

One-sided hypothesis tests (einseitige Tests)

• Right or left-sided tests

• Right-sided null hypothesis

H0 : β ≤ q

H1 : β > q

• The basic idea remains the same: If $\hat{\beta}$ is "much larger" than $q$, reject $H_0$

• Graphical illustration:


Analytical approach (right-sided null hypothesis)

• Step 1: State H0 and H1 and set the significance level a

H0 : β ≤ q

H1 : β > q

• Step 2: Estimate $se(\hat{\beta})$

• Step 3: Compute the t-statistic

$$t = \frac{\hat{\beta} - q}{\widehat{se}(\hat{\beta})}$$

Under $H_0$ (at the boundary $\beta = q$) its distribution is $t \sim t_{(T-2)}$

• Step 4: Find the critical value $t_a$

$$P(t \le t_a) = 1 - a$$

For left-sided null hypotheses, steps 1, 2 and 3 are the same; the critical value is $t_{1-a}$ with $P(t < t_{1-a}) = a$ (by symmetry, $t_{1-a} = -t_a$)

• Step 5: Compare $t_a$ and $t$; reject $H_0$ if $t > t_a$ {7}

• For left-sided null hypotheses, $H_0$ is rejected if $t$ is less than the critical value, $t < t_{1-a}$

The p-value (p-Wert)

• The p-value is the probability, computed under the null hypothesis, that the test statistic (a random variable) is more extreme than the realized test statistic; for a right-sided test, that it is greater than the realized value

• Traditional approach: Reject the null hypothesis if the test statistic is inside the critical region, e.g. if $t > t_a$

• Alternative approach: Comparison of probabilities; reject the null hypothesis if the p-value is less than the significance level $a$

• Graphical illustration:


• The two approaches (comparison of t-statistic and critical value, or comparison of p-value and significance level) are essentially identical {8}

• Advantages of the p-value approach?

• Disadvantages?

• p-value formulas for right- and left-sided hypothesis tests? [7]

• p-value formula for two-sided hypothesis test?
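One possible answer to the p-value questions, as a sketch: with $F$ the distribution function of the $t_{(T-2)}$-distribution, the right-sided p-value is $1 - F(t)$, the left-sided p-value is $F(t)$ and the two-sided p-value is $2(1 - F(|t|))$. The code below evaluates $F$ by numerical integration of the t density (Simpson's rule), because the Python standard library has no t distribution; the function names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, n=2000):
    """Distribution function via Simpson's rule on [0, |x|] plus symmetry (n even)."""
    b = abs(x)
    h = b / n
    s = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def p_right(t, df):
    return 1 - t_cdf(t, df)

def p_left(t, df):
    return t_cdf(t, df)

def p_two_sided(t, df):
    return 2 * (1 - t_cdf(abs(t), df))

# At the tabulated critical value for a = 0.05 and 18 degrees of freedom,
# the two-sided p-value is approximately equal to the significance level
print(p_two_sided(2.101, 18))
```

Rejecting when the p-value is below $a$ then gives the same decision as comparing the t-statistic with the critical value.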


How to choose the null and alternative hypotheses

• There are basically two strategies:

— State the opposite of the conjecture as the null hypothesis and try to reject it

— State the conjecture as the null hypothesis and show that it cannot be rejected

• There is an important asymmetry between rejection and non-rejection

Maximum likelihood estimation

• Main idea: Find those parameter values that maximize the probability (or likelihood) of observing the actually observed data

• Notation:

$\theta$ : Parameter vector, e.g. $\theta = (\alpha, \beta, \sigma^2)$

$L(\theta)$ : Likelihood (given all the data)

$\ln L(\theta)$ : Log-likelihood

• Maximum likelihood estimators

$$\hat{\theta} = \arg\max_{\theta} \ln L(\theta)$$

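• We already know that, for $t = 1, \dots, T$

$$y_t \sim NID(\alpha + \beta x_t, \sigma^2),$$

hence the density of $y_t$ is

$$f_{y_t}(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2} \frac{(y - \alpha - \beta x_t)^2}{\sigma^2}\right)$$

• Due to independence, the joint likelihood and log-likelihood are

$$L(\alpha, \beta, \sigma^2) = f_{y_1,\dots,y_T}(y_1, \dots, y_T) = \prod_{t=1}^{T} f_{y_t}(y_t)$$

$$\ln L(\alpha, \beta, \sigma^2) = \ln f_{y_1,\dots,y_T}(y_1, \dots, y_T) = \sum_{t=1}^{T} \ln f_{y_t}(y_t)$$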

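• Maximize

$$\ln L(\alpha, \beta, \sigma^2) = \sum_{t=1}^{T} \ln\left[\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2} \frac{(y_t - \alpha - \beta x_t)^2}{\sigma^2}\right)\right]$$

with respect to the parameters $\alpha, \beta, \sigma^2$ [8]

• The ML estimators are

$$\hat{\alpha}_{ML} = \bar{y} - \hat{\beta}_{ML}\bar{x}$$

$$\hat{\beta}_{ML} = \frac{S_{xy}}{S_{xx}}$$

$$\hat{\sigma}^2_{ML} = \frac{1}{T} \sum_{t=1}^{T} \hat{u}_t^2$$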
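A small numerical check of these formulas, sketched on the three-point example: the log-likelihood is evaluated at the closed-form ML estimates and at a grid of nearby parameter values (the grid is an arbitrary choice for illustration). No nearby point should give a larger value.

```python
import math

x = [10.0, 30.0, 50.0]
y = [2.0, 3.0, 7.0]
T = len(x)

def log_lik(alpha, beta, sigma2):
    """Log-likelihood of the simple linear regression model with normal errors."""
    return sum(-0.5 * math.log(2 * math.pi * sigma2)
               - (yt - alpha - beta * xt) ** 2 / (2 * sigma2)
               for xt, yt in zip(x, y))

# Closed-form ML estimators
x_bar, y_bar = sum(x) / T, sum(y) / T
S_xx = sum((xt - x_bar) ** 2 for xt in x)
S_xy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))
beta_ml = S_xy / S_xx
alpha_ml = y_bar - beta_ml * x_bar
S_uu = sum((yt - alpha_ml - beta_ml * xt) ** 2 for xt, yt in zip(x, y))
sigma2_ml = S_uu / T          # divides by T, not by T - 2

ll_max = log_lik(alpha_ml, beta_ml, sigma2_ml)

# Log-likelihood at perturbed parameter values
neighbours = [log_lik(alpha_ml + da, beta_ml + db, sigma2_ml * f)
              for da in (-0.1, 0.0, 0.1)
              for db in (-0.01, 0.0, 0.01)
              for f in (0.8, 1.0, 1.25)]

print(sigma2_ml, ll_max, max(neighbours))
```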

• Hypothesis tests in the maximum likelihood framework (the three classical tests: Wald, LR, LM)

• Null and alternative hypotheses, e.g.

$$H_0: \beta = \beta_0 \qquad H_1: \beta \neq \beta_0$$

• Derivation of the test statistics [exercise]

Forecasting

• Conditional forecast: the value of the exogenous variable, $x_0$, is known and non-stochastic

• Point forecast of the endogenous variable is {9}

$$\hat{y}_0 = \hat{\alpha} + \hat{\beta} x_0$$

• The true value of $y_0$ is usually not $\hat{y}_0$ but

$$y_0 = \alpha + \beta x_0 + u_0$$

• The forecasting error is

$$\hat{y}_0 - y_0 = \hat{\alpha} + \hat{\beta} x_0 - (\alpha + \beta x_0 + u_0) = (\hat{\alpha} - \alpha) + (\hat{\beta} - \beta) x_0 - u_0$$

• There are two error sources:

1. The error term $u_0$ will not vanish, in general.

2. The parameter estimates $\hat{\alpha}$ and $\hat{\beta}$ will deviate from the true values $\alpha$ and $\beta$.

Properties of the point forecast

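• Expected forecasting error:

$$E(\hat{y}_0 - y_0) = E(\hat{\alpha} - \alpha) + E(\hat{\beta} - \beta) x_0 - E(u_0) = 0$$

• Variance of the forecasting error [9]

$$Var(\hat{y}_0 - y_0) = \sigma^2 \left[1 + 1/T + (x_0 - \bar{x})^2/S_{xx}\right]$$

• Estimated variance of the forecasting error {9}

$$\widehat{Var}(\hat{y}_0 - y_0) = \hat{\sigma}^2 \left[1 + 1/T + (x_0 - \bar{x})^2/S_{xx}\right]$$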

Interval forecast

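• Step 1: Estimation of $se(\hat{y}_0 - y_0)$

• Step 2: Standardization of $(\hat{y}_0 - y_0)$; since $E(\hat{y}_0 - y_0) = 0$,

$$t = \frac{(\hat{y}_0 - y_0) - E(\hat{y}_0 - y_0)}{\widehat{se}(\hat{y}_0 - y_0)} = \frac{\hat{y}_0 - y_0}{\widehat{se}(\hat{y}_0 - y_0)} \sim t_{(T-2)}$$

• Step 3: Find the $t_{a/2}$-value (from statistical tables or using statistical computer software)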

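• Step 4: With large probability $1 - a$, the random variable $t$ will be inside the interval $[-t_{a/2} \,;\, t_{a/2}]$,

$$P\left(-t_{a/2} \le \frac{\hat{y}_0 - y_0}{\widehat{se}(\hat{y}_0 - y_0)} \le t_{a/2}\right) = 1 - a$$

Solve for $y_0$

$$P\left(\hat{y}_0 - t_{a/2} \cdot \widehat{se}(\hat{y}_0 - y_0) \le y_0 \le \hat{y}_0 + t_{a/2} \cdot \widehat{se}(\hat{y}_0 - y_0)\right) = 1 - a$$

• Hence, the interval forecast is {9}

$$\left[\hat{y}_0 - t_{a/2} \cdot \widehat{se}(\hat{y}_0 - y_0) \,;\, \hat{y}_0 + t_{a/2} \cdot \widehat{se}(\hat{y}_0 - y_0)\right]$$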
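A sketch of the point and interval forecast for the gratuity data follows. The billing amount $x_0 = 40$ is a hypothetical value chosen for illustration, and the critical value 2.101 ($a = 0.05$, 18 degrees of freedom) is taken from a standard t table.

```python
import math

x = [10.0, 30.0, 50.0, 25.0, 7.5, 42.5, 35.0, 40.0, 25.0, 12.5,
     60.0, 47.5, 45.0, 27.5, 15.0, 20.0, 47.5, 32.5, 37.5, 20.0]
y = [2.0, 3.0, 7.0, 2.0, 2.5, 6.0, 5.0, 4.0, 6.0, 1.0,
     7.0, 5.5, 7.0, 4.5, 1.5, 4.0, 9.0, 3.0, 6.5, 2.5]
T = len(x)
x_bar, y_bar = sum(x) / T, sum(y) / T
S_xx = sum((xt - x_bar) ** 2 for xt in x)
S_xy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))
beta_hat = S_xy / S_xx
alpha_hat = y_bar - beta_hat * x_bar
S_uu = sum((yt - alpha_hat - beta_hat * xt) ** 2 for xt, yt in zip(x, y))
sigma2_hat = S_uu / (T - 2)

x0 = 40.0        # hypothetical billing amount of a new guest
t_crit = 2.101   # t_{a/2} for a = 0.05 and 18 degrees of freedom (t table)

# Point forecast
y0_hat = alpha_hat + beta_hat * x0

# Estimated standard error of the forecasting error
var_forecast = sigma2_hat * (1 + 1 / T + (x0 - x_bar) ** 2 / S_xx)
se_forecast = math.sqrt(var_forecast)

# Interval forecast
lower = y0_hat - t_crit * se_forecast
upper = y0_hat + t_crit * se_forecast
print(y0_hat, se_forecast, (lower, upper))
```

The term $(x_0 - \bar{x})^2/S_{xx}$ makes the interval wider the further $x_0$ lies from the sample mean $\bar{x}$.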

• Width of the interval


Multiple linear regression model

• So far we have only considered a single exogenous variable, but in most empirical problems we face many exogenous variables

• Many of the results from the simple linear regression model can be transferred to the multiple case

• Important tool: matrix algebra (main diagonal, transpose, addition, scalar multiplication, inner product, matrix multiplication, idempotent matrices, determinant, rank, inverse, trace, definite matrices, semidefinite matrices)

Specification

• Example: Estimation of a production function for barley

• Conduct an experiment where the barley output (Gerste, $g_t$) is observed for different combinations of phosphate ($p_t$) and nitrogen ($n_t$)

• There are T = 30 different combinations

• The following table shows the data

 t    p_t     n_t    g_t  |   t    p_t     n_t    g_t
 1  22.00   40.00  38.36  |  16  25.00  110.00  59.55
 2  22.00   60.00  49.03  |  17  26.00   50.00  55.24
 3  22.00   90.00  59.87  |  18  26.00   70.00  54.13
 4  22.00  120.00  59.35  |  19  26.00   90.00  66.57
 5  23.00   50.00  45.45  |  20  26.00  110.00  61.74
 6  23.00   80.00  53.23  |  21  27.00   40.00  48.99
 7  23.00  100.00  56.55  |  22  27.00   60.00  54.38
 8  23.00  120.00  50.91  |  23  27.00   80.00  58.28
 9  24.00   40.00  44.87  |  24  27.00  100.00  62.81
10  24.00   60.00  54.06  |  25  28.00   50.00  50.76
11  24.00   90.00  60.34  |  26  28.00   70.00  51.54
12  24.00  120.00  58.21  |  27  28.00  100.00  59.39
13  25.00   50.00  51.52  |  28  28.00  110.00  68.17
14  25.00   80.00  58.58  |  29  29.00   60.00  59.25
15  25.00  100.00  57.27  |  30  29.00  100.00  64.39

Functional specification (A-assumptions)

• The economic (agro-economic) model formalizes the connection between the barley output ($g$) and the fertilizers ($p$ and $n$)

$$g = f(p, n)$$

• Possible functional form

$$g = \alpha + \beta_1 p + \beta_2 n$$

• A more realistic functional form

$$g = A p^{\beta_1} n^{\beta_2},$$

where $A$, $\beta_1$ and $\beta_2$ are constant parameters

• Take logarithms of the production function $g = A p^{\beta_1} n^{\beta_2}$,

$$\ln g = \ln A + \beta_1 \ln p + \beta_2 \ln n$$

• Define $\alpha = \ln A$, $y = \ln g$, $x_1 = \ln p$ and $x_2 = \ln n$, then

$$y = \alpha + \beta_1 x_1 + \beta_2 x_2$$

• Table of log-values:

 t   x_1 (= ln p_t)   x_2 (= ln n_t)   y_t (= ln g_t)
 1       3.0910           3.6889           3.6470
...       ...              ...              ...
30       3.3673           4.6052           4.1650

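• The econometric model is

$$y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + u_t$$

for $t = 1, \dots, T$

• General model for $K$ exogenous variables

$$y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + \dots + \beta_K x_{Kt} + u_t$$

for $t = 1, \dots, T$ or

$$\begin{aligned}
y_1 &= \alpha + \beta_1 x_{11} + \beta_2 x_{21} + \dots + \beta_K x_{K1} + u_1 \\
y_2 &= \alpha + \beta_1 x_{12} + \beta_2 x_{22} + \dots + \beta_K x_{K2} + u_2 \\
&\;\;\vdots \\
y_T &= \alpha + \beta_1 x_{1T} + \beta_2 x_{2T} + \dots + \beta_K x_{KT} + u_T
\end{aligned}$$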

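• Matrix notation: Define

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix}; \quad
X = \begin{bmatrix} 1 & x_{11} & \dots & x_{K1} \\ 1 & x_{12} & \dots & x_{K2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1T} & \dots & x_{KT} \end{bmatrix}; \quad
\beta = \begin{bmatrix} \alpha \\ \beta_1 \\ \vdots \\ \beta_K \end{bmatrix}; \quad
u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_T \end{bmatrix}$$

• Compact notation for the multiple regression model

$$y = X\beta + u$$

or

$$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix} =
\begin{bmatrix} 1 & x_{11} & \dots & x_{K1} \\ 1 & x_{12} & \dots & x_{K2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1T} & \dots & x_{KT} \end{bmatrix}
\begin{bmatrix} \alpha \\ \beta_1 \\ \vdots \\ \beta_K \end{bmatrix} +
\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_T \end{bmatrix}$$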

The A-assumptions

Assumption A1: No relevant exogenous variable is omitted from the econometric model, and all exogenous variables included in the model are relevant

Assumption A2: The true functional dependence between $X$ and $y$ is linear

Assumption A3: The parameters $\beta$ are constant for all $T$ observations $(x_t, y_t)$

The B-assumptions

The B-assumptions are the same as in the simple linear model, i.e. $E(u_t) = 0$, $Var(u_t) = \sigma^2$, $Cov(u_t, u_s) = 0$ for $t \neq s$, and normality

B1 to B4 in matrix notation

$$u \sim N\left(0, \sigma^2 I_T\right)$$

The C-assumptions

Assumption C1: The exogenous variables $x_{1t}, \dots, x_{Kt}$ are not stochastic, but can be controlled as in an experimental situation

Assumption C2: No perfect multicollinearity: There are no parameter values $\gamma_0, \gamma_1, \gamma_2, \dots, \gamma_K$ (with at least one $\gamma_k \neq 0$), such that for all $t = 1, \dots, T$

$$\gamma_0 + \gamma_1 x_{1t} + \gamma_2 x_{2t} + \dots + \gamma_K x_{Kt} = 0$$

Assumption C2' in matrix notation:

$$\mathrm{rank}(X) = K + 1$$

(implication: $T \ge K + 1$)

Perfect multicollinearity with two regressors

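• If C2 is violated, there are $\gamma_0, \gamma_1, \gamma_2$ (not all 0) such that

$$\gamma_0 + \gamma_1 x_{1t} + \gamma_2 x_{2t} = 0$$

for all $t = 1, \dots, T$, thus (for $\gamma_2 \neq 0$)

$$x_{2t} = -(\gamma_0/\gamma_2) - (\gamma_1/\gamma_2) x_{1t} = \delta_0 + \delta_1 x_{1t}$$

with $\delta_0 = -(\gamma_0/\gamma_2)$ and $\delta_1 = -(\gamma_1/\gamma_2)$

• Hence, there are not really two regressors, since

$$y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + u_t = \underbrace{(\alpha + \beta_2 \delta_0)}_{=\alpha^0} + \underbrace{(\beta_1 + \beta_2 \delta_1)}_{=\beta^0} x_{1t} + u_t$$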
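The practical consequence of perfect multicollinearity is that $X'X$ becomes singular. A minimal sketch with made-up data in which $x_{2t} = 2 + 3x_{1t}$ (so $\delta_0 = 2$ and $\delta_1 = 3$ are illustrative values):

```python
# Made-up regressor values; x2 is an exact linear function of x1
x1 = [1, 2, 3, 4]
x2 = [2 + 3 * v for v in x1]

# Design matrix with a column of ones, and the cross-product matrix X'X
X = [[1, a, b] for a, b in zip(x1, x2)]
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

print(det3(XtX))  # 0: X'X cannot be inverted
```

Integer arithmetic is used so that the determinant is exactly zero rather than a small rounding error.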

Point estimation

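• The econometric model is

$$y = X\beta + u$$

$$y_t = \alpha + \beta_1 x_{1t} + \dots + \beta_K x_{Kt} + u_t \quad \text{for } t = 1, \dots, T$$

• The estimated model is

$$\hat{y} = X\hat{\beta}$$

$$\hat{y}_t = \hat{\alpha} + \hat{\beta}_1 x_{1t} + \dots + \hat{\beta}_K x_{Kt} \quad \text{for } t = 1, \dots, T$$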

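• Define the residuals

$$\hat{u} = y - \hat{y}, \qquad \hat{u}_t = y_t - \hat{y}_t \quad \text{for } t = 1, \dots, T$$

• How can we find an estimator $\hat{\beta}$ in the multiple regression model?

• The sum of squared residuals is

$$S_{\hat{u}\hat{u}} = \hat{u}'\hat{u} = \sum \hat{u}_t^2$$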

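• Because of

$$\hat{u} = y - X\hat{\beta}, \quad \text{with elements } \hat{u}_t = y_t - \hat{\alpha} - \hat{\beta}_1 x_{1t} - \dots - \hat{\beta}_K x_{Kt},$$

we have

$$S_{\hat{u}\hat{u}} = (y - X\hat{\beta})'(y - X\hat{\beta}) = \sum \left(y_t - \hat{\alpha} - \hat{\beta}_1 x_{1t} - \dots - \hat{\beta}_K x_{Kt}\right)^2$$

• First order conditions

$$\frac{\partial S_{\hat{u}\hat{u}}}{\partial \hat{\beta}} = \begin{bmatrix} \partial S_{\hat{u}\hat{u}}/\partial \hat{\alpha} \\ \partial S_{\hat{u}\hat{u}}/\partial \hat{\beta}_1 \\ \vdots \\ \partial S_{\hat{u}\hat{u}}/\partial \hat{\beta}_K \end{bmatrix} = 0$$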

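• Vector of derivatives

$$\frac{\partial S_{\hat{u}\hat{u}}}{\partial \hat{\beta}} = \frac{\partial}{\partial \hat{\beta}} (y - X\hat{\beta})'(y - X\hat{\beta}) = \frac{\partial}{\partial \hat{\beta}} y'y - \frac{\partial}{\partial \hat{\beta}} 2y'X\hat{\beta} + \frac{\partial}{\partial \hat{\beta}} \hat{\beta}'X'X\hat{\beta} = -2X'y + 2X'X\hat{\beta}$$

• J.R. Magnus, H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, rev. ed., John Wiley & Sons: Chichester, 1999.

• Phoebus J. Dhrymes, Mathematics for Econometrics, 3rd ed., Springer: New York, 2000.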

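• Solving the first order conditions yields the normal equations

$$X'X\hat{\beta} = X'y$$

and thus

$$\hat{\beta} = (X'X)^{-1}X'y$$

• The terms are

$$X'X = \begin{bmatrix} T & \sum x_{1t} & \dots & \sum x_{Kt} \\ \sum x_{1t} & \sum x_{1t}^2 & \dots & \sum x_{1t}x_{Kt} \\ \vdots & \vdots & \ddots & \vdots \\ \sum x_{Kt} & \sum x_{Kt}x_{1t} & \dots & \sum x_{Kt}^2 \end{bmatrix}, \quad
X'y = \begin{bmatrix} \sum y_t \\ \sum x_{1t}y_t \\ \vdots \\ \sum x_{Kt}y_t \end{bmatrix}$$

• Numeric illustration {10}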
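The normal equations can be solved without any matrix library. The sketch below is not the numeric illustration {10}; it uses made-up data generated exactly from $y = 1 + 2x_1 + 3x_2$ (no error term) so that the expected solution is known, and solves $X'X\hat{\beta} = X'y$ by Gaussian elimination with partial pivoting. The helper name `solve` is illustrative.

```python
def solve(A, b):
    """Solve the linear system A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for i in range(n):
        # Swap in the row with the largest pivot element
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    # Back substitution
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

# Made-up data: y is an exact linear function of x1 and x2
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1.0 + 2.0 * a + 3.0 * b for a, b in zip(x1, x2)]

X = [[1.0, a, b] for a, b in zip(x1, x2)]   # T x (K+1) design matrix
Xt = [list(col) for col in zip(*X)]          # transpose X'

XtX = [[sum(u * v for u, v in zip(ri, rj)) for rj in Xt] for ri in Xt]
Xty = [sum(u * v for u, v in zip(ri, y)) for ri in Xt]

beta_hat = solve(XtX, Xty)  # [alpha_hat, beta1_hat, beta2_hat]
print(beta_hat)             # close to [1.0, 2.0, 3.0]
```

Solving the system directly avoids forming $(X'X)^{-1}$ explicitly, which is usually the numerically safer choice.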

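Meaning of the estimators $\hat{\alpha}$, $\hat{\beta}_1$ and $\hat{\beta}_2$

• Formal meaning

$$\frac{\partial \hat{y}_t}{\partial x_{1t}} = \hat{\beta}_1 \quad \text{and} \quad \frac{\partial \hat{y}_t}{\partial x_{2t}} = \hat{\beta}_2$$

• Meaning of $\hat{\alpha}$: for $x_{1t} = x_{2t} = 0$

$$\ln \hat{g}_t = \hat{\alpha} = 0.9543$$

$$\hat{g}_t = e^{0.9543} = 2.5969$$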

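• Meaning of $\hat{\beta}_1$ and $\hat{\beta}_2$:

$$\hat{\beta}_1 = \frac{\partial \hat{y}_t}{\partial x_{1t}} = \frac{\partial (\ln \hat{g}_t)}{\partial (\ln p_t)}$$

• Because of

$$\frac{\partial \ln g_t}{\partial g_t} = \frac{1}{g_t} \quad \text{and} \quad \frac{\partial \ln p_t}{\partial p_t} = \frac{1}{p_t}$$

we find

$$\hat{\beta}_1 = \frac{\partial \hat{g}_t/\hat{g}_t}{\partial p_t/p_t}$$

• $\hat{\beta}_1$ is the estimated elasticity of the barley output with respect to the phosphate fertilizer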

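Coefficient of determination $R^2$

• The total variation of $y$ can be decomposed in the same way as in the simple linear model

$$\underbrace{S_{yy}}_{\text{total variation}} = \underbrace{S_{\hat{y}\hat{y}}}_{\text{explained variation}} + \underbrace{S_{\hat{u}\hat{u}}}_{\text{unexplained variation}}$$

• The coefficient of determination is defined as

$$R^2 = \frac{\text{explained variation}}{\text{total variation}} = \frac{S_{\hat{y}\hat{y}}}{S_{yy}} = \frac{S_{yy} - S_{\hat{u}\hat{u}}}{S_{yy}}$$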

• Graphical illustration

[Figure: Venn diagram of the variations $S_{yy}$, $S_{11}$ and $S_{22}$ with the areas A to G]

• Here

$$R^2 = \frac{A + B + C}{A + B + C + E}$$

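• Computation of $R^2$: In the simple linear regression model

$$R^2 = \frac{S_{\hat{y}\hat{y}}}{S_{yy}} = \frac{\hat{\beta} S_{xy}}{S_{yy}}$$

• It can be shown that in the multiple linear regression model

$$S_{\hat{y}\hat{y}} = \sum_{k=1}^{K} \hat{\beta}_k S_{ky}$$

with the covariations $S_{ky} = \sum_{t=1}^{T} (x_{kt} - \bar{x}_k)(y_t - \bar{y})$

• Then {11}

$$R^2 = \frac{\sum_{k=1}^{K} \hat{\beta}_k S_{ky}}{S_{yy}}$$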

Properties of the OLS estimators

• The estimator $\hat{\beta}$ is a random vector

• The expectation vector is [10]

$$E(\hat{\beta}) = \beta$$

(unbiasedness, Erwartungstreue)

• The covariance matrix of $\hat{\beta}$ is [11]

$$V(\hat{\beta}) = \sigma^2 (X'X)^{-1}$$

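• Special case: Covariance matrix in the two-regressor model:

$$Var(\hat{\beta}_1) = \frac{\sigma^2}{S_{11}\left(1 - R^2_{1\cdot 2}\right)}, \qquad Var(\hat{\beta}_2) = \frac{\sigma^2}{S_{22}\left(1 - R^2_{1\cdot 2}\right)}$$

$$Var(\hat{\alpha}) = \sigma^2/T + \bar{x}_1^2 Var(\hat{\beta}_1) + 2\bar{x}_1\bar{x}_2 Cov(\hat{\beta}_1, \hat{\beta}_2) + \bar{x}_2^2 Var(\hat{\beta}_2)$$

$$Cov(\hat{\beta}_1, \hat{\beta}_2) = \frac{-\sigma^2 R^2_{1\cdot 2}}{S_{12}\left(1 - R^2_{1\cdot 2}\right)}$$

where

$$R^2_{1\cdot 2} = \frac{S_{12}^2}{S_{11}S_{22}}$$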

Gauss-Markov theorem

• The estimator $\hat{\beta} = (X'X)^{-1}X'y$ is linear in $y$, since

$$\hat{\beta} = Dy \quad \text{with } D = (X'X)^{-1}X'$$

• $\hat{\beta} = (X'X)^{-1}X'y$ is not only unbiased but also efficient

• Let $\tilde{\beta}$ be another linear unbiased estimator of $\beta$

• Then $V(\tilde{\beta}) - V(\hat{\beta})$ is positive semidefinite [12]

Distribution of the estimator

• The model is y = Xβ + u

• From $u \sim N(0, \sigma^2 I_T)$ we conclude that $y$ is multivariate normally distributed

• Expectation vector and covariance matrix of the endogenous variable

$$E(y) = E(X\beta + u) = X\beta$$

$$V(y) = V(X\beta + u) = V(u) = \sigma^2 I_T$$

• Thus $y \sim N(X\beta, \sigma^2 I_T)$

• How is the estimator $\hat{\beta}$ distributed?

• Since $\hat{\beta} = (X'X)^{-1}X'y$, the estimator $\hat{\beta}$ also has a multivariate normal distribution

• Expectation vector and covariance matrix are already known

• Hence

$$\hat{\beta} \sim N\left(\beta, \sigma^2 (X'X)^{-1}\right)$$

• Problem: The error term variance $\sigma^2$ is unknown

• The covariance matrix $V(\hat{\beta})$ cannot be computed without $\sigma^2$

• Since usually $\sigma^2$ is unknown, it has to be estimated

• An estimator of $\sigma^2$ is

$$\hat{\sigma}^2 = \frac{S_{\hat{u}\hat{u}}}{T - K - 1}$$

• Its expectation is $E(\hat{\sigma}^2) = \sigma^2$ [13]{12}

• The "residual maker matrix"

$$M = I_T - X(X'X)^{-1}X'$$

Interval estimation

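• Interval estimation of a single component $\beta_k$ of the vector $\beta$

$$P\left(\hat{\beta}_k - c \le \beta_k \le \hat{\beta}_k + c\right) = 1 - a$$

• We know that

$$\hat{\beta}_k \sim N(\beta_k, Var(\hat{\beta}_k))$$

where $Var(\hat{\beta}_k)$ is the $(k+1)$th diagonal element of $\sigma^2 (X'X)^{-1}$

• Problem: $\sigma^2$ and $Var(\hat{\beta}_k)$ are unknown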

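• Step 1: Estimation of $\sigma^2$ by $\hat{\sigma}^2$ and of $se(\hat{\beta}_k) = \sqrt{Var(\hat{\beta}_k)}$ by

$$\widehat{se}(\hat{\beta}_k) = \sqrt{\widehat{Var}(\hat{\beta}_k)}$$

• Step 2: Standardization of $\hat{\beta}_k$

$$t = \frac{\hat{\beta}_k - E(\hat{\beta}_k)}{\widehat{se}(\hat{\beta}_k)} = \frac{\hat{\beta}_k - \beta_k}{\widehat{se}(\hat{\beta}_k)} \sim t_{(T-K-1)}$$

• Step 3: Find the $t_{a/2}$-value

• Step 4: The $(1 - a)$-interval estimator is {13}

$$\left[\hat{\beta}_k - t_{a/2} \cdot \widehat{se}(\hat{\beta}_k) \,;\, \hat{\beta}_k + t_{a/2} \cdot \widehat{se}(\hat{\beta}_k)\right]$$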

Interval estimation of linear combinations of β

• Let $r$ be an arbitrary $(K+1)$-column vector

• How can we find a confidence interval for $r'\beta$?

• Fertilizer example: $r = [0, 1, 1]'$, then $r'\beta = \beta_1 + \beta_2$ (economies of scale?)

• The point estimator of $r'\beta$ is $r'\hat{\beta}$

• The variance of $r'\hat{\beta}$ is $r'V(\hat{\beta})r = \sigma^2 r'(X'X)^{-1}r$

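• The confidence interval for $r'\beta$ is

$$\left[r'\hat{\beta} - t_{a/2} \cdot \hat{\sigma}\sqrt{r'(X'X)^{-1}r} \,;\, r'\hat{\beta} + t_{a/2} \cdot \hat{\sigma}\sqrt{r'(X'X)^{-1}r}\right]$$

• Special case of a single component

$$\beta_k = r'\beta$$

for

$$r = [0, \dots, 0, 1, 0, \dots, 0]'$$

where the 1 is located at the $(k+1)$th position (the first position belongs to the intercept $\alpha$)

• Then $Var(\hat{\beta}_k) = r'\sigma^2 (X'X)^{-1}r$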

Hypothesis tests: t-test

• There are tests of a single linear combination (t-tests) and tests of multiple linear combinations (F-tests)

• Testing a single linear combination of parameters: t-test (two-sided)

• Remember: In the simple linear regression case

$$H_0: \beta = q \qquad H_1: \beta \neq q$$

• In the multiple linear model the null and alternative hypotheses are

$$H_0: r_0\alpha + r_1\beta_1 + \dots + r_K\beta_K = q$$

$$H_1: r_0\alpha + r_1\beta_1 + \dots + r_K\beta_K \neq q$$

or

$$H_0: r'\beta = q \qquad H_1: r'\beta \neq q$$

where

$$r = [r_0, r_1, \dots, r_K]'$$

The test procedure:

1. Set up $H_0$ and $H_1$ and fix the significance level $a$

2. Estimate $se(r'\hat{\beta})$

3. Compute the t-statistic

4. Find the critical value $t_{a/2}$

5. Test decision: Compare $t_{a/2}$ and $t$ {14}

• The left-sided t-test

$$H_0: r'\beta \ge q \qquad H_1: r'\beta < q$$

and the right-sided test

$$H_0: r'\beta \le q \qquad H_1: r'\beta > q$$

are similar

• The critical values are lower quantiles of the t-distribution for the left-sided test and upper quantiles for the right-sided test {14}

Hypothesis tests: F-test

• Simultaneous test of two or more linear combinations (restrictions)

• Null hypothesis and alternative hypothesis

$$H_0: R\beta = q \qquad H_1: R\beta \neq q$$

• Examples:

$$H_0: \beta_1 = \beta_2 = \dots = \beta_K = 0$$

$$H_0: \beta_1 = \beta_2 = \dots = \beta_K$$

$$H_0: \beta_1 + \dots + \beta_K = 1 \text{ and } \beta_1 = 2\beta_2$$

$$H_0: \beta_1 = 5 \text{ and } \beta_2 = \dots = \beta_K = 0$$

• Basic idea of the F-test: Compare the restricted and the unrestricted model

• Sum of squared residuals of the econometric model and of the model under the null hypothesis

$$S_{\hat{u}\hat{u}} = \hat{u}'\hat{u} = \sum_{t=1}^{T} \hat{u}_t^2$$

$$S_{\hat{u}^0\hat{u}^0} = \hat{u}^{0\prime}\hat{u}^0 = \sum_{t=1}^{T} \left(\hat{u}_t^0\right)^2$$

where $\hat{u}^0$ are the residuals if the model is estimated under the restrictions of the null hypothesis

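• Example: Under the null hypothesis $\beta_1 = \dots = \beta_K = 0$ the model is

$$y_t = \alpha + 0 \cdot x_{1t} + \dots + 0 \cdot x_{Kt} + u_t = \alpha + u_t$$

• Obviously, $S_{\hat{u}^0\hat{u}^0} \ge S_{\hat{u}\hat{u}}$; the null hypothesis is likely to be false if $S_{\hat{u}^0\hat{u}^0}$ is "much larger" than $S_{\hat{u}\hat{u}}$

• The test statistic is

$$F = \frac{\left(S_{\hat{u}^0\hat{u}^0} - S_{\hat{u}\hat{u}}\right)/L}{S_{\hat{u}\hat{u}}/(T - K - 1)}$$

where $L$ is the number of restrictions in $H_0$

• If the null hypothesis is true, then $F \sim F_{(L, T-K-1)}$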

The five steps of the F-test

1. Set up $H_0$ and $H_1$ and choose the significance level $a$

2. Calculate $S_{\hat{u}\hat{u}}$ and $S_{\hat{u}^0\hat{u}^0}$ (more on the computation of $S_{\hat{u}^0\hat{u}^0}$ later)

3. Compute the F-test statistic

4. Find the critical value $F_a$, i.e. the upper $a$-quantile of the $F_{(L, T-K-1)}$-distribution

5. Reject $H_0$ if $F > F_a$ {15}

Remarks:

• For L = 1 the F -test is identical to a two-sided t-test

• Careful: A combination of t-tests is not the same as a single F -test

• The decisions of individual t-tests and of a simultaneous F-test can contradict each other

• Distinction between individual t-tests and a simultaneous F -test
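The first remark can be verified numerically. The sketch below uses the three-point example from the simple regression chapter ($K = 1$, so $T - K - 1 = 1$) and the single restriction $H_0: \beta = 0$; under this restriction the model is $y_t = \alpha + u_t$, whose least squares residuals are $y_t - \bar{y}$.

```python
import math

x = [10.0, 30.0, 50.0]
y = [2.0, 3.0, 7.0]
T, K, L = len(x), 1, 1   # one regressor, one restriction

x_bar, y_bar = sum(x) / T, sum(y) / T
S_xx = sum((xt - x_bar) ** 2 for xt in x)
S_xy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))
beta_hat = S_xy / S_xx
alpha_hat = y_bar - beta_hat * x_bar

# Unrestricted sum of squared residuals
S_uu = sum((yt - alpha_hat - beta_hat * xt) ** 2 for xt, yt in zip(x, y))

# Restricted model y_t = alpha + u_t: the estimate of alpha is y_bar
S_uu_restricted = sum((yt - y_bar) ** 2 for yt in y)

# F-statistic
F = ((S_uu_restricted - S_uu) / L) / (S_uu / (T - K - 1))

# t-statistic for H0: beta = 0
sigma2_hat = S_uu / (T - K - 1)
t_stat = beta_hat / math.sqrt(sigma2_hat / S_xx)

print(F, t_stat ** 2)  # both about 8.33
```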


• Example: H0 : β1 = β2 = 0.33 {16}


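Computation of $\hat{u}^{0\prime}\hat{u}^0$

• Estimate $\beta$ subject to the restrictions $R\beta = q$ given in the null hypothesis

• Optimization under constraints: Minimize

$$S_{\hat{u}^0\hat{u}^0} = (y - X\hat{\beta})'(y - X\hat{\beta})$$

with respect to $\hat{\beta}$ subject to $R\hat{\beta} = q$

• A standard Lagrange approach yields [14]

$$\hat{\beta}^{RLS} = \hat{\beta} - (X'X)^{-1}R'\left[R(X'X)^{-1}R'\right]^{-1}\left(R\hat{\beta} - q\right)$$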

• Residuals of the restricted model: $\hat{u}^0 = y - X\hat{\beta}^{RLS}$ {17}

• The F-test statistic can also be written as [15]

$$F = \frac{\left(R\hat{\beta} - q\right)'\left[R(X'X)^{-1}R'\right]^{-1}\left(R\hat{\beta} - q\right)/L}{\hat{u}'\hat{u}/(T - K - 1)}$$

• Note the similarity to the t-test statistic

$$t^2 = \frac{\left(r'\hat{\beta} - q\right)^2}{\hat{\sigma}^2\left[r'(X'X)^{-1}r\right]}$$

• Standard statistical software includes simultaneous tests of linear combinations (F-tests)

Maximum likelihood estimation

• Reminder: If $X$ is a $K$-dimensional random vector with multivariate normal distribution $N(\mu, \Sigma)$, then its joint density is

$$f_X(x) = (2\pi)^{-K/2} (\det \Sigma)^{-1/2} \exp\left(-\frac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\right)$$

• Multiple linear regression model

$$y = X\beta + u \quad \text{with } u \sim N\left(0, \sigma^2 I\right)$$

• Distribution of the endogenous variables: $y \sim N\left(X\beta, \sigma^2 I\right)$

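• Joint density of $y$

$$f_y(y) = (2\pi)^{-T/2}\left(\det \sigma^2 I\right)^{-1/2} \exp\left(-\frac{1}{2}(y - X\beta)'\left(\sigma^2 I\right)^{-1}(y - X\beta)\right) = (2\pi)^{-T/2}\left(\sigma^{2T}\right)^{-1/2} \exp\left(-\frac{(y - X\beta)'(y - X\beta)}{2\sigma^2}\right)$$

• Log-likelihood function

$$\ln L\left(\beta, \sigma^2\right) = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln\sigma^2 - \frac{(y - X\beta)'(y - X\beta)}{2\sigma^2}$$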

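• First order condition for a maximum

$$\begin{bmatrix} \dfrac{\partial \ln L}{\partial \beta} \\[2ex] \dfrac{\partial \ln L}{\partial \sigma^2} \end{bmatrix} = \begin{bmatrix} \dfrac{X'(y - X\beta)}{\sigma^2} \\[2ex] -\dfrac{T}{2\sigma^2} + \dfrac{(y - X\beta)'(y - X\beta)}{2\sigma^4} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

• Solution of the FOCs [16]

$$\hat{\beta}_{ML} = (X'X)^{-1}X'y$$

$$\hat{\sigma}^2_{ML} = \frac{1}{T}\hat{u}'\hat{u}$$

• The ML estimator of $\beta$ is identical to the OLS estimator; the ML estimator of $\sigma^2$ divides by $T$ instead of $T - K - 1$ and is thus biased (but asymptotically unbiased)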

The classical tests (LR, Wald, LM)

• Illustration of the basic test ideas [threetests.R]

• Generalization to multiple restrictions

$$H_0: g(\beta) = 0 \qquad H_1: g(\beta) \neq 0$$

where $\beta$ is the coefficient vector of a multiple linear regression model and $g$ is a (possibly nonlinear) vector-valued function

• Test of $L$ linear restrictions: $g(\beta) = R\beta - q$

Wald test

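• Idea: If $g(\hat{\beta}_{ML})$ is significantly different from 0, reject $H_0$

• Test statistic (for multiple restrictions)

$$W = g\left(\hat{\beta}_{ML}\right)'\left[\widehat{Cov}\left(g\left(\hat{\beta}_{ML}\right)\right)\right]^{-1} g\left(\hat{\beta}_{ML}\right) \xrightarrow{d} U \sim \chi^2_L$$

if the null hypothesis is true

• Wald test statistic for $L$ linear restrictions $R\beta - q = 0$ [17]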

Likelihood ratio (LR) test

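• Idea: If the maximal likelihood under the restrictions, $L(\hat{\beta}_R, \hat{\sigma}^2_R)$, is significantly lower than the maximal likelihood without restrictions, $L(\hat{\beta}_{ML}, \hat{\sigma}^2_{ML})$, then reject $H_0$

• Test statistic

$$LR = 2\left(\ln L\left(\hat{\beta}_{ML}, \hat{\sigma}^2_{ML}\right) - \ln L\left(\hat{\beta}_R, \hat{\sigma}^2_R\right)\right) \xrightarrow{d} U \sim \chi^2_L$$

• LR test statistic for $L$ linear restrictions $R\beta - q = 0$ [18]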

Lagrange multiplier (LM) test

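• Idea: If the slope of the log-likelihood function $\partial \ln L(\hat{\beta}_R)/\partial \beta$ is significantly different from 0, reject $H_0$

• Test statistic

$$LM = \left(\frac{\partial \ln L(\hat{\beta}_R)}{\partial \beta}\right)'\left[\widehat{Cov}\left(\hat{\beta}_R\right)\right]^{-1}\left(\frac{\partial \ln L(\hat{\beta}_R)}{\partial \beta}\right) \xrightarrow{d} U \sim \chi^2_L$$

• LM test statistic for $L$ linear restrictions $R\beta - q = 0$ [19]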

Forecasting

• The approach is similar to forecasting in the simple linear regression

• Let $x_0 = [1, x_{10}, x_{20}, \dots, x_{K0}]'$ denote the vector of exogenous variables

• Point forecast

$$\hat{y}_0 = x_0'\hat{\beta}$$

• Variance of the forecast error [20]

$$Var(\hat{y}_0 - y_0) = \sigma^2\left(1 + x_0'(X'X)^{-1}x_0\right)$$

Presentation of the results

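• In the literature, the results of regression analyses are often presented as follows, with the estimated standard errors in parentheses below the coefficients

$$\hat{y} = \underset{(\widehat{se}(\hat{\alpha}))}{\hat{\alpha}} + \underset{(\widehat{se}(\hat{\beta}_1))}{\hat{\beta}_1} x_1 + \dots + \underset{(\widehat{se}(\hat{\beta}_K))}{\hat{\beta}_K} x_K$$

• Sometimes you find t-values in the parentheses, i.e. the values of the test statistics for the tests $H_0: \beta_k = 0$ vs $H_1: \beta_k \neq 0$

• Often, $R^2$ and $\hat{\sigma}$ and the value of the test statistic of the F-test

$$H_0: \beta_1 = \dots = \beta_K = 0 \text{ vs } H_1: \text{not } H_0$$

are reported additionally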

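• Fertilizer example (estimated standard errors in parentheses):

$$\hat{y} = \underset{(0.46943)}{0.95432} + \underset{(0.13788)}{0.59652}\, x_1 + \underset{(0.03400)}{0.26255}\, x_2$$

• Additional results

$$R^2 = 0.743, \qquad \hat{\sigma}^2 = 0.00425, \qquad \hat{\sigma} = 0.0652$$

• Test statistics

$H_0: \beta_1 = 0$ → 4.326

$H_0: \beta_2 = 0$ → 7.723

$H_0: \beta_1 = \beta_2 = 0$ → 38.98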

Examples of computer output:

• Excel

• SPSS

• EViews

• Stata

• R

• MATLAB

Assumptions

A1: No relevant variable is omitted, and no irrelevant variables are included

A2: The true functional dependence between $X$ and $y$ is linear

A3: The parameters $\beta$ are constant for all $T$ observations $(x_t, y_t)$

B1-B4: $u \sim N\left(0, \sigma^2 I_T\right)$

C1: The exogenous variables are not stochastic

C2: No perfect multicollinearity: $\mathrm{rank}(X) = K + 1$

All assumptions can be violated

What happens if they are violated?

Omitted or irrelevant variables

• Assumption A1: No relevant exogenous variable is omitted from the econometric model, and all exogenous variables included in the model are relevant

• What happens if relevant variables are missing?

• What happens if there are irrelevant variables included in the model?

• Example: Wage structure in a firm with 20 employees; what are the determinants of the wage $y_t$?

Data: Wage $y_t$; education $x_{1t}$; age $x_{2t}$; firm tenure $x_{3t}$

 t   y_t  x_1t  x_2t  x_3t  |   t   y_t  x_1t  x_2t  x_3t
 1  1250     1    28    12  |  11  1350     1    30    13
 2  1950     9    34     8  |  12  1600     2    43    21
 3  2300    11    55    25  |  13  1400     2    23     5
 4  1350     3    24     5  |  14  1500     3    21     1
 5  1650     2    42    21  |  15  2350     6    50    22
 6  1750     1    43    19  |  16  1700     9    64    36
 7  1550     4    37    17  |  17  1350     1    36    10
 8  1400     1    18     1  |  18  2600     7    58    30
 9  1700     3    63    25  |  19  1400     2    35    17
10  2000     4    58    30  |  20  1550     2    41     6

• Three potential models (M2 is the true model)

(M1) yt = α + βx1t + u′t
(M2) yt = α + β1x1t + β2x2t + ut
(M3) yt = α + β1x1t + β2x2t + β3x3t + u″t

Model  Variable      Coeff.   ŝe(.)   t-test   p-value
(M1)   Constant      1354.7    94.2    14.38    0.0000
       Education       89.3    19.8     4.50    0.0003
(M2)   Constant      1027.8   164.5     6.25    0.0000
       Education       62.6    21.2     2.95    0.0089
       Age             10.6     4.6     2.32    0.0333
(M3)   Constant      1000.5   225.7     4.43    0.0004
       Education       62.4    21.8     2.86    0.0114
       Age             12.4    10.7     1.16    0.2634
       Firm tenure     -2.6    14.3    -0.18    0.8569

Omitted relevant variables

• Graphical representation

[Figure: Venn diagram of the variations Syy, S11 and S22, with the overlapping areas labelled A to G]

• The models:

(M1) yt = α + βx1t + u′t
(M2) yt = α + β1x1t + β2x2t + ut
(M3) yt = α + β1x1t + β2x2t + β3x3t + u″t

• The error terms

\[ u_t' = \beta_2 x_{2t} + u_t \]

\[ \begin{aligned} E(u_t') &= E(\beta_2 x_{2t} + u_t) \\ &= \beta_2 x_{2t} + E(u_t) \\ &= \beta_2 x_{2t} + 0 \\ &\neq 0 \end{aligned} \]

• If a relevant exogenous variable is omitted, assumption B1 is violated!

• Consequence for point estimation

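\[ \hat\beta_1' = \hat\beta_1 + \hat\beta_2 \frac{S_{12}}{S_{11}} \]

\[ E(\hat\beta_1') = E\left( \hat\beta_1 + \hat\beta_2 \frac{S_{12}}{S_{11}} \right) = \beta_1 + \beta_2 \frac{S_{12}}{S_{11}} \]

• Consequence for interval estimation

\[ \left[ \hat\beta_1' - t_{a/2} \cdot \widehat{se}(\hat\beta_1') \; ; \; \hat\beta_1' + t_{a/2} \cdot \widehat{se}(\hat\beta_1') \right] \]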
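The decomposition of the slope in the short regression can be checked numerically on the wage data. A minimal sketch in plain Python, assuming the data are read from the wage table as y (wage), x1 (education) and x2 (age); the helper `cross` and all variable names are illustrative:

```python
# Wage data from the table of 20 employees: wage, education, age.
y = [1250, 1950, 2300, 1350, 1650, 1750, 1550, 1400, 1700, 2000,
     1350, 1600, 1400, 1500, 2350, 1700, 1350, 2600, 1400, 1550]
x1 = [1, 9, 11, 3, 2, 1, 4, 1, 3, 4, 1, 2, 2, 3, 6, 9, 1, 7, 2, 2]
x2 = [28, 34, 55, 24, 42, 43, 37, 18, 63, 58,
      30, 43, 23, 21, 50, 64, 36, 58, 35, 41]


def cross(a, b):
    """Sum of cross-deviations S_ab = sum((a_t - mean(a)) * (b_t - mean(b)))."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


S11, S22, S12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
S1y, S2y = cross(x1, y), cross(x2, y)

# (M1) age omitted: slope of the simple regression of wage on education.
b1_short = S1y / S11

# (M2) slopes of the regression of wage on education and age.
det = S11 * S22 - S12 ** 2
b1 = (S22 * S1y - S12 * S2y) / det
b2 = (S11 * S2y - S12 * S1y) / det

# The short-regression slope equals b1 + b2 * S12 / S11.
print(round(b1_short, 1), round(b1, 1), round(b2, 1))
print(round(b1 + b2 * S12 / S11, 1))
```

The three slopes should match the table entries for (M1) and (M2), and the last line reproduces the (M1) slope from the (M2) estimates.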

• Further

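\[ se(\hat\beta_1') = \sqrt{var(\hat\beta_1')} \quad \text{with} \quad var(\hat\beta_1') = \frac{\sigma^2}{S_{11}} \]

• The estimator

\[ \hat\sigma^2 = \frac{S_{\hat u' \hat u'}}{T - 2} \]

is biased; the unbiased estimator is

\[ \hat\sigma^2 = \frac{S_{\hat u \hat u}}{T - 3} \]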

• Conclusion: The coverage probability of the confidence intervals is not 1− α

• Hypothesis tests are also biased: The probability of an error of the first kind does not equal the significance level

• If a relevant exogenous variable is omitted, then

— the point estimators are biased and inconsistent

— the interval estimators and hypothesis tests are no longer valid {18}


Irrelevant variables

• The error term in the misspecified model M3 is

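\[ u_t'' = u_t - \beta_3 x_{3t} \]

and since β3 = 0

\[ u_t'' = u_t \]

• Consequently,

\[ \begin{aligned} E(\hat\alpha'') &= \alpha \\ E(\hat\beta_1'') &= \beta_1 \\ E(\hat\beta_2'') &= \beta_2 \\ E(\hat\beta_3'') &= \beta_3 = 0 \end{aligned} \]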

• The variances of the estimators are

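\[ \operatorname{Var}(\hat\beta_1) = \frac{\sigma^2}{S_{11} \left( 1 - R^2_{1 \cdot 2} \right)} \qquad \operatorname{Var}(\hat\beta_1'') = \frac{\sigma^2}{S_{11} \left( 1 - R^2_{1 \cdot 2} - R^2_{1 \cdot 3} \right)} \]

• The estimated error term variance is

\[ \hat\sigma^2 = \frac{S_{\hat u'' \hat u''}}{T - 4} \]

• Conclusion: Omitted relevant variables are a serious problem, redundant variables are not (but they inflate the standard errors)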
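The variance formula for the correctly specified model (M2) can be evaluated on the wage data. A minimal sketch, assuming that R²1·2 denotes the coefficient of determination of a regression of x1 on x2 (which for a single regressor is S12²/(S11·S22)); names such as `inflation` are illustrative:

```python
import math

# Wage data from the table of 20 employees: wage, education, age.
y = [1250, 1950, 2300, 1350, 1650, 1750, 1550, 1400, 1700, 2000,
     1350, 1600, 1400, 1500, 2350, 1700, 1350, 2600, 1400, 1550]
x1 = [1, 9, 11, 3, 2, 1, 4, 1, 3, 4, 1, 2, 2, 3, 6, 9, 1, 7, 2, 2]
x2 = [28, 34, 55, 24, 42, 43, 37, 18, 63, 58,
      30, 43, 23, 21, 50, 64, 36, 58, 35, 41]
T = len(y)


def cross(a, b):
    """Sum of cross-deviations S_ab."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


S11, S22, S12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
S1y, S2y, Syy = cross(x1, y), cross(x2, y), cross(y, y)

# OLS slopes of (M2) and the unbiased estimate of the error variance.
det = S11 * S22 - S12 ** 2
b1 = (S22 * S1y - S12 * S2y) / det
b2 = (S11 * S2y - S12 * S1y) / det
ssr = Syy - b1 * S1y - b2 * S2y
sigma2_hat = ssr / (T - 3)

# R^2 of x1 on x2 and the resulting estimated standard errors.
r2_12 = S12 ** 2 / (S11 * S22)
se_b1 = math.sqrt(sigma2_hat / (S11 * (1 - r2_12)))
se_b2 = math.sqrt(sigma2_hat / (S22 * (1 - r2_12)))

# Factor by which correlation between the regressors inflates the variance.
inflation = 1 / (1 - r2_12)
print(round(se_b1, 1), round(se_b2, 1), round(inflation, 2))
```

The two standard errors should be close to the values reported for (M2) in the results table. The more a regressor is explained by the other regressors, the larger the inflation factor becomes.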

Diagnosis

• How can we find the correct model?

• The coefficient of determination R² does not help select a model

• Adjusted R2

\[ \bar R^2 = 1 - \frac{S_{\hat u \hat u} / (T - K - 1)}{S_{yy} / (T - 1)} = 1 - \left( 1 - R^2 \right) \frac{T - 1}{T - K - 1} \]

• Further model selection criteria (trade-off between biasedness and inefficiency)

• Akaike information criterion (AIC)

\[ AIC = \ln\left( \frac{S_{\hat u \hat u}}{T} \right) + \frac{2(K + 1)}{T} \]

• t-test for single variables; F-test for multiple variables
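Both criteria can be computed from the sum of squared residuals alone. A minimal sketch comparing (M1) and (M2) on the wage data; the function names `adjusted_r2` and `aic` are illustrative, and (M3) is left out only to keep the example short:

```python
import math

# Wage data from the table of 20 employees: wage, education, age.
y = [1250, 1950, 2300, 1350, 1650, 1750, 1550, 1400, 1700, 2000,
     1350, 1600, 1400, 1500, 2350, 1700, 1350, 2600, 1400, 1550]
x1 = [1, 9, 11, 3, 2, 1, 4, 1, 3, 4, 1, 2, 2, 3, 6, 9, 1, 7, 2, 2]
x2 = [28, 34, 55, 24, 42, 43, 37, 18, 63, 58,
      30, 43, 23, 21, 50, 64, 36, 58, 35, 41]
T = len(y)


def cross(a, b):
    """Sum of cross-deviations S_ab."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


def adjusted_r2(ssr, syy, T, K):
    """Adjusted R^2 for a model with K exogenous variables plus a constant."""
    return 1 - (ssr / (T - K - 1)) / (syy / (T - 1))


def aic(ssr, T, K):
    """AIC in the form ln(SSR / T) + 2 (K + 1) / T; smaller is better."""
    return math.log(ssr / T) + 2 * (K + 1) / T


S11, S22, S12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
S1y, S2y, Syy = cross(x1, y), cross(x2, y), cross(y, y)

# Sum of squared residuals of (M1): wage on education only.
ssr_m1 = Syy - S1y ** 2 / S11

# Sum of squared residuals of (M2): wage on education and age.
det = S11 * S22 - S12 ** 2
b1 = (S22 * S1y - S12 * S2y) / det
b2 = (S11 * S2y - S12 * S1y) / det
ssr_m2 = Syy - b1 * S1y - b2 * S2y

for name, ssr, K in [("M1", ssr_m1, 1), ("M2", ssr_m2, 2)]:
    print(name, round(adjusted_r2(ssr, Syy, T, K), 3), round(aic(ssr, T, K), 3))
```

On these data both criteria should favour (M2): its adjusted R² is higher and its AIC is lower.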

Functional form

• Assumption A2: The true functional dependence between X and y is linear

• Milk example: Milk production m depends on amount of concentrated feed f

 t   ft    mt
 1   10  6525
 2   30  8437
 3   20  8019
 4   33  8255
 5    5  5335
 6   22  7236
 7    8  5821
 8   14  7531
 9   25  8320
10    1  4336
11   17  7225
12   28  8112

[Figure: scatter plot of milk production (vertical axis, about 5000 to 8000) against concentrated feed (horizontal axis, 0 to 30)]

• A misspecified model returns useless results

• Some nonlinear dependencies

Semi-log:     mt = α + β ln ft + ut

Inverse:      mt = α + β (1/ft) + ut

Exponential:  ln mt = α + β ft + ut

Logarithmic:  ln mt = α + β ln ft + ut

Quadratic:    mt = α + β1 ft + β2 ft² + ut

• Approach I: Estimation of a nonlinear regression

yt = g(xt) + ut

with criterion function

\[ \sum_{t=1}^{T} \left( y_t - g(x_t) \right)^2 \]

• Optimization by numerical methods

• Approach II: Linearization of the model; then linear regression

\[ y_t = \alpha + \beta x_t + u_t, \qquad y_t = \ln m_t, \quad x_t = \ln f_t \]
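Approach II can be illustrated with the logarithmic specification on the milk data. A minimal sketch, assuming the data are read from the milk table as f (concentrated feed) and m (milk production); `ols_simple` is an illustrative helper, not a library function:

```python
import math

# Milk data from the table: concentrated feed f_t and milk production m_t.
f = [10, 30, 20, 33, 5, 22, 8, 14, 25, 1, 17, 28]
m = [6525, 8437, 8019, 8255, 5335, 7236, 5821, 7531, 8320, 4336, 7225, 8112]

# Linearization: y_t = ln m_t and x_t = ln f_t.
y = [math.log(v) for v in m]
x = [math.log(v) for v in f]


def ols_simple(x, y):
    """Intercept and slope of a simple linear regression of y on x."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


alpha, beta = ols_simple(x, y)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
```

In the log-log model, beta can be read as an elasticity: the approximate percentage change in milk production for a one percent change in feed.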

Diagnosis: Regression Specification Error Test (RESET)

• Higher order Taylor approximation

\[ y_t = f(x_t) = \alpha + \beta_1 x_t + \beta_2 x_t^2 + \beta_3 x_t^3 + \ldots \]

• Are the higher orders (jointly) significant?

• F -test of β2 = β3 = . . . = 0

• Problem: What happens if there are many exogenous variables?


• Basic idea of the RESET: ŷt², ŷt³, . . . are included as additional exogenous variables

\[ y_t = \alpha + \beta_1 x_t + \gamma_2 \hat y_t^2 + \gamma_3 \hat y_t^3 + u_t \]

• If γ2 and/or γ3 are significant, then there are nonlinearities

• F-test of γ2 = γ3 = 0 (maybe even higher orders)

• The test is implemented in many statistical software packages

RESET in the linear model:

1. Estimate the linear model and calculate the sum of squared residuals S_ûû and the fitted values ŷt

2. Add L powers of ŷt to the linear model, e.g. for L = 2

\[ y_t = \alpha + \beta_1 x_t + \gamma_2 \hat y_t^2 + \gamma_3 \hat y_t^3 + u_t \]

Estimate the extended model and calculate the sum of squared residuals S*_ûû

3. The null hypothesis is H0 : γ2 = γ3 = 0

4. Compute the F-test statistic

\[ F_{(L,\, T - K^* - 1)} = \frac{\left( S_{\hat u \hat u} - S^*_{\hat u \hat u} \right) / L}{S^*_{\hat u \hat u} / (T - K^* - 1)} \]

where K* is the number of exogenous variables in the extended model

5. If F > Fa (significance level a, degrees of freedom L and T − K* − 1), then H0 is rejected and the linear model is discarded

• Milk example {18}
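The five steps can be sketched in plain Python for the milk data. This is a simplified variant under stated assumptions, not necessarily the calculation used in the lecture: only the squared fitted values are added (L = 1, so K* = 2), the fitted values are rescaled to thousands purely for numerical convenience (this leaves the F statistic unchanged), and the 5% critical value of F(1, 9) is taken as 5.12 from standard tables.

```python
# Milk data from the table: concentrated feed f_t and milk production m_t.
f = [10, 30, 20, 33, 5, 22, 8, 14, 25, 1, 17, 28]
m = [6525, 8437, 8019, 8255, 5335, 7236, 5821, 7531, 8320, 4336, 7225, 8112]
T = len(m)


def mean(v):
    return sum(v) / len(v)


def cross(a, b):
    """Sum of cross-deviations S_ab."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


# Step 1: linear model, fitted values and sum of squared residuals.
b = cross(f, m) / cross(f, f)
a = mean(m) - b * mean(f)
fitted = [a + b * fi for fi in f]
ssr_lin = sum((mi - yi) ** 2 for mi, yi in zip(m, fitted))

# Step 2: extended model with the squared fitted values as extra regressor.
z = [(yi / 1000.0) ** 2 for yi in fitted]
S11, S22, S12 = cross(f, f), cross(z, z), cross(f, z)
S1y, S2y = cross(f, m), cross(z, m)
det = S11 * S22 - S12 ** 2
b1 = (S22 * S1y - S12 * S2y) / det
g2 = (S11 * S2y - S12 * S1y) / det
a_ext = mean(m) - b1 * mean(f) - g2 * mean(z)
ssr_ext = sum((mi - a_ext - b1 * fi - g2 * zi) ** 2
              for mi, fi, zi in zip(m, f, z))

# Steps 3 to 5: F statistic for H0: gamma_2 = 0 and comparison with F_a.
L, K_star = 1, 2
F = ((ssr_lin - ssr_ext) / L) / (ssr_ext / (T - K_star - 1))
F_CRIT_5PCT = 5.12  # tabulated 5% critical value of F(1, 9)
print(f"F = {F:.2f}, reject linear model: {F > F_CRIT_5PCT}")
```

With further powers (L > 1) the extended regression has more than two regressors, so a general least-squares routine would be needed instead of the two-regressor formulas used here.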


Qualitative exogenous variables

• Assumption A3: The parameters β are constant for all T observations (xt, yt)

• Example: The wage yt depends on education x1t and age x2t


• The wage equations for males and females might be different

yt = αM + βM1x1t + βM2x2t + ut

yt = αF + βF1x1t + βF2x2t + ut

• What happens if the difference is neglected? [qualitative.R]


• Dummy variable

\[ D_t = \begin{cases} 0 & \text{if male} \\ 1 & \text{if female} \end{cases} \]

• Extended model

yt = α+Dtγ + β1x1t + δ1Dtx1t + β2x2t + δ2Dtx2t + ut

• Model for men (Dt = 0)

yt = α + β1x1t + β2x2t + ut

• Model for women (Dt = 1)

yt = (α+ γ) + (β1 + δ1)x1t + (β2 + δ2)x2t + ut


• If the qualitative variable has more than two values, we need more than one dummy variable

• Example: Religion (protestant, catholic, other)

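\[ D_{Pt} = \begin{cases} 0 & \text{for other} \\ 1 & \text{for protestant} \\ 0 & \text{for catholic} \end{cases} \qquad D_{Ct} = \begin{cases} 0 & \text{for other} \\ 0 & \text{for protestant} \\ 1 & \text{for catholic} \end{cases} \]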

• Meaning of the coefficients; testing structural stability
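The coding of a three-valued qualitative variable into two dummies can be sketched as follows. The sample values in `religion` are hypothetical and serve only to illustrate the coding; "other" is the reference category.

```python
# Hypothetical sample of a qualitative variable with three values.
religion = ["protestant", "catholic", "other", "catholic", "protestant"]

# One dummy per non-reference category; "other" is coded 0 in both.
DP = [1 if r == "protestant" else 0 for r in religion]
DC = [1 if r == "catholic" else 0 for r in religion]

for r, dp, dc in zip(religion, DP, DC):
    print(f"{r:<11} DP = {dp}  DC = {dc}")
```

A third dummy for "other" is not added: together with DP and DC it would always sum to one, i.e. to the constant, which would violate assumption C2 (no perfect multicollinearity).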


• Estimation of the model

• Use the ordinary t- or F -tests to detect differences in the coefficients, e.g.

H0 : γ = δ1 = δ2 = 0

• Very often, the model includes only a level effect, i.e.

yt = α+ γDt + β1x1t + β2x2t + ut

• Then use a t-test for γ


• Estimation of the wage equation model

yt = α+Dtγ + β1x1t + δ1Dtx1t + β2x2t + δ2Dtx2t + ut

• Compare with separate estimation of the two models [wages.R]

yt = αM + βM1x1t + βM2x2t + ut for men
yt = αF + βF1x1t + βF2x2t + ut for women

• The point estimates and the sum of squared residuals are identical (why?)

• The standard errors differ (why?)


• For simplicity we only consider one exogenous variable

yt = α+ γDt + βxt + δDtxt + ut

• Order the observations such that Dt = 0 for t = 1, . . . , T1 and Dt = 1 for t = T1 + 1, . . . , T

• The joint estimation minimizes (with respect to α, β, γ, δ)

\[ S(\alpha, \beta, \gamma, \delta) = \sum_{t=1}^{T_1} (y_t - \alpha - \beta x_t)^2 + \sum_{t=T_1+1}^{T} \left( y_t - (\alpha + \gamma) - (\beta + \delta) x_t \right)^2 \]

The first order conditions for the joint estimation are

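\[ \frac{\partial S}{\partial \alpha} = -2 \left[ \sum_{t=1}^{T_1} (y_t - \alpha - \beta x_t) + \sum_{t=T_1+1}^{T} \left( y_t - (\alpha + \gamma) - (\beta + \delta) x_t \right) \right] = 0 \]

\[ \frac{\partial S}{\partial \beta} = -2 \left[ \sum_{t=1}^{T_1} (y_t - \alpha - \beta x_t)\, x_t + \sum_{t=T_1+1}^{T} \left( y_t - (\alpha + \gamma) - (\beta + \delta) x_t \right) x_t \right] = 0 \]

\[ \frac{\partial S}{\partial \gamma} = -2 \sum_{t=T_1+1}^{T} \left( y_t - (\alpha + \gamma) - (\beta + \delta) x_t \right) = 0 \]

\[ \frac{\partial S}{\partial \delta} = -2 \sum_{t=T_1+1}^{T} \left( y_t - (\alpha + \gamma) - (\beta + \delta) x_t \right) x_t = 0 \]

The last two conditions are the normal equations of a separate regression on the observations with Dt = 1. Substituting them into the first two leaves the normal equations of a separate regression on the observations with Dt = 0.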

• Hence, the point estimates in the joint estimation are identical to those of the separate estimations

• If the point estimates are identical, then so are the residuals; and if the residuals are identical, then so are the sums of squared residuals

• As to the standard errors, in the joint model we estimate

\[ \hat\sigma^2 = S_{\hat u \hat u} / (T - 4) \]

while in the separate estimations we estimate

\[ \hat\sigma_0^2 = S^0_{\hat u \hat u} / (T_1 - 2) \qquad \hat\sigma_1^2 = S^1_{\hat u \hat u} / ((T - T_1) - 2) \]
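The argument can be checked numerically: estimates obtained from two separate regressions, rewritten as (α, β, γ, δ), satisfy the first order conditions of the joint model. A minimal sketch with hypothetical data (the numbers in `x` and `y` are made up for illustration; the first T1 observations have Dt = 0):

```python
# Hypothetical data: the first T1 observations have D = 0, the rest D = 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 1.5, 2.5, 3.5, 4.5]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 4.0, 5.9, 8.1, 9.8]
T1, T = 5, len(x)


def ols_simple(x, y):
    """Intercept, slope and sum of squared residuals of a simple regression."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
             / sum((xi - xbar) ** 2 for xi in x))
    intercept = ybar - slope * xbar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, ssr


# Separate estimations for the two groups.
a0, b0, ssr0 = ols_simple(x[:T1], y[:T1])
a1, b1, ssr1 = ols_simple(x[T1:], y[T1:])

# Implied parameters of the joint model with dummy and interaction term.
alpha, beta, gamma, delta = a0, b0, a1 - a0, b1 - b0

# Residuals of the joint model evaluated at these parameters.
res0 = [yi - alpha - beta * xi for xi, yi in zip(x[:T1], y[:T1])]
res1 = [yi - (alpha + gamma) - (beta + delta) * xi
        for xi, yi in zip(x[T1:], y[T1:])]

# Derivatives of S (the constant factor 2 does not affect whether they vanish).
focs = [
    -2 * (sum(res0) + sum(res1)),
    -2 * (sum(r * xi for r, xi in zip(res0, x[:T1]))
          + sum(r * xi for r, xi in zip(res1, x[T1:]))),
    -2 * sum(res1),
    -2 * sum(r * xi for r, xi in zip(res1, x[T1:])),
]
print([round(v, 10) for v in focs])

# Error variance estimates: pooled in the joint model, group-wise otherwise.
sigma2_joint = (ssr0 + ssr1) / (T - 4)
sigma2_0 = ssr0 / (T1 - 2)
sigma2_1 = ssr1 / ((T - T1) - 2)
print(sigma2_joint, sigma2_0, sigma2_1)
```

The pooled estimate is a weighted average of the two group-wise estimates, which is why the standard errors of the joint and the separate estimations generally differ even though the point estimates coincide.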

Remarks

• What happens if the dummy variables are not 0/1-coded but 1/2-coded?

• Consider the model

yt = α+ γD1t + δD2t + βxt + ut

where

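\[ D_{1t} = \begin{cases} 0 & \text{for males} \\ 1 & \text{for females} \end{cases} \qquad D_{2t} = \begin{cases} 0 & \text{for German citizenship} \\ 1 & \text{else} \end{cases} \]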

• Interaction terms
