Econometrics I
Andrea Beccarini
Summer 2011
Outline
• Very brief review of statistical basics
• Simple linear regression model (specification, point estimation, interval estimation, hypothesis tests, forecasting, maximum likelihood estimation)
• Multiple linear regression model
• Violations of (some) model assumptions
Review of basic statistics
• Random experiment (Zufallsexperiment)
• Sample space (Ergebnismenge)
• Event (Ereignis)
• Set operations (Verknüpfungen von Ereignissen)
• Partition (Partition oder vollständige Zerlegung)
• Probability (Wahrscheinlichkeit)
• Kolmogorov’s axioms (Kolmogorovs Axiome)
• Conditional probability (bedingte Wahrscheinlichkeit)
• Total probability (Satz von der totalen Wahrscheinlichkeit)
• Bayes’ theorem (Satz von Bayes)
• Independence (Unabhängigkeit)
• Random variables (Zufallsvariable)
— Definition and intuition
— Distribution function and quantile function (Verteilungsfunktion und Quantilfunktion)
— Discrete and continuous random variables (diskrete und stetige Zufallsvariable)
— Density function (Dichtefunktion)
— Expectation (Erwartungswert)
— Variance (Varianz)
• Special discrete distributions, e.g. Bernoulli, binomial, Poisson, geometric, hypergeometric, ...
• Special continuous distributions, e.g. normal, standard normal, exponential, Pareto, $\chi^2$, $F$, $t$, ...
• There are many more special distributions
• Which distribution can be used when?
Simple linear regression model
• Econometrics: Application of statistical methods to empirical research in economics
• Econometric problems:
— Specification of an appropriate model
— Estimation of the model (Schätzung)
— Hypothesis testing
— Forecasting (Prognose)
Economic model
↓
SPECIFICATION: functional form (A-assumptions), error term (B-assumptions), variables (C-assumptions)
↓
Econometric model
↓
ESTIMATION
↓
Estimated model
↓
HYPOTHESIS TESTS and FORECASTING
Data
• Empirical research requires (high quality) data
• Often, collecting data is the main problem of empirical research
• There is no systematic approach
Kinds of data:
• Time series data (Zeitreihendaten), cross sectional data (Querschnittsdaten), panel data (Paneldaten)
Specification
• Numeric illustration: Data of the gratuity example
  t    x_t    y_t  |   t    x_t    y_t
  1  10.00   2.00  |  11  60.00   7.00
  2  30.00   3.00  |  12  47.50   5.50
  3  50.00   7.00  |  13  45.00   7.00
  4  25.00   2.00  |  14  27.50   4.50
  5   7.50   2.50  |  15  15.00   1.50
  6  42.50   6.00  |  16  20.00   4.00
  7  35.00   5.00  |  17  47.50   9.00
  8  40.00   4.00  |  18  32.50   3.00
  9  25.00   6.00  |  19  37.50   6.50
 10  12.50   1.00  |  20  20.00   2.50
Billing amount $x_t$ and tip $y_t$ (both in euros) of 20 observed guests
• Functional dependence (generic)
y = f (x)
• More specifically, the functional dependence is assumed to be
y = α+ βx
• Other functional forms are of course possible; more on that later
• The econometric model is specified using the A-, B- and C-assumptions
• Economic model: yt = α+ βxt for t = 1, . . . , 20
[Figure: tip y (Trinkgeld) plotted against billing amount x (Rechnungsbetrag) with the line R, its intercept α and its slope β]
• Econometric model: yt = α+ βxt + ut for t = 1, . . . , 20
[Figure: scatter plot of the 20 observations, $y_t$ against $x_t$]
The A-assumptions (functional specification):
Assumption a1: No relevant exogenous variable is omitted from the econometric model, and the exogenous variable included in the model is relevant
Assumption a2: The true functional dependence between $x_t$ and $y_t$ is linear
Assumption a3: The parameters α and β are constant for all T observations $(x_t, y_t)$
The B-assumptions (error term specification):
Assumption b1: $E(u_t) = 0$ for $t = 1, \dots, T$
Assumption b2: Homoskedasticity: $Var(u_t) = \sigma^2$ for $t = 1, \dots, T$
Assumption b3: For all $t \neq s$ with $t = 1, 2, \dots, T$ and $s = 1, 2, \dots, T$ we have $Cov(u_t, u_s) = 0$
Assumption b4: The error terms $u_t$ are normally distributed.
Compact notation of all B-assumptions: $u_t \sim NID(0, \sigma^2)$ for $t = 1, \dots, T$
• Graphical illustration of the error term distribution
The C-assumptions (variable specification):
Assumption c1: The exogenous variable $x_t$ is not stochastic, but can be controlled as in an experimental situation
Assumption c2: The exogenous variable $x_t$ is not constant for all observations t
• Of course, many (or even all?) of the A-, B-, and C-assumptions are restrictive and unrealistic
• We will nevertheless suppose they are satisfied for the time being, and consider their violations later on
Point estimation
• The simple (two-variable) linear regression model is
yt = α+ βxt + ut
• Numeric illustration: The first data of the gratuity example
 t   x_t   y_t
 1    10     2
 2    30     3
 3    50     7
[Figure: scatter plot of the three observations, $y_t$ against $x_t$]
• Estimation: Compute estimated values $\hat\alpha$ and $\hat\beta$
• Distinguish between true and estimated values
• If the true econometric model is
$y_t = \alpha + \beta x_t + u_t$
then the corresponding estimated model is
$\hat y_t = \hat\alpha + \hat\beta x_t$
• How can we estimate the coefficients?
[Figure: scatter plot of the three observations with three candidate regression lines R1, R2 and R3]
Least squares method
• Sum of squared residuals
$S_{\hat u \hat u} = \sum_{t=1}^{T} \hat u_t^2$
where the residuals are
$\hat u_t = y_t - \hat y_t = y_t - \hat\alpha - \hat\beta x_t$
• Residual (Residuum): Difference between the observed value $y_t$ and the estimated (predicted) value $\hat y_t$
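• Choose $\hat\alpha$ and $\hat\beta$ such that the sum of squared residuals
$S_{\hat u \hat u} = \sum_{t=1}^{T} \hat u_t^2 = \sum_{t=1}^{T} (y_t - \hat\alpha - \hat\beta x_t)^2$
is minimized
• Derivation of estimators (Schätzer) [1]
$\hat\beta = S_{xy}/S_{xx}$
$\hat\alpha = \bar y - \hat\beta \bar x$
with
$S_{xx} = \sum (x_t - \bar x)^2 = \sum x_t^2 - T \bar x^2$
$S_{xy} = \sum (x_t - \bar x)(y_t - \bar y) = \sum y_t x_t - T \bar x \bar y$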
• Numeric illustration for the three-points example
 t   x_t   y_t
 1    10     2
 2    30     3
 3    50     7
• Calculate {1}
$\hat\alpha, \hat\beta$
$\hat y_1, \hat y_2, \hat y_3$
$\hat u_1, \hat u_2, \hat u_3$
$S_{\hat u \hat u}$
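The calculation for the three-point example can be sketched in a few lines of Python. This is only an illustration of the least squares formulas; the variable names are chosen here and do not come from the slides.

```python
# Three-point example: (x_t, y_t) = (10, 2), (30, 3), (50, 7)
x = [10.0, 30.0, 50.0]
y = [2.0, 3.0, 7.0]
T = len(x)

x_bar = sum(x) / T
y_bar = sum(y) / T

# Variation of x and covariation of x and y
S_xx = sum((xt - x_bar) ** 2 for xt in x)
S_xy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))

# Least squares estimators
beta_hat = S_xy / S_xx
alpha_hat = y_bar - beta_hat * x_bar

# Fitted values, residuals and sum of squared residuals
y_hat = [alpha_hat + beta_hat * xt for xt in x]
u_hat = [yt - yh for yt, yh in zip(y, y_hat)]
S_uu = sum(u ** 2 for u in u_hat)

print(alpha_hat, beta_hat)  # 0.25 0.125
print(y_hat)                # [1.5, 4.0, 6.5]
print(u_hat)                # [0.5, -1.0, 0.5]
print(S_uu)                 # 1.5
```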
The coefficient of determination $R^2$
• Variation of the endogenous variable $S_{yy} = \sum (y_t - \bar y)^2$
[Figure: the three observations with the line R and the deviations $y_1 - \bar y$, $y_2 - \bar y$, $y_3 - \bar y$ from the mean]
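• Variation $S_{yy} = \sum (y_t - \bar y)^2$ and sum of squared residuals $S_{\hat u \hat u} = \sum \hat u_t^2$
[Figure: the three observations with the line R, the residuals $\hat u_1$, $\hat u_2$, $\hat u_3$ and the deviations from the mean $\bar y$]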
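• Decomposition of sum of squares (Streuungszerlegungssatz): [2]
$S_{yy} = S_{\hat y \hat y} + S_{\hat u \hat u}$
or
$\sum (y_t - \bar y)^2 = \sum (\hat y_t - \bar y)^2 + \sum \hat u_t^2$
• Coefficient of determination (Bestimmtheitsmaß)
$R^2 = \frac{\text{explained variation}}{\text{total variation}} = \frac{S_{yy} - S_{\hat u \hat u}}{S_{yy}} = \frac{S_{\hat y \hat y}}{S_{yy}}$
• Computation of $R^2$ {2}
$R^2 = \frac{\hat\beta S_{xy}}{S_{yy}} = \frac{S_{xy}^2}{S_{xx} S_{yy}}$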
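As a quick check, the two ways of computing the coefficient of determination can be compared for the three-point example. The sketch below uses variable names chosen for illustration.

```python
# Three-point example: (x_t, y_t) = (10, 2), (30, 3), (50, 7)
x = [10.0, 30.0, 50.0]
y = [2.0, 3.0, 7.0]
T = len(x)
x_bar, y_bar = sum(x) / T, sum(y) / T

S_xx = sum((xt - x_bar) ** 2 for xt in x)
S_yy = sum((yt - y_bar) ** 2 for yt in y)
S_xy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))

beta_hat = S_xy / S_xx
alpha_hat = y_bar - beta_hat * x_bar
S_uu = sum((yt - alpha_hat - beta_hat * xt) ** 2 for xt, yt in zip(x, y))

# R^2 as share of explained variation, and via the covariation formula
r2_from_residuals = (S_yy - S_uu) / S_yy
r2_from_covariation = S_xy ** 2 / (S_xx * S_yy)

print(r2_from_residuals, r2_from_covariation)  # both equal 25/28
```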
Properties of the estimators
• The estimators
$\hat\beta = S_{xy}/S_{xx}$
$\hat\alpha = \bar y - \hat\beta \bar x$
are random variables
• Thought experiment: repeated samples
• Computer simulation [experiment.R]
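• Under the a-, b- and c-assumptions (without b4) [3]
$E(\hat\alpha) = \alpha$
$E(\hat\beta) = \beta$
and [4]
$Var(\hat\alpha) = \sigma^2 (1/T + \bar x^2/S_{xx})$
$Var(\hat\beta) = \sigma^2/S_{xx}$
$Cov(\hat\alpha, \hat\beta) = -\sigma^2 (\bar x/S_{xx})$
• BLUE property: $\hat\alpha$ and $\hat\beta$ are the best linear unbiased estimators [5]
• If, additionally, b4 is true, then $\hat\alpha$ and $\hat\beta$ are the best unbiased estimators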
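The thought experiment of repeated samples can be imitated with a small simulation. The sketch below is not the experiment.R script mentioned on the slide; the true parameter values, the seed and the number of replications are assumptions made purely for illustration.

```python
import random

random.seed(2011)  # arbitrary seed so that the run is reproducible

# Assumed "true" model for the simulation (illustrative values only)
ALPHA, BETA, SIGMA = 0.25, 0.125, 1.0
x = [10.0, 30.0, 50.0]  # fixed regressor values (assumption c1)
T = len(x)
x_bar = sum(x) / T
S_xx = sum((xt - x_bar) ** 2 for xt in x)


def ols_slope(y):
    """Least squares slope S_xy / S_xx for one sample y."""
    y_bar = sum(y) / T
    S_xy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))
    return S_xy / S_xx


# Draw many samples from the same model and estimate the slope each time
n_rep = 2000
slopes = []
for _ in range(n_rep):
    y = [ALPHA + BETA * xt + random.gauss(0.0, SIGMA) for xt in x]
    slopes.append(ols_slope(y))

mean_slope = sum(slopes) / n_rep
var_slope = sum((b - mean_slope) ** 2 for b in slopes) / (n_rep - 1)

print("mean of slope estimates:", mean_slope, "true beta:", BETA)
print("variance of slope estimates:", var_slope, "sigma^2/S_xx:", SIGMA ** 2 / S_xx)
```

Across replications the average slope estimate should lie close to the true β, and the empirical variance close to $\sigma^2/S_{xx}$.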
• How are yt, α and β distributed?
• Because of
$u_t \sim NID(0, \sigma^2)$
$y_t$ is normally distributed, $t = 1, \dots, T$
• The expectation of $y_t$ is
$E(y_t) = E(\alpha + \beta x_t + u_t) = E(\alpha) + E(\beta x_t) + E(u_t) = \alpha + \beta x_t$
• The variance of $y_t$ is
$Var(y_t) = E[(y_t - E(y_t))^2] = E[(y_t - \alpha - \beta x_t)^2] = E(u_t^2) = E[(u_t - E(u_t))^2] = \sigma^2$
• Further, for $t = 1, \dots, T$
$y_t \sim NID(\alpha + \beta x_t, \sigma^2)$
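• Since
$\hat\beta = S_{xy}/S_{xx}$
$\hat\alpha = \bar y - \hat\beta \bar x$
both $\hat\alpha$ and $\hat\beta$ are linear transformations of the $y_t$
• Linear transformations of independent normally distributed random variables are normally distributed
• Hence,
$\hat\alpha \sim N(\alpha, \sigma^2 (1/T + \bar x^2/S_{xx}))$
$\hat\beta \sim N(\beta, \sigma^2/S_{xx})$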
Interval estimation (Intervallschätzung)
• We already know that $\hat\beta$ is a random variable and
$\hat\beta \sim N(\beta, \sigma^2/S_{xx})$
• Instead of a point estimator $\hat\beta$ we now want an interval estimator
$[\hat\beta - k ; \hat\beta + k]$
satisfying
$P(\hat\beta - k \le \beta \le \hat\beta + k) = 1 - a$
• The interval $[\hat\beta - k ; \hat\beta + k]$ is called (1 − a)-confidence interval (Konfidenzintervall)
Confidence interval when σ2 is known
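• Step 1: Standardization of $\hat\beta$
$se(\hat\beta) = \sqrt{\sigma^2/S_{xx}}$
$z = \frac{\hat\beta - E(\hat\beta)}{se(\hat\beta)} = \frac{\hat\beta - \beta}{se(\hat\beta)} \sim N(0, 1)$
• The random variable $z = (\hat\beta - \beta)/se(\hat\beta)$ is a pivot (Pivot), i.e. its distribution does not depend on unknown parameters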
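• Step 2: Find the (1 − a/2)-quantile $z_{a/2}$
$P(-z_{a/2} \le z \le z_{a/2}) = 1 - a$
• Step 3: Substitute z by $(\hat\beta - \beta)/se(\hat\beta)$
$P\left(-z_{a/2} \le \frac{\hat\beta - \beta}{se(\hat\beta)} \le z_{a/2}\right) = 1 - a$
• Rewriting yields the (1 − a)-interval [6]{3}
$[\hat\beta - z_{a/2} \cdot se(\hat\beta) ; \hat\beta + z_{a/2} \cdot se(\hat\beta)]$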
Confidence interval when σ2 is unknown
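• Step 1: Estimation of $\sigma^2$ and $se(\hat\beta)$:
$\hat\sigma^2 = \frac{1}{T - 2} \sum_{t=1}^{T} \hat u_t^2$
is a consistent and unbiased estimator of $\sigma^2$ and
$\widehat{se}(\hat\beta) = \sqrt{\hat\sigma^2/S_{xx}}$
is a consistent estimator of $se(\hat\beta)$ (we postpone the proofs)
• Step 2: Standardization of $\hat\beta$
$t = \frac{\hat\beta - E(\hat\beta)}{\widehat{se}(\hat\beta)} = \frac{\hat\beta - \beta}{\widehat{se}(\hat\beta)} \sim t_{(T-2)}$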
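• The random variable $t = (\hat\beta - \beta)/\widehat{se}(\hat\beta)$ is a pivot
• Step 3: Find the (1 − a/2)-quantile $t_{a/2}$
$P(-t_{a/2} \le t \le t_{a/2}) = 1 - a$
• Step 4: Substitute and solve for β,
$P(\hat\beta - t_{a/2} \cdot \widehat{se}(\hat\beta) \le \beta \le \hat\beta + t_{a/2} \cdot \widehat{se}(\hat\beta)) = 1 - a$
• The interval estimator is {4}
$[\hat\beta - t_{a/2} \cdot \widehat{se}(\hat\beta) ; \hat\beta + t_{a/2} \cdot \widehat{se}(\hat\beta)]$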
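• Interval estimator for intercept α
$[\hat\alpha - t_{a/2} \cdot \widehat{se}(\hat\alpha) ; \hat\alpha + t_{a/2} \cdot \widehat{se}(\hat\alpha)]$
where
$\widehat{se}(\hat\alpha) = \sqrt{\hat\sigma^2 (1/T + \bar x^2/S_{xx})}$
• Some terminology: The standard error (Standardfehler) is $se(\hat\beta)$; the estimated standard error is $\widehat{se}(\hat\beta)$
• Usually, both $se(\hat\beta)$ and $\widehat{se}(\hat\beta)$ are called standard error (Standardfehler)
• Interpretation of interval estimators?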
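The interval estimators for unknown σ² can be sketched for the gratuity data as follows. The level a = 0.05 and the critical value 2.101 (the tabulated 0.975-quantile of the t-distribution with 18 degrees of freedom) are assumptions made for this illustration; the Python standard library has no t-quantile function, so the value is hard-coded from a table.

```python
import math

# Gratuity example: billing amount x_t and tip y_t (in euros) of 20 guests
x = [10.0, 30.0, 50.0, 25.0, 7.5, 42.5, 35.0, 40.0, 25.0, 12.5,
     60.0, 47.5, 45.0, 27.5, 15.0, 20.0, 47.5, 32.5, 37.5, 20.0]
y = [2.0, 3.0, 7.0, 2.0, 2.5, 6.0, 5.0, 4.0, 6.0, 1.0,
     7.0, 5.5, 7.0, 4.5, 1.5, 4.0, 9.0, 3.0, 6.5, 2.5]
T = len(x)

x_bar, y_bar = sum(x) / T, sum(y) / T
S_xx = sum((xt - x_bar) ** 2 for xt in x)
S_xy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))

# Point estimates
beta_hat = S_xy / S_xx
alpha_hat = y_bar - beta_hat * x_bar

# Estimated error variance and estimated standard errors
S_uu = sum((yt - alpha_hat - beta_hat * xt) ** 2 for xt, yt in zip(x, y))
sigma2_hat = S_uu / (T - 2)
se_beta = math.sqrt(sigma2_hat / S_xx)
se_alpha = math.sqrt(sigma2_hat * (1 / T + x_bar ** 2 / S_xx))

# Assumed table value: 0.975-quantile of t(18), i.e. a = 0.05
T_CRIT = 2.101

ci_beta = (beta_hat - T_CRIT * se_beta, beta_hat + T_CRIT * se_beta)
ci_alpha = (alpha_hat - T_CRIT * se_alpha, alpha_hat + T_CRIT * se_alpha)

print("beta_hat:", beta_hat, "95% interval:", ci_beta)
print("alpha_hat:", alpha_hat, "95% interval:", ci_alpha)
```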
Hypothesis tests
• How can we test hypotheses about the regression coefficients (usually about the slope β)?
• Null hypothesis $H_0$ and alternative hypothesis $H_1$ (Nullhypothese und Alternativhypothese)
• There are one-sided and two-sided tests
• We already know that
$\hat\beta \sim N(\beta, \sigma^2/S_{xx})$
• If the null hypothesis $H_0: \beta = q$ is true, then β can be substituted by q
$\hat\beta \sim N(q, \sigma^2/S_{xx})$
• Then
$P(\hat\beta - k \le q \le \hat\beta + k) = 1 - a$
$P(q - k \le \hat\beta \le q + k) = 1 - a$
• With high probability 1 − a, the estimator $\hat\beta$ will be inside the interval $[q - k; q + k]$, if $H_0$ is true
• If the estimator $\hat\beta$ is outside the interval, that is evidence against the null hypothesis
• Graphical illustration
• The analytical approach is slightly different
• Step 1: Set up $H_0$ and $H_1$ and fix the significance level a
$H_0: \beta = q$
$H_1: \beta \neq q$
• Step 2: Estimate $se(\hat\beta)$
$\widehat{se}(\hat\beta) = \sqrt{\hat\sigma^2/S_{xx}}$
with $\hat\sigma^2 = S_{\hat u \hat u}/(T - 2)$
• Step 3: Compute the t-test statistic
$t = \frac{\hat\beta - q}{\widehat{se}(\hat\beta)}$
If $H_0: \beta = q$ is true, then
$t \sim t_{(T-2)}$
• Step 4: Find the critical value $t_{a/2}$
$P(-t_{a/2} \le t \le t_{a/2}) = 1 - a$
• Step 5: Compare $t_{a/2}$ and t. If t is outside $[-t_{a/2}; t_{a/2}]$, i.e. if $|t| > t_{a/2}$, then reject $H_0$ {5}
Connections between hypothesis testing and confidence intervals
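• Under the (two-sided) null hypothesis $H_0$
$P(q - t_{a/2} \cdot \widehat{se}(\hat\beta) \le \hat\beta \le q + t_{a/2} \cdot \widehat{se}(\hat\beta)) = 1 - a$
• The (1 − a)-confidence interval is
$[\hat\beta - t_{a/2} \cdot \widehat{se}(\hat\beta) ; \hat\beta + t_{a/2} \cdot \widehat{se}(\hat\beta)]$
• Conclusion: If q is outside the confidence interval, $H_0$ is rejected {6}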
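The two-sided t-test and its link to the confidence interval can be illustrated with the gratuity data. The hypothesized values q = 0 and q = 0.15, the level a = 0.05 and the hard-coded table value 2.101 for the t(18) distribution are assumptions chosen for this sketch.

```python
import math

# Gratuity example: billing amount x_t and tip y_t (in euros) of 20 guests
x = [10.0, 30.0, 50.0, 25.0, 7.5, 42.5, 35.0, 40.0, 25.0, 12.5,
     60.0, 47.5, 45.0, 27.5, 15.0, 20.0, 47.5, 32.5, 37.5, 20.0]
y = [2.0, 3.0, 7.0, 2.0, 2.5, 6.0, 5.0, 4.0, 6.0, 1.0,
     7.0, 5.5, 7.0, 4.5, 1.5, 4.0, 9.0, 3.0, 6.5, 2.5]
T = len(x)

x_bar, y_bar = sum(x) / T, sum(y) / T
S_xx = sum((xt - x_bar) ** 2 for xt in x)
S_xy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))
beta_hat = S_xy / S_xx
alpha_hat = y_bar - beta_hat * x_bar
S_uu = sum((yt - alpha_hat - beta_hat * xt) ** 2 for xt, yt in zip(x, y))
se_beta = math.sqrt(S_uu / (T - 2) / S_xx)

T_CRIT = 2.101  # assumed table value: 0.975-quantile of t(18)


def t_statistic(q):
    """t-test statistic for H0: beta = q."""
    return (beta_hat - q) / se_beta


def reject_h0(q):
    """Two-sided test decision: reject if |t| exceeds the critical value."""
    return abs(t_statistic(q)) > T_CRIT


def outside_interval(q):
    """True if q lies outside the (1 - a)-confidence interval for beta."""
    lower = beta_hat - T_CRIT * se_beta
    upper = beta_hat + T_CRIT * se_beta
    return q < lower or q > upper


for q in (0.0, 0.15):
    print(q, round(t_statistic(q), 3), reject_h0(q), outside_interval(q))
```

For every q the test decision and the interval check agree, which is the connection stated in the conclusion above.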
One-sided hypothesis tests (einseitige Tests)
• Right or left-sided tests
• Right-sided null hypothesis
H0 : β ≤ q
H1 : β > q
• The basic idea remains the same: If $\hat\beta$ is "much larger" than q, reject $H_0$
• Graphical illustration:
Analytical approach (right-sided null hypothesis)
• Step 1: State H0 and H1 and set the significance level a
H0 : β ≤ q
H1 : β > q
• Step 2: Estimate $se(\hat\beta)$
• Step 3: Compute the t-statistic
$t = \frac{\hat\beta - q}{\widehat{se}(\hat\beta)}$
At the boundary $\beta = q$ of $H_0$ its distribution is $t \sim t_{(T-2)}$
• Step 4: Find the critical value $t_a$
$P(t \le t_a) = 1 - a$
For left-sided null hypotheses, steps 1, 2 and 3 are the same; the critical value is $t_{1-a}$ with $P(t < t_{1-a}) = a$
• Step 5: Compare $t_a$ and t; reject $H_0$ if $t > t_a$ {7}
• For left-sided null hypotheses, $H_0$ is rejected if t is less than the critical value, $t < t_{1-a}$
The p-value (p-Wert)
• The p-value is the probability, computed under the null hypothesis, that the test statistic (a random variable) is greater than the realized test statistic (right-sided test)
• Traditional approach: Reject the null hypothesis if the test statistic is inside the critical region, e.g. if $t > t_a$
• Alternative approach: Comparison of probabilities; reject the null hypothesis if the p-value is less than the significance level a
• Graphical illustration:
• The two approaches (comparison of t-statistic and critical value, or comparison of p-value and significance level) are essentially identical {8}
• Advantages of the p-value approach?
• Disadvantages?
• p-value formulas for right- and left-sided hypothesis tests? [7]
• p-value formula for two-sided hypothesis test?
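One possible way to obtain p-values without statistical tables is to integrate the density of the t-distribution numerically. The sketch below uses Simpson's rule; the function names and the number of integration steps are choices made here for illustration, and statistical software would normally be used instead.

```python
import math


def t_density(v, df):
    """Density of the t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + v * v / df) ** (-(df + 1) / 2)


def t_upper_tail(t_obs, df, n_steps=2000):
    """P(t > t_obs), via Simpson's rule on [0, |t_obs|] and symmetry."""
    b = abs(t_obs)
    h = b / n_steps
    total = t_density(0.0, df) + t_density(b, df)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area = total * h / 3          # integral of the density from 0 to |t_obs|
    upper = 0.5 - area            # tail probability beyond |t_obs|
    return upper if t_obs >= 0 else 1.0 - upper


def p_value_right(t_obs, df):
    """Right-sided test (H1: beta > q)."""
    return t_upper_tail(t_obs, df)


def p_value_left(t_obs, df):
    """Left-sided test (H1: beta < q)."""
    return 1.0 - t_upper_tail(t_obs, df)


def p_value_two_sided(t_obs, df):
    """Two-sided test (H1: beta != q)."""
    return 2.0 * t_upper_tail(abs(t_obs), df)


print(p_value_right(2.1009, 18))      # close to 0.025
print(p_value_two_sided(2.1009, 18))  # close to 0.05
```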
How to choose the null and alternative hypotheses
• There are basically two strategies:
— State the opposite of the conjecture as the null hypothesis and try to reject it
— State the conjecture as the null hypothesis and show that it cannot be rejected
• There is an important asymmetry between rejection and non-rejection
Maximum likelihood estimation
• Main idea: Find those parameter values that maximize the probability (or likelihood) of observing the actually observed data
• Notation:
$\theta$ : Parameter vector, e.g. $\theta = (\alpha, \beta, \sigma^2)$
$L(\theta)$ : Likelihood (given all the data)
$\ln L(\theta)$ : Log-likelihood
• Maximum likelihood estimators
$\hat\theta = \arg\max_\theta \ln L(\theta)$
• We already know that, for t = 1, . . . , T
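$y_t \sim NID(\alpha + \beta x_t, \sigma^2)$,
hence the density of $y_t$ is
$f_{y_t}(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2} \frac{(y - \alpha - \beta x_t)^2}{\sigma^2}\right)$
• Due to independence, the joint likelihood and log-likelihood are
$L(\alpha, \beta, \sigma^2) = f_{y_1,\dots,y_T}(y_1, \dots, y_T) = \prod_{t=1}^{T} f_{y_t}(y_t)$
$\ln L(\alpha, \beta, \sigma^2) = \ln f_{y_1,\dots,y_T}(y_1, \dots, y_T) = \sum_{t=1}^{T} \ln f_{y_t}(y_t)$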
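• Maximize
$\ln L(\alpha, \beta, \sigma^2) = \ln f_{y_1,\dots,y_T}(y_1, \dots, y_T) = \sum_{t=1}^{T} \ln\left[\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2} \frac{(y_t - \alpha - \beta x_t)^2}{\sigma^2}\right)\right]$
with respect to the parameters α, β, σ² [8]
• The ML estimators are
$\hat\alpha_{ML} = \bar y - \hat\beta_{ML} \bar x$
$\hat\beta_{ML} = \frac{S_{xy}}{S_{xx}}$
$\hat\sigma^2_{ML} = \frac{1}{T} \sum_{t=1}^{T} \hat u_t^2$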
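The ML estimators can be checked numerically for the three-point example by evaluating the log-likelihood at the estimates and at a few nearby parameter values. The perturbations below are arbitrary choices for illustration.

```python
import math

# Three-point example: (x_t, y_t) = (10, 2), (30, 3), (50, 7)
x = [10.0, 30.0, 50.0]
y = [2.0, 3.0, 7.0]
T = len(x)


def log_likelihood(alpha, beta, sigma2):
    """ln L(alpha, beta, sigma^2) for independent normal y_t."""
    ssr = sum((yt - alpha - beta * xt) ** 2 for xt, yt in zip(x, y))
    return -0.5 * T * math.log(2 * math.pi * sigma2) - ssr / (2 * sigma2)


x_bar, y_bar = sum(x) / T, sum(y) / T
S_xx = sum((xt - x_bar) ** 2 for xt in x)
S_xy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))

# ML estimators of the coefficients coincide with least squares
beta_ml = S_xy / S_xx
alpha_ml = y_bar - beta_ml * x_bar
S_uu = sum((yt - alpha_ml - beta_ml * xt) ** 2 for xt, yt in zip(x, y))

sigma2_ml = S_uu / T              # ML estimator of sigma^2
sigma2_unbiased = S_uu / (T - 2)  # unbiased estimator, for comparison

best = log_likelihood(alpha_ml, beta_ml, sigma2_ml)
perturbed = [
    log_likelihood(alpha_ml + 0.1, beta_ml, sigma2_ml),
    log_likelihood(alpha_ml, beta_ml + 0.01, sigma2_ml),
    log_likelihood(alpha_ml, beta_ml, sigma2_ml * 1.5),
    log_likelihood(alpha_ml, beta_ml, sigma2_unbiased),
]

print(sigma2_ml, sigma2_unbiased)  # 0.5 1.5
print(best, max(perturbed))        # the first value is the larger one
```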
• Hypothesis tests in the maximum likelihood framework (the three classical tests: Wald, LR, LM)
• Null and alternative hypotheses, e.g.
$H_0: \beta = \beta_0$
$H_1: \beta \neq \beta_0$
• Derivation of the test statistics [exercise]
Forecasting
• Conditional forecast: the value $x_0$ of the exogenous variable is known and non-stochastic
• Point forecast of the endogenous variable is {9}
$\hat y_0 = \hat\alpha + \hat\beta x_0$
• The true value of $y_0$ is usually not $\hat y_0$ but
$y_0 = \alpha + \beta x_0 + u_0$
• The forecasting error is
$\hat y_0 - y_0 = \hat\alpha + \hat\beta x_0 - (\alpha + \beta x_0 + u_0) = (\hat\alpha - \alpha) + (\hat\beta - \beta) x_0 - u_0$
• There are two error sources:
1. The error term $u_0$ will not vanish, in general.
2. The parameter estimates $\hat\alpha$ and $\hat\beta$ will deviate from the true values α and β.
Properties of the point forecast
• Expected forecasting error:
$E(\hat y_0 - y_0) = E(\hat\alpha - \alpha) + E(\hat\beta - \beta) x_0 - E(u_0) = 0$
• Variance of the forecasting error [9]
$Var(\hat y_0 - y_0) = \sigma^2 [1 + 1/T + (x_0 - \bar x)^2/S_{xx}]$
• Estimated variance of the forecasting error {9}
$\widehat{Var}(\hat y_0 - y_0) = \hat\sigma^2 [1 + 1/T + (x_0 - \bar x)^2/S_{xx}]$
Interval forecast
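• Step 1: Estimation of $se(\hat y_0 - y_0)$
• Step 2: Standardization of $(\hat y_0 - y_0)$; since $E(\hat y_0 - y_0) = 0$,
$t = \frac{(\hat y_0 - y_0) - E(\hat y_0 - y_0)}{\widehat{se}(\hat y_0 - y_0)} = \frac{\hat y_0 - y_0}{\widehat{se}(\hat y_0 - y_0)} \sim t_{(T-2)}$
• Step 3: Find the $t_{a/2}$-value (from statistical tables or using statistical computer software)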
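• Step 4: With large probability 1 − a, the random variable t will be inside the interval $[-t_{a/2} ; t_{a/2}]$,
$P\left(-t_{a/2} \le \frac{\hat y_0 - y_0}{\widehat{se}(\hat y_0 - y_0)} \le t_{a/2}\right) = 1 - a$
Solve for $y_0$
$P(\hat y_0 - t_{a/2} \cdot \widehat{se}(\hat y_0 - y_0) \le y_0 \le \hat y_0 + t_{a/2} \cdot \widehat{se}(\hat y_0 - y_0)) = 1 - a$
• Hence, the interval forecast is {9}
$[\hat y_0 - t_{a/2} \cdot \widehat{se}(\hat y_0 - y_0) ; \hat y_0 + t_{a/2} \cdot \widehat{se}(\hat y_0 - y_0)]$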
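Point and interval forecasts can be sketched for the gratuity data. The forecast value x0 = 20 euros, the level a = 0.05 and the hard-coded table value 2.101 for the t(18) distribution are assumptions chosen for this illustration.

```python
import math

# Gratuity example: billing amount x_t and tip y_t (in euros) of 20 guests
x = [10.0, 30.0, 50.0, 25.0, 7.5, 42.5, 35.0, 40.0, 25.0, 12.5,
     60.0, 47.5, 45.0, 27.5, 15.0, 20.0, 47.5, 32.5, 37.5, 20.0]
y = [2.0, 3.0, 7.0, 2.0, 2.5, 6.0, 5.0, 4.0, 6.0, 1.0,
     7.0, 5.5, 7.0, 4.5, 1.5, 4.0, 9.0, 3.0, 6.5, 2.5]
T = len(x)

x_bar, y_bar = sum(x) / T, sum(y) / T
S_xx = sum((xt - x_bar) ** 2 for xt in x)
S_xy = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))
beta_hat = S_xy / S_xx
alpha_hat = y_bar - beta_hat * x_bar
S_uu = sum((yt - alpha_hat - beta_hat * xt) ** 2 for xt, yt in zip(x, y))
sigma2_hat = S_uu / (T - 2)

T_CRIT = 2.101  # assumed table value: 0.975-quantile of t(18)


def forecast(x0):
    """Point forecast and (1 - a) interval forecast for a given bill x0."""
    y0_hat = alpha_hat + beta_hat * x0
    var_hat = sigma2_hat * (1 + 1 / T + (x0 - x_bar) ** 2 / S_xx)
    half_width = T_CRIT * math.sqrt(var_hat)
    return y0_hat, y0_hat - half_width, y0_hat + half_width


y0_hat, lower, upper = forecast(20.0)  # hypothetical bill of 20 euros
print(y0_hat, lower, upper)

# The interval is narrowest at x0 = x_bar and widens further away
for x0 in (x_bar, 20.0, 80.0):
    _, lo, hi = forecast(x0)
    print(x0, hi - lo)
```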
• Width of the interval
Multiple linear regression model
• So far we have considered only a single exogenous variable, but most empirical problems involve several exogenous variables
• Many of the results from the simple linear regression model can be transferred to the multiple case
• Important tool: matrix algebra (main diagonal, transpose, addition, scalar multiplication, inner product, matrix multiplication, idempotent, determinant, rank, inverse, trace, definite matrices, semidefinite matrices)
Specification
• Example: Estimation of a production function for barley
• Conduct an experiment where the barley output (Gerste, $g_t$) is observed for different combinations of phosphate ($p_t$) and nitrogen ($n_t$)
• There are T = 30 different combinations
• The following table shows the data
  t    p_t     n_t    g_t  |   t    p_t     n_t    g_t
  1  22.00   40.00  38.36  |  16  25.00  110.00  59.55
  2  22.00   60.00  49.03  |  17  26.00   50.00  55.24
  3  22.00   90.00  59.87  |  18  26.00   70.00  54.13
  4  22.00  120.00  59.35  |  19  26.00   90.00  66.57
  5  23.00   50.00  45.45  |  20  26.00  110.00  61.74
  6  23.00   80.00  53.23  |  21  27.00   40.00  48.99
  7  23.00  100.00  56.55  |  22  27.00   60.00  54.38
  8  23.00  120.00  50.91  |  23  27.00   80.00  58.28
  9  24.00   40.00  44.87  |  24  27.00  100.00  62.81
 10  24.00   60.00  54.06  |  25  28.00   50.00  50.76
 11  24.00   90.00  60.34  |  26  28.00   70.00  51.54
 12  24.00  120.00  58.21  |  27  28.00  100.00  59.39
 13  25.00   50.00  51.52  |  28  28.00  110.00  68.17
 14  25.00   80.00  58.58  |  29  29.00   60.00  59.25
 15  25.00  100.00  57.27  |  30  29.00  100.00  64.39
Functional specification (A-assumptions)
• The economic (agro-economic) model formalizes the connection between the barley output (g) and the fertilizers (p and n)
$g = f(p, n)$
• Possible functional form
$g = \alpha + \beta_1 p + \beta_2 n$
• A more realistic functional form
$g = A p^{\beta_1} n^{\beta_2}$,
where A, $\beta_1$ and $\beta_2$ are constant parameters
• Take logarithms of the production function $g = A p^{\beta_1} n^{\beta_2}$,
$\ln g = \ln A + \beta_1 \ln p + \beta_2 \ln n$
• Define $\alpha = \ln A$, $y = \ln g$, $x_1 = \ln p$ and $x_2 = \ln n$, then
$y = \alpha + \beta_1 x_1 + \beta_2 x_2$
• Table of log-values:
  t   x_1 (= ln p_t)   x_2 (= ln n_t)   y_t (= ln g_t)
  1       3.0910           3.6889           3.6470
 ...        ...              ...              ...
 30       3.3673           4.6052           4.1650
• The econometric model is
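$y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + u_t$
for $t = 1, \dots, T$
• General model for K exogenous variables
$y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + \dots + \beta_K x_{Kt} + u_t$
for $t = 1, \dots, T$ or
$y_1 = \alpha + \beta_1 x_{11} + \beta_2 x_{21} + \dots + \beta_K x_{K1} + u_1$
$y_2 = \alpha + \beta_1 x_{12} + \beta_2 x_{22} + \dots + \beta_K x_{K2} + u_2$
$\vdots$
$y_T = \alpha + \beta_1 x_{1T} + \beta_2 x_{2T} + \dots + \beta_K x_{KT} + u_T$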
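• Matrix notation: Define
$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix}, \quad X = \begin{bmatrix} 1 & x_{11} & \dots & x_{K1} \\ 1 & x_{12} & \dots & x_{K2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1T} & \dots & x_{KT} \end{bmatrix}, \quad \beta = \begin{bmatrix} \alpha \\ \beta_1 \\ \vdots \\ \beta_K \end{bmatrix}, \quad u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_T \end{bmatrix}$
• Compact notation for the multiple regression model
$y = X\beta + u$
or
$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & \dots & x_{K1} \\ 1 & x_{12} & \dots & x_{K2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1T} & \dots & x_{KT} \end{bmatrix} \begin{bmatrix} \alpha \\ \beta_1 \\ \vdots \\ \beta_K \end{bmatrix} + \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_T \end{bmatrix}$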
The A-assumptions
Assumption A1: No relevant exogenous variable is omitted from the econometric model, and all exogenous variables included in the model are relevant
Assumption A2: The true functional dependence between X and y is linear
Assumption A3: The parameters β are constant for all T observations $(x_t, y_t)$
The B-assumptions
The B-assumptions are the same as in the simple linear model, i.e. $E(u_t) = 0$, $Var(u_t) = \sigma^2$, $Cov(u_t, u_s) = 0$ for $t \neq s$, and normality
B1 to B4 in matrix notation
$u \sim N(0, \sigma^2 I_T)$
The C-assumptions
Assumption C1: The exogenous variables $x_{1t}, \dots, x_{Kt}$ are not stochastic, but can be controlled as in an experimental situation
Assumption C2: No perfect multicollinearity: There are no parameter values $\gamma_0, \gamma_1, \gamma_2, \dots, \gamma_K$ (with at least one $\gamma_k \neq 0$), such that for all $t = 1, \dots, T$
$\gamma_0 + \gamma_1 x_{1t} + \gamma_2 x_{2t} + \dots + \gamma_K x_{Kt} = 0$
Assumption C2' in matrix notation:
$rank(X) = K + 1$
(implication: $T \ge K + 1$)
Perfect multicollinearity with two regressors
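• If C2 is violated, there are $\gamma_0, \gamma_1, \gamma_2$ (not all 0) such that
$\gamma_0 + \gamma_1 x_{1t} + \gamma_2 x_{2t} = 0$
for all $t = 1, \dots, T$, thus
$x_{2t} = -(\gamma_0/\gamma_2) - (\gamma_1/\gamma_2) x_{1t} = \delta_0 + \delta_1 x_{1t}$
with $\delta_0 = -(\gamma_0/\gamma_2)$ and $\delta_1 = -(\gamma_1/\gamma_2)$
• Hence, there are not really two regressors, since
$y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + u_t = \underbrace{(\alpha + \beta_2 \delta_0)}_{=\alpha'} + \underbrace{(\beta_1 + \beta_2 \delta_1)}_{=\beta'} x_{1t} + u_t$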
Point estimation
• The econometric model is
$y = X\beta + u$
$y_t = \alpha + \beta_1 x_{1t} + \dots + \beta_K x_{Kt} + u_t$ for $t = 1, \dots, T$
• The estimated model is
$\hat y = X\hat\beta$
$\hat y_t = \hat\alpha + \hat\beta_1 x_{1t} + \dots + \hat\beta_K x_{Kt}$ for $t = 1, \dots, T$
• Define the residuals
$\hat u = y - \hat y$
$\hat u_t = y_t - \hat y_t$ for $t = 1, \dots, T$
• How can we find an estimator $\hat\beta$ in the multiple regression model?
• The sum of squared residuals is
$S_{\hat u \hat u} = \hat u' \hat u = \sum \hat u_t^2$
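• Because of
$\hat u = y - X\hat\beta$, i.e. $\hat u_t = y_t - \hat\alpha - \hat\beta_1 x_{1t} - \dots - \hat\beta_K x_{Kt}$
we have
$S_{\hat u \hat u} = (y - X\hat\beta)'(y - X\hat\beta) = \sum (y_t - \hat\alpha - \hat\beta_1 x_{1t} - \dots - \hat\beta_K x_{Kt})^2$
• First order conditions
$\frac{\partial S_{\hat u \hat u}}{\partial \hat\beta} = \begin{bmatrix} \partial S_{\hat u \hat u}/\partial \hat\alpha \\ \partial S_{\hat u \hat u}/\partial \hat\beta_1 \\ \vdots \\ \partial S_{\hat u \hat u}/\partial \hat\beta_K \end{bmatrix} = 0$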
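• Vector of derivatives
$\frac{\partial S_{\hat u \hat u}}{\partial \hat\beta} = \frac{\partial}{\partial \hat\beta} (y - X\hat\beta)'(y - X\hat\beta) = \frac{\partial}{\partial \hat\beta} y'y - \frac{\partial}{\partial \hat\beta} 2 y'X\hat\beta + \frac{\partial}{\partial \hat\beta} \hat\beta' X'X \hat\beta = -2X'y + 2X'X\hat\beta$
• J.R. Magnus, H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, rev. ed., John Wiley & Sons: Chichester, 1999.
• Phoebus J. Dhrymes, Mathematics for Econometrics, 3rd ed., Springer: New York, 2000.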
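• Solving the first order conditions yields the normal equations
$X'X\hat\beta = X'y$
and thus
$\hat\beta = (X'X)^{-1} X'y$
• The terms are
$X'X = \begin{bmatrix} T & \sum x_{1t} & \dots & \sum x_{Kt} \\ \sum x_{1t} & \sum x_{1t}^2 & \dots & \sum x_{1t} x_{Kt} \\ \vdots & \vdots & \ddots & \vdots \\ \sum x_{Kt} & \sum x_{Kt} x_{1t} & \dots & \sum x_{Kt}^2 \end{bmatrix}, \quad X'y = \begin{bmatrix} \sum y_t \\ \sum x_{1t} y_t \\ \vdots \\ \sum x_{Kt} y_t \end{bmatrix}$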
• Numeric illustration {10}
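One way to carry out such a numeric illustration is to set up the normal equations for the barley data and solve them directly. The sketch below uses a small Gaussian elimination routine written for this purpose; it is not the method used in the lecture, and in practice a linear algebra library would be used. The resulting coefficients can be compared with the values reported for the fertilizer example later in the slides.

```python
import math

# (phosphate p_t, nitrogen n_t, barley output g_t) for the T = 30 combinations
data = [
    (22, 40, 38.36), (22, 60, 49.03), (22, 90, 59.87), (22, 120, 59.35),
    (23, 50, 45.45), (23, 80, 53.23), (23, 100, 56.55), (23, 120, 50.91),
    (24, 40, 44.87), (24, 60, 54.06), (24, 90, 60.34), (24, 120, 58.21),
    (25, 50, 51.52), (25, 80, 58.58), (25, 100, 57.27), (25, 110, 59.55),
    (26, 50, 55.24), (26, 70, 54.13), (26, 90, 66.57), (26, 110, 61.74),
    (27, 40, 48.99), (27, 60, 54.38), (27, 80, 58.28), (27, 100, 62.81),
    (28, 50, 50.76), (28, 70, 51.54), (28, 100, 59.39), (28, 110, 68.17),
    (29, 60, 59.25), (29, 100, 64.39),
]

# Log transformation as in the specification: x1 = ln p, x2 = ln n, y = ln g
X = [[1.0, math.log(p), math.log(n)] for p, n, g in data]
y = [math.log(g) for p, n, g in data]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


# Normal equations X'X beta_hat = X'y
k = len(X[0])
XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
Xty = [sum(row[i] * yt for row, yt in zip(X, y)) for i in range(k)]
beta_hat = solve(XtX, Xty)

residuals = [yt - sum(b * xi for b, xi in zip(beta_hat, row))
             for row, yt in zip(X, y)]

print("alpha_hat, beta1_hat, beta2_hat:", beta_hat)
```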
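Meaning of the estimators $\hat\alpha$, $\hat\beta_1$ and $\hat\beta_2$
• Formal meaning
$\frac{\partial \hat y_t}{\partial x_{1t}} = \hat\beta_1$ and $\frac{\partial \hat y_t}{\partial x_{2t}} = \hat\beta_2$
• Meaning of $\hat\alpha$: for $x_{1t} = x_{2t} = 0$
$\ln \hat g_t = \hat\alpha = 0.9543$
$\hat g_t = e^{0.9543} = 2.5969$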
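• Meaning of $\hat\beta_1$ and $\hat\beta_2$:
$\hat\beta_1 = \frac{\partial \hat y_t}{\partial x_{1t}} = \frac{\partial (\ln \hat g_t)}{\partial (\ln p_t)}$
• Because of
$\frac{\partial \ln \hat g_t}{\partial \hat g_t} = \frac{1}{\hat g_t}$ and $\frac{\partial \ln p_t}{\partial p_t} = \frac{1}{p_t}$
we find
$\hat\beta_1 = \frac{\partial \hat g_t/\hat g_t}{\partial p_t/p_t}$
• $\hat\beta_1$ is the estimated elasticity of the barley output with respect to the phosphate fertilizer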
Coefficient of determination $R^2$
• The total variation of y can be decomposed in the same way as in the simple linear model
$\underbrace{S_{yy}}_{\text{total variation}} = \underbrace{S_{\hat y \hat y}}_{\text{explained variation}} + \underbrace{S_{\hat u \hat u}}_{\text{unexplained variation}}$
• The coefficient of determination is defined as
$R^2 = \frac{\text{explained variation}}{\text{total variation}} = \frac{S_{\hat y \hat y}}{S_{yy}} = \frac{S_{yy} - S_{\hat u \hat u}}{S_{yy}}$
• Graphical illustration
[Figure: Venn diagram of the variations $S_{yy}$, $S_{11}$ and $S_{22}$ with the regions A to G]
• Here
$R^2 = \frac{A + B + C}{A + B + C + E}$
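• Computation of $R^2$: In the simple linear regression model
$R^2 = \frac{S_{\hat y \hat y}}{S_{yy}} = \frac{\hat\beta S_{xy}}{S_{yy}}$
• It can be shown that in the multiple linear regression model
$S_{\hat y \hat y} = \sum_{k=1}^{K} \hat\beta_k S_{ky}$
with the covariations $S_{ky} = \sum_{t=1}^{T} (x_{kt} - \bar x_k)(y_t - \bar y)$
• Then {11}
$R^2 = \frac{\sum_{k=1}^{K} \hat\beta_k S_{ky}}{S_{yy}}$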
Properties of the OLS estimators
• The estimator $\hat\beta$ is a random vector
• The expectation vector is [10]
$E(\hat\beta) = \beta$
(unbiasedness, Erwartungstreue)
• The covariance matrix of $\hat\beta$ is [11]
$V(\hat\beta) = \sigma^2 (X'X)^{-1}$
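• Special case: Covariance matrix in the two regressor model:
$Var(\hat\beta_1) = \frac{\sigma^2}{S_{11}(1 - R^2_{1\cdot2})}$
$Var(\hat\beta_2) = \frac{\sigma^2}{S_{22}(1 - R^2_{1\cdot2})}$
$Var(\hat\alpha) = \sigma^2/T + \bar x_1^2 Var(\hat\beta_1) + 2 \bar x_1 \bar x_2 Cov(\hat\beta_1, \hat\beta_2) + \bar x_2^2 Var(\hat\beta_2)$
$Cov(\hat\beta_1, \hat\beta_2) = \frac{-\sigma^2 R^2_{1\cdot2}}{S_{12}(1 - R^2_{1\cdot2})}$
where
$R^2_{1\cdot2} = \frac{S_{12}^2}{S_{11} S_{22}}$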
Gauss-Markov theorem
• The estimator $\hat\beta = (X'X)^{-1} X'y$ is linear in y, since
$\hat\beta = Dy$
with $D = (X'X)^{-1} X'$
• $\hat\beta = (X'X)^{-1} X'y$ is not only unbiased but also efficient
• Let $\tilde\beta$ be another linear unbiased estimator of β
• Then $V(\tilde\beta) - V(\hat\beta)$ is positive semidefinite [12]
Distribution of the estimator
• The model is y = Xβ + u
• From $u \sim N(0, \sigma^2 I_T)$ we conclude that y is multivariate normally distributed
• Expectation vector and covariance matrix of the endogenous variable
$E(y) = E(X\beta + u) = X\beta$
$V(y) = V(X\beta + u) = V(u) = \sigma^2 I_T$
• Thus $y \sim N(X\beta, \sigma^2 I_T)$
• How is the estimator β distributed?
• Since $\hat\beta = (X'X)^{-1} X'y$ the estimator $\hat\beta$ also has a multivariate normal distribution
• Expectation vector and covariance matrix are already known
• Hence
$\hat\beta \sim N(\beta, \sigma^2 (X'X)^{-1})$
• Problem: The error term variance $\sigma^2$ is unknown
• The covariance matrix $V(\hat\beta)$ cannot be computed without $\sigma^2$
• Since usually $\sigma^2$ is unknown, it has to be estimated
• An estimator of $\sigma^2$ is
$\hat\sigma^2 = \frac{S_{\hat u \hat u}}{T - K - 1}$
• Its expectation is $E(\hat\sigma^2) = \sigma^2$ [13]{12}
• The "residual maker matrix"
$M = I_T - X(X'X)^{-1}X'$
Interval estimation
• Interval estimation of a single component $\beta_k$ of the vector β
$P(\hat\beta_k - c \le \beta_k \le \hat\beta_k + c) = 1 - a$
• We know that
$\hat\beta_k \sim N(\beta_k, Var(\hat\beta_k))$
where $Var(\hat\beta_k)$ is the (k + 1)th diagonal element of $\sigma^2 (X'X)^{-1}$
• Problem: $\sigma^2$ and $Var(\hat\beta_k)$ are unknown
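• Step 1: Estimation of $\sigma^2$ by $\hat\sigma^2$ and $se(\hat\beta_k) = \sqrt{Var(\hat\beta_k)}$ by
$\widehat{se}(\hat\beta_k) = \sqrt{\widehat{Var}(\hat\beta_k)}$
• Step 2: Standardization of $\hat\beta_k$
$t = \frac{\hat\beta_k - E(\hat\beta_k)}{\widehat{se}(\hat\beta_k)} = \frac{\hat\beta_k - \beta_k}{\widehat{se}(\hat\beta_k)} \sim t_{(T-K-1)}$
• Step 3: Find the $t_{a/2}$-value
• Step 4: The (1 − a)-interval estimator is {13}
$[\hat\beta_k - t_{a/2} \cdot \widehat{se}(\hat\beta_k) ; \hat\beta_k + t_{a/2} \cdot \widehat{se}(\hat\beta_k)]$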
Interval estimation of linear combinations of β
• Let r be an arbitrary (K + 1)-column vector
• How can we find a confidence interval of $r'\beta$?
• Fertilizer example: $r = [0, 1, 1]'$, then $r'\beta = \beta_1 + \beta_2$ (economies of scale?)
• The point estimator of $r'\beta$ is $r'\hat\beta$
• The variance of $r'\hat\beta$ is $r'V(\hat\beta)r = \sigma^2 r'(X'X)^{-1}r$
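• The confidence interval for $r'\beta$ is
$\left[ r'\hat\beta - t_{a/2} \cdot \hat\sigma \sqrt{r'(X'X)^{-1}r} ; r'\hat\beta + t_{a/2} \cdot \hat\sigma \sqrt{r'(X'X)^{-1}r} \right]$
• Special case of a single component
$\beta_k = r'\beta$
for
$r = [0, \dots, 0, 1, 0, \dots, 0]'$
where the 1 is located at the (k + 1)th position (the first entry belongs to α)
• Then $Var(\hat\beta_k) = r' \sigma^2 (X'X)^{-1} r$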
Hypothesis tests: t-test
• There are tests of a single linear combination (t-tests) and tests of multiple linear combinations (F-tests)
• Testing a single linear combination of parameters: t-test (two-sided)
• Remember: In the simple linear regression case
$H_0: \beta = q$
$H_1: \beta \neq q$
• In the multiple linear model the null and alternative hypotheses are
$H_0: r_0 \alpha + r_1 \beta_1 + \dots + r_K \beta_K = q$
$H_1: r_0 \alpha + r_1 \beta_1 + \dots + r_K \beta_K \neq q$
or
$H_0: r'\beta = q$
$H_1: r'\beta \neq q$
where
$r = [r_0, r_1, \dots, r_K]'$
The test procedure:
1. Set up $H_0$ and $H_1$ and fix the significance level a
2. Estimate $se(r'\hat\beta)$
3. Compute the t-statistic
4. Find the critical value $t_{a/2}$
5. Test decision: Compare $t_{a/2}$ and t {14}
• The left-sided t-test
$H_0: r'\beta \ge q$
$H_1: r'\beta < q$
and the right-sided test
$H_0: r'\beta \le q$
$H_1: r'\beta > q$
are similar
• The critical values are lower quantiles of the t-distribution for the left-sided test and upper quantiles for the right-sided test {14}
Hypothesis tests: F-test
• Simultaneous test of two or more linear combinations (restrictions)
• Null hypothesis and alternative hypothesis
$H_0: R\beta = q$
$H_1: R\beta \neq q$
• Examples:
$H_0: \beta_1 = \beta_2 = \dots = \beta_K = 0$
$H_0: \beta_1 = \beta_2 = \dots = \beta_K$
$H_0: \beta_1 + \dots + \beta_K = 1$ and $\beta_1 = 2\beta_2$
$H_0: \beta_1 = 5$ and $\beta_2 = \dots = \beta_K = 0$
• Basic idea of the F -test: Compare the restricted and the unrestricted model
• Sum of squared residuals of the econometric model and the model under the null hypothesis
$S_{\hat u \hat u} = \hat u' \hat u = \sum_{t=1}^{T} \hat u_t^2$
$S^0_{\hat u \hat u} = \hat u^{0\prime} \hat u^0 = \sum_{t=1}^{T} (\hat u^0_t)^2$
where $\hat u^0$ are the residuals if the model is estimated under the restrictions of the null hypothesis
• Example: Model under the null hypothesis
$y_t = \alpha + 0 \cdot x_{1t} + \dots + 0 \cdot x_{Kt} + u_t = \alpha + u_t$
• Obviously, $S^0_{\hat u \hat u} \ge S_{\hat u \hat u}$; the null hypothesis is likely to be false if $S^0_{\hat u \hat u}$ is "much larger" than $S_{\hat u \hat u}$
• The test statistic is
$F = \frac{(S^0_{\hat u \hat u} - S_{\hat u \hat u})/L}{S_{\hat u \hat u}/(T - K - 1)}$
where L is the number of restrictions in $H_0$
• If the null hypothesis is true, then $F \sim F_{(L, T-K-1)}$
The five steps of the F -test
1. Set up H0 and H1 and choose the significance level a
2. Calculate $S_{\hat u \hat u}$ and $S^0_{\hat u \hat u}$ (more on the computation of $S^0_{\hat u \hat u}$ later)
3. Compute the F-test statistic
4. Find the critical value $F_a$, i.e. the upper a-quantile of the $F_{(L, T-K-1)}$-distribution
5. Reject $H_0$ if $F > F_a$ {15}
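The five steps can be sketched for the hypothesis that both slope coefficients in the fertilizer example are zero. The inputs below are reconstructed from the rounded summary figures reported later in the slides ($R^2 = 0.743$, $\hat\sigma^2 = 0.00425$), so the result only approximately matches the reported test statistic of 38.98. The critical value 3.35 (upper 5% quantile of F(2, 27)) is an assumed table value.

```python
def f_statistic(s0_uu, s_uu, n_restrictions, T, K):
    """F = ((S0_uu - S_uu) / L) / (S_uu / (T - K - 1))."""
    numerator = (s0_uu - s_uu) / n_restrictions
    denominator = s_uu / (T - K - 1)
    return numerator / denominator


T, K = 30, 2          # observations and exogenous variables
r2 = 0.743            # rounded value from the fertilizer example
sigma2_hat = 0.00425  # rounded value from the fertilizer example

# Unrestricted sum of squared residuals, from sigma2_hat = S_uu / (T - K - 1)
s_uu = sigma2_hat * (T - K - 1)

# Under H0: beta1 = beta2 = 0 the restricted model only contains the constant,
# so its sum of squared residuals equals S_yy = S_uu / (1 - R^2)
s0_uu = s_uu / (1 - r2)

F = f_statistic(s0_uu, s_uu, n_restrictions=2, T=T, K=K)

F_CRIT = 3.35  # assumed table value: upper 5% quantile of F(2, 27)
reject = F > F_CRIT

print(F, reject)
```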
Remarks:
• For L = 1 the F -test is identical to a two-sided t-test
• Careful: A combination of t-tests is not the same as a single F -test
• The decisions of individual t-tests and of an F-test can contradict each other
• Distinction between individual t-tests and a simultaneous F -test
• Example: H0 : β1 = β2 = 0.33 {16}
Computation of $\hat u^{0\prime} \hat u^0$
• Estimate β subject to the restrictions $R\beta = q$ given in the null hypothesis
• Optimization under constraints: Minimize
$S^0_{\hat u \hat u} = (y - X\hat\beta)'(y - X\hat\beta)$
with respect to $\hat\beta$ subject to $R\hat\beta = q$
• A standard Lagrange approach yields [14]
$\hat\beta^{RLS} = \hat\beta - (X'X)^{-1} R' \left[ R (X'X)^{-1} R' \right]^{-1} (R\hat\beta - q)$
• Residuals of the restricted model: $\hat u^0 = y - X\hat\beta^{RLS}$ {17}
• The F-test statistic can also be written as [15]
$F = \frac{(R\hat\beta - q)' \left[ R (X'X)^{-1} R' \right]^{-1} (R\hat\beta - q)/L}{\hat u'\hat u/(T - K - 1)}$
• Note the similarity to the t-test statistic
$t^2 = \frac{(r'\hat\beta - q)^2}{\hat\sigma^2 \left[ r'(X'X)^{-1} r \right]}$
• Standard statistical software includes simultaneous tests of linear combinations (F-tests)
Maximum likelihood estimation
• Repetition: If X is a K-dimensional random vector with multivariate normal distribution $N(\mu, \Sigma)$ then its joint density is
$f_X(x) = (2\pi)^{-K/2} (\det \Sigma)^{-1/2} \exp\left(-\frac{1}{2} (x - \mu)' \Sigma^{-1} (x - \mu)\right)$
• Multiple linear regression model
$y = X\beta + u$ with $u \sim N(0, \sigma^2 I)$
• Distribution of the endogenous variables: $y \sim N(X\beta, \sigma^2 I)$
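• Joint density of y
$f_y(y) = (2\pi)^{-T/2} (\det \sigma^2 I)^{-1/2} \exp\left(-\frac{1}{2} (y - X\beta)' (\sigma^2 I)^{-1} (y - X\beta)\right) = (2\pi)^{-T/2} (\sigma^{2T})^{-1/2} \exp\left(-\frac{(y - X\beta)'(y - X\beta)}{2\sigma^2}\right)$
• Log-likelihood function
$\ln L(\beta, \sigma^2) = -\frac{T}{2} \ln(2\pi) - \frac{T}{2} \ln \sigma^2 - \frac{(y - X\beta)'(y - X\beta)}{2\sigma^2}$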
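• First order condition for a maximum
$\begin{bmatrix} \partial \ln L/\partial \beta \\ \partial \ln L/\partial \sigma^2 \end{bmatrix} = \begin{bmatrix} \frac{X'(y - X\beta)}{\sigma^2} \\ -\frac{T}{2\sigma^2} + \frac{(y - X\beta)'(y - X\beta)}{2\sigma^4} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$
• Solution of the FOCs [16]
$\hat\beta_{ML} = (X'X)^{-1} X'y$
$\hat\sigma^2_{ML} = \frac{1}{T} \hat u'\hat u$
• The ML estimator of β is identical to the OLS estimator; the ML estimator of $\sigma^2$ differs from the unbiased estimator $S_{\hat u \hat u}/(T - K - 1)$ and is thus biased (but asymptotically unbiased)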
The classical tests (LR, Wald, LM)
• Illustration of the basic test ideas [threetests.R]
• Generalization to multiple restrictions
H0 : g(β) = 0
$H_1: g(\beta) \neq 0$
where β is the coefficient vector of a multiple linear regression model and g is a (possibly nonlinear) vector-valued function
• Test of L linear restrictions: $g(\beta) = R\beta - q$
Wald test
• Idea: If $g(\hat\beta_{ML})$ is significantly different from 0, reject $H_0$
• Test statistic (for multiple restrictions)
$W = g(\hat\beta_{ML})' \left[ \widehat{Cov}(g(\hat\beta_{ML})) \right]^{-1} g(\hat\beta_{ML}) \xrightarrow{d} U \sim \chi^2_L$
if the null hypothesis is true
• Wald test statistic for L linear restrictions $R\beta - q = 0$ [17]
Likelihood ratio (LR) test
• Idea: If the maximal likelihood under the restrictions $L(\hat\beta_R, \hat\sigma^2_R)$ is significantly lower than the maximal likelihood without restrictions $L(\hat\beta_{ML}, \hat\sigma^2_{ML})$, then reject $H_0$
• Test statistic
$LR = 2 \left( \ln L(\hat\beta_{ML}, \hat\sigma^2_{ML}) - \ln L(\hat\beta_R, \hat\sigma^2_R) \right) \xrightarrow{d} U \sim \chi^2_L$
if the null hypothesis is true
• LR test statistic for L linear restrictions $R\beta - q = 0$ [18]
Lagrange multiplier (LM) test
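• Idea: If the slope of the log-likelihood function $\partial \ln L(\hat\beta_R)/\partial \beta$ is significantly different from 0, reject $H_0$
• Test statistic
$LM = \left( \frac{\partial \ln L(\hat\beta_R)}{\partial \beta} \right)' \left[ \widehat{Cov}(\hat\beta_R) \right]^{-1} \left( \frac{\partial \ln L(\hat\beta_R)}{\partial \beta} \right) \xrightarrow{d} U \sim \chi^2_L$
if the null hypothesis is true
• LM test statistic for L linear restrictions $R\beta - q = 0$ [19]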
Forecasting
• The approach is similar to forecasting in the simple linear regression
• Let $x_0 = [1, x_{10}, x_{20}, \dots, x_{K0}]'$ denote the vector of exogenous variables
• Point forecast
$\hat y_0 = x_0' \hat\beta$
• Variance of the forecast error [20]
$Var(\hat y_0 - y_0) = \sigma^2 \left( 1 + x_0' (X'X)^{-1} x_0 \right)$
Presentation of the results
• In the literature, the results of regression analyses are often presented as follows
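$\hat y = \hat\alpha + \hat\beta_1 x_1 + \dots + \hat\beta_K x_K$
with $\widehat{se}(\hat\alpha)$, $\widehat{se}(\hat\beta_1)$, ..., $\widehat{se}(\hat\beta_K)$ in parentheses below the respective coefficients
• Sometimes you find t-values in the parentheses, i.e. the values of the test statistics for the tests $H_0: \beta_k = 0$ vs $H_1: \beta_k \neq 0$
• Often, $R^2$ and $\hat\sigma$ and the value of the test statistic of the F test
$H_0: \beta_1 = \dots = \beta_K = 0$ vs $H_1$: not $H_0$
are reported additionally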
• Fertilizer example:
$\hat y$ = 0.95432 + 0.59652 x_1 + 0.26255 x_2
     (0.46943)  (0.13788)      (0.03400)
• Additional results
$R^2 = 0.743$
$\hat\sigma^2 = 0.00425$
$\hat\sigma = 0.0652$
• Test statistics
$H_0: \beta_1 = 0$ → 4.326
$H_0: \beta_2 = 0$ → 7.723
$H_0: \beta_1 = \beta_2 = 0$ → 38.98
Examples of computer output:
• Excel
• SPSS
• EViews
• Stata
• R
• MATLAB
Assumptions
A1: No relevant variable is omitted, and no irrelevant variables are included
A2: The true functional dependence between X and y is linear
A3: The parameters β are constant for all T observations $(x_t, y_t)$
B1-B4: $u \sim N(0, \sigma^2 I_T)$
C1: The exogenous variables are not stochastic
C2: No perfect multicollinearity: $rank(X) = K + 1$
All assumptions can be violated
What happens if they are violated?
Omitted or irrelevant variables
• Assumption A1: No relevant exogenous variable is omitted from the econometric model, and all exogenous variables included in the model are relevant
• What happens if relevant variables are missing?
• What happens if there are irrelevant variables included in the model?
• Example: Wage structure in a firm with 20 employees; what are the determinants of the wage $y_t$?
Data: Education x1t; age x2t; firm tenure x3t
  t   y_t  x_1t  x_2t  x_3t  |   t   y_t  x_1t  x_2t  x_3t
  1  1250     1    28    12  |  11  1350     1    30    13
  2  1950     9    34     8  |  12  1600     2    43    21
  3  2300    11    55    25  |  13  1400     2    23     5
  4  1350     3    24     5  |  14  1500     3    21     1
  5  1650     2    42    21  |  15  2350     6    50    22
  6  1750     1    43    19  |  16  1700     9    64    36
  7  1550     4    37    17  |  17  1350     1    36    10
  8  1400     1    18     1  |  18  2600     7    58    30
  9  1700     3    63    25  |  19  1400     2    35    17
 10  2000     4    58    30  |  20  1550     2    41     6
• Three potential models (M2 is the true model)
(M1) $y_t = \alpha + \beta x_{1t} + u'_t$
(M2) $y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + u_t$
(M3) $y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t} + u''_t$
Model  Variable      Coeff.   est. se   t-test  p-value
(M1)   Constant      1354.7      94.2    14.38   0.0000
       Education       89.3      19.8     4.50   0.0003
(M2)   Constant      1027.8     164.5     6.25   0.0000
       Education       62.6      21.2     2.95   0.0089
       Age             10.6       4.6     2.32   0.0333
(M3)   Constant      1000.5     225.7     4.43   0.0004
       Education       62.4      21.8     2.86   0.0114
       Age             12.4      10.7     1.16   0.2634
       Firm tenure     -2.6      14.3    -0.18   0.8569
Omitted relevant variables
• Graphical representation
[Figure: Venn diagram of the variations $S_{yy}$, $S_{11}$ and $S_{22}$ with the regions A to G]
• The models:
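(M1) $y_t = \alpha + \beta x_{1t} + u'_t$
(M2) $y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + u_t$
(M3) $y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t} + u''_t$
• The error terms
$u'_t = \beta_2 x_{2t} + u_t$
$E(u'_t) = E(\beta_2 x_{2t} + u_t) = \beta_2 x_{2t} + E(u_t) = \beta_2 x_{2t} + 0 \neq 0$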
• If a relevant exogenous variable is omitted, assumption B1 is violated!
• Consequence for point estimation
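$\hat\beta'_1 = \hat\beta_1 + \hat\beta_2 \frac{S_{12}}{S_{11}}$
$E(\hat\beta'_1) = E\left(\hat\beta_1 + \hat\beta_2 \frac{S_{12}}{S_{11}}\right) = \beta_1 + \beta_2 \frac{S_{12}}{S_{11}}$
• Consequence for interval estimation
$[\hat\beta'_1 - t_{a/2} \cdot \widehat{se}(\hat\beta'_1) ; \hat\beta'_1 + t_{a/2} \cdot \widehat{se}(\hat\beta'_1)]$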
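The relation between the slope of the misspecified model (M1) and the coefficients of the true model (M2) can be verified numerically with the wage data. The sketch below uses the closed-form two-regressor least squares formulas; the variable names are chosen here for illustration.

```python
# Wage example: wage y, education x1, age x2 (firm tenure is not needed here)
y = [1250, 1950, 2300, 1350, 1650, 1750, 1550, 1400, 1700, 2000,
     1350, 1600, 1400, 1500, 2350, 1700, 1350, 2600, 1400, 1550]
x1 = [1, 9, 11, 3, 2, 1, 4, 1, 3, 4, 1, 2, 2, 3, 6, 9, 1, 7, 2, 2]
x2 = [28, 34, 55, 24, 42, 43, 37, 18, 63, 58,
      30, 43, 23, 21, 50, 64, 36, 58, 35, 41]
T = len(y)


def covariation(a, b):
    """Sum of cross products of deviations from the respective means."""
    a_bar, b_bar = sum(a) / T, sum(b) / T
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


S_11, S_22, S_12 = covariation(x1, x1), covariation(x2, x2), covariation(x1, x2)
S_1y, S_2y = covariation(x1, y), covariation(x2, y)

# (M1): simple regression of y on x1 only
slope_m1 = S_1y / S_11

# (M2): regression of y on x1 and x2
det = S_11 * S_22 - S_12 ** 2
beta1_m2 = (S_22 * S_1y - S_12 * S_2y) / det
beta2_m2 = (S_11 * S_2y - S_12 * S_1y) / det

print(slope_m1)                               # slope in the misspecified model
print(beta1_m2 + beta2_m2 * S_12 / S_11)      # same value via the relation above
print(beta1_m2, beta2_m2)
```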
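• Further
$$se(\hat\beta'_1) = \sqrt{var(\hat\beta'_1)} \quad \text{with} \quad var(\hat\beta'_1) = \frac{\sigma^2}{S_{11}}$$
• The estimator
$$\hat\sigma^2 = \frac{S_{\hat u'\hat u'}}{T-2}$$
is biased; the unbiased estimator is
$$\hat\sigma^2 = \frac{S_{\hat u\hat u}}{T-3}$$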
• Conclusion: The coverage probability of the confidence intervals is not $1 - a$
• Hypothesis tests are also biased: The probability of an error of the first kind does not equal the significance level
• If a relevant exogenous variable is omitted, then
— the point estimators are biased and inconsistent
— the interval estimators and hypothesis tests are no longer valid {18}
Irrelevant variables
• The error term in the misspecified model M3 is
$$u''_t = u_t - \beta_3 x_{3t}$$
and since $\beta_3 = 0$
$$u''_t = u_t$$
• Consequently,
$$E(\hat\alpha'') = \alpha$$
$$E(\hat\beta''_1) = \beta_1$$
$$E(\hat\beta''_2) = \beta_2$$
$$E(\hat\beta''_3) = \beta_3 = 0$$
• The variances of the estimators are
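$$Var(\hat\beta_1) = \frac{\sigma^2}{S_{11}\left(1 - R^2_{1\cdot 2}\right)}$$
$$Var(\hat\beta''_1) = \frac{\sigma^2}{S_{11}\left(1 - R^2_{1\cdot 2} - R^2_{1\cdot 3}\right)}$$
• The estimated error term variance is
$$\hat\sigma^2 = \frac{S_{\hat u''\hat u''}}{T-4}$$
• Conclusion: Omitted relevant variables are a serious problem, redundant variables are not (but they inflate the standard errors)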
Diagnosis
• How can we find the correct model?
• The coefficient of determination $R^2$ does not help select a model, because it never decreases when a further variable is added
• Adjusted R2
$$\bar R^2 = 1 - \frac{S_{\hat u\hat u}/(T-K-1)}{S_{yy}/(T-1)} = 1 - \left(1 - R^2\right)\frac{T-1}{T-K-1}$$
• Further model selection criteria (trade-off between biasedness and inefficiency)
• Akaike information criterion (AIC)
$$AIC = \ln\left(\frac{S_{\hat u\hat u}}{T}\right) + \frac{2(K+1)}{T}$$
• t-test for single variables; F-test for multiple variables
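Both criteria can be computed directly from the sum of squared residuals. The sketch below uses hypothetical numbers (not taken from the lecture) for a small model with $K = 1$ and a larger model with $K = 2$; the function names are illustrative assumptions.

```python
import math


def adjusted_r2(ssr, syy, T, K):
    """Adjusted R^2 from the residual and total sums of squares."""
    return 1.0 - (ssr / (T - K - 1)) / (syy / (T - 1))


def adjusted_r2_from_r2(r2, T, K):
    """Equivalent form based on the ordinary R^2."""
    return 1.0 - (1.0 - r2) * (T - 1) / (T - K - 1)


def aic(ssr, T, K):
    """Akaike information criterion in the form used above (smaller is better)."""
    return math.log(ssr / T) + 2.0 * (K + 1) / T


# Hypothetical example: the extra regressor reduces the SSR only slightly.
T, syy = 20, 100.0
ssr_small, K_small = 40.0, 1
ssr_large, K_large = 39.5, 2

r2_small = 1.0 - ssr_small / syy
r2_large = 1.0 - ssr_large / syy
adj_small = adjusted_r2(ssr_small, syy, T, K_small)
adj_large = adjusted_r2(ssr_large, syy, T, K_large)
aic_small = aic(ssr_small, T, K_small)
aic_large = aic(ssr_large, T, K_large)

print("R2:      ", round(r2_small, 4), round(r2_large, 4))
print("adj. R2: ", round(adj_small, 4), round(adj_large, 4))
print("AIC:     ", round(aic_small, 4), round(aic_large, 4))
```

In this made-up case $R^2$ favours the larger model, while the adjusted $R^2$ and the AIC both favour the smaller one, because the penalty for the additional parameter outweighs the small reduction in the sum of squared residuals.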
Functional form
• Assumption A2: The true functional dependence between X and y is linear
• Milk example: Milk production m depends on amount of concentrated feed f
  t   ft    mt      t   ft    mt
  1   10  6525      7    8  5821
  2   30  8437      8   14  7531
  3   20  8019      9   25  8320
  4   33  8255     10    1  4336
  5    5  5335     11   17  7225
  6   22  7236     12   28  8112
[Figure: scatter plot of milk production (Milchmenge, vertical axis) against concentrated feed (Kraftfutter, horizontal axis)]
• A misspecified model returns useless results
• Some nonlinear dependencies
Semi-log:     $m_t = \alpha + \beta \ln f_t + u_t$
Inverse:      $m_t = \alpha + \beta (1/f_t) + u_t$
Exponential:  $\ln m_t = \alpha + \beta f_t + u_t$
Logarithmic:  $\ln m_t = \alpha + \beta \ln f_t + u_t$
Quadratic:    $m_t = \alpha + \beta_1 f_t + \beta_2 f_t^2 + u_t$
• Approach I: Estimation of a nonlinear regression
$$y_t = g(x_t) + u_t$$
with criterion function
$$\sum_{t=1}^{T} \left(y_t - g(x_t)\right)^2$$
• Optimization by numerical methods
• Approach II: Linearization of the model; then linear regression
$$y_t = \alpha + \beta x_t + u_t \quad \text{with} \quad y_t = \ln m_t, \quad x_t = \ln f_t$$
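Approach II can be illustrated with the milk data: after taking logarithms of both variables, the logarithmic model is an ordinary simple linear regression. This is a minimal sketch under that assumption; the helper `simple_ols` is an illustrative name, not part of the lecture material.

```python
import math

# Milk data from the table: concentrated feed f and milk production m.
f = [10, 30, 20, 33, 5, 22, 8, 14, 25, 1, 17, 28]
m = [6525, 8437, 8019, 8255, 5335, 7236, 5821, 7531, 8320, 4336, 7225, 8112]


def simple_ols(x, y):
    """Intercept and slope of a simple linear regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    beta = sxy / sxx
    return y_bar - beta * x_bar, beta


# Linearization: y_t = ln m_t, x_t = ln f_t.
x_log = [math.log(v) for v in f]
y_log = [math.log(v) for v in m]
alpha, beta = simple_ols(x_log, y_log)
print("alpha =", round(alpha, 3), " beta =", round(beta, 3))


def predict_milk(feed):
    """Back-transformed prediction exp(alpha) * feed**beta (ignores the error term)."""
    return math.exp(alpha) * feed ** beta


print("predicted milk production for f = 10:", round(predict_milk(10)))
```

In the logarithmic model the slope $\beta$ is the elasticity of milk production with respect to feed.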
Diagnosis: Regression Specification Error Test (RESET)
• Higher order Taylor approximation
$$y_t = f(x_t) = \alpha + \beta_1 x_t + \beta_2 x_t^2 + \beta_3 x_t^3 + \ldots$$
• Are the higher orders (jointly) significant?
• F-test of $\beta_2 = \beta_3 = \ldots = 0$
• Problem: What happens if there are many exogenous variables?
• Basic idea of the RESET: $\hat y_t^2, \hat y_t^3, \ldots$ are included as additional exogenous variables
$$y_t = \alpha + \beta_1 x_t + \gamma_2 \hat y_t^2 + \gamma_3 \hat y_t^3 + u_t$$
• If $\gamma_2$ and/or $\gamma_3$ are significant, then there are nonlinearities
• F-test of $\gamma_2 = \gamma_3 = 0$ (maybe even higher orders)
• The test is implemented in many statistical software packages
RESET in the linear model:
1. Estimate the linear model and calculate $S_{\hat u\hat u}$ and the fitted values $\hat y_t$
2. Add $L$ powers of $\hat y_t$ to the linear model (here $L = 2$)
$$y_t = \alpha + \beta_1 x_t + \gamma_2 \hat y_t^2 + \gamma_3 \hat y_t^3 + u_t$$
Estimate the extended model and calculate the sum of squared residuals $S^*_{\hat u\hat u}$
3. The null hypothesis is $H_0: \gamma_2 = \gamma_3 = 0$
4. Compute the F-test statistic
$$F_{(L,\,T-K^*-1)} = \frac{\left(S_{\hat u\hat u} - S^*_{\hat u\hat u}\right)/L}{S^*_{\hat u\hat u}/(T-K^*-1)}$$
where $K^*$ is the number of exogenous variables in the extended model
5. If $F > F_a$ (significance level $a$, degrees of freedom $L$ and $T-K^*-1$), then $H_0$ is rejected and the linear model is discarded
• Milk example {18}
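For the milk data, steps 1 to 4 can be sketched as follows. This is an illustration in plain Python, not the lecture's own code; the helpers `solve`, `ols` and `reset_f` are assumed names. The fitted values are standardized before taking powers, purely for numerical stability: because the fitted values are a linear combination of the constant and the regressors, this leaves the F statistic unchanged.

```python
# Milk data: concentrated feed f and milk production m.
f = [10, 30, 20, 33, 5, 22, 8, 14, 25, 1, 17, 28]
m = [6525, 8437, 8019, 8255, 5335, 7236, 5821, 7531, 8320, 4336, 7225, 8112]


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    mat = [list(map(float, row)) + [float(v)] for row, v in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(mat[r][col]))
        mat[col], mat[pivot] = mat[pivot], mat[col]
        for r in range(col + 1, n):
            factor = mat[r][col] / mat[col][col]
            mat[r] = [v - factor * w for v, w in zip(mat[r], mat[col])]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (mat[i][n] - sum(mat[i][j] * z[j] for j in range(i + 1, n))) / mat[i][i]
    return z


def ols(regressors, y):
    """OLS with intercept; returns (fitted values, sum of squared residuals)."""
    T = len(y)
    X = [[1.0] + [x[t] for x in regressors] for t in range(T)]
    k = len(X[0])
    xtx = [[sum(X[t][i] * X[t][j] for t in range(T)) for j in range(k)]
           for i in range(k)]
    xty = [sum(X[t][i] * y[t] for t in range(T)) for i in range(k)]
    b = solve(xtx, xty)
    fitted = [sum(b[i] * X[t][i] for i in range(k)) for t in range(T)]
    ssr = sum((y[t] - fitted[t]) ** 2 for t in range(T))
    return fitted, ssr


def reset_f(regressors, y, L=2):
    """RESET F statistic with L added powers of the fitted values."""
    T, K = len(y), len(regressors)
    fitted, ssr = ols(regressors, y)                       # step 1
    mean = sum(fitted) / T
    sd = (sum((v - mean) ** 2 for v in fitted) / T) ** 0.5
    z = [(v - mean) / sd for v in fitted]                  # standardized fitted values
    powers = [[v ** p for v in z] for p in range(2, L + 2)]
    _, ssr_ext = ols(regressors + powers, y)               # step 2
    K_star = K + L
    df2 = T - K_star - 1
    F = ((ssr - ssr_ext) / L) / (ssr_ext / df2)            # step 4
    return F, L, df2


F, df1, df2 = reset_f([f], m, L=2)
print("F =", round(F, 2), "with", df1, "and", df2, "degrees of freedom")
```

For step 5 the statistic is compared with the critical value of the F distribution with these degrees of freedom, taken from a table or from statistical software.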
Qualitative exogenous variables
• Assumption A3: The parameters $\beta$ are constant for all $T$ observations $(x_t, y_t)$
• Example: The wage $y_t$ depends on education $x_{1t}$ and age $x_{2t}$
$$y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + u_t$$
• The wage equations for males and females might be different
$$y_t = \alpha_M + \beta_{M1} x_{1t} + \beta_{M2} x_{2t} + u_t$$
$$y_t = \alpha_F + \beta_{F1} x_{1t} + \beta_{F2} x_{2t} + u_t$$
• What happens if the difference is neglected? [qualitative.R]
• Dummy variable
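$$D_t = \begin{cases} 0 & \text{if male} \\ 1 & \text{if female} \end{cases}$$
• Extended model
$$y_t = \alpha + \gamma D_t + \beta_1 x_{1t} + \delta_1 D_t x_{1t} + \beta_2 x_{2t} + \delta_2 D_t x_{2t} + u_t$$
• Model for men ($D_t = 0$)
$$y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + u_t$$
• Model for women ($D_t = 1$)
$$y_t = (\alpha + \gamma) + (\beta_1 + \delta_1) x_{1t} + (\beta_2 + \delta_2) x_{2t} + u_t$$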
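• If the qualitative variable has more than two values, we need more than one dummy variable
• Example: Religion (protestant, catholic, other)
$$D_{Pt} = \begin{cases} 0 & \text{for other} \\ 1 & \text{for protestant} \\ 0 & \text{for catholic} \end{cases}$$
$$D_{Ct} = \begin{cases} 0 & \text{for other} \\ 0 & \text{for protestant} \\ 1 & \text{for catholic} \end{cases}$$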
• Meaning of the coefficients; testing structural stability
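The dummy coding for the religion example can be sketched as follows. The function `make_dummies` and the five sample observations are hypothetical; the point is that a variable with three values yields two dummy variables, with the omitted value ("other") as the reference category.

```python
def make_dummies(values, reference):
    """Return one 0/1 dummy column per category, leaving out the reference category."""
    categories = sorted(set(values) - {reference})
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical sample of five observations.
religion = ["protestant", "catholic", "other", "catholic", "protestant"]
dummies = make_dummies(religion, reference="other")

for name, column in dummies.items():
    print(name, column)
```

Including a dummy for every category together with the constant would violate assumption C2, since the dummies would then sum to the constant column.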
• Estimation of the model
• Use the ordinary t- or F-tests to detect differences in the coefficients, e.g.
$$H_0: \gamma = \delta_1 = \delta_2 = 0$$
• Very often, the model includes only a level effect, i.e.
$$y_t = \alpha + \gamma D_t + \beta_1 x_{1t} + \beta_2 x_{2t} + u_t$$
• Then use a t-test for $\gamma$
• Estimation of the wage equation model
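$$y_t = \alpha + \gamma D_t + \beta_1 x_{1t} + \delta_1 D_t x_{1t} + \beta_2 x_{2t} + \delta_2 D_t x_{2t} + u_t$$
• Compare with separate estimation of the two models [wages.R]
$$y_t = \alpha_M + \beta_{M1} x_{1t} + \beta_{M2} x_{2t} + u_t \quad \text{for men}$$
$$y_t = \alpha_F + \beta_{F1} x_{1t} + \beta_{F2} x_{2t} + u_t \quad \text{for women}$$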
• The point estimates and the sum of squared residuals are identical (why?)
• The standard errors differ (why?)
• For simplicity we only consider one exogenous variable
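$$y_t = \alpha + \gamma D_t + \beta x_t + \delta D_t x_t + u_t$$
• Order the observations such that $D_t = 0$ for $t = 1, \ldots, T_1$ and $D_t = 1$ for $t = T_1 + 1, \ldots, T$
• The joint estimation minimizes (with respect to $\alpha, \beta, \gamma, \delta$)
$$S(\alpha, \beta, \gamma, \delta) = \sum_{t=1}^{T_1} (y_t - \alpha - \beta x_t)^2 + \sum_{t=T_1+1}^{T} \left(y_t - (\alpha + \gamma) - (\beta + \delta) x_t\right)^2$$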
The first order conditions for the joint estimation are
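$$\frac{\partial S}{\partial \alpha} = -2 \sum_{t=1}^{T_1} (y_t - \alpha - \beta x_t) - 2 \sum_{t=T_1+1}^{T} \left(y_t - (\alpha + \gamma) - (\beta + \delta) x_t\right) = 0$$
$$\frac{\partial S}{\partial \beta} = -2 \sum_{t=1}^{T_1} (y_t - \alpha - \beta x_t) x_t - 2 \sum_{t=T_1+1}^{T} \left(y_t - (\alpha + \gamma) - (\beta + \delta) x_t\right) x_t = 0$$
$$\frac{\partial S}{\partial \gamma} = -2 \sum_{t=T_1+1}^{T} \left(y_t - (\alpha + \gamma) - (\beta + \delta) x_t\right) = 0$$
$$\frac{\partial S}{\partial \delta} = -2 \sum_{t=T_1+1}^{T} \left(y_t - (\alpha + \gamma) - (\beta + \delta) x_t\right) x_t = 0$$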
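• Hence, the point estimates in the joint estimation are identical to those of the separate estimations: the conditions for $\gamma$ and $\delta$ are the normal equations of a regression on the observations $t = T_1 + 1, \ldots, T$ with intercept $\alpha + \gamma$ and slope $\beta + \delta$, and subtracting them from the conditions for $\alpha$ and $\beta$ leaves the normal equations of a regression on the observations $t = 1, \ldots, T_1$
• If the point estimates are identical, then so are the residuals; and if the residuals are identical, then so are the sums of squared residuals
• As to the standard errors, in the joint model we estimate
$$\hat\sigma^2 = S_{\hat u\hat u} / (T - 4)$$
while in the separate estimations we estimate
$$\hat\sigma_0^2 = S^0_{\hat u\hat u} / (T_1 - 2), \qquad \hat\sigma_1^2 = S^1_{\hat u\hat u} / \left((T - T_1) - 2\right)$$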
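The equivalence of joint and separate estimation can be checked numerically. The data below are hypothetical (six observations with $D_t = 0$ followed by six with $D_t = 1$), and the helpers `solve` and `ols` are illustrative names; this is a sketch, not the contents of [wages.R].

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    mat = [list(map(float, row)) + [float(v)] for row, v in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(mat[r][col]))
        mat[col], mat[pivot] = mat[pivot], mat[col]
        for r in range(col + 1, n):
            factor = mat[r][col] / mat[col][col]
            mat[r] = [v - factor * w for v, w in zip(mat[r], mat[col])]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (mat[i][n] - sum(mat[i][j] * z[j] for j in range(i + 1, n))) / mat[i][i]
    return z


def ols(regressors, y):
    """OLS with intercept; returns (coefficients, sum of squared residuals)."""
    T = len(y)
    X = [[1.0] + [x[t] for x in regressors] for t in range(T)]
    k = len(X[0])
    xtx = [[sum(X[t][i] * X[t][j] for t in range(T)) for j in range(k)]
           for i in range(k)]
    xty = [sum(X[t][i] * y[t] for t in range(T)) for i in range(k)]
    b = solve(xtx, xty)
    ssr = sum((y[t] - sum(b[i] * X[t][i] for i in range(k))) ** 2 for t in range(T))
    return b, ssr


# Hypothetical data, ordered so that D = 0 comes first.
x = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
D = [0] * 6 + [1] * 6
y = [10.2, 11.9, 14.1, 15.8, 18.3, 19.9, 7.1, 8.0, 8.8, 10.3, 10.9, 12.2]
T, T1 = len(y), D.count(0)

# Joint estimation: y = alpha + gamma*D + beta*x + delta*D*x + u
Dx = [d * v for d, v in zip(D, x)]
(alpha, gamma, beta, delta), ssr_joint = ols([D, x, Dx], y)

# Separate estimations for the two groups
(alpha_0, beta_0), ssr_0 = ols([x[:T1]], y[:T1])
(alpha_1, beta_1), ssr_1 = ols([x[T1:]], y[T1:])

# Error variance estimates differ between joint and separate estimation
sigma2_joint = ssr_joint / (T - 4)
sigma2_0 = ssr_0 / (T1 - 2)
sigma2_1 = ssr_1 / ((T - T1) - 2)
print(round(alpha, 4), round(alpha_0, 4), round(alpha + gamma, 4), round(alpha_1, 4))
print(round(sigma2_joint, 4), round(sigma2_0, 4), round(sigma2_1, 4))
```

The joint variance estimate pools the residuals of both groups, so it generally lies between the two separate estimates; this is why the standard errors differ although the point estimates coincide.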
Remarks
• What happens if the dummy variables are not 0/1-coded but 1/2-coded?
• Consider the model
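$$y_t = \alpha + \gamma D_{1t} + \delta D_{2t} + \beta x_t + u_t$$
where
$$D_{1t} = \begin{cases} 0 & \text{for males} \\ 1 & \text{for females} \end{cases}$$
$$D_{2t} = \begin{cases} 0 & \text{for German citizenship} \\ 1 & \text{otherwise} \end{cases}$$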
• Interaction terms