Part 3: Least Squares Algebra
Econometrics I
Professor William Greene
Stern School of Business, Department of Economics
Vocabulary
Some terms to be used in the discussion:
Population characteristics and entities vs. sample quantities and analogs
Residuals and disturbances
Population regression line and sample regression line
Objective: learn about the conditional mean function; estimate β and σ².
First step: the mechanics of fitting a line (hyperplane) to a set of data.
Fitting Criteria
The set of points in the sample. Fitting criteria: what are they? Least absolute deviations (LAD), least squares, and so on.
Why least squares? A fundamental result: sample moments are “good” estimators of their population counterparts. We will spend the next few weeks using this principle and applying it to least squares computation.
An Analogy Principle for Estimating β
In the population: E[y | X] = Xβ, so E[y - Xβ | X] = 0. Continuing, with εᵢ = yᵢ - xᵢ'β, E[xᵢεᵢ] = 0. Summing: Σᵢ E[xᵢεᵢ] = Σᵢ 0 = 0. Exchanging Σᵢ and E[·]: E[Σᵢ xᵢεᵢ] = E[X'ε] = 0, i.e., E[X'(y - Xβ)] = 0.
Choose b, the estimator of β, to mimic this population result: i.e., mimic the population mean with the sample mean. Find b such that
(1/N) X'e = (1/N) X'(y - Xb) = 0.
As we will see, the solution is the least squares coefficient vector.
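A minimal numerical sketch of the analogy principle (simulated data; numpy assumed, not part of the slides): the coefficient vector that sets the sample moment (1/N)X'e to zero is the least squares solution, and the resulting residuals are orthogonal to X.

```python
import numpy as np

rng = np.random.default_rng(0)          # reproducible fake data
N, K = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta = np.array([1.0, 0.5, -2.0])
y = X @ beta + rng.normal(size=N)       # y = X*beta + disturbance

# Solve the sample moment condition X'(y - Xb) = 0, i.e., X'Xb = X'y.
b = np.linalg.solve(X.T @ X, X.T @ y)

e = y - X @ b                           # least squares residuals
print(X.T @ e / N)                      # ~ [0, 0, 0] up to rounding error
```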
Population and Sample Moments
We showed that E[εᵢ | xᵢ] = 0 and Cov[xᵢ, εᵢ] = 0. If so, and if E[y | X] = Xβ, then
β = (Var[xᵢ])⁻¹ Cov[xᵢ, yᵢ].
This will provide a population analog to the statistics we compute with the data.
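A sketch of that analog on simulated data (the data and variable names are invented for illustration): the sample versions of Var[xᵢ] and Cov[xᵢ, yᵢ] reproduce the least squares slope coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500
x = rng.normal(size=(N, 2))                         # two regressors
y = 1.0 + x @ np.array([0.5, -2.0]) + rng.normal(size=N)

# Sample analog of beta = Var[x]^{-1} Cov[x, y]:
Sxx = np.cov(x, rowvar=False)                       # 2x2 sample variance matrix
Sxy = np.cov(np.column_stack([x, y]), rowvar=False)[:2, 2]
b_moment = np.linalg.solve(Sxx, Sxy)

# Least squares slopes (constant included) give the same numbers:
X = np.column_stack([np.ones(N), x])
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(b_moment, b_ols[1:])                          # agree up to rounding
```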
U.S. Gasoline Market, 1960-1995
[Figure: U.S. gasoline market data, 1960-1995; not captured in the transcript.]
Least Squares Example
The example will be Gᵢ (gasoline consumption) regressed on xᵢ = [1, PGᵢ, Yᵢ].
Fitted equation: ŷᵢ = b₁xᵢ₁ + b₂xᵢ₂ + ... + b_K x_iK.
The criterion is based on the residuals: eᵢ = yᵢ - b₁xᵢ₁ - b₂xᵢ₂ - ... - b_K x_iK.
Make the eᵢ as small as possible: form a criterion and minimize it.
Fitting Criteria
(All sums run over i = 1, ..., N.)
Sum of residuals: Σᵢ eᵢ
Sum of squares: Σᵢ eᵢ²
Sum of absolute values of residuals: Σᵢ |eᵢ|
Absolute value of sum of residuals: |Σᵢ eᵢ|
We focus on Σᵢ eᵢ² now and Σᵢ |eᵢ| later.
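As a concrete illustration, here is a small sketch that evaluates all four criteria at a trial coefficient vector (the helper name fitting_criteria and the simulated data are ours, not from the slides):

```python
import numpy as np

def fitting_criteria(y, X, b):
    """Evaluate the four candidate criteria at a trial coefficient vector b."""
    e = y - X @ b                       # residuals implied by this b
    return {
        "sum of residuals":       e.sum(),
        "sum of squares":         (e ** 2).sum(),
        "sum of absolute values": np.abs(e).sum(),
        "absolute value of sum":  abs(e.sum()),
    }

rng = np.random.default_rng(2)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=50)
print(fitting_criteria(y, X, np.array([1.0, 0.5])))
```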
Least Squares Algebra
Σᵢ eᵢ² = e'e = (y - Xb)'(y - Xb)
A digression on multivariate calculus: matrix and vector derivatives.
Derivative of a scalar with respect to a vector.
Derivative of a column vector with respect to a row vector.
Other derivatives.
Least Squares Normal Equations
∂(y - Xb)'(y - Xb)/∂b = ∂[Σᵢ (yᵢ - xᵢ'b)²]/∂b = -2X'(y - Xb)
Dimensions: a (1x1) scalar differentiated with respect to a (Kx1) vector gives (-2)(NxK)'(Nx1) = (-2)(KxN)(Nx1) = Kx1.
(Note: the derivative of a 1x1 scalar with respect to a Kx1 vector is a Kx1 vector.)
Solution: -2X'(y - Xb) = 0 implies X'y = X'Xb.
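A quick numerical sanity check of the derivative (simulated data; the step size h is an arbitrary choice for the sketch): the analytic gradient -2X'(y - Xb) matches a centered finite difference of e'e.

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 100, 3
X = rng.normal(size=(N, K))
y = rng.normal(size=N)
b = rng.normal(size=K)                  # an arbitrary trial point

sse = lambda b: (y - X @ b) @ (y - X @ b)   # e'e as a function of b

# Analytic gradient from the slide: d(e'e)/db = -2 X'(y - Xb)
grad_analytic = -2 * X.T @ (y - X @ b)

# Centered finite differences, one coordinate at a time
h = 1e-6
grad_numeric = np.array([
    (sse(b + h * np.eye(K)[k]) - sse(b - h * np.eye(K)[k])) / (2 * h)
    for k in range(K)
])
print(np.max(np.abs(grad_analytic - grad_numeric)))   # ~ 0
```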
Least Squares Solution
Assuming it exists: b = (X'X)⁻¹X'y.
Note the analogy: β = Var(x)⁻¹ Cov(x, y), while
b = [(1/N)X'X]⁻¹ [(1/N)X'y].
This suggests something desirable about least squares.
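A sketch of the solution in code (numpy assumed): the textbook formula via the normal equations, next to numpy's own least squares routine.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=N)

# Textbook formula: solve X'Xb = X'y (fine for small, well-conditioned problems).
b_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Numerically preferred: let lstsq factor X directly.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(b_normal, b_lstsq))   # True
```

One design note: forming X'X squares the condition number of X, so factorization-based routines like lstsq are usually preferred in practice over the explicit normal equations.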
Second Order Conditions
Necessary condition: first derivatives equal zero, ∂(y - Xb)'(y - Xb)/∂b = -2X'(y - Xb) = 0.
Sufficient condition: the matrix of second derivatives,
∂²(y - Xb)'(y - Xb)/∂b∂b' = ∂[-2X'(y - Xb)]/∂b'
(a Kx1 column vector differentiated with respect to a 1xK row vector)
= 2X'X,
must be positive definite.
Does b Minimize e'e?
$$\frac{\partial^2 \mathbf{e}'\mathbf{e}}{\partial \mathbf{b}\,\partial \mathbf{b}'} = 2\mathbf{X}'\mathbf{X} = 2\begin{pmatrix}
\sum_i x_{i1}^2 & \sum_i x_{i1}x_{i2} & \cdots & \sum_i x_{i1}x_{iK} \\
\sum_i x_{i2}x_{i1} & \sum_i x_{i2}^2 & \cdots & \sum_i x_{i2}x_{iK} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_i x_{iK}x_{i1} & \sum_i x_{iK}x_{i2} & \cdots & \sum_i x_{iK}^2
\end{pmatrix}$$
If there were a single b (one regressor x), we would require this to be positive, which it would be: 2x'x = 2Σᵢ xᵢ² > 0.
The matrix counterpart of a positive number is a positive definite matrix.
Sample Moments - Algebra
$$\mathbf{X}'\mathbf{X} = \begin{pmatrix}
\sum_i x_{i1}^2 & \sum_i x_{i1}x_{i2} & \cdots & \sum_i x_{i1}x_{iK} \\
\sum_i x_{i2}x_{i1} & \sum_i x_{i2}^2 & \cdots & \sum_i x_{i2}x_{iK} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_i x_{iK}x_{i1} & \sum_i x_{iK}x_{i2} & \cdots & \sum_i x_{iK}^2
\end{pmatrix}
= \sum_{i=1}^{N} \begin{pmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{iK} \end{pmatrix}
\begin{pmatrix} x_{i1} & x_{i2} & \cdots & x_{iK} \end{pmatrix}
= \sum_{i=1}^{N} \mathbf{x}_i \mathbf{x}_i'$$
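A three-line check of this identity on random data (numpy assumed): the matrix product X'X equals the sum of the outer products xᵢxᵢ'.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(10, 3))            # N=10 observations, K=3 regressors

# X'X computed two ways: matrix product vs. sum of outer products x_i x_i'.
XtX = X.T @ X
sum_outer = sum(np.outer(X[i], X[i]) for i in range(X.shape[0]))

print(np.allclose(XtX, sum_outer))      # True
```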
Positive Definite Matrix
A matrix C is positive definite if a'Ca > 0 for any a ≠ 0.
Generally hard to check; it requires a look at the characteristic roots (later in the course).
For some matrices it is easy to verify, and X'X is one of these:
a'X'Xa = (a'X')(Xa) = (Xa)'(Xa) = v'v = Σᵢ vᵢ² ≥ 0.
Could v = 0? v = 0 means Xa = 0 for a ≠ 0. Is this possible? (Not if the columns of X are linearly independent, i.e., X has full column rank.)
Conclusion: b = (X'X)⁻¹X'y does indeed minimize e'e.
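A small verification sketch on random data (with probability one a random X has full column rank): every eigenvalue of X'X is positive, and any quadratic form a'X'Xa with a ≠ 0 comes out positive.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(50, 4))            # full column rank (almost surely)

A = X.T @ X
print(np.linalg.matrix_rank(X))         # 4: columns linearly independent
print(np.linalg.eigvalsh(A) > 0)        # all True: X'X is positive definite

a = rng.normal(size=4)                  # any nonzero direction
print(a @ A @ a > 0)                    # quadratic form a'X'Xa is positive
```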
Algebraic Results - 1
In the population: E[(1/N)X'ε] = 0.
In the sample: (1/N)Σᵢ xᵢeᵢ = (1/N)X'e = 0.
Residuals vs. Disturbances
Disturbances (population): yᵢ = xᵢ'β + εᵢ.
Partitioning y: y = E[y | X] + ε = conditional mean + disturbance.
Residuals (sample): yᵢ = xᵢ'b + eᵢ.
Partitioning y: y = Xb + e = projection + residual.
(Note: the projection is into the column space of X, the set of linear combinations of the columns of X; Xb is one of these.)
Algebraic Results - 2
A “residual maker”: M = I - X(X'X)⁻¹X'.
e = y - Xb = y - X(X'X)⁻¹X'y = My. My gives the residuals that result when y is regressed on X.
MX = 0. (This result is fundamental!) How do we interpret this in terms of residuals? When a column of X is regressed on all of X, we get a perfect fit and zero residuals.
(Therefore) My = M(Xb + e) = MXb + Me = Me = e. (You should be able to prove this.)
y = Py + My, where P = X(X'X)⁻¹X' = I - M. PM = MP = 0, and Py is the projection of y into the column space of X.
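A sketch of these results in code (random data; forming P and M explicitly is fine at this small scale):

```python
import numpy as np

rng = np.random.default_rng(7)
N, K = 30, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = rng.normal(size=N)

P = X @ np.linalg.solve(X.T @ X, X.T)   # projection matrix X(X'X)^{-1}X'
M = np.eye(N) - P                       # the residual maker

b = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(M @ y, y - X @ b))    # My = least squares residuals
print(np.allclose(M @ X, 0))            # MX = 0
print(np.allclose(P @ y + M @ y, y))    # y = Py + My
print(np.allclose(P @ M, 0))            # PM = 0
```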
The M Matrix
M = I - X(X'X)⁻¹X' is an NxN matrix.
M is symmetric: M = M'.
M is idempotent: MM = M (just multiply it out).
M is singular: M⁻¹ does not exist. (We will prove this later as a side result in another derivation.)
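The three properties are easy to confirm numerically (random data; the rank N - K computation anticipates the singularity proof promised above):

```python
import numpy as np

rng = np.random.default_rng(8)
N, K = 20, 3
X = rng.normal(size=(N, K))
M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)

print(np.allclose(M, M.T))              # symmetric: M = M'
print(np.allclose(M @ M, M))            # idempotent: MM = M
print(np.linalg.matrix_rank(M), N - K)  # rank N-K < N, hence singular
```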
Results when X Contains a Constant Term
X = [1, x₂, ..., x_K]: the first column of X is a column of ones.
Since X'e = 0, the first of these equations is x₁'e = 0, so the residuals sum to zero: Σᵢ eᵢ = 0.
Define i = [1, 1, ..., 1]', a column of N ones, so that i'y = Σᵢ yᵢ = Nȳ.
y = Xb + e implies i'y = i'Xb + i'e = i'Xb, which after dividing by N gives
ȳ = x̄'b (the regression line passes through the means).
These results do not apply if the model has no constant term.
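A quick check of both results on simulated data with a constant term included:

```python
import numpy as np

rng = np.random.default_rng(9)
N = 100
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=N)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b

print(np.isclose(e.sum(), 0))                     # residuals sum to zero
print(np.isclose(y.mean(), X.mean(axis=0) @ b))   # line passes through the means
```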
Least Squares Algebra
[Slide content (figure/output) not captured in the transcript.]
Least Squares
[Slide content (figure/output) not captured in the transcript.]
Residuals
[Slide content (figure/output) not captured in the transcript.]
Least Squares Residuals
[Slide content (figure/output) not captured in the transcript.]
Least Squares Algebra - 3
M = I - X(X'X)⁻¹X' is NxN, potentially huge; in practice the residual vector e = My is computed as y - Xb, without ever forming M.
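A sketch of that practical point (sizes below are invented for illustration): at N = 100,000 the NxN matrix M would need tens of gigabytes, but the residuals only require solving a KxK system.

```python
import numpy as np

rng = np.random.default_rng(10)
N, K = 100_000, 5                       # forming the NxN matrix M is infeasible here
X = rng.normal(size=(N, K))
y = rng.normal(size=N)

# e = My, computed without ever building M: solve for b, then subtract Xb.
b = np.linalg.solve(X.T @ X, X.T @ y)   # only a KxK system
e = y - X @ b
print(np.max(np.abs(X.T @ e)))          # ~ 0: X'e = 0, as the algebra requires
```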
Least Squares Algebra - 4
MX = 0