
Outline: Vector Spaces, OLS and Projections, The FWL Theorem, Applications

OLS Geometry

Walter Sosa-Escudero

Econ 507. Econometric Analysis. Spring 2009

February 3, 2009




Vector Space Geometry

A vector space S is a set along with an addition and a scalar multiplication on S that satisfy some properties: commutativity, associativity, etc.

The euclidean space



Some Definitions and Notation

Inner product: $\langle x, y \rangle \equiv x'y$

Norm: $\|x\| \equiv (x'x)^{1/2} = \left(\sum_{i=1}^{n} x_i^2\right)^{1/2}$.

Orthogonality: $x$ and $y$ are orthogonal iff $\langle x, y \rangle = x'y = 0$.

Linear dependence: $x_1, \dots, x_k$ are linearly dependent if there exists $x_j$, $1 \le j \le k$, and coefficients $c_i$ such that $x_j = \sum_{i \ne j} c_i x_i$.
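As a rough numerical companion to these definitions, the sketch below evaluates them with NumPy. The vectors `x`, `y` and `z` are made-up illustrations, not taken from the slides, and the rank check is only one convenient way to detect linear dependence.

```python
import numpy as np

# Illustrative vectors (assumed for this example only).
x = np.array([3.0, 4.0, 0.0])
y = np.array([-4.0, 3.0, 5.0])

inner = x @ y                        # <x, y> = x'y
norm_x = np.sqrt(x @ x)              # ||x|| = (x'x)^(1/2)
orthogonal = np.isclose(inner, 0.0)  # x and y are orthogonal iff x'y = 0

# Linear dependence: z is built as a combination of x and y, so the
# matrix with columns x, y, z has rank 2 instead of 3.
z = 2 * x - y
rank = np.linalg.matrix_rank(np.column_stack([x, y, z]), tol=1e-8)

print(inner, norm_x, orthogonal, rank)
```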




Vector geometry in



A vector in



Vector addition: parallelogram rule




Subspaces of the Euclidean Space

A vector subspace is any subset of a vector space that is itself a vector space.

Span: $S(x_1, \dots, x_k) \equiv \{ z \in E^n \mid z = \sum_{i=1}^{k} b_i x_i,\ b_i \in \mathbb{R} \}$



Orthogonal complement: $S^\perp(X) \equiv \{ w \in E^n \mid w'z = 0 \text{ for all } z \in S(X) \}$. All vectors that are orthogonal to the columns of $X$.

Basis: a basis of $V$ is a list of linearly independent vectors that spans $V$.

Dimension: number of vectors in any basis.

Note: $\dim S(X) = \operatorname{rank}(X)$.

Result: if $X_{n \times k}$ has $\dim S(X) = k$, then $\dim S^\perp(X) = n - k$.
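The dimension result can be illustrated numerically. The sketch below is a minimal example under assumed values (a random 6 x 2 matrix `X`); it uses the full singular value decomposition as one possible way to obtain an orthonormal basis of the orthogonal complement.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2
X = rng.normal(size=(n, k))          # illustrative matrix, full column rank

rank = np.linalg.matrix_rank(X)      # dim S(X)

# Full SVD: the first `rank` columns of U span S(X); the remaining
# n - rank columns are an orthonormal basis of the orthogonal complement.
U, s, Vt = np.linalg.svd(X, full_matrices=True)
W = U[:, rank:]

dim_complement = W.shape[1]          # should equal n - k
max_inner = np.abs(W.T @ X).max()    # ~0: each basis vector is orthogonal to X

print(rank, dim_complement, max_inner)
```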




X is a vector in



Variables and observations in the axis

The goal is to represent the data and the OLS estimator.

We need to change our notion of point. A scatter plot takes every observation as a point.

Now we need to think of $Y$ and the columns of $X$ as $K + 1$ points in $E^n$.



Source: Bring, J., 1996, A Geometric Approach to Compare Variables in a Regression Model, The American Statistician, 50, 1, pp. 57-62.

What do you expect to happen with this picture if we add a third person? A fourth?




OLS Geometry

By definition, any point in $S(X)$ can be expressed as $X\beta$ for some coefficient vector $\beta$.



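The problem: $\min_\beta \|Y - X\beta\| \iff \min_\beta \|Y - X\beta\|^2$.

Define: $\hat\beta$ (solution to the problem), $\hat Y \equiv X\hat\beta$, $e \equiv Y - \hat Y$

Some properties:

$e$ is orthogonal to any point in $S(X)$, in particular, to the columns of $X$ and to $X\hat\beta$.

$\hat\beta = (X'X)^{-1}X'Y$. This follows from the orthogonality condition $X'(Y - X\hat\beta) = 0$.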
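The orthogonality condition can be checked on simulated data. The following sketch assumes an arbitrary data-generating process (the coefficients and sample size are invented for illustration) and solves the normal equations directly rather than forming the inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.normal(size=(n, k))                          # illustrative regressors
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

# Solve the normal equations X'X b = X'Y (equivalent to (X'X)^{-1} X'Y).
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ beta_hat
e = Y - Y_hat

# Cross-check against NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(X.T @ e)        # approximately zero: e is orthogonal to every column of X
print(e @ Y_hat)      # approximately zero: e is orthogonal to the fitted values
```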







Projections

A projection is a mapping that takes any point in $E^n$ into a point in a subspace of $E^n$.

An orthogonal projection maps any point into the point of the subspace that is closest to it.

$\hat Y = X\hat\beta = X(X'X)^{-1}X'Y = P_X Y$ is the orthogonal projection of $Y$ on $S(X)$. $P_X \equiv X(X'X)^{-1}X'$ is the projection matrix that projects $Y$ orthogonally on to $S(X)$.

$e = Y - \hat Y = Y - X\hat\beta = (I - X(X'X)^{-1}X')Y = M_X Y$ is the projection of $Y$ on to the orthogonal complement of $S(X)$, that is, $S^\perp(X)$. $M_X \equiv I - P_X = I - X(X'X)^{-1}X'$ is the projection matrix that projects $Y$ orthogonally on to $S^\perp(X)$.
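A small sketch of the two projection matrices, assuming simulated `X` and `Y` (the dimensions are arbitrary). Forming the full n x n matrices is done here only to mirror the formulas; it is not meant as an efficient way to compute fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 20, 2
X = rng.normal(size=(n, k))      # illustrative regressors
Y = rng.normal(size=n)           # illustrative dependent variable

# P_X = X (X'X)^{-1} X' and M_X = I - P_X
P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = P @ Y                    # projection of Y on S(X)
e = M @ Y                        # projection of Y on the orthogonal complement

print(np.allclose(Y_hat, X @ beta_hat))   # P_X Y equals X beta_hat
print(np.allclose(Y_hat + e, Y))          # the two pieces add back up to Y
```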




Properties: easy to check algebraically, better to understand them geometrically

$M_X$ and $P_X$ are symmetric matrices.

$M_X + P_X = I$. This suggests the orthogonal decomposition $Y = M_X Y + P_X Y$




$P_X$ and $M_X$ are idempotent: $P_X P_X = P_X$, $M_X M_X = M_X$. Intuition: if a vector is already in $S(X)$, further projecting it on $S(X)$ has no effect.

$P_X M_X = 0$. Think about what you get from doing first one projection and then the other (in any order). $P_X$ and $M_X$ annihilate each other. $0$ is the only point that belongs to both $S(X)$ and $S^\perp(X)$.

$M_X$ annihilates any point in $S(X)$, that is, $M_X X = 0$.

$P_X$ annihilates any point in $S^\perp(X)$: $P_X w = 0$ for every $w \in S^\perp(X)$.

If $A$ is a non-singular $K \times K$ matrix, $P_{XA} = P_X$.

$\operatorname{rank}(X) = \operatorname{rank}(P_X)$
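These properties are easy to verify numerically. The sketch below collects the checks in a dictionary; the matrix `X` is random and the non-singular matrix `A` is an arbitrary choice made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3
X = rng.normal(size=(n, k))                    # illustrative regressors
P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P

checks = {
    "P symmetric": np.allclose(P, P.T),
    "M symmetric": np.allclose(M, M.T),
    "P idempotent": np.allclose(P @ P, P),
    "M idempotent": np.allclose(M @ M, M),
    "PM = 0": np.allclose(P @ M, 0.0),
    "MX = 0": np.allclose(M @ X, 0.0),
}

# X and XA span the same subspace when A is non-singular (det(A) = 7 here).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 3.0]])
XA = X @ A
P_XA = XA @ np.linalg.solve(XA.T @ XA, XA.T)
checks["P_XA = P_X"] = np.allclose(P_XA, P)

# The projection matrix has the same rank as X.
checks["rank(P) = rank(X)"] = (
    np.linalg.matrix_rank(P, tol=1e-8) == np.linalg.matrix_rank(X)
)

for name, ok in checks.items():
    print(name, ok)
```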




Goodness of fit

From the orthogonal decomposition

$Y = PY + MY$

Then

$Y'Y = Y'PY + Y'MY$ (1)

$\phantom{Y'Y} = Y'P'PY + Y'M'MY$ (2)

$\|Y\|^2 = \|PY\|^2 + \|MY\|^2$ (3)
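Equation (3) is Pythagoras' theorem applied to the fitted values and the residuals. A minimal numerical check, assuming simulated data with arbitrary coefficients, might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 2
X = rng.normal(size=(n, k))                       # illustrative regressors
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - P

total = Y @ Y                  # ||Y||^2
fitted = (P @ Y) @ (P @ Y)     # ||PY||^2
resid = (M @ Y) @ (M @ Y)      # ||MY||^2

print(total, fitted + resid)   # the two numbers agree up to rounding
```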






The Frisch-Waugh-Lovell Theorem

Consider the linear model: $Y = X\beta + u$

And partition it as follows: $Y = X_1\beta_1 + X_2\beta_2 + u$

$X_1$, $X_2$ are matrices of $k_1$ and $k_2$ explanatory variables. Then $X = [X_1 \; X_2]$, $\beta = (\beta_1' \; \beta_2')'$ and $k = k_1 + k_2$.

$M_1 \equiv I - X_1(X_1'X_1)^{-1}X_1'$ projects any vector in $\mathbb{R}^n$ onto the orthogonal complement of the span of $X_1$.

$Y^* \equiv M_1 Y$ and $X_2^* \equiv M_1 X_2$ are, respectively, the OLS residuals of regressing $Y$ on $X_1$, and all columns of $X_2$ on $X_1$.




Suppose that we are interested in estimating $\beta_2$, and consider the following alternative methods:

Method 1: Proceed as usual and regress $Y$ on $X$, obtaining the OLS estimator $\hat\beta = (\hat\beta_1' \; \hat\beta_2')' = (X'X)^{-1}X'Y$. $\hat\beta_2$ would be the desired estimate.

Method 2: Regress $Y^*$ on $X_2^*$ and obtain as estimate $\tilde\beta_2 = (X_2^{*\prime} X_2^*)^{-1} X_2^{*\prime} Y^*$

Let $e_1$ and $e_2$ be the residual vectors of the regressions in Method 1 and 2, respectively.

Theorem (Frisch and Waugh, 1933, Lovell, 1963): $\hat\beta_2 = \tilde\beta_2$ (first part) and $e_1 = e_2$ (second part).
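Both parts of the theorem can be checked on simulated data. The sketch below is an illustration under assumed values: the sample size, coefficients, and the helper `ols` are invented for this example and are not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k1, k2 = 100, 2, 3
X1 = rng.normal(size=(n, k1))
X2 = rng.normal(size=(n, k2)) + X1[:, [0]]     # make X2 correlated with X1
Y = X1 @ np.ones(k1) + X2 @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=n)


def ols(X, y):
    """Return OLS coefficients and residuals from regressing y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, y - X @ b


# Method 1: regress Y on X = [X1 X2] and keep the last k2 coefficients.
beta_hat, e1 = ols(np.hstack([X1, X2]), Y)
beta2_hat = beta_hat[k1:]

# Method 2: partial X1 out of Y and X2, then regress residuals on residuals.
_, Y_star = ols(X1, Y)        # M1 Y
_, X2_star = ols(X1, X2)      # M1 X2, column by column
beta2_tilde, e2 = ols(X2_star, Y_star)

print(np.allclose(beta2_hat, beta2_tilde))   # first part of the theorem
print(np.allclose(e1, e2))                   # second part of the theorem
```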




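Proof (boring): Start with the orthogonal decomposition:

$Y = PY + MY = X_1\hat\beta_1 + X_2\hat\beta_2 + MY$

To prove the first part, multiply by $X_2'M_1$ to get:

$X_2'M_1Y = X_2'M_1X_1\hat\beta_1 + X_2'M_1X_2\hat\beta_2 + X_2'M_1MY$

$M_1X_1 = 0$, why?

$X_2'M_1M = X_2'M - X_2'P_1M = 0$ (same reasons as before)

Then: $X_2'M_1Y = X_2'M_1X_2\hat\beta_2$

So: $\hat\beta_2 = (X_2'M_1X_2)^{-1} X_2'M_1Y = \tilde\beta_2$, where the last equality uses the symmetry and idempotency of $M_1$.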




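To prove the second part, multiply the orthogonal decomposition by $M_1$ and obtain:

$M_1Y = M_1X_1\hat\beta_1 + M_1X_2\hat\beta_2 + M_1MY$

Again, $M_1X_1 = 0$.

$MY$ belongs to the orthogonal complement of $[X_1 \; X_2]$, so further projecting it on the orthogonal complement of $X_1$ (which is what premultiplying by $M_1$ would do) has no effect, hence $M_1MY = MY$.

This leaves:

$M_1Y - M_1X_2\hat\beta_2 = MY$

$Y^* - X_2^*\hat\beta_2 = MY$

Since $\hat\beta_2 = \tilde\beta_2$ by the first part, the left-hand side is $e_2$ and the right-hand side is $e_1$:

$e_2 = e_1$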




Geometric Illustration of FWLT




Comments and Intuitions

Idea of controlling for $X_1$: either put it in the model, or first get rid of it by extracting its effect.

What if X1 and X2 are orthogonal?




Applications of the FWLT

Deviations from means.

Detrending

Seasonal effects

Later on: multicollinearity, omitted variable bias, panel-data fixed-effects estimation, instrumental variables.




Deviation from means

Simple model with intercept

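$Y = X\beta + u = \mathbf{1}\beta_1 + [X_2 \; X_3 \; \cdots \; X_K]\,\beta_{-1} + u,$

$\mathbf{1} \equiv (1, 1, \dots, 1)'$, $\beta_{-1} = (\beta_2, \beta_3, \dots, \beta_K)'$, and $X_k$, $k = 2, \dots, K$ are the corresponding columns of $X$.

Two methods of estimating $\beta_{-1}$

Method 1: Regress $Y$ on $X = [\mathbf{1} \; X_2 \; \cdots \; X_K]$.

Method 2: Get residuals of projecting $X_k$, $k = 2, \dots, K$ on $\mathbf{1}$, call them $X_k^*$. Do the same with $Y$, and call them $Y^*$.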




Note $P_1 = \mathbf{1}(\mathbf{1}'\mathbf{1})^{-1}\mathbf{1}' = n^{-1}J$, where $J$ is an $n \times n$ matrix of 1s. Then

$P_1 X_k = \frac{1}{n} J X_k = (\bar X_k, \bar X_k, \dots, \bar X_k)'$

so $X_k^* = M_1 X_k = (I - P_1)X_k = X_k - (\bar X_k, \bar X_k, \dots, \bar X_k)'$, an $n \times 1$ vector with typical element

$X_{ik}^* = X_{ik} - \bar X_k$

So the second method consists in:

1 Re-express all variables as deviations from their sample means.

2 Run the standard regression of these residuals without intercept.
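The equivalence of the two methods can be sketched with NumPy. The data below are simulated under assumed coefficients, and the names `Z`, `slopes_method1` and `slopes_method2` are chosen only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 80
Z = rng.normal(size=(n, 2)) + 5.0            # regressors X_2, X_3 with nonzero means
Y = 1.0 + Z @ np.array([2.0, -1.0]) + rng.normal(size=n)

# Method 1: regress Y on [1 X_2 X_3] and keep the slope coefficients.
X = np.column_stack([np.ones(n), Z])
beta_full, *_ = np.linalg.lstsq(X, Y, rcond=None)
slopes_method1 = beta_full[1:]

# Method 2: express everything as deviations from sample means,
# then regress without an intercept.
Z_star = Z - Z.mean(axis=0)
Y_star = Y - Y.mean()
slopes_method2, *_ = np.linalg.lstsq(Z_star, Y_star, rcond=None)

# The demeaning step is the projection M_1 = I - J/n applied to each column.
M1 = np.eye(n) - np.ones((n, n)) / n

print(np.allclose(slopes_method1, slopes_method2))
print(np.allclose(M1 @ Z, Z_star))
```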

Question: what happens if we forget to re-express $Y$ as deviations from its mean? Generalize this result.

