Multiple Regression Analysis: Estimation
ECONOMETRICS (ECON 360)
BEN VAN KAMMEN, PHD
Outline
Motivation.
Mechanics and Interpretation.
Expected Values and Variances of the Estimators.
Motivation for multiple regression
Consider the following results of a regression of the number of crimes reported in Milwaukee on the search volume (on Google) for the term "ice cream", which I'm using as a proxy for ice cream sales. A couple of caveats about these data.

Dep. Var.: Crimes   Coefficient ($\hat\beta$)   Std. Err.   t
Ice Cream           1.5437                      0.4306      3.58
N = 82; $R^2$ = 0.1384
Motivation for multiple regression (continued)
Ice cream is somehow an input into, or a reason for, committing crimes? Evidenced by its positive association with crime.
Common sense and past experience with ice cream, however, argue that there is probably no causal connection between the two.
- This is a spurious correlation that disappears when we control for a confounding variable that is related to both x and y.
- In this case the confounder can be summarized as "good weather": people eat more ice cream when it is warm and also go outside more when it is warm (up to a point, at least).
- The increase in social interaction occasioned by warm weather, then, creates more opportunities for conflicts and crimes, according to this informal theory, while coincidentally leading to more ice cream sales as well.
Motivation for multiple regression (concluded)
To use regression analysis to disconfirm the theory that ice cream causes more crime, perform a regression that controls for the effect of weather in some way. Either:
- Examine sub-samples of days in which the weather is (roughly) the same but ice cream consumption varies, or
- Explicitly control for the weather by including it in a multiple regression model, as sketched below.
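A minimal sketch of the second approach in Python's statsmodels follows. The DataFrame, the file name, and the column names (crimes, icecream, temperature) are hypothetical stand-ins for the Milwaukee data described at the end of the deck; the point is only to show the two specifications side by side.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical daily data: one row per day, with crime counts, ice cream
# search volume, and temperature.
df = pd.read_csv("milwaukee_daily.csv")

# Simple regression: crime on ice cream only (the misleading specification).
simple = smf.ols("crimes ~ icecream", data=df).fit()

# Multiple regression: explicitly controlling for the weather.
multiple = smf.ols("crimes ~ icecream + temperature", data=df).fit()

print(simple.params, simple.bse)      # large, "significant" ice cream coefficient
print(multiple.params, multiple.bse)  # ice cream coefficient shrinks toward zero
```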
Multiple regression defined
Multiple regression expands the regression model to use more than 1 regressor / explanatory variable / "independent variable".
For 2 regressors, we would model the following relationship,
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u,$$
and estimate separate effects ($\beta_1$ and $\beta_2$) for each explanatory variable ($x_1$ and $x_2$).
Assuming enough data (and reasons for adding additional variables), the model with $k > 1$ regressors,
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_k x_k + u,$$
would not be anything fundamentally different from a simple regression.
Multiple OLS and simple OLS
Multiple regression relaxes the assumption that all other factors are held fixed.
Instead of $E(\text{error} \mid x_1) = 0$, one needs only assume that $E(\text{error} \mid x_1, x_2, \ldots, x_k) = 0$.
- A more realistic assumption when dealing with non-experimental data.
One other way of thinking about multiple regression: simple regression is a special case in which $\{x_2, \ldots, x_k\}$ have been relegated to the error term and treated as mean independent of $x_1$. I.e., simple regression has you estimating
$$y = \beta_0 + \beta_1 x_1 + v; \quad v \equiv \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_k x_k + u.$$
Multiple OLS and simple OLS (continued)
To demonstrate why this could be a bad strategy, consider the ice cream-crime problem again, assuming that the true relationship follows
$$\text{crime} = \beta_0 + \beta_1\,\text{icecream} + \beta_2\,\text{temperature} + u; \quad E(u \mid \text{icecream}, \text{temperature}) = 0.$$
My informal theory states that $\beta_1 = 0$ and $\beta_2 > 0$.

Multiple OLS and simple OLS (continued)
Estimating the simple regression between ice cream and crime was as if the model had been transformed to
$$\text{crime} = \beta_0 + \beta_1\,\text{icecream} + v; \quad v = \beta_2\,\text{temperature} + u,$$
which I estimated under the assumption that $E(v \mid \text{icecream}) = 0$.
However, this is likely a bad assumption, since it is probable that $E(\text{temperature} \mid \text{icecream}) \neq 0$.
Multiple OLS and simple OLS (concluded)
High (low) values of icecream correspond with high (low) values of temperature.
When this conditional mean independence assumption underlying simple regression is violated, bias is introduced into OLS.
- This bias would explain the positive estimate for $\beta_1$ shown above.
We will examine the source of the bias more closely, and how to determine its direction, later in this chapter.
First we turn our attention back to the technical aspects of estimating the OLS parameters with multiple regressors.
Results, controlling for temperature
Now the coefficient on ice cream is much smaller (closer to 0) and the effect of temperature is large and significant.
- The ice cream coefficient is not precisely zero, but we can no longer reject the null that it has zero effect on crime.

Dep. Var.: Crimes   Coefficient ($\hat\beta$)   Std. Err.   t
Ice Cream           0.4760                      0.4477      1.06
Temperature         1.9492                      0.4198      4.64
N = 82; $R^2$ = 0.3231
Mechanics and interpretation of OLS
Even with $k > 1$ regressors, OLS minimizes the sum of the squares of the residuals.
With $k = 2$ regressors:
$$\hat u_i = y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} \;\Rightarrow\; \hat u_i^2 = \left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2}\right)^2,$$
$$\text{sum of squared residuals } (SSR) = \sum_{i=1}^{n}\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2}\right)^2.$$
Note the change in subscripting: each observation is indexed by "i" as before, but the two "x" variables are now distinguished from one another by a "1" and a "2" in the subscript.
Minimizing SSR
For $k > 1$ regressors:
$$SSR = \sum_{i=1}^{n}\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \ldots - \hat\beta_k x_{ik}\right)^2.$$
Once again, minimization is accomplished by differentiating the above line with respect to each of the $k + 1$ statistics: $\hat\beta_0$ and all $k$ slope parameters.
The first order conditions for the minimum are:
$$\frac{\partial SSR}{\partial \hat\beta_0} = 0 \quad\text{and}\quad \frac{\partial SSR}{\partial \hat\beta_j} = 0 \;\;\forall\, j \in \{1, 2, \ldots, k\}.$$
Minimizing SSR (continued)
"Simultaneously choose all the 'betas' to make the regression model fit as closely as possible to all the data points."
The partial derivatives for the problem look like this:
$$(1)\quad \frac{\partial SSR}{\partial \hat\beta_0} = -2\sum_{i=1}^{n}\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \ldots - \hat\beta_k x_{ik}\right)$$
$$(2)\quad \frac{\partial SSR}{\partial \hat\beta_1} = -2\sum_{i=1}^{n} x_{i1}\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \ldots - \hat\beta_k x_{ik}\right)$$
$$\vdots$$
$$(k+1)\quad \frac{\partial SSR}{\partial \hat\beta_k} = -2\sum_{i=1}^{n} x_{ik}\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \ldots - \hat\beta_k x_{ik}\right).$$
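Setting each of these derivatives to zero gives the normal equations, which software solves numerically. Below is a minimal numpy sketch (simulated data, so all values are illustrative only) that fits OLS and verifies that the gradient of SSR is essentially zero at the solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 82, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # constant + k regressors
y = X @ np.array([1.0, 0.0, 2.0]) + rng.normal(size=n)      # simulated outcome

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)            # OLS estimates

# Partial derivatives of SSR with respect to each beta_hat: -2 * X'(y - Xb).
residuals = y - X @ beta_hat
gradient = -2 * X.T @ residuals
print(gradient)   # each entry is ~0 (machine precision), so the FOCs hold
```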
Example with k = 2
As a representative (and comparatively easy to solve) example, consider the solution when $k = 2$.
Setting (1) equal to zero (FOC) and solving for $\hat\beta_0$:
$$2n\hat\beta_0 - 2\sum_{i=1}^{n} y_i + 2\hat\beta_1\sum_{i=1}^{n} x_{i1} + 2\hat\beta_2\sum_{i=1}^{n} x_{i2} = 0$$
$$(4)\quad \hat\beta_0 = \frac{1}{n}\left(\sum_{i=1}^{n} y_i - \hat\beta_1\sum_{i=1}^{n} x_{i1} - \hat\beta_2\sum_{i=1}^{n} x_{i2}\right) = \bar y - \hat\beta_1\bar x_1 - \hat\beta_2\bar x_2.$$
Example with k = 2 (continued)
Doing the same thing with (2) and (3) gives you:
$$(2)\quad \frac{\partial SSR}{\partial \hat\beta_1} = 0 \;\Rightarrow\; \hat\beta_1\sum_{i=1}^{n} x_{i1}^2 = \sum_{i=1}^{n} x_{i1}y_i - \hat\beta_0\sum_{i=1}^{n} x_{i1} - \hat\beta_2\sum_{i=1}^{n} x_{i1}x_{i2}$$
$$(3)\quad \frac{\partial SSR}{\partial \hat\beta_2} = 0 \;\Rightarrow\; \hat\beta_2\sum_{i=1}^{n} x_{i2}^2 = \sum_{i=1}^{n} x_{i2}y_i - \hat\beta_0\sum_{i=1}^{n} x_{i2} - \hat\beta_1\sum_{i=1}^{n} x_{i1}x_{i2}.$$
Example with k = 2 (continued)
Substituting (4) into (2) and (3):
$$(2)\quad \hat\beta_1\sum_{i=1}^{n} x_{i1}^2 = \sum_{i=1}^{n} x_{i1}y_i - \left(\bar y - \hat\beta_1\bar x_1 - \hat\beta_2\bar x_2\right)n\bar x_1 - \hat\beta_2\sum_{i=1}^{n} x_{i1}x_{i2}$$
$$(3)\quad \hat\beta_2\sum_{i=1}^{n} x_{i2}^2 = \sum_{i=1}^{n} x_{i2}y_i - \left(\bar y - \hat\beta_1\bar x_1 - \hat\beta_2\bar x_2\right)n\bar x_2 - \hat\beta_1\sum_{i=1}^{n} x_{i1}x_{i2}.$$
Example with k = 2 (continued)
Solving (3) for $\hat\beta_2$:
$$(3) \;\Rightarrow\; \hat\beta_2\left(\sum_{i=1}^{n} x_{i2}^2 - n\bar x_2^2\right) = \sum_{i=1}^{n} x_{i2}y_i - n\bar x_2\bar y + n\bar x_2\hat\beta_1\bar x_1 - \hat\beta_1\sum_{i=1}^{n} x_{i1}x_{i2}$$
$$\Rightarrow\; \hat\beta_2 = \left(\sum_{i=1}^{n} x_{i2}^2 - n\bar x_2^2\right)^{-1}\left(\sum_{i=1}^{n} x_{i2}y_i - n\bar x_2\bar y + n\bar x_2\hat\beta_1\bar x_1 - \hat\beta_1\sum_{i=1}^{n} x_{i1}x_{i2}\right)$$
$$\Rightarrow\; \hat\beta_2 = \mathrm{Var}(x_2)^{-1}\left[\mathrm{Cov}(x_2, y) - \hat\beta_1\,\mathrm{Cov}(x_1, x_2)\right].$$
(Here $\mathrm{Var}$ and $\mathrm{Cov}$ are shorthand for the sums of squared deviations and cross products; any scaling by $n$ cancels in the ratios that follow.)
Example with k = 2 (continued)
Solve it simultaneously with (2) by substituting it in:
$$(2) \;\Rightarrow\; \hat\beta_1\sum_{i=1}^{n} x_{i1}^2 = \mathrm{Cov}(x_1, y) + \hat\beta_1 n\bar x_1^2 - \hat\beta_2\,\mathrm{Cov}(x_1, x_2); \quad\text{now}$$
$$\hat\beta_1\sum_{i=1}^{n} x_{i1}^2 = \mathrm{Cov}(x_1, y) + \hat\beta_1 n\bar x_1^2 - \frac{\mathrm{Cov}(x_2, y)\,\mathrm{Cov}(x_1, x_2) - \hat\beta_1\,\mathrm{Cov}(x_1, x_2)^2}{\mathrm{Var}(x_2)}.$$
Example with k = 2 (continued)
Collect all the $\hat\beta_1$ terms:
$$\hat\beta_1\left(\sum_{i=1}^{n} x_{i1}^2 - n\bar x_1^2 - \frac{\mathrm{Cov}(x_1, x_2)^2}{\mathrm{Var}(x_2)}\right) = \mathrm{Cov}(x_1, y) - \frac{\mathrm{Cov}(x_2, y)\,\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_2)}$$
$$\Rightarrow\; \hat\beta_1\left(\mathrm{Var}(x_1) - \frac{\mathrm{Cov}(x_1, x_2)^2}{\mathrm{Var}(x_2)}\right) = \mathrm{Cov}(x_1, y) - \frac{\mathrm{Cov}(x_2, y)\,\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_2)}$$
$$\Rightarrow\; \hat\beta_1\,\frac{\mathrm{Var}(x_1)\,\mathrm{Var}(x_2) - \mathrm{Cov}(x_1, x_2)^2}{\mathrm{Var}(x_2)} = \frac{\mathrm{Cov}(x_1, y)\,\mathrm{Var}(x_2) - \mathrm{Cov}(x_2, y)\,\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_2)}.$$
Example with k = 2 (concluded)
Simplifying gives you:
$$\hat\beta_1 = \frac{\mathrm{Cov}(x_1, y)\,\mathrm{Var}(x_2) - \mathrm{Cov}(x_2, y)\,\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)\,\mathrm{Var}(x_2) - \mathrm{Cov}(x_1, x_2)^2}.$$
Then you can solve for $\hat\beta_2$ which, after a lot of simplification, is:
$$\hat\beta_2 = \frac{\mathrm{Cov}(x_2, y)\,\mathrm{Var}(x_1) - \mathrm{Cov}(x_1, y)\,\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)\,\mathrm{Var}(x_2) - \mathrm{Cov}(x_1, x_2)^2}.$$
Lastly, substitute the above expressions into the solution for the intercept:
$$\hat\beta_0 = \bar y - \hat\beta_1\bar x_1 - \hat\beta_2\bar x_2.$$
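These closed-form expressions are easy to verify numerically. The sketch below (simulated data; the coefficients and seed are arbitrary) computes $\hat\beta_1$, $\hat\beta_2$, and $\hat\beta_0$ from variances and covariances and checks them against a generic least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)                  # correlated regressors
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)  # illustrative "true" model

# Closed-form k = 2 solution in terms of variances and covariances.
v1, v2 = np.var(x1), np.var(x2)
c12 = np.cov(x1, x2, bias=True)[0, 1]
c1y = np.cov(x1, y, bias=True)[0, 1]
c2y = np.cov(x2, y, bias=True)[0, 1]

den = v1 * v2 - c12 ** 2
b1 = (c1y * v2 - c2y * c12) / den
b2 = (c2y * v1 - c1y * c12) / den
b0 = y.mean() - b1 * x1.mean() - b2 * x2.mean()

# Check against a generic least-squares solver.
X = np.column_stack([np.ones(n), x1, x2])
print(np.linalg.lstsq(X, y, rcond=None)[0])  # ~ [b0, b1, b2]
print(b0, b1, b2)
```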
Multiple OLS and variance/covariance
Examine the solutions closely.
They depend, as with simple regression, only on the variances and covariances of the regressors and their covariances with $y$.
- This is true generally of OLS, even when there are many explanatory variables in the regression.
As this number grows, even to 2, the solution becomes difficult to work out with algebra, but software (like STATA) is very good at performing the calculations and solving for the $k + 1$ estimates, even when $k$ is quite large.
- How software works out the solution: matrix algebra.
Multiple regression and "partialling out"
The "partialling out" interpretation of multiple regression is revealed by the matrix and non-matrix estimate of $\hat\beta_1$.
- What goes into $\hat\beta_1$ in a multiple regression is the variation in $x_1$ that cannot be "explained" by its relation to the other $x$ variables.
- The covariance between this residual variation in $x_1$, not explained by other regressors, and $y$ is what matters.
It also provides a good way to think of $\hat\beta_1$ as the partial effect of $x_1$, holding other factors fixed.
It is estimated using variation in $x_1$ that is independent of variation in other regressors.
More.
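A small numerical illustration of the partialling-out result (again on simulated data, so the numbers are only illustrative): regress $x_1$ on the other regressor, keep the residuals, and the simple slope of $y$ on those residuals reproduces the multiple-regression coefficient on $x_1$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
y = 3.0 + 1.0 * x1 + 2.0 * x2 + rng.normal(size=n)

# Full multiple regression of y on (1, x1, x2).
X = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Step 1: regress x1 on a constant and x2; keep the residuals r1.
Z = np.column_stack([np.ones(n), x2])
gamma = np.linalg.lstsq(Z, x1, rcond=None)[0]
r1 = x1 - Z @ gamma

# Step 2: the "partialled out" slope, sum(r1*y) / sum(r1^2).
b1_partial = (r1 @ y) / (r1 @ r1)

print(beta_full[1], b1_partial)   # the two estimates of the x1 coefficient agree
```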
Expected value of the OLS estimators
Now we resume the discussion of a misspecified OLS regression, e.g., crime on ice cream, by considering the assumptions under which OLS yields unbiased estimates and how misspecifying a regression can produce biased estimates.
- Also: what is the direction of the bias?
There is a population model that is linear in parameters.
Assumption MLR.1 states this model, which represents the true relationship between $y$ and all $x$. I.e.,
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots + \beta_k x_k + u.$$
OLS under assumptions MLR.1-4
The sample of $n$ observations is assumed random and indexed by $i$ (MLR.2).
From simple regression, we know that there must be variation in $x$ for an estimate to exist.
With multiple regression, each regressor must have (at least some) variation that is not explained by the other regressors.
- The other regressors cannot "partial out" all of the variation in, say, $x_1$ if $\beta_1$ is still to be estimated.
- Perfect multicollinearity is ruled out (by Assumption MLR.3).
When you have a set of regressors that violates this assumption, one of them must be removed from the regression.
- Software will usually do this automatically for you.
OLS under assumptions MLR.1-4 (continued)
Finally, multiple regression assumes a less restrictive version of mean independence between regressors and error term.
Assumption MLR.4 states that the error term has zero conditional mean, i.e., conditional on all regressors:
$$E(u \mid x_1, x_2, \ldots, x_k) = 0.$$
This can fail if the estimated regression is misspecified, in terms of the functional form of one of the variables or the omission of a relevant regressor.
The model contains exogenous regressors; the alternative is an endogenous regressor (explanatory variable).
- This is when some $x_j$ is correlated with an omitted variable in the error term.
Unbiasedness
Under the four assumptions above, the OLS estimates are unbiased, i.e.,
$$E(\hat\beta_j) = \beta_j \;\;\forall\, j.$$
This includes cases in which a variable has no effect on $y$, i.e., in which $\beta_j = 0$.
- This is the "overspecified" case in the text, in which an irrelevant variable is included in the regression.
- Though this does not affect the unbiasedness of OLS, it does impact the variance of the estimates, sometimes in undesirable ways we will study later.
Expected value and variance of OLS estimators under MLR assumptions 1-4
Unbiasedness does not apply when you exclude a variable from the population model when estimating it.
- This is called underspecifying the model.
- It occurs when, because of an empiricist's mistake or failure to observe a variable in data, a relevant regressor is excluded from the estimation.
- For the sake of concreteness, consider an example of each.
1. An empiricist regresses crime on ice cream because he has an axe to grind about ice cream or something like that, and he wants to show that it causes crime.
2. An empiricist regresses earnings on education because he cannot observe other determinants of workers' productivity ("ability") in data.
Bias from under-specification
So in both cases there is a population ("true") model that includes two variables,
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u,$$
but the empiricist estimates a version that excludes $x_2$, generating estimates different from unbiased OLS, which are denoted with the "tilde" on the next line:
$$\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1; \quad x_2 \text{ is omitted from the estimation.}$$
So,
$$\tilde\beta_1 = \frac{\sum_{i=1}^{n}\left(x_{i1} - \bar x_1\right)\left(y_i - \bar y\right)}{\sum_{i=1}^{n}\left(x_{i1} - \bar x_1\right)^2} = \frac{\sum_{i=1}^{n}\left(x_{i1} - \bar x_1\right)y_i}{\sum_{i=1}^{n}\left(x_{i1} - \bar x_1\right)^2}.$$
Omitted variable bias
If the true relationship includes $x_2$, however, the bias in this estimate is revealed by substituting the population model for $y_i$:
$$\tilde\beta_1 = \frac{\sum_{i=1}^{n}\left(x_{i1} - \bar x_1\right)\left(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i\right)}{\sum_{i=1}^{n}\left(x_{i1} - \bar x_1\right)^2}.$$
Simplify the numerator:
$$\beta_0\sum_{i=1}^{n}\left(x_{i1} - \bar x_1\right) + \beta_1\sum_{i=1}^{n} x_{i1}\left(x_{i1} - \bar x_1\right) + \beta_2\sum_{i=1}^{n} x_{i2}\left(x_{i1} - \bar x_1\right) + \sum_{i=1}^{n}\left(x_{i1} - \bar x_1\right)u_i.$$
Then take expectations:
$$\Rightarrow\; E(\tilde\beta_1) = \frac{0 + \beta_1\sum_{i=1}^{n}\left(x_{i1} - \bar x_1\right)^2 + \beta_2\,\mathrm{Cov}(x_1, x_2) + 0}{\sum_{i=1}^{n}\left(x_{i1} - \bar x_1\right)^2},$$
since the regressors are uncorrelated with the error from the correctly specified model.
Omitted variable bias (continued)
A little more simplifying gives:
$$\Rightarrow\; E(\tilde\beta_1) = \beta_1 + \beta_2\,\frac{\mathrm{Cov}(x_1, x_2)}{\sum_{i=1}^{n}\left(x_{i1} - \bar x_1\right)^2}.$$
The estimator using the misspecified model is equal to the true parameter ($\beta_1$) plus a second term that captures the bias.
This bias is non-zero unless one of two things is true:
1. $\beta_2$ is zero. There is no bias from excluding an irrelevant variable.
2. $\mathrm{Cov}(x_1, x_2)$ is zero. This means that $x_1$ is independent of the error term: varying it does not induce variation in an omitted variable.
Direction of omitted variable bias
The textbook calls the fraction in the bias term $\delta_1$:
- the regression coefficient you would get if you regressed $x_2$ on $x_1$,
$$\delta_1 \equiv \frac{\mathrm{Cov}(x_1, x_2)}{\mathrm{Var}(x_1)}.$$
This term, along with $\beta_2$, helps predict the direction of the bias.
Specifically, if $\delta_1$ and $\beta_2$ have the same sign (+/-), the bias is positive, and if they have opposite signs, the bias is negative, as the simulation sketch below illustrates.
1. $\delta_1, \beta_2$ same sign $\Rightarrow E(\tilde\beta_1) > \beta_1$ ("upward bias")
2. $\delta_1, \beta_2$ opposite sign $\Rightarrow E(\tilde\beta_1) < \beta_1$ ("downward bias")
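A minimal simulation sketch of this result (all parameter values and the data-generating process are invented for illustration): $x_1$ is truly irrelevant ($\beta_1 = 0$), but because it is positively correlated with $x_2$ ($\delta_1 > 0$) and $\beta_2 > 0$, the simple regression slope is biased upward, toward $\beta_1 + \beta_2\delta_1$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 2000
beta1, beta2 = 0.0, 2.0            # true effects: x1 irrelevant, x2 matters
slopes = []

for _ in range(reps):
    x2 = rng.normal(size=n)
    x1 = 0.7 * x2 + rng.normal(size=n)    # Cov(x1, x2) > 0, so delta1 > 0
    y = 1.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    # Misspecified simple regression of y on x1 only.
    b_tilde = np.cov(x1, y, bias=True)[0, 1] / np.var(x1)
    slopes.append(b_tilde)

delta1 = 0.7 / (0.7 ** 2 + 1.0)            # population Cov(x1, x2) / Var(x1)
print(np.mean(slopes))                     # well above beta1 = 0: upward bias
print(beta1 + beta2 * delta1)              # the predicted (biased) expectation
```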
Omitted variable bias (concluded)
One more way to think about bias: the ceteris paribus effect of $x_1$ on $y$.
$\tilde\beta_1$ estimates this effect, but it does so by lumping the true effect in with an unknown quantity of misleading effects. E.g., you take the population model,
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u, \quad\text{and estimate}\quad y = \beta_0 + \beta_1 x_1 + v; \;\; v \equiv \beta_2 x_2 + u.$$
So,
$$\frac{\partial E(y \mid x)}{\partial x_1} = \beta_1 + \frac{\partial v}{\partial x_1} = \beta_1 + \beta_2\,\frac{\partial E(x_2 \mid x_1)}{\partial x_1}.$$
The partial derivative of $x_2$ with respect to $x_1$ is the output you would get if you estimated a regression of $x_2$ on $x_1$, i.e., $\delta_1$.
Standard error of the OLS estimator
Why is the variance of an estimator ("standard error") so important?
- Inference: next chapter.
Generating and reporting only point estimates is irresponsible.
- It leads your (especially an uninitiated) audience to an overly specific conclusion about your results.
- This is bad for you (the empiricist) and for them (policy makers, customers, other scholars) because they may base decisions on your findings, e.g., choosing a marginal income tax rate depends crucially on labor supply elasticity.
A point estimate outside the context of its standard error may prompt your audience to rash decisions that they would not make if they knew your estimates were not "pinpoint" accurate.
- This is bad for you because if they do the "wrong" thing with your results, they will blame you for the policy's failure.
- And empiricists, as a group, will have their credibility diminished a little, as well.
Variance of OLS estimators
Here we show that the standard error of the $j$-th estimate, $\hat\beta_j$, is:
$$se(\hat\beta_j) \equiv \left[\mathrm{Var}(\hat\beta_j)\right]^{1/2} = \left[\frac{\sum_{i=1}^{n}\hat u_i^2}{(n - k - 1)\left(1 - R_j^2\right)\sum_{i=1}^{n}\left(x_{ij} - \bar x_j\right)^2}\right]^{1/2}, \quad\text{where}$$
- $\hat u_i$ are the residuals from the regression,
- $k$ is the number of regressors, and
- $R_j^2$ is the proportion of $x_j$'s variance explained by the other $x$ variables.
Once again the estimate of the variance assumes homoskedasticity, i.e.,
$$\mathrm{Var}(u \mid \text{all } x_j) = \mathrm{Var}(u) = \sigma^2.$$
Variance of OLS estimators (continued)
Most of the components of the variance are familiar from simple regression.
- The subscripts ($j$) and the degrees of freedom ($n - k - 1$ instead of $n - 1 - 1$) have been generalized for $k$ regressors.
The term $(1 - R_j^2)$ is new, reflecting the consequence of (imperfect) multicollinearity.
Perfect multicollinearity has already been ruled out by Assumption MLR.3, but the standard error of a multicollinear variable ($R_j^2 \to 1$) is very large because the denominator of the expression above will be very small ($(1 - R_j^2) \to 0$).
Variance of OLS estimators (continued)
Once again the variance of the errors is estimated without bias, i.e., $E(\hat\sigma^2) = \sigma^2$, by the (degrees-of-freedom-adjusted) mean squared error:
$$SSR \equiv \sum_{i=1}^{n}\hat u_i^2; \quad \hat\sigma^2 = \frac{SSR}{n - k - 1}, \quad\text{so}\quad \mathrm{Var}(\hat\beta_j) = \frac{\hat\sigma^2}{\left(1 - R_j^2\right)\sum_{i=1}^{n}\left(x_{ij} - \bar x_j\right)^2},$$
where the residuals are $\hat u_i \equiv y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \ldots - \hat\beta_k x_{ik}$.
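A short numpy sketch (simulated data again; the seed and coefficients are arbitrary) that assembles this standard error from its pieces, $\hat\sigma^2$, $R_j^2$, and the total variation in $x_j$:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 82, 2
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 0.0 * x1 + 2.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
u_hat = y - X @ beta_hat

sigma2_hat = (u_hat @ u_hat) / (n - k - 1)      # SSR / (n - k - 1)

# R_1^2 comes from regressing x1 on the other regressor(s) plus a constant.
Z = np.column_stack([np.ones(n), x2])
x1_fit = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
SST_1 = np.sum((x1 - x1.mean()) ** 2)
R2_1 = 1 - np.sum((x1 - x1_fit) ** 2) / SST_1

se_b1 = np.sqrt(sigma2_hat / ((1 - R2_1) * SST_1))
print(se_b1)    # matches the standard error a regression package would report
```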
Variance of OLS estimators (continued)
Observations about the standard error:
- When adding a (relevant) regressor to a model, the standard error of an existing regressor can increase or decrease.
- Generally the sum of squared residuals decreases, since more information is being used to fit the model. But the degrees of freedom decrease as well.
- The standard error depends on Assumption MLR.5 (homoskedasticity).
- If it is violated, $\hat\beta_j$ is still unbiased; however, the above expression becomes a biased estimate of the variance of $\hat\beta_j$.
- The 8th chapter in the text is devoted to dealing with the problem of heteroskedasticity in regression.
Gauss-Markov Theorem and BLUE
If, however, Assumptions MLR.1-5 hold, the OLS estimators ($\hat\beta_0$ through $\hat\beta_k$) have minimal variance among the class of linear unbiased estimators,
- i.e., OLS is the "best" in the sense of minimal variance.
OLS is "BLUE", which stands for Best Linear Unbiased Estimator.
The 5 assumptions that lead to this conclusion (MLR.1-5) are collectively known as the Gauss-Markov assumptions.
The BLUE-ness of the OLS estimators under those assumptions is known as the Gauss-Markov Theorem.
Conclusion
Multiple regression solves the problem of omitted variables that are correlated with both the dependent variable and a regressor.
It is estimated in a manner analogous to simple OLS and has similar properties:
- Unbiasedness under MLR.1-4,
- BLUE under MLR.1-5.
Its estimates are interpreted as partial effects of changing one regressor while holding the others constant.
The estimates have standard errors that can be used to generate confidence intervals and test hypotheses about the parameters' values.
Data on crime and ice cream
The unit of observation is daily, i.e., these are 82 days on which ice cream search volume and crime reports have been counted in the City of Milwaukee.
All the variables have had the "day of the week"-specific mean subtracted from them, because all of them fluctuate across the calendar week, and this avoids creating another spurious correlation through the days of the week.
Back.
OLS estimates using matrices
Understanding what software is doing when it puts out regression estimates is one reason for the following exercise.
Another reason is to demonstrate the treatment of OLS you will experience in graduate-level econometrics classes, one in which everything is espoused using matrix algebra.
Imagine the variables in your regression as a spreadsheet, i.e., with $y$ in column 1, $x_1$ in column 2, and each row a different observation.
With this image in mind, divide the spreadsheet into two matrices,* on which operations may be performed as they are on numbers ("scalars" in matrix jargon).
*"A rectangular array of numbers enclosed in parentheses. It is conventionally denoted by a capital letter."

OLS estimates using matrices (continued)
The dimensions of a matrix are expressed: rows × columns.
- I.e., $n$ observations (rows) of a variable $y$ would be an $n \times 1$ matrix ("Y"); $n$ observations (rows) of $k$ explanatory variables $x$ would be an $n \times k$ matrix ("X").
One operation that can be performed on a matrix is called transposition, denoted by an apostrophe behind the matrix's name.
- Transposing a matrix just means exchanging its rows and columns. E.g.,
$$X \equiv \begin{pmatrix} 5 & 2 & 1 \\ 3 & 0 & 1 \end{pmatrix} \;\Rightarrow\; X' = \begin{pmatrix} 5 & 3 \\ 2 & 0 \\ 1 & 1 \end{pmatrix}.$$
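The same operation in numpy, for reference (a toy sketch using the matrix from the slide):

```python
import numpy as np

# The 2 x 3 matrix from the slide and its transpose.
X = np.array([[5, 2, 1],
              [3, 0, 1]])
print(X.shape)    # (2, 3)
print(X.T)        # [[5 3], [2 0], [1 1]]
print(X.T.shape)  # (3, 2)
```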
Matrix multiplication
Another operation that can be performed on some pairs of matrices is multiplication, but only if they are conformable.
"Two matrices A and B of dimensions $m \times n$ and $n \times p$ respectively are conformable to form the product matrix AB, since the number of columns in A is equal to the number of rows in B. The product matrix AB is of dimension $m \times p$, and its $(i, j)$-th element, $ab_{ij}$, is obtained by multiplying the elements of the $i$-th row of A by the corresponding elements of the $j$-th column of B and adding the resulting products."
OLS estimates using matrices (continued)
Transposition and multiplication are useful in econometrics for obtaining a matrix of covariances between $y$ and all the $x$, as well as among the $x$ variables.
Specifically, when you multiply Y by the transpose of X, you get a ($k \times 1$) matrix with elements
$$X'Y = \begin{pmatrix} \sum_{i=1}^{n} x_{i1}y_i \\ \sum_{i=1}^{n} x_{i2}y_i \\ \vdots \\ \sum_{i=1}^{n} x_{ik}y_i \end{pmatrix} = \begin{pmatrix} s_{y1} \\ s_{y2} \\ \vdots \\ s_{yk} \end{pmatrix};$$
x and y are expressed as deviations from their means.
OLS estimates using matrices (continued)
Similarly, multiplying X by its own transpose gives you a ($k \times k$) matrix with elements
$$X'X = \begin{pmatrix} \sum_{i=1}^{n} x_{i1}^2 & \cdots & \sum_{i=1}^{n} x_{i1}x_{ik} \\ \vdots & \ddots & \vdots \\ \sum_{i=1}^{n} x_{i1}x_{ik} & \cdots & \sum_{i=1}^{n} x_{ik}^2 \end{pmatrix} = \begin{pmatrix} s_{11} & \cdots & s_{1k} \\ \vdots & \ddots & \vdots \\ s_{k1} & \cdots & s_{kk} \end{pmatrix}.$$
OLS estimates using matrices (continued)
X'X and X'Y are matrices of the variances of the explanatory variables and covariances among all the variables. They can be combined to obtain the ($k \times 1$) matrix of coefficient estimates.
- This is the part for which computers are really indispensable.
First of all, you invert the matrix X'X. This means solving for the values that make the following hold:
$$(X'X)(X'X)^{-1} = I \;\Rightarrow\; \begin{pmatrix} s_{11} & \cdots & s_{1k} \\ \vdots & \ddots & \vdots \\ s_{k1} & \cdots & s_{kk} \end{pmatrix}\begin{pmatrix} a_{11} & \cdots & a_{1k} \\ \vdots & \ddots & \vdots \\ a_{k1} & \cdots & a_{kk} \end{pmatrix} = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix}.$$
OLS estimates using matrices (continued)
To illustrate using the case of $k = 2$, you have to solve for all the "a" terms using simultaneous equations.
$$\begin{pmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{pmatrix}\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad\text{so}$$
$$a_{11}s_{11} + a_{21}s_{12} = 1, \qquad a_{11}s_{21} + a_{21}s_{22} = 0,$$
$$a_{12}s_{11} + a_{22}s_{12} = 0, \quad\text{and}\quad a_{12}s_{21} + a_{22}s_{22} = 1.$$
OLS estimates using matrices (continued)
Solving the first two simultaneously gives you:
$$a_{11} = \frac{s_{22}}{s_{11}s_{22} - s_{12}s_{21}} \quad\text{and}\quad a_{21} = \frac{s_{21}}{s_{12}s_{21} - s_{11}s_{22}}.$$
Similarly, the last two give you:
$$a_{12} = \frac{s_{12}}{s_{12}s_{21} - s_{11}s_{22}} \quad\text{and}\quad a_{22} = \frac{s_{11}}{s_{11}s_{22} - s_{12}s_{21}}.$$
To get the coefficient estimates, you do the matrix equivalent of dividing covariance by variance, i.e., you solve for the vector B:
$$B = (X'X)^{-1}(X'Y).$$
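In code this is a one-liner. The sketch below (simulated, demeaned data, so no constant column is needed, as in the slides) computes $B = (X'X)^{-1}(X'Y)$ directly and with the numerically preferred solver:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x1 = rng.normal(size=n)
x2 = 0.4 * x1 + rng.normal(size=n)
y = 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Demean the variables, as the slides do, so the intercept drops out.
X = np.column_stack([x1 - x1.mean(), x2 - x2.mean()])
Y = y - y.mean()

B = np.linalg.inv(X.T @ X) @ (X.T @ Y)        # B = (X'X)^{-1} X'Y
print(B)                                       # slope estimates on x1 and x2

# Solving the linear system directly is the numerically preferred equivalent.
print(np.linalg.solve(X.T @ X, X.T @ Y))
```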
OLS estimates using matrices (continued)
For the $k = 2$ case,
$$B = \begin{pmatrix} \dfrac{s_{22}}{s_{11}s_{22} - s_{12}s_{21}} & \dfrac{s_{12}}{s_{12}s_{21} - s_{11}s_{22}} \\[2ex] \dfrac{s_{21}}{s_{12}s_{21} - s_{11}s_{22}} & \dfrac{s_{11}}{s_{11}s_{22} - s_{12}s_{21}} \end{pmatrix}\begin{pmatrix} s_{y1} \\ s_{y2} \end{pmatrix}.$$
Multiplying this out gives the same solution as minimizing the sum of squared residuals:
$$B = \begin{pmatrix} \dfrac{s_{22}s_{y1} - s_{12}s_{y2}}{s_{11}s_{22} - s_{12}s_{21}} \\[2ex] \dfrac{s_{11}s_{y2} - s_{21}s_{y1}}{s_{11}s_{22} - s_{12}s_{21}} \end{pmatrix} = \begin{pmatrix} \dfrac{\mathrm{Var}(x_2)\,\mathrm{Cov}(x_1, y) - \mathrm{Cov}(x_1, x_2)\,\mathrm{Cov}(x_2, y)}{\mathrm{Var}(x_1)\,\mathrm{Var}(x_2) - \mathrm{Cov}(x_1, x_2)^2} \\[2ex] \dfrac{\mathrm{Var}(x_1)\,\mathrm{Cov}(x_2, y) - \mathrm{Cov}(x_1, x_2)\,\mathrm{Cov}(x_1, y)}{\mathrm{Var}(x_1)\,\mathrm{Var}(x_2) - \mathrm{Cov}(x_1, x_2)^2} \end{pmatrix}.$$
OLS estimates using matrices (concluded)
Solving OLS for $k > 2$ is not fundamentally different, and the solution will always be a function of the variances and covariances of the variables in the model.
- It just gets more difficult (at an accelerating rate) to write down solutions for the individual coefficients.
- It is left as an exercise to show that the "partialling out" interpretation of OLS holds:
$$B = (X'X)^{-1}(X'Y) = \begin{pmatrix} \dfrac{\sum_{i=1}^{n}\hat r_{i1}y_i}{\sum_{i=1}^{n}\hat r_{i1}^2} \\ \vdots \\ \dfrac{\sum_{i=1}^{n}\hat r_{ik}y_i}{\sum_{i=1}^{n}\hat r_{ik}^2} \end{pmatrix},$$
in which $\hat r_{ij}$ is the residual for observation $i$ from a regression of $x_j$ on all the other regressors.
- $\hat r_{i1} \equiv x_{i1} - \hat x_{i1}$; $\hat x_{i1}$ is the fitted value.
Back.
Partialling out interpretation of OLS
The expression for $\hat\beta_1$ can be derived from the first order condition for $x_1$ from the least squares minimization.
This states that:
$$\frac{\partial SSR}{\partial \hat\beta_1} = -2\sum_{i=1}^{n} x_{i1}\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \ldots - \hat\beta_k x_{ik}\right) = 0.$$
Substituting in the definition of the residuals from the previous line ($x_{i1} = \hat x_{i1} + \hat r_{i1}$), you have:
$$\sum_{i=1}^{n}\left(\hat x_{i1} + \hat r_{i1}\right)\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \ldots - \hat\beta_k x_{ik}\right) = 0.$$
Partialling out interpretation of OLS (continued)
$$\Rightarrow\; \sum_{i=1}^{n}\hat x_{i1}\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \ldots - \hat\beta_k x_{ik}\right) + \sum_{i=1}^{n}\hat r_{i1}\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \ldots - \hat\beta_k x_{ik}\right) = 0.$$
The term in brackets is the OLS residual from regressing $y$ on $x_1 \ldots x_k$, "$\hat u_i$", and $\hat x_1$ is a linear function of the other $x$ variables, i.e.,
$$\hat x_{i1} = \hat\gamma_0 + \hat\gamma_1 x_{i2} + \ldots + \hat\gamma_k x_{ik},$$
. . .
Partialling out interpretation of OLS (continued)
. . . so one may write:
$$\sum_{i=1}^{n}\hat x_{i1}\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \ldots - \hat\beta_k x_{ik}\right) = \sum_{i=1}^{n}\left(\hat\gamma_0 + \hat\gamma_1 x_{i2} + \ldots + \hat\gamma_k x_{ik}\right)\hat u_i.$$
Since the OLS residuals are uncorrelated with each $x_j$, and the sum of the residuals is zero, this entire line equals zero and drops out.
Partialling out interpretation of OLS (continued)
Then you have:
$$\sum_{i=1}^{n}\hat r_{i1}\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \ldots - \hat\beta_k x_{ik}\right) = 0,$$
and the residuals ($\hat r_1$) are uncorrelated with the other regressors (2 through $k$) and sum to zero as well. So,
$$\sum_{i=1}^{n}\hat r_{i1}y_i = \sum_{i=1}^{n}\hat\beta_1\left(\hat r_{i1}^2 + \hat r_{i1}\hat x_{i1}\right) \;\Rightarrow\; \sum_{i=1}^{n}\hat r_{i1}y_i = \hat\beta_1\sum_{i=1}^{n}\hat r_{i1}^2.$$
Partialling out interpretation of OLS (concluded)
Then you get the "partialling out" expression:
$$\Rightarrow\; \hat\beta_1 = \frac{\sum_{i=1}^{n}\hat r_{i1}y_i}{\sum_{i=1}^{n}\hat r_{i1}^2}.$$
Back.