TRANSCRIPT
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE
This sequence provides a geometrical interpretation of a multiple regression model with two explanatory variables.
EARNINGS = b1 + b2S + b3EXP + u
[Figure: three-dimensional diagram with axes EARNINGS, S, and EXP; the intercept b1 is marked on the EARNINGS axis]
Specifically, we will look at an earnings function model where hourly earnings, EARNINGS, depend on years of schooling (highest grade completed), S, and years of work experience, EXP.
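The way such a model generates a sample can be sketched in a few lines of Python. The coefficient values, variable ranges, and disturbance spread below are illustrative assumptions only, not figures taken from the text:

```python
import numpy as np

# Hypothetical parameter values and sample ranges, chosen for illustration
# only -- the true coefficients are never observed in practice.
rng = np.random.default_rng(0)
n = 540                          # same sample size as the regression output below
b1, b2, b3 = -26.49, 2.68, 0.56  # illustrative coefficient values

S = rng.integers(6, 21, n)       # years of schooling (sample minimum is 6)
EXP = rng.integers(0, 41, n)     # years of work experience
u = rng.normal(0.0, 12.91, n)    # disturbance term

# Each observation is generated by the model equation
EARNINGS = b1 + b2 * S + b3 * EXP + u
print(EARNINGS[:3])
```

Each simulated observation is the sum of the nonstochastic plane value and a random disturbance, exactly as the diagrams that follow depict.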
The model has three dimensions, one each for EARNINGS, S, and EXP. The starting point for investigating the determination of EARNINGS is the intercept, b1.
Taken literally, the intercept gives EARNINGS for those respondents who have no schooling and no work experience. However, there were no respondents with fewer than 6 years of schooling, so a literal interpretation of b1 would be unwise.
[Figure: pure S effect — the line EARNINGS = b1 + b2S]
The next term on the right side of the equation gives the effect of variations in S. A one-year increase in S causes EARNINGS to increase by b2 dollars, holding EXP constant.
[Figure: pure EXP effect — the line EARNINGS = b1 + b3EXP]
Similarly, the third term gives the effect of variations in EXP. A one-year increase in EXP causes EARNINGS to increase by b3 dollars, holding S constant.
[Figure: combined effect of S and EXP — the plane EARNINGS = b1 + b2S + b3EXP]
Different combinations of S and EXP give rise to values of EARNINGS which lie on the plane shown in the diagram, defined by the equation EARNINGS = b1 + b2S + b3EXP. This is the nonstochastic (nonrandom) component of the model.
The final element of the model is the disturbance term, u. This causes the actual values of EARNINGS to deviate from the plane. In this observation, u happens to have a positive value.
[Figure: an observation with a positive disturbance u, lying above the plane at EARNINGS = b1 + b2S + b3EXP + u]
A sample consists of a number of observations generated in this way. Note that the interpretation of the model does not depend on whether S and EXP are correlated or not.
However, we do assume that the effects of S and EXP on EARNINGS are additive: the impact of a difference in S on EARNINGS is not affected by the value of EXP, and vice versa.
The regression coefficients are derived using the same least squares principle used in simple regression analysis. The fitted value of Y in observation i depends on our choice of b1, b2, and b3.
True model:    Y = β1 + β2X2 + β3X3 + u
Fitted model:  Ŷ = b1 + b2X2 + b3X3
The residual ei in observation i is the difference between the actual and fitted values of Y.
ei = Yi − Ŷi = Yi − b1 − b2X2i − b3X3i
We define RSS, the sum of the squares of the residuals, and choose b1, b2, and b3 so as to minimize it.
RSS = Σei² = Σ(Yi − b1 − b2X2i − b3X3i)²
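As a sketch, RSS can be computed directly for any trial choice of b1, b2, and b3. The data below are made up for illustration and do not come from the text:

```python
import numpy as np

# Made-up data for illustration (not from the text)
Y  = np.array([10.0, 15.0, 22.0, 18.0])
X2 = np.array([ 8.0, 12.0, 16.0, 12.0])
X3 = np.array([ 2.0,  5.0, 10.0,  8.0])

def rss(b1, b2, b3):
    """Sum of squared residuals e_i = Y_i - b1 - b2*X2_i - b3*X3_i."""
    e = Y - b1 - b2 * X2 - b3 * X3
    return float(np.sum(e ** 2))

# RSS for one trial choice of coefficients; the least squares estimates
# are the choice that makes this value as small as possible.
print(rss(0.0, 1.0, 1.0))  # 24.0
```

The least squares problem is to minimize this function over all possible (b1, b2, b3).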
First, we expand RSS as shown.
RSS = Σ(Yi − b1 − b2X2i − b3X3i)²
    = Σ(Yi² + b1² + b2²X2i² + b3²X3i² − 2b1Yi − 2b2X2iYi − 2b3X3iYi + 2b1b2X2i + 2b1b3X3i + 2b2b3X2iX3i)
    = ΣYi² + nb1² + b2²ΣX2i² + b3²ΣX3i² − 2b1ΣYi − 2b2ΣX2iYi − 2b3ΣX3iYi + 2b1b2ΣX2i + 2b1b3ΣX3i + 2b2b3ΣX2iX3i
Then we use the first order conditions for minimizing it.
∂RSS/∂b1 = 0,   ∂RSS/∂b2 = 0,   ∂RSS/∂b3 = 0
b1 = Ȳ − b2X̄2 − b3X̄3
We thus obtain three equations in three unknowns. Solving for b1, b2, and b3, we obtain the expressions shown above. (The expression for b3 is the same as that for b2, with the subscripts 2 and 3 interchanged everywhere.)
b2 = [Σ(X2i − X̄2)(Yi − Ȳ) · Σ(X3i − X̄3)² − Σ(X3i − X̄3)(Yi − Ȳ) · Σ(X2i − X̄2)(X3i − X̄3)]
     / [Σ(X2i − X̄2)² · Σ(X3i − X̄3)² − (Σ(X2i − X̄2)(X3i − X̄3))²]
The expression for b1 is a straightforward extension of the expression for it in simple regression analysis.
However, the expressions for the slope coefficients are considerably more complex than that for the slope coefficient in simple regression analysis.
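The expressions can nevertheless be coded directly. This sketch, using made-up data, implements the slope formulas in deviation form (b3 is b2 with subscripts 2 and 3 interchanged) and checks them against numpy's general least squares solver:

```python
import numpy as np

# Made-up data for illustration (not from the text)
Y  = np.array([10.0, 15.0, 22.0, 18.0, 25.0])
X2 = np.array([ 8.0, 12.0, 16.0, 12.0, 18.0])
X3 = np.array([ 2.0,  5.0, 10.0,  8.0,  9.0])

# Deviations from the sample means
d2, d3, dY = X2 - X2.mean(), X3 - X3.mean(), Y - Y.mean()

den = (d2 @ d2) * (d3 @ d3) - (d2 @ d3) ** 2   # common denominator
b2 = ((d2 @ dY) * (d3 @ d3) - (d3 @ dY) * (d2 @ d3)) / den
b3 = ((d3 @ dY) * (d2 @ d2) - (d2 @ dY) * (d2 @ d3)) / den  # subscripts swapped
b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()

# Cross-check against numpy's least squares solver
X = np.column_stack([np.ones_like(X2), X2, X3])
check = np.linalg.lstsq(X, Y, rcond=None)[0]
assert np.allclose([b1, b2, b3], check)
print(b1, b2, b3)
```

The denominator is the same for both slope coefficients; it is zero only if X2 and X3 are perfectly correlated, in which case the coefficients cannot be determined.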
For the general case when there are many explanatory variables, ordinary algebra is inadequate. It is necessary to switch to matrix algebra.
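In matrix notation the least squares estimator is b = (X'X)⁻¹X'y, which handles any number of explanatory variables at once. A minimal sketch with made-up data:

```python
import numpy as np

# Made-up data for illustration (not from the text)
y = np.array([10.0, 15.0, 22.0, 18.0, 25.0])
X = np.column_stack([
    np.ones(5),                      # column of ones for the intercept
    [8.0, 12.0, 16.0, 12.0, 18.0],   # X2
    [2.0,  5.0, 10.0,  8.0,  9.0],   # X3
])

# b = (X'X)^(-1) X'y, computed with solve() rather than an explicit inverse
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)
```

With two regressors this reproduces exactly the scalar formulas above, but the matrix form scales to any number of explanatory variables.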
. reg EARNINGS S EXP

      Source |       SS       df       MS              Number of obs =     540
-------------+------------------------------           F(  2,   537) =   67.54
       Model |  22513.6473     2  11256.8237           Prob > F      =  0.0000
    Residual |  89496.5838   537  166.660305           R-squared     =  0.2010
-------------+------------------------------           Adj R-squared =  0.1980
       Total |  112010.231   539  207.811189           Root MSE      =   12.91

------------------------------------------------------------------------------
    EARNINGS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           S |   2.678125   .2336497    11.46   0.000     2.219146    3.137105
         EXP |   .5624326   .1285136     4.38   0.000     .3099816    .8148837
       _cons |  -26.48501    4.27251    -6.20   0.000    -34.87789   -18.09213
------------------------------------------------------------------------------
Here is the regression output for the earnings function using Data Set 21.
EARNINGS-hat = −26.49 + 2.68S + 0.56EXP
It indicates that earnings increase by $2.68 for every extra year of schooling and by $0.56 for every extra year of work experience.
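For instance, the fitted equation gives the following predicted hourly earnings for a hypothetical individual with 12 years of schooling and 5 years of work experience (the values of S and EXP are chosen purely for illustration):

```python
# Coefficients taken from the fitted equation above
b1, b2, b3 = -26.49, 2.68, 0.56

S, EXP = 12, 5                   # hypothetical values for illustration
fitted = b1 + b2 * S + b3 * EXP  # predicted hourly earnings
print(round(fitted, 2))          # 8.47
```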
Literally, the intercept indicates that an individual who had no schooling or work experience would have hourly earnings of –$26.49.
Obviously, this is impossible. The lowest value of S in the sample was 6. We have obtained a nonsense estimate because we have extrapolated too far from the data range.
Copyright Christopher Dougherty 2012.

These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author.

The content of this slideshow comes from Section 3.2 of C. Dougherty, Introduction to Econometrics, fourth edition, 2011, Oxford University Press. Additional (free) resources for both students and instructors may be downloaded from the OUP Online Resource Centre: http://www.oup.com/uk/orc/bin/9780199567089/.

Individuals studying econometrics on their own who feel that they might benefit from participation in a formal course should consider the London School of Economics summer school course EC212 Introduction to Econometrics (http://www2.lse.ac.uk/study/summerSchools/summerSchool/Home.aspx) or the University of London International Programmes distance learning course EC2020 Elements of Econometrics (www.londoninternational.ac.uk/lse).

2012.10.28