lecture 7: review - columbia universityso33/susdev/lecture_7.pdf · 2005-07-29 · ex 1029: wage...

26
Lecture 7: Review Prof. Sharyn O’Halloran Sustainable Development U9611 Econometrics II

Upload: others

Post on 30-May-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

Lecture 7: Review

Prof. Sharyn O’Halloran Sustainable Development U9611Econometrics II

Page 2: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

Ex 1029: Wage and Race

The dataset provided is designed to explore the relationship between wage and race (black_indicator), controlling for the region in the US, education, experience and weather they worked in a standard metropolitan statistical area.Model to be tested:

uindblackregionindsmsaeducerlwage

++++++=

__exp

54

3210

ββββββ

Page 3: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

Creating Dummy Variables and Interactive Terms

We proceed by recoding region into 4 dummies:We rewrite our model including interaction terms as follows:

ublackregSEblackregNEblackregMWindblack

regSregNEregMWindsmsaeducerlwage

+++++

++++++=

109

87

654

3210

_

_exp

ββββ

βββββββ

Page 4: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

Hypotheses

We expect positive coefficients for:EducationExperience and SMSA

We expect a negative coefficient on:black-indicator

Page 5: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

Results of Tentative Modelreg lwage exper educ smsa_ind regMW regNE regS black_ind blackregMW blackregNE blackregS

Source | SS df MS Number of obs = 25631-------------+------------------------------ F( 10, 25620) = 1010.01

Model | 2852.13293 10 285.213293 Prob > F = 0.0000Residual | 7234.7208 25620 .282385667 R-squared = 0.2828

-------------+------------------------------ Adj R-squared = 0.2825Total | 10086.8537 25630 .393556525 Root MSE = .5314

------------------------------------------------------------------------------lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------exper | .0183495 .0002789 65.79 0.000 .0178028 .0188961educ | .0969922 .0012015 80.72 0.000 .0946371 .0993473

smsa_ind | .1575999 .0077298 20.39 0.000 .1424491 .1727507regMW | .0034929 .0100704 0.35 0.729 -.0162456 .0232315regNE | .0382672 .0102318 3.74 0.000 .0182123 .0583221regS | -.0571932 .0097345 -5.88 0.000 -.0762735 -.038113

black_ind | -.1937687 .040989 -4.73 0.000 -.2741094 -.113428blackregMW | -.0468973 .051017 -0.92 0.358 -.1468935 .053099blackregNE | -.0242864 .0508993 -0.48 0.633 -.1240519 .0754792blackregS | -.0435901 .0443259 -0.98 0.325 -.1304714 .0432912

_cons | 4.573932 .0196774 232.45 0.000 4.535363 4.612501------------------------------------------------------------------------------

Fail to reject the null hypothesis that βi=0 in favor of the alternatives that βi≠0.

Page 6: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

Residual Plots

No obvious pattern

-4-2

02

4Res

idua

ls

5 5.5 6 6.5 7 7.5Fitted values

Residuals versus predicted values plot

Page 7: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

F-test Test the joint significance of the interactive termsCommand:

test blackregMW blackregNE blackregS( 1) blackregMW = 0( 2) blackregNE = 0( 3) blackregS = 0

F( 3, 25620) = 0.42Prob > F = 0.7408

Results:Variables not jointly significantRemove from model

Page 8: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

Re-run Resultsreg lwage exper educ smsa_ind regMW regNE regS black_ind

Source | SS df MS Number of obs = 25631-------------+------------------------------ F( 7, 25623) = 1442.80

Model | 2851.77965 7 407.397093 Prob > F = 0.0000Residual | 7235.07408 25623 .282366393 R-squared = 0.2827

-------------+------------------------------ Adj R-squared = 0.2825Total | 10086.8537 25630 .393556525 Root MSE = .53138

------------------------------------------------------------------------------lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------exper | .0183511 .0002789 65.81 0.000 .0178046 .0188977educ | .0970151 .0012011 80.77 0.000 .0946609 .0993693

smsa_ind | .1578088 .0077105 20.47 0.000 .1426959 .1729218regMW | .0017984 .0098616 0.18 0.855 -.0175308 .0211276regNE | .0377502 .0100117 3.77 0.000 .0181268 .0573737regS | -.0593619 .0094407 -6.29 0.000 -.0778662 -.0408576

black_ind | -.230438 .012657 -18.21 0.000 -.2552465 -.2056296_cons | 4.574619 .0196608 232.68 0.000 4.536083 4.613155

------------------------------------------------------------------------------

After removing interactive terms, black indicator variableremains significant

Page 9: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

Interpretation

Keeping all else constant:This is a log-level problem; we’re regressing the log of y on the level of xSo we use the formula: %∆y=(100β)∆x

In this case, b=-.23, so a 1-unit change in x causes a 23% decrease in y.That is, black workers on average have wages 23% lower than non-black workers.

Also note salary differentials by region

Page 10: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

Ex 1123: Air Pollution and Mortality

The dataset provided is designed to explore the relationship between mortality rate and concentrations in dangerous pollutants such as nitrogen oxides and sulfur dioxide.The model we would like to study is the following:

uwhitenoneducationionprecipitatSONOmortality x

+−+++++=

54

32210 )log()log(ββ

ββββ

Page 11: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

Transforming VariablesIt makes sense to log the independent variables for NOx and SO2

800

900

1000

1100

Tota

l age

adj

uste

d m

orta

lity

0 100 200 300NOX

Fitted values MORT

Mortality for all causes is measured as deaths per 100,000 populationScatter plot and fitted line

Scatterplot with NOx

Page 12: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

Transforming VariablesIt makes sense to log the independent variables for Nox and SO2

Scatterplot with log of Nox

800

900

1000

1100

Tota

l age

adj

uste

d m

orta

lity

0 2 4 6lnox

Fitted values MORT

Mortality for all causes is measured as deaths per 100,000 populationScatter plot and fitted line

Page 13: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

Hypotheses

We expect positive coefficients on:log(NOx) log(SO2) Precipitation (due to acid rain) Non-white population

We expect a negative coefficient on:Education

Page 14: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

Results from Tentative Modelreg mort lnox lso2 precip educ nonwhite

Source | SS df MS Number of obs = 60-------------+------------------------------ F( 5, 54) = 23.85

Model | 157116.254 5 31423.2507 Prob > F = 0.0000Residual | 71159.1703 54 1317.76241 R-squared = 0.6883

-------------+------------------------------ Adj R-squared = 0.6594Total | 228275.424 59 3869.07498 Root MSE = 36.301

------------------------------------------------------------------------------mort | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------lnox | 6.716442 7.399021 0.91 0.368 -8.117702 21.55059lso2 | 11.35782 5.295537 2.14 0.036 .7409073 21.97473

precip | 1.946748 .7007028 2.78 0.008 .5419234 3.351573educ | -14.66453 6.937913 -2.11 0.039 -28.57421 -.7548551

nonwhite | 3.028928 .6685249 4.53 0.000 1.688616 4.36924_cons | 940.6586 94.05514 10.00 0.000 752.0894 1129.228

All signs as expectedCoefficient on NOx is insignificant, however

Page 15: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

Checking Case Influence Statistics

New Orleans has a high Cook’s Distance

San Jose, CAWichita, KS San Diego, CA

Lancaster, PA

Minneapolis, MNDallas, TX

Miami, FLLos Angeles, CA

Grand Rapids, MIDenver, CORochester, NYHartford, CTFort Worth, TX Portland, ORWorcester, MASeattle, WABridgeport, CTSpringfield, MA San Francisco, CAYork, PA

Utica, NY Canton, OHKansas City, MO Akron, OHNew Haven, CT Milwaukee, WIBoston, MADayton, OHProvidence, RIFlint, MI Reading, PASyracuse, NYHouston, TX Saint Louis, MOYoungstown, OHColumbus, OH Detroit, MINashville, TNAllentown, PA Washington, DCIndianapolis, IN Cincinnati, OHGreeensboro, NCToledo, OHAtlanta, GA Cleveland, OHLouisville, KYPittsburgh, PANew York, NYAlbany, NYBuffalo, NYWilmington, DEMemphis, TNPhiladelphia, PAChattanooga, TN Chicago, ILRichmond, VABirmingham, AL

Baltimore, MD

New Orleans, LA

0.5

11.

52

Coo

k's

D

0 2 4 6lnox

Cooks'D by city

Page 16: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

Checking for Problems

It’s also an outlier in the avplot for NOx…

Denver, CORichmond, VA

Dallas, TX

Fort Worth, TXFlint, MIDayton, OH

Rochester, NYGrand Rapids, MI

Indianapolis, IN

Utica, NYGreeensboro, NC

Toledo, OH

Atlanta, GASaint Louis, MO

Syracuse, NY

Washington, DCHartford, CT

Nashville, TN

Wichita, KS

Seattle, WA

Minneapolis, MN

Kansas City, MO

Milwaukee, WI

Springfield, MA

New Haven, CTLouisville, KY

Detroit, MICanton, OHAkron, OH

Memphis, TN

Providence, RICleveland, OHColumbus, OH

Albany, NY

Baltimore, MD

Youngstown, OHChattanooga, TNReading, PA

Wilmington, DE

Buffalo, NY

York, PA

Chicago, IL

Cincinnati, OH

Philadelphia, PA

Birmingham, AL

Worcester, MA

Allentown, PA

Lancaster, PA

Miami, FL

New York, NYPittsburgh, PA

Portland, ORBridgeport, CTSan Diego, CA

Houston, TX

San Francisco, CA

Boston, MA

Los Angeles, CASan Jose, CA

New Orleans, LA

-100

-50

050

100

e( m

ort |

X )

-1 0 1 2 3e( lnox | X )

coef = 6.7164423, se = 7.3990212, t = .91

Page 17: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

Checking for Problems

And for SO2 as well.

New Orleans, LA

San Jose, CA

Houston, TX

Bridgeport, CTSan Diego, CA

Los Angeles, CA

Miami, FL

Wichita, KSSan Francisco, CA

Lancaster, PA

Fort Worth, TX

Worcester, MA

Kansas City, MO

Boston, MA

York, PA

Greeensboro, NC

Dallas, TX

Allentown, PA

Chattanooga, TN

Providence, RI

Canton, OH

Buffalo, NY

Portland, OR

New Haven, CTYoungstown, OH

Columbus, OH

Memphis, TNBirmingham, AL

Reading, PA

Utica, NY

New York, NY

Albany, NY

Pittsburgh, PA

Grand Rapids, MIHartford, CTSpringfield, MA

Flint, MI

Saint Louis, MO

Toledo, OH

Minneapolis, MN

Cleveland, OH

Wilmington, DE

Cincinnati, OH

Louisville, KY

Baltimore, MD

Seattle, WA

Detroit, MIAkron, OH

Philadelphia, PA

Nashville, TN

Rochester, NYAtlanta, GA

Syracuse, NYChicago, IL

Milwaukee, WI

Dayton, OHIndianapolis, INDenver, CORichmond, VA

Washington, DC

-100

-50

050

100

e( m

ort |

X )

-4 -2 0 2e( lso2 | X )

coef = 11.357821, se = 5.2955374, t = 2.14

Page 18: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

Results Dropping New Orleans

Now education is no longer significant

reg mort lnox lso2 precip educ nonwhite if city!=“New Orleans”

Source | SS df MS Number of obs = 59-------------+------------------------------ F( 5, 53) = 27.90

Model | 143441.648 5 28688.3296 Prob > F = 0.0000Residual | 54501.7233 53 1028.3344 R-squared = 0.7247

-------------+------------------------------ Adj R-squared = 0.6987Total | 197943.371 58 3412.81675 Root MSE = 32.068

------------------------------------------------------------------------------mort | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------lnox | -9.89842 7.730678 -1.28 0.206 -25.4042 5.607357lso2 | 26.03266 5.931109 4.39 0.000 14.13636 37.92896

precip | 1.363333 .6357352 2.14 0.037 .08821 2.638457educ | -5.667182 6.523808 -0.87 0.389 -18.75228 7.417919

nonwhite | 3.039655 .590569 5.15 0.000 1.855124 4.224186_cons | 852.3782 85.93317 9.92 0.000 680.0181 1024.738

Page 19: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

Cprplot for NOx

Some indications of non-linearity

Miami, FL

Dallas, TX

Fort Worth, TXUtica, NY

Wichita, KS

Hartford, CT

Greeensboro, NC

Grand Rapids, MIWorcester, MANew Haven, CTSpringfield, MA

Providence, RIDayton, OHKansas City, MO

Rochester, NYBridgeport, CT

Flint, MI

Syracuse, NY

Houston, TX

Allentown, PA

To ledo, OH

Seattle, WA

Canton, OH

Lanca ster, PA

Indian apolis, IN

Chattanooga, TNAtlanta, GA

Denver, CO

York, PA

Columbus, OHRichmond, VA

Albany, NY

Minneapolis, MN

Wilmington, DE

Reading, PA

Buffalo, NY

Youngstown, OH

Nashville, TN

Saint Louis, MO

Akron, OH

New Orleans, LA

Memphis, TN

Portland, OR

Cleveland, OH

Milwaukee, WI

New York, NY

Cincinnati, OHWashington, DC

San Jose, CA

Philadelphia, PA

Birmingha m, AL

Boston, MADetroit, MI

Louisville, KY

Baltimore, MDPittsburgh, PA

Chicago, IL

San Diego, CA

San Francisco, CA

Los Angeles, CA

-100

-50

050

100

150

Com

pone

nt p

lus

resi

dual

0 2 4 6lnox

Partial residual plot

Page 20: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

Cprplot for SO2

This looks more or less linear

Fort Worth, TX

Miami, FL

Houston, TX

New Orleans, LA

Wichita, KSDallas, TXSan Jose, CA

Bridgeport, CT

Kansas City, MO

Greeensboro, NCNew Haven, CTWorcester, MA

Grand Rapids, MIHartford, CT

Utica, NY

Flint, MI

Columbus, OH

Dayton, OH

Rochester, NY

Providence, RISeattle, WA

Springfield, MACanton, OHSan Diego, CAAtlanta, GA

Syracuse, NYToledo, OH

Minneapolis, MN

Chattanooga, TNDenver, CO

Lancaster, PA

Indianapolis, INAllentown, PA

Memphis, TN

Buffalo, NY

Albany, NY

Youngstown, OH

Wilmington, DE

Portland, OR

Richmond, VA

York, PA

Akron, OH

Boston, MA

Cleveland, OH

Saint Louis, MO

Birmingham, ALNashville, TN

San Francisco, CAReading, PA

Washington, DC

New York, NY

Detroit, MIMilwaukee, WI

Los Angeles, CA

Cincinnati, OH

Philadelphia, PALouisville, KY

Baltimore, MDPittsburgh, PA

Chicago, IL

-50

050

100

150

Com

pone

nt p

lus

resi

dual

0 2 4 6lso2

Partial residual plot

Page 21: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

Review

1. “Regression”, “regression model”, “linear regression model”,“regression analysis”

2. Fitted values, residuals, least squares method of estimation

3. Properties of least squares; tests and confidence intervals forindividual coefficients; prediction intervals; extra SS F-tests(full and reduced models)

4. Model building and refinement: transformation, indicator variables, x2, interaction, variable selection

5. Influence and case-influence statistics

6. Variable selection

Page 22: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

7. A note on the difference between “confounding variable”and “interaction”

a. Is there an association between gestation and mean brainweight after accounting for body weight?

µ(brain) = β0 + β1 body + β2 gest

(β2 represents the association of gestation with mean brainweight after accounting for body weight.)

b. Is the association between gestation and brain weight Different for animals of different body sizes?

µ(brain) = β0 + β1 body + β2 gest + β3 body*gest

(There is an interactive effect of body and gest on brain)

Page 23: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

8. What about all those F-tests?

a. All FAll F--tests wetests we’’ve considered are special cases of the extra ve considered are special cases of the extra sum of squares Fsum of squares F--test (Sect. 10.3)test (Sect. 10.3)

b. F-test for overall significance of regressionFull: a model of interestReduced: model with β0 only

c. F-test for lack-of fitFull: one-way anova (separate means for each distinctcombination of x’s)Reduced: a model of interest

d. Partial F-test is an F-test for a single β

Page 24: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

e. One-way ANOVA F-testFull: model with a separate mean for each groupi.e. β0 and k-1 indicators to distinguish k groupsReduced: b0 only (single mean model)

f. “Type III” F-tests (a computer package term)Full: model that has been specifiedReduced: model without a particular term

g. “Sequential” F-tests (depends on order that x’s are listed)i. Full: intercept and x1

Reduced: intercept

ii. Full: intercept, x1, and x2

Reduced: intercept and x1

iii. Full: intercept, x1, x2, and x3

Reduced: intercept, x1, and x2

Page 25: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

9. In “linear regression,” what does “linear in b’s” mean?a. β0 *something + β1 *something + β2 *something + ...b. Ex. of nonlinear regression: µ( y|x)= β0 xβ1

10. A note about “mean response.” It is useful to explicitly write µ( y|x1, x2 , x3) to talk about the mean of y as a function of x1 , x2 , and x3 . Sometimes we abbreviate this to “the mean of the response” if it’s clear what x’s we’re talking about.

Page 26: Lecture 7: Review - Columbia Universityso33/SusDev/Lecture_7.pdf · 2005-07-29 · Ex 1029: Wage and Race The dataset provided is designed to explore the relationship between wage

11. Partial residualsa. You may find a plot of partial residuals vs. x1 to be usefulwhen it is desired to study the relationship between y and x1 , after getting the effects of x2 , x3 . etc. out of the way,especially if the effect of x1 is relatively small (in which case the plot of y versus x1 does not reveal much).

b. For example: How is mammal brain weight related to litter size, after accounting for body weight?

c. Suppose µ( y|x1, x2) = β0 + β1 x1 + β2 x2 . A plot of y versus x1 won’t show a linear relationship whose slope is β1 if x1 and x2 are correlated. However, a plot of y - (β0 + β2 x2) versus x1 will show a pattern whose slope is β1 .

d. So, the partial residuals are yi - ( ), where the b’s arethe estimates from the regression of y on x1 and x1.