Chapter 8 Conclusion - University of Manitoba
home.cc.umanitoba.ca/~godwinrt/3040/overheads/test...
Three questions about test scores (score) and student-teacher ratio (str):
a) After controlling for differences in economic characteristics of different
districts, does the effect of str on score depend on the fraction of English
learners (pctel)?
b) Does this effect depend on str? (Is there a non-linear relationship?)
c) After taking economic factors and nonlinearities into account, what is the
estimated effect on score of reducing str?
> teachdata = read.csv("http://home.cc.umanitoba.ca/~godwinrt/3180/data/str3.csv")
> attach(teachdata)
> head(teachdata)
sublunch score str avginc pctel
1 2.0408 690.80 17.88991 22.690001 0.000000
2 47.9167 661.20 21.52466 9.824000 4.583333
3 76.3226 643.60 18.69723 8.978000 30.000002
4 77.0492 647.70 17.35714 8.978000 0.000000
5 78.4270 640.85 18.67133 9.080333 13.857677
6 86.9565 605.55 21.40625 10.415000 12.408759
An economics study should always include a description of the data:
sublunch – percent qualifying for reduced-price lunch
score – average test score
str – student teacher ratio
avginc – district average income (in $1000’s)
pctel – percentage of English learners
It is also common to provide descriptive statistics for the variables.
The variable of interest is str (“policy” variable).
Two measures of the economic background of students: sublunch and avginc
pctel is also important: omitting it could cause omitted variable bias (O.V.B.).
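As a minimal sketch of such a table, the R code below computes the mean, standard deviation, minimum and maximum of each variable. To stay self-contained it uses only the six districts printed by head(teachdata) above; describe() is a hypothetical helper, and with the full data one would call it on teachdata instead.

```r
# First six districts, copied from the head(teachdata) output above
d6 <- data.frame(
  sublunch = c(2.0408, 47.9167, 76.3226, 77.0492, 78.4270, 86.9565),
  score    = c(690.80, 661.20, 643.60, 647.70, 640.85, 605.55),
  str      = c(17.88991, 21.52466, 18.69723, 17.35714, 18.67133, 21.40625),
  avginc   = c(22.690001, 9.824000, 8.978000, 8.978000, 9.080333, 10.415000),
  pctel    = c(0.000000, 4.583333, 30.000002, 0.000000, 13.857677, 12.408759)
)

# Hypothetical helper: one row per variable with mean, sd, min and max
describe <- function(d) {
  data.frame(
    mean = sapply(d, mean),
    sd   = sapply(d, sd),
    min  = sapply(d, min),
    max  = sapply(d, max),
    row.names = names(d)
  )
}

stats6 <- describe(d6)
print(round(stats6, 3))
# With the full data set: print(round(describe(teachdata), 3))
```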
In a previous lecture, it was argued that avginc might have a non-linear
relationship with score:
> plot(avginc, score, xlim = c(5,60), ylim = c(600,710))
[Figure: scatterplot of score (vertical axis) against avginc (horizontal axis)]
What are some ways we can deal with this?
(i) Polynomials:
> avginc2 = avginc^2
> avginc3 = avginc^3
> eqcubic = lm(score ~ avginc + avginc2 + avginc3)
> summary(eqcubic)
Call:
lm(formula = score ~ avginc + avginc2 + avginc3)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.001e+02 5.830e+00 102.937 < 2e-16 ***
avginc 5.019e+00 8.595e-01 5.839 1.06e-08 ***
avginc2 -9.581e-02 3.736e-02 -2.564 0.0107 *
avginc3 6.855e-04 4.720e-04 1.452 0.1471
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 12.71 on 416 degrees of freedom
Multiple R-squared: 0.5584, Adjusted R-squared: 0.5552
F-statistic: 175.4 on 3 and 416 DF, p-value: < 2.2e-16
Let’s plot the cubic regression function:
> par(new = TRUE)
> curve(600.1 + 5.019*x - 0.09581*x^2 + 0.0006855*x^3, xlim =
c(5,60), ylim = c(600,710), ylab = "", xlab = "", col = 2)
[Figure: scatterplot of score against avginc with the fitted cubic regression function overlaid (col = 2)]
(ii) Logarithms:
> eqlog = lm(score ~ log(avginc))
> summary(eqlog)
Call:
lm(formula = score ~ log(avginc))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 557.832 4.200 132.81 <2e-16 ***
log(avginc) 36.420 1.571 23.18 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 12.62 on 418 degrees of freedom
Multiple R-squared: 0.5625, Adjusted R-squared: 0.5615
F-statistic: 537.4 on 1 and 418 DF, p-value: < 2.2e-16
Add this regression to the plot:
> par(new = TRUE)
> curve(557.832 + 36.42*log(x), xlim = c(5,60), ylim = c(600,710),
ylab = "", xlab = "", col = 3)
> legend("bottomright", c("Cubic", "Lin-Log"), lty = 1,
col = c(2, 3))
[Figure: scatterplot of score against avginc with the fitted cubic (col = 2) and lin-log (col = 3) regression functions; legend: Cubic, Lin-Log]
Do you like the cubic or lin-log model better? What are the
advantages/disadvantages? Does heteroskedasticity appear to be present?
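One way to compare the two specifications is through the slope each implies, d(score)/d(avginc). The sketch below hard-codes the rounded coefficients from the two summaries above (an assumption of convenience; coef(eqcubic) and coef(eqlog) would give the unrounded values). The lin-log slope, 36.42/avginc, falls steadily as income rises, whereas the cubic slope bottoms out near avginc = 46.6 and then rises again. That is something to weigh when answering the question above, together with the fact that the cubic term itself has p = 0.147.

```r
# Coefficients copied (rounded) from summary(eqcubic) and summary(eqlog) above
b1 <- 5.019; b2 <- -0.09581; b3 <- 0.0006855   # cubic model
g1 <- 36.420                                     # lin-log model

# Slope of each fitted regression function, d(score)/d(avginc)
slope_cubic  <- function(x) b1 + 2 * b2 * x + 3 * b3 * x^2
slope_linlog <- function(x) g1 / x

incomes <- c(10, 20, 30, 40, 50, 60)
print(round(data.frame(avginc = incomes,
                       cubic  = slope_cubic(incomes),
                       linlog = slope_linlog(incomes)), 3))

# The cubic slope is smallest where its own derivative, 2*b2 + 6*b3*x, is zero
x_turn <- -b2 / (3 * b3)
print(round(x_turn, 2))
```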
We will proceed by using log(avginc). But first, to review omitted variable bias,
let’s see what happens if we leave log(avginc) out of the regression.
> eq1 = lm(score ~ str + pctel + sublunch)
> summary(eq1)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 700.14996 4.68569 149.423 < 2e-16 ***
str -0.99831 0.23875 -4.181 3.54e-05 ***
pctel -0.12157 0.03232 -3.762 0.000193 ***
sublunch -0.54735 0.02160 -25.341 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 9.08 on 416 degrees of freedom
Multiple R-squared: 0.7745, Adjusted R-squared: 0.7729
F-statistic: 476.3 on 3 and 416 DF, p-value: < 2.2e-16
Now add log(avginc):
> eq2 = lm(score ~ str + pctel + sublunch + log(avginc))
> summary(eq2)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 658.55195 7.68466 85.697 < 2e-16 ***
str -0.73433 0.23069 -3.183 0.00157 **
pctel -0.17553 0.03181 -5.518 6.06e-08 ***
sublunch -0.39823 0.03043 -13.088 < 2e-16 ***
log(avginc) 11.56897 1.74045 6.647 9.43e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 8.643 on 415 degrees of freedom
Multiple R-squared: 0.7962, Adjusted R-squared: 0.7942
F-statistic: 405.4 on 4 and 415 DF, p-value: < 2.2e-16
How have the results changed? What is going on here?
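One way to see what is going on is the in-sample omitted variable identity for OLS. When both regressions use the same observations (here all 420 districts), the str coefficient in eq1 equals the str coefficient in eq2 plus the log(avginc) coefficient in eq2 times the coefficient on str from an auxiliary regression of log(avginc) on str, pctel and sublunch (written δ̂ below). That auxiliary regression is not reported here, so its coefficient is only implied by the two printed outputs:

$$\tilde{\beta}_{str} = \hat{\beta}_{str} + \hat{\beta}_{\log(avginc)}\,\hat{\delta}_{str}$$

$$-0.99831 = -0.73433 + 11.56897\,\hat{\delta}_{str} \quad\Longrightarrow\quad \hat{\delta}_{str} = \frac{-0.99831 + 0.73433}{11.56897} \approx -0.0228$$

The negative δ̂ says that, holding pctel and sublunch fixed, districts with a higher str tend to have lower average income (roughly 2.3% lower per extra student per teacher). Since log(avginc) raises score, leaving it out pushes the str coefficient further below zero. This could be checked directly with lm(log(avginc) ~ str + pctel + sublunch).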
Regressor    (1)        (2)        (3)        (4)        (5)        (6)        (7)
str          -1.00**    -0.73**
             (0.24)     (0.23)
str2
str3
pctel        -0.122**   -0.176**
             (0.033)    (0.032)
hiel
hiel×str
hiel×str2
hiel×str3
sublunch     -0.547**   -0.398**
             (0.022)    (0.030)
log(avginc)             11.57**
                        (1.74)
Intercept    700.1**    658.6**
             (4.7)      (7.7)
R̄²           0.7729     0.7942
(Standard errors in parentheses.)
Let’s address (a): After controlling for differences in economic characteristics of
different districts, does the effect of str on score depend on the fraction of
English learners (pctel)?
An easier way to examine this might be to create a dummy variable.
Let’s define a new variable (high percentage of English learners):
hiel = 0 for classes with small percentage of English learners
hiel = 1 for classes with large percentage of English learners
How should we determine the threshold?
> summary(pctel)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 1.941 8.778 15.770 22.970 85.540
Create hiel:
hiel = rep(0, length(pctel))   # start with one 0 per district
hiel[pctel >= 10] = 1
To address (a), create the interaction term:
hielstr = hiel*str
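The code below, a sketch using only the six pctel values printed by head(teachdata), contrasts a vectorised construction of the dummy with the pattern of starting from a single 0. In R, assigning through a logical index that is longer than the vector extends it and fills the positions that were never assigned with NA, so a dummy built from a length-one starting value can end up with missing values (or the wrong length) instead of zeros.

```r
# pctel for the first six districts, copied from the head(teachdata) output
pctel6 <- c(0.000000, 4.583333, 30.000002, 0.000000, 13.857677, 12.408759)

# Vectorised: the comparison gives TRUE/FALSE, as.numeric() turns it into 1/0
hiel6 <- as.numeric(pctel6 >= 10)
print(hiel6)   # 0 0 1 0 1 1

# Pitfall: starting from a single 0 and assigning through a longer logical
# index extends the vector; positions that are never assigned become NA
bad <- 0
bad[pctel6 >= 10] <- 1
print(bad)     # 0 NA 1 NA 1 1
```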
Try a regression without economic controls:
> eq3 = lm(score ~ str + hiel + hielstr)
> summary(eq3)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 682.2458 10.5109 64.908 <2e-16 ***
str -0.9685 0.5398 -1.794 0.0735 .
hiel 5.6391 16.7177 0.337 0.7360
hielstr -1.2766 0.8441 -1.512 0.1312
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 15.88 on 416 degrees of freedom
Multiple R-squared: 0.3103, Adjusted R-squared: 0.3054
F-statistic: 62.4 on 3 and 416 DF, p-value: < 2.2e-16
Which coefficient should we be testing to see if str has a different effect for
classes with many English learners? What do we conclude?
In anticipation of (c), let’s test if str matters. Does it appear to matter from the
results above?
𝐻0: student-teacher ratio has no effect on test scores
𝐻0: in model (3), the coefficients on str and hielstr are both zero
The model under the null hypothesis is:
> eqnul1 = lm(score ~ hiel)
> summary(eqnul1)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 663.482 1.068 621.16 <2e-16 ***
hiel -20.400 1.580 -12.91 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 16.13 on 418 degrees of freedom
Multiple R-squared: 0.2852, Adjusted R-squared: 0.2834
F-statistic: 166.7 on 1 and 418 DF, p-value: < 2.2e-16
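Formula for the F-statistic:
$$F = \frac{(R^2_U - R^2_R)/q}{(1 - R^2_U)/(n - k_U - 1)}$$
where R²_U and R²_R are the R² of the unrestricted model (3) and of the restricted model, q = 2 is the number of restrictions, n = 420, and k_U = 3 is the number of regressors in the unrestricted model:
$$F = \frac{(0.3103 - 0.2852)/2}{(1 - 0.3103)/(420 - 3 - 1)} = 7.57$$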
Since this is greater than the 5% critical value of 3.00, we reject the null.
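The same arithmetic can be sketched in R. f_from_r2() is a hypothetical helper; qf() and pf() (base R) give the exact 5% critical value and p-value for F(2, 416) rather than the large-sample value of 3.00. This R² form is the homoskedasticity-only F-statistic, as is the statistic reported by anova().

```r
# Hypothetical helper: homoskedasticity-only F-statistic from R-squared values
f_from_r2 <- function(r2_u, r2_r, q, n, k_u) {
  ((r2_u - r2_r) / q) / ((1 - r2_u) / (n - k_u - 1))
}

# Unrestricted: model (3), eq3; restricted: eqnul1 (str and hielstr dropped)
f_stat <- f_from_r2(r2_u = 0.3103, r2_r = 0.2852, q = 2, n = 420, k_u = 3)

df2   <- 420 - 3 - 1                              # 416 residual df
crit5 <- qf(0.95, df1 = 2, df2 = df2)             # exact 5% critical value
pval  <- pf(f_stat, df1 = 2, df2 = df2, lower.tail = FALSE)

print(round(c(F = f_stat, crit5 = crit5, p = pval), 4))
```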
Alternatively, use the following R-code to perform the test:
> anova(eq3,eqnul1)
Analysis of Variance Table
Model 1: score ~ str + hiel + hielstr
Model 2: score ~ hiel
Res.Df RSS Df Sum of Sq F Pr(>F)
1 416 104904
2 418 108734 -2 -3830.3 7.5947 0.000576 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Let’s try a model with economic controls.
> eq4 = lm(score ~ str + hiel + hielstr + sublunch + log(avginc))
> summary(eq4)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 653.66612 8.89113 73.519 < 2e-16 ***
str -0.53103 0.30039 -1.768 0.0778 .
hiel 5.49821 9.13897 0.602 0.5478
hielstr -0.57767 0.46463 -1.243 0.2145
sublunch -0.41138 0.02869 -14.337 < 2e-16 ***
log(avginc) 12.12447 1.76513 6.869 2.38e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 8.629 on 414 degrees of freedom
Multiple R-squared: 0.7974, Adjusted R-squared: 0.7949
F-statistic: 325.8 on 5 and 414 DF, p-value: < 2.2e-16
Has the conclusion (about a different effect for classes with many English
learners) changed?
Again, let’s test the null that str doesn’t matter.
Restricted model:
> eqnul2 = lm(score ~ hiel + sublunch + log(avginc))
> anova(eq4,eqnul2)
Analysis of Variance Table
Model 1: score ~ str + hiel + hielstr + sublunch + log(avginc)
Model 2: score ~ hiel + sublunch + log(avginc)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 414 30824
2 416 31784 -2 -960.78 6.4523 0.001740 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
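anova() computes the same homoskedasticity-only F-statistic from residual sums of squares, F = [(RSS_R − RSS_U)/q] / [RSS_U/(n − k_U − 1)]. As a sketch, using the rounded RSS and Sum of Sq values printed in the table above (f_from_rss() is a hypothetical helper), the reported F and p-value can be reproduced approximately:

```r
# Hypothetical helper: homoskedasticity-only F-statistic from residual sums of squares
f_from_rss <- function(rss_r, rss_u, q, df_u) {
  ((rss_r - rss_u) / q) / (rss_u / df_u)
}

# From anova(eq4, eqnul2): RSS of eq4 = 30824 on 414 df, Sum of Sq = 960.78
rss_u <- 30824
rss_r <- rss_u + 960.78
f_rss <- f_from_rss(rss_r, rss_u, q = 2, df_u = 414)
p_rss <- pf(f_rss, df1 = 2, df2 = 414, lower.tail = FALSE)

print(round(c(F = f_rss, p = p_rss), 5))
```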
Regressor    (1)        (2)        (3)        (4)        (5)        (6)        (7)
str          -1.00**    -0.73**    -0.97      -0.53
             (0.24)     (0.23)     (0.54)     (0.30)
str2
str3
pctel        -0.122**   -0.176**
             (0.033)    (0.032)
hiel                               5.64       5.50
                                   (16.7)     (9.1)
hiel×str                           -1.28      -0.58
                                   (0.84)     (0.46)
hiel×str2
hiel×str3
sublunch     -0.547**   -0.398**              -0.411**
             (0.022)    (0.030)               (0.029)
log(avginc)             11.57**               12.12**
                        (1.74)                (1.8)
Intercept    700.1**    658.6**    682.2**    653.7**
             (4.7)      (7.7)      (10.5)     (8.9)
R̄²           0.7729     0.7942     0.3054     0.7949
Now let’s address (b): is the relationship between str and score non-linear?
> str2 = str^2
> str3 = str^3
> eq5 = lm(score ~ str + str2 + str3 + hiel + sublunch +
log(avginc))
> summary(eq5)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 252.05089 165.82433 1.520 0.12928
str 64.33886 25.46223 2.527 0.01188 *
str2 -3.42388 1.29374 -2.646 0.00844 **
str3 0.05929 0.02174 2.728 0.00665 **
hiel -5.47399 1.03187 -5.305 1.84e-07 ***
sublunch -0.42006 0.02814 -14.928 < 2e-16 ***
log(avginc) 11.74818 1.73446 6.773 4.34e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 8.559 on 413 degrees of freedom
Multiple R-squared: 0.8011, Adjusted R-squared: 0.7982
F-statistic: 277.2 on 6 and 413 DF, p-value: < 2.2e-16
Regressor    (1)        (2)        (3)        (4)        (5)        (6)        (7)
str          -1.00**    -0.73**    -0.97      -0.53      64.34**
             (0.24)     (0.23)     (0.54)     (0.30)     (25.5)
str2                                                     -3.42**
                                                         (1.29)
str3                                                     0.059**
                                                         (0.022)
pctel        -0.122**   -0.176**
             (0.033)    (0.032)
hiel                               5.64       5.50       -5.47**
                                   (16.7)     (9.1)      (1.03)
hiel×str                           -1.28      -0.58
                                   (0.84)     (0.46)
hiel×str2
hiel×str3
sublunch     -0.547**   -0.398**              -0.411**   -0.420**
             (0.022)    (0.030)               (0.029)    (0.028)
log(avginc)             11.57**               12.12**    11.75**
                        (1.74)                (1.8)      (1.7)
Intercept    700.1**    658.6**    682.2**    653.7**    252.1
             (4.7)      (7.7)      (10.5)     (8.9)      (165.8)
R̄²           0.7729     0.7942     0.3054     0.7949     0.7982
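To test the null hypothesis that str has no effect on score in model (5), estimate a restricted model that drops str, str2 and str3 together (q = 3) and compare it to model (5). Note that this is not a test of linearity: that test would keep str in the restricted model and drop only str2 and str3 (q = 2). The restricted model and F-test are: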
> eqnul3 = lm(score ~ hiel + sublunch + log(avginc))
> anova(eq5,eqnul3)
Analysis of Variance Table
Model 1: score ~ str + str2 + str3 + hiel + sublunch + log(avginc)
Model 2: score ~ hiel + sublunch + log(avginc)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 413 30257
2 416 31784 -3 -1527.7 6.9512 0.0001424 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
What do you conclude?
What other way might you try to capture this non-linear effect?
How would you test to see if str matters, using model (5)?
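A sketch of how a linearity test could be coded: linearity_test() is a hypothetical helper that compares model (5) with a restricted model that keeps str but drops the squared and cubed terms, so q = 2. It writes the powers as I(str^2) and I(str^3) inside the formula instead of using separate variables. Because no output for this test is reported above, the demonstration runs on synthetic data generated in the code, which is NOT the district data; applying it to the course data assumes teachdata has been given a hiel column.

```r
# Hypothetical helper: F-test of linearity in str within model (5).
# The restricted model keeps str but drops str^2 and str^3 (q = 2).
linearity_test <- function(d) {
  full       <- lm(score ~ str + I(str^2) + I(str^3) + hiel + sublunch + log(avginc),
                   data = d)
  restricted <- lm(score ~ str + hiel + sublunch + log(avginc), data = d)
  anova(restricted, full)
}

# Synthetic data for illustration only; this is NOT the district data set
set.seed(123)
n <- 420
sim <- data.frame(
  str      = runif(n, 14, 26),
  hiel     = rbinom(n, 1, 0.4),
  sublunch = runif(n, 0, 100),
  avginc   = runif(n, 5, 55)
)
sim$score <- 650 - 0.7 * sim$str - 5 * sim$hiel - 0.4 * sim$sublunch +
  12 * log(sim$avginc) + rnorm(n, sd = 9)

tab <- linearity_test(sim)
print(tab)

# With the course data (assumes hiel is added as a column first):
# teachdata$hiel <- as.numeric(teachdata$pctel >= 10)
# linearity_test(teachdata)
```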
Let’s reconsider (a) under the cubic specification. We want to know if the effect
of str on score is different for classes with a high percentage of English learners.
Again, the strategy is:
have the dummy variable hiel interact with all terms involving str
this allows for the “marginal effect” to differ between the two groups
testing whether the coefficients on the interaction terms are jointly equal to
zero is equivalent to testing that the effect of str on score is the same in the
two groups
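Written out, with coefficient labels (β, γ) introduced here only for illustration, model (6) and the marginal effect of str that it implies are:

$$score = \beta_0 + \beta_1 str + \beta_2 str^2 + \beta_3 str^3 + \gamma_0 hiel + \gamma_1 (hiel \times str) + \gamma_2 (hiel \times str^2) + \gamma_3 (hiel \times str^3) + \beta_4 sublunch + \beta_5 \log(avginc) + u$$

$$\frac{\partial\, score}{\partial\, str} = \beta_1 + 2\beta_2 str + 3\beta_3 str^2 + hiel \times (\gamma_1 + 2\gamma_2 str + 3\gamma_3 str^2)$$

The null hypothesis of no difference in the effect of str between the two groups is γ1 = γ2 = γ3 = 0 (q = 3). γ0 is left unrestricted, so the two groups may still differ in their average score.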
Create the new interaction terms:
hielstr2 = hiel*str2
hielstr3 = hiel*str3
Add the interaction terms to model (5):
eq6 = lm(score ~ str + str2 + str3 + hiel + hielstr + hielstr2 +
hielstr3 + sublunch + log(avginc))
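As a side note on idiomatic R, the interaction terms do not have to be built by hand: in a model formula, hiel * (str + I(str^2) + I(str^3)) expands to the main effects plus every hiel interaction. The sketch below, on synthetic data generated in the code (not the district data), checks that both parameterisations give the same fitted values.

```r
# Synthetic data for illustration only; this is NOT the district data set
set.seed(42)
n <- 200
d <- data.frame(str = runif(n, 14, 26), hiel = rbinom(n, 1, 0.4),
                sublunch = runif(n, 0, 100), avginc = runif(n, 5, 55))
d$score <- 650 - 0.7 * d$str - 5 * d$hiel - 0.4 * d$sublunch +
  12 * log(d$avginc) + rnorm(n, sd = 9)

# Long form: build every power and interaction by hand, as in the lecture
d$str2     <- d$str^2
d$str3     <- d$str^3
d$hielstr  <- d$hiel * d$str
d$hielstr2 <- d$hiel * d$str2
d$hielstr3 <- d$hiel * d$str3
m_long <- lm(score ~ str + str2 + str3 + hiel + hielstr + hielstr2 + hielstr3 +
               sublunch + log(avginc), data = d)

# Compact form: hiel * (...) adds hiel, the str terms and all hiel:str interactions
m_short <- lm(score ~ hiel * (str + I(str^2) + I(str^3)) + sublunch + log(avginc),
              data = d)

print(all.equal(fitted(m_long), fitted(m_short), tolerance = 1e-6))
print(length(coef(m_short)))
```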
Regressor    (1)        (2)        (3)        (4)        (5)        (6)        (7)
str          -1.00**    -0.73**    -0.97      -0.53      64.34**    83.70**
             (0.24)     (0.23)     (0.54)     (0.30)     (25.5)     (29.69)
str2                                                     -3.42**    -4.38**
                                                         (1.29)     (1.51)
str3                                                     0.059**    0.075**
                                                         (0.022)    (0.025)
pctel        -0.122**   -0.176**
             (0.033)    (0.032)
hiel                               5.64       5.50       -5.47**    816.1*
                                   (16.7)     (9.1)      (1.03)     (434.61)
hiel×str                           -1.28      -0.58                 -123.3*
                                   (0.84)     (0.46)                (66.35)
hiel×str2                                                           6.12*
                                                                    (3.35)
hiel×str3                                                           -0.101*
                                                                    (0.056)
sublunch     -0.547**   -0.398**              -0.411**   -0.420**   -0.418**
             (0.022)    (0.030)               (0.029)    (0.028)    (0.029)
log(avginc)             11.57**               12.12**    11.75**    11.80**
                        (1.74)                (1.8)      (1.7)      (1.75)
Intercept    700.1**    658.6**    682.2**    653.7**    252.1      122.4
             (4.7)      (7.7)      (10.5)     (8.9)      (165.8)    (192.2)
R̄²           0.7729     0.7942     0.3054     0.7949     0.7982     0.7988
How do we test (a) using model (6)?
> anova(eq6,eq5)
Analysis of Variance Table
Model 1: score ~ str + str2 + str3 + hiel + hielstr + hielstr2 + hielstr3 +
    sublunch + log(avginc)
Model 2: score ~ str + str2 + str3 + hiel + sublunch + log(avginc)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 410 29954
2 413 30257 -3 -302.33 1.3794 0.2485
So, once again, we can’t reject the null (p = 0.2485) that the effect of str on score is the same
regardless of the percentage of English learners.
This suggests that the interaction terms are not needed, and model (5) is
adequate.
For a final model, let’s check that our results are robust to using pctel instead of hiel:
eq7 = lm(score ~ str + str2 + str3 + pctel + sublunch +
log(avginc))
Regressor    (1)        (2)        (3)        (4)        (5)        (6)        (7)
str          -1.00**    -0.73**    -0.97      -0.53      64.34**    83.70**    65.29**
             (0.24)     (0.23)     (0.54)     (0.30)     (25.5)     (29.69)    (25.48)
str2                                                     -3.42**    -4.38**    -3.47**
                                                         (1.29)     (1.51)     (1.30)
str3                                                     0.059**    0.075**    0.060**
                                                         (0.022)    (0.025)    (0.022)
pctel        -0.122**   -0.176**                                               -0.166**
             (0.033)    (0.032)                                                (0.032)
hiel                               5.64       5.50       -5.47**    816.1*
                                   (16.7)     (9.1)      (1.03)     (434.61)
hiel×str                           -1.28      -0.58                 -123.3*
                                   (0.84)     (0.46)                (66.35)
hiel×str2                                                           6.12*
                                                                    (3.35)
hiel×str3                                                           -0.101*
                                                                    (0.056)
sublunch     -0.547**   -0.398**              -0.411**   -0.420**   -0.418**   -0.402**
             (0.022)    (0.030)               (0.029)    (0.028)    (0.029)    (0.030)
log(avginc)             11.57**               12.12**    11.75**    11.80**    11.51**
                        (1.74)                (1.8)      (1.7)      (1.75)     (1.73)
Intercept    700.1**    658.6**    682.2**    653.7**    252.1      122.4      244.8
             (4.7)      (7.7)      (10.5)     (8.9)      (165.8)    (192.2)    (165.9)
R̄²           0.7729     0.7942     0.3054     0.7949     0.7982     0.7988     0.7978
Summary
(a) Based on hypothesis tests involving models (3), (4) and (6), there doesn’t
appear to be a substantial difference in the effect of str on score for classes with
many English learners.
(b) In model (5), the coefficients on str2 and str3 are each significant at the 1% level
(p = 0.008 and p = 0.007), indicating that the relationship between str and score is non-linear.
(c) Using F-tests, the null hypothesis that str has no effect on score is rejected in
all models (the F-tests were shown only for models (3), (4) and (5)). Models (5) and (7) should be
our preferred models based on the sequence of testing. Let’s use them to provide
some “policy recommendation.”
If str = 20, then reducing str to 18 would improve score by 3.00 using model
(5), and 2.93 using model (7).
If str = 22, then reducing str to 20 would improve score by 1.93 (model 5) or
1.90 (model 7).
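The model (5) numbers can be reproduced from its printed coefficients. Because every other term cancels, the predicted change in score when str moves from s0 to s1 is β1(s1 − s0) + β2(s1² − s0²) + β3(s1³ − s0³), where β1, β2, β3 are the coefficients on str, str2 and str3. A minimal sketch follows; delta_score() is a hypothetical helper, and model (7) is left out because only its rounded coefficients are reported here.

```r
# Model (5) coefficients on str, str2 and str3, copied from summary(eq5)
b <- c(str = 64.33886, str2 = -3.42388, str3 = 0.05929)

# Hypothetical helper: predicted change in score when str moves from s0 to s1,
# holding hiel, sublunch and avginc fixed (all other terms cancel)
delta_score <- function(s0, s1, b) {
  b[["str"]] * (s1 - s0) + b[["str2"]] * (s1^2 - s0^2) + b[["str3"]] * (s1^3 - s0^3)
}

d_20_18 <- delta_score(20, 18, b)   # about 3.00
d_22_20 <- delta_score(22, 20, b)   # about 1.93
print(round(c(from20to18 = d_20_18, from22to20 = d_22_20), 2))
```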