chapter 8 conclusion - university of manitobahome.cc.umanitoba.ca/~godwinrt/3040/overheads/test...


Chapter 8 Conclusion

Three questions about test scores (score) and student-teacher ratio (str):

a) After controlling for differences in economic characteristics of different

districts, does the effect of str on score depend on the fraction of English

learners (pctel)?

b) Does this effect depend on str? (Is there a non-linear relationship?)

c) After taking economic factors and nonlinearities into account, what is the

estimated effect on score of reducing str?

> teachdata = read.csv("http://home.cc.umanitoba.ca/~godwinrt/3180/data/str3.csv")

> attach(teachdata)

> head(teachdata)
  sublunch  score      str    avginc     pctel
1   2.0408 690.80 17.88991 22.690001  0.000000
2  47.9167 661.20 21.52466  9.824000  4.583333
3  76.3226 643.60 18.69723  8.978000 30.000002
4  77.0492 647.70 17.35714  8.978000  0.000000
5  78.4270 640.85 18.67133  9.080333 13.857677
6  86.9565 605.55 21.40625 10.415000 12.408759

An economics study should always include a description of the data:

sublunch – percentage of students qualifying for a reduced-price lunch

score – average test score

str – student-teacher ratio

avginc – district average income (in $1000s)

pctel – percentage of English learners

It is also common to provide descriptive statistics for the variables.

The variable of interest is str (the “policy” variable).

There are two measures of the economic background of students: sublunch and avginc.

pctel is also important: leaving it out could cause omitted variable bias (O.V.B.).
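As a rough illustration of what a descriptive-statistics table could look like, the sketch below uses only the six districts printed by head(teachdata); with the full sample one would pass teachdata itself. The names first_six and desc_stats are made up for this example.

```r
# The six districts printed by head(teachdata); the full sample is larger.
first_six <- data.frame(
  sublunch = c(2.0408, 47.9167, 76.3226, 77.0492, 78.4270, 86.9565),
  score    = c(690.80, 661.20, 643.60, 647.70, 640.85, 605.55),
  str      = c(17.88991, 21.52466, 18.69723, 17.35714, 18.67133, 21.40625),
  avginc   = c(22.690001, 9.824000, 8.978000, 8.978000, 9.080333, 10.415000),
  pctel    = c(0.000000, 4.583333, 30.000002, 0.000000, 13.857677, 12.408759)
)

# One row per variable: mean, standard deviation, minimum and maximum.
desc_stats <- t(sapply(first_six, function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}))
print(round(desc_stats, 2))
```

Six rows are far too few to describe the data properly; the point is only the layout of the table.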


In a previous lecture, it was argued that avginc might have a non-linear

relationship with score:

> plot(avginc, score, xlim = c(5,60), ylim = c(600,710))

[Figure: scatter plot of score against avginc; horizontal axis avginc from 5 to 60, vertical axis score from 600 to 710.]

What are some ways we can deal with this?

(i) Polynomials:

> avginc2 = avginc^2

> avginc3 = avginc^3

> eqcubic = lm(score ~ avginc + avginc2 + avginc3)

> summary(eqcubic)

Call:

lm(formula = score ~ avginc + avginc2 + avginc3)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 6.001e+02 5.830e+00 102.937 < 2e-16 ***

avginc 5.019e+00 8.595e-01 5.839 1.06e-08 ***

avginc2 -9.581e-02 3.736e-02 -2.564 0.0107 *

avginc3 6.855e-04 4.720e-04 1.452 0.1471

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.71 on 416 degrees of freedom

Multiple R-squared: 0.5584, Adjusted R-squared: 0.5552

F-statistic: 175.4 on 3 and 416 DF, p-value: < 2.2e-16


Let’s plot the cubic regression function:

> par(new = TRUE)

> curve(600.1 + 5.019*x - 0.09581*x^2 + 0.0006855*x^3, xlim =

c(5,60), ylim = c(600,710), ylab = "", xlab = "", col = 2)

[Figure: the same scatter plot of score against avginc with the estimated cubic regression function overlaid (col = 2).]

(ii) Logarithms:

> eqlog = lm(score ~ log(avginc))

> summary(eqlog)

Call:

lm(formula = score ~ log(avginc))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  557.832      4.200  132.81   <2e-16 ***
log(avginc)   36.420      1.571   23.18   <2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
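In a lin-log model the effect of income depends on its level: a 1% increase in avginc changes the predicted score by about 0.01 times the coefficient on log(avginc). The sketch below works this out with the estimate reported above; the helper score_change is a name introduced here for illustration only.

```r
b_log <- 36.420  # coefficient on log(avginc) from summary(eqlog)

# Predicted change in score when avginc moves from `from` to `to` (in $1000s).
score_change <- function(from, to) b_log * (log(to) - log(from))

low_income  <- score_change(10, 11)  # an extra $1000 for a poorer district
high_income <- score_change(40, 41)  # the same $1000 for a richer district
one_percent <- b_log * log(1.01)     # a 1% rise in income, at any level

print(round(c(low_income, high_income, one_percent), 3))
```

The same $1000 is predicted to matter much more at low incomes than at high incomes, which is the curvature visible in the scatter plot.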




Add this regression to the plot:


> par(new = TRUE)

> curve(557.832 + 36.42*log(x), xlim = c(5,60), ylim = c(600,710),

ylab = "", xlab = "", col = 3)

> legend("bottomright", c("Cubic", "Lin-Log"), pch ="__",

col=c(2,3))

[Figure: scatter plot of score against avginc with the cubic (col = 2) and lin-log (col = 3) regression functions overlaid; legend entries: Cubic, Lin-Log.]

Do you like the cubic or lin-log model better? What are the

advantages/disadvantages? Does heteroskedasticity appear to be present?

We will proceed using log(avginc). But first, to review omitted variable bias, let’s see what happens if we leave log(avginc) out of the regression.

> eq1 = lm(score ~ str + pctel + sublunch)

> summary(eq1)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 700.14996    4.68569 149.423  < 2e-16 ***
str          -0.99831    0.23875  -4.181 3.54e-05 ***
pctel        -0.12157    0.03232  -3.762 0.000193 ***
sublunch     -0.54735    0.02160 -25.341  < 2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1





Now add log(avginc):

> eq2 = lm(score ~ str + pctel + sublunch + log(avginc))

> summary(eq2)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 658.55195    7.68466  85.697  < 2e-16 ***
str          -0.73433    0.23069  -3.183  0.00157 **
pctel        -0.17553    0.03181  -5.518 6.06e-08 ***
sublunch     -0.39823    0.03043 -13.088  < 2e-16 ***
log(avginc)  11.56897    1.74045   6.647 9.43e-11 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1




How have the results changed? What is going on here?
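One way to build intuition for the change in the str coefficient is a small simulation of omitted variable bias. The sketch below uses entirely simulated data (x1, x2 and y are hypothetical, not the districts data): x2 affects y and is correlated with x1, so leaving it out shifts the coefficient on x1.

```r
set.seed(1)
n  <- 5000
x1 <- rnorm(n)                      # regressor of interest
x2 <- 0.8 * x1 + rnorm(n)           # control variable, correlated with x1
y  <- 1 * x1 + 2 * x2 + rnorm(n)    # true coefficient on x1 is 1

b_short <- coef(lm(y ~ x1))["x1"]        # x2 omitted
b_long  <- coef(lm(y ~ x1 + x2))["x1"]   # x2 included

print(c(short = unname(b_short), long = unname(b_long)))
```

With this design the short regression should recover roughly 1 + 2 × 0.8 = 2.6 rather than 1, because x1 picks up part of the effect of the omitted x2.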

Regressor    (1)        (2)        (3)        (4)        (5)        (6)        (7)
str          -1.00**    -0.73**
             (0.24)     (0.23)
str2
str3
pctel        -0.122**   -0.176**
             (0.033)    (0.032)
hiel
hiel×str
hiel×str2
hiel×str3
sublunch     -0.547**   -0.398**
             (0.022)    (0.030)
log(avginc)             11.57**
                        (1.74)
Intercept    700.2**    658.6**
             (4.7)      (7.7)
Adjusted R²  0.7729     0.7942

Let’s address (a): After controlling for differences in economic characteristics of

different districts, does the effect of str on score depend on the fraction of

English learners (pctel)?

An easier way to examine this might be to create a dummy variable.

Let’s define a new variable (high percentage of English learners):

hiel = 0 for classes with small percentage of English learners

hiel = 1 for classes with large percentage of English learners

How should we determine the threshold?

> summary(pctel)

Min. 1st Qu. Median Mean 3rd Qu. Max.

0.000 1.941 8.778 15.770 22.970 85.540

Create hiel:

> hiel = rep(0, length(pctel))   # full-length vector of zeros, so no element is left as NA
> hiel[pctel >= 10] = 1

To address (a), create the interaction term:

hielstr = hiel*str
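The dummy and the interaction term can also be built in one vectorised step. The sketch below applies this to the six pctel and str values printed by head(teachdata); the _six suffixes are names made up for this example.

```r
pctel_six <- c(0.000000, 4.583333, 30.000002, 0.000000, 13.857677, 12.408759)
str_six   <- c(17.88991, 21.52466, 18.69723, 17.35714, 18.67133, 21.40625)

hiel_six    <- as.numeric(pctel_six >= 10)  # 1 if pctel >= 10, otherwise 0
hielstr_six <- hiel_six * str_six           # equals str when hiel = 1, otherwise 0

print(data.frame(pctel = pctel_six, hiel = hiel_six, hielstr = hielstr_six))
```

Comparing the whole vector at once gives a dummy with the same length as pctel and no missing values.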


Try a regression without economic controls:

> eq3 = lm(score ~ str + hiel + hielstr)

> summary(eq3)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 682.2458    10.5109  64.908   <2e-16 ***
str          -0.9685     0.5398  -1.794   0.0735 .
hiel          5.6391    16.7177   0.337   0.7360
hielstr      -1.2766     0.8441  -1.512   0.1312

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1




Which coefficient should we be testing to see if str has a different effect for

classes with many English learners? What do we conclude?

In anticipation of (c), let’s test if str matters. Does it appear to matter from the

results above?
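In a model with a dummy interaction, the slope on str is the str coefficient when hiel = 0 and the sum of the str and hielstr coefficients when hiel = 1. A small sketch using the estimates from summary(eq3); the variable names are illustrative.

```r
b_str      <- -0.9685   # coefficient on str
b_hielstr  <- -1.2766   # coefficient on hielstr
se_hielstr <-  0.8441   # standard error of the hielstr coefficient

effect_low_el  <- b_str               # estimated effect of str when hiel = 0
effect_high_el <- b_str + b_hielstr   # estimated effect of str when hiel = 1
t_interaction  <- b_hielstr / se_hielstr

print(c(effect_low_el, effect_high_el, t_interaction))
```

The difference between the two slopes is exactly the hielstr coefficient, so its t-statistic is the natural test of whether the effect differs between the groups.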

H0: student-teacher ratio has no effect on test scores (the coefficients on str and hielstr are both zero)

HA: model (3)

The model under the null hypothesis is:

> eqnul1 = lm(score ~ hiel)

> summary(eqnul1)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  663.482      1.068  621.16   <2e-16 ***
hiel         -20.400      1.580  -12.91   <2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1




Formula for F-statistic:

$$F = \frac{(R_U^2 - R_R^2)/q}{(1 - R_U^2)/(n - k_U - 1)}$$

$$F = \frac{(0.3103 - 0.2852)/2}{(1 - 0.3103)/(420 - 3 - 1)} = 7.57$$

Since this is greater than the 5% critical value of 3.00, we reject the null.
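The arithmetic can be reproduced directly in R. This is a minimal sketch using the rounded R-squared values quoted in the formula; the variable names are chosen here for illustration, and qf() is used to obtain the exact 5% critical value instead of the large-sample value of 3.00.

```r
r2_u <- 0.3103   # R-squared of the unrestricted model (3)
r2_r <- 0.2852   # R-squared of the restricted model, score ~ hiel
q    <- 2        # restrictions: coefficients on str and hielstr
n    <- 420      # number of observations
k_u  <- 3        # regressors in the unrestricted model

f_stat <- ((r2_u - r2_r) / q) / ((1 - r2_u) / (n - k_u - 1))
f_crit <- qf(0.95, df1 = q, df2 = n - k_u - 1)  # exact 5% critical value

print(round(c(f_stat = f_stat, f_crit = f_crit), 2))
```

Because the R-squared values are rounded to four digits, the result can differ slightly from an F-statistic computed from the unrounded sums of squares.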

Alternatively, use the following R-code to perform the test:

> anova(eq3,eqnul1)

Analysis of Variance Table

Model 1: score ~ str + hiel + hielstr

Model 2: score ~ hiel

Res.Df RSS Df Sum of Sq F Pr(>F)

1 416 104904

2 418 108734 -2 -3830.3 7.5947 0.000576 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Let’s try a model with economic controls.

> eq4 = lm(score ~ str + hiel + hielstr + sublunch + log(avginc))

> summary(eq4)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 653.66612    8.89113  73.519  < 2e-16 ***
str          -0.53103    0.30039  -1.768   0.0778 .
hiel          5.49821    9.13897   0.602   0.5478
hielstr      -0.57767    0.46463  -1.243   0.2145
sublunch     -0.41138    0.02869 -14.337  < 2e-16 ***
log(avginc)  12.12447    1.76513   6.869 2.38e-11 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1




Has the conclusion (about a different effect for classes with many English

learners) changed?


Again, let’s test the null that str doesn’t matter.

Restricted model:

> eqnul2 = lm(score ~ hiel + sublunch + log(avginc))

> anova(eq4,eqnul2)


Analysis of Variance Table

Model 1: score ~ str + hiel + hielstr + sublunch + log(avginc)
Model 2: score ~ hiel + sublunch + log(avginc)
  Res.Df   RSS Df Sum of Sq      F   Pr(>F)
1    414 30824
2    416 31784 -2   -960.78 6.4523 0.001740 **

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
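The F-statistic in the anova table can also be written in terms of residual sums of squares: the drop in RSS per restriction, divided by the unrestricted RSS per residual degree of freedom. The sketch below recomputes it from the numbers printed in the table; the variable names are illustrative.

```r
rss_u  <- 30824    # residual sum of squares of the unrestricted model (eq4)
df_u   <- 414      # its residual degrees of freedom
sum_sq <- 960.78   # RSS(restricted) - RSS(unrestricted), from the table
q      <- 2        # restrictions: coefficients on str and hielstr

f_stat  <- (sum_sq / q) / (rss_u / df_u)
p_value <- pf(f_stat, df1 = q, df2 = df_u, lower.tail = FALSE)

print(c(f_stat = f_stat, p_value = p_value))
```

This is algebraically the same test as the R-squared version of the formula, since both models share the same total sum of squares.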

Regressor    (1)        (2)        (3)        (4)        (5)        (6)        (7)
str          -1.00**    -0.73**    -0.97      -0.53
             (0.24)     (0.23)     (0.54)     (0.30)
str2
str3
pctel        -0.122**   -0.176**
             (0.033)    (0.032)
hiel                               5.64       5.50
                                   (16.7)     (9.1)
hiel×str                           -1.28      -0.58
                                   (0.84)     (0.47)
hiel×str2
hiel×str3
sublunch     -0.547**   -0.398**              -0.411**
             (0.022)    (0.030)               (0.029)
log(avginc)             11.57**               12.12**
                        (1.74)                (1.8)
Intercept    700.2**    658.6**    682.2**    653.7**
             (4.7)      (7.7)      (10.5)     (8.9)
Adjusted R²  0.7729     0.7942     0.3054     0.7949

Now let’s address (b): is the relationship between str and score non-linear?

> str2 = str^2

> str3 = str^3

> eq5 = lm(score ~ str + str2 + str3 + hiel + sublunch +

log(avginc))

> summary(eq5)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 252.05089  165.82433   1.520  0.12928
str          64.33886   25.46223   2.527  0.01188 *
str2         -3.42388    1.29374  -2.646  0.00844 **
str3          0.05929    0.02174   2.728  0.00665 **
hiel         -5.47399    1.03187  -5.305 1.84e-07 ***
sublunch     -0.42006    0.02814 -14.928  < 2e-16 ***
log(avginc)  11.74818    1.73446   6.773 4.34e-11 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1




Regressor    (1)        (2)        (3)        (4)        (5)        (6)        (7)
str          -1.00**    -0.73**    -0.97      -0.53      64.33**
             (0.24)     (0.23)     (0.54)     (0.30)     (25.5)
str2                                                     -3.42**
                                                         (1.29)
str3                                                     0.059**
                                                         (0.022)
pctel        -0.122**   -0.176**
             (0.033)    (0.032)
hiel                               5.64       5.50       -5.47**
                                   (16.7)     (9.1)      (1.03)
hiel×str                           -1.28      -0.58
                                   (0.84)     (0.47)
hiel×str2
hiel×str3
sublunch     -0.547**   -0.398**              -0.411**   -0.420**
             (0.022)    (0.030)               (0.029)    (0.028)
log(avginc)             11.57**               12.12**    11.75**
                        (1.74)                (1.8)      (1.7)
Intercept    700.2**    658.6**    682.2**    653.7**    252.0
             (4.7)      (7.7)      (10.5)     (8.9)      (165.8)
Adjusted R²  0.7729     0.7942     0.3054     0.7949     0.7982

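To test the null hypothesis that the relationship between str and score is linear, the restricted model should keep str and drop only str2 and str3 (two restrictions). The restricted model estimated below instead drops str, str2 and str3 together (three restrictions), so the F-test it produces is of the null that str does not enter model (5) at all: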

> eqnul3 = lm(score ~ hiel + sublunch + log(avginc))

> anova(eq5,eqnul3)


Analysis of Variance Table

Model 1: score ~ str + str2 + str3 + hiel + sublunch + log(avginc)
Model 2: score ~ hiel + sublunch + log(avginc)
  Res.Df   RSS Df Sum of Sq      F    Pr(>F)
1    413 30257
2    416 31784 -3   -1527.7 6.9512 0.0001424 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

What do you conclude?

What other way might you try to capture this non-linear effect?

How would you test to see if str matters, using model (5)?
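A test of linearity alone keeps the linear term in the restricted model and drops only the squared and cubed terms, giving two restrictions. The sketch below shows the mechanics on simulated data; x, z and y are hypothetical stand-ins (not the districts data), and the outcome is generated as truly linear in x.

```r
set.seed(42)
n <- 420
x <- runif(n, 14, 26)                           # hypothetical stand-in for str
z <- rnorm(n)                                   # hypothetical control variable
y <- 700 - 1.5 * x + 3 * z + rnorm(n, sd = 10)  # linear in x by construction

unrestricted <- lm(y ~ x + I(x^2) + I(x^3) + z)
restricted   <- lm(y ~ x + z)   # keeps x; drops only the squared and cubed terms

lin_test <- anova(restricted, unrestricted)
print(lin_test)
```

A large p-value in the second row would mean linearity cannot be rejected; with the districts data the analogous restricted model would be score ~ str + hiel + sublunch + log(avginc).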

Let’s reconsider (a) under the cubic specification. We want to know if the effect of str on score is different for classes with a high percentage of English learners.

Again, the strategy is:

- interact the dummy variable hiel with all terms involving str
- this allows the “marginal effect” of str to differ between the two groups
- testing whether the coefficients on the interaction terms are jointly equal to zero is equivalent to testing that there is no difference between the two groups

Create the new interaction terms:

hielstr2 = hiel*str2

hielstr3 = hiel*str3

Add the interaction terms to model (5):

eq6 = lm(score ~ str + str2 + str3 + hiel + hielstr + hielstr2 +

hielstr3 + sublunch + log(avginc))

Regressor    (1)        (2)        (3)        (4)        (5)        (6)        (7)
str          -1.00**    -0.73**    -0.97      -0.53      64.33**    83.70**
             (0.24)     (0.23)     (0.54)     (0.30)     (25.5)     (29.69)
str2                                                     -3.42**    -4.38**
                                                         (1.29)     (1.51)
str3                                                     0.059**    0.075**
                                                         (0.022)    (0.025)
pctel        -0.122**   -0.176**
             (0.033)    (0.032)
hiel                               5.64       5.50       -5.47**    816.1*
                                   (16.7)     (9.1)      (1.03)     (434.61)
hiel×str                           -1.28      -0.58                 -123.3*
                                   (0.84)     (0.47)                (66.35)
hiel×str2                                                           6.12*
                                                                    (3.35)
hiel×str3                                                           -0.101*
                                                                    (0.056)
sublunch     -0.547**   -0.398**              -0.411**   -0.420**   -0.418**
             (0.022)    (0.030)               (0.029)    (0.028)    (0.029)
log(avginc)             11.57**               12.12**    11.75**    11.80**
                        (1.74)                (1.8)      (1.7)      (1.75)
Intercept    700.2**    658.6**    682.2**    653.7**    252.0      122.4
             (4.7)      (7.7)      (10.5)     (8.9)      (165.8)    (192.2)
Adjusted R²  0.7729     0.7942     0.3054     0.7949     0.7982     0.7988

How do we test (a) using model (6)?

> anova(eq6,eq5)


Analysis of Variance Table

Model 1: score ~ str + str2 + str3 + hiel + hielstr + hielstr2 + hielstr3 +
    sublunch + log(avginc)
Model 2: score ~ str + str2 + str3 + hiel + sublunch + log(avginc)
  Res.Df   RSS Df Sum of Sq      F Pr(>F)
1    410 29954
2    413 30257 -3   -302.33 1.3794 0.2485

So, once again, we can’t reject the null that the effect of str on score is the same regardless of the number of English learners.

This suggests that the interaction terms are not needed, and model (5) is adequate.

For a final model, let’s make sure that our results are invariant to the use of hiel or pctel.

> eq7 = lm(score ~ str + str2 + str3 + pctel + sublunch + log(avginc))

Regressor    (1)        (2)        (3)        (4)        (5)        (6)        (7)
str          -1.00**    -0.73**    -0.97      -0.53      64.33**    83.70**    65.29**
             (0.24)     (0.23)     (0.54)     (0.30)     (25.5)     (29.69)    (25.48)
str2                                                     -3.42**    -4.38**    -3.47**
                                                         (1.29)     (1.51)     (1.30)
str3                                                     0.059**    0.075**    0.060**
                                                         (0.022)    (0.025)    (0.022)
pctel        -0.122**   -0.176**                                               -0.166**
             (0.033)    (0.032)                                                (0.032)
hiel                               5.64       5.50       -5.47**    816.1*
                                   (16.7)     (9.1)      (1.03)     (434.61)
hiel×str                           -1.28      -0.58                 -123.3*
                                   (0.84)     (0.47)                (66.35)
hiel×str2                                                           6.12*
                                                                    (3.35)
hiel×str3                                                           -0.101*
                                                                    (0.056)
sublunch     -0.547**   -0.398**              -0.411**   -0.420**   -0.418**   -0.402**
             (0.022)    (0.030)               (0.029)    (0.028)    (0.029)    (0.030)
log(avginc)             11.57**               12.12**    11.75**    11.80**    11.51**
                        (1.74)                (1.8)      (1.7)      (1.75)     (1.73)
Intercept    700.2**    658.6**    682.2**    653.7**    252.0      122.4      244.8
             (4.7)      (7.7)      (10.5)     (8.9)      (165.8)    (192.2)    (165.9)
Adjusted R²  0.7729     0.7942     0.3054     0.7949     0.7982     0.7988     0.7978

Summary

(a) Based on hypothesis tests involving models (3), (4) and (6), there doesn’t

appear to be a substantial difference in the effect of str on score for classes with

many English learners.

(b) A hypothesis test involving model (5) indicates the relationship between str

and score is non-linear.

(c) Using F-tests, the null hypothesis that str has no effect on score is rejected in all models. (Only some of these F-tests were shown.) Models (5) and (7) should be our preferred models based on the sequence of testing. Let’s use them to provide a “policy recommendation.”

If str = 20, then reducing str to 18 would improve score by 3.00 points using model (5), and by 2.93 points using model (7).

If str = 22, then reducing str to 20 would improve score by 1.93 points (model 5) or 1.90 points (model 7).
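The model (5) figures can be checked from the estimated cubic. Because the change in str is the only thing that differs between the two predictions, the intercept and the hiel, sublunch and log(avginc) terms cancel. A minimal sketch using the coefficients printed by summary(eq5); the function name str_part is introduced here for illustration.

```r
# Coefficients on str, str2 and str3 from summary(eq5)
b1 <- 64.33886
b2 <- -3.42388
b3 <- 0.05929

# Part of the predicted score that depends on str; all other terms cancel
# when two predictions for the same district are differenced.
str_part <- function(s) b1 * s + b2 * s^2 + b3 * s^3

gain_20_to_18 <- str_part(18) - str_part(20)
gain_22_to_20 <- str_part(20) - str_part(22)

print(round(c(gain_20_to_18, gain_22_to_20), 2))
```

The same two-unit reduction is predicted to help more when starting from 20 than from 22, which is what the non-linear specification allows. Reproducing the model (7) figures would need its unrounded coefficients, which are not printed in the slides.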
