Polynomial regression models: possible models for when the response function is “curved”


Upload: adele-fox

Post on 23-Dec-2015


Polynomial regression models

Possible models for when the response function is “curved”


Uses of polynomial models

• When the true response function really is a polynomial function.

• (Very common!) When the true response function is unknown or complex, but a polynomial function approximates the true function well.

Example

• What is the impact of exercise on the human immune system?

• Is the amount of immunoglobulin in blood (y) related to maximal oxygen uptake (x) in a curved manner?

[Figure: Scatter plot of immunoglobulin (mg) and maximal oxygen uptake (ml/kg)]

A quadratic polynomial regression function

$Y_i = \beta_0 + \beta_1 X_i + \beta_{11} X_i^2 + \varepsilon_i$

where:

• Yi = amount of immunoglobulin in blood (mg)

• Xi = maximal oxygen uptake (ml/kg)

• εi follows the typical assumptions about error terms (“INE”: independent, normally distributed, equal variances)

Estimated quadratic function

[Figure: Regression plot of igg versus oxygen with the fitted quadratic curve]

igg = -1464.40 + 88.3071 oxygen - 0.536247 oxygen**2

S = 106.427   R-Sq = 93.8%   R-Sq(adj) = 93.3%
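The estimates on this slide come from statistical software. As a minimal sketch of the underlying least-squares computation, the Python below fits a quadratic by solving the normal equations. The helper name `fit_polynomial` and the data are hypothetical (the raw IgG data are not reproduced in this transcript), and a numerically sturdier routine would be preferable for real work.

```python
def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit via the normal equations (X'X) b = X'y.

    Returns [b0, b1, ..., b_degree] for b0 + b1*x + ... + b_degree*x**degree.
    Adequate for small, well-scaled illustrations only.
    """
    k = degree + 1
    # Augmented matrix [X'X | X'y], built from sums of powers of x.
    a = [[sum(xi ** (r + c) for xi in x) for c in range(k)]
         + [sum(yi * xi ** r for xi, yi in zip(x, y))]
         for r in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    b = [0.0] * k
    for r in reversed(range(k)):
        tail = sum(a[r][c] * b[c] for c in range(r + 1, k))
        b[r] = (a[r][k] - tail) / a[r][r]
    return b


# Hypothetical, noise-free data generated from y = 2 + 3x - 0.5x^2.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2 + 3 * xi - 0.5 * xi ** 2 for xi in x]

b0, b1, b11 = fit_polynomial(x, y, degree=2)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b11 = {b11:.4f}")
```

Because the hypothetical data contain no noise, the fit should recover the generating coefficients up to rounding error.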

Interpretation of the regression coefficients

• If 0 is a possible x value, then b0 is the predicted response at x = 0. Otherwise, b0 has no meaningful interpretation.

• b1 does not have a very helpful interpretation. It is the slope of the tangent line at x = 0.

• b2 indicates the up/down direction of the curve

– b2 < 0 means the curve is concave down

– b2 > 0 means the curve is concave up
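To make the tangent-line reading of b1 concrete, here is a small sketch that differentiates the fitted quadratic. The coefficients are the ones reported on the "Estimated quadratic function" slide; the function name `tangent_slope` is introduced here for illustration, and 50.637 is the mean of oxygen reported elsewhere in the transcript.

```python
# Coefficients of the estimated quadratic function reported in the slides.
b0, b1, b2 = -1464.40, 88.3071, -0.536247


def tangent_slope(oxygen):
    """Slope of b0 + b1*x + b2*x**2 at x = oxygen, i.e. its derivative."""
    return b1 + 2 * b2 * oxygen


slope_at_zero = tangent_slope(0.0)      # identical to b1
slope_at_mean = tangent_slope(50.637)   # slope at the mean oxygen value

print(f"slope at x = 0:      {slope_at_zero:.4f}")
print(f"slope at x = 50.637: {slope_at_mean:.4f}")
print("concave down" if b2 < 0 else "concave up")
```

The slope at x = 0 equals b1 exactly, which is why b1 says little when x = 0 lies far from the observed data.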

The regression equation is
igg = - 1464 + 88.3 oxygen - 0.536 oxygensq

Predictor   Coef      SE Coef   T       P       VIF
Constant    -1464.4   411.4     -3.56   0.001
oxygen      88.31     16.47     5.36    0.000   99.9
oxygensq    -0.5362   0.1582    -3.39   0.002   99.9

S = 106.4   R-Sq = 93.8%   R-Sq(adj) = 93.3%

Analysis of Variance

Source           DF   SS        MS        F        P
Regression       2    4602211   2301105   203.16   0.000
Residual Error   27   305818    11327
Total            29   4908029

Source     DF   Seq SS
oxygen     1    4472047
oxygensq   1    130164
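The summary statistics in this output can be recovered from the Analysis of Variance entries. The sketch below redoes that arithmetic in Python, using only numbers copied from the output; the variable names are my own.

```python
import math

# Values copied from the Analysis of Variance output.
ss_regression, df_regression = 4602211, 2
ss_error, df_error = 305818, 27
ss_total, df_total = 4908029, 29
seq_ss_oxygensq = 130164  # sequential SS for oxygensq, given oxygen

mse = ss_error / df_error                    # mean squared error
s = math.sqrt(mse)                           # "S" in the output
r_sq = ss_regression / ss_total              # R-Sq
r_sq_adj = 1 - mse / (ss_total / df_total)   # R-Sq(adj)
f_overall = (ss_regression / df_regression) / mse

# With one numerator degree of freedom, the partial F for the quadratic
# term is the square of its t statistic.
f_quadratic = seq_ss_oxygensq / mse
abs_t_quadratic = math.sqrt(f_quadratic)

print(f"S = {s:.1f}, R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%")
print(f"F = {f_overall:.2f}, |t| for oxygensq = {abs_t_quadratic:.2f}")
```

The last line ties the sequential sum of squares for oxygensq to the T value of -3.39 reported for that predictor.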

A multicollinearity problem

[Figure: Scatter plot of oxygensq versus oxygen]

Pearson correlation of oxygen and oxygensq = 0.995

“Center” the predictors

Mean of oxygen = 50.637

$\text{OxCent} = \text{Oxygen} - 50.637$

$\text{OxCentSq} = (\text{Oxygen} - 50.637)^2$

oxygen   oxcent    oxcentsq
34.6     -16.037   257.185
45.0     -5.637    31.776
62.3     11.663    136.026
58.9     8.263     68.277
42.5     -8.137    66.211
44.3     -6.337    40.158
67.9     17.263    298.011
58.5     7.863     61.827
35.6     -15.037   226.111
49.6     -1.037    1.075
33.0     -17.637   311.064
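The centering step is simple enough to reproduce directly. This sketch recomputes the oxcent and oxcentsq columns for the eleven oxygen values listed on the slide, using the reported mean of 50.637 (which is the mean of the full data set, not of these eleven rows).

```python
# Mean of oxygen reported on the slide (computed from the full data set).
oxygen_mean = 50.637

# The eleven oxygen values listed on the slide.
oxygen = [34.6, 45.0, 62.3, 58.9, 42.5, 44.3, 67.9, 58.5, 35.6, 49.6, 33.0]

oxcent = [x - oxygen_mean for x in oxygen]   # centered predictor
oxcentsq = [c ** 2 for c in oxcent]          # square of the centered predictor

print("oxygen    oxcent  oxcentsq")
for x, c, c2 in zip(oxygen, oxcent, oxcentsq):
    print(f"{x:6.1f}  {c:8.3f}  {c2:8.3f}")
```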

Does it really work?

[Figure: Scatter plot of oxcentsq versus oxcent]

Pearson correlation of oxcent and oxcentsq = 0.219
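The effect of centering on the correlation can be checked by hand. The sketch below computes Pearson correlations before and after centering, but only for the eleven oxygen values listed on the centering slide. Because the slide correlations (0.995 and 0.219) come from the full data set, the numbers here will differ somewhat; the point is the size of the drop. The `pearson` helper is my own.

```python
import math


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / math.sqrt(s_uu * s_vv)


oxygen_mean = 50.637  # mean of the full data set, as reported on the slide
oxygen = [34.6, 45.0, 62.3, 58.9, 42.5, 44.3, 67.9, 58.5, 35.6, 49.6, 33.0]

oxygensq = [x ** 2 for x in oxygen]
oxcent = [x - oxygen_mean for x in oxygen]
oxcentsq = [c ** 2 for c in oxcent]

r_raw = pearson(oxygen, oxygensq)
r_centered = pearson(oxcent, oxcentsq)
print(f"raw:      r = {r_raw:.3f}")
print(f"centered: r = {r_centered:.3f}")
```

Centering removes most of the linear association because the squared term no longer rises steadily with the predictor: it is large at both ends of the centered range and small in the middle.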

A better quadratic polynomial regression function

$Y_i = \beta_0^* + \beta_1^* x_i + \beta_{11}^* x_i^2 + \varepsilon_i$

where $x_i = X_i - \bar{X}$ denotes the centered predictor, and

β*0 = mean response at the predictor mean

β*1 = “linear effect coefficient”

β*11 = “quadratic effect coefficient”

The regression equation is
igg = 1632 + 34.0 oxcent - 0.536 oxcentsq

Predictor   Coef      SE Coef   T       P       VIF
Constant    1632.20   29.35     55.61   0.000
oxcent      34.000    1.689     20.13   0.000   1.1
oxcentsq    -0.5362   0.1582    -3.39   0.002   1.1

S = 106.4   R-Sq = 93.8%   R-Sq(adj) = 93.3%

Analysis of Variance

Source           DF   SS        MS        F        P
Regression       2    4602211   2301105   203.16   0.000
Residual Error   27   305818    11327
Total            29   4908029

Source     DF   Seq SS
oxcent     1    4472047
oxcentsq   1    130164

Interpretation of the regression coefficients

• b0 is the predicted response at the predictor mean.

• b1 is the estimated slope of the tangent line at the predictor mean and is typically close to the estimated slope in the simple linear model.

• b2 indicates the up/down direction of curve

– b2 < 0 means curve is concave down

– b2 > 0 means curve is concave up

Estimated regression function

[Figure: Regression plot of igg versus oxcent with the fitted quadratic curve]

igg = 1632.20 + 33.9995 oxcent - 0.536247 oxcent**2

S = 106.427   R-Sq = 93.8%   R-Sq(adj) = 93.3%

Similar estimates

[Figure: Regression plot of igg versus oxcent with the fitted straight line]

igg = 1557.63 + 32.7427 oxcent

S = 124.783   R-Sq = 91.1%   R-Sq(adj) = 90.8%

The relationship between the two forms of the model

Centered model: $\hat{Y}_i = b_0^* + b_1^* x_i + b_{11}^* x_i^2$

Original model: $\hat{Y}_i = b_0 + b_1 X_i + b_{11} X_i^2$

Where:

$b_0 = b_0^* - b_1^* \bar{X} + b_{11}^* \bar{X}^2$

$b_1 = b_1^* - 2 b_{11}^* \bar{X}$

$b_{11} = b_{11}^*$

Centered model: $\hat{Y}_i = 1632.2 + 34.0 x_i - 0.5362 x_i^2$

$b_0 = 1632.2 - 34(50.637) - 0.5362(50.637)^2 = -1464.3$

$b_1 = 34 - 2(-0.5362)(50.637) = 88.3$

$b_{11} = -0.5362$

Original model: $\hat{Y}_i = -1464.4 + 88.31 X_i - 0.536 X_i^2$

Mean of oxygen = 50.637
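The conversion between the two parameterizations follows from expanding the centered model in powers of X. As a rough check, this sketch applies that conversion to the centered estimates reported in the regression plot (1632.20, 33.9995, -0.536247) and confirms that both forms give the same prediction. The function and variable names are introduced here for illustration.

```python
x_bar = 50.637  # mean of oxygen

# Estimates from the centered fit.
b0_star, b1_star, b11_star = 1632.20, 33.9995, -0.536247

# Expand b0* + b1*(X - x_bar) + b11*(X - x_bar)^2 and collect powers of X.
b0 = b0_star - b1_star * x_bar + b11_star * x_bar ** 2
b1 = b1_star - 2 * b11_star * x_bar
b11 = b11_star


def centered_fit(oxygen):
    x = oxygen - x_bar
    return b0_star + b1_star * x + b11_star * x ** 2


def original_fit(oxygen):
    return b0 + b1 * oxygen + b11 * oxygen ** 2


print(f"b0 = {b0:.1f}, b1 = {b1:.2f}, b11 = {b11:.4f}")
print(centered_fit(60.0), original_fit(60.0))
```

Small differences from the printed Minitab coefficients are due to rounding in the reported mean and estimates.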

[Figure: Residuals versus the fitted values (response is igg)]

[Figure: Normal probability plot of the residuals (response is igg)]

What is the predicted IgG if maximal oxygen uptake is 90?

There is an even greater danger in extrapolation when modeling data with a polynomial function, because of changes in direction.

Predicted Values for New Observations

New Obs   Fit      SE Fit   95.0% CI             95.0% PI
1         2139.6   219.2    (1689.8, 2589.5)     (1639.6, 2639.7) XX

X denotes a row with X values away from the center
XX denotes a row with very extreme X values

Values of Predictors for New Observations

New Obs   oxcent   oxcentsq
1         39.4     1549
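The "changes in direction" warning can be made concrete with the centered fit from the regression plot. This sketch reproduces the fitted value at an oxygen uptake of 90 and locates where the fitted parabola turns downward; the names `predicted_igg` and `turning_point` are my own. The plot axes suggest the observed oxygen values run from roughly 30 to 70, so both 90 and the turning point lie outside the data.

```python
oxygen_mean = 50.637
b0, b1, b11 = 1632.20, 33.9995, -0.536247  # centered fit


def predicted_igg(oxygen):
    oxcent = oxygen - oxygen_mean
    return b0 + b1 * oxcent + b11 * oxcent ** 2


fit_at_90 = predicted_igg(90.0)

# The fitted curve peaks where its slope, b1 + 2*b11*oxcent, equals zero.
turning_point = oxygen_mean - b1 / (2 * b11)

print(f"fit at oxygen = 90: {fit_at_90:.1f}")
print(f"curve turns downward at oxygen = {turning_point:.1f}")
print(f"fit at the turning point: {predicted_igg(turning_point):.1f}")
```

Beyond the turning point the model predicts that IgG falls as oxygen uptake rises, a pattern that no observed data support.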

It is possible to “overfit” the data with polynomial models.

[Figure: Regression plot of y versus x with the fitted cubic curve]

y = -38.4 + 34.9762 x - 8.64286 x**2 + 0.666667 x**3

S = 2.62950   R-Sq = 64.0%   R-Sq(adj) = 0.0%

It is even theoretically possible to fit the data perfectly.

If you have n data points with distinct x values, then a polynomial of order n-1 will fit the data perfectly; that is, it will pass through each data point.

But good statistical software will keep an unsuspecting user from fitting such a model:

** Error ** Not enough non-missing observations to fit a polynomial of this order; execution aborted
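The perfect-fit claim can be illustrated without any regression software. The sketch below builds the polynomial of order n-1 through n points in Lagrange form; since its residuals are all zero, a least-squares fit of that order would coincide with it. The five data points and the helper name `interpolating_polynomial` are hypothetical.

```python
def interpolating_polynomial(xs, ys):
    """Return p(x), the polynomial of order len(xs) - 1 through all points.

    Uses the Lagrange form and assumes the x values are distinct.
    """
    def p(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return p


# Hypothetical data: n = 5 points, so the polynomial has order 4.
xs = [2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.0, 7.0, 2.0, 8.0, 4.0]

p = interpolating_polynomial(xs, ys)
residuals = [yi - p(xi) for xi, yi in zip(xs, ys)]
print("residuals:", residuals)
print("value between data points, p(2.5):", p(2.5))
```

With zero residuals there are no degrees of freedom left to estimate the error variance, which is one reason a package may refuse to fit such a model.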

The hierarchical approach to model fitting

A widely accepted approach is to fit a higher-order model and then explore whether a lower-order (simpler) model is adequate.

$Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \beta_{111} x_i^3 + \varepsilon_i$

Is a first-order linear model (“line”) adequate?

$H_0: \beta_{11} = \beta_{111} = 0$
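One common way to test such a hypothesis is the general linear F-test, which compares the error sum of squares of the reduced model (here, the line) with that of the full model (here, the cubic). The slides do not spell out the statistic, so the form below is the standard textbook one, with SSE(R), SSE(F), df_R, and df_F as notation introduced here for the reduced and full models:

$$F^* = \frac{\left[\mathrm{SSE}(R) - \mathrm{SSE}(F)\right] / (df_R - df_F)}{\mathrm{SSE}(F) / df_F}$$

For the cubic-versus-line comparison with n observations, df_R = n - 2 and df_F = n - 4, so the numerator has 2 degrees of freedom. A large value of F* is evidence against the null hypothesis, that is, evidence that the line alone is not adequate.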

The hierarchical approach to model fitting (continued)

But then … if a polynomial term of a given order is retained, then all related lower-order terms are also retained.

That is, if a quadratic term was significant, you would use this regression function:

$E(Y_i) = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2$

and not this one:

$E(Y_i) = \beta_0 + \beta_{11} x_i^2$


Example

• Quality of a product (y) – a score between 0 and 100

• Temperature (x1) – degrees Fahrenheit

• Pressure (x2) – pounds per square inch

[Figure: Matrix plot of quality, temp, and pressure]

A two-predictor, second-order polynomial regression function

$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_{11} X_{i1}^2 + \beta_{22} X_{i2}^2 + \beta_{12} X_{i1} X_{i2} + \varepsilon_i$

where:

• Yi = quality

• Xi1 = temperature

• Xi2 = pressure

• β12 = “interaction effect coefficient”

The regression equation is
quality = - 5128 + 31.1 temp + 140 pressure - 0.133 tempsq - 1.14 presssq - 0.145 tp

Predictor   Coef        SE Coef    T        P       VIF
Constant    -5127.9     110.3      -46.49   0.000
temp        31.096      1.344      23.13    0.000   1154.5
pressure    139.747     3.140      44.50    0.000   1574.5
tempsq      -0.133389   0.006853   -19.46   0.000   973.0
presssq     -1.14422    0.02741    -41.74   0.000   1453.0
tp          -0.145500   0.009692   -15.01   0.000   304.0

S = 1.679   R-Sq = 99.3%   R-Sq(adj) = 99.1%

Again, some correlation

           quality   temp    pressure   tempsq   presssq
temp       -0.423
pressure   0.182     0.000
tempsq     -0.434    0.999   0.000
presssq    0.162     0.000   1.000      -0.000
tp         -0.227    0.773   0.632      0.772    0.632

Cell Contents: Pearson correlation

A better two-predictor, second-order polynomial regression function

$Y_i = \beta_0^* + \beta_1^* x_{i1} + \beta_2^* x_{i2} + \beta_{11}^* x_{i1}^2 + \beta_{22}^* x_{i2}^2 + \beta_{12}^* x_{i1} x_{i2} + \varepsilon_i$

where:

• Yi = quality

• xi1 = centered temperature

• xi2 = centered pressure

• β*12 = “interaction effect coefficient”

Reduced correlation

          quality   tcent    pcent   tpcent   tcentsq
tcent     -0.423
pcent     0.182     0.000
tpcent    -0.274    0.000    0.000
tcentsq   -0.355    -0.000   0.000   0.000
pcentsq   -0.762    0.000    0.000   0.000    -0.000

Cell Contents: Pearson correlation

The regression equation is
quality = 94.9 - 0.916 tcent + 0.788 pcent - 0.146 tpcent - 0.133 tcentsq - 1.14 pcentsq

Predictor   Coef        SE Coef    T        P       VIF
Constant    94.9259     0.7224     131.40   0.000
tcent       -0.91611    0.03957    -23.15   0.000   1.0
pcent       0.78778     0.07913    9.95     0.000   1.0
tpcent      -0.145500   0.009692   -15.01   0.000   1.0
tcentsq     -0.133389   0.006853   -19.46   0.000   1.0
pcentsq     -1.14422    0.02741    -41.74   0.000   1.0

S = 1.679   R-Sq = 99.3%   R-Sq(adj) = 99.1%

[Figure: Residuals versus the fitted values (response is quality)]

[Figure: Normal probability plot of the residuals (response is quality)]

Predicted Values for New Observations

New Obs   Fit      SE Fit   95.0% CI           95.0% PI
1         94.926   0.722    (93.424, 96.428)   (91.125, 98.726)

Values of Predictors for New Observations

New Obs   tcent    pcent    tpcent   tcentsq   pcentsq
1         0.0000   0.0000   0.0000   0.0000    0.0000
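With every centered predictor equal to zero, the prediction is just the intercept of the centered fit. The sketch below evaluates the fitted second-order function from the coefficients in the centered regression output. It also shows what the interaction coefficient means in practice: the slope with respect to temperature changes with pressure. The function names are hypothetical.

```python
# Coefficients of the centered two-predictor fit reported in the output.
COEF = {
    "constant": 94.9259,
    "tcent": -0.91611,
    "pcent": 0.78778,
    "tpcent": -0.145500,
    "tcentsq": -0.133389,
    "pcentsq": -1.14422,
}


def predicted_quality(tcent, pcent):
    """Fitted quality at centered temperature and centered pressure."""
    return (COEF["constant"]
            + COEF["tcent"] * tcent
            + COEF["pcent"] * pcent
            + COEF["tpcent"] * tcent * pcent
            + COEF["tcentsq"] * tcent ** 2
            + COEF["pcentsq"] * pcent ** 2)


def temperature_slope(tcent, pcent):
    """Partial derivative of the fitted surface with respect to tcent."""
    return COEF["tcent"] + COEF["tpcent"] * pcent + 2 * COEF["tcentsq"] * tcent


fit_at_means = predicted_quality(0.0, 0.0)
print(f"fit at the predictor means: {fit_at_means:.3f}")
print(f"temperature slope at pcent = 0: {temperature_slope(0.0, 0.0):.5f}")
print(f"temperature slope at pcent = 1: {temperature_slope(0.0, 1.0):.5f}")
```

The two slopes differ by exactly the tpcent coefficient, which is the sense in which that coefficient measures interaction.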