
Polynomial regression models

Possible models for when the response function is “curved”

Uses of polynomial models

• When the true response function really is a polynomial function.

• (Very common!) When the true response function is unknown or complex, but a polynomial function approximates the true function well.

Example

• What is the impact of exercise on the human immune system?

• Is the amount of immunoglobin in the blood (y) related to maximal oxygen uptake (x) in a curved manner?

[Scatter plot: immunoglobin (mg) versus maximal oxygen uptake (ml/kg)]

A quadratic polynomial regression function

Yi = β0 + β1 Xi + β11 Xi² + εi

where:

• Yi = amount of immunoglobin in blood (mg)

• Xi = maximal oxygen uptake (ml/kg)

• the error terms εi satisfy the typical “INE” assumptions (independent, normally distributed, with equal variances); see the fitting sketch below
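(A hedged aside, not part of the original Minitab analysis.) A minimal sketch of fitting such a quadratic function in Python with statsmodels; the oxygen and igg arrays below are hypothetical stand-ins for the actual 30 observations.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical stand-in data; replace with the actual maximal oxygen uptake
# (ml/kg) and immunoglobin (mg) measurements.
oxygen = np.array([34.6, 45.0, 62.3, 58.9, 42.5, 44.3, 67.9, 58.5, 35.6, 49.6, 33.0])
igg    = np.array([1180, 1510, 1890, 1850, 1440, 1500, 1950, 1840, 1230, 1660, 1140])

# Design matrix: intercept, linear term, and squared term
X = sm.add_constant(np.column_stack([oxygen, oxygen**2]))

fit = sm.OLS(igg, X).fit()
print(fit.summary())   # coefficients, standard errors, t-tests, R-Sq
```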

Estimated quadratic function

[Regression plot: igg versus oxygen, with the fitted quadratic curve]

igg = -1464.40 + 88.3071 oxygen - 0.536247 oxygen**2
S = 106.427   R-Sq = 93.8%   R-Sq(adj) = 93.3%

Interpretation of the regression coefficients

• If 0 is a possible x value, then b0 is the predicted response at x = 0. Otherwise, b0 has no meaningful interpretation.

• b1 does not have a very helpful interpretation. It is the slope of the tangent line at x = 0.

• b2 indicates the up/down direction of curve

– b2 < 0 means curve is concave down

– b2 > 0 means curve is concave up

The regression equation is
igg = -1464 + 88.3 oxygen - 0.536 oxygensq

Predictor    Coef      SE Coef    T        P       VIF
Constant    -1464.4    411.4     -3.56     0.001
oxygen       88.31     16.47      5.36     0.000   99.9
oxygensq    -0.5362    0.1582    -3.39     0.002   99.9

S = 106.4 R-Sq = 93.8% R-Sq(adj) = 93.3%

Analysis of Variance

Source           DF   SS        MS        F        P
Regression        2   4602211   2301105   203.16   0.000
Residual Error   27   305818    11327
Total            29   4908029

Source      DF   Seq SS
oxygen       1   4472047
oxygensq     1   130164

A multicollinearity problem

[Scatter plot: oxygensq versus oxygen]

Pearson correlation of oxygen and oxygensq = 0.995

“Center” the predictors

OxCent = Oxygen - 50.637

OxCentSq = (Oxygen - 50.637)²

Mean of oxygen = 50.637

oxygen   oxcent     oxcentsq
34.6    -16.037     257.185
45.0     -5.637      31.776
62.3     11.663     136.026
58.9      8.263      68.277
42.5     -8.137      66.211
44.3     -6.337      40.158
67.9     17.263     298.011
58.5      7.863      61.827
35.6    -15.037     226.111
49.6     -1.037       1.075
33.0    -17.637     311.064
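A brief sketch (in numpy) of the centering computation itself, using the oxygen values from the partial listing above and the reported sample mean of 50.637:

```python
import numpy as np

oxygen = np.array([34.6, 45.0, 62.3, 58.9, 42.5, 44.3,
                   67.9, 58.5, 35.6, 49.6, 33.0])   # partial listing from the slide

oxcent   = oxygen - 50.637   # center at the sample mean of the full data set
oxcentsq = oxcent**2         # square of the centered predictor

print(np.column_stack([oxygen, oxcent, oxcentsq]))   # reproduces the columns above
```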

Does it really work?

[Scatter plot: oxcentsq versus oxcent]

Pearson correlation of oxcent and oxcentsq = 0.219

A better quadratic polynomial regression function

Yi = β*0 + β*1 xi + β*11 xi² + εi

where xi = Xi - X̄ denotes the centered predictor, and

β*0 = mean response at the predictor mean

β*1 = “linear effect coefficient”

β*11 = “quadratic effect coefficient”

The regression equation is
igg = 1632 + 34.0 oxcent - 0.536 oxcentsq

Predictor    Coef       SE Coef    T        P       VIF
Constant     1632.20    29.35      55.61    0.000
oxcent       34.000     1.689      20.13    0.000   1.1
oxcentsq    -0.5362     0.1582    -3.39     0.002   1.1

S = 106.4 R-Sq = 93.8% R-Sq(adj) = 93.3%

Analysis of Variance

Source           DF   SS        MS        F        P
Regression        2   4602211   2301105   203.16   0.000
Residual Error   27   305818    11327
Total            29   4908029

Source      DF   Seq SS
oxcent       1   4472047
oxcentsq     1   130164
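The drop in the VIF column (roughly 99.9 before centering versus 1.1 after) can be checked directly; the sketch below uses statsmodels' variance_inflation_factor on simulated placeholder predictor values, so the exact numbers will differ from the output above.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
oxygen = rng.uniform(30, 70, size=30)   # placeholder predictor values

def quadratic_vifs(x):
    """VIFs for the linear and squared terms of a quadratic design matrix."""
    X = sm.add_constant(np.column_stack([x, x**2]))
    return [variance_inflation_factor(X, j) for j in range(1, X.shape[1])]

print(quadratic_vifs(oxygen))                   # large VIFs for the raw predictor and its square
print(quadratic_vifs(oxygen - oxygen.mean()))   # VIFs near 1 after centering
```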

Interpretation of the regression coefficients

• b0 is the predicted response at the predictor mean.

• b1 is the estimated slope of the tangent line at the predictor mean; typically, it is also close to the estimated slope in the simple linear model.

• b2 indicates the up/down direction of curve

– b2 < 0 means curve is concave down

– b2 > 0 means curve is concave up

Estimated regression function

[Regression plot: igg versus oxcent, with the fitted quadratic curve]

igg = 1632.20 + 33.9995 oxcent - 0.536247 oxcent**2
S = 106.427   R-Sq = 93.8%   R-Sq(adj) = 93.3%

Similar estimates

[Regression plot: igg versus oxcent, with the fitted straight line]

igg = 1557.63 + 32.7427 oxcent
S = 124.783   R-Sq = 91.1%   R-Sq(adj) = 90.8%

The relationship between the two forms of the model

Centered model:  Ŷi = b*0 + b*1 xi + b*11 xi²

Original model:  Ŷi = b0 + b1 Xi + b11 Xi²

Where:

b0 = b*0 - b*1 X̄ + b*11 X̄²
b1 = b*1 - 2 b*11 X̄
b11 = b*11

For this example, with mean of oxygen X̄ = 50.637:

Ŷi = 1632.2 + 34.0 xi - 0.5362 xi²

b0 = 1632.2 - 34.0(50.637) - 0.5362(50.637)² = -1464.3
b1 = 34.0 - 2(-0.5362)(50.637) = 88.3
b11 = -0.5362

Ŷi = -1464.4 + 88.31 Xi - 0.536 Xi²
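A quick numeric check of these relationships (only an illustration of the algebra, using the centered estimates quoted above):

```python
# Convert centered-model estimates (b0*, b1*, b11*) back to the
# original parameterization using the relationships above.
def uncenter(b0_star, b1_star, b11_star, xbar):
    b0  = b0_star - b1_star * xbar + b11_star * xbar**2
    b1  = b1_star - 2 * b11_star * xbar
    b11 = b11_star
    return b0, b1, b11

print(uncenter(1632.2, 34.0, -0.5362, 50.637))
# roughly (-1464.3, 88.3, -0.5362), matching the original-model fit
```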

[Residuals versus the fitted values (response is igg)]

[Normal probability plot of the residuals (response is igg)]

What is predicted IgG if maximal oxygen uptake is 90?

There is an even greater danger in extrapolation when modeling data with a polynomial function, because the fitted curve can change direction outside the range of the data.

Predicted Values for New Observations

New Obs   Fit      SE Fit   95.0% CI            95.0% PI
1         2139.6   219.2    (1689.8, 2589.5)    (1639.6, 2639.7)  XX

X  denotes a row with X values away from the center
XX denotes a row with very extreme X values

Values of Predictors for New Observations

New Obs   oxcent   oxcentsq
1         39.4     1549
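Where that fit of 2139.6 comes from can be checked by hand with the centered fitted equation (a sketch; the coefficients are taken from the output above):

```python
# Predicted igg at a maximal oxygen uptake of 90, using the centered fit
oxygen_new = 90.0
xbar = 50.637                     # mean of oxygen

oxcent   = oxygen_new - xbar      # 39.363, the 39.4 reported above
oxcentsq = oxcent**2              # about 1549

igg_hat = 1632.20 + 33.9995 * oxcent - 0.536247 * oxcentsq
print(igg_hat)   # roughly 2139.6; note oxygen = 90 is far outside the observed 30-70 range
```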

It is possible to “overfit” the data with polynomial models.

[Regression plot: y versus x, with the fitted cubic curve]

y = -38.4 + 34.9762 x - 8.64286 x**2 + 0.666667 x**3
S = 2.62950   R-Sq = 64.0%   R-Sq(adj) = 0.0%

It is even theoretically possible to fit the data perfectly.

If you have n data points, then a polynomial of order n-1 will fit the data perfectly, that is, it will pass through each data point.

** Error ** Not enough non-missing observations to fit a polynomial of this order; execution aborted

But good statistical software will keep an unsuspecting user from fitting such a model.
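As an illustration outside Minitab (with made-up points), numpy will happily interpolate n data points with a polynomial of order n-1:

```python
import numpy as np

x = np.array([2.0, 3.0, 4.0, 5.0, 6.0])    # n = 5 hypothetical data points
y = np.array([3.1, 7.4, 4.9, 6.2, 8.0])

coef = np.polyfit(x, y, deg=len(x) - 1)     # polynomial of order n - 1
print(np.allclose(np.polyval(coef, x), y))  # True: the curve passes through every point
```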

The hierarchical approach to model fitting

A widely accepted approach is to fit a higher-order model and then explore whether a lower-order (simpler) model is adequate.

Yi = β0 + β1 xi + β11 xi² + β111 xi³ + εi

Is a first-order linear model (“line”) adequate?

H0: β11 = β111 = 0
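One hedged way to carry out this test in Python is a general linear F-test comparing the full third-order fit to the reduced first-order fit; statsmodels' anova_lm does the bookkeeping. The data below are simulated placeholders, not the IgG data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)
x = rng.uniform(30, 70, size=30)                 # placeholder predictor
y = 1600 + 30*(x - 50) - 0.5*(x - 50)**2 + rng.normal(0, 100, size=30)  # hypothetical response

df = pd.DataFrame({"x": x - x.mean(), "y": y})   # centered predictor

reduced = smf.ols("y ~ x", data=df).fit()                       # first-order model
full    = smf.ols("y ~ x + I(x**2) + I(x**3)", data=df).fit()   # third-order model

print(anova_lm(reduced, full))   # partial F-test of H0: beta11 = beta111 = 0
```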

The hierarchical approach to model fitting

But then … if a polynomial term of a given order is retained, then all related lower-order terms are also retained.

That is, if a quadratic term was significant, you would use this regression function:

E(Yi) = β0 + β1 xi + β11 xi²

and not this one:

E(Yi) = β0 + β11 xi²

Example

• Quality of a product (y) – a score between 0 and 100

• Temperature (x1) – degrees Fahrenheit

• Pressure (x2) – pounds per square inch

[Matrix of scatter plots: quality, temp, pressure]

A two-predictor, second-order polynomial regression function

Yi = β0 + β1 Xi1 + β2 Xi2 + β11 Xi1² + β22 Xi2² + β12 Xi1 Xi2 + εi

where:

• Yi = quality

• Xi1 = temperature

• Xi2 = pressure

• β12 = “interaction effect coefficient”

The regression equation is
quality = -5128 + 31.1 temp + 140 pressure - 0.133 tempsq - 1.14 presssq - 0.145 tp

Predictor    Coef         SE Coef     T         P       VIF
Constant     -5127.9      110.3       -46.49    0.000
temp          31.096      1.344        23.13    0.000   1154.5
pressure      139.747     3.140        44.50    0.000   1574.5
tempsq       -0.133389    0.006853    -19.46    0.000   973.0
presssq      -1.14422     0.02741     -41.74    0.000   1453.0
tp           -0.145500    0.009692    -15.01    0.000   304.0

S = 1.679 R-Sq = 99.3% R-Sq(adj) = 99.1%
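A hedged sketch of fitting this two-predictor, second-order regression function in Python with statsmodels formulas; the temp, pressure, and quality values are hypothetical stand-ins for the experiment's data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in data; replace with the actual quality experiment.
df = pd.DataFrame({
    "temp":     [85, 85, 90, 90, 95, 95, 85, 90, 95],
    "pressure": [52.5, 57.5, 52.5, 57.5, 52.5, 57.5, 55.0, 55.0, 55.0],
    "quality":  [80, 85, 90, 92, 88, 84, 83, 94, 86],
})

# Second-order model in the raw predictors, including the interaction term
fit = smf.ols("quality ~ temp + pressure + I(temp**2) + I(pressure**2) + temp:pressure",
              data=df).fit()
print(fit.summary())
```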

Again, some correlation

           quality   temp     pressure   tempsq   presssq
temp       -0.423
pressure    0.182    0.000
tempsq     -0.434    0.999    0.000
presssq     0.162    0.000    1.000     -0.000
tp         -0.227    0.773    0.632      0.772    0.632

Cell Contents: Pearson correlation

A better two-predictor, second-order polynomial regression function

Yi = β*0 + β*1 xi1 + β*2 xi2 + β*11 xi1² + β*22 xi2² + β*12 xi1 xi2 + εi

where:

• Yi = quality

• xi1 = centered temperature

• xi2 = centered pressure

• β*12 = “interaction effect coefficient”

Reduced correlation

          quality   tcent    pcent    tpcent   tcentsq
tcent     -0.423
pcent      0.182    0.000
tpcent    -0.274    0.000    0.000
tcentsq   -0.355   -0.000    0.000    0.000
pcentsq   -0.762    0.000    0.000    0.000   -0.000

Cell Contents: Pearson correlation

The regression equation is
quality = 94.9 - 0.916 tcent + 0.788 pcent - 0.146 tpcent - 0.133 tcentsq - 1.14 pcentsq

Predictor    Coef         SE Coef     T         P       VIF
Constant      94.9259     0.7224      131.40    0.000
tcent        -0.91611     0.03957     -23.15    0.000   1.0
pcent         0.78778     0.07913       9.95    0.000   1.0
tpcent       -0.145500    0.009692    -15.01    0.000   1.0
tcentsq      -0.133389    0.006853    -19.46    0.000   1.0
pcentsq      -1.14422     0.02741     -41.74    0.000   1.0

S = 1.679 R-Sq = 99.3% R-Sq(adj) = 99.1%

[Residuals versus the fitted values (response is quality)]

[Normal probability plot of the residuals (response is quality)]

Predicted Values for New Observations

New Obs   Fit      SE Fit   95.0% CI            95.0% PI
1         94.926   0.722    (93.424, 96.428)    (91.125, 98.726)

Values of Predictors for New Observations

New Obs   tcent    pcent    tpcent   tcentsq   pcentsq
1         0.0000   0.0000   0.0000   0.0000    0.0000
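At the center of the predictor space all centered terms are zero, so the fit is just the intercept (94.926). A sketch of obtaining a comparable CI and PI from a statsmodels fit (again with hypothetical data, so the numbers will differ from the output above):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical centered data, as in the earlier sketch.
df = pd.DataFrame({
    "tcent":   [-5, -5, 0, 0, 5, 5, -5, 0, 5],
    "pcent":   [-2.5, 2.5, -2.5, 2.5, -2.5, 2.5, 0.0, 0.0, 0.0],
    "quality": [80, 85, 90, 92, 88, 84, 83, 94, 86],
})

fit = smf.ols("quality ~ tcent + pcent + I(tcent**2) + I(pcent**2) + tcent:pcent",
              data=df).fit()

# Prediction at the center of the design: all centered predictors equal 0
new = pd.DataFrame({"tcent": [0.0], "pcent": [0.0]})
print(fit.get_prediction(new).summary_frame(alpha=0.05))  # fit, 95% CI (mean_ci_*), 95% PI (obs_ci_*)
```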
