© 1998, geoff kuenning linear regression models what is a (good) model? estimating model parameters...

33
© 1998, Geoff Kuenning Linear Regression Models • What is a (good) model? • Estimating model parameters • Allocating variation • Confidence intervals for regressions • Verifying assumptions visually

Upload: ashley-byrd

Post on 30-Dec-2015

221 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Linear Regression Models

• What is a (good) model?• Estimating model parameters• Allocating variation• Confidence intervals for regressions• Verifying assumptions visually

Page 2: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

Data Analysis OverviewExperimentalenvironment

prototypereal sys

exec-driven

sim

trace-driven

sim

stochasticsim

Workloadparameters

SystemConfig

parameters

Factorlevels Raw Data

Samples

Samples

.

.

.

Predictorvalues x,

factor levels

Samples of response

. . .

y1

y2

yn

Regression models• response var = f (predictors)

Page 3: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

What Is a (Good) Model?

• For correlated data, model predicts response given an input

• Model should be equation that fits data• Standard definition of “fits” is least-squares

– Minimize squared error– While keeping mean error zero– Minimizes variance of errors

Page 4: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Least-Squared Error

• If then error in estimate for xi is

• Minimize Sum of Squared Errors (SSE)

• Subject to the constraint

y b b x 0 1

e y b b xii

n

i ii

n2

10 1

2

1

e y b b xii

n

i ii

n

10 1

1

0

ei = yi - yi

Page 5: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Estimating Model Parameters

• Best regression parameters are

where

• Note error in eq. 14.1 book!

bxy nxy

x nx1 2 2

b y b x0 1

xn

xi 1y

nyi 1

xy x yi i x xi2 2

Page 6: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Parameter EstimationExample

• Execution time of a script for various loop counts:

• = 6.8, = 2.32, xy = 88.54, x2 = 264

• b0 = 2.32 (0.29)(6.8) = 0.35

Loops 3 5 7 9 10Time 1.19 1.73 2.53 2.89 3.26

x y

b1 2

88 54 5 6 8 2 32

264 5 6 80 29

. . .

..

Page 7: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Graph of Parameter Estimation Example

0

1

2

3

3 4 5 6 7 8 9 10

Page 8: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Allocating Variation

• If no regression, best guess of y is• Observed values of y differ from , giving

rise to errors (variance)• Regression gives better guess, but there are

still errors• We can evaluate quality of regression by

allocating sources of errors

y

y

Page 9: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

The Total Sum of Squares

• Without regression, squared error is

SST

SSY SS

y y y y y y

y y y ny

y y ny ny

y ny

ii

n

i ii

n

ii

n

ii

n

ii

n

ii

n

2

1

2 2

1

2

1 1

2

2

1

2

2

1

2

2

2

2

0

Page 10: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

The Sum of Squaresfrom Regression

• Recall that regression error is

• Error without regression is SST• So regression explains SSR = SST - SSE• Regression quality measured by coefficient

of determination

R2 SSR

SST

SST SSE

SST

SSE e y yi i i2

Page 11: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Evaluating Coefficientof Determination

• Compute• Compute

• Compute

SST ( ) y ny2 2

SSE y b y b xy20 1

R2 SST SSE

SST

Page 12: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Example of Coefficientof Determination

• For previous regression example

– y = 11.60, y2 = 29.79, xy = 88.54,

– SSE = 29.79-(0.35)(11.60)-(0.29)(88.54) = 0.05

– SST = 29.79-26.9 = 2.89– SSR = 2.89-.05 = 2.84– R2 = (2.89-0.05)/2.89 = 0.98

3 5 7 9 101.19 1.73 2.53 2.89 3.26

ny2 25 2 32 26 9 . .

Page 13: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Standard Deviationof Errors

• Variance of errors is SSE divided by degrees of freedom– DOF is n2 because we’ve calculated 2

regression parameters from the data– So variance (mean squared error, MSE)

is SSE/(n2)

• Standard deviation of errors is square root:

sne

SSE

2

Page 14: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Checking Degreesof Freedom

• Degrees of freedom always equate:– SS0 has 1 (computed from )– SST has n1 (computed from data and ,

which uses up 1)– SSE has n2 (needs 2 regression

parameters)– So

y

y

SST SSY SS SSR SSE

( )

0

1 1 1 2n n n

Page 15: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Example ofStandard Deviation of Errors

• For our regression example, SSE was 0.05, so MSE is 0.05/3 = 0.017 and se = 0.13

• Note high quality of our regression:– R2 = 0.98– se = 0.13– Why such a nice straight-line fit?

Page 16: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Confidence Intervals for Regressions

• Regression is done from a single population sample (size n)– Different sample might give different

results– True model is y = 0 + 1x– Parameters b0 and b1 are really means

taken from a population sample

Page 17: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Calculating Intervalsfor Regression Parameters

• Standard deviations of parameters:

• Confidence intervals arewhere t has n - 2 degrees of freedom

s sn

x

x nx

ss

x nx

b e

be

0

1

1 2

2 2

2 2

b tsi bi

Page 18: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Example of Regression Confidence Intervals

• Recall se = 0.13, n = 5, x2 = 264, = 6.8

• So

• Using a 90% confidence level, t0.95;3 = 2.353

x

s

s

b

b

0

1

0 131

5

6 8

264 5 6 80 16

0 13

264 5 6 80 004

2

2

2

.( . )

( . ).

.

( . ).

Page 19: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Regression Confidence Example, cont’d

• Thus, b0 interval is

– Not significant at 90%

• And b1 is

– Significant at 90% (and would survive even 99.9% test)

0 29 2 353 0 004 0 28 0 30. . ( . ) ( . , . )

0 35 2 353 0 16 0 03 0 73. . ( . ) ( . , . )

Page 20: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Confidence Intervalsfor Nonlinear Regressions

• For nonlinear fits using exponential transformations:– Confidence intervals apply to

transformed parameters– Not valid to perform inverse

transformation on intervals

Page 21: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Confidence Intervalsfor Predictions

• Previous confidence intervals are for parameters– How certain can we be that the

parameters are correct?• Purpose of regression is prediction

– How accurate are the predictions?– Regression gives mean of predicted

response, based on sample we took

Page 22: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Predicting m Samples

• Standard deviation for mean of future sample of m observations at xp is

• Note deviation drops as m • Variance minimal at x =

• Use t-quantiles with n–2 DOF for interval

s s

m n

x x

x nxy ep

mp

1 1

2

2 2

x

Page 23: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Example of Confidenceof Predictions

• Using previous equation, what is predicted time for a single run of 8 loops?

• Time = 0.35 + 0.29(8) = 2.67• Standard deviation of errors se = 0.13

• 90% interval is then

sy p .

.

( . ).

10 13 1

1

5

8 6 8

264 5 6 80 14

2

2 67 2 353 0 14 2 34 3 00. . ( . ) ( . , . )

Page 24: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Verifying Assumptions Visually

• Regressions are based on assumptions:– Linear relationship between response y and

predictor x• Or nonlinear relationship used in fitting

– Predictor x nonstochastic and error-free– Model errors statistically independent

• With distribution N(0,c) for constant c

• If assumptions violated, model misleading or invalid

Page 25: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Testing Linearity

• Scatter plot x vs. y to see basic curve type

Linear Piecewise Linear

Outlier Nonlinear (Power)

Page 26: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Testing Independenceof Errors

• Scatter-plot i versus• Should be no visible trend• Example from our curve fit:

yi

-0.1

-0.05

0

0.05

0.1

0.15

0 0.5 1 1.5 2 2.5 3 3.5 4

Page 27: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

More on Testing Independence

• May be useful to plot error residuals versus experiment number– In previous example, this gives same plot

except for x scaling• No foolproof tests

Page 28: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Testing for Normal Errors

• Prepare quantile-quantile plot• Example for our regression:

-0.15

-0.1

-0.05

0

0.05

0.1

0.15

-1.3 0 1.3

Page 29: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Testing for Constant Standard Deviation

• Tongue-twister: homoscedasticity• Return to independence plot• Look for trend in spread• Example:

-0.1

-0.05

0

0.05

0.1

0.15

0 0.5 1 1.5 2 2.5 3 3.5 4

Page 30: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Linear RegressionCan Be Misleading

• Regression throws away some information about the data– To allow more compact summarization

• Sometimes vital characteristics are thrown away– Often, looking at data plots can tell you

whether you will have a problem

Page 31: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Example of Misleading Regression

I II III IVx y x y x y x y

10 8.04 10 9.14 10 7.46 8 6.588 6.95 8 8.14 8 6.77 8 5.76

13 7.58 13 8.74 13 12.74 8 7.719 8.81 9 8.77 9 7.11 8 8.84

11 8.33 11 9.26 11 7.81 8 8.4714 9.96 14 8.10 14 8.84 8 7.046 7.24 6 6.13 6 6.08 8 5.254 4.26 4 3.10 4 5.39 19 12.50

12 10.84 12 9.13 12 8.15 8 5.567 4.82 7 7.26 7 6.42 8 7.915 5.68 5 4.74 5 5.73 8 6.89

Page 32: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

What Does Regression Tell Us About These Data Sets?

• Exactly the same thing for each!• N = 11• Mean of y = 7.5• Y = 3 + .5 X• Standard error of regression is 0.118• All the sums of squares are the same• Correlation coefficient = .82• R2 = .67

Page 33: © 1998, Geoff Kuenning Linear Regression Models What is a (good) model? Estimating model parameters Allocating variation Confidence intervals for regressions

© 1998, Geoff Kuenning

Now Look at the Data Plots

0

2

4

6

8

10

12

0 5 10 15 200

2

4

6

8

10

12

0 5 10 15 20

0

2

4

6

8

10

12

14

0 2 4 6 8 10 12 14

0

2

4

6

8

10

12

14

0 5 10 15 20

I II

III IV