Chapter 5 Residuals, Residual Plots, Coefficient of determination, & Influential points

Upload: titus

Post on 05-Jan-2016


Page 1: Chapter 5

Chapter 5

Residuals, Residual Plots, Coefficient of determination, & Influential points

Page 2: Chapter 5

Residuals (error)

• The vertical deviation between the observations & the LSRL

• The sum of the residuals from the LSRL is always zero

• error = observed - expected

$\text{residual} = y - \hat{y}$
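A short derivation of why the residuals from the LSRL sum to zero may help here. It is a sketch under the usual least-squares setup, writing the LSRL as $\hat{y} = a + bx$; the symbols $a$, $b$, $\bar{x}$, $\bar{y}$, and $n$ are notation assumed for this illustration and do not appear on the slide.

```latex
\begin{aligned}
\sum_{i=1}^{n} (y_i - \hat{y}_i)
  &= \sum_{i=1}^{n} (y_i - a - b x_i) \\
  &= n\bar{y} - na - nb\bar{x} \\
  &= n(\bar{y} - a - b\bar{x}) \\
  &= 0 \qquad \text{because the least-squares intercept is } a = \bar{y} - b\bar{x}.
\end{aligned}
```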

Page 3: Chapter 5

Residual plot

• A scatterplot of the (x, residual) pairs.

• Residuals can be graphed against other statistics besides x

• The purpose is to tell whether a linear association exists between the x & y variables

Page 4: Chapter 5

Consider a population of adult women. Let’s examine the relationship between their height and weight.

[Figure: weight plotted against height for the population; the height axis is marked at 60, 64, and 68]

Page 5: Chapter 5

Suppose we now take a random sample from our population of women.

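[Figure: weight plotted against height for the random sample, with the residuals marked; the height axis is marked at 60, 64, and 68]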

Page 6: Chapter 5

Residual plot

• A scatterplot of the (x, residual) pairs.

• Residuals can be graphed against other statistics besides x

• The purpose is to tell whether a linear association exists between the x & y variables

• If no pattern exists between the points in the residual plot, then the association is linear.

Page 7: Chapter 5

[Figure: two residual plots of residuals against x, one labeled "Linear" and the other labeled "Not linear"]

Page 8: Chapter 5

Age Range of Motion

35 154

24 142

40 137

31 133

28 122

25 126

26 135

16 135

14 108

20 120

21 127

30 122

One measure of the success of knee surgery is post-surgical range of motion for the knee joint following a knee dislocation. Is there a linear relationship between age & range of motion?

Graph the data and find the LSRL:

Predicted range of motion = 107.58 + 0.87(age)

Page 9: Chapter 5

Predicted range of motion = 107.58 + .87(age)

Find the predicted y’s:

Find the residuals:
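The two steps above can be sketched in plain Python. This is only an illustration, not the method used on the slides: the helper name `least_squares` and the variable names are made up for this example, and the data are the twelve (age, range of motion) pairs from the knee surgery table.

```python
# Knee surgery data from the slides: (age, range of motion)
ages = [35, 24, 40, 31, 28, 25, 26, 16, 14, 20, 21, 30]
motion = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 122]


def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares regression line.

    Hypothetical helper written for this example.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = least_squares(ages, motion)

# Predicted y (y-hat) for each observed age, then residual = observed - predicted
predicted = [intercept + slope * x for x in ages]
residuals = [y - y_hat for y, y_hat in zip(motion, predicted)]

print(f"LSRL: y-hat = {intercept:.2f} + {slope:.3f}(age)")
for x, y, y_hat, e in zip(ages, motion, predicted, residuals):
    print(f"{x:>3} {y:>4} {y_hat:8.2f} {e:8.2f}")

# The residuals from the LSRL sum to zero (up to floating-point rounding)
print(f"sum of residuals = {sum(residuals):.6f}")
```

Plotting `residuals` against `ages` gives the residual plot asked for on the next slide.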

Page 10: Chapter 5

Sketch a residual plot.

Since there is no pattern in the residual plot, there is a linear relationship between age and range of motion

[Figure: residual plot of residuals against x (age)]

Page 11: Chapter 5

Plot the residuals against the y-hats. How does this residual plot compare to the previous one?

[Figure: residual plot of residuals against y-hat]

Page 12: Chapter 5

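Residual plots show the same pattern whether the residuals are plotted against x or against y-hat. Because y-hat is a linear function of x, only the horizontal scale changes (and the plot is mirrored left to right when the slope is negative).

[Figure: two residual plots side by side, residuals against x and residuals against y-hat]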

Page 13: Chapter 5

Coefficient of determination

• $r^2$

• gives the approximate proportion of variation in y that can be attributed to a linear relationship between x & y

• remains the same no matter which variable is labeled x
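The last bullet can be checked numerically. The sketch below computes the coefficient of determination for the knee surgery data twice, once with age as x and once with range of motion as x. The function name `r_squared` is a hypothetical helper for this example.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a simple linear regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    # The formula treats x and y symmetrically, so swapping them changes nothing
    return sxy ** 2 / (sxx * syy)


# Knee surgery data from the slides
ages = [35, 24, 40, 31, 28, 25, 26, 16, 14, 20, 21, 30]
motion = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 122]

r2_age_as_x = r_squared(ages, motion)
r2_motion_as_x = r_squared(motion, ages)
print(f"age as x: {r2_age_as_x:.4f}   range of motion as x: {r2_motion_as_x:.4f}")
```

Note that the LSRL itself does change when the variables are swapped; only the correlation coefficient and its square stay the same.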

Page 14: Chapter 5

Interpretation of $r^2$

Approximately $r^2$ (expressed as a percent) of the variation in y can be explained by the LSRL of x & y.

Page 15: Chapter 5

How well does age predict the range of motion after knee surgery?

Approximately 30.6% of the variation in range of motion after knee surgery can be explained by the linear regression of age and range of motion.

Page 16: Chapter 5

Let’s examine $r^2$.

Suppose you were going to predict a future y but you didn’t know the x-value. Your best guess would be the overall mean of the existing y’s.

$\bar{y} = 130.083$

Sum of the squared residuals (errors) using the mean of y:

$SSE_{y} = 1564.917$

Page 17: Chapter 5

Now suppose you were going to predict a future y but you DO know the x-value. Your best guess would be the point on the LSRL for that x-value (y-hat).

$\hat{y} = 0.871x + 107.583$

Sum of the squared residuals (errors) using the LSRL:

$SSE_{\hat{y}} = 1085.735$

Page 18: Chapter 5

By what percent did the sum of the squared error go down when you went from just an “overall mean” model to the “regression on x” model?

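Using the LSRL: $SSE_{\hat{y}} = 1085.735$

Using the mean of y: $SSE_{y} = 1564.917$

$$\frac{SSE_{y} - SSE_{\hat{y}}}{SSE_{y}} = \frac{1564.91667 - 1085.735}{1564.91667} = 0.3062$$

This is $r^2$: the proportion of the variation in the y-values that is explained by the x-values.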
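The comparison between the "overall mean" model and the "regression on x" model can be reproduced with a few lines of Python. This is a sketch using the knee surgery data from the slides; the variable names (`sse_mean`, `sse_lsrl`, `reduction`) are chosen for this example only.

```python
# Knee surgery data from the slides
ages = [35, 24, 40, 31, 28, 25, 26, 16, 14, 20, 21, 30]
motion = [154, 142, 137, 133, 122, 126, 135, 135, 108, 120, 127, 122]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(motion) / n

# Least-squares slope and intercept
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, motion))
         / sum((x - x_bar) ** 2 for x in ages))
intercept = y_bar - slope * x_bar

# Model 1: predict every y with the overall mean of y
sse_mean = sum((y - y_bar) ** 2 for y in motion)

# Model 2: predict each y with the point on the LSRL for its x
sse_lsrl = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(ages, motion))

# Proportional reduction in squared error: this is r^2
reduction = (sse_mean - sse_lsrl) / sse_mean

print(f"SSE using the mean of y: {sse_mean:.3f}")
print(f"SSE using the LSRL:      {sse_lsrl:.3f}")
print(f"reduction (r^2):         {reduction:.4f}")
```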

Page 19: Chapter 5

Computer-generated regression analysis of knee surgery data:

Predictor   Coef     Stdev    T      P
Constant    107.58   11.12    9.67   0.000
Age         0.8710   0.4146   2.10   0.062

s = 10.42   R-sq = 30.6%   R-sq(adj) = 23.7%

What is the equation of the LSRL? Find the slope & y-intercept.

$\hat{y} = 107.58 + 0.8710x$

What are the correlation coefficient and the coefficient of determination?

$r^2 = 0.306$ and $r = \sqrt{0.306} = 0.5532$ (positive, because the slope is positive)

NEVER use adjusted $r^2$!

Be sure to convert $r^2$ to a decimal before taking the square root!
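As a rough check on how the numbers in the printout relate to one another, the sketch below recomputes a few of them from the others. It assumes the standard definitions (T is the coefficient divided by its standard deviation, and adjusted R-sq for one predictor is 1 - (1 - R-sq)(n - 1)/(n - 2)); the printout itself does not state these formulas, and the variable names are made up for this example.

```python
import math

n = 12                                  # number of patients in the knee surgery data
coef_age, stdev_age = 0.8710, 0.4146    # slope and its standard deviation from the printout
r_sq = 0.306                            # R-sq = 30.6%, converted to a decimal first

# T statistic for the slope: coefficient divided by its standard deviation
t_age = coef_age / stdev_age

# r is the square root of r^2 and takes the sign of the slope
r = math.copysign(math.sqrt(r_sq), coef_age)

# Adjusted R-sq for one predictor (assumed standard formula)
r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - 2)

print(f"T for Age = {t_age:.2f}")
print(f"r         = {r:.4f}")
print(f"R-sq(adj) = {r_sq_adj:.3f}")
# Taking the square root of the adjusted value would give the wrong r
print(f"sqrt of adjusted R-sq = {math.sqrt(r_sq_adj):.4f} (not the correlation)")
```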

Page 20: Chapter 5

Outlier

• In a regression setting, an outlier is a data point with a large residual

Page 21: Chapter 5

Influential point

• A point that influences where the LSRL is located

• If removed, it will significantly change the slope of the LSRL

• Usually has a small residual (or 0)

Page 22: Chapter 5

Racket   Resonance (Hz)   Acceleration (m/sec/sec)
1        105              36.0
2        106              35.0
3        110              34.5
4        111              36.8
5        112              37.0
6        113              34.0
7        113              34.2
8        114              33.8
9        114              35.0
10       119              35.0
11       120              33.6
12       121              34.2
13       126              36.2
14       189              30.0

One factor in the development of tennis elbow is the impact-induced vibration of the racket and arm at ball contact.

Sketch a scatterplot of these data.

Calculate the LSRL & correlation coefficient.

Does there appear to be an influential point? If so, remove it and then calculate the new LSRL & correlation coefficient.

Page 23: Chapter 5

With all 14 rackets:

Predicted acceleration = 42.37 - 0.06(resonance)

r = -0.775, $r^2$ = 60.1%

(189, 30) could be influential. Remove it and recalculate the LSRL.

Page 24: Chapter 5

(189, 30) was influential, since removing it moved the LSRL:

Predicted acceleration = 38.81 - 0.033(resonance)

r = -0.174, $r^2$ = 3%
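The effect of the point (189, 30) can be reproduced by fitting the line twice, once with all 14 rackets and once without racket 14. The sketch below is illustrative only; `fit` is a hypothetical helper written for this example.

```python
import math

# Tennis racket data from the slides
resonance = [105, 106, 110, 111, 112, 113, 113, 114, 114, 119, 120, 121, 126, 189]
acceleration = [36.0, 35.0, 34.5, 36.8, 37.0, 34.0, 34.2,
                33.8, 35.0, 35.0, 33.6, 34.2, 36.2, 30.0]


def fit(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / math.sqrt(sxx * syy)


# Fit with every racket, then again with the last racket (189, 30) removed
with_point = fit(resonance, acceleration)
without_point = fit(resonance[:-1], acceleration[:-1])

for label, (a, b, r) in [("all 14 rackets", with_point),
                         ("without (189, 30)", without_point)]:
    print(f"{label}: y-hat = {a:.2f} + ({b:.3f})x, r = {r:.3f}, r^2 = {r * r:.1%}")
```

Removing a single point roughly halves the slope and drops $r^2$ from about 60% to about 3%, which is what makes the point influential.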

Page 25: Chapter 5

Which of these measures are resistant?

• LSRL

• Correlation coefficient

• Coefficient of determination

NONE. All are affected by outliers.

Page 26: Chapter 5

Year Tuition

2002 $4228

2003 $4768

2004 $5179

2005 $5495

2006 $5808

2007 $6038

2008 $6299

2009 $6459

Find the correlation coefficient and describe the relationship.

r = .9861

There is a strong, positive, linear relationship between tuition and year at the UofA.

Find the LSRL:

Predicted tuition = 3821.26 + 311.45(year), where year is entered as years since 2000 (2002 is entered as 2)

Interpret the slope.

For each 1 year increase, UA tuition goes up by an average of $311.45.

Find the coefficient of determination. Interpret in context of problem.

$r^2$ = 97.2%

97.2% of the variation in tuition can be explained by the linear relationship between tuition and year at the UofA.

Page 27: Chapter 5

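Make residual plots of (x, residuals) and (y-hat, residuals). Sketch and compare.

[Figure: two residual plots, residuals against x and residuals against y-hat]

A linear model is not the best model. There is a definite curved pattern in the residual plot!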
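The tuition example can be worked through in Python to see both the high correlation and the curved residual pattern. One assumption is labeled in the code: the years are coded as years since 2000 (2002 becomes 2), which is the coding that reproduces the intercept 3821.26 shown on the slide. Variable names are made up for this sketch.

```python
import math

years = [2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009]
tuition = [4228, 4768, 5179, 5495, 5808, 6038, 6299, 6459]

# Assumption: x is coded as years since 2000, which matches the slide's intercept
x = [year - 2000 for year in years]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(tuition) / n
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, tuition))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in tuition)

slope = sxy / sxx
intercept = y_bar - slope * x_bar
r = sxy / math.sqrt(sxx * syy)

# Residual = observed tuition - predicted tuition
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, tuition)]

print(f"y-hat = {intercept:.2f} + {slope:.2f}(year), r = {r:.4f}, r^2 = {r * r:.1%}")
for year, e in zip(years, residuals):
    print(f"{year}: residual = {e:8.2f}")
```

The residuals start negative, turn positive through the middle years, and end negative again. That run of signs is the curved pattern, and it appears even though r is close to 1.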