Find the Least Squares Regression Line and interpret its slope, y-intercept, and the coefficients of correlation and determination Justify the regression AP Statistics Objectives Ch8

Upload: malcolm-mitchell

Post on 17-Jan-2016


TRANSCRIPT

Page 1: Find the Least Squares Regression Line and interpret its slope, y-intercept, and the coefficients of correlation and determination  Justify the regression

Find the Least Squares Regression Line and interpret its slope, y-intercept, and the coefficients of correlation and determination

Justify the regression model using the scatterplot and residual plot

AP Statistics Objectives Ch8

Page 2:

Vocabulary

Model
Residuals
Slope
Regression to the mean
Intercept
R²
Linear model
Predicted value
Regression line

Page 3:

Residual Plot Vocabulary

Chapter 7 Answers

Linear Regression Practice

Regression Line Notes

Chapter 8 Assignments

Chp 8 Part I Day 2 Example

Lurking Variable

Page 4:

Lurking Variable

Page 5:

Chapter 8 #1r

      mean of x   SD of x   mean of y   SD of y   r       regression line
a)    10          2         20          3         0.5     ?
b)    2           0.06      7.2         1.2       -0.4    ?
c)    12          6         ?           ?         -0.8    200 - 4x
d)    2.5         1.2       ?           100       ?       -100 + 50x

\( b_1 = r \frac{s_y}{s_x} \)

\( \hat{y} = b_0 + b_1 x \)

a) \( \hat{y} = 12.5 + 0.75x \)

b) \( \hat{y} = 23.2 - 8x \)

c) \( \bar{y} = 152 \), \( s_y = 30 \)
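The arithmetic behind this table can be sketched in Python. This is an illustration rather than part of the original slides: the helper name `line_from_summary` is hypothetical, and it assumes the columns are the mean of x, SD of x, mean of y, SD of y, and r. The values it computes for row d) do not appear in the transcript; they follow from the same two formulas.

```python
def line_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (b0, b1) for y-hat = b0 + b1*x from summary statistics."""
    b1 = r * s_y / s_x        # slope
    b0 = y_bar - b1 * x_bar   # the line passes through (x_bar, y_bar)
    return b0, b1

# Rows a) and b): all five summary statistics are known.
b0_a, b1_a = line_from_summary(10, 2, 20, 3, 0.5)
b0_b, b1_b = line_from_summary(2, 0.06, 7.2, 1.2, -0.4)

# Row c): the line is 200 - 4x with r = -0.8; solve for the mean and SD of y.
y_bar_c = 200 - 4 * 12
s_y_c = -4 * 6 / -0.8         # rearranged from b1 = r * s_y / s_x

# Row d): the line is -100 + 50x with s_y = 100; solve for the mean of y and r.
y_bar_d = -100 + 50 * 2.5
r_d = 50 * 1.2 / 100          # rearranged from b1 = r * s_y / s_x

print(f"a) y-hat = {b0_a:.2f} + {b1_a:.2f}x")
print(f"b) y-hat = {b0_b:.2f} + ({b1_b:.2f})x")
print(f"c) mean of y = {y_bar_c}, SD of y = {s_y_c:.1f}")
print(f"d) mean of y = {y_bar_d}, r = {r_d:.2f}")
```

Rows c) and d) run the same two relationships backwards: the mean point lies on the line, and the slope ties r to the two standard deviations.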

Page 9:

Standardized Foot Length vs Height 2011

NOTE: (0,0) represents the mean of x and the mean of y.

\( \hat{z}_{Height} = 0.84\, z_{FootSize} \)

Slope is the correlation

\( (\bar{x}, \bar{y}) \) is part of all regression lines

Page 10:

Regression Line for Standardized Values

\( \hat{z}_y = r\, z_x \)

\( \hat{z}_y \) is the predicted z-score for the response variable

\( z_x \) is the z-score for the explanatory variable

\( r \) is the correlation coefficient

The standardized regression line will always pass through (0, 0).
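A small Python sketch can confirm that, once both variables are standardized, the least squares slope equals r and the intercept is 0. The foot and height values below are made up for illustration (they are not the 2011 class data), and all function names are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def z_scores(values):
    m, s = mean(values), sd(values)
    return [(v - m) / s for v in values]

def correlation(x, y):
    """r = sum of products of z-scores, divided by n - 1."""
    return sum(a * b for a, b in zip(z_scores(x), z_scores(y))) / (len(x) - 1)

def least_squares(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    mx, my = mean(x), mean(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

# Hypothetical foot lengths (cm) and heights (cm), for illustration only.
foot = [22.0, 24.0, 25.0, 26.0, 28.0, 30.0]
height = [155.0, 165.0, 163.0, 172.0, 178.0, 181.0]

r = correlation(foot, height)
b0_z, b1_z = least_squares(z_scores(foot), z_scores(height))
print(f"r = {r:.4f}, standardized slope = {b1_z:.4f}, intercept = {b0_z:.4f}")
```

Because r is between -1 and 1, the predicted z-score is never farther from 0 than the explanatory z-score, which is the regression to the mean idea from the vocabulary list.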

Page 11:

Regression Line for

\( \hat{y} = b_0 + b_1 x \)

\( \hat{y} \) is the predicted response variable

\( b_0 \) is the y-intercept

\( b_0 = \bar{y} - b_1 \bar{x} \)

\( b_1 \) is the slope

\( b_1 = r \frac{s_y}{s_x} \)

The regression line will always pass through \( (\bar{x}, \bar{y}) \).

Page 12:

Explanatory or Response

Now interpret R². R² = 0.697

According to the linear model, 69.7% of the variability in height is accounted for by variation in foot size.

Page 13:

Explanatory or Response

2011 data resulted in the following linear equation:

CAREFUL! The equations are not the same when you switch explanatory and response variables.

Page 15:

Residual Plot Example

Page 16:

Residual Plot Example

REMEMBER: POSITIVE RESIDUALS are UNDERESTIMATES

\( e = y - \hat{y} \)

Page 17:

Residual Plot Example

NEGATIVE RESIDUALS are OVERESTIMATES

\( e = y - \hat{y} \)

Page 18:

Assignment

CHAPTER 8
Part I: pp. 189-190 #2, 4, 8 & 10, 12 & 14
Part II: pp. 190-192 #16, 18, 20, 28 & 30

Page 19:

Chapter 7 Answers

a) #1 shows little or no association
b) #4 shows a negative association
c) #2 & #4 each show a linear association
d) #3 shows a moderately strong, curved association
e) #2 shows a very strong association

Page 20:

Chapter 7 Answers

a) -0.977
b) 0.736
c) 0.951
d) -0.021

Page 21:

Chapter 7 Answers

The researcher should have plotted the data first. A strong, curved relationship may have a very low correlation. In fact, correlation is only a useful measure of the strength of a linear relationship.

Page 22:

Chapter 7 Answers

If the association between GDP and infant mortality is linear, a correlation of -0.772 shows a moderate, negative association.

Page 23:

Chapter 7 Answers

Continent is a categorical variable. Correlation measures the strength of linear associations between quantitative variables.

Page 24:

Chapter 7 Answers

Correlation must be between -1 and 1, inclusive. Correlation can never be 1.22.

Page 25:

Chapter 7 Answers

A correlation, no matter how strong, cannot prove a cause-and-effect relationship.

Page 26:

Chapter 8 Vocabulary

1) Regression to the mean – each predicted response value (\( \hat{y} \)) tends to be closer to its mean (in standard deviations) than the corresponding explanatory value (x) is to its own mean

Page 27:

Chapter 8 Vocabulary

2) \( \hat{y} \) – predicted response variable

3) Residual – the difference between the actual response value and the predicted response value

\( e = y - \hat{y} \)

4) Overestimate – produces a negative residual

5) Underestimate – produces a positive residual

Page 28:

Chapter 8 Vocabulary

6) Slope – rate of change given in units of the response variable (y) per unit of the explanatory variable (x)

7) Intercept – predicted response value when the explanatory value is zero

8) R² – Must also be interpreted when describing a regression model (aka Coefficient of Determination)

Page 29:

Chapter 8 Vocabulary

8) R² – Must also be interpreted when describing a regression model

“According to the linear model, _____% of the variability in _______ (response variable) is accounted for by variation in ________ (explanatory variable)”

The remaining variation is due to the residuals
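One way to see where the percentage comes from is to compare the variation left in the residuals with the total variation in the response. The sketch below uses made-up data and hypothetical names; it computes R² as 1 minus (sum of squared residuals) / (total sum of squares about the mean), which for a least squares line agrees with the square of the correlation.

```python
def least_squares(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1*x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

b0, b1 = least_squares(x, y)
residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]   # e = y - y-hat

y_bar = sum(y) / len(y)
sse = sum(e ** 2 for e in residuals)          # variation left in the residuals
sst = sum((b - y_bar) ** 2 for b in y)        # total variation in the response
r_squared = 1 - sse / sst

print(f"According to the linear model, {100 * r_squared:.1f}% of the variability "
      f"in y is accounted for by variation in x.")
```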

Page 30:

Chapter 8 Vocabulary

CONDITIONS FOR USING A LINEAR REGRESSION

1) Quantitative Variables – Check the variables
2) Straight Enough
- Check the scatterplot 1st (should be nearly linear)
- Check the residual plot next (should be random scatter)
3) Outlier Condition
- Any outliers need to be investigated

Page 31:

Chapter 8 Vocabulary

9. Residual Plot - a scatterplot of the residuals and either x or \( \hat{y} \)

If you find a pattern in the Residual Plot, that means the residuals (errors) are predictable. If the residuals are predictable, then a better model exists. ---- LINEAR MODEL IS NOT APPROPRIATE. A residual plot is done with the RESIDUALS on the y-axis. On the x-axis, put the explanatory variable.

NOTE: Some software packages will put \( \hat{y} \) on the x-axis. This does not change the presence (or lack) of a pattern.
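To illustrate what a patterned residual plot looks like numerically, the sketch below fits a line to deliberately curved, made-up data (y = x²) and lists the residuals in order of x. The data and names are hypothetical and only meant to show the idea.

```python
def least_squares(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1*x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

# Hypothetical curved data: a straight line is the wrong model here.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [a ** 2 for a in x]

b0, b1 = least_squares(x, y)
residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]   # e = y - y-hat

# Residuals go on the y-axis of the residual plot, x on the x-axis.
for a, e in zip(x, residuals):
    print(f"x = {a}: residual = {e:+.1f}")

signs = "".join("+" if e > 0 else "-" for e in residuals)
print(signs)
```

The residuals are positive at both ends and negative in the middle. That U-shape is a predictable pattern, so a linear model is not appropriate for these data.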

Page 33:

What is the square root of 4?

Did you say 2? Wrong. Try again.

It is actually ±2, because both (2)² and (-2)² are 4.

So what?

Page 34:

Important Note: The correlation is not given directly in this software package. You need to look in two places for it. Taking the square root of the “R squared” (coefficient of determination) is not enough. You must look at the sign of the slope too. Positive slope is a positive r-value. Negative slope is a negative r-value.
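The two-step lookup described here (square root of R², then the sign of the slope) can be sketched as a small helper. The function name is hypothetical; the example values are the ones used in these slides.

```python
import math

def correlation_from_r_squared(r_squared, slope):
    """Recover r from R^2, taking the sign of r from the sign of the slope."""
    magnitude = math.sqrt(r_squared)
    return math.copysign(magnitude, slope)

# Positive slope gives a positive r; negative slope gives a negative r.
print(round(correlation_from_r_squared(0.482, 0.0786), 3))
print(round(correlation_from_r_squared(0.482, -0.07861), 3))
```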

Page 35:

So here you should note that the slope is negative. The correlation will be negative too. Since R² is 0.482, r will be -0.694.

S/F Ratio

Grad Rate

-0.07861

Page 36:

Coefficient of Determination = \( r^2 = (0.694)^2 = 0.4816 \)

Page 37:

0.4816

With the linear regression model, 48.2% of the variability in airline fares is accounted for by the variation in distance of the flight.

Page 38:

#10. Interpret the slope.

\( b_1 = r \frac{s_y}{s_x} = 0.694 \cdot \frac{56.37}{497.8} = 0.0786 \)

The model predicts an increase in airfare of about 7.86 cents for every additional mile.

Equivalently, the model predicts an increase of about $7.86 for every additional 100 miles.

Page 40:

#9. Interpret the y-intercept.

\( b_1 = 0.0786 \)

\( \hat{y} = b_0 + b_1 x \)

\( 244.33 = b_0 + (0.0786)(853.7) \)

\( b_0 = 244.33 - (0.0786)(853.7) = 177.2292 \)

The model predicts a flight of zero miles will cost $177.23. The airline may have built in an initial cost to pay for some of its expenses.

Page 41:

\( b_1 = 0.0786 \)

\( \widehat{fare} = 177.2292 + 0.0786\,distance \)

Page 44:

8. Using those estimates, draw the line on the scatterplot.

177.2292 + 0.0786(200) = $192.95

177.2292 + 0.0786(2000) = $334.43

Page 45:

\( \widehat{fare} = 177.2292 + 0.0786\,distance \)

\( \widehat{fare} = 177.2292 + 0.0786(1719) = 312.34 \), so the predicted fare is $312.34

\( e = y - \hat{y} = 212 - 312.34 = -100.34 \), so the residual is -$100.34
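The prediction and residual steps can be sketched in a few lines of Python. The coefficients are the hand-computed ones from these slides; the function names are hypothetical.

```python
B0 = 177.2292   # y-intercept, in dollars
B1 = 0.0786     # slope, in dollars per mile

def predict_fare(distance_miles):
    """Predicted fare from the linear model fare-hat = B0 + B1 * distance."""
    return B0 + B1 * distance_miles

def residual(actual, predicted):
    """e = y - y-hat"""
    return actual - predicted

def describe(e):
    """Positive residual: model underestimated. Negative: model overestimated."""
    if e > 0:
        return "underestimate"
    if e < 0:
        return "overestimate"
    return "exact"

predicted = predict_fare(1719)
e = residual(212, predicted)
print(f"predicted = ${predicted:.2f}, residual = {e:.2f} ({describe(e)})")
```

The same `predict_fare` helper reproduces the two points used to draw the line on the scatterplot, at 200 and 2000 miles.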

Page 46:

12. In general, a positive residual means the model underestimated the actual value.

13. In general, a negative residual means the model overestimated the actual value.

Page 47:

A linear model should be appropriate, because

1) the scatterplot shows a nearly linear form and

2) the residual plot shows random scatter.

Page 48:

The coefficient of determination is 0.482, so the coefficient of correlation is \( \sqrt{0.482} = 0.694 \). This shows a moderate strength of association for the model.

Page 49:

$150 for a flight of about 700 miles seems low compared to the other fares.

Page 50:

“fare” is the response variable. Not all software will call it the dependent variable. Always look for “Constant” and what is listed beside it. Here, the heading above “Constant” shows that the column is for the “variable”, and “dist”, listed below it, is the explanatory variable.

Page 54:

Recall: For y = 3x + 1, the coefficient of x is ‘3’. For computer printouts, this is the key column for your regression model.

The “Coefficient” of the “Constant” is the y-intercept for the linear regression.

The “Coefficient” of the variable “dist” is the slope for the linear regression.

\( \widehat{fare} = 177.215 + 0.078619\,distance \)

Page 55:

5. Predict the airfare for a 1000-mile flight.

\( \widehat{fare} = 177.215 + 0.078619\,distance \)

\( \widehat{fare} = 177.215 + 0.078619(1000) = 255.83 \), so the predicted fare is $255.83

Page 56:

Note: Even when we switch the response and explanatory variables, the linear model is still appropriate.

Page 57:

\( \widehat{distance} = -644.287 + 6.13101\,fare \)

R² doesn’t change, but the equation does.
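A quick numeric check shows why the switched equation is not just the original one solved for distance. The slope of y on x is r·s_y/s_x and the slope of x on y is r·s_x/s_y, so their product is r², not 1. The sketch below uses the two printout slopes quoted in these slides; the variable names are hypothetical.

```python
slope_fare_on_distance = 0.078619   # fare predicted from distance (dollars per mile)
slope_distance_on_fare = 6.13101    # distance predicted from fare (miles per dollar)

# The product of the two least squares slopes equals r squared.
product = slope_fare_on_distance * slope_distance_on_fare

# Algebraically inverting the first line would give a different slope.
inverted_slope = 1 / slope_fare_on_distance

print(f"product of slopes = {product:.3f}")
print(f"1 / (fare slope) = {inverted_slope:.2f} vs. regression slope = {slope_distance_on_fare}")
```

The product matches the R² of 0.482 reported for this model, while the algebraic inverse of the fare slope is roughly twice the slope the regression actually gives.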

Page 59:

\( \widehat{distance} = -644.287 + 6.13101\,fare \)

\( \widehat{distance} = -644.287 + 6.13101(255.83) = 924.2 \) miles

8. Residual? \( e = y - \hat{y} = 1000 - 924.2 = 75.8 \) miles

Page 60:

Chp 8 #17

R squared = 92.4%

17a. What is the correlation between tar and nicotine? (NOTE: scatterplot shows a strong positive linear association.)

\( r = +\sqrt{0.924} = 0.961 \)

Page 61:

Chp 8 #17

R squared = 92.4%

17b. What would you predict about the average nicotine content of cigarettes that are 2 standard deviations below average in tar content?

\( \hat{z}_{nicotine} = r\, z_{tar} \)

\( r = 0.961 \)

\( \hat{z}_{nicotine} = 0.961(-2) = -1.922 \)

I would predict that the nicotine content would be 1.922 standard deviations below the average.

Page 62:

Chp 8 #17

R squared = 92.4%

17c. If a cigarette is 1 standard deviation above average in nicotine content, what do you suspect is true about its tar content?

\( \hat{z}_{tar} = r\, z_{nicotine} \)

\( r = 0.961 \)

\( \hat{z}_{tar} = 0.961(1) = 0.961 \)

I would predict that the tar content would be 0.961 standard deviations above the average.
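Parts 17a through 17c all use the standardized regression line, so they can be sketched with one helper. The function name is hypothetical; the positive square root is taken because the scatterplot is described as showing a positive association.

```python
import math

r = math.sqrt(0.924)   # 17a: positive root, since the association is positive

def predicted_z(r, z_explanatory):
    """Standardized regression line: predicted z-score = r * z-score of the predictor."""
    return r * z_explanatory

z_nicotine = predicted_z(r, -2)   # 17b: tar is 2 SDs below average
z_tar = predicted_z(r, 1)         # 17c: nicotine is 1 SD above average

print(f"r = {r:.3f}")
print(f"17b: predicted nicotine z-score = {z_nicotine:.3f}")
print(f"17c: predicted tar z-score = {z_tar:.3f}")
```

In both directions the prediction is multiplied by the same r, so it lands closer to the mean than the value it was predicted from.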