Chapters 10 and 11: Using Regression to Predict
![Page 1: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/1.jpg)
Chapters 10 and 11: Using Regression to Predict
Math 1680
![Page 2: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/2.jpg)
Overview
- Predicting Values
- The Regression Line
- The RMS Error
- The Regression Effect
- A Second Regression Line
- Summary
![Page 3: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/3.jpg)
Predicting Values
We have previously seen that a pair of data sets, X and Y, can be characterized by their five-statistic summary:
- µX, the average value in X
- SDX, the standard deviation of X
- µY, the average value in Y
- SDY, the standard deviation of Y
- r, the correlation coefficient

Often, we want to predict a y-value given a particular x-value.
We want to use only the five-statistic summary to make the prediction.
![Page 4: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/4.jpg)
Predicting Values
Suppose we have the following five-statistic summary for the height (X) and weight (Y) of men in the US: µX = 70 inches, SDX = 3 inches, µY = 162 lbs, SDY = 30 lbs, r = 0.47
If you had to guess what the weight of any man would be, what is your best bet?
![Page 5: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/5.jpg)
Predicting Values
Suppose we have the following five-statistic summary for the height (X) and weight (Y) of men in the US: µX = 70 inches, SDX = 3 inches, µY = 162 lbs, SDY = 30 lbs, r = 0.47
Suppose you know the man is 1 SD above average in height. Would your best guess for his weight be 1 SD above average?
![Page 6: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/6.jpg)
The SD line is the dashed line running through the scatter plot.
If we guessed 1 SD above average weight, where would we be on the plot?
What would a better guess be?
![Page 7: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/7.jpg)
The Regression Line
Suppose we have the following five-statistic summary for the height (X) and weight (Y) of men in the US: µX = 70 inches, SDX = 3 inches, µY = 162 lbs, SDY = 30 lbs, r = 0.47
It turns out that the correlation coefficient determines the best guess.
- For every SD we move in X, we should move r SDs in Y
![Page 8: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/8.jpg)
The Regression Line
The regression line from X to Y
- Runs through the point of averages
- Has a slope of r times the slope of the SD line

The regression line predicts the average value for y within the narrowed-down range specified by a given x
![Page 9: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/9.jpg)
The Regression Line
The formula for the regression line from X to Y is
$$y = r\,\frac{SD_Y}{SD_X}\,(x - \mu_X) + \mu_Y$$
Or, alternately,
$$z_Y = r\,z_X$$
When is the regression line the same as the SD line?
When r = 1 or -1
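The two forms describe the same line. A short derivation, assuming the usual standard-unit definitions $z_X = (x - \mu_X)/SD_X$ and $z_Y = (y - \mu_Y)/SD_Y$ (these definitions are not restated on this slide):

```latex
\begin{align*}
z_Y &= r\,z_X \\
\frac{y - \mu_Y}{SD_Y} &= r\,\frac{x - \mu_X}{SD_X} \\
y &= r\,\frac{SD_Y}{SD_X}\,(x - \mu_X) + \mu_Y
\end{align*}
```

Setting $x = \mu_X$ gives $y = \mu_Y$, which confirms that the line passes through the point of averages.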
![Page 10: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/10.jpg)
The regression line is the solid line running through the scatter plot.
If we looked at heights 1 SD above the average, the regression line runs through the point 0.47 SDs above average in weight.
![Page 11: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/11.jpg)
The Regression Line
Suppose we have the following five-statistic summary for the height (X) and weight (Y) of men in the US: µX = 70 inches, SDX = 3 inches, µY = 162 lbs, SDY = 30 lbs, r = 0.47
What is the average weight of all the men who are 73 inches tall? For a man 73 inches tall, what weight should we predict?
Answer: 176.1 lbs in both cases (73 inches is 1 SD above average, so the prediction is 162 + 0.47 × 30 lbs)
![Page 12: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/12.jpg)
The Regression Line
Suppose we have the following five-statistic summary for the height (X) and weight (Y) of men in the US: µX = 70 inches, SDX = 3 inches, µY = 162 lbs, SDY = 30 lbs, r = 0.47
What is the average weight of all the men who are 64 inches tall? For a man 64 inches tall, what weight should we predict?
Answer: 133.8 lbs in both cases (64 inches is 2 SDs below average, so the prediction is 162 − 2 × 0.47 × 30 lbs)
![Page 13: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/13.jpg)
The Regression Line
To use the regression line from X to Y…
1. Standardize the given x-value to get zX
2. Use the regression equation to go from X to Y: zY = r zX
3. Unstandardize zY to get y
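The three steps above can be sketched in a few lines of Python. The function name `predict_y` and its parameter names are chosen here for illustration; the numbers are the height/weight summary used throughout these slides.

```python
def predict_y(x, mean_x, sd_x, mean_y, sd_y, r):
    """Predict y from x using the regression line from X to Y."""
    z_x = (x - mean_x) / sd_x    # step 1: standardize x
    z_y = r * z_x                # step 2: regression equation zY = r * zX
    return mean_y + z_y * sd_y   # step 3: unstandardize zY


# Height (inches) -> weight (lbs), using the summary from the slides
weight_73 = predict_y(73, 70, 3, 162, 30, 0.47)
weight_64 = predict_y(64, 70, 3, 162, 30, 0.47)
print(round(weight_73, 1), round(weight_64, 1))
```

The two printed values should match the 73-inch and 64-inch answers given on the earlier slides.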
![Page 14: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/14.jpg)
The Regression Line
Suppose we have the following five-statistic summary for the height (X) and weight (Y) of men in the US: µX = 70 inches, SDX = 3 inches, µY = 162 lbs, SDY = 30 lbs, r = 0.47
Predict the weight of a man who is 6’4” (76 inches)
Answer: 190.2 lbs
![Page 15: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/15.jpg)
The Regression Line
Suppose we have the following five-statistic summary for the height (X) and weight (Y) of men in the US: µX = 70 inches, SDX = 3 inches, µY = 162 lbs, SDY = 30 lbs, r = 0.47
Predict the weight of a man who is 5’6” (66 inches)
Answer: 143.2 lbs
![Page 16: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/16.jpg)
The Regression Line
Important notes about the regression line from X to Y:
- It predicts the average value for y given an x value
- If the scatter plot is football shaped, this prediction will be above about half of the points at that x value and below the other half
  - This is because the variables are approximately normal
- The slope of the regression line will always be $r \cdot \frac{SD_Y}{SD_X}$
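As a rough illustration of the slope formula, the sketch below converts a five-statistic summary into a slope and intercept. The helper name `regression_line` is made up for this example; the intercept follows from the fact that the line runs through the point of averages.

```python
def regression_line(mean_x, sd_x, mean_y, sd_y, r):
    """Return (slope, intercept) of the regression line from X to Y."""
    slope = r * sd_y / sd_x
    # The line passes through the point of averages (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Height/weight summary from the slides: slope is in lbs per inch
slope, intercept = regression_line(70, 3, 162, 30, 0.47)
print(round(slope, 2), round(intercept, 1))
```

For the height/weight data the slope works out to 0.47 × 30 / 3 = 4.7 lbs per inch, so each additional inch of height is associated with about 4.7 lbs more weight on average.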
![Page 17: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/17.jpg)
The RMS Error
Recall that an average alone did not uniquely describe a data set
- A spread measure was needed

Since the regression method only gives us an average value as its prediction, we can’t really tell by this alone how good a guess it is
![Page 18: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/18.jpg)
The prediction given by the regression line for a height of 73 inches is at (73 in, 176 lbs).
How much does the heaviest 73” tall man weigh?
How much does the lightest 73” tall man weigh?
![Page 19: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/19.jpg)
The RMS Error
If we are given a specific man to predict, we are likely to be a little off with the regression prediction
You can think of the prediction error as being the vertical distance from the point to the regression line
That is, error = actual – predicted
If we want to get a good sense of what the typical error for a given x-value is, we can find the RMS of all the errors for all the points
This value is called the RMS error for the regression line
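To make the definition concrete, here is a minimal sketch that computes the RMS error directly from the points. The four data points are invented toy values, not data from the slides, and the line y = 1.4x + 0.5 is the regression line for those toy points.

```python
import math


def rms_error(xs, ys, slope, intercept):
    """Root-mean-square of the errors (actual - predicted) over all points."""
    errors = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical toy data, made up for illustration
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 6]

# Regression line for the toy data: slope 1.4, intercept 0.5
toy_rms = rms_error(xs, ys, 1.4, 0.5)
print(round(toy_rms, 3))
```

The errors here are 0.1, -0.3, 0.3 and -0.1, so the RMS error is the square root of their mean square, 0.05.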
![Page 20: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/20.jpg)
The RMS Error
The RMS error is to the regression line what the SD is to the average
- The RMS error measures the spread around a prediction from the regression line
- Recall we are generally assuming the data sets are approximately normal
  - About 68% of the points on a scatter plot will fall within the strip that runs from one RMS error below to one RMS error above the regression line
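The 68% figure can be checked with a small simulation. This is only a sketch under an assumption the slides state informally: the data are generated as a bivariate normal (football-shaped) cloud in standard units with correlation r = 0.47. In standard units the regression line is y = r·x and the RMS error is sqrt(1 − r²).

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable
r = 0.47
n = 20000

# Simulated standardized (x, y) pairs with correlation r (assumed model)
points = []
for _ in range(n):
    x = random.gauss(0, 1)
    y = r * x + math.sqrt(1 - r * r) * random.gauss(0, 1)
    points.append((x, y))

rms = math.sqrt(1 - r * r)  # RMS error in standard units
inside = sum(1 for x, y in points if abs(y - r * x) <= rms)
fraction_inside = inside / n
print(round(fraction_inside, 2))
```

The printed fraction should land close to 0.68; it will not be exact because the sample is random.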
![Page 21: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/21.jpg)
The RMS Error
[Figure: scatter plot with the regression line; about 68% of the points lie within 1 RMS error of the line and about 95% within 2 RMS errors]
![Page 22: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/22.jpg)
The RMS Error
The RMS error for regression from X to Y (denoted R) can be calculated from the five-statistic summary by
$$R = SD_Y\sqrt{1 - r^2}$$
- What units would R have?
- What happens when r gets close to 0?
- What happens when r gets close to 1 or -1?
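A quick numerical look at the three questions, as a sketch. The function name `rms_error_from_summary` is made up for this example; it evaluates R = SD_Y · sqrt(1 − r²), so R carries the units of Y.

```python
import math


def rms_error_from_summary(sd_y, r):
    """RMS error of the regression line from X to Y: SD_Y * sqrt(1 - r^2)."""
    return sd_y * math.sqrt(1 - r ** 2)


# Height/weight summary from the slides (SD_Y = 30 lbs, r = 0.47)
r_weight = rms_error_from_summary(30, 0.47)
print(round(r_weight, 1))                 # in lbs, the units of Y
print(rms_error_from_summary(30, 0.0))    # r = 0: same as SD_Y
print(rms_error_from_summary(30, 1.0))    # r = 1: all points on the line
```

When r is near 0 the regression line is no better than guessing the average of Y, so R is close to SD_Y; when r is near 1 or -1 the points hug the line and R shrinks toward 0.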
![Page 23: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/23.jpg)
The RMS Error
The RMS error allows us to give a range around our prediction.
If the scatter plot is football-shaped, the RMS error is roughly constant across the entire range of the data set
- The vertical spread around one part is about the same as the vertical spread around other parts
![Page 24: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/24.jpg)
The RMS Error
Suppose we have the following five-statistic summary for the height (X) and weight (Y) of men in the US: µX = 70 inches, SDX = 3 inches, µY = 162 lbs, SDY = 30 lbs, r = 0.47
Predict and give the RMS error for the weight of a man who is 6’2” (74 inches)
Answer: 180.8 ± 26.5 lbs
![Page 25: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/25.jpg)
The RMS Error
Suppose we have the following five-statistic summary for the height (X) and weight (Y) of men in the US: µX = 70 inches, SDX = 3 inches, µY = 162 lbs, SDY = 30 lbs, r = 0.47
Predict and give the RMS error for the weight of a man who is 5’4” (64 inches)
Answer: 133.8 ± 26.5 lbs
![Page 26: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/26.jpg)
The Regression Effect
A preschool program attempts to boost students’ IQ scores
- The children are tested when they enter the program (pre-test)
- The children are retested when they leave the program (post-test)
![Page 27: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/27.jpg)
The Regression Effect
On both occasions, the average IQ score was 100, with an SD of 15
- Also, students with below-average IQs on the pre-test had scores that went up on average by 5 points
- Students with above-average scores on the pre-test had their scores drop by an average of 5 points
![Page 28: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/28.jpg)
The Regression Effect
Does the program equalize intelligence?
No. If the program really equalized intelligence, then the SD for the post-test results should be smaller than that of the pre-test results. This is an example of the regression effect.
![Page 29: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/29.jpg)
The Regression Effect
The regression effect is a byproduct of the fact that predictions from a regression line are average values
- Some of the people who did very well on the pre-test may simply have had a good test day
  - Their scores shouldn’t necessarily be as high on the post-test as they were on the pre-test
- Similarly, some of the people who did poorly on the pre-test may simply have had a bad test day
  - Their scores shouldn’t necessarily be as low on the post-test as they were on the pre-test
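The good-day/bad-day explanation can be illustrated with a simulation. The model below is an assumption made for this sketch, not something given in the slides: each score is a stable ability (SD 12) plus independent test-day luck (SD 9), which gives scores with an average of 100 and an SD of 15. No program effect is simulated at all.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable
n = 20000

# Assumed model: score = stable ability + independent test-day luck
ability = [random.gauss(100, 12) for _ in range(n)]
pre = [a + random.gauss(0, 9) for a in ability]
post = [a + random.gauss(0, 9) for a in ability]  # no treatment effect

# Average change for students below and above 100 on the pre-test
change_low = statistics.mean([q - p for p, q in zip(pre, post) if p < 100])
change_high = statistics.mean([q - p for p, q in zip(pre, post) if p > 100])

print(round(change_low, 1), round(change_high, 1))
print(round(statistics.pstdev(pre), 1), round(statistics.pstdev(post), 1))
```

Under these assumptions the below-average group gains a few points and the above-average group loses a few, while the overall SD stays near 15 on both tests, which is the same pattern as in the preschool example.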
![Page 30: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/30.jpg)
The Regression Effect
Sometimes researchers mistake the regression effect for some important underlying cause in the study (the regression fallacy)
- Tall fathers tend to have tall sons who are slightly shorter than the father
- There is no biological cause for this reduction
  - It is strictly statistical
![Page 31: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/31.jpg)
The Regression Effect
As part of their training, air force pilots make practice landings with instructors, and are rated on performance
- The instructors discuss the ratings with the pilots after each landing
- Statistical analysis shows that pilots who make poor landings the first time tend to do better the second time
- Conversely, pilots who make good landings the first time tend to do worse the second time
![Page 32: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/32.jpg)
The Regression Effect
The conclusion was that criticism helps the pilots while praise makes them do worse
- As a result, instructors were ordered to criticize all landings, good or bad

Was this warranted by the facts?
No. This is an example of the regression fallacy.
![Page 33: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/33.jpg)
The Regression Effect
An instructor gives a midterm
- She asks the students who score 20 points below average to see her regularly during her office hours for special tutoring
- They all score at class average or above on the final

Can this improvement be attributed to the regression effect? Why/why not?
No. If it were only the regression effect, most of the students would still have scored below average. The fact that everyone in the tutoring group scored at or above average indicates that the tutoring had a real effect.
![Page 34: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/34.jpg)
A Second Regression Line
The focus so far has been on the regression line from X to Y
- Note, however, that there is also a regression line from Y to X

What would the difference between the two lines be?
The regression line from X to Y is given by zY = r zX, while the regression line from Y to X is given by zX = r zY
![Page 35: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/35.jpg)
A Second Regression Line
A study of 1,000 families gives the following
- The husbands’ average height was 68 inches with an SD of 2.7 inches
- The wives’ average height was 63 inches with an SD of 2.5 inches
- The correlation between them was 0.25

Predict and give the RMS error for the husband’s height when his wife’s height is 68 inches
Answer: 69.35 inches, give or take 2.61 inches
![Page 36: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/36.jpg)
A Second Regression Line
A study of 1,000 families gives the following
- The husbands’ average height was 68 inches with an SD of 2.7 inches
- The wives’ average height was 63 inches with an SD of 2.5 inches
- The correlation between them was 0.25

Predict and give the RMS error for the wife’s height when her husband’s height is 69.35 inches
Answer: 63.31 inches, give or take 2.42 inches
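The two husband/wife questions form a round trip, and a short sketch shows that the two regression lines are not inverses of each other. The function name `predict` and its parameter names are chosen here for illustration only.

```python
import math


def predict(value, mean_from, sd_from, mean_to, sd_to, r):
    """Regression prediction: move r SDs in the target per SD in the predictor."""
    return mean_to + r * (value - mean_from) / sd_from * sd_to


r = 0.25
# Wife (mean 63, SD 2.5) -> husband (mean 68, SD 2.7)
husband = predict(68, 63, 2.5, 68, 2.7, r)
# Husband -> wife, starting from the predicted husband height
wife = predict(husband, 68, 2.7, 63, 2.5, r)

rms_husband = 2.7 * math.sqrt(1 - r ** 2)
rms_wife = 2.5 * math.sqrt(1 - r ** 2)
print(round(husband, 2), round(rms_husband, 2))
print(round(wife, 2), round(rms_wife, 2))
```

Starting from a wife who is 68 inches tall, the round trip comes back to a predicted wife's height near 63.3 inches rather than 68, because each prediction moves only r SDs toward the extreme.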
![Page 37: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/37.jpg)
A Second Regression Line
[Figure: scatter plot showing the regression line from X to Y, the regression line from Y to X, and the SD line]
![Page 38: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/38.jpg)
A Second Regression Line
[Figure: scatter plot showing the regression line from X to Y, the regression line from Y to X, and the SD line]
![Page 39: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/39.jpg)
A Second Regression Line
[Figure: scatter plot showing the regression line from X to Y, the regression line from Y to X, and the SD line]
![Page 40: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/40.jpg)
Summary
When trying to make predictions from a football-shaped plot, a good predictor is the average value for one variable within a restricted range in the other
The regression line runs through all of these averages
For every SD moved in the independent variable, the regression line predicts a move of r SD’s in the dependent variable
The prediction from the regression line is likely to be off by the RMS error
The RMS error can be calculated as $SD_Y\sqrt{1 - r^2}$
![Page 41: Chapters 10 and 11: Using Regression to Predict](https://reader035.vdocuments.site/reader035/viewer/2022081512/56815a83550346895dc7ef5e/html5/thumbnails/41.jpg)
Summary
The regression effect is purely statistical
- It does not reflect a significant underlying trend in the data

There are two regression lines for a scatter plot
- Which one to use depends on which variable you are predicting