TRANSCRIPT
![Page 1: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/1.jpg)
Chapter 20
Linear Regression
![Page 2: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/2.jpg)
What if…
We believe that an important relation between two measures exists?
For example, we ask 5 people about their salary and education level.
For each observation we have two measures, and those two measures came from the same person.
![Page 3: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/3.jpg)
What would we “predict”?
– Does more education mean more salary?
– Does more salary mean more education?
– Does more education mean less salary?
– Does more salary mean less education?
– Are salary and education related?
![Page 4: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/4.jpg)
Regression
Descriptive vs. Inferential
Bivariate data - measurements on two variables for each observation
– Heights (X) and weights (Y)
– IQ (X) and SAT (Y) scores
– Years of educ. (X) and Annual salary (Y)
– Number of Policemen (X) and Number of crimes (Y) in US cities
![Page 5: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/5.jpg)
Regression
How are the two sets of scores related?
Using a scatterplot we can “look” at the relationship
Constructed by plotting each of the bivariate observations (X, Y)
[Scatterplot: Yrs of Education (X) vs. Annual Salary in 1000's (Y)]
![Page 6: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/6.jpg)
Regression
Which one’s X and which one’s Y?
That’s up to you, but…
Generally, the X variable is thought of as the “predictor” variable
We try to predict a Y score given an X score
[Scatterplot: Yrs of Education (X) vs. Annual Salary in 1000's (Y)]
![Page 7: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/7.jpg)
Regression
If the scores seem to “line up,” we call this a “linear relationship”
[Scatterplot: Yrs of Education (X) vs. Annual Salary in 1000's (Y)]
![Page 8: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/8.jpg)
Interpreting Scatterplots
If the following relations hold:
– low x - high y
– mid x - mid y
– high x - low y,
“A negative linear relationship”
[Scatterplot: Yrs of Education (X) vs. Annual Salary in 1000's (Y)]
![Page 9: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/9.jpg)
Interpreting Scatterplots
If the following relations hold:
– low x - low y
– mid x - mid y
– high x - high y,
“A positive linear relationship”
[Scatterplot: Police per 1000 citizens (X) vs. Number of Crimes in 1000s (Y)]
![Page 10: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/10.jpg)
Interpreting Scatterplots
However, there can also be “no relation”
[Scatterplot: Shoe Size (X) vs. IQ (Y)]
![Page 11: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/11.jpg)
Interpreting Scatterplots
Curvilinear
[Scatterplot: Height (X) vs. Weight (Y)]
![Page 12: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/12.jpg)
Measuring Linear Relationships
The first measure of a linear relationship (not in the book) is COVARIANCE (sXY)
![Page 13: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/13.jpg)
Or
SPXY is known as the “Sum of Products” or the sum of the products of the deviations of X and Y from their means
![Page 14: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/14.jpg)
Easy Calculation
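The formulas on the covariance slides are images and do not appear in this transcript. As a sketch of the standard definitions, assuming the sample covariance divides by n − 1 (the slides may have used n instead) and writing M_X and M_Y for the two means, the covariance, the definitional Sum of Products, and the computational ("easy") form of the Sum of Products are:

```latex
s_{XY} = \frac{SP_{XY}}{n - 1}

SP_{XY} = \sum_{i=1}^{n} (X_i - M_X)(Y_i - M_Y)
        = \sum_{i=1}^{n} X_i Y_i - \frac{\left(\sum_{i=1}^{n} X_i\right)\left(\sum_{i=1}^{n} Y_i\right)}{n}
```

The second form of SP_XY needs only the column totals ΣX, ΣY, and ΣXY, which is why it is the easier one to compute by hand.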
![Page 15: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/15.jpg)
Covariance
Interpretation:
– positive = positive linear relationship
– negative = negative linear relationship
– zero = no linear relationship
Magnitude (strength of the relationship)?
– Uninterpretable
– for example, a large covariance does not necessarily mean a strong relationship
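To see why the magnitude is uninterpretable, here is a minimal Python sketch. The data are hypothetical (invented for illustration, not taken from the slides), and the `covariance` helper assumes the n − 1 denominator. Expressing the same salaries in dollars instead of thousands of dollars multiplies the covariance by 1000, even though the relationship itself has not changed.

```python
def covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sp_xy / (n - 1)

# Hypothetical data, invented for illustration only.
educ_years = [10, 12, 14, 16, 18]
salary_thousands = [25, 30, 41, 48, 60]
salary_dollars = [s * 1000 for s in salary_thousands]

cov_thousands = covariance(educ_years, salary_thousands)
cov_dollars = covariance(educ_years, salary_dollars)

# Same people, same relationship, very different covariance values.
print(cov_thousands, cov_dollars)
```

The sign of the covariance is the same in both cases; only the size depends on the units.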
![Page 16: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/16.jpg)
But, we can use covariance
Which line best fits our data?
Do we just draw one that looks good?
No, we can use something called “least squares regression” to find the equation of the best-fit line (“Best-fit linear regression”)
[Scatterplot: Yrs of Education (X) vs. Annual Salary in 1000's (Y)]
![Page 17: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/17.jpg)
Linear Equations
Yi = mXi + b
m = slope
b = y-intercept
![Page 18: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/18.jpg)
Finding the Slope
![Page 19: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/19.jpg)
Or…
![Page 20: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/20.jpg)
Finding the y-intercept (b)
After finding the slope (m), find b using:
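The slope and intercept formulas on these slides are images and are missing from the transcript. The standard least squares results, written with the SP and SS notation used elsewhere in this chapter, are sketched below. The second form of the slope is an assumption about what the "Or…" slide showed.

```latex
m = \frac{SP_{XY}}{SS_X}
  \quad\text{or equivalently}\quad
m = \frac{s_{XY}}{s_X^{2}},
  \qquad SS_X = \sum_{i=1}^{n} (X_i - M_X)^2

b = M_Y - m\,M_X
```

The intercept formula means that the best-fit line always passes through the point of means (M_X, M_Y).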
![Page 21: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/21.jpg)
Least Squares Criterion
The best line has the property of least squares
The sum of the squared deviations of the points from the line is a minimum
[Scatterplot with best-fit line: Yrs of Education (X) vs. Annual Salary in 1000's (Y)]
![Page 22: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/22.jpg)
What’s the “least” again?
What are we trying to minimize?
– The best fit line will be described by the function Yi = mXi + b
– Thus, for any Xi, we can estimate a corresponding Yi value
– Problem: for some Xi’s we already have Yi’s
– So, let’s call the estimated value Ŷi (“Y-sub-i-hat”), to differentiate it from the “real” Yi
![Page 23: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/23.jpg)
Least Squares Criterion
For example, when Xi = 15 we would estimate that Ŷi = 44,000
But, we have a “real” Yi value corresponding to Xi = 15 (35,000)
[Scatterplot with best-fit line: Yrs of Education (X) vs. Annual Salary in 1000's (Y). When Xi = 15, our estimated Y value (Ŷi) is 44,000; a “real” Y value is 35,000.]
![Page 24: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/24.jpg)
Minimize this…
For every Xi, we have a value Yi, and an estimate of Yi (Ŷi)
Consider the quantity: Yi − Ŷi
– Which is the deviation of the real score from the estimated score, for any given Xi value
The sum of these deviations will be zero
![Page 25: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/25.jpg)
• But, by squaring those deviations and summing, we get Σ(Yi − Ŷi)²
• We want the line that makes the above quantity the minimum (the least squares criterion)
• This is also called the sum of squares error or SSE (how much do our estimates “err” from our real values?)
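The two claims above (the raw deviations sum to zero, and the least squares line gives the smallest SSE) can be checked numerically. The sketch below uses hypothetical education and salary data invented for illustration. The helper names `fit_line`, `residuals`, and `sse` are not from the slides, and the slope and intercept use the standard formulas m = SP/SS(X) and b = M_Y − m·M_X.

```python
def fit_line(xs, ys):
    """Least squares slope and intercept: m = SP_XY / SS_X, b = M_Y - m * M_X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    m = sp_xy / ss_x
    b = mean_y - m * mean_x
    return m, b

def residuals(xs, ys, m, b):
    """Deviation of each real Y from its estimate: Y_i - Y_hat_i."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]

def sse(xs, ys, m, b):
    """Sum of squares error for the line Y = m * X + b."""
    return sum(e * e for e in residuals(xs, ys, m, b))

# Hypothetical data, invented for illustration only.
xs = [10, 12, 14, 16, 18]   # years of education
ys = [25, 30, 41, 48, 60]   # annual salary in 1000's

m, b = fit_line(xs, ys)
sum_of_deviations = sum(residuals(xs, ys, m, b))   # zero, up to rounding
best_sse = sse(xs, ys, m, b)

# Any other line, such as one with a nudged slope or intercept, has a larger SSE.
worse_slope_sse = sse(xs, ys, m + 0.5, b)
worse_intercept_sse = sse(xs, ys, m, b + 2.0)
print(m, b, sum_of_deviations, best_sse, worse_slope_sse, worse_intercept_sse)
```

Because the raw deviations always cancel out for the fitted line, they cannot be used to compare lines; the squared deviations can.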
![Page 26: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/26.jpg)
How accurate are our Estimates?
Two ways to measure how “good” our estimates are:
– Standard Error of the Estimate
– Coefficient of Determination (not covered in our book, yet)
![Page 27: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/27.jpg)
Standard Error of the Estimate
but, this term is very hard to interpret. (Hurrah, there are better ways to measure the goodness of the fit!)
![Page 28: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/28.jpg)
Coefficient of Determination
cd = r²
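The formulas for these two measures are not captured in the transcript. A sketch of the usual definitions follows, assuming the standard error of the estimate divides SSE by n − 2 (some introductory texts divide by n) and writing SS_Y for the sum of squared deviations of Y from its mean:

```latex
s_{est} = \sqrt{\frac{SSE}{n - 2}} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}}

r^2 = \frac{SP_{XY}^{2}}{SS_X \, SS_Y} = 1 - \frac{SSE}{SS_Y}
```

The standard error is in the units of Y, which makes it hard to compare across data sets. r² is unit-free and lies between 0 and 1; it can be read as the proportion of the variability in Y that is accounted for by the line.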
![Page 29: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/29.jpg)
Now You:

| ID | INCOME | NUMDRK |
|---|---|---|
| 2001 | 1 | 1 |
| 2002 | 6 | 2 |
| 2003 | 5 | 8 |
| 2004 | 4 | 1 |
| 2005 | 6 | 3 |
![Page 30: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/30.jpg)
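Practice:

| ID | INCOME (X) | NUMDRK (Y) | XY |
|---|---|---|---|
| 2001 | 1 | 1 | |
| 2002 | 6 | 2 | |
| 2003 | 5 | 8 | |
| 2004 | 4 | 1 | |
| 2005 | 6 | 3 | |
| Σ | | | |
| n | | | |
| M | | | |
| SS | | | |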
![Page 31: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/31.jpg)
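Practice:

| ID | INCOME (X) | NUMDRK (Y) | XY |
|---|---|---|---|
| 2001 | 1 | 1 | 1 |
| 2002 | 6 | 2 | 12 |
| 2003 | 5 | 8 | 40 |
| 2004 | 4 | 1 | 4 |
| 2005 | 6 | 3 | 18 |
| Σ | 22 | 15 | 75 |
| n | 5 | 5 | |
| M | 4.4 | 3 | |
| SS | 17.2 | 34 | |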
![Page 32: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/32.jpg)
![Page 33: Chapter 20 Linear Regression. What if… We believe that an important relation between two measures exists? For example, we ask 5 people about their salary](https://reader035.vdocuments.site/reader035/viewer/2022062719/56649ec65503460f94bd1b84/html5/thumbnails/33.jpg)
[Scatterplot of the practice data with fitted line]
f(x) = 0.523255813953 x + 0.697674418605
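The fitted equation above can be reproduced from the practice table. The sketch below treats INCOME as X and NUMDRK as Y and uses the standard computational formulas for SS and SP. The variable names are illustrative and not from the slides.

```python
# Practice data from the table: INCOME is X, NUMDRK is Y.
income = [1, 6, 5, 4, 6]
numdrk = [1, 2, 8, 1, 3]
n = len(income)

# Column totals and means (the Σ and M rows of the table).
sum_x = sum(income)                                  # 22
sum_y = sum(numdrk)                                  # 15
sum_xy = sum(x * y for x, y in zip(income, numdrk))  # 75
mean_x = sum_x / n
mean_y = sum_y / n

# Sums of squares and sum of products, computational forms.
ss_x = sum(x * x for x in income) - sum_x ** 2 / n   # about 17.2
ss_y = sum(y * y for y in numdrk) - sum_y ** 2 / n   # 34
sp_xy = sum_xy - sum_x * sum_y / n                   # 9

# Least squares slope and intercept.
slope = sp_xy / ss_x
intercept = mean_y - slope * mean_x

# Coefficient of determination for the same data.
r_squared = sp_xy ** 2 / (ss_x * ss_y)

print(f"f(x) = {slope:.6f} x + {intercept:.6f}")
print(f"r^2 = {r_squared:.4f}")
```

The slope and intercept agree with the trend line equation shown with the plot, which confirms the Σ, M, and SS rows of the practice table.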