linear correlation and linear regression + summary of tests dr. omar al jadaan assistant professor...
TRANSCRIPT
![Page 1: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/1.jpg)
Linear correlation and linear regression + summary of tests
Dr. Omar Al JadaanAssistant Professor – Computer Science &
Mathematics
![Page 2: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/2.jpg)
Recall: Covariance
1
))((),(cov 1
n
YyXxyx
n
iii
![Page 3: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/3.jpg)
cov(X,Y) > 0 X and Y are positively correlated
cov(X,Y) < 0 X and Y are inversely correlated
cov(X,Y) = 0 X and Y are independent
Interpreting Covariance
![Page 4: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/4.jpg)
Correlation coefficient
Pearson’s Correlation Coefficient is standardized covariance (unitless):
yx
yxariancer
varvar
),(cov
![Page 5: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/5.jpg)
Correlation Measures the relative strength of the linear
relationship between two variables Unit-less Ranges between –1 and 1 The closer to –1, the stronger the negative linear
relationship The closer to 1, the stronger the positive linear
relationship The closer to 0, the weaker any positive linear
relationship
![Page 6: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/6.jpg)
Scatter Plots of Data with Various Correlation Coefficients
Y
X
Y
X
Y
X
Y
X
Y
X
r = -1 r = -.6 r = 0
r = +.3r = +1
Y
Xr = 0
![Page 7: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/7.jpg)
Y
X
Y
X
Y
Y
X
X
Linear relationships Curvilinear relationships
Linear Correlation
![Page 8: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/8.jpg)
Y
X
Y
X
Y
Y
X
X
Strong relationships Weak relationships
Linear Correlation
![Page 9: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/9.jpg)
Linear Correlation
Y
X
Y
X
No relationship
![Page 10: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/10.jpg)
Some calculation formulas…
yx
xy
n
ii
n
ii
n
iii
n
ii
n
ii
n
iii
SSSS
SS
yyxx
yyxx
n
yy
n
xx
n
yyxx
r
1
2
1
2
1
1
2
1
2
1
)()(
))((
1
)(
1
)(
1
))((
ˆ
yx
xy
SSSS
SSr ˆ
Note: Easier computation formulas:
22
22
ynySS
xnxSS
yxnyxSS
iy
ix
iixy
![Page 11: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/11.jpg)
Sampling distribution of correlation coefficient:
*note, like a proportion, the variance of the correlation coefficient depends on the correlation coefficient itselfsubstitute in estimated r
2
1)ˆ(
2
n
rrSE
The sample correlation coefficient follows a T-distribution with n-2 degrees of freedom (since you have to estimate the standard error).
![Page 12: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/12.jpg)
What is “Linear”?
Remember this: Y=mX+B?
B
m
![Page 13: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/13.jpg)
What’s Slope?
A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.
![Page 14: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/14.jpg)
Simple linear regression
The linear regression model:
Love of Math = 5 + .01*math SAT score
intercept
slope
P=.22; not significant
![Page 15: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/15.jpg)
PredictionIf you know something about X, this knowledge helps you
predict something about Y. (Sound familiar?…sound like conditional probabilities?)
![Page 16: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/16.jpg)
EXAMPLE The distribution of baby weights at
Stanford ~ N(3400, 360000)
Your “Best guess” at a random baby’s weight, given no information about the baby, is what?
3400 grams
But, what if you have relevant information? Can you make a better guess?
![Page 17: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/17.jpg)
Predictor variable X=gestation time
Assume that babies that gestate for longer are born heavier, all other things being equal.
Pretend (at least for the purposes of this example) that this relationship is linear.
Example: suppose a one-week increase in gestation, on average, leads to a 100-gram increase in birth-weight
![Page 18: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/18.jpg)
Y depends on X
Y=birth- weight
(g)
X=gestation time (weeks)
Best fit line is chosen such that the sum of the squared (why squared?) distances of the points (Yi’s) from the line is minimized:
Or mathematically… (remember max and mins from calculus)…
Derivative[(Yi-(mx+b))2]=0
![Page 19: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/19.jpg)
Prediction
A new baby is born that had gestated for just 30 weeks. What’s your best guess at the birth-weight?
Are you still best off guessing 3400? NO!
![Page 20: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/20.jpg)
Y=birth- weight
(g)
X=gestation time (weeks)
At 30 weeks…
3000
30
![Page 21: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/21.jpg)
Y=birth weight
(g)
X=gestation time (weeks)
At 30 weeks…
(x,y)=
(30,3000)
3000
30
![Page 22: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/22.jpg)
At 30 weeks…
The babies that gestate for 30 weeks appear to center around a weight of 3000 grams.
In Math-Speak… E(Y/X=30 weeks)=3000 grams
Note the conditional expectation
![Page 23: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/23.jpg)
But…Note that not every Y-value (Yi) sits on the line. There’s variability.
Yi=3000 + random errori
In fact, babies that gestate for 30 weeks have birth-weights that center at 3000 grams, but vary around 3000 with some variance 2
Approximately what distribution do birth-weights follow? Normal. Y/X=30 weeks ~ N(3000, 2)
![Page 24: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/24.jpg)
Y=birth- weight
(g)
X=gestation time (weeks)
And, if X=20, 30, or 40…
20 30 40
![Page 25: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/25.jpg)
Y=baby weights
(g)
X=gestation times (weeks)
If X=20, 30, or 40…
20 30 40
Y/X=40 weeks ~ N(4000, 2)
Y/X=30 weeks ~ N(3000, 2)
Y/X=20 weeks ~ N(2000, 2)
![Page 26: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/26.jpg)
Mean values fall on the line
E(Y/X=40 weeks)=4000 E(Y/X=30 weeks)=3000 E(Y/X=20 weeks)=2000
E(Y/X)= Y/X = 100 grams/week*X weeks
![Page 27: Linear correlation and linear regression + summary of tests Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics](https://reader034.vdocuments.site/reader034/viewer/2022042717/56649f575503460f94c7c285/html5/thumbnails/27.jpg)
Linear Regression Model
Y’s are modeled…
Yi= 100*X + random errori
Follows a normal distribution
Fixed – exactly on the line