Lecture 6 Notes
• Note: I will e-mail homework 2 tonight. It will be due next Thursday.
• The Multiple Linear Regression model (Chapter 4.1)
• Inferences from multiple regression analysis (Chapter 4.2)
• In multiple regression analysis, we consider more than one independent variable x1, …, xK. We are interested in the conditional mean of y given x1, …, xK.
Automobile Example
• A team charged with designing a new automobile is concerned about the gas mileage that can be achieved. The design team is interested in two things:
(1) Which characteristics of the design are likely to affect mileage?
(2) A new car is planned to have the following characteristics: weight – 4000 lbs, horsepower – 200, cargo – 18 cubic feet, seating – 5 adults. Predict the new car’s gas mileage.
• The team has available information about gallons per 1000 miles and four design characteristics (weight, horsepower, cargo, seating) for a sample of cars made in 1989. Data is in car89.JMP.
Multivariate Correlations

| | GP1000MHwy | Weight(lb) | Horsepower | Cargo | Seating |
|---|---|---|---|---|---|
| GP1000MHwy | 1.0000 | 0.7097 | 0.6157 | 0.3405 | 0.2599 |
| Weight(lb) | 0.7097 | 1.0000 | 0.7509 | 0.1816 | 0.3499 |
| Horsepower | 0.6157 | 0.7509 | 1.0000 | -0.0548 | -0.0914 |
| Cargo | 0.3405 | 0.1816 | -0.0548 | 1.0000 | 0.4894 |
| Seating | 0.2599 | 0.3499 | -0.0914 | 0.4894 | 1.0000 |

7 rows not used due to missing values.

[Scatterplot Matrix of GP1000MHwy, Weight(lb), Horsepower, Cargo, and Seating]
Best Single Predictor
• To obtain the correlation matrix and pairwise scatterplots, click Analyze, Multivariate Methods, Multivariate.
• If we use simple linear regression with each of the four independent variables, which provides the best predictions?
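The same pairwise correlations JMP's Multivariate platform reports can be computed outside JMP. A minimal sketch with numpy, using made-up numbers standing in for three of the car89 variables (not the real data):

```python
# Illustrative sketch with invented numbers (not the real car89 sample):
# reproducing a pairwise correlation matrix with numpy's corrcoef.
import numpy as np

gp1000 = np.array([40.0, 36.0, 49.0, 31.0, 44.0])            # GP1000MHwy
weight = np.array([3200.0, 2800.0, 4000.0, 2400.0, 3600.0])  # Weight(lb)
hp     = np.array([150.0, 120.0, 210.0, 100.0, 180.0])       # Horsepower

# Each row of the input is one variable; corrcoef returns the symmetric
# matrix of pairwise correlations with 1.0 on the diagonal.
corr = np.corrcoef([gp1000, weight, hp])
print(np.round(corr, 4))
```

With the real car89.JMP sample (and JMP's handling of the 7 rows with missing values) the entries would match the table above.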
Best Single Predictor
• Answer: The simple linear regression that has the highest R² gives the best predictions, because recall that R² = 1 - SSE/SST, so the highest R² corresponds to the smallest sum of squared errors.
• Weight gives the best predictions of GP1000MHwy based on simple linear regression.
• But we can obtain better predictions by using more than one of the independent variables.
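Since R² = 1 - SSE/SST, choosing the best single predictor amounts to comparing R² across the candidate simple regressions. A sketch with simulated data (not the car89 sample), where a weight-like variable truly drives y, so its regression should win:

```python
# Simulated sketch (invented data): compare R^2 = 1 - SSE/SST across
# candidate simple linear regressions to find the best single predictor.
import numpy as np

rng = np.random.default_rng(0)
n = 50
weight = rng.uniform(2000, 4000, n)
hp = 0.05 * weight + rng.normal(0, 20, n)        # correlated with weight
y = 10 + 0.008 * weight + rng.normal(0, 3, n)    # weight truly drives y

def r_squared(x, y):
    b1, b0 = np.polyfit(x, y, 1)                 # least-squares line
    sse = np.sum((y - (b0 + b1 * x)) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - sse / sst

print(r_squared(weight, y), r_squared(hp, y))
```

In this simulation weight has the higher R², just as Weight(lb) does in the car89 correlations above.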
Multiple Linear Regression Model
• The model: μ_{y|x1,…,xK} = E(Y | x1, …, xK) = β0 + β1x1 + … + βKxK, so for observation i, y_i = β0 + β1x_{i1} + β2x_{i2} + … + βKx_{iK} + e_i.
• Assumptions about e_i:
– The expected value of the disturbances is zero for each (x_{i1}, …, x_{iK}), i.e., E(e_i | x_{i1}, …, x_{iK}) = 0.
– The variance of each e_i is equal to σ_e², i.e., Var(e_i | x_{i1}, …, x_{iK}) = σ_e².
– The e_i are normally distributed.
– The e_i are independent.
Point Estimates for Multiple Linear Regression Model
• We use the same least squares procedure as for simple linear regression.
• Our estimates b0, …, bK of β0, …, βK are the coefficients that minimize the sum of squared prediction errors:
• Least Squares in JMP: Click Analyze, Fit Model, put dependent variable into Y and add independent variables to the construct model effects box.
(b0, …, bK) = argmin over (b0*, …, bK*) of Σ_{i=1}^n (y_i - b0* - b1*x_{i1} - … - bK*x_{iK})²

The resulting prediction equation is ŷ = b0 + b1x1 + … + bKxK.
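The minimization above can be sketched with numpy's `lstsq`. The data here are simulated stand-ins for car89.JMP, with true coefficients loosely inspired by the JMP output (an assumption for illustration, not a fit to the real sample):

```python
# Sketch of least squares for multiple regression via numpy.linalg.lstsq,
# on simulated data (invented coefficients, not the real car89 sample).
import numpy as np

rng = np.random.default_rng(1)
n = 100
weight = rng.uniform(2000, 4000, n)
hp = rng.uniform(100, 250, n)
y = 19.1 + 0.004 * weight + 0.04 * hp + rng.normal(0, 3.5, n)

X = np.column_stack([np.ones(n), weight, hp])  # design matrix, intercept first
b, *_ = np.linalg.lstsq(X, y, rcond=None)      # minimizes sum of squared errors
print(b)  # estimates b0, b1 (weight), b2 (horsepower)
```

At the least-squares solution the residual vector is orthogonal to every column of the design matrix, which is what the normal equations require.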
Response GP1000MHwy, Whole Model

Summary of Fit
RSquare: 0.589015
RSquare Adj: 0.573208
Root Mean Square Error: 3.542778
Mean of Response: 37.33359
Observations (or Sum Wgts): 109

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Ratio | Prob > F |
|---|---|---|---|---|---|
| Model | 4 | 1870.7788 | 467.695 | 37.2627 | <.0001 |
| Error | 104 | 1305.3330 | 12.551 | | |
| C. Total | 108 | 3176.1118 | | | |

Parameter Estimates

| Term | Estimate | Std Error | t Ratio | Prob>\|t\| |
|---|---|---|---|---|
| Intercept | 19.100521 | 2.098478 | 9.10 | <.0001 |
| Weight(lb) | 0.0040877 | 0.001203 | 3.40 | 0.0010 |
| Horsepower | 0.0426999 | 0.01567 | 2.73 | 0.0075 |
| Cargo | 0.0533 | 0.013787 | 3.87 | 0.0002 |
| Seating | 0.0268912 | 0.428283 | 0.06 | 0.9501 |

[Actual by Predicted Plot and Residual by Predicted Plot: GP1000MHwy residuals (-10 to 10) vs. GP1000MHwy predicted (25 to 55)]
Root Mean Square Error
• Estimate of σ_e: s_e = √( Σ_{i=1}^n (y_i - ŷ_i)² / (n - (K + 1)) )
• s_e = Root Mean Square Error in JMP.
• For simple linear regression of GP1000MHwy on Weight, s_e = 3.86. For multiple linear regression of GP1000MHwy on weight, horsepower, cargo, seating, s_e = 3.54.
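A quick numerical sketch of the s_e formula with its n - (K + 1) denominator, using tiny made-up fitted values (K = 2 regressors assumed):

```python
# Sketch of s_e = sqrt( sum_i (y_i - yhat_i)^2 / (n - (K + 1)) )
# with invented observations and fitted values from a hypothetical K = 2 fit.
import numpy as np

y     = np.array([37.0, 42.0, 30.0, 45.0, 33.0, 40.0])
y_hat = np.array([36.0, 40.5, 31.0, 44.0, 35.0, 39.0])  # hypothetical fitted values
n, K = len(y), 2

s_e = np.sqrt(np.sum((y - y_hat) ** 2) / (n - (K + 1)))
print(s_e)
```

Note the denominator is n - (K + 1), not n: one degree of freedom is lost per estimated coefficient, including the intercept.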
Residuals and Root Mean Square Errors
• ŷ = estimated E(Y | X1 = x1, …, XK = xK) = b0 + b1x1 + … + bKxK
• Residual for observation i = prediction error for observation i = y_i - ŷ_i = y_i - (b0 + b1x_{i1} + … + bKx_{iK})
• Root mean square error = typical size of the absolute value of a prediction error.
• As with the simple linear regression model, if the multiple linear regression model holds, about 95% of the observations will be within two RMSEs of their predicted value.
• For the car data, about 95% of the time, the actual GP1000M will be within 2 × 3.54 = 7.08 GP1000M of the predicted GP1000M of the car based on the car's weight, horsepower, cargo and seating.
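The "within two RMSEs about 95% of the time" rule can be checked by simulation. This sketch uses a simple regression with normal errors (simulated data, not the car sample); the normality assumption is what makes the 95% figure work:

```python
# Simulated check of the rule of thumb: under the regression model with
# normal errors, about 95% of observations fall within 2 RMSEs of their
# predicted values.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
x = rng.uniform(0, 10, n)
y = 5 + 2 * x + rng.normal(0, 3, n)       # true sigma_e = 3

b1, b0 = np.polyfit(x, y, 1)              # least-squares fit
resid = y - (b0 + b1 * x)
s_e = np.sqrt(np.sum(resid ** 2) / (n - 2))

coverage = np.mean(np.abs(resid) < 2 * s_e)
print(coverage)                            # close to 0.95
```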
Inferences about Regression Coefficients
• Confidence intervals: a (1 - α) × 100% confidence interval for βk is bk ± t_{α/2} · s_{bk}. Degrees of freedom for t equals n - (K + 1). The standard error of bk, s_{bk}, is found on the JMP output.
• Hypothesis test: H0: βk = βk* versus Ha: βk ≠ βk*. Test statistic: t = (bk - βk*) / s_{bk}. Decision rule: reject H0 if t > t_{α/2} or t < -t_{α/2}. The p-value for testing H0: βk = 0 is printed in the JMP output under Prob>|t|.
Inference Examples
• Find a 95% confidence interval for βhorsepower.
• Is seating of any help in predicting gas mileage once horsepower, weight and cargo have been taken into account? Carry out a test at the 0.05 significance level.
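One way to work both examples, using the estimates and standard errors from the multiple-regression Parameter Estimates table (n = 109, K = 4, so df = 104) and scipy for the t distribution:

```python
# Working the two inference examples from the JMP output: a 95% CI for
# the horsepower coefficient, and a t-test for the seating coefficient.
from scipy import stats

df = 109 - (4 + 1)                         # n - (K + 1) = 104
t_crit = stats.t.ppf(0.975, df)            # two-sided 95% critical value

# 95% CI for beta_horsepower: b_k +/- t * s_bk
b_hp, se_hp = 0.0426999, 0.01567
ci = (b_hp - t_crit * se_hp, b_hp + t_crit * se_hp)
print(ci)                                  # interval excludes 0

# Test H0: beta_seating = 0 at the 0.05 level
b_seat, se_seat = 0.0268912, 0.428283
t_seat = b_seat / se_seat
p_seat = 2 * (1 - stats.t.cdf(abs(t_seat), df))
print(t_seat, p_seat)                      # matches JMP's t = 0.06, p = 0.9501
```

The CI excludes zero, so horsepower helps; the seating p-value far exceeds 0.05, so we do not reject H0 that seating adds nothing once the other variables are in the model.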
Partial Slopes vs. Marginal Slopes
• Multiple linear regression model: μ_{y|x1,…,xK} = β0 + β1x1 + … + βKxK
• The coefficient βk is a partial slope. It indicates the change in the mean of y that is associated with a one-unit increase in xk while holding all the other variables (x1, …, x_{k-1}, x_{k+1}, …, xK) fixed.
• A marginal slope is obtained when we perform a simple regression with only one X, ignoring all other variables. Consequently, the other variables are not held fixed.
Simple Linear Regression Bivariate Fit of GP1000MHwy By Seating
[Scatterplot with fitted line: GP1000MHwy (25 to 55) vs. Seating (2 to 8)]
Parameter Estimates (simple linear regression)

| Term | Estimate | Std Error | t Ratio | Prob>\|t\| |
|---|---|---|---|---|
| Intercept | 30.829816 | 2.277905 | 13.53 | <.0001 |
| Seating | 1.3022488 | 0.442389 | 2.94 | 0.0040 |

Multiple Linear Regression, Response GP1000MHwy, Whole Model: Parameter Estimates

| Term | Estimate | Std Error | t Ratio | Prob>\|t\| |
|---|---|---|---|---|
| Intercept | 19.100521 | 2.098478 | 9.10 | <.0001 |
| Weight(lb) | 0.0040877 | 0.001203 | 3.40 | 0.0010 |
| Cargo | 0.0533 | 0.013787 | 3.87 | 0.0002 |
| Seating | 0.0268912 | 0.428283 | 0.06 | 0.9501 |
| Horsepower | 0.0426999 | 0.01567 | 2.73 | 0.0075 |
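A simulated sketch of why the marginal seating slope (1.30) and the partial seating slope (0.03) disagree: in the simulation below, seating is correlated with weight, and weight, not seating, drives gas consumption. All numbers are invented for illustration:

```python
# Simulated marginal vs. partial slopes (invented data): a variable with
# no direct effect picks up a large marginal slope through a confounder.
import numpy as np

rng = np.random.default_rng(3)
n = 500
seating = rng.integers(2, 9, n).astype(float)            # 2 to 8 seats
weight = 1500 + 350 * seating + rng.normal(0, 300, n)    # bigger cars are heavier
y = 19 + 0.004 * weight + rng.normal(0, 3, n)            # seating itself has no effect

# Marginal slope: simple regression of y on seating alone
marginal = np.polyfit(seating, y, 1)[0]

# Partial slope of seating: multiple regression on seating and weight
X = np.column_stack([np.ones(n), seating, weight])
partial = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(marginal, partial)   # marginal is large, partial is near zero
```

The marginal slope absorbs the weight effect because heavier cars both seat more and burn more gas; once weight is held fixed, seating has essentially no slope, mirroring the two JMP tables above.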
Partial Slopes vs. Marginal Slopes Example
• In order to evaluate the benefits of a proposed irrigation scheme in a certain region, suppose that the relation of yield Y to rainfall R is investigated over several years.
• Data is in rainfall.JMP.
Bivariate Fit of Yield By Total Spring Rainfall

[Scatterplot with linear fit: Yield (30 to 90) vs. Total Spring Rainfall (7 to 13)]

Linear Fit: Yield = 76.666667 - 1.6666667 × Total Spring Rainfall

Summary of Fit
RSquare: 0.027778
RSquare Adj: -0.13426
Root Mean Square Error: 13.94433
Mean of Response: 60
Observations (or Sum Wgts): 8

Parameter Estimates

| Term | Estimate | Std Error | t Ratio | Prob>\|t\| |
|---|---|---|---|---|
| Intercept | 76.666667 | 40.5546 | 1.89 | 0.1076 |
| Total Spring Rainfall | -1.666667 | 4.025382 | -0.41 | 0.6932 |
Bivariate Fit of Average Spring Temperature By Total Spring Rainfall

[Scatterplot: Average Spring Temperature (42.5 to 57.5) vs. Total Spring Rainfall (7 to 13)]
Higher rainfall is associated with lower temperature.
Multiple Linear Regression, Response Yield: Parameter Estimates

| Term | Estimate | Std Error | t Ratio | Prob>\|t\| |
|---|---|---|---|---|
| Intercept | -144.7619 | 55.8499 | -2.59 | 0.0487 |
| Total Spring Rainfall | 5.7142857 | 2.680238 | 2.13 | 0.0862 |
| Average Spring Temperature | 2.952381 | 0.692034 | 4.27 | 0.0080 |
Rainfall is estimated to be beneficial once temperature is held fixed.
Multiple regression provides a better picture of the benefits of an irrigation scheme because temperature would be held fixed in an irrigation scheme.
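The rainfall example can be mimicked by simulation (invented coefficients, not the rainfall.JMP data): rain helps yield, but rainy springs are cooler, so the marginal rainfall slope is dragged down by the temperature effect while the partial slope stays positive.

```python
# Simulated sketch of the rainfall confounding story: the marginal
# rainfall slope is pulled negative by cooler rainy springs, while the
# partial slope (temperature held fixed) recovers the true benefit.
import numpy as np

rng = np.random.default_rng(4)
n = 200
rain = rng.uniform(7, 13, n)
temp = 65 - 2.0 * rain + rng.normal(0, 1, n)     # more rain, cooler spring
yield_ = 5 * rain + 3 * temp + rng.normal(0, 5, n)

marginal = np.polyfit(rain, yield_, 1)[0]        # rain alone: effects confounded

X = np.column_stack([np.ones(n), rain, temp])
partial = np.linalg.lstsq(X, yield_, rcond=None)[0][1]

print(marginal, partial)   # marginal near 5 - 6 = -1, partial near +5
```

This is the same pattern as the JMP output: the simple regression slope on rainfall is negative and insignificant, while the multiple regression shows rainfall is beneficial once temperature is held fixed.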