Lecture 6 Notes
• Note: I will e-mail homework 2 tonight. It will be due next Thursday.
• The Multiple Linear Regression model (Chapter 4.1)
• Inferences from multiple regression analysis (Chapter 4.2)
• In multiple regression analysis, we consider more than one independent variable x1, …, xK. We are interested in the conditional mean of y given x1, …, xK.
Automobile Example
• A team charged with designing a new automobile is concerned about the gas mileage that can be achieved. The design team is interested in two things:
(1) Which characteristics of the design are likely to affect mileage?
(2) A new car is planned to have the following characteristics: weight – 4000 lbs, horsepower – 200, cargo – 18 cubic feet, seating – 5 adults. Predict the new car’s gas mileage.
• The team has available information about gallons per 1000 miles and four design characteristics (weight, horsepower, cargo, seating) for a sample of cars made in 1989. Data is in car89.JMP.
Multivariate Correlations

| | GP1000MHwy | Weight(lb) | Horsepower | Cargo | Seating |
|---|---|---|---|---|---|
| GP1000MHwy | 1.0000 | 0.7097 | 0.6157 | 0.3405 | 0.2599 |
| Weight(lb) | 0.7097 | 1.0000 | 0.7509 | 0.1816 | 0.3499 |
| Horsepower | 0.6157 | 0.7509 | 1.0000 | -0.0548 | -0.0914 |
| Cargo | 0.3405 | 0.1816 | -0.0548 | 1.0000 | 0.4894 |
| Seating | 0.2599 | 0.3499 | -0.0914 | 0.4894 | 1.0000 |

7 rows not used due to missing values.

[Scatterplot Matrix of GP1000MHwy, Weight(lb), Horsepower, Cargo, and Seating]
Best Single Predictor
• To obtain the correlation matrix and pairwise scatterplots, click Analyze, Multivariate Methods, Multivariate.
• If we use simple linear regression with each of the four independent variables, which provides the best predictions?
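The same pairwise correlations JMP's Multivariate platform reports can be computed outside JMP. A minimal sketch with numpy, using made-up numbers standing in for three of the car89 variables (not the real data):

```python
# Illustrative sketch with invented numbers (not the real car89 sample):
# reproducing a pairwise correlation matrix with numpy's corrcoef.
import numpy as np

gp1000 = np.array([40.0, 36.0, 49.0, 31.0, 44.0])            # GP1000MHwy
weight = np.array([3200.0, 2800.0, 4000.0, 2400.0, 3600.0])  # Weight(lb)
hp     = np.array([150.0, 120.0, 210.0, 100.0, 180.0])       # Horsepower

# Each row of the input is one variable; corrcoef returns the symmetric
# matrix of pairwise correlations with 1.0 on the diagonal.
corr = np.corrcoef([gp1000, weight, hp])
print(np.round(corr, 4))
```

With the real car89.JMP sample (and JMP's handling of the 7 rows with missing values) the entries would match the table above.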
Best Single Predictor
• Answer: The simple linear regression that has the highest R² gives the best predictions, because recall that R² = 1 - SSE/SST, so the highest R² corresponds to the smallest sum of squared errors.
• Weight gives the best predictions of GP1000MHwy based on simple linear regression.
• But we can obtain better predictions by using more than one of the independent variables.
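Since R² = 1 - SSE/SST, choosing the best single predictor amounts to comparing R² across the candidate simple regressions. A sketch with simulated data (not the car89 sample), where a weight-like variable truly drives y, so its regression should win:

```python
# Simulated sketch (invented data): compare R^2 = 1 - SSE/SST across
# candidate simple linear regressions to find the best single predictor.
import numpy as np

rng = np.random.default_rng(0)
n = 50
weight = rng.uniform(2000, 4000, n)
hp = 0.05 * weight + rng.normal(0, 20, n)        # correlated with weight
y = 10 + 0.008 * weight + rng.normal(0, 3, n)    # weight truly drives y

def r_squared(x, y):
    b1, b0 = np.polyfit(x, y, 1)                 # least-squares line
    sse = np.sum((y - (b0 + b1 * x)) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - sse / sst

print(r_squared(weight, y), r_squared(hp, y))
```

In this simulation weight has the higher R², just as Weight(lb) does in the car89 correlations above.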
Multiple Linear Regression Model
• The model: μ_{y|x1,…,xK} = E(Y | x1, …, xK) = β0 + β1x1 + … + βKxK, so for observation i, y_i = β0 + β1x_{i1} + β2x_{i2} + … + βKx_{iK} + e_i.
• Assumptions about e_i:
– The expected value of the disturbances is zero for each (x_{i1}, …, x_{iK}), i.e., E(e_i | x_{i1}, …, x_{iK}) = 0.
– The variance of each e_i is equal to σ_e², i.e., Var(e_i | x_{i1}, …, x_{iK}) = σ_e².
– The e_i are normally distributed.
– The e_i are independent.
Point Estimates for Multiple Linear Regression Model
• We use the same least squares procedure as for simple linear regression.
• Our estimates b0, …, bK of β0, …, βK are the coefficients that minimize the sum of squared prediction errors:
• Least Squares in JMP: Click Analyze, Fit Model, put dependent variable into Y and add independent variables to the construct model effects box.
(b0, …, bK) = argmin over (b0*, …, bK*) of Σ_{i=1}^n (y_i - b0* - b1*x_{i1} - … - bK*x_{iK})²

The resulting prediction equation is ŷ = b0 + b1x1 + … + bKxK.
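The minimization above can be sketched with numpy's `lstsq`. The data here are simulated stand-ins for car89.JMP, with true coefficients loosely inspired by the JMP output (an assumption for illustration, not a fit to the real sample):

```python
# Sketch of least squares for multiple regression via numpy.linalg.lstsq,
# on simulated data (invented coefficients, not the real car89 sample).
import numpy as np

rng = np.random.default_rng(1)
n = 100
weight = rng.uniform(2000, 4000, n)
hp = rng.uniform(100, 250, n)
y = 19.1 + 0.004 * weight + 0.04 * hp + rng.normal(0, 3.5, n)

X = np.column_stack([np.ones(n), weight, hp])  # design matrix, intercept first
b, *_ = np.linalg.lstsq(X, y, rcond=None)      # minimizes sum of squared errors
print(b)  # estimates b0, b1 (weight), b2 (horsepower)
```

At the least-squares solution the residual vector is orthogonal to every column of the design matrix, which is what the normal equations require.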
Response GP1000MHwy, Whole Model

Summary of Fit
RSquare: 0.589015
RSquare Adj: 0.573208
Root Mean Square Error: 3.542778
Mean of Response: 37.33359
Observations (or Sum Wgts): 109

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Ratio | Prob > F |
|---|---|---|---|---|---|
| Model | 4 | 1870.7788 | 467.695 | 37.2627 | <.0001 |
| Error | 104 | 1305.3330 | 12.551 | | |
| C. Total | 108 | 3176.1118 | | | |

Parameter Estimates

| Term | Estimate | Std Error | t Ratio | Prob>\|t\| |
|---|---|---|---|---|
| Intercept | 19.100521 | 2.098478 | 9.10 | <.0001 |
| Weight(lb) | 0.0040877 | 0.001203 | 3.40 | 0.0010 |
| Horsepower | 0.0426999 | 0.01567 | 2.73 | 0.0075 |
| Cargo | 0.0533 | 0.013787 | 3.87 | 0.0002 |
| Seating | 0.0268912 | 0.428283 | 0.06 | 0.9501 |

[Actual by Predicted Plot and Residual by Predicted Plot: GP1000MHwy residuals (-10 to 10) vs. GP1000MHwy predicted (25 to 55)]
Root Mean Square Error
• Estimate of σ_e: s_e = √( Σ_{i=1}^n (y_i - ŷ_i)² / (n - (K + 1)) )
• s_e = Root Mean Square Error in JMP.
• For simple linear regression of GP1000MHwy on Weight, s_e = 3.86. For multiple linear regression of GP1000MHwy on weight, horsepower, cargo, seating, s_e = 3.54.
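A quick numerical sketch of the s_e formula with its n - (K + 1) denominator, using tiny made-up fitted values (K = 2 regressors assumed):

```python
# Sketch of s_e = sqrt( sum_i (y_i - yhat_i)^2 / (n - (K + 1)) )
# with invented observations and fitted values from a hypothetical K = 2 fit.
import numpy as np

y     = np.array([37.0, 42.0, 30.0, 45.0, 33.0, 40.0])
y_hat = np.array([36.0, 40.5, 31.0, 44.0, 35.0, 39.0])  # hypothetical fitted values
n, K = len(y), 2

s_e = np.sqrt(np.sum((y - y_hat) ** 2) / (n - (K + 1)))
print(s_e)
```

Note the denominator is n - (K + 1), not n: one degree of freedom is lost per estimated coefficient, including the intercept.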
Residuals and Root Mean Square Errors
• ŷ = estimated E(Y | X1 = x1, …, XK = xK) = b0 + b1x1 + … + bKxK
• Residual for observation i = prediction error for observation i = y_i - ŷ_i = y_i - (b0 + b1x_{i1} + … + bKx_{iK})
• Root mean square error = typical size of the absolute value of a prediction error.
• As with the simple linear regression model, if the multiple linear regression model holds, about 95% of the observations will be within two RMSEs of their predicted value.
• For the car data, about 95% of the time, the actual GP1000M will be within 2 × 3.54 = 7.08 GP1000M of the predicted GP1000M of the car based on the car's weight, horsepower, cargo and seating.
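The "within two RMSEs about 95% of the time" rule can be checked by simulation. This sketch uses a simple regression with normal errors (simulated data, not the car sample); the normality assumption is what makes the 95% figure work:

```python
# Simulated check of the rule of thumb: under the regression model with
# normal errors, about 95% of observations fall within 2 RMSEs of their
# predicted values.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
x = rng.uniform(0, 10, n)
y = 5 + 2 * x + rng.normal(0, 3, n)       # true sigma_e = 3

b1, b0 = np.polyfit(x, y, 1)              # least-squares fit
resid = y - (b0 + b1 * x)
s_e = np.sqrt(np.sum(resid ** 2) / (n - 2))

coverage = np.mean(np.abs(resid) < 2 * s_e)
print(coverage)                            # close to 0.95
```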
Inferences about Regression Coefficients
• Confidence intervals: a (1 - α) × 100% confidence interval for βk is bk ± t_{α/2} · s_{bk}. Degrees of freedom for t equals n - (K + 1). The standard error of bk, s_{bk}, is found on the JMP output.
• Hypothesis test: H0: βk = βk* versus Ha: βk ≠ βk*. Test statistic: t = (bk - βk*) / s_{bk}. Decision rule: reject H0 if t > t_{α/2} or t < -t_{α/2}. The p-value for testing H0: βk = 0 is printed in the JMP output under Prob>|t|.
Inference Examples
• Find a 95% confidence interval for βhorsepower.
• Is seating of any help in predicting gas mileage once horsepower, weight and cargo have been taken into account? Carry out a test at the 0.05 significance level.
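One way to work both examples, using the estimates and standard errors from the multiple-regression Parameter Estimates table (n = 109, K = 4, so df = 104) and scipy for the t distribution:

```python
# Working the two inference examples from the JMP output: a 95% CI for
# the horsepower coefficient, and a t-test for the seating coefficient.
from scipy import stats

df = 109 - (4 + 1)                         # n - (K + 1) = 104
t_crit = stats.t.ppf(0.975, df)            # two-sided 95% critical value

# 95% CI for beta_horsepower: b_k +/- t * s_bk
b_hp, se_hp = 0.0426999, 0.01567
ci = (b_hp - t_crit * se_hp, b_hp + t_crit * se_hp)
print(ci)                                  # interval excludes 0

# Test H0: beta_seating = 0 at the 0.05 level
b_seat, se_seat = 0.0268912, 0.428283
t_seat = b_seat / se_seat
p_seat = 2 * (1 - stats.t.cdf(abs(t_seat), df))
print(t_seat, p_seat)                      # matches JMP's t = 0.06, p = 0.9501
```

The CI excludes zero, so horsepower helps; the seating p-value far exceeds 0.05, so we do not reject H0 that seating adds nothing once the other variables are in the model.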
Partial Slopes vs. Marginal Slopes
• Multiple linear regression model: μ_{y|x1,…,xK} = β0 + β1x1 + … + βKxK
• The coefficient βk is a partial slope. It indicates the change in the mean of y that is associated with a one-unit increase in xk while holding all the other variables (x1, …, x_{k-1}, x_{k+1}, …, xK) fixed.
• A marginal slope is obtained when we perform a simple regression with only one X, ignoring all other variables. Consequently, the other variables are not held fixed.
Simple Linear Regression Bivariate Fit of GP1000MHwy By Seating
[Scatterplot with fitted line: GP1000MHwy (25 to 55) vs. Seating (2 to 8)]
Parameter Estimates (simple linear regression)

| Term | Estimate | Std Error | t Ratio | Prob>\|t\| |
|---|---|---|---|---|
| Intercept | 30.829816 | 2.277905 | 13.53 | <.0001 |
| Seating | 1.3022488 | 0.442389 | 2.94 | 0.0040 |

Multiple Linear Regression, Response GP1000MHwy, Whole Model: Parameter Estimates

| Term | Estimate | Std Error | t Ratio | Prob>\|t\| |
|---|---|---|---|---|
| Intercept | 19.100521 | 2.098478 | 9.10 | <.0001 |
| Weight(lb) | 0.0040877 | 0.001203 | 3.40 | 0.0010 |
| Cargo | 0.0533 | 0.013787 | 3.87 | 0.0002 |
| Seating | 0.0268912 | 0.428283 | 0.06 | 0.9501 |
| Horsepower | 0.0426999 | 0.01567 | 2.73 | 0.0075 |
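A simulated sketch of why the marginal seating slope (1.30) and the partial seating slope (0.03) disagree: in the simulation below, seating is correlated with weight, and weight, not seating, drives gas consumption. All numbers are invented for illustration:

```python
# Simulated marginal vs. partial slopes (invented data): a variable with
# no direct effect picks up a large marginal slope through a confounder.
import numpy as np

rng = np.random.default_rng(3)
n = 500
seating = rng.integers(2, 9, n).astype(float)            # 2 to 8 seats
weight = 1500 + 350 * seating + rng.normal(0, 300, n)    # bigger cars are heavier
y = 19 + 0.004 * weight + rng.normal(0, 3, n)            # seating itself has no effect

# Marginal slope: simple regression of y on seating alone
marginal = np.polyfit(seating, y, 1)[0]

# Partial slope of seating: multiple regression on seating and weight
X = np.column_stack([np.ones(n), seating, weight])
partial = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(marginal, partial)   # marginal is large, partial is near zero
```

The marginal slope absorbs the weight effect because heavier cars both seat more and burn more gas; once weight is held fixed, seating has essentially no slope, mirroring the two JMP tables above.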
Partial Slopes vs. Marginal Slopes Example
• In order to evaluate the benefits of a proposed irrigation scheme in a certain region, suppose that the relation of yield Y to rainfall R is investigated over several years.
• Data is in rainfall.JMP.
Bivariate Fit of Yield By Total Spring Rainfall

[Scatterplot with linear fit: Yield (30 to 90) vs. Total Spring Rainfall (7 to 13)]

Linear Fit: Yield = 76.666667 - 1.6666667 × Total Spring Rainfall

Summary of Fit
RSquare: 0.027778
RSquare Adj: -0.13426
Root Mean Square Error: 13.94433
Mean of Response: 60
Observations (or Sum Wgts): 8

Parameter Estimates

| Term | Estimate | Std Error | t Ratio | Prob>\|t\| |
|---|---|---|---|---|
| Intercept | 76.666667 | 40.5546 | 1.89 | 0.1076 |
| Total Spring Rainfall | -1.666667 | 4.025382 | -0.41 | 0.6932 |
Bivariate Fit of Average Spring Temperature By Total Spring Rainfall

[Scatterplot: Average Spring Temperature (42.5 to 57.5) vs. Total Spring Rainfall (7 to 13)]
Higher rainfall is associated with lower temperature.
Multiple Linear Regression, Response Yield: Parameter Estimates

| Term | Estimate | Std Error | t Ratio | Prob>\|t\| |
|---|---|---|---|---|
| Intercept | -144.7619 | 55.8499 | -2.59 | 0.0487 |
| Total Spring Rainfall | 5.7142857 | 2.680238 | 2.13 | 0.0862 |
| Average Spring Temperature | 2.952381 | 0.692034 | 4.27 | 0.0080 |
Rainfall is estimated to be beneficial once temperature is held fixed.
Multiple regression provides a better picture of the benefits of an irrigation scheme because temperature would be held fixed in an irrigation scheme.
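The rainfall example can be mimicked by simulation (invented coefficients, not the rainfall.JMP data): rain helps yield, but rainy springs are cooler, so the marginal rainfall slope is dragged down by the temperature effect while the partial slope stays positive.

```python
# Simulated sketch of the rainfall confounding story: the marginal
# rainfall slope is pulled negative by cooler rainy springs, while the
# partial slope (temperature held fixed) recovers the true benefit.
import numpy as np

rng = np.random.default_rng(4)
n = 200
rain = rng.uniform(7, 13, n)
temp = 65 - 2.0 * rain + rng.normal(0, 1, n)     # more rain, cooler spring
yield_ = 5 * rain + 3 * temp + rng.normal(0, 5, n)

marginal = np.polyfit(rain, yield_, 1)[0]        # rain alone: effects confounded

X = np.column_stack([np.ones(n), rain, temp])
partial = np.linalg.lstsq(X, yield_, rcond=None)[0][1]

print(marginal, partial)   # marginal near 5 - 6 = -1, partial near +5
```

This is the same pattern as the JMP output: the simple regression slope on rainfall is negative and insignificant, while the multiple regression shows rainfall is beneficial once temperature is held fixed.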