
Business Statistics

Dr. Mohammad Zainal

Chapter 14

Multiple Regression

ECON 509

Department of Economics

Chapter Goals

After completing this chapter, you should be able to:

Apply multiple regression analysis to business decision-making situations

Analyze and interpret the computer output for a multiple regression model

Perform a hypothesis test for all regression coefficients

ECON 509, by Dr. M. Zainal Chap 14-2

The Multiple Regression Model

Idea: Examine the linear relationship between one dependent variable (Y) and two or more independent variables (Xi)

Multiple Regression Model with k Independent Variables:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$

where β0 is the Y-intercept, β1, …, βk are the population slopes, and ε is the random error

Multiple Regression Equation

The coefficients of the multiple regression model are estimated using sample data

Multiple regression equation with k independent variables:

$\hat{y}_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + \dots + b_k x_{ki}$

where ŷi is the estimated (or predicted) value of y, b0 is the estimated intercept, and b1, …, bk are the estimated slope coefficients

In this chapter we will always use a computer to obtain the regression slope coefficients and other regression summary measures.

Multiple Regression Equation (continued)

Two-variable model:

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$

[Figure: the fitted equation plotted against the axes x1, x2, and y]

Multiple Regression Model

Two-variable model:

$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$

[Figure: a sample observation yi at (x1i, x2i), its fitted value ŷi, and the residual between them]

$e_i = y_i - \hat{y}_i$

The best-fit equation, ŷ, is found by minimizing the sum of squared errors, Σe²

Standard Multiple Regression Assumptions

The values xi and the error terms εi are independent

The error terms are random variables with mean 0 and a constant variance, σ²:

$E[\varepsilon_i] = 0 \quad \text{and} \quad E[\varepsilon_i^2] = \sigma^2 \quad \text{for } i = 1, \dots, n$

(The constant variance property is called homoscedasticity)

Standard Multiple Regression Assumptions (continued)

The random error terms, εi, are not correlated with one another, so that

$E[\varepsilon_i \varepsilon_j] = 0 \quad \text{for all } i \neq j$

It is not possible to find a set of numbers, c0, c1, . . . , cK, not all zero, such that

$c_0 + c_1 x_{1i} + c_2 x_{2i} + \dots + c_K x_{Ki} = 0$

(This is the property of no linear relation for the Xj's)

Example: 2 Independent Variables

A distributor of frozen dessert pies wants to evaluate factors thought to influence demand

Dependent variable: Pie sales (units per week)

Independent variables: Price (in $) and Advertising (in $100s)

Data are collected for 15 weeks

Pie Sales Example

Multiple regression equation:

Sales = b0 + b1 (Price) + b2 (Advertising)

Week   Pie Sales   Price ($)   Advertising ($100s)
1      350         5.50        3.3
2      460         7.50        3.3
3      350         8.00        3.0
4      430         8.00        4.5
5      350         6.80        3.0
6      380         7.50        4.0
7      430         4.50        3.0
8      470         6.40        3.7
9      450         7.00        3.5
10     490         5.00        4.0
11     340         7.20        3.5
12     300         7.90        3.2
13     440         5.90        4.0
14     450         5.00        3.5
15     300         7.00        2.7

Estimating a Multiple Linear Regression Equation

Excel will be used to generate the coefficients and measures of goodness of fit for multiple regression

Data / Data Analysis / Regression
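The slides rely on Excel for the estimates. As a rough cross-check, the same least-squares coefficients can be obtained by solving the normal equations (X'X)b = X'y directly. The sketch below is only an illustration in plain Python using the 15 weeks of pie sales data; the helper names `solve_linear_system` and `fit_ols` are made up for this example and are not part of Excel or any library.

```python
# Pie sales data from the example: (sales, price in $, advertising in $100s)
data = [
    (350, 5.50, 3.3), (460, 7.50, 3.3), (350, 8.00, 3.0), (430, 8.00, 4.5),
    (350, 6.80, 3.0), (380, 7.50, 4.0), (430, 4.50, 3.0), (470, 6.40, 3.7),
    (450, 7.00, 3.5), (490, 5.00, 4.0), (340, 7.20, 3.5), (300, 7.90, 3.2),
    (440, 5.90, 4.0), (450, 5.00, 3.5), (300, 7.00, 2.7),
]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows):
    """Return [b0, b1, ..., bk] minimizing the sum of squared errors."""
    X = [[1.0] + list(r[1:]) for r in rows]  # leading 1 for the intercept
    y = [r[0] for r in rows]
    p = len(X[0])
    xtx = [[sum(xi[a] * xi[b] for xi in X) for b in range(p)] for a in range(p)]
    xty = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(p)]
    return solve_linear_system(xtx, xty)


b0, b1, b2 = fit_ols(data)
print(f"Sales = {b0:.3f} + ({b1:.3f})(Price) + ({b2:.3f})(Advertising)")
```

The printed coefficients should agree with the Excel output up to rounding. For larger problems a numerical library would normally be preferred over hand-rolled elimination.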

Multiple Regression Output

Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

ANOVA        df   SS          MS          F         Significance F
Regression   2    29460.027   14730.013   6.53861   0.01201
Residual     12   27033.306   2252.776
Total        14   56493.333

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        2.68285    0.01993   57.58835    555.46404
Price         -24.97509      10.83213         -2.30565   0.03979   -48.57626   -1.37392
Advertising   74.13096       25.96732         2.85478    0.01449   17.55303    130.70888

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

The Multiple Regression Equation

Sales = 306.526 - 24.975(Price) + 74.131(Advertising)

where
Sales is in number of pies per week
Price is in $
Advertising is in $100s

b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising

b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price

Coefficient of Determination, R2

Reports the proportion of total variation in y explained by all x variables taken together

This is the ratio of the explained variability to total sample variability:

$R^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$

Coefficient of Determination, R2 (continued)

From the Excel output: R Square = 0.52148, with SSR = 29460.027 and SST = 56493.333

$R^2 = \frac{SSR}{SST} = \frac{29460.0}{56493.3} = .52148$

52.1% of the variation in pie sales is explained by the variation in price and advertising
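The R2 arithmetic can be checked in a few lines. This is only a sketch that plugs in the sums of squares printed in the ANOVA section of the Excel output; the variable names are chosen for illustration.

```python
# Sums of squares copied from the ANOVA table of the Excel output
ssr = 29460.027  # regression sum of squares
sse = 27033.306  # residual (error) sum of squares
sst = 56493.333  # total sum of squares

r_squared = ssr / sst
print(f"R^2 = {r_squared:.5f}")

# SSR and SSE should add up to SST, up to rounding in the printed table
print(f"SSR + SSE = {ssr + sse:.3f}, SST = {sst:.3f}")
```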

Estimation of Error Variance

Consider the population regression model

$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_K x_{Ki} + \varepsilon_i$

The unbiased estimate of the variance of the errors is

$s_e^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - K - 1} = \frac{SSE}{n - K - 1}$

where $e_i = y_i - \hat{y}_i$

The square root of the variance, se, is called the standard error of the estimate

Standard Error, se

From the Excel output: Standard Error = 47.46341 (Residual SS = 27033.306 with 12 degrees of freedom)

The magnitude of this value can be compared to the average y value
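As a quick illustration, the standard error of the estimate can be recomputed from SSE and compared with the average weekly sales. The sketch assumes the residual sum of squares from the Excel output and the 15 sales figures from the data table; the ratio at the end is just one informal way to make the comparison.

```python
import math

sse = 27033.306  # residual sum of squares from the ANOVA table
n = 15           # number of weekly observations
k = 2            # independent variables: price and advertising

s_e = math.sqrt(sse / (n - k - 1))  # standard error of the estimate

# Weekly pie sales from the data table, used for the average y value
sales = [350, 460, 350, 430, 350, 380, 430, 470,
         450, 490, 340, 300, 440, 450, 300]
mean_sales = sum(sales) / len(sales)

print(f"s_e = {s_e:.5f}")
print(f"average weekly sales = {mean_sales:.2f}")
print(f"s_e as a share of the average = {s_e / mean_sales:.1%}")
```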

Adjusted Coefficient of Determination, $\bar{R}^2$

R2 never decreases when a new X variable is added to the model, even if the new variable is not an important predictor variable

This can be a disadvantage when comparing models

What is the net effect of adding a new variable?

We lose a degree of freedom when a new X variable is added

Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?

Adjusted Coefficient of Determination, $\bar{R}^2$ (continued)

Used to correct for the fact that adding non-relevant independent variables will still reduce the error sum of squares

$\bar{R}^2 = 1 - \frac{SSE / (n - K - 1)}{SST / (n - 1)}$

(where n = sample size, K = number of independent variables)

Adjusted R2 provides a better comparison between multiple regression models with different numbers of independent variables

Penalizes excessive use of unimportant independent variables

Smaller than R2

From the Excel output: Adjusted R Square = 0.44172

$\bar{R}^2 = .44172$

44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables
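The adjusted value reported by Excel can be reproduced from the ANOVA sums of squares. A minimal sketch, assuming the SSE and SST values printed in the output (variable names are illustrative):

```python
sse = 27033.306  # residual sum of squares
sst = 56493.333  # total sum of squares
n = 15           # sample size
k = 2            # number of independent variables

# Each sum of squares is divided by its own degrees of freedom
adjusted_r_squared = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
r_squared = 1 - sse / sst

print(f"R^2          = {r_squared:.5f}")
print(f"adjusted R^2 = {adjusted_r_squared:.5f}")
```

The adjusted value is smaller than R2 because the residual variation is spread over only n - K - 1 degrees of freedom.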

Evaluating Individual Regression Coefficients

Use t-tests for individual coefficients

Shows if a specific independent variable is conditionally important

Hypotheses:

H0: βj = 0 (no linear relationship)

H1: βj ≠ 0 (linear relationship does exist between xj and y)

Evaluating Individual Regression Coefficients (continued)

H0: βj = 0 (no linear relationship)

H1: βj ≠ 0 (linear relationship does exist between xj and y)

Test Statistic:

$t = \frac{b_j - 0}{S_{b_j}} \qquad (df = n - k - 1)$

Evaluating Individual Regression Coefficients (continued)

From the Excel output:

              Coefficients   Standard Error   t Stat     P-value
Price         -24.97509      10.83213         -2.30565   0.03979
Advertising   74.13096       25.96732         2.85478    0.01449

t-value for Price is t = -2.306, with p-value .0398

t-value for Advertising is t = 2.855, with p-value .0145

Example: Evaluating Individual Regression Coefficients

H0: βj = 0

H1: βj ≠ 0

d.f. = 15 - 2 - 1 = 12

α = .05

t12, .025 = 2.1788

From Excel output:

              Coefficients   Standard Error   t Stat     P-value
Price         -24.97509      10.83213         -2.30565   0.03979
Advertising   74.13096       25.96732         2.85478    0.01449

[Figure: two-tailed rejection regions with α/2 = .025 in each tail; reject H0 if t < -2.1788 or t > 2.1788, otherwise do not reject H0]

Decision: Reject H0 for each variable. The test statistic for each variable falls in the rejection region (p-values < .05)

Conclusion: There is evidence that both Price and Advertising affect pie sales at α = .05
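The decision rule for the individual coefficients can be written out as a short check. This sketch takes the coefficients and standard errors from the Excel output and the critical value t12, .025 = 2.1788 from the slide as given; it does not compute the critical value or the p-values itself.

```python
# (coefficient, standard error) pairs copied from the Excel output
estimates = {
    "Price": (-24.97509, 10.83213),
    "Advertising": (74.13096, 25.96732),
}
t_crit = 2.1788  # t with 12 d.f. and .025 in each tail, as given on the slide

results = {}
for name, (b, se) in estimates.items():
    t_stat = (b - 0) / se  # test statistic under H0: beta_j = 0
    reject_h0 = abs(t_stat) > t_crit  # two-tailed rejection rule
    results[name] = (t_stat, reject_h0)
    decision = "reject H0" if reject_h0 else "do not reject H0"
    print(f"{name}: t = {t_stat:.3f} -> {decision}")
```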

Confidence Interval Estimate for the Slope

Confidence interval limits for the population slope βj:

$b_j \pm t_{n-K-1,\,\alpha/2}\, S_{b_j}$

where t has (n – K – 1) d.f.

              Coefficients   Standard Error
Intercept     306.52619      114.25389
Price         -24.97509      10.83213
Advertising   74.13096       25.96732

Example: Form a 95% confidence interval for the effect of changes in price (x1) on pie sales. Here, t has (15 – 2 – 1) = 12 d.f.

-24.975 ± (2.1788)(10.832)

So the interval is -48.576 < β1 < -1.374

Confidence Interval Estimate for the Slope (continued)

Confidence interval for the population slope βj

Example: Excel output also reports these interval endpoints:

              Coefficients   Standard Error   …   Lower 95%   Upper 95%
Intercept     306.52619      114.25389        …   57.58835    555.46404
Price         -24.97509      10.83213         …   -48.57626   -1.37392
Advertising   74.13096       25.96732         …   17.55303    130.70888

Weekly sales are estimated to be reduced by between 1.37 and 48.58 pies for each increase of $1 in the selling price
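The interval endpoints follow directly from the formula. As a sketch, the helper below (a hypothetical name, `slope_confidence_interval`) applies it to the Price and Advertising rows, taking the critical value 2.1788 from the slide rather than computing it.

```python
def slope_confidence_interval(b, se, t_crit):
    """Return (lower, upper) limits b -/+ t_crit * se for a slope."""
    margin = t_crit * se
    return b - margin, b + margin


t_crit = 2.1788  # t with 12 d.f. and .025 in each tail, as given on the slide

price_ci = slope_confidence_interval(-24.97509, 10.83213, t_crit)
advertising_ci = slope_confidence_interval(74.13096, 25.96732, t_crit)

print(f"Price:       {price_ci[0]:.3f} < beta_1 < {price_ci[1]:.3f}")
print(f"Advertising: {advertising_ci[0]:.3f} < beta_2 < {advertising_ci[1]:.3f}")
```

Small differences from the Lower 95% and Upper 95% columns in the later decimals come from the rounded critical value.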

Test on All Coefficients

F-Test for Overall Significance of the Model

Shows if there is a linear relationship between all of the X variables considered together and Y

Use F test statistic

Hypotheses:

H0: β1 = β2 = … = βk = 0 (no linear relationship)

H1: at least one βi ≠ 0 (at least one independent variable affects Y)

F-Test for Overall Significance

Test statistic:

$F = \frac{MSR}{s_e^2} = \frac{SSR / K}{SSE / (n - K - 1)}$

where F has K (numerator) and (n – K – 1) (denominator) degrees of freedom

The decision rule is

$\text{Reject } H_0 \text{ if } F > F_{K,\, n-K-1,\, \alpha}$

F-Test for Overall Significance (continued)

From the Excel output:

ANOVA        df   SS          MS          F         Significance F
Regression   2    29460.027   14730.013   6.53861   0.01201
Residual     12   27033.306   2252.776
Total        14   56493.333

$F = \frac{MSR}{MSE} = \frac{14730.0}{2252.8} = 6.5386$

with 2 and 12 degrees of freedom. Significance F = 0.01201 is the P-value for the F-test

F-Test for Overall Significance (continued)

H0: β1 = β2 = 0

H1: β1 and β2 not both zero

α = .05

df1 = 2, df2 = 12

Critical Value: F.05 = 3.885

Test Statistic:

$F = \frac{MSR}{MSE} = 6.5386$

[Figure: F distribution with the rejection region of area α = .05 to the right of F.05 = 3.885; do not reject H0 to the left]

Decision: Since the F test statistic is in the rejection region (p-value < .05), reject H0

Conclusion: There is evidence that at least one independent variable affects Y
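The overall F-test can be traced from the sums of squares to the decision. A minimal sketch, assuming the ANOVA values from the Excel output and taking the critical value F.05 = 3.885 from the slide as given:

```python
ssr = 29460.027  # regression sum of squares
sse = 27033.306  # residual sum of squares
n = 15           # sample size
k = 2            # number of independent variables

msr = ssr / k            # mean square regression, k numerator d.f.
mse = sse / (n - k - 1)  # mean square error, n - k - 1 denominator d.f.
f_stat = msr / mse

f_crit = 3.885  # F with 2 and 12 d.f. at alpha = .05, as given on the slide
reject_h0 = f_stat > f_crit

print(f"F = {f_stat:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```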

Driving Experience Example

A researcher wanted to find the effect of driving experience and the number of driving violations on auto insurance premiums. A sample of 12 drivers insured with the same company and having similar insurance policies was selected from a city. The table lists the monthly auto insurance premiums (in dollars) paid, their driving experiences (in years), and the numbers of driving violations committed by them. Using MINITAB, find the regression equation of monthly premiums paid by drivers on the driving experiences and the numbers of driving violations.

Monthly Premium   Driving Experience   Number of Driving Violations
148               5                    2
76                14                   0
100               6                    1
126               10                   3
194               4                    6
110               8                    2
114               11                   3
86                16                   1
198               3                    5
92                9                    1
70                19                   0
120               13                   3

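Solution

Let

y = the monthly auto insurance premium (in dollars) paid by a driver
x1 = the driving experience (in years) of a driver
x2 = the number of driving violations committed by a driver during the past three years

We are to estimate the regression model

$y = A + B_1 x_1 + B_2 x_2 + \varepsilon$

Using Minitab

[Figure: Minitab regression output]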

a) Explain the meaning of the estimated regression coefficients.

From Minitab output

a = 110, b1 = -2.75 and b2 = 16.1

The value of a = 110 gives the estimated monthly premium for a driver with no driving experience and no driving violations committed in the past three years; it is expected to be $110.28 per month.

The value of b1 = -2.75 gives the change in the estimated monthly premium for a one-unit change in x1 when x2 is held constant. Thus, a driver with one extra year of experience but the same number of driving violations is expected to pay $2.7473 (or $2.75) less per month for the auto insurance premium.

The value of b2 = 16.1 gives the change in the estimated monthly premium for a one-unit change in x2 when x1 is held constant. Thus, a driver with one extra driving violation during the past three years but with the same years of driving experience is expected to pay $16.106 (or $16.11) more per month for the auto insurance premium.

b) What are the values of the standard deviation of errors, the coefficient of multiple determination, and the adjusted coefficient of multiple determination?

Standard deviation of errors is 12.1459

R2 = 93.1%, Adjusted R2 = 91.6%

c) What is the predicted auto insurance premium paid per month by a driver with seven years of driving experience and three driving violations committed in the past three years?

Substituting x1 = 7 and x2 = 3 into the estimated regression equation gives a predicted premium of $139.37 per month.
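The prediction is a direct substitution into the estimated equation. The sketch below uses the unrounded estimates quoted in the solution (110.28, -2.7473, 16.106); the function name `predict_premium` is only illustrative.

```python
# Estimates quoted in the solution (Minitab values before rounding)
a = 110.28    # intercept
b1 = -2.7473  # coefficient of driving experience (years)
b2 = 16.106   # coefficient of number of driving violations


def predict_premium(experience_years, violations):
    """Predicted monthly premium from the estimated regression equation."""
    return a + b1 * experience_years + b2 * violations


premium = predict_premium(7, 3)
print(f"Predicted monthly premium: ${premium:.2f}")  # prints $139.37
```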

d) Determine a 95% confidence interval for B1 (the coefficient of experience) for the multiple regression of auto insurance premium on driving experience and the number of driving violations.

Here, t has 12 – 2 – 1 = 9 d.f.

-2.7473 ± (2.262)(.977)

= -2.7473 ± 2.2100

So the interval is -4.9573 < β1 < -.5373

e) Using the 2.5% significance level, can you conclude that the coefficient of the number of years of driving experience in the regression model is negative?

H0: β1 = 0

H1: β1 < 0

d.f. = 12 – 2 – 1 = 9 and α = .025

Critical value: -t9, .025 = -2.262

[Figure: left-tailed rejection region of area α = .025; reject H0 if t < -2.262, otherwise do not reject H0]

Test statistic:

$t = \frac{b_1 - 0}{S_{b_1}} = \frac{-2.7473 - 0}{.977} = -2.81$

Decision: Reject H0. The test statistic falls in the rejection region

Conclusion: There is evidence that the coefficient of driving experience is negative at α = .025
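The confidence interval and the one-sided test for the experience coefficient use the same three numbers. As a sketch, with the estimate, its standard error, and the critical value t9, .025 = 2.262 all taken from the solution above:

```python
b1 = -2.7473    # estimated coefficient of driving experience
se_b1 = 0.977   # its standard error
t_crit = 2.262  # t with 9 d.f. and .025 in the tail, as given in the solution

# 95% confidence interval: b1 -/+ t * S_b1
margin = t_crit * se_b1
lower, upper = b1 - margin, b1 + margin
print(f"{lower:.4f} < beta_1 < {upper:.4f}")

# Left-tailed test of H0: beta_1 = 0 against H1: beta_1 < 0 at alpha = .025
t_stat = (b1 - 0) / se_b1
reject_h0 = t_stat < -t_crit
print(f"t = {t_stat:.2f} ->", "reject H0" if reject_h0 else "do not reject H0")
```

The interval lies entirely below zero, which is consistent with rejecting H0 in the left-tailed test.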

Copyright

The materials of this presentation were mostly taken from the PowerPoint files accompanying Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc.