business statistics: a decision-making approach, 7th … · business statistics dr. mohammad zainal...
TRANSCRIPT
Business Statistics
Dr. Mohammad Zainal
Chapter 14
Multiple Regression
ECON 509
Department of Economics
Chapter Goals
After completing this chapter, you should be able to:
Apply multiple regression analysis to business decision-
making situations
Analyze and interpret the computer output for a multiple
regression model
Perform a hypothesis test for all regression coefficients
ECON 509, by Dr. M. Zainal Chap 14-2
The Multiple Regression Model
Idea: Examine the linear relationship between 1 dependent (Y) & 2 or more independent variables (Xi)
εXβXβXββY kk22110
Multiple Regression Model with k Independent Variables:
Y-intercept Population slopes Random Error
ECON 509, by Dr. M. Zainal Chap 14-3
Multiple Regression Equation
The coefficients of the multiple regression model are
estimated using sample data
kik2i21i10i xbxbxbby ˆ
Estimated (or predicted) value of y
Estimated slope coefficients
Multiple regression equation with k independent variables:
Estimated intercept
In this chapter we will always use a computer to obtain the
regression slope coefficients and other regression
summary measures. ECON 509, by Dr. M. Zainal Chap 14-4
Multiple Regression Equation
Two variable model
y
x1
x2
22110 xbxbby ˆ
(continued)
ECON 509, by Dr. M. Zainal Chap 14-5
Multiple Regression Model
Two variable model
y
x1
x2
yi
yi
<
ei = (yi – yi)
<
x2i
x1i The best fit equation, y ,
is found by minimizing the
sum of squared errors, e2
<
Sample
observation 22110 xbxbby ˆ
ECON 509, by Dr. M. Zainal Chap 14-6
Standard Multiple Regression Assumptions
The values xi and the error terms εi are
independent
The error terms are random variables with
mean 0 and a constant variance, 2.
(The constant variance property is called homoscedasticity)
n), 1,(i for σ]E[εand0]E[ε 22
ii
ECON 509, by Dr. M. Zainal Chap 14-7
The random error terms, εi , are not correlated with one another, so that
It is not possible to find a set of numbers, c0, c1, . . . , ck, such that
(This is the property of no linear relation for
the Xj’s)
(continued)
ji all for 0]εE[ε ji
0xcxcxcc KiK2i21i10
Standard Multiple Regression Assumptions
ECON 509, by Dr. M. Zainal Chap 14-8
Example: 2 Independent Variables
A distributor of frozen desert pies wants to
evaluate factors thought to influence demand
Dependent variable: Pie sales (units per week)
Independent variables: Price (in $)
Advertising ($100’s)
Data are collected for 15 weeks
ECON 509, by Dr. M. Zainal Chap 14-9
Pie Sales Example
Sales = b0 + b1 (Price)
+ b2 (Advertising)
Week
Pie
Sales
Price
($)
Advertising
($100s)
1 350 5.50 3.3
2 460 7.50 3.3
3 350 8.00 3.0
4 430 8.00 4.5
5 350 6.80 3.0
6 380 7.50 4.0
7 430 4.50 3.0
8 470 6.40 3.7
9 450 7.00 3.5
10 490 5.00 4.0
11 340 7.20 3.5
12 300 7.90 3.2
13 440 5.90 4.0
14 450 5.00 3.5
15 300 7.00 2.7
Multiple regression equation:
ECON 509, by Dr. M. Zainal Chap 14-10
Estimating a Multiple Linear Regression Equation
Excel will be used to generate the coefficients and
measures of goodness of fit for multiple regression
Data / Data Analysis / Regression
ECON 509, by Dr. M. Zainal Chap 14-11
Multiple Regression Output
Regression Statistics
Multiple R 0.72213
R Square 0.52148
Adjusted R Square 0.44172
Standard Error 47.46341
Observations 15
ANOVA df SS MS F Significance F
Regression 2 29460.027 14730.013 6.53861 0.01201
Residual 12 27033.306 2252.776
Total 14 56493.333
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404
Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392
Advertising 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888
ertising)74.131(Adv ce)24.975(Pri - 306.526 Sales
ECON 509, by Dr. M. Zainal Chap 14-12
The Multiple Regression Equation
ertising)74.131(Adv ce)24.975(Pri - 306.526 Sales
b1 = -24.975: sales
will decrease, on
average, by 24.975
pies per week for
each $1 increase in
selling price, net of
the effects of changes
due to advertising
b2 = 74.131: sales will
increase, on average,
by 74.131 pies per
week for each $100
increase in
advertising, net of the
effects of changes
due to price
where
Sales is in number of pies per week
Price is in $
Advertising is in $100’s.
ECON 509, by Dr. M. Zainal Chap 14-13
Coefficient of Determination, R2
Reports the proportion of total variation in y
explained by all x variables taken together
This is the ratio of the explained variability to
total sample variability
squares of sum total
squares of sum regression
SST
SSRR2
ECON 509, by Dr. M. Zainal Chap 14-14
Coefficient of Determination, R2
Regression Statistics
Multiple R 0.72213
R Square 0.52148
Adjusted R Square 0.44172
Standard Error 47.46341
Observations 15
ANOVA df SS MS F Significance F
Regression 2 29460.027 14730.013 6.53861 0.01201
Residual 12 27033.306 2252.776
Total 14 56493.333
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404
Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392
Advertising 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888
.5214856493.3
29460.0
SST
SSRR2
52.1% of the variation in pie sales
is explained by the variation in
price and advertising
(continued)
ECON 509, by Dr. M. Zainal Chap 14-15
Estimation of Error Variance
Consider the population regression model
The unbiased estimate of the variance of the errors is
where
The square root of the variance, se , is called the
standard error of the estimate
1Kn
SSE
1Kn
e
s
n
1i
2
i2
e
iKiK2i21i10i εxβxβxββY
iii yye ˆ
ECON 509, by Dr. M. Zainal Chap 14-16
Standard Error, se
Regression Statistics
Multiple R 0.72213
R Square 0.52148
Adjusted R Square 0.44172
Standard Error 47.46341
Observations 15
ANOVA df SS MS F Significance F
Regression 2 29460.027 14730.013 6.53861 0.01201
Residual 12 27033.306 2252.776
Total 14 56493.333
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404
Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392
Advertising 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888
The magnitude of this
value can be compared to
the average y value
ECON 509, by Dr. M. Zainal Chap 14-17
Adjusted Coefficient of Determination,
R2 never decreases when a new X variable is added to the model, even if the new variable is not an important predictor variable
This can be a disadvantage when comparing models
What is the net effect of adding a new variable?
We lose a degree of freedom when a new X variable is added
Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?
2R
ECON 509, by Dr. M. Zainal Chap 14-18
Adjusted Coefficient of Determination,
Used to correct for the fact that adding non-relevant
independent variables will still reduce the error sum of
squares
(where n = sample size, K = number of independent variables)
Adjusted R2 provides a better comparison between
multiple regression models with different numbers of
independent variables
Penalize excessive use of unimportant independent
variables
Smaller than R2
(continued)
2R
1)(n/SST
1)K(n/SSE1R2
ECON 509, by Dr. M. Zainal Chap 14-19
Regression Statistics
Multiple R 0.72213
R Square 0.52148
Adjusted R Square 0.44172
Standard Error 47.46341
Observations 15
ANOVA df SS MS F Significance F
Regression 2 29460.027 14730.013 6.53861 0.01201
Residual 12 27033.306 2252.776
Total 14 56493.333
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404
Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392
Advertising 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888
.44172R2
44.2% of the variation in pie sales is
explained by the variation in price and
advertising, taking into account the sample
size and number of independent variables
2R
ECON 509, by Dr. M. Zainal Chap 14-20
Evaluating Individual Regression Coefficients
Use t-tests for individual coefficients
Shows if a specific independent variable is
conditionally important
Hypotheses:
H0: βj = 0 (no linear relationship)
H1: βj ≠ 0 (linear relationship does exist between xj and y)
ECON 509, by Dr. M. Zainal Chap 14-21
Evaluating Individual Regression Coefficients
H0: βj = 0 (no linear relationship)
H1: βj ≠ 0 (linear relationship does exist between xi and y) Test Statistic: (df = n – k – 1)
jb
j
S
0bt
(continued)
ECON 509, by Dr. M. Zainal Chap 14-22
Evaluating Individual Regression Coefficients
Regression Statistics
Multiple R 0.72213
R Square 0.52148
Adjusted R Square 0.44172
Standard Error 47.46341
Observations 15
ANOVA df SS MS F Significance F
Regression 2 29460.027 14730.013 6.53861 0.01201
Residual 12 27033.306 2252.776
Total 14 56493.333
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404
Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392
Advertising 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888
t-value for Price is t = -2.306, with
p-value .0398
t-value for Advertising is t = 2.855,
with p-value .0145
(continued)
ECON 509, by Dr. M. Zainal Chap 14-23
H0: βj = 0
H1: βj 0
d.f. = 15-2-1 = 12
= .05
t12, .025 = 2.1788
The test statistic for each variable falls
in the rejection region (p-values < .05)
There is evidence that both
Price and Advertising affect
pie sales at = .05
From Excel output:
Reject H0 for each variable
Coefficients Standard Error t Stat P-value
Price -24.97509 10.83213 -2.30565 0.03979
Advertising 74.13096 25.96732 2.85478 0.01449
Decision:
Conclusion:
Reject H0 Reject H0
/2=.025
-tα/2
Do not reject H0
0
tα/2
/2=.025
-2.1788 2.1788
Example: Evaluating Individual Regression Coefficients
ECON 509, by Dr. M. Zainal Chap 14-24
Confidence Interval Estimate for the Slope
Confidence interval limits for the population slope βj
Example: Form a 95% confidence interval for the effect of
changes in price (x1) on pie sales:
-24.975 ± (2.1788)(10.832)
So the interval is -48.576 < β1 < -1.374
jbα/21,Knj Stb
Coefficients Standard Error
Intercept 306.52619 114.25389
Price -24.97509 10.83213
Advertising 74.13096 25.96732
where t has (n – K – 1) d.f.
Here, t has
(15 – 2 – 1) = 12 d.f.
ECON 509, by Dr. M. Zainal Chap 14-25
Confidence Interval Estimate for the Slope
Confidence interval for the population slope βi
Example: Excel output also reports these interval endpoints:
Weekly sales are estimated to be reduced by between 1.37 to
48.58 pies for each increase of $1 in the selling price
Coefficients Standard Error … Lower 95% Upper 95%
Intercept 306.52619 114.25389 … 57.58835 555.46404
Price -24.97509 10.83213 … -48.57626 -1.37392
Advertising 74.13096 25.96732 … 17.55303 130.70888
(continued)
ECON 509, by Dr. M. Zainal Chap 14-26
Test on All Coefficients
F-Test for Overall Significance of the Model
Shows if there is a linear relationship between all
of the X variables considered together and Y
Use F test statistic
Hypotheses:
H0: β1 = β2 = … = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent variable affects Y)
ECON 509, by Dr. M. Zainal Chap 14-27
F-Test for Overall
Significance
Test statistic:
where F has k (numerator) and
(n – K – 1) (denominator)
degrees of freedom
The decision rule is
1)KSSE/(n
SSR/K
s
MSRF
2
e
α1,Knk,0 FF if H Reject
ECON 509, by Dr. M. Zainal Chap 14-28
F-Test for Overall
Significance
6.53862252.8
14730.0
MSE
MSRF
Regression Statistics
Multiple R 0.72213
R Square 0.52148
Adjusted R Square 0.44172
Standard Error 47.46341
Observations 15
ANOVA df SS MS F Significance F
Regression 2 29460.027 14730.013 6.53861 0.01201
Residual 12 27033.306 2252.776
Total 14 56493.333
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 306.52619 114.25389 2.68285 0.01993 57.58835 555.46404
Price -24.97509 10.83213 -2.30565 0.03979 -48.57626 -1.37392
Advertising 74.13096 25.96732 2.85478 0.01449 17.55303 130.70888
(continued)
With 2 and 12 degrees
of freedom P-value for
the F-Test
ECON 509, by Dr. M. Zainal Chap 14-29
F-Test for Overall
Significance
H0: β1 = β2 = 0
H1: β1 and β2 not both zero
= .05
df1= 2 df2 = 12
Test Statistic:
Decision:
Conclusion:
Since F test statistic is in
the rejection region (p-
value < .05), reject H0
There is evidence that at least one
independent variable affects Y 0
= .05
F.05 = 3.885
Reject H0 Do not reject H0
6.5386MSE
MSRF
Critical
Value:
F = 3.885
(continued)
F
ECON 509, by Dr. M. Zainal Chap 14-30
Driving Experience Example
ECON 509, by Dr. M. Zainal Chap 14-31
A researcher wanted to find the effect of
driving experience and the number of
driving violations on auto insurance
premiums. A sample of 12 drivers insured
with the same company and having
similar insurance policies was selected
from a city. The table lists the monthly
auto insurance premiums (in dollars) paid,
their driving experiences (in years), and
the numbers of driving violations
committed by them. Using MINITAB, find
the regression equation of monthly
premiums paid by drivers on the driving
experiences and the numbers of driving
violations.
Monthly
Premium
Driving
Experience
Number of Driving
Violation
148 5 2
76 14 0
100 6 1
126 10 3
194 4 6
110 8 2
114 11 3
86 16 1
198 3 5
92 9 1
70 19 0
120 13 3
Driving Experience Example
ECON 509, by Dr. M. Zainal Chap 14-32
Solution
Let
We are to estimate the regression model
Using Minitab
Driving Experience Example
ECON 509, by Dr. M. Zainal Chap 14-34
a) Explain the meaning of the estimated regression coefficients.
From Minitab output
a = 110, b1 = 2.75 and b2 = 16.1
The value of a = 110 gives the value of the estimated monthly premium for a driver with no
driving experience and no driving violations committed in the past three years and it is
expected to be $110.28 per month.
The value of b1 = 2.75 gives the change in the estimated monthly premium for a driver
for a one-unit change in x1 when x2 is held constant. Thus, we can state that a driver
with one extra year of experience but the same number of driving violations is expected
to pay (or $2.75) less per month for the auto insurance premium.
The value of b2 = 16.1 gives the change in the estimated monthly premium for a driver
for a one-unit change in x2 when x1 is held constant. Thus, a driver with one extra
driving violation during the past three years but with the same years of driving experience
is expected to pay $16.106 (or $16.11) more per month for the auto insurance premium.
Driving Experience Example
ECON 509, by Dr. M. Zainal Chap 14-35
b) What are the values of the standard deviation of errors, the coefficient of multiple
determination, and the adjusted coefficient of multiple determination?
Standard deviation of errors is 12.1459
R2 = 93.1% Adjusted R2 = 91.6%
Driving Experience Example
ECON 509, by Dr. M. Zainal Chap 14-36
c) What is the predicted auto insurance premium paid per month by a driver with seven
years of driving experience and three driving violations committed in the past three
years?
$139.37
7 3
Driving Experience Example
ECON 509, by Dr. M. Zainal Chap 14-37
c) Determine a 95% confidence interval for B1 (the coefficient of experience) for the multiple
regression of auto insurance premium on driving experience and the number of driving
violations.
= -2.7473 ± 2.2100
= -2.7473 ± (2.262)(.977)
So the interval is -4.9573 < β1 < -.5373
Here, t has 12 - 2 - 1 d.f. = 9
Driving Experience Example
ECON 509, by Dr. M. Zainal Chap 14-38
d) Using the 2.5% significance level, can you conclude that the coefficient of the number of
years of driving experience in regression model is negative?
H0: β1 = 0
H1: β1 < 0
d.f. = 12-2-1 = 1 and = .05
t9, .025 = -2.262
Reject H0
/2=.025
-tα/2
Do not reject H0
0
-2.262
Driving Experience Example
ECON 509, by Dr. M. Zainal Chap 14-39
The test statistic falls in the rejection
region
There is evidence that both Price and
Advertising affect pie sales at = .05
Reject H0 for each variable
1b
1
S
0bt
81.2
.977
02.7473-
Decision:
Conclusion: