BUS B272F Unit 1: ANOVA and Linear Regression

Upload: kocho2

Post on 16-Dec-2014


Category: Education



Page 1: Bus b272 f unit 1

BUS B272F Unit 1 ANOVA and Linear Regression

ANOVA and Linear Regression

Page 2: Bus b272 f unit 1

Analysis of Variance (ANOVA)

Page 3: Bus b272 f unit 1

ANOVA and Linear Regression

3

BUS B272 Unit 1

Analysis of Variance

The Analysis of Variance (ANOVA) is a procedure that tests whether differences exist among the means of two or more populations.

The technique analyzes the variance of the data to determine whether we can infer that the population means differ.

Page 4: Bus b272 f unit 1

Topics

• One-way (single-factor) analysis of variance
• ANOVA assumptions
• F test for difference among k means

Page 5: Bus b272 f unit 1

General Experimental Setting

• Investigator controls one or more independent variables
  – Called treatments or factors
  – Each factor contains two or more levels (or categories/classifications)
• Observe effects on the dependent variable
  – Response to different levels of the independent variable
• Experimental design: the plan used to test a hypothesis

Page 6: Bus b272 f unit 1

Completely Randomized Design

• Experimental units (subjects) are assigned randomly to treatments
  – Subjects are assumed homogeneous
• Only one factor or independent variable
  – With two or more treatment levels
• Analyzed by one-way analysis of variance (one-way ANOVA)
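As a rough illustration of the random assignment step in a completely randomized design, the sketch below shuffles a list of units and deals them out to the treatments in turn, so every treatment receives (nearly) the same number of units. The function name `assign_completely_randomized`, the worker labels and the seed are hypothetical and chosen only for illustration; only Python's standard `random` module is used.

```python
import random


def assign_completely_randomized(units, treatments, seed=None):
    """Randomly assign experimental units to treatments (completely randomized design).

    The units are shuffled and then dealt out to the treatments in turn,
    so group sizes differ by at most one.
    """
    rng = random.Random(seed)          # separate generator, reproducible when seeded
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups


# Hypothetical example: 15 workers assigned to 3 machines
workers = [f"worker{i}" for i in range(1, 16)]
machines = ["Machine 1", "Machine 2", "Machine 3"]
plan = assign_completely_randomized(workers, machines, seed=1)
for machine, assigned in plan.items():
    print(machine, assigned)
```

With 15 units and 3 treatments each treatment receives five units, the same layout as the filling-machine example later in this unit.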

Page 7: Bus b272 f unit 1

Randomized Design Example

Factor: training method, with three factor levels (treatments). Units are randomly assigned to the levels; the dependent variable (response) is measured in hours.

Level 1   Level 2   Level 3
21 hrs    17 hrs    31 hrs
27 hrs    25 hrs    28 hrs
29 hrs    20 hrs    22 hrs

Page 8: Bus b272 f unit 1

One-way Analysis of Variance F Test

• Evaluate the difference among the mean responses of two or more (k) populations
  – e.g., several types of tires, oven temperature settings, different types of marketing strategies

Page 9: Bus b272 f unit 1

Assumptions of ANOVA

• Samples are randomly and independently drawn
  – This condition must be met
• Populations are normally distributed
  – The F test is robust to moderate departures from normality
• Populations have equal variances

Page 10: Bus b272 f unit 1

Hypotheses of One-Way ANOVA

$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
• All population means are equal
• No treatment effect (no variation in means among groups)

$H_1$: Not all $\mu_i$ are the same
• At least one population mean is different (others may be the same!)
• There is a treatment effect
• Does not mean that all population means are different

Page 11: Bus b272 f unit 1

One-way ANOVA (No Treatment Effect)

$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_1$: Not all $\mu_i$ are the same

The null hypothesis is true.

[Figure: three identical population distributions, $\mu_1 = \mu_2 = \mu_3$]

Page 12: Bus b272 f unit 1

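One-way ANOVA (Treatment Effect Present)

$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_1$: Not all $\mu_i$ are the same

The null hypothesis is NOT true.

[Figure: two panels of population distributions centred at $\mu_1$, $\mu_2$, $\mu_3$, in which not all of the means are equal]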

Page 13: Bus b272 f unit 1

One-way ANOVA (Partition of Total Variation)

Total Variation SS(Total) = Variation Due to Treatment SST + Variation Due to Random Sampling SSE

$$SS(Total) = SST + SSE$$

Page 14: Bus b272 f unit 1


ANOVA set-up

Page 15: Bus b272 f unit 1

Total Variation

$$SS(Total) = \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(X_{ij} - \bar{\bar{X}}\right)^2, \qquad \bar{\bar{X}} = \frac{\sum_{j=1}^{k}\sum_{i=1}^{n_j} X_{ij}}{n}$$

where
$X_{ij}$: the i-th observation in group j
$n_j$: the number of observations in group j
n: the total number of observations in all groups
k: the number of groups
$\bar{\bar{X}}$: the overall or grand mean

Page 16: Bus b272 f unit 1

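Total Variation (continued)

[Figure: responses X for Group 1, Group 2 and Group 3 plotted against the grand mean $\bar{\bar{X}}$]

$$SS(Total) = \left(X_{11} - \bar{\bar{X}}\right)^2 + \left(X_{21} - \bar{\bar{X}}\right)^2 + \cdots + \left(X_{n_k k} - \bar{\bar{X}}\right)^2$$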

Page 17: Bus b272 f unit 1

Among-Treatments Variation

Variation due to differences among groups:

$$SST = \sum_{j=1}^{k} n_j\left(\bar{X}_j - \bar{\bar{X}}\right)^2, \qquad MST = \frac{SST}{k-1}$$

where
$\bar{X}_j$: the sample mean of group j
$\bar{\bar{X}}$: the overall or grand mean
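A minimal Python sketch of the among-treatments computation, assuming each group is given as a plain list of observations. It uses the filling-machine data from the one-way ANOVA example later in this unit; the function name `among_treatments` is illustrative only.

```python
from statistics import mean

# Filling times (seconds) from the one-way ANOVA example in this unit
groups = [
    [25.40, 26.31, 24.10, 23.74, 25.10],  # Machine 1
    [23.40, 21.80, 23.50, 22.75, 21.60],  # Machine 2
    [20.00, 22.20, 19.75, 20.60, 20.40],  # Machine 3
]


def among_treatments(groups):
    """Return (SST, MST) for a list of groups of observations."""
    grand_mean = mean(x for g in groups for x in g)                  # overall mean of all observations
    sst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)  # sum of n_j * (group mean - grand mean)^2
    k = len(groups)
    return sst, sst / (k - 1)


sst, mst = among_treatments(groups)
print(f"SST = {sst:.4f}, MST = {mst:.4f}")  # SST = 47.1640, MST = 23.5820
```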

Page 18: Bus b272 f unit 1

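Among-Treatments Variation (continued)

[Figure: responses X for Group 1, Group 2 and Group 3, with the group means $\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$ and the grand mean $\bar{\bar{X}}$]

$$SST = n_1\left(\bar{X}_1 - \bar{\bar{X}}\right)^2 + n_2\left(\bar{X}_2 - \bar{\bar{X}}\right)^2 + \cdots + n_k\left(\bar{X}_k - \bar{\bar{X}}\right)^2$$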

Page 19: Bus b272 f unit 1

Within-Treatment Variation

Summing the variation within each treatment and then adding over all treatments:

$$SSE = \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(X_{ij} - \bar{X}_j\right)^2, \qquad MSE = \frac{SSE}{n-k}$$

where
$\bar{X}_j$: the sample mean of group j
$X_{ij}$: the i-th observation in group j
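The sketch below computes SSE and MSE for the same filling-machine data and, as a sanity check, confirms numerically that SS(Total) = SST + SSE. It is a plain-Python illustration assuming each group is a list of observations; the variable names are hypothetical.

```python
from statistics import mean

groups = [
    [25.40, 26.31, 24.10, 23.74, 25.10],  # Machine 1
    [23.40, 21.80, 23.50, 22.75, 21.60],  # Machine 2
    [20.00, 22.20, 19.75, 20.60, 20.40],  # Machine 3
]

n = sum(len(g) for g in groups)   # total number of observations
k = len(groups)                   # number of groups
grand_mean = mean(x for g in groups for x in g)

ss_total = sum((x - grand_mean) ** 2 for g in groups for x in g)
sst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
sse = sum((x - mean(g)) ** 2 for g in groups for x in g)  # deviations from each group's own mean
mse = sse / (n - k)

print(f"SSE = {sse:.4f}, MSE = {mse:.4f}")
print(f"SS(Total) = {ss_total:.4f}, SST + SSE = {sst + sse:.4f}")
```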

Page 20: Bus b272 f unit 1

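Within-Treatment Variation (continued)

[Figure: responses X for Group 1, Group 2 and Group 3, with each group's observations compared with its own group mean $\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$]

$$SSE = \left(X_{11} - \bar{X}_1\right)^2 + \left(X_{21} - \bar{X}_1\right)^2 + \cdots + \left(X_{n_1 1} - \bar{X}_1\right)^2 + \left(X_{12} - \bar{X}_2\right)^2 + \cdots + \left(X_{n_2 2} - \bar{X}_2\right)^2 + \cdots + \left(X_{n_k k} - \bar{X}_k\right)^2$$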

Page 21: Bus b272 f unit 1

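Within-Treatment Variation (continued)

$$MSE = \frac{SSE}{n-k} = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2 + \cdots + (n_k - 1)S_k^2}{(n_1 - 1) + (n_2 - 1) + \cdots + (n_k - 1)}$$

where $S_j^2$ is the sample variance of group j.

For k = 2, this is the pooled variance in the t test.
• If there are more than 2 groups, use the F test.
• For 2 groups, use the t test. The F test is more limited: it can only test the two-sided alternative, whereas the t test can also test one-sided alternatives.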
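To see that MSE is a pooled variance, the short sketch below rebuilds it from the group sample variances alone, using `statistics.variance` (which divides by $n_j - 1$). The helper name `pooled_mse` is hypothetical; the data are the filling times from the example in this unit.

```python
from statistics import variance

groups = [
    [25.40, 26.31, 24.10, 23.74, 25.10],  # Machine 1
    [23.40, 21.80, 23.50, 22.75, 21.60],  # Machine 2
    [20.00, 22.20, 19.75, 20.60, 20.40],  # Machine 3
]


def pooled_mse(groups):
    """MSE as the pooled variance: sum of (n_j - 1) S_j^2 divided by sum of (n_j - 1)."""
    numerator = sum((len(g) - 1) * variance(g) for g in groups)  # equals SSE
    denominator = sum(len(g) - 1 for g in groups)                # equals n - k
    return numerator / denominator


print([round(variance(g), 4) for g in groups])  # [1.0648, 0.778, 0.9205]
print(round(pooled_mse(groups), 4))             # 0.9211
```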

Page 22: Bus b272 f unit 1

One-way ANOVA F Test Statistic

Test statistic:

$$F = \frac{MST}{MSE}$$

• MST is the mean square among (between) groups
• MSE is the mean square within groups (error)

Degrees of freedom: $df_1 = k - 1$, $df_2 = n - k$

Page 23: Bus b272 f unit 1

One-way ANOVA Summary Table

Source of Variation | Degrees of Freedom | Sum of Squares | Mean Squares (Variance) | F Statistic
Among (Treatment) | k – 1 | SST | MST = SST/(k – 1) | MST/MSE
Within (Error) | n – k | SSE | MSE = SSE/(n – k) |
Total | n – 1 | SS(Total) = SST + SSE | |
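The summary table can be assembled in a few lines. This is a sketch, not the Excel routine: it computes df, SS, MS and F for a list of groups and compares F with the tabulated critical value 3.89 used on the example's solution slide. Python's standard library has no F distribution, so the critical value is hard-coded as an assumption; `one_way_anova` is an illustrative name.

```python
from statistics import mean


def one_way_anova(groups):
    """Summary-table quantities for a one-way ANOVA (sketch)."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = mean(x for g in groups for x in g)
    sst = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    sse = sum((x - mean(g)) ** 2 for g in groups for x in g)
    mst, mse = sst / (k - 1), sse / (n - k)
    return {
        "df": (k - 1, n - k, n - 1),   # among, within, total
        "SS": (sst, sse, sst + sse),
        "MS": (mst, mse),
        "F": mst / mse,
    }


groups = [
    [25.40, 26.31, 24.10, 23.74, 25.10],  # Machine 1
    [23.40, 21.80, 23.50, 22.75, 21.60],  # Machine 2
    [20.00, 22.20, 19.75, 20.60, 20.40],  # Machine 3
]
table = one_way_anova(groups)
F_CRIT = 3.89  # F(0.05; 2, 12) from F tables, as quoted in the example
print(f"F = {table['F']:.3f}")  # F = 25.602
print("Reject H0" if table["F"] > F_CRIT else "Do not reject H0")
```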

Page 24: Bus b272 f unit 1

Features of One-way ANOVA F Statistic

• The F statistic is the ratio of the among estimate of variance to the within estimate of variance.
  – The ratio can never be negative, since both estimates are sums of squares divided by degrees of freedom
  – df1 = k – 1 will typically be small
  – df2 = n – k will typically be large
• The ratio should be close to 1 if the null hypothesis is true.

Page 25: Bus b272 f unit 1

One-way ANOVA F Test Example

As production manager, you want to see if three filling machines have different mean filling times. You assign 15 similarly trained and experienced workers, five per machine, to the machines. At the 0.05 significance level, is there a difference in mean filling times?

Filling times (seconds):
Machine 1 | Machine 2 | Machine 3
25.40 | 23.40 | 20.00
26.31 | 21.80 | 22.20
24.10 | 23.50 | 19.75
23.74 | 22.75 | 20.60
25.10 | 21.60 | 20.40

Page 26: Bus b272 f unit 1

One-way ANOVA Example: Scatter Diagram

[Figure: filling time in seconds (vertical axis, 19 to 27) for each observation on Machine 1, Machine 2 and Machine 3, with the group means and the grand mean marked]

$\bar{X}_1 = 24.93$, $\bar{X}_2 = 22.61$, $\bar{X}_3 = 20.59$, $\bar{\bar{X}} = 22.71$

Page 27: Bus b272 f unit 1

One-way ANOVA Example Computations

$\bar{X}_1 = 24.93$, $\bar{X}_2 = 22.61$, $\bar{X}_3 = 20.59$, $\bar{\bar{X}} = 22.71$
$n_j = 5$, $k = 3$, $n = 15$

$$SST = 5\left[(24.93 - 22.71)^2 + (22.61 - 22.71)^2 + (20.59 - 22.71)^2\right] = 47.164$$

Page 28: Bus b272 f unit 1

$$SSE = 4.2592 + 3.112 + 3.682 = 11.0532$$

(the within-group sums of squares for Machines 1, 2 and 3)

$$MST = \frac{SST}{k-1} = \frac{47.164}{3-1} = 23.5820$$

$$MSE = \frac{SSE}{n-k} = \frac{11.0532}{15-3} = 0.9211$$

Page 29: Bus b272 f unit 1

Summary Table

Source of Variation | Degrees of Freedom | Sum of Squares | Mean Squares (Variance) | F
Among (Treatment) | 3 – 1 = 2 | 47.1640 | 23.5820 | MST/MSE = 25.602
Within (Error) | 15 – 3 = 12 | 11.0532 | 0.9211 |
Total | 15 – 1 = 14 | 58.2172 | |

Page 30: Bus b272 f unit 1

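One-way ANOVA Example Solution

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: Not all the means are equal
$\alpha = 0.05$
$df_1 = 2$, $df_2 = 12$

Critical value: $F_{0.05} = 3.89$

[Figure: F distribution with a rejection region of area $\alpha = 0.05$ to the right of 3.89]

Test statistic:

$$F = \frac{MST}{MSE} = \frac{23.5820}{0.9211} = 25.602$$

Decision: Reject $H_0$ at $\alpha = 0.05$.

Conclusion: There is evidence to believe that at least one $\mu_i$ differs from the rest.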

Page 31: Bus b272 f unit 1

Computer Application

To obtain the Microsoft Excel output on the next page, first enter the data into k columns (one column per group) in an Excel file, then follow the commands: Tools / Data Analysis / Anova: Single Factor.

Page 32: Bus b272 f unit 1

Computer Output using Data Analysis of Excel

ANOVA

SUMMARY
Groups | Count | Sum | Average | Variance
Machine 1 | 5 | 124.65 | 24.93 | 1.0648
Machine 2 | 5 | 113.05 | 22.61 | 0.778
Machine 3 | 5 | 102.95 | 20.59 | 0.9205

ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 47.164 | 2 | 23.582 | 25.602 | 4.684E-05 | 3.88529
Within Groups | 11.0532 | 12 | 0.9211 | | |
Total | 58.2172 | 14 | | | |

Page 33: Bus b272 f unit 1

Exercise 1

The manager of a large department store wants to test whether the average size of customer transactions differs among four types of payment: Visa card, company card, cash or cheque. If there are differences in the average customer transaction size among the four types of payment, the manager will further investigate which types of payment give rise to higher transaction volumes and then design an appropriate promotional programme. A random sample of 54 customer transactions using the various types of payment was drawn during the past two months. From the sampled data, the following sample statistics are obtained:

Statistics | Visa | Company Card | Cash | Cheque
n | 10 | 12 | 18 | 14
$\bar{x}$ | 312 | 547 | 276 | 450
s | 64 | 112 | 41 | 73

Test whether the average customer transaction size differs among the four types of payment at the 0.05 level of significance.

Page 34: Bus b272 f unit 1

Exercise 1

One factor is involved, i.e., the type of payment. Under this factor, there are k = 4 treatments (or factor levels), which represent the four types of payment: Visa card, company card, cash and cheque. The experimental units are customer transactions.

$$\bar{\bar{x}} = \frac{\sum_{j=1}^{k} n_j\bar{x}_j}{n} = \frac{10(312) + 12(547) + 18(276) + 14(450)}{54} = 388$$

$$SST = \sum_{j=1}^{k} n_j\left(\bar{x}_j - \bar{\bar{x}}\right)^2 = 10(312 - 388)^2 + 12(547 - 388)^2 + 18(276 - 388)^2 + 14(450 - 388)^2 = 640{,}740$$

$$SSE = \sum_{j=1}^{k} (n_j - 1)s_j^2 = (10 - 1)64^2 + (12 - 1)112^2 + (18 - 1)41^2 + (14 - 1)73^2 = 272{,}702$$

Page 35: Bus b272 f unit 1

Exercise 1

Source | Sum of squares | Degrees of freedom | Mean squares | F
Among treatments | 640,740 | k – 1 = 3 | 213,580 | 39.16
Within treatments | 272,702 | n – k = 50 | 5,454.04 |
Total | 913,442 | n – 1 = 53 | |

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_1$: At least two of the means differ

Test statistic: $F = \frac{MST}{MSE} = 39.16$

Rejection region: $F > F_{\alpha,\,k-1,\,n-k} = F_{0.05,\,3,\,50} = 2.80$

Since the test statistic of 39.16 is greater than the critical value of 2.80, reject $H_0$. At the 0.05 level of significance, there is evidence that the average customer transaction sizes differ significantly among the four types of payment.

Excel printout

ANOVA
Source of variation | SS | df | MS | F | P-value
Between Groups | 640740 | 3 | 213580 | 39.16 | 0.0000
Within Groups | 272702 | 50 | 5454.04 | |
Total | 913442 | 53 | | |
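When only summary statistics are available, as in Exercise 1, SST and SSE can be computed directly from $n_j$, $\bar{x}_j$ and $s_j$. A hedged Python sketch; the function name `anova_from_summary` is hypothetical.

```python
def anova_from_summary(ns, means, sds):
    """One-way ANOVA from group sizes, sample means and sample standard deviations."""
    n, k = sum(ns), len(ns)
    grand_mean = sum(nj * xj for nj, xj in zip(ns, means)) / n           # weighted grand mean
    sst = sum(nj * (xj - grand_mean) ** 2 for nj, xj in zip(ns, means))  # among treatments
    sse = sum((nj - 1) * sj ** 2 for nj, sj in zip(ns, sds))             # within treatments
    mst, mse = sst / (k - 1), sse / (n - k)
    return grand_mean, sst, sse, mst, mse, mst / mse


# Visa card, company card, cash, cheque (Exercise 1)
grand, sst, sse, mst, mse, f = anova_from_summary(
    [10, 12, 18, 14], [312, 547, 276, 450], [64, 112, 41, 73]
)
print(grand, sst, sse, round(mst, 2), round(mse, 2), round(f, 2))
# 388.0 640740.0 272702 213580.0 5454.04 39.16
```

The resulting F of about 39.16 is then compared with the tabulated $F_{0.05,\,3,\,50} = 2.80$, exactly as in the solution above.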

Page 36: Bus b272 f unit 1

Can ANOVA be replaced by t-Test?

• t test: tests for any difference between two population means $\mu_1$ and $\mu_2$
• Multiple t tests are required for more than two population means
• Conducting multiple tests increases the probability of making a Type I error. E.g., to compare 6 population means with ANOVA at the 5% significance level, there is a 5% chance of rejecting the null hypothesis when it is true. If we use t tests instead, we need to perform $\binom{6}{2} = 15$ tests, and if the same 5% significance level is set for each test, the chance of at least one Type I error (treating the tests as independent) will be

$$1 - (1 - 0.05)^{15} = 0.54$$
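The Type I error figure above can be checked with a couple of lines of Python. This sketch assumes, as the calculation does, that the 15 pairwise tests are independent; `familywise_error` is an illustrative name.

```python
from math import comb


def familywise_error(num_means, alpha=0.05):
    """Number of pairwise t tests and P(at least one Type I error), assuming independent tests."""
    tests = comb(num_means, 2)            # every pair of means
    return tests, 1 - (1 - alpha) ** tests


tests, fwer = familywise_error(6)
print(tests, round(fwer, 2))  # 15 0.54
```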

Page 37: Bus b272 f unit 1


Linear Regression

Page 38: Bus b272 f unit 1

Linear Regression

• Origin of regression
• Determining the simple linear regression equation
• Assessing the fit of the model
• Correlation analysis
• Estimation and prediction
• Assumptions of regression and correlation

Page 39: Bus b272 f unit 1

Origin of Regression

“Regression,” from a Latin root meaning “going back,” refers to a family of statistical methods used in studying the relationship between two variables; these methods were first employed by Francis Galton in 1877.

Galton was interested in studying the relationship between a father’s height and his son’s height. Using the “regression” method, he found that sons’ heights tend to regress toward the overall mean, and the method was therefore called “regression”.

Page 40: Bus b272 f unit 1

Linear Regression Analysis

• Linear regression analysis is used primarily to model and describe linear relationships among variables and to provide predictions
  – Predicts the value of a dependent (response) variable based on the value of at least one independent (explanatory) variable
  – Expresses statistically the effect of the independent variables on the dependent variable

Page 41: Bus b272 f unit 1

Types of Regression Models

[Figure: four scatter-plot panels showing a positive linear relationship, a negative linear relationship, a relationship that is NOT linear, and no relationship]

Page 42: Bus b272 f unit 1

Simple Linear Regression Model

The relationship between two variables, say X and Y, is described by a linear function.

The change in the variable Y (called the dependent or response variable) is associated with the change in the other variable X (called the independent or explanatory variable).

The aim is to explore the dependency of Y on X.

Page 43: Bus b272 f unit 1

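Why Regression?

X | 1 | 2 | 3 | 4
Y | 2 | 2.5 | 2.5 | 5

[Figure: scatter plot of the points (1, 2), (2, 2.5), (3, 2.5) and (4, 5), showing the horizontal line $\bar{Y} = 3$ and a fitted regression line]

Estimating every Y by the mean $\bar{Y} = 3$:

$$\text{Sum of squares} = \sum\left(Y - \bar{Y}\right)^2 = (2-3)^2 + (2.5-3)^2 + (2.5-3)^2 + (5-3)^2 = 5.5$$

Estimating Y by the fitted values $\hat{Y}$ on the regression line:

$$\text{Sum of squares} = \sum\left(Y - \hat{Y}\right)^2 = (2-1.65)^2 + (2.5-2.55)^2 + (2.5-3.45)^2 + (5-4.35)^2 = 1.45$$

The larger the sum of squares, the poorer the estimate.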
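The comparison above can be reproduced with a short Python sketch: it computes the sum of squares around $\bar{Y}$ and around a least squares line fitted to the four points. Fitting the line by least squares here anticipates the method introduced later in this unit; the variable names are illustrative.

```python
from statistics import mean

x = [1, 2, 3, 4]
y = [2, 2.5, 2.5, 5]

# Estimate every Y by the mean of Y
y_bar = mean(y)
ss_mean = sum((yi - y_bar) ** 2 for yi in y)

# Estimate Y by a least squares line
x_bar = mean(x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]
ss_line = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

print(y_bar, ss_mean)                 # 3.0 5.5
print([round(v, 2) for v in y_hat])   # [1.65, 2.55, 3.45, 4.35]
print(round(ss_line, 2))              # 1.45
```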

Page 44: Bus b272 f unit 1

Linear Relationship

We wish to study whether there is any association between two quantitative variables, say X and Y.
• If ‘Y tends to increase as X increases’: a positive relationship
• If ‘Y tends to decrease as X increases’: a negative relationship

If the corresponding increase or decrease in Y is proportional to the change in X (a constant change in Y per unit change in X), the relationship identified is said to be a linear one.

Page 45: Bus b272 f unit 1

Scatter Diagram

A scatter diagram is a graph plotting all X-Y pairs of the sample data.

By viewing a scatter diagram, one can determine whether a relationship exists between the two variables. It can also suggest the likely mathematical form of that relationship, allowing one to judge, initially and intuitively, whether a linear relationship exists between the two variables involved.

Page 46: Bus b272 f unit 1

Example

[Figure: scatter diagram of the number of consultations (vertical axis, 0 to 40) against the level of pollution (horizontal axis, 0 to 200)]

The level of air pollution at Kwun Tong and the total number of consultations relating to respiratory diseases in a public clinic in the area were recorded during a specific time period on 14 randomly selected days.

Page 47: Bus b272 f unit 1

Population Linear Regression

The population regression line is a straight line that describes the dependence of the average value (conditional mean) of one variable on the other.

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where
$Y_i$: dependent (response) variable
$X_i$: independent (explanatory) variable
$\beta_0$: population Y intercept
$\beta_1$: population slope coefficient
$\varepsilon_i$: random error

Population regression line (conditional mean): $\mu_{Y|X} = \beta_0 + \beta_1 X$

Page 48: Bus b272 f unit 1

Population Linear Regression (continued)

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

[Figure: at $X = X_i$, the observed value of Y lies off the population regression line $\mu_{Y|X} = \beta_0 + \beta_1 X$ (the conditional mean); the random error $\varepsilon_i$ is the vertical discrepancy, or residual, for point i]

Page 49: Bus b272 f unit 1

Least Squares Method

The line fitted by least squares is the one that makes the sum of squares of all those vertical discrepancies (residuals) as small as possible, i.e., it minimizes

$$\sum_i \varepsilon_i^2$$

which is the sum of squared residuals.

Page 50: Bus b272 f unit 1

Sample Linear Regression

$$Y_i = b_0 + b_1 X_i + e_i$$

where $b_0$ is the sample Y intercept, $b_1$ is the sample coefficient of slope and $e_i$ is the residual.

Sample regression line (fitted regression line or predicted value):

$$\hat{Y} = b_0 + b_1 X$$

The sample regression line is formed by the point estimates of $\beta_0$ and $\beta_1$, i.e., $b_0$ and $b_1$. It provides an estimate of the population regression line as well as a predicted value of Y.

Page 51: Bus b272 f unit 1

Sample Linear Regression (continued)

$b_0$ and $b_1$ are obtained by finding the specific values of $b_0$ and $b_1$ that minimize the sum of the squared residuals:

$$\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n} e_i^2$$

Page 52: Bus b272 f unit 1

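Coefficients of Sample Linear Regression

For $\hat{Y}_i = b_0 + b_1 X_i$:

$$b_1 = \frac{\sum X_i Y_i - \frac{\left(\sum X_i\right)\left(\sum Y_i\right)}{n}}{\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{n}}$$

$$b_0 = \bar{Y} - b_1 \bar{X}$$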
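A direct translation of these two formulas into Python, applied to the store data of Example 1 (annual sales in $1000). This is a sketch; `least_squares` is a hypothetical helper name, and it uses the computational form with sums exactly as written above.

```python
def least_squares(x, y):
    """Slope b1 and intercept b0 via the computational formulas for simple linear regression."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    b0 = sum_y / n - b1 * sum_x / n      # b0 = Ybar - b1 * Xbar
    return b0, b1


square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]   # in $1000

b0, b1 = least_squares(square_feet, annual_sales)
print(f"Y-hat = {b0:.3f} + {b1:.3f} X")  # Y-hat = 1636.415 + 1.487 X
```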

Page 53: Bus b272 f unit 1

Interpretation of the Slope and the Intercept

• $\beta_0$ is the average value of Y when the value of X is zero.
• $\beta_1$ measures the change in the average value of Y as a result of a one-unit change in X.

Page 54: Bus b272 f unit 1

Interpretation of the Slope and the Intercept (continued)

• $b_0$ is the estimated average value of Y when the value of X is zero.
• $b_1$ is the estimated change in the average value of Y as a result of a one-unit change in X.

Page 55: Bus b272 f unit 1

Example 1 : Simple Linear Regression

Suppose that you want to examine the linear dependency of annual sales on store size (in square feet) for seven stores. Sample data for the seven stores were obtained. Find the equation of the straight line that best fits the data.

Store | Square Feet | Annual Sales ($1000)
1 | 1,726 | 3,681
2 | 1,542 | 3,395
3 | 2,816 | 6,653
4 | 5,555 | 9,543
5 | 1,292 | 3,318
6 | 2,208 | 5,563
7 | 1,313 | 3,760

Page 56: Bus b272 f unit 1

Example 1 : Scatter Diagram

[Figure: Excel output, scatter diagram of annual sales ($000) against square feet for the seven stores]

Page 57: Bus b272 f unit 1

Computation of Regression Coefficient

Store | Square Feet X | Annual Sales ($1000) Y | X² | Y² | XY
1 | 1,726 | 3,681 | 2,979,076 | 13,549,761 | 6,353,406
2 | 1,542 | 3,395 | 2,377,764 | 11,526,025 | 5,235,090
3 | 2,816 | 6,653 | 7,929,856 | 44,262,409 | 18,734,848
4 | 5,555 | 9,543 | 30,858,025 | 91,068,849 | 53,011,365
5 | 1,292 | 3,318 | 1,669,264 | 11,009,124 | 4,286,856
6 | 2,208 | 5,563 | 4,875,264 | 30,946,969 | 12,283,104
7 | 1,313 | 3,760 | 1,723,969 | 14,137,600 | 4,936,880
Total | 16,452 | 35,913 | 52,413,218 | 216,500,737 | 104,841,549

$$b_1 = \frac{\sum X_i Y_i - \frac{\left(\sum X_i\right)\left(\sum Y_i\right)}{n}}{\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{n}}$$

Page 58: Bus b272 f unit 1

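Computation of Regression Coefficient

$\sum X = 16{,}452$, $\sum Y = 35{,}913$, $\sum XY = 104{,}841{,}549$, $\sum X^2 = 52{,}413{,}218$

$$b_1 = \frac{\sum X_iY_i - \frac{(\sum X_i)(\sum Y_i)}{n}}{\sum X_i^2 - \frac{(\sum X_i)^2}{n}} = \frac{104{,}841{,}549 - \frac{(16{,}452)(35{,}913)}{7}}{52{,}413{,}218 - \frac{(16{,}452)^2}{7}} = 1.486633657$$

$$b_0 = \bar{Y} - b_1\bar{X} = \frac{35{,}913}{7} - 1.486633657\left(\frac{16{,}452}{7}\right) = 1{,}636.414726$$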

Page 59: Bus b272 f unit 1

Example 1 : Equation for the Sample Regression Line

$$\hat{Y}_i = b_0 + b_1 X_i = 1{,}636.415 + 1.487X_i$$

[Figure: scatter diagram of annual sales ($000) against square feet with the fitted line $\hat{Y}_i = 1636.415 + 1.487X_i$]

Page 60: Bus b272 f unit 1

Example 1 : Interpretation of Results

$$\hat{Y}_i = 1636.415 + 1.487X_i$$

The slope of 1.487 means that for each increase of one unit in X, we predict the average value of Y to increase by an estimated 1.487 units.

Since annual sales are recorded in $1000, the model estimates that for each increase of one square foot in the size of the store, expected annual sales increase by $1,487.

Page 61: Bus b272 f unit 1

Predicting Annual Sales Based on Square Footage

Suppose that we would like to use the fitted model to predict the average annual sales for a store with 4,000 square feet.

$$\hat{Y}_i = 1636.414726 + 1.486633657X_i = 1636.414726 + 1.486633657(4{,}000) = 7{,}582.94935$$

That is, predicted annual sales of about $7,582,949.35 (since sales are recorded in $1000).

Page 62: Bus b272 f unit 1

Interpolation versus Extrapolation

When using a regression line for prediction, it is not appropriate to make predictions beyond the relevant range of the independent variable (in the previous example, 1,292 to 5,555 square feet).

That is, we may interpolate within the relevant range of X values, but we SHOULD NOT extrapolate beyond it. For example, it is not appropriate to predict the average annual sales for a store with 7,000 square feet, since this is outside the range of X values, i.e., (1,292, 5,555).
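One way to respect the relevant range in code is to refuse predictions outside the observed X values. The sketch below hard-codes the Example 1 coefficients and range; `predict_sales`, `X_MIN` and `X_MAX` are hypothetical names, and raising an error is only one possible design choice (issuing a warning would also work).

```python
B0, B1 = 1636.414726, 1.486633657   # fitted coefficients from Example 1 (sales in $1000)
X_MIN, X_MAX = 1292, 5555           # relevant range of square footage in the sample


def predict_sales(square_feet):
    """Predicted annual sales ($1000); refuses to extrapolate outside the sample range."""
    if not X_MIN <= square_feet <= X_MAX:
        raise ValueError(
            f"{square_feet} sq ft is outside the relevant range [{X_MIN}, {X_MAX}]"
        )
    return B0 + B1 * square_feet


print(round(predict_sales(4000), 2))   # 7582.95, i.e. about $7,582,950
try:
    predict_sales(7000)                # beyond the relevant range
except ValueError as err:
    print("Not predicted:", err)
```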

Page 63: Bus b272 f unit 1

Causal Relationship?

In general, when regression analysis identifies a relationship between X and Y, we would usually say that ‘X is associated with Y’ rather than ‘X causes Y’.

We cannot claim that two variables are related by cause and effect just because there is a statistical relationship between them. In fact, a causal relationship cannot be inferred from statistics alone.

Page 64: Bus b272 f unit 1

For example, the prices of dog food and of houses may well be positively correlated over time.

When you collect data on the price of dog food and the price of houses over time, you might infer that they have a positive relationship, but can you conclude that an increase in the price of dog food directly causes the price of houses to increase too?

It might be that an inflationary force is influencing both, so that they move in the same general direction over time.

Page 65: Bus b272 f unit 1

Computer Application

Import the data into two adjacent columns in an Excel file and then click Tools / Data Analysis / Regression (see pages 624-5 for a detailed description).

Page 66: Bus b272 f unit 1

Example 1: Computer Output

SUMMARY OUTPUT

Regression Statistics
Multiple R | 0.970557204
R Square | 0.941981286
Adjusted R Square | 0.930377543
Standard Error | 611.7515173
Observations | 7

ANOVA
 | df | SS | MS | F | Significance F
Regression | 1 | 30380456.12 | 30380456 | 81.17909 | 0.000281201
Residual | 5 | 1871199.595 | 374239.9 | |
Total | 6 | 32251655.71 | | |

 | Coefficients | Standard Error | t Stat | P-value
Intercept | 1636.414726 | 451.4953308 | 3.624433 | 0.015149
Square feet | 1.486633657 | 0.164999212 | 9.009944 | 0.000281

Page 67: Bus b272 f unit 1

Exercise 2

Consider the example about the level of air pollution at Kwun Tong and the total number of consultations relating to respiratory diseases in a public clinic in the area. The corresponding data are given as follows:

Day | 1 | 2 | 3 | 4 | 5 | 6 | 7
Level of pollution | 115 | 134 | 126 | 158 | 99 | 86 | 129
Consultations | 20 | 86 | 28 | 38 | 18 | 12 | 29

Day | 8 | 9 | 10 | 11 | 12 | 13 | 14
Level of pollution | 135 | 147 | 107 | 118 | 126 | 143 | 104
Consultations | 32 | 35 | 23 | 28 | 26 | 32 | 22

Page 68: Bus b272 f unit 1

Exercise 2

(a) Determine the sample regression line to predict the number of consultations from the level of pollution.
(b) Interpret the coefficients.

Solution:

$\sum_{i=1}^{n} x_i = 1{,}727$, $\sum_{i=1}^{n} y_i = 429$, $\sum_{i=1}^{n} x_i^2 = 218{,}207$, $\sum_{i=1}^{n} x_i y_i = 55{,}281$, $\sum_{i=1}^{n} y_i^2 = 17{,}079$, n = 14

Page 69: Bus b272 f unit 1

Exercise 2 (continued)

$$b_1 = \frac{\sum x_iy_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \frac{55{,}281 - \frac{(1{,}727)(429)}{14}}{218{,}207 - \frac{(1{,}727)^2}{14}} = 0.456701074$$

$$b_0 = \bar{y} - b_1\bar{x} = 30.642857 - 0.456701074(123.357142857) = -25.694482444$$

• For $b_1$: for each one-unit increase in the pollution level, the number of consultations increases, on average, by 0.456701074.
• No meaningful interpretation can be made for $b_0$, as the range of x does not include zero.
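As a cross-check on the exercise, the coefficients can be recomputed from the raw 14-day data. This sketch uses the equivalent deviation form $b_1 = \sum(x_i - \bar{x})(y_i - \bar{y}) / \sum(x_i - \bar{x})^2$ rather than the sum-based form above; both give the same value up to rounding. Variable names are illustrative.

```python
from statistics import mean

# Exercise 2 data, days 1 to 14
pollution = [115, 134, 126, 158, 99, 86, 129, 135, 147, 107, 118, 126, 143, 104]
consultations = [20, 86, 28, 38, 18, 12, 29, 32, 35, 23, 28, 26, 32, 22]

x_bar, y_bar = mean(pollution), mean(consultations)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(pollution, consultations))
sxx = sum((x - x_bar) ** 2 for x in pollution)

b1 = sxy / sxx                 # slope
b0 = y_bar - b1 * x_bar        # intercept
print(f"b1 = {b1:.6f}, b0 = {b0:.6f}")  # b1 = 0.456701, b0 = -25.694482
```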

Page 70: Bus b272 f unit 1

Assessing the simple linear regression model

After setting up a linear regression model, we often wish to assess the fit of the model, that is, to find out how well the model fits the given data. For a good fit, the data as a whole should lie close to the regression line, so that the independent variable can be used to predict the value of the dependent variable with high accuracy.

To examine how well the independent variable predicts the dependent variable, we need to develop several measures of variation.

Page 71: Bus b272 f unit 1

Measure of Variation: The Sum of Squares

SS(Total) = SSR + SSE

Total Sample Variability = Explained Variability + Unexplained Variability

Page 72: Bus b272 f unit 1

Measure of Variation: The Sum of Squares (continued)

• SS(Total) = total sum of squares
  – Measures the variation of the $Y_i$ values around their mean $\bar{Y}$
• SSR = regression sum of squares
  – Explained variation attributable to the relationship between X and Y
• SSE = error sum of squares
  – Variation attributable to factors other than the relationship between X and Y (unexplained variation)

Page 73: Bus b272 f unit 1

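Measure of Variation: The Sum of Squares (continued)

[Figure: at a point $X_i$, the deviation of $Y_i$ from $\bar{Y}$ split into the part explained by the regression line, $\hat{Y}_i - \bar{Y}$, and the unexplained part, $Y_i - \hat{Y}_i$]

$$SS(Total) = \sum\left(Y_i - \bar{Y}\right)^2 \qquad SSR = \sum\left(\hat{Y}_i - \bar{Y}\right)^2 \qquad SSE = \sum\left(Y_i - \hat{Y}_i\right)^2$$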

Page 74: Bus b272 f unit 1

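Given $\sum X = 16{,}452$, $\sum Y = 35{,}913$, $\sum XY = 104{,}841{,}549$, $\sum X^2 = 52{,}413{,}218$, $\sum Y^2 = 216{,}500{,}737$:

$$SS(Total) = \sum\left(Y_i - \bar{Y}\right)^2 = \sum Y_i^2 - \frac{\left(\sum Y_i\right)^2}{n} = 32{,}251{,}655.714$$

$$SSE = \sum\left(Y_i - \hat{Y}_i\right)^2 = \sum Y_i^2 - b_0\sum Y_i - b_1\sum X_iY_i = 1{,}871{,}199.595$$

$$SSR = \sum\left(\hat{Y}_i - \bar{Y}\right)^2 = b_0\sum Y_i + b_1\sum X_iY_i - \frac{\left(\sum Y_i\right)^2}{n} = 30{,}380{,}456.119$$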
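The three sums of squares can also be computed from the fitted values directly, rather than from the shortcut formulas. A hedged sketch for the store data (variable names are illustrative), which also checks that SS(Total) = SSR + SSE:

```python
from statistics import mean

square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]   # in $1000

x_bar, y_bar = mean(square_feet), mean(annual_sales)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales)) / sum(
    (x - x_bar) ** 2 for x in square_feet
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in square_feet]                        # fitted values

ss_total = sum((y - y_bar) ** 2 for y in annual_sales)           # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)                      # explained variation
sse = sum((y - yh) ** 2 for y, yh in zip(annual_sales, y_hat))    # unexplained variation

print(f"SS(Total) = {ss_total:,.3f}")
print(f"SSR = {ssr:,.3f}, SSE = {sse:,.3f}, SSR + SSE = {ssr + sse:,.3f}")
```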

Page 75: Bus b272 f unit 1

Standard Error of Estimate

The standard deviation of the variation of observations around the regression line:

$$S = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2}{n-2}} = \sqrt{\frac{1{,}871{,}199.595}{7-2}} = 611.751517366$$
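A one-line helper for the standard error of estimate, shown as a sketch (the name `standard_error_of_estimate` is hypothetical). It takes SSE and n as inputs; the first call reproduces the store example, and the second uses the SSE for the air pollution data that appears in Exercise 3.

```python
from math import sqrt


def standard_error_of_estimate(sse, n):
    """S = sqrt(SSE / (n - 2)) for simple linear regression (two estimated coefficients)."""
    return sqrt(sse / (n - 2))


print(round(standard_error_of_estimate(1_871_199.595, 7), 4))    # 611.7515
print(round(standard_error_of_estimate(2_855.0409153, 14), 4))   # 15.4247
```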

Page 76: Bus b272 f unit 1

Standard Error of Estimate

The smallest value that S can assume is 0, which occurs when SSE = 0, that is, when all the points fall on the regression line. Thus, when S is small (relative to the typical size of the Y values), the fit is excellent, and the linear regression model is likely to be an effective analytical and forecasting tool.

When S is large, the regression model is a poor one and is of little value.

Page 77: Bus b272 f unit 1

The Coefficient of Determination ($r^2$ or $R^2$)

• Measures the proportion of variation in Y that is explained by the independent variable X in the regression model

By themselves, SSR, SSE and SS(Total) provide little that can be directly interpreted. A simple ratio of SSR to SS(Total) provides a measure of the usefulness of the regression equation:

$$r^2 = \frac{SSR}{SS(Total)} = \frac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}} = \frac{30{,}380{,}456.119}{32{,}251{,}655.714} = 0.941981286$$

Page 78: Bus b272 f unit 1

Coefficient of Determination ($r^2$)

[Figure: four scatter plots with the fitted line $\hat{Y}_i = b_0 + b_1X_i$, illustrating $r^2 = 1$ (two panels), $r^2 = 0.8$ and $r^2 = 0$]

Page 79: Bus b272 f unit 1

Coefficient of Correlation

• The coefficient of correlation is used to measure the strength of association (linear relationship) between two numerical variables
  – Only concerned with the strength of the relationship
  – No causal effect is implied

Page 80: Bus b272 f unit 1

Coefficient of Correlation (continued)

The population correlation coefficient is denoted by $\rho$ (rho).

The sample correlation coefficient is denoted by r. It is an estimate of $\rho$ and is used to measure the strength of the linear relationship in the sample observations.

$$r = \pm\sqrt{r^2}$$

(r takes the same sign as the slope $b_1$.)

Page 81: Bus b272 f unit 1

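Coefficient of Correlation

$$r = \frac{\sum_{i=1}^{n} x_iy_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)\left(\sum_{i=1}^{n} y_i^2 - n\bar{y}^2\right)}} = \frac{104{,}841{,}549 - 7\left(\frac{16{,}452}{7}\right)\left(\frac{35{,}913}{7}\right)}{\sqrt{\left[52{,}413{,}218 - 7\left(\frac{16{,}452}{7}\right)^2\right]\left[216{,}500{,}737 - 7\left(\frac{35{,}913}{7}\right)^2\right]}} = 0.970557204$$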
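The correlation formula above translates directly into Python. This sketch (the function name `correlation` is illustrative) applies it to the store data and also squares the result, which should agree with the coefficient of determination $r^2$ computed earlier.

```python
from math import sqrt


def correlation(x, y):
    """Sample correlation coefficient r using the computational formula."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    num = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
    den = sqrt(
        (sum(xi * xi for xi in x) - n * x_bar ** 2)
        * (sum(yi * yi for yi in y) - n * y_bar ** 2)
    )
    return num / den


square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]   # in $1000

r = correlation(square_feet, annual_sales)
print(round(r, 6), round(r ** 2, 6))  # 0.970557 0.941981
```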

Page 82: Bus b272 f unit 1

Sample of Observations from Various r Values

[Figure: five scatter plots of Y against X for r = –1, r = –0.6, r = 0, r = 0.6 and r = 1]

Page 83: Bus b272 f unit 1

Features of $\rho$ and r

• Unit free
• Range between –1 and 1
• The closer to –1, the stronger the negative linear relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker the linear relationship

Page 84: Bus b272 f unit 1

Inference about the Slope

There is also a more systematic way to assess model fit, i.e., to perform a hypothesis test on the slope of the regression line.

If the two variables involved are not linearly related at all, one would observe from a scatter diagram that the regression line is flat, i.e., its slope is zero.

Page 85: Bus b272 f unit 1

Inference about the Slope

Hence, we can determine whether a significant linear relationship between the variables X and Y exists by testing whether $\beta_1$ (the true slope) is equal to zero.

$H_0: \beta_1 = 0$ (there is no linear relationship)
$H_1: \beta_1 \neq 0$ (there is a linear relationship)

If $H_0$ is rejected, there is evidence to believe that a linear relationship exists between X and Y.

Page 86: Bus b272 f unit 1

The standard error of the slope

The estimated standard error of $b_1$:

$$S_{b_1} = \frac{S}{\sqrt{\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}} = \frac{S}{\sqrt{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}}$$

Page 87: Bus b272 f unit 1

Inference about the Slope: t Test

• t test for a population slope
  – Is there a linear dependency of Y on X?
• Null and alternative hypotheses
  – $H_0: \beta_1 = 0$ (no linear dependency)
  – $H_1: \beta_1 \neq 0$ (linear dependency)
• Test statistic:

$$t = \frac{b_1 - \beta_1}{S_{b_1}}, \qquad d.f. = n - 2$$

Page 88: Bus b272 f unit 1

Example: Store Sales

Data for seven stores:

Store | Square Feet | Annual Sales ($000)
1 | 1,726 | 3,681
2 | 1,542 | 3,395
3 | 2,816 | 6,653
4 | 5,555 | 9,543
5 | 1,292 | 3,318
6 | 2,208 | 5,563
7 | 1,313 | 3,760

Estimated regression equation: $\hat{Y}_i = 1636.415 + 1.487X_i$

The slope of this model is 1.487.

Is the square footage of a store linearly related to its annual sales?

Page 89: Bus b272 f unit 1

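$H_0: \beta_1 = 0$, $H_1: \beta_1 \neq 0$, $\alpha = 0.05$, df = 7 – 2 = 5

Test statistic:

$$S_{b_1} = \frac{S}{\sqrt{\sum X_i^2 - n\bar{X}^2}} = \frac{611.751517366}{\sqrt{52{,}413{,}218 - 7(2{,}350.28571429)^2}} = 0.164999212$$

$$t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{1.486633657 - 0}{0.164999212} = 9.009943959$$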
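The whole slope test can be scripted end to end. The sketch below fits the line, computes $S$, $S_{b_1}$ and t for the store data, and compares |t| with the two-tailed critical value 2.5706 quoted for this example (hard-coded, since Python's standard library has no t distribution). Variable names are illustrative.

```python
from math import sqrt

square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
annual_sales = [3681, 3395, 6653, 9543, 3318, 5563, 3760]   # in $1000
n = len(square_feet)

x_bar = sum(square_feet) / n
y_bar = sum(annual_sales) / n
sxx = sum((x - x_bar) ** 2 for x in square_feet)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, annual_sales))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, annual_sales))
s = sqrt(sse / (n - 2))      # standard error of estimate
s_b1 = s / sqrt(sxx)         # standard error of the slope
t = (b1 - 0) / s_b1          # test statistic under H0: beta1 = 0

T_CRIT = 2.5706              # t(0.025, 5) from t tables, as quoted for this example
print(f"S_b1 = {s_b1:.6f}, t = {t:.4f}")  # S_b1 = 0.164999, t = 9.0099
print("Reject H0" if abs(t) > T_CRIT else "Do not reject H0")
```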

Page 90: Bus b272 f unit 1

Inferences about the Slope: t Test Example

Critical values: $\pm t_{0.025,\,5} = \pm 2.5706$

[Figure: $t_5$ distribution with rejection regions of area 0.025 below –2.5706 and above 2.5706]

Decision: Reject $H_0$, since t = 9.0099 > 2.5706.

Conclusion: At the 5% level of significance, there is evidence that square footage is associated with annual sales.

Page 91: Bus b272 f unit 1

Inferences about the Slope

Two-tailed test:

$H_0: \beta_1 = 0$ (No linear relationship)

$H_1: \beta_1 \neq 0$ (A linear relationship)

Upper-tailed test:

$H_0: \beta_1 \le 0$ (No positive linear relationship)

$H_1: \beta_1 > 0$ (A positive linear relationship)

Lower-tailed test:

$H_0: \beta_1 \ge 0$ (No negative linear relationship)

$H_1: \beta_1 < 0$ (A negative linear relationship)
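The three forms differ only in where the rejection region lies. A minimal sketch of the decision rule, assuming the critical value has already been read from the t table; the function name and the "two-sided"/"greater"/"less" labels are illustrative choices, not from the slides.

```python
def reject_h0(t, t_crit, alternative):
    """Decision rule for the t test of the slope.

    t_crit is the positive table value: t_{alpha/2, n-2} for the two-tailed
    test and t_{alpha, n-2} for either one-tailed test.
    alternative: "two-sided" (H1: beta1 != 0), "greater" (H1: beta1 > 0)
    or "less" (H1: beta1 < 0). These labels are illustrative only.
    """
    if alternative == "two-sided":
        return abs(t) > t_crit
    if alternative == "greater":
        return t > t_crit
    if alternative == "less":
        return t < -t_crit
    raise ValueError(f"unknown alternative: {alternative!r}")


print(reject_h0(9.0099, 2.5706, "two-sided"))  # store example: True
print(reject_h0(2.1288, 1.7823, "greater"))    # Exercise 3: True
```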

Page 92: Bus b272 f unit 1

Exercise 3

Consider the data of Exercise 2 about the level of air pollution at Kwun Tong and the total number of consultations that relate to respiratory diseases in a public clinic in the area.

Test at the 5% level of significance whether the level of air pollution and the total number of consultations are positively linearly related.

Page 93: Bus b272 f unit 1

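Solution:

$H_0: \beta_1 \le 0$

$H_1: \beta_1 > 0$

$\alpha = 0.05$; df = 14 - 2 = 12

$\sum_{i=1}^{n} X_i = 1{,}727 \qquad \sum_{i=1}^{n} Y_i = 429 \qquad \sum_{i=1}^{n} X_i^2 = 218{,}207$

$\sum_{i=1}^{n} X_iY_i = 55{,}281 \qquad \sum_{i=1}^{n} Y_i^2 = 17{,}079$

$$SSE = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} Y_i^2 - b_0\sum_{i=1}^{n} Y_i - b_1\sum_{i=1}^{n} X_iY_i = 2{,}855.04091530$$

$$S = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{2{,}855.04091530}{14-2}} = 15.424658060$$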

Page 94: Bus b272 f unit 1

Exercise 3

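$$S_{b_1} = \frac{S}{\sqrt{\sum_{i=1}^{n}X_i^2 - n\bar{X}^2}} = \frac{15.424658060}{\sqrt{218{,}207 - 14(123.357142857)^2}} = 0.214537530$$

$$t = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.456701074 - 0}{0.214537530} = 2.128770074$$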
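The same chain of steps (SSE, then S, then $S_{b_1}$, then t) can be sketched from the summary sums alone. This assumes the sums as read from the solution slide and the coefficients $b_0$, $b_1$ from the computer output; the function name is hypothetical.

```python
import math


def slope_t_from_sums(n, sum_x, sum_y, sum_x2, sum_xy, sum_y2, b0, b1):
    """Slope t statistic from summary sums, following the Exercise 3 steps.

    SSE  = sum(Y^2) - b0*sum(Y) - b1*sum(XY);  S = sqrt(SSE / (n - 2))
    S_b1 = S / sqrt(sum(X^2) - n*Xbar^2);      t = b1 / S_b1
    """
    sse = sum_y2 - b0 * sum_y - b1 * sum_xy
    s = math.sqrt(sse / (n - 2))
    x_bar = sum_x / n
    s_b1 = s / math.sqrt(sum_x2 - n * x_bar ** 2)
    return sse, s, s_b1, b1 / s_b1


sse, s, s_b1, t = slope_t_from_sums(
    n=14, sum_x=1727, sum_y=429, sum_x2=218207, sum_xy=55281,
    sum_y2=17079, b0=-25.6944824, b1=0.45670107)
print(f"SSE={sse:.2f}  S={s:.4f}  S_b1={s_b1:.6f}  t={t:.4f}")
```

Because the shortcut formula for SSE subtracts large totals, the coefficients need several decimal places; rounding $b_1$ to 0.457 would shift SSE noticeably.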

Page 95: Bus b272 f unit 1

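Computer Output

Regression Statistics
Multiple R            0.52356487
R Square              0.27412017
Adjusted R Square     0.21363018
Standard Error        15.4246581
Observations          14

ANOVA
              df    SS            MS         F          Significance F
Regression     1    1078.17337    1078.173   4.531662   0.054675
Residual      12    2855.040915   237.9201
Total         13    3933.214286

                     Coefficients   Standard Error   t Stat     P-value
Intercept            -25.6944824    26.78388667      -0.95933   0.356325
Level of pollution   0.45670107     0.214537531      2.12877    0.054675

The P-values shown are for a two-tailed test. For the one-tailed test here ($H_1: \beta_1 > 0$, with t positive), the p-value is half the printed value: 0.054675 / 2 ≈ 0.0273 < 0.05.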
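Excel prints two-tailed p-values, so a one-tailed test needs a conversion. A minimal sketch, relying on the symmetry of the t distribution; the function name and the "greater"/"less" labels are illustrative assumptions.

```python
def one_tailed_p(two_tailed_p, t, alternative):
    """Convert a two-tailed p-value (as printed by Excel) to a one-tailed one.

    alternative is "greater" (H1: beta1 > 0) or "less" (H1: beta1 < 0).
    When t lies in the direction stated by H1, the one-tailed p-value is half
    the two-tailed value; otherwise it is 1 minus that half.
    """
    if alternative not in ("greater", "less"):
        raise ValueError(f"unknown alternative: {alternative!r}")
    half = two_tailed_p / 2
    in_direction = t > 0 if alternative == "greater" else t < 0
    return half if in_direction else 1 - half


p = one_tailed_p(0.054675, 2.12877, "greater")  # Exercise 3 output
print(p, p < 0.05)
```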

Page 96: Bus b272 f unit 1

Exercise 3

Critical value: $t_{0.05,12} = 1.7823$ (a rejection region of area 0.05 in the upper tail)

Decision: Reject $H_0$, since $t = 2.1288 > 1.7823$.

Conclusion: At the 5% level of significance, there is evidence to believe that the level of air pollution and the total number of consultations are positively linearly related.

Page 97: Bus b272 f unit 1

You have seen how we can assess the model fit. If the model fits satisfactorily, we can use it to forecast and estimate values of the dependent variable.

We can obtain a point prediction of Y for a given value of X using the linear regression line.

A confidence interval for the average of Y, or a prediction interval for a particular value of Y, at a given value of X can also be computed if desired.

Estimation of Mean Values

Page 98: Bus b272 f unit 1

Estimation of Mean Values

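Confidence interval estimate for $\mu_{Y|X=X_g}$, the mean of Y given a particular $X_g$:

$$\hat{Y}_i \pm t_{\alpha/2,\,n-2}\, S \sqrt{\frac{1}{n} + \frac{(X_g - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}}$$

• $t_{\alpha/2,\,n-2}$: t value from the table with df = n - 2
• $S$: standard error of the estimate
• The size of the interval varies according to the distance of $X_g$ from the mean $\bar{X}$.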

Page 99: Bus b272 f unit 1

Prediction of Individual Values

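Prediction interval for an individual response $Y_i$ at a particular $X_g$:

$$\hat{Y}_i \pm t_{\alpha/2,\,n-2}\, S \sqrt{1 + \frac{1}{n} + \frac{(X_g - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}}$$

The addition of 1 under the square root makes this interval wider than the confidence interval for the mean of Y.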
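Both interval formulas share everything except the extra 1. A minimal Python sketch computing them side by side; the function name is hypothetical, and the usage reproduces the hand calculation for a 2,000 square-foot store (unrounded slope 1.486633657, S = 611.751517366, t = 2.571, all taken from the store example in this unit).

```python
import math


def interval_estimates(xs, b0, b1, s, t_crit, x_g):
    """Point prediction, confidence interval for the mean of Y, and prediction
    interval for an individual Y at X = x_g.

    t_crit is t_{alpha/2, n-2}; s is the standard error of the estimate.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    ssx = sum((x - x_bar) ** 2 for x in xs)
    h = 1 / n + (x_g - x_bar) ** 2 / ssx    # grows as x_g moves away from Xbar
    y_hat = b0 + b1 * x_g
    ci_half = t_crit * s * math.sqrt(h)      # mean of Y
    pi_half = t_crit * s * math.sqrt(1 + h)  # individual Y: the extra 1
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


square_feet = [1726, 1542, 2816, 5555, 1292, 2208, 1313]
y_hat, ci, pi = interval_estimates(square_feet, b0=1636.415, b1=1.486633657,
                                   s=611.751517366, t_crit=2.571, x_g=2000)
print(round(y_hat, 2), [round(v, 2) for v in ci], [round(v, 2) for v in pi])
```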

Page 100: Bus b272 f unit 1

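Interval Estimates for Different Values of X

[Figure: Y (vertical axis) against X (horizontal axis) around the fitted line $\hat{Y}_i = b_0 + b_1X_i$, showing the confidence interval for the mean of Y given X and the wider prediction interval for an individual $Y_i$; $\bar{X}$ is marked on the X axis.]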

Page 101: Bus b272 f unit 1

Example: Store Sales

Data for seven stores:

Store   Square Feet   Annual Sales ($000)
1       1,726         3,681
2       1,542         3,395
3       2,816         6,653
4       5,555         9,543
5       1,292         3,318
6       2,208         5,563
7       1,313         3,760

Regression model obtained: $\hat{Y}_i = 1636.415 + 1.487X_i$

Predict the annual sales for a store with 2,000 square feet.

Page 102: Bus b272 f unit 1

Estimation of Mean Values: Example

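Find the 95% confidence interval for the average annual sales of a 2,000 square-foot store.

Predicted sales: $\hat{Y}_i = 1636.415 + 1.487X_i = 4609.68$ thousand dollars (computed with the unrounded slope 1.486633657)

$\bar{X} = 2350.29 \qquad t_{n-2} = t_5 = 2.571 \qquad S = 611.751517366$

$$\sum_{i=1}^{n} X_i^2 - n\bar{X}^2 = 52{,}413{,}218 - 7(2350.28571429)^2 = 13{,}746{,}317.4284$$

Confidence interval estimate for $\mu_{Y|X=X_g}$:

$$\hat{Y}_i \pm t_{\alpha/2,\,n-2}\, S \sqrt{\frac{1}{n} + \frac{(X_g - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}} = 4609.68 \pm 612.7579 = (3996.92,\ 5222.44)$$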

Page 103: Bus b272 f unit 1

Prediction Interval for Y : Example

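Find the 95% prediction interval for the annual sales of a 2,000 square-foot store.

Predicted sales: $\hat{Y}_i = 1636.415 + 1.487X_i = 4609.68$ thousand dollars

$\bar{X} = 2350.29 \qquad t_{n-2} = t_5 = 2.571 \qquad S = 611.751517366$

$$\sum_{i=1}^{n} X_i^2 - n\bar{X}^2 = 52{,}413{,}218 - 7(2350.28571429)^2 = 13{,}746{,}317.4284$$

Prediction interval for an individual Y:

$$\hat{Y}_i \pm t_{\alpha/2,\,n-2}\, S \sqrt{1 + \frac{1}{n} + \frac{(X_g - \bar{X})^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}} = 4609.68 \pm 1687.9613 = (2921.72,\ 6297.64)$$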

Page 104: Bus b272 f unit 1

Computer Application

Commands: Tools/ Data Analysis Plus/ Prediction Interval.

Page 105: Bus b272 f unit 1

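Computer Output: Prediction Interval

Annual Sales ($1000)

Predicted value                        4609.682

Prediction Interval
Lower limit                            2921.998
Upper limit                            6297.366

Interval Estimate of Expected Value
Lower limit                            3997.025
Upper limit                            5222.339

The limits differ slightly from the hand calculations because the software uses unrounded values (for example, t = 2.5706 rather than 2.571).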

Page 106: Bus b272 f unit 1

Linear Regression Assumptions

1. Normality: Y values are normally distributed for each X; the probability distribution of the error is normal.

2. Homoscedasticity (Constant Variance)

3. Independence of Errors

Page 107: Bus b272 f unit 1

• Y values are normally distributed around the regression line.

• For each X value, the “spread” or variance around the regression line is the same.

Variation of Errors around the Regression Line

[Figure: error distributions f(e) drawn at $X_1$ and $X_2$, centered on the sample regression line; axes X and Y.]

Page 108: Bus b272 f unit 1

Multiple Regression

Page 109: Bus b272 f unit 1

Introduction

Extension of the simple linear regression model to allow for any fixed number of independent variables. That is, the number of independent variables could be more than one.

Page 110: Bus b272 f unit 1

Use the computer printout to:

• Assess the model: How well does it fit the data? Is it useful? Are any required conditions violated?

• Employ the model: interpret the coefficients, make predictions using the prediction equation, and estimate the expected value of the dependent variable.

Multiple Linear Regression

Page 111: Bus b272 f unit 1

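Model and Required Conditions

Allow for k independent variables to potentially be related to the dependent variable:

$$y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \dots + \beta_kx_k + \varepsilon$$

where y is the dependent variable, $x_1, \dots, x_k$ are the independent variables, $\beta_0, \dots, \beta_k$ are the regression coefficients, and $\varepsilon$ is the random error variable.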

Page 112: Bus b272 f unit 1

Multiple Regression for k = 2, Graphical Demonstration

The simple linear regression model allows for one independent variable, x: $y = \beta_0 + \beta_1x + \varepsilon$

The multiple linear regression model allows for more than one independent variable: $y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \varepsilon$

[Figure: for k = 2, the straight line $\hat{Y} = b_0 + b_1X$ of simple regression is replaced by the surface $y = \beta_0 + \beta_1x_1 + \beta_2x_2$, drawn on axes $x_1$, $x_2$ and y.]

Page 113: Bus b272 f unit 1

Required conditions for the error variable

• The error $\varepsilon$ is normally distributed.
• The mean is equal to zero and the standard deviation is constant ($\sigma_\varepsilon$) for all values of y.
• The errors are independent.

Page 114: Bus b272 f unit 1

Estimating the Coefficients and Assessing the Model

The procedure used to perform multiple regression analysis:

1. Obtain the model coefficients and statistics using statistical software.

2. Assess the model fit using statistics obtained from the sample.

3. If the model assessment indicates a good fit to the data, use the model to interpret the coefficients and generate predictions.
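Step 1 is done by software. Purely as an illustration of what "obtaining the coefficients" means, here is a pure-Python sketch of ordinary least squares via the normal equations $(X'X)b = X'y$. This is the textbook technique, not a description of how Excel or Data Analysis Plus computes its output internally, and the function name and test data are hypothetical.

```python
def fit_multiple_regression(rows, ys):
    """Least-squares estimates [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk.

    Solves the normal equations (X'X) b = X'y by Gaussian elimination with
    partial pivoting. `rows` holds one [x1, ..., xk] list per observation.
    No check is made for a singular X'X (e.g. perfectly collinear predictors).
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # intercept column first
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]: p rows, p + 1 columns.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * ys[i] for i in range(n))]
         for r in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * p
    for r in range(p - 1, -1, -1):
        b[r] = (A[r][p] - sum(A[r][c] * b[c] for c in range(r + 1, p))) / A[r][r]
    return b


# Hypothetical data generated exactly from y = 1 + 2*x1 - 3*x2 (no error term),
# so the fitted coefficients should come back as 1, 2 and -3.
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 4]]
ys = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
print([round(v, 6) for v in fit_multiple_regression(rows, ys)])
```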

Page 115: Bus b272 f unit 1

Example 18.1 Keller: Where to locate a new motor inn?

Estimating the Coefficients and Assessing the Model, Example

La Quinta Motor Inns is planning to build new inns. Management wishes to predict which sites are likely to be profitable. Several areas where predictors of profitability (operating margin) can be identified are:

• Competition
• Market awareness
• Demand generators
• Demographics
• Physical quality

La Quinta defines profitable inns as those with an operating margin in excess of 50% and unprofitable ones as those with margins of less than 30%.

Page 116: Bus b272 f unit 1

Estimating the Coefficients and Assessing the Model, Example

Profitability: Margin (%)

• Competition: Number, the number of hotel/motel rooms within 3 miles of the site
• Market awareness: Nearest, the number of miles to the closest competition
• Customers: Office Space, the office space in the nearby community (in thousands of square feet); Enrollment, the enrollment in a nearby university or college (in thousands)
• Community: Income, the median household income of the nearby area (in $thousands)
• Physical: Distance, the distance to the downtown core (in miles)

Page 117: Bus b272 f unit 1

Estimating the Coefficients and Assessing the Model, Example

Data were collected from 100 randomly selected La Quinta inns, and the following suggested model was estimated:

$$\text{Margin} = \beta_0 + \beta_1\text{Rooms} + \beta_2\text{Nearest} + \beta_3\text{Office} + \beta_4\text{College} + \beta_5\text{Income} + \beta_6\text{Disttwn} + \varepsilon$$

Margin   Number   Nearest   Office Space   Enrollment   Income   Distance
55.5     3203     4.2       549            8            37       2.7
33.8     2810     2.8       496            17.5         35       14.4
49       2890     2.4       254            20           35       2.6

Data file: Xm18-01

Page 118: Bus b272 f unit 1

Regression Analysis, Excel Output

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.7246
R Square              0.5251
Adjusted R Square     0.4944
Standard Error        5.51
Observations          100

ANOVA
              df    SS       MS      F       Significance F
Regression     6    3123.8   520.6   17.14   0.0000
Residual      93    2825.6   30.4
Total         99    5949.5

                Coefficients   Standard Error   t Stat   P-value
Intercept       38.14          6.99             5.45     0.0000
Number          -0.0076        0.0013           -6.07    0.0000
Nearest         1.65           0.63             2.60     0.0108
Office Space    0.020          0.0034           5.80     0.0000
Enrollment      0.21           0.13             1.59     0.1159
Income          0.41           0.14             2.96     0.0039
Distance        -0.23          0.18             -1.26    0.2107

Margin = 38.14 - 0.0076 Number + 1.65 Nearest + 0.020 Office Space + 0.21 Enrollment + 0.41 Income - 0.23 Distance

This is the sample regression equation (sometimes called the prediction equation).

Page 119: Bus b272 f unit 1

Model Assessment

The model is assessed using two tools:

• The coefficient of determination
• The F-test of the analysis of variance

The standard error of estimate (through SSE) is used in building both tools.

Page 120: Bus b272 f unit 1

Standard Error of Estimate

The standard deviation of the error is estimated by the standard error of estimate:

$$s_\varepsilon = \sqrt{\frac{SSE}{n - k - 1}}$$

The magnitude of $s_\varepsilon$ is judged by comparing it to $\bar{y}$.
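A minimal sketch of this formula applied to the La Quinta printout (SSE = 2825.6, n = 100, k = 6), with $\bar{y}$ = 45.739 as computed for this example. With k = 1 it reduces to the $\sqrt{SSE/(n-2)}$ used in simple regression. The function name is hypothetical.

```python
import math


def standard_error_of_estimate(sse, n, k):
    """s_e = sqrt(SSE / (n - k - 1)) for a model with k independent variables."""
    return math.sqrt(sse / (n - k - 1))


s_e = standard_error_of_estimate(sse=2825.6, n=100, k=6)
y_bar = 45.739  # mean operating margin for the La Quinta data
print(round(s_e, 2), round(s_e / y_bar, 3))  # s_e and its size relative to y-bar
```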

Page 121: Bus b272 f unit 1

From the printout, $s_\varepsilon = 5.51$. Calculating the mean value of y, we have $\bar{y} = 45.739$.

It seems $s_\varepsilon$ is not particularly small. Question: can we conclude that the model does not fit the data well?

Standard Error of Estimate

Page 122: Bus b272 f unit 1

Coefficient of Determination

The definition is:

$$r^2 = 1 - \frac{SSE}{\sum_{i=1}^{n}(y_i - \bar{y})^2} = 1 - \frac{SSE}{SS(\text{Total})} = \frac{SSR}{SS(\text{Total})}$$

From the printout, $r^2 = 0.5251$: 52.51% of the variation in operating margin is explained by the six independent variables, and 47.49% remains unexplained.
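The printout value can be recomputed from the ANOVA sums of squares. The sketch also computes the adjusted $r^2$ shown in the printout (0.4944) using the standard formula $1 - \frac{SSE/(n-k-1)}{SS(\text{Total})/(n-1)}$; that formula is not defined on these slides and is stated here as an assumption. Function names are hypothetical.

```python
def r_squared(sse, ss_total):
    """r^2 = 1 - SSE / SS(Total)."""
    return 1 - sse / ss_total


def adjusted_r_squared(sse, ss_total, n, k):
    """Adjusted r^2 = 1 - [SSE / (n - k - 1)] / [SS(Total) / (n - 1)]."""
    return 1 - (sse / (n - k - 1)) / (ss_total / (n - 1))


r2 = r_squared(2825.6, 5949.5)
adj = adjusted_r_squared(2825.6, 5949.5, n=100, k=6)
print(round(r2, 4), round(adj, 4))
```

The adjusted version charges for each added independent variable through the degrees of freedom, so it is always a little below $r^2$ when k ≥ 1.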

Page 123: Bus b272 f unit 1

For testing the validity of the model, the following question is asked: is there at least one independent variable linearly related to the dependent variable?

To answer the question, we test the hypotheses

$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$

$H_1$: at least one $\beta_i$ is not equal to zero.

If at least one $\beta_i$ is not equal to zero, the model has some validity or usefulness.

Testing the Validity of the Model

Page 124: Bus b272 f unit 1

Testing the Validity of the La Quinta Inns Regression Model

The hypotheses are tested by an ANOVA procedure (the Excel output):

ANOVA         df            SS              MS                         F                     Significance F
Regression    k = 6         SSR = 3123.8    MSR = SSR/k = 520.6        F = MSR/MSE = 17.14   0.0000
Residual      n-k-1 = 93    SSE = 2825.6    MSE = SSE/(n-k-1) = 30.4
Total         n-1 = 99      5949.5

Page 125: Bus b272 f unit 1

Testing the Validity of the La Quinta Inns Regression Model

[Total variation in y] SS(Total) = SSR + SSE. A large F results from a large SSR, which implies that much of the variation in y can be explained by the regression model; the model is useful, and thus the null hypothesis should be rejected. Therefore, the rejection region is:

$$F > F_{\alpha,\,k,\,n-k-1}$$

while the test statistic is:

$$F = \frac{SSR/k}{SSE/(n-k-1)}$$
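A minimal sketch of the ANOVA arithmetic for the La Quinta model, using SSR and SSE from the printout and the critical value 2.17 used for this test in the unit. The function name is hypothetical.

```python
def anova_f(ssr, sse, n, k):
    """Return (MSR, MSE, F) with MSR = SSR/k, MSE = SSE/(n-k-1), F = MSR/MSE."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr, mse, msr / mse


msr, mse, f = anova_f(ssr=3123.8, sse=2825.6, n=100, k=6)
f_crit = 2.17  # F_{0.05, 6, 93}, the table value used for the La Quinta test
print(round(msr, 1), round(mse, 1), round(f, 2), f > f_crit)
```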

Page 126: Bus b272 f unit 1

Testing the Validity of the La Quinta Inns Regression Model

ANOVA
              df    SS       MS      F       Significance F
Regression     6    3123.8   520.6   17.14   0.0000
Residual      93    2825.6   30.4
Total         99    5949.5

$F_{\alpha,k,n-k-1} = F_{0.05,\,6,\,100-6-1} = 2.17$, and $F = 17.14 > 2.17$.

Also, the p-value (Significance F) = 0.0000; reject the null hypothesis.

Conclusion: There is sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis. At least one of the bi is not equal to zero. Thus, at least one independent variable is linearly related to y. This linear regression model is valid.

Page 127: Bus b272 f unit 1

b0 = 38.14. This is the intercept, the value of y when all the variables take the value zero. Since the data ranges of the independent variables do not cover the value zero, do not interpret the intercept.

b1 = -0.0076. In this model, for each additional room within 3 miles of the La Quinta inn, the operating margin decreases on average by 0.0076% (assuming the other variables are held constant).

Interpreting the Coefficients

Page 128: Bus b272 f unit 1

b2 = 1.65. In this model, for each additional mile between a La Quinta inn and its nearest competitor, the operating margin increases on average by 1.65% when the other variables are held constant.

b3 = 0.020. For each additional 1,000 sq-ft of office space, the operating margin increases on average by 0.02% when the other variables are held constant.

b4 = 0.21. For each additional thousand students, the operating margin increases on average by 0.21% when the other variables are held constant.

Interpreting the Coefficients

Page 129: Bus b272 f unit 1

b5 = 0.41. For each additional $1,000 in median household income, the operating margin increases on average by 0.41% when the other variables are held constant.

b6 = -0.23. For each additional mile to the downtown center, the operating margin decreases on average by 0.23% when the other variables are held constant.

Interpreting the Coefficients

Page 130: Bus b272 f unit 1

Testing the Coefficients

The hypothesis for each $\beta_i$ is

$H_0: \beta_i = 0$

$H_1: \beta_i \neq 0$

Test statistic:

$$t = \frac{b_i - \beta_i}{s_{b_i}}, \qquad \text{d.f.} = n - k - 1$$

Excel printout:

                Coefficients   Standard Error   t Stat   P-value
Intercept       38.14          6.99             5.45     0.0000
Number          -0.0076        0.0013           -6.07    0.0000
Nearest         1.65           0.63             2.60     0.0108
Office Space    0.020          0.0034           5.80     0.0000
Enrollment      0.21           0.13             1.59     0.1159
Income          0.41           0.14             2.96     0.0039
Distance        -0.23          0.18             -1.26    0.2107

Page 131: Bus b272 f unit 1

Using the Linear Regression Equation

The model can be used for making predictions by:

• Producing a prediction interval estimate for the particular value of y, for a given set of values of $x_i$.
• Producing a confidence interval estimate for the expected value of y, for a given set of values of $x_i$.

The model can be used to learn about relationships between the independent variables $x_i$ and the dependent variable y by interpreting the coefficients $\beta_i$.

Page 132: Bus b272 f unit 1

Xm18-01 La Quinta Inns, Predictions

Predict the average operating margin of an inn at a site with the following characteristics: 3,815 rooms within 3 miles, closest competitor 0.9 miles away, 476,000 sq-ft of office space, 24,500 college students, $35,000 median household income, and 11.2 miles from the downtown center.

Margin = 38.14 - 0.0076(3815) + 1.65(0.9) + 0.020(476) + 0.21(24.5) + 0.41(35) - 0.23(11.2) = 37.1%
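The point prediction is just the sample regression equation evaluated at the site's values, entered in the units of the data (office space and enrollment in thousands, income in $thousands). A minimal sketch; the dictionary layout and function name are illustrative. It also checks the "other variables held constant" reading of a coefficient.

```python
# Sample regression equation from the La Quinta Excel output.
intercept = 38.14
coef = {"Number": -0.0076, "Nearest": 1.65, "Office Space": 0.020,
        "Enrollment": 0.21, "Income": 0.41, "Distance": -0.23}


def predict_margin(site):
    """Point prediction of operating margin (%) for one site."""
    return intercept + sum(coef[name] * site[name] for name in coef)


site = {"Number": 3815, "Nearest": 0.9, "Office Space": 476,
        "Enrollment": 24.5, "Income": 35, "Distance": 11.2}
margin = predict_margin(site)
print(round(margin, 2))

# Holding the other variables fixed, one more mile to the nearest competitor
# changes the prediction by exactly the Nearest coefficient, 1.65.
bumped = dict(site, Nearest=site["Nearest"] + 1)
print(round(predict_margin(bumped) - margin, 6))
```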

Page 133: Bus b272 f unit 1

La Quinta Inns, Predictions

Interval estimates by Excel (Data Analysis Plus):

Prediction Interval

Margin

Predicted value                        37.1

Prediction Interval
Lower limit                            25.4
Upper limit                            48.8

Interval Estimate of Expected Value
Lower limit                            33.0
Upper limit                            41.2

It is predicted, with 95% confidence, that the operating margin will lie between 25.4% and 48.8%. It is estimated that the average operating margin of all sites that fit this category falls between 33% and 41.2%. Both intervals suggest that the given site would not be profitable (margin less than 50%).