mlr output interpretation_ w_o_ dummy

Upload: vaibhav-ahuja

Post on 12-Jan-2016

Page 1: MLR Output Interpretation_ W_O_ Dummy

Developed By Saurabh Bhattacharya for QM-II Sec-A Page 1

Regression Problem

MRP Biscuit Company started its operations in Ambala city, Haryana, in 2001. The company was growing at an annual rate of 20 percent, which was above the industry average. However, for the last three years, growth has been only 5 to 6 percent. This has been a main concern for the top management of the company. Mr. P K Malhotra, the Senior Vice President, Marketing, held a meeting of the senior marketing team and wondered why the company, which had been doing so well, had slowed down in the last few years. During the discussion, one of the senior managers suggested identifying the factors that influence the preference for biscuits. It was argued that once these are known, the company can concentrate on them. The company therefore decided to commission a study from a research agency to identify the various factors that influence the preference for biscuits. A sample of 40 individuals was chosen randomly from Ambala. Data were collected on preservation quality, taste, nutrition value and preference on a 7-point Likert scale, with a higher number indicating a more positive rating.

It was also felt that celebrity endorsement can influence customers' preference for biscuits. The respondents were therefore also asked to choose a celebrity whom they would like to see endorsing biscuits. The three celebrities were Hritik Roshan, Govinda and Saif Ali Khan.

Help MRP Biscuit Co. by analyzing the data and interpreting the output for them.

MRP Biscuit Co. Regression Analysis (Without Dummy)

Objective of the Study: To test whether factors like preservation quality, taste and nutrition influence the preference of customers or individuals for biscuits.

H1: Preservation quality positively affects preference for biscuits.

H2: Taste positively affects preference for biscuits.

H3: Nutrition positively affects preference for biscuits.

Population Regression Function:

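$\text{Preference} = \alpha + \beta_{\text{PreservationQuality}}\,\text{PreservationQuality} + \beta_{\text{Taste}}\,\text{Taste} + \beta_{\text{Nutrition}}\,\text{Nutrition} + \varepsilon$

Output Interpretation (after SAS E.G. has been used to run Multiple Linear Regression)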

Table 1: ANOVA Table

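The F value in the above ANOVA table tests:

H0: $\beta_{\text{PreservationQuality}} = \beta_{\text{Nutrition}} = \beta_{\text{Taste}} = 0$

Ha: at least one of the βs is different from zero.

The significance of the F statistic (at the 5% level, or .05) implies that the null hypothesis is rejected in favour of the alternate hypothesis: at least one independent variable is related to preference, the model as a whole fits better than an intercept-only model, and we can proceed with the regression analysis.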

In Table 1, the Model sum of squares measures how much of the variation in the dependent variable is explained by the independent variables. The Error sum of squares measures the variation in the dependent variable not accounted for by the regression model.

Calculations related to Table 1:

Sample Size = 40

Total Sum of Square d.f. = (n-1) = (40-1) = 39, where n= sample size

Model Sum of Square d.f. = k = 3, where k = number of independent variables.

Error Sum of Square d.f. = n-k-1 = 40-3-1 = 36

Mean Square Model= (Model Sum of Square / model d.f.) = (108.3747/3) = 36.1249

Mean Square Error = (Error Sum of Square / error d.f.) = (17.6003/ 36) = 0.4889

F value = (Mean Square Model / Mean Square Error) = (36.1249 / 0.4889) = 73.89, with (3, 36) d.f.

R square = (Model Sum of Square / Total Sum of Square)= (108.3747/ 125.975) = 0.8603
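The Table 1 arithmetic can be re-derived in a few lines. The following is a minimal sketch that uses only the sums of squares and sample size quoted above; the variable names are illustrative and are not taken from the SAS output.

```python
# Re-derive the ANOVA quantities of Table 1 from the figures quoted in the text.
n = 40                 # sample size
k = 3                  # number of independent variables
ss_model = 108.3747    # Model Sum of Squares
ss_error = 17.6003     # Error Sum of Squares
ss_total = ss_model + ss_error   # Total Sum of Squares (125.975)

# Degrees of freedom
df_model = k
df_error = n - k - 1
df_total = n - 1

# Mean squares, F value and R square
ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_value = ms_model / ms_error
r_square = ss_model / ss_total

print(f"d.f. (model, error, total) = ({df_model}, {df_error}, {df_total})")
print(f"Mean Square Model = {ms_model:.4f}")
print(f"Mean Square Error = {ms_error:.4f}")
print(f"F value           = {f_value:.2f}")
print(f"R square          = {r_square:.4f}")
```

Note that the model and error degrees of freedom always add up to the total degrees of freedom (3 + 36 = 39), which is a quick consistency check on any ANOVA table.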

Table 2: Coefficients Table

In the Parameter Estimates table, the parameter estimates (the unstandardized betas), the standardized beta estimates and their significance values are given. The first objective in this table is to test for multicollinearity. The tolerance values for all the estimates are above .10 and the VIF values are below 5, so multicollinearity is not a concern here. Multicollinearity is said to be present when the VIF value for an estimate is above 10; a VIF value between 5 and 10 indicates moderate collinearity.

Next the beta values are checked for significance. The betas for nutrition and preservation quality are significant (at the 5% level of significance), whereas the beta for taste is not. Also, the beta coefficients of nutrition and preservation quality are in the positive direction. Thus hypotheses H1 and H3 are supported, and H2 is not supported. The intercept term is also significant (at the 5% level of significance). [Usually it is observed that t values in excess of 2.20 in absolute terms indicate that a variable is significant; with 36 error d.f., the exact two-tailed 5% critical value is about 2.03.]
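The multicollinearity screening described above can be sketched as a small helper. It relies on the standard identity tolerance = 1 / VIF and on the cut-offs stated in the text (VIF above 10, and 5 to 10 for moderate collinearity). The VIF values in the usage lines are hypothetical, since Table 2 itself is not reproduced here.

```python
def tolerance_from_vif(vif: float) -> float:
    """Tolerance is the reciprocal of the variance inflation factor."""
    return 1.0 / vif


def collinearity_level(vif: float) -> str:
    """Classify a VIF value using the cut-offs quoted in the text."""
    if vif > 10:
        return "severe"
    if vif >= 5:
        return "moderate"
    return "low"


# Hypothetical VIF values, for illustration only (not taken from Table 2).
hypothetical_vifs = {"preservation_quality": 2.1, "taste": 2.4, "nutrition": 1.3}

for name, vif in hypothetical_vifs.items():
    print(f"{name}: VIF = {vif}, tolerance = {tolerance_from_vif(vif):.3f}, "
          f"collinearity = {collinearity_level(vif)}")
```

Because tolerance is just 1 / VIF, the rule "tolerance above .10" and the rule "VIF below 10" are the same test expressed in two ways.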

Based on the above findings the estimated regression can be written (using standardized beta) as:

$\widehat{\text{Preference}} = .52234\,\text{PreservationQuality} + .28394\,\text{Nutrition}$

Because the estimated regression equation uses standardized betas, its coefficients are expressed in standard deviation units. Holding the other variables constant, a one standard deviation increase in preservation quality is associated, on average, with an increase of .52234 standard deviations in preference for biscuits, and a one standard deviation increase in nutrition is associated, on average, with an increase of .28394 standard deviations in preference. The absolute value of the preservation quality estimate is larger than that of nutrition, and hence preservation quality is the major factor influencing preference. Thus MRP Biscuit Company should focus on the preservation quality factor.

The standard error of a beta coefficient indicates how much the point estimate is likely to vary from the corresponding population parameter. It measures the amount of sampling error. The lower the standard error relative to the beta value, the larger the t statistic and the higher the chances of the beta being statistically significant.

Calculations related to Table 2:

$t = \dfrac{\text{Parameter estimate}}{\text{S.E.}}$

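Thus, the t statistic for nutrition is: t = (.29466 / .10285) = 2.86

The 95% confidence interval for nutrition is $\text{Parameter estimate} \pm t_{\alpha/2,\,n-k-1} \times \text{S.E.}$ = .29466 ± 2.028 (.10285) = (.086, .503), where 2.028 is the two-tailed 5% critical t value with 36 d.f. The normal approximation with 1.96 would give the slightly narrower interval (.093, .496).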
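The t statistic and the confidence interval for nutrition can be checked numerically. This is a minimal sketch using the estimate and standard error quoted above. The critical value 2.028 is an assumption taken from a t table for 36 d.f. (the Python standard library has no t quantile function), and the helper name `confidence_interval` is illustrative.

```python
estimate = 0.29466    # unstandardized beta for nutrition
std_error = 0.10285   # its standard error

# t statistic: parameter estimate divided by its standard error
t_stat = estimate / std_error


def confidence_interval(estimate, std_error, critical_value):
    """Return (lower, upper) = estimate -/+ critical_value * std_error."""
    margin = critical_value * std_error
    return estimate - margin, estimate + margin


T_CRIT_36_DF = 2.028   # assumption: two-tailed 5% critical t value, 36 d.f.
Z_CRIT = 1.96          # two-tailed 5% critical value of the standard normal

ci_t = confidence_interval(estimate, std_error, T_CRIT_36_DF)
ci_z = confidence_interval(estimate, std_error, Z_CRIT)

print(f"t statistic        = {t_stat:.2f}")
print(f"95% CI (t, 36 df)  = ({ci_t[0]:.3f}, {ci_t[1]:.3f})")
print(f"95% CI (z approx.) = ({ci_z[0]:.3f}, {ci_z[1]:.3f})")
```

The interval (.086, .503) is reproduced by the t critical value rather than by 1.96, which suggests the software computed it from the t distribution with the error degrees of freedom. Either way the interval excludes zero, in line with the significant t statistic.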

Table 3: Model Summary

The above is a model summary table. The R square (coefficient of determination) indicates that 86.03% of the variance in the dependent variable, preference, is explained by the independent variables. The adjusted R square takes into account the sample size and the number of independent variables in the regression model. Its value is usually lower than the R square value because it penalizes additional variables relative to the sample size. The adjusted R square is 84.86%.
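The adjusted R square can be reproduced from the figures already given. The sketch below assumes the usual adjustment formula, 1 - (1 - R²)(n - 1)/(n - k - 1), which matches the reported 84.86%; the variable names are illustrative.

```python
n = 40                 # sample size
k = 3                  # number of independent variables
ss_model = 108.3747    # Model Sum of Squares (Table 1)
ss_total = 125.975     # Total Sum of Squares (Table 1)

r_square = ss_model / ss_total

# Adjusted R square penalizes R square for the number of predictors
# relative to the sample size.
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)

print(f"R square          = {r_square:.4f}")
print(f"Adjusted R square = {adj_r_square:.4f}")
```

With only three predictors and 40 observations the penalty is small, which is why the two values differ by little more than one percentage point.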

The Root Mean Square Error (RMSE) measures the difference between the values predicted by a model and the values actually observed or collected from respondents or firms. It is mainly used when different regression models or equations are compared: a regression model with a lower RMSE is preferred to a regression model with a greater RMSE. We don’t use it here as only one regression model or equation is being tested.

The coefficient of variation is (RMSE / Dependent Mean)*100 = (0.69921 / 4.725)*100 = 14.798%. As a rule of thumb, models with Coeff Var values below 10% lead to accurate prediction. In the above regression model the Coefficient of Variation is more than 10%, which may be attributable to the limited sample size used.
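The RMSE and the coefficient of variation follow directly from Table 1 and the dependent mean. This is a minimal sketch, assuming that the RMSE is the square root of the Mean Square Error, which reproduces the 0.69921 quoted above.

```python
import math

ss_error = 17.6003       # Error Sum of Squares (Table 1)
df_error = 36            # error degrees of freedom (n - k - 1)
dependent_mean = 4.725   # mean of the dependent variable, preference

# RMSE is the square root of the Mean Square Error.
rmse = math.sqrt(ss_error / df_error)

# Coefficient of variation: RMSE as a percentage of the dependent mean.
coeff_var = rmse / dependent_mean * 100

print(f"RMSE      = {rmse:.5f}")
print(f"Coeff Var = {coeff_var:.3f}%")
```

Expressing the RMSE relative to the dependent mean makes it unit-free, so it can be compared against the 10% rule of thumb.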

Figure 1: To Test Normality Assumption of Regression

The above figure is used to check the normality assumption of regression. The bell curve in the figure indicates that the residuals are almost normally distributed. However, the kernel curve indicates that the residuals are slightly positively skewed.
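A visual reading of skewness can be backed by a number. The sketch below computes the moment-based sample skewness, where a value above zero means a longer right tail (positive skew). The residuals are hypothetical and only illustrate the calculation; the actual residuals from the study are not reproduced in this document.

```python
def sample_skewness(values):
    """Moment-based skewness: third central moment / (second central moment)^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical residuals with one large positive value (illustration only).
hypothetical_residuals = [-0.9, -0.6, -0.4, -0.2, 0.0, 0.1, 0.3, 1.7]

skew = sample_skewness(hypothetical_residuals)
print(f"skewness = {skew:.3f}")
print("positively skewed" if skew > 0 else "not positively skewed")
```

A skewness close to zero is consistent with the symmetric bell curve, while a clearly positive value would agree with what the kernel curve suggests.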

Figure 2: To Test Homoscedasticity Assumption of Regression

The above figure points towards the presence of heteroscedasticity, as most of the residuals are not clustered evenly around the horizontal ‘0’ line. The assumption of homoscedasticity is therefore violated.

Figure 3: To Test for Linearity

The above figure indicates that the residuals do not lie exactly on the straight line. Thus the linearity assumption is not exactly satisfied.