stat hw-4

7/31/2019 STAT HW-4

Question: 27.25

(a) WEIGHT (GRAMS) = -579 + 14.3 * LENGTH (CM) + 113 * WIDTH (CM)

(b) The model explains 93.7% of the variation in perch weight.

(c) The p-value of the model is approximately zero, so the ANOVA table indicates that at least one of the explanatory variables is helpful in predicting the weight of the perch.

(d) The p-values for β1 and β2 are 0.014 and approximately zero, with t-values of 2.53 and 3.75 respectively. So the data provide evidence that both β1 and β2 are significantly different from zero.

(e) WEIGHT (GRAMS) = 114 - 3.48 * LENGTH (CM) - 94.6 * WIDTH (CM) + 5.24 * INTERACTION

(f) The model explains 98.5% of the variation in perch weight.

(g) The p-value of the model is approximately zero, so the ANOVA table indicates that at least one of the explanatory variables is helpful in predicting the weight of the perch.

(h) The t-statistic for β1 changed to -1.10 and its p-value became 0.274, which is greater than the α-level of 0.05, so the data are unable to provide evidence that β1 is significantly different from zero.
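The two fitted equations in parts (a) and (e) can be compared with a small sketch. It assumes that INTERACTION means LENGTH * WIDTH, which the write-up does not state, and the perch measurements used are hypothetical values chosen only for illustration.

```python
def weight_main_effects(length_cm, width_cm):
    """Predicted weight (g) from the model in part (a)."""
    return -579 + 14.3 * length_cm + 113 * width_cm


def weight_with_interaction(length_cm, width_cm):
    """Predicted weight (g) from the model in part (e).

    Assumption: INTERACTION = LENGTH * WIDTH (not stated in the write-up).
    """
    interaction = length_cm * width_cm
    return 114 - 3.48 * length_cm - 94.6 * width_cm + 5.24 * interaction


# Hypothetical perch, for illustration only.
length_cm, width_cm = 30.0, 5.0
print(weight_main_effects(length_cm, width_cm))
print(weight_with_interaction(length_cm, width_cm))
```

Under that assumption, the effect of one extra centimetre of length in the second model is -3.48 + 5.24 * WIDTH, so the negative LENGTH coefficient cannot be read on its own once the interaction term is present.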

Question: HW 4

1. Dependent variable: ManHours

Independent variables: OCCUP, CHECKIN, HOURS, COMMON, WINGS, CAP, ROOMS

Step-1

Root MSE of model = 455.167

R-Sq = 96.1% and R-Sq(adj) = 94.5%

The model p-value is approximately zero, so at least one of the variables contributes to the model. The individual p-values of OCCUP (0.129), HOURS (0.722) and WINGS (0.708) are greater than the alpha level of 0.05, so we run another regression without HOURS, which has the highest p-value of those listed here.

Step-2

Dependent variable: ManHours

Independent variables: OCCUP, CHECKIN, COMMON, WINGS, CAP, ROOMS

Root MSE of model = 444.049

R-Sq = 96.1% and R-Sq(adj) = 94.8%

Root MSE decreased slightly and R-Sq(adj) increased slightly compared with Step-1.

The model p-value is still near zero, so at least one of the variables contributes to the model. The individual p-values of OCCUP (0.122) and WINGS (0.696) are greater than the alpha level of 0.05, so these variables are not significant. We run another regression without WINGS, which has the highest p-value.
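Steps 1 and 2 report the same R-Sq (96.1%) but a higher R-Sq(adj) after HOURS is dropped. The standard adjusted R-squared formula shows why: the penalty shrinks when a predictor is removed without losing fit. The sketch below assumes a sample size of n = 25, which is not given in the write-up and is used only because it reproduces the reported values.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


N_ASSUMED = 25  # assumption: the sample size is not stated in the write-up

# Same R-Sq, seven predictors (Step-1) versus six predictors (Step-2).
for k in (7, 6):
    print(k, round(adjusted_r2(0.961, N_ASSUMED, k), 3))
```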

Step-3

Dependent variable: ManHours

Independent variables: OCCUP, CHECKIN, COMMON, CAP, ROOMS

Root MSE of model = 434.089 R-Sq = 96.1% R-Sq(adj) = 95.0%

Root MSE decreased further compared with the previous run, and R-Sq(adj) increased slightly.

The model p-value is still near zero, so at least one of the variables contributes to the model. The individual p-value of OCCUP is 0.096, which is greater than the alpha level of 0.05, so OCCUP is not significant. We run another regression without OCCUP.

Step-4

Dependent variable: ManHours

Independent variables: CHECKIN, COMMON, CAP, ROOMS


Root MSE increased and R-Sq(adj) decreased slightly compared with the previous run.

The model p-value is still near zero, so at least one of the variables contributes to the model. The individual p-value of COMMON is 0.098, which is greater than the alpha level of 0.05, so COMMON is not significant. We run another regression without COMMON.

Step-5

Dependent variable: ManHours

Independent variables: CHECKIN, CAP, ROOMS


Root MSE increased further and R-Sq(adj) decreased slightly compared with the previous run.

The model p-value is still near zero, so at least one of the variables contributes to the model. The individual p-values of all the independent variables are less than the 0.05 alpha level, so all variables remaining in this run are significant and contribute to the model.
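The rule applied in Steps 1 to 5 (drop the variable with the highest p-value above 0.05, then refit) can be sketched as follows. The function name is hypothetical, and the p-values are the ones reported in each step rather than the output of a refit, so this only illustrates the decision rule.

```python
def variable_to_drop(p_values, alpha=0.05):
    """Return the predictor with the largest p-value above alpha, or None."""
    not_significant = {name: p for name, p in p_values.items() if p > alpha}
    if not not_significant:
        return None
    return max(not_significant, key=not_significant.get)


# Non-significant p-values reported at each step of the write-up.
reported = [
    {"OCCUP": 0.129, "HOURS": 0.722, "WINGS": 0.708},  # Step-1
    {"OCCUP": 0.122, "WINGS": 0.696},                  # Step-2
    {"OCCUP": 0.096},                                  # Step-3
    {"COMMON": 0.098},                                 # Step-4
    {},                                                # Step-5: all significant
]
dropped = [variable_to_drop(step) for step in reported]
print(dropped)
```

In a full analysis the regression would be refit after each removal, because the remaining p-values change once a variable is dropped.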

[Figure: Histogram of residuals (response is ManHours); x-axis: Residual, y-axis: Frequency]

[Figure: Residuals versus fits (response is ManHours); x-axis: Fitted Value, y-axis: Residual]

Final model: ManHours = 118 + 1.93 CHECKIN - 11.0 CAP + 22.7 ROOMS

The histogram of residuals looks approximately normal, and the scatter plot of residuals versus fitted values shows no pattern and looks random. So we can choose this as the final model for predicting the man-hour requirement for BOQ for the US Navy.

2. Best subset model

The two best one-variable models use CHECKIN and ROOMS. Both have a Cp far above the target of 2 for a single variable and a very high Se, so we should look for models with more independent variables. The two best two-variable models are in a similar situation. Among the two best three-variable models, Cp is still far from 4, although the model with CHECKIN, CAP and ROOMS (Cp = 6.1, Se = 477.38, R-Sq = 94.7%, R-Sq(adj) = 94.0%) is a potential model to choose.

Of the two best four-variable models, the one with CHECKIN, COMMON, CAP and ROOMS has Cp = 5.1, which is close to the target of 5 and very good. Its R-Sq and R-Sq(adj) are 95.4% and 94.5% respectively, which compares well with the other four-variable model, whose Cp is 13.4. So this model is a candidate for selection, provided no better model turns up among the larger ones.

The five-variable models have Cp values of 4.3 and 6.4, neither as close to the target of 6 as the four-variable model's Cp is to 5, and their Se values are higher than that of the four-variable model. Se decreases for the two six-variable models, but both have Cp = 6.1, which is not close enough to 7, and R-Sq and R-Sq(adj) increase only slightly.

The model with all seven independent variables has Cp = 8, which equals the target value of 8. Its Se is 455.17, very close to that of the four-variable model, with only a small increase in R-Sq and R-Sq(adj). Overall, none of these measures changes much.
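The Cp comparison above amounts to checking how far each model's Cp is from its target, the number of predictors plus one. A minimal sketch using the Cp values quoted in this answer is shown below. The labels for models that the write-up does not name are placeholders, and the full seven-variable model is left out because its Cp matches the target by construction.

```python
# (number of predictors, Cp, label) as quoted in the best-subsets discussion.
candidates = [
    (3, 6.1, "CHECKIN, CAP, ROOMS"),
    (4, 5.1, "CHECKIN, COMMON, CAP, ROOMS"),
    (4, 13.4, "other four-variable model"),
    (5, 4.3, "five-variable model A"),   # placeholder label
    (5, 6.4, "five-variable model B"),   # placeholder label
    (6, 6.1, "six-variable models"),
]


def cp_gap(k, cp):
    """Distance between Cp and its target value k + 1."""
    return abs(cp - (k + 1))


ranked = sorted(candidates, key=lambda c: cp_gap(c[0], c[1]))
best = ranked[0]
for k, cp, label in ranked:
    print(f"{label}: Cp = {cp}, target = {k + 1}, gap = {cp_gap(k, cp):.1f}")
```

Cp closeness is only one criterion. The discussion also weighs Se, R-Sq and R-Sq(adj), which this sketch ignores.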

Looking at all the models, I would choose the four-variable model with CHECKIN, COMMON, CAP and ROOMS, which has a Cp (5.1) close to its target, good R-Sq (95.4%) and R-Sq(adj) (94.5%), and a lower Se (455.91). These values are slightly better than those of the model chosen in question 1. Still, I feel I should go with the model chosen in question 1.

So the best-subsets model is ManHours = 199 + 1.72 CHECKIN - 17.0 COMMON - 13.1 CAP + 27.0 ROOMS
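To see how far the question 1 model and the best-subsets model can differ in practice, both fitted equations can be evaluated on the same inputs. The input values below are arbitrary placeholders, not data from the assignment, and the function names are hypothetical.

```python
def manhours_q1(checkin, cap, rooms):
    """Final model from question 1 (stepwise elimination)."""
    return 118 + 1.93 * checkin - 11.0 * cap + 22.7 * rooms


def manhours_q2(checkin, common, cap, rooms):
    """Four-variable best-subsets model from question 2."""
    return 199 + 1.72 * checkin - 17.0 * common - 13.1 * cap + 27.0 * rooms


# Arbitrary illustrative inputs (not taken from the data set).
checkin, common, cap, rooms = 1000.0, 10.0, 100.0, 50.0
print(manhours_q1(checkin, cap, rooms))
print(manhours_q2(checkin, common, cap, rooms))
```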

[Figure: Histogram of residuals (response is ManHours); x-axis: Residual, y-axis: Frequency]

[Figure: Residuals versus fits (response is ManHours); x-axis: Fitted Value, y-axis: Residual]

The histogram of residuals looks approximately normal and the residuals-versus-fits scatter plot looks random.

3. Collinearity is a problem in choosing a regression model when there is a strong correlation between independent variables. Correlated independent variables do not improve the model; they only inflate the standard errors of the estimated parameters. One of a pair of strongly correlated independent variables is redundant in the model.
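One common way to quantify this is the variance inflation factor (VIF). For a model with two predictors it is 1 / (1 - r^2), where r is the correlation between them. The sketch below uses made-up predictor values only to show how a strong correlation inflates the VIF; nothing here comes from the assignment data.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(r):
    """VIF of either predictor when the model has exactly two predictors."""
    return 1 / (1 - r ** 2)


# Hypothetical, nearly collinear predictors (x2 is roughly 2 * x1).
x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x1, x2)
vif = vif_two_predictors(r)
print(round(r, 4), round(vif, 1))
```

A VIF well above 10 is a widely used rule of thumb for problematic collinearity, though the cutoff is a convention rather than a strict test.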
