stat hw-4
7/31/2019 STAT HW-4
Question: 27.25
(a) WEIGHT (GRAMS) = -579 + 14.3 * LENGTH (CM) + 113 * WIDTH (CM)
(b) The model explains 93.7% of the variation in perch weight.
(c) The p-value of the model is approximately zero, so the ANOVA table indicates that at least one of the explanatory variables is helpful in predicting the weight of the perch.
(d) The p-values for β1 and β2 are 0.014 and approximately zero, with t-values of 2.53 and 3.75 respectively. So the data provide evidence that both β1 and β2 are significantly different from zero.
(e) WEIGHT (GRAMS) = 114 - 3.48 * LENGTH (CM) - 94.6 * WIDTH (CM) + 5.24 * INTERACTION
(f) The model explains 98.5% of the variation in perch weight.
(g) The p-value of the model is approximately zero, so the ANOVA table indicates that at least one of the explanatory variables is helpful in predicting the weight of the perch.
(h) The t-statistic for β1 changes to -1.10 and its p-value becomes 0.274. This is greater than the α-level of 0.05, so the data do not provide evidence that β1 is significantly different from zero.
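The two fitted equations can be compared directly in code. This is a minimal sketch. It assumes INTERACTION is the product LENGTH * WIDTH, because the output above does not define it. The 30 cm by 5 cm fish in the usage lines is hypothetical. The last function shows why the LENGTH coefficient in (e) cannot be read on its own: with an interaction term, the effect of one extra centimeter of length depends on width.

```python
# Fitted equations from parts (a) and (e); coefficients are copied from the output above.
# Assumption: INTERACTION = LENGTH * WIDTH (not stated explicitly in the output).


def weight_main_effects(length_cm: float, width_cm: float) -> float:
    """Part (a): predicted weight in grams, no interaction term."""
    return -579 + 14.3 * length_cm + 113 * width_cm


def weight_interaction(length_cm: float, width_cm: float) -> float:
    """Part (e): predicted weight in grams, assuming INTERACTION = LENGTH * WIDTH."""
    interaction = length_cm * width_cm
    return 114 - 3.48 * length_cm - 94.6 * width_cm + 5.24 * interaction


def length_slope_interaction(width_cm: float) -> float:
    """Change in predicted weight (g) per extra cm of length at a fixed width.

    In the interaction model this is -3.48 + 5.24 * WIDTH, so it is not constant.
    """
    return -3.48 + 5.24 * width_cm


if __name__ == "__main__":
    # Hypothetical fish: 30 cm long, 5 cm wide (not taken from the data set).
    print(round(weight_main_effects(30, 5), 2))
    print(round(weight_interaction(30, 5), 2))
    for w in (3, 5, 7):
        print(w, round(length_slope_interaction(w), 2))
```

For this made-up fish, model (a) predicts 415 g and model (e) about 322.6 g. In model (e), each extra centimeter of length adds roughly 12.2, 22.7 and 33.2 g at widths of 3, 5 and 7 cm.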
Question: HW 4
1. Dependent variable: ManHours
Independent variables: OCCUP, CHECKIN, HOURS, COMMON, WINGS, CAP, ROOMS
Step-1
Root MSE of model = 455.167
R-Sq=96.1% and R-Sq(adj)=94.5%
The model p-value is approximately zero, so at least one of the variables contributes to the model. The individual p-values of OCCUP (0.129), HOURS (0.722) and WINGS (0.708) are greater than the 0.05 alpha level. We therefore run another regression without HOURS, which has the highest p-value of the three.
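The elimination rule applied at each step can be written as a small function. This sketch covers the decision only and does not refit the regression. `next_to_drop` is a hypothetical helper name. The p-values passed in are only those reported above 0.05 in each step; the remaining predictors were reported as significant.

```python
# A minimal sketch of the backward-elimination decision used in these steps:
# remove the predictor with the largest p-value, but only if it exceeds alpha.
from typing import Mapping, Optional


def next_to_drop(p_values: Mapping[str, float], alpha: float = 0.05) -> Optional[str]:
    """Return the predictor to remove next, or None if no p-value exceeds alpha."""
    if not p_values:
        return None
    worst = max(p_values, key=lambda name: p_values[name])
    return worst if p_values[worst] > alpha else None


if __name__ == "__main__":
    # Only the p-values reported above 0.05 in each step are listed here.
    steps = [
        {"OCCUP": 0.129, "HOURS": 0.722, "WINGS": 0.708},  # Step-1
        {"OCCUP": 0.122, "WINGS": 0.696},                   # Step-2
        {"OCCUP": 0.096},                                   # Step-3
        {"COMMON": 0.098},                                  # Step-4
    ]
    for i, pv in enumerate(steps, start=1):
        print(f"Step-{i}: drop {next_to_drop(pv)}")
```

In Step-5 every remaining p-value is below 0.05, so the function would return None and elimination stops.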
Step-2
Dependent variable: ManHours
Independent variables: OCCUP, CHECKIN, COMMON, WINGS, CAP, ROOMS
Root MSE of model = 444.049 R-Sq = 96.1% R-Sq(adj) = 94.8%
Root MSE decreased and R-Sq(adj) increased, both only slightly.
The model p-value is still near zero, so at least one of the variables contributes to the model. The individual p-values of OCCUP (0.122) and WINGS (0.696) are greater than the 0.05 alpha level, so these variables are not significant. We run another regression without WINGS, which has the highest p-value.
Step-3
Dependent variable: ManHours
Independent variables: OCCUP, CHECKIN, COMMON, CAP, ROOMS
Root MSE of model = 434.089 R-Sq = 96.1% R-Sq(adj) = 95.0%
Root MSE decreased further compared with the previous run, and R-Sq(adj) increased slightly.
The model p-value is still near zero, so at least one of the variables contributes to the model. The individual p-value of OCCUP is 0.096, which is greater than the 0.05 alpha level, so OCCUP is not significant. We run another regression without OCCUP.
Step-4
Dependent variable: ManHours
Independent variables: CHECKIN, COMMON, CAP, ROOMS
Root MSE of model = 455.909 R-Sq = 95.4% R-Sq(adj) = 94.5%
Root MSE increased and R-Sq(adj) decreased slightly compared with the previous run.
The model p-value is still near zero, so at least one of the variables contributes to the model. The individual p-value of COMMON is 0.098, which is greater than the 0.05 alpha level, so COMMON is not significant. We run another regression without COMMON.
Step-5
Dependent variable: ManHours
Independent variables: CHECKIN, CAP, ROOMS
Root MSE of model = 477.343 R-Sq = 94.7% R-Sq(adj) = 94.0%
Root MSE increased further and R-Sq(adj) decreased slightly compared with the previous run. The model p-value is still near zero, so at least one of the variables contributes to the model. The individual p-values of all remaining independent variables are below the 0.05 alpha level, so every variable in this run is significant and contributes to the model.
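Root MSE fell in Steps 2 and 3 and rose in Steps 4 and 5. That pattern follows from a standard least-squares identity. Removing one predictor whose t-statistic is t raises SSE by t² · MSE. The new MSE is therefore MSE · (df + t²)/(df + 1), where df is the residual df of the larger model. Root MSE falls exactly when |t| < 1. The sketch below inverts that identity to recover |t| for each dropped variable from the reported Root MSE values. It assumes n = 25 observations, which is not stated in the output above.

```python
import math

# Assumption: n = 25 observations (not stated in the output), so residual df = n - k - 1.
N = 25


def dropped_abs_t(s_before: float, s_after: float, k_before: int, n: int = N) -> float:
    """Recover |t| of the single predictor removed between two least-squares fits.

    Uses MSE_after = MSE_before * (df + t^2) / (df + 1),
    where df = n - k_before - 1 is the residual df of the larger model.
    """
    df = n - k_before - 1
    t_sq = (df + 1) * (s_after / s_before) ** 2 - df
    return math.sqrt(t_sq)


# (dropped predictor, Root MSE before, Root MSE after, number of predictors before)
STEPS = [
    ("HOURS", 455.167, 444.049, 7),
    ("WINGS", 444.049, 434.089, 6),
    ("OCCUP", 434.089, 455.909, 5),
    ("COMMON", 455.909, 477.343, 4),
]

if __name__ == "__main__":
    for name, s0, s1, k in STEPS:
        t = dropped_abs_t(s0, s1, k)
        trend = "decreases" if s1 < s0 else "increases"
        print(f"drop {name}: |t| = {t:.2f}, Root MSE {trend}")
```

The recovered values come out at about 0.36 for HOURS and 0.40 for WINGS, both below 1. OCCUP and COMMON give about 1.75 and 1.74, both above 1. This matches the large and borderline p-values reported for these variables.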
[Figure: Histogram of residuals (response is ManHours); x-axis Residual, y-axis Frequency.]
[Figure: Residuals versus fits (response is ManHours); x-axis Fitted Value, y-axis Residual.]
Final model is ManHours = 118 + 1.93 CHECKIN - 11.0 CAP + 22.7 ROOMS
The histogram of residuals looks normal, and the scatter plot of residuals versus fits shows no pattern and looks random. So we can choose this model as the final model to predict the man-hour requirement for BOQ for the US Navy.
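R-Sq(adj) penalizes each additional predictor. That is why it rose in Steps 2 and 3 even though R-Sq stayed at 96.1%. The sketch recomputes it from the reported R-Sq using R-Sq(adj) = 1 - (1 - R-Sq)(n - 1)/(n - k - 1), where k is the number of predictors. The sample size is not given in this write-up. n = 25 is an assumption, chosen because it reproduces all five reported R-Sq(adj) values to within rounding.

```python
# Recompute R-Sq(adj) from the reported R-Sq at each elimination step.
# Assumption: n = 25 observations (inferred, not stated in the output above).


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for a model with k predictors plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


N = 25  # assumed sample size

# (step, number of predictors k, reported R-Sq, reported R-Sq(adj))
REPORTED = [
    (1, 7, 0.961, 0.945),
    (2, 6, 0.961, 0.948),
    (3, 5, 0.961, 0.950),
    (4, 4, 0.954, 0.945),
    (5, 3, 0.947, 0.940),
]

if __name__ == "__main__":
    for step, k, r2, r2_adj in REPORTED:
        computed = adjusted_r2(r2, N, k)
        print(f"Step-{step}: k={k} computed={computed:.4f} reported={r2_adj:.3f}")
```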
2. Best subset model
The two best models with one independent variable (CHECKIN and ROOMS, respectively) have very high Cp values, far from the target value of 2 for a single variable. They also have very high Se, so we should look for models with more independent variables. The two best models with two independent variables behave similarly to the one-variable models. When the two best models with three variables are compared, their Cp values are still far from 4, with one exception. The model with independent variables CHECKIN, CAP and ROOMS has Cp = 6.1, Se = 477.38, and R-Sq and R-Sq(adj) of 94.7% and 94.0% respectively, so it can be a potential model to choose.
When the two best models with four variables are analyzed, the model with independent variables CHECKIN, COMMON, CAP and ROOMS has Cp = 5.1, which is very close to 5. Its R-Sq and R-Sq(adj) are 95.4% and 94.5% respectively. These look good compared with the other four-variable model, whose Cp value is 13.4. So this model is a candidate for selection, after checking whether any other model is better.
The models with five independent variables have Cp values of 4.3 and 6.4, which are not close to 6, and their Se values are higher than that of the previously chosen model. The two models with six independent variables have lower Se values. However, both have Cp = 6.1, which is not close enough to 7, and their R-Sq and R-Sq(adj) values increase only slightly.
The model with all seven independent variables has Cp = 8, which equals its number of parameters p = 8. The full model's Cp always equals p by construction, so this is not evidence in its favor. Its Se value is 455.17, very close to that of the chosen four-variable model, with a small increase in R-Sq and R-Sq(adj). Overall, none of these parameters changes much.
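Mallows' Cp compares a subset model's error with the full model's error. It is defined as Cp = SSE_p/MSE_full - (n - 2p), where p counts the intercept, so a good subset model has Cp near p. The sketch recomputes Cp from the S values quoted above, using SSE_p = S²(n - p). It assumes n = 25 and takes MSE_full from the Step-1 Root MSE of 455.167. Neither choice comes from the best-subsets output itself.

```python
# Mallows' Cp recomputed from reported S (Root MSE) values.
# Assumptions: n = 25 observations; full seven-predictor model has S = 455.167 (Step-1).
N = 25
S_FULL = 455.167


def mallows_cp(s_subset: float, p: int, n: int = N, s_full: float = S_FULL) -> float:
    """Cp = SSE_p / MSE_full - (n - 2p), where p counts the intercept."""
    sse_p = s_subset ** 2 * (n - p)  # because S^2 = SSE / (n - p)
    mse_full = s_full ** 2
    return sse_p / mse_full - (n - 2 * p)


if __name__ == "__main__":
    models = [
        ("CHECKIN, CAP, ROOMS", 477.38, 4),
        ("CHECKIN, COMMON, CAP, ROOMS", 455.91, 5),
        ("all seven predictors", 455.167, 8),
    ]
    for name, s, p in models:
        print(f"{name}: Cp = {mallows_cp(s, p):.1f} (p = {p})")
```

These inputs give Cp of about 6.1, 5.1 and 8.0, matching the best-subsets values above. Plugging in the Step-3 Root MSE (434.089, p = 6) gives about 4.3, the same as the smaller of the two five-variable Cp values.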
After looking at all the models, I choose the model with four independent variables: CHECKIN, COMMON, CAP and ROOMS. It has a Cp value close to p (5.1), good R-Sq (95.4%) and R-Sq(adj) (94.5%), and a lower Se (455.91). These values are slightly better than those of the model chosen in question 1. Still, I feel I should go with the model chosen in question 1.
So the model is ManHours = 199 + 1.72 CHECKIN - 17.0 COMMON - 13.1 CAP + 27.0 ROOMS
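To see how the two candidate equations are used, the sketch evaluates both on one BOQ site. The input values are hypothetical and not taken from the Navy data. The function names are illustrative only, and the coefficients are copied from the two final models above.

```python
# Evaluate the question-1 and question-2 equations on one hypothetical BOQ site.


def manhours_q1(checkin: float, cap: float, rooms: float) -> float:
    """Question 1 (backward elimination): ManHours = 118 + 1.93 CHECKIN - 11.0 CAP + 22.7 ROOMS."""
    return 118 + 1.93 * checkin - 11.0 * cap + 22.7 * rooms


def manhours_q2(checkin: float, common: float, cap: float, rooms: float) -> float:
    """Question 2 (best subsets): ManHours = 199 + 1.72 CHECKIN - 17.0 COMMON - 13.1 CAP + 27.0 ROOMS."""
    return 199 + 1.72 * checkin - 17.0 * common - 13.1 * cap + 27.0 * rooms


if __name__ == "__main__":
    # Hypothetical site values (made up for illustration).
    site = {"checkin": 1000, "common": 10, "cap": 200, "rooms": 150}
    q1 = manhours_q1(site["checkin"], site["cap"], site["rooms"])
    q2 = manhours_q2(site["checkin"], site["common"], site["cap"], site["rooms"])
    print(f"Q1 model: {q1:.1f} man-hours, Q2 model: {q2:.1f} man-hours")
```

For these made-up inputs, the two models give about 3253 and 3179 man-hours.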
[Figure: Histogram of residuals (response is ManHours); x-axis Residual, y-axis Frequency.]
[Figure: Residuals versus fits (response is ManHours); x-axis Fitted Value, y-axis Residual.]
The histogram of residuals looks normal and the residuals-versus-fits scatter looks random.
3. Collinearity is a problem in choosing a regression model when there is a strong correlation between independent variables. Adding correlated independent variables does not improve the model. Instead, it inflates the standard errors of the estimated parameters. When two independent variables are strongly correlated, one of them is redundant in the model.
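Collinearity is usually quantified with the variance inflation factor, VIF_j = 1/(1 - R_j²). Here R_j² comes from regressing predictor j on the other predictors. Holding everything else equal, the standard error of b_j is inflated by a factor of sqrt(VIF_j). With only two predictors, R_j² is just their squared correlation, which keeps the sketch short. The data are made up. A common rule of thumb flags VIF above 10.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1: Sequence[float], x2: Sequence[float]) -> float:
    """VIF for either predictor in a model with exactly two predictors.

    With two predictors, R_j^2 from regressing one on the other equals r^2,
    so VIF = 1 / (1 - r^2).
    """
    r_sq = pearson_r(x1, x2) ** 2
    if r_sq >= 1.0:
        return math.inf  # perfectly collinear: coefficients cannot be estimated
    return 1.0 / (1.0 - r_sq)


if __name__ == "__main__":
    # Made-up predictor columns, not from the BOQ data.
    uncorrelated = ([1, 2, 3, 4], [1, -1, -1, 1])
    collinear = ([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.9, 5.1])
    for label, (a, b) in (("uncorrelated", uncorrelated), ("collinear", collinear)):
        v = vif_two_predictors(a, b)
        print(f"{label}: VIF = {v:.1f}, SE inflation = {math.sqrt(v):.1f}x")
```

For the made-up collinear pair, r is about 0.996 and VIF is about 140. The standard error of each coefficient is therefore roughly 12 times what it would be with uncorrelated predictors.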