stat hw-4

Upload: kamalakanta-sahoo

Post on 04-Apr-2018


  • 7/31/2019 STAT HW-4

    1/3

    Question: 27.25

    (a) WEIGHT (GRAMS) = -579 + 14.3 * LENGTH (CM) + 113 * WIDTH (CM)

    (b) The model explains 93.7% of the variation in perch weight.

    (c) The p-value of the model is approximately zero, so the ANOVA table indicates that at least one of the explanatory variables is helpful in predicting the weight of the perch.

    (d) The p-values for β1 and β2 are 0.014 and approximately zero, with t-values of 2.53 and 3.75 respectively. So the data provide evidence that both β1 and β2 are significantly different from zero.

    (e) WEIGHT (GRAMS) = 114 - 3.48 * LENGTH (CM) - 94.6 * WIDTH (CM) + 5.24 * INTERACTION

    (f) The model explains 98.5% of the variation in perch weight.

    (g) The p-value of the model is approximately zero, so the ANOVA table indicates that at least one of the explanatory variables is helpful in predicting the weight of the perch.

    (h) The t-statistic changed to -1.10 and the p-value became 0.274, which is greater than the α-level of 0.05, so the data are unable to provide evidence that β1 is significantly different from zero.
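    The two fitted equations from parts (a) and (e) can be compared by evaluating them at the same point. The sketch below assumes that INTERACTION is the product LENGTH * WIDTH, which the write-up does not state explicitly, and it uses a made-up fish (30 cm long, 5 cm wide) that is not taken from the data set.

```python
def weight_additive(length_cm: float, width_cm: float) -> float:
    """Model from part (a): predicted weight in grams."""
    return -579 + 14.3 * length_cm + 113 * width_cm


def weight_interaction(length_cm: float, width_cm: float) -> float:
    """Model from part (e); INTERACTION is ASSUMED to be length * width."""
    interaction = length_cm * width_cm
    return 114 - 3.48 * length_cm - 94.6 * width_cm + 5.24 * interaction


# Hypothetical fish, chosen only for illustration.
length_cm, width_cm = 30.0, 5.0
print("additive model:   ", round(weight_additive(length_cm, width_cm), 1))
print("interaction model:", round(weight_interaction(length_cm, width_cm), 1))
```

    In the interaction model the effect of LENGTH depends on WIDTH, so the sign of the LENGTH coefficient alone no longer describes how weight changes with length.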

    Question: HW 4

    1. Dependent variable: ManHours

    Independent variables: OCCUP, CHECKIN, HOURS, COMMON, WINGS, CAP, ROOMS

    Step-1

    Root MSE of model = 455.167

    R-Sq = 96.1% and R-Sq(adj) = 94.5%

    The model p-value is approximately zero, so at least one of the variables contributes to the model. The individual p-values of OCCUP (0.129), HOURS (0.722) and WINGS (0.708) are greater than the alpha level of 0.05, so we will run another regression without HOURS, which has the highest p-value among them.
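    The rule applied at each step (drop the predictor with the largest p-value above alpha, then refit) can be sketched as a small helper. The function name `variable_to_drop` is hypothetical; the p-values are the ones quoted for Step-1, and the remaining predictors are omitted because their p-values are not given in the text.

```python
from typing import Dict, Optional

ALPHA = 0.05


def variable_to_drop(p_values: Dict[str, float], alpha: float = ALPHA) -> Optional[str]:
    """Return the predictor with the largest p-value above alpha, or None."""
    candidates = {name: p for name, p in p_values.items() if p > alpha}
    if not candidates:
        return None  # every remaining predictor is significant: stop
    return max(candidates, key=candidates.get)


# Step-1 p-values quoted in the text (only the non-significant ones are reported).
step1 = {"OCCUP": 0.129, "HOURS": 0.722, "WINGS": 0.708}
print(variable_to_drop(step1))
```

    Only one variable is removed per run because the p-values of the remaining predictors change once the model is refitted.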

    Step-2

    Dependent variable: ManHours

    Independent variables: OCCUP, CHECKIN, COMMON, WINGS, CAP, ROOMS

    Root MSE of model = 444.049 R-Sq = 96.1% R-Sq(adj) = 94.8%

    Root MSE decreased and R-Sq(adj) increased, both only slightly.

    The model p-value is still near zero, so at least one of the variables contributes to the model. The individual p-values of OCCUP (0.122) and WINGS (0.696) are greater than the alpha level of 0.05, so these variables are not significant. The next regression is run without WINGS, which has the highest p-value.

    Step-3

    Dependent variable: ManHours

    Independent variables: OCCUP, CHECKIN, COMMON, CAP, ROOMS

    Root MSE of model = 434.089 R-Sq = 96.1% R-Sq(adj) = 95.0%

    Root MSE decreased further compared with the previous run, and R-Sq(adj) increased slightly.

    The model p-value is still near zero, so at least one of the variables contributes to the model. The individual p-value of OCCUP is 0.096, which is greater than the alpha level of 0.05, so OCCUP is not significant. The next regression is run without OCCUP.

    Step-4

    Dependent variable: ManHours

    Independent variables: CHECKIN, COMMON, CAP, ROOMS

    Root MSE of model = 455.909 R-Sq = 95.4% R-Sq(adj) = 94.5%

    Root MSE increased and R-Sq(adj) decreased slightly compared with the previous run.


    The model p-value is still near zero, so at least one of the variables contributes to the model. The individual p-value of COMMON is 0.098, which is greater than the alpha level of 0.05, so COMMON is not significant. The next regression is run without COMMON.

    Step-5

    Dependent variable: ManHours

    Independent variables: CHECKIN, CAP, ROOMS

    Root MSE of model = 477.343 R-Sq = 94.7% R-Sq(adj) = 94.0%

    Root MSE increased further and R-Sq(adj) decreased slightly compared with the previous run.

    The model p-value is still near zero, so at least one of the variables contributes to the model. The individual p-values of all independent variables are less than the 0.05 alpha level, so all variables remaining in this run are significant and contribute to the model.
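    The way R-Sq(adj) moves across the five steps follows from the usual adjustment formula, 1 - (1 - R²)(n - 1)/(n - k - 1), where k is the number of predictors. The sketch below recomputes it from the reported R-Sq values. The sample size is not stated in this write-up; n = 25 is purely an assumption, chosen because with that value the formula lands within about 0.1 percentage point of each reported figure.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# ASSUMPTION: the write-up never states the sample size; 25 is a guess.
N_ASSUMED = 25

# (step, number of predictors, R-Sq %, reported R-Sq(adj) %) from the text.
STEPS = [
    (1, 7, 96.1, 94.5),
    (2, 6, 96.1, 94.8),
    (3, 5, 96.1, 95.0),
    (4, 4, 95.4, 94.5),
    (5, 3, 94.7, 94.0),
]

for step, k, r2_pct, reported in STEPS:
    computed = 100.0 * adjusted_r2(r2_pct / 100.0, N_ASSUMED, k)
    print(f"Step {step}: k={k}, computed {computed:.1f}%, reported {reported}%")
```

    Steps 1 to 3 show why R-Sq(adj) can rise while R-Sq stays at 96.1%: dropping a predictor that explains almost nothing removes its penalty without costing any fit.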

    [Figure: Histogram of residuals (response is ManHours); x-axis: Residual, y-axis: Frequency]

    [Figure: Residuals versus fits (response is ManHours); x-axis: Fitted Value, y-axis: Residual]

    The final model is ManHours = 118 + 1.93 CHECKIN - 11.0 CAP + 22.7 ROOMS

    The histogram of residuals looks approximately normal, and the scatter plot of residuals versus fitted values shows no pattern and looks random. So we can choose this as the final model for predicting the man-hour requirement for BOQ for the US Navy.

    2. Best subset model


    The two best models with one independent variable use CHECKIN and ROOMS respectively. Both have a very high Cp, far from the target value of 2 for a single-variable model, and a very high Se, so we should look for a model with more independent variables. The situation for the two best models with two independent variables is similar to that for the one-variable models. When the two best models with three variables are compared, their Cp values are still far from 4, except for the model with independent variables CHECKIN, CAP and ROOMS, which has a Cp value of 6.1, an Se value of 477.38, and R-Sq and R-Sq(adj) of 94.7% and 94.0% respectively, and which can be a potential model to choose.

    But when the two best models with four variables are analyzed, the model with independent variables CHECKIN, COMMON, CAP and ROOMS has a Cp of 5.1, close to 5, which is very good. Its R-Sq and R-Sq(adj) are 95.4% and 94.5% respectively, which looks good compared with the other four-variable model, whose Cp value is 13.4. So this model can be selected unless one of the remaining models turns out to be better.

    The models with five independent variables have Cp values of 4.3 and 6.4, which are not close to 6, and Se values higher than that of the previously chosen model. Se decreases for the two models with six independent variables, but both have a Cp value of 6.1, which is not close enough to 7, and the increase in R-Sq and R-Sq(adj) is small.

    The model with all seven independent variables has a Cp value of 8, which equals its target Cp value (8). Its Se value is 455.17, very close to that of the chosen model with four independent variables, with a small increase in R-Sq and R-Sq(adj). But none of the measures mentioned above changes much.

    Looking at all the models, I would choose the model with four independent variables, CHECKIN, COMMON, CAP and ROOMS, which has a Cp value (5.1) very close to its target, good R-Sq (95.4%) and R-Sq(adj) (94.5%), and a lower Se (455.91). These values are slightly better than those of the model chosen in question 1. Still, I feel I should go with the model chosen in question 1.

    The four-variable model is ManHours = 199 + 1.72 CHECKIN - 17.0 COMMON - 13.1 CAP + 27.0 ROOMS
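    The Cp comparison above uses the rule of thumb that a good subset with k predictors has Cp close to k + 1. A minimal sketch of that screening, using only the Cp values quoted in the text, is given below. The labels are hypothetical shorthand where the text does not name the variables, and the full seven-variable model is left out because its Cp equals k + 1 by construction.

```python
# (label, number of predictors k, Cp) as quoted in the text.
CANDIDATES = [
    ("3 vars: CHECKIN, CAP, ROOMS", 3, 6.1),
    ("4 vars: CHECKIN, COMMON, CAP, ROOMS", 4, 5.1),
    ("4 vars: other model", 4, 13.4),
    ("5 vars: first model", 5, 4.3),
    ("5 vars: second model", 5, 6.4),
    ("6 vars: either model", 6, 6.1),
]


def cp_gap(k: int, cp: float) -> float:
    """Signed distance of Cp from its target value k + 1."""
    return cp - (k + 1)


# Rank the candidates by how close Cp is to k + 1.
ranked = sorted(CANDIDATES, key=lambda c: abs(cp_gap(c[1], c[2])))
for label, k, cp in ranked:
    print(f"{label}: Cp={cp}, target={k + 1}, gap={cp_gap(k, cp):+.1f}")
```

    Closeness of Cp to its target is only one criterion; the text also weighs Se, R-Sq and R-Sq(adj) before settling on a model.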

    [Figure: Histogram of residuals (response is ManHours); x-axis: Residual, y-axis: Frequency]

    [Figure: Residuals versus fits (response is ManHours); x-axis: Fitted Value, y-axis: Residual]

    The histogram of residuals looks approximately normal and the scatter plot of residuals looks random.

    3. Collinearity is a problem in choosing a regression model when there is a strong correlation between independent variables. Correlated independent variables do not improve the model; they only inflate the standard errors of the estimated parameters. One of a pair of strongly correlated independent variables is redundant to the model.
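    One common way to quantify this inflation, not used in the write-up itself, is the variance inflation factor (VIF). For a model with exactly two predictors it reduces to 1 / (1 - r²), where r is their correlation. The sketch below illustrates this with made-up numbers; the three lists are hypothetical and are not the BOQ data.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1: Sequence[float], x2: Sequence[float]) -> float:
    """VIF when the model contains only these two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# HYPOTHETICAL data: capacity is roughly twice the number of rooms.
rooms = [10, 20, 30, 40, 50, 60]
capacity = [21, 39, 62, 80, 99, 121]
checkins = [5, 30, 12, 45, 20, 33]

print("rooms vs capacity:", round(vif_two_predictors(rooms, capacity), 1))
print("rooms vs checkins:", round(vif_two_predictors(rooms, checkins), 2))
```

    A VIF near 1 means the standard error is barely affected, while a very large VIF signals that one of the two predictors carries almost no information beyond the other.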