regression

Regression Assignment By: Ashita Jain B- 17(MBA-2) AMSoM


Post on 12-Sep-2015



reg = read.csv("Regression1.csv")

pairs(~A1+A2+A3+A4+A5+A6+A7+A8+A9+A10+A11+A12+A13+A14+A15,data=reg)

results = lm(B~A1+A2+A3+A4+A5+A6+A7+A8+A9+A10+A11+A12+A13+A14+A15, data=reg)
summary(results)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.863e+03  4.109e+02   4.535 4.41e-05 ***
A1           2.073e+00  8.419e-01   2.462  0.01781 *
A2          -2.177e+00  6.753e-01  -3.224  0.00238 **
A3          -2.833e+00  1.771e+00  -1.600  0.11682
A4          -1.405e+01  7.747e+00  -1.813  0.07658 .
A5          -1.155e+02  6.201e+01  -1.862  0.06931 .
A6          -2.426e+01  1.121e+01  -2.164  0.03596 *
A7          -1.145e+00  1.467e+00  -0.780  0.43933
A8           1.004e-02  4.124e-03   2.435  0.01903 *
A9           3.533e+00  1.283e+00   2.754  0.00852 **
A10          5.245e-01  1.551e+00   0.338  0.73690
A11          2.659e-01  2.565e+00   0.104  0.91792
A12         -8.896e-01  4.525e-01  -1.966  0.05560 .
A13          1.868e+00  9.346e-01   1.999  0.05186 .
A14         -3.477e-02  1.423e-01  -0.244  0.80812
A15          5.329e-01  1.052e+00   0.507  0.61494

A3, A7, A10, A11, A14 and A15 are not significant (p-values above 0.1), so they are dropped.

results = lm(B~A1+A2+A4+A5+A6+A8+A9+A12+A13, data=reg)
summary(results)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.397e+03  2.906e+02   4.807 1.44e-05 ***
A1           1.436e+00  7.803e-01   1.840 0.071698 .
A2          -2.021e+00  5.126e-01  -3.943 0.000251 ***
A4          -5.846e+00  6.561e+00  -0.891 0.377220
A5          -7.383e+01  5.683e+01  -1.299 0.199878
A6          -2.141e+01  7.513e+00  -2.850 0.006330 **
A8           8.727e-03  3.324e-03   2.626 0.011442 *
A9           3.874e+00  9.251e-01   4.188 0.000114 ***
A12         -7.555e-01  3.101e-01  -2.437 0.018432 *
A13          1.606e+00  5.985e-01   2.683 0.009855 **

A4 and A5 are not significant (p-values above 0.1), so they are dropped in turn: first A4, then A5.

results = lm(B~A1+A2+A5+A6+A8+A9+A12+A13, data=reg)
summary(results)

results = lm(B~A1+A2+A6+A8+A9+A12+A13, data=reg)
summary(results)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.034e+03  8.190e+01  12.626  < 2e-16 ***
A1           1.219e+00  6.206e-01   1.964 0.054876 .
A2          -1.672e+00  4.257e-01  -3.927 0.000254 ***
A6          -1.578e+01  6.126e+00  -2.576 0.012873 *
A8           9.285e-03  3.227e-03   2.877 0.005811 **
A9           4.081e+00  5.676e-01   7.190 2.46e-09 ***
A12         -7.372e-01  3.083e-01  -2.391 0.020444 *
A13          1.576e+00  5.889e-01   2.677 0.009914 **

Scaling Variables

> reg1 = reg
> reg1$A1 = scale(reg1$A1)
> reg1$A2 = scale(reg1$A2)
> reg1$A3 = scale(reg1$A3)
> reg1$A4 = scale(reg1$A4)
> reg1$A5 = scale(reg1$A5)
> reg1$A6 = scale(reg1$A6)
> reg1$A7 = scale(reg1$A7)
> reg1$A8 = scale(reg1$A8)
> reg1$A9 = scale(reg1$A9)
> reg1$A10 = scale(reg1$A10)
> reg1$A11 = scale(reg1$A11)
> reg1$A12 = scale(reg1$A12)
> reg1$A13 = scale(reg1$A13)
> reg1$A14 = scale(reg1$A14)
> reg1$A15 = scale(reg1$A15)
> reg1
> results1 = lm(B~A1+A2+A3+A4+A5+A6+A7+A8+A9+A10+A11+A12+A13+A14+A15, data=reg1)
> summary(results1)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  940.313      4.175 225.238  < 2e-16 ***
A1            20.694      8.406   2.462  0.01781 *
A2           -26.074      8.086  -3.224  0.00238 **
A3           -13.504      8.442  -1.600  0.11682
A4           -20.577     11.346  -1.813  0.07658 .
A5           -15.616      8.387  -1.862  0.06931 .
A6           -20.508      9.479  -2.164  0.03596 *
A7            -5.885      7.541  -0.780  0.43933
A8            14.598      5.996   2.435  0.01903 *
A9            31.511     11.441   2.754  0.00852 **
A10            2.427      7.177   0.338  0.73690
A11            1.106     10.672   0.104  0.91792
A12          -81.827     41.616  -1.966  0.05560 .
A13           86.591     43.327   1.999  0.05186 .
A14           -2.204      9.020  -0.244  0.80812
A15            2.909      5.743   0.507  0.61494

Reduced model on the scaled data (A1, A2, A6, A8, A9, A12, A13):

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  940.313      4.182 224.846  < 2e-16 ***
A1            12.171      6.197   1.964 0.054876 .
A2           -20.020      5.097  -3.927 0.000254 ***
A6           -13.340      5.178  -2.576 0.012873 *
A8            13.502      4.693   2.877 0.005811 **
A9            36.400      5.063   7.190 2.46e-09 ***
A12          -67.810     28.357  -2.391 0.020444 *
A13           73.080     27.299   2.677 0.009914 **

Predictors are removed one at a time, starting from the one with the highest p-value. A3, A4, A5, A7, A10, A11, A14 and A15 are dropped. Scaling the predictors changes the estimates and standard errors but leaves the t values and p-values unchanged.
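The one-at-a-time removal described above can be automated. The following is only a minimal sketch on simulated data, not the assignment's data: the data frame `dat`, the helper `backward_eliminate()` and the cutoff `alpha = 0.1` are all assumptions made for illustration.

```r
# Backward elimination by p-value, sketched on simulated data.
# `dat`, `backward_eliminate` and `alpha` are hypothetical names/values.
set.seed(1)
n <- 60
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y <- 5 + 3 * dat$x1 - 2 * dat$x2 + rnorm(n)  # x3 and x4 are pure noise

backward_eliminate <- function(data, response, alpha = 0.1) {
  predictors <- setdiff(names(data), response)
  repeat {
    fit <- lm(reformulate(predictors, response = response), data = data)
    tab <- coef(summary(fit))
    # p-values of all terms except the intercept, named by predictor
    p <- setNames(tab[-1, "Pr(>|t|)"], rownames(tab)[-1])
    # stop when every remaining predictor passes, or only one is left
    if (max(p) <= alpha || length(predictors) == 1) return(fit)
    # otherwise drop the single predictor with the highest p-value and refit
    predictors <- setdiff(predictors, names(which.max(p)))
  }
}

final <- backward_eliminate(dat, "y")
print(coef(summary(final)))
```

Refitting after every single removal matters because p-values shift once a correlated predictor leaves the model, which is why the transcript refits after dropping A4 before deciding on A5.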

Scaling Variables Including Dependent Variable

> reg2 = reg
> reg2$B = scale(reg2$B)
> reg2$A1 = scale(reg2$A1)
> reg2$A2 = scale(reg2$A2)
> reg2$A3 = scale(reg2$A3)
> reg2$A4 = scale(reg2$A4)
> reg2$A5 = scale(reg2$A5)
> reg2$A6 = scale(reg2$A6)
> reg2$A7 = scale(reg2$A7)
> reg2$A8 = scale(reg2$A8)
> reg2$A9 = scale(reg2$A9)
> reg2$A10 = scale(reg2$A10)
> reg2$A11 = scale(reg2$A11)
> reg2$A12 = scale(reg2$A12)
> reg2$A13 = scale(reg2$A13)
> reg2$A14 = scale(reg2$A14)
> reg2$A15 = scale(reg2$A15)
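The column-by-column scaling above can be written more compactly, and its effect on the fit can be checked directly. This is a sketch on simulated data: `reg_demo`, its three predictors and all coefficients used to generate it are assumptions, not values from Regression1.csv.

```r
# Scale several predictors at once and compare raw vs. scaled fits.
# `reg_demo` is simulated; the column names only mimic the assignment's.
set.seed(2)
n <- 50
reg_demo <- data.frame(A1 = rnorm(n, 40, 10),
                       A2 = rnorm(n, 35, 5),
                       A3 = rnorm(n, 4000, 900))
reg_demo$B <- 900 + 2 * reg_demo$A1 - 1.5 * reg_demo$A2 +
  0.01 * reg_demo$A3 + rnorm(n, sd = 20)

predictors <- c("A1", "A2", "A3")

# Predictors standardized, response left as is
scaled_x <- reg_demo
scaled_x[predictors] <- lapply(scaled_x[predictors],
                               function(x) as.numeric(scale(x)))

# Predictors and response standardized
scaled_xy <- scaled_x
scaled_xy$B <- as.numeric(scale(scaled_xy$B))

fit_raw <- lm(B ~ A1 + A2 + A3, data = reg_demo)
fit_x   <- lm(B ~ A1 + A2 + A3, data = scaled_x)
fit_xy  <- lm(B ~ A1 + A2 + A3, data = scaled_xy)

t_raw <- coef(summary(fit_raw))[-1, "t value"]
t_x   <- coef(summary(fit_x))[-1, "t value"]

print(cbind(t_raw, t_x))   # slope t values are unaffected by scaling
print(coef(fit_x)[1])      # intercept equals mean(B) when predictors are centered
print(coef(fit_xy)[1])     # intercept is numerically zero when B is centered too
```

With only the predictors standardized, each slope reads as the change in B per one standard deviation of that predictor; with B standardized as well, the slopes are unit-free and can be compared across predictors.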

> reg2
> results2 = lm(B~A1+A2+A3+A4+A5+A6+A7+A8+A9+A10+A11+A12+A13+A14+A15, data=reg2)

summary(results2)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  940.313      4.182 224.846  < 2e-16 ***
A1            12.171      6.197   1.964 0.054876 .
A2           -20.020      5.097  -3.927 0.000254 ***
A6           -13.340      5.178  -2.576 0.012873 *
A8            13.502      4.693   2.877 0.005811 **
A9            36.400      5.063   7.190 2.46e-09 ***
A12          -67.810     28.357  -2.391 0.020444 *
A13           73.080     27.299   2.677 0.009914 **

The coefficients shown are identical to those of the reduced model on reg1, where B is unscaled. Once B is standardized as well, the intercept is zero up to rounding.

results2 = lm(B~A1+A2+A6+A8+A9+A12+A13, data=reg2)

summary(results2)

Calculating Leverage

lev = hat(model.matrix(results))
plot(lev)
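The leverage calculation above can also be done with `hatvalues()`, which keeps the row names. The sketch below uses the built-in `mtcars` data as a stand-in, and the cutoff of twice the average leverage (2p/n) is an assumed rule of thumb rather than something taken from the slides, which use a fixed 0.2.

```r
# Leverage via hatvalues() on a stand-in model (mtcars is not the assignment data).
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

lev <- hatvalues(fit)        # diagonal of the hat matrix, one value per row
p <- length(coef(fit))       # number of estimated coefficients, incl. intercept
n <- nrow(mtcars)

# Assumed rule of thumb: flag rows with more than twice the average leverage p/n
cutoff <- 2 * p / n
high_leverage <- mtcars[lev > cutoff, c("mpg", "wt", "hp", "qsec")]

print(round(cutoff, 3))
print(high_leverage)

plot(lev, ylab = "Leverage")
abline(h = cutoff, lty = 2)  # dashed line marks the cutoff
```

Because the leverages always sum to p, their average is p/n, so a data-dependent cutoff such as 2p/n adapts to the number of predictors, whereas a fixed value like 0.2 flags more or fewer rows as the model grows or shrinks.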

reg[lev>0.2,]

   Sno A1 A2 A3   A4   A5   A6   A7   A8   A9  A10  A11 A12 A13 A14 A15     B
7    7 43 30 74 10.9 3.23 12.1 83.9 4679  3.5 49.2 11.3  21  32  62  56 934.7
29  29 11 53 68  9.2 2.99 12.1 90.6 4700  7.8 48.9 12.3 648 319 130  47 861.8
32  32 60 67 82 10.0 2.98 11.5 88.6 4657 13.6 47.3 22.4   3   1   1  60 861.4
40  40 36 29 72  9.5 3.32 10.6 77.6 3437  8.1 45.5 13.8  45  59 263  56 991.2
47  47 10 55 70  7.3 3.11 12.1 88.9 3033  5.9 51.0 14.0 144  66  20  61 839.7
48  48 18 48 63  9.2 2.92 12.2 87.7 4253 13.7 51.2 12.0 311 171  86  71 911.7
49  49 13 49 68  7.0 3.36 12.2 90.7 2702  3.0 51.9  9.7 105  32   3  71 790.7
55  55 41 37 78  6.2 3.25 12.3 89.5 5308 25.9 59.7 10.3  65  28 102  52 967.8
59  59 42 83 76  9.7 3.22  9.0 76.2 9699  4.8 42.2 14.5   8   8  49  54 911.8

Diagnosing Residuals

> par(mfrow=c(1,5))

> plot(reg$A1, results$res)
> plot(reg$A2, results$res)
> plot(reg$A3, results$res)
> plot(reg$A4, results$res)
> plot(reg$A5, results$res)
> plot(reg$A6, results$res)
> plot(reg$A7, results$res)
> plot(reg$A8, results$res)
> plot(reg$A9, results$res)
> plot(reg$A10, results$res)
> plot(reg$A11, results$res)
> plot(reg$A12, results$res)
> plot(reg$A13, results$res)
> plot(reg$A14, results$res)
> plot(reg$A15, results$res)

> plot(results$fitted, results$res)

Plot: Studentized Residuals vs. Fitted Values

Suggested power transformation: 0.5839741
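The transcript does not show the command that produced the suggested power. Output of this form is what `spreadLevelPlot()` in the car package prints, so the sketch below assumes a spread-level approach: regress the log of the absolute studentized residuals on the log of the fitted values and take one minus the slope. It uses ordinary least squares on simulated data as a simplification, so it illustrates the idea and is not a reproduction of the value above.

```r
# Spread-level estimate of a variance-stabilizing power, sketched in base R.
# Data are simulated so that the error spread grows with the mean.
set.seed(3)
n <- 2000
x <- runif(n, 1, 10)
mu <- 10 + 5 * x
y <- mu + rnorm(n, sd = 0.2 * mu)   # standard deviation proportional to the mean

fit <- lm(y ~ x)
abs_res <- abs(rstudent(fit))       # absolute studentized residuals
fv <- fitted(fit)
stopifnot(all(fv > 0))              # the log requires positive fitted values

# Slope b of log|residual| on log(fitted); the suggested power is 1 - b
spread_fit <- lm(log(abs_res) ~ log(fv))
suggested_power <- 1 - unname(coef(spread_fit)[2])
print(suggested_power)

plot(fv, abs_res, log = "xy",
     xlab = "Fitted values", ylab = "Absolute studentized residuals")
```

A suggested power near 1 means no transformation is needed, a value near 0.5 points to a square-root transformation of the response, and a value near 0 points to a log transformation.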

qqnorm(results$res)
qqline(results$res)
hist(results$res)

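Test Of Multicollinearity

A VIF greater than 10 for a variable suggests strong multicollinearity.

library(car)   # vif() is not in base R; the car package provides one
vif(results)

       A1        A2        A6        A8        A9       A12       A13
 2.158907  1.460934  1.507709  1.238291  1.440988 45.211507 41.899843

A12 (45.2) and A13 (41.9) are far above 10, so these two predictors are strongly collinear with the others. The remaining VIFs are all below 3.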
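To make the VIF figures less of a black box, the sketch below computes them by hand from the definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. It uses the built-in `mtcars` data as a stand-in, and `vif_manual()` is a hypothetical helper, not a function from any package.

```r
# Variance inflation factors from their definition, in base R.
# `vif_manual` is a hypothetical helper; mtcars stands in for the assignment data.
vif_manual <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]   # predictors without the intercept
  sapply(colnames(X), function(v) {
    others <- setdiff(colnames(X), v)
    # R^2 from regressing predictor v on every other predictor
    r2 <- summary(lm(X[, v] ~ X[, others, drop = FALSE]))$r.squared
    1 / (1 - r2)
  })
}

fit <- lm(mpg ~ wt + hp + disp + qsec, data = mtcars)
vifs <- vif_manual(fit)
print(round(vifs, 2))

# Predictors above the commonly used threshold of 10
print(names(vifs)[vifs > 10])
```

A VIF of 45 corresponds to R_j^2 of about 0.98, meaning the other predictors explain roughly 98% of that predictor's variance. When two predictors inflate each other this way, dropping or combining one of them is the usual remedy.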