auto choice(revised)

Uploaded by shen-yan on 13-Apr-2017

How do US Drivers Choose the Cars They Buy

SHIH-WEN HUANG, SHEN YAN, LIYAN WANG

Kelley Blue Book. © Kelley Blue Book Co, Inc. All Rights Reserved.

Survey

Source of data

The data come from a consumer study conducted across the US by Kelley Blue Book (KBB), a vehicle valuation and automotive research company recognized by both consumers and the automotive industry.

The purpose of this analysis is to find out how US drivers choose the cars they buy.

Data Cleansing

Missing values removed
Sorted in alphabetical order
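The deck does not show the cleaning code itself; the appendix imports an already-cleaned cars.csv. A minimal sketch of the missing-value step in SAS, assuming the raw survey import is stored in a dataset named car_raw (a hypothetical name, not from the deck) and that incomplete rows are dropped entirely:

```sas
/* Sketch only: car_raw is a HYPOTHETICAL name for the raw survey import */
DATA car_clean;
    SET car_raw;
    /* CMISS counts missing values across numeric and character variables; */
    /* any row with at least one missing value is dropped (listwise deletion) */
    IF CMISS(OF _ALL_) > 0 THEN DELETE;
RUN;
```

Listwise deletion is the simplest reading of "missing values removed"; it keeps the regression on complete cases but shrinks the sample if missingness is common.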

Dataset Variables

Dependent variable: OCRAT1 (Consumer Reports numerical score)
Number of observations: n = 170

1. Mcode (Manufacturer code)
2. CRREC (Recommended by Consumer Reports = 1)
3. OREL1 (Consumer Reports reliability rating, 5-pt. scale)
4. OLCAP1 (Luggage capacity, cu. ft.)
5. OGAS1 (Gas mileage, mpg)
6. ORLGRM1 (Rear leg room, inches)
7. OACCEL1 (Acceleration, 0-60 mph)
8. OFSEAT1 (Consumer Reports rating of front seat comfort)
9. OSAF1 (Consumer Reports rating of safety)
10. OHAND1 (Consumer Reports rating of handling)
11. ORIDE1 (Consumer Reports rating of ride)
Descriptive Statistics

Variable   Mean     Standard Deviation
---------------------------------------
OREL1       3.112    1.0793283
OLCAP1     31.048   16.5318098
OGAS1      18.647    3.4287537
ORLGRM1    28.644    1.8149920
OACCEL1     9.077    1.8248783
OFSEAT1     3.900    0.5509540
OSAF1       4.323    0.7672302
OHAND1      3.288    0.7333012
ORIDE1      3.224    0.6772110

Variable Correlation

All pairwise correlations between independent variables are below 0.5 in absolute value, except for the following:

OLCAP1 with OGAS1 = -0.559
OHAND1 with OGAS1 = 0.504

Variable Correlation

Transformation

Variable Selection

Maximum R-squared selection
Stepwise selection
GLMSELECT selection
AIC selection
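For reference, the AIC that PROC REG reports for a candidate subset (requested with the AIC model option) is computed from that subset's error sum of squares, where n is the number of observations and p the number of parameters including the intercept; the subset with the smallest AIC is preferred:

```latex
\mathrm{AIC} = n \ln\!\left(\frac{\mathrm{SSE}_p}{n}\right) + 2p
```

The 2p term penalizes each added predictor, so a variable is worth including only if it lowers SSE enough to offset that penalty.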

Selected Model

OCRAT1 = -107.53353 + 0.88959*Mcode + 1.52307*OLCAP1 + 3.59932*OGAS1 - 8.34379*OACCEL1 + 17.88028*OFSEAT1 + 15.66684*OSAF1 + 39.36924*OHAND1 + 27.44342*ORIDE1;

Regression Conclusion

R-squared: 0.5754
Adjusted R-squared: 0.5543
Number of influence points: 17
Overall F-statistic: 27.27
P-value for overall F-test: <0.0001
PRESS/SSE (prediction sum of squares over residual sum of squares): 1.1433
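As a worked check that these statistics are mutually consistent, take n = 170 observations and k = 8 predictors (Mcode, OLCAP1, OGAS1, OACCEL1, OFSEAT1, OSAF1, OHAND1, ORIDE1). The adjusted R-squared and overall F-statistic then follow from R-squared, and PRESS sums the squared leave-one-out prediction errors, where e_i is the i-th residual and h_ii its leverage:

```latex
\begin{aligned}
R^2_{\mathrm{adj}} &= 1 - (1 - R^2)\,\frac{n-1}{n-k-1}
  = 1 - (1 - 0.5754)\,\frac{169}{161} \approx 0.5543 \\
F &= \frac{R^2 / k}{(1 - R^2)/(n-k-1)}
  = \frac{0.5754/8}{0.4246/161} \approx 27.27 \\
\mathrm{PRESS} &= \sum_{i=1}^{n} \left(\frac{e_i}{1 - h_{ii}}\right)^2,
  \qquad \frac{\mathrm{PRESS}}{\mathrm{SSE}} = 1.1433
\end{aligned}
```

A PRESS/SSE ratio close to 1 means out-of-sample prediction error is only modestly larger than in-sample error; here it is about 14% larger.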

PROC REG Source Code

PROC REG data=car_new;
MODEL OCRAT1= Mcode OLCAP1 OGAS1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1 / r p influence vif;
PLOT r.* p. r.* nqq.;
RUN;

Selected Model: Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
---------------------------------------------------------------------------
Intercept    1        -210.36703          36.06039        -5.83    <.0001
Mcode        1           0.65258           0.28073         2.32    0.0215
OLCAP1       1           1.59184           0.21803         7.30    <.0001
OGAS1        1           5.58281           1.11826         4.99    <.0001
OACCEL1      1          -6.79223           1.60769        -4.22    <.0001
OFSEAT1      1          18.78770           5.61708         3.34    0.0011
OSAF1        1          21.18837           3.99055         5.31    <.0001
OHAND1       1          35.98280           4.59454         7.83    <.0001
ORIDE1       1          38.90925           4.57810         8.50    <.0001

Selected Model: Cross-Validation Estimates

Parameter      Fold 1     Fold 2     Fold 3     Fold 4     Fold 5
--------------------------------------------------------------------
Intercept     -215.22   -221.657   -240.976    -190.66    -174.67
Mcode            0.78      0.471      0.884       0.51       0.64
OLCAP1           1.54      1.563      1.470       1.79       1.61
OGAS1            5.35      5.461      5.613       6.15       5.39
OACCEL1         -6.38     -6.281     -5.785      -7.92      -8.46
OFSEAT1         20.39     19.546     22.426      15.83      15.27
OSAF1           22.50     22.200     22.450      17.85      21.28
OHAND1          34.11     36.681     33.626      39.38      35.78
ORIDE1          38.68     40.296     42.117      35.96      37.25

Analysis of Variance for Model

R-squared: 0.7782

Adjusted R-squared: 0.7659

Overall F-statistic: 63.17

P-value for Overall F-test: <0.0001

PRESS/SSE (prediction sum of squares over residual sum of squares): 1.1715
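Assuming the refitted model uses n = 153 observations (the original 170 minus the 17 influence points; this count is an inference, not stated in the deck) and the same k = 8 predictors, the reported adjusted R-squared and F-statistic can be reproduced from R-squared:

```latex
\begin{aligned}
R^2_{\mathrm{adj}} &= 1 - (1 - R^2)\,\frac{n-1}{n-k-1}
  = 1 - (1 - 0.7782)\,\frac{152}{144} \approx 0.7659 \\
F &= \frac{R^2/k}{(1-R^2)/(n-k-1)}
  = \frac{0.7782/8}{0.2218/144} \approx 63.2
\end{aligned}
```

The small gap from the reported 63.17 is consistent with R-squared having been rounded to four decimals.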

PROC REG: Influence Points and Multicollinearity

Inspection of the influence diagnostics for all observations shows 10 influence points.

Here are the VIF values:

Variable    VIF
-------------------------------
Intercept   0
Mcode       1.27290
OLCAP1      1.77977
OGAS1       1.95318
OACCEL1     1.24724
OFSEAT1     1.19959
OSAF1       1.16904
OHAND1      1.62568
ORIDE1      1.17256
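Each VIF comes from regressing one predictor on all the other predictors, with R_j^2 the R-squared of that auxiliary regression. Inverting the formula for the largest value in the table (OGAS1) gives a worked example:

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
\quad\Longrightarrow\quad
R^2_{\mathrm{OGAS1}} = 1 - \frac{1}{1.95318} \approx 0.488
```

So the other seven predictors explain roughly half of the variance in gas mileage, in line with its correlations with OLCAP1 and OHAND1, yet every VIF stays far below the common rule-of-thumb threshold of 10.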

Residual Plot

Normal Plot of Residuals

Conclusions

The final model holds up under cross-validation: every coefficient keeps the same sign in all five folds. The R-squared value is relatively high (0.7782). The 10 influence points are acceptable given the sample size. There is no sign of multicollinearity: all VIFs are below 2. The residual plot supports the model assumptions: the residuals are centered on zero and homoscedastic, and the normal plot indicates that they are approximately normally distributed.
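The conclusions above rest on visual inspection of the residual and normal plots. A sketch of formal checks in SAS for the final model on car_new: the SPEC model option adds White's test of the first and second moment specification (a heteroscedasticity check), and the NORMAL option of PROC UNIVARIATE adds normality tests such as Shapiro-Wilk. The names reg_out, resid, and pred are illustrative choices, not from the deck.

```sas
/* Refit the final model; SPEC requests White's specification test */
PROC REG data=car_new;
    MODEL OCRAT1= Mcode OLCAP1 OGAS1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1 / spec;
    /* Save residuals and predicted values for further checks */
    OUTPUT OUT=reg_out R=resid P=pred;
RUN;
QUIT;

/* Formal normality tests and a normal Q-Q plot of the residuals */
PROC UNIVARIATE data=reg_out NORMAL;
    VAR resid;
    QQPLOT resid / NORMAL(MU=EST SIGMA=EST);
RUN;
```

A small p-value from White's test or the Shapiro-Wilk test would argue against the "homoscedastic" or "normally distributed" conclusions; large p-values are consistent with them but do not prove them.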

Follow-up Analysis

Is it possible to collect more observations for the sample?

Are there other factors influencing consumers' choices that the original survey did not include?

Is this analysis too general? Should we break it down into groups, e.g., used vs. new cars or SUVs vs. sedans?

Appendix

/* Import data and create a new dataset called car */
PROC IMPORT datafile="C:/datasets/cars.csv" OUT=car DBMS=csv REPLACE;
getnames=yes;
RUN;

PROC PRINT data=car; RUN;

Appendix

/* Descriptive statistics about each variable */

PROC MEANS data=car mean min max stddev p25 p75;

VAR OCRAT1 OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1;

RUN;

PROC SGSCATTER data=car;

MATRIX OCRAT1 Mcode OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1;

RUN;

/* Test the correlation between each independent variable */

PROC CORR data=car;

VAR Mcode OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1;

RUN;

Appendix

/* Build a linear regression model for the car dataset */

PROC REG data=car;

MODEL OCRAT1= Mcode CRREC OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1 / r p vif;

PLOT r.* p. r.* nqq.;

RUN;

* Model 1 using maximum R-squared selection;

PROC REG data=car ;

MODEL OCRAT1= Mcode CRREC OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1 / selection= maxr r p influence vif;

PLOT r.* p. r.* nqq.;

RUN;

Appendix

* Model 2 using stepwise selection;
PROC REG data=car;
MODEL OCRAT1= Mcode CRREC OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1 / selection=stepwise r p influence vif;
PLOT r.* p. r.* nqq.;
RUN;

* Model 3 using AIC: all-subsets R-squared selection, reporting AIC for the 2 best models of each size;
PROC REG data=car;
MODEL OCRAT1= Mcode CRREC OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1 / selection=rsquare aic best=2;
RUN;

* Model 4 using PROC GLMSELECT (default selection: stepwise with SBC);
PROC GLMSELECT data=car;
MODEL OCRAT1= Mcode CRREC OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1;
RUN;

Appendix

* Initial version of the final model;

PROC REG data=car;

MODEL OCRAT1=MCODE OLCAP1 OGAS1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1/r p influence VIF;

PLOT r.* p. r.* nqq.;

RUN;

/* Remove the influence points and rebuild the model */

* Import new data and create a new dataset called car_new;

PROC IMPORT datafile='C:/datasets/cars_new.csv'

OUT=car_new

DBMS=csv

REPLACE;

getnames=yes;

RUN;

PROC PRINT;

RUN;
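The appendix removes the 17 influence points by re-importing a separately prepared file (cars_new.csv), and the deck does not state which cutoffs identified them. As an alternative, flagged rows could be dropped in code. This is a sketch under assumed rule-of-thumb cutoffs (|RSTUDENT| > 2, leverage > 2p/n, Cook's D > 4/n, with p = 9 parameters including the intercept); the cutoffs and the names diag, nobs, and car_new_alt are assumptions, so the result need not match the authors' 17 points.

```sas
/* Fit the initial model on car and keep the influence diagnostics */
PROC REG data=car;
    MODEL OCRAT1= Mcode OLCAP1 OGAS1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1;
    OUTPUT OUT=diag RSTUDENT=rstud H=lev COOKD=cd;
RUN;
QUIT;

/* Count the observations used in the fit, for the size-adjusted cutoffs */
PROC SQL NOPRINT;
    SELECT COUNT(*) INTO :nobs TRIMMED FROM diag WHERE cd IS NOT MISSING;
QUIT;

/* ASSUMED cutoffs: drop rows exceeding any of the three rules of thumb */
DATA car_new_alt;
    SET diag;
    IF ABS(rstud) > 2 OR lev > 2*9/&nobs OR cd > 4/&nobs THEN DELETE;
    DROP rstud lev cd;
RUN;
```

Keeping the rule in code makes the removal reproducible and lets the cutoffs be tightened or relaxed before refitting.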

Appendix

/* Build the best regression model */

PROC REG data=car_new;

MODEL OCRAT1= Mcode OLCAP1 OGAS1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1 / r p influence vif;

PLOT r.* p. r.* nqq.;

RUN;

/* Cross validation */

PROC GLMSELECT data=car_new seed=4530;

MODEL OCRAT1= Mcode OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1

/ stats= all cvdetails=all details=summary selection=stepwise(select=cv drop=competitive) cvmethod=random(5);

RUN;

QUIT;