auto choice(revised)

Post on 13-Apr-2017

How do US Drivers Choose the Cars They Buy

SHIH-WEN HUANG, SHEN YAN, LIYAN WANG

Kelley Blue Book. © Kelley Blue Book Co, Inc. All Rights Reserved.

Survey

Source of data

This is a consumer study undertaken across the US by Kelley Blue Book (KBB), a vehicle valuation and automotive research company recognized by both consumers and the automotive industry.

The purpose of this analysis is to find out how US drivers choose the cars they buy.

Data Cleansing

Missing values removed

Sorted in alphabetical order

Dataset Variables

Dependent variable: OCRAT1 (Consumer Reports numerical score)

Number of observations: n = 170

1. Mcode (Manufacturer code)
2. CRREC (Recommended by Consumer Reports = 1)
3. OREL1 (Consumer Reports reliability rating, 5-point scale)
4. OLCAP1 (Luggage capacity, cu. ft.)
5. OGAS1 (Gas mileage, mpg)
6. ORLGRM1 (Rear leg room, inches)
7. OACCEL1 (Acceleration 0-60 mph)
8. OFSEAT1 (Consumer Reports rating of front seat comfort)
9. OSAF1 (Consumer Reports rating of safety)
10. OHAND1 (Consumer Reports rating of handling)
11. ORIDE1 (Consumer Reports rating of ride)

Descriptive Statistics

Variable Name      Mean    Standard Deviation
---------------------------------------------
CRREC             3.112             1.0793283
OREL1            31.048            16.5318098
OLCAP1           18.647             3.4287537
OGAS1            28.644             1.8149920
ORLGRM1           9.077             1.8248783
OACCEL1           3.900             0.5509540
OFSEAT1           4.323             0.7672302
OSAF1             3.288             0.7333012
OHAND1            3.224             0.6772110

Variable Correlation

All pairwise correlations between independent variables are below 0.5 in absolute value except for the following:

OLCAP1 with OGAS1 = -0.559

OHAND1 with OGAS1 = 0.504

Transformation

Variable Selection

Maximum R-squared

Stepwise Selection

GLMSelection

AIC Selection

Selected Model

OCRAT1 = -107.53353 + 0.88959*Mcode + 1.52307*OLCAP1 + 3.59932*OGAS1 - 8.34379*OACCEL1 + 17.88028*OFSEAT1 + 15.66684*OSAF1 + 39.36924*OHAND1 + 27.44342*ORIDE1

Regression Conclusion

R-squared: 0.5754

Adjusted R-squared: 0.5543

Number of influence points: 17

Overall F-statistic: 27.27

P-value for Overall F-test: <0.0001

VIF=PRESS/SSR: 1.1433

PROC REG Source Code

PROC REG data=car_new;

MODEL OCRAT1= Mcode OLCAP1 OGAS1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1 / r p influence vif;

PLOT r.* p. r.* nqq.;

RUN;

Selected Model

Parameter Estimates

                 Parameter     Standard
Variable    DF    Estimate        Error   t Value   Pr > |t|
---------------------------------------------------------------------------
Intercept    1  -210.36703     36.06039     -5.83     <.0001
Mcode        1     0.65258      0.28073      2.32     0.0215
OLCAP1       1     1.59184      0.21803      7.30     <.0001
OGAS1        1     5.58281      1.11826      4.99     <.0001
OACCEL1      1    -6.79223      1.60769     -4.22     <.0001
OFSEAT1      1    18.78770      5.61708      3.34     0.0011
OSAF1        1    21.18837      3.99055      5.31     <.0001
OHAND1       1    35.98280      4.59454      7.83     <.0001
ORIDE1       1    38.90925      4.57810      8.50     <.0001
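
As a reading aid, the parameter estimates above can be written out as a fitted equation and evaluated. The following is a minimal Python sketch and is not part of the original SAS analysis; the function name `predict_ocrat1` and the example input values are hypothetical and chosen only for illustration.

```python
# Coefficients copied from the Parameter Estimates table of the selected model.
COEFFICIENTS = {
    "Intercept": -210.36703,
    "Mcode": 0.65258,
    "OLCAP1": 1.59184,
    "OGAS1": 5.58281,
    "OACCEL1": -6.79223,
    "OFSEAT1": 18.78770,
    "OSAF1": 21.18837,
    "OHAND1": 35.98280,
    "ORIDE1": 38.90925,
}


def predict_ocrat1(car):
    """Return the fitted OCRAT1 score for a dict of predictor values."""
    score = COEFFICIENTS["Intercept"]
    for name, beta in COEFFICIENTS.items():
        if name != "Intercept":
            score += beta * car[name]
    return score


# Hypothetical car, for illustration only (not a row of the survey data).
example = {"Mcode": 10, "OLCAP1": 15.0, "OGAS1": 25.0, "OACCEL1": 9.0,
           "OFSEAT1": 4.0, "OSAF1": 4.0, "OHAND1": 3.0, "ORIDE1": 3.0}
better_handling = dict(example, OHAND1=4.0)

print(round(predict_ocrat1(example), 2))
# Holding everything else fixed, one more handling point adds the OHAND1 coefficient.
print(round(predict_ocrat1(better_handling) - predict_ocrat1(example), 2))
```

Read this way, each coefficient is the change in the fitted score for a one-unit change in that variable with the other variables held fixed, which is why the handling and ride ratings carry the largest per-point effects.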

Selected Model Cross-validation Estimates

--------------Cross Validation Estimates---------------
Parameter          1          2          3          4          5
Intercept    -215.22   -221.657   -240.976    -190.66    -174.67
Mcode           0.78      0.471      0.884       0.51       0.64
OLCAP1          1.54      1.563      1.470       1.79       1.61
OGAS1           5.35      5.461      5.613       6.15       5.39
OACCEL1        -6.38     -6.281     -5.785      -7.92      -8.46
OFSEAT1        20.39     19.546     22.426      15.83      15.27
OSAF1          22.50     22.200     22.450      17.85      21.28
OHAND1         34.11     36.681     33.626      39.38      35.78
ORIDE1         38.68     40.296     42.117      35.96      37.25
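
One way to read the cross-validation estimates is to compare the average of the five fold estimates with the full-sample estimate for the same coefficient. The sketch below does this in Python for four of the variables; it is an illustration added for this write-up, not part of the original SAS run, and the helper name `fold_summary` is hypothetical.

```python
from statistics import mean

# Full-sample estimates (Parameter Estimates table) and the five fold estimates
# (Cross Validation Estimates table), copied from the slides.
FULL = {"OLCAP1": 1.59184, "OGAS1": 5.58281, "OHAND1": 35.98280, "ORIDE1": 38.90925}
FOLDS = {
    "OLCAP1": [1.54, 1.563, 1.470, 1.79, 1.61],
    "OGAS1": [5.35, 5.461, 5.613, 6.15, 5.39],
    "OHAND1": [34.11, 36.681, 33.626, 39.38, 35.78],
    "ORIDE1": [38.68, 40.296, 42.117, 35.96, 37.25],
}


def fold_summary(name):
    """Return (mean of fold estimates, relative gap to the full-sample estimate)."""
    avg = mean(FOLDS[name])
    return avg, abs(avg - FULL[name]) / abs(FULL[name])


for name in FOLDS:
    avg, gap = fold_summary(name)
    # A coefficient that flips sign across folds would be a warning sign.
    same_sign = all((b > 0) == (FULL[name] > 0) for b in FOLDS[name])
    print(f"{name}: fold mean {avg:.3f}, relative gap {gap:.1%}, same sign: {same_sign}")
```

For these four variables the fold averages sit within one percent of the full-sample estimates and no coefficient changes sign, which is the kind of stability the cross-validation slide is meant to show.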

Analysis of Variance for Model

R-squared: 0.7782

Adjusted R-squared: 0.7659

Overall F-statistic: 63.17

P-value for Overall F-test: <0.0001

VIF=PRESS/SSR: 1.1715
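
The reported fit statistics can be cross-checked against one another using the standard formulas for adjusted R-squared and the overall F-statistic. The Python sketch below does so for both fits; the sample size n = 153 for the refit is an assumption (170 observations minus the 17 influence points reported for the initial model), since the slides do not state it.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (plus an intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def overall_f(r2, n, p):
    """Overall F-statistic implied by R-squared, n and p."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


P = 8  # number of predictors in the selected model

# Initial fit on all n = 170 observations (R-squared 0.5754).
print(round(adjusted_r_squared(0.5754, 170, P), 4), round(overall_f(0.5754, 170, P), 2))

# Refit after removing influence points (R-squared 0.7782).
# n = 153 is an ASSUMPTION: 170 minus the 17 reported influence points.
print(round(adjusted_r_squared(0.7782, 153, P), 4), round(overall_f(0.7782, 153, P), 2))
```

Under that assumption the computed adjusted R-squared values agree with the reported 0.5543 and 0.7659, and the F-statistics agree with the reported 27.27 and 63.17 up to rounding of the published R-squared.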

PROC REG -- Influence Points and Multicollinearity

Inspection of the influence diagnostics for all observations shows that there are 10 influence points.

Here are the VIF values:

Variable    VIF
-------------------------------
Intercept   0
Mcode       1.27290
OLCAP1      1.77977
OGAS1       1.95318
OACCEL1     1.24724
OFSEAT1     1.19959
OSAF1       1.16904
OHAND1      1.62568
ORIDE1      1.17256
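
A variance inflation factor is defined as VIF = 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors, so each VIF can be converted back into the share of a predictor's variance explained by the rest. The Python sketch below applies this to the values listed above; the function name `auxiliary_r_squared` is hypothetical, and the threshold of 10 is a common rule of thumb rather than something stated in the slides.

```python
# VIF values copied from the PROC REG output above (intercept omitted).
VIF = {
    "Mcode": 1.27290,
    "OLCAP1": 1.77977,
    "OGAS1": 1.95318,
    "OACCEL1": 1.24724,
    "OFSEAT1": 1.19959,
    "OSAF1": 1.16904,
    "OHAND1": 1.62568,
    "ORIDE1": 1.17256,
}


def auxiliary_r_squared(vif):
    """Invert VIF = 1 / (1 - R^2) to get the R^2 of a predictor on the others."""
    return 1 - 1 / vif


worst = max(VIF, key=VIF.get)
print(worst, round(auxiliary_r_squared(VIF[worst]), 3))

# Rule-of-thumb check (assumed threshold): VIF above 10 signals multicollinearity.
print(all(value < 10 for value in VIF.values()))
```

The largest VIF belongs to OGAS1, which is also the variable involved in both correlations above 0.5 in absolute value, yet less than half of its variance is explained by the other predictors.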

Residual Plot

Normal Plot of Residuals

Conclusions

The final model holds up under cross-validation.

The R-squared value is relatively high (0.7782).

There are 10 influence points, which is acceptable given the sample size.

There is no multicollinearity: all VIF values are below 2.

The residual plot satisfies the model assumptions: the residuals are unbiased and homoscedastic.

The residuals are normally distributed.

Follow-up Analysis

Is it possible to have more observations in a sample?

Are there other factors influencing consumer choice that are not included in the original survey?

Is this analysis too general? Should we break it down into several groups, e.g., used cars vs. new cars, SUVs vs. sedans?

Appendix

/* Import data and create new dataset called car */

PROC IMPORT datafile="C:/datasets/cars.csv"

OUT=car

DBMS=csv

REPLACE;

getnames=yes;

RUN;

PROC PRINT; RUN;

Appendix

/* Descriptive statistics about each variable */

PROC MEANS data=car mean min max stddev p25 p75;

VAR OCRAT1 OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1;

RUN;

PROC SGSCATTER data=car;

MATRIX OCRAT1 Mcode OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1;

RUN;

/* Test the correlation between each independent variable */

PROC CORR data=car;

VAR Mcode OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1;

RUN;

Appendix

/* Build Linear Regression Model for car dataset */

PROC REG data=car;

MODEL OCRAT1= Mcode CRREC OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1 / r p vif;

PLOT r.* p. r.* nqq.;

RUN;

* Model 1 Using Maximum R-squared Selection ;

PROC REG data=car ;

MODEL OCRAT1= Mcode CRREC OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1 / selection= maxr r p influence vif;

PLOT r.* p. r.* nqq.;

RUN;

Appendix

* Model 2 Using Stepwise Selection ;

PROC REG data=car ;

MODEL OCRAT1= Mcode CRREC OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1 / selection= stepwise r p influence vif;

PLOT r.* p. r.* nqq.;

RUN;

* Model 3 Using AIC Selection ;

PROC RSQUARE AIC;

MODEL OCRAT1= Mcode CRREC OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1 / select=2;

RUN;

* Model 4 Using GLMSelection ;

PROC GLMSELECT data=car ;

MODEL OCRAT1= Mcode CRREC OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1;

RUN;

Appendix

* Initial final model ;

PROC REG data=car;

MODEL OCRAT1=MCODE OLCAP1 OGAS1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1/r p influence VIF;

PLOT r.* p. r.* nqq.;

RUN;

/* Remove the influence points and rebuild the model */

* Import new data and create new dataset called car_new ;

PROC IMPORT datafile='C:/datasets/cars_new.csv'

OUT=car_new

DBMS=csv

REPLACE;

getnames=yes;

RUN;

PROC PRINT;

RUN;

Appendix

/* Build the best regression model */

PROC REG data=car_new;

MODEL OCRAT1= Mcode OLCAP1 OGAS1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1 / r p influence vif;

PLOT r.* p. r.* nqq.;

RUN;

/* Cross validation */

PROC GLMSELECT seed=4530;

MODEL OCRAT1= Mcode OREL1 OLCAP1 OGAS1 ORLGRM1 OACCEL1 OFSEAT1 OSAF1 OHAND1 ORIDE1

/ stats= all cvdetails=all details=summary selection=stepwise(select=cv drop=competitive) cvmethod=random(5);

RUN;

QUIT;
