
METHODOLOGY

Data & Methodology

To analyze the above objectives and hypotheses, data mining tools and techniques (R programming, R Rattle, WEKA and SPSS 20) were used for data validation, summarization, visualization, modeling and testing. The following data mining and machine learning steps were implemented:

1. Data Mining Tools
2. Data Validation Imputation
3. Classifying models
4. Clustering models
5. Models Comparison
6. Best fitted Model

Reachout Analytics Client Sample Report


The e-commerce influence variables were gathered from secondary sources (research journals, articles and publications) and from e-commerce consultants. The variables are listed below.

1. I look for popular brands when purchasing online.
2. If I am satisfied with the product purchased, I intend to buy other products from the same brand.
3. I can shop online by comparing products and can access them at all hours.
4. I feel we can save time online rather than going to buy in retail stores.
5. I feel we can have better offers in online purchasing.
6. I feel the products bought online satisfy my requirements.
7. I read the reviews and the product information before purchasing a product.
8. I feel secure when purchasing online.
9. I prefer to buy products that are approved or certified by quality experts.
10. I feel it is easy to book the product in advance and order it when the stock is available.
11. I feel the price is an important factor while purchasing online.
12. I feel promotions influence my buying decisions.
13. I feel product advertising will influence me to some extent to buy the product.
14. I'm willing to pay more for high-value products in certain categories.
15. I still buy the same product/brand if there is an increase in price.
16. I think the convenient sizes and packages offered may significantly impact my purchase decision.
17. I look for discounts and gift coupons while purchasing the product.
18. I choose to buy products endorsed by my favorite celebrities.
19. The opinions and recommendations of your family members would affect your purchase decision.
20. The opinions of your friends and colleagues would affect your purchase decision.
21. I think buying an expensive product indicates a high standard of living.
22. I think international brands understand my requirements well.
23. I buy products online when they are recommended by my friends and colleagues.
24. I buy products online when they are recommended by the doctor.
25. I like to check things out by trialing them before I buy the product.
26. I prefer to buy online from websites that have up-to-date content.
27. Attractive website design encourages me to spend more time searching for products.
28. I buy personal care products whenever I go out shopping.
29. I buy personal care products every month.
30. I prefer to buy through user-friendly web portals.

2. Data Validation Imputation


Dimension Reduction

The 30 e-commerce influencing variables were validated and then reduced, using the Principal Component method, to seven loaded factors:

1. Customer Perception
2. Customer Attitude
3. Customer Branding
4. Customer Behavior
5. Packaging
6. Customer Self Belief
7. Offers and Discounts


Data Reliability:

Cronbach's alpha, a reliability statistic, should be at least 0.70 (70%); if it is lower, the sample size should be increased. Here it is 0.939 (about 93%), which means the data are reliable. The table for this is shown below.

Sample Validity:

The KMO statistic confirms that a sample of 727 is adequate for the study. The KMO value should be at least 0.50 (50%); if it is lower, the sample size should be increased. The value obtained, 0.964 (about 96%), indicates good sample validity.

Reliability Statistics

| Cronbach's Alpha | N of Items |
|---|---|
| .939 | 30 |
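As an illustration of how a reliability statistic of this kind is obtained, here is a minimal Python sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total score). The function name and the tiny response matrix are hypothetical; the report's own value of .939 came from SPSS on the 30 survey items, and that data is not reproduced here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Uses sample variances (n - 1 denominator) for items and for total scores.
    """
    n_items = len(responses[0])
    # Variance of each item (column) across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical 5-point Likert responses: 5 respondents x 4 items.
    demo = [
        [4, 5, 4, 4],
        [2, 2, 3, 2],
        [5, 5, 5, 4],
        [3, 3, 2, 3],
        [1, 2, 1, 2],
    ]
    print(round(cronbach_alpha(demo), 3))
```

A value at or above the 0.70 threshold quoted in the report would be read as acceptable reliability.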

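KMO and Bartlett's Test

| Statistic | Value |
|---|---|
| Kaiser-Meyer-Olkin Measure of Sampling Adequacy | 0.964 |
| Bartlett's Test of Sphericity: Approx. Chi-Square | 62659.45 |
| Bartlett's Test of Sphericity: df | 435 |
| Bartlett's Test of Sphericity: Sig. | 0 |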
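The degrees of freedom reported for Bartlett's test follow from the number of variables: with p = 30 items, p(p-1)/2 = 435. The sketch below shows this check together with the usual chi-square approximation, chi2 = -((n - 1) - (2p + 5)/6) * ln|R|, where R is the correlation matrix. It is a hedged illustration only: the function names and the 2 x 2 correlation matrix are hypothetical, and the report's chi-square of 62659.45 cannot be reproduced without the original data.

```python
import math


def sphericity_df(n_variables):
    """Degrees of freedom for Bartlett's test of sphericity."""
    return n_variables * (n_variables - 1) // 2


def log_determinant(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    size = len(matrix)
    lower = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return 2.0 * sum(math.log(lower[i][i]) for i in range(size))


def bartlett_sphericity(correlation, n_observations):
    """Return (chi-square approximation, degrees of freedom)."""
    p = len(correlation)
    factor = (n_observations - 1) - (2 * p + 5) / 6.0
    chi_square = -factor * log_determinant(correlation)
    return chi_square, sphericity_df(p)


if __name__ == "__main__":
    print(sphericity_df(30))  # df for the 30 survey variables
    # Hypothetical correlation matrix for two variables, n = 727 as in the report.
    print(bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], 727))
```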


Validation Imputation of PCA Factors


[Figure: validation of PCA factors; panels: 4. Customer Behavior, 5. Packaging, 6. Customer Self Belief, 7. Offers and Discounts]

By default, 2.16% of observations are outliers at the 95% confidence level; at the 99% level there are no outliers.


3. Classifying models

Classification Regression Model

1.1 Ordinal Regression
1.2 Multinomial Regression
1.3 Binary Regression Model
1.4 Naive Bayes Model
1.5 Decision Tree
1.6 KNN Model
1.7 SVM Model


Binary, Ordinal and Multinomial Model Summary

Models were built with each demographic variable as the response against the seven factor variables (Customer Perception, Customer Attitude, Customer Branding, Customer Behavior, Packaging, Customer Self Belief, and Offers and Discounts) and the behavior variables CBB_9, CBB_10, CBB_11, CBB_12, CBB_13 and CBB_14 that affect e-commerce. Each model was studied in detail, and conclusions were drawn about the factors that influence consumer buying behavior. All models were evaluated using the confusion matrix and the respective model diagnostics for each variable, shown below.

1. Marital Status: Model Validation Summary Report [Binary Logistic Model]

| Marital Status | Unmarried | Married | ROC |
|---|---|---|---|
| Unmarried | 146 | 154 | 0.677 |
| Married | 81 | 346 | 0.677 |

Model diagnostics: RMSE 46%; classification 67.67% (492); misclassification 32.32% (235).
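The classification and misclassification figures reported for Marital Status can be recovered from the confusion matrix counts: the diagonal cells are the correctly classified cases, and everything else is misclassified. The short Python sketch below does this arithmetic for the Marital Status counts from the report; the function and variable names are illustrative only.

```python
def classification_summary(matrix):
    """Return (correct count, total count, accuracy) for a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct, total, correct / total


# Counts from the Marital Status confusion matrix (Unmarried, Married).
MARITAL_STATUS_MATRIX = [
    [146, 154],
    [81, 346],
]

if __name__ == "__main__":
    correct, total, accuracy = classification_summary(MARITAL_STATUS_MATRIX)
    print(f"classified: {correct} of {total} ({accuracy:.2%})")
    print(f"misclassified: {total - correct} ({1 - accuracy:.2%})")
```

The same function applies unchanged to the larger matrices reported for the other demographic variables.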


2. Customer Gender: Model Validation Summary Report [Binary Model]

| Gender | Female | Male | ROC |
|---|---|---|---|
| Female | 159 | 195 | 0.633 |
| Male | 73 | 300 | 0.633 |

Model diagnostics: RMSE 48%; classification 63.14% (459); misclassification 36.86% (268).

3. Customer Region: Model Validation Summary Report [Multinomial Regression]

| Region | Town | Urban | Village | Rural | Metro | ROC |
|---|---|---|---|---|---|---|
| Town | 137 | 6 | 4 | 0 | 2 | 0.976 |
| Urban | 6 | 206 | 6 | 1 | 11 | 0.955 |
| Village | 3 | 3 | 130 | 0 | 4 | 0.965 |
| Rural | 0 | 1 | 0 | 0 | 0 | 0.437 |
| Metro | 0 | 25 | 0 | 0 | 182 | 0.967 |

Model diagnostics: RMSE 18%; classification 90.10% (655); misclassification 9.90% (72).
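An overall classification rate can hide weak classes, so a per-class view of the Region matrix is useful. The sketch below computes the number of cases and the share correctly classified for each class. It assumes that rows are the observed classes and columns the predicted ones, which the source does not state; the names are illustrative.

```python
REGION_LABELS = ["Town", "Urban", "Village", "Rural", "Metro"]

# Counts from the Region confusion matrix; rows assumed to be observed classes.
REGION_MATRIX = [
    [137, 6, 4, 0, 2],
    [6, 206, 6, 1, 11],
    [3, 3, 130, 0, 4],
    [0, 1, 0, 0, 0],
    [0, 25, 0, 0, 182],
]


def class_counts(labels, matrix):
    """Number of cases in each row (class) of the confusion matrix."""
    return {label: sum(row) for label, row in zip(labels, matrix)}


def per_class_recall(labels, matrix):
    """Share of each row's cases that fall on the diagonal."""
    recalls = {}
    for index, label in enumerate(labels):
        row_total = sum(matrix[index])
        recalls[label] = (
            matrix[index][index] / row_total if row_total else float("nan")
        )
    return recalls


if __name__ == "__main__":
    counts = class_counts(REGION_LABELS, REGION_MATRIX)
    recalls = per_class_recall(REGION_LABELS, REGION_MATRIX)
    for label in REGION_LABELS:
        print(f"{label}: n={counts[label]}, recall={recalls[label]:.2%}")
```

Under that row/column assumption the Rural row holds a single case, so the overall rate says little about that class.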


4. Customer Income Group: Model Validation Summary Report [Ordinal Regression]

| Income Group | Above 60,000 | 60,000 to 80,000 | 30,000 to 40,000 | 40,000 to 60,000 | ROC |
|---|---|---|---|---|---|
| Above 60,000 | 205 | 23 | 6 | 8 | 0.976 |
| 60,000 to 80,000 | 23 | 195 | 5 | 6 | 0.955 |
| 30,000 to 40,000 | 1 | 3 | 113 | 13 | 0.965 |
| 40,000 to 60,000 | 0 | 26 | 5 | 95 | 0.437 |

Model diagnostics: RMSE 27%; classification 83.63% (608); misclassification 16.36% (119).

5. Customer Occupation: Model Validation Summary Report [Multinomial Regression]

| Occupation | Self Employee | Employed | Home Maker | Professional | ROC |
|---|---|---|---|---|---|
| Self Employee | 28 | 68 | 1 | 55 | 0.272 |
| Employed | 18 | 197 | 10 | 29 | 0.712 |
| Home Maker | 0 | 10 | 104 | 3 | 0.842 |
| Professional | 8 | 24 | 15 | 157 | 0.701 |

Model diagnostics: RMSE 48%; classification 66.85% (486); misclassification 33.14% (241).


6. Education: Model Validation Summary Report [Ordinal Regression]

| Education | Graduation | Intermediate/10+2 | Professional Degree | Post-Graduation | ROC |
|---|---|---|---|---|---|
| Graduation | 133 | 20 | 10 | 17 | 0.884 |
| Intermediate/10+2 | 13 | 132 | 6 | 7 | 0.927 |
| Professional Degree | 2 | 0 | 181 | 7 | 0.992 |
| Post-Graduation | 26 | 4 | 0 | 169 | 0.965 |

Model diagnostics: RMSE 26%; classification 84.59% (615); misclassification 15.40% (112).

7. Age Group: Model Validation Summary Report [Ordinal Regression]

| Age Group | 36-45 | 46-55 | 26-35 | Below 25 | Above 56 | ROC |
|---|---|---|---|---|---|---|
| 36-45 | 152 | 19 | 35 | 1 | 28 | 0.772 |

Model diagnostics: RMSE 32%; classification 60.52% (440); misclassification 39.48% (287).


Model Fitting Formula: Gender

e^(4.3407 + 0.2414*Offers and Discounts + 0.3309*Customer Self Belief - 0.1517*Packaging - 0.3154*Customer Behavior + 0.1796*Customer Branding - 0.3178*Customer Attitude + 0.1457*Customer Perception - 0.6757*CBB_14 - 0.2964*CBB_13 + 0.133*CBB_12 - 0.0137*CBB_11 + 0.0389*CBB_10 - 0.4398*CBB_9)

Model Fitting Formula: Marital Status

e^(2.8687 + 0.227*Offers and Discounts + 0.4861*Customer Self Belief - 0.1331*Packaging - 0.2495*Customer Behavior + 0.2038*Customer Branding - 0.4819*Customer Attitude + 0.6287*Customer Perception - 0.535*CBB_14 - 0.1468*CBB_13 + 0.2218*CBB_12 + 0.1082*CBB_11 - 0.0836*CBB_10 - 0.5071*CBB_9)
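To show how a fitted formula of this kind can be turned into a predicted probability, the Python sketch below evaluates the Gender linear predictor z and applies the standard logistic transform p = e^z / (1 + e^z). This is an assumption-laden illustration: the report shows only the exponent term, it does not say which gender category the formula scores, and the respondent values used here (factor scores of 0, behavior items of 3) are hypothetical.

```python
import math

# Coefficients copied from the Gender model fitting formula.
GENDER_INTERCEPT = 4.3407
GENDER_COEFFICIENTS = {
    "Offers and Discounts": 0.2414,
    "Customer Self Belief": 0.3309,
    "Packaging": -0.1517,
    "Customer Behavior": -0.3154,
    "Customer Branding": 0.1796,
    "Customer Attitude": -0.3178,
    "Customer Perception": 0.1457,
    "CBB_14": -0.6757,
    "CBB_13": -0.2964,
    "CBB_12": 0.133,
    "CBB_11": -0.0137,
    "CBB_10": 0.0389,
    "CBB_9": -0.4398,
}


def linear_predictor(intercept, coefficients, values):
    """Intercept plus the coefficient-weighted sum of the predictor values."""
    return intercept + sum(
        coefficient * values[name] for name, coefficient in coefficients.items()
    )


def logistic_probability(z):
    """Standard logistic transform, equal to e^z / (1 + e^z)."""
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical respondent: factor scores at 0, behavior items answered 3.
    respondent = {name: 0.0 for name in GENDER_COEFFICIENTS}
    for item in ("CBB_9", "CBB_10", "CBB_11", "CBB_12", "CBB_13", "CBB_14"):
        respondent[item] = 3.0
    z = linear_predictor(GENDER_INTERCEPT, GENDER_COEFFICIENTS, respondent)
    print(round(z, 4), round(logistic_probability(z), 4))
```

The Marital Status formula can be evaluated the same way by swapping in its intercept and coefficients.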


Model Fitting Formula: Age Group [36-45]

e^(7.1193 + 0.4248*Offers and Discounts - 0.1704*Customer Self Belief + 0.4935*Packaging - 0.1931*Customer Behavior - 1.5526*Customer Branding + 1.1133*Customer Attitude - 2.6848*Customer Perception + 0.5719*CBB_14 - 0.621*CBB_13 + 0.0359*CBB_12 + 0.0245*CBB_11 + 0.157*CBB_10 - 1.8955*CBB_9)

Model Fitting Formula: Age Group [46-55]

e^(6.9429 - 0.1778*Offers and Discounts - 0.2134*Customer Self Belief + 0.2055*Packaging - 0.296*Customer Behavior - 1.3673*Customer Branding + 0.7912*Customer Attitude - 1.2464*Customer Perception + 0.6122*CBB_14 - 0.5026*CBB_13 - 0.2009*CBB_12 - 0.1683*CBB_11 + 0.0932*CBB_10 - 1.4382*CBB_9)


Model Fitting Formula: Age Group [26-35]

e^(8.9252 + 0.3884*Offers and Discounts - 0.0226*Customer Self Belief + 0.1511*Packaging - 0.0189*Customer Behavior - 1.0493*Customer Branding + 0.8993*Customer Attitude - 2.5456*Customer Perception + 1.0159*CBB_14 - 0.7425*CBB_13 - 0.0238*CBB_12 - 0.3246*CBB_11 + 0.2426*CBB_10 - 2.985*CBB_9)

Model Fitting Formula: Age Group [Below 25]

e^(-57.1636 + 0.1955*Offers and Discounts + 0.1775*Customer Self Belief - 0.9933*Packaging + 0.2122*Customer Behavior - 2.0521*Customer Branding + 1.9345*Customer Attitude - 1.8265*Customer Perception + 0.1878*CBB_14 - 1.8165*CBB_13 + 11.5065*CBB_12 + 0.1293*CBB_11 + 0.1547*CBB_10 - 0.2915*CBB_9)


Model Fitting Formula: Education [Graduation]

e^(3.7728 + 0.4904*Offers and Discounts - 0.3615*Customer Self Belief + 0.6794*Packaging + 0.108*Customer Behavior - 1.3465*Customer Branding + 0.3239*Customer Attitude + 0.1711*Customer Perception - 0.076*CBB_14 - 0.3695*CBB_13 + 0.2737*CBB_12 + 0.0713*CBB_11 - 0.1435*CBB_10 - 1.6017*CBB_9)

Model Fitting Formula: Education [Intermediate/10+2]

e^(10.3017 + 0.4006*Offers and Discounts - 0.6479*Customer Self Belief + 0.5467*Packaging + 0.1217*Customer Behavior - 1.6277*Customer Branding + 0.3364*Customer Attitude + 0.9111*Customer Perception + 0.4626*CBB_14 - 0.5513*CBB_13 + 0.156*CBB_12 - 0.404*CBB_11 - 0.1571*CBB_10 - 4.006*CBB_9)


Model Fitting Formula: Education [Professional Degree]

e^(-432.5593 - 8.1589*Offers and Discounts + 13.6019*Customer Self Belief - 8.4274*Packaging - 0.1061*Customer Behavior - 4.5206*Customer Branding - 2.2585*Customer Attitude - 17.2684*Customer Perception + 0.0567*CBB_14 - 0.1061*CBB_13 - 0.1708*CBB_12 + 0.2005*CBB_11 - 0.2711*CBB_10 + 103.7066*CBB_9)


Model Fitting Formula: Employment Status [Occupation] - Self Employee

e^(6.5139 - 0.4707*Offers and Discounts + 0.4177*Customer Self Belief - 0.1122*Packaging - 0.5012*Customer Behavior - 0.1658*Customer Branding - 0.3998*Customer Attitude + 0.2741*Customer Perception - 0.109*CBB_14 + 0.0188*CBB_13 - 0.8888*CBB_12 - 0.2882*CBB_11 + 0.0388*CBB_10 - 1.2381*CBB_9)

Model Fitting Formula: Employment Status [Occupation] - Employed

e^(11.9245 - 0.062*Offers and Discounts + 0.6931*Customer Self Belief - 0.2403*Packaging - 0.7339*Customer Behavior - 0.907*Customer Branding - 0.7062*Customer Attitude + 0.3251*Customer Perception - 1.0708*CBB_14 + 0.0438*CBB_13 + 0.1745*CBB_12 - 0.0651*CBB_11 - 0.0585*CBB_10 - 1.7213*CBB_9)


Model Fitting Formula: Employment Status [Occupation] - Home Maker

e^(18.2195 - 0.0781*Offers and Discounts + 0.7273*Customer Self Belief - 0.2368*Packaging - 0.6571*Customer Behavior - 1.4237*Customer Branding - 0.7015*Customer Attitude - 0.0353*Customer Perception - 0.6185*CBB_14 + 0.4073*CBB_13 - 0.803*CBB_12 - 0.5537*CBB_11 + 0.0062*CBB_10 - 4.8174*CBB_9)
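For a multi-category response such as Occupation, the per-category exponent terms can be combined into class probabilities in the usual multinomial-logit way: each category's e^z is divided by 1 plus the sum of all e^z terms, and the remaining share goes to the reference category. The sketch below assumes, as the source does not say so, that "Professional" is the reference category because it has no formula of its own. The linear predictor values are hypothetical placeholders, not values computed from the report's data.

```python
import math


def multinomial_probabilities(linear_predictors, reference):
    """Class probabilities from per-category linear predictors.

    linear_predictors maps each non-reference category to its z value;
    the reference category has an implicit z of 0.
    """
    exp_terms = {
        category: math.exp(z) for category, z in linear_predictors.items()
    }
    denominator = 1.0 + sum(exp_terms.values())
    probabilities = {
        category: term / denominator for category, term in exp_terms.items()
    }
    probabilities[reference] = 1.0 / denominator
    return probabilities


if __name__ == "__main__":
    # Hypothetical z values for one respondent (not taken from the report).
    z_values = {"Self Employee": -0.5, "Employed": 0.8, "Home Maker": -1.2}
    result = multinomial_probabilities(z_values, reference="Professional")
    for category, probability in result.items():
        print(f"{category}: {probability:.3f}")
```

The predicted class is the category with the largest probability; the same combination step would apply to the Region, Education, Income Group and Age Group formulas if they follow the same parameterization.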


Model Fitting Formula: Income Group [Above Rs.60,000]

e^(-5.9494 + 0.6643*Offers and Discounts + 1.259*Customer Self Belief - 0.6403*Packaging - 0.1426*Customer Behavior + 1.8045*Customer Branding - 0.0252*Customer Attitude + 1.2725*Customer Perception - 1.3824*CBB_14 + 0.0523*CBB_13 - 0.1824*CBB_12 - 0.0497*CBB_11 - 0.1176*CBB_10 + 3.7996*CBB_9)

Model Fitting Formula: Income Group [Between Rs.60,000 to Rs.80,000]

e^(1.0753 + 0.2151*Offers and Discounts + 0.6689*Customer Self Belief - 0.586*Packaging - 0.2852*Customer Behavior + 1.4364*Customer Branding - 0.0087*Customer Attitude + 0.9552*Customer Perception - 0.9507*CBB_14 + 0.1349*CBB_13 - 0.334*CBB_12 - 0.1056*CBB_11 + 0.1011*CBB_10 + 1.7476*CBB_9)


Model Fitting Formula: Income Group [Between Rs.30,000 to Rs.40,000]

e^(4.9492 - 0.4401*Offers and Discounts - 0.5695*Customer Self Belief + 0.0038*Packaging - 0.1781*Customer Behavior - 0.0711*Customer Branding - 0.0633*Customer Attitude - 0.7938*Customer Perception - 1.1732*CBB_14 - 0.5434*CBB_13 + 0.1875*CBB_12 - 0.5506*CBB_11 + 0.062*CBB_10 - 4.1027*CBB_9)


Model Fitting Formula: Region [Town]

e^(8.1358 + 0.6813*Offers and Discounts - 0.1903*Customer Self Belief + 1.0127*Packaging - 0.7039*Customer Behavior - 4.7333*Customer Branding + 2.0802*Customer Attitude - 2.6166*Customer Perception + 1.6688*CBB_14 - 0.4427*CBB_13 + 0.5599*CBB_12 - 0.1691*CBB_11 + 0.1847*CBB_10 - 4.4126*CBB_9)

Model Fitting Formula: Region [Urban]

e^(7.571 - 0.0429*Offers and Discounts - 0.0619*Customer Self Belief + 0.2488*Packaging - 0.6561*Customer Behavior - 2.5204*Customer Branding + 1.036*Customer Attitude - 1.5069*Customer Perception + 1.0613*CBB_14 - 0.35*CBB_13 + 0.1758*CBB_12 - 0.4029*CBB_11 + 0.1373*CBB_10 - 2.3338*CBB_9)


Model Fitting Formula: Region [Village]

e^(12.468 + 0.8203*Offers and Discounts - 0.338*Customer Self Belief + 0.8843*Packaging - 0.6863*Customer Behavior - 4.9745*Customer Branding + 2.3208*Customer Attitude - 2.4159*Customer Perception + 2.7935*CBB_14 - 0.3705*CBB_13 + 0.6338*CBB_12 - 0.6742*CBB_11 + 0.4286*CBB_10 - 8.4662*CBB_9)

Model Fitting Formula: Region [Rural]

e^(-201.1738 + 0.3586*Offers and Discounts + 0.5564*Customer Self Belief + 2.2098*Packaging - 0.4867*Customer Behavior - 2.6323*Customer Branding + 1.7746*Customer Attitude + 2.894*Customer Perception + 28.0273*CBB_14 - 29.8167*CBB_13 + 14.2044*CBB_12 + 1.2923*CBB_11 + 6.7061*CBB_10 - 5.4107*CBB_9)


Naive Bayes Model

A Naive Bayes model was developed for all the demographic variables against the factor and behavior variables. The result summary is attached in an Excel file (file name: Analysis of Naive Bayes-Summary Results).

Decision Tree Model

A Decision Tree model was also built, and the result documentation is in its final stage. The best-fit model will be finalized once the remaining clustering models and the comparison among the models are complete.


Publication /Proceeding/Articles

1. B. Naveena Devi, K. Venkata Rao, Y. Rama Devi and C. Rajeswara Rao, "A Study of Customer Buying Behavior & E-commerce: A Data Mining Approach", IIM Bangalore 3rd International Conference on Business Analytics and Intelligence, 17th-19th Dec 2015. http://dcal.iimb.ernet.in/baiconf2015/pdf/Presentation%20Schedule_Multiple%20Tracks.pdf
