market basket analysis in retail store using “r” to sustain and … · 2018. 3. 20. · market...

Market Basket Analysis in Retail Store using “R” to sustain and retain old Customers and generate new Customers by using Sales Data

Objective:

Leverage customer transaction data for the right product bundling and promotions, assortment planning and inventory management, and product placement in the stores.

Data Preparation:

The main data source for Market Basket Analysis is customer purchase transaction data. The purchase slip or bill holds information on the products purchased during a customer visit, along with their quantities, unit prices and total prices.

The transaction table may store information as below:

Order ID  Transaction ID  Product ID  Product Description  Quantity Purchased  Unit Price  Price
11        1 Jan 2014      22          Pepsodent 50gm       2                   12          24
11        1 Jan 2014      54          Babul 50gm           1                   25          50
12        1 Jan 2014      35          Modern Bread         1                   8           25
12        1 Jan 2014      44          Dairy Milk           1                   28          50
12        1 Jan 2014      67          Surf Excel           1                   80          125

OrderID Product Code List

[[1]] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29"

[[2]] "30" "31" "32"

[[3]] "33" "34" "35"

[[4]] "36" "37" "38" "39" "40" "41" "42" "43" "44" "45" "46"

[[5]] "38" "39" "47" "48"

[[6]] "38" "39" "48" "49" "50" "51" "52" "53" "54" "55" "56" "57" "58"
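Turning a transaction table like the one above into per-order product lists means grouping its rows by order. The following is a minimal base-R sketch. It assumes the table has been loaded into a data frame with the hypothetical column names OrderID and ProductID, and it takes its values from the five rows shown above.

```r
# Hypothetical data frame holding the Order ID and Product ID columns
# of the five rows in the transaction table above.
txn <- data.frame(
  OrderID   = c(11, 11, 12, 12, 12),
  ProductID = c(22, 54, 35, 44, 67)
)

# Group product codes by order: one list element per order, each a character
# vector (the same shape strsplit() produces when reading retail.dat).
baskets <- split(as.character(txn$ProductID), txn$OrderID)
baskets
```

The result has the same shape as the product code list above. It could therefore be passed to as(baskets, "transactions") in the same way as mylist later in this paper.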

Data Analysis:

Once the data is in the required format, we need to carry out univariate or exploratory analysis so that we understand what is going on in the data.

Some of the typical questions we will try to answer with Market Basket Analysis are:

How many distinct visits (orders) are there?

What is the typical number of products purchased by a customer in an order or visit?

How many different SKUs (stock keeping units) are sold in a week or month?

Which items or products are bought most frequently?
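Each of these questions comes down to a short computation on a list of baskets. The following minimal base-R sketch uses a small, made-up basket list (hypothetical data, in the list-of-character format used throughout this paper). Counting SKUs per week or month would also need a date for each order; this sketch leaves dates out.

```r
# Hypothetical baskets in the same list-of-character format as mylist
baskets <- list(c("39", "48"), c("38", "39", "48"), c("41"),
                c("39", "41", "48"), c("38", "39"))

n_visits  <- length(baskets)                  # number of distinct visits/orders
sizes     <- sapply(baskets, length)          # products in each order
n_skus    <- length(unique(unlist(baskets)))  # distinct SKUs sold
item_freq <- sort(table(unlist(baskets)), decreasing = TRUE)  # item counts

n_visits        # 5
median(sizes)   # 2
n_skus          # 4
head(item_freq, 2)
```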

I will try to answer these Market Basket Data Analysis questions using a sample dataset and R.

No. of Products in Order   % of Orders
0-1                        3%
2-5                        30%
6-10                       29%
11-15                      17%
16-20                      10%
21-30                      8%
31-40                      2%
41-High                    1%

Based on the above analysis, roughly 60% of orders or visits (30% + 29% = 59%) contain between 2 and 10 products. The next important question is which products are bought most frequently in customer baskets. The R function itemFrequencyPlot() counts and plots the most frequently bought products. This function is part of the R package “arules”, so the package must be installed before analysing the collected data.

Install Required Packages:

For Market Basket (Association) Analysis, the packages arules and arulesViz have to be installed and loaded first.

# Install once (if not already installed), then load the libraries
install.packages(c("arules", "arulesViz"))
library(arules)
library(arulesViz)

Read Data for Market Basket Analysis:

Grocery store data in .dat format from http://fimi.ua.ac.be/data/retail.dat is used as the sample data for market basket analysis in this research paper. I made some changes to that file.

# Read the data: each line is one order, with product codes separated by spaces
fc <- file("http://fimi.ua.ac.be/data/retail.dat")
mylist <- strsplit(readLines(fc), " ")
close(fc)
head(mylist)

Exploratory Data Analysis

One of the first steps is to check basic information about the data. This helps us address any issues with how the data was created or read, and it also builds our understanding of the data.

## [[1]]
##  [1] "0"  "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13"
## [15] "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27"
## [29] "28" "29"
## [[2]]
## [1] "30" "31" "32"
## [[3]]
## [1] "33" "34" "35"
## [[4]]
##  [1] "36" "37" "38" "39" "40" "41" "42" "43" "44" "45" "46"

# Number of orders

noOrder <- length(mylist)

noOrder

## [1] 88162

Counting the number of items in each order:

# Number of items in each basket
prdCount <- sapply(mylist, length)
# Maximum number of items in an order/visit
max(prdCount)
## [1] 76

Creating Table:

# Group basket sizes into bins
grp <- cut(prdCount, breaks = c(0, 1, 5, 10, 15, 20, 30, 40, 80))
class(grp)
## [1] "factor"
# Count orders per bin and label the bins
prdTable <- table(grp)
xl <- c("0-1", "2-5", "6-10", "11-15", "16-20", "21-30", "31-40", "41-High")
# Bar chart of product count in a basket
barplot(prdTable, xlab = "Product Counts", ylab = "# of Orders",
        main = "Number of Products in each Order",
        names.arg = xl, col = "blue", border = "white")
# Table with % of orders in each bin
prop.table(prdTable) * 100

The data is converted into the “transactions” class before using the association analysis functions.

Frequency Plot:

# Convert into "transactions" class
rtrans <- as(mylist, "transactions")
# Get absolute item frequencies, sorted from most to least frequent
freq <- itemFrequency(rtrans, type = "absolute")
freq <- sort(freq, decreasing = TRUE)
freq[1:20]

## 39 48 38 32 41 65 89 225 170 237 36 110

## 50675 42135 15596 15167 14945 4472 3837 3257 3099 3032 2936 2794

## 310 101 475 271 413 438 1327 147

## 2594 2237 2167 2094 1880 1863 1786 1779
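With type = "absolute", itemFrequency() counts the transactions that contain each item. With type = "relative", it divides that count by the number of transactions, which gives the support of the single item. The following base-R sketch computes both on hypothetical baskets, only to make the two numbers concrete.

```r
# Hypothetical baskets in the same format as mylist
baskets <- list(c("39", "48"), c("38", "39", "48"), c("41"),
                c("39", "41", "48"), c("38", "39"))

# Absolute frequency: number of baskets that contain each item;
# unique() makes an item count once per basket even if it is listed twice
abs_freq <- sort(table(unlist(lapply(baskets, unique))), decreasing = TRUE)

# Relative frequency: share of all baskets that contain each item
rel_freq <- abs_freq / length(baskets)

abs_freq   # 39: 4, 48: 3, then 38 and 41: 2
rel_freq   # 39: 0.8, 48: 0.6, then 38 and 41: 0.4
```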

# Frequency plot

itemFrequencyPlot(rtrans,

topN=20,

type="absolute",

xlab="Products",

ylab="Frequency of Product Sale",

main="Sale Frequency of Each Product",

col="red",border="white")

Association or Market Basket Analysis Rules:

bel.rules <- apriori(rtrans, parameter = list(supp = 0.001, conf = 0.8))

##

## parameter specification:

## confidence minval smax arem aval originalSupport support minlen maxlen

## 0.8 0.1 1 none FALSE TRUE 0.001 1 10

## target ext

## rules FALSE

##

## algorithmic control:

## filter tree heap memopt load sort verbose

## 0.1 TRUE TRUE FALSE TRUE 2 TRUE

##

## apriori - find association rules with the apriori algorithm

## version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt

## set item appearances ...[0 item(s)] done [0.00s].

## set transactions ...[16470 item(s), 88162 transaction(s)] done [0.12s].

## sorting and recoding items ... [2117 item(s)] done [0.01s].

## creating transaction tree ... done [0.05s].

## checking subsets of size 1 2 3 4 5 6 done [0.13s].

## writing ... [711 rule(s)] done [0.01s].
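apriori() finds every rule that meets the minimum support and confidence, pruning candidate itemsets as it goes. For intuition only, the sketch below uses plain brute force instead of the Apriori algorithm, and it handles only single-item rules {a} => {b}. The function pair_rules(), the thresholds and the baskets are all hypothetical choices for this toy example.

```r
# Brute-force mining of single-item rules {a} => {b}. Illustration only:
# every ordered item pair is checked, with none of Apriori's pruning.
pair_rules <- function(baskets, min_supp, min_conf) {
  items <- sort(unique(unlist(baskets)))
  # Share of baskets containing every item in 'set'
  supp <- function(set) {
    mean(vapply(baskets, function(b) all(set %in% b), logical(1)))
  }
  rules <- NULL
  for (a in items) {
    for (b in items) {
      if (a == b) next
      s_ab <- supp(c(a, b))
      if (s_ab < min_supp) next          # too few baskets support the pair
      conf <- s_ab / supp(a)
      if (conf < min_conf) next          # b does not follow a often enough
      rules <- rbind(rules, data.frame(lhs = a, rhs = b, support = s_ab,
                                       confidence = conf,
                                       lift = conf / supp(b)))
    }
  }
  rules  # NULL if no pair passes both thresholds
}

# Hypothetical baskets in the same format as mylist
baskets <- list(c("39", "48"), c("38", "39", "48"), c("41"),
                c("39", "41", "48"), c("38", "39"))
rules <- pair_rules(baskets, min_supp = 0.3, min_conf = 0.8)
rules
```

Brute force checks every ordered pair. That is fine for a handful of items, but not for the 16470 items reported for retail.dat above. Apriori's support-based pruning exists to avoid that blow-up.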

# number of rules

length(bel.rules)

## [1] 711

# Inspect the first five rules

inspect(bel.rules[1:5])

## lhs rhs support confidence lift

## 1 {3854} => {38} 0.001066 0.9126 5.159

## 2 {1045} => {32} 0.001100 0.9065 5.270

## 3 {4030} => {48} 0.001021 0.8257 1.728

## 4 {1473} => {39} 0.001225 0.8000 1.392

## 5 {1727} => {38} 0.001838 0.9310 5.263

Association rules built by the apriori algorithm can be selected by sorting them on support, confidence or lift, as follows:

Sample Association Rules:

#------------- MBA: Select Rules-----------------------

# support

s <- sort(bel.rules, by="support", decreasing=TRUE)

inspect(s[1:10])

## lhs rhs support confidence lift
## 1 {41, 48} => {39} 0.08355 0.8168 1.421
## 2 {170} => {38} 0.03438 0.9781 5.529
## 3 {36} => {38} 0.03165 0.9503 5.372
## 4 {110} => {38} 0.03091 0.9753 5.513
## 5 {170, 39} => {38} 0.02290 0.9806 5.543
## 6 {38, 41, 48} => {39} 0.02258 0.8387 1.459
## 7 {36, 39} => {38} 0.02206 0.9548 5.398
## 8 {110, 39} => {38} 0.01974 0.9892 5.592
## 9 {170, 48} => {38} 0.01745 0.9878 5.584
## 10 {225, 48} => {39} 0.01588 0.8065 1.403

# confidence

c <- sort(bel.rules, by="confidence", decreasing=TRUE)

inspect(c[1:10])

## lhs rhs support confidence lift

## 1 {32,

## 840} => {38} 0.001032 1 5.653

## 2 {32,

## 371} => {38} 0.001372 1 5.653

## 3 {170,

## 438} => {38} 0.001168 1 5.653

## 4 {310,

## 36} => {38} 0.001044 1 5.653

## 5 {170,

## 225} => {38} 0.001463 1 5.653

## 6 {32,

## 47,

## 48} => {38} 0.001214 1 5.653

## 7 {371,

## 41,

## 48} => {38} 0.001146 1 5.653

## 8 {32,

## 37,

## 48} => {38} 0.001384 1 5.653

## 9 {32,

## 37,

## 39} => {38} 0.001554 1 5.653

## 10 {37,

## 41,

## 48} => {38} 0.001951 1 5.653

Sample Association Rule:

# lift

l <- sort(bel.rules, by="lift", decreasing=TRUE)

inspect(l[1:10])

# Sample Association Rules

## lhs rhs support confidence lift

## 1 {1818,

## 3311,

## 795} => {1819} 0.001089 0.9057 318.1

## 2 {1818,

## 1819,

## 795} => {3311} 0.001089 0.8276 302.7

## 3 {3311,

## 795} => {1819} 0.001407 0.8435 296.3

## 4 {1818,

## 1819,

## 3311} => {795} 0.001089 0.8421 295.8

## 5 {1818,

## 3311} => {1819} 0.001293 0.8143 286.0

## 6 {1818,

## 1819} => {795} 0.001316 0.8000 281.0

## 7 {1080,

## 1378} => {1379} 0.001078 0.8120 252.9

## 8 {1379,

## 1380} => {309} 0.001044 0.8214 234.4

## 9 {16430,
## 41} => {16431} 0.001180 0.8595 215.9
## 10 {16430,
## 39,
## 48} => {16431} 0.001202 0.8548 214.7
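sort(..., by = ...) orders the rules on one metric at a time. The sketch below does the same ranking in base R and adds a combined threshold filter. It works on the five rules printed by inspect(bel.rules[1:5]) earlier, typed in by hand as a data frame; top_by() is a hypothetical helper.

```r
# The five rules printed by inspect(bel.rules[1:5]) above, typed in by hand
rules <- data.frame(
  lhs        = c("{3854}", "{1045}", "{4030}", "{1473}", "{1727}"),
  rhs        = c("{38}", "{32}", "{48}", "{39}", "{38}"),
  support    = c(0.001066, 0.001100, 0.001021, 0.001225, 0.001838),
  confidence = c(0.9126, 0.9065, 0.8257, 0.8000, 0.9310),
  lift       = c(5.159, 5.270, 1.728, 1.392, 5.263)
)

# Top-n rules by a chosen metric, highest first
top_by <- function(rules, metric, n = 3) {
  head(rules[order(rules[[metric]], decreasing = TRUE), ], n)
}

top_by(rules, "support")      # led by {1727} => {38}
top_by(rules, "confidence")   # led by {1727} => {38}
top_by(rules, "lift")         # led by {1045} => {32}

# Thresholds can be combined instead of ranking on a single metric
strong <- subset(rules, confidence >= 0.9 & lift > 2)
strong
```

The arules package also provides a subset() method for rule objects, so the same kind of combined condition can be applied to bel.rules directly.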

Product Combinations:

# Given the first product (LHS = "38"), which product is bought next?
sel.rules <- apriori(data = rtrans,
                     parameter = list(supp = 0.001, conf = 0.15, minlen = 2),
                     appearance = list(default = "rhs", lhs = "38"),
                     control = list(verbose = FALSE))
sel.rules <- sort(sel.rules, decreasing = TRUE, by = "confidence")
inspect(sel.rules[1:5])

## lhs rhs support confidence lift

## 1 {38} => {39} 0.11734 0.6633 1.154

## 2 {38} => {48} 0.09011 0.5094 1.066

## 3 {38} => {41} 0.04420 0.2499 1.474

## 4 {38} => {170} 0.03438 0.1943 5.529

## 5 {38} => {32} 0.03213 0.1816 1.056

# Which products or combinations lead to product 38 (RHS = "38")?
rhs.rules <- apriori(data = rtrans,
                     parameter = list(supp = 0.001, conf = 0.15, minlen = 2),
                     appearance = list(default = "lhs", rhs = "38"),
                     control = list(verbose = FALSE))
rhs.rules <- sort(rhs.rules, decreasing = TRUE, by = "confidence")
inspect(rhs.rules[1:5])
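The appearance argument restricts which items may appear on each side of a rule. The following base-R sketch asks the same two questions for single-item rules only, on made-up baskets. after_item() and before_item() are hypothetical helpers, and unlike the apriori() calls they apply no minimum support or confidence.

```r
# Hypothetical baskets in the same format as mylist
baskets <- list(c("38", "39"), c("38", "39", "48"), c("38", "41"),
                c("39", "48"), c("38", "39", "170"))

# Share of baskets containing every item in 'set'
supp <- function(set) {
  mean(vapply(baskets, function(b) all(set %in% b), logical(1)))
}

# Rules {focus} => {x}: what is bought given 'focus' (cf. lhs = "38")
after_item <- function(focus) {
  others <- setdiff(unique(unlist(baskets)), focus)
  conf <- vapply(others, function(x) supp(c(focus, x)) / supp(focus), numeric(1))
  sort(conf, decreasing = TRUE)
}

# Rules {x} => {focus}: what leads to 'focus' (cf. rhs = "38")
before_item <- function(focus) {
  others <- setdiff(unique(unlist(baskets)), focus)
  conf <- vapply(others, function(x) supp(c(x, focus)) / supp(x), numeric(1))
  sort(conf, decreasing = TRUE)
}

after_item("38")    # 39 comes first, with confidence 0.75
before_item("38")
```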

The itemFrequencyPlot() function from the R package “arules” gives the count and plot of frequently bought products. Products 39 and 48 are the most frequently purchased products (50675 and 42135 orders respectively). This helps us confirm whether the data is as expected.

Market Basket Analysis and Affinity Analysis

## lhs rhs support confidence lift

## 1 {3854} => {38} 0.001066 0.9126 5.159

## 2 {1727} => {38} 0.001838 0.9310 5.263

## 3 {3005} => {38} 0.002212 0.9512 5.377

## 4 {504} => {38} 0.002779 0.8221 4.647

## 5 {2805} => {38} 0.002405 0.9550 5.398

After data preparation and exploratory analysis, we can move on to the main analysis.

Market Basket Analysis (MBA):

The key questions this Market Basket Analysis (MBA) report tries to answer are:

Should we perform market basket analysis at the product level or the category level?

Do we have information on the sequence in which products are bought in a basket or customer visit?

Which products are bought together by customers?

Can we conclude whether sales of product ‘A’ drive sales of product ‘B’?

Which product categories are bought together?

Which product should be recommended, given that a customer has bought a product or a group of products?

The steps used in MBA are:

Identify Rules:

Association rules, or affinities between products bought together, need to be identified from the transaction data.

The R package “arules” is used to find the rules, which can then be filtered by product combination. A rule can cover a group of 2 products, 3 products or more.
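The number of candidate rules grows quickly as groups get larger. The following small base-R sketch uses combn() on one hypothetical basket. It lists the candidate 2-product and 3-product groups, and the rules with a single-item RHS that each 3-product group yields.

```r
# Hypothetical single basket
basket <- c("37", "38", "41", "48")

# All 2-product and 3-product groups that could appear in a rule
pairs   <- combn(basket, 2, simplify = FALSE)
triples <- combn(basket, 3, simplify = FALSE)

length(pairs)    # choose(4, 2) = 6
length(triples)  # choose(4, 3) = 4

# Each 3-product group gives one rule per choice of single-item RHS
triple_rules <- unlist(lapply(triples, function(s) {
  vapply(s, function(r) {
    paste0("{", paste(setdiff(s, r), collapse = ","), "} => {", r, "}")
  }, character(1))
}), use.names = FALSE)
head(triple_rules, 3)
```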

Example of rules:

LHS          RHS    Support  Confidence  Lift
{3854}  =>   {38}   0.001    0.913       5.159
{1045}  =>   {32}   0.001    0.907       5.270
{4030}  =>   {48}   0.001    0.826       1.728
{1473}  =>   {39}   0.001    0.800       1.392
{1727}  =>   {38}   0.002    0.931       5.263

LHS (left-hand side) indicates the first product or item (or group of items) considered for the rule.

RHS (right-hand side) indicates the second product, bought given that the LHS has been bought.

Support, Confidence and Lift show the relative importance of each rule identified.

Evaluate Rules:

Support, Confidence and Lift are the key KPIs for evaluating rules. The importance of each metric is discussed below.

Support: Support indicates the percentage of transactions that contain the product combination, i.e. the transactions that support the rule. It is the number of transactions containing all products in the rule divided by the total number of transactions. This is an important indicator of whether enough transactions back the rule. In the above example, about 0.1% of transactions (support 0.001066) have the “{3854} => {38}” product combination occurring together.
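Support can be computed directly by counting baskets. The following minimal base-R sketch uses hypothetical baskets and a hypothetical support() helper. It also reverses the calculation, turning the reported support of “{3854} => {38}” back into an approximate number of orders, using the 88162 orders counted earlier.

```r
# Support of an itemset: share of baskets that contain all of its items
support <- function(baskets, itemset) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Hypothetical baskets
baskets <- list(c("3854", "38"), c("38", "39"), c("3854", "38", "48"),
                c("39", "48"))
support(baskets, c("3854", "38"))   # 2 of 4 baskets -> 0.5

# Reported support times the number of orders gives an approximate count
round(0.001066 * 88162)             # about 94 orders
```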

Confidence: Confidence is another measure of the quality of an association rule. It is the ratio of the support of the rule to the support of the first product alone. For the rule “{3854} => {38}”, we find the support of the two products being bought together and divide it by the support of the first product (“{3854}”), i.e. how often the customer buys that product at all.

$$\mathrm{Conf}(A \Rightarrow B) = \frac{\mathrm{Sup}(A \cup B)}{\mathrm{Sup}(A)}$$

A rule indicates that product B is bought along with product A. So, to check whether buying product A triggers the purchase of product B, we need to check how often product B is bought when a customer buys product A.
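The confidence formula can be checked on a small example. The following base-R sketch uses hypothetical baskets and hypothetical support() and confidence() helpers. It then shows what the reported support and confidence of “{3854} => {38}” imply about how many orders contain product 3854. That figure is approximate, since both reported values are rounded.

```r
# Support of an itemset: share of baskets that contain all of its items
support <- function(baskets, itemset) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Confidence of A => B: Sup(A U B) / Sup(A)
confidence <- function(baskets, lhs, rhs) {
  support(baskets, c(lhs, rhs)) / support(baskets, lhs)
}

# Hypothetical baskets
baskets <- list(c("3854", "38"), c("3854", "39"), c("3854", "38", "48"),
                c("38", "48"))
confidence(baskets, "3854", "38")   # 2 of the 3 baskets with 3854 contain 38

# Sup({3854}) = Sup({3854, 38}) / Conf, scaled by the 88162 orders
round(0.001066 / 0.9126 * 88162)    # about 103 orders contain 3854
```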

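Lift: Lift measures the importance of a rule. It compares the confidence of a rule against its expected confidence, which is the support of the RHS product on its own:

$$\mathrm{Lift}(A \Rightarrow B) = \frac{\mathrm{Conf}(A \Rightarrow B)}{\mathrm{Sup}(B)}$$

So, a rule with a higher lift is better. A lift close to one means A and B are bought together about as often as expected if they were independent, so the rule adds little information.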
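Lift follows from the same counts. The following base-R sketch uses hypothetical baskets and helpers. It also runs a consistency check on the paper's own figures: dividing the confidence of “{3854} => {38}” by its lift should give roughly the share of orders that contain product 38 in the frequency table.

```r
# Support of an itemset: share of baskets that contain all of its items
support <- function(baskets, itemset) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Lift of A => B: confidence of the rule divided by the support of B
lift <- function(baskets, lhs, rhs) {
  conf <- support(baskets, c(lhs, rhs)) / support(baskets, lhs)
  conf / support(baskets, rhs)
}

# Hypothetical baskets
baskets <- list(c("3854", "38"), c("3854", "39"), c("3854", "38", "48"),
                c("38", "48"))
lift(baskets, "3854", "38")   # (2/3) / (3/4) = 8/9, just below 1

# Check on the paper's numbers for {3854} => {38}
0.9126 / 5.159   # Sup({38}) implied by confidence / lift
15596 / 88162    # share of orders containing 38, from freq[1:20]
```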

Find Rule with high Support Values:

The rules at the top of the support ranking have higher support values, meaning many transactions in the data contain these product combinations. However, the confidence of these rules is below one. For example, suppose product 41 is “Milk” and product 48 is “Flakes” in Big Bazaar.

Find rules with high Confidence values:

Find rules with high Lift values:

Actionable Insights:

“{41, 48} => {39}” and “{170} => {38}”

Conclusion: Based on Support, Confidence and Lift values, we can select a list of rules. These rules have to be analyzed for insights and actions, and they can also suggest new hypotheses. The first type of hypothesis fixes the second product: which products or product combinations were bought by customers who bought a specific product as their second product? For example, the business may want to identify customers who can be targeted for “Product 38”, building the target list from associations between product take-ups. The second type of hypothesis is based on the first product: which product should be targeted when we know the first product selected by a customer?

Reference:

1. Data Source: http://fimi.ua.ac.be/data/retail.dat

INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER SCIENCE AND MANAGEMENT VOL. NO. 00(0), ISSN NO.-2321-8088

Author:

Er. and Mgr. Rohit Dubey

(Data Scientist)
