market basket analysis in retail store using “r” to sustain and … · 2018. 3. 20. · market...
Market Basket Analysis in Retail Store using “R” to sustain and retain old Customers and generate new Customers by using Sales Data
Objective:
Leverage customer transaction data to guide product bundling and promotions,
assortment planning, inventory management, and product placement
in stores.
Data Preparation:
The main data source for a Market Basket Analysis is customer purchase transaction
data. Each purchase slip or bill records the products bought on a customer visit,
along with their quantities, unit prices and total prices.
The transaction table may store information as below:
Order ID  Transaction Date  Product ID  Product Description  Quantity Purchased  Unit Price  Price
11        1 Jan 2014        22          Pepsodent 50gm       2                   12          24
11        1 Jan 2014        54          Babul 50gm           1                   25          50
12        1 Jan 2014        35          Modern Bread         1                   8           25
12        1 Jan 2014        44          Dairy Milk           1                   28          50
12        1 Jan 2014        67          Surf Excel           1                   80          125
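As a rough sketch of how such a table could be reshaped into one basket per order, the rows can be grouped by order ID. The data frame `trans` and its column names are assumptions made for illustration; only the order and product IDs are copied from the sample rows above.

```r
# Hypothetical transaction table; IDs copied from the sample rows above
trans <- data.frame(
  OrderID   = c(11, 11, 12, 12, 12),
  ProductID = c("22", "54", "35", "44", "67"),
  stringsAsFactors = FALSE
)

# Group product codes by order: one character vector per basket
baskets <- split(trans$ProductID, trans$OrderID)
print(baskets)
```

Each list element is then one order, which is the shape that the association functions used later expect.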
OrderID Product Code List
[[1]] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29"
[[2]] "30" "31" "32"
[[3]] "33" "34" "35"
[[4]] "36" "37" "38" "39" "40" "41" "42" "43" "44" "45" "46"
[[5]] "38" "39" "47" "48"
[[6]] "38" "39" "48" "49" "50" "51" "52" "53" "54" "55" "56" "57" "58"
Data Analysis:
Once the data is in the required format, we carry out univariate or exploratory
analysis so that we understand what is going on.
Some of the typical questions we will try to answer based on Market Basket Analysis are:
How many distinct visits are there?
What is the typical number of products purchased by a customer in an order or a
visit?
How many different SKUs (stock keeping units) are sold in a week or
month?
Which are the most frequent items or products?
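Each of these questions maps to a short summary over a list of baskets. The sketch below uses a small made-up list (`baskets` and its contents are hypothetical) and base R only, so the numbers it produces say nothing about the real dataset.

```r
# Hypothetical toy baskets; each element is one order/visit
baskets <- list(
  c("38", "39", "48"),
  c("39", "48"),
  c("38", "39", "47", "48"),
  c("36")
)

n_visits   <- length(baskets)                  # distinct visits
typical_sz <- median(lengths(baskets))         # typical basket size
n_skus     <- length(unique(unlist(baskets)))  # distinct SKUs sold
item_freq  <- sort(table(unlist(baskets)), decreasing = TRUE)  # most frequent items first

print(c(visits = n_visits, typical_size = typical_sz, skus = n_skus))
print(item_freq)
```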
I will try to answer these Market Basket Data Analysis questions using a sample dataset and R.
No. of Products in Order   % of Orders
0-1                        3%
2-5                        30%
6-10                       29%
11-15                      17%
16-20                      10%
21-30                      8%
31-40                      2%
41-High                    1%
Based on the above analysis, about 60% of the orders or visits contain between 2 and 10 products. The next important question is which products are bought most frequently in customer baskets. We can use the R function itemFrequencyPlot() to get the count and a plot of frequently bought products. This function is part of the R package “arules”, which must be installed before it can be used on the data collected for Market Basket Analysis.
Install Required Packages:
For Market Basket or Association Analysis, arules and arulesViz have to be installed and loaded first.
# Install once if needed: install.packages(c("arules", "arulesViz"))
# Load Libraries
library(arules)
library(arulesViz)
Read Data for Market Basket Analysis:
Grocery store data in .dat format, collected from http://fimi.ua.ac.be/data/retail.dat, is used as the sample data for market basket analysis in this research paper. I made some changes to that file.
# Read one transaction per line and split it into product codes
fc <- file("http://fimi.ua.ac.be/data/retail.dat")
mylist <- strsplit(readLines(fc), " ")
close(fc)
head(mylist)
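To see what the split step produces without downloading anything, the same calls can be run on a few in-memory lines. This is only a sketch: `sample_lines` is a hypothetical stand-in for the file, with one space-separated transaction per line, and `textConnection()` replaces the URL connection.

```r
# Hypothetical stand-in for the .dat file: one transaction per line
sample_lines <- c("0 1 2", "30 31 32", "38 39 47 48")

con <- textConnection(sample_lines)
demo_list <- strsplit(readLines(con), " ")  # one character vector per transaction
close(con)

print(demo_list)
print(lengths(demo_list))  # number of products in each transaction
```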
Exploratory Data Analysis
One of the first steps is to check the basic information about the data. This helps us catch any issues with how the data was created or read, and it also builds our understanding of the data.
[[1]]
[1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13"
[15] "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27"
[29] "28" "29"
[[2]]
[1] "30" "31" "32"
[[3]]
[1] "33" "34" "35"
[[4]]
[1] "36" "37" "38" "39" "40" "41" "42" "43" "44" "45" "46"
# Number of orders
noOrder <- length(mylist)
noOrder
## [1] 88162
Counting number of items in each of these orders:
Creating Table:
# Items in a basket
prdCount <- sapply(mylist, length)
# Max number of items in an order/visit
max(prdCount)
## [1] 76
# Group product counts into bins
grp <- cut(prdCount, breaks = c(0, 1, 5, 10, 15, 20, 30, 40, 80))
class(grp)
## [1] "factor"
# Count orders per bin (the chart and the percentages below use this table)
t <- table(grp)
xl <- c("0-1", "2-5", "6-10", "11-15", "16-20", "21-30", "31-40", "41-High")
# Bar chart on product count in a basket
barplot(t, xlab = "Product Counts",
        ylab = "# of Orders",
        main = "Number of Products in each Order",
        names.arg = xl,
        col = "blue",
        border = "white")
## Table with % of orders
prop.table(t) * 100
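The binning step is easier to follow on a handful of values. The sketch below reuses the break points and labels from the code above on four made-up basket sizes (`sizes` is hypothetical). By default `cut()` builds right-closed intervals, so a basket of exactly 10 products falls in the 6-10 bin.

```r
# Same break points and labels as in the analysis above
breaks <- c(0, 1, 5, 10, 15, 20, 30, 40, 80)
xl <- c("0-1", "2-5", "6-10", "11-15", "16-20", "21-30", "31-40", "41-High")

# Hypothetical basket sizes, chosen to hit bin edges
sizes <- c(1, 3, 10, 76)
bins <- cut(sizes, breaks = breaks, labels = xl)
print(as.character(bins))

# Share of orders per bin, in percent
pct <- prop.table(table(bins)) * 100
print(pct)
```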
Convert the data into the “transactions” class before using association analysis
functions:
Frequency Plot:
## Convert into "transactions" class
rtrans <- as(mylist, "transactions")
## Get absolute item frequencies, most frequent first
freq <- itemFrequency(rtrans, type = "absolute")
freq <- sort(freq, decreasing = TRUE)
freq[1:20]
## 39 48 38 32 41 65 89 225 170 237 36 110
## 50675 42135 15596 15167 14945 4472 3837 3257 3099 3032 2936 2794
## 310 101 475 271 413 438 1327 147
## 2594 2237 2167 2094 1880 1863 1786 1779
# Frequency plot
itemFrequencyPlot(rtrans,
topN=20,
type="absolute",
xlab="Products",
ylab="Frequency of Product Sale",
main="Sale Frequency of Each Product",
col="red",border="white")
Association or Market basket Analysis Rules:
bel.rules <- apriori(rtrans, parameter = list(supp = 0.001, conf = 0.8))
##
## parameter specification:
## confidence minval smax arem aval originalSupport support minlen maxlen
## 0.8 0.1 1 none FALSE TRUE 0.001 1 10
## target ext
## rules FALSE
##
## algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## apriori - find association rules with the apriori algorithm
## version 4.21 (2004.05.09) (c) 1996-2004 Christian Borgelt
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[16470 item(s), 88162 transaction(s)] done [0.12s].
## sorting and recoding items ... [2117 item(s)] done [0.01s].
## creating transaction tree ... done [0.05s].
## checking subsets of size 1 2 3 4 5 6 done [0.13s].
## writing ... [711 rule(s)] done [0.01s].
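The supp = 0.001 threshold can be read as an absolute number of transactions. The lines below are plain arithmetic on the figures in the log above, not a statement about how arules rounds internally: with 88162 transactions, 88 occurrences would give a relative support just under 0.001, so an itemset needs at least 89.

```r
n_trans  <- 88162   # transactions reported in the log above
min_supp <- 0.001   # supp value passed to apriori()

# Smallest whole count whose relative support reaches the threshold
min_count <- ceiling(n_trans * min_supp)
print(min_count)

# Check the neighbouring counts against the threshold
print(c(below = 88 / n_trans, at_min = min_count / n_trans))
```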
# number of rules
length(bel.rules)
## [1] 711
# Inspect the first five rules
inspect(bel.rules[1:5])
## lhs rhs support confidence lift
## 1 {3854} => {38} 0.001066 0.9126 5.159
## 2 {1045} => {32} 0.001100 0.9065 5.270
## 3 {4030} => {48} 0.001021 0.8257 1.728
## 4 {1473} => {39} 0.001225 0.8000 1.392
## 5 {1727} => {38} 0.001838 0.9310 5.263
Association rules built by the apriori algorithm can be ranked and selected by support, confidence or lift, as follows:
Sample Association Rules:
#------------- MBA: Select Rules-----------------------
# support
s <- sort(bel.rules, by="support", decreasing=TRUE)
inspect(s[1:10])
## lhs rhs support confidence lift
## 1 {41,
## 48} => {39} 0.08355 0.8168 1.421
## 2 {170} => {38} 0.03438 0.9781 5.529
## 3 {36} => {38} 0.03165 0.9503 5.372
## 4 {110} => {38} 0.03091 0.9753 5.513
## 5 {170,
## 39} => {38} 0.02290 0.9806 5.543
## 6 {38,
## 41,
## 48} => {39} 0.02258 0.8387 1.459
## 7 {36,
## 39} => {38} 0.02206 0.9548 5.398
## 8 {110,
## 39} => {38} 0.01974 0.9892 5.592
## 9 {170,
## 48} => {38} 0.01745 0.9878 5.584
## 10 {225,
## 48} => {39} 0.01588 0.8065 1.403
# confidence
c <- sort(bel.rules, by="confidence", decreasing=TRUE)
inspect(c[1:10])
## lhs rhs support confidence lift
## 1 {32,
## 840} => {38} 0.001032 1 5.653
## 2 {32,
## 371} => {38} 0.001372 1 5.653
## 3 {170,
## 438} => {38} 0.001168 1 5.653
## 4 {310,
## 36} => {38} 0.001044 1 5.653
## 5 {170,
## 225} => {38} 0.001463 1 5.653
## 6 {32,
## 47,
## 48} => {38} 0.001214 1 5.653
## 7 {371,
## 41,
## 48} => {38} 0.001146 1 5.653
## 8 {32,
## 37,
## 48} => {38} 0.001384 1 5.653
## 9 {32,
## 37,
## 39} => {38} 0.001554 1 5.653
## 10 {37,
## 41,
## 48} => {38} 0.001951 1 5.653
Sample Association Rule:
# lift
l <- sort(bel.rules, by="lift", decreasing=TRUE)
inspect(l[1:10])
# Sample Association Rules
## lhs rhs support confidence lift
## 1 {1818,
## 3311,
## 795} => {1819} 0.001089 0.9057 318.1
## 2 {1818,
## 1819,
## 795} => {3311} 0.001089 0.8276 302.7
## 3 {3311,
## 795} => {1819} 0.001407 0.8435 296.3
## 4 {1818,
## 1819,
## 3311} => {795} 0.001089 0.8421 295.8
## 5 {1818,
## 3311} => {1819} 0.001293 0.8143 286.0
## 6 {1818,
## 1819} => {795} 0.001316 0.8000 281.0
## 7 {1080,
## 1378} => {1379} 0.001078 0.8120 252.9
## 8 {1379,
## 1380} => {309} 0.001044 0.8214 234.4
## 9 {16430,
## 41} => {16431} 0.001180 0.8595 215.9
## 10 {16430,
## 39,
## 48} => {16431} 0.001202 0.8548 214.7
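The effect of ranking by different measures can also be pictured on a plain data frame. The sketch below copies the first five rules from the earlier inspect(bel.rules[1:5]) output into a hypothetical data frame `rules` and orders it with base R. The cut-offs used in the filter are arbitrary values chosen for illustration.

```r
# First five rules, copied from the inspect(bel.rules[1:5]) output
rules <- data.frame(
  lhs        = c("{3854}", "{1045}", "{4030}", "{1473}", "{1727}"),
  rhs        = c("{38}", "{32}", "{48}", "{39}", "{38}"),
  support    = c(0.001066, 0.001100, 0.001021, 0.001225, 0.001838),
  confidence = c(0.9126, 0.9065, 0.8257, 0.8000, 0.9310),
  lift       = c(5.159, 5.270, 1.728, 1.392, 5.263),
  stringsAsFactors = FALSE
)

# The top rule depends on which measure is used for ranking
by_support <- rules[order(rules$support, decreasing = TRUE), ]
by_lift    <- rules[order(rules$lift, decreasing = TRUE), ]

# Hypothetical cut-offs: keep rules that are both reliable and informative
strong <- subset(rules, confidence >= 0.9 & lift > 2)

print(by_support$lhs[1])
print(by_lift$lhs[1])
print(strong)
```

Here the rule with the highest support is not the one with the highest lift, which is why the three sorted views above surface different rules.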
Product Combinations:
## Given the first product (lhs = 38), which product is bought second?
sel.rules <- apriori(data = rtrans, parameter = list(supp = 0.001, conf = 0.15, minlen = 2),
                     appearance = list(default = "rhs", lhs = "38"),
                     control = list(verbose = FALSE))
sel.rules <- sort(sel.rules, decreasing = TRUE, by = "confidence")
inspect(sel.rules[1:5])
## lhs rhs support confidence lift
## 1 {38} => {39} 0.11734 0.6633 1.154
## 2 {38} => {48} 0.09011 0.5094 1.066
## 3 {38} => {41} 0.04420 0.2499 1.474
## 4 {38} => {170} 0.03438 0.1943 5.529
## 5 {38} => {32} 0.03213 0.1816 1.056
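## Given the second product (rhs = 38), which products lead to it?
rhs.rules <- apriori(data = rtrans, parameter = list(supp = 0.001, conf = 0.15, minlen = 2),
                     appearance = list(default = "lhs", rhs = "38"),
                     control = list(verbose = FALSE))
# Sort into a new object rather than overwriting sel.rules from the previous step
rhs.sorted <- sort(rhs.rules, decreasing = TRUE, by = "confidence")
# First five rules in generation order; use rhs.sorted[1:5] for the highest-confidence rules
inspect(rhs.rules[1:5])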
We can use the R function itemFrequencyPlot() here to get the count and a plot of
frequently bought products. This function is part of the R package “arules”.
Here, we can see that products 39 and 48 are the most frequently purchased
products. This helps us confirm whether the data matches expectations.
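To put those counts in proportion, the absolute frequencies reported earlier can be divided by the number of orders. This is a small arithmetic sketch using the counts for products 39, 48 and 38 and the total of 88162 orders from the output above; the variable names are illustrative.

```r
n_trans <- 88162  # number of orders reported earlier

# Absolute frequencies of the three most frequent products, from the freq output
top_counts <- c("39" = 50675, "48" = 42135, "38" = 15596)

# Percentage of orders that contain each product
share <- round(100 * top_counts / n_trans, 1)
print(share)
```

On these figures, product 39 appears in well over half of all orders and product 48 in nearly half, so rules that end in 39 or 48 start from a high baseline.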
Market Basket Analysis and Affinity Analysis
## lhs rhs support confidence lift
## 1 {3854} => {38} 0.001066 0.9126 5.159
## 2 {1727} => {38} 0.001838 0.9310 5.263
## 3 {3005} => {38} 0.002212 0.9512 5.377
## 4 {504} => {38} 0.002779 0.8221 4.647
## 5 {2805} => {38} 0.002405 0.9550 5.398
After data preparation and exploratory analysis, we can shift to the main analysis,
which is Market Basket Analysis.
Market Basket Analysis (MBA):
The key questions this Market Basket Analysis (MBA) report tries to answer are:
Should we perform market basket analysis at a product level or a category level?
Do we have information on the sequence in which products are bought in a basket or
customer visit?
Which products are bought together by customers?
Can we conclude that sales of product 'A' drive sales of product 'B'?
What product categories are bought together?
What product should be recommended given that a customer has bought a product or
a group of products?
The steps used in MBA are:
Identify Rules:
Association rules, or affinities between products bought together, need to be identified
from transactional data.
The R package “arules” is used to find rules. Rules can be identified and filtered based
on product combinations. We can have rules for groups of 2 products, 3 products or
more products.
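For intuition about what identifying 2-product combinations involves, the sketch below counts every product pair in a few made-up baskets using base R. This is a brute-force illustration on hypothetical data, not the Apriori algorithm that arules implements, which prunes infrequent itemsets instead of enumerating all pairs.

```r
# Hypothetical toy baskets
baskets <- list(
  c("38", "39", "48"),
  c("39", "48"),
  c("38", "39", "41", "48"),
  c("36", "38")
)

# List every unordered product pair that occurs within a basket
pairs <- unlist(lapply(baskets, function(b) {
  b <- sort(unique(b))
  if (length(b) < 2) return(character(0))
  combn(b, 2, FUN = paste, collapse = ",")
}))

# Share of baskets containing each pair, most common first
pair_support <- sort(table(pairs) / length(baskets), decreasing = TRUE)
print(pair_support)
```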
Example of rules
LHS       RHS    Support  Confidence  Lift
{3854} => {38}   0.001    0.913       5.159
{1045} => {32}   0.001    0.907       5.270
{4030} => {48}   0.001    0.826       1.728
{1473} => {39}   0.001    0.800       1.392
{1727} => {38}   0.002    0.931       5.263
LHS (left-hand side) indicates the first product or item considered for the rule.
RHS (right-hand side) indicates the second product bought when the first product (LHS) is given.
Support, Confidence and Lift show the relative importance of each rule identified.
Evaluate Rules:
Support, Confidence and Lift are the key KPIs for evaluating rules, and we will discuss
the importance of each of these metrics.
Support: Support indicates the percentage of transactions that contain a product
combination, that is, the share of transactions supporting the rule. This is an
important indicator to check whether there are enough transactions in support
of the rule. In the above example, a support of 0.001 means that about 0.1% of
transactions have the “{3854} => {38}” product combination occurring together.
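As a minimal sketch of this definition, support can be computed by hand as the fraction of baskets that contain every item of a combination. The baskets and the helper `support()` below are hypothetical and written in base R; they are not part of arules.

```r
# Hypothetical toy baskets
baskets <- list(
  c("A", "B"),
  c("A", "B", "C"),
  c("A"),
  c("B", "C"),
  c("A", "B")
)

# Fraction of baskets that contain all of the given items
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

supp_ab <- support(c("A", "B"), baskets)  # A and B together: 3 of 5 baskets
supp_a  <- support("A", baskets)          # A alone: 4 of 5 baskets
print(c(supp_ab = supp_ab, supp_a = supp_a))
```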
Confidence: Confidence is another measure of the quality of association rules.
It is the ratio of the support for the rule to the support for the first product
alone. For the rule “{3854} => {38}”, we find the support of these two products
being bought together and also how often the first product (“{3854}”) is bought
by customers.
Conf(R) = Sup(A ∪ B) / Sup(A), where R is the rule A => B
A rule indicates that Product B is bought along with Product A. So, to judge whether buying
product A triggers the purchase of product B, we need to check the number of times
product B is bought when a customer buys product A.
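The formula can be checked against the figures reported for “{3854} => {38}”. The counts below are back-calculated from the rounded support (0.001066) and confidence (0.9126) shown earlier, so they are approximate and should be read as a consistency check rather than exact values from the data.

```r
n_trans   <- 88162     # transactions in the dataset
supp_rule <- 0.001066  # reported support of {3854} => {38}
conf_rule <- 0.9126    # reported confidence of the same rule

# Approximate number of baskets containing both 3854 and 38
n_both <- round(supp_rule * n_trans)

# Approximate number of baskets containing 3854, since Sup(A) = Sup(A u B) / Conf
n_lhs <- round(supp_rule / conf_rule * n_trans)

print(c(n_both = n_both, n_lhs = n_lhs, confidence = n_both / n_lhs))
```

So roughly 94 of the about 103 baskets that contain product 3854 also contain product 38, which reproduces the reported confidence of about 0.91.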
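Lift: Lift measures the importance of a rule. It compares the confidence of a rule against
the expected confidence, which is the support of the RHS product on its own. A rule with a
higher Lift value is therefore stronger. A lift value close to one indicates that the two
sides are nearly independent, so the rule adds little information.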
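Lift can be reproduced from numbers already shown in this analysis. The sketch below takes the reported confidence of “{3854} => {38}” and divides it by the share of orders containing product 38, using the absolute frequency of 15596 from the frequency table. Inputs are rounded, so the result matches the reported lift only to about three decimals.

```r
n_trans   <- 88162   # number of orders
count_38  <- 15596   # absolute frequency of product 38 from the freq output
conf_rule <- 0.9126  # reported confidence of {3854} => {38}

# Expected confidence: how often 38 is bought regardless of the LHS
supp_rhs <- count_38 / n_trans

# Lift = confidence / expected confidence
lift <- conf_rule / supp_rhs
print(round(c(supp_rhs = supp_rhs, lift = lift), 3))
```

Customers who buy product 3854 are therefore about five times as likely to buy product 38 as an average customer, in line with the lift of 5.159 in the rule output.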
Find Rules with high Support values:
Rules such as “{41, 48} => {39}” and “{170} => {38}” have higher support values,
meaning many transactions in the data contain these product combinations.
But the confidence level for these rules is lower than one. Suppose, for illustration, that
product 41 is “Milk” and product 48 is “Flakes” in Big Bazaar.
Find rules with high Confidence values:
Find rules with high Lift values:
Actionable Insights:
“{41, 48} => {39}” and “{170} => {38}”
Conclusion: Based on Support, Confidence and Lift values, we can select a list of rules.
These rules have to be analyzed for insights and actions, and they can also suggest new
hypotheses. The first type of hypothesis asks which products or product combinations are
bought by customers who go on to buy a specific product as the second product. For example,
the business may want to identify customers who can be targeted for “Product 38”, building
the target list from the associations between product purchases. The second
type of hypothesis starts from the first product selected: which product should be targeted
when we know the first product chosen by a customer?
Reference:
1. Data Source: http://fimi.ua.ac.be/data/retail.dat
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER SCIENCE AND MANAGEMENT VOL. NO. 00(0), ISSN NO.-2321-8088
Author:
Er. and Mgr. Rohit Dubey
(Data Scientist)