Discover the Association Rules of Different Patterns
Xuemei Fan
Introduction
Description of the dataset
State the problems
Describe the method used in this project
Show the results
Analyze the results
Description of the dataset
The dataset is about 1 GB
Each transaction includes one user id, one date and one item
There are about 5.5 million transactions
001 11/1/2007 Carrot
001 11/1/2007 Banana
002 2/1/2007 Lettuce
002 4/1/2007 Icecream
001 4/1/2007 Broccoli
003 4/1/2007 Potato
State the problems
Discover the association rules from different patterns
Find out the dietary habits in different areas
Determine whether the association rules from different patterns are related
Describe the method used in this project
Filter Data
Group Data
Remove Duplicate Data
Extract the Regular Customer
Generate Association Rule
Filter Data
Top 3000 list
Filter data based on the list
Calculate the frequency of different items
Filter Data
001 4/1/2007 Banana
001 11/1/2007 Dog food
001 11/1/2007 Carrot
001 11/1/2007 Banana
002 2/1/2007 Lettuce
002 4/1/2007 Icecream
001 4/1/2007 Broccoli
003 4/1/2007 Potato
Banana
Carrot
...
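The filtering step could be sketched as below. This is a minimal illustration, not the project's code: it assumes each transaction is a (user_id, date, item) tuple and that the top list is derived from item frequencies in the same data. The function names and the small `top_n` used in the example are hypothetical; the slides use a top 3000 list.

```python
from collections import Counter

def item_frequencies(transactions):
    """Count how often each item appears across all transactions."""
    return Counter(item for _, _, item in transactions)

def top_items(counts, top_n):
    """Return the set of the top_n most frequent items."""
    return {item for item, _ in counts.most_common(top_n)}

def filter_by_list(transactions, allowed):
    """Keep only transactions whose item is on the allowed list."""
    return [t for t in transactions if t[2] in allowed]

# Rows taken from the slide above.
rows = [
    ("001", "4/1/2007", "Banana"),
    ("001", "11/1/2007", "Dog food"),
    ("001", "11/1/2007", "Carrot"),
    ("001", "11/1/2007", "Banana"),
    ("002", "2/1/2007", "Lettuce"),
    ("002", "4/1/2007", "Icecream"),
    ("001", "4/1/2007", "Broccoli"),
    ("003", "4/1/2007", "Potato"),
]

counts = item_frequencies(rows)
allowed = top_items(counts, top_n=1)   # the slides use 3000; 1 keeps the demo small
filtered = filter_by_list(rows, allowed)
print(filtered)
```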
Group Data
Group the data that have the same user_id and date
{001, 4/1/2007, Banana, Banana, Broccoli}
(items ordered from most frequent to least frequent)
001 4/1/2007 Banana
001 11/1/2007 Carrot
001 11/1/2007 Carrot
001 4/1/2007 Banana
002 2/1/2007 Lettuce
002 4/1/2007 Icecream
001 4/1/2007 Broccoli
003 4/1/2007 Potato
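A possible sketch of the grouping step, assuming the same (user_id, date, item) tuples as the rows above. The function name is hypothetical, and the sketch keeps items in input order rather than sorting them by frequency.

```python
from collections import defaultdict

def group_by_user_and_date(transactions):
    """Collect the items of all rows that share the same (user_id, date)."""
    groups = defaultdict(list)
    for user_id, date, item in transactions:
        groups[(user_id, date)].append(item)
    return dict(groups)

# Rows taken from the slide above.
rows = [
    ("001", "4/1/2007", "Banana"),
    ("001", "11/1/2007", "Carrot"),
    ("001", "11/1/2007", "Carrot"),
    ("001", "4/1/2007", "Banana"),
    ("002", "2/1/2007", "Lettuce"),
    ("002", "4/1/2007", "Icecream"),
    ("001", "4/1/2007", "Broccoli"),
    ("003", "4/1/2007", "Potato"),
]

groups = group_by_user_and_date(rows)
print(groups[("001", "4/1/2007")])
```

The group for user 001 on 4/1/2007 still contains Banana twice, which is what the next step removes.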
Remove Duplicate Data
Remove the duplicate items within the same group
{001, 4/1/2007, Banana, Banana, Broccoli}
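Removing duplicates inside one group might look like this. It is a sketch under the assumption that a group is a list of item names; the function name is hypothetical.

```python
def remove_duplicate_items(items):
    """Drop repeated items while keeping the first-seen order."""
    # dict keys are unique and preserve insertion order
    return list(dict.fromkeys(items))

group = ["Banana", "Banana", "Broccoli"]   # the group from the slide
deduplicated = remove_duplicate_items(group)
print(deduplicated)
```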
Extract the Regular Customer
001 4/1/2007 Banana, Broccoli
001 11/1/2007 Banana, Carrot
001 11/8/2007 Carrot, Broccoli, Lettuce
002 4/1/2007 Banana, Lettuce
006 2/1/2007 Lettuce
002 4/1/2007 Banana, Lettuce, Icecream
..
..
..
>=24
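One way to read the ">=24" threshold is that a regular customer has at least 24 grouped transactions. The slides do not say what the number counts, so this reading is an assumption, as are the function name and the (user_id, date, items) layout in the sketch below.

```python
from collections import Counter

def split_regular_customers(baskets, min_transactions=24):
    """Split baskets into those of regular and non-regular customers.

    A customer is treated as regular when they have at least
    min_transactions baskets (assumed meaning of the slide's >=24).
    """
    visits = Counter(user_id for user_id, _, _ in baskets)
    regular = [b for b in baskets if visits[b[0]] >= min_transactions]
    non_regular = [b for b in baskets if visits[b[0]] < min_transactions]
    return regular, non_regular

# Baskets taken from the slide above.
baskets = [
    ("001", "4/1/2007", ["Banana", "Broccoli"]),
    ("001", "11/1/2007", ["Banana", "Carrot"]),
    ("001", "11/8/2007", ["Carrot", "Broccoli", "Lettuce"]),
    ("002", "4/1/2007", ["Banana", "Lettuce"]),
    ("006", "2/1/2007", ["Lettuce"]),
    ("002", "4/1/2007", ["Banana", "Lettuce", "Icecream"]),
]

# A threshold of 3 is used here only because the sample is tiny.
regular, non_regular = split_regular_customers(baskets, min_transactions=3)
print(len(regular), len(non_regular))
```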
Generate Association Rule
Generate the association rules with the FP-Tree algorithm
Support = 5.0
Confidence = 20.0
UserID: 6008944149905810
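To show how the two thresholds act on the baskets, here is a small sketch. It does not implement the FP-Tree algorithm: it swaps in brute-force itemset counting, which is only practical for tiny inputs. It also assumes that Support and Confidence are percentages and limits rules to a single item on the right-hand side. The function name and `max_size` parameter are hypothetical.

```python
from collections import Counter
from itertools import combinations

def mine_rules(baskets, min_support=5.0, min_confidence=20.0, max_size=3):
    """Brute-force association rules (not FP-Tree); thresholds in percent."""
    n = len(baskets)
    counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))
        # count every itemset of up to max_size items in this basket
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1

    rules = []
    for itemset, count in counts.items():
        support = 100.0 * count / n
        if len(itemset) < 2 or support < min_support:
            continue
        for consequent in itemset:
            antecedent = tuple(i for i in itemset if i != consequent)
            # confidence = support(itemset) / support(antecedent)
            confidence = 100.0 * count / counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules, key=lambda r: -r[3])

# Baskets taken from the "Extract the Regular Customer" slide.
baskets = [
    ["Banana", "Broccoli"],
    ["Banana", "Carrot"],
    ["Carrot", "Broccoli", "Lettuce"],
    ["Banana", "Lettuce"],
    ["Lettuce"],
    ["Banana", "Lettuce", "Icecream"],
]

rules = mine_rules(baskets, min_support=5.0, min_confidence=20.0)
for antecedent, consequent, support, confidence in rules[:5]:
    print(antecedent, "-->", consequent, f"{support:.1f}% {confidence:.1f}%")
```

An FP-Tree avoids enumerating every itemset per basket, which matters at the scale of the real dataset.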
The Results--Filter Data
Filter dataset based on top 3000 list
Split dataset based on area (Burnside, Elizabeth)
Burnside contains 1.9 million records
Elizabeth has 1.5 million records
Top 50 Frequent Items in Burnside
BAKERY SNACKS
BISCUITS & COOKIES
CHILLED SPREADS
COOKED HARD VEG
COOKED SOFT VEG
DESSERTS (FV)
DY MILK
EGGS
JUICES/DRINKS
SLICED MEATS
SOFTDRINKS
Top 50 Frequent Items in Elizabeth
BAKERY SNACKS
BISCUITS & COOKIES
CHILLED SPREADS
COOKED SOFT VEG
DESSERTS (FV)
DY MILK
EGGS
SLICED MEATS
SOFTDRINKS
BEEF
BAKERY BOUGHT IN
CHILLED DESSERTS
CONVENIENCE MEALS
POULTRY
SNACKS
The Results--Group Data
247292 grouped transactions in Burnside
165480 transactions in Elizabeth
The largest single purchase by a customer is 29 items in Burnside
The largest single purchase by a customer is 51 items in Burnside
The Results--Regular Customers
Shop 462
total: 165480
regular customers: 1994
regular transactions: 92978
non-regular customers: 17448
non-regular transactions: 72502
Shop 453
total: 247292
regular customers: 3200
regular transactions: 171009
non-regular customers: 20227
non-regular transactions: 76283
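The shares behind these counts can be checked with a few lines of arithmetic. The sketch below uses only the figures on this slide; the dictionary layout and key names are hypothetical.

```python
shops = {
    "462": {"total": 165480,
            "regular_customers": 1994, "regular_transactions": 92978,
            "non_regular_customers": 17448, "non_regular_transactions": 72502},
    "453": {"total": 247292,
            "regular_customers": 3200, "regular_transactions": 171009,
            "non_regular_customers": 20227, "non_regular_transactions": 76283},
}

summary = {}
for shop, s in shops.items():
    customers = s["regular_customers"] + s["non_regular_customers"]
    transactions = s["regular_transactions"] + s["non_regular_transactions"]
    summary[shop] = {
        # does regular + non-regular add up to the reported total?
        "totals_match": transactions == s["total"],
        "regular_customer_share": 100.0 * s["regular_customers"] / customers,
        "regular_transaction_share": 100.0 * s["regular_transactions"] / s["total"],
    }

for shop, row in summary.items():
    print(f"Shop {shop}: "
          f"{row['regular_customer_share']:.1f}% of customers are regular, "
          f"{row['regular_transaction_share']:.1f}% of transactions are regular")
```

In both shops, regular customers are a small part of the customer base but account for more than half of the grouped transactions.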
The Results--FP-Tree
Top 10 Association Rules in Burnside
{AVOCADO, BROCCOLI} --> BANANAS 53.03%
{CUCUMBERS, BROCCOLI} --> BANANAS
APPLES --> BANANAS
PEARS --> BANANAS
{AVOCADO, ONIONS} --> BANANAS
ORANGES --> BANANAS
{BROCCOLI, ONIONS} --> BANANAS
{BROCCOLI, ZUCCHINI} --> BANANAS
MANDARINS --> BANANAS
The Results--FP-Tree
Top 10 Association Rules in Elizabeth
{CUCUMBERS, BROCCOLI} --> BANANAS
PEARS --> BANANAS
ORANGES --> BANANAS
GRAPE --> BANANAS
{BROCCOLI, SMART BUY CARROTS} --> BANANAS
WATERMELON --> BANANAS
{CUCUMBERS, SMART BUY CARROTS} --> BANANAS
STRAWBERRIES --> BANANAS
CARROTS --> BANANAS
{BANANAS, LETTUCE} --> CUCUMBERS
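One simple way to look for relations between the rules of the two areas is to intersect the two rule lists. The sketch below uses only the rules listed on the two slides; the representation of a rule as (antecedent set, consequent) is an assumption made for illustration.

```python
def rule(antecedent, consequent):
    """Represent a rule as (frozenset of antecedent items, consequent item)."""
    return (frozenset(antecedent), consequent)

burnside = {
    rule({"AVOCADO", "BROCCOLI"}, "BANANAS"),
    rule({"CUCUMBERS", "BROCCOLI"}, "BANANAS"),
    rule({"APPLES"}, "BANANAS"),
    rule({"PEARS"}, "BANANAS"),
    rule({"AVOCADO", "ONIONS"}, "BANANAS"),
    rule({"ORANGES"}, "BANANAS"),
    rule({"BROCCOLI", "ONIONS"}, "BANANAS"),
    rule({"BROCCOLI", "ZUCCHINI"}, "BANANAS"),
    rule({"MANDARINS"}, "BANANAS"),
}

elizabeth = {
    rule({"CUCUMBERS", "BROCCOLI"}, "BANANAS"),
    rule({"PEARS"}, "BANANAS"),
    rule({"ORANGES"}, "BANANAS"),
    rule({"GRAPE"}, "BANANAS"),
    rule({"BROCCOLI", "SMART BUY CARROTS"}, "BANANAS"),
    rule({"WATERMELON"}, "BANANAS"),
    rule({"CUCUMBERS", "SMART BUY CARROTS"}, "BANANAS"),
    rule({"STRAWBERRIES"}, "BANANAS"),
    rule({"CARROTS"}, "BANANAS"),
    rule({"BANANAS", "LETTUCE"}, "CUCUMBERS"),
}

common = burnside & elizabeth          # rules listed for both areas
only_burnside = burnside - elizabeth
only_elizabeth = elizabeth - burnside

for antecedent, consequent in sorted(common, key=lambda r: sorted(r[0])):
    print(", ".join(sorted(antecedent)), "-->", consequent)
```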
Analysis--Nutrition Condition
Burnside
Better purchase habits
Prefer healthy food over unhealthy food
The top association rules
• Fruit with fruit
• Fruit with vegetables
• Vegetables with vegetables
• Milk with fruit
Analysis--Nutrition Condition
Elizabeth
Purchase habits are more varied
More purchases of soft drinks than in Burnside
The confidence of the healthy food associations is lower than in Burnside
Analysis--FP-Tree
Burnside
Analysis--FP-Tree
Elizabeth
Conclusion
The association rules generated from regular customers have strong relations
If regular customers are the majority in the all-customers dataset, the association rules have stronger relations than those from non-regular customers
Thanks