maximal frequent itemset mining - ut · –leiame kõik transactionid, kus d ei esine –lahutame a...
TRANSCRIPT
![Page 1: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/1.jpg)
Maximal Frequent Itemset Mining
Oskar Gross
Karl Blum
![Page 2: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/2.jpg)
Maximal Frequent Itemsets
• Has no superset that is frequent
• MFI ⊆ FI
• Example:– Items: a, b, c, d, e
– Frequent Itemset: {a, b, c}
– {a, b, c, d}, {a, b, c, e}, {a, b, c, d, e} are not Frequent Itemset.
– Maximal Frequent Itemsets: {a, b, c}
![Page 3: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/3.jpg)
Lexicographic ordering
G
Transaction Head Tail
P 1 2,3,4
G 2,3 4
![Page 4: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/4.jpg)
MAFIA• Problem of mining frequent itemsets viewed as
finding a cut through itemset lattice– All items above cut are frequent itemsets
– All items below cut are infrequent itemsets
• Depth first traversal
• Effective pruning
![Page 5: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/5.jpg)
Algorithmic Components
• Depth-first traversal
• Search space pruning
– PEP
– FHUT
– HUTMI
– Dynamic reordering
![Page 6: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/6.jpg)
Search Space Pruning - PEP
• Parent Equivalence Pruning
• Given current node in itemset tree with head x and tail element y, t(x) ⊆ t(y) means any transaction containing x also contains y
• Since we only want maximal frequentitemsets, we can move y to the head if t(x) ⊆t(y) holds
![Page 7: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/7.jpg)
Search Space Pruning - FHUT
• Frequent Head Union Tail
• For a node n, the largest possible frequentitemset contained in subtree rooted at n is n’sHUT (Head Union Tail).
• If n’s HUT is found to be frequent, do notexplore any subsets of the HUT.
• The subtree rooted at n can be pruned away.
![Page 8: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/8.jpg)
Search Space Pruning - HUTMFI
• Head Union Tail Maximal Frequent Itemset
• If a superset of HUT for the current node isalready in the MFI, then the HUT is frequent.
• The subtree rooted at this node can be pruned away.
![Page 9: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/9.jpg)
Search Space Pruning – DynamicReordering
• Tail of a node such that it only containsfrequent extensions of the current node
• Tail elements are ordered by increasingsupport (keeps search space as small aspossible).
![Page 10: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/10.jpg)
GenMax
• Optimizing superset checking tehniques
• Reordering the combine set
• Optimizing GenMax
• Final GenMax algorithm
![Page 11: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/11.jpg)
Optimizing superset checking tehniques
• Double checking problem
• Maximum position p
Use the p for 8*
![Page 12: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/12.jpg)
Redundant subset checks elimination
• Let us have a tail with cardinality of m
• We should check only if MFI changes
• check_status flag
• When Cl+1 is empty check_status = true
![Page 13: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/13.jpg)
Local Maximal Frequent Itemsets
• Only potential supersets of candidates
• Only maximal sets which contain Il
![Page 14: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/14.jpg)
Algorithm With Local Maximum Frequent Itemset
• Line 6* - initializes
next iteration LMFI
• Line 11 * - creates
new LMFI containing x
• Line 13* - sums
up the LMFI’s
![Page 15: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/15.jpg)
Frequency Testing Optimization
• Diffset optimization for fast frequency testing
![Page 16: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/16.jpg)
FTO Example
• Ehk:
– Leiame kõik transactionid, kus D ei esine
– Lahutame A suppordist kõik, kus D ei esine
– Vastuseks saame suppordi, kus esinevad AD koos
![Page 17: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/17.jpg)
GenMax Algorithm in Full Glory
![Page 18: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/18.jpg)
Stats
Dataset Transactions Items AverageTransaction Length
Chess 3196 76 37
Connect4 67,557 130 43
pumbsb 49,046 7,117 74
![Page 19: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/19.jpg)
![Page 20: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/20.jpg)
![Page 21: Maximal Frequent Itemset Mining - ut · –Leiame kõik transactionid, kus D ei esine –Lahutame A suppordist kõik, kus D ei esine –Vastuseks saame suppordi, kus esinevad AD koos](https://reader033.vdocuments.site/reader033/viewer/2022050221/5f66b6e52b66e7022f47f7fa/html5/thumbnails/21.jpg)