KDD’09, June 28-July 1, 2009, Paris, France. Copyright 2009 ACM
Frequent Pattern Mining with Uncertain Data
Outline
Introduction
Definition
Algorithm
Experiment Results
Conclusion
Introduction
This paper studies the problem of frequent pattern mining with uncertain data
by examining the relative behavior of extensions of well-known classes of deterministic algorithms.
Definition
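A minimal sketch of one common way to define support over uncertain data, assuming each item in a transaction carries an independent existence probability and that support is measured as expected support (the sum, over transactions, of the product of the item probabilities). The database values and the function name are hypothetical and are not taken from the slides.

```python
from math import prod

# Hypothetical uncertain database: each transaction maps item -> existence probability.
UNCERTAIN_DB = [
    {"a": 0.5, "c": 1.0},
    {"a": 1.0, "c": 0.5, "d": 0.5},
    {"c": 0.5, "d": 1.0},
]

def expected_support(itemset, db):
    """Expected support under the item-independence assumption.

    A missing item has probability 0, so a transaction that lacks any
    item of the itemset contributes nothing to the sum.
    """
    return sum(prod(t.get(item, 0.0) for item in itemset) for t in db)

print(expected_support({"a", "c"}, UNCERTAIN_DB))
print(expected_support({"d"}, UNCERTAIN_DB))
```

Under this model an itemset would be called frequent when its expected support reaches a chosen threshold, in the same way a deterministic count is compared against min_sup_count.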
Algorithm
Step 1. Extending the H-mine Algorithm
Step 2. Extending the FP-growth Algorithm
Step 3. Computation of Support Upper Bounds
Step 4. Mining Frequent Patterns with UFP-tree
Step 5. Determining Support with a Trie Tree
H-Mine (Example)
TDB
ID   Items
100  c, d, e, f, g, i
200  a, c, d, e, m
300  a, b, d, e, g, k
400  a, c, d, h
min_sup_count = 2
Scan TDB: the complete set of frequent items can be found and output: { a:3, c:3, d:4, e:3, g:2 }
Following the alphabetical order of frequent items (called F-list): a-c-d-e-g
ID   Frequent-item projection
100  c, d, e, g
200  a, c, d, e
300  a, d, e, g
400  a, c, d
Scan TDB: build H-struct in main memory
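The two scans in this example can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and plain dictionaries stand in for the H-struct.

```python
from collections import Counter

TDB = {
    100: ["c", "d", "e", "f", "g", "i"],
    200: ["a", "c", "d", "e", "m"],
    300: ["a", "b", "d", "e", "g", "k"],
    400: ["a", "c", "d", "h"],
}
MIN_SUP_COUNT = 2

def frequent_items(tdb, min_sup):
    """First scan: count each item once per transaction, keep the frequent ones."""
    counts = Counter(item for items in tdb.values() for item in set(items))
    return {item: n for item, n in counts.items() if n >= min_sup}

def project(tdb, f_list):
    """Second scan: keep only frequent items, ordered as in the F-list."""
    rank = {item: pos for pos, item in enumerate(f_list)}
    return {
        tid: sorted((i for i in items if i in rank), key=rank.__getitem__)
        for tid, items in tdb.items()
    }

freq = frequent_items(TDB, MIN_SUP_COUNT)
f_list = sorted(freq)  # alphabetical order, as in the example
projections = project(TDB, f_list)
print(freq, f_list, projections)
```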
H-Mine (Example) (Cont.)
Header table H
Item   a  c  d  e  g
Count  3  3  4  3  2
H-Struct (frequent projections)
ID   Frequent-item projection
100  c, d, e, g
200  a, c, d, e
300  a, d, e, g
400  a, c, d
H-Mine (Example) (Cont.)
Header table H
Item   a  c  d  e  g
Count  3  3  4  3  2
Header table H for the a-projected database (transactions 200, 300, 400)
Item   c  d  e  g
Count  2  3  2  1
Frequent projections
ID   Frequent-item projection
100  c, d, e, g
200  a, c, d, e
300  a, d, e, g
400  a, c, d
Frequent patterns with prefix a: ac: 2, ad: 3, ae: 2
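The counts in the second header table (c:2, d:3, e:2, g:1) come from the transactions that contain a. A small sketch of that step, with a hypothetical helper name; it copies the suffixes into a counter for clarity, whereas H-mine itself follows links inside the H-struct instead of copying data.

```python
from collections import Counter

PROJECTIONS = {
    100: ["c", "d", "e", "g"],
    200: ["a", "c", "d", "e"],
    300: ["a", "d", "e", "g"],
    400: ["a", "c", "d"],
}
MIN_SUP_COUNT = 2

def projected_header(projections, prefix_item):
    """Count the items that follow prefix_item in transactions containing it."""
    counts = Counter()
    for items in projections.values():
        if prefix_item in items:
            counts.update(items[items.index(prefix_item) + 1:])
    return dict(counts)

header_a = projected_header(PROJECTIONS, "a")
frequent_with_a = {"a" + i: n for i, n in header_a.items() if n >= MIN_SUP_COUNT}
print(header_a, frequent_with_a)
```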
H-Mine (Example) (Cont.)
Output (TDB as above, min_sup_count = 2)
a:3, c:3, d:4, e:3, g:2,
ac:2, ad:3, ae:2,
acd:2, ade:2,
cd:3, ce:2, cde:2,
de:3, dg:2, deg:2,
eg:2
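The full output can be reproduced with a short recursive projection. This is a sketch under a simplifying assumption: it materialises each projected database as a list of suffixes, which H-mine avoids by re-linking entries in the H-struct. The function name `mine` is hypothetical.

```python
def mine(projections, min_sup, prefix=""):
    """Return {pattern: count} for every frequent pattern that extends prefix.

    Each transaction must list its items in F-list order, so that taking
    the suffix after an item never produces the same pattern twice.
    """
    counts = {}
    for items in projections:
        for item in items:
            counts[item] = counts.get(item, 0) + 1

    result = {}
    for item in sorted(counts):
        if counts[item] < min_sup:
            continue
        pattern = prefix + item
        result[pattern] = counts[item]
        # Projected database: what follows `item` in transactions containing it.
        suffixes = [t[t.index(item) + 1:] for t in projections if item in t]
        result.update(mine(suffixes, min_sup, pattern))
    return result

PROJECTIONS = [list("cdeg"), list("acde"), list("adeg"), list("acd")]
patterns = mine(PROJECTIONS, 2)
print(patterns)
```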
FP-growth (Example)
min_support = 3
TID  Items bought              (Ordered) frequent items
100  {f, a, c, d, g, i, m, p}  {f, c, a, m, p}
200  {a, b, c, f, l, m, o}     {f, c, a, b, m}
300  {b, f, h, j, o, w}        {f, b}
400  {b, c, k, s, p}           {c, b, p}
500  {a, f, c, e, l, p, m, n}  {f, c, a, m, p}
Header Table
Item  Frequency
f     4
c     4
a     3
b     3
m     3
p     3
FP-tree
{}
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1
f-c-a-m-p
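The tree in this example can be built by inserting each ordered transaction as a path from the root and sharing common prefixes. A minimal sketch follows; the class and function names are hypothetical, and the header table is kept as a plain list of nodes per item rather than a chain of node links.

```python
class Node:
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(ordered_transactions):
    """Insert each transaction as a root-to-leaf path, sharing prefixes."""
    root = Node(None)
    header = {}  # item -> list of tree nodes that carry this item
    for items in ordered_transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header

ORDERED = [
    ["f", "c", "a", "m", "p"],
    ["f", "c", "a", "b", "m"],
    ["f", "b"],
    ["c", "b", "p"],
    ["f", "c", "a", "m", "p"],
]
root, header = build_fp_tree(ORDERED)
# Summing node counts per item recovers the header-table frequencies.
totals = {item: sum(n.count for n in nodes) for item, nodes in header.items()}
print(totals)
```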
Computation of Support Upper Bounds
Corollary
Mining Frequent Patterns with UFP-tree
Goal: avoid recursively constructing conditional FP-trees.
Trie Tree
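One way a trie could be used to determine support for a set of candidate itemsets in a single pass over the data is sketched below. Everything here is an assumption made for illustration: the expected-support model with independent item probabilities, the sample database, and all class and function names are hypothetical and are not taken from the slides.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_candidate = False
        self.expected_support = 0.0

def insert(root, itemset):
    """Store a candidate itemset as a path of its items in sorted order."""
    node = root
    for item in sorted(itemset):
        node = node.children.setdefault(item, TrieNode())
    node.is_candidate = True

def accumulate(node, transaction, items, start, prob):
    """Follow every trie path whose items all occur in the transaction."""
    for k in range(start, len(items)):
        child = node.children.get(items[k])
        if child is None:
            continue
        p = prob * transaction[items[k]]
        if child.is_candidate:
            child.expected_support += p
        accumulate(child, transaction, items, k + 1, p)

def count_candidates(candidates, db):
    root = TrieNode()
    for candidate in candidates:
        insert(root, candidate)
    for transaction in db:
        accumulate(root, transaction, sorted(transaction), 0, 1.0)
    return root

def lookup(root, itemset):
    node = root
    for item in sorted(itemset):
        node = node.children[item]
    return node.expected_support

# Hypothetical uncertain database: item -> existence probability.
DB = [
    {"a": 0.5, "c": 1.0},
    {"a": 1.0, "c": 0.5, "d": 0.5},
    {"c": 0.5, "d": 1.0},
]
trie = count_candidates([{"a", "c"}, {"c", "d"}], DB)
print(lookup(trie, {"a", "c"}), lookup(trie, {"c", "d"}))
```

Storing the candidates in a trie lets candidates that share a prefix share the work of matching that prefix against each transaction.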
Experiment Results
Conclusion
In these tests, we found that UApriori and UH-mine are both efficient at mining frequent itemsets.