Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu (SIGKDD 2010). UP-Growth: An Efficient Algorithm for High Utility Itemset Mining

Slide 1

UP-Growth: An Efficient Algorithm for High Utility Itemset Mining
Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu
SIGKDD 2010
2010/8/25

Slide 2: Outline
- Motivation
- Problem Definition
- Method
  - UP-Tree Structure
  - UP-Growth Method
- Experimental Results
- Conclusions

Slide 3: Motivation
- Frequent itemset mining does not take the unit profits and purchased quantities of items into consideration.
- The basic meaning of utility is the interestingness, importance, or profitability of items to the users.

Slide 4 (cont.)
- The utility of items in a transaction database consists of two aspects:
  - External utility: the importance of distinct items.
  - Internal utility: the importance of the items in a transaction.
- The utility of an item in a transaction is its external utility multiplied by its internal utility; the utility of an itemset is the sum of the utilities of its items.
- High utility itemset: an itemset whose utility is no less than a user-specified threshold.

Slide 5 (cont.)
- Mining high utility itemsets from databases is not an easy task, because the downward closure property used in frequent itemset mining does not hold for utility: a superset of a low utility itemset can still be a high utility itemset.
- Effectively pruning the search space while capturing all high utility itemsets, with none missed, is a major challenge.

Slide 6: Problem Definition
- Utility of item $i_p$ in transaction $T_d$: $u(i_p, T_d) = p(i_p) \times q(i_p, T_d)$, where $p$ is the unit profit (external utility) and $q$ is the purchased quantity (internal utility). Example: $u(\{A\}, T_1) = 5 \times 1 = 5$
- Utility of an itemset in a transaction: $u(\{AC\}, T_1) = u(\{A\}, T_1) + u(\{C\}, T_1) = 5 + 1 = 6$
- Utility of an itemset in the database: $u(\{AD\}) = u(\{AD\}, T_1) + u(\{AD\}, T_3) = 7 + 17 = 24$
- Transaction utility: $\mathrm{TU}(T_1) = u(\{ACD\}, T_1) = 8$
- Transaction-weighted utility: $\mathrm{TWU}(\{AD\}) = \mathrm{TU}(T_1) + \mathrm{TU}(T_3) = 8 + 30 = 38$
- If $\mathrm{TWU}(X)$ is no less than the minimum utility threshold, $X$ is called a high transaction-weighted utilization itemset (HTWUI).
- Transaction-weighted downward closure (TWDC): for any itemset $X$, if $X$ is not an HTWUI, then $X$ and every superset of $X$ are low utility itemsets.
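The Slide 6 definitions can be checked with a minimal Python sketch. The transaction table shown on the original slide did not survive extraction, so the database below is an assumption: a toy table whose profits and quantities were chosen to reproduce every value quoted on this slide. The helper names (`contains`, `u_item`, `u_in`, `u`, `tu`, `twu`) are hypothetical, and itemsets are written as strings of single-letter item names purely for convenience.

```python
# Toy database (assumption): profits and quantities chosen so that every value
# quoted on Slide 6 is reproduced; the original table was lost in extraction.
PROFIT = {"A": 5, "B": 2, "C": 1, "D": 2, "E": 3, "F": 1, "G": 1}  # p(i): external utility
DB = {  # q(i, T_d): internal utility (purchased quantity)
    "T1": {"A": 1, "C": 1, "D": 1},
    "T2": {"A": 2, "C": 6, "E": 2, "G": 5},
    "T3": {"A": 1, "B": 2, "C": 1, "D": 6, "E": 1, "F": 5},
    "T4": {"B": 4, "C": 3, "D": 3, "E": 1},
    "T5": {"B": 2, "C": 2, "E": 1, "G": 2},
}
MIN_UTIL = 40  # threshold used in the slides' running example


def contains(tid, itemset):
    """True if transaction `tid` contains every item of `itemset`."""
    return all(item in DB[tid] for item in itemset)


def u_item(item, tid):
    """u(i_p, T_d) = p(i_p) * q(i_p, T_d)."""
    return PROFIT[item] * DB[tid][item]


def u_in(itemset, tid):
    """Utility of an itemset in one transaction (0 if T_d does not contain it)."""
    return sum(u_item(i, tid) for i in itemset) if contains(tid, itemset) else 0


def u(itemset):
    """Utility of an itemset in the whole database."""
    return sum(u_in(itemset, tid) for tid in DB)


def tu(tid):
    """Transaction utility TU(T_d): utility of all items in T_d."""
    return sum(u_item(i, tid) for i in DB[tid])


def twu(itemset):
    """TWU(X): sum of TU over the transactions that contain X."""
    return sum(tu(tid) for tid in DB if contains(tid, itemset))


print(u_item("A", "T1"), u_in("AC", "T1"), u("AD"), tu("T1"), twu("AD"))  # 5 6 24 8 38
# {AD} is not an HTWUI (38 < 40), so by TWDC neither {AD} nor any superset is high utility.
print(twu("AD") >= MIN_UTIL)  # False
# Utility itself is not downward closed: here a superset has higher utility.
print(u("A"), u("AC"))  # 20 28
```

Because TWU sums whole transaction utilities, it never underestimates an itemset's utility, which is what makes TWDC a safe pruning rule even though utility itself is not downward closed.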
An itemset is called a high utility itemset (HUI) if its utility is no less than min_util.

Slide 7: Proposed Method
1. Construction of the UP-Tree
2. Generation of potential high utility itemsets (PHUIs) from the UP-Tree by UP-Growth

Slide 8: Construction of UP-Tree
The UP-Tree is constructed with two scans of the original database.
- First scan:
  - The TU of each transaction is computed, and the TWU of each single item is accumulated.
  - Global unpromising items (items whose TWU is below min_util) are discarded: they are removed from the transactions, and their utilities are subtracted from the TUs of those transactions.
  - The remaining promising items in each transaction are sorted in descending order of TWU.
- Second scan: the reorganized transactions are inserted into the UP-Tree.

Slide 9 (cont.)
[Figure: first scan with min_util = 40, marking the unpromising items and sorting the remaining items in descending order of TWU]

Slides 10-13 (cont.)
[Figures: second scan, showing the UP-Tree after each reorganized transaction is inserted]

Slide 14 (cont.)
Strategy 1. Discarding global unpromising items (DGU).

Slide 15: Generating PHUIs from the global UP-Tree
[Figure: {D}'s conditional pattern base ({D}-CPB)]
- An item $i_p$ is called a local promising item in $\{a_i\}$-CPB if $pu(i_p, \{a_i\}\text{-CPB})$, the sum of the path utilities of the paths in $\{a_i\}$-CPB that contain $i_p$, is no smaller than min_util; otherwise it is a local unpromising item.
- {A} is a local unpromising item in {D}-CPB, so neither {AD} nor any of its supersets is a high utility itemset.

Slide 16 (cont.)
- PHUIs generated from the {D}-Tree, each with its estimated utility: {D}:58, {DE}:45, {DEB}:45, {DEC}:45, {DEBC}:45, {DB}:45, {DBC}:45, {DC}:53
- The complete set of PHUIs is {{D}:58, {DE}:45, {DEB}:45, {DEC}:45, {DEBC}:45, {DB}:45, {DBC}:45, {DC}:53, {B}:61, {BE}:54, {BEC}:54, {BC}:54, {A}:65, {AC}:55, {ACE}:47, {AE}:47, {E}:88, {EC}:76, {C}:96}.

Slide 17: Decreasing global node utilities (DGN) in the construction of a global UP-Tree
Strategy 2.
Decreasing global node utilities (DGN): during the construction of a global UP-Tree, the utilities of a node's descendants are discarded from the utility of the node.
[Figure: {B}'s CPB]

Slides 18-20 (cont.)
[Figures: global UP-Tree construction with DGN]

Slide 21 (cont.)
When $T_2$ is inserted, node {C} adds only its own utility in $T_2$ to the 1 it already accumulated from $T_1$:
$\{C\}.nu = 1 + p(\{C\}) \times q(\{C\}, T_2) = 1 + 1 \times 6 = 7$

Slide 22 (cont.)
$\{E\}.nu = p(\{C\}) \times q(\{C\}, T_2) + p(\{E\}) \times q(\{E\}, T_2) = 1 \times 6 + 3 \times 2 = 12$

Slide 23 (cont.)
$\{A\}.nu = p(\{C\}) \times q(\{C\}, T_2) + p(\{E\}) \times q(\{E\}, T_2) + p(\{A\}) \times q(\{A\}, T_2) = 1 \times 6 + 3 \times 2 + 5 \times 2 = 22$

Slide 24 (cont.)
With DGN applied, the set of PHUIs shrinks to {{D}:58, {DE}:45, {DEB}:45, {DEBC}:45, {DEC}:45, {DB}:45, {DBC}:45, {DC}:53, {B}:61, {A}:65, {E}:88, {C}:96}.

Slide 25: UP-Growth
UP-Growth efficiently generates PHUIs from the global UP-Tree using two strategies:
- DLU (Discarding local unpromising items)
- DLN (Decreasing local node utilities)

Slide 26: DLU
- Because of memory limits, instead of maintaining the exact utility values of the items in each conditional pattern base, we maintain a minimum item utility table (MIUT), which stores the minimum utility miu(·) of each item over the transactions that contain it.
- Strategy 3. Discarding local unpromising items (DLU): during the construction of a local UP-Tree, the minimum item utility of each unpromising item, multiplied by the path count, is subtracted from the utility of every path that contains it.

Slide 27 (cont.)
$8 - \mathrm{miu}(\{A\}) \times \{AC\}.count = 8 - 5 \times 1 = 3$
$25 - \mathrm{miu}(\{A\}) \times \{BAEC\}.count = 25 - 5 \times 1 = 20$

Slide 28: DLN
Strategy 4. Decreasing local node utilities (DLN): during the construction of a local UP-Tree, the minimum item utilities of a node's descendants are subtracted from the node's utility.
[Figure: {D}-Tree after inserting the first path, {C}]

Slide 29: DLN (cont.)
$\{C\}.nu = 3 + (20 - \mathrm{miu}(\{B\}) \times 1 - \mathrm{miu}(\{E\}) \times 1) = 3 + 13 = 16$ (count 2)
$\{B\}.nu = 20 - \mathrm{miu}(\{E\}) \times 1 = 20 - 3 = 17$ (count 1)
$\{E\}.nu = 20$ (count 1)

Slide 30: DLN (cont.)
$\{C\}.nu = 16 + (20 - \mathrm{miu}(\{B\}) \times 1 - \mathrm{miu}(\{E\}) \times 1) = 16 + 13 = 29$ (count 3)
$\{B\}.nu = 17 + 20 - \mathrm{miu}(\{E\}) \times 1 = 17 + 17 = 34$ (count 2)
$\{E\}.nu = 20 + 20 = 40$ (count 2)

Slide 31: Experimental Results

Slide 32: Scalability

Slide 33: Conclusions
- This paper proposed UP-Growth, an efficient algorithm for mining high utility itemsets.
- A UP-Tree structure is proposed for maintaining the information of high utility itemsets.
- With the four strategies, mining performance is enhanced significantly, since both the search space and the number of candidates are effectively reduced.
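The node-utility bookkeeping behind DGN (Slides 21-23) and DLU/DLN (Slides 27-30) can be reproduced with a minimal Python sketch. It is not the authors' implementation: the names `Node`, `insert`, `dgn_gains`, `dlu`, and `dln_gains` are hypothetical, and only path insertion is modeled. The inputs are read off the slides: the global item order C, E, A, B, D follows the descending TWU values on Slide 16; the T1 and T2 utilities come from Slides 6 and 21-23 (u({D}, T1) = 2 is inferred from TU(T1) = 8); the {D}-CPB paths and the local order C, B, E come from Slides 27-30; and miu({A}) = 5, miu({E}) = 3, miu({B}) = 4 are derived from the arithmetic on those slides.

```python
from dataclasses import dataclass, field
from itertools import accumulate


@dataclass
class Node:
    """A UP-Tree node: item name, support count, node utility (nu), children by item."""
    item: str
    count: int = 0
    nu: int = 0
    children: dict = field(default_factory=dict)


def insert(root, items, gains, count=1):
    """Follow/extend the path `items` from `root`; the k-th node gets `count` and gains[k]."""
    node = root
    for item, gain in zip(items, gains):
        node = node.children.setdefault(item, Node(item))
        node.count += count
        node.nu += gain
    return root


def dgn_gains(utilities):
    """DGN: each node keeps its own utility plus its ancestors' (prefix sums)."""
    return list(accumulate(utilities))


def dlu(path, pu, miu, unpromising, count=1):
    """DLU: drop local unpromising items and subtract miu(i) * count for each of them."""
    kept = [i for i in path if i not in unpromising]
    return kept, pu - count * sum(miu[i] for i in path if i in unpromising)


def dln_gains(items, pu, miu, count=1):
    """DLN: the k-th node gets pu minus miu(j) * count for every descendant j."""
    return [pu - count * sum(miu[j] for j in items[k + 1:]) for k in range(len(items))]


# Global UP-Tree with DGN: the first two reorganized transactions.
global_root = Node("root")
insert(global_root, ["C", "A", "D"], dgn_gains([1, 5, 2]))   # T1 (assumed utilities)
insert(global_root, ["C", "E", "A"], dgn_gains([6, 6, 10]))  # T2 (from Slides 21-23)
g_c = global_root.children["C"]
g_e = g_c.children["E"]
g_a = g_e.children["A"]
print(g_c.nu, g_e.nu, g_a.nu)  # 7 12 22

# Local {D}-Tree with DLU + DLN, using the {D}-CPB paths from Slides 27-30.
MIU = {"A": 5, "B": 4, "E": 3}  # minimum item utilities implied by the slides
LOCAL_ORDER = ["C", "B", "E"]   # item order in the slides' {D}-Tree
d_root = Node("root")
for path, pu in [(["C", "A"], 8), (["C", "E", "A", "B"], 25), (["C", "E", "B"], 20)]:
    kept, pu = dlu(path, pu, MIU, unpromising={"A"})
    kept.sort(key=LOCAL_ORDER.index)
    insert(d_root, kept, dln_gains(kept, pu, MIU))
c = d_root.children["C"]
b = c.children["B"]
e = b.children["E"]
print((c.count, c.nu), (b.count, b.nu), (e.count, e.nu))  # (3, 29) (2, 34) (2, 40)
```

In this sketch DGN and DLN differ only in how each node's increment is computed, so a single `insert` routine serves both the global and the local tree.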