[ieee communication technologies, research, innovation, and vision for the future (rivf) - hanoi,...

6
Efficient Algorithms for Mining Frequent Weighted Itemsets from Weighted Items Databases Bac Le Faculty of Information Technology, University of Science, Ho Chi Minh, Vietnam [email protected] Huy Nguyen Faculty of Information Technology, Saigon University Ho Chi Minh, Vietnam [email protected] Bay Vo Faculty of Information Technology, Ho Chi Minh City University of Technology, Vietnam [email protected] Abstract—In this paper, we propose algorithms for mining Frequent Weighted Itemsets (FWIs) from weighted items transaction databases. Firstly, we introduce the WIT-tree data structure for mining high utility itemsets in the work of Le et al. (2009) and modify it for mining FWIs. Next, some theorems are proposed. Based on these theorems and the WIT-tree, we propose an algorithm for mining FWIs. Finally, Diffset for fast computing the weighted support of itemsets and saving memory are also discussed. We test the proposed algorithms in many databases and experimental results show that they are very efficient in comparison with Apriori-based approach. Keywords—frequent weighted support; frequent weighted itemsets; WIT-tree; WIT-FWIs; weighted items transaction databases. I. INTRODUCTION Association rule is a most important task in Knowledge Discovery and Data mining (KDD). It gets relationships among the items in large number of transactions. Given } ,..., , { 2 1 n i i i I = as a set of items and a database D is a set of transactions. Let X I, the support of X in D, denoted as σ(X), is the number of transactions in D containing X, X is called frequent if σ(X) minSup (minimum support). A classical association rule is an implication form {X Y (sup, conf)}, where X, Y I, and X Y = Ø. The support of this rule is sup = σ(XY) and the confidence is conf = ) ( ) ( X XY σ σ . Given the minSup and the minimum confidence (minConf), we want to mine all association rules that satisfy minSup and minConf. The classical association rule mining doesn’t care in the benefit of items. However, in some applications, we are interested in weighted value (or benefit) of each item. For example, bread gives 2 cents and a bottle of milk gives 4 cents when we bought them together. Thus, it is necessary to have new methods for mining in this kind of databases. In 1998, Ramkumar et al. [7] proposed the model of weighted association rules (WARs) and presented an Apriori- based algorithm for mining FWIs. In 2003, Tao et al. [8] proposed the method of mining WARs. In that paper, authors did not give any algorithm for mining FWIs. In 2008, Khan et al. [5] proposed a method for mining fuzzy weighted association rules. To mine WARs, we must mine all frequent weighted itemsets. Therefore, mining FWIs is the most important in the process of mining WARs. The contributions of this paper are as follows: i) We modify the WIT-tree and propose an algorithm for mining FWIs based on WIT-tree (modified). ii) Some theorems and corollaries are proposed. Based on them, we propose an algorithm for fast mining FWIs. iii) We apply the Diffset strategy [12] and extend it for mining FWIs. The rest of this paper is organized as follows. Section 2 presents related works to mine WARs. Section 3 presents a modification of WIT-tree [6], for compressing the database into a tree structure. Algorithms for mining FWIs using WIT- tree are discussed in section 4. The experimental results will present in section 5, and we conclude our work in section 6. II. RELATED WORK The weighted items transaction database is defined as follows: Given a database D with a set of transactions } ,..., , { 2 1 m t t t T = , a set of items } ,..., , { 2 1 n i i i I = and a set of positive weights } ,..., , { 2 1 n w w w W = corresponds with each item in I. For example: Consider the database in Table I and Table II. Table I has six transactions T = {t 1 ,…, t 6 }, and five items I = {A, B, C, D, E}. The weights of these items are in W = {0.6, 0.1, 0.3, 0.9, 0.2} stored in Table II. TABLE I. A TRANSACTION DATABASE Transactions Items bought 1 A, B, D, E 2 B, C, E 3 A, B, D, E 4 A, B, C, E 5 A, B, C, D, E, F 6 B, C, D 978-1-4244-8075-3/10/$26.00 ©2010 IEEE

Upload: bay

Post on 31-Mar-2017

214 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: [IEEE Communication Technologies, Research, Innovation, and Vision for the Future (RIVF) - Hanoi, Vietnam (2010.11.1-2010.11.4)] 2010 IEEE RIVF International Conference on Computing

Efficient Algorithms for Mining Frequent Weighted Itemsets from Weighted Items Databases

Bac Le Faculty of Information Technology, University of Science,

Ho Chi Minh, Vietnam [email protected]

Huy Nguyen Faculty of Information Technology, Saigon University

Ho Chi Minh, Vietnam [email protected]

Bay Vo Faculty of Information Technology, Ho Chi Minh City University of Technology, Vietnam

[email protected]

Abstract—In this paper, we propose algorithms for mining Frequent Weighted Itemsets (FWIs) from weighted items transaction databases. Firstly, we introduce the WIT-tree data structure for mining high utility itemsets in the work of Le et al.(2009) and modify it for mining FWIs. Next, some theorems are proposed. Based on these theorems and the WIT-tree, we propose an algorithm for mining FWIs. Finally, Diffset for fast computing the weighted support of itemsets and saving memory are also discussed. We test the proposed algorithms in many databases and experimental results show that they are very efficient in comparison with Apriori-based approach.

Keywords—frequent weighted support; frequent weighted itemsets; WIT-tree; WIT-FWIs; weighted items transaction databases.

I. INTRODUCTION

Association rule is a most important task in Knowledge Discovery and Data mining (KDD). It gets relationships among the items in large number of transactions. Given

},...,,{ 21 niiiI = as a set of items and a database D is a set of transactions. Let X ⊆ I, the support of X in D, denoted as σ(X),is the number of transactions in D containing X, X is called frequent if σ(X) ≥ minSup (minimum support). A classical association rule is an implication form {X Y (sup, conf)}, where X, Y ⊆ I, and X ∩ Y = Ø. The support of this rule is sup

= σ(XY) and the confidence is conf = )()(

XXY

σσ . Given the

minSup and the minimum confidence (minConf), we want to mine all association rules that satisfy minSup and minConf.

The classical association rule mining doesn’t care in the benefit of items. However, in some applications, we are interested in weighted value (or benefit) of each item. For example, bread gives 2 cents and a bottle of milk gives 4 cents when we bought them together. Thus, it is necessary to have new methods for mining in this kind of databases.

In 1998, Ramkumar et al. [7] proposed the model of weighted association rules (WARs) and presented an Apriori-based algorithm for mining FWIs. In 2003, Tao et al. [8] proposed the method of mining WARs. In that paper, authors

did not give any algorithm for mining FWIs. In 2008, Khan et al. [5] proposed a method for mining fuzzy weighted association rules. To mine WARs, we must mine all frequent weighted itemsets. Therefore, mining FWIs is the most important in the process of mining WARs.

The contributions of this paper are as follows:

i) We modify the WIT-tree and propose an algorithm for mining FWIs based on WIT-tree (modified).

ii) Some theorems and corollaries are proposed. Based on them, we propose an algorithm for fast mining FWIs.

iii) We apply the Diffset strategy [12] and extend it for mining FWIs.

The rest of this paper is organized as follows. Section 2 presents related works to mine WARs. Section 3 presents a modification of WIT-tree [6], for compressing the database into a tree structure. Algorithms for mining FWIs using WIT-tree are discussed in section 4. The experimental results will present in section 5, and we conclude our work in section 6.

II. RELATED WORK

The weighted items transaction database is defined as follows: Given a database D with a set of transactions

},...,,{ 21 mtttT = , a set of items },...,,{ 21 niiiI = and a set

of positive weights },...,,{ 21 nwwwW = corresponds with each item in I.

For example: Consider the database in Table I and Table II.

Table I has six transactions T = {t1,…, t6}, and five items I= {A, B, C, D, E}. The weights of these items are in W = {0.6, 0.1, 0.3, 0.9, 0.2} stored in Table II.

TABLE I. A TRANSACTION DATABASE

Transactions Items bought 1 A, B, D, E 2 B, C, E 3 A, B, D, E 4 A, B, C, E 5 A, B, C, D, E, F 6 B, C, D

978-1-4244-8075-3/10/$26.00 ©2010 IEEE

Page 2: [IEEE Communication Technologies, Research, Innovation, and Vision for the Future (RIVF) - Hanoi, Vietnam (2010.11.1-2010.11.4)] 2010 IEEE RIVF International Conference on Computing

TABLE II. WEIGHTED ITEMS TABLE

Items Weight A 0.6 B 0.1 C 0.3 D 0.9 E 0.2

A. Galois connection Let δ ⊆ I × T be a binary relation, where I is the set of items

and T is the set of transactions of the database D. Let IX ⊆ ,TY ⊆ . Let P(S) include all subset of S. Two mappings

between P(I) and P(T) are called Galois connection as follows [4, 11]:

i) { }yxXxTyXtTPIPt δ,|)(),()(: ∈∀∈=ii) { }yxYyIxYiIPTPi δ,|)(),()(: ∈∀∈=The mapping t(X) is the set of transactions in the database

which contains X, and the mapping i(Y) is the itemset that contains in all transactions Y.

Given X, X1, X2 ∈ P(I) and Y, Y1, Y2 ∈ P(T). The Galois connection satisfies the following properties [4, 11]:

i) X1 ⊂ X2 t(X1) ⊇ t(X2)ii) Y1 ⊂ Y2 i(Y1) ⊇ i(Y2)iii) X ⊆ i(t(X)) and Y ⊆ t(i(Y))

B. Mining weighted association rules Definition 2.1. The transaction weight (tw) of a transaction

tk is defined as follows:

k

tij

k t

wttw kj ∈=)( (2.1)

Definition 2.2. The weighted support of an itemset is defined as follows:

∈=

Ttk

Xttk

k

k

ttw

ttwXws

)(

)()( )( (2.2)

where T is the list of transactions in the database.

Example 2.1. Consider tables I, II, and definition 2.1, we can compute the tw(t1) value as follow:

45.04

2.09.01.06.0)( 1 =+++=ttw

TABLE III. TRANSACTION WEIGHTS OF ALL TRANSACTIONS IN TABLE I

Transactions tw1 0.45 2 0.2 3 0.45 4 0.3 5 0.42 6 0.43

Sum 2.25

From tables I, III, and definition 2.2, we can compute the ws(BD) value as follows:

Because BD appears in transactions {1, 3, 5, 6}, ws(BD) is computed:

78.025.2

43.042.045.045.0)( ≈+++=BDws

Theorem 2.1. The weighted support satisfies the downward closure property. i.e., if X ⊂ Y then ws(X) ≥ ws(Y).

Proof: Because X ⊂ Y, according to the property i) of Galois connection, we have t(X) ⊇ t(Y)

∈∈≥

)()()()(

Yttk

Xttk

kk

ttwttw

∈ ≥

Ttk

Yttk

Ttk

Xttk

k

k

k

k

ttw

ttw

ttw

ttw

)(

)(

)(

)()()( )()( YwsXws ≥ . Thus, the

weighted support satisfies downward closure property.

III. WIT-TREE DATA STRUCTURE

In [6], we proposed the WIT-tree (Weighted Itemset-Tidset tree) data structure, an expansion of IT-tree [10], to mine high utility itemsets. We modify WIT-tree structure by changing twu property into ws property. By WIT-tree technique, algorithms in section 4 only scan the database once because it bases on the intersection of Tidsets to compute the weighted support in next steps. Thus, it saves the time for the database scan and makes the algorithms be done faster.

a) Vertex: Includes 3 fields

X: an itemset. t(X): the set of transactions contains X.ws: the weighted support of X. The vertex is denoted

)()(

XwsXtX × .

We can see that the value of ws(X) is computed by summing all tw values of transactions which their tids belong to t(X), then dividing it by the sum of all tw values. Thus, computing of ws(X) will be done quickly by using Tidset.

b) Arc: Connecting the vertex at kth level (called X) with the vertex at (k+1)th level (called Y).

Definition 3.1[21] – The equivalence class

Let I be a set of items and IX ⊆ , a function p(X,k) = X[1:k] as the k length prefix of X and a prefix-based equivalence relation Kθ on itemsets as follows:

),(),(,, kYpkXpYXIYXk

=⇔≡⊆∀ θ .

The set of all itemsets having the same prefix X is called as an equivalence class, and denoted an equivalence class with prefix X is [X].

Example 3.1: Consider the tables I and III above, we have WIT-tree for mining frequent weighted itemsets as in Figure 1:

Page 3: [IEEE Communication Technologies, Research, Innovation, and Vision for the Future (RIVF) - Hanoi, Vietnam (2010.11.1-2010.11.4)] 2010 IEEE RIVF International Conference on Computing

Figure 1. Search tree using WIT-tree

The root node of WIT-tree contains all single item nodes. All nodes in level 1 belong to the same equivalence class with prefix {} (or [∅]). Each node in level 1 will become a new equivalence class using its item as the prefix. With each node in the same prefix, it will join with all nodes successive it to create a new equivalence class. The process will be done recursively to create new equivalence classes in higher levels.For example: Consider the Figure 1, nodes {A}, {B}, {C}, {D}, {E} belong to the equivalence class [∅]. Consider node {A}: {A} will join with all nodes successive it ({B}, {C}, {D}, {E}) to create a new equivalence class [A] = {{AB}, {AC}, {AD}, {AE}}. [AB] will become a new equivalence class by joining with all nodes successive it ({AC}, {AD}, {AE}).

Figure 1 shows that all itemsets satisfy downward closure property. Thus, we can prune an equivalence class in WIT-tree if its ws does not satisfy the minimum weighted support (minws). For example, suppose that minws = 0.4, because ws(ABC) = 0.32 < minws we can prune the equivalence class which the prefix is ABC, i.e., all child nodes of ABC will be pruned.

IV. MINING FREQUENT WEIGHTED ITEMSETS

A. The algorithm In this section, we base on the hybrid approach [10, 12] to

mine FWIs.

In function WIT-FWIs (Figure 2), let [ ] { }nlllP ,...,, 21=be an equivalence class, P is the parent node of each li, and

each li is an itemset that represents for node )(

)(

i

ii

PlwsPltPl ×

. The

input of this function is the equivalence class P ([P]): It considers each vertex li with all lj (after li in [P], lines 6 and 9).

Let X = li ∪ lj, it computes Y = t(X) = t(li) ∩ t(lj) (lines 10 and 11), if ws(X) (computed through t(X), line 12) satisfies minws (line 13), we add new vertex

)( XwsYX × into equivalence class [Pi]

(line 14). The last line (line 15) calls FWIs-EXTEND function recursively to process the equivalence class that begins from [Pi] until no any vertex is created. Function COMPUTE-WS(Y) uses the eq. (2.2) to compute the ws of itemset X based on the values in Table III with Y = t(X).

Input: database D and minws

Output: FWIs contains all frequent weighted itemsets that satisfy minws from D

Method:1. WIT-FWIs() 2. [∅] = {i ∈ I: ws(i) ≥ minws}3. SORT([∅]) 4. FWIs = ∅5. FWIs_EXTEND([∅])

FWIs-EXTEND([P])6. for all li ∈ [P] do 7. Add (li, ws(li)) to FWIs8. [Pi] = ∅9. for all lj ∈ [P], with j > i do 10. X = li ∪ lj11. Y = t(li) ∩ t(lj)12. ws(X) = COMPUTE-WS(Y) // using the eq. (2.2)13. if ws(X) ≥ minws then 14. [Pi] = [Pi] ∪ {

)( XwsYX × }

15. FWIs-EXTEND ([Pi])Figure 2. WIT-FWIs algorithm for mining frequent weighted itemsets

{}

A×1345 B×123456 C×2456 D×1356 E×12345 0.72 1.0 0.6 0.78 0.81

AB×1345 AC×45 AD×135 AE×1345 BC×2456 BD×1356 BE×12345 CD×56 CE×245 DE×135 0.72 0.32 0.59 0.72 0.6 0.78 0.81 0.38 0.41 0.59

ABC×45 ABD×135 ABE×1345 ACD×5 ACE×45 ADE×135 BCD×56 BCE×245 BDE×135 CDE×5 0.32 0.59 0.72 0.19 0.32 0.59 0.38 0.41 0.59 0.19

ABCD×5 ABCE×45 ABDE×135 ACDE×5 BCDE×5 0.19 0.32 0.59 0.19 0.19

ABCDE×5 0.19

Page 4: [IEEE Communication Technologies, Research, Innovation, and Vision for the Future (RIVF) - Hanoi, Vietnam (2010.11.1-2010.11.4)] 2010 IEEE RIVF International Conference on Computing

B. Illustration According to tables I and III, the illustration of WIT-FWIs

algorithm with minws = 0.4 is as follows:

Figure 3. WIT-tree for mining FWIs from the database in Table I withminws = 0.4.

At the beginning, we compute the ws values of the single items. The weighted support values of 1-items are ws(A) = 0.72, ws(B) = 1.0, ws(C) = 0.60, ws(D) = 0.78, ws(E) = 0.81. All ws values of single items satisfy minws [∅] = {A, C, D,E, B}.

Consider equivalence class [A]: First of all, A is added to FWIs FWIs = {A}.

A joins C, we have a new itemset AC×45 with ws(AC) = 0.32 < minws, so AC is not added to [A].

A joins D, we have a new itemset AD×135 with ws(AD) = 0.59, so add AD to [A] [A] = {AD}.

A joins E, we have a new itemset AE×1345 with ws(AE) = 0.72 [A] = {AD, AE}.

A joins B, we have a new itemset AB×1345 with ws(AB) = 0.72 ≥ minws, add AB to [A] [A] = { AD, AE, AB}.

After making the equivalence class [A], the algorithm will be called recursively to create all equivalence classes successive it.

Consider equivalence class [AD]: Add AD to FWIsFWIs = {A, AD}.

AD joins AE, we have a new itemset ADE×135 with ws(ADE) = 0.59, so add ADE to [AD] [AD] = {ADE}.

AD joins AB, we have a new itemset ADB×135 with ws(ADB) = 0.59, so add ADB to [AD] [AD] = {ADE, ADB}.

Consider equivalence class [ADE]: Add ADE to FWIsFWIs = {A, AD, ADE}.

ADE joins ADB into a new itemset ADEB×135 with ws(ADEB) = 0.59 [ADE] = {ADEB}. Because the equivalence class [ADE] has only one element there is no any equivalence class created in successive it.

Similarly to equivalence classes [C], [D], [E], [B].

Finally, we have the set of all FWIs that satisfy minws = 0.4 is FWIs = {A, AB, AD, AE, ABD, ABE, ABDE, ADE, B, BC, BD, BE, BCE, BDE, C, CE, D, DE, E} (details in Figure 3).

Theorem 4.1: Given two itemsets X and Y, if t(X) = t(Y) then ws(X) = ws(Y).Proof: Because t(X) = t(Y)

∈∈=

)()()()(

Yttk

Xttk

kk

ttwttw

∈ =

Ttk

Yttk

Ttk

Xttk

k

k

k

k

ttw

ttw

ttw

ttw

)(

)(

)(

)()()( or ws(X) = ws(Y)

Corollary 4.1: If X ⊂ Y and |t(X)| = |t(Y)| then ws(X) = ws(Y).

Proof: If X ⊂ Y, we have t(X) ⊇ t(Y) (according to the property i) of the Galois connection). Besides, because |t(X)| = |t(Y)| t(X) = t(Y) ws(X) = ws(Y) according to the theorem 4.1.

Based on corollary 4.1, we develop an algorithm for mining FWIs with some modifications of the algorithm in the section 4.1. When we join two nodes li, lj of [P] to create a new node li∪ lj, if |t(li)| = |t(li∪lj)| then ws(li ∪ lj) = ws(li), we need not compute the ws value of li ∪ lj. Similarly, if |t(lj)| = |t(li∪lj)| then ws(li ∪ lj) = ws(li), we need not also compute the ws value of li ∪ lj.

C. The modification algorithm for mining FWIs We change line 12 of the algorithm in Figure 2 into 3 lines:

12. if |t(li)| = |Y| then ws(X) = ws(li) // use corollary 4.113. elseif |t(lj)| = |Y| then ws(X) = ws(lj) // use corollary 4.114. else ws(X) = COMPUTE-WS(Y) // use the eq. (2.2)

Example 4.1. Consider the Figure 3: when we join node A×1345 with node B×123456 to create new node AB×1345, because of |t(A)| = |1345| = |t(AB)| ws(AB) = ws(A) = 0.72. Similarly, |t(C)| = |2456| = |t(BC)| ws(BC) = ws(BC) = 0.6. Besides, because t(A)| ≠ |t(AD)| and |t(D)| ≠ |t(AD)| we must compute the ws value of AD (based on t(AD)). Based on the corollary 4.1, we need not compute ws values of 11 itemsets, those are {AE, AB, CB, DE, BD, EB, ADE, ADB, AEB, CEB, DEB, ABDE}.

D. Diffset for computing ws values fast and saving memory Zaki and Gouda [12] proposed the Diffset strategy for fast

computing the support of itemsets and saving memory to store Tidsets. We recognize that it can be used for fast computing the ws values of itemsets.

Let d(PXY) be the difference set between PX and PY. We have:

d(PXY) = t(PX) – t(PY) [12] (4.1) where PX and PY in equivalence class [P].

Assume that we have d(PX) and d(PY), and need get d(PXY): According to the results in [12], we can get it easily by computing the difference set between d(PY) and d(PX):

d(PXY) = d(PY) – d(PX) [12] (4.2) Based on eq. (4.1) and eq. (4.2), we can compute the ws

value of PXY by using the d(PXY) as follows:

{}

A×1345 C×2456 D×1356 E×12345 B×123456 0.72 0.6 0.78 0.81 1.0

AD×135 AE×1345 AB×1345 CE×245 CB×2456 DE×135 DB×1356 EB×12345 0.59 0.72 0.72 0.41 0.6 0.59 0.78 0.81

ADE×135 ADB×135 AEB×1345 CEB×245 DEB×135 0.59 0.59 0.72 0.41 0.59

ADEB×135 0.59

Page 5: [IEEE Communication Technologies, Research, Innovation, and Vision for the Future (RIVF) - Hanoi, Vietnam (2010.11.1-2010.11.4)] 2010 IEEE RIVF International Conference on Computing

ws(PXY) = ws(PX) –

Tt

PXYdt

ttw

ttw

)(

)()( (4.3)

Proof: We have t(PXY) = t(PX)∩t(PY) = t(PX) – [t(PX) –

t(PY)] = t(PX) – d(PXY)

∈=

Ttk

PXYttk

k

k

ttw

ttwPXYws

)(

)()( )(

−∈=

Ttk

PXYdPXttk

k

k

ttw

ttw

)(

)()]()([

∈∈

−=

Ttk

PXYdtk

PXttk

k

kk

ttw

ttwttw

)(

)()()()(

−=

Ttk

PXttk

k

k

ttw

ttw

)(

)()( =

Ttk

PXYdtk

k

k

ttw

ttw

)(

)()( ws(PX) –

Ttk

PXYdtk

k

k

ttw

ttw

)(

)()(

Based on eq. (4.1), (4.2) and (4.3), we can use Diffset instead of using Tidset for computing the ws values of itemsets in the process of mining FWIs. Theorem 4.2. If d(PXY) = ∅ then ws(PXY) = ws(PX).Proof: Because d(PXY) = ∅ ws(PXY) = ws(PX) –

Tt

PXYdt

ttw

ttw

)(

)()( = ws(PX).

To save the memory for storing Diffset and the time for computing Diffset, we sort the itemsets in the same equivalence class in increasing order by their ws.

1) The algorithm Input: A database D and minwsOutput: FWIs contains all frequent weighted itemsets that satisfy minws from DMethod:WIT-FWIs-DIFF() 1. [∅] = {i ∈ I: ws(i) ≥ minws}2. FWIs = ∅3. SORT([∅]) 4. FWIs-EXTEND-DIFF ([∅])

FWIs-EXTEND-DIFF ([P])5. for all li ∈ [P] do 6. Add (li, ws(li)) to FWIs7. [Pi] = ∅8. for all lj ∈ [P], with j > i do 9. X = li ∪ lj10. if P = ∅ then 11. Y = t(li) – t(lj)12. else 13. Y = d(lj) – t(li)14. if Y = ∅ then ws(X) = ws(li) // use theorem 4.215. else ws(X) = COMPUTE-WS-DIFF(Y) // use the eq. (4.3) 16. if ws(X) ≥ minws then 17. Add {

)( XwsYX × } to [Pi] // sort in increasing by |Y|

18. FWIs-EXTEND-DIFF ([Pi])Figure 4. WIT-FWIs-DIFF algorithm for mining frequent weighted

itemsets

WIT-FWIs-DIFF algorithm (figure 4) differs from the WIT-FWIs-MODIFY algorithm (figure 2) in that it uses Diffset to compute the ws value faster. Because level 1 stores the Tidsets, and if P = ∅ (line 9), means that li and lj belong to level 1, we use the eq. (4.1) to compute Y = d(X) = d(li∪lj) = t(li) – t(lj) (line 10). From level 2 stores Diffset, we use the eq. (4.2) to compute Y = d(X) = d(li∪lj) = d(lj) – d(li) (line 12). Line 13 uses the theorem 4.2 to fast compute ws(X).

2) Illustration According to tables I and III, we illustrate the WIT-FWIs-

DIFF algorithm with minws = 0.4 as follows. Level 1 of WIT-tree contains single items, their tids, and their ws. They are sorted in increasing order by their |tids|. The purpose of this work is to compute Diffset faster. For example, consider nodes B and D, if they are not sorted, we must compute d(BD) = t(B)– t(D) = 123456 – 1356 = 14; otherwise, d(DB) = t(D) – t(B) = 1356 – 123456 = ∅.

Consider equivalence class [A]:

A joins C: d(AC) = t(A) – t(C) = 1345 – 2456 = 13

ws(AC) = ws(A) –

Tt

ACdt

ttw

ttw

)(

)()( = 0.72 –

25.245.045.0 + = 0.32 <

minws.

A joins D: d(AD) = t(A) – t(D) = 1345 – 1356 = 4

ws(AD) = ws(A) –

Tt

ADdt

ttw

ttw

)(

)()( = 0.72 –

25.23.0 = 0.59 ≥ minws.

A joins E: d(AE) = t(A) – t(E) = 1345 – 12345 = ∅ws(AD) = ws(A) = 0.72.

A joins B: d(AB) = t(A) – t(B) = 1345 – 123456 = ∅ws(AB) = ws(A) = 0.72.

Figure 5. Results of the algorithm WIT-FWIs-DIFF from the database in Table I with minws = 0.4

{}

A×1345 C×2456 D×1356 E×12345 B×123456 0.72 0.6 0.78 0.81 1.0

AD×4 AE×∅ AB×∅ CE×6 CB×∅ DE×6 DB×∅ EB×∅ 0.59 0.72 0.72 0.41 0.6 0.59 0.78 0.81

ADE×∅ ADB×∅ AEB×∅ CEB×∅ DEB×∅ 0.59 0.59 0.72 0.41 0.59

ADEB×∅ 0.59

Page 6: [IEEE Communication Technologies, Research, Innovation, and Vision for the Future (RIVF) - Hanoi, Vietnam (2010.11.1-2010.11.4)] 2010 IEEE RIVF International Conference on Computing

V. EXPERIMENTAL RESULTS

All experiments described below were performed on a Centrino core 2 duo (2×2.53 GHz), 4GBs RAM memory, Windows 7, using C# 2008. The experimental databases were downloaded from http://fimi.cs.helsinki.fi/data/ to perform the test with features displayed in Table IV.

TABLE IV. EXPERIMENTAL DATABASES

We modify by creating one table to store weight values of items (value in range of 1 to 10) for each database.

TABLE V. EXPERIMENTAL RESULTS FROM THE DATABASES IN TABLE IV

DBs minws (%)

Time (s) #FWIs Apriori

[7] WIT-FWIs

WIT-FWIs-MODIFY DIFF

BMS-POS

10 4.04 3.53 3.29 2.5 12 8 4.62 4.42 4.24 3.57 21 6 7.32 6.27 6.08 5.59 32 4 14.74 13.78 13.49 12.27 85

Chess

85 2.4 1.25 1.25 0.08 2624 80 9.78 4.21 4.51 0.23 8088 75 33.12 10.97 10.05 0.37 20298 70 111.82 33.21 27.85 0.7 47181

Mushroom

35 1.5 1.36 0.84 0.33 1257 30 2.86 1.75 1.95 0.44 2937 25 5.34 2.96 2.67 0.91 5751 20 61.85 15.91 13.4 1.3 53853

Remark: DIFF means WIT-FWIs-DIFF

We compare the proposed algorithms to Apriori-based algorithm [7], and experimental results from Table V show that the proposed algorithms are more efficient than Apriori-based in mining FWIs. When the number of FWIs is small, the running time of algorithms based on WIT-tree is slight faster than Apriori-based. For example, consider the BMS-POS database with minws = 4%, the mining time of Apriori-based is 14.74 (s), of WIT-FWIs is 13.78 (s), of WIT-FWIs-MODIFY is 13.49 (s), and of WIT-FWIs-DIFF is 12.27 (s). The scale between Apriori-based and WIT-FWIs-DIFF is

%24.83%10074.1427.12 =× . However, when we compare them

in Chess and Mushroom databases with the number of FWIs is large, the running time of WIT-tree-based is more efficient than Apriori-based. For example: Consider the Chess database with minws = 70%, the mining time of Apriori-based is 111.82(s), of WIT-FWIs is 33.21 (s), of WIT-FWIs-MODIFY is 27.85 (s), and of WIT-FWIs-DIFF is 0.7 (s). The scale between Apriori-based and WIT-FWIs-DIFF is

%63.0%10082.1117.0 =× .

VI. CONCLUSION AND FUTURE WORK

This paper has presented the method for mining frequent weighted itemsets from weighted items transaction databases, and the efficient algorithms are also proposed. As above mentioned, the time of mining FWIs by WIT-tree algorithms is more efficient than the time of mining by Apriori-based. By WIT-tree data structure, the algorithms only scan the database once. The algorithm based on Diffset is faster than the others because it saves more memory for storing and therefore, the time of computing Diffset is more efficient.

In this paper, we only improve the phase of mining FWIs by WIT-tree. In the future, we will study how to mine efficient association rules from FWIs. Besides, mining association rules from frequent weighted closed itemsets (FWCIs) is more efficient than from FWIs. Therefore, we will discuss how to mine FWCIs from weighted items databases.

REFERENCES

[1] R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, Proceedings of the 1993 ACM SIGMOD Conference Washington DC, USA, May 1993, pp. 207 – 216 (1993).

[2] R. Agrawal, R. Srikant, Fast algorithms for mining association rules, In VLDB'94 (1994), pp. 487 – 499 (1994).

[3] C.H. Cai, A.W. Fu, C.H. Cheng, W.W. Kwong, Mining Association Rules with Weighted Items, In Proceedings of International Database Engineering and Applications Symposium (IDEAS 98), pp. 68 – 77 (1998)

[4] B. Ganter, R. Wille, Formal Concept Analysis, In Springer-Verlag (1999)

[5] M.S. Khan, M. Muyeba, F. Coenen, Fuzzy Weighted Association Rule Mining with Weighted Support and Confidence Framework, Proc. 1st Int Workshop on Algorithms for Large-Scale Information Processing in Knowledge Discovery (ALSIP 2008), held in conjunction with PAKDD 2008 (Japan), pp. 52 – 64 (2008).

[6] B. Le, H. Nguyen, T.A. Cao, B. Vo, A Novel Algorithm for Mining High Utility Itemsets, the first Asian Conference on Intelligent Information and Database Systems, published by IEEE, pp. 13 – 16 (2009).

[7] G. D. Ramkumar, S. Ranka, S. Tsur, Weighted Association Rules: Model and Algorithm, SIGKDD, pp. 661 – 666 (1998).

[8] F. Tao, F. Murtagh, M. Farid, Weighted Association Rule Mining using Weighted Support and Significance Framework, SIGKDD, pp. 661 – 666 (2003).

[9] W. Wang, J. Yang, P. S. Yu, Efficient Mining of Weighted Association Rules, SIGKDD 2003, pp. 270 – 274 (2003).

[10] M. J. Zaki, C.J. Hsiao, Efficient Algorithms for Mining Closed Itemsets and Their Lattice Structure, IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No 4, April 2005, pp. 462 – 478 (2005).

[11] M. J. Zaki, Mining Non-Redundant Association Rules, Data Mining and Knowledge Discovery. In: Kluwer Academic Publishers. Manufactured in the Netherlands, pp. 223 – 248 (2004).

[12] M. J. Zaki, K. Gouda, Fast Vertical Mining Using Diffsets, Proc. of Ninth ACM SIGKDD Int’l Conf. , pp. 326 - 335 (2003).

Database #Trans #Items Remark BMS-POS 515597 1656 Modified

Chess 3196 76 Modified Mushroom 8124 120 Modified