Hash-Based Algorithm for Mining Association Rules

Data Mining: Mining Association Rules

Mining Association Rules

Support: Obtain Large Itemsets

Confidence: Generate Association Rules
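The two steps named above (large itemsets are selected by support, rules are then generated by confidence) can be illustrated with a minimal sketch. The four transactions, the 0.8 confidence threshold, and the names `support` and `rules_from` are assumptions made for illustration only; they are not taken from the slides.

```python
from itertools import combinations

# Assumed toy database: each transaction is a string of single-letter items.
db = ["ACD", "BCE", "ABCE", "BE"]

def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= set(t))

def rules_from(itemset, transactions, min_conf):
    """Rules lhs -> rhs from one large itemset whose confidence is at least min_conf.

    confidence(lhs -> rhs) = support(lhs and rhs together) / support(lhs).
    The itemset is assumed to be large, so every support here is positive.
    """
    items = sorted(itemset)
    whole = support(items, transactions)
    rules = []
    for size in range(1, len(items)):
        for lhs in combinations(items, size):
            conf = whole / support(lhs, transactions)
            if conf >= min_conf:
                rhs = tuple(i for i in items if i not in lhs)
                rules.append((lhs, rhs, conf))
    return rules

rules = rules_from("BCE", db, min_conf=0.8)
for lhs, rhs, conf in rules:
    print("".join(lhs), "->", "".join(rhs), f"confidence={conf:.2f}")
```

In this sketch the support step and the confidence step are kept separate on purpose: the first needs database scans, while the second only reuses counts of itemsets that are already known to be large.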

Apriori-Based Approach

In the Apriori-based approach, we first search the given set of structures for frequent substructures of small size. Then, at each step, a new substructure is created by adding one node to a frequent substructure. Only nodes that were identified as frequent in the first step are used for this extension. Each time a new substructure is created, the set of structures is scanned to determine whether the new substructure is frequent.

Apriori

TID  Items
     A B C E

C2 (after counting support)
Itemset  Sup.
{A B}    1
{A C}    2
{A E}    1
{B C}    2
{B E}    3
{C E}    2

L2
Itemset  Sup.
{A C}    2
{B C}    2
{B E}    3
{C E}    2

C3
Itemset
{B C E}

C3 (after counting support)
Itemset  Sup.
{B C E}  2

L3
Itemset  Sup.
{B C E}  2
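The level-wise candidate generation and counting shown in the C2, L2, C3, L3 tables can be sketched as follows. This is a minimal sketch, not the presenter's implementation: the four transactions are an assumption (chosen so that the supports agree with the tables above, since only the row A B C E survives in the slide), and the function name `apriori` is hypothetical.

```python
from itertools import combinations

# Assumed database; minimum support count of 2.
db = ["ACD", "BCE", "ABCE", "BE"]

def apriori(transactions, min_sup):
    """Return {k: {itemset: support}} holding the large k-itemsets of every pass."""
    transactions = [frozenset(t) for t in transactions]
    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_sup}
    result = {}
    k = 1
    while current:
        result[k] = current
        k += 1
        # Candidate generation: merge pairs of large (k-1)-itemsets into k-itemsets
        # and keep a candidate only if all of its (k-1)-subsets are large.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Support counting: one database scan per pass.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_sup}
    return result

large = apriori(db, min_sup=2)
for k, itemsets in large.items():
    print(k, sorted(("".join(sorted(s)), c) for s, c in itemsets.items()))
```

Every pass rescans the whole database and counts every surviving candidate, which is the cost that the pruning and shrinking ideas in the following slides try to reduce.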

Apriori (Cont.)

Disadvantages: inefficient; produces many useless candidates.

Direct Hashing with Efficient Pruning for Fast Data Mining (DHP)

DHP prunes useless candidates in advance and reduces the database size at each iteration.

C1 Count (Min sup=2)

TID  Items
     A B C E

Making a hash table

2-itemsets of the transactions being hashed:
{B C},{B E},{C E}
{A B},{A C},{A E},{B C},{B E},{C E}

H{[x y]} = ((order of x)*10 + (order of y)) mod 7

Hash table H2
Hash address            0 1 2 3 4 5 6
Number of items hashed  2 0 2 0 3 1 2
Bit vector              1 0 1 0 1 0 1

(The first count is the number of items hashed to bucket 0.)
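The hash table construction above can be sketched in a few lines. The hash function is the one given on the slide; the four transactions and the item order A=1 ... E=5 are assumptions (only some rows survive in the slide), so the bucket counts produced here need not equal the counts shown in the slide's table. The name `dhp_first_pass` is hypothetical.

```python
from itertools import combinations

# Assumed database and item order.
db = ["ACD", "BCE", "ABCE", "BE"]
ORDER = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
NUM_BUCKETS = 7

def h2(x, y):
    """H{[x y]} = ((order of x)*10 + (order of y)) mod 7, as on the slide."""
    return (ORDER[x] * 10 + ORDER[y]) % NUM_BUCKETS

def dhp_first_pass(transactions, min_sup):
    """Count single items and, in the same scan, hash every 2-subset into a bucket."""
    item_count = {}
    buckets = [0] * NUM_BUCKETS
    for t in transactions:
        items = sorted(t)
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for x, y in combinations(items, 2):
            buckets[h2(x, y)] += 1
    large_items = sorted(i for i, c in item_count.items() if c >= min_sup)
    # A bucket whose total is below min_sup cannot contain a large 2-itemset.
    bit_vector = [1 if c >= min_sup else 0 for c in buckets]
    candidates = [
        (x, y) for x, y in combinations(large_items, 2) if bit_vector[h2(x, y)]
    ]
    return buckets, bit_vector, candidates

buckets, bit_vector, candidates = dhp_first_pass(db, min_sup=2)
print("buckets   ", buckets)
print("bit vector", bit_vector)
print("C2        ", candidates)
```

Because several 2-itemsets can share a bucket, a set bit only says that the bucket total reached the threshold, so a kept candidate may still turn out not to be large when it is counted.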

Perfect Hashing Schemes (PHS) for Mining Association Rules

Motivation

Apriori and DHP produce C_i from L_{i-1}, which may be the bottleneck.

Collisions in DHP

Designing a perfect hashing function for every transaction database is a thorny problem.

Definition

A Join operation joins two different (k-1)-itemsets, P and Q, to produce a k-itemset, where

P = p_1 p_2 … p_{k-1},

Q = q_1 q_2 … q_{k-1}, and p_2 = q_1, p_3 = q_2, …, p_{k-2} = q_{k-3}, p_{k-1} = q_{k-2}.

Example: joining ABC and BCD produces ABCD. Of the 3-itemsets of ABCD (ABC, ABD, ACD, BCD), only one pair, ABC and BCD, satisfies the join definition.
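The join definition can be checked with a short sketch. Itemsets are represented here as strings of ordered single-letter items, which is an assumption made for brevity; the function name `join` is hypothetical.

```python
from itertools import combinations, permutations

def join(p, q):
    """Join two different (k-1)-itemsets when the tail of p equals the head of q.

    p and q are strings of ordered items. The condition p2=q1, ..., p(k-1)=q(k-2)
    becomes p[1:] == q[:-1]; the result is p followed by the last item of q.
    Returns None when the pair does not satisfy the definition.
    """
    if p != q and len(p) == len(q) and p[1:] == q[:-1]:
        return p + q[-1]
    return None

print(join("ABC", "BCD"))   # joins into ABCD
print(join("ABC", "ABD"))   # does not satisfy the definition

# All 3-itemsets of ABCD, and the ordered pairs among them that can be joined.
subsets = ["".join(c) for c in combinations("ABCD", 3)]
joinable_pairs = [(p, q) for p, q in permutations(subsets, 2) if join(p, q)]
print(joinable_pairs)
```

Since exactly one pair of subsets produces a given k-itemset, each candidate is generated once, which is what allows a candidate to be identified by a pair of (k-1)-itemsets.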

Algorithm PHS (Perfect Hashing and Data Shrinking)

Example 1 (sup=2)

TID  Items
100  ACD
200  BCE
300  BCDE
400  BE

TID  Items
100  (CD)
200  (BC)(BE)(CE)
300  (BC)(BD)(BE)(CD)(CE)(DE)
400  (BE)

Itemsets  (BC)  (BD)  (BE)  (CD)  (CE)  (DE)
Support   2     1     3     2     2     1

Encoding  A     B     C     D
Original  (BC)  (BE)  (CD)  (CE)

$\mathrm{hash}(X, Y) = C^{n}_{2} - C^{\,n-\mathrm{index}(X)}_{2} + (\mathrm{index}(Y) - \mathrm{index}(X)) - 1$
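A perfect hash for 2-itemsets gives every pair of items its own table address, so no two pairs collide. The sketch below ranks pairs lexicographically with one closed form that achieves this. It assumes n ordered items whose positions are counted from 0; whether this matches the slide's formula term for term is an assumption, and the name `pair_hash` is hypothetical. The transactions and the items B, C, D, E are those of Example 1.

```python
from itertools import combinations
from math import comb

def pair_hash(i, j, n):
    """Address of the pair (i, j), with 0 <= i < j < n, in lexicographic order.

    comb(n, 2) - comb(n - i, 2) counts the pairs whose first position is below i;
    (j - i) - 1 is the offset of j among the partners of i.
    """
    return comb(n, 2) - comb(n - i, 2) + (j - i) - 1

items = "BCDE"                        # large 1-itemsets of Example 1
index = {item: pos for pos, item in enumerate(items)}
n = len(items)

# Every pair gets a distinct address in 0 .. comb(n, 2) - 1.
addresses = [pair_hash(i, j, n) for i, j in combinations(range(n), 2)]
print(addresses)

# Count 2-itemset supports directly in the table, with no collisions.
db = ["ACD", "BCE", "BCDE", "BE"]
table = [0] * comb(n, 2)
for t in db:
    kept = [x for x in sorted(t) if x in index]   # items that are not large are dropped
    for x, y in combinations(kept, 2):
        table[pair_hash(index[x], index[y], n)] += 1
print(table)
```

With this addressing the table needs one slot per possible pair, whether or not the pair ever occurs, which is the space cost that the loading density discussion later in the slides addresses.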

Example 2 (sup=2)

TID  Items
100  Null
200  (AD)
300  (AC)(AD)
400  Null

Itemsets  (AB)  (AC)  (AD)  (BC)  (BD)  (CD)
Support   0     1     2     0     0     0

Encoding  A
Original  (AD)

Decode -> (BC)(CE) = BCE
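The step from Example 1 to Example 2 (encode the large 2-itemsets, rewrite each transaction as joinable pairs of codes, then decode the surviving pair) can be sketched as follows. This is a minimal sketch that reproduces the tables of the two examples; it is not the authors' implementation, and the names `joinable`, `shrink`, and `decode` are hypothetical.

```python
from collections import Counter

def joinable(p, q):
    """True if itemsets p and q (strings of ordered items) satisfy the join definition."""
    return p != q and p[1:] == q[:-1]

def shrink(transactions, encoding):
    """Rewrite each transaction as pairs of codes whose original itemsets can be joined."""
    shrunk = []
    for itemsets in transactions:
        kept = [s for s in itemsets if s in encoding]   # drop itemsets that are not large
        shrunk.append(
            [encoding[p] + encoding[q] for p in kept for q in kept if joinable(p, q)]
        )
    return shrunk

def decode(pair, encoding):
    """Map a pair of codes back to the itemset obtained by joining their originals."""
    original = {code: itemset for itemset, code in encoding.items()}
    p, q = original[pair[0]], original[pair[1]]
    return p + q[-1]

# 2-itemsets per transaction and the encoding, as in Example 1.
transactions = [["CD"], ["BC", "BE", "CE"], ["BC", "BD", "BE", "CD", "CE", "DE"], ["BE"]]
encoding = {"BC": "A", "BE": "B", "CD": "C", "CE": "D"}

shrunk = shrink(transactions, encoding)
support = Counter(pair for t in shrunk for pair in t)
large_pairs = [pair for pair, count in support.items() if count >= 2]
decoded = [decode(pair, encoding) for pair in large_pairs]

print(shrunk)        # an empty list plays the role of Null
print(dict(support))
print(decoded)
```

After encoding, every pass again works on pairs of symbols, so the same 2-itemset hash table layout can be reused at each level while the transactions get shorter.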


Problem on Hash Table

Consider a database that contains p transactions, each comprised of unique items and of equal length N, with a minimum support of 1.

Loading density :2( )

, ( 1)( 1)

N kpm N

How to Improve the Loading Density

Two-level perfect hash scheme (partial hash)

Hash Table  C  D  Null  Null
Count       1  2

Itemsets  (AB)  (AC)  (AD)  (BC)  (BD)  (CD)
Support   0     1     2     0     0     0

Experiments

Figure: T5I4D200K. Axis: Minimum Support (%) (1.5, 1.25, 1, 0.75, 0.5, 0.25). Series: PHS, DHP, Apriori.

Figure: T20I4D100K. Axis: Minimum Support (%) (1.25, 1, 0.75, 0.5, 0.25). Series: PHS, DHP, MPHP.

Figure: Increasing Number of Transactions. Axis: Number of Transactions. Series: PHS and DHP, each on two datasets.

Figure: T15I8D500K. Axis: Support (%) (1.5, 1.25, 1, 0.75, 0.5). Series: Direct Hash, Partial Hash.

Figure: T15I8D500K (sup=0.5%). Axis: Passes (2 to 10). Series: Direct Hash, Partial Hash.

Conclusions

We examined in this paper the issue of mining association rules among items in a large database of sales transactions. The problem of discovering large itemsets was solved by constructing a candidate set of itemsets first and then identifying, within this candidate set, those itemsets that meet the large itemset requirement.
