
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1

Association Rule Mining

Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction

Market-Basket transactions

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example of Association Rules

{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!


Definition: Frequent Itemset

Itemset
– A collection of one or more items
  Example: {Milk, Bread, Diaper}
– k-itemset: an itemset that contains k items

Support count (σ)
– Frequency of occurrence of an itemset
– E.g. σ({Milk, Bread, Diaper}) = 2

Support (s)
– Fraction of transactions that contain an itemset
– E.g. s({Milk, Bread, Diaper}) = 2/5

Frequent Itemset
– An itemset whose support is greater than or equal to a minsup threshold
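The definitions above can be sketched in a few lines of Python. This is a minimal illustration on the market-basket table; the function names `support_count`, `support` and `is_frequent` are assumptions chosen for this example, not names used in the slides.

```python
# Market-basket transactions from the table (TID 1 to 5).
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, transactions):
    """sigma(itemset): number of transactions containing every item."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(itemset, transactions):
    """s(itemset): fraction of transactions containing the itemset."""
    return support_count(itemset, transactions) / len(transactions)


def is_frequent(itemset, transactions, minsup):
    """True if the support reaches the minsup threshold (a fraction)."""
    return support(itemset, transactions) >= minsup


example = {"Milk", "Bread", "Diaper"}
print(support_count(example, TRANSACTIONS))  # 2
print(support(example, TRANSACTIONS))        # 0.4
```

Storing each transaction as a set makes the containment test a single subset comparison (`items <= t`).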


Definition: Association Rule

Association Rule
– An implication expression of the form X → Y, where X and Y are itemsets
– Example: {Milk, Diaper} → {Beer}

Rule Evaluation Metrics
– Support (s): fraction of transactions that contain both X and Y
– Confidence (c): measures how often items in Y appear in transactions that contain X

Example: {Milk, Diaper} → {Beer}

$$s = \frac{\sigma(\text{Milk, Diaper, Beer})}{|T|} = \frac{2}{5} = 0.4$$

$$c = \frac{\sigma(\text{Milk, Diaper, Beer})}{\sigma(\text{Milk, Diaper})} = \frac{2}{3} \approx 0.67$$
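As a hedged sketch, the two metrics can be computed directly from support counts. The helper name `rule_metrics` is an assumption made for this illustration.

```python
# Market-basket transactions from the table (TID 1 to 5).
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    x, y = set(antecedent), set(consequent)
    n_xy = sum(1 for t in transactions if (x | y) <= t)  # sigma(X u Y)
    n_x = sum(1 for t in transactions if x <= t)         # sigma(X)
    s = n_xy / len(transactions)
    c = n_xy / n_x if n_x else 0.0  # confidence is undefined if X never occurs
    return s, c


s, c = rule_metrics({"Milk", "Diaper"}, {"Beer"}, TRANSACTIONS)
print(f"s={s:.2f}, c={c:.2f}")  # s=0.40, c=0.67
```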


Association Rule Mining Task

Given a set of transactions T, the goal of association rule mining is to find all rules having
– support ≥ minsup threshold
– confidence ≥ minconf threshold

Brute-force approach:
– List all possible association rules
– Compute the support and confidence for each rule
– Prune rules that fail the minsup and minconf thresholds

Computationally prohibitive!
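To get a feel for why the brute-force approach does not scale, the sketch below enumerates every possible rule X → Y (X and Y non-empty and disjoint) over the six items of the market-basket example. The generator name `all_rules` is an assumption for this illustration. The closed form 3^d − 2^(d+1) + 1 is a standard counting result that does not appear on the slide; it is used here only as a cross-check.

```python
from itertools import combinations

ITEMS = ["Beer", "Bread", "Coke", "Diaper", "Eggs", "Milk"]


def all_rules(items):
    """Yield every rule (lhs, rhs) with non-empty, disjoint sides."""
    items = sorted(items)
    for k in range(2, len(items) + 1):
        for itemset in combinations(items, k):
            # Every non-empty proper subset of the itemset is a left-hand side.
            for r in range(1, k):
                for lhs in combinations(itemset, r):
                    rhs = tuple(i for i in itemset if i not in lhs)
                    yield lhs, rhs


d = len(ITEMS)
n_rules = sum(1 for _ in all_rules(ITEMS))
print(n_rules)                     # 602
print(3 ** d - 2 ** (d + 1) + 1)   # 602
```

Even with only six items there are several hundred rules to evaluate, and the count grows exponentially with the number of items.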


Mining Association Rules

Example of Rules:

{Milk, Diaper} → {Beer} (s=0.4, c=0.67)
{Milk, Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk, Beer} (s=0.4, c=0.5)
{Milk} → {Diaper, Beer} (s=0.4, c=0.5)


Observations:

• All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}

• Rules originating from the same itemset have identical support but can have different confidence

• Thus, we may decouple the support and confidence requirements
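The observations can be checked with a short sketch that lists every binary partition of {Milk, Diaper, Beer} together with its support and confidence. The function name `partition_rules` is an assumption for this illustration.

```python
from itertools import combinations

# Market-basket transactions from the table (TID 1 to 5).
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def partition_rules(itemset, transactions):
    """Return (lhs, rhs, support, confidence) for every binary partition."""
    itemset = frozenset(itemset)
    n = len(transactions)
    sigma_all = sum(1 for t in transactions if itemset <= t)
    rules = []
    for r in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), r):
            lhs = frozenset(lhs)
            sigma_lhs = sum(1 for t in transactions if lhs <= t)
            if sigma_lhs == 0:
                continue  # confidence undefined when the lhs never occurs
            rules.append((lhs, itemset - lhs, sigma_all / n, sigma_all / sigma_lhs))
    return rules


RULES = partition_rules({"Milk", "Diaper", "Beer"}, TRANSACTIONS)
for lhs, rhs, s, c in RULES:
    print(sorted(lhs), "->", sorted(rhs), f"s={s:.2f} c={c:.2f}")
```

The support is computed once per itemset, while the confidence needs only the support count of each left-hand side. This is the decoupling the last observation refers to.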


Mining Association Rules

Two-step approach:

1. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup

2. Rule Generation
– Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset

Frequent itemset generation is still computationally expensive


Frequent Itemset Generation

[Figure: itemset lattice for five items A to E, one level per itemset size]

null
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE

Given d items, there are 2^d possible candidate itemsets


Frequent Itemset Generation

Brute-force approach:
– Each itemset in the lattice is a candidate frequent itemset
– Count the support of each candidate by scanning the database
– Match each transaction against every candidate
– Complexity ~ O(NMw) => expensive since M = 2^d !!!

[Figure: the N transactions of the market-basket table are matched against a list of M candidates; w is the maximum transaction width]
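A minimal sketch of the brute-force approach follows, assuming the market-basket table as input and a minimum support count of 3. The function name `brute_force_frequent` and the comparison counter are illustrative assumptions; the counter only makes the N × M matching cost visible.

```python
from itertools import combinations

# Market-basket transactions from the table (TID 1 to 5).
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def brute_force_frequent(transactions, minsup_count):
    """Count every non-empty itemset in the lattice against every transaction."""
    items = sorted(set().union(*transactions))
    counts = {}
    comparisons = 0
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            cand = frozenset(cand)
            counts[cand] = 0
            for t in transactions:
                comparisons += 1          # one candidate/transaction match
                if cand <= t:
                    counts[cand] += 1
    frequent = {c: n for c, n in counts.items() if n >= minsup_count}
    return frequent, len(counts), comparisons


FREQUENT, N_CANDIDATES, N_COMPARISONS = brute_force_frequent(TRANSACTIONS, 3)
print(N_CANDIDATES, N_COMPARISONS, len(FREQUENT))
```

With d = 6 items the lattice already holds 2^6 − 1 non-empty candidates, each matched against all N = 5 transactions.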


Frequent Itemset Generation Strategies

Reduce the number of candidates (M)
– Complete search: M = 2^d
– Use pruning techniques to reduce M

Reduce the number of transactions (N)
– Reduce size of N as the size of itemset increases
– Used by DHP and vertical-based mining algorithms

Reduce the number of comparisons (NM)
– Use efficient data structures to store the candidates or transactions
– No need to match every candidate against every transaction


Reducing Number of Candidates

Apriori principle:
– If an itemset is frequent, then all of its subsets must also be frequent

Apriori principle holds due to the following property of the support measure:

$$\forall X, Y : (X \subseteq Y) \Rightarrow s(X) \geq s(Y)$$

– Support of an itemset never exceeds the support of its subsets
– This is known as the anti-monotone property of support
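The anti-monotone property can be checked exhaustively on a small database. The sketch below does so for the market-basket table; the function name `check_anti_monotone` is an assumption for this illustration, and an exhaustive check like this is only practical for a handful of items.

```python
from itertools import combinations

# Market-basket transactions from the table (TID 1 to 5).
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def check_anti_monotone(transactions):
    """True if sigma(X) >= sigma(Y) holds for every pair with X a subset of Y."""
    items = sorted(set().union(*transactions))
    itemsets = [frozenset(c)
                for k in range(1, len(items) + 1)
                for c in combinations(items, k)]
    sigma = {x: sum(1 for t in transactions if x <= t) for x in itemsets}
    return all(sigma[x] >= sigma[y]
               for x in itemsets for y in itemsets if x <= y)


print(check_anti_monotone(TRANSACTIONS))  # True
```

The property holds by construction: any transaction that contains Y also contains every subset X of Y.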


Illustrating Apriori Principle

[Figure: the itemset lattice over items A to E (null; A B C D E; AB to DE; ABC to CDE; ABCD to BCDE; ABCDE), drawn twice. One itemset is marked "Found to be Infrequent" and all of its supersets are marked "Pruned supersets".]


Illustrating Apriori Principle

Minimum Support = 3

Items (1-itemsets)

Item    Count
Bread   4
Coke    2
Milk    4
Beer    3
Diaper  4
Eggs    1

Pairs (2-itemsets)
(No need to generate candidates involving Coke or Eggs)

Itemset          Count
{Bread, Milk}    3
{Bread, Beer}    2
{Bread, Diaper}  3
{Milk, Beer}     2
{Milk, Diaper}   3
{Beer, Diaper}   3

Triplets (3-itemsets)

Itemset                Count
{Bread, Milk, Diaper}  2

If every subset is considered, 6C1 + 6C2 + 6C3 = 6 + 15 + 20 = 41 candidates
With support-based pruning, 6 + 6 + 1 = 13 candidates
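The two candidate counts can be reproduced with a few lines of arithmetic. This is a sketch under the slide's setting (six items, minimum support count 3); the variable names are assumptions for this illustration.

```python
from math import comb

n_items = 6           # Bread, Coke, Milk, Beer, Diaper, Eggs
n_frequent_items = 4  # Bread, Milk, Beer, Diaper have count >= 3

# Without pruning: every itemset of size 1, 2 and 3 is a candidate.
without_pruning = comb(n_items, 1) + comb(n_items, 2) + comb(n_items, 3)

# With pruning: 6 single items, all pairs of the 4 frequent items, and the
# one triplet whose 2-item subsets are all frequent: {Bread, Milk, Diaper}.
with_pruning = n_items + comb(n_frequent_items, 2) + 1

print(without_pruning, with_pruning)  # 41 13
```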


Frequent Itemset Mining

2 strategies:

– Breadth-first: Apriori
  • Exploit monotonicity to the maximum

– Depth-first strategy: Eclat
  • Prune the database
  • Do not fully exploit monotonicity


Apriori

Worked example on a four-item database, minsup = 2 (support count):

TID  Items
1    B, C
2    B, C
3    A, C, D
4    A, B, C, D
5    B, D

Candidates at level 1: A, B, C, D. Support counts after scanning each transaction:

After TID  A  B  C  D
(start)    0  0  0  0
1          0  1  1  0
2          0  2  2  0
3          1  2  3  1
4          2  3  4  2
5          2  4  4  3

All four items reach minsup.

Candidates at level 2, with their support counts after one scan:

AB  AC  AD  BC  BD  CD
1   2   2   3   2   2

AB is infrequent; the other five pairs are frequent.

Candidates at level 3, with their support counts after one scan:

ACD  BCD
2    1

ACD is frequent; BCD is infrequent.


Apriori Algorithm

Method:

– Let k=1
– Generate frequent itemsets of length 1
– Repeat until no new frequent itemsets are identified
  • Generate length (k+1) candidate itemsets from length k frequent itemsets
  • Prune candidate itemsets containing subsets of length k that are infrequent
  • Count the support of each candidate by scanning the DB
  • Eliminate candidates that are infrequent, leaving only those that are frequent
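The method above can be sketched as follows. This is a minimal, unoptimized illustration, not the authors' implementation: the function name `apriori`, the use of a support count as the threshold, and the simple pairwise join for candidate generation are all assumptions made here. The input is the four-item database (items A to D, minsup = 2) used in the Apriori walkthrough slides.

```python
from itertools import combinations

# Four-item example database, TID 1 to 5.
DB = [{"B", "C"}, {"B", "C"}, {"A", "C", "D"}, {"A", "B", "C", "D"}, {"B", "D"}]


def apriori(transactions, minsup_count):
    """Return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while candidates:
        # Count the support of each candidate with one scan of the DB.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        # Eliminate infrequent candidates.
        level = {c: n for c, n in counts.items() if n >= minsup_count}
        frequent.update(level)
        # Generate (k+1)-candidates by joining frequent k-itemsets ...
        joined = {a | b for a in level for b in level if len(a | b) == k + 1}
        # ... and prune those that contain an infrequent k-subset.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k))]
        k += 1
    return frequent


RESULT = apriori(DB, 2)
for itemset in sorted(RESULT, key=lambda s: (len(s), sorted(s))):
    print("".join(sorted(itemset)), RESULT[itemset])
```

The pairwise join is the simplest way to produce candidates; practical implementations usually join only itemsets that share a common (k−1)-prefix to avoid generating duplicates.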


Depth-First Algorithms

Find all frequent itemsets in the database (minsup = 2):

TID  Items
1    B, C
2    B, C
3    A, C, D
4    A, B, C, D
5    B, D

Split the problem into two smaller ones and solve each in the same way:

– with D: find all frequent itemsets in the transactions that contain D
  Result: A, B, C, AC
  Add D again: AD, BD, CD, ACD

– without D: find all frequent itemsets in the database with D removed
  Result: A, B, C, AC, BC

Combined result: A, B, C, AC, BC, AD, BD, CD, ACD


Depth-First Algorithm

Database DB (minsup = 2):

TID  Items
1    B, C
2    B, C
3    A, C, D
4    A, B, C, D
5    B, D

Support counts in DB: A: 2, B: 4, C: 4, D: 3

Step 1: project on D

DB[D] (transactions that contain D, with D removed):
3    A, C
4    A, B, C
5    B
Support counts in DB[D]: A: 2, B: 2, C: 2

  DB[CD] (transactions of DB[D] that contain C, with C removed):
  3    A
  4    A, B
  Support counts in DB[CD]: A: 2, so AC: 2 in DB[D]

  DB[BD] (transactions of DB[D] that contain B, with B and C removed):
  4    A
  Support counts in DB[BD]: A: 1 (infrequent)

Adding D back: AD: 2, BD: 2, CD: 2, ACD: 2

Step 2: project on C

DB[C] (transactions that contain C, with C and D removed):
1    B
2    B
3    A
4    A, B
Support counts in DB[C]: A: 2, B: 3

  DB[BC] (transactions of DB[C] that contain B, with B removed):
  1
  2
  4    A
  Support counts in DB[BC]: A: 1 (infrequent)

Adding C back: AC: 2, BC: 3

Step 3: project on B

DB[B] (transactions that contain B, with B, C and D removed):
1
2
4    A
5
Support counts in DB[B]: A: 1 (infrequent)

Final set of frequent itemsets

A: 2, B: 4, C: 4, D: 3, AD: 2, BD: 2, CD: 2, ACD: 2, AC: 2, BC: 3
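The depth-first procedure traced above can be sketched as a short recursive function over projected databases. This is an illustrative sketch only: the function name `depth_first`, the alphabetical item order and the use of a support count as the threshold are assumptions made here, and the slides do not prescribe a specific implementation.

```python
# Four-item example database, TID 1 to 5.
DB = [{"B", "C"}, {"B", "C"}, {"A", "C", "D"}, {"A", "B", "C", "D"}, {"B", "D"}]


def depth_first(transactions, minsup_count, suffix=frozenset()):
    """Mine frequent itemsets by recursively projecting the database."""
    # Count the items of the current (projected) database.
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    frequent_items = sorted(i for i, n in counts.items() if n >= minsup_count)

    result = {}
    # Handle items from last to first (D, then C, then B, then A).
    for idx in range(len(frequent_items) - 1, -1, -1):
        item = frequent_items[idx]
        itemset = suffix | {item}
        result[itemset] = counts[item]
        # Projected database: transactions containing `item`, restricted to
        # the frequent items that come before it in the order.
        earlier = set(frequent_items[:idx])
        projected = [t & earlier for t in transactions if item in t]
        result.update(depth_first(projected, minsup_count, itemset))
    return result


FOUND = depth_first(DB, 2)
for itemset in sorted(FOUND, key=lambda s: (len(s), sorted(s))):
    print("".join(sorted(itemset)), FOUND[itemset])
```

Each recursive call works on a smaller database, which is the sense in which the depth-first strategy prunes the database rather than the candidate list.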


ECLAT

For each item, store a list of transaction ids (tids)

Horizontal Data Layout

TID  Items
1    A, B, E
2    B, C, D
3    C, E
4    A, C, D
5    A, B, C, D
6    A, E
7    A, B
8    A, B, C
9    A, C, D
10   B

Vertical Data Layout (TID-lists)

A  B   C  D  E
1  1   2  2  1
4  2   3  4  3
5  5   4  5  6
6  7   5  9
7  8   8
8  10  9
9


ECLAT

Determine support of any k-itemset by intersecting tid-lists of two of its (k-1) subsets.

A:  1, 4, 5, 6, 7, 8, 9
B:  1, 2, 5, 7, 8, 10
AB = A ∩ B:  1, 5, 7, 8

Depth-first traversal of the search lattice
Advantage: very fast support counting
Disadvantage: intermediate tid-lists may become too large for memory
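A minimal sketch of the vertical layout and of support counting by tid-list intersection is shown below. The function names `vertical_layout` and `eclat`, and the choice of a support count of 2 as threshold, are assumptions for this illustration; the sketch keeps all intermediate tid-lists in memory, which is exactly the disadvantage noted above.

```python
# Horizontal layout of the ten-transaction example: TID -> items.
HORIZONTAL = {
    1: {"A", "B", "E"}, 2: {"B", "C", "D"}, 3: {"C", "E"}, 4: {"A", "C", "D"},
    5: {"A", "B", "C", "D"}, 6: {"A", "E"}, 7: {"A", "B"}, 8: {"A", "B", "C"},
    9: {"A", "C", "D"}, 10: {"B"},
}


def vertical_layout(transactions):
    """Convert TID -> items into item -> set of TIDs (the tid-list)."""
    tidlists = {}
    for tid, items in transactions.items():
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def eclat(prefix, items, minsup_count, out):
    """Depth-first search; `items` is a list of (item, tidset) extensions."""
    while items:
        item, tids = items.pop()
        if len(tids) >= minsup_count:
            itemset = prefix | {item}
            out[itemset] = len(tids)  # support = length of the tid-list
            # Intersect with the remaining tid-lists to extend the itemset.
            suffix = [(other, tids & other_tids) for other, other_tids in items]
            eclat(itemset, suffix, minsup_count, out)
    return out


TIDLISTS = vertical_layout(HORIZONTAL)
print(sorted(TIDLISTS["A"] & TIDLISTS["B"]))  # [1, 5, 7, 8]

MINED = eclat(frozenset(), sorted(TIDLISTS.items(), key=lambda kv: kv[0]), 2, {})
print(MINED[frozenset("AB")])  # 4
```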


Rule Generation

Given a frequent itemset L, find all non-empty subsets f ⊂ L such that f → L – f satisfies the minimum confidence requirement
– If {A,B,C,D} is a frequent itemset, candidate rules:

ABC → D, ABD → C, ACD → B, BCD → A,
A → BCD, B → ACD, C → ABD, D → ABC,
AB → CD, AC → BD, AD → BC, BC → AD, BD → AC, CD → AB

If |L| = k, then there are 2^k – 2 candidate association rules (ignoring L → ∅ and ∅ → L)
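The enumeration of candidate rules can be sketched as below. The function names `candidate_rules` and `generate_rules` are assumptions for this illustration. The support counts in `SIGMA` are those of the frequent itemset ACD and its subsets in the four-item example database used earlier in the slides, and the 0.8 confidence threshold is an arbitrary value chosen for the example.

```python
from itertools import combinations


def candidate_rules(itemset):
    """Yield (f, L - f) for every non-empty proper subset f of L."""
    itemset = frozenset(itemset)
    for r in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), r):
            yield frozenset(lhs), itemset - frozenset(lhs)


def generate_rules(itemset, sigma, minconf):
    """Keep the candidate rules whose confidence reaches minconf."""
    itemset = frozenset(itemset)
    rules = []
    for f, rest in candidate_rules(itemset):
        conf = sigma[itemset] / sigma[f]  # c(f -> L - f) = sigma(L) / sigma(f)
        if conf >= minconf:
            rules.append((f, rest, conf))
    return rules


# Support counts taken from the four-item example (minsup = 2).
SIGMA = {
    frozenset("A"): 2, frozenset("C"): 4, frozenset("D"): 3,
    frozenset("AC"): 2, frozenset("AD"): 2, frozenset("CD"): 2,
    frozenset("ACD"): 2,
}

print(len(list(candidate_rules("ABCD"))))  # 14, i.e. 2**4 - 2
for f, rest, conf in generate_rules("ACD", SIGMA, 0.8):
    print("".join(sorted(f)), "->", "".join(sorted(rest)), round(conf, 2))
```

No database scan is needed at this stage: every support count required for the confidence was already obtained during frequent itemset generation.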


Rule Generation

How to efficiently generate rules from frequent itemsets?
– In general, confidence does not have an anti-monotone property
  c(ABC → D) can be larger or smaller than c(AB → D)
– But confidence of rules generated from the same itemset has an anti-monotone property
– e.g., L = {A,B,C,D}:

  c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD)

  Confidence is anti-monotone w.r.t. number of items on the RHS of the rule
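The inequality chain follows directly from the definition of confidence and the anti-monotone property of support. A short derivation, using the σ notation for support count from the earlier definitions:

```latex
c(ABC \to D) = \frac{\sigma(ABCD)}{\sigma(ABC)}, \qquad
c(AB \to CD) = \frac{\sigma(ABCD)}{\sigma(AB)}, \qquad
c(A \to BCD) = \frac{\sigma(ABCD)}{\sigma(A)}

\sigma(ABC) \le \sigma(AB) \le \sigma(A)
\;\Longrightarrow\;
c(ABC \to D) \ge c(AB \to CD) \ge c(A \to BCD)
```

All three rules share the numerator σ(ABCD) because they come from the same itemset. Moving items from the left-hand side to the right-hand side can only increase the denominator, so the confidence can only decrease.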


Rule Generation for Apriori Algorithm

Lattice of rules

ABCD=>{ }
BCD=>A  ACD=>B  ABD=>C  ABC=>D
BC=>AD  BD=>AC  CD=>AB  AD=>BC  AC=>BD  AB=>CD
D=>ABC  C=>ABD  B=>ACD  A=>BCD

[Figure: the lattice of rules is drawn twice. In the second copy one rule is marked "Low Confidence Rule" and a group of rules is marked "Pruned Rules".]