icdm'06 panel 1 apriori algorithm rakesh agrawal ramakrishnan srikant (description by c....

17
ICDM'06 Panel ICDM'06 Panel 1 Apriori Algorithm Apriori Algorithm Rakesh Agrawal Rakesh Agrawal Ramakrishnan Srikant Ramakrishnan Srikant (description by C. Faloutsos) (description by C. Faloutsos)

Upload: julius-flowers

Post on 16-Dec-2015

212 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)

ICDM'06 PanelICDM'06 Panel 11

Apriori AlgorithmApriori Algorithm

Rakesh AgrawalRakesh Agrawal

Ramakrishnan SrikantRamakrishnan Srikant(description by C. Faloutsos)(description by C. Faloutsos)

Page 2: ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)

ICDM'06 PanelICDM'06 Panel 22

Association rules - ideaAssociation rules - idea

[Agrawal+SIGMOD93][Agrawal+SIGMOD93] Consider ‘market basket’ case:Consider ‘market basket’ case:

(milk, bread)(milk, bread)(milk)(milk)(milk, chocolate)(milk, chocolate)(milk, bread)(milk, bread)

Find ‘interesting things’, eg., rules of the form:Find ‘interesting things’, eg., rules of the form: milk, bread -> chocolate | 90%milk, bread -> chocolate | 90%

Page 3: ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)

ICDM'06 PanelICDM'06 Panel 33

Association rules - ideaAssociation rules - idea

In general, for a given ruleIn general, for a given ruleIj, Ik, ... Im -> Ix | cIj, Ik, ... Im -> Ix | c

‘‘s’ = support: how often people buy Ij, ... Im, Ixs’ = support: how often people buy Ij, ... Im, Ix

‘‘c’ = ‘confidence’ (how often people by Ix, given c’ = ‘confidence’ (how often people by Ix, given that they have bought Ij, ... Imthat they have bought Ij, ... Im

Ix

40%20%

Eg.: s = 20% c= 20/40= 50%

Page 4: ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)

ICDM'06 PanelICDM'06 Panel 44

Association rules - ideaAssociation rules - idea

Problem definition:Problem definition: givengiven

– a set of ‘market baskets’ (=binary matrix, of N a set of ‘market baskets’ (=binary matrix, of N rows/baskets and M columns/products)rows/baskets and M columns/products)

– min-support ‘s’ andmin-support ‘s’ and– min-confidence ‘c’min-confidence ‘c’

findfind– all the rules with higher support and confidenceall the rules with higher support and confidence

Page 5: ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)

ICDM'06 PanelICDM'06 Panel 55

Association rules - ideaAssociation rules - idea

Closely related concept: “large itemset”Closely related concept: “large itemset”Ij, Ik, ... Im, IxIj, Ik, ... Im, Ix

is a ‘large itemset’, if it appears more than is a ‘large itemset’, if it appears more than ‘min-support’ times‘min-support’ times

Observation: once we have a ‘large itemset’, Observation: once we have a ‘large itemset’, we can find out the qualifying rules easily we can find out the qualifying rules easily

Thus, we focus on finding ‘large itemsets’Thus, we focus on finding ‘large itemsets’

Page 6: ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)

ICDM'06 PanelICDM'06 Panel 66

Association rules - ideaAssociation rules - idea

Naive solution: scan database once; keep 2**|Naive solution: scan database once; keep 2**|I| countersI| counters

Drawback?Drawback?

Improvement?Improvement?

Page 7: ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)

ICDM'06 PanelICDM'06 Panel 77

Association rules - ideaAssociation rules - idea

Naive solution: scan database once; keep 2**|Naive solution: scan database once; keep 2**|I| countersI| counters

Drawback? 2**1000 is prohibitive...Drawback? 2**1000 is prohibitive...

Improvement? scan the db |I| times, looking Improvement? scan the db |I| times, looking for 1-, 2-, etc itemsetsfor 1-, 2-, etc itemsets

Eg., for |I|=4 items only (a,b,c,d), we haveEg., for |I|=4 items only (a,b,c,d), we have

Page 8: ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)

ICDM'06 PanelICDM'06 Panel 88

What itemsets do you What itemsets do you count?count? Anti-monotonicity: Any superset of an infrequent

itemset is also infrequent (SIGMOD ’93).– If an itemset is infrequent, don’t count any of its

extensions.

Flip the property: All subsets of a frequent itemset are frequent.

Need not count any candidate that has an infrequent subset (VLDB ’94)– Simultaneously observed by Mannila et al., KDD ’94

Broadly applicable to extensions and restrictions.

Page 9: ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)

ICDM'06 PanelICDM'06 Panel 99

Apriori Algorithm: Apriori Algorithm: Breadth First SearchBreadth First Search

{ }

a b c d

say, min-sup = 10120

Page 10: ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)

ICDM'06 PanelICDM'06 Panel 1010

Apriori Algorithm: Apriori Algorithm: Breadth First SearchBreadth First Search

{ }

a b c d

say, min-sup = 10120

80 30570

Page 11: ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)

ICDM'06 PanelICDM'06 Panel 1111

Apriori Algorithm: Apriori Algorithm: Breadth First SearchBreadth First Search

{ }

a b c d

80 30570

say, min-sup = 10

Page 12: ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)

ICDM'06 PanelICDM'06 Panel 1212

Apriori Algorithm: Apriori Algorithm: Breadth First SearchBreadth First Search

{ }

a b c

a b a d

d

b d

Page 13: ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)

ICDM'06 PanelICDM'06 Panel 1313

Apriori Algorithm: Apriori Algorithm: Breadth First SearchBreadth First Search

{ }

a b c

a b a d

d

b d

Page 14: ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)

ICDM'06 PanelICDM'06 Panel 1414

Apriori Algorithm: Apriori Algorithm: Breadth First SearchBreadth First Search

{ }

a b c

a b a d

a b d

d

b d

Page 15: ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)

ICDM'06 PanelICDM'06 Panel 1515

Apriori Algorithm: Apriori Algorithm: Breadth First SearchBreadth First Search

{ }

a b c

a b a d

a b d

d

b d

Page 16: ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)

ICDM'06 PanelICDM'06 Panel 1616

Subsequent Algorithmic Subsequent Algorithmic InnovationsInnovations Reducing the cost of checking whether a candidate Reducing the cost of checking whether a candidate

itemset is contained in a transaction:itemset is contained in a transaction:– TID intersection.TID intersection.– Database projection, FP GrowthDatabase projection, FP Growth

Reducing the number of passes over the data:Reducing the number of passes over the data:– Sampling & Dynamic CountingSampling & Dynamic Counting

Reducing the number of candidates counted:Reducing the number of candidates counted:– For maximal patterns & constraints.For maximal patterns & constraints.

Many other innovative ideas …Many other innovative ideas …

Page 17: ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)

ICDM'06 PanelICDM'06 Panel 1717

ImpactImpact

Concepts in Apriori also applied to Concepts in Apriori also applied to many generalizations, e.g., many generalizations, e.g., taxonomies, quantitative Associations, taxonomies, quantitative Associations, sequential Patterns, graphs, …sequential Patterns, graphs, …

Over 3600 citations in Google Scholar.Over 3600 citations in Google Scholar.