Rule induction
Dr Beatriz de la Iglesia, email: [email protected]
Outline
• What are rules?
• Rule Evaluation
• Classification rules
• Association rules
Rule induction (RI)
• As their name suggests, RI algorithms generate rules of some type.
• Rules take the form: IF antecedent THEN consequent.
• For example, in a medical insurance DB:
IF (status = unemployed) THEN (diabetes = YES)
• Rules do not imply causality! Otherwise being unemployed would be the cause of diabetes! They simply show associations between values in the real world.
RI - Classification
• In the context of classification, rules are of the form:
IF (set of conditions) THEN (class),
e.g. IF (age < 25 AND car_group > 15) THEN (Risk = High)
• In a rule IF X THEN Y, X is referred to as the antecedent of the rule.
• Y is the consequent of the rule, describing in this case a classification outcome.
• Association rules have a conjunctive consequent, i.e. one made up of many clauses joined by AND operators:
• IF (set of items) THEN (set of items)
• IF (bread and cheese) THEN (wine and crackers)
Definition – classification rule
• A general definition of the antecedent and consequent is:
• The left-hand side of the rule is a description of a subset of the population.
• The right-hand side of the rule is a description of interesting behaviour particular to the population on the left-hand side.
• An example of a (complex) rule is:

IF (weight/height² ≥ 30) AND (smoker = true) THEN (heart disease = true)

The condition to the left of THEN is the antecedent; (heart disease = true) is the consequent.
Rule Evaluation - general
• Support, confidence and coverage.
• The support for the antecedent, sup(ant), is the number of records in the database for which the antecedent holds.
• The support for the consequent sup(con) is the number of records in the database for which the consequent holds.
• The support for the rule sup(rule) is the number of records in the database for which the rule holds (antecedent and consequent hold).
• From these, other measures can be derived:

confidence = sup(rule) / sup(ant)
coverage = sup(rule) / sup(con)
Rule Evaluation
IF BMI > 30 AND smoker = true THEN heart disease = true

BMI | smoker | age | heart disease
35 | true | 55 | true
40 | true | 48 | true
25 | true | 57 | false
35 | false | 72 | true
31 | false | 45 | false
42 | true | 65 | false
37 | true | 43 | false

sup(ant) = 4
sup(con) = 3
sup(rule) = 2

Confidence = 2/4 = 50%
Coverage = 2/3 = 67%
Default confidence = 3/7 = 43% (the proportion of all records for which the consequent holds)
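These measures are simple counts over the data set, so they are easy to compute directly. Below is a minimal sketch that reproduces the numbers above; the record layout and field names are illustrative assumptions, not from the lecture.

```python
# Illustrative sketch: support, confidence and coverage for
# IF BMI > 30 AND smoker = true THEN heart disease = true.
# Field names are assumptions for this example.
records = [
    {"bmi": 35, "smoker": True,  "age": 55, "heart_disease": True},
    {"bmi": 40, "smoker": True,  "age": 48, "heart_disease": True},
    {"bmi": 25, "smoker": True,  "age": 57, "heart_disease": False},
    {"bmi": 35, "smoker": False, "age": 72, "heart_disease": True},
    {"bmi": 31, "smoker": False, "age": 45, "heart_disease": False},
    {"bmi": 42, "smoker": True,  "age": 65, "heart_disease": False},
    {"bmi": 37, "smoker": True,  "age": 43, "heart_disease": False},
]

antecedent = lambda r: r["bmi"] > 30 and r["smoker"]
consequent = lambda r: r["heart_disease"]

sup_ant  = sum(antecedent(r) for r in records)                    # 4
sup_con  = sum(consequent(r) for r in records)                    # 3
sup_rule = sum(antecedent(r) and consequent(r) for r in records)  # 2

print(f"confidence = {sup_rule / sup_ant:.0%}")              # 50%
print(f"coverage   = {sup_rule / sup_con:.0%}")              # 67%
print(f"default confidence = {sup_con / len(records):.0%}")  # 43%
```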
Rule Evaluation
• Confidence can be taken as an indication of a rule's predictive ability on similar data – i.e. accuracy.
• If a rule has very low coverage it may be too specialised to have any useful predictive ability on similar data.
• The rule

IF age > 70 THEN heart disease = true

has a confidence of 100% (in the previous data set) but includes only a single case.
• It applies to only 1 of the 7 records (14.3%).
• Not a good rule; it will not generalise.
RI algorithms
• RI algorithms work by forming an initial set of rules (possibly just one rule) based on some starting criterion (possibly random).
• These rules are then applied to the training set of cases and their performance is measured.
• Next, the rules are refined by generalisation, specialisation and adaptation to form new rules that better classify the training cases.
• RI systems tend to differ in the way they adapt the rules and in their stopping criteria.
Generalisation
• As an example of generalisation, consider the condition part of the rule:

IF (country = France)

• This could be generalised to:

IF (country = France) OR (gender = M)

• It is generalised in the sense that it is likely to pick up more cases than the simple condition.
• This is also referred to as adding a disjunct (an OR operator).
Specialisation
• Similarly, a rule can be specialised by adding a conjunct, an AND operator.

IF (country = France)

• may be adapted to

IF (country = France) AND (age < 25)

• This is specialising in the sense that it is looking for a subset of the cases that already satisfy the first condition.
Adaptation
• Finally, a rule may be adapted by changing one of the components of the rule (see the sketch after this list):

IF (country = France) AND (age < 25)

• France replaced by England
• = replaced by ≠
• AND replaced by OR
• < replaced by >
• etc.
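As an illustration of the three operators, here is a hedged sketch; the data layout and function names are my own, not from the lecture. It represents an antecedent in disjunctive normal form, i.e. a list of AND-conjunctions joined by OR:

```python
import copy

# Illustrative sketch only: an antecedent as a list of conjunctions,
# each a list of (attribute, operator, value) conditions.
rule_ante = [[("country", "==", "France")]]   # IF (country = France)

def generalise(ante, condition):
    """Add a disjunct: a new alternative conjunction (an OR)."""
    return ante + [[condition]]

def specialise(ante, condition, i=0):
    """Add a conjunct (an AND) to the i-th conjunction."""
    new = copy.deepcopy(ante)
    new[i].append(condition)
    return new

def adapt(ante, new_condition, i=0, j=0):
    """Change one component: replace the j-th condition of conjunction i."""
    new = copy.deepcopy(ante)
    new[i][j] = new_condition
    return new

# IF (country = France) OR (gender = M)   -- picks up more cases
print(generalise(rule_ante, ("gender", "==", "M")))
# IF (country = France) AND (age < 25)    -- picks up fewer cases
print(specialise(rule_ante, ("age", "<", 25)))
# IF (country = England)                  -- France replaced by England
print(adapt(rule_ante, ("country", "==", "England")))
```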
Rule induction
• The most commonly used rule induction algorithms are:
• CN2
• Heuristics
• but others are used, such as 1R, which extracts rules based on a single attribute in the antecedent.
CN2
• The original CN2 algorithm (Clark and Niblett, 1989) induces rules from examples using entropy as its search measure.
• The problem with this measure was that it was aiming for rules with very high accuracy, irrespective of applicability.
• In a customer database with 1000 records, a rule such as "IF (customer = id 326547) THEN (customer will churn)" will have 100% accuracy but very low applicability and is not at all interesting. You know this already!!
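As a sketch of the measure itself (an illustration, not CN2's actual code), entropy scores the class distribution among the examples a rule covers; it is zero for a perfectly pure rule however few cases it covers, which is exactly the problem described above.

```python
import math
from collections import Counter

def entropy(covered_classes):
    """Entropy of the class distribution among the examples a rule
    covers: 0 for a pure (100% accurate) rule, higher when mixed."""
    counts = Counter(covered_classes)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

print(entropy(["churn"]))                  # 0.0 - the id-326547 rule looks
                                           # perfect despite covering 1 record
print(entropy(["churn", "stay", "stay"]))  # ~0.92 - mixed coverage
```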
Heuristics
• Heuristic techniques make an initial guess at a rule:

IF (age > 36) THEN (Buy = Yes)

• They then change this rule in an iterative manner through generalisation, specialisation and adaptation,
• until the accuracy and applicability measures cannot be improved further (a sketch of the loop follows).
• We have developed rule induction algorithms using simulated annealing and multi-objective optimisation to search for effective rules.
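A minimal hill-climbing sketch of that loop follows. It is illustrative only: `neighbours` and `score` are assumed callbacks (e.g. the refinement operators sketched earlier and some combination of accuracy and applicability), and the greedy acceptance test is where simulated annealing would differ, by occasionally accepting worse rules.

```python
import random

def refine(rule, neighbours, score, max_iters=1000):
    """Iteratively refine a rule: propose a generalisation,
    specialisation or adaptation; keep it if it scores better."""
    best, best_score = rule, score(rule)
    for _ in range(max_iters):
        candidate = random.choice(neighbours(best))
        s = score(candidate)
        if s > best_score:   # greedy; simulated annealing would also
            best, best_score = candidate, s  # accept some worse moves
    return best
```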
Discussion time

• Using Quinlan's Golf dataset again, propose 3 different rules that could be examined by a rule induction algorithm.
• For each rule calculate applicability, accuracy and coverage.
• Can you rank those rules in terms of interest?

outlook | temp | humidity | windy | class
sunny | 75 | 70 | TRUE | play
sunny | 80 | 90 | TRUE | dontplay
sunny | 85 | 85 | FALSE | dontplay
sunny | 72 | 95 | FALSE | dontplay
sunny | 69 | 70 | FALSE | play
overcast | 72 | 90 | TRUE | play
overcast | 83 | 78 | FALSE | play
overcast | 64 | 65 | TRUE | play
overcast | 81 | 75 | FALSE | play
rain | 71 | 80 | TRUE | dontplay
rain | 65 | 70 | TRUE | dontplay
rain | 75 | 80 | FALSE | play
rain | 68 | 80 | FALSE | play
rain | 70 | 96 | FALSE | play
Association rules
• The association rules problem was introduced by Agrawal et al. in 1993.
• These types of rules express associations that exist in transaction data (sometimes called market basket data).
• Transaction data comprises a set of transactions, each of which comprises a set of items.
Transaction data
• Each transaction can be thought of as a list of items purchased by a customer; transactions can vary in length.
• Transactions contain categorical data.
• We want to determine which items are bought together, e.g. for product positioning.
• This has wider applicability than buying items, e.g. comorbidity, component failure etc.
Example of Transaction Data
Transaction | Itemset
1 | {bread, cheese, eggs, jam}
2 | {bread, butter, eggs}
3 | {bread, cheese, tomatoes, milk}
4 | {bread, cheese, eggs}
5 | {cheese, eggs, milk}
6 | {bread, butter, milk}
7 | {eggs, milk, salt}
• The antecedent is a set of items from the database; the consequent is a (single) item that is not in the antecedent.
• An example of a rule is:

{bread, cheese} → {eggs}

where {bread, cheese} is the antecedent and {eggs} is the consequent.
Evaluation

{bread, cheese} → {eggs}
sup(ant) = 3
sup(con) = 5
sup(rule) = 2

Confidence = 2/3 (67%)
Coverage = 2/5 (40%)
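On transaction data the same three counts reduce to subset tests, as in this small sketch (the data layout mirrors the table above; the helper name is my own):

```python
# Illustrative sketch: evaluating {bread, cheese} -> {eggs}
# on the transaction data above.
transactions = [
    {"bread", "cheese", "eggs", "jam"},
    {"bread", "butter", "eggs"},
    {"bread", "cheese", "tomatoes", "milk"},
    {"bread", "cheese", "eggs"},
    {"cheese", "eggs", "milk"},
    {"bread", "butter", "milk"},
    {"eggs", "milk", "salt"},
]

def sup(itemset):
    """Number of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions)

ant, con = {"bread", "cheese"}, {"eggs"}
print(sup(ant), sup(con), sup(ant | con))               # 3 5 2
print(f"confidence = {sup(ant | con) / sup(ant):.0%}")  # 67%
print(f"coverage   = {sup(ant | con) / sup(con):.0%}")  # 40%
```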
Example of Transaction Data
• In this small example with 8 items there are 2^8 = 256 possible itemsets.
• With 30 items: 2^30 ≈ 1 billion itemsets.
• It is a challenge to develop methods that find interesting rules in a reasonable time.
• We may need:
• Feature subset selection
• Discretization for continuous features
Apriori
• Apriori is an algorithm for discovering association rules in transaction data.
• It is widely documented.
• Apriori uses minimum support, minSup, as a constraint.
• An itemset is a set of items, e.g. {bread, butter, eggs}.
• A frequent itemset is any itemset that has support ≥ minSup,
• i.e. the items appear together in at least minSup records.
Apriori ~ cont.
• The problem is decomposed into two parts.
1 - Find all frequent itemsets and determine their support.
2 - Use these frequent itemsets to find all association rules.
The first part is the time-consuming bit.
Apriori ~ cont.
• Multiple passes are made over the data.
• At pass k, itemsets of degree k are evaluated.
• In the first pass the database is scanned to determine the support of all single itemsets.
• In subsequent passes candidate itemsets are generated from the frequent itemsets found in the previous pass and the database is scanned to determine their support.
• This process continues until a pass produces no frequent itemsets.
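Putting the passes together gives a loop like the following compact sketch (an illustrative reimplementation, not the original Apriori code; the prune step described on the next slides is omitted here for brevity):

```python
def apriori_frequent_itemsets(transactions, min_sup):
    """Return a dict mapping each frequent itemset to its support."""
    def sup(itemset):
        return sum(itemset <= t for t in transactions)

    # Pass 1: scan the database for the support of all single itemsets.
    items = {i for t in transactions for i in t}
    Lk = {frozenset([i]) for i in items if sup(frozenset([i])) >= min_sup}
    frequent, k = {}, 1
    while Lk:                                  # stop when a pass is empty
        frequent.update({s: sup(s) for s in Lk})
        k += 1
        # Candidate generation from the previous pass (join only;
        # the prune step would remove some candidates before scanning).
        Ck = {a | b for a in Lk for b in Lk if len(a | b) == k}
        Lk = {c for c in Ck if sup(c) >= min_sup}    # scan the database
    return frequent

transactions = [
    {"bread", "cheese", "eggs", "jam"}, {"bread", "eggs", "butter"},
    {"bread", "cheese", "tomatoes", "milk"}, {"bread", "cheese", "eggs"},
    {"cheese", "eggs", "milk"}, {"bread", "butter", "milk"},
    {"eggs", "milk", "salt"},
]
print(len(apriori_frequent_itemsets(transactions, 2)))  # 13 frequent itemsets
```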
Apriori ~ cont.
• Let Lk-1 be the set of frequent itemsets obtained from the previous pass.
• Let Ck be the set of candidate itemsets to be used in this pass.
• Candidate generation is a two-step procedure:
• Join Lk-1 with itself to produce Ck.
• Prune Ck.
• Evaluate Ck to remove non-frequent itemsets and create Lk.
Apriori ~ cont.
• Find all frequent itemsets of length 1 – a single pass through the data.
• JOIN: every pair of itemsets in Lk-1 is considered. If the itemsets are identical except for the last item, a new itemset is created by taking one of the itemsets and adding the last item from the other itemset to it.
Pass 1: {a}, {b}, {c}, {d}
Pass 2: {a, b}, {a, c}, {a, d}, {b, c}, {b, d}, {c, d}
Pass 3: {a, b, c}, {a, b, d}, {a, c, d}, {b, c, d}
Pass 4: {a, b, c, d}

If all itemsets are frequent, this gives all combinations: 2^4 = 16. With, say, 50 items there are ~10^15 possible combinations. Clearly not feasible.
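To make the blow-up concrete, the full lattice can be enumerated in a couple of lines (illustrative):

```python
from itertools import chain, combinations

items = ["a", "b", "c", "d"]
# All subsets of the 4 items, including the empty set: 2**4 = 16.
lattice = list(chain.from_iterable(
    combinations(items, k) for k in range(len(items) + 1)))
print(len(lattice))  # 16
```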
Apriori ~ Prune.
• The pruning stage removes from Ck any itemsets that cannot be frequent.
• If an itemset is frequent then all of its subsets must also be frequent. In pass k, itemsets of length k are considered, so all of their subsets have been found in previous passes.
• For each new itemset that is generated, Apriori checks that all of its subsets of size k-1 exist in Lk-1. If they do not, they are not frequent, and therefore the new itemset cannot be frequent, so it is pruned from Ck (a sketch follows).
• The name of the algorithm is taken from the fact that it is known a priori that such an itemset cannot be frequent.
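A sketch of the join-and-prune step, using the same frozenset representation as the earlier loop (equivalent in effect to, but simpler than, the ordered-list join of the original algorithm):

```python
from itertools import combinations

def generate_candidates(L_prev, k):
    """Join L(k-1) with itself, then prune candidates that have an
    infrequent (k-1)-subset. L_prev holds frozensets of size k-1."""
    # Join: two (k-1)-itemsets whose union has exactly k items.
    Ck = {a | b for a in L_prev for b in L_prev if len(a | b) == k}
    # Prune: every (k-1)-subset of a candidate must itself be in L_prev.
    return {c for c in Ck
            if all(frozenset(s) in L_prev for s in combinations(c, k - 1))}

# With {b, c} infrequent, {a, b, c} and {b, c, d} never reach the scan:
L2 = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("a", "d"),
                             ("b", "d"), ("c", "d")]}
print(generate_candidates(L2, 3))  # only {a, b, d} and {a, c, d} survive
```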
Apriori ~ cont.
Pass 1: {a}, {b}, {c}, {d}
Pass 2: {a, b}, {a, c}, {a, d}, {b, c}, {b, d}, {c, d}
Pass 3: {a, b, d}, {a, c, d}

1. {b, c} is not frequent and is pruned.
2. Therefore {a, b, c} (and likewise {b, c, d}) cannot be frequent and are pruned from C3.

Evaluate:
• Scan the remaining itemsets.
• If they are not frequent, remove them.
• Support can only go down by adding more items.
Apriori ~ Example
minSup = 2
Database:

Transaction | Itemset
1 | {bread, cheese, eggs, jam}
2 | {bread, eggs, butter}
3 | {bread, cheese, tomatoes, milk}
4 | {bread, cheese, eggs}
5 | {cheese, eggs, milk}
6 | {bread, butter, milk}
7 | {eggs, milk, salt}

C1:

Single itemset | Support
{bread} | 5
{cheese} | 4
{eggs} | 5
{jam} | 1
{butter} | 2
{tomatoes} | 1
{milk} | 4
{salt} | 1

• The initial candidate set, C1, is the set of single itemsets.
• Evaluate: the database is scanned to count the support of the itemsets in C1.
• Jam, tomatoes and salt are not frequent and so are discarded.
Apriori ~ Example
• The remaining itemsets are included in L1 (the full set of frequent itemsets of degree 1).

L1:

Single itemset | Support
{bread} | 5
{cheese} | 4
{eggs} | 5
{butter} | 2
{milk} | 4

• Join: L1 is joined with itself to generate the candidates for the next pass, C2.
• Prune: no need to prune C2, because all subsets are single itemsets and must be frequent.
• Evaluate: the database is scanned to count the support of the itemsets in C2.

C2:

Itemset | Support
{bread, cheese} | 3
{bread, eggs} | 3
{bread, butter} | 2
{bread, milk} | 2
{cheese, eggs} | 3
{cheese, butter} | 0
{cheese, milk} | 2
{eggs, butter} | 1
{eggs, milk} | 2
{butter, milk} | 1
Apriori ~ Example

L2:

Itemset | Support
{bread, cheese} | 3
{bread, eggs} | 3
{bread, butter} | 2
{bread, milk} | 2
{cheese, eggs} | 3
{cheese, milk} | 2
{eggs, milk} | 2

Join to give C3:

{bread, cheese, eggs}, {bread, cheese, butter}, {bread, cheese, milk}, {bread, eggs, butter}, {bread, eggs, milk}, {bread, butter, milk}, {cheese, eggs, milk}

Prune C3 by checking each candidate's subsets against L2:

Itemset | Subsets
{bread, cheese, eggs} | {bread, cheese} {bread, eggs} {cheese, eggs}
{bread, cheese, butter} | {bread, cheese} {bread, butter} {cheese, butter}
{bread, cheese, milk} | {bread, cheese} {bread, milk} {cheese, milk}
{bread, eggs, butter} | {bread, eggs} {bread, butter} {eggs, butter}
{bread, eggs, milk} | {bread, eggs} {bread, milk} {eggs, milk}
{bread, butter, milk} | {bread, butter} {bread, milk} {butter, milk}
{cheese, eggs, milk} | {cheese, eggs} {cheese, milk} {eggs, milk}

{bread, cheese, butter}, {bread, eggs, butter} and {bread, butter, milk} are pruned because {cheese, butter}, {eggs, butter} and {butter, milk} are not in L2.

Evaluate the remaining candidates:

Itemset | Support
{bread, cheese, eggs} | 2
{bread, cheese, milk} | 1
{bread, eggs, milk} | 0
{cheese, eggs, milk} | 1
Apriori ~ Example
• Only one itemset remains, so the algorithm halts, having found all frequent itemsets.

L3:

Itemset | Support
{bread, cheese, eggs} | 2

• In this example there were 8 items, giving 2^8 = 256 possible itemsets.
• Apriori found all frequent itemsets whilst evaluating only 22 itemsets, 8 of which were single items.
• The database was scanned three times.
• Of these 22, 13 were found to be frequent.

Full set of frequent itemsets:

Itemset | Support
{bread} | 5
{cheese} | 4
{eggs} | 5
{butter} | 2
{milk} | 4
{bread, cheese} | 3
{bread, eggs} | 3
{bread, butter} | 2
{bread, milk} | 2
{cheese, eggs} | 3
{cheese, milk} | 2
{eggs, milk} | 2
{bread, cheese, eggs} | 2
Apriori ~ Generate Rules
• Given a frequent itemset, A, rules can be constructed of the form:

{A - i} → i

• where i is any item in A.
• The support of the rule is sup(A), i.e. the support for the itemset A.
• The antecedent of the rule is (A – i). This must be a frequent itemset because it is a subset of A. Therefore, its support is available from stage 1.
• Similarly, the consequent of the rule is i. This must also be frequent because it is a subset of A. Therefore, its support is available from stage 1.
• Therefore, we have the support for the antecedent, the consequent and the rule and so confidence and coverage can be calculated.
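Under the same illustrative representation used earlier, stage 2 then needs no further database scans, only lookups into the frequent-itemset table (assumed here to be the dict built by the earlier sketch):

```python
def generate_rules(frequent):
    """Yield (antecedent, consequent, confidence, coverage) for every
    rule of the form {A - i} -> i over the frequent itemsets."""
    for A, sup_rule in frequent.items():
        if len(A) < 2:
            continue                        # need a non-empty antecedent
        for i in A:
            sup_ant = frequent[A - {i}]          # available from stage 1
            sup_con = frequent[frozenset([i])]   # available from stage 1
            yield set(A - {i}), i, sup_rule / sup_ant, sup_rule / sup_con

# e.g. ({'bread', 'cheese'}, 'eggs', 0.666..., 0.4) is among the rules.
```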
Apriori ~ Example
• The result of stage 1 of the example was:

Frequent itemset | Support
{bread} | 5
{cheese} | 4
{eggs} | 5
{butter} | 2
{milk} | 4
{bread, cheese} | 3
{bread, eggs} | 3
{bread, butter} | 2
{bread, milk} | 2
{cheese, eggs} | 3
{cheese, milk} | 2
{eggs, milk} | 2
{bread, cheese, eggs} | 2
For example:

itemset: {bread, cheese, eggs}
rule: {bread, cheese} → {eggs}

sup(ant) = sup({bread, cheese}) = 3 (available from the list)
sup(con) = sup({eggs}) = 5 (available from the list)
sup(rule) = sup({bread, cheese, eggs}) = 2 (available from the list)

confidence = 2/3 (67%)
coverage = 2/5 (40%)
Complete Set of Rules
antecedent | sup(ant) | consequent | sup(con) | sup(rule) | conf | cov
bread | 5 | cheese | 4 | 3 | 3/5 | 3/4
bread | 5 | eggs | 5 | 3 | 3/5 | 3/5
bread | 5 | butter | 2 | 2 | 2/5 | 2/2
bread | 5 | milk | 4 | 2 | 2/5 | 2/4
cheese | 4 | eggs | 5 | 3 | 3/4 | 3/5
cheese | 4 | milk | 4 | 2 | 2/4 | 2/4
eggs | 5 | milk | 4 | 2 | 2/5 | 2/4
bread, cheese | 3 | eggs | 5 | 2 | 2/3 | 2/5
bread, eggs | 3 | cheese | 4 | 2 | 2/3 | 2/4
cheese, eggs | 3 | bread | 5 | 2 | 2/3 | 2/5
Apriori Summary
• Apriori implements efficient algorithms for sub-itemset searching and database scanning.
• Apriori works well on transaction data when there are a large number of transactions and relatively few items in each transaction.
• Many databases do not have these properties and the use of Apriori can rapidly become intractable.
• SPSS Modeler 14 includes a General Rule Induction (GRI) node that extends the principles of Apriori to numeric data.
• If a large number of items with high support are created, the problem often becomes intractable.
Learning Outcomes
• What are rules?
• What is the format of classification rules?
• How are rules evaluated?
• How do rule induction algorithms operate?
• What are association rules?
• How does Apriori work?