Rule-Based Classifiers
Rule-Based Classifier
• Classify records by using a collection of “if … then …” rules
• Examples of classification rules:
(Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds
(Taxable Income < 50K) ∧ (Refund=Yes) → Cheat=No
• Rule: (Condition) → y
– where
• Condition is a conjunction of some attribute tests
• y is the class label
– LHS: rule antecedent or condition
– RHS: rule consequent
Example

Name           Blood Type  Give Birth  Can Fly  Live in Water  Class
human          warm        yes         no       no             mammals
python         cold        no          no       no             reptiles
salmon         cold        no          no       yes            fishes
whale          warm        yes         no       yes            mammals
frog           cold        no          no       sometimes      amphibians
komodo         cold        no          no       no             reptiles
bat            warm        yes         yes      no             mammals
pigeon         warm        no          yes      no             birds
cat            warm        yes         no       no             mammals
leopard shark  cold        yes         no       yes            fishes
turtle         cold        no          no       sometimes      reptiles
penguin        warm        no          no       sometimes      birds
porcupine      warm        yes         no       no             mammals
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Name           Blood Type  Give Birth  Can Fly  Live in Water  Class
porcupine      warm        yes         no       no             mammals
eel            cold        no          no       yes            fishes
salamander     cold        no          no       sometimes      amphibians
gila monster   cold        no          no       no             reptiles
platypus       warm        no          no       no             mammals
owl            warm        no          yes      no             birds
dolphin        warm        yes         no       yes            mammals
eagle          warm        no          yes      no             birds
Application of Rule-Based Classifier
• A rule r covers an instance x if the attributes of the instance
satisfy the condition of the rule
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
The rule R1 covers a hawk => Bird
The rule R3 covers the grizzly bear => Mammal
Name          Blood Type  Give Birth  Can Fly  Live in Water  Class
hawk          warm        no          yes      no             ?
grizzly bear  warm        yes         no       no             ?
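The covering check above can be sketched in a few lines of Python; representing instances and antecedents as dicts is an assumption made here for illustration.

```python
# Sketch: a rule covers an instance if every attribute test in its
# antecedent is satisfied by the instance's attribute values.

def covers(antecedent, instance):
    """Return True if the instance satisfies all tests in the antecedent."""
    return all(instance.get(attr) == val for attr, val in antecedent.items())

# R1: (Give Birth = no) AND (Can Fly = yes) -> Birds
r1 = {"Give Birth": "no", "Can Fly": "yes"}
# R3: (Give Birth = yes) AND (Blood Type = warm) -> Mammals
r3 = {"Give Birth": "yes", "Blood Type": "warm"}

hawk = {"Blood Type": "warm", "Give Birth": "no",
        "Can Fly": "yes", "Live in Water": "no"}
grizzly = {"Blood Type": "warm", "Give Birth": "yes",
           "Can Fly": "no", "Live in Water": "no"}

print(covers(r1, hawk))     # True  -> hawk classified as Bird
print(covers(r3, grizzly))  # True  -> grizzly bear classified as Mammal
```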
Rule Coverage and Accuracy
Measures to evaluate the quality of a classification rule:
• Coverage of a rule
– Fraction of records that satisfy the antecedent of the rule
• Accuracy of a rule
– Fraction of records that satisfy both the antecedent and the consequent of the rule (over those that satisfy the antecedent)

Tid  Refund  Marital Status  Taxable Income  Class
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

(Status=Single) → No
Coverage = 40%, Accuracy = 50%
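The coverage and accuracy figures for (Status=Single) → No can be verified directly; the dict-of-records representation below is assumed for illustration.

```python
# Sketch: compute coverage and accuracy of the rule
# (Marital Status = Single) -> Class = No on the ten-record table.

records = [
    {"Refund": "Yes", "Status": "Single",   "Income": 125, "Class": "No"},
    {"Refund": "No",  "Status": "Married",  "Income": 100, "Class": "No"},
    {"Refund": "No",  "Status": "Single",   "Income": 70,  "Class": "No"},
    {"Refund": "Yes", "Status": "Married",  "Income": 120, "Class": "No"},
    {"Refund": "No",  "Status": "Divorced", "Income": 95,  "Class": "Yes"},
    {"Refund": "No",  "Status": "Married",  "Income": 60,  "Class": "No"},
    {"Refund": "Yes", "Status": "Divorced", "Income": 220, "Class": "No"},
    {"Refund": "No",  "Status": "Single",   "Income": 85,  "Class": "Yes"},
    {"Refund": "No",  "Status": "Married",  "Income": 75,  "Class": "No"},
    {"Refund": "No",  "Status": "Single",   "Income": 90,  "Class": "Yes"},
]

covered = [r for r in records if r["Status"] == "Single"]
# Coverage: fraction of all records satisfying the antecedent
coverage = len(covered) / len(records)
# Accuracy: fraction of covered records whose class matches the consequent
accuracy = sum(r["Class"] == "No" for r in covered) / len(covered)

print(coverage, accuracy)  # 0.4 0.5
```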
Decision Trees vs. Rules: From Trees to Rules
• Easy: converting a tree into a set of rules
– One rule for each leaf
– Antecedent contains a condition for every node on the path from
the root to the leaf
– Consequent is the class assigned by the leaf
– Straightforward, but the rule set might be overly complex
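The tree-to-rules conversion described above can be sketched as follows; the nested-tuple tree encoding is a hypothetical representation chosen for illustration.

```python
# Sketch: a decision tree as a nested tuple (attribute, {value: subtree});
# a leaf is just a class label. One rule is emitted per leaf, with the
# antecedent collecting every test on the root-to-leaf path.

def tree_to_rules(tree, path=()):
    """Return a list of (antecedent, class) pairs, one per leaf."""
    if not isinstance(tree, tuple):          # leaf: emit the finished rule
        return [(list(path), tree)]
    attr, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, path + ((attr, value),)))
    return rules

tree = ("Give Birth", {
    "yes": "Mammals",
    "no": ("Can Fly", {"yes": "Birds", "no": "Reptiles"}),
})

for antecedent, label in tree_to_rules(tree):
    print(antecedent, "->", label)
```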
Decision Trees vs. Rules: From Rules to Trees
• More difficult: transforming a rule set into a tree
– Tree cannot easily express disjunction among different rules
• Example:
If a and b then x
If c and d then x
– Corresponding tree contains identical subtrees (“replicated subtree problem”)
A tree for a simple disjunction
How does a Rule-based Classifier Work?
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
A lemur triggers rule R3 only, so it is classified as a mammal
A turtle triggers both R4 and R5
A dogfish shark triggers none of the rules
Name           Blood Type  Give Birth  Can Fly  Live in Water  Class
lemur          warm        yes         no       no             ?
turtle         cold        no          no       sometimes      ?
dogfish shark  cold        yes         no       yes            ?
Desiderata for Rule-Based Classifier
• Mutually exclusive rules
– No two rules are triggered by the same record.
– This ensures that every record is covered by at most one rule.
• Exhaustive rules
– There exists a rule for each combination of attribute values.
– This ensures that every record is covered by at least one rule.
Together these properties ensure that every record is covered by
exactly one rule.
Rules
• Non mutually exclusive rules
– A record may trigger more than one rule
– Solution: Order the rule set in decreasing order of rule quality
(e.g. based on their coverage or accuracy)
• Non exhaustive rules
– A record may not trigger any rules
– Solution: Use a default class
(The default class is the most frequently occurring class in the training set)
• When a test record is presented to the classifier
– It is assigned to the class label of the highest ranked rule it has
triggered
– If none of the rules fired, it is assigned to the default class
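The ordered-rules-plus-default scheme just described can be sketched as follows; the dict-based rule representation and the choice of "Mammals" as the default class are assumptions made for illustration.

```python
# Sketch: classify with an ordered rule list; fall back to a default
# class when no rule fires.

def classify(instance, ordered_rules, default):
    """Return the consequent of the highest-ranked rule the instance
    triggers, or the default class if it triggers none."""
    for antecedent, label in ordered_rules:
        if all(instance.get(a) == v for a, v in antecedent.items()):
            return label
    return default

rules = [
    ({"Give Birth": "no", "Can Fly": "yes"}, "Birds"),         # R1
    ({"Give Birth": "no", "Live in Water": "yes"}, "Fishes"),  # R2
    ({"Give Birth": "yes", "Blood Type": "warm"}, "Mammals"),  # R3
    ({"Give Birth": "no", "Can Fly": "no"}, "Reptiles"),       # R4
    ({"Live in Water": "sometimes"}, "Amphibians"),            # R5
]

# Turtle triggers both R4 and R5; the ordering resolves the conflict.
turtle = {"Blood Type": "cold", "Give Birth": "no",
          "Can Fly": "no", "Live in Water": "sometimes"}
print(classify(turtle, rules, default="Mammals"))  # Reptiles

# Dogfish shark triggers no rule, so the default class is used.
shark = {"Blood Type": "cold", "Give Birth": "yes",
         "Can Fly": "no", "Live in Water": "yes"}
print(classify(shark, rules, default="Mammals"))   # Mammals
```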
Example: generating a rule
[Figure: three scatter plots of classes a and b in the (x, y) plane, illustrating the candidate splits x ≤ 1.2 and y ≤ 2.6]
• Consider one class first, e.g. class b
– Start with an empty rule: if ? then b
– Possible rule for class “b”:
If x ≤ 1.2 then class = b
– More rules could be added for a “perfect” rule set:
If x > 1.2 and y ≤ 2.6 then class = b
A simple covering algorithm
• Concentrates on one class at a time, disregarding what happens to the other classes.
• Generates a rule by adding tests (conditions) to the rule one by one that maximize the rule’s accuracy.
• Each new test (growing the rule) reduces the rule’s coverage.
[Figure: the space of examples narrowing from “rule so far” to “rule after adding new term”]
Selecting a test
• Goal: maximize the accuracy of the rule in focus
– t: total number of instances covered by the rule
– p: positive (correct) examples of the class covered by the rule
Select the test that maximizes the ratio p/t.
• We are finished with a specific rule when
– p/t = 1, or
– the set of instances can’t be split any further (no conditions left).
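The p/t scoring of candidate tests can be sketched as below; the function name `best_test` and the tiny four-instance dataset are hypothetical, introduced only for illustration.

```python
# Sketch: score every candidate condition A = v by its accuracy p/t on
# the covered subset, breaking ties in favor of larger coverage t.

def best_test(data, target, used=frozenset()):
    """Return ((attribute, value), (p/t, t)) for the best condition."""
    best, best_score = None, (-1.0, -1)
    attrs = {a for row in data for a in row if a != "Class" and a not in used}
    for attr in attrs:
        for v in {row[attr] for row in data if attr in row}:
            covered = [row for row in data if row.get(attr) == v]
            t = len(covered)
            p = sum(row["Class"] == target for row in covered)
            if (p / t, t) > best_score:   # tuple compare: p/t first, then t
                best, best_score = (attr, v), (p / t, t)
    return best, best_score

data = [
    {"Astigmatism": "yes", "Tear rate": "normal",  "Class": "hard"},
    {"Astigmatism": "yes", "Tear rate": "reduced", "Class": "none"},
    {"Astigmatism": "no",  "Tear rate": "normal",  "Class": "soft"},
    {"Astigmatism": "no",  "Tear rate": "reduced", "Class": "none"},
]
print(best_test(data, "hard"))
```

Here both Astigmatism = yes and Tear rate = normal score p/t = 1/2 with t = 2, so either may be returned; a second refinement step would then drive p/t to 1.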
Example: contact lenses data

Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
young           myope                   no           reduced               none
young           myope                   no           normal                soft
young           myope                   yes          reduced               none
young           myope                   yes          normal                hard
young           hypermetrope            no           reduced               none
young           hypermetrope            no           normal                soft
young           hypermetrope            yes          reduced               none
young           hypermetrope            yes          normal                hard
pre-presbyopic  myope                   no           reduced               none
pre-presbyopic  myope                   no           normal                soft
pre-presbyopic  myope                   yes          reduced               none
pre-presbyopic  myope                   yes          normal                hard
pre-presbyopic  hypermetrope            no           reduced               none
pre-presbyopic  hypermetrope            no           normal                soft
pre-presbyopic  hypermetrope            yes          reduced               none
pre-presbyopic  hypermetrope            yes          normal                none
presbyopic      myope                   no           reduced               none
presbyopic      myope                   no           normal                none
presbyopic      myope                   yes          reduced               none
presbyopic      myope                   yes          normal                hard
presbyopic      hypermetrope            no           reduced               none
presbyopic      hypermetrope            no           normal                soft
presbyopic      hypermetrope            yes          reduced               none
presbyopic      hypermetrope            yes          normal                none
Example: contact lenses data
Accuracy of
candidate rules
The numbers on the right show the fraction of “correct” instances in
the set singled out by that given condition.
In this case, correct means that their recommendation is “hard.”
Modified rule and resulting data
The rule isn’t very accurate: it gets only 4 correct out of the 12 instances it covers.
So, it needs further refinement.
Further refinement
Modified rule and resulting data
Should we stop here? Perhaps. But let’s say we are going for exact
rules, no matter how complex they become.
So, let’s refine further.
Further refinement
NOTE: Tie between the first and the fourth test
We choose the one with the greater coverage
The result
These two rules cover all “hard lenses”:
The process is repeated for the other two classes, starting again from the initial 24 instances
Pseudo-code for PRISM

For each class C
  Initialize E to the instance set
  While E contains instances in class C
  - Create a rule R with an empty left-hand side that predicts class C
  - Until R is perfect (or there are no more attributes to use) do
      For each attribute A not mentioned in R, and each value v,
        consider adding the condition A = v to the left-hand side of R
      Select A and v to maximize the accuracy p/t
      (break ties by choosing the condition with the largest t)
      Add A = v to R
  - Remove the instances covered by R from E

Heuristic: order the classes C in ascending order of occurrence.

Note: the algorithm uses tests Attribute = value (for discrete attributes)
and Attribute ≤ value / Attribute > value (for numeric ones).
The RIPPER algorithm is similar, but it uses information gain instead of p/t.
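The pseudo-code above can be sketched compactly in Python. This is a minimal illustration, assuming discrete attributes only, instances represented as dicts with a "Class" key, and no class-ordering heuristic.

```python
# Sketch of PRISM: for each class, repeatedly grow a rule by adding the
# A = v condition with the best accuracy p/t (ties broken by larger t),
# then remove the covered instances and start the next rule.

def prism(instances, classes=None):
    """Return a list of (antecedent dict, class) rules."""
    classes = classes or sorted({r["Class"] for r in instances})
    rules = []
    for c in classes:
        E = list(instances)                  # restart from the full set
        while any(r["Class"] == c for r in E):
            rule, covered = {}, E
            # Grow the rule until perfect or no attributes remain
            while not all(r["Class"] == c for r in covered):
                best, best_key = None, (-1.0, -1)
                attrs = {k for r in covered for k in r
                         if k != "Class" and k not in rule}
                for a in attrs:
                    for v in {r.get(a) for r in covered}:
                        sub = [r for r in covered if r.get(a) == v]
                        p = sum(r["Class"] == c for r in sub)
                        key = (p / len(sub), len(sub))  # tie-break on t
                        if key > best_key:
                            best, best_key = (a, v), key
                if best is None:             # no condition left to add
                    break
                rule[best[0]] = best[1]
                covered = [r for r in covered if r.get(best[0]) == best[1]]
            rules.append((dict(rule), c))
            E = [r for r in E
                 if not all(r.get(a) == v for a, v in rule.items())]
    return rules

# Hypothetical toy dataset mirroring the two-class example above
data = [
    {"x": "low",  "y": "low",  "Class": "b"},
    {"x": "low",  "y": "high", "Class": "b"},
    {"x": "high", "y": "low",  "Class": "b"},
    {"x": "high", "y": "high", "Class": "a"},
]
for antecedent, label in prism(data):
    print(antecedent, "->", label)
```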
Separate and conquer
• Methods like PRISM are separate-and-conquer algorithms: they concentrate on one class at a time, disregarding what happens to the other classes.
• Difference from the divide-and-conquer method of decision trees: a decision tree inducer maximizes overall purity for all classes in one split; it adds tests to maximize the separation among the classes.