Rule-Based Classifiers
Rule-Based Classifier
• Classify records by using a collection of “if … then …” rules
• Examples of classification rules:
(Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds
(Taxable Income < 50K) ∧ (Refund=Yes) → Cheat=No
• Rule: (Condition) → y
– where
• Condition is a conjunction of some attribute tests
• y is the class label
– LHS: rule antecedent or condition
– RHS: rule consequent
Example

Name           Blood Type  Give Birth  Can Fly  Live in Water  Class
human          warm        yes         no       no             mammals
python         cold        no          no       no             reptiles
salmon         cold        no          no       yes            fishes
whale          warm        yes         no       yes            mammals
frog           cold        no          no       sometimes      amphibians
komodo         cold        no          no       no             reptiles
bat            warm        yes         yes      no             mammals
pigeon         warm        no          yes      no             birds
cat            warm        yes         no       no             mammals
leopard shark  cold        yes         no       yes            fishes
turtle         cold        no          no       sometimes      reptiles
penguin        warm        no          no       sometimes      birds
porcupine      warm        yes         no       no             mammals
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
Name           Blood Type  Give Birth  Can Fly  Live in Water  Class
porcupine      warm        yes         no       no             mammals
eel            cold        no          no       yes            fishes
salamander     cold        no          no       sometimes      amphibians
gila monster   cold        no          no       no             reptiles
platypus       warm        no          no       no             mammals
owl            warm        no          yes      no             birds
dolphin        warm        yes         no       yes            mammals
eagle          warm        no          yes      no             birds
Application of Rule-Based Classifier
• A rule r covers an instance x if the attributes of the instance
satisfy the condition of the rule
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
The rule R1 covers a hawk => Bird
The rule R3 covers the grizzly bear => Mammal
Name          Blood Type  Give Birth  Can Fly  Live in Water  Class
hawk          warm        no          yes      no             ?
grizzly bear  warm        yes         no       no             ?
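The covering check above can be sketched in a few lines of Python; representing instances and antecedents as dicts is an assumption made here for illustration.

```python
# Sketch: a rule covers an instance if every attribute test in its
# antecedent is satisfied by the instance's attribute values.

def covers(antecedent, instance):
    """Return True if the instance satisfies all tests in the antecedent."""
    return all(instance.get(attr) == val for attr, val in antecedent.items())

# R1: (Give Birth = no) AND (Can Fly = yes) -> Birds
r1 = {"Give Birth": "no", "Can Fly": "yes"}
# R3: (Give Birth = yes) AND (Blood Type = warm) -> Mammals
r3 = {"Give Birth": "yes", "Blood Type": "warm"}

hawk = {"Blood Type": "warm", "Give Birth": "no",
        "Can Fly": "yes", "Live in Water": "no"}
grizzly = {"Blood Type": "warm", "Give Birth": "yes",
           "Can Fly": "no", "Live in Water": "no"}

print(covers(r1, hawk))     # True  -> hawk classified as Bird
print(covers(r3, grizzly))  # True  -> grizzly bear classified as Mammal
```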
Rule Coverage and Accuracy
Measures to evaluate the quality of a classification rule:
• Coverage of a rule
– Fraction of records that satisfy the antecedent of the rule
• Accuracy of a rule
– Fraction of records that satisfy both the antecedent and the consequent of the rule (over those that satisfy the antecedent)

Tid  Refund  Marital Status  Taxable Income  Class
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

(Status=Single) → No
Coverage = 40%, Accuracy = 50%
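The coverage and accuracy figures for (Status=Single) → No can be verified directly; the dict-of-records representation below is assumed for illustration.

```python
# Sketch: compute coverage and accuracy of the rule
# (Marital Status = Single) -> Class = No on the ten-record table.

records = [
    {"Refund": "Yes", "Status": "Single",   "Income": 125, "Class": "No"},
    {"Refund": "No",  "Status": "Married",  "Income": 100, "Class": "No"},
    {"Refund": "No",  "Status": "Single",   "Income": 70,  "Class": "No"},
    {"Refund": "Yes", "Status": "Married",  "Income": 120, "Class": "No"},
    {"Refund": "No",  "Status": "Divorced", "Income": 95,  "Class": "Yes"},
    {"Refund": "No",  "Status": "Married",  "Income": 60,  "Class": "No"},
    {"Refund": "Yes", "Status": "Divorced", "Income": 220, "Class": "No"},
    {"Refund": "No",  "Status": "Single",   "Income": 85,  "Class": "Yes"},
    {"Refund": "No",  "Status": "Married",  "Income": 75,  "Class": "No"},
    {"Refund": "No",  "Status": "Single",   "Income": 90,  "Class": "Yes"},
]

covered = [r for r in records if r["Status"] == "Single"]
# Coverage: fraction of all records satisfying the antecedent
coverage = len(covered) / len(records)
# Accuracy: fraction of covered records whose class matches the consequent
accuracy = sum(r["Class"] == "No" for r in covered) / len(covered)

print(coverage, accuracy)  # 0.4 0.5
```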
Decision Trees vs. Rules: From Trees to Rules
• Easy: converting a tree into a set of rules
– One rule for each leaf
– Antecedent contains a condition for every node on the path from
the root to the leaf
– Consequent is the class assigned by the leaf
– Straightforward, but the rule set might be overly complex
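The tree-to-rules conversion described above can be sketched as follows; the nested-tuple tree encoding is a hypothetical representation chosen for illustration.

```python
# Sketch: a decision tree as a nested tuple (attribute, {value: subtree});
# a leaf is just a class label. One rule is emitted per leaf, with the
# antecedent collecting every test on the root-to-leaf path.

def tree_to_rules(tree, path=()):
    """Return a list of (antecedent, class) pairs, one per leaf."""
    if not isinstance(tree, tuple):          # leaf: emit the finished rule
        return [(list(path), tree)]
    attr, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, path + ((attr, value),)))
    return rules

tree = ("Give Birth", {
    "yes": "Mammals",
    "no": ("Can Fly", {"yes": "Birds", "no": "Reptiles"}),
})

for antecedent, label in tree_to_rules(tree):
    print(antecedent, "->", label)
```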
Decision Trees vs. Rules: From Rules to Trees
• More difficult: transforming a rule set into a tree
– Tree cannot easily express disjunction among different rules
• Example:
If a and b then x
If c and d then x
– Corresponding tree contains identical subtrees (“replicated subtree problem”)
A tree for a simple disjunction
How does a Rule-based Classifier Work?
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
A lemur triggers rule R3 only, so it is classified as a mammal
A turtle triggers both R4 and R5
A dogfish shark triggers none of the rules
Name           Blood Type  Give Birth  Can Fly  Live in Water  Class
lemur          warm        yes         no       no             ?
turtle         cold        no          no       sometimes      ?
dogfish shark  cold        yes         no       yes            ?
Desiderata for Rule-Based Classifier
• Mutually exclusive rules
– No two rules are triggered by the same record.
– This ensures that every record is covered by at most one rule.
• Exhaustive rules
– There exists a rule for each combination of attribute values.
– This ensures that every record is covered by at least one rule.
Together these properties ensure that every record is covered by
exactly one rule.
Rules
• Non mutually exclusive rules
– A record may trigger more than one rule
– Solution: Order the rule set in decreasing order of rule quality
(e.g. based on their coverage or accuracy)
• Non exhaustive rules
– A record may not trigger any rules
– Solution: Use a default class
(The default class is the most frequently occurring class in the training set)
• When a test record is presented to the classifier
– It is assigned to the class label of the highest ranked rule it has
triggered
– If none of the rules fired, it is assigned to the default class
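The ordered-rules-plus-default scheme just described can be sketched as follows; the dict-based rule representation and the choice of "Mammals" as the default class are assumptions made for illustration.

```python
# Sketch: classify with an ordered rule list; fall back to a default
# class when no rule fires.

def classify(instance, ordered_rules, default):
    """Return the consequent of the highest-ranked rule the instance
    triggers, or the default class if it triggers none."""
    for antecedent, label in ordered_rules:
        if all(instance.get(a) == v for a, v in antecedent.items()):
            return label
    return default

rules = [
    ({"Give Birth": "no", "Can Fly": "yes"}, "Birds"),         # R1
    ({"Give Birth": "no", "Live in Water": "yes"}, "Fishes"),  # R2
    ({"Give Birth": "yes", "Blood Type": "warm"}, "Mammals"),  # R3
    ({"Give Birth": "no", "Can Fly": "no"}, "Reptiles"),       # R4
    ({"Live in Water": "sometimes"}, "Amphibians"),            # R5
]

# Turtle triggers both R4 and R5; the ordering resolves the conflict.
turtle = {"Blood Type": "cold", "Give Birth": "no",
          "Can Fly": "no", "Live in Water": "sometimes"}
print(classify(turtle, rules, default="Mammals"))  # Reptiles

# Dogfish shark triggers no rule, so the default class is used.
shark = {"Blood Type": "cold", "Give Birth": "yes",
         "Can Fly": "no", "Live in Water": "yes"}
print(classify(shark, rules, default="Mammals"))   # Mammals
```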
Example: generating a rule
[Figure: three scatter plots of classes a and b in the (x, y) plane, illustrating the candidate splits x ≤ 1.2 and y ≤ 2.6]
• Consider one class first, e.g. class b
– Start with an empty rule: if ? then b
– Possible rule for class “b”:
If x ≤ 1.2 then class = b
– More rules could be added for a “perfect” rule set:
If x > 1.2 and y ≤ 2.6 then class = b
A simple covering algorithm
• Concentrates on one class at a time, disregarding what happens to the other classes.
• Generates a rule by adding tests (conditions) to the rule one by one that maximize the rule’s accuracy.
• Each new test (growing the rule) reduces the rule’s coverage.
[Figure: the space of examples narrowing from “rule so far” to “rule after adding new term”]
Selecting a test
• Goal: maximize the accuracy of the rule in focus
– t: total number of instances covered by the rule
– p: positive (correct) examples of the class covered by the rule
Select the test that maximizes the ratio p/t.
• We are finished with a specific rule when
– p/t = 1, or
– the set of instances can’t be split any further (no conditions left).
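The p/t scoring of candidate tests can be sketched as below; the function name `best_test` and the tiny four-instance dataset are hypothetical, introduced only for illustration.

```python
# Sketch: score every candidate condition A = v by its accuracy p/t on
# the covered subset, breaking ties in favor of larger coverage t.

def best_test(data, target, used=frozenset()):
    """Return ((attribute, value), (p/t, t)) for the best condition."""
    best, best_score = None, (-1.0, -1)
    attrs = {a for row in data for a in row if a != "Class" and a not in used}
    for attr in attrs:
        for v in {row[attr] for row in data if attr in row}:
            covered = [row for row in data if row.get(attr) == v]
            t = len(covered)
            p = sum(row["Class"] == target for row in covered)
            if (p / t, t) > best_score:   # tuple compare: p/t first, then t
                best, best_score = (attr, v), (p / t, t)
    return best, best_score

data = [
    {"Astigmatism": "yes", "Tear rate": "normal",  "Class": "hard"},
    {"Astigmatism": "yes", "Tear rate": "reduced", "Class": "none"},
    {"Astigmatism": "no",  "Tear rate": "normal",  "Class": "soft"},
    {"Astigmatism": "no",  "Tear rate": "reduced", "Class": "none"},
]
print(best_test(data, "hard"))
```

Here both Astigmatism = yes and Tear rate = normal score p/t = 1/2 with t = 2, so either may be returned; a second refinement step would then drive p/t to 1.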
Example: contact lenses data

Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
young           myope                   no           reduced               none
young           myope                   no           normal                soft
young           myope                   yes          reduced               none
young           myope                   yes          normal                hard
young           hypermetrope            no           reduced               none
young           hypermetrope            no           normal                soft
young           hypermetrope            yes          reduced               none
young           hypermetrope            yes          normal                hard
pre-presbyopic  myope                   no           reduced               none
pre-presbyopic  myope                   no           normal                soft
pre-presbyopic  myope                   yes          reduced               none
pre-presbyopic  myope                   yes          normal                hard
pre-presbyopic  hypermetrope            no           reduced               none
pre-presbyopic  hypermetrope            no           normal                soft
pre-presbyopic  hypermetrope            yes          reduced               none
pre-presbyopic  hypermetrope            yes          normal                none
presbyopic      myope                   no           reduced               none
presbyopic      myope                   no           normal                none
presbyopic      myope                   yes          reduced               none
presbyopic      myope                   yes          normal                hard
presbyopic      hypermetrope            no           reduced               none
presbyopic      hypermetrope            no           normal                soft
presbyopic      hypermetrope            yes          reduced               none
presbyopic      hypermetrope            yes          normal                none
Example: contact lenses data
Accuracy of
candidate rules
The numbers on the right show the fraction of “correct” instances in
the set singled out by that given condition.
In this case, correct means that their recommendation is “hard.”
Modified rule and resulting data
The rule isn’t very accurate: it gets only 4 correct out of the 12 instances it covers.
So, it needs further refinement.
Further refinement
Modified rule and resulting data
Should we stop here? Perhaps. But let’s say we are going for exact
rules, no matter how complex they become.
So, let’s refine further.
Further refinement
NOTE: Tie between the first and the fourth test
We choose the one with the greater coverage
The result
These two rules cover all “hard lenses”:
The process is repeated for the other two classes, starting again from the initial 24 instances
Pseudo-code for PRISM

For each class C
  Initialize E to the instance set
  While E contains instances in class C
  - Create a rule R with an empty left-hand side that predicts class C
  - Until R is perfect (or there are no more attributes to use) do
      For each attribute A not mentioned in R, and each value v,
        consider adding the condition A = v to the left-hand side of R
      Select A and v to maximize the accuracy p/t
      (break ties by choosing the condition with the largest t)
      Add A = v to R
  - Remove the instances covered by R from E

Heuristic: order the classes C in ascending order of occurrence.

Note: the algorithm uses tests Attribute = value (for discrete attributes)
and Attribute ≤ value / Attribute > value (for numeric ones).
The RIPPER algorithm is similar, but it uses information gain instead of p/t.
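The pseudo-code above can be sketched compactly in Python. This is a minimal illustration, assuming discrete attributes only, instances represented as dicts with a "Class" key, and no class-ordering heuristic.

```python
# Sketch of PRISM: for each class, repeatedly grow a rule by adding the
# A = v condition with the best accuracy p/t (ties broken by larger t),
# then remove the covered instances and start the next rule.

def prism(instances, classes=None):
    """Return a list of (antecedent dict, class) rules."""
    classes = classes or sorted({r["Class"] for r in instances})
    rules = []
    for c in classes:
        E = list(instances)                  # restart from the full set
        while any(r["Class"] == c for r in E):
            rule, covered = {}, E
            # Grow the rule until perfect or no attributes remain
            while not all(r["Class"] == c for r in covered):
                best, best_key = None, (-1.0, -1)
                attrs = {k for r in covered for k in r
                         if k != "Class" and k not in rule}
                for a in attrs:
                    for v in {r.get(a) for r in covered}:
                        sub = [r for r in covered if r.get(a) == v]
                        p = sum(r["Class"] == c for r in sub)
                        key = (p / len(sub), len(sub))  # tie-break on t
                        if key > best_key:
                            best, best_key = (a, v), key
                if best is None:             # no condition left to add
                    break
                rule[best[0]] = best[1]
                covered = [r for r in covered if r.get(best[0]) == best[1]]
            rules.append((dict(rule), c))
            E = [r for r in E
                 if not all(r.get(a) == v for a, v in rule.items())]
    return rules

# Hypothetical toy dataset mirroring the two-class example above
data = [
    {"x": "low",  "y": "low",  "Class": "b"},
    {"x": "low",  "y": "high", "Class": "b"},
    {"x": "high", "y": "low",  "Class": "b"},
    {"x": "high", "y": "high", "Class": "a"},
]
for antecedent, label in prism(data):
    print(antecedent, "->", label)
```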
Separate and conquer
• Methods like PRISM are separate-and-conquer algorithms: they concentrate on one class at a time, disregarding what happens to the other classes.
• Difference from the divide-and-conquer method of decision trees: a decision tree inducer maximizes overall purity for all classes in one split; it adds tests to maximize the separation among the classes.