# Learning Set of Rules

Lehrstuhl für Informatik 2, Gabriella Kókai: Machine Learning

Post on 19-Dec-2015


• Slide 1
• Learning set of rules
• Slide 2
• Content
  • Introduction
  • Sequential Covering Algorithms
  • Learning Rule Sets: Summary
  • Learning First-Order Rules
  • Learning Sets of First-Order Rules: FOIL
  • Summary
• Slide 3
• Introduction
  • If-Then rules are very expressive and easy to understand
  • Rules with variables: Horn clauses
  • A set of Horn clauses builds up a PROLOG program
  • Learning of Horn clauses: Inductive Logic Programming (ILP)
  • Example first-order rule set for the target concept Ancestor:
    • IF Parent(x,y) THEN Ancestor(x,y)
    • IF Parent(x,z) ∧ Ancestor(z,y) THEN Ancestor(x,y)
• Slide 4
• Introduction 2
  • GOAL: learning a target function as a set of IF-THEN rules
  • BEFORE: learning with decision trees
    • Learn the decision tree
    • Translate the tree into a set of IF-THEN rules (one rule for each leaf)
  • OTHER POSSIBILITY: learning with genetic algorithms
    • Each set of rules is coded as a bit vector
    • Several genetic operators are used on the hypothesis space
  • TODAY AND HERE:
    • First: learning rules in propositional form
    • Second: learning rules in first-order form (Horn clauses, which include variables)
    • Sequential search for rules, one after the other
• Slide 5
• Content
  • Introduction
  • Sequential Covering Algorithms
    • General to Specific Beam Search
    • Variations
  • Learning Rule Sets: Summary
  • Learning First-Order Rules
  • Learning Sets of First-Order Rules: FOIL
  • Summary
• Slide 6
• Sequential Covering Algorithms
  • Goal of such an algorithm: learning a disjunctive set of rules that defines a preferably good classification of the training data
  • Principle: learn rule sets based on the strategy of learning one rule, removing the examples it covers, then iterating this process
  • Requirements for the Learn-One-Rule method:
    • As input it accepts a set of positive and negative training examples
    • As output it delivers a single rule that covers many of the positive examples and maybe a few of the negative examples
    • Required: the output rule has a high accuracy, but not necessarily a high coverage
• Slide 7
• Sequential Covering Algorithms 2
  • Procedure:
    • Learning a set of rules invokes the Learn-One-Rule method on all of the available training examples
    • Remove every positive example covered by the rule
    • Eventually sort the final set of rules, so that more accurate rules can be considered first
  • Greedy search: it is not guaranteed to find the smallest or best set of rules that covers the training examples
• Slide 8
• Sequential Covering Algorithms 3

```
SequentialCovering( target_attribute, attributes, examples, threshold )
  learned_rules ← { }
  rule ← LearnOneRule( target_attribute, attributes, examples )
  while Performance( rule, examples ) > threshold do
    learned_rules ← learned_rules + rule
    examples ← examples − { examples correctly classified by rule }
    rule ← LearnOneRule( target_attribute, attributes, examples )
  learned_rules ← sort learned_rules according to Performance over examples
  return learned_rules
```
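The covering loop can be sketched in Python. The rule representation (a list of (attribute, value) constraints) and the single-constraint `learn_one_rule` below are illustrative stand-ins, not the full Learn-One-Rule beam search described on the later slides.

```python
# Minimal sketch of the SequentialCovering loop: learn one rule,
# remove the positive examples it covers, repeat.

def covers(rule, example):
    """A rule covers an example when every constraint matches."""
    return all(example[a] == v for a, v in rule)

def performance(rule, examples, target):
    """Accuracy of the rule on the examples it covers."""
    covered = [e for e in examples if covers(rule, e)]
    if not covered:
        return 0.0
    return sum(e["label"] == target for e in covered) / len(covered)

def learn_one_rule(examples, attributes, target):
    """Stand-in: pick the single constraint (a = v) with the best accuracy."""
    candidates = {(a, e[a]) for e in examples for a in attributes}
    return [max(candidates, key=lambda c: performance([c], examples, target))]

def sequential_covering(examples, attributes, target, threshold=0.5):
    learned, remaining = [], list(examples)
    while remaining:
        rule = learn_one_rule(remaining, attributes, target)
        if performance(rule, remaining, target) <= threshold:
            break
        learned.append(rule)
        # Remove the positive examples the new rule covers, then iterate.
        remaining = [e for e in remaining
                     if not (covers(rule, e) and e["label"] == target)]
    return learned
```

As in the pseudocode, the search is greedy: once a rule is accepted, its covered positives are gone, so a poor early rule cannot be revised later.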
• Slide 9
• General to Specific Beam Search
  • Specialising search: organises a hypothesis space search in generally the same fashion as ID3, but follows only the most promising branch of the tree at each step
    • Begin with the most general rule (empty precondition)
    • Follow the most promising branch: greedily add the attribute test that most improves the measured performance of the rule over the training examples
    • Greedy depth-first search with no backtracking: danger of a sub-optimal choice
  • Reducing the risk: beam search (CN2 algorithm)
    • The algorithm maintains a list of the k best candidates
    • In each search step, descendants are generated for each of these k best candidates
    • The resulting set is then reduced to the k most promising members
• Slide 10
• General to Specific Beam Search 2: learning with a decision tree (figure)
• Slide 11
• General to Specific Beam Search 3 (figure)
• Slide 12
• General to Specific Beam Search 4: the CN2 algorithm

```
LearnOneRule( target_attribute, attributes, examples, k )
  Initialise best_hypothesis to the most general hypothesis
  Initialise candidate_hypotheses to the set { best_hypothesis }
  while candidate_hypotheses is not empty do
    1. Generate the next more-specific candidate_hypotheses
    2. Update best_hypothesis
    3. Update candidate_hypotheses
  return a rule of the form "IF best_hypothesis THEN prediction"
    where prediction is the most frequent value of target_attribute
    among those examples that match best_hypothesis

Performance( h, examples, target_attribute )
  h_examples ← the subset of examples that match h
  return −Entropy( h_examples ), where entropy is with respect to target_attribute
```
• Slide 13
• General to Specific Beam Search 5

```
Generate the next more-specific candidate_hypotheses:
  all_constraints ← the set of all constraints (a = v), where a ∈ attributes
    and v is a value of a occurring in the current set of examples
  new_candidate_hypotheses ←
    for each h in candidate_hypotheses, for each c in all_constraints:
      create a specialisation of h by adding the constraint c
  Remove from new_candidate_hypotheses any hypotheses that are
    duplicates, inconsistent, or not maximally specific

Update best_hypothesis:
  for all h in new_candidate_hypotheses do
    if ( statistically significant when tested on examples, and
         Performance( h, examples, target_attribute ) >
         Performance( best_hypothesis, examples, target_attribute ) )
    then best_hypothesis ← h
```
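The specialisation step can be sketched as follows, assuming a hypothesis is a frozenset of (attribute, value) constraints, with the empty frozenset as the most general hypothesis:

```python
# Sketch of "generate the next more-specific candidate_hypotheses":
# add one constraint to each candidate, then prune duplicates and
# inconsistent hypotheses.

def specialise(candidate_hypotheses, examples, attributes):
    # all constraints (a = v) with v occurring in the current examples
    all_constraints = {(a, e[a]) for e in examples for a in attributes}
    new_candidates = set()
    for h in candidate_hypotheses:
        for c in all_constraints:
            s = h | {c}
            if len(s) == len(h):
                continue  # c was already in h: not more specific
            attrs = [a for a, _ in s]
            if len(attrs) == len(set(attrs)):  # drop inconsistent hypotheses
                new_candidates.add(frozenset(s))
    return new_candidates  # a set, so duplicates vanish automatically
```

A hypothesis that constrains the same attribute to two different values can match no example, which is why such combinations are filtered out as inconsistent.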
• Slide 14
• General to Specific Beam Search 6
  • Update the candidate_hypotheses:
    • candidate_hypotheses ← the k best members of new_candidate_hypotheses, according to the Performance function
  • The Performance function guides the search in Learn-One-Rule: −Entropy(s), with Entropy(s) = −∑_{i=1}^{c} p_i log₂ p_i
    • s: the current set of training examples
    • c: the number of possible values of the target attribute
    • p_i: the proportion of the examples in s that are classified with the i-th value
• Slide 15
• Example for the CN2 Algorithm
  • LearnOneRule( EnjoySport, {Sky, AirTemp, Humidity, Wind, Water, Forecast, EnjoySport}, examples, 2 )
  • best_hypothesis = ∅, candidate_hypotheses = { ∅ }
  • all_constraints = { Sky=Sunny, Sky=Rainy, AirTemp=Warm, AirTemp=Cold, Humidity=Normal, Humidity=High, Wind=Strong, Water=Warm, Water=Cool, Forecast=Same, Forecast=Change }
  • Performance = n_c / n
    • n: number of examples covered by the rule
    • n_c: number of examples covered by the rule whose classification is correct
• Slide 25
• Terminology 2
  • For any literals A and B, the expression (A ← B) is equivalent to (A ∨ ¬B), and the expression ¬(A ∧ B) is equivalent to (¬A ∨ ¬B)
    => a Horn clause H ← (L₁ ∧ … ∧ Lₙ) can equivalently be written as H ∨ ¬L₁ ∨ … ∨ ¬Lₙ
  • A substitution is any function that replaces variables by terms. For example, the substitution {x/3, y/z} replaces the variable x by the term 3 and replaces the variable y by the term z
  • Given a substitution θ and a literal L, Lθ denotes the result of applying the substitution θ to L
  • A unifying substitution for two literals L₁ and L₂ is any substitution θ such that L₁θ = L₂θ
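Substitution and unification can be sketched for flat literals. The representation is an assumption made for illustration: a literal is a (predicate, args) tuple, lowercase strings are variables, everything else is a constant, and there are no nested function terms.

```python
# Sketch of substitution (L·theta) and of finding a unifying substitution
# for two flat literals.

def is_var(t):
    return isinstance(t, str) and t.islower()

def walk(t, theta):
    """Follow variable bindings in theta to their final value."""
    while is_var(t) and t in theta:
        t = theta[t]
    return t

def apply_subst(literal, theta):
    """L·theta: replace each variable in the literal's arguments."""
    pred, args = literal
    return (pred, tuple(walk(a, theta) for a in args))

def unify(l1, l2):
    """Return theta with l1·theta == l2·theta, or None if none exists."""
    (p1, a1), (p2, a2) = l1, l2
    if p1 != p2 or len(a1) != len(a2):
        return None
    theta = {}
    for x, y in zip(a1, a2):
        x, y = walk(x, theta), walk(y, theta)
        if x == y:
            continue
        if is_var(x):
            theta[x] = y
        elif is_var(y):
            theta[y] = x
        else:
            return None  # two different constants cannot unify
    return theta
```

For example, unifying Parent(x, Bob) with Parent(Alice, y) yields the substitution {x/Alice, y/Bob}, under which both literals become Parent(Alice, Bob).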
• Slide 26
• Content
  • Introduction
  • Sequential Covering Algorithms
  • Learning Rule Sets: Summary
  • Learning First-Order Rules
  • Learning Sets of First-Order Rules: FOIL
    • Generating Candidate Specialisations in FOIL
    • Guiding the Search in FOIL
    • Learning Recursive Rule Sets
    • Summary of FOIL
  • Summary
• Slide 27
• Learning Sets of First-Order Rules: FOIL

```
FOIL( target_predicate, predicates, examples )
  pos ← those examples for which target_predicate is true
  neg ← those examples for which target_predicate is false
  learned_rules ← { }
  while pos is not empty do
    /* learn a new rule */
    new_rule ← the rule that predicts target_predicate with no preconditions
    new_rule_neg ← neg
    while new_rule_neg is not empty do
      /* add a new literal to specialise new_rule */
      candidate_literals ← generate candidate new literals for new_rule,
                           based on predicates
      best_literal ← argmax over L in candidate_literals of FoilGain( L, new_rule )
      add best_literal to the preconditions of new_rule
      new_rule_neg ← subset of new_rule_neg that satisfies new_rule's preconditions
    learned_rules ← learned_rules + new_rule
    pos ← pos − { members of pos covered by new_rule }
  return learned_rules
```
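The FoilGain measure that selects best_literal is not spelled out on the slides shown here; the standard definition from FOIL can be sketched as:

```python
# Sketch of the standard FoilGain measure:
#   FoilGain(L, R) = t * ( log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)) )
# p0, n0: positive/negative bindings covered by rule R before adding literal L
# p1, n1: positive/negative bindings covered after adding L
# t: number of positive bindings of R still covered after adding L
import math

def foil_gain(p0, n0, p1, n1, t):
    if p1 == 0:
        return float("-inf")  # the specialised rule covers no positives
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return t * (after - before)
```

Intuitively, the gain rewards a literal that raises the proportion of positive bindings, weighted by how many positive bindings survive the specialisation.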
• Slide 28
• Learning Sets of First-Order Rules: FOIL 2
  • External loop: specific-to-general
    • Generalises the set of rules: each iteration adds a new rule to the existing hypothesis