Concept Learning and the General-to-Specific Ordering
Content
➔ The learning task
✗ Notation
✗ The inductive learning hypothesis
● Concept learning as search
● Find-S: finding a maximally specific hypothesis
● Version Space and the Candidate-Elimination algorithm
● Remarks
● Inductive Bias
● Summary
The learning task
Concept learning: deriving a Boolean-valued function from training examples of its input and output.
Notation
Example: Days on which my friend Aldo enjoys his favorite water sport
Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
Representation: a hypothesis consists of constraints on the instance attributes.
Most general hypothesis: every day is a positive example: <?, ?, ?, ?, ?, ?>
Most specific hypothesis: no day is a positive example: <∅, ∅, ∅, ∅, ∅, ∅>
For each attribute: '?' means any value is acceptable; '∅' means no value is acceptable.
Example: <?, Cold, High, ?, ?, ?>
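The later slides lend themselves to small runnable sketches. As a working assumption (the slides fix no programming notation), these sketches encode a hypothesis as a Python tuple with one entry per attribute, using '?' for "any value" and None for the empty constraint ∅; the attribute value sets follow the EnjoySport example.

# One possible encoding of the slides' representation (an assumption of
# these sketches, not the slides' own notation).
domains = [['Sunny', 'Cloudy', 'Rainy'],   # Sky
           ['Warm', 'Cold'],               # AirTemp
           ['Normal', 'High'],             # Humidity
           ['Strong', 'Weak'],             # Wind
           ['Warm', 'Cool'],               # Water
           ['Same', 'Change']]             # Forecast

most_general  = ('?',) * 6                 # <?, ?, ?, ?, ?, ?>
most_specific = (None,) * 6                # <∅, ∅, ∅, ∅, ∅, ∅>
example_h     = ('?', 'Cold', 'High', '?', '?', '?')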
Notation 2
Given:
Instances X: the set of possible days, each described by the attributes
(Sky with possible values Sunny, Cloudy and Rainy; AirTemp with Warm, Cold; ...)
Hypotheses H: each hypothesis is described by a conjunction of constraints on the attributes Sky, AirTemp, Humidity, Wind, Water and Forecast. Each constraint is '?', '∅' or a specific value.
Target concept c: EnjoySport, c : X → {0, 1}
Training examples D: positive and negative examples of the target function
Determine:
A hypothesis h in H such that h(x) = c(x) for all x in X.
Notation 3
Set of training examples: each consists of an instance x from X along with its target concept value c(x), written <x, c(x)>:
✗ c(x) = 1: positive example, a member of the target concept
✗ c(x) = 0: negative example, a non-member of the target concept
The inductive learning hypothesis
The only information available about c is its value over the training examples
Therefore an inductive learning algorithm can at best guarantee that the output hypothesis fits the target concept over the training data
Fundamental assumption: the best hypothesis regarding unseen instances is the hypothesis that best fits the observed data
The Inductive Learning Hypothesis: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
Content
● The learning task
➔ Concept learning as search
✗ General-to-Specific Ordering of Hypotheses
● Find-S: finding a maximally specific hypothesis
● Version Space and the Candidate-Elimination algorithm
● Remarks
● Inductive Bias
● Summary
Concept learning as search
Note: by selecting a hypothesis representation, the designer of the learning algorithm implicitly defines the space of all hypotheses that the program can ever represent and therefore can ever learn.
Homework example (EnjoySport):
3·2·2·2·2·2 = 96 distinct instances
5·4·4·4·4·4 = 5120 syntactically distinct hypotheses
1 + 4·3·3·3·3·3 = 973 semantically distinct hypotheses
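These counts can be verified mechanically from the attribute value sets introduced above; a small sketch:

# Checking the slide's counts against the EnjoySport attribute value sets.
instances = syntactic = semantic = 1
for values in domains:
    instances *= len(values)        # one concrete value per attribute
    syntactic *= len(values) + 2    # plus '?' and the empty constraint
    semantic  *= len(values) + 1    # plus '?'; all empty hypotheses collapse ...
semantic += 1                       # ... into one hypothesis classifying nothing positive
print(instances, syntactic, semantic)   # 96 5120 973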
General-to-Specific Ordering of Hypotheses
Example of the general-to-specific ordering: consider
h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
The second hypothesis imposes fewer constraints, so it classifies more instances as positive.
In detail: for any instance x in X and hypothesis h in H, we say that x satisfies h if and only if h(x) = 1.
Definition: let h_j and h_k be Boolean-valued functions defined over X. Then h_j is more_general_than_or_equal_to h_k (written h_j ≥_g h_k) if and only if
(∀x ∈ X)[(h_k(x) = 1) → (h_j(x) = 1)]
Definition: let h_j and h_k be Boolean-valued functions defined over X. Then h_j is more_general_than h_k (written h_j >_g h_k) if and only if
(h_j ≥_g h_k) ∧ ¬(h_k ≥_g h_j)
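For the conjunctive representation, the ≥_g test decomposes attribute by attribute; a minimal sketch under the tuple encoding above (the function names are the sketch's own):

def constraint_more_general(cj, ck):
    # True if constraint cj covers every value that constraint ck covers.
    return cj == '?' or ck is None or cj == ck

def more_general_or_equal(hj, hk):
    # h_j >=_g h_k: every instance satisfying h_k also satisfies h_j.
    return all(constraint_more_general(cj, ck) for cj, ck in zip(hj, hk))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))   # True: h2 is more general than h1
print(more_general_or_equal(h1, h2))   # False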
General-to-Specific Ordering of Hypotheses 2
The relations ≥_g and >_g are defined independently of the target concept.
The relation ≥_g defines a partial order over the hypothesis space H.
Informally: there may be pairs of hypotheses h_1 and h_3 such that neither h_1 ≥_g h_3 nor h_3 ≥_g h_1.
Example
Each hypothesis corresponds to some subset of X (the subset of instances that it classifies positive)
The arrows represent the more_general_than relation, with each arrow pointing toward the less general hypothesis.
Find-S: finding a maximally specific hypothesis
Use the more_general_than partial ordering:
● Begin with the most specific possible hypothesis in H
● Generalise this hypothesis each time it fails to cover an observed positive example
1. Initialise h to the most specific hypothesis in H
2. For each positive training instance x
     For each attribute constraint a_i in h
       If the constraint a_i is satisfied by x
       Then do nothing
       Else replace a_i in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
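A runnable sketch of FIND-S under the tuple encoding above; find_s and data are the sketch's own names, and the final print matches the worked example that follows:

def find_s(examples):
    # FIND-S: start from the most specific hypothesis and minimally
    # generalise it on every positive example; negatives are ignored.
    h = [None] * len(examples[0][0])
    for x, label in examples:
        if not label:
            continue                      # FIND-S simply ignores negative examples
        for i, value in enumerate(x):
            if h[i] is None:
                h[i] = value              # first covering step: copy the value
            elif h[i] != value:
                h[i] = '?'                # next more general constraint
    return tuple(h)

data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
print(find_s(data))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')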
Find-S: finding a maximally specific hypothesis (example)
1. Step: h ← <∅, ∅, ∅, ∅, ∅, ∅>
2. Step (1st example, positive): h ← <Sunny, Warm, Normal, Strong, Warm, Same>
3. Step (2nd example, positive): substitute a '?' in place of any attribute value in h that is not satisfied by the new example: h ← <Sunny, Warm, ?, Strong, Warm, Same>
   (3rd example, negative): the FIND-S algorithm simply ignores every negative example
4. Step (4th example, positive): h ← <Sunny, Warm, ?, Strong, ?, ?>
Remarks
In the general case, as long as we assume that the hypothesis space H contains a hypothesis that describes the true target concept c and that the training data contains no errors, the current hypothesis h can never require a revision in response to a negative example.
Why?
✗ The current hypothesis is the most specific one consistent with the observed positive examples
✗ The target concept c must be more_general_than_or_equal_to h (c ≥_g h)
✗ But the target concept c will never cover a negative example, and thus neither will h
✗ Therefore no revision to h will be required in response to any negative example
In the literature there are many different algorithms that use the same more_general_than partial ordering.
Content
● The learning task
● Concept learning as search
● Find-S: finding a maximally specific hypothesis
➔ Version Space and the Candidate-Elimination Algorithm
➔ Key idea: output a description of the set of all hypotheses consistent with the training examples
✗ Representation
✗ The List-Then-Eliminate algorithm
✗ A more compact representation of the version space
✗ Candidate-Elimination learning algorithm
✗ Example
● Remarks
● Inductive Bias
● Summary
Representation
Definition: a hypothesis h is consistent with the set of training examples D if and only if h(x) = c(x) for each example <x, c(x)> in D:
Consistent(h, D) ≡ (∀<x, c(x)> ∈ D) h(x) = c(x)
Difference between "consistent" and "satisfies":
x satisfies h <=> h(x) = 1, no matter whether x is a positive or a negative example of the target concept
h is consistent with <x, c(x)> <=> h(x) = c(x): the hypothesis h has to classify the example correctly with respect to the target concept
Definition: the version space, denoted VS_H,D, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D:
VS_H,D ≡ {h ∈ H | Consistent(h, D)}
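The two notions translate directly into code under the earlier encoding; a minimal sketch:

def satisfies(x, h):
    # x satisfies h  <=>  h(x) = 1 under the conjunctive representation.
    return all(c == '?' or c == v for c, v in zip(h, x))

def consistent(h, examples):
    # h is consistent with D  <=>  h(x) = c(x) for every <x, c(x)> in D.
    return all(satisfies(x, h) == label for x, label in examples)

h = ('Sunny', 'Warm', '?', 'Strong', '?', '?')
print(consistent(h, data))   # True: h classifies all four examples correctly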
The List-Then-Eliminate algorithm
Simplest way to represent the version space: list all of its elements.
The List-Then-Eliminate Algorithm
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example <x, c(x)>
     remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
This algorithm cannot be applied whenever the hypothesis space H is infinite.
Advantage: it is guaranteed to output all hypotheses consistent with the training data.
Disadvantage: it is not efficiently computable (it enumerates all hypotheses in H).
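A sketch of the algorithm for the small finite EnjoySport space, reusing consistent() and the data from the earlier sketches; it enumerates only the non-empty conjunctive hypotheses, which is exactly why it is limited to small finite H:

from itertools import product

def list_then_eliminate(domains, examples):
    # Enumerate every (non-empty) conjunctive hypothesis, then filter by
    # the training data.
    all_h = product(*[values + ['?'] for values in domains])
    return [h for h in all_h if consistent(h, examples)]

vs = list_then_eliminate(domains, data)
print(len(vs))   # 6: the version space of the four EnjoySport examples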
A more compact representation of the version space
Representation: the version space is represented by its most general and least general members. These two sets of members form the general and specific boundary sets, which delimit the version space within the partially ordered hypothesis space.
Example of the List-Then-Eliminate Alg.
The result: arrows indicate the more_general_than relation. Candidate-Elimination represents the version space by storing only its most general members (labeled G) and its most specific ones (labeled S). With these two sets it is possible to enumerate all members of the version space as needed for generating the hypothesis. The hypothesis we are looking for, here h = <Sunny, Warm, ?, Strong, ?, ?>, lies between these two sets in the general-to-specific partial ordering over hypotheses.
Definition of the Boundary Sets
Definition: the general boundary G, with respect to hyp. space H and training data D, is the set of maximally general members of H consistent with D
Definition: the specific boundary S, with respect to hyp. space H and training data D, is the set of minimally general (i.e. maximally specific) members of H consistent with D.
G ≡ {g ∈ H | Consistent(g, D) ∧ ¬(∃g' ∈ H)[(g' >_g g) ∧ Consistent(g', D)]}
S ≡ {s ∈ H | Consistent(s, D) ∧ ¬(∃s' ∈ H)[(s >_g s') ∧ Consistent(s', D)]}
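Given an explicitly enumerated version space, these boundary sets can be read off directly; a sketch in which h ≠ h' combined with ≥_g stands in for the strict relation >_g (adequate for syntactically distinct conjunctive hypotheses):

def general_boundary(VS):
    # Maximally general members: nothing in VS is strictly more general.
    return {g for g in VS
            if not any(more_general_or_equal(g2, g) and g2 != g for g2 in VS)}

def specific_boundary(VS):
    # Minimally general members: nothing in VS is strictly more specific.
    return {s for s in VS
            if not any(more_general_or_equal(s, s2) and s2 != s for s2 in VS)}

VS = list_then_eliminate(domains, data)
print(specific_boundary(VS))   # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(general_boundary(VS))    # the two maximally general members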
Definition of the Boundary Sets (2)
Theorem (version space representation theorem): let X be an arbitrary set of instances and let H be a set of Boolean-valued hypotheses defined over X. Let c : X → {0, 1} be an arbitrary target concept defined over X, and let D be an arbitrary set of training examples {<x, c(x)>}. For all X, H, c and D such that S and G are well defined:
VS_H,D = {h ∈ H | (∃s ∈ S)(∃g ∈ G)(g ≥_g h ≥_g s)}
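As a sketch, the theorem yields a membership test for the version space that touches only the two boundary sets, never the training data (in_version_space is the sketch's own name):

def in_version_space(h, S, G):
    # h lies between the boundaries: above some s in S, below some g in G.
    return (any(more_general_or_equal(h, s) for s in S) and
            any(more_general_or_equal(g, h) for g in G))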
Candidate-Elimination Learning Algorithm
The algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples.
Begin with the version space equal to the set of all hypotheses in H:
G_0 ← {<?, ?, ?, ?, ?, ?>} (the most general hypothesis in H)
S_0 ← {<∅, ∅, ∅, ∅, ∅, ∅>} (the most specific hypothesis in H)
These delimit the entire hypothesis space. As each training example is considered, the S and G boundary sets are generalised and specialised, respectively, to eliminate from the version space any hypothesis found inconsistent with the new training example.
After all examples have been processed, the computed version space contains all the hypotheses consistent with these examples, and only these hypotheses.
Candidate-Elimination Learning algorithm 2
1. Initialise G to the set of maximally general hypotheses in H
2. Initialise S to the set of maximally specific hypotheses in H
3. For each training example d, do
   If d is a positive example
     Remove from G any hypothesis inconsistent with d
     For each hypothesis s in S that is not consistent with d
       Remove s from S
       Add to S all minimal generalisations h of s such that h is consistent with d and some member of G is more general than h
       Remove from S any hypothesis that is more general than another hypothesis in S
   If d is a negative example
     Remove from S any hypothesis inconsistent with d
     For each hypothesis g in G that is not consistent with d
       Remove g from G
       Add to G all minimal specialisations h of g such that h is consistent with d and some member of S is more specific than h
       Remove from G any hypothesis that is less general than another hypothesis in G
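A runnable sketch of this procedure for conjunctive hypotheses, reusing satisfies() and more_general_or_equal() from the earlier sketches; min_generalisations, min_specialisations and candidate_elimination are the sketch's own names, and the helpers cover only this representation:

def min_generalisations(s, x):
    # For conjunctive hypotheses there is a single minimal generalisation
    # of s that covers the instance x.
    h = [v if c is None else (c if c == v else '?') for c, v in zip(s, x)]
    return [tuple(h)]

def min_specialisations(g, domains, x):
    # All minimal specialisations of g excluding x: replace one '?' by any
    # attribute value the instance does not have.
    results = []
    for i, c in enumerate(g):
        if c == '?':
            for value in domains[i]:
                if value != x[i]:
                    h = list(g)
                    h[i] = value
                    results.append(tuple(h))
    return results

def candidate_elimination(domains, examples):
    n = len(domains)
    G = {('?',) * n}                    # G_0
    S = {(None,) * n}                   # S_0
    for x, label in examples:
        if label:                       # positive example
            G = {g for g in G if satisfies(x, g)}
            for s in [s for s in S if not satisfies(x, s)]:
                S.remove(s)
                S |= {h for h in min_generalisations(s, x)
                      if any(more_general_or_equal(g, h) for g in G)}
            S = {s for s in S if not any(
                 more_general_or_equal(s, s2) and s != s2 for s2 in S)}
        else:                           # negative example
            S = {s for s in S if not satisfies(x, s)}
            for g in [g for g in G if satisfies(x, g)]:
                G.remove(g)
                G |= {h for h in min_specialisations(g, domains, x)
                      if any(more_general_or_equal(h, s) for s in S)}
            G = {g for g in G if not any(
                 more_general_or_equal(g2, g) and g != g2 for g2 in G)}
    return S, G

S, G = candidate_elimination(domains, data)
print(S)   # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(G)   # {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}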
An illustrative example
1st Example (Sunny, Warm, Normal, Strong, Warm, Same), Yes:
✗ S is overly specific: it fails to cover this example
✗ The S boundary is moved to the least more general hypothesis that covers it
✗ No update of the G boundary is needed
2nd Example (Sunny, Warm, High, Strong, Warm, Same), Yes:
✗ S is generalised further, leaving G again unchanged
An illustrative example 2
3rd Example (Rainy, Cold, High, Strong, Warm, Change), No:
✗ It reveals that G is overly general (it incorrectly predicts this example as positive)
  -> G must be specialised
There are 6 attributes that could be specified; why are there only 3 new hypotheses?
✗ For example, h = <?, ?, Normal, ?, ?, ?> is a minimal specialisation that correctly labels the new example as negative, but it is not included. WHY? This hypothesis is inconsistent with the previously encountered positive examples. The algorithm determines this by noting that h is not more general than the current specific boundary S_2.
An illustrative example 3
An illustrative example 4
4th Example (Sunny, Warm, High, Strong, Cool, Change), Yes:
✗ Generalises S
✗ Removes one member of G, because this member fails to cover the new positive example. Why?
✔ It cannot be specialised (specialisation would not make it cover the new example)
✔ It cannot be generalised (by the definition of G, any more general hypothesis would cover at least one negative training example)
✔ Therefore the hypothesis must be dropped from G
An illustrative example 5
S_4 and G_4 delimit the version space of all hypotheses consistent with the set of incrementally observed training examples.
This learned version space is independent of the sequence in which the training examples are presented.
Content
● The learning task
● Concept learning as search
● Find-S: finding a maximally specific hypothesis
● Version Space and the Candidate-Elimination algorithm
➔ Remarks
✗ Does the Candidate-Elimination algorithm converge to a correct hypothesis?
✗ What training examples should the learner request next?
● Inductive Bias
● Summary
Does the Candidate-Elimination algorithm converge to the correct hypothesis?
Convergence requires that:
✗ there are no errors in the training examples
✗ there is some hypothesis in H that correctly describes the target concept
The target concept is exactly learned when the S and G boundary sets converge to a single identical hypothesis.
What will happen if the training data contains errors?
✗ Assume example 2 is incorrectly labeled negative => the correct target concept is removed (every hypothesis inconsistent with the training examples is removed)
✗ Given sufficient additional training data, the learner will eventually detect the inconsistency by noticing that S and G converge to an empty version space
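This failure mode can be reproduced with the Candidate-Elimination sketch: flipping the label of the 2nd EnjoySport example empties both boundary sets (noisy is the sketch's own name):

# Mislabel the 2nd training example as negative and rerun the sketch.
noisy = [(x, label if i != 1 else False) for i, (x, label) in enumerate(data)]
S_bad, G_bad = candidate_elimination(domains, noisy)
print(S_bad, G_bad)   # set() set(): an empty version space signals the error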
What training example should the learner request next?
Before, we assumed that the training examples are provided to the learner by some external teacher.
Definition: Query: the learner is allowed to conduct experiments in which it chooses the next instance, then obtains the correct classification for this instance from an external oracle.
✗ Example EnjoySport: what would be a good general query strategy?
✗ Clearly the learner should choose an instance that would be classified positive by some hypotheses in the version space, but negative by others.
✗ For example, <Sunny, Warm, Normal, Light, Warm, Same> satisfies three of the six hypotheses, so either answer from the oracle shrinks the version space from six hypotheses to half this number.
✗ In general, the optimal query strategy for a concept learner is to generate instances that satisfy exactly half the hypotheses in the current version space.
✗ When this is possible, the correct target concept can be found with ⌈log_2 |VS|⌉ experiments.
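The halving can be checked against the final version space of the example; a sketch using in_version_space() from the representation-theorem sketch and the S, G computed above:

from itertools import product

# Enumerate the non-empty conjunctive hypotheses and keep the version space.
VS = [h for h in product(*[values + ['?'] for values in domains])
      if in_version_space(h, S, G)]
query = ('Sunny', 'Warm', 'Normal', 'Light', 'Warm', 'Same')
print(len(VS))                                  # 6
print(sum(satisfies(query, h) for h in VS))     # 3: either answer halves VS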
Content
● The learning task
● Concept learning as search
● Find-S: finding a maximally specific hypothesis
● Version Space and the Candidate-Elimination algorithm
● Remarks
➔ Inductive Bias
✗ A biased hypothesis space
✗ An unbiased learner
● Summary
Inductive Bias
Questions:
✗ What if the target concept is not contained in the hypothesis space?
✗ Can this difficulty be avoided by using a hypothesis space that includes every possible hypothesis?
✗ How does the size of this hypothesis space influence the ability of the algorithm to generalise to unobserved instances?
✗ How does the size of the hypothesis space influence the number of training examples that must be observed?
A biased hypothesis space
Suppose we wish to assure that the hypothesis space contains the unknown target concept.
Obvious solution: enrich the hypothesis space to include every possible hypothesis.
Example: EnjoySport restricts the hypothesis space to include only conjunctions of attribute values. For the training examples below, the algorithm finds that there are zero hypotheses in the version space:

Example  Sky     AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny   Warm     Normal    Strong  Cool   Change    Yes
2        Cloudy  Warm     Normal    Strong  Cool   Change    Yes
3        Rainy   Warm     Normal    Strong  Cool   Change    No

Why? The most specific hypothesis consistent with the first two examples and representable in the given hypothesis space H is
S_2: <?, Warm, Normal, Strong, Cool, Change>
It is overly general: it incorrectly covers the third (negative) example.
Problem: the bias is that H contains only conjunctive hypotheses.
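Run on these three examples, the Candidate-Elimination sketch reproduces the empty version space (biased_data is the sketch's own name):

# Under the purely conjunctive H, these three examples eliminate everything:
# no conjunction separates {Sunny, Cloudy} days from Rainy days.
biased_data = [
    (('Sunny',  'Warm', 'Normal', 'Strong', 'Cool', 'Change'), True),
    (('Cloudy', 'Warm', 'Normal', 'Strong', 'Cool', 'Change'), True),
    (('Rainy',  'Warm', 'Normal', 'Strong', 'Cool', 'Change'), False),
]
print(candidate_elimination(domains, biased_data))   # (set(), set())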
An Unbiased Learner
Goal: assure that the target concept is in the hypothesis space.
Solution: provide a hypothesis space capable of representing every teachable concept => capable of representing every possible subset of the instances X, i.e. the power set of X.
Example: EnjoySport: the size of the instance space X is 96 => 2^96 ≈ 10^28 distinct target concepts.
Reformulate the example by defining H' corresponding to the power set of X:
✗ Allow arbitrary disjunctions, conjunctions and negations of the earlier hypotheses
✗ One way to define such an H': for example, Sky = Sunny or Sky = Cloudy becomes
<Sunny, ?, ?, ?, ?, ?> ∨ <Cloudy, ?, ?, ?, ?, ?>
An Unbiased Learner 2
The Candidate-Elimination algorithm can be used with H', BUT there is a new problem:
✗ The concept learning algorithm is now completely unable to generalise beyond the observed examples
✗ Why? Suppose we present 3 positive examples (x_1, x_2, x_3) and 2 negative examples (x_4, x_5) to the learner
✗ S will contain the hypothesis that is just the disjunction of the positive examples: S: {x_1 ∨ x_2 ∨ x_3}
✗ G will exclude only the negative examples: G: {¬(x_4 ∨ x_5)}
✗ S and G will always be the disjunction of the positive examples and the negated disjunction of the negative examples => the only examples that are unambiguously classified by S and G are the observed training examples themselves
Summary
● Concept learning can be seen as a problem of searching through a large predefined space of potential hypotheses.
● The general_to_specific partial ordering of hypotheses provides a useful structure for organising the search through the hypothesis space.
● FIND-S performs a specific-to-general search along one branch of the partial ordering to find the most specific hypothesis consistent with the training examples.
● Candidate-Elimination incrementally computes the sets of maximally specific (S) and maximally general (G) hypotheses.
● S and G delimit the entire set of hypotheses consistent with the data and provide a means to identify the target concept. By looking at S and G one can determine whether the learner has converged to the target concept, whether the training data is inconsistent, and which query would be most useful to refine the version space next.
● The version space and the Candidate-Elimination algorithm are not robust to noisy data, and many problems arise if the unknown target concept is not expressible in the provided hypothesis space.