Concept Learning and the General-to-Specific Ordering
Content
➔ The learning task
✗ Notation
✗ The inductive learning hypothesis
● Concept learning as search
● Find-S: finding a maximally specific hypothesis
● Version Space and the Candidate-Elimination algorithm
● Remarks
● Inductive Bias
● Summary
The learning task
Concept learning: deriving a Boolean-valued function from training examples of its input and output.
Notation
Example: Days on which my friend Aldo enjoys his favorite water sport
Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
Representation: a hypothesis consists of constraints on the instance attributes.
Most general hypothesis: every day is a positive example: <?, ?, ?, ?, ?, ?>
Most specific hypothesis: no day is a positive example: <∅, ∅, ∅, ∅, ∅, ∅>
For each attribute: '?' means any value is acceptable; '∅' means no value is acceptable.
Example: <?, Cold, High, ?, ?, ?>
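The later slides lend themselves to small runnable sketches. As a working assumption (the slides fix no programming notation), these sketches encode a hypothesis as a Python tuple with one entry per attribute, using '?' for "any value" and None for the empty constraint ∅; the attribute value sets follow the EnjoySport example.

# One possible encoding of the slides' representation (an assumption of
# these sketches, not the slides' own notation).
domains = [['Sunny', 'Cloudy', 'Rainy'],   # Sky
           ['Warm', 'Cold'],               # AirTemp
           ['Normal', 'High'],             # Humidity
           ['Strong', 'Weak'],             # Wind
           ['Warm', 'Cool'],               # Water
           ['Same', 'Change']]             # Forecast

most_general  = ('?',) * 6                 # <?, ?, ?, ?, ?, ?>
most_specific = (None,) * 6                # <∅, ∅, ∅, ∅, ∅, ∅>
example_h     = ('?', 'Cold', 'High', '?', '?', '?')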
Notation 2
Given:
Instances X: the set of possible days, each described by the attributes
(Sky with possible values Sunny, Cloudy and Rainy; AirTemp with Warm, Cold; ...)
Hypotheses H: each hypothesis is described by a conjunction of constraints on the attributes Sky, AirTemp, Humidity, Wind, Water and Forecast. Each constraint is '?', '∅' or a specific value.
Target concept c: EnjoySport, c : X → {0, 1}
Training examples D: positive and negative examples of the target function
Determine:
A hypothesis h in H such that h(x) = c(x) for all x in X.
Notation 3
Set of training examples: each consists of an instance x from X along with its target concept value c(x), written <x, c(x)>:
✗ c(x) = 1: positive example, a member of the target concept
✗ c(x) = 0: negative example, a non-member of the target concept
The inductive learning hypothesis
The only information available about c is its value over the training examples
Therefore an inductive learning algorithm can at best guarantee that the output hypothesis fits the target concept over the training data
Fundamental assumption: the best hypothesis regarding unseen instances is the hypothesis that best fits the observed data
The Inductive Learning Hypothesis: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
Content
● The learning task
➔ Concept learning as search
✗ General-to-Specific Ordering of Hypotheses
● Find-S: finding a maximally specific hypothesis
● Version Space and the Candidate-Elimination algorithm
● Remarks
● Inductive Bias
● Summary
Concept learning as search
Note: by selecting a hypothesis representation, the designer of the learning algorithm implicitly defines the space of all hypotheses that the program can ever represent and therefore can ever learn.
Homework example (EnjoySport):
3·2·2·2·2·2 = 96 distinct instances
5·4·4·4·4·4 = 5120 syntactically distinct hypotheses
1 + 4·3·3·3·3·3 = 973 semantically distinct hypotheses
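These counts can be verified mechanically from the attribute value sets introduced above; a small sketch:

# Checking the slide's counts against the EnjoySport attribute value sets.
instances = syntactic = semantic = 1
for values in domains:
    instances *= len(values)        # one concrete value per attribute
    syntactic *= len(values) + 2    # plus '?' and the empty constraint
    semantic  *= len(values) + 1    # plus '?'; all empty hypotheses collapse ...
semantic += 1                       # ... into one hypothesis classifying nothing positive
print(instances, syntactic, semantic)   # 96 5120 973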
General-to-Specific Ordering of Hypotheses
Example of the general-to-specific ordering: consider
h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
The second hypothesis imposes fewer constraints, so it classifies more instances as positive.
In detail: for any instance x in X and hypothesis h in H, we say that x satisfies h if and only if h(x) = 1.
Definition: let h_j and h_k be Boolean-valued functions defined over X. Then h_j is more_general_than_or_equal_to h_k (written h_j ≥_g h_k) if and only if
(∀x ∈ X)[(h_k(x) = 1) → (h_j(x) = 1)]
Definition: let h_j and h_k be Boolean-valued functions defined over X. Then h_j is more_general_than h_k (written h_j >_g h_k) if and only if
(h_j ≥_g h_k) ∧ ¬(h_k ≥_g h_j)
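For the conjunctive representation, the ≥_g test decomposes attribute by attribute; a minimal sketch under the tuple encoding above (the function names are the sketch's own):

def constraint_more_general(cj, ck):
    # True if constraint cj covers every value that constraint ck covers.
    return cj == '?' or ck is None or cj == ck

def more_general_or_equal(hj, hk):
    # h_j >=_g h_k: every instance satisfying h_k also satisfies h_j.
    return all(constraint_more_general(cj, ck) for cj, ck in zip(hj, hk))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))   # True: h2 is more general than h1
print(more_general_or_equal(h1, h2))   # False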
General-to-Specific Ordering of Hypotheses 2
The relations ≥_g and >_g are defined independently of the target concept.
The relation ≥_g defines a partial order over the hypothesis space H.
Informally: there may be pairs of hypotheses h_1 and h_3 such that neither h_1 ≥_g h_3 nor h_3 ≥_g h_1.
Example
Each hypothesis corresponds to some subset of X (the subset of instances that it classifies positive)
The arrows represent the more_general_than relation, with each arrow pointing toward the less general hypothesis.
Find-S: finding a maximally specific hypothesis
Use the more_general_than partial ordering:
● Begin with the most specific possible hypothesis in H
● Generalise this hypothesis each time it fails to cover an observed positive example
1. Initialise h to the most specific hypothesis in H
2. For each positive training instance x
     For each attribute constraint a_i in h
       If the constraint a_i is satisfied by x
       Then do nothing
       Else replace a_i in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
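A runnable sketch of FIND-S under the tuple encoding above; find_s and data are the sketch's own names, and the final print matches the worked example that follows:

def find_s(examples):
    # FIND-S: start from the most specific hypothesis and minimally
    # generalise it on every positive example; negatives are ignored.
    h = [None] * len(examples[0][0])
    for x, label in examples:
        if not label:
            continue                      # FIND-S simply ignores negative examples
        for i, value in enumerate(x):
            if h[i] is None:
                h[i] = value              # first covering step: copy the value
            elif h[i] != value:
                h[i] = '?'                # next more general constraint
    return tuple(h)

data = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True),
]
print(find_s(data))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')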
Find-S: finding a maximally specific hypothesis (example)
1. Step: h ← <∅, ∅, ∅, ∅, ∅, ∅>
2. Step (1st example, positive): h ← <Sunny, Warm, Normal, Strong, Warm, Same>
3. Step (2nd example, positive): substitute a '?' in place of any attribute value in h that is not satisfied by the new example: h ← <Sunny, Warm, ?, Strong, Warm, Same>
   (3rd example, negative): the FIND-S algorithm simply ignores every negative example
4. Step (4th example, positive): h ← <Sunny, Warm, ?, Strong, ?, ?>
Remarks
In the general case, as long as we assume that the hypothesis space H contains a hypothesis that describes the true target concept c and that the training data contains no errors, the current hypothesis h can never require a revision in response to a negative example.
Why?
✗ The current hypothesis is the most specific one consistent with the observed positive examples
✗ The target concept c must be more_general_than_or_equal_to h (c ≥_g h)
✗ But the target concept c will never cover a negative example, and thus neither will h
✗ Therefore no revision to h will be required in response to any negative example
In the literature there are many different algorithms that use the same more_general_than partial ordering.
Content
● The learning task
● Concept learning as search
● Find-S: finding a maximally specific hypothesis
➔ Version Space and the Candidate-Elimination Algorithm
➔ Key idea: output a description of the set of all hypotheses consistent with the training examples
✗ Representation
✗ The List-Then-Eliminate algorithm
✗ A more compact representation of the version space
✗ Candidate-Elimination learning algorithm
✗ Example
● Remarks
● Inductive Bias
● Summary
Representation
Definition: a hypothesis h is consistent with the set of training examples D if and only if h(x) = c(x) for each example <x, c(x)> in D:
Consistent(h, D) ≡ (∀<x, c(x)> ∈ D) h(x) = c(x)
Difference between "consistent" and "satisfies":
x satisfies h <=> h(x) = 1, no matter whether x is a positive or a negative example of the target concept
h is consistent with <x, c(x)> <=> h(x) = c(x): the hypothesis h has to classify the example correctly with respect to the target concept
Definition: the version space, denoted VS_H,D, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D:
VS_H,D ≡ {h ∈ H | Consistent(h, D)}
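The two notions translate directly into code under the earlier encoding; a minimal sketch:

def satisfies(x, h):
    # x satisfies h  <=>  h(x) = 1 under the conjunctive representation.
    return all(c == '?' or c == v for c, v in zip(h, x))

def consistent(h, examples):
    # h is consistent with D  <=>  h(x) = c(x) for every <x, c(x)> in D.
    return all(satisfies(x, h) == label for x, label in examples)

h = ('Sunny', 'Warm', '?', 'Strong', '?', '?')
print(consistent(h, data))   # True: h classifies all four examples correctly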
The List-Then-Eliminate algorithm
Simplest way to represent the version space: list all of its elements.
The List-Then-Eliminate Algorithm
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example <x, c(x)>
     remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
This algorithm cannot be applied whenever the hypothesis space H is infinite.
Advantage: it is guaranteed to output all hypotheses consistent with the training data.
Disadvantage: it is not efficiently computable (it enumerates all hypotheses in H).
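A sketch of the algorithm for the small finite EnjoySport space, reusing consistent() and the data from the earlier sketches; it enumerates only the non-empty conjunctive hypotheses, which is exactly why it is limited to small finite H:

from itertools import product

def list_then_eliminate(domains, examples):
    # Enumerate every (non-empty) conjunctive hypothesis, then filter by
    # the training data.
    all_h = product(*[values + ['?'] for values in domains])
    return [h for h in all_h if consistent(h, examples)]

vs = list_then_eliminate(domains, data)
print(len(vs))   # 6: the version space of the four EnjoySport examples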
A more compact representation of the version space
Representation: the version space is represented by its most general and least general members. These two sets of members form the general and specific boundary sets, which delimit the version space within the partially ordered hypothesis space.
Example of the List-Then-Eliminate Alg.
The result: arrows indicate the more_general_than relation. Candidate-Elimination represents the version space by storing only its most general members (labeled G) and its most specific ones (labeled S). With these two sets it is possible to enumerate all members of the version space as needed for generating the hypothesis. The hypothesis we are looking for, here h = <Sunny, Warm, ?, Strong, ?, ?>, lies between these two sets in the general-to-specific partial ordering over hypotheses.
Definition of the Boundary Sets
Definition: the general boundary G, with respect to hyp. space H and training data D, is the set of maximally general members of H consistent with D
Definition: the specific boundary S, with respect to hyp. space H and training data D, is the set of minimally general (i.e. maximally specific) members of H consistent with D.
G ≡ {g ∈ H | Consistent(g, D) ∧ ¬(∃g' ∈ H)[(g' >_g g) ∧ Consistent(g', D)]}
S ≡ {s ∈ H | Consistent(s, D) ∧ ¬(∃s' ∈ H)[(s >_g s') ∧ Consistent(s', D)]}
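Given an explicitly enumerated version space, these boundary sets can be read off directly; a sketch in which h ≠ h' combined with ≥_g stands in for the strict relation >_g (adequate for syntactically distinct conjunctive hypotheses):

def general_boundary(VS):
    # Maximally general members: nothing in VS is strictly more general.
    return {g for g in VS
            if not any(more_general_or_equal(g2, g) and g2 != g for g2 in VS)}

def specific_boundary(VS):
    # Minimally general members: nothing in VS is strictly more specific.
    return {s for s in VS
            if not any(more_general_or_equal(s, s2) and s2 != s for s2 in VS)}

VS = list_then_eliminate(domains, data)
print(specific_boundary(VS))   # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(general_boundary(VS))    # the two maximally general members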
Definition of the Boundary Sets (2)
Theorem (version space representation theorem): let X be an arbitrary set of instances and let H be a set of Boolean-valued hypotheses defined over X. Let c : X → {0, 1} be an arbitrary target concept defined over X, and let D be an arbitrary set of training examples {<x, c(x)>}. For all X, H, c and D such that S and G are well defined:
VS_H,D = {h ∈ H | (∃s ∈ S)(∃g ∈ G)(g ≥_g h ≥_g s)}
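As a sketch, the theorem yields a membership test for the version space that touches only the two boundary sets, never the training data (in_version_space is the sketch's own name):

def in_version_space(h, S, G):
    # h lies between the boundaries: above some s in S, below some g in G.
    return (any(more_general_or_equal(h, s) for s in S) and
            any(more_general_or_equal(g, h) for g in G))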
Candidate-Elimination Learning Algorithm
The algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples.
Begin with the version space equal to the set of all hypotheses in H:
G_0 ← {<?, ?, ?, ?, ?, ?>} (the most general hypothesis in H)
S_0 ← {<∅, ∅, ∅, ∅, ∅, ∅>} (the most specific hypothesis in H)
These delimit the entire hypothesis space. As each training example is considered, the S and G boundary sets are generalised and specialised, respectively, to eliminate from the version space any hypothesis found inconsistent with the new training example.
After all examples have been processed, the computed version space contains all the hypotheses consistent with these examples, and only these hypotheses.
Candidate-Elimination Learning algorithm 2
1. Initialise G to the set of maximally general hypotheses in H
2. Initialise S to the set of maximally specific hypotheses in H
3. For each training example d, do
   If d is a positive example
     Remove from G any hypothesis inconsistent with d
     For each hypothesis s in S that is not consistent with d
       Remove s from S
       Add to S all minimal generalisations h of s such that h is consistent with d and some member of G is more general than h
       Remove from S any hypothesis that is more general than another hypothesis in S
   If d is a negative example
     Remove from S any hypothesis inconsistent with d
     For each hypothesis g in G that is not consistent with d
       Remove g from G
       Add to G all minimal specialisations h of g such that h is consistent with d and some member of S is more specific than h
       Remove from G any hypothesis that is less general than another hypothesis in G
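A runnable sketch of this procedure for conjunctive hypotheses, reusing satisfies() and more_general_or_equal() from the earlier sketches; min_generalisations, min_specialisations and candidate_elimination are the sketch's own names, and the helpers cover only this representation:

def min_generalisations(s, x):
    # For conjunctive hypotheses there is a single minimal generalisation
    # of s that covers the instance x.
    h = [v if c is None else (c if c == v else '?') for c, v in zip(s, x)]
    return [tuple(h)]

def min_specialisations(g, domains, x):
    # All minimal specialisations of g excluding x: replace one '?' by any
    # attribute value the instance does not have.
    results = []
    for i, c in enumerate(g):
        if c == '?':
            for value in domains[i]:
                if value != x[i]:
                    h = list(g)
                    h[i] = value
                    results.append(tuple(h))
    return results

def candidate_elimination(domains, examples):
    n = len(domains)
    G = {('?',) * n}                    # G_0
    S = {(None,) * n}                   # S_0
    for x, label in examples:
        if label:                       # positive example
            G = {g for g in G if satisfies(x, g)}
            for s in [s for s in S if not satisfies(x, s)]:
                S.remove(s)
                S |= {h for h in min_generalisations(s, x)
                      if any(more_general_or_equal(g, h) for g in G)}
            S = {s for s in S if not any(
                 more_general_or_equal(s, s2) and s != s2 for s2 in S)}
        else:                           # negative example
            S = {s for s in S if not satisfies(x, s)}
            for g in [g for g in G if satisfies(x, g)]:
                G.remove(g)
                G |= {h for h in min_specialisations(g, domains, x)
                      if any(more_general_or_equal(h, s) for s in S)}
            G = {g for g in G if not any(
                 more_general_or_equal(g2, g) and g != g2 for g2 in G)}
    return S, G

S, G = candidate_elimination(domains, data)
print(S)   # {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
print(G)   # {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}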
An illustrative example
1st Example (Sunny, Warm, Normal, Strong, Warm, Same), Yes:
✗ S is overly specific: it fails to cover this example
✗ The S boundary is moved to the least more general hypothesis that covers it
✗ No update of the G boundary is needed
2nd Example (Sunny, Warm, High, Strong, Warm, Same), Yes:
✗ S is generalised further, leaving G again unchanged
An illustrative example 2
3rd Example (Rainy, Cold, High, Strong, Warm, Change), No:
✗ It reveals that G is overly general (it incorrectly predicts this example as positive)
  -> G must be specialised
There are 6 attributes that could be specified; why are there only 3 new hypotheses?
✗ For example, h = <?, ?, Normal, ?, ?, ?> is a minimal specialisation that correctly labels the new example as negative, but it is not included. WHY? This hypothesis is inconsistent with the previously encountered positive examples. The algorithm determines this by noting that h is not more general than the current specific boundary S_2.
An illustrative example 3
An illustrative example 4
4th Example (Sunny, Warm, High, Strong, Cool, Change), Yes:
✗ Generalises S
✗ Removes one member of G, because this member fails to cover the new positive example. Why?
✔ It cannot be specialised (specialisation would not make it cover the new example)
✔ It cannot be generalised (by the definition of G, any more general hypothesis would cover at least one negative training example)
✔ Therefore the hypothesis must be dropped from G
An illustrative example 5
S_4 and G_4 delimit the version space of all hypotheses consistent with the set of incrementally observed training examples.
This learned version space is independent of the sequence in which the training examples are presented.
Content
● The learning task
● Concept learning as search
● Find-S: finding a maximally specific hypothesis
● Version Space and the Candidate-Elimination algorithm
➔ Remarks
✗ Does the Candidate-Elimination algorithm converge to a correct hypothesis?
✗ What training examples should the learner request next?
● Inductive Bias
● Summary
Does the Candidate-Elimination algorithm converge to the correct hypothesis?
Convergence requires that:
✗ there are no errors in the training examples
✗ there is some hypothesis in H that correctly describes the target concept
The target concept is exactly learned when the S and G boundary sets converge to a single identical hypothesis.
What will happen if the training data contains errors?
✗ Assume example 2 is incorrectly labeled negative => the correct target concept is removed (every hypothesis inconsistent with the training examples is removed)
✗ Given sufficient additional training data, the learner will eventually detect the inconsistency by noticing that S and G converge to an empty version space
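This failure mode can be reproduced with the Candidate-Elimination sketch: flipping the label of the 2nd EnjoySport example empties both boundary sets (noisy is the sketch's own name):

# Mislabel the 2nd training example as negative and rerun the sketch.
noisy = [(x, label if i != 1 else False) for i, (x, label) in enumerate(data)]
S_bad, G_bad = candidate_elimination(domains, noisy)
print(S_bad, G_bad)   # set() set(): an empty version space signals the error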
What training example should the learner request next?
Before, we assumed that the training examples are provided to the learner by some external teacher.
Definition: Query: the learner is allowed to conduct experiments in which it chooses the next instance, then obtains the correct classification for this instance from an external oracle.
✗ Example EnjoySport: what would be a good general query strategy?
✗ Clearly the learner should choose an instance that would be classified positive by some hypotheses in the version space, but negative by others.
✗ For example, <Sunny, Warm, Normal, Light, Warm, Same> satisfies three of the six hypotheses, so either answer from the oracle shrinks the version space from six hypotheses to half this number.
✗ In general, the optimal query strategy for a concept learner is to generate instances that satisfy exactly half the hypotheses in the current version space.
✗ When this is possible, the correct target concept can be found with ⌈log_2 |VS|⌉ experiments.
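The halving can be checked against the final version space of the example; a sketch using in_version_space() from the representation-theorem sketch and the S, G computed above:

from itertools import product

# Enumerate the non-empty conjunctive hypotheses and keep the version space.
VS = [h for h in product(*[values + ['?'] for values in domains])
      if in_version_space(h, S, G)]
query = ('Sunny', 'Warm', 'Normal', 'Light', 'Warm', 'Same')
print(len(VS))                                  # 6
print(sum(satisfies(query, h) for h in VS))     # 3: either answer halves VS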
Content
● The learning task
● Concept learning as search
● Find-S: finding a maximally specific hypothesis
● Version Space and the Candidate-Elimination algorithm
● Remarks
➔ Inductive Bias
✗ A biased hypothesis space
✗ An unbiased learner
● Summary
Inductive Bias
Questions:
✗ What if the target concept is not contained in the hypothesis space?
✗ Can this difficulty be avoided by using a hypothesis space that includes every possible hypothesis?
✗ How does the size of this hypothesis space influence the ability of the algorithm to generalise to unobserved instances?
✗ How does the size of the hypothesis space influence the number of training examples that must be observed?
A biased hypothesis space
Suppose we wish to assure that the hypothesis space contains the unknown target concept.
Obvious solution: enrich the hypothesis space to include every possible hypothesis.
Example: EnjoySport restricts the hypothesis space to include only conjunctions of attribute values. For the training examples below, the algorithm finds that there are zero hypotheses in the version space:

Example  Sky     AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny   Warm     Normal    Strong  Cool   Change    Yes
2        Cloudy  Warm     Normal    Strong  Cool   Change    Yes
3        Rainy   Warm     Normal    Strong  Cool   Change    No

Why? The most specific hypothesis consistent with the first two examples and representable in the given hypothesis space H is
S_2: <?, Warm, Normal, Strong, Cool, Change>
It is overly general: it incorrectly covers the third (negative) example.
Problem: the bias is that H contains only conjunctive hypotheses.
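Run on these three examples, the Candidate-Elimination sketch reproduces the empty version space (biased_data is the sketch's own name):

# Under the purely conjunctive H, these three examples eliminate everything:
# no conjunction separates {Sunny, Cloudy} days from Rainy days.
biased_data = [
    (('Sunny',  'Warm', 'Normal', 'Strong', 'Cool', 'Change'), True),
    (('Cloudy', 'Warm', 'Normal', 'Strong', 'Cool', 'Change'), True),
    (('Rainy',  'Warm', 'Normal', 'Strong', 'Cool', 'Change'), False),
]
print(candidate_elimination(domains, biased_data))   # (set(), set())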
An Unbiased Learner
Goal: assure that the target concept is in the hypothesis space.
Solution: provide a hypothesis space capable of representing every teachable concept => capable of representing every possible subset of the instances X, i.e. the power set of X.
Example: EnjoySport: the size of the instance space X is 96 => 2^96 ≈ 10^28 distinct target concepts.
Reformulate the example by defining H' corresponding to the power set of X:
✗ Allow arbitrary disjunctions, conjunctions and negations of the earlier hypotheses
✗ One way to define such an H': for example, Sky = Sunny or Sky = Cloudy becomes
<Sunny, ?, ?, ?, ?, ?> ∨ <Cloudy, ?, ?, ?, ?, ?>
An Unbiased Learner 2
The Candidate-Elimination algorithm can be used with H', BUT there is a new problem:
✗ The concept learning algorithm is now completely unable to generalise beyond the observed examples
✗ Why? Suppose we present 3 positive examples (x_1, x_2, x_3) and 2 negative examples (x_4, x_5) to the learner
✗ S will contain the hypothesis that is just the disjunction of the positive examples: S: {x_1 ∨ x_2 ∨ x_3}
✗ G will exclude only the negative examples: G: {¬(x_4 ∨ x_5)}
✗ S and G will always be the disjunction of the positive examples and the negated disjunction of the negative examples => the only examples that are unambiguously classified by S and G are the observed training examples themselves
Summary
● Concept learning can be seen as a problem of searching through a large predefined space of potential hypotheses.
● The general_to_specific partial ordering of hypotheses provides a useful structure for organising the search through the hypothesis space.
● FIND-S performs a specific-to-general search along one branch of the partial ordering to find the most specific hypothesis consistent with the training examples.
● Candidate-Elimination incrementally computes the sets of maximally specific (S) and maximally general (G) hypotheses.
● S and G delimit the entire set of hypotheses consistent with the data and provide a means to identify the target concept. By looking at S and G one can determine whether the learner has converged to the target concept, whether the training data is inconsistent, and which query would be most useful to refine the version space next.
● The version space and the Candidate-Elimination algorithm are not robust to noisy data, and many problems arise if the unknown target concept is not expressible in the provided hypothesis space.