
Page 1

1 Concept Learning and the General-to-Specific Ordering
Gabriella Kókai, Lehrstuhl für Informatik 2: Machine Learning

Page 2

Content
➔ The learning task
  ✗ Notation
  ✗ The inductive learning hypothesis

● Concept learning as search

● Find-S: finding a maximally specific hypothesis

● Version Space and the Candidate-Elimination algorithm

● Remarks

● Inductive Bias

● Summary

Page 3

The learning task

Concept learning: deriving a boolean-valued function from training examples of its inputs and outputs.

Page 4

Notation

Example: Days on which my friend Aldo enjoys his favorite water sport

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes

Representation: a hypothesis consists of constraints on the instance attributes.

Most general hypothesis: every day is a positive example: ⟨?, ?, ?, ?, ?, ?⟩
Most specific hypothesis: no day is a positive example: ⟨∅, ∅, ∅, ∅, ∅, ∅⟩

For each attribute, the constraint is:
  '?': any value is acceptable
  '∅': no value is acceptable
  or a specific required value, e.g. ⟨?, Cold, High, ?, ?, ?⟩

Page 5

Notation 2

Given:
Instances X: the possible days, each described by the attributes
(Sky with possible values Sunny, Cloudy and Rainy; AirTemp with Warm, Cold; ...)
Hypotheses H: each hypothesis is described by a conjunction of constraints on the attributes Sky, AirTemp, Humidity, Wind, Water and Forecast. Each constraint is '?', '∅' or a specific value
Target concept c: EnjoySport : X → {0,1}
Training examples D: positive and negative examples of the target function

Determine:
A hypothesis h in H such that h(x) = c(x) for all x in X

Page 6

Notation 3

Set of training examples: each consists of an instance x from X along with its target concept value c(x), written ⟨x, c(x)⟩
✗ c(x) = 1: positive example, or member of the target concept
✗ c(x) = 0: negative example, or non-member of the target concept

Page 7

The inductive learning hypothesis

The only information available about c is its value over the training examples

Therefore an inductive learning algorithm can at best guarantee that the output hypothesis fits the target concept over the training data

Fundamental assumption: the best hypothesis regarding unseen instances is the hypothesis that best fits the observed data

The Inductive Learning Hypothesis: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.

Page 8

Content

The learning task
➔ Concept learning as search
  ✗ General-to-Specific Ordering of Hypotheses
Find-S: finding a maximally specific hypothesis
Version Space and the Candidate-Elimination algorithm
Remarks
Inductive Bias
Summary

Page 9

Concept learning as search

Note: by selecting a hypothesis representation, the designer of the learning algorithm implicitly defines the space of all hypotheses that the program can ever represent, and therefore can ever learn.

Homework: verify the following counts for EnjoySport:
3·2·2·2·2·2 = 96 distinct instances
5·4·4·4·4·4 = 5120 syntactically distinct hypotheses (each attribute may also be '?' or '∅')
1 + 4·3·3·3·3·3 = 973 semantically distinct hypotheses (every hypothesis containing one or more '∅' constraints classifies every instance as negative, so they all denote the same empty concept)
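A minimal sketch in Python to check these counts; the attribute domain sizes are an assumption read off the table on Page 4:

    # Checking the three counts (assumed domain sizes from the table on Page 4).
    domain_sizes = [3, 2, 2, 2, 2, 2]   # Sky, AirTemp, Humidity, Wind, Water, Forecast

    instances = syntactic = semantic = 1
    for n in domain_sizes:
        instances *= n          # every combination of attribute values
        syntactic *= n + 2      # each constraint may also be '?' or the empty constraint
        semantic *= n + 1       # a specific value or '?' ...
    semantic += 1               # ... plus the one hypothesis that is everywhere empty

    print(instances, syntactic, semantic)   # 96 5120 973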

Page 10

General-to-Specific Ordering of Hypotheses

Example general-to-specific: the second hypothesis is less constrained -> it classifies more instances as positive:

h1 = ⟨Sunny, ?, ?, Strong, ?, ?⟩
h2 = ⟨Sunny, ?, ?, ?, ?, ?⟩

In detail: for any instance x in X and hypothesis h in H, we say that x satisfies h if and only if h(x) = 1.

Definition: Let h_j and h_k be boolean-valued functions defined over X. Then h_j is more_general_than_or_equal_to h_k (written h_j ≥_g h_k) if and only if

(∀x ∈ X) [(h_k(x) = 1) → (h_j(x) = 1)]

Definition: Let h_j and h_k be boolean-valued functions defined over X. Then h_j is (strictly) more_general_than h_k (written h_j >_g h_k) if and only if (h_j ≥_g h_k) ∧ ¬(h_k ≥_g h_j).
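For conjunctive hypotheses like these, the ≥_g test can be decided attribute by attribute instead of checking every x in X. A minimal sketch in Python (encoding '?' as the string '?' and the empty constraint ∅ as '0' is an assumption used throughout these sketches):

    # A hypothesis is a tuple of constraints, e.g. ('Sunny', '?', '?', 'Strong', '?', '?').
    # '?' accepts any value; '0' stands for the empty constraint and accepts none.

    def more_general_or_equal(hj, hk):
        """hj >=_g hk: hj covers every instance that hk covers."""
        if '0' in hk:                       # hk covers no instance at all: vacuously true
            return True
        return all(a == '?' or a == b for a, b in zip(hj, hk))

    def more_general(hj, hk):
        """hj >_g hk: strictly more general."""
        return more_general_or_equal(hj, hk) and not more_general_or_equal(hk, hj)

    h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
    h2 = ('Sunny', '?', '?', '?', '?', '?')
    print(more_general(h2, h1))   # True: h2 classifies more instances as positive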

Page 11

General-to-Specific Ordering of Hypotheses 2

The relations ≥_g and >_g are defined independently of the target concept.
The relation ≥_g (more_general_than_or_equal_to) defines a partial order over the hypothesis space H.
Informally: there may be pairs of hypotheses h1 and h3 such that neither h1 ≥_g h3 nor h3 ≥_g h1 holds.

Page 12

Example

Each hypothesis corresponds to some subset of X (the subset of instances that it classifies as positive)

The arrows represent the more_general_than relation, pointing toward the less general hypothesis

Page 13

Find-S: finding a maximally specific hypothesis

Use the more_general_than partial ordering:
✗ Begin with the most specific possible hypothesis in H
✗ Generalise this hypothesis each time it fails to cover an observed positive example

1. Initialise h to the most specific hypothesis in H
2. For each positive training instance x:
     For each attribute constraint a_i in h:
       If the constraint a_i is satisfied by x, then do nothing
       Else replace a_i in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
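A runnable sketch of this loop for conjunctive hypotheses, using the same tuple encoding as the sketch on Page 10 ('0' for the empty constraint is an assumed encoding, not the slides' notation); the training data D mirror the table on Page 4:

    # FIND-S for conjunctive hypotheses over discrete attributes.
    def find_s(examples):
        """examples: list of (instance_tuple, label) pairs; label True = positive."""
        n = len(examples[0][0])
        h = ['0'] * n                       # most specific hypothesis: all empty constraints
        for x, positive in examples:
            if not positive:
                continue                    # FIND-S simply ignores negative examples
            for i, value in enumerate(x):
                if h[i] == '0':
                    h[i] = value            # first positive example: adopt its values
                elif h[i] != value:
                    h[i] = '?'              # conflicting values: generalise to '?'
        return tuple(h)

    D = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),  True),
         (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),  True),
         (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), False),
         (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), True)]
    print(find_s(D))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')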

Page 14

Find-S: finding a maximally specific hypothesis (example)

1. Step: h ← ⟨∅, ∅, ∅, ∅, ∅, ∅⟩
2. Step (1st positive example): h ← ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩
3. Step: substitute a '?' in place of any attribute value in h that is not satisfied by the new example: h ← ⟨Sunny, Warm, ?, Strong, Warm, Same⟩
   (The 3rd example is negative: the FIND-S algorithm simply ignores every negative example)
4. Step: h ← ⟨Sunny, Warm, ?, Strong, ?, ?⟩

Page 15

Remarks

In the general case, as long as we assume that the hypothesis space H contains a hypothesis that describes the true target concept c, and that the training data contain no errors, the current hypothesis h can never require a revision in response to a negative example.

Why?
✗ The current hypothesis is the most specific hypothesis consistent with the observed positive examples
✗ The target concept c must therefore be more_general_than_or_equal_to h: c ≥_g h
✗ But the target concept c will never cover a negative example, and thus neither will h
✗ Therefore no revision to h will be required in response to any negative example

In the literature there are many different algorithms that use the same more_general_than partial ordering.

Page 16

Content

The learning task
Concept learning as search
Find-S: finding a maximally specific hypothesis
➔ Version Space and the Candidate-Elimination Algorithm
  ➔ Key idea: output a description of the set of all hypotheses consistent with the training examples
  ✗ Representation
  ✗ The List-Then-Eliminate algorithm
  ✗ A more compact representation of the version space
  ✗ Candidate-Elimination learning algorithm
  ✗ Example
Remarks
Inductive Bias
Summary

Page 17

Representation

Definition: A hypothesis h is consistent with the set of training examples D if and only if h(x) = c(x) for each example ⟨x, c(x)⟩ in D:

Consistent(h, D) ≡ (∀⟨x, c(x)⟩ ∈ D) h(x) = c(x)

Difference between consistent and satisfies:
✗ x satisfies h ⟺ h(x) = 1, no matter whether x is a positive or negative example of the target concept
✗ h is consistent with ⟨x, c(x)⟩ ⟺ h(x) = c(x): the hypothesis h has to classify the example correctly with respect to the target concept

Definition: The version space, denoted VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D:

VS_{H,D} ≡ {h ∈ H | Consistent(h, D)}
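Both definitions translate directly into code; a small sketch under the same assumed encoding as the earlier sketches (covers() spells out what "x satisfies h" means for conjunctive hypotheses):

    def covers(h, x):
        """x satisfies h, i.e. h(x) = 1: every constraint is '?' or equals x's
        attribute value (the empty constraint '0' never matches)."""
        return all(a == '?' or a == xi for a, xi in zip(h, x))

    def consistent(h, D):
        """Consistent(h, D): h classifies every training example correctly."""
        return all(covers(h, x) == label for x, label in D)

    def version_space(H, D):
        """VS_{H,D}: the subset of H consistent with D."""
        return [h for h in H if consistent(h, D)]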

Page 18

The List-Then-Eliminate algorithm

Simplest way to represent the version space: list all of its elements.

The List-Then-Eliminate Algorithm:
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example ⟨x, c(x)⟩: remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace

This algorithm cannot be applied whenever the hypothesis space H is infinite.
Advantage: it is guaranteed to output all hypotheses consistent with the training data.
Disadvantage: it is not efficiently computable (it enumerates all hypotheses in H).
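For EnjoySport the hypothesis space is finite (973 semantically distinct hypotheses, Page 9), so List-Then-Eliminate is actually feasible. A sketch; the attribute domains below are an assumption (only Sky's three values are fully listed on Page 5, the remaining two-valued domains are read off the examples and the query on Page 32):

    from itertools import product

    # Assumed attribute domains for EnjoySport.
    domains = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
               ('Strong', 'Light'), ('Warm', 'Cool'), ('Same', 'Change')]

    # Every semantically distinct hypothesis: each attribute is a specific value or '?',
    # plus the single all-empty hypothesis that classifies everything as negative.
    H = list(product(*[d + ('?',) for d in domains])) + [('0',) * 6]
    print(len(H))                  # 973

    VS = version_space(H, D)       # version_space() from Page 17, D from Page 13
    print(len(VS))                 # 6 hypotheses survive the four training examples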

Page 19

A more compact representation of the version space

Representation: the version space is represented by its most general and least general members. These members form the general and specific boundary sets, which delimit the version space within the partially ordered hypothesis space.

Page 20

Example of the List-Then-Eliminate Algorithm

The result (arrows indicate the more_general_than relation); the most specific surviving hypothesis is ⟨Sunny, Warm, ?, Strong, ?, ?⟩.

Candidate-Elimination represents the version space by storing only its most general members (labeled G) and its most specific ones (labeled S). With these two sets it is possible to enumerate all members of the version space when needed. The hypothesis we are looking for lies between these two sets in the general-to-specific partial ordering over hypotheses.

Page 21

Definition of the Boundary Sets

Definition: The general boundary G, with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D:

G ≡ {g ∈ H | Consistent(g, D) ∧ ¬(∃g' ∈ H)[(g' >_g g) ∧ Consistent(g', D)]}

Definition: The specific boundary S, with respect to hypothesis space H and training data D, is the set of minimally general (i.e. maximally specific) members of H consistent with D:

S ≡ {s ∈ H | Consistent(s, D) ∧ ¬(∃s' ∈ H)[(s >_g s') ∧ Consistent(s', D)]}
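With a finite version space, both boundary sets can be read off directly from these definitions; a sketch reusing more_general() from Page 10 and the VS list from Page 18:

    def boundaries(VS):
        """Read S and G off an explicit version space: its minimal and maximal members."""
        S = [h for h in VS if not any(more_general(h, h2) for h2 in VS)]
        G = [h for h in VS if not any(more_general(h2, h) for h2 in VS)]
        return S, G

    S, G = boundaries(VS)          # VS from the Page 18 sketch (list order may vary)
    print(S)                       # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
    print(G)                       # [('Sunny', '?', '?', '?', '?', '?'),
                                   #  ('?', 'Warm', '?', '?', '?', '?')]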

Page 22

Definition of the Boundary Sets (2)

Theorem (version space representation theorem): Let X be an arbitrary set of instances and let H be a set of boolean-valued hypotheses defined over X. Let c : X → {0,1} be an arbitrary target concept defined over X, and let D be an arbitrary set of training examples {⟨x, c(x)⟩}. For all X, H, c and D such that S and G are well defined:

VS_{H,D} = {h ∈ H | (∃s ∈ S)(∃g ∈ G)(g ≥_g h ≥_g s)}

Page 23

Candidate-Elimination Learning algorithm

The algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples.

Begin with the version space equal to the set of all hypotheses in H:
✗ G_0: the most general hypothesis in H: G_0 ← {⟨?, ?, ?, ?, ?, ?⟩}
✗ S_0: the most specific hypothesis in H: S_0 ← {⟨∅, ∅, ∅, ∅, ∅, ∅⟩}
Together they delimit the entire hypothesis space.

As each training example is considered, the S and G boundary sets are generalised and specialised, respectively, to eliminate from the version space any hypothesis found inconsistent with the new training example.

After all examples have been processed, the computed version space contains all the hypotheses consistent with these examples, and only these hypotheses.

Page 24

Candidate-Elimination Learning algorithm 2

1. Initialise G to the set of maximally general hypotheses in H
2. Initialise S to the set of maximally specific hypotheses in H
3. For each training example d, do:
   If d is a positive example:
     Remove from G any hypothesis inconsistent with d
     For each hypothesis s in S that is not consistent with d:
       Remove s from S
       Add to S all minimal generalisations h of s such that h is consistent with d and some member of G is more general than h
     Remove from S any hypothesis that is more general than another hypothesis in S
   If d is a negative example:
     Remove from S any hypothesis inconsistent with d
     For each hypothesis g in G that is not consistent with d:
       Remove g from G
       Add to G all minimal specialisations h of g such that h is consistent with d and some member of S is more specific than h
     Remove from G any hypothesis that is less general than another hypothesis in G
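A compact, runnable sketch of this loop for the conjunctive hypothesis language, reusing covers() from Page 17, more_general()/more_general_or_equal() from Page 10, and D and domains from the earlier sketches (all assumed encodings, not the slides' notation). For this language the minimal generalisation of s is unique, and each minimal specialisation of g replaces one '?' by a specific attribute value:

    def min_generalization(s, x):
        """The unique minimal generalisation of s that covers instance x."""
        return tuple(xi if a == '0' else (a if a == xi else '?')
                     for a, xi in zip(s, x))

    def min_specializations(g, domains, x):
        """All minimal specialisations of g that exclude instance x."""
        return [g[:i] + (v,) + g[i + 1:]
                for i, a in enumerate(g) if a == '?'
                for v in domains[i] if v != x[i]]

    def candidate_elimination(examples, domains):
        G = [('?',) * len(domains)]
        S = [('0',) * len(domains)]
        for x, positive in examples:
            if positive:
                G = [g for g in G if covers(g, x)]        # drop inconsistent g
                S = [s if covers(s, x) else min_generalization(s, x) for s in S]
                S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
                S = [s for s in S                          # keep only minimal members
                     if not any(more_general(s, s2) for s2 in S)]
            else:
                S = [s for s in S if not covers(s, x)]    # drop inconsistent s
                G = [g for g in G if not covers(g, x)] + \
                    [h for g in G if covers(g, x)
                       for h in min_specializations(g, domains, x)
                       if any(more_general_or_equal(h, s) for s in S)]
                G = [g for g in G                          # keep only maximal members
                     if not any(more_general(g2, g) for g2 in G)]
        return S, G

    S, G = candidate_elimination(D, domains)   # D from Page 13, domains from Page 18
    print(S)   # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
    print(G)   # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]

Running it on the four EnjoySport examples reproduces the S_4 and G_4 boundaries derived on the following pages.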

Page 25

An illustrative example

1st Example (⟨Sunny, Warm, Normal, Strong, Warm, Same⟩, Yes):
✗ S is overly specific: it fails to cover this example
✗ The S boundary is moved to the least more general hypothesis that covers it
✗ No update of the G boundary is needed

2nd Example (⟨Sunny, Warm, High, Strong, Warm, Same⟩, Yes):
✗ S is generalised further, leaving G again unchanged

Page 26

An illustrative example 2

3rd Example (⟨Rainy, Cold, High, Strong, Warm, Change⟩, No):
✗ It reveals that G is overly general (it incorrectly predicts this example as positive) -> G must be specialised

There are 6 attributes that could be specified; why are there only 3 new hypotheses?
✗ For example, h = ⟨?, ?, Normal, ?, ?, ?⟩ is a minimal specialisation of G_2 that correctly labels the new example, but it is not included in G_3.
✗ Why? This hypothesis is inconsistent with the previously encountered positive examples. The algorithm determines this by noting that h is not more general than the current specific boundary S_2.

Page 27

An illustrative example 3

Page 28

An illustrative example 4

4th Example (⟨Sunny, Warm, High, Strong, Cool, Change⟩, Yes):
✗ S is generalised further
✗ One member of G is removed, because it fails to cover the new positive example. Why?
  ✔ It cannot be specialised (that would still not make it cover the new example)
  ✔ It cannot be generalised (by the definition of G, any more general hypothesis would cover at least one negative training example)
  ✔ Therefore the hypothesis must be dropped from G

Page 29

An illustrative example 5

S_4 and G_4 delimit the version space of all hypotheses consistent with the incrementally observed training examples.

This learned version space is independent of the sequence in which the training examples are presented.

Page 30

Content

The learning task
Concept learning as search
Find-S: finding a maximally specific hypothesis
Version Space and the Candidate-Elimination algorithm
➔ Remarks
  ✗ Does the Candidate-Elimination algorithm converge to a correct hypothesis?
  ✗ What training examples should the learner request next?
Inductive Bias
Summary

Page 31

Does the Candidate-Elimination algorithm converge to the correct hypothesis?

Convergence requires that:
✗ there are no errors in the training examples
✗ there is some hypothesis in H that correctly describes the target concept

The target concept is exactly learned when the S and G boundary sets converge to a single identical hypothesis.

What will happen if the training data contains errors?
✗ Assume example 2 is incorrectly presented as negative => the correct target concept is removed from the version space (every hypothesis inconsistent with the training examples is removed)
✗ Given sufficient additional training data, the learner will eventually detect the inconsistency by noticing that S and G converge to an empty version space

Page 32

What training example should the learner request next?

So far we have assumed that the training examples are provided to the learner by some external teacher.

Definition: Query: the learner is allowed to conduct experiments in which it chooses the next instance, then obtains the correct classification for this instance from an external oracle.

✗ Example EnjoySport: what would be a good general query strategy?
✗ Clearly the learner should choose an instance that would be classified positive by some hypotheses but negative by others, for example ⟨Sunny, Warm, Normal, Light, Warm, Same⟩, which satisfies exactly three of the six hypotheses in the current version space
✗ Querying it therefore shrinks the version space from six hypotheses to half this number
✗ In general the optimal query strategy for a concept learner is to generate instances that satisfy exactly half the hypotheses in the current version space
✗ When this is possible, the correct target concept can be found with only ⌈log₂ |VS|⌉ experiments
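A sketch of this query strategy under the earlier assumed encoding: enumerate all instances and pick one whose positive/negative split of the current version space is closest to half (covers() from Page 17, VS and domains from Page 18):

    from itertools import product

    def best_query(VS, domains):
        """Choose the instance whose classification splits VS as evenly as possible."""
        def imbalance(x):
            positives = sum(covers(h, x) for h in VS)
            return abs(positives - len(VS) / 2)
        return min(product(*domains), key=imbalance)   # ties broken by enumeration order

    print(best_query(VS, domains))
    # ('Sunny', 'Warm', 'Normal', 'Light', 'Warm', 'Same') -- satisfied by 3 of the 6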

Page 33

Content

The learning task
Concept learning as search
Find-S: finding a maximally specific hypothesis
Version Space and the Candidate-Elimination algorithm
Remarks
➔ Inductive Bias
  ✗ A biased hypothesis space
  ✗ An unbiased learner
Summary

Page 34

Inductive Bias

Questions:
✗ What if the target concept is not contained in the hypothesis space?
✗ Can this difficulty be avoided by using a hypothesis space that includes every possible hypothesis?
✗ How does the size of this hypothesis space influence the ability of the algorithm to generalise to unobserved instances?
✗ How does the size of the hypothesis space influence the number of training examples that must be observed?

Page 35

A biased hypothesis space

Suppose we wish to assure that the hypothesis space contains the unknown target concept.
Obvious solution: enrich the hypothesis space to include every possible hypothesis.
Example: EnjoySport restricts the hypothesis space to include only conjunctions of attribute values.
In fact, for the following training examples the algorithm would find that there are zero hypotheses in the version space:

Example  Sky     AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny   Warm     Normal    Strong  Cool   Change    Yes
2        Cloudy  Warm     Normal    Strong  Cool   Change    Yes
3        Rainy   Warm     Normal    Strong  Cool   Change    No

Why? The most specific hypothesis consistent with the first two examples and representable in the given hypothesis space H is

S_2 = ⟨?, Warm, Normal, Strong, Cool, Change⟩

It is overly general: it incorrectly covers the third (negative) example.
Problem: the bias of admitting only conjunctive hypotheses.

Page 36

An Unbiased Learner

Goal: ensure that the target concept is in the hypothesis space.
Solution: provide a hypothesis space capable of representing every teachable concept => capable of representing every possible subset of the instances X, i.e. the power set of X.

Example EnjoySport: the size of the instance space X is 96 => there are 2^96 ≈ 10^28 distinct target concepts.

Reformulate the example by defining H' corresponding to the power set of X:
✗ allow arbitrary disjunctions, conjunctions and negations of the earlier hypotheses
✗ one way to define such an H': for example "Sky = Sunny or Sky = Cloudy" becomes ⟨Sunny, ?, ?, ?, ?, ?⟩ ∨ ⟨Cloudy, ?, ?, ?, ?, ?⟩

Page 37

An Unbiased Learner 2

The Candidate-Elimination algorithm can still be used, BUT there is a new problem:
✗ The concept learning algorithm is now completely unable to generalize beyond the observed examples
✗ Why? Suppose we present 3 positive examples (x1, x2, x3) and 2 negative examples (x4, x5) to the learner
✗ S will contain the hypothesis that is just the disjunction of the positive examples: S: {x1 ∨ x2 ∨ x3}
✗ G will merely exclude the negative examples: G: {¬(x4 ∨ x5)}
✗ S and G will always be the disjunction of the positive examples and the negated disjunction of the negative examples => the only examples that can be unambiguously classified by S and G are the observed training examples themselves

Page 38

Summary

Concept learning can be seen as a problem of searching through a large predefined space of potential hypotheses.

The general-to-specific partial ordering of hypotheses provides a useful structure for organizing the search through the hypothesis space.

FIND-S performs a specific-to-general search along one branch of the partial ordering to find the most specific hypothesis consistent with the training examples.

Candidate-Elimination incrementally computes the sets of maximally specific (S) and maximally general (G) hypotheses.

S and G delimit the entire set of hypotheses consistent with the data and provide a means to identify the target concept. By looking at S and G one can determine whether the learner has converged to the target concept, whether the training data are inconsistent, and which query would be most useful next to refine the version space.

Version spaces and the Candidate-Elimination algorithm are not robust to noisy data, and many problems arise if the unknown target concept is not expressible in the provided hypothesis space.