
Upload: fathia

Post on 05-Jan-2016




Page 1: Problems with Learning

Problems with Learning

- Concept spaces are very large
- Training sets represent a very small percentage of instances
- Generalization is not (in general) truth preserving
- The same training set may allow for different generalizations
- Heuristics may be necessary to guide search and to constrain the space

Page 2: Problems with Learning

Inductive Bias

Inductive bias is a way to constrain choice. This could include:

- Heuristic constraints on the search space
- Heuristics to guide search
- Bias towards simplicity
- Syntactic constraints on the representation of learned concepts

Page 3: Problems with Learning

Representational Biases

- Conjunctive biases: only allow conjuncts
- Limitations on the number of disjuncts
- Feature vectors: specify the allowed features and the range of values
- Decision trees
- Horn clauses

Page 4: Problems with Learning

Theory of Learnability

Goals: Restrict the set of target concepts so that we can search the space efficiently and still find high-quality concepts. High quality is indicative of the effectiveness in classifying objects.

Efficiency and correctness may depend not just upon the learning algorithm but also upon the language for expressing concepts, which, in turn, defines the search space.

Page 5: Problems with Learning

Example

- Given 1000 balls of various types, the concept of 'ball' would probably be learnable.
- Given 1000 random objects, it would be difficult to find an appropriate generalization.
- This difference is independent of the learning algorithm.

Page 6: Problems with Learning

PAC Learnability (Valiant)

A class of concepts is PAC learnable if there is an algorithm that executes efficiently and has a high probability of finding an approximately correct concept. Let C be a set of concepts and X be a set of instances, with n = |X|. C is PAC learnable if, for a concept error probability ɛ and a failure probability δ, there is an algorithm which, trained on X, produces a concept c of C such that the probability that c has a generalization error > ɛ is less than δ.

Page 7: Problems with Learning

PAC Learnability (cont'd)

That is, for y drawn from the same distribution from which the samples in X were drawn:

P[ P[y is misclassified by c] > ɛ ] ≤ δ.

The running time of the algorithm must be polynomial in terms of n = |X|, 1/ɛ, and 1/δ.
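A standard result for finite concept classes (not stated on these slides, so take this as an illustrative sketch) makes the definition concrete: a learner that outputs any concept consistent with m ≥ (1/ɛ)(ln|C| + ln(1/δ)) training examples is probably approximately correct. The function name and the example numbers below are my own:

```python
import math

def pac_sample_bound(concept_class_size, epsilon, delta):
    """Number of training examples sufficient for a consistent learner
    over a finite concept class to be probably (prob. >= 1 - delta)
    approximately (generalization error <= epsilon) correct."""
    return math.ceil(
        (math.log(concept_class_size) + math.log(1.0 / delta)) / epsilon
    )

# Conjunctions over 10 boolean features: each feature can appear
# positive, negated, or not at all, so |C| = 3^10.
print(pac_sample_bound(3 ** 10, epsilon=0.1, delta=0.05))  # → 140
```

The bound grows only logarithmically in |C| but linearly in 1/ɛ, which is one reason restricting the set of target concepts (inductive bias) makes learning tractable.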

Page 8: Problems with Learning

Prior Knowledge

Some learning algorithms use prior domain knowledge. This is not unusual, as people are believed to learn more efficiently if they can relate new knowledge to old. In Explanation-Based Learning, a domain theory is used to explain an example. Generalization is then based on the explanation rather than the example itself.

Page 9: Problems with Learning

Explanation-Based Learning

There are four components:

- A target concept: this is the goal
- A training example (positive)
- A domain theory: a set of rules and facts that explain how the training example is an example of the target
- Operationality criteria: restrictions on the form of the concepts developed (inductive bias)

Page 10: Problems with Learning

EBL Example

target concept: premise(X) -> cup(X), where premise is a conjunctive expression containing X.

domain theory:

liftable(X) ^ holds_liquid(X) -> cup(X)
part(Z,W) ^ concave(W) ^ points_up(W) -> holds_liquid(Z)
light(Y) ^ part(Y,handle) -> liftable(Y)
small(A) -> light(A)
made_of(A, feathers) -> light(A)

Page 11: Problems with Learning

Example (cont'd)

training example: cup(obj1), small(obj1), part(obj1, handle), owns(bob, obj1), part(obj1, bottom), part(obj1, bowl), points_up(bowl), concave(bowl), color(obj1, red)

operationality criteria: target concepts must be defined in terms of observable, structural properties of objects.
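The explanation step can be sketched as a small forward chainer over the domain theory above. The encoding and helper functions below are my own; the facts and rules are the slide's, and deriving cup(obj1) from the observable facts is the explanation that EBL would then generalize:

```python
# A minimal forward chainer. Facts and rule premises are tuples of the
# form (predicate, arg, ...); capitalized strings are variables, as in
# the slide's rules.

FACTS = {
    ("small", "obj1"), ("part", "obj1", "handle"),
    ("part", "obj1", "bottom"), ("part", "obj1", "bowl"),
    ("points_up", "bowl"), ("concave", "bowl"),
    ("owns", "bob", "obj1"), ("color", "obj1", "red"),
}

RULES = [
    ([("liftable", "X"), ("holds_liquid", "X")], ("cup", "X")),
    ([("part", "Z", "W"), ("concave", "W"), ("points_up", "W")],
     ("holds_liquid", "Z")),
    ([("light", "Y"), ("part", "Y", "handle")], ("liftable", "Y")),
    ([("small", "A")], ("light", "A")),
    ([("made_of", "A", "feathers")], ("light", "A")),
]

def match(premise, fact, env):
    """Unify one premise against one ground fact, extending env."""
    if len(premise) != len(fact) or premise[0] != fact[0]:
        return None
    env = dict(env)
    for p, f in zip(premise[1:], fact[1:]):
        if p[0].isupper():              # variable: bind or check binding
            if env.setdefault(p, f) != f:
                return None
        elif p != f:                    # constant mismatch
            return None
    return env

def satisfy(premises, env, facts):
    """Yield every variable binding that satisfies all premises."""
    if not premises:
        yield env
        return
    for fact in facts:
        env2 = match(premises[0], fact, env)
        if env2 is not None:
            yield from satisfy(premises[1:], env2, facts)

def forward_chain(facts, rules):
    """Apply the rules to a fixpoint; return the deductive closure."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # materialize bindings before mutating the fact set
            for env in list(satisfy(premises, {}, facts)):
                derived = tuple(env.get(t, t) for t in conclusion)
                if derived not in facts:
                    facts.add(derived)
                    changed = True
    return facts

print(("cup", "obj1") in forward_chain(FACTS, RULES))  # → True
```

Note that the irrelevant facts (owns, color, bottom) never enter the derivation, which is exactly what lets EBL ignore them when generalizing.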

Page 12: Problems with Learning

Explanation

Page 13: Problems with Learning

Generalization

Page 14: Problems with Learning

Advantages of EBL

- Ignores irrelevant information
- Generalizations are relevant because they are consistent with the domain theory
- Can learn from a single training example
- Allows one to hypothesize unstated relationships between its goals and its experience

Page 15: Problems with Learning

Limitations of EBL

- Can only learn rules that are within the deductive closure of its domain theory
- Such rules could be deduced without the need for training examples
- EBL can thus be seen as a way to speed up learning
- However, a complete domain theory is not required

Page 16: Problems with Learning

Reasoning by Analogy

- If two situations are similar in certain respects, we can construct a mapping from one to the other and then use that mapping to reason from the first situation to the second
- Must be able to identify key features in both and ignore extraneous features
- Selection of the source situation is critical

Page 17: Problems with Learning

Analogy (cont'd)

Necessary steps:

- Retrieve a potential source case
- Elaboration: derive additional features and relationships in the source case
- Mapping: map the source attributes to the target
- Justification: determine that the mapping is valid
- Learning: apply what you know from the source case to the target; store the knowledge for the future

Page 18: Problems with Learning

Uses of Analogy

- Case-based reasoning: law, medicine
- Mathematical theorem proving
- Physical models
- Games
- Diagnoses

Page 19: Problems with Learning

Unsupervised Learning

The system forms and evaluates concepts on its own.

- Automated discovery
- Conceptual clustering

Page 20: Problems with Learning

AM (Lenat)

AM (Automated Mathematician) was a system for automatically generating "interesting" concepts in mathematics, primarily number theory. The system began with a set of basic concepts (such as a bag, or multi-set) and operators, and then used generalization, specialization, and inversion of operators to define new concepts. AM could generate instances of the concepts and test them. A frequently occurring concept is deemed interesting.

Page 21: Problems with Learning

AM (cont'd)

Heuristics were used to guide the search. Concepts were represented as small pieces of LISP code which could be mutated. The compact representation was a key to the power of the program to discover new concepts.

Page 22: Problems with Learning

AM Discoveries

- Numbers
- Even
- Odd
- Factors
- Primes
- Goldbach's Conjecture
- Fundamental Theorem of Arithmetic

Page 23: Problems with Learning

Conceptual Clustering

The clustering problem is to take a collection of objects and group them together in a meaningful way. There is some measurable standard of quality which is used to maximize the similarity of objects in the same group (cluster).

Page 24: Problems with Learning

Clustering Algorithm

A simple clustering algorithm is:

- Choose the pair of objects with the highest degree of similarity; make them a cluster.
- Define the features of the cluster as the average of the features of its members, and replace the members by the cluster.
- Repeat until a single cluster is formed.
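The three steps above can be sketched directly. This is a minimal sketch under my own assumptions: similarity is taken as (negated) squared Euclidean distance, which the slide leaves open, and all names are illustrative:

```python
def agglomerate(points):
    """Bottom-up clustering: repeatedly merge the most similar pair of
    clusters into their feature-wise average until one cluster remains.
    Returns the final cluster and the list of merges performed."""
    clusters = [tuple(float(x) for x in p) for p in points]
    merges = []
    while len(clusters) > 1:
        # most similar pair = smallest squared Euclidean distance
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum((a - b) ** 2 for a, b in zip(clusters[i], clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # the cluster's features are the average of its members' features
        merged = tuple((a + b) / 2 for a, b in zip(clusters[i], clusters[j]))
        merges.append((clusters[i], clusters[j], merged))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0], merges

root, history = agglomerate([(0, 0), (0, 1), (5, 5)])
print(root)  # → (2.5, 2.75)
```

Note that, as the slide specifies, the merged cluster's features replace those of its members, so a later merge averages against the cluster's features rather than recomputing over all original points.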

Page 25: Problems with Learning

Clustering (cont'd)

Often there is a measure of closeness between objects, or a list of features that can be compared. Weights may be different for different features.

Traditional clustering algorithms don't produce meaningful semantic explanations. Clusters are represented extensionally (by listing their members) and not intensionally (by providing criteria for membership).

Page 26: Problems with Learning

CLUSTER/2

1. Select k seeds from the set.

2. For each seed, use that seed as a positive example and the other seeds as negative examples, and produce a maximally general definition.

3. Classify all the non-seed objects using the definitions produced from the seeds. Find a specific description for each category.

Page 27: Problems with Learning

CLUSTER/2 (cont'd)

4. Adjust for overlapping definitions.

5. Using a distance metric, select the element closest to the center of each category.

6. Repeat steps 1-5 using these new elements as seeds. Stop when satisfactory.

7. If there is no improvement after several iterations, try seeds near the edges of the clusters.
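The seed/re-seed loop of steps 1, 3, 5, and 6 can be sketched numerically. This is a greatly simplified sketch of my own: steps 2 and 4, which build symbolic general/specific concept descriptions, are replaced here by plain nearest-seed assignment, and all names are illustrative:

```python
import random

def euclid(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cluster2_sketch(objects, k, distance=euclid, rounds=10, seed=0):
    """Step 1: pick k seeds. Step 3: categorize every object by its
    nearest seed. Step 5: re-seed with the member closest to each
    category's center. Step 6: repeat, stopping when seeds are stable."""
    rng = random.Random(seed)
    seeds = rng.sample(list(objects), k)
    categories = {}
    for _ in range(rounds):
        categories = {i: [] for i in range(len(seeds))}
        for obj in objects:                      # step 3: categorize
            nearest = min(categories, key=lambda i: distance(obj, seeds[i]))
            categories[nearest].append(obj)
        new_seeds = []
        for members in categories.values():      # step 5: re-seed
            if members:
                center = tuple(sum(c) / len(members) for c in zip(*members))
                new_seeds.append(min(members, key=lambda o: distance(o, center)))
        if new_seeds == seeds:                   # step 6: stable, stop
            break
        seeds = new_seeds
    return seeds, categories

seeds, cats = cluster2_sketch([(0, 0), (0, 1), (9, 9), (9, 8)], k=2)
```

On this toy input the loop settles on one representative per corner. What distinguishes the real CLUSTER/2 is that its categories carry intensional concept descriptions, not just the extensional member lists produced here.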

Page 28: Problems with Learning

Reinforcement Learning

The idea is to interact with the environment and gain feedback (possibly both positive and negative) to adjust behavior. There is a trade-off between exploiting what you already know and what you might gain by further exploration.

Key elements:

- policy
- reward
- value mapping
- model
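The exploration/exploitation trade-off can be illustrated with an ɛ-greedy multi-armed bandit, a minimal sketch under my own assumptions (the bandit setup and all names are illustrative, not from the slides). The policy is ɛ-greedy, the reward is noisy feedback from the environment, and the value mapping is the running average reward per action; no model of the environment is kept:

```python
import random

def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """k-armed bandit with an epsilon-greedy policy: with probability
    epsilon take a random action (explore), otherwise take the action
    with the highest estimated value (exploit)."""
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k   # value mapping: action -> estimated reward
    counts = [0] * k
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                            # explore
        else:
            arm = max(range(k), key=lambda a: estimates[a])   # exploit
        reward = true_means[arm] + rng.gauss(0, 1)            # noisy feedback
        counts[arm] += 1
        # incremental average update of the value estimate
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = epsilon_greedy_bandit([0.1, 0.5, 0.9])
```

With ɛ = 0 the agent never explores and can lock onto a poor arm; with ɛ = 1 it never exploits what it has learned, which is the trade-off the slide describes.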
