Problems with Learning
- Concept spaces are very large
- Training sets represent a very small percentage of instances
- Generalization is not (in general) truth preserving
- The same training set may allow for different generalizations
- Heuristics may be necessary to guide search and to constrain the space
Inductive Bias
Inductive bias is a way to constrain choice. This could include:
- Heuristic constraints on the search space
- Heuristics to guide search
- Bias towards simplicity
- Syntactic constraints on the representation of learned concepts
Representational Biases
- Conjunctive biases: Only allow conjuncts
- Limitations on the number of disjuncts
- Feature vectors: Specify the allowed features and the range of values (see the sketch after this list)
- Decision trees
- Horn clauses
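To make the conjunctive bias over feature vectors concrete, here is a minimal sketch (not from the original slides) of a Find-S-style learner: hypotheses are restricted to conjunctions of feature = value tests, and each positive example generalizes the hypothesis only as much as necessary. Feature names and data are illustrative.

```python
# Minimal sketch of a conjunctive (Find-S-style) learner over feature vectors.
# The representational bias: hypotheses are conjunctions of feature = value
# tests; "?" means "any value is acceptable" for that feature.

def learn_conjunctive(positives):
    """Return the most specific conjunction covering all positive examples."""
    features = sorted(positives[0].keys())
    hypothesis = dict(positives[0])          # start with the first positive example
    for example in positives[1:]:
        for f in features:
            if hypothesis[f] != "?" and hypothesis[f] != example[f]:
                hypothesis[f] = "?"          # generalize the conflicting test
    return hypothesis

def matches(hypothesis, instance):
    return all(v == "?" or instance[f] == v for f, v in hypothesis.items())

# Illustrative data: learn a crude concept of 'ball' from feature vectors.
positives = [
    {"shape": "round", "size": "small", "color": "red"},
    {"shape": "round", "size": "large", "color": "blue"},
]
h = learn_conjunctive(positives)
print(h)                                               # {'shape': 'round', 'size': '?', 'color': '?'}
print(matches(h, {"shape": "cube", "size": "small", "color": "red"}))   # False
```

The conjunctive bias is what makes the search tractable here: the learner never considers disjunctions, so a single pass over the positives suffices.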
Theory of Learnability
Goals: Restrict the set of target concepts so that we can search the space efficiently and still find high-quality concepts, where quality means effectiveness at classifying objects.
Efficiency and correctness may depend not just upon the learning algorithm but also upon the language for expressing concepts, which, in turn, determines the search space.
Example
- Given 1000 balls of various types, the concept of 'ball' would probably be learnable.
- Given 1000 random objects, it would be difficult to find an appropriate generalization.
- This difference is independent of the learning algorithm.
PAC Learnability (Valiant)
A class of concepts is PAC learnable if there is an
algorithm that executes efficiently and has a high
probability of finding an approximately correct
concept. Let C be a set of concepts and X a set of instances, with n = |X|. C is PAC learnable if, for a concept error probability ɛ and a failure probability δ, there is an algorithm which, trained on X, produces a concept c of C such that the probability that c has a generalization error > ɛ is less than δ.
PAC Learnability (cont'd)
That is, for y drawn from the same distribution that the samples in X were drawn from:
P[ P[y is misclassified by c] > ɛ ] ≤ δ.
The running time for the algorithm must be polynomial in terms of n = |X|, 1/ɛ, and 1/δ.
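As a worked illustration (not part of the original slides): for a finite concept class C and a learner that outputs a concept consistent with the training data, a standard sample-complexity bound is m ≥ (1/ɛ)(ln|C| + ln(1/δ)). The sketch below simply evaluates that bound with illustrative numbers.

```python
import math

# Standard PAC sample-complexity bound for a finite concept class C and a
# consistent learner:  m >= (1/eps) * (ln|C| + ln(1/delta)).
# All numbers below are purely illustrative.

def pac_sample_bound(num_concepts, eps, delta):
    """Examples sufficient so that, with probability >= 1 - delta,
    any consistent hypothesis has generalization error <= eps."""
    return math.ceil((math.log(num_concepts) + math.log(1.0 / delta)) / eps)

# e.g. conjunctions over 10 boolean features: each feature is required true,
# required false, or ignored, giving 3**10 syntactically distinct concepts.
print(pac_sample_bound(num_concepts=3**10, eps=0.1, delta=0.05))   # -> 140
```

The point of the bound is the one made on the slide: learnability depends on the concept language (here, the size of C), not only on the algorithm.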
Prior Knowledge
Some learning algorithms use prior domain
knowledge. This is not unusual as people are
believed to learn more efficiently if they can relate
new knowledge to old. In Explanation-Based Learning, a domain theory is used to explain an example. Generalization is then based on the explanation rather than the example itself.
Explanation-Based Learning
There are four components:
- A target concept – this is the goal
- A training example (positive)
- A domain theory – a set of rules and facts that explain how the training example is an example of the target
- Operationality criteria – restriction on the form of the concepts developed (inductive bias)
EBL Example
target concept: premise(X) -> cup(X), where premise is a conjunctive expression containing X.
domain theory:
liftable(X) ^ holds_liquid(X) -> cup(X)
part(Z, W) ^ concave(W) ^ points_up(W) -> holds_liquid(Z)
light(Y) ^ part(Y, handle) -> liftable(Y)
small(A) -> light(A)
made_of(A, feathers) -> light(A)
Example (cont'd)
training example: cup(obj1), small(obj1), part(obj1, handle), owns(bob, obj1), part(obj1, bottom), part(obj1, bowl), points_up(bowl), concave(bowl), color(obj1, red)
operationality criteria: target concepts must be
defined in terms of observable, structural properties
of objects.
Explanation
Using the domain theory, the example is explained by proving cup(obj1): small(obj1) gives light(obj1); light(obj1) and part(obj1, handle) give liftable(obj1); part(obj1, bowl), concave(bowl), and points_up(bowl) give holds_liquid(obj1); finally, liftable(obj1) and holds_liquid(obj1) give cup(obj1). Irrelevant facts such as owns(bob, obj1) and color(obj1, red) play no part in the proof.
Generalization
The proof is then generalized by replacing constants with variables wherever the domain theory permits, yielding an operational rule for the target concept:
small(X) ^ part(X, handle) ^ part(X, W) ^ concave(W) ^ points_up(W) -> cup(X)
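A minimal sketch (not from the slides) of checking the generalized rule against the ground facts of the training example; the predicate and object names follow the example above.

```python
# Minimal sketch: apply the generalized EBL rule
#   small(X) ^ part(X, handle) ^ part(X, W) ^ concave(W) ^ points_up(W) -> cup(X)
# to the ground facts of the training example.

facts = {
    ("small", "obj1"), ("part", "obj1", "handle"),
    ("part", "obj1", "bottom"), ("part", "obj1", "bowl"),
    ("points_up", "bowl"), ("concave", "bowl"),
    ("owns", "bob", "obj1"), ("color", "obj1", "red"),
}

def is_cup(x, facts):
    """True if some part W of x satisfies the body of the learned rule."""
    if ("small", x) not in facts or ("part", x, "handle") not in facts:
        return False
    parts = {f[2] for f in facts if f[0] == "part" and f[1] == x}
    return any(("concave", w) in facts and ("points_up", w) in facts for w in parts)

print(is_cup("obj1", facts))   # True: the rule succeeds through the 'bowl' part
```

Note how the irrelevant facts (ownership, color) never enter the test, which is the point of generalizing from the explanation rather than the raw example.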
Advantages of EBL
- Ignores irrelevant information
- Generalizations are relevant because they are consistent with the domain theory
- Can learn from a single training example
- Allows one to hypothesize unstated relationships between its goals and its experience
Limitations of EBL
- Can only learn rules that are within the deductive closure of its domain theory
- Such rules could be deduced without the need for training examples
- EBL can be seen as a way to speed up learning
- However, a complete domain theory may not be needed
Reasoning by Analogy
- If two situations are similar in certain respects, we can construct a mapping from one to the other and then use that mapping to reason from the first situation to the second
- Must be able to identify the key features in both and ignore extraneous features
- Selection of the source situation is critical
Analogy (cont'd)
Necessary steps:
- Retrieve a potential source case
- Elaboration: derive additional features and relationships in the source case
- Mapping: map the source attributes to the target
- Justification: determine that the mapping is valid
- Learning: apply what you know from the source case to the target, and store the knowledge for the future (a small sketch follows this list)
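A minimal sketch (illustrative names, not from the slides) of the mapping and transfer steps: attributes of a known source case are mapped onto a target case, and relations that hold in the source are hypothesized for the target.

```python
# Minimal sketch of analogical mapping/transfer with illustrative data:
# the familiar solar-system -> atom analogy.

source_relations = [
    ("attracts", "sun", "planet"),
    ("revolves_around", "planet", "sun"),
    ("more_massive", "sun", "planet"),
]

# Mapping: source entities -> target entities (produced by some matching step).
mapping = {"sun": "nucleus", "planet": "electron"}

def transfer(relations, mapping):
    """Hypothesize target relations by substituting mapped entities."""
    hypotheses = []
    for (rel, a, b) in relations:
        if a in mapping and b in mapping:
            hypotheses.append((rel, mapping[a], mapping[b]))
    return hypotheses

for h in transfer(source_relations, mapping):
    print(h)   # e.g. ('attracts', 'nucleus', 'electron'), ...
```

The justification step, omitted here, would decide which of the transferred relations actually hold in the target domain.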
Uses of Analogy
- Case-based reasoning: Law, Medicine
- Mathematical theorem proving
- Physical models
- Games
- Diagnoses
Unsupervised Learning
The system forms and evaluates concepts on its own.
- Automated discovery
- Conceptual clustering
AM (Lenat)
AM (Automated Mathematician) was a system for automatically generating “interesting” concepts in mathematics, primarily number theory. The system began with a set of basic concepts (such as a bag, or multi-set) and operators, and then used generalization, specialization, and inversion of operators to define new concepts. AM could generate instances of the concepts and test them. A frequently occurring concept is deemed interesting.
AM (cont'd)
Heuristics were used to guide the search. Concepts were represented as small pieces of LISP code which could be mutated. The compact representation was a key to the power of the program to discover new concepts.
AM Discoveries
- Numbers
- Even
- Odd
- Factors
- Primes
- Goldbach's Conjecture
- Fundamental Theorem of Arithmetic
Conceptual Clustering
The clustering problem is to take a collection of
objects and group them together in a meaningful
way. There is some measurable standard of quality
which is used to maximize similarity of objects in
the same group (cluster).
Clustering Algorithm
A simple clustering algorithm is:Choose the pair of objects with the highest degree
of similarity. Make them a cluster.Define the features of a cluster as the average of the
features of the members. Replace the members by
the cluster.Repeat until a single cluster is formed.
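A minimal sketch of this procedure (not from the slides), treating objects as numeric feature vectors and using Euclidean distance as the measure of (dis)similarity; names and data are illustrative.

```python
# Minimal sketch of the simple agglomerative clustering loop described above:
# repeatedly merge the two most similar clusters, representing the merged
# cluster by the average of its members' feature vectors.

import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average(a, b):
    return tuple((x + y) / 2 for x, y in zip(a, b))

def cluster(objects):
    """objects: list of equal-length numeric feature tuples; returns the merges made."""
    clusters = list(objects)
    merges = []
    while len(clusters) > 1:
        # choose the pair with the highest similarity (smallest distance)
        (i, j) = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: distance(clusters[p[0]], clusters[p[1]]),
        )
        a, b = clusters[i], clusters[j]
        merges.append((a, b))
        # replace the two members by a single averaged cluster
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(average(a, b))
    return merges

print(cluster([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]))
```

As the next slide notes, the result is purely extensional: the clusters have no semantic description beyond their member lists.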
Clustering (cont'd)
Often there is a measure of closeness between
objects, or a list of features that can be compared.
Weights may be different for different features.
Traditional clustering algorithms don't produce
meaningful semantic explanations. Clusters are
represented extensionally (listing their members)
and not intensionally (by providing criteria for
membership).
CLUSTER/2
1 Select k seeds from the set of objects.
2 For each seed, use that seed as a positive example and the other seeds as negative examples, and produce a maximally general definition.
3 Classify all the non-seed objects using the definitions produced from the seeds. Find a specific description for each resulting category.
CLUSTER/2 (cont'd)
4 Adjust for overlapping definitions
5 Using a distance metric, select an element closest to
the center of each category
6 Repeat steps 1-5 using these new elements as seeds.
Stop when satisfactory.
7 If no improvement after several iterations try seeds
near the edges of the clusters.
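The following is a rough, illustrative skeleton of the outer loop only, not the original CLUSTER/2 implementation: it assumes numeric feature vectors, approximates the symbolic definitions of steps 2-3 by nearest-seed assignment, and uses Euclidean distance for step 5.

```python
# Rough skeleton of the CLUSTER/2 outer loop under simplifying assumptions:
# objects are numeric feature tuples, "classify by definition" is approximated
# by nearest-seed assignment, and step 2's symbolic definitions are stubbed out.

import math
import random

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(group):
    return tuple(sum(xs) / len(xs) for xs in zip(*group))

def cluster2(objects, k, iterations=10):
    seeds = random.sample(objects, k)                  # step 1: select k seeds
    categories = []
    for _ in range(iterations):                        # step 6: repeat with new seeds
        # steps 2-3 (approximation): assign every object to its nearest seed
        categories = [[] for _ in seeds]
        for obj in objects:
            nearest = min(range(k), key=lambda i: distance(obj, seeds[i]))
            categories[nearest].append(obj)
        # step 5: pick the element closest to the center of each category
        new_seeds = [
            min(group, key=lambda obj: distance(obj, centroid(group)))
            for group in categories if group
        ]
        if len(new_seeds) < k or new_seeds == seeds:   # stop when satisfactory
            break
        seeds = new_seeds
    return categories

data = [(0, 0), (0, 1), (1, 0), (9, 9), (8, 9), (9, 8)]
print(cluster2(data, k=2))
```

What the skeleton omits is precisely what distinguishes CLUSTER/2 from numeric clustering: the maximally general and specific symbolic descriptions that give each cluster an intensional definition.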
Reinforcement Learning
- The idea is to interact with the environment and gain feedback (possibly both positive and negative) in order to adjust behavior.
- There is a trade-off between exploiting what you already know and what you might gain by further exploration (see the sketch below).
- Key elements: policy, reward, value mapping, model.
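A minimal sketch (not from the slides) of the exploration/exploitation trade-off, using an ɛ-greedy agent on a simple multi-armed bandit: the value estimates play the role of the value mapping, the payoff returned by the environment is the reward, and the action-selection rule is the policy. All numbers are illustrative.

```python
# Minimal sketch: epsilon-greedy learning on a 3-armed bandit.
# value[a] estimates the average reward of action a (the "value mapping");
# the epsilon-greedy rule is the "policy"; pull() supplies the "reward".

import random

true_means = [0.2, 0.5, 0.8]           # hidden reward probabilities (illustrative)

def pull(action):
    """Environment: return reward 1 with the arm's hidden probability, else 0."""
    return 1.0 if random.random() < true_means[action] else 0.0

value = [0.0, 0.0, 0.0]                # estimated value of each action
counts = [0, 0, 0]
epsilon = 0.1                          # exploration rate

for step in range(1000):
    if random.random() < epsilon:                      # explore
        action = random.randrange(3)
    else:                                              # exploit current knowledge
        action = max(range(3), key=lambda a: value[a])
    reward = pull(action)
    counts[action] += 1
    # incremental average: adjust the value estimate toward the observed reward
    value[action] += (reward - value[action]) / counts[action]

print([round(v, 2) for v in value])    # estimates should approach the true means
```

Setting epsilon to 0 removes exploration entirely and the agent may lock onto a suboptimal arm, which is the trade-off the slide describes.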