Instance-Based Learning


Page 1: Instance-Based Learning

Lehrstuhl für Informatik 2, Gabriella Kókai: Machine Learning

Page 2: Content

- Motivation: Eager Learning, Lazy Learning, Instance-Based Learning
- k-Nearest Neighbour Learning (kNN)
- Distance-Weighted kNN
- Locally Weighted Regression (LWR)
- Case-Based Reasoning (CBR)
- Summary

Page 3: Motivation: Eager Learning

THE LEARNING TASK: Try to approximate a target function f by a hypothesis on the basis of training examples.

EAGER learning: As soon as the training examples and the hypothesis space are received, the search for the best hypothesis begins.

Training phase:
  given: training examples $D = \{\langle x_i, f(x_i)\rangle\}$ and hypothesis space $H$
  search: best hypothesis
Processing phase:
  for every new instance $x_q$ return $\hat{f}(x_q)$

Page 4: Motivation: Lazy Algorithms

LAZY ALGORITHMS:
- Training examples are simply stored and lie dormant ("sleeping")
- Generalisation beyond these examples is postponed until new instances must be classified
- Every time a new query instance is encountered, its relationship to the previously stored examples is examined in order to compute the value of the target function for this new instance

Page 5: Motivation: Instance-Based Learning

Instance-based algorithms can establish a new local approximation for every new instance.

Training phase:
  given: training examples $D = \{\langle x_i, f(x_i)\rangle\}$
Processing phase:
  given: instance $x_q$
  search: best local hypothesis $\hat{f}$
  return $\hat{f}(x_q)$

Examples: Nearest Neighbour Algorithm, Distance-Weighted Nearest Neighbour, Locally Weighted Regression, ...

Page 6: Motivation: Instance-Based Learning 2

- How are the instances represented?
- How can we measure the similarity of the instances?
- How can $\hat{f}(x_q)$ be computed?

Page 7: Nearest Neighbour Algorithm

IDEA: All instances correspond to points in the n-dimensional space $\mathbb{R}^n$. Assign the value of the nearest neighbouring instance to the new instance.

REPRESENTATION: Let $x_i = \langle a_1(x_i), a_2(x_i), \ldots, a_n(x_i)\rangle$ be an instance, where $a_r(x_i)$ denotes the value of the r-th attribute of instance $x_i$.

TARGET FUNCTION: discrete-valued or real-valued.

Page 8: Nearest Neighbour Algorithm 2

HOW IS THE NEAREST NEIGHBOUR DEFINED?
A metric $d(x_q, x_i)$ serves as the similarity measure.

Minkowski norm ($L_p$): $d(x_q, x_i) = \left(\sum_{r=1}^{n} |a_r(x_q) - a_r(x_i)|^p\right)^{1/p}$

Euclidean distance (the case $p = 2$): $d(x_q, x_i) = \sqrt{\sum_{r=1}^{n} \left(a_r(x_q) - a_r(x_i)\right)^2}$

This algorithm never forms an explicit general hypothesis regarding the target function f.
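Both metrics are easy to implement directly. The following is a minimal sketch in plain Python (the function names are my own, not from the slides):

```python
def minkowski_distance(xq, xi, p=2):
    """Minkowski (L_p) distance between two attribute vectors."""
    return sum(abs(aq - ai) ** p for aq, ai in zip(xq, xi)) ** (1.0 / p)

def euclidean_distance(xq, xi):
    """Euclidean distance: the special case p = 2."""
    return minkowski_distance(xq, xi, p=2)
```

For p = 1 this gives the Manhattan distance, another common choice.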

Page 9: Nearest Neighbour Algorithm 3

HOW IS $\hat{f}(x_q)$ FORMED?

Discrete target function: $f : \mathbb{R}^n \to V$, where $V = \{v_1, v_2, \ldots, v_s\}$ is the set of s classes
Continuous target function: $f : \mathbb{R}^n \to \mathbb{R}$

Let $x_n$ be the nearest neighbour of $x_q$, i.e. $d(x_n, x_q) = \min_i d(x_i, x_q)$
==> $\hat{f}(x_q) = f(x_n)$
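As a sketch, the rule $\hat{f}(x_q) = f(x_n)$ amounts to a single search over the stored examples (names are hypothetical; `training_data` is a list of $(x_i, f(x_i))$ pairs and `distance` could be the Euclidean distance above):

```python
def nearest_neighbour_predict(training_data, xq, distance):
    """1-NN: return f(x_n) for the stored example x_n closest to x_q."""
    xn, f_xn = min(training_data, key=lambda pair: distance(pair[0], xq))
    return f_xn
```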

Page 10: k-Nearest Neighbour

IDEA: If we choose k = 1, the algorithm assigns to $\hat{f}(x_q)$ the value $f(x_i)$, where $x_i$ is the training instance nearest to $x_q$. For larger values of k the algorithm assigns the most common value among the k nearest training examples.

HOW CAN $\hat{f}(x_q)$ BE ESTABLISHED?
$\hat{f}(x_q) = \operatorname{argmax}_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i))$
where $\delta(a, b) = 1$ if $a = b$ and $\delta(a, b) = 0$ otherwise
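The voting rule translates directly into code; a minimal sketch (my own function names, with `Counter` standing in for the sum over $\delta$):

```python
from collections import Counter

def knn_classify(training_data, xq, k, distance):
    """k-NN: majority vote over the k nearest neighbours of x_q."""
    neighbours = sorted(training_data, key=lambda pair: distance(pair[0], xq))[:k]
    votes = Counter(f_xi for _, f_xi in neighbours)
    return votes.most_common(1)[0][0]  # the value v maximising the vote count
```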

Page 11: k-Nearest Neighbour 2

Example (see the Voronoi diagram figure): 1-NN yields $\hat{f}(x_q) = +$, while 5-NN yields $\hat{f}(x_q) = -$.

Voronoi Diagram: the decision surface induced by a 1-Nearest Neighbour algorithm for a typical set of training examples. The convex cell surrounding each training example indicates the region of query points whose classification is completely determined by that training example.

Page 12: k-Nearest Neighbour 3

REFINEMENT (distance-weighted kNN): The votes of the neighbours are weighted according to their distance to the query point $x_q$: the farther away a neighbour, the smaller its influence.

$\hat{f}(x_q) = \operatorname{argmax}_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i))$ where $w_i = \frac{1}{d(x_q, x_i)^2}$

To accommodate the case where the query point $x_q$ exactly matches one of the training instances $x_i$, so that the denominator is zero, we assign $\hat{f}(x_q) = f(x_i)$ in this case.

Distance-weighting for a real-valued target function:
$\hat{f}(x_q) = \frac{\sum_{i=1}^{k} w_i f(x_i)}{\sum_{i=1}^{k} w_i}$
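Both weighted variants can be sketched as follows (hypothetical names, reusing a `distance` function as before); the explicit zero-distance check realises the special case where the denominator would vanish:

```python
def weighted_knn_classify(training_data, xq, k, distance):
    """Distance-weighted k-NN for a discrete target function."""
    neighbours = sorted(training_data, key=lambda p: distance(p[0], xq))[:k]
    scores = {}
    for xi, f_xi in neighbours:
        d = distance(xi, xq)
        if d == 0.0:               # query matches a stored instance exactly
            return f_xi
        scores[f_xi] = scores.get(f_xi, 0.0) + 1.0 / d ** 2
    return max(scores, key=scores.get)

def weighted_knn_regress(training_data, xq, k, distance):
    """Distance-weighted k-NN for a real-valued target function."""
    neighbours = sorted(training_data, key=lambda p: distance(p[0], xq))[:k]
    weighted = []
    for xi, f_xi in neighbours:
        d = distance(xi, xq)
        if d == 0.0:
            return f_xi
        weighted.append((1.0 / d ** 2, f_xi))
    return sum(w * f for w, f in weighted) / sum(w for w, _ in weighted)
```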

Page 13: Remarks on the k-Nearest Neighbour Algorithm

PROBLEM: The distance between two instances is computed over all attributes, so even irrelevant attributes can influence the approximation.
EXAMPLE: n = 20, but only 2 attributes are relevant.
SOLUTION: Weight each attribute differently when calculating the distance between two neighbours, i.e. stretch the relevant axes in Euclidean space:
- shorten the axes that correspond to less relevant attributes
- lengthen the axes that correspond to more relevant attributes
PROBLEM: How can the weight of each attribute be determined automatically?
- Cross-validation
- Leave-one-out (see the sketch below)
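One way to realise the leave-one-out idea: score candidate weight vectors by how many training examples a weighted 1-NN classifier labels correctly when each example is held out in turn, and keep the best vector. The sketch below is one possible realisation, not the slides' method; all names and the candidate grid are my own (and the exhaustive grid is only feasible for a few attributes):

```python
from itertools import product

def weighted_distance(xq, xi, weights):
    """Euclidean distance with per-attribute stretching factors."""
    return sum(w * (aq - ai) ** 2 for w, aq, ai in zip(weights, xq, xi)) ** 0.5

def loo_accuracy(data, weights):
    """Leave-one-out accuracy of weighted 1-NN on the training data."""
    correct = 0
    for held_out in range(len(data)):
        xq, f_xq = data[held_out]
        rest = data[:held_out] + data[held_out + 1:]
        xn, f_xn = min(rest, key=lambda p: weighted_distance(xq, p[0], weights))
        correct += (f_xn == f_xq)
    return correct / len(data)

def select_attribute_weights(data, n_attributes, candidates=(0.0, 0.5, 1.0)):
    """Pick the weight vector with the best leave-one-out score.

    Weight 0.0 effectively removes an attribute (shrinks its axis away).
    """
    return max(product(candidates, repeat=n_attributes),
               key=lambda ws: loo_accuracy(data, ws))
```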

Page 14: Remarks on the k-Nearest Neighbour Algorithm 2

ADVANTAGES:
- The training phase is processed very fast
- Can learn complex target functions
- Robust to noisy training data
- Quite effective when a sufficiently large set of training data is provided
- Under very general conditions the bound $P_{\text{Bayes}} \le P_{\text{kNN}} \le 2\,P_{\text{Bayes}}$ holds, where P is the probability of error

DISADVANTAGES:
- The algorithm delays all processing until a new query is received, so significant computation can be required per query; efficient memory indexing is needed
- Classification is slow
- Sensitive to the curse of dimensionality

BIAS: The inductive bias corresponds to the assumption that the classification of an instance will be most similar to the classification of other instances that are nearby in Euclidean distance.

Page 15: Locally Weighted Regression

IDEA: A generalisation of the Nearest Neighbour algorithm. It constructs an explicit approximation to f over a local region surrounding the query point $x_q$, using nearby or distance-weighted training examples to form the local approximation to f.

- Local: the function is approximated based solely on the training data near the query point
- Weighted: the contribution of each training example is weighted by its distance from the query point
- Regression: approximating a real-valued target function

Page 16: Locally Weighted Regression

PROCEDURE:
- Given a new query $x_q$, construct an approximation $\hat{f}$ that fits the training examples in the neighbourhood surrounding $x_q$
- This approximation is used to calculate $\hat{f}(x_q)$, the estimated target value assigned to the query instance
- The description of $\hat{f}$ may change, because a different local approximation is calculated for each instance

Page 17: Locally Weighted Regression 2

PROCEDURE:
- Given a new query $x_q$, construct an approximation $\hat{f}$ that fits the training examples in the surrounding neighbourhood
- How can $\hat{f}$ be calculated? Linear function, quadratic function, multilayer neural network, ...
- This approximation is used to calculate $\hat{f}(x_q)$, the estimated target value for the query instance $x_q$
- The description of $\hat{f}$ may be deleted afterwards, because a different local approximation will be calculated for every distinct query instance

Page 18: Locally Weighted Linear Regression

A special case of LWR with simple computation.

LINEAR HYPOTHESIS SPACE:
$\hat{f}(x) = w_0 + w_1 a_1(x) + \ldots + w_n a_n(x)$
where $a_r(x)$ denotes the r-th attribute of x.

Define the error criterion E so as to emphasise fitting the local training examples:

1. Minimise the squared error over just the k nearest neighbours:
$E_1(x_q) \equiv \frac{1}{2} \sum_{x \,\in\, k \text{ nearest nbrs of } x_q} \left(f(x) - \hat{f}(x)\right)^2$

2. Minimise the squared error over the entire set D, using a kernel function K to decrease each contribution with distance:
$E_2(x_q) \equiv \frac{1}{2} \sum_{x \in D} \left(f(x) - \hat{f}(x)\right)^2 K(d(x_q, x))$

3. Combine $E_1$ and $E_2$:
$E_3(x_q) \equiv \frac{1}{2} \sum_{x \,\in\, k \text{ nearest nbrs of } x_q} \left(f(x) - \hat{f}(x)\right)^2 K(d(x_q, x))$

Page 19: Locally Weighted Linear Regression 2

The third error criterion is a good approximation to the second, and it has the advantage that the computational cost is independent of the total number of training examples.

If $E_3$ is chosen and the gradient descent rule is rederived (see neural networks), the following training rule is obtained:
$\Delta w_j = \eta \sum_{x \,\in\, k \text{ nearest nbrs of } x_q} K(d(x_q, x)) \left(f(x) - \hat{f}(x)\right) a_j(x)$
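The training rule can be turned into a small gradient-descent loop. Below is a numpy sketch; the Gaussian kernel, learning rate, and iteration count are my own assumptions for illustration, not specified on the slides:

```python
import numpy as np

def lwlr_predict(X, y, xq, k=10, eta=0.01, iters=500):
    """Locally weighted linear regression at query xq, minimising E3.

    X: (m, n) training inputs, y: (m,) targets, xq: (n,) query point.
    """
    dists = np.linalg.norm(X - xq, axis=1)
    nbrs = np.argsort(dists)[:k]                # k nearest neighbours of x_q
    K = np.exp(-dists[nbrs] ** 2)               # assumed Gaussian kernel K(d)
    A = np.hstack([np.ones((k, 1)), X[nbrs]])   # prepend constant 1 for w_0
    t = y[nbrs]
    w = np.zeros(A.shape[1])
    for _ in range(iters):
        w += eta * A.T @ (K * (t - A @ w))      # Delta w_j from the slide
    return np.concatenate([[1.0], xq]) @ w      # hat f(x_q)
```

A closed-form weighted least-squares solve would also work here; the loop is kept to mirror the gradient rule above.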

Page 20: Evaluation of Locally Weighted Regression

ADVANTAGES:
- Pointwise approximation of a complex target function
- Earlier local approximations have no influence on new queries

DISADVANTAGES:
- The quality of the result depends on the choice of the error function E, the kernel function K, and the hypothesis space H
- Sensitive to relevant and irrelevant attributes

Page 21: Case-Based Reasoning (CBR)

Instance-based methods and locally weighted regression are lazy learning methods:
- They classify new query instances by analysing similar instances and ignoring very different ones
- They represent instances as real-valued points in an n-dimensional Euclidean space

CBR follows the first two principles, but instances are represented using a richer symbolic description, and the methods used to retrieve similar instances are correspondingly more elaborate.

Page 22: Case-Based Reasoning 2

The CBR cycle:
1. Given: a new case (instance)
2. Search for relevant cases in the case library
3. Select the best one among them
4. Derive a solution
5. Evaluate the derived solution
6. Add the solved case to the case library

Page 23: Case-Based Reasoning 3

HOW ARE THE INSTANCES REPRESENTED? By complex logical (relational) descriptions.

Example:
((user-complaint error53 on shutdown)
 (CPU-model Power PC)
 (operating-system Windows)
 (network-connection PCIA)
 (memory 48meg)
 (installed-application Excel Netscape)
 (disk 1gig)
 (likely-causes ???))

HOW CAN THE SIMILARITY BE MEASURED? See the CADET example below.
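To make the cycle from the previous page concrete, here is a toy sketch in which cases are attribute dictionaries like the one above and similarity is simply the fraction of matching attributes, a deliberately crude syntactic measure. All names are hypothetical, and the evaluation step of the cycle is omitted:

```python
def similarity(case_a, case_b):
    """Fraction of shared attributes with equal values (syntactic measure)."""
    keys = (set(case_a) & set(case_b)) - {"likely-causes"}
    if not keys:
        return 0.0
    return sum(case_a[k] == case_b[k] for k in keys) / len(keys)

def solve_with_cbr(case_library, new_case):
    """Retrieve the most similar solved case, reuse its solution, retain the result."""
    best = max(case_library, key=lambda c: similarity(c, new_case))
    new_case["likely-causes"] = best["likely-causes"]  # reuse step
    case_library.append(new_case)                      # retain the solved case
    return new_case
```

Real CBR systems such as CADET replace this flat attribute match with structural matching over relational descriptions.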

Page 24: CADET

- A prototypical example of a case-based reasoning system
- Assists in the conceptual design of simple mechanical devices, such as water faucets
- Uses a library containing approximately 75 previous designs and design fragments to suggest a conceptual design that meets the specifications of the new design
- Each instance is stored as a pair <qualitative function, mechanical structure>
- New design problem: the desired function is specified; the corresponding structure is sought

Page 25: CADET Example

[Figure only; not reproduced in this transcript.]

Page 26: CADET Example 2

- CADET searches for subgraph isomorphisms between the two function graphs, so that parts of a stored case can be found that match parts of the design specification
- The system elaborates the original function-specification graph in order to create functionally equivalent graphs that match still more cases
- It uses general knowledge about physical influences to create these elaborated function graphs, e.g. the rewrite rule
  $A \xrightarrow{+} B \;\Rightarrow\; A \xrightarrow{+} x \xrightarrow{+} B$
  where x is a universally quantified variable
- Fragments retrieved from several cases are combined into a new solution by knowledge-based reasoning

Page 27: Evaluation of CBR

ADVANTAGE:
- A step towards autonomous thinking systems (?)

DISADVANTAGES:
- Hierarchical system and memory indexing are required
- Similarity is measured only syntactically
- Two neighbouring cases may be incompatible -> their combination is impossible
- Evaluating the derived solution is difficult

Page 28: Evaluation of Lazy Algorithms

DIFFERENCES TO EAGER LEARNING:
- Computation time: less during the training phase, longer during classification
- Classification: the training examples are always retained; an instance-specific approximation is computed
- Generalisation accuracy: local approximations are computed
- Bias: the query instance is taken into account when deciding how to generalise beyond the training data

PROBLEMS:
- Labelling new instances efficiently
- Determining an appropriate distance measure
- Influence of irrelevant attributes

Page 29: Summary

Lazy learning: Processing of training examples is delayed until a new query instance must be labelled. The result is a set of local approximations.

k-Nearest neighbour: An instance is a point in the n-dimensional Euclidean space. The target function value for a new query is estimated from the known values of the k nearest training examples.

Locally weighted regression: An explicit local approximation to the target function is constructed for each query instance (form: constant, linear, ...).

Case-based reasoning: Instances are represented by complex logical descriptions. A rich variety of methods has been proposed for mapping from the training examples to target function values for new instances.