INSTANCE-BASED LEARNING ALGORITHMS Presented by Yan T. Yang

Post on 30-Dec-2015

Page 1: Instance-based Learning Algorithms

INSTANCE-BASED LEARNING ALGORITHMS

Presented by Yan T. Yang

Page 2: Instance-based Learning Algorithms

Agenda

• Background: what is instance-based learning?
• Two simple algorithms
• Extensions [Aha, 1994]:
  • Feedback algorithm
  • Noise reduction
  • Irrelevant attribute elimination
  • Novel attribute adoption

Page 3: Instance-based Learning Algorithms

Learning Paradigms

• Cognitive psychology: how do people, animals, and machines learn? (Jerome Bruner)
• Two schools of thought [Bruner, Goodnow and Austin 1967]:
  • Abstraction-based: form a generalized idea from the examples, then use it to classify new objects.

Page 4: Instance-based Learning Algorithms

Learning Paradigms

• Two schools of thought [Bruner, Goodnow and Austin 1967]:
  • Abstraction-based, examples:
    • Artificial neural network
    • Support vector machine
    • Rule-based learner / decision tree, e.g. "If not animated… then not an animal"

Page 5: Instance-based Learning Algorithms

Learning Paradigms

• Two schools of thought [Bruner, Goodnow and Austin 1967]:
  • Instance-based: store all (suitable) training examples, then compare new objects to the stored examples.

Page 6: Instance-based Learning Algorithms

Comparison Between Two Paradigms

• Abstraction-based
  • Generalization: rules, discriminant planes or functions, trees
  • Workload is during training time
  • Little work during query time
• Instance-based
  • Store (suitable) examples as saved instances
  • Workload is during query time
  • Little work during training time

Page 7: Instance-based Learning Algorithms

Instance-based Learning

Training set:

$$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}, \quad y_i \in C$$

Example [Aha, 1994]: attributes “is enrolled”, “has MS degree”, and “is married”

( <True, True, True>, PhD student)
( <False, False, True>, not PhD student)
( <True, False, False>, PhD student)

$$C = \{\text{PhD student}, \text{not PhD student}\}$$

Page 8: Instance-based Learning Algorithms

Instance-based Learning

Training set: $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}, \quad y_i \in C$

Instance-based learning algorithm

Concept description: $T^* \subseteq T$

Page 9: Instance-based Learning Algorithms

Instance-based Learning

Similarity function: $\mathrm{sim} : (x_1, x_2) \to [0, 1]$

Page 10: Instance-based Learning Algorithms

Instance-based Learning

Classification function: $\mathrm{class} : (T^*, \mathrm{sim}, x_{m+1}) \to y_{m+1} \in C$
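The three pieces named on these slides (concept description, similarity function, classification function) can be sketched in Python. This is a minimal illustration, not the presenter's code: the names `overlap_similarity` and `classify`, and the choice of an attribute-overlap similarity for the Boolean PhD-student example, are assumptions made here.

```python
from typing import Callable, Hashable, Sequence

Instance = Sequence[Hashable]
Labeled = tuple[Instance, Hashable]
Similarity = Callable[[Instance, Instance], float]


def overlap_similarity(x1: Instance, x2: Instance) -> float:
    """Fraction of attribute positions on which x1 and x2 agree, in [0, 1]."""
    matches = sum(1 for a, b in zip(x1, x2) if a == b)
    return matches / len(x1)


def classify(concept_description: Sequence[Labeled],
             sim: Similarity,
             x_new: Instance) -> Hashable:
    """Return the label of the stored instance most similar to x_new."""
    _, label = max(concept_description, key=lambda pair: sim(pair[0], x_new))
    return label


# Training set from the PhD-student example; the attributes are
# (is enrolled, has MS degree, is married).
T = [
    ((True, True, True), "PhD student"),
    ((False, False, True), "not PhD student"),
    ((True, False, False), "PhD student"),
]

# Here the concept description T* is simply all of T.
prediction = classify(T, overlap_similarity, (True, True, False))
```

Ties between equally similar stored instances are broken by storage order in this sketch; a real implementation would need an explicit tie-breaking policy.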

Page 11: Instance-based Learning Algorithms

Instance-based Learning Algorithm

• Input: training set
• Output: concept description
• Similarity function
• Classification function
• Optional:
  • Keep track of each concept description instance’s correct and incorrect rates
  • Concept description adder
  • Concept description remover

Page 12: Instance-based Learning Algorithms

Instance-based Learning Algorithm

• Advantages and disadvantages [Mitchell, 1997]
  • Advantages:
    • Training is very fast
    • Can learn complex class membership
    • Does not lose information
  • Disadvantages:
    • Slow at query time
    • Easily fooled by irrelevant attributes

Page 13: Instance-based Learning Algorithms

Instance-based Learning Algorithm

• Example IBL1:
  • Assign the class of the most similar concept description instance to the new instance.
  • Nearest neighbor
  • Save all training instances in the concept description

CD = concept description
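A minimal Python sketch of IBL1 as described on this slide: every training instance goes into the concept description, and a new instance receives the class of its most similar stored instance. The function names and the mapping of Euclidean distance into (0, 1] via 1 / (1 + d) are assumptions made for illustration, not taken from the slides.

```python
import math


def similarity(x1, x2):
    """Map Euclidean distance into (0, 1]; identical instances score 1."""
    return 1.0 / (1.0 + math.dist(x1, x2))


def ibl1_train(training_set):
    """IBL1: the concept description (CD) is every training instance."""
    return list(training_set)


def nearest_label(cd, x_new):
    """Label of the most similar instance in the concept description."""
    _, label = max(cd, key=lambda pair: similarity(pair[0], x_new))
    return label


# Hypothetical two-attribute numeric data with classes "a" and "b".
training = [((0.0, 0.0), "a"), ((1.0, 1.0), "b"), ((0.1, 0.2), "a")]
cd = ibl1_train(training)
predicted = nearest_label(cd, (0.9, 0.8))
```

Training is a single copy of the data, while each query scans the whole concept description, which matches the "workload is during query time" trade-off.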

Page 14: Instance-based Learning Algorithms

Instance-based Learning Algorithm

• Example IBL1 (nearest neighbor):

[Figure: Voronoi tessellation of the training data]

Page 15: Instance-based Learning Algorithms

Instance-based Learning Algorithm

• Example IBL2:
  • Similar to IBL1: nearest neighbor
  • Save in the concept description only those training instances that are misclassified

Intuition:

“These are nearly always lies in the boundary between two classes. So, only if these are fully saved, the rest which are far from boundaries, can be easily deduced by using the similarity function” [Karadeniz, 1996]

CD = concept description
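The IBL2 update rule can be sketched as a single pass over the training data. This is an illustrative sketch: the function names, the 1 / (1 + d) similarity, and the toy data are assumptions, and the first instance is saved because an empty concept description cannot classify anything.

```python
import math


def similarity(x1, x2):
    """Map Euclidean distance into (0, 1]; identical instances score 1."""
    return 1.0 / (1.0 + math.dist(x1, x2))


def nearest_label(cd, x_new):
    """Label of the most similar instance in the concept description (CD)."""
    _, label = max(cd, key=lambda pair: similarity(pair[0], x_new))
    return label


def ibl2_train(training_set):
    """Build the CD by keeping only instances the current CD misclassifies."""
    cd = []
    for x, y in training_set:
        # The first instance is always saved because the CD is still empty.
        if not cd or nearest_label(cd, x) != y:
            cd.append((x, y))
    return cd


# Hypothetical data: two well-separated clusters.
training = [
    ((0.0, 0.0), "a"),
    ((0.1, 0.0), "a"),
    ((1.0, 1.0), "b"),
    ((0.9, 1.0), "b"),
    ((0.2, 0.1), "a"),
]
cd = ibl2_train(training)
```

On this data only two of the five instances are stored. The result depends on presentation order, and a noisy instance is likely to be misclassified and therefore saved, which is the weakness the IBL3 extension targets.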

Page 17: Instance-based Learning Algorithms

Criticisms

Mainly because nearest neighbor algorithms are the basis [Breiman, Friedman, Olshen and Stone, 1984]:

1. They are expensive due to their storage
2. They are sensitive to the choice of the similarity function
3. They cannot easily work with missing attribute values
4. They cannot easily work with nominal attributes
5. They do not yield concise summaries of concepts

[Aha, 1992]:
– IBL2 rectifies 1.
– Extensions (following slides) rectify 1, 2, 3.
– [Stanfill and Waltz, 1986] rectifies 4.
– [Salzberg, 1990] rectifies 5.

Page 18: Instance-based Learning Algorithms

Extension: Filtering Noisy Training Instances (IBL3)

Modification:

1. Maintain a classification record for each saved instance;
2. Use only significantly good saved instances for classification; and
3. Discard noisy saved instances (i.e., those with significantly poor classification performance)

Page 19: Instance-based Learning Algorithms

Extension: Filtering Noisy Training Instances (IBL3)

| Component | IBL2 | IBL3 |
|---|---|---|
| Similarity function | Euclidean distance | Euclidean distance |
| Classification function | Nearest neighbor | Nearest acceptable neighbor |
| Concept description updater | Save only misclassified instances | Save only misclassified instances; use only significantly good saved instances; remove significantly bad saved instances |

Page 20: Instance-based Learning Algorithms

Extension: Filtering Noisy Training Instances (IBL3)

“Significantly” good or bad:

Use statistical confidence intervals (CI).

Construct a CI for the current instance’s classification accuracy.

Construct a CI for its class’s current observed relative frequency.

[Figure: confidence intervals for class frequency and classification accuracy, “significantly” good case]

Page 21: Instance-based Learning Algorithms

Extension: Filtering Noisy Training Instances (IBL3)

[Figure: confidence intervals for class frequency and classification accuracy, “significantly” bad case]

Page 22: Instance-based Learning Algorithms

Extension: Filtering Noisy Training Instances (IBL3)

Statistical confidence intervals (CI): see [Hogg and Tanis, 1983]
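One way to make the “significantly good or bad” test concrete is sketched below. It rests on assumptions not stated in the slides: the Wilson score interval is used as the confidence interval for a proportion, an instance counts as significantly good when its accuracy interval lies entirely above its class-frequency interval, as significantly bad when it lies entirely below, and the default `z` values are purely illustrative.

```python
import math


def wilson_interval(successes, trials, z):
    """Wilson score interval (low, high) for a proportion successes/trials."""
    if trials == 0:
        return 0.0, 1.0  # nothing observed yet: the interval is uninformative
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = p + z * z / (2 * trials)
    spread = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (center - spread) / denom, (center + spread) / denom


def is_significantly_good(correct, attempts, class_count, seen, z=0.9):
    """Accuracy interval lies entirely above the class-frequency interval."""
    acc_low, _ = wilson_interval(correct, attempts, z)
    _, freq_high = wilson_interval(class_count, seen, z)
    return acc_low > freq_high


def is_significantly_bad(correct, attempts, class_count, seen, z=0.7):
    """Accuracy interval lies entirely below the class-frequency interval."""
    _, acc_high = wilson_interval(correct, attempts, z)
    freq_low, _ = wilson_interval(class_count, seen, z)
    return acc_high < freq_low


# Hypothetical record: 18 correct out of 20 classification attempts, for a
# class that makes up 50 of the 100 instances seen so far.
good = is_significantly_good(18, 20, 50, 100)
bad = is_significantly_bad(2, 20, 50, 100)
```

An instance whose intervals overlap is neither accepted nor discarded; it stays in the concept description but is not yet used for classification.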

Page 23: Instance-based Learning Algorithms

Extension: Tolerate irrelevant attributes (IBL4)

• IBL1–IBL3: assume all attributes have equal relevance;
• Real world: some attributes are more discriminative than others;
• Irrelevant attributes cause poor performance.

Page 24: Instance-based Learning Algorithms

Extension: Tolerate irrelevant attributes (IBL4)

• Regular similarity measure (Euclidean Distance)

• IBL4’s similarity measure (Euclidean Distance)

Concept-dependent:

sim(animal, tiger, cat) > sim(pet, tiger, cat)
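The formulas on this slide did not survive in the transcript, so the following is only a sketch of the idea of a concept-dependent similarity: each concept carries its own attribute weights, and the Euclidean distance is computed on weighted differences. The attribute names, the weight values, and the 1 / (1 + d) mapping into (0, 1] are all hypothetical; how IBL4 learns the weights is not shown.

```python
import math


def weighted_similarity(weights, x1, x2):
    """Similarity from a Euclidean distance with per-attribute weights."""
    d = math.sqrt(sum((w * (a - b)) ** 2 for w, a, b in zip(weights, x1, x2)))
    return 1.0 / (1.0 + d)


# Hypothetical attributes, scaled to [0, 1]: (has fur, is domesticated, size).
tiger = (1.0, 0.0, 0.9)
cat = (1.0, 1.0, 0.2)

# Hypothetical per-concept weights: domestication is irrelevant to "animal"
# but decisive for "pet".
concept_weights = {
    "animal": (1.0, 0.0, 0.1),
    "pet": (0.2, 1.0, 0.8),
}


def sim(concept, x1, x2):
    """Concept-dependent similarity: look up the concept's weights first."""
    return weighted_similarity(concept_weights[concept], x1, x2)


animal_sim = sim("animal", tiger, cat)
pet_sim = sim("pet", tiger, cat)
```

With these weights the tiger and the cat are close when the target concept is "animal" and far apart when it is "pet", mirroring the inequality on the slide.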

Page 27: Instance-based Learning Algorithms

Extension: Tolerate novel attributes (IBL5)

• IBL1–IBL4 assume that all attributes are known before the training process starts;
• Everyday situations: instances may not initially be described by all possible attributes;
• Missing values are a different issue, usually handled by 1) assigning “don’t know”; 2) assigning the most probable value; or 3) assigning all possible values [Gams and Lavrac, 1987]

Page 28: Instance-based Learning Algorithms

Extension: Tolerate novel attributes (IBL5)

• Extension (IBL5): allow novel attributes introduced late in the training process (extra: handle missing values in a novel way)

• IBL4’s similarity measure (Euclidean Distance)

• IBL5’s similarity measure (Euclidean Distance)
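The similarity formulas for this slide are missing from the transcript. As a rough illustration of how a similarity function might cope with attributes that appear late, the sketch below compares two instances only on the attributes both of them have. This rule, the dictionary representation, and the function name are assumptions made here and should not be read as IBL5's actual definition.

```python
import math


def shared_attribute_similarity(x1, x2):
    """Similarity over the attributes that both instances actually have.

    Attributes missing from either instance contribute no difference; with
    no shared attributes at all the distance is 0 and the similarity is 1.
    """
    shared = x1.keys() & x2.keys()
    d = math.sqrt(sum((x1[k] - x2[k]) ** 2 for k in shared))
    return 1.0 / (1.0 + d)


# Hypothetical instances: "weight" is a novel attribute introduced after
# the first instance was stored.
early = {"height": 0.5}
late = {"height": 0.5, "weight": 0.9}
other = {"height": 0.1, "weight": 0.9}

same_on_shared = shared_attribute_similarity(early, late)
differ_on_height = shared_attribute_similarity(late, other)
```

Storing instances as dictionaries means old instances need no retroactive padding when a new attribute shows up.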

Page 30: Instance-based Learning Algorithms

Results

IB = instance based learning (IBL)

Page 31: Instance-based Learning Algorithms

Results

Page 32: Instance-based Learning Algorithms

Thanks

•Q and A