Part II: Practical Implementations. Modeling the Classes: Stochastic Discrimination


Page 1

Part II: Practical Implementations.

Page 2

Modeling the Classes

Stochastic Discrimination

Page 3

Algorithm for Training an SD Classifier

• Generate a projectable weak model
• Evaluate the model w.r.t. the training set; check enrichment
• Check uniformity w.r.t. the existing collection
• Add the model to the discriminant
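In code, this acceptance loop looks roughly as follows. This is a toy sketch under assumptions of my own (2D points, axis-aligned boxes as the projectable weak models, and deliberately simple enrichment and uniformity tests), not the exact published procedure:

```python
import random

random.seed(0)

# Toy training data: class 1 toward the lower-left, class 2 toward the upper-right.
class1 = [(random.uniform(0, 5), random.uniform(0, 5)) for _ in range(50)]
class2 = [(random.uniform(5, 10), random.uniform(5, 10)) for _ in range(50)]

def random_box():
    """A projectable weak model: an axis-aligned box. Being a contiguous
    region, it gives nearby points similar interpretations."""
    x0, y0 = random.uniform(-6, 10), random.uniform(-6, 10)
    return (x0, y0, x0 + random.uniform(2, 6), y0 + random.uniform(2, 6))

def covers(box, p):
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]

def frac(box, pts):
    return sum(covers(box, p) for p in pts) / len(pts)

counts = {p: 0 for p in class1}  # coverage so far of each class-1 point
models = []
while len(models) < 100:
    m = random_box()
    # Enrichment: the model must cover a larger fraction of class 1 than class 2.
    if frac(m, class1) <= frac(m, class2):
        continue
    # Uniformity (crude version): the model must cover at least one of the
    # class-1 points that the current collection covers least often.
    least = min(counts.values())
    if not any(covers(m, p) for p in class1 if counts[p] == least):
        continue
    models.append(m)
    for p in class1:
        if covers(m, p):
            counts[p] += 1

def Y(p):
    """The discriminant: fraction of accepted models covering p."""
    return sum(covers(m, p) for m in models) / len(models)
```

Because every accepted model covers a strictly larger fraction of class 1 than of class 2, the discriminant Y is on average higher on class-1 points than on class-2 points.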

Page 4

Dealing with Data Geometry:

SD in Practice

Page 5

2D Example

• Adapted from [Kleinberg, PAMI, May 2000]

Page 6

• An “r=1/2” random subset in the feature space that covers ½ of all the points

Page 7

• Watch how many such subsets cover a particular point, say, (2,17)

(2,17)

Page 8

It’s in 0/1 models: Y = 0/1 = 0.00

It’s in 1/2 models: Y = 1/2 = 0.50

It’s in 2/3 models: Y = 2/3 = 0.67

It’s in 3/4 models: Y = 3/4 = 0.75

It’s in 4/5 models: Y = 4/5 = 0.80

It’s in 5/6 models: Y = 5/6 = 0.83

(Figure: the successive subsets, each labeled In or Out for the point (2,17).)

Page 9

It’s in 5/7 models: Y = 5/7 = 0.71

It’s in 6/8 models: Y = 6/8 = 0.75

It’s in 7/9 models: Y = 7/9 = 0.78

It’s in 8/10 models: Y = 8/10 = 0.80

It’s in 8/11 models: Y = 8/11 = 0.73

It’s in 8/12 models: Y = 8/12 = 0.67

(Figure: further subsets, each labeled In or Out for the point (2,17).)

Page 10

• Fraction of “r=1/2” random subsets covering point (2,17) as more such subsets are generated
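The convergence shown here is just the law of large numbers, and is easy to reproduce. A small simulation sketch (the finite grid standing in for the feature space is my own simplification):

```python
import random

random.seed(1)

# A 20x20 grid of points stands in for the feature space.
points = [(x, y) for x in range(20) for y in range(20)]
target = (2, 17)

covered = 0
fractions = []
for t in range(1, 5001):
    # One "r = 1/2" model: a random subset containing half of all points.
    subset = set(random.sample(points, len(points) // 2))
    if target in subset:
        covered += 1
    fractions.append(covered / t)

# The running fraction is noisy at first and settles near r = 0.5.
print(fractions[9], fractions[-1])
```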

Page 11

• Fractions of “r=1/2” random subsets covering several selected points as more such subsets are generated

Page 12

• Distribution of model coverage for all points in space, with 100 models

Page 13

• Distribution of model coverage for all points in space, with 200 models

Page 14

• Distribution of model coverage for all points in space, with 300 models

Page 15

• Distribution of model coverage for all points in space, with 400 models

Page 16

• Distribution of model coverage for all points in space, with 500 models

Page 17

• Distribution of model coverage for all points in space, with 1000 models

Page 18

• Distribution of model coverage for all points in space, with 2000 models

Page 19

• Distribution of model coverage for all points in space, with 5000 models

Page 20

• Introducing enrichment:

For any discrimination to happen, the models must have some difference in coverage for different classes.

Page 21

• Enforcing enrichment (adding in a bias): require each subset to cover more points of one class than another

(Figures: the class distribution, and a biased (enriched) weak model.)

Page 22

• Distribution of model coverage for points in each class, with 100 enriched weak models

Page 23

• Distribution of model coverage for points in each class, with 200 enriched weak models

Page 24

• Distribution of model coverage for points in each class, with 300 enriched weak models

Page 25

• Distribution of model coverage for points in each class, with 400 enriched weak models

Page 26

• Distribution of model coverage for points in each class, with 500 enriched weak models

Page 27

• Distribution of model coverage for points in each class, with 1000 enriched weak models

Page 28

• Distribution of model coverage for points in each class, with 2000 enriched weak models

Page 29

• Distribution of model coverage for points in each class, with 5000 enriched weak models

Page 30

• The error rate decreases as the number of models increases

Decision rule: if Y < 0.5 then class 2, else class 1
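This declining error rate can be reproduced with a small simulation. The grid, class layout, enrichment threshold (at least 30 of a model's 50 points from class 1), and model counts below are illustrative choices of mine, not the slides' exact setup:

```python
import random

random.seed(3)

# Feature space: a 10x10 grid; the two classes occupy the two halves.
space = [(x, y) for x in range(10) for y in range(10)]
class1 = [p for p in space if p[0] < 5]
class2 = [p for p in space if p[0] >= 5]

def enriched_model():
    """An r = 1/2 subset, kept only if clearly biased toward class 1."""
    while True:
        m = set(random.sample(space, 50))
        if len(m & set(class1)) >= 30:
            return m

models = [enriched_model() for _ in range(500)]

def error_rate(n):
    """Training error of the first n models under the slide's rule:
    class 2 if Y < 0.5, else class 1."""
    errors = 0
    for p in class1:
        Y = sum(p in m for m in models[:n]) / n
        errors += Y < 0.5
    for p in class2:
        Y = sum(p in m for m in models[:n]) / n
        errors += Y >= 0.5
    return errors / len(space)

print(error_rate(10), error_rate(500))
```

With a handful of models, many points sit on the wrong side of Y = 0.5 by chance; with hundreds, each point's Y concentrates around its expected value and the error rate drops.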

Page 31

• Sparse Training Data:

Incomplete knowledge about class distributions

Training Set Test Set

Page 32

• Distribution of model coverage for points in each class, with 100 enriched weak models

Training Set Test Set

Page 33

• Distribution of model coverage for points in each class, with 200 enriched weak models

Training Set Test Set

Page 34

• Distribution of model coverage for points in each class, with 300 enriched weak models

Training Set Test Set

Page 35

• Distribution of model coverage for points in each class, with 400 enriched weak models

Training Set Test Set

Page 36

• Distribution of model coverage for points in each class, with 500 enriched weak models

Training Set Test Set

Page 37

• Distribution of model coverage for points in each class, with 1000 enriched weak models

Training Set Test Set

Page 38

• Distribution of model coverage for points in each class, with 2000 enriched weak models

Training Set Test Set

Page 39

• Distribution of model coverage for points in each class, with 5000 enriched weak models

Training Set Test Set

No discrimination!

Page 40

• Models of this type, when enriched for the training set, are not necessarily enriched for the test set

Training Set Test Set

Random model with 50% coverage of space

Page 41

• Introducing projectability:

Maintain local continuity of class interpretations.

Neighboring points of the same class should share similar model coverage.

Page 42

• Allow some local continuity in model membership, so that the interpretation of a training point can generalize to its immediate neighborhood

(Figures: the class distribution, and a projectable model.)
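The difference between an arbitrary subset and a projectable model can be made concrete: with a contiguous model, two neighboring points usually fall in or out of the model together. A toy comparison (the grid and box model are my own illustration):

```python
import random

random.seed(4)

space = [(x, y) for x in range(10) for y in range(10)]

def random_subset():
    """Non-projectable model: an arbitrary half of the points."""
    return set(random.sample(space, 50))

def random_box():
    """Projectable model: a contiguous 7x7 region (about half the space)."""
    x0, y0 = random.randint(0, 4), random.randint(0, 4)
    return {(x, y) for (x, y) in space if x0 <= x < x0 + 7 and y0 <= y < y0 + 7}

def agreement(model_fn, a, b, trials=2000):
    """How often points a and b get the same membership across models."""
    return sum((a in m) == (b in m)
               for m in (model_fn() for _ in range(trials))) / trials

a_subset = agreement(random_subset, (3, 3), (3, 4))  # near 0.5: no continuity
a_box = agreement(random_box, (3, 3), (3, 4))        # much higher: neighbors agree
print(a_subset, a_box)
```

High neighbor agreement is what lets a training point's interpretation carry over to unseen points nearby.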

Page 43

• Distribution of model coverage for points in each class, with 100 enriched, projectable weak models

Training Set Test Set

Page 44

• Distribution of model coverage for points in each class, with 300 enriched, projectable weak models

Training Set Test Set

Page 45

• Distribution of model coverage for points in each class, with 400 enriched, projectable weak models

Training Set Test Set

Page 46

• Distribution of model coverage for points in each class, with 500 enriched, projectable weak models

Training Set Test Set

Page 47

• Distribution of model coverage for points in each class, with 1000 enriched, projectable weak models

Training Set Test Set

Page 48

• Distribution of model coverage for points in each class, with 2000 enriched, projectable weak models

Training Set Test Set

Page 49

• Distribution of model coverage for points in each class, with 5000 enriched, projectable weak models

Training Set Test Set

Page 50

• Promoting uniformity:

All points in the same class should be equally likely to be covered by a model of any given rating.

Retain models that cover points that the current collection covers least.
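The effect of such an acceptance test can be seen in a small simulation: with the filter, the least-covered points catch up. The grid, box models, and counts below are illustrative choices of mine:

```python
import random

random.seed(5)

space = [(x, y) for x in range(10) for y in range(10)]
class1 = [p for p in space if p[0] + p[1] < 10]

def random_box():
    x0, y0 = random.randint(0, 6), random.randint(0, 6)
    return {(x, y) for (x, y) in space if x0 <= x < x0 + 5 and y0 <= y < y0 + 5}

def build(n_models, uniform):
    """Collect models, optionally filtered by the uniformity test."""
    counts = {p: 0 for p in class1}
    kept = 0
    while kept < n_models:
        m = random_box()
        if uniform:
            # Retain the model only if it covers at least one of the
            # class-1 points the collection currently covers least.
            least = min(counts.values())
            if not any(counts[p] == least for p in class1 if p in m):
                continue
        kept += 1
        for p in class1:
            if p in m:
                counts[p] += 1
    return counts

plain = build(300, uniform=False)
unif = build(300, uniform=True)
min_plain, min_unif = min(plain.values()), min(unif.values())
print(min_plain, min_unif)  # the filter raises the worst-covered point's count
```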

Page 51

• Distribution of model coverage for points in each class, with 100 enriched, projectable, uniform weak models

Training Set Test Set

Page 52

• Distribution of model coverage for points in each class, with 1000 enriched, projectable, uniform weak models

Training Set Test Set

Page 53

• Distribution of model coverage for points in each class, with 5000 enriched, projectable, uniform weak models

Training Set Test Set

Page 54

• Distribution of model coverage for points in each class, with 10000 enriched, projectable, uniform weak models

Training Set Test Set

Page 55

• Distribution of model coverage for points in each class, with 50000 enriched, projectable, uniform weak models

Training Set Test Set

Page 56

The 3 necessary conditions

• Enrichment: discriminating power
• Uniformity: complementary information
• Projectability: generalization power

Page 57

Extensions and Comparisons

Page 58

Alternative Discriminants

• [Berlind 1994]

• Different discriminants for N-class problems

• Additional condition on symmetry

• Approximate uniformity

• Hierarchy of indiscernibility

Page 59

Estimates of Classification Accuracies

• [Chen 1997]

• Statistical estimate of classification accuracy under weaker conditions:
– Approximate uniformity
– Approximate indiscernibility

Page 60

Multi-class Problems

• For n classes, define n discriminants Yi, one for each class i vs. the others
• Classify an unknown point to the class i for which the computed Yi is the largest
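This decision rule is just an argmax over the per-class discriminants. A sketch (the stand-in Y_i functions below are hypothetical, for illustration only):

```python
def classify(point, discriminants):
    """discriminants: one function Y_i(point) -> float per class."""
    scores = [Y(point) for Y in discriminants]
    return scores.index(max(scores))  # class with the largest Y_i

# Three toy stand-ins for trained discriminants over a 1-D feature:
Ys = [lambda p: 0.8 - 0.1 * p,  # favors small feature values
      lambda p: 0.5,            # indifferent
      lambda p: 0.1 * p]        # favors large feature values

print(classify(1, Ys))  # Y_0 = 0.7 is largest, so class 0
print(classify(9, Ys))  # Y_2 = 0.9 is largest, so class 2
```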

Page 61

[Ho & Kleinberg ICPR 1996]

Pages 62-64: (result figures, no transcribed text)

Page 65

Open Problems

• Algorithm for uniformity enforcement: deterministic methods?

• Desirable form of weak models: fewer, more sophisticated classifiers?

• Other ways to address the 3-way trade-off: enrichment / uniformity / projectability

Page 66

Random Decision Forest

• [Ho 1995, 1998]

• A structured way to create models: fully split a tree, use leaves as models

• Perfect enrichment and uniformity on the training set

• Promote projectability by subspace projection
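A toy version of the idea, under heavy simplifications of my own (two continuous features, one randomly chosen feature per tree as the "subspace", random thresholds, and recursion until every leaf is pure):

```python
import random

random.seed(6)

def build_tree(data, features):
    """Fully split: recurse until each leaf holds a single class.
    data: list of ((f0, f1), label); assumes distinct feature values."""
    labels = {c for _, c in data}
    if len(labels) == 1:
        return ("leaf", labels.pop())
    f = random.choice(features)
    vals = sorted({x[f] for x, _ in data})
    t = random.choice(vals[:-1])  # threshold strictly below the maximum
    left = [(x, c) for x, c in data if x[f] <= t]
    right = [(x, c) for x, c in data if x[f] > t]
    return ("split", f, t, build_tree(left, features), build_tree(right, features))

def predict(tree, x):
    while tree[0] == "split":
        _, f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree[1]

data = ([((random.random(), random.random()), 0) for _ in range(30)] +
        [((random.random() + 0.7, random.random() + 0.7), 1) for _ in range(30)])

# Each tree sees a random one-feature subspace (promoting projectability).
forest = [build_tree(data, features=[random.randint(0, 1)]) for _ in range(25)]

def vote(x):
    preds = [predict(t, x) for t in forest]
    return max(set(preds), key=preds.count)

# Leaves are pure, so every training point is classified correctly by
# every tree: perfect enrichment and uniformity on the training set.
```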

Page 67

Compact Distribution Maps

• [Ho & Baird 1993, 1997]

• Another structured way to create models

• Start with projectable models by coarse quantization of feature value range

• Seek enrichment and uniformity

(Figure: signatures of two types of events, and measurements from a new observation, plotted as signal level against signal index.)

Page 68

SD & Other Ensemble Methods

• Ensemble learning via boosting:

A sequential way to promote uniformity of ensemble element coverage

• XCS (a genetic algorithm)

A way to create, filter, and use stochastic models that are regions in feature space

Page 69

XCS Classifier System

• [Wilson, 1995]; a recent focus of the GA community

• Good performance

• Combines reinforcement learning and genetic algorithms

• Model: a set of rules

(Diagram: the environment supplies an input; the rule set produces a class and receives a reward; reinforcement learning updates the rules while the genetic algorithm searches for new ones.)

Example rules:
if (shape=square and number>10) then class=red
if (shape=circle and number<5) then class=yellow

Page 70

Multiple Classifier Systems: Examples in Word Image Recognition

Page 71

Complementary Strengths of Classifiers

The case for classifier combination

… decision fusion

… mixture of experts

… committee decision making

Rank of true class out of a lexicon of 1091 words, by 10 classifiers for 20 images

Page 72

Classifier Combination Methods

• Decision Optimization:

find consensus among a given set of classifiers

• Coverage Optimization:

create a set of classifiers that work best with a given decision combination function

Page 73

Decision Optimization

• Develop classifiers with expert knowledge
• Try to make the best use of their decisions via majority/plurality vote, sum/product rule, probabilistic methods, Bayesian methods, rank/confidence score combination, …

• The joint capability of the classifiers sets an intrinsic limit on the combined accuracy

• There is no way to handle the blind spots
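Two of the simplest combination functions mentioned above can be sketched directly (the toy predictions and scores are invented for illustration):

```python
def majority_vote(predictions):
    """predictions: one class label per classifier."""
    return max(set(predictions), key=predictions.count)

def sum_rule(score_lists):
    """score_lists: one score vector (a score per class) per classifier."""
    totals = [sum(col) for col in zip(*score_lists)]
    return totals.index(max(totals))

# Three hypothetical classifiers, three classes:
preds = ["a", "b", "a"]
scores = [[0.6, 0.3, 0.1],
          [0.2, 0.5, 0.3],
          [0.5, 0.4, 0.1]]

print(majority_vote(preds))  # "a" (two of three classifiers agree)
print(sum_rule(scores))      # class 0: column totals 1.3, 1.2, 0.5
```

Note that neither rule can recover a class that every component classifier scores poorly, which is the "blind spot" limitation above.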

Page 74

Difficulties in Decision Optimization

• Reliability versus overall accuracy

• Fixed or trainable combination function

• Simple models or combinatorial estimates

• How to model complementary behavior

Page 75

Coverage Optimization

• Fix a decision combination function
• Generate classifiers automatically and systematically via training-set sub-sampling (stacking, bagging, boosting), subspace projection (RSM), superclass/subclass decomposition (ECOC), random perturbation of training processes, noise injection, …

• Need enough classifiers to cover all blind spots (how many are enough?)

• What else is critical?

Page 76

Difficulties in Coverage Optimization

• What kind of differences to introduce:
– Subsamples? Subspaces? Super/subclasses?
– Training parameters?
– Model geometry?

• 3-way tradeoff: discrimination + diversity + generalization

• Effects of the form of component classifiers

Page 77

Dilemmas and Paradoxes in Classifier Combination

• Weaken individuals for a stronger whole?

• Sacrifice known samples for unseen cases?

• Seek agreements or differences?

Page 78

Stochastic Discrimination

• A mathematical theory that relates several key concepts in pattern recognition:

– Discriminative power … enrichment
– Complementary information … uniformity
– Generalization power … projectability

• It offers a way to describe the complementary behavior of classifiers

• It offers guidelines for designing multiple classifier systems (classifier ensembles)