Data Complexity Analysis for Classifier Combination
Tin Kam Ho, Bell Laboratories, Lucent Technologies

DESCRIPTION
Outline: Classifier Combination (motivation, methods, difficulties); Data Complexity Analysis (motivation, methods, early results).

TRANSCRIPT
Tin Kam Ho Bell Laboratories
Lucent Technologies
Outline
• Classifier Combination: motivation, methods, difficulties
• Data Complexity Analysis: motivation, methods, early results
Supervised Classification: Discrimination Problems
An Ill-Posed Problem
Where Were We in the Late 1990’s?
• Statistical methods: Bayesian classifiers, polynomial discriminators, nearest neighbors, decision trees, neural networks, support vector machines, …
• Syntactic methods: regular grammars, context-free grammars, attributed grammars, stochastic grammars, …
• Structural methods: graph matching, elastic matching, rule-based systems, …
Classifiers
• Competition among different …
– choices of features
– feature representations
– classifier designs
• Chosen by heuristic judgements
• No clear winners
Classifier Combination Methods
• Decision optimization methods: find consensus from a given set of classifiers
– majority/plurality vote, sum/product rule
– probability models, Bayesian approaches
– logistic regression on ranks or scores
– classifiers trained on confidence scores
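The fixed decision-optimization rules are simple to sketch. Below is a minimal NumPy illustration of the plurality vote and the sum rule; the function names `majority_vote` and `sum_rule` are mine, not from the talk.

```python
import numpy as np

def majority_vote(label_preds):
    """Plurality vote over per-classifier label predictions.

    label_preds: array of shape (n_classifiers, n_samples), integer labels.
    """
    n_classes = label_preds.max() + 1
    # count votes per class for each sample (column)
    votes = np.apply_along_axis(np.bincount, 0, label_preds, minlength=n_classes)
    return votes.argmax(axis=0)

def sum_rule(score_preds):
    """Sum rule over per-classifier posterior scores.

    score_preds: array of shape (n_classifiers, n_samples, n_classes).
    """
    return score_preds.sum(axis=0).argmax(axis=1)

# three classifiers, four samples
labels = np.array([[0, 1, 1, 2],
                   [0, 1, 2, 2],
                   [1, 1, 1, 0]])
print(majority_vote(labels))   # -> [0 1 1 2]
```

The sum rule expects calibrated scores from all members; the vote needs only labels, which is why it is the usual baseline when confidence outputs are incompatible across classifiers.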
Classifier Combination Methods
• Coverage optimization methods
– subsampling methods: stacking, bagging, boosting
– subspace methods: random subspace projection, localized selection
– superclass/subclass methods: mixture of experts, error-correcting output codes
– perturbation in training
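As a coverage-optimization example, the random subspace method trains each ensemble member on a random subset of the features. The sketch below uses a toy nearest-centroid learner as a stand-in for the usual decision-tree base classifier; all names and parameters are illustrative, not from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_nearest_centroid(X, y):
    """Toy base learner: one mean vector per class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_nearest_centroid(model, X):
    classes, centroids = model
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

def random_subspace_ensemble(X, y, n_members=25, subspace_dim=2):
    """Train each member on a random subset of the features."""
    members = []
    for _ in range(n_members):
        dims = rng.choice(X.shape[1], size=subspace_dim, replace=False)
        members.append((dims, fit_nearest_centroid(X[:, dims], y)))
    return members

def predict_ensemble(members, X):
    # each member votes; plurality wins per sample
    votes = np.array([predict_nearest_centroid(m, X[:, dims])
                      for dims, m in members])
    return np.array([np.bincount(col).argmax() for col in votes.T])

# toy usage: two well-separated Gaussian classes in 5 dimensions
Xa = rng.normal(0.0, 1.0, (100, 5))
Xb = rng.normal(4.0, 1.0, (100, 5))
X, y = np.vstack([Xa, Xb]), np.array([0] * 100 + [1] * 100)
members = random_subspace_ensemble(X, y)
print((predict_ensemble(members, X) == y).mean())   # close to 1.0
```

Swapping the feature subsampling for bootstrap resampling of the training points turns the same skeleton into bagging; the two differ only in what kind of difference is introduced among members.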
Layers of Choices
Best Features?
Best Classifier?
Best Combination Method?
Best (combination of)* combination methods?
Before We Start …
• Single or multiple classifiers? – accuracy vs efficiency
• Feature or decision combination? – compatible units, common metric
• Sequential or parallel classification? – efficiency, classifier specialties
Difficulties in Decision Optimization
• Reliability versus overall accuracy
• Fixed or trainable combination function
• Simple models or combinatorial estimates
• How to model complementary behavior
Difficulties in Coverage Optimization
• What kind of differences to introduce:
– Subsamples? Subspaces? Subclasses?
– Training parameters?
– Model geometry?
• 3-way tradeoff: discrimination + uniformity + generalization
• Effects of the form of component classifiers
Dilemmas and Paradoxes
• Weaken individuals for a stronger whole?
• Sacrifice known samples for unseen cases?
• Seek agreements or differences?
Model of Complementary Decisions
• Statistical independence of decisions:assumed or observed?
• Collective vs point-wise error estimates
• Related estimates of neighboring samples
Stochastic Discrimination
• Set-theoretic abstraction
• Probabilities in model or feature spaces
• Enrichment / Uniformity / Projectability
• Convergence by law of large numbers
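The averaging idea behind stochastic discrimination can be sketched in a few lines: generate random weak models, keep only those "enriched" for one class, and average their membership indicators so the law of large numbers concentrates the resulting discriminant. This is a much-simplified illustration of my own devising; it omits the uniformity enforcement the full method requires, and every name in it is mine.

```python
import numpy as np

rng = np.random.default_rng(1)

# two overlapping 2-D Gaussian classes (toy data)
X1 = rng.normal([0, 0], 1.0, (200, 2))
X2 = rng.normal([2, 2], 1.0, (200, 2))

def random_halfspace():
    """A 'weak model': the indicator function of a random half-plane."""
    w = rng.normal(size=2)
    b = rng.normal() * 3
    return lambda X: (X @ w + b > 0).astype(float)

models = []
while len(models) < 500:
    m = random_halfspace()
    # enrichment: keep only models that cover class 1 more than class 2
    if m(X1).mean() > m(X2).mean():
        models.append(m)

def discriminant(X):
    # fraction of retained models covering each point; as the number of
    # models grows this concentrates around a class-dependent mean
    return np.mean([m(X) for m in models], axis=0)

# class-1 points score higher on average than class-2 points
print(discriminant(X1).mean() > discriminant(X2).mean())   # -> True
```

Without a uniformity step the discriminant is biased toward regions many half-planes happen to cover, which is exactly the gap between this sketch and the actual algorithm.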
Stochastic Discrimination
• Algorithm for uniformity enforcement
• Fewer, but more sophisticated classifiers
• Other ways to address the 3-way trade-off
Geometry vs Probability
• Geometry of classifiers
• Rule of generalization to unseen samples
• Assumption of representative samples → optimistic error bounds
• Distribution-free arguments → pessimistic error bounds
Sampling Density
[Figure: the same distribution sampled at N = 2, 10, 100, 500, and 1000 points]
Summary of Difficulties
• Many theories have inadequate assumptions
• Geometry and probability lack connection
• Combinatorics defies detailed modeling
• Attempts to cover all cases give weak results
• Empirical results overly specific to problems
• Lack of systematic organization of evidence
More Questions
• How do confidence scores differ from feature values?
• Is combination a convenience or a necessity?
• What is common among various combination methods?
• When should the combination hierarchy terminate?
Data Dependent Behavior of Classifiers
• Different classifiers excel in different problems
• So do combined systems
• This complicates theories and interpretation of observations
Questions to ask:
• Does this method work for all problems?
• Does this method work for this problem?
• Does this method work for this type of problems?
Study the interaction of data and classifiers
Characterization of Data and Classifier Behavior in a Common Language
Sources of Difficultyin Classification
• Class ambiguity
• Boundary complexity
• Sample size and dimensionality
Class Ambiguity
• Is the problem intrinsically ambiguous?
• Are the classes well defined?
• What is the information content of the features?
• Are the features sufficient for discrimination?
Boundary Complexity
• Kolmogorov complexity
• Length may be exponential in dimensionality
• Trivial description: list all points, class labels
• Is there a shorter description?
Sample Size & Dimensionality
• Problem may appear deceptively simple or complex with small samples
• Large degree of freedom in high-dimensional spaces
• Representativeness of samples vs. generalization ability of classifiers
Mixture of Effects
• Real problems often have mixed effects of:
– class ambiguity
– boundary complexity
– sample size & dimensionality
• Geometrical complexity of class manifolds coupled with the probabilistic sampling process
Easy or Difficult Problems
• Linearly separable problems
Easy or Difficult Problems
• Random noise
[Figure: random noise sampled at 1000, 500, 100, and 10 points]
Easy or Difficult Problems
• Others: nonlinear boundary, spirals, 4×4 checkerboard, 10×10 checkerboard
Description of Complexity
• What are real-world problems like?
• Need a description of complexity to
– set expectation on recognition accuracy
– characterize behavior of classifiers
• Apparent or true complexity?
Possible Measures
• Separability of classes
– linear separability
– length of class boundary
– intra/inter class scatter and distances
• Discriminating power of features
– Fisher’s discriminant ratio
– overlap of feature values
– feature efficiency
Possible Measures
• Geometry, topology, clustering effects
– curvature of boundaries
– overlap of convex hulls
– packing of points in regular shapes
– intrinsic dimensionality
– density variations
Linear Separability
• Intensively studied in early literature
• Many algorithms only stop with positive conclusions
– Perceptrons, Perceptron Cycling Theorem, 1962
– Fractional Correction Rule, 1954
– Widrow-Hoff Delta Rule, 1960
– Ho-Kashyap algorithm, 1965
– Linear programming, 1968
Length of Class Boundary
• Friedman & Rafsky 1979
– Find the MST (minimum spanning tree) connecting all points regardless of class
– Count edges joining opposite classes
– Sensitive to separability and clustering effects
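The Friedman-Rafsky measure can be computed directly: build the Euclidean MST (here with Prim's algorithm) and count edges joining opposite classes. The function name `boundary_fraction` and the normalization by the number of MST edges are my choices for illustration.

```python
import numpy as np

def boundary_fraction(X, y):
    """Friedman-Rafsky style measure: build the Euclidean MST over all
    points (ignoring labels), then count edges joining opposite classes."""
    n = len(X)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = d[0].copy()            # cheapest connection of each node to the tree
    parent = np.zeros(n, dtype=int)
    cross = 0
    for _ in range(n - 1):
        best_masked = np.where(in_tree, np.inf, best)
        j = int(best_masked.argmin())        # next node joining the tree
        cross += int(y[j] != y[parent[j]])   # the MST edge is (parent[j], j)
        in_tree[j] = True
        closer = d[j] < best                 # update cheapest connections
        parent[closer] = j
        best = np.minimum(best, d[j])
    return cross / (n - 1)   # fraction of MST edges on the class boundary

X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
y = np.array([0, 0, 0, 1, 1, 1])
print(boundary_fraction(X, y))   # -> 0.2  (1 boundary edge out of 5)
```

Well-separated clusters yield a fraction near zero; heavily interleaved classes push it toward the chance level.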
Fisher’s Discriminant Ratio
• Defined for one feature:

  f = (μ₁ − μ₂)² / (σ₁² + σ₂²)

  where μ₁, μ₂, σ₁², σ₂² are the means and variances of classes 1 and 2
• One good feature makes a problem easy
• Take maximum over all features
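Taking the maximum of the per-feature ratio is a one-liner in NumPy. A sketch assuming a two-class set with labels 0 and 1 (the function name is mine):

```python
import numpy as np

def fisher_ratio(X, y):
    """Maximum per-feature Fisher discriminant ratio for a 2-class set:
    f = (mu1 - mu2)^2 / (s1^2 + s2^2), maximized over features."""
    X1, X2 = X[y == 0], X[y == 1]
    f = (X1.mean(axis=0) - X2.mean(axis=0)) ** 2 / (X1.var(axis=0) + X2.var(axis=0))
    return f.max()

# feature 0 is uninformative, feature 1 separates the classes
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 4.0], [1.0, 5.0]])
y = np.array([0, 0, 1, 1])
print(fisher_ratio(X, y))   # -> 32.0
```

Taking the maximum rather than, say, the mean reflects the slide's point: one good feature is enough to make a problem easy.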
Volume of Overlap Region
• Overlap of class manifolds
• Overlap region of each dimension as a fraction of the range spanned by the two classes
• Multiply fractions to estimate volume
• Zero if no overlap
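A sketch of this overlap-volume estimate, assuming two classes labeled 0 and 1 and nonconstant features (so the spanned range is nonzero); the function name is illustrative:

```python
import numpy as np

def overlap_volume(X, y):
    """Per-feature overlap of the two classes' value ranges, as a fraction
    of the range spanned by both classes; fractions multiplied over features."""
    X1, X2 = X[y == 0], X[y == 1]
    lo = np.maximum(X1.min(axis=0), X2.min(axis=0))     # overlap interval start
    hi = np.minimum(X1.max(axis=0), X2.max(axis=0))     # overlap interval end
    span = np.maximum(X1.max(axis=0), X2.max(axis=0)) - \
           np.minimum(X1.min(axis=0), X2.min(axis=0))
    frac = np.clip(hi - lo, 0, None) / span             # 0 where ranges are disjoint
    return float(frac.prod())   # zero if any single feature separates the classes

# class 0 spans [0, 1], class 1 spans [0.5, 1.5]: overlap 0.5 of span 1.5
X = np.array([[0.0], [1.0], [0.5], [1.5]])
y = np.array([0, 0, 1, 1])
print(overlap_volume(X, y))   # -> 0.333...
```

The product form is what makes the measure drop to zero as soon as one feature is fully discriminating, matching the bullet above.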
Convex Hulls & Decision Regions
• Hoekstra & Duin 1996
• Measure nonlinearity of a classifier w.r.t. a given dataset
• Sensitive to smoothness of decision boundaries
Shapes of Class Manifolds
• Lebourgeois & Emptoz 1996
• Packing of same-class points in hyperspheres
• Thick and spherical, or thin and elongated manifolds
Measures of Geometrical Complexity
Space of Complexity Measures
• Single measure may not suffice
• Make a measurement space
• See where datasets are in this space
• Look for a continuum of difficulty, from the easiest to the most difficult cases
Data Sets: UCI
• UC-Irvine collection
• 14 datasets (no missing values, > 500 pts)
• 844 two-class problems
• 452 linearly separable
• 392 linearly nonseparable
• 2 - 4648 points each
• 8 - 480 dimensional feature spaces
Data Sets: Random Noise
• Randomly located and labeled points
• 100 artificial problems
• 1 to 100 dimensional feature spaces
• 2 classes, 1000 points per class
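Generating such a noise problem is straightforward; by construction the labels carry no information about the point locations, so any apparent structure is illusory. The function name and the uniform sampling range are my choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def random_noise_problem(dim, n_per_class=1000):
    """Randomly located and randomly labeled points: a 2-class problem
    whose true error rate is 50% regardless of the classifier."""
    X = rng.uniform(0, 1, (2 * n_per_class, dim))
    y = rng.permutation([0] * n_per_class + [1] * n_per_class)
    return X, y

X, y = random_noise_problem(dim=10)
print(X.shape, y.mean())   # -> (2000, 10) 0.5
```

Measuring complexity on these sets gives the "apparent complexity" baseline referred to later: the values a measure takes when there is truly nothing to learn.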
Patterns in Measurement Space
Correlated or Uncorrelated Measures
[Plots: Separation; Separation + Scatter]
Observations
• Noise sets and linearly separable sets occupy opposite ends in many dimensions
• In-between positions tell relative difficulty
• Fan-like structure in most plots
• At least 2 independent factors, joint effects
• Noise sets are far from real data
• Ranges of noise sets: apparent complexity
Principal Component Analysis
Principal Component Analysis
A Trajectory of Difficulty
What Else Can We Do?
• Find clusters in this space
• Determine intrinsic dimensionality
Study effectiveness of these measures
Interpret problem distributions
What Else Can We Do?
• Study specific domains with these measures
• Study alternative formulations, subproblems induced by localization, projection, transformation
Apply these measures to more problems
What Else Can We Do?
Relate complexity measures to classifier behavior
Bagging vs Random Subspaces for Decision Forests
[Plot: length of class boundary vs. Fisher’s discriminant ratio, with regions marked “Subspaces better” and “Subsampling better”]
Bagging vs Random Subspaces for Decision Forests
[Plots: nonlinearity of nearest-neighbor vs. linear classifier; % retained adherence subsets; intra/inter class NN distances]
Error Rates of Individual Classifiers
[Plots: error rate of nearest-neighbor vs. linear classifier; error rate of single trees vs. sampling density]
Observations
• Both types of forests are good for problems of various degrees of difficulty
• Neither is good for extremely difficult cases:
– many points on the boundary
– ratio of intra/inter class NN distances close to 1
– low Fisher’s discriminant ratio
– high nonlinearity of NN or LP classifiers
• Subsampling is preferable for sparse samples
• Subspaces are preferable for smooth boundaries
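The intra/inter class nearest-neighbor distance ratio cited above is easy to compute; here is a brute-force sketch (O(n²) memory, illustrative function name) for a two-class set:

```python
import numpy as np

def nn_distance_ratio(X, y):
    """Ratio of average intra-class to average inter-class nearest-neighbor
    distance; values near 1 indicate heavily interleaved classes."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbor
    same = y[:, None] == y[None, :]
    intra = np.where(same, d, np.inf).min(axis=1)   # NN within own class
    inter = np.where(same, np.inf, d).min(axis=1)   # NN in the other class
    return intra.mean() / inter.mean()

# well-separated classes give a ratio far below 1
X = np.array([[0.0], [0.1], [10.0], [10.1]])
y = np.array([0, 0, 1, 1])
print(nn_distance_ratio(X, y))   # small, about 0.01
```

On random-noise problems the ratio drifts toward 1, which is how it flags the "extremely difficult" regime where neither forest type helps.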
Conclusions
• Real-world problems have different types of geometric characteristics
• Relevant measures can be related to classifier accuracies
• Data complexity analysis improves understanding of classifier or combination behavior
• Helpful for combination theories and practices