cvpr2007 object category recognition p3 - discriminative models
TRANSCRIPT
Part 3: classifier based methodsAntonio Torralba
Overview of section• A short story of discriminative methods
• Object detection with classifiers
• Boosting– Gentle boosting– Weak detectors– Object model– Object detection
• Multiclass object detection
• Context based object recognition
Classifier based methodsObject detection and recognition is formulated as a classification problem.
Bag of image patches
… and a decision is taken at each window about if it contains a target object or not.
Decision boundary
Computer screen
Background
In some feature space
Where are the screens?
The image is partitioned into a set of overlapping windows
(The lousy painter)
Discriminative vs. generative
0 10 20 30 40 50 60 700
0.05
0.1
x = data
• Generative model
0 10 20 30 40 50 60 700
0.5
1
x = data
• Discriminative model
0 10 20 30 40 50 60 70 80
-1
1
x = data
• Classification function
(The artist)
• The representation and matching of pictorial structures Fischler, Elschlager (1973). • Face recognition using eigenfaces M. Turk and A. Pentland (1991). • Human Face Detection in Visual Scenes - Rowley, Baluja, Kanade (1995) • Graded Learning for Object Detection - Fleuret, Geman (1999) • Robust Real-time Object Detection - Viola, Jones (2001)• Feature Reduction and Hierarchy of Classifiers for Fast Object Detection in Video Images - Heisele, Serre, Mukherjee, Poggio (2001)•….
• The representation and matching of pictorial structures Fischler, Elschlager (1973). • Face recognition using eigenfaces M. Turk and A. Pentland (1991). • Human Face Detection in Visual Scenes - Rowley, Baluja, Kanade (1995) • Graded Learning for Object Detection - Fleuret, Geman (1999) • Robust Real-time Object Detection - Viola, Jones (2001)• Feature Reduction and Hierarchy of Classifiers for Fast Object Detection in Video Images - Heisele, Serre, Mukherjee, Poggio (2001)•….
Face detection
• The representation and matching of pictorial structures Fischler, Elschlager (1973). • Face recognition using eigenfaces M. Turk and A. Pentland (1991). • Human Face Detection in Visual Scenes - Rowley, Baluja, Kanade (1995) • Graded Learning for Object Detection - Fleuret, Geman (1999) • Robust Real-time Object Detection - Viola, Jones (2001)• Feature Reduction and Hierarchy of Classifiers for Fast Object Detection in Video Images - Heisele, Serre, Mukherjee, Poggio (2001)•….
Face detection
• Formulation: binary classification
Formulation
+1-1
x1 x2 x3 xN
…
… xN+1 xN+2 xN+M
-1 -1 ? ? ?
…
Training data: each image patch is labeledas containing the object or background
Test data
Features x =
Labels y =
Where belongs to some family of functions
• Classification function
• Minimize misclassification error(Not that simple: we need some guarantees that there will be generalization)
Discriminative methods
106 examples
Nearest neighbor
Shakhnarovich, Viola, Darrell 2003Berg, Berg, Malik 2005…
Neural networks
LeCun, Bottou, Bengio, Haffner 1998Rowley, Baluja, Kanade 1998…
Support Vector Machines and Kernels Conditional Random Fields
McCallum, Freitag, Pereira 2000Kumar, Hebert 2003…
Guyon, VapnikHeisele, Serre, Poggio, 2001…
A simple object detector with Boosting
Download
• Toolbox for manipulating dataset
• Code and dataset
Matlab code
• Gentle boosting
• Object detector using a part based model
Dataset with cars and computer monitors
http://people.csail.mit.edu/torralba/iccv2005/
• A simple algorithm for learning robust classifiers– Freund & Shapire, 1995– Friedman, Hastie, Tibshhirani, 1998
• Provides efficient algorithm for sparse visual feature selection– Tieu & Viola, 2000– Viola & Jones, 2003
• Easy to implement, not requires external optimization tools.
Why boosting?
Boosting
Boosting fits the additive model
by minimizing the exponential loss
Training samples
The exponential loss is a differentiable upper bound to the misclassification error.
Boosting
Sequential procedure. At each step we add
For more details: Friedman, Hastie, Tibshirani. “Additive Logistic Regression: a Statistical View of Boosting” (1998)
to minimize the residual loss
inputDesired outputParametersweak classifier
Weak classifiers • The input is a set of weighted training
samples (x,y,w)
• Regression stumps: simple but commonly used in object detection.
Four parameters:
b=Ew(y [x> ])
a=Ew(y [x< ])x
fm(x)
Flavors of boosting
• AdaBoost (Freund and Shapire, 1995)
• Real AdaBoost (Friedman et al, 1998)
• LogitBoost (Friedman et al, 1998)
• Gentle AdaBoost (Friedman et al, 1998)
• BrownBoosting (Freund, 2000)
• FloatBoost (Li et al, 2002)
• …
From images to features:A myriad of weak detectors
We will now define a family of visual features that can be used as weak classifiers (“weak detectors”)
Takes image as input and the output is binary response.The output is a weak detector.
A myriad of weak detectors
• Yuille, Snow, Nitzbert, 1998• Amit, Geman 1998• Papageorgiou, Poggio, 2000• Heisele, Serre, Poggio, 2001• Agarwal, Awan, Roth, 2004• Schneiderman, Kanade 2004 • Carmichael, Hebert 2004• …
Weak detectors
Textures of textures Tieu and Viola, CVPR 2000
Every combination of three filters generates a different feature
This gives thousands of features. Boosting selects a sparse subset, so computations on test time are very efficient. Boosting also avoids overfitting to some extend.
Haar wavelets
Haar filters and integral imageViola and Jones, ICCV 2001
The average intensity in the block is computed with four sums independently of the block size.
Haar waveletsPapageorgiou & Poggio (2000)
Polynomial SVM
Edges and chamfer distance
Gavrila, Philomin, ICCV 1999
Edge fragments
Weak detector = k edge fragments and threshold. Chamfer distance uses 8 orientation planes
Opelt, Pinz, Zisserman, ECCV 2006
Histograms of oriented gradients
• Dalal & Trigs, 2006
• Shape context
Belongie, Malik, Puzicha, NIPS 2000• SIFT, D. Lowe, ICCV 1999
Weak detectors
Part based: similar to part-based generative models. We create weak detectors by using parts and voting for the object center location
Car model Screen model
These features are used for the detector on the course web site.
Weak detectors
First we collect a set of part templates from a set of training objects.
Vidal-Naquet, Ullman, Nature Neuroscience 2003
…
Weak detectors
We now define a family of “weak detectors” as:
= =
Better than chance
*
Weak detectors
We can do a better job using filtered images
Still a weak detectorbut better than before
* * ===
TrainingFirst we evaluate all the N features on all the training images.
Then, we sample the feature outputs on the object center and at random locations in the background:
Representation and object model
…
4 10
Selected features for the screen detector
1 2 3
…
100
Lousy painter
Representation and object modelSelected features for the car detector
1 2 3 4 10 100
… …
Detection• Invariance: search strategy
• Part based
fi, Pigi
Here, invariance in translation and scale is achieved by the search strategy: the classifier is evaluated at all locations (by translating the image) and at all scales (by scaling the image in small steps).
The search cost can be reduced using a cascade.
Example: screen detectionFeature output
Example: screen detectionFeature output
Thresholded output
Weak ‘detector’Produces many false alarms.
Example: screen detectionFeature output
Thresholded output
Strong classifier at iteration 1
Example: screen detectionFeature output
Thresholded output
Strongclassifier
Second weak ‘detector’Produces a different set of false alarms.
Example: screen detection
+
Feature output
Thresholded output
Strongclassifier
Strong classifier at iteration 2
Example: screen detection
+
…
Feature output
Thresholded output
Strongclassifier
Strong classifier at iteration 10
Example: screen detection
+
…
Feature output
Thresholded output
Strongclassifier
Adding features
Finalclassification
Strong classifier at iteration 200
We want the complexity of the 3 features classifier with the performance of the 100 features classifier:
Cascade of classifiersFleuret and Geman 2001, Viola and Jones 2001
Recall
Precision
0% 100%
100%
3 features
30 features
100 features
Select a threshold with high recall for each stage.
We increase precision using the cascade
Some goals for object recognition
• Able to detect and recognize many object classes
• Computationally efficient
• Able to deal with data starving situations:– Some training samples might be harder to
collect than others– We want on-line learning to be fast
Multiclass object detection
Multiclass object detection
Number of classes
Amount of labeled data
Number of classes
Complexity
Shared features• Is learning the object class 1000 easier
than learning the first?
• Can we transfer knowledge from one object to another?
• Are the shared properties interesting by themselves?
…
Multitask learning
•horizontal location of doorknob •single or double door•horizontal location of doorway center •width of doorway•horizontal location of left door jamb
•horizontal location of right door jamb•width of left door jamb •width of right door jamb•horizontal location of left edge of door •horizontal location of right edge of door
Primary task: detect door knobs
Tasks used:
R. Caruana. Multitask Learning. ML 1997
Sharing invariancesS. Thrun. Is Learning the n-th Thing Any Easier Than Learning The First? NIPS 1996
“Knowledge is transferred between tasks via a learned model of the invariances of the domain: object recognition is invariant to rotation, translation, scaling, lighting, … These invariances are common to all object recognition tasks”.
Toy world
Without sharing
With sharing
Sharing transformationsMiller, E., Matsakis, N., and Viola, P. (2000). Learning from one example
through shared densities on transforms. In IEEE Computer Vision and Pattern Recognition.
Transformations are sharedand can be learnt from other tasks.
Models of object recognitionI. Biederman, “Recognition-by-components: A theory of human image understanding,” Psychological Review, 1987.
M. Riesenhuber and T. Poggio, “Hierarchical models of object recognition in cortex,” Nature Neuroscience 1999.
T. Serre, L. Wolf and T. Poggio. “Object recognition with features inspired by visual cortex”. CVPR 2005
Sharing in constellation models
Pictorial StructuresFischler & Elschlager, IEEE Trans. Comp. 1973
Constellation ModelBurl, Liung,Perona, 1996; Weber, Welling, Perona, 2000
Fergus, Perona, & Zisserman, CVPR 2003
SVM DetectorsHeisele, Poggio, et. al., NIPS 2001
Model-Guided SegmentationMori, Ren, Efros, & Malik, CVPR 2004
E-Step
Random initialization
Variational EMVariational EM
prior knowledge of p()
new estimate of p(|train)
M-Step
new ’s
(Attias, Hinton, Beal, etc.)Slide from Fei Fei Li
Fei-Fei, Fergus, & Perona, ICCV 2003
Reusable Parts
Goal: Look for a vocabulary of edges that reduces the number of features.
Krempp, Geman, & Amit “Sequential Learning of Reusable Parts for Object Detection”. TR 2002
Num
ber
of f
eatu
res
Number of classes
Examples of reused parts
Sharing patches• Bart and Ullman, 2004
For a new class, use only features similar to features that where good for other classes:
Proposed Dog features
Multiclass boosting
• Adaboost.MH (Shapire & Singer, 2000)
• Error correcting output codes (Dietterich & Bakiri, 1995; …)
• Lk-TreeBoost (Friedman, 2001)
• ...
Shared features
Screen detector
Car detector
Face detector
• Independent binary classifiers:
Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007
Screen detector
Car detector
Face detector
• Binary classifiers that share features:
50 training samples/class29 object classes2000 entries in the dictionary
Results averaged on 20 runsError bars = 80% interval
Krempp, Geman, & Amit, 2002
Torralba, Murphy, Freeman. CVPR 2004
Shared features
Class-specific features
Generalization as a function of object similarities
12 viewpoints12 unrelated object classes
Number of training samples per class Number of training samples per class
Are
a un
der
RO
C
Are
a un
der
RO
C K = 2.1 K = 4.8
Torralba, Murphy, Freeman. CVPR 2004. PAMI 2007
Opelt, Pinz, Zisserman, CVPR 2006
Efficiency Generalization
Some references on multiclass
• Caruana 1997• Schapire, Singer, 2000• Thrun, Pratt 1997• Krempp, Geman, Amit, 2002• E.L.Miller, Matsakis, Viola, 2000• Mahamud, Hebert, Lafferty, 2001• Fink 2004• LeCun, Huang, Bottou, 2004• Holub, Welling, Perona, 2005• …
Context based methodsAntonio Torralba
Why is this hard?
2
1
What are the hidden objects?
What are the hidden objects?
Chance ~ 1/30000
Context-based object recognition
• Cognitive psychology– Palmer 1975 – Biederman 1981– …
• Computer vision– Noton and Stark (1971)– Hanson and Riseman (1978)– Barrow & Tenenbaum (1978) – Ohta, kanade, Skai (1978)– Haralick (1983)– Strat and Fischler (1991)– Bobick and Pinhanez (1995)– Campbell et al (1997)
Global and local representations
building
car
sidewalk
Urban street scene
Global and local representations
Image index: Summary statistics, configuration of textures
Urban street scene
features
histogram
building
car
sidewalk
Urban street scene
Global scene representations
Spatial structure is important in order to provide context for object localization
Sivic, Russell, Freeman, Zisserman, ICCV 2005Fei-Fei and Perona, CVPR 2005Bosch, Zisserman, Munoz, ECCV 2006
Bag of words Spatially organized textures
Non localized textons
S. Lazebnik, et al, CVPR 2006Walker, Malik. Vision Research 2004
…
M. Gorkani, R. Picard, ICPR 1994A. Oliva, A. Torralba, IJCV 2001
…
Contextual object relationshipsCarbonetto, de Freitas & Barnard (2004) Kumar, Hebert (2005)
Torralba Murphy Freeman (2004)
Fink & Perona (2003)E. Sudderth et al (2005)
Context
• Murphy, Torralba & Freeman (NIPS 03)Use global context to predict presence and location of objects
Op1,c1
vp1,c1
OpN,c1
vpN,c1. . .
Op1,c2
vp1,c2
OpN,c2
vpN,c2. . .
Class 1 Class 2
E1 E2
S
c2maxVc1
maxV
X1X2vg
Keyboards
3d Scene Context
Image World
[Hoiem, Efros, Hebert ICCV 2005]
3d Scene Context
Image Support Vertical Sky
V-Left V-Center V-Right V-Porous V-Solid
[Hoiem, Efros, Hebert ICCV 2005]
ObjectSurface?
Support?
Object-Object Relationships
Enforce spatial consistency between labels using MRF
Carbonetto, de Freitas & Barnard (04)
Object-Object Relationships
Use latent variables to induce long distance correlations between labels in a Conditional Random Field (CRF)
He, Zemel & Carreira-Perpinan (04)
• Fink & Perona (NIPS 03)Use output of boosting from other objects at previous
iterations as input into boosting for this iteration
Object-Object Relationships
CRFsObject-Object Relationships
[Torralba Murphy Freeman 2004] [Kumar Hebert 2005]
Hierarchical Sharing and Context
• Scenes share objects
• Objects share parts
• Parts share features
E. Sudderth, A. Torralba, W. T. Freeman, and A. Wilsky.
Some references on context
• Strat & Fischler (PAMI 91) • Torralba & Sinha (ICCV 01), • Torralba (IJCV 03) • Fink & Perona (NIPS 03)• Murphy, Torralba & Freeman (NIPS 03)• Kumar and M. Hebert (NIPS 04)• Carbonetto, Freitas & Barnard (ECCV 04)• He, Zemel & Carreira-Perpinan (CVPR 04)• Sudderth, Torralba, Freeman, Wilsky (ICCV 05)• Hoiem, Efros, Hebert (ICCV 05)• …
With a mixture of generative and discriminative approaches
A car out of context …
Banksy
Integrated models for scene and object recognition
Summary
Many techniques are used for training discriminative models that I have not mention here:
• Conditional random fields
• Kernels for object recognition
• Learning object similarities
• …