object recognition. so what does object recognition involve?
Post on 18-Dec-2015
Object Recognition
So what does object recognition involve?
Verification: is that a bus?
Detection: are there cars?
Identification: is that a picture of Mao?
Object categorization
sky
building
flag
wall
banner
bus
cars
bus
face
street lamp
Challenges 1: view point variation
Michelangelo 1475-1564
Challenges 2: illumination
slide credit: S. Ullman
Challenges 3: occlusion
Magritte, 1957
Challenges 4: scale
Challenges 5: deformation
Xu, Beihong 1943
Challenges 7: intra-class variation
Two main approaches
Part-based
Global sub-window
Global Approaches
x1 x2 x3
Vectors in high-dimensional space
Aligned images
Global Approaches
Training
Involves some dimensionality reduction
Detector
– Scale / position range to search over
Detection
– Scale / position range to search over
– Combine detection over space and scale.
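The search over position and scale, followed by combining detections, can be sketched as below. This is a minimal illustration, not the method of any paper cited on these slides: the window size, scale set, stride, and the greedy overlap-based merging are all assumptions, and `score_fn` stands for any hypothetical global classifier that scores a window.

```python
import numpy as np

def sliding_windows(image_shape, base_size=24, scales=(1.0, 1.5, 2.0), stride_frac=0.25):
    """Yield (row, col, size) for square windows over all positions and scales."""
    height, width = image_shape
    for scale in scales:
        size = int(round(base_size * scale))
        stride = max(1, int(size * stride_frac))
        for row in range(0, height - size + 1, stride):
            for col in range(0, width - size + 1, stride):
                yield row, col, size

def detect(image, score_fn, threshold, **window_args):
    """Score every window and keep those above the threshold."""
    hits = []
    for row, col, size in sliding_windows(image.shape, **window_args):
        score = float(score_fn(image[row:row + size, col:col + size]))
        if score > threshold:
            hits.append((score, row, col, size))
    return hits

def overlap(a, b):
    """Intersection over union of two square windows (score, row, col, size)."""
    _, r1, c1, s1 = a
    _, r2, c2, s2 = b
    h = max(0, min(r1 + s1, r2 + s2) - max(r1, r2))
    w = max(0, min(c1 + s1, c2 + s2) - max(c1, c2))
    inter = h * w
    return inter / (s1 * s1 + s2 * s2 - inter)

def combine(hits, max_overlap=0.3):
    """Greedy merging: keep the best-scoring window, drop windows overlapping it."""
    kept = []
    for hit in sorted(hits, reverse=True):
        if all(overlap(hit, k) <= max_overlap for k in kept):
            kept.append(hit)
    return kept
```

Here `detect` covers the scale / position search and `combine` merges overlapping responses into a single detection per object.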
PROJECT 1
• Turk and Pentland, 1991
• Belhumeur et al. 1997
• Schneiderman et al. 2004
• Viola and Jones, 2000
• Keren et al. 2001
• Osadchy et al. 2004
• Amit and Geman, 1999
• LeCun et al. 1998
• Belongie and Malik, 2002
• Schneiderman et al. 2004
• Agarwal and Roth, 2002
• Poggio et al. 1993
Object Detection
Problem: Locate instances of an object category in a given image.
Asymmetric classification problem!
Background | Object (Category)
Very large | Relatively small
Complex (thousands of categories) | Simple (single category)
Large prior to appear in an image | Small prior
Easy to collect (not easy to learn from examples) | Hard to collect
All images
Intuition
Let H denote the acceptance region of a classifier. We propose to minimize Pr(all images) (≈ Pr(background)) in H, except for the object samples.
[Figure labels: all images, background, object class]
We have a prior on the distribution of all natural images
[Figure: axis "image smoothness measure"; arrows marked "lower probability"]
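Distribution of Natural Images – Boltzmann distribution
$$\Pr(I) \propto \exp\left(-\iint \left(I_x^2 + I_y^2\right)\,dx\,dy\right)$$
In frequency domain:
$$\Pr(x) \propto \exp\left(-\sum_{k,l} \left(k^2 + l^2\right) x_{k,l}^2\right)$$
where $x_{k,l}$ is the coefficient of the image at frequency $(k, l)$.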
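As a rough illustration of this smoothness prior, the spatial-domain energy can be approximated with finite differences. The discretization and the function names below are assumptions made for this sketch; the normalizing constant of the distribution is ignored.

```python
import numpy as np

def smoothness_energy(image):
    """Discrete stand-in for the integral of I_x^2 + I_y^2 (finite differences)."""
    image = np.asarray(image, dtype=float)
    dx = np.diff(image, axis=1)  # horizontal derivative
    dy = np.diff(image, axis=0)  # vertical derivative
    return float((dx ** 2).sum() + (dy ** 2).sum())

def log_prior(image):
    """Unnormalized log-probability: smoother images score higher."""
    return -smoothness_energy(image)

rng = np.random.default_rng(0)
ramp = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))  # smooth gradient image
noise = rng.random((32, 32))                        # rough, noise-like image
print(log_prior(ramp) > log_prior(noise))
```

The smooth ramp gets a much higher unnormalized log-probability than the noise image, which is the sense in which less smooth images have lower probability.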
Antiface
[Figure labels: object images, d, Ω, acceptance region; arrows marked "lower probability"]
Main Idea
Claim: for random natural images $x$, $y$ viewed as unit vectors, $|\langle x, y\rangle|$ is large on average.
Anti-Face detector is defined as a vector $d$ satisfying:
– $|\langle d, x\rangle|$ is small for all $x$ in the positive class
– $d$ is smooth, so $|\langle d, x\rangle|$ is large on average for a random natural image $x$.
Discrimination
If $x$ is an image and $\Omega$ is a target class:
$|\langle d, x\rangle|$ is SMALL if $x \in \Omega$
$|\langle d, x\rangle|$ is LARGE if $x \notin \Omega$
Cascade of Independent Detectors
$d_1$
$d_2$
$d_3$
7 inner products
4 inner products
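A cascade of such detectors can be sketched as follows. This is an illustrative assumption of how the stages might be chained: the detectors and thresholds are hypothetical inputs, and the input image is assumed to be nonzero so that it can be normalized to a unit vector.

```python
import numpy as np

def cascade_accepts(x, detectors, thresholds):
    """Return True only if every detector gives a small |<d, x>|.

    Most background windows are rejected by the first detector, so the
    remaining inner products are rarely computed.
    """
    x = np.asarray(x, dtype=float).ravel()
    x = x / np.linalg.norm(x)  # view the image as a unit vector
    for d, threshold in zip(detectors, thresholds):
        if abs(np.dot(d, x)) > threshold:
            return False  # early rejection: LARGE response means "not the object"
    return True
```

For example, with detectors chosen orthogonal to a target vector, the target passes every stage while a vector with a component along any detector is rejected at that stage.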
Example
Samples from the training set
4 Anti-Face Detectors
4 Anti-face Detectors
Eigenface method with the subspace of dimension 100
Ensemble Learning
• Bagging
– Resample your training data (with replacement) to create k different training sets and learn f1(x), f2(x), …, fk(x)
– Combine the k different classifiers by majority voting
$f_{FINAL}(x) = \mathrm{sign}\left[\sum_{i=1}^{k} \frac{1}{k} f_i(x)\right]$
• Boosting
– Assign different weights to training samples in a “smart” way so that different classifiers pay more attention to different samples
– Weighted majority voting: the weight of an individual classifier grows with its accuracy
– AdaBoost (1996) was influenced by bagging, and it often outperforms bagging
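The bagging recipe above can be sketched in a few lines. The choice of decision stumps as the base classifier, the number of classifiers k, and all function names are assumptions made for illustration; labels are assumed to be in {-1, +1}.

```python
import numpy as np

def fit_stump(X, y):
    """Best single-feature threshold classifier by training error."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] > thr, sign, -sign)
                err = np.mean(pred != y)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best[1:]

def predict_stump(stump, X):
    j, thr, sign = stump
    return np.where(X[:, j] > thr, sign, -sign)

def bagging_fit(X, y, k=11, seed=0):
    """Learn k stumps, each on a bootstrap resample of the training set."""
    rng = np.random.default_rng(seed)
    n = len(y)
    stumps = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)  # sample n indices with replacement
        stumps.append(fit_stump(X[idx], y[idx]))
    return stumps

def bagging_predict(stumps, X):
    """Majority vote: sign of the average of the k predictions."""
    votes = np.mean([predict_stump(s, X) for s in stumps], axis=0)
    return np.where(votes >= 0, 1, -1)
```

Using an odd k avoids ties in the vote.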
Boosting - Motivation
• It is usually hard to design an accurate classifier which generalizes well
• However it is usually easy to find many “rule of thumb” weak classifiers
– A classifier is weak if it is only slightly better than random guessing
• Can we combine several weak classifiers to produce an accurate classifier?
– Question people have been working on since the 1980’s
AdaBoost
• Let’s assume we have a 2-class classification problem, with $y_i \in \{-1, 1\}$
• AdaBoost will produce a discriminant function:
$$g(x) = \sum_{t=1}^{T} \alpha_t f_t(x)$$
where $f_t(x)$ is the “weak” classifier and $\alpha_t$ is its weight
• The final classifier is the sign of the discriminant function, that is $f_{final}(x) = \mathrm{sign}[g(x)]$
Idea Behind AdaBoost
• Algorithm is iterative
• Maintains distribution of weights over the training examples
• Initially distribution of weights is uniform
• At successive iterations, the weight of misclassified examples is increased, forcing the weak learner to focus on the hard examples in the training set
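The iterative reweighting can be sketched as below, using the standard discrete AdaBoost update with decision stumps as weak classifiers. The stump learner, the number of rounds T, and the function names are assumptions for this example; labels are assumed to be in {-1, +1}.

```python
import numpy as np

def fit_weighted_stump(X, y, w):
    """Stump with the smallest weighted error under the distribution w."""
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] > thr, sign, -sign)
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost_fit(X, y, T=20):
    n = len(y)
    w = np.full(n, 1.0 / n)  # initially the distribution of weights is uniform
    model = []
    for _ in range(T):
        err, j, thr, sign = fit_weighted_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)     # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)    # weight of this weak classifier
        pred = np.where(X[:, j] > thr, sign, -sign)
        w = w * np.exp(-alpha * y * pred)        # raise weight of misclassified examples
        w = w / w.sum()
        model.append((alpha, j, thr, sign))
    return model

def adaboost_predict(model, X):
    """Sign of the weighted sum of weak classifiers."""
    g = sum(a * np.where(X[:, j] > thr, s, -s) for a, j, thr, s in model)
    return np.sign(g)
```

A useful sanity check is a 1-D "interval" problem (positive inside an interval, negative outside): no single stump can separate it, but a weighted combination of a few stumps can.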
PROJECT 2
Training with a Small Number of Examples
• The majority of object detection methods require a large number of training examples.
• Goal: to design a classifier that can learn from a small number of examples
• Using a small number of examples with existing classifiers leads to overfitting: the classifier learns the training examples by heart and performs poorly on unseen examples.
Linear SVM
Maximal margin
Enough training data
Class 1
Class 2
Not Enough training data
Linear SVM –Detection Task
Class 1
Class 2
0 xwx b
MM with prior
0xwx B b
1) min P(natural images in H)
2) positive samples ∈ H
3) wide margin
Object class
PROJECT 4
Part-Based Approaches
Object
Bag of ‘words’
Constellation of parts
Of all the sensory impressions proceeding to the brain, the visual experiences are the dominant ones. Our perception of the world around us is based essentially on the messages that reach the brain from our eyes. For a long time it was thought that the retinal image was transmitted point by point to visual centers in the brain; the cerebral cortex was a movie screen, so to speak, upon which the image in the eye was projected. Through the discoveries of Hubel and Wiesel we now know that behind the origin of the visual perception in the brain there is a considerably more complicated course of events. By following the visual impulses along their path to the various cell layers of the optical cortex, Hubel and Wiesel have been able to demonstrate that the message about the image falling on the retina undergoes a step-wise analysis in a system of nerve cells stored in columns. In this system each cell has its specific function and is responsible for a specific detail in the pattern of the retinal image.
sensory, brain, visual, perception,
retinal, cerebral cortex, eye, cell, optical
nerve, image, Hubel, Wiesel
China is forecasting a trade surplus of $90bn (£51bn) to $100bn this year, a threefold increase on 2004's $32bn. The Commerce Ministry said the surplus would be created by a predicted 30% jump in exports to $750bn, compared with a 18% rise in imports to $660bn. The figures are likely to further annoy the US, which has long argued that China's exports are unfairly helped by a deliberately undervalued yuan. Beijing agrees the surplus is too high, but says the yuan is only one factor. Bank of China governor Zhou Xiaochuan said the country also needed to do more to boost domestic demand so more goods stayed within the country. China increased the value of the yuan against the dollar by 2.1% in July and permitted it to trade within a narrow band, but the US wants the yuan to be allowed to trade freely. However, Beijing has made it clear that it will take its time and tread carefully before allowing the yuan to rise further in value.
China, trade, surplus, commerce,
exports, imports, US, yuan, bank, domestic,
foreign, increase, trade, value
Bag of ‘words’ analogy to documents
Interest Point Detectors
• Basic requirements:
– Sparse
– Informative
– Repeatable
• Invariance
– Rotation
– Scale (Similarity)
– Affine
Popular Detectors
Scale Invariant | Affine Invariant
Harris-Laplace | Harris-Laplace Affine
Difference of Gaussians | Difference of Gaussians Affine
Laplace of Gaussians | Laplace of Gaussians Affine
Scale Saliency (Kadir-Brady) | Affine Saliency (Kadir-Brady)
There are many others…
See:
1) “Scale and affine invariant interest point detectors”, K. Mikolajczyk, C. Schmid, IJCV, Volume 60, Number 1, 2004
2) “A comparison of affine region detectors”, K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir and L. Van Gool, http://www.robots.ox.ac.uk/~vgg/research/affine/det_eval_files/vibes_ijcv2004.pdf
Representation of appearance: Local Descriptors
• Invariance
– Rotation
– Scale
– Affine
• Insensitive to small deformations
• Illumination invariance
– Normalize out
SIFT – Scale Invariant Feature Transform
• Descriptor overview:
– Determine scale (by maximizing DoG in scale and in space) and local orientation as the dominant gradient direction. Use this scale and orientation to make all further computations invariant to scale and rotation.
– Compute gradient orientation histograms of several small windows (128 values for each point)
– Normalize the descriptor to make it invariant to intensity change
David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
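The last two descriptor steps can be sketched with a simplified, SIFT-like descriptor. This is not Lowe's full algorithm: it assumes the patch has already been resampled to the detected scale and rotated to its dominant orientation, and it omits Gaussian weighting, trilinear interpolation, and clipping. The 4 x 4 grid with 8 orientation bins is chosen here to give the 128 values mentioned above.

```python
import numpy as np

def sift_like_descriptor(patch, grid=4, bins=8):
    """Gradient orientation histograms over a grid of cells, then normalization."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)                     # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx) % (2 * np.pi)  # angle in [0, 2*pi)
    bin_index = np.minimum((orientation / (2 * np.pi) * bins).astype(int), bins - 1)
    cell = patch.shape[0] // grid
    descriptor = np.zeros((grid, grid, bins))
    for r in range(grid):
        for c in range(grid):
            rows = slice(r * cell, (r + 1) * cell)
            cols = slice(c * cell, (c + 1) * cell)
            # Magnitude-weighted orientation histogram of this small window
            descriptor[r, c] = np.bincount(
                bin_index[rows, cols].ravel(),
                weights=magnitude[rows, cols].ravel(),
                minlength=bins,
            )
    descriptor = descriptor.ravel()
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor
```

Because only gradients enter the histograms and the result is normalized to unit length, scaling the patch intensity by a constant leaves the descriptor unchanged.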
Feature Detection and Representation
Detect patches
[Mikolajczyk and Schmid ’02]
[Matas et al. ’02]
[Sivic et al. ’03]
Normalize patch
Compute SIFT descriptor
[Lowe ’99]
Slide credit: Josef Sivic
Codewords dictionary formation
Vector quantization
Slide credit: Josef Sivic
Codewords dictionary formation
Fei-Fei et al. 2005
Image patch examples of codewords
Sivic et al. 2005
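Dictionary formation by vector quantization, and the resulting bag-of-words vector, can be sketched as follows. Plain k-means is used here as one common choice of quantizer, which is an assumption rather than the specific procedure of the cited papers; the function names and parameters are illustrative.

```python
import numpy as np

def quantize(descriptors, codebook):
    """Index of the nearest codeword for each descriptor."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def build_codebook(descriptors, k, iters=20, seed=0):
    """k-means clustering of local descriptors; cluster centers are the codewords."""
    rng = np.random.default_rng(seed)
    descriptors = np.asarray(descriptors, dtype=float)
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = quantize(descriptors, centers)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def bow_histogram(descriptors, codebook):
    """Normalized histogram of codeword occurrences: the vector X for one image."""
    counts = np.bincount(quantize(descriptors, codebook), minlength=len(codebook))
    return counts / counts.sum()
```

The histogram discards all position information, which is what makes the representation a "bag" of words.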
Representation: image → vector X
Learning: positive and negative examples → SVM classifier
Recognition: vector X → SVM(X) → contains object / doesn’t contain object
PROJECT 3
Pros/Cons
• Pros
– Fast and simple.
– Insensitive to pose variation.
– No segmentation required during learning.
• Cons
– No localization.
– Requires a discriminative background or no background.
Constellation of Parts
• An object in an image is represented by a collection of parts, characterized by both their visual appearances and locations.
• Object categories are modeled by the appearance and spatial distributions of these characteristic parts.
The correspondence problem
• Model with P parts
• Image with N possible locations for each part
• $N^P$ combinations!!!
Slide credit: Rob Fergus
How to model location?
• Explicit: Probability density functions
• Implicit: Voting scheme
Explicit shape model
• Probability densities
– Continuous (Gaussians)
– Analogy with springs
• Parameters of model, $\mu$ and $\Sigma$
– Independence corresponds to zeros in $\Sigma$
Slide credit: Rob Fergus
Different graph structures
[Figure: three graphs over parts 1–6]
Fully connected: $O(N^6)$
Star structure: $O(N^2)$
Tree structure: $O(N^2)$
• Sparser graphs cannot capture all interactions between parts
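The saving from a sparser graph can be illustrated on the star structure. In the sketch below, the cost tables `unary` and `pairwise_to_root` are hypothetical inputs: part 0 is taken as the root, and every other part interacts only with the root. Brute force tries all $N^P$ assignments, while the star-structured search needs on the order of $P \cdot N^2$ operations and returns the same minimum.

```python
import itertools
import numpy as np

def brute_force_match(unary, pairwise_to_root):
    """Minimum total cost over all N**P assignments of parts to locations.

    unary[p, i]: appearance cost of part p at location i.
    pairwise_to_root[p, i, j]: shape cost of part p at j when the root is at i.
    """
    P, N = unary.shape
    best = np.inf
    for assign in itertools.product(range(N), repeat=P):
        root = assign[0]
        cost = unary[0, root] + sum(
            unary[p, assign[p]] + pairwise_to_root[p, root, assign[p]]
            for p in range(1, P)
        )
        best = min(best, cost)
    return best

def star_match(unary, pairwise_to_root):
    """Same minimum in O(P * N**2): given the root, parts choose independently."""
    P, N = unary.shape
    total = unary[0].copy()  # cost for each candidate root location
    for p in range(1, P):
        # For every root location (rows), the best location of part p (columns)
        total += (unary[p][None, :] + pairwise_to_root[p]).min(axis=1)
    return total.min()
```

The price, as noted above, is that interactions between non-root parts are not modeled.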
Slide credit: Rob Fergus
Implicit shape model
Spatial occurrence distributions
[Figure: spatial occurrence distributions over x, y, s]
Recognition
Interest Points → Matched Codebook Entries → Probabilistic Voting
Learning
• Learn appearance codebook
– Cluster over interest points on training images
• Learn spatial distributions
– Match codebook to training images
– Record matching positions on object
– Centroid is given
• Use Hough space voting to find object
• Leibe and Schiele ’03, ’05
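The voting step can be sketched as below. This is a simplified illustration and not Leibe and Schiele's implementation: votes are cast over position only (the scale dimension s is omitted), and `offsets_by_codeword` is a hypothetical table of centroid offsets recorded during learning for each codebook entry.

```python
import numpy as np

def hough_vote(points, codeword_ids, offsets_by_codeword, shape):
    """Accumulate votes for the object centroid from matched codebook entries."""
    accumulator = np.zeros(shape)
    for (row, col), codeword in zip(points, codeword_ids):
        offsets = offsets_by_codeword.get(codeword, [])
        for d_row, d_col in offsets:
            r, c = row + d_row, col + d_col
            if 0 <= r < shape[0] and 0 <= c < shape[1]:
                # Each matched entry spreads one unit of vote over its offsets
                accumulator[r, c] += 1.0 / len(offsets)
    return accumulator

def best_center(accumulator):
    """Location with the most votes, as a (row, col) tuple."""
    r, c = np.unravel_index(np.argmax(accumulator), accumulator.shape)
    return int(r), int(c)
```

Consistent votes from several interest points reinforce each other at the true centroid, while votes from ambiguous codebook entries are spread thinly.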
Slide credit: Rob Fergus
Pros/Cons
• Pros
– Principled modeling
– Models appearance and shape
– Provides localization
• Cons
– Computationally expensive
– Small number of parts (learning on unsegmented images) or requires bounding box during learning.
Weak Shape Model
• Models part arrangements
• Allows many parts but the model is computationally efficient
• Context distributions – see each part in the context of other parts.
PROJECT 4