Visual Recognition: Objects, Actions and Scenes
![Page 1: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/1.jpg)
Includes slides from: O. Chum, K. Grauman, S. Lazebnik, B. Leibe, D. Lowe, J. Philbin,
J. Ponce, D. Nister, J. Sivic, N. Snavely and A. Zisserman
Lecture 2:
Local invariant image features
Ivan Laptev
INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548
Laboratoire d’Informatique, Ecole Normale Supérieure, Paris, France
University of Trento
July 7-10, 2014
Trento, Italy
Visual Recognition:
Objects, Actions and Scenes
![Page 2: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/2.jpg)
Image matching and recognition with local features
The goal: establish correspondence between two or more
images
Image points x and x’ are in correspondence if they are
projections of the same 3D scene point X.
Images courtesy A. Zisserman
[Figure labels: image points x and x', scene point X, cameras P and P']
![Page 3: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/3.jpg)
Example I: Wide baseline matching and 3D reconstruction
Establish correspondence between two (or more) images.
[Schaffalitzky and Zisserman ECCV 2002]
![Page 4: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/4.jpg)
![Page 5: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/5.jpg)
[Agarwal, Snavely, Simon, Seitz, Szeliski, ICCV’09] –
Building Rome in a Day
57,845 downloaded images, 11,868 registered images. This video: 4,619 images.
![Page 6: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/6.jpg)
Example II: Object recognition
[D. Lowe, 1999]
Establish correspondence between the target image and
(multiple) images in the model database.
Target
image
Model
database
![Page 7: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/7.jpg)
Example III: Visual search
Given a query image, find images depicting the same place /
object in a large unordered image collection.
Find these landmarks ... in these images and 1M more
![Page 8: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/8.jpg)
Establish correspondence between the query image and all
images from the database depicting the same object / scene.
Query image
Database image(s)
![Page 9: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/9.jpg)
Bing visual scan
Mobile visual search
and others… Snaptell.com, Millpix.com
![Page 10: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/10.jpg)
Applications
Take a picture of a product or advertisement and
find relevant information on the web
[Pixee – Milpix]
![Page 11: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/11.jpg)
Applications
Finding stolen/missing objects in a large collection…
![Page 12: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/12.jpg)
Applications
Copy detection for images and videos
Search in 200h of video
Query video
![Page 13: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/13.jpg)
Applications
Sony Aibo – Robotics
• Recognize docking station
• Communicate with visual cards
• Place recognition
• Loop closure in SLAM
Slide credit: David Lowe
K. Grauman, B. Leibe
![Page 14: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/14.jpg)
Why is it difficult?
Want to establish correspondence despite possibly large changes in scale, viewpoint and lighting, and despite partial occlusion
[Figure panels: Viewpoint, Scale, Lighting, Occlusion]
… and the image collection can be very large (e.g. 1M images)
![Page 15: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/15.jpg)
How does it work?
Approach:
1. Compute scale / affine co-variant local features
2. Estimate pairwise best matches between local features
3. Enforce geometric constraints between local features
![Page 16: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/16.jpg)
![Page 17: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/17.jpg)
![Page 18: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/18.jpg)
Why extract features?
• Motivation: panorama stitching
  • We have two images – how do we combine them?
Slide: S. Lazebnik
![Page 19: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/19.jpg)
Step 1: extract features
Step 2: match features
Slide: S. Lazebnik
![Page 20: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/20.jpg)
Step 3: align images
Slide: S. Lazebnik
![Page 21: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/21.jpg)
Characteristics of good features
• Repeatability
  • The same feature can be found in several images despite geometric and photometric transformations
• Saliency
  • Each feature is distinctive
• Compactness and efficiency
  • Many fewer features than image pixels
• Locality
  • A feature occupies a relatively small area of the image; robust to clutter and occlusion
Slide: S. Lazebnik
![Page 22: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/22.jpg)
A hard feature matching problem
NASA Mars Rover images
Slide: S. Lazebnik
![Page 23: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/23.jpg)
NASA Mars Rover images
with SIFT feature matches
Figure by Noah Snavely
Answer below (look for tiny colored squares…)
Slide: S. Lazebnik
![Page 24: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/24.jpg)
Corner Detection: Basic Idea
• We should easily recognize the point by looking through a small window
• Shifting a window in any direction should give a large change in intensity
“flat” region: no change in all directions
“edge”: no change along the edge direction
“corner”: significant change in all directions
Source: A. Efros
![Page 25: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/25.jpg)
Corner Detection: Mathematics
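Change in appearance of window W for the shift [u,v]:
$$E(u,v) = \sum_{(x,y)\in W} \left[ I(x+u,\,y+v) - I(x,y) \right]^2$$
[Figure: image I(x, y) with window W and the error surface E(u, v); the shift shown corresponds to E(3,2)]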
Slide: S. Lazebnik
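The definition of E(u, v) can be evaluated directly on a toy image. The sketch below is only an illustration: the function name `shift_error`, the synthetic step image and the window size are assumptions, not part of the slides.

```python
import numpy as np

def shift_error(image, x0, y0, half, u, v):
    """E(u, v): sum of squared differences between the window centred at
    (x0, y0) and the same-sized window shifted by (u, v)."""
    window = image[y0 - half:y0 + half + 1, x0 - half:x0 + half + 1]
    shifted = image[y0 + v - half:y0 + v + half + 1,
                    x0 + u - half:x0 + u + half + 1]
    return float(np.sum((shifted - window) ** 2))

# Hypothetical test image: a bright quadrant whose corner sits at (x, y) = (16, 16).
img = np.zeros((32, 32))
img[16:, 16:] = 1.0

flat = shift_error(img, 6, 6, 3, 1, 0)           # flat region
edge_along = shift_error(img, 24, 16, 3, 1, 0)   # on the edge, shift along it
edge_across = shift_error(img, 24, 16, 3, 0, 1)  # on the edge, shift across it
corner_x = shift_error(img, 16, 16, 3, 1, 0)     # on the corner, shift in x
corner_y = shift_error(img, 16, 16, 3, 0, 1)     # on the corner, shift in y
print(flat, edge_along, edge_across, corner_x, corner_y)
```

Only the corner window changes for shifts in both directions, which is the behaviour the "basic idea" slide describes.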
![Page 26: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/26.jpg)
Corner Detection: Mathematics
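Change in appearance of window W for the shift [u,v]:
$$E(u,v) = \sum_{(x,y)\in W} \left[ I(x+u,\,y+v) - I(x,y) \right]^2$$
[Figure: image I(x, y) with window W and the error surface E(u, v); the zero shift corresponds to E(0,0)]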
Slide: S. Lazebnik
![Page 27: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/27.jpg)
Corner Detection: Mathematics
Change in appearance of window W for the shift [u,v]:
$$E(u,v) = \sum_{(x,y)\in W} \left[ I(x+u,\,y+v) - I(x,y) \right]^2$$
We want to find out how this function behaves for small shifts
[Figure: error surface E(u, v)]
Slide: S. Lazebnik
![Page 28: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/28.jpg)
Corner Detection: Mathematics
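• First-order Taylor approximation for small motions [u, v]:
$$\begin{aligned}
I(x+u,\,y+v) &= I(x,y) + I_x u + I_y v + \text{higher order terms} \\
&\approx I(x,y) + I_x u + I_y v \\
&= I(x,y) + \begin{bmatrix} I_x & I_y \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}
\end{aligned}$$
• Let’s plug this into
$$E(u,v) = \sum_{(x,y)\in W} \left[ I(x+u,\,y+v) - I(x,y) \right]^2$$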
Slide: S. Lazebnik
![Page 29: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/29.jpg)
Corner Detection: Mathematics
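$$\begin{aligned}
E(u,v) &= \sum_{(x,y)\in W} \left[ I(x+u,\,y+v) - I(x,y) \right]^2 \\
&\approx \sum_{(x,y)\in W} \left[ I(x,y) + \begin{bmatrix} I_x & I_y \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} - I(x,y) \right]^2 \\
&= \sum_{(x,y)\in W} \left( \begin{bmatrix} I_x & I_y \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} \right)^2 \\
&= \sum_{(x,y)\in W} \begin{bmatrix} u & v \end{bmatrix} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}
\end{aligned}$$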
Slide: S. Lazebnik
![Page 30: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/30.jpg)
Corner Detection: Mathematics
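The quadratic approximation simplifies to
$$E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$$
where M is a second moment matrix computed from image
derivatives:
$$M = \sum_{(x,y)\in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$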
Slide: S. Lazebnik
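As a sanity check on the quadratic form, M can be built from finite-difference derivatives and compared with the exact E(u, v). This is a sketch under stated assumptions: the helper names and the linear ramp image are hypothetical, and the ramp is chosen because the first-order Taylor expansion is exact for it.

```python
import numpy as np

def second_moment_matrix(image, x0, y0, half):
    """M = sum over the window of [[Ix^2, Ix*Iy], [Ix*Iy, Iy^2]]."""
    Iy, Ix = np.gradient(image)  # derivative along axis 0 (y), then axis 1 (x)
    rows = slice(y0 - half, y0 + half + 1)
    cols = slice(x0 - half, x0 + half + 1)
    ix, iy = Ix[rows, cols], Iy[rows, cols]
    return np.array([[np.sum(ix * ix), np.sum(ix * iy)],
                     [np.sum(ix * iy), np.sum(iy * iy)]])

def shift_error(image, x0, y0, half, u, v):
    """Exact E(u, v) as a sum of squared differences."""
    window = image[y0 - half:y0 + half + 1, x0 - half:x0 + half + 1]
    shifted = image[y0 + v - half:y0 + v + half + 1,
                    x0 + u - half:x0 + u + half + 1]
    return float(np.sum((shifted - window) ** 2))

# Hypothetical image: the linear ramp I(x, y) = 2x + 3y.
ys, xs = np.mgrid[0:21, 0:21].astype(float)
ramp = 2.0 * xs + 3.0 * ys

M = second_moment_matrix(ramp, 10, 10, 2)
shift = np.array([1.0, -1.0])            # [u, v]
quadratic = float(shift @ M @ shift)     # [u v] M [u v]^T
exact = shift_error(ramp, 10, 10, 2, 1, -1)
print(M, quadratic, exact)
```

For a general image the two values agree only approximately, and only for small shifts.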
![Page 31: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/31.jpg)
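Interpreting the second moment matrix
• The surface E(u,v) is locally approximated by a
quadratic form. Let’s try to understand its shape.
• Specifically, in which directions
does it have the smallest/greatest
change?
$$E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$$
$$M = \sum_{(x,y)\in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$
[Figure: error surface E(u, v)]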
Slide: S. Lazebnik
![Page 32: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/32.jpg)
Interpreting the second moment matrix
First, consider the axis-aligned case
(gradients are either horizontal or vertical)
$$M = \sum_{(x,y)\in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}$$
If either a or b is close to 0, then this is not a corner,
so look for locations where both are large.
Slide: S. Lazebnik
![Page 33: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/33.jpg)
Interpreting the second moment matrix
Consider a horizontal “slice” of E(u, v):
$$\begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix} = \text{const}$$
This is the equation of an ellipse.
Slide: S. Lazebnik
![Page 34: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/34.jpg)
Visualization of second moment matrices
Slide: S. Lazebnik
![Page 35: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/35.jpg)
Visualization of second moment matrices
Slide: S. Lazebnik
![Page 36: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/36.jpg)
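Interpreting the second moment matrix
Consider a horizontal “slice” of E(u, v):
$$\begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix} = \text{const}$$
This is the equation of an ellipse.
Diagonalization of M:
$$M = R^{-1} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} R$$
The axis lengths of the ellipse are determined by the
eigenvalues and the orientation is determined by R
[Figure: ellipse with semi-axis $(\lambda_{\max})^{-1/2}$ along the direction of the fastest change and $(\lambda_{\min})^{-1/2}$ along the direction of the slowest change]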
Slide: S. Lazebnik
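The diagonalization can be reproduced numerically. In this sketch the matrix M is a made-up example, and R is taken to be the transpose of the eigenvector matrix returned by `np.linalg.eigh`, so that R^{-1} is the eigenvector matrix itself.

```python
import numpy as np

# Hypothetical second moment matrix (symmetric, positive definite).
M = np.array([[5.0, 2.0],
              [2.0, 2.0]])

# eigh returns eigenvalues in ascending order; eigenvectors are the columns.
eigvals, eigvecs = np.linalg.eigh(M)
lam_min, lam_max = eigvals

R = eigvecs.T                              # rows of R are the eigenvectors
M_rebuilt = R.T @ np.diag(eigvals) @ R     # R^{-1} = R^T for an orthonormal R

# Semi-axes of the ellipse [u v] M [u v]^T = 1.
semi_axes = 1.0 / np.sqrt(eigvals)         # [lam_min^{-1/2}, lam_max^{-1/2}]
slowest_dir = eigvecs[:, 0]                # long axis: direction of slowest change
fastest_dir = eigvecs[:, 1]                # short axis: direction of fastest change

# A point at the end of the long axis lies on the ellipse.
p = semi_axes[0] * slowest_dir
on_ellipse = float(p @ M @ p)
print(eigvals, semi_axes, on_ellipse)
```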
![Page 37: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/37.jpg)
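Interpreting the eigenvalues
Classification of image points using eigenvalues
of M:
“Corner”: $\lambda_1$ and $\lambda_2$ are large, $\lambda_1 \sim \lambda_2$; E increases in all directions
“Edge”: $\lambda_1 \gg \lambda_2$
“Edge”: $\lambda_2 \gg \lambda_1$
“Flat” region: $\lambda_1$ and $\lambda_2$ are small; E is almost constant in all directions
[Figure: the four regions in the $(\lambda_1, \lambda_2)$ plane]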
Slide: S. Lazebnik
![Page 38: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/38.jpg)
Corner response function
$$R = \det(M) - \alpha\,\operatorname{trace}(M)^2 = \lambda_1 \lambda_2 - \alpha(\lambda_1 + \lambda_2)^2$$
α: constant (0.04 to 0.06)
“Corner”: R > 0
“Edge”: R < 0
“Flat” region: |R| small
Slide: S. Lazebnik
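A few hand-picked matrices show how the sign of R separates the three cases. The matrices and the value α = 0.05 (inside the 0.04 to 0.06 range quoted above) are illustrative assumptions.

```python
import numpy as np

def corner_response(M, alpha=0.05):
    """R = det(M) - alpha * trace(M)^2."""
    return float(np.linalg.det(M) - alpha * np.trace(M) ** 2)

# Hypothetical second moment matrices, written directly in diagonal form.
corner = np.diag([10.0, 10.0])   # both eigenvalues large
edge = np.diag([10.0, 0.1])      # one eigenvalue dominates
flat = np.diag([0.1, 0.1])       # both eigenvalues small

r_corner = corner_response(corner)
r_edge = corner_response(edge)
r_flat = corner_response(flat)
print(r_corner, r_edge, r_flat)
```

Note that R is computed without an explicit eigen-decomposition, since the determinant and the trace give the product and the sum of the eigenvalues.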
![Page 39: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/39.jpg)
Harris detector: Steps
1. Compute Gaussian derivatives at each pixel
2. Compute second moment matrix M in a
Gaussian window around each pixel
3. Compute corner response function R
4. Threshold R
5. Find local maxima of response function
(nonmaximum suppression)
C. Harris and M. Stephens. “A Combined Corner and Edge Detector.” Proceedings of the 4th Alvey Vision Conference: pages 147–151, 1988.
Slide: S. Lazebnik
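The five steps can be sketched in plain NumPy as follows. This is not the original Harris and Stephens implementation or the VLFeat one: the function names, the two Gaussian scales, the relative threshold and the 3×3 non-maximum suppression window are all assumptions chosen for a short example.

```python
import numpy as np

def gaussian_smooth(image, sigma):
    """Separable Gaussian filter (zero padding at the borders)."""
    radius = int(3 * sigma + 0.5)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    rows = np.apply_along_axis(np.convolve, 1, image, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")

def harris_corners(image, sigma_d=1.0, sigma_i=2.0, alpha=0.05, rel_threshold=0.1):
    # 1. Gaussian derivatives: smooth, then take finite differences.
    Iy, Ix = np.gradient(gaussian_smooth(image, sigma_d))
    # 2. Entries of the second moment matrix in a Gaussian window.
    Sxx = gaussian_smooth(Ix * Ix, sigma_i)
    Syy = gaussian_smooth(Iy * Iy, sigma_i)
    Sxy = gaussian_smooth(Ix * Iy, sigma_i)
    # 3. Corner response R = det(M) - alpha * trace(M)^2.
    R = Sxx * Syy - Sxy ** 2 - alpha * (Sxx + Syy) ** 2
    # 4. Threshold R (relative to the strongest response; an assumption).
    strong = R > rel_threshold * R.max()
    # 5. Non-maximum suppression in a 3x3 neighbourhood.
    h, w = R.shape
    padded = np.pad(R, 1, mode="constant", constant_values=-np.inf)
    neighbourhood_max = np.max(
        [padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
         for dy in (-1, 0, 1) for dx in (-1, 0, 1)], axis=0)
    ys, xs = np.nonzero(strong & (R == neighbourhood_max))
    return R, list(zip(xs.tolist(), ys.tolist()))

# Hypothetical test image: a bright square on a dark background.
image = np.zeros((64, 64))
image[20:44, 20:44] = 1.0
R, corners = harris_corners(image)
print(corners)
```

On this image the detections should cluster at the four corners of the square, while R is negative along its sides and zero in the flat interior.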
![Page 40: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/40.jpg)
Harris Detector: Steps
Slide: S. Lazebnik
![Page 41: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/41.jpg)
Harris Detector: Steps
Compute corner response R
Slide: S. Lazebnik
![Page 42: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/42.jpg)
Harris Detector: Steps
Find points with large corner response: R>threshold
Slide: S. Lazebnik
![Page 43: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/43.jpg)
Harris Detector: Steps
Take only the points of local maxima of R
Slide: S. Lazebnik
![Page 44: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/44.jpg)
Harris Detector: Steps
Slide: S. Lazebnik
![Page 45: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/45.jpg)
Invariance and covariance
• We want corner locations to be invariant to photometric
transformations and covariant to geometric transformations
• Invariance: image is transformed and corner locations do not change
• Covariance: if we have two transformed versions of the same image,
features should be detected in corresponding locations
Slide: S. Lazebnik
![Page 46: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/46.jpg)
Affine intensity change
• Only derivatives are used =>
invariance to intensity shift I → I + b
• Intensity scaling: I → a I
[Figure: response R plotted against x (image coordinate), before and after intensity scaling, with a fixed threshold]
Partially invariant to affine intensity change
I → a I + b
Slide: S. Lazebnik
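The "partial" invariance can be checked numerically: adding a constant b leaves R unchanged, while multiplying by a scales M by a² and therefore R by a⁴, so a fixed threshold no longer selects the same points. The patch, the constants and the function name below are illustrative assumptions.

```python
import numpy as np

def patch_response(patch, alpha=0.05):
    """Corner response of a single patch, with M summed over the whole patch."""
    Iy, Ix = np.gradient(patch)
    M = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    return float(np.linalg.det(M) - alpha * np.trace(M) ** 2)

rng = np.random.default_rng(0)
patch = rng.random((9, 9))                # hypothetical textured patch

r = patch_response(patch)
r_shift = patch_response(patch + 5.0)     # I -> I + b
r_scale = patch_response(3.0 * patch)     # I -> a I with a = 3
print(r, r_shift, r_scale / r)
```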
![Page 47: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/47.jpg)
Image translation
• Derivatives and window function are shift-invariant
Corner location is covariant w.r.t. translation
Slide: S. Lazebnik
![Page 48: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/48.jpg)
Image rotation
Second moment ellipse rotates but its shape
(i.e. eigenvalues) remains the same
Corner location is covariant w.r.t. rotation
Slide: S. Lazebnik
![Page 49: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/49.jpg)
Scaling
[Figure: a corner at the original scale; after magnification, all points will be classified as edges]
Corner location is not covariant to scaling!
Slide: S. Lazebnik
![Page 50: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/50.jpg)
![Page 51: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/51.jpg)
Blob detection
Slide: S. Lazebnik
![Page 52: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/52.jpg)
Feature detection with scale selection
• We want to extract features with characteristic
scale that is covariant with the image
transformation
Slide: S. Lazebnik
![Page 53: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/53.jpg)
Blob detection: basic idea
• To detect blobs, convolve the image with a
“blob filter” at multiple scales and look for
maxima of filter response in the resulting
scale space
Slide: S. Lazebnik
![Page 54: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/54.jpg)
Blob filter
Laplacian of Gaussian: Circularly symmetric
operator for blob detection in 2D
$$\nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}$$
Slide: S. Lazebnik
![Page 55: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/55.jpg)
Recall: Edge detection
[Figure, top to bottom: signal $f$ containing an edge; derivative of Gaussian $\frac{d}{dx} g$; response $f * \frac{d}{dx} g$]
Edge = maximum of derivative
Source: S. Seitz
![Page 56: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/56.jpg)
Edge detection, Take 2
[Figure, top to bottom: signal $f$ containing an edge; second derivative of Gaussian (Laplacian) $\frac{d^2}{dx^2} g$; response $f * \frac{d^2}{dx^2} g$]
Edge = zero crossing of second derivative
Source: S. Seitz
![Page 57: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/57.jpg)
From edges to blobs
• Edge = ripple
• Blob = superposition of two ripples
Spatial selection: the magnitude of the Laplacian
response will achieve a maximum at the center of
the blob, provided the scale of the Laplacian is
“matched” to the scale of the blob
maximum
Slide: S. Lazebnik
![Page 58: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/58.jpg)
Scale-space blob detector: Example
Slide: S. Lazebnik
![Page 59: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/59.jpg)
Scale-space blob detector: Example
Slide: S. Lazebnik
![Page 60: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/60.jpg)
Scale-space blob detector
1. Convolve image with scale-normalized
Laplacian at several scales
2. Find maxima of squared Laplacian response
in scale-space
Slide: S. Lazebnik
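The two steps can be sketched as below. This is a simplified illustration rather than a full detector: it reports only the single strongest maximum instead of all local maxima in scale-space, and the function names, the scale sampling and the synthetic disk image are assumptions. For an ideal disk of radius r the scale-normalized response at the centre peaks near σ = r/√2.

```python
import numpy as np

def gaussian_smooth(image, sigma):
    """Separable Gaussian filter (zero padding at the borders)."""
    radius = int(3 * sigma + 0.5)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    rows = np.apply_along_axis(np.convolve, 1, image, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")

def laplacian(image):
    """5-point discrete Laplacian with replicated borders."""
    p = np.pad(image, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * image

def squared_log_responses(image, sigmas):
    """Step 1: scale-normalized Laplacian (sigma^2 * Laplacian of the smoothed
    image) at several scales, squared so bright and dark blobs both give maxima."""
    return np.stack([(s ** 2 * laplacian(gaussian_smooth(image, s))) ** 2
                     for s in sigmas])

# Hypothetical test image: a bright disk of radius 8 centred at (48, 48).
ys, xs = np.mgrid[0:96, 0:96]
blob_radius = 8
image = ((xs - 48) ** 2 + (ys - 48) ** 2 <= blob_radius ** 2).astype(float)

sigmas = [2.0 * np.sqrt(2.0) ** i for i in range(5)]   # 2, 2.83, 4, 5.66, 8
stack = squared_log_responses(image, sigmas)

# Step 2 (simplified): take the strongest response over position and scale.
scale_idx, y, x = np.unravel_index(np.argmax(stack), stack.shape)
print((x, y), sigmas[scale_idx], blob_radius / np.sqrt(2.0))
```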
![Page 61: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/61.jpg)
Scale-space blob detector: Example
Slide: S. Lazebnik
![Page 62: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/62.jpg)
Efficient implementation
• Approximating the Laplacian with a
difference of Gaussians:
$$L = \sigma^2 \left( G_{xx}(x, y, \sigma) + G_{yy}(x, y, \sigma) \right)$$
(Laplacian)
$$DoG = G(x, y, k\sigma) - G(x, y, \sigma)$$
(Difference of Gaussians)
Slide: S. Lazebnik
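How close the two filters are can be checked with the analytic 2D Gaussian. To first order in (k − 1), the difference of Gaussians equals (k − 1) times the scale-normalized Laplacian, so the comparison below includes that factor. The grid size, σ and the two values of k are illustrative assumptions.

```python
import numpy as np

def gaussian_2d(x, y, sigma):
    return np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)

def laplacian_of_gaussian(x, y, sigma):
    """G_xx + G_yy for the 2D Gaussian, in closed form."""
    r2 = x ** 2 + y ** 2
    return (r2 / sigma ** 4 - 2.0 / sigma ** 2) * gaussian_2d(x, y, sigma)

ys, xs = np.mgrid[-15:16, -15:16].astype(float)
sigma = 2.0

def relative_error(k):
    """Largest deviation between DoG and (k - 1) * sigma^2 * (G_xx + G_yy),
    relative to the peak magnitude of the latter."""
    dog = gaussian_2d(xs, ys, k * sigma) - gaussian_2d(xs, ys, sigma)
    scaled_log = (k - 1.0) * sigma ** 2 * laplacian_of_gaussian(xs, ys, sigma)
    return float(np.max(np.abs(dog - scaled_log)) / np.max(np.abs(scaled_log)))

err_small_step = relative_error(1.01)          # nearly infinitesimal scale step
err_larger_step = relative_error(2 ** (1 / 3)) # a coarser, more practical step
print(err_small_step, err_larger_step)
```

The approximation degrades as k grows, but the shape of the filter stays similar, which is why subtracting adjacent Gaussian-smoothed images is a cheap substitute for an explicit Laplacian.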
![Page 63: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/63.jpg)
Efficient implementation
David G. Lowe. “Distinctive image features from scale-invariant
keypoints.” IJCV 60 (2), pp. 91-110, 2004.
Slide: S. Lazebnik
![Page 64: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/64.jpg)
From feature detection to feature description
• Scaled and rotated versions of the same
neighborhood will give rise to blobs that are related
by the same transformation
• What to do if we want to compare the appearance of
these image regions?
• Normalization: transform these regions into same-
size circles
• Problem: rotational ambiguity
Slide: S. Lazebnik
![Page 65: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/65.jpg)
Eliminating rotation ambiguity
• To assign a unique orientation to circular
image windows:
  • Create histogram of local gradient directions in the patch
  • Assign canonical orientation at peak of smoothed histogram
[Figure: orientation histogram over the range 0 to 2π]
Slide: S. Lazebnik
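The two bullets above can be sketched as follows. The number of bins (36), the weighting of votes by gradient magnitude and the [1, 2, 1]/4 circular smoothing are assumptions made for this example; the slide does not specify them.

```python
import numpy as np

def dominant_orientation(patch, num_bins=36):
    """Return the centre (in radians) of the peak bin of the smoothed
    histogram of gradient directions in the patch."""
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2 * np.pi)         # directions in [0, 2*pi)
    hist, edges = np.histogram(angle, bins=num_bins,
                               range=(0.0, 2 * np.pi), weights=magnitude)
    # Circular smoothing so that bins 0 and num_bins - 1 are neighbours.
    hist = (np.roll(hist, 1) + 2.0 * hist + np.roll(hist, -1)) / 4.0
    peak = int(np.argmax(hist))
    return float((edges[peak] + edges[peak + 1]) / 2.0)

# Hypothetical patches with a single known gradient direction.
ys, xs = np.mgrid[0:16, 0:16].astype(float)
theta_diag = dominant_orientation(xs + ys)   # gradient points along 45 degrees
theta_x = dominant_orientation(xs)           # gradient points along +x
print(theta_diag, theta_x)
```

Rotating each patch by minus its dominant orientation before computing a descriptor is what removes the rotational ambiguity; the estimate is quantized to the bin width unless the peak is interpolated.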
![Page 66: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/66.jpg)
SIFT features
• Detected features with characteristic scales
and orientations:
David G. Lowe. “Distinctive image features from scale-invariant
keypoints.” IJCV 60 (2), pp. 91-110, 2004.
Slide: S. Lazebnik
![Page 67: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/67.jpg)
SIFT descriptors
David G. Lowe. “Distinctive image features from scale-invariant
keypoints.” IJCV 60 (2), pp. 91-110, 2004.
Slide: S. Lazebnik
![Page 68: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/68.jpg)
Invariance vs. covariance
Invariance:
• features(transform(image)) = features(image)
Covariance:
• features(transform(image)) = transform(features(image))
Covariant detection => invariant description
Slide: S. Lazebnik
![Page 69: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/69.jpg)
Affine adaptation
• Affine transformation approximates viewpoint
changes for roughly planar objects and
roughly orthographic cameras
Slide: S. Lazebnik
![Page 70: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/70.jpg)
Affine adaptation
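Consider the second moment matrix of the window
containing the blob:
$$M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} = R^{-1} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} R$$
Recall:
$$\begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix} = \text{const}$$
[Figure: ellipse with semi-axis $(\lambda_{\max})^{-1/2}$ along the direction of the fastest change and $(\lambda_{\min})^{-1/2}$ along the direction of the slowest change]
This ellipse visualizes the “characteristic shape” of the
window
Slide: S. Lazebnik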
![Page 71: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/71.jpg)
Affine adaptation example
Scale-invariant regions (blobs)
Slide: S. Lazebnik
![Page 72: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/72.jpg)
Affine adaptation example
Affine-adapted blobs
Slide: S. Lazebnik
![Page 73: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/73.jpg)
Software
VLFeat: Vision Library Features http://www.vlfeat.org/
(will be used in this course)
– Local image features (Harris, SIFT, MSER, …)
– Local image descriptors (SIFT, LBP, …)
– Feature encoding (VLAD, Fisher)
– Machine learning tools (k-means, GMM, SVM)
– Matlab and C interfaces
![Page 74: Visual Recognition: Objects, Actions and Scenes](https://reader036.vdocuments.site/reader036/viewer/2022081412/629b3b06339dbf0461753806/html5/thumbnails/74.jpg)
References
– Schmid, C. and Mohr, R. (1997). Local grayvalue invariants for image
retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence
19(5): 530–535.
– Lindeberg, T. (1998). Feature detection with automatic scale selection,
International Journal of Computer Vision 30(2): 77–116.
– Lowe, D. (2004). Distinctive image features from scale-invariant keypoints,
International Journal of Computer Vision 60(2): 91–110.
– Mikolajczyk, K. and Schmid, C. (2002). An affine invariant interest point
detector, Proc. Seventh European Conference on Computer Vision, Vol.
2350 of Lecture Notes in Computer Science, Springer Verlag, Berlin,
Copenhagen, Denmark, pp. I:128–142.
– Mikolajczyk, K. and Schmid, C. (2003). A performance evaluation of local
descriptors, Proc. Computer Vision and Pattern Recognition, pp. II: 257–263.
– Sivic, J. and Zisserman, A. (2003). Video google: A text retrieval approach to
object matching in videos, Proc. Ninth International Conference on
Computer Vision, Nice, France, pp. 1470–1477.
– VLFeat (Vision Library Features) http://www.vlfeat.org/