visual recognition: objects, actions and scenes

Includes slides from: O. Chum, K. Grauman, S. Lazebnik, B. Leibe, D. Lowe, J. Philbin, J. Ponce, D. Nister, J. Sivic, N. Snavely and A. Zisserman

Lecture 2: Local invariant image features

Ivan Laptev


INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548

Laboratoire d’Informatique, Ecole Normale Supérieure, Paris, France

University of Trento

July 7-10, 2014

Trento, Italy


Image matching and recognition with local features

The goal: establish correspondence between two or more images.

Image points x and x’ are in correspondence if they are projections of the same 3D scene point X.

[Figure: scene point X projected by cameras P and P’ to image points x and x’. Images courtesy A. Zisserman]
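To make “projections of the same 3D scene point” concrete, here is a minimal sketch. It assumes ideal pinhole cameras given as 3×4 projection matrices (named P1 and P2 here, standing in for P and P’) with made-up values; the two image points it produces, x = PX and x’ = P’X, are by construction in correspondence.

```python
# Minimal sketch, assuming ideal pinhole cameras given as 3x4 matrices
# (plain nested lists); all numbers are illustrative.

def project(P, X):
    """Image point of the 3D point X under camera P: x = P X, then divide by w."""
    Xh = list(X) + [1.0]                                   # homogeneous 3D point
    x, y, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    return (x / w, y / w)

P1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]    # camera centred at the origin
P2 = [[1, 0, 0, -1], [0, 1, 0, 0], [0, 0, 1, 0]]   # camera centred at (1, 0, 0)

X = (1.0, 2.0, 4.0)
x1 = project(P1, X)   # (0.25, 0.5)
x2 = project(P2, X)   # (0.0, 0.5)
print(x1, x2)
```

Matching works in the opposite direction: only x1 and x2 are observed, and the task is to decide that they come from the same X.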

Example I: Wide baseline matching and 3D reconstruction

Establish correspondence between two (or more) images.

[Schaffalitzky and Zisserman ECCV 2002]


[Agarwal, Snavely, Simon, Seitz, Szeliski, ICCV’09] – Building Rome in a Day

57,845 downloaded images, 11,868 registered images. This video: 4,619 images.

Example II: Object recognition

[D. Lowe, 1999]

Establish correspondence between the target image and (multiple) images in the model database.

[Figure: target image and model database]

Find these landmarks ... in these images and 1M more

Example III: Visual search

Given a query image, find images depicting the same place / object in a large unordered image collection.

Establish correspondence between the query image and all images from the database depicting the same object / scene.

[Figure: query image and database image(s)]

Bing visual scan

Mobile visual search and others… Snaptell.com, Millpix.com

Applications

Take a picture of a product or advertisement → find relevant information on the web

[Pixee – Milpix]

Applications

Finding stolen/missing objects in a large collection…

Applications

Copy detection for images and videos

Query video; search in 200h of video

K. Grauman, B. Leibe

Applications

Sony Aibo – Robotics

• Recognize docking station
• Communicate with visual cards
• Place recognition
• Loop closure in SLAM

Slide credit: David Lowe

Why is it difficult?

Want to establish correspondence despite possibly large changes in scale, viewpoint, lighting and partial occlusion

[Figure: example image pairs with viewpoint, scale, lighting and occlusion changes]

… and the image collection can be very large (e.g. 1M images)

How does it work?

Approach:
• Compute scale / affine covariant local features
• Estimate pairwise best matches between local features
• Enforce geometric constraints between local features
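The second step (pairwise best matches) is often implemented as nearest-neighbour search over feature descriptors. The minimal sketch below adds the distance-ratio test from Lowe (2004): a match is kept only if the nearest neighbour is clearly closer than the second nearest. The function name `match_features` and the ratio 0.8 are illustrative choices, and brute-force search stands in for the approximate search needed for very large collections.

```python
import math

def match_features(desc1, desc2, ratio=0.8):
    """Return (i, j) pairs where desc2[j] is the nearest neighbour of desc1[i]
    and passes the ratio test d(nearest) < ratio * d(second nearest)."""
    matches = []
    for i, d1 in enumerate(desc1):
        # Brute-force Euclidean distances to every descriptor in the second image.
        dists = sorted((math.dist(d1, d2), j) for j, d2 in enumerate(desc2))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches

# Toy 2D "descriptors": the second query has two equally close candidates,
# so it is ambiguous and rejected by the ratio test.
desc1 = [[0.0, 0.0], [5.0, 5.0]]
desc2 = [[0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
print(match_features(desc1, desc2))
```

Rejecting ambiguous matches early leaves fewer outliers for the third step (geometric constraints).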

Why extract features?

• Motivation: panorama stitching
  • We have two images – how do we combine them?

Slide: S. Lazebnik






Step 1: extract features

Step 2: match features

Step 3: align images

Slide: S. Lazebnik

Characteristics of good features

• Repeatability
  • The same feature can be found in several images despite geometric and photometric transformations
• Saliency
  • Each feature is distinctive
• Compactness and efficiency
  • Many fewer features than image pixels
• Locality
  • A feature occupies a relatively small area of the image; robust to clutter and occlusion

Slide: S. Lazebnik

A hard feature matching problem

NASA Mars Rover images

Slide: S. Lazebnik

NASA Mars Rover images with SIFT feature matches

Figure by Noah Snavely

Answer below (look for tiny colored squares…)

Slide: S. Lazebnik

Corner Detection: Basic Idea

• We should easily recognize the point by looking through a small window
• Shifting a window in any direction should give a large change in intensity

“flat” region: no change in all directions
“edge”: no change along the edge direction
“corner”: significant change in all directions

Source: A. Efros

Corner Detection: Mathematics

Change in appearance of window W for the shift [u,v]:

$$E(u,v) = \sum_{(x,y)\in W} \left[ I(x+u,\,y+v) - I(x,y) \right]^2$$

[Figure: image I(x, y), window W and the surface E(u, v), with E(3,2) marked]

Slide: S. Lazebnik
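E(u, v) can be evaluated directly, which is a useful check on the intuition of the previous slide. In this minimal sketch the image is a nested list indexed img[y][x], the window W is a list of pixel coordinates, and the toy edge and corner images are made up; the caller is assumed to keep shifted pixels inside the image.

```python
def window_change(img, window, u, v):
    """E(u, v) = sum over (x, y) in W of [I(x + u, y + v) - I(x, y)]^2."""
    return sum((img[y + v][x + u] - img[y][x]) ** 2 for (x, y) in window)

# Toy 8x8 images: a vertical edge at x = 4 and a bright quadrant (corner) at (4, 4).
edge = [[1 if x >= 4 else 0 for x in range(8)] for y in range(8)]
corner = [[1 if (x >= 4 and y >= 4) else 0 for x in range(8)] for y in range(8)]
W = [(x, y) for y in range(2, 6) for x in range(2, 6)]   # 4x4 window around (4, 4)

print(window_change(edge, W, 0, 1), window_change(edge, W, 1, 0))      # 0 4
print(window_change(corner, W, 0, 1), window_change(corner, W, 1, 0))  # 2 2
```

Shifting along the edge leaves E unchanged, while at the corner every shift direction changes E.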




We want to find out how this function E(u, v) behaves for small shifts.

Slide: S. Lazebnik


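• First-order Taylor approximation for small motions [u, v]:

$$I(x+u,\,y+v) = I(x,y) + I_x u + I_y v + \text{higher order terms} \approx I(x,y) + \begin{bmatrix} I_x & I_y \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}$$

• Let’s plug this into

$$E(u,v) = \sum_{(x,y)\in W} \left[ I(x+u,\,y+v) - I(x,y) \right]^2$$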

Slide: S. Lazebnik


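$$\begin{aligned}
E(u,v) &= \sum_{(x,y)\in W} \left[ I(x+u,\,y+v) - I(x,y) \right]^2 \\
&\approx \sum_{(x,y)\in W} \left[ I(x,y) + \begin{bmatrix} I_x & I_y \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} - I(x,y) \right]^2 \\
&= \sum_{(x,y)\in W} \left( \begin{bmatrix} I_x & I_y \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} \right)^2 \\
&= \sum_{(x,y)\in W} \begin{bmatrix} u & v \end{bmatrix} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}
\end{aligned}$$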

Slide: S. Lazebnik


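The quadratic approximation simplifies to

$$E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}$$

where M is a second moment matrix computed from image derivatives:

$$M = \sum_{(x,y)\in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$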

Slide: S. Lazebnik
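The sketch below computes M for a window and compares the quadratic form with E(u, v). It assumes central differences for I_x and I_y and uses a linear ramp I = 2x + 3y (a made-up test image) for which the first-order Taylor expansion is exact, so both sides agree exactly; on real images they agree only for small shifts.

```python
def second_moment_matrix(img, window):
    """M = sum over W of [[Ix^2, Ix*Iy], [Ix*Iy, Iy^2]] (central-difference derivatives)."""
    sxx = sxy = syy = 0.0
    for (x, y) in window:
        ix = (img[y][x + 1] - img[y][x - 1]) / 2.0
        iy = (img[y + 1][x] - img[y - 1][x]) / 2.0
        sxx += ix * ix
        sxy += ix * iy
        syy += iy * iy
    return [[sxx, sxy], [sxy, syy]]

def quadratic_form(M, u, v):
    """[u v] M [u v]^T, the quadratic approximation of E(u, v)."""
    return M[0][0] * u * u + 2 * M[0][1] * u * v + M[1][1] * v * v

ramp = [[2 * x + 3 * y for x in range(10)] for y in range(10)]   # I = 2x + 3y
W = [(x, y) for y in range(3, 7) for x in range(3, 7)]           # 16 pixels
M = second_moment_matrix(ramp, W)                                 # 16 * [[4, 6], [6, 9]]
E_11 = sum((ramp[y + 1][x + 1] - ramp[y][x]) ** 2 for (x, y) in W)
print(M, quadratic_form(M, 1, 1), E_11)
```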

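Interpreting the second moment matrix

• The surface E(u,v) is locally approximated by a quadratic form. Let’s try to understand its shape.
• Specifically, in which directions does it have the smallest/greatest change?

$$E(u,v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}, \qquad M = \sum_{(x,y)\in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$

[Figure: surface E(u, v)]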

Slide: S. Lazebnik

First, consider the axis-aligned case (gradients are either horizontal or vertical):

$$M = \sum_{(x,y)\in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}$$

If either a or b is close to 0, then this is not a corner, so look for locations where both are large.

Slide: S. Lazebnik

Consider a horizontal “slice” of E(u, v):

$$\begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix} = \text{const}$$

This is the equation of an ellipse.

Slide: S. Lazebnik

Visualization of second moment matrices

Slide: S. Lazebnik

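Diagonalization of M:

$$M = R^{-1} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} R$$

The axis lengths of the ellipse are determined by the eigenvalues and its orientation is determined by R: the axis along the direction of the fastest change is proportional to $(\lambda_{\max})^{-1/2}$, and the axis along the direction of the slowest change is proportional to $(\lambda_{\min})^{-1/2}$.

[Figure: ellipse with axes $(\lambda_{\max})^{-1/2}$ (direction of the fastest change) and $(\lambda_{\min})^{-1/2}$ (direction of the slowest change)]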

Slide: S. Lazebnik
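For a 2×2 symmetric matrix the eigenvalues have a closed form, so the ellipse can be computed without a linear-algebra library. This minimal sketch (function names are our own) returns the eigenvalues and the semi-axis lengths of [u v] M [u v]ᵀ = const, taking const = 1 by default.

```python
import math

def eigen_2x2_symmetric(M):
    """Eigenvalues (lam_max, lam_min) of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = M[0][0], M[0][1], M[1][1]
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean + radius, mean - radius

def ellipse_semi_axes(M, const=1.0):
    """Semi-axes of [u v] M [u v]^T = const: along the fastest change
    sqrt(const / lam_max), along the slowest change sqrt(const / lam_min)."""
    lam_max, lam_min = eigen_2x2_symmetric(M)
    return math.sqrt(const / lam_max), math.sqrt(const / lam_min)

print(eigen_2x2_symmetric([[2.0, 1.0], [1.0, 2.0]]))   # (3.0, 1.0)
print(ellipse_semi_axes([[4.0, 0.0], [0.0, 1.0]]))      # (0.5, 1.0)
```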

Interpreting the eigenvalues

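Classification of image points using eigenvalues of M:

• “Corner”: $\lambda_1$ and $\lambda_2$ are large, $\lambda_1 \sim \lambda_2$; E increases in all directions
• “Edge”: $\lambda_1 \gg \lambda_2$ (or $\lambda_2 \gg \lambda_1$)
• “Flat” region: $\lambda_1$ and $\lambda_2$ are small; E is almost constant in all directions

[Figure: the ($\lambda_1$, $\lambda_2$) plane divided into “corner”, “edge” and “flat” regions]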

Slide: S. Lazebnik
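The classification can be written as a few comparisons on the eigenvalues. The thresholds `small` and `edge_ratio` below are hypothetical values chosen only for illustration; the slide does not fix them, and in practice they depend on image contrast.

```python
def classify_point(lam1, lam2, small=1e-2, edge_ratio=10.0):
    """Label a point from the eigenvalues of M.
    'small' and 'edge_ratio' are illustrative (hypothetical) thresholds."""
    lo, hi = min(lam1, lam2), max(lam1, lam2)
    if hi < small:
        return "flat"     # both eigenvalues small: E almost constant
    if hi > edge_ratio * lo:
        return "edge"     # one eigenvalue dominates: E changes in one direction only
    return "corner"       # both large and comparable: E increases in all directions

print(classify_point(5.0, 4.0), classify_point(5.0, 0.01), classify_point(0.001, 0.002))
```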

Corner response function

$$R = \det(M) - \alpha \, \operatorname{trace}(M)^2 = \lambda_1 \lambda_2 - \alpha (\lambda_1 + \lambda_2)^2$$

α: constant (0.04 to 0.06)

• “Corner”: R > 0
• “Edge”: R < 0
• “Flat” region: |R| small

Slide: S. Lazebnik
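The response needs only the determinant and trace of M, so no eigendecomposition is required. A minimal sketch, with α = 0.05 taken from the range above and made-up test matrices:

```python
def harris_response(M, alpha=0.05):
    """R = det(M) - alpha * trace(M)^2 for a 2x2 second moment matrix M."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    trace = M[0][0] + M[1][1]
    return det - alpha * trace ** 2

print(harris_response([[10.0, 0.0], [0.0, 10.0]]))    # corner-like: R > 0
print(harris_response([[10.0, 0.0], [0.0, 0.1]]))     # edge-like:   R < 0
print(harris_response([[0.01, 0.0], [0.0, 0.01]]))    # flat:        |R| small
```

For M = [[2, 1], [1, 2]] the eigenvalues are 3 and 1, and det − α·trace² gives the same value as λ1λ2 − α(λ1 + λ2)², namely 3 − 0.05·16 = 2.2.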

Harris detector: Steps

1. Compute Gaussian derivatives at each pixel
2. Compute second moment matrix M in a Gaussian window around each pixel
3. Compute corner response function R
4. Threshold R
5. Find local maxima of response function (nonmaximum suppression)

C. Harris and M. Stephens. “A Combined Corner and Edge Detector.” Proceedings of the 4th Alvey Vision Conference, pages 147–151, 1988.

Slide: S. Lazebnik

http://www.bmva.org/bmvc/1988/avc-88-023.pdf
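The five steps can be sketched end to end in plain Python. Assumptions, all ours rather than the slide's: derivatives are central differences of the Gaussian-smoothed image (an approximation of Gaussian derivatives), σ_d = 1.0, σ_i = 1.5 and α = 0.05 are illustrative, the threshold is taken relative to the strongest response, and non-maximum suppression uses a 3×3 neighbourhood. It is a slow teaching version, not a replacement for library implementations such as VLFeat's.

```python
import math

def gaussian_kernel(sigma):
    """Normalised 1D Gaussian sampled on [-ceil(3*sigma), ceil(3*sigma)]."""
    radius = max(1, int(math.ceil(3 * sigma)))
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def smooth(img, kernel):
    """Separable smoothing of a 2D list: rows first, then columns, with clamped borders."""
    h, w, r = len(img), len(img[0]), len(kernel) // 2
    rows = [[sum(kernel[j + r] * img[y][min(max(x + j, 0), w - 1)] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(kernel[j + r] * rows[min(max(y + j, 0), h - 1)][x] for j in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def harris_map(img, sigma_d=1.0, sigma_i=1.5, alpha=0.05):
    """Corner response R for every pixel of img (indexed img[y][x])."""
    h, w = len(img), len(img[0])
    # Step 1: derivatives of the Gaussian-smoothed image (central differences).
    s = smooth(img, gaussian_kernel(sigma_d))
    ix = [[(s[y][min(x + 1, w - 1)] - s[y][max(x - 1, 0)]) / 2.0 for x in range(w)] for y in range(h)]
    iy = [[(s[min(y + 1, h - 1)][x] - s[max(y - 1, 0)][x]) / 2.0 for x in range(w)] for y in range(h)]
    # Step 2: entries of M accumulated in a Gaussian window of scale sigma_i.
    g = gaussian_kernel(sigma_i)
    sxx = smooth([[ix[y][x] * ix[y][x] for x in range(w)] for y in range(h)], g)
    syy = smooth([[iy[y][x] * iy[y][x] for x in range(w)] for y in range(h)], g)
    sxy = smooth([[ix[y][x] * iy[y][x] for x in range(w)] for y in range(h)], g)
    # Step 3: R = det(M) - alpha * trace(M)^2.
    return [[sxx[y][x] * syy[y][x] - sxy[y][x] ** 2 - alpha * (sxx[y][x] + syy[y][x]) ** 2
             for x in range(w)] for y in range(h)]

def harris_corners(img, rel_threshold=0.1, **params):
    """Return (x, y) positions of thresholded local maxima of R."""
    r = harris_map(img, **params)
    h, w = len(r), len(r[0])
    # Step 4: threshold relative to the strongest response (assumes some R > 0).
    threshold = rel_threshold * max(max(row) for row in r)
    corners = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = r[y][x]
            # Step 5: non-maximum suppression over the 3x3 neighbourhood.
            if v > threshold and all(v >= r[y + dy][x + dx]
                                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)):
                corners.append((x, y))
    return corners

# Usage: a bright 20x20 square on a dark 40x40 background has four corners.
square = [[1.0 if 10 <= x <= 29 and 10 <= y <= 29 else 0.0 for x in range(40)] for y in range(40)]
print(harris_corners(square))
```

Each detection should lie within a pixel or two of a corner of the square. Along the sides R is negative, and in the uniform interior it is zero.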

Harris Detector: Steps

Compute corner response R

Find points with large corner response: R > threshold

Take only the points of local maxima of R

Slide: S. Lazebnik

Invariance and covariance

• We want corner locations to be invariant to photometric transformations and covariant to geometric transformations
  • Invariance: image is transformed and corner locations do not change
  • Covariance: if we have two transformed versions of the same image, features should be detected in corresponding locations

Slide: S. Lazebnik

Affine intensity change

• Only derivatives are used => invariance to intensity shift I → I + b
• Intensity scaling: I → a I

[Figure: R plotted against x (image coordinate) with a fixed threshold, before and after intensity scaling]

Partially invariant to affine intensity change I → a I + b

Slide: S. Lazebnik
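Why the invariance is only partial follows in one line from the definitions of M and R given earlier: the offset b vanishes under differentiation, while the gain a multiplies every derivative.

$$I \to aI + b \;\Rightarrow\; (I_x, I_y) \to (aI_x, aI_y) \;\Rightarrow\; M \to a^2 M \;\Rightarrow\; R = \det(M) - \alpha\,\operatorname{trace}(M)^2 \to a^4 R$$

The positions of the local maxima of R are therefore unchanged, but with a fixed threshold the set of points that pass step 4 of the Harris detector can grow or shrink.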

Image translation

• Derivatives and window function are shift-invariant

Corner location is covariant w.r.t. translation

Slide: S. Lazebnik

Image rotation

Second moment ellipse rotates but its shape (i.e. eigenvalues) remains the same

Corner location is covariant w.r.t. rotation

Slide: S. Lazebnik
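Rotating the image by a rotation matrix R turns every gradient g into Rg, so the second moment matrix becomes R M Rᵀ. This minimal sketch, using a made-up M and angle, checks numerically that the eigenvalues, and hence the ellipse shape and the response R, do not change.

```python
import math

def rotate_matrix(M, theta):
    """Second moment matrix after rotating the image by theta: R M R^T."""
    c, s = math.cos(theta), math.sin(theta)
    R = [[c, -s], [s, c]]
    RM = [[sum(R[i][k] * M[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return [[sum(RM[i][k] * R[j][k] for k in range(2)) for j in range(2)] for i in range(2)]

def eigenvalues(M):
    """Closed-form eigenvalues (largest first) of a symmetric 2x2 matrix."""
    a, b, c = M[0][0], M[0][1], M[1][1]
    mean, radius = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    return mean + radius, mean - radius

M = [[4.0, 1.0], [1.0, 2.0]]
M_rot = rotate_matrix(M, 0.7)
print(M_rot)                             # entries change ...
print(eigenvalues(M), eigenvalues(M_rot))  # ... eigenvalues do not (up to rounding)
```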

Scaling

[Figure: a corner and the same corner magnified; within a window of fixed size, all points on the magnified corner will be classified as edges]

Corner location is not covariant to scaling!

Slide: S. Lazebnik

Blob detection

Slide: S. Lazebnik

Feature detection with scale selection

• We want to extract features with characteristic scale that is covariant with the image transformation

Slide: S. Lazebnik

Blob detection: basic idea

• To detect blobs, convolve the image with a “blob filter” at multiple scales and look for maxima of filter response in the resulting scale space

Slide: S. Lazebnik

Blob filter

Laplacian of Gaussian: Circularly symmetric operator for blob detection in 2D

$$\nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}$$

Slide: S. Lazebnik

Recall: Edge detection

[Figure: signal f containing an edge; derivative of Gaussian $\frac{d}{dx} g$; response $f * \frac{d}{dx} g$]

Edge = maximum of derivative

Source: S. Seitz

Edge detection, Take 2

[Figure: signal f containing an edge; second derivative of Gaussian (Laplacian) $\frac{d^2}{dx^2} g$; response $f * \frac{d^2}{dx^2} g$]

Edge = zero crossing of second derivative

Source: S. Seitz
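Both recipes can be checked on a 1D step. In this minimal sketch the kernels are sampled first and second derivatives of a Gaussian with σ = 2, truncated at 4σ (our choices), and the step rises between samples 19 and 20.

```python
import math

def gaussian_derivative_kernels(sigma):
    """Sampled first and second derivatives of a 1D Gaussian on [-4*sigma, 4*sigma]."""
    radius = int(math.ceil(4 * sigma))
    xs = range(-radius, radius + 1)
    g = [math.exp(-x * x / (2 * sigma * sigma)) / (math.sqrt(2 * math.pi) * sigma) for x in xs]
    dg = [-x / sigma ** 2 * gx for x, gx in zip(xs, g)]
    d2g = [(x * x / sigma ** 2 - 1) / sigma ** 2 * gx for x, gx in zip(xs, g)]
    return dg, d2g

def convolve(f, k):
    """Same-size 1D convolution (f * k)[i] = sum_j k[j] f[i - j], borders clamped."""
    r, n = len(k) // 2, len(f)
    return [sum(k[j + r] * f[min(max(i - j, 0), n - 1)] for j in range(-r, r + 1)) for i in range(n)]

f = [0.0] * 20 + [1.0] * 20           # step edge between samples 19 and 20
dg, d2g = gaussian_derivative_kernels(2.0)
first = convolve(f, dg)               # Take 1: maximum at the edge
second = convolve(f, d2g)             # Take 2: zero crossing at the edge
print(first.index(max(first)), second[19], second[20])
```

The first response peaks at the edge, while the second is positive on one side of the edge and negative on the other.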

From edges to blobs

• Edge = ripple
• Blob = superposition of two ripples

Spatial selection: the magnitude of the Laplacian response will achieve a maximum at the center of the blob, provided the scale of the Laplacian is “matched” to the scale of the blob

[Figure: Laplacian response with its maximum at the blob center]

Slide: S. Lazebnik

Scale-space blob detector: Example

Slide: S. Lazebnik

Scale-space blob detector

1. Convolve image with scale-normalized Laplacian at several scales
2. Find maxima of squared Laplacian response in scale-space

Slide: S. Lazebnik
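Scale selection can be illustrated on a single bright disk. To keep the sketch short it evaluates the scale-normalized Laplacian σ²(G_xx + G_yy) only at the disk centre rather than convolving the whole image, and searches a hand-picked list of scales. For a continuous disk of radius r the squared response at the centre peaks at σ = r/√2, and the sampled version below should land close to that value.

```python
import math

def normalized_log(x, y, sigma):
    """Scale-normalized Laplacian of Gaussian: sigma^2 * (G_xx + G_yy)."""
    r2 = x * x + y * y
    g = math.exp(-r2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
    return sigma ** 2 * (r2 - 2 * sigma ** 2) / sigma ** 4 * g

def response_at_centre(radius, sigma):
    """Squared normalized-LoG response at the centre of a bright disk.
    Only disk pixels contribute because the image is zero elsewhere."""
    r = int(radius)
    total = sum(normalized_log(x, y, sigma)
                for x in range(-r, r + 1) for y in range(-r, r + 1)
                if x * x + y * y <= radius * radius)
    return total ** 2

radius = 10
sigmas = [2 + 0.25 * i for i in range(41)]       # candidate scales 2.0 ... 12.0
best = max(sigmas, key=lambda s: response_at_centre(radius, s))
print(best, radius / math.sqrt(2))
```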


Efficient implementation

• Approximating the Laplacian with a difference of Gaussians:

$$L = \sigma^2 \left( G_{xx}(x,y,\sigma) + G_{yy}(x,y,\sigma) \right) \quad \text{(Laplacian)}$$

$$DoG = G(x,y,k\sigma) - G(x,y,\sigma) \quad \text{(Difference of Gaussians)}$$

Slide: S. Lazebnik
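The link between the two formulas comes from ∂G/∂σ = σ∇²G, which gives G(x,y,kσ) − G(x,y,σ) ≈ (k − 1)σ²∇²G. The approximation is first order in k − 1, so this minimal sketch checks it numerically with k close to 1 at a few made-up points. With the larger k used in practice, the factor (k − 1) is the same at every scale, so it does not move the extrema.

```python
import math

def gaussian(x, y, sigma):
    return math.exp(-(x * x + y * y) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)

def laplacian_of_gaussian(x, y, sigma):
    """G_xx + G_yy = (x^2 + y^2 - 2 sigma^2) / sigma^4 * G."""
    return (x * x + y * y - 2 * sigma ** 2) / sigma ** 4 * gaussian(x, y, sigma)

def dog(x, y, sigma, k):
    """Difference of Gaussians G(x, y, k*sigma) - G(x, y, sigma)."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)

sigma, k = 2.0, 1.001
for (x, y) in [(0, 0), (1, 2), (3, 1)]:
    approx = (k - 1) * sigma ** 2 * laplacian_of_gaussian(x, y, sigma)
    print((x, y), dog(x, y, sigma, k), approx)
```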

Efficient implementation

David G. Lowe. “Distinctive image features from scale-invariant keypoints.” IJCV 60(2), pp. 91-110, 2004.

Slide: S. Lazebnik

http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf

From feature detection to feature description

• Scaled and rotated versions of the same neighborhood will give rise to blobs that are related by the same transformation
• What to do if we want to compare the appearance of these image regions?
  • Normalization: transform these regions into same-size circles
  • Problem: rotational ambiguity

Slide: S. Lazebnik

Eliminating rotation ambiguity

• To assign a unique orientation to circular image windows:
  • Create histogram of local gradient directions in the patch
  • Assign canonical orientation at peak of smoothed histogram

[Figure: histogram of gradient directions over the range 0 to 2π]

Slide: S. Lazebnik
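The orientation assignment can be sketched as a magnitude-weighted histogram over [0, 2π). The 36 bins follow Lowe (2004). The circular [1, 2, 1]/4 smoothing and returning the centre of the peak bin are simplifying assumptions; Lowe also weights samples by a Gaussian window and interpolates the peak position.

```python
import math

def dominant_orientation(gradients, n_bins=36):
    """gradients: iterable of (magnitude, angle in radians).
    Returns the centre (radians) of the peak bin of the smoothed histogram."""
    width = 2 * math.pi / n_bins
    hist = [0.0] * n_bins
    for mag, ang in gradients:
        # Wrap the angle into [0, 2*pi) and vote with the gradient magnitude.
        hist[int((ang % (2 * math.pi)) / width) % n_bins] += mag
    # Circular [1, 2, 1] / 4 smoothing (an illustrative choice).
    smoothed = [(hist[i - 1] + 2 * hist[i] + hist[(i + 1) % n_bins]) / 4.0 for i in range(n_bins)]
    peak = max(range(n_bins), key=smoothed.__getitem__)
    return (peak + 0.5) * width

# Made-up patch: most gradient energy points at about 1 radian.
grads = [(1.0, 1.0)] * 10 + [(0.5, 4.0)] * 3
print(dominant_orientation(grads))
```

Rotating the patch by this angle before computing the descriptor removes the rotational ambiguity.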

SIFT features

• Detected features with characteristic scales and orientations:

David G. Lowe. “Distinctive image features from scale-invariant keypoints.” IJCV 60(2), pp. 91-110, 2004. Slide: S. Lazebnik


SIFT descriptors


David G. Lowe. “Distinctive image features from scale-invariant keypoints.” IJCV 60(2), pp. 91-110, 2004. Slide: S. Lazebnik
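As a rough illustration of the SIFT descriptor layout described by Lowe (2004): a 16×16 patch is split into 4×4 cells, each cell contributes an 8-bin orientation histogram (128 values), and the vector is normalised, clipped at 0.2 and renormalised. This sketch assumes the orientations are already expressed relative to the keypoint's canonical orientation. It omits Lowe's Gaussian weighting and trilinear interpolation, so it is a simplified stand-in, not the reference implementation.

```python
import math

def sift_like_descriptor(mag, ang):
    """Simplified SIFT-style descriptor from 16x16 lists of gradient magnitudes
    and orientations (radians, relative to the keypoint orientation)."""
    desc = [0.0] * 128
    for y in range(16):
        for x in range(16):
            cell = (y // 4) * 4 + (x // 4)                              # 4x4 grid of cells
            b = int((ang[y][x] % (2 * math.pi)) / (2 * math.pi / 8)) % 8  # 8 orientation bins
            desc[cell * 8 + b] += mag[y][x]

    def normalise(v):
        n = math.sqrt(sum(a * a for a in v))
        return [a / n for a in v] if n > 0 else v

    desc = normalise(desc)
    # Clip large entries to reduce the influence of strong gradients, then renormalise.
    return normalise([min(a, 0.2) for a in desc])

mag = [[1.0] * 16 for _ in range(16)]
ang = [[0.1 * (x + 3 * y) for x in range(16)] for y in range(16)]
d = sift_like_descriptor(mag, ang)
print(len(d), round(sum(v * v for v in d), 6))
```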


Invariance vs. covariance

Invariance:
• features(transform(image)) = features(image)

Covariance:
• features(transform(image)) = transform(features(image))

Covariant detection => invariant description

Slide: S. Lazebnik
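A toy 1D example makes the two definitions concrete. Here the transform is a circular shift, the hypothetical detector returns the position of the maximum (covariant: it moves with the shift), and the hypothetical descriptor reads the values around the detected position (invariant: it is identical before and after the shift). The signal values are made up.

```python
def shift(signal, t):
    """Circular translation of a 1D signal by t samples (the transform)."""
    n = len(signal)
    return [signal[(i - t) % n] for i in range(n)]

def detect(signal):
    """Covariant detector: position of the maximum."""
    return max(range(len(signal)), key=signal.__getitem__)

def describe(signal, radius=2):
    """Invariant description: values in a window centred on the detected position."""
    n, p = len(signal), detect(signal)
    return [signal[(p + d) % n] for d in range(-radius, radius + 1)]

s = [0, 1, 3, 7, 4, 2, 0, 0, 1, 0]
print(detect(s), detect(shift(s, 3)))        # position moves with the shift
print(describe(s), describe(shift(s, 3)))    # description stays the same
```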

Affine adaptation

• Affine transformation approximates viewpoint changes for roughly planar objects and roughly orthographic cameras

Slide: S. Lazebnik

Affine adaptation

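Consider the second moment matrix of the window containing the blob:

$$M = \sum_{x,y} w(x,y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} = R^{-1} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} R$$

Recall:

$$\begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix} = \text{const}$$

This ellipse visualizes the “characteristic shape” of the window.

[Figure: ellipse with axis $(\lambda_{\max})^{-1/2}$ along the direction of the fastest change and $(\lambda_{\min})^{-1/2}$ along the direction of the slowest change]

Slide: S. Lazebnik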
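One way to turn this ellipse into a circle, in the spirit of the affine normalisation of Mikolajczyk and Schmid (2002), is to map each point p to M^{1/2} p. Points with pᵀMp = 1 then land on the unit circle, because |M^{1/2}p|² = pᵀMp. The sketch uses the closed-form square root of a symmetric positive-definite 2×2 matrix, (M + sI)/t with s = √det M and t = √(trace M + 2s), and a made-up M.

```python
import math

def sqrt_spd_2x2(M):
    """Symmetric square root of a symmetric positive-definite 2x2 matrix."""
    (a, b), (_, c) = M
    s = math.sqrt(a * c - b * b)       # sqrt(det M)
    t = math.sqrt(a + c + 2 * s)       # sqrt(trace M + 2 sqrt(det M))
    return [[(a + s) / t, b / t], [b / t, (c + s) / t]]

def apply(A, p):
    return (A[0][0] * p[0] + A[0][1] * p[1], A[1][0] * p[0] + A[1][1] * p[1])

M = [[4.0, 1.0], [1.0, 2.0]]           # illustrative second moment matrix
A = sqrt_spd_2x2(M)

# Sample points on the ellipse [u v] M [u v]^T = 1 ...
ellipse = []
for i in range(8):
    d = (math.cos(i * math.pi / 4), math.sin(i * math.pi / 4))
    q = d[0] * (M[0][0] * d[0] + M[0][1] * d[1]) + d[1] * (M[1][0] * d[0] + M[1][1] * d[1])
    ellipse.append((d[0] / math.sqrt(q), d[1] / math.sqrt(q)))

# ... and map them with A = M^(1/2): all radii become 1.
radii = [math.hypot(*apply(A, p)) for p in ellipse]
print(radii)
```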

Affine adaptation example

Scale-invariant regions (blobs)

Slide: S. Lazebnik

Affine adaptation example

Affine-adapted blobs

Slide: S. Lazebnik

Software

VLFeat: Vision Library Features http://www.vlfeat.org/

(will be used in this course)

– Local image features (Harris, SIFT, MSER, …)
– Local image descriptors (SIFT, LBP, …)
– Feature encoding (VLAD, Fisher)
– Machine learning tools (k-means, GMM, SVM)
– Matlab and C interfaces

References

– Schmid, C. and Mohr, R. (1997). Local grayvalue invariants for image retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence 19(5): 530–535.
– Lindeberg, T. (1998). Feature detection with automatic scale selection, International Journal of Computer Vision 30(2): 77–116.
– Lowe, D. (2004). Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60(2): 91–110.
– Mikolajczyk, K. and Schmid, C. (2002). An affine invariant interest point detector, Proc. Seventh European Conference on Computer Vision, Vol. 2350 of Lecture Notes in Computer Science, Springer Verlag, Berlin, Copenhagen, Denmark, pp. I:128–142.
– Mikolajczyk, K. and Schmid, C. (2003). A performance evaluation of local descriptors, Proc. Computer Vision and Pattern Recognition, pp. II:257–263.
– Sivic, J. and Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos, Proc. Ninth International Conference on Computer Vision, Nice, France, pp. 1470–1477.
– VLFeat (Vision Library Features) http://www.vlfeat.org/
