
Bag of Visual Words for Image Representation & Visual Search

Jianping Fan Dept of Computer Science UNC-Charlotte

1. Interest Point Extraction & SIFT

2. Clustering for Dictionary Learning

3. Bag of Visual Words

4. Image Representation & Applications

Bag-of-Visual-Words

Interest Point Extraction

• Scale-space extrema detection

– Uses difference-of-Gaussian function

• Keypoint localization

– Sub-pixel location and scale fit to a model

• Orientation assignment

– 1 or more for each keypoint

• Keypoint descriptor

– Created from local image gradients

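Scale space

• Definition:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$$

where $*$ is convolution in x and y, $I(x, y)$ is the input image, and

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2) / 2\sigma^2}$$

Scale space

• Keypoints are detected using scale-space extrema in the difference-of-Gaussian function D

• D definition:

$$D(x, y, \sigma) = \big(G(x, y, k\sigma) - G(x, y, \sigma)\big) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$$

• Efficient to compute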
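The construction of L and D for a single octave can be sketched as follows. This is a minimal illustration, not the reference SIFT implementation: the function names, the 3-sigma kernel truncation, the edge padding, and the defaults `sigma0=1.6`, `k=sqrt(2)`, and `num_scales=4` are all assumptions chosen for the example.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Sampled, normalized 1-D Gaussian truncated at about 3 sigma (assumed cutoff)."""
    radius = max(1, int(np.ceil(3.0 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def gaussian_blur(image, sigma):
    """L(x, y, sigma) = G(x, y, sigma) * I(x, y) via separable convolution."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    padded = np.pad(image.astype(float), radius, mode="edge")
    # Convolve along rows, then along columns; "valid" removes the padding.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def dog_octave(image, sigma0=1.6, k=2 ** 0.5, num_scales=4):
    """Return D = L(k * sigma) - L(sigma) for consecutive scales in one octave."""
    blurred = [gaussian_blur(image, sigma0 * k ** i) for i in range(num_scales)]
    return [blurred[i + 1] - blurred[i] for i in range(num_scales - 1)]
```

Each DoG image costs only one subtraction once the blurred images exist, which is why D is efficient to compute.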

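Relationship of D to $\sigma^2 \nabla^2 G$

• Close approximation to scale-normalized Laplacian of Gaussian, $\sigma^2 \nabla^2 G$

• Diffusion equation:

$$\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G$$

• Approximate ∂G/∂σ:

$$\sigma \nabla^2 G = \frac{\partial G}{\partial \sigma} \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}$$

– giving,

$$G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\, \sigma^2 \nabla^2 G$$

• When D has scales differing by a constant factor it already incorporates the $\sigma^2$ scale normalization required for scale-invariance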
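The approximation of the difference of Gaussians by the scale-normalized Laplacian can be checked numerically at a single point. The sketch below uses the analytic Laplacian of the 2-D Gaussian; the function names and the sample point are illustrative assumptions.

```python
import math

def gaussian(x, y, sigma):
    """2-D Gaussian G(x, y, sigma)."""
    return math.exp(-(x * x + y * y) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)

def laplacian_of_gaussian(x, y, sigma):
    """Analytic Laplacian: ((x^2 + y^2 - 2 sigma^2) / sigma^4) * G."""
    return (x * x + y * y - 2.0 * sigma ** 2) / sigma ** 4 * gaussian(x, y, sigma)

def dog(x, y, sigma, k):
    """Left-hand side: G(x, y, k sigma) - G(x, y, sigma)."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)

def scaled_log(x, y, sigma, k):
    """Right-hand side: (k - 1) sigma^2 Laplacian(G)."""
    return (k - 1.0) * sigma ** 2 * laplacian_of_gaussian(x, y, sigma)

def relative_error(x, y, sigma, k):
    lhs, rhs = dog(x, y, sigma, k), scaled_log(x, y, sigma, k)
    return abs(lhs - rhs) / abs(rhs)
```

Calling `relative_error(0.5, 0.3, 1.6, 1.01)` gives a small value, and the error shrinks as k approaches 1, since the finite difference becomes a better estimate of ∂G/∂σ.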

Difference-of-Gaussian images

[Figure: difference-of-Gaussian images for the first, second, third, and fourth octaves]

Finding extrema

• A sample point is selected only if it is a minimum or a maximum among its 26 neighbors: 8 in the same DoG image and 9 in each of the two adjacent scales

DoG scale space

Extrema in this image
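The extremum test can be sketched as a comparison of a sample against the surrounding 3x3x3 block of DoG values. The array layout `(scale, row, column)`, the function name, and the use of strict inequalities are assumptions for this example.

```python
import numpy as np

def is_extremum(dog_stack, s, i, j):
    """Check whether dog_stack[s, i, j] is a strict min or max of its neighbors.

    dog_stack has shape (scales, rows, cols); (s, i, j) must be an interior
    index so that the full 3x3x3 neighborhood exists.
    """
    cube = dog_stack[s - 1:s + 2, i - 1:i + 2, j - 1:j + 2]
    center = dog_stack[s, i, j]
    # Flattened index 13 is the center of the 27-element cube; drop it.
    neighbors = np.delete(cube.ravel(), 13)
    return bool(center > neighbors.max() or center < neighbors.min())
```

Usage: loop over interior indices of the DoG stack for one octave and keep the positions where `is_extremum` returns `True` as keypoint candidates.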

Localization

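• A 3D quadratic function is fit to the local sample points

• Start with the Taylor expansion of D with the sample point as the origin

$$D(X) = D + \frac{\partial D}{\partial X}^T X + \frac{1}{2} X^T \frac{\partial^2 D}{\partial X^2} X$$

– where $X = (x, y, \sigma)^T$

• Take the derivative with respect to X, and set it to 0, giving

$$0 = \frac{\partial D}{\partial X} + \frac{\partial^2 D}{\partial X^2} \hat{X}$$

• $\hat{X} = -\left(\frac{\partial^2 D}{\partial X^2}\right)^{-1} \frac{\partial D}{\partial X}$ is the location of the keypoint

• This is a 3x3 linear system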

Localization

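• The 3x3 system, written out in the components of $\hat{X}$:

$$\begin{bmatrix} \frac{\partial^2 D}{\partial x^2} & \frac{\partial^2 D}{\partial x \partial y} & \frac{\partial^2 D}{\partial x \partial \sigma} \\ \frac{\partial^2 D}{\partial x \partial y} & \frac{\partial^2 D}{\partial y^2} & \frac{\partial^2 D}{\partial y \partial \sigma} \\ \frac{\partial^2 D}{\partial x \partial \sigma} & \frac{\partial^2 D}{\partial y \partial \sigma} & \frac{\partial^2 D}{\partial \sigma^2} \end{bmatrix} \begin{bmatrix} x \\ y \\ \sigma \end{bmatrix} = -\begin{bmatrix} \frac{\partial D}{\partial x} \\ \frac{\partial D}{\partial y} \\ \frac{\partial D}{\partial \sigma} \end{bmatrix}$$

• Derivatives are approximated by finite differences, where $D_k^{i,j}$ is the DoG value at scale index k, row i (y), and column j (x)

– example:

$$\frac{\partial D}{\partial \sigma} = \frac{D_{k+1}^{i,j} - D_{k-1}^{i,j}}{2}$$

$$\frac{\partial^2 D}{\partial \sigma^2} = \frac{D_{k-1}^{i,j} - 2 D_k^{i,j} + D_{k+1}^{i,j}}{1}$$

$$\frac{\partial^2 D}{\partial \sigma \partial y} = \frac{\left(D_{k+1}^{i+1,j} - D_{k+1}^{i-1,j}\right) - \left(D_{k-1}^{i+1,j} - D_{k-1}^{i-1,j}\right)}{4}$$

• If $\hat{X}$ is > 0.5 in any dimension, the extremum lies closer to a neighboring sample point, so the process is repeated about that point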
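The finite-difference derivatives and the 3x3 solve can be sketched together on a 3x3x3 block of DoG samples centered on the candidate. The function name and the `(scale, row, column)` ordering of the block and of the returned offset are assumptions for this example; the ordering is a permutation of X and does not change the solution.

```python
import numpy as np

def refine_offset(cube):
    """Solve for the sub-sample offset of an extremum.

    cube[k, i, j] holds D at scale k, row i (y), column j (x), each in
    {0, 1, 2}; the sample point is cube[1, 1, 1].
    Returns the offset as (d_sigma, d_y, d_x) in sample units.
    """
    c = cube[1, 1, 1]
    # First derivatives: central differences, divided by 2.
    grad = 0.5 * np.array([
        cube[2, 1, 1] - cube[0, 1, 1],
        cube[1, 2, 1] - cube[1, 0, 1],
        cube[1, 1, 2] - cube[1, 1, 0],
    ])
    # Second derivatives along each axis.
    dss = cube[2, 1, 1] - 2 * c + cube[0, 1, 1]
    dyy = cube[1, 2, 1] - 2 * c + cube[1, 0, 1]
    dxx = cube[1, 1, 2] - 2 * c + cube[1, 1, 0]
    # Mixed derivatives: difference of differences, divided by 4.
    dsy = 0.25 * ((cube[2, 2, 1] - cube[2, 0, 1]) - (cube[0, 2, 1] - cube[0, 0, 1]))
    dsx = 0.25 * ((cube[2, 1, 2] - cube[2, 1, 0]) - (cube[0, 1, 2] - cube[0, 1, 0]))
    dyx = 0.25 * ((cube[1, 2, 2] - cube[1, 2, 0]) - (cube[1, 0, 2] - cube[1, 0, 0]))
    hessian = np.array([[dss, dsy, dsx],
                        [dsy, dyy, dyx],
                        [dsx, dyx, dxx]])
    # Hessian * offset = -gradient.
    return np.linalg.solve(hessian, -grad)
```

Solving the system with `np.linalg.solve` avoids forming the inverse Hessian explicitly. If any component of the returned offset exceeds 0.5 in magnitude, the caller would move to the neighboring sample and call the function again on the block centered there.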

Filtering

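• Contrast (use previous equation):

$$D(\hat{X}) = D + \frac{1}{2} \frac{\partial D}{\partial X}^T \hat{X}$$

– If $|D(\hat{X})| < 0.03$, throw it out

• Edge-iness:

– Use ratio of principal curvatures to throw out poorly defined peaks

– Curvatures come from Hessian:

$$H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}$$

– Ratio of $\mathrm{Tr}(H)^2$ and $\mathrm{Det}(H)$:

$$\mathrm{Tr}(H) = D_{xx} + D_{yy}, \qquad \mathrm{Det}(H) = D_{xx} D_{yy} - (D_{xy})^2$$

– If ratio > $(r+1)^2 / r$, throw it out (SIFT uses r=10)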
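The two filters can be sketched as small predicates. The function names are illustrative, and rejecting keypoints with a non-positive determinant (curvatures of opposite sign) is an added assumption that is not stated in the bullets above.

```python
def passes_contrast(d_hat, threshold=0.03):
    """Keep the keypoint only if |D(X_hat)| reaches the contrast threshold."""
    return abs(d_hat) >= threshold

def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep the keypoint only if Tr(H)^2 / Det(H) <= (r + 1)^2 / r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Assumption: opposite-sign curvatures are treated as a poor peak.
        return False
    return trace * trace / det <= (r + 1.0) ** 2 / r
```

With r = 10 the bound is 12.1. An isotropic peak with `dxx = dyy` and `dxy = 0` has ratio 4 and passes, while an edge-like response with one curvature 100 times the other has a ratio of about 102 and is rejected.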

Orientation assignment

• Descriptor computed relative to keypoint’s orientation achieves rotation invariance

• Gradient orientation is precomputed along with magnitude for all levels (useful in descriptor computation)

$$m(x, y) = \sqrt{\big(L(x+1, y) - L(x-1, y)\big)^2 + \big(L(x, y+1) - L(x, y-1)\big)^2}$$

$$\theta(x, y) = \operatorname{atan2}\big(L(x, y+1) - L(x, y-1),\; L(x+1, y) - L(x-1, y)\big)$$

• Multiple orientations assigned to keypoints from an orientation histogram

– Significantly improves stability of matching
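The gradient magnitude, the gradient orientation, and a magnitude-weighted orientation histogram can be sketched as below. The 36-bin histogram and the 0.8 peak ratio are assumed defaults, the function names are illustrative, and refinements such as Gaussian weighting around the keypoint and peak interpolation are left out.

```python
import numpy as np

def gradient_magnitude_orientation(L):
    """Pixel differences on the interior of a smoothed image L, indexed L[y, x]."""
    dx = L[1:-1, 2:] - L[1:-1, :-2]   # L(x+1, y) - L(x-1, y)
    dy = L[2:, 1:-1] - L[:-2, 1:-1]   # L(x, y+1) - L(x, y-1)
    m = np.sqrt(dx ** 2 + dy ** 2)
    theta = np.arctan2(dy, dx)
    return m, theta

def orientation_histogram(m, theta, num_bins=36):
    """Accumulate gradient magnitudes into orientation bins over [0, 2*pi)."""
    fraction = (theta % (2 * np.pi)) / (2 * np.pi)
    bins = np.floor(fraction * num_bins).astype(int) % num_bins
    return np.bincount(bins.ravel(), weights=m.ravel(), minlength=num_bins)

def dominant_orientations(hist, peak_ratio=0.8):
    """Return bin-center angles of all bins within peak_ratio of the maximum."""
    if hist.max() <= 0:
        return np.array([])
    bin_width = 2 * np.pi / len(hist)
    keep = np.flatnonzero(hist >= peak_ratio * hist.max())
    return (keep + 0.5) * bin_width
```

Returning every bin close to the maximum, rather than only the largest one, is what lets a single location produce more than one keypoint orientation.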

Keypoint images

Descriptor

• Descriptor has 3 dimensions (x,y,θ)

• Orientation histogram of gradient magnitudes

• Position and orientation of each gradient sample rotated relative to keypoint orientation

Descriptor

• Best results achieved with 4x4x8 = 128 descriptor size

• Normalize to unit length

– Reduces effect of illumination change

• Cap each element to 0.2, normalize again

– Reduces non-linear illumination changes

– 0.2 determined experimentally
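The normalize, cap, and renormalize steps can be sketched as follows. The function name and the small `eps` guard against division by zero are assumptions for this example.

```python
import numpy as np

def normalize_descriptor(vec, cap=0.2, eps=1e-12):
    """Unit-normalize, cap each element at `cap`, then unit-normalize again."""
    v = np.asarray(vec, dtype=float)
    v = v / max(np.linalg.norm(v), eps)   # removes a global contrast scale
    v = np.minimum(v, cap)                # limits the influence of large gradients
    return v / max(np.linalg.norm(v), eps)
```

Because the first step divides by the vector norm, multiplying every gradient magnitude by a constant (a contrast change) leaves the final descriptor unchanged.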

PCA-SIFT

• Different descriptor (same keypoints)

• Apply PCA to the gradient patch

• Descriptor size is 20 (instead of 128)

• More robust, faster
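The projection step can be sketched with a PCA basis estimated from training vectors. This is an illustration under assumptions: the function names are hypothetical, the basis is computed with an SVD of the centered training data, and the input is taken to be a flattened gradient patch whose exact construction is not specified here.

```python
import numpy as np

def fit_pca(train, num_components=20):
    """Estimate the mean and the top principal directions of the row vectors."""
    mean = train.mean(axis=0)
    # Rows of vt are orthonormal directions sorted by decreasing variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:num_components]

def project(vec, mean, components):
    """Project one flattened gradient patch onto the PCA basis."""
    return components @ (np.asarray(vec, dtype=float) - mean)
```

The basis is fit once offline; at run time each keypoint needs only a subtraction and one matrix-vector product to obtain its 20-element descriptor.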

Interest Points & SIFT Features

Summary

• Scale space

• Difference-of-Gaussian

• Localization

• Filtering

• Orientation assignment

• Descriptor, 128 elements

Dictionary Learning

Quantization for Identification


Sparse Coding & Dictionary Learning

• Dictionary learning and sparse coding

• Sparse factor analysis model

(Factor/feature/dish/dictionary atom)

• Indian Buffet process and beta process

$$\min_{D, W} \; \|X - DW\|_F + \lambda \sum_{i=1}^{N} \|w_i\|_1$$
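A sparse-coding objective of this kind can be evaluated as sketched below. The form used is an assumption: a Frobenius-norm reconstruction error plus an L1 penalty on the columns w_i of W, weighted by a parameter `lam`. The function name and the array shapes (X is features by N, D is features by atoms, W is atoms by N) are also assumptions.

```python
import numpy as np

def sparse_coding_objective(X, D, W, lam=0.1):
    """Reconstruction error ||X - D W||_F plus lam * sum_i ||w_i||_1."""
    reconstruction = np.linalg.norm(X - D @ W, ord="fro")
    sparsity = np.abs(W).sum()   # sum of L1 norms over all columns w_i
    return reconstruction + lam * sparsity
```

Dictionary learning alternates between updating W with D fixed and updating D with W fixed so that this value decreases; the sketch only scores a given pair (D, W).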

Dictionary Learning

Image Representation via Bag-of-Visual-Words

Dictionary
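A bag-of-visual-words representation can be sketched in two steps: cluster training descriptors into a dictionary of visual words, then describe an image by the histogram of its descriptors' nearest words. This is a minimal sketch using plain k-means; the function names, the random initialization, and the fixed iteration count are assumptions, and real systems use far larger dictionaries and faster nearest-neighbor search.

```python
import numpy as np

def assign(descriptors, centers):
    """Index of the nearest visual word for each descriptor."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(descriptors, k, num_iters=20, seed=0):
    """Learn k visual words (cluster centers) from training descriptors."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=k, replace=False)
    centers = descriptors[start].astype(float)
    for _ in range(num_iters):
        labels = assign(descriptors, centers)
        for c in range(k):
            members = descriptors[labels == c]
            if len(members) > 0:   # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return centers

def bovw_histogram(descriptors, centers):
    """Normalized count of visual words occurring in one image."""
    labels = assign(descriptors, centers)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

Usage: call `kmeans` once on descriptors pooled from many training images, then call `bovw_histogram` on the descriptors of each image. Two images can then be compared through their fixed-length histograms regardless of how many keypoints each one has.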

Application for Visual Search


How to do database indexing?


Visual Phrases & Contexts?

Multi-Resolution SIFT
