Bag of Visual Words for Image Representation & Visual Search
Jianping Fan Dept of Computer Science UNC-Charlotte
1. Interest Point Extraction & SIFT
2. Clustering for Dictionary Learning
3. Bag of Visual Words
4. Image Representation & Applications
Bag-of-Visual-Words
Interest Point Extraction
• Scale-space extrema detection
  – Uses difference-of-Gaussian function
• Keypoint localization
  – Sub-pixel location and scale fit to a model
• Orientation assignment
  – 1 or more for each keypoint
• Keypoint descriptor
  – Created from local image gradients
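Scale space
• Keypoints are detected using scale-space extrema in the difference-of-Gaussian function D
• D definition: $D(x, y, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y) = L(x, y, k\sigma) - L(x, y, \sigma)$, where $L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$ is the Gaussian-smoothed image
• Efficient to compute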
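The definition of D can be sketched in a few lines of plain Python. This is a minimal illustration, not the slides' implementation: the 3σ kernel truncation, the replicated borders, and the function names (`gaussian_kernel`, `blur`, `difference_of_gaussian`) are all assumptions made for the example.

```python
import math

def gaussian_kernel(sigma):
    """1D Gaussian weights truncated at 3*sigma (assumption), normalised to sum to 1."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(t * t) / (2.0 * sigma * sigma))
               for t in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(image, sigma):
    """Separable Gaussian blur of a 2D list; border pixels are replicated."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])

    def clamp(v, size):
        return min(max(v, 0), size - 1)

    # Horizontal pass first, then vertical pass on the intermediate result.
    rows = [[sum(kernel[t + radius] * image[y][clamp(x + t, width)]
                 for t in range(-radius, radius + 1))
             for x in range(width)] for y in range(height)]
    return [[sum(kernel[t + radius] * rows[clamp(y + t, height)][x]
                 for t in range(-radius, radius + 1))
             for x in range(width)] for y in range(height)]

def difference_of_gaussian(image, sigma, k):
    """D = L(k*sigma) - L(sigma), computed pixel by pixel."""
    wide, narrow = blur(image, k * sigma), blur(image, sigma)
    return [[a - b for a, b in zip(row_w, row_n)]
            for row_w, row_n in zip(wide, narrow)]
```

Since each blurred image L is needed anyway for the descriptor, D costs only one extra subtraction per pixel, which is why it is efficient to compute.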
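Relationship of D to $\sigma^2 \nabla^2 G$
• Close approximation to scale-normalized Laplacian of Gaussian, $\sigma^2 \nabla^2 G$
• Diffusion equation: $\frac{\partial G}{\partial \sigma} = \sigma \nabla^2 G$
• Approximate ∂G/∂σ: $\sigma \nabla^2 G = \frac{\partial G}{\partial \sigma} \approx \frac{G(x, y, k\sigma) - G(x, y, \sigma)}{k\sigma - \sigma}$
  – giving, $G(x, y, k\sigma) - G(x, y, \sigma) \approx (k - 1)\,\sigma^2 \nabla^2 G$
• When D has scales differing by a constant factor, it already incorporates the σ² scale normalization required for scale-invariance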
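The approximation G(x, y, kσ) − G(x, y, σ) ≈ (k − 1)σ²∇²G can be checked numerically. The sketch below assumes the standard isotropic 2D Gaussian and its closed-form Laplacian; the function names are made up for the example.

```python
import math

def gaussian(x, y, sigma):
    """2D isotropic Gaussian G(x, y, sigma)."""
    return math.exp(-(x * x + y * y) / (2.0 * sigma ** 2)) / (2.0 * math.pi * sigma ** 2)

def laplacian_of_gaussian(x, y, sigma):
    """Closed-form Laplacian: ((r^2 - 2 sigma^2) / sigma^4) * G."""
    r2 = x * x + y * y
    return (r2 - 2.0 * sigma ** 2) / sigma ** 4 * gaussian(x, y, sigma)

def dog_versus_scaled_log(x, y, sigma, k):
    """Return (G(k sigma) - G(sigma), (k - 1) sigma^2 Laplacian(G)) at one point."""
    dog = gaussian(x, y, k * sigma) - gaussian(x, y, sigma)
    scaled_log = (k - 1.0) * sigma ** 2 * laplacian_of_gaussian(x, y, sigma)
    return dog, scaled_log
```

Calling `dog_versus_scaled_log(0.5, 0.3, 1.5, 1.01)` returns two values that agree to within a few percent; the gap shrinks as k approaches 1, as expected from a first-order approximation.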
Finding extrema
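• Each sample point is compared with its 26 neighbors (8 in the same DoG image, 9 in each of the two adjacent scales) and is selected only if it is a minimum or a maximum of these points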
[Figure: DoG scale space and the extrema detected in this image]
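The extremum test can be written as a comparison against the 26 neighbors of a sample in a stack of DoG images. This is a minimal sketch: the `dog[s][y][x]` layout and the strict inequalities are assumptions, and the caller is expected to stay one sample away from every border.

```python
def is_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly above or strictly below all 26 neighbours.

    dog is indexed as dog[scale][row][column] (assumed layout).
    """
    value = dog[s][y][x]
    neighbours = [dog[s + ds][y + dy][x + dx]
                  for ds in (-1, 0, 1)
                  for dy in (-1, 0, 1)
                  for dx in (-1, 0, 1)
                  if (ds, dy, dx) != (0, 0, 0)]
    return value > max(neighbours) or value < min(neighbours)
```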
Localization
• 3D quadratic function is fit to the local sample points
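• Start with Taylor expansion with sample point as the origin: $D(X) = D + \frac{\partial D}{\partial X}^{T} X + \frac{1}{2} X^{T} \frac{\partial^2 D}{\partial X^2} X$
  – where $X = (x, y, \sigma)^{T}$
• Take the derivative with respect to X, and set it to 0, giving $0 = \frac{\partial D}{\partial X} + \frac{\partial^2 D}{\partial X^2} \hat{X}$
• $\hat{X} = -\left(\frac{\partial^2 D}{\partial X^2}\right)^{-1} \frac{\partial D}{\partial X}$ is the location of the keypoint
• This is a 3x3 linear system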
Localization
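• The 3x3 system written out:
$\begin{bmatrix} \frac{\partial^2 D}{\partial x^2} & \frac{\partial^2 D}{\partial x \partial y} & \frac{\partial^2 D}{\partial x \partial \sigma} \\ \frac{\partial^2 D}{\partial x \partial y} & \frac{\partial^2 D}{\partial y^2} & \frac{\partial^2 D}{\partial y \partial \sigma} \\ \frac{\partial^2 D}{\partial x \partial \sigma} & \frac{\partial^2 D}{\partial y \partial \sigma} & \frac{\partial^2 D}{\partial \sigma^2} \end{bmatrix} \begin{bmatrix} x \\ y \\ \sigma \end{bmatrix} = -\begin{bmatrix} \frac{\partial D}{\partial x} \\ \frac{\partial D}{\partial y} \\ \frac{\partial D}{\partial \sigma} \end{bmatrix}$
• Derivatives approximated by finite differences, with $D_{k}^{i,j}$ the DoG sample at scale index k, row i (y) and column j (x)
  – example:
$\frac{\partial D}{\partial \sigma} = \frac{D_{k+1}^{i,j} - D_{k-1}^{i,j}}{2}$
$\frac{\partial^2 D}{\partial \sigma^2} = \frac{D_{k-1}^{i,j} - 2 D_{k}^{i,j} + D_{k+1}^{i,j}}{1}$
$\frac{\partial^2 D}{\partial \sigma \partial y} = \frac{(D_{k+1}^{i+1,j} - D_{k-1}^{i+1,j}) - (D_{k+1}^{i-1,j} - D_{k-1}^{i-1,j})}{4}$
• If $\hat{X}$ is > 0.5 in any dimension, the extremum lies closer to a different sample point; the sample point is changed and the process repeated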
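The localization step can be sketched end to end: finite differences on a 3x3x3 block of DoG samples, then the 3x3 solve for the offset and the interpolated value D(X̂). The `cube[k][i][j]` layout (scale, row, column), the (x, y, σ) ordering of the result, and the use of Cramer's rule are assumptions made to keep the example free of dependencies.

```python
def finite_differences(cube):
    """Gradient and Hessian at the centre of cube[k][i][j], ordered (x, y, sigma).

    k indexes scale, i the row (y) and j the column (x); this layout is assumed.
    """
    c = cube[1][1][1]
    dx = (cube[1][1][2] - cube[1][1][0]) / 2.0
    dy = (cube[1][2][1] - cube[1][0][1]) / 2.0
    ds = (cube[2][1][1] - cube[0][1][1]) / 2.0
    dxx = cube[1][1][2] - 2.0 * c + cube[1][1][0]
    dyy = cube[1][2][1] - 2.0 * c + cube[1][0][1]
    dss = cube[2][1][1] - 2.0 * c + cube[0][1][1]
    dxy = ((cube[1][2][2] - cube[1][2][0]) - (cube[1][0][2] - cube[1][0][0])) / 4.0
    dxs = ((cube[2][1][2] - cube[2][1][0]) - (cube[0][1][2] - cube[0][1][0])) / 4.0
    dys = ((cube[2][2][1] - cube[2][0][1]) - (cube[0][2][1] - cube[0][0][1])) / 4.0
    gradient = [dx, dy, ds]
    hessian = [[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]]
    return gradient, hessian

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(a, b):
    """Solve the 3x3 system a * x = b with Cramer's rule."""
    d = det3(a)
    if d == 0:
        raise ValueError("singular Hessian")
    solution = []
    for col in range(3):
        replaced = [[b[r] if c == col else a[r][c] for c in range(3)]
                    for r in range(3)]
        solution.append(det3(replaced) / d)
    return solution

def refine_keypoint(cube):
    """Return (offset X_hat in (x, y, sigma) order, interpolated value D(X_hat))."""
    gradient, hessian = finite_differences(cube)
    offset = solve3(hessian, [-g for g in gradient])
    value = cube[1][1][1] + 0.5 * sum(g * o for g, o in zip(gradient, offset))
    return offset, value
```

A caller would repeat the fit around a neighboring sample whenever any component of the returned offset exceeds 0.5 in absolute value.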
Filtering
• Contrast (use prev. equation): $D(\hat{X}) = D + \frac{1}{2} \frac{\partial D}{\partial X}^{T} \hat{X}$
  – If $|D(\hat{X})| < 0.03$, throw it out
• Edge-iness:
  – Use ratio of principal curvatures to throw out poorly defined peaks
  – Curvatures come from Hessian: $H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}$
  – Ratio of Trace(H)² and Determinant(H): $\mathrm{Tr}(H) = D_{xx} + D_{yy}$, $\mathrm{Det}(H) = D_{xx} D_{yy} - (D_{xy})^2$
  – If ratio > (r+1)²/r, throw it out (SIFT uses r=10)
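Both filters reduce to a couple of comparisons. The sketch below uses the thresholds quoted on the slide (0.03 and r = 10); rejecting keypoints with a non-positive determinant is an added assumption, since the two principal curvatures then have opposite signs or one is zero. The 0.03 threshold presumes pixel values scaled to [0, 1], which is also an assumption here.

```python
def passes_contrast_test(value_at_extremum, threshold=0.03):
    """Keep the keypoint only if |D(X_hat)| is at least the threshold."""
    return abs(value_at_extremum) >= threshold

def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep the keypoint only if Tr(H)^2 / Det(H) does not exceed (r + 1)^2 / r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of opposite sign (or a flat direction): not a well-defined peak.
        return False
    return trace * trace / det <= (r + 1.0) ** 2 / r
```

For example, an isotropic blob with `dxx = dyy = -1` gives a ratio of 4 and is kept, while an edge-like response with `dxx = -10`, `dyy = -0.1` gives a ratio above 100 and is discarded.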
Orientation assignment
• Descriptor computed relative to keypoint’s orientation achieves rotation invariance
• Gradient orientation is precomputed along with magnitude for all levels (useful in descriptor computation):
  $m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^2 + (L(x, y+1) - L(x, y-1))^2}$
  $\theta(x, y) = \operatorname{atan2}\big(L(x, y+1) - L(x, y-1),\; L(x+1, y) - L(x-1, y)\big)$
• Multiple orientations assigned to keypoints from an orientation histogram
  – Significantly improves stability of matching
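The gradient formulas and the orientation histogram can be sketched as follows. The pixel-difference gradient follows the slide; the 36-bin histogram and the 80% peak rule are the values commonly quoted for SIFT and are assumptions here, as are the function names. Gaussian weighting of the samples and the local-peak check are left out to keep the example short.

```python
import math

def gradient(L, x, y):
    """Magnitude m(x, y) and orientation theta(x, y) of a smoothed image L[y][x]."""
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    return math.hypot(dx, dy), math.atan2(dy, dx)

def dominant_orientations(samples, bins=36, peak_ratio=0.8):
    """Return bin-centre angles of every bin within peak_ratio of the highest bin.

    samples is an iterable of (magnitude, angle in radians) pairs.
    """
    hist = [0.0] * bins
    for magnitude, angle in samples:
        index = int((angle % (2.0 * math.pi)) / (2.0 * math.pi) * bins) % bins
        hist[index] += magnitude
    top = max(hist)
    if top <= 0:
        return []
    return [(b + 0.5) * 2.0 * math.pi / bins
            for b, h in enumerate(hist) if h >= peak_ratio * top]
```

Each returned angle would yield its own keypoint at the same location and scale, which is how a single extremum can receive multiple orientations.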
Descriptor
• Descriptor has 3 dimensions (x,y,θ)
• Orientation histogram of gradient magnitudes
• Position and orientation of each gradient sample rotated relative to keypoint orientation
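A simplified version of the descriptor histogram is sketched below, assuming a 16x16 sample patch split into a 4x4 grid of cells with 8 orientation bins each (128 values). It omits the Gaussian weighting and the interpolation between bins used in full SIFT implementations; the sample format and function name are hypothetical.

```python
import math

def descriptor_histogram(samples, keypoint_angle, patch=16, cells=4, bins=8):
    """Accumulate gradient magnitudes into a cells x cells x bins histogram.

    samples: iterable of (x, y, magnitude, angle) with 0 <= x, y < patch,
    where positions are already expressed in the rotated patch frame.
    Angles are made relative to keypoint_angle before binning.
    """
    cell_size = patch / cells
    hist = [0.0] * (cells * cells * bins)
    for x, y, magnitude, angle in samples:
        cx = min(int(x // cell_size), cells - 1)
        cy = min(int(y // cell_size), cells - 1)
        relative = (angle - keypoint_angle) % (2.0 * math.pi)
        b = int(relative / (2.0 * math.pi) * bins) % bins
        hist[(cy * cells + cx) * bins + b] += magnitude
    return hist
```

Because only the angle relative to the keypoint orientation is binned, rotating every gradient and the keypoint by the same amount leaves the histogram unchanged.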
Descriptor
• Best results achieved with 4x4x8 = 128 descriptor size
• Normalize to unit length
  – Reduces effect of illumination change
• Cap each element to 0.2, normalize again
  – Reduces non-linear illumination changes
  – 0.2 determined experimentally
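The normalize, cap, renormalize sequence is short enough to write out directly. A minimal sketch, assuming the descriptor is a list of non-negative histogram values (the function name is made up for the example):

```python
import math

def normalize_descriptor(vec, cap=0.2):
    """Scale to unit length, clip each element at cap, then scale to unit length again."""
    def unit(v):
        norm = math.sqrt(sum(e * e for e in v))
        return [e / norm for e in v] if norm > 0 else list(v)

    capped = [min(e, cap) for e in unit(vec)]
    return unit(capped)
```

The first normalization cancels a global contrast change (every gradient multiplied by the same factor), and the cap limits the influence of a few very large gradient magnitudes.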
PCA-SIFT
• Different descriptor (same keypoints)
• Apply PCA to the gradient patch
• Descriptor size is 20 (instead of 128)
• More robust, faster
Summary
• Scale space
• Difference-of-Gaussian
• Localization
• Filtering
• Orientation assignment
• Descriptor, 128 elements
Sparse Coding & Dictionary Learning
• Dictionary learning and sparse coding
• Sparse factor analysis model
(Factor/feature/dish/dictionary atom)
• Indian Buffet process and beta process
$\min_{D,W} \|X - DW\|_F^2 + \lambda \sum_{i=1}^{N} \|w_i\|_1$
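With the dictionary D held fixed, the sparse coding step decouples into one l1-regularized least-squares problem per signal. The sketch below uses ISTA (iterative soft-thresholding), a technique chosen for illustration and not named on the slide; it assumes the squared-error form of the objective, $\|x - Dw\|_2^2 + \lambda \|w\|_1$, and does not cover the dictionary update.

```python
def soft_threshold(v, t):
    """Shrink v towards zero by t; values within [-t, t] become 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def ista(x, D, lam, step, iterations=100):
    """Approximately minimise ||x - D w||_2^2 + lam * ||w||_1 for one signal x.

    D is a list of rows (len(x) rows, one column per dictionary atom).
    step should be at most 1 / (2 * largest eigenvalue of D^T D).
    """
    n_rows, n_atoms = len(D), len(D[0])
    w = [0.0] * n_atoms
    for _ in range(iterations):
        residual = [sum(D[r][a] * w[a] for a in range(n_atoms)) - x[r]
                    for r in range(n_rows)]
        grad = [2.0 * sum(D[r][a] * residual[r] for r in range(n_rows))
                for a in range(n_atoms)]
        w = [soft_threshold(w[a] - step * grad[a], step * lam)
             for a in range(n_atoms)]
    return w
```

For an orthonormal dictionary the solution is simply the soft-thresholded projection of x, which makes a convenient sanity check: small coefficients are driven exactly to zero, giving the sparsity the l1 term is meant to produce.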