Beyond the Euclidean distance: Creating effective visual codebooks using the histogram intersection kernel
Authors: Jianxin Wu and James Rehg @ Georgia Institute of Technology
Presenter: Shao-Chuan Wang
Beyond the Euclidean distance
• Key Ideas:
  – Use the histogram intersection kernel (HIK) to create the visual codebook, since most descriptors are histogram-based features
  – Kernel K-means (using HIK)
  – One-class SVM (using HIK)
• Conclusions:
  – One-class SVM with HIK performs the best
  – K-median is a compromise (comparable with HIK K-means)
Background: Bag of Visual Words
1. Codebook construction (find D)
   – Clustering-based, such as k-means
2. Assignment of descriptors to visual words (find α)
3. Pooling (sum pooling to construct histograms)
\[
\min_{D,\,\alpha_i} \left\| x_i - D\alpha_i \right\|_2^2 \quad \text{subject to some constraints}
\]
← Codebook construction (finding D) is the focus of this paper.
[Figure: Voronoi diagram of the codebook]
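As a concrete illustration of steps 2 and 3 only (using the plain Euclidean nearest-word rule, not the paper's HIK variant), here is a minimal sketch assuming a descriptor matrix `X` and a codebook `D`:

```python
import numpy as np

def bovw_histogram(X, D):
    """Hard-assign each descriptor to its nearest visual word (Voronoi cell)
    and sum-pool the assignments into a K-bin histogram.

    X: (n, d) array of local descriptors from one image.
    D: (K, d) array of codebook centroids.
    """
    # Squared Euclidean distance between every descriptor and every centroid
    dists = ((X[:, None, :] - D[None, :, :]) ** 2).sum(axis=2)   # (n, K)
    words = dists.argmin(axis=1)                                  # nearest word per descriptor
    return np.bincount(words, minlength=D.shape[0])               # sum pooling
```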
Kernel K-means (1/2)
• Finding the nearest centroid from K centroids:
• Updating the centroids by averaging the newly assigned atoms
At iteration $t$:
\[
j^* = \arg\min_{1 \le j \le K} \left\| \phi(x_i) - c_j^{(t)} \right\|^2
\]
\[
\pi_{ij}^{(t+1)} = \begin{cases} 1 & j = j^* \\ 0 & j \ne j^* \end{cases}, \qquad
c_j^{(t+1)} = \frac{\sum_i \pi_{ij}^{(t+1)}\, \phi(x_i)}{\sum_i \pi_{ij}^{(t+1)}}
\]
Kernel K-means (2/2)
\[
\left\| \phi(x_i) - c_j^{(t)} \right\|^2
= k(x_i, x_i)
- \frac{2 \sum_{m=1}^{n} \pi_{mj}^{(t)}\, k(x_i, x_m)}{\sum_{m=1}^{n} \pi_{mj}^{(t)}}
+ \frac{\sum_{m=1}^{n} \sum_{m'=1}^{n} \pi_{mj}^{(t)} \pi_{m'j}^{(t)}\, k(x_m, x_{m'})}{\left( \sum_{m=1}^{n} \pi_{mj}^{(t)} \right)^2}
\qquad (1)
\]
– The first term $k(x_i, x_i)$ does not affect the argmin.
– The second term is the main cost of computing (1).
– The third term does not depend on $x_i$, so it can be precomputed.
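As an illustration only (not the authors' code), here is a minimal sketch of one kernel k-means assignment step written directly from equation (1); the HIK Gram matrix `G`, the 0/1 membership matrix `members`, and the helper names are assumptions:

```python
import numpy as np

def hik(X, Y):
    """Histogram intersection kernel matrix: K[i, j] = sum_d min(X[i, d], Y[j, d])."""
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)

def kernel_kmeans_step(G, members):
    """One assignment step of kernel k-means, following equation (1).

    G: (n, n) precomputed Gram matrix, e.g. G = hik(X, X).
    members: (n, K) 0/1 membership matrix pi^(t); assumes no cluster is empty.
    Returns the new membership matrix pi^(t+1).
    """
    sizes = members.sum(axis=0)                       # cluster sizes, shape (K,)
    # Second term of (1): main cost, -2 * sum_m pi_mj k(x_i, x_m) / sum_m pi_mj
    cross = -2.0 * (G @ members) / sizes              # (n, K)
    # Third term of (1): independent of x_i, precomputed once per iteration
    within = np.einsum('lj,lm,mj->j', members, G, members) / sizes ** 2   # (K,)
    # k(x_i, x_i) is constant over j, so it does not affect the argmin
    assign = (cross + within).argmin(axis=1)
    new_members = np.zeros_like(members)
    new_members[np.arange(len(assign)), assign] = 1
    return new_members
```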
Contribution 1: fast evaluation of HIK
• Following (Maji et al. 2008), by quantizing features from R^d_+ into N^d, the evaluation of (1) can be reduced to O(d)
\[
f(x) = \sum_i c_i\, k_{HI}(x_i, x)
= \sum_i c_i \sum_{j=1}^{d} \min(x_{ij}, x_j)
= \sum_{j=1}^{d} \sum_i c_i \min(x_{ij}, x_j)
\]
Since $x_{ij} \in \{0, 1, 2, \ldots, x_{\max}\}$ after quantization → pre-compute a lookup table!
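A sketch of the lookup-table idea, assuming features have been quantized to integers in {0, ..., x_max}; the function and variable names (`build_hik_tables`, `hik_eval`, `Xs`, `c`) are illustrative, not the paper's code:

```python
import numpy as np

def build_hik_tables(Xs, c, x_max):
    """Pre-compute T[j, v] = sum_i c_i * min(Xs[i, j], v) for integer-valued features.

    Xs: (n, d) atoms (e.g. support vectors), quantized to {0, ..., x_max}.
    c:  (n,) coefficients (e.g. SVM alphas or cluster weights).
    """
    vals = np.arange(x_max + 1)
    # T has shape (d, x_max + 1); built once, then reused for every query
    T = np.einsum('i,ijv->jv', c, np.minimum(Xs[:, :, None], vals[None, None, :]))
    return T

def hik_eval(T, x):
    """Evaluate f(x) = sum_i c_i k_HI(x_i, x) in O(d) via one table lookup per dimension."""
    d = T.shape[0]
    return T[np.arange(d), x].sum()
```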
Contribution 2: Encoding via One-class SVM
• Example one-class SVM in 2D using Gaussian kernel:
[Figures: gamma = 0.01, C = 2000 (left); gamma = 0.1, C = 2000 (right)]
Contribution 2: Encoding via One-class SVM
1. Use kernel K-means (with HIK) to create a codebook of size K.
2. Train one one-class SVM for each of the K clusters.
3. Assign the visual word according to the maximum response among the K SVMs.
\[
\arg\max_{1 \le i \le K} \; \sum_{j=1}^{n_i} a_{ij}\, k(x_{ij}, x)
\]
$a_{ij}$: Lagrange multipliers of the $i$-th one-class SVM
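A rough sketch of this encoding using scikit-learn's OneClassSVM with a precomputed HIK Gram matrix; the `nu` value, the `hik` helper, and the per-word cluster list produced by the HIK kernel k-means step are assumptions for illustration:

```python
import numpy as np
from sklearn.svm import OneClassSVM

def hik(X, Y):
    """Histogram intersection kernel matrix."""
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)

def train_word_svms(clusters, nu=0.2):
    """Train one one-class SVM per visual word on that word's descriptors.

    clusters: list of K arrays, each (n_k, d), one per codebook cluster.
    """
    svms = []
    for Xk in clusters:
        svm = OneClassSVM(kernel='precomputed', nu=nu)
        svm.fit(hik(Xk, Xk))              # Gram matrix of the cluster's own atoms
        svms.append((svm, Xk))
    return svms

def encode(svms, x):
    """Assign descriptor x to the word with the maximum one-class SVM response."""
    scores = [svm.decision_function(hik(x[None, :], Xk))[0] for svm, Xk in svms]
    return int(np.argmax(scores))
```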
Contribution 3: Comparison with K-median Codebook
• K-median clustering:
  – Finding the nearest centroid using the L1 distance
  – Updating the centroids by taking the median of the newly assigned atoms
• ‘Median’ is the minimizer of the following opt. problem,
\[
\min_{x} \sum_{i=1}^{n} \left| x_i - x \right|
\]
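A minimal sketch of one K-median iteration matching the two bullets above (L1 assignment, then coordinate-wise median, which minimizes the sum of absolute deviations in each dimension); initialization and empty-cluster handling are illustrative assumptions:

```python
import numpy as np

def kmedian_step(X, C):
    """One K-median iteration.

    X: (n, d) descriptors;  C: (K, d) current centroids.
    Returns the updated centroids and the per-point assignments.
    """
    # Assign each point to the centroid with the smallest L1 distance
    dists = np.abs(X[:, None, :] - C[None, :, :]).sum(axis=2)    # (n, K)
    assign = dists.argmin(axis=1)
    # Update: the coordinate-wise median minimizes sum_i |x_i - x| per dimension
    new_C = np.array([np.median(X[assign == k], axis=0) if np.any(assign == k) else C[k]
                      for k in range(C.shape[0])])
    return new_C, assign
```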
Some engineering details
• Pyramid overlapping pooling strategy
31 sub-windows => a 31K-dimensional vector (K = codebook size)
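A sketch of the pooling concatenation, assuming a dense grid of visual-word indices per image; the exact layout of the 31 overlapping sub-windows is not spelled out on this slide, so the sub-window list is left as an input here:

```python
import numpy as np

def pyramid_pooling(word_map, K, subwindows):
    """Concatenate one K-bin histogram per sub-window into a single vector.

    word_map: (H, W) array of visual-word indices on a dense descriptor grid.
    subwindows: list of (row_slice, col_slice) regions; 31 regions give a 31*K vector.
    """
    hists = [np.bincount(word_map[rs, cs].ravel(), minlength=K) for rs, cs in subwindows]
    return np.concatenate(hists)
```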
Some engineering details
• Concatenation of features computed on the Sobel image
[Figure: Sobel images, pictures from Wikipedia]
=> 31K × 2 = 62K-dimensional image representation
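A small sketch of obtaining the Sobel gradient image whose BoVW feature vector is concatenated with the original one (doubling 31K to 62K dimensions); whether the gradient magnitude or separate horizontal/vertical responses are used is not stated on this slide, so the magnitude below is an assumption:

```python
import numpy as np
from scipy.ndimage import sobel

def sobel_image(image):
    """Sobel gradient magnitude of a grayscale image (2-D array)."""
    gx = sobel(image.astype(float), axis=1)   # horizontal gradient
    gy = sobel(image.astype(float), axis=0)   # vertical gradient
    return np.hypot(gx, gy)
```

The full representation is then the concatenation of the 31K-dimensional vector from the original image and the one from this Sobel image.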
Some engineering details
• SIFT for Caltech, CENTRIST for the others
• Codebook size K = 200
• Pyramid levels L = 0, 1, 2
• One-vs-one SVM for the smaller datasets, BSVM for Caltech 101
• Random splitting is repeated 5 times
Results: Caltech 101
• B, not B: with or without concatenation of Sobel features
• s: grid step size of dense SIFT extraction
• oc_{svm}: one-class SVM encoding
• k_{HI}: using the histogram intersection kernel
Results: Scene 15
• B, not B: with or without concatenation of Sobel features
• s: grid step size of dense SIFT extraction
• oc_{svm}: one-class SVM encoding
• k_{HI}: using the histogram intersection kernel
Conclusions
1. The HIK visual codebook improves classification accuracy.
2. K-median is a compromise between k-means and HIK kernel k-means.
3. One-class SVM encoding helps build a more compact representation.
4. A smaller step size is better?