Beyond the Euclidean distance: Creating effective visual codebooks using the histogram intersection kernel
Authors: Jianxin Wu and James Rehg @ Georgia Institute of Technology
Presenter: Shao-Chuan Wang
Beyond the Euclidean distance
• Key Ideas:
  – Use the histogram intersection kernel (HIK) to create the visual codebook, since most descriptors are histogram-based features
  – Kernel K-means (using HIK)
  – One-class SVM (using HIK)
• Conclusions:
  – One-class SVM with HIK performs the best
  – K-median is a compromise (comparable with HIK K-means)
Background: Bag of Visual Words
1. Codebook construction (find D)
   – Clustering-based, such as k-means
2. Assignment of descriptors to visual words (find α)
3. Pooling (sum pooling to construct histograms)
\[
\min_{D,\,\alpha_i} \left\| x_i - D\alpha_i \right\|_2^2 \quad \text{subject to some constraints}
\]
← Codebook construction (finding D) is the focus of this paper.
[Figure: Voronoi diagram of the codebook]
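As a concrete illustration of steps 2 and 3 only (using the plain Euclidean nearest-word rule, not the paper's HIK variant), here is a minimal sketch assuming a descriptor matrix `X` and a codebook `D`:

```python
import numpy as np

def bovw_histogram(X, D):
    """Hard-assign each descriptor to its nearest visual word (Voronoi cell)
    and sum-pool the assignments into a K-bin histogram.

    X: (n, d) array of local descriptors from one image.
    D: (K, d) array of codebook centroids.
    """
    # Squared Euclidean distance between every descriptor and every centroid
    dists = ((X[:, None, :] - D[None, :, :]) ** 2).sum(axis=2)   # (n, K)
    words = dists.argmin(axis=1)                                  # nearest word per descriptor
    return np.bincount(words, minlength=D.shape[0])               # sum pooling
```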
Kernel K-means (1/2)
• Finding the nearest centroid from K centroids:
• Updating the centroids by averaging the newly assigned atoms
At iteration $t$:
\[
j^* = \arg\min_{1 \le j \le K} \left\| \phi(x_i) - c_j^{(t)} \right\|^2
\]
\[
\pi_{ij}^{(t+1)} = \begin{cases} 1 & j = j^* \\ 0 & j \ne j^* \end{cases}, \qquad
c_j^{(t+1)} = \frac{\sum_i \pi_{ij}^{(t+1)}\, \phi(x_i)}{\sum_i \pi_{ij}^{(t+1)}}
\]
Kernel K-means (2/2)
\[
\left\| \phi(x_i) - c_j^{(t)} \right\|^2
= k(x_i, x_i)
- \frac{2 \sum_{m=1}^{n} \pi_{mj}^{(t)}\, k(x_i, x_m)}{\sum_{m=1}^{n} \pi_{mj}^{(t)}}
+ \frac{\sum_{m=1}^{n} \sum_{m'=1}^{n} \pi_{mj}^{(t)} \pi_{m'j}^{(t)}\, k(x_m, x_{m'})}{\left( \sum_{m=1}^{n} \pi_{mj}^{(t)} \right)^2}
\qquad (1)
\]
– The first term $k(x_i, x_i)$ does not affect the argmin.
– The second term is the main cost of computing (1).
– The third term does not depend on $x_i$, so it can be precomputed.
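As an illustration only (not the authors' code), here is a minimal sketch of one kernel k-means assignment step written directly from equation (1); the HIK Gram matrix `G`, the 0/1 membership matrix `members`, and the helper names are assumptions:

```python
import numpy as np

def hik(X, Y):
    """Histogram intersection kernel matrix: K[i, j] = sum_d min(X[i, d], Y[j, d])."""
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)

def kernel_kmeans_step(G, members):
    """One assignment step of kernel k-means, following equation (1).

    G: (n, n) precomputed Gram matrix, e.g. G = hik(X, X).
    members: (n, K) 0/1 membership matrix pi^(t); assumes no cluster is empty.
    Returns the new membership matrix pi^(t+1).
    """
    sizes = members.sum(axis=0)                       # cluster sizes, shape (K,)
    # Second term of (1): main cost, -2 * sum_m pi_mj k(x_i, x_m) / sum_m pi_mj
    cross = -2.0 * (G @ members) / sizes              # (n, K)
    # Third term of (1): independent of x_i, precomputed once per iteration
    within = np.einsum('lj,lm,mj->j', members, G, members) / sizes ** 2   # (K,)
    # k(x_i, x_i) is constant over j, so it does not affect the argmin
    assign = (cross + within).argmin(axis=1)
    new_members = np.zeros_like(members)
    new_members[np.arange(len(assign)), assign] = 1
    return new_members
```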
Contribution 1: fast evaluation of HIK
• Following (Maji et al. 2008), by quantizing features from R^d_+ into N^d, the evaluation of (1) can be reduced to O(d)
\[
f(x) = \sum_i c_i\, k_{HI}(x_i, x)
= \sum_i c_i \sum_{j=1}^{d} \min(x_{ij}, x_j)
= \sum_{j=1}^{d} \sum_i c_i \min(x_{ij}, x_j)
\]
Since $x_{ij} \in \{0, 1, 2, \ldots, x_{\max}\}$ after quantization → pre-compute a lookup table!
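A sketch of the lookup-table idea, assuming features have been quantized to integers in {0, ..., x_max}; the function and variable names (`build_hik_tables`, `hik_eval`, `Xs`, `c`) are illustrative, not the paper's code:

```python
import numpy as np

def build_hik_tables(Xs, c, x_max):
    """Pre-compute T[j, v] = sum_i c_i * min(Xs[i, j], v) for integer-valued features.

    Xs: (n, d) atoms (e.g. support vectors), quantized to {0, ..., x_max}.
    c:  (n,) coefficients (e.g. SVM alphas or cluster weights).
    """
    vals = np.arange(x_max + 1)
    # T has shape (d, x_max + 1); built once, then reused for every query
    T = np.einsum('i,ijv->jv', c, np.minimum(Xs[:, :, None], vals[None, None, :]))
    return T

def hik_eval(T, x):
    """Evaluate f(x) = sum_i c_i k_HI(x_i, x) in O(d) via one table lookup per dimension."""
    d = T.shape[0]
    return T[np.arange(d), x].sum()
```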
Contribution 2: Encoding via One-class SVM
• Example one-class SVM in 2D using Gaussian kernel:
[Figures: gamma = 0.01, C = 2000 (left); gamma = 0.1, C = 2000 (right)]
Contribution 2: Encoding via One-class SVM
1. Use kernel K-means (with HIK) to create a codebook of size K.
2. Train one one-class SVM for each of the K clusters.
3. Assign the visual word according to the maximum response among the K SVMs.
\[
\arg\max_{1 \le i \le K} \; \sum_{j=1}^{n_i} a_{ij}\, k(x_{ij}, x)
\]
$a_{ij}$: Lagrange multipliers of the $i$-th one-class SVM
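A rough sketch of this encoding using scikit-learn's OneClassSVM with a precomputed HIK Gram matrix; the `nu` value, the `hik` helper, and the per-word cluster list produced by the HIK kernel k-means step are assumptions for illustration:

```python
import numpy as np
from sklearn.svm import OneClassSVM

def hik(X, Y):
    """Histogram intersection kernel matrix."""
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)

def train_word_svms(clusters, nu=0.2):
    """Train one one-class SVM per visual word on that word's descriptors.

    clusters: list of K arrays, each (n_k, d), one per codebook cluster.
    """
    svms = []
    for Xk in clusters:
        svm = OneClassSVM(kernel='precomputed', nu=nu)
        svm.fit(hik(Xk, Xk))              # Gram matrix of the cluster's own atoms
        svms.append((svm, Xk))
    return svms

def encode(svms, x):
    """Assign descriptor x to the word with the maximum one-class SVM response."""
    scores = [svm.decision_function(hik(x[None, :], Xk))[0] for svm, Xk in svms]
    return int(np.argmax(scores))
```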
Contribution 3: Comparison with K-median Codebook
• K-median clustering:
  – Finding the nearest centroid using the L1 distance
  – Updating the centroids by taking the median of the newly assigned atoms
• ‘Median’ is the minimizer of the following opt. problem,
\[
\min_{x} \sum_{i=1}^{n} \left| x_i - x \right|
\]
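A minimal sketch of one K-median iteration matching the two bullets above (L1 assignment, then coordinate-wise median, which minimizes the sum of absolute deviations in each dimension); initialization and empty-cluster handling are illustrative assumptions:

```python
import numpy as np

def kmedian_step(X, C):
    """One K-median iteration.

    X: (n, d) descriptors;  C: (K, d) current centroids.
    Returns the updated centroids and the per-point assignments.
    """
    # Assign each point to the centroid with the smallest L1 distance
    dists = np.abs(X[:, None, :] - C[None, :, :]).sum(axis=2)    # (n, K)
    assign = dists.argmin(axis=1)
    # Update: the coordinate-wise median minimizes sum_i |x_i - x| per dimension
    new_C = np.array([np.median(X[assign == k], axis=0) if np.any(assign == k) else C[k]
                      for k in range(C.shape[0])])
    return new_C, assign
```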
Some engineering details
• Pyramid overlapping pooling strategy
31 sub-windows => a 31K-dimensional vector (K = codebook size)
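A sketch of the pooling concatenation, assuming a dense grid of visual-word indices per image; the exact layout of the 31 overlapping sub-windows is not spelled out on this slide, so the sub-window list is left as an input here:

```python
import numpy as np

def pyramid_pooling(word_map, K, subwindows):
    """Concatenate one K-bin histogram per sub-window into a single vector.

    word_map: (H, W) array of visual-word indices on a dense descriptor grid.
    subwindows: list of (row_slice, col_slice) regions; 31 regions give a 31*K vector.
    """
    hists = [np.bincount(word_map[rs, cs].ravel(), minlength=K) for rs, cs in subwindows]
    return np.concatenate(hists)
```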
Some engineering details
• Concatenation of features computed on the Sobel image
[Figure: Sobel images, pictures from Wikipedia]
=> 31K × 2 = 62K-dimensional image representation
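A small sketch of obtaining the Sobel gradient image whose BoVW feature vector is concatenated with the original one (doubling 31K to 62K dimensions); whether the gradient magnitude or separate horizontal/vertical responses are used is not stated on this slide, so the magnitude below is an assumption:

```python
import numpy as np
from scipy.ndimage import sobel

def sobel_image(image):
    """Sobel gradient magnitude of a grayscale image (2-D array)."""
    gx = sobel(image.astype(float), axis=1)   # horizontal gradient
    gy = sobel(image.astype(float), axis=0)   # vertical gradient
    return np.hypot(gx, gy)
```

The full representation is then the concatenation of the 31K-dimensional vector from the original image and the one from this Sobel image.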
Some engineering details
• SIFT for Caltech, CENTRIST for the others
• Codebook size K = 200
• Pyramid levels L = 0, 1, 2
• One-vs-one SVM for the smaller datasets, BSVM for Caltech 101
• Random splitting is repeated 5 times
Results: Caltech 101
• B, not B: with or without concatenation of Sobel features
• s: grid step size of dense SIFT extraction
• oc_{svm}: one-class SVM encoding
• k_{HI}: using the histogram intersection kernel
Results: Scene 15
• B, not B: with or without concatenation of Sobel features
• s: grid step size of dense SIFT extraction
• oc_{svm}: one-class SVM encoding
• k_{HI}: using the histogram intersection kernel
Conclusions
1. The HIK visual codebook improves classification accuracy.
2. K-median is a compromise between k-means and HIK kernel k-means.
3. One-class SVM encoding helps build a more compact representation.
4. A smaller step size is better?