ECCV 2010: Distance Functions and Metric Learning, Part 1


Distance Functions and Metric Learning: Part 1

Michael Werman, Ofir Pele

The Hebrew University of Jerusalem

12/1/2011 version. Updated version: http://www.cs.huji.ac.il/~ofirpele/DFML_ECCV2010_tutorial/

Rough Outline

• Bin-to-Bin Distances.

• Cross-Bin Distances:

– Quadratic-Form (aka Mahalanobis) / Quadratic-Chi.

– The Earth Mover’s Distance.

• Perceptual Color Differences.

• Hands-On Code Example.

Distance?

Metric

• Non-negativity: $D(P,Q) \ge 0$

• Identity of indiscernibles: $D(P,Q) = 0$ iff $P = Q$

• Symmetry: $D(P,Q) = D(Q,P)$

• Subadditivity (triangle inequality): $D(P,Q) \le D(P,K) + D(K,Q)$

Pseudo-Metric (aka Semi-Metric)

• Non-negativity: $D(P,Q) \ge 0$

• Property changed to: $D(P,Q) = 0$ if $P = Q$

• Symmetry: $D(P,Q) = D(Q,P)$

• Subadditivity (triangle inequality): $D(P,Q) \le D(P,K) + D(K,Q)$

Metric

• Non-negativity: $D(P,Q) \ge 0$

• Identity of indiscernibles: $D(P,Q) = 0$ iff $P = Q$ (an object is most similar to itself)

• Symmetry: $D(P,Q) = D(Q,P)$

• Subadditivity (triangle inequality): $D(P,Q) \le D(P,K) + D(K,Q)$

Useful for many algorithms.

Minkowski-Form Distances

$L_p(P,Q) = \left( \sum_i |P_i - Q_i|^p \right)^{\frac{1}{p}}$

$L_2(P,Q) = \sqrt{ \sum_i (P_i - Q_i)^2 }$

$L_1(P,Q) = \sum_i |P_i - Q_i|$

$L_\infty(P,Q) = \max_i |P_i - Q_i|$
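As a quick reference, a minimal MATLAB sketch of these special cases (the example histograms and variable names are ours, not from the tutorial):

% Minkowski-form distances between two histograms (row vectors P, Q).
P = [0.1 0.4 0.3 0.2];
Q = [0.3 0.3 0.2 0.2];
d = abs(P - Q);
L1   = sum(d);            % p = 1
L2   = sqrt(sum(d.^2));   % p = 2
Linf = max(d);            % p -> infinity
p  = 3;                   % general p
Lp = sum(d.^p)^(1/p);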

Kullback-Leibler Divergence

$KL(P,Q) = \sum_i P_i \log \frac{P_i}{Q_i}$

• Information theoretic origin.

• Not symmetric.

• What happens when $Q_i = 0$?
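A short MATLAB sketch, assuming P and Q sum to 1; it also illustrates the Q_i = 0 problem raised above (the example values are ours):

% Kullback-Leibler divergence KL(P||Q) for histograms P, Q.
P  = [0.5 0.3 0.2];
Q  = [0.4 0.4 0.2];
KL = sum(P .* log(P ./ Q));    % finite, since all Q_i > 0
% The Q_i = 0 problem: a single empty bin in Q makes KL infinite.
Q2  = [0.5 0.5 0];
KL2 = sum(P .* log(P ./ Q2));  % = Inf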

Jensen-Shannon Divergence

$JS(P,Q) = \frac{1}{2} KL(P,M) + \frac{1}{2} KL(Q,M), \qquad M = \frac{1}{2}(P + Q)$

$JS(P,Q) = \frac{1}{2} \sum_i P_i \log \frac{2 P_i}{P_i + Q_i} + \frac{1}{2} \sum_i Q_i \log \frac{2 Q_i}{P_i + Q_i}$
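A MATLAB sketch of JS through the mid-point histogram M, using the convention that 0·log 0 terms contribute 0 (the inline kl helper and the example values are ours):

% Jensen-Shannon divergence via the mid-point histogram M.
P  = [0.5 0.5 0];
Q  = [0   0.5 0.5];
M  = 0.5 * (P + Q);
% KL with the convention 0*log(0/q) = 0; whenever P_i > 0, M_i > 0.
kl = @(X, Y) sum(X(X>0) .* log(X(X>0) ./ Y(X>0)));
JS = 0.5 * kl(P, M) + 0.5 * kl(Q, M);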

Jensen-Shannon Divergence

• Information theoretic origin.

• Symmetric.

• $\sqrt{JS}$ is a metric.

$JS(P,Q) = \frac{1}{2} KL(P,M) + \frac{1}{2} KL(Q,M), \qquad M = \frac{1}{2}(P + Q)$

Jensen-Shannon Divergence

• Using a Taylor expansion and some algebra:

$JS(P,Q) = \sum_{n=1}^{\infty} \frac{1}{2n(2n-1)} \sum_i \frac{(P_i - Q_i)^{2n}}{(P_i + Q_i)^{2n-1}} = \frac{1}{2} \sum_i \frac{(P_i - Q_i)^2}{P_i + Q_i} + \frac{1}{12} \sum_i \frac{(P_i - Q_i)^4}{(P_i + Q_i)^3} + \dots$

Histogram Distance $\chi^2$

• Statistical origin.

• Experimentally, results are very similar to JS.

• Reduces the effect of large bins.

• $\sqrt{\chi^2}$ is a metric.

$\chi^2(P,Q) = \frac{1}{2} \sum_i \frac{(P_i - Q_i)^2}{P_i + Q_i}$
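A short MATLAB version, assuming bins with P_i + Q_i = 0 contribute 0 (the example values are ours):

% Chi-squared histogram distance; bins with P_i + Q_i == 0 contribute 0.
P  = [0.5 0.5 0];
Q  = [0   0.5 0.5];
s  = P + Q;
nz = s > 0;
chi2 = 0.5 * sum((P(nz) - Q(nz)).^2 ./ s(nz));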

Histogram Distance $\chi^2$

[Figure: example image pairs compared with $\chi^2$ and $L_1$; the two distances rank the pairs differently.]

Histogram Distance $\chi^2$

• Experimentally better than $L_2$.

$\chi^2(P,Q) = \frac{1}{2} \sum_i \frac{(P_i - Q_i)^2}{P_i + Q_i}$

Bin-to-Bin Distances

• Bin-to-bin distances such as $L_1$, $L_2$, $\chi^2$ are sensitive to quantization:

[Figure: image-histogram example where the bin-to-bin ranking of two pairs is counter-intuitive.]

[Figure: trade-off between robustness and distinctiveness as a function of the number of bins.]

Bin-to-Bin Distances

Can we achieve robustness and distinctiveness?

The Quadratic-Form Histogram Distance

$QF_A(P,Q) = \sqrt{(P-Q)^T A (P-Q)} = \sqrt{ \sum_{ij} (P_i - Q_i)(P_j - Q_j) A_{ij} }$

• $A_{ij}$ is the similarity between bins $i$ and $j$.

• If $A$ is the inverse of the covariance matrix, QF is called the Mahalanobis distance.

The Quadratic-Form Histogram Distance

$QF_A(P,Q) = \sqrt{(P-Q)^T A (P-Q)} = \sqrt{ \sum_{ij} (P_i - Q_i)(P_j - Q_j) A_{ij} }$

With $A = I$:

$QF_I(P,Q) = \sqrt{ \sum_i (P_i - Q_i)^2 } = L_2(P,Q)$
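A MATLAB sketch of QF_A for row-vector histograms; the similarity matrix below is an arbitrary illustrative example, not one recommended by the tutorial:

% Quadratic-Form (Mahalanobis-like) histogram distance.
P = [0.2 0.5 0.3];
Q = [0.4 0.4 0.2];
A = [1.0 0.5 0.0;
     0.5 1.0 0.5;
     0.0 0.5 1.0];           % bin-to-bin similarities
d  = P - Q;                  % row vector
qf = sqrt(max(d * A * d', 0));
% With A = eye(3) this reduces to L2(P,Q).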

The Quadratic-Form Histogram Distance

• Does not reduce the effect of large bins.

• Alleviates the quantization problem.

• Linear time computation in the number of non-zero $A_{ij}$.

$QF_A(P,Q) = \sqrt{(P-Q)^T A (P-Q)}$

The Quadratic-Form Histogram Distance

$QF_A(P,Q) = \sqrt{(P-Q)^T A (P-Q)}$

• If $A$ is positive-semidefinite, then QF is a pseudo-metric (aka semi-metric):

  – Non-negativity: $D(P,Q) \ge 0$

  – Property changed to: $D(P,Q) = 0$ if $P = Q$

  – Symmetry: $D(P,Q) = D(Q,P)$

  – Subadditivity (triangle inequality): $D(P,Q) \le D(P,K) + D(K,Q)$

The Quadratic-Form Histogram Distance

• If $A$ is positive-definite, then QF is a metric.

$QF_A(P,Q) = \sqrt{(P-Q)^T A (P-Q)}$

The Quadratic-Form Histogram Distance

$QF_A(P,Q) = \sqrt{(P-Q)^T A (P-Q)} = \sqrt{(P-Q)^T W^T W (P-Q)} = L_2(WP, WQ)$

We assume there is a linear transformation that makes bins independent.

There are cases where this is not true, e.g. COLOR.
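A small MATLAB check of this identity, assuming A is positive-definite so that chol succeeds (the example values are ours):

% Factor A = W'*W and verify QF_A(P,Q) == L2(W*P, W*Q).
P = [0.2 0.5 0.3]';
Q = [0.4 0.4 0.2]';
A = [1.0 0.5 0.0;
     0.5 1.0 0.5;
     0.0 0.5 1.0];
W  = chol(A);                 % upper-triangular, A = W'*W
qf = sqrt((P-Q)' * A * (P-Q));
l2 = norm(W*P - W*Q);         % identical to qf up to round-off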

The Quadratic-Form Histogram Distance

$QF_A(P,Q) = \sqrt{(P-Q)^T A (P-Q)}$

• Converting distance to similarity (Hafner et al. 95):

$A_{ij} = 1 - \frac{D_{ij}}{\max_{ij}(D_{ij})}$

The Quadratic-Form Histogram Distance

$QF_A(P,Q) = \sqrt{(P-Q)^T A (P-Q)}$

• Converting distance to similarity (Hafner et al. 95):

$A_{ij} = e^{-\alpha \frac{D(i,j)}{\max_{ij}(D(i,j))}}$

If $\alpha$ is large enough, $A$ will be positive-definite.
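A MATLAB sketch of both conversions, assuming D is a symmetric ground-distance matrix with zero diagonal; the value of alpha and the example D are ours:

% From a ground-distance matrix D to a similarity matrix A.
D = [0 1 2;
     1 0 1;
     2 1 0];
A_lin = 1 - D / max(D(:));            % Hafner et al. 95, linear version
alpha = 10;
A_exp = exp(-alpha * D / max(D(:)));  % exponential version
lambda_min = min(eig(A_exp));         % > 0  =>  positive-definite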

The Quadratic-Form Histogram Distance

$QF_A(P,Q) = \sqrt{(P-Q)^T A (P-Q)}$

• Learning the similarity matrix: part 2 of this tutorial.

The Quadratic-Chi Histogram Distance

$QC^A_m(P,Q) = \sqrt{ \sum_{ij} \frac{(P_i - Q_i)(P_j - Q_j) A_{ij}}{\left( \sum_c (P_c + Q_c) A_{ci} \right)^m \left( \sum_c (P_c + Q_c) A_{cj} \right)^m } }$

• $A_{ij}$ is the similarity between bins $i$ and $j$.

• Generalizes QF and $\chi^2$.

• Reduces the effect of large bins.

• Alleviates the quantization problem.

• Linear time computation in the number of non-zero $A_{ij}$.

The Quadratic-Chi Histogram Distance

• Non-negative if $A$ is positive-semidefinite.

• Symmetric.

• Triangle inequality: unknown.

• If we define $\frac{0}{0} = 0$ and $0 \le m < 1$, QC is continuous.

Similarity-Matrix-Quantization-Invariant Property

[Figure: $D(\,\cdot\,,\,\cdot\,) = D(\,\cdot\,,\,\cdot\,)$ – re-quantizing mass between bins that the similarity matrix treats as equivalent does not change the distance.]

Sparseness-Invariant

[Figure: $D(\,\cdot\,,\,\cdot\,) = D(\,\cdot\,,\,\cdot\,)$ – adding empty bins does not change the distance.]

The Quadratic-Chi Histogram Distance Code

function dist = QuadraticChi(P, Q, A, m)
  Z = (P+Q)*A;
  % 1 can be any number, as Z_i == 0 iff D_i == 0
  Z(Z==0) = 1;
  Z = Z.^m;
  D = (P-Q)./Z;
  % max is redundant if A is positive-semidefinite
  dist = sqrt( max(D*A*D', 0) );

Time complexity: $O(\#\{A_{ij} \ne 0\})$
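A possible call to the function above; the histograms, ground distances and the value of m below are our own illustrative values, not the tutorial's settings:

% Example call to QuadraticChi with a small similarity matrix.
P = [0.2 0.5 0.3 0.0];
Q = [0.0 0.3 0.4 0.3];
D = [0 1 2 2;
     1 0 1 2;
     2 1 0 1;
     2 2 1 0];            % thresholded ground distances between bins
A = 1 - D / max(D(:));    % similarities in [0,1]
m = 0.9;                  % normalization exponent, 0 <= m < 1
dist = QuadraticChi(P, Q, A, m);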

The Quadratic-Chi Histogram Distance Code

What about sparse (e.g. BoW) histograms?

$P - Q = [\,0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 4\ 5\ 0\ {-3}\,]$

Time complexity: $O(SK)$, where $S = \#(P \ne 0) + \#(Q \ne 0)$ and $K$ is the average number of non-zero entries in each row of $A$.

The Earth Mover's Distance

• The Earth Mover's Distance is defined as the minimal cost that must be paid to transform one histogram into the other, where there is a "ground distance" between the basic features that are aggregated into the histogram.

$EMD_D(P,Q) = \min_{F=\{F_{ij}\}} \sum_{i,j} F_{ij} D_{ij}$

$\text{s.t.} \quad \sum_j F_{ij} = P_i, \quad \sum_i F_{ij} = Q_j, \quad \sum_{i,j} F_{ij} = \sum_i P_i = \sum_j Q_j = 1, \quad F_{ij} \ge 0$
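For intuition only, a brute-force MATLAB sketch that solves this linear program directly with linprog (Optimization Toolbox); the example histograms and ground-distance matrix are ours, and the dedicated algorithms discussed below are what one would use in practice:

% EMD between normalized histograms as a linear program (tiny N only).
P = [0.4 0.2 0.4]';             % normalized column histograms
Q = [0.2 0.6 0.2]';
D = [0 1 2; 1 0 1; 2 1 0];      % ground distances between bins
N = numel(P);
c   = D(:);                                  % objective: sum_ij F_ij * D_ij
Aeq = [kron(ones(1,N), eye(N));              % row sums of F equal P
       kron(eye(N), ones(1,N))];             % column sums of F equal Q
beq = [P; Q];
lb  = zeros(N*N, 1);                         % F_ij >= 0
f   = linprog(c, [], [], Aeq, beq, lb, []);  % optimal flow, vectorized
emd = c' * f;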

The Earth Mover's Distance

$EMD_D(P,Q) = \min_{F=\{F_{ij}\}} \frac{\sum_{i,j} F_{ij} D_{ij}}{\sum_{i,j} F_{ij}}$

$\text{s.t.} \quad \sum_j F_{ij} \le P_i, \quad \sum_i F_{ij} \le Q_j, \quad \sum_{i,j} F_{ij} = \min\left(\sum_i P_i, \sum_j Q_j\right), \quad F_{ij} \ge 0$

The Earth Mover's Distance

• Pele and Werman 08 – $\widehat{EMD}$, a new EMD definition:

$\widehat{EMD}^C_D(P,Q) = \min_{F=\{F_{ij}\}} \sum_{i,j} F_{ij} D_{ij} + \left| \sum_i P_i - \sum_j Q_j \right| \times C$

$\text{s.t.} \quad \sum_j F_{ij} \le P_i, \quad \sum_i F_{ij} \le Q_j, \quad \sum_{i,j} F_{ij} = \min\left(\sum_i P_i, \sum_j Q_j\right), \quad F_{ij} \ge 0$

[Figure: flow-network view with the smaller-mass histogram as the demander and the larger as the supplier ($\sum \text{Demander} \le \sum \text{Supplier}$). In EMD the surplus supplier mass is absorbed through edges with $D_{ij} = 0$; in $\widehat{EMD}$ those edges have cost $D_{ij} = C$.]

When to Use $\widehat{EMD}$

• When the total mass of two histograms is important.

[Figure: example histogram pairs for which $EMD(\cdot,\cdot) = EMD(\cdot,\cdot)$, while $\widehat{EMD}(\cdot,\cdot) < \widehat{EMD}(\cdot,\cdot)$.]

When to Use $\widehat{EMD}$

• When the difference in total mass between histograms is a distinctive cue.

[Figure: example histogram pairs for which $EMD(\cdot,\cdot) = EMD(\cdot,\cdot) = 0$, while $\widehat{EMD}(\cdot,\cdot) < \widehat{EMD}(\cdot,\cdot)$.]

When to Use $\widehat{EMD}$

• If the ground distance is a metric:

  – $EMD$ is a metric only for normalized histograms.

  – $\widehat{EMD}$ is a metric for all histograms ($C \ge \frac{1}{2}\Delta$, where $\Delta$ is the maximal ground distance).

$\widehat{EMD}$ – a Natural Extension of $L_1$

$\widehat{EMD}^C_D(P,Q) = \min_{F=\{F_{ij}\}} \sum_{i,j} F_{ij} D_{ij} + \left| \sum_i P_i - \sum_j Q_j \right| \times C$

• $\widehat{EMD} = L_1$ if:

$C = 1, \qquad D_{ij} = \begin{cases} 0 & \text{if } i = j \\ 2 & \text{otherwise} \end{cases}$

The Earth Mover's Distance Complexity Zoo

• General ground distance: $O(N^3 \log N)$

  – Orlin 88

• $L_1$, normalized 1D histograms: $O(N)$

  – Werman, Peleg and Rosenfeld 85

The Earth Mover's Distance Complexity Zoo

• Normalized 1D cyclic histograms

  – Pele and Werman 08 (Werman, Peleg, Melter, and Kong 86)

$L_1$: $O(N)$

$O(N^2 \log N (D + \log N))$

The Earth Mover's Distance Complexity Zoo

• $L_1$, Manhattan grids

  – Ling and Okada 07

• What about N-dimensional histograms with cyclic dimensions?

The Earth Mover's Distance Complexity Zoo

• $L_1$, general histograms: $O(N^2 \log^{2D-1} N)$

  – Gudmundsson, Klein, Knauer and Smid 07

[Figure: sparse network over points $p_1, \dots, p_8$ used by the algorithm.]

The Earth Mover's Distance Complexity Zoo

• $\min(L_1, 2)$, 1D linear/cyclic histograms over $\{1, 2, \dots, \Delta\}$: $O(N)$

  – Pele and Werman 08

The Earth Mover's Distance Complexity Zoo

• $\min(L_1, 2)$, general histograms over $\{1, 2, \dots, \Delta\}^D$: $O(N^2 K \log(NK))$

  – Pele and Werman 08

  – $K$ is the number of edges with cost 1.

The Earth Mover's Distance Complexity Zoo

• Any thresholded distance: $O(N^2 \log N (K + \log N))$

  – Pele and Werman 09

  – $K$ is the number of edges with cost different from the threshold.

  – For $K = O(\log N)$ this is $O(N^2 \log^2 N)$.

Thresholded Distances

• EMD with a thresholded ground distance is not an approximation of EMD.

• It has better performance.

The Flow Network Transformation

[Figure: original network vs. simplified network.]

• Flowing the Monge sequence (if the ground distance is a metric, zero-cost edges are a Monge sequence).

• Removing empty bins and their edges.

We actually finished here...

Combining Algorithms

• EMD algorithms can be combined.

• For example, thresholded $L_1$.

The Earth Mover's Distance Approximations

• Charikar 02, Indyk and Thaper 03 – approximated EMD on $\{1, \dots, \Delta\}^d$ by embedding it into the $L_1$ norm.

  – Time complexity: $O(TNd \log \Delta)$

  – Distortion (in expectation): $O(d \log \Delta)$

The Earth Mover's Distance Approximations

• Grauman and Darrell 05 – Pyramid Match Kernel (PMK): same as Indyk and Thaper, replacing $L_1$ with histogram intersection.

• PMK approximates EMD with partial matching.

• PMK is a Mercer kernel.

• Time complexity and distortion – same as Indyk and Thaper (proved in Grauman and Darrell 07).

The Earth Mover's Distance Approximations

• Lazebnik, Schmid and Ponce 06 – used PMK in the spatial domain (SPM).

[Figure: spatial pyramid over image feature points – level 0, level 1, level 2, weighted ×1/8, ×1/4, ×1/2.]

The Earth Mover's Distance Approximations

• Shirdhonkar and Jacobs 08 – approximated EMD using the sum of absolute values of the weighted wavelet coefficients of the difference histogram.

[Figure: the difference histogram $P - Q$ is wavelet-transformed; the absolute values of the level-$j$ coefficients are weighted (×$2^{-2 \cdot 0}$, ×$2^{-2 \cdot 1}$ in the illustration) and summed.]

The Earth Mover's Distance Approximations

• Khot and Naor 06 – any embedding of the EMD over the d-dimensional Hamming cube into $L_1$ must incur a distortion of $\Omega(d)$.

• Andoni, Indyk and Krauthgamer 08 – for sets with cardinalities upper bounded by a parameter $s$, the distortion reduces to $O(\log s \log d)$.

• Naor and Schechtman 07 – any embedding of the EMD over $\{0, 1, \dots, \Delta\}^2$ must incur a distortion of $\Omega(\sqrt{\log \Delta})$.

Robust Distances

• Very high distances correspond to outliers – beyond some point they are all the "same difference".

Robust Distances

• With colors, a robust (saturated) distance is the natural choice:

[Figure: CIEDE2000 ($\Delta E_{00}$) distance from blue, range 0–120.]

$\Delta E_{00}(\text{blue}, \text{red}) = 56 \qquad \Delta E_{00}(\text{blue}, \text{yellow}) = 102$

Robust Distances - Exponent

• Usually a negative exponent is used:

• Let $d(a,b)$ be a distance measure between two features $a, b$.

• The negative exponent distance is: $d_e(a,b) = 1 - e^{-\frac{d(a,b)}{\sigma}}$

Robust Distances - Exponent

• The exponent is used because it is robust, smooth, monotonic, and a metric (Ruzon and Tomasi 01).

• Input is always discrete anyway...

Robust Distances - Thresholded

• Let $d(a,b)$ be a distance measure between two features $a, b$.

• The thresholded distance with a threshold of $t > 0$ is: $d_t(a,b) = \min(d(a,b), t)$.
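A tiny MATLAB sketch contrasting the two robust variants defined above on the same raw distances (the sigma and t values are ours):

% Negative-exponent vs. thresholded versions of a raw distance d.
d     = 0:1:100;                  % e.g. raw color differences
sigma = 5;  t = 10;
d_exp = 1 - exp(-d / sigma);      % saturates at 1, but warps small d
d_thr = min(d, t);                % identical to d below the threshold t
plot(d, t*d_exp, d, d_thr);       % compare both on a common scale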

Thresholded Distances

• Thresholded metrics are also metrics (Pele and Werman ICCV 2009).

• Better results.

• The Pele and Werman ICCV 2009 algorithm computes EMD with thresholded ground distances much faster.

• A thresholded distance corresponds to a sparse similarity matrix (e.g. $A_{ij} = 1 - \frac{D_{ij}}{\max_{ij}(D_{ij})}$) → faster QC / QF computation.

• Thresholded vs. exponent:

  – Fast computation of cross-bin distances with a thresholded ground distance.

  – The exponent changes small distances – can be a problem (e.g. color differences).

Thresholded Distances

• Color distance should be thresholded (robust).

[Figure: distance from blue (range 0–150) under $\Delta E_{00}$, $\min(\Delta E_{00}, 10)$, $10(1 - e^{-\Delta E_{00}/4})$ and $10(1 - e^{-\Delta E_{00}/5})$.]

[Figure: zoom on the 0–10 range – the exponent changes small distances.]

A Ground Distance for SIFT

The ground distance between two SIFT bins $(x_i, y_i, o_i)$ and $(x_j, y_j, o_j)$:

$d_R = \|(x_i, y_i) - (x_j, y_j)\|_2 + \min(|o_i - o_j|,\ M - |o_i - o_j|)$

$d_T = \min(d_R, T)$
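A small MATLAB sketch of this ground distance for a single pair of SIFT bins; M, T and the bin coordinates below are placeholder values, not the tutorial's settings:

% Thresholded ground distance between two SIFT bins (x, y, o).
bi = [1 2 3];            % (x_i, y_i, o_i)
bj = [4 2 7];            % (x_j, y_j, o_j)
M  = 8;                  % number of orientation bins
T  = 2;                  % threshold
spatial = norm(bi(1:2) - bj(1:2));                     % ||(x,y)_i - (x,y)_j||_2
orient  = min(abs(bi(3)-bj(3)), M - abs(bi(3)-bj(3))); % cyclic orientation diff
dR = spatial + orient;
dT = min(dR, T);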

A Ground Distance for Color Images

The ground distance between two L*a*b* image bins $(x_i, y_i, L_i, a_i, b_i)$ and $(x_j, y_j, L_j, a_j, b_j)$ we use is:

$d_{cT} = \min\big( \|(x_i, y_i) - (x_j, y_j)\|_2 + \Delta E_{00}((L_i, a_i, b_i), (L_j, a_j, b_j)),\ T \big)$

Perceptual Color Differences

• The Euclidean distance on L*a*b* space is widely considered to be perceptually uniform.

[Figure: $\|Lab\|_2$ distance from blue, range 0–300.]

Perceptual Color Differences

[Figure: $\|Lab\|_2$ distance from blue, range 0–60 – purples come before blues.]

Perceptual Color Differences

• $\Delta E_{00}$ on L*a*b* space is better. Luo, Cui and Rigg 01. Sharma, Wu and Dalal 05.

[Figure: $\Delta E_{00}$ vs. $\|Lab\|_2$ distance from blue, range 0–20.]

Perceptual Color Differences

• $\Delta E_{00}$ on L*a*b* space is better.

[Figure: $\Delta E_{00}$ distance from blue, range 0–150.]

• But it still has major problems.

• Color distance should be thresholded (robust):

$\Delta E_{00}(\text{blue}, \text{red}) = 56 \qquad \Delta E_{00}(\text{blue}, \text{yellow}) = 102$

Perceptual Color Differences

• Color distance should be saturated (robust).

[Figure: distance from blue (range 0–150) under $\Delta E_{00}$, $\min(\Delta E_{00}, 10)$, $10(1 - e^{-\Delta E_{00}/4})$ and $10(1 - e^{-\Delta E_{00}/5})$; zooming on the 0–10 range shows that the exponent changes small distances.]

Perceptual Color Descriptors

• 11 basic color terms. Berlin and Kay 69:

white, red, green, yellow, blue, brown, purple, pink, orange, grey, black.

[Image: double rainbow. Copyright Eric Rolph, taken from: http://upload.wikimedia.org/wikipedia/commons/5/5c/Double-alaskan-rainbow.jpg]

• How to give each pixel an "11-colors" description?

Perceptual Color Descriptors

• Learning Color Names from Real-World Images, J. van de Weijer, C. Schmid, J. Verbeek, CVPR 2007.

[Figure: chip-based vs. real-world color naming examples.]

• For each color, it returns a probability distribution over the 11 basic colors: white, red, green, yellow, blue, brown, purple, pink, orange, grey, black.

• Applying Color Names to Image Description, J. van de Weijer, C. Schmid, ICIP 2007.

• Outperformed state-of-the-art color descriptors.

Perceptual Color Descriptors

• Using illumination invariants – black, gray and white are the same.

• "Too much invariance" happens in other cases (Local features and kernels for classification of texture and object categories: An in-depth study – Zhang, Marszalek, Lazebnik and Schmid, IJCV 2007; Learning the discriminative power-invariance trade-off – Varma and Ray, ICCV 2007).

• To conclude: don't solve imaginary problems.

Perceptual Color Descriptors

• This method is still not perfect.

• 11 color vector for purple(255,0,255) is:

• In real world images there are no such

over-saturated colors.

Perceptual Color Descriptors

0 0 00 0 0 00010

Open Questions

• An EMD variant that reduces the effect of large bins.

• Learning the ground distance for EMD.

• Learning the similarity matrix and normalization factor for QC.

Hands-On Code Example

http://www.cs.huji.ac.il/~ofirpele/FastEMD/code/

http://www.cs.huji.ac.il/~ofirpele/QC/code/

Tutorial:

http://www.cs.huji.ac.il/~ofirpele/DFML_ECCV2010_tutorial/

Or search for "Ofir Pele".
