
Page 1: PhD thesis defence slides

Learning to Hash for Large Scale Image Retrieval

Sean Moran

Pre-Viva Talk

27th November 2015

Page 2: PhD thesis defence slides

Learning to Hash for Large Scale Image Retrieval

Overview

Evaluation Protocol

Learning Multiple Quantisation Thresholds

Learning the Hashing Hypersurfaces

Conclusions


Page 4: PhD thesis defence slides

Efficient Retrieval in Large Image Archives


Page 7: PhD thesis defence slides

Applications

- Content-based IR

- Recommendation

- Near-duplicate detection

- Noun clustering

- Location recognition

Page 8: PhD thesis defence slides

Locality Sensitive Hashing

[Diagram: images from a DATABASE are passed through a hash function H]

Page 9: PhD thesis defence slides

Locality Sensitive Hashing

[Diagram: the hash function H maps each database image to a binary code (110101, 010111, 010101, 111101, ...) stored in a HASH TABLE]

Page 10: PhD thesis defence slides

Locality Sensitive Hashing

[Diagram: the QUERY image is hashed with the same function H and indexes into the same hash table as the database images]

Page 11: PhD thesis defence slides

Locality Sensitive Hashing

[Diagram: database items that collide with the query's hashcode are retrieved from the hash table; similarity is computed against only these candidates to return the NEAREST NEIGHBOURS]

Page 12: PhD thesis defence slides

Locality Sensitive Hashing

[Diagram: the full retrieval pipeline, illustrated with applications: content-based IR, location recognition, near-duplicate detection. Image credits: Imense Ltd; Doersch et al.; Xu et al.]
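A minimal sketch of the pipeline these diagrams illustrate, assuming random sign-projection hash functions and a Python dict as the hash table (all names here are illustrative, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 128, 6                      # feature dimensionality, bits per code
W = rng.standard_normal((D, K))    # K random hyperplane normals

def hash_code(x):
    """Map a feature vector to a K-bit code: sign of K random projections."""
    return tuple((x @ W > 0).astype(int))

# Index the database: bucket images by their hashcode.
database = rng.standard_normal((1000, D))
table = {}
for i, x in enumerate(database):
    table.setdefault(hash_code(x), []).append(i)

# Query: hash with the same functions, then score only the colliding bucket.
query = rng.standard_normal(D)
candidates = table.get(hash_code(query), [])
ranked = sorted(candidates, key=lambda i: np.linalg.norm(database[i] - query))
print("nearest neighbours:", ranked[:5])
```

Similar items collide in the same bucket with high probability, so similarity is computed for only a small fraction of the database.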

Page 13: PhD thesis defence slides

Toy 2D Feature Space

[Plot: a toy 2D feature space with axes x and y]

Page 14: PhD thesis defence slides

Feature Space Partitioning

[Plot: two hyperplanes h1 and h2, with normal vectors w1 and w2, partition the 2D feature space; data-points a and b are shown]

Page 15: PhD thesis defence slides

Step 1: Projection onto Normal Vectors

[Plot: the data-points projected onto the normal vectors (w2 shown)]

Page 16: PhD thesis defence slides

Step 2: Binary Quantisation

[Plot: projected values y1 and y2 along Projected Dimension #1 and Projected Dimension #2; thresholds t1 and t2 quantise each dimension into bits 0 and 1; the projections of points a and b are marked]

Page 17: PhD thesis defence slides

2-Bit Hashcodes of Data-Points

[Plot: hyperplanes h1 and h2 (normals w1, w2) divide the space into four regions labelled with 2-bit hashcodes 00, 01, 10, 11; points a and b are shown]

Page 18: PhD thesis defence slides

2-Bit Hashcodes of Data-Points

[Plot: as above, with the resulting Hash Table: buckets 00, 01, 10, 11 each hold the data-points whose hashcode matches]
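The two steps fit in a few lines; a toy sketch assuming two random normal vectors and single thresholds at zero, mirroring the figures (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
points = rng.standard_normal((8, 2))   # toy 2D data, one row per point
W = rng.standard_normal((2, 2))        # columns = normal vectors w1, w2

# Step 1: project each point onto the normal vectors.
Y = points @ W                         # Y[:, k] = projected dimension k

# Step 2: quantise each projected dimension with a threshold t_k (zero here).
t = np.zeros(2)
bits = (Y > t).astype(int)             # one bit per projected dimension

for p, b in zip(points, bits):
    print(p.round(2), "->", "".join(map(str, b)))   # 2-bit hashcodes
```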

Page 19: PhD thesis defence slides

Thesis Statement

“The retrieval effectiveness of hashing-based ANN search can be improved by learning the quantisation thresholds and hashing hyperplanes in a manner that is influenced by the distribution of the data”

Page 20: PhD thesis defence slides

Relaxing Common Assumptions (A1-A3)

- Three limiting assumptions permeate the literature:

  - A1: A single static threshold placed at zero

  - A2: Uniform allocation of thresholds across each dimension

  - A3: Linear hypersurfaces (hyperplanes) positioned randomly

Page 21: PhD thesis defence slides

Thesis Contributions

- Five novel contributions to address the assumptions:

  - Learning Multiple Quantisation Thresholds [1]

  - Learning Variable Quantisation Thresholds [2]

  - Learning Unimodal Hashing Hypersurfaces [3]

  - Learning Cross-Modal Hashing Hypersurfaces [4]

  - Learning Hypersurfaces and Thresholds together

[1] Sean Moran, Victor Lavrenko, Miles Osborne. Neighbourhood Preserving Quantisation for LSH. In SIGIR’13.
[2] Sean Moran, Victor Lavrenko, Miles Osborne. Variable Bit Quantisation for LSH. In ACL’13.
[3] Sean Moran and Victor Lavrenko. Graph Regularised Hashing. In ECIR’15.
[4] Sean Moran and Victor Lavrenko. Regularised Cross Modal Hashing. In SIGIR’15.

Page 22: PhD thesis defence slides

Thesis Contributions (This Talk)

- Two contributions discussed in this talk:

  - Learning Multiple Quantisation Thresholds [1] (this talk)

  - Learning Variable Quantisation Thresholds [2]

  - Learning Unimodal Hashing Hypersurfaces [3] (this talk)

  - Learning Cross-Modal Hashing Hypersurfaces [4]

  - Learning Hypersurfaces and Thresholds together

[1] Sean Moran, Victor Lavrenko, Miles Osborne. Neighbourhood Preserving Quantisation for LSH. In SIGIR’13.
[2] Sean Moran, Victor Lavrenko, Miles Osborne. Variable Bit Quantisation for LSH. In ACL’13.
[3] Sean Moran and Victor Lavrenko. Graph Regularised Hashing. In ECIR’15.
[4] Sean Moran and Victor Lavrenko. Regularised Cross Modal Hashing. In SIGIR’15.

Page 23: PhD thesis defence slides

Evaluation Protocol

Overview

Evaluation Protocol

Learning Multiple Quantisation Thresholds

Learning the Hashing Hypersurfaces

Conclusions

Page 24: PhD thesis defence slides

Datasets and Features

- Standard evaluation datasets [5,6]:

  - CIFAR-10: 60,000 images, 512-D GIST descriptors¹

  - NUSWIDE: 270,000 images, 500-D BoW descriptors²

  - SIFT-1M: 1,000,000 images, 128-D SIFT descriptors³

[5] Wei Liu, Jun Wang, Rongrong Ji, Yu-Gang Jiang, Shih-Fu Chang. Supervised Hashing with Kernels. In CVPR’12.
[6] Yunchao Gong, Svetlana Lazebnik. Iterative Quantization: A Procrustean Approach to Learning Binary Codes for Large-scale Image Retrieval. In CVPR’11.

¹ http://www.cs.toronto.edu/~kriz/cifar.html
² http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm
³ http://corpus-texmex.irisa.fr/

Page 25: PhD thesis defence slides

Example NUSWIDE Images [15]

Page 26: PhD thesis defence slides

ε-Nearest Neighbour Groundtruth [6]

[Plot: 2D feature space; a circle of radius ε around a query point marks its ε-nearest neighbours]

[6] Yunchao Gong, Svetlana Lazebnik. Iterative Quantization: A Procrustean Approach to Learning Binary Codes for Large-scale Image Retrieval. In CVPR’11.
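A sketch of how such groundtruth can be built: mark as true neighbours all database points within ε of a query. The ε-setting rule below (averaging the distance to the 50th nearest neighbour over a sample) is one common convention in this literature, shown only as an illustration:

```python
import numpy as np

def epsilon_nn_groundtruth(queries, database, eps):
    """G[q, i] = 1 iff database point i lies within eps of query q."""
    d = np.linalg.norm(queries[:, None, :] - database[None, :, :], axis=-1)
    return (d <= eps).astype(int)

rng = np.random.default_rng(2)
db = rng.standard_normal((500, 32))
qs = rng.standard_normal((10, 32))

# Illustrative eps: average distance to the 50th nearest neighbour,
# estimated on a sample of database points (column 0 is the point itself).
sample = np.linalg.norm(db[:50, None, :] - db[None, :, :], axis=-1)
eps = np.sort(sample, axis=1)[:, 50].mean()

G = epsilon_nn_groundtruth(qs, db, eps)
print("average true NNs per query:", G.sum(axis=1).mean())
```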

Page 27: PhD thesis defence slides

Class Based Groundtruth [6]

[Plot: 2D feature space with points coloured by class label, Class 1 to Class 4; points sharing a class are groundtruth neighbours]

[6] Yunchao Gong, Svetlana Lazebnik. Iterative Quantization: A Procrustean Approach to Learning Binary Codes for Large-scale Image Retrieval. In CVPR’11.

Page 28: PhD thesis defence slides

Hamming Ranking Evaluation Paradigm [5,6]

Query hashcode: 01011

Images ranked by Hamming distance to the query:

Rank   Hashcode   Hamming distance   Relevant?
1      01011      0                  +
2      01010      1                  +
3      11000      3                  -
4      11000      3                  -
5      11100      4                  -
6      10100      5                  -
7      10100      5                  -

[5] Wei Liu, Jun Wang, Rongrong Ji, Yu-Gang Jiang, Shih-Fu Chang. Supervised Hashing with Kernels. In CVPR’12.
[6] Yunchao Gong, Svetlana Lazebnik. Iterative Quantization: A Procrustean Approach to Learning Binary Codes for Large-scale Image Retrieval. In CVPR’11.
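Hamming ranking in code: every database code is scored by its Hamming distance to the query code and sorted. A minimal sketch, assuming codes stored as 0/1 numpy arrays, using the worked example above:

```python
import numpy as np

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query code."""
    dists = (db_codes != query_code).sum(axis=1)   # Hamming distance per item
    order = np.argsort(dists, kind="stable")       # closest codes first
    return order, dists[order]

# The slide's example: query 01011 against the seven stored codes.
query = np.array([0, 1, 0, 1, 1])
db = np.array([[0,1,0,1,1], [0,1,0,1,0], [1,1,0,0,0], [1,1,0,0,0],
               [1,1,1,0,0], [1,0,1,0,0], [1,0,1,0,0]])
order, dists = hamming_rank(query, db)
print(dists)   # [0 1 3 3 4 5 5], matching the slide
```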

Page 29: PhD thesis defence slides

Learning to Hash for Large Scale Image Retrieval

Overview

Evaluation Protocol

Learning Multiple Quantisation Thresholds

Learning the Hashing Hypersurfaces

Conclusions

Page 30: PhD thesis defence slides

Why Learn the Quantisation Threshold(s)?

- Threshold at zero is in the region of highest point density:

[Histogram: distribution of projected values (LSH, CIFAR-10); count (0-1600) vs. projected value (-4 to 4), peaked around zero]

Page 31: PhD thesis defence slides

Previous work

- Single Bit Quantisation (SBQ): A single static threshold placed at zero (the default used in most previous work) [13].

- Double Bit Quantisation (DBQ): Multiple thresholds optimised by minimising a squared-error objective [7].

- Manhattan Hashing Quantisation (MHQ): Multiple thresholds optimised by k-means [8].

[7] Weihao Kong, Wu-Jun Li. Double Bit Quantisation. In AAAI’12.
[8] Weihao Kong, Wu-Jun Li. Manhattan Hashing for Large-Scale Image Retrieval. In SIGIR’12.
[13] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In STOC’98.

Page 32: PhD thesis defence slides

Previous work

Method     Data-Dependent   # Thresholds     Codebook
SBQ [13]                    1                0/1
DBQ [7]    ✓                2                00/11/10
MHQ [8]    ✓                2^B − 1          NBC
NPQ [1]    ✓                1, 2, 2^B − 1    Any

B: # of bits per projected dimension, e.g. 1, 2, 3, 4 bits
NBC: Natural Binary Code

[1] Sean Moran, Victor Lavrenko, Miles Osborne. Neighbourhood Preserving Quantisation for LSH. In SIGIR’13.

Page 33: PhD thesis defence slides

Semi-Supervised Objective Function

- Idea: learn the thresholds by fusing two sources of information:

  1. Supervised: distances (or labels) of points in the original D-dimensional feature space

  2. Unsupervised: distances between pairs of points in the 1D projected space

- Next few slides: constructing the semi-supervised objective function

Page 34: PhD thesis defence slides

Adjacency Matrix S

- Adjacency matrix S ∈ {0, 1}^{N_trd × N_trd}:

  S_ij = { 1, if x_i, x_j are true NNs
         { 0, otherwise.                  (1)

- Note: N_trd ≪ N and the matrix S is highly sparse (~1% of elements non-zero)
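Since S is ~99% zeros, it is natural to store it sparsely. A sketch using scipy's sparse matrices (illustrative, not prescribed by the thesis), with the true-NN pairs from the toy example on the next slides:

```python
import numpy as np
from scipy.sparse import csr_matrix

# True-NN pairs among Ntrd training points: (a,b), (c,f), (d,h), (e,g),
# with points a..h numbered 0..7.
Ntrd = 8
pairs = [(0, 1), (2, 5), (3, 7), (4, 6)]

rows = [i for i, j in pairs] + [j for i, j in pairs]   # S is symmetric
cols = [j for i, j in pairs] + [i for i, j in pairs]
S = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(Ntrd, Ntrd))

print(S.toarray().astype(int))           # one +1 per true-NN pair, each way
print("sparsity:", 1 - S.nnz / Ntrd**2)
```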

Page 35: PhD thesis defence slides

Adjacency Matrix S

[Plot: projected dimension y1 with thresholds t1, t2, t3 defining regions 00, 01, 10, 11; points a-h lie along the dimension]

[Build slide: the +1 entries of S are filled in pair by pair; the completed matrix is on the next slide]

Page 36: PhD thesis defence slides

Adjacency Matrix S

[Plot: as above]

S   a    b    c    d    e    f    g    h
a   0    +1   0    0    0    0    0    0
b   +1   0    0    0    0    0    0    0
c   0    0    0    0    0    +1   0    0
d   0    0    0    0    0    0    0    +1
e   0    0    0    0    0    0    +1   0
f   0    0    +1   0    0    0    0    0
g   0    0    0    0    +1   0    0    0
h   0    0    0    +1   0    0    0    0

Page 37: PhD thesis defence slides

Indicator Matrix P

- Indicator matrix P ∈ {0, 1}^{N_trd × N_trd}:

  P^k_ij = { 1, if ∃γ s.t. t_kγ ≤ y^k_i, y^k_j ≤ t_k(γ+1)
           { 0, otherwise.                                  (2)

i.e. P^k_ij = 1 iff the projected values y^k_i and y^k_j fall between the same pair of adjacent thresholds on dimension k.
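P^k simply marks pairs whose projected values land in the same thresholded region; a sketch for a single projected dimension (names and the toy values are illustrative, chosen to match the example on the following slides):

```python
import numpy as np

def indicator_matrix(y, thresholds):
    """P_ij = 1 iff y_i and y_j land in the same thresholded region."""
    # Region index of each projected value: count of thresholds below it.
    regions = np.searchsorted(np.sort(thresholds), y)
    P = (regions[:, None] == regions[None, :]).astype(int)
    np.fill_diagonal(P, 0)            # the slides leave the diagonal at 0
    return P

# Points a..h along y1; thresholds t1 < t2 < t3 give regions a,b,c | e,f,g,h | d.
y1 = np.array([0.1, 0.2, 0.3, 3.1, 2.0, 2.2, 2.4, 2.6])
t = np.array([1.0, 1.8, 2.9])
print(indicator_matrix(y1, t))
```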

Page 38: PhD thesis defence slides

Indicator Matrix P

[Plot: as before; points a and b fall in the same thresholded region]

P so far: P_ab = P_ba = +1 (a and b share a region); all other entries still empty.

Page 39: PhD thesis defence slides

Indicator Matrix P

[Plot: as before; a, b and c share the same region]

P so far: rows a, b, c each have +1 for the other two points (pairs ab, ac, bc).

Page 40: PhD thesis defence slides

Indicator Matrix P

[Plot: as before; e, f, g and h also share a region]

P so far: rows e, f, g, h each have +1 for the other three points; d shares a region with no other point.

Page 41: PhD thesis defence slides

Indicator Matrix P

[Plot: as before]

P   a    b    c    d    e    f    g    h
a   0    +1   +1   0    0    0    0    0
b   +1   0    +1   0    0    0    0    0
c   +1   +1   0    0    0    0    0    0
d   0    0    0    0    0    0    0    0
e   0    0    0    0    0    +1   +1   +1
f   0    0    0    0    +1   0    +1   +1
g   0    0    0    0    +1   +1   0    +1
h   0    0    0    0    +1   +1   +1   0

Page 42: PhD thesis defence slides

Supervised Part: F1-Measure Maximisation

- Integrate the supervisory signal by maximising the F1-measure:

  F1 = 2TP / (2TP + FP + FN)

- True positives (TPs): true NNs in the same region

- False positives (FPs): non-NNs in the same region

- False negatives (FNs): true NNs in different regions
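A minimal sketch of these counts (illustrative names; the toy S and region assignment below match the worked example on the following slides):

```python
import numpy as np

def f1_measure(S, regions):
    """Pairs sharing a region are 'positive'; F1 = 2TP / (2TP + FP + FN)."""
    same = regions[:, None] == regions[None, :]
    np.fill_diagonal(same, False)
    nn = S.astype(bool)
    tp = (nn & same).sum() / 2        # true NNs in the same region
    fp = (~nn & same).sum() / 2       # non-NNs in the same region
    fn = (nn & ~same).sum() / 2       # true NNs in different regions
    return 2 * tp / (2 * tp + fp + fn)

# True-NN pairs (a,b), (c,f), (d,h), (e,g); regions a,b,c | d | e,f,g,h.
S = np.zeros((8, 8), dtype=int)
for i, j in [(0, 1), (2, 5), (3, 7), (4, 6)]:
    S[i, j] = S[j, i] = 1
regions = np.array([0, 0, 0, 3, 2, 2, 2, 2])
print(f1_measure(S, regions))         # TP=2, FP=7, FN=2 -> 4/13 ~ 0.308
```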

Page 43: PhD thesis defence slides

Supervised Part: F1-Measure Maximisation

- True Positives (TPs):

  TP = (1/2) ∑_ij P_ij S_ij = (1/2) ‖P ∘ S‖₁

P ∘ S   a    b    c    d    e    f    g    h
a       0    +1   0    0    0    0    0    0
b       +1   0    0    0    0    0    0    0
c       0    0    0    0    0    0    0    0
d       0    0    0    0    0    0    0    0
e       0    0    0    0    0    0    +1   0
f       0    0    0    0    0    0    0    0
g       0    0    0    0    +1   0    0    0
h       0    0    0    0    0    0    0    0

Page 44: PhD thesis defence slides

Supervised Part: F1-Measure Maximisation

- True Positives (TPs):

  TP = (1/2) ‖P ∘ S‖₁ = 4/2 = 2

[Plot: the two TP pairs, (a, b) and (e, g), each fall in the same region]

Page 45: PhD thesis defence slides

Supervised Part: F1-Measure Maximisation

- False Negatives (FNs):

  FN = (1/2) ∑_ij S_ij − TP = (1/2) ‖S‖₁ − TP

S   a    b    c    d    e    f    g    h
a   0    +1   0    0    0    0    0    0
b   +1   0    0    0    0    0    0    0
c   0    0    0    0    0    +1   0    0
d   0    0    0    0    0    0    0    +1
e   0    0    0    0    0    0    +1   0
f   0    0    +1   0    0    0    0    0
g   0    0    0    0    +1   0    0    0
h   0    0    0    +1   0    0    0    0

Page 46: PhD thesis defence slides

Supervised Part: F1-Measure Maximisation

- False Negatives (FNs):

  FN = (1/2) ‖S‖₁ − TP = 8/2 − 2 = 2

[Plot: the two FN pairs, (c, f) and (d, h), land in different regions]

Page 47: PhD thesis defence slides

Supervised Part: F1-Measure Maximisation

- False Positives (FPs):

  FP = (1/2) ∑_ij P_ij − TP = (1/2) ‖P‖₁ − TP

P   a    b    c    d    e    f    g    h
a   0    +1   +1   0    0    0    0    0
b   +1   0    +1   0    0    0    0    0
c   +1   +1   0    0    0    0    0    0
d   0    0    0    0    0    0    0    0
e   0    0    0    0    0    +1   +1   +1
f   0    0    0    0    +1   0    +1   +1
g   0    0    0    0    +1   +1   0    +1
h   0    0    0    0    +1   +1   +1   0

Page 48: PhD thesis defence slides

Supervised Part: F1-Measure Maximisation

- False Positives (FPs):

  FP = (1/2) ∑_ij P_ij − TP = 9 − 2 = 7

[Plot: the non-NN pairs that share a region: pairs among a, b, c and among e, f, g, h]

- Note: not implemented like this in practice, as P may be dense.

Page 49: PhD thesis defence slides

Semi-Supervised Objective Function

- F1-measure term:

  F1(t_k) = 2 ‖P ∘ S‖₁ / (‖S‖₁ + ‖P‖₁) = 4/13

- Unsupervised term:

  Ω(t_k) = (1/σ_k) ∑_{j=1}^{T+1} ∑_{i : y^k_i ∈ r_j} (y^k_i − μ_j)²

- Maximise the objective J_npq:

  J_npq(t_k) = α F1(t_k) + (1 − α)(1 − Ω(t_k))
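Putting the terms together, a sketch of the objective under assumed details: searchsorted region assignment, and Ω normalised by the total scatter so it lies in [0, 1] (the thesis normalises by σ_k, which may differ):

```python
import numpy as np

def npq_objective(S, y, thresholds, alpha=0.5):
    """J_npq(t_k) = alpha * F1(t_k) + (1 - alpha) * (1 - Omega(t_k))."""
    regions = np.searchsorted(np.sort(thresholds), y)
    same = regions[:, None] == regions[None, :]
    np.fill_diagonal(same, False)
    ps = (S.astype(bool) & same).sum()          # ||P o S||_1
    f1 = 2.0 * ps / (S.sum() + same.sum())      # 2||P o S||_1 / (||S||_1 + ||P||_1)
    within = sum(((y[regions == j] - y[regions == j].mean()) ** 2).sum()
                 for j in np.unique(regions))   # scatter around region means
    omega = within / ((y - y.mean()) ** 2).sum()
    return alpha * f1 + (1 - alpha) * (1 - omega)

# Toy data from the preceding slides: the F1 term evaluates to 4/13.
S = np.zeros((8, 8), dtype=int)
for i, j in [(0, 1), (2, 5), (3, 7), (4, 6)]:
    S[i, j] = S[j, i] = 1
y1 = np.array([0.1, 0.2, 0.3, 3.1, 2.0, 2.2, 2.4, 2.6])
print(npq_objective(S, y1, np.array([1.0, 1.8, 2.9])))
```

The thresholds t_k would then be chosen to maximise this objective, e.g. by a search over candidate threshold positions.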

Page 50: PhD thesis defence slides

Experimental Results

- Three research questions (more in the thesis):

  - R1: Do we outperform a single statically placed threshold?

  - R2: When is more than one threshold beneficial to effectiveness?

  - R3: Can our method generalise to multiple thresholds and many different projection functions?

Page 51: PhD thesis defence slides

Learning Curve (AUPRC vs. Amount of Supervision)

[Plot: validation AUPRC (0.00-0.50) vs. # of supervisory data-points (0-1000), for CIFAR and NUSWIDE]

Page 52: PhD thesis defence slides

Learning Curve (AUPRC vs. Amount of Supervision)

[Plot: as above, annotated: only a small amount of supervision is needed]

Page 53: PhD thesis defence slides

Benefit of Optimising a Single Threshold (R1)

[Bar chart: test AUPRC (0.00-0.50) for RND, SBQ and NPQ on CIFAR and NUS-WIDE]

Page 54: PhD thesis defence slides

Benefit of Optimising a Single Threshold (R1)

[Bar chart: as above, annotated with relative gains of +55% (CIFAR) and +35% (NUS-WIDE) for NPQ]

Page 55: PhD thesis defence slides

Benefit of Optimising a Single Threshold (R1)

[Bar chart: as above, annotated: better to learn thresholds]

Page 56: PhD thesis defence slides

Benefit of Multiple Thresholds (R2)

[Bar chart: test AUPRC (0.00-0.16) for LSH with T=1, T=3 and T=7 thresholds]

Page 57: PhD thesis defence slides

Benefit of Multiple Thresholds (R2)

[Bar chart: as above, annotated: −36% (T=3) and −38% (T=7) relative to T=1]

Page 58: PhD thesis defence slides

Benefit of Multiple Thresholds (R2)

[Bar chart: as above, annotated: 1 threshold is optimal for LSH]

Page 59: PhD thesis defence slides

Benefit of Multiple Thresholds (R2)

[Bar chart: test AUPRC (0.05-0.21) for PCA with T=1, T=3 and T=7 thresholds]

Page 60: PhD thesis defence slides

Benefit of Multiple Thresholds (R2)

[Bar chart: as above, annotated: +14% (T=3) and +19% (T=7) relative to T=1]

Page 61: PhD thesis defence slides

Benefit of Multiple Thresholds (R2)

[Bar chart: as above, annotated: 2+ thresholds are optimal for PCA]

Page 62: PhD thesis defence slides

Comparison to the State-of-the-Art (R3)

[Bar chart: test AUPRC (0.00-0.16) for DBQ, MHQ and NPQ, grouped by projection: LSH, SH, PCA, SKLSH]

Page 63: PhD thesis defence slides

Comparison to the State-of-the-Art (R3)

[Bar chart: as above, annotated with a +67% gain for NPQ]

Page 64: PhD thesis defence slides

Comparison to the State-of-the-Art (R3)

[Bar chart: test AUPRC (0.00-0.18) for DBQ, MHQ and NPQ across LSH, SH, PCA and SKLSH projections]


Page 67: PhD thesis defence slides

Comparison to the State-of-the-Art (R3)

[Bar chart: as above, annotated: NPQ generalises to other projection functions]

Page 68: PhD thesis defence slides

End of Part 1 of Talk

- Takeaway messages:

  - R1: Optimising 1 threshold is much better than placing the threshold at zero.

  - R2: ≥ 2 thresholds are beneficial for eigendecomposition-based projections.

  - R3: The benefits of threshold learning generalise to other projection functions.

Page 69: PhD thesis defence slides

Learning to Hash for Large Scale Image Retrieval

Overview

Evaluation Protocol

Learning Multiple Quantisation Thresholds

Learning the Hashing Hypersurfaces

Conclusions

Page 70: PhD thesis defence slides

Why Learn the Hashing Hypersurfaces?

[Plot: a hyperplane h1 with normal vector w1 splits the 2D space into bit 0 and bit 1 half-spaces]

Page 71: PhD thesis defence slides

Why Learn the Hashing Hypersurfaces?

[Plot: as above, with the hyperplane repositioned; the 0/1 bit assignment of the data changes accordingly]

Page 72: PhD thesis defence slides

Previous work

- Data-independent: Locality Sensitive Hashing (LSH) [13]

- Data-dependent (unsupervised): Anchor Graph Hashing (AGH) [11], Spectral Hashing (SH) [9]

- Data-dependent (supervised): Self-Taught Hashing (STH) [12], Supervised Hashing with Kernels (KSH) [5], ITQ + CCA [6], Binary Reconstructive Embedding (BRE) [10]

[5] Wei Liu, Jun Wang, Rongrong Ji, Yu-Gang Jiang, Shih-Fu Chang. Supervised Hashing with Kernels. In CVPR’12.
[6] Yunchao Gong, Svetlana Lazebnik. Iterative Quantization: A Procrustean Approach to Learning Binary Codes for Large-scale Image Retrieval. In CVPR’11.
[9] Yair Weiss, Antonio Torralba, Rob Fergus. Spectral Hashing. In NIPS’08.
[10] Brian Kulis and Trevor Darrell. Learning to Hash with Binary Reconstructive Embeddings. In NIPS’09.
[11] Wei Liu, Jun Wang, Sanjiv Kumar, Shih-Fu Chang. Hashing with Graphs. In ICML’11.
[12] Dell Zhang, Jun Wang, Deng Cai, Jinsong Lu. Self-Taught Hashing for Fast Similarity Search. In SIGIR’10.
[13] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In STOC’98.

Page 73: PhD thesis defence slides

Previous work

Method        Learnt   Supervised   Scalable   Effectiveness
LSH [13]                            ✓          Low
SH [9]        ✓                                Low
STH [12]      ✓        ✓                       Medium
BRE [10]      ✓        ✓                       Medium
ITQ+CCA [6]   ✓        ✓                       Medium
KSH [5]       ✓        ✓                       High
GRH [3]       ✓        ✓            ✓          High

[3] Sean Moran and Victor Lavrenko. Graph Regularised Hashing. InECIR’15.

Page 74: PhD thesis defence slides

Learning the Hashing Hypersurfaces

- Two-step iterative hashing model:

- Step A: Graph Regularisation

  L_m ← sgn(α S D⁻¹ L_{m−1} + (1 − α) L_0)

- Step B: Data-Space Partitioning

  for k = 1...K: min ‖w_k‖² + C ∑_{i=1}^N ξ_ik
  s.t. L_ik (w_kᵀ x_i + b_k) ≥ 1 − ξ_ik, for i = 1...N

- Repeat for a set number of iterations (M)

Page 75: PhD thesis defence slides

Learning the Hashing Hypersurfaces

- Step A: Graph Regularisation [14]

  L_m ← sgn(α S D⁻¹ L_{m−1} + (1 − α) L_0)

- S: affinity (adjacency) matrix

- D: diagonal degree matrix

- L: binary bits at the specified iteration

- α: interpolation parameter (0 ≤ α ≤ 1)

[14] Fernando Diaz. Regularizing query-based retrieval scores. In IR’07.
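A sketch of Step A in code, assuming dense numpy arrays, sgn ties resolving to +1, and every point having at least one neighbour (all assumptions, not prescribed by the thesis):

```python
import numpy as np

def graph_regularise(S, L0, alpha=0.9, iters=2):
    """Step A: L_m <- sgn(alpha * S D^-1 L_{m-1} + (1 - alpha) * L0)."""
    Dinv = np.diag(1.0 / S.sum(axis=0))          # D = diagonal degree matrix
    L = L0.astype(float)
    for _ in range(iters):
        blend = alpha * S @ Dinv @ L + (1 - alpha) * L0
        L = np.where(blend >= 0, 1.0, -1.0)      # sgn, with ties -> +1
    return L
```

Each iteration smooths the bits over the affinity graph while the (1 − α) term anchors them to the initial supervised codes L0.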


Page 80: PhD thesis defence slides

Learning the Hashing Hypersurfaces

[Diagram: a chain graph a - b - c; initial hashcodes a = (−1, −1, −1), b = (−1, 1, 1), c = (1, 1, 1)]

S    a   b   c
a    1   1   0
b    1   1   1
c    0   1   1

D⁻¹   a     b      c
a     0.5   0      0
b     0     0.33   0
c     0     0      0.5

L0   b1   b2   b3
a    −1   −1   −1
b    −1    1    1
c     1    1    1

Page 81: PhD thesis defence slides

Learning the Hashing Hypersurfaces

[Diagram: as above]

L1 = sgn ( [ −1      0      0
             −0.33   0.33   0.33
              0      1      1    ] )

Page 82: PhD thesis defence slides

Learning the Hashing Hypersurfaces

[Diagram: after one iteration, a's code has moved to match its neighbour b]

L1   b1   b2   b3
a    −1    1    1
b    −1    1    1
c     1    1    1
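Reproducing the toy numbers above in a few lines. Note that this arithmetic corresponds to row-normalising S, i.e. applying D⁻¹ on the left, with all weight on the graph term; sgn(0) → +1 is an assumption:

```python
import numpy as np

S = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)            # affinity with self-loops
Dinv = np.diag(1.0 / S.sum(axis=1))               # diag(1/2, 1/3, 1/2)
L0 = np.array([[-1, -1, -1],                      # a
               [-1,  1,  1],                      # b
               [ 1,  1,  1]], dtype=float)        # c

pre = Dinv @ S @ L0
print(pre.round(2))                # [[-1 0 0], [-0.33 0.33 0.33], [0 1 1]]
print(np.where(pre >= 0, 1, -1))   # a -> (-1, 1, 1): pulled towards b
```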

Page 83: PhD thesis defence slides

Learning the Hashing Hypersurfaces

- Step B: Data-Space Partitioning

  for k = 1...K: min ‖w_k‖² + C ∑_{i=1}^N ξ_ik
  s.t. L_ik (w_kᵀ x_i + t_k) ≥ 1 − ξ_ik, for i = 1...N

- w_k: normal vector of hyperplane k

- t_k: bias of hyperplane k

- x_i: data-point i

- L_ik: bit k of data-point i

- ξ_ik: slack variable for (i, k)

- K: # of bits

- N: # of data-points
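Step B amounts to K independent binary SVMs, one per bit, with the regularised bits as labels. A sketch using scikit-learn's LinearSVC as a stand-in solver (an assumption; the thesis does not prescribe a specific SVM implementation):

```python
import numpy as np
from sklearn.svm import LinearSVC

def fit_hypersurfaces(X, L, C=1.0):
    """Fit one max-margin hyperplane per bit; column k of L gives the labels."""
    return [LinearSVC(C=C).fit(X, L[:, k]) for k in range(L.shape[1])]

def hash_codes(svms, X):
    """Out-of-sample hashcodes: the sign pattern across the K hyperplanes."""
    return np.stack([svm.predict(X) for svm in svms], axis=1)

# Usage: X is an N x D feature matrix, L the N x K {-1,+1} bit matrix from
# Step A; alternating Steps A and B for M iterations gives the full model.
```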


Page 88: PhD thesis defence slides

Learning the Hashing Hypersurfaces

[Diagram: eight data-points a-h in the feature space]


Page 90: PhD thesis defence slides

Learning the Hashing Hypersurfaces

[Diagram: each point a-h is assigned an initial 3-bit hashcode, e.g. (−1, 1, 1), (1, 1, 1), (−1, −1, −1), (1, 1, −1), (1, −1, −1)]

Page 91: PhD thesis defence slides

Learning the Hashing Hypersurfaces

[Diagram: as above, after graph regularisation; the codes of neighbouring points become more alike]

Page 92: PhD thesis defence slides

Learning the Hashing Hypersurfaces

[Diagram: as above, annotated: one point's first bit and another point's second bit have flipped to agree with their neighbours]

Page 93: PhD thesis defence slides

Learning the Hashing Hypersurfaces

[Diagram: a hyperplane w1 · x − t1 = 0 is fit so that points whose first bit is −1 fall in the negative half-space and points whose first bit is +1 fall in the positive half-space]

Page 94: PhD thesis defence slides

Learning the Hashing Hypersurfaces

[Diagram: likewise, a hyperplane w2 · x − t2 = 0 separates the second bit into its positive (+1) and negative (−1) half-spaces]

Page 95: PhD thesis defence slides

Experimental Results

- Three research questions (more in the thesis):

  - R1: Does the graph regularisation step help us learn more effective hashcodes?

  - R2: Is it necessary to solve an eigenvalue problem in order to learn effective hashcodes?

  - R3: Do non-linear hypersurfaces provide a more effective bucketing of the space compared to linear hypersurfaces?

Page 96: PhD thesis defence slides

Learning Curve (mAP vs. Amount of Supervision)

[Plot: validation mAP (0.10-0.30) vs. # of supervisory data-points (0-10,000)]

Page 97: PhD thesis defence slides

Learning Curve (mAP vs. Amount of Supervision)

[Plot: as above, annotated: a small amount of supervision, 5% of the dataset size, suffices]

Page 98: PhD thesis defence slides

Effect of α and Iterations (CIFAR-10 @ 32 bits)

[Plot: validation mAP (0.12-0.28) vs. iterations (0-5), one curve per α ∈ {0.7, 0.8, 0.9}]

Page 99: PhD thesis defence slides

Effect of α and Iterations (CIFAR-10 @ 32 bits)

[Plot: as above, annotated: fast convergence; most weight on the supervisory signal]

Page 100: PhD thesis defence slides

GRH vs Literature (CIFAR-10 @ 32 bits)

[Bar chart: test mAP (0.10-0.30) for LSH, BRE, STH, KSH, GRH (Linear) and GRH (RBF)]


Page 102: PhD thesis defence slides

GRH vs Literature (CIFAR-10 @ 32 bits)

LSHBRE

STHKSH

GRH (Linear)GRH (RBF)

0.10

0.15

0.20

0.25

0.30

0.35

Tes

t m

AP

Page 103: PhD thesis defence slides

GRH vs Literature (CIFAR-10 @ 32 bits)

LSHBRE

STHKSH

GRH (Linear)GRH (RBF)

0.10

0.15

0.20

0.25

0.30

0.35

Test

mA

P

-8%

Linear GRH

Non-Linear KSH

Page 104: PhD thesis defence slides

GRH vs Literature (CIFAR-10 @ 32 bits)

LSHBRE

STHKSH

GRH (Linear)GRH (RBF)

0.10

0.15

0.20

0.25

0.30

0.35

Test

mA

P

-8%

Linear GRH

Non-Linear KSH

Non-Linear GRH

+14%

Page 105: PhD thesis defence slides

GRH vs Literature (CIFAR-10 @ 32 bits)

LSHBRE

STHKSH

GRH (Linear)GRH (RBF)

0.10

0.15

0.20

0.25

0.30

0.35

Test

mA

P Linear hypersurfaces do really well

Page 106: PhD thesis defence slides

GRH vs Literature (CIFAR-10 @ 32 bits)

LSHBRE

STHKSH

GRH (Linear)GRH (RBF)

0.10

0.15

0.20

0.25

0.30

0.35

Test

mA

P Best performance with non-linear hypersurfaces

Page 107: PhD thesis defence slides

GRH vs. Initialisation Strategy (CIFAR-10 @ 32 bits)

[Bar chart: test mAP (0.00-0.30) for GRH (Linear) and GRH (RBF), initialised with LSH vs. ITQ+CCA hashcodes]


Page 109: PhD thesis defence slides

GRH vs. Initialisation Strategy (CIFAR-10 @ 32 bits)

[Bar chart: as above, annotated: solving an eigenvalue problem is not really necessary]

Page 110: PhD thesis defence slides

GRH Timing (CIFAR-10 @ 32 bits)

Timings (s)*

Method      Train     Test    Total    % rel. change vs. KSH
GRH         8.008     0.033   8.041    −90%
KSH [5]     74.024    0.103   82.27    –
BRE [10]    227.841   0.370   231.4    +181%

* All times obtained on an idle Intel 2.7GHz, 16GB RAM machine and averaged over 10 runs.

[5] Wei Liu, Jun Wang, Rongrong Ji, Yu-Gang Jiang, Shih-Fu Chang. Supervised Hashing with Kernels. In CVPR’12.
[10] Brian Kulis and Trevor Darrell. Learning to Hash with Binary Reconstructive Embeddings. In NIPS’09.

Page 111: PhD thesis defence slides

Example of Top 10 Retrieved Images

[Figure: example queries with the top 10 images retrieved by GRH vs. LSH, shown as ranked lists]

Page 112: PhD thesis defence slides

End of Part 2 of Talk

- Takeaway messages:

  - R1: Regularising bits over a graph is effective (and efficient) for hashcode learning.

  - R2: An intermediate eigendecomposition step is not necessary.

  - R3: Non-linear hypersurfaces yield a more effective bucketing of the space than hyperplanes.

Page 113: PhD thesis defence slides

Learning to Hash for Large Scale Image Retrieval

Overview

Evaluation Protocol

Learning Multiple Quantisation Thresholds

Learning the Hashing Hypersurfaces

Conclusions

Page 114: PhD thesis defence slides

Thesis Statement (Recap)

“The retrieval effectiveness of hashing-based ANN search can be improved by learning the quantisation thresholds and hashing hyperplanes in a manner that is influenced by the distribution of the data”

Page 115: PhD thesis defence slides

Conclusions

- Learning the thresholds and hypersurfaces:

  - Semi-supervised clustering algorithm for learning quantisation thresholds.

  - EM-like algorithm for positioning hashing hypersurfaces based on a supervisory signal.

- Additional contributions in the thesis:

  1. Learning multi-modal hashing hypersurfaces.

  2. Learning variable quantisation thresholds.

  3. Combining threshold and hypersurface learning.

Page 116: PhD thesis defence slides

Future Work

- Extend the threshold and hypersurface learning to an online streaming-data scenario, e.g. event detection in Twitter.

- Leverage deep learning to optimise the feature representation and hash functions end-to-end, rather than using hand-crafted features (e.g. SIFT).

Page 117: PhD thesis defence slides

Thank you for your attention!

Page 118: PhD thesis defence slides

References

- [1] Sean Moran, Victor Lavrenko, Miles Osborne. Neighbourhood Preserving Quantisation for LSH. In SIGIR’13.

- [2] Sean Moran, Victor Lavrenko, Miles Osborne. Variable Bit Quantisation for LSH. In ACL’13.

- [3] Sean Moran and Victor Lavrenko. Graph Regularised Hashing. In ECIR’15.

- [4] Sean Moran and Victor Lavrenko. Regularised Cross Modal Hashing. In SIGIR’15.

Page 119: PhD thesis defence slides

References

- [5] Wei Liu, Jun Wang, Rongrong Ji, Yu-Gang Jiang, Shih-Fu Chang. Supervised Hashing with Kernels. In CVPR’12.

- [6] Yunchao Gong, Svetlana Lazebnik. Iterative Quantization: A Procrustean Approach to Learning Binary Codes for Large-scale Image Retrieval. In CVPR’11.

- [7] Weihao Kong, Wu-Jun Li. Double Bit Quantisation. In AAAI’12.

- [8] Weihao Kong, Wu-Jun Li. Manhattan Hashing for Large-Scale Image Retrieval. In SIGIR’12.

Page 120: PhD thesis defence slides

References

- [9] Yair Weiss, Antonio Torralba, Rob Fergus. Spectral Hashing. In NIPS’08.

- [10] Brian Kulis and Trevor Darrell. Learning to Hash with Binary Reconstructive Embeddings. In NIPS’09.

- [11] Wei Liu, Jun Wang, Sanjiv Kumar, Shih-Fu Chang. Hashing with Graphs. In ICML’11.

- [12] Dell Zhang, Jun Wang, Deng Cai, Jinsong Lu. Self-Taught Hashing for Fast Similarity Search. In SIGIR’10.

Page 121: PhD thesis defence slides

References

- [13] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality. In STOC’98.

- [14] Fernando Diaz. Regularizing query-based retrieval scores. In IR’07.

- [15] Tat-Seng Chua, Jinhui Tang, Richang Hong, Haojie Li, Zhiping Luo, and Yan-Tao Zheng. NUS-WIDE: A Real-World Web Image Database from National University of Singapore. In CIVR’09.