Convolutional Patch Representations for Image Retrieval: an Unsupervised Approach
29th Mar 2016
Original slides by Eva Mohedano, Insight Centre for Data Analytics (Dublin City University)
Mattis Paulin, Julien Mairal, Matthijs Douze, Zaid Harchaoui, Florent Perronnin, Cordelia Schmid
Overview
Published at ICCV 2015 (a.k.a. "Local Convolutional Features with Unsupervised Training for Image Retrieval")
• Deep convolutional architecture to produce patch-level descriptors
• Unsupervised framework
• Comparison in patch and retrieval datasets
• “RomePatches” dataset
Related Work
• Shallow patch descriptors
• Deep learning for image retrieval
• Deep patch descriptors
Related Work
• Shallow patch descriptors
SIFT – Scale-Invariant Feature Transform
- stereo matching
- retrieval
- classification
SURF, BRIEF, LIOP, (…)
Hand-crafted → relatively small number of parameters.
Note: a patch is an image region extracted from an image.
Related Work
• Deep learning for image retrieval
CNN learned on a sufficiently large labeled dataset (ImageNet) generates intermediate layers that
can be used as image descriptors.
These descriptors work for a wide variety of tasks, including image retrieval.
source image: http://pubs.sciepub.com/ajme/2/7/9/
Fully connected layers → global image descriptors
● Compact representation
● Lack of geometric invariance → below state-of-the-art in image retrieval
Fix: compute at different scales (Babenko, Razavian)
Convolutional layers → local image descriptors (one per spatial position)
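The contrast with the fully connected layers can be sketched as follows: each spatial position of a convolutional feature map retains the locality of its receptive field, so it can be read as one local descriptor. The layer shape below (256 channels, 13×13 positions) and the random values are illustrative stand-ins, not numbers from the slides.

```python
import numpy as np

# A convolutional feature map of shape (channels, height, width), standing
# in for the output of an intermediate CNN layer (values are random here).
C, H, W = 256, 13, 13
feature_map = np.random.rand(C, H, W).astype(np.float32)

# Unlike a fully connected layer, each spatial position (h, w) keeps its
# receptive-field locality, so it yields a C-dimensional *local* descriptor:
# H*W local descriptors per image instead of one global vector.
local_descriptors = feature_map.reshape(C, H * W).T  # shape (H*W, C)
print(local_descriptors.shape)  # (169, 256)
```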
Related Work
• Deep patch descriptors
Three different kinds of supervision:
1. Category labels of ImageNet. [Long et al, 2014]
2. Surrogate patch labels: Each class is a given patch under different transformations [Fischer et al, 2014]
3. Matching/non-matching pairs. [Simo-Serra et al, 2015]
These works focused on patch-level metrics, not image retrieval.
All approaches required some kind of supervision.
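Supervision type 3 (matching/non-matching pairs) can be sketched with a generic contrastive-style hinge loss on a pair of descriptors; this is an illustration of the idea, not the exact objective of Simo-Serra et al.

```python
import numpy as np

def pairwise_hinge_loss(d1, d2, match, margin=1.0):
    """Contrastive-style loss on a pair of patch descriptors.

    Matching pairs are pulled together; non-matching pairs are pushed
    beyond `margin`. A generic sketch of pair supervision, not the
    exact loss from the paper.
    """
    dist = np.linalg.norm(d1 - d2)
    if match:
        return dist ** 2                      # pull matching pair together
    return max(0.0, margin - dist) ** 2       # push non-matching pair apart

# Identical descriptors: zero loss if labelled matching, full margin if not.
print(pairwise_hinge_loss(np.zeros(4), np.zeros(4), match=True))   # 0.0
print(pairwise_hinge_loss(np.zeros(4), np.zeros(4), match=False))  # 1.0
```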
Image Retrieval Pipeline
• Interest point detection
Hessian-Affine detector.
Rotation invariance.
• Interest point description
Feature representation in a Euclidean space
• Patch Matching
VLAD encoding.
Power normalization with exponent 0.5 + L2-norm.
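The patch-matching step above can be sketched end to end: VLAD-encode the local descriptors against a k-means codebook, apply power normalization with exponent 0.5, then L2-normalize. The toy descriptor dimension and codebook size are illustrative choices.

```python
import numpy as np

def vlad_encode(descriptors, centroids, alpha=0.5):
    """VLAD encoding with power normalization (exponent 0.5) + L2-norm.

    descriptors: (n, d) local patch descriptors for one image.
    centroids:   (k, d) visual-word centroids from k-means.
    Returns a (k*d,) image-level vector.
    """
    k, d = centroids.shape
    # Assign each descriptor to its nearest centroid.
    dists = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    assign = np.argmin(dists, axis=1)
    # Accumulate residuals (descriptor - centroid) per visual word.
    vlad = np.zeros((k, d))
    for desc, a in zip(descriptors, assign):
        vlad[a] += desc - centroids[a]
    vlad = vlad.ravel()
    # Power normalization with exponent 0.5, then global L2 normalization.
    vlad = np.sign(vlad) * np.abs(vlad) ** alpha
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad

descs = np.random.rand(100, 8)   # toy local descriptors
cents = np.random.rand(16, 8)    # toy codebook of 16 visual words
v = vlad_encode(descs, cents)
print(v.shape)  # (128,)
```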
Convolutional Descriptors
Patch size = 51×51 (optimal for SIFT on the Oxford dataset).
CNNs are extended to retrieval by:
• Encoding local descriptors with a model trained on an unrelated classification task
• Devising a surrogate classification problem that is as related as possible to image retrieval
• Using unsupervised learning: Convolutional Kernel Networks
Convolutional Descriptors
• Using unsupervised learning: Convolutional Kernel Network
Feature representation based on a kernel (feature) map (data-independent).
Projection into a Hilbert space.
An explicit kernel map can be computed to approximate it for computational efficiency, using:
- a sub-sample of patches
- stochastic gradient optimization
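The approximation scheme above can be sketched as follows, assuming a Gaussian match kernel and a finite set of learned anchor points: minimize the squared kernel-approximation error by SGD over random pairs drawn from a sub-sample of patches. This toy version learns only the anchors (with fixed uniform weights) and is far simpler than the actual CKN training.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
d, m = 8, 64                      # toy patch dimension, number of anchor points

# A sub-sample of (L2-normalized) patches stands in for real image patches.
patches = rng.normal(size=(1000, d))
patches /= np.linalg.norm(patches, axis=1, keepdims=True)

w = patches[:m].copy()            # anchors, initialized from the sub-sample
eta = np.full(m, 1.0 / m)        # weights, kept fixed here for simplicity

def phi(x):
    """Explicit map: phi(x)_j = sqrt(eta_j) * exp(-||x - w_j||^2 / sigma^2)."""
    return np.sqrt(eta) * np.exp(-np.sum((x - w) ** 2, axis=1) / sigma ** 2)

def gaussian_kernel(x, y):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

# SGD on the squared error  (k(x, y) - <phi(x), phi(y)>)^2  over random pairs.
lr = 0.1
for _ in range(2000):
    x, y = patches[rng.integers(len(patches), size=2)]
    err = gaussian_kernel(x, y) - phi(x) @ phi(y)
    coef = -2.0 * err * phi(x) * phi(y) / sigma ** 2      # shape (m,)
    w -= lr * coef[:, None] * ((x - w) + (y - w))         # gradient step on anchors
```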
4 possible inputs, from left to right: CKN-raw, CKN-mean-subs (mean subtraction), CKN-white (mean subtraction + PCA whitening), CKN-grad (fully invariant to color).
Only CKN-raw, CKN-white and CKN-grad are evaluated.
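The CKN-mean-subs and CKN-white inputs can be sketched as a two-step preprocessing of a patch set: per-patch mean subtraction, then PCA/ZCA whitening fitted across the set. The exact whitening recipe used in the paper may differ; this is an illustrative sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
patches = rng.random((500, 27))        # e.g. flattened 3x3x3 sub-patches (toy data)

def preprocess(X, eps=1e-5):
    """Mean subtraction (CKN-mean-subs) followed by ZCA whitening (CKN-white)."""
    X = X - X.mean(axis=1, keepdims=True)      # per-patch mean subtraction
    Xc = X - X.mean(axis=0)                    # center across the patch set
    cov = Xc.T @ Xc / len(Xc)
    vals, vecs = np.linalg.eigh(cov)
    # ZCA whitening matrix; eps regularizes near-zero eigenvalues.
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return Xc @ W

white = preprocess(patches)
print(white.shape)  # (500, 27)
```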
Experiments
Datasets:
1. RomePatches-Image
2. Oxford
3. UKBench and Holidays
CKN trained on 1M sub-patches, 300K iterations, mini-batch size of 1000.
Conclusions
• CKNs offer similar and sometimes better performance than CNNs in the context of patch description.
• Good patch retrieval translates into good image retrieval.
• CKNs are orders of magnitude faster to train than CNNs (10 min vs. 2-3 days on a modern GPU).
• Fully unsupervised (no labels).
Resources
• RomePatches + code (although the code is not accessible!)
• Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks
  - Code with augmentations in MATLAB
  - Code for training models
  - Models already trained :-)
• Triplet network + code!
  - Grayscale local patches of 32×32, tested on matching datasets