HARDNET: CONVOLUTIONAL NETWORK FOR LOCAL IMAGE DESCRIPTION
Anastasiia Mishchuk, Dmytro Mishkin, Filip Radenovic, Jiri Matas
OUTLINE
• Short review of methods for learning of local descriptors
• The L2Net
• HardNet loss and architecture
• Benchmarks
TRAINING DATA
Discriminant Learning of Local Image Descriptors, Brown et al, PAMI 2010
3 sets, 400k patches each:
• Liberty (shown)
• Notredame
• Yosemite
Size: 64x64, grayscale. Obtained from SfM model, 3D point → DoG keypoints
Used by all learned descriptors mentioned in this presentation
CONVEXOPT (SIMONYAN ET AL, 2012)
Global margin loss
Simonyan et al, ECCV 2012
Convex optimization problem
MATCHNET
Han et al, CVPR 2015.
Works well, but relies on a metric network. Approximate kNN methods, e.g. FLANN, cannot be applied directly
[Figure: MatchNet architecture. Layer labels in the diagram: 7x7 Conv pad 1 + ReLU; 5x5 Conv pad 2 + ReLU; 3x3 Conv pad 1 + ReLU (three times); 2x2 MP/2 (twice); 3x3 MP/2; 8x8 Conv + ReLU; 1x1 Conv + ReLU (twice); 1x1 Conv + Softmax with 2 outputs.]
DEEPCOMPARE
Zagoruyko and Komodakis, CVPR 2015
Works well, but relies on a metric network. Approximate kNN methods, e.g. FLANN, cannot be applied directly
[Figure: DeepCompare architecture. Layer labels in the diagram: 7x7 Conv pad 3 + ReLU; 5x5 Conv pad 2 + ReLU; 3x3 Conv pad 1 + ReLU; 2x2 MP/2 (three times); 8x8 Conv + ReLU; 1x1 Conv + ReLU; 1x1 Conv + Sigmoid with 1 output.]
Simo-Serra et al, ICCV 2015. Balntas et al, BMVC 2016
DeepDesc (Simo-Serra et al, 2015)
• Relatively shallow and fast CNN
• Hard negative mining
• Contrastive loss on L2 distance
[Figure: DeepDesc architecture. Layer labels in the diagram: 7x7 Conv + TanH; 2x2 L2Pool/2; 6x6 Conv + TanH; 3x3 L2Pool/3; 5x5 Conv + TanH; 4x4 L2Pool/4; 128-D output.]
TFeat (Balntas et al, 2016)
• Even shallower and faster CNN
• Hard negative mining by anchor swap in triplet
• Triplet margin loss on L2 distance
[Figure: TFeat architecture. Layer labels in the diagram: 32x32 input; 7x7 Conv + TanH; 2x2 MP/2; 6x6 Conv + TanH; 8x8 Conv + TanH; 128-D output.]
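The triplet margin loss with anchor swap named on this slide can be illustrated with a few lines of plain Python. This is a minimal sketch, not the TFeat authors' code. The margin value of 1.0, the function names and the toy 2-D descriptors are assumptions made for illustration only.

```python
import math


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length descriptors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0, anchor_swap=True):
    """Triplet margin loss on L2 distance, optionally with anchor swap.

    With anchor swap, the negative is compared against both the anchor and
    the positive, and the smaller (harder) of the two distances is used.
    The margin of 1.0 is an assumed default, not taken from the slides.
    """
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    if anchor_swap:
        d_neg = min(d_neg, l2_distance(positive, negative))
    return max(0.0, margin + d_pos - d_neg)


# Toy 2-D "descriptors" (hypothetical values, for illustration only).
a, p, n = [0.0, 0.0], [0.6, 0.0], [1.0, 0.0]
loss_plain = triplet_margin_loss(a, p, n, anchor_swap=False)
loss_swap = triplet_margin_loss(a, p, n, anchor_swap=True)
print(loss_plain, loss_swap)
```

In this toy case the negative is closer to the positive than to the anchor, so the swapped variant sees a harder negative and returns a larger loss.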
DESCRIPTOR COMPARISON
Descr.     #layers w/params  Loss            Hard mining  Kd-tree ready
ConvexOpt  1                 Global margin   -            +
DeepDesc   3                 Contrastive     +            +
TFeat      3                 Triplet margin  +/-          +
MatchNet   8                 Cross entropy   -            -
DeepComp   5                 Hinge           -            -
Balntas et al, BMVC 2016
L2NET. TIAN ET AL (CVPR 2017)
Layer              Norm/activation  Channels  Output size
Input patch                         1         32x32
3x3 Conv pad 1     BN + ReLU        32        32x32
3x3 Conv pad 1     BN + ReLU        32        32x32
3x3 Conv pad 1 /2  BN + ReLU        64        16x16
3x3 Conv pad 1     BN + ReLU        64        16x16
3x3 Conv pad 1 /2  BN + ReLU        128       8x8
3x3 Conv pad 1     BN + ReLU        128       8x8
8x8 Conv           BN + L2Norm      128       1x1
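The spatial sizes on this slide (32, 32, 16, 16, 8, 8, 1) follow from the standard convolution output-size formula. The sketch below recomputes them from the kernel, padding and stride labels in the diagram. It assumes a 32x32 input patch and no padding on the final 8x8 convolution. The helper names are hypothetical and the code only traces shapes, it does not build the network.

```python
def conv_out_size(size, kernel, pad=0, stride=1):
    """Spatial output size of a convolution on a square input."""
    return (size + 2 * pad - kernel) // stride + 1


# (kernel, pad, stride, out_channels) for each layer, read off the slide.
# The zero padding of the last 8x8 convolution is an assumption.
LAYERS = [
    (3, 1, 1, 32),
    (3, 1, 1, 32),
    (3, 1, 2, 64),
    (3, 1, 1, 64),
    (3, 1, 2, 128),
    (3, 1, 1, 128),
    (8, 0, 1, 128),
]


def trace_shapes(input_size=32, layers=LAYERS):
    """Return [(channels, size), ...] after every layer."""
    shapes = []
    size = input_size
    for kernel, pad, stride, channels in layers:
        size = conv_out_size(size, kernel, pad, stride)
        shapes.append((channels, size))
    return shapes


shapes = trace_shapes()
for channels, size in shapes:
    print(f"{channels:>3} x {size} x {size}")
```

The last entry is 128 x 1 x 1, i.e. a 128-dimensional descriptor per patch, which is then L2-normalized.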
L2NET: LOSS TERMS
Softmax over row/column of distance matrix
Softmax over row/column of distance matrix of intermediate features
Penalty on descriptor components correlation
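The first loss term, a softmax over the rows and columns of the distance matrix, can be sketched as follows. This is a simplified illustration and not the exact L2Net formulation. It assumes a batch of matching pairs (anchor i matches positive i), applies the softmax to negated L2 distances, and averages the row and column terms. All function names are hypothetical.

```python
import math


def distance_matrix(anchors, positives):
    """D[i][j] = L2 distance between anchor i and positive j."""
    return [[math.dist(a, p) for p in positives] for a in anchors]


def log_softmax_diag(rows):
    """For each row i, the log of the softmax probability of entry i."""
    out = []
    for i, row in enumerate(rows):
        m = max(row)  # subtract the maximum for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        out.append(row[i] - log_z)
    return out


def softmax_matching_loss(anchors, positives):
    """Negative log-likelihood of the true match, over rows and columns."""
    d = distance_matrix(anchors, positives)
    n = len(d)
    neg_rows = [[-x for x in row] for row in d]
    neg_cols = [[-d[i][j] for i in range(n)] for j in range(n)]
    row_term = sum(log_softmax_diag(neg_rows))
    col_term = sum(log_softmax_diag(neg_cols))
    return -0.5 * (row_term + col_term) / n


# Toy descriptors (hypothetical): correct pairing vs. swapped pairing.
anchors = [[1.0, 0.0], [0.0, 1.0]]
loss_good = softmax_matching_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
loss_bad = softmax_matching_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(loss_good, loss_bad)
```

The loss is small when each anchor is closest to its own positive and grows when another patch in the batch is closer.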
HARDNET
Triplet margin loss for hard negative
Not used, unlike L2Net:
• Penalty on descriptor channels correlation
• Softmax over row/column of distance matrix of intermediate features
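A triplet margin loss on the hardest negative in a batch can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation (see the repository linked on the last slide for that). It assumes a batch of matching pairs, a margin of 1.0, and that the hardest negative for pair i is the closest non-matching descriptor found in either row i or column i of the anchor/positive distance matrix. The function name and toy values are hypothetical.

```python
import math


def hardest_in_batch_triplet_loss(anchors, positives, margin=1.0):
    """Mean triplet margin loss using the hardest negative within the batch.

    anchors[i] and positives[i] are descriptors of the same 3D point.
    Needs at least two pairs so that every pair has a negative.
    """
    n = len(anchors)
    if n < 2 or n != len(positives):
        raise ValueError("need two or more matching anchor/positive pairs")
    d = [[math.dist(a, p) for p in positives] for a in anchors]
    total = 0.0
    for i in range(n):
        d_pos = d[i][i]
        # Closest non-matching positive to anchor i (row i, off-diagonal).
        row_min = min(d[i][j] for j in range(n) if j != i)
        # Closest non-matching anchor to positive i (column i, off-diagonal).
        col_min = min(d[k][i] for k in range(n) if k != i)
        d_neg = min(row_min, col_min)
        total += max(0.0, margin + d_pos - d_neg)
    return total / n


# Toy 1-D descriptors (hypothetical values, not L2-normalized).
batch_loss = hardest_in_batch_triplet_loss([[0.0], [1.0]], [[0.2], [1.1]])
print(batch_loss)
```

Because negatives are mined from the other pairs in the same batch, a larger batch offers more candidates and therefore harder negatives.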
HARDNET (OURS)
Layer              Norm/activation  Channels  Output size
Input patch                         1         32x32
3x3 Conv pad 1     BN + ReLU        32        32x32
3x3 Conv pad 1     BN + ReLU        32        32x32
3x3 Conv pad 1 /2  BN + ReLU        64        16x16
3x3 Conv pad 1     BN + ReLU        64        16x16
3x3 Conv pad 1 /2  BN + ReLU        128       8x8
3x3 Conv pad 1     BN + ReLU        128       8x8
8x8 Conv           BN + L2Norm      128       1x1
BATCH SIZE INFLUENCE
DESCRIPTOR COMPARISON
Descr.     #layers w/params  Loss            Hard mining  Kd-tree ready
ConvexOpt  1                 Global margin   -            +
DeepDesc   3                 Contrastive     +            +
TFeat      3                 Triplet margin  +/-          +
MatchNet   8                 Cross entropy   -            -
DeepComp   5                 Hinge           -            -
L2Net      7                 SoftMax         +            +
HardNet    7                 Triplet margin  +            +
Loss comparison on patch triplets
LOSSES COMPARISON, DERIVATIVES
No gradient from negative example
Small gradients
LOSSES COMPARISON
                 Contrastive  Softmax (L2Net)  Triplet margin
FPR, Brown Yos   0.009        0.009            0.006
mAUC, W1BS       0.072        0.083            0.083
mAUC, HP-T       0.153        0.157            0.164
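The FPR row is a false positive rate on patch-pair verification. The slide does not state the operating point. The sketch below assumes it is measured at 95% recall, as is customary on the Brown dataset. The function name, inputs and toy values are hypothetical and serve only to make the metric concrete.

```python
import math


def fpr_at_recall(distances, labels, recall=0.95):
    """False positive rate at the distance threshold that reaches `recall`.

    distances: descriptor distance for each patch pair (smaller = more similar)
    labels:    1 for matching pairs, 0 for non-matching pairs
    """
    pos = sorted(d for d, y in zip(distances, labels) if y == 1)
    neg = [d for d, y in zip(distances, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one matching and one non-matching pair")
    # Smallest threshold that accepts at least `recall` of the matching pairs.
    k = max(1, math.ceil(recall * len(pos)))
    threshold = pos[k - 1]
    false_pos = sum(1 for d in neg if d <= threshold)
    return false_pos / len(neg)


# Toy pair distances (hypothetical): four matching, four non-matching pairs.
dists = [0.1, 0.2, 0.3, 0.4, 0.35, 0.5, 0.6, 0.7]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
fpr_default = fpr_at_recall(dists, labels)
fpr_relaxed = fpr_at_recall(dists, labels, recall=0.75)
print(fpr_default, fpr_relaxed)
```

Lower is better: a good descriptor keeps non-matching pairs above the threshold that is needed to accept almost all matching pairs.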
Results
RESULTS: BROWN DATASET
RESULTS: W1BS DATASET
Mishkin et al, BMVC 2015
Nuisance factor: Appearance Geometry Lighting Sensor
HPATCHES DATASET
• DoG, Hessian, Harris keypoints detected in the reference image
• ~1300 patches per image kept
• Reprojected to other images with 3 levels of “affine frame noise” added
• I: 57 image sixplets, photometric (illumination) changes
• V: 59 image sixplets, geometric (viewpoint) changes
Balntas et al, CVPR 2017
RESULTS: HPATCHES
RESULTS: MATCHING WITH VIEW SYNTH
Datasets are already saturated
On par with RootSIFT
Still challenging due to multiple nuisance factors
Zitnick and Ramnath, 2011, Mishkin et al 2015, Mikolajczyk et al. 2013, Hauagge and Snavely, 2012, Kelman et al, 2007, Fernando et al. 2014
RESULTS: BOW OXFORD5K & PARIS 6K
Philbin et al 2007, Philbin et al 2008
RESULTS: HQE OXFORD5K & PARIS 6K
Thank you for your attention
PDF: https://arxiv.org/abs/1705.10872
Source and models: https://github.com/DagnyT/hardnet