hardnet: convolutional network for local image description

HARDNET: CONVOLUTIONAL NETWORK FOR LOCAL IMAGE DESCRIPTION

Anastasiia Mishchuk, Dmytro Mishkin, Filip Radenovic, Jiri Matas

OUTLINE

• Short review of methods for learning local descriptors
• L2Net
• HardNet loss and architecture
• Benchmarks

TRAINING DATA

Discriminant Learning of Local Image Descriptors, Brown et al, PAMI 2010

3 sets, 400k patches each:
• Liberty (shown)
• Notredame
• Yosemite

Size: 64x64, grayscale. Obtained from SfM model, 3D point → DoG keypoints

Used in all learned descriptors mentioned in this presentation

CONVEXOPT (SIMONYAN ET AL, 2012)

Global margin loss

Simonyan et al, ECCV 2012

Convex optimization problem


MATCHNET

Han et al, CVPR2015.

Works well, but relies on a metric network. Approximate kNN methods, e.g. FLANN, cannot be applied directly

[Figure: MatchNet architecture. 64x64x1 input → 7x7 Conv, pad 1, ReLU (24 channels) → 2x2 MP/2 (32x32) → 5x5 Conv, pad 2, ReLU (64) → 2x2 MP/2 (16x16) → 3x3 Conv, pad 1, ReLU (96) → 3x3 Conv, pad 1, ReLU (96) → 3x3 Conv, pad 1, ReLU (64) → 3x3 MP/2 (8x8) → 8x8 Conv, ReLU (128) → 1x1 Conv, ReLU (256) → 1x1 Conv, ReLU (256) → 1x1 Conv, Softmax (2)]

DEEPCOMPARE

Zagoruyko and Komodakis, CVPR 2015

Works well, but relies on a metric network. Approximate kNN methods, e.g. FLANN, cannot be applied directly

[Figure: DeepCompare architecture. 64x64x1 input → 7x7 Conv, pad 3, ReLU (96 channels) → 2x2 MP/2 (32x32) → 5x5 Conv, pad 2, ReLU (192) → 2x2 MP/2 (16x16) → 3x3 Conv, pad 1, ReLU (256) → 2x2 MP/2 (8x8) → 8x8 Conv, ReLU (256) → 1x1 Conv, ReLU (256) → 1x1 Conv, Sigmoid (1)]
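The kNN remark can be illustrated with a minimal sketch. When descriptors are compared by plain L2 distance, matching reduces to nearest-neighbour search, which a kd-tree or FLANN index can accelerate; a learned metric network instead has to score every candidate pair. The brute-force loop, the function names and the toy 2-D descriptors below are illustrative assumptions, not code from any of the cited works.

```python
import math

def l2(u, v):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_neighbour_matches(query, database):
    """Index of the closest database descriptor for each query descriptor.

    Brute force stands in for a kd-tree or FLANN index; such an index can
    replace this loop only because the comparison is a plain L2 distance.
    """
    matches = []
    for q in query:
        distances = [l2(q, d) for d in database]
        matches.append(min(range(len(database)), key=distances.__getitem__))
    return matches

# Toy 2-D descriptors (real descriptors here are 128-D).
query = [[1.0, 0.0], [0.0, 1.0]]
database = [[0.1, 0.9], [0.9, 0.1], [0.7, 0.7]]
matches = nearest_neighbour_matches(query, database)
```

With these toy values the first query is closest to database entry 1 and the second to entry 0.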

Simo-Serra et al, ICCV 2015. Balntas et al, BMVC 2016

[Figure: TFeat architecture. 32x32x1 input → 7x7 Conv, TanH (26x26x32) → 2x2 MP/2 (13x13) → 6x6 Conv, TanH (8x8x64) → 8x8 Conv, TanH (1x1x128)]

TFeat (Balntas et al, 2016)
• Even shallower and faster CNN
• Hard-negative mining by anchor swap in triplet
• Triplet margin loss on L2 distance
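A minimal sketch of a triplet margin loss with anchor swap, as described in the bullets above: the negative distance is taken as the smaller of anchor-negative and positive-negative, so the harder of the two pairings drives the loss. The margin value of 1.0, the function names and the toy 2-D descriptors are assumptions for illustration.

```python
import math

def l2(u, v):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_margin_loss(anchor, positive, negative, margin=1.0, swap=True):
    """Triplet margin loss on L2 distance, optionally with anchor swap."""
    d_pos = l2(anchor, positive)
    d_neg = l2(anchor, negative)
    if swap:
        # Anchor swap: if the negative is closer to the positive than to
        # the anchor, use that smaller (harder) distance instead.
        d_neg = min(d_neg, l2(positive, negative))
    return max(0.0, margin + d_pos - d_neg)

anchor, positive, negative = [0.0, 0.0], [0.0, 1.0], [0.0, 2.0]
loss_plain = triplet_margin_loss(anchor, positive, negative, swap=False)
loss_swap = triplet_margin_loss(anchor, positive, negative, swap=True)
```

In this toy triplet the plain loss is zero, because the negative is far from the anchor, while the swapped loss is 1.0, because the negative sits close to the positive.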

[Figure: DeepDesc architecture. 64x64x1 input → 7x7 Conv, TanH (32 channels) → 2x2 L2Pool/2 → 6x6 Conv, TanH (64) → 3x3 L2Pool/3 → 5x5 Conv, TanH (128) → 4x4 L2Pool/4 (1x1x128)]

DeepDesc (Simo-Serra et al, 2015)
• Relatively shallow and fast CNN
• Hard negative mining
• Contrastive loss on L2 distance
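A contrastive loss on L2 distance can be sketched as follows: matching pairs are penalised by their distance, non-matching pairs only while they are closer than a margin. The margin value, the non-squared form and the names are assumptions for illustration; this is a sketch of the loss family, not the DeepDesc training code.

```python
import math

def l2(u, v):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(desc_a, desc_b, is_match, margin=4.0):
    """Contrastive loss for one pair of descriptors.

    Matching pair: pull together (loss equals the distance).
    Non-matching pair: push apart until the distance exceeds the margin.
    """
    d = l2(desc_a, desc_b)
    return d if is_match else max(0.0, margin - d)

loss_match = contrastive_loss([0.0, 0.0], [3.0, 4.0], is_match=True)
loss_near_negative = contrastive_loss([0.0, 0.0], [0.0, 1.0], is_match=False)
loss_far_negative = contrastive_loss([0.0, 0.0], [3.0, 4.0], is_match=False)
```

The far negative contributes zero loss, and therefore no gradient, once it lies beyond the margin.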


DESCRIPTOR COMPARISON

Descr.      #layers w/params   Loss             Hard mining   Kd-tree ready
ConvexOpt   1                  Global margin    -             +
DeepDesc    3                  Contrastive      +             +
TFeat       3                  Triplet margin   +/-           +
MatchNet    8                  Cross entropy    -             -
DeepComp    5                  Hinge            -             -

Balntas et al, BMVC 2016

L2NET. TIAN ET AL (CVPR 2017)

[Figure: L2Net architecture. 32x32x1 input → 3x3 Conv, pad 1, BN + ReLU (32x32x32) → 3x3 Conv, pad 1, BN + ReLU (32x32x32) → 3x3 Conv, pad 1, stride 2, BN + ReLU (16x16x64) → 3x3 Conv, pad 1, BN + ReLU (16x16x64) → 3x3 Conv, pad 1, stride 2, BN + ReLU (8x8x128) → 3x3 Conv, pad 1, BN + ReLU (8x8x128) → 8x8 Conv, BN + L2Norm (1x1x128)]
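The spatial sizes in the L2Net diagram can be checked with the standard convolution output-size formula. The sketch below assumes a 32x32 input and the kernel, padding and stride values as read from the figure; the helper name and the layer list are illustrative.

```python
def conv_out(size, kernel, pad=0, stride=1):
    """Output spatial size of a convolution along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1

# (kernel, pad, stride, output channels) for the seven convolutional layers.
layers = [
    (3, 1, 1, 32),
    (3, 1, 1, 32),
    (3, 1, 2, 64),
    (3, 1, 1, 64),
    (3, 1, 2, 128),
    (3, 1, 1, 128),
    (8, 0, 1, 128),  # final 8x8 convolution collapses the map to 1x1
]

size = 32  # assumed input patch size
shapes = []
for kernel, pad, stride, channels in layers:
    size = conv_out(size, kernel, pad, stride)
    shapes.append((size, channels))
```

The sizes run 32, 32, 16, 16, 8, 8, 1, so the network ends in a single 128-dimensional descriptor.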

L2NET: LOSS TERMS

Softmax over row/column of distance matrix

Penalty on descriptor components correlation

Softmax over row/column of distance matrix of intermediate features
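The first two loss terms can be sketched in a few lines. The sketch assumes a batch of n matching pairs, so the diagonal of the n x n distance matrix holds the matching distances; it applies a softmax to negated distances along rows and columns and penalises squared off-diagonal correlations between descriptor dimensions. Function names, the exact normalisation and the toy data are assumptions, not the L2Net reference implementation.

```python
import math

def l2(u, v):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def softmax_matching_loss(anchors, positives):
    """Mean negative log-probability of the diagonal under row-wise and
    column-wise softmax of the negated distance matrix."""
    n = len(anchors)
    dist = [[l2(a, p) for p in positives] for a in anchors]
    total = 0.0
    for i in range(n):
        row = [math.exp(-dist[i][j]) for j in range(n)]
        col = [math.exp(-dist[k][i]) for k in range(n)]
        total -= math.log(row[i] / sum(row))
        total -= math.log(col[i] / sum(col))
    return total / (2 * n)

def correlation_penalty(descriptors):
    """Sum of squared off-diagonal entries of the correlation matrix
    between descriptor dimensions, estimated over the batch."""
    n, dim = len(descriptors), len(descriptors[0])
    means = [sum(d[c] for d in descriptors) / n for c in range(dim)]
    centered = [[d[c] - means[c] for c in range(dim)] for d in descriptors]
    norms = [math.sqrt(sum(r[c] ** 2 for r in centered)) for c in range(dim)]
    penalty = 0.0
    for r in range(dim):
        for s in range(dim):
            if r != s:
                cov = sum(row[r] * row[s] for row in centered)
                penalty += (cov / (norms[r] * norms[s] + 1e-12)) ** 2
    return penalty

batch = [[1.0, 0.0], [0.0, 1.0]]
matching_loss = softmax_matching_loss(batch, batch)
correlated = correlation_penalty(batch)
decorrelated = correlation_penalty([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
```

The third term on the slide applies the same softmax construction to intermediate feature maps and is omitted here.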

HARDNET

Triplet margin loss for hard negative

Not used, unlike L2Net: the penalty on descriptor channels correlation and the softmax over row/column of distance matrix of intermediate features
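A minimal sketch of the triplet margin loss with the hardest in-batch negative: for each matching pair, the negative distance is the smallest distance from either the anchor or the positive to any non-matching descriptor in the batch. The margin of 1.0, the function names and the toy 2-D descriptors (not unit-normalised, unlike real network outputs) are assumptions for illustration.

```python
import math

def l2(u, v):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def hardest_in_batch_triplet_loss(anchors, positives, margin=1.0):
    """anchors[i] matches positives[i]; every other pairing is a negative.
    Requires a batch of at least two pairs."""
    n = len(anchors)
    dist = [[l2(a, p) for p in positives] for a in anchors]
    total = 0.0
    for i in range(n):
        d_pos = dist[i][i]
        # Closest non-matching positive to anchor i (row of the matrix).
        hardest_for_anchor = min(dist[i][j] for j in range(n) if j != i)
        # Closest non-matching anchor to positive i (column of the matrix).
        hardest_for_positive = min(dist[k][i] for k in range(n) if k != i)
        d_neg = min(hardest_for_anchor, hardest_for_positive)
        total += max(0.0, margin + d_pos - d_neg)
    return total / n

anchors = [[0.0, 0.0], [1.0, 0.0]]
positives = [[0.0, 1.0], [1.0, 1.0]]
loss = hardest_in_batch_triplet_loss(anchors, positives)
```

Because the negatives come from the batch itself, a larger batch gives each pair more candidates from which the hardest negative is picked.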

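HARDNET (OURS)

[Figure: HardNet architecture. 32x32x1 input → 3x3 Conv, pad 1, BN + ReLU (32x32x32) → 3x3 Conv, pad 1, BN + ReLU (32x32x32) → 3x3 Conv, pad 1, stride 2, BN + ReLU (16x16x64) → 3x3 Conv, pad 1, BN + ReLU (16x16x64) → 3x3 Conv, pad 1, stride 2, BN + ReLU (8x8x128) → 3x3 Conv, pad 1, BN + ReLU (8x8x128) → 8x8 Conv, BN + L2Norm (1x1x128)]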

BATCH SIZE INFLUENCE


DESCRIPTOR COMPARISON

Descr.      #layers w/params   Loss             Hard mining   Kd-tree ready
ConvexOpt   1                  Global margin    -             +
DeepDesc    3                  Contrastive      +             +
TFeat       3                  Triplet margin   +/-           +
MatchNet    8                  Cross entropy    -             -
DeepComp    5                  Hinge            -             -
L2Net       7                  SoftMax          +             +
HardNet     7                  Triplet margin   +             +

Loss comparison on patch triplets


LOSSES COMPARISON, DERIVATIVES

No gradient from negative example

Small gradients

LOSSES COMPARISON

                 Contrastive   Softmax (L2Net)   Triplet margin
FPR, Brown Yos   0.009         0.009             0.006
mAUC, W1BS       0.072         0.083             0.083
mAUC, HP-T       0.153         0.157             0.164
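For the FPR row, a minimal sketch of a false positive rate at fixed recall follows. It assumes the figure reported is the false positive rate at 95% recall commonly used on the Brown patch-pair benchmark; the slide itself only says "FPR". The function name, the tie handling and the toy distances are illustrative.

```python
import math

def fpr_at_recall(distances, labels, recall=0.95):
    """False positive rate at the distance threshold that first reaches
    the requested recall.

    distances: descriptor distance for each patch pair.
    labels: 1 for a matching pair, 0 for a non-matching pair.
    Assumes at least one pair of each kind.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    needed = math.ceil(recall * n_pos)
    true_pos = false_pos = 0
    # Sweep the threshold upwards through the sorted distances.
    for _, label in sorted(zip(distances, labels)):
        if label:
            true_pos += 1
        else:
            false_pos += 1
        if true_pos >= needed:
            return false_pos / n_neg
    return 1.0

distances = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
labels = [1, 1, 0, 1, 0, 0]
fpr = fpr_at_recall(distances, labels)
```

In the toy data one of the three non-matching pairs falls below the threshold needed to recover all matching pairs, giving a rate of one third. Lower is better.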

Results


RESULTS: BROWN DATASET


RESULTS: W1BS DATASET

Mishkin et al, BMVC 2015

Nuisance factors: appearance, geometry, lighting, sensor

HPATCHES DATASET

DoG, Hessian and Harris keypoints detected in the reference image; ~1300 patches per image kept. Reprojected to other images with 3 levels of "affine frame noise" added

I: 57 image sixplets, photometric (illumination) changes
V: 59 image sixplets, geometric (viewpoint) changes

Balntas et al, CVPR 2017

RESULTS: HPATCHES

29

RESULTS: MATCHING WITH VIEW SYNTH

Datasets are already saturated

On par with RootSIFT

Still challenging due to multiple nuisance factors

Zitnick and Ramnath, 2011; Mishkin et al, 2015; Mikolajczyk et al, 2013; Hauagge and Snavely, 2012; Kelman et al, 2007; Fernando et al, 2014

RESULTS: BOW OXFORD5K & PARIS 6K

Philbin et al 2007, Philbin et al 2008


RESULTS: HQE OXFORD5K & PARIS 6K

Thank you for your attention

PDF: https://arxiv.org/abs/1705.10872
Source and models: https://github.com/DagnyT/hardnet