Devil in the Details: Analysing the Performance of ConvNet Features
TRANSCRIPT
Devil in the Details: Analysing the Performance of ConvNet Features
Ken Chatfield, University of Oxford
May 2015
The Devil is still in the Details: 2011 → 2014
Comparing Apples to Apples

• This work is about comparing the latest ConvNet-based feature representations on common ground
• We compare both different pre-trained network architectures and different learning heuristics

[Diagram: input dataset → feature extraction (CNN Arch 1, CNN Arch 2, …, IFV) → fixed learning → fixed evaluation protocol]
Performance Evolution over VOC2007

| Year | Method | Dim. | Aug. | mAP |
|------|--------|------|------|-----|
| 2008 | BOW | 32K | – | 54.48 |
| 2010 | IFV-BL | 327K | – | 61.69 |
| | IFV | 84K | – | 64.36 |
| | IFV | 84K | f s | 68.02 |
| 2013 | DeCAF | 4K | t t | 73.41 |
| 2014 | CNN-F | 4K | f s | 77.15 |
| | CNN-M 2K | 2K | f s | 80.13 |
| | CNN-S (TN) | 4K | f s | 82.42 |
| 2015 | VGG-D+E | 4K | S s | 89.70 |

(DeCAF onwards are CNN-based methods.)
Evaluation Setup

[Diagram: pre-trained net on 1,000 ImageNet classes → CNN feature extractor (4096-D feature vector out) → SVM classifier: train on training set, apply to test set → evaluate classifier output using mAP, accuracy, etc.]
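The evaluation step above scores each test image with a linear classifier on its CNN feature and reports mAP. A minimal numpy sketch of the average-precision computation (illustrative only; the toy labels and scores are stand-ins, not from the talk):

```python
import numpy as np

def average_precision(labels, scores):
    """AP: mean of the precision values at each positive, ranked by score."""
    order = np.argsort(-scores)              # sort test images by score, descending
    labels = labels[order]
    hits = np.cumsum(labels)                 # positives retrieved so far
    ranks = np.arange(1, len(labels) + 1)
    return (hits[labels == 1] / ranks[labels == 1]).mean()

# toy example: 4 test images scored by a hypothetical linear classifier w.phi(I)
labels = np.array([1, 0, 1, 0])
scores = np.array([0.9, 0.8, 0.3, 0.1])
ap = average_precision(labels, scores)       # positives at ranks 1 and 3
```

mAP is this value averaged over all classes (e.g. the 20 VOC categories).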
Outline

1. Different pre-trained networks
2. Data augmentation (for both CNN and IFV)
3. Dataset fine-tuning
Network Architectures

• CNN-F Network
• CNN-M Network
• CNN-S Network
• VGG Very Deep Network
Network Architectures: CNN-F Network
Similar to Krizhevsky et al. (ILSVRC-2012 winner)

input image → conv1 64x11x11 stride 4 → conv2 256x5x5 stride 1 → conv3 256x3x3 stride 1 → conv4 512x3x3 → conv5 512x3x3 → fc6 d.o. 4096-D → fc7 d.o. 4096-D
Network Architectures: CNN-M Network
Similar to Zeiler & Fergus (ILSVRC-2013 winner)

input image → conv1 96x7x7 stride 2 → conv2 256x5x5 stride 2 → conv3 512x3x3 stride 1 → conv4 512x3x3 → conv5 512x3x3 → fc6 d.o. 4096-D → fc7 d.o. 4096-D

Smaller receptive window size + stride in conv1
Network Architectures: CNN-S Network
Similar to OverFeat ‘accurate’ network (ICLR 2014)

input image → conv1 96x7x7 stride 2 → conv2 256x5x5 stride 1 → conv3 512x3x3 stride 1 → conv4 512x3x3 → conv5 512x3x3 → fc6 d.o. 4096-D → fc7 d.o. 4096-D

Smaller stride in conv2
Network Architectures: VGG Very Deep Network
Simonyan & Zisserman (ICLR 2015)

input image → conv1a 64x3x3 stride 1 → conv1b 64x3x3 stride 1 → conv1c 64x3x3 stride 1 → conv2a 128x3x3 stride 1 → conv2b 128x3x3 stride 1 → conv2c 128x3x3 stride 1 → … → fc6 d.o. 4096-D → fc7 d.o. 4096-D

Smaller receptive window size + stride, and deeper. A stack of three 3x3 conv layers has 3(3²C²) = 27C² weights, versus 7²C² = 49C² for a single 7x7 layer.
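The 27C² vs 49C² weight comparison can be checked numerically (a quick arithmetic sketch; C = 64 is an arbitrary example channel count, biases ignored):

```python
# Weight count of a conv layer with a k x k kernel mapping c_in -> c_out channels.
def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

C = 64
stack_3x3 = 3 * conv_params(3, C, C)   # three stacked 3x3 layers: 3 * 9 * C^2 = 27 C^2
single_7x7 = conv_params(7, C, C)      # one 7x7 layer: 49 C^2
```

Both cover the same 7x7 receptive field, but the stack uses roughly half the weights and has three non-linearities instead of one.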
Pre-trained Networks

| Network | mAP (VOC07) |
|---------|-------------|
| DeCAF | 73.41 |
| CNN-F | 77.38 |
| CNN-M | 79.89 |
| CNN-S | 79.74 |
| VGG-VD | 89.3 |
Outline

1. Different pre-trained networks
2. Data augmentation (for both CNN and IFV)
3. Dataset fine-tuning
Data Augmentation

Given a pre-trained ConvNet, augmentation is applied at test time:
a. Extract crops
b. Pool features (average, max)

[Diagram: image crops → CNN feature extractor (pre-trained network) → pooled feature]
Data Augmentation

a. No augmentation (= 1 image, 224x224)
b. Flip augmentation (= 2 images)
c. Crop+Flip augmentation (= 10 images: 224x224 crops + flips)
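The crop+flip scheme above can be sketched in numpy (an illustrative implementation of the standard 10-crop recipe, assuming four corner crops plus a centre crop; the talk does not specify the crop positions):

```python
import numpy as np

def ten_crops(img, size=224):
    """Crop+Flip augmentation: 4 corner crops + centre crop,
    plus horizontal flips of each = 10 images."""
    h, w = img.shape[:2]
    s = size
    corners = [(0, 0), (0, w - s), (h - s, 0), (h - s, w - s),
               ((h - s) // 2, (w - s) // 2)]          # top-left of each crop
    crops = [img[y:y + s, x:x + s] for y, x in corners]
    crops += [c[:, ::-1] for c in crops]              # horizontal flips
    return crops

img = np.zeros((256, 256, 3))                          # stand-in for a resized image
crops = ten_crops(img)
```

Each crop is passed through the CNN separately and the resulting feature vectors are pooled (average or max) into a single descriptor.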
Data Augmentation

mAP (VOC07):

| Augmentation | IFV | CNN-M |
|--------------|-----|-------|
| None | 64.36 | 76.97 |
| Flip | 64.35 | 76.99 |
| Crop+Flip (train pooling: none, test pooling: sum) | 66.68 | 79.44 |
| Crop+Flip (train pooling: sum, test pooling: sum) | 67.17 | 79.89 |
Scale Augmentation

Training scale range (shorter image side): [S_min, S_max] = [256, 512], with 224x224 crops + flips.
Test scales: Q = {S_min, 0.5(S_min + S_max), S_max}
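The test-scale set Q follows directly from the training range, e.g. (a trivial check of the formula above):

```python
# Test-time scales Q derived from the training scale range [S_min, S_max].
# The shorter image side is resized to each scale in Q, then 224x224
# crops (+ flips) are taken at every scale.
S_min, S_max = 256, 512
Q = [S_min, int(0.5 * (S_min + S_max)), S_max]
```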
Fully Convolutional Net
Sermanet et al. 2014 (OverFeat)

• Convert final fc layers to convolutional layers
• Output is then an activation map which can be pooled

8.8% ⇒ 7.5% top-5 val. error on ILSVRC-2014
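The fc-to-conv conversion rests on a simple equivalence: an fc layer over a KxKxC window is a conv layer with a KxKxC kernel, so sliding it over a larger input yields an activation map. A naive numpy sketch (illustrative; loop-based rather than an efficient convolution, and the sizes are toy values):

```python
import numpy as np

rng = np.random.default_rng(0)
K, C, D = 3, 2, 4                         # fc input window, channels, fc output dim
fc_w = rng.standard_normal((D, K * K * C))  # fc weights, reused as a conv kernel

def fc_as_conv(feat, w):
    """Apply the fc layer at every KxK position of feat (H, W, C)."""
    H, W, _ = feat.shape
    out = np.empty((H - K + 1, W - K + 1, D))
    for y in range(H - K + 1):
        for x in range(W - K + 1):
            window = feat[y:y + K, x:x + K].reshape(-1)
            out[y, x] = w @ window        # identical to the fc layer on this window
    return out

feat = rng.standard_normal((5, 5, C))      # input larger than the fc layer expects
amap = fc_as_conv(feat, fc_w)              # 3x3xD activation map
pooled = amap.mean(axis=(0, 1))            # pool the map back to one D-vector
```

On an input of exactly KxK the map is 1x1 and reduces to the original fc layer; on larger inputs the pooled map aggregates predictions over positions.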
Outline

1. Different pre-trained networks
2. Data augmentation (for both CNN and IFV)
3. Dataset fine-tuning
Fine Tuning

Pre-trained network: conv1 96x7x7 → conv2 256x5x5 → conv3 512x3x3 → conv4 512x3x3 → conv5 512x3x3 → fc6 d.o. 4096-D → fc7 d.o. 4096-D → ILSVRC softmax
Fine Tuning

Replace the ILSVRC softmax with a VOC07 SVM loss and continue training on VOC 2007 train images: conv1 96x7x7 → conv2 256x5x5 → conv3 512x3x3 → conv4 512x3x3 → conv5 512x3x3 → fc6 d.o. 4096-D → fc7 d.o. 4096-D → VOC07 SVM loss
Fine Tuning

mAP (VOC07):

| Setting | mAP |
|---------|-----|
| No TN | 79.7 |
| TN-CLS | 82.2 |
| TN-RNK | 82.4 |

• TN-CLS – classification loss: max{ 0, 1 − y wᵀφ(I) }
• TN-RNK – ranking loss: max{ 0, 1 − wᵀ( φ(I_POS) − φ(I_NEG) ) }
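Both fine-tuning objectives are hinge losses and are easy to state in numpy (a toy sketch of the two formulas; w and the φ feature vectors are made-up 2-D stand-ins for the 4096-D fc7 features):

```python
import numpy as np

def cls_loss(w, phi, y):
    """TN-CLS: classification hinge loss, label y in {-1, +1}."""
    return max(0.0, 1.0 - y * (w @ phi))

def rnk_loss(w, phi_pos, phi_neg):
    """TN-RNK: ranking hinge loss; a positive image should outscore
    a negative one by a margin of 1."""
    return max(0.0, 1.0 - w @ (phi_pos - phi_neg))

w = np.array([1.0, -1.0])
phi_pos = np.array([2.0, 0.0])   # scores w @ phi_pos =  2.0
phi_neg = np.array([0.0, 1.0])   # scores w @ phi_neg = -1.0
loss_c = cls_loss(w, phi_pos, +1)        # margin satisfied -> 0
loss_r = rnk_loss(w, phi_pos, phi_neg)   # ranking margin satisfied -> 0
```

The ranking loss only constrains the relative order of positive and negative images, which matches the retrieval-style mAP metric more closely than per-image classification.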
Comparison with State of the Art

| Method | ILSVRC-2012 (top-5 err.) | VOC2007 (mAP) | VOC2012 (mAP) |
|--------|--------------------------|----------------|----------------|
| CNN-M 2048 | 13.5 | 80.1 | 82.4 |
| CNN-S | 13.1 | 79.7 | 82.9 |
| CNN-S TUNE-RNK | 13.1 | 82.4 | 83.2 |
| Zeiler & Fergus | 16.1 | – | 79.0 |
| Oquab et al. | 18.0 | 77.7 | 78.7 (82.8*) |
| Wei et al. | – | 81.5 (85.2*) | 81.7 (90.3*) |
| Clarifai (1 net) | 12.5 | – | – |
| GoogLeNet (1 net) | 7.9 | – | – |
| VGG Very Deep (1 net) | 7.0 | 89.3 | 89.0 |
Take-home Messages

If you get the details right, a relatively simple ConvNet-based pipeline can outperform much more complex architectures.

• Data augmentation helps a lot, both for deep and shallow features
• Fine tuning makes a difference, and should use a ranking loss where appropriate
• Smaller filters and deeper networks help, although feature computation is slower
There’s more…

• Presented here was just a subset of the full results from the paper
• Check out the paper for full results on: VOC 2007, VOC 2012, Caltech-101, Caltech-256, ILSVRC-2012
Source Code

• Caffe-compatible CNN models can be downloaded from the Caffe Model Zoo: https://github.com/BVLC/caffe/wiki/Model-Zoo
• Matlab feature computation code is also available from the project website: http://www.robots.ox.ac.uk/~vgg/software/deep_eval
Related Publications
“Return of the Devil in the Details: Delving Deep into Convolutional Nets” BMVC 2014 Ken Chatfield, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman (Best Paper Prize)
“The devil is in the details: an evaluation of recent feature encoding methods” BMVC 2011 Ken Chatfield, Karen Simonyan, Andrea Vedaldi, Victor Lempitsky, Andrew Zisserman (Best Poster Prize Honourable Mention, 300+ citations)
http://www.robots.ox.ac.uk/~ken