deep epitomic nets and scale /position search for image classification

21
1 TIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search Deep Epitomic Nets and Scale/Position Search for Image Classification TTIC_ECP team George Papandreou ota Technological Institute at Chicago Iasonas Kokkinos Ecole Centrale Paris/INRIA

Upload: anne-sharpe

Post on 02-Jan-2016

62 views

Category:

Documents


1 download

DESCRIPTION

Deep Epitomic Nets and Scale /Position Search for Image Classification. TTIC_ECP team. George Papandreou Toyota Technological Institute at Chicago. Iasonas Kokkinos Ecole Centrale Paris/INRIA. TTIC_ECP entry in a nutshell. Fusion (1)+(2). (0) Baseline: max-pooled net. - PowerPoint PPT Presentation

TRANSCRIPT

1TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

Deep Epitomic Nets and Scale/Position Search for Image Classification

TTIC_ECP team

George Papandreou

Toyota Technological Institute

at Chicago

Iasonas KokkinosEcole Centrale Paris/INRIA

2TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

TTIC_ECP entry in a nutshell

Goal: Invariance in Deep CNNs

Part 1: Deep epitomic nets: local translation (deformation)

Part 2: Global scaling and translation

Top-5 error. All DCNNs have 6 convolutional and 2 fully-connected layers.

(0) Baseline: max-pooled net

13.0%

(1) epitomic DCNN

11.9%

(2) epitomic DCNN+ search

10.56%

Fusion (1)+(2)

10.22%

~1% gain ~1.5% gain

3TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

LeCun et al.: Gradient-Based Learning Applied to Document Recognition, Proc. IEEE 1998

Krizhevsky et al.: ImageNet Classification with Deep CNNs, NIPS 2012

Cascade of convolution + max-pooling blocks

(deformation-invariant template matching)

Deep Convolutional Neural Networks (DCNNs)

Our work: different blocks (P1) & different architecture (P2)

convolutional fully connected

4TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

Part 1: Deep epitomic nets

5TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

Epitomes: translation-invariant patch modelsPatch Templates

Jojic, Frey, Kannan: Epitomic analysis of appearance and shape, ICCV 2003

EM-based training

Benoit, Mairal, Bach, Ponce: Sparse image representation with epitomes, CVPR 2011Grosse, Raina, Kwong, Ng: Shift-invariant sparse coding, UAI 2007

Epitomes: a lot more for just a bit moreSeparate modeling: more data & less power per parameter

6TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

Papandreou, Chen, Yuille: Modeling Image Patches with a Dictionary of Mini-Epitomes, CVPR14

Mini-epitomes for image classification

Dictionary of mini-epitomes Dictionary of patches (K-means)

Gains in (flat) BoW classification

7TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

From flat to deep: Epitomic convolution

Max-Pooling Epitomic Convolution

G. Papandreou: Deep Epitomic Convolutional Neural Networks, arXiv, June 2014.

Max over image positions Max over epitome positions

8TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

Deep Epitomic Convolutional Nets

Convolution + max-pooling

Epitomic convolution

Supervised dictionary learning by back-propagation

G. Papandreou: Deep Epitomic Convolutional Neural Networks, arXiv, June 2014.

9TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

Deep Epitomic Convolutional Nets

Parameter sharing: faster and more reliable model learning

(0) Baseline: max-pooled net

13.0%

(1) epitomic DCNN

11.9%~1% gain

Consistent improvements

10TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

Part 2: Global scaling and translation

11TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

Scale Invariance challenge

Scale-dependent (area)Cat

egor

y-de

pend

ent

(ear

det

ecto

r)

Dogs

12TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

Scale Invariance challenge

Scale-dependentCat

egor

y-de

pend

ent

(ear

det

ecto

r)

Dogs

Skyscrapers

13TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

Scale Invariance challenge

Scale-dependentCat

egor

y-de

pend

ent

(ear

det

ecto

r)

Dogs

Skyscrapers

Training set

14TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

Scale Invariance challenge

Scale-dependentCat

egor

y-de

pend

ent

(ear

det

ecto

r)

Dogs

Skyscrapers

Rule: Large skyscrapers have ears, large dogs don’t

15TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

Scale Invariant classification

Scale-dependentCat

egor

y-de

pend

ent

A. Howard. Some improvements on deep convolutional neural network based image classification, 2013.

T. Dietterich et al. Solving the multiple-instance problem with axis-parallel rectangles. Artificial Intelligence, 1997. K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition, 2014.

This work:

MIL: End-to-end training!

‘bag’ of featuresfeature

16TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

Step 1: Efficient multi-scale convolutional features

stitchpyramid GPU

unstitch

I(x,y)

I(x,y,s)Patchwork(x,y) C(x,y)

C(x,y,s)

multi-scale convolutional

features

Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: Overfeat : ICLR 2014Dubout, C., Fleuret, F.: Exact acceleration of linear object detectors. ECCV 2012Iandola, F., Moskewicz, M., Karayev, S., Girshick, R., Darrell, T., Keutzer, K.: Densenet. arXiv 2014

220x220x3 5x5x512

17TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

Step 2: From fully connected to fully convolutional

convolutional

pyramid GPU I(x,y)

Patchwork(x,y) F(x,y)

stich

I(x,y,s)

220x220x3 1x1x4096

convolutional fully connected

18TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

Step 3: Global max-pooling

pyramid GPU I(x,y)

Patchwork(x,y)

stich

I(x,y,s)

For free: argmax yields 48% localization error

(0) Baseline: max-pooled net

13.0%

(1) epitomic DCNN

11.9%

~1% gain

(2) epitomic DCNN+ search

10.56%

~1.5% gain

learned class-specific bias

Fusion (1)+(2)

10.22%

Consistent, explicit position and scale search during training and testing

19TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

DCNN: 6 Convolutional + 2 Fully Connected layers

(0) Baseline: max-pooled net

13.0%

(1) Epitomic DCNN

11.9%

(2) search

10.56%

Fusion (1)+(2)

10.22%

~1% gain ~1.5% gain

The Deeper the Better: stay tuned!

Deep Epitomic Nets and Scale/Position Search for Image Classification

Goal: Invariance in Deep CNNs

?

20TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

Epitomic implementation details

Architecture of our deep epitomic net (11.94%)

Training took 3 weeks on a singe Titan (60 epochs)

Standard choices for learning rate, momentum, etc.

21TTIC_ECP: Deep Epitomic CNNs and Explicit Scale/Position Search

Pyramidal search implementation details

Image warp to square image. Position in mosaic is fixed

Scales: 400, 300, 220, 160, 120, 90 pixels Mosaic: 720 pixels