The Impact of Visual Saliency Prediction in Image Classification

Posted on 11-Apr-2017


Eric Arazo Sánchez

Advisors: Kevin McGuinness, Eva Mohedano, Xavier Giró-i-Nieto

Introduction - Computer vision

● Classical computer vision: handcrafted descriptors → trainable classifier → “guitar”

● Deep learning: learned (trainable) descriptors → trainable classifier → “guitar”

Introduction - Imagenet

Ref: Russakovsky, Olga, et al. “Imagenet large scale visual recognition challenge.” International Journal of Computer Vision (2015).

Imagenet

Images:

● 1.2 M train

● 50,000 test

● 1,000 categories

The evaluation dataset is not published before the competition.

Imagenet


Metrics:

● Top-1 accuracy

● Top-5 accuracy
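As a minimal sketch of the two metrics (not the official ILSVRC evaluation code), the function below counts a prediction as correct when the true class is among the k highest-scoring classes: k = 1 gives top-1 accuracy and k = 5 gives top-5 accuracy. The toy scores use 4 classes instead of 1,000 so the result can be checked by hand.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # class indices sorted by decreasing score, keep the first k
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Toy example: 3 images, 4 classes (ILSVRC uses 1,000 classes and k = 1 or k = 5).
scores = [
    [0.10, 0.60, 0.20, 0.10],  # true class 1: ranked 1st
    [0.50, 0.10, 0.30, 0.10],  # true class 2: ranked 2nd
    [0.30, 0.15, 0.05, 0.50],  # true class 2: ranked 4th
]
labels = [1, 2, 2]
print(top_k_accuracy(scores, labels, 1))  # 0.333...
print(top_k_accuracy(scores, labels, 2))  # 0.666...
```

Top-5 accuracy is always at least as high as top-1 accuracy, since it accepts any of the five best guesses.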


Introduction - Imagenet

ILSVRC - Evolution since 2010 (figure; slide credit: Kaiming He, FAIR)

Some models have already reached human-level performance. Is ILSVRC still the Olympic Games of computer vision?

Introduction - Imagenet

ILSVRC - Evolution since 2010 (slide credit: Kaiming He, FAIR)

2012: Convolutional Neural Networks (CNNs) enter the competition with AlexNet, and the error drops by 9.4%.

Introduction - AlexNet

Ref: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems (NIPS), 2012.

Introduction - AlexNet

● 5 convolutional layers

● 3 fully connected layers

● 1000-way softmax → object class
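The sketch below traces the shape of a 227 x 227 image through the five convolutional layers and then lists the fully connected sizes, ending in the 1000-way softmax. It assumes the standard AlexNet hyperparameters (96/256/384/384/256 filters, 4096-unit FC layers) and floor rounding for output sizes; Batch Norm., as used in the diagrams later in this deck, does not change any of these shapes.

```python
import math


def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer (floor convention)."""
    return (size + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, padding, output channels); pooling keeps the channel count.
FEATURES = [
    ("conv1", 11, 4, 0, 96), ("pool1", 3, 2, 0, 96),
    ("conv2", 5, 1, 2, 256), ("pool2", 3, 2, 0, 256),
    ("conv3", 3, 1, 1, 384), ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256), ("pool5", 3, 2, 0, 256),
]
FC_UNITS = [4096, 4096, 1000]  # FC 1, FC 2, FC 3 (output: one unit per class)


def trace(size=227):
    """Return the (channels, height, width) shape after each feature layer."""
    shapes = {}
    for name, kernel, stride, pad, channels in FEATURES:
        size = out_size(size, kernel, stride, pad)
        shapes[name] = (channels, size, size)
    return shapes


def softmax(logits):
    """Turn the output scores into class probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


shapes = trace(227)
c, h, w = shapes["pool5"]
print(shapes["conv1"], shapes["pool5"])  # (96, 55, 55) (256, 6, 6)
print([c * h * w] + FC_UNITS)            # [9216, 4096, 4096, 1000]
```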

Introduction - CNN

Ref: LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.

CNNs are very useful in computer vision:

● Fewer parameters (filters are shared across all spatial positions)

● Spatial coherence (each filter looks at a local neighbourhood, so the spatial structure of the image is preserved)
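To make the parameter reduction concrete, this sketch compares a convolutional layer with a fully connected layer that maps the same input to an output of the same size. The sizes are an assumption borrowed from an AlexNet-style first layer (227 x 227 x 3 input, 96 filters of 11 x 11, 55 x 55 x 96 output).

```python
def conv_params(in_channels, out_channels, kernel):
    """Conv layer: one shared kernel x kernel x C_in filter plus a bias per output channel."""
    return out_channels * (kernel * kernel * in_channels + 1)


def dense_params(in_units, out_units):
    """Fully connected layer: one weight per input/output pair plus a bias per output."""
    return out_units * (in_units + 1)


conv = conv_params(in_channels=3, out_channels=96, kernel=11)
dense = dense_params(in_units=227 * 227 * 3, out_units=55 * 55 * 96)
print(conv)           # 34944
print(dense)          # 44892355200
print(dense // conv)  # 1284694, i.e. about 1.3 million times more parameters
```

Because each filter is reused at every position, the convolutional layer's size does not depend on the image resolution, only on the filter size and the number of channels.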

Introduction - CNN

CNNs are also used for other tasks:

● Image captioning

● Image segmentation

● Saliency prediction

Introduction - Saliency prediction

Images → CNN model → Saliency maps

Saliency maps → CNN for image classification

Objective

● Explore whether saliency maps can improve other computer vision tasks

Outline

● Introduction

● Objective

● State-of-the-art

● Methodology

● Conclusions

● Future work

State-of-the-art - Saliency prediction

SalNet, trained on SALICON

Ref: Pan, Junting, Kevin McGuinness, Elisa Sayrol, Xavier Giró-i-Nieto, and Noel E. O'Connor. "Shallow and Deep Convolutional Networks for Saliency Prediction." CVPR 2016.

Saliency prediction

Applications of saliency:

● Image retrieval

○ Finding the last appearance of an object.

● Object recognition

○ Health care

Ref: Reyes, Cristian, et al. "Where is my Phone? Personal Object Retrieval from Egocentric Images." 2016.

Ref: Pérez de San Roman, Philippe, et al. "Saliency Driven Object Recognition in Egocentric Videos with Deep CNN." 2016.

Saliency prediction - our approach

Combine AlexNet with the saliency maps predicted by SalNet (AlexNet * SalNet).


Methodology

RGB - The Baseline

AlexNet trained on RGB images only:

● 1.2 M images

● 227 x 227

Training took 9 days on a computation cluster.

RGB - The Baseline

Training time: 9 days → 5 days → 1.5 days

How to introduce saliency predictions?

● Multiplication (AlexNet)

● Concatenation (AlexNet)

● Fan-in Network (AlexNet + a CNN for the saliency maps)

Where should the saliency maps be introduced? It makes sense to start from the baseline, which is already trained, and use it as a pre-trained CNN.
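A minimal sketch of the two simplest options on a tiny 2 x 2 image, stored as a list of channels (each a list of rows) so it needs no libraries. Multiplication rescales every pixel by its saliency, so pixels with zero saliency are erased before the network sees them; concatenation leaves the image intact and adds the saliency map as a fourth channel, leaving it to the network to learn how to use it. The same operations apply to feature maps inside the network, provided the saliency map is resized to the feature maps' spatial size.

```python
def multiply(image, saliency):
    """Weight every channel pixel-wise by the saliency map (channel count unchanged)."""
    return [
        [[v * s for v, s in zip(row, sal_row)] for row, sal_row in zip(channel, saliency)]
        for channel in image
    ]


def concatenate(image, saliency):
    """Append the saliency map as one extra channel (RGB -> RGB + S)."""
    return image + [saliency]


# A 2 x 2 RGB image stored as channels -> rows -> pixels, and a saliency map in [0, 1].
rgb = [
    [[1, 2], [3, 4]],     # R
    [[5, 6], [7, 8]],     # G
    [[9, 10], [11, 12]],  # B
]
sal = [[1.0, 0.0], [0.5, 1.0]]

print(multiply(rgb, sal)[0])       # [[1.0, 0.0], [1.5, 4.0]]
print(len(concatenate(rgb, sal)))  # 4
```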

Multiplication vs. Concatenation

Three strategies for each of them:

● RGBS

● RGB-1S-2S

● RGBS-1S-2S

(Figure: AlexNet (Conv 1 to Conv 5 with Batch Norm. and Max-Pooling; FC 1, FC 2 and FC 3 - Output with Drop Out) fed with the RGB image and the saliency map, marking where the saliency map is introduced in each strategy.)

The best option is concatenation:

● RGBS

● RGB-1S-2S
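Concatenation changes the pre-trained network only slightly: every layer that receives the extra saliency channel needs one new k x k weight slice per filter. The sketch counts those new weights under a hypothetical reading of the strategy names (S = at the input, 1S = after Conv 1, 2S = after Conv 2), using standard AlexNet layer sizes and ignoring the original two-GPU filter grouping. All of these are assumptions, not details taken from the slides.

```python
# Hypothetical reading of the strategy names (an assumption, not from the slides):
#   "S"  -> saliency concatenated to the RGB input, received by Conv 1
#   "1S" -> saliency concatenated after Conv 1, received by Conv 2
#   "2S" -> saliency concatenated after Conv 2, received by Conv 3
# Values are (kernel size, number of filters) of the receiving layer (standard AlexNet sizes).
RECEIVING_LAYER = {"S": (11, 96), "1S": (5, 256), "2S": (3, 384)}

STRATEGIES = {
    "RGBS": ["S"],
    "RGB-1S-2S": ["1S", "2S"],
    "RGBS-1S-2S": ["S", "1S", "2S"],
}


def extra_weights(points):
    """Weights added when each receiving layer gets one extra (saliency) input channel."""
    total = 0
    for point in points:
        kernel, filters = RECEIVING_LAYER[point]
        total += kernel * kernel * filters  # one new k x k slice per filter
    return total


for name, points in STRATEGIES.items():
    print(name, extra_weights(points))
# RGBS 11616
# RGB-1S-2S 9856
# RGBS-1S-2S 21472
```

These are a few thousand weights against the roughly 60 million parameters of AlexNet. One common design choice (not necessarily the one used here) is to initialise the new slices to zero, so the fine-tuned network starts out behaving exactly like the pre-trained baseline.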

How to introduce saliency predictions?

● Multiplication

● Concatenation: RGBS, RGB-1S-2S

● Fan-in Network (AlexNet + CNN): where should the saliency branch be merged?

Fan-in architecture

Three strategies:

● Fan-in C1.1

● Fan-in C2.1

● Fan-in C2

(Figure: each strategy adds a saliency branch (Conv 1, Batch Norm., Max-Pooling; Fan-in C2.1 also adds Conv 2, Batch Norm., Max-Pooling) next to the RGB branch of AlexNet (Conv 1 to Conv 5, FC 1, FC 2, FC 3 - Output). In the Fan-in C2 diagram the RGB branch has no Conv 2.)
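The fan-in idea can be sketched with two branches that process the RGB image and the saliency map separately and then merge their feature maps. The future-work item "Concatenating instead of multiplying" suggests the fan-in merge is a multiplication, so that is the default here, with concatenation as the alternative. The 1x1 convolutions, ReLU and weights below are hypothetical stand-ins for the real Conv / Batch Norm. / Max-Pooling blocks.

```python
def conv1x1(x, weights):
    """1x1 convolution: each output channel is a weighted sum of the input channels."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [[sum(w * x[c][i][j] for c, w in enumerate(w_row)) for j in range(width)]
         for i in range(height)]
        for w_row in weights
    ]


def relu(x):
    return [[[max(0.0, v) for v in row] for row in channel] for channel in x]


def fan_in(rgb, saliency, w_rgb, w_sal, merge="multiply"):
    """Merge the outputs of an RGB branch and a saliency branch."""
    a = relu(conv1x1(rgb, w_rgb))       # stand-in for the RGB branch
    b = relu(conv1x1(saliency, w_sal))  # stand-in for the saliency branch
    if merge == "multiply":
        # element-wise product: both branches must output the same shape
        return [[[u * v for u, v in zip(ra, rb)] for ra, rb in zip(ca, cb)]
                for ca, cb in zip(a, b)]
    return a + b  # "concatenate": stack the channels of both branches


rgb = [[[1.0, 2.0]], [[0.0, 1.0]], [[1.0, 0.0]]]  # 3 channels, 1 x 2 pixels
sal = [[[0.5, 1.0]]]                             # 1 channel, 1 x 2 pixels
w_rgb = [[1, 0, 0], [0, 1, 1]]                   # hypothetical weights, 3 -> 2 channels
w_sal = [[1], [2]]                               # hypothetical weights, 1 -> 2 channels

print(fan_in(rgb, sal, w_rgb, w_sal))                      # [[[0.5, 2.0]], [[1.0, 2.0]]]
print(len(fan_in(rgb, sal, w_rgb, w_sal, "concatenate")))  # 4
```

Unlike multiplying the raw image, the saliency branch here has its own trainable weights, so the network can learn how strongly, and in which channels, the saliency should modulate the RGB features.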

Fan-in architecture

The best options are:

● Fan-in C2.1

● Fan-in C2

Fan-in C2 is a surprising result, since it has fewer parameters than the baseline.

More experiments

12.4%

RGB-C2 (128x128): comparison of RGB (baseline), RGB-C2 and Fan-in C2 (Fan-in Network)

How to introduce saliency predictions?

● Multiplication

● Concatenation: RGBS, RGB-1S-2S

● Fan-in Network: Fan-in C2.1, Fan-in C2

Analysis of per-class improvements

Models: RGBS and RGB-1S-2S (Concatenation); Fan-in C2.1 and Fan-in C2 (Fan-in Network)

Class                   Increase of accuracy
Acoustic guitar         +25 %
Volleyball              +23 %
Wrecker, tow car        -23 %
Entertainment center    -18 %
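A per-class comparison like the one above can be computed by measuring accuracy separately for each ground-truth class and subtracting the baseline's value. The sketch below is a minimal version of that procedure; the class names and predictions are toy values, not results from this work.

```python
from collections import defaultdict


def per_class_accuracy(predictions, labels):
    """Accuracy computed separately for every ground-truth class."""
    correct, total = defaultdict(int), defaultdict(int)
    for pred, label in zip(predictions, labels):
        total[label] += 1
        correct[label] += pred == label
    return {cls: correct[cls] / total[cls] for cls in total}


def accuracy_change(baseline_preds, model_preds, labels):
    """Per-class accuracy of a model minus that of the baseline, largest gain first."""
    base = per_class_accuracy(baseline_preds, labels)
    new = per_class_accuracy(model_preds, labels)
    return sorted(((cls, new[cls] - base[cls]) for cls in base),
                  key=lambda item: item[1], reverse=True)


# Toy predictions (hypothetical, not the thesis results).
labels = ["guitar", "guitar", "tow car", "tow car"]
baseline = ["tow car", "guitar", "tow car", "tow car"]
model = ["guitar", "guitar", "guitar", "tow car"]
print(accuracy_change(baseline, model, labels))
# [('guitar', 0.5), ('tow car', -0.5)]
```

With 50 images per class in a typical ImageNet evaluation split, a change of one image moves a class's accuracy by 2 %, so per-class differences should be read with that granularity in mind.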


Conclusions

● CNNs trained to predict saliency maps can be used to improve other computer vision tasks, such as image classification.

● The best way to introduce the saliency maps into a CNN is a Fan-in architecture, which gives the network the freedom to decide how to use the saliency maps.

(Figure: Fan-in C2.1 (Fan-in Network) compared with RGBS (Concatenation).)

● Downsampling the images (from 227 x 227 to 128 x 128) gives accurate estimates of the improvements the CNN obtains on larger images.
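The sketch below quantifies what the downsampling saves. It assumes standard AlexNet kernel sizes, strides and paddings, with floor rounding. A 128 x 128 input has about 3.1 times fewer pixels than a 227 x 227 one, so the convolutional work per image shrinks by roughly that factor. The last pooled feature map also shrinks from 6 x 6 to 2 x 2, so FC 1 receives 1,024 instead of 9,216 inputs, and its weights are not interchangeable between the two resolutions.

```python
def out_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling layer (floor convention)."""
    return (size + 2 * pad - kernel) // stride + 1


# (kernel, stride, padding) of AlexNet's conv1, pool1, conv2, pool2, conv3-5, pool5 (assumed).
LAYERS = [(11, 4, 0), (3, 2, 0), (5, 1, 2), (3, 2, 0),
          (3, 1, 1), (3, 1, 1), (3, 1, 1), (3, 2, 0)]


def pool5_side(side):
    """Side length of the last pooled feature map for a square input."""
    for kernel, stride, pad in LAYERS:
        side = out_size(side, kernel, stride, pad)
    return side


ratio = (227 * 227) / (128 * 128)
print(round(ratio, 2))                   # 3.15
print(pool5_side(227), pool5_side(128))  # 6 2
```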


Future work

● Several experiments:

○ Fan-in:

■ Fan-in C2 without saliency maps

■ Concatenating instead of multiplying

○ Concatenation only in the first convolutional layer

○ Multiplication and training from scratch

● Once we have a reasonable model, try other saliency models

Thank you
