
Page 1: Qualcomm research-imagenet2015

NeoNet: Object centric training for image recognition

Daniel Fontijne, Koen E. A. van de Sande, Eren Gölge, R. Blythe Towal, Anthony Sarah, Cees G. M. Snoek
Qualcomm Technologies, Inc., December 17, 2015

Presented by: Daniel Fontijne, Senior Staff Engineer

Page 2: Qualcomm research-imagenet2015

Summary

Key component: object centric training

Task             Score   Ranking
Classification    4.8      -
Localization     12.6      3
Detection        53.6      2
Places 2         17.6      3

Page 3: Qualcomm research-imagenet2015

Agenda

1. Foundation
2. Classification
3. Localization
4. Detection
5. Places 2

Page 4: Qualcomm research-imagenet2015

Foundation: Batch-normalized inception

The base network for all our submissions is the inception network as introduced in the batch normalization paper by Ioffe & Szegedy (ICML 2015).

Page 5: Qualcomm research-imagenet2015


Network in an inception module

Note: the 5x5 path is not used.

Lin et al. ICLR 2014

Page 6: Qualcomm research-imagenet2015

Agenda

1. Foundation
2. Classification
3. Localization
4. Detection
5. Places 2

Page 7: Qualcomm research-imagenet2015

Classification overview

- Ensemble of 12 networks.
- Train 'really long': 350 epochs.
- Randomized ReLU (Xu et al., ICML workshop 2015); see the sketch below.
- Test at 14 scales, 10 crops.
- Object preserving crops.
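For reference, a minimal NumPy sketch of the randomized ReLU of Xu et al.; the [1/8, 1/3] slope range comes from their paper and is an assumption here, since the slide gives no parameters:

```python
import numpy as np

def rrelu(x, lower=1/8., upper=1/3., training=True, rng=None):
    """Randomized ReLU (Xu et al., ICML workshop 2015).

    Training: negative activations are scaled by a slope drawn
    uniformly from [lower, upper], independently per element.
    Testing: the fixed mean slope (lower + upper) / 2 is used.
    """
    if training:
        rng = rng or np.random.default_rng()
        slope = rng.uniform(lower, upper, size=x.shape)
    else:
        slope = (lower + upper) / 2.0
    return np.where(x >= 0, x, slope * x)
```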

Page 8: Qualcomm research-imagenet2015


Quiz: What is this?

Page 9: Qualcomm research-imagenet2015


Answer: Flower

Page 10: Qualcomm research-imagenet2015


Quiz: In case you got that right, what is this?

Page 11: Qualcomm research-imagenet2015


Answer: Butterfly

Page 12: Qualcomm research-imagenet2015

Object preserving crops

- Random crop selection might miss the object of interest: the network tries to remember 'butterfly' when presented with leaves.
- Solution: use the provided boxes to ensure the crop contains the object; a sampling sketch follows below.
- For images without box annotation, use the best box predicted by the localization system.
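A minimal sketch of object-preserving crop sampling, assuming boxes as (x0, y0, x1, y1) pixel coordinates; the rejection-sampling loop and the 50% coverage threshold are illustrative choices, not details given in the talk:

```python
import random

def object_preserving_crop(img_w, img_h, box, crop, min_coverage=0.5, tries=50):
    """Sample a crop-sized window that keeps the annotated object.

    box  = (x0, y0, x1, y1) object annotation in pixels
    crop = (crop_w, crop_h) training-crop size (assumed to fit the image)
    Returns a crop covering at least `min_coverage` of the object's
    area, falling back to a crop centred on the box.
    """
    x0, y0, x1, y1 = box
    crop_w, crop_h = crop
    box_area = max(1, (x1 - x0) * (y1 - y0))
    for _ in range(tries):
        cx = random.randint(0, img_w - crop_w)
        cy = random.randint(0, img_h - crop_h)
        # Intersection of the candidate crop with the object box.
        iw = min(cx + crop_w, x1) - max(cx, x0)
        ih = min(cy + crop_h, y1) - max(cy, y0)
        if iw > 0 and ih > 0 and (iw * ih) / box_area >= min_coverage:
            return cx, cy, crop_w, crop_h
    # Fallback: centre the crop on the object.
    cx = min(max(0, (x0 + x1 - crop_w) // 2), img_w - crop_w)
    cy = min(max(0, (y0 + y1 - crop_h) // 2), img_h - crop_h)
    return cx, cy, crop_w, crop_h
```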

Page 13: Qualcomm research-imagenet2015

Component breakdown

                                           Epochs   Single view   Multi-view
First attempt at inception + batch norm     112       8.63%        6.58%
Train ~325 epochs                           324       8.77%        6.34%
32 images / mini-batch                      130       8.74%        6.68%
Object preserving, 32 images/mini-batch     120       8.59%        6.51%
Object preserving with generated boxes      130       8.47%        6.46%
Ensemble of 12                               -          -           4.84%

Page 14: Qualcomm research-imagenet2015

Final classification results

Top-5 classification error on test set:

SuperVision ('12)           16.4
Clarifai ('13)              11.7
GoogLeNet ('14)              6.7
Ioffe & Szegedy, ICML '15    4.9
NeoNet                       4.8
Trimps-Soushen               4.6
ReCeption                    3.6
MSRA                         3.6

NeoNet is competitive on object classification.

Page 15: Qualcomm research-imagenet2015

Agenda

1. Foundation
2. Classification
3. Localization
4. Detection
5. Places 2

Page 16: Qualcomm research-imagenet2015

Localization overview

Foundations:
- Generate box proposals using fast selective search (Uijlings et al., IJCV 2013).
- Train box-classification networks on crops (Girshick et al., PAMI 2016).

Object centric training:
- Object pre-training network.
- Object localization network.
- Object alignment network.

Page 17: Qualcomm research-imagenet2015

Object centric pre-training

- Use the bounding box annotations for pre-training.
- Increase the number of classes from N to 2*N+1:
  - N classes for the object, well-framed.
  - N classes for partially framed objects.
  - 1 class for 'background', i.e., object not visible.
- 1% to 1.5% improvement compared to standard pre-training; a label-layout sketch follows.
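A minimal sketch of the 2*N+1 label scheme under one plausible index layout (well-framed classes first, then partially framed, background last); the layout itself is an assumption:

```python
def object_centric_label(class_idx, framing, num_classes=1000):
    """Map (class, framing) to one of 2*N+1 labels.

    framing: 'well_framed', 'partial', or 'background'.
    Assumed layout: [0, N) well-framed, [N, 2N) partial, 2N background.
    """
    if framing == 'background':
        return 2 * num_classes
    offset = 0 if framing == 'well_framed' else num_classes
    return offset + class_idx
```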

Page 18: Qualcomm research-imagenet2015

Object centric pre-training

Dual-head network to account for missing bounding boxes:
- One head with 1,000 outputs.
- One head with 2,001 outputs; no error gradient when the box annotation is missing (see the loss-masking sketch below).
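A minimal PyTorch-style sketch of masking the 2,001-way loss when a box annotation is missing; the head sizes follow the slide, while the feature dimension and loss choice are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualHead(nn.Module):
    """Two classification heads on a shared feature vector."""
    def __init__(self, feat_dim=1024, n_classes=1000):
        super().__init__()
        self.cls_head = nn.Linear(feat_dim, n_classes)          # 1,000 outputs
        self.obj_head = nn.Linear(feat_dim, 2 * n_classes + 1)  # 2,001 outputs

    def forward(self, feats):
        return self.cls_head(feats), self.obj_head(feats)

def dual_head_loss(cls_logits, obj_logits, cls_target, obj_target, has_box):
    """`has_box` (bool tensor) masks out the object-centric loss,
    and hence its gradient, for images without a box annotation."""
    loss = F.cross_entropy(cls_logits, cls_target)
    if has_box.any():
        loss = loss + F.cross_entropy(obj_logits[has_box], obj_target[has_box])
    return loss
```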

Page 19: Qualcomm research-imagenet2015

Object localization network

- Fully connected layer on top of Inception 4e and 5b.
- Re-train Inception 5b and the new head.
- Then fine-tune the entire network.

A sketch of the combined head follows.
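A minimal PyTorch-style sketch of a fully connected head over the concatenated Inception 4e and 5b activations; the pooling step, head width, and GoogLeNet-style channel counts (832 and 1024) are assumptions, since the slide gives no architecture details:

```python
import torch
import torch.nn as nn

class LocalizationHead(nn.Module):
    """FC head over pooled Inception 4e + 5b feature maps."""
    def __init__(self, n_outputs=1000, c4e=832, c5b=1024):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)   # collapse each map to 1x1
        self.fc = nn.Linear(c4e + c5b, n_outputs)

    def forward(self, feat4e, feat5b):
        x = torch.cat([self.pool(feat4e).flatten(1),
                       self.pool(feat5b).flatten(1)], dim=1)
        return self.fc(x)
```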

Page 20: Qualcomm research-imagenet2015


Quiz: Is this an entire skyscraper?

Page 21: Qualcomm research-imagenet2015

Bordering the object

- A 40% border worked best: at the 7x7 resolution of Inception 5b, this leaves a 1-pixel border around the object.

A border-arithmetic sketch follows.
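A minimal sketch of expanding a box by the 40% border before cropping; how the 40% splits across sides is an assumption, but splitting it evenly (20% per side) reproduces the slide's arithmetic: the object then fills 1/1.4, i.e. 5 of the 7 Inception 5b cells, leaving a 1-pixel ring.

```python
def add_border(box, img_w, img_h, border=0.40):
    """Expand (x0, y0, x1, y1) by `border` of the box size in total
    (half on each side), clamped to the image. With border=0.40 the
    object covers ~5/7 of the crop: a 1-pixel ring at 7x7."""
    x0, y0, x1, y1 = box
    bw = (x1 - x0) * border / 2.0
    bh = (y1 - y0) * border / 2.0
    return (max(0.0, x0 - bw), max(0.0, y0 - bh),
            min(float(img_w), x1 + bw), min(float(img_h), y1 + bh))
```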

Page 22: Qualcomm research-imagenet2015

Object alignment network

- Extra head for object box alignment.
- The classification head is also used, but with a cross-entropy cost.

Page 23: Qualcomm research-imagenet2015

Object alignment border

- Object box alignment moves corners by up to 50% of the width and height.
- A 100% border allows the network to 'see' the full range of possible alignments (see the target sketch below).
- ~2% gain.
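A minimal sketch of box-alignment regression targets under these constraints; normalizing corner offsets by the proposal's width and height and clipping at ±0.5 is our reading of the slide, not a stated formula:

```python
def alignment_targets(proposal, truth):
    """Corner offsets from `proposal` to `truth`, normalized by the
    proposal's size and clipped to +/-0.5 (corners move at most 50%
    of the width/height). Assumes a non-degenerate proposal box."""
    px0, py0, px1, py1 = proposal
    tx0, ty0, tx1, ty1 = truth
    w, h = px1 - px0, py1 - py0
    clip = lambda v: max(-0.5, min(0.5, v))
    return (clip((tx0 - px0) / w), clip((ty0 - py0) / h),
            clip((tx1 - px1) / w), clip((ty1 - py1) / h))
```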

Page 24: Qualcomm research-imagenet2015

Component breakdown

                                              Top-5 localization error
First attempt                                        24.0%
40% border, FC on top of Inception 5b                22.5%
FC on top of Inception 5b+4e                         21.8%
Object centric pre-training                          20.3%
Ensemble of 8                                        17.5%
Object alignment                                     15.5%
Final result with ILSVRC blacklist applied           14.5%

Page 25: Qualcomm research-imagenet2015

Final localization results

Top-5 localization error on test set:

UvA ('11)           42.5
SuperVision ('12)   34.2
OverFeat ('13)      30.0
VGG ('14)           25.3
NeoNet              12.6
Trimps-Soushen      12.3
MSRA                 9.0

NeoNet is competitive on object localization.

Page 26: Qualcomm research-imagenet2015

Agenda

1. Foundation
2. Classification
3. Localization
4. Detection
5. Places 2

Page 27: Qualcomm research-imagenet2015

Improved selective search

                        Fast     Improved
Color spaces              2         3
Segmentations             2         4
Similarity functions      2         4
Average boxes           1,600     5,000
MABO                    77.5      82.6
Time (s)                 0.8       2.4
mAP                     41.2      44.0
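For orientation, a sketch using OpenCV's selective search implementation (opencv-contrib); its fast and quality presets only roughly mirror the fast-versus-improved trade-off in the table and are not the authors' exact configuration:

```python
import cv2  # requires opencv-contrib-python

img = cv2.imread("example.jpg")  # path is illustrative
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img)
ss.switchToSelectiveSearchFast()      # fewer color spaces / segmentations / similarities
# ss.switchToSelectiveSearchQuality() # more of each: more boxes, higher recall, slower
boxes = ss.process()                  # proposals as (x, y, w, h)
print(len(boxes), "proposals")
```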

Page 28: Qualcomm research-imagenet2015

Object detection network

Five inception-style networks for feature extraction:
- Two trained on 1,000 object classes, no input border, fine-tuned on detection boxes.
- Three trained on 1,000 object windows with input border, no fine-tuning.

Page 29: Qualcomm research-imagenet2015

Component breakdown

                               mAP on validation set
Best object class network             44.6
Best object centric network           47.7
Ensemble of 5                         51.9

Page 30: Qualcomm research-imagenet2015

Component breakdown

                               mAP on validation set
Best object class network             44.6
Best object centric network           47.7
Ensemble of 5                         51.9
+ context                             53.2

Context: four classification networks fine-tuned with 200 detection class labels. (An illustrative fusion sketch follows.)
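The slide does not say how the context scores enter; a common fusion, shown here purely as an illustrative sketch rather than the authors' method, averages the whole-image classification score into each box's detection score:

```python
import numpy as np

def fuse_context(box_scores, image_scores, weight=0.3):
    """box_scores:   (num_boxes, 200) per-box detection scores.
    image_scores: (200,) whole-image context scores from the
    fine-tuned classification networks. `weight` is illustrative."""
    return (1 - weight) * box_scores + weight * image_scores[None, :]
```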

Page 31: Qualcomm research-imagenet2015

Component breakdown

                               mAP on validation set
Best object class network             44.6
Best object centric network           47.7
Ensemble of 5                         51.9
+ context                             53.2
+ object alignment                    54.6

Page 32: Qualcomm research-imagenet2015

Final detection results

Mean average precision on test set:

UvA/Euvision ('13)   22.6
GoogLeNet ('14)      43.9
Deep-ID Net          52.7
NeoNet               53.6
MSRA                 62.1

NeoNet is competitive on object detection.

Page 33: Qualcomm research-imagenet2015

Agenda

1. Foundation
2. Classification
3. Localization
4. Detection
5. Places 2

Page 34: Qualcomm research-imagenet2015

Places 2 overview

Our best submission: an ensemble of two inception nets.
- Reduce the fully connected layer from 1,000 to 401 outputs.
- Use pre-trained weights from ImageNet 1,000 (~325 epochs).
- Train Inception 5b and the fully connected layer for two epochs.
- Fine-tune the entire network for eight epochs.

Adding other networks reduced the accuracy. (A staged fine-tuning sketch follows.)
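A minimal PyTorch-style sketch of the staged fine-tuning recipe, with torchvision's GoogLeNet standing in for the batch-normalized inception net (the NeoNet definition isn't public); the `inception5b` and `fc` parameter names follow torchvision, and the optimizer settings are illustrative:

```python
import torch
from torchvision.models import googlenet

# Stand-in for the batch-normalized inception net.
model = googlenet(num_classes=1000, aux_logits=False)

# Step 1: replace the 1,000-way head with a 401-way head for Places 2.
# (In practice the ImageNet pre-trained weights would be loaded first.)
model.fc = torch.nn.Linear(model.fc.in_features, 401)

# Step 2: train only Inception 5b and the new head (two epochs).
for name, p in model.named_parameters():
    p.requires_grad = name.startswith(('inception5b', 'fc'))
head_opt = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01, momentum=0.9)

# Step 3: then fine-tune the entire network (eight epochs).
for p in model.parameters():
    p.requires_grad = True
full_opt = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
```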

Page 35: Qualcomm research-imagenet2015

Component breakdown (top-5 error)

                                          Single view   Multi view
~325 epochs pre-training                     17.9%        16.8%
First attempt, 112 epochs pre-training       19.1%        17.9%
512-channel 5b, Alex-style FC head           20.0%        18.4%
32 images / batch                            18.7%        17.6%
Randomized ReLU                              18.2%        17.5%
Ensemble of 7                                  -          16.7%
Ensemble of 2                                  -          16.5%

Page 36: Qualcomm research-imagenet2015

Final Places 2 results

Top-5 classification error on test set:

HiVision          20.0
MERL              19.4
ntu_rose          19.3
Trimps-Soushen    18.0
NeoNet            17.6
SIAT_MMLAB        17.4
WM                16.9

NeoNet is competitive on scene classification.

Page 37: Qualcomm research-imagenet2015

On-device recognition at 18 ms

Page 38: Qualcomm research-imagenet2015

Summary

Key component: object centric training

Task             Score   Ranking
Classification    4.8      -
Localization     12.6      3
Detection        53.6      2
Places 2         17.6      3

Page 39: Qualcomm research-imagenet2015


Nothing in these materials is an offer to sell any of the components or devices referenced herein.

©2013-2015 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved. Qualcomm and Snapdragon are trademarks of Qualcomm Incorporated, registered in the United States and other countries. Zeroth is a trademark of Qualcomm Incorporated. Other products and brand names may be trademarks or registered trademarks of their respective owners.

References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable.

Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm’s engineering, research and development functions, and substantially all of its product and services businesses, including its semiconductor business, QCT.

For more information on Qualcomm, visit us at: www.qualcomm.com & www.qualcomm.com/blog

Thank you.