mcg-ict-cas object detection at ilsvrc...

37
MCG - ICT - CAS Object Detection at ILSVRC 2016 Tang Sheng, Li Yu, Wang Bin, Xiao Junbin, Zhang Rui, Zhang Yongdong, Li Jintao Corresponding Email: [email protected] Institute of Computing Technology, Chinese Academy of Sciences October 9 th , 2016 2 nd ImageNet and COCO Visual Recognition Challenges Joint Workshop

Upload: others

Post on 22-May-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

MCG-ICT-CAS Object Detection at ILSVRC 2016

Tang Sheng, Li Yu, Wang Bin, Xiao Junbin, Zhang Rui, Zhang Yongdong, Li Jintao

Corresponding Email: [email protected]

Institute of Computing Technology, Chinese Academy of Sciences

October 9th, 2016

2nd ImageNet and COCO Visual Recognition Challenges Joint Workshop

Page 2: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page: 2

Team Members

Xiao JunbinLi Yu Tang Sheng

Zhang Yongdong Li Jintao

Wang Bin

Zhang Rui

Page 3: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page: 3

Results of our 3 tasks• Three tasks with provided data:

– Object detection (DET): 4th

– Object detection from video (VID):3rd

– Scene Parsing: 3rd

Page 4: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page: 4

Object Detection (DET)

Page 5: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page: 5

DET: Overview

• Improvements of loss function– Implicit sub-categories of background class

– Sink class when necessary

• Other training and testing tricks– Segmentation feature

– Dilation as context

– Multi-scale testing

– Box refinement && box voting

Page 6: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page: 6

Implicit subcategories of BG

• Background(BG): Indiscrimination– As ONE class equally as other object classes

– But: varies greatly

– Unreasonable to describe as one pattern

Page 7: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page: 7

Implicit subcategories of BG• Add N output nodes in last FC layer

– Represent N subcategories of BG, Cross entropy loss

– Allocate more parameters to many BG class by adding latent BG subclasses. Improve identification capability.

– Voc 2007 with Resnet50: ↑ 1%

Model mAP on VOC07

Res50 baseline 77.5%

Res50+Implicit Subcategories-5 nodes

78.5% ↑1%

Page 8: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page: 8

Sink class when necessary• Flow diversion of score to wrong classes

– Scores of true classes become relatively low

Score: High, wrong!

Page 9: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page: 9

Sink class when necessary

0

0.1

0.2

0.3

0.4

0.5

c1(gt) c2 c3 c4 background

Original scores

c1(gt) c2 c3 c4 background

0

0.1

0.2

0.3

0.4

c1(gt) c2 c3 c4 background sink

New scores with sink class

c1(gt) c2 c3 c4 background sink

Seriously wrong!

Diversion

Page 10: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:10

Sink class when necessary• Add a Sink class

– Optimize: Minimize((loss (target) && loss (target+sink)), only if all the Top-K results are wrong during training

– Flow diversion of high wrong scores during testing

– Give true class with low scores more chances to win

– Voc 2007 with Resnet50: ↑ 0.7%

Model mAP on VOC07

Res50 baseline 77.5%

Res50+Sink-top5 78.2% ↑0.7%

Page 11: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:11

Tricks: Segmentation+Dilation

Segmentation feature[2] Dilation as context[3]

Scene parsingnetwork

VOC 2007: Segmentation ↑ 0.8%, Dilation ↑ 0.8%

Page 12: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:12

All together

Model mAP on VOC07

Res50 baseline 77.5%

+dilation as context+sink class when

necessary+implicit sub-categoris+segmentation feature

+testing tricks

80.62% +3.1%

Page 13: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:13

Results on ILSVRC 2016 DET

Model mAP on val mAP on test

Res200 baseline+tt(testing tricks) 62.9 59.1

Res200+all+tt 62.3 57.7 T_T

Res101Res152Res200

Res101+allRes152+allRes200+all

Res152+scene parsing featureRes200+all in halfRes200+logistic

58.057.860.557.858.060.355.660.257.9

61.6(with tt)

Page 14: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:14

Results on ILSVRC 2016 DETModel mAP on val2 mAP on test

Res200 baseline+tt(testing tricks) 62.9 59.1

Res200+all+tt 62.3 57.7

Res101+Res152+Res200 64.0 60.6

Res101Res101+all

Res152Res152+all

Res200Res200+all

Res152+scene parsing featureRes200+all in halfRes200+logistic

58.057.857.858.060.560.355.660.257.9

61.6(with tt)

↑ 2.4% ↑ 0.3%

↑ 0.4% ↓1.5%

Page 15: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:15

Results on ILSVRC 2016 DETModel mAP on val2 mAP on test

Res200 baseline+tt(testing tricks) 62.9 59.1

Res200+all+tt 62.3 57.7

Res101+Res152+Res200 63.9 60.6

Res101Res101+all

Res152Res152+all

Res200Res200+all

Res152+scene parsing featureRes200+all in halfRes200+logistic

58.057.857.858.060.560.355.660.257.9

61.6(with tt)

Page 16: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:16

Results on ILSVRC 2016 DETModel mAP on val2 mAP on test

Res200 baseline+tt(testing tricks) 62.9 59.1

Res200+all+tt 62.3 57.7

Res101+Res152+Res200 63.9 60.6

Res101Res101+all

Res152Res152+all

Res200Res200+all

Res152+scene parsing featureRes200+all in halfRes200+logistic

58.057.857.858.060.560.355.660.257.9

61.6(with tt)

Page 17: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:17

Results on ILSVRC 2016 DETModel mAP on val2 mAP on test

Res200 baseline+tt(testing tricks) 62.9 59.1

Res200+all+tt 62.3 57.7

Res101+Res152+Res200 63.9 60.6

Res101Res101+all

Res152Res152+all

Res200Res200+all

Res152+scene parsing featureRes200+all in halfRes200+logistic

58.057.857.858.060.560.355.660.257.9

61.6(with tt)

Page 18: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:18

Object Detection from Video (VID)

Page 19: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:19

Motivation: Tracking-based

Tracking-based TubeletGround Truth

Tracking Bounding Box

Drifting location!

• VID: Challenging task– Frame detection performance & adjacent motion information

– Tracking-based tubelet generation is an effective solution

Page 20: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:20

Our Detection-based

Detection-based Tubelet

TubeletkTubeletk+1

Target missing & discrete trajectory!

Ground Truth

Detection Bounding Box

• Detection Box Sequentialization

• Adjacent Checking with optical-flow

Page 21: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:21

Our DAT Framework

Fusion TubeletGround Truth

Detection Bounding Box

Improve location quality &

Recall missing objects

• DAT tubelet generation & fusion framework – Tubelet generation: complementary Detection And Tracking (DAT)

– Focused on precision & recall respectively

– Followed by a novel tubelet merging method

Page 22: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:22

VID: Overview• Two main contributions:

– Tubelet generation: sequentialize detection box with optical-flow

– Overlapping and successive tubelet fusion

Page 23: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:23

Still-image Detection• Training Data

– DET train&val + VID train (1/6)

• Architecture

– Faster R-CNN [1] + ResNet [4]

– Add RPN anchor with size = 64

• Model Ensemble

– ResNet101, ResNet200

– Weighting average for coordinate position and category score.

Page 24: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:24

Tracking-based Tubelet Generation

• Anchor Frame Selection– Select the frame with highest detection score object

...

...

1 ntt-1 t+1

Select as Anchor FrameHighest detection score in this video

Page 25: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:25

Tracking-based Tubelet Generation

• Anchor Target Selection– Exploit the adjacent information with optical flow [5]

to determine the reliable anchor targetst (Anchor Frame)t-1 t+1

optical flow optical flow

Detection results

Optical flowprediction results

Page 26: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:26

Tracking-based Tubelet Generation

• Anchor Target Selection: remove the unreliablet (Anchor Frame)t-1 t+1

bbox match

t (Anchor Frame withAnchor Targets)

bbox match

Detection result

Optical flow result

Page 27: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:27

Tracking-based Tubelet Generation

• Multi-target tracking with detection recall– Allocate each anchor target with a MDNet tracker [6]

– Track them in parallel

– Then use detection results to recall missing tubelets since the anchor frame may not contain all true objects.

(a) existed tubelets (b)Recall missing tubelets

High detection score but missing

Page 28: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:28

Detection-based Tubelet Generation

• Motivation– Overcome drifting location problem

– Excellent object detectors (Faster R-CNN) can generate precise bounding box of high location quality

Still-imageDetection

Anchor Target Selection

Detection BoxSequentialization

Multi-targetTracking

Coherent Reclassification

TubeletFusion

Detection-based Tubelet Generation

Tracking-based Tubelet Generation

Page 29: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:29

Detection-based Tubelet Generation

• Adjacent Checking: – By optical-flow for precise tubelet

Current Detection Box

Prev Detection Box

Prev Prediction Box

Optical Flow

Current Frame

If IoU (Red, Green) > a given threshold: same tubeletelse: a new tubelet

Page 30: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:30

Detection-based Tubelet Generation

• Detection Box Sequentialization

… …

Page 31: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:31

Detection-based Tubelet Generation

• Coherent reclassification

– Use majority voting to get coherent categories throughout a given tubelet

Cp:S1 Cp:S2 Cp:S3 Cp:S4 …Cp:S5 Cp:SN-1 Cp:SNCq:SN-1Tub

Start Frame End Frame

Cp:S Cp:S Cp:S Cp:S …Cp:S Cp:S Cp:SCp:STub

Start Frame End Frame

𝑆𝑘 =1

𝑁𝑘 𝑖=1

𝑁𝑘𝑆𝑖

Tubelet Category: Cp

Tubelet Box Score: S

max count(C ) (k [1,30])cls kk

Tub

Page 32: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:32

Tubelet Fusion

• Union Fusion

– Merge overlapping tubelets

iTub

jTub

1t

if2t

if

1t

jf 2t

jf

Tub

Union Fusion

Page 33: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:33

Tubelet Fusion

• Concatenation Fusion– Merge successive tubelets

iTub

jTubkTub

1t

if

1 1t

jf 2 1t

kf Tub

Concatenation Fusion

Page 34: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:34

ILSVRC 2016 VID Val Results

Mean Recall:Lower IoU: tracking-based is higherHigher IoU: detection-based is higher

Page 35: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:35

ILSVRC 2016 VID Val Results

Mean AP: Detection-based is higher than Tracking-based, Fusion is best!

Page 36: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:36

References

• [1] Ren S, He K, Girshick R, Sun J. “Faster R-CNN: Towards real-time object detection with region proposal networks”, NIPS 2015: 91-99.

• [2] Gidaris, Spyros, and Nikos Komodakis. "Object detection via a multi-region and semantic segmentation-aware cnn model." ICCV 2015.

• [3] Yu, Fisher, and Vladlen Koltun. "Multi-scale context aggregation by dilated convolutions." ICLR 2016.

• [4] He K, Zhang X, Ren S, Sun J. “Deep residual learning for image recognition”, CVPR 2016.

• [5] Kang K, Ouyang W, Li H, Wang X. “Object Detection from Video Tubelets with Convolutional Neural Networks”, CVPR 2016.

• [6] Nam H, Han B. “Learning multi-domain convolutional neural networks for visual tracking”, CVPR 2016.

Page 37: MCG-ICT-CAS Object Detection at ILSVRC 2016image-net.org/.../2016/MCG-ICT-CAS-ILSVRC2016-Talk-final.pdf · 2016-10-10 · MCG-ICT-CAS Object Detection at ILSVRC 2016 Tang Sheng, Li

Page:37

Welcome:

Questions and Comments

Thank You!