CNN Based Object Detection in Large Video Images WangTao, [email protected] IQIYI ltd. 2016.4


Page 1: CNN Based Object Detection in Large Video Images · 2016. 4. 12. · • Li Shen, Zhouchen Lin and Qingming Huang, Learning deep convolutional neural networks for places2 scene recognition,



Outline

• Introduction

- Background

- Challenge

• Our approach

- System framework

- Object detection

- Scene recognition

- Body segmentation

- Same style matching

• Experiments

• Conclusion


Background

• Image retrieval

• Video advertising

Video out applications


Challenge

• Real video data vs. image dataset

- Cluttered background

- Multiple objects

- Small objects

- Varying pose/position

- Partial occlusion


Our task

• Problems:

• Content based object retrieval in large video images

• High accuracy for same style matching

• High speed in large video database

• Solution:

• Accurate object detection + scene classification

• Discriminative DNN features and PCA/LDA transformation

• Speed up by parallel indexing and hierarchical filtering


System framework

[Figure: system framework diagram with two paths. Indexing: video key frame, scene classification, object detection, body segmentation, CNN feature, indexing database. Query: query image, Faster-RCNN rect, body segmentation, CNN feature, scene classification, match, distance sort, result.]


Object detection (I)

• Object detection by Faster-RCNN

- Faster-RCNN: region proposals + object scores [Ren, Shaoqing, et al. NIPS2015]

- Trained on the MS COCO dataset (300k images) + video images (10k images)

- More general; suited to images with multiple objects
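The slide only says that Faster-RCNN yields region proposals with object scores. Detectors of this family usually prune overlapping scored boxes with non-maximum suppression (NMS), so a minimal sketch of that step may help. The `iou` and `nms` helpers and the 0.5 overlap threshold are illustrative assumptions, not taken from the talk.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    A box is dropped when it overlaps an already kept box by more
    than iou_threshold (hypothetical default, not from the slides).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    demo_scores = [0.9, 0.8, 0.7]
    print(nms(demo_boxes, demo_scores))
```

In practice this step would run per object class, so that a handbag box does not suppress an overlapping jacket box.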


• Multi-class object detection, including:

- Clothes (skirt, jacket, trousers)

- Bags (handbag, backpack, rolling suitcase)

- Electronics (mobile phone, laptop, TV, keyboard, mouse, microwave oven, oven, refrigerator)

- Glasses, necklace, hat

- Shoes


Object detection (II)

• Object detection by CNN regression

• Input an image, output the coordinates of the object rectangle [Erhan, Dumitru, et al. CVPR2014]

• Efficient for images with a single object that Faster-RCNN fails to recognize
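One way to read the two detection slides together is as a fallback: use the Faster-RCNN detections when there are any, otherwise ask the regression network for a single box. The sketch below illustrates that reading only. `detect_with_fallback`, the two callables, the `min_score` cutoff, and the `"unknown"` label are all hypothetical names and values, not part of the presented system.

```python
from typing import Any, Callable, List, Optional, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Detection = Tuple[str, float, Box]        # (class label, score, box)


def detect_with_fallback(
    image: Any,
    multi_detector: Callable[[Any], List[Detection]],
    box_regressor: Callable[[Any], Optional[Box]],
    min_score: float = 0.5,
) -> List[Detection]:
    """Prefer multi-object detections; fall back to a single regressed box.

    multi_detector stands in for a Faster-RCNN style model and
    box_regressor for a CNN that regresses one rectangle per image.
    Both are placeholders supplied by the caller.
    """
    detections = [d for d in multi_detector(image) if d[1] >= min_score]
    if detections:
        return detections
    box = box_regressor(image)
    # The regressed box carries no class label or score in this sketch.
    return [("unknown", 0.0, box)] if box is not None else []


if __name__ == "__main__":
    nothing_found = lambda img: []
    one_box = lambda img: (5.0, 5.0, 20.0, 20.0)
    print(detect_with_fallback(None, nothing_found, one_box))
```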


Body Segmentation

• Constrained by human body parts

- CNN-based body segmentation [Jonathan Long, CVPR2015]

- Bounding box, body mask, body parsing

[Figure: original image and segmentation image]


Scene classification

• CNN based Scene classification [Bolei Zhou, NIPS2014]

[Figure: scene classification flow with the labels: video key frame, "Is scene? yes/no", CNN-based scene classification, multi-frame fusion, tags. Example panels: non-scene images; scene images of kitchen, office, living room, and bedroom.]

Scene classification: Precision 65.8%, Recall 74%

[email protected]: Precision 83.8%, Recall 56.7%
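The slide names multi-frame fusion and reports two precision/recall operating points, but does not say how frames are fused or where the threshold sits. As a minimal sketch under stated assumptions: average the per-frame class probabilities of a shot, accept the top class only above a confidence threshold, and score the result with the usual precision and recall definitions. `fuse_frames`, `precision_recall`, and the threshold values are illustrative, not the authors' method.

```python
from typing import List, Optional, Sequence, Tuple


def fuse_frames(frame_probs: Sequence[Sequence[float]],
                threshold: float) -> Optional[int]:
    """Average class probabilities over key frames and pick the top class.

    Returns None (treated as "not a scene") when the averaged top
    probability stays below threshold. Averaging is an assumption.
    """
    n = len(frame_probs)
    if n == 0:
        return None
    num_classes = len(frame_probs[0])
    mean = [sum(p[c] for p in frame_probs) / n for c in range(num_classes)]
    best = max(range(num_classes), key=lambda c: mean[c])
    return best if mean[best] >= threshold else None


def precision_recall(predicted: Sequence[Optional[int]],
                     actual: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Precision and recall over scene labels; None means non-scene."""
    tp = sum(1 for p, a in zip(predicted, actual) if p is not None and p == a)
    n_pred = sum(1 for p in predicted if p is not None)
    n_actual = sum(1 for a in actual if a is not None)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_actual if n_actual else 0.0
    return precision, recall


if __name__ == "__main__":
    shot = [[0.6, 0.4], [0.8, 0.2]]       # two key frames, two classes
    print(fuse_frames(shot, threshold=0.5))
    print(fuse_frames(shot, threshold=0.9))
```

Raising the threshold rejects more uncertain shots, which tends to trade recall for precision, the same direction as the two result rows above.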


Scene classes

• 0 kitchen • 1 dining • 2 bakery • 3 ice_cream_parlor • 4 bathroom • 5 washing_room • 6 bedroom • 7 living_room • 8 office • 9 children_room • 10 nursery • 11 toyshop • 12 shoe_shop • 13 jewelry_shop

• 14 outdoor_ice_world • 15 indoor_ice_skating_rink • 16 baseball • 17 football • 18 basketball_court • 19 swimming_pool • 20 track • 21 bowling_alley • 22 billiards • 23 tennis • 24 volleyball • 25 gymnasium • 26 pleasure_ground • 27 hospital_room

• 28 dentists • 29 drugstore • 30 music_studio • 31 music_store • 32 sandbeach • 33 hairsalon • 34 bar • 35 pagoda • 36 bamboo_forest • 37 mountain • 38 coast • 39 creek • 40 waterfall • 41 grass • 42 other


Same style matching

• SIFT feature matching

- Normalization of SIFT

- Dimension: 128 dim x 400 pts

- MAP: 22%

• CNN feature of ImageNet 1k classifier

- Model: VGG19

- Layer: fc7

- Dimension: 4096 → 600

- MAP: 28%

• CNN feature of same-style classifier

- Model: VGG19

- Layer: fc7

- Dimension: 4096 → 600

- MAP: 34%
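The MAP figures here are mean average precision over retrieval queries. The slides do not state the exact protocol (for example, any rank cutoff), so the following is a sketch of the standard definition only. `average_precision` and `mean_average_precision` are illustrative names.

```python
from typing import List, Sequence, Tuple


def average_precision(ranked_relevance: Sequence[bool],
                      num_relevant: int) -> float:
    """Average precision for one query.

    ranked_relevance[i] is True when the result at rank i + 1 is a
    correct same-style match; num_relevant is the number of correct
    matches that exist in the database for this query.
    """
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank   # precision at this rank
    return total / num_relevant if num_relevant else 0.0


def mean_average_precision(
        queries: Sequence[Tuple[Sequence[bool], int]]) -> float:
    """Mean of per-query average precision."""
    if not queries:
        return 0.0
    return sum(average_precision(r, n) for r, n in queries) / len(queries)


if __name__ == "__main__":
    demo: List[Tuple[List[bool], int]] = [
        ([True, False, True], 2),
        ([False, True], 1),
    ]
    print(mean_average_precision(demo))
```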


Multi-feature fusion

• Same-class matching classifier trained on ImageNet 21k classes (15M images)

• Same-style matching classifier trained on 1239 queries of 1M images

• Speed

- Nvidia K40 GPU: 10x faster than CPU i7

- Faster-RCNN: 200 ms/frame at image size 1920x1080

- VGG19 feature: 60 ms/frame at image size 256x256

CNN model                      Feature dim   MAP
Inception_bn1k                 1024          24%
Inception_21k                  1024          34%
Vgg19_caffe                    4096          34%
Inception_21k + vgg19_caffe    5120          43%
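The fused feature dimension in the last row equals 1024 + 4096 = 5120, which suggests the two feature vectors are concatenated. A minimal sketch of that reading follows. L2-normalizing each part before concatenation is an assumption added here so that neither model dominates the distance; the slides do not say whether this was done. `l2_normalize` and `fuse_features` are illustrative names.

```python
import math
from typing import List, Sequence


def l2_normalize(vector: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors pass through)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def fuse_features(inception_feat: Sequence[float],
                  vgg_feat: Sequence[float]) -> List[float]:
    """Concatenate two per-model features after normalizing each part."""
    return l2_normalize(inception_feat) + l2_normalize(vgg_feat)


if __name__ == "__main__":
    fused = fuse_features([1.0] * 1024, [2.0] * 4096)
    print(len(fused))
```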


Experiments

• MAP on 3M test images, trained on 1M images

• Speed-up

- Parallel FLANN tree indexing

- Hierarchical filtering by object classes: 10x faster

- Query speed: 1 s/image on 5000 TV dramas with 2M images

VGG19 model   Full image   Object rectangle   PCA+LDA   Inception-21k   MAP
√             √            ×                  ×         ×               27.8%
√             ×            √                  ×         ×               34.2%
√             ×            √                  √         ×               37.3%
√             ×            √                  ×         √               43.1%
√             ×            √                  √         √               46.1%
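The speed-up bullets combine class-based filtering with nearest-neighbour search. A minimal sketch of the filtering idea: keep one bucket of features per detected object class and search only the bucket that matches the query's class. The brute-force distance sort below is a stand-in for the parallel FLANN trees named on the slide, and `ClassFilteredIndex` and its methods are hypothetical names.

```python
import math
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ClassFilteredIndex:
    """One feature bucket per object class (brute-force search inside)."""

    def __init__(self) -> None:
        self._buckets: DefaultDict[str, List[Tuple[str, List[float]]]] = \
            defaultdict(list)

    def add(self, object_class: str, item_id: str,
            feature: Sequence[float]) -> None:
        self._buckets[object_class].append((item_id, list(feature)))

    def query(self, object_class: str, feature: Sequence[float],
              top_k: int = 5) -> List[str]:
        """Return up to top_k item ids of the same class, nearest first."""
        candidates = self._buckets.get(object_class, [])
        scored = sorted(
            ((euclidean(feature, f), item_id) for item_id, f in candidates),
            key=lambda pair: pair[0],
        )
        return [item_id for _, item_id in scored[:top_k]]


if __name__ == "__main__":
    index = ClassFilteredIndex()
    index.add("handbag", "h1", [0.0, 0.0])
    index.add("handbag", "h2", [1.0, 1.0])
    index.add("shoes", "s1", [0.1, 0.1])
    print(index.query("handbag", [0.9, 0.9]))
```

Because a query only touches one bucket, the work per query shrinks roughly with the share of the database that belongs to that class.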


Query system GUI


Query examples on image dataset


Query examples on video dataset


Conclusion

• Bounding boxes are important for recognizing objects

• Fusing same-style matching features with same-class matching features gives higher accuracy

• PCA and LDA further improve accuracy and speed

• GPU is faster for CNN feature extraction

• Speed up query by parallel indexing and hierarchical filtering


References

• Erhan, Dumitru, et al. "Scalable object detection using deep neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.

• Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in Neural Information Processing Systems. 2015.

• Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.

• Arandjelović, Relja, and Andrew Zisserman. "Three things everyone should know to improve object retrieval." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2012.

• Jonathan Long, Evan Shelhamer and Trevor Darrell, Fully Convolutional Networks for Semantic Segmentation, CVPR, 2015, arXiv:1411.4038

• S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang and P. Torr, Conditional Random Fields as Recurrent Neural Networks, ICCV, 2015

• Li Shen, Zhouchen Lin and Qingming Huang, Learning deep convolutional neural networks for places2 scene recognition, CoRR (arXiv), 2015

• Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba and Aude Oliva, Learning Deep Features for Scene Recognition using Places Database, NIPS, 2014

• Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva and Antonio Torralba, Object detectors emerge in deep scene cnns, ICLR, 2015

• Ruobing Wu, Baoyuan Wang, Wenping Wang and Yizhou Yu, Harvesting discriminative meta objects with deep CNN features for Scene Classification, ICCV, 2015

• Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens and Zbigniew Wojna, Rethinking the Inception Architecture for Computer Vision, arXiv:1512.00567, 2015
