CNN Based Object Detection in Large Video Images · 2016. 4. 12.
Outline
• Introduction
- Background
- Challenge
• Our approach
- System framework
- Object detection
- Scene recognition
- Body segmentation
- Same style matching
• Experiments
• Conclusion
Background
• Image retrieval
• Video advertising
Video out applications
Challenge
• Real video data vs. image dataset
- Cluttered background
- Multiple objects
- Small objects
- Varying pose/position
- Partial occlusion
Our task
• Problems:
- Content-based object retrieval in large video images
- High accuracy for same style matching
- High speed on a large video database
• Solution:
- Accurate object detection + scene classification
- Discriminative DNN features and PCA/LDA transformation
- Speed-up by parallel indexing and hierarchical filtering
System framework
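[Figure: system framework diagram with an indexing path and a query path]
• Indexing path blocks: video key frame, scene classification, object detection, body segmentation, CNN feature, indexing database
• Query path blocks: query image, Faster-RCNN rect, body segmentation, scene classification, CNN feature, match, distance sort, result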
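The query path ends with "match" and "distance sort". A minimal sketch of that last step follows. It assumes features are plain lists of floats and that Euclidean distance is the metric, which the slides do not state. The function names and sample entries are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_distance(query_feature, database, top_k=3):
    """Return the top_k (frame_id, distance) pairs, nearest first.

    database: list of (frame_id, feature) pairs (an assumed layout).
    """
    scored = [(frame_id, euclidean(query_feature, feat))
              for frame_id, feat in database]
    scored.sort(key=lambda pair: pair[1])  # the "distance sort" step
    return scored[:top_k]

# Hypothetical two-dimensional features standing in for CNN features.
database = [
    ("frame_a", [0.0, 1.0]),
    ("frame_b", [1.0, 0.0]),
    ("frame_c", [0.9, 0.1]),
]
results = rank_by_distance([1.0, 0.0], database, top_k=2)
print(results)
```

A brute-force scan like this is only practical for small databases. The slides use FLANN tree indexing for the full-scale system.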
Object detection (I)
• Object detection by Faster R-CNN
- Faster R-CNN: region proposals + object scores [Ren, Shaoqing, et al. NIPS 2015]
- Trained on the MS COCO dataset (300k images) + video images (10k images)
- More general, suited to images with multiple objects
• Multi-class object detection, including:
- Clothes (skirt, jacket, trousers)
- Bags (handbag, backpack, draw-bar box)
- Electronics (mobile, laptop, TV, keyboard, mouse, microwave oven, oven, refrigerator)
- Glasses, necklace, hat
- Shoes
Object detection (II)
• Object detection by CNN regression
- Input an image, output the coordinates of the object rectangle [Erhan, Dumitru, et al. CVPR 2014]
- Efficient for single-object images in which Faster R-CNN does not recognize the object
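One way to read the two detection slides is as a fallback: use the Faster R-CNN detections when there are confident ones, otherwise use the rectangle from the CNN regressor. The sketch below illustrates that reading only. The slides do not describe the combination logic, and the dictionary layout, `min_score` threshold, and function name are all hypothetical.

```python
def select_boxes(detections, regressed_rect, min_score=0.5):
    """Pick object rectangles for one image.

    detections: list of dicts with "rect", "score", "label" keys, as a
        multi-object detector might return (assumed layout).
    regressed_rect: (x1, y1, x2, y2) from a single-object box regressor,
        or None if no regressor output is available.
    """
    confident = [d for d in detections if d["score"] >= min_score]
    if confident:
        return confident
    if regressed_rect is not None:
        # Fallback: the detector found nothing confident, so use the regressor.
        return [{"rect": regressed_rect, "score": None,
                 "label": None, "source": "regression"}]
    return []

# Hypothetical example: the detector returns only a weak detection.
weak = [{"rect": (10, 10, 50, 50), "score": 0.2, "label": "handbag"}]
chosen = select_boxes(weak, regressed_rect=(5, 8, 60, 70))
print(chosen)
```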
Body Segmentation
• Constrained by human body parts
- CNN based body segmentation [Jonathan Long, CVPR 2015]
- Bounding box, body mask, body parsing
[Figure: original image and segmentation image]
Scene classification
• CNN based Scene classification [Bolei Zhou, NIPS2014]
[Figure: scene classification pipeline: video key frame → Is scene? (yes/no) → CNN based scene classification → tags, with multi-frame fusion. Example images: non-scene images; scene images of kitchen, office, living room, and bedroom]

| Setting | Precision | Recall |
|---|---|---|
| Scene classification | 65.8% | 74% |
| [email protected] | 83.8% | 56.7% |
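The slide mentions multi-frame fusion but does not say how frames are combined. A minimal sketch follows, assuming the per-frame class scores of neighboring key frames are averaged and the highest-scoring label is kept. The function name and the sample scores are hypothetical.

```python
def fuse_scene_scores(frame_scores):
    """Average per-frame scene scores and return (best_label, averaged_scores).

    frame_scores: non-empty list of dicts mapping scene label -> score,
        one dict per key frame (assumed layout).
    """
    if not frame_scores:
        raise ValueError("need at least one frame")
    totals = {}
    for scores in frame_scores:
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + score
    averaged = {label: total / len(frame_scores)
                for label, total in totals.items()}
    best = max(averaged, key=averaged.get)
    return best, averaged

# Hypothetical scores for three consecutive key frames of one shot.
frames = [
    {"kitchen": 0.6, "office": 0.4},
    {"kitchen": 0.3, "office": 0.7},
    {"kitchen": 0.8, "office": 0.2},
]
best_label, fused = fuse_scene_scores(frames)
print(best_label, fused)
```

Averaging lets one misclassified frame (the second one here) be outvoted by its neighbors.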
Scene classes
• 0 kitchen • 1 dining • 2 bakery • 3 ice_cream_parlor • 4 bathroom • 5 washing_room • 6 bedroom • 7 living_room • 8 office • 9 children_room • 10 nursery • 11 toyshop • 12 shoe_shop • 13 jewelry_shop
• 14 outdoor_ice_world • 15 indoor_ice_skating_rink • 16 baseball • 17 football • 18 basketball_court • 19 swimming_pool • 20 track • 21 bowling_alley • 22 billiards • 23 tennis • 24 volleyball • 25 gymnasium • 26 pleasure_ground • 27 hospital_room
• 28 dentists • 29 drugstore • 30 music_studio • 31 music_store • 32 sandbeach • 33 hairsalon • 34 bar • 35 pagoda • 36 bamboo_forest • 37 mountain • 38 coast • 39 creek • 40 waterfall • 41 grass • 42 other
Same style matching
• SIFT feature matching
- Normalization of SIFT
- Dimension: 128 dim × 400 pts
- MAP: 22%
• CNN feature of ImageNet 1k classifier
- Model: VGG19
- Layer: fc7
- Dimension: 4096 → 600
- MAP: 28%
• CNN feature of same style classifier
- Model: VGG19
- Layer: fc7
- Dimension: 4096 → 600
- MAP: 34%
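The comparisons above are reported as MAP (mean average precision). The slides do not say exactly how it was computed. The sketch below uses the common definition, and assumes average precision is normalized by the number of correct matches that appear in the ranked list. The function names and sample rankings are hypothetical.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance: list of booleans in rank order, True where the
        retrieved item is a correct match.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(all_rankings):
    """Mean of the per-query average precision values."""
    return sum(average_precision(r) for r in all_rankings) / len(all_rankings)

# Two hypothetical queries with their ranked results marked correct or not.
rankings = [
    [True, False, True],   # AP = (1/1 + 2/3) / 2
    [False, True],         # AP = (1/2) / 1
]
map_score = mean_average_precision(rankings)
print(round(map_score, 4))
```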
Multi-feature fusion
• Same class matching classifier trained on ImageNet 21k classes (15M images)
• Same style matching classifier trained on 1239 queries (1M images)
• Speed
- Nvidia K40 GPU, 10x faster than an i7 CPU
- Faster R-CNN speed: 200 ms/frame, image size 1920x1080
- VGG19 feature speed: 60 ms/frame, image size 256x256

| CNN models | Feature dim | MAP |
|---|---|---|
| Inception_bn1k | 1024 | 24% |
| Inception_21k | 1024 | 34% |
| Vgg19_caffe | 4096 | 34% |
| Inception_21k + vgg19_caffe | 5120 | 43% |
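The fused feature dimension (5120 = 1024 + 4096) suggests the two feature vectors are concatenated. A minimal sketch of that step follows. The L2 normalization before concatenation is an assumption, added so that neither model dominates the distance. The function names and dummy vectors are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def fuse_features(*features):
    """Concatenate L2-normalized feature vectors into one fused vector."""
    fused = []
    for feat in features:
        fused.extend(l2_normalize(feat))
    return fused

# Dummy vectors with the dimensions listed in the table.
inception_feat = [1.0] * 1024   # stands in for an Inception_21k feature
vgg_feat = [2.0] * 4096         # stands in for a VGG19 fc7 feature
fused = fuse_features(inception_feat, vgg_feat)
print(len(fused))  # 5120
```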
Experiments
• MAP on 3M test images, trained on 1M images

| VGG19 model | Full image | Object rectangle | PCA+LDA | Inception-21k | MAP |
|---|---|---|---|---|---|
| √ | √ | × | × | × | 27.8% |
| √ | × | √ | × | × | 34.2% |
| √ | × | √ | √ | × | 37.3% |
| √ | × | √ | × | √ | 43.1% |
| √ | × | √ | √ | √ | 46.1% |

• Speed-up
- Parallel FLANN tree indexing
- Hierarchical filtering by object classes, 10x faster
- Query speed: 1 s/image on 5000 teleplays with 2M images
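Hierarchical filtering by object class can be sketched as a two-level lookup: first pick the bucket for the query's detected class, then search only inside that bucket. The sketch below uses a brute-force scan within the bucket as a stand-in for the FLANN tree index named on the slide. The data layout and function names are hypothetical.

```python
import math
from collections import defaultdict

def build_class_index(entries):
    """Group (object_class, frame_id, feature) entries into per-class buckets."""
    index = defaultdict(list)
    for object_class, frame_id, feature in entries:
        index[object_class].append((frame_id, feature))
    return index

def query_index(index, object_class, query_feature, top_k=5):
    """Search only the bucket of the query's class, nearest first."""
    candidates = index.get(object_class, [])  # level 1: filter by class
    scored = []
    for frame_id, feature in candidates:      # level 2: distance within bucket
        dist = math.sqrt(sum((q - f) ** 2
                             for q, f in zip(query_feature, feature)))
        scored.append((frame_id, dist))
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]

# Hypothetical database entries with two-dimensional features.
entries = [
    ("handbag", "f1", [0.0, 0.0]),
    ("handbag", "f2", [1.0, 1.0]),
    ("shoes", "f3", [0.9, 0.9]),
]
index = build_class_index(entries)
hits = query_index(index, "handbag", [0.9, 0.9])
print(hits)
```

The "shoes" entry is never compared even though it is the nearest feature overall. Skipping other classes this way is where the speed-up comes from, at the cost of depending on correct detection labels.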
Query system GUI
Query examples on image dataset
Query examples on video dataset
Conclusion
• The object bounding box is important for recognition (object rectangle vs. full image: 34.2% vs. 27.8% MAP)
• Fusing same style matching features with same class matching features gives higher accuracy
• PCA and LDA further improve accuracy and speed
• The GPU (Nvidia K40) is about 10x faster than an i7 CPU for CNN feature extraction
• Parallel indexing and hierarchical filtering speed up queries
References
• Dumitru Erhan, et al. "Scalable Object Detection Using Deep Neural Networks." CVPR, 2014.
• Shaoqing Ren, et al. "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks." NIPS, 2015.
• Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks." NIPS, 2012.
• Relja Arandjelović and Andrew Zisserman. "Three Things Everyone Should Know to Improve Object Retrieval." CVPR, 2012.
• Jonathan Long, Evan Shelhamer, and Trevor Darrell. "Fully Convolutional Networks for Semantic Segmentation." CVPR, 2015. arXiv:1411.4038.
• S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. Torr. "Conditional Random Fields as Recurrent Neural Networks." ICCV, 2015.
• Li Shen, Zhouchen Lin, and Qingming Huang. "Learning Deep Convolutional Neural Networks for Places2 Scene Recognition." CoRR (arXiv), 2015.
• Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. "Learning Deep Features for Scene Recognition Using Places Database." NIPS, 2014.
• Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object Detectors Emerge in Deep Scene CNNs." ICLR, 2015.
• Ruobing Wu, Baoyuan Wang, Wenping Wang, and Yizhou Yu. "Harvesting Discriminative Meta Objects with Deep CNN Features for Scene Classification." ICCV, 2015.
• Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. "Rethinking the Inception Architecture for Computer Vision." arXiv:1512.00567, 2015.