relation modeling for computer visionvalser.org/webinar/slide/slides/20190703/relation...jul 03,...
TRANSCRIPT
![Page 1: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/1.jpg)
Relation Networks for Visual Modeling
Han Hu
Visual Computing Group
Microsoft Research Asia (MSRA)
Collaborators: Zheng Zhang, Yue Cao, Jiayuan Gu, Jiarui Xu, Jifeng Dai, Yichen Wei, Stephen Lin, Liwei Wang, Zhenda Xie, Fangyun Wei
![Page 2: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/2.jpg)
Human Brain
• Human cortex can universally perceive different senses
figure credit to J. Sharma et al.
![Page 3: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/3.jpg)
Intelligent Machines
• A universal learning pipeline
data prediction
Loss(prediction, label)back-propagation
![Page 4: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/4.jpg)
Intelligent Machines
• Particular basic model for different task/data
convolutionLSTM, GRU, convolution, self-attention, …
graph networks
![Page 5: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/5.jpg)
Universal Basic Models for Intelligent Machines?
![Page 6: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/6.jpg)
Relation Networks: Towards Universal Basic Models
graph neural networks (self)-attention
query embed key embed
“player”
similarity set
value embed
feat set
weighted sumout feat for
“player”
left figure credit to P. Battaglia et al.
similar things: graph neural networks, self-attention, …
![Page 7: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/7.jpg)
Relation Networks for Graph Data
T. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. ICLR 2018
![Page 8: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/8.jpg)
Relation Networks for NLP
A. Vaswani and et al. Attention Is All You Need. NIPS 2017
![Page 9: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/9.jpg)
Relation Networks for Visual Modeling
pixel-pixel object-pixel object-object
ConvolutionVariants
RoIAlign
Relation Networks
None
Relation Networks
Relation Networks
our study timeline
![Page 10: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/10.jpg)
Object-Object Relation Modeling
Han Hu*, Jiayuan Gu*, Zheng Zhang*, Jifeng Dai and Yichen Wei. Relation Networks for Object Detection. CVPR 2018
![Page 11: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/11.jpg)
Object-Object Relation Modeling
It is much easier to detect the glove if we know there is a baseball player.
Han Hu*, Jiayuan Gu*, Zheng Zhang*, Jifeng Dai and Yichen Wei. Relation Networks for Object Detection. CVPR 2018
![Page 12: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/12.jpg)
Object Relation Module
relation relation relation
concat
…
input
relationoutput
Plug-and-Play
Parallel, learnable, no additional supervision, translational invariant, stackable
(d-dim)
(d-dim)
Key FeatureRelative Geometric Term
Multiple Relation Branches
Shortcut
![Page 13: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/13.jpg)
OR
M
The First Fully End-to-End Object Detector
Learnableduplicate removal
FC
FC
CO
NV
sJoint instance
recognition
OR
M
OR
M
× 𝑟1 × 𝑟2
back propagation steps
S. Ren et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. NIPS 2015
![Page 14: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/14.jpg)
Results on COCO Object Detection
*Faster R-CNN with ResNet-101 model are used (evaluation on minival/test-dev are reported)
+3.0 mAP
+2.0 mAP
+1.0 mAP
• less than 10% computation overhead on all backbones
![Page 15: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/15.jpg)
Object Pairs with High Relation Weights
reference object other objects contributing high weights
instance recognition duplicate removal
![Page 16: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/16.jpg)
Class Co-Occurrence Information is Learnt
Class Co-occurrence Probability Learnt Attentional Weights
𝑟 = 0.90
![Page 17: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/17.jpg)
Extension: Spatial-Temporal Object Relation
Jiarui Xu, Yue Cao, Zheng Zhang and Han Hu. Spatial-Temporal Relation Networks for Multi-Object Tracking. Tech Report 2018
![Page 18: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/18.jpg)
Learnable Object-Pixel Relation (vs. RoIAlign)
Geometric
Appearance
Image Feature to Region Feature
Jiayuan Gu, Han Hu, Liwei Wang, Yichen Wei and Jifeng Dai. Learning Region Features for Object Detection. ECCV 2018
![Page 19: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/19.jpg)
Pixel-Pixel Relation Modeling
convolution ConvNets
![Page 20: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/20.jpg)
Question I: Can We Go Beyond Convolution?
Can we model the patterns by one channel?
convolution = template matching
Han Hu, Zheng Zhang, Zhenda Xie and Stephen Lin. Local Relation Networks for Visual Recognition. Tech Report 2019
template -> compose
![Page 21: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/21.jpg)
Related Works: Capsule Networks
Increase Increasedecrease decrease
• Not aligned well with modern learning infrastructure
Figure credit by Aurélien Géron S. Sabour et al. Dynamic Routing Between Capsules. NIPS2017
![Page 22: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/22.jpg)
Related Works: Non-Local Neural Networks
X. Wang et al. Non-local Neural Networks. CVPR2018
• Complementary to ConvNets
![Page 23: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/23.jpg)
Beyond Convolution: Local Relation Layer
geometric composability (prior)
deeper
= relation network + locality + geometric prior + scalar key/query
appearance composability
deeper
![Page 24: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/24.jpg)
Local Relation Network (LR-Net)
Totally convolution free!
![Page 25: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/25.jpg)
Classification on ImageNet (26 Layers)
![Page 26: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/26.jpg)
Robust to Adversarial Attacks
![Page 27: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/27.jpg)
Question II: Do Non-local Networks Work Well Due to Relation Learning?
Yue Cao*, Jiarui Xu*, Stephen Lin, Fangyun Wei and Han Hu. GCNet: Non-local Networks meet SE-Net and Beyond. Tech Report 2019
attention maps for different query pixels
![Page 28: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/28.jpg)
Explicit Query-Independent Attention Map
• Simplified Non-Local Blocks
The same accurate but significantly reducing computation!
![Page 29: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/29.jpg)
Meet SE-Net (2017 ImageNet Champion)
![Page 30: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/30.jpg)
Abstraction and New Instantiation
new instantiation
1. Global Attention Pooling (Simplified NL-Net)
2. Bottleneck transform (SE-Net)
3. Addition (Simplified NL-Net)
![Page 31: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/31.jpg)
COCO Object Detection Results
• Baseline: Mask R-CNN + ResNet50 + FPN
method AP (bbox) AP (mask) #param FLOPs
baseline 37.2 33.8 44.4M 279.4G
NL-Net 38.0 34.7 46.5M 288.7G
SE-Net 38.2 34.7 46.9M 279.5G
GC-Net 39.4 35.7 46.9M 279.6G
![Page 32: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/32.jpg)
Discussion: versus Deformable ConvNets
• Both can model content aware adaptiveness
• Verification vs. Regression
• Generality (arbitrary vs. grid)
• Partly complementary
[1] Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu and Yichen Wei. Deformable Convolutional Networks. In ICCV 2017.[2] Xizhou Zhu, Han Hu, Stephen Lin and Jifeng Dai. Deformable ConvNets v2: More Deformable, Better Results. In CVPR 2019.[3] Ze Yang, Shaohui Liu, Han Hu, Liwei Wang and Stephen Lin. RepPoints: Point Set Representation for Object Detection. Tech Report.
relation networks deformable conv
![Page 33: Relation Modeling for Computer Visionvalser.org/webinar/slide/slides/20190703/relation...Jul 03, 2019 · Relation Networks for Visual Modeling Han Hu Visual Computing Group Microsoft](https://reader031.vdocuments.site/reader031/viewer/2022011921/602bbc8f4baefb455f14171a/html5/thumbnails/33.jpg)
Thanks!pixel-pixel object-pixel object-object
ConvolutionVariants
RoIAlign
Relation Networks
None
Relation Networks
Relation Networks
Relation Network is All You Need for AI——SkyNet