![Page 1: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/1.jpg)
Deep Learning in Higher Dimensions
Prof. Leal-Taixé and Prof. Niessner 1
![Page 2: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/2.jpg)
Multi-Dimensional ConvNets• 1D ConvNets
– Audio / Speech– Also Point Clouds
• 2D ConvNets– Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..)
• 3D ConvNets– For videos– For 3D data
• 4D ConvNets– E.g., dynamic 3D data (Haven’t seen much work there)– Simulations
Prof. Leal-Taixé and Prof. Niessner 2
![Page 3: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/3.jpg)
Remember: 1D Convolutions4 3 2 -5 3 5 2 5 5 6
1/3 1/3 1/3
3
𝑓
𝑔
𝑓 ∗ 𝑔
4 ⋅1
3+ 3 ⋅
1
3+ 2 ⋅
1
3= 3
Prof. Leal-Taixé and Prof. Niessner 3
![Page 4: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/4.jpg)
Remember: 1D Convolutions4 3 2 -5 3 5 2 5 5 6
1/3 1/3 1/3
3 0
𝑓
𝑔
𝑓 ∗ 𝑔
3 ⋅1
3+ 2 ⋅
1
3+ (−5) ⋅
1
3= 0
Prof. Leal-Taixé and Prof. Niessner 4
![Page 5: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/5.jpg)
Remember: 1D Convolutions4 3 2 -5 3 5 2 5 5 6
1/3 1/3 1/3
3 0 0
𝑓
𝑔
𝑓 ∗ 𝑔
2 ⋅1
3+ (−5) ⋅
1
3+ 3 ⋅
1
3= 0
Prof. Leal-Taixé and Prof. Niessner 5
![Page 6: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/6.jpg)
Remember: 1D Convolutions4 3 2 -5 3 5 2 5 5 6
1/3 1/3 1/3
3 0 0 1
𝑓
𝑔
𝑓 ∗ 𝑔
(−5) ⋅1
3+ 3 ⋅
1
3+ 5 ⋅
1
3= 1
Prof. Leal-Taixé and Prof. Niessner 6
![Page 7: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/7.jpg)
Remember: 1D Convolutions4 3 2 -5 3 5 2 5 5 6
1/3 1/3 1/3
3 0 0 1 10/3
𝑓
𝑔
𝑓 ∗ 𝑔
3 ⋅1
3+ 5 ⋅
1
3+ 2 ⋅
1
3=10
3
Prof. Leal-Taixé and Prof. Niessner 7
![Page 8: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/8.jpg)
Remember: 1D Convolutions4 3 2 -5 3 5 2 5 5 6
1/3 1/3 1/3
3 0 0 1 10/3 4
𝑓
𝑔
𝑓 ∗ 𝑔
5 ⋅1
3+ 2 ⋅
1
3+ 5 ⋅
1
3= 4
Prof. Leal-Taixé and Prof. Niessner 8
![Page 9: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/9.jpg)
Remember: 1D Convolutions4 3 2 -5 3 5 2 5 5 6
1/3 1/3 1/3
3 0 0 1 10/3 4 4
𝑓
𝑔
𝑓 ∗ 𝑔
2 ⋅1
3+ 5 ⋅
1
3+ 5 ⋅
1
3= 4
Prof. Leal-Taixé and Prof. Niessner 9
![Page 10: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/10.jpg)
Remember: 1D Convolutions4 3 2 -5 3 5 2 5 5 6
1/3 1/3 1/3
3 0 0 1 10/3 4 4 16/3
𝑓
𝑔
𝑓 ∗ 𝑔
5 ⋅1
3+ 5 ⋅
1
3+ 6 ⋅
1
3=16
3
Prof. Leal-Taixé and Prof. Niessner 10
![Page 11: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/11.jpg)
1D ConvNets: WaveNet
[van der Ooord 16] https://deepmind.com/blog/wavenet-generative-model-raw-audio/
![Page 12: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/12.jpg)
1D ConvNets: WaveNet
[van der Ooord 16] https://deepmind.com/blog/wavenet-generative-model-raw-audio/
![Page 13: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/13.jpg)
3D Classification
Class from 3D model (e.g., obtained with Kinect Scan)
[Maturana et al. 15] & [Qi et al. 16] 3D vs Multi-view
![Page 14: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/14.jpg)
3D Semantic Segmentation
[Dai et al. 17] ScanNet
1500 densely annotated 3D scans; 2.5 mio RGB-D frames
![Page 15: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/15.jpg)
Volumetric Grids
![Page 16: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/16.jpg)
Volumetric GridsVolumetric Data Structures
– Occupancy grids– Ternary grids– Distance Fields– Signed Distance fields
(binary) Voxel Grid
Shape completion error (higher == better)
![Page 17: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/17.jpg)
3D Shape Completion on Grids
[Dai et al. 17] CNNCompleteWorks with 32 x 32 x 32 voxels…
![Page 18: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/18.jpg)
ScanNet: Semantic Segmentation in 3D
[Dai et al. 17] ScanNet
![Page 19: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/19.jpg)
ScanNet: Sliding Window
[Dai et al. 17] ScanNet
![Page 20: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/20.jpg)
SurfaceNet: Stereo Reconstruction
Run on 32 x 32 x 32 blocks -> takes forever…
[Ji et al. 17] SurfaceNet
![Page 21: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/21.jpg)
ScanComplete: Fully ConvolutionalTrain on crops of scenes
Test on entire scenes
[Dai et al. 18] ScanComplete
![Page 22: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/22.jpg)
[Dai et al. 18] ScanComplete
Dependent Predictions:Autoregressive Neural Networks
Prof. Leal-Taixé and Prof. Niessner
![Page 23: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/23.jpg)
[Dai et al. 18] ScanComplete
Spatial Extent: Coarse-to-Fine Predictions
Prof. Leal-Taixé and Prof. Niessner
![Page 24: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/24.jpg)
ScanComplete: Fully Convolutional
[Dai et al. 18] ScanComplete
Input Partial Scan Completed Scan
![Page 25: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/25.jpg)
Conclusion so far
• Volumetric Grids are easy– Encode free space– Encode distance fields– Need a lot of memory– Need a lot of processing time– But can be used sliding window or fully-conv.
![Page 26: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/26.jpg)
Conclusion so far
Surface occupancy gets smaller with higher resolutions
![Page 27: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/27.jpg)
Volumetric Hierarchies
![Page 28: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/28.jpg)
Discriminative TasksStructure is known in advance!
OctNet: Learning Deep 3D Representations at High Resolutions (CVPR 2017)O-CNN: Octree-based Convolutional Neural Networks for 3D Shape Analysis (SIG17)
State of the art is somewhere here…
![Page 29: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/29.jpg)
Generative TasksNeed to infer structure!
Octree Generating Networks: Efficient Convolutional Architectures for High-resolution OutputsOctNetFusion: Learning Depth Fusion from Data (that one not end to end)
Pretty interesting: they have end-to-end method: i.e.,split voxels that are partially occupied
![Page 30: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/30.jpg)
Conclusion so far
• Hierarchies– are great for reducing memory and runtime– Comes at a performance hit– Easier for discriminative tasks when structure is known
![Page 31: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/31.jpg)
Multi-view
![Page 32: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/32.jpg)
Multiple Views: Classification• RGB images from fixed views around object:
– view pooling for classification (only RGB; no spatial corr. )
Multi-view Convolutional Neural Networks for 3D Shape Recognition
![Page 33: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/33.jpg)
Multiple Views: Segmentation
3D Shape Segmentation with Projective Convolutional NetworksThis one is interesting in a sense that it does 3D shape segmentation (only on CAD models)But it uses multi-view and has a spatial correlation on top of the mesh surface
![Page 34: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/34.jpg)
Fun thing…
Volumetric and Multi-View CNNs for Object Classification on 3D Data
![Page 35: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/35.jpg)
Hybrid: Volumetric + Multi-view
![Page 36: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/36.jpg)
avg class accuracy
geometry only 54.4
geometry + voxel colors 55.9
2D + 3D Semantic Segmentation
Resolution Mismatch!
[Dai & Niessner 18] 3DMV
![Page 37: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/37.jpg)
3D Volumetric + Multi-view
[Dai & Niessner 18] 3DMV
![Page 38: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/38.jpg)
3D Volumetric + Multi-viewC
olo
r In
put
Geom
etr
y In
put
class labels
[Dai & Niessner 18] 3DMV
![Page 39: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/39.jpg)
3D Volumetric + Multi-view
[Dai & Niessner 18] 3DMV
![Page 40: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/40.jpg)
avg class accuracy
color only 58.2
geometry only 54.4
color+geometry 75.0
3D Volumetric + Multi-view
[Dai & Niessner 18] 3DMV
![Page 41: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/41.jpg)
avg class accuracy
color only 58.2
geometry only 54.4
color+geometry 75.0
3D Volumetric + Multi-view
[Dai & Niessner 18] 3DMV
![Page 42: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/42.jpg)
avg class accuracy
geometry only 54.4
color+geometry (1 views) 70.1
color+geometry (3 views) 73.0
color+geometry (5 views) 75.0
3D Volumetric + Multi-view
[Dai & Niessner 18] 3DMV
![Page 43: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/43.jpg)
3D Volumetric + Multi-view
…
[Dai & Niessner 18] 3DMV
![Page 44: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/44.jpg)
Conclusion so far
• Hybrid:– Nice way to combine color and geometry– Great performance (best so far for segmentation)– End-to-end helps less than we hoped for– Could be faster…
![Page 45: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/45.jpg)
Point Clouds
![Page 46: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/46.jpg)
DeepLearning on Point Clouds: PointNet
[Qi et al. 17] PointNet
![Page 47: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/47.jpg)
DeepLearning on Point Clouds: PointNet
[Qi et al. 17] PointNet
![Page 48: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/48.jpg)
PointNet++Main idea
● Learn hierarchical representation of point cloud● Apply multiple (simplified) PointNets at different locations and scales● Each Scale: Furthest-Point Sampling -> Query Ball Grouping -> PointNet● Multi-scale or Multi-resolution grouping for sampling density robustness
Evaluations: Classification, Part-Segmentation, Scene-Segmentation
[Qi et al. 17] PointNet++
![Page 49: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/49.jpg)
Point ConvolutionsMain idea
● Transform points to continuous R3representation (RBFs)
● Convolve in R3● Restrict results to points
Uses Gaussian RBF representation.
Boils down to computing fixed weights forconvolution.
Don’t use real data as far as I know!
Point Convolutional NN by Extension OperatorsMatan Atzmon, Haggai Maron, Yaron Lipman (SIGGRAPH 2018)
![Page 50: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/50.jpg)
Conclusions so far
• PointNet variants:– Train super fast (also testing)– Can cover large spaces in one shot– Cannot represent free space– Performance (mostly) worse than pure volumetric– Still lots of ongoing research!
![Page 51: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/51.jpg)
Point Sets (global)Unordered point set
PointNet (CVPR 2017)
Hierarchy of point setsPointNet++: Deep Hierarchical Feature Learning on Point Sets in a ...(NIPS 2017)Generalized Convolutional Neural Networks for Point Cloud Data (ICMLA17)
Kd-treeEscape from Cells: Deep Kd-Networks (ICCV 2017)
PointCNNPointCNN (seems arxiv only)
![Page 52: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/52.jpg)
Point Sets (local)RBF
Point Convolutional NN by Extension Operators (SIGGRAPH 2018)Tangent Convolutions for Dense Prediction in 3D (CVPR 2018)
Nearest point neighborhoodsDynamic edge-conditioned filters in convolutional neural networks on graphs (CVPR17)3D Graph Neural Networks for RGBD Semantic Segmentation (ICCV17)PPFNet: Global context aware local features for robust 3d point matching (CVPR18)FeaStNet: Feature-Steered Graph Convolutions for 3D Shape Analysis (CVPR18)
Very interesting combination where convolutions are essentially over line segments in 3D, and where both locations and are being optimized https://arxiv.org/abs/1605.06240Idea is great, performance could be a bit better (probably hard to optimize)
![Page 53: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/53.jpg)
Mesh-based
![Page 54: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/54.jpg)
Convs on Meshes and Graphs• Lots of work by Michael Bronstein et al.
https://www.imperial.ac.uk/people/m.bronstein/publications.html
![Page 55: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/55.jpg)
Conclusion so far
• Meshes / Surfaces:– Needs some differential geometry approximation– Convolutions in DG space– I haven’t seen results on real-world data
• Probably prone to noise and incomplete scans
![Page 56: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/56.jpg)
Sparse Convolutions
![Page 57: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/57.jpg)
Sparse Convolutional Networks
Prof. Leal-Taixé and Prof. Niessner 58Submanifold Sparse Convolutional Networks, https://arxiv.org/abs/1706.01307 [Graham et al. 18]3D Semantic Segmentation with Submanifold Sparse Convolutional Networks
Regular, dense 3x3 Convolution-> set of actives (non-zeros) grows rapidly-> need a lot of memory-> takes a long time for feature prop.
![Page 58: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/58.jpg)
Sparse Convolutional Networks
Prof. Leal-Taixé and Prof. Niessner 59
Regular, dense 3x3 Convolution-> set of actives (non-zeros) grows rapidly-> need a lot of memory-> takes a long time for feature prop.
Submanifold Sparse Convolutional Networks, https://arxiv.org/abs/1706.01307 [Graham et al. 18]3D Semantic Segmentation with Submanifold Sparse Convolutional Networks
![Page 59: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/59.jpg)
Sparse Convolutional Networks
Prof. Leal-Taixé and Prof. Niessner 60Submanifold Sparse Convolutional Networks, https://arxiv.org/abs/1706.01307 [Graham et al. 18]3D Semantic Segmentation with Submanifold Sparse Convolutional Networks
Dense
Sparse
![Page 60: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/60.jpg)
Sparse Convolutional Networks
Prof. Leal-Taixé and Prof. Niessner 61Submanifold Sparse Convolutional Networks, https://arxiv.org/abs/1706.01307 [Graham et al. 18]3D Semantic Segmentation with Submanifold Sparse Convolutional Networks
Submanifold Sparse Conv:-> set of active sites is unchanged-> active sites look at active neighbors (green)-> non-active sites (red) have no overhead
![Page 61: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/61.jpg)
Sparse Convolutional Networks
Prof. Leal-Taixé and Prof. Niessner 62Submanifold Sparse Convolutional Networks, https://arxiv.org/abs/1706.01307 [Graham et al. 18]3D Semantic Segmentation with Submanifold Sparse Convolutional Networks
Submanifold Sparse Conv:-> disconnected components do not communicate at first-> although they will merge due to effect of stride, pooling, convs, etc.
from left: (i) an active point is highlighted; a convolution with stride 2 sees the green active sites (ii) and produces output (iii), 'children' of hightlighted active point from (i) are highlighted; a submanifold sparse convolution sees the green active sites (iv) and produces output (v); a deconvolution operation sees the green active sites (vi) and produces output (vii).
![Page 62: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/62.jpg)
Sparse Convolutional Networks
Prof. Leal-Taixé and Prof. Niessner 63Submanifold Sparse Convolutional Networks, https://arxiv.org/abs/1706.01307 [Graham et al. 18]3D Semantic Segmentation with Submanifold Sparse Convolutional Networks
https://github.com/facebookresearch/SparseConvNet
![Page 63: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/63.jpg)
Conclusions so far
• Spares (volumetric) Convs:– Implemented with spatial hash function– Features only around “surface”– Require significantly less memory– Allow for much higher resolutions– It’s slower, but much higher accuracy
![Page 64: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/64.jpg)
3D Scene Understanding
… http://www.scan-net.org/
Eval
uat
ion
on
Hid
den
Tes
t Se
t o
n S
can
Net
![Page 65: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/65.jpg)
Next Lectures
• This is the last lecture slot!
• Keep working on the projects!
• Research opportunities
Prof. Leal-Taixé and Prof. Niessner 66
![Page 66: Deep Learning in Higher Dimensions · – Audio / Speech – Also Point Clouds • 2D ConvNets – Images (AlexNet, VGG, ResNet -> Classification, Localization, etc..) • 3D ConvNets](https://reader036.vdocuments.site/reader036/viewer/2022071019/5fd33f8afc0d4f17c344c201/html5/thumbnails/66.jpg)
See you
Prof. Leal-Taixé and Prof. Niessner 67