
Deep Models for 3D Reconstruction

Andreas Geiger

Autonomous Vision Group, MPI for Intelligent Systems, Tübingen
Computer Vision and Geometry Group, ETH Zurich

October 12, 2017

3D Reconstruction

[Furukawa & Hernandez: Multi-View Stereo: A Tutorial]

Task:
- Given a set of 2D images
- Reconstruct the 3D shape of the object/scene

3D Reconstruction Pipeline

Input Images → Camera Poses → Dense Correspondences → Depth Maps → Depth Map Fusion → 3D Reconstruction

Large 3D Datasets and Repositories

[Newcombe et al., 2011] [Choi et al., 2011] [Dai et al., 2017]

[Wu et al., 2015] [Chang et al., 2015] [Chang et al., 2017]

Can we learn 3D Reconstruction from Data?

OctNet: Learning Deep 3D Representations at High Resolutions

[Riegler, Ulusoy, & Geiger, CVPR 2017]

Deep Learning in 2D

[LeCun, 1998]

Deep Learning in 3D

- Existing 3D networks are limited to ∼ 32³ voxels
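A rough back-of-the-envelope sketch shows why dense voxel grids hit a resolution ceiling. It assumes one float32 value per voxel and a single feature channel; real networks hold many channels and layers, so actual usage is far higher. The helper name `dense_grid_mib` is hypothetical.

```python
def dense_grid_mib(resolution: int, channels: int = 1, bytes_per_value: int = 4) -> float:
    """Memory in MiB of one dense feature volume of size resolution^3."""
    return resolution ** 3 * channels * bytes_per_value / 2 ** 20


# Each doubling of the resolution multiplies the memory by 8.
for r in (32, 64, 128, 256):
    print(f"{r}^3 voxels: {dense_grid_mib(r):8.3f} MiB per float32 channel")
```

The cubic growth is the point: going from 32³ to 256³ costs 512 times the memory for every activation volume.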

3D Data is often Sparse

[Geiger et al., 2012] [Li et al., 2016]

Can we exploit sparsity for efficient deep learning?

Network Activations

Layer 1: 32³, Layer 2: 16³, Layer 3: 8³

Idea:
- Partition space adaptively based on sparse input
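The adaptive partitioning idea can be sketched with a generic octree over a boolean occupancy grid. This is a minimal illustration, not necessarily the data structure used in OctNet; the function `count_leaves` and the toy input are assumptions.

```python
import numpy as np


def count_leaves(occ: np.ndarray) -> int:
    """Count leaf cells of an octree built over a cubic boolean grid.

    A cell becomes a leaf when it is a single voxel or when all of its
    voxels share the same value (all empty or all occupied); otherwise
    it is split into 8 octants.
    """
    n = occ.shape[0]
    if n == 1 or occ.all() or not occ.any():
        return 1
    h = n // 2
    return sum(
        count_leaves(occ[x:x + h, y:y + h, z:z + h])
        for x in (0, h) for y in (0, h) for z in (0, h)
    )


# Toy sparse input (an assumption for illustration): a 32^3 grid
# that is empty except for one occupied voxel.
grid = np.zeros((32, 32, 32), dtype=bool)
grid[5, 9, 20] = True

print("dense voxels:", grid.size)
print("octree leaves:", count_leaves(grid))
```

Large homogeneous regions collapse into single cells, so storage and computation concentrate where the input actually has structure.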

Convolution

- Differentiable ⇒ allows for end-to-end learning

Efficient Convolution

This operation can be implemented very efficiently:
- 4 different cases
- First case requires only 1 evaluation!
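The slide does not list the four cases. One plausible reading of the first case (an assumption) is a voxel whose whole kernel footprint lies inside a single constant-valued octree cell: the response is then the cell value times the sum of the kernel weights, so it can be computed once and reused. The sketch below checks this on a dense 8³ cell with a random 3×3×3 kernel; `conv_at` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3, 3))

# One octree cell of 8^3 voxels that all share a single value.
value = 0.7
cell = np.full((8, 8, 8), value)


def conv_at(vol: np.ndarray, k: np.ndarray, x: int, y: int, z: int) -> float:
    """Naive 3x3x3 correlation centred on voxel (x, y, z)."""
    return float(np.sum(vol[x - 1:x + 2, y - 1:y + 2, z - 1:z + 2] * k))


# Dense evaluation: one kernel application per interior voxel (6^3 of them).
dense = [conv_at(cell, kernel, x, y, z)
         for x in range(1, 7) for y in range(1, 7) for z in range(1, 7)]

# Shortcut: a single evaluation, reused for every interior voxel.
shortcut = value * kernel.sum()

print(len(dense), np.allclose(dense, shortcut))
```

Voxels near the cell boundary see neighbouring cells as well and would need separate handling, which is presumably what the remaining cases cover.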

Pooling

- Unpooling operation defined similarly
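As a dense stand-in for the octree operations, the sketch below shows 2×2×2 max pooling and its nearest-neighbour counterpart on a plain NumPy volume. The octree bookkeeping is omitted, and the function names are hypothetical.

```python
import numpy as np


def max_pool_2(vol: np.ndarray) -> np.ndarray:
    """2x2x2 max pooling with stride 2 on a cubic volume of even size."""
    n = vol.shape[0] // 2
    return vol.reshape(n, 2, n, 2, n, 2).max(axis=(1, 3, 5))


def unpool_2(vol: np.ndarray) -> np.ndarray:
    """Nearest-neighbour unpooling: copy each voxel into a 2x2x2 block."""
    return vol.repeat(2, axis=0).repeat(2, axis=1).repeat(2, axis=2)


vol = np.arange(4 ** 3, dtype=float).reshape(4, 4, 4)
pooled = max_pool_2(vol)        # shape (2, 2, 2)
restored = unpool_2(pooled)     # shape (4, 4, 4)
print(pooled.shape, restored.shape)
```

Pooling halves the resolution along each axis, and unpooling doubles it again without recovering the discarded detail.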

Results: 3D Shape Classification

[Figure: network diagram with labels Convolution and Pooling, Fully Connected; example class: Airplane]

[Plots over Input Resolution (8³, 16³, 32³, 64³, 128³, 256³): OctNet vs. DenseNet; OctNet 1 vs. OctNet 2 vs. OctNet 3]

- Input: voxelized meshes from ModelNet

Results: 3D Semantic Labeling

[Figure panels: Input, Prediction]

- Dataset: RueMonge2014

[Figure label: Unpooling and Conv.]

- Decoder octree structure copied from encoder

Method                          Score
[Riemenschneider et al., 2014]  42.3
[Martinovic et al., 2015]       52.2
[Gadde et al., 2016]            54.4
OctNet 64³                      45.6
OctNet 128³                     50.4
OctNet 256³                     59.2

OctNetFusion: Learning Depth Fusion from Data

[Riegler, Ulusoy, Bischof & Geiger, 3DV 2017]

Volumetric Fusion

$$d_{i+1}(p) = \frac{w_i(p)\, d_i(p) + w(p)\, d(p)}{w_i(p) + w(p)}$$

$$w_{i+1}(p) = w_i(p) + w(p)$$

- $p \in \mathbb{R}^3$: voxel location
- $d$: distance, $w$: weight

[Curless and Levoy, SIGGRAPH 1996]
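The running weighted average above can be sketched in a few lines of NumPy. The sketch assumes that the unsubscripted d(p) and w(p) belong to the incoming depth map, while d_i and w_i are the values accumulated so far. The function name `fuse`, the zero-weight handling, and the toy numbers are illustrative assumptions.

```python
import numpy as np


def fuse(d_i, w_i, d_new, w_new):
    """One running-average update of a distance volume.

    d_i, w_i     : accumulated distance and weight per voxel
    d_new, w_new : distance and weight of the incoming depth map
    Voxels with zero total weight keep their previous distance.
    """
    w_next = w_i + w_new
    safe = np.where(w_next > 0, w_next, 1.0)   # avoid division by zero
    d_next = np.where(w_next > 0, (w_i * d_i + w_new * d_new) / safe, d_i)
    return d_next, w_next


# Toy 1D "volume" of 4 voxels observed by two noisy views (made-up numbers).
d = np.zeros(4)
w = np.zeros(4)
views = [
    (np.array([0.10, -0.20, 0.30, 0.0]), np.array([1.0, 1.0, 1.0, 0.0])),
    (np.array([0.30, -0.40, 0.10, 0.0]), np.array([1.0, 1.0, 0.0, 0.0])),
]
for d_obs, w_obs in views:
    d, w = fuse(d, w, d_obs, w_obs)

print(d)
print(w)
```

Voxels seen by both views end up with the mean of the two observations, a voxel seen once keeps its single observation, and an unobserved voxel stays at weight zero.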

Volumetric Fusion

Pros:
- Simple, fast, easy to implement
- De facto "gold standard" (KinectFusion, Voxel Hashing, ...)

Cons:
- Requires many redundant views to reduce noise
- Cannot handle outliers or complete missing surfaces

TV-L1 Fusion

Pros:
- Prior on surface area
- Noise reduction

Cons:
- Simplistic local prior (penalizes surface area, shrinking bias)
- Cannot complete missing surfaces

Learned Fusion

Pros:
- Learn noise suppression from data
- Learn surface completion from data

Cons:
- Requires large 3D datasets for training
- How to scale to high resolutions?

[Figure panels: Ground Truth, Volumetric Fusion, TV-L1 Fusion, OctNetFusion]

Learning 3D Fusion

[Figure label: Unpooling and Conv.]

Input Representation:
- TSDF
- Higher-order statistics

Output Representation:
- Occupancy
- TSDF

What is the problem?
- Octree structure unknown ⇒ needs to be inferred as well!
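To make the two representations concrete, the sketch below builds a truncated signed distance function (TSDF) for a synthetic sphere and derives a binary occupancy grid from it. The truncation threshold `tau`, the sign convention (negative inside), and the helper names are assumptions for illustration.

```python
import numpy as np


def truncate_sdf(sdf: np.ndarray, tau: float) -> np.ndarray:
    """Clamp signed distances to [-tau, tau] to obtain a TSDF."""
    return np.clip(sdf, -tau, tau)


def occupancy_from_sdf(sdf: np.ndarray) -> np.ndarray:
    """Binary occupancy, assuming negative distances lie inside the object."""
    return sdf < 0


# Signed distance to a sphere of radius 10 voxels centred in a 32^3 grid.
idx = np.indices((32, 32, 32)).astype(float)
centre = 15.5
sdf = np.sqrt(((idx - centre) ** 2).sum(axis=0)) - 10.0

tsdf = truncate_sdf(sdf, tau=3.0)
occ = occupancy_from_sdf(tsdf)
print(tsdf.min(), tsdf.max(), occ.sum())
```

Truncation keeps distance information only in a narrow band around the surface, and occupancy discards it entirely in favour of an inside/outside label.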

OctNetFusion Architecture

[Figure labels: Input, Output, Features, Octree Structure; resolutions 64³, 128³, 256³; ∆128, ∆256]
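Since the output octree structure has to be inferred, some rule must decide which coarse cells get refined at the next resolution. The sketch below uses a hypothetical heuristic, not necessarily the rule used in OctNetFusion: split only the cells whose coarse TSDF value lies in a narrow band around zero. The names `split_mask` and `refine_count` and the planar toy surface are assumptions.

```python
import numpy as np


def split_mask(coarse_tsdf: np.ndarray, band: float) -> np.ndarray:
    """Mark coarse cells to refine: those whose |TSDF| is within `band`."""
    return np.abs(coarse_tsdf) < band


def refine_count(mask: np.ndarray) -> int:
    """Cells at the next level: 8 children per split cell, 1 per kept cell."""
    return int(8 * mask.sum() + (~mask).sum())


# Toy coarse prediction on a 16^3 grid: signed distance to the plane x = 7.5.
x = np.arange(16, dtype=float)
coarse = np.broadcast_to((x - 7.5)[:, None, None], (16, 16, 16))

mask = split_mask(coarse, band=1.0)
print("cells split:", int(mask.sum()))
print("cells after refinement:", refine_count(mask), "vs dense", 32 ** 3)
```

Only the two slabs of cells next to the plane are subdivided, so the refined level holds far fewer cells than a dense 32³ grid.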

Results: Surface Reconstruction

[Figure panels: VolFus, TV-L1, Ours, Ground Truth; resolution 64³]

Results: Volumetric Completion

[Figure panels: [Firman, 2016], Ours, Ground Truth]

Thank you!
