Learning to Generate Chairs, Tables and Cars with Convolutional Networks Alexey Dosovitskiy, Jost Tobias Springenberg, Maxim Tatarchenko, Thomas Brox Liu Jiang and Ian Tam

Post on 20-May-2020

Page 1: Tables and Cars with Convolutional Networksweb.stanford.edu/class/cs331b/2016/presentations/paper6.pdf · Tables and Cars with Convolutional Networks Alexey Dosovitskiy, Jost Tobias

Learning to Generate Chairs, Tables and Cars with Convolutional Networks

Alexey Dosovitskiy, Jost Tobias Springenberg, Maxim Tatarchenko, Thomas Brox

Liu Jiang and Ian Tam

Introduction and Related Work

Overview (Part 1)

● Goal: Using a dataset of 3D models (chairs, tables, and cars), train generative ‘up-convolutional’ neural networks that can generate realistic 2D projections of objects from high-level descriptions
  ○ Object style
  ○ Viewpoint
  ○ Additional transformation parameters (e.g. color and brightness)

Overview (Part 2)

● Networks do not merely memorize images but find a meaningful representation of 3D models, allowing them to:
  ○ Transfer knowledge within an object class
  ○ Transfer knowledge between classes
  ○ Interpolate between different objects within a class and between classes
  ○ Invent new objects not present in the training set

Related Work

● Train undirected graphical models, which treat encoding and generation as a joint inference problem
  ○ Deep Boltzmann Machines (DBMs)
  ○ Restricted Boltzmann Machines (RBMs)

● Train directed graphical models of the data distribution
  ○ Gaussian mixture models
  ○ Autoregressive models
  ○ Stochastic variations of neural networks

Previous Work vs. This Paper

● Previous work
  ○ Unsupervised generative models that can be extended to incorporate label information, forming semi-supervised models
  ○ Restricted to small models and images (maximum of 48 x 48 pixels)
  ○ Require an extensive inference procedure for both training and image generation

● This paper
  ○ Uses supervised learning and assumes a high-level latent representation of the images
  ○ Generates large, high-quality images of 128 x 128 pixels
  ○ Gives complete control over which images to generate; the downside is the need for labels that fully describe the appearance of each image

Network Architectures and Training

Network Architecture

● Targets are the RGB output image x and the segmentation mask s. The generative network g(c, v, θ) takes three input vectors:
  ○ c: model style
  ○ v: horizontal angle and elevation of the camera position
  ○ θ: parameters of additional transformations applied to the images

● Mostly generated 128 x 128 pixel images but also experimented with 64 x 64 and 256 x 256
  ○ The only difference between these architectures is one fewer or one more up-convolution
  ○ Adding a convolutional layer after each up-convolution increases the quality of generated images

2-Stream Network Architecture

FC - fully connected, unconv - unpooling+convolution

Build a shared, high-dimensional hidden representation

Generate an image and object segmentation mask
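The "unconv" building block named in the legend (unpooling followed by convolution) can be sketched in a few lines of NumPy. This is only an illustrative single-channel sketch: the 2x2 unpooling pattern (value in the top-left corner, zeros elsewhere), the helper names `unpool2x`, `conv2d_same` and `upconv`, and the all-ones kernel are assumptions made for the example, not details taken from the slides.

```python
import numpy as np

def unpool2x(x):
    """Upsample an (H, W) map to (2H, 2W).

    Assumed unpooling pattern: each value is written to the top-left
    corner of a 2x2 block and the remaining three entries stay zero.
    """
    h, w = x.shape
    out = np.zeros((2 * h, 2 * w), dtype=float)
    out[::2, ::2] = x
    return out

def conv2d_same(x, kernel):
    """Zero-padded 'same' 2D cross-correlation for odd-sized kernels."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((ph, ph), (pw, pw)))  # constant zero padding
    out = np.zeros(x.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def upconv(x, kernel):
    """Up-convolution sketch: unpool first, then convolve."""
    return conv2d_same(unpool2x(x), kernel)

feature_map = np.array([[1.0, 2.0], [3.0, 4.0]])
kernel = np.ones((3, 3))  # hypothetical filter; a trained network learns these
upsampled = upconv(feature_map, kernel)
print(upsampled.shape)  # (4, 4)
```

Each such step doubles the spatial resolution, which is consistent with the note that the 64 x 64 and 256 x 256 variants differ by one fewer or one more up-convolution.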

Network Training

Network parameters W are trained by minimizing error of reconstructing the segmented-out chair image and the segmentation mask.
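One way to read this objective is as a sum of two reconstruction errors, one on the segmented-out image and one on the mask. The sketch below assumes squared error for both terms and a weighting factor `lam`; both choices, and the function name `reconstruction_loss`, are assumptions for illustration rather than something stated on the slide.

```python
import numpy as np

def reconstruction_loss(pred_rgb, pred_mask, target_rgb, target_mask, lam=1.0):
    """Weighted sum of two squared-error terms (illustrative form).

    pred_rgb, target_rgb: (H, W, 3) arrays.
    pred_mask, target_mask: (H, W) arrays, target mask in {0, 1}.
    lam: hypothetical weight balancing the RGB term against the mask term.
    """
    # The RGB target is the segmented-out object: background pixels are zeroed.
    segmented = target_rgb * target_mask[..., None]
    rgb_err = np.sum((pred_rgb - segmented) ** 2)
    mask_err = np.sum((pred_mask - target_mask) ** 2)
    return lam * rgb_err + mask_err

target_rgb = np.ones((2, 2, 3))
target_mask = np.array([[1.0, 0.0], [0.0, 1.0]])
zero_rgb = np.zeros((2, 2, 3))
zero_mask = np.zeros((2, 2))
loss = reconstruction_loss(zero_rgb, zero_mask, target_rgb, target_mask)
```

In training, W would be updated by gradient descent on this quantity summed over the training set; that loop is omitted here.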

Qualitative results with different networks trained on chairs

Per-pixel mean squared error of generated images and # of parameters in expanding network parts

“1s-S-deep” network is best both qualitatively and quantitatively

Training Set Size and Data Augmentation

● Experimented with data augmentation while fixing the network architecture and varying the training set size
  ○ Effect of augmentation is qualitatively similar to increasing the training set size
  ○ Worse reconstruction of fine details but better generalization

Qualitative results for different numbers of car models in the training set

Interpolation between two car models
Top: without data augmentation
Bottom: with data augmentation

Key Experiments / Results

Modeling Transformations

Viewpoint Interpolation

Elevation Transfer / Extrapolation

● Network trained on both tables and chairs can transfer knowledge about elevations from table dataset to chair dataset and vice versa
● Training on both object classes forces network to model general 3D geometry

Style Interpolation

● Interpolation between feature/label input vectors
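Interpolating between label input vectors can be sketched as a linear blend of two style vectors. The example assumes one-hot style vectors and plain linear interpolation; the helper names `one_hot` and `interpolate_styles` and the toy sizes are hypothetical.

```python
import numpy as np

def one_hot(index, num_styles):
    """Style vector c for a single model (assumed one-hot encoding)."""
    c = np.zeros(num_styles)
    c[index] = 1.0
    return c

def interpolate_styles(c_a, c_b, num_steps):
    """Return num_steps style vectors moving linearly from c_a to c_b."""
    alphas = np.linspace(0.0, 1.0, num_steps)
    return np.stack([(1.0 - a) * c_a + a * c_b for a in alphas])

c_a = one_hot(0, 4)  # style of a first object
c_b = one_hot(2, 4)  # style of a second object
path = interpolate_styles(c_a, c_b, 5)
print(path[2])  # [0.5 0.  0.5 0. ]
```

Each row of `path` would be fed to the generator g(c, v, θ) with the viewpoint v and transformation parameters θ held fixed, producing one frame of the interpolation.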

Style Interpolation II

● Interpolation between multiple chairs

Feature Space Arithmetic

Correspondences

● Given two images from training set, generate style interpolations (of, say, 64 images) between the two
● Use refined optical flow between interpolations to determine correspondences between objects in the two images
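The chaining idea can be sketched as follows: if a flow field is available between each pair of consecutive interpolated images, a point in the first image can be followed step by step to the last one. The sketch assumes the flow fields are already given as arrays (the optical flow estimator and the refinement step mentioned above are not reproduced) and uses nearest-neighbor lookup; `track_points` is a hypothetical helper.

```python
import numpy as np

def track_points(points, flows):
    """Follow points through a chain of flow fields.

    points: (N, 2) array of (row, col) positions in the first image.
    flows: list of (H, W, 2) arrays; flows[k][r, c] is the (d_row, d_col)
           displacement of pixel (r, c) from frame k to frame k + 1.
    Returns the (N, 2) positions in the last frame.
    """
    pts = np.asarray(points, dtype=float).copy()
    for flow in flows:
        h, w, _ = flow.shape
        # Nearest-neighbor lookup of the displacement at the current position.
        rows = np.clip(np.rint(pts[:, 0]).astype(int), 0, h - 1)
        cols = np.clip(np.rint(pts[:, 1]).astype(int), 0, w - 1)
        pts += flow[rows, cols]
    return pts

# Toy example: three frames-to-frame flows that all shift by (+1 row, +2 cols).
flows = [np.tile(np.array([1.0, 2.0]), (16, 16, 1)) for _ in range(3)]
start = np.array([[2.0, 3.0], [5.0, 5.0]])
end = track_points(start, flows)
print(end)  # rows: [5, 9] and [8, 11]
```

Because neighboring interpolated images differ only slightly, each individual flow is small and easier to estimate than a direct flow between the two original images.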

Analysis of the Network

Reminder: “2s-E” Network Architecture

Images Generated from Single Unit Activations in Feature Maps of Different Fully Connected Layers

Activating neurons of FC-1 and FC-2 feature maps of the class stream while fixing viewpoint and transformation inputs

Activating neurons of FC-3 and FC-4 feature maps of the class stream with non-fixed viewpoints

‘Zoom Neuron’

Increasing the activation of a specialized neuron while keeping all other activations fixed results in these transformations

Single neurons in later layers produce edge-like images. Neurons of higher deconvolutional layers generate blurry ‘clouds’.

Images Generated from Single Neuron Activations in Feature Maps of Some Layers of the “2s-E” Network

Layers shown: Unconv-2, Unconv-1, FC-5

Smooth interpolation between a single activation and the whole chair: Neurons are activated in the center and the size of the center region is increased from 2 x 2 to 8 x 8.

Network Can Generate Fine Details Through a Combination of Spatially Neighboring Neurons

Interaction of neighboring neurons is important. In the center, where many neurons are active, the image is sharp, while in the periphery, it is blurry.

Conclusion and Recap

● Supervised training of CNNs can be used to generate images given high-level information

● Network does not simply learn to generate training samples but instead learns an implicit 3D shape and geometry representation

● When trained stochastically, the network can even invent new chair styles

Other Approaches to Generative Networks

Generative Adversarial Networks

Deep Convolutional Generative Adversarial Networks

● Generator Network A generates images
● Discriminator Network B distinguishes generated images from real images
● Backpropagate through both generator and discriminator:
  ○ Discriminator learns to distinguish real images from generated images
  ○ Generator learns to “fool” discriminator by generating images similar to real images
● Ideally, generator improves such that discriminator can’t distinguish images
● However, training the generator can be unstable: oscillations or collapse of the generator solution can happen
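The two competing objectives can be written down compactly. The sketch below uses the standard binary cross-entropy form of the discriminator loss and the commonly used non-saturating generator loss; whether the slides' authors had exactly this variant in mind is an assumption, and the function names and the `eps` stabilizer are illustrative.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Cross-entropy loss for the discriminator.

    d_real: discriminator outputs in (0, 1) on real images.
    d_fake: discriminator outputs in (0, 1) on generated images.
    """
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating generator loss: reward fooling the discriminator."""
    return -np.mean(np.log(d_fake + eps))

# A discriminator that separates real from generated images well ...
confident = discriminator_loss(np.array([0.9, 0.8]), np.array([0.1, 0.2]))
# ... versus one that cannot tell them apart and outputs 0.5 everywhere.
confused = discriminator_loss(np.array([0.5]), np.array([0.5]))
print(confident < confused)  # True
```

Training alternates gradient steps on these two losses. The ideal end state in the list above corresponds to the discriminator outputting 0.5 everywhere, where its loss equals 2 ln 2.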

Generator Architecture

Generator-Discriminator Network

Radford, Metz and Chintala

Bedrooms in Latent Space

Face Rotations

Face Arithmetic

Generated Faces and Albums

InfoGAN

● Maximizes the mutual information between latent variables and observations
● Learns disentangled representations: each latent variable corresponds to some meaningful variable in semantic space (e.g. viewing angle, lighting)
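For reference, the mutual-information idea is usually written as a regularized GAN objective. The notation below (latent code c, noise z, weight λ, auxiliary distribution Q) follows the common presentation of InfoGAN and is not taken from the slide itself.

```latex
% V(D, G): the standard GAN value function.
% I(c; G(z, c)): mutual information between the latent code c and the
% generated image. In practice it is replaced by a variational lower bound
% L_I(G, Q) computed with an auxiliary network Q.
\min_{G, Q} \max_{D} \; V_{\mathrm{InfoGAN}}(D, G, Q)
  = V(D, G) - \lambda \, L_I(G, Q),
\qquad
L_I(G, Q) \le I\bigl(c;\, G(z, c)\bigr)
```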

Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel

Voxel-Based Approaches

Predictable and Generative Object Representations

● Autoencoder to ensure that representation is generative
● Convolutional network to ensure that representation is predictable

Rohit Girdhar, David Fouhey

Results on IKEA Dataset

Results on IKEA Dataset

Thank You

Variational Autoencoders

● Bayesian inference on a probabilistic graphical model with latent variables
● Jointly learn the recognition model (encoder) parameters φ and generative model (decoder) parameters θ
● Recognition model qφ(z|x) approximates the intractable posterior pθ(z|x)
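The joint learning of both parameter sets is usually expressed as maximizing a single lower bound on the log-likelihood. In the standard formulation below, φ denotes the recognition model (encoder) parameters and θ the generative model (decoder) parameters; the bound itself is the textbook form and is not spelled out on the slide.

```latex
% Evidence lower bound (ELBO) maximized jointly over \theta and \phi:
% a reconstruction term minus a KL term that keeps the approximate
% posterior q_\phi(z|x) close to the prior p_\theta(z).
\log p_\theta(x) \;\ge\; \mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
  - D_{\mathrm{KL}}\bigl(q_\phi(z \mid x) \,\|\, p_\theta(z)\bigr)
```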

Deep Recurrent Attentive Writer (DRAW)

● Variational Autoencoders + Recurrent Networks
● Network decides at each time step
  ○ Where to Read
  ○ Where to Write
  ○ What to Write

DRAWings

PixelRNN

● Model the conditional distribution of each individual pixel given previous pixels
● LSTM network approximates ideal context
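Modeling each pixel conditioned on the previous ones amounts to the following autoregressive factorization of the image distribution. The raster-scan ordering (row by row, left to right) and the n x n image size are the usual conventions and are stated here as assumptions.

```latex
% Joint distribution of an n x n image as a product of per-pixel
% conditionals, with pixels x_1, ..., x_{n^2} taken in raster-scan order.
p(\mathbf{x}) = \prod_{i=1}^{n^2} p\bigl(x_i \mid x_1, \dots, x_{i-1}\bigr)
```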

PixelRNN - Inpainting

PixelRNN - Generated ImageNet 64x64