Generative Adversarial Nets: Applications and Extensions

Wangmeng Zuo

Vision Perception and Cognition Centre, Harbin Institute of Technology

LeCun, NIPS 2016

• Reinforcement learning (cherry)

• Supervised learning (Chocolate)

• Unsupervised/Predictive learning (Cake)

• Generative adversarial nets (GAN)

For Most Application Tasks

• For most applications, GANs serve only as accessories to existing solutions.

• How to make Latte Art (i.e. improve the trainability of the generator)

• How to make a perfect Latte Coffee (i.e. combine GANs with other models to solve real problems)


Other Learning Models


• Improve the trainability of GANs: An Application Perspective

• Theoretical solution

• Incorporating with other learning models

• Designing generator based on signal/image characteristics

• Applications

• Adversarial learning

• Low level vision

• Domain adaptation

• Image translation

Improve the trainability of GANs

Generative Adversarial Networks (Goodfellow et al., NIPS 2014)

• Update the generator to generate more realistic images

• Update the discriminator to discriminate the synthetic images from real ones
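
The two alternating updates above can be written as two losses on the discriminator's outputs. The following is a minimal sketch, assuming the discriminator returns probabilities in (0, 1) and using the non-saturating generator loss that Goodfellow et al. suggest in practice; the function names and the `eps` stabilizer are illustrative, not taken from the slides.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Loss minimized by D: -(E[log D(x)] + E[log(1 - D(G(z)))])."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return -(np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps)))

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating loss minimized by G: -E[log D(G(z))]."""
    d_fake = np.asarray(d_fake, dtype=float)
    return -np.mean(np.log(d_fake + eps))
```

A discriminator that cannot tell the two apart outputs 0.5 everywhere, which gives a discriminator loss of 2 log 2; the generator loss falls as D(G(z)) rises.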

Mode Collapse

• D in inner loop: convergence to correct distribution

• G in inner loop: place all mass on most likely point

Let's first turn to supervised deep learning

• Unprecedented successes in:

• Image classification

• Image denoising, image super-resolution

• ...

• Can we exploit these achievements to improve GAN training?

• How to train a good generator (the latter half of image restoration?)

• How to train a good discriminator (classification?)


• Auto-encoder

• Denoising auto-encoder

Variational AutoEncoder

• Variational AutoEncoder

• Relaxation of discrete variables
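
The slide does not say which relaxation is meant. One widely used choice is the Gumbel-softmax (Concrete) relaxation, sketched below under that assumption: Gumbel noise is added to the logits and a temperature-scaled softmax replaces the hard argmax, so the sample stays differentiable with respect to the logits. The function name and arguments are illustrative.

```python
import numpy as np

def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed (soft) one-hot sample from a categorical distribution."""
    logits = np.asarray(logits, dtype=float)
    # Gumbel(0, 1) noise via the inverse-CDF trick; the lower bound avoids log(0).
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    y = (logits + gumbel) / temperature
    # Numerically stable softmax over the last axis.
    y = y - y.max(axis=-1, keepdims=True)
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)
```

Lower temperatures push the sample towards a one-hot vector, at the cost of higher-variance gradients.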

VAE/GAN (Larsen et al., ICML 2016)




Classifier Discriminator

• Na Lei, Kehua Su, Li Cui, Shing-Tung Yau, David Xianfeng Gu, A Geometric View of Optimal Transportation and Generative Model, Arxiv 2017.

Nguyen et al., NIPS 2016

• Optimize the hidden code input (red bar) of a deep image generator network (DGN) to produce an image that highly activates h

InfoGAN (Chen et al., NIPS 2016)


• Input: z, c

• Interpretable and disentangled representations

• Easy to train

AC-GAN (Odena et al., ICML 2017)

• Class-conditional image synthesis with Auxiliary Classifier GANs

• The log-likelihood of the correct source:

• The log-likelihood of the correct class:
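
For reference, the two terms named above are defined in Odena et al. as follows, where S is the source (real or fake) and C the class label; the discriminator is trained to maximize L_S + L_C and the generator to maximize L_C - L_S.

```latex
\begin{align}
L_S &= \mathbb{E}\big[\log P(S = \mathrm{real} \mid X_{\mathrm{real}})\big]
     + \mathbb{E}\big[\log P(S = \mathrm{fake} \mid X_{\mathrm{fake}})\big] \\
L_C &= \mathbb{E}\big[\log P(C = c \mid X_{\mathrm{real}})\big]
     + \mathbb{E}\big[\log P(C = c \mid X_{\mathrm{fake}})\big]
\end{align}
```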

Arbitrary Facial Attribute Editing

• One model for all tasks (He et al., Arxiv 2018)

A Favorable Framework

• Auto-encoder


Extension for attribute style manipulation

Single task


Continuous attribute

Attribute Style Manipulation

Take home message

• Incorporating an auto-encoder to improve the trainability of the generator;

• Incorporating a deep classification model to improve the trainability of the discriminator

Let's then turn to the objective of GANs

• Image generation

• What are the characteristics of an image?

• Multi-scale property

• Manifold property

• What makes a high-quality image?

• Deep image prior

• Deep image quality assessment

LAPGANs (Denton et al., NIPS 2015)

Stack-GAN (Zhang et al., ICCV 2017)

• Stage-I GAN

• Stage-II GAN

Cascaded Refinement Networks (Chen & Koltun, ICCV 2017)

• CRN: does not rely on adversarial training

Manifold property (Benaim & Wolf, NIPS 2017)

• Distance Constraints

• Self-distance Constraints

Total Variation

• Deep feature visualization

• Total variation (TV) regularization

• Better (deep) image prior?
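
As a concrete reference for the TV regularizer mentioned above, here is a minimal sketch of the anisotropic (L1) form for a single-channel image. The slide does not specify which variant is used, so this choice is an assumption.

```python
import numpy as np

def total_variation(img):
    """Anisotropic total variation: sum of absolute neighbour differences."""
    img = np.asarray(img, dtype=float)
    dh = np.abs(img[:, 1:] - img[:, :-1]).sum()  # horizontal differences
    dv = np.abs(img[1:, :] - img[:-1, :]).sum()  # vertical differences
    return dh + dv
```

A constant image has zero TV, while noise and sharp edges raise it, which is why the term favours piecewise-smooth images.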

Insight from deep image denoising

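• DnCNN for image denoising (Zhang et al., TIP 2017): $\hat{x} = y - \mathrm{CNN}(y; \Theta)$

• For noisy image, $\mathrm{CNN}(y; \Theta) \approx y - x$, so $\|\mathrm{CNN}(y; \Theta)\|^2 \approx mn\sigma^2$

• For clean image, $\|\mathrm{CNN}(y; \Theta)\|^2 \approx 0$

• Perceptual regularization (Li et al., Arxiv 2016): $\|\mathrm{CNN}(y; \Theta)\|^2$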

Deep image prior (Ulyanov et al., CVPR 2018)

• Energy

• Image restoration

• A randomly-initialized neural network can be used as a handcrafted prior

• The structure of a generator network is sufficient to capture a great deal of low-level image statistics prior to any learning
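
The energy formulation above can be written as follows, using the notation of Ulyanov et al.: z is a fixed random input code, f_θ the generator network, x_0 the degraded observation, and E a task-dependent data term (for denoising, the squared distance to x_0).

```latex
\theta^{*} = \arg\min_{\theta} E\big(f_{\theta}(z);\, x_{0}\big),
\qquad
x^{*} = f_{\theta^{*}}(z),
\qquad
\text{e.g. } E(x;\, x_{0}) = \lVert x - x_{0} \rVert^{2}
```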

Deep Features as a Perceptual Metric (Zhang et al., CVPR 2018)

• Perceptual loss

• Deep features outperform all previous metrics by huge margins.

• This result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised).

• Deep Non-reference Image Quality Assessment?

Take home message

• Exploiting image property to improve GANs

• Developing deep models/GANs for better revealing image priors/quality

• Object-oriented design


Adversarial learning (Szegedy et al., ICLR 2014)

• Deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent.

• We can cause the network to misclassify an image by applying a certain hardly perceptible perturbation, which is found by maximizing the network’s prediction error.
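
To make the idea concrete, the sketch below perturbs the input of a toy logistic-regression classifier in the direction that increases its loss. It uses a one-step gradient-sign update rather than the box-constrained L-BFGS search of the cited paper, and the model, weights and `epsilon` are illustrative assumptions.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def log_loss(w, b, x, y):
    """Binary cross-entropy of a logistic model on a single example."""
    p = sigmoid(np.dot(w, x) + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient_sign_perturbation(w, b, x, y, epsilon):
    """Move x by epsilon per coordinate along the sign of the loss gradient."""
    p = sigmoid(np.dot(w, x) + b)
    grad_x = (p - y) * w  # d(loss)/dx for the logistic model
    return x + epsilon * np.sign(grad_x)

# Hypothetical weights and input, chosen only for illustration.
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([1.0, 0.5, -0.2])
y = 1.0
x_adv = gradient_sign_perturbation(w, b, x, y, epsilon=0.1)
```

Each coordinate moves by at most `epsilon`, yet the loss on `x_adv` is strictly higher than on `x`; in high-dimensional image space many such small moves add up to a misclassification.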


Intriguing properties of neural networks (Szegedy et al., ICLR 2014)


Deep Neural Networks are Easily Fooled (Nguyen et al., CVPR 2015)



> 99.6% confidence

Adversarial Attacks and Defences Competition (Kurakin et al., Arxiv 2018)

• 1st place in defense track: team TsAIL

• Team members: Yinpeng Dong, Fangzhou Liao, Ming Liang, Tianyu Pang, Jun Zhu and Xiaolin Hu.

• Solution: Denoising U-net

Adversarially-augmented training (Simon-Gabriel et al., Arxiv 2018)

• Adversarially-augmented training

• Replacing strided layers with average-pooling layers

• Increase generalization performance

Object detection: A-Fast-RCNN (Wang et al., CVPR 2017)

Visual tracking

• CVPR 2018

• VITAL: VIsual Tracking via Adversarial Learning

• SINT++: Robust Visual Tracking via Adversarial Hard Positive Generation

Low level vision

• SRGAN for super-resolution

DSLR-Quality Photos on Mobile Devices (Ignatov et al., ICCV 2017)

• Color loss

• Texture loss

• Content loss

• TV regularizer

• Discriminator

WESPE: Weakly Supervised Photo Enhancer (NTIRE 2018)

• Requires only two distinct datasets

Image inpainting: more freedom and non-uniqueness

Context-encoders (Pathak et al., 2016)

• The first key: Auto-encoder

Problem with auto-encoder

• Information bottleneck

Adversarial loss is helpful

• But remains limited ...

Analyzing U-Net (Ronneberger et al., 2015)

• Fine-details

• Unfortunately, it also does not work for inpainting

Return to traditional patch-based inpainting

• Patch processing order

• PatchMatch

CNN and Patch-based Solutions are Complementary

• CNN-based solution

• Poor texture

• Better structure

• Patch-based solution

• Better details

• Poor structure

• Can we combine them in an end-to-end learning framework?


CNN architecture

Objective and learning

• Objective

• Learning


• Speed

• MNPS: 40 min -> 40 s

• Ours: 82 ms


Random mask

Real images

Guided face enhancement (Li et al., Arxiv 2018)

Film Restoration, Smartphones


• 1. Blind enhancement: the degradation model is sophisticated and unknown

• blur, downsampling, noise, compression

• 2. The guidance and degraded images differ in pose, expression and illumination

Challenge 1

• Train on realistic synthetic degraded images, test on real degraded images

• The degradation model:
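
The formula itself is not preserved in this extract. As a rough illustration of a pipeline built from the four degradations listed earlier (blur, downsampling, noise, compression), the sketch below chains them on a grayscale image in [0, 1]. All parameter values are assumptions, and coarse quantization stands in for JPEG compression, which would need an image codec.

```python
import numpy as np

def gaussian_kernel_1d(sigma, radius):
    t = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (t / sigma) ** 2)
    return k / k.sum()

def blur(img, sigma=1.5, radius=3):
    """Separable Gaussian blur with edge padding; output has the input's shape."""
    k = gaussian_kernel_1d(sigma, radius)
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def degrade(img, scale=2, noise_sigma=0.02, levels=32, rng=None):
    """Blur -> downsample -> add Gaussian noise -> quantize (stand-in for JPEG)."""
    rng = np.random.default_rng() if rng is None else rng
    out = blur(np.asarray(img, dtype=float))
    out = out[::scale, ::scale]                       # naive downsampling
    out = out + rng.normal(0.0, noise_sigma, size=out.shape)
    out = np.clip(out, 0.0, 1.0)
    return np.round(out * (levels - 1)) / (levels - 1)
```

Sampling the blur width, scale, noise level and compression strength at random per training image is one way to cover an unknown real degradation.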

Challenge 2: GFRNet

Model and losses for WarpNet

• Landmark loss

• TV regularization

Model objective

• Reconstruction loss

• Adversarial loss

• Objective

Appearance Flow



More images


Domain Adaptation

• Domain adaptation: learning, from a (labeled) source data distribution, a well-performing model on a different (but related) (labeled or unlabeled) target data distribution (Wikipedia)

• Three categories:

• Supervised domain adaptation

• Semi-supervised domain adaptation

• Unsupervised domain adaptation

The Future of Real-Time SLAM (ICCV 2015 Workshop)

• Panel discussion: Deep Learning vs SLAM

• Newcombe's Proposal: Use SLAM to fuel Deep Learning

• Today's SLAM systems are large-scale "correspondence engines" which can be used to generate large-scale datasets

• Graphics for CNN

The need of domain adaptation



Domain Transfer

Unsupervised domain adaptation

• Only the class labels of the source samples are known; all class labels of the target samples are unknown.

• Goal: a feature extractor f and a classifier c

• P(f(xs)) = P(f(xt))

• Better classification performance on xs

• Key issue: Discrepancy metric between two complex distributions

• D(P(f(xs)), P(f(xt)))

Weighted MMD

• Let

• Define

• Weighted MMD
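
The slide's own definitions are not preserved in this extract. The sketch below is a generic sample-weighted squared MMD with an RBF kernel, assuming the weights re-balance the source samples (for example, by a class-prior ratio) before comparing with the target; the kernel choice, `gamma` and the function names are illustrative.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def weighted_mmd2(xs, xt, source_weights=None, gamma=1.0):
    """Biased estimate of squared MMD between weighted source and uniform target."""
    xs = np.asarray(xs, dtype=float)
    xt = np.asarray(xt, dtype=float)
    a = np.ones(len(xs)) if source_weights is None else np.asarray(source_weights, dtype=float)
    a = a / a.sum()                         # normalized source weights
    b = np.full(len(xt), 1.0 / len(xt))     # uniform target weights
    return (a @ rbf_kernel(xs, xs, gamma) @ a
            - 2.0 * a @ rbf_kernel(xs, xt, gamma) @ b
            + b @ rbf_kernel(xt, xt, gamma) @ b)
```

With a source set that over-represents one class, the unweighted estimate is positive even when the class-conditional distributions match; suitable weights bring it back to zero.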


Unsupervised Domain Adaptation by Backpropagation (Ganin & Lempitsky, ICML 2015)

Simultaneous Deep Transfer Across Domains and Tasks (Tzeng et al., ICCV 2015)

• “maximally confuse” the two domains

• uniform distribution over domain labels
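
One way to read the two bullets above: the feature extractor is trained so that a domain classifier's prediction matches the uniform distribution over domain labels, measured by cross-entropy. A minimal sketch follows, with the logits layout (one row per sample, one column per domain) as an assumption.

```python
import numpy as np

def domain_confusion_loss(domain_logits):
    """Cross-entropy between the uniform distribution and the predicted domain distribution."""
    z = np.asarray(domain_logits, dtype=float)
    z = z - z.max(axis=1, keepdims=True)                     # stabilize log-softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # -sum_d (1/D) * log p_d, averaged over the batch.
    return -log_p.mean(axis=1).mean()
```

The loss reaches its minimum, log D for D domains, exactly when the classifier outputs equal probability for every domain.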

Domain cocktail network (Xu et al., CVPR 2018)

SimGAN (CVPR 2017)

• Learning from Simulated and Unsupervised Images through Adversarial Training (Shrivastava et al., Arxiv 2016)

• Realism loss

• Self-regularization

• SimGAN is also a form of pixel-level DA

Unsupervised Pixel–Level Domain Adaptation (CVPR 2017)

Image translation (Zhu et al., CVPR 2017)

Pix2pix: supervised image translation (Isola et al., CVPR 2017)

• Positive pair: (Input, ground truth)

• Negative pair: (Input, synthesis)
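
The pairs above are what the conditional discriminator sees. A minimal sketch, assuming channel-last image arrays and channel-wise concatenation of the input with either the ground truth or the generator output; the function name is illustrative.

```python
import numpy as np

def make_discriminator_pairs(inp, ground_truth, synthesis):
    """Build the positive and negative inputs for a conditional discriminator."""
    positive = np.concatenate([inp, ground_truth], axis=-1)  # target label: real (1)
    negative = np.concatenate([inp, synthesis], axis=-1)     # target label: fake (0)
    return positive, negative
```

Because the input is part of every pair, the discriminator judges whether the output matches its input, not only whether it looks realistic on its own.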

Learning Residual Images (Shen & Liu, CVPR 2017)

Cycle-Consistent supervision (Zhu et al., ICCV 2017)

• Cycle consistency loss
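
The cycle consistency loss can be sketched as an L1 penalty on both round trips, x -> G(x) -> F(G(x)) and y -> F(y) -> G(F(y)). The mappings below are toy stand-ins for the two generators, chosen only so the example runs.

```python
import numpy as np

def cycle_consistency_loss(x, y, g, f):
    """L1 reconstruction error of the forward cycle plus the backward cycle."""
    forward = np.abs(f(g(x)) - x).mean()   # X -> Y -> X
    backward = np.abs(g(f(y)) - y).mean()  # Y -> X -> Y
    return forward + backward

# Toy mappings (hypothetical): g and f_inverse undo each other, f_identity does not.
def g(a):
    return 2.0 * a + 1.0

def f_inverse(b):
    return (b - 1.0) / 2.0

def f_identity(b):
    return b
```

With `f_inverse` the loss is zero up to rounding, while `f_identity` leaves a positive residual; in training this term is added to the adversarial losses of both generators.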

BicycleGAN: Multimodal Image-to-Image Translation (Zhu et al., NIPS 2017)


• Problem-oriented

• Generator

• Discriminator+


