Lecture 27: Generative Adversarial Networks
CSE 559A: Computer Vision
Fall 2020: T-R: 11:30-12:50pm @ Zoom
Instructor: Ayan Chakrabarti ([email protected]). Course Staff: Adith Boloor, Patrick Williams
Dec 17, 2020
http://www.cse.wustl.edu/~ayan/courses/cse559a/
GENERAL
Last class. No office hours tomorrow.
Project reports due Dec 22nd at the latest (no extensions!)
STYLE TRANSFER
Make one image have the style or texture quality of another. Gatys et al., CVPR 2016:
Don't train a network for this. Instead take an existing network and look at the properties of its activations.
Values of higher layers represent content: try to preserve them.
Covariances of other layers represent style: try to match them with the other image.
STYLE TRANSFER
Set this up as an optimization problem, and minimize with SGD+backprop from a random init.
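As a concrete illustration, here is a minimal numpy sketch of the two loss terms in this setup: content is compared directly on activations, while style is compared through Gram (covariance-like) matrices. The helper names and the (C, H, W) feature layout are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def gram_matrix(feats):
    # feats: (C, H, W) activations from one layer of a pretrained network.
    # The Gram matrix captures which channels co-activate (texture/style)
    # while discarding spatial layout.
    C, H, W = feats.shape
    F = feats.reshape(C, H * W)
    return F @ F.T / (H * W)          # (C, C)

def content_loss(gen_feats, content_feats):
    # Match higher-layer activations directly to preserve content.
    return np.sum((gen_feats - content_feats) ** 2)

def style_loss(gen_feats, style_feats):
    # Match Gram matrices to transfer style/texture.
    return np.sum((gram_matrix(gen_feats) - gram_matrix(style_feats)) ** 2)
```

In the full method, a weighted sum of such terms over several layers is minimized with SGD + backprop over the pixels of the generated image.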
STYLE TRANSFER
GENERATIVE ADVERSARIAL NETWORKS
So far, we have looked at networks that given an input x, produce an output y.
All pairs (x, y) come from some joint distribution p(x, y).
A function that maps x → y is then reasoning with the distribution p(y|x),
and producing a single guess ŷ which minimizes E[L(y, ŷ)].
But if p(y|x) is not deterministic, this expected loss won't go to zero: Bayes error.
What if I didn't want my network to produce a best guess, but tell me about this distribution p(y|x)?
GENERATIVE ADVERSARIAL NETWORKS
One option: choose a parametric form for p(y|x) = f(y; θ).
Have a network g that predicts θ = g(x) for a specific x.
The other option is to train a sampler: a network that given an input x, produces samples from p(y|x).
How do you produce samples, or how do you produce multiple outputs for the same input x? You give your network access to a random generator: a noise source.
GENERATIVE ADVERSARIAL NETWORKS
Let's ignore conditional distributions. Consider the task of generating samples from p_x(x).
You don't know p_x(x), but you have training examples x that are samples from p_x(x).
You want to learn a generator network G(z; θ) which
Takes in random inputs z from a known distribution p_z(z)
And produces outputs x from p_x(x)
Has learnable parameters θ
You want to select θ such that the distribution of {G(z; θ) : z ∼ p_z(z)} matches p_x(x).
But you don't have the data distribution, only samples from it.
GENERATIVE ADVERSARIAL NETWORKS
Set this up as a min-max objective with a second network, a discriminator D(x; ϕ).
The discriminator is a binary classifier. Tries to determine if
The input x is real, i.e., it came from the training set.
Or fake, i.e., it was the output of G.
Train both networks simultaneously, against the following loss:
L(θ, ϕ) = −E_{z∼p_z} log(1 − D(G(z; θ); ϕ))
This is the cross-entropy loss on the discriminator saying outputs of G should be labeled 0.
What about examples that should be labeled 1?
GENERATIVE ADVERSARIAL NETWORKS
Set this up as a min-max objective with a second network, a discriminator D(x; ϕ).
The discriminator is a binary classifier. Tries to determine if
The input x is real, i.e., it came from the training set.
Or fake, i.e., it was the output of G.
Train both networks simultaneously, against the following loss:
L(θ, ϕ) = −E_{z∼p_z} log(1 − D(G(z; θ); ϕ)) − E_{x∼p_x} log D(x; ϕ)
This is the cross-entropy loss on the discriminator saying outputs of G should be labeled 0,
and examples from the training set should be labeled 1.
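A minimal sketch (the function name `gan_loss` is mine, not from the slides) of how this loss is estimated from a batch of discriminator outputs:

```python
import numpy as np

def gan_loss(d_fake, d_real):
    # Monte-Carlo estimate of L(theta, phi):
    #   d_fake = D(G(z)) over a batch of generated samples, values in (0, 1)
    #   d_real = D(x) over a batch of training samples, values in (0, 1)
    # Fake samples should be labeled 0, real samples labeled 1.
    return -np.mean(np.log(1.0 - d_fake)) - np.mean(np.log(d_real))

# A discriminator that is always unsure (outputs 0.5) incurs loss 2*log(2);
# a near-perfect discriminator (d_fake -> 0, d_real -> 1) drives the loss to 0.
```

The discriminator updates ϕ to decrease this value; the generator updates θ to increase it.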
GENERATIVE ADVERSARIAL NETWORKS
L(θ, ϕ) = −E_{z∼p_z} log(1 − D(G(z; θ); ϕ)) − E_{x∼p_x} log D(x; ϕ)
The expectation E_{x∼p_x} is just an average over the training set.
The expectation E_{z∼p_z} is based on sampling z from the known p_z at each iteration.
You are optimizing the discriminator to succeed in telling real and fake samples apart, i.e., to minimize the loss. And training the generator to fool the discriminator, i.e., maximize the same loss:
θ = arg max_θ min_ϕ L(θ, ϕ)
How do you solve this optimization problem? Turns out, it is reasonable to use back-prop and gradient descent.
Just compute gradients with respect to the loss for both the discriminator and generator. Subtract them from the discriminator's parameters, add them to the generator's.
GENERATIVE ADVERSARIAL NETWORKS
Theoretical Analysis
G = arg max_G min_D −E_{z∼p_z} log(1 − D(G(z))) − E_{x∼p_x} log D(x)
Let's say your discriminator and generator had infinite capacity and you had infinite training data.
For a given input x, what should the optimal output D(x) of your discriminator be?
Say you know p_x(x).
You also know p_z(z) and G, and therefore p_g(x): the probability of x being an output from the generator.
q = D(x) = arg min_q −p_g(x) log(1 − q) − p_x(x) log q
What q minimizes this, for q ∈ [0, 1]?
q = p_x(x) / (p_g(x) + p_x(x))
Let's replace D with this optimal value in the loss function, and figure out what G should do.
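The stated optimum can be checked in one step by differentiating the pointwise objective (a quick verification of my own, not from the slides):

```latex
f(q) = -\,p_g(x)\log(1-q) - p_x(x)\log q
\qquad
f'(q) = \frac{p_g(x)}{1-q} - \frac{p_x(x)}{q} = 0
\;\Rightarrow\; p_g(x)\,q = p_x(x)\,(1-q)
\;\Rightarrow\; q^\ast = \frac{p_x(x)}{p_g(x)+p_x(x)}
```

Since f''(q) = p_g(x)/(1−q)² + p_x(x)/q² > 0, f is convex on (0, 1) and this stationary point is the minimum.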
GENERATIVE ADVERSARIAL NETWORKS
Remember that p_g also depends on G. In fact, you can replace this as an optimization on p_g.
G = arg max_G min_D −E_{z∼p_z} log(1 − D(G(z))) − E_{x∼p_x} log D(x)
G = arg min_G E_{z∼p_z} log [ p_g(G(z)) / (p_x(G(z)) + p_g(G(z))) ] + E_{x∼p_x} log [ p_x(x) / (p_x(x) + p_g(x)) ]
= arg min_{p_g} ∫ [ p_g(x) log ( p_g(x) / (p_x(x) + p_g(x)) ) + p_x(x) log ( p_x(x) / (p_x(x) + p_g(x)) ) ] dx
You can relate this to KL-divergences (up to an additive constant):
KL( p_g ‖ (p_g + p_x)/2 ) + KL( p_x ‖ (p_g + p_x)/2 )
Called the Jensen-Shannon Divergence (up to scale).
Minimized when p_g matches p_x.
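A quick numerical sanity check (my own, not from the slides) on discrete distributions: the bracketed objective above equals −2 log 2 exactly when p_g = p_x, and is strictly larger otherwise.

```python
import numpy as np

def objective(pg, px):
    # Discrete analogue of the integral above:
    #   sum_x [ pg log(pg / (px + pg)) + px log(px / (px + pg)) ]
    # Equals KL(pg || m) + KL(px || m) - 2 log 2, with m = (pg + px) / 2,
    # so it is minimized (at -2 log 2) exactly when pg = px.
    m = pg + px
    return float(np.sum(pg * np.log(pg / m) + px * np.log(px / m)))

px = np.array([0.1, 0.2, 0.3, 0.4])
print(objective(px, px))                              # -2 log 2 ≈ -1.3863
print(objective(np.array([0.4, 0.3, 0.2, 0.1]), px))  # strictly larger
```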
GENERATIVE ADVERSARIAL NETWORKS
Practical Concerns
So the procedure is: set up your generator and discriminator networks. Define the loss:
L(θ, ϕ) = −E_{z∼p_z} log(1 − D(G(z; θ); ϕ)) − E_{x∼p_x} log D(x; ϕ)
At each iteration, pick a batch of z values from a known distribution (typically a vector of uniformly or Gaussian distributed values). And a batch of training samples. Compute gradients for the discriminator and update. Compute gradients for the generator, by back-propagating through the discriminator, and update.
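The whole procedure can be sketched end-to-end on a toy 1-D problem. Everything below (the linear generator, logistic discriminator, learning rate, and batch size) is my own minimal construction for illustration, with the gradients written out by hand; the generator uses the non-saturating −log D(G(z)) loss that the lecture discusses. Real GANs use deep networks and autodiff.

```python
import numpy as np

rng = np.random.default_rng(0)
lr, batch, steps = 0.05, 128, 3000

# Generator G(z) = a*z + b, trying to match data x ~ N(2, 0.5).
a, b = 1.0, 0.0
# Discriminator D(x) = sigmoid(w*x + c).
w, c = 0.0, 0.0
sig = lambda v: 1.0 / (1.0 + np.exp(-v))

for _ in range(steps):
    x_real = rng.normal(2.0, 0.5, batch)
    z = rng.normal(0.0, 1.0, batch)
    x_fake = a * z + b

    # --- Discriminator step: minimize -log D(x_real) - log(1 - D(x_fake)).
    d_real, d_fake = sig(w * x_real + c), sig(w * x_fake + c)
    # Cross-entropy gradient wrt the logit is (prediction - label).
    g_real, g_fake = d_real - 1.0, d_fake - 0.0
    w -= lr * (np.mean(g_real * x_real) + np.mean(g_fake * x_fake))
    c -= lr * (np.mean(g_real) + np.mean(g_fake))

    # --- Generator step: minimize -log D(G(z)) (non-saturating loss),
    # back-propagating through the discriminator to the generator params.
    d_fake = sig(w * x_fake + c)
    g_out = (d_fake - 1.0) * w        # dLoss/dx_fake
    a -= lr * np.mean(g_out * z)
    b -= lr * np.mean(g_out)

print(b)  # drifts from 0 toward the data mean of 2
```

Note the alternation inside each iteration: the discriminator descends on the loss, then the generator is updated through the (fresh) discriminator outputs.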
GENERATIVE ADVERSARIAL NETWORKS
GENERATIVE ADVERSARIAL NETWORKS
Practical Concerns
A common issue is that the discriminator has a much easier task. In the initial iterations, your generator will be producing junk. Very easy for the discriminator to identify fake samples with high confidence.
L(θ, ϕ) = −E_{z∼p_z} log(1 − D(G(z; θ); ϕ)) − E_{x∼p_x} log D(x; ϕ)
At that point, log(1 − D(G(z))) will saturate.
No gradients to the generator.
One common approach: minimize G with respect to a different loss.
Instead of max −log(1 − D(G(z)))
Do min −log D(G(z))
View this as minimizing cross-entropy wrt the wrong label, rather than maximizing wrt the true label.
GENERATIVE ADVERSARIAL NETWORKS
Practical Concerns
Max vs min: −log(1 − D(G)) vs −log(D(G))
Binary classifier: the output of D is σ(y) for some y.
The discriminator is doing really well, so D(G) is almost 0 (y << 0).
max −log(1 − D(G)) = min log(1 − D(G)): ∇_y = −(D(G) − 0) = −D(G) ≈ 0
min −log(D(G)): ∇_y = D(G) − 1 ≈ −1
Both are negative (i.e., correct sign, says try to increase D(G)). But the second version has much higher magnitude.
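The two gradients can be checked numerically (a sketch of my own; `y` is the discriminator logit as above):

```python
import math

sigmoid = lambda y: 1.0 / (1.0 + math.exp(-y))

# The two generator objectives, written as minimizations over the logit y:
saturating     = lambda y: math.log(1.0 - sigmoid(y))   # min log(1 - D(G))
non_saturating = lambda y: -math.log(sigmoid(y))        # min -log D(G)

def num_grad(f, y, eps=1e-6):
    # Central finite-difference approximation of df/dy.
    return (f(y + eps) - f(y - eps)) / (2.0 * eps)

y = -6.0                    # confident discriminator: D(G) = sigmoid(-6) ≈ 0.0025
print(num_grad(saturating, y))      # ≈ -D(G)    ≈ -0.0025 (vanishes)
print(num_grad(non_saturating, y))  # ≈ D(G) - 1 ≈ -0.9975 (stays useful)
```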
GENERATIVE ADVERSARIAL NETWORKS
Practical Concerns
Other approaches:
Reduce the capacity of the discriminator.
Make fewer updates to the discriminator, or give it a lower learning rate.
Provide additional losses to the generator to help it train: e.g., a separate network that predicts intermediate features of the discriminator.
Other losses: see Wasserstein GANs.
Also need to be careful how you use Batch Normalization. (Don't let the discriminator use batch statistics to tell real and fake apart!)
GENERATIVE ADVERSARIAL NETWORKS
Conditional GANs: now we want to sample from p(x|s) for a given s.
Same adversarial setting, but s is given as an input to both generator and discriminator: G(z, s) and D(x, s).
Sometimes the noise source is simply replaced by dropout. Or there is no noise at all: this is called an Adversarial Loss. The generator is producing a deterministic output, but being trained with a distribution-matching loss rather than L1 or L2.
Can be useful when the true p(x|s) is multi-modal.
Regular networks would average the modes; an adversarial loss promotes picking one of the modes.
ADVERSARIAL LOSS
Train with
L(G) = ‖G(x) − y‖² − λ log D(G(x))
L(D) = −log(1 − D(G(x))) − log D(y)
The GAN loss is unconditional, but there is also a reconstruction loss. So the loss says: be close to the true answer, but make your output resemble natural images.
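In code, the generator's objective combines the two terms like this (a minimal sketch; the helper name and the λ value are mine, not from the slides):

```python
import numpy as np

def adversarial_generator_loss(g_out, y, d_of_g, lam=0.01):
    # ||G(x) - y||^2 : reconstruction -- be close to the true answer.
    # -lam * log D(G(x)) : adversarial -- make the output look "real" to D.
    recon = np.sum((g_out - y) ** 2)
    adv = -np.log(d_of_g)
    return recon + lam * adv

# If the output matches ground truth and fully fools the discriminator
# (D(G(x)) = 1), both terms vanish and the loss is 0.
```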
GENERATIVE ADVERSARIAL NETWORKS
DCGAN: Radford et al.
GENERATIVE ADVERSARIAL NETWORKS
DCGAN: Radford et al.
Generated images from a training set of bedrooms (LSUN dataset).
GENERATIVE ADVERSARIAL NETWORKS
Neyshabur et al., Stabilizing GAN Training with Multiple Random Projections
GENERATIVE ADVERSARIAL NETWORKS
Denton et al., Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks
GENERATIVE ADVERSARIAL NETWORKS
Karras et al., Progressive Growing of GANs for Improved Quality, Stability, and Variation
CONDITIONAL GANS
UN-SUPERVISED LEARNING
For a lot of tasks, it is hard to collect enough training data. We saw in the stereo example how you can have indirect supervision. But in other cases, you have to use transfer learning.
Train a network on a large dataset for a related task for which you have ground truth. Remove the last layer, and use / finetune the feature extractor for the new task.
Researchers are exploring tasks made specifically to transfer from.
UN-SUPERVISED LEARNING
Pre-train by learning to add color
Larsson, Maire, Shakhnarovich, CVPR 2017.
UN-SUPERVISED LEARNING
Pre-train by solving jigsaw puzzles
UN-SUPERVISED LEARNING
Pre-train by predicting sound from video
DOMAIN ADAPTATION
Generate synthetic training data using renderers. But networks trained on synthetic data need not generalize to real data. (In fact, they may not transfer from high-quality Flickr data to cell-phone camera data.)
Problem Setting
Have input-output training pairs (x′, y) from the source domain: renderings/high-quality images/…
Have only inputs x from the target domain: where we actually want to use this.
Train a network so that features computed from x′ and x have the same distribution…
i.e., use GANs!
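One way to make "same feature distribution" concrete (a minimal sketch of my own, not the slides' specific method): attach a domain discriminator to the features, train it to tell source from target, and train the feature extractor to fool it, exactly as in a GAN.

```python
import numpy as np

sig = lambda v: 1.0 / (1.0 + np.exp(-v))

def domain_discriminator_loss(feat_src, feat_tgt, w):
    # Hypothetical linear domain classifier d(f) = sigmoid(f @ w):
    # label 1 = source features (from x'), label 0 = target features (from x).
    d_src = sig(feat_src @ w)
    d_tgt = sig(feat_tgt @ w)
    return -np.mean(np.log(d_src)) - np.mean(np.log(1.0 - d_tgt))

# The domain discriminator minimizes this loss; the feature extractor is
# updated to maximize it, so source and target features become
# indistinguishable -- and the task head trained on source labels transfers.
```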
DOMAIN ADAPTATION
We've covered what forms the foundations of state-of-the-art vision algorithms. It will help you read, understand, and implement vision papers. But things are changing rapidly: not just new solutions, but new problems. So keep reading!
We hope that you…
Have an understanding of the basic math and programming tools to approach vision problems.
Are as surprised as we are that humans and animals are able to solve this so easily.
That's all folks!