Lecture 27: Generative Adversarial Networks
CSE 559A: Computer Vision
Fall 2020: T-R: 11:30-12:50pm @ Zoom
Instructor: Ayan Chakrabarti ([email protected]). Course Staff: Adith Boloor, Patrick Williams
Dec 17, 2020
http://www.cse.wustl.edu/~ayan/courses/cse559a/
GENERAL
Last class. No office hours tomorrow.
Project reports due Dec 22nd at the latest (no extensions!)
STYLE TRANSFER
Make one image have the style or texture quality of another. Gatys et al., CVPR 2016:
Don't train a network for this. Instead take an existing network and look at the properties of its activations.
Values of higher layers represent content: try to preserve them.
Covariances of other layers represent style: try to match them with the other image.
STYLE TRANSFER
Set this up as an optimization problem, and minimize with SGD+backprop from a random init.
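As a concrete illustration, here is a minimal numpy sketch of the two loss terms in this setup: content is compared directly on activations, while style is compared through Gram (covariance-like) matrices. The helper names and the (C, H, W) feature layout are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np

def gram_matrix(feats):
    # feats: (C, H, W) activations from one layer of a pretrained network.
    # The Gram matrix captures which channels co-activate (texture/style)
    # while discarding spatial layout.
    C, H, W = feats.shape
    F = feats.reshape(C, H * W)
    return F @ F.T / (H * W)          # (C, C)

def content_loss(gen_feats, content_feats):
    # Match higher-layer activations directly to preserve content.
    return np.sum((gen_feats - content_feats) ** 2)

def style_loss(gen_feats, style_feats):
    # Match Gram matrices to transfer style/texture.
    return np.sum((gram_matrix(gen_feats) - gram_matrix(style_feats)) ** 2)
```

In the full method, a weighted sum of such terms over several layers is minimized with SGD + backprop over the pixels of the generated image.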
STYLE TRANSFER
GENERATIVE ADVERSARIAL NETWORKS
So far, we have looked at networks that given an input x, produce an output y.
All pairs (x, y) come from some joint distribution p(x, y).
A function that maps x → y is then reasoning with the distribution p(y|x),
and producing a single guess ŷ which minimizes E[L(y, ŷ)].
But if p(y|x) is not deterministic, this expected loss won't go to zero: Bayes error.
What if I didn't want my network to produce a best guess, but tell me about this distribution p(y|x)?
GENERATIVE ADVERSARIAL NETWORKS
One option: choose a parametric form for p(y|x) = f(y; θ).
Have a network g that predicts θ = g(x) for a specific x.
The other option is to train a sampler: a network that given an input x, produces samples from p(y|x).
How do you produce samples, or how do you produce multiple outputs for the same input x? You give your network access to a random generator: a noise source.
GENERATIVE ADVERSARIAL NETWORKS
Let's ignore conditional distributions. Consider the task of generating samples from p_x(x).
You don't know p_x(x), but you have training examples x that are samples from p_x(x).
You want to learn a generator network G(z; θ) which
Takes in random inputs z from a known distribution p_z(z)
And produces outputs x from p_x(x)
Has learnable parameters θ
You want to select θ such that the distribution of {G(z; θ) : z ∼ p_z(z)} matches p_x(x).
But you don't have the data distribution, only samples from it.
GENERATIVE ADVERSARIAL NETWORKS
Set this up as a min-max objective with a second network, a discriminator D(x; ϕ).
The discriminator is a binary classifier. Tries to determine if
The input x is real, i.e., it came from the training set.
Or fake, i.e., it was the output of G.
Train both networks simultaneously, against the following loss:
L(θ, ϕ) = −E_{z∼p_z} log(1 − D(G(z; θ); ϕ))
This is the cross-entropy loss on the discriminator saying outputs of G should be labeled 0.
What about examples that should be labeled 1?
GENERATIVE ADVERSARIAL NETWORKS
Set this up as a min-max objective with a second network, a discriminator D(x; ϕ).
The discriminator is a binary classifier. Tries to determine if
The input x is real, i.e., it came from the training set.
Or fake, i.e., it was the output of G.
Train both networks simultaneously, against the following loss:
L(θ, ϕ) = −E_{z∼p_z} log(1 − D(G(z; θ); ϕ)) − E_{x∼p_x} log D(x; ϕ)
This is the cross-entropy loss on the discriminator saying outputs of G should be labeled 0,
and examples from the training set should be labeled 1.
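A minimal sketch (the function name `gan_loss` is mine, not from the slides) of how this loss is estimated from a batch of discriminator outputs:

```python
import numpy as np

def gan_loss(d_fake, d_real):
    # Monte-Carlo estimate of L(theta, phi):
    #   d_fake = D(G(z)) over a batch of generated samples, values in (0, 1)
    #   d_real = D(x) over a batch of training samples, values in (0, 1)
    # Fake samples should be labeled 0, real samples labeled 1.
    return -np.mean(np.log(1.0 - d_fake)) - np.mean(np.log(d_real))

# A discriminator that is always unsure (outputs 0.5) incurs loss 2*log(2);
# a near-perfect discriminator (d_fake -> 0, d_real -> 1) drives the loss to 0.
```

The discriminator updates ϕ to decrease this value; the generator updates θ to increase it.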
GENERATIVE ADVERSARIAL NETWORKS
L(θ, ϕ) = −E_{z∼p_z} log(1 − D(G(z; θ); ϕ)) − E_{x∼p_x} log D(x; ϕ)
The expectation E_{x∼p_x} is just an average over the training set.
The expectation E_{z∼p_z} is based on sampling z from the known p_z at each iteration.
You are optimizing the discriminator to succeed in telling real and fake samples apart, i.e., to minimize the loss. And training the generator to fool the discriminator, i.e., maximize the same loss:
θ = arg max_θ min_ϕ L(θ, ϕ)
How do you solve this optimization problem? Turns out, it is reasonable to use back-prop and gradient descent.
Just compute gradients with respect to the loss for both the discriminator and generator. Subtract them from the discriminator's parameters, add them to the generator's.
GENERATIVE ADVERSARIAL NETWORKS
Theoretical Analysis
G = arg max_G min_D −E_{z∼p_z} log(1 − D(G(z))) − E_{x∼p_x} log D(x)
Let's say your discriminator and generator had infinite capacity and you had infinite training data.
For a given input x, what should the optimal output D(x) of your discriminator be?
Say you know p_x(x).
You also know p_z(z) and G, and therefore p_g(x): the probability of x being an output from the generator.
q = D(x) = arg min_q −p_g(x) log(1 − q) − p_x(x) log q
What q minimizes this, for q ∈ [0, 1]?
q = p_x(x) / (p_g(x) + p_x(x))
Let's replace D with this optimal value in the loss function, and figure out what G should do.
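The stated optimum can be checked in one step by differentiating the pointwise objective (a quick verification of my own, not from the slides):

```latex
f(q) = -\,p_g(x)\log(1-q) - p_x(x)\log q
\qquad
f'(q) = \frac{p_g(x)}{1-q} - \frac{p_x(x)}{q} = 0
\;\Rightarrow\; p_g(x)\,q = p_x(x)\,(1-q)
\;\Rightarrow\; q^\ast = \frac{p_x(x)}{p_g(x)+p_x(x)}
```

Since f''(q) = p_g(x)/(1−q)² + p_x(x)/q² > 0, f is convex on (0, 1) and this stationary point is the minimum.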
GENERATIVE ADVERSARIAL NETWORKS
Remember that p_g also depends on G. In fact, you can replace this as an optimization on p_g.
G = arg max_G min_D −E_{z∼p_z} log(1 − D(G(z))) − E_{x∼p_x} log D(x)
G = arg min_G E_{z∼p_z} log [ p_g(G(z)) / (p_x(G(z)) + p_g(G(z))) ] + E_{x∼p_x} log [ p_x(x) / (p_x(x) + p_g(x)) ]
= arg min_{p_g} ∫ [ p_g(x) log ( p_g(x) / (p_x(x) + p_g(x)) ) + p_x(x) log ( p_x(x) / (p_x(x) + p_g(x)) ) ] dx
You can relate this to KL-divergences (up to an additive constant):
KL( p_g ‖ (p_g + p_x)/2 ) + KL( p_x ‖ (p_g + p_x)/2 )
Called the Jensen-Shannon Divergence (up to scale).
Minimized when p_g matches p_x.
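A quick numerical sanity check (my own, not from the slides) on discrete distributions: the bracketed objective above equals −2 log 2 exactly when p_g = p_x, and is strictly larger otherwise.

```python
import numpy as np

def objective(pg, px):
    # Discrete analogue of the integral above:
    #   sum_x [ pg log(pg / (px + pg)) + px log(px / (px + pg)) ]
    # Equals KL(pg || m) + KL(px || m) - 2 log 2, with m = (pg + px) / 2,
    # so it is minimized (at -2 log 2) exactly when pg = px.
    m = pg + px
    return float(np.sum(pg * np.log(pg / m) + px * np.log(px / m)))

px = np.array([0.1, 0.2, 0.3, 0.4])
print(objective(px, px))                              # -2 log 2 ≈ -1.3863
print(objective(np.array([0.4, 0.3, 0.2, 0.1]), px))  # strictly larger
```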
GENERATIVE ADVERSARIAL NETWORKS
Practical Concerns
So the procedure is: set up your generator and discriminator networks. Define the loss:
L(θ, ϕ) = −E_{z∼p_z} log(1 − D(G(z; θ); ϕ)) − E_{x∼p_x} log D(x; ϕ)
At each iteration, pick a batch of z values from a known distribution (typically a vector of uniformly or Gaussian distributed values). And a batch of training samples. Compute gradients for the discriminator and update. Compute gradients for the generator, by back-propagating through the discriminator, and update.
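The whole procedure can be sketched end-to-end on a toy 1-D problem. Everything below (the linear generator, logistic discriminator, learning rate, and batch size) is my own minimal construction for illustration, with the gradients written out by hand; the generator uses the non-saturating −log D(G(z)) loss that the lecture discusses. Real GANs use deep networks and autodiff.

```python
import numpy as np

rng = np.random.default_rng(0)
lr, batch, steps = 0.05, 128, 3000

# Generator G(z) = a*z + b, trying to match data x ~ N(2, 0.5).
a, b = 1.0, 0.0
# Discriminator D(x) = sigmoid(w*x + c).
w, c = 0.0, 0.0
sig = lambda v: 1.0 / (1.0 + np.exp(-v))

for _ in range(steps):
    x_real = rng.normal(2.0, 0.5, batch)
    z = rng.normal(0.0, 1.0, batch)
    x_fake = a * z + b

    # --- Discriminator step: minimize -log D(x_real) - log(1 - D(x_fake)).
    d_real, d_fake = sig(w * x_real + c), sig(w * x_fake + c)
    # Cross-entropy gradient wrt the logit is (prediction - label).
    g_real, g_fake = d_real - 1.0, d_fake - 0.0
    w -= lr * (np.mean(g_real * x_real) + np.mean(g_fake * x_fake))
    c -= lr * (np.mean(g_real) + np.mean(g_fake))

    # --- Generator step: minimize -log D(G(z)) (non-saturating loss),
    # back-propagating through the discriminator to the generator params.
    d_fake = sig(w * x_fake + c)
    g_out = (d_fake - 1.0) * w        # dLoss/dx_fake
    a -= lr * np.mean(g_out * z)
    b -= lr * np.mean(g_out)

print(b)  # drifts from 0 toward the data mean of 2
```

Note the alternation inside each iteration: the discriminator descends on the loss, then the generator is updated through the (fresh) discriminator outputs.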
GENERATIVE ADVERSARIAL NETWORKS
GENERATIVE ADVERSARIAL NETWORKS
Practical Concerns
A common issue is that the discriminator has a much easier task. In the initial iterations, your generator will be producing junk. Very easy for the discriminator to identify fake samples with high confidence.
L(θ, ϕ) = −E_{z∼p_z} log(1 − D(G(z; θ); ϕ)) − E_{x∼p_x} log D(x; ϕ)
At that point, log(1 − D(G(z))) will saturate.
No gradients to the generator.
One common approach: minimize G with respect to a different loss.
Instead of max −log(1 − D(G(z)))
Do min −log D(G(z))
View this as minimizing cross-entropy wrt the wrong label, rather than maximizing wrt the true label.
GENERATIVE ADVERSARIAL NETWORKS
Practical Concerns
Max vs min: −log(1 − D(G)) vs −log(D(G))
Binary classifier: the output of D is σ(y) for some y.
The discriminator is doing really well, so D(G) is almost 0 (y << 0).
max −log(1 − D(G)) = min log(1 − D(G)): ∇_y = −(D(G) − 0) = −D(G) ≈ 0
min −log(D(G)): ∇_y = D(G) − 1 ≈ −1
Both are negative (i.e., correct sign, says try to increase D(G)). But the second version has much higher magnitude.
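The two gradients can be checked numerically (a sketch of my own; `y` is the discriminator logit as above):

```python
import math

sigmoid = lambda y: 1.0 / (1.0 + math.exp(-y))

# The two generator objectives, written as minimizations over the logit y:
saturating     = lambda y: math.log(1.0 - sigmoid(y))   # min log(1 - D(G))
non_saturating = lambda y: -math.log(sigmoid(y))        # min -log D(G)

def num_grad(f, y, eps=1e-6):
    # Central finite-difference approximation of df/dy.
    return (f(y + eps) - f(y - eps)) / (2.0 * eps)

y = -6.0                    # confident discriminator: D(G) = sigmoid(-6) ≈ 0.0025
print(num_grad(saturating, y))      # ≈ -D(G)    ≈ -0.0025 (vanishes)
print(num_grad(non_saturating, y))  # ≈ D(G) - 1 ≈ -0.9975 (stays useful)
```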
GENERATIVE ADVERSARIAL NETWORKS
Practical Concerns
Other approaches:
Reduce the capacity of the discriminator.
Make fewer updates to the discriminator, or give it a lower learning rate.
Provide additional losses to the generator to help it train: e.g., a separate network that predicts intermediate features of the discriminator.
Other losses: see Wasserstein GANs.
Also need to be careful how you use Batch Normalization. (Don't let the discriminator use batch statistics to tell real and fake apart!)
GENERATIVE ADVERSARIAL NETWORKS
Conditional GANs: now we want to sample from p(x|s) for a given s.
Same adversarial setting, but s is given as an input to both generator and discriminator: G(z, s) and D(x, s).
Sometimes the noise source is simply replaced by dropout. Or there is no noise at all: this is called an Adversarial Loss. The generator is producing a deterministic output, but being trained with a distribution-matching loss rather than L1 or L2.
Can be useful when the true p(x|s) is multi-modal.
Regular networks would average the modes; an adversarial loss promotes picking one of the modes.
ADVERSARIAL LOSS
Train with
L(G) = ‖G(x) − y‖² − λ log D(G(x))
L(D) = −log(1 − D(G(x))) − log D(y)
The GAN loss is unconditional, but there is also a reconstruction loss. So the loss says: be close to the true answer, but make your output resemble natural images.
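In code, the generator's objective combines the two terms like this (a minimal sketch; the helper name and the λ value are mine, not from the slides):

```python
import numpy as np

def adversarial_generator_loss(g_out, y, d_of_g, lam=0.01):
    # ||G(x) - y||^2 : reconstruction -- be close to the true answer.
    # -lam * log D(G(x)) : adversarial -- make the output look "real" to D.
    recon = np.sum((g_out - y) ** 2)
    adv = -np.log(d_of_g)
    return recon + lam * adv

# If the output matches ground truth and fully fools the discriminator
# (D(G(x)) = 1), both terms vanish and the loss is 0.
```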
GENERATIVE ADVERSARIAL NETWORKS
DCGAN: Radford et al.
GENERATIVE ADVERSARIAL NETWORKS
DCGAN: Radford et al.
Generated images from a training set of bedrooms (LSUN dataset).
GENERATIVE ADVERSARIAL NETWORKS
Neyshabur et al., Stabilizing GAN Training with Multiple Random Projections
GENERATIVE ADVERSARIAL NETWORKS
Denton et al., Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks
GENERATIVE ADVERSARIAL NETWORKS
Karras et al., Progressive Growing of GANs for Improved Quality, Stability, and Variation
CONDITIONAL GANS
UN-SUPERVISED LEARNING
For a lot of tasks, it is hard to collect enough training data. We saw in the stereo example how you can have indirect supervision. But in other cases, you have to use transfer learning.
Train a network on a large dataset for a related task for which you have ground truth. Remove the last layer, and use / finetune the feature extractor for the new task.
Researchers are exploring tasks made specifically to transfer from.
UN-SUPERVISED LEARNING
Pre-train by learning to add color
Larsson, Maire, Shakhnarovich, CVPR 2017.
UN-SUPERVISED LEARNING
Pre-train by solving jigsaw puzzles
UN-SUPERVISED LEARNING
Pre-train by predicting sound from video
DOMAIN ADAPTATION
Generate synthetic training data using renderers. But networks trained on synthetic data need not generalize to real data. (In fact, they may not transfer from high-quality Flickr data to cell-phone camera data.)
Problem Setting
Have input-output training pairs (x′, y) from the source domain: renderings/high-quality images/…
Have only inputs x from the target domain: where we actually want to use this.
Train a network so that features computed from x′ and x have the same distribution…
i.e., use GANs!
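One way to make "same feature distribution" concrete (a minimal sketch of my own, not the slides' specific method): attach a domain discriminator to the features, train it to tell source from target, and train the feature extractor to fool it, exactly as in a GAN.

```python
import numpy as np

sig = lambda v: 1.0 / (1.0 + np.exp(-v))

def domain_discriminator_loss(feat_src, feat_tgt, w):
    # Hypothetical linear domain classifier d(f) = sigmoid(f @ w):
    # label 1 = source features (from x'), label 0 = target features (from x).
    d_src = sig(feat_src @ w)
    d_tgt = sig(feat_tgt @ w)
    return -np.mean(np.log(d_src)) - np.mean(np.log(1.0 - d_tgt))

# The domain discriminator minimizes this loss; the feature extractor is
# updated to maximize it, so source and target features become
# indistinguishable -- and the task head trained on source labels transfers.
```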
DOMAIN ADAPTATION
We've covered what forms the foundations of state-of-the-art vision algorithms. It will help you read, understand, and implement vision papers. But things are changing rapidly: not just new solutions, but new problems. So keep reading!
We hope that you…
Have an understanding of the basic math and programming tools to approach vision problems.
Are as surprised as we are that humans and animals are able to solve this so easily.
That's all folks!