(DL Hacks reading group) Variational Dropout and the Local Reparameterization Trick
-
Variational Dropout and the Local Reparameterization Trick
Diederik P. Kingma, Tim Salimans and Max Welling
-
Submitted on 8 Jun 2015 (arXiv) 7/17
Dropout = local reparameterization trick
(SGVB) EM
-
EM
(SGVB)
Variational Dropout and the Local Reparameterization Trick
-
EM EM
q(z)
q(z) q(z)
-
EM
(SGVB)
Variational Dropout and the Local Reparameterization Trick
-
z p(z)p(x|z)
z
-
Marginal likelihood p(x): $p(x) = \int p(z)\,p(x|z)\,dz$
Posterior p(z|x): $p(z|x) = p(x|z)\,p(z) / p(x)$ (the EM algorithm needs this posterior)
-
Approximate the true posterior p(z|x) with a variational distribution q(z|x)
-
reparameterization trick
reparameterization trick
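$$
\begin{aligned}
\mathcal{L}(\theta,\phi;x) &= \int q(z|x) \log \frac{p(x,z)}{q(z|x)}\,dz \\
&= \int q(z|x) \log \frac{p(z)\,p(x|z)}{q(z|x)}\,dz \\
&= \int q(z|x) \log \frac{p(z)}{q(z|x)}\,dz + \int q(z|x) \log p(x|z)\,dz \\
&= -\int q(z|x) \log \frac{q(z|x)}{p(z)}\,dz + \int q(z|x) \log p(x|z)\,dz \\
&= -D_{KL}(q(z|x)\,\|\,p(z)) + \mathbb{E}_{q(z|x)}[\log p(x|z)] \qquad (6)
\end{aligned}
$$

Stochastic Gradient Variational Bayes (SGVB). An expectation of the form $\mathbb{E}_{q(z|x)}[f(z)]$ can be estimated by Monte Carlo sampling:
1. Draw samples $\{z^{(l)}\}_{l=1}^{L}$ from q(z|x).
2. Average f over the samples $z^{(l)}$.

$$
\mathbb{E}_{q(z|x)}[f(z)] \approx \frac{1}{L}\sum_{l=1}^{L} f(z^{(l)}) \qquad (7)
$$

Instead of sampling z from q(z|x) directly, z is written as a deterministic function $g(\epsilon, x)$ of a noise variable $\epsilon$ ($z = g(\epsilon, x)$), where $\epsilon$ is drawn from $p(\epsilon)$. Then (7) becomes:

$$
\begin{aligned}
\mathbb{E}_{q(z|x,\phi)}[f(z)] &= \int q(z|x,\phi)\,f(z)\,dz \\
&= \int p(\epsilon)\,f(z)\,d\epsilon \qquad (\because\ q(z|x,\phi)\,dz = p(\epsilon)\,d\epsilon) \\
&= \int p(\epsilon)\,f(g(\epsilon,x))\,d\epsilon \\
&= \mathbb{E}_{p(\epsilon)}[f(g(\epsilon,x))] \approx \frac{1}{L}\sum_{l=1}^{L} f(g(\epsilon^{(l)},x)), \qquad \epsilon^{(l)} \sim p(\epsilon) \qquad (8)
\end{aligned}
$$

Applying this to (5) gives the estimator $\mathcal{L}^{A}(\theta,\phi;x)$:

$$
\mathcal{L}^{A}(\theta,\phi;x) = \frac{1}{L}\sum_{l=1}^{L}\left[\log p(x, z^{(l)}|\theta) - \log q(z^{(l)}|x,\phi)\right], \qquad z^{(l)} = g(\epsilon^{(l)},x),\ \epsilon^{(l)} \sim p(\epsilon) \qquad (9)
$$

This is the SGVB estimator. An SGVB estimator can also be derived from (6): the KL term in (6) can often be computed analytically, so only the expectation term in (6) needs to be estimated by sampling.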
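The reparameterized Monte Carlo estimate in (8) can be sketched in a few lines. This is a minimal illustration, assuming a one-dimensional Gaussian q with z = g(eps) = mu + sigma * eps and a test function f(z) = z^2 chosen only because its expectation is known in closed form; the function name `reparam_estimate` and all parameter values are hypothetical.

```python
import random


def reparam_estimate(f, mu, sigma, num_samples, rng):
    """Estimate E_{z ~ N(mu, sigma^2)}[f(z)] using z = g(eps) = mu + sigma * eps."""
    total = 0.0
    for _ in range(num_samples):
        eps = rng.gauss(0.0, 1.0)  # eps ~ p(eps) = N(0, 1), independent of mu and sigma
        z = mu + sigma * eps       # deterministic transform of the noise (assumed Gaussian q)
        total += f(z)
    return total / num_samples     # (1/L) * sum_l f(g(eps_l))


rng = random.Random(0)
estimate = reparam_estimate(lambda z: z * z, mu=1.0, sigma=0.5, num_samples=20000, rng=rng)
exact = 1.0 ** 2 + 0.5 ** 2  # E[z^2] = mu^2 + sigma^2 for a Gaussian
print(estimate, exact)
```

Because the randomness enters only through eps, the estimate is a differentiable function of mu and sigma, which is what makes gradient-based optimization of the bound possible.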
-
(SGVB)
(SGVB)
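Of the two terms in the lower bound, the expectation term $\mathbb{E}_{q(z|x)}[\log p(x|z)]$ is the (negative) reconstruction error and the KL term acts as a regularizer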
-
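(SGVB) For a dataset of N datapoints
Compute the SGVB estimator on a minibatch of M datapoints
With a large enough minibatch (M=100), L=1 sample per datapoint is sufficient
Optimize with SGD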
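The minibatch estimator on this slide can be written out as follows. This is a sketch following the usual SGVB formulation; the symbol $\tilde{\mathcal{L}}$ for the per-datapoint estimator and the index $i$ are notation assumed here, not taken from the slide.

$$
\mathcal{L}(\theta,\phi;X) \approx \frac{N}{M}\sum_{i=1}^{M}\tilde{\mathcal{L}}(\theta,\phi;x^{(i)})
$$

The factor N/M rescales the sum over the M minibatch datapoints so that it is an unbiased estimate of the bound over all N datapoints.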
-
(SGVB)
1. M 2. 3. 4. 5.
-
SGVB
zx reparameterization trick
-
SGVB
Hinton
MCMC1
Deep Learning
-
EM
(SGVB)
Variational Dropout and the Local Reparameterization Trick
-
KL
-
SGVB SGVB
-
SGD SGD
M
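As a sketch of why the minibatch size M matters here, the variance of the minibatch SGVB estimator can be decomposed as below. The notation is assumed for illustration: $L_i$ is the contribution of datapoint $i$ to the estimator, and the $L_i$ are taken to be identically distributed.

$$
\mathrm{Var}\left[\mathcal{L}^{SGVB}\right] = N^2\left(\frac{1}{M}\mathrm{Var}[L_i] + \frac{M-1}{M}\mathrm{Cov}[L_i, L_j]\right)
$$

The first term shrinks as M grows, but the covariance term does not, so a larger minibatch alone cannot remove the variance caused by sharing one noise sample across datapoints.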
-
local reparameterization trick
f()
0
local reparameterization trick
-
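reparameterization trick
Making the covariance between datapoints 0 requires a separate weight sample per datapoint: 1000 × 1000 × M random numbers
This motivates the local reparameterization trick
[Figure: matrix product B = A W, with A of size M × 1000 and W of size 1000 × 1000]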
1000
-
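local reparameterization trick
[Figure: matrix product B = A W, with B of size M × 1000, A of size M × 1000 and W of size 1000 × 1000]
Sample B directly instead of W
With the local reparameterization trick the covariance is 0 using only M × 1000 random numbers
The noise is local to each datapoint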
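The contrast between the two sampling schemes can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the weights are assumed to have a factorized Gaussian posterior with means `mu` and variances `sigma2`, plain Python lists stand in for an array library, and the function names are hypothetical. For such weights, each pre-activation b_mj is Gaussian with mean sum_i a_mi * mu_ij and variance sum_i a_mi^2 * sigma2_ij.

```python
import math
import random
import statistics


def sample_b_global(a, mu, sigma2, rng):
    """Draw a fresh weight matrix W for every datapoint, then compute b_m = a_m W."""
    n_in, n_out = len(mu), len(mu[0])
    b = []
    for a_m in a:  # n_in * n_out random numbers per datapoint
        w = [[rng.gauss(mu[i][j], math.sqrt(sigma2[i][j])) for j in range(n_out)]
             for i in range(n_in)]
        b.append([sum(a_m[i] * w[i][j] for i in range(n_in)) for j in range(n_out)])
    return b


def sample_b_local(a, mu, sigma2, rng):
    """Local reparameterization: sample each b_mj directly from its Gaussian marginal."""
    n_in, n_out = len(mu), len(mu[0])
    b = []
    for a_m in a:  # only n_out random numbers per datapoint
        row = []
        for j in range(n_out):
            gamma = sum(a_m[i] * mu[i][j] for i in range(n_in))           # mean of b_mj
            delta = sum(a_m[i] ** 2 * sigma2[i][j] for i in range(n_in))  # variance of b_mj
            row.append(gamma + math.sqrt(delta) * rng.gauss(0.0, 1.0))
        b.append(row)
    return b


# Toy check (hypothetical values): both schemes should give b[0][0] the same distribution.
rng = random.Random(0)
a = [[1.0, 2.0], [0.5, -1.0]]
mu = [[0.3], [-0.2]]
sigma2 = [[0.04], [0.09]]
draws_global = [sample_b_global(a, mu, sigma2, rng)[0][0] for _ in range(20000)]
draws_local = [sample_b_local(a, mu, sigma2, rng)[0][0] for _ in range(20000)]
print(statistics.mean(draws_global), statistics.pvariance(draws_global))
print(statistics.mean(draws_local), statistics.pvariance(draws_local))
```

For the 1000 × 1000 layer on the slide, the first function would draw 1000 × 1000 numbers per datapoint and the second only 1000, while both keep the noise independent across datapoints.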
-
01
p
-
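independent weight noise
Gaussian noise $\mathcal{N}(1, \alpha)$ is multiplied into the inputs; each b is then Gaussian
Wang and Manning (2013): sample B directly
B = AW: equivalent to noise on W combined with the local reparameterization trick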
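The equivalence behind independent weight noise can be checked numerically. The sketch below assumes multiplicative noise xi ~ N(1, alpha) on each input of a single output unit with weights `theta_j`; under that assumption the pre-activation b has mean sum_i a_i * theta_i and variance alpha * sum_i (a_i * theta_i)^2. Function names and numeric values are hypothetical.

```python
import math
import random
import statistics


def dropout_noise_on_inputs(a_m, theta_j, alpha, rng):
    """b = sum_i (a_i * xi_i) * theta_i with xi_i ~ N(1, alpha)."""
    return sum(a * rng.gauss(1.0, math.sqrt(alpha)) * t for a, t in zip(a_m, theta_j))


def dropout_marginal(a_m, theta_j, alpha, rng):
    """Sample b directly from N(gamma, delta), the marginal implied by the noise above."""
    gamma = sum(a * t for a, t in zip(a_m, theta_j))
    delta = alpha * sum((a * t) ** 2 for a, t in zip(a_m, theta_j))
    return rng.gauss(gamma, math.sqrt(delta))


rng = random.Random(1)
a_m, theta_j, alpha = [1.0, -2.0, 0.5], [0.4, 0.1, -0.6], 0.25
noisy = [dropout_noise_on_inputs(a_m, theta_j, alpha, rng) for _ in range(20000)]
direct = [dropout_marginal(a_m, theta_j, alpha, rng) for _ in range(20000)]
print(statistics.mean(noisy), statistics.pvariance(noisy))
print(statistics.mean(direct), statistics.pvariance(direct))
```

The direct version needs one random number per output unit instead of one per input, which is the same saving the local reparameterization trick provides.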
-
correlated weight noise
B
local reparameterization trick
W
-
dropout posterior Dropout
KL
scale invariant log-uniform prior
-
1
-
standard binary dropout
Gaussian dropout type A (A)
Gaussian dropout type B (B)
variational dropout type A
variational dropout type B
MNIST
fully connected, 3 hidden layers, rectified linear units (ReLUs)
dropout rate: input layer p=0.2, hidden layers p=0.5
early stopping
-
variational dropout type B
dropout
dropout
-
SGVB
Time per epoch of SGVB with and without local reparameterization: SGVB without it 1635 sec, SGVB with it 7.4 sec
local reparameterization is more than 200 times faster
-
A2KL
-
local reparameterization trick: converts global noise on the parameters into local noise per datapoint
variational dropout is derived from the local reparameterization trick