Bayesian Dark Knowledge and Matrix Factorization Masatoshi Uehara Mentor: Oono Kenta, Brian Vogel October 27, 2016

Upload: preferred-infrastructure-preferred-networks

Posted on 16-Apr-2017


Page 1: Bayesian Dark Knowledge and Matrix Factorization

Bayesian Dark Knowledge and Matrix Factorization

Masatoshi Uehara
Mentor: Oono Kenta, Brian Vogel

October 27, 2016


Contents

1 Introduction

2 Bayesian Dark Knowledge with various SG-MCMC methods

3 Matrix Factorization

(JPN) Masatoshi October 27, 2016 2 / 18

Introduction

SG-MCMC (stochastic gradient Markov chain Monte Carlo) is a family of sampling algorithms designed to scale to large datasets.

We apply a variety of SG-MCMC methods to Bayesian Dark Knowledge.

We combine GANs with Bayesian Dark Knowledge.

We apply SG-MCMC and neural networks to matrix factorization.

SGLD

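SGLD (stochastic gradient Langevin dynamics) combines SGD with MLA (a sampling algorithm):

$$\theta_{t+1} \leftarrow \theta_t - \epsilon_t D \nabla U(\theta_t) + \mathcal{N}(0, 2\epsilon_t D)$$

Here $U(\theta)$ is the negative log posterior (its gradient is estimated from a minibatch) and $D$ is a preconditioning matrix.

For a Bayesian neural network trained on $N$ examples with minibatches of size $n$, taking $D = I/2$ gives:

$$\Delta\theta_t = \frac{\epsilon_t}{2}\left(\nabla \log p(\theta_t) + \frac{N}{n}\sum_{i=1}^{n} \nabla \log p(y_{t_i} \mid x_{t_i}, \theta_t)\right) + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \epsilon_t).$$

Dropping the noise term $\eta_t$ recovers plain SGD on the log posterior.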
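A minimal Python sketch of the update above, on a toy model we chose for illustration (not from the slides): the mean θ of unit-variance Gaussian data under a N(0, 100) prior, where the exact posterior mean is known. The function `sgld_gaussian_mean` and every setting in it are hypothetical.

```python
import random


def sgld_gaussian_mean(data, n_batch=50, eps=1e-4, n_iter=5000,
                       prior_var=100.0, seed=0):
    """SGLD samples of theta for x_i ~ N(theta, 1), prior theta ~ N(0, prior_var).
    Toy model chosen for illustration; not taken from the slides."""
    rng = random.Random(seed)
    N = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_iter):
        batch = rng.sample(data, n_batch)
        grad_log_prior = -theta / prior_var
        # Minibatch likelihood gradient, rescaled by N/n to estimate the full-data sum.
        grad_log_lik = (N / n_batch) * sum(x - theta for x in batch)
        # Delta theta = eps/2 * (gradient of log posterior) + eta, eta ~ N(0, eps).
        theta += 0.5 * eps * (grad_log_prior + grad_log_lik) + rng.gauss(0.0, eps ** 0.5)
        samples.append(theta)
    return samples


# Hypothetical toy data: 1000 draws from N(2, 1).
data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(1000)]
samples = sgld_gaussian_mean(data)
kept = samples[1000:]  # discard burn-in
sgld_mean = sum(kept) / len(kept)
exact_mean = sum(data) / (len(data) + 1.0 / 100.0)  # conjugate posterior mean
print(sgld_mean, exact_mean)
```

With the `rng.gauss` term removed, the same loop is plain SGD and settles at a single point instead of spreading its samples over the posterior.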

Bayesian Dark Knowledge with various SG-MCMC methods

Bayesian Dark Knowledge Overview

Bayesian Dark Knowledge is a method that combines SGLD with the concept of distillation.

SGLD is a useful method for learning Bayesian deep networks.

The problem is that SGLD needs to store many copies of the parameters (one per posterior sample) to make predictions.

The motivation is to replace this set of sampled neural networks with a single deep network.

This lets us estimate the confidence of predictions even when the amount of data is small.


Method

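The teacher is denoted as $p(y \mid x, \mathcal{D}_N)$, the posterior predictive distribution given the $N$ training examples $\mathcal{D}_N$, approximated by the SGLD samples $\theta \in \Theta$. The student network is denoted as $S(y \mid x, \omega)$.

In the distillation phase, the following loss is minimized.

Distillation loss

$$L(\omega) = \int \mathrm{KL}\big(p(y \mid x, \mathcal{D}_N) \,\|\, S(y \mid x, \omega)\big)\, p(x)\, dx \approx -\frac{1}{|\Theta|}\frac{1}{|\mathcal{D}'|}\sum_{\theta \in \Theta}\sum_{x' \in \mathcal{D}'} \mathbb{E}_{p(y \mid x', \theta)}\big[\log S(y \mid x', \omega)\big] + \text{const}$$

where $\mathcal{D}'$ is a set of unlabeled inputs drawn from $p(x)$.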
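The Monte Carlo estimate on the right-hand side can be computed directly. Below is a hedged sketch for a classifier, assuming each teacher prediction p(y | x′, θ) is given as a list of class probabilities and the student outputs logits; the function names and toy numbers are ours. Because the KL term differs from this cross-entropy only by a constant (the teacher's entropy), minimizing one minimizes the other.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def distillation_loss(teacher_probs, student_logits):
    """teacher_probs[k][j] = p(y | x'_j, theta_k) as a list of class probabilities.
    student_logits[j]      = student logits for x'_j.
    Returns -(1/|Theta|)(1/|D'|) sum_k sum_j sum_y p(y|x'_j,theta_k) log S(y|x'_j,omega)."""
    student_log_probs = [log_softmax(z) for z in student_logits]
    total = 0.0
    for probs_k in teacher_probs:                           # theta in Theta
        for p_y, log_s in zip(probs_k, student_log_probs):  # x' in D'
            total -= sum(p * ls for p, ls in zip(p_y, log_s))
    return total / (len(teacher_probs) * len(student_logits))


# Hypothetical example: two posterior samples, one input x'.
# The teacher averaged over Theta predicts [0.7, 0.3].
teacher = [[[0.9, 0.1]], [[0.5, 0.5]]]
print(distillation_loss(teacher, [[0.0, 0.0]]))                        # uniform student
print(distillation_loss(teacher, [[math.log(0.7), math.log(0.3)]]))  # matches the average
```

The loss is smallest when the student reproduces the teacher averaged over Θ (here [0.7, 0.3]), which is how one student can stand in for all the sampled networks.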

Algorithm

Note that the student network is trained online, while the teacher samples are being generated, so we do not have to store many copies of the parameters.
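A toy sketch of the online scheme, under assumptions of our own: a 1D logistic-regression teacher sampled by SGLD, a logistic student trained by SGD, and D′ built by adding Gaussian noise to training inputs. After burn-in, each sample θ_t drives one student update and is then overwritten. All names and hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def online_bdk(xs, ys, n_iter=3000, burn_in=500, batch=50, distill_batch=20,
               eps=2e-3, lr=0.05, prior_var=10.0, perturb_sd=0.5, seed=0):
    """Teacher: p(y=1 | x, theta) = sigmoid(a + b x), sampled by SGLD.
    Student: S(y=1 | x, omega) = sigmoid(w0 + w1 x), trained online by SGD.
    Toy models chosen for illustration; not the slides' networks."""
    rng = random.Random(seed)
    N = len(xs)
    idx = list(range(N))
    a = b = 0.0    # current teacher sample theta_t = (a, b)
    w0 = w1 = 0.0  # student parameters omega
    for t in range(n_iter):
        # Teacher: one SGLD step on a labelled minibatch (Gaussian prior).
        ga, gb = -a / prior_var, -b / prior_var
        for i in rng.sample(idx, batch):
            r = ys[i] - sigmoid(a + b * xs[i])
            ga += (N / batch) * r
            gb += (N / batch) * r * xs[i]
        a += 0.5 * eps * ga + rng.gauss(0.0, math.sqrt(eps))
        b += 0.5 * eps * gb + rng.gauss(0.0, math.sqrt(eps))
        if t < burn_in:
            continue
        # Student: one SGD step on the distillation loss, using only theta_t.
        d0 = d1 = 0.0
        for _ in range(distill_batch):
            # Assumption: x' in D' is a training input plus Gaussian noise.
            x = xs[rng.randrange(N)] + rng.gauss(0.0, perturb_sd)
            p = sigmoid(a + b * x)       # teacher p(y=1 | x', theta_t)
            s = sigmoid(w0 + w1 * x)     # student S(y=1 | x', omega)
            d0 += s - p                  # d(cross-entropy) / d(student logit)
            d1 += (s - p) * x
        w0 -= lr * d0 / distill_batch
        w1 -= lr * d1 / distill_batch
        # theta_t is overwritten at the next step: no archive of samples is kept.
    return (a, b), (w0, w1)


# Hypothetical toy data: y = 1 when x plus unit Gaussian noise is positive.
data_rng = random.Random(1)
xs = [data_rng.uniform(-3.0, 3.0) for _ in range(500)]
ys = [1 if x + data_rng.gauss(0.0, 1.0) > 0 else 0 for x in xs]
theta, omega = online_bdk(xs, ys)
print(theta, omega)
```

Memory stays constant in the number of samples: only the current θ_t and the student parameters ω are held at any time.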


How to improve?

We want a greater variety of teachers.

Use other SG-MCMC methods.

How do we build the unlabeled data set D′ used for distillation?

Use GANs.


SG-HMC and SG-NHT

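SG-HMC

$$\theta_{t+1} \leftarrow \theta_t + \epsilon_t M^{-1} r_t$$

$$r_{t+1} \leftarrow r_t - \epsilon_t \nabla U(\theta_t) - \epsilon_t C M^{-1} r_t + \mathcal{N}\big(0, \epsilon_t(2C - \epsilon_t B_t)\big)$$

SG-NHT

$$\theta_{t+1} \leftarrow \theta_t + \epsilon_t r_t$$

$$r_{t+1} \leftarrow r_t - \epsilon_t \nabla U(\theta_t) - \epsilon_t \zeta_t r_t + \mathcal{N}\big(0, \epsilon_t(2C - \epsilon_t B_t)\big)$$

$$\zeta_{t+1} \leftarrow \zeta_t + \epsilon_t\left(\frac{1}{d} r_t^{\top} r_t - 1\right)$$

Here $r_t$ is the momentum, $M$ the mass matrix, $C$ the friction term, $B_t$ an estimate of the stochastic-gradient noise, $\zeta_t$ the thermostat variable, and $d$ the dimension of $\theta$.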
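A scalar sketch of both update rules, under simplifying assumptions of ours: the target is U(θ) = (θ − 1)²/2 (a normal distribution with mean 1 and variance 1), the minibatch gradient is imitated by adding Gaussian noise to the exact gradient, and M = 1, d = 1, B_t = 0. Function names and settings are hypothetical.

```python
import math
import random


def noisy_grad(theta, mu, rng, noise_sd):
    """Gradient of U(theta) = (theta - mu)^2 / 2 plus Gaussian noise,
    an assumed stand-in for a minibatch gradient estimate."""
    return (theta - mu) + rng.gauss(0.0, noise_sd)


def sghmc(n_iter, eps=0.05, C=1.0, mu=1.0, noise_sd=0.5, seed=0):
    rng = random.Random(seed)
    theta, r = 0.0, 0.0
    out = []
    for _ in range(n_iter):
        g = noisy_grad(theta, mu, rng, noise_sd)
        # M = 1 and B_t = 0, so the injected noise is N(0, 2 * C * eps).
        # The tuple assignment uses the time-t values on every right-hand side.
        theta, r = (theta + eps * r,
                    r - eps * g - eps * C * r + rng.gauss(0.0, math.sqrt(2 * C * eps)))
        out.append(theta)
    return out


def sgnht(n_iter, eps=0.05, C=1.0, mu=1.0, noise_sd=0.5, seed=0):
    rng = random.Random(seed)
    theta, r, zeta = 0.0, 0.0, C
    d = 1  # dimension of theta
    out = []
    for _ in range(n_iter):
        g = noisy_grad(theta, mu, rng, noise_sd)
        # The thermostat zeta replaces the fixed friction C in the momentum update.
        theta, r, zeta = (theta + eps * r,
                          r - eps * g - eps * zeta * r + rng.gauss(0.0, math.sqrt(2 * C * eps)),
                          zeta + eps * (r * r / d - 1.0))
        out.append(theta)
    return out


def mean_var(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)


hmc_mean, hmc_var = mean_var(sghmc(100_000)[10_000:])  # drop burn-in
nht_mean, nht_var = mean_var(sgnht(100_000)[10_000:])
print(hmc_mean, hmc_var)  # target: mean 1, variance 1 (up to discretization error)
print(nht_mean, nht_var)
```

Both samplers should land near the target moments; SG-NHT adapts ζ so that the average kinetic energy r²/d stays near 1 even though the extra gradient noise is not modelled (B_t = 0).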


Bayesian Dark Knowledge with GANs

GANs can mimic the empirical data distribution.

In the distillation phase, we use a GAN as a simulator that generates the unlabeled inputs for the student.

Open question: how do we remove poor-quality generated images?


Anomaly detection by GANs

[Figure panels: uLSIF, GAN]
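The slide names uLSIF without details. One reading, which is our assumption, is that uLSIF (unconstrained least-squares importance fitting) estimates the density ratio r(x) = p_real(x) / p_GAN(x), and generated samples with a low ratio are treated as poor images. The 1D sketch below uses Gaussian kernels centred on real samples; the bandwidth, regularization, toy data and all names are hypothetical.

```python
import math
import random


def kernel(x, c, sigma):
    return math.exp(-(x - c) ** 2 / (2.0 * sigma ** 2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def ulsif(x_nu, x_de, centers, sigma=0.5, lam=0.1):
    """Fit r(x) = sum_l alpha_l K(x, c_l) to p_nu(x) / p_de(x), with
    alpha = max(0, (H + lam I)^-1 h), H = mean over x_de of phi phi^T,
    h = mean over x_nu of phi."""
    b = len(centers)
    phi_de = [[kernel(x, c, sigma) for c in centers] for x in x_de]
    phi_nu = [[kernel(x, c, sigma) for c in centers] for x in x_nu]
    H = [[sum(p[l] * p[m] for p in phi_de) / len(x_de) + (lam if l == m else 0.0)
          for m in range(b)] for l in range(b)]
    h = [sum(p[l] for p in phi_nu) / len(x_nu) for l in range(b)]
    alpha = [max(0.0, a) for a in solve(H, h)]  # clip negative weights to zero
    return lambda x: sum(a * kernel(x, c, sigma) for a, c in zip(alpha, centers))


# Hypothetical 1D stand-ins for real and generated images.
rng = random.Random(0)
real = [rng.gauss(0.0, 1.0) for _ in range(300)]
generated = ([rng.gauss(0.0, 1.0) for _ in range(300)]
             + [rng.gauss(6.0, 0.5) for _ in range(100)])  # a cluster of "poor" samples
ratio = ulsif(real, generated, centers=real[:20])
flagged = sorted(generated, key=ratio)[:50]  # lowest estimated p_real / p_gen
print(sum(x > 3.0 for x in flagged), "of 50 flagged samples come from the poor cluster")
```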


Result: MNIST

Setting: 800 labeled MNIST samples; 2000 epochs; burn-in interval: 200; thinning interval: 5.

Network: 784-1200-1200-10, activation: ReLU.
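To see why storing SGLD samples is costly, here is a back-of-the-envelope count for the 784-1200-1200-10 network, assuming (our reading of the settings) one kept sample every 5 epochs from epoch 200 up to epoch 2000, and 32-bit floats. The helper name is hypothetical.

```python
def mlp_param_count(layers):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))


n_params = mlp_param_count([784, 1200, 1200, 10])
kept_epochs = range(200, 2000, 5)          # assumed: burn-in 200, thinning interval 5
n_samples = len(kept_epochs)
bytes_needed = n_params * n_samples * 4    # assumed float32 storage
print(n_params, n_samples, bytes_needed)   # 2395210 360 3449102400
```

That is roughly 3.4 GB of parameter copies, whereas online distillation keeps only the current sample and the student.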

Result

Matrix Factorization

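A rating matrix $R$ is given.

$u_i$: feature vector of user $i$; $v_j$: feature vector of item $j$; $R_{ij}$: rating of user $i$ for item $j$, modelled as $R_{ij} \approx u_i^{\top} v_j$.

Learning uses SGD over the observed entries $(i, j)$, with learning rate $\eta$:

$$u_i \leftarrow u_i - \eta \nabla_{u_i}\big[(R_{ij} - u_i^{\top} v_j)^2 + \lambda \|u_i\|^2\big]$$

$$v_j \leftarrow v_j - \eta \nabla_{v_j}\big[(R_{ij} - u_i^{\top} v_j)^2 + \lambda \|v_j\|^2\big]$$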
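A minimal sketch of these SGD updates on a made-up 3×4 rank-2 rating matrix, with the learning rate `lr` and penalty `lam`; the helper names and settings are hypothetical.

```python
import math
import random


def sgd_mf(R, K=2, lam=0.01, lr=0.02, epochs=3000, seed=0):
    """SGD for R_ij ~ u_i . v_j over observed entries (None = unobserved)."""
    rng = random.Random(seed)
    L, M = len(R), len(R[0])
    U = [[rng.gauss(0.0, 0.1) for _ in range(K)] for _ in range(L)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(K)] for _ in range(M)]
    obs = [(i, j) for i in range(L) for j in range(M) if R[i][j] is not None]
    for _ in range(epochs):
        rng.shuffle(obs)
        for i, j in obs:
            err = R[i][j] - sum(a * b for a, b in zip(U[i], V[j]))
            # Gradients of (R_ij - u_i.v_j)^2 + lam*|u_i|^2 (and likewise for v_j),
            # both computed from the pre-update vectors.
            gu = [-2 * err * v + 2 * lam * u for u, v in zip(U[i], V[j])]
            gv = [-2 * err * u + 2 * lam * v for u, v in zip(U[i], V[j])]
            U[i] = [u - lr * g for u, g in zip(U[i], gu)]
            V[j] = [v - lr * g for v, g in zip(V[j], gv)]
    return U, V


def train_rmse(R, U, V):
    errs = [(R[i][j] - sum(a * b for a, b in zip(U[i], V[j]))) ** 2
            for i in range(len(R)) for j in range(len(R[0])) if R[i][j] is not None]
    return math.sqrt(sum(errs) / len(errs))


# Hypothetical toy rating matrix of rank 2.
R = [[1, 2, 3, 0],
     [2, 1, 0, 3],
     [3, 3, 3, 3]]
U, V = sgd_mf(R)
print(train_rmse(R, U, V))
```

Here λ only needs to be small; choosing it well on real data is exactly the tuning problem that the Bayesian treatment addresses.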


Matrix Factorization with SGLD

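$$p(R \mid U, V, \tau) = \prod_{i=1}^{L}\prod_{j=1}^{M}\Big[\mathcal{N}\big(R_{ij} \mid U_i^{\top} V_j, \tau^{-1}\big)\Big]^{I_{ij}}$$

$$p(U \mid \lambda_U) = \prod_{i=1}^{L} \mathcal{N}\big(U_i \mid 0, \mathrm{diag}(\lambda_U)^{-1}\big)$$

$$p(V \mid \lambda_V) = \prod_{j=1}^{M} \mathcal{N}\big(V_j \mid 0, \mathrm{diag}(\lambda_V)^{-1}\big)$$

$$\lambda_{U_d} \sim \mathrm{Gamma}(\alpha_0, \beta_0), \qquad \lambda_{V_d} \sim \mathrm{Gamma}(\alpha_0, \beta_0)$$

Here $L$ is the number of users, $M$ the number of items, $\tau$ the observation precision, $I_{ij} = 1$ if $R_{ij}$ is observed (0 otherwise), and $d$ indexes the latent dimensions.

Use Gibbs sampling.

When updating $U$ and $V$, SGLD is used.

The precisions $\lambda$ are sampled as part of the Gibbs sweep, so the regularization strength is tuned automatically.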
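A toy sketch of this scheme, assuming Gamma(α0, β0) uses the shape/rate convention (so the Gibbs conditional is Gamma(α0 + L/2, β0 + ½ Σ_i U_id²)), a fixed τ, and a made-up 3×4 matrix with one unobserved entry. Each iteration resamples λ and then takes one SGLD step on U and V; predictions are averaged after burn-in. Names and hyperparameters are hypothetical.

```python
import math
import random


def sgld_bmf(R, K=2, tau=4.0, alpha0=1.0, beta0=1.0, eps=5e-3, batch=6,
             n_iter=5000, burn_in=1000, seed=0):
    """Per iteration: Gibbs draw of the precisions, then one SGLD step on U, V.
    Entries of R equal to None have I_ij = 0.
    Returns the posterior-mean prediction of U_i^T V_j for every (i, j)."""
    rng = random.Random(seed)
    L, M = len(R), len(R[0])
    obs = [(i, j) for i in range(L) for j in range(M) if R[i][j] is not None]
    N = len(obs)
    U = [[rng.gauss(0.0, 0.1) for _ in range(K)] for _ in range(L)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(K)] for _ in range(M)]
    pred_sum = [[0.0] * M for _ in range(L)]
    sd = math.sqrt(eps)
    for t in range(n_iter):
        # Gibbs step (assumed shape/rate convention):
        # lambda_d | U ~ Gamma(alpha0 + L/2, rate = beta0 + sum_i U_id^2 / 2).
        # random.gammavariate takes (shape, scale), so we pass scale = 1 / rate.
        lam_U = [rng.gammavariate(alpha0 + L / 2,
                                  1.0 / (beta0 + 0.5 * sum(U[i][d] ** 2 for i in range(L))))
                 for d in range(K)]
        lam_V = [rng.gammavariate(alpha0 + M / 2,
                                  1.0 / (beta0 + 0.5 * sum(V[j][d] ** 2 for j in range(M))))
                 for d in range(K)]
        # Gradient of the log posterior: prior terms ...
        gU = [[-lam_U[d] * U[i][d] for d in range(K)] for i in range(L)]
        gV = [[-lam_V[d] * V[j][d] for d in range(K)] for j in range(M)]
        # ... plus minibatch likelihood terms, rescaled by N / batch.
        for i, j in rng.sample(obs, batch):
            err = R[i][j] - sum(a * b for a, b in zip(U[i], V[j]))
            for d in range(K):
                gU[i][d] += (N / batch) * tau * err * V[j][d]
                gV[j][d] += (N / batch) * tau * err * U[i][d]
        # SGLD step: eps/2 * gradient plus N(0, eps) noise on every entry.
        U = [[u + 0.5 * eps * g + rng.gauss(0.0, sd) for u, g in zip(Ui, gi)]
             for Ui, gi in zip(U, gU)]
        V = [[v + 0.5 * eps * g + rng.gauss(0.0, sd) for v, g in zip(Vj, gj)]
             for Vj, gj in zip(V, gV)]
        if t >= burn_in:
            for i in range(L):
                for j in range(M):
                    pred_sum[i][j] += sum(a * b for a, b in zip(U[i], V[j]))
    n_kept = n_iter - burn_in
    return [[s / n_kept for s in row] for row in pred_sum]


# Hypothetical toy ratings; None marks an unobserved entry (I_ij = 0).
R = [[1, 2, 3, None],
     [2, 1, 0, 3],
     [3, 3, 3, 3]]
pred = sgld_bmf(R)
obs_err = [(R[i][j] - pred[i][j]) ** 2
           for i in range(3) for j in range(4) if R[i][j] is not None]
fit_rmse = math.sqrt(sum(obs_err) / len(obs_err))
print(fit_rmse, pred[0][3])  # fit on observed entries, prediction for the missing one
```

Averaging the product U_iᵀV_j, rather than U and V themselves, sidesteps the rotation and sign symmetries of the factorization.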


Neural Network Matrix Factorization

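Estimate each entry as $\hat X_{n,m} = f_\theta(U_n, V_m)$, where $f_\theta$ is a neural network with parameters $\theta$ and $U_n$, $V_m$ are latent user and item features.

Cost function:

$$\sum_{(n,m)} \big(X_{n,m} - \hat X_{n,m}\big)^2 + \lambda\Big[\sum_n \|U_n\|_2^2 + \sum_m \|V_m\|_2^2\Big]$$

$\theta$, $U_n$ and $V_m$ are updated at the same time.

NNMF can reach state-of-the-art accuracy.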
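A small sketch of this joint training, where f_θ is taken, as an assumption of ours, to be a one-hidden-layer ReLU network on the concatenation [U_n; V_m], trained by plain SGD; the original NNMF architecture and optimizer may differ. θ, U_n and V_m all move in the same gradient step. Names, sizes and toy data are hypothetical.

```python
import random


def nnmf(obs, L, M, K=3, H=8, lam=0.01, lr=0.01, epochs=500, seed=0):
    """Jointly train f_theta (assumed: one ReLU hidden layer on [U_n; V_m]) and the
    latent features U_n, V_m by SGD. obs is a list of (n, m, x) observed entries.
    Returns the training MSE before training and after each epoch."""
    rng = random.Random(seed)
    obs = list(obs)
    U = [[rng.gauss(0.0, 0.1) for _ in range(K)] for _ in range(L)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(K)] for _ in range(M)]
    W1 = [[rng.gauss(0.0, 0.3) for _ in range(2 * K)] for _ in range(H)]
    b1 = [0.0] * H
    w2 = [rng.gauss(0.0, 0.3) for _ in range(H)]
    b2 = [0.0]  # stored in a list so forward() sees in-place updates

    def forward(n, m):
        z = U[n] + V[m]  # list concatenation [U_n; V_m]
        a = [sum(w * zc for w, zc in zip(row, z)) + bk for row, bk in zip(W1, b1)]
        h = [max(0.0, ak) for ak in a]
        return z, a, h, sum(w * hk for w, hk in zip(w2, h)) + b2[0]

    def mse():
        return sum((x - forward(n, m)[3]) ** 2 for n, m, x in obs) / len(obs)

    history = [mse()]
    for _ in range(epochs):
        rng.shuffle(obs)
        for n, m, x in obs:
            z, a, h, y = forward(n, m)
            dy = 2.0 * (y - x)  # derivative of (x - y)^2 with respect to y
            # Backpropagate using the current (pre-update) parameters.
            da = [dy * w2[k] if a[k] > 0 else 0.0 for k in range(H)]
            dz = [sum(da[k] * W1[k][c] for k in range(H)) for c in range(2 * K)]
            for k in range(H):  # update theta = (W1, b1, w2, b2)
                w2[k] -= lr * dy * h[k]
                b1[k] -= lr * da[k]
                for c in range(2 * K):
                    W1[k][c] -= lr * da[k] * z[c]
            b2[0] -= lr * dy
            # Update U_n and V_m; the L2 penalty is applied per observed entry
            # (a common SGD simplification of the cost above).
            U[n] = [u - lr * (g + 2 * lam * u) for u, g in zip(U[n], dz[:K])]
            V[m] = [v - lr * (g + 2 * lam * v) for v, g in zip(V[m], dz[K:])]
        history.append(mse())
    return history


# Hypothetical toy ratings.
R = [[1, 2, 3, 0], [2, 1, 0, 3], [3, 3, 3, 3]]
obs = [(n, m, R[n][m]) for n in range(3) for m in range(4)]
history = nnmf(obs, L=3, M=4)
print(history[0], history[-1])
```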


Results

Datasets: MovieLens ML-100K and ML-1M.

Evaluation metric: root mean square error (RMSE).

Unfortunately, we could not reproduce the state-of-the-art accuracy.
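For reference, RMSE over an evaluation set T of ratings is √((1/|T|) Σ_{(i,j)∈T} (R_ij − R̂_ij)²). A minimal sketch; the function name and numbers are ours:

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error over paired true and predicted ratings."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical ratings: sqrt((0.25 + 0 + 1) / 3)
print(rmse([4, 3, 5], [3.5, 3, 4]))
```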


Discussion

Does data generated by GANs help classifiers?

What is a good way to combine neural networks with matrix factorization?


References

Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC

Neural Network Matrix Factorization

A Complete Recipe for Stochastic Gradient MCMC

Bayesian Dark Knowledge

Probabilistic Matrix Factorization
