Deep Boltzmann Machines
Paper by: R. Salakhutdinov, G. Hinton
Presenter: Roozbeh Gholizadeh
Outline
Problems with some other methods!
Energy based models
Boltzmann machine
Restricted Boltzmann machine
Deep Boltzmann machine
Problems with other methods!
Supervised learning needs labeled data.
The amount of information the model can learn is restricted by the labels.
Some abnormalities must be recognized before they have ever been observed, such as certain conditions in a nuclear power plant.
So instead of learning p(label | data), learn p(data).
Energy Based Models
An energy function $E(x)$ is defined. It assigns a score (a scalar value) to each configuration $x$.
Ex. $p(x) = \frac{e^{-E(x)}}{Z}$, the Boltzmann (Gibbs) distribution.
$Z = \int e^{-E(x)}\,dx$, the integral of the numerator over all observations.
Parameters that assign lower energy (and therefore higher probability) to the observed data are desired.
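As a small illustration of the Boltzmann (Gibbs) distribution, the sketch below turns a handful of energies into probabilities. It assumes a discrete set of configurations, so the normalizer is a sum rather than an integral; the function name `boltzmann_probs` and the three energy values are made up for the example.

```python
import numpy as np

def boltzmann_probs(energies):
    """Map energies to probabilities p = exp(-E) / Z."""
    energies = np.asarray(energies, dtype=float)
    # Shifting all energies by a constant leaves p unchanged and avoids overflow.
    unnorm = np.exp(-(energies - energies.min()))
    Z = unnorm.sum()  # normalizer of the shifted energies (a sum, since states are discrete)
    return unnorm / Z

energies = [0.0, 1.0, 2.0]  # hypothetical energies of three configurations
probs = boltzmann_probs(energies)
```

The lowest-energy configuration receives the highest probability, which is why training tries to lower the energy of observed data.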
Boltzmann machine
Markov random field (MRF) with hidden variables.
Undirected edges represent dependencies between units. Each edge can carry a weight.
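Conditional distributions over hidden and visible units (with visible-hidden weights $W$, visible-visible weights $L$, hidden-hidden weights $J$, and logistic function $\sigma$):
$p(h_j = 1 \mid v, h_{-j}) = \sigma\big(\sum_i W_{ij} v_i + \sum_{m \neq j} J_{jm} h_m\big)$
$p(v_i = 1 \mid h, v_{-i}) = \sigma\big(\sum_j W_{ij} h_j + \sum_{k \neq i} L_{ik} v_k\big)$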
Learning process
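Parameter update (shown for the visible-hidden weights $W$ with learning rate $\alpha$; the other weights are updated analogously):
$\Delta W = \alpha \big( E_{P_{data}}[v h^\top] - E_{P_{model}}[v h^\top] \big)$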
Exact maximum likelihood learning is intractable.
Use Gibbs sampling to approximate.
Run two separate Markov chains, one to approximate the data-dependent expectation and one to approximate the model expectation.
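A minimal sketch of one Gibbs sweep in a general Boltzmann machine follows. It assumes binary units, no bias terms, and the energy $E(v,h) = -\tfrac{1}{2} v^\top L v - \tfrac{1}{2} h^\top J h - v^\top W h$ with symmetric, zero-diagonal $L$ and $J$. The function name `gibbs_sweep`, the layer sizes, and the random weights are illustrative assumptions, not values from the talk.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def symmetric_zero_diag(n, rng, scale=0.1):
    """Random symmetric matrix with zeros on the diagonal (no self-connections)."""
    a = rng.normal(scale=scale, size=(n, n))
    s = (a + a.T) / 2.0
    np.fill_diagonal(s, 0.0)
    return s

def gibbs_sweep(v, h, W, L, J, rng):
    """Resample every hidden unit, then every visible unit, one at a time."""
    v, h = v.copy(), h.copy()
    for j in range(h.size):
        # Input to hidden unit j: visible units via W, other hidden units via J.
        p = sigmoid(W[:, j] @ v + J[j] @ h)
        h[j] = float(rng.random() < p)
    for i in range(v.size):
        # Input to visible unit i: hidden units via W, other visible units via L.
        p = sigmoid(W[i] @ h + L[i] @ v)
        v[i] = float(rng.random() < p)
    return v, h

rng = np.random.default_rng(0)
n_visible, n_hidden = 4, 3  # hypothetical sizes
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
L = symmetric_zero_diag(n_visible, rng)
J = symmetric_zero_diag(n_hidden, rng)
v = (rng.random(n_visible) < 0.5).astype(float)
h = (rng.random(n_hidden) < 0.5).astype(float)
v, h = gibbs_sweep(v, h, W, L, J, rng)
```

Because every unit depends on units in its own layer, the updates have to be made one unit at a time, which is part of what makes learning in the unrestricted model slow.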
Restricted Boltzmann Machine
Setting the hidden-hidden weights $J$ and the visible-visible weights $L$ to zero ($J = 0$, $L = 0$).
Without visible-visible and hidden-hidden connections!
Learning can be carried out efficiently using Contrastive Divergence (CD)
or the stochastic approximation procedure (SAP).
Variational Approach to estimating data-dependent expectations.
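One step of Contrastive Divergence for a restricted Boltzmann machine can be sketched as follows. This is the common CD-1 variant for a binary RBM, written here without bias terms to keep it short; the function name `cd1_update`, the learning rate, and the toy data are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, rng, lr=0.1):
    """One CD-1 weight update on a batch v0 of shape (n_samples, n_visible)."""
    # Positive phase: hidden probabilities and a sample, given the data.
    ph0 = sigmoid(v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: a single Gibbs step back to the visible layer and up again.
    pv1 = sigmoid(h0 @ W.T)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W)
    # Difference between data-driven and reconstruction-driven correlations.
    grad = (v0.T @ ph0 - v1.T @ ph1) / v0.shape[0]
    return W + lr * grad

rng = np.random.default_rng(0)
v_data = (rng.random((8, 6)) < 0.5).astype(float)  # hypothetical binary data
W0 = 0.01 * rng.standard_normal((6, 4))
W1 = cd1_update(v_data, W0, rng)
```

With no connections inside a layer, all hidden units can be sampled in one matrix operation given the visible units, and vice versa.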
Stochastic approximation procedure (SAP)
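$\theta_t$ and $X_t$: current parameters and state.
$X_t$ and $\theta_t$ are updated sequentially as follows:
Given $X_t$, a new state $X_{t+1}$ is sampled from a transition operator $T_{\theta_t}(X_{t+1}; X_t)$ that leaves $p_{\theta_t}$ invariant.
A new parameter $\theta_{t+1}$ is obtained by replacing the intractable model's expectation by the expectation with respect to $X_{t+1}$.
The learning rate has to decrease with time, for example $\alpha_t = 1/t$.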
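The procedure above can be sketched for a bias-free binary RBM, where the persistent state is a small set of sampled visible vectors and the transition operator is one Gibbs step. The function name `sap_train`, the number of chains, and the toy data are assumptions for illustration; the learning rate follows the $1/t$ schedule mentioned above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_bernoulli(p, rng):
    return (rng.random(p.shape) < p).astype(float)

def sap_train(data, n_hidden, n_steps, rng, n_chains=10):
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    # Persistent state: visible samples that are carried over between updates.
    v_chain = sample_bernoulli(np.full((n_chains, n_visible), 0.5), rng)
    for t in range(1, n_steps + 1):
        alpha = 1.0 / t  # decreasing learning rate
        # Transition: one Gibbs step under the current parameters.
        h_chain = sample_bernoulli(sigmoid(v_chain @ W), rng)
        v_chain = sample_bernoulli(sigmoid(h_chain @ W.T), rng)
        # Data-dependent term (tractable for an RBM) minus the chain-based estimate.
        pos = data.T @ sigmoid(data @ W) / data.shape[0]
        neg = v_chain.T @ sigmoid(v_chain @ W) / n_chains
        W = W + alpha * (pos - neg)
    return W, v_chain

rng = np.random.default_rng(0)
data = sample_bernoulli(np.full((20, 6), 0.5), rng)  # hypothetical toy data
W, v_chain = sap_train(data, n_hidden=3, n_steps=50, rng=rng)
```

Unlike CD, the chains are not restarted at the data after each update, so they keep moving through the model's distribution while the parameters change slowly.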
Why go deep?
Deep architectures are representationally efficient: they need fewer computational units to represent the same function.
They allow a hierarchy of representations.
Non-local generalization.
It is easier to monitor what is being learned and to guide the machine.
Deep Boltzmann Machine
Undirected connections between all adjacent layers.
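Conditional distributions over visible and hidden units (for a DBM with two hidden layers $h^1$, $h^2$ and weights $W^1$, $W^2$):
$p(h^1_j = 1 \mid v, h^2) = \sigma\big(\sum_i W^1_{ij} v_i + \sum_m W^2_{jm} h^2_m\big)$
$p(h^2_m = 1 \mid h^1) = \sigma\big(\sum_j W^2_{jm} h^1_j\big)$
$p(v_i = 1 \mid h^1) = \sigma\big(\sum_j W^1_{ij} h^1_j\big)$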
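A minimal sketch of one Gibbs step in a DBM with two hidden layers is given below. It assumes binary units and no bias terms; the names `dbm_gibbs_step`, `W1`, `W2`, and the layer sizes are illustrative. The first hidden layer receives input from both of its neighbors, and the visible layer and second hidden layer each depend only on the first hidden layer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_bernoulli(p, rng):
    return (rng.random(p.shape) < p).astype(float)

def dbm_gibbs_step(v, h2, W1, W2, rng):
    """W1 has shape (n_visible, n_h1); W2 has shape (n_h1, n_h2)."""
    # h1 sits between v and h2, so it combines bottom-up and top-down input.
    h1 = sample_bernoulli(sigmoid(v @ W1 + h2 @ W2.T), rng)
    # Given h1, the visible layer and h2 are independent of each other.
    v_new = sample_bernoulli(sigmoid(h1 @ W1.T), rng)
    h2_new = sample_bernoulli(sigmoid(h1 @ W2), rng)
    return v_new, h1, h2_new

rng = np.random.default_rng(0)
n_visible, n_h1, n_h2 = 6, 4, 3  # hypothetical layer sizes
W1 = 0.1 * rng.standard_normal((n_visible, n_h1))
W2 = 0.1 * rng.standard_normal((n_h1, n_h2))
v = sample_bernoulli(np.full(n_visible, 0.5), rng)
h2 = sample_bernoulli(np.full(n_h2, 0.5), rng)
v, h1, h2 = dbm_gibbs_step(v, h2, W1, W2, rng)
```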
Pretraining (greedy layerwise)
MNIST dataset
NORB
Misclassification error rate: DBM 10.8%, SVM 11.6%, logistic regression 22.5%, k-nearest neighbors 18.4%
Thank you!