TRANSCRIPT
Lorena Álvarez Pérez Machine Learning Group (MLG)
March 5, 2018
Structured Probabilistic
Models for Deep Learning
Lectures slides for Chapter 16 of Deep Learning www.deeplearningbook.org
Ian Goodfellow
Deep Learning Book Chapter 16: Graphical Models March 5, 2018
Index
0. Overview
1. The Challenge of Unstructured Modeling
2. Using Graphs to Describe Model Structure
3. Sampling from Graphical Models
4. Advantages of Structured Modeling
5. Learning about Dependencies
6. Inference and Approximate Inference
7. The Deep Learning Approach to Structured Probabilistic Models
Role of structured probabilistic models in deep learning
• For many tasks (other than classification), a full representation of the probability distribution over the variables is needed, e.g., denoising, missing value imputation, sampling, etc.
• Structured models (also known as probabilistic graphical models, PGMs) provide compact representations compared to full (unstructured) probability distributions
March 5, 2018 3/58
Deep Learning Book: Chapter 16
0. Overview
What are structured probabilistic models?
• A way of describing probability distributions, using a graph to describe which variables interact with each other directly
• Graph is used in the sense of graph theory: vertices connected to one another by edges
• Because their structure is described by a graph, they are called graphical models
In deep learning, different model structures, learning algorithms and inference procedures are used!
March 5, 2018 4/58
Deep Learning Book: Chapter 16
0. Overview
• The goal of deep learning is to scale machine learning to the kinds of challenges needed to solve artificial intelligence – e.g., understand natural images, audio waveforms representing speech, etc.
• The classification task of machine learning has a limited goal
– Classifiers take input from a rich, high-dimensional distribution and summarize it with a categorical label
– They discard most of the input
– They produce a single output (or a probability distribution over values of that single output)
It is possible to ask probabilistic models to do many
other tasks!
March 5, 2018 5/58
Deep Learning Book: Chapter 16
1. The Challenge of Unstructured Modeling
Probabilistic models for other tasks
• They are more expensive than classification
• They require producing many output values
• They require a complete understanding of the structure of the entire input, without ignoring sections of it
• Some tasks are: 1) Density estimation
2) Denoising
3) Missing value imputation
4) Sampling
March 5, 2018 6/58 Deep Learning Book: Chapter 16
1. The Challenge of Unstructured Modeling
Example: Probabilistic modeling of natural images Generate new samples from a distribution p(x)
March 5, 2018 7/58 Deep Learning Book: Chapter 16
1. The Challenge of Unstructured Modeling
Intractability of rich distribution • It is a challenging task, both computationally and statistically
• Consider a 32x32x3 binary image – There are 2^3072 possible images
• If we have n discrete variables with k possible values each, the naïve approach of representing p(x) requires storing a table with k^n values!!
This is not feasible!
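To make the blow-up concrete, a quick back-of-the-envelope check in plain Python (the function name `table_size` is just illustrative):

```python
# Size of the naive lookup table for p(x): k^n entries for
# n discrete variables with k possible values each.

def table_size(n_vars, n_values):
    """Entries in a full joint table over n_vars variables, n_values states each."""
    return n_values ** n_vars

# A 32x32x3 binary image has 3072 binary variables.
n_images = table_size(32 * 32 * 3, 2)   # 2**3072 possible images
print(len(str(n_images)))               # 925 decimal digits
```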
March 5, 2018 8/58 Deep Learning Book: Chapter 16
1. The Challenge of Unstructured Modeling
Intractability of rich distribution • Memory • Statistical efficiency • Runtime: Cost of inference
– If we have p(x) and need to infer p(x1) or p(x2|x1), this requires summing across the entire table
• Runtime: Cost of sampling
Table-based approach models every possible interaction between variables, but usually variables influence each other only indirectly
March 5, 2018 9/58 Deep Learning Book: Chapter 16
1. The Challenge of Unstructured Modeling
Direct and indirect interaction
• Consider modeling finishing times in a relay race. The team has three runners: Alice, Bob, and Carol.
• Alice hands the baton to Bob, and Bob hands it to Carol, who finishes the lap
– Alice's finishing time does not depend on anyone else's, Bob's finishing time depends on Alice's, and Carol's depends on Bob's
– Carol's finishing time depends only indirectly on Alice's
– If we already know Bob's finishing time, we will not be able to better estimate Carol's finishing time by finding out what Alice's finishing time was
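This conditional independence can be checked numerically. The sketch below builds a joint distribution from hypothetical CPDs for the chain t0 → t1 → t2 (the probability values are made up for illustration) and verifies that, once Bob's time is known, Alice's time carries no extra information about Carol's:

```python
import itertools

# Hypothetical discrete CPDs over 2 time values (0 = "fast", 1 = "slow").
p_t0 = {0: 0.6, 1: 0.4}
p_t1_given_t0 = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}   # p(t1 | t0)
p_t2_given_t1 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}   # p(t2 | t1)

# Joint distribution implied by the chain t0 -> t1 -> t2.
joint = {(a, b, c): p_t0[a] * p_t1_given_t0[a][b] * p_t2_given_t1[b][c]
         for a, b, c in itertools.product([0, 1], repeat=3)}

def cond_t2(t1, t0=None):
    """p(t2 = 1 | t1 [, t0]) computed directly from the joint table."""
    keep = [(k, v) for k, v in joint.items()
            if k[1] == t1 and (t0 is None or k[0] == t0)]
    return sum(v for k, v in keep if k[2] == 1) / sum(v for _, v in keep)

# Knowing Alice's time adds nothing once Bob's time is known:
for t1 in (0, 1):
    assert abs(cond_t2(t1, t0=0) - cond_t2(t1, t0=1)) < 1e-12
```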
March 5, 2018 10/58
2. Using Graphs to Describe Model Structure
2.1 Directed Models
Deep Learning Book: Chapter 16
Using graphs to describe model structure • Each node represents a random variable
• Each edge represents a direct interaction - These direct interactions imply other indirect interactions
- But only direct interactions need to be represented
• Graphical models can be largely divided into two categories: 1) Models based on directed acyclic graphs
2) Models based on undirected graphs
March 5, 2018 11/58
2. Using Graphs to Describe Model Structure
2.1 Directed Models
Deep Learning Book: Chapter 16
Directed Graphical Model • Also called a belief network or Bayesian network
• In the relay race example…
Bob’s finishing time t1 depends on Alice’s finishing time t0 and Carol’s finishing time t2 depends on Bob’s finishing time t1
What does the arrow represent?
March 5, 2018 12/58
2. Using Graphs to Describe Model Structure
2.1 Directed Models
Deep Learning Book: Chapter 16
t0 t1 t2
Alice Bob Carol
Meaning of directed edges
Drawing an arrow from "a" to "b" means we define a conditional probability distribution (CPD) over "b", with "a" as one of the variables on the right side of the conditioning bar
– i.e., the distribution over "b" depends on the value of "a"
March 5, 2018 13/58
2. Using Graphs to Describe Model Structure
2.1 Directed Models
Deep Learning Book: Chapter 16
a b p(b|a)
Formal directed graphical model
• A directed acyclic graph G defined on variables x needs:
– A set of vertices, which represent the random variables in the model
– A set of local CPDs: p(x_i | Pa_G(x_i)), where Pa_G(x_i) denotes the parents of x_i in G
– The probability distribution over x is given by:

p(x) = ∏_i p(x_i | Pa_G(x_i))

– In the relay race example:

p(t0, t1, t2) = p(t0) p(t1 | t0) p(t2 | t1)

March 5, 2018 14/58
2. Using Graphs to Describe Model Structure
2.1 Directed Models
Deep Learning Book: Chapter 16
Savings achieved by the directed model
• As an example: if t0, t1, and t2 are discrete with 100 values each, a single joint table would require 999,999 values
– By making tables for only the conditional probabilities, we need only 19,899 values (a reduction by a factor of more than 50)
• The cost of a single table for modeling n discrete variables, each having k values, is O(k^n)
• If m is the maximum number of variables appearing (on either side of the conditioning bar) in a single CPD, the cost of the tables for the directed model is O(k^m)
– As long as m << n, very dramatic savings are obtained!
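A minimal sketch of this parameter count (the helper names are illustrative):

```python
def full_table_params(n, k):
    """A single joint table needs k**n entries; one is fixed by normalization."""
    return k ** n - 1

def cpd_params(parent_counts, k):
    """Each CPD p(x_i | parents) needs (k - 1) free entries per parent configuration."""
    return sum((k - 1) * k ** m for m in parent_counts)

# Relay race: t0 has no parents, t1 and t2 have one parent each; k = 100.
print(full_table_params(3, 100))      # 999999
print(cpd_params([0, 1, 1], 100))     # 99 + 9900 + 9900 = 19899
```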
March 5, 2018 15/58
2. Using Graphs to Describe Model Structure
2.1 Directed Models
Deep Learning Book: Chapter 16
O(k^n)
O(k^m)
• Also known as Markov random fields or Markov networks
• They use graphs whose edges are undirected, and they have no CPDs
– Directed models work best when influence clearly flows in one direction
– Undirected models work best when influence has no clear direction, or is best modeled as flowing in both directions
Let’s see an example!
March 5, 2018 16/58
2. Using Graphs to Describe Model Structure
2.2 Undirected models
Deep Learning Book: Chapter 16
Example: The health undirected model
• Consider a model over three binary variables:
• Whether or not you are sick (h_y)
• Whether or not your coworker is sick (h_c)
• Whether or not your roommate is sick (h_r)
• Let us assume your coworker and roommate do not know each other, so it is very unlikely that one of them will give a cold to the other (we do not model it)
• There is no clear directionality either, so we use an undirected model
March 5, 2018 17/58
2. Using Graphs to Describe Model Structure
2.2 Undirected models
Deep Learning Book: Chapter 16
h_y
h_c
h_r
Example: The health undirected graph
• You and your roommate may infect each other with a cold
• You and your coworker may do the same
• Let us assume your roommate and coworker do not know each other
March 5, 2018 18/58
2. Using Graphs to Describe Model Structure
2.2 Undirected models
Deep Learning Book: Chapter 16
Nodes: h_c, h_y, h_r; edges: h_y-h_c and h_y-h_r
Does your roommate have a cold?
Do you have a cold?
Does your coworker have a cold?
Formal undirected graphical model
• An undirected probabilistic graphical model is defined on a graph G
– For each clique C in the graph, a factor φ(C) (also called a clique potential) measures the affinity of the variables for being in each of their joint states
– Together, these factors define an unnormalized probability distribution
March 5, 2018 19/58
2. Using Graphs to Describe Model Structure
2.2 Undirected Models
Deep Learning Book: Chapter 16
(A clique C is a subset of nodes all connected to each other.)

p̃(x) = ∏_{C∈G} φ(C)
Example: This graph (with five cliques) implies that
March 5, 2018 20/58
2. Using Graphs to Describe Model Structure
2.2 Undirected Models
Deep Learning Book: Chapter 16
a b c
d e f
p(a, b, c, d, e, f) = (1/Z) φ_{a,b}(a, b) φ_{b,c}(b, c) φ_{a,d}(a, d) φ_{b,e}(b, e) φ_{e,f}(e, f)
• The unnormalized probability distribution p̃(x):
– It is guaranteed to be non-negative
– It is not guaranteed to sum or integrate to 1
• To obtain a valid probability distribution, we must normalize (the result is known as a Gibbs distribution)
• Obviously:
– Z is a constant when the φ functions are constants
– If the φ functions have parameters, then Z is a function of those parameters
March 5, 2018 21/58
2. Using Graphs to Describe Model Structure
2.3 The Partition Function
Deep Learning Book: Chapter 16
p(x) = (1/Z) p̃(x), where p̃(x) = ∏_{C∈G} φ(C)

Z = ∫ p̃(x) dx (a sum for discrete x): the partition function
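A tiny worked example of the partition function, assuming a hypothetical single-clique model over two binary variables (the potential `phi` is made up for illustration):

```python
import itertools
import math

# Hypothetical clique potential: larger when a and b agree.
def phi(a, b):
    return math.exp(2.0 if a == b else 0.0)

states = list(itertools.product([0, 1], repeat=2))
p_tilde = {s: phi(*s) for s in states}          # unnormalized p~(x)
Z = sum(p_tilde.values())                       # partition function (a sum here)
p = {s: v / Z for s, v in p_tilde.items()}      # normalized Gibbs distribution

assert abs(sum(p.values()) - 1.0) < 1e-12       # now a valid distribution
```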
Intractability of Z
• Since Z is an integral or sum over all possible values of x, it is often intractable to compute
• In order to compute a normalized probability in an undirected model:
– The model structure and the definitions of the φ functions must be conducive to computing Z efficiently
– In deep learning, Z is intractable, and we must resort to approximations
March 5, 2018 22/58
2. Using Graphs to Describe Model Structure
2.3 The Partition Function
Deep Learning Book: Chapter 16
Difference between directed & undirected
• Directed models are:
– Defined directly in terms of probability distributions
• Undirected models are:
– Defined more loosely, in terms of φ functions that must then be converted into probability distributions
– The domain of the variables has a dramatic effect on the kind of probability distribution a given set of φ functions corresponds to
March 5, 2018 23/58
2. Using Graphs to Describe Model Structure
2.3 The Partition Function
Deep Learning Book: Chapter 16
• Many interesting theoretical results about undirected graphs depend on the assumption that p̃(x) > 0 for all x
• We can enforce this using an energy-based model
– E(x) is known as the energy function
– Since exp(z) > 0 for all z, no energy function will result in a probability of zero for any state of x
• Any distribution of the form p̃(x) = exp(−E(x)) is referred to as a Boltzmann distribution
March 5, 2018 24/58
2. Using Graphs to Describe Model Structure
2.4 Energy-Based Models
Deep Learning Book: Chapter 16
p̃(x) > 0 ∀x, where p̃(x) = ∏_{C∈G} φ(C)

p̃(x) = exp(−E(x)), with E(x) the energy function; exp(z) > 0 ∀z
• Cliques in the undirected graph correspond to factors in the unnormalized probability function
– Since exp(a) exp(b) = exp(a + b), different cliques in the undirected graph correspond to different terms of the energy function
• Exponentiation makes each term of the energy function correspond to a factor for a different clique
• i.e., an energy-based model is a special kind of Markov network
• This graph (with five cliques) implies that:
March 5, 2018 25/58
2. Using Graphs to Describe Model Structure
2.4 Energy-Based Models
Deep Learning Book: Chapter 16
exp(a)exp(b) = exp(a+ b)
a b c
d e f
E(a, b, c, d, e, f) = E_{a,b}(a, b) + E_{b,c}(b, c) + E_{a,d}(a, d) + E_{b,e}(b, e) + E_{e,f}(e, f)

The φ functions are obtained by setting each φ to the exponential of the corresponding negative energy: φ_{a,b}(a, b) = exp(−E_{a,b}(a, b))
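The correspondence between energy terms and clique factors can be verified numerically. The sketch below uses hypothetical pairwise energies for the cliques {a, b} and {b, c}:

```python
import itertools
import math

# Hypothetical pairwise energies for two cliques of a chain a - b - c.
E_ab = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 0.0}
E_bc = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}

def energy(a, b, c):
    # Total energy is the sum of the per-clique terms.
    return E_ab[(a, b)] + E_bc[(b, c)]

def phi_ab(a, b):
    # Factor = exponential of the corresponding negative energy.
    return math.exp(-E_ab[(a, b)])

def phi_bc(b, c):
    return math.exp(-E_bc[(b, c)])

# exp(-(E1 + E2)) == exp(-E1) * exp(-E2): energy terms become clique factors.
for a, b, c in itertools.product([0, 1], repeat=3):
    assert abs(math.exp(-energy(a, b, c)) - phi_ab(a, b) * phi_bc(b, c)) < 1e-12
```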
Separation in undirected models • Identifying conditional independences is very simple
– Conditional independence implied by the graph is called separation
• A set of variables A is separated from variables B given a third set of variables S if the graph structure implies that A is independent from B given S
• If two variables "a" and "b" are connected by a path involving only unobserved variables, then they are not separated
– If no path exists between them, or all paths contain an observed variable, then they are separated
March 5, 2018 27/58
2. Using Graphs to Describe Model Structure
2.5 Separation and D-separation
Deep Learning Book: Chapter 16
Separation in undirected models: Example • “b” is shaded to indicate that it is observed
• “b” blocks path from “a” to “c”, so “a” and “c” are separated given “b”
• There is an active path from “a” to “d”, so “a” and “d” are not separated given “b”
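Separation can be tested mechanically: delete the observed nodes and check reachability. A minimal sketch, where the edge set (a-b, b-c, a-d) is one plausible reading of the slide's figure:

```python
from collections import deque

def separated(edges, A, B, observed):
    """A and B are separated given `observed` in an undirected graph iff
    every path between them passes through an observed variable."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Remove observed nodes, then test reachability from A to B.
    frontier = deque(a for a in A if a not in observed)
    seen = set()
    while frontier:
        u = frontier.popleft()
        if u in B:
            return False
        if u in seen:
            continue
        seen.add(u)
        frontier.extend(w for w in adj.get(u, ())
                        if w not in observed and w not in seen)
    return True

edges = [("a", "b"), ("b", "c"), ("a", "d")]
assert separated(edges, {"a"}, {"c"}, {"b"})        # b blocks the only a-c path
assert not separated(edges, {"a"}, {"d"}, {"b"})    # the a-d path stays active
```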
March 5, 2018 28/58
2. Using Graphs to Describe Model Structure
2.5 Separation and D-separation
Deep Learning Book: Chapter 16
a
c
d
b
Separation in directed models
• In the context of directed graphs, these separation concepts are called d-separation
• D-separation is defined the same way as separation for undirected graphs:
– A set of variables A is d-separated from a set of variables B given a third set of variables S if the graph structure implies that A is independent of B given S
• Two variables are dependent if there is an active path between them; if not, they are d-separated
• In directed models, determining whether a path is active is more complicated
March 5, 2018 29/58
2. Using Graphs to Describe Model Structure
2.5 Separation and D-separation
Deep Learning Book: Chapter 16
All active paths (of length 2) in directed models between "a" and "b"
March 5, 2018 30/58
2. Using Graphs to Describe Model Structure
2.5 Separation and D-separation
Deep Learning Book: Chapter 16
(Figure: three cases of length-2 paths: a chain, as in the relay race; a common cause s; and a V-structure or collider, which produces the explaining away effect)
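The activation rules for these three length-2 path types can be written down directly. A small sketch (the simplified `path_active` helper is illustrative and only covers paths of length 2):

```python
def path_active(kind, s_observed, s_descendant_observed=False):
    """Is the length-2 path through middle node s active?
    kind: 'chain' (a->s->b), 'common_cause' (a<-s->b), 'collider' (a->s<-b)."""
    if kind in ("chain", "common_cause"):
        # Observing the middle node blocks chains and common causes.
        return not s_observed
    if kind == "collider":
        # Explaining away: a collider path is active only when s
        # (or a descendant of s) is observed.
        return s_observed or s_descendant_observed
    raise ValueError(kind)

assert not path_active("chain", s_observed=True)     # relay race: t1 blocks
assert path_active("common_cause", s_observed=False)
assert path_active("collider", s_observed=True)      # explaining away
assert not path_active("collider", s_observed=False)
```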
March 5, 2018
2. Using Graphs to Describe Model Structure
2.6 Converting between Undirected and Directed Graphs
Deep Learning Book: Chapter 16
• No probabilistic model is inherently directed or undirected
– Some models are most easily described using a directed graph, others using an undirected graph
• Directed models and undirected models both have their advantages and disadvantages
– The choice will partially depend on which probability distribution we wish to describe
– Which approach can capture the most independences in the probability distribution, or which approach uses the fewest edges
• Every probability distribution can be represented by either a directed model or by an undirected model
– Worst case: the "complete graph"
31/58
"Complete graphs"
• Directed models:
– Any directed acyclic graph where we impose some ordering on the random variables
– Each variable has all of the variables that precede it in the ordering as its ancestors in the graph
• Undirected models:
– A graph containing a single clique encompassing all of the variables
March 5, 2018 32/58
2. Using Graphs to Describe Model Structure
2.6 Converting between Undirected and Directed Graphs
Deep Learning Book: Chapter 16
Undirected model Directed model
Not useful because they do not imply any independences!
Converting a directed model into an undirected model
• We need to create a new graph U
• Looking at the directed graph D:
– For every pair of variables "x" and "y", we add an undirected edge connecting "x" and "y" to U if there is a directed edge between them in D, or if "x" and "y" are both parents of a third variable "z"
• The resulting graph U is known as a moralized graph
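A minimal sketch of moralization, assuming the directed graph is given as a child-to-parents mapping (this representation is a simplifying assumption):

```python
import itertools

def moralize(parents):
    """Moralized undirected graph U from a directed graph D given as
    {child: set of parents}; edges are returned as frozensets of node pairs."""
    edges = set()
    for child, ps in parents.items():
        for p in ps:
            edges.add(frozenset((p, child)))        # keep every original edge
        for p, q in itertools.combinations(sorted(ps), 2):
            edges.add(frozenset((p, q)))            # "marry" co-parents
    return edges

# Collider a -> c <- b: moralization adds the undirected edge a-b.
U = moralize({"c": {"a", "b"}})
print(sorted(sorted(e) for e in U))   # [['a', 'b'], ['a', 'c'], ['b', 'c']]
```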
March 5, 2018 33/58
2. Using Graphs to Describe Model Structure
2.6 Converting between Undirected and Directed Graphs
Deep Learning Book: Chapter 16
Examples of converting directed models to undirected models
March 5, 2018 34/58
2. Using Graphs to Describe Model Structure
2.6 Converting between Undirected and Directed Graphs
Deep Learning Book: Chapter 16
Undirected model
Directed model
Converting an undirected model to a directed model
• A loop is a sequence of variables connected by undirected edges, with the last variable in the sequence connected back to the first
• A chord is a connection between any two non-consecutive variables in the sequence defining a loop
• We cannot create a directed model if the graph has chordless loops of length four or greater
– Solution: add edges to triangulate long loops (the new graph is known as a chordal or triangulated graph)
• Finally, it is necessary to assign directions to the edges
– No directed cycles are allowed!
March 5, 2018 35/58
2. Using Graphs to Describe Model Structure
2.6 Converting between Undirected and Directed Graphs
Deep Learning Book: Chapter 16
Examples of converting an undirected model to a directed one
March 5, 2018 36/58
2. Using Graphs to Describe Model Structure
2.6 Converting between Undirected and Directed Graphs
Deep Learning Book: Chapter 16
Undirected model Directed model
No chordless loops of length greater than three are allowed!
Edges are added to triangulate long loops
Directions are assigned to the edges (no directed cycles are allowed!)
• Factor graphs resolve an ambiguity in the graphical representation of standard undirected models
– The ambiguity arises because it is not clear whether each clique actually has a corresponding factor whose scope encompasses the entire clique
• A factor graph is a graphical representation of an undirected model that consists of a bipartite undirected graph
– Some of the nodes are drawn as circles
• They correspond to the random variables in the standard undirected model
– The rest of the nodes are drawn as squares
• They correspond to the factors of the unnormalized probability distribution
• A variable and a factor are connected if the variable is one of the arguments to the factor
– No factor may be connected to another factor in the graph, nor may a variable be connected to a variable
March 5, 2018 37/58
2. Using Graphs to Describe Model Structure
2.7 Factor graphs
Deep Learning Book: Chapter 16
Example of how a factor graph can resolve ambiguity
March 5, 2018 38/58
2. Using Graphs to Describe Model Structure
2.7 Factor graphs
Deep Learning Book: Chapter 16
Undirected graph: is this three pairwise potentials, or one potential over three variables?
This factor graph has one factor over all three variables
This factor graph has three factors (each over only two variables)
March 5, 2018 39/58
3. Sampling from Graphical Models
3.1 Direct models
Deep Learning Book: Chapter 16
• In directed graphical models, the ancestral sampling method can produce samples from the joint distribution represented by the model
• How does ancestral sampling work?
– Sort the variables into a topological ordering, so that for all i and j, j is greater than i if x_i is a parent of x_j
– The variables can then be sampled in this order: first x_1, then x_2, and so on, up to x_n
• It does not support every conditional sampling operation
– Sampling from a subset of the variables in a directed graphical model, given other variables, requires that all of the conditioning variables come earlier than the variables to be sampled in the ordering
x_1 ∼ P(x_1), x_2 ∼ P(x_2 | Pa_G(x_2)), …, x_n ∼ P(x_n | Pa_G(x_n))
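Ancestral sampling for the relay race can be sketched directly; the CPDs below (uniform leg times of 9 to 11 units added at each handoff) are hypothetical:

```python
import random

random.seed(0)

# Hypothetical relay-race CPDs: each leg adds a positive increment.
def sample_t0():
    return random.uniform(9.0, 11.0)

def sample_t1(t0):
    return t0 + random.uniform(9.0, 11.0)

def sample_t2(t1):
    return t1 + random.uniform(9.0, 11.0)

def ancestral_sample():
    # Topological order t0, t1, t2: parents are always sampled first.
    t0 = sample_t0()
    t1 = sample_t1(t0)
    t2 = sample_t2(t1)
    return t0, t1, t2

samples = [ancestral_sample() for _ in range(1000)]
# Finishing times respect the chain structure in every sample.
assert all(t0 < t1 < t2 for t0, t1, t2 in samples)
```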
March 5, 2018 40/58
3. Sampling from Graphical Models
3.2 Undirected models
Deep Learning Book: Chapter 16
• Ancestral sampling is applicable only to directed models
• We can sample from undirected models by converting them to directed models
– But this involves solving intractable problems
– Or introducing so many edges that the resulting directed model becomes intractable
• Gibbs sampling: the conceptually simplest approach for drawing samples from an undirected graph
– Suppose we have a graphical model over an n-dimensional vector of random variables x
– We iteratively visit each variable x_i and draw a sample conditioned on all the other variables, i.e., from p(x_i | x_−i)
– Asymptotically, after many repetitions, the process converges to sampling from the correct distribution
• It is difficult to determine when the samples have reached a sufficiently accurate approximation of the desired distribution
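A minimal Gibbs sampler for a hypothetical two-variable model whose single potential favours agreement (exact marginals are available here, so convergence can be checked):

```python
import math
import random

random.seed(1)

# Pairwise potential favouring agreement between two binary variables.
def phi(a, b):
    return math.exp(1.0 if a == b else 0.0)

def conditional(i, x):
    """p(x_i = 1 | x_-i) for the two-variable model."""
    other = x[1 - i]
    w1 = phi(1, other)
    w0 = phi(0, other)
    return w1 / (w0 + w1)

def gibbs(n_steps):
    x = [0, 0]
    samples = []
    for _ in range(n_steps):
        for i in (0, 1):                 # visit each variable in turn
            x[i] = 1 if random.random() < conditional(i, x) else 0
        samples.append(tuple(x))
    return samples

samples = gibbs(20000)
agree = sum(a == b for a, b in samples) / len(samples)
# Exact p(x0 = x1) is 2e / (2e + 2) = e / (e + 1) ≈ 0.731.
```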
March 5, 2018 41/58
4. Advantages of Structured Modeling
Deep Learning Book: Chapter 16
• Structured modeling dramatically reduces the cost of representing probability distributions, as well as of learning and inference
– By assuming each node has a tabular distribution given its parents, the memory, sampling, and inference costs are now exponential in the number of variables in the factor with the largest scope
• For many interesting models, this is very small
• e.g., RBMs: all factor scopes are of size 2 or 1
– Previously, these costs were exponential in the total number of nodes
– Statistically, it is much easier to estimate this manageable number of parameters
March 5, 2018 42/58
5. Learning about Dependencies
Deep Learning Book: Chapter 16
• A good generative model needs to accurately capture the distribution over the visible variables v
– The different elements of v are highly dependent on each other
– In deep learning, these dependencies are modeled by introducing latent variables h
– A good model of v that did not contain any latent variables would need to have:
• A very large number of parents per node in a Bayesian network
• A very large number of cliques in a Markov network
Highly costly in both the computational and the statistical sense!
March 5, 2018 43/58
5. Learning about Dependencies
Deep Learning Book: Chapter 16
• When the model is intended to capture dependencies between visible variables with direct connections, it is usually infeasible to connect all variables
– The graph must be designed to connect those variables that are tightly coupled and omit edges between other variables
• Structure learning algorithms perform a greedy search over candidate structures
• Using latent variables instead of adaptive structure avoids discrete searches and multiple rounds of training:
– Use one graph structure
– Many latent variables
– Dense connections of latent variables to observed variables
– The parameters learn that each latent variable interacts strongly with only a small subset of observed variables
March 5, 2018 44/58
6. Inference and approximate inference
Deep Learning Book: Chapter 16
Inference
• We ask questions about how variables are related to each other
– e.g., given a set of medical tests, we can ask what disease a patient might have
– In a latent variable model, we may want to extract features E[h | v] describing the observed variables v
– We solve such problems in order to perform other tasks
• We often want to compute p(h | v) or E[h | v] (and sometimes the marginal p(v))
• These are inference problems:
– Predict some variables given other variables
– Predict distributions over some variables given values of other variables
March 5, 2018 45/58
6. Inference and approximate inference
Deep Learning Book: Chapter 16
Intractability of Inference • For most interesting deep models, the inference problems are
intractable – Even when we use a structured graphical model to simplify them
• Graph structures allow us to represent complicated high-dimensional distributions with a reasonable number of parameters
– But the resulting graphs are usually not restrictive enough to allow efficient inference
• Computing the marginal probability is #P-hard
– Problems in NP require determining whether a problem has a solution and, if so, finding it
– Problems in #P require counting the number of solutions
• This motivates the use of approximate inference in deep learning
– It is usually formulated as variational inference:
• Approximate the true distribution p(h | v) by another distribution q(h | v) that is as close to the true one as possible
March 5, 2018 46/58
7. The Deep Learning Approach to Structured Probabilistic Models
Deep Learning Book: Chapter 16
• Deep learning does not necessarily involve especially deep graphical models
• The main differences in structured probabilistic models in deep learning – Depth – Proportion of observed to latent variables – Latent semantics (meaning of a latent variable) – Connectivity and inference algorithm – Intractability and approximation
March 5, 2018 47/58
7. The Deep Learning Approach to Structured Probabilistic Models
Deep Learning Book: Chapter 16
Depth of a graphical model
• A latent variable h_i is at depth j if the shortest path from h_i to an observed variable is j steps
• The depth of the model is the greatest depth of any such h_i
• This kind of depth is different from the depth induced by the computational graph
• Many generative models used for deep learning have no latent variables (or only one layer of them), but use deep computational graphs to define the conditional distributions within the model
March 5, 2018 48/58
7. The Deep Learning Approach to Structured Probabilistic Models
Deep Learning Book: Chapter 16
Proportion of observed/latent variables
• Deep learning models typically have more latent variables than observed variables
– They always make use of distributed representations
• Even shallow models have a single large layer of latent variables
• Complicated non-linear interactions between variables are accomplished via indirect connections that flow through multiple latent variables
• By contrast, traditional graphical models contain mostly variables that are observed (i.e., few latent variables)
March 5, 2018 49/58
7. The Deep Learning Approach to Structured Probabilistic Models
Deep Learning Book: Chapter 16
Latent variable semantics • Latent variables are designed differently in deep learning • In traditional graphical models, they are designed with specific
semantics in mind – Topic of a document, intelligence of a student, disease causing a
patient’s symptoms, etc.
• In deep learning, latent variables are not designed with any specific semantics ahead of time
– The training algorithm is free to invent the concepts it needs to model a particular dataset
– The latent variables are not easy to interpret after the fact
March 5, 2018 50/58
7. The Deep Learning Approach to Structured Probabilistic Models
Deep Learning Book: Chapter 16
Connectivity
• Deep graphical models have large groups of units connected to other large groups of units
– Interactions between two groups can be described by a single matrix
• Traditional graphical models have few connections and the choice of connections for each variable may be individually designed – The design of the model structure is tightly linked to the choice of
inference algorithm
March 5, 2018 51/58
7. The Deep Learning Approach to Structured Probabilistic Models
Deep Learning Book: Chapter 16
Inference
• Traditional graphical models are typically designed so that exact inference stays tractable
– When this is too limiting, a popular approximate approach is loopy belief propagation
– Both approaches work well with sparsely connected graphs
• Models used in deep learning are not sparsely connected, and instead use either Gibbs sampling or variational inference
• Rather than simplifying the model until exact inference is feasible, we make the model as complex as needed, as long as we can compute a gradient
March 5, 2018 52/58
7. The Deep Learning Approach to Structured Probabilistic Models
Deep Learning Book: Chapter 16
The restricted Boltzmann machine (RBM) • Quintessential example of how graphical models are used for
deep learning • RBM itself is not a deep model
– It has a single layer of latent units that may be used to learn a representation for the input
– RBMs can be used to build many deeper models (Chapter 20)
March 5, 2018
7. The Deep Learning Approach to Structured Probabilistic Models
Deep Learning Book: Chapter 16
• A general Boltzmann machine can have arbitrary connections
• Restricted Boltzmann machine (RBM): the first layer is called the visible or input layer, and the second is the hidden layer
– A bipartite undirected graph
• Used for dimensionality reduction, classification, or feature learning
There are no direct interactions between any two visible units, or between any two hidden units (hence "restricted")
53/58
March 5, 2018 54/58
7. The Deep Learning Approach to Structured Probabilistic Models
Deep Learning Book: Chapter 16
RBM Characteristics • Units are organized into large groups called layers • Connectivity between layers is described by a matrix • Connectivity is relatively dense • The model is designed to allow efficient Gibbs sampling • Learn latent variables whose semantics are not specified by the
designer
March 5, 2018
7. The Deep Learning Approach to Structured Probabilistic Models
Deep Learning Book: Chapter 16
Canonical RBM
• An energy-based model with binary visible and hidden units
– The model is divided into groups of units v and h, and the interaction between them is described by the matrix W:

E(v, h) = −b^T v − c^T h − v^T W h

(b, c, and W are unconstrained, real-valued, learnable parameters)

• The restrictions on the RBM structure yield the properties:

p(h | v) = ∏_i p(h_i | v)   and   p(v | h) = ∏_i p(v_i | h)
55/58
March 5, 2018 56/58
7. The Deep Learning Approach to Structured Probabilistic Models
Deep Learning Book: Chapter 16
Example: For the binary RBM, we obtain

P(h_i = 1 | v) = σ(v^T W_{:,i} + c_i)
P(h_i = 0 | v) = 1 − σ(v^T W_{:,i} + c_i)

– Together these properties allow for block Gibbs sampling, which alternates between sampling all of h simultaneously and all of v simultaneously
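Block Gibbs sampling for a tiny binary RBM can be sketched in plain Python; the parameters below are hypothetical, and the conditionals follow the factorized sigmoid form described on this slide (with hidden bias c and visible bias b):

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny binary RBM: 3 visible, 2 hidden units, hypothetical parameters.
W = [[0.5, -0.3], [0.1, 0.8], [-0.6, 0.2]]   # W[i][j] links v_i and h_j
b = [0.0, 0.1, -0.1]                          # visible biases
c = [0.2, -0.2]                               # hidden biases

def sample_h_given_v(v):
    # p(h_j = 1 | v) = sigma(c_j + sum_i v_i W_ij); the h_j are
    # conditionally independent given v, so sample them all at once.
    return [1 if random.random() < sigmoid(c[j] + sum(v[i] * W[i][j]
                                                      for i in range(3)))
            else 0 for j in range(2)]

def sample_v_given_h(h):
    # p(v_i = 1 | h) = sigma(b_i + sum_j W_ij h_j); likewise for the v_i.
    return [1 if random.random() < sigmoid(b[i] + sum(W[i][j] * h[j]
                                                      for j in range(2)))
            else 0 for i in range(3)]

def block_gibbs(n_steps, v=None):
    v = v or [0, 0, 0]
    for _ in range(n_steps):   # alternate: all of h at once, then all of v
        h = sample_h_given_v(v)
        v = sample_v_given_h(h)
    return v, h
```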
March 5, 2018 57/58
7. The Deep Learning Approach to Structured Probabilistic Models
Deep Learning Book: Chapter 16
Example: Samples from a trained RBM and its weights (model trained on MNIST data)
Corresponding weight vectors
• Each column is a separate Gibbs sampling process
• Each row represents the output after another 1,000 steps of Gibbs sampling (successive samples are highly correlated)
March 5, 2018 58/58
7. The Deep Learning Approach to Structured Probabilistic Models
Deep Learning Book: Chapter 16
How to put RBMs into practice? A TensorFlow implementation of the restricted Boltzmann machine (RBM) and of an autoencoder with layerwise pretraining:
https://github.com/Cospel/rbm-ae-tf
Thank you very much for
your attention!