Rapid Introduction to Machine Learning/Deep Learning
Hyeong In Choi
Seoul National University
Lecture 5b: Markov random field (MRF)
November 13, 2015
Table of contents
1. Objectives of Lecture 5b
2. Markov random field (MRF)
   2.1. Basics of MRF
   2.2. Boltzmann machine
   2.3. Restricted Boltzmann machine (RBM)
1. Objectives of Lecture 5b
Objective 1
Learn the minimal MRF formalism that is necessary for understanding deep neural network pretraining using the restricted Boltzmann machine
Objective 2
Learn how the probability structure is encoded in an MRF, especially the energy-based formalism of the Boltzmann machine
Objective 3
Learn some basic formalism of the restricted Boltzmann machine
2. Markov random field (MRF)
2.1. Basics of MRF
Terminology
$G$: undirected graph (not necessarily a tree) in which each node represents a random variable
Let $X_i$ be the random variable represented by node $i$, and let $x_i$ be the value of $X_i$ [we frequently confuse node $i$ with $x_i$]
$x = (x_1, \cdots, x_n)$: the list of the values of all random variables
The joint probability is denoted by
$$P(x) = P(x_1, \cdots, x_n)$$
For each node $x_i$, let $N(x_i)$ be the neighborhood of $x_i$, i.e. $N(x_i)$ is the set of nodes connected to $x_i$
Definition of MRF
We say $P(x)$ satisfies the Markov property if
$$P(X_i = x_i \mid X_j = x_j \text{ for } j \neq i) = P(X_i = x_i \mid X_j = x_j \text{ for } x_j \in N(x_i))$$
$G$ together with $P(x)$ satisfying the Markov property is called a Markov random field (MRF)
Proposition
Let $G$ be an MRF. Let $A, B, C$ be mutually disjoint sets of nodes of $G$. Assume $A$ and $B$ are separated by $C$, meaning that every path from a node in $A$ to a node in $B$ passes through some node in $C$. Then
$$P(A, B \mid C) = P(A \mid C)\,P(B \mid C),$$
i.e. $A$ and $B$ are conditionally independent given $C$. The converse is obviously true.
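To make the separation property concrete, here is a minimal numerical check, not from the lecture: a hypothetical 3-node chain $A - C - B$ (so $C$ separates $A$ from $B$) with made-up positive potentials, where brute-force enumeration confirms $P(A, B \mid C) = P(A \mid C)\,P(B \mid C)$.

```python
import itertools

# Hypothetical 3-node chain MRF: A - C - B (C separates A from B).
# Unnormalized P(a, b, c) = psi1(a, c) * psi2(c, b); all variables binary.
psi1 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 3.0}
psi2 = {(0, 0): 1.5, (0, 1): 0.7, (1, 0): 2.2, (1, 1): 1.0}

# Exact joint by enumerating all 2^3 configurations
joint = {}
for a, b, c in itertools.product([0, 1], repeat=3):
    joint[(a, b, c)] = psi1[(a, c)] * psi2[(c, b)]
Z = sum(joint.values())
joint = {k: v / Z for k, v in joint.items()}

def marginal(a=None, b=None, c=None):
    """Sum the joint over the unspecified (None) variables."""
    return sum(p for (ka, kb, kc), p in joint.items()
               if (a is None or ka == a)
               and (b is None or kb == b)
               and (c is None or kc == c))

# Check P(A, B | C) = P(A | C) * P(B | C) for every configuration
for a, b, c in itertools.product([0, 1], repeat=3):
    p_c = marginal(c=c)
    lhs = joint[(a, b, c)] / p_c                       # P(a, b | c)
    rhs = (marginal(a=a, c=c) / p_c) * (marginal(b=b, c=c) / p_c)
    assert abs(lhs - rhs) < 1e-12
print("A and B are conditionally independent given C")
```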
Example: [graph figure from the slide omitted in the transcript]
Gibbs distributions
Definition
A clique is a set of nodes every node of which is connected to every other node in the set; a maximal clique is a clique that cannot be enlarged by adding another node
A probability distribution $P(x)$ is called a Gibbs distribution if it is of the form
$$P(x) = \prod_{c \in C} \psi_c(x_c),$$
where $C$ is the set of maximal cliques and $\psi_c$ is a non-negative function of $x_c$, where $x_c$ is the list of variables in the clique $c$
Example
maximal cliques: $c_1 = \{x_1, x_2, x_3\}$, $c_2 = \{x_2, x_3, x_4\}$, $c_3 = \{x_3, x_5\}$
$$P(x) = \psi_1(x_1, x_2, x_3)\,\psi_2(x_2, x_3, x_4)\,\psi_3(x_3, x_5)$$
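As an illustration of the Gibbs form, the following sketch (not from the lecture) assigns made-up random potentials to the three maximal cliques above, enumerates all $2^5$ binary configurations, and normalizes the product of clique potentials explicitly (the slide's $\psi_c$ can be taken to absorb the normalization).

```python
import itertools
import numpy as np

# The three maximal cliques of the example graph (0-based indices of x1..x5)
cliques = [(0, 1, 2), (1, 2, 3), (2, 4)]

# Made-up non-negative potentials: one random positive table per clique,
# indexed by the clique's binary configuration.
rng = np.random.default_rng(0)
psis = [rng.uniform(0.1, 2.0, size=(2,) * len(c)) for c in cliques]

def unnormalized(x):
    """Product of clique potentials evaluated at configuration x."""
    p = 1.0
    for c, psi in zip(cliques, psis):
        p *= psi[tuple(x[i] for i in c)]
    return p

# Normalize by brute force over all 2^5 configurations
configs = list(itertools.product([0, 1], repeat=5))
Z = sum(unnormalized(x) for x in configs)
P = {x: unnormalized(x) / Z for x in configs}
print(f"sum of P over all configurations: {sum(P.values()):.6f}")  # 1.0
```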
Theorem (Clifford-Hammersley)
Assume $P(x) > 0$ for all $x$. If $P(x)$ is a Gibbs distribution, then $G$ is an MRF
2.2. Boltzmann machine
$G$: graph
$x_i \in \{1, -1\}$ or $x_i \in \{0, 1\}$
$E$: Energy
$$E(x) = -\sum_{i \sim j} \omega_{ij} x_i x_j - \sum_i b_i x_i$$
$\sum_{i \sim j}$ means the sum over adjacent nodes $i$ and $j$ for $i < j$
$P$: Probability
$$P(x) = \frac{1}{Z} \exp(-\lambda E(x)),$$
where $Z$ is the partition function given by
$$Z = \sum_x \exp(-\lambda E(x))$$
[We usually set $\lambda = 1$]
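A minimal sketch, not from the lecture, of a Boltzmann machine on a hypothetical triangle graph: the weights $\omega_{ij}$ and biases $b_i$ below are made-up values, and the exact distribution is obtained by enumerating all spin configurations with $\lambda = 1$.

```python
import itertools
import numpy as np

# Hypothetical small graph: a triangle on nodes 0, 1, 2 with
# symmetric weights w_ij on edges and biases b_i on nodes.
edges = [(0, 1), (0, 2), (1, 2)]          # adjacent pairs with i < j
w = {(0, 1): 0.8, (0, 2): -0.5, (1, 2): 0.3}
b = np.array([0.1, -0.2, 0.0])

def energy(x):
    """E(x) = -sum_{i~j} w_ij x_i x_j - sum_i b_i x_i."""
    pair = sum(w[e] * x[e[0]] * x[e[1]] for e in edges)
    return -pair - float(b @ x)

# Exact distribution: enumerate all 2^3 spin configurations (lambda = 1)
configs = [np.array(s) for s in itertools.product([-1, 1], repeat=3)]
Z = sum(np.exp(-energy(x)) for x in configs)
for x in configs:
    print(x, f"P = {np.exp(-energy(x)) / Z:.4f}")
```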
2.3. Restricted Boltzmann machine (RBM)
Notation
$x = (x_1, \cdots, x_d)$ (visible units)
$h = (h_1, \cdots, h_n)$ (hidden units)
$(x, h) = (x_1, \cdots, x_d, h_1, \cdots, h_n)$
Energy
$$E(x, h) = -\sum_{i,j} w_{ij} h_i x_j - \sum_j b_j x_j - \sum_i c_i h_i$$
Probability
$$P(x, h) = \frac{1}{Z} \exp(-E(x, h))$$
$$Z = \sum_{x, h} \exp(-E(x, h)) \quad \left[\,= \int \exp(-E(x, h)) \text{ in the continuous case}\,\right]$$
Note
The lower the energy, the higher the probability
If $w_{ij} > 0$, it is more likely that $x_j$ and $h_i$ have the same sign
If $w_{ij} < 0$, it is more likely that $x_j$ and $h_i$ have opposite signs
If $b_j > 0$, it is more likely that $x_j > 0$, and so on
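The energy function itself is a one-liner; the sketch below (illustration only) uses made-up dimensions and randomly drawn parameters $W$, $b$, $c$.

```python
import numpy as np

# Hypothetical dimensions and parameters for illustration:
# d visible units x, n hidden units h, weights W (n x d),
# visible biases b (d,), hidden biases c (n,).
rng = np.random.default_rng(1)
d, n = 4, 3
W = rng.normal(scale=0.5, size=(n, d))
b = rng.normal(scale=0.1, size=d)
c = rng.normal(scale=0.1, size=n)

def rbm_energy(x, h):
    """E(x, h) = -h^T W x - b^T x - c^T h."""
    return -(h @ W @ x) - (b @ x) - (c @ h)

x = np.array([1, 0, 1, 1])
h = np.array([0, 1, 1])
print(f"E(x, h) = {rbm_energy(x, h):.4f}")
```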
Probabilities of RBM
Write
$$E(x, h) = -h^T W x - b^T x - c^T h,$$
where $W = (w_{ij})$, $h = [h_1, \cdots, h_n]^T$, $x = [x_1, \cdots, x_d]^T$
$$P(x, h) = \frac{1}{Z} \exp(h^T W x + b^T x + c^T h)$$
$$P(x) = \sum_h P(x, h)$$
$$P(h) = \sum_x P(x, h)$$
$P(h \mid x)$
Given $x$, $h_i$ and $h_j$ are separated, i.e. conditionally independent. Thus
$$P(h \mid x) = P(h_1, \cdots, h_n \mid x) = \prod_i P(h_i \mid x)$$
Remark
This fact can be proved directly as follows. Let $W_i = W_{i\bullet}$ be the $i$th row of $W$. Then
$$h^T W x = \sum_i h_i W_i x.$$
Thus
$$P(h \mid x) = \frac{\exp(h^T W x + b^T x + c^T h)}{\sum_h \exp(h^T W x + b^T x + c^T h)}$$
$$= \frac{\prod_i \exp(h_i W_i x + c_i h_i)}{\sum_{h_1, \cdots, h_n} \prod_i \exp(h_i W_i x + c_i h_i)}$$
$$= \frac{\prod_i \exp(h_i W_i x + c_i h_i)}{\prod_i \sum_{h_i} \exp(h_i W_i x + c_i h_i)}$$
$$= \prod_i \frac{\frac{1}{Z} \exp(h_i W_i x + b^T x + c_i h_i)}{\frac{1}{Z} \sum_{h_i} \exp(h_i W_i x + b^T x + c_i h_i)}$$
$$= \prod_i \frac{P(x, h_i)}{\sum_{h_i} P(x, h_i)}$$
$$= \prod_i P(h_i \mid x)$$
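The factorization can also be checked numerically. The sketch below (not in the lecture, with made-up parameters) computes $P(h \mid x)$ for a tiny binary RBM by enumeration and compares it against the product of per-unit conditionals, each also obtained by enumeration.

```python
import itertools
import numpy as np

# Tiny binary RBM with made-up parameters: 3 visible, 2 hidden units
rng = np.random.default_rng(2)
d, n = 3, 2
W = rng.normal(size=(n, d))
b = rng.normal(size=d)
c = rng.normal(size=n)

def unnorm(x, h):
    """exp(-E(x, h)) = exp(h^T W x + b^T x + c^T h)."""
    return np.exp(h @ W @ x + b @ x + c @ h)

x = np.array([1, 0, 1])
hs = [np.array(t) for t in itertools.product([0, 1], repeat=n)]
denom = sum(unnorm(x, h) for h in hs)          # proportional to P(x)

def p_hi_given_x(i, v):
    """P(h_i = v | x): sum P(h | x) over the other hidden units."""
    return sum(unnorm(x, h) for h in hs if h[i] == v) / denom

for h in hs:
    lhs = unnorm(x, h) / denom                 # P(h | x)
    rhs = np.prod([p_hi_given_x(i, h[i]) for i in range(n)])
    assert np.isclose(lhs, rhs)
print("P(h|x) factorizes over the hidden units")
```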
Special Case: binary neurons
Assume $x_j \in \{0, 1\}$, $h_i \in \{0, 1\}$. Then
$$P(h_i \mid x) = \frac{\exp(h_i W_i x + c_i h_i)}{\sum_{h_i} \exp(h_i W_i x + c_i h_i)}$$
Thus
$$P(h_i = 1 \mid x) = \frac{\exp(W_i x + c_i)}{1 + \exp(W_i x + c_i)} = \mathrm{sigm}(W_i x + c_i)$$
By symmetry,
$$P(x_j = 1 \mid h) = \mathrm{sigm}(W_{\bullet j}^T h + b_j),$$
where $W_{\bullet j}$ is the $j$th column of $W$. Now
$$P(x) = \sum_h P(x, h)$$
$$= \frac{e^{b^T x}}{Z} \prod_i \sum_{h_i} \exp(h_i W_i x + c_i h_i)$$
$$= \frac{e^{b^T x}}{Z} \prod_i \left[1 + \exp(W_i x + c_i)\right]$$
$$= \frac{e^{b^T x}}{Z} \exp \sum_i \log(1 + \exp(W_i x + c_i))$$
$$= \frac{1}{Z} \exp\left[b^T x + \sum_i \log(1 + \exp(W_i x + c_i))\right]$$
$$= \frac{1}{Z} \exp\left[b^T x + \sum_i \mathrm{softplus}(W_i x + c_i)\right],$$
where $\mathrm{softplus}(t) = \log(1 + e^t)$
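A sketch of these formulas in code, again with made-up parameters: the two sigmoid conditionals and the unnormalized $\log P(x)$ in softplus form, verified against a brute-force sum over all hidden configurations.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
d, n = 3, 2                        # made-up sizes for illustration
W = rng.normal(size=(n, d))
b = rng.normal(size=d)
c = rng.normal(size=n)

def sigm(t):
    return 1.0 / (1.0 + np.exp(-t))

def softplus(t):
    return np.log1p(np.exp(t))

def p_h_given_x(x):
    """Vector of P(h_i = 1 | x) = sigm(W_i x + c_i)."""
    return sigm(W @ x + c)

def p_x_given_h(h):
    """Vector of P(x_j = 1 | h) = sigm(W_{.j}^T h + b_j)."""
    return sigm(W.T @ h + b)

def log_p_x_unnorm(x):
    """log(Z * P(x)) = b^T x + sum_i softplus(W_i x + c_i)."""
    return b @ x + softplus(W @ x + c).sum()

# Sanity check: the softplus form equals the brute-force sum over h
x = np.array([1, 0, 1])
brute = sum(np.exp(h @ W @ x + b @ x + c @ h)
            for h in (np.array(t) for t in itertools.product([0, 1], repeat=n)))
assert np.isclose(np.exp(log_p_x_unnorm(x)), brute)
print("softplus form of P(x) matches direct enumeration")
```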
Example: Ising model
$x_i \in \{1, -1\}$
Configuration $x$:
$$x = \{x_1, \cdots, x_i, \cdots, x_n\}$$
There are $2^n$ configurations
Hamiltonian (Energy)
$$H = -\sum_{i \sim j} h_{ij} x_i x_j - \sum_i b_i x_i$$
$\sum_{i \sim j}$ means the sum over adjacent nodes $i$ and $j$ for $i < j$
Probability of configuration $x$
$$P(x) \sim \exp(-\lambda H)$$
$$\lambda = \frac{1}{k_B T}$$
$k_B$: Boltzmann constant (usually set to be 1)
$T$: temperature
If most $x_i$ are aligned in the same direction, the energy (Hamiltonian) tends to be smaller, thus the probability is bigger
The Ising model is an idealized "magnet" model
Partition function
$$Z = \sum_x \exp(-\lambda H(x))$$
Thus
$$P(x) = \frac{1}{Z} \exp(-\lambda H)$$
Due to the large number of configurations ($2^n$), it is impractical to compute $Z$
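To see the blow-up concretely, here is a sketch (illustration only, with a hypothetical 2x2 grid and made-up couplings) that computes $Z$ by brute force; the same loop over $2^n$ configurations is exactly what becomes infeasible for realistic $n$.

```python
import itertools
import numpy as np

# Hypothetical 2x2 grid Ising model: 4 spins, bonds between grid neighbors
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]   # horizontal and vertical bonds
h_ij = {e: 1.0 for e in edges}             # made-up uniform coupling
b = np.zeros(4)                            # no external field
lam = 1.0                                  # lambda = 1 / (k_B T), k_B = T = 1

def hamiltonian(x):
    """H = -sum_{i~j} h_ij x_i x_j - sum_i b_i x_i."""
    return -sum(h_ij[e] * x[e[0]] * x[e[1]] for e in edges) - float(b @ x)

# Brute-force partition function: feasible only because 2^4 = 16 here;
# for an n-spin system this sum has 2^n terms.
configs = [np.array(s) for s in itertools.product([-1, 1], repeat=4)]
Z = sum(np.exp(-lam * hamiltonian(x)) for x in configs)
aligned = np.array([1, 1, 1, 1])
print(f"Z = {Z:.3f}, P(all spins aligned) = "
      f"{np.exp(-lam * hamiltonian(aligned)) / Z:.4f}")
```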