
Rapid Introduction to Machine Learning/Deep Learning

Hyeong In Choi

Seoul National University


Lecture 5b: Markov random field (MRF)

November 13, 2015


Table of contents

1. Objectives of Lecture 5b

2. Markov random field (MRF)
   2.1. Basics of MRF
   2.2. Boltzmann machine
   2.3. Restricted Boltzmann machine (RBM)


1. Objectives of Lecture 5b

Objective 1

Learn the minimal MRF formalism necessary for understanding the pretraining of deep neural networks using restricted Boltzmann machines

Objective 2

Learn how the probability structure is encoded in the MRF, especially the energy-based formalism of the Boltzmann machine

Objective 3

Learn the basic formalism of the restricted Boltzmann machine


2. Markov random field (MRF)

2.1. Basics of MRF

Terminology

G: undirected graph (not necessarily a tree) in which each node represents a random variable

Let Xi be the random variable represented by node i, and let xi be the value of Xi [we frequently identify node i with xi]

x = (x1, …, xn): the list of the values of all random variables

The joint probability is denoted by

P(x) = P(x1,⋯, xn)

For each node xi, let N(xi) be the neighborhood of xi, i.e. N(xi) is the set of nodes connected to xi


Definition of MRF

We say P(x) satisfies the Markov property if

$$P(X_i = x_i \mid X_j = x_j \text{ for } j \neq i) = P(X_i = x_i \mid X_j = x_j \text{ for } x_j \in N(x_i))$$

G with P(x) satisfying the Markov property is called a Markov random field (MRF)

Proposition

Let G be an MRF. Let A, B, C be mutually disjoint sets of nodes of G. Assume A and B are separated by C, meaning that every path from a node in A to a node in B passes through some node in C. Then

$$P(A, B \mid C) = P(A \mid C)\,P(B \mid C)$$


That is, A and B are conditionally independent given C. The converse is obviously true.

Example (the illustrating graph figure is not included in this transcript)


Gibbs distributions

Definition

A clique is a set of nodes every node of which is connected to every other node in the set

A probability distribution P(x) is called a Gibbs distribution if it is of the form

$$P(x) = \prod_{c \in C} \psi_c(x_c),$$

where C is the set of maximal cliques and ψc is a non-negative function of xc, the list of the variables in the clique c


Example

Maximal cliques: c1 = {x1, x2, x3}, c2 = {x2, x3, x4}, c3 = {x3, x5}

P(x) = ψ1(x1, x2, x3)ψ2(x2, x3, x4)ψ3(x3, x5)
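
To make this concrete, here is a minimal Python sketch with made-up potentials ψ1, ψ2, ψ3 on binary variables; it assembles the Gibbs distribution above and normalizes it by brute force:

```python
import itertools

# Hypothetical clique potentials for the example above; any non-negative
# functions would do. Variables are binary: x_i in {0, 1}.
def psi1(x1, x2, x3): return 1.0 + 2.0 * (x1 == x2 == x3)
def psi2(x2, x3, x4): return 1.0 + (x2 != x4)
def psi3(x3, x5):     return 2.0 if x3 == x5 else 0.5

def unnormalized_p(x):
    x1, x2, x3, x4, x5 = x
    return psi1(x1, x2, x3) * psi2(x2, x3, x4) * psi3(x3, x5)

# Brute-force normalization over all 2^5 configurations.
configs = list(itertools.product([0, 1], repeat=5))
Z = sum(unnormalized_p(x) for x in configs)
P = {x: unnormalized_p(x) / Z for x in configs}
print(sum(P.values()))  # 1.0 up to floating point
```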

Theorem (Hammersley-Clifford)

Assume P(x) > 0 for all x. If P(x) is a Gibbs distribution, then G is an MRF


2.2. Boltzmann machine

G: graph

xi ∈ {1, −1} or xi ∈ {0, 1}


E: Energy

$$E(x) = -\sum_{i \sim j} \omega_{ij} x_i x_j - \sum_i b_i x_i$$

$\sum_{i \sim j}$ means the sum over adjacent nodes i and j for i < j

P: Probability

$$P(x) = \frac{1}{Z} \exp(-\lambda E(x)),$$

where Z is the partition function given by

$$Z = \sum_x \exp(-\lambda E(x))$$

[We usually set λ = 1]
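
As an illustration, the following Python sketch, with a made-up 4-node graph, weights ωij, and biases bi, evaluates E(x) and computes P(x) by brute-force enumeration (feasible only for very small graphs):

```python
import itertools, math

# Hypothetical 4-node graph: edges with weights w_ij, and biases b_i.
edges = {(0, 1): 0.5, (1, 2): -0.3, (2, 3): 0.8, (0, 3): 0.2}
b = [0.1, -0.2, 0.0, 0.3]

def energy(x):
    # E(x) = -sum_{i~j} w_ij x_i x_j - sum_i b_i x_i
    return (-sum(w * x[i] * x[j] for (i, j), w in edges.items())
            - sum(bi * xi for bi, xi in zip(b, x)))

# Brute-force partition function over all configurations x_i in {-1, +1},
# with lambda = 1.
configs = list(itertools.product([-1, 1], repeat=4))
Z = sum(math.exp(-energy(x)) for x in configs)

def prob(x):
    return math.exp(-energy(x)) / Z

print(sum(prob(x) for x in configs))  # 1.0 up to floating point
```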


2.3. Restricted Boltzmann machine (RBM)

Notation

x = (x1, …, xd): the visible units
h = (h1, …, hn): the hidden units
(x, h) = (x1, …, xd, h1, …, hn)


Energy

$$E(x, h) = -\sum_{i,j} w_{ij} x_j h_i - \sum_j b_j x_j - \sum_i c_i h_i$$

Probability

$$P(x, h) = \frac{1}{Z} \exp(-E(x, h))$$

$$Z = \sum_{x, h} \exp(-E(x, h)) \quad \left[\,= \int \exp(-E(x, h)) \text{ in the continuous case}\,\right]$$

Note

The lower the energy, the higher the probability.
If wij > 0, it is more likely that xj and hi have the same sign.
If wij < 0, it is more likely that xj and hi have opposite signs.
If bj > 0, it is more likely that xj > 0, and so on.
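
A minimal NumPy sketch of this energy function, with illustrative (randomly drawn) parameters W, b, c:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 3                      # visible / hidden dimensions (illustrative)
W = rng.normal(size=(n, d))      # W[i, j] couples hidden h_i with visible x_j
b = rng.normal(size=d)           # visible biases
c = rng.normal(size=n)           # hidden biases

def rbm_energy(x, h):
    # E(x, h) = -h^T W x - b^T x - c^T h
    return -(h @ W @ x + b @ x + c @ h)

x = rng.integers(0, 2, size=d)   # a binary visible configuration
h = rng.integers(0, 2, size=n)   # a binary hidden configuration
print(rbm_energy(x, h))
```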


Probabilities of RBM

Write

$$E(x, h) = -h^T W x - b^T x - c^T h,$$

where W = (wij), h = [h1, …, hn]^T, x = [x1, …, xd]^T

$$P(x, h) = \frac{1}{Z} \exp(-E(x, h)) = \frac{1}{Z} \exp(h^T W x + b^T x + c^T h)$$

$$P(x) = \sum_h P(x, h)$$

$$P(h) = \sum_x P(x, h)$$


P(h∣x)

Given x, hi and hj are separated, i.e. conditionally independent. Thus

$$P(h \mid x) = P(h_1, \ldots, h_n \mid x) = \prod_i P(h_i \mid x)$$
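
This factorization can be checked numerically by brute force on a tiny RBM; the following sketch, with made-up parameters, compares the exact conditional P(h|x) against the product of the per-unit conditionals:

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 2
W = rng.normal(size=(n, d)); b = rng.normal(size=d); c = rng.normal(size=n)

def unnorm(x, h):
    # exp(-E(x, h)) = exp(h^T W x + b^T x + c^T h)
    return np.exp(h @ W @ x + b @ x + c @ h)

hs = [np.array(h) for h in itertools.product([0, 1], repeat=n)]
x = np.array([1, 0, 1])

# Brute-force conditional P(h|x) versus the product of per-unit conditionals.
Zx = sum(unnorm(x, h) for h in hs)
for h in hs:
    joint_cond = unnorm(x, h) / Zx
    per_unit = 1.0
    for i in range(n):
        # P(h_i | x): sum the unnormalized weights over all h agreeing at i
        per_unit *= sum(unnorm(x, hp) for hp in hs if hp[i] == h[i]) / Zx
    assert np.isclose(joint_cond, per_unit)
```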


Remark

This fact can be proved directly as follows. Let Wi = Wi● be the ith row of W. Then

$$h^T W x = \sum_i h_i W_i x.$$

Thus

$$P(h \mid x) = \frac{\exp(h^T W x + b^T x + c^T h)}{\sum_h \exp(h^T W x + b^T x + c^T h)} = \frac{\prod_i \exp(h_i W_i x + c_i h_i)}{\sum_{h_1, \ldots, h_n} \prod_i \exp(h_i W_i x + c_i h_i)}$$


$$= \frac{\prod_i \exp(h_i W_i x + c_i h_i)}{\prod_i \sum_{h_i} \exp(h_i W_i x + c_i h_i)} = \prod_i \frac{\frac{1}{Z} \exp(h_i W_i x + b^T x + c_i h_i)}{\frac{1}{Z} \sum_{h_i} \exp(h_i W_i x + b^T x + c_i h_i)}$$

$$= \prod_i \frac{P(x, h_i)}{\sum_{h_i} P(x, h_i)} = \prod_i P(h_i \mid x)$$


Special Case: binary neurons

Assume xj ∈ {0, 1}, hi ∈ {0, 1}. Then

$$P(h_i \mid x) = \frac{\exp(h_i W_i x + c_i h_i)}{\sum_{h_i \in \{0,1\}} \exp(h_i W_i x + c_i h_i)}$$


Thus

$$P(h_i = 1 \mid x) = \frac{\exp(W_i x + c_i)}{1 + \exp(W_i x + c_i)} = \mathrm{sigm}(W_i x + c_i)$$

By symmetry,

$$P(x_j = 1 \mid h) = \mathrm{sigm}(W_{\bullet j}^T h + b_j),$$

where W●j is the jth column of W.
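
These two sigmoid conditionals are exactly what is alternated in block Gibbs sampling for a binary RBM. A minimal NumPy sketch, with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 6, 4
W = rng.normal(scale=0.1, size=(n, d))
b = np.zeros(d); c = np.zeros(n)

def sigm(t):
    return 1.0 / (1.0 + np.exp(-t))

def sample_h_given_x(x):
    # P(h_i = 1 | x) = sigm(W_i x + c_i), sampled independently per unit
    p = sigm(W @ x + c)
    return (rng.random(n) < p).astype(int)

def sample_x_given_h(h):
    # P(x_j = 1 | h) = sigm(W_{.j}^T h + b_j)
    p = sigm(W.T @ h + b)
    return (rng.random(d) < p).astype(int)

# Alternating (block) Gibbs sampling starting from a random visible vector.
x = rng.integers(0, 2, size=d)
for _ in range(100):
    h = sample_h_given_x(x)
    x = sample_x_given_h(h)
print(x, h)
```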

Now

$$P(x) = \sum_h P(x, h) = \frac{e^{b^T x}}{Z} \prod_i \sum_{h_i} \exp(h_i W_i x + c_i h_i) = \frac{e^{b^T x}}{Z} \prod_i \left[1 + \exp(W_i x + c_i)\right]$$


$$= \frac{e^{b^T x}}{Z} \exp \sum_i \log\left(1 + \exp(W_i x + c_i)\right) = \frac{1}{Z} \exp\left[b^T x + \sum_i \log\left(1 + \exp(W_i x + c_i)\right)\right]$$

$$= \frac{1}{Z} \exp\left[b^T x + \sum_i \mathrm{softplus}(W_i x + c_i)\right],$$

where softplus(t) = log(1 + e^t)
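
Thus the unnormalized log-probability of a visible vector can be evaluated without summing over the 2^n hidden configurations; a one-function sketch (np.logaddexp(0, t) is a numerically stable softplus):

```python
import numpy as np

def log_unnormalized_p(x, W, b, c):
    # log P(x) + log Z = b^T x + sum_i softplus(W_i x + c_i)
    return b @ x + np.sum(np.logaddexp(0.0, W @ x + c))
```

Only the partition function Z remains unknown, which is why this quantity (the negative of what is often called the free energy) is enough for comparing the relative probabilities of visible vectors.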


Example: Ising model


xi ∈ {1, −1}

Configuration x:

$$x = (x_1, \ldots, x_i, \ldots, x_n)$$

There are 2^n configurations

Hamiltonian (Energy)

$$H = -\sum_{i \sim j} h_{ij} x_i x_j - \sum_i b_i x_i$$

$\sum_{i \sim j}$ means the sum over adjacent nodes i and j for i < j

Probability of configuration x

P(x) ∼ exp(−λH)


$$\lambda = \frac{1}{k_B T}$$

kB: Boltzmann constant (usually set to 1)
T: temperature

If most xi are aligned in the same direction, the energy (Hamiltonian) tends to be smaller, and thus the probability is bigger

The Ising model is an idealized model of a "magnet"

Partition function:

$$Z = \sum_x \exp(-\lambda H(x))$$

Thus

$$P(x) = \frac{1}{Z} \exp(-\lambda H)$$


Due to the large number of configurations (2^n), it is impractical to compute Z directly.
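
For intuition only, Z can be computed by brute force on a very small system; the following sketch enumerates all 2^n configurations of a made-up 1-D Ising chain, which is exactly the computation that becomes infeasible as n grows:

```python
import itertools, math

# Tiny 1-D Ising chain with illustrative uniform couplings and biases.
n = 10
h_ij = 1.0            # coupling on each edge (i, i+1)
b_i = 0.0             # external field
lam = 1.0             # lambda = 1 / (k_B T) with k_B = T = 1

def hamiltonian(x):
    # H = -sum_{i~j} h_ij x_i x_j - sum_i b_i x_i
    return (-sum(h_ij * x[i] * x[i + 1] for i in range(n - 1))
            - sum(b_i * xi for xi in x))

# Brute force over all 2^n configurations: exponential in n.
Z = sum(math.exp(-lam * hamiltonian(x))
        for x in itertools.product([-1, 1], repeat=n))
print(Z)
```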