Introduction to Machine Learning Undirected Graphical Models Barnabás Póczos


Page 1: Introduction to Machine Learning (10701/slides/21_Undirected_Graphical_Models.pdf)

Introduction to Machine Learning

Undirected Graphical Models

Barnabás Póczos

Page 2:

Credits

Many of these slides are taken from

Ruslan Salakhutdinov, Hugo Larochelle, & Eric Xing

• http://www.dmi.usherb.ca/~larocheh/neural_networks

• http://www.cs.cmu.edu/~rsalakhu/10707/

• http://www.cs.cmu.edu/~epxing/Class/10708/

Reading material:

• http://www.cs.cmu.edu/~rsalakhu/papers/Russ_thesis.pdf

• Section 30.1 of Information Theory, Inference, and Learning Algorithms by David MacKay

• http://www.stat.cmu.edu/~larry/=sml/GraphicalModels.pdf


Page 3:

Probabilistic graphical models are a powerful framework for representing dependency structure between random variables.

A Markov network (or undirected graphical model) is a set of random variables with a dependency structure described by an undirected graph.

Undirected Graphical Models

= Markov Random Fields

Semantic labeling


Page 4:


Cliques

Clique: a subset of nodes such that there exists a link between every pair of nodes in the subset.

Maximal clique: a clique such that it is not possible to include any other node in the set without it ceasing to be a clique.

This graph has two maximal cliques:

Other cliques:
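As a concrete illustration (the 5-node graph below is made up, not the one drawn on the slide), cliques and maximal cliques can be enumerated by brute force:

```python
from itertools import combinations

# Hypothetical 5-node undirected graph, given as a set of edges.
edges = {(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)}
nodes = range(5)

def connected(a, b):
    return (a, b) in edges or (b, a) in edges

def is_clique(subset):
    # Every pair in the subset must be linked.
    return all(connected(a, b) for a, b in combinations(subset, 2))

cliques = [set(s) for r in range(1, 6)
           for s in combinations(nodes, r) if is_clique(s)]

# A clique is maximal if no strict superset of it is also a clique.
maximal = [c for c in cliques if not any(c < d for d in cliques)]
print(maximal)  # → [{3, 4}, {0, 1, 2}, {1, 2, 3}]
```

For this graph the maximal cliques are {0, 1, 2}, {1, 2, 3}, and {3, 4}; every other clique (single nodes, single edges) sits inside one of them.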

Page 5:

Undirected Graphical Models

= Markov Random Fields

Directed graphs are useful for expressing causal relationships between random variables, whereas undirected graphs are useful for expressing dependencies between random variables.

The joint distribution defined by the graph is given by the product of non-negative potential functions over the maximal cliques (fully connected subsets of nodes).

In this example, the joint distribution factorizes as:


Page 6:

Markov Random Fields (MRFs)

Each potential function is a mapping from the joint configurations of the random variables in a maximal clique to the non-negative real numbers.

The choice of potential functions is not restricted to those with a specific probabilistic interpretation.

where E(x) is called an energy function.
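A brute-force sketch of this product-of-potentials factorization on a toy pairwise model (the weights are illustrative, not from the slides); each edge potential is exp(w_ij x_i x_j), so the joint is exp(-E(x)) / Z:

```python
from itertools import product
import math

# Toy pairwise MRF on three binary variables (illustrative weights):
#   E(x) = -sum_{i<j} w_ij x_i x_j,   p(x) = exp(-E(x)) / Z
w = {(0, 1): 1.0, (1, 2): -0.5}

def energy(x):
    return -sum(wij * x[i] * x[j] for (i, j), wij in w.items())

states = list(product([0, 1], repeat=3))
Z = sum(math.exp(-energy(x)) for x in states)   # partition function, by enumeration
p = {x: math.exp(-energy(x)) / Z for x in states}

assert abs(sum(p.values()) - 1.0) < 1e-12       # normalizes to a valid distribution
```

For three binary variables Z has only 8 terms; the sum over states is exactly why Z becomes intractable for large models.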

Page 7:

Conditional Independence


Theorem:

It follows that the undirected graphical structure represents conditional independence:

Page 8:

MRFs with Hidden Variables


For many interesting problems, we need to introduce hidden or latent variables. Our random variables will then contain both visible and hidden variables: x = (v, h).

Computing the partition function Z is intractable.

Computing the summation over the hidden variables is intractable.

Parameter learning is very challenging.

Page 9:

Boltzmann Machines

Definition [Boltzmann machine]: MRFs with maximal clique size two (pairwise / edge potentials) on binary-valued nodes are called Boltzmann machines.

The joint probabilities are given by:

The parameter θ_ij measures the dependence of x_i on x_j, conditioned on the other nodes.

Page 10:

Boltzmann Machines

Theorem: In a Boltzmann machine, the conditional distribution of one node conditioned on the others is given by the logistic function:


Proof:

Page 11:

Boltzmann Machines


Proof [Continued]:

Q.E.D.
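The theorem can be checked numerically on a tiny Boltzmann machine (the weights and biases below are made up): the logistic conditional must agree with a direct computation from the joint.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy Boltzmann machine on 3 binary nodes with illustrative parameters:
#   p(x) ∝ exp( sum_{i<j} theta_ij x_i x_j + sum_i b_i x_i )
theta = {(0, 1): 0.8, (0, 2): -0.3, (1, 2): 0.5}   # pairwise weights
b = [0.1, -0.2, 0.4]                                # node biases

def unnorm(x):
    s = sum(t * x[i] * x[j] for (i, j), t in theta.items())
    s += sum(bi * xi for bi, xi in zip(b, x))
    return math.exp(s)

def cond(i, x):
    """The theorem's logistic conditional: p(x_i = 1 | x_{-i})."""
    z = b[i]
    for (a, c), t in theta.items():     # sum the edges incident on node i
        if a == i:
            z += t * x[c]
        elif c == i:
            z += t * x[a]
    return sigmoid(z)

# Check against direct computation from the (unnormalized) joint:
# the normalizer Z cancels in the conditional.
x = [0, 1, 1]
brute = unnorm((1, x[1], x[2])) / (unnorm((0, x[1], x[2])) + unnorm((1, x[1], x[2])))
assert abs(cond(0, x) - brute) < 1e-12
```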

Page 12:

Example: Image Denoising


Let us look at the example of noise removal from a binary image.

The image is an array of {-1, +1} pixel values.

We take the original noise-free image (x) and randomly flip the sign of each pixel with a small probability. This process creates the noisy image (y).

Our goal is to estimate the original image x from the noisy observations y.

We model the joint distribution with

Page 13:

Inference: Iterated Conditional Models


Goal:

Solution: coordinate-wise descent on the energy (Iterated Conditional Modes)
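A minimal ICM sketch for this denoising model (an Ising-style energy; the coefficients beta and eta, the 8x8 all-ones test image, and the 10% noise level are all illustrative): sweep over the pixels and flip a pixel whenever the flip lowers the energy.

```python
import random

# Illustrative Ising-style denoising energy:
#   E(x, y) = -beta * sum_{neighbors} x_i x_j - eta * sum_i x_i y_i
beta, eta = 1.0, 2.0
H, W = 8, 8
random.seed(0)

x_true = [[1] * W for _ in range(H)]                 # clean all-ones image
y = [[v if random.random() > 0.1 else -v             # flip ~10% of the pixels
      for v in row] for row in x_true]

def local_energy(x, i, j):
    """Only the energy terms that involve pixel (i, j)."""
    s = -eta * x[i][j] * y[i][j]
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        a, c = i + di, j + dj
        if 0 <= a < H and 0 <= c < W:
            s -= beta * x[i][j] * x[a][c]
    return s

x = [row[:] for row in y]        # initialize the estimate with the noisy image
changed = True
while changed:                   # coordinate-wise descent on E
    changed = False
    for i in range(H):
        for j in range(W):
            e_keep = local_energy(x, i, j)
            x[i][j] *= -1        # tentatively flip pixel (i, j)
            if local_energy(x, i, j) >= e_keep:
                x[i][j] *= -1    # flip did not lower the energy: undo it
            else:
                changed = True

correct = sum(u == t for xr, tr in zip(x, x_true) for u, t in zip(xr, tr))
print(correct, "of", H * W, "pixels match the clean image")
```

Each accepted flip strictly lowers the total energy, so the sweep terminates in a local minimum; ICM finds a local, not global, optimum.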

Page 14:

Gaussian MRFs

• The information (precision) matrix is sparse, but the covariance matrix is not.

Page 15:

Restricted Boltzmann Machines


Page 16:

Restricted Boltzmann Machines

Restricted = no connections within the hidden layer + no connections within the visible layer



Partition function (intractable)

Page 17:

[Quadratic in v, linear in h]


Gaussian-Bernoulli RBM

Page 18:

Possible Tasks with RBM

Tasks:

Inference:

Evaluate the likelihood function:

Sampling from RBM:

Training RBM:


Page 19:

Inference


Page 20:

Inference


Theorem: Inference in an RBM is simple: the conditional distributions are logistic functions.

Similarly,


Page 21:


Proof:

Page 22:


Proof [Continued]:

Q.E.D.
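The theorem can again be checked numerically on a tiny RBM (all parameters below are made up): the logistic conditional p(h_j = 1 | v) must match brute-force marginalization of the joint over h.

```python
from itertools import product
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny RBM with illustrative parameters: 3 visible, 2 hidden binary units.
#   p(v, h) ∝ exp( b·v + c·h + v^T W h )
W = [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]]   # W[i][j] couples v_i and h_j
b = [0.1, -0.3, 0.2]                          # visible biases
c = [0.4, -0.1]                               # hidden biases

def unnorm(v, h):
    s = sum(b[i] * v[i] for i in range(3))
    s += sum(c[j] * h[j] for j in range(2))
    s += sum(W[i][j] * v[i] * h[j] for i in range(3) for j in range(2))
    return math.exp(s)

def p_h_given_v(j, v):
    """Logistic conditional from the theorem: p(h_j = 1 | v)."""
    return sigmoid(c[j] + sum(W[i][j] * v[i] for i in range(3)))

# Sanity check against brute-force marginalization of the joint.
v = (1, 0, 1)
num = sum(unnorm(v, h) for h in product([0, 1], repeat=2) if h[0] == 1)
den = sum(unnorm(v, h) for h in product([0, 1], repeat=2))
assert abs(p_h_given_v(0, v) - num / den) < 1e-12
```

The factorization works because, with no hidden-hidden connections, the h_j are conditionally independent given v (and symmetrically for p(v | h)).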

Page 23:

Evaluating the Likelihood


Page 24:

Calculating the Likelihood of an RBM


Theorem: Calculating the likelihood of an RBM is simple (apart from the partition function):

Free energy

Page 25:

Proof:

Q.E.D.
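The free-energy identity exp(-F(v)) = Σ_h exp(-E(v, h)) can be verified by brute force on a toy RBM (the parameters are made up): the hidden sum factorizes, giving F(v) = -b·v - Σ_j log(1 + exp(c_j + Σ_i W_ij v_i)).

```python
from itertools import product
import math

# Same toy RBM shape as before (illustrative parameters).
W = [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]]
b = [0.1, -0.3, 0.2]
c = [0.4, -0.1]

def unnorm(v, h):
    s = sum(b[i] * v[i] for i in range(3))
    s += sum(c[j] * h[j] for j in range(2))
    s += sum(W[i][j] * v[i] * h[j] for i in range(3) for j in range(2))
    return math.exp(s)

def free_energy(v):
    """F(v) = -b·v - sum_j log(1 + exp(c_j + sum_i W_ij v_i))."""
    f = -sum(b[i] * v[i] for i in range(3))
    for j in range(2):
        f -= math.log1p(math.exp(c[j] + sum(W[i][j] * v[i] for i in range(3))))
    return f

# exp(-F(v)) must equal the unnormalized joint summed over all h.
for v in product([0, 1], repeat=3):
    summed = sum(unnorm(v, h) for h in product([0, 1], repeat=2))
    assert abs(math.exp(-free_energy(v)) - summed) < 1e-9
```

So the likelihood p(v) = exp(-F(v)) / Z is cheap given v; only Z remains intractable.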

Page 26:

Sampling


Page 27:

Sampling from p(x,h) in RBM


Sampling is tricky; it is much easier in directed graphical models.

Here we will use Gibbs sampling.

Similarly,


Goal: Generate samples from

Page 28:


Gibbs Sampling: The Problem

Our goal is to generate samples from

Suppose that we can generate samples from

Page 29:


Gibbs Sampling: Pseudo Code
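For the RBM case the pseudo code specializes to block Gibbs sampling, alternating h ~ p(h | v) and v ~ p(v | h); both conditionals factorize into logistics. A sketch with made-up parameters and an illustrative burn-in length:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy RBM (illustrative parameters): 3 visible, 2 hidden binary units.
W = [[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]]
b = [0.1, -0.3, 0.2]
c = [0.4, -0.1]

def sample_h(v, rng):
    # h_j ~ Bernoulli( sigmoid(c_j + sum_i W_ij v_i) ), independently per j
    return [int(rng.random() < sigmoid(c[j] + sum(W[i][j] * v[i] for i in range(3))))
            for j in range(2)]

def sample_v(h, rng):
    # v_i ~ Bernoulli( sigmoid(b_i + sum_j W_ij h_j) ), independently per i
    return [int(rng.random() < sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(2))))
            for i in range(3)]

rng = random.Random(0)
v = [rng.randint(0, 1) for _ in range(3)]   # arbitrary initial state
for _ in range(1000):                        # after burn-in, (v, h) ~ p(v, h)
    h = sample_h(v, rng)
    v = sample_v(h, rng)
```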

Page 30:


Gibbs Sampling

Page 31:


Training

Page 32:


RBM Training

Training is complicated…

To train an RBM, we would like to minimize the negative log-likelihood function:

To solve this, we use stochastic gradient descent:

Theorem:

Positive phase Negative phase

(hard to compute)

Page 33:


RBM Training

Proof:

Page 34:


RBM Training

Proof [Continued]:

First term Second term

First term:

Difficult to calculate the expectation (negative phase)

Page 35:


RBM Training

Proof [Continued]:

Page 36:


RBM Training

Proof [Continued]:

Second term:

The conditionals are independent logistic distributions

First term Second term

Positive phase

Q.E.D.

Page 37:


RBM Training

Since

We need to calculate

where

Page 38:


RBM Training

Since

We need to calculate

where

The second term is more tricky. Approximate the expectations with a single sample:

Page 39:


RBM Training

Logistic   Logistic

Page 40:


CD-k (Contrastive Divergence) Pseudocode
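The CD-k update can be sketched for a small binary RBM (the layer sizes, learning rate, number of epochs, and the tiny training set below are all illustrative): the positive phase uses the data, the negative phase uses k steps of block Gibbs, and each expectation is approximated with a single sample.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative sizes and hyperparameters.
nv, nh, k, lr = 3, 2, 1, 0.05
rng = random.Random(0)
W = [[rng.gauss(0, 0.1) for _ in range(nh)] for _ in range(nv)]
b = [0.0] * nv   # visible biases
c = [0.0] * nh   # hidden biases

def mean_h(v):   # p(h_j = 1 | v) for every hidden unit
    return [sigmoid(c[j] + sum(W[i][j] * v[i] for i in range(nv))) for j in range(nh)]

def mean_v(h):   # p(v_i = 1 | h) for every visible unit
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(nh))) for i in range(nv)]

def sample(ps):
    return [int(rng.random() < p) for p in ps]

def cd_step(v0):
    ph0 = mean_h(v0)                  # positive-phase statistics
    v, h = v0, sample(ph0)
    for _ in range(k):                # k steps of block Gibbs: negative phase
        v = sample(mean_v(h))
        h = sample(mean_h(v))
    phk = mean_h(v)
    for i in range(nv):               # gradient step on the CD approximation
        for j in range(nh):
            W[i][j] += lr * (ph0[j] * v0[i] - phk[j] * v[i])
    for i in range(nv):
        b[i] += lr * (v0[i] - v[i])
    for j in range(nh):
        c[j] += lr * (ph0[j] - phk[j])

data = [[1, 1, 0], [1, 0, 0], [0, 1, 1]]   # tiny made-up training set
for epoch in range(100):
    for v0 in data:
        cd_step(v0)
```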

Page 41:

Results


Page 42:

http://deeplearning.net/tutorial/rbm.html

RBM Training Results

Samples generated by the RBM after training. Each row represents a mini-batch of negative particles (samples from independent Gibbs chains). 1000 steps of Gibbs sampling were taken between each of those rows.

Original images Learned filters


Page 43:

Summary

Tasks:

Inference:

Evaluate the likelihood function:

Sampling from RBM:

Training RBM:


Page 44:

Thanks for your Attention!

Page 45:


RBM Training Results

Page 46:


Gaussian-Bernoulli RBM Training Results

Each document (story) is represented as a bag of words coming from a multinomial distribution with parameters (h = topics). After training, we can generate words from these topics.