PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 8: GRAPHICAL MODELS


Page 1: Prml Slides 8

PATTERN RECOGNITION AND MACHINE LEARNING
CHAPTER 8: GRAPHICAL MODELS

Page 2: Prml Slides 8

Bayesian Networks

Directed Acyclic Graph (DAG)

Page 3: Prml Slides 8

Bayesian Networks

General Factorization
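The general factorization for a directed graph is p(x) = ∏_{k=1}^{K} p(x_k | pa_k), where pa_k denotes the parents of x_k. As a minimal sketch (the three-node graph a → c ← b and all CPT values below are made-up illustrations, not taken from the slides), a joint probability is evaluated by multiplying one conditional per node:

```python
# Minimal sketch: evaluating p(x) = prod_k p(x_k | pa_k) for a small DAG.
# The graph a -> c <- b and all CPT values are made up for illustration.

parents = {"a": [], "b": [], "c": ["a", "b"]}

# cpt[node] maps a tuple of parent values to a distribution over the node.
cpt = {
    "a": {(): {0: 0.4, 1: 0.6}},
    "b": {(): {0: 0.7, 1: 0.3}},
    "c": {
        (0, 0): {0: 0.9, 1: 0.1},
        (0, 1): {0: 0.5, 1: 0.5},
        (1, 0): {0: 0.3, 1: 0.7},
        (1, 1): {0: 0.1, 1: 0.9},
    },
}

def joint(assignment):
    """p(x) as the product of p(x_k | pa_k) over all nodes."""
    p = 1.0
    for node, pa in parents.items():
        pa_vals = tuple(assignment[q] for q in pa)
        p *= cpt[node][pa_vals][assignment[node]]
    return p

print(joint({"a": 1, "b": 0, "c": 1}))  # 0.6 * 0.7 * 0.7 = 0.294
```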

Page 4: Prml Slides 8

Bayesian Curve Fitting (1)

Polynomial

Page 5: Prml Slides 8

Bayesian Curve Fitting (2)

Plate

Page 6: Prml Slides 8

Bayesian Curve Fitting (3)

Input variables and explicit hyperparameters

Page 7: Prml Slides 8

Bayesian Curve Fitting—Learning

Condition on data

Page 8: Prml Slides 8

Bayesian Curve Fitting—Prediction

Predictive distribution:

where
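The predictive-distribution equations on this slide did not survive transcription. As a hedged sketch of the standard result for Bayesian polynomial curve fitting with a Gaussian prior and Gaussian noise, the predictive distribution is p(t | x, x, t) = N(t | m(x), s²(x)); the hyperparameters alpha, beta and the toy data below are illustrative assumptions, not values from the slides:

```python
import numpy as np

# Sketch: Bayesian predictive distribution for polynomial curve fitting,
# p(t | x, x_train, t_train) = N(t | m(x), s^2(x)), using the standard
# Gaussian-prior / Gaussian-noise results. alpha, beta and the toy data
# are illustrative choices.

M = 3                      # polynomial order
alpha, beta = 2.0, 25.0    # prior precision and noise precision (assumed)

def phi(x):
    return np.array([x ** j for j in range(M + 1)])   # polynomial basis

rng = np.random.default_rng(4)
x_train = np.linspace(0, 1, 10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, size=10)

Phi = np.stack([phi(x) for x in x_train])              # design matrix
S_inv = alpha * np.eye(M + 1) + beta * Phi.T @ Phi
S = np.linalg.inv(S_inv)

def predictive(x):
    m = beta * phi(x) @ S @ Phi.T @ t_train            # predictive mean
    s2 = 1.0 / beta + phi(x) @ S @ phi(x)              # predictive variance
    return m, s2

print(predictive(0.5))
```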

Page 9: Prml Slides 8

Generative Models

Causal process for generating images

Page 10: Prml Slides 8

Discrete Variables (1)

General joint distribution: K^2 - 1 parameters

Independent joint distribution: 2(K - 1) parameters

Page 11: Prml Slides 8

Discrete Variables (2)

General joint distribution over M variables: K^M - 1 parameters

M-node Markov chain: K - 1 + (M - 1) K(K - 1) parameters
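These counts follow directly from the factorization. A quick sketch verifying them (the helper functions below are illustrative, not part of the slides):

```python
# Sketch: parameter counts for distributions over M discrete K-state variables.

def full_joint_params(K, M):
    # General joint distribution: K^M - 1 free parameters (they sum to 1).
    return K ** M - 1

def independent_params(K, M):
    # Fully factorized (independent) distribution: M * (K - 1) parameters.
    return M * (K - 1)

def markov_chain_params(K, M):
    # First node: K - 1; each of the remaining M - 1 nodes needs K * (K - 1)
    # conditional parameters (one (K-1)-vector per parent state).
    return (K - 1) + (M - 1) * K * (K - 1)

print(full_joint_params(2, 10))    # 1023
print(independent_params(2, 10))   # 10
print(markov_chain_params(2, 10))  # 1 + 9 * 2 * 1 = 19
```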

Page 12: Prml Slides 8

Discrete Variables: Bayesian Parameters (1)

Page 13: Prml Slides 8

Discrete Variables: Bayesian Parameters (2)

Shared prior

Page 14: Prml Slides 8

Parameterized Conditional Distributions

If x_1, ..., x_M are discrete, K-state variables, p(y = 1 | x_1, ..., x_M) in general has O(K^M) parameters.

The parameterized form

p(y = 1 | x_1, ..., x_M) = σ(w_0 + Σ_{i=1}^{M} w_i x_i)

requires only M + 1 parameters.
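A minimal sketch of this parameterized (logistic-sigmoid) conditional; the weights below are arbitrary illustrative values:

```python
import math

# Sketch of the parameterized conditional p(y=1 | x_1..x_M) = sigma(w0 + sum_i w_i x_i),
# which needs only M + 1 parameters instead of O(K^M). Weight values are illustrative.

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def p_y_given_x(x, w, w0):
    return sigmoid(w0 + sum(wi * xi for wi, xi in zip(w, x)))

x = [1, 0, 1, 1]            # M = 4 binary parent variables
w = [0.5, -1.2, 0.8, 0.3]   # M weights (assumed values)
w0 = -0.7                   # bias; M + 1 = 5 parameters in total
print(p_y_given_x(x, w, w0))
```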

Page 15: Prml Slides 8

Linear-Gaussian Models

Directed Graph

Vector-valued Gaussian Nodes

Each node is Gaussian; its mean is a linear function of its parents.
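A minimal ancestral-sampling sketch for a small linear-Gaussian chain; the graph, weights, biases and variances are illustrative assumptions:

```python
import random

# Sketch: ancestral sampling from a linear-Gaussian network x1 -> x2 -> x3,
# where p(x_i | pa_i) = N(x_i | sum_j w_ij * x_j + b_i, v_i).
# All numerical values below are illustrative.

nodes = ["x1", "x2", "x3"]                               # a topological ordering
parents = {"x1": [], "x2": ["x1"], "x3": ["x2"]}
w = {"x1": {}, "x2": {"x1": 0.8}, "x3": {"x2": -0.5}}    # linear weights on parents
b = {"x1": 0.0, "x2": 1.0, "x3": 0.0}                    # biases
v = {"x1": 1.0, "x2": 0.5, "x3": 0.25}                   # conditional variances

def ancestral_sample():
    x = {}
    for n in nodes:                               # visit parents before children
        mean = b[n] + sum(w[n][p] * x[p] for p in parents[n])
        x[n] = random.gauss(mean, v[n] ** 0.5)    # gauss takes the std deviation
    return x

print(ancestral_sample())
```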

Page 16: Prml Slides 8

Conditional Independence

a is independent of b given c: p(a | b, c) = p(a | c)

Equivalently: p(a, b | c) = p(a | c) p(b | c)

Notation: a ⊥⊥ b | c

Page 17: Prml Slides 8

Conditional Independence: Example 1

Page 18: Prml Slides 8

Conditional Independence: Example 1

Page 19: Prml Slides 8

Conditional Independence: Example 2

Page 20: Prml Slides 8

Conditional Independence: Example 2

Page 21: Prml Slides 8

Conditional Independence: Example 3

Note: this is the opposite of Example 1, with c unobserved.

Page 22: Prml Slides 8

Conditional Independence: Example 3

Note: this is the opposite of Example 1, with c observed.

Page 23: Prml Slides 8

“Am I out of fuel?”

B = Battery (0=flat, 1=fully charged)
F = Fuel Tank (0=empty, 1=full)
G = Fuel Gauge Reading (0=empty, 1=full)

and hence

Page 24: Prml Slides 8

“Am I out of fuel?”

Probability of an empty tank increased by observing G = 0.

Page 25: Prml Slides 8

“Am I out of fuel?”

Probability of an empty tank reduced by observing B = 0. This is referred to as “explaining away”.
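A short sketch reproducing the explaining-away effect. The conditional probability tables below are the ones used in the book's version of this example (restated here because the original tables did not survive transcription); treat them as assumptions:

```python
# Sketch: "explaining away" in the fuel example, with the book's CPT values
# (assumed here): p(B=1)=0.9, p(F=1)=0.9, and p(G=1 | B, F) = 0.8, 0.2, 0.2, 0.1
# for (B, F) = (1,1), (1,0), (0,1), (0,0).

pB = {1: 0.9, 0: 0.1}
pF = {1: 0.9, 0: 0.1}
pG1 = {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.1}

def p_joint(b, f, g):
    pg = pG1[(b, f)] if g == 1 else 1.0 - pG1[(b, f)]
    return pB[b] * pF[f] * pg

# Prior probability of an empty tank.
print(pF[0])                                                     # 0.1

# p(F=0 | G=0): observing an empty gauge raises the probability.
num = sum(p_joint(b, 0, 0) for b in (0, 1))
den = sum(p_joint(b, f, 0) for b in (0, 1) for f in (0, 1))
print(num / den)                                                 # ~0.257

# p(F=0 | G=0, B=0): also observing a flat battery "explains away"
# the empty gauge, lowering the probability of an empty tank again.
num = p_joint(0, 0, 0)
den = sum(p_joint(0, f, 0) for f in (0, 1))
print(num / den)                                                 # ~0.111
```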

Page 26: Prml Slides 8

D-separation

• A, B, and C are non-intersecting subsets of nodes in a directed graph.
• A path from A to B is blocked if it contains a node such that either
  a) the arrows on the path meet either head-to-tail or tail-to-tail at the node, and the node is in the set C, or
  b) the arrows meet head-to-head at the node, and neither the node, nor any of its descendants, is in the set C
  (a small sketch of this blocking test follows below).
• If all paths from A to B are blocked, A is said to be d-separated from B by C.
• If A is d-separated from B by C, the joint distribution over all variables in the graph satisfies A ⊥⊥ B | C.
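A minimal sketch of the path-blocking test stated above, applied to a single explicit path; the edge-set representation and the example graph are assumptions for illustration:

```python
# Sketch: is a single path between A and B blocked by the observed set C?
# The graph is given as a set of directed (parent, child) edges.

def descendants(node, edges):
    out, frontier = set(), [node]
    while frontier:
        n = frontier.pop()
        for (u, v) in edges:
            if u == n and v not in out:
                out.add(v)
                frontier.append(v)
    return out

def path_blocked(path, edges, C):
    """path: list of nodes; edges: set of (parent, child) pairs; C: observed set."""
    for i in range(1, len(path) - 1):
        prev, node, nxt = path[i - 1], path[i], path[i + 1]
        head_to_head = (prev, node) in edges and (nxt, node) in edges
        if head_to_head:
            # Rule (b): blocked unless the node or one of its descendants is in C.
            if node not in C and not (descendants(node, edges) & C):
                return True
        else:
            # Rule (a): head-to-tail or tail-to-tail; blocked if the node is in C.
            if node in C:
                return True
    return False

# Example: a -> c -> b (head-to-tail at c). Observing c blocks the path.
edges = {("a", "c"), ("c", "b")}
print(path_blocked(["a", "c", "b"], edges, C={"c"}))   # True
print(path_blocked(["a", "c", "b"], edges, C=set()))   # False
```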

Page 27: Prml Slides 8

D-separation: Example

Page 28: Prml Slides 8

D-separation: I.I.D. Data

Page 29: Prml Slides 8

Directed Graphs as Distribution Filters

Page 30: Prml Slides 8

The Markov Blanket

Factors independent of x_i cancel between numerator and denominator, leaving only factors involving the parents, the children, and the co-parents of x_i: its Markov blanket.
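A small sketch that reads the Markov blanket directly off the edge set of a directed graph (the example graph is hypothetical):

```python
# Sketch: the Markov blanket of a node in a directed graph is its parents,
# its children, and its children's other parents (co-parents).

def markov_blanket(node, edges):
    parents = {u for (u, v) in edges if v == node}
    children = {v for (u, v) in edges if u == node}
    coparents = {u for (u, v) in edges if v in children and u != node}
    return parents | children | coparents

edges = {("a", "x"), ("b", "x"), ("x", "y"), ("c", "y")}   # hypothetical graph
print(markov_blanket("x", edges))   # {'a', 'b', 'y', 'c'} (in some order)
```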

Page 31: Prml Slides 8

Cliques and Maximal Cliques

Clique

Maximal Clique

Page 32: Prml Slides 8

Joint Distribution

p(x) = (1/Z) ∏_C ψ_C(x_C)

where ψ_C(x_C) is the potential over clique C and

Z = Σ_x ∏_C ψ_C(x_C)

is the normalization coefficient; note: for M K-state variables there are K^M terms in Z.

Energies and the Boltzmann distribution: ψ_C(x_C) = exp{−E(x_C)}
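A brute-force sketch of such a model: the two-clique chain, the energy-based potential and the variable sizes below are illustrative, and Z is computed by explicit enumeration over all K^M joint states:

```python
import math
from itertools import product

# Sketch: an undirected model p(x) = (1/Z) * prod_C psi_C(x_C), with the
# normalization constant Z computed by brute force over all K^M joint states
# (which is exactly why exact normalization is expensive).
# The chain x1 - x2 - x3 and the potentials are illustrative.

K, M = 2, 3

def psi(a, b):
    # Potential written via an energy, psi = exp(-E): favours equal neighbours.
    E = 0.0 if a == b else 1.0
    return math.exp(-E)

def unnormalized(x):
    return psi(x[0], x[1]) * psi(x[1], x[2])

Z = sum(unnormalized(x) for x in product(range(K), repeat=M))   # K**M terms
print(Z)
print(unnormalized((0, 0, 0)) / Z)   # a properly normalized probability
```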

Page 33: Prml Slides 8

Illustration: Image De-Noising (1)

Original Image | Noisy Image

Page 34: Prml Slides 8

Illustration: Image De-Noising (2)

Page 35: Prml Slides 8

Illustration: Image De-Noising (3)

Noisy Image | Restored Image (ICM)
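A minimal ICM sketch for this kind of binary de-noising model, using an Ising-style energy E(x, y) = h Σ x_i − β Σ x_i x_j − η Σ x_i y_i over pixel values in {−1, +1}; the parameter values and the synthetic image below are illustrative assumptions, not the ones behind the figures:

```python
import numpy as np

# Sketch: ICM (iterated conditional modes) for binary image de-noising with the
# Ising-style energy E(x, y) = h*sum_i x_i - beta*sum_{i,j} x_i x_j - eta*sum_i x_i y_i.
# Parameter values and the synthetic image are illustrative.

rng = np.random.default_rng(3)
h, beta, eta = 0.0, 1.0, 2.1

clean = np.ones((32, 32)); clean[8:24, 8:24] = -1           # toy "original image"
noise = rng.random(clean.shape) < 0.1                        # flip ~10% of pixels
y = np.where(noise, -clean, clean)                           # noisy observation

x = y.copy()                                                 # initialise x to y
for _ in range(10):                                          # a few full sweeps
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # Sum over the 4-neighbourhood of pixel (i, j), staying in bounds.
            nbr = sum(x[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                      if 0 <= a < x.shape[0] and 0 <= b < x.shape[1])
            # Pick the state of x_ij that gives the lower local energy.
            x[i, j] = 1 if (beta * nbr + eta * y[i, j] - h) > 0 else -1

print("pixels differing from the original:", int(np.sum(x != clean)))
```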

Page 36: Prml Slides 8

Illustration: Image De-Noising (4)

Restored Image (Graph cuts) | Restored Image (ICM)

Page 37: Prml Slides 8

Converting Directed to Undirected Graphs (1)

Page 38: Prml Slides 8

Converting Directed to Undirected Graphs (2)

Additional links

Page 39: Prml Slides 8

Directed vs. Undirected Graphs (1)

Page 40: Prml Slides 8

Directed vs. Undirected Graphs (2)

Page 41: Prml Slides 8

Inference in Graphical Models

Page 42: Prml Slides 8

Inference on a Chain

Page 43: Prml Slides 8

Inference on a Chain

Page 44: Prml Slides 8

Inference on a Chain

Page 45: Prml Slides 8

Inference on a Chain

Page 46: Prml Slides 8

Inference on a Chain

To compute local marginals (sketched in code below):
• Compute and store all forward messages, μ_α(x_n).
• Compute and store all backward messages, μ_β(x_n).
• Compute Z at any node x_m.
• Compute p(x_n) = (1/Z) μ_α(x_n) μ_β(x_n) for all variables required.
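A compact sketch of the two message passes on a small discrete chain; the potential table, its size and the chain length are arbitrary illustrative choices:

```python
import numpy as np

# Sketch: forward/backward message passing on an undirected chain
# x1 - x2 - ... - xN with one shared pairwise potential table psi (illustrative).
# Marginal: p(x_n) = (1/Z) * mu_alpha(x_n) * mu_beta(x_n).

K, N = 3, 5
rng = np.random.default_rng(0)
psi = rng.uniform(0.5, 1.5, size=(K, K))    # illustrative pairwise potential

alpha = [np.ones(K) for _ in range(N)]      # forward messages mu_alpha(x_n)
beta = [np.ones(K) for _ in range(N)]       # backward messages mu_beta(x_n)

for n in range(1, N):                       # forward pass
    alpha[n] = psi.T @ alpha[n - 1]
for n in range(N - 2, -1, -1):              # backward pass
    beta[n] = psi @ beta[n + 1]

Z = float(np.sum(alpha[2] * beta[2]))       # Z can be computed at any node
for n in range(N):
    p_n = alpha[n] * beta[n] / Z
    print(n, p_n, p_n.sum())                # each marginal sums to 1
```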

Page 47: Prml Slides 8

Trees

Undirected Tree | Directed Tree | Polytree

Page 48: Prml Slides 8

Factor Graphs

Page 49: Prml Slides 8

Factor Graphs from Directed Graphs

Page 50: Prml Slides 8

Factor Graphs from Undirected Graphs

Page 51: Prml Slides 8

The Sum-Product Algorithm (1)

Objective:
i. to obtain an efficient, exact inference algorithm for finding marginals;
ii. in situations where several marginals are required, to allow computations to be shared efficiently.

Key idea: the Distributive Law, e.g. ab + ac = a(b + c), so that sums can be pushed inside products.

Page 52: Prml Slides 8

The Sum-Product Algorithm (2)

Page 53: Prml Slides 8

The Sum-Product Algorithm (3)

Page 54: Prml Slides 8

The Sum-Product Algorithm (4)

Page 55: Prml Slides 8

The Sum-Product Algorithm (5)

Page 56: Prml Slides 8

The Sum-Product Algorithm (6)

Page 57: Prml Slides 8

The Sum-Product Algorithm (7)

Initialization

Page 58: Prml Slides 8

The Sum-Product Algorithm (8)

To compute local marginals:
• Pick an arbitrary node as root.
• Compute and propagate messages from the leaf nodes to the root, storing received messages at every node.
• Compute and propagate messages from the root to the leaf nodes, storing received messages at every node.
• Compute the product of received messages at each node for which the marginal is required, and normalize if necessary (a small worked sketch follows below).
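A small worked sketch of these message rules on a tiny tree-structured factor graph, checked against brute-force marginalization; the potential tables are illustrative:

```python
import numpy as np
from itertools import product

# Sketch: sum-product on a tiny tree-structured factor graph
#   x1 -- f_a -- x2 -- f_b -- x3, plus a unary factor f_c on x2.
# Leaf variable nodes send unit messages; each factor sums out its other
# variable; the marginal of x2 is the normalized product of incoming messages.

K = 2
rng = np.random.default_rng(1)
f_a = rng.uniform(0.1, 1.0, size=(K, K))   # f_a(x1, x2)
f_b = rng.uniform(0.1, 1.0, size=(K, K))   # f_b(x2, x3)
f_c = rng.uniform(0.1, 1.0, size=K)        # f_c(x2)

# Messages into x2 (leaves x1 and x3 send unit messages to their factors).
msg_a = f_a.sum(axis=0)          # mu_{f_a -> x2}(x2) = sum_{x1} f_a(x1, x2)
msg_b = f_b.sum(axis=1)          # mu_{f_b -> x2}(x2) = sum_{x3} f_b(x2, x3)
msg_c = f_c                      # a unary factor passes itself

p_x2 = msg_a * msg_b * msg_c
p_x2 /= p_x2.sum()

# Brute-force check against direct marginalization of the joint.
joint = np.zeros(K)
for x1, x2, x3 in product(range(K), repeat=3):
    joint[x2] += f_a[x1, x2] * f_b[x2, x3] * f_c[x2]
print(p_x2, joint / joint.sum())   # the two marginals agree
```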

Page 59: Prml Slides 8

Sum-Product: Example (1)

Page 60: Prml Slides 8

Sum-Product: Example (2)

Page 61: Prml Slides 8

Sum-Product: Example (3)

Page 62: Prml Slides 8

Sum-Product: Example (4)

Page 63: Prml Slides 8

The Max-Sum Algorithm (1)

Objective: an efficient algorithm for finding
i. the value x_max that maximises p(x);
ii. the value of p(x_max).

In general, the maximum of the marginals ≠ the joint maximum.

Page 64: Prml Slides 8

The Max-Sum Algorithm (2)

Maximizing over a chain (max-product)

Page 65: Prml Slides 8

The Max-Sum Algorithm (3)

Generalizes to tree-structured factor graph

maximizing as close to the leaf nodes as possible

Page 66: Prml Slides 8

The Max-Sum Algorithm (4)

Max-Product → Max-Sum: for numerical reasons, work with logarithms, using ln max p(x) = max ln p(x).

Again, use the distributive law: max(a + b, a + c) = a + max(b, c).

Page 67: Prml Slides 8

The Max-Sum Algorithm (5)

Initialization (leaf nodes)

Recursion

Page 68: Prml Slides 8

The Max-Sum Algorithm (6)

Termination (root node)

Back-track, for all nodes i with l factor nodes to the root (l=0)

Page 69: Prml Slides 8

The Max-Sum Algorithm (7)

Example: Markov chain
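A sketch of max-sum on a Markov chain (the Viterbi recursion), working in log space and back-tracking from the root; the transition matrix and chain length are illustrative assumptions:

```python
import numpy as np

# Sketch: max-sum (Viterbi) on a small Markov chain, in log space for numerical
# stability, with back-tracking to recover the jointly most probable sequence.
# Initial/transition probabilities are illustrative.

K, N = 3, 6
rng = np.random.default_rng(2)
log_p1 = np.log(np.full(K, 1.0 / K))                  # initial distribution
A = rng.uniform(0.1, 1.0, size=(K, K))
A /= A.sum(axis=1, keepdims=True)                     # transition matrix p(x_n | x_{n-1})
logA = np.log(A)

# Forward (max) pass: omega[n, j] = best log-probability of any path ending in j.
omega = np.zeros((N, K))
backptr = np.zeros((N, K), dtype=int)
omega[0] = log_p1
for n in range(1, N):
    scores = omega[n - 1][:, None] + logA             # previous state x current state
    backptr[n] = scores.argmax(axis=0)
    omega[n] = scores.max(axis=0)

# Termination at the root, then back-track to the leaves.
x_max = np.zeros(N, dtype=int)
x_max[-1] = omega[-1].argmax()
for n in range(N - 2, -1, -1):
    x_max[n] = backptr[n + 1, x_max[n + 1]]

print("max log p(x):", omega[-1].max())
print("x_max:", x_max)
```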

Page 70: Prml Slides 8

The Junction Tree Algorithm

• Exact inference on general graphs.
• Works by turning the initial graph into a junction tree and then running a sum-product-like algorithm.
• Intractable on graphs with large cliques.

Page 71: Prml Slides 8

Loopy Belief Propagation

• Sum-Product on general graphs.
• Initial unit messages passed across all links, after which messages are passed around until convergence (not guaranteed!).
• Approximate but tractable for large graphs.
• Sometimes works well, sometimes not at all.