Christopher M. Bishop's tutorial on graphical models
Part 1: Graphical Models
Machine Learning Techniques
for Computer Vision
Microsoft Research Cambridge
ECCV 2004, Prague
Christopher M. Bishop
About this Tutorial
• Learning is the new frontier in computer vision
• Focus on concepts
  – not lists of algorithms
  – not technical details
• Graduate level
• Please ask questions!
Overview
• Part 1: Graphical models
  – directed and undirected graphs
  – inference and learning
• Part 2: Unsupervised learning
  – mixture models, EM
  – variational inference, model complexity
  – continuous latent variables
• Part 3: Supervised learning
  – decision theory
  – linear models, neural networks
  – boosting, sparse kernel machines
Probability Theory
• Sum rule: $p(x) = \sum_y p(x, y)$
• Product rule: $p(x, y) = p(y \mid x)\, p(x)$
• From these we have Bayes' theorem
  $p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}$
  – with normalization $p(y) = \sum_x p(y \mid x)\, p(x)$
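The rules above can be checked numerically on a small joint table (the table and the helper names below are invented for illustration, not from the slides):

```python
# Hypothetical 2x2 joint distribution p(x, y), used to verify the
# sum rule, product rule, and Bayes' theorem numerically.
joint = {
    ("x0", "y0"): 0.3, ("x0", "y1"): 0.2,
    ("x1", "y0"): 0.1, ("x1", "y1"): 0.4,
}

def marginal_x(x):
    # Sum rule: p(x) = sum_y p(x, y)
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    return sum(p for (_, yi), p in joint.items() if yi == y)

def conditional_y_given_x(y, x):
    # Product rule rearranged: p(y | x) = p(x, y) / p(x)
    return joint[(x, y)] / marginal_x(x)

def bayes_x_given_y(x, y):
    # Bayes' theorem: p(x | y) = p(y | x) p(x) / p(y)
    return conditional_y_given_x(y, x) * marginal_x(x) / marginal_y(y)

# Bayes' result agrees with conditioning the joint table directly.
direct = joint[("x0", "y1")] / marginal_y("y1")
assert abs(bayes_x_given_y("x0", "y1") - direct) < 1e-12
```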
Role of the Graphs
• New insights into existing models
• Motivation for new models
• Graph-based algorithms for calculation and computation
  – cf. Feynman diagrams in physics
Decomposition
• Consider an arbitrary joint distribution $p(x_1, \ldots, x_K)$
• By successive application of the product rule
  $p(x_1, \ldots, x_K) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_K \mid x_1, \ldots, x_{K-1})$
Directed Acyclic Graphs
• Joint distribution
  $p(x) = \prod_i p(x_i \mid \mathrm{pa}_i)$
  where $\mathrm{pa}_i$ denotes the parents of node $i$
• No directed cycles
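The factorization can be sketched on a small hypothetical network $a \to c \leftarrow b$ over binary variables (all conditional-probability numbers below are invented):

```python
import itertools

# Hypothetical three-node DAG a -> c <- b over binary variables;
# the joint factorizes as p(a, b, c) = p(a) p(b) p(c | a, b).
p_a = {0: 0.6, 1: 0.4}
p_b = {0: 0.7, 1: 0.3}
p_c_given_ab = {(0, 0): {0: 0.9, 1: 0.1},
                (0, 1): {0: 0.5, 1: 0.5},
                (1, 0): {0: 0.4, 1: 0.6},
                (1, 1): {0: 0.2, 1: 0.8}}

def joint(a, b, c):
    # Product over nodes of p(node | parents of node).
    return p_a[a] * p_b[b] * p_c_given_ab[(a, b)][c]

# A valid factorization sums to one over all configurations.
total = sum(joint(a, b, c) for a, b, c in itertools.product((0, 1), repeat=3))
assert abs(total - 1.0) < 1e-12
```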
Undirected Graphs
• Provided $p(x) > 0$, the joint distribution is a product of non-negative functions over the cliques of the graph
  $p(x) = \frac{1}{Z} \prod_C \psi_C(x_C)$
  where $\psi_C(x_C)$ are the clique potentials and $Z$ is a normalization constant
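A minimal sketch for a hypothetical three-node chain MRF (the edge potential is invented), showing how $Z$ is obtained by summing the potential product over all states:

```python
import itertools

# Hypothetical undirected chain x1 - x2 - x3 with binary nodes; the maximal
# cliques are the two edges, each with a non-negative potential.
def psi(xi, xj):
    return 2.0 if xi == xj else 1.0   # invented: favour equal neighbours

def unnormalized(x1, x2, x3):
    return psi(x1, x2) * psi(x2, x3)

# Normalization constant Z sums the potential product over all states.
Z = sum(unnormalized(*x) for x in itertools.product((0, 1), repeat=3))

def p(x1, x2, x3):
    return unnormalized(x1, x2, x3) / Z

assert abs(sum(p(*x) for x in itertools.product((0, 1), repeat=3)) - 1.0) < 1e-12
```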
Conditioning on Evidence
• Variables may be hidden (latent) or visible (observed)
• Latent variables may have a specific interpretation, or may be introduced to permit a richer class of distributions
Conditional Independences
• x is independent of y given z if, for all values of z,
  $p(x, y \mid z) = p(x \mid z)\, p(y \mid z)$
• For undirected graphs this is given by graph separation!
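A quick numeric sketch (all numbers invented): building a joint as $p(z)\,p(x \mid z)\,p(y \mid z)$ makes the identity above hold by construction, which the loop verifies for every value of $z$:

```python
import itertools

# Hypothetical distribution built so that x and y are independent given z:
# p(x, y, z) = p(z) p(x | z) p(y | z), all variables binary.
p_z = {0: 0.5, 1: 0.5}
p_x_given_z = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
p_y_given_z = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}

def joint(x, y, z):
    return p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]

# Check the defining identity p(x, y | z) = p(x | z) p(y | z) for every z.
for x, y, z in itertools.product((0, 1), repeat=3):
    pz = sum(joint(a, b, z) for a, b in itertools.product((0, 1), repeat=2))
    p_xy_given_z = joint(x, y, z) / pz
    assert abs(p_xy_given_z - p_x_given_z[z][x] * p_y_given_z[z][y]) < 1e-12
```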
“Explaining Away”
• C.I. for directed graphs is similar, but with one subtlety
• Illustration: pixel colour in an image
  (figure: image colour is the child of two parents, surface colour and lighting colour; observing the image colour makes the two parents dependent, so one can "explain away" the other)
Directed versus Undirected
Example: State Space Models
• Hidden Markov model
• Kalman filter
Example: Bayesian SSM
Example: Factorial SSM
• Multiple hidden sequences
• Avoid exponentially large hidden space
Example: Markov Random Field
• Typical application: image region labelling
Example: Conditional Random Field
Inference
• Simple example: Bayes’ theorem
Message Passing
• Example: a chain of discrete nodes
• Find the marginal for a particular node $x_n$
  – for M-state nodes, naive cost is $O(M^N)$: exponential in the length of the chain
  – but we can exploit the graphical structure (conditional independences)
• Joint distribution
  $p(x) = \frac{1}{Z}\, \psi_{1,2}(x_1, x_2)\, \psi_{2,3}(x_2, x_3) \cdots \psi_{N-1,N}(x_{N-1}, x_N)$
• Exchange sums and products
  $p(x_n) = \frac{1}{Z} \Big[ \sum_{x_{n-1}} \psi_{n-1,n}(x_{n-1}, x_n) \cdots \Big[ \sum_{x_1} \psi_{1,2}(x_1, x_2) \Big] \Big] \Big[ \sum_{x_{n+1}} \psi_{n,n+1}(x_n, x_{n+1}) \cdots \Big]$
• Express as a product of messages
  $p(x_n) = \frac{1}{Z}\, \mu_\alpha(x_n)\, \mu_\beta(x_n)$
• Recursive evaluation of messages
  $\mu_\alpha(x_n) = \sum_{x_{n-1}} \psi_{n-1,n}(x_{n-1}, x_n)\, \mu_\alpha(x_{n-1})$, and similarly for $\mu_\beta$
• Find Z by normalizing
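The recursion above can be sketched for a hypothetical chain of 3-state nodes with an invented positive potential; the message-passing marginal is checked against brute-force summation over the full joint:

```python
import itertools

# Hypothetical 5-node chain of 3-state variables with pairwise potentials;
# sum-product messages give the marginal p(x_n) in O(N M^2) instead of O(M^N).
N, M = 5, 3
def psi(i, a, b):
    # invented positive potential on edge (i, i+1)
    return 1.0 + ((a * 3 + b + i) % 4)

# Forward messages: mu_a[n][x] = sum_{x'} psi(n-1, x', x) * mu_a[n-1][x']
mu_a = [[1.0] * M]
for n in range(1, N):
    mu_a.append([sum(psi(n - 1, ap, a) * mu_a[n - 1][ap] for ap in range(M))
                 for a in range(M)])
# Backward messages, evaluated from the other end of the chain.
mu_b = [[1.0] * M for _ in range(N)]
for n in range(N - 2, -1, -1):
    mu_b[n] = [sum(psi(n, a, bp) * mu_b[n + 1][bp] for bp in range(M))
               for a in range(M)]

def marginal(n):
    # p(x_n) is proportional to the product of the two incoming messages.
    unnorm = [mu_a[n][a] * mu_b[n][a] for a in range(M)]
    Z = sum(unnorm)
    return [u / Z for u in unnorm]

# Brute-force check against the explicit (unnormalized) joint.
def joint(x):
    prod = 1.0
    for i in range(N - 1):
        prod *= psi(i, x[i], x[i + 1])
    return prod

n = 2
brute = [0.0] * M
for x in itertools.product(range(M), repeat=N):
    brute[x[n]] += joint(x)
Ztot = sum(brute)
for a in range(M):
    assert abs(marginal(n)[a] - brute[a] / Ztot) < 1e-9
```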
Belief Propagation
• Extension to general tree-structured graphs
• At each node:
  – form product of incoming messages and local evidence
  – marginalize to give outgoing message
  – one message in each direction across every link
• Fails if there are loops
Junction Tree Algorithm
• An efficient exact algorithm for a general graph
  – applies to both directed and undirected graphs
  – compile original graph into a tree of cliques
  – then perform message passing on this tree
• Problem:
  – cost is exponential in size of largest clique
  – many vision models have intractably large cliques
Loopy Belief Propagation
• Apply belief propagation directly to general graph
  – need to keep iterating
  – might not converge
• State-of-the-art performance in error-correcting codes
Max-product Algorithm
• Goal: find the most probable configuration $x^{\max} = \arg\max_x p(x)$
  – define $p^{\max} = \max_x p(x)$
  – then $p^{\max} = \max_{x_1} \cdots \max_{x_N} p(x)$
• Message passing algorithm with "sum" replaced by "max"
• Example:
  – Viterbi algorithm for HMMs
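A minimal Viterbi sketch for a hypothetical 2-state HMM (all probabilities invented). Working in log space, `delta` holds the max-product messages and `back` the arg-max pointers used to decode the most probable hidden path:

```python
import math

# Hypothetical 2-state HMM: state 0 mostly emits symbol 0, state 1 symbol 1.
trans = [[0.7, 0.3], [0.4, 0.6]]   # p(z_t | z_{t-1})
emit = [[0.9, 0.1], [0.2, 0.8]]    # p(obs | z)
init = [0.5, 0.5]
obs = [0, 0, 1, 1, 0]

# delta[t][k] = max over paths ending in state k of log p(path, obs_1..t)
delta = [[math.log(init[k]) + math.log(emit[k][obs[0]]) for k in range(2)]]
back = []
for t in range(1, len(obs)):
    row, ptr = [], []
    for k in range(2):
        # "sum" of sum-product replaced by "max", plus an arg-max pointer.
        scores = [delta[-1][j] + math.log(trans[j][k]) for j in range(2)]
        j_best = max(range(2), key=lambda j: scores[j])
        row.append(scores[j_best] + math.log(emit[k][obs[t]]))
        ptr.append(j_best)
    delta.append(row)
    back.append(ptr)

# Decode by following back-pointers from the best final state.
state = max(range(2), key=lambda k: delta[-1][k])
path = [state]
for ptr in reversed(back):
    state = ptr[state]
    path.append(state)
path.reverse()
```

With these numbers the decoded path tracks the observations, since each state strongly prefers its own symbol.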
Inference and Learning
• Data set $X = \{x_1, \ldots, x_N\}$
• Likelihood function (independent observations)
  $p(X \mid \theta) = \prod_n p(x_n \mid \theta)$
• Maximize (log) likelihood
  $\ln p(X \mid \theta) = \sum_n \ln p(x_n \mid \theta)$
• Predictive distribution $p(x \mid \theta^{\mathrm{ML}})$
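As a concrete case, the Gaussian log likelihood is maximized in closed form by the sample mean and (biased) sample variance; a sketch with invented data:

```python
import math

# Hypothetical 1-D data; for a Gaussian, sum_n ln p(x_n | mu, var) is
# maximized by the sample mean and the biased sample variance.
data = [2.1, 1.9, 2.4, 2.0, 1.6]
N = len(data)
mu_ml = sum(data) / N
var_ml = sum((x - mu_ml) ** 2 for x in data) / N

def log_lik(mu, var):
    return sum(-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
               for x in data)

# The closed-form estimate beats nearby parameter settings.
assert log_lik(mu_ml, var_ml) >= log_lik(mu_ml + 0.1, var_ml)
assert log_lik(mu_ml, var_ml) >= log_lik(mu_ml, var_ml + 0.1)
```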
Regularized Maximum Likelihood
• Prior $p(\theta)$, posterior $p(\theta \mid X) \propto p(X \mid \theta)\, p(\theta)$
• MAP (maximum posterior)
  $\theta^{\mathrm{MAP}} = \arg\max_\theta \big[ \ln p(X \mid \theta) + \ln p(\theta) \big]$
• Predictive distribution $p(x \mid \theta^{\mathrm{MAP}})$
• Not really Bayesian
Bayesian Learning
• Key idea is to marginalize over unknown parameters, rather than make point estimates
  – avoids severe over-fitting of ML and MAP
  – allows direct model comparison
• Parameters are now latent variables
• Bayesian learning is an inference problem!
And Finally … the Exponential Family
• Many distributions can be written in the form
  $p(x \mid \eta) = h(x)\, g(\eta) \exp\{\eta^{\mathrm{T}} u(x)\}$
• Includes:
  – Gaussian
  – Dirichlet
  – Gamma
  – Multinomial
  – Wishart
  – Bernoulli
  – …
• Building blocks in graphs to give rich probabilistic models
Illustration: the Gaussian
• Use precision (inverse variance) $\lambda$
  $p(x \mid \mu, \lambda) = \left(\frac{\lambda}{2\pi}\right)^{1/2} \exp\left\{ -\frac{\lambda}{2}(x - \mu)^2 \right\}$
• In standard form
  $u(x) = (x, x^2)^{\mathrm{T}}, \quad \eta = (\lambda\mu, -\lambda/2)^{\mathrm{T}}$
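The precision form and the natural-parameter standard form can be confirmed to agree numerically; a sketch with invented $\mu$ and $\lambda$:

```python
import math

# Numeric check that the precision form of the Gaussian,
#   p(x | mu, lam) = sqrt(lam / 2 pi) exp(-lam (x - mu)^2 / 2),
# equals the exponential-family form g(eta) exp(eta1 * x + eta2 * x^2)
# with natural parameters eta1 = lam * mu, eta2 = -lam / 2.
mu, lam = 1.3, 2.0

def gauss(x):
    return math.sqrt(lam / (2 * math.pi)) * math.exp(-0.5 * lam * (x - mu) ** 2)

eta1, eta2 = lam * mu, -lam / 2
# Completing the square moves the x-independent factor into g(eta).
g = math.sqrt(lam / (2 * math.pi)) * math.exp(-0.5 * lam * mu ** 2)

for x in (-1.0, 0.0, 0.5, 2.0):
    assert abs(gauss(x) - g * math.exp(eta1 * x + eta2 * x * x)) < 1e-12
```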
Maximum Likelihood
• Likelihood function (independent observations)
  $p(X \mid \eta) = \Big( \prod_n h(x_n) \Big)\, g(\eta)^N \exp\Big\{ \eta^{\mathrm{T}} \sum_n u(x_n) \Big\}$
• Depends on the data only via the sufficient statistics $\sum_n u(x_n)$, which have fixed dimension
Conjugate Priors
• Prior has same functional form as likelihood
  $p(\eta \mid \chi, \nu) \propto g(\eta)^\nu \exp\{\eta^{\mathrm{T}} \chi\}$
• Hence posterior is of the form
  $p(\eta \mid X, \chi, \nu) \propto g(\eta)^{\nu + N} \exp\Big\{ \eta^{\mathrm{T}} \Big( \chi + \sum_n u(x_n) \Big) \Big\}$
• Can interpret the prior as $\nu$ effective observations of average value $\chi / \nu$
• Examples:
  – Gaussian for the mean of a Gaussian
  – Gaussian-Wishart for mean and precision of a Gaussian
  – Dirichlet for the parameters of a discrete distribution
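A sketch of the first listed example with invented numbers: a Gaussian prior on the mean of a Gaussian likelihood with known precision is conjugate, so the posterior update just adds precisions and precision-weights the means:

```python
# Conjugate update for the mean of a Gaussian with known precision lam:
# prior N(mu | m0, tau0^-1) and likelihood prod_n N(x_n | mu, lam^-1)
# give a Gaussian posterior N(mu | m_post, tau_post^-1).
lam = 4.0                      # known likelihood precision (invented)
m0, tau0 = 0.0, 1.0            # invented prior mean and precision on mu
data = [0.9, 1.1, 1.0, 1.2]
N = len(data)

tau_post = tau0 + N * lam                           # precisions add
m_post = (tau0 * m0 + lam * sum(data)) / tau_post   # precision-weighted mean
```

With four observations near 1.0 the posterior mean is pulled from the prior mean 0 almost all the way to the data, reflecting the prior acting like a single effective observation.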
Summary of Part 1
• Directed graphs: $p(x) = \prod_i p(x_i \mid \mathrm{pa}_i)$
• Undirected graphs: $p(x) = \frac{1}{Z} \prod_C \psi_C(x_C)$
• Inference by message passing: belief propagation