CSC 665-1: Advanced Topics in Probabilistic Graphical Models
TRANSCRIPT
CSC 665-1: Advanced Topics in Probabilistic Graphical Models
Graphical Models
Instructor: Prof. Jason Pacheco
From Probabilities to Pictures
A probabilistic graphical model allows us to pictorially represent a probability distribution*
[Figure: a probability model (left) and its corresponding graphical model (right)]
The graphical model structure obeys the factorization of the probability function in a sense we will formalize later
* We will use the term “distribution” loosely to refer to a CDF / PDF / PMF
Graphical Models
[Figure: a Bayes network, a factor graph, and a Markov random field representing the same distribution, spanning directed and undirected models. Source: Erik Sudderth, PhD Thesis]
A variety of graphical models can represent the same probability distribution.
Factorized Probability Distributions
A probability distribution over RVs x = (x_1, …, x_N) can be written as a product of factors,

p(x) = (1/Z) ∏_{f ∈ F} ψ_f(x_f)

Where:
• F is a collection of subsets of indices f ⊆ {1, …, N}
• ψ_f(x_f) ≥ 0 are nonnegative factors (or potential functions)
• Z = Σ_x ∏_{f ∈ F} ψ_f(x_f) is the normalizing constant (or partition function)
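As a concrete sketch of this definition, the snippet below builds a factorized distribution over three binary RVs; the index sets F = {{1,2}, {2,3}} and the potential values are illustrative assumptions, not from the lecture:

```python
import itertools

# Hypothetical nonnegative factors over binary variables.
def psi_12(x1, x2):
    return 2.0 if x1 == x2 else 1.0   # favors x1 == x2

def psi_23(x2, x3):
    return 3.0 if x2 == x3 else 1.0   # favors x2 == x3

def unnormalized(x1, x2, x3):
    return psi_12(x1, x2) * psi_23(x2, x3)

# Partition function Z: sum the factor product over all joint states.
Z = sum(unnormalized(*x) for x in itertools.product([0, 1], repeat=3))

def p(x1, x2, x3):
    return unnormalized(x1, x2, x3) / Z

# Dividing by Z makes the probabilities sum to one.
total = sum(p(*x) for x in itertools.product([0, 1], repeat=3))
```

Brute-force enumeration of Z is only feasible for tiny models; its cost grows exponentially in the number of variables.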
Undirected Graphical Models
A graph G = (V, E) is a set of vertices V and edges E. An edge (s, t) ∈ E connects two vertices s, t ∈ V.
In undirected models, edges are specified irrespective of node ordering, so that (s, t) ∈ E ⇔ (t, s) ∈ E.
Distributions are typically specified with unknown normalization (easier to specify): p(x) ∝ ∏_{f ∈ F} ψ_f(x_f).
Markov Random Fields (MRFs)
A factor corresponds to a clique (fully connected subgraph) in the MRF
An MRF does not imply a unique factorization; several different groupings of the clique potentials can all be “valid”.
A factorization is valid if it satisfies the global Markov property, defined by conditional independencies.
Conditional Independence (Undirected)
We say X and Y are independent, X ⊥ Y, if: p(x, y) = p(x) p(y).
We say they are conditionally independent, X ⊥ Y | Z, if: p(x, y | z) = p(x | z) p(y | z).
Def. We say p is globally Markov w.r.t. G if X_A ⊥ X_B | X_C for any separating set C of A and B.
[Source: Michael I. Jordan]
Conditional independence in undirected graphical models is defined by separating sets.
Hammersley-Clifford Theorem
Thm. (Hammersley–Clifford) A strictly positive distribution p(x) > 0 satisfies the global Markov property w.r.t. an undirected graph G if and only if it factorizes over the cliques of G.
A minimal factorization is one in which all factors are maximal cliques (cliques that are not a strict subset of any other clique) in the MRF.
Pairwise Markov Random Field
A restricted class of MRFs:
• A 2-node factor exists for every edge: p(x, y) ∝ ∏_{(s,t) ∈ E} ψ_st(x_s, x_t) · ∏_s ψ_s(x_s, y_s), with a prior term over the unknown variables x and a likelihood term for the observations y
• Explicit factorization of the joint distribution
• High-order factors are not always easily decomposed into pairwise terms
It is often easier to specify and do inference on a pairwise model.
Example: Image Segmentation
Pairwise MRF energy: E(x) = Σ_s ‖y_s − μ(x_s)‖² + β Σ_{(s,t) ∈ E} 1[x_s ≠ x_t], with an L2 likelihood term and a Potts model smoothness term.
[Source: Kundu, A. et al., CVPR16]
Low-energy configurations = high-probability configurations.
The MAP (minimum-energy) configuration yields piecewise-constant regions.
We don't need to know the log-partition function to specify the model. (Notional figure only!)
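The energy above can be sketched on a tiny 1-D "image"; the label means mu and the weights lam, beta below are made-up illustration values:

```python
# Assumed mean intensity for each of two labels (hypothetical values).
mu = [0.0, 1.0]

def energy(x, y, lam=1.0, beta=0.5):
    # L2 likelihood term: squared error between observation and label mean.
    data = sum(lam * (ys - mu[xs]) ** 2 for xs, ys in zip(x, y))
    # Potts term: pay beta for each pair of neighbors with different labels.
    smooth = sum(beta for a, b in zip(x, x[1:]) if a != b)
    return data + smooth

y = [0.1, 0.0, 0.9, 1.0]        # observed intensities
x_piecewise = [0, 0, 1, 1]      # piecewise-constant labeling
x_alternating = [0, 1, 0, 1]    # rapidly switching labeling
# The piecewise-constant labeling has lower energy, i.e. higher probability.
```

Minimizing this energy over x is exactly the MAP problem described above; note the partition function never appears.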
Factor Graphs
A hypergraph G = (V, F) where a hyperedge f ∈ F is a subset of vertices f ⊆ V.
Factor graphs explicitly encode the factorization of the distribution:
p(x) = (1/Z) ∏_{f ∈ F} ψ_f(x_f)
where x_f = {x_s : s ∈ f} is the set of variables in factor f.
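A minimal sketch of this idea stores each factor together with its scope x_f, which makes the factorization unambiguous; the scopes and potential values below are hypothetical:

```python
# Each entry is (scope, factor): the scope lists which variables the
# factor touches, mirroring a factor node's edges in the factor graph.
factors = [
    ((0, 1), lambda a, b: 2.0 if a == b else 1.0),   # f1(x1, x2)
    ((1, 2), lambda a, b: 3.0 if a == b else 1.0),   # f2(x2, x3)
    ((2,),   lambda a: 1.5 if a == 1 else 1.0),      # f3(x3)
]

def joint_unnormalized(x):
    # Product of all factors, each evaluated only on its own scope.
    prod = 1.0
    for scope, f in factors:
        prod *= f(*(x[i] for i in scope))
    return prod
```

Unlike an MRF drawing, this representation cannot be ambiguous: the single-variable factor f3 is listed explicitly rather than being absorbed into a clique.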
Example: Low Density Parity Check Codes
[Figure: a sparse parity check matrix and its factor graph representation. A transmitted code passes through a noisy channel; the decoder recovers it from the received code. Source: David MacKay]
Example: Low Density Parity Check Codes
• Valid codes have zero parity: H x = 0 (mod 2)
• Channel noise model is arbitrary, e.g. flip bits with some probability
[Figure: sparse parity check matrix and factor graph representation, with one factor per parity check over the code bits x_n. Source: David MacKay]
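The zero-parity constraint can be sketched directly; the small H below is an illustrative stand-in for a real sparse LDPC parity check matrix:

```python
# Hypothetical 3x6 parity check matrix (a real LDPC matrix is much
# larger and sparser).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
]

def zero_parity(x):
    # Each row of H is one parity-check factor; a valid codeword makes
    # every check sum to an even number (H x = 0 mod 2).
    return all(sum(h * b for h, b in zip(row, x)) % 2 == 0 for row in H)

valid = zero_parity([1, 1, 0, 0, 1, 0])       # every check is even
corrupted = zero_parity([1, 1, 0, 0, 1, 1])   # one flipped bit breaks a check
```

In the factor graph view, each row of H is a factor node connected to the bit variables with a 1 in that row, and zero_parity is the product of those hard 0/1 factors.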
Directed Graphs
Def. A directed graph is a graph whose edges (arcs) s → t connect a parent vertex s to a child vertex t.
Def. The parents of a vertex t are the set of nodes with arcs pointing to t: Pa(t) = {s ∈ V : s → t ∈ E}.
The children of s are given by the set Ch(s) = {t ∈ V : s → t ∈ E}.
Ancestors are parents-of-parents (applied recursively); descendants are children-of-children.
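These definitions translate directly into code; the graph below is a made-up example stored as a set of (parent, child) arcs:

```python
# Hypothetical DAG: a -> b, a -> c, b -> d, c -> d.
arcs = {("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")}

def parents(v):
    # Nodes with an arc pointing into v.
    return {p for p, c in arcs if c == v}

def children(v):
    # Nodes that v points to.
    return {c for p, c in arcs if p == v}
```

Ancestors and descendants follow by applying these maps repeatedly until no new nodes are found.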
Bayes Network
Model factors are normalized conditional distributions p(x_s | x_Pa(s)), where Pa(s) denotes the parents of node s.
A directed acyclic graph (DAG) specifies the factorized form of the joint probability: p(x) = ∏_s p(x_s | x_Pa(s)).
Locally normalized factors yield a globally normalized joint probability.
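A two-node illustration of the last point, with invented CPT values for a rain → wet-grass model: the product of locally normalized factors sums to one with no extra partition function.

```python
# p(rain) and p(wet | rain): each table is locally normalized.
p_rain = {True: 0.2, False: 0.8}
p_wet_given_rain = {
    True:  {True: 0.9, False: 0.1},   # p(w | r=True)
    False: {True: 0.2, False: 0.8},   # p(w | r=False)
}

def joint(r, w):
    # Bayes net factorization: p(r, w) = p(r) * p(w | r).
    return p_rain[r] * p_wet_given_rain[r][w]

# Global normalization comes for free from local normalization.
total = sum(joint(r, w) for r in (True, False) for w in (True, False))
```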
Bayes nets are easily simulated via ancestral sampling.
Specification is more difficult than for undirected models, since each factor must be a normalized probability measure.
Example: Gaussian Mixture Model
[Figure: probability model, Bayes net, and a joint sample from the model]
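Ancestral sampling for a two-component Gaussian mixture looks like the sketch below: sample the parent z from its prior, then x given z. All mixture parameters are invented for illustration.

```python
import random

# Hypothetical mixture parameters.
weights = [0.3, 0.7]
means = [-2.0, 3.0]
stds = [0.5, 1.0]

def sample_gmm(rng):
    z = rng.choices([0, 1], weights=weights)[0]   # sample parent z first
    x = rng.gauss(means[z], stds[z])              # then child x | z
    return z, x

rng = random.Random(0)
samples = [sample_gmm(rng) for _ in range(2000)]
frac_z1 = sum(z for z, _ in samples) / len(samples)   # close to weights[1]
```

Because every node is sampled after its parents, each draw is an exact joint sample, with no burn-in or rejection required.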
Plate Notation
Plates denote replication of elements
[Figure: two example models drawn with plates]
Example: Linear Gaussian Dynamical System
The latent state evolves according to linear dynamics; observations are linear functions of the state.
Conditional probability model: x_t | x_{t−1} ~ N(A x_{t−1}, Q) and y_t | x_t ~ N(C x_t, R).
Equivalent state-space model (plant equations): x_t = A x_{t−1} + w_t, where w_t ~ N(0, Q) is "white" process noise (state dynamics), and y_t = C x_t + v_t, where v_t ~ N(0, R) is observation noise (measurement model).
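Since the model is a Bayes net, it can also be simulated by ancestral sampling; the 1-D sketch below uses illustrative coefficients and noise scales:

```python
import random

# Hypothetical 1-D system: x_t = a*x_{t-1} + w_t, y_t = c*x_t + v_t.
a, c = 1.0, 1.0
q, r = 0.1, 0.5   # process / measurement noise standard deviations

def simulate(T, rng):
    xs, ys = [], []
    x = 0.0
    for _ in range(T):
        x = a * x + rng.gauss(0.0, q)   # state dynamics (process noise w_t)
        y = c * x + rng.gauss(0.0, r)   # measurement model (noise v_t)
        xs.append(x)
        ys.append(y)
    return xs, ys

xs, ys = simulate(50, random.Random(1))
```

Raising q makes the latent trajectory itself wander more; raising r leaves the trajectory alone but scatters the observations around it.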
Example: Linear Gaussian Dynamical System
[Figure: sampled position trajectories over time, comparing low vs. high measurement noise and low vs. high process noise]
Conditional Independence (Directed)
Not as simple as graph separation in directed graphs…
Two collider examples, X → Z ← Y, over binary RVs:
• X: Robber, Y: Earthquake, Z: House Alarm
• X: Heat On, Y: A/C On, Z: Temperature
"Explaining Away Evidence": observing the common effect Z makes the marginally independent causes X and Y dependent.
Directed separation (d-separation) property indicates conditional independence in directed models.
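The alarm example can be checked numerically; the CPT values below are invented, with X (robber) and Y (earthquake) constructed to be marginally independent:

```python
# Prior probabilities and p(alarm=1 | x, y), all hypothetical.
p_x = {0: 0.99, 1: 0.01}                 # p(robber)
p_y = {0: 0.98, 1: 0.02}                 # p(earthquake)
p_z1 = {(0, 0): 0.001, (0, 1): 0.9,
        (1, 0): 0.95, (1, 1): 0.99}

def joint(x, y, z):
    pz = p_z1[(x, y)]
    return p_x[x] * p_y[y] * (pz if z == 1 else 1.0 - pz)

def p_robber_given(y, z):
    # p(X=1 | Y=y, Z=z) by brute-force enumeration over X.
    den = sum(joint(x, y, z) for x in (0, 1))
    return joint(1, y, z) / den

# The alarm rang: learning there was an earthquake "explains away"
# the evidence for a robber.
no_quake = p_robber_given(0, 1)   # high
quake = p_robber_given(1, 1)      # much lower
```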
Bayes Ball Algorithm
To test if X ⊥ Y | Z, imagine rolling a “ball” from each node in X. The “ball” follows certain rules defined by canonical 3-node subgraphs:
Incoming & outgoing edges (a chain through Y): an observed Y blocks the Bayes ball, acting as a d-separator; an unobserved Y does not block and is not a d-separator.
[Source: Michael I Jordan]
Two outgoing arrows (fork X ← Y → Z): an observed Y blocks; an unobserved Y does not block.
Two incoming arrows (collider X → Y ← Z): an unobserved Y (with no observed descendant) blocks; an observed Y does not block.
If a set Z blocks the ball from every node in X, then X ⊥ Y | Z. Conversely, if a ball reaches any node in Y, then they are not conditionally independent.
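The rules above can be sketched as a reachability test, a hedged version of the Bayes ball / d-separation algorithm. The graph maps each node to its set of children; X and Y are sets of query nodes and Z is the observed set.

```python
from collections import deque

def d_separated(graph, X, Z, Y):
    parents = {v: set() for v in graph}
    for p, cs in graph.items():
        for ch in cs:
            parents[ch].add(p)
    # Ancestors of Z, needed for the two-incoming-arrows (collider) rule.
    anc, stack = set(Z), list(Z)
    while stack:
        for p in parents[stack.pop()]:
            if p not in anc:
                anc.add(p)
                stack.append(p)
    # BFS over (node, arrived-from-child?) states.
    queue = deque((x, True) for x in X)
    seen = set()
    while queue:
        v, from_child = queue.popleft()
        if (v, from_child) in seen:
            continue
        seen.add((v, from_child))
        if v in Y and v not in Z:
            return False                       # a ball reached Y
        if from_child and v not in Z:
            for p in parents[v]:
                queue.append((p, True))        # continue upward
            for ch in graph[v]:
                queue.append((ch, False))      # bounce downward
        elif not from_child:
            if v not in Z:
                for ch in graph[v]:
                    queue.append((ch, False))  # chain continues down
            if v in anc:
                for p in parents[v]:
                    queue.append((p, True))    # collider opened by Z
    return True

# Collider x -> z <- y: marginally separated, but observing z opens the path.
collider = {"x": {"z"}, "y": {"z"}, "z": set()}
```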
Summary
Conditional independence is given by graph separation for undirected models and by d-separation for directed models.
Undirected models may be specified up to normalization; the factorization may not be unique for MRFs.
Directed models are useful as products of locally normalized conditional probabilities. This simplifies simulation via ancestral sampling, but reading off conditional independence is more difficult.