stat 598l probabilistic graphical modelsskirshne/teaching/stat598l_f09/mn.pdf · 2010-01-27 ·...


STAT 598L: Probabilistic Graphical Models

Instructor: Sergey Kirshner

Markov Networks

Motivating Example

• Is there a Bayesian network that is a P-map for {(A ⊥ B │ C, D), (C ⊥ D │ A, B)}?
– No other independencies hold except those obtained by symmetry, so the remaining pairs are dependent (in a P-map)
– Skeleton: the dependent pairs must be adjacent, giving the edges A-C, C-B, B-D, D-A
– Adding directions
  • Without loss of generality, A->C
  • Cannot have B->C (A->C<-B), so C->B
  • Cannot have D->B (C->B<-D), so B->D
  • Cannot have A->D (A->D<-B), so D->A, which closes the directed cycle A->C->B->D->A

[Figure: graph over nodes A, C, B, D]

No BN P-map!
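The argument above can also be checked by brute force. The sketch below is an illustration and not part of the original slides: it enumerates all 16 orientations of the four-cycle skeleton and keeps those that are acyclic and free of v-structures. The helper names are hypothetical.

```python
from itertools import product

# Skeleton from the slide: the four-cycle A-C-B-D-A.
EDGES = [("A", "C"), ("C", "B"), ("B", "D"), ("D", "A")]
NODES = ["A", "B", "C", "D"]

def parents_of(directed):
    """Map each node to the set of its parents."""
    parents = {n: set() for n in NODES}
    for u, v in directed:
        parents[v].add(u)
    return parents

def is_acyclic(directed):
    """Repeatedly remove nodes that have no remaining parents."""
    parents = parents_of(directed)
    remaining = set(NODES)
    while remaining:
        roots = [n for n in remaining if not (parents[n] & remaining)]
        if not roots:
            return False
        remaining -= set(roots)
    return True

def has_v_structure(directed):
    """In a chordless four-cycle the two neighbours of a node are never
    adjacent, so any node with two parents forms a v-structure."""
    return any(len(p) == 2 for p in parents_of(directed).values())

def valid_orientations():
    """Orientations that are DAGs and contain no v-structure."""
    result = []
    for flips in product([False, True], repeat=len(EDGES)):
        directed = [(v, u) if f else (u, v) for (u, v), f in zip(EDGES, flips)]
        if is_acyclic(directed) and not has_v_structure(directed):
            result.append(directed)
    return result

print(len(valid_orientations()))
```

Every v-structure on this skeleton puts a node between its two non-adjacent neighbours (A and B, or C and D), which contradicts one of the two required independencies, so an empty result matches the slide's conclusion.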

Undirected Model

• Is there a different framework that can represent these dependencies?
– What if we had undirected separation instead of d-separation?

[Figure: undirected graph over nodes A, C, B, D]

• Markov networks (Markov random fields, MRFs)
– Represent conditional independence relations with an undirected graph
– Encode functional dependence using potential functions, or factors

Factors

{X1,X2,…,Xn} = set of variables
{Y1,Y2,…,Yk} ⊆ {X1,X2,…,Xn} = subset of variables

Factor: $\phi : \mathrm{Val}(Y_1) \times \mathrm{Val}(Y_2) \times \dots \times \mathrm{Val}(Y_k) \to \mathbb{R}^{+}$, with $\mathrm{scope}[\phi] = \{Y_1, Y_2, \dots, Y_k\}$

Joint probability = normalized product of factors

Factor = measure of relationship for a group of variables

Example

[Equation: Gibbs distribution, with its normalization constant (partition function)]
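As a minimal sketch of a Gibbs distribution, the code below multiplies four pairwise factors with scopes {A,C}, {A,D}, {B,C}, {B,D}, normalizes by the partition function Z, and checks numerically that A ⊥ B │ C, D holds. The factor values are made up for illustration and the function names are hypothetical; none of them come from the slides.

```python
from itertools import product

# Hypothetical positive factor tables over binary variables.
phi_ac = {(0, 0): 30.0, (0, 1): 5.0, (1, 0): 1.0, (1, 1): 10.0}
phi_ad = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}
phi_bc = {(0, 0): 1.0, (0, 1): 100.0, (1, 0): 100.0, (1, 1): 1.0}
phi_bd = {(0, 0): 100.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 100.0}

def unnormalized(a, b, c, d):
    """Product of the four factors for one joint assignment."""
    return phi_ac[a, c] * phi_ad[a, d] * phi_bc[b, c] * phi_bd[b, d]

states = list(product([0, 1], repeat=4))
Z = sum(unnormalized(*s) for s in states)      # partition function
P = {s: unnormalized(*s) / Z for s in states}  # Gibbs distribution

def max_ci_violation():
    """Largest |P(a,b|c,d) - P(a|c,d) P(b|c,d)| over all assignments."""
    worst = 0.0
    for c, d in product([0, 1], repeat=2):
        p_cd = sum(P[a, b, c, d] for a in (0, 1) for b in (0, 1))
        for a, b in product([0, 1], repeat=2):
            p_a = sum(P[a, bb, c, d] for bb in (0, 1)) / p_cd
            p_b = sum(P[aa, b, c, d] for aa in (0, 1)) / p_cd
            worst = max(worst, abs(P[a, b, c, d] / p_cd - p_a * p_b))
    return worst

print(Z, max_ci_violation())
```

The violation is zero up to floating-point error because, once C and D are fixed, the product splits into a term that depends only on A and a term that depends only on B.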

Example (continued)

How many free parameters? 3+3+3+3=12

Factors and Free Parameters

• For this analysis, stick to binary variables
• Each factor over k variables has 2^k − 1 free parameters
• Assume all factors are of the same size k
– C(n,k) possible factors, which is O(n^k)
– Total of O(n^k 2^k) free parameters
– Compare to O(2^n) for a full table
• Conclusion: even with fairly large factors, there are fewer free parameters than in the full table
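To make the comparison concrete, here is a small sketch that evaluates both counts for binary variables. The function names are hypothetical, and the factor count is the slide's upper bound (one factor for every size-k subset), not a count for any particular network.

```python
from math import comb

def factor_params(n, k):
    """Upper bound from the slide: every size-k subset of n binary variables
    gets a factor with 2^k - 1 free parameters."""
    return comb(n, k) * (2 ** k - 1)

def full_table_params(n):
    """A full joint table over n binary variables has 2^n - 1 free parameters."""
    return 2 ** n - 1

for k in (2, 3, 5):
    print(k, factor_params(20, k), full_table_params(20))
```

For n = 20 the factor-based count stays below the full-table count for each of these k.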

BNs: Special Case

• A Bayesian network is a Gibbs distribution whose factors are its CPDs P(Xi │ Pa(Xi)), with partition function Z = 1

Factor Operations: Product

X=x  Y=y  φ1(x,y)
1    1    0.4
1    0    0.7
0    1    1
0    0    0.8

Y=y  Z=z  φ2(y,z)
1    1    0.3
1    0    0.9
0    1    0.5
0    0    1

X=x  Y=y  Z=z  φ12(x,y,z)
1    1    1    0.12
1    1    0    0.36
1    0    1    0.35
1    0    0    0.7
0    1    1    0.3
0    1    0    0.9
0    0    1    0.4
0    0    0    0.8
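The product table is obtained entry by entry as φ12(x,y,z) = φ1(x,y) · φ2(y,z). A minimal sketch of this operation for binary variables follows, using the numbers from the tables above. The dictionary representation and the function name `factor_product` are illustrative choices and not something the slides prescribe.

```python
from itertools import product

# Factor tables copied from the slide, keyed by assignments to the scope.
phi1 = {(1, 1): 0.4, (1, 0): 0.7, (0, 1): 1.0, (0, 0): 0.8}  # scope (X, Y)
phi2 = {(1, 1): 0.3, (1, 0): 0.9, (0, 1): 0.5, (0, 0): 1.0}  # scope (Y, Z)

def factor_product(scope_a, table_a, scope_b, table_b):
    """Multiply two factors over binary variables; returns (scope, table)."""
    scope = list(scope_a) + [v for v in scope_b if v not in scope_a]
    table = {}
    for values in product([0, 1], repeat=len(scope)):
        assignment = dict(zip(scope, values))
        key_a = tuple(assignment[v] for v in scope_a)
        key_b = tuple(assignment[v] for v in scope_b)
        table[values] = table_a[key_a] * table_b[key_b]
    return scope, table

scope12, phi12 = factor_product(("X", "Y"), phi1, ("Y", "Z"), phi2)
for values in sorted(phi12, reverse=True):
    print(values, round(phi12[values], 2))
```

The shared variable Y is matched between the two tables, so the result has scope (X, Y, Z) and eight entries.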

Conditional Independence?

• What about {a,c}, {a,d}, {b,c}, and {b,d}?
– They cannot be made independent!
– Edges connect variables in the same scope
– Resulting graph = Markov network

[Figure: Markov network over nodes A, C, B, D]

Factorization: Formal Definition

• Given: Gibbs distribution P with non-negative factors Φ={φ1,…,φK}, and a Markov network H
• P factorizes over H: the scope of every factor corresponds to a complete subgraph of H

[Figure: two graphs over nodes A, C, B, D]

Factorization

• The collection of factors is not unique
– Are the scopes {A,B}, {A,C}, and {B,C}, or is it just {A,B,C}?
– Networks can obscure the scopes (structure) of the original factors

[Figure: graph over nodes A, C, B]

Graphical Model

Graphical Model = Graph + Parameters

Bayesian network = parents in chain decomposition + conditional probability distributions

Markov network = variables in factors + factors

Undirected vs Directed Model

• Bayesian networks:
– DAG => dimensionality reduction with the chain rule for probability (simple justification)
– Possible causal dependence (interpretation of the edge directions)
– Parameters are interpretable
– Represented independencies depend on the order of variables (drawback)
• Undirected model:
– No ordering to consider! (Fewer objects, one less uncertainty to worry about)
– Intuition using exponential models (later in the course)
– Difficult to interpret (and to elicit) the parameters

Representational Power: BN vs MN

• Can Bayesian networks represent all independencies from Markov networks?
– No: {(A ⊥ B │ C, D), (C ⊥ D │ A, B)}
• Can Markov networks represent all independencies from Bayesian networks?
– No: A -> B <- C
• What is the overlap?
– Later

Graph Separation

• Need to establish conditional independence from undirected graph properties
• Active path = none of the intermediate variables are observed
• No active paths = separation
• Monotonic: adding observed variables can only reduce active paths
• Set of global independencies (global Markov property): I(H) = {(X ⊥ Y │ Z) : Z separates X from Y in H}

[Figure: graph over nodes A, C, B, D, E, with one path marked as blocked]
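Separation in an undirected graph can be tested with a breadth-first search that refuses to pass through observed nodes. The sketch below assumes an adjacency-set representation and uses the four-cycle A-C-B-D-A as a test graph. The function name `separated` is hypothetical.

```python
from collections import deque

def separated(adj, xs, ys, zs):
    """True if every path from xs to ys passes through an observed node in zs."""
    blocked = set(zs)
    frontier = deque(x for x in xs if x not in blocked)
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in ys:
            return False  # found an active path
        for nbr in adj[node]:
            if nbr not in seen and nbr not in blocked:
                seen.add(nbr)
                frontier.append(nbr)
    return True

# The four-cycle A-C-B-D-A as adjacency sets.
cycle = {"A": {"C", "D"}, "B": {"C", "D"}, "C": {"A", "B"}, "D": {"A", "B"}}

print(separated(cycle, {"A"}, {"B"}, {"C", "D"}))
print(separated(cycle, {"A"}, {"B"}, {"C"}))
```

Observing only C leaves the path A-D-B active, while observing both C and D blocks every path, which illustrates the monotonicity noted on the slide.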

Representation Theorem for BNs

P factorizes according to G <=> each variable is independent of its non-descendants given its parents (local Markov assumption)

[Diagram labels: graph structure, independencies]

Representation Theorem for MNs

P factorizes according to H => global independencies set by scopes of factors (global Markov property)

Converse: ?

[Figure: graph over nodes A, C, B, D, E]

Representation Theorem for MNs

P factorizes according to H => global independencies set by scopes of factors

• Proof: need to show that if C separates A from B in H, then (A ⊥ B │ C) holds in P
– Case 1: assume A∪B∪C contains all the variables
  • Partition the factor scopes Di so that either Di⊆A∪C or Di⊆B∪C (possible because no edge connects A and B)
  • Then P is a product of a function of A∪C and a function of B∪C, so (A ⊥ B │ C)

[Figure: node sets A, B, C]

Representation Theorem for MNs

• Proof (continued)
– Case 2: A∪B∪C does not contain all the variables
  • Split the remaining variables into U1 and U2 so that C separates A∪U1 from B∪U2, apply Case 1, and marginalize out U1 and U2

[Figure: node sets A, B, C with the remaining variables U1 and U2]

Converse?

Global Markov property => P factorizes according to H?

• Think xor

[Figure: graph over nodes A, C, B, D, E]

Hammersley-Clifford Theorem

If P is positive and satisfies the global Markov property for H, then P factorizes according to H

[Figure: graph over nodes A, C, B, D, E]

Completeness of separation

Active trail between X and Y given Z => X and Y are dependent given Z in some P that factorizes according to H

• Interpreting the statement
• Sketch of proof (by construction):
– All factors not in the trail are uniform (remove nodes and edges not in the trail)
– Make the remaining factors almost deterministic

More General Result

• Soundness
• Completeness (almost)

Intuition: two binary variables X and Y; 3-d space of possible factors with a 2-d manifold for independence

[Figure: nodes X, Y]

Representation Theorem for BNs

P factorizes according to G <=> each variable is independent of its non-descendants given its parents (local Markov assumption)

[Diagram labels: graph structure, independencies]

Other Ways to Encode Independence

• Local Markov independence (Markov blanket): $(X \perp \mathcal{X} - \{X\} - \mathrm{MB}_H(X) \mid \mathrm{MB}_H(X))$ for every variable $X$, where $\mathrm{MB}_H(X)$ is the set of neighbors of $X$ in $H$
• Pairwise Markov independencies: $(X \perp Y \mid \mathcal{X} - \{X, Y\})$ for every pair $X$, $Y$ not connected by an edge in $H$

Relation Between Independencies

global => local => pairwise

• global => local: two separated nodes are also separated by the neighbors of either node
• local => pairwise: variables corresponding to non-adjacent nodes are conditionally independent given the variables corresponding to the neighbors
– They are also conditionally independent given the rest of the variables (monotonic)

Converse

pairwise => global

• For all disjoint A, B, and C: if C separates A from B in H, then (A ⊥ B │ C)
– Induction on size of C
  • |C|=n-2: [equations not recovered]
  • |C|=k-1<n-2, case I: [equations not recovered]

Converse

pairwise => global

• For all disjoint A, B, and C: if C separates A from B in H, then (A ⊥ B │ C)
– Induction on size of C
  • |C|=k-1<n-2, case II: [equations not recovered]
  • Assume |A|=|B|=1, otherwise approach as in case I

Equivalence

• Given P is positive, the following are equivalent:
– Global Markov property
– Local Markov property
– Pairwise Markov property

How To Recover MNs from Distribution

• If P is positive
– For every pair A, B, check whether A ⊥ B | X-A-B, and connect A and B if it does not hold, or
– For every A, find the smallest C such that A ⊥ X-A-C | C, and connect A to every node in C
  • C=MB_P(A) (Markov blanket)
– In both cases, the graph is a minimal I-map of P
– Graphs are the same: such an I-map is unique!
• If P is not positive
– No guarantee that the resulting graph is an I-map of P
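The pairwise construction can be sketched by exact enumeration on a small example. The code below assumes a hypothetical positive distribution built from made-up pairwise factors on the four-cycle A-C-B-D-A and recovers the edges by testing each pair for independence given all remaining variables. With real data the exact test would be replaced by a statistical one.

```python
from itertools import combinations, product

NAMES = ["A", "B", "C", "D"]

def weight(a, b, c, d):
    """Hypothetical unnormalized measure: neighbours on the cycle prefer to agree."""
    agree = lambda u, v: 3.0 if u == v else 1.0
    return agree(a, c) * agree(c, b) * agree(b, d) * agree(d, a)

STATES = list(product([0, 1], repeat=4))
Z = sum(weight(*s) for s in STATES)
P = {s: weight(*s) / Z for s in STATES}

def independent_given_rest(i, j, tol=1e-9):
    """Check X_i ⊥ X_j | all other variables by exact enumeration."""
    rest_idx = [k for k in range(4) if k not in (i, j)]
    for rest in product([0, 1], repeat=2):
        def p(xi, xj):
            s = [0] * 4
            s[i], s[j] = xi, xj
            for k, v in zip(rest_idx, rest):
                s[k] = v
            return P[tuple(s)]
        total = sum(p(xi, xj) for xi in (0, 1) for xj in (0, 1))
        for xi, xj in product([0, 1], repeat=2):
            p_i = (p(xi, 0) + p(xi, 1)) / total
            p_j = (p(0, xj) + p(1, xj)) / total
            if abs(p(xi, xj) / total - p_i * p_j) > tol:
                return False
    return True

def recover_edges():
    """Connect X_i and X_j unless they are independent given the rest."""
    return {frozenset((NAMES[i], NAMES[j]))
            for i, j in combinations(range(4), 2)
            if not independent_given_rest(i, j)}

print(sorted(sorted(e) for e in recover_edges()))
```

Because this P is positive, the recovered graph is the four-cycle used to build it. For a non-positive P the same procedure could return a graph that is not an I-map.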

Finding P-maps

• If a P-map exists
– Find a minimal I-map
– It is also a P-map!
• Does it always exist?
– Think v-structure

Alternative Parametrizations

• Structure of the Markov network may hide the scopes of the factors
– Think complete graph: is it one factor with all variables in the scope, or a product of factors with pairs of variables in the scope?
• May want to make the factorization more explicit in the structure

Factor Graphs

• Bipartite graph: variables vs factors

[Figure: Markov network over nodes A, C, B, D and a corresponding factor graph]

Log-Linear Model

• Product into a sum
• Convert factors into a finer set of features
  • Break down factors further (context)
• Different features may share the same scope

[Equation labels: energy functions; weights, features]
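The slide's equations did not survive extraction. As a hedged reconstruction, one common way to write the log-linear form for positive factors is shown below. The sign convention (a minus sign in the exponent) and the symbols $\epsilon_i$, $w_j$, $f_j$ are assumptions and may differ from what the slide used.

```latex
\begin{align}
P(x) &= \frac{1}{Z}\prod_{i}\phi_i(D_i)
      = \frac{1}{Z}\exp\Big(-\sum_{i}\epsilon_i(D_i)\Big),
      & \epsilon_i(D_i) &= -\ln\phi_i(D_i) \\
P(x) &= \frac{1}{Z}\exp\Big(-\sum_{j} w_j f_j(D_j)\Big)
\end{align}
```

Here each energy function $\epsilon_i$ turns a factor in the product into a term in a sum, and the second line splits the energies further into weighted features $f_j$, several of which may share a scope $D_j$.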

Ising Model

Binary x_i's

Source: http://www.cis.upenn.edu/~jshi/GraphTutorial/

Recap

• Parameterizations for Markov networks
– Features
– Overparameterizations
– How many parameters are free?
– Canonical parameterization

Plan

• Proof of Hammersley-Clifford theorem (if there is interest)
• Justification for Markov networks using Maximum Entropy principle (later)
• Relating Bayesian and Markov networks
– Proof of soundness theorem for Bayesian networks
– Determining which Markov networks are P-maps for which Bayesian networks

Information Theory

• P(X) encodes our uncertainty about X
– Some variables are more uncertain than others
– How can we quantify this intuition?
  • Entropy: average number of bits required to encode X

$H(X) = E\left[\log \frac{1}{P(x)}\right] = \sum_x P(x) \log \frac{1}{P(x)}$

• Entropy is maximized when X is uniform

[Figure: two example distributions, P(X) and P(Y)]

From Carlos Guestrin’s 10-708 Probabilistic Graphical Models, Fall 2008, at CMU
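A short sketch of the entropy computation follows, using base-2 logarithms so that the result is in bits. The function name and the example distributions are illustrative and do not come from the slides.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return sum(p * log2(1.0 / p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
print(entropy(uniform), entropy(skewed))
```

The uniform distribution over four outcomes reaches the maximum of two bits, and any other distribution over the same outcomes has lower entropy.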

Maximum Entropy Principle

• Given everything else the same, pick the distribution with the maximum entropy
– Closest to uniform
• Example: ¾ of kangaroos are left-handed and ¾ drink Foster’s
– Want to reconstruct the full probability table knowing only p11+p12=0.75 and p11+p21=0.75
– Have 3 free parameters and only 2 constraints, leaving 1 free parameter

p11  p12
p21  p22

MaxEnt Principle Continued

• Since we are not given that left-handedness is correlated with drinking Foster’s, ideally we do not want to introduce the correlation into the model
• Which objective function to maximize?
• Entropy is (the only) such function
– Want to maximize H_P(X) subject to the constraints p11+p12=0.75 and p11+p21=0.75

Gull S.F., Skilling J. (1984), “The Maximum Entropy Method,” in Indirect Imaging

Direct Solution

Left-handedness is independent of drinking Foster’s!
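The kangaroo example can be checked numerically. With the two constraints imposed, the table has a single free parameter t = p11, so a simple grid search over t is enough. This is only a sketch of the idea and not the slide's derivation, and the names `table` and `best_t` are hypothetical.

```python
from math import log

def entropy(probs):
    """Entropy in nats; zero-probability cells contribute nothing."""
    return -sum(p * log(p) for p in probs if p > 0)

def table(t):
    """Joint table with p11 = t satisfying p11+p12 = 0.75 and p11+p21 = 0.75."""
    return [t, 0.75 - t, 0.75 - t, t - 0.5]  # p11, p12, p21, p22

# p22 >= 0 forces t >= 0.5, and p12 >= 0 forces t <= 0.75.
STEPS = 10000
candidates = [0.5 + 0.25 * i / STEPS for i in range(STEPS + 1)]
best_t = max(candidates, key=lambda t: entropy(table(t)))

print(best_t, table(best_t))
```

The maximizer is t = 9/16 = 0.75 × 0.75, so the maximum entropy table is the product of its marginals, which is the independence conclusion stated on the slide.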

Round-about Solution

• Constraints = Lagrange multipliers

Round-about Solution

• The solution is a log-linear model!
• How to find the weights?
– Plug in the log-linear model for P(x) and maximize F(x)
– Or, satisfy the constraints

MaxEnt in a More General Setting

• Given a set of constraints
– General solution to the MaxEnt formulation is a log-linear model
• Log-linear model is an approximation to a distribution that preserves some properties (constraints) while making the distribution as close to uniform as possible
– Duality between constraints and weights
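The equations on these slides were not recovered. The derivation below is a standard sketch of why the MaxEnt solution is log-linear, under the assumption that the constraints are expectation constraints on features $f_i$ with target values $c_i$. The symbols $F$, $\lambda_i$, $\mu$, and $c_i$ are chosen here for illustration.

```latex
\begin{align}
\max_{P}\; & H(P) = -\sum_x P(x)\ln P(x)
  \quad\text{s.t.}\quad \sum_x P(x) f_i(x) = c_i \;\; (i=1,\dots,m),
  \qquad \sum_x P(x) = 1 \\
F(P,\lambda,\mu) &= -\sum_x P(x)\ln P(x)
  + \sum_{i=1}^{m}\lambda_i\Big(\sum_x P(x) f_i(x) - c_i\Big)
  + \mu\Big(\sum_x P(x) - 1\Big) \\
\frac{\partial F}{\partial P(x)} &= -\ln P(x) - 1
  + \sum_{i=1}^{m}\lambda_i f_i(x) + \mu = 0 \\
P(x) &= \frac{1}{Z}\exp\Big(\sum_{i=1}^{m}\lambda_i f_i(x)\Big),
  \qquad Z = \sum_{x}\exp\Big(\sum_{i=1}^{m}\lambda_i f_i(x)\Big)
\end{align}
```

Each constraint contributes one Lagrange multiplier, which plays the role of a feature weight (up to the sign convention chosen for the exponent). The multipliers are then fixed by requiring that the constraints hold.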

Soundness of d-separation

For all P that factorize according to G (G is an I-map for P; G is a BN structure for P):

d-separation in G => conditional independence in P

[Diagram labels: local graph property, global separation property]

Proof Outline

• Given evidence, convert the Bayesian network into an equivalent Markov network
– Construct such a network
– Show that it is an equivalent Markov network
• Use the separation property of the Markov network to prove the theorem

Constructing MNs from BNs

[Figure: BN G over nodes A, C, B, D, E and its moralized graph H; labels: I-map, minimal I-map]
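Moralization keeps every edge of the DAG without its direction and connects every pair of parents that share a child. The sketch below assumes the DAG is given as a parent list per node. The example graph is hypothetical, since the edges in the slide's figure were not recoverable.

```python
from itertools import combinations

def moralize(parents):
    """Return the undirected edges of the moral graph of a DAG.

    `parents` maps each node to the list of its parents.
    """
    edges = set()
    for child, pas in parents.items():
        for p in pas:
            edges.add(frozenset((p, child)))  # drop the edge direction
        for p, q in combinations(pas, 2):
            edges.add(frozenset((p, q)))      # connect co-parents
    return edges

# Hypothetical DAG: A -> C <- B, C -> D, C -> E.
dag = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"], "E": ["C"]}
moral = moralize(dag)
print(sorted(sorted(e) for e in moral))
```

The only added edge is A-B, which covers the v-structure A -> C <- B. The independence of A and B that the DAG encodes is lost in the moral graph, so the result is an I-map but in general not a P-map.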

Constructing MNs from BNs with Evidence

[Figure: BN G over nodes A, C, B, D, E and its moralized graph H]

P-map for Moral Graphs

[Figure: moral graph G over nodes A, C, B, D, E and its moralized graph H; label: minimal I-map]

Proof: pick an active (minimal) trail in G. Show it is in H.

Two cases:
• Trail has no v-structures: no marked nodes, so the same trail is in H
• Trail has v-structures: the v-structure is covered, so the trail is not minimal (contradiction)

Soundness for d-separation

• What if the graph is not moral?
– What if immoralities did not matter?
– They matter only if the effect or one of its descendants is in the evidence
• Only consider the subgraphs for which immoralities have a descendant in the evidence
– Upward closure of evidence nodes

Upward Closure and Its MN

[Figure: BN G over nodes A, C, B, D, E, with E marked as a barren node; its upward closure G’ over nodes A, C, B, D; and the Markov network H over nodes A, C, B, D]

Exercise 3.8: BN(G’) agrees with BN(G) over nodes of G’

Soundness of d-separation

For all P that factorize according to G (G is a BN structure for P): d-separation in G => conditional independence in P

• Consider X and Y d-separated by Z
• Build an upward closure for X∪Y∪Z
• d-separation is equivalent to separation in H
• Separation in H implies conditional independence

[Figure: BN G over nodes A, C, B, D, E and the Markov network H over nodes A, C, B, D]
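The proof steps also give a procedure for testing d-separation: restrict the DAG to the upward closure of X∪Y∪Z, moralize it, and test ordinary graph separation. A minimal sketch follows, assuming the DAG is given as a parent list per node. The function names and the example DAG are hypothetical.

```python
from collections import deque
from itertools import combinations

def upward_closure(parents, nodes):
    """All nodes in `nodes` together with their ancestors."""
    closed, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in closed:
            closed.add(n)
            stack.extend(parents[n])
    return closed

def d_separated(parents, xs, ys, zs):
    """Test d-separation of xs and ys given zs via the moralized upward closure."""
    keep = upward_closure(parents, set(xs) | set(ys) | set(zs))
    adj = {n: set() for n in keep}
    for child in keep:
        pas = parents[child]
        for p in pas:                      # undirected copy of each edge
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(pas, 2):  # moralize: connect co-parents
            adj[p].add(q)
            adj[q].add(p)
    # Ordinary separation in H: search from xs while avoiding zs.
    blocked = set(zs)
    frontier = deque(x for x in xs if x not in blocked)
    seen = set(frontier)
    while frontier:
        n = frontier.popleft()
        if n in ys:
            return False
        for m in adj[n] - seen - blocked:
            seen.add(m)
            frontier.append(m)
    return True

# Hypothetical DAG: A -> C <- B, C -> D.
dag = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
print(d_separated(dag, {"A"}, {"B"}, set()))
print(d_separated(dag, {"A"}, {"B"}, {"D"}))
```

With no evidence the closure contains only A and B, so the v-structure never enters the graph and A and B are separated. Observing D pulls C into the closure, moralization adds the edge A-B, and the two become connected.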

From Markov Networks to Bayesian Networks

• As seen before, Markov networks cannot represent immoralities
• Can show that if a Bayesian network G is a minimal I-map for some Markov network structure H, it contains no immoralities
• No immoralities = every three nodes forming a v-structure are covered
• Undirected cycle of length >3 = v-structure
– Must have a chord
• All minimal BN I-maps of Markov networks are chordal
– No BN P-map exists for a non-chordal MN

Markov Networks: Summary

• Mass/density = normalized product of factors
• Represent conditional independence with independence graphs
– Conditional independence = separation in the graph
– Global separation = local separation (Markov blanket) = pairwise separation, all in positive distributions
• Interpretation: closest to uniform under constraints specified by features
– Scope of features determines the structure of the graph (representation theorem)
• Relationship between Markov and Bayesian networks
– MNs cannot represent v-structures of BNs
– BNs cannot represent chordless loops of MNs
– Chordal graphs can be represented (as P-maps) by both