learning equivalence classes of bayesian-network structures

Learning Equivalence Classes of Bayesian-Network Structures. David M. Chickering. Presented by Dmitry Zinenko.


Post on 23-Feb-2016



TRANSCRIPT

Page 1: Learning  Equivalence Classes  of Bayesian-Network Structures

Learning Equivalence Classes of Bayesian-Network Structures

David M. Chickering

Presented by Dmitry Zinenko

Page 2: Learning  Equivalence Classes  of Bayesian-Network Structures

Heuristic Search

We are looking for the best state in the search space.
  Naively: state = a particular DAG; search space = all possible DAGs over our variables

Move between related states using search operators.
  Naively: edge addition, removal, and reversal

Page 3: Learning  Equivalence Classes  of Bayesian-Network Structures

Heuristic Search Challenges

Search space graph should be well-connected
  To reach good states quickly
  To avoid local maxima

Search space graph should not be too dense
  Computationally efficient scoring and transformations

Page 4: Learning  Equivalence Classes  of Bayesian-Network Structures

Equivalence

G1 and G2 are equivalent if the set of distributions that can be represented by them is identical

Equivalence is an equivalence relation!

[Figure: example graphs over X and Y]

Page 5: Learning  Equivalence Classes  of Bayesian-Network Structures

Score Equivalence

If all we care about is the probability distribution, all we need is the equivalence class

The scoring function should give equal scores to structures from the same class. Such a function is called score equivalent.

Why prefer one representation of the class to another?

Page 6: Learning  Equivalence Classes  of Bayesian-Network Structures

Equivalence Classes Are Good For You

We are ultimately looking for a probability representation, not a particular DAG

Searching individual DAGs is bad:
  Some operators lead to a DAG in the same class
  Efficiency suffers
  Bad state connectivity for greedy search

Page 7: Learning  Equivalence Classes  of Bayesian-Network Structures

Theorem 1 (Verma & Pearl 1990)

Two DAGs are equivalent if and only if they have the same skeleton and the same v-structures

[Figure: example DAGs over X, Y and Z illustrating skeletons and v-structures]
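Theorem 1 turns equivalence into a purely graphical test, which can be sketched in a few lines. This is a minimal illustration, assuming a DAG is given as a set of `(parent, child)` pairs over a shared node set; the function names `skeleton`, `v_structures` and `equivalent` are illustrative and not taken from the presentation.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a set of directed edges (parent, child)."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """All triples (a, c, b) with a -> c <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for p, c in edges:
        parents.setdefault(c, set()).add(p)
    found = set()
    for child, pas in parents.items():
        for a, b in combinations(sorted(pas), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def equivalent(edges1, edges2):
    """Theorem 1: same skeleton and same v-structures."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))

chain = {("X", "Y"), ("Y", "Z")}           # X -> Y -> Z
reversed_chain = {("Z", "Y"), ("Y", "X")}  # X <- Y <- Z
collider = {("X", "Y"), ("Z", "Y")}        # X -> Y <- Z

print(equivalent(chain, reversed_chain))   # same skeleton, no v-structures
print(equivalent(chain, collider))         # collider adds a v-structure
```

The two chains share a skeleton and have no v-structures, so they are equivalent; the collider has the same skeleton but introduces the v-structure X → Y ← Z.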

Page 8: Learning  Equivalence Classes  of Bayesian-Network Structures

Partially Directed Acyclic Graph

A directed edge is called compelled in G if, for every G’ equivalent to G, that edge has the same direction

Otherwise we call it reversible

Partially Directed Acyclic Graph (PDAG)
  Contains both directed and undirected edges
  Does not contain any directed cycles

Theorem 1 extends naturally to PDAGs
  A DAG is also a PDAG

Page 9: Learning  Equivalence Classes  of Bayesian-Network Structures

CPDAG and Consistent Extension

Completed PDAG (CPDAG) for Class(G) contains
  directed edges for the compelled edges of G
  undirected edges for the reversible edges of G

G is a consistent extension of P if
  G has the same skeleton and v-structures as P
  Every directed edge in P has the same orientation in G

[Figure: example PDAGs over X, Y, Z and over X, Y, W, Z]

Page 10: Learning  Equivalence Classes  of Bayesian-Network Structures

CPDAGs And Equivalence

Every consistent extension of P is in Class(P)

If Pc is a completed PDAG, then every DAG G in Class(Pc) is a consistent extension of Pc

If P1 and P2 are completed PDAGs that admit a consistent extension, then P1=P2 if and only if Class(P1)=Class(P2)
  A completed PDAG uniquely represents its equivalence class

Page 11: Learning  Equivalence Classes  of Bayesian-Network Structures

DAG to CPDAG (Meek 1995)

Undirect all edges except those that participate in v-structures

Repeatedly direct (mark as compelled) undirected edges that match particular patterns, until no pattern applies

[Figure: the edge patterns over X, Y, Z (and W) that force an orientation]
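The two steps above can be sketched as follows. The slide's pattern figure did not survive extraction, so the three patterns in this sketch are an assumption: they are the orientation rules usually attributed to Meek (1995), commonly called R1 to R3. The function name `dag_to_cpdag` and the edge representation (directed edges as `(parent, child)` tuples, undirected edges as `frozenset` pairs) are illustrative choices.

```python
from itertools import combinations

def dag_to_cpdag(nodes, edges):
    """Return (directed, undirected) edge sets of the completed PDAG of a DAG."""
    edges = set(edges)
    adj = {frozenset(e) for e in edges}
    adjacent = lambda a, b: frozenset((a, b)) in adj
    parents = {n: {p for p, c in edges if c == n} for n in nodes}

    # Step 1: keep directions only on edges that take part in a v-structure.
    directed = set()
    for c in nodes:
        for a, b in combinations(sorted(parents[c]), 2):
            if not adjacent(a, b):
                directed |= {(a, c), (b, c)}
    undirected = {frozenset(e) for e in edges if e not in directed}

    def forced(a, b):
        """True if some pattern forces the undirected edge a - b to be a -> b."""
        others = [n for n in nodes if n not in (a, b)]
        # R1 (assumed): c -> a, and c not adjacent to b
        if any((c, a) in directed and not adjacent(c, b) for c in others):
            return True
        # R2 (assumed): a -> c -> b
        if any((a, c) in directed and (c, b) in directed for c in others):
            return True
        # R3 (assumed): a - c, a - d, c -> b, d -> b, c and d not adjacent
        for c, d in combinations(others, 2):
            if (frozenset((a, c)) in undirected and frozenset((a, d)) in undirected
                    and (c, b) in directed and (d, b) in directed
                    and not adjacent(c, d)):
                return True
        return False

    # Step 2: apply the patterns until nothing changes.
    changed = True
    while changed:
        changed = False
        for e in list(undirected):
            a, b = tuple(e)
            for u, v in ((a, b), (b, a)):
                if forced(u, v):
                    undirected.discard(e)
                    directed.add((u, v))
                    changed = True
                    break
    return directed, undirected

# X -> Y <- Z with Y -> W: the v-structure compels Y -> W as well.
d, u = dag_to_cpdag(["W", "X", "Y", "Z"], [("X", "Y"), ("Z", "Y"), ("Y", "W")])
print(sorted(d), u)
```

In the example, the edge Y → W is not part of the v-structure, but leaving it reversible would allow W → Y and create a new v-structure, so the first pattern marks it as compelled.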

Page 12: Learning  Equivalence Classes  of Bayesian-Network Structures

Constructing Consistent Extension (I)

“Theorem 26”: The undirected components of a CPDAG are chordal
  In any cycle of length >3 in a DAG, there must be a v-structure!

Let {Ki} be the set of undirected components of a completed PDAG Pc. Let {Gi} be consistent extensions of {Ki}

A graph G that results from replacing each reversible edge in Ki with the corresponding directed edge from Gi is a consistent extension of Pc

Page 13: Learning  Equivalence Classes  of Bayesian-Network Structures

Constructing Consistent Extension (II)

Use decreasing maximum cardinality search (dMCS) to direct the edges in each of the chordal components
  Property of dMCS: every path between any pair of non-adjacent x, y contains a node numbered higher than x or y

Resulting graph is a consistent extension of Pc

Works only on completed PDAGs
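A rough sketch of the idea for a single chordal component follows. It uses plain maximum cardinality search and orients every edge from the earlier-visited endpoint to the later-visited one; the numbering convention of the slide's "decreasing" variant may differ, so treat the ordering direction as an assumption. On a chordal graph, the already-visited neighbors of each newly visited node form a clique, so no v-structure is created. The names `mcs_orient` and `has_v_structure` are illustrative.

```python
from itertools import combinations

def mcs_orient(nodes, undirected_edges):
    """Orient a chordal undirected graph along a maximum cardinality search order."""
    nbrs = {n: set() for n in nodes}
    for a, b in undirected_edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    order, visited = [], set()
    while len(order) < len(nodes):
        # Visit next the unvisited node with the most already-visited neighbors.
        best = max((n for n in nodes if n not in visited),
                   key=lambda n: len(nbrs[n] & visited))
        order.append(best)
        visited.add(best)
    rank = {n: i for i, n in enumerate(order)}
    # Earlier-visited endpoint becomes the parent, so the result is acyclic.
    return {(a, b) if rank[a] < rank[b] else (b, a) for a, b in undirected_edges}

def has_v_structure(directed):
    """True if some node has two parents that are not adjacent."""
    skel = {frozenset(e) for e in directed}
    parents = {}
    for p, c in directed:
        parents.setdefault(c, set()).add(p)
    return any(frozenset((a, b)) not in skel
               for pas in parents.values()
               for a, b in combinations(sorted(pas), 2))

# Triangle A-B-C with a pendant edge C-D (a chordal graph).
component = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
oriented = mcs_orient(["A", "B", "C", "D"], component)
print(sorted(oriented), has_v_structure(oriented))
```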

Page 14: Learning  Equivalence Classes  of Bayesian-Network Structures

PDAG-to-DAG (Dor & Tarsi 1992)

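Select a node x in P such that
  x has no outgoing directed edges
  every vertex joined to x by an undirected edge is adjacent to all other vertices adjacent to x

Direct all edges (x―y) toward x, so that x becomes a sink

Remove x and its incident edges from P, and repeat; if no such node exists, P has no consistent extension

Works on any PDAG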
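The procedure can be sketched as below. The node-selection test used here is the condition as it is usually stated for the Dor and Tarsi algorithm (x is a sink, and each undirected neighbor of x is adjacent to every other vertex adjacent to x); that reading, the function name `pdag_to_dag`, and the edge representation are assumptions of this sketch.

```python
def pdag_to_dag(nodes, directed, undirected):
    """Return a consistent extension as a set of (parent, child) edges, or None.

    directed: set of (a, b) tuples; undirected: set of frozenset({a, b}).
    """
    dir_e, und_e = set(directed), set(undirected)
    result = set(directed)
    remaining = set(nodes)

    def adjacent(u, v):
        return frozenset((u, v)) in und_e or (u, v) in dir_e or (v, u) in dir_e

    while remaining:
        for x in sorted(remaining):
            if any(a == x for a, _ in dir_e):
                continue  # x still has an outgoing edge, so it is not a sink
            und_nbrs = {next(iter(e - {x})) for e in und_e if x in e}
            all_adj = und_nbrs | {a for a, b in dir_e if b == x}
            if all(adjacent(y, z) for y in und_nbrs for z in all_adj if z != y):
                break  # x can be turned into a sink without a new v-structure
        else:
            return None  # no eligible node: no consistent extension exists
        result |= {(y, x) for y in und_nbrs}      # direct x - y as y -> x
        dir_e = {(a, b) for a, b in dir_e if x not in (a, b)}
        und_e = {e for e in und_e if x not in e}
        remaining.remove(x)
    return result

chain = {frozenset(("X", "Y")), frozenset(("Y", "Z"))}
print(pdag_to_dag(["X", "Y", "Z"], set(), chain))

square = {frozenset(p) for p in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]}
print(pdag_to_dag(["A", "B", "C", "D"], set(), square))
```

The undirected chain is extended to a directed chain with no v-structure, while the undirected four-cycle has no consistent extension, because any acyclic orientation of it creates a v-structure.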

Page 15: Learning  Equivalence Classes  of Bayesian-Network Structures

Applying the Operators

Page 16: Learning  Equivalence Classes  of Bayesian-Network Structures

Operators

The set of operators should:
  Ensure global connectivity (completeness) and good connectivity in general
  Be easy to check for applicability (validity)
  Avoid redundancy
  Allow for efficient scoring

Local scoring: local changes in G cause “local” changes in score(G)

Page 17: Learning  Equivalence Classes  of Bayesian-Network Structures

Score Decomposability

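A scoring function S is decomposable if it is a product (or sum) of factors s, each depending only on one node and its parents

For example, with \Pi_i denoting the parents of x_i in G:

\[ \log P(D \mid \theta, G) = \sum_{i=1}^{n} \log P(x_i \mid \mathbf{x}_{\Pi_i}, \theta, G) \]

[Figure: G1 is the chain Z → X → Y; G2 is the same graph without the edge X → Y]

\[ S(G_1) = \log P(Z \mid \theta_Z) + \log P(X \mid Z, \theta_{X|Z}) + \log P(Y \mid X, \theta_{Y|X}) \]

\[ S(G_2) = S(G_1) - \log P(Y \mid X, \theta_{Y|X}) + \log P(Y \mid \theta_Y) \]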
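The practical payoff of decomposability is that changing one node's parent set changes only that node's term. The sketch below illustrates this with a maximized log-likelihood score computed from counts, which is just one possible decomposable score and not necessarily the one used in the presentation; the toy data set and the names `local_score` and `score` are made up for the example.

```python
import math
from collections import Counter

def local_score(data, node, parents):
    """Maximized log-likelihood term of one node given its parents."""
    parents = sorted(parents)
    joint = Counter((tuple(r[p] for p in parents), r[node]) for r in data)
    marginal = Counter(tuple(r[p] for p in parents) for r in data)
    return sum(n * math.log(n / marginal[pa]) for (pa, _), n in joint.items())

def score(data, parent_sets):
    """Decomposable score: a sum of one local term per node."""
    return sum(local_score(data, x, pa) for x, pa in parent_sets.items())

# Hypothetical binary data over Z, X, Y.
data = [
    {"Z": 0, "X": 0, "Y": 0}, {"Z": 0, "X": 0, "Y": 1},
    {"Z": 0, "X": 1, "Y": 1}, {"Z": 1, "X": 1, "Y": 1},
    {"Z": 1, "X": 1, "Y": 0}, {"Z": 1, "X": 0, "Y": 0},
]

g1 = {"Z": [], "X": ["Z"], "Y": ["X"]}   # Z -> X -> Y
g2 = {"Z": [], "X": ["Z"], "Y": []}      # Z -> X, Y isolated

# Only Y's term differs between the two structures.
delta = local_score(data, "Y", []) - local_score(data, "Y", ["X"])
print(math.isclose(score(data, g2), score(data, g1) + delta))
```

Rescoring G2 from scratch touches all three nodes, while the update from S(G1) evaluates only the two local terms for Y.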

Page 18: Learning  Equivalence Classes  of Bayesian-Network Structures

Used Operators

Page 19: Learning  Equivalence Classes  of Bayesian-Network Structures

Operator Scoring

Chickering 1996a
  Apply the operator and score the consistent extension (DAG)

Drawbacks:
  Need to apply PDAG-to-DAG for every operator
  Local operators may cause non-local changes when applied to a CPDAG
  Cannot benefit from local scoring

Page 20: Learning  Equivalence Classes  of Bayesian-Network Structures

Local Operator Scoring

Page 21: Learning  Equivalence Classes  of Bayesian-Network Structures

InsertU Operator – “Theorem 34”

Let Pc be any completed PDAG in which nodes x and y are not adjacent.

If the PDAG obtained by adding an edge between x and y to Pc admits a consistent extension, then the edge x―y is reversible if and only if x and y have exactly the same parents in the original PDAG

Page 22: Learning  Equivalence Classes  of Bayesian-Network Structures

InsertU Operator – “Theorem 6”

The insertion of the undirected edge x―y in a CPDAG Pc is valid if and only if:
  x and y have the same parents in Pc
  every undirected path between x and y contains at least one of their common neighbors

Only if (+Theorem 34):
  Take the shortest undirected path from x to y in Pc that does not include any common neighbor of x and y
  It has length at least 3 and has no chord
  After adding x―y it becomes a chordless cycle of length at least 4, so the undirected component is no longer chordal
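The two validity conditions can be checked without building any consistent extension. The sketch below assumes that "neighbor" means a node joined by an undirected edge, and that the CPDAG is given as a `parents` dictionary for the directed edges plus a set of `frozenset` pairs for the undirected ones; `insert_u_valid` is an illustrative name. The path condition is tested by deleting the common neighbors and checking that y can no longer be reached from x along undirected edges.

```python
def insert_u_valid(x, y, parents, undirected):
    """Check the two InsertU validity conditions for non-adjacent x and y."""
    pa_x, pa_y = parents.get(x, set()), parents.get(y, set())
    if frozenset((x, y)) in undirected or x in pa_y or y in pa_x:
        return False  # x and y are already adjacent
    if pa_x != pa_y:
        return False  # condition 1: identical parent sets

    def nbrs(n):
        return {next(iter(e - {n})) for e in undirected if n in e}

    common = nbrs(x) & nbrs(y)
    # Condition 2: no undirected path from x to y avoids all common neighbors.
    stack, seen = [x], {x}
    while stack:
        n = stack.pop()
        for m in nbrs(n) - common - seen:
            if m == y:
                return False
            seen.add(m)
            stack.append(m)
    return True

# x - a - y: the only path runs through the common neighbor a.
print(insert_u_valid("x", "y", {}, {frozenset(("x", "a")), frozenset(("a", "y"))}))

# x - a - b - y: the path avoids every common neighbor (there are none).
long_path = {frozenset(("x", "a")), frozenset(("a", "b")), frozenset(("b", "y"))}
print(insert_u_valid("x", "y", {}, long_path))
```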

Page 23: Learning  Equivalence Classes  of Bayesian-Network Structures

InsertU Operator – “Lemma 32”

Let Pc be any completed PDAG, and let x and y be any pair of nodes that are not adjacent.

There exists a consistent extension of Pc in which
  all the reversible edges adjacent to x are directed away from x
  all the reversible edges between y and the common neighbors of x and y are directed toward y
  all the other reversible edges adjacent to y are directed away from y

if and only if every undirected path between x and y passes through a common neighbor of x and y

Page 24: Learning  Equivalence Classes  of Bayesian-Network Structures

InsertU Operator – Theorem 6 “If” proof outline

Use consistent extension from Lemma 32 as G

Add a directed edge x→y to G to get G’ (the other direction is symmetric)

Show that G’ is a consistent extension of P’ (P with the addition of the undirected edge x―y):
  G’ is acyclic
  Same skeleton
  Same v-structures

Page 25: Learning  Equivalence Classes  of Bayesian-Network Structures

InsertU Operator – Theorem 6: G’ is a DAG

Assume, for contradiction, that there is a directed path from y to x in G (adding x→y would then close a cycle in G’)

All the reversible edges are directed away from x, so the last edge in that path, w→x, is compelled

Then w is a parent of x in P, and it must also be a parent of y

So G contains the cycle y→…→w→y, contradicting the fact that G is a DAG

[Figure: nodes X, Y and W]

Page 26: Learning  Equivalence Classes  of Bayesian-Network Structures

InsertU Operator – “Lemma 24”

Let Pc be a completed PDAG, and let P’ denote a PDAG that results from adding a single edge between x and y to Pc

Consider any consistent extension G of Pc, and G’ that results by inserting a directed edge between x and y in G

Then any v-structure in G’ but not in P’, or any v-structure in P’ but not in G’ must include the edge between x and y

Page 27: Learning  Equivalence Classes  of Bayesian-Network Structures

InsertU Operator – Theorem 6: G’ is a consistent extension of P’

By Lemma 24, any v-structure different between G’ and P’ must include the edge x―y

The v-structure must be in G’, because in P’ this edge is undirected

The other edge in the v-structure cannot be reversible in G’:
  x does not have reversible parents
  y’s reversible parents are adjacent to x

But any compelled parent of x or y is a parent of both, so no such v-structure exists. Q.E.D.

Page 28: Learning  Equivalence Classes  of Bayesian-Network Structures

Local Operator Evaluation

Since the only difference between G and G’ is the edge x→y, we can use score decomposability to compute the score of P’ in O(1) time:

\[ s(P') = s(P^c) + s(y,\, N_{x,y} \cup \{x\} \cup \Pi_y) - s(y,\, N_{x,y} \cup \Pi_y) \]

where N_{x,y} is the set of common neighbors of x and y, and \Pi_y is the set of parents of y.

In general we do not need to transform the CPDAG to compute neighbor scores:
  Calculate scores for all the neighbor states (locally!)
  Check operator validity (efficiently!) starting from the highest score
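The evaluation strategy above can be sketched as two small helpers: one that computes the InsertU score change from two local terms, and one that ranks candidate moves by score change and returns the first valid one. Both function names, the `toy_score` local score, and the validity callback are hypothetical placeholders for this illustration, not part of the presentation.

```python
def insert_u_delta(local_score, x, y, common_neighbors, parents_of_y):
    """Score change of inserting x - y: only the local term of y changes."""
    base = set(common_neighbors) | set(parents_of_y)
    return local_score(y, frozenset(base | {x})) - local_score(y, frozenset(base))

def best_valid_move(candidates, delta_of, is_valid):
    """Score every candidate locally, then test validity from the best score down."""
    for move in sorted(candidates, key=delta_of, reverse=True):
        if is_valid(move):
            return move
    return None

# Toy local score (an assumption): prefers exactly one parent per node.
def toy_score(node, parent_set):
    return -abs(len(parent_set) - 1)

print(insert_u_delta(toy_score, "x", "y", set(), set()))    # 0 parents -> 1 parent
print(insert_u_delta(toy_score, "x", "y", {"a"}, set()))    # 1 parent -> 2 parents

deltas = {("x", "y"): 1.0, ("x", "z"): 3.0, ("y", "z"): 2.0}
valid = {("x", "y"), ("y", "z")}
print(best_valid_move(deltas, deltas.get, lambda m: m in valid))
```

Ranking first and validating second means the validity test, which is the more expensive of the two, usually runs on only a few top-ranked candidates.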

Page 29: Learning  Equivalence Classes  of Bayesian-Network Structures