Learning Bayes Nets Based on Conditional Dependencies

Oliver Schulte, Department of Philosophy and School of Computing Science, Simon Fraser University, Vancouver, Canada ([email protected])
with Wei Luo (Simon Fraser) and Russ Greiner (U of Alberta)




Page 1: Learning Bayes Nets Based on Conditional Dependencies

Oliver Schulte
Department of Philosophy and School of Computing Science
Simon Fraser University, Vancouver, Canada
[email protected]

with Wei Luo (Simon Fraser) and Russ Greiner (U of Alberta)

Page 2: Outline

• Brief Intro to Bayes Nets
• Combining Dependency Information with Model Selection
• Learning from Dependency Data Only: Learning-Theoretic Analysis

Page 3: Bayes Nets: Overview

• Bayes Net Structure = Directed Acyclic Graph.
• Nodes = Variables of Interest.
• Arcs = direct “influence”, “association”.
• Parameters = CP Tables = Prob of Child given Parents.
• Structure represents (in)dependencies.
• Structure + parameters represent the joint probability distribution over the variables.
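To make the last point concrete, here is a minimal Python sketch of how a structure plus CP tables fixes a joint distribution via the factorization P(X1, …, Xn) = ∏ P(Xi | parents(Xi)). The two-node network and all its numbers are illustrative assumptions, not taken from the talk.

```python
# Minimal sketch: a two-node Bayes net A -> B with binary variables.
# The network and all CPT numbers are illustrative assumptions.

p_a = {0: 0.7, 1: 0.3}                     # CP table for root A: P(A=a)
p_b_given_a = {(0, 0): 0.9, (1, 0): 0.1,   # P(B=b | A=0)
               (0, 1): 0.4, (1, 1): 0.6}   # P(B=b | A=1), keyed by (b, a)

def joint(a, b):
    """Joint probability from the factorization P(A, B) = P(A) * P(B | A)."""
    return p_a[a] * p_b_given_a[(b, a)]

# The factorization defines a proper distribution: it sums to 1.
assert abs(sum(joint(a, b) for a in (0, 1) for b in (0, 1)) - 1.0) < 1e-9
print(joint(1, 0))  # P(A=1, B=0) = 0.3 * 0.4 = 0.12
```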

Page 4: Examples from CIspace (UBC)

[Figure: example networks from the CIspace Bayes net applet (UBC).]

Page 5: Graphs Entail Dependencies

[Figure: three example graphs over the nodes A, B, C, each annotated with the dependencies it entails, including:]

Dep(A,B), Dep(A,B|C)

Dep(A,B), Dep(A,B|C), Dep(B,C), Dep(B,C|A), Dep(A,C|B)

Page 6: I-maps and Probability Distributions

• Defn. Graph G is an I-map of prob dist P ⟺ whenever Dependent(X,Y|S) holds in P, X is d-connected to Y given S in G.

• Example: If Dependent(Father Eye Color, Mother Eye Color | Child Eye Color) in P, then Father EC is d-connected to Mother EC given Child EC in G.

• Informally: G is an I-map of P ⟺ G entails all conditional dependencies in P.

• Theorem. Fix G, P. There is a parameter setting θ for G such that (G, θ) represents P ⟺ G is an I-map of P.
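To illustrate the d-connection test used in these definitions, here is a minimal Python sketch (my own illustration, not code from the talk). It uses the standard moralization criterion: X and Y are d-separated by S exactly when they are disconnected in the moralized ancestral graph of {X, Y} ∪ S once S is removed.

```python
from itertools import combinations

def ancestors(dag, nodes):
    """Nodes with a directed path into `nodes`, plus `nodes` themselves.
    `dag` maps each node to the set of its parents."""
    result, frontier = set(nodes), list(nodes)
    while frontier:
        for p in dag[frontier.pop()]:
            if p not in result:
                result.add(p)
                frontier.append(p)
    return result

def d_connected(dag, x, y, s):
    """True iff x is d-connected to y given conditioning set s."""
    keep = ancestors(dag, {x, y} | set(s))
    # Moral graph over the ancestral subgraph: keep parent-child edges,
    # "marry" co-parents, and drop edge directions.
    adj = {v: set() for v in keep}
    for child in keep:
        parents = dag[child] & keep
        for p in parents:
            adj[p].add(child)
            adj[child].add(p)
        for p, q in combinations(parents, 2):
            adj[p].add(q)
            adj[q].add(p)
    # Delete the conditioning set, then look for an undirected path x ~ y.
    seen, frontier = {x}, [x]
    while frontier:
        for nbr in adj[frontier.pop()] - set(s):
            if nbr == y:
                return True
            if nbr not in seen:
                seen.add(nbr)
                frontier.append(nbr)
    return False

# Collider A -> C <- B: A and B are d-separated given {}, but conditioning
# on C d-connects them (as in the eye-color example, with A, B the parents'
# eye colors and C the child's).
dag = {"A": set(), "B": set(), "C": {"A", "B"}}
print(d_connected(dag, "A", "B", set()))   # False
print(d_connected(dag, "A", "B", {"C"}))   # True
```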

Page 7: Two Approaches to Learning Bayes Net Structure

• “Search and score”: select a graph G as a “model” with parameters to be estimated.
• “Test and cover”: find a G that represents the dependencies in P.

Aim: find a G that represents P with suitable parameters.

Page 8: Our Hybrid Approach

[Flowchart: Sample → Set of Dependencies → Final Output Graph.]

The final selected graph maximizes a model selection score and covers all observed dependencies.

Page 9: Definition of Hybrid Criterion

• Let d be a sample. Let S(G,d) be a score function.
• Let Dep be a set of conditional dependencies extracted from sample d.

[Figure: example graph over A, B, C and a three-case sample; S = 10.5.]

Graph G optimizes score S given Dep and sample d ⟺
1. G entails the dependencies Dep, and
2. if any other graph G’ entails Dep, then S(G,d) ≥ S(G’,d).
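As a sketch, the criterion can be read as a two-stage selection rule: filter to the graphs that entail Dep, then maximize the score among them. Here `entails(G, dep)` and `score(G, d)` are assumed helpers (e.g., a d-connection test and a score such as BDeu), and the brute-force candidate enumeration stands in for the actual search.

```python
def hybrid_optima(candidates, dep, d, entails, score):
    """Graphs optimizing score S given Dep and sample d:
    (1) entail every dependency in dep, and
    (2) score at least as high as any other graph that does."""
    covering = [g for g in candidates if entails(g, dep)]   # constraint (1)
    if not covering:
        return []
    best = max(score(g, d) for g in covering)               # condition (2)
    return [g for g in covering if score(g, d) == best]
```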

Page 10: Local Search Heuristics for Constrained Search

• There is a general method for adapting any local search heuristic to accommodate observed dependencies.
• We present an adaptation of GES search, called IGES.

Page 11: GES Search (Meek, Chickering)

Growth Phase: Add Edges.
[Figure: graphs over A, B, C with Score = 5 → 7 → 8.5.]

Shrink Phase: Delete Edges.
[Figure: graphs over A, B, C with Score = 9, then 8.]
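A minimal sketch of the two-phase greedy pattern behind the figure. One simplification to note: GES proper searches over Markov equivalence classes with insert/delete operators, while this sketch works on plain DAG edge sets; `score(edges)` and `acyclic(edges)` are assumed helpers.

```python
from itertools import permutations

def ges_style_search(variables, score, acyclic):
    """Growth phase: greedily add the edge that most improves the score.
    Shrink phase: greedily delete edges the same way. Works on DAG edge
    sets as a simplification of GES's equivalence-class search."""
    edges = set()
    while True:  # growth phase
        options = [edges | {(x, y)} for x, y in permutations(variables, 2)
                   if (x, y) not in edges and acyclic(edges | {(x, y)})]
        better = [g for g in options if score(g) > score(edges)]
        if not better:
            break
        edges = max(better, key=score)
    while True:  # shrink phase
        options = [edges - {e} for e in edges]
        better = [g for g in options if score(g) > score(edges)]
        if not better:
            break
        edges = max(better, key=score)
    return edges
```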

Page 12: IGES Search

Step 1: Extract dependencies from the sample (testing procedure → dependencies).
[Figure: sample data with Case 1, Case 2, Case 3.]

1. Continue with the Growth Phase until all dependencies are covered.
2. During the Shrink Phase, delete an edge only if the dependencies are still covered.

[Figure: two graphs over A, B, C with Score = 7 and Score = 5, given Dep(A,B).]
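IGES changes the two phases exactly where the slide says: the growth phase keeps going until the extracted dependencies are covered, and the shrink phase rejects deletions that would uncover one. Continuing the sketch above, with `entails(edges, dep)` again an assumed coverage test:

```python
def iges_shrink_phase(edges, dep, score, entails):
    """IGES shrink phase: delete edges greedily, but only while every
    observed dependency in `dep` stays covered (entailed)."""
    while True:
        options = [edges - {e} for e in edges if entails(edges - {e}, dep)]
        better = [g for g in options if score(g) > score(edges)]
        if not better:
            return edges
        edges = max(better, key=score)
```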

Page 13: Asymptotic Equivalence GES = IGES

Theorem. Assume that the score function S is consistent and that the joint probability distribution P satisfies the composition principle. Let Dep be a set of dependencies true of P. Then with P-probability 1, GES and IGES+Dep converge to the same output in the sample-size limit.

• So IGES inherits the convergence properties of GES.

Page 14: Extracting Dependencies

• We use the χ² test (with a cell coverage condition).
• Exhaustive testing of all triples Indep(X,Y|S) for cardinality(S) < k, with k chosen by the user.
• More sophisticated testing strategy coming soon.
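As one concrete (and simplified) reading of such a test, the sketch below runs scipy's chi-squared test of independence between X and Y within each stratum of the conditioning set S. The minimum-cell-count check is only a crude stand-in for the talk's cell coverage condition, and the multiple-testing issues of exhaustive search are ignored.

```python
import numpy as np
from scipy.stats import chi2_contingency

def dependent(data, x, y, s, alpha=0.05, min_cell=5):
    """Crude conditional-dependence test on a list of dict records:
    chi-squared test of X vs Y within each stratum of the conditioning
    variables S. min_cell stands in for the cell coverage condition."""
    strata = {}
    for row in data:
        strata.setdefault(tuple(row[v] for v in s), []).append(row)
    for rows in strata.values():
        xs = sorted({r[x] for r in rows})
        ys = sorted({r[y] for r in rows})
        if len(xs) < 2 or len(ys) < 2:
            continue  # degenerate stratum, no test possible
        table = np.array([[sum(1 for r in rows if r[x] == xv and r[y] == yv)
                           for yv in ys] for xv in xs])
        if table.min() < min_cell:
            continue  # insufficient cell coverage, skip this stratum
        _, p, _, _ = chi2_contingency(table)
        if p < alpha:
            return True  # significant dependence in some stratum
    return False
```

Exhaustive extraction would then loop this test over all pairs X, Y and all conditioning sets S with cardinality(S) < k.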

Page 15: Simulation Setup: Methods

The hybrid approach is a general schema. Our setup:
• Statistical test: χ²
• Score S: BDeu (with Tetrad default settings)
• Search method: GES, adapted (IGES)

Page 16: Simulation Setup: Graphs and Data

• Random DAGs with binary variables.
• #Nodes: 4, 6, 8, 10.
• Sample sizes: 100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600.
• 10 random samples per graph per sample size; results averaged.
• Graphs generated with Tetrad’s random DAG utility.

Page 17: Result Graphs

[Figure: result plots.]

Page 18: Conclusion for I-map Learning: The Underfitting Zone

Although not explicitly designed to cover statistically significant correlations, GES+BDeu does so pretty well. But not perfectly, so IGES helps to add in the missing edges (on the order of 5 for 10-node graphs).

[Plot: divergence from the true graph vs. sample size, for standard search + score and for constrained search + score. Three regimes: small samples (little significance), medium samples (underfitting of correlations), large samples (convergence zone).]

Page 19: Part II: Learning-Theoretic Model (COLT 2007)

• Learning model: the learner receives an increasing enumeration (list) of conditional dependency statements.
• Data repetition is possible.
• The learner outputs a graph (pattern); it may output ?.

[Figure: data stream Dep(A,B), Dep(B,C), Dep(A,C|B), … with the learner’s conjectures: graphs over A, B, C, or ?.]

Page 20: Criteria for Optimal Learning

1. Convergence: the learner must eventually settle on the true graph.
2. The learner must minimize mind changes.
3. Given 1 and 2, the learner is not dominated in convergence time.

Page 21: The Optimal Learning Procedure

Theorem. There is a unique optimal learner, defined as follows:
1. If there is a unique graph G covering the observed dependencies with a minimum number of adjacencies, output G.
2. Otherwise output ?.
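A minimal sketch of this rule, with brute-force enumeration standing in for the real procedure; `covers(G, dep)` and `num_adjacencies(G)` are assumed helpers.

```python
def optimal_learner(candidates, dep, covers, num_adjacencies):
    """Output the unique adjacency-minimal graph covering `dep`, else '?'."""
    covering = [g for g in candidates if covers(g, dep)]
    if not covering:
        return "?"
    fewest = min(num_adjacencies(g) for g in covering)
    minimal = [g for g in covering if num_adjacencies(g) == fewest]
    return minimal[0] if len(minimal) == 1 else "?"
```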

Page 22: Computational Complexity of the Unique Optimal Learner

Theorem. The following problem is NP-hard:
1. Decide if there is a unique edge-minimal map for a set of dependencies D.
2. If yes, output the graph.

Proof: reduction from Unique Exact 3-Set Cover. Example instance over x1, …, x9:

{x1,x2,x3}, {x3,x4,x5}, {x4,x5,x7}, {x2,x4,x5}, {x3,x6,x9}, {x6,x8,x9}

Unique exact cover: {x1,x2,x3}, {x4,x5,x7}, {x6,x8,x9}.

Page 23: Hybrid Method and Optimal Learner

• Score-based methods tend to underfit (with discrete variables): they place edges correctly, but too few ⇒ mind-change optimal but not convergence-time optimal.
• The hybrid method speeds up convergence.

Page 24: A New Testing Strategy

• Say that a graph G satisfies the Markov condition (MC) wrt sample d ⟺ for all X, Y: if Y is a non-parent non-descendant of X, then we do not find Dep(X, Y | parents(X)).
• Given sample d, look for a graph G that satisfies the MC wrt d with a minimum number of adjacencies; see the sketch below.
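A minimal sketch of the check (my own illustration): for each variable X, test every non-parent non-descendant Y against Dep(X, Y | parents(X)), using a dependence test such as the chi-squared sketch on Page 14. Here `dag` maps each node to its parent set.

```python
def descendants(dag, x):
    """Nodes reachable from x along directed edges (dag maps node -> parents)."""
    children = {v: {c for c in dag if v in dag[c]} for v in dag}
    result, frontier = set(), [x]
    while frontier:
        for c in children[frontier.pop()]:
            if c not in result:
                result.add(c)
                frontier.append(c)
    return result

def satisfies_mc(dag, data, dependent):
    """Markov condition wrt sample `data`: for all X and every non-parent
    non-descendant Y of X, the test must not find Dep(X, Y | parents(X))."""
    for x in dag:
        excluded = dag[x] | descendants(dag, x) | {x}
        for y in set(dag) - excluded:
            if dependent(data, x, y, dag[x]):
                return False
    return True
```

The strategy on this slide would then search, among graphs passing this check, for one with the fewest adjacencies.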

Page 25: Future Work

• Use the Markov condition to develop a local search algorithm for score optimization requiring only (#Var)² tests.
• Apply the idea of Markov condition + edge minimization to continuous-variable models.

Page 26: Summary: Hybrid Criterion - Test, Search and Score

• Basic idea: base Bayes net learning on dependencies that can be reliably obtained even on small to medium sample sizes.
• Hybrid criterion: find the graph that maximizes the model selection score under the constraint of entailing the statistically significant dependencies or correlations.
• Theory + simulation evidence suggests that this:
  • speeds up convergence to the correct graph;
  • addresses underfitting on small-to-medium samples.

Page 27: Summary: Learning-Theoretic Analysis

• Learning model: learn the graph from dependencies alone.
• Optimal method: look for the graph that covers the observed dependencies with a minimum number of adjacencies.
• Implementing this method is NP-hard.

Page 28: References

O. Schulte, W. Luo and R. Greiner (2007). “Mind Change Optimal Learning of Bayes Net Structure”. Conference on Learning Theory (COLT).

THE END