samplesearch: importance sampling in the presence of determinism rina dechter bren school of...

71
Overview • Introduction: Mixed graphical models • SampleSearch: Sampling with Searching • Exploiting structure in samplinig: AND/OR Importance sampling

Upload: kristin-mclaughlin

Post on 13-Jan-2016

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Overview

• Introduction: Mixed graphical models

• SampleSearch: Sampling with Searching

• Exploiting structure in samplinig: AND/OR Importance sampling

Page 2: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Overview

• Introduction: Mixed graphical models

• SampleSearch: Sampling with Searching

• Exploiting structure in samplinig: AND/OR Importance sampling

Page 3: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Bayesian Networks: Representation(Pearl, 1988)

))(|()(

))(|(

1

i

n

i

i

ii

XpaXPXP

XpaXP

:CPTs

lung Cancer

Smoking

X-ray

Bronchitis

DyspnoeaP(D|C,B)

P(B|S)

P(S)

P(X|C,S)

P(C|S)

P(S, C, B, X, D) = P(S) P(C|S) P(B|S) P(X|C,S) P(D|C,B)

Belief Updating:P (lung cancer=yes | smoking=no, dyspnoea=yes ) = ?

Page 4: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Mixed Networks: Mixing Belief and Constraints

Belief or Bayesian Networks

A

D

B C

E

FA

D

B C

E

F

)|(),,|(

),|(),|(),|(),( :CPTS

}1,0{:Domains

,,,,, :Variables

AFPBAEP

CBDPACPABPAP

DDDDDD

FEDCBA

FEDCBA

Constraint Networks

)( :solutions ofset theExpresses

),(),(),(),( :sConstraint

}1,0{:Domains

,,,,, :Variables

4321

Rsol

EARBCDRACFRABCR

DDDDDD

FEDCBA

FEDCBA

B C D=0

D=1

0 0 0 1

0 1 .1 .9

1 0 .3 .7

1 1 1 0

),|( CBDP

allowednot is 1D1,C1,B

allowednot is 0,0,0

)(3

DCB

BCDR

B= R=

Constraints could be specified externally or may occur as zeros in the Belief network

Page 5: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Mixed networks:Distribution and Queries

• The distribution represented by a mixed network T=(B,R):

• Queries:– Weighted Counting (Equivalent to P(e), partition

function, solution counting)

– Marginal distribution:

)(

)(Rsolx

B xPM

otherwise ,0

)( if ),(1

)(RsolxxP

MxP BT

)( iT XP

Page 6: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Applications• Determinism: More Ubiquitous than you may think!

• Transportation Planning (Liao et al. 2004, Gogate et al. 2005)– Predicting and Inferring Car Travel Activity of individuals

• Genetic Linkage Analysis (Fischelson and Geiger, 2002)– associate functionality of genes to their location on chromosomes.

• Functional/Software Verification (Bergeron, 2000)– Generating random test programs to check validity of hardware

• First Order Probabilistic models (Domingos et al. 2006, Milch et al. 2005)– Citation matching

Page 7: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

8

S13m

L11fL11m

L13m

X11 S13f

L12fL12m

L13f

X12

X13Model for locus 1

S23m

L21fL21m

L23m

X21 S23f

L22fL22m

L23f

X22

X23Model for locus 2

Functions on Orange nodes are deterministic

Page 8: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Approximate Inference

• Approximations are hard with determinism• Randomized Polynomial ε-approximation possible when no

zeros are present (Karp 1993, Cheng 2001)• ε-approximation NP-hard in the presence of zeros• Gibbs sampling is problematic when MCMC is not ergodic.

• Current remedies– Replace zeros with very small values (Laplace

correction: Naive Bayes, NLP)– bad performance when zeros or determinism is

real!

Page 9: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Overview

• Introduction: Mixed graphical models

• SampleSearch: Sampling with Searching– Rejection problem – Recovery, and analysis– Empirical evaluation

• Exploiting structure in samplinig: AND/OR Importance sampling

Page 10: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Importance Sampling: Overview

• Given a proposal or importance distribution Q(z) such that f(z)>0 implies Q(z)>0, rewrite

EXZwherezfM

eEXPeP

Zz

EXB

\ P(e) )(

),()(\

)(

)(

)(

)()()(

zQ

zfE

zQ

zQzfzfM Q

Zz Zz

Given i.i.d. samples z1,..,zN from Q(z),

P(e)M]M̂[E )(1

)(

)(1ˆQ

11

N

j

N

jj

j

j zwNzQ

zf

NM

Page 11: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Generating i.i.d. samples from Q

A=0

B=0 B=1 B=0 B=1

A=1

C=1C=1C=1 C=0 C=0 C=0 C=1

Root

0.8 0.2

0.4 0.6

0.2 0.8 0.2 0.8 0.2 0.8 0.2 0.8

0.2 0.8

C=0

)8.0.2.0()(),|(),8.0,2.0,6.0,4.0()|(),2.0,8.0()(

),|()|()(),,(

),...,|(.....)|()( 11121

CQBACQABQAQ

BACQABQAQCBAQ

XXXQXXQXQ Q(X) nn

Page 12: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Rejection Problemsolution. anot is xi if

:sampling Importance

0)(

)(

)(1~

1

xif

xiQ

xif

NM

N

i

A=0

B=0

C=0

B=1 B=0 B=1

A=1

C=1C=1C=1 C=0 C=0 C=0 C=1

Root

0.8 0.2

0.4 0.6

0.2 0.8 0.2 0.8 0.2 0.8 0.2 0.8

0.2 0.8

Toss a biased coin

P(H)=0.8, P(T)=0.2

Say we get H

Page 13: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Rejection Problem solution. anot is xi if

:sampling Importance

0)(

)(

)(1~

1

xif

xiQ

xif

NM

N

i

A=0

B=0

C=0

B=1

C=1C=1 C=0

Root

0.8

0.4 0.6

0.2 0.8 0.2 0.8

Toss a biased coin

P(H)=0.4, P(T)=0.6

Say, We get a Head

Page 14: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Rejection Problem solution. anot is xi if

:sampling Importance

0)(

)(

)(1~

1

xif

xiQ

xif

NM

N

i

A=0

B=0

C=0

B=1

C=1C=1 C=0

Root

0.8

0.4 0.6

0.2 0.8 0.2 0.8

Toss a biased coin

P(H)=0.4, P(T)=0.6

Say, We get a Head

Page 15: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Rejection Problem

• A large number of assignments generated will be rejected, thrown away

solution. anot is xi if

:sampling Importance

0)(

)(

)(1~

1

xif

xiQ

xif

NM

N

i

A=0

B=0

C=0 C=1

Root

0.8

0.4

0.2 0.8

Toss a biased coin

P(H)=0.4, P(T)=0.6

Say, We get a Head

Page 16: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Rejection Problem

All Blue leaves correspond to solutions i.e. f(x) >0All Red leaves correspond to non-solutions i.e. f(x)=0

solution. anot is xif 0)(

)(

)(1ˆ :sampling Importance

i

1

i

N

i i

i

xf

xQ

xf

NM

A=0

B=0

C=0

B=1 B=0 B=1

A=1

C=1C=1C=1 C=0 C=0 C=0 C=1

Root

0.8 0.2

0.4 0.6

0.2 0.8 0.2 0.8 0.2 0.8 0.2 0.8

0.2 0.8

Constraints: A≠B A≠C

Page 17: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Revising Q to backtrack-free distribution:

All Blue leaves correspond to solutions i.e. f(x) >0All Red leaves correspond to non-solutions i.e. f(x)=0

A=0

B=0

C=0

B=1 B=0 B=1

A=1

C=1C=1C=1 C=0 C=0 C=0 C=1

Root

0.8 0.2

0.40 0.61

0 0 0 1 1 0 0 0

0.21 0.80

QF(branch)=0 if no solutions under it

QF(branch)=1 if no solutions under the other branch

QF(branch) =Q(branch) otherwise

Constraints: A≠B A≠C

Page 18: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Generating samples from QF

C=1

Root

0.80.2

?? Solution?? Solution

?? Solution?? Solution

A=0

00.4 0.6 1

?? Solution ?? Solution

B=1

00.2 0.81

• Invoke an oracle at each branch.• Oracle returns True if

there is a solution under a branch

• False, otherwise

Constraints: A≠B A≠C QF(branch)=0 if no solutions under it

QF(branch)=1 if no solutions under the other branch

QF(branch) =Q(branch) otherwise

Page 19: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Generating samples from QF

• Oracles: In practice• Adaptive consistency as pre-

processing step• A complete CSP solver

• Too costly• O(exp(treewidth))• Invoked O(nd) for each

sample

A=0

B=1

C=1

Root

0.80.2

1

1

Constraints: A≠B A≠C

Gogate et al., UAI 2005, Gogate and Dechter, UAI 2005

Page 20: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Approximations of QF

• Use i-consistency instead of adaptive consistency – O(ni) time and space complexity– identify some zeros so that they are never sampled

• Cons: Too weak when constraint portion is hard.

Reje

ction

Rat

e

Percentage of Zeros identified

NP-hard exponential time and space

No processing

Polynomial consistency enforcement

Gogate et al., UAI 2005, Gogate and Dechter, UAI 2005

Page 21: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

25

Algorithm SampleSearch

A=0

B=0

C=0

B=1 B=0 B=1

A=1

C=1C=1C=1 C=0 C=0 C=0 C=1

Root

0.8 0.2

0.4 0.6

0.2 0.8 0.2 0.8 0.2 0.8 0.2 0.8

0.2 0.8

N

j j

j

zQ

zf

NM

1 )(

)(1ˆ

Gogate and Dechter, AISTATS 2007, AAAI 2007

Constraints: A≠B A≠C

Page 22: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

26

Algorithm SampleSearch

A=0

B=0

C=0

B=1 B=0 B=1

A=1

C=1C=1C=1 C=0 C=0 C=0 C=1

Root

0.8 0.2

0.4 0.6

0.2 0.8 0.2 0.8 0.2 0.8 0.2 0.8

0.2 0.81

N

j j

j

zQ

zf

NM

1 )(

)(1ˆ

Gogate and Dechter, AISTATS 2007, AAAI 2007

Constraints: A≠B A≠C

Page 23: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

27

Algorithm SampleSearch

A=0

B=1 B=0 B=1

A=1

C=1C=1C=0 C=0 C=0 C=1

Root

0.8 0.2

0.2 0.8 0.2 0.8 0.2 0.8

0.2 0.8

Resume Sampling

1

N

j j

j

zQ

zf

NM

1 )(

)(1ˆ

Gogate and Dechter, AISTATS 2007, AAAI 2007

Constraints: A≠B A≠C

Page 24: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

28

Algorithm SampleSearch

A=0

B=1 B=0 B=1

A=1

C=1C=1C=0 C=0 C=0 C=1

Root

0.8 0.2

1

0.2 0.8 0.2 0.8 0.2 0.8

0.2 0.8

Constraint Violated

Until f(sample)>0

1

N

j j

j

zQ

zf

NM

1 )(

)(1ˆ

Gogate and Dechter, AISTATS 2007, AAAI 2007

Constraints: A≠B A≠C

Page 25: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

29

Generate more Samples

A=0

B=0

C=0

B=1 B=0 B=1

A=1

C=1C=1C=1 C=0 C=0 C=0 C=1

Root

0.8 0.2

0.4 0.6

0.2 0.8 0.2 0.8 0.2 0.8 0.2 0.8

0.2 0.8

N

j j

j

zQ

zf

NM

1 )(

)(1ˆ

Gogate and Dechter, AISTATS 2007, AAAI 2007

Constraints: A≠B A≠C

Page 26: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

30

Generate more Samples

A=0

B=0

C=0

B=1 B=0 B=1

A=1

C=1C=1C=1 C=0 C=0 C=0 C=1

Root

0.8 0.2

0.4 0.6

0.2 0.8 0.2 0.8 0.2 0.8 0.2 0.8

0.2 0.8

1

N

j j

j

zQ

zf

NM

1 )(

)(1ˆ

Gogate and Dechter, AISTATS 2007, AAAI 2007

Constraints: A≠B A≠C

Page 27: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Traces of SampleSearch

A=0

B=1

C=1

Root

A=0

B=0B=1

C=1

Root

A=0

B=1

C=1

Root

C=0

A=0

B=0 B=1

C=1

Root

C=0

Constraints: A≠B A≠C

Gogate and Dechter, AISTATS 2007, AAAI 2007

Page 28: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

32

SampleSearch: Sampling Distribution• Problem: Due to Search, the samples are no

longer i.i.d. from Q

• Theorem: SampleSearch generates i.i.d. samples from the backtrack-free distribution

M]M[E )(

)(1

ondistributi free-Backtrack :Q

FQ

1

F

N

j jF

j

zQ

zf

NM

MMEzQ

zf

NM Q

N

j j

j

ˆ ,)(

)(1ˆ1

Gogate and Dechter, AISTATS 2007, AAAI 2007

Page 29: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

The Sampling distribution QF of SampleSearch

A=0

B=0 B=1

C=1C=1 C=0

Root

0.8

0 1

0 0 0 1

C=0

What is probability of generating A=0?

QF(A=0)=0.8

Why? SampleSearch is systematic

What is probability of generating (A=0,B=1)?

QF(B=1|A=0)=1

Why? SampleSearch is systematic

What is probability of generating (A=0,B=0)?

Simple: QF(B=0|A=0)=0

All samples generated by SampleSearch are solutions

Backtrack-free distribution

N

i i

i

xQ

xf

NM

1 )(

)(1~

Gogate and Dechter, AISTATS 2007, AAAI 2007

Constraints: A≠B A≠C

Page 30: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Asymptotic approximations of QF

A=0

B=0 B=1

C=1C=1 C=0

Root

0.8

0 1

0 0 0 1

C=0

Hole ?

Don’t know

No solutions here

No solutions here

•IF Hole THEN

• UF=Q (i.e. there is a solution at the other branch)

• LF=0 (i.e. no solution at the other branch)

Gogate and Dechter, AISTATS 2007, AAAI 2007

QF(branch)=0 if no solutions under it

QF(branch) =Q(branch) otherwise

Page 31: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Approximations:Convergence in the limit

• Store all possible traces

A=0

B=1

C=1

Root

0.8

1

1

?

Gogate and Dechter, AISTATS 2007, AAAI 2007

A=0

B=1

C=1

Root

C=0

0.8

?

0.6

1

?A=0

B=0B=1

C=1

Root

0.8

?

1

0.8

?

Page 32: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Approximations:Convergence in the limit

• From the combined sample tree, update U and L.

IF Hole THEN UFN=Q and LF

N=0

)()()(:

)(

)(lim

)(

)(lim

)(

)()(

xLxQxUBounding

unbiasedallyAsymptotic

MxL

xfE

xU

xfE

xQ

xfExfM

FN

FFN

FN

NFN

N

FXx

A=0

B=1

C=1

Root

0.8

1

1

?

Gogate and Dechter, AISTATS 2007, AAAI 2007

Page 33: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Improving Naive SampleSeach:The IJGP-wc-SS algorithm

• Better Search Strategy– Can use any state-of-the-art CSP/SAT solver e.g. minisat (Een and

Sorrenson 2006)

• Better Proposal distribution– Use output of IJGP- a generalized belief propagation to compute

the initial importance function

• w-cutset importance sampling (Bidyuk and Dechter, 2007)– Reduce variance by sampling from a subspace

Gogate and Dechter, AISTATS 2007, AAAI 2007

Page 34: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Experiments• Tasks

– Weighted Counting– Marginals

• Benchmarks– Satisfiability problems (counting solutions)– Linkage networks– Relational instances (First order probabilistic networks)– Grid networks– Logistics planning instances

• Algorithms– IJGP-wc-SS/LB and IJGP-wc-SS/UB– IJGP-wc-IS (Vanilla algorithm that does not perform search)– SampleCount (Gomes et al. 2007, SAT)– ApproxCount (Wei and Selman, 2007, SAT)– EPIS (Changhe and Druzdzel, 2006)– RELSAT (Bayardo and Peshoueshk, 2000, SAT)– Edge Deletion Belief Propagation (Choi and Darwiche, 2006)– Iterative Join Graph Propagation (Dechter et al., 2002)– Variable Elimination and Conditioning (VEC)

Page 35: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Results: Probability of EvidenceLinkage instances (UAI 2006 evaluation)

Time Bound: 3 hrsM: number of samples generated in 10 hrsZ: Probability of Evidence

Page 36: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Results: Probability of EvidenceRelational instances (UAI 2008 evaluation)

Time Bound: 10 hrsM: number of samples generated in 10 hrsZ: Probability of Evidence

Page 37: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,
Page 38: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Results: Solution CountsLatin Square instances (size 8 to 16)

Time Bound: 10 hrsM: number of samples generated in 10 hrsZ: Solution Counts

Page 39: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Results: Solution CountsLangford instances

Time Bound: 10 hrsM: number of samples generated in 10 hrsZ: Solution Counts

Page 40: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,
Page 41: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Results on Marginals

• Evaluation Criteria

• Always bounded between 0 and 1• Lower Bounds the KL distance• When probabilities close to zero are present KL

distance may tend to infinity.

n

xAxP

cedisHellinger

xAeApproximatxPExactn

i Dxii

ii

ii

1

2)()(

21

tan

)(: )(:

Page 42: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Results: Posterior MarginalsLinkage instances (UAI 2006 evaluation)

Time Bound: 3 hrsTable shows the Hellinger distance (∆) and Number of samples: M

Page 43: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,
Page 44: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Summary: SampleSearch• Manages rejection problem while sampling

• Sampling Distribution is the backtrack-free distribution QF

• Approximation of QF by storing all traces yielding an asymptotically unbiased estimator– Linear time and space overhead– Bound the weighted counts from above and below

• Empirically, when a substantial number of zero probabilities are present, SampleSearch dominate.

Page 45: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Overview

• Introduction: Mixed graphical models

• SampleSearch: Sampling with Searching

• Exploiting structure in sampling: AND/OR Importance sampling

Page 46: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

51

Motivation

)(

)(

)(

)()()(

zQ

zfE

zQ

zQzfzfM Q

Zz Zz

4,3,2,1

4432321 )(),,(),,()(xxxx

CBAz

xfxxxfxxxfzfM

4,3,2,1

4321 ),()()()(xxxx

CBAz

xxfxfxfzfM

Given Q(z), Importance sampling totally disregards the structure of f(z) while approximating it.

Page 47: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

52

OR Search Tree

0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

0 1 0 1

C

D

F

E

B

A 0 1

A

E

C

B

F

D

A B C RABC

0 0 0 10 0 1 10 1 0 00 1 1 11 0 0 11 0 1 11 1 0 11 1 1 0

A B E RABE

0 0 0 10 0 1 00 1 0 10 1 1 11 0 0 01 0 1 11 1 0 11 1 1 0

A E F RAEF

0 0 0 00 0 1 10 1 0 10 1 1 11 0 0 11 0 1 11 1 0 11 1 1 0

B C D RBCD

0 0 0 10 0 1 10 1 0 10 1 1 01 0 0 11 0 1 01 1 0 11 1 1 1

Constraint Satisfaction – Counting Solutions

Page 48: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

54

AND/OR Search Tree

A

E

C

B

F

D

A

D

B

EC

F

A B C RABC

0 0 0 10 0 1 10 1 0 00 1 1 11 0 0 11 0 1 11 1 0 11 1 1 0

A B E RABE

0 0 0 10 0 1 00 1 0 10 1 1 11 0 0 01 0 1 11 1 0 11 1 1 0

A E F RAEF

0 0 0 00 0 1 10 1 0 10 1 1 11 0 0 11 0 1 11 1 0 11 1 1 0

B C D RBCD

0 0 0 10 0 1 10 1 0 10 1 1 01 0 0 11 0 1 01 1 0 11 1 1 1

AOR

0AND

BOR

0AND

OR E

OR F F

AND 0 1 0 1

AND 0 1

C

D D

0 1 0 1

0 1

1

E

F F

0 1 0 1

0 1

C

D D

0 1 0 1

0 1

1

B

0

E

F F

0 1 0 1

0 1

C

D D

0 1 0 1

0 1

1

E

F F

0 1 0 1

0 1

C

D D

0 1 0 1

0 1

Constraint Satisfaction – Counting Solutions

pseudo tree

Page 49: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

55

AND/OR TreeA

D

B

EC

F

AOR

0AND

BOR

0AND

OR E

OR F

AND 0 1

AND 0 1

C

D D

0 1 0 1

0 1

1 1 1 0 0 1

0

1

E

F F

0 1 0 1

0 1

C

D

0 1

0 1

1

B

0

E

F

0 1

0 1

C

D D

0 1 0 1

0 1

1

E

F

0 1

0 1

C

D

0 1

0 1

1 1 0 1 1 1

0

1 1 1 0 1 0 1 0 1 1

0 0 0

pseudo tree

OR – casing upon variable values

AND – decomposition into independent subproblems

Page 50: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

56

Complexity of AND/OR Tree Search

AND/OR tree OR tree

Space O(n) O(n)

Time

O(n km)O(n kw* log n)

[Freuder & Quinn85], [Collin, Dechter & Katz91], [Bayardo & Miranker95], [Darwiche01]

O(kn)

k = domain sizem = depth of pseudo-treen = number of variablesw*= treewidth

Page 51: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Background: AND/OR search space

X

Z

Y

{0,1,2}

{0,1,2}

{0,1,2}

Problem

Z

X Y

Pseudo-tree

X

Z

YChain Pseudo-tree

Z

0 1 2

X Y X Y X Y

1 2 1 202 0 2 0 101

OR

AND

OR

AND

AND/OR Tree

X

0 1 2

Z Z Z

1 2 20 0 1

Y

1 2

Y

1 2

Y

0 2

Y

0 2

Y

0 1

Y

0 1

OR Tree

57

Page 52: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Example Bayesian network

58

P(X|Z) X=0 X=1 X=2

Z=0 0.3 0.4 0.3

Z=1 0.2 0.7 0.1

P(Y|Z) Y=0 Y=1 Y=2

Z=0 0.5 0.1 0.4

Z=1 0.2 0.6 0.2

P(Y)

Y=0 0.2

Y=1 0.7

Y=2 0.1

P(Z) Z=0 Z=1

0.8 0.2

Y

B

Z

X

A

Evidence A=0, B=0

P(X)

X=0 0.1

X=1 0.2

X=2 0.6

f(XZ) f(YZ)

f(Z)

XYZ

ZfYZfXZfbaPM )()()(),(

Page 53: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

59

Recap:Conventional Importance Sampling

)|()|()(

)()()(

)|()|()()|()|()(

)()()(

ZYQZXQZQ

ZfYZfXZfE

ZYQZXQZQZYQZXQZQ

ZfYZfXZfM

Q

XYZ

XYZ

ZfYZfXZfM )()()(

)|()|()()( ZYQZXQZQXYZQ

Gogate and Dechter, UAI 2008, CP2008

AND/OR idea!Decompose this expectation

Y

B

Z

X

A

Page 54: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

60

AND/OR Importance Sampling (General Idea)

Z

X Y

Pseudo-tree

)|(

)|(

)()|(

)|(

)()(

)(

)(ZYQ

ZYQ

YZfZXQ

ZXQ

XZfZQ

ZQ

ZfM

YXZ

)|()|()()|()|()(

)()()(ZYQZXQZQ

ZYQZXQZQ

ZfYZfXZfM

XYZ

Z

ZYQ

YZfEZ

ZXQ

XZfE

ZQ

ZfEM QQQ |

)|(

)(|

)|(

)(

)(

)(

• Decompose Expectation

Y

B

Z

X

A

Page 55: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

61

• Estimate all conditional expectations separately• How?

– Record all samples– For each sample that has Z=j

• Estimate the conditional expectations X|Z and Y|Z using samples corresponding to X|Z=j and Y|Z=j respectively.

• Combine the results

AND/OR Importance Sampling (General Idea) Z

X Y

Pseudo-tree

Z

ZYQ

YZfEZ

ZXQ

XZfE

ZQ

ZfEM QQQ |

)|(

)(|

)|(

)(

)(

)(

Y

B

Z

X

A

Page 56: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

62

AND/OR Importance Sampling

Z

0

X

2

1

1

Y

0 1

X

21

Y

0 1

<1.6,2> <0.4,2>

<0.16,1> <0.36,1> <0.2,1><0.14,1> <0.28,1> <0.84,1>

<0.08,1><0.12,1>

0|)0|(

)0,(Z

ZXQ

ZXfE

0|)0|(

)0,(Z

ZYQ

ZYfE

Sample # Z X Y

1 0 1 0

2 0 2 1

3 1 1 1

4 1 2 0

2/)0,2w(x0)z1,w(x

0 Zhaving Xon samples of Weight Average

0|)0|(

)0,( of Estimate

z

ZZXQ

ZXfE

Z

X Y

Pseudo-tree

Z

ZYQ

YZfEZ

ZXQ

XZfE

ZQ

ZfEM QQQ |

)|(

)(|

)|(

)(

)(

)(

Gogate and Dechter, UAI 2008, CP2008

Y

B

Z

X

A

Page 57: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

64

AND/OR Importance Sampling

Z

0

X

2

1

1

Y

0 1

X

21

Y

0 1

<1.6,2> <0.4,2>

<0.16,1> <0.36,1> <0.2,1><0.14,1> <0.28,1> <0.84,1>

<0.08,1><0.12,1>

0.262

0.360.16

0.17

2

0.140.2

0.462

0.840.08

0.2

2

0.120.28

0.09246.0*2.0 0.044217.0*26.0

05376.04

)092.04.02(0.0442)1.6(2

All AND nodes: Separate Components. Take ProductOperator: Product

All OR nodes: Conditional Expectations given the assignment above it

Operator: Average

Sample # Z X Y

1 0 1 0

2 0 2 1

3 1 1 1

4 1 2 0

Gogate and Dechter, UAI 2008, CP2008

Page 58: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

65

Algorithm AND/OR Importance Sampling

1. Generate samples x1, . . . ,xN from Q along O.

2. Build a AND/OR sample tree for the samples x1, . . . ,xN along the ordering O.

3. FOR all leaf nodes i of AND-OR tree do1. IF AND-node v(i)= 1 ELSE v(i)=0

4. FOR every node n from leaves to the root do1. IF AND-node v(n)=product of children2. IF OR-node v(n) = Average of children

5. Return v(root-node)

Gogate and Dechter, UAI 2008, CP2008

Page 59: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

67

Properties of AND/OR Importance Sampling

• Unbiased estimate of weighted counts.• AND/OR estimate has lower Variance than

conventional importance sampling estimate.• Variance Reduction

– Easy to Prove for case of complete independence (Goodman, 1960)

– Complicated to prove for general conditional independence case (Gogate thesis, papers!)

Gogate and Dechter, UAI 2008, CP2008

Page 60: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

AND/OR w-cutset (Rao-Blackwellised) sampling

• Rao-Blackwellisation (Rao, 1963)– Partition X into K and R, such that we can compute

P(R|k) efficiently.– Sample from K and sum out R– Estimate:

– w-cutset sampling (Bidyuk and Dechter, 2003): Select K such that the treewidth of R after removing K is bounded by “w”.

N

i i

Ri

RB kQ

kRP

NM

1 )(

)|(1ˆ

Weighted Counts conditioned on K=ki

Gogate and Dechter, Under review, CP 2009

Page 61: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

AND/OR w-cutset sampling• Perform AND/OR tree or graph sampling on K• Exact inference on R• Orthogonal approaches:

– Theorem: Combining AND/OR sampling and w-cutset sampling yields further variance reduction.

A

B C

D

E G

F

A

B C

D

E G

F

A

B C

A

B

C

Full pseudo tree

Start pseudo treeon the cutset variables

OR pseudo tree on the cutset variables

Graphical model

Gogate and Dechter, Under review, CP 2009

Page 62: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

70

From Search Trees to Search Graphs

• Any two nodes that root identical subtrees (subgraphs) can be merged

Page 63: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

71

From Search Trees to Search Graphs

• Any two nodes that root identical subtrees (subgraphs) can be merged

Page 64: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

72

Merging Based on Context

• One way of recognizing nodes that can be merged:

context (X) = ancestors of X in pseudo tree that are connected to X, or to descendants of X

[ ]

[A]

[AB]

[AE][BC]

[AB]

A

D

B

EC

F

pseudo tree

A

E

C

B

F

D

A

E

C

B

F

D

Page 65: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

AND/OR Graphs

C

B D

A

2

B D

0 101

C

0 1

B D B D

1 2 1 202 0 2

2 0 1 1

A

1 2

A

0 2

A

0

A

1

A

2

A

0

2

B D

0 101

C

0 1

D D

1 2 0 2

1 2

2

0

B

1

A

0

2

A

1

B

0

A

2

AND/OR tree

AND/OR graph

C

B

D

A

{0,1,2}

{0,1,2}

{0,1,2}

{0,1,2}

Page 66: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

C

0

B

1

D

1

A

0

2

D

2

A

1

1

B

0

D

0

A

2 0

2

D

2

A

C

0 1

B D B D

1 2 1 202 0 2

A

0

A

1

A

2

A

0

C

0 1

D D

1 2 0 22

0

B

1

A

0

2

A

1

B

0

A

2

C

B D

A

C

B

D

A

{0,1,2}

{0,1,2}

{0,1,2}

{0,1,2}

OR Sample Tree

AND/OR Sample Tree AND/OR Sample

Graph

AND/OR graph sampling

Page 67: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

IS

w-cutset ISAND/OR

Tree IS

AND/OR w-cutset Tree IS

AND/OR Graph IS

AND/OR w-cutset Graph IS

Variance Hierarchy and Complexity

O(nN)O(1)

O(nN)O(h)

O(cN+(n-c)Nexp(w))O(h+(n-c)exp(w))

O(cN+(n-c)Nexp(w))O(1)

O(nNw*)O(nN)

O(cNw*+(n-c)Nexp(w))O(cN+(n-c)exp(w))

Page 68: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Experiments

• Benchmarks– Linkage analysis– Graph coloring– Grids

• Algorithms– OR tree sampling– AND/OR tree sampling– AND/OR graph sampling– w-cutset versions of the three schemes above

Page 69: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

Results: Probability of EvidenceLinkage instances (UAI 2006 evaluation)

Time Bound: 1hr

Page 70: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,
Page 71: SampleSearch: Importance Sampling in the presence of Determinism Rina Dechter Bren school of Information and Computer Sciences University of California,

82

Summary: AND/OR Importance sampling

• AND/OR sampling: A general scheme to exploit conditional independence in sampling

• Theoretical guarantees: lower sampling error than conventional sampling

• Variance reduction orthogonal to Rao-Blackwellised sampling.

• Better empirical performance than conventional sampling.