Lower Bounds for Exact Model Counting and Applications in Probabilistic Databases

Paul Beame, Jerry Li, Sudeepa Roy, Dan Suciu
University of Washington



Model Counting
• Model Counting Problem: Given a Boolean formula F, compute #F = #Models (satisfying assignments) of F
▫ e.g. for $F = (x \lor y) \land (x \lor u \lor w) \land (\lnot x \lor u \lor w \lor z)$, #F is the number of assignments to x, y, u, w, z that make F true
• Probability Computation Problem: Given F and independent Pr(x), Pr(y), Pr(z), …, compute Pr(F)
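Both problems can be made concrete with a brute-force sketch in Python. It assumes a CNF encoding of the example formula, with each clause given as a list of (variable, is_positive) literals. The names `satisfies`, `model_count` and `probability` are illustrative. Enumerating all 2^n assignments is exponential, which is exactly the cost that practical model counters try to avoid.

```python
from fractions import Fraction
from itertools import product

# Example formula F = (x ∨ y) ∧ (x ∨ u ∨ w) ∧ (¬x ∨ u ∨ w ∨ z) as a CNF.
# Each clause is a list of (variable, is_positive) literals.
F = [
    [("x", True), ("y", True)],
    [("x", True), ("u", True), ("w", True)],
    [("x", False), ("u", True), ("w", True), ("z", True)],
]
VARS = ["x", "y", "u", "w", "z"]


def satisfies(cnf, assignment):
    """True if every clause contains a literal made true by `assignment`."""
    return all(any(assignment[v] == pos for v, pos in clause) for clause in cnf)


def model_count(cnf, variables):
    """#F by enumerating all 2^n assignments (exponential; illustration only)."""
    return sum(
        satisfies(cnf, dict(zip(variables, bits)))
        for bits in product([False, True], repeat=len(variables))
    )


def probability(cnf, variables, pr):
    """Pr(F) when each variable v is independently true with probability pr[v]."""
    total = Fraction(0)
    for bits in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, bits))
        if satisfies(cnf, assignment):
            weight = Fraction(1)
            for v in variables:
                weight *= pr[v] if assignment[v] else 1 - pr[v]
            total += weight
    return total


uniform = {v: Fraction(1, 2) for v in VARS}
print(model_count(F, VARS))           # 20
print(probability(F, VARS, uniform))  # 5/8
```

Under the uniform distribution, Pr(F) = #F / 2^n (here 20/32 = 5/8). In that setting the two problems are interchangeable.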


Model Counting
• #P-hard
▫ Even for formulas where satisfiability is easy to check
• Applications in probabilistic inference
▫ e.g. Bayesian net learning
• There are many practical model counters that can compute both #F and Pr(F)


Exact Model Counters
[Survey by Gomes et al. ’09]
• Search-based/DPLL-based (explore the assignment space and count the satisfying ones)
▫ CDP [Birnbaum et al. ’99]
▫ Relsat [Bayardo Jr. et al. ’97, ’00]
▫ Cachet [Sang et al. ’05]
▫ SharpSAT [Thurley ’06]
• Knowledge compilation-based (compile F into a “computation-friendly” form)
▫ c2d [Darwiche ’04]
▫ Dsharp [Muise et al. ’12]
▫ …

Both techniques explicitly or implicitly
• use DPLL-based algorithms
• produce FBDD or Decision-DNNF compiled forms (output or trace) [Huang-Darwiche ’05, ’07]


Model Counters Use Extensions to DPLL

• Caching Subformulas
▫ Cachet, SharpSAT, c2d, Dsharp
• Component Analysis
▫ Relsat, c2d, Cachet, SharpSAT, Dsharp
• Conflict Directed Clause Learning
▫ Cachet, SharpSAT, c2d, Dsharp

• DPLL + caching + (clause learning) → FBDD
• DPLL + caching + component + (clause learning) → Decision-DNNF

How much more does component analysis add? i.e. how much more powerful are decision-DNNFs than FBDDs?


Main Result

Theorem:
• Decision-DNNF of size N ⇒ FBDD of size $N^{\log N + 1}$
• If the formula is a k-DNF, then FBDD of size $N^k$
• The conversion algorithm runs in time linear in the size of its output


Consequence: Running Time Lower Bounds

Model counting algorithm running time ≥ compiled form size

Lower bound on compiled form size ⇒ lower bound on running time
▫ Note: running time may be much larger than the size
▫ e.g. an unsatisfiable CNF formula has a trivial compiled form


Consequence: Running Time Lower Bounds

Our quasipolynomial conversion + known exponential lower bounds on FBDDs [Bollig-Wegener ’00, Wegener ’02]
⇒ Exponential lower bounds on decision-DNNF size
⇒ Exponential lower bounds on running time of exact model counters


Outline

• Review of DPLL-based algorithms
▫ Extensions (Caching & Component Analysis)
▫ Knowledge Compilation (FBDD & Decision-DNNF)
• Our Contributions
▫ Decision-DNNF to FBDD conversion
▫ Implications of the conversion
▫ Applications to Probabilistic Databases
• Conclusions


DPLL Algorithms

Davis, Putnam, Logemann, Loveland [Davis et al. ’60, ’62]

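[Figure: DPLL decision tree for $F = (x \lor y) \land (x \lor u \lor w) \land (\lnot x \lor u \lor w \lor z)$, branching first on x. The x = 0 branch leaves $y \land (u \lor w)$, with probability 3/8; the x = 1 branch leaves $u \lor w \lor z$, with probability 7/8; so Pr(F) = ½ · 3/8 + ½ · 7/8 = 5/8.]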

Assume a uniform distribution for simplicity.

// basic DPLL:
Function Pr(F):
  if F = false then return 0
  if F = true then return 1
  select a variable x, return ½ Pr(F[x=0]) + ½ Pr(F[x=1])
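The basic DPLL procedure above can be sketched as runnable Python under the slide's uniform-distribution assumption. The CNF encoding (clauses as frozensets of (variable, is_positive) literals) and the helper name `condition` are assumptions made for the sketch. The branching rule (a variable from the first clause) is an arbitrary choice.

```python
from fractions import Fraction

# F = (x ∨ y) ∧ (x ∨ u ∨ w) ∧ (¬x ∨ u ∨ w ∨ z); literals are (variable, is_positive).
F = [
    frozenset({("x", True), ("y", True)}),
    frozenset({("x", True), ("u", True), ("w", True)}),
    frozenset({("x", False), ("u", True), ("w", True), ("z", True)}),
]


def condition(cnf, var, value):
    """Return F[var=value]: drop satisfied clauses, delete falsified literals."""
    residual = []
    for clause in cnf:
        if (var, value) in clause:
            continue  # this literal is now true, so the clause is satisfied
        residual.append(frozenset(lit for lit in clause if lit[0] != var))
    return residual


def pr(cnf):
    """Basic DPLL: Pr(F) when every variable is true with probability 1/2."""
    if any(len(clause) == 0 for clause in cnf):
        return Fraction(0)  # an empty clause means F = false
    if not cnf:
        return Fraction(1)  # no clauses left means F = true
    x = next(iter(cnf[0]))[0]  # select a variable (here: one from the first clause)
    return (pr(condition(cnf, x, False)) + pr(condition(cnf, x, True))) / 2


print(pr(F))  # 5/8
```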


DPLL Algorithms


The trace is a Decision-Tree for F


Extensions to DPLL

• Caching Subformulas

• Component Analysis

• Conflict Directed Clause Learning
▫ Affects the efficiency of the algorithm, but not the final “form” of the trace

Traces of:
• DPLL + caching + (clause learning) → FBDD
• DPLL + caching + component + (clause learning) → Decision-DNNF


Caching

// basic DPLL:
Function Pr(F):
  if F = false then return 0
  if F = true then return 1
  select a variable x, return ½ Pr(F[x=0]) + ½ Pr(F[x=1])

[Figure: the DPLL tree for F, in which the subformula $u \lor w$ occurs twice, under x = 0, y = 1 and under x = 1, z = 0]

// DPLL with caching:
Cache F and Pr(F); look it up before computing
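Caching can be added to the DPLL sketch by memoizing on the residual formula. This is a minimal sketch under two assumptions: the residual clause set itself (a frozenset of frozensets) is the cache key, and Python's `functools.lru_cache` serves as the cache. Such a purely syntactic key only matches identical subformulas. The fixed branching order `ORDER` is an illustrative choice.

```python
from fractions import Fraction
from functools import lru_cache

# F = (x ∨ y) ∧ (x ∨ u ∨ w) ∧ (¬x ∨ u ∨ w ∨ z) as a hashable set of clauses.
F = frozenset({
    frozenset({("x", True), ("y", True)}),
    frozenset({("x", True), ("u", True), ("w", True)}),
    frozenset({("x", False), ("u", True), ("w", True), ("z", True)}),
})
ORDER = ["x", "y", "z", "u", "w"]  # fixed branching order, chosen for illustration


def condition(cnf, var, value):
    """F[var=value] as a frozenset, so it can serve as a cache key."""
    return frozenset(
        frozenset(lit for lit in clause if lit[0] != var)
        for clause in cnf
        if (var, value) not in clause
    )


@lru_cache(maxsize=None)
def pr_cached(cnf):
    """DPLL with caching: Pr(F) is stored per residual formula and looked up first."""
    if frozenset() in cnf:
        return Fraction(0)  # empty clause: F = false
    if not cnf:
        return Fraction(1)  # no clauses: F = true
    present = {v for clause in cnf for v, _ in clause}
    x = next(v for v in ORDER if v in present)
    return (pr_cached(condition(cnf, x, False)) + pr_cached(condition(cnf, x, True))) / 2


print(pr_cached(F))  # 5/8
```

With the order x, y, z, the subformula u ∨ w is first reached under x = 0, y = 1. When it is reached again under x = 1, z = 0, its probability is a cache hit.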


Caching & FBDDs

[Figure: the trace of DPLL with caching on F; the node for $u \lor w$ is shared by the y-branch and the z-branch]

The trace is a decision-DAG for F

FBDD (Free Binary Decision Diagram) or ROBP (Read-Once Branching Program)
• Every variable is tested at most once on any path
• All internal nodes are decision-nodes


Component Analysis

[Figure: the decision-DAG for F; under x = 0 the residual formula $y \land (u \lor w)$ consists of two parts, y and $u \lor w$, with no common variables]

// basic DPLL:
Function Pr(F):
  if F = false then return 0
  if F = true then return 1
  select a variable x, return ½ Pr(F[x=0]) + ½ Pr(F[x=1])

// DPLL with component analysis (and caching):
if F = G ∧ H, where G and H have disjoint sets of variables, then Pr(F) = Pr(G) × Pr(H)
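Component analysis can be layered onto the cached DPLL sketch. Before branching, the residual clause set is split into groups with disjoint variables, and their probabilities are multiplied. This sketch reuses the hypothetical frozenset CNF encoding and an illustrative branching order. The `components` helper is a simple quadratic grouping, chosen for readability.

```python
from fractions import Fraction
from functools import lru_cache

# F = (x ∨ y) ∧ (x ∨ u ∨ w) ∧ (¬x ∨ u ∨ w ∨ z); literals are (variable, is_positive).
F = frozenset({
    frozenset({("x", True), ("y", True)}),
    frozenset({("x", True), ("u", True), ("w", True)}),
    frozenset({("x", False), ("u", True), ("w", True), ("z", True)}),
})
ORDER = ["x", "y", "z", "u", "w"]  # illustrative fixed branching order


def variables(clauses):
    return {v for clause in clauses for v, _ in clause}


def components(cnf):
    """Group clauses into connected components (no variable is shared across groups)."""
    remaining = set(cnf)
    groups = []
    while remaining:
        group = {remaining.pop()}
        group_vars = variables(group)
        grew = True
        while grew:
            grew = False
            for clause in list(remaining):
                if variables([clause]) & group_vars:
                    remaining.remove(clause)
                    group.add(clause)
                    group_vars |= variables([clause])
                    grew = True
        groups.append(frozenset(group))
    return groups


def condition(cnf, var, value):
    """F[var=value]: drop satisfied clauses, delete falsified literals."""
    return frozenset(
        frozenset(lit for lit in clause if lit[0] != var)
        for clause in cnf
        if (var, value) not in clause
    )


@lru_cache(maxsize=None)
def pr_comp(cnf):
    """DPLL with caching and component analysis (uniform distribution)."""
    if frozenset() in cnf:
        return Fraction(0)
    if not cnf:
        return Fraction(1)
    parts = components(cnf)
    if len(parts) > 1:
        result = Fraction(1)
        for part in parts:
            result *= pr_comp(part)  # Pr(G ∧ H) = Pr(G) × Pr(H) for disjoint variables
        return result
    present = variables(cnf)
    x = next(v for v in ORDER if v in present)
    return (pr_comp(condition(cnf, x, False)) + pr_comp(condition(cnf, x, True))) / 2


residual = condition(F, "x", False)  # y ∧ (u ∨ w)
print(len(components(residual)))      # 2
print(pr_comp(F))                     # 5/8
```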


Components & Decision-DNNF

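[Figure: the decision-DNNF trace for $F = (x \lor y) \land (x \lor u \lor w) \land (\lnot x \lor u \lor w \lor z)$; under x = 0, an AND-node combines the decision node for y with the sub-DAG for $u \lor w$]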

The trace is a Decision-DNNF [Huang-Darwiche ’05, ’07]

FBDD + “Decomposable” AND-nodes

(Two sub-DAGs do not share variables)

[Figure legend: decision node (e.g. testing y, with 0 and 1 children); AND-node (e.g. over y and $u \lor w$)]

How much power do they add?
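The AND-nodes change how probabilities are evaluated: a decomposable AND-node simply multiplies the probabilities of its two sub-DAGs. This sketch assumes a hypothetical dictionary encoding of a decision-DNNF for F, with decision nodes as ("dec", variable, low, high) and AND-nodes as ("and", left, right). It also checks decomposability, meaning that the two children of each AND-node test disjoint sets of variables.

```python
from fractions import Fraction

# Hypothetical encoding: ("dec", var, low, high) or ("and", left, right); sinks are 0 and 1.
DNNF = {
    "x":  ("dec", "x", "a", "z"),
    "a":  ("and", "y", "uw"),     # y ∧ (u ∨ w)
    "y":  ("dec", "y", 0, 1),
    "z":  ("dec", "z", "uw", 1),  # u ∨ w ∨ z
    "uw": ("dec", "u", "w", 1),   # u ∨ w, shared by two parents
    "w":  ("dec", "w", 0, 1),
}


def vars_below(dag, node, memo):
    """Variables tested anywhere in the sub-DAG rooted at `node`."""
    if node in (0, 1):
        return frozenset()
    if node not in memo:
        kind, *rest = dag[node]
        if kind == "dec":
            var, lo, hi = rest
            memo[node] = frozenset({var}) | vars_below(dag, lo, memo) | vars_below(dag, hi, memo)
        else:
            left, right = rest
            memo[node] = vars_below(dag, left, memo) | vars_below(dag, right, memo)
    return memo[node]


def is_decomposable(dag):
    """Every AND-node's two sub-DAGs must test disjoint sets of variables."""
    memo = {}
    return all(
        not (vars_below(dag, rest[0], memo) & vars_below(dag, rest[1], memo))
        for kind, *rest in dag.values()
        if kind == "and"
    )


def dnnf_probability(dag, node, pr, memo=None):
    """Decision node: (1 - p) Pr(low) + p Pr(high); AND-node: Pr(left) × Pr(right)."""
    if memo is None:
        memo = {}
    if node in (0, 1):
        return Fraction(node)
    if node not in memo:
        kind, *rest = dag[node]
        if kind == "dec":
            var, lo, hi = rest
            p = pr[var]
            memo[node] = (1 - p) * dnnf_probability(dag, lo, pr, memo) + p * dnnf_probability(dag, hi, pr, memo)
        else:
            left, right = rest
            memo[node] = dnnf_probability(dag, left, pr, memo) * dnnf_probability(dag, right, pr, memo)
    return memo[node]


half = {v: Fraction(1, 2) for v in "xyzuw"}
print(is_decomposable(DNNF))              # True
print(dnnf_probability(DNNF, "x", half))  # 5/8
```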


Main Technical Result

Decision-DNNF ⇒ FBDD (efficient construction)
• Size N ⇒ size $N^{\log N + 1}$ (quasipolynomial)
• For a k-DNF, e.g. a 3-DNF such as $(x \land y \land z) \lor (w \land y \land z)$: size N ⇒ size $N^k$ (polynomial)



Decision-DNNF → FBDD

Need to convert all AND-nodes to decision-nodes while evaluating the same formula F


A Simple Idea

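[Figure: an AND-node over G and H in a decision-DNNF becomes an FBDD by placing G on top and redirecting the 1-sink of G to the root of H; the 0-sinks of G and H both lead to 0]

G and H do not share variables, so every variable is still tested at most once on any path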
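The simple idea can be sketched on tree-shaped decision-DNNFs. In a tree no sub-DAG is shared, so redirecting the 1-sinks of G cannot affect any other node. This sketch assumes a hypothetical nested-tuple encoding. `plug` puts H below every 1-sink of G, and `remove_and_nodes` applies it bottom-up, so the result contains decision nodes only.

```python
from itertools import product

# Hypothetical tree encoding: sinks 0 and 1, ("dec", var, low, high), ("and", left, right).
UW = ("dec", "u", ("dec", "w", 0, 1), 1)      # u ∨ w
TREE = ("dec", "x",
        ("and", ("dec", "y", 0, 1), UW),      # x = 0: y ∧ (u ∨ w)
        ("dec", "z", UW, 1))                  # x = 1: u ∨ w ∨ z


def plug(g, h):
    """Redirect every 1-sink of the decision tree g to h; 0-sinks stay 0."""
    if g == 0:
        return 0
    if g == 1:
        return h
    _, var, lo, hi = g
    return ("dec", var, plug(lo, h), plug(hi, h))


def remove_and_nodes(node):
    """Replace each AND-node over G and H by G with its 1-sinks redirected to H."""
    if node in (0, 1):
        return node
    if node[0] == "dec":
        _, var, lo, hi = node
        return ("dec", var, remove_and_nodes(lo), remove_and_nodes(hi))
    _, left, right = node
    return plug(remove_and_nodes(left), remove_and_nodes(right))


def evaluate(node, assignment):
    """Value (0 or 1) of the diagram under a full assignment."""
    if node in (0, 1):
        return node
    if node[0] == "and":
        return evaluate(node[1], assignment) & evaluate(node[2], assignment)
    _, var, lo, hi = node
    return evaluate(hi if assignment[var] else lo, assignment)


def has_and(node):
    """True if any AND-node remains in the tree."""
    return node not in (0, 1) and (node[0] == "and" or any(has_and(c) for c in node[2:]))


FBDD = remove_and_nodes(TREE)
VARS = "xyuwz"
same = all(
    evaluate(TREE, dict(zip(VARS, bits))) == evaluate(FBDD, dict(zip(VARS, bits)))
    for bits in product([False, True], repeat=len(VARS))
)
print(has_and(FBDD), same)  # False True
```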


But, what if sub-DAGs are shared?

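[Figure: G and H are also reached from other nodes g' and h of the decision-DNNF; after the 1-sink of G is redirected to H, the path from g' through G would also pass through H. Conflict!]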


Obvious Solution: Replicate Nodes

[Figure: the AND-node gets private copies of G and H, while g' and h keep the originals]
• No conflict: apply the simple idea
• But replication may need to be recursive, which can cause an exponential blowup!


Main Idea: Replicate Smaller Sub-DAG

[Figure: an AND-node with a smaller and a larger sub-DAG; both sub-DAGs also have incoming edges from other nodes in the decision-DNNF]

Each AND-node creates a private copy of its smaller sub-DAG


Light and Heavy Edges

[Figure: the edge from an AND-node to its smaller sub-DAG is a light edge; the edge to its larger sub-DAG is a heavy edge]

Each AND-node creates a private copy of its smaller sub-DAG

⇒ Recursively, each node u is replicated as many times as it occurs in a smaller sub-DAG
⇒ #Copies of u = #sequences of light edges leading to u
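The light/heavy classification and the quantity L used in the bound can be computed directly. This sketch assumes a hypothetical dictionary encoding of a decision-DNNF for F with binary AND-nodes. Sub-DAG size counts internal nodes only, and ties go to the first child. It computes L and the resulting bound N · N^L, but it does not build the FBDD itself.

```python
# Hypothetical encoding: ("dec", var, low, high) or ("and", left, right); sinks are 0 and 1.
DNNF = {
    "x":  ("dec", "x", "a", "z"),
    "a":  ("and", "y", "uw"),
    "y":  ("dec", "y", 0, 1),
    "z":  ("dec", "z", "uw", 1),
    "uw": ("dec", "u", "w", 1),
    "w":  ("dec", "w", 0, 1),
}


def children(dag, node):
    """Child node ids (sinks have none)."""
    if node in (0, 1):
        return ()
    kind, *rest = dag[node]
    return tuple(rest[1:]) if kind == "dec" else tuple(rest)


def reachable(dag, node, memo):
    """Internal nodes of the sub-DAG rooted at `node` (sinks excluded)."""
    if node in (0, 1):
        return frozenset()
    if node not in memo:
        nodes = {node}
        for c in children(dag, node):
            nodes |= reachable(dag, c, memo)
        memo[node] = frozenset(nodes)
    return memo[node]


def light_edges(dag):
    """For each AND-node, the edge to its smaller child (ties: first child) is light."""
    memo = {}
    light = set()
    for node, (kind, *rest) in dag.items():
        if kind == "and":
            left, right = rest
            small = left if len(reachable(dag, left, memo)) <= len(reachable(dag, right, memo)) else right
            light.add((node, small))
    return light


def max_light_on_path(dag, node, light, memo):
    """The maximum number of light edges on any path starting at `node`."""
    if node in (0, 1):
        return 0
    if node not in memo:
        memo[node] = max(
            (int((node, c) in light) + max_light_on_path(dag, c, light, memo)
             for c in children(dag, node)),
            default=0,
        )
    return memo[node]


N = len(DNNF)
light = light_edges(DNNF)
L = max_light_on_path(DNNF, "x", light, {})
print(sorted(light), L, N)      # [('a', 'y')] 1 6
print(2 ** L <= N, N * N ** L)  # True 36
```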


Quasipolynomial Conversion

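L = max #light edges on any path

L ≤ log N: each light edge leads into a sub-DAG at most half as large as the one it leaves, so
$N = N_{\text{small}} + N_{\text{big}} \ge 2 N_{\text{small}} \ge \dots \ge 2^L$

#Copies of each node ≤ $N^L \le N^{\log N}$

#Nodes in FBDD ≤ $N \cdot N^{\log N}$

We also show that our analysis is tight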


Polynomial Conversion for k-DNFs

• L = max #light edges on any path ≤ k − 1
• #Nodes in FBDD ≤ $N \cdot N^{L} \le N^{k}$
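A worked comparison shows the difference in scale between the two bounds. It assumes logarithms base 2 and an illustrative decision-DNNF size of N = 2^{10} = 1024.

```latex
N^{\log N + 1} = \left(2^{10}\right)^{10 + 1} = 2^{110},
\qquad
N^{k} = \left(2^{10}\right)^{3} = 2^{30} \quad (k = 3)
```

The quasipolynomial bound eventually outgrows every fixed polynomial in N. For a fixed k, the k-DNF bound stays polynomial.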



Separation Results

[Figure: the four representations FBDD, Decision-DNNF, AND-FBDD and d-DNNF, with an exponential separation between decision-DNNFs and both AND-FBDDs and d-DNNFs]

• FBDD: decision-DAG in which each variable is tested at most once along any path
• Decision-DNNF: FBDD + decomposable AND-nodes (disjoint sub-DAGs)
• AND-FBDD: FBDD + AND-nodes (not necessarily decomposable) [Wegener ’00]
• d-DNNF: decomposable AND-nodes + OR-nodes whose sub-DAGs are not simultaneously satisfiable [Darwiche ’01, Darwiche-Marquis ’02]

Exponential Separation
• Some formulas have poly-size AND-FBDDs or d-DNNFs, yet every decision-DNNF for them has exponential size



Probabilistic Databases

AsthmaPatient: Ann, Bob
Friend: (Ann, Joe), (Ann, Tom), (Bob, Tom)
Smoker: Joe, Tom

Boolean query $Q = \exists x\, \exists y\; \mathrm{AsthmaPatient}(x) \land \mathrm{Friend}(x, y) \land \mathrm{Smoker}(y)$

• Tuples are probabilistic (and independent)
▫ “Ann” is present with probability 0.3
• What is the probability that Q is true on D?
▫ Assign unique variables to tuples
• Boolean formula $F_{Q,D} = (x_1 \land y_1 \land z_1) \lor (x_1 \land y_2 \land z_2) \lor (x_2 \land y_3 \land z_2)$
▫ Q is true on D ⇔ $F_{Q,D}$ is true

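[Figure: each tuple carries a variable and a probability: x1 = Ann, x2 = Bob (AsthmaPatient); y1 = (Ann, Joe), y2 = (Ann, Tom), y3 = (Bob, Tom) (Friend); z1 = Joe, z2 = Tom (Smoker); e.g. Pr(x1) = 0.3]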
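The lineage F_{Q,D} can be built mechanically from the three relations. Every pair (x, y) with AsthmaPatient(x), Friend(x, y) and Smoker(y) contributes one conjunctive term, which is why the result here is a 3-DNF. This sketch assumes hypothetical dictionaries that map each tuple to its variable, and it computes the probability by brute force. Pr(x1) = 0.3 comes from the slide; every other tuple probability is set to 1/2 purely for illustration.

```python
from fractions import Fraction
from itertools import product

# Each tuple is mapped to its Boolean variable (hypothetical encoding).
asthma_patient = {"Ann": "x1", "Bob": "x2"}
friend = {("Ann", "Joe"): "y1", ("Ann", "Tom"): "y2", ("Bob", "Tom"): "y3"}
smoker = {"Joe": "z1", "Tom": "z2"}


def lineage():
    """F_{Q,D} for Q = ∃x ∃y AsthmaPatient(x) ∧ Friend(x, y) ∧ Smoker(y), as DNF terms."""
    return [
        (asthma_patient[a], f_var, smoker[b])
        for (a, b), f_var in friend.items()
        if a in asthma_patient and b in smoker
    ]


def dnf_probability(terms, pr):
    """Pr(DNF) for independent variables, by enumerating all assignments (small inputs only)."""
    variables = sorted({v for term in terms for v in term})
    total = Fraction(0)
    for bits in product([False, True], repeat=len(variables)):
        a = dict(zip(variables, bits))
        if any(all(a[v] for v in term) for term in terms):
            weight = Fraction(1)
            for v in variables:
                weight *= pr[v] if a[v] else 1 - pr[v]
            total += weight
    return total


terms = lineage()
print(terms)  # [('x1', 'y1', 'z1'), ('x1', 'y2', 'z2'), ('x2', 'y3', 'z2')]

# Illustrative probabilities: Pr(x1) = 0.3 from the slide, all others assumed to be 1/2.
pr = {v: Fraction(1, 2) for v in ["x1", "x2", "y1", "y2", "y3", "z1", "z2"]}
pr["x1"] = Fraction(3, 10)
print(dnf_probability(terms, pr))  # 149/640
```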


Probabilistic Databases

• $F_{Q,D} = (x_1 \land y_1 \land z_1) \lor (x_1 \land y_2 \land z_2) \lor (x_2 \land y_3 \land z_2)$
• Probability Computation Problem: compute Pr($F_{Q,D}$) given Pr(x1), Pr(x2), …
• $F_{Q,D}$ can be written as a k-DNF
▫ for fixed, monotone queries Q

For an important class of queries Q, we get exponential lower bounds on decision-DNNFs and model counting algorithms



Summary

• Quasi-polynomial conversion of any decision-DNNF into an FBDD (polynomial for k-DNF)

• Exponential lower bounds on model counting algorithms
• d-DNNFs and AND-FBDDs are exponentially more powerful than decision-DNNFs

• Applications in probabilistic databases


Open Problems

• A polynomial conversion of decision-DNNFs to FBDDs?

• A more powerful syntactic subclass of d-DNNFs than decision-DNNFs?
▫ d-DNNF is a semantic concept
▫ No efficient algorithm to test if two sub-DAGs of an OR-node are simultaneously satisfiable

• Approximate model counting?


Thank You

Questions?