symmetry in probabilistic databases - web.cs.ucla.eduguyvdb/slides/amw15.pdf · symmetry in...

111
Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame, Eric Gribkoff, Wannes Meert, Adnan Darwiche Based on NIPS 2011, KR 2014, and upcoming PODS 2015 paper

Upload: others

Post on 07-Sep-2019

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Symmetry in Probabilistic Databases

Guy Van den Broeck KU Leuven

Joint work with

Dan Suciu, Paul Beame, Eric Gribkoff,

Wannes Meert, Adnan Darwiche

Based on NIPS 2011, KR 2014, and upcoming PODS 2015 paper

Page 2: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Overview

• Motivation and convergence of – The artificial intelligence story (recap) – The machine learning story (recap) – The probabilistic database story – The database theory story

• Main theoretical results and proof outlines • Discussion and conclusions • Dessert

Page 3: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Overview

• Motivation and convergence of – The artificial intelligence story (recap) – The machine learning story (recap) – The probabilistic database story – The database theory story

• Main theoretical results and proof outlines • Discussion and conclusions • Dessert

Page 4: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

...

A Simple Reasoning Problem

?

Probability that Card1 is Hearts?

[Van den Broeck; AAAI-KRR‟15]

Page 5: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

...

A Simple Reasoning Problem

?

Probability that Card1 is Hearts? 1/4

[Van den Broeck; AAAI-KRR‟15]

Page 6: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

A Simple Reasoning Problem

...

?

Probability that Card52 is Spades given that Card1 is QH?

[Van den Broeck; AAAI-KRR‟15]

Page 7: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

A Simple Reasoning Problem

...

?

Probability that Card52 is Spades given that Card1 is QH? 13/51

[Van den Broeck; AAAI-KRR‟15]

Page 8: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Automated Reasoning

Let us automate this:

1. Probabilistic graphical model (e.g., factor graph)

2. Probabilistic inference algorithm (e.g., variable elimination or junction tree)

Page 9: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Automated Reasoning

(artist's impression)

Let us automate this: 1. Probabilistic graphical model (e.g., factor graph)

is fully connected!

2. Probabilistic inference algorithm (e.g., variable elimination or junction tree) builds a table with 5252 rows

[Van den Broeck; AAAI-KRR’15]

Page 10: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

...

What's Going On Here?

?

Probability that Card52 is Spades given that Card1 is QH?

[Van den Broeck; AAAI-KRR‟15]

Page 11: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

...

What's Going On Here?

?

Probability that Card52 is Spades given that Card1 is QH? 13/51

[Van den Broeck; AAAI-KRR‟15]

Page 12: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

What's Going On Here?

?

...

Probability that Card52 is Spades given that Card2 is QH?

[Van den Broeck; AAAI-KRR‟15]

Page 13: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

What's Going On Here?

?

...

Probability that Card52 is Spades given that Card2 is QH? 13/51

[Van den Broeck; AAAI-KRR‟15]

Page 14: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

What's Going On Here?

?

...

Probability that Card52 is Spades given that Card3 is QH?

[Van den Broeck; AAAI-KRR‟15]

Page 15: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

What's Going On Here?

?

...

Probability that Card52 is Spades given that Card3 is QH? 13/51

[Van den Broeck; AAAI-KRR‟15]

Page 16: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

...

Tractable Probabilistic Inference

Which property makes inference tractable? Traditional belief: Independence What's going on here?

[Niepert, Van den Broeck; AAAI‟14], [Van den Broeck; AAAI-KRR‟15]

Page 17: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

...

Tractable Probabilistic Inference

Which property makes inference tractable? Traditional belief: Independence What's going on here?

⇒ Lifted Inference

High-level (first-order) reasoning Symmetry Exchangeability

[Niepert, Van den Broeck; AAAI‟14], [Van den Broeck; AAAI-KRR‟15]

Page 18: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Let us automate this: Relational model

Lifted probabilistic inference algorithm

∀p, ∃c, Card(p,c) ∀c, ∃p, Card(p,c)

∀p, ∀c, ∀c‟, Card(p,c) ∧ Card(p,c‟) ⇒ c = c‟

...

Page 19: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

...

Playing Cards Revisited

Let us automate this:

∀p, ∃c, Card(p,c) ∀c, ∃p, Card(p,c)

∀p, ∀c, ∀c’, Card(p,c) ∧ Card(p,c’) ⇒ c = c’

[Van den Broeck.; AAAI-KR‟15]

Page 20: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

...

Playing Cards Revisited

Let us automate this:

∀p, ∃c, Card(p,c) ∀c, ∃p, Card(p,c)

∀p, ∀c, ∀c’, Card(p,c) ∧ Card(p,c’) ⇒ c = c’

[Van den Broeck.; AAAI-KR‟15]

Page 21: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

...

Playing Cards Revisited

Let us automate this:

∀p, ∃c, Card(p,c) ∀c, ∃p, Card(p,c)

∀p, ∀c, ∀c’, Card(p,c) ∧ Card(p,c’) ⇒ c = c’

Computed in time polynomial in n

[Van den Broeck.; AAAI-KR‟15]

Page 22: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Model Counting

• Model = solution to a propositional logic formula Δ • Model counting = #SAT

Rain Cloudy Model? T T Yes

T F No

F T Yes

F F Yes

#SAT = 3 +

Δ = (Rain ⇒ Cloudy)

[Valiant] #P-hard, even for 2CNF

Page 23: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

First-Order Model Counting Model = solution to first-order logic formula Δ

Δ = ∀d (Rain(d) ⇒ Cloudy(d))

Days = {Monday}

Page 24: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

First-Order Model Counting Model = solution to first-order logic formula Δ

Rain(M) Cloudy(M) Model?

T T Yes

T F No

F T Yes

F F Yes

FOMC = 3 +

Δ = ∀d (Rain(d) ⇒ Cloudy(d))

Days = {Monday}

Page 25: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

First-Order Model Counting Model = solution to first-order logic formula Δ

Rain(M) Cloudy(M) Rain(T) Cloudy(T) Model?

T T T T Yes

T F T T No

F T T T Yes

F F T T Yes

T T T F No

T F T F No

F T T F No

F F T F No

T T F T Yes

T F F T No

F T F T Yes

F F F T Yes

T T F F Yes

T F F F No

F T F F Yes

F F F F Yes

Δ = ∀d (Rain(d) ⇒ Cloudy(d))

Days = {Monday Tuesday}

Page 26: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

First-Order Model Counting Model = solution to first-order logic formula Δ

Rain(M) Cloudy(M) Rain(T) Cloudy(T) Model?

T T T T Yes

T F T T No

F T T T Yes

F F T T Yes

T T T F No

T F T F No

F T T F No

F F T F No

T T F T Yes

T F F T No

F T F T Yes

F F F T Yes

T T F F Yes

T F F F No

F T F F Yes

F F F F Yes

FOMC = 9 +

Δ = ∀d (Rain(d) ⇒ Cloudy(d))

Days = {Monday Tuesday}

Page 27: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

FOMC Inference: Example 1

Page 28: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

3. Δ = ∀x, (Stress(x) ⇒ Smokes(x)) Domain = {n people}

FOMC Inference: Example 1

Page 29: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

→ 3n models

3. Δ = ∀x, (Stress(x) ⇒ Smokes(x)) Domain = {n people}

FOMC Inference: Example 1

Page 30: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

→ 3n models

3. Δ = ∀x, (Stress(x) ⇒ Smokes(x)) Domain = {n people}

FOMC Inference: Example 1

2. Δ = ∀y, (ParentOf(y) ∧ Female ⇒ MotherOf(y)) D = {n people}

Page 31: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

→ 3n models

3. Δ = ∀x, (Stress(x) ⇒ Smokes(x)) Domain = {n people}

FOMC Inference: Example 1

2. Δ = ∀y, (ParentOf(y) ∧ Female ⇒ MotherOf(y)) D = {n people}

If Female = true? Δ = ∀y, (ParentOf(y) ⇒ MotherOf(y)) → 3n models

Page 32: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

→ 3n models

3. Δ = ∀x, (Stress(x) ⇒ Smokes(x)) Domain = {n people}

FOMC Inference: Example 1

2. Δ = ∀y, (ParentOf(y) ∧ Female ⇒ MotherOf(y)) D = {n people}

If Female = true? Δ = ∀y, (ParentOf(y) ⇒ MotherOf(y)) → 3n models

→ 4n models If Female = false? Δ = true

Page 33: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

→ 3n models

3. Δ = ∀x, (Stress(x) ⇒ Smokes(x)) Domain = {n people}

FOMC Inference: Example 1

→ 3n + 4n models

2. Δ = ∀y, (ParentOf(y) ∧ Female ⇒ MotherOf(y)) D = {n people}

If Female = true? Δ = ∀y, (ParentOf(y) ⇒ MotherOf(y)) → 3n models

→ 4n models If Female = false? Δ = true

Page 34: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

→ 3n models

3. Δ = ∀x, (Stress(x) ⇒ Smokes(x)) Domain = {n people}

FOMC Inference: Example 1

→ 3n + 4n models

2. Δ = ∀y, (ParentOf(y) ∧ Female ⇒ MotherOf(y))

1. Δ = ∀x,y, (ParentOf(x,y) ∧ Female(x) ⇒ MotherOf(x,y)) D = {n people}

D = {n people}

If Female = true? Δ = ∀y, (ParentOf(y) ⇒ MotherOf(y)) → 3n models

→ 4n models If Female = false? Δ = true

Page 35: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

→ 3n models

3. Δ = ∀x, (Stress(x) ⇒ Smokes(x)) Domain = {n people}

FOMC Inference: Example 1

→ 3n + 4n models

→ (3n + 4n)n models

2. Δ = ∀y, (ParentOf(y) ∧ Female ⇒ MotherOf(y))

1. Δ = ∀x,y, (ParentOf(x,y) ∧ Female(x) ⇒ MotherOf(x,y)) D = {n people}

D = {n people}

If Female = true? Δ = ∀y, (ParentOf(y) ⇒ MotherOf(y)) → 3n models

→ 4n models If Female = false? Δ = true

Page 36: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

FOMC Inference : Example 2

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Page 37: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

FOMC Inference : Example 2

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Page 38: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

FOMC Inference : Example 2

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Page 39: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

FOMC Inference : Example 2

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Page 40: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

FOMC Inference : Example 2

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Page 41: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

FOMC Inference : Example 2

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Page 42: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

FOMC Inference : Example 2

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Page 43: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

FOMC Inference : Example 2

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Page 44: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

FOMC Inference : Example 2

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Page 45: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

FOMC Inference : Example 2

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Page 46: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

FOMC Inference : Example 2

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

→ models

Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Page 47: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

FOMC Inference : Example 2

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

If we know that there are k smokers?

→ models

Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Page 48: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

FOMC Inference : Example 2

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

If we know that there are k smokers?

→ models

Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

→ models

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Page 49: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

FOMC Inference : Example 2

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

If we know that there are k smokers?

In total…

→ models

Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

→ models

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Page 50: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

FOMC Inference : Example 2

If we know precisely who smokes, and there are k smokers?

k

n-k

k

n-k

If we know that there are k smokers?

In total…

→ models

Database: Smokes(Alice) = 1 Smokes(Bob) = 0 Smokes(Charlie) = 0 Smokes(Dave) = 1 Smokes(Eve) = 0 ...

→ models

→ models

Smokes Smokes Friends

Δ = ∀x,y, (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) Domain = {n people}

Page 51: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Overview

• Motivation and convergence of – The artificial intelligence story (recap) – The machine learning story (recap) – The probabilistic database story – The database theory story

• Main theoretical results and proof outlines • Discussion and conclusions • Dessert

Page 52: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Statistical Relational Models

• An MLN = set of constraints (w, Γ(x))

• Weight of a world = product of w, for all rules (w, Γ(x)) and groundings Γ(a) that hold in the world

∞ Smoker(x) ⇒ Person(x)

3.75 Smoker(x)∧Friend(x,y) ⇒ Smoker(y)

PMLN(Q) = [sum of weights of models of Q] / Z

Soft constraint

Hard constraint

Applications: large KBs, e.g. DeepDive

Page 53: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Weighted Model Counting

• Model = solution to a propositional logic formula Δ • Model counting = #SAT

Rain Cloudy Model? T T Yes

T F No

F T Yes

F F Yes

#SAT = 3 +

Δ = (Rain ⇒ Cloudy)

Page 54: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Weighted Model Counting

• Model = solution to a propositional logic formula Δ • Model counting = #SAT

Rain Cloudy Model? T T Yes

T F No

F T Yes

F F Yes

#SAT = 3

Weight 1 * 3 = 3

0

2 * 3 = 6

2 * 5 = 10

• Weighted model counting (WMC) – Weights for assignments to variables – Model weight is product of variable weights w(.)

+

Δ = (Rain ⇒ Cloudy)

w( R)=1 w(¬R)=2 w( C)=3 w(¬C)=5

Page 55: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Weighted Model Counting

• Model = solution to a propositional logic formula Δ • Model counting = #SAT

Rain Cloudy Model? T T Yes

T F No

F T Yes

F F Yes

#SAT = 3

Weight 1 * 3 = 3

0

2 * 3 = 6

2 * 5 = 10

WMC = 19

• Weighted model counting (WMC) – Weights for assignments to variables – Model weight is product of variable weights w(.)

+ +

Δ = (Rain ⇒ Cloudy)

w( R)=1 w(¬R)=2 w( C)=3 w(¬C)=5

Page 56: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Assembly language for probabilistic reasoning and learning

Bayesian networks Factor graphs

Probabilistic databases

Relational Bayesian networks

Probabilistic logic programs

Markov Logic

Weighted Model Counting

Page 57: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Weighted First-Order Model Counting Model = solution to first-order logic formula Δ

Rain(M) Cloudy(M) Rain(T) Cloudy(T) Model?

T T T T Yes

T F T T No

F T T T Yes

F F T T Yes

T T T F No

T F T F No

F T T F No

F F T F No

T T F T Yes

T F F T No

F T F T Yes

F F F T Yes

T T F F Yes

T F F F No

F T F F Yes

F F F F Yes

Δ = ∀d (Rain(d) ⇒ Cloudy(d))

Days = {Monday Tuesday}

Page 58: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Weighted First-Order Model Counting Model = solution to first-order logic formula Δ

Rain(M) Cloudy(M) Rain(T) Cloudy(T) Model?

T T T T Yes

T F T T No

F T T T Yes

F F T T Yes

T T T F No

T F T F No

F T T F No

F F T F No

T T F T Yes

T F F T No

F T F T Yes

F F F T Yes

T T F F Yes

T F F F No

F T F F Yes

F F F F Yes

#SAT = 9 +

Δ = ∀d (Rain(d) ⇒ Cloudy(d))

Days = {Monday Tuesday}

Page 59: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Weighted First-Order Model Counting Model = solution to first-order logic formula Δ

Weight

1 * 1 * 3 * 3 = 9

0

2 * 1* 3 * 3 = 18

2 * 1 * 5 * 3 = 30

0

0

0

0

1 * 2 * 3 * 3 = 18

0

2 * 2 * 3 * 3 = 36

2 * 2 * 5 * 3 = 60

1 * 2 * 3 * 5 = 30

0

2 * 2 * 3 * 5 = 60

2 * 2 * 5 * 5 = 100

Rain(M) Cloudy(M) Rain(T) Cloudy(T) Model?

T T T T Yes

T F T T No

F T T T Yes

F F T T Yes

T T T F No

T F T F No

F T T F No

F F T F No

T T F T Yes

T F F T No

F T F T Yes

F F F T Yes

T T F F Yes

T F F F No

F T F F Yes

F F F F Yes

#SAT = 9 +

Δ = ∀d (Rain(d) ⇒ Cloudy(d))

Days = {Monday Tuesday}

w( R)=1 w(¬R)=2 w( C)=3 w(¬C)=5

Page 60: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Weighted First-Order Model Counting Model = solution to first-order logic formula Δ

Weight

1 * 1 * 3 * 3 = 9

0

2 * 1* 3 * 3 = 18

2 * 1 * 5 * 3 = 30

0

0

0

0

1 * 2 * 3 * 3 = 18

0

2 * 2 * 3 * 3 = 36

2 * 2 * 5 * 3 = 60

1 * 2 * 3 * 5 = 30

0

2 * 2 * 3 * 5 = 60

2 * 2 * 5 * 5 = 100

WFOMC = 361 +

Rain(M) Cloudy(M) Rain(T) Cloudy(T) Model?

T T T T Yes

T F T T No

F T T T Yes

F F T T Yes

T T T F No

T F T F No

F T T F No

F F T F No

T T F T Yes

T F F T No

F T F T Yes

F F F T Yes

T T F F Yes

T F F F No

F T F F Yes

F F F F Yes

#SAT = 9 +

Δ = ∀d (Rain(d) ⇒ Cloudy(d))

Days = {Monday Tuesday}

w( R)=1 w(¬R)=2 w( C)=3 w(¬C)=5

Page 61: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Assembly language for high-level probabilistic reasoning and learning

Parfactor graphs

Probabilistic databases

Relational Bayesian networks

Probabilistic logic programs

Markov Logic

Weighted First-Order Model Counting

[VdB et al.; IJCAI‟11, PhD‟13, KR‟14, UAI‟14]

Page 62: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Symmetric WFOMC

Def. A weighted vocabulary is (R, w), where

– R = (R1, R2, …, Rk) = relational vocabulary

– w = (w1, w2, …, wk) = weights

• Fix an FO formula Q, domain of size n

• The weight of a ground tuple t in Ri is wi

This talk: complexity of FOMC / WFOMC(Q, n) • Data complexity: fixed Q, input n / and w • Combined complexity: input (Q, n) / and w

Page 63: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Example

Computable in PTIME in n

Q = ∀x∃y R(x,y) FOMC(Q,n) = (2n-1)n WOMC(Q,n,wR) = ((1+wR)n-1)n

Page 64: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Example

Q = ∃x∃y [R(x) ∧ S(x,y) ∧T(y)]

Computable in PTIME in n

Q = ∀x∃y R(x,y) FOMC(Q,n) = (2n-1)n WOMC(Q,n,wR) = ((1+wR)n-1)n

Page 65: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Example

Q = ∃x∃y [R(x) ∧ S(x,y) ∧T(y)]

Computable in PTIME in n

Q = ∀x∃y R(x,y) FOMC(Q,n) = (2n-1)n WOMC(Q,n,wR) = ((1+wR)n-1)n

Page 66: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Example

Q = ∃x∃y∃z [R(x,y) ∧ S(y,z) ∧T(z,x)]

Conjecture FOMC(Q, n) not computable in PTIME in n

Can we compute FOMC(Q, n) in PTIME? Open problem…

Page 67: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

From MLN to WFOMC

∞ Smoker(x) ⇒ Person(x) w ~Smoker(x)∨~Friend(x,y)∨Smoker(y)

Theorem PMLN(Q) = P(Q | hard constraints in MLN‟) = WFOMC(Q ∧ MLN‟) / WFOMC(MLN‟)

∞ Smoker(x) ⇒ Person(x) ∞ R(x,y) ⟺ ~Smoker(x)∨ ~Friend(x,y) ∨ Smoker(y) w R(x,y)

MLN:

MLN‟:

[Van den Broeck‟2011, Gogate‟2011]

R is a symmetric relation

Page 68: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Overview

• Motivation and convergence of – The artificial intelligence story (recap) – The machine learning story (recap) – The probabilistic database story – The database theory story

• Main theoretical results and proof outlines • Discussion and conclusions • Dessert

Page 69: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Probabilistic Databases

• Weights or probabilities given explicitly, for each tuple

• Examples: Knowledge Vault, Nell, Yago • Dichotomy theorem:

for any query in UCQ/FO(∃,∧,∨) (or FO(∀,∧,∨), asymmetric WFOMC is in PTIME or #P-hard.

Page 70: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Motivation 2: Probabilistic Databases

x y P

a1 b1 p1

a1 b2 p2

a2 b2 p3

Probabilistic database D:

Page 71: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Motivation 2: Probabilistic Databases

x y P

a1 b1 p1

a1 b2 p2

a2 b2 p3

x y

a1 b1

a1 b2

a2 b2

Possible worlds semantics:

p1p2p3

Probabilistic database D:

Page 72: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Motivation 2: Probabilistic Databases

x y P

a1 b1 p1

a1 b2 p2

a2 b2 p3

x y

a1 b1

a1 b2

a2 b2

Possible worlds semantics:

p1p2p3

x y

a1 b2

a2 b2

(1-p1)p2p3

Probabilistic database D:

Page 73: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Motivation 2: Probabilistic Databases

x y P

a1 b1 p1

a1 b2 p2

a2 b2 p3

x y

a1 b1

a1 b2

a2 b2

Possible worlds semantics:

p1p2p3

x y

a1 b2

a2 b2

(1-p1)p2p3

x y

a1 b1

a2 b2

x y

a1 b1

a1 b2

x y

a2 b2 x y

a1 b1 x y

a1 b2 x y

(1-p1)(1-p2)(1-p3)

Probabilistic database D:

Page 74: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

x y P

a1 b1 q1 Y1

a1 b2 q2 Y2

a2 b3 q3 Y3

a2 b4 q4 Y4

a2 b5 q5 Y5

S

x P

a1 p1 X1

a2 p2 X2

a3 p3 X3

R

P(Q) =

An Example Q = ∃x∃y R(x)∧S(x,y)

Page 75: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

x y P

a1 b1 q1 Y1

a1 b2 q2 Y2

a2 b3 q3 Y3

a2 b4 q4 Y4

a2 b5 q5 Y5

S

x P

a1 p1 X1

a2 p2 X2

a3 p3 X3

R

P(Q) = 1-(1-q1)*(1-q2)

An Example Q = ∃x∃y R(x)∧S(x,y)

Page 76: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

x y P

a1 b1 q1 Y1

a1 b2 q2 Y2

a2 b3 q3 Y3

a2 b4 q4 Y4

a2 b5 q5 Y5

S

x P

a1 p1 X1

a2 p2 X2

a3 p3 X3

R

P(Q) = 1-(1-q1)*(1-q2) p1*[ ]

An Example Q = ∃x∃y R(x)∧S(x,y)

Page 77: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

x y P

a1 b1 q1 Y1

a1 b2 q2 Y2

a2 b3 q3 Y3

a2 b4 q4 Y4

a2 b5 q5 Y5

S

x P

a1 p1 X1

a2 p2 X2

a3 p3 X3

R

P(Q) = 1-(1-q1)*(1-q2) p1*[ ] 1-(1-q3)*(1-q4)*(1-q5)

An Example Q = ∃x∃y R(x)∧S(x,y)

Page 78: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

x y P

a1 b1 q1 Y1

a1 b2 q2 Y2

a2 b3 q3 Y3

a2 b4 q4 Y4

a2 b5 q5 Y5

S

x P

a1 p1 X1

a2 p2 X2

a3 p3 X3

R

P(Q) = 1-(1-q1)*(1-q2) p1*[ ] 1-(1-q3)*(1-q4)*(1-q5) p2*[ ]

An Example Q = ∃x∃y R(x)∧S(x,y)

Page 79: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

x y P

a1 b1 q1 Y1

a1 b2 q2 Y2

a2 b3 q3 Y3

a2 b4 q4 Y4

a2 b5 q5 Y5

S

x P

a1 p1 X1

a2 p2 X2

a3 p3 X3

R

P(Q) = 1-(1-q1)*(1-q2) p1*[ ] 1-(1-q3)*(1-q4)*(1-q5) p2*[ ]

1- {1- } * {1- }

An Example Q = ∃x∃y R(x)∧S(x,y)

Page 80: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

x y P

a1 b1 q1 Y1

a1 b2 q2 Y2

a2 b3 q3 Y3

a2 b4 q4 Y4

a2 b5 q5 Y5

S

x P

a1 p1 X1

a2 p2 X2

a3 p3 X3

R

P(Q) = 1-(1-q1)*(1-q2) p1*[ ] 1-(1-q3)*(1-q4)*(1-q5) p2*[ ]

1- {1- } * {1- }

An Example Q = ∃x∃y R(x)∧S(x,y)

One can compute P(Q) in PTIME in the size of the database D

Page 81: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

x y P

a1 b1 q1 Y1

a1 b2 q2 Y2

a2 b3 q3 Y3

a2 b4 q4 Y4

a2 b5 q5 Y5

S

x P

a1 p1 X1

a2 p2 X2

a3 p3 X3

R

P(Q) = 1-(1-q1)*(1-q2) p1*[ ] 1-(1-q3)*(1-q4)*(1-q5) p2*[ ]

1- {1- } * {1- }

An Example Q = ∃x∃y R(x)∧S(x,y)

One can compute P(Q) in PTIME in the size of the database D

Page 82: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Probabilistic Database Inference

Preprocess Q (omitted from this talk; see book [S.‟2011])

• P(Q1 ∧ Q2) = P(Q1)P(Q2) P(Q1 ∨ Q2) =1 – (1 – P(Q1))(1 – P(Q2))

• P(∃z Q) = 1 – Πa ∈Domain (1– P(Q[a/z]) P(∀z Q) = Πa ∈Domain P(Q[a/z]

• P(Q1 ∧ Q2) = P(Q1) + P(Q2)- P(Q1 ∨ Q2) P(Q1 ∨ Q2) = P(Q1) + P(Q2)- P(Q1 ∧ Q2)

If rules succeed, WFOMC(Q,n) in PTIME; else, #P-hard

#P-hardness no longer holds for symmetric WFOMC

Independent join / union

Independent project

Inclusion/ exclusion

Page 83: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Overview

• Motivation and convergence of – The artificial intelligence story (recap) – The machine learning story (recap) – The probabilistic database story – The database theory story

• Main theoretical results and proof outlines • Discussion and conclusions • Dessert

Page 84: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Motivation: 0/1 Laws

Definition. μn(Q) = fraction of all structures over a domain of size n that are models of Q μn(Q) = FOMC(Q, n) / FOMC(TRUE, n) Theorem. For every Q in FO, limn ∞ μn(Q) = 0 or 1 Example: Q = ∀x∃y R(x,y); FOMC(Q,n) = (2n-1)n μn(Q) = (2n-1)n / 2n^2 1

Page 85: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Motivation: 0/1 Laws

In 1976 Fagin proved the 0/1 law for FO using a transfer theorem. But is there an elementary proof? Find explicit formula for μn(Q), then compute the limit. [Fagin communicated to us that he tried this first]

Page 86: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Overview

• Motivation and convergence of – The artificial intelligence story (recap) – The machine learning story (recap) – The probabilistic database story – The database theory story

• Main theoretical results and proof outlines

• Discussion and conclusions • Dessert

Page 87: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Class FO2

• FO2 = FO restricted to two variables

• Intuition: SQL queries that have a plan where all temp tables have arity ≤ 2

• “The graph has a path of length 10”: ∃x∃y(R(x,y) ∧∃x (R(y,x) ∧∃y (R(x,y) ∧…)))

Page 88: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Main Positive Results

Data complexity: • for any formula Q in FO2, WFOMC(Q, n) is

in PTIME [see NIPS‟11, KR‟13]

• for any γ-acyclic conjunctive query w/o self-joins Q, WFOMC(Q, n) is in PTIME

Page 89: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Main Negative Results

Data complexity: • There exists an FO formula Q s.t. symmetric

FOMC(Q, n) is #P1 hard • There exists Q in FO3 s.t. FOMC(Q, n) is #P1 hard • There exists a conjunctive query Q s.t. symmetric

WFOMC(Q, n) is #P1 hard • There exists a positive clause Q w.o. „=„ s.t.

symmetric WFOMC(Q, n) is #P1 hard

Combined complexity: • FOMC(Q, n) is #P-hard

Page 90: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Review: #P1

• #P1 = class of functions in #P over a unary input alphabet

• Valiant 1979: there exists #P1 complete problems

• Bertoni, Goldwurm, Sabatini 1988: counting strings of a given length in some CFG is #P1 complete

• Goldberg: “no natural combinatorial problems known to be #P1 complete”

Page 91: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Main Result 1

Theorem 1. There exists an FO3 sentence Q s.t. FOMC(Q,n) is #P1-hard Proof • Step 1. Construct a Turing Machine U s.t.

– U is in #P1 and runs in linear time in n – U computes a #P1 –hard function

• Step 2. Construct an FO3 sentence Q s.t. FOMC(Q,n) / n! = U(n)

Page 92: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Main Result 2

Theorem 2 There exists a Conjunctive Query Q s.t. WFOMC(Q,n) is #P1-hard

• Note: the decision problem is trivial

(Q has a model iff n > 0) • Unweighted Model Counting for CQ: open

Proof Start with a formula Q that is #P1-hard for FOMC, and transform it to a CQ in five steps (next)

Page 93: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Step 1: Remove ∃

Rewrite Q = ∀x∃y ψ(x,y) to Q‟ = ∀x ∀y (¬ψ(x,y) ∨ ¬A(x)) where A = new symbol with weight w = -1

Claim: WFOMC(Q, n) = WFOMC(Q‟, n) Proof Consider a model for Q‟, and a constant x=a

• If ∃b ψ(a,b), then A(a)=false; contributes w=1

• Otherwise, A(a) can be either true or false, contributing

either w=1 or w=-1, and 1 – 1 = 0.

Start: Q s.t. FOMC(Q, n) is #P1-hard

Q = ∀* …, WFOMC(Q, n) is #P1-hard

Page 94: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Step 2: Remove Negation

• Transform Q to Q‟ w/o negation s.t. WFOMC(Q, n) = WFOMC(Q‟, n)

• Similarly to step 1 and omitted

Q = ∀*[positive], WFOMC(Q, n) is #P1-hard

Start: Q s.t. FOMC(Q, n) is #P1-hard

Page 95: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Step 3: Remove “=“

Rewrite Q to Q‟ as follows: • Add new binary symbol E with weight w

• Define: Q‟ = Q[ E / “=“ ] ∧ (∀x E(x,x))

Claim: WFOMC(Q,n) computable using oracle for WFOMC(Q‟, n) (coefficient of wn in polynomial WFOMC(Q‟, n)

Q = ∀*[positive, w/o =], WFOMC(Q, n) is #P1-hard

Start: Q s.t. FOMC(Q, n) is #P1-hard

Page 96: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Step 4: To UCQ

• Write Q = ∀* (C1 ∧ C2 ∧ …) where each Ci is a positive clause

• The dual Q‟ = ∃* (C1„ ∨ C2„ ∨ …) is a UCQ

UCQ Q, WFOMC(Q, n) is #P1-hard

Start: Q s.t. FOMC(Q, n) is #P1-hard

Page 97: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Step 5: from UCQ to CQ

• UCQ: Q = C1 ∨ C2 ∨ …∨ Ck

• P(Q) = …. + (-1)S P(∧i ∈S Ci) + ….

• 2k-1 CQs P(Q1), P(Q2), … P(Q2^k-1)

• 1 CQ (using fresh copies of symbols): P(Q‟1Q‟2…Q‟2^k-1) =P(Q‟1)P(Q‟2)…P(Q‟2^k-1)

CQ Q‟ (=Q‟1Q‟2…Q‟2^k-1) WFOMC(Q‟, n) is #P1-hard

Start: Q s.t. FOMC(Q, n) is #P1-hard

Page 98: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Overview

• Motivation and convergence of – The artificial intelligence story (recap) – The machine learning story (recap) – The probabilistic database story – The database theory story

• Main theoretical results and proof outlines • Discussion and conclusions • Dessert

Page 99: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Motivation: 0/1 Laws

In 1976 Fagin proved the 0/1 law for FO using a transfer theorem. But is there an elementary proof? Find explicit formula for μn(Q), then compute the limit. [Fagin communicated to us that he tried this first]

Page 100: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Motivation: 0/1 Laws

In 1976 Fagin proved the 0/1 law for FO using a transfer theorem. But is there an elementary proof? Find explicit formula for μn(Q), then compute the limit. [Fagin communicated to us that he tried this first] A: unlikely when FOMC(Q,n) is #P1-hard

Page 101: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Discussion

Fagin (1974) restated: 1. NP = ∃SO

(Fagin‟s classical characterization of NP) 2. NP1 = {Spec(Φ) | Φ ∈ FO} in tally notation

(less well known!)

We show: #P1 corresponds to {FOMC(Q,n) | Q in FO }

Page 102: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Discussion

• Convergence of AI/ML/DB/theory • First-order model counting is a basic problem

that touches all these areas • Under-investigated • Hardness proofs are more difficult than for #P

Open problems: • New algorithm for symmetric model counting • New hardness reduction techniques

Page 103: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Fertile Ground

FO2

CNF

FO2

Safe monotone CNF Safe type-1 CNF

Θ1

FO3

Υ1

CQs

S

[VdB; NIPS’11], [VdB et al.; KR’14], [Gribkoff, VdB, Suciu; UAI’15], [Beame, VdB, Gribkoff, Suciu; PODS’15], etc.

Page 104: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Fertile Ground

FO2

CNF

FO2

Safe monotone CNF Safe type-1 CNF

? Θ1

FO3

Υ1

CQs

Δ = ∀x,y,z, Friends(x,y) ∧ Friends(y,z) ⇒ Friends(x,z)

S

[VdB; NIPS’11], [VdB et al.; KR’14], [Gribkoff, VdB, Suciu; UAI’15], [Beame, VdB, Gribkoff, Suciu; PODS’15], etc.

Page 105: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Overview

• Motivation and convergence of – The artificial intelligence story (recap) – The machine learning story (recap) – The probabilistic database story – The database theory story

• Main theoretical results and proof outlines • Discussion and conclusions • Dessert

Page 106: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

The Decision Problem

• Counting problem “count the number of XXX s.t…”

• Decision problem “does there exists an XXX s.t. …?”

• #3SAT and 3SAT: – counting is #P-complete, decision is NP-hard

• #2SAT and 2SAT: – counting is #P-hard, decision is in PTIME

Page 107: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Counting/Decision Problems for FO

• Counting: given Q,n, count the number of models of Q over a domain of size n

• Decision: given Q,n, does there exists a model of Q over a domain of size n?

• Data complexity: fix Q, input = n

• Combined complexity: input = Q, n

Page 108: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

The Spectrum

Definition. [Scholz 1952] Spec(Q)= {n | Q has a model over domain [n]}

Example: Q says “(D, +, *, 0, 1) is a field”: Spec(Q) = {pk | p prime, k ≥ 1}

Spectra studied intensively for over 50 years

The FO decision problem is precisely spectrum membership

Page 109: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

The Data Complexity

Suppose n is given in binary representation:

• Jones&Selman’72: spectra = NETIME

Suppose n is given in unary representation:

• Fagin’74: spectra = NP1

Page 110: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Combined Complexity

Consider the combined complexity for FO2

“given Q, n, check if n ∈ Spec(Q)”

We prove its complexity:

• NP-complete for FO2,

• PSPACE-complete for FO

Page 111: Symmetry in Probabilistic Databases - web.cs.ucla.eduguyvdb/slides/AMW15.pdf · Symmetry in Probabilistic Databases Guy Van den Broeck KU Leuven Joint work with Dan Suciu, Paul Beame,

Thanks!