Generating Tiny Interpolants and Near-Interpolants from a Resolution Refutation

Alexander Nadel (3), Vadim Ryvchin (2,3) and Yakir Vizel (1)

Interpolation’13 Workshop, Saint Petersburg, Russia, July 14th, 2013

1 - Computer Science Dept., The Technion, Haifa, Israel

2 - Information Systems Engineering Dept., The Technion, Haifa, Israel

3 - Intel, Haifa, Israel

Problem Statement

• Interpolation-based model checking (ITP) is an efficient and complete model checking procedure.

• One invocation of ITP uses many interpolants, each generated from a resolution refutation produced by the SAT solver.

• Interpolants generated by the current method are highly redundant and may become too large, rendering ITP slow or even intractable.

The Solution in Our CAV’13 Paper

• Resolution-driven Variable Elimination (RVE)
  • is a new way to generate interpolants from a resolution refutation
  • generates tiny interpolants very fast in the vast majority of cases, but
  • when it gets stuck on even ONE invocation for a given model checking instance, the model checker gets stuck

• Solution:
  • Adjust RVE so that it never gets stuck: when it cannot find an interpolant, it generates a near-interpolant
    • Only a few additional clauses are required to make it an interpolant
  • We complete it to an interpolant with new model checking techniques

• Main results: our model checking algorithm outperforms ITP on most test cases, and the interpolants are 117x smaller

Today’s Agenda

• In focus: algorithms for generating interpolants and near-interpolants from a resolution refutation:
  • A comparative description of 3 methods for generating interpolants:
    • McMillan’s approach: the fundamental, widely used algorithm
    • A-local variable elimination
    • Resolution-driven variable elimination (RVE)
  • Adjusting RVE to generate near-interpolants in the worst case

• Not in focus:
  • Completing a near-interpolant to an interpolant
  • Our model checking algorithm CNF-ITP

Interpolant Generation: Problem Definition

• Input: propositional formulas A and B, such that A ∧ B ⇒ ⊥
• Output: a formula I, such that
  • A ⇒ I
  • I ∧ B ⇒ ⊥
  • V(I) ⊆ G, where G = V(A) ∩ V(B)

• Model checking needs: the interpolant is fed back into the SAT solver ⇒ it must be in CNF
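The three conditions of the problem definition can be checked by brute force on small instances. The following is a minimal Python sketch, assuming a CNF is a list of clauses and a clause is a frozenset of nonzero integers (a positive integer for a variable, its negation for the complemented literal). The encoding, the function names and the toy formulas are illustrative assumptions, not taken from the slides.

```python
from itertools import product

def variables(cnf):
    """Set of variables of a CNF given as a collection of clauses of nonzero ints."""
    return {abs(lit) for clause in cnf for lit in clause}

def satisfies(model, cnf):
    """True if the assignment `model` (variable -> bool) satisfies every clause."""
    return all(any(model[abs(lit)] == (lit > 0) for lit in clause) for clause in cnf)

def is_interpolant(a, b, i):
    """Brute-force check of A => I, I and B unsatisfiable, V(I) within V(A) & V(B)."""
    if not variables(i) <= (variables(a) & variables(b)):
        return False
    vs = sorted(variables(a) | variables(b))
    for bits in product([False, True], repeat=len(vs)):
        model = dict(zip(vs, bits))
        if satisfies(model, a) and not satisfies(model, i):
            return False  # A => I fails
        if satisfies(model, i) and satisfies(model, b):
            return False  # I and B is satisfiable
    return True

# Hypothetical toy instance: a1 = 1 (A-local), g1 = 2 (global), b1 = 3 (B-local).
A = [frozenset({1, 2}), frozenset({-1, 2})]    # A implies g1
B = [frozenset({-2, 3}), frozenset({-2, -3})]  # B implies not g1
good = is_interpolant(A, B, [frozenset({2})])
too_weak = is_interpolant(A, B, [])            # the empty CNF is "true"
non_global = is_interpolant(A, B, A)           # mentions the A-local variable 1
```

The check enumerates all assignments, so it is only meant for sanity-checking tiny examples.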

Resolution

• Resolution: given two clauses c1 = c3 ∨ p and c2 = c4 ∨ ¬p, derive a logical consequence c5 = c1 ⊗p c2 = c3 ∨ c4
  • p is the pivot variable

• Resolution refutation: a derivation by resolution of the empty clause ⊥ from a given unsatisfiable formula

• A SAT solver can generate a resolution refutation
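A minimal Python sketch of the resolution rule, assuming clauses are frozensets of nonzero integers (a positive integer for a variable, its negation for the complemented literal). The encoding and the name `resolve` are illustrative assumptions.

```python
def resolve(c1, c2, pivot):
    """Resolve c1 (containing pivot) with c2 (containing -pivot) on the pivot variable."""
    if pivot not in c1 or -pivot not in c2:
        raise ValueError("pivot must occur positively in c1 and negatively in c2")
    return frozenset((c1 - {pivot}) | (c2 - {-pivot}))

c1 = frozenset({1, 2})    # c3 or p, with p = 1 and c3 = (2)
c2 = frozenset({-1, 3})   # c4 or not p, with c4 = (3)
resolvent = resolve(c1, c2, 1)                      # c3 or c4
empty = resolve(frozenset({2}), frozenset({-2}), 2)  # the empty clause
```

Deriving the empty frozenset, as in the last line, corresponds to the final step of a resolution refutation.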

Example

[Figure: a resolution refutation deriving ⊥ from the clauses of A and B over the variables a1, g1, g2, g3, g4.]

A-local variables: a1

Global variables: g1, g2, g3

Method 1 for Interpolant Generation: McMillan’s Method

• Associate a formula p(c) with each node as follows
  • An input node:
    • c ∈ A ⇒ p(c) = g(c)
      • g(c): c restricted to global literals
    • c ∈ B ⇒ p(c) = T
  • An internal node c3 = c1 ⊗p c2:
    • p is A-local ⇒ p(c3) = p(c1) ∨ p(c2)
    • p isn’t A-local ⇒ p(c3) = p(c1) ∧ p(c2)

• p(⊥) is the interpolant
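The labelling rules above can be sketched as a recursive walk over the refutation. This is a minimal Python sketch, assuming clauses are frozensets of nonzero integers and formulas are nested tuples; the `Node` class, the tuple encoding and the toy refutation are hypothetical illustrations, and a real implementation would memoize shared nodes of the DAG.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Node:
    clause: frozenset
    origin: Optional[str] = None                     # "A" or "B" for input nodes
    parents: Optional[Tuple["Node", "Node"]] = None  # set for internal nodes
    pivot: Optional[int] = None                      # pivot variable of an internal node

def mcmillan(node, a_local):
    """Return p(node) as a nested tuple: ("true",), ("clause", c), ("and"/"or", l, r)."""
    if node.parents is None:
        if node.origin == "A":
            # g(c): a clause of A holds only A-local and global literals
            return ("clause", frozenset(l for l in node.clause if abs(l) not in a_local))
        return ("true",)
    left, right = (mcmillan(p, a_local) for p in node.parents)
    return ("or" if node.pivot in a_local else "and", left, right)

def evaluate(formula, model):
    """Evaluate a nested-tuple formula under an assignment variable -> bool."""
    tag = formula[0]
    if tag == "true":
        return True
    if tag == "clause":
        return any(model[abs(l)] == (l > 0) for l in formula[1])
    values = [evaluate(sub, model) for sub in formula[1:]]
    return all(values) if tag == "and" else any(values)

# Toy refutation: A = (a or g) and (not a), B = (not g), with a = 1, g = 2.
n1 = Node(frozenset({1, 2}), origin="A")
n2 = Node(frozenset({-1}), origin="A")
n3 = Node(frozenset({2}), parents=(n1, n2), pivot=1)     # A-local pivot
n4 = Node(frozenset({-2}), origin="B")
root = Node(frozenset(), parents=(n3, n4), pivot=2)      # global pivot
itp = mcmillan(root, a_local={1})
```

On this toy input the result is a small and/or tree that is equivalent to g but is not in CNF, which illustrates why a translation step is needed afterwards.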

McMillan’s Method

[Figure: the example refutation with each node annotated by its formula p(c); the root ⊥ carries the interpolant I.]

I = [(g1 g2) (g1 g3)] [(g2 g3 g4) (g2 g4)]

McMillan’s Method: Pros and Cons

• Pros:
  • The interpolant is linear in the size of the resolution refutation
  • ITP works well when the resolution refutation is not overly complex

• Cons:
  • In many cases, the interpolant is huge and highly redundant
    • Simplifying the formula on-the-fly helps, but doesn’t eliminate the problem
  • The interpolant is not natively in CNF, so translation is required

McMillan’s Method: Translating to CNF

I = [(g1 g2) (g1 g3)] [(g2 g3 g4) (g2 g4)]

[Figure: I drawn as a circuit over g1, g2, g3, g4 with intermediate nodes labelled a, b, c, d, e, f, h, g; caption: I in CNF.]

Method 2 for Interpolant Generation: A-Local Variable Elimination

• Variable elimination:
  • Given formula F in CNF and variable p
  • VE(F, p) is created by replacing the clauses that mention p with the results of pairwise resolutions between clauses containing p and clauses containing ¬p
  • VE(F, p) is equisatisfiable to F and p ∉ V(VE(F, p))

[Figure: VE(A, a1) computed on the example by pairwise resolution on a1; the results are clauses over g1, g2, g3, g4.]

A-Local Variable Elimination

• Eliminate all the A-local variables from A one by one.

• The resulting formula is an interpolant
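A minimal Python sketch of VE(F, p) and of eliminating the A-local variables one by one, assuming clauses are frozensets of nonzero integers and input clauses are not tautologies. Dropping tautological resolvents is a standard simplification assumed here; the function names are illustrative.

```python
def is_tautology(clause):
    """A clause containing both a literal and its negation is always true."""
    return any(-lit in clause for lit in clause)

def eliminate(cnf, p):
    """VE(cnf, p): replace the clauses mentioning p by all pairwise resolvents on p."""
    cnf = set(cnf)
    pos = [c for c in cnf if p in c]
    neg = [c for c in cnf if -p in c]
    rest = {c for c in cnf if p not in c and -p not in c}
    resolvents = {frozenset((c1 - {p}) | (c2 - {-p})) for c1 in pos for c2 in neg}
    return rest | {r for r in resolvents if not is_tautology(r)}

def a_local_elimination(a, a_local):
    """Eliminate every A-local variable from A; the result is over global variables."""
    result = set(a)
    for p in a_local:
        result = eliminate(result, p)
    return result

# Variable 1 plays the role of an A-local variable; 2, 3, 4 are global.
itp = a_local_elimination([frozenset({1, 2}), frozenset({-1, 2})], {1})
mixed = eliminate([frozenset({1, 2}), frozenset({-1, 3}), frozenset({4})], 1)
taut = eliminate([frozenset({1, 2}), frozenset({-1, -2})], 1)
```

Each elimination can replace m + n clauses by up to m·n resolvents, which is the source of the blow-up discussed for this method.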

A-Local Variable Elimination

[Figure: the clauses obtained by eliminating a1 from A in the example.]

I = (g1 g2 g3 g4) (g1 g2) (g1 g2 g3 g4) (g1 g2 g3) (g2 g4)

A-Local Variable Elimination: Correctness

• A ⇒ I: follows from the correctness of resolution

• I ∧ B ⇒ ⊥
  • Proof: start with A ∧ B ⇒ ⊥ and apply Lemma 1 for each elimination of an A-local variable
  • Lemma 1: Let: (1) X ∧ Y ⇒ c; (2) p ∉ V(Y ∧ c). Then: VE(X, p) ∧ Y ⇒ c.

• V(I) ⊆ G: by construction

A-Local Variable Elimination: Pros and Cons

• Pro: the formula is natively in CNF ⇒ the translation overhead is saved

• Con: variable elimination blows up
  • The same problem as in the DPLL algorithm for deciding SAT

• Can one limit the amount of elimination and still get an interpolant?

Method 3 for Interpolant Generation: Resolution-driven Variable Elimination (RVE)

• Associate a formula I(c), called the clause interpolant, with each node c reachable from A as follows:
  • For an input node c ∈ A: I(c) = c
  • For an internal node c3 = c1 ⊗p c2, where c1 and c2 are reachable from A:
    • p is global ⇒ I(c3) = I(c1) ∧ I(c2)
    • p is A-local ⇒ I(c3) = VE(I(c1) ∧ I(c2), p)
  • For an internal node, one of whose parents is not reachable from A: propagate the clause interpolant from the other parent
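The RVE rules can be sketched in the same style. This is a minimal Python sketch, assuming clauses are frozensets of nonzero integers, a clause interpolant is a set of clauses, and `None` marks a node that is not reachable from A. The `Node` class, the helper `eliminate` and the toy refutation are hypothetical illustrations rather than the authors' implementation; shared DAG nodes are not memoized.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Node:
    clause: frozenset
    origin: Optional[str] = None                     # "A" or "B" for input nodes
    parents: Optional[Tuple["Node", "Node"]] = None  # set for internal nodes
    pivot: Optional[int] = None                      # pivot variable of an internal node

def eliminate(cnf, p):
    """VE(cnf, p): pairwise resolution on p, dropping tautological resolvents."""
    pos = [c for c in cnf if p in c]
    neg = [c for c in cnf if -p in c]
    rest = {c for c in cnf if p not in c and -p not in c}
    resolvents = {frozenset((c1 - {p}) | (c2 - {-p})) for c1 in pos for c2 in neg}
    return rest | {r for r in resolvents if not any(-l in r for l in r)}

def rve(node, a_local):
    """Clause interpolant I(node) as a set of clauses, or None if not reachable from A."""
    if node.parents is None:
        return {node.clause} if node.origin == "A" else None
    left, right = (rve(p, a_local) for p in node.parents)
    if left is None or right is None:
        return left if right is None else right      # propagate from the other parent
    merged = left | right                            # I(c1) and I(c2)
    return eliminate(merged, node.pivot) if node.pivot in a_local else merged

# Toy refutation: A = (a or g) and (not a), B = (not g), with a = 1, g = 2.
n1 = Node(frozenset({1, 2}), origin="A")
n2 = Node(frozenset({-1}), origin="A")
n3 = Node(frozenset({2}), parents=(n1, n2), pivot=1)     # A-local pivot: eliminate 1
n4 = Node(frozenset({-2}), origin="B")
root = Node(frozenset(), parents=(n3, n4), pivot=2)
itp = rve(root, a_local={1})

# Global pivot between two A-derived nodes: the clause interpolants are conjoined.
n5 = Node(frozenset({2, 3}), origin="A")
n6 = Node(frozenset({-2, 3}), origin="A")
n7 = Node(frozenset({3}), parents=(n5, n6), pivot=2)
conj = rve(n7, a_local={1})
```

Compared with eliminating a1 from all of A at once, each elimination here only touches the clauses collected along one branch of the refutation.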

RVE

[Figure: the example refutation with each node reachable from A annotated by its clause interpolant I(c); the root ⊥ carries the interpolant I.]

I = [(g1 g2 g3 g4) (g1 g2 g3 g4)] (g2 g4)

RVE: Correctness

• I(c) is a clause interpolant of a clause c reachable from A iff:
  • A ⇒ I(c)
  • I(c) ∧ B ⇒ c
  • V(I(c)) ⊆ G ∪ L(c)
    • L(c): A-local variables that appear in c

• By definition, a clause interpolant of ⊥ is an interpolant

• Proof: show that I(c) is a clause interpolant for every c

RVE: Pros and Cons

• Pros
  • Terminates in many cases where A-local variable elimination blows up, because of variable elimination locality

[Figure: comparison on the example refutation. The clauses (g1 g2) and (g1 g2 g3), which A-local variable elimination produces, are marked "Saved!" for RVE.]

Resolution-driven variable elimination:

I = (g1 g2 g3 g4) (g1 g2 g3 g4) (g2 g4)

A-local variable elimination:

I = (g1 g2 g3 g4) (g1 g2) (g1 g2 g3 g4) (g1 g2 g3) (g2 g4)


  • Generates significantly smaller interpolants than A-local variable elimination because of variable elimination locality

  • Unlike McMillan’s method:
    • Optimizes the interpolant on-the-fly by local variable elimination
    • Generates the interpolant natively in CNF

[Figure: the example refutation, with the interpolant produced by resolution-driven variable elimination, I = (g1 g2 g3 g4) (g1 g2 g3 g4) (g2 g4), shown next to the result of McMillan’s method.]


• Cons
  • Might still blow up because of variable elimination, unlike McMillan’s method

Near-Interpolants

• The algorithm:
  • Adjust RVE to generate a B-weak interpolant that is missing only a few clauses from an interpolant. It may still find interpolants.
  • Find the remaining clauses with model checking techniques

B-weak Interpolant
• A ⇒ I
• I ∧ B ⇒ ⊥ (not required)
• V(I) ⊆ G

Find B-weak Interpolant

1. Apply RVE adjusted as follows:
  • For each node with A-local pivot variable p, eliminate p only if the clause interpolant doesn’t grow as a result (bounded elimination)

After this stage we have either an interpolant or a non-global interpolant. We return in the former case, and continue in the latter.

2. Apply bounded A-local variable elimination to I globally

Again, we have either an interpolant or a non-global interpolant. We return in the former case, and continue in the latter.

3. Apply incomplete A-local variable elimination to I
  • Eliminate A-local variables, but apply resolution only to some of the pairs, such that each input clause still contributes to at least one output clause

We return a B-weak interpolant (which may happen to be an interpolant)

Non-Global Interpolant
• A ⇒ I
• I ∧ B ⇒ ⊥
• V(I) ⊆ G (not required)

B-weak Interpolant
• A ⇒ I
• I ∧ B ⇒ ⊥ (not required)
• V(I) ⊆ G
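Bounded elimination from step 1 can be sketched as a guard around ordinary variable elimination. This is a minimal Python sketch, assuming clauses are frozensets of nonzero integers and that "doesn’t grow" is measured by the number of clauses; the size measure and the function names are assumptions made for illustration.

```python
def eliminate(cnf, p):
    """VE(cnf, p): pairwise resolution on p, dropping tautological resolvents."""
    pos = [c for c in cnf if p in c]
    neg = [c for c in cnf if -p in c]
    rest = {c for c in cnf if p not in c and -p not in c}
    resolvents = {frozenset((c1 - {p}) | (c2 - {-p})) for c1 in pos for c2 in neg}
    return rest | {r for r in resolvents if not any(-l in r for l in r)}

def bounded_eliminate(cnf, p):
    """Eliminate p only if the number of clauses does not grow; otherwise keep cnf."""
    cnf = set(cnf)
    candidate = eliminate(cnf, p)
    return candidate if len(candidate) <= len(cnf) else cnf

# Variable 1 plays the role of an A-local pivot.
small = [frozenset({1, 2}), frozenset({-1, 3})]
big = [frozenset({1, 2}), frozenset({1, 3}), frozenset({1, 6}),
       frozenset({-1, 4}), frozenset({-1, 5})]
eliminated = bounded_eliminate(small, 1)   # 2 clauses become 1: eliminate
kept = bounded_eliminate(big, 1)           # 5 clauses would become 6: skip
```

When the elimination is skipped, the A-local variable stays in the clause interpolant, which is how a non-global interpolant can arise.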

[Figure: RVE with bounded elimination on a second example refutation over a1 and g1, ..., g6. Variable elimination is skipped, since it would increase the number of clauses.]

I = (a1 g1 g2) (a1 g2 g4) (a1 g3 g4) (a1 g6 g5) (a1 g6)

I is a non-global interpolant.

I = (a1 g1 g2) (a1 g2 g4) (a1 g3 g4) (a1 g6 g5) (a1 g6)

Variable elimination is skipped, since it would increase the number of clauses!

I’ = (g1 g2 g6 g5) (g2 g4 g6) (g3 g4 g6 g5)

Incomplete variable elimination example: each input clause contributes to the output

I’ is a B-weak interpolant!

RVE: Optimizations

1. Store only those parts of the resolution refutation that are reachable from A
  • Essential to keep the resolution refutation small
  • Can also be applied to McMillan’s method

2. Start from the vertex cut in the A-resolution refutation, such that:
  • its clauses are implied by A only, and
  • it is the closest possible to ⊥

The A-resolution refutation is the resolution refutation restricted to clauses implied by A.

Consider the cut as the input clauses instead of A

[Figure: the first example refutation with the vertex cut marked; RVE starts from the clauses of the cut.]

I = g2 g3

I is an interpolant!

[Figure: the second example refutation with the vertex cut marked; RVE starts from the clauses of the cut.]

I = (g1 g2 g5) (g2 g4 g5) (g3 g4 g5)

I is an interpolant

Experiments

• Benchmarks: HWMC’12 benchmark set, 289 instances
• Machines: Intel E5-2687W, 3.1 GHz; 32 GB memory
• Timeout: 900 sec.

Results Summary

• CNF-ITP vs. ITP vs. IC3, run-time
  • CNF-ITP outperforms ITP in 43 cases, while ITP is better in 18 cases
  • CNF-ITP outperforms IC3 in 23 cases, while IC3 is better in 80 cases
  • CNF-ITP outperforms both ITP and IC3 in 18 cases

• CNF-ITP vs. ITP, interpolant size: 117x reduction!

• RVE in CNF-ITP:
  • CNF-ITP with RVE only solved 16 instances out of 51 solved by CNF-ITP.
  • CNF-ITP with RVE only outperforms both ITP and IC3 in 9 cases
  • >95% of the clauses in the interpolants were generated by RVE
    • Some clauses are used across bounds and iterations in CNF-ITP
    • The remaining 5% of the clauses were generated with B-strengthening (inductive generalization)

[Figure: scatter plot, ITP vs. CNF-ITP run-time; axes: ITP and CNF-ITP, each from 0 to 900.]

[Figure: scatter plot, average clause size comparison (log scale); axes: ITP and CNF-ITP, each from 1 to 10,000,000.]

[Figure: CNF-ITP: ratio of clauses learnt with B-strengthening; x-axis: instance number (0 to 250), y-axis: clause ratio (0 to 1). Instance 74 is the last one where all the clauses are generated with RVE.]

Challenges

• How to direct the SAT solver towards a good interpolant?
  • How to assess what “good” is?

• The ultimate challenge: design an algorithm that instantly generates “good” tiny interpolants in CNF whenever the SAT solver completes

McMillan’s Method: Correctness

• A ⇒ I
  • Prove ¬I ⇒ ¬A as follows
  • Let m be an assignment that falsifies I
  • m defines a path from ⊥ to a clause in A, falsified by m.
    • Invariant: p(c) is falsified by m for every clause in the path

[Figure: the example refutation annotated with p(c), and an assignment m over {a1, g1, g2, g3, g4} that falsifies I = [(g1 g2) (g1 g3)] [(g2 g3 g4) (g2 g4)].]

• Non-A-local pivot: choose a parent c whose p(c) is falsified
• A-local pivot: choose a parent c whose pivot literal is falsified (both p(c)’s are falsified)

A ⇒ I holds: the end clause in A is falsified by construction!



• I ∧ B ⇒ ⊥
  • Invariant that holds for every clause: p(c) ∧ B ⇒ c
  • p(c) ∧ B ⇒ c implies (I = p(⊥)) ∧ B ⇒ ⊥


[Figure: McMillan’s method on the example refutation, annotated with p(c); I = [(g1 g2) (g1 g3)] [(g2 g3 g4) (g2 g4)], m = {a1, g1, g2, g3, g4}.]

The invariant: p(c) ∧ B ⇒ c

• The leaves: trivially holds
• Global pivot: p(c3) ∧ B = p(c1) ∧ p(c2) ∧ B ⇒ c1 ∧ c2 ⇒ c3
• Local pivot: assume m ╞ p(c3) ∧ B = (p(c1) ∧ B) ∨ (p(c2) ∧ B)
  • Assume WLOG m ╞ p(c1) ∧ B.
  • Since p(c1) ∧ B ⇒ c1, we have m ╞ c1.
  • We have m ╞ g(c1), otherwise switching the pivot’s value in m would contradict p(c1) ∧ B ⇒ c1.
  • c3 = g(c1) g(c2) \ (p G). Hence m ╞ c3



• V(I) ⊆ G
  • By construction


RVE Correctness

[Figure: the example refutation annotated with clause interpolants; I = [(g1 g2 g3 g4) (g1 g2 g3 g4)] (g2 g4).]

Clause interpolant:
• A ⇒ I(c)
• I(c) ∧ B ⇒ c
• V(I(c)) ⊆ G ∪ L(c)

The leaves: trivially holds

Global pivot:
• A ⇒ I(c1) ∧ I(c2) = I(c3)
• I(c3) ∧ B = I(c1) ∧ I(c2) ∧ B ⇒ c1 ∧ c2 ⇒ c3
• V(I(c3)) = V(I(c1)) ∪ V(I(c2)) ⊆ G ∪ L(c1) ∪ L(c2) = G ∪ L(c3)

Local pivot:
• A ⇒ I(c1) ∧ I(c2) ⇒ VE(I(c1) ∧ I(c2), p) = I(c3)
• I(c1) ∧ I(c2) ∧ B ⇒ c1 ∧ c2 ⇒ c3, hence by Lemma 1: I(c3) ∧ B = VE(I(c1) ∧ I(c2), p) ∧ B ⇒ c3
• V(I(c3)) = V(I(c1)) ∪ V(I(c2)) \ {p} ⊆ G ∪ L(c1) ∪ L(c2) \ {p} = G ∪ L(c3)

Lemma 1: Let: (1) X ∧ Y ⇒ c; (2) p ∉ V(Y ∧ c). Then: VE(X, p) ∧ Y ⇒ c.
