Caching in Backtracking Search
Fahiem Bacchus, University of Toronto
Introduction

Backtracking search needs only space linear in the number of variables (modulo the size of the problem representation). However, its efficiency can greatly benefit from using more space to cache information computed during search. Caching can provably yield exponential improvements in the efficiency of backtracking search.
Caching is an any-space feature: we can use as much or as little space for caching as we want without affecting soundness or completeness.
Unfortunately, caching can also be time consuming. How do we exploit the theoretical potential of caching in practice?
Introduction

We will examine this question for:
- The problem of finding a single solution.
- Problems that require considering all solutions: counting the number of solutions / computing probabilities, and finding optimal solutions.

We will look at:
- The theoretical advantages offered by caching.
- Some of the practical issues involved in realizing these theoretical advantages.
- Some of the practical benefits obtained so far.
Outline

1. Caching when searching for a single solution.
   - Clause learning in SAT: theoretical results; its practical application and impact.
   - Clause learning in CSPs.
2. Caching when considering all solutions.
   - Formula caching for sum-of-products problems: theoretical results; practical application.
1. Caching when searching for a single solution
1.1 Clause Learning in SAT
Clause Learning in SAT (DPLL)
Clause learning is the most successful form of caching when searching for a single solution [Marques-Silva and Sakallah, 1996; Zhang et al., 2001].
It has revolutionized DPLL SAT solvers (i.e., backtracking SAT solvers).
Clause Learning in SAT
Clauses: (¬X, A), (¬X, B), (¬X, C), (¬A, ¬B, ¬C, D)

1. Branch on a variable.
2. Perform propagation (here, Unit Propagation):

X ← Assumption
A ← (¬X, A)
B ← (¬X, B)
C ← (¬X, C)
D ← (¬A, ¬B, ¬C, D)
Clause Learning in SAT
X ← Assumption
A ← (¬X, A)
B ← (¬X, B)
C ← (¬X, C)
D ← (¬A, ¬B, ¬C, D)

Every inferred literal is labeled with a clausal reason.
The clausal reason for a literal records a subset of the previous literals on the path whose settings imply the literal, e.g., A ∧ B ∧ C → D is the clause (¬A, ¬B, ¬C, D).
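To make the bookkeeping concrete, here is a minimal Python sketch (illustrative only; real solvers use watched-literal data structures rather than this naive loop) of unit propagation that records a clausal reason for every literal it infers:

# Literals are non-zero ints: +v for v=True, -v for v=False.
# Clauses are tuples of literals.

def unit_propagate(clauses, trail, reason):
    """Extend `trail` (the chronological list of set literals) with all
    unit implications.  `reason[lit]` records the clause that forced
    `lit` (None for assumptions/decisions).  Returns a falsified clause
    (a conflict clause) or None."""
    assigned = set(trail)
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in assigned for lit in clause):
                continue                    # clause already satisfied
            free = [l for l in clause if -l not in assigned]
            if not free:
                return clause               # every literal false: conflict
            if len(free) == 1:              # unit clause: force the literal
                lit = free[0]
                assigned.add(lit)
                trail.append(lit)
                reason[lit] = clause        # its clausal reason
                changed = True
    return None

# The slide's example: X assumed, the clauses force A, B, C and then D.
X, A, B, C, D = 1, 2, 3, 4, 5
clauses = [(-X, A), (-X, B), (-X, C), (-A, -B, -C, D)]
trail, reason = [X], {X: None}
unit_propagate(clauses, trail, reason)      # trail == [X, A, B, C, D]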
Clause Learning in SAT
X ← Assumption
A ← (¬X, A)
B ← (¬X, B)
C ← (¬X, C)
D ← (¬A, ¬B, ¬C, D)
Y ← Assumption
P ← (¬Y, P)
Q ← (¬Y, Q)
¬D ← (¬Q, ¬P, ¬D)

Contradiction:
1. D is forced to be both True and False.
2. The clause (¬Q, ¬P, ¬D) has been falsified. Falsified clauses are called conflict clauses.
Clause Learning in SAT
(Same trail as on the previous slide.)

Clause learning occurs when a contradiction is reached.
This involves a sequence of resolution steps.
Any implied literal in a clausal reason can be resolved away by resolving the clause with the clausal reason for that implied literal, e.g.:
(¬Q, ¬P, ¬D) ⊗ (¬Y, P) = (¬Q, ¬D, ¬Y)      [resolve away P]
(¬A, ¬B, ¬C, D) ⊗ (¬X, C) = (¬A, ¬B, D, ¬X)  [resolve away C]
(¬Q, ¬P, ¬D) ⊗ (¬Y, Q) = (¬P, ¬D, ¬Y)      [resolve away Q]
Clause Learning in SAT
(Same trail as on the previous slide.)
SAT solvers utilize a particular sequence of resolutions against the conflict clause: 1-UIP learning [Zhang et al., 2001]. Iteratively resolve away the deepest implied literal in the clause until the clause contains only one literal from the level at which the contradiction was generated:

(¬Q, ¬P, ¬D) ⊗ (¬Y, Q) = (¬P, ¬D, ¬Y)
(¬P, ¬D, ¬Y) ⊗ (¬Y, P) = (¬D, ¬Y)
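Continuing the earlier sketch (with an assumed `level` map from each trail literal to its decision level), 1-UIP learning can be rendered as:

def learn_1uip(conflict_clause, trail, reason, level):
    """Resolve away the deepest implied literals until the clause has
    exactly one literal from the conflict level (the 1-UIP clause)."""
    clause = set(conflict_clause)
    conflict_level = level[trail[-1]]
    for lit in reversed(trail):             # deepest literals first
        at_conflict_level = [l for l in clause
                             if level[-l] == conflict_level]
        if len(at_conflict_level) <= 1:
            break                           # 1-UIP reached
        if -lit in clause and reason[lit] is not None:
            clause.discard(-lit)            # resolve on lit:
            clause.update(l for l in reason[lit] if l != lit)
    return tuple(clause)

# On the slides' conflict (¬Q, ¬P, ¬D) this resolves away Q and then P,
# returning the 1-UIP clause (¬D, ¬Y).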
Far Backtracking in SAT
(Same trail as on the previous slide.)
Once the 1-UIP clause is learnt, the SAT solver backtracks to the level at which this clause becomes unit.

1-UIP clause: (¬D, ¬Y)

It then uses the clause to force a new literal, performs UP, and continues its search:

¬Y ← (¬D, ¬Y)
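The backtrack level can be read off the learnt clause; a small sketch under the same assumed structures:

def backjump_level(learnt_clause, asserted_lit, level):
    """Deepest decision level among the clause's other literals; at that
    level the learnt clause is unit and immediately forces asserted_lit."""
    return max((level[-l] for l in learnt_clause if l != asserted_lit),
               default=0)

# For the learnt clause (¬D, ¬Y), asserting ¬Y, this is the level at
# which D was implied; the solver backtracks there and sets ¬Y.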
Theoretical Power of Clause Learning
The power of clause learning has been examined from the point of view of the theory of proof complexity [Cook & Reckhow 1977].
This area looks at the question of how large proofs can become and at their relative sizes in different propositional proof systems.
DPLL with clause learning is performing resolution (a particular restricted type of resolution).
Various restricted versions of resolution have been well studied.
[Buresh-Oppenheim, Pitassi 2003] contains a nice review of previous results and a number of new results in this area.
Theoretical Power of Clause Learning
Every DPLL search tree refuting an UNSAT instance contains a TREE-resolution refutation.
TREE-resolution proofs can be exponentially larger than REGULAR-resolution proofs.
REGULAR-resolution proofs can be exponentially larger than general (unrestricted) resolution proofs.
For UNSAT formulas:
min_size(DPLL Search Tree)
  ≥ min_size(TREE-Resolution)
  >> min_size(REGULAR-Resolution)
  >> min_size(general resolution)
Theoretical Power of Clause Learning
Furthermore, every TREE-resolution proof is a REGULAR-resolution proof, and every REGULAR-resolution proof is a general resolution proof.
For UNSAT formulas:
min_size(DPLL Search Tree)
  ≥ min_size(TREE-Resolution)
  ≥ min_size(REGULAR-Resolution)
  ≥ min_size(general resolution)
Theoretical Power of Clause Learning
[Beame, Kautz, and Sabharwal 2003] showed that clause learning can SOMETIMES yield exponentially smaller proofs than REGULAR.
It is unknown whether general resolution proofs are sometimes smaller.
For UNSAT formulas:
min_size(DPLL Search Tree)
  ≥ min_size(TREE-Resolution)
  >> min_size(REGULAR-Resolution)
  >> min_size(Clause Learning DPLL Search Tree)
  ≥ min_size(general resolution)
Theoretical Power of Clause Learning
It is still unknown whether REGULAR or even TREE resolution proofs can sometimes be smaller than the smallest clause learning DPLL search tree.
Theoretical Power of Clause Learning
It is also easily observed [Beame, Kautz, and Sabharwal 2003] that with restarts clause learning can make the DPLL Search Tree as small as the smallest general resolution proof on any formula.
For UNSAT formulas:
min_size(Clause Learning + Restarts DPLL Search Tree)
  = min_size(general resolution)
Theoretical Power of Clause Learning
In sum: clause learning, especially with restarts, has the potential to yield exponential reductions in the size of the DPLL search tree.
With clause learning DPLL can potentially solve problems exponentially faster. That this can happen in practice has been irrefutably demonstrated by modern SAT solvers.
Modern SAT solvers have been able to exploit the theoretical potential of clause learning.
Theoretical Power of Clause Learning
The theoretical advantages of clause learning also hold for CSP backtracking search.
So the question that arises is: can the theoretical potential of clause learning also be exploited in CSP solvers?
1.2 Clause Learning in CSPs
Clause Learning in CSPs
Joint work with George Katsirelos, who just completed his PhD with me: "NoGood Processing in CSPs".
Learning has been used in CSPs, but it has not had the kind of impact clause learning has had in SAT [Dechter 1990; T. Schiex & G. Verfaillie 1993; Frost & Dechter 1994; Jussien & Barichard 2000].
This work has investigated NoGood learning.
NoGood Learning

A NoGood is a set of variable assignments that cannot be extended to a solution.
NoGood Learning is NOT Clause Learning. It is strictly less powerful.
To illustrate this let us consider encoding a CSP as a SAT problem, and compare what Clause Learning will do on the SAT encoding to what NoGood Learning would do.
Propositional Encoding of a CSP—the propositions.
A CSP consists of a set of variables Vi and constraints Cj. Each variable has a domain of values Dom[Vi] = {d1, …, dm}.
Consider the set of propositions Vi=dj, one for each value of each variable.
- Vi=dj means that Vi has been assigned the value dj; it is true when the assignment has been made.
- ¬(Vi=dj) means that Vi has not been assigned the value dj; it is true when dj has been pruned from Vi’s domain. (If Vi has been assigned a different value, all other values, including dj, are pruned from its domain.)
- We usually write Vi≠dj instead of ¬(Vi=dj).
We encode the CSP using clauses over these assignment propositions.
Propositional Encoding of a CSP—the clauses.
For each variable V with Dom[V] = {d1, …, dk} we have the following clauses:
- (V=d1, V=d2, …, V=dk) (must have a value).
- For every pair of values (di, dj), the clause (V≠di, V≠dj) (has a unique value).
For each constraint C(X1, …, Xk) over some set of variables we have the following clauses:
- For each assignment to its variables that falsifies the constraint we have a clause blocking that assignment: if C(a, b, …, k) = FALSE then we have the clause (X1≠a, X2≠b, …, Xk≠k).
This is the direct encoding of [Walsh 2000]; a small sketch of generating it follows.
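As an illustration only (the data layout and names are ours, not from the talk), a minimal Python sketch that generates the direct encoding:

# A proposition V=d is the pair (V, d); a clause is a list of
# (sign, proposition) with sign False meaning the negation V≠d.

from itertools import combinations, product

def direct_encoding(domains, constraints):
    """domains: {var: [values]}; constraints: list of (scope, pred)
    where pred tests one assignment to the scope."""
    clauses = []
    for v, dom in domains.items():
        clauses.append([(True, (v, d)) for d in dom])        # must have a value
        for di, dj in combinations(dom, 2):                  # has a unique value
            clauses.append([(False, (v, di)), (False, (v, dj))])
    for scope, pred in constraints:
        for vals in product(*(domains[v] for v in scope)):
            if not pred(*vals):                              # block falsifiers
                clauses.append([(False, (v, d))
                                for v, d in zip(scope, vals)])
    return clauses

# The upcoming example's constraint Q + X + Y >= 3:
doms = {'Q': [0, 1], 'X': [1, 2, 3], 'Y': [1, 2, 3]}
cons = [(('Q', 'X', 'Y'), lambda q, x, y: q + x + y >= 3)]
cnf = direct_encoding(doms, cons)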
DPLL on this Encoded CSP.
Unit Propagation on this encoding is essentially equivalent to Forward Checking on the original CSP.
DPLL on the encoded CSP
Variables: Q, X, Y, Z
Dom[Q] = {0,1}; Dom[X] = Dom[Y] = Dom[Z] = {1,2,3}
Constraints: Q + X + Y ≥ 3; Q + X + Z ≥ 3; Q + Y + Z ≤ 3
Q=0 ← Assumption
Q≠1 ← (Q≠0, Q≠1)
X=1 ← Assumption
X≠2 ← (X≠1, X≠2)
X≠3 ← (X≠1, X≠3)
Y≠1 ← (Q≠0, X≠1, Y≠1)
Z≠1 ← (Q≠0, X≠1, Z≠1)
Y=2 ← Assumption
Y≠3 ← (Y≠2, Y≠3)
Z≠2 ← (Q≠0, Y≠2, Z≠2)
Z≠3 ← (Q≠0, Y≠2, Z≠3)
Z=3 ← (Z=1, Z=2, Z=3)   contradiction: Z=3 and Z≠3 are both forced
DPLL on the encoded CSP
Clause learning

(Same trail as on the previous slide.)

Resolve the conflict on Z=3 using the clausal reason for Z≠3:
(Z=1, Z=2, Z=3) ⊗ (Q≠0, Y≠2, Z≠3) = (Q≠0, Y≠2, Z=1, Z=2)
DPLL on the encoded CSP
Clause learning

(Same trail as on the previous slide.)

Resolve away Z=2 using the clausal reason for Z≠2:
(Q≠0, Y≠2, Z=1, Z=2) ⊗ (Q≠0, Y≠2, Z≠2) = (Q≠0, Y≠2, Z=1)

A 1-UIP clause.
DPLL on the encoded CSP
The clause (Q≠0, Y≠2, Z=1) is not a NoGood! It asserts that we cannot have Q=0, Y=2, and Z≠1 simultaneously.
This is a set of assignments and domain prunings that cannot lead to a solution. A NoGood is only a set of assignments.
To obtain a NoGood we have to further resolve away Z=1 from the clause.
DPLL on the encoded CSP
NoGood learning

(Same trail as above.)

Resolve away Z=1 using the clausal reason for Z≠1:
(Q≠0, Y≠2, Z=1) ⊗ (Q≠0, X≠1, Z≠1) = (Q≠0, X≠1, Y≠2)

This clause is a NoGood: it says that we cannot have the set of assignments Q=0, X=1, Y=2.
NoGood learning requires resolving the conflict back to the decision literals.
NoGoods vs. Clauses (Generalized NoGoods)
1. Unit propagation over a collection of learnt NoGoods is ineffective.
   NoGoods are clauses containing negated literals only, e.g., (Z≠1, Y≠0, X≠3). If one of these clauses becomes unit, e.g., (X≠3), the forced literal can only satisfy other NoGood clauses; it can never reduce the length of those clauses.
2. A single clause can represent an exponential number of NoGoods.
   With Dom[Z] = Dom[Y] = {1, 2, 3}, the clause (Q≠1, Z=1, Y=1) is equivalent to the four NoGoods (Q≠1, Z≠2, Y≠2), (Q≠1, Z≠3, Y≠2), (Q≠1, Z≠2, Y≠3), (Q≠1, Z≠3, Y≠3).
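Point 2 can be checked mechanically; a small sketch (our own illustrative encoding of literals) that expands a generalized clause into its equivalent set of NoGoods:

# A literal is (var, val, positive); positive means the literal V=val.

from itertools import product

def nogoods_of_clause(clause, domains):
    """A NoGood is a set of assignments falsifying every literal.
    V=d is falsified by each assignment V=d' with d' != d, so a positive
    literal contributes |Dom[V]|-1 choices; V!=d is falsified only by V=d."""
    choices = []
    for var, val, positive in clause:
        if positive:
            choices.append([(var, d) for d in domains[var] if d != val])
        else:
            choices.append([(var, val)])
    return list(product(*choices))

clause = [('Q', 1, False), ('Z', 1, True), ('Y', 1, True)]
doms = {'Q': [0, 1], 'Z': [1, 2, 3], 'Y': [1, 2, 3]}
print(nogoods_of_clause(clause, doms))   # the four NoGoods on the slide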
NoGoods vs. Clauses (Generalized NoGoods)
3. The 1-UIP clause can prune more branches during the future search than the NoGood clause [Katsirelos 2007].
4. Clause learning can yield super-polynomially smaller search trees than NoGood learning [Katsirelos 2007].
Encoding to SAT
With all of these benefits of clause learning over NoGood learning, the natural question is:
why not encode CSPs to SAT and immediately obtain the benefits of clause learning as already implemented in modern SAT solvers?
Encoding to SAT
1. The SAT theory produced by the direct encoding is not very effective: unit propagation on this encoding only achieves Forward Checking (a weak form of propagation).
2. Under the direct encoding, constraints of arity k yield 2^O(k) clauses. Hence the resultant SAT theory is too large.
3. There is no direct way of exploiting propagators: specialized polynomial-time algorithms for doing propagation on constraints of large arity.
Encoding to SAT
Some of these issues can be addressed by better encodings, e.g., [Bacchus 2007, Katsirelos & Walsh 2007, Quimper & Walsh 2007]. But overall, complete conversion to SAT is currently impractical.
Clause Learning in CSPs without encoding
We can perform clause learning in a CSP solver by the following steps:
1. The CSP solver must keep track of the chronological sequence of variable assignments and value prunings made as we descend each path in the search tree, e.g., after the decisions Q=0 and X=1:
   Q=0, Q≠1, X=1, X≠2, X≠3, Y≠1
Clause Learning in CSPs without encoding
2. Each item must be labeled with a clausal reason consisting of items previously falsified along the path:
   Q=0 ← Assumption
   Q≠1 ← (Q≠0, Q≠1)
   X=1 ← Assumption
   X≠2 ← (X≠1, X≠2)
   X≠3 ← (X≠1, X≠3)
   Y≠1 ← (Q≠0, X≠1, Y≠1)
Clause Learning in CSPs without encoding
3. Contradictions are labeled by falsified clauses; e.g., a Domain Wipe Out of Z can be labeled by the must-have-a-value clause (Z=1, Z=2, Z=3).
From this information clause learning can be performed whenever a contradiction is reached.
The learnt clauses can be stored in a clausal database.
Unit propagation can be run on this database as new value assignments or value prunings are performed.
The inferences of unit propagation augment the other constraint propagation done by the CSP solver.
Higher Levels of Local Consistency

Note that this technique works irrespective of the kinds of inference performed during search. That is, we can use any kind of inference we want to infer a new value pruning or new variable assignment, as long as we can label the inference with a clausal reason.
This raises the question of how we generate clausal reasons for other forms of inference.
[Katsirelos 2007] answers this question for the most commonly used form of inference, Generalized Arc Consistency, including ways of obtaining clausal reasons from various types of GAC propagators, e.g., ALL-DIFF and GCC.
Some Empirical Data [Katsirelos 2007]
GAC with NoGood learning helps a bit.
GAC with clause learning, but where GAC labels its inferences with NoGoods, offers only minor improvements.
To get significant improvements we must do clause learning as well as have proper clausal reasons from GAC.
Observations
Caching techniques have great potential, but making them effective in practice can require resolving a number of different issues.
This work goes a long way towards achieving the goal of exploiting the theoretical potential of clause learning.
Prediction: Clause learning will play a fundamental role in the next generation of CSP solvers, and these solvers will often be orders of magnitude more effective than current solvers.
Open Issues
Many issues remain open. Here we mention only one: Restarts.
As previously pointed out, clause learning gains a great deal more power with restarts. With restarts it can be as powerful as unrestricted resolution.
Restarts play an essential role in the performance of SAT solvers. Both full restarts and partial restarts.
Search vs. Inference
With restarts and clause learning, the distinction of search vs. inference is turned on its head: now search is performing inference.
Instead the distinction becomes systematic vs. opportunistic inference.
Enforcing a high level of consistency during search is performing systematic inference. Searching until we learn a good clause is opportunistic.
SAT solvers perform very little systematic inference, only unit propagation, but they perform lots of opportunistic inference. CSP solvers essentially do the opposite.
One Open Question
In SAT solvers opportunistic inference is feasible: if a learnt clause turns out not to be useful it doesn’t matter much as the search to learn that clause did not take much time. Search (nodes/second rate) is very fast.
In CSP solvers the enforcement of higher levels of local consistency makes restarts and opportunistic inference very expensive. Search (nodes/second rate) is very slow.
Are high levels of consistency really the most effective approach for solving CSPs once clause learning is available?
2. Formula Caching when considering all solutions.
Considering All Solutions?

One such class of problems are those that can be expressed as Sum-of-Products problems [Dechter 1999]:
1. A finite set of variables, V1, V2, …, Vn.
2. A finite domain of values for each variable, Dom[Vi].
3. A finite set of real-valued local functions f1, f2, …, fm.
Each function is local in the sense that it only depends on a subset of the variables: f1(V1, V2), f2(V2, V4, V6), …
The locality of the functions can be exploited algorithmically.
Sum of Products

The sum-of-products problem is to compute from this representation

  Σ_{V1} Σ_{V2} … Σ_{Vn}  f1 × f2 × … × fm

The local functions assign a value to every complete instantiation of the variables (the product), and we want to compute some amalgamation of these values.
A number of different problems can be cast as instances of sum-of-products [Dechter 1999].
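Before any clever algorithm, the semantics can be pinned down with a brute-force sketch (illustrative names; with 0/1-valued local functions this counts CSP solutions):

from itertools import product
from math import prod

def sum_of_products(domains, functions):
    """domains: {var: [values]}; functions: list of (scope, f).  Sums
    the product of all local functions over every complete
    instantiation of the variables."""
    variables = list(domains)
    total = 0
    for vals in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, vals))
        total += prod(f(*(assignment[v] for v in scope))
                      for scope, f in functions)
    return total

# Count the solutions of the earlier constraint Q + X + Y >= 3:
doms = {'Q': [0, 1], 'X': [1, 2, 3], 'Y': [1, 2, 3]}
fns = [(('Q', 'X', 'Y'), lambda q, x, y: 1 if q + x + y >= 3 else 0)]
print(sum_of_products(doms, fns))   # number of satisfying assignments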
Sum of Products—Examples
- #CSP: count the number of solutions.
- Inference in Bayes Nets.
- Optimization: the functions are sub-objective functions returning real values, and the global objective is to maximize the sum of the sub-objectives (cf. soft constraints, generalized additive utility).
Algorithms: Brief History [Arnborg et al. 1988]

It had long been noted that various NP-complete problems on graphs were easy on trees.
With the characterization of NP-completeness, systematic study of how to extend these techniques beyond trees started in the 1970s.
A number of dynamic programming algorithms were developed for partial k-trees, which could solve many hard problems in time linear in the size of the graph (but exponential in k).
Algorithms: Brief History [Arnborg et al. 1988]

These ideas were made systematic by Robertson & Seymour, who wrote a series of 20 articles to prove Wagner’s conjecture [1983].
Along the way they defined the concepts of tree and branch decompositions and the graph parameters tree-width and branch-width.
It was subsequently noted that partial k-trees are equivalent to the class of graphs with tree-width ≤ k, so all of the dynamic programming algorithms developed for partial k-trees work for graphs of tree-width k.
The notion of tree-width has been exploited in many areas of computer science and combinatorics & optimization.
Three Types of Algorithms

These algorithms all take one of three basic forms, all of which achieve the same kinds of tree-width complexity guarantees.
To understand these forms we first introduce the notion of a branch decomposition (which is somewhat easier to utilize than a tree decomposition when dealing with local functions of arity greater than 2).
Branch Decomposition
Start with m leaf nodes, one for each of the local functions. Map each local function to some leaf node.
(Figure: the leaves f3, f6, f1, f4, f2, f7, f5.)
Branch Decomposition
Label each leaf node with the variables in the scope of the associated local function.
(Figure: the leaves labeled with the scopes {V4,V5}, {V5,V6,V7}, {V3,V7}, {V1,V3}, {V4,V6}, {V8,V6}, {V5,V2}.)
Branch Decomposition
Build a binary tree on top of these nodes.
(Figure: a binary tree over the labeled leaves; one internal node is labeled {V4,V6,V3}.)
Branch Decomposition
Then label the rest of the nodes of the tree.
(Figure: internal nodes labeled {V5,V6,V3}, {V4,V5}, {V3,V4,V6}, and {V4,V6,V3}.)
Internal Labels
(Figure: an internal node splits the leaves into A, the variables appearing in the subtree below the node, and B, the variables appearing in the rest of the tree, i.e., not in the subtree under the node. The node's label is the set of variables appearing in both A and B.)
Internal Labels
Branch Width

The width of a particular decomposition is the size of its largest label.
Branch width is the minimal width over all possible branch decompositions.
Branch width is no more than the tree-width.
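A minimal sketch (our own data layout, not from the talk) of computing the width of a given branch decomposition, using the fact that a node's label is exactly the set of variables occurring in leaves both below it and elsewhere in the tree:

from collections import Counter

class Node:
    def __init__(self, scope=None, left=None, right=None):
        self.scope, self.left, self.right = scope, left, right  # leaf iff scope

def decomposition_width(root):
    total = Counter()                     # leaf occurrences of each variable
    def tally(n):
        if n.scope is not None:
            total.update(n.scope)
        else:
            tally(n.left); tally(n.right)
    tally(root)

    best = 0
    def visit(n):                         # returns occurrences below n
        nonlocal best
        if n.scope is not None:
            below = Counter(n.scope)
        else:
            below = visit(n.left) + visit(n.right)
        label = [v for v in below if below[v] < total[v]]  # also occurs outside
        best = max(best, len(label))
        return below
    visit(root)
    return best

# e.g. two function scopes {V4,V5} and {V4,V6} under one internal node:
tree = Node(left=Node(scope=['V4', 'V5']), right=Node(scope=['V4', 'V6']))
print(decomposition_width(tree))          # 1: each leaf's label is {V4}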
Algorithms: Dynamic Programming

Bottom-up dynamic programming, e.g., Join Tree algorithms in Bayesian Inference.
(Figure: the branch decomposition processed from the leaves up.)
Algorithms: Variable Elimination

Linearize the bottom-up process: Variable Elimination.
(Figure: the same branch decomposition.)
Algorithms: Instantiation and Decomposition

Instantiate variables starting at the top (V4, V6 and V3) and decompose the problem.
(Figure: after instantiating V4, V6, and V3, the remaining problem splits into independent components over {V8, V1} and {V5, V2, V7}.)
Instantiation and Decomposition

A number of works have used this approach:
- Pseudo Tree Search [Freuder & Quinn 1985]
- Counting Solutions [Bayardo & Pehoushek 2000]
- Recursive Conditioning [Darwiche 2001]
- Tour Merging [Cook & Seymour 2003]
- AND-OR Search [Dechter & Mateescu 2004]
- …
Instantiation and Decomposition

Solved by AND/OR search: as we instantiate variables we examine the residual sub-problem.
If the sub-problem consists of disjoint parts that share no variables (components), we solve each component in a separate recursion.
Theoretical Results

With the right ordering this approach can solve the problem in time 2^O(w log n) and linear space, where w is the branch (tree) width of the instance.
If the solved components are cached so that they do not have to be solved again, the approach can solve the problem in time n^O(1) 2^O(w). But now we need n^O(1) 2^O(w) space.
Solving Sum-of-Products with Backtracking

In joint work with Toniann Pitassi & Shannon Dalmao we showed that caching is in fact sufficient to achieve these bounds with standard backtracking search [Bacchus et al. 2003].
AND/OR decomposition of the search tree is not necessary (and may be harmful); instead an ordinary decision tree can be searched.
Once again, caching provides a significant increase in the theoretical power of backtracking.
Simple Formula Caching

As assumptions are made during search the problem is reduced. In Simple Formula Caching we cache every solved residual formula, and if we encounter the same residual formula again we utilize its cached value instead of solving the same sub-problem again.
Two residual formulas are the same if:
- They contain the same (unassigned) variables.
- All instantiated variables in the remaining constraints (constraints with at least one unassigned variable) are instantiated to the same values.
Simple Formula Caching

C1(X,Y), C2(Y,Z), C3(Y,Q) under [X=a, Y=b] reduces to C2(Y=b,Z), C3(Y=b,Q)
C1(X,Y), C2(Y,Z), C3(Y,Q) under [X=b, Y=b] reduces to C2(Y=b,Z), C3(Y=b,Q)

These residual formulas are the same even though we obtained them from different instantiations.
Simple Formula Caching
BTSimpleCache(Φ):
  if InCache(Φ): return CachedValue(Φ)
  else:
    pick a variable V in Φ; value = 0
    for d in Domain[V]:
      value = value + BTSimpleCache(Φ|V=d)
    AddToCache(Φ, value)
    return value

Runs in time and space 2^O(w log n).
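The same algorithm rendered as runnable Python for #CSP (a sketch under our own data layout; the residual-formula key implements the equality test from the previous slide, and the base cases, implicit in the pseudocode, are made explicit):

def bt_simple_cache(domains, constraints, assign=None, cache=None):
    """Counts solutions; constraints are (scope, predicate) pairs as in
    the earlier sketches."""
    assign = assign or {}
    cache = cache if cache is not None else {}

    # any fully instantiated constraint that is violated kills the branch
    for scope, pred in constraints:
        if (all(v in assign for v in scope)
                and not pred(*(assign[v] for v in scope))):
            return 0
    unassigned = [v for v in domains if v not in assign]
    if not unassigned:
        return 1                                  # one solution found

    # residual-formula key: the unassigned variables plus the values of
    # instantiated variables mentioned in the remaining constraints
    mentioned = {v for scope, _ in constraints
                 if any(u not in assign for u in scope)
                 for v in scope if v in assign}
    key = (frozenset(unassigned),
           frozenset((v, assign[v]) for v in mentioned))
    if key in cache:
        return cache[key]

    var, value = unassigned[0], 0
    for d in domains[var]:
        value += bt_simple_cache(domains, constraints,
                                 {**assign, var: d}, cache)
    cache[key] = value
    return value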
Component Caching

We can achieve the same performance as AND/OR decomposition, i.e., 2^O(w log n) time with linear space or n^O(1) 2^O(w) time with n^O(1) 2^O(w) space, by examining the residual formula for disjoint components.
We cache these disjoint components as they are solved.
We remove any solved component from the residual formula. (A sketch of component detection follows.)
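Component detection amounts to finding connected components of the constraint graph over the unassigned variables; a sketch (the union-find layout is ours):

def components(unassigned, constraints):
    parent = {v: v for v in unassigned}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    for scope, _ in constraints:
        live = [v for v in scope if v in parent]
        for a, b in zip(live, live[1:]):    # link vars sharing a constraint
            parent[find(a)] = find(b)
    groups = {}
    for v in unassigned:
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())            # each list is one component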
Component Caching

Since components are no longer solved in a separate recursion, we have to be a bit cleverer about identifying the value of these components from the search computation.
This can be accomplished by using the cache in a clever way, or by dependency-tracking techniques.
Component Caching

There are some potential advantages to searching a single tree rather than an AND/OR tree.
With an AND/OR tree one has to commit to which component to solve first. The wrong choice, when doing Bayesian Inference or optimization with Branch and Bound, can be expensive.
In the single tree the components are solved in an interleaved manner. This also provides more flexibility with respect to variable ordering.
Bayesian Inference via Backtracking Search

These ideas were used to build a fairly successful Bayes Net reasoner [Bacchus et al. 2003].
Better performance, however, would require exploiting more of the structure internal to the local functions.
Exploiting Micro Structure

Suppose C1(A,Y,Z) = TRUE iff (A=0 and Y=1) or (A=1 and Y=0 and Z=1).
Then C1(A=0,Y,Z) is in fact not a function of Z. That is, C1(A=0,Y,Z) ≡ C1(A=0,Y).
Suppose C2(X,Y,Z) = TRUE iff X + Y + Z ≥ 3.
Then C2(X=3, Y, Z) is already satisfied.
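Both observations can be checked mechanically for constraints given extensionally as sets of allowed tuples; a naive sketch (exponential in arity, purely illustrative):

from itertools import product

def irrelevant(scope, allowed, domains, var):
    """True iff the constraint never depends on var: swapping var's
    value in any allowed tuple keeps it allowed (Z in C1(A=0,Y,Z))."""
    i = scope.index(var)
    return all(t[:i] + (d,) + t[i + 1:] in allowed
               for t in allowed for d in domains[var])

def already_satisfied(scope, allowed, domains):
    """True iff every assignment to the scope is allowed (C2(X=3,Y,Z))."""
    return all(t in allowed
               for t in product(*(domains[v] for v in scope)))

# C2 restricted to X=3 over Dom[Y] = Dom[Z] = {1,2,3}:
doms = {'Y': [1, 2, 3], 'Z': [1, 2, 3]}
allowed = {(y, z) for y in doms['Y'] for z in doms['Z'] if 3 + y + z >= 3}
print(already_satisfied(['Y', 'Z'], allowed, doms))   # True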
Exploiting Micro Structure

In both cases, if we could detect this during search we could potentially:
- Generate more components, e.g., if we could reduce C1(A=0,Y,Z) to C1(A=0,Y), perhaps Y and Z would be in different components.
- Generate more cache hits, e.g., if the residual formula differs from a cached formula only because it contains C2(X=3,Y,Z), recognizing that this constraint is already satisfied would allow us to ignore it and generate the cache hit.
Exploiting Micro Structure

It is interesting to note that if we encode to CNF we do get to exploit more of the micro structure (structure internal to the constraint): clauses with a true literal are satisfied and can be removed from the residual formula.
Bayes Net reasoners using CNF encodings have displayed very good performance [Chavira & Darwiche 2005].
Exploiting Micro Structure

Unfortunately, as pointed out before, encoding in CNF can result in an impractical blowup in the size of the problem representation.
Practical techniques for exploiting the micro structure remain a promising area for further research. There are some promising results by Kitching on detecting when a symmetric version of a current component has already been solved [Kitching & Bacchus 2007], but more work remains to be done.
Observations

Component caching solvers are the most effective way of exactly computing the number of solutions of a SAT formula.
They allow the solution of certain types of Bayesian Inference problems not solvable by other methods.
They have shown promise in solving decomposable optimization problems [Dechter & Marinescu 2005; de Givry et al. 2006; Kitching & Bacchus 2007].
To date all these works have used AND/OR search, so exploiting the advantages of plain backtracking search remains work to be done [Kitching in progress].
Better exploiting micro structure also remains work to be done.
Conclusions
Caching is a technique that has great potential for making a material difference in the effectiveness of backtracking search.
The range of practical mechanisms for exploiting caching remains a very fertile area for future research.
Research in this direction might well change present day “accepted practice” in constraint solving.
References

[Marques-Silva and Sakallah, 1996]
J. P. Marques-Silva and K. A. Sakallah. GRASP: a new search algorithm for satisfiability. In ICCAD, 220-227, 1996.
[Zhang et al., 2001] L. Zhang, C. F. Madigan, M. H. Moskewicz, and S. Malik. Efficient conflict driven
learning in a Boolean Satisfiability solver. In ICCAD, 279-285, 2001. [Cook & Reckhow 1977]
S. A. Cook and R. A. Reckhow, The relative efficiency of propositional proof systems, J. Symb. Logic, 44 (1977), 36-50.
[Buresh-Oppenheim, Pitassi 2003] J. Buresh-Oppenheim and T. Pitassi, The Complexity of Resolution Refinements, in
Proceedings of the 18th IEEE Symposium on Logic in Computer Science (LICS), June 2003, pp. 138-147
[Beame, Kautz, and Sabharwal 2003] P. Beame, H. Kautz, and A. Sabharwal: Towards Understanding and Harnessing the
Potential of Clause Learning. J. Artif. Intell. Res. (JAIR) 22: 319-351 (2004). [Dechter 1990]
R. Dechter: Enhancement Schemes for Constraint Processing: Backjumping, Learning, and Cutset Decomposition. Artif. Intell. 41(3): 273-312 (1990)
[T. Schiex & G. Verfaillie 1993]
T. Schiex and G. Verfaillie. Nogood recording for static and dynamic CSP. Proceedings of the 5th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'93), p. 48-55, Boston, MA, November 1993.
References [Frost & Dechter 1994]
D. Frost, R. Dechter: Dead-End Driven Learning. AAAI 1994: 294-300 [Jussien & Barichard 2000]
N. Jussien, V. Barichard "The PaLM system: explanation-based constraint programming" , Proceedings of TRICS: Techniques foR Implementing Constraint programming Systems, a post-conference workshop of CP 2000, pp. 118-133, 2000
[Walsh 2000] T. Walsh. SAT v CSP, Proceedings of CP-2000, pages 441-456, Springer-Verlag LNCS-1894, 2000.
[Katsirelos 2007] G. Katsirelos, NoGood Processing in CSPs. PhD thesis. Department of Computer Science,
University of Toronto. [Bacchus 2007]
F. Bacchus. GAC via Unit Propagation. International Conference on Principles and Practice of Constraint Programming (CP 2007) , pages 133-147.
[Katsirelos & Walsh 2007] G. Katsirelos and T. Walsh. A Compression Algorithm for Large Arity Extensional Constraints..
Proceedings of CP-2007, LNCS 4741, 2007. [Quimper & Walsh 2007]
C. Quimper and T. Walsh. Decomposing Global Grammar Constraints. Proceedings of CP-2007, LNCS 4741, 590-604 2007.
[Dechter 1999]
R. Dechter. Bucket elimination: A unifying framework for reasoning. Artificial Intelligence, October 1999.
References [de Givry et al. 2006]
S. de Givry, T. Schiex, G. Verfaillie. Exploiting Tree Decomposition and Soft Local Consistency in Weighted CSP. Proc. of AAAI'2006. Boston (MA), USA.