18th Feb: Using reachability heuristics for PO planning; Planning using Planning Graphs


Page 1: 18 th  Feb

18th Feb

Using reachability heuristics for PO planning

Planning using Planning Graphs

Page 2: 18 th  Feb

In the beginning it was all POP.

Then it was cruelly UnPOPped.

The good times return with Re(vived)POP.

Page 3: 18 th  Feb

A recent (turbulent) history of planning

1970s-1995: The whole world believed in POP and was happy to stack 6 blocks! UCPOP, Zeno [Penberthy & Weld]; IxTeT [Ghallab et al.]

1995: Advent of CSP-style compilation approaches: Graphplan [Blum & Furst], SATPLAN [Kautz & Selman]. Use of reachability analysis and disjunctive constraints.

1997: Domination of the heuristic state-search approach: HSP/R [Bonet & Geffner], UNPOP [McDermott]: POP is dead! Importance of good domain-independent heuristics.

2000-: Hoffman's FF, a state-search planner, won the AIPS-00 competition! ... but NASA's highly publicized RAX is still a POP dinosaur! POP believed to be a good framework to handle temporal and resource planning [Smith et al., 2000]. RePOP.

Page 4: 18 th  Feb

RePOP: A revival for partial order planning

Outline:

• To show that POP can be made very efficient by exploiting the same ideas that scaled up state-search and Graphplan planners:
  – Effective heuristic search control
  – Use of reachability analysis
  – Handling of disjunctive constraints

• RePOP, implemented on top of UCPOP:
  – Dramatically better than all known partial-order planners
  – Outperforms Graphplan and is competitive with state-search planners in many (parallel) domains

Page 5: 18 th  Feb

POP background: Partial plan representation

A partial plan is P = (A, O, L, OC, UL), where
  A: set of action steps in the plan, S0, S1, S2, ..., Sinf
  O: set of action ordering constraints, Si < Sj, ...
  L: set of causal links Si --p--> Sj
  OC: set of open conditions (subgoals that remain to be satisfied)
  UL: set of unsafe links Si --p--> Sj, where p is deleted by some action Sk

[Figure: example partial plan with steps S0, S1, S2, S3, Sinf, causal links for p and ~p, goals G = {g1, g2}, initial state I = {q1, q2}, and open conditions oc1, oc2]

Flaw: an open condition OR an unsafe link.
Solution plan: a partial plan with no remaining flaws:
• Every open condition is satisfied by some action
• No unsafe links exist (i.e. the plan is consistent)

Page 6: 18 th  Feb

Algorithm

1. Let P be an initial plan.
2. Flaw selection: choose a flaw f (either an open condition or an unsafe link).
3. Flaw resolution:
   • If f is an open condition, choose an action S that achieves f.
   • If f is an unsafe link, choose promotion or demotion.
   • Update P.
   • Return NULL if no resolution exists.
4. If there is no flaw left, return P; else go to 2.
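A minimal Python sketch of this loop may make the control flow concrete. The plan interface (`flaws()`, `refine()`) and the `select_flaw` / `resolvers` helpers are hypothetical stand-ins, not UCPOP or RePOP code; the point is only the flaw-select / flaw-resolve / backtrack structure.

```python
# Hypothetical sketch of the POP refinement loop (not the actual UCPOP/RePOP code).
# `plan` is assumed to expose .flaws() and .refine(r); the flaw-selection and
# flaw-resolution strategies are passed in as functions.

def pop(plan, select_flaw, resolvers):
    """Refine a partial plan until it has no flaws; return None on failure."""
    if plan is None:
        return None
    flaws = plan.flaws()                 # open conditions + unsafe links
    if not flaws:
        return plan                      # no flaws left: a solution plan
    f = select_flaw(flaws)               # choice: which flaw to work on next
    for r in resolvers(plan, f):         # backtrack point: how to resolve it
        # open condition -> add/reuse an achieving action;
        # unsafe link    -> promotion or demotion (or a disjunction, in RePOP)
        solution = pop(plan.refine(r), select_flaw, resolvers)
        if solution is not None:
            return solution
    return None                          # no resolution worked: backtrack
```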

[Figure: the initial plan and a refined partial plan for the running example, with steps S0, S1, S2, S3, Sinf, causal links for p and ~p, goals g1, g2 and open conditions oc1, oc2]

Choice points:
• Flaw selection (open condition? unsafe link?)
• Flaw resolution (how to select and rank the next partial plan?)
• Action selection (backtrack point)
• Unsafe-link selection (backtrack point)

1. Initial plan: just S0 and Sinf, with the goals g1 and g2 as open conditions.
2. Plan refinement (flaw selection and resolution).

POP background

Page 7: 18 th  Feb

Our approach (main ideas)

1. Ranking partial plans: use an effective distance-based heuristic estimator.

2. Exploit reachability analysis: use invariants to discover implicit conflicts in the plan.

3. Resolve unsafe links by posting disjunctive ordering constraints into the partial plan: avoid unnecessary and exponential multiplication of failures due to promotion/demotion splitting.

(Idea 1 borrows the state-space notion of distance heuristics; ideas 2 and 3 borrow CSP ideas of consistency enforcement.)

Page 8: 18 th  Feb

1. Ranking partial plans using distance-based heuristic

1. Ranking function: f(P) = g(P) + w·h(P)   (sketched below)
   g(P): number of actions in P
   h(P): estimate of the number of new actions needed to refine P into a solution plan
   w: weight that increases the greediness of the heuristic search

2. Estimating h(P): h(P) ≈ |O'|, where O' is the set of new actions needed to complete P.
   Difficulty: how to account for positive and negative
   - interactions among actions in O'
   - interactions among actions in P
   - interactions between O' and P
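As a small illustration (not the RePOP implementation), the ranking function can be written directly from this definition; the plan representation, the default weight, and the h estimate below are made up for the example, and h would in practice be the relaxed-plan estimate described two slides below.

```python
# Illustrative ranking function f(P) = g(P) + w*h(P) for ordering partial plans
# on the search queue. The plan representation, the default weight, and the
# h estimate are made-up examples.

def rank(plan, h, w=5.0):
    """Lower f-values are refined first; w > 1 makes the search greedier."""
    g = len(plan["actions"])       # actions already in the partial plan
    return g + w * h(plan)         # h: estimated number of new actions needed

example_plan = {"actions": ["s1", "s2", "s3"]}
print(rank(example_plan, h=lambda p: 2))   # 3 + 5.0 * 2 = 13.0
```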

[Figure: example partial plan P with steps S0-S4 and Sinf and open conditions p, q, r; the set O' of new actions (e.g. S5) needed to close them gives h(P) ≈ |O'| = 2]

Page 9: 18 th  Feb

Estimating h(P)

Assumption: negative effects of actions are relaxed (they are dealt with later, via the unsafe-link set).

P has no unsafe-link flaws => no negative interactions among actions in P, and no negative interactions between O' and P.

• |O'| ≈ cost(S), the cost needed to achieve the set of open conditions S from the initial state
• Any state-space distance heuristic can be adapted
• Informedness of the heuristic estimate can be improved by using weaker relaxation assumptions

[Figure: the same example partial plan, with its open condition set S = {p, q, r, ...}]

Page 10: 18 th  Feb

Distance-based heuristic estimate using length of relaxed plans

(adapted from state-space heuristics extracted from planning graphs [Nguyen & Kambhampati 2000], [Hoffman 2000],…)

Estimate h(P) = cost(S):
1. Build a planning graph PG from the initial state.
2. cost(S) := 0 if all subgoals in S are in level 0.
3. Let p be the subgoal in S that appears last in PG.
4. Pick an action a in the graph that first achieves p.
5. Update cost(S) := cost(a) + cost(S + Prec(a) - Eff(a)), where cost(a) = 0 if a is already in P, and 1 otherwise.
6. Replace S := S + Prec(a) - Eff(a); go to 2.
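A rough Python sketch of this extraction, over a hand-built planning-graph description. The level tables and the tiny blocks-world data are illustrative assumptions, not a real planning-graph library; the regression step uses the standard (S - Eff(a)) + Prec(a) form.

```python
# Sketch of the relaxed-plan estimate cost(S) described above.

def relaxed_cost(subgoals, prop_level, achievers, prec, add, in_plan=frozenset()):
    """subgoals: set of propositions to achieve.
    prop_level[p]: first PG level at which p appears (0 = initial state).
    achievers[p]: actions adding p, ordered by the level at which they first appear.
    prec[a]/add[a]: precondition and add lists; in_plan: actions already in P (cost 0)."""
    S = set(subgoals)
    cost = 0
    while any(prop_level[p] > 0 for p in S):
        p = max(S, key=lambda q: prop_level[q])   # subgoal appearing last in the PG
        a = achievers[p][0]                       # action that first achieves p
        cost += 0 if a in in_plan else 1
        S = (S - set(add[a])) | set(prec[a])      # regress S over a (delete effects ignored)
    return cost

# Tiny blocks-world-flavoured example with hand-built levels:
prop_level = {"h-A": 1, "onT-A": 0, "cl-A": 0, "he": 0}
achievers  = {"h-A": ["Pick-A"]}
prec = {"Pick-A": ["onT-A", "cl-A", "he"]}
add  = {"Pick-A": ["h-A"]}
print(relaxed_cost({"h-A"}, prop_level, achievers, prec, add))   # -> 1
```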

[Figure: planning graph levels 0-3; subgoal p is first achieved by action a, and S is regressed to S + Prec(a) - Eff(a)]

Page 11: 18 th  Feb

2. Handling unsafe link flaws

[Figure: causal link Si --p--> Sj threatened by step Sk, whose effect ~p conflicts with p]

1. For each unsafe link Si --p--> Sj threatened by another step Sk, add a disjunctive ordering constraint to O:

   Sk < Si  V  Sj < Sk

2. Whenever a new ordering constraint is introduced to O (or whenever you feel like it), perform constraint propagation:

   (S1 < S2 V S3 < S4) ^ (S4 < S3)  =>  S1 < S2
   (S1 < S2) ^ (S2 < S3)  =>  S1 < S3
   (S1 < S2) ^ (S2 < S1)  =>  False
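The sketch below shows one way to implement these disjunctions and propagation rules over explicit ordering pairs; the representation (pairs for Si < Sj, a brute-force transitive closure) is illustrative, not RePOP's implementation.

```python
# Sketch of posting disjunctive ordering constraints and propagating them.
# Orderings are (i, j) pairs meaning Si < Sj; a disjunction is a pair of such pairs.

def transitive_closure(order):
    order = set(order)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(order):
            for (c, d) in list(order):
                if b == c and (a, d) not in order:
                    order.add((a, d)); changed = True
    return order

def propagate(order, disjunctions):
    """Apply: (x<y V u<v) & (v<u) => x<y ;  x<y & y<x => inconsistent (None)."""
    order = transitive_closure(order)
    changed = True
    while changed:
        changed = False
        for (d1, d2) in list(disjunctions):
            if (d1[1], d1[0]) in order:      # first disjunct impossible: keep the second
                disjunctions.remove((d1, d2)); order.add(d2); changed = True
            elif (d2[1], d2[0]) in order:    # second disjunct impossible: keep the first
                disjunctions.remove((d1, d2)); order.add(d1); changed = True
        if changed:
            order = transitive_closure(order)
    if any((b, a) in order for (a, b) in order):
        return None, None                    # ordering cycle: inconsistent plan
    return order, disjunctions

# Unsafe link Si --p--> Sj threatened by Sk: post  Sk < Si  V  Sj < Sk
order = {("S0", "Si"), ("Si", "Sj"), ("Si", "Sk")}   # Si < Sk is already entailed
disj  = [(("Sk", "Si"), ("Sj", "Sk"))]
order, disj = propagate(order, disj)
print(("Sj", "Sk") in order)   # -> True: the disjunction collapsed to Sj < Sk
```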


• Avoid the unnecessary exponential multiplication of failing partial plans

Page 12: 18 th  Feb

3. Detecting indirect conflicts using reachability analysis

1. Reachability analysis to detect inconsistency
   • e.g. on(a,b) and clear(b) can never hold together
2. How do we get state information in a partial plan?
3. Cutset: a set of literals that must be true at some point during execution of the plan. For each action Sk:
     pre-C(Sk)  = Prec(Sk) U {p | Si --p--> Sj is a causal link and Si < Sk < Sj}
     post-C(Sk) = Eff(Sk)  U {p | Si --p--> Sj is a causal link and Si < Sk < Sj}
4. If there exists a cutset that violates an invariant, the partial plan is invalid and should be pruned.
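A small sketch of the cutset computation and the invariant check, with hand-made blocks-world data; the step/link representation, the `before` relation, and the invariant set are illustrative assumptions.

```python
# Sketch of the cutset computation and invariant check described above.

def cutsets(step, prec, eff, links, before):
    """links: (Si, p, Sj) causal links; before(a, b) is True iff a < b is entailed."""
    spanning = {p for (si, p, sj) in links if before(si, step) and before(step, sj)}
    pre_c  = set(prec[step]) | spanning
    post_c = set(eff[step])  | spanning
    return pre_c, post_c

def violates(cutset, invariants):
    """invariants: set of frozensets {l1, l2} of literals that can never hold together."""
    return any(pair <= cutset for pair in invariants)

# Example: Sk requires clear(b) while a link protects on(a,b) across Sk.
prec = {"Sk": ["clear(b)"]}
eff  = {"Sk": ["holding(b)"]}
links = [("Si", "on(a,b)", "Sj")]
before = lambda a, b: (a, b) in {("Si", "Sk"), ("Sk", "Sj"), ("Si", "Sj")}
pre_c, _ = cutsets("Sk", prec, eff, links, before)
invariants = {frozenset({"on(a,b)", "clear(b)"})}
print(violates(pre_c, invariants))   # -> True: prune this partial plan
```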

[Figure: step Sk lies between the causal links Si --p--> Sj and Sm --q--> Sn, giving the cutsets pre-C(Sk) = Prec(Sk) + p + q and post-C(Sk) = Eff(Sk) + p + q]

Disadvantage: inconsistency checking is passive and may be expensive.

Page 13: 18 th  Feb

Detecting indirect conflicts using reachability analysis

1. Generalizing unsafe links: Sk threatens the causal link Si --p--> Sj iff p is mutually exclusive (mutex) with either Prec(Sk) or Eff(Sk).

2. The unsafe link is resolved by posting disjunctive constraints (as before): Sk < Si V Sj < Sk.

[Figure: the same configuration as before, with Sk's preconditions or effects mutex with p or q]

• Detects indirect conflicts early
• Derives more disjunctive constraints to be propagated

Page 14: 18 th  Feb

Experiments on RePOP

• RePOP is implemented on top of the UCPOP planner using the three ideas presented
  – Written in Lisp, runs on Linux (500 MHz, 250 MB)
  – RePOP deals with sets of totally instantiated actions, thus avoiding binding constraints

• Compared RePOP against UCPOP, Graphplan and AltAlt in a number of benchmark domains
  – Performance metrics: time and solution quality

Page 15: 18 th  Feb

Comparing planning time (time in seconds)

RePOP vs. UCPOP, Graphplan, AltAlt

Problem     | UCPOP | RePOP        | Graphplan | AltAlt
Gripper-8   | -     | 1.01         | 66.82     | 0.43
Gripper-10  | -     | 2.72         | 47 min    | 1.15
Gripper-20  | -     | 81.86        | -         | 15.42
Rocket-a    | -     | 8.36         | 75.12     | 1.02
Rocket-b    | -     | 8.17         | 77.48     | 1.29
Logistics-a | -     | 3.16         | 306.12    | 1.59
Logistics-b | -     | 2.31         | 262.64    | 1.18
Logistics-c | -     | 22.54        | -         | 4.52
Logistics-d | -     | 91.53        | -         | 20.62
Bw-large-a  | -     | 45.78 (5.23) | 14.67     | 4.12
Bw-large-b  | -     | - (18.86)    | 122.56    | 14.14
Bw-large-c  | -     | - (137.84)   | -         | 116.34

Page 16: 18 th  Feb

Comparing planning time (summary)

RePOP vs. UCPOP, Graphplan, AltAlt

1. RePOP is very good in parallel domains (gripper, logistics, rocket, parallel blocks world):
   • Completely dominates UCPOP
   • Outperforms Graphplan in many domains
   • Competitive with AltAlt

2. RePOP is still inefficient in serial domains: Travel, Grid, 8-puzzle.

Page 17: 18 th  Feb

Some solution quality metrics

1. Number of actions

2. Makespan: minimum completion time (number of time steps)

3. Flexibility: Average number of actions that do not have ordering constraints with other actions

RePOP vs. UCPOP, Graphplan, AltAlt

[Figure: three example plans, each with 4 actions, illustrating the metrics: one with Num_act = 4, Makespan = 2, Flex = 1; one with Num_act = 4, Makespan = 2, Flex = 2; and a totally ordered plan with Num_act = 4, Makespan = 4, Flex = 0]

Page 18: 18 th  Feb

Comparing solution quality

Number of actions / time steps, and flexibility degree

Problem     | RePOP act/steps | Graphplan act/steps | AltAlt act/steps | RePOP flex | Graphplan flex | AltAlt flex
Gripper-8   | 21/15     | 23/15 | 21/21 | 0.57  | 0.69 | 0
Gripper-10  | 27/19     | 29/19 | 27/27 | 0.59  | 0.61 | 0
Gripper-20  | 59/39     | -     | 59/59 | 0.68  | -    | 0
Rocket-a    | 35/16     | 40/7  | 36/36 | 2.46  | 7.15 | 0
Rocket-b    | 34/15     | 30/7  | 34/34 | 7.29  | 4.80 | 0
Logistics-a | 52/13     | 80/11 | 64/64 | 20.54 | 6.58 | 0
Logistics-b | 42/13     | 79/13 | 53/53 | 20.0  | 5.34 | 0
Logistics-c | 50/15     | -     | 70/70 | 16.92 | -    | 0
Logistics-d | 69/33     | -     | 85/85 | 22.84 | -    | 0
Bw-large-a  | (8/5) -   | 11/4  | 9/9   | 2.75  | 2.0  | 0
Bw-large-b  | (11/8) -  | 18/5  | 11/11 | 3.28  | 2.67 | 0
Bw-large-c  | (17/10) - | -     | 19/19 | 5.06  | -    | 0

Page 19: 18 th  Feb

Comparing solution quality (summary)

RePOP generates partially ordered plans

• Number of actions: RePOP typically returns the shortest plans
• Number of time steps (makespan): Graphplan produces the optimal number of time steps (strictly, when all actions have the same duration); RePOP comes close
• Flexibility: RePOP typically returns the most flexible plans

Page 20: 18 th  Feb

Ablation studies

Problem     | UCPOP | +CE         | +HP         | +CE+HP (RePOP)
Gripper-8   | *     | 6557/3881   | *           | 1299/698
Gripper-10  | *     | 11407/6642  | *           | 2215/1175
Gripper-12  | *     | 17628/10147 | *           | 3380/1776
Gripper-20  | *     | *           | *           | 11097/5675
Rocket-a    | *     | *           | 30110/17768 | 7638/4261
Rocket-b    | *     | *           | 85316/51540 | 28282/16324
Logistics-a | *     | *           | 411/191     | 847/436
Logistics-b | *     | *           | 920/436     | 542/271
Logistics-c | *     | *           | 4939/2468   | 7424/4796
Logistics-d | *     | *           | *           | 16572/10512

CE: consistency-enforcement techniques (reachability analysis and disjunctive constraint handling)
HP: distance-based heuristic

Page 21: 18 th  Feb

VHPOP: a successor to RePOP

Page 22: 18 th  Feb

Flaw Selection

• RePOP doesn't particularly concentrate on flaw selection order. Any order guarantees completeness, but different orders have different efficiency.
• For RePOP, unsafe links are basically handled by disjunctive ordering constraints
  – So, we need an order for open conditions
  – Ideas:
    • LIFO/FIFO
    • Pick open conditions with the least number of resolution choices (LCFR)
    • Pick open conditions that have the highest cost (in terms of reachability)
    • Try a whole bunch in parallel! (This is what VHPOP does, although it doesn't use reachability-based ordering.)

Page 23: 18 th  Feb

Summary (till now)

• Progression / regression / partial-order planners
• Reachability heuristics for focusing them
• In practice, for classical planning, progression planners with reachability heuristics (e.g. FF) seem to do best
  – Assuming that we care mostly about "finding" a plan that is cheapest in terms of # actions

(Sort of) open issues include:
  – Handling lifted actions (i.e. considering partially instantiated actions)
  – Handling optimality criteria other than # actions
    • Minimal cost (assuming actions have non-uniform costs)
    • Minimal makespan
    • Maximal flexibility

Page 24: 18 th  Feb

Disjunctive planning/Bounded length plan finding

Page 25: 18 th  Feb

PGs can be used as a basis for finding plans directly

If there exists a k-length plan, it will be a subgraph of the k-length planning graph. (see the highlighted subgraph of the PG for our example problem)

[Figure: 2-level planning graph for the blocks-world example, with propositions onT-A, onT-B, cl-A, cl-B, he, h-A, h-B, on-A-B, on-B-A and actions Pick-A, Pick-B, St-A-B, St-B-A, Ptdn-A, Ptdn-B; the subgraph corresponding to a valid plan is highlighted]

Page 26: 18 th  Feb

20th Feb

Page 27: 18 th  Feb

Finding the subgraphs that correspond to valid solutions..

-- Can use specialized graph traversal techniques:
   -- Start from the end; put the vertices corresponding to the goals in.
   -- If they are mutex, there is no solution.
   -- Else, put at least one of the supports of each of those goals in.
   -- Make sure that the supports are not mutex.
   -- If they are mutex, backtrack and choose another set of supports.
      {No backtracking if we have no mutexes; this is the basis for "relaxed plans"}
   -- At the next level, subgoal on the preconditions of the support actions we chose.
   -- The recursion ends at the init level.
-- Consider extracting the plan from the PG directly.
-- This search can also be cast as a CSP, SAT, or IP.
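A compact Python sketch of this backward extraction over a planning graph given as per-level achiever and mutex tables. The data structures and the tiny one-level example are illustrative, not Graphplan's own code.

```python
# Sketch of level-by-level backward extraction from a planning graph.

def extract(goals, level, achievers, mutex, prec):
    """Return a list of action sets (one per level) achieving `goals`, or None."""
    if level == 0:
        return []                        # level-0 propositions hold in the initial state
    return choose(sorted(goals), set(), level, achievers, mutex, prec)

def choose(goals, chosen, level, achievers, mutex, prec):
    if not goals:                        # all goals at this level supported
        subgoals = {p for a in chosen for p in prec[a]}
        rest = extract(subgoals, level - 1, achievers, mutex, prec)
        return None if rest is None else rest + [set(chosen)]
    p, remaining = goals[0], goals[1:]
    for a in achievers[level].get(p, []):            # includes noop ("persist") actions
        if all(frozenset({a, b}) not in mutex[level] for b in chosen):
            plan = choose(remaining, chosen | {a}, level, achievers, mutex, prec)
            if plan is not None:
                return plan
    return None                          # no non-mutex support: backtrack

# One-level example: achieve h-A using Pick-A (its preconditions hold initially).
achievers = {1: {"h-A": ["Pick-A"]}}
mutex = {1: set()}
prec = {"Pick-A": ["onT-A", "cl-A", "he"]}
print(extract({"h-A"}, 1, achievers, mutex, prec))   # -> [{'Pick-A'}]
```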

[Figure: the same 2-level blocks-world planning graph, with the extracted solution subgraph highlighted]

The idea behind Graphplan

Page 28: 18 th  Feb

Backward search in Graphplan

[Figure (animated): backward search over the planning graph, with goals G1-G4 supported by actions A1-A4, which subgoal on propositions P1-P6; these are in turn supported by actions A5-A11 over initial-level propositions I1-I3, with X marks indicating mutexes]

Page 29: 18 th  Feb

Graphplan “History”

• Avrim Blum & Merrick Furst (1995) first came up with Graphplan idea—when planning community was mostly enamored with PO planning– Their original motivation was to develop a planner

based on “max-flow” ideas• Think of preconditions and effects as pipes and

actions as valves… You want to cause maximal fluid flow from init state to a certain set of literals in the goal level

• Maxflow is polynomial (but planning isn’t—because of the nonlinearity caused by actions—unless ALL preconditions are in, the “action valve” won’t activate the effect pipes…

• So they wound up finding a backward search idea instead

– Check out the animation…

Page 30: 18 th  Feb

The Story Behind Memos…

• Memos essentially tell us that a particular set S of conditions cannot be achieved at a particular level k in the PG.
  – We may as well remember this information, so that if we wind up subgoaling on any set S' of conditions where S' is a superset of S at that level, we can immediately declare failure.
• "Nogood" learning: storage/matching cost vs. benefit of reduced search... generally in our favor.
• But just because a set S = {C1, ..., C100} cannot be achieved together doesn't necessarily mean that the reason for the failure involves ALL those 100 conditions. Some of them may be innocent bystanders.
  – Suppose we can "explain" the failure as being caused by a set U that is a subset of S (say U = {C45, C97}); then U is more powerful in pruning later failures.
  – This idea is called "Explanation-Based Learning" (EBL).
    • Improves Graphplan performance significantly....

[Rao, IJCAI-99; JAIR 2000]

Page 31: 18 th  Feb

Explaining Failures with Conflict Sets

Whenever P can't be given a value v because it conflicts with the assignment of Q, add Q to P's conflict set.

[Figure: a backward-search level with propositions P1-P6 and supporting actions A5-A11; P4's conflict set starts as {P4} and grows to include P2 and P1 as its candidate values conflict with their assignments]

Page 32: 18 th  Feb

[Figure: the same backward-search level with propositions P1-P6 and actions A5-A11]

DDB & Memoization (EBL) with Conflict Sets

When we reach a variable V with conflict set C during backtracking:
 -- Skip other values of V if V is not in C (DDB)
 -- Absorb C into the conflict set of V if V is in C
 -- Store C as a memo if V is the first variable at this level

Conflict set for P4 = {P4, P2, P1}
Conflict set for P3 = {P3, P2}   (skip over P3 when backtracking from P4)
Conflict set for P2 = {P4, P2, P1}   (absorb the conflict set being passed up)
Conflict set for P1 = {P4, P2, P1, P3}

Store {P1, P2, P3, P4} as a memo.

Page 33: 18 th  Feb

Regressing Conflict Sets

[Figure: goals G1-G4 at the previous level, supported by actions A1-A4, which subgoal on propositions P1-P6]

{P1, P2, P3, P4} regresses to {G1, G2}
 -- P1 could have been regressed to G4, but G1 was assigned earlier
 -- We can skip over G4 & G3 (DDB)

Regression: what is the minimum set of goals at the previous level whose chosen action supports generate a subgoal set that covers the memo?
 -- Minimal set
 -- When there is a choice, choose a goal that has been assigned earlier
 -- Supports more DDB

Page 34: 18 th  Feb

Using EBL Memos

If any stored memo is a subset of the current goal set, backtrack immediately

• Return the memo as the conflict set

Smaller memos are more general and thus prune more failing branches.

Costlier memo-matching strategy
 -- Clever indexing techniques are available: Set Enumeration Trees [Rymon, KRR-92], UB-Trees [Hoffman & Koehler, IJCAI-99]

Allows generation of more effective memos at higher levels... not possible with normal memoization.
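A minimal sketch of memo storage and the subset test, using plain sets per level; a real implementation would use the indexing structures cited above.

```python
# Sketch of memo (nogood) storage and lookup: before searching a goal set at a
# level, check whether any stored memo is a subset of it.

class MemoTable:
    def __init__(self):
        self.memos = {}                       # level -> set of frozenset memos

    def store(self, level, goal_set):
        self.memos.setdefault(level, set()).add(frozenset(goal_set))

    def lookup(self, level, goal_set):
        """Return a failing memo (usable as the conflict set) contained in
        goal_set, or None if no stored memo applies."""
        goals = frozenset(goal_set)
        for memo in self.memos.get(level, ()):
            if memo <= goals:
                return memo                   # backtrack immediately with this memo
        return None

memos = MemoTable()
memos.store(3, {"C45", "C97"})
print(memos.lookup(3, {"C1", "C45", "C97", "C100"}))   # -> frozenset({'C45', 'C97'})
print(memos.lookup(3, {"C1", "C45"}))                  # -> None
```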


Page 35: 18 th  Feb

Speedups are correlated with memo-length reduction

Problem      | Length | GP+EBL | Graphplan | Speedup
Huge-fct     | 18/18  | 9.5    | 11.3      | 1.7x
BW-large-B   | 18/18  | 10.2   | 11.83     | 1.8x
Rocket-ext-a | 7/36   | 8.5    | 23.9      | 24x
Rocket-ext-b | 7/36   | 7.5    | 23.8      | 17x
Att-log-a    | 11/79  | 8.21   | 32        | >1215x
Gripper-8    | 15/23  | 9      | 17.8      | 90x
Gripper-10   | 19/29  | 11     | -         | >10x
Tower-5      | 31/31  | 6.7    | 20.9      | 42x
Tower-6      | 63/63  | 7.9    | 22.3      | >40x
TSP-10       | 10/10  | 6.9    | 13        | >25x
TSP-12       | 12/12  | 7.9    | -         | >58x
Ferry-5      | 31/31  | 8.8    | 25        | 37x
Ferry-6      | 39/39  | 10.9   | -         | >25x

Page 36: 18 th  Feb

Goal (variable) / Action (value) selection heuristics

• Pick hardest-to-satisfy variables (goals) first
• Pick easiest-to-satisfy values (actions) first
  – Hardness as:
    • Cardinality (goals that are supported by 15 actions are harder than those that can be supported by 17 actions)
    • Cost:
      – level of the goal (or set of action preconditions) in the PG
      – the length of the relaxed plan for supporting that goal in the PG

[Romeo, AIPS-2000; also second part of AltAlt paper]

Page 37: 18 th  Feb

Level heuristics help on solution-bearing levels.

Page 38: 18 th  Feb

Level heuristics tend to be insensitive to length of the PG

Page 39: 18 th  Feb

More stuff on Graphplan

• Graphplan differentiated between static interference and mutex:
  – Two actions interfere statically if one's effects are inconsistent with the other action's preconditions/effects
  – Two actions are mutex if they are either statically interfering or have been marked mutex by the mutex propagation procedure
• As long as we have the static interference relations marked, we are guaranteed to find a solution with backward search!
• Mutex propagation only IMPROVES the efficiency of the backward search. Mutex propagation is thus very similar to consistency enforcement in CSP. Memoization improves it further, and efficient memoization improves it further still...
• The original Graphplan algorithm used "parallel planning graphs" (rather than serial planning graphs):
  – Not every pair of non-noop actions is marked mutex
  – This means you can get multiple actions per time step
• A serial PG has more mutex relations: apart from the interferences that come from preconditions/effects, we are basically adding "resource-based" mutexes, saying that the agent doesn't have the resources to do more than one action per level.

Page 40: 18 th  Feb

Optimality of Graphplan

• Original Graphplan will produce "step-optimal" plans
  – NOT optimal w.r.t. # actions
    • Can get that with serial Graphplan
  – NOT cost-optimal
    • Need Multi-PEGG (according to Terry)

Page 41: 18 th  Feb

Graphplan and Termination

• Suppose we grew the graph to level-off and still did not find a solution.
  – Is the problem unsolvable?
• Example: actions A1 ... A100 give goals G1 ... G100, and we can't do more than one action per level (assume we are using a serial PG).
  • Level at which G1..G100 are all true = ?
  • Length of the plan = ?
• One can see the process of extracting the plan as verifying that at least one execution thread is devoid of n-ary mutexes.
  – The problem is unsolvable if the memos also do not change from level to level.

Page 42: 18th Feb

Page 43: 18th Feb

[Figure: the same 2-level blocks-world planning graph]

Conversion to CSP

-- This search can also be cast as a CSP:
   Variables: literals in the proposition lists
   Values: actions supporting them
   Constraints: mutex and activation constraints

Variables/Domains:
  ~cl-B-2: {#, St-A-B-2, Pick-B-2}
  he-2: {#, St-A-B-2, St-B-A-2, Ptdn-A-2, Ptdn-B-2}
  h-A-1: {#, Pick-A-1}
  h-B-1: {#, Pick-B-1}
  ...

Constraints:
  he-2 = St-A-B-2 => h-A-1 != #                  {activation}
  On-A-B-2 = St-A-B-2 => On-B-A-2 != St-B-A-2    {mutex constraints}

Goals:
  ~cl-B-2 != #
  he-2 != #
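To make the encoding concrete, here is a tiny hand-built instance of these variables, domains, and constraints, solved by brute-force enumeration; it is illustrative only and is not the GP-CSP encoder.

```python
from itertools import product

# Hand-built instance of the CSP encoding on this slide. '#' means "no
# supporting action chosen".
variables = {
    "~cl-B-2":  ["#", "St-A-B-2", "Pick-B-2"],
    "he-2":     ["#", "St-A-B-2", "St-B-A-2", "Ptdn-A-2", "Ptdn-B-2"],
    "on-A-B-2": ["#", "St-A-B-2"],
    "on-B-A-2": ["#", "St-B-A-2"],
    "h-A-1":    ["#", "Pick-A-1"],
    "h-B-1":    ["#", "Pick-B-1"],
}

constraints = [
    # activation: choosing St-A-B-2 to support he-2 requires h-A-1 to be supported
    lambda a: a["he-2"] != "St-A-B-2" or a["h-A-1"] != "#",
    # mutex: St-A-B-2 and St-B-A-2 cannot both be chosen
    lambda a: not (a["on-A-B-2"] == "St-A-B-2" and a["on-B-A-2"] == "St-B-A-2"),
    # goals must be supported by some action
    lambda a: a["~cl-B-2"] != "#",
    lambda a: a["he-2"] != "#",
]

names = list(variables)
for values in product(*(variables[n] for n in names)):   # brute-force search
    assignment = dict(zip(names, values))
    if all(c(assignment) for c in constraints):
        print(assignment)                                 # first consistent assignment
        break
```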

Page 44: 18 th  Feb

CSP Encodings can be faster than Graphplan Backward Search

Problem    | Graphplan: time (s) / mem | Satz: time (s) / mem | Relsat: time (s) / mem | GP-CSP: time (s) / mem
bw-12steps | 0.42 / 1 M   | 8.17 / 64 M  | 3.06 / 70 M  | 1.96 / 3 M
bw-large-a | 1.39 / 3 M   | 47.63 / 88 M | 29.87 / 87 M | 1.2 / 11 M
rocket-a   | 68 / 61 M    | 8.88 / 70 M  | 8.98 / 73 M  | 4.01 / 3 M
rocket-b   | 130 / 95 M   | 11.74 / 70 M | 17.86 / 71 M | 6.19 / 4 M
log-a      | 1771 / 177 M | 7.05 / 72 M  | 4.40 / 76 M  | 3.34 / 4 M
log-b      | 787 / 80 M   | 16.13 / 79 M | 46.24 / 80 M | 110 / 4.5 M
hsp-bw-02  | 0.86 / 1 M   | 7.15 / 68 M  | 2.47 / 66 M  | 0.89 / 4.5 M
hsp-bw-03  | 5.06 / 24 M  | > 8 hrs / -  | 194 / 121 M  | 4.47 / 13 M
hsp-bw-04  | 19.26 / 83 M | > 8 hrs / -  | 1682 / 154 M | 39.57 / 64 M

Do & Kambhampati, 2000

But WHY?
 -- We are paying the cost of converting the PG into a CSP (and we also tend to lose the ability to reuse search from previous levels).
 -- Yet there is NO reason why the search for finding the valid subgraph has to go level-by-level and back to front.
 -- The CSP search won't be hobbled by level-by-level, back-to-front order.

Page 45: 18 th  Feb

Mutex propagation as CSP pre-processing

• Suppose we start with a PG that only marks every pair of "interfering" actions as mutex:
  • Any pair of non-noop actions is interfering
  • Any pair of actions is interfering if one gives P and the other gives or requires ~P
  • No propagation is done
  – Converting this PG to a CSP and solving it will still give a valid solution (if there is one)
  – So what is mutex propagation doing?
    • It is "explicating" implicit constraints
    • A special subset of "3-consistency" enforcement
      – Recall that enforcing k-consistency involves adding (k-1)-ary constraints
      – *Not* full 3-consistency (which can be much costlier)
        » So enforcing this consistency on the PG is cheaper than enforcing it after conversion to CSP...

Page 46: 18 th  Feb

Alternative encodings..

• The problem of finding a valid plan from the planning graph can be encoded on any combinatorial substrate

• Alternatives:
  – CSP [GP-CSP]
  – SAT [Blackbox; SATPLAN]
  – IP [Vossen et al.]

Page 47: 18 th  Feb

Compilation to CSP [Do & Kambhampati, 2000]

Variables: propositions (In-A-1, In-B-1, ... At-R-E-0, ...)
Domains: actions supporting that proposition in the plan
  In-A-1: {Load-A-1, #}
  At-R-E-1: {P-At-R-E-1, #}
Constraints:
  Mutual exclusion: ~[(In-A-1 = Load-A-1) & (At-R-M-1 = Fly-R-1)]; etc.
  Activation:
    In-A-1 != # & In-B-1 != #                         (goals must have action assignments)
    In-A-1 = Load-A-1 => At-R-E-0 != #, At-A-E-0 != #  (subgoal activation constraints)

[Corresponds to a regression-based proof]

[Figure: one level of the rocket-domain planning graph: level-0 propositions At(R,E), At(A,E), At(B,E); actions Load(A), Load(B), Fly(R) plus noops P-At(R,E), P-At(A,E), P-At(B,E); level-1 propositions In(A), In(B), At(R,M), At(R,E), At(A,E), At(B,E). Goals: In(A), In(B)]

CSP: given a set of discrete variables, the domains of the variables, and constraints on the specific values a set of variables can take in combination, FIND an assignment of values to all the variables that respects all the constraints.

Page 48: 18 th  Feb

Compilation to SAT

Init: At-R-E-0 & At-A-E-0 & At-B-E-0
Goal: In-A-1 & In-B-1

Graph: "condition at level k => one of its supporting actions at level k-1"
  In-A-1 => Load-A-1
  In-B-1 => Load-B-1
  At-R-M-1 => Fly-R-1
  At-R-E-1 => P-At-R-E-1

"Actions => preconditions"
  Load-A-1 => At-R-E-0 & At-A-E-0
  Load-B-1 => At-R-E-0 & At-B-E-0
  P-At-R-E-1 => At-R-E-0

"Mutexes"
  ~In-A-1 V ~At-R-M-1
  ~In-B-1 V ~At-R-M-1
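The same encoding can be written out mechanically as CNF clauses; the sketch below uses string literals with a leading '-' for negation and is illustrative only, not the SATPLAN/Blackbox encoder.

```python
# Sketch of the propositional encoding above as CNF clauses over string literals.

def implies(antecedent, consequents):
    """A => (c1 & c2 & ...) becomes one clause (~A v ci) per consequent."""
    return [["-" + antecedent, c] for c in consequents]

def supports(cond, actions):
    """cond at level k => one of its supporting actions at level k-1."""
    return [["-" + cond] + actions]

clauses = []
clauses += [["At-R-E-0"], ["At-A-E-0"], ["At-B-E-0"]]          # initial state
clauses += [["In-A-1"], ["In-B-1"]]                             # goals
clauses += supports("In-A-1", ["Load-A-1"])                     # condition => achievers
clauses += supports("In-B-1", ["Load-B-1"])
clauses += supports("At-R-M-1", ["Fly-R-1"])
clauses += implies("Load-A-1", ["At-R-E-0", "At-A-E-0"])        # actions => preconditions
clauses += implies("Load-B-1", ["At-R-E-0", "At-B-E-0"])
clauses += [["-In-A-1", "-At-R-M-1"], ["-In-B-1", "-At-R-M-1"]] # mutexes
for c in clauses:
    print(" v ".join(c))
```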

[Figure: the same rocket-domain planning graph. Goals: In(A), In(B)]

[Kautz & Selman]

SAT is CSP with Boolean Variables

Page 49: 18 th  Feb

Compilation to Integer Linear Programming

• Motivations
  – Ability to handle numeric quantities, and do optimization
  – Heuristic value of the LP relaxation of ILP problems
• Conversion (sketched below)
  – Convert a SAT/CSP encoding to ILP inequalities
    • E.g. X v ~Y v Z  =>  x + (1 - y) + z >= 1
  – Explicitly set up tighter ILP inequalities (cutting constraints)
    • If X, Y, Z are pairwise mutex, we can write x + y + z <= 1 (instead of x + y <= 1; y + z <= 1; z + x <= 1)

[Walser & Kautz; Vossen et al.; Bockmayr & Dimopoulos]

ILP: given a set of real-valued variables, a linear objective function on the variables, a set of linear inequalities on the variables, and a set of integrality restrictions on the variables, find the values of the feasible variables for which the objective function attains the maximum value.
 -- 0/1 integer programming corresponds closely to the SAT problem.
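A small sketch of the clause-to-inequality conversion and the tighter mutex cut mentioned above. Inequalities are built as strings for readability; this is illustrative only, not the Vossen et al. or Walser & Kautz encodings.

```python
# Sketch of converting a clause to a 0/1 ILP inequality, and of the pairwise-mutex cut.

def clause_to_inequality(clause):
    """A clause like ['X', '-Y', 'Z'] becomes x + (1 - y) + z >= 1,
    printed here in its simplified form."""
    terms, constant = [], 0
    for lit in clause:
        if lit.startswith("-"):
            terms.append("-" + lit[1:].lower())
            constant += 1                      # ~Y contributes (1 - y)
        else:
            terms.append(lit.lower())
    rhs = 1 - constant
    return " + ".join(terms).replace("+ -", "- ") + " >= " + str(rhs)

def pairwise_mutex_cut(variables):
    """If X, Y, Z are pairwise mutex, emit the single cut x + y + z <= 1."""
    return " + ".join(v.lower() for v in variables) + " <= 1"

print(clause_to_inequality(["X", "-Y", "Z"]))   # -> x - y + z >= 0
print(pairwise_mutex_cut(["X", "Y", "Z"]))      # -> x + y + z <= 1
```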

Page 50: 18 th  Feb

Relative tradeoffs offered by the various compilation substrates

• CSP encodings support implicit representations
  – More compact encodings [Do & Kambhampati, 2000]
  – Easier integration with scheduling techniques
• ILP encodings support numeric quantities
  – Seamless integration of numeric resource constraints [Walser & Kautz, 1999]
  – Not competitive with CSP/SAT for problems without numeric constraints
• SAT encodings support axioms in propositional logic form
  – May be more natural to add (for whom? ;-)

Page 51: 18 th  Feb

CSP Encodings can be more compact: GP-CSP

Problem    | Graphplan: time (s) / mem | Satz: time (s) / mem | Relsat: time (s) / mem | GP-CSP: time (s) / mem
bw-12steps | 0.42 / 1 M   | 8.17 / 64 M  | 3.06 / 70 M  | 1.96 / 3 M
bw-large-a | 1.39 / 3 M   | 47.63 / 88 M | 29.87 / 87 M | 1.2 / 11 M
rocket-a   | 68 / 61 M    | 8.88 / 70 M  | 8.98 / 73 M  | 4.01 / 3 M
rocket-b   | 130 / 95 M   | 11.74 / 70 M | 17.86 / 71 M | 6.19 / 4 M
log-a      | 1771 / 177 M | 7.05 / 72 M  | 4.40 / 76 M  | 3.34 / 4 M
log-b      | 787 / 80 M   | 16.13 / 79 M | 46.24 / 80 M | 110 / 4.5 M
hsp-bw-02  | 0.86 / 1 M   | 7.15 / 68 M  | 2.47 / 66 M  | 0.89 / 4.5 M
hsp-bw-03  | 5.06 / 24 M  | > 8 hrs / -  | 194 / 121 M  | 4.47 / 13 M
hsp-bw-04  | 19.26 / 83 M | > 8 hrs / -  | 1682 / 154 M | 39.57 / 64 M

Do & Kambhampati, 2000

Page 52: 18 th  Feb

Advantages of CSP encodings over SAT encodings: GP-CSP

Size of learning: k = 10 for both size-based and relevance-based EBL
Speedup over GP-CSP: up to 10x
Faster than SAT in most cases, up to 70x over Blackbox

Problem    | Size-based EBL: time (s) / mem | Rel-based EBL: time (s) / mem | Speedup/Mem-ratio for Rel-10 EBL vs. GP-CSP | vs. GP | vs. SATZ | vs. Relsat
bw-12steps | 1.31 / 11 M  | 1.46 / 10 M  | 1.34x/0.30x  | 0.28x/0.1x   | 5.60x/6.40x  | 2.10x/7.00x
bw-large-a | 259 / 24 M   | 134 / 26 M   | 9.21x/0.42x  | 0.01x/0.12x  | 0.36x/3.38x  | 0.22x/3.34x
rocket-a   | 2.08 / 8 M   | 2.39 / 11 M  | 1.68x/0.27x  | 28.45x/5.54x | 3.72x/6.36x  | 3.76x/6.63x
rocket-b   | 3.55 / 9 M   | 4.00 / 10 M  | 1.55x/0.40x  | 32.50x/9.50x | 2.94x/7.00x  | 4.47x/7.10x
log-a      | 2.37 / 18 M  | 2.30 / 18 M  | 1.45x/0.22x  | 770x/9.83x   | 3.07x/4.00x  | 1.91x/4.22x
log-b      | 39.55 / 19 M | 35.13 / 19 M | 3.13x/0.29x  | 22.40x/4.21x | 0.46x/4.16x  | 1.32x/4.21x
log-c      | 61 / 24 M    | 48.72 / 25 M | 10.47x/0.88x | > 220x       | 24.42x/3.36x | 2.61x/3.56x
hsp-bw-02  | 1.03 / 12 M  | 1.09 / 12 M  | 0.82x/0.37x  | 0.79x/0.08x  | 6.56x/5.67x  | 2.27x/5.50x
hsp-bw-03  | 5.04 / 26 M  | 5.17 / 41 M  | 0.86x/0.32x  | 0.98x/0.59x  | > 5570x      | 37.52x/2.95x
hsp-bw-04  | 38.01 / 86 M | 23.89 / 86 M | 1.65x/0.75x  | 0.81x/0.97x  | > 1205x      | 70.41x/1.79x

Page 53: 18 th  Feb

Direct vs. compiled solution extraction

DIRECT
 -- Need to adapt CSP/SAT techniques
 -- Can exploit approaches for compacting the plan
 -- Can make the search incremental across iterations

Compiled
 -- Can exploit the latest advances in SAT/CSP solvers
 -- Compilation stage can be time-consuming and leads to memory blow-up
 -- Makes it harder to exploit search from previous iterations
 -- Makes it easier to add declarative control knowledge