Probabilistic Planning via Determinization in Hindsight: FF-Hindsight


Page 1: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Sungwook Yoon – Probabilistic Planning via Determinization

Probabilistic Planning via Determinization in Hindsight

FF-Hindsight

Sungwook Yoon

Joint work with Alan Fern, Bob Givan and Rao Kambhampati

Page 2: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Probabilistic Planning Competition

Client: participants, send actions

Server: competition host, simulates actions

Page 3: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

The Winner was ……

• FF-Replan
– A replanner that uses FF
– The probabilistic domain is determinized

• Interesting contrast
– Many probabilistic planning techniques
  • Work in theory but do not work in practice
– FF-Replan
  • No theory
  • Works in practice

Page 4: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

The Paper’s Objective

Better determinization approach (Determinization in Hindsight)

Theoretical consideration of the new determinization (in Hindsight)

New view on FF-Replan

Experimental studies with determinization in Hindsight (FF-Hindsight)

Page 5: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Probabilistic Planning (goal-oriented)

[Figure: tree of states and actions rooted at the initial state I, over Time 1 and Time 2. At each state the actions A1 and A2 have probabilistic outcomes; leaves are goal states or dead ends. Objective: maximize goal achievement. Left outcomes are more likely.]

Page 6: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

All Outcome Replanning (FFRA), ICAPS-07

[Figure: an action with Effect 1 (Probability1) and Effect 2 (Probability2) is split into two deterministic actions: Action1 with Effect 1 and Action2 with Effect 2.]
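The all-outcome split described on this slide can be sketched in a few lines of Python. This is only an illustration: the class names, the effect strings and the probabilities are hypothetical, and the real FF-Replan operates on PPDDL domains rather than on objects like these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbabilisticAction:
    name: str
    outcomes: tuple  # (probability, effect) pairs; values below are made up


@dataclass(frozen=True)
class DeterministicAction:
    name: str
    effect: str


def all_outcomes_determinize(actions):
    """Create one deterministic action per probabilistic outcome.

    The probabilities are dropped on purpose: this is the static
    determinization that ignores how likely each outcome is.
    """
    deterministic = []
    for action in actions:
        for index, (_probability, effect) in enumerate(action.outcomes, start=1):
            deterministic.append(DeterministicAction(f"{action.name}-{index}", effect))
    return deterministic


actions = [
    ProbabilisticAction("A1", ((0.7, "effect-1"), (0.3, "effect-2"))),
    ProbabilisticAction("A2", ((0.6, "effect-1"), (0.4, "effect-2"))),
]
determinized = all_outcomes_determinize(actions)
names = [a.name for a in determinized]
print(names)
```

The resulting names follow the A1-1, A1-2, A2-1, A2-2 labelling used in the determinized tree on the following slide.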

Page 7: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Probabilistic Planning: All Outcome Determinization

[Figure: the planning tree rooted at the initial state I after all-outcome determinization. The actions A1 and A2 are replaced at every state by the deterministic actions A1-1, A1-2, A2-1 and A2-2 over Time 1 and Time 2; leaves are goal states or dead ends. Objective: find a goal.]

Page 9: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Problem of FF-Replan and a better alternative: sampling

FF-Replan’s static determinizations don’t respect probabilities.

We need “probabilistic and dynamic determinization”.

Sample future outcomes and determinize in hindsight.

Each future sample becomes a known-future deterministic problem.

Page 10: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Probabilistic Planning (goal-oriented)

[Figure: tree of states and actions rooted at the initial state I, over Time 1 and Time 2, with actions A1 and A2, probabilistic outcomes, goal states and dead ends. Objective: maximize goal achievement. Left outcomes are more likely.]

Page 11: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Start Sampling

Note: sampling will reveal which action is better at state I, A1 or A2.

Page 12: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Hindsight Sample 1

[Figure: the goal-oriented planning tree rooted at I with one sampled future fixed. Left outcomes are more likely. Tally after this sample: A1: 1, A2: 0.]

Page 13: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Hindsight Sample 2

[Figure: the goal-oriented planning tree rooted at I with a second sampled future fixed. Left outcomes are more likely. Tally after this sample: A1: 2, A2: 1.]

Page 14: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Hindsight Sample 3

[Figure: the goal-oriented planning tree rooted at I with a third sampled future fixed. Left outcomes are more likely. Tally after this sample: A1: 2, A2: 1.]

Page 15: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Hindsight Sample 4

[Figure: the goal-oriented planning tree rooted at I with a fourth sampled future fixed. Left outcomes are more likely. Tally after this sample: A1: 3, A2: 1.]

Page 16: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Summary of the Idea: The Decision Process (Estimating the Q-Value, Q(s,a))

1. For each action A, draw future samples (S: current state, A(S) → S’)

2. Solve the deterministic problems (each sample is a deterministic planning problem)

3. Aggregate the solutions for each action (the solution length is used for goal-oriented problems, giving Q(s,A))

4. Select the action with the best aggregation (max_A Q(s,A))
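The four steps can be sketched as a small Python program. Everything concrete here is an assumption made for illustration: the toy transition table, the state names, the breadth-first search standing in for FF, and the use of the fraction of solved futures as the aggregate (the slide mentions solution length for goal-oriented problems).

```python
import random
from collections import deque

# Hypothetical toy domain: (state, action) -> list of (probability, next_state).
TRANSITIONS = {
    ("I", "A1"): [(0.8, "near"), (0.2, "dead")],
    ("I", "A2"): [(0.3, "near"), (0.7, "dead")],
    ("near", "A1"): [(0.9, "goal"), (0.1, "dead")],
    ("near", "A2"): [(0.5, "goal"), (0.5, "dead")],
}
ACTIONS = ("A1", "A2")
GOAL = "goal"


class Future:
    """One sampled future: a lazily filled map (state, action, time) -> next state."""

    def __init__(self, rng):
        self.rng = rng
        self.table = {}

    def outcome(self, state, action, time):
        key = (state, action, time)
        if key not in self.table:
            outcomes = TRANSITIONS.get((state, action))
            if outcomes is None:
                self.table[key] = state  # absorbing state: nothing changes
            else:
                weights = [p for p, _ in outcomes]
                states = [s for _, s in outcomes]
                self.table[key] = self.rng.choices(states, weights=weights)[0]
        return self.table[key]


def solvable(state, start_time, horizon, future):
    """Breadth-first search over (state, time); a stand-in for the deterministic planner."""
    frontier = deque([(state, start_time)])
    seen = {(state, start_time)}
    while frontier:
        s, t = frontier.popleft()
        if s == GOAL:
            return True
        if t >= horizon:
            continue
        for a in ACTIONS:
            nxt = (future.outcome(s, a, t), t + 1)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False


def hindsight_action(state, horizon, samples, rng):
    scores = {}
    for a in ACTIONS:                      # step 1: sample futures per action
        solved = 0
        for _ in range(samples):
            future = Future(rng)
            successor = future.outcome(state, a, 0)
            solved += solvable(successor, 1, horizon, future)  # step 2
        scores[a] = solved / samples       # step 3: aggregate
    best = max(scores.values())            # step 4: best action, random ties
    return rng.choice([a for a in ACTIONS if scores[a] == best]), scores


action, scores = hindsight_action("I", horizon=2, samples=500, rng=random.Random(0))
print(action, scores)
```

In this toy domain A1 is more likely to lead towards the goal, so its estimate comes out clearly higher than that of A2.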

Page 17: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Mathematical Summary of the Algorithm

• H-horizon future $F^H$ for $M = [S, A, T, R]$
– Mapping of state, action and time ($h < H$) to a state
– $S \times A \times h \to S$

• Value of a policy $\pi$ for $F^H$: $R(s, F^H, \pi)$

• $V_{HS}(s,H) = E_{F^H}[\max_\pi R(s, F^H, \pi)]$

• Compare this with the real value
• $V^*(s,H) = \max_\pi E_{F^H}[R(s, F^H, \pi)]$
• $V_{FFRa}(s) = \max_F V(s,F) \ge V_{HS}(s,H) \ge V^*(s,H)$
• $Q(s,a,H) = R(a) + E_{F^{H-1}}[\max_\pi R(a(s), F^{H-1}, \pi)]$
– In our proposal, the computation of $\max_\pi R(s, F^{H-1}, \pi)$ is approximately done by FF [Hoffmann and Nebel ’01]
– Each future is a deterministic problem
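The ordering of the three values can be checked by exact enumeration on a tiny case. The sketch below assumes a hypothetical horizon-1 problem in which a policy is just the choice of one action, each action reaches the goal independently, and the success probabilities are made-up numbers.

```python
from itertools import product

# Hypothetical one-step problem: probability that each action reaches the goal.
P_SUCCESS = {"A1": 0.6, "A2": 0.5}

# Enumerate every future: one success/failure outcome per action.
futures = []
for bits in product([True, False], repeat=len(P_SUCCESS)):
    outcome = dict(zip(P_SUCCESS, bits))
    prob = 1.0
    for a, ok in outcome.items():
        prob *= P_SUCCESS[a] if ok else 1.0 - P_SUCCESS[a]
    futures.append((prob, outcome))

# V*: commit to one action first, then take the expectation over futures.
v_star = max(sum(p for p, f in futures if f[a]) for a in P_SUCCESS)
# V_HS: look at the future first, then pick the best action for it.
v_hs = sum(p for p, f in futures if any(f.values()))
# V_FFRa: value of the single most favourable future.
v_ffra = max(1.0 if any(f.values()) else 0.0 for _, f in futures)

print(v_ffra, v_hs, v_star)
```

Here the hindsight value is optimistic relative to the true value because it may choose a different action for each future, and the all-outcome value is more optimistic still because it only considers the best future.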

Page 18: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Key Technical Results

The importance of independent sampling of states, actions and time

The necessity of random tie breaking in decision making

Theorem 1: When there is a policy that can achieve the goal with probability 1 within the horizon, the hindsight decision-making algorithm will reach the goal with probability 1.

Theorem 2: The number of samples needed is polynomial in the horizon, the number of actions and the minimum Q-value advantage.

We characterize FF-Replan in terms of hindsight decision making: $V_{FFRa}(s) = \max_F V(s,F)$

Page 19: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Empirical Results

| Problem | FFRa | FF-Hindsight |
|---|---|---|
| Blocksworld | 270 | 158 |
| Boxworld | 150 | 100 |
| Fileworld | 29 | 14 |
| R-Tireworld | 30 | 30 |
| ZenoTravel | 30 | 0 |
| Exploding BW | 5 | 28 |
| G-Tireworld | 7 | 18 |
| Tower of Hanoi | 11 | 17 |

IPPC-04 problems. Numbers are solved trials.

For ZenoTravel, using importance sampling improved the number of solved trials to 26.

Page 20: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Empirical Results

| Planner | Climber | River | Bus-Fare | Tire1 | Tire2 | Tire3 | Tire4 | Tire5 | Tire6 |
|---|---|---|---|---|---|---|---|---|---|
| FFRa | 60% | 65% | 1% | 50% | 0% | 0% | 0% | 0% | 0% |
| Paragraph | 100% | 65% | 100% | 100% | 100% | 100% | 3% | 1% | 0% |
| FPG | 100% | 65% | 22% | 100% | 92% | 60% | 35% | 19% | 13% |
| FF-HS | 100% | 65% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |

These domains were developed just to beat FF-Replan. Unsurprisingly, FF-Replan did not do well.

But FF-Hindsight did very well, showing probabilistic reasoning ability while achieving scalability.

Page 21: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Conclusion

[Figure: two groups of research areas linked by determinization. Deterministic planning (scalability): classic planning, machine learning for planning, net benefit optimization, temporal planning. Probabilistic planning (scalability): Markov decision processes, machine learning for MDP, temporal MDP.]

Page 22: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Conclusion

• Devised an algorithm that takes advantage of the significant advances in deterministic planning in the context of probabilistic planning

• Made many deterministic planning techniques available to probabilistic planning
– Most learning-for-planning techniques were developed solely for deterministic planning
  • Now these techniques are relevant to probabilistic planning too
– Advanced net-benefit-style planners can be used for reward-maximization-style probabilistic planning problems

Page 23: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Discussion

• Mercier and Van Hentenryck analyzed the difference between
– $V^*(s,H) = \max_\pi E_{F^H}[R(s, F^H, \pi)]$
– $V_{HS}(s,H) = E_{F^H}[\max_\pi R(s, F^H, \pi)]$

• Ng and Jordan analyzed the difference between
– $V^*(s,H) = \max_\pi E_{F^H}[R(s, F^H, \pi)]$
– $\hat{V}(s,H) = \max_\pi \frac{1}{m} \sum R(s, F^H, \pi)$, where the sum runs over the sampled futures and $m$ is the number of samples

Page 24: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

IPPC-2004 Results

| Domain | NMRC | J1 | Classy | NMR | mGPT | C | FFRS | FFRA |
|---|---|---|---|---|---|---|---|---|
| BW | 252 | 270 | 255 | 30 | 120 | 30 | 210 | 270 |
| Box | 134 | 150 | 100 | 0 | 30 | 0 | 150 | 150 |
| File | - | - | - | 3 | 30 | 3 | 14 | 29 |
| Zeno | - | - | - | 30 | 30 | 30 | 0 | 30 |
| Tire-r | - | - | - | 30 | 30 | 30 | 30 | 30 |
| Tire-g | - | - | - | 9 | 16 | 30 | 7 | 7 |
| TOH | - | - | - | 15 | 0 | 0 | 0 | 11 |
| Exploding | - | - | - | 0 | 0 | 0 | 3 | 5 |

Numbers: successful runs

Column-group labels on the slide: Human Control Knowledge, Learned Knowledge, 2nd Place Winners

Winner of IPPC-04: FFRs

NMR: Non-Markovian Reward Decision Process Planner

Classy: Approximate Policy Iteration with a Policy Language Bias

mGPT: Heuristic Search Probabilistic Planning

C: Symbolic Heuristic Search

Page 25: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

IPPC-2006 Results

| Domain | FFRA | FPG | FOALP | sfDP | Paragraph | FFRS |
|---|---|---|---|---|---|---|
| BW | 86 | 63 | 100 | 29 | 0 | 77 |
| Zenotravel | 100 | 27 | 0 | 7 | 7 | 7 |
| Random | 100 | 65 | 0 | 0 | 5 | 73 |
| Elevator | 93 | 76 | 100 | 0 | 0 | 93 |
| Exploding | 52 | 43 | 24 | 31 | 31 | 52 |
| Drive | 71 | 56 | 0 | 0 | 9 | 0 |
| Schedule | 51 | 54 | 0 | 0 | 1 | 0 |
| PitchCatch | 54 | 23 | 0 | 0 | 0 | 0 |
| Tire | 82 | 75 | 82 | 0 | 91 | 69 |

Numbers: percentage of successful runs

Unofficial winner of IPPC-06: FFRa

FPG: Factored Policy Gradient Planner

FOALP: First Order Approximate Linear Programming

sfDP: Symbolic Stochastic Focused Dynamic Programming with Decision Diagrams

Paragraph: A Graphplan Based Probabilistic Planner

Page 27: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Sampling Problem: Time dependency issue

[Figure: state graph with Start, S1, S2, S3, Goal and Dead End. Actions A and B leave Start. Action C has two outcomes, with probabilities p and 1-p; action D has two outcomes, with probabilities 1-p and p.]

Page 28: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Sampling Problem: Time dependency issue

[Figure: state graph with Start, S1, S2, S3, Goal and Dead End; actions A and B leave Start.]

S3 is a worse state than S1, but it looks as if there is always a path to the goal.

Need to sample independently across actions.
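One way to see why coupled sampling is misleading is the sketch below. It rests on an assumed reading of the figure: action C reaches the goal with probability p, action D reaches it with probability 1-p, and a coupled sampler reuses a single random draw for both actions. The function names and the value of p are hypothetical.

```python
import random


def goal_reachable(p, rng, independent):
    """Return True if, in one sampled future, action C or action D reaches the goal."""
    u_c = rng.random()
    # A shared draw couples the two actions; an independent draw does not.
    u_d = rng.random() if independent else u_c
    c_reaches_goal = u_c < p     # C succeeds with probability p
    d_reaches_goal = u_d >= p    # D succeeds with probability 1 - p
    return c_reaches_goal or d_reaches_goal


def estimate(p, samples, independent, seed=0):
    """Fraction of sampled futures in which the goal is reachable."""
    rng = random.Random(seed)
    hits = sum(goal_reachable(p, rng, independent) for _ in range(samples))
    return hits / samples


shared = estimate(0.5, 10_000, independent=False)
indep = estimate(0.5, 10_000, independent=True)
print(shared, indep)
```

With the shared draw exactly one of the two actions succeeds in every future, so the state looks perfectly safe. With independent draws both actions fail together with probability p(1-p), which the estimate then reflects.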

Page 29: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Action Selection Problem: Random tie breaking is essential

[Figure: states Start, S1 and Goal. Action A always stays in Start. Action B has two outcomes, with probabilities p and 1-p. Action C has two outcomes, with probabilities p and 1-p.]

In the Start state, action C is definitely better, but in hindsight A can be used to wait until C’s transition to the goal is realized.
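Random tie breaking itself is a small piece of code. In the sketch below the Q-value estimates are made-up numbers chosen so that the waiting action A ties with C, and exact equality is used to detect ties, which is a simplification.

```python
import random


def select_action(q_values, rng):
    """Pick uniformly at random among the actions with the maximal estimated Q-value."""
    best = max(q_values.values())
    tied = [a for a, q in q_values.items() if q == best]
    return rng.choice(tied)


rng = random.Random(0)
q = {"A": 1.0, "B": 0.4, "C": 1.0}  # hypothetical estimates: A and C tie
picks = {select_action(q, rng) for _ in range(100)}
print(sorted(picks))
```

A fixed tie-breaking order could select the waiting action A at every step and never make progress, whereas a random choice among the tied actions eventually selects C.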

Page 30: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Sampling Problem: Importance Sampling (IS)

[Figure: states Start, S1 and Goal. Action B has one outcome with extremely low probability and another with very high probability.]

- Sampling uniformly would find the problem unsolvable.
- Use importance sampling.
- Identifying the region that needs importance sampling is left for further study.
- In the benchmark, Zenotravel needs the IS idea.
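A generic importance sampling estimator for a rare but necessary outcome can be sketched as follows. This is not the scheme used in the experiments, which the slides do not describe; the function name, the proposal probability q and all numbers are assumptions for illustration.

```python
import random


def importance_estimate(p, q, horizon, samples, seed=0):
    """Estimate P(at least one success in `horizon` tries) when each try
    succeeds with small probability p, by sampling tries with probability q
    instead and reweighting each trajectory by its likelihood ratio."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        weight = 1.0
        reached = False
        for _ in range(horizon):
            success = rng.random() < q  # draw from the proposal, not from p
            weight *= (p / q) if success else (1 - p) / (1 - q)
            reached = reached or success
        if reached:
            total += weight
    return total / samples


p, horizon = 1e-4, 10
truth = 1 - (1 - p) ** horizon
estimate = importance_estimate(p, q=0.1, horizon=horizon, samples=20_000)
print(truth, estimate)
```

Sampling directly with probability p would almost never produce a future in which the goal is reachable, while the reweighted proposal sees such futures often and still yields an unbiased estimate.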

Page 31: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Theoretical Results

• Theorem 1
– For goal-achieving probabilistic planning problems, if there is a policy that can solve the problem with probability 1 within a bounded horizon, then hindsight planning solves the problem with probability 1. If there is no such policy, hindsight planning returns a success ratio of less than 1.
– If there is a future in which no plan can achieve the goal, that future can be sampled

• Theorem 2
– The number of future samples $w$ needed to correctly identify the best action:
– $w > 4\Delta^{-2} T \ln(|A|H/\delta)$
– $\Delta$: the minimum Q-advantage of the best action over the other actions; $\delta$: confidence parameter
– From the Chernoff bound
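To get a feel for the size of the bound, it can be evaluated numerically. The sketch below reads the bound as w > 4 Δ^-2 T ln(|A|H/δ) and treats T as a plain multiplicative factor, which is an assumption because the slide does not define T. The function name and the example values are hypothetical.

```python
import math


def min_samples(delta_q, t, num_actions, horizon, confidence):
    """Smallest integer w with w > 4 * delta_q**-2 * t * ln(num_actions * horizon / confidence).

    delta_q is the minimum Q-advantage, confidence is the parameter delta,
    and t is the factor T taken as written on the slide (meaning assumed).
    """
    bound = 4.0 * delta_q ** -2 * t * math.log(num_actions * horizon / confidence)
    return math.floor(bound) + 1


w = min_samples(delta_q=0.2, t=1, num_actions=2, horizon=10, confidence=0.05)
print(w)
```

The dependence on the Q-advantage is quadratic, so halving the advantage roughly quadruples the number of samples, while the horizon and the number of actions only enter through the logarithm.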

Page 32: Probabilistic Planning via Determinization in Hindsight FF-Hindsight

Probabilistic Planning: Expecti-max solution

[Figure: the planning tree over Time 1 and Time 2 evaluated by expecti-max. Max nodes at the states, where an action is chosen, alternate with Exp (expectation) nodes at the probabilistic outcomes. Objective: maximize goal achievement.]
