Dan’s Multi-Option Talk

Uploaded by tan on 22-Feb-2016

TRANSCRIPT

Page 1: Dan’s Multi-Option Talk

© 2007 SRI International

Dan’s Multi-Option Talk
• Option 1: HUMIDRIDE: Dan’s Trip to the East Coast
  – Whining: High
  – Duration: Med
  – Viruses: Low
• Option 2: T-Cell: Attacking Dan’s Cold Virus
  – Whining: Med
  – Duration: Low
  – Viruses: High
• Option 3: Model-Lite Planning: Diverse Multi-Option Plans and Dynamic Objectives
  – Whining: Low
  – Duration: High
  – Viruses: Low

Page 2: Dan’s Multi-Option Talk

Model-Lite Planning: Diverse Multi-Option Plans and Dynamic Objectives
Daniel Bryce, William Cushing, Subbarao Kambhampati

Page 3: Dan’s Multi-Option Talk

Questions
• When must the plan executor decide on their planning objective?
  – Before synthesis? Traditional model
  – Before execution? Similar to the information retrieval (IR) model: select a plan from a set of diverse but relevant plans
  – During execution? Multi-Option Plans (subsumes the previous two)
  – At all? “Keep your options open”
• Can the executor change their planning objective without replanning?
• Can the executor start acting without committing to an objective?

Page 4: Dan’s Multi-Option Talk

Overview
• Diverse Multi-Option Plans
  – Diversity
  – Representation
  – Connection to Conditional Plans
  – Execution
• Synthesizing Multi-Option Plans
  – Example
  – Speed-ups
• Analysis
  – Synthesis
  – Execution
• Conclusion

Page 5: Dan’s Multi-Option Talk

Diverse Multi-Option Plans
• Each plan step presents several diverse choices
  – Option 1: Train(MP, SFO), Fly(SFO, BOS), Car(BOS, Prov.)
  – Option 1a: Train(MP, SFO), Fly(SFO, BOS), Fly(BOS, PVD), Cab(PVD, Prov.)
  – Option 2: Shuttle(MP, SFO), Fly(SFO, BOS), Car(BOS, Prov.)
  – Option 2a: Shuttle(MP, SFO), Fly(SFO, BOS), Fly(BOS, PVD), Cab(PVD, Prov.)
• Diversity relies on Pareto optimality
  – Each option is non-dominated
  – Diversity comes from a Pareto front with high spread

[Figure: the options O1, O1a, O2, O2a plotted in the Duration vs. Cost plane along a front labeled “Diversity”, next to the multi-option plan graph: Train(MP, SFO) or Shuttle(MP, SFO), then Fly(SFO, BOS), then either Car(BOS, Prov.) or Fly(BOS, PVD) followed by Cab(PVD, Prov.)]
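To make “non-dominated” concrete, here is a minimal Python sketch that reduces a set of options to its Pareto front, assuming both objectives (cost and duration) are minimized. The slides give no numbers, so the (cost, duration) values and the extra option O3 below are hypothetical, chosen so that the four options from the slide are mutually non-dominated.

```python
from typing import Dict, List, Tuple

# Hypothetical (cost, duration) values; the slides give no numbers.
# "O3" is an invented extra option that the filter should discard.
OPTIONS: Dict[str, Tuple[float, float]] = {
    "O1": (300.0, 9.0),
    "O1a": (450.0, 8.0),
    "O2": (250.0, 10.0),
    "O2a": (380.0, 8.5),
    "O3": (500.0, 9.5),
}


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """True if a is no worse than b on every objective and strictly better on one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(options: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Names of the options that no other option dominates."""
    return [
        name
        for name, value in options.items()
        if not any(dominates(other, value) for other in options.values())
    ]


print(pareto_front(OPTIONS))  # ['O1', 'O1a', 'O2', 'O2a']
```

O3 is dropped because O1 is both cheaper and faster. Spread, the other half of the diversity criterion, would then be measured over the surviving points; the slide does not say which spread measure is used.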

Page 6: Dan’s Multi-Option Talk

Dynamic Objectives
• Multi-Option Plans are a type of Conditional Plan
  – Conditional on the user’s objective function
  – Allow the objective function to change
  – Ensure that, whatever the objective function, the executor will have non-dominated options

[Figure: the same multi-option plan graph, with its four paths labeled O1, O1a, O2, O2a]
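Because the plan stores every non-dominated option, a change of objective function only means re-selecting among stored options, with no replanning. The Python sketch below assumes the objective is a weighted sum of (cost, duration); that is one possible choice, not something the slides specify, and the values are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical (cost, duration) values for the four non-dominated options.
FRONT: Dict[str, Tuple[float, float]] = {
    "O1": (300.0, 9.0),
    "O1a": (450.0, 8.0),
    "O2": (250.0, 10.0),
    "O2a": (380.0, 8.5),
}


def best_option(front: Dict[str, Tuple[float, float]], weights: Tuple[float, float]) -> str:
    """Pick the option minimizing a weighted sum of its objectives (first one wins ties)."""
    return min(front, key=lambda name: sum(w * v for w, v in zip(weights, front[name])))


# The plan stays the same; only the objective function changes.
print(best_option(FRONT, (1.0, 0.0)))    # cost only: O2
print(best_option(FRONT, (0.0, 1.0)))    # duration only: O1a
print(best_option(FRONT, (1.0, 100.0)))  # a trade-off: O1
```

A weighted sum only ever selects options on the convex hull of the front: with these values no non-negative weighting picks O2a, even though O2a is non-dominated. Keeping the whole front avoids assuming that the executor’s objective is linear.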

Page 7: Dan’s Multi-Option Talk

Executing Multi-Option Plans

[Figure: the multi-option plan graph beside its options plotted in the Duration vs. Cost plane; later panels show only O1 and O1a]
• Local action choice corresponds to multiple options
• Option values change at each step
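The execution side can be sketched by treating each option as an action sequence: executing an action discards the options that disagree with it, and each candidate next action keeps a particular subset of options open. This Python sketch covers only the deterministic travel example; the function names are hypothetical, and it does not model how option values change after each step.

```python
from typing import Dict, List

# The four options from the travel example, as action sequences.
OPTIONS: Dict[str, List[str]] = {
    "O1": ["Train(MP, SFO)", "Fly(SFO, BOS)", "Car(BOS, Prov.)"],
    "O1a": ["Train(MP, SFO)", "Fly(SFO, BOS)", "Fly(BOS, PVD)", "Cab(PVD, Prov.)"],
    "O2": ["Shuttle(MP, SFO)", "Fly(SFO, BOS)", "Car(BOS, Prov.)"],
    "O2a": ["Shuttle(MP, SFO)", "Fly(SFO, BOS)", "Fly(BOS, PVD)", "Cab(PVD, Prov.)"],
}


def remaining_options(options: Dict[str, List[str]], executed: List[str]) -> Dict[str, List[str]]:
    """Options whose action sequence starts with the actions executed so far."""
    n = len(executed)
    return {name: acts for name, acts in options.items() if acts[:n] == executed}


def options_per_action(options: Dict[str, List[str]], executed: List[str]) -> Dict[str, List[str]]:
    """Map each possible next action to the options it keeps open."""
    n = len(executed)
    choices: Dict[str, List[str]] = {}
    for name, acts in remaining_options(options, executed).items():
        if len(acts) > n:
            choices.setdefault(acts[n], []).append(name)
    return choices


print(options_per_action(OPTIONS, []))
# {'Train(MP, SFO)': ['O1', 'O1a'], 'Shuttle(MP, SFO)': ['O2', 'O2a']}
print(options_per_action(OPTIONS, ["Train(MP, SFO)", "Fly(SFO, BOS)"]))
# {'Car(BOS, Prov.)': ['O1'], 'Fly(BOS, PVD)': ['O1a']}
```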

Page 8: Dan’s Multi-Option Talk

Multi-Option Conditional Probabilistic Planning
• (PO)MDP setting: (Belief) State Space Search
  – Stochastic Actions, Observations, Uncertain Initial State, Loops
  – Two Objectives: Expected Plan Cost, Probability of Plan Success
    ♦ Traditional reward functions are a linear combination of the two, which assumes the objective function is known in advance
• Extend LAO* to multiple objectives (Multi-Option LAO*)
  – Each generated (belief) state has an associated Pareto set of “best” sub-plans
  – Dynamic programming (state backup) combines the successor states’ Pareto sets
    ♦ Yes, it’s exponential time per backup per state
    ♦ There are approximations
  – Basic Algorithm: while no good plan has been found
    ♦ ExpandPlan
    ♦ RevisePlan
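A minimal Python sketch of one multi-objective state backup, assuming each sub-plan is summarized by a point (expected cost C, Pr(G)), each action has a fixed cost, and each action’s outcomes are given as (probability, successor state) pairs. The names and data layout are hypothetical, not the Multi-Option LAO* implementation; belief states, loops, and heuristic estimates are left out. Choosing one sub-plan per outcome through a Cartesian product is where the exponential cost per backup comes from.

```python
from itertools import product
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (expected cost, probability of reaching the goal)
Action = Tuple[float, List[Tuple[float, str]]]  # (action cost, [(outcome probability, successor)])


def dominates(a: Point, b: Point) -> bool:
    """a dominates b: no higher cost, no lower Pr(G), and strictly better on one of them."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])


def prune(points: List[Point]) -> List[Point]:
    """Keep only non-dominated points (duplicates collapsed)."""
    unique = sorted(set(points))
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


def backup(actions: Dict[str, Action], pareto: Dict[str, List[Point]]) -> List[Point]:
    """Combine the successors' Pareto sets into a Pareto set for the current state."""
    candidates: List[Point] = []
    for cost, outcomes in actions.values():
        # One sub-plan per outcome: the product grows exponentially with the number of outcomes.
        for choice in product(*(pareto[s] for _, s in outcomes)):
            exp_cost = cost + sum(p * pt[0] for (p, _), pt in zip(outcomes, choice))
            pr_goal = sum(p * pt[1] for (p, _), pt in zip(outcomes, choice))
            candidates.append((exp_cost, pr_goal))
    return prune(candidates)


# Hypothetical usage: one unit-cost action with two outcomes; each successor
# holds the null plan (0, 0) and one longer sub-plan.
succ = {"s1": [(0.0, 0.0), (1.0, 0.5)], "s2": [(0.0, 0.0), (1.0, 0.7)]}
print(backup({"a1": (1.0, [(0.2, "s1"), (0.8, "s2")])}, succ))
```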

Page 9: Dan’s Multi-Option Talk

Example of State Backup

Page 10: Dan’s Multi-Option Talk

Search Example – Initially

[Figure: the root node with its Pareto set plotted in the C (cost) vs. Pr(G) plane]

Initialize the root’s Pareto set with the null plan and a heuristic estimate.

Page 11: Dan’s Multi-Option Talk

Search Example – 1st Expansion

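[Figure: the root expanded through actions a1 and a2 (outcome probabilities 0.2 and 0.8); every node carries a Pareto set plotted in the C vs. Pr(G) plane]

Expand the root node and initialize the children’s Pareto sets with the null plan and a heuristic estimate.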

Page 12: Dan’s Multi-Option Talk

Search Example – 1st Revision

[Figure: the same search tree, with the root’s Pareto set now pointing through a1]

Recompute the root’s Pareto set; the best heuristic point is through a1.

Page 13: Dan’s Multi-Option Talk

Search Example – 2nd Expansion

[Figure: a1’s children expanded through a3 and a4, annotated with goal probabilities 0.5 and 0.7]

Expand the children of a1 and initialize their Pareto sets with the null plan and a heuristic estimate. Both children satisfy the goal with non-zero probability.

Page 14: Dan’s Multi-Option Talk

Search Example – 2nd Revision

[Figure: revised Pareto sets; the expanded nodes list a3 and a4, and the root lists a1,[a4|a3]]

Recompute the Pareto sets of both expanded nodes and of the root node. There is a feasible plan a1,[a4|a3] that satisfies the goal with probability 0.66 and cost 2. The heuristic estimate indicates that extending a1,[a4|a3] could lead to a plan that satisfies the goal with probability 1.0.
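The 0.66 and the cost of 2 can be checked by hand, assuming (from the figure labels; the slide does not state it) that a1’s outcome with probability 0.2 is followed by a3, its outcome with probability 0.8 is followed by a4, every action has unit cost, and “cost” means expected plan cost:

```latex
\Pr(G) = 0.2 \cdot 0.5 + 0.8 \cdot 0.7 = 0.10 + 0.56 = 0.66
\qquad
\mathbb{E}[C] = c(a_1) + 0.2\,c(a_3) + 0.8\,c(a_4) = 1 + 0.2 + 0.8 = 2
```

The opposite pairing would give 0.2 · 0.7 + 0.8 · 0.5 = 0.54, which does not match the slide.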

Page 15: Dan’s Multi-Option Talk

Search Example – 3rd Expansion

[Figure: the plan through a4 extended with a7 (annotated 0.9); a3 has no successor]

Expand the plan to include a7. There is no applicable action after a3.

Page 16: Dan’s Multi-Option Talk

Search Example – 3rd Revision

[Figure: revised Pareto sets; a4’s node now lists a4, a7, and the root lists a2, a1,[a4, a7|a3], and a1,[a4|a3]]

Recompute all Pareto sets that are ancestors of expanded nodes. The heuristic for plans extended through a3 is higher because no action is applicable after a3. The best heuristic point at the root now lies on plans extended through a2.

Page 17: Dan’s Multi-Option Talk

Search Example – 4th Expansion

[Figure: a2’s children expanded through a5 and a6, annotated with goal probabilities 0.1 and 0.0]

Expand the plan through a2; one of the expanded children satisfies the goal with probability 0.1.

Page 18: Dan’s Multi-Option Talk

Search Example – 4th Revision

[Figure: revised Pareto sets, with a2, a5 and a2, a6 now appearing alongside the plans through a1]

Recompute the Pareto sets of the expanded nodes’ ancestors. Plan a2, a5 is dominated at the root.

Page 19: Dan’s Multi-Option Talk

Search Example – 5th Expansion

[Figure: the node after a6 expanded through a8, annotated 0.6]

Expand the plan through a6.

Page 20: Dan’s Multi-Option Talk

Search Example – 5th Revision

[Figure: revised Pareto sets after adding a8, including a6, a8 and a2, a6, a8]

Recompute the Pareto sets. Plans a2, a6, a8 and a2, a5 are dominated at the root.

Page 21: Dan’s Multi-Option Talk

Search Example – Final

[Figure: the final search graph with every node’s Pareto set; the root holds a1,[a4, a7|a3] and a1,[a4|a3]]

Page 22: Dan’s Multi-Option Talk

Speed-ups
• ε-domination [Papadimitriou & Yannakakis, 2003]
• Randomized Node Expansions
  – Simulate the partial plan to expand a single node
• Reachability Heuristics
  – Use the McLUG (CSSAG)

Page 23: Dan’s Multi-Option Talk

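ε-domination

[Figure: points in the Cost vs. 1 − Pr(G) plane, divided into hyper-rectangles whose boundaries x and x′ satisfy x′/x = 1 + ε; each point is marked dominated or non-dominated]

• Multiply each objective by (1 + ε)
• Check domination
• Each hyper-rectangle has a single point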
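A minimal Python sketch of the ε-domination check, assuming both objectives are minimized (cost and 1 − Pr(G), as on the slide’s axes) and that a point a ε-dominates b when a_i ≤ (1 + ε)·b_i for every objective i. Instead of the slide’s grid of hyper-rectangles, it swaps in a simpler greedy filter: every discarded point is ε-dominated by some kept point, but without the grid’s one-point-per-hyper-rectangle size guarantee. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (cost, 1 - Pr(G)); both minimized


def eps_dominates(a: Point, b: Point, eps: float) -> bool:
    """True if a is within a factor (1 + eps) of b on every objective."""
    return all(x <= (1.0 + eps) * y for x, y in zip(a, b))


def eps_pareto(points: List[Point], eps: float) -> List[Point]:
    """Greedy epsilon-approximate Pareto set: keep a point only if no kept point eps-dominates it."""
    kept: List[Point] = []
    for p in sorted(points):  # lexicographic order: cheapest first
        if not any(eps_dominates(q, p, eps) for q in kept):
            kept.append(p)
    return kept


points = [(1.0, 0.5), (1.05, 0.49), (2.0, 0.1)]
print(eps_pareto(points, 0.0))  # [(1.0, 0.5), (1.05, 0.49), (2.0, 0.1)]
print(eps_pareto(points, 0.1))  # [(1.0, 0.5), (2.0, 0.1)]
```

With ε = 0 the filter returns the exact Pareto set; a larger ε merges nearly equivalent points, which is what shrinks the sets combined at each backup.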

Page 24: Dan’s Multi-Option Talk

Synthesis Results

Page 25: Dan’s Multi-Option Talk

Execution Results
• Random Option: sample an option, execute its action
• Keep Options Open
  – Most Options: execute the action shared by the most options
  – Diverse Options: execute the action shared by the most diverse set of options
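Two of these execution strategies can be sketched directly, assuming each remaining option is stored as the list of actions it still has to execute (hypothetical names, Python). Diverse Options is omitted because the slide does not say how the diversity of a set of options is measured.

```python
import random
from collections import Counter
from typing import Dict, List

# Remaining options, each given as the list of actions still to execute.
OPTIONS: Dict[str, List[str]] = {
    "O1": ["Train(MP, SFO)", "Fly(SFO, BOS)", "Car(BOS, Prov.)"],
    "O1a": ["Train(MP, SFO)", "Fly(SFO, BOS)", "Fly(BOS, PVD)", "Cab(PVD, Prov.)"],
    "O2": ["Shuttle(MP, SFO)", "Fly(SFO, BOS)", "Car(BOS, Prov.)"],
    "O2a": ["Shuttle(MP, SFO)", "Fly(SFO, BOS)", "Fly(BOS, PVD)", "Cab(PVD, Prov.)"],
}


def random_option_action(options: Dict[str, List[str]], rng: random.Random) -> str:
    """Random Option: sample one remaining option and return its next action."""
    name = rng.choice(sorted(options))
    return options[name][0]


def most_options_action(options: Dict[str, List[str]]) -> str:
    """Most Options: return the next action shared by the most options (ties go to the first seen)."""
    counts = Counter(acts[0] for acts in options.values())
    return counts.most_common(1)[0][0]


print(random_option_action(OPTIONS, random.Random(0)))
# A hypothetical abstract case where one action keeps two of three options open:
print(most_options_action({"A": ["x", "y"], "B": ["x", "z"], "C": ["w"]}))  # x
```

Most Options keeps the largest number of options alive after each step, which is one way to “keep your options open”; Random Option commits to a single option’s action without regard to how many others it preserves.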

Page 26: Dan’s Multi-Option Talk

Summary & Future Work
• Summary
  – Multi-Option Plans let the executor delay or change commitments to objective functions
  – Multi-Option Plans help the executor understand alternatives
  – Multi-Option Plans passively enforce diversity through Pareto set approximation
• Future Work
  – Synthesis
    ♦ Proactive Diversity: guide search to broaden the Pareto set
    ♦ Speedups: alternative Pareto set representations, standard MDP tricks
  – Execution
    ♦ Option Lookahead: how will the set of options change?
    ♦ Meta-Objectives: diversity, decision delay
  – Model-Lite Planning
    ♦ Unspecified objectives (not just an unspecified objective function)
    ♦ Objective function preference elicitation

Page 27: Dan’s Multi-Option Talk

Final Options
• Option 1: Questions
• Option 2: Criticisms
• Option 3: Next Talk!