Trade-off between Exploration and Exploitation in Satisficing Planning

Fan Xie

Outline

What is Satisficing Planning

Heuristic Search in Planning

Why do we need exploration?

Analysis of Arvand

Arvand-LTS: Arvand with Local MCTS

Experiments

AI Planning

Satisficing Planning

Deterministic environment

Only requires a solution, which may be sub-optimal

Domain Independent Planning

Implicit representation of the search space

Why not an explicit representation? It is impossible in most cases because the state space is huge.

Example:

An initial state: s0

A set of actions: A

A set of requirements of a goal state: G
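A minimal sketch of what an implicit representation can look like in code. The STRIPS-like encoding (preconditions, add and delete sets) and the toy facts `at-A` / `at-B` are assumptions for illustration; the slides only name s0, A, and G. Successor states are generated on demand rather than stored.

```python
from typing import FrozenSet, Iterator, List, NamedTuple, Tuple

State = FrozenSet[str]

class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]      # facts that must hold before the action applies
    add: FrozenSet[str]      # facts the action makes true
    delete: FrozenSet[str]   # facts the action makes false

def successors(state: State, actions: List[Action]) -> Iterator[Tuple[Action, State]]:
    """Generate successors on demand; the full state graph is never built."""
    for a in actions:
        if a.pre <= state:
            yield a, (state - a.delete) | a.add

def is_goal(state: State, goal: FrozenSet[str]) -> bool:
    return goal <= state

# Hypothetical toy task: move from location A to location B.
s0 = frozenset({"at-A"})
A = [
    Action("move-A-B", frozenset({"at-A"}), frozenset({"at-B"}), frozenset({"at-A"})),
    Action("move-B-A", frozenset({"at-B"}), frozenset({"at-A"}), frozenset({"at-B"})),
]
G = frozenset({"at-B"})

for action, nxt in successors(s0, A):
    print(action.name, sorted(nxt), is_goal(nxt, G))
```

Only the states actually reached are ever materialized, which is why this form scales to state spaces too large to enumerate.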

Some Background

What is a heuristic? Here, an estimate h(n) of how close node n is to the goal.

Greedy Best-First Search: when expanding node n, take each successor n' and place it on one list ordered by h(n').

Hill Climbing Search: check the neighbor nodes of the current node and select a node that has a lower h-value than the current node (if there are many, the lowest). Terminates when no neighbor node has a lower h-value.
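The two searches described above can be sketched as follows. The graph `GRAPH` and the heuristic table `H` are made-up toy data, chosen so that hill climbing stops early while greedy best-first search recovers through its open list.

```python
import heapq

# Hypothetical toy graph and heuristic values (h = 0 marks the goal).
GRAPH = {"S": ["A", "B"], "A": ["C"], "B": ["D"], "C": [], "D": ["G"], "G": []}
H = {"S": 3, "A": 1, "B": 2, "C": 1, "D": 1, "G": 0}

def greedy_best_first(start, goal):
    """Always expand the open node with the lowest h-value."""
    open_list = [(H[start], start)]
    parent = {start: None}
    while open_list:
        _, n = heapq.heappop(open_list)
        if n == goal:
            path = []
            while n is not None:
                path.append(n)
                n = parent[n]
            return path[::-1]
        for succ in GRAPH[n]:
            if succ not in parent:          # skip nodes already generated
                parent[succ] = n
                heapq.heappush(open_list, (H[succ], succ))
    return None

def hill_climbing(start):
    """Move to the lowest-h neighbor as long as it improves on the current node."""
    current, path = start, [start]
    while True:
        better = [n for n in GRAPH[current] if H[n] < H[current]]
        if not better:
            return path                     # no improving neighbor: stop
        current = min(better, key=H.get)
        path.append(current)

gbfs_path = greedy_best_first("S", "G")
hc_path = hill_climbing("S")
```

On this toy graph hill climbing commits to A and stops there because no neighbor of A improves on it, whereas greedy best-first search falls back to B from its open list and reaches G.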

Heuristic Search As Planning

FF Planner:

Hill climbing

FF heuristic: not admissible

Enforced hill climbing: more exploration in hill climbing to escape from local minima

LAMA Planner:

Greedy best-first search (WA*)

Mixed heuristic: FF + landmark
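A minimal sketch of enforced hill climbing as it is commonly described: from the current state, run a breadth-first search until a state with a strictly lower h-value is found, then jump to it. The toy graph, the heuristic table, and the convention that h = 0 marks the goal are assumptions for illustration, not FF's actual implementation.

```python
from collections import deque

# Hypothetical toy graph with a plateau at h = 2.
GRAPH = {"S": ["A"], "A": ["B", "C"], "B": ["D"], "C": [], "D": ["G"], "G": []}
H = {"S": 2, "A": 2, "B": 2, "C": 2, "D": 1, "G": 0}

def enforced_hill_climbing(start, successors, h):
    current, path = start, [start]
    while h(current) > 0:
        queue, seen = deque([(current, [])]), {current}
        found = None
        while queue and found is None:
            state, trail = queue.popleft()
            for nxt in successors(state):
                if nxt in seen:
                    continue
                seen.add(nxt)
                if h(nxt) < h(current):      # first strictly better state wins
                    found = trail + [nxt]
                    break
                queue.append((nxt, trail + [nxt]))
        if found is None:
            return None                      # nothing better is reachable
        path += found
        current = found[-1]
    return path

ehc_path = enforced_hill_climbing("S", GRAPH.__getitem__, H.__getitem__)
```

Plain hill climbing would stop at S here, since its only neighbor has the same h-value; the breadth-first phase is what crosses the plateau.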

Why do we need exploration?

Best-first search and hill climbing mostly do greedy exploitation.

Problem: Local Minima and Plateaus

Local Minima and Plateaus

Local minimum: a node whose h-value is the best in its neighborhood, although it is not a goal

Plateau: an area in which all nodes have the same h-value
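One simple local test for the two situations, given the h-value of the current node and of its neighbors. The function name and the three labels are hypothetical, and the test looks only one step ahead, so it is a rough illustration rather than a full plateau detector.

```python
def classify(h_current, h_neighbors):
    """Label a node by comparing its h-value with those of its neighbors."""
    if any(h < h_current for h in h_neighbors):
        return "improving neighbor exists"
    if any(h == h_current for h in h_neighbors):
        return "plateau"             # no better neighbor, at least one equal
    return "local minimum"           # every neighbor is strictly worse

print(classify(3, [4, 2]))   # greedy search can still make progress
print(classify(3, [3, 4]))   # flat region: h gives no direction
print(classify(3, [4, 5]))   # stuck unless the search explores
```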

More Exploration

Current algorithms or planners that directly address the trade-off between exploration and exploitation:

RRT (not for satisficing planning)

Identidem (stochastic hill climbing)

Diverse best-first search (not published yet)

Arvand (Monte-Carlo random walk)

Rapidly-Exploring Random Tree (RRT)

RRT gradually builds a tree in the search space until a path to the goal state is found. At each step the tree is either expanded towards the goal, which corresponds to exploitation, or towards a randomly selected point in the search space for exploration.
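The expansion step described above can be sketched for a 2D point robot. The obstacle-free 10 x 10 square and the parameters `goal_bias`, `step`, and `tol` are assumptions made for this illustration.

```python
import math
import random

def rrt(start, goal, goal_bias=0.2, step=0.5, tol=0.5, max_iters=5000, seed=0):
    """Grow a tree from start inside an obstacle-free 10 x 10 square (assumed)."""
    rng = random.Random(seed)
    parent = {start: None}                       # tree stored as child -> parent
    for _ in range(max_iters):
        if rng.random() < goal_bias:
            target = goal                        # exploitation: head for the goal
        else:
            target = (rng.uniform(0, 10), rng.uniform(0, 10))   # exploration
        nearest = min(parent, key=lambda p: math.dist(p, target))
        d = math.dist(nearest, target)
        if d == 0:
            continue
        scale = min(step, d) / d                 # move at most `step` toward target
        new = (nearest[0] + scale * (target[0] - nearest[0]),
               nearest[1] + scale * (target[1] - nearest[1]))
        parent.setdefault(new, nearest)
        if math.dist(new, goal) <= tol:
            path, node = [], new
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
    return None

path = rrt((1.0, 1.0), (9.0, 9.0))
```

Note that sampling a random target requires knowing the extent of the space in advance, which is easy in this geometric setting.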

RRT example

RRT

RRT requires a complete model of the environment to generate random points for exploration.

However, current planning domains mostly provide only an implicit representation of the search space, so:

Random points might be invalid (one possible workaround is to assume they are valid).

The distribution of random points is not uniform.

Identidem

Coles and Smith’s Identidem introduces exploration by stochastic local search (SLS).

Algorithm:

Local search

Action sequences are chosen probabilistically from the set of all possible actions in each state

Evaluates the FF heuristic after each action and immediately jumps to the first state that improves on the start state
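A rough sketch of the probing loop described above, not Identidem's actual implementation. The one-dimensional toy domain, the stand-in heuristic `h`, and the parameters `max_len` and `max_probes` are all assumptions.

```python
import random

def successors(x):
    return [x - 1, x + 1]                      # toy domain: positions on a line

def h(x):
    return 3 if 0 <= x <= 3 else abs(5 - x)    # plateau on 0..3, goal at x = 5

def sls_probe(start, max_len=10, max_probes=100, seed=0):
    """Sample action sequences and return the first state that beats the start."""
    rng = random.Random(seed)
    h_start = h(start)
    for _ in range(max_probes):
        state = start
        for _ in range(max_len):
            state = rng.choice(successors(state))   # action chosen at random
            if h(state) < h_start:                  # h evaluated after every action
                return state
    return None

improved = sls_probe(0)
```

Every intermediate state costs one heuristic evaluation here, which is the main cost of this style of probing.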

Diverse best-first search (DBFS)

Diversifies search directions by probabilistically selecting a node that does not have the best heuristic estimate (not published yet)

                     DBFS         GBFS         KBFS
# Solved (of 1612)   1451 (161)   1209 (403)   1288 (324)

Arvand

Exploration using random walks helps to overcome the problem of local minima and plateaus.

Jumping greedily exploits the knowledge gained by the random walks.

Difference from Identidem: only the end states of random walks are evaluated
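A simplified sketch of the random-walk-and-jump loop described above, not Arvand's actual code. The toy line domain, the stand-in heuristic, and the parameters `walks`, `length`, and `max_rounds` are assumptions; restarts and other refinements are left out.

```python
import random

def successors(x):
    return [x - 1, x + 1]            # toy domain: positions on a line

def h(x):
    return max(0, 10 - x) // 3       # toy heuristic with plateaus; 0 from x = 8 on

def random_walk(state, length, rng):
    for _ in range(length):
        state = rng.choice(successors(state))
    return state                     # intermediate states are never evaluated

def mrw(start, walks=20, length=8, max_rounds=200, seed=0):
    rng = random.Random(seed)
    current, h_cur, evaluations = start, h(start), 1
    for _ in range(max_rounds):
        if h_cur == 0:
            break
        ends = [random_walk(current, length, rng) for _ in range(walks)]
        scored = [(h(e), e) for e in ends]       # only end states are evaluated
        evaluations += len(scored)
        h_best, best = min(scored)
        if h_best < h_cur:                       # greedy jump to the best end state
            current, h_cur = best, h_best
    return current, h_cur, evaluations

state, h_value, evaluations = mrw(0)
```

Each round takes 160 random steps but only 20 heuristic evaluations, which is where the speed advantage over per-step evaluation comes from.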

Analysis of Arvand

Fast exploration:

Exploration using random walks

Evaluating only end states makes exploration faster (computing the heuristic value takes 90% of the time)

Greedy exploitation: jump to the best node obtained

Advantages of Arvand: escapes from local minima and plateaus, and does so quickly

Coverage of Arvand (current IPC problems are not hard enough)

                     Arvand       LAMA         FF           Fast Downward
# Solved (of 1782)   1641 (92%)   1581 (89%)   1389 (78%)   1374 (77%)
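The percentages follow directly from the solved counts and the total of 1782 problems. A short check, using only the numbers from the slide:

```python
TOTAL = 1782
solved = {"Arvand": 1641, "LAMA": 1581, "FF": 1389, "Fast Downward": 1374}

# Coverage = solved problems / total problems, rounded to a whole percent.
coverage = {planner: round(100 * n / TOTAL) for planner, n in solved.items()}
print(coverage)
```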

Still Some Problems

Problems:

Wastes a lot of knowledge

Sometimes there is a lot of duplication

Arvand-LTS: Arvand with Local MCTS

Motivation:

Can we use more of the knowledge we get from random walks?

Selectively grow a search tree while running random walks

Monte-Carlo Random Walk-based Local Tree Search (MRW-LTS)

Framework of MCTS

MRW-LTS

Every local search builds a local search tree.

Random walks must start from leaf nodes of the search tree.

Nodes in the tree store the minimum h-value obtained by random walks starting from their subtrees (not the node's own h-value).

A leaf node is selected by following an ε-greedy strategy at each node.
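A rough sketch of the four points above. The expansion rule (add all successors of the selected leaf, then walk from one of them), the toy line domain, and every parameter value are assumptions; the slides do not specify these details.

```python
import random

def successors(x):
    return [x - 1, x + 1]            # toy domain: positions on a line

def h(x):
    return max(0, 10 - x) // 3       # toy heuristic with plateaus

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent, self.children = state, parent, []
        self.min_h = float("inf")    # best h found by walks below this node

def select_leaf(root, epsilon, rng):
    node = root
    while node.children:
        if rng.random() < epsilon:
            node = rng.choice(node.children)                   # explore
        else:
            node = min(node.children, key=lambda c: c.min_h)   # exploit
    return node

def backup(node, value):
    while node is not None:          # propagate the minimum up to the root
        node.min_h = min(node.min_h, value)
        node = node.parent

def local_tree_search(start, iterations=200, walk_len=5, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    root = Node(start)
    best_state, best_h = start, h(start)
    for _ in range(iterations):
        leaf = select_leaf(root, epsilon, rng)
        leaf.children = [Node(s, leaf) for s in successors(leaf.state)]  # assumed rule
        origin = rng.choice(leaf.children) if leaf.children else leaf
        state = origin.state
        for _ in range(walk_len):
            state = rng.choice(successors(state))
        value = h(state)             # only the walk's end state is evaluated
        backup(origin, value)
        if value < best_h:
            best_state, best_h = state, value
    return best_state, best_h, root

best_state, best_h, root = local_tree_search(0)
```

Because each node keeps the minimum over its subtree, walk results that plain random-walk search would discard still steer later leaf selections.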

Some Changes

Experiments

1. IPC-2008

2. Big search spaces

Coverage on IPC-6

Domains       LAMA   Arvand   Arvand-LTS
Cyber         100%   100%     100%
Elevator      87%    100%     100%
Openstacks    100%   100%     100%
Parcprinter   77%    100%     100%
Pegsols       100%   100%     100%
Scanalyzer    100%   90%      90%
Transport     100%   100%     100%
Woodworking   100%   100%     100%
Total         96%    99%      99%

Coverage

Summary

1. Exploration is important in satisficing planning.

2. A good balance between exploration and exploitation can make a big difference!
