
Lookahead pathology in real-time pathfinding

Mitja Luštrek, Jožef Stefan Institute, Department of Intelligent Systems

Vadim Bulitko, University of Alberta, Department of Computer Science

Introduction Problem Explanation Remedy

Real-time single-agent heuristic search

Task: find a path from a start state to a goal state

Complete search:
plan the whole path to the goal state
execute the plan
example: A* [Hart et al. 68]
good: given an admissible heuristic, the path is optimal
bad: the delay before the first move can be large

Real-time single-agent heuristic search

Incomplete search:
plan a part of the path to the goal
execute the plan
repeat
example: LRTA* [Korf 90], LRTS [Bulitko & Lee 06]
good: the delay before the first move is small, the amount of planning per move is bounded
bad: the path is typically not optimal

Why do we need it?

Picture a real-time strategy game: the user commands dozens of units to move towards a distant goal.
Complete search would have to compute the whole path for each of them.
Incomplete search computes just the first couple of steps.

Heuristic lookahead search

The agent searches a lookahead area of depth d around the current state.
For every frontier state of the area: g is the true shortest distance from the current state, h is the estimated shortest distance to the goal, and f = g + h.
The agent moves towards the frontier state with the lowest f (f_opt).
The heuristic of the current state is updated: h = f_opt.
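A minimal Python sketch of one such planning step follows; it is in the spirit of LRTA*/LRTS rather than the authors' exact implementation, and the neighbors() function and the initial heuristic h0 are illustrative assumptions supplied by the caller.

```python
import heapq

def lookahead_step(current, goal, depth, h, neighbors, h0):
    """One planning step: search the lookahead area of the given depth
    around `current`, update h(current) to f_opt and return the frontier
    state with the lowest f to move towards."""
    g = {current: 0.0}          # shortest distance found within the lookahead area
    moves = {current: 0}
    frontier = {}               # frontier state -> f = g + h
    heap = [(0.0, current)]
    while heap:
        gs, s = heapq.heappop(heap)
        if gs > g[s]:
            continue            # stale heap entry
        if moves[s] == depth or s == goal:
            frontier[s] = gs + h.get(s, h0(s))   # f = g + h on the frontier
            continue
        for n, cost in neighbors(s):
            if gs + cost < g.get(n, float('inf')):
                g[n] = gs + cost
                moves[n] = moves[s] + 1
                heapq.heappush(heap, (g[n], n))
    best = min(frontier, key=frontier.get)       # frontier state with lowest f
    h[current] = frontier[best]                  # learning: h(current) = f_opt
    return best
```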

Lookahead pathology

It is generally believed that larger lookahead depths produce better solutions.
Solution-length pathology: larger lookahead depths produce worse solutions.

Lookahead depth   Solution length
1                 11
2                 10
3                 8
4                 10
5                 7
6                 8
7                 7

Degree of pathology = 2 (the solution length increases twice as the depth grows)

Lookahead pathology

Pathology on states that do not form a path.
Error pathology: larger lookahead depths produce more suboptimal decisions.

Multiple states:
Depth   Error
1       0.31
2       0.25
3       0.21
4       0.24
5       0.18
6       0.23
7       0.12
Degree of pathology = 2

One state:
Depth   Decision
1       suboptimal
2       suboptimal
3       optimal
4       optimal
5       optimal
6       suboptimal
7       suboptimal
There is pathology
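As a concrete reading of "degree of pathology", consistent with both tables above (though the formal definition is not spelled out here), a short sketch: count how many times the measured quantity increases as the lookahead depth grows by one.

```python
def degree_of_pathology(values_by_depth):
    """Number of times the measure (solution length or error) increases
    when the lookahead depth grows by one."""
    return sum(1 for a, b in zip(values_by_depth, values_by_depth[1:]) if b > a)

# The two tables above:
print(degree_of_pathology([11, 10, 8, 10, 7, 8, 7]))                    # 2
print(degree_of_pathology([0.31, 0.25, 0.21, 0.24, 0.18, 0.23, 0.12]))  # 2
```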

Related: minimax pathology

Minimax backs up heuristic values from the leaves of the game tree to the root.
Research attempts to explain why backed-up heuristic values are better than static values.
Theoretical analyses show that they are worse – pathology [Nau 79, Beal 80].
Explanations:
similarity of nearby positions in real games
realistic modeling of error
...
The focus is on why the pathology does not appear in practice.

Related: pathology in single-agent search

Discovered on synthetic search trees [Bulitko et al. 03].
Observed in the eight puzzle [Bulitko 03]:
appears with different evaluation functions
the benefit from knowing the optimal lookahead depth was shown to be large
Explained on synthetic search trees [Luštrek 05]:
caused by certain properties of the trees
caused by inconsistent and inadmissible heuristics
Unexplored in pathfinding.

Introduction Problem Explanation Remedy

Our setting

HOG – Hierarchical Open Graph [Sturtevant et al.]
Maps from commercial computer games (Baldur's Gate, Warcraft III)
Initial heuristic: octile distance (true distance assuming an empty map)
1,000 problems (map, start state, goal state)
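For reference, a minimal sketch of the octile distance on an 8-connected grid; the coordinate representation is illustrative, and this would play the role of the initial heuristic h0 in the earlier lookahead sketch.

```python
import math

def octile(a, b):
    """Octile distance between grid cells a = (x1, y1) and b = (x2, y2):
    the true shortest distance assuming an empty map with diagonal moves."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return max(dx, dy) + (math.sqrt(2) - 1) * min(dx, dy)
```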

On-policy experiments

The agent follows a path from the start state to the goal state, updating the heuristic along the way.
Solution length and error over the whole path are computed for each lookahead depth -> pathology.

(figure: example on-policy paths for lookahead depths d = 1, 2, 3)

Off-policy experiments

The agent spawns in a number of states.
It takes one move towards the goal state.
The heuristic is not updated.
Error is computed from these first moves -> pathology.

(figure: first moves from several spawn states for lookahead depths d = 1, 2, 3)

Basic on-policy experiment

A lot of pathology – over 60%!

First explanation: a lot of states are intrinsically pathological (off-policy mode)

Not true: only 3.9% are.
If the topology of the maps is not at fault, perhaps the algorithm is to blame?

Degree of pathology   0     1     2     3     4     ≥ 5
Length (problems %)   38.1  12.8  18.2  16.1  9.5   5.3
Error (problems %)    38.5  15.1  20.3  17.0  7.6   1.5

Off-policy experiment on 188 states

The previous comparison is not fair:
on-policy: pathology from the error over a number of states
off-policy: pathologicalness of single states
Fair: off-policy error over the same number of states as on-policy – 188 states (chosen randomly).
Only the error can be used – there is no solution length off-policy.

Not much less pathology than on-policy: 42.2% vs. 61.5%

Degree of pathology   0     1     2     3     ≥ 4
Problems %            57.8  31.4  9.4   1.4   0.0

Tolerance

The first off-policy experiment showed little pathology, the second one quite a lot.
Perhaps off-policy pathology is caused by minor differences in error – noise.
Introduce a tolerance t:
an increase in error counts towards the pathology only if error(d1) > t ∙ error(d2)
t is set so that the pathology in the off-policy experiment on 188 states is < 5%: t = 1.09
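Continuing the earlier degree-of-pathology sketch, the tolerance would change the comparison so that only increases beyond the factor t count; the names here are illustrative.

```python
def degree_of_pathology_tolerant(errors_by_depth, t=1.09):
    """Count an increase in error only if the error at the larger depth
    exceeds t times the error at the smaller depth."""
    return sum(1 for e_small, e_large in zip(errors_by_depth, errors_by_depth[1:])
               if e_large > t * e_small)
```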

Experiments with t = 1.09

On-policy changes little vs. t = 1: 57.7% vs. 61.9%.
Apparently on-policy pathology is more severe than off-policy – investigate why!
The above experiments are the basic on-policy experiment and the basic off-policy experiment.

Degree of pathology    0     1     2     3     4     ≥ 5
On-policy (prob. %)    42.3  19.7  21.2  12.9  3.6   0.3
Off-policy (prob. %)   95.7  3.7   0.6   0.0   0.0   0.0

Introduction Problem Explanation Remedy

Hypothesis 1

LRTS tends to visit pathological states with an above-average frequency.
Test: compute the pathology from the states visited on-policy instead of 188 random states.

Degree of pathology   0     1     2     3     ≥ 4
Problems %            93.6  5.3   0.9   0.2   0.0

More pathology than in random states: 6.3% vs. 4.3%.
Much less pathology than basic on-policy: 6.3% vs. 57.7%.
Hypothesis 1 is correct, but it is not the main reason for on-policy pathology.

Is learning the culprit?

There is learning (updating the heuristic) on-policy, but not off-policy.
Learning is necessary on-policy, otherwise the agent gets caught in infinite loops.
Test: traverse the paths in the normal on-policy manner, but measure the error without learning.

Degree of pathology   0     1     2     3     4     ≥ 5
Problems %            79.8  14.2  4.5   1.2   0.3   0.0

Less pathology than basic on-policy: 20.2% vs. 57.7%.
Still more pathology than basic off-policy: 20.2% vs. 4.3%.
Learning is a reason, although not the only one.

Hypothesis 2

Larger fraction of updated states at smaller lookahead depths.
(figure: updated states within the current lookahead area)

Hypothesis 2

Smaller lookahead depths benefit more from learning.
This makes their decisions better than the mere depth suggests.
Thus they are closer to larger depths.
If they are closer to larger depths, cases where a larger depth happens to be worse than a smaller depth are more common.
Test: equalize the depths by learning as much as possible in the whole lookahead area – uniform learning (a sketch follows below).

Uniform learning (animation): the search and update steps alternate, with the heuristic updated across the whole lookahead area rather than in the current state only.
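The slides do not spell out the uniform-learning update rule. One plausible reading of "learning as much as possible in the whole lookahead area", which also preserves consistency as noted later, is to raise the heuristic of every state in the area to its cheapest way out through the frontier; the sketch below assumes exactly that, with dist_in_area() and frontier as illustrative names, and is not necessarily the authors' exact rule.

```python
def uniform_learning_update(area_states, frontier, h, dist_in_area):
    """Assumed uniform-learning update: raise h(s) for every state s in the
    lookahead area to min over frontier states b of dist_in_area(s, b) + h(b),
    never lowering an existing value."""
    for s in area_states:
        best_exit = min(dist_in_area(s, b) + h[b] for b in frontier)
        h[s] = max(h[s], best_exit)
```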

Pathology with uniform learning

Even more pathology than basic on-policy: 59.1% vs. 57.7%. Is Hypothesis 2 wrong?
Let us look at the volume of heuristic updates encountered per state generated during the search.
This seems to be the best measure of the benefit of learning.

Degree of pathology   0     1     2     3     4     ≥ 5
Problems %            40.9  20.2  22.1  12.3  4.2   0.3

(chart: volume of updates encountered per state generated vs. lookahead depth 1–10, for basic on-policy, on-policy with uniform learning, and basic off-policy)

Hypothesis 2 is correct after all

Consistency

The initial heuristic is consistent: the difference in heuristic value between two states does not exceed the actual shortest distance between them.
Updates make it inconsistent.
Research on synthetic trees showed that inconsistency causes pathology [Luštrek 05].
Uniform learning preserves consistency, yet it is more pathological than regular learning.
Consistency is therefore not a problem in our case.
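A minimal sketch of the consistency condition quoted above, written as the usual edge-wise check over neighboring states (which implies the pairwise condition along shortest paths); all names are illustrative.

```python
def is_consistent(states, h, neighbors, edge_cost):
    """A heuristic h is consistent if |h(a) - h(b)| never exceeds the true
    shortest distance between a and b; checking every edge suffices."""
    for a in states:
        for b in neighbors(a):
            if abs(h(a) - h(b)) > edge_cost(a, b):
                return False
    return True
```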

Hypothesis 3

On-policy: one search every d moves, so fewer searches at larger depths.
Off-policy: one search every move.
The difference between the depths in the amount of search is smaller on-policy than off-policy.
This makes the depths closer on-policy.
If they are closer, cases where a larger depth happens to be worse than a smaller depth are more common.
Test: search every move on-policy.

Pathology when searching every move

Less pathology than basic on-policy: 13.1% vs. 57.7%.
Still more pathology than basic off-policy: 13.1% vs. 4.3%.
Hypothesis 3 is correct; the remaining pathology is due to Hypotheses 1 and 2.
Further test: the number of states generated per move.

Degree of pathology   0     1     2     3     4     ≥ 5
Problems %            86.9  9.0   3.3   0.6   0.2   0.0

(chart: states generated per move vs. lookahead depth 1–10, for basic on-policy, on-policy searching every move, and basic off-policy)

Hypothesis 3 confirmed again

Summary of explanation

On-policy pathology is caused by different lookahead depths being closer to each other in terms of the quality of decisions than the mere depths would suggest:
due to the volume of heuristic updates encountered per state generated
due to the number of states generated per move
In addition, LRTS tends to visit pathological states with an above-average frequency.

Introduction Problem Explanation Remedy

Is a remedy worth looking for?

Averaged over 1,000 problems:

Depth   Length   States/move
1       175.4    7.8
2       226.4    29.0
3       226.6    50.4
4       225.3    69.7
5       227.4    87.0
6       221.0    102.2
7       209.3    115.0
8       199.6    126.4
9       200.4    137.2
10      187.0    146.3

Optimal lookahead depth selected for each problem:
solution length = 107.9
states generated / move = 73.6
The answer is yes – solution length improved by 38.5%.

What can we do?

Take a single map – house + garden.
Precompute the optimal depth for every start state.

Optimal depth per start state

Averaged for house + garden:

Depth   Length   States/move
1       253.2    7.8
2       346.3    29.4
3       329.1    50.4
4       337.0    69.3
5       358.9    85.7
6       318.8    101.2
7       283.6    116.2
8       261.5    126.7
9       282.6    133.2
10      261.1    142.7

Optimal lookahead depth selected for each start state:
solution length: 132.4
states generated / move: 59.3
Similar to the 1,000 problems – the map is representative.

Optimal depth per start state

Optimal depth per move

In a current state s, we can select the lookahead depth that would be optimal if we were starting in s.
This might not be optimal because of the learning prior to reaching s, which would not have happened if we had started in s.
House + garden:
solution length even smaller than when adapting per start state: 113.3 vs. 132.4
fewer states generated / move: 34.0 vs. 59.3

Precomputation too expensive

House + garden has 8,743 states.
That means 7.6 ∙ 10^7 directed pairs of states – it could take months.
If we were to go that far, we should just store the optimal paths instead, at least in a static environment.

State abstraction

Clique abstraction [Sturtevant, Bulitko et al. 05]
Compute the optimal lookahead depth for the central ground-level state under each abstract state.
Use that depth in all ground-level states under the abstract state.
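A minimal sketch of this lookup scheme; all names (abstract_of, central_state, optimal_depth) are illustrative, and optimal_depth stands for the expensive per-start-state precomputation described above.

```python
def build_depth_table(abstract_states, central_state, optimal_depth):
    """Precompute one lookahead depth per abstract state: the depth that is
    optimal for the central ground-level state under that abstract state."""
    return {a: optimal_depth(central_state(a)) for a in abstract_states}

def depth_for(state, abstract_of, depth_table):
    """Per-move lookup: reuse the depth of the abstract state covering `state`."""
    return depth_table[abstract_of(state)]
```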

House + garden with abstraction

Abs. level   Abs. states   Length   States/move
0            8,743         113.3    34.0
1            2,463         124.6    38.3
2            783           129.2    39.2
3            296           133.4    40.9
4            129           154.0    51.2
5            58            169.3    50.5
6            26            189.2    45.1
7            12            235.7    55.5
8            4             253.2    7.8
9            1             253.2    7.8

Level 0 = no abstraction; the highest levels are equivalent to a fixed depth of 1.

Fixeddepth1

Abstraction level 5: 3,306 directed pairs of abstract states – 0.004% of the ground-level pairs.
Precomputed in a few hours, maybe even less.

Future work

Search on abstract states – even faster.
Problem: the correlation between the optimal lookahead depth at abstract levels and at the ground level.
Smarter selection of ground-level states to merge into abstract states.
Problem: how does the topology of maps affect the pathology?

Thank you.

Questions?
