Lookahead pathology in real-time pathfinding
Mitja Luštrek, Jožef Stefan Institute, Department of Intelligent Systems
Vadim Bulitko, University of Alberta, Department of Computer Science
Introduction
Problem
Explanation
Agent-centered search (LRTS)
Current state, goal state
Lookahead area
Lookahead depth d
Agent-centered search (LRTS)
Frontier state
True shortest distance g
Estimated shortest distance h
f = g + h
Agent-centered search (LRTS)
Frontier state with the lowest f (fopt)
h = fopt
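The search-update-move loop on these slides can be written down compactly. Below is a minimal Python sketch of one LRTS/LRTA*-style step, assuming unit edge costs; the names `neighbors` (adjacency dict) and `h` (heuristic dict) are illustrative, not from the slides:

```python
from collections import deque

def lrts_step(current, neighbors, h, d):
    """One agent-centered search step (LRTA*-style sketch, unit edge costs).
    Returns the chosen frontier state and the updated heuristic dict."""
    # Breadth-first search out to lookahead depth d; g[s] is the true
    # shortest distance from the current state inside the lookahead area.
    g = {current: 0}
    queue = deque([current])
    frontier = []
    while queue:
        s = queue.popleft()
        if g[s] == d:
            frontier.append(s)  # frontier = states at exactly depth d
            continue
        for n in neighbors[s]:
            if n not in g:
                g[n] = g[s] + 1
                queue.append(n)
    if not frontier:
        return current, h  # nothing reachable at depth d
    # Frontier state with the lowest f = g + h.
    f_opt_state = min(frontier, key=lambda s: g[s] + h[s])
    f_opt = g[f_opt_state] + h[f_opt_state]
    # Learning: raise the current state's heuristic to f_opt.
    h = {**h, current: max(h[current], f_opt)}
    return f_opt_state, h
```

On a 5-state corridor the agent at state 0 with d = 2 heads for state 2 and raises h(0) to f(2) = g(2) + h(2).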
Lookahead pathology
Generally believed that larger lookahead depths produce better solutions
Solution-length pathology: larger lookahead depths produce worse solutions
Lookahead depth | 1  | 2  | 3 | 4  | 5 | 6 | 7
Solution length | 11 | 10 | 8 | 10 | 7 | 8 | 7
Degree of pathology = 2
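The degree of pathology in the table above is the number of times the measure gets worse as the lookahead depth grows. A minimal sketch (the function name is mine):

```python
def degree_of_pathology(values_by_depth):
    """Number of times the measure (solution length or error) increases
    from one lookahead depth to the next; 0 means no pathology."""
    return sum(1 for a, b in zip(values_by_depth, values_by_depth[1:]) if b > a)
```

For the solution lengths above, `degree_of_pathology([11, 10, 8, 10, 7, 8, 7])` gives 2: length worsens at depths 4 and 6.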
Lookahead pathology
Pathology on states that do not form a path
Error pathology: larger lookahead depths produce more suboptimal decisions

Multiple states:
Depth | 1    | 2    | 3    | 4    | 5    | 6    | 7
Error | 0.31 | 0.25 | 0.21 | 0.24 | 0.18 | 0.23 | 0.12
Degree of pathology = 2

One state:
Depth    | 1          | 2          | 3       | 4       | 5       | 6          | 7
Decision | suboptimal | suboptimal | optimal | optimal | optimal | suboptimal | suboptimal
There is pathology
Introduction
Problem
Explanation
Our setting
HOG – Hierarchical Open Graph [Sturtevant et al.]
Maps from commercial computer games (Baldur’s Gate, Warcraft III)
Initial heuristic: octile distance (true distance assuming an empty map)
1,000 problems (map, start state, goal state)
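The octile initial heuristic mentioned above is the exact shortest distance on an empty 8-connected grid; a sketch:

```python
import math

def octile_distance(x1, y1, x2, y2):
    """True shortest distance on an empty 8-connected grid:
    straight moves cost 1, diagonal moves cost sqrt(2)."""
    dx, dy = abs(x1 - x2), abs(y1 - y2)
    # Take min(dx, dy) diagonal steps, then go straight for the remainder.
    return (dx + dy) + (math.sqrt(2) - 2) * min(dx, dy)
```

With walls on the map the true distance can only be larger, so the heuristic is admissible.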
On-policy experiments
The agent follows a path from the start state to the goal state, updating the heuristic along the way
Solution length and error over the whole path are computed for each lookahead depth -> pathology
[Figure: example on-policy paths for d = 1, 2, 3]
Off-policy experiments
The agent spawns in a number of states
It takes one move towards the goal state
Heuristic not updated
Error is computed from these first moves -> pathology
[Figure: example off-policy first moves for d = 1, 2, 3]
Basic on-policy experiment
A lot of pathology – over 60%!
First explanation: a lot of states are intrinsically pathological (off-policy mode)
Not true: only 3.9% are
If the topology of the maps is not at fault, perhaps the algorithm is to blame?

Degree of pathology | 0    | 1    | 2    | 3    | 4   | ≥ 5
Length (problems %) | 38.1 | 12.8 | 18.2 | 16.1 | 9.5 | 5.3
Error (problems %)  | 38.5 | 15.1 | 20.3 | 17.0 | 7.6 | 1.5
Off-policy experiment on 188 states
Not much less pathology than on-policy: 42.2% vs. 61.5%
Degree of pathology | 0    | 1    | 2   | 3   | ≥ 4
Problems %          | 57.8 | 31.4 | 9.4 | 1.4 | 0.0
Comparison not fair:
On-policy: pathology from error over a number of states
Off-policy: pathologicalness of single states
Fair: off-policy error over the same number of states as on-policy – 188 (chosen randomly)
Can use only error – no solution length off-policy
Tolerance
The first off-policy experiment showed little pathology, the second one quite a lot
Perhaps off-policy pathology is caused by minor differences in error – noise
Introduce tolerance t: an increase in error counts towards the pathology only if error(d1) > t ∙ error(d2)
Set t so that the pathology in the off-policy experiment on 188 states is < 5%: t = 1.09
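Folding the tolerance into the degree-of-pathology count could look like this; a sketch under the assumption that consecutive depths are compared (the slides do not say whether all depth pairs are considered):

```python
def degree_with_tolerance(errors_by_depth, t=1.09):
    """Degree of pathology with tolerance t: an increase from one depth to
    the next counts only if error(d+1) > t * error(d).  t = 1 recovers the
    plain count; t = 1.09 is the value calibrated on the 188-state
    off-policy experiment."""
    return sum(1 for a, b in zip(errors_by_depth, errors_by_depth[1:]) if b > t * a)
```

Raising t filters out small, noise-sized error increases, which is exactly what the calibration on the 188-state experiment is meant to achieve.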
Experiments with t = 1.09
On-policy changes little vs. t = 1: 57.7% vs. 61.9%
Apparently on-policy pathology is more severe than off-policy
Investigate why!
The above experiments are the basic on-policy experiment and the basic off-policy experiment

Degree of pathology  | 0    | 1    | 2    | 3    | 4   | ≥ 5
On-policy (prob. %)  | 42.3 | 19.7 | 21.2 | 12.9 | 3.6 | 0.3
Off-policy (prob. %) | 95.7 | 3.7  | 0.6  | 0.0  | 0.0 | 0.0
Introduction
Problem
Explanation
Hypothesis 1
LRTS tends to visit pathological states with an above-average frequency
Test: compute pathology from states visited on-policy instead of 188 random states
More pathology than in random states: 6.3% vs. 4.3%
Much less pathology than basic on-policy: 6.3% vs. 57.7%
Hypothesis 1 is correct, but it is not the main reason for on-policy pathology

Degree of pathology | 0    | 1   | 2   | 3   | ≥ 4
Problems %          | 93.6 | 5.3 | 0.9 | 0.2 | 0.0
Is learning the culprit?
There is learning (updating the heuristic) on-policy, but not off-policy
Learning is necessary on-policy, otherwise the agent gets caught in infinite loops
Test: traverse paths in the normal on-policy manner, measure error without learning
Less pathology than basic on-policy: 20.2% vs. 57.7%
Still more pathology than basic off-policy: 20.2% vs. 4.3%
Learning is a reason, although not the only one

Degree of pathology | 0    | 1    | 2   | 3   | 4   | ≥ 5
Problems %          | 79.8 | 14.2 | 4.5 | 1.2 | 0.3 | 0.0
Hypothesis 2
Larger fraction of updated states at smaller depths
[Figure: updated states within the current lookahead area]
Hypothesis 2
Smaller lookahead depths benefit more from learning
This makes their decisions better than the mere depth suggests
Thus they are closer to larger depths
If they are closer to larger depths, cases where a larger depth happens to be worse than a smaller depth are more common
Test: equalize depths by learning as much as possible in the whole lookahead area – uniform learning
Uniform learning
[Animation: alternating search and update steps applied over the whole lookahead area]
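My reading of uniform learning, sketched in Python: instead of raising only the current state's heuristic, every state in the lookahead area is raised to the cheapest cost of escaping through the frontier. Unit edge costs, the Dijkstra-from-the-frontier formulation, and all identifiers are assumptions, not from the slides:

```python
import heapq

def uniform_learning_update(area, frontier, neighbors, h):
    """Sketch of uniform learning: raise h for every state in the lookahead
    area to the cheapest cost of reaching the goal through the frontier
    (Dijkstra inward from the frontier, unit edge costs)."""
    best = {s: h[s] for s in frontier}          # cost-to-go seeds
    heap = [(h[s], s) for s in frontier]
    heapq.heapify(heap)
    while heap:
        cost, s = heapq.heappop(heap)
        if cost > best.get(s, float("inf")):
            continue                             # stale heap entry
        for n in neighbors[s]:
            if n in area and cost + 1 < best.get(n, float("inf")):
                best[n] = cost + 1
                heapq.heappush(heap, (cost + 1, n))
    # Heuristic values are only ever raised, never lowered.
    return {s: max(h[s], best.get(s, h[s])) for s in h}
```

On a corridor with area {0, 1, 2} and frontier {2}, every interior state's h rises to its distance to the frontier plus h(2).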
Pathology with uniform learning
Even more pathology than basic on-policy: 59.1% vs. 57.7%
Is Hypothesis 2 wrong?
Let us look at the volume of heuristic updates encountered per state generated during search
This seems to be the best measure of the benefit of learning

Degree of pathology | 0    | 1    | 2    | 3    | 4   | ≥ 5
Problems %          | 40.9 | 20.2 | 22.1 | 12.3 | 4.2 | 0.3
Volume of updates encountered
[Chart: update volume per generated state vs. lookahead depth (1–10); series: basic on-policy, on-policy with uniform learning, basic off-policy]
Hypothesis 2 is correct after all
Hypothesis 3
On-policy: one search every d moves, so fewer searches at larger depths
Off-policy: one search every move
The difference between depths in the amount of search is smaller on-policy than off-policy
This makes the depths closer on-policy
If they are closer, cases where a larger depth happens to be worse than a smaller depth are more common
Test: search every move on-policy
Pathology when searching every move
Less pathology than basic on-policy: 13.1% vs. 57.7%
Still more pathology than basic off-policy: 13.1% vs. 4.3%
Hypothesis 3 is correct; the remaining pathology is due to Hypotheses 1 and 2
Further test: number of states generated per move

Degree of pathology | 0    | 1   | 2   | 3   | 4   | ≥ 5
Problems %          | 86.9 | 9.0 | 3.3 | 0.6 | 0.2 | 0.0
States generated / move
[Chart: states generated per move vs. lookahead depth (1–10); series: basic on-policy, on-policy searching every move, basic off-policy]
Hypothesis 3 confirmed again
Summary of explanation
On-policy pathology is caused by different lookahead depths being closer to each other in terms of the quality of decisions than the mere depths would suggest:
- due to the volume of heuristic updates encountered per state generated
- due to the number of states generated per move
LRTS tends to visit pathological states with an above-average frequency
Thank you.
Questions?