

Journal of Artificial Intelligence Research 39 (2010) 269 - 300 Submitted 04/10; published 09/10

Case-Based Subgoaling in Real-Time Heuristic Search

for Video Game Pathfinding

Vadim Bulitko BULITKO@UALBERTA.CA
Department of Computing Science, University of Alberta
Edmonton, Alberta, T6G 2E8, CANADA

Yngvi Björnsson YNGVI@RU.IS
School of Computer Science, Reykjavik University
Menntavegi 1, IS-101 Reykjavik, ICELAND

Ramon Lawrence RAMON.LAWRENCE@UBC.CA
Computer Science, University of British Columbia Okanagan
3333 University Way, Kelowna, British Columbia, V1V 1V7, CANADA

Abstract

Real-time heuristic search algorithms satisfy a constant bound on the amount of planning per action, independent of problem size. As a result, they scale up well as problems become larger. This property would make them well suited for video games where Artificial Intelligence controlled agents must react quickly to user commands and to other agents’ actions. On the downside, real-time search algorithms employ learning methods that frequently lead to poor solution quality and cause the agent to appear irrational by re-visiting the same problem states repeatedly. The situation changed recently with a new algorithm, D LRTA*, which attempted to eliminate learning by automatically selecting subgoals. D LRTA* is well poised for video games, except it has a complex and memory-demanding pre-computation phase during which it builds a database of subgoals. In this paper, we propose a simpler and more memory-efficient way of pre-computing subgoals thereby eliminating the main obstacle to applying state-of-the-art real-time search methods in video games. The new algorithm solves a number of randomly chosen problems off-line, compresses the solutions into a series of subgoals and stores them in a database. When presented with a novel problem on-line, it queries the database for the most similar previously solved case and uses its subgoals to solve the problem. In the domain of pathfinding on four large video game maps, the new algorithm delivers solutions eight times better while using 57 times less memory and requiring 14% less pre-computation time.

1. Introduction

Heuristic search is a core area of Artificial Intelligence (AI) research and its algorithms have been widely used in planning, game-playing and agent control. In this paper we are interested in real-time heuristic search algorithms that satisfy a constant upper bound on the amount of planning per action, independent of problem size. This property is important in a number of applications including autonomous robots and agents in video games. A common problem in video games is searching for a path between two locations. In most games, agents are expected to act quickly in response to the player’s commands and other agents’ actions. As a result, many game companies impose a constant time limit on the amount of path planning per move (e.g., one millisecond for all simultaneously moving agents).

©2010 AI Access Foundation. All rights reserved.


While in practice this time limit can be satisfied by limiting problem size a priori, a scientifically more interesting approach is to impose a per-move time limit regardless of the problem size. Doing so severely limits the range of applicable heuristic search algorithms. For instance, static search algorithms such as A* (Hart, Nilsson, & Raphael, 1968), IDA* (Korf, 1985) and PRA* (Sturtevant & Buro, 2005; Sturtevant, 2007), re-planning algorithms such as D* (Stentz, 1995), anytime algorithms such as ARA* (Likhachev, Gordon, & Thrun, 2004) and anytime re-planning algorithms such as AD* (Likhachev, Ferguson, Gordon, Stentz, & Thrun, 2005) cannot guarantee a constant bound on planning time per action. This is because all of them produce a complete, possibly abstract, solution before the first action can be taken. As the problem increases in size, their planning time will inevitably increase, exceeding any a priori finite upper bound.

Real-time search addresses the problem in a fundamentally different way. Instead of computing a complete, possibly abstract, solution before the first action is taken, real-time search algorithms compute (or plan) only the first few actions for the agent to take. This is usually done by conducting a lookahead search of a fixed depth (also known as “search horizon”, “search depth” or “lookahead depth”) around the agent’s current state and using a heuristic (i.e., an estimate of the remaining travel cost) to select the next few actions. The actions are then taken and the planning-execution cycle repeats (Korf, 1990). Since the goal state is not seen in most such local searches, the agent runs the risk of selecting suboptimal actions. To address this problem, real-time heuristic search algorithms update (or learn) their heuristic function over time.

The learning process has precluded real-time heuristic search agents from being widely deployed for pathfinding in video games. The problem is that such agents tend to “scrub” (i.e., repeatedly re-visit) the state space due to the need to fill in heuristic depressions (Ishida, 1992). As a result, solution quality can be quite low and, visually, the scrubbing behavior is perceived as irrational.

Since the seminal work on LRTA* (Korf, 1990), researchers have attempted to speed up the learning process. We briefly describe the efforts in the related work section. Here, we note that while the various approaches all brought about improvements, a breakthrough performance was achieved by virtually eliminating the learning process in D LRTA* (Bulitko, Lustrek, Schaeffer, Björnsson, & Sigmundarson, 2008). This was done by computing the heuristic with respect to a near-by subgoal as opposed to a distant goal. Offline, D LRTA* constructs a high-level graph of regions using state abstractions, calculates optimal paths between all region pairs, and then stores as subgoals the states where the paths cross region boundaries. During the online search, D LRTA* consults the database to find the next subgoal with respect to the current and goal regions. Since heuristic functions usually relax the problem (e.g., the Euclidean distance heuristic ignores obstacles on a map), they tend to be more accurate closer to a goal. As a result, a heuristic function with respect to a near-by goal tends to be more accurate and, therefore, requires less adjustment (i.e., learning). Consequently, the solution quality is improved and the scrubbing behavior is reduced.

In this paper, we adapt the idea of subgoaling and make the following four contributions. First, we simplify the pre-processing step of D LRTA*. Instead of using state abstraction to select subgoals, we employ a nearest-neighbour algorithm over a database of solved cases. Second, we introduce the idea of compressing a solution path into a series of subgoals so that each can be “easily” reached from the previous one. In doing so, we use hill-climbing as a proxy for the notion of “easy reachability by LRTA*”. Third, we employ kd-trees in order to access the case base effectively. Finally, we evaluate the new algorithm empirically in large-scale problem spaces.

The new algorithm is called k Nearest Neighbor LRTA* (or kNN LRTA*) and, for the rest of the paper, we set k = 1. This paper extends our previous conference publication (Bulitko & Björnsson, 2009) in the following ways. We store multiple goals per path to reduce the number of database accesses and we use kd-trees to speed up each database access. Additionally, we make optimizations to the on-line component of kNN LRTA*: evaluating only a small number of most similar database entries, interrupting LRTA* if it starts learning excessively and engaging start and end path optimizations. On the empirical evaluation side, we now use native multi-million state static video game maps and compare our algorithm to a newly published state-of-the-art non-learning real-time search algorithm (Björnsson, Bulitko, & Sturtevant, 2009).

The rest of the paper is organized as follows. In Sections 2 and 3 we formulate the problem of real-time heuristic search and show how the core LRTA* algorithm can be extended with subgoal selection. Section 4 analyzes related research. Section 5 provides intuition for the new algorithm, followed by details and pseudocode in Section 6. In Section 7 we give theoretical analysis and, in Section 8, empirically evaluate the algorithm in the domain of pathfinding. Section 9 summarizes the empirical results. We then conclude with a discussion of current shortcomings and future work.

2. Problem Formulation

We define a heuristic search problem as a directed graph containing a finite set of states (vertices) and weighted edges, with a single state designated as the goal state. At every time step, a search agent has a single current state, a vertex in the search graph, and takes an action (or makes a move) by traversing an out-edge of the current state. By traversing an edge between states s_1 and s_2 the agent changes its current state from s_1 to s_2. We say that a state is visited by the agent if and only if it was the agent’s current state at some point in time. As is usual in the field of real-time heuristic search, we assume that path planning happens between the moves (i.e., the agent does not think while traversing an edge). The “plan a move” - “travel an edge” loop continues until the agent arrives at its goal state, thereby solving the problem.

Each edge has a positive cost associated with it. The total cost of edges traversed by an agent from its start state until it arrives at the goal state is called the solution cost. We require algorithms to be complete (i.e., produce a path from start to goal in a finite amount of time if such a path exists). In order to guarantee completeness for real-time heuristic search we make the assumption of safe explorability of our search problems. Specifically, all costs are finite and for any states s_1, s_2, s_3, if there is a path between s_1 and s_2 and there is a path between s_1 and s_3 then there is also a path between s_2 and s_3.

Formally, all algorithms discussed in this paper are applicable to any such heuristic search problem. To keep the presentation focused and intuitive as well as to afford a large-scale empirical evaluation, we will use a particular type of heuristic search problem, pathfinding in grid worlds, for the rest of the paper. We discuss applicability of the new methods we suggest to other heuristic search and general planning problems in Section 11.

In video-game map settings, states are vacant square grid cells. Each cell is connected to four cardinally (i.e., west, north, east, south) and four diagonally neighboring cells. Outbound edges of a vertex are moves available in the corresponding cell and in the rest of the paper we will use the terms action and move interchangeably. The edge costs are defined as 1 for cardinal moves and 1.4 for diagonal moves.¹

1. We use 1.4 instead of the Euclidean √2 to avoid errors in floating point computations.

An agent plans its next action by considering states in a local search space surrounding its current position. A heuristic function (or simply heuristic) estimates the (remaining) travel cost between a state and the goal. It is used by the agent to rank available actions and select the most promising one. In this paper we consider only admissible and consistent heuristic functions which do not overestimate the actual remaining cost to the goal and whose difference in values for any two states does not exceed the cost of an optimal path between these states. In this paper we use octile distance (the minimum cumulative edge cost between two vertices ignoring map obstacles) as our heuristic. This heuristic is admissible and consistent and uses 1 and 1.4 as the edge costs. An agent can modify its heuristic function in any state to avoid getting stuck in local minima of the heuristic function, as well as to improve its action selection with experience.
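The octile distance described above can be sketched in a few lines of Python. This is a minimal illustration, assuming states are (x, y) integer grid coordinates; the function name and the coordinate representation are our assumptions rather than anything specified in the paper.

```python
def octile_distance(a, b, cardinal=1.0, diagonal=1.4):
    """Cheapest obstacle-free cost between grid cells a and b.

    Cells are (x, y) tuples (an assumed representation). The agent takes
    as many diagonal moves as possible and covers the rest cardinally.
    """
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return diagonal * min(dx, dy) + cardinal * abs(dx - dy)
```

For example, moving three columns over and one row up takes one diagonal move and two cardinal moves, for a cost of 1.4 + 2 = 3.4.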

The defining property of real-time heuristic search is that the amount of planning the agent does per action has an upper bound that does not depend on the total number of states in the problem space. Fast planning is preferred as it guarantees the agent’s quick reaction to a new goal specification. We measure mean planning time per action in terms of CPU time. We do not use the number of states expanded as a CPU-independent measure of time because the algorithms evaluated in this paper frequently perform time-consuming operations other than expanding states. Also note that while total planning time per problem is important for non-real-time search, it is irrelevant in video game pathfinding as we do not compute an entire path outright.

The second performance measure of our study is sub-optimality defined as the ratio of the solution cost found by the agent (c) to the minimum solution cost (c*) minus one and times 100%: $\left(\frac{c}{c^*} - 1\right) \cdot 100$. To illustrate, suboptimality of 0% indicates an optimal path and suboptimality of 50% indicates a path 1.5 times as costly as the optimal path.
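As a quick check of the formula, the following minimal Python sketch computes suboptimality from the two costs. The function name is illustrative only and does not come from the paper.

```python
def suboptimality_percent(solution_cost, optimal_cost):
    """Return (c / c* - 1) * 100 for solution cost c and optimal cost c*."""
    if optimal_cost <= 0:
        raise ValueError("optimal cost must be positive")
    return (solution_cost / optimal_cost - 1.0) * 100.0
```

A path of cost 15 against an optimal cost of 10 gives 50%, matching the illustration above.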

3. LRTA*: The Core Algorithm

The core of most real-time heuristic search algorithms is an algorithm called Learning Real-Time A* (LRTA*) (Korf, 1990). It is shown in Figure 1 and operates as follows. As long as the goal state s_{global goal} is not reached, the algorithm interleaves planning and execution in lines 4 through 7. In our generalized version we added a new step at line 3 for selecting the goal s_goal (the original algorithm uses s_{global goal} at all times). We will describe the details of subgoal selection later in the paper. In line 4, a cost-limited breadth-first search with duplicate detection is used to find frontier states with cost up to g_max away from the current state s. For each frontier state ŝ, its value is the sum of the cost of a shortest path from s to ŝ, denoted by g(s, ŝ), and the estimated cost of a shortest path from ŝ to s_goal (i.e., the heuristic value h(ŝ, s_goal)). The state that minimizes the sum is identified as s′ in line 5. Ties are broken in favour of higher g values. Remaining ties are broken in a fixed order. The heuristic value of the current state s is updated in line 6 (we keep separate heuristic tables for the different goals and we never decrease heuristics). Finally, we take one step towards the most promising frontier state s′ in line 7.

LRTA* is a special case of value iteration or real-time dynamic programming (Barto, Bradtke, & Singh, 1995) and has a problem that has prevented its use in video game pathfinding. Specifically, it updates a single heuristic value per move on the basis of heuristic values of near-by states. This means that when the initial heuristic values are overly optimistic (i.e., too low), LRTA* will frequently re-visit these states multiple times, each time making updates of a small magnitude. This behavior is known as “scrubbing”² and appears highly irrational to an observer.

2. The term was coined by Nathan Sturtevant.


LRTA*(s_start, s_{global goal}, g_max)

1 s ← s_start
2 while s ≠ s_{global goal} do
3     if no subgoal is selected or the current subgoal is reached then select a (new) subgoal s_goal
4     generate successor states of s up to g_max cost, generating a frontier
5     find a frontier state s′ with the lowest g(s, s′) + h(s′, s_goal)
6     update h(s, s_goal) to g(s, s′) + h(s′, s_goal)
7     change s one step towards s′
8 end while

Figure 1: LRTA* algorithm with dynamic subgoal selection.
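To make the loop in Figure 1 concrete, here is a minimal Python sketch under several simplifying assumptions that are ours, not the paper's: the map is a list of strings with "." marking vacant cells, the lookahead covers only the immediate successors (so the frontier is the set of neighbors), diagonal moves are allowed whenever the target cell is vacant, and subgoal selection (line 3) is omitted so that the goal is always the global goal. The `max_moves` guard is also an addition for safety.

```python
def octile(a, b):
    """Octile distance with edge costs 1 (cardinal) and 1.4 (diagonal)."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return 1.4 * min(dx, dy) + abs(dx - dy)


def successors(grid, s):
    """Yield (neighbor, edge_cost) for the vacant 8-connected neighbors of s."""
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            x, y = s[0] + dx, s[1] + dy
            inside = 0 <= y < len(grid) and 0 <= x < len(grid[y])
            if (dx or dy) and inside and grid[y][x] == ".":
                yield (x, y), (1.4 if dx and dy else 1.0)


def lrta_star(grid, start, goal, max_moves=10_000):
    """Return the list of visited states, or None if the agent gives up."""
    h = {}  # learned values; unseen states fall back to the octile distance

    def h_val(s):
        return h.get(s, octile(s, goal))

    s, path = start, [start]
    while s != goal:                      # line 2 of Figure 1
        if len(path) > max_moves:
            return None
        # Lines 4-5 with a one-move lookahead: minimize g + h, break ties
        # toward higher g, then by coordinates (a fixed order).
        candidates = [(g + h_val(n), -g, n) for n, g in successors(grid, s)]
        if not candidates:
            return None
        f, _, best = min(candidates)
        h[s] = max(h_val(s), f)           # line 6; heuristics never decrease
        s = best                          # line 7
        path.append(s)
    return path
```

On a map with a dead-end pocket between the start and the goal, the returned path typically re-visits cells near the pocket while the heuristic values there are raised, which is the scrubbing behavior discussed above.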

4. Related Research

Since the seminal work on LRTA* described in the previous section, researchers have attempted to speed up the learning process. Most of the resulting algorithms can be described by the following four attributes:

The local search space is the set of states whose heuristic values are accessed in the planning stage. The two common choices are full-width limited-depth lookahead (Korf, 1990; Shimbo & Ishida, 2003; Shue & Zamani, 1993; Shue, Li, & Zamani, 2001; Furcy & Koenig, 2000; Hernandez & Meseguer, 2005a, 2005b; Sigmundarson & Björnsson, 2006; Rayner, Davison, Bulitko, Anderson, & Lu, 2007) and A*-shaped lookahead (Koenig, 2004; Koenig & Likhachev, 2006). Additional choices are decision-theoretic based shaping (Russell & Wefald, 1991) and dynamic lookahead depth-selection (Bulitko, 2004; Lustrek & Bulitko, 2006). Finally, searching in a smaller, abstracted state space has been used as well (Bulitko, Sturtevant, Lu, & Yau, 2007).

The local learning space is the set of states whose heuristic values are updated. Common choices are: the current state only (Korf, 1990; Shimbo & Ishida, 2003; Shue & Zamani, 1993; Shue et al., 2001; Furcy & Koenig, 2000; Bulitko, 2004), all states within the local search space (Koenig, 2004; Koenig & Likhachev, 2006) and previously visited states and their neighbors (Hernandez & Meseguer, 2005a, 2005b; Sigmundarson & Björnsson, 2006; Rayner et al., 2007).

A learning rule is used to update the heuristic values of the states in the learning space. The common choices are mini-min (Korf, 1990; Shue & Zamani, 1993; Shue et al., 2001; Hernandez & Meseguer, 2005a, 2005b; Sigmundarson & Björnsson, 2006; Rayner et al., 2007), their weighted versions (Shimbo & Ishida, 2003), max of mins (Bulitko, 2004), modified Dijkstra’s algorithm (Koenig, 2004), and updates with respect to the shortest path from the current state to the best-looking state on the frontier of the local search space (Koenig & Likhachev, 2006). Additionally, several algorithms learn more than one heuristic function (Russell & Wefald, 1991; Furcy & Koenig, 2000; Shimbo & Ishida, 2003).

The control strategy decides on the move following the planning and learning phases. Commonly used strategies include: the first move of an optimal path to the most promising frontier state (Korf, 1990; Furcy & Koenig, 2000; Hernandez & Meseguer, 2005a, 2005b), the entire path (Bulitko, 2004), and backtracking moves (Shue & Zamani, 1993; Shue et al., 2001; Bulitko, 2004; Sigmundarson & Björnsson, 2006).

Given the multitude of proposed algorithms, unification efforts have been undertaken. In particular, Bulitko and Lee (2006) suggested a framework, called Learning Real Time Search (LRTS), to combine and extend LRTA* (Korf, 1990), weighted LRTA* (Shimbo & Ishida, 2003), SLA* (Shue & Zamani, 1993), SLA*T (Shue et al., 2001), and to a large extent, γ-Trap (Bulitko, 2004). In the dimensions described above, LRTS operates as follows. It uses a full-width fixed-depth local search space with transposition tables to prune duplicate states. LRTS uses a max of mins learning rule to update the heuristic value of the current state (its local learning space). The control strategy moves the agent to the most promising frontier state if the cumulative volume of heuristic function updates on a trial is under a user-specified quota or backtracks to its previous state otherwise.

While the approaches listed above all brought about various improvements, breakthrough performance came in the form of subgoaling. Since commonly used heuristics simplify the problem at hand (e.g., the octile distance in grid-world pathfinding ignores obstacles), using LRTA* with near-by subgoals effectively increases heuristic quality and thus reduces the amount of learning.

Although in general planning a goal is often represented as a conjunction of simple subgoals, to the best of our knowledge, the only real-time heuristic search algorithm to implement subgoaling is D LRTA* (Bulitko, Björnsson, Lustrek, Schaeffer, & Sigmundarson, 2007; Bulitko et al., 2008). In its pre-processing phase, D LRTA* uses the clique abstraction of Sturtevant and Buro (2005) to create a smaller search graph. The clique abstraction collapses a set of fully connected states into a single abstract state and can be applied iteratively to compute progressively smaller graphs. For example, a 2-level abstraction applies the clique abstraction to a graph that has already been abstracted once. Similarly, an a-level abstraction applies the clique abstraction a times. If we assume that each abstraction reduces the graph by a constant factor α, an a-level abstract graph would contain α^a times fewer states than the original graph. This abstraction technique in effect partitions the map into a number of regions, with each region corresponding to a single abstract state. Then for every pair of distinct abstract states, D LRTA* computes an optimal path between corresponding representative states (e.g., centroids of the regions) in the original non-abstracted space. The path is followed until it exits the region corresponding to the start abstract state. The entry state to the next region is recorded as the subgoal for the pair of abstract states. Once this pre-processing step is finished, D LRTA* runs LRTA* for a given problem but selects the subgoal recorded for the current and the goal regions. The off-line and on-line steps are illustrated in Figure 2.

The underlying intuition is that reaching the entry-to-the-next-region state requires LRTA* to navigate within a single region and is, therefore, “easy” with a default heuristic function. As a result, D LRTA* would rarely need to adjust its heuristic thereby virtually eliminating the costly learning process and the resulting scrubbing.

There are three key problems with D LRTA*. First, due to the fact that entry states (i.e., subgoals) have to be computed and stored for each pair of distinct regions, the number of regions has to be kept relatively small. In D LRTA* this is accomplished by applying the clique abstraction procedure multiple times so that the regions become progressively larger and fewer in number. A side effect is that regions will no longer be cliques and may, in fact, be quite complex in themselves. As a result, LRTA* may encounter heuristic depressions within a region (e.g., this would actually happen if LRTA* tries to go from S to E in the right diagram of Figure 2). Second, each state in the original space needs to be assigned to a region. Since the regions are irregular in shape, explicit membership records must be maintained. This may require as much additional memory as storing the original grid-based map. Third, clique abstraction is a non-trivial process and puts an extra programming burden on practitioners (e.g., game developers).

Figure 2: Example of D LRTA* operation. Left: off-line, the map is partitioned into seven regions (or abstract states). Each vacant cell is labeled with its region number. Center: off-line, an optimal path between centroids of two regions (C1 and C2) is computed and the entry state to the next region (E) is recorded as a subgoal for this pair of regions. Right: on-line, the agent intends to travel from S to G, it determines the corresponding regions and sets the pre-computed entry state E as its subgoal.

Another recent high-performance real-time search algorithm is Time-Bounded A* (TBA*) by Björnsson et al. (2009), a time-bounded variant of the classic A*. It expands states in an A* fashion using a closed list and an open list, away from the original start state, towards the goal until the goal state is expanded. However, unlike A*, which plans a complete path before committing to the first action, TBA* time-slices the planning by periodically interrupting its search to act. Initially, before a complete path to the goal is known, the agent takes an action that moves it towards the most promising state on the open list. If on a subsequent time slice an alternative most promising path is formed and the agent is not on that path, it backtracks its steps as necessary. This interleaving of planning, acting, and backtracking is done in such a way that both real-time behavior and completeness are ensured. The size of the time-slice is given as a parameter to the algorithm, measured as the number of states that may be expanded before the planning must be interrupted. Within a single time-slice, however, operations for both state expansions and backtracing the closed list (to form the path to the most promising state on the open list) must be performed. The cost of the latter type of operations is thus converted to state expansion equivalence (typically several backtracing steps can be performed at the same computational cost as a single state expansion). A key difference between TBA* and LRTA*-based algorithms is that TBA* retains its closed and open lists across planning steps. Thus, on each planning step it does not start planning from scratch, but continues with its open and closed lists from the previous planning step. Also, it does not need to update heuristics online to ensure completeness, nor does it require a precomputation phase. While the lack of precomputation is certainly its strong side, the negatives include high suboptimality if the amount of time per move is low and high on-line space complexity due to storing closed and open lists.

This research is related to work from the realm of non-real-time heuristic search where pattern databases are widely used to store pre-calculated distance information about abstractions of the original (ground) search space (Culberson & Schaeffer, 1998). A recent approach for using pre-calculated state-space information is to calculate true distances between selected state pairs and then use them whenever possible to make the distance estimates of the search guidance heuristic h more informative. Two such enhanced heuristics are the differential heuristic (Cazenave, 2006; Sturtevant, Felner, Barrer, Schaeffer, & Burch, 2009) and the canonical heuristic (Sturtevant et al., 2009). In the former case, a true distance d is pre-calculated from all states to a small subset of states S, so-called canonical states. During the on-line search the heuristic distance between any two arbitrary states a and b is calculated as the maximum of h(a, b) = |d(a, s) − d(b, s)| over all canonical states s ∈ S. In the latter case, for each state in the state space the true distance d to the closest canonical state is pre-calculated and stored and so is the true distance between all pairs of canonical states. During the search, the heuristic distance between any two states a and b is calculated as h(a, b) = d(C(a), C(b)) − d(a, C(a)) − d(b, C(b)) where C(s) returns the closest canonical state to s. These heuristics may return a lower distance estimate than an unmodified heuristic, so in practice one chooses the maximum of the two. An idea similar to the canonical heuristic was proposed earlier in a more specialized context, where the heuristic function was improved by pre-calculating true distances between several strategically chosen passageways in a game map (Björnsson & Halldorsson, 2006). These heuristics are not used in real-time search.
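The two formulas above can be illustrated with a short Python sketch. It assumes an undirected weighted graph given as a dictionary of adjacency dictionaries and uses Dijkstra's algorithm for the true distances; the function names and the graph representation are hypothetical. For brevity, the canonical heuristic here reuses the full distance tables, whereas the description above stores only each state's distance to its closest canonical state plus the pairwise canonical distances.

```python
import heapq


def dijkstra(graph, source):
    """True shortest-path distances from source to every reachable state."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def precompute(graph, canonical_states):
    """Off-line step: one distance table per canonical state."""
    return {c: dijkstra(graph, c) for c in canonical_states}


def differential_heuristic(tables, a, b):
    """max over canonical s of |d(a, s) - d(b, s)|."""
    return max(abs(d[a] - d[b]) for d in tables.values())


def canonical_heuristic(tables, a, b):
    """d(C(a), C(b)) - d(a, C(a)) - d(b, C(b)), C = closest canonical state."""
    ca = min(tables, key=lambda c: tables[c][a])
    cb = min(tables, key=lambda c: tables[c][b])
    return tables[ca][cb] - tables[ca][a] - tables[cb][b]


# A six-state corridor 0-1-2-3-4-5 with unit edge costs (made-up example).
graph = {i: {j: 1.0 for j in (i - 1, i + 1) if 0 <= j < 6} for i in range(6)}
tables = precompute(graph, canonical_states=[0, 5])
print(differential_heuristic(tables, 1, 4))  # 3.0
print(canonical_heuristic(tables, 1, 4))     # 3.0
```

In this corridor both estimates equal the true distance between states 1 and 4. On general graphs they are only lower bounds and, as noted above, would be combined with the unmodified heuristic by taking the maximum.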

There is a large volume of work on case-based planning (e.g., Nebel & Koehler, 1995). This includes path planning, where case-based approaches have been used to augment heuristic search for tasks such as route selection in road maps and mobile robot navigation. Such approaches typically pre-compute and store paths, as opposed to distances, between selected states, and then use them as model solutions for related pathfinding tasks in a case-based reasoning (CBR) fashion. One of the early works on combining search and case-based reasoning in pathfinding on road maps was done within the planning and learning system PRODIGY (Carbonell, Knoblock, & Minton, 1990), with the goal of generating near-optimal routes for an autonomous navigation vehicle trying to achieve multiple goals while driving in a city (Haigh & Veloso, 1993). The authors acknowledge the benefits of such an approach in the situation where it is necessary to interleave planning and execution. Subsequent work on case-based route selection has, however, mainly focused on augmenting non-interleaving path-planning algorithms, such as A* or Dijkstra, with an emphasis on how best to build the case base, for example, how to identify, compute, and store paths to critical junctions that many paths pass through (Anwar & Yoshida, 2001; Weng, Wei, Qu, & Cai, 2009). As for mobile robot navigation, two heuristic search algorithms working in ground space and using a CBR-based approach were introduced by Branting and Aha (1995). The simpler one, when looking for a path between states a and b, searches the pre-calculated case base for a path that contains both a and b. If a match is found the best path is returned, otherwise a regular A* search is invoked to calculate the solution path. The second, and more elaborate, algorithm searches the case base for a match in the same fashion as the first, but if none is found, it adapts an existing case to fit the new task. This is done by using A* to join a and b to an existing path in the case base so that the new overall distance is minimized. There is still ongoing research in this area, for example, work on storing the case base as a graph structure called a case-graph that gradually builds a waypoint-like navigation network (Hodal & Dvorak, 2008). Note that these and many other existing algorithms are not real-time as they generate or modify complete plans.

5. Intuition for kNN LRTA*

In our design of kNN LRTA* we address the three shortcomings of D LRTA* listed earlier. In doing so, we identify two key aspects of a subgoal-based real-time heuristic search. First, we need to define a set of subgoals that would be efficient to compute and store off-line. Second, we need to define a way for the agent to find a subgoal relevant to its current problem on-line.


Intuitively, if an LRTA*-controlled agent is in the state s going to the state s_goal then the best subgoal is a state s_subgoal^ideal that resides on an optimal path between s and s_goal and can be reached by LRTA* along an optimal path with no state re-visitation. Given that there can be multiple optimal paths between two states, it is unclear how to computationally efficiently detect the LRTA* agent’s deviation from an optimal path immediately after it occurs.

On the positive side, detecting state re-visitation can be done computationally efficiently by running a simple greedy hill-climbing agent. This is based on the fact that if a hill-climbing agent can reach a state b from a state a without encountering a local minimum or a plateau in the heuristic then an LRTA* agent can travel from a to b without state re-visitation (Theorem 5). Thus, we propose an efficiently computable approximation to s_subgoal^ideal. Namely, we define the subgoal for a pair of states s and s_goal as the state s_subgoal^{kNN LRTA*} farthest along an optimal path between s and s_goal that can be reached by a simple hill-climbing agent (defined rigorously in the following section). In summary, we select subgoals to eliminate any scrubbing (Theorem 5) but do not guarantee that the LRTA* agent keeps on an optimal path between the subgoals (Theorem 6). In practice, however, only a tiny fraction of our subgoals are reached by the hill-climbing agent suboptimally and even then the suboptimality is minor.

This approximation to the ideal subgoal allows us to effectively compute a series of subgoals for a given pair of start and goal states. Intuitively, we compress an optimal path into a series of key states such that each of them can be reached from its predecessor without scrubbing. The compression allows us to save a large amount of memory without much impact on time-per-move. Indeed, hill-climbing from one of the key states to the next requires inspecting only the immediate neighbors of the current state and selecting one of them greedily. The re-visitation-free reachability of one subgoal from another addresses the first key shortcoming of D LRTA* where the agent may get “trapped” within a single complex region and thus be unable to reach its prescribed subgoal.

However, it is still infeasible to compute and then compress an optimal path between every two distinct states in the original search space. We solve this problem by compressing only a pre-determined fixed number of optimal paths between random states off-line. Then, on-line, kNN LRTA*, tasked with going from s to s_goal, retrieves the most similar compressed path from its database and uses the associated subgoals. We define (dis-)similarity of a database path to the agent’s current situation as the maximum of the heuristic distances between s and the path’s beginning and between s_goal and the path’s end. We use the maximum because we would like both ends of the path to be heuristically close to the agent’s current state and the goal respectively. Indeed, the heuristic distance ignores walls and thus a large heuristic distance to either end of the path tends to make that end hill-climbing unreachable.
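The (dis-)similarity measure described above can be sketched in a few lines. This is a minimal illustration, assuming states are (x, y) grid cells and that the heuristic is the octile distance with diagonal cost √2; the names `octile`, `dissimilarity`, and `rank_records` are hypothetical and not taken from the paper.

```python
import math


def octile(a, b):
    """Octile distance between grid cells a and b (assumed heuristic)."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return max(dx, dy) + (math.sqrt(2) - 1) * min(dx, dy)


def dissimilarity(s, s_goal, record, h=octile):
    """Maximum of the heuristic distances to the record's two ends.

    record is a sequence of states whose first element is the path's
    beginning and whose last element is the path's end.
    """
    return max(h(s, record[0]), h(s_goal, record[-1]))


def rank_records(s, s_goal, database, h=octile):
    """Return the records ordered from most to least similar."""
    return sorted(database, key=lambda r: dissimilarity(s, s_goal, r, h))


record_a = [(1, 0), (5, 0), (9, 0)]   # both ends one cell away
record_b = [(0, 0), (10, 8)]          # start matches exactly, end is far
ranked = rank_records((0, 0), (10, 0), [record_b, record_a])
print(ranked[0] is record_a)  # True
```

Taking the maximum means that record_b is ranked below record_a even though its start coincides with the agent's position, because its end is eight cells from the goal.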

Note that high similarity (i.e., both distances being low) does not guarantee that the path will be useful to the kNN LRTA* agent. For instance, the beginning of the path can be heuristically very close to the agent but on the other side of a long wall, making it unreachable without a lot of learning and the associated scrubbing. To address this problem we complement the fast-to-compute similarity metric with more computationally demanding move-limited reachability checks as detailed below.

We illustrate this intuition with a simple example. Figure 3 shows kNN LRTA* operation off-line. On this map, two random start and goal pairs are selected and optimal paths are computed between them. Then each path is compressed into a series of subgoals such that each of the subgoals can be reached from the previous one via hill-climbing. The path from S1 to G1 is compressed into two subgoals and the other path is compressed into a single subgoal.


Figure 3: Example of kNN LRTA* off-line operation. Left: two (start, goal) pairs are chosen: (S1, G1) and (S2, G2). Center: optimal paths between them are computed by running A*. Right: the two paths are compressed into a total of three subgoals.

Once this database of two records is built, kNN LRTA* can be tasked with solving a problem on-line. In Figure 4 it is tasked with going from the state S to the state G. The database is scanned and similarity between (S, G) and each of the two database records is determined. The records are sorted by their similarity: (S1, G1) followed by (S2, G2). Then the agent runs reachability checks: from S to Si and from Gi to G where i runs over the database indices in the order of record similarity. In this example, S1 is found unreachable by hill-climbing from S and thus the record (S1, G1) is discarded. The second record passes hill-climbing checks and the agent is tasked with going to its first subgoal (shown as 1 in the figure).


Figure 4: Example of kNN LRTA* on-line operation. Left: the agent intends to travel from S to G. Center: similarity of (S, G) to (S1, G1) and (S2, G2) is computed. Right: while (S1, G1) is more similar to (S, G) than (S2, G2), its beginning S1 is not reachable from S via hill-climbing and hence the record (S2, G2) is selected and the agent is tasked with going to subgoal 1.

The similarity plus hill-climbing check approach makes the state abstraction of D LRTA* unnecessary, thereby addressing its other two key shortcomings: high memory requirements and a complex pre-computation phase.


6. kNN LRTA* in Detail

In this section we flesh out kNN LRTA* in enough detail for other researchers to implement it. We start with a basic version and then describe several significant enhancements.

6.1 Basic kNN LRTA*

kNN LRTA* consists of two parts: database pre-computation (off-line) and LRTA* with dynamically selected subgoals (on-line). Pseudocode for the off-line part is presented in Figure 5. The top-level function computeSubgoals takes a user-controlled parameter N and a search graph (e.g., a grid-based map in pathfinding) and builds a subgoal database of N compressed paths. Each path is generated in line 4 from start and goal states randomly chosen in line 3. If the path does not exist or is too short (line 5), we discard it and re-generate the start and goal states. The compression takes place in the function compress, which returns a sequence of states $\Gamma_p = (s_{start}, s^{1}_{subgoal}, \ldots, s^{n_p}_{subgoal}, s_{goal})$ where $n_p \geq 0$ is the number of subgoals (line 6). The sequence $\Gamma_p$ is the compressed representation of path p and forms a single record in the subgoal database (line 7).

subgoal database ← computeSubgoals(N, G)

1 subgoal database ← ∅
2 for n = 1, . . . , N do
3     generate a random pair of states (s_start, s_goal)
4     compute an optimal path p from s_start to s_goal with A*
5     if p = ∅ ∨ |p| < 3 then go to step 3 end if
6     Γ_p ← compress(p)
7     add Γ_p to the subgoal database
8 end for

Figure 5: kNN LRTA* off-line: building a subgoal database.

Pseudocode for the function compress is found in Figure 6. It takes the path p = (s_start, . . . , s_goal) = (s_1, . . . , s_t) as an argument and returns a subset Γ of it: the states reachable from each other via hill-climbing (and thus without scrubbing). The code builds the sequence γ of indices of states which will then be put into Γ as subgoals. As long as the path is not exhausted (line 2), the next candidate subgoal is defined by the index i in line 3. Note that the state with the index i = end(γ) + 1 is always hill-climbing reachable from the state with the index end(γ) because these two states are immediate neighbours. We then run a binary search defined by the scope of indices [l, r] in lines 4 and 5. The middle of the scope is calculated in line 7 and its hill-climbing reachability from the latest computed subgoal s_{end(γ)} is checked in line 8. If the middle is indeed hill-climbing reachable then the scope is moved to the upper half (line 10) and the candidate subgoal is updated (line 9). Otherwise, the scope of the binary search is moved to the lower half in line 12. Once the binary search is completed, the candidate subgoal is added to γ in line 15 (see footnote 3). We convert the sequence of indices γ into the sequence of states Γ in line 17.
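The compression procedure just described can be sketched as follows. This is an illustrative translation rather than the authors' implementation: indices are 0-based, the reachability test is passed in as a hypothetical predicate `reachable(a, b)`, and the toy predicate used in the example simply stands in for a real hill-climbing check.

```python
def compress(path, reachable):
    """Compress a path into subgoals using a binary search per subgoal.

    path      -- sequence of states (s_1, ..., s_t), here 0-indexed
    reachable -- predicate reachable(a, b); assumed to stand in for the
                 hill-climbing reachability check
    """
    t = len(path) - 1          # index of the last state
    gamma = [0]                # indices of the selected states
    while gamma[-1] != t:      # until the path is exhausted
        last = gamma[-1]
        i = last + 1           # immediate neighbour: always reachable
        lo, hi = i + 1, t      # scope of the binary search
        while lo <= hi:
            mid = (lo + hi) // 2
            if reachable(path[last], path[mid]):
                i = mid        # update the candidate subgoal
                lo = mid + 1   # continue in the upper half
            else:
                hi = mid - 1   # continue in the lower half
        gamma.append(i)
    return [path[k] for k in gamma]


# Toy stand-in: a state is "reachable" if it is at most 3 steps ahead.
toy_path = list(range(10))
print(compress(toy_path, lambda a, b: b - a <= 3))  # [0, 3, 6, 9]
```

With the toy predicate, ten path states are stored as four, and each stored state satisfies the predicate with respect to its predecessor.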

Γ ← compress((s_1, . . . , s_t))

1 γ ← (1)
2 while t ∉ γ do
3     i ← end(γ) + 1
4     l ← i + 1
5     r ← t
6     while l ≤ r do
7         m ← ⌊(l + r)/2⌋
8         if reachable(s_{end(γ)}, s_m) then
9             i ← m
10            l ← m + 1
11        else
12            r ← m − 1
13        end if
14    end while
15    γ ← γ ∪ (i)
16 end while
17 Γ ← s_γ

Figure 6: kNN LRTA* off-line: compressing a path into a sequence of subgoals.

3. We use parentheses with set operations to indicate that γ is an ordered set.

The function reachable(s_a, s_b) checks if a hill-climbing agent can reach the state s_b from the state s_a. The pseudocode is found in Figure 7. We start climbing from the state s_a (line 1). As long as the goal is not reached (line 2), we generate immediate successors of the current state (line 3) and check if we are in a local heuristic minimum or a plateau (line 4). If so we terminate our climb and declare that s_b is not hill-climbing reachable from s_a. Otherwise we climb towards a frontier state with the lowest g + h value (lines 5 and 6). We use g + h instead of h to make the move selection correspond to that of LRTA*. Additionally, the ties are broken in exactly the same way as they are with the LRTA* algorithm in Figure 1. Note that whenever the function reachable is called in the on-line phase, we impose a fixed cutoff on the number of steps hill-climbing is allowed to travel. This is done to place an upper bound on the time complexity of the reachability check independent of the number of states in the search graph, as required by real-time operation.

ρ ← reachable(s_a, s_b)

1 s ← s_a
2 while s ≠ s_b do
3     generate immediate successor states of s, generating a frontier
4     if h(s) ≤ min_{s′′ ∈ frontier} h(s′′) then break
5     find the frontier state s′ with the lowest g(s, s′) + h(s′, s_b)
6     s ← s′
7 end while
8 ρ ← (s = s_b)

Figure 7: Checking if one state is reachable from another. When this function is called on-line, a fixed cap is put on the number of iterations in the while loop.
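The hill-climbing reachability check can be sketched on a small grid as follows. Several details here are assumptions for illustration: the map is a list of strings with '.' for vacant cells, moves are eight-connected with diagonal cost √2 and no corner-cutting restriction, the heuristic is the octile distance, and ties are broken by the fixed order of `MOVES`.

```python
import math

# Fixed move order; min() keeps the first minimum, so ties break by this order.
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, 1), (1, -1), (-1, 1), (-1, -1)]


def octile(a, b):
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return max(dx, dy) + (math.sqrt(2) - 1) * min(dx, dy)


def neighbors(grid, s):
    """Yield (state, edge cost) for each vacant neighbour of s = (x, y)."""
    x, y = s
    for dx, dy in MOVES:
        nx, ny = x + dx, y + dy
        if 0 <= ny < len(grid) and 0 <= nx < len(grid[0]) and grid[ny][nx] == ".":
            yield (nx, ny), math.hypot(dx, dy)


def reachable(grid, s_a, s_b, max_steps=None):
    """True if greedy hill-climbing gets from s_a to s_b (optionally capped)."""
    s, steps = s_a, 0
    while s != s_b:
        if max_steps is not None and steps >= max_steps:
            return False                     # on-line cutoff reached
        frontier = list(neighbors(grid, s))
        # Local minimum or plateau: no neighbour is strictly closer to s_b.
        if not frontier or octile(s, s_b) <= min(octile(n, s_b) for n, _ in frontier):
            return False
        s = min(frontier, key=lambda nc: nc[1] + octile(nc[0], s_b))[0]
        steps += 1
    return True


GRID = [
    ".....",
    "..#..",
    "..#..",
    "..#..",
    ".....",
]
print(reachable(GRID, (0, 0), (4, 0)))  # True: the top row is open
print(reachable(GRID, (0, 2), (4, 2)))  # False: stuck in front of the wall
```

In the second call the agent steps to (1, 2), finds that every neighbour is heuristically farther from the target, and stops, which is the local-minimum case of line 4 in Figure 7.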

In the on-line phase of kNN LRTA*, we run LRTA* as per Figure 1. Dynamic subgoal selection (line 3) is done as per pseudocode in Figure 8. Given a start and goal state, we scan our subgoal database and, for each record, compute heuristic distance between our start state and the record’s first state as well as heuristic distance between our goal state and the record’s last state. As we mentioned earlier, we define the (dis-)similarity between our problem and the record as the maximum of the two heuristic distances. This is done so that similar records are those whose start and end are both close to the agent’s current position and its goal in terms of the heuristic distance.

All database records are sorted by their similarity to the agent’s current and global goal states (line 1) and, starting with the most similar record, we check if its start and end are hill-climbing reachable from the agent’s current state and the agent’s global goal respectively (line 4). If either reachability check fails, we go on to the next record. Otherwise, we stop the database search (line 6). If we exhaust the database and find no reachable record, we resort to the global goal (line 9). Once a record is found, all its subgoals are fed one by one to LRTA* in line 3 of Figure 1.

The intuition is that our similarity metric uses heuristic distance and, therefore, ignores some constraints of the problem (e.g., walls in grid-based pathfinding). Thus, a database record with a high similarity value may not be relevant to the agent’s situation as its start and goal may be on the other side of a wall which means that its subgoals will not be reachable by LRTA* without scrubbing and therefore are useless to the agent.

r ← selectSubgoals(s, s_{global goal})

1 (r_1, . . . , r_N) ← database records from most to least similar
2 for i = 1, . . . , N do
3     retrieve r_i = (s_start, . . . , s_end)
4     if reachable(s, s_start) and reachable(s_end, s_{global goal}) then
5         r ← r_i
6         return
7     end if
8 end for
9 r ← (s, s_{global goal})

Figure 8: kNN LRTA* on-line: selecting subgoals.
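The record selection of Figure 8 can be sketched as below. The example reproduces the situation of Figure 4 with made-up data: the state names, coordinates, the Manhattan heuristic, and the `BLOCKED` set that stands in for failed hill-climbing checks are all assumptions for illustration.

```python
def select_subgoals(s, s_global_goal, database, dissimilarity, reachable):
    """Return the most similar record passing both reachability checks,
    or the pair (s, s_global_goal) if no record qualifies."""
    ranked = sorted(database, key=lambda r: dissimilarity(s, s_global_goal, r))
    for record in ranked:
        if reachable(s, record[0]) and reachable(record[-1], s_global_goal):
            return record
    return [s, s_global_goal]


# Hypothetical coordinates for the named end states.
COORDS = {"S": (0, 0), "G": (10, 0), "S1": (1, 0), "G1": (9, 0),
          "S2": (0, 3), "G2": (10, 3)}


def h(a, b):
    (ax, ay), (bx, by) = COORDS[a], COORDS[b]
    return abs(ax - bx) + abs(ay - by)


def dissimilarity(s, g, record):
    return max(h(s, record[0]), h(g, record[-1]))


BLOCKED = {("S", "S1")}  # pretend a wall separates S from S1


def reachable(a, b):
    return (a, b) not in BLOCKED


DATABASE = [["S1", "A", "B", "G1"],   # two subgoals, A and B
            ["S2", "C", "G2"]]        # one subgoal, C
print(select_subgoals("S", "G", DATABASE, dissimilarity, reachable))
# ['S2', 'C', 'G2']
```

The first record is the more similar one but fails the check from S to S1, so the second record is returned, as in the worked example of Figure 4.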

6.2 Enhanced kNN LRTA*

We have presented the basic kNN LRTA* algorithm. In this section we introduce six enhancements.

First, before selecting a database record in the function selectSubgoals, we check if the global goal is reachable from the agent’s current state. This is done by calling the function reachable. If the global goal is indeed reachable via move-limited hill-climbing then we set it as the agent’s goal and do not look for a subgoal. Otherwise, we turn to the database for subgoals.

Second, having selected a database record in the routine selectSubgoals, we run a reachability check between the agent’s current state and the first subgoal in the record. If the first subgoal is reachable then we set it as the goal for LRTA*. Otherwise, we set LRTA* to go to the start state of the record, which is already checked to be reachable within the function selectSubgoals.

Third, when LRTA* reaches the last subgoal (i.e., the state in the record immediately prior to the end of the record), it checks if the global goal is reachable from it. If so, the global goal is used as the next subgoal. Otherwise, the agent heads for the end of the record from which it can reach the global goal as guaranteed by the record selection criteria.


Each of the first three enhancements addresses a trade-off between path optimality and planning time per move. Specifically, calling the function reachable, while real-time, increases kNN LRTA* planning time per move but, at the same time, leads to a potentially shorter solution due to better subgoal selection. Recall that the function reachable satisfies the real-time operation constraint because we place an a priori limit on the number of moves it can take.

Reachability checks constitute a substantial portion of kNN LRTA*’s planning time per move. The other substantial contributor is accessing the record database and computing record similarity. The basic algorithm described above always computes the similarity for all database records and, in the worst case, runs reachability checks for all records in the function selectSubgoals. While this does not depend on the search graph size and is thus real-time, we can still speed it up as follows.

The fourth enhancement is to run reachability checks only for a fixed number of most similar records. This can be done simply by substituting the total number of database records N with a fixed constant M ≤ N in line 2 of Figure 8. The intuition is that only fairly similar records are worth checking for reachability.

When M ≪ N this enhancement can substantially reduce the amount of planning time taken up by reachability checks. However, the similarity is still computed for all records in the database (line 1 in Figure 8). The fifth enhancement speeds this step up by employing kd-trees instead of a linear database scan. A kd-tree (Moore, 1991) is a spatial tree index that can have a sublinear time complexity for nearest-neighbor searches. Specifically, our kd-tree indexes start and end states of the subgoal database records. Each tree node is thus a four-tuple (x_start, y_start, x_end, y_end). The index works by dividing the search space along a dimension at each level of the tree. The search space is divided on x_start below the root node of the tree, y_start at the next level down, x_end on the next level, y_end on the next, and then the cycle repeats. For example, if the root node is (4, 5, 8, 9), then the start state has coordinates (4, 5) and the end state has coordinates (8, 9). Further, any nodes in the left subtree have x_start ≤ 4, and nodes in the right subtree have x_start > 4.

To illustrate, consider the tree in Figure 9 and a subgoal record whose start state is (8, 4) and whose goal state is (4, 9). This record will be represented by a kd-tree node (8, 4, 4, 9). It is in the right subtree of the root as its x_start = 8 which is greater than the root’s value of 4. It is in the left subtree of the next node as its value of 4 for y_start is less than the node’s value of 5. At the third level, it is in the left subtree as its value of 4 for x_end is less than 6. Finally, it is in the right subtree of its parent at level four because its value of y_end = 9 is greater than the 8 of its parent.
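The descent just described can be traced with a small sketch of kd-tree insertion over four-tuples. The `Node` class and `insert` function are hypothetical helpers written for this illustration; only the five nodes along the path discussed in the text are inserted, and nearest-neighbor search is not shown.

```python
class Node:
    def __init__(self, point):
        self.point = point   # (x_start, y_start, x_end, y_end)
        self.left = None     # coordinate <= this node's on the split axis
        self.right = None    # coordinate >  this node's on the split axis


def insert(root, point):
    """Insert point and return (root, path), where path lists the
    left/right choices made while descending from the root."""
    if root is None:
        return Node(point), []
    node, depth, path = root, 0, []
    while True:
        axis = depth % 4     # cycle: x_start, y_start, x_end, y_end
        side = "left" if point[axis] <= node.point[axis] else "right"
        path.append(side)
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(point))
            return root, path
        node, depth = child, depth + 1


root = None
for p in [(4, 5, 8, 9), (7, 5, 6, 4), (8, 3, 6, 4), (6, 2, 5, 8)]:
    root, _ = insert(root, p)
root, path = insert(root, (8, 4, 4, 9))
print(path)  # ['right', 'left', 'left', 'right']
```

The printed path matches the sequence of subtrees named in the paragraph above: right of the root, then left, left, and finally right.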

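(4, 5, 8, 9)    [divide by start x-coordinate]
    startX <= 4: (3, 7, 6, 6)    [divide by start y-coordinate]
        startY <= 7: (4, 5, 4, 5)    [divide by end x-coordinate]
            endX <= 4: (3, 2, 2, 1)
            endX > 4: (1, 6, 8, 3)
        startY > 7: (2, 8, 3, 3)    [divide by end x-coordinate]
            endX <= 3: (1, 9, 2, 6)
            endX > 3: (3, 8, 5, 4)
    startX > 4: (7, 5, 6, 4)    [divide by start y-coordinate]
        startY <= 5: (8, 3, 6, 4)    [divide by end x-coordinate]
            endX <= 6: (6, 2, 5, 8)    [divide by end y-coordinate]
                endY <= 8: (8, 3, 3, 7)
                endY > 8: (8, 4, 4, 9)
            endX > 6: (9, 1, 9, 3)
        startY > 5: (5, 9, 4, 6)    [divide by end x-coordinate]
            endX <= 4: (9, 9, 4, 4)
            endX > 4: (9, 9, 5, 5)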

Figure 9: A kd-tree for database access.

This structure allows nearest-neighbors to be computed without searching all paths in the tree index by eliminating some subtrees based on distance. For instance, if the search currently has the best M records found so far, and it encounters a node in the tree where it can be guaranteed that all nodes are farther away than those M records from the search target, then that subtree is not searched. The nearest-neighbor search algorithm is explained by Moore (1991). Note that the kd-tree index works for regular grid pathfinding problems but not necessarily all heuristic search problems. For instance, high-cost edges connecting states with similar coordinates or low-cost edges connecting states with distant coordinates would present a problem for the kd-tree index.

Given a subgoal database, we build a kd-tree to index it off-line and store it together with the database. On-line, we use the kd-tree to identify the M records relevant to the agent’s current start and goal states (line 1 in Figure 8). We then compute the similarity metric only for these M records.

The sixth enhancement deals with the case where kNN LRTA* is unable to find a subgoal and resorts to its global goal. This happens in the function selectSubgoals (line 9 of Figure 8). A failure to find a subgoal is caused by none of the M most similar records passing our reachability checks. Having to resort to a global goal indicates an insufficient database coverage of the current area in the space of start and goal state pairs. Given that records are compressed optimal paths between randomly generated start and goal states, database coverage is likely to be uneven. Thus resorting to a global goal should not be a permanent step as the agent traveling to a global goal is likely to enter an area covered by the database sooner or later. At that point, the record selection process can be repeated, hopefully resulting in a database hit. We implement this intuition in kNN LRTA* by imposing a travel quota on LRTA* after the function selectSubgoals fails to find a reachable record. The quota is computed as a heuristic distance between the agent’s current state and the global goal multiplied by a fixed constant greater than 1. Once the agent exhausts its quota, selectSubgoals is called again. If it fails to find a reachable subgoal for a second time in a row, the quota is set to infinity leading to no further interruptions. This is necessary to guarantee completeness. Additionally, interrupting LRTA* indefinitely many times increases average planning time per move due to subgoal selection attempts.
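The quota rule can be written down compactly. This is only a sketch: the function name, the way consecutive failures are counted, and the default multiplier of 3.0 are assumptions; the text specifies only that the constant is fixed and greater than 1.

```python
import math


def travel_quota(h_to_global_goal, consecutive_failures, factor=3.0):
    """Travel allowance granted after selectSubgoals fails.

    h_to_global_goal     -- heuristic distance from the current state to the goal
    consecutive_failures -- how many times in a row no reachable record was found
    factor               -- fixed constant greater than 1 (value assumed here)
    """
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    if consecutive_failures >= 2:
        return math.inf      # second failure in a row: no further interruptions
    return factor * h_to_global_goal


print(travel_quota(10.0, 1))  # 30.0
print(travel_quota(10.0, 2))  # inf
```

Returning infinity after the second consecutive failure means LRTA* then heads to the global goal uninterrupted.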

We have also experimented with the idea that a database record of the form (s_1, . . . , s_n) does not have to be used in its entirety. Indeed, any of its fragments (i.e., (s_i, . . . , s_j) where 1 ≤ i < j ≤ n) can be used within kNN LRTA* in the same fashion as the entire record. We implemented this idea by running the kd-tree search over all fragments of database records in addition to the whole records. The results were disappointing in several ways. First, the kd-tree algorithm becomes more complex and the kd-tree query time increases. Second, record fragments “crowd” the M hits that the kd-tree returns and for which the similarity metric is computed. In practice this means that the kd-tree returns M similar but hill-climbing unreachable records and, thus, causes kNN LRTA* to resort to its global goal more often. This can be fixed by increasing M accordingly but then the similarity computation and the hill-climbing checks become more costly.

7. Theoretical Analysis

In this section we prove completeness of the algorithm and analyze its complexity.

7.1 Off-line Complexity

Off-line kNN LRTA* generates N records in a space of S states. Let the diameter of the space (i.e., the number of states along the longest possible shortest path between two states) be δ_S.

Theorem 1 Off-line worst-case space complexity of kNN LRTA* is O(Nδ_S + S).


Proof. In the worst case each optimal path kNN LRTA* generates between randomly selected start and goal is δ_S long and is minimally compressible. Minimum compression means that every state on the path is stored. If all N records have this property then the total amount of database storage is O(Nδ_S). Additionally, A* is run for each record and has the worst-case space complexity of O(S). □

Theorem 2 Off-line worst-case time complexity of kNN LRTA* is O(NS log S + N log N).

Proof. kNN LRTA* runs A* to compute an optimal path for N pairs of randomly generated start and goal states. With a consistent heuristic and other constraints from our problem formulation, A*’s worst-case time complexity is O(S log S). Since δ_S ≤ S, A*’s complexity dominates the worst-case time complexity of the function compress which is O(δ_S log δ_S). Additionally, building a kd-tree takes O(N log N). □

7.2 On-line Complexity

In this section we assume that LRTA* generates all immediate neighbors of its current state and only them on each move. In our grid pathfinding this can be easily accomplished by setting g_max = 1.4. More generally, this can be guaranteed by substituting line 4 in Figure 1 with “generate immediate successor states of s”.

Theorem 3 kNN LRTA*’s on-line worst-case space complexity is O(d_max + S) where d_max is the maximum out-degree of any vertex and S is the total number of states.

Proof. The open list of kNN LRTA* is at most the maximum number of immediate neighbors of any state (i.e., d_max). As LRTA* learns, it has to store updated heuristic values, of which there are no more than S. Hence the overall space complexity is O(d_max + S). Note that in grid pathfinding d_max ≪ S and d_max does not increase with map size, thereby reducing the upper bound to O(S). □

Theorem 4 kNN LRTA*’s per-move worst-case time complexity is O(d_max + N + M log M) where d_max is the maximum out-degree of any vertex, N is the total number of records in the database and M is the number of candidate records selected by the kd-tree.

Proof. On each move kNN LRTA* invokes LRTA* which generates at most d_max states. On some moves, kNN LRTA* additionally searches its database to find the appropriate record. The database search starts with querying the kd-tree for M records (M ≤ N). While balanced kd-trees can have time complexity sub-linear in N, the worst-case time for this step is still O(N). We then sort the M records by their similarity in O(M log M) time. Finally, move-limited hill-climbing checks are run for the records, collectively taking no more than O(M) time. Thus, the overall per-move time complexity is O(d_max + N + M log M) in the worst case. □

Note that this bound does not depend on S and, therefore, makes kNN LRTA* real-time by our definition. Also note that in grid pathfinding d_max ≪ N and M ≪ N which makes kNN LRTA*’s per-move time complexity simply O(N).


7.3 Completeness

Theorem 5 For any two states s_1 and s_2, if s_2 is hill-climbing reachable from s_1 then an LRTA* agent starting in s_1 will reach s_2 without state re-visitation (i.e., scrubbing).

Proof. First, we show that if the hill-climbing agent (as specified in the function reachable in Figure 7) can reach s_2 from s_1 then it can never re-visit any states on its way. Suppose that is not the case. Then there exists a state s_3 re-visited by the hill-climbing agent. Because our ties are broken in a fixed order, once the hill-climbing agent arrives at s_3 for the second time, it will continue following the same path as it did after the first visit and will, therefore, arrive at s_3 for the third time and so on. In other words, it will be in an infinite loop re-visiting s_3 repeatedly. This contradicts the fact that it was able to reach s_2.

From here we conclude that the path between s_1 and s_2 followed by the hill-climbing agent is free of repeated states. We now have to show that the LRTA* agent starting in s_1 will follow exactly the same path as the hill-climbing agent. Observe that the only difference between our hill-climbing agent (Figure 7) and the LRTA* agent (Figure 1) is the heuristic update rule in line 6 of the latter figure. The update rule can only increase heuristic values (i.e., make them less attractive to the agent) and only in already visited states. Since the hill-climbing agent never re-visits its states while traveling between s_1 and s_2, any increase in the heuristic values caused by LRTA* does not affect LRTA*’s move choice (line 5 in Figure 1). As a result, LRTA* will follow precisely the same path between s_1 and s_2 as the hill-climbing agent and thus will not re-visit any states. □

Theorem 6 There exist two states s_1 and s_2 such that s_2 is hill-climbing reachable from s_1 but the path that the hill-climbing agent follows is not optimal (i.e., shortest).

Proof. The proof is constructive and presented in Figure 10. The darkened cells are walls. The hill-climbing agent, starting in the state s_1, will “hug” the wall on its way to the state s_2. The resulting path has the cost of 16.4. The optimal path, however, takes advantage of diagonal moves by making a non-greedy move and going around the wall above the agent. Its cost is 15. □

Theorem 7 kNN LRTA* is complete for any size of its subgoal database if the underlying LRTA* generates at least all immediate neighbors of the current state.

Proof. To prove completeness we need to show that for any pair of states s_1 and s_2, if there is a path between s_1 and s_2, kNN LRTA* will reach s_2 from s_1.

Given a problem, the subgoal selection module of kNN LRTA* (Figure 8) will either return a record of the form r = (s_start, . . . , s_end) or instruct LRTA* to go to the global goal. In the latter case, kNN LRTA* is complete because the underlying LRTA* is complete (Korf, 1990) as long as it generates all immediate neighbors of its current state.

In the former case, LRTA* is guaranteed to reach either s_start or the first subgoal of r due to the way r is selected. Once any of the states in r is reached, LRTA* is guaranteed to reach the subsequent states due to the completeness of the basic LRTA* and the way the subgoals are generated. Note that the interruptibility enhancement does not interfere with completeness because we can interrupt going for a global goal at most once. □


Figure 10: Hill-climbing reachability does not guarantee optimality.

8. Empirical Evaluation

Pathfinding in video games is a challenging task, frequently requiring many units to plan their paths simultaneously and to react promptly to user commands. The task is made even more challenging by ever-growing map sizes and little computational resources allocated to in-game AI. Accordingly, most recent work in the field of real-time heuristic search uses video game pathfinding as a testbed.

8.1 Test Problems

Maps modelled after game levels from Baldur’s Gate (BioWare Corp., 1998) and WarCraft III: Reign of Chaos (Blizzard Entertainment, 2002) have been a common choice (e.g., Sturtevant & Buro, 2005; Bulitko et al., 2008). These maps, however, are small by today’s standards and do not represent the state of the industry. For this paper, we developed a new set of maps modelled after game levels from Counter-Strike: Source (Valve Corporation, 2004), a popular on-line first-person shooter. In this game the level geometry is specified in a vector format. We developed software to convert it to a grid of an arbitrary resolution. While previous papers commonly used maps in the range of 10^4 to 10^5 grid cells (e.g., between 150 × 141 and 512 × 512 cells in Sturtevant, 2007; Bulitko & Bjornsson, 2009), our new maps have between nine and thirteen million vacant cells (i.e., states). This is a two to three orders of magnitude increase in size. As a point of reference, the entire road network of Western Europe used for state-of-the-art route planning has approximately eighteen million vertices (Geisberger, Sanders, Schultes, & Delling, 2008).

The experiments in this paper were run on a set of 1000 randomly generated problems across the four maps shown in Figure 11. There were 250 problems on each map and they were constrained to have a solution cost of at least 1000. The grid dimensions varied between 4096 × 4604 and 7261 × 4096 cells. For each problem we computed an optimal solution cost by running A*. The optimal cost was in the range of [1003.8, 2999.8] with a mean of 1881.76, a median of 1855.2 and a standard deviation of 549.74. We also measured the A* difficulty, defined as the ratio of the number of states expanded by A* to the number of edges in the resulting optimal path. For the 1000 problems, the A* difficulty was in the range of [1, 199.8] with a mean of 62.60, a median of 36.47 and a standard deviation of 64.14.

Figure 11: The maps used in our empirical evaluation.

All algorithms compared were implemented in Java using common data structures as much as possible. We used Java version 6 under SUSE Enterprise Linux 10 on a 2.1 GHz AMD Opteron processor with 32 Gbytes of RAM. All timings are reported for single-threaded computations.

8.2 Algorithms Evaluated

We evaluated kNN LRTA* with the following parameters. Database size values were in {1000, 5000, 10000, 40000, 60000, 80000} records. On-line, we allowed our hill-climbing test to climb for up to 250 steps before concluding that the destination state is not hill-climbing reachable. This value was picked after some experimentation and had to be appropriate for the record density on the map. Indeed, a larger database requires fewer hill-climbing steps to maintain the likelihood of finding a hill-climbing reachable record for a given problem.
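The capped hill-climbing test described above can be sketched as follows. This is a minimal illustration and not the authors' Java implementation: the grid encoding, the octile-distance heuristic, the helper names (`octile`, `neighbors`, `hill_climb_reachable`) and the strict-improvement stopping rule are all assumptions. Only the 250-step cap is taken from the text.

```python
import math

def octile(a, b):
    """Octile distance on an 8-connected grid (assumed heuristic)."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return max(dx, dy) + (math.sqrt(2) - 1) * min(dx, dy)

def neighbors(grid, cell):
    """Yield the vacant ('.') 8-connected neighbors of cell = (row, col)."""
    r, c = cell
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue
            nr, nc = r + dr, c + dc
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == '.':
                yield (nr, nc)

def hill_climb_reachable(grid, start, target, max_steps=250):
    """Greedily follow the heuristic towards target for at most max_steps moves.

    Returns False as soon as no neighbor strictly improves the heuristic
    (a local minimum) or when the step budget runs out before target is reached.
    """
    current = start
    for _ in range(max_steps):
        if current == target:
            return True
        best = min(neighbors(grid, current),
                   key=lambda n: octile(n, target), default=None)
        if best is None or octile(best, target) >= octile(current, target):
            return False  # stuck: hill-climbing cannot make progress
        current = best
    return current == target

open_map = ["....", "....", "...."]
walled_map = ["...", "###", "..."]
print(hill_climb_reachable(open_map, (0, 0), (2, 3)))    # reachable on an open grid
print(hill_climb_reachable(walled_map, (0, 0), (2, 0)))  # blocked by the wall
```

A smaller cap makes each check cheaper but rejects more records, which matches the trade-off between the step limit and record density noted in the text.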


We ran reachability checks on the 10 most similar records.⁴ Whenever selectSubgoals failed to find a matching record, we allowed LRTA* to travel towards its global goal for up to 3 times the heuristic estimate of the remaining path. After that, LRTA* was interrupted and a second attempt to find an appropriate subgoal was made. LRTA*’s parameter g_max was set to the cost of the most expensive edge (i.e., 1.4) so that LRTA* generated only the immediate neighbors of its current state.

We also ran two recent high-performance real-time search algorithms to compare kNN LRTA* against: D LRTA* and TBA*. D LRTA* was run with the databases computed for abstraction levels of {9, 10, 11, 12}. TBA* was run with the time slices of {5, 10, 20, 50, 100, 500, 1000, 2000, 5000}. The cost ratio of expanding a state to backtracing was set to 10.

We chose the space of control parameters via trial and error, with three considerations in mind. First, we had to cover enough of the space to clearly determine the relationship between the control parameters and each algorithm’s performance. Second, we attempted to establish the Pareto-optimal frontier (i.e., determine which algorithms dominate others by simultaneously outperforming them along two performance measures such as time per move and suboptimality). Third, parameter values had to be such that we could run the algorithms in a practical amount of time (e.g., building a database for D LRTA*(8) would have taken us over 800 hours, which is not practical). We detail our observations with respect to all three considerations below.

8.3 Solution Suboptimality and Per-Move Planning Time

We begin the comparisons by looking at average solution suboptimality versus average time per move. The left plot in Figure 12 shows the overall picture by plotting all algorithms and parameters. The right plot zooms in on a high-performance area. Table 1 shows the individual values. kNN LRTA* produces the highest quality solutions, followed by TBA*.

Even the best D LRTA* configuration, D LRTA*(9), has a mean suboptimality of 819.72%, which means that it delivers paths about 9 times costlier than optimal paths. Such suboptimality is impractical in pathfinding and we included D LRTA*(9) in the right subplot of Figure 12 only to illustrate the substantial gap in solution quality between D LRTA* and kNN LRTA*. Optimality of D LRTA* solutions can be improved by lowering the abstraction level but the database pre-computation time increases rapidly, as we discuss below.

TBA* produces solutions substantially less costly than D LRTA* but cannot match kNN LRTA* with database sizes of 60 and 80 thousand records. Additionally, TBA* is noticeably slower per move as it expands more than one state and allocates some time to backtracking as well. The time per move can be decreased by lowering the value of the cutoff but already with the cutoff of 10, TBA* produces unacceptably suboptimal solutions (666.5% suboptimal). As a result, kNN LRTA* dominates TBA* by outperforming it with respect to both measures. This is intuitive as TBA* does not benefit from subgoal precomputation.

On the other hand, D LRTA* stands non-dominated due to its low time per move. This is also intuitive as it does not have to scan the database for most similar records and then check hill-climbing reachability for them. The differences between D LRTA* and kNN LRTA* are, however, below 4 microseconds per move.
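The dominance relations discussed above can be checked mechanically from the values reported in Table 1. The sketch below is illustrative only: the function names are hypothetical, the A* row is left out because A* is not a real-time algorithm, and dominance is taken to mean "no worse on both measures and strictly better on at least one", which coincides with the stricter wording in the text here because the data contain no ties.

```python
# (mean time per move in microseconds, mean suboptimality in %), copied from Table 1
TABLE_1 = {
    "kNN LRTA*(10000)": (7.56, 6851.62),
    "kNN LRTA*(40000)": (6.88, 620.63),
    "kNN LRTA*(60000)": (6.40, 12.77),
    "kNN LRTA*(80000)": (6.55, 11.96),
    "D LRTA*(12)": (3.73, 15999.23),
    "D LRTA*(11)": (3.93, 8497.09),
    "D LRTA*(10)": (4.26, 6831.74),
    "D LRTA*(9)": (3.94, 819.72),
    "TBA*(5)": (14.31, 1504.54),
    "TBA*(10)": (26.34, 666.50),
    "TBA*(50)": (83.31, 131.12),
    "TBA*(100)": (117.52, 64.66),
}

def dominates(p, q):
    """True if point p is at least as good as q on both measures and better on one."""
    return p[0] <= q[0] and p[1] <= q[1] and (p[0] < q[0] or p[1] < q[1])

def pareto_front(points):
    """Return the names of all points that no other point dominates."""
    return {name for name, p in points.items()
            if not any(dominates(q, p) for q in points.values())}

front = pareto_front(TABLE_1)
print(sorted(front))
```

With these values the front consists of D LRTA*(12), D LRTA*(11), D LRTA*(9), kNN LRTA*(60000) and kNN LRTA*(80000), and every TBA* configuration is dominated by kNN LRTA*(60000).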

For the sake of reference, we also included A* results in the table. A* is not a real-time algorithm and its average time per move tends to increase with the number of states in the map. Also, it spends most of its time during the first move when it computes the entire path. Subsequent moves require a trivial computation. In the table, we define A*’s mean time per move as the total planning time for a problem divided by the number of moves in the path A* finds. We average this quantity over all problems. kNN LRTA* is about 30 times faster than A* per move.

4. We have also experimented with querying the kd-tree for the 100 most similar records and found a very minor improvement of suboptimality together with a significant increase in the mean time per move. This is because frequently a hill-climbing-reachable database record will be among the top 10 candidates and thus the extra time spent querying the kd-tree for 90 more records and then sorting them is wasted.

Figure 12: Suboptimality vs. time per move: all algorithms (left), high-performance region (right). Both panels plot mean suboptimality (%) against mean time per move (μseconds). The left panel shows D LRTA*, kNN LRTA* and TBA*; the right panel shows D LRTA*(9), kNN LRTA*(60000), kNN LRTA*(80000), TBA*(50) and TBA*(100).

Algorithm            Mean time per move (microseconds)    Solution suboptimality (%)
kNN LRTA*(10000)     7.56                                 6851.62
kNN LRTA*(40000)     6.88                                 620.63
kNN LRTA*(60000)     6.40                                 12.77
kNN LRTA*(80000)     6.55                                 11.96
D LRTA*(12)          3.73                                 15999.23
D LRTA*(11)          3.93                                 8497.09
D LRTA*(10)          4.26                                 6831.74
D LRTA*(9)           3.94                                 819.72
TBA*(5)              14.31                                1504.54
TBA*(10)             26.34                                666.50
TBA*(50)             83.31                                131.12
TBA*(100)            117.52                               64.66
A*                   208.03                               0

Table 1: Suboptimality versus time per move.

Note that kNN LRTA*’s time per move decreases with larger databases. This is intuitive as with more database records there is a higher probability that an earlier record on the short list of records for which the reachability checks are run will pass the checks (line 4 in Figure 8). Consequently, no further time-consuming reachability checks will be administered in the function selectSubgoals, saving time per move. These time savings, resulting from a larger database, outweigh the extra time spent traversing a correspondingly larger kd-tree to form the short list of most similar records. This fact indicates that the kd-tree approach scales well with the database size.


Figure 13: Suboptimality versus database pre-computation time per map. Left: all pre-computing algorithms. Right: a high-performance subplot. Both panels plot mean suboptimality (%) against mean pre-computation time (hours per map). The left panel shows D LRTA* and kNN LRTA*; the right panel shows D LRTA*(9), kNN LRTA*(40000), kNN LRTA*(60000) and kNN LRTA*(80000).

Algorithm            Pre-computation time per map (hours)    Solution suboptimality (%)
kNN LRTA*(10000)     13.10                                   6851.62
kNN LRTA*(40000)     51.89                                   620.63
kNN LRTA*(60000)     77.30                                   12.77
kNN LRTA*(80000)     103.09                                  11.96
D LRTA*(12)          0.25                                    15999.23
D LRTA*(11)          1.57                                    8497.09
D LRTA*(10)          11.95                                   6831.74
D LRTA*(9)           89.88                                   819.72

Table 2: Suboptimality versus database pre-computation time.

8.4 Database Pre-computation Time

Suboptimality versus database pre-computation time is shown in Figure 13. The left subplot demonstrates all parametrizations of D LRTA* and kNN LRTA* while the right plot focuses on better performing configurations. Table 2 shows the individual values.

kNN LRTA* has three advantages over D LRTA*. First, kNN LRTA* with 40 and 60 thousand records easily dominates D LRTA*(9): it has better suboptimality while requiring less pre-computation time. kNN LRTA*(80000) is overkill for these maps as it does not improve suboptimality by much (11.96% versus 12.77% achievable with 60000 records) while having the longest precomputation time.

Second, the database computation can be parallelized more easily in the case of kNN LRTA* as the individual records are completely independent of each other. This is not the case with D LRTA*. Additionally, D LRTA* requires building a map abstraction which is more complex to do in parallel.

Third, the number of records in the kNN LRTA* database can be controlled much more easily than in that of D LRTA*. Specifically, in D LRTA* one controls the level of abstraction. The number $S_a$ of abstract states at abstraction level $a$ is approximately $S/\alpha^a$, where $S$ is the number of original non-abstract states and $\alpha$ is a constant reduction factor (Bulitko et al., 2007). The number $N_a$ of records in the D LRTA* database is $S_a(S_a - 1)$. Thus, the ratio between $N_a$ and $N_{a-1}$ is:

\[
\frac{N_{a-1}}{N_a}
= \frac{S_{a-1}(S_{a-1} - 1)}{S_a(S_a - 1)}
= \frac{\frac{S}{\alpha^{a-1}}\left(\frac{S}{\alpha^{a-1}} - 1\right)}{\frac{S}{\alpha^{a}}\left(\frac{S}{\alpha^{a}} - 1\right)}
= \frac{S - \alpha^{a-1}}{S - \alpha^{a}}\,\alpha^2
= \Omega(\alpha^2).
\]

Thus by decreasing the level of abstraction by one, the D LRTA* database size grows at least quadratically in α. On our maps, clique abstraction has α of approximately 3, which means that the database size (and pre-computation time) increases by nearly an order of magnitude when we go down by one level of abstraction. To illustrate, building a database for D LRTA*(8) is estimated to take over 800 CPU-hours. On the other hand, the number of records in the kNN LRTA* database is a user-specified parameter, affording much greater control.
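The growth argument above can be checked numerically. The following sketch assumes the idealized model from the text, $S_a = S/\alpha^a$ and $N_a = S_a(S_a - 1)$. The value S = 10,000,000 and the function names are illustrative assumptions rather than measurements from the maps.

```python
import math

def record_count(S, alpha, a):
    """Idealized number of D LRTA* records at abstraction level a."""
    s_a = S / alpha ** a          # approximate number of abstract states
    return s_a * (s_a - 1)        # one record per ordered pair of distinct states

def growth_ratio(S, alpha, a):
    """N_{a-1} / N_a: database growth when the abstraction level drops from a to a-1."""
    return record_count(S, alpha, a - 1) / record_count(S, alpha, a)

def closed_form(S, alpha, a):
    """The simplified expression alpha^2 * (S - alpha^(a-1)) / (S - alpha^a)."""
    return alpha ** 2 * (S - alpha ** (a - 1)) / (S - alpha ** a)

S, alpha = 10_000_000, 3.0        # illustrative values only
for a in (12, 11, 10, 9):
    ratio = growth_ratio(S, alpha, a)
    assert math.isclose(ratio, closed_form(S, alpha, a))
    print(f"level {a} -> {a - 1}: database grows by a factor of {ratio:.2f}")
```

Under this model every ratio is at least α² = 9. For comparison, the average record counts reported in Table 3 grow by factors of roughly 7.5 to 7.8 per level, which would correspond to an effective α slightly below 3.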

Of particular interest is the pair kNN LRTA* with a database of 10000 records and D LRTA* with abstraction level 10, as they perform closely in both measures. We discuss differences in their database sizes in the next section.

8.5 Database Size

Memory is at a premium in video games, especially on consoles. TBA*’s space complexity comes from its open and closed lists, which it builds on-line. kNN LRTA* and D LRTA* expand only a single state (the agent’s current state) and thus have a closed list of one state and an open list of at most eight states (as any grid cell in our maps has at most eight neighbors). However, these two algorithms consume memory as they store updated heuristic values. Additionally, they store their subgoal databases. In this section we focus on the database size. The next section will cover the total memory consumed on-line: open and closed lists as well as the updated heuristic values.

Each D LRTA* database record stores exactly three states. kNN LRTA* records have two or more states each and the number of records is fixed by the algorithm parameter. Additionally, kNN LRTA* stores the start and end states of each record in a kd-tree. We define relative database size as the ratio of the total number of states stored in all records to the total number of map grid cells.

In addition to subgoal records, D LRTA* databases contain an explicit region assignment for each state. Consequently, D LRTA* databases have a relative size of at least 1. This extra storage is a major weakness of D LRTA* in comparison to kNN LRTA*. To illustrate, since our implementation uses 32 bits to index states, storing the region assignment for each grid cell translates to an average of about 84 megabytes per map. Full results are found in Figure 14 and Table 3.
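The cost of the explicit region assignment follows from simple arithmetic on the grid dimensions. The sketch below is a rough estimate under stated assumptions: one 32-bit entry per grid cell (as in the text), a megabyte taken as 2^20 bytes (the unit convention is not specified in the text), and hypothetical function names.

```python
BYTES_PER_STATE_INDEX = 4  # 32-bit state index, as stated in the text

def region_table_megabytes(width, height):
    """Memory needed to label every grid cell with a region, in units of 2**20 bytes."""
    return width * height * BYTES_PER_STATE_INDEX / 2 ** 20

def relative_database_size(states_stored, grid_cells):
    """Relative size: states stored in all records divided by the number of grid cells."""
    return states_stored / grid_cells

# The smallest and largest grid dimensions mentioned in Section 8.1.
print(region_table_megabytes(4096, 4604))
print(region_table_megabytes(7261, 4096))
```

The reported average of about 84 megabytes per map falls between these two estimates. Because the region table alone holds one entry per cell, the relative size of a D LRTA* database cannot drop below 1.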

Algorithm            Pre-computation time (hours)    Records     Relative size    Size (megabytes)
kNN LRTA*(10000)     13.10                           10000       0.00308          0.25
kNN LRTA*(40000)     51.89                           40000       0.01234          1.00
kNN LRTA*(60000)     77.30                           60000       0.01851          1.51
kNN LRTA*(80000)     103.09                          80000       0.02468          2.01
D LRTA*(12)          0.25                            251.5       1.00001          84.96
D LRTA*(11)          1.57                            1896.5      1.00009          84.97
D LRTA*(10)          11.95                           14872.0     1.00068          85.02
D LRTA*(9)           89.88                           116048.5    1.00532          85.40

Table 3: Database statistics. All values are averages per map. Pre-computation time is in hours.


Figure 14: Suboptimality vs. database size: all algorithms (left), high-performance region (right). Both panels plot mean suboptimality (%) against mean relative database size (per map). The left panel shows D LRTA* and kNN LRTA*; the right panel shows D LRTA*(9), kNN LRTA*(60000) and kNN LRTA*(80000).

Again, kNN LRTA* dominates D LRTA* by requiring much less memory and, at the same time, producing solutions of better quality. For instance, kNN LRTA*(60000) requires approximately 57 times less database memory than D LRTA*(9) while simultaneously producing solutions that are about eight times better.

Let us now re-visit the interesting case of kNN LRTA*(10000) and D LRTA*(10) which closely match each other with respect to database pre-computation time and solution suboptimality. As Table 3 reveals, kNN LRTA*(10000) uses approximately 340 times less memory than D LRTA*(10): about 256 kilobytes versus 85.02 megabytes.

Note that D LRTA*(10) averages approximately 49% more records per map than kNN LRTA*(10000) but requires approximately 9% less pre-computation time. This is because (i) D LRTA* averages fewer subgoals per record than kNN LRTA* and (ii) computing D LRTA* subgoals does not require reachability checks, which are time-consuming.

Also note that despite having 49% more records, D LRTA* affords only a 0.3% improvement in solution quality over kNN LRTA*. More generally, additional experiments have demonstrated that kNN LRTA* tends to outperform D LRTA* in solution quality given the same number of records. There are two factors in play here. First, kNN LRTA* records often contain several subgoals that are guaranteed to be reachable from each other without scrubbing. D LRTA* records offer no such guarantees and their only subgoal may be difficult to reach from the agent’s start state when abstract regions become large and complex. Second, on the upside, D LRTA* spaces its records in a systematic fashion (one record per pair of regions), thereby providing a potentially better coverage than can be afforded by randomly selected starts and ends of kNN LRTA* database records. It appears that the former factor overcomes the latter, leading to kNN LRTA*’s better per-record suboptimality.

8.6 On-line Space Complexity

We will first analyze specifically the amount of memory allocated by the algorithms on-line. When an algorithm solves a particular problem, we record the maximum size of its open and closed lists as well as the total number of states whose heuristic values were updated. We count each updated heuristic value as one state in terms of storage required.⁵ Adding these three measures together, we record the amount of strictly on-line memory per problem. Averaging the strictly on-line memory over all problems, we list the results in Table 4.

5. Multiple heuristic updates in the same state do not increase the amount of storage.

Algorithm            Strictly on-line memory (Kbytes)    Solution suboptimality (%)
kNN LRTA*(10000)     8.62                                6851.62
kNN LRTA*(40000)     5.04                                620.63
kNN LRTA*(60000)     4.23                                12.77
kNN LRTA*(80000)     4.22                                11.96
D LRTA*(12)          18.76                               15999.23
D LRTA*(11)          11.09                               8497.09
D LRTA*(10)          8.24                                6831.74
D LRTA*(9)           3.04                                819.72
TBA*(5)              1353.94                             1504.54
TBA*(10)             1353.94                             666.50
TBA*(50)             1353.94                             83.31
TBA*(100)            1353.94                             64.66
A*                   1353.94                             0

Table 4: Strictly on-line memory versus solution suboptimality.

kNN LRTA* dominates all D LRTA* points except D LRTA*(9) which has the lowest mean strictly on-line memory of 3.04 Kbytes per problem. TBA*, being effectively a time-sliced A*, does not update heuristic values at all. However, its open and closed lists contribute to the highest memory consumption at 1353.94 Kbytes. This is intuitive as TBA* does not use subgoals and therefore must “fill in” potentially large heuristic depressions with its open and closed lists. Also, notice that the total size of these lists does not change with the cutoff as state expansions are independent of the agent’s moves in TBA*. A* has identical memory consumption as it expands states in the same way as TBA*. Again, kNN LRTA* dominates TBA* for all cutoff values, using less memory and producing better solutions.

Strictly on-line memory gives an insight into the algorithms but does not present a complete picture. Specifically, D LRTA* and kNN LRTA* must load their databases into their on-line memory. Thus we define the cumulative on-line memory as the strictly on-line memory plus the size of the database loaded. The values are found in Figure 15 and Table 5.

Several observations are due. First, TBA* is no longer dominated due to its low memory consumption. Second, D LRTA* is in its own league due to explicitly labelling every state with a corresponding region as well as computing subgoals for all pairs of regions. Third, D LRTA* has a sweet spot in its memory consumption which corresponds to the abstraction level of 11. A higher level of abstraction reduces the database size but not enough to compensate for more updated heuristic values. Lower abstraction levels reduce the amount of learning but not enough to compensate for the large number of subgoals in the database.


Figure 15: Suboptimality versus cumulative on-line memory. Left: all algorithms. Right: a high-performance subplot. Both panels plot mean suboptimality (%) against mean cumulative on-line memory (Kbytes). The left panel shows D LRTA*, kNN LRTA* and TBA*; the right panel shows kNN LRTA*(60000), kNN LRTA*(80000), TBA*(50) and TBA*(100).

Algorithm            Cumulative on-line memory (Kbytes)    Solution suboptimality (%)
kNN LRTA*(10000)     265.65                                6851.62
kNN LRTA*(40000)     1034.08                               620.63
kNN LRTA*(60000)     1547.85                               12.77
kNN LRTA*(80000)     2062.20                               11.96
D LRTA*(12)          87019.74                              15999.23
D LRTA*(11)          87018.50                              8497.09
D LRTA*(10)          87066.34                              6831.74
D LRTA*(9)           87456.35                              819.72
TBA*(5)              1353.94                               1504.54
TBA*(10)             1353.94                               666.50
TBA*(50)             1353.94                               83.31
TBA*(100)            1353.94                               64.66
A*                   1353.94                               0

Table 5: Solution suboptimality versus cumulative on-line memory.

8.7 Simultaneous Pathfinding by Multiple Agents

If only a single agent is pathfinding at a time, the analysis above holds and TBA* is the most memory efficient choice. However, in most video games, anywhere from half a dozen to a thousand agents (e.g., Gas Powered Games, 2007) can be pathfinding simultaneously on the same map.

Such a scenario favors kNN LRTA* and D LRTA* whose subgoal databases are map-specific but independent of start and goal states. Consequently, multiple agents running D LRTA* and kNN LRTA* can all share the same subgoal database.⁶ In contrast, all memory consumed by TBA* is specific to a given agent and cannot be shared with other agents operating on the same map.

6. Note that such multiple agents cannot, generally speaking, share their heuristic h as it is computed and updated with respect to different goals.


Hence the total amount of cumulative on-line memory for K agents operating simultaneously equals the amount of the database memory plus K times the amount of strictly on-line memory. A break-even point for an algorithm A with respect to an algorithm B is then defined as the minimal number of agents using A that collectively consume less memory than the same number of agents using B. Table 6 lists the break-even points for D LRTA* and kNN LRTA* with respect to TBA*.

Algorithm            Break-even point with respect to TBA* (number of agents)
kNN LRTA*(10000)     1
kNN LRTA*(40000)     1
kNN LRTA*(60000)     2
kNN LRTA*(80000)     2
D LRTA*(9)           65
D LRTA*(10)          65
D LRTA*(11)          66
D LRTA*(12)          66

Table 6: Break-even points for kNN LRTA* and D LRTA* with respect to TBA*.

kNN LRTA* with ten and forty thousand record databases requires less cumulative on-line memory than TBA* and hence the break-even point is just one agent. For sixty and eighty thousand records, two kNN LRTA* agents take less total cumulative on-line memory than two TBA* agents. It takes 65 to 66 simultaneously pathfinding agents to amortize the large D LRTA* databases and gain a memory advantage over TBA*.
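The break-even definition above translates directly into a small search over the number of agents K. The sketch below is an illustration with a hypothetical function name. It assumes that the shared database memory can be recovered as the cumulative on-line memory (Table 5) minus the strictly on-line memory (Table 4), and that TBA* needs 1353.94 Kbytes per agent with nothing shared.

```python
def break_even(database_kb, per_agent_kb, baseline_per_agent_kb, max_agents=100_000):
    """Smallest K with database + K * per_agent < K * baseline, or None if never reached."""
    for k in range(1, max_agents + 1):
        if database_kb + k * per_agent_kb < k * baseline_per_agent_kb:
            return k
    return None

TBA_KB = 1353.94  # strictly on-line memory of one TBA* agent (Table 4)

# kNN LRTA*(60000): cumulative 1547.85 Kbytes, strictly on-line 4.23 Kbytes
print(break_even(1547.85 - 4.23, 4.23, TBA_KB))
# D LRTA*(9): cumulative 87456.35 Kbytes, strictly on-line 3.04 Kbytes
print(break_even(87456.35 - 3.04, 3.04, TBA_KB))
```

Because the inputs here are the averaged values from Tables 4 and 5, the result can differ by one agent from Table 6 for some configurations (D LRTA*(11) comes out as 65 rather than 66 this way). The two examples shown agree with the table.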

9. Discussion

This is the first time the high-performance real-time search algorithms TBA*, D LRTA* and kNN LRTA* were evaluated on maps of contemporary size. The results, presented in detail in the previous section, are summarized for representative algorithms in Table 7.

Dimension               kNN LRTA* versus other algorithms
Suboptimality           kNN LRTA* is 8.16 times better than D LRTA* and 1.46 times better than TBA*
Time per move           kNN LRTA* is 18.36 times better than TBA* and 62% worse than D LRTA*
Cumulative memory       kNN LRTA* is 57 times better than D LRTA* and 13% worse than TBA*
Break-even point        kNN LRTA* takes less memory than TBA* with two or more agents
Pre-computation time    kNN LRTA* is 14% better than D LRTA*

Table 7: Comparisons of kNN LRTA*(60000) to D LRTA*(9) and TBA*(100).
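Several entries of Table 7 can be reproduced from the earlier tables. The sketch below is a consistency check rather than the authors' stated procedure: it assumes that a suboptimality of x% corresponds to a path costing (100 + x)% of the optimum, so the "times better" figures for suboptimality are ratios of path costs. The function name is hypothetical.

```python
def cost_ratio(subopt_a, subopt_b):
    """How many times costlier B's paths are than A's, given suboptimality in percent."""
    return (100 + subopt_b) / (100 + subopt_a)

# kNN LRTA*(60000) versus D LRTA*(9) and TBA*(100), values from Tables 1 and 2.
subopt_vs_dlrta = cost_ratio(12.77, 819.72)   # suboptimality, versus D LRTA*(9)
subopt_vs_tba = cost_ratio(12.77, 64.66)      # suboptimality, versus TBA*(100)
time_vs_tba = 117.52 / 6.40                   # time per move, versus TBA*(100)
time_vs_dlrta = 6.40 / 3.94 - 1               # fraction slower than D LRTA*(9)
precomp_vs_dlrta = 1 - 77.30 / 89.88          # fraction of pre-computation time saved

print(f"{subopt_vs_dlrta:.2f} {subopt_vs_tba:.2f} {time_vs_tba:.2f}")
print(f"{time_vs_dlrta:.0%} {precomp_vs_dlrta:.0%}")
```

Under this reading the computed values round to the 8.16, 1.46, 18.36, 62% and 14% reported in the table.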

kNN LRTA* achieves the best suboptimality of the three algorithms. kNN LRTA* is substantially faster per move than TBA* and is on par with D LRTA*. In terms of cumulative on-line memory, kNN LRTA* outperforms D LRTA* by nearly two orders of magnitude (a factor of 57) and is only 13% worse than TBA*. Furthermore, with two or more simultaneously planning agents, kNN LRTA* takes less memory than TBA*. In contrast, it takes 65 or more D LRTA* agents to amortize its database and gain a memory advantage over TBA*. Off-line, kNN LRTA* outperforms D LRTA* by achieving an order of magnitude better solutions with a database two orders of magnitude smaller in size and slightly faster to compute.

While the results of comparisons to TBA* were expected as TBA* does not benefit from pre-computation, the comparison between kNN LRTA* and D LRTA* unearthed unexpected results. Specifically, subgoal databases of kNN LRTA* are a more effective use of pre-computation time and memory than those of D LRTA*. The lower memory consumption with kNN LRTA* databases is achieved by not having to store explicit region membership. Better pre-computation times come from not having to compute shortest paths for all pairs of abstract states. Finally, kNN LRTA* is better than D LRTA* on a per-record basis. This is because compressing an entire optimal path into a series of subgoals reachable from each other via hill-climbing guarantees that once a single subgoal is reached, the underlying LRTA* agent will reach its global goal without scrubbing. In contrast, D LRTA* subgoals can be difficult to reach from the agent’s current position. And even when reached, the difficulties can recur with the subsequent subgoals.

In terms of applications, kNN LRTA* can be the algorithm of choice to use in video game pathfinding. For instance, kNN LRTA*(60000) is over 30 times faster per average move than the commonly used A* and produces solutions which are less than 15% suboptimal. This performance comes at the cost of about 77 hours of pre-computation time per map, which can be easily reduced to under 10 hours on a modern eight-core workstation. This is negligible compared to the amount of time a game company spends hand-crafting a single map.

10. Current Shortcomings and Future Work

Despite outperforming existing state-of-the-art real-time search algorithms on our problems overall, kNN LRTA* has several shortcomings. First, the database records are generated from randomly selected start and end states. This means that the coverage of the space is not necessarily even: some small, but difficult to reach, regions of the space may never get a suitable record while some easy to reach regions may get multiple redundant records covering them.

Increasing database efficiency would allow a smaller database to afford an equal coverage and hence an equal on-line performance. This will in turn reduce the pre-computation time of the kNN LRTA* database, which can presently reach over 100 hours per map. While the computation can be sped up at a nearly linear scale by using multi-core processors and the time is affordable on the game company side, most players would want their home-made game maps to be processed in a matter of seconds or minutes.

Making the subgoal coverage more uniform can be accomplished via forgoing random start and end selection in favor of space partitioning. However, unlike D LRTA*’s abstract regions built via repeated applications of the clique abstraction, such partitions should have all their states reachable from each other by LRTA* without scrubbing. Then start and end states for the database records can be selected within these partitions. To reduce the amount of pre-computation one can then compute subgoals by compressing optimal paths only between neighboring regions (as opposed to all distinct abstract regions as D LRTA* does). Note that unlike in D LRTA*, the partitioning is needed only off-line and no explicit region assignment is stored for every state. As a result, the on-line memory consumption will be comparable to or better than that of the existing kNN LRTA*.

A philosophically oriented project would be to develop a self-aware agent. Specifically, such an agent would analyze the performance of its core algorithm (e.g., LRTA*) and decide on the appropriate partitioning scheme. A similar meta-level control has been previously attempted for dynamic selection of lookahead depth in real-time search (e.g., Russell & Wefald, 1991).

11. Beyond Grid Pathfinding

We presented and evaluated kNN LRTA* for grid-based pathfinding. Formally, the algorithm, with the exception of its kd-tree module, is applicable to arbitrary weighted graphs that satisfy the constraints at the beginning of Section 2. In principle, it should be applicable to general planning using the ideas from search-based planners ASP (Bonet, Loerincs, & Geffner, 1997), the HSP-family (Bonet & Geffner, 2001), FF (Hoffmann, 2000), SHERPA (Koenig, Furcy, & Bauer, 2002) and LDFS (Bonet & Geffner, 2006).

As we described earlier in the paper, using the kd-tree index requires a certain correspondence between coordinate similarity and heuristic distance. Extending kd-trees or developing appropriate new index structures for an arbitrary graph is an open research question. An interim solution is to apply kNN LRTA* to arbitrary search problems without its kd-tree module. Doing so will, however, slow down the on-line part because similarity must be computed between the agent’s current situation and every single record in the database. On the positive side, not computing kd-trees will speed up the off-line part of kNN LRTA*.

Finally, while kNN LRTA* is theoretically applicable to arbitrary search problems, it is not clear how well it will perform there with respect to its competitors D LRTA* and TBA*. Such an investigation is left for future work.
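The interim solution of retrieving similar records without a kd-tree amounts to a linear scan over the database. The sketch below is illustrative only. The record layout (a pair of start and goal coordinates, with subgoals omitted), the octile heuristic, and in particular the dissimilarity measure (the larger of the two heuristic distances between corresponding start and goal states) are assumptions made for this example and should be replaced by the similarity definition used in the rest of the paper.

```python
import heapq
import math

def octile(a, b):
    """Octile distance between two grid cells (assumed heuristic)."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return max(dx, dy) + (math.sqrt(2) - 1) * min(dx, dy)

def dissimilarity(problem, record):
    """Assumed measure: worst heuristic mismatch between starts and between goals."""
    (start, goal), (rec_start, rec_goal) = problem, record
    return max(octile(start, rec_start), octile(goal, rec_goal))

def k_most_similar(problem, database, k=10):
    """Linear scan: score every record, keep the k with the smallest dissimilarity."""
    return heapq.nsmallest(k, database, key=lambda rec: dissimilarity(problem, rec))

database = [
    ((0, 0), (9, 9)),
    ((5, 5), (6, 6)),
    ((0, 1), (9, 8)),
    ((20, 20), (30, 30)),
]
print(k_most_similar(((0, 0), (9, 9)), database, k=2))
```

Each query touches every record, so the per-move cost grows linearly with the database size, which is the slowdown the text warns about.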

12. Conclusions

In this paper we considered the problem of real-time heuristic search whose planning time per move does not depend on the number of states. We proposed a new mechanism for selecting subgoals automatically. The resulting algorithm was shown to be theoretically complete and, on large video game maps, substantially outperformed the previous state-of-the-art algorithms D LRTA* and TBA* along several important performance measures.

Acknowledgments

This research was supported by grants from the Natural Sciences and Engineering Research Council of Canada (NSERC); Icelandic Centre for Research (RANNIS); and by a Marie Curie Fellowship of the European Community programme Structuring the ERA under contract number MIRG-CT-2005-017284. We appreciate the help of Josh Sterling, Stephen Hladky and Daniel Huntley.

References

Anwar, M. A., & Yoshida, T. (2001). Integrating OO road network database, cases and knowledge for route finding. In ACM Symposium on Applied Computing (SAC), pp. 215–219. ACM.

Barto, A. G., Bradtke, S. J., & Singh, S. P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1), 81–138.

BioWare Corp. (1998). Baldur’s Gate. Published by Interplay, http://www.bioware.com/bgate/, November 30, 1998.


Bjornsson, Y., Bulitko, V., & Sturtevant, N. (2009). TBA*: Time-bounded A*. In Proceedings ofthe International Joint Conference on Artificial Intelligence (IJCAI), pp. 431 – 436, Pasadena,California. AAAI Press.

Bjornsson, Y., & Halldorsson, K. (2006). Improved heuristics for optimal path-finding on gamemaps. In Laird, J. E., & Schaeffer, J. (Eds.), Proceedings of the Second Artificial Intelligenceand Interactive Digital Entertainment Conference (AIIDE), June 20-23, 2006, Marina delRey, California, pp. 9–14. The AAAI Press.

Blizzard Entertainment (2002). Warcraft III: Reign of chaos., Published by Blizzard Entertainment,http://www.blizzard.com/war3, July 3, 2002.

Bonet, B., & Geffner, H. (2001). Planning as heuristic search. Artificial Intelligence, 129(1–2), 5–33.

Bonet, B., & Geffner, H. (2006). Learning depth-first search: A unified approach to heuristic search in deterministic and non-deterministic settings, and its application to MDPs. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), pp. 142–151, Cumbria, UK.

Bonet, B., Loerincs, G., & Geffner, H. (1997). A fast and robust action selection mechanism for planning. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pp. 714–719, Providence, Rhode Island. AAAI Press / MIT Press.

Branting, K., & Aha, D. W. (1995). Stratified case-based reasoning: Reusing hierarchical problem solving episodes. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 384–390.

Bulitko, V. (2004). Learning for adaptive real-time search. Tech. rep. http://arxiv.org/abs/cs.AI/0407016, Computer Science Research Repository (CoRR).

Bulitko, V., & Bjornsson, Y. (2009). kNN LRTA*: Simple subgoaling for real-time search. In Proceedings of Artificial Intelligence and Interactive Digital Entertainment (AIIDE), pp. 2–7, Stanford, California. AAAI Press.

Bulitko, V., Bjornsson, Y., Lustrek, M., Schaeffer, J., & Sigmundarson, S. (2007). Dynamic Control in Path-Planning with Real-Time Heuristic Search. In Proceedings of the International Conference on Automated Planning and Scheduling (ICAPS), pp. 49–56, Providence, RI.

Bulitko, V., & Lee, G. (2006). Learning in real time search: A unifying framework. Journal of Artificial Intelligence Research (JAIR), 25, 119–157.

Bulitko, V., Lustrek, M., Schaeffer, J., Bjornsson, Y., & Sigmundarson, S. (2008). Dynamic control in real-time heuristic search. Journal of Artificial Intelligence Research (JAIR), 32, 419–452.

Bulitko, V., Sturtevant, N., Lu, J., & Yau, T. (2007). Graph abstraction in real-time heuristic search. Journal of Artificial Intelligence Research (JAIR), 30, 51–100.

Carbonell, J. G., Knoblock, C., & Minton, S. (1990). Prodigy: An integrated architecture for planning and learning. In Lehn, K. V. (Ed.), Architectures for Intelligence. Lawrence Erlbaum Associates.

Cazenave, T. (2006). Optimizations of data structures, heuristics and algorithms for path-finding on maps. In Louis, S. J., & Kendall, G. (Eds.), Proceedings of the 2006 IEEE Symposium on Computational Intelligence and Games (CIG06), University of Nevada, Reno, campus in Reno/Lake Tahoe, 22-24 May, 2006, pp. 27–33. IEEE.

Culberson, J., & Schaeffer, J. (1998). Pattern Databases. Computational Intelligence, 14(3), 318–334.

Furcy, D., & Koenig, S. (2000). Speeding up the convergence of real-time search. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pp. 891–897.

Gas Powered Games (2007). Supreme Commander., Published by THQ, http://www.supremecommander.com/, February 20, 2007.

Geisberger, R., Sanders, P., Schultes, D., & Delling, D. (2008). Contraction hierarchies: Faster and simpler hierarchical routing in road networks. In McGeoch, C. C. (Ed.), WEA, Vol. 5038 of Lecture Notes in Computer Science, pp. 319–333. Springer.

Haigh, K., & Veloso, M. (1993). Combining search and analogical reasoning in path planning from road maps. In Proceedings of the AAAI-93 Workshop on Case-Based Reasoning, pp. 79–85, Washington, DC. AAAI. AAAI Press technical report WS-93-01.

Hart, P., Nilsson, N., & Raphael, B. (1968). A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2), 100–107.

Hernandez, C., & Meseguer, P. (2005a). Improving convergence of LRTA*(k). In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Workshop on Planning and Learning in A Priori Unknown or Dynamic Domains, pp. 69–75, Edinburgh, UK.

Hernandez, C., & Meseguer, P. (2005b). LRTA*(k). In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 1238–1243, Edinburgh, UK.

Hodal, J., & Dvorak, J. (2008). Using case-based reasoning for mobile robot path planning. Engineering Mechanics, 15, 181–191.

Hoffmann, J. (2000). A heuristic for domain independent planning and its use in an enforced hill-climbing algorithm. In Proceedings of the 12th International Symposium on Methodologies for Intelligent Systems (ISMIS), pp. 216–227.

Ishida, T. (1992). Moving target search with intelligence. In National Conference on Artificial Intelligence (AAAI), pp. 525–532.

Koenig, S. (2004). A comparison of fast search methods for real-time situated agents. In Proceedings of Int. Joint Conf. on Autonomous Agents and Multiagent Systems, pp. 864–871.

Koenig, S., Furcy, D., & Bauer, C. (2002). Heuristic search-based replanning. In Proceedings of the Int. Conference on Artificial Intelligence Planning and Scheduling, pp. 294–301.

Koenig, S., & Likhachev, M. (2006). Real-time adaptive A*. In Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 281–288.

Korf, R. (1985). Depth-first iterative deepening: An optimal admissible tree search. Artificial Intelligence, 27(3), 97–109.

Korf, R. (1990). Real-time heuristic search. Artificial Intelligence, 42(2–3), 189–211.

Likhachev, M., Ferguson, D. I., Gordon, G. J., Stentz, A., & Thrun, S. (2005). Anytime dynamic A*: An anytime, replanning algorithm. In ICAPS, pp. 262–271.

Likhachev, M., Gordon, G. J., & Thrun, S. (2004). ARA*: Anytime A* with provable bounds on sub-optimality. In Thrun, S., Saul, L., & Scholkopf, B. (Eds.), Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA.

Lustrek, M., & Bulitko, V. (2006). Lookahead pathology in real-time path-finding. In Proceedings of the National Conference on Artificial Intelligence (AAAI), Workshop on Learning For Search, pp. 108–114, Boston, Massachusetts.

Moore, A. (1991). Efficient Memory-based Learning for Robot Control. Ph.D. thesis, University of Cambridge.

Nebel, B., & Koehler, J. (1995). Plan reuse versus plan generation: A theoretical and empirical analysis. Artificial Intelligence, 76, 427–454.

Rayner, D. C., Davison, K., Bulitko, V., Anderson, K., & Lu, J. (2007). Real-time heuristic search with a priority queue. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 2372–2377, Hyderabad, India.

Russell, S., & Wefald, E. (1991). Do the right thing: Studies in limited rationality. MIT Press.

Shimbo, M., & Ishida, T. (2003). Controlling the learning process of real-time heuristic search. Artificial Intelligence, 146(1), 1–41.

Shue, L.-Y., Li, S.-T., & Zamani, R. (2001). An intelligent heuristic algorithm for project scheduling problems. In Proceedings of the 32nd Annual Meeting of the Decision Sciences Institute, San Francisco.

Shue, L.-Y., & Zamani, R. (1993). An admissible heuristic search algorithm. In Proceedings of the 7th International Symposium on Methodologies for Intelligent Systems (ISMIS-93), Vol. 689 of LNAI, pp. 69–75.

Sigmundarson, S., & Bjornsson, Y. (2006). Value Back-Propagation vs. Backtracking in Real-Time Search. In Proceedings of the National Conference on Artificial Intelligence (AAAI), Workshop on Learning For Search, pp. 136–141, Boston, Massachusetts, USA.

Stentz, A. (1995). The focussed D* algorithm for real-time replanning. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 1652–1659.

Sturtevant, N. (2007). Memory-efficient abstractions for pathfinding. In Proceedings of the third conference on Artificial Intelligence and Interactive Digital Entertainment, pp. 31–36, Stanford, California.

Sturtevant, N., & Buro, M. (2005). Partial pathfinding using map abstraction and refinement. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pp. 1392–1397, Pittsburgh, Pennsylvania.

Sturtevant, N. R., Felner, A., Barrer, M., Schaeffer, J., & Burch, N. (2009). Memory-based heuristics for explicit state spaces. In Boutilier, C. (Ed.), IJCAI 2009, Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, California, USA, July 11-17, 2009, pp. 609–614.

Valve Corporation (2004). Counter-Strike: Source., Published by Valve Corporation, http://store.steampowered.com/app/240/, October 7, 2004.

Weng, M., Wei, X., Qu, R., & Cai, Z. (2009). A path planning algorithm based on typical case reasoning. Geo-spatial Information Science, 12, 66–71.
