
FACULDADE DE ENGENHARIA DA UNIVERSIDADE DO PORTO

Development of Artificial Intelligence Systems for Stealth Games based on the Monte Carlo Method

Diogo Albuquerque Valente Silva

Mestrado Integrado em Engenharia Informática e Computação

Supervisor: Eugénio da Costa Oliveira

Co-Supervisor: Pedro Gonçalo Ferreira Alves Nogueira

July 23, 2014


Abstract

Stealth elements have been present in the gaming industry since 1981, with the release of the first game that required the player to hide, sneak between hiding spots and avoid lights to achieve a goal. Stealth-based games have evolved throughout the years, always centred on the player. This thesis intends to help change the paradigm of this kind of game by creating an agent that, instead of acting as a reactive opponent, has the ability to plan ahead inside the game world, in effect taking the role of the player in a stealth game. To test this agent, a simulator with procedurally generated, stealth-centred content was created in parallel with the agent's development. This shift in focus opens the way for new mechanics and a new kind of meta-game, hopefully driving the creation of a new type of game.


Acknowledgements

First and foremost, I want to show my gratitude to Prof. Augusto de Sousa for his continual work in keeping the great gears of MIEIC running.

I also want to express my gratitude to my supervisors, Prof. Eugénio da Costa Oliveira and MSc Pedro Gonçalo Ferreira Alves Nogueira. Without their approval and vision, this project and this great opportunity would never have come to be.

I'd like to thank my family for all the support given through the tough development times. A special thank you to Álvaro Valente da Silva.

I need to thank all my friends for everything, but especially for the late hours of company while working.

I'd also like to show my gratitude to MSc António Sérgio Ferreira, for all the wise words and encouragement without which the trek would have been much harder.

Diogo Albuquerque Valente da Silva


"There is an art, or, rather, a knack to flying. The knack lies in learning how to throw yourself atthe ground and miss."

Hitchiker’s Guide to the Galaxy, Douglas Adams


Contents

1 Introduction
  1.1 Context
  1.2 Motivation and Goals
  1.3 Structure of the Thesis

2 State of the Art
  2.1 Intelligent Agents for Video Games
    2.1.1 Planning
    2.1.2 Monte-Carlo Tree Search
    2.1.3 Movement
    2.1.4 Procedurally Generated Content

3 Conceptual Model
  3.1 Agent Architecture
    3.1.1 Goal-Oriented Action Planning
    3.1.2 Goals
    3.1.3 Actions
    3.1.4 Planner
    3.1.5 Sensors
    3.1.6 Knowledge
  3.2 Simulation platform

4 Implementation
  4.1 Simulator
    4.1.1 Map elements
  4.2 Agent
    4.2.1 Sensors
    4.2.2 Knowledge
    4.2.3 Goals
    4.2.4 Actions
    4.2.5 Planner
    4.2.6 Movement

5 Experimental Results
  5.1 Experimental Setup
  5.2 Experimental Results
  5.3 Discussion

6 Conclusions and Future Work
  6.1 Goal Assessment
  6.2 Future Work

References

A Procedurally generated simulator
  A.0.1 Procedural map generation


List of Figures

2.1 Sample agent using FSMs by Fernando Bevilacqua
2.2 Example Hierarchical Task Network
2.3 Example of pathfinding

3.1 Representation of a Goal
3.2 Representation of an Action
3.3 Representation of a Planner
3.4 Agent, represented by a green circle, hiding inside a barrel

4.1 Map generation example
4.2 Backward Search
4.3 Forward Search
4.4 MCTS. From http://mcts.ai/about/index.html

5.1 Exploration Rate (Easy)
5.2 Exploration Rate (Hard)


List of Tables

4.1 Container Types
4.2 List of Actions

5.1 Map Difficulty
5.2 Average number of detections per game
5.3 Average loot per game
5.4 Average number of iterations per game
5.5 Maximum Completion Percentage
5.6 Number of playouts and average duration in milliseconds


Abbreviations

AI    Artificial Intelligence
FSM   Finite-State Machine
HTN   Hierarchical Task Network
PCG   Procedural Content Generation
IDA*  Iterative Deepening A*
MCTS  Monte-Carlo Tree Search
NPC   Non-Player Character
UCT   Upper-Confidence Bounds applied to Trees


Chapter 1

Introduction

References to artificial intelligence have been around since ancient Greece, in the form of automata with some semblance of intelligence, used by Hephaestus to help in his work. There are also references to intelligent machines, mostly with humanoid shapes, in different civilizations throughout history. The creation of intelligent machines is a topic that has intrigued mankind for millennia [Neg05], from its presence in ancient legends to its current state. Research on AI, although officially started in 1956 at a conference at Dartmouth College in Hanover, has its cornerstones in the attempts of the philosophers of antiquity to define intelligence and thought. The research carried out since has led to the application of AI algorithms to problems from completely different areas with high degrees of success. AI is used today as a response to problems that would be difficult for a human to solve within the same time limit. One of the areas with a prevalent synergy with artificial intelligence is the video game industry. The first instance of artificial intelligence in this area appears in one of the earliest games, Pong [MF09]: when the game was played by a single player, the computer-controlled paddle followed a simple equation to move to the height at which the ball was expected to arrive.

1.1 Context

The application of AI in games was simple or non-existent in its infancy, since most existing games were played by two players, or had opponents whose behaviour was simple enough to be defined as a series of conditions or by a simple minimax algorithm. This simplicity was a requirement given the low computational capacity available and the need for a real-time response from the opponent. Game AI grew with the evolution of processing power and with the changing needs of game development: the need for smarter opponents, richer environments created in near real time, and more realistic situations and reactions from the opponents. This culminated in examples of intelligent agents such as those in The Sims (Maxis, 2000), where agents have their own personalities and needs on which to base their decisions; The Elder Scrolls V: Skyrim (Bethesda Softworks, 2011), where the non-player characters (NPCs) have their own agendas that extend over weeks of play, enabling the creation of cities bursting with a life of their own; or F.E.A.R. (Monolith Productions, 2005), where the agents take the form of opponents that move in a tactical way in order to effectively hinder the player's progress.

Stealth games are games that allow, reward or even force the player to overcome their obstacles with some degree of stealth. In this type of game, the artificial intelligence agents take the form of opponents that actively seek the player, or that patrol a given area and react if they find proof of the player's presence. The most common reactions of this type of agent are raising an alarm, which lets the other opponents in the field know that there is an intruder, followed by the initiation of an active search for the player based on different search methods. These opponents then seek the player for a while, attacking if they find him or giving up after some time has passed.

1.2 Motivation and Goals

This project aims to create an agent that is not just reactive but is also able to plan its steps with precision and a high degree of stealth, capable of learning and adapting in order to produce a good response in situations it has not yet seen without wasting too much computing time. The creation of this agent will allow for a new vision of stealth gaming, expanding the paradigm of such games in order to allow for the creation of new game modes, where the system would be able to play the part of the player rather than the guards opposing him. It may also enable different modes of cooperative play, creating the opportunity for a player to be one of the opponents in this type of game, as well as new play-testing and game-balancing techniques, new obstacles in such games and the possibility of creating procedurally generated content for stealth games. This requires the agent to be able to create counter-strategies in response to external stimuli. By the conclusion of the project, we intend to have created a high-level simulator that procedurally generates maps for stealth games, used for testing and training the agent in different situations, which in turn requires the creation of abstractions of some key concepts in stealth games. The second and main objective of this thesis is to create an artificially intelligent agent that is able to:

• Create a plan, based on incomplete world information received through its sensors, that allows the agent to achieve a goal in a realistic and near-optimal fashion.

• Learn through a memory of past plans that achieved set goals, and recall those plans when a similar situation arises.

• Move through any map stealthily, avoiding detection and creating an escape plan if necessary.


1.3 Structure of the Thesis

In the next chapter of this thesis, we explore the area of Artificial Intelligence, with a special focus on agents in video games and Procedural Content Generation. After that, a conceptual model of the AI system is discussed, detailing the architecture used. Then the agent, the details of its implementation and the testing platform are presented, followed by a chapter where the results are discussed. The document ends with an assessment of the objectives accomplished in this thesis and possibilities for future improvement.


Chapter 2

State of the Art

Intelligent agents have been present in AI since the late 1980s [WJ95]. There are several definitions of agents, but they seem to concur on a few basic properties: an agent must have some measure of autonomy, being able to control its own actions without human intervention; it can have social abilities, being able to communicate and interact with other agents; it must be able to react, that is, to perceive its environment and respond in a way that makes sense according to the changes it senses; and it must be able to be proactive, acting toward a defined goal instead of simply reacting to the environment. Other attributes are also considered important for agents, such as the addition of human properties like emotions, knowledge, beliefs and intentions [WJ95].

2.1 Intelligent Agents for Video Games

There are various approaches, or architectures, for constructing agents. The first are deliberative architectures [Bro91], defined as architectures that contain an explicit model of the world and in which the agent makes its decisions via logical reasoning, logic which is world-dependent [TRR+13]. These architectures create two interesting problems: the representation of the world given to the agent must be accurate and created in a timely fashion, and the information inside that world that represents its knowledge and the properties of more complex entities has to be represented in a way that makes it possible for the agent to process it and react in time. A good example of this kind of architecture are planning agents, which receive a symbolic representation of the world and of a goal and attempt to find a sequence of actions that leads to the fulfilment of that goal [FN72]. One of the first examples was the planning system STRIPS.

There are also reactive architectures, defined as architectures that do not need or include any symbolic representation of the world. These were the basis for behaviour-based robots [Bro91], in which the key aspects were situatedness, where the robots (agents) did not have to deal with abstract descriptions of the world; embodiment, where the actions of the agents in the world have instantaneous feedback; intelligence, where a list of factors, and not just the computational engine, influences the actions the robot takes; and emergence, where the intelligence comes from the interaction of the robot and the world.

The third kind of architecture usable to construct agents is the hybrid architecture, which contains parts of both of the architectures described above as subsystems, such as a deliberative system with a symbolic world model together with a system capable of making decisions and reacting to events much like a reactive system. This introduced a form of layered architecture, with each layer dealing with information at a different degree of abstraction.

The most popular methods to build agents for games nowadays are finite state machines (FSMs) and hierarchical finite state machines (HFSMs). These allow a network of states and transitions to be created that represents the plans the agents might follow given an initial state. One problem with FSM usage is scalability: FSMs grow to a very high complexity in larger problems [HYK10], which is mitigated by the HFSMs' capability of reducing parts of the machine to superstates. Another is the fact that agents built on these kinds of state machines tend to have a robotic behaviour, which can be predicted by the player and can ruin the immersiveness of games.

Figure 2.1: Sample agent using FSMs by Fernando Bevilacqua

2.1.1 Planning

In 1999, a planner called SHOP (Simple Hierarchical Ordered Planner) [NCLMnA99] was created as an argument against the idea that total-order forward search planning in AI was a bad idea due to the need for excessive backtracking. This method requires a formal representation of the problem domain, in the shape of axioms and methods, from which a plan is extracted. The planner is based on Hierarchical Task Networks (HTNs), which means the domain is represented as a series of tasks of different types, some of which are preconditions for performing others, until a final task is completed. Such planners achieve good results in situations where the domain can be described formally.

Figure 2.2: Example Hierarchical Task Network

In 2007, offline tests with an advanced version of the SHOP planner were made [KBK07] in order to create plans for the daily routines of NPCs in the game The Elder Scrolls IV: Oblivion from Bethesda Softworks. Each NPC has a group of AI packages that are used to control its behaviour. Each package describes behaviour in the form of an HTN and has preconditions that must be met for the package to be activated. Each agent has access to various information about the state of the game at any time, which is used for the activation of packages. This approach is more efficient for static game environments than for dynamic ones, even though the latter have the potential to describe a game world more realistically. In the case of dynamic environments, the plan created would have to be robust enough, and therefore more complex, to cover all the possibilities, or at least be able to react to situations that were not anticipated at planning time.

As an alternative to this, Jeff Orkin [Ork03] developed an architecture for goal-oriented decision making, Goal-Oriented Action Planning (GOAP). This architecture was developed based on the goal-oriented architectures already used at the time, which lacked a planning component and constantly re-evaluated the existing goals, choosing the most appropriate one for the situation the agents were in. In this kind of architecture each agent has only one goal active at any time and a list of actions that can be taken, and from them it generates a plan composed of a sequence of actions in order to achieve the chosen goal. Each goal has a condition required for completion that is defined as a value within a domain variable. Every action has a precondition whose fulfilment is required for it to take place, in the form of a key-value pair or array; an effect that describes what happens after its completion; and a cost, which creates a metric through which one action might be selected over another [Ork06]. Two problems arise in the implementation of this system: the first is the need for a good search algorithm to choose the actions that compose the plan, and the second is the representation of the world, whose data is needed by the planner for the formulation of plans. The solutions presented are the use of A* as the search algorithm to formulate the plan, regressively searching for some action that fulfils the goal, followed by the search for an action that satisfies the precondition of the previous action, until all conditions are met. The representation of the world is suggested to be a data structure that contains an attribute, a value and the corresponding world entity. The advantage of this approach is that it limits the knowledge the agent needs to accomplish a given goal, which discards unnecessary data and allows a simpler representation of the world [Ork04].

Later, in 2013, a new framework based on GOAP was created in order to reduce the time needed to compute a plan, even for hundreds of agents [MSD+13]. The framework allows for naive planning, where the NPCs are able to plan with limited knowledge of the world. It uses an iterative-deepening variant of A* in the planning stages, the IDA* algorithm, with heuristics to expand the nodes in the search tree, and it uses a form of layered planning, where the planners are nested hierarchically, with more complex actions higher in the chain than simpler actions. This kind of layered planning system drastically reduces the time necessary to compute more complex plans. The framework also allows for a memory system for the NPCs, where they recall plans if they find themselves in similar situations, and for some degree of personalization, making an NPC favour different decisions than others in the same circumstances, due to its preferences.
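
To make the GOAP mechanics just described concrete, the following is a minimal, hypothetical sketch in Java of a world state as key-value pairs, actions with preconditions, effects and costs, and a planner that finds a sequence of actions satisfying a goal. Orkin formulates planning as a regressive A* search; a plain breadth-first forward search is used here only to keep the sketch short, and none of the names come from an actual GOAP implementation.

import java.util.*;

// Minimal GOAP sketch: world state as key/value pairs, actions with
// preconditions, effects and costs, and a breadth-first plan search.
class WorldState extends HashMap<String, Object> {
    WorldState() { super(); }
    WorldState(Map<String, Object> src) { super(src); }
    boolean satisfies(Map<String, Object> cond) {
        return cond.entrySet().stream()
                   .allMatch(e -> Objects.equals(get(e.getKey()), e.getValue()));
    }
}

class GoapAction {
    final String name;
    final Map<String, Object> precondition, effect;
    final double cost;
    GoapAction(String name, Map<String, Object> pre, Map<String, Object> eff, double cost) {
        this.name = name; this.precondition = pre; this.effect = eff; this.cost = cost;
    }
    WorldState apply(WorldState s) {
        WorldState next = new WorldState(s);
        next.putAll(effect);                 // the effect overwrites world-state keys
        return next;
    }
}

class GoapPlanner {
    // Returns the first sequence of actions whose accumulated effects satisfy the goal.
    static List<GoapAction> plan(WorldState start, Map<String, Object> goal,
                                 List<GoapAction> actions) {
        record Node(WorldState state, List<GoapAction> steps) {}
        Deque<Node> frontier = new ArrayDeque<>();
        Set<WorldState> visited = new HashSet<>();
        frontier.add(new Node(start, List.of()));
        visited.add(start);
        while (!frontier.isEmpty()) {
            Node n = frontier.poll();
            if (n.state().satisfies(goal)) return n.steps();
            for (GoapAction a : actions) {
                if (!n.state().satisfies(a.precondition)) continue;   // not applicable here
                WorldState next = a.apply(n.state());
                if (!visited.add(next)) continue;                     // already expanded
                List<GoapAction> steps = new ArrayList<>(n.steps());
                steps.add(a);
                frontier.add(new Node(next, steps));
            }
        }
        return null;   // no plan reaches the goal with the known actions
    }
}

An action such as a hypothetical "steal loot" would then declare, for instance, {nextToContainer=true} as its precondition and {hasLoot=true} as its effect, and the planner chains such actions until the goal condition is met.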

2.1.2 Monte-Carlo Tree Search

Monte Carlo Tree Search (MCTS) is a variation on tree search algorithms that is easy to use and domain-independent, in the sense that it can be used in any game, needing only knowledge of the legal moves available and of the end conditions. The method is also able to learn, and its parameters can be fine-tuned for different requirements [FNO14], since it allows good plays to be kept and reused throughout the gameplay. These facts make it a valuable alternative to other search methods. The method is structured in four different stages: Selection, where a node is selected; Expansion, where child nodes of the selected node are created; Simulation, where a playout of the chosen path is simulated; and Backpropagation, where the current move sequence is evaluated and its statistics updated.

MCTS has been used in several different games, and some improvements were made to lighten the processing load and to reduce the tree size. UCT (Upper-Confidence Bounds applied to Trees) was introduced in order to add an informed way of selecting the nodes to be explored [Lor08]. It was used mainly in board games, with the first and most common usage being in the game of Go, but it was proved to be effective in any kind of game without any sort of previous experience or training [MC10].

The method has proven to be a good alternative to standard methods such as A* and IDA* [SWV+08], both in single-player and multiplayer games, getting especially good results in single-player games with perfect information. There is also a study that applies it to create intelligent and adaptive opponents, showing that, since the performance of the algorithm depends on computation time, the difficulty of the opponents can be adjusted by tuning the simulation time allowed to the method. Since MCTS is very computationally intensive, it might be complicated to use it to calculate optimal solutions online, but it is possible to train a neural controller with data obtained from MCTS simulations and achieve near-optimal solutions [XYS+09].

2.1.3 Movement

Movement in video games is a very important component that can damage the player's immersion if not dealt with properly. Usually pathfinding algorithms are used to calculate a path for an agent to follow, and these algorithms can be categorized into two different types [GMS03]:

• Undirected approaches, where the agent does not plan its path ahead of time and instead moves around trying to find the way to its target. Search algorithms such as depth-first search and breadth-first search are often used to improve the efficiency of this method.

• Directed approaches, where the path is assessed beforehand. Methods that use this approach usually have some measure of cost or distance to estimate the value of a path between two points. Algorithms such as Dijkstra's or A* are used to find an optimal path.

Figure 2.3: Example of pathfinding

Stealthy movement is, by definition, movement that allows the agent to remain undetected by potential threats. Usually, movement in games is an application of a pathfinding algorithm like A*, but this kind of movement is used to find the shortest path between two or more points, with no regard for threats other than their complete avoidance, treating them as non-traversable paths. There is an alternative, developed by Ron Coleman, that explores the aesthetics of stealth in pathfinding through an alteration of the A* algorithm to create a stealthy A* [Col09]. This algorithm generates its paths using a stealth effect that is added to the weight calculation for each cell of the map. This leads to paths that favour proximity to corners and walls, leading in turn to stealthier movement on the agent's part. There is also a beautifying treatment that can be applied to the path, leading to smoother turns and a more realistic route.

Stealthy movement is only needed if there is a chance for the agent to be discovered, so it makes sense to take the movement of an opponent into account when calculating possible paths. Such a problem could be tackled as a zero-sum transit game between two players [VBJP10]: one would be the agent trying to traverse the region and the other would be responsible for all the patrols in the region. The problem with this approach is that it assumes knowledge of the strategies of both players, which might not be true in a multiplayer game.

Another issue in pathfinding is that, for larger maps, storing the maps and paths in memory can become a problem. Nathan Sturtevant suggests that an abstraction of the map can be used to lighten the load on memory at the cost of optimality [Stu07]. This approach also accommodates the use of dynamic maps.

2.1.4 Procedurally Generated Content

The need for Procedural Content Generation (PCG) appeared to offset the huge amount of time necessary to create content for a game. There is a lot of documentation on procedurally generated content, but at this moment the author is unaware of any in use specifically for stealth games. Procedural content generation is used for many things in gaming: items, maps, and even storyboards and rules can be procedurally generated nowadays [PCWL08]. Many different approaches are taken in generating content, usually depending on the type of game the content is for. Strategy games usually use search-based methods or evolutionary algorithms to generate maps with a selected or random topology [TPY10], with more or less regard to the aesthetic aspect of the maps [LCF13a] and to map balancing [LCCFL12], using grammars to define what the content of the map should be [vdLLB13], with the necessary parameters to guarantee dynamism in multiplayer games [LCF13b]. Other games, like platformers, might use different methods, such as generating the map with cellular automata [JYT10], or generating it based on the player's performance [SW10] and/or the rhythm intended for a given level [JTSWF10].

In general there are two types of techniques for PCG: assisted techniques, which require heavy human intervention to generate content, and non-assisted techniques, which require much less or even no human intervention at all. Both must be able to generate content appropriate to the setting. Some techniques are based on real-life examples, such as the use of algorithms that simulate erosion when creating continents [CBPD11]. Other techniques use semantic descriptions in the design of game worlds, where rules are attributed to objects and a solver attempts to shape and place the objects in logical places [TSBdK09].

There are also studies using genetic and evolutionary algorithms for puzzle generation [Ash10] or for map generation based on pieces of content [SP10]. The studies on genetic algorithms generate the maps or levels by stitching together components and pieces of the level, evaluate them through different means (custom heuristics, or even the A* algorithm to check for traversability), and cross over the best results. Evolutionary methods generate the maps through sets of rules or grammars, and some take as input the player's past experience of the map to generate new maps. Some recent approaches even use the player's emotions [NS13] to help generate new experiences and new game worlds [NRON13, NARO14].


Chapter 3

Conceptual Model

This thesis was created to bring a new take on AI agents for stealth games in general, creating an agent that would eventually emulate the actions of a player. There was a need to create an agent that could traverse any map in a stealthy way, hidden from sight, aware of its enemies and its goals. Ideally, this is represented by an agent with incomplete knowledge of the current map, as knowledgeable as a player would be when first playing through a map in this kind of game.

3.1 Agent Architecture

The artificial intelligence developed needed to be able to plan instead of just reacting to external inputs, in order to better simulate the movements of a player inside a stealth game. For that purpose, the agent required an architecture that supported dynamic decision-making on the fly. Planning has been proven to be a very powerful tool in creating agents for games, leading to NPCs with routines that extend over a 24-hour cycle [KBK07], or to reactive opponents, based on FSMs and HFSMs, whose perceptions trigger a set of actions [HYK10]. The behaviour we aimed for could potentially be too complex to be defined by simple rule-based systems or FSMs. Therefore, an architecture based on Goal-Oriented Action Planning was chosen, a choice that reduced the development time of the agent's system, allows for possible expansion with moderate ease, and allows the Agent to better suit its actions to a dynamically changing world.

3.1.1 Goal-Oriented Action Planning

This type of architecture was chosen over FSMs and HTNs mostly due to the time constraints on the development of both the Agent and the simulator platform. If FSMs were used to create the agent, every possible scenario would have to be accounted for, which could lead to an FSM of very high complexity. If the Agent were to be developed with HTNs, every task defined would have to be separated into sub-tasks, each with its own priority. On one hand this appears to be a solid choice for the Agent's design, but there was a need for the Agent to be modular in nature, so that any new aspects eventually added to the thief would not require extensive code changes. GOAP is composed of Goals, Actions and a Planner. Its modular aspect allows for expansion through the creation of new instances of any of its parts [Ork06]. For instance, if a need arises to create an Agent with different priorities, new Goals can be created and used alongside the already implemented Goals, giving the Agent a totally different approach to the game than before. The Agent developed needed not only the ability to plan but also a way of gathering information from the world around it and a representation of that Knowledge.

3.1.2 Goals

Goals can be diverse and can be abstract concepts as long as they can be defined by an alteration in the knowledge or the state of the agent. The existence of the Goal concept allows for the Agent to have a defined objective in the playthrough, which can change dynamically if any situation arises that removes the need for the current Goal or creates the need for another.

Figure 3.1: Representation of a Goal

3.1.3 Actions

Actions can be as diverse as goals, and usually are simple steps that can be done within one iteration of the simulation, but they can also be more complex, such as walking along a defined path. Actions are a crucial part of GOAP, since they are the elements that compose the plan the Agent is going to follow. Each Action has a precondition that can be used to check whether the Action is feasible, and an effect that represents a change in the game world.

Figure 3.2: Representation of an Action

3.1.4 Planner

The agent also has a Planner incorporated which chooses actions in order to achieve the goal. The Planner does this by applying a search-based algorithm to decide what Action in the search tree to explore next. The sequence of Actions chosen is called a Plan.


Figure 3.3: Representation of a Planner
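
The three concepts above can be sketched as a small set of interfaces. The outline below is hypothetical: the names and signatures are assumptions rather than the actual implementation, and Knowledge stands for the structure described in Section 3.1.6.

import java.util.List;

// Illustrative interfaces for the conceptual model; names are assumptions.
class Knowledge { /* the agent's world representation, see Section 3.1.6 */ }

interface Goal {
    boolean isSatisfied(Knowledge k);   // success condition, defined over the agent's knowledge
    boolean isRelevant(Knowledge k);    // whether this goal should currently be active
}

interface Action {
    boolean preconditionMet(Knowledge k);   // can the action be taken in this state?
    Knowledge effect(Knowledge k);          // the change it produces, as altered knowledge
    double cost();                          // metric used to prefer one action over another
}

interface Planner {
    // A Plan is an ordered sequence of Actions whose accumulated effects
    // satisfy the Goal, starting from the agent's current Knowledge.
    List<Action> plan(Knowledge current, Goal goal, List<Action> knownActions);
}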

3.1.5 Sensors

The agent possesses sensors that represent a line of sight oriented toward where the agent is facing. Any object in the line of sight of which the agent had no previous knowledge grants the agent knowledge of all the possible actions related to that object.

3.1.6 Knowledge

Knowledge representation is an important area in AI research [DSS93]. The structure of this representation can allow for an increase in the computing capability of a tree-search-based algorithm, by simplifying the way the tree is generated or how it is accessed. There is a need to create a symbolic representation [Ork04] that safeguards all the knowledge that is important in relation to the Agent's needs while still keeping the data structure as simple as possible.

3.2 Simulation platform

There was also a need to test the agent created in an environment ready for stealth games. At the moment of development there was a tool in Alpha stage called AI Sandbox (http://aisandbox.com, 2014); since this tool was in a closed-access Alpha, the choice was made to create a platform that could be used to test the Agent and could create environments with stealth elements. For that purpose, the simulation platform was developed with PCG elements based on stealth games. The platform uses a method based loosely on Johnson's [JYT10] take on the usage of cellular automata for level generation. We aimed to create an environment approximating that of a castle or mansion filled with valuables for our Agent to take. The map generator follows a simple, non-assisted generate-and-test approach to level generation, guaranteeing that the whole level is traversable and that any objective is within reach of the agent. The generation is stochastic, in the sense that every level is different from any before it, and is performed in an offline fashion during the runtime of the simulator, with each level generated before each simulation based on a number of parameters given to the generator at the start of the simulation.
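
As a concrete illustration of the generate-and-test idea, the sketch below carves random rectangular rooms into a grid and only accepts the result if every floor cell is reachable from the first one (checked with a flood fill). It is a simplified assumption, not the thesis generator, which also places corridors, room types, furniture, lights and guards (see Chapter 4 and Appendix A).

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Random;

// Minimal generate-and-test level generator: propose a random layout,
// validate it, and retry until a usable map is produced.
class SimpleMapGenerator {
    private final Random rng = new Random();

    boolean[][] generate(int w, int h, int rooms, int minSize, int maxSize) {
        while (true) {
            boolean[][] floor = new boolean[h][w];
            for (int r = 0; r < rooms; r++) {                 // carve random rooms
                int rw = minSize + rng.nextInt(maxSize - minSize + 1);
                int rh = minSize + rng.nextInt(maxSize - minSize + 1);
                int x = 1 + rng.nextInt(Math.max(1, w - rw - 1));
                int y = 1 + rng.nextInt(Math.max(1, h - rh - 1));
                for (int i = y; i < y + rh && i < h - 1; i++)
                    for (int j = x; j < x + rw && j < w - 1; j++)
                        floor[i][j] = true;
            }
            if (fullyConnected(floor)) return floor;          // test passed: accept
        }                                                     // otherwise re-generate
    }

    private boolean fullyConnected(boolean[][] floor) {
        int h = floor.length, w = floor[0].length, total = 0, sy = -1, sx = -1;
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (floor[y][x]) { total++; if (sy < 0) { sy = y; sx = x; } }
        if (total == 0) return false;
        boolean[][] seen = new boolean[h][w];
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[]{sy, sx});
        seen[sy][sx] = true;
        int reached = 0;
        while (!stack.isEmpty()) {
            int[] c = stack.pop();
            reached++;
            int[][] nbrs = {{c[0]+1,c[1]}, {c[0]-1,c[1]}, {c[0],c[1]+1}, {c[0],c[1]-1}};
            for (int[] n : nbrs)
                if (n[0] >= 0 && n[0] < h && n[1] >= 0 && n[1] < w
                        && floor[n[0]][n[1]] && !seen[n[0]][n[1]]) {
                    seen[n[0]][n[1]] = true;
                    stack.push(n);
                }
        }
        return reached == total;   // accept only fully connected maps
    }
}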

Certain elements are present throughout stealth games. The most common element in this type of game is the light, or rather, the shadow that comes from its absence. In these games, the player needs to stay away from the light and tread in darkness to avoid detection. Another very common element in stealth games is the opponent, usually in the form of guards, patrolling or static, chasing and attacking the player if detected. Most stealth games also have some kind of hiding mechanic: whether in a bush or under a box, the player can evade opponents by taking advantage of these spots.

Figure 3.4: Agent, represented by a green circle, hiding inside a barrel

Lights are a simple element to create, their main property being the illumination they give to nearby areas. The shadows left by the absence of light are where stealth players dwell during most of the game, because they usually affect the vision of the opponents in some way, making the player less visible. The opponents in these games can be as complex as any Agent. In our case we intended to test the performance of a thief on a map, so there was a need to create opponents, in the form of Guards, that would hinder the player's progress on the map: Guards that could see far and would search for the player even after losing sight of it. The last common element, hiding spots, needs to be present in any stealth game; in this case our Agents can hide themselves inside or under various kinds of Furniture.


Chapter 4

Implementation

As discussed in the previous chapter, the agent was created based on the GOAP architecture and set on a simulator with procedurally generated maps. In this chapter we review the details of the creation of both the agent and the simulator.

4.1 Simulator

The first step of each simulation is map creation. Each map is 2-dimensional and has a maximum size defined upon creation, a maximum and minimum room size and a maximum number of rooms. Each map creation phase is composed of a cycle of two steps until the map is validated for use.

4.1.1 Map elements

We will now discuss the elements that compose a map. The first element of interest is a Cell. The core map is a matrix of cells.

4.1.1.1 Cell

A Cell is the smallest element of which the map is composed. Each Cell has its own position inside the map in Cartesian coordinates, a reference to every cell adjacent to itself, a Light and a Furniture, either of which may or may not exist on the cell, and a light intensity value.

4.1.1.2 Light

Light is an element that represents a lightbulb or a torch. Each Light element has an intensity that affects nearby cells within a radius of its origin, and an on/off status.


4.1.1.3 Furniture

Furniture is a more complex element. There are several different types of Furniture that may be present on each map. Each Furniture has an orientation, which is important for Furniture that occupies more than one cell; a type, which defines what kind of Furniture it is and whether it may appear in a Room based on the Room type; a hiding property, which defines whether the agent can use the Furniture to hide and how it hides in relation to that furniture; a height value, which defines whether it may block the line of sight of the agent or the guards; a property that defines whether the Furniture is a container; and a value of loot contained in the Furniture itself. Any Furniture of medium height or more blocks line of sight and allows the thief to hide behind it. The different types of Furniture and their properties are listed in Table 4.1.

Table 4.1: Container Types

Type    Container  Hiding  Height
Barrel  Yes        Inside  Medium
Bed     No         Under   Short
Chair   No         None    Short
Chest   Yes        Inside  Short
Desk    Yes        Under   Medium
Dummy   No         None    Medium
Hole    No         None    Short
Shelf   Yes        None    High
Table   Yes        Under   Medium
Throne  No         None    High
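
Expressed as data, Table 4.1 maps naturally onto an enumeration. The sketch below is illustrative (the simulator's actual classes may differ); it also encodes the rule stated above that medium or higher Furniture blocks line of sight.

// Illustrative encoding of Table 4.1; names are assumptions.
enum Hiding { NONE, INSIDE, UNDER }
enum Height { SHORT, MEDIUM, HIGH }

enum FurnitureType {
    BARREL(true,  Hiding.INSIDE, Height.MEDIUM),
    BED   (false, Hiding.UNDER,  Height.SHORT),
    CHAIR (false, Hiding.NONE,   Height.SHORT),
    CHEST (true,  Hiding.INSIDE, Height.SHORT),
    DESK  (true,  Hiding.UNDER,  Height.MEDIUM),
    DUMMY (false, Hiding.NONE,   Height.MEDIUM),
    HOLE  (false, Hiding.NONE,   Height.SHORT),
    SHELF (true,  Hiding.NONE,   Height.HIGH),
    TABLE (true,  Hiding.UNDER,  Height.MEDIUM),
    THRONE(false, Hiding.NONE,   Height.HIGH);

    final boolean container;
    final Hiding hiding;
    final Height height;

    FurnitureType(boolean container, Hiding hiding, Height height) {
        this.container = container; this.hiding = hiding; this.height = height;
    }

    // Medium or higher furniture blocks line of sight and offers cover.
    boolean blocksLineOfSight() { return height != Height.SHORT; }
}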

4.1.1.4 Room

A Room is a section of the map composed of Cells with similar properties. Each Room has a type, which defines what kind of Furniture may appear on the Cells composing the Room, its own Cartesian coordinates, a size value, and a list of the furniture inside the room. The following is a list of the different Room types and their respective Furniture.

• Corridor: represents a corridor for transitions between rooms. No Furniture may appear in this room.

• Bedroom: may contain Barrels, Beds, Chests, Chairs and Tables.

• Storage Room: may contain Barrels, Chests, Shelves and Tables.

• Training Room: may contain Dummies, Barrels, Chests and Tables.

• Throne Room: may contain a Throne and Chairs.

• Restroom: may contain Holes.

• Work Room: may contain Desks, Chests, Shelves and Chairs.


Figure 4.1: Map generation example

4.1.1.5 Waypoint

A Waypoint is a point inside a Room that is part of the patrol path of the Guards inside the Map. A Room may have only one Waypoint and it is only created on a Cell that contains no Furniture.

4.1.1.6 Guards

A Map contains one or more guards. Each Guard is an extension of the class Person. Each Person has its own position on the Map, a unique id code and a HitPoint value that defines whether the Person is alive or dead. Each Guard has a target Waypoint, which is the next waypoint it has to move to on its patrol; a value defining its status, which can be on patrol, inquisitive, on alert or dead; Sensors, such as sight; an orientation, i.e. where the guard is turned; an alert timer, which defines the duration of the Alert mode; and a target Cell, which is the last known location of the Agent.

Each Guard is in Patrol mode by default, in which it walks from one waypoint to another via the A* algorithm. If the guard becomes aware of the Agent, it goes into Alert mode, chasing the agent until it either catches it or loses it from sight. If it loses the Agent from sight it goes into Inquisitive mode, in which it searches the vicinity of the Agent's last known location until it finds the Agent again or the Alert timer runs out, in which case it returns to Patrol mode. If a Guard catches the Agent, the simulation ends in failure.
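
This behaviour amounts to a small state machine. A hedged sketch of just the mode transitions is shown below; the alert duration and the two per-turn observations are assumptions, and the actual Guard class also handles movement and sensing.

// Sketch of the guard's mode transitions, driven each turn by whether the
// guard currently sees the Agent and whether it has caught it.
enum GuardMode { PATROL, ALERT, INQUISITIVE, DEAD }

class GuardStateMachine {
    GuardMode mode = GuardMode.PATROL;
    int alertTimer = 0;
    static final int ALERT_DURATION = 20;   // turns; the value is an assumption

    GuardMode step(boolean seesAgent, boolean caughtAgent) {
        if (mode == GuardMode.DEAD || caughtAgent) return mode;   // dead guard, or simulation over
        switch (mode) {
            case PATROL:
                if (seesAgent) { mode = GuardMode.ALERT; alertTimer = ALERT_DURATION; }
                break;
            case ALERT:
                if (!seesAgent) mode = GuardMode.INQUISITIVE;     // lost sight: search last known cell
                break;
            case INQUISITIVE:
                if (seesAgent) { mode = GuardMode.ALERT; alertTimer = ALERT_DURATION; }
                else if (--alertTimer <= 0) mode = GuardMode.PATROL;   // give up, resume patrol
                break;
            default:
                break;
        }
        return mode;
    }
}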


4.2 Agent

The Agent was built to represent a character akin to a thief in a stealth game. Like the Guards mentioned above, the Agent is an extension of the class Person. Apart from the attributes inherited from that class, the architecture used imparts other properties to the Agent.

4.2.1 Sensors

The Agent has a sight sensor that simulates a viewpoint oriented toward wherever the Agent is turned. The sight sensor functions as a kind of ray-casting algorithm in two-dimensional space. Sight is therefore blocked by walls or by medium to high pieces of Furniture. Anything detected by the sensors that was not previously known is added to the Knowledge of the Agent.
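
A common way to implement such a two-dimensional sight sensor on a grid is to step along the line between the agent's cell and a target cell and stop at the first blocking cell. The sketch below uses Bresenham's line algorithm and assumes a blocksSight callback that is true for walls and medium or high Furniture; it is an illustration, not the simulator's actual code.

import java.util.function.BiPredicate;

// Grid line-of-sight check in the spirit of the sensor described above.
final class LineOfSight {
    static boolean visible(int x0, int y0, int x1, int y1,
                           BiPredicate<Integer, Integer> blocksSight) {
        int dx = Math.abs(x1 - x0), dy = -Math.abs(y1 - y0);
        int sx = x0 < x1 ? 1 : -1, sy = y0 < y1 ? 1 : -1;
        int err = dx + dy;
        int x = x0, y = y0;
        while (true) {
            if (x == x1 && y == y1) return true;               // reached target: visible
            if ((x != x0 || y != y0) && blocksSight.test(x, y))
                return false;                                   // something blocks the line
            int e2 = 2 * err;                                   // Bresenham stepping
            if (e2 >= dy) { err += dy; x += sx; }
            if (e2 <= dx) { err += dx; y += sy; }
        }
    }
}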

4.2.2 Knowledge

The Agent possesses a structure that represents its knowledge. The Knowledge structure starts blank at the beginning of the simulation. It contains a list of known Cells, a list of known Furniture, a list of possible Actions, and a HashMap recording the cells in which the Agent has seen the Guards patrolling, all of which are updated each time the Agent views a new Cell with its Sensor or sees a Guard. The structure also records the current position of the Agent, its hiding status, and its current and last Goals.
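
Filling in the Knowledge stub from the Section 3.1 sketch, the structure could look roughly like this. The field names are assumptions; Cell and Furniture stand for the simulator's map-element classes, and Action and Goal are the interfaces sketched earlier.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative Knowledge container matching the description above.
class Knowledge {
    List<Cell> knownCells = new ArrayList<>();
    List<Furniture> knownFurniture = new ArrayList<>();
    List<Action> possibleActions = new ArrayList<>();
    // For each known cell, how often a Guard has been seen on it; these counts
    // also feed the danger level used by the movement heuristic (Section 4.2.6).
    Map<Cell, Integer> guardSightings = new HashMap<>();

    Cell currentPosition;
    boolean hidden;
    Goal currentGoal;
    Goal lastGoal;
}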

4.2.3 Goals

Goals are objectives that the Agent has to accomplish to be successful. Each Goal has a condition for success that depends on an alteration of the Agent's Knowledge. In this case, four different goals were implemented. By default the agent has the Exploration goal active, in which it attempts to learn more about its surroundings, adding to its knowledge whatever it finds. There is also the Loot Goal, which becomes active when the agent knows of enough containers to loot; the Hide Goal, in which the agent attempts to hide from its current pursuers; and the Escape Goal, which becomes active when the agent has fulfilled its primary objective (to loot the current map) and in which it attempts to escape the current map without being detected.

• Explore: the default Goal, which is accomplished if the agent finishes its turn with more known cells than it had at the beginning of the turn.

• Hide: becomes active if a Guard triggers the Alarm state on the Map. It is accomplished when the agent is hidden out of the line of sight of Guards.

• Loot: becomes active when the Agent has in its knowledge the whereabouts of more than a third of the total loot of the Map.

• Escape: becomes active when the Agent has in its possession more than a third of the total loot of the Map and is accomplished if the Agent succeeds in getting to the exit of the Map.


The only way to achieve any of the Goals is through the effect of different Actions. Goals change dynamically whenever the agent senses something that invalidates the current goal or gives priority to another.
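
One simple way to realise this dynamic switching is a fixed-priority check run every turn, as sketched below. The ordering and the one-third thresholds follow the rules listed above; everything else (names, parameters) is an assumption.

// Hypothetical goal arbitration, re-evaluated every turn, highest priority first.
enum GoalType { EXPLORE, HIDE, LOOT, ESCAPE }

final class GoalSelector {
    static GoalType select(boolean alarmRaised, boolean hidden,
                           int knownLoot, int carriedLoot, int totalLoot) {
        if (alarmRaised && !hidden)      return GoalType.HIDE;    // evade pursuers first
        if (carriedLoot * 3 > totalLoot) return GoalType.ESCAPE;  // primary objective met: leave the map
        if (knownLoot * 3 > totalLoot)   return GoalType.LOOT;    // enough known containers to loot
        return GoalType.EXPLORE;                                  // default goal
    }
}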

4.2.4 Actions

Actions are single steps that the agent can follow and that compose a plan. Every new object the agent detects through its sensors gives it knowledge of the actions it can perform related to that object. The agent can know different types of actions, such as the Hide action, where the agent hides inside, under or behind an object; the Steal action, in which the agent loots a container; the Move action, which defines a path for the agent; the Peek action, where the agent peeks from behind cover; and the Look action, which changes the facing of the agent. Each Action has a precondition and an effect. The precondition defines a state that the Agent has to achieve for the Action to be successful. Preconditions receive Knowledge as a parameter, and the effect function returns altered Knowledge that is tested to see whether it accomplishes a Goal or the precondition of another Action. The five defined Actions are listed in Table 4.2.

Table 4.2: List of Actions

Action  Precondition                                               Effect
Hide    Adjacency of the Agent to a Furniture that allows hiding   Hides the agent
Look    No precondition                                            Changes the facing of the Agent
Move    No precondition                                            Moves the Agent to the Cell
Steal   Adjacency of the Agent to a container-type Furniture       Loots the container
Peek    Adjacency of the Agent to a Furniture                      Gives line-of-sight from behind the Furniture

4.2.5 Planner

The Planner, as the name implies, formulates a Plan. A Plan is a sequence of Actions that, once finished, accomplishes a Goal. The Planner therefore receives a Goal and the known Actions that the Agent has access to. The planning process can be seen as similar to a navigational pathfinding problem, so it can be tackled by search-based algorithms or even pathfinding algorithms. Since the objective was to find a way to make the Agent act within time constraints, the Monte-Carlo Tree Search (MCTS) method was selected for the Planner. An algorithm like A* would find the optimal plan for any situation, but at the cost of increased planning time.

4.2.5.1 Planning tree

The planning stage starts with the definition of the Goal to accomplish. Once that is obtained, there are two different methods to expand the search tree: forward search and backward search. Both have been implemented using the MCTS method.


Figure 4.2: Backward Search

Figure 4.3: Forward Search


4.2.5.2 Planning with MCTS

Monte-Carlo Tree Search is an algorithm divided into four steps: Selection, Expansion, Simulation and Backpropagation. It functions aheuristically, that is, it does not need previous domain knowledge to reach good decisions, which fits the fact that our Agent works with incomplete knowledge. Two versions of MCTS were implemented over the GOAP architecture, one using forward search and one using backward search. Forward searching starts from a node that represents the current state of the Agent and expands to an Action chosen from all possible Actions the Agent may take at its current state, that is, those whose precondition is met by the previous state; it then expands to all possible Actions whose precondition is met by the effect of the Action selected on the previous node, until an effect accomplishes the Goal. Backward searching starts from a state where the Goal is accomplished and expands into nodes whose Action's effect accomplishes the Goal, expanding afterwards to nodes whose Action's effect meets the selected Action's precondition. Each four-step iteration of the MCTS method is called a playout.

Figure 4.4: MCTS. From http://mcts.ai/about/index.html

The Selection step starts on a node that represents the current state of the Agent during forward search, or the success state during backward search. A node is selected randomly during the first several playouts and afterwards based on the Upper-Confidence Bounds formula:

$v_i + C \sqrt{\ln(N) / n_i}$

Where v is the value of the node, given by the number of successes achieved when traversing

that node over the number of times the node has been visited, C is a bias parameter, set at 1, N is

the total number of times the parent node has been visited and n is the number of times the node

has been visited. This step repeats itself until it reaches a node that hasn’t been explored. Each

Each node except the starting node is an Action. During forward search the nodes are selected from

all the Actions whose precondition is met by the state of the Agent after the effect of the Action on

the previous node. During backward search the nodes are selected from all the Actions whose

effect leads to accomplishing the precondition of the Action in the previous node. The Expansion

step expands the node selected above into a random node in the same fashion. Since the Agent

is supposed to have incomplete knowledge of the world, i.e. not knowing where the Guards are

unless it sees them, the Simulation step expands nodes randomly until it either reaches an end

state or a certain depth. The last step is Backpropagation, where the values of the Actions in the

current sequence are updated, increasing if a success state was reached.
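The four steps can be sketched roughly as follows, with C = 1 as above. The node structure, the `expand` and `simulate` callbacks, and the success-counting convention are assumptions made only for this illustration, not the thesis' actual code.

```python
import math
import random

class Node:
    def __init__(self, action=None, parent=None):
        self.action = action      # Action leading to this node (None for the root)
        self.parent = parent
        self.children = []
        self.visits = 0           # n_i: times this node has been visited
        self.successes = 0        # successes observed through this node

def ucb1(node, c=1.0):
    # v_i + C * sqrt(ln(N) / n_i); unvisited nodes are explored first
    if node.visits == 0:
        return float("inf")
    value = node.successes / node.visits
    return value + c * math.sqrt(math.log(node.parent.visits) / node.visits)

def playout(root, expand, simulate, max_depth=30):
    # Selection: descend by UCB1 until a node with an unexplored child is found
    node = root
    while node.children and all(ch.visits > 0 for ch in node.children):
        node = max(node.children, key=ucb1)
    # Expansion: create children for the applicable Actions, pick one at random
    if not node.children:
        node.children = [Node(a, parent=node) for a in expand(node)]
    if node.children:
        unvisited = [ch for ch in node.children if ch.visits == 0]
        node = random.choice(unvisited or node.children)
    # Simulation: random rollout until an end state or the depth limit
    success = simulate(node, max_depth)
    # Backpropagation: update statistics along the selected path
    while node is not None:
        node.visits += 1
        node.successes += int(success)
        node = node.parent
```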

4.2.6 Movement

One of the requirements of the Agent was its ability to traverse the map in a stealthy fashion, so

the heuristic was changed to include an extra cost of traversing a Cell based on the lighting and

the danger level of the Cell.

h_x = m_x + i_x + d_x

Where h_x is the heuristic value of a Cell, m_x is the Manhattan distance from it to the ending Cell,

i_x is the light intensity on the Cell cubed, and d_x is the danger level, i.e. the number of times a

Guard has been seen on that Cell divided by the number of times that Cell has been seen. This is

not an admissible heuristic [Kor00], since it overestimates the distance cost to the goal and increases

the depth of the search; designing an admissible stealth-aware heuristic would require extensive

testing and was not the focus of this thesis. The heuristic nonetheless adds to the distance cost of

Cells with a high light intensity or danger level, leading the Agent to favor Cells with a low light

intensity and a low danger level, and thus creating a stealthier path.
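A minimal sketch of this modified heuristic, assuming hypothetical field names (`light`, `guard_sightings`, `times_seen`) for the per-Cell statistics described above:

```python
from dataclasses import dataclass

@dataclass
class CellInfo:                 # hypothetical per-Cell record, for illustration only
    x: int
    y: int
    light: float                # light intensity on the cell
    guard_sightings: int        # times a Guard was seen on this cell
    times_seen: int             # times this cell was observed

def stealth_heuristic(cell: CellInfo, goal: CellInfo) -> float:
    """h_x = m_x + i_x + d_x: Manhattan distance, plus light intensity cubed,
    plus the danger level (guard sightings over observations)."""
    manhattan = abs(cell.x - goal.x) + abs(cell.y - goal.y)
    light_cost = cell.light ** 3
    danger = cell.guard_sightings / cell.times_seen if cell.times_seen else 0.0
    return manhattan + light_cost + danger
```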


Chapter 5

Experimental Results

In this chapter we review the experimental data collected through simulations on the platform

developed. We begin by describing the experimental setup, followed by an analysis and discussion

of the results.

5.1 Experimental Setup

The experiments were run on a PC with a 2.5 GHz processor and 4 GB of RAM. Each experiment

was a playthrough of a procedurally generated map by an Agent. In each playthrough the Agent

started in a random position on the map, and the Agent and the Guards acted in turns. Each

iteration of the program corresponded to one step for the Guards and one Action for the Agent (or

a single step, while executing a Move Action). Three agents were tested, all implemented as per

Section 4.2:

• Random: an agent that chooses its actions randomly, based only on which Actions it can

perform at that moment.

• MCTS-F: an agent using the MCTS method in forward search.

• MCTS-B: an agent using the MCTS method in backward search.

Maps were divided into two difficulties based on the number of Guards per area of the map,

since a higher concentration of Guards would increase the probability of a Guard finding and

catching the agent. The map difficulty formula is:

D_i = g_i / A_i

Where D_i is the difficulty of map i, g_i is the number of Guards in the map and A_i is its effective

area, that is, the number of non-wall cells. Maps with a difficulty below 0.002 were discarded, since

this usually signified a low concentration of Guards over a large area.
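A small sketch of the difficulty computation and the filtering rule, using the 0.002 discard threshold above and the 0.004 Easy/Hard boundary of Table 5.1; the function names are illustrative only.

```python
from typing import Optional

def map_difficulty(num_guards: int, effective_area: int) -> float:
    """D = g / A, where A counts only the non-wall cells of the map."""
    return num_guards / effective_area

def classify_map(num_guards: int, effective_area: int) -> Optional[str]:
    """Discard maps below 0.002 and split the rest at the 0.004 boundary
    used in Table 5.1 (Easy below it, Hard at or above it)."""
    d = map_difficulty(num_guards, effective_area)
    if d < 0.002:
        return None            # too few guards for the area: map discarded
    return "Hard" if d >= 0.004 else "Easy"
```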


Table 5.1: Map Difficulty

Difficulty             Minimum Area   Maximum Area   Average Guards per map
Easy (D < 0.004)                264           1149                      1.9
Hard (D >= 0.004)                50            987                      2.6

In each experiment the agent had to go through a map, discovering the layout of the level,

avoiding guards and stealing valuables. Over 300 experiments were conducted with each Agent in

each difficulty level. Each of these experiments generated a log file with the following information

of the playthrough:

• Guards: The number of guards patrolling around the map, useful to test the performance of

the Agents in more dangerous environments.

• Actual Map Size: Number of cells on the map that are not walls.

• Total Loot: Quantity of loot present on the map.

• Known Actions: Number of Actions known by the Agent before each planning stage

begins.

• Playout time: Time spent on each playout.

• Number of Alarms: number of times the Agent was discovered by guards.

• Looted value: quantity of valuables the Agent looted during the playthrough.

• Score: a measure of the Agent's success on the map, combining its exploration and the loot it accumulated.

Each Agent had a time allotment of at most 100 ms to create a plan. The Random Agent

picks Actions randomly, so the planning stage would be almost instantaneous. MCTS-F performs

a forward tree search using the MCTS method, limited by a maximum expansion depth of 30

levels, and MCTS-B searches backward, with the same limitation.
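The time allotment can be pictured as a simple anytime loop around the playout routine, as in the sketch below; `run_playout` and `extract_plan` are placeholders standing in for the MCTS routines, not the thesis' actual code.

```python
import time

def plan_with_budget(root, run_playout, extract_plan, budget_ms=100, max_depth=30):
    """Run playouts until the time allotment is spent, then return whatever
    plan the search tree currently supports (possibly after a single playout)."""
    deadline = time.monotonic() + budget_ms / 1000.0
    playouts = 0
    while time.monotonic() < deadline:
        run_playout(root, max_depth)   # one Selection/Expansion/Simulation/Backpropagation pass
        playouts += 1
    return extract_plan(root), playouts
```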

5.2 Experimental Results

The agents were tested in the simulation platform described in Section 4.1, where a map was

randomly generated and a planner type chosen to play it. Each map had at least one Guard and

more than one piece of Furniture with a loot value, and the Agent failed the playthrough immediately if

a Guard caught it (i.e. if a Guard became adjacent to the Agent while knowing its position). The performance

of each Agent is rated through several parameters that reflect the Agent's efficiency

at stealth.

The first of these parameters is the number of detections per game. In this experimental setup,

due to the guaranteed high concentration of Guards in each map, detection is almost unavoidable,

even for the stealthiest of Agents. MCTS-B outperformed the other Agents in avoiding Guards, due

to its planning type, which led to considerably shorter plans than its forward-searching counterpart.

Shorter plans led it to safety faster, away from the Guards' line of sight. In the same way,

we can see that the MCTS-B Agent triggered fewer alarms than its counterparts. The lower number

of alarms in the Hard difficulty is due to the fact that on Hard maps an alarm would usually

be followed by a failure.

Table 5.2: Average number of detections per game

          Detections (Easy)   Detections (Hard)   Alarms (Easy)   Alarms (Hard)
Random                 6.51                6.81             180              69
MCTS-F                 4.92                5.53             133              63
MCTS-B                 4.02                4.26             106              51

Table 5.3: Average loot per game

           Easy    Hard
Random     2.35    5.54
MCTS-F    11.60   12.94
MCTS-B    13.61   13.68

In each playthrough, each Agent had as its primary objective to steal the loot it could find and

attempt to escape. Table 5.3 shows the average loot each Agent was able to appropriate before

being caught. The Agents were scored by their exploration in each experiment and the loot they

were able to accumulate.

Table 5.4: Average number of iterations per game

           Easy    Hard
Random    35.70   29.59
MCTS-F    13.19   11.17
MCTS-B    19.62   13.40

During each playthrough, the Agents explored the map, and every Cell discovered increased

their known Actions, and therefore raised their planning possibilities. During the Exploration

Goal, the Agent’s objective is to maximize the number of known Actions. The following figures

(5.1, 5.2) show that the exploration rate follows a logarithmic trend, with MCTS-B having a much

higher exploration rate than its counterpart on Easy maps, while the two rates are much closer on

Hard maps.

Figure 5.1: Exploration Rate (Easy)

Figure 5.2: Exploration Rate (Hard)

Table 5.5: Maximum Completion Percentage

           Easy    Hard
Random       2%      6%
MCTS-F       8%     10%
MCTS-B      13%     23%

As mentioned above, during each iteration of each experiment the time allotment for the planning

step was 100 ms. Of the three Agents, Random was the fastest to finish its playouts. Table 5.6

shows the average duration of a playout for each Agent in each difficulty.

The duration of each playout increases at a near-exponential rate with the number of Actions

the Agent knows at planning time, since the planner must search through all of the known Actions

to find which ones are feasible at that step in the plan. During the initial stages, when the Agent

has few possible Actions, most playouts take under 1 ms.

On instances where the Agent knew more than 600 Actions, it is common to see the number of

playouts severely reduced when using MCTS-F, with each playout lasting over 50 ms. This

happens because forward search expands the tree until it finds a state where the current Goal

is accomplished, and the first playouts are essentially random, since the Actions' v value

(see Section 4.2.5.2) is zero at that point. The minimum time for a playout

with any of the methods above is under 1 ms, and the maximum is 99 ms, occurring in planning stages

where only one playout was made, since the planner stops itself upon exceeding 100 ms. This can

be a problem with some search-based methods, but in MCTS-F the plan created when the planner

is stopped can still be valid, since the search starts exploring the tree from the Agent's current state.

5.3 Discussion

The performance of the Agents in these experiments leads us to conclude that MCTS used with

backward search, even in an environment with incomplete knowledge, is a very powerful search

method and that, with some improvement, it could be used to create a stealth agent that

plans at runtime. The GOAP architecture can function with any search-based algorithm, and it

profits from the use of MCTS not only because it can produce plans approaching the quality of

those found by A*, but also because the method can be interrupted at any time while still yielding

a working plan. The simulation platform requires some changes in order to be less

unforgiving and to allow the Agents some measure of ease during the playthroughs.


Table 5.6: Number of playouts and average duration in milliseconds

          Playouts (Easy)   Avg. Duration (Easy)   Playouts (Hard)   Avg. Duration (Hard)
Random               9889                  0.051              2989                  0.094
MCTS-F             768846                  0.704            336343                  1.223
MCTS-B            7031303                  5.929           3107600                  7.079


Chapter 6

Conclusions and Future Work

In this thesis we discussed the creation of an AI Agent able to act in a way befitting a

player in a stealth-based game. We explored diverse possibilities for its creation and chose a path

for its implementation. We also discussed PCG and its potential in creating game environments

tailored to different types of games and testing platforms.

6.1 Goal Assessment

The Agent created was able to walk stealthily through a game map, avoiding lights and patrolling

guards, and hiding in order to avoid being caught. It was also able to plan ahead with a

specific goal in mind, be it hiding from an inquisitive guard or looting an entire map filled with

valuables. The Agent is not yet capable of playing like a human player or passing a Turing test,

but the framework was built so that it can be improved upon and eventually reach a point

where it acts realistically enough to be confused with a human

playthrough. The testing platform created uses PCG concepts to create a different game map every

playthrough, allowing for near-infinite game experiences. It was built in a modular fashion, so that

more stealth elements can be added with relative ease and without significantly increasing the

generation time.

6.2 Future Work

The next step in this work will be improving the Agent: in the pathfinding heuristic, making sure

that it finds a stealthy path in optimal time; in the planning method, using optimization techniques

to simplify the search tree and applying the layered version of the GOAP architecture; and in the

behaviour of the Agent, adding new Actions and Goals and changing some of the existing ones to

reflect a more realistic approach to each level. The knowledge of the Agent can also be improved

with personality values, such as fear or courage, to better emulate a player, and by

optimizing the way the Actions are accessed in order to reduce search times. The PCG tool can

be improved by adding more stealth elements, such as noisy ground or vents that the Agent could

crawl through, and by balancing the maps in terms of difficulty.


References

[Ash10] Daniel Ashlock. Automatic generation of game elements via evolution. In 2010 IEEE Conference on Computational Intelligence and Games, CIG 2010, August 18-21, 2010, pages 289–296, Copenhagen, Denmark, 2010. IEEE Computer Society.

[Bro91] Rodney A Brooks. Intelligence without representation. Artificial Intelligence, 47(1):139–159, 1991.

[CBPD11] D. M. D. Carli, F. Bevilacqua, C. T. Pozzer, and M. C. D'Ornellas. A Survey of Procedural Content Generation Techniques Suitable to Game Development. In Games and Digital Entertainment (SBGAMES), 2011 Brazilian Symposium on, pages 26–35, 2011.

[Col09] Ron Coleman. Fractal analysis of stealthy pathfinding aesthetics. International Journal of Computer Games Technology, (1), 2009.

[DSS93] Randall Davis, Howard Shrobe, and Peter Szolovits. What Is a Knowledge Representation? AI Magazine, 14:17–33, 1993.

[FN72] Richard E Fikes and Nils J Nilsson. STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence, 2(3):189–208, 1972.

[FNO14] H. Fernandes, P. Nogueira, and E. Oliveira. Monte Carlo Tree Search in The Octagon Theory. In 6th International Conference on Agents and Artificial Intelligence (ICAART), pages 328–335, 2014.

[GMS03] Ross Graham, Hugh McCabe, and Stephen Sheridan. Pathfinding in computer games. ITB Journal, pages 57–81, 2003.

[HYK10] Frederick W P Heckel, G Michael Youngblood, and Nikhil S Ketkar. Representational complexity of reactive agents. In 2010 IEEE Conference on Computational Intelligence and Games, CIG 2010, August 18-21, 2010, pages 257–264, Copenhagen, Denmark, 2010. IEEE Computer Society.

[JTSWF10] Martin Jennings-Teats, Gillian Smith, and Noah Wardrip-Fruin. Polymorph: A model for dynamic level generation. In Sixth Artificial Intelligence and Interactive Digital Entertainment Conference, 2010.

[JYT10] Lawrence Johnson, Georgios N Yannakakis, and Julian Togelius. Cellular automata for real-time generation of infinite cave levels. In Proceedings of the 2010 Workshop on Procedural Content Generation in Games, page 10. ACM, 2010.


[KBK07] John-Paul Kelly, Adi Botea, and Sven Koenig. Planning with hierarchical task networks in video games. In Proceedings of the ICAPS-07 Workshop on Planning in Games, 2007.

[Kor00] Richard E Korf. Recent progress in the design and analysis of admissible heuristic functions. American Association for Artificial Intelligence (AAAI), pages 1165–1170, 2000.

[LCCFL12] R. Lara-Cabrera, C. Cotta, and A. J. Fernández-Leiva. Procedural map generation for a RTS game. In 13th International Conference on Intelligent Games and Simulation (Game-On 2012), 14-16 Nov. 2012, pages 53–58, Ostend, Belgium, 2012. EUROSIS-ETI Publications.

[LCF13a] Raúl Lara-Cabrera, Carlos Cotta, and Antonio J Fernández Leiva. Evolving Aesthetic Maps for a Real Time Strategy Game. 2013.

[LCF13b] Raúl Lara-Cabrera, Carlos Cotta, and Antonio J Fernández Leiva. Using Self-Adaptive Evolutionary Algorithms to Evolve Dynamism-Oriented Maps for a Real Time Strategy Game. 2013.

[Lor08] Richard J Lorentz. Amazons discover Monte-Carlo. In Computers and Games, pages 13–24. Springer, 2008.

[MC10] Jean Mehat and Tristan Cazenave. Combining UCT and nested Monte Carlo search for single-player general game playing. IEEE Transactions on Computational Intelligence and AI in Games, 2(4):271–277, 2010.

[MF09] Ian Millington and John Funge. Artificial Intelligence for Games. CRC Press, 2009.

[MSD+13] Giuseppe Maggiore, Carlos Santos, Dino Dini, Frank Peters, Hans Bouwknegt, and Pieter Spronck. LGOAP: adaptive layered planning for real-time videogames. In Computational Intelligence in Games (CIG), 2013 IEEE Conference on, pages 1–8. IEEE, 2013.

[NARO14] P. A. Nogueira, R. Aguiar, R. Rodrigues, and E. Oliveira. Modelling Players' Emotional Reactions in Digital Games Via Physiological Input. In IEEE/WIC/ACM International Conference on Intelligent Agent Technology, 2014.

[NCLMnA99] Dana Nau, Yue Cao, Amnon Lotem, and Hector Muñoz-Avila. SHOP: Simple hierarchical ordered planner. In Proceedings of the 16th International Joint Conference on Artificial Intelligence - Volume 2, pages 968–973. Morgan Kaufmann Publishers Inc., 1999.

[Neg05] Michael Negnevitsky. Artificial Intelligence: A Guide to Intelligent Systems. Pearson Education, 2005.

[NRON13] Pedro A Nogueira, Rui Rodrigues, Eugénio Oliveira, and Lennart E Nacke. Guided emotional state regulation: Understanding and shaping players' affective experiences in digital games. In Proceedings of the Ninth Annual AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), 2013.


[NS13] Pedro Nogueira and José Serra. Personality Simulation in Interactive Agents Through Emotional Biases. In European Conference on Modelling and Simulation, pages 25–31. IEEE, 2013.

[Ork03] Jeff Orkin. Applying goal-oriented action planning to games. AI Game Programming Wisdom, 2(1):217–227, 2003.

[Ork04] Jeff Orkin. Symbolic representation of game world state: Toward real-time planning in games. In AAAI Workshop on Challenges in Game AI, 2004.

[Ork06] Jeff Orkin. Three states and a plan: the AI of FEAR. In Game Developers Conference, volume 2006, page 4. Citeseer, 2006.

[PCWL08] David Pizzi, Marc Cavazza, Alex Whittaker, and Jean-Luc Lugrin. Automatic generation of game level solutions as storyboards. In 4th Artificial Intelligence and Interactive Digital Entertainment Conference, AIIDE 2008, October 22-24, 2008, pages 96–101, Stanford, CA, United States, 2008. AAAI Press.

[SP10] Nathan Sorenson and Philippe Pasquier. Towards a generic framework for automated video game level creation. In Applications of Evolutionary Computation, pages 131–140. Springer, 2010.

[Stu07] Nathan R Sturtevant. Memory-efficient abstractions for pathfinding. In AIIDE, pages 31–36, 2007.

[SW10] Gillian Smith and Jim Whitehead. Analyzing the expressive range of a level generator. In Proceedings of the 2010 Workshop on Procedural Content Generation in Games, page 4. ACM, 2010.

[SWV+08] Maarten P D Schadd, Mark H M Winands, H Jaap Van Den Herik, Guillaume M J-B Chaslot, and Jos W H M Uiterwijk. Single-player Monte-Carlo tree search. In Computers and Games, pages 1–12. Springer, 2008.

[TPY10] Julian Togelius, Mike Preuss, and Georgios N Yannakakis. Towards multiobjective procedural map generation. In Proceedings of the 2010 Workshop on Procedural Content Generation in Games, page 3. ACM, 2010.

[TRR+13] Luís Filipe Teófilo, Rosaldo Rossetti, Luís Paulo Reis, Henrique Lopes Cardoso, and Pedro Alves Nogueira. Simulation and Performance Assessment of Poker Agents. In Francesca Giardini and Frédéric Amblard, editors, Multi-Agent-Based Simulation XIII, volume 7838 of Lecture Notes in Computer Science, pages 69–84. Springer Berlin Heidelberg, 2013.

[TSBdK09] Tim Tutenel, Ruben Michaël Smelik, Rafael Bidarra, and Klaas Jan de Kraker. Using Semantics to Improve the Design of Game Worlds. In AIIDE, 2009.

[VBJP10] Ondřej Vaněk, Branislav Bošanský, Michal Jakob, and Michal Pěchouček. Transiting areas patrolled by a mobile adversary. In 2010 IEEE Conference on Computational Intelligence and Games, CIG 2010, August 18-21, 2010, pages 9–16, Copenhagen, Denmark, 2010. IEEE Computer Society.

[vdLLB13] Roland van der Linden, Ricardo Lopes, and Rafael Bidarra. Designing procedurally generated levels. In Proceedings of the Second Workshop on Artificial Intelligence in the Game Design Process, 2013.


[WJ95] Michael Wooldridge and Nicholas R Jennings. Intelligent agents: Theory and practice. Knowledge Engineering Review, 10(2):115–152, 1995.

[XYS+09] Liu Xiao, Li Yao, He Suoju, Fu Yiwen, Yang Jiajian, Ji Donglin, and Chen Yang. To create intelligent adaptive game opponent by using Monte-Carlo for the game of Pac-Man. In 2009 Fifth International Conference on Natural Computation (ICNC 2009), 14-16 Aug. 2009, volume 5, pages 598–602, Piscataway, NJ, USA, 2009. IEEE.


Appendix A

Procedurally generated simulator

There was a need for a platform to test the AI developed in different scenarios. A platform called

AI Sandbox (http://aisandbox.com, 2014) existed, but it was in closed-access alpha during the

development of this thesis, so a custom testing platform had to be created. Since the agent was

designed to be a thief, mirroring some stealth-based games, the simulator had to include the content

necessary to emulate the gameplay style of those games: the stealth elements and mechanics that

characterize them and the kinds of objectives they include.

A.0.1 Procedural map generation

The generator was based on the map generation of Rogue (Toy and Wichman, 1980): it starts with a

map of a given size in which every cell is a wall, and the elements of the map are dug out of those

walls. The generator has two phases, repeated until the map is deemed ready for the simulation or

a stopping condition is reached. The first phase is the generation phase, where a room is created

at a random location inside the map. The second phase is the test phase, where the map is tested

to see whether it achieves the end-state condition or whether the generation step created any illegal

zones (a blocked path or a room outside the map area). If any illegalities are detected, the

generation step is reversed and attempted again; if the map achieves the end condition, the

generation stops and the simulation begins.
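A toy sketch of this generate-then-test loop, reduced to rectangular rooms on a character grid; doors, corridors, furniture, lighting and room types are left out, and all names and parameters are illustrative rather than the simulator's actual implementation.

```python
import random

WALL, FLOOR = "#", "."

def generate_map(width=40, height=20, max_rooms=8,
                 room_min=3, room_max=7, max_attempts=200):
    """Toy generate-then-test loop: start with an all-wall map, propose a
    rectangular room at a random location, and only dig it out if it passes
    the test (fits inside the map with a wall margin and overlaps nothing)."""
    grid = [[WALL] * width for _ in range(height)]
    rooms_placed = 0
    for _ in range(max_attempts):
        if rooms_placed >= max_rooms:            # end-state condition reached
            break
        w = random.randint(room_min, room_max)   # Step 1: propose a room
        h = random.randint(room_min, room_max)
        if width - w - 2 < 1 or height - h - 2 < 1:
            continue                             # room cannot fit at all
        x = random.randint(1, width - w - 2)
        y = random.randint(1, height - h - 2)
        # Step 2: test; reject rooms that would touch or overlap existing floor
        margin = [(yy, xx) for yy in range(y - 1, y + h + 1)
                            for xx in range(x - 1, x + w + 1)]
        if any(grid[yy][xx] == FLOOR for yy, xx in margin):
            continue                             # illegal room: retry elsewhere
        for yy in range(y, y + h):               # dig the room out of the walls
            for xx in range(x, x + w):
                grid[yy][xx] = FLOOR
        rooms_placed += 1
    return grid

if __name__ == "__main__":
    print("\n".join("".join(row) for row in generate_map()))
```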

A.0.1.1 Step 1: Generation

At the beginning of the first generation phase, the generator receives the following set of

parameters:

• Maximum map size

• Maximum and minimum room size

• Maximum number of rooms

• Valid room types for the map

The generator digs rooms out of the walls of the map based on the following elements:


• Cell: Smallest element available, can contain other objects.

• Room: Element composed of cells, can be a regular room or a corridor.

• Furniture: Elements that can appear inside a room, may be containers with loot and may

allow the agent to hide within, under or behind them.

• Light: Element that affects the lighting of the cell it is contained in and the cells within its

intensity radius.

During this step the generator creates a room composed of cells; each cell may contain a

piece of furniture and a light. The room is generated empty at first, with only the lighting, and is

subsequently filled with furniture.
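The elements above might be represented roughly as follows; the field names are assumptions made for illustration, not the simulator's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Light:
    intensity: float           # lighting added to the containing cell
    radius: int                # cells within this radius are also affected

@dataclass
class Furniture:
    loot: int = 0              # value of the valuables it contains, if any
    hiding_spot: bool = False  # whether the Agent can hide in, under or behind it

@dataclass
class Cell:
    x: int
    y: int
    furniture: Optional[Furniture] = None
    light: Optional[Light] = None

@dataclass
class Room:
    cells: List[Cell] = field(default_factory=list)
    is_corridor: bool = False
```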

A.0.1.2 Step 2: Testing

The testing step checks the room created for accessibility: every piece of furniture must

be accessible, and any doors created must be accessible as well. If the test returns a valid room,

the generator selects a random wall connected to a room, creates a door on it, and jumps back to step 1

to create a new room. If not, the room is emptied of furniture, refurnished, and this step is repeated. If

several attempts fail, the whole room is discarded and the generator jumps back to

step 1 to create a new room with different properties.
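The accessibility check can be sketched as a flood fill over walkable cells; the grid representation and the function names below are assumptions for illustration only.

```python
from collections import deque

def reachable_from(grid, start, walkable="."):
    """Breadth-first flood fill over walkable cells starting at `start`,
    returning the set of (row, col) positions that can be reached."""
    height, width = len(grid), len(grid[0])
    seen = {start}
    queue = deque([start])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if (0 <= ny < height and 0 <= nx < width
                    and grid[ny][nx] == walkable and (ny, nx) not in seen):
                seen.add((ny, nx))
                queue.append((ny, nx))
    return seen

def room_passes_test(grid, door, required_cells):
    """The room is considered valid if every required cell (cells adjacent to
    furniture, cells holding doors) is reachable from the door position."""
    reachable = reachable_from(grid, door)
    return all(cell in reachable for cell in required_cells)
```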
