
Multi-Agent Patrolling on a Budget: Finding the Best Team for the Job

Sara Marie Mc Carthy1 Aaron Schlenker1 Milind Tambe1 Christopher Kiekintveld2

1 University of Southern California   2 University of Texas at El Paso

Abstract. Research in security games currently focuses on optimizing the use of defender resources of a given team, with little focus on the problem of which team to form in the first place. However, in real-world domains, we often face the challenge of selecting a defender team given budget constraints and the requirement that the team be composed of heterogeneous resources, each with different costs and utilities. We present three novel contributions to address this challenge in security games. First, we introduce a scalable security game algorithm for computing defender resource allocations with the ability to model coordination among resources. This algorithm is used in our second contribution, which includes heuristic search algorithms to form optimal teams. Finally, we present detailed experimental results showing that these methods consistently outperform baseline methods and can scale up to real-world problem domains.³

1 Introduction

To date, most research on security games has focused on tactical decision support providing recommendations for day-to-day operations and short-term decisions on how to optimally allocate a set of available resources [2, 16, 7]. However, there is also a great need for strategic decision support at the management level to analyze long-term investments in security resources. The challenge of choosing the best team of security resources on a budget arises in many different domains of security game applications. A current major motivation is the domain of environmental crime, with security game frameworks being applied for the protection of forests, fish and wildlife [9, 25]. For example, illegal forestry in Madagascar poses a challenging problem due to the limited resources available for conservation and enforcement, and the large physical area that must be protected.⁴ There are many different types of resources in this domain that can be used for patrolling: foot patrols, UAVs, stationary cameras and other remote sensing devices. This diversity of resources and the limited budget raises the challenge of how to find the best team of resources to maximize the overall security effectiveness and motivates the development of our model.

3 A similar version of this paper is under review at IJCAI. Since the OptMAS proceedings are non-archival we believe that this is not in conflict. If this does cause a conflict we will withdraw the workshop submission if necessary.

4 http://www.ambatovy.com


This paper addresses the challenge of team formation [17, 10] in security games, bringing together two areas of research in multi-agent systems. Our main goal is to optimize over teams given some algorithms for evaluating their quality. One major challenge we face is that of computational scalability. Solving individual instances of similar security games with a fixed team is already a difficult problem [15]. Therefore, a major emphasis in our work is to develop algorithms that can feasibly find teams for problems with realistic scale.

The main contributions of our work are: (1) a formal model of team formation in security games, (2) a generalized version of the SNARES algorithm [11] for solving security patrolling games with cooperative heterogeneous resources, (3) a best-first search and a tree search algorithm for finding teams, with bounds on team performance that are used for optimization of the algorithms, and (4) experimental evaluation of the algorithm performance and solution quality on both synthetic and realistic problem instances.

2 Motivating Domain and Game Model

Fig. 1: Madagascar cast as a security game domain

Our formal model is based on network security games on graphs but can be generalized to other types of games. In these network security games, the defender's goal is to prevent an adversary from reaching his target by defending edges. While this general model has multiple applications (e.g. interdiction of drug smuggling [5], urban security [12]), as a concrete motivating domain we turn to the illegal activity occurring in Madagascar.

Non-Governmental Organizations such as Alliance Voahary Gasy are working to combat the problem of illegal logging in Madagascar by deploying teams of rangers, police, and local volunteers. Our formal model is designed to capture both the problem of determining the best team of resources to deploy given a limited budget, as well as the problem of optimizing the patrolling policies for specific teams.

As in previous work [11], this security game is played on a graph G = (N, E), consisting of source nodes s ∈ S ⊂ N, intermediary nodes, and target nodes t ∈ T ⊂ N. These are shown in Figure 1 as the triangle, filled circle, and square nodes respectively. The attacker's objective is to find a path to a target node without encountering the defender. Target nodes t_i have an associated payoff of τ(t_i) for the attacker. The value of the payoff is domain dependent; for Madagascar it depends on the density of trees or the value of wood in a particular area. We play a zero-sum game, so that a positive payoff for the attacker results in an equivalent negative payoff for the defender.

The defender's objective is to stop the attacker from reaching any of the targets; she does this by placing resources on edges of the graph. If any edges with a defender resource intersect the attacker path, there is a probability that the defender can stop the attack. One key novelty of our model compared to previous work [11, 22] is that we model heterogeneity in teams by introducing variation in coverage probability and coverage size among defender resources. We also impose path constraints to allow us to better model patrols. The defender has K types of these resources with which to build a team and protect targets. Each resource of type k can cover L_k edges, which must form a subpath on the graph, has a cost of b_k and a probability P_k of detecting an attacker on an edge.

The defender has a total budget B; given the cost of each resource type b_k, a team consists of m_k resources of each type k. A defender pure strategy X_i is a single allocation of all defender resources in a team to a set of the |E| edges of the graph, satisfying the path constraints on each resource. An attacker pure strategy A_j is any path starting at a source node s_j and ending at a target node t_j. The attacker and defender play each strategy with some probability. These distributions over strategies are the mixed strategies a and x. Multiple resources placed on a single edge result in a boosted probability of detecting the attacker, allowing us to model coordination among resources. This boosted probability is given by:

\[ P(e, X_i) = 1 - \prod_{k=1}^{K} (1 - P_k)^{m_{k,e}} \tag{1} \]

where P(e, X_i) is the probability of detecting an attacker on edge e if the defender follows pure strategy X_i, which allocates m_{k,e} resources of type k to edge e. The total probability that a defender pure strategy X_i protects against an attacker pure strategy A_j is given below, where we take the product over all the edges in the attack path:

\[ P(X_i, A_j) = 1 - \prod_{e \in A_j} \bigl(1 - P(e, X_i)\bigr) \tag{2} \]
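To make equations 1 and 2 concrete, here is a minimal Python sketch (not from the paper; the data layout and names are illustrative assumptions) that computes the boosted edge coverage and the probability of intercepting an attacker path under a given pure-strategy allocation.

```python
from math import prod

def edge_coverage(alloc_on_edge, P):
    """Eq. (1): boosted detection probability on one edge.
    alloc_on_edge maps resource type k -> m_{k,e}, the number of
    type-k resources placed on this edge; P maps k -> P_k."""
    return 1.0 - prod((1.0 - P[k]) ** m for k, m in alloc_on_edge.items())

def path_interception(attack_path, allocation, P):
    """Eq. (2): probability that the defender pure strategy (allocation,
    a dict edge -> {type: count}) detects an attacker on attack_path."""
    return 1.0 - prod(1.0 - edge_coverage(allocation.get(e, {}), P)
                      for e in attack_path)

# Example: two 'ranger' resources on edge ('a', 'b'), one 'uav' on ('b', 't')
P = {"ranger": 0.7, "uav": 0.6}
allocation = {("a", "b"): {"ranger": 2}, ("b", "t"): {"uav": 1}}
print(path_interception([("a", "b"), ("b", "t")], allocation, P))
```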

3 TROOPS

We present TROOPS (Team Resource Optimization for Obtaining Patrol Strategies), a solver which acts as a subroutine that evaluates teams for the algorithms presented in Section 4. TROOPS is a double oracle algorithm that builds on SNARES [11]. TROOPS optimally solves the game for teams with perfect coverage probability and approximates solutions for teams with imperfect coverage. This method allows for greater computational scalability, but our search algorithms could function in the same way with any solver, optimal or approximate. For example, we could replace TROOPS with an optimal double-oracle game solver [3, 24] to guarantee global optimality, but would lose several important computational enhancements specific to patrolling problems.


G(N, E)       Graph representing the security domain
τ(t_i)        Payoff of the i-th target t_i
K             Number of defender resource types
L_k           Number of edges covered by the k-th resource type
b_k           Cost of the k-th resource type
P_k           Coverage probability of resource type k
B             Total budget for the team
m_k           Number of defender resources of type k

X = {X_i}     Set of defender pure strategies
x = {x_i}     Defender's mixed strategy over X
A = {A_j}     Set of attacker pure strategies
a = {a_j}     Attacker's mixed strategy over A
U_d(X_i, a)   Defender utility playing X_i against a
U_a(x, A_j)   Attacker utility playing A_j against x

Table 1: Notation and Game Description
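To make the notation in Table 1 concrete, the following minimal Python sketch represents the game description as a data structure; the class and field names are our own illustrative choices, not part of the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class ResourceType:
    L: int       # L_k: number of edges one resource of this type covers
    b: float     # b_k: cost of one resource of this type
    P: float     # P_k: per-edge detection probability of this type

@dataclass
class TeamGame:
    edges: List[Tuple]                  # E: edges of the graph G
    sources: Set                        # S: source nodes
    payoffs: Dict                       # t_i -> tau(t_i) for target nodes
    resource_types: List[ResourceType]  # the K resource types
    budget: float                       # B: total team budget
    team: List[int] = field(default_factory=list)  # m_k for each type k

    def team_cost(self) -> float:
        """Cost of the current team: sum_k m_k * b_k."""
        return sum(m * rt.b for m, rt in zip(self.team, self.resource_types))

    def within_budget(self) -> bool:
        return self.team_cost() <= self.budget
```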

TROOPS is defined in Algorithm 1, with the key idea being to incrementally add attacker and defender strategies until convergence to an equilibrium game value. It is divided into master (MiniMax) and slave (Defender and Attacker Oracle) components. To speed up computation, the slave components operate in two steps, a BetterOracle and a standard Oracle. The BetterOracle is called first and uses a greedy algorithm to quickly select a better-response strategy to add. If the greedy strategy does not increase the expected utility, then the standard Oracle is called and a MIP is solved to generate an approximate best response, which is only added if it increases the utility. The iteration over master and slave continues until the attacker expected utility and the defender expected utility converge to the equilibrium game value and no new strategies are added.

Algorithm 1: TROOPS
 1: Initialize X, A
 2: do:
 3:   (x, a) ← MiniMax(X, A)
 4:   X' ← BetterDefenderOracle(a)
 5:   if U_d(X', a) − U_d(x, a) ≤ ε then
 6:     X' ← DefenderOracle(a)
 7:   X ← X ∪ {X'}
 8:   A' ← BetterAttackerOracle(x)
 9:   if U_a(x, A') − U_a(x, a) ≤ ε then
10:     A' ← AttackerOracle(x)
11:   A ← A ∪ {A'}
12: until convergence
13: return (x, a)

Here X' and A' denote the candidate strategies returned by the oracles, while X and A are the current strategy sets.
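The control flow of Algorithm 1 can be sketched as follows; this is only a skeleton of the double-oracle loop, with the MiniMax master and all oracles passed in as stub callables (the paper implements the exact oracles as MIPs), so the function signatures are assumptions.

```python
EPS = 1e-6

def troops(X, A, minimax, better_def, def_oracle, better_att, att_oracle,
           U_d, U_a, max_iters=1000):
    """Skeleton of the TROOPS double-oracle loop (Algorithm 1).
    minimax(X, A) -> (defender mix x, attacker mix a, defender game value);
    U_d(Xi, a) / U_a(x, Aj) give the payoff of a pure strategy vs. a mixed one."""
    X, A = list(X), list(A)
    for _ in range(max_iters):
        x, a, value = minimax(X, A)       # master over current strategy sets
        Xc = better_def(a)                # greedy better response first
        if U_d(Xc, a) - value <= EPS:
            Xc = def_oracle(a)            # fall back to the (approximate) MIP oracle
        Ac = better_att(x)
        if U_a(x, Ac) - (-value) <= EPS:  # zero-sum: attacker's value is -value
            Ac = att_oracle(x)
        added = False
        if U_d(Xc, a) > value + EPS and Xc not in X:
            X.append(Xc); added = True
        if U_a(x, Ac) > -value + EPS and Ac not in A:
            A.append(Ac); added = True
        if not added:                     # no improving strategy on either side
            return x, a
    return x, a
```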

The novelty of TROOPS compared to previous work [11] lies in the new DefenderOracle (DO) and BetterDefenderOracle (BDO), which can now handle heterogeneous resources, model the coordination of multiple resources, and solve for teams with varying edge coverage sizes subject to path constraints and non-unitary coverage probability on edges. The two attacker oracles remain the same as in SNARES.

3.1 Defender Oracle

The DO considers the distribution a over the current set of attacker strategies A, and adds a best response defender strategy X_i. The objective is to maximize the defender utility, given in equation 3.

\[ U_d(X_i, a) = -\sum_{j} a_j \bigl(1 - P(X_i, A_j)\bigr)\, \tau(t_j) \tag{3} \]

\[ \sum_{k,e} \lambda^{k,i}_{m,e} = K; \qquad \sum_{e} \lambda^{k,i}_{m,e} = L_k; \qquad \sum_{m,e} \lambda^{k,i}_{m,e} = m_k \tag{4} \]

\[ \sum_{e \in out(n)} \lambda^{k,i}_{m,e} = \sum_{e \in in(n)} \lambda^{k,i}_{m,e} \qquad \forall\, n \neq n_s, n_{s+k} \tag{5} \]

\[ \sum_{e \in out(n_s)} \lambda^{k,i}_{m,e} = 1; \qquad \sum_{e \in in(n_{s+k})} \lambda^{k,i}_{m,e} = 1; \qquad \lambda^{k,i}_{m,e} \in \{0, 1\} \tag{6} \]

The significant difference in the DO occurs when we generalize to accommodate arbitrary edge coverage as well as non-unitary coverage probability. We achieve this with the constraints given in equations 4-6. Path constraints are enforced with equations 5-6. We consider every node n_s ∈ N as a potential starting point of the path and search for a subpath through the set of nodes within k edges of n_s, with the path terminating on any node k edges away, n_{s+k}. The optimization problem is one of allocating resources to edges e by setting λ^{k,i}_{m,e} = 1, which corresponds to the i-th defender strategy having the m-th resource of type k allocated to edge e.

In the case where edges have unit coverage probability, P(X_i, A_j) ∈ {0, 1} depends only on whether an attacker path intersects a defender edge. When resources have coverage probability less than 1, we calculate the probabilities additively (7). This means that when the DO calculates the utility of a particular strategy X_i, it overestimates the value of playing that strategy, as the additive probability calculation always results in a higher probability coverage than the correct multiplicative one. This can lead to situations where the DO returns a response that is already in the master program rather than the correct best response, potentially leading to premature convergence of the master.

\[ P(e, X_i) = \sum_{k, m_k} P_k; \qquad P(X_i, A_j) = \sum_{e \in A_j} P(e, X_i) \tag{7} \]

The master component is always optimal and calculates the probabilities using equations 1 and 2; it is the slave component which is optimal for unit coverage probability resources and approximate for partial coverage probability resources.
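The overestimate introduced by the additive approximation of equation 7 is easy to see numerically; the following small sketch (illustrative values only) compares it with the exact multiplicative coverage of equation 1 on a single edge.

```python
from math import prod

def exact_coverage(probs):
    """Eq. (1) on one edge: 1 - prod(1 - P_k) over the resources placed there."""
    return 1.0 - prod(1.0 - p for p in probs)

def additive_coverage(probs):
    """Eq. (7) on one edge: the additive approximation used in the slave MIP."""
    return sum(probs)

probs = [0.6, 0.7]                # two imperfect resources on the same edge
print(exact_coverage(probs))      # 0.88 (correct, multiplicative)
print(additive_coverage(probs))   # 1.3  (overestimate, can even exceed 1)
```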

3.2 Better Defender Oracle

The new BDO is a greedy algorithm which maximizes potential gain to quickly choose the next best strategy. The method is outlined in Algorithm 2. In SNARES, weights w_e are assigned to each edge in the graph based only on the likelihood that an attacker uses that edge, as well as the value τ(t_j) of the target of the attacker path:

\[ w_e = \sum_{j} A_{j,e}\, a_j\, \tau(t_j) \tag{8} \]

However, these weights do not take into account the imperfect coverage of defender resources. In equation 9 we define a new metric for edge value as the potential gain the defender receives from placing a resource of type k on edge e, using the weights and the coverage probability under the strategy with the new resource, X_i ∪ λ_k, and without it, X_i.

\[ g^{k}_{e} = w_e \bigl( P(e, X_i \cup \lambda_k) - P(e, X_i) \bigr) \tag{9} \]

The maximum gain edge e* is chosen as the anchor point for the next defender resource that needs to be assigned. From this anchor, the bestSet algorithm returns the maximum gain subpath of size equal to the resource coverage, and the resource is assigned to this set of edges. This is done by considering all the incoming and outgoing edges of the anchor. We then incrementally increase the set of anchor edges, adding the next best edge until a subpath of size L_k is found. This is repeated until all resources have been assigned.

Algorithm 2: Better Defender Response
 1: Initialize X ← 0; w_e ← 0
 2: for each resource
 3:   for each A_j ∈ A
 4:     for each edge e ∈ A_j
 5:       ΔP_e ← P(e, X ∪ λ^k_e) − P(e, X)
 6:       w_e ← w_e + a_j τ(t_j)
 7:       g^k_e ← ΔP_e w_e
 8:   e* ← argmax_e(g^k_e)
 9:   {e} ← bestSet(e*, L_k)
10:   X ← X ∪ {e}
11: return X
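A compact Python rendering of the greedy gain computation in Algorithm 2 is sketched below; bestSet is left as a stub and the data layout (attacker mix, coverage callback) is our own illustrative choice.

```python
def better_defender_response(attacker_mix, resources, coverage, best_set):
    """Greedy sketch of Algorithm 2.
    attacker_mix: list of (a_j, tau_j, path_edges) per attacker pure strategy A_j.
    resources:    list of (k, L_k) pairs, one per resource still to place.
    coverage(X, e, k): coverage of edge e under allocation X with one extra
                       type-k resource added (pass k=None for current coverage).
    best_set(e_star, L_k, gains): best subpath of L_k edges around the anchor."""
    X = []                                    # growing allocation: (edge, type) pairs
    for k, L_k in resources:
        gains = {}
        for a_j, tau_j, path in attacker_mix:
            for e in path:
                delta = coverage(X, e, k) - coverage(X, e, None)    # marginal gain
                gains[e] = gains.get(e, 0.0) + delta * a_j * tau_j  # eq. (8) + (9)
        e_star = max(gains, key=gains.get)    # anchor edge with maximum gain
        for e in best_set(e_star, L_k, gains):
            X.append((e, k))                  # assign this resource to the subpath
    return X
```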

4 Team Formation

We now consider team formation with budgetary constraints in the FORTS (Forming Optimal Response Teams for Security) algorithms. It is possible to optimize team composition by exhaustively running every team through TROOPS to calculate the game values. This is an expensive computation, so to speed it up we introduce two search methods, p-FORTS and t-FORTS, for finding teams with the highest value given by the solver for an individual team. To guide these searches we use two novel heuristics: (1) Attacker Oracle Cutoff (AOC) and (2) Virtual Resources (VR). Our heuristics provide an upper bound on the expected defender utility for a given team without needing to fully solve the game with TROOPS. We now prove that these are admissible heuristics for our model.


4.1 Search Heuristics

Attacker Oracle Cutoff (AOC) The AOC finds an upper bound on the game value for a complete team. We do this by limiting the number of attacker strategies to a constant c, considerably reducing the number of master-slave iterations.

Theorem 1. The AOC provides an upper bound on the defender expected utility, U_d, of a complete team, and this bound is monotonically decreasing as c varies from 1 to |A|.

Proof (Proof Sketch). The AOC introduces a cutoff constant c that limits the number of possible pure strategies added to the attacker strategy set A. We label this limited mixed attacker strategy a^c. The corresponding defender utility playing against a^c is U_d^c and the attacker utility is U_a^c. To prove U_d^c ≥ U_d, where U_d has cutoff c = |A|, it suffices to show U_d^c ≥ U_d^{c+1}. Now take an attacker mixed strategy a^{c+1} where the cutoff is raised by one from a^c, allowing one new pure strategy A_{j+1} to be added. We then have two possible cases. In case 1, assume A_{j+1} does not increase the utility of the attacker. Then the attacker sets the probability of playing A_{j+1} to zero. This means a^c = a^{c+1}, and since we have a zero-sum game, a^c = a^{c+1} implies U_a^c = U_a^{c+1} and U_d^c = U_d^{c+1}. In case 2, assume A_{j+1} does raise the attacker's utility. In this case the attacker uses the new strategy in their mixed strategy a^{c+1}. Since the new strategy raises the attacker's utility we know U_a^c < U_a^{c+1}, which implies U_d^c > U_d^{c+1} because the game is zero-sum. Therefore, U_d^c ≥ U_d^{c+1} and it follows that U_d^c ≥ U_d. Further, the defender's utility bound is monotonically decreasing as c varies from 1 to |A|, using the fact that U_d^c ≥ U_d^{c+1}.
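Restating the conclusion of the proof as a single chain of inequalities (a summary only, nothing beyond what is proved above):

\[ U_d^{1} \;\ge\; U_d^{2} \;\ge\; \dots \;\ge\; U_d^{c} \;\ge\; U_d^{c+1} \;\ge\; \dots \;\ge\; U_d^{|A|} \;=\; U_d . \]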

Virtual Resources The AOC provides an upper bound on the value of a complete team. However, it does not bound the value of partially formed teams. Given a partially formed team, we would like to estimate its potential value, even if we do not know what resources will be added. We introduce the idea of virtual resources to provide bounds on partial teams. A virtual resource λ_v is a resource which covers one edge (L_v = 1), has the lowest cost per edge, b_v = min_k b_k/L_k, and the highest coverage probability, P_v = max_k P_k, of any of the real resources. If we use the remaining budget on virtual resources, we show that each such team will dominate the best possible real team.

Theorem 2. A partial team saturated with virtual resources always dominates one saturated with real resources.

Proof (Proof Sketch). Consider a partial team of real resources. We can use the remaining budget on a set of virtual resources or a set of real resources. Consider the optimal allocation of both of these sets of resources and assume that the real team of resources has a higher game value than the virtual resources. This implies that the team of real resources has a higher probability of detecting the attacker than the team of virtual resources. The costs of the virtual resources are set so that we will always be able to cover a set of edges greater than or equal to the number of edges covered by a real team. It is then always possible to at least cover the same edges that the real team covers with the virtual resources. The only way that the real team would be able to have a higher value is if the coverage probability on each edge were higher than for the team of virtual resources. Since we set the virtual resources to have the highest coverage probability of any resource, the team of virtual resources must have an equivalent or greater probability coverage. Therefore a team of virtual resources always has a value at least that of a real team.
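The virtual resource itself is simple to construct. The sketch below (reusing the illustrative ResourceType class from the earlier snippet; names are our own) builds λ_v and computes how many copies of it the remaining budget can buy when saturating a partial team.

```python
def make_virtual(resource_types):
    """Virtual resource lambda_v: covers one edge, has the lowest cost per
    edge (min_k b_k / L_k) and the highest coverage probability (max_k P_k)."""
    b_v = min(rt.b / rt.L for rt in resource_types)
    P_v = max(rt.P for rt in resource_types)
    return ResourceType(L=1, b=b_v, P=P_v)

def saturate_with_virtuals(partial_team_cost, budget, resource_types):
    """Return the virtual resource and how many copies the leftover budget buys."""
    v = make_virtual(resource_types)
    remaining = budget - partial_team_cost
    count = int(remaining // v.b) if v.b > 0 else 0
    return v, count
```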

4.2 t-FORTS

Based on the heuristics mentioned above, the first method we propose is a branch and bound search which incrementally adds resources to a team until the cost of the team saturates the budget. The root node is initialized with an empty team, with one additional resource added to each child node. Intermediate nodes contain partially formed teams while leaf nodes contain complete teams.

The expansion of nodes follows a DFS strategy and is described in Algorithm 4 given a node n. The AOC is used to estimate the value of child nodes and guide the search. However, given that the search space is composed of partial teams, we saturate each child with virtual resources to upper bound their game values. If a child node is saturated it is evaluated using TROOPS. The search terminates when a leaf node with value greater than or equal to the bounds for all other expanded nodes is found, and it is guaranteed to return the optimal solution.

Algorithm 4: t-FORTS(n)
 1: if n is leaf return n
 2: for each resource type k
 3:   n_child = n ∪ λ_k
 4:   if n_child is saturated
 5:     n_child.value = TROOPS(n_child)
 6:   else
 7:     saturateTeam(n_child)
 8:     n_child.value = AOC-TROOPS(n_child)
 9:   add n_child to children of n

There are several optimizations we make to speed up the search. Since child nodes share similar resources with their parents, the evaluation using TROOPS is warm-started with the solution of the parent node. Additionally, we generate quicker upper bounds by limiting the number of virtual resources added. If virtual resources have unit coverage probability, we only add a number equal to the size of the total min-cut of the graph. With enough virtual resources to fully cover the min-cut of the graph, we can completely separate the source nodes from the target nodes, so limiting the number of virtual resources to the size of the min-cut does not decrease the game value.
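A sketch of the min-cut limit on unit-coverage virtual resources, using networkx as an illustrative implementation choice (the paper does not prescribe one). A super-source and super-sink reduce the multi-source, multi-target separation to a single s-t minimum cut.

```python
import networkx as nx

def min_cut_size(edges, sources, targets):
    """Size of a minimum edge cut separating all sources from all targets;
    an upper bound on the number of useful unit-coverage virtual resources."""
    G = nx.DiGraph()
    # each real edge has unit capacity; for an undirected graph,
    # add each edge in both directions
    G.add_edges_from(edges, capacity=1)
    for s in sources:
        G.add_edge("SRC", s)     # no capacity attribute -> treated as infinite
    for t in targets:
        G.add_edge(t, "SNK")
    cut_value, _ = nx.minimum_cut(G, "SRC", "SNK")
    return int(cut_value)
```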

4.3 p-FORTS

Our second method, p-FORTS, is a best-first search utilizing a priority queue. It first finds an upper bound on the value of all teams using a low cutoff for the AOC. Teams are prioritized by their AOC value. Next, the team at the head of the queue is reevaluated with a higher cutoff, and if an exact game value is found before the cutoff is reached, the search terminates. Otherwise, if the value is less than that of the next team in the queue, we re-insert the first team and repeat the process. The first step of p-FORTS is expensive since the cutoff solver must be run for each team. To mitigate this we introduce a budget limit, so that only teams within a certain range of the maximum budget are considered in the search. This does not sacrifice optimality because the game value of the optimal team increases monotonically with the budget. One reason to relax this constraint is if we wish to return the optimal team with the lowest budget.

An outline of the method is given in Algorithm 3, with AOC(c) referring to running the Attacker Oracle Cutoff with cutoff value c, and v_t, v_{t+1} representing the values of the teams first and second in the priority queue, respectively.

Algorithm 3: p-FORTS
 1: Run AOC(5) for each team and insert into pq
 2: Poll head of pq to get t
 3: Run AOC(5 + k) on t
 4: if cutoff is not used and v_t ≥ v_{t+1}
 5:   return t
 6: while v_t ≥ v_{t+1} and cutoff is not used
 7:   run AOC(5 + n·k)
 8:   check condition in line 4
 9: if v_t < v_{t+1}
10:   reinsert t into pq
11: Repeat 2-10 until a team is returned
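A best-first skeleton of p-FORTS using Python's heapq is sketched below; aoc_value is a stub for the cutoff solver, and because game values are defender utilities (typically negative), the negated bound is pushed so the most promising team is polled first. All names are illustrative.

```python
import heapq
from itertools import count

def p_forts(teams, aoc_value, base_cutoff=5, step=5):
    """Best-first search over candidate teams (Algorithm 3 skeleton).
    aoc_value(team, cutoff) -> (upper bound on the game value, hit_cutoff flag);
    when hit_cutoff is False the returned value is exact."""
    tie = count()                          # tiebreaker so teams are never compared
    pq = []
    for team in teams:
        bound, _ = aoc_value(team, base_cutoff)
        heapq.heappush(pq, (-bound, next(tie), team))   # max-heap via negation
    while pq:
        _, _, team = heapq.heappop(pq)
        cutoff = base_cutoff
        while True:                        # raise cutoff; eventually reaches |A|
            cutoff += step
            bound, hit_cutoff = aoc_value(team, cutoff)
            next_best = -pq[0][0] if pq else float("-inf")
            if not hit_cutoff and bound >= next_best:
                return team, bound         # exact value and still the best team
            if bound < next_best:          # dropped below the next team
                heapq.heappush(pq, (-bound, next(tie), team))
                break
    return None
```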

5 Evaluation

We present three sets of experimental results: (1) we evaluate the scalability and runtime performance of p-FORTS and t-FORTS on several classes of random graphs, (2) we investigate the benefit of optimizing team composition as well as the diversity of optimal teams, and (3) we demonstrate their ability to scale up to the real world by testing performance on a graph generated from map data of Madagascar. All values are averaged over 20 trials, each run on a different generated graph. Within a trial, each search method is run on the same graphs to ensure consistency in the problem difficulty across all search methods. The experiments were run on a Linux cluster with HP-SL250, 2.4 GHz, dual-processor machines. All reported differences are statistically significant (p ≤ 0.05). We use the following graphs:

(1) Grid graphs consist of a grid with width w, height h, sources s, targets t, and nearest neighbor connections between nodes, labeled as G(w, h, s, t). We define start and end points for the attacker, with sources located at one end and targets at the other. Grid graphs are a commonly used model in environmental crime games [25].

(2) Geometric graphs provide a good approximation of real road networks [6], allowing us to model the networks of villages and rivers in forest regions. The graphs consist of n nodes distributed randomly in a plane. Nodes are connected based on their distance, which determines the density of the graph.

Fig. 2: Runtime scalability with increasing search space. (Panels 2.1-2.3 plot game value and runtime against budget for p-FORTS, t-FORTS, and brute force (BF); panel 2.4 plots the percentage decrease in runtime against budget for 2, 10, and 20 processors.)
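A minimal sketch of how such random geometric test instances could be generated with networkx (an implementation choice of ours; the paper does not specify a generator). The connection radius controls the edge density, and sources and targets are sampled from the node set; the target payoff value is illustrative.

```python
import random
import networkx as nx

def make_geometric_instance(n=40, radius=0.25, n_sources=5, n_targets=5, seed=0):
    """Random geometric graph: n nodes placed uniformly in the unit square,
    edges between nodes closer than `radius` (density grows with the radius)."""
    G = nx.random_geometric_graph(n, radius, seed=seed)
    rng = random.Random(seed)
    nodes = list(G.nodes())
    rng.shuffle(nodes)
    sources = nodes[:n_sources]
    targets = nodes[n_sources:n_sources + n_targets]
    payoffs = {t: 50 for t in targets}   # illustrative target values
    return G, sources, targets, payoffs
```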

5.1 Scalability

We first evaluate the performance of p-FORTS and t-FORTS as the size of the search space increases. We test two types of graphs and two sets of resource types. We hold the number of resource types constant at K = 3 with resource costs b = {6, 8, 30}. The first set of resources Λ1 has varied edge coverages L1 = {2, 3, 5} and constant coverage probabilities P1 = {1, 1, 1}, while the second set Λ2 has constant edge coverage L2 = {1, 1, 1} and varied coverage probabilities P2 = {0.6, 0.7, 0.8}. We use a brute force search as a baseline. Both p-FORTS and t-FORTS outperform the brute force search and scale much better to larger budgets.

Varying Coverage Probability Figure 2.1 shows the runtimes for p-FORTS, t-FORTS and brute force as a function of increasing budget on G(4, 4, 4, 4) with target values of 50. The average game value of the optimal team is superimposed, with game value on the left y-axis, runtime on the right y-axis and budget on the x-axis. Figure 2.2 shows the runtimes for geometric graphs with 40 nodes, 5 sources, and 5 targets, which have on average 100 edges. The average game value is again superimposed on the y-axis. Several instances of the brute force search were cut off at 12000 seconds and are included in the data at this runtime. On both graphs p-FORTS and t-FORTS outperform the brute force search, and their scalability appears to stay relatively level with increasing search space, while the brute force runtime increases exponentially. We see runtime speedups of over 2000% on grid graphs and more than 10000% on geometric graphs.

Fig. 3: Component evaluation and comparison of searches. (Panel 3.1: runtime vs. budget for budget-limited vs. no-limits p-FORTS; panel 3.2: runtime vs. number of edges for no-limits vs. MinCut-limited search; panel 3.3: runtime vs. number of edges for t-FORTS vs. p-FORTS.)

Varying Coverage Size Figure 2.3 shows the runtimes for searches run on G(5, 5, 5, 5) graphs. The budget varies on the x-axis, with the left y-axis showing game value and the right showing runtime. Both search methods consistently outperform the brute force and show a decrease in runtime as the game value approaches 0. The maximum runtimes for our search methods occur around a budget of 20 to 25. This is due to the deployment-to-saturation phenomenon which occurs in these types of network models [13].

We observe that t-FORTS and p-FORTS, run with the optimal and approximate versions of TROOPS, scale in a similar fashion, indicating that the sub-optimality of the DO in TROOPS for imperfect coverage teams does not drastically change the behaviour of our search methods.

Figure 2.4 shows the runtime reduction for the parallel p-FORTS on 40-node geometric graphs with teams of varying coverage. The number of processors varies from 2 to 20, with results showing a 25% to 40% decrease in runtime. The speedup is not linear in part because CPLEX does some internal multithreading, and because the parallel part of the algorithm is only used in the initial phase.

Limiting the Search and Constant Mincut Figures 3.1 and 3.2 show the benefit of lower bounding the budget in p-FORTS and limiting the number of virtual resources in t-FORTS. A lower bound of B − 5 is used for the budget limit. We ran the budget limit tests on G(5, 5, 5, 5) graphs, with teams of Λ1 resources. In all cases both the budget-limited search and the unlimited search returned the same game value. We see an 18% to 80% decrease in runtime as the size of the search space increases. We tested limiting virtual resources on G(w, 5, 5, 5) with w varied and Λ1 resources. The height of the graph was held constant at 5 so the min-cut size did not change. In all cases the MinCut-Limited search outperformed the unlimited search. Although in most cases it appears p-FORTS outperforms t-FORTS, there are some domains where this is not true. In Figure 3.3 we compare the two searches on a lattice graph with constant size min-cut and show that t-FORTS scales much better on large graphs that have small min-cut sizes.

Fig. 4: Team composition investigation. (Panel 4.1: game value vs. budget for Optimal-5, Random-5, Optimal-4 and Random-4 teams; panel 4.2: composition of the optimal teams by resource type.)

5.2 Team Composition

Here we demonstrate the value of optimizing over team composition. We first compare the game values of optimal teams found by p-FORTS and t-FORTS to randomly generated teams. In these experiments we consider a larger set of resource types, K = 6, and vary both the edge coverage L = {2, 2, 5, 3, 3, 6} and the coverage probability P = {0.7, 0.9, 0.7, 0.6, 0.6, 0.6}. The resource costs are b = {6, 8, 10, 6, 8, 10}. Additionally, we only consider teams which saturate the budget, so that no team dominates another. We average the game values of 20 randomly generated teams on G(5, 5, 5, 5) and G(4, 4, 4, 4) graphs with target values of 200. The results are shown in Figure 4.1 with budget on the x-axis and game value on the y-axis. There are large improvements in performance when optimizing over team composition, with optimal teams providing improvements of over 90% in game value in some cases. We also investigate the diversity of the optimal teams found and show that the best teams often contain multiple different types of resources and vary across different graphs. For the first set of experiments using only three resource types, we plot each team on a 3-dimensional unit square based on the number of each resource type in the team, shown in Figure 4.2.

5.3 Madagascar Example

We test p-FORTS and t-FORTS using a map generated from an at-risk forest region of Madagascar, the Masoala National Park. The graph has 109 nodes and 262 edges and follows the road and river network of the forest, which are the primary means of human access to the area. We choose two villages as source nodes and three targets based on infraction locations reported by the national park patrols. To more closely model real-world resources, we choose resources from a set that varies both coverage size and coverage probability, using the same set of resources as in the previous section. Data is averaged over 20 runs. Table 5.2 demonstrates that FORTS can scale up to real-world domains. Figure 5.1 also shows that optimizing over team composition provides significant savings in real-world domains as well.

(5.1) Team optimization on the Masoala graph: game value vs. budget (20 to 50) for optimal and random teams.

(5.2) Runtime on the Masoala graph:

Budget   t-FORTS (s)   p-FORTS (s)
  20         223           244
  30        1759          1389
  40        1413           542

Fig. 5: Testing on Madagascar graphs

6 Conclusion and Related Work

The problem of how to allocate agents to specific roles to maximize performance has been studied for mission rehearsal and RoboCupRescue [20]. Other research has considered how to lead an ad-hoc team to the optimal joint action [1], automatically configuring a network of agents [8], and agents cooperating in a board game [21]. There has also been some work on team formation in adversarial settings, though not specifically on security games. Team formation has been used to beat human players at fantasy football [19] and to build teams of diverse agents to improve the performance of a Go-playing agent [18]. [4] studies multi-objective optimization for a coalition and how trust affects the success or failure of a mission in a tactical game. Additional work on optimally allocating security resources to scheduling and patrolling problems has considered only the optimization problem for a known, fixed set of resources [14, 23].

We study the problem of optimizing teams of resources for security games, bridging these research areas. Selecting the best combination of resources to invest in is a key strategic question in many security domains. Our model and algorithms provide a new set of tools to answer these team formation questions based on the foundation of tactical analysis for the individual teams using security games. Our empirical results clearly demonstrate the importance of team formation; optimal teams are substantially better than arbitrary teams in a wide variety of settings. Solving the team formation problem for graphs of realistic scale required several novel algorithmic advances, including generalizations of methods for evaluating individual teams and new methods for finding the right team to invest in given budget restrictions.

References

1. Agmon, N., Stone, P.: Leading ad hoc agents in joint action settings with multiple teammates. In: AAMAS. pp. 341–348 (2012)

2. Basilico, N., Gatti, N., Amigoni, F.: Leader-follower strategies for robotic patrolling in environments with arbitrary topologies. In: AAMAS (2009)

3. Bosansky, B., Kiekintveld, C., Lisy, V., Cermak, J., Pechoucek, M.: Double-oracle algorithm for computing an exact Nash equilibrium in zero-sum extensive-form games. In: AAMAS. pp. 335–342 (2013)

4. Cho, J.H., Chen, I.R., Wang, Y., Chan, K.S., Swami, A.: Multi-objective optimization for trustworthy tactical networks: A survey and insights. Tech. rep., DTIC Document (2013)

5. Dell, M.: Trafficking networks and the Mexican drug war (job market paper). Tech. rep., Working Paper (2011)

6. Eppstein, D., Goodrich, M.T.: Studying (non-planar) road networks through an algorithmic lens. In: Proceedings of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. p. 16. ACM (2008)

7. Fiedrich, F., Gehbauer, F., Rickers, U.: Optimized resource allocation for emergency response after earthquake disasters. Safety Science 35(1), 41–57 (2000)

8. Gaston, M.E., desJardins, M.: Agent-organized networks for dynamic team formation. In: AAMAS. pp. 230–237. ACM (2005)

9. Haskell, W.B., Kar, D., Fang, F., Tambe, M., Cheung, S., Denicola, L.E.: Robust protection of fisheries with COMPASS. In: IAAI (2014)

10. Hunsberger, L., Grosz, B.J.: A combinatorial auction for collaborative planning. In: Proceedings of the Fourth International Conference on MultiAgent Systems. pp. 151–158. IEEE (2000)

11. Jain, M., Conitzer, V., Tambe, M.: Security scheduling for real-world networks. In: AAMAS. pp. 215–222 (2013)

12. Jain, M., Korzhyk, D., Vanek, O., Conitzer, V., Pechoucek, M., Tambe, M.: A double oracle algorithm for zero-sum security games on graphs. In: AAMAS. pp. 327–334 (2011)

13. Jain, M., Leyton-Brown, K., Tambe, M.: The deployment-to-saturation ratio in security games. In: AAAI (2012)

14. Kiekintveld, C., Jain, M., Tsai, J., Pita, J., Ordonez, F., Tambe, M.: Computing optimal randomized resource allocations for massive security games. In: AAMAS. pp. 689–696 (2009)

15. Korzhyk, D., Conitzer, V., Parr, R.: Complexity of computing optimal Stackelberg strategies in security resource allocation games. In: AAAI (2010)

16. Letchford, J., Vorobeychik, Y.: Computing randomized security strategies in networked domains. In: AARM (2011)

17. Liemhetcharat, S., Veloso, M.: Modeling and learning synergy for team formation with heterogeneous agents. In: AAMAS '12. pp. 365–374. Richland, SC (2012), http://dl.acm.org/citation.cfm?id=2343576.2343628

18. Marcolino, L.S., Xu, H., Jiang, A.X., Tambe, M., Bowring, E.: Give a hard problem to a diverse team: Exploring large action spaces. In: AAAI (2014)

19. Matthews, T., Ramchurn, S.D., Chalkiadakis, G.: Competing with humans at Fantasy Football: Team formation in large partially-observable domains. In: Proceedings of the 26th Conference of the Association for the Advancement of Artificial Intelligence. pp. 1394–1400 (2012)


20. Nair, R., Tambe, M.: Hybrid BDI-POMDP framework for multiagent teaming. Journal of Artificial Intelligence Research 23(1), 367–420 (2005), http://dl.acm.org/citation.cfm?id=1622503.1622512

21. Obata, T., Sugiyama, T., Hoki, K., Ito, T.: Consultation algorithm for Computer Shogi: Move decisions by majority. In: Computers and Games '10. Lecture Notes in Computer Science, vol. 6515, pp. 156–165. Springer (2011)

22. Okamoto, S., Hazon, N., Sycara, K.: Solving non-zero sum multiagent network flow security games with attack costs. In: AAMAS. pp. 879–888 (2012)

23. Shieh, E., Jiang, A.X., Yadav, A., Varakantham, P., Tambe, M.: Unleashing Dec-MDPs in security games: Enabling effective defender teamwork. In: Proceedings of the European Conference on Artificial Intelligence (ECAI). pp. 2340–2345 (2014)

24. Vanek, O., Bosansky, B., Jakob, M., Pechoucek, M.: Transiting areas patrolled by a mobile adversary. In: Computational Intelligence and Games (CIG). pp. 9–16. IEEE (2010)

25. Yang, R., Ford, B., Tambe, M., Lemieux, A.: Adaptive resource allocation for wildlife protection against illegal poachers. In: International Conference on Autonomous Agents and Multiagent Systems (AAMAS) (2014)