intention progression using quantitative summary information
TRANSCRIPT
Intention Progression usingQuantitative Summary InformationYuan Yao
Zhejiang University of Technology
Natasha Alechina
Utrecht University
Brian Logan
Utrecht University
John Thangarajah
RMIT University
ABSTRACTA key problem for Belief-Desire-Intention (BDI) agents is intentionprogression, i.e., which plans should be selected and how the ex-
ecution of these plans should be interleaved so as to achieve the
agent’s goals. Monte-Carlo Tree Search (MCTS) has been shown
to be a promising approach to the intention progression problem,
out-performing other approaches in the literature. However, MCTS
relies on runtime simulation of possible interleavings of the plans
in each intention, which may be computationally costly. In this pa-
per, we introduce the notion of quantitative summary informationwhich can be used to estimate the likelihood of conflicts between
an agent’s intentions. We show how offline simulation can be used
to precompute quantitative summary information prior to execu-
tion of the agent’s program, and how the precomputed summary
information can be used at runtime to guide the expansion of the
MCTS search tree and avoid unnecessary runtime simulation. We
compare the performance of our approach with standard MCTS
in a range of scenarios of increasing difficulty. The results suggest
our approach can significantly improve the efficiency of MCTS in
terms of the number of runtime simulations performed.
KEYWORDSIntention progression; BDI agents; Qualitative summary informa-
tion; Monte-Carlo Tree Search
ACM Reference Format:Yuan Yao, Natasha Alechina, Brian Logan, and John Thangarajah. 2021.
Intention Progression using Quantitative Summary Information. In Proc.of the 20th International Conference on Autonomous Agents and MultiagentSystems (AAMAS 2021), Online, May 3–7, 2021, IFAAMAS, 9 pages.
1 INTRODUCTIONIn Belief-Desire-Intention-based (BDI) agents [14] the behaviour
of an agent is specified in terms of beliefs, goals, and plans. Beliefsrepresent the agent’s information about the environment (and it-
self), goals represent desired states of the environment the agent
is trying to bring about, and plans are the means by which the
agent can achieve its goals. Plans consist of primitive actions that
directly change the state of the environment, and subgoals which
are in turn achieved by subplans. When the agent commits to a
particular plan to achieve a (top-level) goal, an intention is formed.
At each deliberation cycle, the agent chooses which of its multiple
intentions it should progress (i.e., intention selection) and, if the
Proc. of the 20th International Conference on Autonomous Agents and Multiagent Systems(AAMAS 2021), U. Endriss, A. Nowé, F. Dignum, A. Lomuscio (eds.), May 3–7, 2021, Online.© 2021 International Foundation for Autonomous Agents and Multiagent Systems
(www.ifaamas.org). All rights reserved.
next step within the selected intention is a subgoal, chooses the best
plan to achieve it (i.e., plan selection). These two choices together
form the intention progression problem [13].
An agent typically pursues multiple goals in parallel. In many
BDI agent programming languages and platforms, by default, the
intentions for each of the agent’s top-level goals are progressed
in round robin fashion, i.e., at each deliberation cycle, the next
step of the next intention in sequence is executed [2, 32]. The
resulting interleaving of steps in plans in different intentions may
result in conflicts, i.e., the execution of a step in one plan may
make the execution of a step in another concurrently executing
plan impossible. Most languages allow a developer to over-ride the
default scheduling behaviour to avoid conflicts, e.g., by over-riding
a method or by writing a meta-plan. However, it is difficult for
the developer to anticipate all possible conflicts in advance, and it
would be desirable for the agent to solve the intention progression
problem itself, i.e., which plans should be selected and how the
execution of these plans should be interleaved so as to achieve the
agent’s goals without conflicts.
A number of approaches to various aspects of the intention
progression problem have been proposed in the literature, includ-
ing, summary-information-based (SI) [21, 22], coverage-based (CB)
[29, 30] and Monte-Carlo Tree Search-based (MCTS) [35–37] ap-
proaches. In [37] Yao et al. showed that the MCTS-based approach
SA out-performed round robin, first in first out, SI, and CB intention
progression in static and dynamic environments, both in synthetic
domains and a more realistic scenario based on the Miconic-10 ele-
vator domain [12]. However, SA relies on runtime pseudorandom
simulations of different interleavings of the plans in each intention
to determine which intention to progress at the current delibera-
tion cycle. Random simulation has the advantage that all possible
interleavings have a probability of being explored. However the
random nature of the simulations often results in the generation
of many “obviously bad” interleavings, which is computationally
costly. If the computational budget is (too) small, the performance
of MCTS can be unstable in problems with a large search space,
such as intention progression [33].
In this paper, we showhow the computational overhead ofMCTS-
based scheduling approaches can be reduced using summary infor-
mation. Summary information was introduced in [22] as a way of
characterising the definite and potential pre- and post-conditions of
different ways of achieving a goal. Using this information, a sched-
uler could decide to, e.g., delay progressing an intention, if doing
so would necessarily result in a conflict with another intention.
The summary information in [21–23, 37] is represented as boolean
combinations of definite and potential pre- and postconditions, and
Main Track AAMAS 2021, May 3-7, 2021, Online
1416
is essentially qualitative, that is, it does not consider the likelihood
that a precondition will be required or a postcondition will be es-
tablished. As a result, it becomes less informative as the agent’s
program grows in complexity — with more ways of achieving a
goal, very few pre- and postconditions are definite, and many pre-
and postconditions become potential. Moreover, the size of the sum-
mary information itself grows exponentially with the complexity
of the agent’s program. (Computing it also takes exponential time,
but this is done offline.)
We introduce the notion of quantitative summary information
which can be used to estimate the likelihood of conflicts between
an agent’s intentions. We show how offline simulation can be used
to precompute quantitative summary information prior to execu-
tion of the agent’s program, and how the precomputed summary
information can be used at runtime to guide the expansion of the
MCTS search tree and avoid unnecessary runtime simulation. We
compare the performance of our approach with standard MCTS
in a range of scenarios of increasing difficulty. The results suggest
our approach can significantly improve the efficiency of MCTS in
terms of the number of runtime simulations performed.
2 PRELIMINARIESIn this section, we briefly recall the definition of a goal-plan tree
that forms the foundation of our approach, and formally define the
intention progression problem.
2.1 Goal-Plan TreesWe use the notion of goal-plan trees [21, 22, 34] to represent the
relations between goals, plans and actions, and to reason about the
interactions between intentions. The root of a GPT is a goal node
д, its children are plan nodes that can be used to achieve д. Eachplan node contains a sequence of action nodes and (sub)goal nodes.
The subgoals have their associated plans as their children, giving
rise to a tree structure representing all possible ways an agent can
achieve the top-level goal д.
Figure 1: Example goal-plan tree.
Figure 1 shows a simple goal-plan tree. The top level goal, G0
can be achieved by two plans, P0 and P2, and can be seen as an “OR”
node, in that the agent has a choice of which plan to use. The plan
P0 consists of a single action, A0, and a subgoal G1. Plan nodes can
be seen as “AND” nodes, in that, if P0 is selected, action A0 must
be executed and G1 must be achieved.
To allow reasoning about interactions between intentions, a goal
plan tree records information about the conditions necessary to
achieve a (sub)goal or successfully execute a plan or an action in
the form of pre- and post-conditions associated with goal, plan and
action nodes. Preconditions are conditions that must be true in order
to execute a plan or an action. Postconditions are conditions that aremade true by executing a plan or an action or by achieving a goal
(in addition to achieving the goal itself). For simplicity, in what
follows, we assume that pre- and postconditions are sets of literals.
For example, the action A3 in Figure 1 may have p0 as preconditionand {p1,p2} as postcondition, A4 may have p1 as precondition and
p3 as postcondition, and action A5 in plan P3 may have {p2,p3} asprecondition. That is, preceding steps establish preconditions for
later steps.
2.2 Intention Progression ProblemThe intentions of an agent at each deliberation cycle are represented
by a pair I = (T , S ) whereT = {t1, . . . , tn } is a set of goal-plan trees
and S = {s1, . . . , sn } is a set of pointers to the current step of each ti .Each si is initially set to the root goal of ti , дi . We define next (si ) asthe successor of si in a pre-order traversal of ti (i.e., if si is the laststep in a plan π , then next (si ) is the next step in the parent plan of π ),and nexta (si ) as the first action step of ti following si . If next (si ) isan action, nexta (si ) = next (si ). If si is a (top-level) goal or next (si )is a subgoal, determining the next action step involves choosing a
plan π for the (sub-)goal and returning the first action step in π (if
the first step in π is again a subgoal, then we also need to choose
a plan to achieve it and so on). A current step si is progressable ifthere exists an action a = nexta (si ) whose precondition holds.
The intention progression problem (IPP) [13] is that of choosing
a current step si ∈ S to progress (i.e., advance to nexta (si )) ateach deliberation cycle so as to maximise the agent’s utility. There
are many ways in which utility may be defined, e.g., taking into
account the importance or priority of goals, the deadlines by which
goals should be achieved, the ‘fairness’ or order in which goals are
achieved, the costs or preferences of actions, etc. For concreteness,
in what follows we assume that the agent’s utility is maximised by
achieving the largest number of goals.
3 QUANTITATIVE SUMMARY INFORMATIONIn this section, we introduce the notion of quantitative summary
information and explain how it can be computed.
3.1 Fragile and Establishing RatiosWe define quantitative summary information in terms of possible
executions of the plans an agent may use to achieve its goals. A
trace is a sequence of primitive actions corresponding to a possible
execution of a plan for a goal. Traces are generated by expanding the
subgoals in the plan and the plans for those subgoals recursively.
The set of execution traces, traces(π ), induced by a plan π in a
goal-plan tree is given by:
traces(π ) = {a | a = expand (e1), . . . , expand (ek )}
where π = e1, . . . , ek and
expand (a) = a
expand (д) ∈ traces(π ′) where π ′ ∈ plans(д)
Main Track AAMAS 2021, May 3-7, 2021, Online
1417
where a is a primitive action, д is a subgoal and plans(д) is the setof plans available to achieve д. A trace σ = a1, . . . ,an is coherent if,for every pair of steps ai , ak such that i < k , ai has a postcondition∼l , ak has a precondition l , there exists aj with i < j < k such that
aj has postcondition l . We stipulate that each trace σ ∈ traces(π )is coherent.
We characterise traces in terms of the fragility and establishment
of literals. We denote by L(T ) the set of literals l such that l appearsin the postcondition of a goal, plan or action inT , and ∼l appears inthe precondition of a plan or action inT . A literal l ∈ L(T ) is fragileon an interval [i, j] of a trace a1, . . . ,an , iff l is a postcondition of a
step ai , a precondition of a step aj , and no ak with i < k < j hasl as a postcondition. Intuitively, a literal l is fragile in an interval
where it is established but not yet used, and may be clobbered by
another interleaved intention that establishes ∼l . A special case is
when l is true at the beginning of the plan (e.g., l is a preconditionof the plan which is true in the environment) and is used for the
first time by some aj , then l is fragile during the interval [0, j].To characterise fragility of a literal l on a trace, we only consider
maximal intervals where l is fragile, that is, if l is established by
step i and then used by steps j and j ′, where j < j ′, without beingre-established before j ′, we only consider the interval [i, j ′]. Thenotion of an interval where l is fragile is similar to the notion of
preparatory effects (p-effects) defined in [22]. However we extend
the notion of p-effect in [22] to include the establishment of the
precondition of an action by a previous action in the same plan,
and to include ‘unestablished preconditions’, i.e., dependencies on
prior execution or the environment.
The length of an interval [i, j] is j − i + 1. We define the fragileratio of l for a plan π , fr (l ,π ), as the ratio of the sum of lengths of
the intervals where l is fragile on all traces of π to the total length
of all traces of π ,
fr (l ,π ) =Σσ ∈traces(π ),[i, j]∈F (l,σ ) (j − i + 1)
Σσ ∈traces(π )lenдth(σ )
where for σ = a1, . . . ,an , F (l ,σ ) = {[i, j] | l ∈ post (ai ) ∧ l ∈pre (aj ) ∧ ¬∃k (i < k < j ∧ l ∈ post (ak )) ∧ ¬∃j
′(i < j < j ′ ∧ l ∈pre (aj′ ) ∧¬∃k
′(i < k ′ < j ′ ∧ l ∈ post (ak ′ )))}. The fragile ratio canbe seen as a measure of a plan’s robustness: the larger the fragile
ratio for l , the more likely interleaving steps from plans in other
intentions will destroy (clobber) a fragile precondition l .We define the fragile ratio of l for a goal д, fr (l ,д) in terms of the
fragile ratios of the plans for д,
fr (l ,д) =Σπ ∈plans(д),σ ∈traces(π ),[i, j]∈F (l,σ ) (j − i + 1)
Σπ ∈plans(д),σ ∈traces(π )lenдth(σ )
We also define a complementary notion of an establishing ratio.
Given a trace a1, . . . ,an , a step ak , 1 ≤ k ≤ n is establishing wrt to
a literal l ∈ L(T ) iff l is a postcondition of ak . For a plan π and a
literal l , the establishing ratio of l on π , er (l ,π ), is the ratio of the
number of steps establishing l on all traces of π , to the total length
of all traces of π :
er (l ,π ) =Σσ ∈traces(π )E (l ,σ )
Σσ ∈traces(π )lenдth(σ )
where E (l ,σ ) is the number of steps establishing l on σ . The es-tablishing ratio can be seen as a measure of a plan’s potential to
conflict with plans in other intentions: the larger the establishing
ratio for a literal l , the more likely interleaving steps from this plan
will destroy a fragile precondition ∼ l in another intention.
As for fragility, we define the establishing ratio of l for a goal д,er (l ,д) in terms of the establishing ratios of the plans for д,
er (l ,д) =Σπ ∈plans(д),σ ∈traces(π )E (l ,σ )
Σπ ∈plans(д),σ ∈traces(π )lenдth(σ )
3.2 ComplexityAs with qualitative summary information, we wish to compute and
store quantitative summary information for each goal and plan
node in a goal-plan tree so that it can be used at runtime to guide
the scheduling of intentions. However, as we show next, computing
the fragile and establishing ratios for plans (and hence for goals) is
computationally costly.
Proposition 3.1. Computing fr (l ,д) can be done using spacepolynomial in the GPT representation of plans(д).
Proof. To show that fr (l ,д) can be computed in PSPACE in the
size of plans(д), observe that each σ ∈ traces(π ) for π ∈ plans(д)is linear in the size of π (since it is a concatenation of a subset of
branches in π ) hence it is linear in the size of plans(д). The number
of traces in traces(π ) in exponential in the size of π , however foreach σ we can compute Σ
[i, j]∈F (l,σ ) (j − i + 1) one at a time using
polynomial amount of space, and keep a running proportion of total
length of fragile intervals to the total length of traces explored so
far. Note that the values for the numerator and denominator (sum
of lengths of traces) may be exponential in the size of plans(д), butthey can be represented in binary using only polynomial amount of
space. We can keep track of where we are in plans(д) by placing a
pointer on a node in a goal-plan tree, again using only polynomial
space. □
In the next theorem, we show that the problem of computing
fr (l ,д) is ♯P-hard. The complexity class ♯P contains function form
problems that involve counting solutions to NP problems [24].
Clearly a solution to the problem of counting satisfying assign-
ments is also a solution to the boolean satisfiability problem, so it
is at least as hard as an NP-complete problem. This means that it is
not possible to compute fr (l ,д) in polynomial time in the size of
the GPT for д, unless FP=♯P, where FP is the set of function form
problems solvable in polynomial time. FP=♯P would entail P=NP,
which is unlikely.
Theorem 3.2. The problem of computing fr (l ,д) is ♯P-hard.
Proof. We reduce the ♯P-complete problem of counting the
number of satisfying assignments for a formula in 3CNF to comput-
ing fr (l ,д). We build on the proof sketch in [20], which shows that
the problem of whether there exists an execution of a given GPT is
NP-complete by constructing a GPT corresponding to a 3CNF for-
mula and showing that the formula is satisfiable if and only if there
is an execution of (coherent trace of a plan in) a corresponding GPT.
As in [20], we construct a GPT corresponding to a 3CNF formula
ϕ over n variables, that has two plans. One plan is essentially the
same as the encoding of ϕ in [20], however each execution of it
ends with an action that requires a propositional variable p that
Main Track AAMAS 2021, May 3-7, 2021, Online
1418
does not occur in ϕ. This means that p is fragile in the entire exe-
cution corresponding to a satisfying assignment for ϕ. The secondplan makes sure that the GPT has an execution corresponding to
every one of 2npossible assignments to the variables in ϕ. The
second plan has traces corresponding to assignments satisfying ¬ϕ.The variable p is not fragile on any of these traces. All executions
have the same length. Then ϕ has k/2n satisfying assignments iff
fr (p,д) = k/2n .The top level goal д in the GPT has two plans, πϕ and π¬ϕ . First
we construct πϕ . Consider a 3CNF formula ϕ withm clauses. Each
clause ci , 1 ≤ i ≤ m, has three literals li1, li2, li3. In a satisfying
assignment for ϕ, at least one li j for each ci is made true. Following
[20], we model this by constructing πϕ which is a sequence of
subgoals, each corresponding to a clause ci in ϕ. Each subgoal
ci has three plans, each containing a single action li j intuitivelycorresponding to setting the literal li j to true. In order to make
sure that only one of two literals q and ¬q can be set to true in an
execution of πϕ , each action li j requires li j as a precondition, andalso has it as a postcondition. So if we execute plan q for subgoal ci ,we cannot execute plan ¬q for a subgoal c j occurring later in πϕ .The last action in plans for subgoal cm (the last subgoal in πϕ ) endsin an action p that requires p. Since no action establishes p earlier
in πϕ , p is fragile for the duration of any trace of πϕ . It is also clear
that any trace in traces(πϕ ) corresponds to a satisfying assignment
of ϕ (recall that we require traces to be coherent). The length of all
traces of πϕ is the same,m + 1 (where we executem actions, one
for each clause to set a literal to true, and one for p at the end).
For the second plan π¬ϕ we first compute a 3DNF representation
of ¬ϕ: it is a disjunction ofm conjuncts d1 ∨ . . . ∨ dm , where di is(¬li1∧¬li2∧¬li3). In order to make ¬ϕ true, it is sufficient to make
one disjunct di true, and this means making all three literals in it
true. So π¬ϕ consists of a single subgoal s¬ϕ withm plans, and each
plan for di is a sequence of actions corresponding to ¬li1, ¬li2, ¬li3,followed by the same sequence of lengthm − 2 trivial steps withno preconditions (to make traces of π¬ϕ to be of the same length
as the traces of πϕ ). The variable p is not used in any subplans
of π¬ϕ and is not fragile on any intervals in those traces. Clearly
plans(д) have in total 2ntraces, all of lengthm + 1, and all the ones
where p is fragile for allm + 1 steps correspond to the satisfying
assignments of ϕ. So the ϕ has k/2n satisfying assignments if, and
only if, fr (p,д) = (m + 1)k/(m + 1)2n = k/2n . □
Proposition 3.3. The problem of computing er (l ,д) is in PSPACEand ♯P-hard.
The proof is similar to that for Proposition 3.1 and Theorem 3.2
and is omitted due to lack of space.
3.3 Estimating Fragile and Establishing RatiosThe high computational cost makes computation of precise quan-
titative summary information difficult, even in an offline setting.
We therefore adopt an approach based on sampling to estimate the
quantitative summary information for each goal and plan in an
agent’s goal-plan trees. For each plan π in a goal-plan tree ti ∈ T ,we generate ν traces using pseudorandom simulation. As the sim-
ulations are performed offline, without knowledge of the agent’s
environment at runtime, we assume that all consistent environ-
ments are possible. That is, when performing the simulations, we
assume that any unestablished preconditions in a trace hold, e.g.,
they are established by a parent plan of π or are true in the envi-
ronment. As we require traces to be coherent, we are guaranteed
that each trace is executable in some (consistent) environment.
Given a set of traces R (π ) produced by pseudorandom simulation
of π , for each σ ∈ R (π ) and literal l ∈ L(T ), we compute the sum of
the lengths of the intervals where l is fragile, the number of times lis established and the length of σ . We can then estimate the fragile
ratio of l for π , fr∗ (l ,π ), as
fr∗ (l ,π ) =Σσ ∈R (π ),[i, j]∈F (l,σ ) (j − i + 1)
Σσ ∈R (π )lenдth(σ )
Similarly, we can estimate the establishing ratio of l onπ , er∗ (l ,π )as
er∗ (l ,π ) =Σσ ∈R (π )E (l ,σ )
Σσ ∈R (π )lenдth(σ )
The estimates of the fragile and establishing ratios for a goal д,fr∗ (l ,д) and er∗ (l ,д) are computed from the sets of traces R (π ) foreach π ∈ plans(д) as follows
fr∗ (l ,д) =Σπ ∈plans(д),σ ∈R (π ),[i, j]∈F (l,σ ) (j − i + 1)
Σπ ∈plans(д),σ ∈R (π )lenдth(σ )
er∗ (l ,д) =Σπ ∈plans(д),σ ∈R (π )E (l ,σ )
Σπ ∈plans(д),σ ∈R (π )lenдth(σ )
Note that the offline computation of fr∗ and er∗ need only be per-formed once: it is essentially a static analysis of the agent program
and forms part of the development phase rather than the execution
of the agent.
4 THE SQ SCHEDULERIn this section, we present SQ , an MCTS-based solver for intention
progression problems that uses quantitative summary information
to guide the expansion of the search tree and avoid unnecessary
runtime simulation. To choose which intention to progress, SQiteratively builds a search tree. Edges in the tree represent the choice
of a nexta (si ) for one of the agent’s intentions. Nodes representthe state of the agent following the execution of the corresponding
action, that is, the agent’s beliefs updated with the postconditions of
the action, and the updated set of current step pointers. In addition,
each node also contains the current value of the node and a record
of the number of times it has been visited in the search (see below).
Algorithm 1 Return the action to be executed at the current cycle
1: function SQ (B, S, α, β, γ , δ )2: n0 ← node0 (B, S )3: for i ← 1, α do4: ne ← max-uct-leaf-node(n0)5: children(ne ) ← expand(ne )6: ns ← random-child(children(ne ))7: v (ns ) ← value(ns , β, γ , δ )8: backup(ns , v (ns ))9: return max-child(n0)
Main Track AAMAS 2021, May 3-7, 2021, Online
1419
SQ is shown in Algorithm 1 and takes six parameters as input:
the agent’s current beliefs, B, the set of pointers to the current step
in each of the agent’s intentions, S , the number of nodes in the
MCTS search tree to be expanded at this cycle, α , the number of
runtime simulations that may be performed when a node is ex-
panded, β , and two threshold values γ and δ . As with MCTS and SA[35], each iteration of the main loop consists of four main phases:
selection, expansion, simulation and back-propagation. However
the introduction of qualitative summary information requires sig-
nificant modifications to the simulation phase which are described
in detail below.
In the selection phase, a leaf node, ne is selected for expansion
(line 4). A node may be expanded if it represents a non-terminal
state (i.e., ∃si ∈ S (ne ) such that si is progressable). ne is selected
using a modified version of Upper Confidence bounds applied to
Trees (UCT) [16], which models the choice of node as a k-armed
bandit problem. Starting from the root node, we recursively follow
child nodes with highest UCT value until a leaf node is reached.
In the expansion phase, ne is expanded by adding a child node foreach nexta (si ) progressable in state s (ne ) representing the agent’s
beliefs resulting from executing nexta (si ) and the updated current
step pointer s ′i = nexta (si ) (line 5). That is, each child node cor-
responds to a different choice of which intention to progress at
this cycle, and, if next (si ) is a subgoal, how to progress it. One of
the newly created child nodes, ns , is then selected at random for
possible simulation (line 6).
In the simulation phase, the value of ns is estimated. The func-
tion value (see Algorithm 2) takes as argument ns , the number of
simulations we may perform, β , and the upper and lower bounds,
γ ,δ ,δ < γ , on the confidence interval within which runtime simu-
lation will be performed. We first compute quantitative summary
information for the agent’s intentions (line 3). Using the quantita-
tive summary information, we then compute the probability, p, ofachieving at leastm goals from the state represented by ns (line5). If p is less than or equal to δ , we assume achievingm goals is
unlikely, and decrementm (line 7). If the probability of achievingmgoals is greater than or equal to γ , we returnm as the value of ns(line 9). That is, if the probability of achievingm goals is sufficiently
low or sufficiently high, we “trust” the estimate based on qualitative
summary information and do not perform any runtime simulations.
Otherwise (i.e., δ < p < γ and we are uncertain whetherm goals
Algorithm 2 Return the value of node ns
1: function value(ns , β, γ , δ )2: m ← lenдth (intentions (ns ))3: Q ← qsi(intentions (ns ))4: whilem ≥ 2 do5: p ← goals-achieved(intentions (ns ), Q,m)6: if p ≤ δ then7: m ←m − 18: else if p ≥ γ then9: returnm10: else11: m ← 0
12: for i ← 1, β do13: m ← max(m, simulate(ns ))14: returnm
can really be achieved from ns ) we perform β pseudorandom simu-
lations from ns as in standard MCTS. Starting in the environment
represented by ns , a nexta (si ) that is executable in s (ns ) is ran-domly selected and executed, and the environment and the current
step of the selected goal-plan tree updated with s ′i = nexta (si ).This process is repeated until a terminal state is reached in which
no next steps can be executed or all top-level goals are achieved.
The value of the simulation is taken to be the number of top-level
goals achieved in the terminal state. The maximum number of goals
achieved in any of the β simulations is returned as the value of ns(lines 10–13).
To compute the fragile and establishing ratios for each of the
agent’s intentions ι at ns , the function qsi (see Algorithm 3) pro-
cesses each unexecuted step in each partially executed plan in ι,accumulating an estimate of the total number of fragile and estab-
lishing steps, fs, es, for each literal l , and the average length of an
execution of ι, n. If a step ι (j ) is a subgoal, we use the fr∗ (l , ι (j ))and er∗ (l , ι (j )) ratios multiplied by the average length of executions
of plans for ι (j ), ι (j )len , (computed offline) to estimate the number
of fragile and establishing steps for l required to achieve ι (j ), andadd these and ι (j )len to fs, es and n respectively. For actions, we
increment es if l is a postcondition of the action, keep track of the
number of action steps where l is fragile, r , and increment n. When
all steps have been processed, we divide fs and es by n to estimate
fr and er for l for ι. Note that computing fs, es and n involves only
arithmetic, and so is much less costly than runtime simulation.
The function goals-achieved (see Algorithm 4) uses the quan-
titative summary information for each intention to calculate the
maximum probability of the execution of a subset ofm intentions
being conflict free, i.e., of achievingm goals. For each pair of inten-
tions, ιi , ι j , and literal l ∈ L(T ), we compute the probability that a
step in ιi that is fragile on l will be clobbered by an establishing step
in ι j (ι j being clobbered by ιi is symmetric). The number of fragile
steps in ιi , fsi , is given by the fragile ratio for ιi , Qfr (l , ιi ), times
Algorithm 3 Return the QSI for the agent’s current intentions
1: function qsi(I )2: Q ← ∅3: for ι ∈ I do4: for l ∈ L(T ) do5: fs, es, n ← 0; r ← false6: for j ← lenдth (ι ) − 1, 0 do7: if goal-p(ι (j )) then8: es ← es + (er ∗ (l, ι (j )) × ι (j )len )9: fs ← fs + (f r ∗ (l, ι (j )) × ι (j )len )10: n ← n + ι (j )len11: else12: if l ∈ post(ι (j )) then13: es ← es + 114: r ← false15: else if l ∈ pre(ι (j )) then16: r ← true17: if r then18: fs ← fs + 119: n ← n + 120: Q (l, ι ) ← (fs/n, es/n, n)21: return Q
Main Track AAMAS 2021, May 3-7, 2021, Online
1420
Algorithm 4 Return the probability of achievingm top-level goals
1: function goals-achieved(I, Q,m)
2: p ← 0
3: for X ∈ {Y ⊆ I : |Y | =m } do4: pX ← 1
5: for ιi , ι j ∈ X , ιi , ι j do6: for l ∈ L(T ) do7: fsi ← Qfs (l, ιi ) ×Qn (ιi )8: esj ← Qes (l, ι j ) ×Qn (ι j )
9: c li, j ← 1 −(lenдth (ιj )−esj
fsi)
(lenдth (ιj )
fsi)
10: pX ← pX × (1 − c li, j )
11: p ← max(p, pX )
12: return p
the average length of executions of ιi , Qn (ιi ), and the number of
establishing steps in ιi , esj , is given by the establishing ratio for ι j ,Qer (l , ι j ) times the average length of executions of ι j , Qn (ι j ) (lines7–8). We assume the worst case, that is, the fsi steps are contiguous(constitute a single fragile region where l is established at the start
of the region and is used at the end), and that the fragile region is
at the start of intention ιi . We model ι j as a “bag” of steps, withprobability esj/lenдth(ι j ) that the next step in ι j is establishing. Ifwe are equally likely to chose a step from either intention when
forming the interleaving, then the probability that a step in ιi that
is fragile on l will be clobbered by an establishing step in ι j , cli, j
is the probability of placing fsi steps in the interleaving without
drawing an establishing step from ι j (line 9). The probability that a
set X of intentions is conflict free, pX , is simply the probability that
for each pair of intentions ιi , ι j ∈ X and literal l ∈ L(T ) no conflict
occurs (line 10), and we return the maximum pX (lines 11–12).
In the back-propagation phase, the value is back-propagated
from ns to all nodes on the path to the root node n0 and the visit
counts of the nodes on this path are incremented. Note that we
increment the visit count for ns and its parent nodes regardless of
whether any runtime simulations were actually performed from
ns . That is, if the probability of achievingm goals estimated using
qualitative summary information is greater than γ , we effectivelytreatm as the value we would have obtained had we performed
runtime simulation.
After α iterations, the step leading to the child of the root node
n0 which has the largest average simulation value (i.e., the largest
number of goals achieved compared to its siblings) is returned, and
the goal-plan trees and their associated current step pointers (one
of which has been updated) are assigned to I for use at the nextdeliberation cycle.
5 EVALUATIONIn this section, we evaluate the performance of SQ in a range of
scenarios of increasing difficulty. We compare the performance of
SQ with that of MCTS. The MCTS-based scheduler used in our
comparison is essentially the same as the SA scheduler described
in [35], except that it does not use coverage information to priori-
tise intentions which are more likely to become unexecutable in a
dynamic environment, and the utility function considers only the
number of goals achieved (as in SQ ) and does not consider fairness.
We evaluate the performance of SQ and MCTS on the number of
goals achieved, the number of runtime simulations required, and
the computation time required for each approach.
5.1 Experimental SetupIn the interests of generality, we evaluate SQ using sets of randomly-
generated, synthetic goal-plan trees representing the current inten-
tions of an agent in a simple environment. The synthetic trees are
similar to those used in the Intention Progression Competition1[13]
and in [35], except that the trees are not always balanced, i.e., not
all the leaf actions occur at the same depth. We conjecture that our
estimates of fragile and establishing ratios will give more accurate
results on perfectly balanced trees, since the average length of an
intention will be the same as the actual length. This may result in
enhanced performance of our approach on perfectly balanced trees,
and, in order to give unbiased comparison with the performance of
MCTS on arbitrary trees, we avoided the use of balanced trees.
The unbalanced trees were generated by introducing a new pa-
rameter, x , that specifies the probability of a plan being a leaf plan.
That is, rather than only generating leaf plans when the maximum
depth of the goal-plan tree is reached, a plan has probability x of
being a leaf plan, even when the current depth is smaller than the
maximum depth of the tree. An issue with unbalanced trees is that
the agent will favour short paths. To balance the difficulty of execu-
tions of different length, we increased the number of preconditions
on short paths, i.e., short paths are more likely to cause conflicts
when interleaved with other intentions. This seems reasonable —
in many cases the simplicity of a plan is inversely proportional to,
e.g., its resource cost.
In the experiments reported below, we vary the maximum depth,
d , of the goal-plan trees from 4 to 6, each goal has two relevant
plans, and each plan contains three actions and two subgoals. The
probability of a plan being a leaf plan was set to 0.2. The agent’s
environment is built from 60 propositions (corresponding to 120 lit-
erals). For each goal-plan tree, we select 30 propositions at random
and choose from the corresponding literals the pre- and postcondi-
tions of the actions in the tree. In each experiment, we generate 50
sets of 10 goal-plan trees with the parameters specified above.2The
fr∗ and er∗ ratios were calculated offline, by running 10000 random
simulations from each plan in each goal-plan tree.
The α parameter specifying the number of nodes expanded at
each deliberation cycle was set to 100 as in [35]. The β parameter
specifying the number of runtime simulations performed by MCTS
(and that may be performed by SQ ) was varied to create different
computational settings. SQ was configuredwithγ = 0.5 and δ = 0.1,
i.e., we only perform runtime simulations when the probability of
achievingm goals is in the range (0.1, 0.5). In all other cases, we
trust the quantitative summary information.
5.2 Experimental ResultsTo investigate whether the use of quantitative summary informa-
tion by SQ reduced the number of runtime simulations performed
1https://www.intentionprogression.org/
2The source code for SQ and goal-plan trees used in the experiments are available at
https://www.github.com/yvy714/SQ-MCTS.git
Main Track AAMAS 2021, May 3-7, 2021, Online
1421
d (α , β) (100, 100) (100,10) (100,1)
4
MCTS
#goals 9.9 9.7 9
#sims 2.27 ×106 2.24 ×105 2.36 ×104
time 1406.1 289.6 45.2
SQ
#goals 9.9 9.8 9.2
#sims 1.04 ×106 1.10 ×105 1.14 ×104
#saved 1.23 ×106 1.13 ×105 1.22 ×104
time 899.7 229.6 185.1
5
MCTS
#goals 8.9 8.7 7.7
#sims 2.82 ×106 2.73 ×105 2.85 ×104
time 1603.4 352.0 55.6
SQ
#goals 9.6 9.1 8.9
#sims 1.22 ×106 1.13 ×105 1.23 ×104
#saved 1.60 ×106 1.60 ×105 1.61 ×104
time 1002.0 276.0 213.1
6
MCTS
#goals 7.5 6.6 5.6
#sims 4.04 ×106 3.85 ×105 3.07 ×104
time 1737.1 415.2 66.4
SQ
#goals 8.0 7.6 6.4
#sims 1.47 ×106 1.70 ×105 1.27 ×104
#saved 2.57 ×106 2.14 ×105 1.80 ×104
time 1121.1 317.8 230.9
Table 1: Goals achieved and runtime simulations with vary-ing computational budget and problem difficulty.
and/or computation time, we considered three computational bud-
gets: β = 100, β = 10, and β = 1, and three levels of problem
difficulty: d = 4, d = 5, and d = 6. In Table 1, we report the average
(over 50 runs) number of goals achieved (#goals), average number
of runtime simulations required (#sims), and the average computa-
tion time for each approach in milliseconds (time). This is the time
required to return the first action to be executed, i.e., to compute
a complete interleaving of actions in all intentions starting from
the root of each goal-plan tree, and is essentially the worst case
for both approaches. As noted in [35], in a static environment, this
interleaving only needs to be recomputed when the agent adopts
a new top-level goal, so in the best case (no new top-level goals
during the execution of the current intentions) the computation
time is effectively amortised over the execution of all 10 intentions.
For SQ , we also report the average saving in runtime simulations
(#saved).3
As can be seen, for goal-plan trees with a maximum depth of 4,
the number of goals achieved by both approaches declines as the
computational budget (number of runtime simulations) decreases.
However, SQ achieves at least as many goals as MCTS in all cases,
and requires approximately 50% of the runtime simulations required
by MCTS. (The (α = 100, β = 10) case represents the same compu-
tational budget used in [35], and the number of goals achieved in
this case indicates that the unbalanced trees with maximum depth 4
are of similar difficulty to the balanced trees of depth 5 used in [35]
to evaluate SA.) In the β = 100 and β = 10 cases, the reduction in
3The average number of simulations and simulations saved are rounded to two signifi-
cant figures.
the number of runtime simulations results in a reduction in compu-
tation time of approximately 36% and 21% respectively. In the β = 1
case, the overhead of computing qualitative summary information
is significantly greater than the computation time saved by the
reduction in runtime simulations. However, with a small number
runtime simulations, the performance of MCTS can be unstable,
i.e., not only is the average solution quality lower, the variance in
the quality of solution returned is greater.
For goal-plan trees with maximum depth 5, the performance
of both approaches declines. (At this depth, the goal-plan trees
require about six times as many steps to achieve a top-level goal
as the GPTs in [35].) However the decline is more marked in the
case of MCTS, particularly for smaller values of β . Moreover, SQrequires 43% and 41% of the runtime simulations required by MCTS
in the β = 100 and β = 10 cases, with a corresponding reduction
in computation time of 38% and 22%. As in the d = 4 case, when
β = 1, the overhead of computing qualitative summary information
is significantly greater than the computation time saved by the
reduction in runtime simulations. However, SQ can achieve approx-
imately the same number of goals with one simulation as MCTS
with 10 simulations, and in 61% of the computation time, indicating
that SQ is effectively exploiting qualitative summary information.
At d = 6, a similar pattern can be observed. The performance of
both approaches declines further, with a more marked reduction
in the case of MCTS. SQ requires only 36% and 44% of the runtime
simulations required by MCTS, with a corresponding reduction in
computation time of 36% and 24% in the β = 100 and β = 10 cases.
When β = 1, SQ can achieve approximately the same number of
goals as MCTS with β = 10, and in 56% of the computation time.
Our experiments did not consider the case of a dynamic envi-
ronment. However, we stress that SQ can handle exogenous events.
The fr∗ and er∗ values for GPTs are unaffected by changes to the en-vironment, and the qsi and goals-achieved values are recomputed
at each cycle based on the actual progression of each intention
(which in turn is based on the agent’s current beliefs).
6 RELATEDWORKIn addition to the work of Thangarajah et al. [21–23] and Yao
et al. [35–37] discussed above, a number of other approaches to
scheduling intentions to avoid conflicts have been proposed in the
literature.
Waters et al. [29, 30] present a coverage-based approach to in-
tention selection proposed by [23], in which the intention with
the lowest coverage, i.e., the highest probability of becoming non-
executable due to changes in the environment, is selected for ex-
ecution. As in [21–23], intention selection is limited to the plan
level, and their experimental evaluation assumes that there are
no conflicts between intentions. Shaw and Bordini have proposed
approaches to intention selection based on Petri nets [18] and con-
straint logic programming [19]. Again, as in [21–23], the plans and
sub-goals in a goal-plan tree are regarded as basic steps, and in-
terleaving is at the level of sub-plans and subgoals. They do not
consider interactions between actions in plans.
The TÆMS (Task Analysis, Environment Modelling, and Sim-
ulation) framework [9] together with Design-To-Criteria (DTC)
scheduling [27] have been used in agent architectures such the
Main Track AAMAS 2021, May 3-7, 2021, Online
1422
Soft Real-Time Agent Architecture [25] and AgentSpeak(XL) [1]
to schedule intentions. TÆMS provides a high-level framework
for specifying the expected quality, cost and duration of of meth-
ods (actions) and relationships between tasks (plans). DTC decides
which tasks to perform, how to perform them, and the order in
which they should be performed, so as to satisfy hard constraints
(e.g., deadlines) and maximise the agent’s objective function. DTC
can produce schedules which allow interleaved or parallel execu-
tion of tasks and can be used in an anytime fashion. In the work
closest to that presented here [1], DTC was used to schedule ex-
ecution of AgentSpeak intentions at the level of individual plans.
The TÆMS relations between plans required to generate a schedule
(enables, facilitates and hinders) were specified as part of the agent
program. In contrast SQ interleaves intentions at the level of ac-
tions, and information about possible conflicts between intentions
is extracted automatically from goal-plan trees generated from the
agent program.
In [15] Sardina et al. show how anHTN planner can be integrated
into a BDI agent architecture. However their focus is on finding
a hierarchical decomposition of a plan that is less likely to fail by
avoiding incorrect decisions at choice points, and they do not take
into account interactions with other concurrent intentions. In [31],
Wilkins et al. presented the Cypress architecture which combines
the the Procedural Reasoning System reactive executor PRS-CL,
and the SIPE-2 look-ahead planner. A Cypress agent uses PRS-CL to
pursue its intentions using a library of procedures (plans). If a failure
occurs during the execution of the plan due to an unanticipated
change in the agent’s environment, the executor calls SIPE-2 to
produce a new plan to achieve the goal, and continues executing
those portions of plans which are not affected. However, their
approach focusses on the generation of new plans to recover from
plan failures, rather than interleaving intentions so as to avoid
conflicts.
Among other approaches to deliberation about conflicts between
plans is the abstract programming language developed in [8] which
is inspired by the BOID architecture. In [28], an approach to run-
time conflict resolution between goals based on event calculus is
proposed and experimentally evaluated.
There has also been work on avoiding conflicts in a multi-agent
setting. For example, Clement and Durfee [4–6] propose an ap-
proach to coordinating concurrent hierarchical planning agents
using summary information and HTN planning. However in this
work, summary information is used to identify when conflicts may
arise between two or more agents rather than to avoid conflicts
between the intentions of a single agent. Moreover, it is assumed
that the agents plan offline in a static environment. In [11], Ephrahi
et al. present an approach to planning and interleaving the execu-
tion of tasks by multiple agents. The task of each agent is assigned
dynamically, and the execution of all tasks achieves a global goal.
They show how conflicts between intentions can be avoided by
appropriate scheduling of the actions of the agents. In [7] Dann et al.
extend the MCTS-based approach of Yao et al. [35] to a multi-agent
setting, in which agents consider not only interactions between
their own intentions but the intentions of other agents.
Visser et al. [26] have used summary information to reason about
preferences in BDI agents. They specify preferences over resources
and properties (e.g., payment-type, flight-class, hotel-rating etc.),
and use resource and property summary information to deliberate
over the selection of plans and the ordering of subgoals in a plan
when the ordering is not fixed by design, in order to maximise the
satisfaction of the agent’s preferences. Their approach however,
does not consider preference satisfaction across multiple intentions.
Many approaches to enhancing the performance of MCTS have
been proposed in the literature, both to reduce the search space
(for example pruning, analogously to α-β-pruning [10, 17]), and to
improve the performance of simulation, such as domain-specific
modifications to the default policy, backpropagation policy, paral-
lelisation, etc. [3]. Our approach does not change the MCTS search
space. Rather, it falls under enhancements to the simulation phase,
since during simulation we use information obtained off-line in or-
der avoid performing unnecessary computation. We are not aware
of similar approaches to simulation performance enhancement in
the MCTS literature. However, our approach can be used in combi-
nation with techniques such as [3, 10, 17] .
7 CONCLUSIONIn this paper, we introduced the notion of quantitative summary in-
formation which can be used to estimate the likelihood of conflicts
between an agent’s intentions. We showed how offline simulation
can be used to precompute quantitative summary information prior
to execution of the agent’s program, and how the precomputed sum-
mary information can be used at runtime to guide the expansion
of the MCTS search tree and avoid unnecessary runtime simula-
tion. We compared the performance of our approach with standard
MCTS in a range of scenarios of increasing difficulty. The results
suggest our approach can reduce the number of runtime simula-
tions performed by up to 64% and the time required to schedule an
agent’s intentions by up to 38%.
In its current form, SQ considers only the non-executability of
intentions due to conflicts. In future work we plan to investigate
whether notions similar to coverage [23] can be incorporated into
quantitative summary information to estimate the probability of
unestablished fragile literals (i.e., unestablished preconditions) re-
sulting in the non-executability intentions.
ACKNOWLEDGMENTSThis work was supported by Zhejiang Provincial Natural Science
Foundation of China (LQ19F030009) and National Natural Science
Foundation of China (61906169).
REFERENCES[1] Rafael Bordini, Ana L. C. Bazzan, Rafael de O. Jannone, Daniel M. Basso, Rosa M.
Vicari, and Victor R. Lesser. 2002. AgentSpeak(XL): Efficient Intention Selection
in BDI Agents via Decision-Theoretic Task Scheduling. In Proceedings of theFirst International Conference on Autonomous Agents and Multiagent Systems(AAMAS’02). 1294–1302.
[2] Rafael H. Bordini, Jomi Fred Hübner, andMichaelWooldridge. 2007. Programmingmulti-agent systems in AgentSpeak using Jason. Wiley.
[3] Cameron Browne, Edward J. Powley, Daniel Whitehouse, Simon M. Lucas, Pe-
ter I. Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon
Samothrakis, and Simon Colton. 2012. A Survey of Monte Carlo Tree Search
Methods. IEEE Transactions on Computational Intelligence and AI in Games 4, 1(March 2012), 1–43.
[4] Bradley J. Clement and Edmund H. Durfee. 1999. Theory for Coordinating
Concurrent Hierarchical Planning Agents Using Summary Information. In Pro-ceedings of the Sixteenth National Conference on Artificial Intelligence and EleventhConference on Innovative Applications of Artificial Intelligence (Orlando, Florida,
Main Track AAMAS 2021, May 3-7, 2021, Online
1423
USA), Jim Hendler and Devika Subramanian (Eds.). AAAI Press / The MIT Press,
495–502.
[5] Bradley J. Clement and Edmund H. Durfee. 2000. Performance of Coordinat-
ing Concurrent Hierarchical Planning Agents Using Summary Information. In
4th International Conference on Multi-Agent Systems (Boston, MA, USA). IEEE
Computer Society, 373–374.
[6] Bradley J. Clement, Edmund H. Durfee, and Anthony C. Barrett. 2007. Abstract
Reasoning for Planning and Coordination. Journal of Artificial Intelligence Re-search 28 (2007), 453–515.
[7] Michael Dann, John Thangarajah, Yuan Yao, and Brian Logan. 2020. Intention-
Aware Multiagent Scheduling. In Proceedings of the 19th International Conferenceon Autonomous Agents and Multiagent Systems (AAMAS 2020) (Auckland), B. An,N. Yorke-Smith, A. El Fallah Seghrouchni, and G. Sukthankar (Eds.). IFAAMAS,
285–293.
[8] Mehdi Dastani and Leendert W. N. van der Torre. 2004. Programming BOID-Plan
Agents: Deliberating about Conflicts among Defeasible Mental Attitudes and
Plans. In Proceedings of the 3rd International Joint Conference on AutonomousAgents and Multiagent Systems (AAMAS 2004) (New York, NY). IEEE Computer
Society, 706–713.
[9] Keith S. Decker and Victor R. Lesser. 1993. Quantitative modeling of complex
environments. International Journal of Intelligent Systems in Accounting, Financeand Management 2 (1993), 215–234.
[10] Joris Duguépéroux, Ahmad Mazyad, Fabien Teytaud, and Julien Dehos. 2016.
Pruning Playouts in Monte-Carlo Tree Search for the Game of Havannah. In
Proceedings of the 9th International Conference on Computers and Games (CG 2016)(Leiden) (Lecture Notes in Computer Science, Vol. 10068), Aske Plaat, Walter A.
Kosters, and H. Jaap van den Herik (Eds.). Springer, 47–57.
[11] Eithan Ephrati and Jeffrey S. Rosenschein. 1993. A Framework for the Interleaving
of Execution and Planning for Dynamic Tasks by Multiple Agents. In FromReaction to Cognition, 5th European Workshop on Modelling Autonomous Agents,MAAMAW ’93 (Neuchatel, Switzerland), Cristiano Castelfranchi and Jean-Pierre
Müller (Eds.). Springer, 139–153.
[12] Jana Koehler and Kilian Schuster. 2000. Elevator Control as a Planning Problem. In
Proceedings of the Fifth International Conference on Artificial Intelligence PlanningSystems, Breckenridge, CO, USA, April 14-17, 2000 (Breckenridge, CO, USA), SteveChien, Subbarao Kambhampati, and Craig A. Knoblock (Eds.). AAAI, 331–338.
[13] Brian Logan, John Thangarajah, and Neil Yorke-Smith. 2017. Progressing Inten-
tion Progresson: A Call for a Goal-Plan Tree Contest. In Proceedings of the 16thInternational Conference on Autonomous Agents and Multiagent Systems (AAMAS2017), (Sao Paulo, Brazil), S. Das, E. Durfee, K. Larson, and M. Winikoff (Eds.).
IFAAMAS, 768–772.
[14] Anand S. Rao andMichael P. Georgeff. 1992. AnAbstract Architecture for Rational
Agents. In 3rd International Conference on Principles of Knowledge Representationand Reasoning. Morgan Kaufmann, 439–449.
[15] Sebastian Sardiña, Lavindra de Silva, and Lin Padgham. 2006. Hierarchical plan-
ning in BDI agent programming languages: a formal approach. In Proceedings ofthe 5th International Joint Conference on Autonomous Agents and Multiagent Sys-tems (AAMAS 2006) (Hakodate, Japan), Hideyuki Nakashima, Michael P. Wellman,
Gerhard Weiss, and Peter Stone (Eds.). ACM, 1001–1008.
[16] Maarten P. D. Schadd, Mark H. M. Winands, Mandy J. W. Tak, and Jos W.
H. M. Uiterwijk. 2012. Single-player Monte-Carlo tree search for SameGame.
Knowledge-Based Systems 34 (2012), 3–11.[17] Nick Sephton, Peter I. Cowling, Edward Jack Powley, and Nicholas H. Slaven.
2014. Heuristic move pruning in Monte Carlo Tree Search for the strategic
card game Lords of War. In Proceedings of the IEEE Conference on ComputationalIntelligence and Games (CIG 2014). 1–7.
[18] Patricia H. Shaw and Rafael H. Bordini. 2007. Towards Alternative Approaches
to Reasoning About Goals. In Declarative Agent Languages and TechnologiesV, 5th International Workshop (Honolulu, HI, USA), Matteo Baldoni, Tran Cao
Son, M. Birna van Riemsdijk, and Michael Winikoff (Eds.), Vol. 4897. Springer,
104–121.
[19] Patricia H. Shaw and Rafael H. Bordini. 2010. An Alternative Approach for
Reasoning about the Goal-Plan Tree Problem. In Proceedings of the 19th EuropeanConference on Artificial Intelligence (ECAI 2010) (Lisbon, Portugal), Helder Coelho,Rudi Studer, and Michael Wooldridge (Eds.), Vol. 215. IOS Press, 1035–1036.
[20] Patricia H. Shaw, Berndt Farwer, and Rafael H. Bordini. 2008. Theoretical and
experimental results on the goal-plan tree problem. In Proceedings of the 7thInternational Joint Conference on Autonomous Agents and Multiagent Systems(AAMAS 2008), Lin Padgham, David C. Parkes, Jörg P. Müller, and Simon Parsons
(Eds.). IFAAMAS, 1379–1382.
[21] John Thangarajah and Lin Padgham. 2011. Computationally Effective Reasoning
About Goal Interactions. Journal of Automated Reasoning 47, 1 (2011), 17–56.
[22] John Thangarajah, Lin Padgham, and Michael Winikoff. 2003. Detecting &
Avoiding Interference Between Goals in Intelligent Agents. In Proceedings ofthe Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03)(Acapulco, Mexico), Georg Gottlob and Toby Walsh (Eds.). Morgan Kaufmann,
721–726.
[23] John Thangarajah, Sebastian Sardina, and Lin Padgham. 2012. Measuring plan
coverage and overlap for agent reasoning. In Proceedings of the 11th InternationalConference on Autonomous Agents and Multiagent Systems (AAMAS 2012) (Valen-cia, Spain), Wiebe van der Hoek, Lin Padgham, Vincent Conitzer, and Michael
Winikoff (Eds.). IFAAMAS, 1049–1056.
[24] Leslie G. Valiant. 1979. The Complexity of Computing the Permanent. TheoreticalComputer Science 8, 2 (1979), 189–2001.
[25] Régis Vincent, Bryan Horling, Victor Lesser, and Thomas Wagner. 2001. Imple-
menting Soft Real-Time Agent Control. In Proceedings of the Fifth InternationalConference on Autonomous Agents (AGENTS’01) (Montreal, Quebec, Canada).
ACM Press, New York, NY, USA, 355–362.
[26] Simeon Visser, John Thangarajah, James Harland, and Frank Dignum. 2016.
Preference-based reasoning in BDI agent systems. Autonomous Agents and Multi-Agent Systems 30, 2 (2016), 291–330.
[27] Thomas Wagner, Alan Garvey, and Victor Lesser. 1998. Criteria-Directed Heuris-
tic Task Scheduling. International Journal of Approximate Reasoning 19 (1998),
91–118.
[28] Xiaogang Wang, Jian Cao, and Jie Wang. 2012. A Runtime Goal Conflict Resolu-
tion Model for Agent Systems. In 2012 IEEE/WIC/ACM International Conferenceson Intelligent Agent Technology, IAT 2012. IEEE Computer Society, 340–347.
[29] Max Waters, Lin Padgham, and Sebastian Sardina. 2014. Evaluating Coverage
Based Intention Selection. In Proceedings of the 13th International Conferenceon Autonomous Agents and Multi-agent Systems (AAMAS 2014) (Paris, France),Alessio Lomuscio, Paul Scerri, Ana Bazzan, and Michael Huhns (Eds.). IFAAMAS,
957–964.
[30] Max Waters, Lin Padgham, and Sebastian Sardiña. 2015. Improving domain-
independent intention selection in BDI systems. Autonomous Agents and Multi-Agent Systems 29, 4 (2015), 683–717. https://doi.org/10.1007/s10458-015-9293-5
[31] David E. Wilkins, Karen L. Myers, John D. Lowrance, and Leonard P. Wesley.
1995. Planning and reacting in uncertain and dynamic environments. Journal ofExperimental and Theoretical Artificial Intelligence 7, 1 (1995), 121–152.
[32] MichaelWinikoff. 2005. JACK Intelligent Agents: An Industrial Strength Platform.
In Multi-Agent Programming: Languages, Platforms and Applications, Rafael H.Bordini, Mehdi Dastani, Jürgen Dix, and Amal El Fallah-Seghrouchni (Eds.).
Multiagent Systems, Artificial Societies, and Simulated Organizations, Vol. 15.
Springer, 175–193.
[33] Michael Winikoff and Stephen Cranefield. 2014. On the Testability of BDI Agent
Systems. Journal of Artificial Intelligence Research 51 (2014), 71–131.
[34] Yuan Yao, Lavindra de Silva, and Brian Logan. 2016. Reasoning about the Exe-
cutability of Goal-Plan Trees. In Proceedings of the 4th International Workshop onEngineering Multi-Agent Systems (EMAS 2016) (LNCS, Vol. 10093), Matteo Baldoni,
Jorg P. Muller, Ingrid Nunes, and Rym Zalila-Wenkstern (Eds.). Springer, 176–191.
[35] Yuan Yao and Brian Logan. 2016. Action-Level Intention Selection for BDI Agents.
In Proceedings of the 15th International Conference on Autonomous Agents andMultiagent Systems (AAMAS 2016) (Singapore), J. Thangarajah, K. Tuyls, C. Jonker,and S. Marsella (Eds.). IFAAMAS, 1227–1235.
[36] Yuan Yao, Brian Logan, and John Thangarajah. 2014. SP-MCTS-based Intention
Scheduling for BDI Agents. In Proceedings of the 21st European Conference on Arti-ficial Intelligence (ECAI-2014) (Prague, Czech Republic), Torsten Schaub, Gerhard
Friedrich, and Barry O’Sullivan (Eds.). IOS Press, 1133–1134.
[37] Yuan Yao, Brian Logan, and John Thangarajah. 2016. Robust Execution of BDI
Agent Programs by Exploiting Synergies Between Intentions. In Proceedings ofthe Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) (Phoenix, USA),Dale Schuurmans and Michael P. Wellman (Eds.). AAAI Press, 2558–2564.
Main Track AAMAS 2021, May 3-7, 2021, Online
1424