
Int. J. Agent-Oriented Software Engineering, Vol. X, No. Y, xxxx 1

Copyright © 200x Inderscience Enterprises Ltd.

An architecture for exception management in multiagent systems

Eric Platon* National Institute of Informatics, Sokendai 2–1–2 Hitotsubashi, Chiyoda-ku 101–8430 Tokyo, Japan and Laboratoire d’informatique de Paris 6 104, avenue du Président Kennedy 75016 Paris, France E-mail: [email protected] *Corresponding author

Nicolas Sabouret Laboratoire d’informatique de Paris 6 104, avenue du Président Kennedy 75016 Paris, France E-mail: [email protected]

Shinichi Honiden National Institute of Informatics, Sokendai 2–1–2 Hitotsubashi, Chiyoda-ku 101–8430 Tokyo, Japan E-mail: [email protected]

Abstract: Multiagent Systems (MAS) are open, heterogeneous and distributed software systems of autonomous agents. Exception management in MAS differs from what is known in usual engineering approaches, owing to specific situations to handle, such as agent death, knowledge inconsistencies or collaborative handling. Existing work does not fully address the properties of MAS, notably agent autonomy, and the mechanisms related to exceptions are often ad hoc. In this article, we define the concept of agent exception so as to satisfy the characteristics of the agent paradigm, and we propose a MAS architecture to support the design and development of agent systems with exception management facilities. This architecture provides designers with an exception mechanism integrated into usual agent models, so that the only work left to the designer is the definition of application-dependent handlers, which the architecture invokes automatically when required.

Keywords: exception management; Multiagent Systems; MAS; autonomy engineering.

Reference to this paper should be made as follows: Platon, E., Sabouret, N. and Honiden, S. (xxxx) ‘An architecture for exception management in multiagent systems’, Int. J. Agent-Oriented Software Engineering, Vol. X, No. Y, pp.000–000.

2 E. Platon, N. Sabouret and S. Honiden

Biographical notes: Eric Platon holds a joint PhD from Paris 6 University and the National Institute of Informatics, Sokendai. His research interests are multi-agent systems, notably issues related to software engineering and distribution aspects. Most of his publications are related to dependability mechanisms in the multi-agent case, with a focus on exception management and interaction monitoring.

Nicolas Sabouret is an Assistant Professor at Paris 6 University and holds a PhD from Paris 11 University. His research interests are multi-agent systems, semantic web services and animated conversational agents. The common thread of these interests is robust reasoning about action, change and runtime. He is a member of several scientific societies and of review and program committees related to artificial intelligence and engineering publications.

Shinichi Honiden is a Full Professor at the National Institute of Informatics and the University of Tokyo. His research laboratory focuses on software engineering issues in distributed systems, with particular contributions to the field of multi-agent systems, mobile agent technologies and ubiquitous computing. After serving as Director of Research at Toshiba, he cultivates strong links with industry, notably through a joint education project to transfer techniques from mature research to software industry practice. He is a member of several major scientific societies and of review and program committees related to software engineering.

1 Preliminary

Exception-handling techniques were developed in the 1970s to increase the reliability of software without hampering the ease of programming. In the era of procedural languages and the advent of object-oriented programming, the term ‘exception’ has acquired a specialised meaning, tightly attached to high-level programming paradigms, as illustrated by the definition of Goodenough.

“Of the conditions detected while attempting to perform some operation, exception conditions are those brought to the attention of the operation’s invoker. The invoker is then permitted (or required) to respond to the condition.” (Goodenough, 1975a–c)

This definition and the subsequent lineage of exception-handling mechanisms are operation-centric approaches (Parnas and Würges, 1976). When an operation is invoked, e.g., by a method of an object, conditions are checked before the actual execution to validate the invocation context. Typical conditions are the correctness of the types and values of the operation's input parameters. If a condition is not met, the operation is not executed and an exception is signalled to the invoker to initiate appropriate handling mechanisms. Modern models, such as those of Java, C++ and Eiffel, follow this description (Meyer, 1988; Stroustrup, 2000; Gosling et al., 2005).
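The operation-centric model can be illustrated with a minimal Python sketch (an illustrative assumption, not code taken from the cited languages): the hypothetical operation `withdraw` validates its invocation context before executing and signals a hypothetical `PreconditionError` to its invoker, which is the one that responds to the condition.

```python
class PreconditionError(Exception):
    """Signalled to the invoker when an invocation condition is violated."""


def withdraw(balance: int, amount: int) -> int:
    # Check the invocation context (type and value of the input) before
    # executing, as in the operation-centric model.
    if not isinstance(amount, int):
        raise PreconditionError("amount must be an integer")
    if amount <= 0 or amount > balance:
        raise PreconditionError(f"invalid amount {amount} for balance {balance}")
    return balance - amount  # the operation proper


def invoker(balance: int, amount: int) -> int:
    # The invoker is permitted (here, required) to respond to the condition.
    try:
        return withdraw(balance, amount)
    except PreconditionError:
        return balance  # handler: leave the balance unchanged


print(invoker(100, 30))   # 70
print(invoker(100, 500))  # 100
```

The decision that the situation is exceptional is taken by the operation; the invoker only chooses how to respond. This is precisely the point on which the agent case departs.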

In the case of agent systems, the usual definition of exception applies, as agents are software, but it misses characteristics of the agent concept that stand at a higher level. Traditional definitions state that an exception is entirely determined when conditions are violated in the invocation context of an operation. Agents are, however, free to evaluate whether the result of invoking an operation is ‘normal’ or ‘exceptional’, owing to the autonomy assumption and to the loose coupling among agents and resources. Agents can also evaluate the type of exception they encounter differently, depending on their individual contexts and inner mechanisms. Lastly, the usual notion of exception is mostly implemented as specific constructs of a programming language, and it is not clear whether a language construct is appropriate for the agent case, owing to autonomy, openness, heterogeneity and systemic effects (Klein and Dellarocas, 1999). These characteristics lead us to consider an additional ‘agent-centric’ approach, orthogonal to the notion of exception in programming languages and closer to architectural considerations, as can be observed in related work on exceptions (Issarny and Banâtre, 2001; Brambilla et al., 2006).

1.1 Agent exception

We define an agent exception with regard to the characteristics of openness, heterogeneity and agent autonomy of Multiagent Systems (MAS) (Platon et al., 2006a).

An agent exception is the evaluation by the agent of a perceived event as unexpected.

The source of the exception is the essential difference from traditional definitions: it is the agent that decides an event is exceptional, instead of merely receiving an event that an external entity has deemed exceptional. Owing to autonomy, agents can evaluate any percept and thus choose to engage either exceptional or normal execution code. An event should be understood in the broad sense of any observable action or state in the system. Examples of events are the sending and reception of Agent Communication Language (ACL) messages, or the perception of artificial pheromones in stigmergic systems.

Exceptions are qualified as unexpected events. By unexpected, we mean that the agent does not anticipate the arrival of the event in the current execution context (time, resources, values of parameters, etc.). In other words, the agent is not ‘ready’, or is unable, to process the event when it occurs. Whether an event is unexpected therefore depends on the agent that evaluates it, and two different agents may react differently to the same event.
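A minimal Python sketch of this agent-centric evaluation, assuming a hypothetical `Agent` whose expectations in its current context are reduced to a predicate `expects`; the definition itself does not prescribe how expectations are represented. The same `propose` event is normal for a consumer that has issued a CFP and exceptional for one that has not.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Event:
    """Any observable action or state, e.g. a received ACL message."""
    kind: str
    sender: str


@dataclass
class Agent:
    name: str
    # Hypothetical stand-in for the agent's inner mechanisms: is the agent
    # 'ready' to process this event in its current execution context?
    expects: Callable[[Event], bool]
    log: list = field(default_factory=list)

    def perceive(self, event: Event) -> str:
        # The agent itself, not the sender, decides whether it is exceptional.
        verdict = "normal" if self.expects(event) else "exceptional"
        self.log.append((event, verdict))
        return verdict


offer = Event(kind="propose", sender="retailer-1")
waiting_consumer = Agent("c1", expects=lambda e: e.kind in {"propose", "refuse"})
idle_consumer = Agent("c2", expects=lambda e: False)
print(waiting_consumer.perceive(offer))  # normal
print(idle_consumer.perceive(offer))     # exceptional
```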

This definition is compatible with openness and agent autonomy in MAS, since it builds on a loosely coupled model of MAS in which agents can autonomously interpret the same event in different ways. The definition establishes what an agent exception is; it is not concerned with the social interdependencies that modulate individual interpretations. Typically, a power relationship or a reputation model can lead autonomous agents to consider an event as exceptional because they were told to do so by a superior or a trusted party. Such modulations are optional capabilities that can influence the choices of agents, but they remain distinct matters. The remainder of this text focuses on the essential characteristics of the definition and leaves the study of the modulations for future work.

1.2 Case study

The case study is a simple agent-based simulation where agents sell and buy items following the contract net protocol (CNet) (Smith, 1980; FIPA, 2006). Each agent plays a single role, either retailer or consumer. We assume that the target system shall be FIPA-compliant, i.e., the system features the infrastructure recommended by the Foundation for Intelligent Physical Agents (FIPA) (directory facilitation, agent management, etc.).

1.2.1 Fundamental architecture, algorithm and protocol of the case

The left part of Figure 1 presents the architecture of the agents and a typical execution algorithm, based on the standard ‘sense–process–act’ models in the agent community (Brooks, 1991; Russell and Norvig, 2003). The right part of the figure shows a version of the CNet from FIPA. The protocol is slightly simplified compared to the standard version to save space, but its main characteristics are preserved. The notation for the protocol follows the FIPA recommendation, except for the ‘*’ (star) symbol, which is introduced to represent zero or more elements. Therefore, the CNet accepts an arbitrary number of retailers but only one consumer, and messages can be multiple (multicasting).
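The consumer's side of such a protocol can be encoded as a transition table. The sketch below assumes the standard FIPA contract net performatives (`cfp`, `propose`, `refuse`, `accept-proposal`, `reject-proposal`, `inform`, `failure`); the state names and the `step` helper are hypothetical, and the simplified protocol of Figure 1 may differ in its details.

```python
from typing import Optional

# Consumer-side transitions of a simplified contract net.
# Keys: (state, direction, performative); values: next state.
TRANSITIONS = {
    ("idle", "send", "cfp"): "collecting",
    ("collecting", "recv", "propose"): "collecting",  # '*': any number of retailers
    ("collecting", "recv", "refuse"): "collecting",
    ("collecting", "send", "reject-proposal"): "collecting",
    ("collecting", "send", "accept-proposal"): "awaiting-result",
    ("awaiting-result", "recv", "inform"): "done",
    ("awaiting-result", "recv", "failure"): "done",
}


def step(state: str, direction: str, performative: str) -> Optional[str]:
    """Return the next state, or None if the message is not in the protocol."""
    return TRANSITIONS.get((state, direction, performative))


# A protocol-conformant run with a single winning retailer.
state = "idle"
for direction, perf in [("send", "cfp"), ("recv", "propose"),
                        ("send", "accept-proposal"), ("recv", "inform")]:
    state = step(state, direction, perf)
print(state)                            # done
print(step("idle", "recv", "propose"))  # None: no CFP was issued
```

A `None` result marks a message that falls outside the protocol sequence. Under the definition of Section 1.1, the table only describes what the consumer is ready for; whether such a message is treated as exceptional remains the agent's decision.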

Figure 1 Basic agent architecture, execution algorithm and a version of CNet

The agents receive percepts from others through the environment (the input parameter) with the sense functionality, which corresponds to the sensor component of the architecture. They process percepts to produce actions: processing is carried out by the agent’s internal mechanisms, and the knowledge it relies on is held in the internal representation. Finally, agents apply the resulting action to the environment through the actuator component.

The two types of agents in the case study differ in their process functionalities in order to fulfil their respective roles in the CNet protocol. Retailers process messages to decide prices, sell and produce ordered items. These processes output reactions to messages received from consumers (right-hand lifeline). Consumers, on the other hand, produce Call For Proposals (CFP) messages and decide the winner of the call. Consumers initiate CNet protocols with CFP messages and react to messages from retailers (left-hand lifeline).

Figure 1 (left), execution algorithm:

    Input: environment
    while true do
        percept ← sense(environment)
        action ← process(percept)
        act(action, environment)
    end

Figure 1 (left), architecture diagram: an Agent composed of a Sensor, the Agent Internal Mechanisms, an Internal Representation and an Actuator, situated in the Application Environment (legend: loop flow; R/W access).
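The execution algorithm of Figure 1 can be made runnable as follows. This is a sketch under stated assumptions: the `Environment` class, the string-encoded messages and the retailer's `process` function are hypothetical, and the infinite `while true` loop is bounded so that the example terminates.

```python
from typing import Callable, List, Optional


class Environment:
    """Hypothetical shared medium: agents read percepts and write actions."""

    def __init__(self, percepts: List[str]) -> None:
        self.inbox = list(percepts)   # pending percepts for this agent
        self.outbox: List[str] = []   # actions applied to the environment


def sense(env: Environment) -> Optional[str]:
    # Sensor: take the next percept, if any.
    return env.inbox.pop(0) if env.inbox else None


def act(action: str, env: Environment) -> None:
    # Actuator: apply the action to the environment.
    env.outbox.append(action)


def run(env: Environment, process: Callable[[str], str], max_steps: int = 10) -> None:
    # 'while true' in the original; bounded here so that the sketch terminates.
    for _ in range(max_steps):
        percept = sense(env)
        if percept is None:
            break
        action = process(percept)  # internal mechanisms
        act(action, env)


# A retailer answering a CFP for an item with a proposal for the same item.
env = Environment(["cfp:item-42"])
run(env, process=lambda p: "propose:" + p.split(":")[1])
print(env.outbox)  # ['propose:item-42']
```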

1.2.2 Example of agent exceptions

The agents of the case study are first designed according to the CNet protocol. Flaws in the design of agents or non-determinism in their decision process can produce messages that do not follow the sequence of the protocol or do not arrive on time. Such situations are exacerbated in open systems, where agents are developed independently. Despite a standard, public specification of the protocol, implementations can be over- or underspecified, with potential for design flaws. We consider two situations here: overspecification and the cancellation metaprotocol (FIPA, 2006).

A retailer can implement extra functionalities (overspecification) that comply with the characteristics of the CNet, although they produce unexpected events. In a FIPA-compliant system, retailers can exploit the directory facilitator to learn about consumers in the system and initiate CNet protocols themselves. Such CNet instances do not start with a CFP, however, but directly with a ‘propose’ message from the retailer to the target consumer. This initiative from retailers is an unexpected event, since consumer agents that closely follow the CNet are not ‘ready’ to process it. It is, however, a desired property of autonomous agents to adapt to this kind of exceptional situation and take advantage of unexpected offers. The behaviour is also sound in this example, as the retailer merely proposes to execute a legal CNet, although initiated in an exceptional fashion.
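The overspecification example can be sketched in Python as follows, assuming a hypothetical `Consumer` that separates normal handling from an application-dependent handler for unexpected messages. Here the handler treats an unsolicited `propose` as a CNet started in an exceptional way, joins it and resumes normal processing; the budget rule is an illustrative assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    performative: str
    sender: str
    price: float = 0.0


class Consumer:
    def __init__(self, budget: float) -> None:
        self.budget = budget
        self.cfp_open = False  # True only while this consumer's own CFP runs

    def expects(self, msg: Message) -> bool:
        # Strict CNet reading: proposals are expected only after our own CFP.
        return msg.performative == "propose" and self.cfp_open

    def on_message(self, msg: Message) -> str:
        # The consumer itself decides whether the message is normal or not.
        if self.expects(msg):
            return self.handle_proposal(msg)
        return self.handle_unexpected(msg)

    def handle_proposal(self, msg: Message) -> str:
        return "accept-proposal" if msg.price <= self.budget else "reject-proposal"

    def handle_unexpected(self, msg: Message) -> str:
        # Hypothetical application-dependent handler: join a CNet that a
        # retailer initiated with 'propose', then resume normal handling.
        if msg.performative == "propose":
            self.cfp_open = True
            return self.handle_proposal(msg)
        return "ignore"


consumer = Consumer(budget=50.0)
print(consumer.on_message(Message("propose", "retailer-7", price=40.0)))  # accept-proposal
print(consumer.on_message(Message("cancel", "retailer-7")))               # ignore
```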

FIPA recommends implementing a cancellation metaprotocol in addition to the CNet, in case the consumer decides to abandon the protocol. In open systems, some consumer agents may not implement this extra protocol (‘underspecification’), and retailer agents may also decide to cancel their proposals, for example (thus returning to the previous example). In other words, agents are likely to encounter situations where the protocol is cancelled without their being informed. They then have to deal with the absence of events, which can also be thought of as an unexpected situation. The usual way to cope with such absence is to set timeouts in the agent for a given activity, as can be observed at the beginning of the CNet. The standard FIPA model of the CNet does not specify timeouts for the particular case of cancellation, so designers cannot be expected to implement them.
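A minimal sketch of such a timeout, for instance for a retailer waiting for an answer to its proposal after the consumer has silently abandoned the protocol. The `Deadline` class, the `check_activity` function and the manual `FakeClock` are hypothetical; the point is that the expiry only produces a time event, and the agent decides whether that event is exceptional.

```python
import time
from typing import Callable


class Deadline:
    """Hypothetical per-activity timer whose expiry is a time event."""

    def __init__(self, seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.expires_at = clock() + seconds

    def expired(self) -> bool:
        return self.clock() >= self.expires_at


def check_activity(reply_received: bool, deadline: Deadline, still_needed: bool) -> str:
    if reply_received:
        return "normal"
    if not deadline.expired():
        return "waiting"
    # The time event has occurred; whether it is exceptional is the agent's choice.
    return "exceptional" if still_needed else "ignored"


class FakeClock:
    """Manual clock so that the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
deadline = Deadline(5.0, clock)
print(check_activity(False, deadline, still_needed=True))   # waiting
clock.now = 6.0
print(check_activity(False, deadline, still_needed=True))   # exceptional
print(check_activity(False, deadline, still_needed=False))  # ignored
```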

A final example in the case study is ‘agent death’, in which an agent faces the situation where a peer terminates prematurely (Klein et al., 2003). The problem is for the agents to react to the death and remain in a consistent state to pursue their activities. A fundamental issue is for the agent to detect the death of the peer (Platon et al., 2006c). The basic detection methods are to set up a ‘heartbeat’ mechanism (Miller and Tripathi, 2004; Iliasov and Romanovsky, 2006), or a time limit for an answer, so that a peer is considered ‘dead’ whenever the limit is reached. In this latter case, the time event produced by the system clock when the limit is reached can be considered as an exceptional event by an agent. For another agent, the same time event can simply be ignored as irrelevant or normal, depending on an autonomous choice.
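The heartbeat method can be sketched as follows, assuming a hypothetical `HeartbeatMonitor` that records when each peer last signalled and presumes dead any peer silent for longer than `limit` time units. Time is passed explicitly to keep the example deterministic.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Hypothetical sketch: a peer is presumed dead when its last heartbeat
    is older than `limit` time units."""

    def __init__(self, limit: float) -> None:
        self.limit = limit
        self.last_seen: Dict[str, float] = {}

    def beat(self, peer: str, now: float) -> None:
        # Record a heartbeat message received from `peer` at time `now`.
        self.last_seen[peer] = now

    def presumed_dead(self, now: float) -> List[str]:
        return sorted(p for p, t in self.last_seen.items() if now - t > self.limit)


monitor = HeartbeatMonitor(limit=3.0)
monitor.beat("retailer-1", now=0.0)
monitor.beat("retailer-2", now=0.0)
monitor.beat("retailer-2", now=4.0)
print(monitor.presumed_dead(now=5.0))  # ['retailer-1']
```

The monitor only produces the detection event. A consumer that had awarded its contract to `retailer-1` may treat it as exceptional, while an agent that never dealt with that retailer may ignore it.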

1.3 Research issues and purpose

Agent exceptions differ from programming exceptions, which means that existing work may not be fully appropriate to handle them. From an agent-oriented software engineering perspective, it is of high importance to avoid ad hoc exception management and to provide designers with appropriate models and tools. These research endeavours are essential to cope with the issue of exceptions in MAS and, further, with the issue of fault tolerance.

A number of paths can be considered to study the question of exceptions in MAS. Most notably, we considered introducing a new performative in FIPA-ACL to provide declarative means to deal with exceptions. We discarded this option, however, owing to our conclusions on the nature of agent exceptions (Platon et al., 2006a). A new performative can help in a range of situations to inform agents about an exception, but autonomy should let agents decide whether the content (or performative) of a message is exceptional, depending on their knowledge and context. An inform performative with appropriate content is therefore sufficient for the pragmatics. More fundamentally, an ‘exception performative’ would conflate the semantics of programming exceptions with that of agent exceptions (yet to be defined). We think it is risky to inherit characteristics of usual models that are not adapted to agents. This claim is supported by the proliferation of innovative models in distributed systems (Xu et al., 1998; Issarny, 2001; Miller and Tripathi, 2004), where usual exception models cannot cope with the concurrency of exception signals.

Another path that we considered and followed is the architecture of agent systems (e.g., Weyns, 2006). Our definition stresses that agent exceptions are based on perception, the functional part of the interface between an agent and its environment. This relation to perception has consequences on the type of agent architecture required to deal with exception management. Appropriate architectures and guidelines can support designers, to some extent, in producing agents that feature exception-safe capabilities. In addition, an architecture-based approach is appropriate in the case of MAS, where the properties of interest are distribution, openness, heterogeneity and agent autonomy. All these properties constrain the architectural styles that are acceptable for agents, in addition to the requirements for exception management.

The purpose of this article is then to present a software architecture for MAS that encompasses agent exception mechanisms. As open and heterogeneous systems that host autonomous agents, MAS have properties such as dynamic binding of their elements and automatic reorganisation capabilities (as in autonomic computing).

Although mechanisms were proposed to handle agent exception, we will show they are usually not integrated into appropriate architectures and do not usually fulfil the requirements exposed by the aforementioned definition.

Although this article focuses explicitly on the issue of MAS architecture, research on agent exceptions is still at an early stage. We identified a number of other research directions in related publications. For example, the automatic generation of exception handlers, asynchronous exception management and concerted exception management are specific mechanisms that were shown to be necessary to manage agent exceptions (Souchon et al., 2003; Platon et al., 2006a–b).

1.4 Organisation

The structure of this article is as follows. Section 2 reviews the literature related to exceptions in MAS and other types of systems. The aim of this section is to show how the current state of research relates to the definition of agent exception, and why it is not completely appropriate. Section 3 presents in detail the MAS architecture we propose to take agent exceptions into account. Section 4 discusses the case study presented in the introduction, the limitations of the approach and a comparison to related work. Finally, Section 5 concludes the article.

2 Related work in exception management

The literature related to this article pertains to MAS and to distributed systems at large. In the following, theoretical and operational research is presented and related to the definition of agent exception, with the aim of demonstrating its shortcomings with respect to the requirements of agents and their engineering. This survey cannot be exhaustive, so the most representative work has been selected for presentation.

2.1 Exceptions in distributed systems

2.1.1 Approaches for distributed and active objects

Distributed and Active Objects (D/AO) have received particular attention regarding exception handling, since usual mechanisms are not adapted to properties of such systems, such as concurrency and the global scope of some exceptional situations.

The exception-handling models of Xu et al. (1998), Issarny (2001) and Miller and Tripathi (2004) rely on closely related concepts to cope with the concurrency of exception signals in D/AO systems (Dony et al., 2006). For example, Miller and Tripathi proposed ‘the guardian’ as a set of software constructs to handle exceptions in a distributed-object system. The guardian is a dedicated object that encapsulates rules to handle ‘global exceptions’ involving several threads, thus dealing with concurrent exceptions. Miller and Tripathi present a detailed example of exception handling in which the direct relationship with Java facilities can be observed. The guardian assists a client-server system that implements the ‘primary-backup’ approach to deal with server-side failures (Tanenbaum, 1994). If the primary server fails, a ‘global exception’ is raised, and the guardian handles the error by asking the backup to take the role of primary and by starting a new backup. This example relates to the reorganisation of teams in MAS, and the server failure can be thought of as an agent-level exception.
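The primary-backup scenario can be sketched as follows. This is a hypothetical illustration of a guardian-style global handler, not the constructs of Miller and Tripathi, which are tied to Java facilities; the `Guardian` and `Replica` classes and the naming of new backups are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str


@dataclass
class Guardian:
    """Hypothetical guardian-style handler for a primary-backup service."""
    primary: Replica
    backup: Replica
    spawned: int = 0

    def _new_backup(self) -> Replica:
        self.spawned += 1
        return Replica(f"backup-{self.spawned}")

    def on_global_exception(self, failed: Replica) -> None:
        # Handling rule for a global 'server failed' exception.
        if failed is self.primary:
            self.primary = self.backup        # the backup becomes primary
            self.backup = self._new_backup()  # and a new backup is started
        elif failed is self.backup:
            self.backup = self._new_backup()


g = Guardian(primary=Replica("server-a"), backup=Replica("server-b"))
g.on_global_exception(g.primary)
print(g.primary.name, g.backup.name)  # server-b backup-1
```

The guardian rewires the replicas directly: it commands them rather than informing them, which is the kind of control discussed next.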

The guardian and related work do not, however, capture the characteristics of agent systems. They initially target D/AO with (remote) procedure calls, and their coupling is tighter than in an agent system architecture. Concretely, the interaction model of D/AO has semantics very similar to usual object-oriented handling facilities that ‘bind’ invoker and operation. Agent interactions rely on other models with ‘weaker bindings’, typically message passing. Malicious agents can be part of open MAS, along with benevolent and ill-designed agents. The approaches for D/AO assume that agents are benevolent and do not currently cope with arbitrary agent profiles, even though security concerns are considered. Some agent exceptions, such as agent death, are also not taken into account (Klein et al., 2003). Finally, the encapsulation of agents cannot be fully satisfied. These approaches guarantee some independence to the agent, which can explicitly ignore messages or answer with false results, but access to the agent state (members and methods) is granted. Typically, the guardian is allowed to ‘command’ an agent, e.g., to wait or to restart a task, which are undesired possibilities in MAS.

2.1.2 Exceptions and software architecture

Research in software architecture proposes exception handling related to Architecture Description Languages (ADL), which target software engineering directly at the architecture level. The motivation for this approach is that the architecture of a complex system is not guaranteed to remain optimal over the lifetime of the system. Some events do not strictly require, but would benefit from, architectural adaptation to new execution conditions. Such adaptation can be observed in banking applications, where banks cannot afford to completely revise their IT system architecture at each evolution. The architecture is often extended instead of adapted, at the cost of increasing complexity and performance drops. One notable instance of architecture-related exception handling is the work of Issarny and Banâtre (2001), which introduces exception-handling constructs and runtime support to an ADL. This extended ADL makes it possible to specify how the architecture reacts to some exceptions.

Examples of such architecture-level exceptions relate to the client-server architecture. The language makes it possible to specify that a base architecture (e.g., a Remote Procedure Call (RPC) link) can evolve automatically for dynamic binding of component instances, enhanced availability (replication) or enhanced response time (prefetching), whenever such evolution is necessary to maintain system performance.

Such work at the architecture level is relevant to MAS, which are open and heterogeneous architectures. However, the extended ADL proposed in current work mostly targets cooperative components, so further extensions are required to deal with autonomous entities.

2.1.3 Exceptions in component-based software development

In relation to software architecture, the development of software based on Commercial Off-The-Shelf (COTS) components aims at building systems by assembling generic ‘ready-to-use’ components (Szyperski, 2002). The issue with COTS in practice is the actual integration of arbitrary components into a robust application. The implementation details of components are usually not known, and only some details about the provided functionalities are delivered with a given component. Integration of components is therefore difficult, as ‘systemic exceptions’ can occur owing to their assembly (Dellarocas, 1998). In addition to traditional exceptions handled inside components as individual subparts of the system, system-level exceptions need specific mechanisms, in the same way that agent exceptions call for novel approaches.

Dellarocas proposes a model developed in relation to the work of Hägg and of Klein et al. in MAS (Hägg, 1996; Dellarocas, 1998; Klein et al., 2003). The approach is to introduce pluggable ‘sentinel components’ into the assembly of COTS and to require the components of the application to implement a set of interfaces that lets sentinels detect and deal with exceptional behaviours. Sentinels actively monitor the execution system-wide for symptoms, and they exploit a knowledge base of handling recipes to recover from a variety of situations.
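A minimal sketch of a sentinel applying handling recipes, assuming hypothetical symptom names and a dictionary as the knowledge base; unknown symptoms are escalated. It relies on components reporting `(component, symptom)` observations, i.e., on the observability assumption discussed below.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical knowledge base: symptom -> handling recipe.
Recipe = Callable[[str], str]
RECIPES: Dict[str, Recipe] = {
    "no-reply": lambda comp: f"resend request to {comp}",
    "bad-value": lambda comp: f"roll back {comp} to last checkpoint",
}


def sentinel_scan(observations: List[Tuple[str, str]]) -> List[str]:
    """Examine (component, symptom) observations gathered system-wide and
    apply the matching recipe; unknown symptoms are escalated."""
    actions = []
    for component, symptom in observations:
        recipe = RECIPES.get(symptom)
        if recipe is not None:
            actions.append(recipe(component))
        else:
            actions.append(f"escalate {symptom} from {component}")
    return actions


print(sentinel_scan([("pricing", "no-reply"), ("billing", "deadlock")]))
# ['resend request to pricing', 'escalate deadlock from billing']
```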

A later approach relies on coordinated exception handling, mentioned above in the case of D/AO (Xu et al., 1998; Romanovsky, 2001). The execution of components is organised into atomic actions that define a scope wherein exceptional situations must be managed. Action scopes can be nested so that the usual recursive handling schemes are reproduced: an exception that cannot be handled inside an action scope is propagated to the enclosing action scope. The main advantage of this approach is that it provides a dynamic means to organise the execution of components into actions. It also manages the occurrence of concurrent exceptions inside these actions. In addition, this work proposes guidelines for software integrators to introduce this exception-handling mechanism into the development process of COTS assembly.
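The nesting and propagation of action scopes can be sketched with Python context managers. The `action_scope` helper and the exception classes are hypothetical, and the sketch omits the resolution of concurrent exceptions that coordinated atomic actions also provide.

```python
from contextlib import contextmanager
from typing import Iterator, List, Tuple, Type

trace: List[str] = []


@contextmanager
def action_scope(name: str,
                 handles: Tuple[Type[BaseException], ...] = ()) -> Iterator[None]:
    """Handle the exception types in `handles`; let any other exception
    propagate to the enclosing scope."""
    try:
        yield
    except handles as exc:
        trace.append(f"{name} handled {type(exc).__name__}")


class ComponentTimeout(Exception):
    pass


class StockError(Exception):
    pass


with action_scope("order", handles=(StockError,)):
    with action_scope("payment", handles=(ComponentTimeout,)):
        raise StockError("item out of stock")  # not handled by 'payment'

print(trace)  # ['order handled StockError']
```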

The component-based approach assumes that application components are observable and commandable (through the required set of interfaces), and this hypothesis is not acceptable with agents. In the case of sentinel components, the approach based on system-wide observation does not hold in MAS, where agents only have a local scope and scalability issues arise as the number of agents or the complexity of their interactions increases. The structuring offered by the action model is also not fully applicable to agent exceptions. One of the assumptions of this work is in fact that ‘components have deterministic behaviour and do not change their state spontaneously’ (Romanovsky, 2001). In other words, components only react when invoked, similarly to an object in object-oriented programming. Although agents can be predictable, they usually evolve spontaneously as they execute autonomously.

2.2 Exceptions in MAS research

2.2.1 The sentinel-based architecture

Sentinels are special agents introduced by Hägg in MAS applications to provide a fault-tolerance service layer for BDI agents (Hägg, 1996). The approach has been extended in the work of Klein et al. with an exception handler repository (Klein and Dellarocas, 1999; Klein et al., 2003). Another extension has been developed by Shah et al., focusing on an exception diagnosis mechanism for detecting when sentinels must react (Shah et al., 2004; 2005; 2006). Each sentinel assists an application agent in its interactions with other agents. Sentinels are specialised in error detection and recovery, with the capability to inspect the state of agents (including their ‘beliefs’ (Rao and Georgeff, 1995)). When an exception is detected in interactions or agent states, the sentinels execute specific code to recover a desired state.

Hägg details an application for a power distribution company, comprising a system of agents and their sentinels. Application agents negotiate energy consumption credits for load balancing on an electric grid. Sentinels can detect and remedy erroneous behaviours in negotiation processes by inspecting ‘checkpoints’ in the agent code.

The problem with the sentinel approach is that it does not satisfy the properties of agent encapsulation and autonomy. Encapsulation is not respected, since sentinels can access and execute code in the so-called ‘agent-head’ (Hägg, 1996), which should be a black box to respect agent autonomy. In addition, the latter extension is declared to be part of the hosting system, where agents can freely join and leave (Shah et al., 2004). As sentinels are allowed to fully inspect agents, this architectural style further fails to satisfy the assumptions of openness and agent autonomy. Finally, agents are supposed to be benevolent, and this hypothesis cannot hold in open systems.

2.2.2 Exceptions and mobile agents

Mobile agents have specific needs regarding fault tolerance, as they execute in heterogeneous deployment environments and the potential for communication issues is higher than in fixed systems (problems of location, tracking, etc.). The agent community has proposed several approaches, notably middleware layers to support mobile agents in case of exceptions (Tripathi and Miller, 2001; Arief et al., 2006; Iliasov and Romanovsky, 2006; Damasceno et al., 2006). The CAMA middleware is an example of such work, which elaborates a notion of scope in the coordination space of mobile agents. Different scope levels target a location, an agent or a role, and different types of exceptions are attached to each scope. This organisation of exception types provides designers with appropriate guidelines to reason about exceptions, and with tools to define adequate agent reactions to unexpected events. The coordination space serves to propagate exception signals among the different scopes when required.
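To make the idea of scope-based propagation concrete, here is a minimal Python sketch. It is a generic illustration, not the CAMA API: the `Scope` class, its `signal` method, the handler signature and the rule that an unhandled exception moves to the enclosing scope are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


# Hypothetical scope: it may hold handlers keyed by exception type.
# The propagation rule (unhandled -> enclosing scope) is an assumption.
@dataclass
class Scope:
    name: str                                   # e.g. "location:L1", "role:buyer"
    parent: Optional["Scope"] = None            # enclosing scope, if any
    handlers: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def signal(self, exc_type: str) -> str:
        """Handle exc_type in this scope or propagate it outwards."""
        scope: Optional[Scope] = self
        while scope is not None:
            handler = scope.handlers.get(exc_type)
            if handler is not None:
                return handler(scope.name)      # handled where a handler exists
            scope = scope.parent                # propagate to the enclosing scope
        return f"unhandled:{exc_type}"


location = Scope("location:L1", handlers={"link_lost": lambda s: f"reconnect@{s}"})
role = Scope("role:buyer", parent=location,
             handlers={"no_offer": lambda s: f"retry@{s}"})
print(role.signal("no_offer"))    # retry@role:buyer
print(role.signal("link_lost"))   # reconnect@location:L1
```

Placing handlers at the scope where an exception type makes sense keeps inner scopes small, while the walk up the parent chain gives a single, predictable propagation path.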

The number of approaches for exceptions in mobile agent systems has grown recently, which demonstrates the importance of the underlying issues. The advantage of these approaches is that they build upon sound engineering foundations, often related to the earlier work presented in the section on distributed systems. The mobile agent model, however, gives rise to specific challenges that are often not addressed by other agent approaches, including all the others presented in this section. Consequently, we think the present work on mobile agents is essential to its topic and provides concrete grounds for the technical aspects of exceptions in MAS. It is not, however, compatible with our definition of agent exception, since agents do not decide by themselves what is exceptional and what is normal. We nevertheless believe that these models are useful for agent exceptions and may evolve adequately.

2.2.3 Agent exceptions in commitment protocols

Representative work related to commitment protocols includes mechanisms to deal with exceptions in the fulfilment of commitments, which is particularly interesting for expressing the responsibilities of the agent software (Xing et al., 2001a–b; Xing and Singh, 2003; Mallya and Singh, 2005). The work of Xing et al. focuses on patterns of interactions among agents that lead to commitments, and some of the patterns can serve to model the reaction of agents to exceptional situations. A typical example is a pattern that describes the need to revise a commitment after the context of the agent has evolved into exceptional conditions. The advantage of this pattern-based approach is that agent behaviours can be described with state charts that are suitable for engineering, notably at the requirement analysis and design stages. The research on commitment patterns, however, needs further work towards the later development stages, and a framework in which commitment patterns carry through down to implementation; the same holds for the work on exceptions in workflow systems (Borgida and Murata, 1999). Another approach to commitments, from Mallya and Singh, deals with exception handling for proactive agents that execute commitment protocols, a model of agent interactions that guarantees autonomy. When such a protocol is not respected, an exception is signalled to handle expected and unexpected situations. Expected exceptions are foreseen by the designer, who wrote a specific handler beforehand (here, another protocol), which is the most common case in software programs. Unexpected exceptions are not coded beforehand, and some constructs allow a handler to be built dynamically by merging basic protocols.

This method has been illustrated with a hotel reservation protocol. An expected exception can be the case where the hotel has no vacancy. The designer usually foresees this issue, and a specific handler is available in the system to deal with it. An unexpected exception can be a dramatic event (a fire) that destroys the hotel. Customer agents usually know how to behave when facing a cancellation, and it is reasonable to consider such a situation as ‘expected’ from their side, in the sense that a handler is available. The hotel agent, however, is unlikely to have a handler that cancels all reservations, since the handling of such an exception might not have been prepared at design time. Mallya and Singh propose to rely on an external repository to fetch specific handlers and merge them automatically for adequate handling.
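The distinction between expected and unexpected exceptions can be sketched as a handler selection function. This is a deliberately simplified illustration, not Mallya and Singh's method: `select_handler`, the `needed` parameter and the representation of a handler as a list of message names are hypothetical, and naive concatenation stands in for their protocol merging.

```python
from typing import Dict, List, Optional

# A handler is reduced to an ordered list of message names (assumption).
Handler = List[str]


def select_handler(exception: str,
                   local: Dict[str, Handler],
                   repository: Dict[str, Handler],
                   needed: List[str]) -> Optional[Handler]:
    """Return a handler for `exception`, or None if none can be built.

    `needed` names the basic protocols the agent wants to combine; how the
    agent chooses them is left open here (hypothetical parameter).
    """
    if exception in local:
        # Expected exception: the designer wrote this handler beforehand.
        return local[exception]
    if not needed or any(name not in repository for name in needed):
        return None  # cannot assemble a handler from the available parts
    # Unexpected exception: fetch basic protocols from the external repository
    # and merge them by naive concatenation (a stand-in for real merging).
    return [msg for name in needed for msg in repository[name]]


local = {"no-vacancy": ["inform-no-vacancy", "propose-other-date"]}
repository = {"cancel-all": ["inform-cancellation"],
              "refund": ["propose-refund", "confirm-refund"]}
print(select_handler("fire", local, repository, ["cancel-all", "refund"]))
```

Designer-written handlers stay the first choice, matching the common case described above; the agent only falls back on the external repository when no such handler exists.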

Although this approach is very attractive and complies with the agent paradigm (encapsulation, heterogeneity and openness), it remains theoretical and lacks validated results concerning scalability, even in later work (Mallya, 2005). The open issues are the computational complexity of selecting handlers and of dynamically assembling new ones.

2.2.4 Stigmergic systems

Stigmergy is an interaction model in which agents put marks in the environment (messages with no intended recipient) that other agents exploit to determine their next actions (Brueckner, 2000). Stigmergy models the behaviour of social insects such as termites. One termite starts to build a nest by putting a piece of material on the ground (a mark in the environment). Other termites use this information to determine where to pile the pieces they carry. Stigmergy is thus an indirect interaction model, as there is no direct message passing. Stigmergic systems have been shown to be particularly robust to exceptions such as the death or failure of agents (Van Dyke Parunak, 1997). The robustness of these systems is mostly due to the high redundancy of agents, which is reminiscent of the use of modularity in software architectures to limit the impact of exceptions in sequential systems. The study of these systems is particularly relevant as an advanced coordination model for agents, with a number of applications, notably in self-organising systems.

Little work on stigmergic systems discusses robustness issues, and to our knowledge there is no work on exception handling. The essential feature of this approach is the particular role of the environment in mediating interactions. We see the environment as an adequate element for supporting agent exception mechanisms. It is a modular element of the MAS architecture that can serve agents without infringing their encapsulation (it is the part of the architecture ‘outside’ agents), and the case of stigmergy shows how information about events can flow in the system, so that ant-like agents can cope with events such as agent death.
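The mediating role of the environment can be illustrated with a minimal stigmergy sketch in Python. The one-dimensional line of cells, the evaporation rate and the movement rule are assumptions chosen for brevity, not taken from the cited work; the point is only that agents coordinate through marks left in the environment, so the death of the agent that left a mark does not erase the information.

```python
from collections import defaultdict
from typing import List


class Environment:
    """Holds marks (pheromone levels) on a 1-D line of cells (assumption)."""

    def __init__(self, evaporation: float = 0.1):
        self.marks = defaultdict(float)   # cell index -> pheromone level
        self.evaporation = evaporation    # fraction lost at each evaporate() call

    def deposit(self, cell: int, amount: float = 1.0) -> None:
        self.marks[cell] += amount

    def evaporate(self) -> None:
        for cell in list(self.marks):
            self.marks[cell] *= 1.0 - self.evaporation

    def strongest(self, cells: List[int]) -> int:
        # max() returns the first cell among equals, so the first cell wins ties.
        return max(cells, key=lambda c: self.marks[c])


def step(env: Environment, position: int) -> int:
    """Move to the neighbouring cell with the most pheromone and mark it.

    Agents never message each other: they only read and write the environment.
    """
    target = env.strongest([position - 1, position + 1])
    env.deposit(target)
    return target


env = Environment()
env.deposit(5, 3.0)        # a first agent marks cell 5
print(step(env, 4))        # a second agent at 4 follows the mark to 5
# Even if both agents now die, their marks remain in the environment:
print(step(env, 6))        # a third agent at 6 is also drawn to 5
```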


2.3 Survey conclusion

The work related to exceptions surveyed above contributes to our model of exception. None of it, however, completely addresses the concerns of openness, heterogeneity and agent autonomy. The survey shows important directions that are compatible with our definition and that can serve a comprehensive approach to agent exceptions. It also supports the research direction that we develop in this article, i.e., the elaboration of architectural foundations for exception management in MAS.

3 An architecture for exception management in MAS

The definition of agent exception and the related work have consequences for the architecture of MAS. Usual architectures, such as the one introduced in the running example of the case study and those illustrated by the Jade or Jason frameworks (Jade Agent Framework, 2005; Jason Agent Platform Project, 2006), do not integrate exception management facilities for agent exceptions. In this section, we present a base architecture for this purpose, after enumerating the constraints that follow from the definition of agent exception.

3.1 Architectural constraints

Agents are autonomous pieces of software, i.e., they can take their own decisions, notably concerning normal and exceptional events, which is the central idea of our definition of agent exception. The first architectural constraint is therefore that exceptions make sense only inside agents. Exception management is confined to the inner agent architecture, independently of other agents, and the application environment is not involved in the decision mechanisms related to exceptions.

The place of the environment in the system architecture leads to a second constraint: an explicit environment should be exploited to structure event propagation, including events that may cause exceptions. The environment is responsible for general functions of the architecture that implicitly support exception management. In other words, the environment can provide event notification facilities (cf. stigmergy), but it is up to the agents to interpret events as exceptional. The environment can, however, give access to additional support, such as the handler repository of Klein et al. (2003). Such a repository is then regarded as a ‘resource’ in the environment (Weyns et al., 2007).

The third and last constraint that steered our work is that agent exceptions require an internal representation (e.g., knowledge). An internal representation is necessary as a reference for agents to evaluate incoming events and to distinguish what is ‘normal’ from what is ‘exceptional’, from the individual and subjective point of view of the agent. Consequently, agent architectures without explicit representation, such as the pure subsumption architecture of Brooks (1991), are in principle not suited to agent exceptions. Practical reactive models, however, include some built-in implicit representations, so that they do deal with agent exceptions, but in a less flexible, prewired manner.

3.2 Model of exception-ready agents

The architecture. The architectural constraints lead us to the following revision of the agent architecture. Figure 2 depicts this architecture, with necessary and optional components (Platon et al., 2006a). The necessary components are those of the base architecture in the agent community, introduced for the case study in Figure 1. The distinctive feature of the architecture presented here lies in the elaboration of the agent perception and actuation components. The novelty is the management of relevance and expectation criteria to classify input events (the ‘percepts’), which lets the agent itself initiate exception management when required. This model can be related to the proposal from Shah et al. for exception diagnosis (Shah et al., 2004), and to the work of Weyns et al. (2004) on active perception and the notion of focus.

Figure 2 Agent architecture with exception management mechanisms

[Figure labels: Agent with Perception (Sensor; Evaluation with Relevance Filter and Expectation Filter), Agent Internal Mechanisms (Base Mechanisms, Exception Mechanisms, Execution Cycle), Internal Representation (Read Access, R/W Access) and Actuation (Generation of Relevance and Expectation criteria; Actuator), interacting with the Application Environment]

The architecture reproduces in white the necessary components of the base architecture, and introduces in grey the optional components that manage agent exceptions. This distinction separates the application logic (white) from the exception handling logic (grey), so that designers can decide whether the exception management part is necessary for their target application.

The perception component of the present architecture encompasses the sensor and evaluation subcomponents. Sensors receive events from the environment and pass them to the evaluation. The evaluation is first responsible for distinguishing relevant from irrelevant events. Relevance is an essential feature to filter out unnecessary information (potential exceptions that do not concern the agent in any way) and to avoid the high-bandwidth issue (Kushmerick, 1997). The evaluation then identifies unexpected events according to the criteria of the agent. In planning agents, for example, a criterion can be that unexpected events are those that are not ‘scheduled’ in the plan. Such a criterion is independent of the architecture, and it is up to the designer to choose one during development, depending on the target application. The evaluation uses the internal representation as the reference by which the agent can distinguish events. The internal representation refers to any representation inside the agent. For example, the BDI and KGP architectures have a set of knowledge bases (Rao and Georgeff, 1995; Kakas et al., 2004), whereas ant-like agents have simpler, fixed internal data structures. It should be noted that the evaluation component does not have to separate the relevance and expectation mechanisms, as they can conveniently be merged. The aim of the distinction is to show their difference of purpose, and to link our work to existing endeavours (Weyns et al., 2004).
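As one possible choice of criteria, the planning example above (an event is unexpected when it is not scheduled in the plan) can be sketched in Python. The `Event` format, the broadcast marker `"*"` and the `PlanBasedEvaluation` class are hypothetical; the relevance and expectation filters are kept as separate methods to mirror the architecture, although, as noted, they could be merged.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    receiver: str      # agent name, or "*" for a broadcast (assumption)
    kind: str          # e.g. "offer", "delivery"


class PlanBasedEvaluation:
    """Evaluation with relevance and expectation filters for a planning agent."""

    def __init__(self, name: str, plan: List[str]) -> None:
        self.name = name
        self.plan = plan     # internal representation: event kinds scheduled, in order
        self.cursor = 0      # index of the next scheduled event

    def relevant(self, e: Event) -> bool:
        return e.receiver in (self.name, "*")

    def expected(self, e: Event) -> bool:
        # Criterion: expected only if it is the next event scheduled in the plan.
        return self.cursor < len(self.plan) and self.plan[self.cursor] == e.kind

    def classify(self, e: Event) -> str:
        if not self.relevant(e):
            return "irrelevant"
        if self.expected(e):
            self.cursor += 1     # side effect: the plan advances on expected events
            return "expected"
        return "exception"


ev = PlanBasedEvaluation("C", ["offer", "delivery"])
for e in [Event("R", "offer"), Event("C", "delivery"), Event("C", "offer")]:
    print(e, ev.classify(e))
```

Swapping in another criterion only means replacing `expected`; the surrounding architecture is unchanged, which is the independence claimed above.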

Events classified by the evaluation are forwarded to the agent internal mechanisms component, where they are processed. The two functional layers presented in this component separate the exception mechanisms from the ‘base’ that aims at the application logic of the agent. The exception layer introduces appropriate mechanisms to deal with exceptions, and its output should be directed to the base layer, so that the agent can continue its activity despite the occurrence of an unexpected event. The component as a whole manipulates the internal representation and its output is an action passed to the actuation for producing an effect in the environment.

The actuation component then prepares the relevance and expectation criteria of the agent for its future interactions. The agent can adapt these criteria dynamically to fit its context in the system, and it is up to the designer to decide how the criteria evolve. Finally, the actuator subcomponent commits the agent action in the environment.

3.2.1 Fundamental algorithms associated with the architecture

The algorithm related to the base agent architecture needs to be adapted (see Figure 1). Algorithm 1 presents an updated version.

Algorithm 1 Algorithm of an exception-ready agent

Input: environment
while true do
    percept ← sense(environment);
    internal_percept ← evaluate(percept);
    if internal_percept is flagged as exception then
        internal_percept ← handling(internal_percept);
    end
    action ← process(internal_percept);
    generate(action);
    act(action, environment);
end

On the basis of the new architecture, agents sense the environment and evaluate the percepts. If the evaluation flags the percept as an exception, the handling instruction of the algorithm invokes the exception management component of the architecture. Handling can modify the agent state and knowledge, but execution eventually returns to processing the internal percept (which may have been modified by the handling) to produce an action in the environment, if required. Before committing the action in the environment, the agent generates the next relevance and expectation criteria that matter to it. The generation does not modify the action decided by the agent, but exploits it to determine the appropriate criteria. The algorithm shows that no exception ‘enters’ the agent: the agent decides by itself, after the evaluation, when it needs to initiate exception handling. This is the architectural mechanism that ensures agent encapsulation and autonomous decisions.
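Algorithm 1 can be transcribed into a Python skeleton to show where the designer's code plugs in. This is a sketch under assumptions: the environment is modelled as a plain queue of events, the loop is bounded by `max_steps` instead of running forever, irrelevant percepts (for which Algorithm 2 produces no internal percept) are simply skipped, and `DemoAgent` is a toy subclass.

```python
from typing import Any, List, Optional, Tuple

# Internal percept as in Algorithm 2: (percept, None) when the percept is
# expected, (percept, signature) when the agent flags it as an exception.
InternalPercept = Tuple[Any, Optional[Any]]


class ExceptionReadyAgent:
    """Skeleton of Algorithm 1; subclasses supply application-dependent parts."""

    def sense(self, environment: List[Any]) -> Any:
        # Assumption: the environment is a simple queue of pending events.
        return environment.pop(0) if environment else None

    def evaluate(self, percept: Any) -> Optional[InternalPercept]:
        raise NotImplementedError   # Algorithm 2; None means "irrelevant"

    def handling(self, internal: InternalPercept) -> InternalPercept:
        raise NotImplementedError   # designer-provided handlers

    def process(self, internal: InternalPercept) -> Any:
        raise NotImplementedError   # application logic (base mechanisms)

    def generate(self, action: Any) -> None:
        pass                        # update criteria (Algorithm 3)

    def act(self, action: Any, environment: List[Any]) -> None:
        pass                        # commit the action

    def run(self, environment: List[Any], max_steps: int) -> None:
        # Bounded instead of "while true" so that the sketch terminates.
        for _ in range(max_steps):
            percept = self.sense(environment)
            internal = self.evaluate(percept)
            if internal is None:            # irrelevant percept: skip (assumption)
                continue
            if internal[1] is not None:     # flagged as exception by the agent itself
                internal = self.handling(internal)
            action = self.process(internal)
            self.generate(action)           # criteria derived from, not altering, the action
            self.act(action, environment)


class DemoAgent(ExceptionReadyAgent):
    """Toy agent: percepts starting with 'ok' are expected, others are exceptions."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, str]] = []

    def evaluate(self, percept):
        if percept is None:
            return None
        return (percept, None if percept.startswith("ok") else percept)

    def handling(self, internal):
        self.log.append(("handled", internal[0]))
        return (internal[0], None)          # handling may rewrite the internal percept

    def process(self, internal):
        return "ack:" + internal[0]

    def act(self, action, environment):
        self.log.append(("act", action))


agent = DemoAgent()
agent.run(["ok-1", "boom"], max_steps=3)
print(agent.log)
```

The only branch to `handling` depends on the agent's own `evaluate`, so the loop never receives an exception from outside.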

The evaluation and generation functionalities are the central mechanisms of our exception management approach. The architecture must provide adequate components to implement their algorithms, and we detail their essential stages below. Algorithm 2 describes how the evaluation component evaluates an incoming percept.

Algorithm 2 Algorithm for the evaluation component

Input: percept, relevanceKB, expectationKB
Output: internal_percept
if relevanceKB contains relevance(percept) then
    signature ← expectation(percept);
    if expectationKB contains signature then
        internal_percept ← (percept, nil);
    else
        internal_percept ← (percept, signature);
    end
end

The purpose of the algorithm is to discard irrelevant percepts, to distinguish expected from unexpected percepts, and to provide the inner components of the agent with useful information in case of unexpected percepts (i.e., an ‘internal percept’). The relevance function is an application-dependent function that extracts salient information from a percept for comparison with the knowledge base ‘relevanceKB’. If the extracted information is relevant (e.g., an appropriate receiver field), the percept is kept for further processing. Otherwise, the percept is considered irrelevant and the algorithm exits. The application-dependent expectation function further extracts information from relevant percepts, and the resulting signature is matched against ‘expectationKB’. If the agent was waiting for the percept (e.g., an inform message), the base contains information about it (e.g., a filter on inform messages), and the output of the algorithm is the tuple (percept, nil), where nil indicates that no unexpected situation has occurred. When an exception is detected, the output is a similar tuple in which nil is replaced by the signature. Other components of the agent use this signature to determine how to handle the situation (i.e., the handler selection scheme).

Algorithm 3 presents an algorithm to generate the relevance and expectation criteria required by the relevance and expectation functions above. This algorithm focuses on agents executing interaction protocols, and it does not apply as-is to planning agents or ant-like agents, for instance.

At this stage, the agent has decided on an action in a given protocol (here, sending a message). If this action initiates the protocol, the knowledge bases of the agent are updated. RelevanceKB receives a formatted criterion stating that ‘any message related to this protocol is relevant’, which can be implemented as a simple predicate rule. ExpectationKB receives the next message after the action in the protocol; that is, the agent expects to receive from others one of the messages that immediately follow its action in the protocol. Similarly, the second case updates the knowledge of the agent when the action terminates the protocol. The case of cancellation introduced with the example in Section 1.2 is also managed here. Finally, all other actions replace the current expectation of the agent by the next one with respect to the protocol.
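The evaluation step (Algorithm 2) and the protocol-restricted generation step described above can be sketched together in Python. Several details are assumptions: messages are `(protocol_type, protocol_id, performative)` tuples, relevanceKB holds named predicate rules, `format(protocol)` becomes a rule matching the protocol id, the simplified Contract Net table `CNET` defines `next(action, protocol)`, an action initiates a protocol when its id is not yet tracked and terminates it when no message can follow, and `current(action, protocol)` is read as every pending expectation of that protocol.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Message = Tuple[str, str, str]     # (protocol_type, protocol_id, performative), assumed
Signature = Tuple[str, str]        # (protocol_id, performative)

# Assumed, simplified Contract Net: performative -> possible next performatives.
CNET: Dict[str, List[str]] = {
    "cfp": ["propose", "refuse"], "propose": ["accept", "reject"],
    "accept": ["result", "failure"], "refuse": [], "reject": [],
    "result": [], "failure": [], "cancel": [],
}


class Criteria:
    def __init__(self) -> None:
        self.relevance: Dict[str, Callable[[Message], bool]] = {}  # relevanceKB (rules)
        self.expectation: Set[Signature] = set()                  # expectationKB


def evaluate(msg: Message, kb: Criteria) -> Optional[Tuple[Message, Optional[Signature]]]:
    """Algorithm 2: None if irrelevant, (msg, None) if expected, else (msg, signature)."""
    if not any(rule(msg) for rule in kb.relevance.values()):
        return None                              # irrelevant: the algorithm exits
    signature = (msg[1], msg[2])                 # expectation(percept)
    return (msg, None) if signature in kb.expectation else (msg, signature)


def generate(action: Message, kb: Criteria,
             protocol: Dict[str, List[str]] = CNET) -> None:
    """Algorithm 3 for interaction protocols (simplified reading)."""
    _, pid, perf = action
    following = {(pid, p) for p in protocol[perf]}            # next(action, protocol)
    current = {s for s in kb.expectation if s[0] == pid}      # pending expectations
    if pid not in kb.relevance:                               # initiates a new protocol
        kb.relevance[pid] = lambda m, pid=pid: m[1] == pid    # format(protocol)
        kb.expectation |= following
    elif not following:                                       # terminates the protocol
        del kb.relevance[pid]
        kb.expectation -= current
    else:                                                     # replace current by next
        kb.expectation -= current
        kb.expectation |= following


kb = Criteria()
kb.relevance["base:cnet"] = lambda m: m[0] == "cnet"   # C takes CNet opportunities
propose = ("cnet", "P", "propose")
print(evaluate(propose, kb))          # relevant but unexpected: flagged with its signature
generate(("cnet", "P", "accept"), kb)
print(sorted(kb.expectation))         # [('P', 'failure'), ('P', 'result')]
```

The usage lines follow the consumer C of the case study: an unsolicited ‘propose’ for a fresh protocol P is relevant but unexpected, and accepting it makes C expect a ‘result’ or ‘failure’.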

Algorithm 3 Algorithm for the generation component in the restricted case of interaction protocols

Input: action, protocol
Input/Output: relevanceKB, expectationKB
if action initiates a new protocol then
    relevanceKB ← relevanceKB ∪ {format(protocol)};
    expectationKB ← expectationKB ∪ {next(action, protocol)};
else if action terminates a protocol then
    relevanceKB ← relevanceKB \ {format(protocol)};
    expectationKB ← expectationKB \ {format(protocol)};
else
    expectationKB ← expectationKB \ {current(action, protocol)};
    expectationKB ← expectationKB ∪ {next(action, protocol)};
end

3.2.2 Basic environment for event notification

The case study introduced agent exceptions such as ‘agent death’, together with the difficulty of defining how agents detect an actual death. The environment can be conveniently tailored to support this kind of exception without infringing the agent paradigm, notably agent autonomy and local sensing capabilities. The essential functionality of this support is to notify agents about changes in the environment, such as the termination of heartbeats from an agent, or similar approaches. Owing to the definition of agent exception, the application environment architecture does not explicitly refer to exception mechanisms. The role of the environment is nevertheless essential for notifying events properly with respect to agent autonomy. Figure 3 focuses on such an application environment. This architecture is inspired by the reference model proposed in the agent community (Weyns et al., 2007).

Figure 3 Environment-based architecture for exception management

[Figure labels: Application Environment with Collecting, Internal Mechanisms (Messaging, Observing, Acting), State (e.g. topology), Effecting, Interface to Deployment Context and Interface For Distributed Environments; Agents interact with it through Collecting and Effecting]

Actions from agents are gathered by the collecting function and passed to the internal mechanisms for processing. Agents can perform three types of actions, namely exchanging messages with other agents (messaging), observing resources and agents (observing), or acting on resources and agents (acting). Depending on the action type, the corresponding component is activated. All three action components can read the state of the environment, but only acting can modify it, for example to update the state of the agent in the environment (e.g., a heartbeat action). The state of the environment encompasses a variety of data useful to application agents, such as the topology of the system network (if applicable) or the state of pheromones in the case of stigmergic MAS. Once the actions are processed, the resulting events are sent to agents by the effecting function.
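The flow of Figure 3, together with heartbeat-based death notification, can be sketched as a small Python class. The action format, the component method names, the logical clock and the `timeout` rule (an agent silent for more than `timeout` ticks is declared dead) are illustrative assumptions. The environment only reports the death; deciding whether it is exceptional remains up to each agent, in line with the second architectural constraint.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Action = Tuple[str, str, Any]      # (kind, agent, payload), an assumed format
Event = Tuple[str, str, Any]


class ApplicationEnvironment:
    """Toy environment: collecting -> messaging/observing/acting -> effecting."""

    def __init__(self, timeout: int = 3) -> None:
        self.state: Dict[str, Any] = {"last_heartbeat": {}}   # e.g. part of the topology
        self.timeout = timeout                                # allowed silence, in ticks
        self.now = 0                                          # logical time kept by the environment
        self.outbox: Dict[str, List[Event]] = defaultdict(list)

    def collect(self, actions: List[Action]) -> None:
        components = {"message": self._messaging, "observe": self._observing,
                      "act": self._acting}
        for kind, agent, payload in actions:
            components[kind](agent, payload)

    def _messaging(self, sender: str, payload: Tuple[str, Any]) -> None:
        receiver, content = payload
        self.outbox[receiver].append(("message", sender, content))

    def _observing(self, agent: str, key: str) -> None:
        # Read-only access: the agent receives a copy of part of the state.
        self.outbox[agent].append(("observation", key, dict(self.state.get(key, {}))))

    def _acting(self, agent: str, payload: Any) -> None:
        # Only acting modifies the state, e.g. a heartbeat action.
        if payload == "heartbeat":
            self.state["last_heartbeat"][agent] = self.now

    def tick(self) -> None:
        """Advance time; notify surviving agents of agents whose heartbeats stopped."""
        self.now += 1
        beats = self.state["last_heartbeat"]
        dead = [a for a, t in beats.items() if self.now - t > self.timeout]
        for a in dead:
            del beats[a]
        for a in dead:
            for other in beats:
                self.outbox[other].append(("death", a, None))

    def effect(self, agent: str) -> List[Event]:
        """Effecting: hand the pending events to the agent and clear them."""
        events, self.outbox[agent] = self.outbox[agent], []
        return events


env = ApplicationEnvironment(timeout=2)
env.collect([("act", "C", "heartbeat"), ("act", "R", "heartbeat")])
env.collect([("message", "R", ("C", "propose"))])
print(env.effect("C"))            # [('message', 'R', 'propose')]
for _ in range(3):                # C keeps beating, R has gone silent
    env.tick()
    env.collect([("act", "C", "heartbeat")])
print(env.effect("C"))            # [('death', 'R', None)]
```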

The interface to the deployment context allows agents to exploit, and receive notifications from, external resources such as databases and web services. The internal mechanisms use the deployment context to access some facilities. For example, the handler repository proposed by Klein et al. (2003) could advantageously be integrated as a knowledge base that agents access in search of specific handlers. All internal mechanisms can send events (e.g., a database query) to the deployment context, which replies to agents through the messaging mechanism. In addition, the interface to the deployment context maintains data in the environment state, for example concerning time (essential for the timeouts introduced earlier).

The last component of the architecture is the interface for distributed environments, which serves to synchronise the global state of the environment, pass events, and support agent mobility when the environment is split into ‘pieces’ over a distributed infrastructure, e.g., as in Okuyama et al. (2004).

4 Application to the case study and limitations of the architecture

This section revisits the case study presented in Section 1.2 and describes how the architecture supports the occurrence of exceptions. This description then serves to present the limitations of the architecture and to compare it with other approaches.

4.1 Application to the case study

Section 1.2 refers to two types of agent exceptions in the case study, namely unexpected messages in a CNet and agent death. Let us consider a consumer agent C and a retailer agent R. C normally initiates CNet, but it also considers relevant any CNet-related message from other agents, i.e., it is willing to take opportunities. Let us assume that R sends a ‘propose’ message to C relative to a fresh protocol P. According to the new architecture and Algorithm 1, C senses the message and evaluates it. At this stage, the knowledge base of C does not contain any reference to P. Following Algorithm 2, the message is relevant, as it is a CNet offer directed to C. It is not expected, however, so the internal percept of C is flagged as an exception. This mechanism has allowed C to detect an exception autonomously, without any external intervention. C can now decide how to handle the situation, which is independent of architectural matters. For example, the handling can be to accept the opportunity and engage in P. C would then generate an action and criteria, according to Algorithms 1 and 3, respectively. The action would be sending an ‘accept’ message, and Algorithm 3 would generate expectations for a ‘result’ or ‘failure’ message from R.


Let us assume that R dies at this point in the protocol, and that the environment is configured to receive heartbeats from agents in order to detect deaths in its topology. When R disappears, the environment notifies agents, including C, of the death. Since C only expects a ‘result’ or ‘failure’ message from R, Algorithm 2 flags the perception of this notification as an exception, so that C can initiate appropriate handling for the termination of the protocol.

4.2 Limitations of the architecture

The case study illustrates how the architectural choices support designers in building exception-ready agents. Agents compliant with the architecture are able to distinguish normal from exceptional situations depending on their knowledge. In other words, the designer is not left with an ad hoc approach. The architecture implements the generic exception management mechanisms, and the designer needs to provide application-dependent data and functionalities:

1 application logic of the agent (base mechanisms)

2 base knowledge of the agent (internal representation)

3 handlers and extra knowledge (e.g., address of a handler repository).

Items (1) and (2) refer to the code that needs to be provided for any given application. Item (3) refers to the exception-related part of the code, which is reduced to handlers and the necessary extra knowledge. The architecture manages the semantics of switching the agent execution to handlers when necessary. However, the architecture cannot help in defining handlers, which depend on design choices and on the application. In the case study, we chose agents that execute protocols, so that handlers can be designed as ‘protocols’, with the essential difference that a handler can lead to internal actions, whereas a protocol is for interactions. An agent design based on planning or other models requires specific care about the form of handlers. We think, however, that the architecture itself is an appropriate foundation for other types of agents.

Concerning openness, the architecture relies on message-passing techniques, so that interoperability concerns are mostly confined to the choice of message formats. However, the architecture is not as flexible as recent work on open distributed systems, in which components and connectors can be dynamically assembled according to configurations. The architectural elements in the agent and the environment have limited flexibility, as there are few possible configurations of their execution cycles, e.g., the perception-processing-action cycle of agents. The only flexibility in these elements lies in their detailed descriptions, for which different architectures can be adopted and dynamically changed. Most of the flexibility in our architecture lies in the connectors between agents and environment, which are dynamically instantiated as message-passing connectors when agents arrive and depart. We expect this flexibility to be sufficient in practice, as in service-oriented architectures. The detailed design of a MAS can introduce more flexibility ‘inside’ agents and environment, but this is beyond the scope of this article.

4.3 Comparison to related work and perspectives

The recurrent issue in related work is the lack of respect for agent encapsulation and autonomy. Most approaches in distributed systems and MAS allow agents to be controlled from ‘outside’, except for the work on commitment protocols and stigmergy (Sections 2.2.3 and 2.2.4). Our architecture is designed to comply with the definition of agent exception, so that agent encapsulation and autonomy are enforced. The work on commitment protocols also complies with the definition, but its present achievements depend on the commitment protocol model, so that the question of engineering this model might be specific to that approach. Our architecture targets a more general concept of exception that can be implemented for various approaches. It is applicable to FIPA-compliant interaction protocols, as illustrated in the case study, and we think it is also applicable to commitment protocols and other agent models, such as those based on planning. As for stigmergic systems, our architecture offers a more flexible approach to robustness. This extended flexibility may not be necessary in current stigmergic systems, but we believe it can serve future cognitive stigmergic models (Van Dyke Parunak, 2005).

In relation to the work on sentinels and mobile agents, our architecture takes another approach to exception management. The sentinel-based approach separates MAS into two distinct sub-MAS: application agents fulfil the functional requirements of the system, and sentinel agents implement the quality requirements. The sentinel ‘layer’ therefore has a global point of view on the system, so that it can deal with both ‘local’ and ‘global’ exceptions. Local exceptions pertain to unexpected events local to an agent, e.g., inconsistent knowledge. Global exceptions pertain to unexpected events that impact several agents, such as a circular wait. Mobile agent approaches are often based on mechanisms developed for distributed computing, e.g., the coordinated atomic action model (Xu et al., 1998), which specifically targets global exceptions. Our architecture does not focus on the difference between local and global exceptions, as its primary purpose is to ensure agent encapsulation and autonomy. It is, however, more appropriate for local exceptions, since its mechanisms endow agents with capabilities relative to their accessible context, which is local to the agent. Global exceptions can nevertheless be managed by the architecture. For instance, a circular wait could be handled by providing appropriate handlers to agents implementing the architecture, but experience shows that this would be significantly less efficient than an approach such as sentinels or coordinated atomic actions. Consequently, a comprehensive model of agent exception should rely on our architecture to preserve encapsulation and autonomy, and also on robust external support for global exceptions. Such a dual approach makes sense for MAS in which agents are part of a hierarchy: ‘manager agents’ can detect exceptions and propose remedies for some global issues.
The problem with this view is that external support must serve agents appropriately without assuming that external control is possible. This problem is difficult, as global approaches must be robust to non-collaborative agents. Typically, an autonomous agent can decide that an event presented as an exception is not an issue, and thus refuse to participate in any collaborative handling. Alternatives must then be sought dynamically.
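The circular-wait example can make the local/global distinction and the autonomy constraint concrete. The following is a minimal sketch, not part of the architecture described in this article: it assumes a hypothetical sentinel-like observer holding a global wait-for map (each agent name mapped to the agents it waits on), which is exactly the view that an individual agent lacks. Detection uses a standard depth-first search for a cycle. The remedy step reflects the point above: the observer can only propose that an agent in the cycle release its wait, each agent's own policy decides, and the observer moves on to the next member of the cycle when one refuses. The function names `find_wait_cycle` and `propose_remedy` and the agent names are illustrative assumptions.

```python
from typing import Callable, Dict, List, Optional


def find_wait_cycle(waits_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one circular-wait cycle in a global wait-for graph, or None.

    `waits_for` maps each agent name to the agents it is waiting on. Building
    this map requires a global view (e.g. a sentinel), not an agent's local one.
    """
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current path / finished
    colour: Dict[str, int] = {}
    path: List[str] = []                  # agents on the current DFS path

    def visit(agent: str) -> Optional[List[str]]:
        colour[agent] = GREY
        path.append(agent)
        for other in waits_for.get(agent, []):
            state = colour.get(other, WHITE)
            if state == GREY:             # back edge: `other` is on the path
                return path[path.index(other):]
            if state == WHITE:
                cycle = visit(other)
                if cycle is not None:
                    return cycle
        path.pop()
        colour[agent] = BLACK
        return None

    for agent in waits_for:
        if colour.get(agent, WHITE) == WHITE:
            cycle = visit(agent)
            if cycle is not None:
                return cycle
    return None


def propose_remedy(cycle: List[str],
                   policies: Dict[str, Callable[[List[str]], bool]]) -> Optional[str]:
    """Ask the agents in `cycle`, in turn, to release their wait.

    The observer cannot impose the remedy: each agent's own policy decides.
    Agents without a policy are treated as non-collaborative. Returns the first
    agent that agrees, or None if every agent refuses.
    """
    for agent in cycle:
        policy = policies.get(agent)
        if policy is not None and policy(cycle):
            return agent
    return None


waits = {"buyer": ["seller"], "seller": ["broker"], "broker": ["buyer"], "auditor": []}
cycle = find_wait_cycle(waits)               # ['buyer', 'seller', 'broker']
helper = propose_remedy(cycle, {"buyer": lambda c: False,   # buyer refuses
                                "seller": lambda c: True})  # seller agrees
```

Here the remedy order simply follows the cycle; choosing which agent to ask first, or what to do when every agent refuses, is left open in this sketch.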

5 Conclusions

Exception management is not a novel research question, but the present state of the art in MAS demonstrates the need for further research. The properties of openness, heterogeneity and agent autonomy require an appropriate model of exception in MAS. We proposed a MAS architecture that fulfils the requirements of the notion of ‘agent exception’ defined in this article. Our approach guarantees loose coupling of agents in the system and respects their autonomy. The architecture supports designers in building exception-ready agents and avoids ad hoc implementations of exception detection mechanisms. The designer only needs to provide handlers, i.e., specifications of how to process an exceptional situation, in addition to the usual application logic and knowledge of the agent. The main limitations of the architecture are that handlers depend on design choices (agents using protocols, plans, etc.) and that further work is required to improve the management of global exceptions.
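The division of labour described in these conclusions, generic detection by the architecture and application-dependent handlers supplied by the designer, can be sketched for an agent executing a FIPA interaction protocol. This is a minimal illustration under stated assumptions, not the authors' implementation or API: the class `ExceptionReadyAgent`, the method `register_handler`, and the detection rule "a message whose performative the current protocol state does not expect is an exception" are hypothetical. The protocol is a simplified initiator side of the Contract Net (Smith, 1980; FIPA, 2006) with a single participant, where the initiator accepts the first proposal immediately.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class Message:
    performative: str           # FIPA ACL performative, e.g. "propose"
    sender: str


@dataclass
class ProtocolState:
    expected: Set[str]          # performatives the protocol allows in this state
    next_state: Dict[str, str]  # performative -> name of the next state


Handler = Callable[["ExceptionReadyAgent", Message], None]


class ExceptionReadyAgent:
    """Hypothetical agent shell: detection is generic, handlers are per application."""

    def __init__(self, states: Dict[str, ProtocolState], initial: str) -> None:
        self.states = states
        self.current = initial
        self.handlers: Dict[Tuple[str, str], Handler] = {}
        self.log: List[str] = []

    def register_handler(self, state: str, performative: str, handler: Handler) -> None:
        """Designer-supplied, application-dependent handler for one situation."""
        self.handlers[(state, performative)] = handler

    def receive(self, msg: Message) -> None:
        state = self.states[self.current]
        if msg.performative in state.expected:
            # Normal case: the message fits the protocol, so advance.
            self.current = state.next_state[msg.performative]
            return
        # The message is unexpected in the current state: an exception is detected
        # generically and the matching handler, if any, is invoked automatically.
        handler: Optional[Handler] = self.handlers.get((self.current, msg.performative))
        if handler is None:
            # No handler: the agent autonomously treats the event as a non-issue.
            self.log.append(f"ignored {msg.performative} from {msg.sender}")
        else:
            handler(self, msg)


# Simplified Contract Net, initiator side, one participant; the single
# proposal is accepted immediately, so "propose" leads to awaiting_result.
states = {
    "awaiting_bids": ProtocolState({"propose", "refuse"},
                                   {"propose": "awaiting_result", "refuse": "done"}),
    "awaiting_result": ProtocolState({"inform", "failure"},
                                     {"inform": "done", "failure": "done"}),
    "done": ProtocolState(set(), {}),
}
agent = ExceptionReadyAgent(states, "awaiting_bids")
agent.register_handler("awaiting_bids", "inform",
                       lambda a, m: a.log.append(f"premature inform from {m.sender}: re-send cfp"))
agent.receive(Message("inform", "seller-1"))   # unexpected: handler runs
agent.receive(Message("propose", "seller-1"))  # expected: protocol advances
agent.receive(Message("cancel", "seller-1"))   # unexpected, no handler: ignored
```

Keying handlers by (protocol state, performative) is only one possible design choice; plan-based agents would need a different key, which mirrors the limitation that handlers depend on design choices.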

Our first objective in the study of agent exceptions is to provide an API for the architecture. In particular, we expect that integrating several agent architectures into this model will extend their capabilities to exception handling. Our present code implements the architecture and algorithms of this article for agents executing protocols in market simulations, and we plan to extract the API from these experiments. Among the open issues to address in future work, we focus on handling strategies built on top of our architecture, especially for concurrent and concerted exceptions involving a priori non-collaborative agents. In the case of MAS, we are seeking an appropriate approach that allows agents to detect situations where temporary collaboration is preferable, so that they have an incentive to form coalitions with other parties. Our current candidate approach for defining such an incentive relies on a trust model. A second challenging issue pertains to unexpected exceptions. The work of Klein et al. and Mallya allows agents to deal with unexpected exceptions to some extent, and we aim to pursue these research directions within the framework of our architecture.

Acknowledgements

This research is partially supported by the French Ministry of Foreign Affairs under the reference BFE/2006-484446G, Lavoisier grant programme. The authors thank José Ghislain Quenum for discussing this work in detail.

References

Arief, B., Iliasov, A. and Romanovsky, A. (2006) ‘On using the CAMA framework for developing open mobile fault-tolerant agent systems’, Software Engineering for Large-Scale Multi-Agent Systems, pp.29–35.

Borgida, A. and Murata, T. (1999) ‘Tolerating exceptions in workflows: a unified framework for data and processes’, WACC, ACM, pp.59–68.

Brambilla, M., Comai, S. and Tziviskou, C. (2006) ‘Exception management within web applications implementing business processes’, in C. Dony, J.L. Knudsen, A.B. Romanovsky and A. Tripathi (Eds.) Advanced Topics in Exception-Handling Techniques, Lecture Notes in Computer Science, Springer, Vol. 4119, pp.101–120.

Brooks, R. (1991) ‘Intelligence without representation’, Artificial Intelligence, Vol. 47, Nos. 1–3, pp.139–159.

Brueckner, S. (2000) ‘Return from the ant – synthetic ecosystems for manufacturing control’, PhD thesis, Humboldt University, Berlin, Germany.

Damasceno, K., Cacho, N., Garcia, A., Romanovsky, A. and Lucena, C. (2006) ‘Context-aware exception handling in mobile agent systems: the MoCA case’, Software Engineering for Large-Scale Multi-Agent Systems, pp.37–43.

Dellarocas, C. (1998) ‘Toward exception handling infrastructures in component-based software’, Proceedings of the International Workshop on Component-based Software Engineering.

Dony, C., Urtado, C. and Vauttier, S. (2006) ‘Exception handling and asynchronous active objects: issues and proposal’, in C. Dony, J.L. Knudsen, A.B. Romanovsky and A. Tripathi (Eds.) Advanced Topics in Exception-Handling Techniques, Lecture Notes in Computer Science, Springer, Vol. 4119, pp.81–100.

Foundation for Intelligent Physical Agents (FIPA) (2006) ‘Contract net interaction protocol specification’, http://www.fipa.org/specs/fipa00029/SC00029H.html, Document number SC00029H (accessed October 2006).

Goodenough, J.B. (1975a) ‘Exception-handling design issues’, SIGPLAN Not., Vol. 10, No. 7, pp.41–45.

Goodenough, J.B. (1975b) ‘Exception handling: issues and a proposed notation’, Commun. ACM, Vol. 18, No. 12, pp.683–696.

Goodenough, J.B. (1975c) ‘Structured exception handling’, POPL ’75: Proceedings of the 2nd ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, New York, NY: ACM Press, pp.204–224.

Gosling, J., Joy, B., Steele, G. and Bracha, G. (Eds.) (2005) The Java™ Language Specification, 3rd ed., Addison-Wesley.

Hägg, S. (1996) ‘A sentinel approach to fault handling in multi-agent systems’, in C. Zhang and D. Lukose (Eds.) Distributed AI, Lecture Notes in Computer Science, Springer, Vol. 1286, pp.181–195.

Iliasov, A. and Romanovsky, A. (2006) ‘Structured coordination spaces for fault tolerant mobile agents’, in C. Dony, J.L. Knudsen, A.B. Romanovsky and A. Tripathi (Eds.) Advanced Topics in Exception-Handling Techniques, Lecture Notes in Computer Science, Springer, Vol. 4119, pp.181–199.

Issarny, V. (2001) ‘Concurrent exception handling’, in A.B. Romanovsky, C. Dony, J.L. Knudsen and A. Tripathi (Eds.) Advances in Exception-Handling Techniques, Lecture Notes in Computer Science, Springer, Vol. 2022, pp.111–127.

Issarny, V. and Banâtre, J-P. (2001) ‘Architecture-based exception handling’, Hawaii International Conference on System Sciences.

Jade Agent Framework (2005) ‘Jade agent framework’, http://jade.tilab.com/.

Jason Agent Platform Project (2006) ‘Jason agent platform project’, http://jason.sourceforge.net/ (accessed August 2006).

Kakas, A.C., Mancarella, P., Sadri, F., Stathis, K. and Toni, F. (2004) ‘The KGP model of agency’, in R. López de Mántaras and L. Saitta (Eds.) ECAI, IOS Press, pp.33–37.

Klein, M. and Dellarocas, C. (1999) ‘Exception handling in agent systems’, Agents, pp.62–68.

Klein, M., Rodríguez-Aguilar, J.A. and Dellarocas, C. (2003) ‘Using domain-independent exception handling services to enable robust open multi-agent systems: the case of agent death’, Autonomous Agents and Multi-Agent Systems, Vol. 7, Nos. 1–2, pp.179–189.

Kushmerick, N. (1997) ‘Software agents and their bodies’, Minds and Machines, Vol. 7, No. 2, pp.227–247.

Mallya, A.U. (2005) ‘Modeling and enacting business processes via commitment protocols among agents’, PhD thesis, North Carolina State University, Raleigh, USA.

Mallya, A.U. and Singh, M.P. (2005) ‘Modeling exceptions via commitment protocols’, Autonomous Agents and Multi-Agent Systems, New York, NY: ACM Press, pp.122–129.

Meyer, B. (1988) ‘Disciplined exceptions’, Technical report, Interactive Software Engineering.

Miller, R. and Tripathi, A. (2004) ‘The guardian model and primitives for exception handling in distributed systems’, IEEE Trans. Software Eng., Vol. 30, No. 12, pp.1008–1022.

Okuyama, F.Y., Bordini, R.H. and da Rocha Costa, A.C. (2004) ‘ELMS: an environment description language for multi-agent simulation’, in D. Weyns, H. Van Dyke Parunak and F. Michel (Eds.) E4MAS, Lecture Notes in Computer Science, Springer, Vol. 3374, pp.91–108.

Parnas, D.L. and Würges, H. (1976) ‘Response to undesired events in software systems’, International Conference on Software Engineering, pp.437–446.

Platon, E., Sabouret, N. and Honiden, S. (2006a) ‘A definition of exceptions in agent-oriented computing’, in G. O’Hare, M. O’Grady, O. Dikenelli and A. Ricci (Eds.) Engineering Societies in the Agent World ’06.

Platon, E., Sabouret, N. and Honiden, S. (2006b) ‘Challenges for exception handling in multi-agent systems’, Software Engineering for Large-Scale Multi-Agent Systems, pp.45–50.

Platon, E., Sabouret, N. and Honiden, S. (2006c) ‘Environment support for tag interactions’, Environment for Multi-Agent Systems.

Rao, A.S. and Georgeff, M.P. (1995) ‘BDI agents: from theory to practice’, Technical report, Australian Artificial Intelligence Institute.

Romanovsky, A.B. (2001) ‘Exception handling in component-based system development’, COMPSAC, IEEE Computer Society, pp.580–598.

Russell, S. and Norvig, P. (2003) Artificial Intelligence: A Modern Approach, Prentice Hall.

Shah, N., Chao, K-M., Godwin, N. and James, A.E. (2005) ‘Exception diagnosis in open multi-agent systems’, in A. Skowron, J-P.A. Barthès, L.C. Jain, R. Sun, P. Morizet-Mahoudeaux, J. Liu and N. Zhong (Eds.) IAT, IEEE Computer Society, pp.483–486.

Shah, N., Chao, K-M., Godwin, N., James, A.E. and Tasi, C-F. (2006) ‘An empirical evaluation of a sentinel-based approach to exception diagnosis in multi-agent systems’, AINA, IEEE Computer Society, Vol. 1, pp.379–386.

Shah, N., Chao, K-M., Godwin, N., Younas, M. and Laing, C. (2004) ‘Exception diagnosis in agent-based grid computing’, International Conference on Systems, Man and Cybernetics, IEEE, pp.3213–3219.

Smith, R.G. (1980) ‘The contract net protocol: high-level communication and control in a distributed problem solver’, IEEE Trans. Computers, Vol. 29, No. 12, pp.1104–1113.

Souchon, F., Dony, C., Urtado, C. and Vauttier, S. (2003) ‘Improving exception handling in multi-agent systems’, in C.J.P. de Lucena, A.F. Garcia, A.B. Romanovsky, J. Castro and P.S.C. Alencar (Eds.) SELMAS, Lecture Notes in Computer Science, Springer, Vol. 2940, pp.167–188.

Stroustrup, B. (2000) The C++ Programming Language, Addison-Wesley.

Szyperski, C. (2002) Component Software, Addison-Wesley.

Tanenbaum, A.S. (1994) Distributed Operating Systems, Prentice Hall.

Tripathi, A. and Miller, R. (2001) ‘Exception handling in agent-oriented systems’, in A.B. Romanovsky, C. Dony, J.L. Knudsen and A. Tripathi (Eds.) Advances in Exception-Handling Techniques, Lecture Notes in Computer Science, Springer, Vol. 2022, pp.128–146.

Van Dyke Parunak, H. (1997) ‘“Go to the ant”: engineering principles from natural multi-agent systems’, Annals of Operations Research, Vol. 75, pp.69–101.

Van Dyke Parunak, H. (2005) ‘A survey of environments and mechanisms for human–human stigmergy’, in D. Weyns, H. Van Dyke Parunak and F. Michel (Eds.) E4MAS, Lecture Notes in Computer Science, Springer, Vol. 3830, pp.163–186.

Weyns, D. (2006) ‘An architecture-centric approach for software engineering with situated multiagent systems’, PhD thesis, Katholieke Universiteit Leuven, Leuven, Belgium, October.

Weyns, D., Omicini, A. and Odell, J. (2007) ‘Environment as a first-class abstraction in multiagent systems’, Autonomous Agents and Multi-Agent Systems, February, Vol. 14, No. 1, pp.5–30.

Weyns, D., Steegmans, E. and Holvoet, T. (2004) ‘Towards active perception in situated multi-agent systems’, Special Issue of the Journal on Applied Artificial Intelligence, Vol. 18, pp.8–9.

Xing, J. and Singh, M.P. (2003) ‘Engineering commitment-based multiagent systems: a temporal logic approach’, AAMAS, ACM, pp.891–898.

Xing, J., Wan, F., Rustogi, S.K. and Singh, M.P. (2001a) ‘A commitment-based approach for business process interoperation’, IEICE Transactions on Information and Systems, Vol. E84-D, No. 10, pp.1324–1332.

Xing, J., Wan, F., Rustogi, S.K. and Singh, M.P. (2001b) ‘Commitment-based interoperation for e-commerce’, ISADS, pp.161–168.

Xu, J., Romanovsky, A.B. and Randell, B. (1998) ‘Coordinated exception handling in distributed object systems: from model to system implementation’, ICDCS, pp.12–21.