background analysis and design of abos, an agent-based ...830085/fulltext01.pdf · background...

Background Analysis and Designof ABOS, an Agent-Based

Operating System

Author: Mikael Svahnberg, [email protected]

A Software Engineering Master’s Thesis at the University of Karlskrona/Ronneby, 1998.

AuthorMikael Svahnberg, [email protected]

Advisor sHåkan Grahn, [email protected]

Paul Davidsson, [email protected]

A Software Engineering Master’ s Thesis at the Univer sity of Karlskr ona/Ronneb y, 1998.

AbstractModern operating systems should beextensible andflexible. This means that the operating systemshould be able to accept new behaviour and change existing behaviour without too much troubleand that it should ideally also be able to do this without any, or very little, downtime. Furthermore,during the past years the importance of the network has increased drastically, creating a demand foroperating systems to function in a distributed environment. To achieve this flexibility and distribut-edness, I have designed and evaluated ABOS, an Agent-Based Operating System. ABOS usesagents to solve all the tasks of the operating system kernel, thus moving away from traditionalmonolithic kernel structures. Early results show that I have gained in flexibility and modularity, cre-ating a fault-tolerant distributed operating system that can adapt and be adapted to almost any situa-tion with negligible decrease in performance. Within ABOS some tasks has been designed further,and there exists a demonstration of how the agent-based filesystem might work.

Keywords: Operating systems, Practical application of multi-agent systems

1

Table of Contents

1 Introduction1.1 Overview.............................................................................. 31.2 Current Praxis...................................................................... 4

UNIX.................................................................................. 4Windows NT...................................................................... 4Amoeba............................................................................. 5Mach ................................................................................. 6CORBA ............................................................................. 6

1.3 Ongoing Research............................................................... 7Spring................................................................................ 7Aegis ................................................................................. 8

1.4 Summary ............................................................................. 8

2 Agents2.1 Introduction ........................................................................ 102.2 Agent characteristics ......................................................... 112.3 Multiagent systems............................................................ 132.4 Agent enablers................................................................... 14

KQML .............................................................................. 14April ................................................................................. 15

2.5 Operating system agents................................................... 15TACOMA......................................................................... 16MOTIS and X.400 ........................................................... 16UNIX Daemons ............................................................... 17

2.6 Motivation to use agents.................................................... 18

3 Operating System Tasks3.1 Introduction ........................................................................ 193.2 Process Management........................................................ 193.3 Memory Management........................................................ 203.4 I/O Management ................................................................ 213.5 File system Management................................................... 223.6 Communication support..................................................... 223.7 Synchronization ................................................................. 233.8 Security.............................................................................. 243.9 Summary ........................................................................... 25

2

4 Top-Level design of ABOS4.1 Introduction ........................................................................ 264.2 General layout.................................................................... 264.3 Core ................................................................................... 274.4 Kernel ................................................................................ 28

Division between kernel and core ................................... 294.5 Services............................................................................. 294.6 User Applications............................................................... 304.7 Summary ........................................................................... 30

Assignment of functionality ............................................. 31Service layer ................................................................... 32The chicken and the egg problem................................... 32

4.8 Achieved goals................................................................... 33

5 Examples5.1 Introduction ........................................................................ 345.2 Agent File system .............................................................. 35

Active Documents ........................................................... 355.3 Resource Allocation........................................................... 375.4 Synchronization ................................................................. 395.5 Summary ........................................................................... 41

6 Evaluation6.1 Introduction ........................................................................ 436.2 Process management........................................................ 436.3 Memory Management........................................................ 446.4 I/O Management ................................................................ 456.5 File system Management................................................... 466.6 Communication support..................................................... 486.7 Synchronization ................................................................. 496.8 Security.............................................................................. 496.9 Performance ...................................................................... 506.10 Other................................................................................ 516.11 Summary ......................................................................... 52

7 Conclusions7.1 Summary ........................................................................... 547.2 Conclusions ....................................................................... 547.3 Future work........................................................................ 55

3

1. IntroductionThis chapter will give an overview and background for the thesis, as well as presenting a survey onwhat the current status is in the common operating systems. Futhermore the trends regarding oper-ating systems in the research community will be investigated.

1.1 OverviewRequirements on modern operating systems are that they should beextensible andflexible [6]. This means that the operating system should be able to accept newbehav-iour and change existing behaviour without too much trouble. In truly distributed envi-ronments they should ideally also be able to do this without any, or very little,downtime. This is due to the trouble of for example migrating processes running onthe machine in question.During the past years the importance of the network hasincreased drastically. Thus, operating systems also need to be more or less distributed.

As always, performance is also an issue. The traditional way to solve performanceproblems is to embed everything into one single, monolithic kernel, while still using amicrokernel design and object-oriented programming to achieve flexibility. Thisenables modules to communicate via shared memory and simple procedure calls,instead of using the overhead of inter process communication (IPC). Obviously, thisembedding is in conflict with the requirement of flexibility. When every new functionhas to be imported or implemented in the kernel space, one cannot easily add new ser-vices on-the-fly. Indeed, it is also the exact opposite of what one wishes to achieve byusing a microkernel design. The cause of this embedding is, as stated, the overhead ofusing IPC calls. However, recent studies [1] have shown that the overhead for IPCcalls has decreased enough to make this a practical approach.

The benefits one can gain by using IPC to communicate within the kernel are substan-tial. Modules can be exchanged duringrun-time and new ones can be added withouthaving to reboot or recompile, which sometimes is the case. New strategies andresources can be changed and added as easy as they should be. By using IPC, one canalso easily and transparently run some of the services on another machine, thus creat-ing a truly distributed system.

A concept that is beginning to see the light and that can address the questions above isagents. Agents are small software components with certain qualities, described later inthis paper. Interesting about agents is that they are autonomous, uses IPC and canadapt over time. This makes them highly suitable for employing within an operatingsystem kernel. My idea is to explore whether agents can be used to facilitate the tasksin an operating system. I aim to present a model where agents reside within the kernelof an operating system.Furthermore, I will present a design solution for some com-mon tasks that a distributed operating system performs.

This thesis will present a brief overview of the most commonly known operating sys-tems, followed by a presentation of some of the research performed in operating sys-tems.Succeeding this there is a presentation of agents and agent technologies toclarify what an agent is. There is also a survey showing what attempts has been madeto apply agentsin operating systems.To get some understanding of what the require-ments and problems are in a modern operating system the section following this sur-vey will deal withwhat an operating system should perform. Once this is clear we canmove on to the presentation ofABOS,an agent-basedoperating system. Some exam-ples of more top-level tasks are also given, after which it is time to evaluate the agentoperating system. This is done and topped of with some concluding comments.

4

1.2 Current PraxisIn this section, I will present some of the more well-known operating systems. Someof them are maybe not so well known, but they are representatives of a group of oper-ating systems. No one can write anything about objects and agents without mentioningCORBA, so I will present this as well.

UNIXUNIX is the gathering name of a number of operating systems that share some equali-ties. UNIX systems usually rely on one of the two “grandfathers of all UNIX’s”, Berk-ley UNIX and System V. This may sound as if there exists many differentimplementations of UNIX, but these are basically just differentvariations of the over-all architecture.

UNIX is structured as a number of layers, with the hardware at the bottom. On top ofthis is the parts that run in kernel mode like process management, memory manage-ment, I/O and the filesystem. The bottom level in the user mode consists of standardlibraries like open, close, read, write, fork etc. The top-layer consists of the programs.These can be anything from shells, editors and compilers to databases and advancedapplications.

Distribution is not a part of the original UNIX, even though networking was an earlypart of it. The distribution is, in fact, limited to the ability to remotely execute pro-grams and a filesystem that allows transparent mapping of remote disks. The kernellayout in itself varies much from differentversions but generally one cannot exchangekernel parts without restarting the system. From this I draw the conclusion that thevarious tasks of the kernel is highly intertwined.Some parts, like time synchronizationand, in fact, most of theotherfunctionality runs in user mode.

Windo ws NTWindows NT1 is a single-user multi-tasking operating system developed byMicrosoft. This is one of the few operating systems around that was developed com-mercially from the beginning. The basic assumption for the design of Windows NT isthat people will be using the same machine, probably residing on their desktop, butthey might want to run more than one application simultaneously. [8]

The structure of the kernel is rather complex, and is easier explained by an image (Fig-ure 1, as presented by Stallings [8]) than by text. Unlike UNIX the system is not trulylayered and unlike Aegis, Amoeba, and Mach(see below)much of the control coderesides in kernel mode. The Hardware Abstraction Layer makes the implementation ofWindows NT platform independent and the subsystem architecture makes it clientindependent.

The layout of the NT Executive enables a great flexibility in the choice of for exampleprocess manager. Much of the kernel resides in DLLs, dynamically linked libraries,which makes it easy to exchangeparts like the security manager. Novell utilizes this intheir directory service for Windows NT[12]. It is a pity that Microsoft has not takencare of this flexibility and support the exchange of kernel modules by providing openinterfaces and explicit support for third-party developers.

1Windows NT is a registered trademark of Microsoft Corp.

5

Windows NT has many shortcomings like not being able to distribute load over thenetwork, not even in the crude way UNIX does. Having most of the system running inkernel mode makes it very hard to adapt new services without rebooting. As men-tioned, the subsystem layout ensures that the kernel is essentially ‘client independent’since both OS/2 and POSIX applications can run on top of the NT kernel. This is usedin some UNIX ports for Windows NT[13].

AmoebaAmoeba [7] is a distributed operating system, developed as a research project byAndrew S. Tanenbaumet al. at the Vrije Universiteit, Amsterdam. The goals ofAmoeba is to be a transparent distributed operating system. This means that the usershould not be aware that he is using more than one computer while working. In con-trast to many other approaches the load is balanced across the entire system withoutpreference to a specific machine. Two assumptions are made about the hardware, thatboth limits and aids Amoeba: Systems will have a very large number of CPUs andeach CPU will have tens of megabytes of memory. Based on these assumptionsAmoeba is designed to use a pool of processors, accessed via X-terminals.

The basic layout of Amoeba is that of a microkernel that manages processes, threads,memory, communication, and low-level I/O. Clients and servers rest on top of thismicrokernel. As with the case of Aegis everything fromfile systems to resource allo-cation are managed through servers that run in user space. The notion of objects iscentral to Amoeba. Everything is encapsulated into an object and managed by a server.Objects are accessed using a cryptographically protected capability, a handle to theobject. All access to servers is done by using a system-global port number. This portnumber ispresent in all object capabilities so that one can easily find the responsibleserver process. The port number is not machine specific so if a server migrates toanother CPU or system the port number will remain the same.

Amoeba is based on many outdated assumptions, that have sprung from the correctassumptions about many CPUs and lots of memory. The processor pools in the style ofan ancient mainframe that Tanenbaum assumes will, I argue, not happen again. Myprediction is that the ongoing trend with more CPUs sharing the same memory spacein servers as well as in some clients will continue and even though Sun, Oracle, and

Hardware

Hardware Abstraction Layer (HAL)

System Services

Kernel

I/O Manager

ObjectManager

SecurityReferenceMonitor

ProcessManager

LocalProcedure

Call Facility

VirtualMemoryManager

File SystemsCache Manager

Device DriversNetwork Drivers

NT Executive

Kernel Mode

User Mode

Win32Subsystem

POSIXSubsystem

OS/2SubsystemSecurity

Subsystem

Log-onProcess

ClientApplications

ProtectedSubsystems(Servers)

Applications

Figure 1. Windo ws NT Structure

6

IBM are making a great show about their thin clients, so called NCs[14], no one istalking about removing the processing power from the clients. The alternative, to havededicated workstations that are part of the processor pool, is visible in today’s cluster-ing techniques[15], but only organizations with high requirements on servers will useclustering. In the case of memory, Tanenbaumet al were correct in that machines willhave tens of megabytes of memory, but their assumption that this excess would betheirs for the taking to achieve performance is proven wrong by the applications oftoday that sometimes requires even larger amounts of memory for themselves.

Be thatas it may, Amoeba shows many bright ideas as well. To start with, it is distrib-uted in the deepest sense of the word with process migration and communication prim-itives supported by the kernel. As with Aegis, most of the system runs in user mode,thus enabling fast and easy switching of functionality.

MachMach [9] [10] is perhaps the most well-known microkernel operating system. It wasinitially developed as a research project at the university of Rochester and continuedby Carnegie Mellon University. The goal of Mach is to demonstrate that you canstructure operating systems in a modular way.

Mach has a microkernel that performs process and thread management, memory man-agement, communication, and I/O services. On top of this microkernel rests, in userspace, a software emulator layer. In this layer other operating systems are emulatedlike UNIX, Windows NT, or even another Mach kernel. File systems and other handystuff are managed by these emulators. Mach supports communication between pro-cesses at kernel level using the concept of ports. Ports reside in the kernel and acts asmessage queues.

The microkernel architecture in Mach is limited to relying on an operating systemsemulator running on top of the kernel. The advantages one can gain by exploiting themicrokernel is thus left to the capriciousness of the emulator. The fact that IPC is sup-ported in the kernel only makes the situation worse since it inhibits the emulators fromimplementing smarter or more suitable primitives. As it is, all they can do is to act asan interface to the Mach ports. The reason for this emulator strategy is, I think, to sup-port as wide a range of software as possible.

I have found no evidence of the kernel itself being modularized, and distributionseemsnot to have been the main issue when developing Mach although it does supportinter-machine communication using the Network Message Server [7].

CORBACORBA [11], or Common Object Request Broker Architecture, is a standard definedby OMG, the Object Management Group. CORBA is not in itself an operating system,but it has become somewhat of a standard if one wishes to communicate with objectsover a network.

The CORBA design is usually described as if the CORBA ORB, object request bro-ker, replaces the network. Clients are hooked on to the ORB and perform theirrequests to an object, running somewhere else. The actual communication work is usu-ally done by stubs on the client side and skeletons on the server side that in turn callsthe actual object. The stub can be replaced by a Dynamic Invocation Interface, allow-ing you to access the object anyway using an Interface Definition Language definedby OMG.

7

Many of the research operating systems[17] [18] use stubs to access the services inthe object-oriented kernel. CORBA has the possibility to use stubs, but can also callobjects dynamically. This introduces a great flexibility. As we will see with agents fur-ther on a language for invoking objects must be defined for agents as well. In the caseof CORBA they use their own IDL. The trouble is that it is only possible to send dataas if it were a normal function call. You must also be aware of which functions thatyou can call, whereas agents usually have a common language in which they can talkabout what to talk about. The calls are also synchronous, meaning that the caller is sus-pended until an answer is returned. Furthermore a function call reeks somewhat of cli-ent-server which, as we will see, is not compatible with the agent paradigm.

CORBA is usually implemented on top of theoperating system, as a part of the appli-cations that wishes to use it and as a separate ORB. Having to rely on kernel-levelprimitives to actually send data inflicts performance. As yet, I have not seen any oper-ating systems that support CORBA in the kernel.

1.3 Ongoing Resear chHaving presented some of the popular operating systems and object enabling tech-niques I will now move on to the research community with focus on what is being saidand done in the field of operating systems. In particular, two research projects shinemore than the others. These are Spring [5] from Sun and Aegis [6] from M.I.T.

Generally, the research in operating systems tends to be much aimed at object-orienta-tion. There is much discussion regarding persistent objects, migrating objects and pro-cesses [3], and even omnipresent objects [4]. One of the main issues seems to be toinvent a global naming scheme to be able to find objects over the network. Persistentobjects discussions seems to be aimed at how to store objects without loosing toomuch in performance. All in all, there is much emphasis onhowand not so much onwhat to do. The research on objects are on varying levels of detail. Some suggest ker-nel support [2] for persistent objects and others deal with more abstract reasoning onhow to name objects in a global network[19].

SpringSpring is an experimental distributed environment developed at Sun Microsystems. Itconsists of a distributed operating system and a support framework for distributedapplications. Briefly one can describe the structure of Spring as being a set of inter-faces rather than actual implementations. This is a decision taken to support the cre-ation of many differing implementations of a given interface. Spring has a specifiedinterface language[17] in which all interfaces should be written. From these defini-tions stubs are generated for the programming language of choice.

Spring has a somewhat different object model compared to other approaches. A stan-dard approach is to have an object reference locally that points to a remote object.Spring distributes the actual object, making sure that it can only exist in one place at atime. Before passing the object on one can make a copy of it. These two objects willthen point to the same underlying state.

The kernel in Spring is object-oriented, and all access to objects in the kernel is donethrough stubs defined in the IDL. At a quick glance this looks as an agent systemexcept that the IDL and calling mechanisms is as limited as that of CORBA in that itworks like ordinary function calls.

Nevertheless, Spring is a serious effort to make a flexible and modularized system butI have not found evidence that the objects in the kernel can be replaced during run-

8

time, which is a requirement to make the system run-time flexible. Spring is stuck inthe old client-server paradigm and assumes that a server should only reply with ananswer and not ‘talk back’ to the client.

AegisAegis is developed at the M.I.T laboratory for Computer Science. The goal of theproject is to prove the point that operating systems need not act as an abstraction layerfor applications. The underlying idea behind Aegis is that the only task that the operat-ing system should perform is to allocate resources securely to client applications. Thisis because the authorsfeel that operating systems have become increasingly large intheir pursuit to support every available device in every possible desired way. As theyput it: “Applications know better than operating systems what the goal of theirresource management decisions should be.” [6]. This implies that as long as securitycan be guaranteed everything from caching to actualfile system layout should behanded over to the client applications. These can then, by using library operating sys-tems, access the hardware.

The functions that the exokernel performs can be summarized as to securelyexposehardware, expose allocation, expose names andexpose revocation. This means that itshould expose as much of the hardware, its DMA capabilities, etc. as possible and thatall allocation should be done in consensus with the library operating system. Physicalnames should be exported, and the kernel should visibly reclaim resources accordingto a defined protocol.

The layout of Aegis is more of a standard operating system than a distributed ditto. Noemphasis is put on distributing objects or similar topics of interest in distributed oper-ating systems. Flexibility is achieved by minimizing the kernel functions.

Aegis may not have much to do with the construction of a distributed operating systembut it’s design proves a very important point; user-mode software can equally wellmanage resources as the kernel. This theory is used in many other more interestingoperating systems like Amoeba and Mach. It also implies that the kernel can be mini-mal and still provide the functionalityrequired.

1.4 Summar yA general trend in the newer operating systems seems to be to move as much as possi-ble away from the kernel. It is interesting to notice that the operating system thatincreases most in usage[16], Windows NT, goes in the exact opposite direction byadding more and more functionality to the kernel.

Distribution seems inmost cases (except Amoeba) to be to show the user that thereexists more than one computer in the network, sometimes giving the user a possibilityto execute software remotely.File systems are centralized and shared across the net-work. Inter-process communication is many times cumbersome for the applicationsand the possibility for it is often added to the system almost as an afterthought.

Exokernel

Hardware

Application

library OS

Application

library OS

Application

library OS

Figure 2. Aegis design

9

The only development company that has truly spent time trying to make a modular-ized, distributed, and flexible system is Sun with their Spring system. However, theobject invocation in Spring suffers from the same disadvantages as CORBA in thatyou must know the names of the functions to call.

All of these problems can be addressed if one uses the abilities in a microkernel archi-tecture to load and lose modules atrun-time. Another aid needed is to give ample sup-port for inter-process communication and inter-machine communication.

File systems, finally, is traditionally viewed as part of the kernel and in some extent ithas to be so that at least the filesystem server can be loaded but there is no need toview the entire filesystem, its ability to be shared, or its caching mechanisms as partof the kernel. By utilizing the microkernel design once again, file systems of arbitrarycomplexity may be built.

10

2. AgentsIn the previous section I presented some of the operating systems around, and the research beingperformed in the area of operating systems. The main focus of this survey was what could beregarded as agents or modular design in operating systems of today. Agents were mentioned in theprevious section, but to get a better understanding of the concept they require more explanation.This chapter tries to present such an explanation. The explanation is followed by some examples ofagents and their uses in connection to operating systems.

2.1 Intr oductionThe software industry was revolutionized when object orientation was introduced inthe early1970:s [20]. The critics claimed thatobject orientation did not contributeanything new; that in theory you could not do things that you had not managed before,and that the Church-Turing thesis about computability [21] still held. Indeed, youcould not do anything new with object orientation; it did not make non-computableproblems computable all of a sudden. What object orientation did, however, was tobring structure into chaos. Development times decreased, the amount of re-useincreased, and the training time of new staff decreased since they did not really have tounderstand the entire system before starting to work on some part of it [22].

Agents provide the same paradigm change as object orientation once did, but this timeon the level of processes. Just as object orientation introduced a new level of abstrac-tion, agents introduce another, even higher abstraction level. Instead of having oneenormous piece of software that encapsulates all functionality, agentsare usuallysmall modules with very well-specified purpose that interacts with other agents toachieve some specific goal contributing to the desired total functionality. Naturally,the benefits from such a design is best found in distributed computing. By having pro-grams that are socially aware of other programs on the network, you can distributetasks and control over the network. It is also very easy to exchange parts of the func-tionality by adding new agents that either influence the existing agents to achieve thegoal differently orto replace some of the earlier agents altogether.

When designing an agent solution, there is a major difference compared to an objectoriented design. In object orientation you design according to the objects that consti-tute the actors in the solution. Agent designing is task-oriented. Instead of looking atwhat actors are involved in an operation, you look at what tasks and subtasks the oper-ation consists of. Agents are then created to solve these tasks. Whereas object orienta-tion does not say anything about the actual tasks but rather expects the objects to solvethem implicitly, agent orientation concentrates on the tasks at hand and creates actorsthat can help in solving these tasks. An agent has thus a clearly outspoken goal with itsexistence, and this goal is part of the design, decreasing the need for a design rationale.Despite these differences, it is not my intention to say that object orientation cannotcoexist with agent orientation. An agent can very well be composed of a set of objects,just as an object can be a complete agent and an agent can be considered to be a com-plex object.

Agents bring enormous advantages in conceptual grasp of what is done, but you canstill not do more than before. Still the Church-Turing thesis hold. Calculations of algo-rithmic complexity are still valid, and NP-complete problems still take exponentialtime to solve. However, you can distribute the algorithm to more than one computer,thus engaging the CPU speed in all of the machines. This is also something that hasbeen done for a long time, but with severe implications on the readability of the codeand lots of tricky protocols and communications overhead.

11

Despite all of the benefits one can gain by using agents, there is no clear definition ofwhat an agent is. People tend to claim that they have developed an agent solutionbecause it is state-of-the-art technology, even when it is nothing more than an ordinaryprogram. To create an understanding of my view of what an agent is, a definition ofwhat I see as an agent will be presented. This definition is vital in order to understandthe intentions of the rest of this paper.

I will explain what is needed to be called an agent according to the general opinion,and top this off with my view of what an agent should do. I will explain how agentswork together, what communication protocols they use, and what a multiagent systemis. An investigation of what has been done about agents in relation to operating sys-tems is rounded off by motivating why agents are so suitable for this particular use.

2.2 Agent c haracteristicsSo far, the agent community experiences troubles in determining what should becalled an agent, and there are as many definitions as people trying to define agents.This diversity of agent definitions is both an advantage and a disadvantage. Theadvantage is that one can easily fit in much new and interesting behaviour into theagent paradigm, whereas the disadvantage is that you can never tell that “this is an100% agent”. The only thing one can say is that a program is more or less of an agent,with respect of how many agent qualities it possess and to what extent. A definition ofthe agent qualities that I like, because it is fairly simple, is summarized by Hyacinth S.Nwana [24].

According to Nwana’s definition, an agent can bestatic or mobile. A static agent doesall its work from one single computer and has no wish nor mechanisms to move toanother host, asthe mobile agents does. An agent can furthermore be classified asdeliberative or reactive, where a deliberative agent has the ability to reason and canplan its own actions to coordinate and negotiate with other agents. A deliberativeagent is also known aspro-active, because it can initiate a chain of events withoutexternal influence. A reactive agent responds to changes in its surrounding environ-ment according to a preset pattern. A reactive agent is idle until it receives some sig-nal, which it then processes.

In addition to this, an agent must possess at least two of the qualitiesautonomous,learning or cooperative. Autonomous means that the agents should be able to operatewithout guidance from human operators. This also implies that they should be pro-active, to take the initiative in causing changes rather than simply reacting in responseto the environment.Cooperation refers to the social ability to interact with otheragents and humans using some communication language. To achieve the sense ofsmartness, anagent’s actions must be based on previous events, so it needs to be ableto learn what has happened, and use this learning when making decisions.

Figure 3 illustrates these qualities. If an agent is autonomous and cooperative it is saidto be collaborative. If it is cooperative and learning it is a collaborative learning agent.An autonomous learning agent is called an interface agent because they are generallyinterfacing towards the user, serving him in some way. Typical examples of interfaceagents are personal assistants and customizable search agents.

The areas in Figure 3 are not definitive, it is just a statement of what areas a givenagent focus more on. The cooperativeness can, for example, have many degrees, rang-ing from simple signals to complex messages in a communications protocol likeKQML [30]. I am also not entirely certain about the classification of autonomouslearning agents as interface agents. I can think of many examples where an autono-

12

mous learning agent does not interface to anything, but rather observes the environ-ment to adapt its own behaviour.

Pro-activeness is a concept stressed by the agent community. The fact that a programcan take the initiative to start a chain of events is something that appeal to the agentresearchers as it emphasizes the fact that agents are autonomous. I claim that no pro-gram can ever be pro-active. No matter how you see it, it is some external event thatstarts the chain. The first event is always that the program is started. After this, manyother events of varying kinds can be the cause for the agent to react, but even if theagent uses polling to check the state of something, it is bound to the timer events. If itdecides to enter a tight loop instead of falling asleep between the pollings, there is stillthe start-up-event. Consequently all actions, even if they are preceded by two weeks ofcomputations, are the result of an external event, which means that the agent is reac-tive and not pro-active. All of this makes it hard to say that an agent is pro-active. Thecommon definition of pro-activeness is that the agent reacts to some event and fore-sees future results from this event and takes action against or towards these results. Apro-active agent is thus a reactive agent that predicts future events and how pro-activeit is just a definition of how far into the future the agent can predict. If one views theagent as a physical entity, for example some embedded equipment, the reasoningabout timer events above is falsified, because in such cases the timer event is gener-ated by the agent itself. My reasoning still holds though, because the agent is stillstarted at some point and this is of course also an event.

Agents originated from the artificial intelligence (AI) community. AI researchersbeing as they are tend to want agents to have all sorts of AI concepts like planners, the-orem provers and such[34]. This strategy makes agents more complex and actuallymore unfit for some tasks, being forced to carry around extra functionality. I claim thatan agent need not be smart as defined by the AI community to be called intelligent.Indeed,Ekdahl argues against the use of terminology like intelligent, reflective, oreven learning software[35] since this is inherently not possible within a formal sys-tem, and a computercan not perform anything which is not describable in a formalsystem. My view is more relaxed. I have no objections to using the word smart orintelligent about an agent, but my interpretation is that the agent is programmed in asmart way or that it seems more intelligent in its behaviour than other software. As forlearning, I agree with Ekdahl that the only learning that a computer program can do isto obtain information already present. This is also enough for the tasks that I see fit foragent use, and is certainly enough to achieve a higher service degree by not repeatingmistakes andkeeping faulty assumptions.

There is a common assumption that to be called agent, the software needs to bemobile. Following the above definition, this is obviously not needed. The communica-tion primitives ensures that you can reach any agent, no matter where it resides.

Cooperative Learning

Autonomous

Collaborative Learning Agents

Interface Agents

Smart Agents

Collaborative Agents

Figure 3. Agent Typology

13

Mobility does, however, give certain advantages in some situations, but requires theagents to be small enough to be able to migrate. A task that require much communica-tion can be done by sending an agent to negotiate for you, thus reducing the networkload. Another example is when your machine can be disconnected from the network.In such cases, you can send off the agent onto the network first, then disconnect yourcomputer and connect it again somewhere else. The agent willfind you in your newlocation and “dock” with your computer again, providing you with information youhave requested[33].

Agent systems can use one of two approaches; a federated approach or a fully autono-mous [27]. In a federated approach agents are not truly autonomous and relies on sup-port from a facilitator, an agent host. All communication goes through this facilitatorand it provides the agents with their view of the world. In the case where agents arefully automated they keep track of their reality themselves and containsupport forcommunication. This can be viewed as a standard process running on an operatingsystem, whereas the federated model looks more like threads within a process. If oneshould use agents in an operating system the federated approach is naturally not feasi-ble unless one sees the operating system kernel as the facilitator. To have one and onlyone process running which manages the rest of the operating system as internal threadsfalls on its own ridiculousness, even if it can be argued that this is exactly the situationwe are faced with in many operating systems today.

The ‘three-quality-model’, described above, says nothing about the size of an agentand it may be hard to actually say anything about this. I claim that an agent shouldhave a limited but very well-defined behaviour. This enables modular programming inmultiagent systems according to the same principles that underlies Sun’s JavaBeans[25] and Microsoft’s Active-X technology [26]. More complex behaviour is generatedby combining a set of agents that each perform a single task very well. Keeping theagents small also facilitates the ability to migrate, which is often desired even if notalways needed.

2.3 Multia gent systemsAgents rarely come alone. Usually agents are part of some larger system of interactingpieces of software. These systems are usually referred to as multiagent systems. In amultiagent system each of the agents take care of a small and well-defined task. A sin-gle agent need not be intelligent, but the total of the system achieves more than simplebehaviors.

One important thing to note is that the agents are peers. No agent is worth more thanothers, and they have equal control. Even if some agent decides to solve a task bybreaking it down into subtasks and let other agents help in solving the subtasks, theaides are not considered to be serving the one that had the initial task. The agent sys-tem solves the task as a whole with no definition of what is client and what is server.Hence, there are no rules as to who is allowed to send a message to whom, whereas ina client-server solution the server is usually silent until the client initiates a communi-cation link.

As with everything else, there is no single way to design these multiagent systems.One way is to view the agent collaboration as that of a blackboard pattern [28], wherethe agents communicatevia a server process, the blackboard.The blackboard is not aserver in the true sense, but rather a coordinator. It can contain the data that the otheragents put there and also decide who should get a chance to work with the data next.The blackboard may in turn be distributed using a hierarchical tree of connected black-boards. In a blackboard system all agents have equal status and control is handed out

14

by the actual blackboard to the most suitable. I claim that such a design violates theidea of autonomicity since the agents do not get the right to decide for themselves whois next to run. The idea to have certain agents that contain the shared state of manyagents is however sound. These could act as repositories for information needed bymore than one agent. They can also volunteer information to agents that they thinkshould do something about the information. Co-ordinating agents like the blackboardagents are called intermediate agents.

Blackboard collaboration is just one of many ways to solve a task. One of the simplestcollaboration forms to grasp is that of subtasking. In subtasking one agent commitsitself to a certain task and divides it into smaller tasks that is handed out to otheragents, that in turn can subtask this even further [29].

Even if the agents are peers, they can either be cloned from the same agent, or they canbe specialized agents. Both of these strategies have their advantages. If the agents arecloned, they need to contain everything needed for the entire model, even if they donot exercise all of these skills at one given time. A multiagent system would, insteadof having severalagents each implementing the complete behavior, have agents thateach comprise a part of the solution. These partial solutions, albeit cloned, cantogether see and solve the task at hand. The other form is to have a heterogeneousenvironment in which no agent has to be like any other. As with the homogenous envi-ronment these agents together see the full task and theentire environment, collaborat-ing to solve their tasks. The difference here is that certain agents may have special andunique skills, whereas in the homogenous environment all agents possess the exactsame skills.

2.4 Agent enab lersHaving talked about agents in general and about collaboration models it may be inter-esting to look at some of the techniques used for achieving this modularity andsocialabilities. The agent community has more or less agreed upon a standard communica-tions language, KQML [30], that supports the easy communication required. I willpresent this language in brief, showing the main concepts of it. I will also present aprogramminglanguage designed to facilitate the building of agents and agent systems.

KQMLTo enable agents to talk to each other, you need a standard way of communicating.The ideal language is called ACL, Agent Communications Language. This ACL iscommonly used by theoreticians when they need a language, but there exists no imple-mentations or even definitions of ACL. A number of approaches has been made to thisACL, of which KQML is one of the more well-known, together with FIPA’s proposal[38]. A KQML message consists of a performative,and a set of arguments that stateswho is the caller, the recipient, a message id and a reply-message id, which ontologyto use and finally the actual command or contents. The language of the contents isstated as another argument and can be in for example Prolog or KIF[37]. The perfor-mative states the type of the message, for example ‘inform’ or ‘request’, and the con-tents contains the data of the inform message or the wishes for a request.

With this structure a KQML message looks very much like an e-mail message. Ibelieve that KQML is a sound approximation to the platonic ACL. The fact that themessage body can be stated in any language allows you to select the language mostsuitable for the task at hand. On the other hand, you risk getting agents that are incapa-ble of communicating with each other because they do not understand the language ofthe other. As long as you understand the language used, you can understand messages

15

of arbitrary complexity in almost any discipline because the ontology used is stated inthe message. I am not saying that this is simple, only that it is possible. Implementingsupport for ontologies is quite complex and may not be needed for simple homoge-nous systems.

AprilApril [31] is a process oriented programming language developed at Imperial College,London by Keith Clarket al. It contains ample support for creating processes andcommunicate between the processes. Each process will have a uniqueidentifier or ahandle associated with it. Theidentifier caneitherbe provided by the programmerorautomatically generated by the system. An API is also provided that allows easy inte-gration of April processes with external programmes. Making use of the API allowsthese external programs to exchange April messages with ordinary April processes.April gives the programmers an easy way to create agents. Communication primitivesand process management is handled by the April abstract machine, and is not depen-dent on any underlying operating system. If the April abstract machine were to rundirectly on top of the hardware, one could expect gains in for example performance.

As a programming language April provides just the primitives one would want inorder to develop agents. It is not visible to the user where process management is doneand the communication primitives are easy to use. Finding another process or agent isa relatively easy task for the programmer. Code can be sent to execute in anotheragent, enabling agents to learn new behaviour over time. All of this makes April anexample of a successful platform for implementing agents. As for communication,April does not lock you into a certain language, on the contrary it can be used toimplement such languages. For example, you can use April to easily provide an imple-mentation of the KQML communication language.

2.5 Operating system a gentsWith this definition of agents and how they interact in mind, it would be interesting tosee what has already been done with agents in operating systems.However, it is veryhard to find evidence of anyone attempting to support agents in operating systems.Looking through both what is done in the operating systems community and the agentcommunity, few papers deals with agents and operating systems together. TheTACOMA project [23] has developed an agent platform for mobile agents. Anotherexample is the mail transfer system MOTIS [32], which is also an ISO-standard. Themost well-known example of agents in operating systems are probably the variousdaemons in UNIX systems. They might not be known as agents but, as I will show,they certainly are.

As seen earlier, many attempts has been made to define naming schemes for objects,defining how to make objects persistent and so on. These objects usually lack theautonomous quality or some of the other qualities needed to be called an agent. Also,as said before, there is more emphasis onhow to support objects rather thanwhat theobjects should do. In the agent community, the situation is quite the opposite. Hereone assumes underlying communication primitives and support for agents exists andworks and instead defines the behaviors of the agents. Nevertheless, the discussionsare usually held on a high level, not giving any concrete examples of what needs to bedone and not one even hints at using agents to perform operating system tasks.

16

TACOMAThe TACOMA project is an exception to this ‘what’-wave visible in agent researchcommunities, since it is concerned with operating system support for mobile agents.The TACOMA project uses mobile agents and the metaphor that they should do aspeople; visit a place, use a service and move on. Granted, if the service to use involveslots of communication or transactions requiring security, the mobility of the softwareis a desired feature. For mosttasks, though, the network can equally well relay themessages instead of the agent binary code. The TACOMA model also employs amodel with folders, or briefcases, to carry an agent’s state around the network. Sur-prisingly, these folders are not part of the agent, but must be transferred to whereverthe agent wishes to use them.

TACOMA is implemented in Tcl/Tk[36] on top of a UNIX kernel. This means thatyou need a Tcl interpreter on each host in the network, acting as a sort of facilitator forthe agents. With the TACOMA agent support, a mechanism for exchanging electroniccash has been implemented to demonstrate its abilities. Agent-based schemes forscheduling and fault-tolerance has also been implemented using TACOMA.

The main disadvantage of the TACOMA project is that they focus on mobile agents,claiming that this is the only type of agents that the operating system needs to support.They have built an agent platform in Tcl/Tk on top of a UNIX kernel, but still claimthat they have implemented operating system support for agents. Their main focus isthat of electronic commerce, adding solutions to the operating system tasks that areneeded for this such as fault-tolerance and security.

MOTIS and X.400MOTIS is similar to the SMTP[32] in the TCP/IP suite. MOTIS, which is more acomplete message transfer system than a mere transfer protocol, is based on X.400[32], defined by ITU-T. As with many ISO-standards regarding network protocols andsystems it is very well defined but rarely used. Most systems today use the standardsdefined in the TCP/IP protocol suite.

The X.400 system can be characterized as having a User Agent (UA) that communi-cates to a local Message Transfer Agent (MTA) using a Submit/Deliver Service Ele-ment (SDSE). The MTA communicates to the recipient’s MTA using a MessageTransfer Service Element (MTSE). The recipient’s User Agent then acquires the mes-sage from this MTA and presents it to the user. All in all, four protocols are used; onebetween User Agents, one between the SDSEs and two between the MTSEs. MOTISlooks very similar to the X.400 model but one MTA manages an entire site and acts asa bridge to other sites via the X.400 pubic message handling system.

The X.400 system is illustrated in Figure 4 as presented by Halsall [32]. The userscommunicate via a user agent to the SDSE. The SDSE connects to the local post officeand another SDSE. After going through some checks and systems the message exitsthe post office via an MTSE to the recipient’s post office. From here on the path isreversed on the recipient’s side. The UA polls an SDSE that in turn connects to thelocal post office and receives the message. Communication from UA to UA is done inprotocol P2, SDSEs communicate via P3/P7 and MTSEs via P1.

MOTIS at least calls the different software components agents, and indeed they are.The message system is autonomous in that it acts without a user interference. It cantake decisions on where to route a message, and a smart system might even learn cer-tain routes over time. It is also collaborative consisting of several autonomous unitsthat communicate via defined protocols.

17

There are two major guidelines in the network society. The first one is to use as manyacronyms as possible, which is why I have included these above lest someone shouldmiss them. The other guideline is not to use ISO-standards. ISO-standards are gener-ally newer, meaning harder to integrate with existing standards, than the commonlyused. They are also considered to be slower and causing more overhead because theyare better structured with more layers and this usually decrease performance.

UNIX DaemonsTo say that agents have never been used in operating systems before is not entirelytrue. In UNIX-systems much of the extra service that is provided and which is oftenviewed as part of the operating system is managed by so-called daemons. A daemon isa piece of software that takes care of some service, often listening to a certain commu-nications channel. When requests come on this socket the daemon processes it accord-ingly. Examples of such daemons are the telnet daemon, the www daemon, and the ftpdaemon. Other daemons react on the clock like the cron daemon, or on events in thesystem like the syslog daemon. Some daemons like the pageout daemon can be con-sidered to be part of the kernel, but I argue that it is not. The pageout daemon existsmerely to enhance performance by freeing memory for future use when the machine ismore idle than otherwise, but the operating system can manage without this memorypage cleanup by only evicting memory pages when needed.

These daemons have many qualities that would classify them as agents. They are reac-tive, since they idly listen to a port or device until a request comes along. They areautonomous, because they need no baby-sitting from the user, and the user is rarelyeven aware of their existence. Many of them are social, communicating with the calleraccording to some protocol to process the requests. In some cases like the sendmaildaemon they are also collaborative, helping each other in delivering e-mails to theright user and the right place. Daemons like the telnet and ftp daemons does, however,have some qualities that makes it dubious whether they really are agents. A programthat simply listens on a port and responds when spoken to belongs more to the client-server paradigm than the agent paradigm. To be called an agent, the program wouldneed to be able to initiate communication using a peer-to-peer protocol. Neverthelessdaemons are the closest thing to agents to be found in operating systems today. Acommon treat is that they provide additional functionality which is not needed by the

OS

UA

SDSE

OS

UA

SDSE

MS MTA

SDSE MTSE

Message handling system (MHS)

Message Transfer System (MTA)

User User

SDSEMTSE

MSMTA

P2

P3/P7 P3/P7P1

Figure 4. X.400 Functional model

18

operating system but provided as an extra feature. Indeed, some daemons are not evenprovided together with the operating systems but have to be downloaded and installedas any other software.

2.6 Motiv ation to use a gentsAs we have seen in the case of UNIX-daemons agents are suitable for many of the ser-vice tasks in an operating system. Any task that requires monitoring of some device ornetwork socket should manage without interference from the user. As I also haveargued, such daemons are not quite agents but rather a standard client-server solution.Within the operating system kernel there are other tasks where the subsystems workmore like peers. So does for example memory and process managers need to coordi-nate their work to suspend a process waiting for a memory page to be read from disk. Ibelieve that by viewing the entire kernel of an operating system as a multiagent sys-tem, you can benefit from the increased support for communication and the possibili-ties to implement parts of the kernel with arbitrary levels of intelligence.

To have an operating system kernel that is composed of several stand-alone modulesthat interact in a structured and extensible way gives enormous possibilities to create asmart and extensible system. Instead of the standard interaction method of functioncalls you can really maintain a dialogue between the subsystems. Hopefully this,together with other techniques, will yield a more intelligent operating system. The factthat parts of the kernel can be replaced during run-time and that you can add moremodules to help in a certain task makes such a system extremely flexible and extensi-ble.

An operating system kernel that is composed of a set of autonomous modules, oragents, can seamlessly be made into a distributed operating system by adding a globalnaming scheme for the agents. You can, in fact, have a single memory and processmanager for an entire network. Or you can have the memory and process managerscommunicate with each other to decide on scheduling policies as you would in amultiagent system.

19

3. Operating System TasksThis section takes a deeper look at what tasks an operating system should perform, presenting theproblems and pitfalls connected to each of the tasks. No attempt is made to solve the tasks, focusinginstead on giving an unbiased view of the problems. This chapter should be read as an introductionto what I will attempt to solve with the agent-based operating system in the next chapter.

3.1 Intr oductionA. Tanenbaum regards an operating system as either an extended machine providing abetter interface to and an abstraction from the actual hardware, or as a resource man-ager providing support and control for hardware access [7]. The kernel should thushide the complexities of for example memory and process management from theapplications. It should also be able to allocate hardware to processes in a consistentand coherent way. This extended machine should basically hide memory layout, disklayout and I/O intrinsics. It should also hide the fact that more than one process is run-ning on the same CPU from the applications. The last condition also implies thatapplications should see resources as their own all the time, even if they are shared bymany other clients.

Using the ideas of Amoeba, Mach, and Aegis, thefile system can be managed by aprocess running outside the kernel. Thereis also research suggesting that the kernelneed not bother about process management either, but that this can be managed viaCPU inheritance scheduling[39]. Aegis goes a step further, saying that the applica-tions should themselves manage memory and I/O as well so that the operating systemcan live the easy life just pointing at who should get to do the work next.The fourtasks identifiedare, however, not adequate for modern distributed operating systems.They were barely sufficient for ancient single-user, single-computer operating systemsand distributed applications require more support fortheir distribution if they are evergoing to be developed at a large scale.

During the rest of thischapter I will present tasks needed for a distributed operatingsystem that, even if they may not be part of the kernel, needs to be addressed. Most ofthe information below is fetched from Tanenbaum [7] and Stallings [8].

3.2 Process Mana gementProcess management ranges from CPU-time sheduling to process creation. You firstneed a way to create a process, either by creating a new process control block, or bycloning an existing. While the process is running, you need to be able to suspendit andlet it wait for some device without consuming CPU time. The CPU should be sharedamong a set of processes with different priorities. The user should preferably notnotice that he actually does not have one CPU for each program he runs, so prioritiesneeds to be shuffled in accordance to the user activity. Processes that are waiting forsomething should not be kept running, since this is a waste of resources (CPU-time).The user should be able to address the differentprocesses in some way, at least to beable to kill or suspend them.

A common way to manage processes is to keep informationabout them in a processtable. The entry for a certain process typically contains itsprocess ID, its priority,whether it is running or blocked andfor scheduling purposes the amount of time it hasbeen run. There is usually also information about the program counter, stack pointer,open files, and memory blocks thatthe process controls. Scheduling differsbetweenvarious operating systems mostly in the way processes’ time quanta are handled.

20

One might think that process management must reside in the kernel, but UNIX SystemV shows that this is not true. In System V, the processes manage most of the tasksdescribed above using library functions within the process itself. The only concession,if you can call it that, is that the library functions are run in kernel mode. For standard,non-mobile processes this is a fine and well-working model, but if processes have theability to move to another machine, it causes overhead ifthis codeshould be movedaswell. The operating system code may not be able to run on the receiving machineeither, since this host can be running another operating system in which the systemcode will not work.

3.3 Memor y ManagementMemory management is simply the concern to provide each process with memorymapped to its address space. This can be done simply by letting the process address allthe physical memory, or by providing a virtual address space to the process and map-ping the memory contents to a certain place in the physical memory. Of course, in amodern multi-process operating system you need the latter model since one of themain ideas is that processes should not be aware of other processes unless they reallywant to. Hence, it would not do if processes had to share memory space with someoneelse. This would also cause great problems with security issues. The usual way tosolve the mapping between virtual memory to physical memory is by dividing thememory into pages and to swap in pages as they are needed. This paging is performedtotally transparent for the processes, that does not know when it happens or how itworks.

Processes can be presented either with a full view of all the memory, or with a set ofsegments, each comprising a part of the full address space. For some purposes this canbe practical but, since segmentation is usually achieved by utilizing some of the bits inthe address space to select the segment, I find segmentation rather pointless except asa conceptual notion. Segmentation made more sense before hardware supporting pag-ing was invented and implemented into computers.

Another thing to take into consideration is whether processes should be allowed toshare memory pages or not and, in such cases, how much and how to do it. Commonapproaches to this is to share full pages or segments from a pool of memory pages.This pool is often of a fixed size, thus limiting the amount of shared memory that thesystem can manage. Lots of research have also been put into how to share memory indistributed applications[40], letting processes on different hosts share memory as ifthey were working with the same physical memory.

With memory management there is not much that can differ between different operat-ing systems since this is mostly controlled by the hardware. Paging, being a practicalsolution, is used by all modern operating systems and what varies is how selection ofpages to evict is performed. Shared memory, on the other hand, can be managed inmany different ways. It may for example be done blockwise or per segment basis.Another thing that can vary from system to system is process migration and replica-tion. Since these tasks do not specifically involve hardware, it can be done in any of anumber of ways.

As far as I can see, memory management can be handled in the same way as processmanagement, by library functions in each process. Some parts can also be handled asstand-alone processes, e.g. swapping. In UNIX systems process 0 is the swap daemon,managing all swap activity and also thestart-up of the first “real” process, the init dae-mon. Paging may also be done in a separate process but the cost of context-switchesmakes this solution impractical. A common approach is to have a separate pageout

21

process, responsible for freeing up memory to a certain level by paging out non-usedmemory pages.

3.4 I/O ManagementI/O management is something that is consuming an increasing part of the code in oper-ating systems. This is due to a generally held conception that the operating systemshould act as an abstraction layer between software and hardware, creating a uniforminterface to all types of devices. Especially needed is this on Intel-based systems,where a plethora of possible devices exists.

The tasks involved in I/O and device management are numerous. Interrupts thrown bythe device must be caught somewhere and distributed to the right process. Processesthat have been blocked waiting for reply must be awaked (Which, in fact, is the pro-cess manager’s task.). If the device communicates by direct memory access (DMA)the operating system must allocate buffers and make sure that these are not paged out,implying communication with the memory manager. Applications require intuitivenames on the devices, names that must be maintained and kept uniform. The ever soimportant error handling must be managed and the operating system must throw errorsthat make sense to the applications if they expect error messages at all. The operatingsystem must also decide whether the device can be shared by many users simulta-neously, or whether applications must gain exclusive rights for the device.

I/O is usually handled by adding a device driver for each device to the operating sys-tem kernel. These drivers are commonly linked into the kernel on start-up, so that oneneeds to restart the computer for them to take effect or to delete one. The operatingsystem will then provide interface to these device drivers in a uniform way so that forexample disk drives are accessed in the same way as hard disks. Applications caneither communicate directly to the device driver or through yet another abstractionlayer handled by the operating system.

In some cases one wishes yet another layer, residing in user space mode. This layeracts as a buffer, so that no application can lock the device indefinitely, blocking accessfrom others. Applications will instead talk to the buffer layer that can handle concur-rent requests and queue them according to some criteria (e.g. ‘First In First Out’ or‘Shortest Job First’). For devices visible to the network, someone needs to managequeues and network communication to the device. This can be done either in the actualdevice driver or in a layer resting on top of the device drivers.

What differs between operating systems regarding I/O management is mainly how thedevices are presented to the user and applications. At one extreme is UNIX, in whichall devices are presented as either character devices or block devices, and at the otherextreme is Windows NT, where you must basically know exactly what you want to dobefore you do it. In UNIX you need no special knowledge of a certain device, just itsname under the ‘/dev’-directory, whereas in Windows NT you have to know that it isfor example an audio device you are going to communicate with using a certain audio-API.

Usually the presentation scheme is deeply coded into the operating system kernel,whereas management of different devices is done by pluggable device drivers. Itwould thus be possible to make the presentation schemeinto a specific module aswell. This would yield a flexible operating system that can be tailored for specificneeds or use a general naming scheme if desired. Processes need some place to com-municate to register interest in certain events that devices may throw. This can, conse-quently, also be done in a certain replaceable module.

22

3.5 File system Mana gementNaturally, it would not do to present hard disks as simple block devices and ask of theuser and applications to try and find their way around this. Some more abstraction isneeded. Firstly something is needed to find out what block belong to what file, andwhether the files are stored sequentially or blockwise. These files need some handlethat can be presented to the user, and you will typically need some way to structurethese handles into groups, or directories. The operating system will also need somestrategyonhow to cache files.

File systems have come to a standstill with a structure of directories in which one putsfiles. The files are commonly spread blockwise over the harddisk. How to find theblocks vary; some use linked lists, others use i-nodes. There issome variation howcaching is performed and whether to use write-through caching or lazy writes. In somecases, for example in database systems, you wish to access the raw disk device since astandardfile system simply do not have the structure and hence not the performancerequired. In such cases the operating system should preferably be able to present appli-cations with a raw block device.

Having a localfile system is only half the truth. In order tofacilitate for system admin-istrators, much of the software and home directories rest on a server, accessed viasome networkfile system. These networkfile systems usually try to maintain the sameprotocols and access techniques as if they were localfile systems. Sun’s networkfilesystem, NFS[52], uses a technique where you have a virtualfile system layer thatdecides whether a call should go to a local disk or an NFS disk. The client applicationsneed thus not worry whether a call goes to a local disk or a remote.

In the past 20-odd years, not much has happened tofile systems. File systems fromdifferent vendors use the same conceptual artifacts with files and directories. Whatdiffers is how files are stored on the physical disk, and how they are accessed. Howdisks and otherfile system media are presented to the user and system is another areawhere different operating systems differ vastly, but the concepts are the same. Filesare dormant phenomenons, viewed as a storage dump for other applications. Fredriks-son showsthat documents can very well manage their own activities, being more thanan passive occurrence [43]. I claim that this also holds for files at large. The time isripe for a paradigm change regarding files and how they are viewed. In Chapter 5 Iwill explain further what such afile system would look like, and what benefits one cangain by applying Fredriksson’s theories.

3.6 Comm unication suppor tThe above presented topics are the ones that are traditionally viewed as the core func-tionality of an operating system kernel. Most operating systems also provide mecha-nisms for process communication. UNIX, for example, providespipes which is a basicway to connect one thread or process with another. Amoeba, being a distributed sys-tem provide a mechanism similar to pipes, but with support for inter-machine commu-nication as well.

In a modern distributed operating system I think that communication, and especiallyinter-machine communication, is highly important. Preferably, applications should nothave to define their own protocol for data either unless they really want to, and thecommunication primitives should be intuitive and easy to use.

The problems regarding interprocess communication, IPC, are manifold. First of allthere is the question of binding an application to a communication layer so that theapplication can receive data. Incoming data must somehow be transferred to the appli-

23

cation and its memory for processing, and outgoing messages needs to be transferredfrom the applications memory to outgoing buffers or the recipients memory space.Once this has been handled, there is the question of keeping connections alive so that atwo-way communication can be upheld and at the same time ensure that the messagesreach the right recipient. To get a connection, you need to have a name, or an address,to the process you wish to communicate to. This name should be visible over the net-work and it should also be possible to transparently send messages between machines.Support must be included for both synchronous and asynchronous message transfer,meaning that clients should be able to select whether they wish to block whilethetransfer is done or continue immediately.

Once the basic primitives have been implemented, there is the question of providingthe programmers with an easy-to-use interface to these routines. RPC[41], CORBA[11], and RMI[42] are examples of such APIs. These APIsgenerallytries togive theillusion that you are calling a local function or a method in a local object, which putssome demands on the underlying communication primitives. In some cases a higherabstraction might use another API. It would, for example, be possible to implement aCORBA API using RPC.

A good support for communication should also include group communication primi-tives, with all the new problems that this brings. Group messages should be receivedsimultaneously, or be perceived to be received simultaneously. Messages shouldarrive in the same order to all the recipients and you need some way to ensure that themessage actually reached all the recipients or whether you should do a rollback on theoperation.

Even if modern operating systems provide support for interprocess communication, itis often only supported for processes on the same machine.Commonly, there exists noeasy-to-use low-level communication primitives to access other machines, and youneed to know exactly where you are going before you can start communicating. Whatis needed is an API that makes it transparent whether the receiving process resides onthe same machine or not, and a global naming service to find it, no matter where itmay reside. This API could be implemented fully in higher-level layers, but for perfor-mance reasons, and because inter-process and inter-machine communication isbecoming increasingly common, it is best housed by the operating system kernel.

3.7 Sync hronizationSynchronization is related to communication, and especially group communication.To determine the ordering of messages some sort of timestamp or uniquely orderedIDis needed for each message. The easiest way would be if all the processes shared theexact same time down to the nanosecond and to include the send-time in messages.Combined with a host-based runningID this would make it an easy task for the recipi-ent to figure out in which order to put messages and whether there are any missed mes-sages.

However, due to imperfect crystals no computer clock ticks at exactly the same rate asanother. Hence computers need to update their clock at regular intervals with somecentral time. This synchronization can either be done against a centralized clock or bysome distributed algorithm. It canfurthermore work using a polling method or abroadcast method. The troubles with having a distributed algorithm is that the syn-chronization is not accurate to the nanosecond, or even microsecond, depending onnetwork traffic and machine load. For each step away from the original server thisvariance will increase and at the same time decreasing the accuracy of the time mea-surement. Other distributed algorithms exist in which each machine regularly broad-

24

casts its time, letting the others set their time as a mean value of all the timesbroadcast.

A time server usually gets its values from some global place, for example the nearestatomic clock, and the clients in a local network gets their time from the time server.This design is used to reduce the load on the global time servers and because commu-nication within the local network is probably faster than to the atomic clock.

Why, then, is time synchronization such a cumbersome task in networked computersystems? In the ‘real world’, there exists a short wave radio station with call signWWV that broadcasts a signal at the beginning of each UTC (Universal CoordinatedTime) second. The accuracy of this signal is +/- 1 msec, but atmospheric disturbancemakes it accurate only to +/- 10 msec. The reason that this signal can be so accuratewhen it cannot be so over a network cable is simply the fact that the WWV signal neednot compete with other traffic. Thus, if one could halt all network traffic when the timebroadcast is due, one could reach equal accuracy in this medium as well.Furthermore,knowing how many metres of cable there is between client and server makes it easy tocalculate the propagation time.

For some strange reason time synchronization is not traditionally part of the operatingsystem. The system must manually be set up to synchronize its time with a server. OnUNIX systems a common entry in root’s crontab1-file is to set the clock against someserver. Berkeley UNIX, on the other hand, has a broadcasting method with a centraltime daemon that generates a mean time of all the clients connected to it. To set thetime against an atomic clock, one often have to rely on third-party software like xntp[44].

Time synchronization is only one of the problems, but it is certainly the most difficultto solve. Other synchronization problems that need support by the operating systemare semaphores and monitors. Monitors can be implemented using semaphores, so youin fact only need support for semaphores. To create a working semaphore you alsoneed some hardware support, the test-and-set assembly call that tests if a memory bitis set and if not sets it. If it is already set, a message about this is returned and the pro-cess can go and add itself to a queue for the semaphore. Semaphores are commonlyprovided at a more abstract level as operating system calls. This call takes care of boththe flag-checking and the queuing. When the semaphore is released, the operating sys-tem wakes up a suitable candidate from the waiting queue.

3.8 SecurityOne final topic that often is neglected is the issue of security. Security is related to allof the above topics.File systems needs some access control, I/O devices should onlybe accessed by authorized users, you should not be able to read data in another appli-cation’s memory unless it explicitly gives you the right to do so, and unauthorized pro-cesses should not be able to snoop network packages not designated to them.

Another aspect of security is fault tolerance.Force Majeur events like earthquakes orpower failures should preferably not result in loss of data, or the loss should at least beminimized. The human factor, which is sometimes even more disastrous and definitelymore unpredictable than earthquakes or floods, should also beconsidered. Thus thesystem should not crash or cause irreparable damage because of a simple human error.A common cure against user errors is to require a certain access level to be able to do

1The cron-daemon runs tasks listed in the crontab at the specified time.

25

stupid things. Regardingforce majeur, there is not much one can do, since most faulttolerance schemes result in lower performance. This is not to say that there is nothingonecan do, it is just hard to make something fully fault tolerant. Database vendorshave had this problem since the very start, and there exists sound principles describinghow to achieve fault tolerance. In recent years the operating systems research commu-nity has begun to realize this as well and papers concerning for example logfile sys-tems[45] are beginning to emerge. Modern operating systems are also fairly faulttolerant without any modifications or special schemes.

The area in which breaches occur most often and where the most improvement can bemade is in the area of protection against mischievous users and processes. In principleone can say that for every feature you provide the user you create at least one securityproblem. UNIX has long had the policy to trust its users to some extent. Hence, pass-words are often sent as unencrypted text when connecting via telnet or ftp. The file inwhich passwords are stored is also commonly available by default. To acquire accessto such a system is a mere matter of listening to the network traffic until someonestarts up a telnet session. Using publicly available software it is just a question of timeuntil one has decoded a number of passwords in the password file.

The reason for this naive approach in UNIX is the TCP/IP protocol, that does notinclude coding. TCP/IP only spans level 1 to 4 in the OSI-model [32], whereas codingand encryption is handled at level 6,thepresentation layer. The implementors of theapplications (like telnet and ftp) have not considered security as an issue, either ignor-ing the problem or assuming that a lower layer has taken care of it.

3.9 Summar yAs you might have guessed, one of my main goals is to deprive the actual kernel of asmuch power as possible. The underlying theory behind this is the same as with aegis;it is not the operating system’s task to manage resources, all it should to is to allocatethem to processes in a secure way. I argue that almost everything can be handled byprocesses not part of the kernel. The kernel can accordingly be simplified enormously,thus reducing its size and increasing flexibility.

As the situation is today, most of the tasks above are handledwithin the kernel. Achangein memory management or process management would result in a recompila-tion of the kernel, and most certainly a restart of the system. Adding and removingdevices is a bit more flexible, but still often requires a restart. To add support for newfile systems is equally cumbersome, even if some operating systems like Linux han-dles this fairly well[46]. As for security, the different subsystems often need to imple-ment their own security check, or at least contact some global security manager beforeexecuting certain commands. This model is a paradise for security breaches, all youneed to do is to make a service fault in such a way that the security check is skipped. Abetter solution would be to go through a security layer before reaching the servicerequired.

In the previous section I explained the concept of agents. An agent is a small, stand-alone piece of software that can easily interact with other software. I claim that suchagents are highly suitable for these kernel tasks, chiefly because of their flexibility andability to interact. In the next section I will describe a microkernel operating systemthatsupports these kernel agents.

26

4. Top-Level design of ABOSIn the previous section I described what tasks a modern operating system should perform. I madeno attempt at solving the problems I described. In this section I will design ABOS, an Agent-BasedOperating System. ABOS is an example of an operating system that uses agents at the kernel level.As in the previous chapter, I will make no explicit attempt at solving the problems, but rather con-centrate on the design and functionality.

4.1 Intr oductionThe trade-off between flexibility and performance often result in large monolithic ker-nel structures where changes are done by recompiling the kernel. The kernels com-monly only have one memory space because the cost for an in-process function call isconsidered significantly lower than an IPC-call. As I have indicated earlier, the kernelcan be deprived of much control and still be functional provided that these tasks arehandled elsewhere. This ‘elsewhere’ could be small, autonomous server processes, socalled agents. Naturally, this also implies that IPC is today fast enough even for time-critical tasks.

Today, agent platforms are usually built on top of an existing operating system. If weare going to use agents in the kernel this is naturally not sufficient. I have earlierdescribed two agent models,the fully autonomous andthe federated agent platform.Fully autonomous usually means that each agent is run as a separate process, whereasin a federated system all the agents are run within a certain process, commonly calledfacilitator. In the system I propose, the agents will be run as separate processes, butthey will still run on a core designed especially for the agents. In this sense the systemwill host fully autonomous agents but still be a federated agent system.

There exists similar operating system platforms that support dynamic objects andcomponents like Shag/OS[47], GLOBE[49], and Paramecium[48]. I claim that thestep from dynamic or persistent objects to agents is not that far, and my work will relysomewhat on the theories and conclusions of these and similar projects, but the ABOSkernel is not quite like any of these.

In this chapter I will draw an outline of a flexible and extensible agent-based operatingsystem. Starting with the core I will work my way outwards to the end-user applica-tions. I will not describe these in detail, but rather stop at the service level, where ordi-nary operating system boundaries go. Exactly how the tasks in the operating systemare performed is not explained either, unless they are of particular interest for theagent aspects.

4.2 General la youtFlexibility is commonly achieved by modularity. Modularity alone is however notenough to achieve full flexibility. A kernel can be one large binary and still be builtwith separate modules. In such cases you need to recompile the kernel and restartthemachinefor every kernel level change. It will not help to store the modules separatelyeither. They are still loaded into a shared memory space and are highly dependent oneach other. By instead making the kernel extremely small and let everything in theoperating system be run by separate processes, the same flexibility and more can begained without recompilation and restart of the entire operating system.

ABOS consists of such a set of small and autonomous modules, or agents. They areautonomous in the sense that they require very little support from other modules, andeven less support from the human users. The system is structured as a set of layers

27

around the core in the middle. The level of the services provided increase for eachlayer. The hardware is accessed wherever it is feasible, not forcing calls to propagatethrough more layers than is necessary. The layer structure is retained in spite ofitsdecreased significance because it provide a conceptual structuring of the privilegesthat processes within a certain layer should possess.

Figure 5 illustrates the various layers inABOS. The bottom layer is thecore system.This system provides very basic support for process’ and memory management.Around this layer is the actualkernel wrapped, providing more advanced process andmemory management. File systems are also defined in the kernel, as are moreadvanced communication primitives and I/O management. Outside of the kernel layeris aservice layer, providing things that are not part of an operating system kernel, butare still part of the operating system like user management and resource allocation.User applications, finally, are run on top of the service layer.

4.3 CoreAt the very core of the system is the bootloader that in turn loads the core modules.The core modulestake care of process management, memory management and com-munication primitives. The modules support very little intelligence, just a basic sched-uler and primitive memory management. In fact, they should only support the creationand running of the advanced services in the kernel. It would also be convenient if thecore modules were the only modules that interfaced the platform-specific hardwarelike CPU and memory, letting the kernel bother with the more strategic decisions.

The reason for this division between core and advanced service levels is to gain flexi-bility. By keeping the services at this level simple, more interesting functionality canreplace the default without disturbing the low-level hardware interface. Itis, however,not advisable to remove the core layer altogether since you need some support to getthe rest of the system up and running.

Process management is required at this deep level to be able to run the rest of the ker-nel as separate processes. You need to have at least support for creating processes atthis level. More advanced process management can very well be handled at the kernellevel. The same reasoning holds for memory management. Once a process is created,its code needs to be loaded into memory. Everything else can be done by library func-tions within the process.

A process is created by a request to the process manager. The process manager ini-tiates a process control block and assigns priorities etc. to it. A uniqueID is also gen-erated and associated with the process. ThisID is used by all other processes to findthe newly created one. At this level, no automation is present, so you need an extrarequest to the memory manager requesting a page allocation and the binary code to fill

user applicationsservices

kernel

core

Figure 5. General la yout of ABOS

28

the pages with. With this allocation scheme it is easy to load the rest of the kernelsince the binaries are specified on creation. The bootloader can load the core and thenstart to load the rest of the kernel using the core. As soon as enough of the kernel isloadedfor it to carry on by itselfthe bootloader can terminate.

The support for process communication is also placed at this low level, since the ker-nel agentsuses the available IPC mechanisms to communicate with each other andwith the core system. At this level the communication is restricted to simple pipes,described in more detail below. Additional primitivesto for example use a particularcommunication language is handled by library functions within the other processes.The core modules can of course also communicate with each other so that incomingmessages for a process results in a wakeup of the process and memory is freed uponprocess termination.

4.4 KernelAt the kernel layer the functionality is found that is more commonly part of the operat-ing system. The following tasks are handled in the kernel layer:

• Final process management• Additional memory management• File systems management• Communication naming schemes• I/O management• Network management

As seen in Figure 6, the core takes care of the basic memory management and processmanagement, whereas advanced services resides in the kernel layer outside of thecore. All managers should be considered agents, even those in the core. The coreagents have given up part of their independence, but they are still autonomous andsocial. They all communicateusing a predefined agent communication language,ACL.

The final scheduling algorithms for processes etc. are handled at this level, using thecore process manager to do the actual switching of processes. Swapping and paging

Core

MemoryManager

ProcessManager

CommunicationManager

ProcessManager

MemoryManager

File systemManager

CommunicationManager

Kernel Layer

Advanced

Advanced

Advanced

I/OManager

NetworkManager

DevicesDeviceDrivers

DevicesOther

AgentsKernel

Figure 6. ABOS, kernel le vel

29

algorithms are categorized as advanced services, using the core functions whenneeded to do the actual work. Listings of shared memory pools are also managed out-side of the core, together with access information and authentication. The kernelagents use the core communication management to access the advanced communica-tion services. The advanced communications manager provide intelligible naming andnetwork-wide communication.

Letting the core manage all processor and memory access means that only the corefunctions and perhaps the device managers need to be run in kernel mode.The rest canvery well be taken care of in user mode, even if it may be desirable for security rea-sons to let other tasks run in kernel mode. Removing the need for other agents to beaware of platform-specific details also means that there is a great flexibility toexchange parts without affecting others and without needing to rewrite these agentsfor every new platform.

Division between kernel and coreIn the cases where you have a manager in the core and an advanced manager in thekernel it is vital that there is a clear definition of what should be done in the kernel andwhat should be done in the core. In my view the core should only manage the verybasic things, leaving all policy decisions and more complicated services to theadvanced managers in the kernel layer. Consequently, the core process managershould only manage the creation of new process control blocks and run a process untilit yields or falls asleep on some operation. When a process is removed from the run-queue, a new process will be selected to run. The most basic run-queues (running,blocked, and ready) should be managed, but it should also be possible to create otherqueues and move processes between the queues on command from the advanced ser-vices. The advanced process manager should maintain these extra queues and com-mand movement of processes according to its algorithms. Creating a process at thekernel level can also include more than one operation in the core. Memory may beallocated and communications channels set up, according to the advanced processmanager’s strategies.

The core memory manager should be able to allocate and deallocate virtual memorypages to physical memory. It should be possible to fill an allocated page or a set ofpages with a binary string. The advanced memory management performs paging andswapping to and from disk. Lists of which memory pages that are shared are main-tained by the advanced memory manager. Page faults are handled by the core memorymanager, but the advanced memory manager has to be asked for what to do when apage fault occur.

Communication at the core layer is merely a matter of receiving an address and astring and deliver this to the right process, notifying the process manager that this pro-cess should be put in the ready queue if it is blocked. At this level, communication isdone using pipes, similar to those used in UNIX.At the kernel layer, a process can cre-ate more than one communication channel, or service, binding it to a certain name.The kernel communication manager also has the ability to make a name visible overthe entire network, and forwards messages via the network manager to the right host.

4.5 ServicesThe service layer is wrapped around the kernel layer. In an ordinary operating systemthere are tasks that belong to the operating system, but not to the kernel. These are tra-ditionally run in user-mode with the help of kernel-mode programs and functionswhen needed. It is my opinion that by adding a certain layer in the operating system

30

structure for such services, they may get the acknowledgment they deserve withoutgiving them too much power.

At this service layer things like resource allocation and time synchronization shouldreside. It is also here that the users are managed. This is not to say that there doesn’texist a user notion in the kernel and core, but at the service level this user ID ismapped against login names, authorities and restrictions, home directory paths andenvironment variables, etc. Process migration should be provided as a service at thislevel. Since this feature is not a part of the primary operating system functions itshould not be included at kernel level. Agents for encryption and authentication isanother thing that this layer provides. The reason is the same as for user managementand process migration. The operating system can manage without it, but it is a conve-nience to provide support for it.

Since network management and communication is provided at a lower layer, the ser-vice layer can be global for the entire network, creating a foundation for a distributedenvironment and hiding the peculiarities of the individual systems. Many of the ser-vices are global by their nature, like user management, but it might be practical toduplicate the services to every machine for performance reasons. Since the servicesare agents it is no problem for them to cooperate and propagate changes to othermachines in the network.

4.6 User ApplicationsThe final layer is the user applications. These typically involve user interfaces and aremore interactive than the previous layers. A smart process manager could take advan-tage of this information and manage processes’ time quanta and priorities accordingly,giving larger quanta for services than for user applications. Reducing time quanta forinteractive processes is something which is utilized in Windows NT[50] where timequanta differ between the server version and the workstation version of the operatingsystem. In both UNIX and Windows NT, every process that receives user input fromkeyboard or mouse gets a boost in priority, which is another way of ensuring that userprocesses gets the CPU time needed. By identifying processes as user applications orsystem services, one can make such priority boosts even more fine-grained.

4.7 Summar yComparing this operating system architecture with the commonly available kernelarchitectures, it differs mostly in that the kernel functionality is run as a number ofstand-alone processes. The number of context switches increase, and hence the perfor-mance decrease. However, the flexibility is increased manifold by allowing new ser-vices to be added or replace old ones at runtime. For instance, you can install a newrelease of the operating system without even having to restart the computer! Since thefunctionality is all located in autonomous modules, all dependencies and open connec-tions will still be open and valid when a new service is installed to replace an olderone. Unlike UNIX where almost the entire operating system is duplicated as libraryfunctions into every process [8], the functionality inABOS is divided into stand-aloneprocesses.

The division into the four layers core, kernel, service, and user gives a possibility toconsider the type of process when scheduling and setting time quanta. Server pro-cesses usually gain by running longer periods, user processes work better with smallertime quanta. In a traditional operating system, there is usually only one rule whenscheduling is considered, namely to spend as little time as possible in kernel mode. As

31

there only exists two modes, user and kernel, you have no possibility to affect thescheduling policies further.

Since the different parts of the kernel are no longer compiled together, one can thinkthat there are high demands on well-specified protocols and interfaces for the variousagents. This is true to some extent but not fully. You need a minimal set of commandsthat are standard for each service, but each agent can provide any number of extrafunctions as well. Agents can also reduce the original set of calls by simply ignoringnon-supported messages or reply with an ‘invalid command’ message. The point ofhaving at least a minimal set of commands is that the semantics of these method callscan be coded into calling agents. On top of these minimal protocols, you can agreeupon a certain ontology in which you will talk to the specific agent.

To achieve the flexibility strived for, the calling protocols must define mechanisms totake over control for specific tasks. So can, for example, a file-encryption agent takecontrol over the ‘write’ and ‘read’ commands from the file system agent, and in turncall the file system agent to store the file once it has been encrypted. Another way tosolve this would be to tell the file system agent to ask the file-encryption agent beforestoring or loading. Which strategy to use is just a matter of design, and is not handledin this paper.

The solution with advanced managers in the kernel layer makes it possible to havemore than one manager for processes and memory. In this way an application can pro-vide a special process manager that takes care of a subset of the processes to schedulethese in another way.The paging algorithms can in the same way differ depending onthe application. More forms of communication can be added in a similar way by add-ing communication managers. Having several process or memory manager requiresthat these are implemented to be aware of this situation, communicating with eachother to decide who should take care of what process.

It should be noted that I have taken the agent strategy to an extreme. The point I amtrying to make is that youcan perform all the tasks of an ordinary operating systemwith a multi-agent system. This does not mean that it is the best solution to do so. Itmight be better to have certain parts as a standard module-based operating system andemploy agent-based techniques in other places of the operating systems.

Assignment of functionalityThe core modules, acting as an interface to hardware, are inherently harder toexchange than the kernel agents outside of the core. This is not because they are sodeeply needed by the system, but rather because they involve machine specific codeand optimizations. Still, I have strived to keep them as small as possible, putting themore algorithmic behaviour into agents in kernel-space. This approach enables func-tionality to be replaced and added at runtime without disrupting the system. Behaviourcan also be altered without extensive knowledge about the target platform.

The operating system can very well work without the notion of users, and processesshould be secure from each other by default, unless they state otherwise by sharingmemory or communicating to untrusted agents. Accordingly, I do not introduce thenotion of users until at the service layer. Users should be seen as a convenience toachieve security but the lower layers function without them. Nevertheless it helps tohave some user awareness like the owners of processes and files. The kernel agentsmap a user ID as ownership and ask the user manager agent in the service layer forinformation about the access rights when needed. If no user manager is present, thekernel agents fall back to some default behaviour. The reasoning considering processmigration is similar. It is a desirable feature, but the operating system can manage very

32

well without it. Strictly speaking, the support given at the kernel level is with someexceptions that of a single-user, single-computer system and the service layer extendsthis to be a multi-user distributed system.

Naturally, this division can be discussed from the perspective of performance, but Ibelieve that the cost for context switches and IPC communication is small enough tobe negligible, considering the vast improvements gained in flexibility and structure.There is virtually an equal amount of data to be stored when performing an in-processfunction call as when doing a context switch. The data to be loaded for the new pro-cess is however larger, but still small enough not to make this a real problem. Withmodern processors it is a matter of mere microseconds to conduct a context switch. Asfor IPC, one can by using shared memory reduce this task to the same level as if youwere putting something on the heap before a function call.

Service la yerSomething completely new that I have introduced is the service layer. I have not foundevidence of anything similar in the research papers and operating system descriptionsthat I have read. The reason for this layer is that there are things that are part of theoperating system, but not needed for the actual kernel. These services should beallowed to run in kernel mode if they need to, and they should be able to communicatewith the kernel in a leisurely fashion. In some cases the kernel agents are even awareof the existence of services in the service layer to for example authenticate requestsfrom users with the user manager.

In traditional operating systems there is a very strict division. Either you are in the ker-nel, or you are outside it running in user mode and forced to rely on the functionalityof the kernel. I believe that by adding a service layer one makes this border more flex-ible since you can have non-essential functionality running in kernel mode and thekernel agents can also rely on the presence of the service.

The chic ken and the eg g pr oblemRemoving the file system from the bootloader causes an unwanted problem; how toload the file system manager from a non-existent file system. Before you can load thefile system, you also need to load the I/O manager and the disk device driver. I have noobvious solution to this problem. To simply move the file system manager into thecore inflicts the flexibility goals since this means that the I/O manager and devicedriver also would need to be put into the core.

In traditional operating systems this is not so much of a problem since there you knowwhat file system you are using and can support at least read-operations in the boot-loader, but with the flexible model used inABOS you have no way of knowing howthe files are stored and can consequently not know how to load them.

One could make a trade-off like the one with process and memory management, hav-ing a basic I/O manager and file system manager in the core and an advanced manageroutside, but this would still, in my view, make the core system too large and too mucha central component in the system.

Another solution would be to have a separate file system for the boot loader, loadingthe core system and the file systems manager using some default disk access devicedriver and then, when the system has started both file systems manager and I/O man-ager together with disk device drivers, discard these initial device drivers and loadersystems. The real file system manager could then mount this initial boot file system toa suitable place in its directory tree, providing access to it for changes. This solution

33

may be the easiest to work with, but I believe that having a special file format for theboot disk is not desirable. It forces the file system manager to keep track of more thanone type of file system, making it larger than what should be needed.

The best solution would be to have a bootloader that contains a dummy device driverand also at least a subset of the file systems manager so that files can be read and fedto the memory manager or wherever they are supposed to go. I believe that the solu-tion with a separate file system is, unfortunately, the best way to solve the boot proce-dure, but this file system should be managed by a separate agent that works like anyother file directory agent with the exception that it manages the entire boot disk direc-tory tree.

4.8 Achieved goalsThe flexibility goal has been achieved with theABOS kernel described above. Replac-ing a kernel agent is simply a question of starting the new service parallel to the oldone, and then gradually take over the functions from it, by querying about it’s status.The final cut is done by moving the communications channels to the new agent. Add-ing functionality is even easier, since it merely involves starting up a new agent.

Having extensive support for communication and network communication in the ker-nel to rely on should increase productivity when developing distributed applications.If these services also define a flexible communication protocol like KQML [30] thatcan be used as an all-purpose communication language, it should at least be easier toprogram in a distributed fashion. Supporting process migration and including locationindependence in the communications protocol facilitates such programming even fur-ther.

Security should always be a goal when designing systems like this. The goals regard-ing security involves fault-tolerance and user privacy. Fault-tolerance and such arehandled by the autonomous qualities of the various subsystems but is ultimately con-trolled by their respective implementation. Security for malignant users is achieved byadding security agents in the service layer. The communications manager can call anencryption agent in the service layer before sending messages over network, as can thefile systems manager if file encryption should be enabled. Agents knowing that theirservices are of a delicate nature can require authentication from the service layerbefore executing requests.

Performance, which is also always expected from operating systems, is something thatI have not taken into consideration when designing the system. A quick glance at thedesign shows that the number of context switches will increase significantly comparedto an ordinary system, as will interprocess communication. The division of the operat-ing system kernel into several layers increases the possibilities for smart schedulingand varying time quanta depending on in which layer the process can be found. Run-ning kernel modules as separate processes means that parts of the operating systemcan be swapped out at times, decreasing memory demands. If the scheduling algo-rithms are similar to UNIX, processes that have been idle a long time will be swappedout and removed from the primary run queue [7], implying that the entire kernel neednot be in the primary run queue.

34

5. ExamplesThe previous section presented the design of ABOS, an agent-based operating system, showing thecore and kernel layout. To further test the design, I will in this chapter take a few tasks and examinefurther, making a more thorough design and discussion of these.

5.1 Intr oductionIn the previoussection I made a rough design ofABOS. I outlined the different layersand gave examples of tasks that resides in the different layers. I showed that the func-tions provided by a traditional operating system can be handled by agents running asseparate processes. I also introduced the notion of a service layer that can hide thingslike process distribution and enable process migration, among other things.

As mentioned earlier, there is not much one can do about process management andmemory management since these are ultimately controlled by the hardware. What dif-fers between operating systems are the actual scheduling algorithms and paging algo-rithms used. The same goes in all that is essential for the device drivers. They shouldact as a mere abstraction to the varying hardware devices, not differing much inABOSfrom what is common practice.

Communication management can be done in many ways but I assume a simple modelthat for the programmer looks like streams or pipes on which data is sent using a pro-tocol like KQML. Naming services and network routing is done by a separate agent,the advanced communication manager.

This brings us to the parts thatcan differ widely from current praxis. I have stated ear-lier that the area offile systems can be greatly improved upon compared to thede factostandard that is used today. I will examine this further by designing an agent-basedfilesystem.

Resource allocation, for example to print a document, is often requested by the user.Naturally the user have certain preferences regarding where he wants his document tobe printed, the minimum quality acceptable, and a number of other details. In a tradi-tional operating system, it is up to the user to decide exactly where he want to have hisdocument printed, forcing him to know whether the printer is working, has paper, itsprinting resolution, and where it is located. This goes for most other network devicesas well, even if the printer example is very significative. Users being as they are willprobably find a favorite machine/device and always use this regardless of its currentload, which will create an unbalanced resource usage. If you instead assign a poten-tially mobile agent with the resource request or job to be performed this agent can findthe most suitable resource for performing the request, optionally moving to a hostcloser to the device for the more communication-intense bits. This is the second taskthat I will examine further and create a design for.

As I discussed in the section about operating system tasks, synchronization, and inparticular time synchronization, is very hard to achieve. Since you have no way ofmeasuring the time a packet isen route on the network, you have no exact way of syn-chronizing clocks. I will investigate whether mobile agents can perform this task bet-ter. As for other synchronizations like ordering of events there exists several workingtheories so I will not go deeper into these areas.

These three topics (file systems, resource allocations, and synchronization) will beexamined in further detail below. A design is presented that relies on the agent operat-ing system described in the previous section, and this design is evaluated.

35

5.2 Agent File systemOver the past 20 years, not much has happened with thefile systems we use. The sameconceptual notions of files and folders are still used. Files are dead objects, a dumpingplace for applications to ensure persistence. Directories were added to be able to grouplogically connected files and handle them as a single entity. In recent years an abilityto encrypt files has been added[51]. This was done because users requested a securitylevel not supported by the applications in use.

Networkedfile systems are commonly added as an afterthought, working on top of thetraditionalfile system. An explicit command is needed to distribute a certain directorytree, causing troubles when one is working from another machine. This could beviewed as an advantage when it comes to security, but security is handled with accessrights for the individual files, so this safety effort can be considered redundant.

Caching is in most systems rudimentary at best. There is always a problem to ensurethat the file read is the most recent one, and that no cached versions exist somewherein the network. One common way to solve this is to have write-through caches, whichdecreases performance but ensures that all data is directly stored to disk. If the net-work should go down for some reason you cannot save any files since the remote diskand the networkfile server is no longer accessible.

Active DocumentsFredriksson has recognized the fact that documents are passive objects[43]. He writesthat you have to rely on external software like word processors and database systemsfor the documents to get any behaviour. The attempts at making documents moreactive, like compound documents [26] and OLE [26] only go so far. The autonomousbehaviour one can achieve in compound documents is a patch solution, and cannot bemodified to fit ones own needs. OLE is primarily concerned with visualization andlinking of data between a set of documents. The documents are still inanimate and relyon the word processor to support the OLE or compound document framework.Fre-driksson also outlines a solution to this application dependence, namely by embeddingeach document in an agent. This agent can have arbitrary behaviour, sending updatesto concerned parties or simply update itself according to some rule.

I claim thatFredriksson’s theories are just as valid on files at large, not only docu-ments, and that by making each file an agent we can gain many benefits. Things likefile dependencies, for example between source code files and their compiled equiva-lence, can be managed by the files themselves, thus removing the need for a specific‘make’-program. Caching over the network can also be handled in a smart way byactually moving the file to the computer where it is used most,storing an image on theserver when the computer and network load permits, or when the server requests thefile for backup.

SolutionThefile system manager resides in kernel mode. It defines standard file operations likeopen, read, write, and close. In addition, it also defines interfaces for setting up associ-ation tables for newly created files, to make sure that files of a certain type gets anagent of corresponding type wrapped around the data. A word document should forexample get a word-document agent by default. File types that do not have an agentassociated with them are provided with a default agent from thefile system manager.This default agent implements the traditional operations and provide basic protectionmechanisms.

36

A request for reading or writing to a file is forwarded to the specific file, and this filemay then decide whether to accept the request or not. Files can thus decide on theirown who should be allowed to operate on it, and also which programs that should beallowed to access the file. The role of thefile system manager is by this reduced to asimple interface to load the file agents from the disk device driver. A default cachingmechanism can be added, but the files should have an option to manage their owncaching strategy. This will enable databases to have their own caching mechanisms,whereas other files may decide that they are temporary enough never to ask to be writ-ten to disk.

A distributedfile system is accessed through a certain mount point, which is a separatefile agent. This mount point agent will act as a bridge to the remote system, showingthe contents of the remote directory as if it existed on the local disk. All file operationscome through the communications manager, so thefile system manager need notknow whether it is a request from another computer or a local request that it is process-ing. The actual directory agent that is accessed via the mount point agent decideswhether to accept or to refuse access from a certain host or mount point. The causesfor a refusal can be anything from not being on a trusted network to a temporary lock-out to perform system management or backups.

EvaluationThe agent layout described above gives enormous flexibility, since each file may havean individual behaviour. You also reduce the need for specific programs traditionallyneeded to manage the files. Since the files carry their own behaviour they can easilymove to a new host within the network or even a completely different host system onanother network and still execute in their intended way.

The reason for the file system manager to implement the read and write operations isthat you do not know whether the file system is agent-based or a standard file system.The manager should also have a possibility to load the file agent into memory if it isnot already loaded. Furthermore, the communication manager needs to set up a com-munications channel to the object if this has been closed down.

Having each file handling their own security and behaviour, and also the decisionrights on who should be allowed to do what, gives an extensible security model. Thefile agent will not allow the file to be transported to an insecure host, nor will it givedata to an unauthorized client. The file agent can also choose to show different viewsof the file to different clients in accordance to their specific security status. The filescan also decide for themselves whether to encrypt the data or not. In the overview ofABOS I stated that thefile system manager could request this from an encryptionagent in the service layer. Letting the files manage this by themselves removes theneed for thefile system agent to know specifics about every file. Different encryptionalgorithms can be used for different files, improving performance on files with lowdemands on encryption compared to files with high demands.

By letting each file agent manage its own caching mechanisms, file performance canto some extent be tuned for the individual systems. The main disadvantage regardingperformance is that for each file request where the file agent is not already active, anew process will have to be created and the request forwarded via IPC. Removingsome of the autonomicity of the files, reducing them to mere threads within thefilesystem manager, can reduce this bottleneck. To achieve better fault tolerance, one canhave a set of file-agent nurseries that each takes care of a number of agent threads.These nursery agents can quickly optimize their numbers with regards to the average

37

number of file requests so that there are no more nursery agents than are needed at anygiven time.

The mount point agent provides possibilities for redundant systems, letting the agentdetermine whichfile system to provide access to from a set of choices. If one server isdown, the agent points to the next one. It can also dynamically determine the fastest ofa set of servers, and let the request go to this one. If the mountedfile system is not justa read-onlyfile system but is updated as well, the servers should of course work as acluster, cooperating to store the data in a fault-tolerant way and update servers thathave been down with the changes made during their absence. The mount point agentcan also merge more than one directory to a single mount point, which might be usefulif you need to install something that is larger than the disks you have available. If youin the future let each file creation request in this directory tree use the full path, themount point agent can dynamically put the files on the least used disk, or work inaccordance to the file’s own preferences on storage space. A database log file agentmay wish to have a disk with lots of space to grow in, but the actual database storagespace knows its exact size on creation and is not expected to grow uncontrollablywhich means that it can be placed on the disk with the tightest fit.

There is a data storage overhead involved for each file, since its state and perhapsbinary executable should be stored together with the actual data. Doing a quick checkon all my java bytecode files gives that a standard class-file is approximately 5 kB insize. Assuming that a file agent requires several classes, the overhead might be aslarge as 50 kB. It should be noted that these figures may not be significative at all, theyare just based on estimations from my side. The actualstate of a file is however just amatter of a few bytes, so one can avoid this overhead by storing the binarycode for theagentas a separate file, letting thefile system manager launch this binary when a fileis requested. This would mean that an agent is needed that can ensure that the requiredbinaries are present at the future host if a file wishes to migrate. This is best done bythe process migration agent. After all, a migrating file agent is no different than anyother migrating agent, and the mechanisms for migrating are still the same. Referencesneeds to be updated and forwarders left so that it can be found in the new location.

Naturally there is a risk that the users get overloaded with extra parameters to keep inmind when creating a new file. It is not my intention to create more overhead. A fileagent is automatically selected and created with respect to the file type similar to howfiles are displayed with a specific icon in Windows NT. The users can write file agentsof their own, but it is more the task of the software suppliers to provide agents for thedifferent file types.

5.3 Resour ce AllocationBasically, one can reduce the problem of resource allocation to ‘finding the rightresource for the right price’. This may involve a number of parameters like the user’spreferences, or a company policy of which resources should preferably be usedbecause they are cheaper, or a wish to use the device closest to the target. Theresources can be everything from a printer to a fax-machine or simply a certain harddisk or sound card. This multitude makes it hard to create a general resource allocationpolicy. In the case of a fax machine the one closest to the recipient should be used, andin the case of a printer the one closest to the sender should be used. In other cases thechoice is easier, like in the case of the sound card where you only have the choice touse a local card or none at all.

A common way to solve resource allocation is simply to have different programs forall different resources, forcing the user to keep track of the various commands for the

38

resources. The user thus have to know that an audio device is found in ‘/dev/audio’,but the printer should only be accessed via the ‘lp’ command (Examples are takenfrom UNIX, System V). The user also needs to know where the device is situated,since this affects the cost of the operation. Clearly, this is not a desirable situationsince it requires both the user and facilitating programs to keep track of the differentallocation schemes, commands and locations of devices. It would be preferable if thesystem provided a common interface language for all devices, and a common interfacefor allocating the device. Instead of having this plethora of different programs andaccess ways, it would be convenient to have a single point to which all resource allo-cation requests goes, a resource manager agent. This agent can in turn create any num-ber of allocation agents that can negotiate the specific allocation by communicatingwith the device drivers.

A resource allocation can be preceded by much communication, trying to decidewhich device is the most suitable to allocate, how to allocate it, and for how long itshould be occupied. If the device or its device driver is situated on a remote machine,perhaps with a slow connection, this communication may take a while. In such casesthe allocation agent can benefit from transferring itself to some place closer to thedevice driver so it does not have to communicate over these bottleneck networks.

SolutionIn the service layer we have a resource manager, from which you request what deviceto use. The resource manager agent creates an allocation agent for the desired type ofdevice. This device allocation agent knows the specifics of the resource it is supposedto allocate, and can thus initiate a dialogue with the user or calling software requestingvalues for the parameters needed to set up the device. By hooking a policy agent to theresource manager the allocation agent can also acquire guidelines of how the companyviews the allocation. Another policy agent provides the allocation agent with theuser’s preferences.

In a distributed environment where the devices and resources may be spread over thenetwork the resource allocation agent gathers the information needed from policyagents and interaction with the calling party and sends a lackey with the allocation cri-teria to the driver’s host with the power to negotiate a deal. After a deal is struck thelackey returns to the allocation agent with this deal so that the allocation agent and thecalling party can proceed.

As with file agents the idea of a nursery can be utilized, supposing that there are manyallocation requests. The allocation agents will in such cases not exit when the alloca-tion is done, but instead stay on to take up a new job. The allocation agents can alsoutilize encryption agents and other security policies according to the security modelfor the device in question. The device driver agent willconsequently have certainaccess levels that it checks each request against. It can also monitor the number andsize of requests from a specific user and perhaps deny access when the user hasexceeded his quota. It can also contact a bailiff agent to make a transfer of capital forthe service performed.

The device allocation agent can also monitor the job as it is being processed and returnany error messages to the user. If you for example print a document on the companyprinter from your home you will want to know whether you can expect it to be lying atthe printer when you arrive thefollowing morning. The agent can monitor the progressof the print job, notifying you if a paper jams. Furthermore, the agent could find some-one that is present in the building and ask of him to fix the printer before giving up andnotifying you of the failure. Sticking to the printer example, if you have a very large

39

document to print and the company has an array of printers to take care of print jobsduring peak hours, the agent can split the job and print it in parallel on all or some ofthe available printers if you launch the print job during the idle hours.

All of the above suggestions are hypothetical ideas of what a device allocation agentcan do. This is naturally depending on the implementation of the agent. I am just hint-ing at ideas of usage that distinguishes themselves from an ordinary inflexible andstatic allocation scheme.

EvaluationBy embedding each resource request in an agent you achieve device independencesince you leave it to the agent to find out the specific parameters of a certain device byinteracting with both the device and the user. A parallel to UNIX would be if you hada single command, ‘allocate’, to allocate a device and had to answer a set of questionsto setup everything.

Usage of agents is highly suitable for batch jobs since you can provide the allocationagent with the tasks to perform as well and not only under which parameters it shouldwork. For interactive resources, using agents to allocate the resource may result inunnecessary overhead. On the other hand many tasks can be batched even if they nor-mally are not. In fact, anything that does not require inputand output in one operation,where the output is dependent on the input, can be batched. Even tasks with both inputand output can be batched, but this requires more intelligence from the agent perform-ing the job. In the case of an interactive device, the resource allocation agent can alsoprovide an advantage by letting the agent watch the device until it becomes available,signalling to the appropriate program that the resource is allocated and available forinteraction. The user or program will hence not need to manually retry if the device isallocated at the time of the initial request.

The mobility of the device allocation agent ensures that you can allocate any resourcefrom anywhere in the network and the device need not even know that the allocation isperformed by someone from another host. For security reasons it might be advisablefor the device agentto do some check anyway, to see whether the request comes froma trusted host, allocation agent, and user. This could also be done if the agent used theadvanced communications manager and the network to communicate the request, butthis way credentials could easier be faked and could also result in much network traf-fic over a possibly slow network connection.

5.4 Sync hronizationIn many situations, for example in database applications, the order in which messagesarrive is important. In a distributed environment it is also important in which orderthey are sent since part of the calculations may have been done on another machineand messages are sent to integrate the results. The traditional way to solve this is to useLamport’s algorithm [7] in which messages are timestamped according to a logicalclock, and where this clock is updated according to the messages received. This is aworking solution and I see no reason to modify this.

There is also the question of time synchronization, which is asomewhat differentproblem than event synchronization. Time synchronization is in many cases of graveimportance. A traditional ‘make’ program, for example, is extremely dependent onthat the clock in thefile server is synchronized with the clock in the workstation. If theclocks are not synchronized the ‘make’ program will not work, recompiling more thanis needed or perhaps nothing at all even if there are updated files.

40

As described in thechapter about operating system tasks, the problem with time syn-chronization is that there is no guaranteed time in which a network packet can bedelivered. This time is dependent on many factors like network load and the load ofthe sending and receiving machines. To remove at least the network factor, global timeshould preferably be broadcasted under the protecting umbrella of a jam-signal. How-ever, this would slow down the rest of the network, if it is at all possible. Using thejam-signal for anything other than a network collision would also be viewed as highlyunorthodox, and it would be better to introduce an entirely new signal to use for broad-casting the time.

SolutionEven if the network was quiet during the time signal, so that a certain delivery timecould be guaranteed, there is still the possibility that a computer is so loaded withwork that it cannot process the time synchronization at once. There are two solutionsto this. The most obvious solution is to let the network interface manage time synchro-nization messages, noting them with the local time. The network interface commonlyhave a very high priority since it works with interrupts, so this local timestamp can beconsidered accurate enough. Since you now have both the desired time and the localtime in the message it is an easy task to calculate the actual time at a later stage whenthe machine finds time to do so. This solution relies on that the network device have ahigh priority, and that it works on an interrupt-driven basis. Supposing that it is notinterrupt-based or that we do not know whether it is, we cannot assume that the localtimestamp will be the correct one. In such cases we need another strategy for synchro-nizing time. Fortunately time synchronizations need typically not be performed veryoften and can furthermore be scheduled to be executed during idle times like the nightshift. Using this assumption we can halt the entire network for the duration of the timesynchronization according to the solution below.

Once again I propose the use of a lackey agent, sent out by the machine responsible forthe time synchronization. This lackey can be sent at any time and it is the agent’s taskto agree with the machines which time would be suitable for a halt. This time may besignificantly different between different machines if their respective clocks are verymuch off the actual time, but should still be relatively close together in their distribu-tion around the real world time that the agent aims at for the synchronization event.Once the time of synchronization has been agreed upon it is the job of the lackey tomake sure that all processes are suspended from the agreed time and forward, beingresumed either by a time-out or by a completion of the time synchronization. As soonas the lackey has achieved total idleness, it sends a message to the time server statingthis. When it is safe for the time server to assume that all machines have ceased allother activity it broadcasts the time on a by now silent network. Since nothing else isrunning on the receiving machines they can process the time message at once, adjust-ing their internal clock accordingly. The machines can then resume their previousactivities and the time lackey terminates. On such a silent network with idle machines,it is also very easy to find out the propagation time to each machine by letting thetimeserver send a unicast message to each of the machines, which will in turn replyimmediately. This can be done once when a new machine is added or on commandfrom the system administrator and need not be done every time a synchronization isconducted. Knowing this propagation time ensures even better accuracy in the timeadjustment if you take the delay into calculation when adjusting the clock.

EvaluationThe main and most obvious disadvantage with this solution is that the network and themachines needs to be idle for an arbitrary period of time. This time may vary on differ-

41

ent machines, depending on the state of their internal clock. A machine that fallsasleep early has to wait for the slowest machine to achieve stillness. The algorithmthus have a tendency to penalize some machines more than others. On the other hand,the machines with the least load will most likely be the ones succeeding to fall asleepfirst which will minimize the effects of this penalty.

There is, of course, also the question of whether onecan suspend all activities on anetwork. In effect, the network needs to be stand-alone and not connected to any othernetwork like the internet to eliminate the risk that an external message enters the net-work looking for a web-server or something similar. The router can of course playalong and block all external messages for the duration of the time synchronization, butthis would require that the router is intelligent enough to support such a policy.

What, then, do I gain by using this model? I gain the accuracy of a VVW-signal,assuming that the network can be silenced. I also get a basis for a distributed timemanagement, since there is nothing saying that there needs to be a central time server.Any machine can initiate a time synchronization, so I can have a cluster of machinesthat decides which of them should get the time from the nearest atomic clock and dis-tribute it to the rest of the network. I still have the benefits of a ‘push’-solution, mean-ing that the server broadcasts the time instead of having each client asking the server.Furthermore I get an informal guarantee that all machines have received and set theirclocks according to some timestamp, since they should have been silenced by thelackey and the lackey should also have sent a message to the time synchronizer sayingthat it is ready to receive. As soon as the synchronization is complete, the lackeys canstart sending acknowledgments to the server as well, thus giving a more formalproofof that the synchronization succeeded.

There is, of course, no real need for the network to be silent or the machines to be idlewhen performing the time synchronization unless you want an extremely accuratetime. If a moderate time synchronization is enough it might be sufficient to just silencethe network, or perhaps to suspend all processes. It might even be enough to do noth-ing at all and trust that the network is fairly stable and the network agent has highenough priority for the synchronization to achieve the accuracy you wish.

5.5 Summar yThe examples above certainly do not constitute an entire operating system. They aresimply tasks that I have found more interesting and more suitable for applying anagent based approach to than others. Process management, memory management, andI/O management would make the operating system complete, but these parts are notthat different from other operating systems so there is no point in explaining anddesigning these further. The presentation of these in the previous chapter should giveenough information on how these differ from traditional approaches.

The area in which you gain the most benefits is, in my view, with the file agents.These provide a vast advantage in flexibility and understandability compared to a stan-dardfile system. To let the files be able to modify their own state gives a very clearconceptual notion of how the system works. It is no longer a program that modifies thefile, it is the file that modify its own state. In many cases the need for an external soft-ware can be removed altogether by letting the file take care of all of the work itself.

In the other two examples, resource allocation and time synchronization, the benefitsare not as clearly visible. Resource allocation can be done with the same possibility tointroduce policies even if it is not done by agents. In this case the main contributionwith my solution is that I identify these programs as being agents,I suggest that the

42

allocation agent can be mobile, andI alsosuggest that the agent can supervise the jobin a more involved way than current allocation strategies. As for time synchronization,this is today done with daemons that in effect are agents which is why I developed amore complex solution using mobile agents. The time synchronization example is agood example in many ways, showing a non-polled version of time synchronizationand also showing how to eliminate many sources of errors, but it is also an example ofmaking a relatively simple task too complicated.

One problem that all of the above solutions may suffer from is lack of performance.Having so many autonomous agents causes a large number of extra threads or pro-cesses, which in turn gives more context switches than in an ordinary operating sys-tem. This is naturally a majorAchilles’ heel. Even if computers are becoming fasterand faster the operating system should still not be time-consuming enough to benoticed. In the case offile system agents, the gains in flexibility can in many waysdefend the many context switches but with other tasks, like synchronization, it is prob-ably overkill to use agents.

Still, the three examples presented above are areas where one can gain at least somebenefits from agents. Another example springs to mind where agents would not bevery suitable, namely network packages.Embedding every network package with anagent would give each message the power to find the fastest or the cheapest way to itsdestination, and the sender would not have to worry about the package encounteringdifferent policies somewhere else since the policies are embedded in the networkpackage agent. Whereas this solution gives much power and control to the packages,the overhead cost in transferring the agent binary and start up on each routing host onthe way is simply too expensive. This is just one example where agents are absolutelynot recommendable. As always there is a trade-off between ‘cool techniques’ andpractical solutions. It is just a question of using agents where one can benefit fromthem, but one should not use agents in places where they clearly are unsuitable.

43

6. EvaluationIn the previous chapters I have described agents and operating system tasks. I have also designed anoperating system consisting of agents and furthermore given examples of more detailed tasks in thisagent-based operating system. Remaining is to evaluate the operating system designed against thecriteria set out for a modern operating system in Chapter 3. This chapter will evaluate the operatingsystem designed in Chapter 4 and Chapter 5 against these criteria. I will verify that the tasksrequired are fulfilled and hold a general discussion about the agent operating system solution.

6.1 Intr oductionAs I described in Chapter 3, an operating system should minimally manage processes,memory, I/O, and file systems. I also claimed that you need communication supportand security primitives in the operating system to make it complete and able to meetthe demands on a modern operating system. Going through these different areas I willinvestigate whether the agent operating system described in the previous chaptersmeets these requirements.

6.2 Process mana gementNaturally ABOS manages processes, otherwise it would not be an operating system.The core system takes care of the creation of processes like creating process controlblocks and such. It also manages the various run-queues. By putting all other logic intostand-alone processes that run in the kernel layer the agent operating system ensuresthat scheduling algorithms can be added and customized while the operating system isrunning. Since you can have more than one advanced process manager, you can alsohave more than one way to create a process. The easiest is to ask a process creation ofyour own manager, but you can also ask another manager for the creation. A processcan also move to another process manager if it needs to change scheduling algorithmwhile running. In this way a process is not limited to a certain scheduling policy dur-ing its entire lifetime. As the application proceeds into different stages it can changepolicy by transferring to another process manager.

Having the process managers as separate processes creates some interesting dilemmas.The most obvious is how the process managers should be selected to run without hav-ing any scheduling policies for themselves. The answer is that at least one processmanager is always present in the run-queue. This process manager should act as ananny to the other process managers and manage their process control blocks whenthey are unable to do so themselves. Another dilemma iswhen a process managershould be run. According to the standard scheduling model it should wait for its turnlike any other process, but for performance reasons it might be better if the processmanager is run as soon as someone sends requests to it. This question is more relatedto communication management, and I will discuss the problem further in this section.

The division between what the core process manager and what the advanced managersshould do is not clear in the current design. On one hand I claim that all schedulingpolicies should be handled by the advanced managers, and on the other hand I arguethat the core process manager should choose new processes to run. According to thedesign described, the core process manager does not even use preemption, but insteadwaits for the processes to yield. This results in a situation where one process can holdthe CPU forever, never yielding to let the advanced process managers run. To solvethis, we need to add preemption to the core process manager. Any process should beable to block for a specified time, and when this time has expired the core processmanager should be able to preempt the running process to run the blocked thread. To

44

avoid abuse, we can further limit this by only letting kernel agents be guaranteed exe-cution within real-time boundaries.

So far, the process management does not supply support for multiple threads within aprocess. Threads share the same memory space, but should otherwise be treated asseparate processes. Many approaches to developing agents, like that suggested whenusing April [31], use a method where every thread is a stand-alone process. I believethat an agent should not consist of more than one thread, or it will be too complex. Ifan agent needs more than one thread, it can equally well fork off a child-agent to solvethe extra task. However, it is an easy task to write a new process manager that supportsmulti-threading, should need arise. One can also solve many tasks with the use ofshared memory between full processes.

As with any preemptive multi-tasking operating system, ABOS has difficulties tomeet any hard real-time boundaries. A process does not know when or for how long itwill be run, so you can not guarantee any execution times. However, by communicat-ing with the process manager an agent can set up conditions that it for example “needsto have exclusive access to a CPU for xx clock cycles starting yy cycles from now”.The process manager can then make sure that all other activity is suspended or re-scheduled to another CPU before this time. It can furthermore influence the timer notto generate any interrupts to this CPU during the duration of the exclusive access. Thismeans that a process need not have a fixed priority. The agent can negotiate with theprocess manager to set an adequate priority at any given time. This is very differentcompared to traditional operating systems where the user commonly has to set the pri-ority manually, and is in some cases unable to change the priority once the programhas been launched.

There are no troubles involved in supporting multiple CPUs. The advanced memorymanagers might need to adapt their caching policies, but apart from this no extra mod-ifications are needed. The core will need to be aware of the other CPUs and be able tomove a process to a new processor, which implies that also the core process managerneeds to be replaced with a multi-processor version. The advanced process managersin the kernel can be aware of the extra CPUs, but it is not necessary. The core processmanager can keep the single run-queue and act as if it only had one processor even ifthe system actually has more. Multiple processors might however affect schedulingdecisions, so the advanced managers will probably benefit from being aware of theextra processors.

6.3 Memor y ManagementAs with process management, at least some control is needed in the core to be able torun the rest of the system. The core module only manages the assignment of virtualmemory pages to physical memory. The core can decide to move pages in memoryand copy or move the contents of a page, but eviction mechanisms and communicationwith the disk device for swapping and paging is handled by the advanced memorymanager in the kernel layer. Again, as with process management, you can have morethan one advanced memory manager, each having different functions. One managercan handle local memory and another shared memory while a third participates in themanagement of a distributed, network-wide pool of memory.

Bereaving the core memory manager of the page eviction algorithms causes overheadwhen a page fault occurs. In such cases the core memory manager needs to send amessage to a memory manager informing of the problem, after which the advancedmemory manager should communicate to the disk device manager and acquire therequested page and then send this back to the core memory manager. Optionally

45

another page needs to be evicted, meaning communication to the core manager to firstget available pages together with all the extra info needed about the pages and thenmore communication to receive the selected page and send this to the disk devicedriver. Supposing that it is one of the programs in this chain that had the original page-fault, you will enter an infinite loop trying to load the memory page, which will causea page fault, and so on. This is a problem that does not occur in a traditional operatingsystem since the entire kernel is always loaded into memory. Fortunately, there is asolution to this. Since it is the job of the memory manager to keep track of whichmemory pages that should be evicted to the swap disk, it can also choose not to evictpages that are part of this essential chain. This forces the memory manager to be awareof what kernel agents are involved when loading and storing a page to disk.

It is not at all clear how shared memory should work. A process who wishes to share amemory area should request this from the advanced memory manager. The managerfinds the appropriate pages and marks them as shared. When another process wishes toshare the pages he requests this from the advanced memory manager. As a reply hegets the size of the shared area, after which he sends another request to the managerwith a memory address to map the pages onto. Interesting is that a shared memory areacan have a name of arbitrary complexity. I prefer a text string describing the purposeof the memory, but a unique number is also possible.

The fact that the memory manager is an agent just as the processes are gives opportu-nities to communicate and negotiate with the memory manager. By knowing what anapplication is planning to do, the memory manager can customize the page swappingalgorithms to best suit the task at hand and thus increase performance. This is ofcourse assuming that the application informs the memory manager of its intentionsand negotiate to get a suitable paging algorithm. An application should be able to storea certain paging scheme under a certain name, so that it only needs to send a messageto the memory manager stating that “in the next xx clock cycles I intend to do theoperation nn” when it wishes to use a certain algorithm. The memory manager canthen adjust the paging algorithms accordingly, trying to join the demands of this algo-rithm with all the paging schemes that other applications are using.

6.4 I/O ManagementI/O management is the question of making devices visible for communication in a uni-form way. The different applications should not need to worry about the specifics of acertain device, but rather concentrate on what they wish to do with the device. ABOSis similar to ordinary operating systems in that each device is handled by a certainmodule, but unlike traditional approaches all communication with the device is donethrough the same communication channels as if they were any other process. Whereasthis approach enables a transparent allocation and utilization policy, it decreases per-formance. Many applications like games require direct and immediate response fromdevices, for example the audio device, and if each new sound effect should be sent andtreated as a standard IPC message, this real-time aspect might not be achieved. On theother hand, I have argued that IPC performance is improving, and that the use of IPCinstead of in-process calls is no longer any restriction to performance. The loss of per-formance must, accordingly, be caused by something else. If you use an in-processcall, the execution is performed immediately after the call to the responsible method,but with an IPC call, there is no guarantee that the receiving process is run immedi-ately after the call. Indeed, if the call was sent asynchronously, it may not even bedesired to break the running process to start executing the receiving process. As withprocess management, this problem will be discussed further in the section about com-munication support.

46

Due to limitations in the hardware, the computer generally needs to be shut downwhen installing new hardware devices. This makes the flexibility and extensibility ofABOS somewhat malplaced. A feature perhaps more desired is that of dynamicallybeing able to replace device drivers with new or more specific versions. Supposingthat a system upon start-up detects a new device, it can start up a generic driver tomanage the device which can later be replaced by one from the hardware vendor with-out having to restart the system again. The fact that the device drivers are agentsenables interesting solutions to distribute work load as well. If a task involves muchpre-processing before the interfacing to the actual device, the device driver agent canspawn peers to help with this. The newly created agents will then communicate witheach other to gain exclusive access to the device once the processing has been done.

One of the main tasks of the operating system is to present the devices to the user in auniform and intuitive way. ABOS does not specify any particular way to categorizethe devices. It is the task of the I/O manager to keep track of the available devices, andto start up appropriate device drivers when new hardware is detected. Once a devicedriver is started, it will bind itself to a name in the communications manager. Allaccess to the device will henceforward be made through this communication channel.In this way, a device is presented as any other agent or resource, and communicationto it is done in the same intuitive way. The I/O manager can furthermore act as a skillserver, tagging each device with its capabilities and usage statistics. To further enableusers to find and communicate with the devices, a special directory agent can be addedto the file system that presents the devices much like the ‘/dev’ directory in UNIX. Ifthe file system does not support agents a special file system manager can be installedthat presents the device drivers as block or character devices, again in the fashion ofUNIX.

6.5 File system Mana gementIn the basic kernel design nothing is said about how the actual file system should bemanaged. All that is said is that you have one or a number of file system managers thateach manage a certain type of file system. So does for example one manager handleUNIX file systems while another handles NTFS file systems. This division of file sys-tems into different agents enables an interesting feature, namely that one partition on ahard disk can contain more than one file system. Suppose that a system has been run-ning FAT and now wishes to change to NTFS, no files needs to be converted; you sim-ply let both file systems coexist on the same disk. New files that are created will getthe NTFS structure, and old files will be read from FAT and perhaps converted aswell. In this example all files will eventually have been converted to NTFS, but theagents can be set up to coexist so that no file system is favored.

Being able to extend and replace these low-level interfaces as the file system managersare gives possibilities to extend the set of operations allowed on a file system. So canfor example a FAT file system be extended with the ‘defrag’ operation that wouldmove around blocks on the hard disk to enhance access times. All file systems can beextended with a backup operation, and in some file systems one can force a garbagecollection. In traditional operating systems, distribution of a file system is done by acertain part of the operating system, and not by the separate directories and files as inthe agent file system. This means that to, in ABOS, add file system access to a hostrunning a traditional operating system you would need to write either a separate SMBmount point agent or a separate SMB file system manager.

The agent file system described in Chapter 5 brings enormous flexibility to add andmodify the default behaviour of the file system. As stated, many applications can beremoved altogether by replacing them with a special file agent. The common problem

47

with applications polling a certain directory for new files to process can be avoided bywriting a new directory agent that takes care of the file as soon as it is added. Such anopening can work as an intuitive access point for other software that do not necessarilyknow the interfaces of the program responsible for the directory agent.

An agent-based file system gives better robustness than ordinary distributed file sys-tems, because files can be transferred to another host if need arises. If the server is shutdown for maintenance, the open documents can move themselves to the host wherethey are edited and wait for the server to come on-line again. This means that the userwill not notice that the server is gone. The intelligent mount-point agent that switchesthe mount to another file system on another host also guarantees that a server shut-down and restart can occur virtually unnoticed by the users.

Every type of file does not benefit from the same caching techniques or even storagetechniques. The fact that every file in the agent file system is an autonomous agentenables every file to be cached and stored in a way that suits the file best. A databasefile must not be cached by the operating system, whereas a temporary file shouldmaybe not be stored at all. These are strategies that can be implemented with the agentfile system. As for storage, the database file may wish to be stored sequentially for fastsearches and a log-file may wish to be stored close to the centre of the hard disk tominimize disk arm movement when storing the data. These preferences will be negoti-ated with the file system managers to achieve an optimal situation.

The agent file system also enables a truly distributed file system by letting all harddisks on the entire network participate in the global file system. This would mean thatall the disk space that today is left unused after installing the local operating systemcan be utilized. It would, however, also mean that the global file system is dependenton a multitude of disks to be fully operable all the time. It would only require someoneto trip over a wire to bring the entire network to a standstill. Accordingly, I do notargue for the completely distributed solution, instead I recommend the use of one ormore central disks and file servers.

The problem that every file will grow to include the code for the agent as well hasbeen discussed previously. This is of course not good, but measures can be takenagainst this by storing the agent code separately and merely store the state togetherwith the file. Having two or more physical files per logical file will cause overheadand hence reduce performance. In fact, having to let every file request go through thefile agent to do some processing will also inflict on the performance. Fortunately, harddisks are magnitudes slower than processors, so some extra milliseconds will probablynot be noticeable. Greater possibilities for smart caching and customized behaviors ofthe files will hopefully also reduce the performance penalty, provided that the possi-bilities are used.

Today, when many applications are object-oriented, it is sometimes convenient to beable to store each object as a separate file. A particular example is the EPIDEMICcompiler [53], where each node in the parse tree is represented as a separate object.When you change the source code, the parse tree should update accordingly. For largefiles it would be convenient if parts of the parse tree were dumped to disk to savememory. With the agent file system this can be done. The compiler objects would stillbe notified when there is a change even if they are currently lying dormant on the harddisk, and this functionality need not even be implemented into the compiler.

48

6.6 Comm unication suppor tBecause of the extremely modularized structure of ABOS, the demands on communi-cation support are very high. First of all, the agents should be able to communicatelocally as smoothly as if they were parts of the same process. An IPC call should pref-erably not be much slower than a traditional function call to make the ideas of anagent-based operating system acceptable. The solution described is message based ascompared to the stream-based solutions that is common in traditional operating sys-tems. The communication that goes through the core communication manager is how-ever stream-based, but is not fitted for lengthy transmissions like those of a UNIXsocket or pipe. The communication channels are furthermore one-way pipes, forcingreplies to be sent as separate messages craving separate communication channels. Thissolution makes it possible and even easy to use asynchronous message transfers.

I have already mentioned the scheduling problem that arises from communication. Inmany cases like when trying to communicate with a process manager it is not accept-able to wait for the recipient to run according to the preset scheduling algorithm. Therecipient should preferably be run immediately after the caller has sent the message orwhen it has finished its time quantum. This can be done by letting the core communi-cations manager talk to the core process manager and re-schedule the recipient tosome high-priority message-receive-queue, so that it is run immediately after the send-ing process’ time quantum has run out. If the sender wants a synchronous messagetransfer, it will most likely wait for a reply immediately after it has send the originalmessage, so this policy would guarantee response times not very different from a stan-dard function call. The other agents that are not part of this particular exchange ofmessages will however suffer from such a strategy. If two agents engage in a lengthydiscussion no other process will get a chance to run; they will starve.

To avoid such problems, the process managers will mark each process with a flag inits process control block stating whether it is a fast-response, custom-response, or astandard-response agent. A fast-response agent will reply immediately, and is re-scheduled entirely by the core. A standard-response agent will not be re-scheduled inany way, and a custom-response agent will cause a trigger from the core process man-ager to the advanced process manager which can then place the agent into a queue ofarbitrary choice. Kernel agents are always fast-response agents, assuming that kernelagents are more considerate than other agents and are not likely to engage themselvesin any lasting discussions.

Without having stated exactly how the communication should work, down to the levelof how the bytes are transferred to the recipient, it is very hard to say anything aboutthe performance. I believe that one can make primitives that fulfills the goal of notcosting more than a function call. Supposing this, it is hard to find anything negativeabout the communication strategies. The advanced communication manager will takecare of all network messages, as well as being able to set up more than one channel toa given agent. These extra channels are provided to give logical names to services, butdoes in effect not provide more functionality than a single mailbox. Being able to dis-tribute names over the network helps in creating distributed applications since the cli-ents need not explicitly know the location of a service as long as it is on the samenetwork. If it is on another network, the domain server can take care of requests androute them to the right host, whereas the clients only need to know what network theservice is available on.

To find a service, it is necessary to know the name of the communication port, whichmakes it necessary to be able to query the advanced communications manager aboutwhat names are available together with some description of the skills of the agent

49

responsible for a certain channel. This could also be handled by a separate skill serveragent, which will be referred to by the advanced communication manager when que-ried about the tasks of some agent.

6.7 Sync hronizationThere are two different categories of synchronization in an operating system. The firstand most important is that of synchronizing messages so that they arrive or appear toarrive in the same order as they were sent. If a distributed system should appear as if itwere a single system it is also important that all participating hosts share the sametime.

Whereas message synchronization and ordering is handled by the communicationsmanager in the kernel layer and by library functions within each process, time syn-chronization is not part of the kernel functions. This is also visible in ABOS, wheretime synchronization is done by agents in the service layer. The synchronization is ini-tiated and performed on the command of a single machine so one can guarantee thatall machines actually gets synchronized. If one follows the algorithm fully, silencingboth the network and the machines before the synchronization, one is also guaranteeda highly accurate synchronization compared to the traditional approaches.

I have already argued against the solution in Chapter 5, claiming that it is unnecessar-ily complex and that the need for the participating machines to be idle during thelength of the synchronization procedure makes it waste valuable computation time.One can also question the necessity for a mobile agent. The whole operation canequally well be performed using simple messages. There are at least some advantagesin having the agent mobile, though; you do not need to start up any special software onevery client when you boot it up and you do not need a certain agent on each clientspending most of its time in an idle state, waiting for the synchronization message.The mobile agent will instead transfer itself to each host and begin to execute when itis time for the synchronization, and can disappear when the synchronization is done.

Unlike the traditional ostrich algorithm, ABOS can implement deadlock detection andrecovery. An agent can when it requests a resource learn who is holding the resource.Communicating with this other agent, you can find out which other resources it isholding and whether it is blocked waiting for another resource, and in such caseswhich resource. By introducing the requesting and the holding agent to each other, theholding agent can also store information as to what resources it indirectly blocks byholding a certain resource. It can then decide to give up one of the resources, abort theoperation it is working on, or ask the requesting agent to give up one of its resourcesthat is directly or indirectly needed. I am thus attacking the circular wait condition byway of the no preemption and the hold-and-wait conditions, assuming that it is safe tohold a resource as long as no other process needs it, and that I am not indirectly wait-ing for myself to release a resource. The detection algorithms can be implementedboth in the agents themselves, or in the resource allocation agent. Where to put theintelligence is just a matter of taste. However, to make sure that the problem is notignored by application developers, it may be best to implement it in the resource allo-cation agent.

6.8 SecuritySecurity is not only the issue of protection against malignant users, it is also to protectagainst and recover from faults in the operating system and system crashes. It is oftenpossible to distribute and replicate software over a set of machines to protect againstsuch system crashes. The trouble in distributed systems is commonly that instead of

50

having one single source of error, you end up depending on several machines to beoperable. The entire structure of ABOS is done to enable distribution. There are veryfew central components that can break down and cause delays for the users. Someagents, chiefly in the service layer, can be shared among a set of computers thus mak-ing them centralized but they can also be implemented on each host, communicatingin a distributed manner. For the unavoidable central components like file servers thereare schemes allowing redundant servers and the clients change to backup serverstransparently for the user. The caching mechanism supported by the agent file systemsthat files actually move to the host where they are used the most can cause loss of dataif the hard disk where the file currently resides should break down, but this loss can beminimized by sending images of the file to the server at regular intervals.

I have earlier argued that mobility is often unnecessary, and that most tasks can behandled by standard message-based communication over the network. To protectagainst unauthorized entry, one can think that mobility is even more undesirable.However, actually sending the code to execute on the receiving side brings someadvantages to the security situation. If you, as an agent, get a message it may be hardto determine that this message originates from a trusted agent that is not up to somemischief. If the code is transferred to your host you can let it pass through a filter muchlike a virus check before starting to communicate with it. If the host from which theagent originates is on the local network this search can be done on the sending side andeliminate the need to transfer the agent code, but if the agent comes from another net-work you have no way to trust it without actually checking the code.

Even with such scans, ABOS is in its current state very open for virus attacks becauseof the flexibility that I have prized so much. Malevolent users can write their own ker-nel agents and install them, replacing the real ones. Obviously some sort of securitycheck needs to be made to ensure that a certain user has the access rights necessary toinstall something into the kernel. This security check can thus allow some users toinstall kernel agents, whereas others may only run them. Other users again can beallowed to install and/or run agents in the service layer, and some users may only beallowed to run programs in the user layer. This gives a security model where practi-cally everything of interest can be configured on a per-user-basis. Because files areactive entities, and because anyone should be allowed to write their own file agent, theagent file system will be a major breeding ground for virus attacks. I fear that peoplewill use the opportunity for flexibility just as they have used the macro-feature in pro-grams like Microsoft Word. What is even worse is that the files can migrate by them-selves to new networks and hosts, and the user has no opportunity to select whether afile should be allowed to run or not.

Because of the possibility to add more managers in the kernel that communicates withthe default ones, you achieve high fault-tolerance. If one process or memory managerblocks or bugs on some operation, only the processes involved with this manager willnotice this, and all other applications will continue as usual.

6.9 PerformancePerformance was not one of the goals when I designed ABOS. Nevertheless I havetried to keep it in mind when designing, trying to avoid the most ineffective solutions.In many cases, like the agent file system, having to execute code in situations thattoday are done as a simple disk-to-memory operation will of course slow down theoverall performance of the system. It is my hope that by executing these extra bits ofcode some smart behaviour can be implemented to negate this effect by speeding upthe perceived performance for the user.

51

Scheduling decisions will also cause some troubles with performance if one is notcareful. Operations that in traditional operating systems are performed as a functioncall should be done in a similar way in ABOS by ensuring that the receiving agent isexecuted immediately after the calling agent. However, care must be taken to avoidstarvation of the other processes when a lengthy discussion is being held. One way toavoid this is to let the receiving agent share the same time quanta as the calling pro-cess. In this way no extra time is allotted for communicating agents.

The amount of context switches can be a liability compared to a traditional operatingsystem. For every context switch CPU registers needs to be saved, memory referencesswitched, open files noted and stored, and so on. This takes some time, and doing thisoften can result in reduced performance. A short survey yields that a common situa-tion in a UNIX system is to have 40 to 80 processes in total, but in ABOS there can bethis many processes even before users have logged in and started any applications,which should prove that there is an increase of context switches in an agent-basedoperating system.

By installing a network process manager you can distribute work over the hosts in anetwork. There is an overhead in finding a suitable host and perhaps transfer the code,but for processes running a longer period of time this overhead can be worthwhile.This is valid for agents without any user interface, but if the agent requires input fromthe user, the demands on the network bandwidth increase. This can be solved by justdistributing computation-intensive tasks as separate agents to other machines. Alltasks that involves user interaction will remain on the local host. This means that theagents themselves must decide whether they are suitable for load balancing, which isin accordance to the rest of ABOS that encourages such local decisions.

It is possible to achieve real-time execution by negotiating with the process manager.Having real-time execution means that ABOS can be used in embedded equipments,and that hard real-time deadlines can be met while at the same time have a multi-task-ing environment. On a single-processor machine, the system is of course not multi-tasking while a process is running in real-time mode. This implies that real-timebehaviour should only be allowed for shorter periods of time to let other processes runas well. When an agent is running in real-time mode all agents that it calls should bepromoted to real-time mode as well, provided that the call is synchronous. If an asyn-chronous call is made this can cause some problems because the called agent has noway of running on a single-processor system until the real-time process has finishedrunning.

The fact that the scheduling priorities can be customized to almost any extent and evensupport real-time execution implies that the performance can be very much tuned forthe specific applications. By allowing the applications to themselves negotiate for achange in priority also means that better performance can be achieved when needed.To avoid abuse, there should be a cost involved in running a higher priority, so that noprocess runs indefinitely at an unnecessarily high priority. The system is scalable toany number of processors, provided that the core process manager is programmed insuch a way, which implies that performance can always be improved by installingmore processors. Whether the performance improvement is linear or not I will not tryto evaluate.

6.10 OtherIn the design of ABOS there is no mention as to what decides whether an agent is akernel agent or a service agent or a standard user program. This is an implementationdecision, but could have severe impact on the total of the model. The agents can them-

52

selves decide whether they are kernel agents, or they can be assigned a place in thesystem. If they are assigned a place, someone needs to supervise that they keep them-selves to this layer and someone needs to maintain lists of what agents to start in a cer-tain layer. If the agents decide themselves you do not need this supervisory function,but you still need some check to ensure that the agents do not exceed their authorities.

An agent should be aware of its environment and should be able to decide whether itwishes to reside in the kernel layer or the service layer, but this also implies that theagent is aware of it being a part of the operating system which makes it more difficultto incorporate agents from other vendors into the system unless they have been cus-tomized to fit the agent operating system. There is also the question of abuse. If theagent claims to be a kernel agent, someone must perform a check to ensure that theuser who launched the agent is authorized to start up kernel agents. You can obviouslynot trust the judgement of the agent itself in such cases. Considering all these aspects,I believe that it is desirable to have a certain agent that installs and supervises the otheragents.

In a multi-tasked operating system it is common to use an interrupt-based approach tocommunicate with hardware. To have the entire operating system halt and wait forsome device I/O is naturally not possible. The interrupt signal, when it comes, shouldbe directed to the process that is interested in the signal. This process may not be theone that is running when the signal comes so some mechanism is needed to halt thecurrently executing process, make a note that an interrupt has arrived in the receivingagents process control block, resume running the halted process and finally notify theprocess that an interrupt has occurred when it is time to run it again. It may even beneeded to run the receiving process directly, allowing it to clear buffers before newdata arrives. Be that as it may, what is more important is that the core needs to recog-nize interrupts and to perform some action when one arrives. The core can alreadyhandle page fault interrupts and timer interrupts, but it must also be able to manage allother interrupts. The best way to do this would be to extend the core with an extramodule that takes care of incoming interrupts and directs them to the appropriate coremodule. The core process manager needs a table in each process’ PCB stating whichinterrupts the process subscribes for and a memory address to call in the process’memory space.

6.11 Summar yIn the evaluation above, I have tried to impartially both criticize and praise the designof ABOS. I have solved the problems encountered and have tried to come up with newideas of how to put the good things to use. In total this amounts to a better, more thor-ough design of ABOS. I believe that the main structure is as robust as it can be, andthat many of the problems one will encounter during development has been covered.What remains to be done is, of course, the tedious task of designing every core moduleand kernel agent in detail, and to implement the system. This is however out of thescope for this thesis, I believe that I have proven that agents can be used to constructan operating system, and that I can gain many benefits in so doing.

I have gone through great trouble to really use the flexibility provided by ABOS. Ihave created a situation where applications no longer need to adjust to a static operat-ing system. Instead, ABOS adjusts to the applications by adding new file agents, pro-cess managers and so on. This amounts to applications being able to execute on rawhardware, but still having the possibility to coexist with other applications runningconcurrently. In this sense I have kept the ideas from Aegis [6], but crafted the solu-tion in a completely different way.

53

In my solution applications from different vendors can use the agents provided byanother vendor. The applications need not be aware of which vendor a certain agentcomes from. Unlike Aegis, where you can at best share operating system componentswhen developing, you can in ABOS share them at runtime as well. This means thatsoftware from different vendors can be made to collaborate in ways that were notknown or expected when the applications were developed. The components can fur-thermore be replaced and upgraded at runtime without the applications knowing oreven noticing that this is happening.

One of the main idea behind ABOS is openness. Nothing should be so hidden in thekernel that clients cannot modify it. I believe that I have achieved this open system.Applications can customize the behavior of the kernel down to the smallest compo-nent. Interesting to notice is that such a customization is only noticeable for a subset ofthe processes, those that wants to use the particular scheme. Other applications willnot be disturbed by these tunings, and can continue to work as normal.

54

7. ConclusionsThis section concludes the work done in the previous chapters. A discussion is held regarding whatI have achieved, and where to go from here.

7.1 Summar yThis thesis consists of several parts. A survey of how operating systems are built todayis followed by a description the major trends in operating system research, showingthat whereas the current operating systems are built as large monolithic kernels, thereare a number of interesting research operating systems. These research systems com-monly focus on object orientation and modularization.

Following a description of agents and some of the concepts commonly connected toagents, is an investigation of what has been done with agents in operating systemsalready after which, in the realization that this area of research is still unexplored,there is further motivation for the suitability of agents in operating system kernels.

In order to be able to design an operating system, it is vital to know what tasks anoperating system should be able to perform. Chapter 3 is spent identifying and dis-cussing what tasks an operating system should perform, giving examples of commonsituations and trying to find pitfalls.

The two chapters following are spent designing ABOS, an agent-based operating sys-tem supporting the tasks identified earlier. The rough design in Chapter 4 is furtherrefined by designing some tasks in even more detail. Chapter 6 is spent evaluating thedesign to ensure that the requirements has been met.

7.2 Conc lusionsThis thesis has achieved a confirmation that agents are indeed suitable for use in oper-ating system kernels, and that you can in fact base the entire operating system onagents that interact with each other. To use agents instead of traditional approachesensures that you get greater flexibility, as well as interesting opportunities to make adistributed operating system.

The design of ABOS shows some interesting qualities which are until now, and to mybest knowledge, unheard of. The service layer is an example of such a feature. Thisextra layer in the operating system does not provide any extra functionality in itself,but is a very good way to categorize applications that are not quite kernel programsand not quite user applications either. By having a special layer for such services youcan generate more detailed priority, access, and scheduling policies.

The agent-based file system described in Chapter 5 shows that there is still much workto do regarding how we perceive file systems. For decades virtually no attempt hasbeen made, and certainly none has succeeded to add to or change the concept of filesand directories. I believe that the agent-based file system will introduce a change inthe vision of file systems, even if some aspects like security speaks against it. Agentsas files provides so much new in flexibility and usefulness that it is my firm belief thatthe concept will be used even in commercial operating systems.

ABOS can also be used in real-time environments. There is a possibility to affect thescheduling for a process in such a way that it can gain exclusive access to the CPU forlimited periods of time. Compared to other real-time systems, ABOS is much moreflexible in that it does not use static scheduling for a predefined number of threads.

55

Unlike the classical operating systems you do not have to change the priority of a pro-cess manually, something which often renders the user incapable of lowering the pri-ority again.

Furthermore, in ABOS a situation is created where the operating system adjusts to theapplications currently running. The traditional approach is otherwise to assume thatthe operating system is static, and instead write the application to circumvent the lia-bilities in the operating system. If you can modify the kernel, there is no way to mod-ify it for a single application only. The changes you make is visible to all otherapplications as well. In ABOS the situation is different. Here, you can modify the ker-nel by adding new kernel agents and negotiate with the existing ones, and the changesyou thus make to the kernel is only visible to your application.

7.3 Future w orkThe design and evaluation of ABOS in the previous chapters is so far only a paperproduct. The next step is of course to implement the core and at least some of the ker-nel agents. This, however, is a much greater task than the scope of a master’s thesis.

After the core and kernel has been implemented, a further evaluation should be madeto verify that it works as a real product as well as a mind experiment. I would not besurprised if many things need to be made differently, but I also believe that the basiclayout will remain, as will the flexibility.

56

References1. Bershad, B.“The Increasing Irrelevance of IPC performance for Microkernel-Based Operating

Systems”, USENIX Microkernels Workshop, 1992.

2. Dearle, A., Rosenberg, J., Henskens F., Vaughan, F., Maciunas, K.“An Examination of OperatingSystem Support for Persistent Object Systems”, Proceedings of the 25th Hawaii International Con-ference on System Sciences, vol 1, 1992.

3. Douglis, F., Ousterhout, J.,“Transparent Process Migration: Design Alternatives and the SpriteImplementation”, Software-Practice & Experience, 21(8): 757-785, August 1991.

4. Homburg, P., van Steen, M., Tanenbaum, A. S.“An Architecture for A Scalable Wide Area Distrib-uted System”,Proceedings of the Seventh ACM SIGOPS European Workshop, ACM, New York,pp. 75-82, 1996.

5. Hamilton, G., Kougiouris, P. “The Spring nucleus: A microkernel for objects”, Proceedings of the1993 Summer Usenix Conference, June 1993.

6. Engler, D. R., Kaashoek, F., O’Toole Jr., J. “Exokernel: An Operating System Architecture forApplication-Level Resource Management”, Proceedings of the Fifteenth Symposium on OperatingSystems Principles, December 1995.

7. Tanenbaum, A. S.“Modern Operating Systems”, Prentice Hall, Englewood Cliffs, New Jersey07632, 1992.

8. Stallings, W. “Operating Systems, 2nd edition”, Prentice Hall, Englewood Cliffs, New Jersey07632, 1995.

9. Rashid, R., Baron, R., Forin, A., Golub. A., Jones, M., Julin, D., Orr, D., Sanzi, R.,“Mach: A Foun-dation for Open Systems.”, Proceedings of the Second Workshop on Workstation Operating Sys-tems (WWOS2), 1989.

10.Rashid, R., Julin, D., Orr, D., Sanzi, R., Baron, R., Forin, A., Golub, D., Jones, M.,“Mach: A Sys-tem Software Kernel.”, Proceedings of the 34th Computer Society International Conference COMP-CON 89, 1989.

11.“The Common Object Request Broker Architecture and Specification”, BNR Europe Ltd., 1995.

12.“NDS For NT”, Novell Inc., 1997.

13.Walli, S. R.“OPENNT: UNIX Application Portability to Windows NT via an Alternative Environ-ment Subsystem”, Proceedings of the USENIX Windows NT workshop, Seattle, Washington, 1997.

14.“JavaStation NC. Slashing the Total Cost of Ownership for Fixed Function Applications”, Site onthe Web: http://www.sun.com/javasystems/krups/ , Sun Microsystems Inc., (Date Unknown).

15.“Sun Announces New HPC Server Clustering Capabilities Supporting up to 256 Processors”, SUNpress release, Palo Alto nov 11., 1997.

16.Wallström, M.“NT passerar Unix inom arbetsstationer”, Computer Sweden, 98-03-16.

17.Kessler, P. B. “A Client-Side Stub-Interpreter”, Proceedings of ACM Workshop on Interface Defini-tion Languages, January 1994.

18.van Steen, M., Homburg, P., van Doorn, L., Tanenbaum, A. S., de Jonge, W. “Towards Object-basedWide-Area Distributed Systems”, Proceedings of the Fourth International Workshop on Object Ori-entation in Operating Systems, IEEE, New York, pp. 224-227, 1995.

19.van Steen, M., Hauck, F. J., Tanenbaum, A. S.,“A Model for Worldwide Tracking of DistributedObjects”, Proceedings of TINA 96, Eurescom, pp. 203-212, 1996.

20.Higgs, B. J.“History of Object-Orientation in Programming Languages”, Lecture slides availableat: http://niagara.rivier.edu/staff/bhiggs/FrontPageWebs/OOPandCPP/history/history.htm , RivierCollege, 1998.

21.Brookshear, J. G.“Theory of Computation. Formal Languages, Automata, and Complexity”, Ben-jamin-Cummings, 1989.

57

22.Higgs, B. J.“What Problems are We Trying to Solve? The Motivating Forces behind Object-Ori-ented Programming and Design”, Lecture slides available at: http://niagara.rivier.edu/staff/bhiggs/FrontPageWebs/OOPandCPP/why/why.htm, Rivier College, 1998.

23.Johansen, D., van Renesse, R., Schneider, F. B. “Operating System Support for Mobile Agents”,Proceedings of the 5th IEEE Workshop on Hot Topics in Operating Systems, pp. 42-45, 1995.

24.Nwana, H. S.“Software Agents: An Overview”, Knowledge Engineering Review, Vol 11, No 3, pp.205-244, October/November 1996.

25.Morrison, M.“Presenting JavaBeans”, Sams.net publishing, 1997.

26.Persson, E.“Component Technology. Infrastructures and Enabling Technologies - a Short Survey”,LU-CS-TR:97-197, LUTEDX/(TECS-3078)/1-71/(1997), Department of Computer Science, LundUniversity, 1997.

27.Fredriksson, M.“Agent Oriented Programming”, Unpublished paper, University of Karlskrona/Ronneby, 1997.

28.Cohen, P. R., Cheyer, A., Wang, M., Baeg, S. C.“An Open Agent Architecture”, Proceedings of theAAAI Spring Symposium on Software Agents, pp. 1-8, 1994.

29.Liu, J-S., Sycara, K. P. “Multiagent Coordination in Tightly Coupled Task Scheduling”, Proceed-ings of the First International Conference on Multiagent Systems, pp. 181-188, 1996.

30.Labrou, Y., Finin, T. “Semantics and Conversations for an Agent Communication Language”, Pro-ceedings of the Fifteenth International Conference on Artificial Intelligence, pp. 584-591, 1997.

31.McCabe, F. G., Clark, K. L.“April - Agent PRocess Interaction Language”, Department of Com-puting, Imperial College, London, 1994.

32.Halsall, F. “Data Communications, Computer Networks and Open Systems, Fourth Edition”, Addi-son-Wesley Publishing Company Inc., 1996.

33.Rus, D., Gray, R., Kotz, D. “Transportable Information Agents”, Proceedings of the InternationalConference on Autonomous Agents, pp. 228-236, 1997.

34.Woolridge, M., Jennings, N. R.,“Pitfalls of Agent-Oriented Development”, Proceedings of 2ndInternational Conference on Autonomous Agents (Agents-98), Minneapolis, USA. (to appear)

35.Ekdahl, B. “Computerized Agents from a Linguistic Perspective”, Ph.D. thesis, Department ofComputer Science, Lund University, 1997.

36.Ousterhout, J. K.“Tcl and the Tk Toolkit”, Addison-Wesley, Reading, Massachusetts, 1994.

37.Genesereth, M. R., Singh, N. R.“A Knowledge Sharing Approach to Software Interoperation”,Computer Science Department, Stanford University, 1994.

38.“FIPA 97 Specification. Part 2: Agent Communication Language”, Foundation for Intelligent Phys-ical Agents, Geneva, 1997.

39.Ford, B., Susarla, S.“CPU Inheritance Scheduling”, Proceedings of OSDI'96, October 1996.

40.Ballesteros, F. J., Fernández, L. L.“Advice: An Adaptable and Extensible Distributed Virtual Mem-ory Architecture”, Proceedings of the IASTED PDCS'96, Chicago IL, 1997.

41.“Middleware -- The Essential Component for Enterprise Client/Server Applications”, InternationalSystems Group, Inc., New York, 1997.

42.“Java Remote Method Invocation - Distributed Computing for Java”, Sun Microsystems, Site onthe Web: http://java.sun.com/marketing/collateral/javarmi.html, 1998.

43.Fredriksson, M.“Active Documents and their Applicability in Distributed Environments”, Unpub-lished Master’s Thesis, University of Karlskrona/Ronneby, Pending release 1998.

44.Mills, D. L. “Internet Time Synchronization: the Network Time Protocol”, IEEE Trans. Communi-cations 39, 10 (October 1991), pp. 1482-1493, 1991.

58

45.Matthews, J. N., Roselli, D., Costello, A. M., Wang, R. Y., Anderson, T. E. “Improving the Perfor-mance of Log-Structured File Systems with Adaptive methods”, Proceedings of the Sixteenth ACMSymposium on Operating System Principles, 1997.

46.Ward, B.“The Linux Kernel HOWTO”, Site on the Web: ftp://ftp.funet.fi/pub/Linux/doc/HOWTO/Kernel-HOWTO , 1997.

47.Barrus, F., “Shag/OS: A Small, Dynamic, Object-Oriented, MicroKernel-based Operating System”,Unpublished study-project proposal, Rochester Institute of Technology, 1995.

48.van Doorn, L., Homburg, P., Tanenbaum, A. S.“Paramecium: An extensible object-based kernel”,Proceedings of Hot Topics in Operating Systems V, ACM, New York, pp. 86-89, 1995.

49.Homburg, P., van Steen, M., Tanenbaum, A. S.,“The Architectural Design of GLOBE: A Wide-AreaDistributed System”, Technical Report IR-422, Vrije Universiteit Amsterdam, 1997.

50.Russinovich, M. “Inside the Windows NT Scheduler, Part 1 & 2”, Windows NT Magazine, July &August 1997.

51.“Norton Your Eyes Only 4.1 for Windows NT/Windows 95”, Symantec Corporation, Site on theWeb: http://www.symantec.com/yeo/fs_yeo41-95nt.html , (Date unknown).

52.“NFS Software from Sun: Bringing the Enterprise to your Desktop”, Sun Microsystems, Site on theWeb: http://www.sun.com/netclient/wp-nfs.sw/ , (Date unknown).

53.Hall, F., Hellspong, M.“EPIDEMIC: A Framework for Object Oriented Compiler Construction”,Unpublished Master’s Thesis, University of Karlskrona/Ronneby, 1998.