Page 1: [IEEE MILCOM 91 - Conference record - McLean, VA, USA (4-7 Nov. 1991)] MILCOM 91 - Conference record - Application of knowledge-based network management techniques for packet radio

APPLICATION OF KNOWLEDGE-BASED NETWORK MANAGEMENT TECHNIQUES FOR PACKET RADIO NETWORKS

R. J. Doyle and A. R. K. Sastry
Rockwell International Science Center
1049 Camino Dos Rios, Thousand Oaks, CA 91360

Abstract: We have developed a preliminary version of a knowledge-based model for network management and reconfiguration using blackboard techniques and have applied it initially to packet radio networks. Monitoring and control of a large network can be improved with techniques for automated network management implemented by a network controller. The major functions of a network controller are (a) configuration management, (b) monitoring, and (c) fault diagnosis. Our work is concerned with developing procedures for evaluation of candidate recovery/reconfiguration methodologies and techniques for fault isolation and related monitoring functions. As an initial step, we chose the Generic Blackboard (GBB) as our AI environment to develop the management tools and interlinked it with a packet radio network simulator that is used as a test bed network to be controlled and monitored. In this paper, we describe the details of the interaction of the management environment and the packet radio simulator as implemented in our model so far and present numerical results obtained through the execution of some preliminary rules.

1. INTRODUCTION

Monitoring and control of a large network can be improved with techniques for automated network management implemented by a network controller [1]-[3]. This requires developing procedures for evaluating candidate recovery/reconfiguration methodologies and techniques for fault isolation and related monitoring functions. Towards this goal, we initiated work on knowledge-based network management and reconfiguration and used a packet radio network as an example of the network to be controlled.

Our long-term emphasis in this effort will be on generic techniques not tied to any particular network. Initially, however, we intended to carry out a successful demonstration of the coupling of a packet radio communications network simulation with the Generic Blackboard (GBB) development system [4], [5], the latter acting as a distributed network manager.

Our research is specifically directed towards a decentralized concept in which we envision an autonomous management function attached to each network node. The management function at each node implements the same control policy and operates under a common set of rules; however, each responds autonomously to the situation as locally observed. This form of distributed control can prove more robust for certain types of networks. Figure 1 shows the management scheme.

In Section 2, we describe the blackboard paradigm. Section 3 details our approach to the network management problem. Section 4 focuses on the packet radio application. Section 5 is concerned with the simulation implementation. Section 6 reports some preliminary results, and Section 7 contains concluding remarks.

2. THE BLACKBOARD PARADIGM

AI researchers have applied the blackboard paradigm to a number of complex problems. It is particularly effective in applications involving dynamic distributed elements and is thus well suited to the problem of network management. The blackboard paradigm is characterized by three salient features: knowledge sources, a blackboard, and a control shell.

Figure 1. Autonomous Management Concept.

Knowledge Sources. Domain-specific knowledge is partitioned into a number of independent modules called knowledge sources. Each knowledge source is responsible for a certain restricted aspect of problem-solving. Presumably, a knowledge source has some particular area of expertise. A knowledge source may be implemented as a rule-based or procedural element.

Blackboard. The blackboard is a shared data structure to which all knowledge sources have access. A knowledge source may observe, add, modify or delete information on the blackboard. Thus the blackboard provides for anonymous communication among the knowledge sources. Blackboard data may not be relevant to the problem solution, or its relevance recognized, until it has been on the blackboard for some period of time. Hence the blackboard provides global memory for the knowledge sources.

Control Shell. The control shell is a set of functions that implement a control model that initiates and controls problem-solving activity. It includes a mechanism that 'triggers' the knowledge sources in accordance with certain defined blackboard events. Presumably such an event signals the posting of data on the blackboard relevant to the knowledge source's area of expertise. When triggered, a knowledge source will search the blackboard for other data related to the triggering event. This may cause the knowledge source to be scheduled for execution, or alternatively removed from further consideration. Knowledge source execution ultimately should result in other data being added to the blackboard that will trigger other knowledge sources, thereby advancing the problem solution.

CH2981-919110000-0414 $1.00 © 1991 IEEE MILCOM '91

In our distributed concept, the manager at each node will have its own blackboard or set of blackboards; however, the associated knowledge sources will be identical.
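The interplay of the three elements can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the LISP-based GBB system, and it does not reproduce the GBB API. The names `Blackboard`, `subscribe`, `post`, and `congestion_ks`, as well as the backlog threshold of 10, are assumptions made only for illustration.

```python
from collections import defaultdict


class Blackboard:
    """Shared data store; posting to a space triggers the subscribed knowledge sources."""

    def __init__(self):
        self.spaces = defaultdict(list)     # space name -> posted items
        self.triggers = defaultdict(list)   # space name -> knowledge sources

    def subscribe(self, space, knowledge_source):
        """Register a knowledge source to be triggered by posts to `space`."""
        self.triggers[space].append(knowledge_source)

    def post(self, space, item):
        """Add an item to a space and trigger every knowledge source watching it."""
        self.spaces[space].append(item)
        for knowledge_source in list(self.triggers[space]):
            knowledge_source(self, item)


def congestion_ks(blackboard, observation):
    """Hypothetical knowledge source: a large backlog suggests a congested node."""
    if observation.get("backlog", 0) > 10:   # threshold is an assumption
        blackboard.post("hypotheses", {"congested": observation["node"]})
```

The knowledge source never addresses another knowledge source directly; it only reads and posts blackboard data, which is the anonymous communication described above.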

3. NETWORK MANAGEMENT APPLICATION

The operational goal of network management is to dynamically control network traffic in such a manner as to provide a not necessarily optimal but relatively high service level to the network clients. It must be able to recognize operational problems and determine and implement corrective or compensative action. Secondarily, the manager must promptly diagnose and report hard system failures.

In our decentralized implementation, management decisions will be based primarily on system state as determined from local observations of traffic, congestion and delays, and to a lesser extent, from communication with other nodes. Management control in such a distributed system is assumed to be limited to local actions only. Control options available to an autonomous local manager are necessarily limited. In the case of routing, control is restricted to selection of next node. Flow control can be effected by timing successive transmissions on a neighbor-by-neighbor basis. And the local manager can support global system objectives through priority assignment. One goal of our research is to determine to what extent external state information is desirable or necessary in managing the network.


The management process involves three basic steps. The first is to recognize deviant behavior. For this, the node is assumed to receive some sort of feedback in the form of packet delivery acknowledgment, or lack thereof. Deviant behavior would be manifest as, for example, failed packets or excessive forwarding delay. The second step is to hypothesize some cause or causes to account for the observed behavior. The third step is to synthesize action to correct the hypothesized causes.

The implementation of these three steps has led to the conceptual blackboard configuration shown in Figure 2. This configuration has three domain blackboards (the third of these, shown with dashed lines, is not yet implemented). The Observations blackboard contains facts about the system that can be determined from direct observation. The Hypotheses blackboard contains facts about the inferred state of the system. The Network Control blackboard contains proposed control actions.

The Observations blackboard provides an observations database. It contains information that is directly available to the node. Examples include a list of neighbor nodes, a list of all nodes known by the manager, current routing data, queue sizes, link quality information, forwarding delays, failed packets, past actions that have been taken by this node, actions reported by neighbors, and requests from neighbors.

The function of knowledge sources attached to this blackboard is problem recognition. These knowledge sources encapsulate domain knowledge relating observed conditions with deviant network behavior. Thus far, knowledge sources whose expertise is congestion, delays and environmental concerns have been suggested. Knowledge sources operating on this blackboard are triggered by the posting of observation data. Depending on the trigger conditions and other data present on the blackboard at the time, the triggered knowledge source may be able to infer facts about the state of the network and as a result may post one or more hypotheses on the Hypotheses blackboard.

The Hypotheses blackboard contains the hypothesized state of the network. The function of knowledge sources operating on this blackboard is to generate proposed control actions. Knowledge sources have been identified for this blackboard whose expertise is routing and flow control. These knowledge sources generate proposed control actions which are in turn posted on the Network Control blackboard.
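The flow from observations through hypotheses to proposed control actions can be sketched as follows. This is only an illustration in Python, not the GBB implementation: the class and function names are hypothetical, and the single rule used here (any failed packets imply a bad link, which in turn suggests avoiding that link) is an assumption chosen for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class ManagerBlackboards:
    """The three domain blackboards of one local manager."""
    observations: list = field(default_factory=list)
    hypotheses: list = field(default_factory=list)
    network_control: list = field(default_factory=list)


def recognize_problems(bbs, observation):
    """Problem-recognition knowledge source attached to the Observations blackboard."""
    bbs.observations.append(observation)
    # Assumed rule: failed packets to a neighbor suggest a bad link to it.
    if observation["failed_packets"] > 0:
        bbs.hypotheses.append(("bad-link", observation["neighbor"]))


def propose_actions(bbs):
    """Routing knowledge source attached to the Hypotheses blackboard."""
    for kind, neighbor in bbs.hypotheses:
        action = ("avoid-link", neighbor)
        if kind == "bad-link" and action not in bbs.network_control:
            bbs.network_control.append(action)
```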

Figure 2. Network Manager Blackboard Configuration.


At present, proposed control actions are imposed directly on the network. In the future, the Network Control blackboard will be a repository for control actions. Since some of the network state hypotheses may be mutually exclusive, proposed control actions may not be consistent. The function of the Control Action knowledge source is to select from the set of proposals those actions that are to be carried out. The output of this knowledge source will be specific actions to be taken by the node.

The manager function at each of the network nodes is to be implemented in an identical fashion, as described above. However, since local observations vary from node to node, resultant control actions will likewise vary.

4. PACKET RADIO APPLICATION

Packet radio has been selected for the initial application of network management because (1) the complexity of that system provides an interesting and fertile area for investigation, (2) packet radio is a naturally distributed system since the nodes must operate essentially autonomously, and (3) autonomous management is critical to packet radio because of the dynamic character of a system that includes mobile nodes.

A packet radio network consists of a number of nodes (packet radio units), at least some of which may be mobile, geographically distributed over some area. Communication between nodes is via a shared broadcast channel. A Carrier Sense Multiple Access (CSMA) scheme is assumed (in our version). As transmission range is limited, nodes also act as repeaters, and one or more hops may be required for a packet to reach its destination. Locally generated traffic may be inserted into the network at any node. Because of the shared communication channel and node autonomy, interference between neighboring nodes is common and limits the effectiveness of the network. A spirit of cooperative restraint among the autonomous nodes in the use of the channel is necessary to make the network function effectively.

5. SIMULATION IMPLEMENTATION

A hybrid approach has been taken in the design of the simulation. The packet radio node simulation is written in C while the manager simulation is written in LISP using the GBB development tool. A complete simulation consists of a number of identical nodes with the associated managers.

The packet radio node performs routine communication tasks, such as transmitting and receiving packets, acknowledging packets that it receives, and retrying unacknowledged packets. In addition, it periodically broadcasts control packets to its neighbor nodes and, in turn, receives their control packet broadcasts. The control packets contain information that permits the nodes to do minimum-hop routing in a distributed fashion. The control packets can also convey information from neighboring managers and thus are forwarded to the local manager. The node also collects summary performance-related data, such as traffic rate and backlog, which it periodically reports to the local manager. All data sent to the manager is posted directly to the manager blackboard structure.

The manager, through its control model and various knowledge sources, will observe data posted on its blackboard(s), add data to the blackboard as to the hypothesized state of the system, and from time to time issue control directives to the node. Currently these control directives are limited to telling the node which links to avoid in its routing determination. In addition, it may provide manager data to the node to be included in subsequent control packet broadcasts. Data flow between a node and a manager is shown in Figure 3.

Figure 3. Data Flow Between Node and Manager

Functions. The node must determine the disposition of received packets: re-transmitted if this node is a designated relay node, or acknowledged if this node is the indicated destination. Control packets and received acknowledgements must be processed appropriately. Packets inserted into the network locally are transmitted. The node maintains transmission flow control to its various neighbors by appropriately timing transmissions.

Routing. A distributed minimum-hop routing algorithm is implemented through a periodic broadcast of control packets. The control packet contains routing information, specifically, the distance in hops to every node in the network as perceived by the source of the control packet. Upon receiving a control packet from one of its neighbors, a node can select that neighbor as the relayer to a destination if that neighbor provides a shorter path to that destination. If the node has been advised by the manager that the link to a particular neighbor is bad, or that the neighbor is inactive, the node may use the control packet routing information to replace that neighbor in any routes for which it has been the designated relayer.

The node will also include any manager generated data to be broadcast to neighbors when it transmits the control packet.
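The routing update described above can be sketched as a small distance-vector table. This is a Python illustration only (the node simulation itself is written in C), and the `Router` class and its method names are hypothetical. One simplifying assumption is made: when the manager advises against a neighbor, routes through it are dropped and then relearned from later control packets, which is one possible way of replacing the relayer.

```python
INFINITY = float("inf")


class Router:
    """Minimum-hop routing table for one node (hypothetical sketch)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.routes = {node_id: (0, node_id)}   # destination -> (hops, next node)
        self.avoid = set()                      # neighbors the manager said to avoid

    def distances(self):
        """Hop counts as they would be advertised in a control packet."""
        return {dest: hops for dest, (hops, _) in self.routes.items()}

    def on_control_packet(self, neighbor, neighbor_distances):
        """Adopt the neighbor as relayer wherever it offers a shorter path."""
        if neighbor in self.avoid:
            return
        for dest, hops in neighbor_distances.items():
            if dest == self.node_id:
                continue
            current_hops, _ = self.routes.get(dest, (INFINITY, None))
            if hops + 1 < current_hops:
                self.routes[dest] = (hops + 1, neighbor)

    def on_manager_advice(self, bad_neighbor):
        """Manager directive: stop relaying through a bad link or inactive neighbor."""
        self.avoid.add(bad_neighbor)
        self.routes = {dest: route for dest, route in self.routes.items()
                       if route[1] != bad_neighbor}
```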

Accumulated Data. The node periodically sends the manager data that represent direct observations of network conditions. The manager can use such data to infer the current state of the communication system. The node accumulates data on the following items:

1. Total packets transmitted by node.
2. Total packets received by node.
3. Average (over preceding manager update interval) backlog.
4. Total packets received from each neighbor.
5. Total control packets received from each neighbor.
6. Observed forwarding delay (filtered) to each neighbor.
7. Total packets failed to each neighbor.
8. Total packets successfully transmitted to each neighbor.
9. Total retries to each neighbor.
10. Average (over preceding manager update interval) backlog to each neighbor.

This data is sent to the manager every manager-update interval (MUI). The MUI (currently 60 seconds) is longer than the control packet period (currently 7.5 seconds), so that several control packets (nominally eight) can be received from each neighbor during an MUI.

Manager. Local data is sent to the manager at the MUI. In addition, control packets, which can contain new information from the neighboring manager, are sent to the manager as they are received. All data sent to the manager are posted directly on the Observations blackboard.

In synchronism with the local report, the manager generates data to be broadcast to neighboring managers. These data are sent to the node to be included in subsequent control packet transmissions. Control packet reception is subject to the unreliability of the packet radio network. The node always includes the most recent manager-provided data with the control packet. Thus manager data will be repeated for several control packets. The receiving manager will ignore data that it has already received and processed.

As a result of manager activity, the manager may issue directives to the node. Currently these include the advice that (a) the link to a specific neighbor is bad, or (b) the neighbor itself is bad. The node uses such advice to adjust routing.
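The relationship between the two intervals can be made concrete with a short calculation. The sketch below, in Python, uses the 60-second MUI and 7.5-second control packet period quoted in the text; the `NodeReport` container holds only a subset of the accumulated items, and both it and the ratio it computes are hypothetical illustrations rather than part of the simulation.

```python
from dataclasses import dataclass, field

MUI_SECONDS = 60.0                    # manager-update interval quoted in the text
CONTROL_PACKET_PERIOD_SECONDS = 7.5   # control packet period quoted in the text


def expected_control_packets(mui=MUI_SECONDS, period=CONTROL_PACKET_PERIOD_SECONDS):
    """Nominal number of control packets one neighbor broadcasts during an MUI."""
    return int(mui // period)


@dataclass
class NodeReport:
    """Hypothetical container for a subset of the per-MUI data items."""
    packets_transmitted: int = 0
    packets_received: int = 0
    average_backlog: float = 0.0
    control_packets_from: dict = field(default_factory=dict)  # neighbor -> count

    def control_packet_ratio(self, neighbor):
        """Fraction of the nominal control packets actually heard from a neighbor."""
        return self.control_packets_from.get(neighbor, 0) / expected_control_packets()
```

With these values a neighbor nominally contributes eight control packets per MUI, so a report showing two received from a neighbor corresponds to a ratio of 0.25.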

As indicated in Figure 3, the manager consists of a Blackboard Structure and a Control Shell.

Blackboard Structure. The blackboard structure, as currently implemented, consists of two blackboards. All data provided by the node is posted on the Observations blackboard, and in fact on the space Observed-State, which is the only space defined (so far) for this blackboard. The Observed-State space contains configuration data as well as current data about the node and its neighbors.

The Hypotheses blackboard is divided into spaces according to the particular fault hypothesis. Currently Bad-Links, Bad-Nodes, and Congested-Nodes spaces have been identified. Knowledge sources associated with the manager control shell post and remove data from the Hypotheses blackboard.

Control Model. The control model consists of control shell, blackboard events, knowledge sources, and knowledge source activation records.

Events are created by certain defined blackboard operations, such as additions, modifications or deletions of data on the blackboard. When created, events are added to the control-shell event list.

A knowledge source encapsulates a certain restricted area of domain knowledge or expertise. It is characterized by blackboard spaces, trigger conditions, execution preconditions, obviation conditions, action function, and priority. The blackboard spaces identify the space or spaces from which it can be triggered. Trigger conditions are a set of conditions on the triggering event, all of which must be true to trigger the knowledge source.

When a knowledge source is triggered, a knowledge source activation record (KSAR) is created and added to the control shell agenda. A KSAR is a temporary structure relating knowledge source and triggering event. Knowledge source preconditions are a set of conditions, involving blackboard data related to the triggering event, all of which must be true to schedule the KSAR for execution. Obviation conditions are a set of conditions, involving blackboard data related to the triggering event, which if all are true before the KSAR is executed, cause the KSAR to be removed from the agenda and any further consideration. The action function implements whatever action the knowledge source is to take. Since the action function is completely arbitrary, it can include consideration of other conditions and involve further decision making. When all preconditions are satisfied, the action function is invoked and the KSAR is removed from the agenda. Knowledge source priority is used to determine the order in which multiple KSARs on the agenda are processed.

The Control Shell consists of an event list and an agenda. The event list is a temporary container for events that have been generated for this control shell. The agenda is a priority queue of KSARs. When a knowledge source is triggered, the corresponding KSAR is placed on the agenda (according to the knowledge source priority). If and when all preconditions are satisfied, the KSAR is executed, that is, the knowledge source action function is invoked and the KSAR is removed from the agenda. Otherwise, if all obviation conditions become satisfied, the KSAR is removed from the agenda and is not considered further. The control shell also includes a list of executed KSARs and a list of obviated KSARs, but these are used for diagnostic purposes and play no part in the operation of the control shell.

The control shell has two states: active and inactive. When the event list is empty and the agenda contains no executable KSARs, the control shell is in the inactive state. When it is in the active state, the control shell repeatedly executes a control cycle consisting of processing the event list and processing the agenda. Event list processing consists of removing each event from the event list and checking it against all knowledge source trigger conditions. If a knowledge source is triggered, a KSAR is created and placed on the agenda. After processing, events are discarded. To process the agenda, each enqueued KSAR is considered in turn. If all obviation conditions are satisfied, the KSAR is removed from the agenda (and added to the obviated list). If all preconditions are satisfied, the KSAR is executed and removed from the agenda (and added to the executed list). Otherwise the KSAR remains on the agenda. Note that KSAR execution can in general create new events. These are processed on the next cycle. The cycle continues until no more new events are generated and the agenda contains only KSARs for which neither preconditions nor obviation conditions have been satisfied. The control shell then goes to the inactive state. Subsequent event creation can return the control shell to the active state.
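The control cycle just described can be summarized in a compact sketch. It is written in Python as an illustration of the event list, agenda, and KSAR handling, and it is not the GBB control shell itself. All class and attribute names are hypothetical. Two simplifications are assumed: each knowledge source has a single trigger, precondition, and obviation predicate rather than a set of conditions, and a higher `priority` number is processed first.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class KnowledgeSource:
    name: str
    priority: int
    trigger: Callable        # trigger(event) -> bool
    precondition: Callable   # precondition(shell, event) -> bool
    obviation: Callable      # obviation(shell, event) -> bool
    action: Callable         # action(shell, event) -> None


@dataclass(eq=False)
class KSAR:
    """Knowledge source activation record: a knowledge source plus its triggering event."""
    ks: KnowledgeSource
    event: dict


@dataclass
class ControlShell:
    knowledge_sources: List[KnowledgeSource]
    blackboard: list = field(default_factory=list)
    events: list = field(default_factory=list)
    agenda: list = field(default_factory=list)
    executed: list = field(default_factory=list)   # diagnostic only
    obviated: list = field(default_factory=list)   # diagnostic only

    def post(self, item):
        """Adding data to the blackboard creates an event."""
        self.blackboard.append(item)
        self.events.append(item)

    def run(self):
        """Execute control cycles until the shell becomes inactive."""
        progress = True
        while self.events or progress:
            progress = False
            # Process the event list: trigger knowledge sources, then discard events.
            pending, self.events = self.events, []
            for event in pending:
                for ks in self.knowledge_sources:
                    if ks.trigger(event):
                        self.agenda.append(KSAR(ks, event))
            self.agenda.sort(key=lambda ksar: -ksar.ks.priority)
            # Process the agenda: obviate, execute, or leave each KSAR in place.
            for ksar in list(self.agenda):
                if ksar.ks.obviation(self, ksar.event):
                    self.agenda.remove(ksar)
                    self.obviated.append(ksar)
                    progress = True
                elif ksar.ks.precondition(self, ksar.event):
                    self.agenda.remove(ksar)
                    self.executed.append(ksar)
                    ksar.ks.action(self, ksar.event)   # may post new events
                    progress = True
```

A KSAR whose precondition and obviation condition are both false stays on the agenda when `run` returns, which corresponds to the inactive state; a later `post` followed by `run` reactivates the shell.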

Node Interface. The manager receives periodic reports from the node as previously described. These reports contain data as to the current state of the system, as can be observed from the node. These data are posted directly on the Observations blackboard.

Control packets received from neighbors are also forwarded to the manager. In addition to routing information, these contain status data from the neighboring manager. The latter are also posted directly on the Observations blackboard. At this time the manager may also request data from the node to correlate with that from the neighboring node. For reliability the manager data is repeated for several control packets; thus the manager may receive the same data in several successive control packets. The manager will disregard such redundant data.
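One simple way to disregard repeated manager data is to tag each new set of data with a sequence number and remember the last one processed per neighbor. The text does not say how redundancy is detected, so the sequence number and the `ManagerDataFilter` class below are assumptions made purely for illustration, sketched in Python.

```python
class ManagerDataFilter:
    """Accept manager data from a neighbor only once (hypothetical scheme)."""

    def __init__(self):
        self.last_seen = {}   # neighbor -> last sequence number processed

    def accept(self, neighbor, sequence_number):
        """Return True for new data, False for data already processed."""
        last = self.last_seen.get(neighbor)
        if last is not None and sequence_number <= last:
            return False
        self.last_seen[neighbor] = sequence_number
        return True
```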

As the result of knowledge source actions, the manager may issue control directives to the node as described previously.

To establish and maintain routing, the nodes periodically broadcast control packets. Presumably the manager can exploit this feature to transfer various status information to the neighboring managers. A variety of information can be exchanged by this method; currently, data relating to link quality and congestion are being transferred.

On each MUI, after processing all new data received from the local node, the manager makes a set of data available to the node to be included in subsequent control packet broadcasts.


For simulation purposes a packet radio node is partitioned into the following modules, as shown in Figure 4.

Figure 4. Packet Radio Simulation

1. A processor module determines the disposition of all incoming packets and generates ACKs for received packets.

2. A router module determines proper routing (designation of 'next' node) for all packets to be transmitted, updates its routing database in accordance with received control packets, and generates control packets to be broadcast to neighbor nodes.

3. A flow control module maintains a transmit queue in accordance with established priorities, determines the order in which packets are to be transmitted, and controls timing between successive transmissions.

4. A transceiver module, which simulates the CSMA protocol, transmits packets to neighbor nodes (with propagation delay), receives neighbor node transmissions, and determines validity of received packets (clashed or unclashed). The transceiver module provides the interface with the rest of the network.

5. A node interface module translates protocol interaction primitives to and from the packet radio format.

6. A traffic model, which creates packets to represent locally generated traffic, interfaces with the packet radio nodes via the node interface module.
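As an illustration of the flow control module in the list above, the following Python sketch keeps a priority-ordered transmit queue and enforces a minimum spacing between successive transmissions to the same neighbor. The simulation's actual C implementation is not shown in the text, so the `TransmitQueue` class, the convention that a lower number means higher priority, and the fixed per-neighbor gap are all assumptions.

```python
import heapq
import itertools


class TransmitQueue:
    """Priority transmit queue with per-neighbor timing (hypothetical sketch)."""

    def __init__(self, min_gap_seconds=0.5):
        self.min_gap = min_gap_seconds
        self._heap = []
        self._counter = itertools.count()   # tie-breaker keeps FIFO order per priority
        self._next_allowed = {}             # neighbor -> earliest next transmit time

    def enqueue(self, packet, neighbor, priority):
        """Queue a packet; a lower priority number is sent first (assumption)."""
        heapq.heappush(self._heap, (priority, next(self._counter), neighbor, packet))

    def next_packet(self, now):
        """Return (neighbor, packet) for the best packet sendable at `now`, else None."""
        deferred = []
        chosen = None
        while self._heap:
            entry = heapq.heappop(self._heap)
            _, _, neighbor, packet = entry
            if now >= self._next_allowed.get(neighbor, 0.0):
                self._next_allowed[neighbor] = now + self.min_gap
                chosen = (neighbor, packet)
                break
            deferred.append(entry)          # neighbor still in its quiet period
        for entry in deferred:
            heapq.heappush(self._heap, entry)
        return chosen
```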

In addition, a Performance Monitor module gathers statistical data on the operation of the simulation. The packet radio simulation is written in C.

To simulate the decentralized management concept, each network node must be provided with its own separate management function. Thus there are separate but identical instances of blackboard structure and control shell for each node. There is no direct communication between managers at different nodes. A simulated node can send data to the local blackboard, and the control shell knowledge sources can read and write data on the local blackboard and issue control directives to the node; however, inter-manager communication can take place only using the network facilities, as, for example, through the control packet mechanism described previously. The management function is implemented using the Generic Blackboard (GBB) development system [4], [5].

6. EXERCISING THE MODEL

The primary thrust of the rules that have been implemented in our simulation has been to determine when a link with a certain neighbor has become bad enough to become unusable, and to distinguish between such a bad link and an entirely inoperative node. Additionally, these rules determine when a previously declared bad link or node again becomes operative.

At present these rules are mostly diagnostic; the only control effected is to avoid bad links in routing decisions.
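The flavor of such rules can be conveyed by a small per-neighbor state machine with separate lower and upper thresholds, evaluated once per MUI. This Python sketch is an interpretation based on the messages in Table 1, not the rule set actually implemented: the threshold values, the use of the control packet count as the only quality measure, and the "suspected after one silent interval, confirmed after two" policy are all assumptions.

```python
LOWER_THRESHOLD = 3   # assumed: fewer control packets than this per MUI -> link bad
UPPER_THRESHOLD = 6   # assumed: more control packets than this per MUI -> link good


class LinkMonitor:
    """Tracks one neighbor's link and node hypotheses across MUIs (hypothetical)."""

    def __init__(self):
        self.link_bad = False
        self.node_state = "ok"   # "ok", "suspected", or "confirmed"

    def update(self, control_packets_received):
        """Apply the rules to one MUI's count and return the resulting messages."""
        n = control_packets_received
        messages = []
        # Two thresholds give hysteresis, so a marginal link does not flap.
        if not self.link_bad and n < LOWER_THRESHOLD:
            self.link_bad = True
            messages.append("Declare Link Bad")
        elif self.link_bad and n > UPPER_THRESHOLD:
            self.link_bad = False
            messages.append("Declare Link Good")
        # Total silence points to the node itself rather than the link.
        if n == 0:
            if self.node_state == "ok":
                self.node_state = "suspected"
                messages.append("Node suspected bad")
            elif self.node_state == "suspected":
                self.node_state = "confirmed"
                messages.append("Node confirmed bad")
        elif self.node_state != "ok":
            self.node_state = "ok"
            messages.append("Node bad hypothesis removed")
        return messages
```

Under these assumed thresholds, a count of 5 after a silent period clears the node hypothesis but leaves the link declared bad, since the count has not yet exceeded the upper threshold.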

To exercise the network manager simulation, a graphic user interface that permits the user to insert faults into the network with a "mouse" pointing device has been implemented. Two kinds of faults have been introduced: faults may be inserted into one or more links or into one or more nodes. A fault inserted into a link reduces the effectiveness of that link to some low value, e.g., only 20 percent of the packets transmitted over the link are received. A fault inserted into a node renders the node totally inactive.
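The two fault types can be modeled as a change in a link's delivery probability. The following Python sketch is an assumption about how such a fault might be represented, not a description of the simulator: the `FaultyLink` class is hypothetical, the 0.2 default mirrors the 20 percent example in the text, and a node fault is approximated by setting the probability to zero on every link of that node.

```python
import random


class FaultyLink:
    """A link that delivers each packet with a configurable probability (hypothetical)."""

    def __init__(self, delivery_probability=1.0, seed=None):
        self.delivery_probability = delivery_probability
        self.rng = random.Random(seed)

    def insert_fault(self, delivery_probability=0.2):
        """Degrade the link; 0.0 models a link to a totally inactive node."""
        self.delivery_probability = delivery_probability

    def remove_fault(self):
        """Restore the link to fault-free operation."""
        self.delivery_probability = 1.0

    def deliver(self):
        """Return True if a packet sent now is received."""
        return self.rng.random() < self.delivery_probability
```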

When a manager at some node detects the manifestations of a fault thus inserted, and has taken some resultant action, a message is written out in a special manager window. The message includes the time of the action, node identification, the specific action taken, and the reason for such action. The resulting list of such messages provides a chronological record, or script, of every manager action for all nodes.

The output script in Table 1 is the result of inserting (and later removing) a fault into the network shown in Figure 5. At approximately 70 seconds after the start of the run, a fault was inserted into node 6. From this time on node 6 would be completely inactive and would transmit no packets. The direct effects of this fault should be seen by the neighbors of node 6 (nodes 1, 4, 9, and 11).

The first manager action occurs at 105.394 seconds, when node 1 declares its link to node 6 bad. Several other bad link declarations ultimately result in rerouting that avoids using node 6 as a relayer. The insertion of the fault causes an immediate dip in network performance, which is restored as the routes are adjusted. At about 180 seconds, the fault is removed from node 6, which again starts to transmit packets.

7. CONCLUDING REMARKS

The GBB management module, the associated control software, and the simulation model together form a very useful infrastructure, with a high potential for yielding insights on various network management approaches [6]. The current software has some basic rules on both diagnosis and control actions that help demonstrate the collective operation of the modules, as described in the previous section. We are expanding the initial set of rules to relieve local congestion in a packet radio network, to exploit the additional data that is available to the manager, and to introduce more routing options. We also plan to expand the present preliminary blackboard structure into a more comprehensive problem-solving system. Maintaining the autonomous management concept, we will investigate the value of making use of varying degrees of global network information, obtained through normal communication over the network. We are also planning to apply the knowledge-based management information to other candidate networks.

REFERENCES

[1] L. Ruston and P. Sen, "Rule-Based Network Design: Application to Packet Radio Networks," IEEE Network, pp. 31-39, July 1989.

[2] M. Bereschinsky and C. Graff, "A State of the Art Assessment in Network Management Using Techniques from Artificial Intelligence," Proceedings of the Symposium on Command and Control Research, Monterey, June 7-9, 1988.

[3] J. Pasquali, "Using Expert Systems to Manage Distributed Computer Systems," IEEE Network, pp. 22-28, September 1988.

[4] D.D. Corkill, K.Q. Gallager, and K.E. Murray, Proc. of the National Conf. on Artificial Intelligence, pp. 1008-1014, Philadelphia, August 1986.

[5] K.Q. Gallager, D.D. Corkill, and P.M. Johnson, GBB Reference Manual, GBB Version 1.2, COINS Technical Report 88-66, University of Massachusetts at Amherst, July 1988.

[6] R.J. Doyle and A.R.K. Sastry, "Application of Knowledge-Based Network Management Techniques for Packet Radio Networks," Rockwell International Science Center Technical Report, SCTR 91-2, February 1991.

Table 1. Output script for a fault inserted into (and later removed from) node 6.

Time (s)  Node  Action                          Reason
105.394   1     Declare Link to 6 Bad.          Cntl pkts rcvd below lower threshold.
105.579   4     Declare Link to 6 Bad.          Cntl pkts rcvd below lower threshold.
106.574   11    Declare Link to 6 Bad.          Cntl pkts rcvd below lower threshold.
119.572   9     Declare Link to 6 Bad.          Cntl pkts rcvd below lower threshold.
137.251   14    Declare Link to 6 Bad.          Cntl pkts rcvd below lower threshold.
137.251   14    Node 6 suspected bad            No cntl pkts rcvd from this node
165.394   1     Node 6 suspected bad            No cntl pkts rcvd from this node
165.579   4     Node 6 suspected bad            No cntl pkts rcvd from this node
166.574   11    Node 6 suspected bad            No cntl pkts rcvd from this node
173.553   8     Declare Link to 3 Bad.          Cntl pkts rcvd below lower threshold.
179.572   9     Node 6 suspected bad            No cntl pkts rcvd from this node
197.251   14    Node 6 confirmed bad            Sustained zero cntl pkts rcvd condition.
213.612   1     Declare Link to 6 Good.         Link quality above upper threshold.
213.612   4     Declare Link to 6 Good.         Link quality above upper threshold.
213.612   14    Declare Link to 6 Good.         Link quality above upper threshold.
213.612   14    Node 6 Regained                 Cntl pkts rcvd from node.
213.612   9     Declare Link to 6 Good.         Link quality above upper threshold.
225.394   1     Node 6 bad hypothesis removed.  Suspicion unfounded, cntl pkts rcvd.
225.394   1     Declare Link to 6 Bad.          Cntl pkts rcvd below lower threshold.
225.579   4     Node 6 bad hypothesis removed.  Suspicion unfounded, cntl pkts rcvd.
226.574   11    Node 6 bad hypothesis removed.  Suspicion unfounded, cntl pkts rcvd.
233.553   8     Declare Link to 3 Good.         Cntl pkts rcvd above upper threshold.
239.572   9     Node 6 bad hypothesis removed.  Suspicion unfounded, cntl pkts rcvd.
273.611   1     Declare Link to 6 Good.         Link quality above upper threshold.

Figure 5. Test network with 20 nodes
