TRANSCRIPT
AD-A101 412: Air Force Wright Aeronautical Labs, Wright-Patterson AFB, OH
A Continuously Reconfiguring Multi-Microprocessor Flight Control System (Unclassified)
May 1981. S. J. Larimer, S. L. Maher
AFWAL-TR-81-3070
! A CONTINUOUSLY RECONFIGURING MULTI-MICROPROCESSOR
M FLIGHT CONTROL SYSTEM
Stanley J. Larimer, Captain, USAF
Scott L. Maher, First Lieutenant, USAF
MAY 1981
Final Report for Period August 1979 to March 1981
Approved for public release; distribution unlimited.
FLIGHT DYNAMICS LABORATORY, AIR FORCE WRIGHT AERONAUTICAL LABORATORIES
AIR FORCE SYSTEMS COMMAND
WRIGHT-PATTERSON AIR FORCE BASE, OHIO 45433
-
NOTICE
When Government drawings, specifications, or other data are used for any purpose other than in connection with a definitely related Government procurement operation, the United States Government thereby incurs no responsibility nor any obligation whatsoever; and the fact that the government may have formulated, furnished, or in any way supplied the said drawings, specifications, or other data, is not to be regarded by implication or otherwise as in any manner licensing the holder or any other person or corporation, or conveying any rights or permission to manufacture, use, or sell any patented invention that may in any way be related thereto.
This report has been reviewed by the Office of Public Affairs (ASD/PA) and is releasable to the National Technical Information Service (NTIS). At NTIS, it will be available to the general public, including foreign nations.
This technical report has been reviewed and is approved for publication.
SCOTT L. MAHER, 1Lt, USAF, Project Engineer, Control Data Group, Control Systems Development Branch
STANLEY J. LARIMER, Capt, USAF, Project Engineer, Control Analysis Group, Control Dynamics Branch
EVARD H. FLINN, Chief, Control Systems Development Branch, Flight Control Division
RONALD O. ANDERSON, Chief, Control Dynamics Branch, Flight Control Division
FOR THE COMMANDER
ROBERT C. ETTINGER, Colonel, USAF
Chief, Flight Control Division
"If your address has changed, if you wish to be removed from our mailinglist, or if the addressee is no longer employed by your organization, pleasenotify AFWAL/FIGC, W-PAFB, OH 45433 to help us maintain a current mailinglist".
Copies of this report should not be returned unless return is required by security considerations, contractual obligations, or notice on a specific document.
AIR FORCE/56780/23 June 1981 - 510
-
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)
REPORT DOCUMENTATION PAGE - READ INSTRUCTIONS BEFORE COMPLETING FORM
1. REPORT NUMBER: AFWAL-TR-81-3070
2. GOVT ACCESSION NO.: AD-A101 412
4. TITLE (and Subtitle): A CONTINUOUSLY RECONFIGURING MULTI-MICROPROCESSOR FLIGHT CONTROL SYSTEM
5. TYPE OF REPORT AND PERIOD COVERED: Final Report, 1 Aug 1979 - 30 Apr 1981
7. AUTHOR(S): Stanley J. Larimer, Captain, USAF; Scott L. Maher, First Lieutenant, USAF
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Flight Dynamics Laboratory (AFWAL/FIGC), Air Force Wright Aeronautical Laboratories (AFSC), Wright-Patterson AFB, Ohio 45433
10. PROGRAM ELEMENT, PROJECT, TASK, AREA AND WORK UNIT NUMBERS: Program Element 62201F; Project 2403; Task 02; Work Unit 44
12. REPORT DATE: May 1981
13. NUMBER OF PAGES: 148
15. SECURITY CLASS. (of this report): UNCLASSIFIED
16. DISTRIBUTION STATEMENT (of this Report)
Approved for public release; distribution unlimited.
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)
18. SUPPLEMENTARY NOTES
19. KEY WORDS (Continue on reverse side if necessary and identify by block number)
Digital Control Systems; Reconfiguration; Multi-Microprocessor; Distributed Systems; Flight Control Systems; Distributed Control
20. ABSTRACT (Continue on reverse side if necessary and identify by block number)
Recent research at the US Air Force Wright Aeronautical Laboratories (Flight Dynamics Lab) has resulted in the development of a promising microprocessor-based flight control system design. This system is characterized by a collection of cooperatively autonomous distributed microcomputers interconnected by an arbitrary number of common serial multiplex busses. Each processor in the system independently determines its assignments using a simple algorithm that
(continued)
DD FORM 1473 (1 JAN 73). EDITION OF 1 NOV 65 IS OBSOLETE.
-
dynamically redistributes system functions from processor to processor in a never-ending process of reconfiguration. This approach offers several benefits in terms of system reliability, and the architecture in general incorporates many state-of-the-art features which promise improved system throughput, expandability, and above all, ease of programming.
The Continuously Reconfiguring Multi-Microprocessor Flight Control System (CRMmFCS) represents a major data point in multiprocessor control system research. Promising ideas from a variety of references have been included and integrated in its design. Its laboratory implementation provides a demonstration of these ideas for improving throughput, reliability, and ease of programming in flight control applications.
* - j"
SECURITY CLASSIFICATION OF THIS PAGE(When Dal Entered)
-
FOREWORD
The research described in this report was performed in-house at the AFWAL Flight Dynamics Laboratory during the period from September 1979 to March 1981. It is the result of a joint effort between members of the Control Systems Development Branch (AFWAL/FIGL) and the Control Dynamics Branch (AFWAL/FIGC) of the Flight Control Division.
Work began in this area in late 1978 when the Control Systems Development Branch initiated a work unit called "Multi-Microprocessor Control Elements" (24030244). During this time, Lt. James E. May, Lt. Scott L. Maher, John Houtz, and Capt. Larry Tessler laid the foundation for a study of how growing microprocessor technology could be applied to the problems of modern flight control. It was decided to design and build some form of fully distributed microprocessor-based flight control system in order to explore the potential problems and benefits in great detail.
In September 1979, Lt. Scott Maher took over as Principal Investigator and Capt. Stan Larimer joined the program as the Associate Investigator from the Control Dynamics Branch. In the months that followed, Maher and Larimer developed what has come to be known as the "Continuously Reconfiguring Multi-Microprocessor Flight Control System" architecture. Steve Coates, Richard Gallivan, and Tom Molnar also provided invaluable input to the research during this phase.
During the summer of 1980, Mr. Harry Snowball (Control Data Group Leader) and Mr. Evard Flinn (Control Systems Development Branch Chief) decided to intensify efforts to develop this new architecture. They hired four new engineers including Bill Rollison, Ray Bortner, Mark Mears, and Stan Pruett to work on the project. They also assigned technician SSgt. Jeff Lyons and co-op students Dan Thompson, Russ Blake, and Bob Molnar to assist with the R&D activities. At the same time, Mr. Dave Bowser (Control Analysis Group Leader) and Mr. Ron Anderson (Control Dynamics Branch Chief) increased FIGC support by assigning Lt. Allan Ballenger to act as an architecture and control law consultant on the project. In 1981, Lt. Jack Crotty joined the program as a software designer, thereby completing the CRMmFCS team.
The authors would like to thank these individuals for their many contributions to the success of this program. Bill Rollison was responsible for the design, development, construction, and testing of the transmitter and receiver circuitry. Ray Bortner designed the real-world interface processor and helped to develop the CRMmFCS control laws and corresponding aircraft simulation. Mark Mears was
iii
-
responsible for the hardware and software design of the entire data collection and processing network for the CRMmFCS laboratory implementation. Stan Pruett invented a unique five-port RS-232 communications controller which allows all systems in the laboratory to communicate with each other. He also assisted in the development of CRMmFCS processing module software and programmed the TRS-80 to act as controller for the entire laboratory system.
Lt. Allan Ballenger was responsible for development of the real-world plant simulation and overall control law design. He also organized a national workshop on multi-processor flight control architectures which provided a vital link with others working in the field. Tom Molnar served as a technical consultant and work unit monitor for the program. Lt. Jack Crotty was responsible for the development of all CRMmFCS software. Dan Thompson provided the original design for much of the TMS-9900 software used to implement the CRMmFCS operating system.
Rick Gallivan was responsible for the development of a real-world simulation on a Motorola 68000 microprocessor and provided logistics support for the program. Steve Coates handled the construction of the entire laboratory setup and helped to develop a real time graphics display for the pilot interface. Jeff Lyons designed and constructed the first working bus termination circuit. Russ Blake and Bob Molnar provided drafting and breadboarding support for many of the laboratory components.
The authors would also like to express special thanks to Art Eastman and Dave Dawson for their outstanding work in designing and producing the many figures which appear throughout this document. Carl Weatherholt, Rudy Chapski, and Bill Adams also provided invaluable support in the laboratory. Many thanks also go to Pam Larimer, Jan Robinson, and Dave Bowser for their assistance in the preparation of this report.
Finally, the authors wish to express their appreciation to Mr. Vernon Hoehne, Lt. Allan Ballenger, and Tom Molnar for their considerable efforts in reviewing the technical report.
This report covers work performed from September 1979 through March 1981. It was submitted by the authors in May 1981.
iv
-
TABLE OF CONTENTS
I.   INTRODUCTION ..................................... 1
II.  OVERVIEW ......................................... 3
     A. Overview of the CRMmFCS Architecture .......... 3
     B. Design Philosophy ............................. 5
     C. Evolution of Continuous Reconfiguration ....... 7
     D. The Concept of Continuous Reconfiguration ..... 14
        Example of Continuous Reconfiguration ......... 15
        Advantages of Continuous Reconfiguration ...... 15
        Controlling a C. R. System .................... 19
        Requirements for C. R. ........................ 20
     E. Section Summary ............................... 22
III. HARDWARE ARCHITECTURE ............................ 23
     A. Introduction .................................. 23
     B. Distributed Control of a MUX Bus .............. 23
        Other Approaches .............................. 24
        Improving Bus Efficiency ...................... 26
        A New Approach to the Problem ................. 27
     C. Transparent Contention ........................ 28
        Transmission Without Contention ............... 29
        Transmission With Contention .................. 31
        An Extension to n Busses ...................... 36
     D. Virtual Common Memory ......................... 39
        The Virtual Common Memory Concept ............. 39
        Virtual Common Memory Design .................. 42
     E. Effective Use of Virtual Common Memory ........ 48
     F. Section Summary ............................... 53
v
-
TABLE OF CONTENTS (concluded)
IV.  SOFTWARE STRUCTURE ............................... 54
     A. Introduction .................................. 54
     B. The Task Assignment Chart ..................... 56
        Application of Task Assignment Charts ......... 59
        Real Time Chart Execution ..................... 61
        Multi-Rate Considerations ..................... 65
        Compound Millimodules ......................... 67
     C. Autonomous Control Algorithms ................. 67
        Autonomous Control ............................ 68
        Volunteering .................................. 69
        Continuous Reconfiguration .................... 72
     D. Reliability Considerations .................... 73
     E. Section Summary ............................... 75
V.   LABORATORY IMPLEMENTATION ........................ 76
VI.  CONCLUSION ....................................... 80
APPENDIX A. Bus Transmitter Design .................... 82
APPENDIX B. Bus Receiver Design ....................... 97
APPENDIX C. Bus Termination Circuit Design ............ 103
APPENDIX D. Real World Interface ...................... 106
APPENDIX E. Data Collection Circuitry ................. 112
APPENDIX F. State Information Matrix (SIM) Theory ..... 120
APPENDIX G. Multi-Rate Millimodule Design ............. 135
REFERENCES ............................................ 140
vi
-
LIST OF ILLUSTRATIONS
 1. Major CRMmFCS Architecture Components ............. 4
 2. Virtual Common Memory ............................. 6
 3. Breakdown of Possible Architectures ............... 8
 4. Fixed vs. Pooled Designs with Equal Redundancy .... 10
 5. Fixed vs. Pooled Designs with Equal Numbers ....... 12
 6. Continuous Reconfiguration ........................ 16
 7. Essential Architecture Elements ................... 30
 8. Transmitter-Bus Interface ......................... 32
 9. Bus Contention Arbitration ........................ 34
10. Transmitter Circuit ............................... 37
11. Evolution of Virtual Memory ....................... 40
12. Physical Implementation of Virtual Memory ......... 44
13. Receiver Block Diagram ............................ 46
14. Bus Transmission Scheduling ....................... 50
15. Data Flow Assignment .............................. 52
16. Approaches to Task Scheduling ..................... 55
17. A Generic Task Assignment Chart ................... 57
18. CRMmFCS Task Assignment Chart ..................... 60
19. Task Assignment Table Generation .................. 62
20. Executive Flow Chart .............................. 64
21. The High-Rate Task Scheduling Problem ............. 66
22. Volunteering and the Volunteer Status Table ....... 70
23. Laboratory Implementation ......................... 77
vii
-
LIST OF ILLUSTRATIONS (concluded)
A-1. Transmitter Block Diagram ........................ 83
A-2. Transmission Format .............................. 85
A-3. Transmitter Circuit .............................. 90
B-1. Receiver Functional Diagram ...................... 98
B-2. Receiver Block Diagram ........................... 100
C-1. Smart Bus Configuration .......................... 104
D-1. Real World Interface Processors .................. 109
E-1. Bus and SIM Monitors ............................. 115
E-2. Data Collection and Post-Processing Circuitry .... 118
F-1. The State Information Matrix ..................... 121
F-2. The State-Time Form .............................. 123
F-3. State Information Matrix Processing .............. 125
F-4. Single Copy / Multi-Access Architecture .......... 129
F-5. Multi-Copy / Multi-Access Architecture ........... 130
F-6. Distributed / Shared Information Architecture .... 131
F-7. Broadcast Data Vector Architecture ............... 132
F-8. Virtual Memory Emulation ......................... 133
F-9. Virtual Memory Equivalent ........................ 134
G-1. Possible Rates for a Ten-Milliframe TAC .......... 136
viii
-
SECTION I
INTRODUCTION
The use of microprocessors in flight control
applications is a subject which has received much attention
in recent years. Microprocessor technology is growing
rapidly and there is a strong desire to take advantage of
it. This report presents the results of several years of
research performed at the Flight Dynamics Laboratory into
how microprocessors might best be used for flight control
applications.
Microprocessors appear to have two major areas of
application to the flight control problem. These may be
termed "the low end" and "the high end." In the low end
approach, microprocessors are distributed around the system
in a dedicated fashion wherever a small amount of processing
power is needed. In this mode, microprocessors are
relegated to the role of "smart" peripherals to some central
computer system. This application has been demonstrated
with considerable success in many currently flying aircraft.
By performing many of the repetitive, time intensive
functions such as keyboard monitoring, sensor preprocessing,
display generation, and inner-loop control, microprocessors
can relieve the central computer of much of its
computational load so that it can concentrate on what it
does best: number crunching, system management, and
outer-loop control. Since the low end application has been
demonstrated in many operational systems and its utility is
generally unquestioned, it will not be discussed further in
this report.
The "high end" application for microprocessors is a
much newer field of research. It is concerned with how to
use microprocessors as a distributed, multi-processing
-
replacement for the central computer itself. Since the goal
of this research program was to investigate the potential of
microprocessors for flight control and not to design a
working model for near-term application (where a more
conservative approach would be necessary), it was decided to
attempt the most ambitious (and most promising) application
of microprocessor technology: "a fully decentralized,
continuously reconfiguring, self-healing, adaptive, pooled
microprocessor-based flight control system." The result was
the development of an architecture known as the
"Continuously Reconfiguring Multi-Microprocessor Flight
Control System" (CRMmFCS).
This report presents the results of research to date in
the development of the CRMmFCS. Section II gives an
overview of the architecture and the philosophy behind it.
Section III presents a detailed discussion of the hardware
aspects of the architecture while Section IV does the same
for software. Both sections represent virtually stand-alone
discussions of their respective topics at a level which is
as thorough as possible without sacrificing readability.
Technical details which are of use to the reader only after
a complete study of the architecture have been removed to
the appendices. Finally, Section V describes the actual
laboratory implementation and test procedure which will be
used to demonstrate the architecture. The results of these
tests will be published in a subsequent technical report.
2
-
SECTION II
OVERVIEW
A. OVERVIEW OF THE CRMmFCS ARCHITECTURE
The CRMmFCS design centers around a system of
autonomous microprocessors connected by a common set of
serial multiplex busses. These processors operate in a
pooled configuration where any processor can perform any
task at any time. Furthermore, task assignments are
continuously redistributed among all processors in a
never-ending process of reconfiguration. If a processor
fails in the system, it is simply left out of the next
reconfiguration cycle and the system continues to operate as
if nothing has happened. All of this is accomplished
without use of a central controller.
A diagram of the architecture is shown in Figure 1. In
the figure, six processing modules are shown connected to a
set of four common data busses. Each data bus consists of
one clock and one data line and information is transferred
between processors using a simple serial multiplexing
scheme.
Processors in the system compete for access to the
busses without central traffic control using a technique
called "transparent contention." Transparent contention is
a scheme which allows any processor to talk on any bus at
any time. In the event of a "collision" between two
transmissions only one message survives while the remaining
one is automatically retransmitted as soon as the bus is
free. Transparent contention provides one hundred percent
efficient bus utilization, eliminates most communications
overhead, and completely avoids the need for a central
controller. It is discussed in detail in Section III.
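The collision behavior described above can be illustrated with a small software sketch. The report's actual transmitter mechanism is detailed in Section III and Appendix A; the bit-serial rule below (a "dominant" bus level wins the slot, as in later serial-bus standards) and all names are illustrative assumptions, not the CRMmFCS design itself.

```python
# Hypothetical sketch of contention between simultaneous transmissions:
# each transmitter monitors the bus while sending; when two messages
# collide, one survives intact and the losers queue for retransmission
# as soon as the bus is free. Here "0" is assumed to be the dominant level.

def arbitrate(messages):
    """Resolve a collision bit-serially; returns (winner, losers)."""
    contenders = list(messages)
    for bit in range(max(len(m) for m in contenders)):
        levels = [m[bit] for m in contenders]
        if "0" in levels:                     # dominant level wins the slot
            contenders = [m for m in contenders if m[bit] == "0"]
        if len(contenders) == 1:
            break
    winner = contenders[0]
    return winner, [m for m in messages if m is not winner]

# Two processors start transmitting at once; exactly one message survives
# and the other is automatically queued -- no central controller involved.
winner, retry_queue = arbitrate(["0110", "0111"])
```

Note that no bus time is wasted by the collision itself: the surviving message goes through unchanged, which is what makes the contention "transparent."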
3
-
[Figure 1. Major CRMmFCS Architecture Components: six processing modules connected to a set of four common serial data busses.]
-
In addition to competing for the bus, processors also
compete for the right to perform tasks in the system.
During every time frame all processors "volunteer" for the
tasks to be done in the following frame. These tasks are
then divided by mutual agreement among all functioning
processors in the system, again without needing a central
controller.
Because tasks to be performed are redistributed at the
beginning of each frame, the task any particular processor
performs is changing all the time. The system is said to be "continuously reconfiguring." Continuous reconfiguration
has a number of important advantages over other approaches.
These include automatic recovery in the event of a failure,
constant spare checkout because no unit acts as a spare all
the time, latent fault protection, and zero reconfiguration
delay. A complete discussion of continuous reconfiguration
is presented in Section IV.
Finally, although processors communicate only via
simple serial busses, the architecture is configured so that
they appear to share a single common memory as shown in
Figure 2. This virtual common memory contains all
information available about the state and environment of the
entire system. Processors obtain the information needed for
any task from the virtual common memory and place their
results back there for use by other processors. Complete
details on how a set of serial busses can be made to act
like a common memory are presented in Section III.
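One way to picture the virtual common memory in software: each processor keeps a full local copy of the system state, every write is broadcast over the bus to all processors, and reads never touch the bus at all. The classes below are a hypothetical sketch of that idea only; the actual hardware mechanism is described in Section III.

```python
# Illustrative sketch (names are ours): broadcasting writes over a shared
# bus keeps every processor's local copy identical, so the set of copies
# behaves like one common memory even though none physically exists.

class Bus:
    def __init__(self):
        self.listeners = []

    def broadcast(self, variable, value):
        for proc in self.listeners:           # every receiver hears the write
            proc.local_copy[variable] = value

class Processor:
    def __init__(self, bus):
        self.local_copy = {}
        self.bus = bus
        bus.listeners.append(self)

    def write(self, variable, value):
        # A write goes out on the bus; even the writer hears its own update.
        self.bus.broadcast(variable, value)

    def read(self, variable):
        # A read is purely local -- the data has already arrived.
        return self.local_copy[variable]

bus = Bus()
p1, p2, p3 = Processor(bus), Processor(bus), Processor(bus)
p1.write("pitch_rate", 0.25)
# All three processors now see the same value with no shared physical RAM.
```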
B. DESIGN PHILOSOPHY
Before beginning a detailed discussion of the
Continuously Reconfiguring Multi-Microprocessor Flight
5
-
Figure 2. Virtual Common Memory
6
-
Control System (CRMmFCS), it is desirable to briefly discuss
the design goals and philosophy which led to this
architecture. The original objective of this in-house effort
was to develop an Air Force understanding and capability in
the area of multi-microprocessor flight control systems. It
was determined that a high risk - high payoff approach could
be taken in an effort to advance the state-of-the-art while
achieving the original objective. The approach taken was to
trade off low cost hardware for simplified software and to
distribute system control to its extreme in order to study
the extent to which its potential advantages could be
achieved. Other goals were to reduce overall hardware,
software, and life cycle costs of flight control systems
while maintaining high reliability and fault tolerance.
Design considerations also included expandability for
integrated control applications and reconfigurability to
meet future self-healing requirements.
C. EVOLUTION OF CONTINUOUS RECONFIGURATION
Figure 3 shows a breakdown of some of the possibilities
that exist for implementing digital control systems in
general. Starting at the top of the figure, it may be seen
that digital control systems can be broken down into either
uni-processor or multi-processor systems. The distinction
here is not so much whether there is one or more than one
processor in the system, but rather whether there is more
than one processor performing different functions. A system
with, say, four processors performing identical functions
for redundancy would, for the purposes of this report, be
considered a uni-processor system since its effective
throughput is that of a single processor.
In this study, the multi-processor approach was chosen
for two reasons. First, since state-of-the-art
7
-
[Figure 3. Breakdown of Possible Architectures: digital control systems divide into uni-processor and multi-processor systems; multi-processor systems into "fixed" and "pooled" designs; pooled designs into cold spares, hot spares, and continuous reconfiguration.]
-
microprocessors have somewhat lower throughput than their
mini-computer counterparts, it is unlikely that a single
microprocessor would be able to handle the workload of a
modern flight control system. Since we are constrained to
use microprocessors by the goals of this investigation, it
would seem that a multiprocessor architecture is mandatory
for flight control applications. Of course, with the rate
at which the field is developing, there may soon be
microprocessors that can handle the required workload in a
uni-processor configuration. But with the development of
ever more sophisticated estimation, parameter
identification, and self-optimization algorithms and
increasingly demanding command and control functions, the
required workload may go up at an even faster rate. Since a
multi-processor architecture will always be able to improve
on the throughput of a uniprocessor architecture of the same
state-of-the-art, and since the only thing growing faster
than computer technology is the size of the problems to be
solved, there will always be a need for multi-processor
configurations. The need for better methods to construct
such architectures is the second reason why the
multi-microprocessor approach was selected for this
investigation.
Given then that a multi-microprocessor system is to be
implemented, there are two possible ways in which their
functions can be assigned. As shown in Figure 3, these ways
are "fixed" assignment of processing resources, where the
function of each processor is permanently assigned, and
"pooled" processing resources, where processors are
dynamically assigned to each function as the needs arise.
The fixed assignment is inherently simpler to implement and
is adequate for many applications. However, in systems
requiring great reliability and minimum hardware, the pooled
approach offers distinct advantages.
9
-
[Figure 4. Fixed vs. Pooled Designs with Equal Redundancy: 4a shows the fixed design with a dedicated spare set per task; 4b the pooled design with one shared spare pool.]
10
-
Figure 4 demonstrates the advantages of the pooled
approach in systems requiring a large number of processing
tasks. The system shown requires six different tasks to be
performed concurrently and that a quad level of redundancy
be maintained. A fixed assignment implementation of this
system (Figure 4a) requires that six processors be
permanently assigned to the six tasks and that three spares
be permanently assigned to each processor. The net result
is a 24 processor system. Figure 4b shows an equivalent
system using a pooled architecture. This system still
requires at least six processors to perform the six
concurrent tasks but the number of spares is substantially
reduced. This is because, since any processor can be
dynamically assigned to any task, the three spares are able
to cover for any three failures in the system. Thus both
systems can tolerate any three random failures but the
pooled architecture requires significantly fewer processors.
It is clear that, as the number of tasks to be performed
increases, this difference becomes even more important.
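The processor-count argument above can be made concrete with a little arithmetic; the function names below are ours, but the formulas follow directly from Figure 4.

```python
# Worked version of the Figure 4 comparison: protecting `tasks` concurrent
# functions against any `spares` random failures.

def fixed_processor_count(tasks, spares_per_task):
    # Fixed assignment: each task carries its own dedicated set of spares.
    return tasks * (1 + spares_per_task)

def pooled_processor_count(tasks, shared_spares):
    # Pooled: one communal spare pool covers failures anywhere in the system.
    return tasks + shared_spares

# Six tasks, tolerance of any three random failures:
# fixed needs 6 * (1 + 3) = 24 processors (Figure 4a);
# pooled needs 6 + 3 = 9 processors (Figure 4b).
print(fixed_processor_count(6, 3), pooled_processor_count(6, 3))
```

As the text notes, the gap widens as the task count grows: with ten tasks the fixed design needs 40 processors while the pooled design still needs only 13.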
The argument just presented applies only to systems
where failure detection is provided externally and the only
requirement is to replace a faulty processor with a spare.
For systems which must detect and locate their own failed
processors as well as replace them, the distinction is not
so much in the number of processors saved as it is in the
level of redundancy provided.
For example, Figure 5 shows fixed and pooled
architectures each having 24 processors and providing triad
voting for fault detection and isolation. They each have a
complement of six spares for redundancy purposes, and both
detect and correct faults by comparing the results of Al,
A2, and A3 or Bl, B2, and B3, etc. and replacing the
disagreeing processor with a spare. Unfortunately, in the
fixed architecture when a failure has occurred in a given
11
-
[Figure 5. Fixed vs. Pooled Designs with Equal Numbers: both architectures use 24 processors with triad voting (A1, A2, A3; B1, B2, B3; etc.) and six spares; in the fixed design each spare is bound to one task group, in the pooled design any spare can replace any processor.]
12
-
task group, no further failures can be tolerated in that
group because its only available spare has been used up.
Thus, the entire system can tolerate only one random failure
with guaranteed integrity of all six functions. The pooled
architecture, on the other hand, can tolerate up to six
random failures because all of its spares are free to be
assigned wherever they are needed throughout the system.
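The triad fault detection and isolation step described above amounts to a majority vote over the three redundant results; a minimal sketch (function name and return convention are ours):

```python
# Hypothetical sketch of triad voting: three processors compute the same
# result; the odd one out is flagged as failed and replaced with a spare.
from collections import Counter

def triad_vote(a1, a2, a3):
    """Return (voted_output, index_of_disagreeing_processor_or_None)."""
    results = [a1, a2, a3]
    (value, count), = Counter(results).most_common(1)
    if count == 3:
        return value, None                    # unanimous: no fault detected
    if count == 2:
        bad = next(i for i, r in enumerate(results) if r != value)
        return value, bad                     # majority masks a single fault
    raise RuntimeError("no majority: more than one fault in the triad")

output, failed = triad_vote(42, 42, 41)       # the third processor disagrees
```

Note the single-fault assumption built into the vote: with two simultaneous faults in one triad there is no majority, which is why the pooled design's ability to re-form triads from a shared spare pool matters.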
Because of its many benefits and great versatility, the
pooled processor approach was selected for use in this
study. Given that decision, and referring again to Figure 3,
there are at least three ways of implementing a pooled
processor architecture. These three approaches include
"cold spares", "hot spares", and "continuous
reconfiguration." In each case a pool of spare processors
is maintained to replace failed processors. The difference
is in the way that the spares are brought on line.
Cold spares are the simplest approach to the problem.
A pool of idle spares is maintained and, when a failure
occurs, one of the spares is loaded with whatever data and
software it needs to perform the missing function and is
then brought on line. This is an acceptable method when the
system involved is not real-time and a brief interruption
during reconfiguration is unimportant. Unfortunately, in
real-time systems the delay involved in "warming up" a cold
spare is often unacceptable and may even be disastrous.
An obvious solution to the cold spare problem is to
maintain a pool of "hot spares." That is, a pool containing
spares which are already loaded with all the software and
current data needed to come on line immediately after a
failure is detected. One hot spare is maintained for each
function in the system and the remaining spares are left "cold." When a hot spare switches on line, a cold spare is
13
-
"warmed up" to replace it so that a hot spare is maintained
for every function as long as the supply of cold spares
lasts.
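The hot/cold spare bookkeeping just described can be sketched in a few lines; the data structures and names here are illustrative assumptions, not a design from the report.

```python
# Hypothetical sketch of the hot-spare policy: when a function fails, its
# hot spare switches on line immediately, and a cold spare is "warmed up"
# to take the hot spare's place for as long as the cold pool lasts.

def handle_failure(function, hot_spares, cold_spares):
    """hot_spares: dict mapping function -> ready spare; cold_spares: list."""
    replacement = hot_spares.pop(function)        # hot spare goes on line now
    if cold_spares:
        hot_spares[function] = cold_spares.pop()  # warm up a cold spare
    return replacement

hot = {"pitch": "P7", "roll": "P8"}
cold = ["P9", "P10"]
on_line = handle_failure("pitch", hot, cold)      # "P7" takes over at once
# "P10" is now warming up as the pitch function's new hot spare.
```

The sketch also makes the drawback visible: every function permanently ties up one entry in `hot_spares`, which is exactly the dedication of spares that continuous reconfiguration avoids.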
The hot spare approach is a great improvement over cold
spares and is entirely adequate for most applications. Its
chief drawback is the large number of spares required to
ensure that every function has its own hot spare. Once a
processor has become a hot spare it is essentially dedicated
to one function. This is contrary to the goal of truly
pooled resources. In addition, although the switch-in time
is much improved, reconfiguration is still treated as an
emergency requiring special processing and introducing
delays and reconfiguration transients. What is needed is an
approach which requires a minimum number of spares, produces
no reconfiguration delays, and avoids the dedication of
spares to specific functions. Continuous reconfiguration is
such an approach.
D. THE CONCEPT OF CONTINUOUS RECONFIGURATION
Continuous reconfiguration is defined as a scheme
whereby the tasks to be performed in a multi-processor
system are dynamically redistributed among all functioning
processors at or near the minor frame rate of the overall
system. This approach allows continuous spare checkout,
latent fault protection, and elimination of failure
transients due to reconfiguration delay. By treating
reconfiguration as the norm rather than the exception,
failures can be handled routinely rather than as
emergencies, resulting in predictable failure mode behavior.
Using this approach, it is projected that the need for
unscheduled system maintenance may be greatly reduced.
14
-
Example Of Continuous Reconfiguration
An example of what is meant by continuous
reconfiguration is shown in Figure 6. A system of nine
processors is shown performing six different tasks, A thru
F, during three consecutive time frames. During the first
time frame processor 1 is doing task B, processor 2 task D,
processor 3 is a spare, and so on. In continuous
reconfiguration the tasks are redistributed among the
processors at the beginning of every time frame. For
example, in the second time frame, there is an entirely
different assignment of tasks to the processors. This
reassignment is accomplished by having all of the processors
that are currently healthy in the system compete for task
assignments. If a processor fails during any time frame, it
is no longer able to compete for task assignments and is
thereby automatically removed from the system. In Figure 6,
if processor 4 failed during the second time frame, then
during the next frame, it would not be able to compete for
task assignment. The six tasks which need to be done are
taken by healthy processors and the two remaining processors
become spares (Figure 6c). In other words, a failed
processor simply disappears from the system without any
other processors being aware that it is gone.
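The frame-by-frame redistribution described above can be sketched in a few lines. This is an illustrative model only: the report's actual task assignment rules are a software topic deferred to Section IV, and here a simple rotation stands in for the competition among processors (so the particular assignments will not match Figure 6).

```python
# Sketch (not the report's mechanism): healthy processors are
# reassigned tasks at the start of every frame; a failed processor
# simply stops competing and disappears from the system.

def assign_tasks(healthy, tasks, frame):
    """Return {processor: task-or-'spare'} for one time frame.
    Rotating the order by the frame number models the competition."""
    k = frame % len(healthy)
    order = healthy[k:] + healthy[:k]
    return {proc: (tasks[i] if i < len(tasks) else "spare")
            for i, proc in enumerate(order)}

tasks = ["A", "B", "C", "D", "E", "F"]
healthy = list(range(1, 10))          # nine processors, ids 1..9

frame1 = assign_tasks(healthy, tasks, 0)
healthy.remove(4)                     # processor 4 fails silently
frame2 = assign_tasks(healthy, tasks, 1)

assert set(tasks) <= set(frame2.values())   # all six tasks covered
assert 4 not in frame2                      # failed unit is gone
assert list(frame2.values()).count("spare") == 2
```

Processor 4's failure requires no special action: it is simply absent from the next frame's competition, and the six tasks shift to healthy processors, leaving two spares.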
Assuming for the moment that it is possible to
implement such a system efficiently, this scheme presents a
number of distinct advantages. These advantages will be
discussed next.
Advantages Of Continuous Reconfiguration
The primary motivation for developing a continuous
reconfiguration scheme is to allow a multi-processor system
to detect and recover from random failures with no effect on
its performance. This is a major problem with
6a. Time Frame 1
6b. Time Frame 2
6c. Time Frame 3
Figure 6. Continuous Reconfiguration
reconfiguration in general since the process of shutting
down a failed processor and starting up a spare almost
invariably causes a short delay during which the output of
the system is incorrect. This period of time is called a
reconfiguration delay and the result is a failure transient
which can be disastrous at the system output. The main
reason for these delays is that most reconfigurable systems
treat failures as emergencies requiring special actions
which take time. In a continuously reconfiguring system,
reconfiguration is the norm, not the exception. Task
reassignment is regularly scheduled at frequent intervals so
that when a failure does occur the system takes it in stride
without missing a beat.
Figure 6c shows how this works. In the figure,
processor 4 has failed but the rest of the system doesn't
notice it. The task that processor 4 was performing in the
previous frame (Figure 6b) has been automatically reassigned
to some other processor and the only effect is the net loss
of one spare. There has been no reconfiguration delay and
therefore no transients due to reconfiguration. This is the
first major advantage of continuous reconfiguration.
Eliminating reconfiguration delay by itself cannot
guarantee that there will be no failure transients at the
output. A second source of these transients is failure
detection delay. For example, in Figure 6 processor 4 may
have been generating bad data for some time before its
failure was detected in frame 2. Without continuous
reconfiguration this stream of bad data would go to a single
output device (rudder, aileron, display, etc.) causing a
significant transient in that particular device. With
continuous reconfiguration, the bad data goes to a different
device every frame depending upon which task the processor
is doing at that time. Since real world dynamics are
usually much slower than the computer's frame rate, the
aircraft will simply not respond to a single sample of bad
data. By moving the bad data around from surface to
surface, failure effects can be kept insignificant until
failure detection occurs. This dispersion of failure
effects is the second major advantage of continuous
reconfiguration.
A discussion of how to rapidly detect and isolate
failures and how to prevent any bad data from reaching the
system outputs will be presented in Section III when the
triad structure is introduced.
A third advantage of the continuous reconfiguration
approach is latent fault protection. Latent faults are a
class of faults that are inherently undetectable because
they produce no noticeable change in system performance or
output. This would seem to be no cause for alarm since a
fault which produces no error would appear to be harmless.
However, a latent fault can be very dangerous if it impairs
the system's ability to tolerate subsequent failures. For
example, if processor K fails in such a manner that its
outputs are correct but it is no longer able to check
processor D, then the system will continue to function
normally while the fault in K remains unobservable. If
processor D should fail, and the system is depending upon K
to detect it, a catastrophic system failure may result. The
continuous reconfiguration scheme avoids the problem because
the processor responsible for checking any other processor
changes with every frame. Thus, no dangerous combination of
failures is allowed to exist for more than one frame at a
time. The possibility of a "deadly embrace" between two
partially failed processors is also avoided.
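One way to picture this rotation of checking responsibility is a shifting checker assignment that pairs every processor with a different checker each frame. The sketch below is a hypothetical illustration, not the report's mechanism:

```python
# Sketch: rotate which processor checks which, so a latent fault in
# any checker is paired with any given target for only one frame.

def checker_of(procs, frame):
    """Map each processor to the one that checks it this frame."""
    n = len(procs)
    shift = 1 + frame % (n - 1)       # never zero: no self-checking
    return {procs[i]: procs[(i + shift) % n] for i in range(n)}

procs = list(range(1, 10))
pairs = set()
for frame in range(8):
    m = checker_of(procs, frame)
    assert all(p != c for p, c in m.items())   # nobody checks itself
    pairs.update(m.items())

# Over eight frames, every processor is checked by all eight peers,
# so no failed checker/failed target pair survives more than a frame.
assert len(pairs) == 9 * 8
```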
Finally, continuous reconfiguration allows the constant
checkout of all processors because no processor serves as a
spare for more than one frame at a time. In an ordinary
reconfigurable system, where certain processors are always
spares, there is a danger that one of the spares will fail
before it is needed. If this happens, a disaster may result
when the failed spare is used in an emergency. The problem
is analogous to changing a tire and discovering that the
spare is flat. By constantly "rotating the tires" in a
continuously reconfiguring system, failures in any processor
can be detected as soon as they occur.
In this section it has been shown that there are four
major advantages to the continuous reconfiguration approach.
These include (1) zero reconfiguration delay, (2) dispersion
of failure effects, (3) latent fault protection, and (4)
continuous spare checkout. Because of these advantages and
their potential contribution to system reliability,
continuous reconfiguration was selected as the method to be
used for managing the pooled multi-microprocessor
architecture developed in this program. The next section
looks at some of the problems involved in implementing a
continuously reconfiguring system.
Controlling A Continuously Reconfiguring System
A unique approach was taken for controlling the
continuously reconfiguring multi-microprocessor flight
control system. The traditional approach would have been to
have a central controller in charge of assigning tasks,
handling reconfiguration and controlling bus access.
Unfortunately, a central controller introduces the
possibility of a single point failure in the system,
requiring redundancy that is incompatible with the
architecture and that reduces the reliability of the
continuous reconfiguration concept.
An alternative to a central controller is the
autonomous control approach. This is a scheme whereby each
processor independently determines its own next task based
upon the current aircraft state. This can be better
understood by using an analogy. A company, like the
traditional centrally controlled computer architecture, has
a president with several vice-presidents working for him.
The president has access to all information concerning the
states of the company and an understanding of how the
company should function. He uses this knowledge to allocate
tasks to the vice-presidents and arbitrate any disagreements
that may arise between them. Autonomous control is analogous
to replacing each of the vice-presidents with a clone of the
president. The vice-presidents are now capable of making
the same decisions that the president would have made under
the same circumstances, since each has access to the same
data and would go through the same decision making process
that he would. The need for the president has been
eliminated and he has been replaced by autonomous
vice-presidents. This approach is not practical in the human
world because no two humans think alike. In the computer
world, however, it is a realizable possibility.
Requirements For Continuous Reconfiguration
In order to make continuous reconfiguration of
autonomously controlled processors possible, several
requirements must be satisfied. These requirements include
well-defined task assignment rules, availability of all
system state information to all processors, availability of
all software to every processor, and an efficient bus
contention scheme. The methods used to meet each of these
requirements in the laboratory implementation are covered in
detail later in this report.
The first requirement is for a set of well-defined task
assignment rules. Each of the processors must have an
efficient means of determining the next task that it is
required to do. There must not be an opportunity for any
processor to conflict with other processors in the system
and cause system failures. The task assignment rules are a
function of the operating system software and are discussed
in detail in Section IV.
A second requirement is that all processors must have
all software. In order for a processor to be capable of
doing any system task at any point in time, it must have the
software available to do the task. This may seem
unrealistic at first but the trends in memory technology
indicate that memory may be expected to double in density
several times in the next five years while its cost
continues to decrease. This trend makes supplying all
software to every processor a reasonable exchange for the
benefits offered by the CRMmFCS.
A third requirement of this system is that all
processors must have all data. Since any processor must be
capable of doing any task at any point in time, each
processor must have access to all data concerning the
present state of the aircraft. This requirement has been
met by the development of a virtual common memory
architecture which allows every processor to access any
piece of data by what appears to be a simple read from a
shared common memory. This concept will be discussed in
detail in Section III.
Finally, if all processors are to operate independently
and yet share the same set of data busses for communication,
some method must be found for them to agree on who can talk
on a bus at any given time. Since central control of any
kind is not allowed in a fully distributed architecture,
this must be done without use of a central bus controller.
The scheme selected must also be very efficient since bus
bandwidth will be at a premium in systems with many
processors. This requirement for an efficient, autonomous
bus contention scheme was satisfied through a new approach
called "transparent contention." It will also be discussed
in Section III.
E. SECTION SUMMARY
This section has presented an overview of the CRMmFCS
architecture and the philosophy behind it. The concepts of
continuous reconfiguration, autonomous control, transparent
contention, and virtual common memory have also been
introduced. In the next two sections, these ideas will be
discussed thoroughly from both hardware and software points
of view. In the process, every major component of the
CRMmFCS system will be described in enough detail to give
the reader a complete understanding of the overall design.
Numerous references to the appendices will be made along the
way to aid the interested reader in an even more detailed
study of the architecture.
SECTION III
HARDWARE ARCHITECTURE
A. INTRODUCTION
The CRMmFCS architecture consists of a collection of
autonomous microcomputers interconnected by a set of serial
multiplex busses so that they appear to share one common
memory through which they communicate. This section
presents the details of how the system was designed from a
hardware point of view. Section IV will address the same
subject from a software perspective.
One of the main requirements of the CRMmFCS design was
the elimination of any form of central control. This meant
that processors had to independently determine their own
task assignments and that some means was required to manage
communication without a bus controller. The problem of task
selection was solved with software in the CRMmFCS and is
discussed in Section IV. Autonomous bus control, on the
other hand, had a convenient hardware solution. It is
therefore discussed in this section.
The CRMmFCS data bus is fundamental to the entire
architecture. The need for an autonomously controlled bus
which would act like a common memory influenced the design
of every system component. For this reason, the hardware
elements of the architecture will be discussed in terms of
their relationship to the global bus design.
B. DISTRIBUTED CONTROL OF A MULTIPLEX BUS
One of the most fundamental questions which must be
asked when designing a system of autonomous processors is
how they will communicate with each other. Direct
connection between a large number of processors is clearly
impractical since the number of lines required for n
processors is n(n-1)/2 (which rapidly becomes very large).
A common serial multiplex bus is a more reasonable
alternative, but it introduces the problem of bus traffic
control: How does one resolve processor contention for the
bus without resorting to a central controller? This section
presents a promising solution to the problem.
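As a quick arithmetic check on the n(n-1)/2 figure quoted above (a minimal sketch, not part of the original report):

```python
# Point-to-point lines needed to directly interconnect n processors:
# each of n units links to the other n-1, and each line is shared.
def full_mesh_lines(n):
    return n * (n - 1) // 2

assert full_mesh_lines(2) == 1
assert full_mesh_lines(9) == 36      # even nine processors need 36 lines
assert full_mesh_lines(32) == 496    # rapidly becomes very large
```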
Other Approaches
In order to place this new solution in proper
perspective, it is helpful to briefly review several
existing bus control schemes. Three such schemes will be
discussed including a well known central control approach
and two experimental distributed control methods.
The classical central control approach is the MIL-STD-
1553 class of busses. Using this scheme, each bus is set up
with a central controller and every processor in the system
is considered to be a "remote terminal." Any given
processor can talk on the bus only when instructed to do so
by the central controller. This provides secure and
flexible control of global bus resources, but has a number
of limitations. First, a processor wishing to transmit must
wait until the controller gives it permission, resulting in
some inherent throughput delay. Second, there is even more
delay involved for one processor to obtain data from
another. This is because it must request the data through
the controller and wait until the other processor receives
the request, looks up the answer, and sends it back.
Finally, the system does not make efficient use of bus
bandwidth because part of the available transmission time is
used up in the overhead of bus control. Thus, the 1553 bus
is simple and reliable but somewhat limited in performance.
Because it requires a central controller, it also violates
the assumed goal of a fully distributed system design.
It should be mentioned in passing that there are
1553-based designs which claim to implement distributed
control (Reference 3). Such claims are true only in a
limited sense. While control of the bus may be
distributed in such systems, at any given time there is
still only one central controller operating in the
traditional command/response mode. For the purposes of this
report, distributed control will imply free and open
competition for the bus under a set of rules which all
participants obey. At no time is such contention arbitrated
at any central location, even if the location does move
around the system.
There are at least two examples of truly distributed
bus control schemes already in existence. The following
paragraphs summarize some of the interesting features of
each approach but no attempt will be made to cover them
comprehensively. The reader may consult the indicated
references for further details.
The first approach, found in the University of Hawaii
"Aloha" architecture (Reference 4), allows processors to
transmit on the bus any time it is available. In the event
that more than one processor starts at the same time, a
"collision" is said to have occurred and they all stop
sending immediately. Each then waits a slightly different
interval before attempting to retransmit. The one which
"times out" first gains access to the bus and completes its
message while all other processors wait until the bus is
once again available.
This approach avoids the need for a central controller,
but has the limitation that some time is wasted while
colliding processors "time out." This becomes serious when
demand for the bus is high and collisions are frequent.
Another approach to distributed bus control is used by
Honeywell (Reference 5). In this approach, all processors
take turns using the bus for a fixed amount of time in a
rotating fashion. If a processor has data to transmit, it
waits for its turn and then sends it all in a burst of some
maximum number of words. If it has nothing to transmit when
its turn arrives, it sends a null word and the system moves
on to the next transmitter. This is a fairly efficient
scheme, although some time is wasted for null messages. The
only real difficulty is keeping track of whose turn it is.
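The rotating-turn scheme can be sketched as follows; the queue contents and the null-word convention are illustrative assumptions, not details of the Honeywell design:

```python
# Sketch: each processor gets the bus in turn and sends a null word
# (modeled as None) when its transmit queue is empty.
def round_robin(queues, slots):
    """Return (processor, word) for `slots` consecutive bus turns."""
    out = []
    for t in range(slots):
        p = t % len(queues)                 # whose turn it is
        out.append((p, queues[p].pop(0) if queues[p] else None))
    return out

log = round_robin([["a1", "a2"], [], ["c1"]], 6)
assert log == [(0, "a1"), (1, None), (2, "c1"),
               (0, "a2"), (1, None), (2, None)]
```

The None entries show where bus time goes to empty turns, which is exactly the inefficiency noted above.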
Improving Bus Efficiency
While each of the approaches mentioned above has been
made to work effectively, there are still a number of things
which can be done to improve bus utilization efficiency.
These possibilities include:
(1) Scheduling transmissions to avoid periods of bus
inactivity or overload.
(2) Forming a queue of data to be transmitted in
every processor so that every available
microsecond on the bus is in use.
(3) Making sure there is no wasted time between
transmissions.
(4) Making sure there are no wasted transmissions
due to collisions.
(5) Transmitting data when it is generated instead of
waiting until it is needed (thereby reducing
access delays).
These five criteria served as design guides for the
approach to be presented. In the remainder of this report,
it will be shown how these goals have been achieved using
the concepts of "transparent contention" and "virtual common
memory."
A New Approach to the Problem
This section describes a new approach to autonomous bus
control designed to meet the goals listed above. The
following is an overview of how the idea works.
Time on the bus is divided into a series of consecutive
intervals (slots) that are exactly one transmission word
long (32 to 46 bits, depending on word format). At the
beginning of each new slot, all processors compete to fill
the slot with a word of data. The resulting massive bus
collision is then resolved using "transparent contention."
Transparent contention is a scheme which allows collisions
to occur on the bus in a manner such that only one of the
colliding messages survives. All other messages are
automatically suppressed without wasting a single bit of
transmission time. As a result, the slot is filled with one
and only one word and competition moves on to the next
available interval.
As long as there is data available to transmit, this
approach packs data onto the bus with absolutely maximum
density. No time is wasted during transmissions and no time
is wasted between them. One hundred percent efficient bus
utilization has been achieved.
In order to ensure that there is always data available
for transmission, each processor maintains a queue of words
to be transmitted. As each new piece of data is generated,
the processor places it into a first-in first-out buffer
(FIFO) and "forgets about it." A special transmitter
circuit then empties the FIFO onto the bus by competing for
time slices with all other transmitters in the system. This
frees the processor from transmission considerations and
ensures a constant flow of data onto the bus.
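This fire-and-forget split between processor and transmitter can be modeled roughly as below; the class and method names are hypothetical, and winning a time slot is stubbed out with a flag:

```python
# Sketch: the processor enqueues words and "forgets about" them; a
# broadcaster circuit drains the FIFO whenever it wins a bus slot.
from collections import deque

class Broadcaster:
    def __init__(self):
        self.fifo = deque()               # first-in first-out buffer

    def enqueue(self, word):
        self.fifo.append(word)            # processor side: fire and forget

    def try_transmit(self, slot_won):
        """Transmitter side: called once per bus time slot."""
        if slot_won and self.fifo:
            return self.fifo.popleft()
        return None

b = Broadcaster()
for w in (0x1A, 0x2B, 0x3C):
    b.enqueue(w)

# Suppose this transmitter wins only the even-numbered slots.
sent = [b.try_transmit(slot % 2 == 0) for slot in range(6)]
assert [w for w in sent if w is not None] == [0x1A, 0x2B, 0x3C]
```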
There are, of course, potential problems with this
approach. If data is not generated fast enough, it is
possible for all buffers to become empty resulting in unused
time slots on the bus. This is of no concern unless there
are other times when too much data is generated resulting in
backlogs and throughput delays. It is therefore important
to schedule data generation in the system such that an even
rate of transmission is maintained. A technique for
scheduling data flow is presented later in this section.
All that remains now is to explain the details of
transparent contention. The following section discusses how
it can render bus collisions harmless. Subsequent sections
will present details on how to build and use such a system.
C. TRANSPARENT CONTENTION
This section presents the theory of operation behind
transparent contention. While almost any bus configuration
can make use of the idea, one specific design was chosen
because of its ease of implementation in the laboratory.
This design is covered first in order to clarify subsequent
discussion of the transparent contention concept.
The approach selected is nothing new. It amounts to
nothing more than clocking data out of one shift register
across a serial bus and into another. What is unique is
the manner in which this process is controlled to prevent
conflicts on the bus between contending transmitters. In
order to understand this process, it is helpful to review
how a single transmitter operates when there is no
competition from other transmitters.
Transmission Without Contention
The essential elements of the bus architecture are
shown in Figure 7. In the figure, three processing modules
are shown interconnected by a common serial bus made up of a
data line and a clock line. Each processing module consists
of an ordinary microcomputer with two I/O devices including
a broadcaster (B) and a receiver (R). These devices use the
signal on the clock bus to shift data on to and off of the
data bus respectively. The box labeled "T" in the figure is
a bus termination circuit which generates the clock signal
and terminates the bus properly (See Appendix C).
Using this simple bus structure, a word of data is
transmitted in the following manner. The processor wishing
to transmit first places its information in its local
broadcaster FIFO. If the bus is available (as we assume in
this section), the broadcaster immediately latches the FIFO
output into a serial shift register and shifts it onto the
data bus with each positive-going edge of the bus clock. On
each negative-going edge, a bit on the bus is shifted into
receiving shift registers in every processor. From there,
the complete word is moved into local memory in each
processor using direct memory access. This technique will
be discussed later in the section on "Virtual Common Memory
Design."
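The receive path just described — every broadcast word shifted into every processor and moved to local memory by DMA — is what later makes the bus look like a shared memory. A rough software model, with hypothetical names:

```python
# Sketch: every word broadcast on the bus is written, via DMA, into
# a local mirror of "common memory" held by every processor, so a
# later read is just a local memory access.
class Receiver:
    def __init__(self):
        self.local_memory = {}               # this processor's mirror

    def on_bus_word(self, address, value):
        self.local_memory[address] = value   # DMA write, no CPU work

receivers = [Receiver() for _ in range(3)]

def broadcast(address, value):
    for r in receivers:                      # the bus reaches everyone
        r.on_bus_word(address, value)

broadcast(0x10, 42)
assert all(r.local_memory[0x10] == 42 for r in receivers)
```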
(three microcomputers, each with a broadcaster B and receiver R,
connected to a common clock and data bus with termination circuit T)
Figure 7. Essential Architecture Elements
Transmission With Contention
The question now arises, "What happens when more than
one processor wishes to use the bus at the same time?" The
answer is simply, "one of them wins." Exactly which one
wins is determined by a special logic circuit in each
transmitter which resolves the conflict. Its operation is
described below.
In the first place, access to the bus is granted on a
first-come, first-served basis. While one processor is
actively using the bus, a logical BUSY signal is maintained
which prevents any other processor from initiating a
broadcast (See Appendix A). This eliminates many
conflicts, but sooner or later more than one transmitter
will begin using an available bus on the exact same clock
pulse. When this happens, some other method is required to
resolve the contention.
The solution is found by observing what actually
happens when two transmitters put data on the bus at the
same time. As shown in Figure 8, each transmitter is
connected to the bus by an open collector transistor buffer.
When a transmitter wants to send a "zero", it turns on its
output transistor, shorting the bus to ground. To transmit a
"one" the transistor is turned off, allowing the bus to be
pulled high by the pull-up resistor. As long as all
transistors are turned off, the bus will float at a logic
"1". If any transistor turns on, the bus will be pulled to
a logic "0" state.
The net result is that logic zeros have an inherent
priority on the bus. Because a "1" is transmitted by
"letting go" of the bus (so it will float high) while a "0"
is transmitted by actively pulling the bus low, units
(reference voltage and pull-up resistor on the bus; transmitters A
and B drive it through open-collector transistor buffers)
Figure 8. Transmitter-Bus Interface
transmitting zeros will always win out over those sending
ones. It is this characteristic which allows transparent
contention.
The key to the idea is that every transmitter
constantly compares what it is trying to put on the bus with
what is actually there. In the event of a disagreement, the
transmitter simply stops sending and waits for the bus to
become available again. What makes this approach work is
that when any two transmitters disagree, only one of them
notices and drops off. The other one does not notice
(because it got its way on the bus) and therefore continues
its transmission. No bus time is wasted because one message
is finished without interruption.
At this point an example is helpful. Suppose two
broadcasters begin to transmit on the same clock pulse as in
Figure 9. Transmitter one attempts to send the binary
sequence 01001 while transmitter 2 sends 01101. During the
first microsecond, both pull the bus low and observe a zero
on the bus. Since that is what they wanted, they continue
to transmit. During the next microsecond, both transmitters
"let go" of the bus allowing it to float high. They each
observe a logic 1 and, satisfied, continue to transmit.
However, during the third interval, transmitter 2 releases
the bus to let it float high while transmitter 1 actively
pulls the bus low. They both observe a zero on the bus.
Since that is what number 1 wanted, it continues to
transmit. Number 2, on the other hand, does not get its
desired "one" and concludes that some other transmitter has
pulled the bus low. It therefore aborts its transmission
and waits for the bus to become available again.
The net result is that transmitter 1 successfully
completes its transmission from start to finish without
interruption while transmitter 2 aborts as soon as the two
time (microseconds):              1  2  3  4  5
transmitter 1 desired output:     0  1  0  0  1
transmitter 2 desired output:     0  1  1  (aborts)
actual output appearing on bus:   0  1  0  0  1
Figure 9. Bus Contention Arbitration
disagree. No transmission time was lost and, in fact,
transmitter 1 was never even aware of its competition.
"Transparent contention" has been achieved.
This concept works equally well for any number of
transmitters in contention. If ten of them start
simultaneously, they all send in parallel until there is a
disagreement. At that time those sending zeros win while
those sending ones drop off. The remaining transmitters
continue until the next conflict at which time still more
losers drop off. Eventually, only one transmitter is left
and it finishes its transmission, completely unaware of its
nine vanquished competitors.
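The wired-OR arbitration lends itself to a compact software model. In the sketch below (an illustration, not the laboratory hardware), words are bit strings and the bus level in each bit time is simply the minimum of the active outputs, since a "0" always dominates a "1":

```python
# Sketch of transparent contention: a transmitter drops off at the
# first bit where the bus disagrees with its desired output; the
# survivor's word reaches the bus intact.

def arbitrate(words):
    """Bit-serial contention among equal-length bit strings.
    Returns (word appearing on bus, indices still active)."""
    active = list(range(len(words)))
    bus = ""
    for i in range(len(words[0])):
        bit = min(words[a][i] for a in active)   # '0' dominates '1'
        bus += bit
        active = [a for a in active if words[a][i] == bit]
    return bus, active

# The example from Figure 9: transmitter 2 aborts at the third bit.
bus, winners = arbitrate(["01001", "01101"])
assert bus == "01001" and winners == [0]

# Ten-way or three-way contention works the same way.
assert arbitrate(["111", "101", "100"]) == ("100", [2])
```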
This approach is based upon the assumption that no two
transmitters will ever try to send identical words at the
same time. If this coincidence should occur, each processor
would assume that its own broadcast was successful and only
one copy of the word in question would appear on the bus.
This may or may not be tolerable depending upon how
information appearing on the bus is used.
A more significant consideration is the event in which
two words being transmitted fail to disagree until near the
end of the word. At that time the losing processor would
abort its transmission, but only after having wasted its
time sending most of the word. In a system with only one
bus this is unimportant since the losing transmitter would
have nothing to do but wait for the bus anyway. But in a
system with n busses (to be discussed next), it is desirable
for a transmitter to find out if it is going to lose as soon
as possible so that it can begin searching for another bus.
If these considerations are important, there is a
simple solution. Each transmitter adds its own unique
identification code to the beginning of each message. In a
system with 16 processors, this code would be 4 bits long.
Five bits would allow up to 32 processors, and so on. Using
this method, two processors are guaranteed to disagree
within the ID field, freeing the loser to seek another
bus. This approach has the added benefit that it is possible
to determine which processor initiated each broadcast for
fault isolation purposes.
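The effect of the ID prefix can be checked directly. The 4-bit width and the payloads below are illustrative assumptions:

```python
# Sketch: prefixing each word with a unique processor ID forces any
# two contenders to disagree within the ID field, since distinct IDs
# differ in at least one bit.

def frame(proc_id, payload, id_bits=4):
    return format(proc_id, f"0{id_bits}b") + payload

def first_disagreement(a, b):
    return next(i for i in range(len(a)) if a[i] != b[i])

f1 = frame(5, "110010")       # hypothetical 6-bit payloads
f2 = frame(9, "110010")       # identical payloads, different IDs
assert first_disagreement(f1, f2) < 4   # resolved inside the ID field
```

A side effect, since a "0" dominates the bus, is that ties between otherwise identical frames always go to the numerically lower ID.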
An Extension to n Busses
The bus structure which has been discussed so far
represents a very simple way to interconnect a large number
of autonomous processors without need of a central
controller. However, a single bus system of any kind is
generally unacceptable from a reliability standpoint. At
the very least, some form of redundancy is required in order
to avoid a potential single point failure node in the
system. Also, a single serial bus has only a finite
bandwidth. A large system of processors exchanging massive
amounts of data can quickly saturate such a bus. The
approach proposed in this report is ideally suited for
expansion to as many busses as are needed to meet the
reliability and throughput requirements of nearly any
system. The following paragraphs detail the implementation
and advantages of an autonomously controlled n-bus design.
Figure 10 shows the transmitter interface of a single
processor in a system with four busses (4 sets of clock and
data lines). The circuit is controlled by the box labeled
"transmission control logic." Upon receiving a "START"
signal (from the CPU), this logic instructs the "bus finder"
to "SEARCH" for a free bus. When it finds one, it locks two
data selectors and a data distributor onto the bus (using
its "BUS SELECT" lines) and signals the control logic that
it has "FOUND" a bus. The control logic then loads the
shift register with data from the CPU output buffer and
(four clock and data busses; bus-available detector and bus finder,
4-to-1 clock and data selectors, 1-to-4 distributor, transmission
control logic, shift register, and CPU with start signal)
Figure 10. Transmitter Circuit
enables the shift register clock. Data is shifted (using
the appropriate bus clock) out through the 1 to 4
distributor onto the selected data bus using the same
open-collector transistor buffers shown in Figure 8. As
always, a one-bit comparator monitors the difference between
the shift register (desired) output and the actual output on
the selected bus. If there is ever a miscompare, an "ABORT"
is generated and the transmission control logic instructs
the bus finder to locate another bus. This process
continues until the transmitter is successful at placing its
entire word on the bus, at which time another word is
obtained from the CPU buffer and the cycle begins again.
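The control flow just described — search for a free bus, transmit, abort on any miscompare, search again — reduces to a simple loop. The sketch below stubs the comparator with a predicate and is illustrative only:

```python
# Sketch of the transmission control logic across n busses: try each
# bus in turn; `miscompare(bus_id)` stands in for the one-bit
# comparator (True means this transmitter lost contention there).

def transmit(word, busses, miscompare):
    for bus_id in busses:
        if not miscompare(bus_id):
            return bus_id              # entire word placed on this bus
        # ABORT: the bus finder locates another bus and tries again
    return None                        # no bus carried the word yet

# Lose contention on busses 0 and 1, then succeed on bus 2.
assert transmit(0b10110, [0, 1, 2, 3], lambda b: b < 2) == 2
```

In the hardware this search repeats until the word gets through; returning None here simply marks a pass in which every bus was lost.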
This bus design has tremendous flexibility. Its
bandwidth is exactly four times that of a single bus and can
be expanded still further with additional busses.
Reliability is also enhanced. Because processor to bus
connections are continuously reconfiguring, selection of an
alternate bus in the event of a failure is instantaneous and
automatic.
There are, however, a few physical limitations which
remain to be resolved. The first is that there is a limit
to how many open-collector transistors can be "wire-ored" to
one bus before the sum of their leakage currents pulls the
bus low even if no transistor is on. This limit can be
increased using low leakage transistors, but can never be
totally ignored. Noise considerations on such a bus will
also require further research. For the present, the triplex
data approach described in Section IV will be relied upon to
correct for noise-corrupted transmissions.
Another consideration is the effect of propagation
delays on the output of each transmitter comparator. The
fact that desired and actual outputs match at one location
is no guarantee that the same holds true many feet away on
the bus. This problem is avoided for reasonable line lengths
by the manner in which data is clocked onto the bus. Data
is shifted onto the bus on the rising edge of each clock
pulse, but the comparator's output is not sampled until the
falling edge. This allows one half of a microsecond for the
data to settle before it is used.
Finally, the open collector transistor implementation
is only one approach to the transparent contention concept.
Any technique where one logic state wins out over another
will work. In the case of fiber optic busses, for example,
the presence of light on the bus could be made to win out
over its absence, and so on. For the purposes of concept
demonstration in the laboratory, the wired-or approach has
been shown to work very well.
D. VIRTUAL COMMON MEMORY
Up to this point, discussion has centered around the
transparent contention concept and its physical
implementation. In this section an actual application is
presented, allowing the development of what is called "virtual common memory."
The Virtual Common Memory Concept
One of the main problems that occurs in the design of
multiprocessor systems is how to distribute and exchange
data efficiently. From a hardware standpoint, the easiest
approach is usually to connect all processors to a common
serial multiplex bus (Figure Ila). This minimizes hardware
complexity and allows expandability, but often involves a
large software overhead. This is because processors must
[Figure 11. Evolution of Virtual Memory: a. common data bus; b. common memory; c. virtual memory]
exchange data on a "request" or "broadcast" basis, both of
which require special handling by every processor in the
system.
In the "request" mode of operation, a processor that
needs a piece of information simply asks for it on the bus
and receives it from some other processor a short time
later. This means that each processor must constantly
monitor the bus for data requests rather than concentrating
upon the task to which it has been assigned. Because
processors must wait for much of their data, processing
inevitably takes longer than it would if all data were
available in local memory at the start of a given task.
Additional processing time is also wasted in responding to
requests for data from other system processors.
The "broadcast" method is an alternate approach to data
exchange where every piece of data is transmitted as soon as
it is produced. Each processor then selects from the bus
whatever information it needs to accomplish its current
task. This approach also requires a lot of overhead as each
processor must now constantly monitor the bus for items of
local interest.
Thus, the common serial multiplex bus, while the
simplest and most flexible from a hardware point of view, has
serious drawbacks in terms of software complexity.
The simplest and most efficient approach to
interprocessor communication from a software point of view
is a common memory containing all information required by
all processors (Figure llb). In such a system, a processor
stores its output in the common memory where it can be
instantly accessed by any other processor in the system.
When one processor needs information from another, it simply
reads it from the common memory without delay.
Unfortunately, what is ideal from a software standpoint
is difficult to implement in hardware. Serious contention
problems develop when more than one processor attempts to
access the same block of memory. Since each processor must
be connected to the common memory by a complete set of
address and data lines, the hardware complexity is also
large. Finally, system expandability is impaired. In a
serial bus system more processors can be added by simply
connecting them to the bus, but there is a limit to the
number of ports available in a common memory. When these
have been used, no more can be added without redesigning the
system.
So, it appears that what is good for hardware is bad
for software and vice-versa. Clearly, a scheme that could
combine the best of both approaches is highly desirable.
Virtual common memory is such a solution.
Virtual common memory is a method for making a serial
multiplex bus look like a single common memory to the system
programmer. As such, it combines the hardware simplicity of
a serial bus with the software simplicity of a common memory
(Figure llc). In the following paragraphs the physical
implementation of a virtual common memory containing all
information to be shared among processors will be
described.
Virtual Common Memory Design
This section describes the virtual common memory design
used in the CRMmFCS architecture. The essential features of
this design are shown in Figure 12. In the figure, six
processing modules are shown connected by four serial
multiplex busses of the kind described above. This
represents the system actually being constructed at the
Flight Dynamics Laboratory, although additional busses or
processors could have been included. The number shown is
considered to be enough to demonstrate the overall concept.
In a true shared memory architecture, the common memory
contains all information required by any processor in the
system. This information describes the entire state of the
aircraft, and the memory which contains it is called the
"state information memory" (SIM). When a processor needs a
particular state variable, it accesses a well-defined
location in the SIM. When it generates a variable, it
places it in a specific SIM location where other processors
can find it. No other processing is required for complete
interprocessor communication. A complete discussion of the
SIM concept is presented in Appendix F.
In the virtual common memory design of Figure 12, each
processor is given its own copy of the SIM. To access a SIM
variable it simply looks up its own copy. To store a
variable into the SIM, a processor broadcasts it over the
bus and every processor's copy is updated simultaneously.
Since reading from a local copy is the same as reading from
a common one, and since writing to the bus is the same as
writing to a common memory, as far as any processor is
concerned there is only one "virtual" common memory being
used by everyone.
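The read/write symmetry just described can be sketched in a few lines of C. This is an abstract model, not the report's hardware: the bus is collapsed into a loop that updates every processor's copy, and the function names and array sizes are illustrative.

```c
#define N_PROC 6
#define SIM_SIZE 2048            /* one slot per 11-bit variable name */

/* Each processor holds its own copy of the state information memory
 * (SIM).  A write is broadcast on the bus and lands in every copy at
 * once; a read simply consults the local copy, with no bus traffic. */
static short sim[N_PROC][SIM_SIZE];

void sim_write(int addr, short value)    /* looks like a common-memory write */
{
    for (int p = 0; p < N_PROC; p++)     /* broadcast reaches every copy */
        sim[p][addr] = value;
}

short sim_read(int proc, int addr)       /* looks like a common-memory read */
{
    return sim[proc][addr];
}
```

From the programmer's point of view, `sim_write` and `sim_read` behave exactly as they would against one physical shared memory; the replication is invisible.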
In order to make bus transmissions appear to be reads
and writes on a common memory, two special circuits were
designed. These circuits are described next.
The Transmitter. The transmitter circuit (labeled
"XMIT" in Figure 12) was shown in detail in Figure 10. It
is connected to the microcomputer through a two-page buffer
(P1 and P2) which is memory-mapped to the local CPU. These
[Figure 12. Physical Implementation of Virtual Memory: six processing modules, each with its own local memory and a two-page (P1, P2) transmit buffer, connected by four serial multiplex busses.]
pages alternate functions every millisecond so that, while
one of them is being loaded by the CPU, the other is being
unloaded onto the bus. This dual buffer approach ensures
that all data is transmitted with no more than one
millisecond delay. Its application will be discussed in
greater detail later. Complete details on the transmitter
circuit are given in Appendix A.
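The two-page buffer can be modeled as follows. This is an illustrative C sketch of the swapping discipline only; the structure and function names are assumptions, and the real circuit is documented in Appendix A.

```c
#define PAGE_SIZE 21   /* at most 21 words fit on one bus per millisecond */

/* Two-page transmit buffer: while the CPU fills one page, the transmitter
 * drains the other onto the bus.  Pages swap roles at each 1 ms frame
 * boundary, so no word waits more than one millisecond to be sent. */
struct tx_buffer {
    unsigned page[2][PAGE_SIZE];
    int count[2];
    int cpu_page;                /* page currently owned by the CPU */
};

void tx_store(struct tx_buffer *b, unsigned word)   /* CPU side */
{
    b->page[b->cpu_page][b->count[b->cpu_page]++] = word;
}

/* Called at every frame boundary; returns the page the transmitter will
 * now empty onto the bus while the CPU fills the other. */
int tx_swap(struct tx_buffer *b)
{
    int bus_page = b->cpu_page;
    b->cpu_page ^= 1;
    b->count[b->cpu_page] = 0;   /* the new CPU page starts empty */
    return bus_page;
}
```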
The Receiver. The receiver consists of two major
parts: a serial-to-parallel converting shift
register (SIPO) and a block of random access memory (RAM)
which contains a complete copy of all state information in
the system. This state information memory (SIM) is mapped
into the microcomputer's address space as a block of "read
only" memory and is accessed by the SIPO outputs as a block
of "write only" memory. Figure 13 shows how this works.
In the architecture under discussion, a word of
transmitted data is 37 bits long. It consists of four bytes
of significant data separated by a zero bit before and after
each byte (see Figure 13). A string of more than eight
consecutive "1" bits on the bus indicates that it is no
longer in use, so zero bits are included between each byte
to ensure that the bus continues to "look" busy in the event
that more than eight consecutive ones occur in the actual
data word.
Once the SIPO has been fully loaded with a 37 bit word
from the bus, the 32 significant bits (excluding the 5
separating zeros) are loaded onto the address and data lines
of the SIM RAM as follows. The first 5 bits contain the
identity code of the sending processor. These bits were
used only for quick resolution of bus contention and could
be discarded unless it is important to record who sent each
word for fault isolation purposes. (In the CRMmFCS design,
[Figure 13. Receiver Block Diagram: the data and clock busses feed a serial-in parallel-out shift register (SIPO) holding one 37-bit word; its address and 16-bit data outputs load a 2k-by-16-bit state information memory (SIM) RAM, which is read by the CPU.]
these bits are in fact saved in the SIM for use by
black-balling algorithms -- see Section IV-D.) The next 11
bits specify the location in the SIM to which this data word
is to be stored. They may be thought of as the name of the
SIM variable and, in this case, they allow up to 2048
different variables. Finally, the last 16 bits of the
transmission contain the value of the variable. These 16
data bits are loaded into the SIM via direct memory access
(DMA). The variable is now available for access by the CPU
whenever it is needed.
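The 37-bit word format can be modeled in C. The sketch below follows the layout just described (a 5-bit sender ID, an 11-bit SIM address, and a 16-bit value, with a zero bit before and after each data byte), but the exact bit ordering on the physical bus is an assumption for illustration.

```c
#include <stdint.h>

/* One 37-bit bus word: four data bytes with a 0 bit before and after each
 * byte (5 zeros total), so the bus can never show nine consecutive ones
 * and "look" idle mid-word.  The 32 significant bits break down as
 *   5-bit sender ID | 11-bit SIM address | 16-bit value.                */
uint64_t pack_word(unsigned id, unsigned addr, unsigned value)
{
    uint32_t sig = ((uint32_t)(id & 0x1Fu) << 27)
                 | ((uint32_t)(addr & 0x7FFu) << 16)
                 | (value & 0xFFFFu);
    uint64_t w = 0;
    for (int byte = 3; byte >= 0; byte--) {      /* most significant first */
        w <<= 1;                                 /* separating zero bit    */
        w = (w << 8) | ((sig >> (8 * byte)) & 0xFFu);
    }
    return w << 1;                               /* trailing zero bit      */
}

uint32_t unpack_word(uint64_t w)                 /* strip the 5 zeros      */
{
    uint32_t sig = 0;
    w >>= 1;                                     /* drop trailing zero     */
    for (int byte = 0; byte < 4; byte++) {
        sig |= (uint32_t)(w & 0xFFu) << (8 * byte);
        w >>= 9;                                 /* skip byte + separator  */
    }
    return sig;
}
```

Because each run of data bits is bounded by separating zeros, no transmission can contain more than eight consecutive ones, which is exactly the property the bus-idle detector depends on.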
While all of this is happening, the SIPO register is
collecting the next word of data as it appears on the bus.
This, in general, begins after nine bus clock cycles (during
which the bus floats "high"). At this time every other
transmitter realizes that the bus is no longer in use
(because nine consecutive "ones" have occurred) and
contention for the bus begins again. Thirty-seven
microseconds later the SIPO is again full and another DMA
cycle is executed to load its contents into the SIM. Thus,
a word is received every 9 + 37 = 46 microseconds.
In a system with n busses, there are n SIPO shift
registers requesting direct memory access to the SIM. Since
there are 46 microseconds between successive DMA requests
from any one bus, and since current high speed RAM can
handle as many as four accesses per microsecond, it is
theoretically possible for a system to have as many as 4 x
46 = 184 serial busses, each operating at 1 MHz, for a 184
MHz total system bandwidth. Such a system would allow
processors to exchange up to 184 x (1 variable/46
microseconds) = 4 million variables per second.
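The arithmetic above can be captured in two small functions. This is merely a restatement of the text's back-of-the-envelope calculation; the function names are illustrative.

```c
/* Theoretical limits from the text: each 1 MHz bus delivers one word
 * every 46 us (9 idle + 37 data bit times), and the SIM RAM is assumed
 * to absorb 4 DMA accesses per microsecond. */
int max_busses(int ram_accesses_per_us, int us_per_word)
{
    return ram_accesses_per_us * us_per_word;        /* 4 * 46 = 184 */
}

long vars_per_sec(int busses, int us_per_word)
{
    return (long)busses * 1000000L / us_per_word;    /* words reaching SIM */
}
```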
Of course, connecting 184 shift registers to a single
block of memory is impractical with today's technology, but
if the entire CPU, SIM, transmitter, and receiver were
integrated on a single chip (with only the bus lines brought
out to the pins), such an approach might well be possible.
Until then, a practical limit is about 8 busses with a
corresponding bandwidth of 8 MHz. Appendix B presents a
complete description of the receiver design.
While the potential for high throughput rates is
intriguing, the real usefulness of virtual common memory is
to allow easier programming of processors connected by
serial multiplex busses. The next section shows how a
virtual common memory architecture can be used most
effectively for this purpose.
E. EFFECTIVE USE OF VIRTUAL COMMON MEMORY
Knowing how to use a virtual common memory architecture
can make a big difference in how useful the idea is. This
section discusses the application of virtual common memory
to the CRMmFCS design. This is not the only way to use
virtual common memory, but it does illustrate some important
considerations for effective use of the concept.
In the first place, it is important to schedule
transmissions carefully in order to take full advantage of
the data packing capabilities of the architecture. Figure 14
shows how transmissions are scheduled in the CRMmFCS design.
Time in the system is divided into 1 ms frames and all
processors in the system are synchronized to this frame
rate. Since each transmission is exactly 46 microseconds
long, and since the transmitters pack data onto the bus with
no wasted time in between, there is room for 1000/46 or 21
complete transmissions per bus per millisecond. Therefore,
when system software is being written, it is modularized
into 1 millisecond chunks (called "millimodules") and no
more than 4 x 21 = 84 transmissions are scheduled for any
one millisecond. Since there are 84 slots available in each
frame, this guarantees that every scheduled transmission
will be completed sometime within its assigned millisecond.
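This capacity rule is simple enough to state as a one-line check. The C fragment below is an illustrative sketch of the scheduling constraint, not a tool from the report; the function name is an assumption.

```c
/* Frame-capacity check: with four busses and 21 slots per bus per 1 ms
 * frame, at most 84 words can be scheduled in any one millimodule.  The
 * CRMmFCS convention of leaving 42 slots free means the full schedule
 * still fits after two bus failures. */
#define SLOTS_PER_BUS 21     /* 1000 us per frame / 46 us per word */
#define N_BUSSES 4

int frame_fits(int scheduled_words, int failed_busses)
{
    int capacity = SLOTS_PER_BUS * (N_BUSSES - failed_busses);
    return scheduled_words <= capacity;
}
```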
Each dot in Figure 14 represents one 46-bit-time transmission
appearing on the bus. In this example, 22 of the 84
available slots are scheduled for use. At the beginning of
each millisecond, all transmitters compete to place their
part of the scheduled variables onto the bus. Transparent
contention packs these words into one slot after the other
on all four busses until every scheduled word has been
transmitted. Then the bus sits idle until the start of the
next millisecond.
It is, of course, possible to schedule 84 transmissions
per millisecond and never have the bus sit idle. However,
for reliability reasons, it is usually wise to leave enough
unused slots to "take up the slack" in the event that one of
the busses fails. In the CRMmFCS design, 42 slots are left
unused in each frame so that the system can tolerate two bus
failures without loss of throughput.
This approach makes processor task scheduling a lot
easier. Because it is known (to the nearest millisecond)
exactly when a variable will appear on the bus, it is also
known (to the nearest millisecond) how soon a task can be
scheduled using that variable. Figure 15 summarizes this
scheduling procedure and illustrates an efficient virtual
common memory system in operation.
[Figure 14. Transmission schedule: dots mark the data words scheduled on each of the four busses within successive 1 ms frames.]
The figure is divided into four different rows showing
where variables A, B, C, and D are scheduled to be over a
period of four milliseconds. During the first frame,
variables A and B are shown in row 1 indicating that they
are currently in the SIM and available for use. As a
result, Task 1, which computes C = A + B, can be scheduled
for this frame as shown in row 2. Once C has been computed,
it is placed in the currently active page of the transmitter
buffer (refer to Figure 12) where it remains for the
duration of the frame. Let us assume that this was page 1.
At the end of frame 1, the two transmitter pages are
switched so that page 1 is connected to the bus and page 2
(which was emptied onto the bus during frame 1) is connected
to the CPU to collect any data generated during frame 2.
Now that page 1 is connected to the bus, the transmitter
circuit broadcasts variable C in the first available slot.
This is indicated in row 4 of the figure.
It is not possible to know exactly when during frame 2
variable C actually appears on the bus. This is a random
function of when transparent contention allowed the
transmitter to gain access. But since there are more slots
than there are scheduled variables, sooner or later C will
get its chance. It is therefore guaranteed to reach the SIM
in every processor by the end of frame 2. This is indicated
by showing C in row 1 ready for use during frame 3. Task 2,
which computes D = SQRT(C), can now be scheduled for frame 3
and the entire process repeats.
Thus, one of the major goals of the design has been
accomplished: elimination of access delay to global
variables. Because data is broadcast to every processor as
soon as it is generated, it is always available in the SIM
by the time a task is scheduled to use it.
                               time (milliseconds):   1        2       3          4
1. variables in SIM ready for use:                    A, B             C
2. variables generated during this millisecond:       C=A+B            D=sqrt(C)
3. variables in buffer ready for broadcast:           C                D
4. variables appearing on bus this millisecond:                C                  D

Figure 15. Data Flow Assignment
F. SECTION SUMMARY
This section has presented an overview of the essential
hardware elements of the CRMmFCS. It has also introduced
two new concepts in interprocessor communications. The
first is called "transparent contention" and represents a
method for autonomous processors to share a common bus at
maximum efficiency without need of a central controller. It
has been shown that this approach opens up many new
possibilities for improved bandwidth while avoiding the
pitfalls of single point failures possible in many central
controller designs.
The second concept presented is called "virtual common
memory." It represents a method of minimizing the hardware
and software involved in interprocessor communications.
While not essential to the concept, transparent contention
was shown to be an ideal method for implementing virtual
common memory in many practical situations.
The next section of this report discusses how the
CRMmFCS software structure was designed to utilize the
hardware which was developed in this section.
SECTION IV
SOFTWARE STRUCTURE
A. INTRODUCTION
In Section III it was shown how a simple set of serial
busses could be made to look like a shared common memory to
the system programmer. Now that such a system may be
assumed to exist, it is time to show how software can be
designed to take advantage of this new architecture. This
se