TRANSCRIPT
AD-A101 412: Air Force Wright Aeronautical Labs, Wright-Patterson AFB, OH
A Continuously Reconfiguring Multi-Microprocessor Flight Control System (Unclassified)
May 1981. S. J. Larimer, S. L. Maher
AFWAL-TR-81-3070
! A CONTINUOUSLY RECONFIGURING MULTI-MICROPROCESSOR
M FLIGHT CONTROL SYSTEM
Stanley J. Larimer, Captain, USAF
Scott L. Maher, First Lieutenant, USAF
MAY 1981
Final Report for Period August 1979 to March 1981
Approved for public release; distribution unlimited.
FLIGHT DYNAMICS LABORATORY, AIR FORCE WRIGHT AERONAUTICAL LABORATORIES
AIR FORCE SYSTEMS COMMAND
WRIGHT-PATTERSON AIR FORCE BASE, OHIO 45433
-
NOTICE
When Government drawings, specifications, or other data are used for any purpose other than in connection with a definitely related Government procurement operation, the United States Government thereby incurs no responsibility nor any obligation whatsoever; and the fact that the government may have formulated, furnished, or in any way supplied the said drawings, specifications, or other data, is not to be regarded by implication or otherwise as in any manner licensing the holder or any other person or corporation, or conveying any rights or permission to manufacture, use, or sell any patented invention that may in any way be related thereto.
This report has been reviewed by the Office of Public Affairs (ASD/PA) and is releasable to the National Technical Information Service (NTIS). At NTIS, it will be available to the general public, including foreign nations.
This technical report has been reviewed and is approved for publication.
SCOTT L. MAHER, 1Lt, USAF, Project Engineer, Control Data Group, Control Systems Development Branch
STANLEY J. LARIMER, Capt, USAF, Project Engineer, Control Analysis Group, Control Dynamics Branch
EVARD H. FLINN, Chief, Control Systems Development Branch, Flight Control Division
RONALD O. ANDERSON, Chief, Control Dynamics Branch, Flight Control Division
FOR THE COMMANDER
ROBERT C. ETTINGER, Colonel, USAF
Chief, Flight Control Division
"If your address has changed, if you wish to be removed from our mailinglist, or if the addressee is no longer employed by your organization, pleasenotify AFWAL/FIGC, W-PAFB, OH 45433 to help us maintain a current mailinglist".
Copies of this report should not be returned unless return is required by security considerations, contractual obligations, or notice on a specific document.
AIR FORCE/56780/23 June 1981 - 510
-
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)
REPORT DOCUMENTATION PAGE - READ INSTRUCTIONS BEFORE COMPLETING FORM
1. REPORT NUMBER: AFWAL-TR-81-3070
2. GOVT ACCESSION NO.: AD-A101 412
4. TITLE (and Subtitle): A CONTINUOUSLY RECONFIGURING MULTI-MICROPROCESSOR FLIGHT CONTROL SYSTEM
5. TYPE OF REPORT AND PERIOD COVERED: Final Report, 1 Aug 1979 - 30 Apr 1981
7. AUTHOR(S): Stanley J. Larimer, Captain, USAF; Scott L. Maher, First Lieutenant, USAF
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Flight Dynamics Laboratory (AFWAL/FIGC), Air Force Wright Aeronautical Laboratories (AFSC), Wright-Patterson AFB, Ohio 45433
10. PROGRAM ELEMENT, PROJECT, TASK, AREA AND WORK UNIT NUMBERS: Program Element 62201F; Project 2403; Task 02; Work Unit 44
12. REPORT DATE: May 1981
13. NUMBER OF PAGES: 148
15. SECURITY CLASS. (of this report): UNCLASSIFIED
16. DISTRIBUTION STATEMENT (of this Report)
Approved for public release; distribution unlimited.
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)
18. SUPPLEMENTARY NOTES
19. KEY WORDS (Continue on reverse side if necessary and identify by block number)
Digital Control Systems; Reconfiguration; Multi-Microprocessor; Distributed Systems; Flight Control Systems; Distributed Control
20. ABSTRACT (Continue on reverse side if necessary and identify by block number)
Recent research at the US Air Force Wright Aeronautical Laboratories (Flight Dynamics Lab) has resulted in the development of a promising microprocessor-based flight control system design. This system is characterized by a collection of cooperatively autonomous distributed microcomputers interconnected by an arbitrary number of common serial multiplex busses. Each processor in the system independently determines its assignments using a simple algorithm that
(continued)
DD FORM 1473 (1 JAN 73). EDITION OF 1 NOV 65 IS OBSOLETE.
-
dynamically redistributes system functions from processor to processor in a never-ending process of reconfiguration. This approach offers several benefits in terms of system reliability, and the architecture in general incorporates many state-of-the-art features which promise improved system throughput, expandability, and above all, ease of programming.
The Continuously Reconfiguring Multi-Microprocessor Flight Control System (CRMmFCS) represents a major data point in multiprocessor control system research. Promising ideas from a variety of references have been included and integrated in its design. Its laboratory implementation provides a demonstration of these ideas for improving throughput, reliability, and ease of programming in flight control applications.
* - j"
SECURITY CLASSIFICATION OF THIS PAGE(When Dal Entered)
-
FOREWORD
The research described in this report was performed in-house at the AFWAL Flight Dynamics Laboratory during the period from September 1979 to March 1981. It is the result of a joint effort between members of the Control Systems Development Branch (AFWAL/FIGL) and the Control Dynamics Branch (AFWAL/FIGC) of the Flight Control Division.
Work began in this area in late 1978 when the Control Systems Development Branch initiated a work unit called "Multi-Microprocessor Control Elements" (24030244). During this time, Lt. James E. May, Lt. Scott L. Maher, John Houtz, and Capt. Larry Tessler laid the foundation for a study of how growing microprocessor technology could be applied to the problems of modern flight control. It was decided to design and build some form of fully distributed microprocessor-based flight control system in order to explore the potential problems and benefits in great detail.
In September 1979, Lt. Scott Maher took over as Principal Investigator and Capt. Stan Larimer joined the program as the Associate Investigator from the Control Dynamics Branch. In the months that followed, Maher and Larimer developed what has come to be known as the "Continuously Reconfiguring Multi-Microprocessor Flight Control System" architecture. Steve Coates, Richard Gallivan, and Tom Molnar also provided invaluable input to the research during this phase.
During the summer of 1980, Mr. Harry Snowball (Control Data Group Leader) and Mr. Evard Flinn (Control Systems Development Branch Chief) decided to intensify efforts to develop this new architecture. They hired four new engineers including Bill Rollison, Ray Bortner, Mark Mears, and Stan Pruett to work on the project. They also assigned technician SSgt. Jeff Lyons and co-op students Dan Thompson, Russ Blake, and Bob Molnar to assist with the R&D activities. At the same time, Mr. Dave Bowser (Control Analysis Group Leader) and Mr. Ron Anderson (Control Dynamics Branch Chief) increased FIGC support by assigning Lt. Allan Ballenger to act as an architecture and control law consultant on the project. In 1981, Lt. Jack Crotty joined the program as a software designer, thereby completing the CRMmFCS team.
The authors would like to thank these individuals for their many contributions to the success of this program. Bill Rollison was responsible for the design, development, construction, and testing of the transmitter and receiver circuitry. Ray Bortner designed the real-world interface processor and helped to develop the CRMmFCS control laws and corresponding aircraft simulation. Mark Mears was
iii
-
responsible for the hardware and software design of the entire data collection and processing network for the CRMmFCS laboratory implementation. Stan Pruett invented a unique five-port RS-232 communications controller which allows all systems in the laboratory to communicate with each other. He also assisted in the development of CRMmFCS processing module software and programmed the TRS-80 to act as controller for the entire laboratory system.
Lt. Allan Ballenger was responsible for development of the real-world plant simulation and overall control law design. He also organized a national workshop on multi-processor flight control architectures which provided a vital link with others working in the field. Tom Molnar served as a technical consultant and work unit monitor for the program. Lt. Jack Crotty was responsible for the development of all CRMmFCS software. Dan Thompson provided the original design for much of the TMS-9900 software used to implement the CRMmFCS operating system.
Rick Gallivan was responsible for the development of a real-world simulation on a Motorola 68000 microprocessor and provided logistics support for the program. Steve Coates handled the construction of the entire laboratory setup and helped to develop a real time graphics display for the pilot interface. Jeff Lyons designed and constructed the first working bus termination circuit. Russ Blake and Bob Molnar provided drafting and breadboarding support for many of the laboratory components.
The authors would also like to express special thanks to Art Eastman and Dave Dawson for their outstanding work in designing and producing the many figures which appear throughout this document. Carl Weatherholt, Rudy Chapski, and Bill Adams also provided invaluable support in the laboratory. Many thanks also go to Pam Larimer, Jan Robinson, and Dave Bowser for their assistance in the preparation of this report.
Finally, the authors wish to express their appreciation to Mr. Vernon Hoehne, Lt. Allan Ballenger, and Tom Molnar for their considerable efforts in reviewing the technical report.
This report covers work performed from September 1979 through March 1981. It was submitted by the authors in May 1981.
iv
-
TABLE OF CONTENTS
I.   INTRODUCTION ..................................... 1
II.  OVERVIEW ......................................... 3
     A. Overview of the CRMmFCS Architecture .......... 3
     B. Design Philosophy ............................. 5
     C. Evolution of Continuous Reconfiguration ....... 7
     D. The Concept of Continuous Reconfiguration ..... 14
        Example of Continuous Reconfiguration ......... 15
        Advantages of Continuous Reconfiguration ...... 15
        Controlling a C. R. System .................... 19
        Requirements for C. R. ........................ 20
     E. Section Summary ............................... 22
III. HARDWARE ARCHITECTURE ............................ 23
     A. Introduction .................................. 23
     B. Distributed Control of a MUX Bus .............. 23
        Other Approaches .............................. 24
        Improving Bus Efficiency ...................... 26
        A New Approach to the Problem ................. 27
     C. Transparent Contention ........................ 28
        Transmission Without Contention ............... 29
        Transmission With Contention .................. 31
        An Extension to n Busses ...................... 36
     D. Virtual Common Memory ......................... 39
        The Virtual Common Memory Concept ............. 39
        Virtual Common Memory Design .................. 42
     E. Effective Use of Virtual Common Memory ........ 48
     F. Section Summary ............................... 53
v
-
TABLE OF CONTENTS (concluded)
IV.  SOFTWARE STRUCTURE ............................... 54
     A. Introduction .................................. 54
     B. The Task Assignment Chart ..................... 56
        Application of Task Assignment Charts ......... 59
        Real Time Chart Execution ..................... 61
        Multi-Rate Considerations ..................... 65
        Compound Millimodules ......................... 67
     C. Autonomous Control Algorithms ................. 67
        Autonomous Control ............................ 68
        Volunteering .................................. 69
        Continuous Reconfiguration .................... 72
     D. Reliability Considerations .................... 73
     E. Section Summary ............................... 75
V.   LABORATORY IMPLEMENTATION ........................ 76
VI.  CONCLUSION ....................................... 80
APPENDIX A. Bus Transmitter Design .................... 82
APPENDIX B. Bus Receiver Design ....................... 97
APPENDIX C. Bus Termination Circuit Design ............ 103
APPENDIX D. Real World Interface ...................... 106
APPENDIX E. Data Collection Circuitry ................. 112
APPENDIX F. State Information Matrix (SIM) Theory ..... 120
APPENDIX G. Multi-Rate Millimodule Design ............. 135
REFERENCES ............................................ 140
vi
-
LIST OF ILLUSTRATIONS
 1. Major CRMmFCS Architecture Components ............. 4
 2. Virtual Common Memory ............................. 6
 3. Breakdown of Possible Architectures ............... 8
 4. Fixed vs. Pooled Designs with Equal Redundancy .... 10
 5. Fixed vs. Pooled Designs with Equal Numbers ....... 12
 6. Continuous Reconfiguration ........................ 16
 7. Essential Architecture Elements ................... 30
 8. Transmitter-Bus Interface ......................... 32
 9. Bus Contention Arbitration ........................ 34
10. Transmitter Circuit ............................... 37
11. Evolution of Virtual Memory ....................... 40
12. Physical Implementation of Virtual Memory ......... 44
13. Receiver Block Diagram ............................ 46
14. Bus Transmission Scheduling ....................... 50
15. Data Flow Assignment .............................. 52
16. Approaches to Task Scheduling ..................... 55
17. A Generic Task Assignment Chart ................... 57
18. CRMmFCS Task Assignment Chart ..................... 60
19. Task Assignment Table Generation .................. 62
20. Executive Flow Chart .............................. 64
21. The High-Rate Task Scheduling Problem ............. 66
22. Volunteering and the Volunteer Status Table ....... 70
23. Laboratory Implementation ......................... 77
vii
-
LIST OF ILLUSTRATIONS (concluded)
A-1. Transmitter Block Diagram ........................ 83
A-2. Transmission Format .............................. 85
A-3. Transmitter Circuit .............................. 90
B-1. Receiver Functional Diagram ...................... 98
B-2. Receiver Block Diagram ........................... 100
C-1. Smart Bus Configuration .......................... 104
D-1. Real World Interface Processors .................. 109
E-1. Bus and SIM Monitors ............................. 115
E-2. Data Collection and Post-Processing Circuitry .... 118
F-1. The State Information Matrix ..................... 121
F-2. The State-Time Form .............................. 123
F-3. State Information Matrix Processing .............. 125
F-4. Single Copy / Multi-Access Architecture .......... 129
F-5. Multi-Copy / Multi-Access Architecture ........... 130
F-6. Distributed / Shared Information Architecture .... 131
F-7. Broadcast Data Vector Architecture ............... 132
F-8. Virtual Memory Emulation ......................... 133
F-9. Virtual Memory Equivalent ........................ 134
G-1. Possible Rates for a Ten-Milliframe TAC .......... 136
viii
-
SECTION I
INTRODUCTION
The use of microprocessors in flight control
applications is a subject which has received much attention
in recent years. Microprocessor technology is growing
rapidly and there is a strong desire to take advantage of
it. This report presents the results of several years of
research performed at the Flight Dynamics Laboratory into
how microprocessors might best be used for flight control
applications.
Microprocessors appear to have two major areas of
application to the flight control problem. These may be
termed "the low end" and "the high end." In the low end
approach, microprocessors are distributed around the system
in a dedicated fashion wherever a small amount of processing
power is needed. In this mode, microprocessors are
relegated to the role of "smart" peripherals to some central
computer system. This application has been demonstrated
with considerable success in many currently flying aircraft.
By performing many of the repetitive, time intensive
functions such as keyboard monitoring, sensor preprocessing,
display generation, and inner-loop control, microprocessors
can relieve the central computer of much of its
computational load so that it can concentrate on what it
does best: number crunching, system management, and
outer-loop control. Since the low end application has been
demonstrated in many operational systems and its utility is
generally unquestioned, it will not be discussed further in
this report.
The "high end" application for microprocessors is a
much newer field of research. It is concerned with how to
use microprocessors as a distributed, multi-processing
-
replacement for the central computer itself. Since the goal
of this research program was to investigate the potential of
microprocessors for flight control and not to design a
working model for near-term application (where a more
conservative approach would be necessary), it was decided to
attempt the most ambitious (and most promising) application
of microprocessor technology: "a fully decentralized,
continuously reconfiguring, self-healing, adaptive, pooled
microprocessor-based flight control system." The result was
the development of an architecture known as the
"Continuously Reconfiguring Multi-Microprocessor Flight
Control System" (CRMmFCS).
This report presents the results of research to date in
the development of the CRMmFCS. Section II gives an
overview of the architecture and the philosophy behind it.
Section III presents a detailed discussion of the hardware
aspects of the architecture while Section IV does the same
for software. Both sections represent virtually stand-alone
discussions of their respective topics at a level which is
as thorough as possible without sacrificing readability.
Technical details which are of use to the reader only after
a complete study of the architecture have been removed to
the appendices. Finally, Section V describes the actual
laboratory implementation and test procedure which will be
used to demonstrate the architecture. The results of these
tests will be published in a subsequent technical report.
2
-
SECTION II
OVERVIEW
A. OVERVIEW OF THE CRMmFCS ARCHITECTURE
The CRMmFCS design centers around a system of
autonomous microprocessors connected by a common set of
serial multiplex busses. These processors operate in a
pooled configuration where any processor can perform any
task at any time. Furthermore, task assignments are
continuously redistributed among all processors in a
never-ending process of reconfiguration. If a processor
fails in the system, it is simply left out of the next
reconfiguration cycle and the system continues to operate as
if nothing has happened. All of this is accomplished
without use of a central controller.
A diagram of the architecture is shown in Figure 1. In
the figure, six processing modules are shown connected to a
set of four common data busses. Each data bus consists of
one clock and one data line and information is transferred
between processors using a simple serial multiplexing
scheme.
Processors in the system compete for access to the
busses without central traffic control using a technique
called "transparent contention." Transparent contention is
a scheme which allows any processor to talk on any bus at
any time. In the event of a "collision" between two
transmissions only one message survives while the remaining
one is automatically retransmitted as soon as the bus is
free. Transparent contention provides one hundred percent
efficient bus utilization, eliminates most communications
overhead, and completely avoids the need for a central
controller. It is discussed in detail in Section III.
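The collision behavior described above can be illustrated with a small software sketch. The report's actual transmitter mechanism is detailed in Section III and Appendix A; the bit-serial rule below (a "dominant" bus level wins the slot, as in later serial-bus standards) and all names are illustrative assumptions, not the CRMmFCS design itself.

```python
# Hypothetical sketch of contention between simultaneous transmissions:
# each transmitter monitors the bus while sending; when two messages
# collide, one survives intact and the losers queue for retransmission
# as soon as the bus is free. Here "0" is assumed to be the dominant level.

def arbitrate(messages):
    """Resolve a collision bit-serially; returns (winner, losers)."""
    contenders = list(messages)
    for bit in range(max(len(m) for m in contenders)):
        levels = [m[bit] for m in contenders]
        if "0" in levels:                     # dominant level wins the slot
            contenders = [m for m in contenders if m[bit] == "0"]
        if len(contenders) == 1:
            break
    winner = contenders[0]
    return winner, [m for m in messages if m is not winner]

# Two processors start transmitting at once; exactly one message survives
# and the other is automatically queued -- no central controller involved.
winner, retry_queue = arbitrate(["0110", "0111"])
```

Note that no bus time is wasted by the collision itself: the surviving message goes through unchanged, which is what makes the contention "transparent."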
3
-
[Figure 1. Major CRMmFCS Architecture Components: six processing modules connected to a set of four common serial data busses.]
-
In addition to competing for the bus, processors also
compete for the right to perform tasks in the system.
During every time frame all processors "volunteer" for the
tasks to be done in the following frame. These tasks are
then divided by mutual agreement among all functioning
processors in the system, again without needing a central
controller.
Because tasks to be performed are redistributed at the
beginning of each frame, the task any particular processor
performs is changing all the time. The system is said to be "continuously reconfiguring." Continuous reconfiguration
has a number of important advantages over other approaches.
These include automatic recovery in the event of a failure,
constant spare checkout because no unit acts as a spare all
the time, latent fault protection, and zero reconfiguration
delay. A complete discussion of continuous reconfiguration
is presented in Section IV.
Finally, although processors communicate only via
simple serial busses, the architecture is configured so that
they appear to share a single common memory as shown in
Figure 2. This virtual common memory contains all
information available about the state and environment of the
entire system. Processors obtain the information needed for
any task from the virtual common memory and place their
results back there for use by other processors. Complete
details on how a set of serial busses can be made to act
like a common memory are presented in Section III.
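One way to picture the virtual common memory in software: each processor keeps a full local copy of the system state, every write is broadcast over the bus to all processors, and reads never touch the bus at all. The classes below are a hypothetical sketch of that idea only; the actual hardware mechanism is described in Section III.

```python
# Illustrative sketch (names are ours): broadcasting writes over a shared
# bus keeps every processor's local copy identical, so the set of copies
# behaves like one common memory even though none physically exists.

class Bus:
    def __init__(self):
        self.listeners = []

    def broadcast(self, variable, value):
        for proc in self.listeners:           # every receiver hears the write
            proc.local_copy[variable] = value

class Processor:
    def __init__(self, bus):
        self.local_copy = {}
        self.bus = bus
        bus.listeners.append(self)

    def write(self, variable, value):
        # A write goes out on the bus; even the writer hears its own update.
        self.bus.broadcast(variable, value)

    def read(self, variable):
        # A read is purely local -- the data has already arrived.
        return self.local_copy[variable]

bus = Bus()
p1, p2, p3 = Processor(bus), Processor(bus), Processor(bus)
p1.write("pitch_rate", 0.25)
# All three processors now see the same value with no shared physical RAM.
```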
B. DESIGN PHILOSOPHY
Before beginning a detailed discussion of the
Continuously Reconfiguring Multi-Microprocessor Flight
5
-
Figure 2. Virtual Common Memory
6
-
Control System (CRMmFCS), it is desirable to briefly discuss
the design goals and philosophy which led to this
architecture. The original objective of this in-house effort
was to develop an Air Force understanding and capability in
the area of multi-microprocessor flight control systems. It
was determined that a high risk - high payoff approach could
be taken in an effort to advance the state-of-the-art while
achieving the original objective. The approach taken was to
trade off low cost hardware for simplified software and to
distribute system control to its extreme in order to study
the extent to which its potential advantages could be
achieved. Other goals were to reduce overall hardware,
software, and life cycle costs of flight control systems
while maintaining high reliability and fault tolerance.
Design considerations also included expandability for
integrated control applications and reconfigurability to
meet future self-healing requirements.
C. EVOLUTION OF CONTINUOUS RECONFIGURATION
Figure 3 shows a breakdown of some of the possibilities
that exist for implementing digital control systems in
general. Starting at the top of the figure, it may be seen
that digital control systems can be broken down into either
uni-processor or multi-processor systems. The distinction
here is not so much whether there is one or more than one
processor in the system, but rather whether there is more
than one processor performing different functions. A system
with, say, four processors performing identical functions
for redundancy would, for the purposes of this report, be
considered a uni-processor system since its effective
throughput is that of a single processor.
In this study, the multi-processor approach was chosen
for two reasons. First, since state-of-the-art
7
-
[Figure 3. Breakdown of Possible Architectures: digital control systems divide into uni-processor and multi-processor systems; multi-processor systems into "fixed" and "pooled" designs; pooled designs into cold spares, hot spares, and continuous reconfiguration.]
-
microprocessors have somewhat lower throughput than their
mini-computer counterparts, it is unlikely that a single
microprocessor would be able to handle the workload of a
modern flight control system. Since we are constrained to
use microprocessors by the goals of this investigation, it
would seem that a multiprocessor architecture is mandatory
for flight control applications. Of course, with the rate
at which the field is developing, there may soon be
microprocessors that can handle the required workload in a
uni-processor configuration. But with the development of
ever more sophisticated estimation, parameter
identification, and self-optimization algorithms and
increasingly demanding command and control functions, the
required workload may go up at an even faster rate. Since a
multi-processor architecture will always be able to improve
on the throughput of a uniprocessor architecture of the same
state-of-the-art, and since the only thing growing faster
than computer technology is the size of the problems to be
solved, there will always be a need for multi-processor
configurations. The need for better methods to construct
such architectures is the second reason why the
multi-microprocessor approach was selected for this
investigation.
Given then that a multi-microprocessor system is to be
implemented, there are two possible ways in which their
functions can be assigned. As shown in Figure 3, these ways
are "fixed" assignment of processing resources, where the
function of each processor is permanently assigned, and
"pooled" processing resources, where processors are
dynamically assigned to each function as the needs arise.
The fixed assignment is inherently simpler to implement and
is adequate for many applications. However, in systems
requiring great reliability and minimum hardware, the pooled
approach offers distinct advantages.
9
-
[Figure 4. Fixed vs. Pooled Designs with Equal Redundancy: 4a shows the fixed design with a dedicated spare set per task; 4b the pooled design with one shared spare pool.]
10
-
Figure 4 demonstrates the advantages of the pooled
approach in systems requiring a large number of processing
tasks. The system shown requires six different tasks to be
performed concurrently and that a quad level of redundancy
be maintained. A fixed assignment implementation of this
system (Figure 4a) requires that six processors be
permanently assigned to the six tasks and that three spares
be permanently assigned to each processor. The net result
is a 24 processor system. Figure 4b shows an equivalent
system using a pooled architecture. This system still
requires at least six processors to perform the six
concurrent tasks but the number of spares is substantially
reduced. This is because, since any processor can be
dynamically assigned to any task, the three spares are able
to cover for any three failures in the system. Thus both
systems can tolerate any three random failures but the
pooled architecture requires significantly fewer processors.
It is clear that, as the number of tasks to be performed
increases, this difference becomes even more important.
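The processor-count argument above can be made concrete with a little arithmetic; the function names below are ours, but the formulas follow directly from Figure 4.

```python
# Worked version of the Figure 4 comparison: protecting `tasks` concurrent
# functions against any `spares` random failures.

def fixed_processor_count(tasks, spares_per_task):
    # Fixed assignment: each task carries its own dedicated set of spares.
    return tasks * (1 + spares_per_task)

def pooled_processor_count(tasks, shared_spares):
    # Pooled: one communal spare pool covers failures anywhere in the system.
    return tasks + shared_spares

# Six tasks, tolerance of any three random failures:
# fixed needs 6 * (1 + 3) = 24 processors (Figure 4a);
# pooled needs 6 + 3 = 9 processors (Figure 4b).
print(fixed_processor_count(6, 3), pooled_processor_count(6, 3))
```

As the text notes, the gap widens as the task count grows: with ten tasks the fixed design needs 40 processors while the pooled design still needs only 13.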
The argument just presented applies only to systems
where failure detection is provided externally and the only
requirement is to replace a faulty processor with a spare.
For systems which must detect and locate their own failed
processors as well as replace them, the distinction is not
so much in the number of processors saved as it is in the
level of redundancy provided.
For example, Figure 5 shows fixed and pooled
architectures each having 24 processors and providing triad
voting for fault detection and isolation. They each have a
complement of six spares for redundancy purposes, and both
detect and correct faults by comparing the results of Al,
A2, and A3 or Bl, B2, and B3, etc. and replacing the
disagreeing processor with a spare. Unfortunately, in the
fixed architecture when a failure has occurred in a given
11
-
[Figure 5. Fixed vs. Pooled Designs with Equal Numbers: both architectures use 24 processors with triad voting (A1, A2, A3; B1, B2, B3; etc.) and six spares; in the fixed design each spare is bound to one task group, in the pooled design any spare can replace any processor.]
12
-
task group, no further failures can be tolerated in that
group because its only available spare has been used up.
Thus, the entire system can tolerate only one random failure
with guaranteed integrity of all six functions. The pooled
architecture, on the other hand, can tolerate up to six
random failures because all of its spares are free to be
assigned wherever they are needed throughout the system.
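The triad fault detection and isolation step described above amounts to a majority vote over the three redundant results; a minimal sketch (function name and return convention are ours):

```python
# Hypothetical sketch of triad voting: three processors compute the same
# result; the odd one out is flagged as failed and replaced with a spare.
from collections import Counter

def triad_vote(a1, a2, a3):
    """Return (voted_output, index_of_disagreeing_processor_or_None)."""
    results = [a1, a2, a3]
    (value, count), = Counter(results).most_common(1)
    if count == 3:
        return value, None                    # unanimous: no fault detected
    if count == 2:
        bad = next(i for i, r in enumerate(results) if r != value)
        return value, bad                     # majority masks a single fault
    raise RuntimeError("no majority: more than one fault in the triad")

output, failed = triad_vote(42, 42, 41)       # the third processor disagrees
```

Note the single-fault assumption built into the vote: with two simultaneous faults in one triad there is no majority, which is why the pooled design's ability to re-form triads from a shared spare pool matters.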
Because of its many benefits and great versatility, the
pooled processor approach was selected for use in this
study. Given that decision, and referring again to Figure 3,
there are at least three ways of implementing a pooled
processor architecture. These three approaches include
"cold spares", "hot spares", and "continuous
reconfiguration." In each case a pool of spare processors
is maintained to replace failed processors. The difference
is in the way that the spares are brought on line.
Cold spares are the simplest approach to the problem.
A pool of idle spares is maintained and, when a failure
occurs, one of the spares is loaded with whatever data and
software it needs to perform the missing function and is
then brought on line. This is an acceptable method when the
system involved is not real-time and a brief interruption
during reconfiguration is unimportant. Unfortunately, in
real-time systems the delay involved in "warming up" a cold
spare is often unacceptable and may even be disastrous.
An obvious solution to the cold spare problem is to
maintain a pool of "hot spares." That is, a pool containing
spares which are already loaded with all the software and
current data needed to come on line immediately after a
failure is detected. One hot spare is maintained for each
function in the system and the remaining spares are left "cold." When a hot spare switches on line, a cold spare is
13
-
"warmed up" to replace it so that a hot spare is maintained
for every function as long as the supply of cold spares
lasts.
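The hot/cold spare bookkeeping just described can be sketched in a few lines; the data structures and names here are illustrative assumptions, not a design from the report.

```python
# Hypothetical sketch of the hot-spare policy: when a function fails, its
# hot spare switches on line immediately, and a cold spare is "warmed up"
# to take the hot spare's place for as long as the cold pool lasts.

def handle_failure(function, hot_spares, cold_spares):
    """hot_spares: dict mapping function -> ready spare; cold_spares: list."""
    replacement = hot_spares.pop(function)        # hot spare goes on line now
    if cold_spares:
        hot_spares[function] = cold_spares.pop()  # warm up a cold spare
    return replacement

hot = {"pitch": "P7", "roll": "P8"}
cold = ["P9", "P10"]
on_line = handle_failure("pitch", hot, cold)      # "P7" takes over at once
# "P10" is now warming up as the pitch function's new hot spare.
```

The sketch also makes the drawback visible: every function permanently ties up one entry in `hot_spares`, which is exactly the dedication of spares that continuous reconfiguration avoids.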
The hot spare approach is a great improvement over cold
spares and is entirely adequate for most applications. Its
chief drawback is the large number of spares required to
ensure that every function has its own hot spare. Once a
processor has become a hot spare it is essentially dedicated
to one function. This is contrary to the goal of truly
pooled resources. In addition, although the switch-in time
is much improved, reconfiguration is still treated as an
emergency requiring special processing and introducing
delays and reconfiguration transients. What is needed is an
approach which requires a minimum number of spares, produces
no reconfiguration delays, and avoids the dedication of
spares to specific functions. Continuous reconfiguration is
such an approach.
D. THE CONCEPT OF CONTINUOUS RECONFIGURATION
Continuous reconfiguration is defined as a scheme
whereby the tasks to be performed in a multi-processor
system are dynamically redistributed among all functioning
processors at or near the minor frame rate of the overall
system. This approach allows continuous spare checkout,
latent fault protection, and elimination of failure
transients due to reconfiguration delay. By treating
reconfiguration as the norm rather than the exception,
failures can be handled routinely rather than as
emergencies, resulting in predictable failure mode behavior.
Using this approach, it is projected that the need for
unscheduled system maintenance may be greatly reduced.
14
-
Example Of Continuous Reconfiguration
An example of what is meant by continuous
reconfiguration is shown in Figure 6. A system of nine
processors is shown performing six different tasks, A thru
F, during three consecutive time frames. During the first
time frame processor 1 is doing task B, processor 2 task D,
processor 3 is a spare, and so on. In continuous
reconfiguration the tasks are redistributed among the
processors at the beginning of every time frame. For
example, in the second time frame, there is an entirely
different assignment of tasks to the processors. This
reassignment is accomplished by having all of the processors
that are currently healthy in the system compete for task
assignments. If a processor fails during any time frame, it
is no longer able to compete for task assignments and is
thereby automatically removed from the system. In Figure 6,
if processor 4 failed during the second time frame, then
during the next frame, it would not be able to compete for
task assignment. The six tasks which need to be done are
taken by healthy processors and the two remaining processors
become spares (Figure 6c). In other words, a failed
processor simply disappears from the system without any
other processors being aware that it is gone.
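The frame-by-frame redistribution described above can be sketched in a few lines. This is an illustrative model only: the report's actual task assignment rules are a software topic deferred to Section IV, and here a simple rotation stands in for the competition among processors (so the particular assignments will not match Figure 6).

```python
# Sketch (not the report's mechanism): healthy processors are
# reassigned tasks at the start of every frame; a failed processor
# simply stops competing and disappears from the system.

def assign_tasks(healthy, tasks, frame):
    """Return {processor: task-or-'spare'} for one time frame.
    Rotating the order by the frame number models the competition."""
    k = frame % len(healthy)
    order = healthy[k:] + healthy[:k]
    return {proc: (tasks[i] if i < len(tasks) else "spare")
            for i, proc in enumerate(order)}

tasks = ["A", "B", "C", "D", "E", "F"]
healthy = list(range(1, 10))          # nine processors, ids 1..9

frame1 = assign_tasks(healthy, tasks, 0)
healthy.remove(4)                     # processor 4 fails silently
frame2 = assign_tasks(healthy, tasks, 1)

assert set(tasks) <= set(frame2.values())   # all six tasks covered
assert 4 not in frame2                      # failed unit is gone
assert list(frame2.values()).count("spare") == 2
```

Processor 4's failure requires no special action: it is simply absent from the next frame's competition, and the six tasks shift to healthy processors, leaving two spares.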
Assuming for the moment that it is possible to
implement such a system efficiently, this scheme presents a
number of distinct advantages. These advantages will be
discussed next.
Advantages Of Continuous Reconfiguration
The primary motivation for developing a continuous
reconfiguration scheme is to allow a multi-processor system
to detect and recover from random failures with no effect on
its performance. This is a major problem with
6a. Time Frame 1
6b. Time Frame 2
6c. Time Frame 3
Figure 6. Continuous Reconfiguration
reconfiguration in general since the process of shutting
down a failed processor and starting up a spare almost
invariably causes a short delay during which the output of
the system is incorrect. This period of time is called a
reconfiguration delay and the result is a failure transient
which can be disastrous at the system output. The main
reason for these delays is that most reconfigurable systems
treat failures as emergencies requiring special actions
which take time. In a continuously reconfiguring system,
reconfiguration is the norm, not the exception. Task
reassignment is regularly scheduled at frequent intervals so
that when a failure does occur the system takes it in stride
without missing a beat.
Figure 6c shows how this works. In the figure,
processor 4 has failed but the rest of the system doesn't
notice it. The task that processor 4 was performing in the
previous frame (Figure 6b) has been automatically reassigned
to some other processor and the only effect is the net loss
of one spare. There has been no reconfiguration delay and
therefore no transients due to reconfiguration. This is the
first major advantage of continuous reconfiguration.
Eliminating reconfiguration delay by itself cannot
guarantee that there will be no failure transients at the
output. A second source of these transients is failure
detection delay. For example, in Figure 6 processor 4 may
have been generating bad data for some time before its
failure was detected in frame 2. Without continuous
reconfiguration this stream of bad data would go to a single
output device (rudder, aileron, display, etc.) causing a
significant transient in that particular device. With
continuous reconfiguration, the bad data goes to a different
device every frame depending upon which task the processor
is doing at that time. Since real world dynamics are
usually much slower than the computer's frame rate, the
aircraft will simply not respond to a single sample of bad
data. By moving the bad data around from surface to
surface, failure effects can be kept insignificant until
failure detection occurs. This dispersion of failure
effects is the second major advantage of continuous
reconfiguration.
A discussion of how to rapidly detect and isolate
failures and how to prevent any bad data from reaching the
system outputs will be presented in Section III when the
triad structure is introduced.
A third advantage of the continuous reconfiguration
approach is latent fault protection. Latent faults are a
class of faults that are inherently undetectable because
they produce no noticeable change in system performance or
output. This would seem to be no cause for alarm since a
fault which produces no error would appear to be harmless.
However, a latent fault can be very dangerous if it impairs
the system's ability to tolerate subsequent failures. For
example, if processor K fails in such a manner that its
outputs are correct but it is no longer able to check
processor D, then the system will continue to function
normally while the fault in K remains unobservable. If
processor D should fail, and the system is depending upon K
to detect it, a catastrophic system failure may result. The
continuous reconfiguration scheme avoids the problem because
the processor responsible for checking any other processor
changes with every frame. Thus, no dangerous combination of
failures is allowed to exist for more than one frame at a
time. The possibility of a "deadly embrace" between two
partially failed processors is also avoided.
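One way to picture this rotation of checking responsibility is a shifting checker assignment that pairs every processor with a different checker each frame. The sketch below is a hypothetical illustration, not the report's mechanism:

```python
# Sketch: rotate which processor checks which, so a latent fault in
# any checker is paired with any given target for only one frame.

def checker_of(procs, frame):
    """Map each processor to the one that checks it this frame."""
    n = len(procs)
    shift = 1 + frame % (n - 1)       # never zero: no self-checking
    return {procs[i]: procs[(i + shift) % n] for i in range(n)}

procs = list(range(1, 10))
pairs = set()
for frame in range(8):
    m = checker_of(procs, frame)
    assert all(p != c for p, c in m.items())   # nobody checks itself
    pairs.update(m.items())

# Over eight frames, every processor is checked by all eight peers,
# so no failed checker/failed target pair survives more than a frame.
assert len(pairs) == 9 * 8
```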
Finally, continuous reconfiguration allows the constant
checkout of all processors because no processor serves as a
spare for more than one frame at a time. In an ordinary
reconfigurable system, where certain processors are always
spares, there is a danger that one of the spares will fail
before it is needed. If this happens, a disaster may result
when the failed spare is used in an emergency. The problem
is analogous to changing a tire and discovering that the
spare is flat. By constantly "rotating the tires" in a
continuously reconfiguring system, failures in any processor
can be detected as soon as they occur.
In this section it has been shown that there are four
major advantages to the continuous reconfiguration approach.
These include (1) zero reconfiguration delay, (2) dispersion
of failure effects, (3) latent fault protection, and (4)
continuous spare checkout. Because of these advantages and
their potential contribution to system reliability,
continuous reconfiguration was selected as the method to be
used for managing the pooled multi-microprocessor
architecture developed in this program. The next section
looks at some of the problems involved in implementing a
continuously reconfiguring system.
Controlling A Continuously Reconfiguring System
A unique approach was taken for controlling the
continuously reconfiguring multi-microprocessor flight
control system. The traditional approach would have been to
have a central controller in charge of assigning tasks,
handling reconfiguration and controlling bus access.
Unfortunately, a central controller introduces the
possibility of a single point failure in the system,
requiring redundancy that is incompatible with the
architecture and that reduces the reliability of the
continuous reconfiguration concept.
An alternative to a central controller is the
autonomous control approach. This is a scheme whereby each
processor independently determines its own next task based
upon the current aircraft state. This can be better
understood by using an analogy. A company, like the
traditional centrally controlled computer architecture, has
a president with several vice-presidents working for him.
The president has access to all information concerning the
states of the company and an understanding of how the
company should function. He uses this knowledge to allocate
tasks to the vice-presidents and arbitrate any disagreements
that may arise between them. Autonomous control is analogous
to replacing each of the vice-presidents with a clone of the
president. The vice-presidents are now capable of making
the same decisions that the president would have made under
the same circumstances, since each has access to the same
data and would go through the same decision making process
that he would. The need for the president has been
eliminated and he has been replaced by autonomous
vice-presidents. This approach is not practical in the human
world because no two humans think alike. In the computer
world, however, it is a realizable possibility.
Requirements For Continuous Reconfiguration
In order to make continuous reconfiguration of
autonomously controlled processors possible, several
requirements must be satisfied. These requirements include
well-defined task assignment rules, availability of all
system state information to all processors, availability of
all software to every processor, and an efficient bus
contention scheme. The methods used to meet each of these
requirements in the laboratory implementation are covered in
detail later in this report.
The first requirement is for a set of well-defined task
assignment rules. Each of the processors must have an
efficient means of determining the next task that it is
required to do. There must not be an opportunity for any
processor to conflict with other processors in the system
and cause system failures. The task assignment rules are a
function of the operating system software and are discussed
in detail in Section IV.
A second requirement is that all processors must have
all software. In order for a processor to be capable of
doing any system task at any point in time, it must have the
software available to do the task. This may seem
unrealistic at first but the trends in memory technology
indicate that memory may be expected to double in density
several times in the next five years while its cost
continues to decrease. This trend makes supplying all
software to every processor a reasonable exchange for the
benefits offered by the CRMmFCS.
A third requirement of this system is that all
processors must have all data. Since any processor must be
capable of doing any task at any point in time, each
processor must have access to all data concerning the
present state of the aircraft. This requirement has been
met by the development of a virtual common memory
architecture which allows every processor to access any
piece of data by what appears to be a simple read from a
shared common memory. This concept will be discussed in
detail in Section III.
Finally, if all processors are to operate independently
and yet share the same set of data busses for communication,
some method must be found for them to agree on who can talk
on a bus at any given time. Since central control of any
kind is not allowed in a fully distributed architecture,
this must be done without use of a central bus controller.
The scheme selected must also be very efficient since bus
bandwidth will be at a premium in systems with many
processors. This requirement for an efficient, autonomous
bus contention scheme was satisfied through a new approach
called "transparent contention." It will also be discussed
in Section III.
E. SECTION SUMMARY
This section has presented an overview of the CRMmFCS
architecture and the philosophy behind it. The concepts of
continuous reconfiguration, autonomous control, transparent
contention, and virtual common memory have also been
introduced. In the next two sections, these ideas will be
discussed thoroughly from both hardware and software points
of view. In the process, every major component of the
CRMmFCS system will be described in enough detail to give
the reader a complete understanding of the overall design.
Numerous references to the appendices will be made along the
way to aid the interested reader in an even more detailed
study of the architecture.
SECTION III
HARDWARE ARCHITECTURE
A. INTRODUCTION
The CRMmFCS architecture consists of a collection of
autonomous microcomputers interconnected by a set of serial
multiplex busses so that they appear to share one common
memory through which they communicate. This section
presents the details of how the system was designed from a
hardware point of view. Section IV will address the same
subject from a software perspective.
One of the main requirements of the CRMmFCS design was
the elimination of any form of central control. This meant
that processors had to independently determine their own
task assignments and that some means was required to manage
communication without a bus controller. The problem of task
selection was solved with software in the CRMmFCS and is
discussed in Section IV. Autonomous bus control, on the
other hand, had a convenient hardware solution. It is
therefore discussed in this section.
The CRMmFCS data bus is fundamental to the entire
architecture. The need for an autonomously controlled bus
which would act like a common memory influenced the design
of every system component. For this reason, the hardware
elements of the architecture will be discussed in terms of
their relationship to the global bus design.
B. DISTRIBUTED CONTROL OF A MULTIPLEX BUS
One of the most fundamental questions which must be
asked when designing a system of autonomous processors is
how they will communicate with each other. Direct
connection between a large number of processors is clearly
impractical since the number of lines required for n
processors is n(n-1)/2 (which rapidly becomes very large).
A common serial multiplex bus is a more reasonable
alternative, but it introduces the problem of bus traffic
control: How does one resolve processor contention for the
bus without resorting to a central controller? This section
presents a promising solution to the problem.
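As a quick arithmetic check on the n(n-1)/2 figure quoted above (a minimal sketch, not part of the original report):

```python
# Point-to-point lines needed to directly interconnect n processors:
# each of n units links to the other n-1, and each line is shared.
def full_mesh_lines(n):
    return n * (n - 1) // 2

assert full_mesh_lines(2) == 1
assert full_mesh_lines(9) == 36      # even nine processors need 36 lines
assert full_mesh_lines(32) == 496    # rapidly becomes very large
```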
Other Approaches
In order to place this new solution in proper
perspective, it is helpful to briefly review several
existing bus control schemes. Three such schemes will be
discussed including a well known central control approach
and two experimental distributed control methods.
The classical central control approach is the MIL-STD-
1553 class of busses. Using this scheme, each bus is set up
with a central controller and every processor in the system
is considered to be a "remote terminal." Any given
processor can talk on the bus only when instructed to do so
by the central controller. This provides secure and
flexible control of global bus resources, but has a number
of limitations. First, a processor wishing to transmit must
wait until the controller gives it permission, resulting in
some inherent throughput delay. Second, there is even more
delay involved for one processor to obtain data from
another. This is because it must request the data through
the controller and wait until the other processor receives
the request, looks up the answer, and sends it back.
Finally, the system does not make efficient use of bus
bandwidth because part of the available transmission time is
used up in the overhead of bus control. Thus, the 1553 bus
is simple and reliable but somewhat limited in performance.
Because it requires a central controller, it also violates
the assumed goal of a fully distributed system design.
It should be mentioned in passing that there are
1553-based designs which claim to implement distributed
control (Reference 3). Such claims are true only in a
limited sense. While control of the bus may be
distributed in such systems, at any given time there is
still only one central controller operating in the
traditional command/response mode. For the purposes of this
report, distributed control will imply free and open
competition for the bus under a set of rules which all
participants obey. At no time is such contention arbitrated
at any central location, even if the location does move
around the system.
There are at least two examples of truly distributed
bus control schemes already in existence. The following
paragraphs summarize some of the interesting features of
each approach but no attempt will be made to cover them
comprehensively. The reader may consult the indicated
references for further details.
The first approach, found in the University of Hawaii
"Aloha" architecture (Reference 4), allows processors to
transmit on the bus any time it is available. In the event
that more than one processor starts at the same time, a
"collision" is said to have occurred and they all stop
sending immediately. Each then waits a slightly different
interval before attempting to retransmit. The one which
"times out" first gains access to the bus and completes its
message while all other processors wait until the bus is
once again available.
This approach avoids the need for a central controller,
but has the limitation that some time is wasted while
colliding processors "time out." This becomes serious when
demand for the bus is high and collisions are frequent.
Another approach to distributed bus control is used by
Honeywell (Reference 5). In this approach, all processors
take turns using the bus for a fixed amount of time in a
rotating fashion. If a processor has data to transmit, it
waits for its turn and then sends it all in a burst of some
maximum number of words. If it has nothing to transmit when
its turn arrives, it sends a null word and the system moves
on to the next transmitter. This is a fairly efficient
scheme, although some time is wasted for null messages. The
only real difficulty is keeping track of whose turn it is.
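The rotating-turn scheme can be sketched as follows; the queue contents and the null-word convention are illustrative assumptions, not details of the Honeywell design:

```python
# Sketch: each processor gets the bus in turn and sends a null word
# (modeled as None) when its transmit queue is empty.
def round_robin(queues, slots):
    """Return (processor, word) for `slots` consecutive bus turns."""
    out = []
    for t in range(slots):
        p = t % len(queues)                 # whose turn it is
        out.append((p, queues[p].pop(0) if queues[p] else None))
    return out

log = round_robin([["a1", "a2"], [], ["c1"]], 6)
assert log == [(0, "a1"), (1, None), (2, "c1"),
               (0, "a2"), (1, None), (2, None)]
```

The None entries show where bus time goes to empty turns, which is exactly the inefficiency noted above.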
Improving Bus Efficiency
While each of the approaches mentioned above has been
made to work effectively, there are still a number of things
which can be done to improve bus utilization efficiency.
These possibilities include:
(1) Scheduling transmissions to avoid periods of bus
inactivity or overload.
(2) Forming a queue of data to be transmitted in
every processor so that every available
microsecond on the bus is in use.
(3) Making sure there is no wasted time between
transmissions.
(4) Making sure there are no wasted transmissions
due to collisions.
(5) Transmitting data when it is generated instead of
waiting until it is needed (thereby reducing
access delays).
These five criteria served as design guides for the
approach to be presented. In the remainder of this report,
it will be shown how these goals have been achieved using
the concepts of "transparent contention" and "virtual common
memory."
A New Approach to the Problem
This section describes a new approach to autonomous bus
control designed to meet the goals listed above. The
following is an overview of how the idea works.
Time on the bus is divided into a series of consecutive
intervals (slots) that are exactly one transmission word
long (32 to 46 bits, depending on word format). At the
beginning of each new slot, all processors compete to fill
the slot with a word of data. The resulting massive bus
collision is then resolved using "transparent contention."
Transparent contention is a scheme which allows collisions
to occur on the bus in a manner such that only one of the
colliding messages survives. All other messages are
automatically suppressed without wasting a single bit of
transmission time. As a result, the slot is filled with one
and only one word and competition moves on to the next
available interval.
As long as there is data available to transmit, this
approach packs data onto the bus with absolutely maximum
density. No time is wasted during transmissions and no time
is wasted between them. One hundred percent efficient bus
utilization has been achieved.
In order to ensure that there is always data available
for transmission, each processor maintains a queue of words
to be transmitted. As each new piece of data is generated,
the processor places it into a first-in first-out buffer
(FIFO) and "forgets about it." A special transmitter
circuit then empties the FIFO onto the bus by competing for
time slices with all other transmitters in the system. This
frees the processor from transmission considerations and
ensures a constant flow of data onto the bus.
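This fire-and-forget split between processor and transmitter can be modeled roughly as below; the class and method names are hypothetical, and winning a time slot is stubbed out with a flag:

```python
# Sketch: the processor enqueues words and "forgets about" them; a
# broadcaster circuit drains the FIFO whenever it wins a bus slot.
from collections import deque

class Broadcaster:
    def __init__(self):
        self.fifo = deque()               # first-in first-out buffer

    def enqueue(self, word):
        self.fifo.append(word)            # processor side: fire and forget

    def try_transmit(self, slot_won):
        """Transmitter side: called once per bus time slot."""
        if slot_won and self.fifo:
            return self.fifo.popleft()
        return None

b = Broadcaster()
for w in (0x1A, 0x2B, 0x3C):
    b.enqueue(w)

# Suppose this transmitter wins only the even-numbered slots.
sent = [b.try_transmit(slot % 2 == 0) for slot in range(6)]
assert [w for w in sent if w is not None] == [0x1A, 0x2B, 0x3C]
```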
There are, of course, potential problems with this
approach. If data is not generated fast enough, it is
possible for all buffers to become empty resulting in unused
time slots on the bus. This is of no concern unless there
are other times when too much data is generated resulting in
backlogs and throughput delays. It is therefore important
to schedule data generation in the system such that an even
rate of transmission is maintained. A technique for
scheduling data flow is presented later in this section.
All that remains now is to explain the details of
transparent contention. The following section discusses how
it can render bus collisions harmless. Subsequent sections
will present details on how to build and use such a system.
C. TRANSPARENT CONTENTION
This section presents the theory of operation behind
transparent contention. While almost any bus configuration
can make use of the idea, one specific design was chosen
because of its ease of implementation in the laboratory.
This design is covered first in order to clarify subsequent
discussion of the transparent contention concept.
The approach selected is nothing new. It amounts to
nothing more than clocking data out of one shift register
across a serial bus and into another. What is unique is
the manner in which this process is controlled to prevent
conflicts on the bus between contending transmitters. In
order to understand this process, it is helpful to review
how a single transmitter operates when there is no
competition from other transmitters.
Transmission Without Contention
The essential elements of the bus architecture are
shown in Figure 7. In the figure, three processing modules
are shown interconnected by a common serial bus made up of a
data line and a clock line. Each processing module consists
of an ordinary microcomputer with two I/O devices including
a broadcaster (B) and a receiver (R). These devices use the
signal on the clock bus to shift data on to and off of the
data bus respectively. The box labeled "T" in the figure is
a bus termination circuit which generates the clock signal
and terminates the bus properly (See Appendix C).
Using this simple bus structure, a word of data is
transmitted in the following manner. The processor wishing
to transmit first places its information in its local
broadcaster FIFO. If the bus is available (as we assume in
this section), the broadcaster immediately latches the FIFO
output into a serial shift register and shifts it onto the
data bus with each positive-going edge of the bus clock. On
each negative-going edge, a bit on the bus is shifted into
receiving shift registers in every processor. From there,
the complete word is moved into local memory in each
processor using direct memory access. This technique will
be discussed later in the section on "Virtual Common Memory
Design."
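The receive path just described — every broadcast word shifted into every processor and moved to local memory by DMA — is what later makes the bus look like a shared memory. A rough software model, with hypothetical names:

```python
# Sketch: every word broadcast on the bus is written, via DMA, into
# a local mirror of "common memory" held by every processor, so a
# later read is just a local memory access.
class Receiver:
    def __init__(self):
        self.local_memory = {}               # this processor's mirror

    def on_bus_word(self, address, value):
        self.local_memory[address] = value   # DMA write, no CPU work

receivers = [Receiver() for _ in range(3)]

def broadcast(address, value):
    for r in receivers:                      # the bus reaches everyone
        r.on_bus_word(address, value)

broadcast(0x10, 42)
assert all(r.local_memory[0x10] == 42 for r in receivers)
```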
(three microcomputers, each with a broadcaster B and receiver R,
connected to a common clock and data bus with termination circuit T)
Figure 7. Essential Architecture Elements
Transmission With Contention
The question now arises, "What happens when more than
one processor wishes to use the bus at the same time?" The
answer is simply, "one of them wins." Exactly which one
wins is determined by a special logic circuit in each
transmitter which resolves the conflict. Its operation is
described below.
In the first place, access to the bus is granted on a
first-come, first-served basis. While one processor is
actively using the bus, a logical BUSY signal is maintained
which prevents any other processor from initiating a
broadcast (See Appendix A). This eliminates many
conflicts, but sooner or later more than one transmitter
will begin using an available bus on the exact same clock
pulse. When this happens, some other method is required to
resolve the contention.
The solution is found by observing what actually
happens when two transmitters put data on the bus at the
same time. As shown in Figure 8, each transmitter is
connected to the bus by an open collector transistor buffer.
When a transmitter wants to send a "zero", it turns on its
output transistor, shorting the bus to ground. To transmit a
"one" the transistor is turned off, allowing the bus to be
pulled high by the pull-up resistor. As long as all
transistors are turned off, the bus will float at a logic
"1". If any transistor turns on, the bus will be pulled to
a logic "0" state.
The net result is that logic zeros have an inherent
priority on the bus. Because a "1" is transmitted by
"letting go" of the bus (so it will float high) while a "0"
is transmitted by actively pulling the bus low, units
(reference voltage and pull-up resistor on the bus; transmitters A
and B drive it through open-collector transistor buffers)
Figure 8. Transmitter-Bus Interface
transmitting zeros will always win out over those sending
ones. It is this characteristic which allows transparent
contention.
The key to the idea is that every transmitter
constantly compares what it is trying to put on the bus with
what is actually there. In the event of a disagreement, the
transmitter simply stops sending and waits for the bus to
become available again. What makes this approach work is
that when any two transmitters disagree, only one of them
notices and drops off. The other one does not notice
(because it got its way on the bus) and therefore continues
its transmission. No bus time is wasted because one message
is finished without interruption.
At this point an example is helpful. Suppose two
broadcasters begin to transmit on the same clock pulse as in
Figure 9. Transmitter one attempts to send the binary
sequence 01001 while transmitter 2 sends 01101. During the
first microsecond, both pull the bus low and observe a zero
on the bus. Since that is what they wanted, they continue
to transmit. During the next microsecond, both transmitters
"let go" of the bus allowing it to float high. They each
observe a logic 1 and, satisfied, continue to transmit.
However, during the third interval, transmitter 2 releases
the bus to let it float high while transmitter 1 actively
pulls the bus low. They both observe a zero on the bus.
Since that is what number 1 wanted, it continues to
transmit. Number 2, on the other hand, does not get its
desired "one" and concludes that some other transmitter has
pulled the bus low. It therefore aborts its transmission
and waits for the bus to become available again.
The net result is that transmitter 1 successfully
completes its transmission from start to finish without
interruption while transmitter 2 aborts as soon as the two
time (microseconds):              1  2  3  4  5
transmitter 1 desired output:     0  1  0  0  1
transmitter 2 desired output:     0  1  1  (aborts)
actual output appearing on bus:   0  1  0  0  1
Figure 9. Bus Contention Arbitration
disagree. No transmission time was lost and, in fact,
transmitter 1 was never even aware of its competition.
"Transparent contention" has been achieved.
This concept works equally well for any number of
transmitters in contention. If ten of them start
simultaneously, they all send in parallel until there is a
disagreement. At that time those sending zeros win while
those sending ones drop off. The remaining transmitters
continue until the next conflict at which time still more
losers drop off. Eventually, only one transmitter is left
and it finishes its transmission, completely unaware of its
nine vanquished competitors.
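The wired-OR arbitration lends itself to a compact software model. In the sketch below (an illustration, not the laboratory hardware), words are bit strings and the bus level in each bit time is simply the minimum of the active outputs, since a "0" always dominates a "1":

```python
# Sketch of transparent contention: a transmitter drops off at the
# first bit where the bus disagrees with its desired output; the
# survivor's word reaches the bus intact.

def arbitrate(words):
    """Bit-serial contention among equal-length bit strings.
    Returns (word appearing on bus, indices still active)."""
    active = list(range(len(words)))
    bus = ""
    for i in range(len(words[0])):
        bit = min(words[a][i] for a in active)   # '0' dominates '1'
        bus += bit
        active = [a for a in active if words[a][i] == bit]
    return bus, active

# The example from Figure 9: transmitter 2 aborts at the third bit.
bus, winners = arbitrate(["01001", "01101"])
assert bus == "01001" and winners == [0]

# Ten-way or three-way contention works the same way.
assert arbitrate(["111", "101", "100"]) == ("100", [2])
```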
This approach is based upon the assumption that no two
transmitters will ever try to send identical words at the
same time. If this coincidence should occur, each processor
would assume that its own broadcast was successful and only
one copy of the word in question would appear on the bus.
This may or may not be tolerable depending upon how
information appearing on the bus is used.
A more significant consideration is the event in which
two words being transmitted fail to disagree until near the
end of the word. At that time the losing processor would
abort its transmission, but only after having wasted its
time sending most of the word. In a system with only one
bus this is unimportant since the losing transmitter would
have nothing to do but wait for the bus anyway. But in a
system with n busses (to be discussed next), it is desirable
for a transmitter to find out if it is going to lose as soon
as possible so that it can begin searching for another bus.
If these considerations are important, there is a
simple solution. Each transmitter adds its own unique
identification code to the beginning of each message. In a
system with 16 processors, this code would be 4 bits long.
Five bits would allow up to 32 processors, and so on. Using
this method, two processors are guaranteed to disagree
within the ID field, freeing the loser to seek another
bus. This approach has the added benefit that it is possible
to determine which processor initiated each broadcast for
fault isolation purposes.
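The effect of the ID prefix can be checked directly. The 4-bit width and the payloads below are illustrative assumptions:

```python
# Sketch: prefixing each word with a unique processor ID forces any
# two contenders to disagree within the ID field, since distinct IDs
# differ in at least one bit.

def frame(proc_id, payload, id_bits=4):
    return format(proc_id, f"0{id_bits}b") + payload

def first_disagreement(a, b):
    return next(i for i in range(len(a)) if a[i] != b[i])

f1 = frame(5, "110010")       # hypothetical 6-bit payloads
f2 = frame(9, "110010")       # identical payloads, different IDs
assert first_disagreement(f1, f2) < 4   # resolved inside the ID field
```

A side effect, since a "0" dominates the bus, is that ties between otherwise identical frames always go to the numerically lower ID.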
An Extension to n Busses
The bus structure which has been discussed so far
represents a very simple way to interconnect a large number
of autonomous processors without need of a central
controller. However, a single bus system of any kind is
generally unacceptable from a reliability standpoint. At
the very least, some form of redundancy is required in order
to avoid a potential single point failure node in the
system. Also, a single serial bus has only a finite
bandwidth. A large system of processors exchanging massive
amounts of data can quickly saturate such a bus. The
approach proposed in this report is ideally suited for
expansion to as many busses as are needed to meet the
reliability and throughput requirements of nearly any
system. The following paragraphs detail the implementation
and advantages of an autonomously controlled n-bus design.
Figure 10 shows the transmitter interface of a single
processor in a system with four busses (4 sets of clock and
data lines). The circuit is controlled by the box labeled
"transmission control logic." Upon receiving a "START"
signal (from the CPU), this logic instructs the "bus finder"
to "SEARCH" for a free bus. When it finds one, it locks two
data selectors and a data distributor onto the bus (using
its "BUS SELECT" lines) and signals the control logic that
it has "FOUND" a bus. The control logic then loads the
shift register with data from the CPU output buffer and
(four clock and data busses; bus-available detector and bus finder,
4-to-1 clock and data selectors, 1-to-4 distributor, transmission
control logic, shift register, and CPU with start signal)
Figure 10. Transmitter Circuit
enables the shift register clock. Data is shifted (using
the appropriate bus clock) out through the 1 to 4
distributor onto the selected data bus using the same
open-collector transistor buffers shown in Figure 8. As
always, a one-bit comparator monitors the difference between
the shift register (desired) output and the actual output on
the selected bus. If there is ever a miscompare, an "ABORT"
is generated and the transmission control logic instructs
the bus finder to locate another bus. This process
continues until the transmitter is successful at placing its
entire word on the bus, at which time another word is
obtained from the CPU buffer and the cycle begins again.
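The control flow just described — search for a free bus, transmit, abort on any miscompare, search again — reduces to a simple loop. The sketch below stubs the comparator with a predicate and is illustrative only:

```python
# Sketch of the transmission control logic across n busses: try each
# bus in turn; `miscompare(bus_id)` stands in for the one-bit
# comparator (True means this transmitter lost contention there).

def transmit(word, busses, miscompare):
    for bus_id in busses:
        if not miscompare(bus_id):
            return bus_id              # entire word placed on this bus
        # ABORT: the bus finder locates another bus and tries again
    return None                        # no bus carried the word yet

# Lose contention on busses 0 and 1, then succeed on bus 2.
assert transmit(0b10110, [0, 1, 2, 3], lambda b: b < 2) == 2
```

In the hardware this search repeats until the word gets through; returning None here simply marks a pass in which every bus was lost.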
This bus design has tremendous flexibility. Its
bandwidth is exactly four times that of a single bus and can
be expanded still further with additional busses.
Reliability is also enhanced. Because processor to bus
connections are continuously reconfiguring, selection of an
alternate bus in the event of a failure is instantaneous and
automatic.
There are, however, a few physical limitations which
remain to be resolved. The first is that there is a limit
to how many open-collector transistors can be "wire-ored" to
one bus before the sum of their leakage currents pulls the
bus low even if no transistor is on. This limit can be
increased using low leakage transistors, but can never be
totally ignored. Noise considerations on such a bus will
also require further research. For the present, the triplex
data approach described in Section IV will be relied upon to
correct for noise-corrupted transmissions.
Another consideration is the effect of propagation
delays on the output of each transmitter comparator. The
fact that desired and actual outputs match at one location
is no guarantee that the same holds true many feet away on
the bus. This problem is avoided for reasonable line lengths
by the manner in which data is clocked onto the bus. Data
is shifted onto the bus on the rising edge of each clock
pulse, but the comparator's output is not sampled until the
falling edge. This allows one half of a microsecond for the
data to settle before it is used.
Finally, the open collector transistor implementation
is only one approach to the transparent contention concept.
Any technique where one logic state wins out over another
will work. In the case of fiber optic busses, for example,
the presence of light on the bus could be made to win out
over its absence, and so on. For the purposes of concept
demonstration in the laboratory, the wired-or approach has
been shown to work very well.
D. VIRTUAL COMMON MEMORY
Up to this point, discussion has centered around the
transparent contention concept and its physical
implementation. In this section an actual application is
presented, allowing the development of what is called "virtual common memory."
The Virtual Common Memory Concept
One of the main problems that occurs in the design of
multiprocessor systems is how to distribute and exchange
data efficiently. From a hardware standpoint, the easiest
approach is usually to connect all processors to a common
serial multiplex bus (Figure Ila). This minimizes hardware
complexity and allows expandability, but often involves a
large software overhead. This is because processors must
[Figure 11. Evolution of Virtual Memory: a. common data bus; b. common memory; c. virtual memory]
exchange data on a "request" or "broadcast" basis, both of
which require special handling by every processor in the
system.
In the "request" mode of operation, a processor that
needs a piece of information simply asks for it on the bus
and receives it from some other processor a short time
later. This means that each processor must constantly
monitor the bus for data requests rather than concentrating
upon the task to which it has been assigned. Because
processors must wait for much of their data, processing
inevitably takes longer than it would if all data were
available in local memory at the start of a given task.
Additional processing time is also wasted in responding to
requests for data from other system processors.
The "broadcast" method is an alternate approach to data
exchange where every piece of data is transmitted as soon as
it is produced. Each processor then selects from the bus
whatever information it needs to accomplish its current
task. This approach also requires a lot of overhead as each
processor must now constantly monitor the bus for items of
local interest.
Thus, the common serial multiplex bus, while the
simplest and most flexible from a hardware point of view, has
serious drawbacks in terms of software complexity.
The simplest and most efficient approach to
interprocessor communication from a software point of view
is a common memory containing all information required by
all processors (Figure llb). In such a system, a processor
stores its output in the common memory where it can be
instantly accessed by any other processor in the system.
When one processor needs information from another, it simply
reads it from the common memory without delay.
Unfortunately, what is ideal from a software standpoint
is difficult to implement in hardware. Serious contention
problems develop when more than one processor attempts to
access the same block of memory. Since each processor must
be connected to the common memory by a complete set of
address and data lines, the hardware complexity is also
large. Finally, system expandability is impaired. In a
serial bus system more processors can be added by simply
connecting them to the bus, but there is a limit to the
number of ports available in a common memory. When these
have been used, no more can be added without redesigning the
system.
So, it appears that what is good for hardware is bad
for software and vice-versa. Clearly, a scheme that could
combine the best of both approaches is highly desirable.
Virtual common memory is such a solution.
Virtual common memory is a method for making a serial
multiplex bus look like a single common memory to the system
programmer. As such, it combines the hardware simplicity of
a serial bus with the software simplicity of a common memory
(Figure llc). In the following paragraphs the physical
implementation of a virtual common memory containing all
information to be shared among processors will be
described.
Virtual Common Memory Design
This section describes the virtual common memory design
used in the CRMmFCS architecture. The essential features of
this design are shown in Figure 12. In the figure, six
processing modules are shown connected by four serial
multiplex busses of the kind described above. This
represents the system actually being constructed at the
Flight Dynamics Laboratory, although additional busses or
processors could have been included. The number shown is
considered to be enough to demonstrate the overall concept.
In a true shared memory architecture, the common memory
contains all information required by any processor in the
system. This information describes the entire state of the
aircraft, and the memory which contains it is called the
"state information memory" (SIM). When a processor needs a
particular state variable, it accesses a well-defined
location in the SIM. When it generates a variable, it
places it in a specific SIM location where other processors
can find it. No other processing is required for complete
interprocessor communication. A complete discussion of the
SIM concept is presented in Appendix F.
In the virtual common memory design of Figure 12, each
processor is given its own copy of the SIM. To access a SIM
variable it simply looks up its own copy. To store a
variable into the SIM, a processor broadcasts it over the
bus and every processor's copy is updated simultaneously.
Since reading from a local copy is the same as reading from
a common one, and since writing to the bus is the same as
writing to a common memory, as far as any processor is
concerned there is only one "virtual" common memory being
used by everyone.
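The read/write symmetry just described can be sketched in a few lines of C. This is an abstract model, not the report's hardware: the bus is collapsed into a loop that updates every processor's copy, and the function names and array sizes are illustrative.

```c
#define N_PROC 6
#define SIM_SIZE 2048            /* one slot per 11-bit variable name */

/* Each processor holds its own copy of the state information memory
 * (SIM).  A write is broadcast on the bus and lands in every copy at
 * once; a read simply consults the local copy, with no bus traffic. */
static short sim[N_PROC][SIM_SIZE];

void sim_write(int addr, short value)    /* looks like a common-memory write */
{
    for (int p = 0; p < N_PROC; p++)     /* broadcast reaches every copy */
        sim[p][addr] = value;
}

short sim_read(int proc, int addr)       /* looks like a common-memory read */
{
    return sim[proc][addr];
}
```

From the programmer's point of view, `sim_write` and `sim_read` behave exactly as they would against one physical shared memory; the replication is invisible.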
In order to make bus transmissions appear to be reads
and writes on a common memory, two special circuits were
designed. These circuits are described next.
The Transmitter. The transmitter circuit (labeled
"XMIT" in Figure 12) was shown in detail in Figure 10. It
is connected to the microcomputer through a two-page buffer
(P1 and P2) which is memory-mapped to the local CPU. These
[Figure 12. Physical Implementation of Virtual Memory: six processing modules, each with its own local memory and a two-page (P1, P2) transmit buffer, connected by four serial multiplex busses.]
pages alternate functions every millisecond so that, while
one of them is being loaded by the CPU, the other is being
unloaded onto the bus. This dual buffer approach ensures
that all data is transmitted with no more than one
millisecond delay. Its application will be discussed in
greater detail later. Complete details on the transmitter
circuit are given in Appendix A.
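The two-page buffer can be modeled as follows. This is an illustrative C sketch of the swapping discipline only; the structure and function names are assumptions, and the real circuit is documented in Appendix A.

```c
#define PAGE_SIZE 21   /* at most 21 words fit on one bus per millisecond */

/* Two-page transmit buffer: while the CPU fills one page, the transmitter
 * drains the other onto the bus.  Pages swap roles at each 1 ms frame
 * boundary, so no word waits more than one millisecond to be sent. */
struct tx_buffer {
    unsigned page[2][PAGE_SIZE];
    int count[2];
    int cpu_page;                /* page currently owned by the CPU */
};

void tx_store(struct tx_buffer *b, unsigned word)   /* CPU side */
{
    b->page[b->cpu_page][b->count[b->cpu_page]++] = word;
}

/* Called at every frame boundary; returns the page the transmitter will
 * now empty onto the bus while the CPU fills the other. */
int tx_swap(struct tx_buffer *b)
{
    int bus_page = b->cpu_page;
    b->cpu_page ^= 1;
    b->count[b->cpu_page] = 0;   /* the new CPU page starts empty */
    return bus_page;
}
```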
The Receiver. The receiver consists of two major
parts: a serial-to-parallel converting shift
register (SIPO) and a block of random access memory (RAM)
which contains a complete copy of all state information in
the system. This state information memory (SIM) is mapped
into the microcomputer's address space as a block of "read
only" memory and is accessed by the SIPO outputs as a block
of "write only" memory. Figure 13 shows how this works.
In the architecture under discussion, a word of
transmitted data is 37 bits long. It consists of four bytes
of significant data separated by a zero bit before and after
each byte (see Figure 13). A string of more than eight
consecutive "1" bits on the bus indicates that it is no
longer in use, so zero bits are included between each byte
to ensure that the bus continues to "look" busy in the event
that more than eight consecutive ones occur in the actual
data word.
Once the SIPO has been fully loaded with a 37 bit word
from the bus, the 32 significant bits (excluding the 5
separating zeros) are loaded onto the address and data lines
of the SIM RAM as follows. The first 5 bits contain the
identity code of the sending processor. These bits were
used only for quick resolution of bus contention and could
be discarded unless it is important to record who sent each
word for fault isolation purposes. (In the CRMmFCS design,
[Figure 13. Receiver Block Diagram: the data and clock busses feed a serial-in parallel-out shift register (SIPO) holding one 37-bit word; its address and 16-bit data outputs load a 2k-by-16-bit state information memory (SIM) RAM, which is read by the CPU.]
these bits are in fact saved in the SIM for use by
black-balling algorithms -- see Section IV-D.) The next 11
bits specify the location in the SIM to which this data word
is to be stored. They may be thought of as the name of the
SIM variable and, in this case, they allow up to 2048
different variables. Finally, the last 16 bits of the
transmission contain the value of the variable. These 16
data bits are loaded into the SIM via direct memory access
(DMA). The variable is now available for access by the CPU
whenever it is needed.
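The 37-bit word format can be modeled in C. The sketch below follows the layout just described (a 5-bit sender ID, an 11-bit SIM address, and a 16-bit value, with a zero bit before and after each data byte), but the exact bit ordering on the physical bus is an assumption for illustration.

```c
#include <stdint.h>

/* One 37-bit bus word: four data bytes with a 0 bit before and after each
 * byte (5 zeros total), so the bus can never show nine consecutive ones
 * and "look" idle mid-word.  The 32 significant bits break down as
 *   5-bit sender ID | 11-bit SIM address | 16-bit value.                */
uint64_t pack_word(unsigned id, unsigned addr, unsigned value)
{
    uint32_t sig = ((uint32_t)(id & 0x1Fu) << 27)
                 | ((uint32_t)(addr & 0x7FFu) << 16)
                 | (value & 0xFFFFu);
    uint64_t w = 0;
    for (int byte = 3; byte >= 0; byte--) {      /* most significant first */
        w <<= 1;                                 /* separating zero bit    */
        w = (w << 8) | ((sig >> (8 * byte)) & 0xFFu);
    }
    return w << 1;                               /* trailing zero bit      */
}

uint32_t unpack_word(uint64_t w)                 /* strip the 5 zeros      */
{
    uint32_t sig = 0;
    w >>= 1;                                     /* drop trailing zero     */
    for (int byte = 0; byte < 4; byte++) {
        sig |= (uint32_t)(w & 0xFFu) << (8 * byte);
        w >>= 9;                                 /* skip byte + separator  */
    }
    return sig;
}
```

Because each run of data bits is bounded by separating zeros, no transmission can contain more than eight consecutive ones, which is exactly the property the bus-idle detector depends on.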
While all of this is happening, the SIPO register is
collecting the next word of data as it appears on the bus.
This, in general, begins after nine bus clock cycles (during
which the bus floats "high"). At this time every other
transmitter realizes that the bus is no longer in use
(because nine consecutive "ones" have occurred) and
contention for the bus begins again. Thirty-seven
microseconds later the SIPO is again full and another DMA
cycle is executed to load its contents into the SIM. Thus,
a word is received every 9 + 37 = 46 microseconds.
In a system with n busses, there are n SIPO shift
registers requesting direct memory access to the SIM. Since
there are 46 microseconds between successive DMA requests
from any one bus, and since current high speed RAM can
handle as many as four accesses per microsecond, it is
theoretically possible for a system to have as many as 4 x
46 = 184 serial busses, each operating at 1 MHz, for a 184
MHz total system bandwidth. Such a system would allow
processors to exchange up to 184 x (1 variable/46
microseconds) = 4 million variables per second.
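The arithmetic above can be captured in two small functions. This is merely a restatement of the text's back-of-the-envelope calculation; the function names are illustrative.

```c
/* Theoretical limits from the text: each 1 MHz bus delivers one word
 * every 46 us (9 idle + 37 data bit times), and the SIM RAM is assumed
 * to absorb 4 DMA accesses per microsecond. */
int max_busses(int ram_accesses_per_us, int us_per_word)
{
    return ram_accesses_per_us * us_per_word;        /* 4 * 46 = 184 */
}

long vars_per_sec(int busses, int us_per_word)
{
    return (long)busses * 1000000L / us_per_word;    /* words reaching SIM */
}
```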
Of course, connecting 184 shift registers to a single
block of memory is impractical with today's technology, but
if the entire CPU, SIM, transmitter, and receiver were
integrated on a single chip (with only the bus lines brought
out to the pins), such an approach might well be possible.
Until then, a practical limit is about 8 busses with a
corresponding bandwidth of 8 MHz. Appendix B presents a
complete description of the receiver design.
While the potential for high throughput rates is
intriguing, the real usefulness of virtual common memory is
to allow easier programming of processors connected by
serial multiplex busses. The next section shows how a
virtual common memory architecture can be used most
effectively for this purpose.
E. EFFECTIVE USE OF VIRTUAL COMMON MEMORY
Knowing how to use a virtual common memory architecture
can make a big difference in how useful the idea is. This
section discusses the application of virtual common memory
to the CRMmFCS design. This is not the only way to use
virtual common memory, but it does illustrate some important
considerations for effective use of the concept.
In the first place, it is important to schedule
transmissions carefully in order to take full advantage of
the data packing capabilities of the architecture. Figure 14
shows how transmissions are scheduled in the CRMmFCS design.
Time in the system is divided into 1 ms frames and all
processors in the system are synchronized to this frame
rate. Since each transmission is exactly 46 microseconds
long, and since the transmitters pack data onto the bus with
no wasted time in between, there is room for 1000/46 or 21
complete transmissions per bus per millisecond. Therefore,
when system software is being written, it is modularized
into 1 millisecond chunks (called "millimodules") and no
more than 4 x 21 = 84 transmissions are scheduled for any
one millisecond. Since there are 84 slots available in each
frame, this guarantees that every scheduled transmission
will be completed sometime within its assigned millisecond.
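This capacity rule is simple enough to state as a one-line check. The C fragment below is an illustrative sketch of the scheduling constraint, not a tool from the report; the function name is an assumption.

```c
/* Frame-capacity check: with four busses and 21 slots per bus per 1 ms
 * frame, at most 84 words can be scheduled in any one millimodule.  The
 * CRMmFCS convention of leaving 42 slots free means the full schedule
 * still fits after two bus failures. */
#define SLOTS_PER_BUS 21     /* 1000 us per frame / 46 us per word */
#define N_BUSSES 4

int frame_fits(int scheduled_words, int failed_busses)
{
    int capacity = SLOTS_PER_BUS * (N_BUSSES - failed_busses);
    return scheduled_words <= capacity;
}
```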
Each dot in Figure 14 represents one 46-bit-time transmission
appearing on the bus. In this example, 22 of the 84
available slots are scheduled for use. At the beginning of
each millisecond, all transmitters compete to place their
part of the scheduled variables onto the bus. Transparent
contention packs these words into one slot after the other
on all four busses until every scheduled word has been
transmitted. Then the bus sits idle until the start of the
next millisecond.
It is, of course, possible to schedule 84 transmissions
per millisecond and never have the bus sit idle. However,
for reliability reasons, it is usually wise to leave enough
unused slots to "take up the slack" in the event that one of
the busses fails. In the CRMmFCS design, 42 slots are left
unused in each frame so that the system can tolerate two bus
failures without loss of throughput.
This approach makes processor task scheduling a lot
easier. Because it is known (to the nearest millisecond)
exactly when a variable will appear on the bus, it is also
known (to the nearest millisecond) how soon a task can be
scheduled using that variable. Figure 15 summarizes this
scheduling procedure and illustrates an efficient virtual
common memory system in operation.
[Figure 14. Transmission schedule: dots mark the data words scheduled on each of the four busses within successive 1 ms frames.]
The figure is divided into four different rows showing
where variables A, B, C, and D are scheduled to be over a
period of four milliseconds. During the first frame,
variables A and B are shown in row 1 indicating that they
are currently in the SIM and available for use. As a
result, Task 1, which computes C = A + B, can be scheduled
for this frame as shown in row 2. Once C has been computed,
it is placed in the currently active page of the transmitter
buffer (refer to Figure 12) where it remains for the
duration of the frame. Let us assume that this was page 1.
At the end of frame 1, the two transmitter pages are
switched so that page 1 is connected to the bus and page 2
(which was emptied onto the bus during frame 1) is connected
to the CPU to collect any data generated during frame 2.
Now that page 1 is connected to the bus, the transmitter
circuit broadcasts variable C in the first available slot.
This is indicated in row 4 of the figure.
It is not possible to know exactly when during frame 2
variable C actually appears on the bus. This is a random
function of when transparent contention allowed the
transmitter to gain access. But since there are more slots
than there are scheduled variables, sooner or later C will
get its chance. It is therefore guaranteed to reach the SIM
in every processor by the end of frame 2. This is indicated
by showing C in row 1 ready for use during frame 3. Task 2,
which computes D = SQRT(C), can now be scheduled for frame 3
and the entire process repeats.
Thus, one of the major goals of the design has been
accomplished: elimination of access delay to global
variables. Because data is broadcast to every processor as
soon as it is generated, it is always available in the SIM
by the time a task is scheduled to use it.
                               time (milliseconds):   1        2       3          4
1. variables in SIM ready for use:                    A, B             C
2. variables generated during this millisecond:       C=A+B            D=sqrt(C)
3. variables in buffer ready for broadcast:           C                D
4. variables appearing on bus this millisecond:                C                  D

Figure 15. Data Flow Assignment
F. SECTION SUMMARY
This section has presented an overview of the essential
hardware elements of the CRMmFCS. It has also introduced
two new concepts in interprocessor communications. The
first is called "transparent contention" and represents a
method for autonomous processors to share a common bus at
maximum efficiency without need of a central controller. It
has been shown that this approach opens up many new
possibilities for improved bandwidth while avoiding the
pitfalls of single point failures possible in many central
controller designs.
The second concept presented is called "virtual common
memory." It represents a method of minimizing the hardware
and software involved in interprocessor communications.
While not essential to the concept, transparent contention
was shown to be an ideal method for implementing virtual
common memory in many practical situations.
The next section of this report discusses how the
CRMmFCS software structure was designed to utilize the
hardware which was developed in this section.
SECTION IV
SOFTWARE STRUCTURE
A. INTRODUCTION
In Section III it was shown how a simple set of serial
busses could be made to look like a shared common memory to
the system programmer. Now that such a system may be
assumed to exist, it is time to show how software can be
designed to take advantage of this new architecture. This
se