new flllfffflllfff ehomhohhohhoheeehe eeeee · 2014. 9. 27. · ad-aloi 412 air force wright...

153
AD-AlOI 412 AIR FORCE WRIGHT AERONAUTICAL LAOS WRIWT-PATTERSON AFS ON4 F/O 9/2 A ONTINUOUSLY RECONIGURING MULTI-MICROPROCESSOR FLIGHT CONTRO-f*C (U) MAY 61 S J LARINER, S L MAHER N UNCLASSIFIED AFAT57 N flllfffflllfff I1 EhE EEEEE EhomhohhohhohEE EEEEEEEEEEoohEI mEEEohhEEEEEEE

Upload: others

Post on 21-Oct-2020

3 views

Category:

Documents


1 download

TRANSCRIPT

  • AD-AlOI 412 AIR FORCE WRIGHT AERONAUTICAL LAOS WRIWT-PATTERSON AFS ON4 F/O 9/2A ONTINUOUSLY RECONIGURING MULTI-MICROPROCESSOR FLIGHT CONTRO-f*C (U)

    MAY 61 S J LARINER, S L MAHER NUNCLASSIFIED AFAT57 NflllfffflllfffI1 EhE EEEEEEhomhohhohhohEEEEEEEEEEEEoohEImEEEohhEEEEEEE

  • AFWAL-TR-81-3070

    ! A CONTINUOUSLY RECONFIGURING MULTI-MICROPROCESSOR

    M FLIGHT CONTROL SYSTEM

    Stanley J. Larimer, Captain, USAFScott L. Ma.her, First Lieutenant, USAF

    MAY 1981

    Final Report for Period August 1979 to March 1981

    Approved for public release; distribution unlimited.

    j FLIGHT DYNAMICS LABORATORY', AIR FORCE WRIGHT AERONAUTICAL LABORATORIES

    AIR FORCE SYSTEMS COMMANDWRIGHT-PATTERSON AIR FORCE BASE, OHIO 45433

    1- 7 13 311

  • NOTICE

    When Government drawings, specifications, or other data are used for anypurpose other than in connection with a definitely related Government pro-curement operation, the United States Government thereby incurs no respon-sibility nor any obligation whatsoever; and the fact that the governmentmay have formulated, furnished, or in any way supplied the said drawings,specifications, or other data, is not to be regarded by implication or other-wise as in any manner licensing the holder or any other person or corpora-tion, or conveying any rights or permission to manufacture use, or sell anypatented invention that may in any way be related thereto.

    This report has been reviewed by the Office of Public Affairs (ASDIPA) andis releasable to the National Technical Information Service (NTIS). At NTIS,it will be available to the general public, including foreign nations.

    This technical report has been reviewed and is approved for publication.

    SCOTT L. MAIER, iLt, USAF STANLEY JProject Engineer Project EngineerControl Data Group Control Analysis GroupControl Systems Development Branch Control Dynamics Branch

    EVARD H. FLINN, Chief RONALD 0. ANDERSON, ChiefControl Systems Development Branch Control Dynamics BranchFlight Control Division Flight Control Division

    FOT E COA*ANDER

    ROBERT C. ETTINGE1j Colonel, USAFChief, Flight Control Division

    "If your address has changed, if you wish to be removed from our mailinglist, or if the addressee is no longer employed by your organization, pleasenotify AFWAL/FIGC, W-PAFB, OH 45433 to help us maintain a current mailinglist".

    Copies of this report should not be returned unless return is required bysecurity considerations, contractual obligations, or notice on a specificdocument.AIA FORCE/56780/23 Jun. 1981 - 510

  • SECURITY CLASSIFICATION OF THIS PAGE (When Date Entered)

    ,- REPORT DOCUMENTATION PAGE READ INSTRUCTIONSRE R DBEFORE COMPLETING FORM

    I REPORT NUMBER 2. GOVT ACCESSION NO. 3. RECIPIENTS CATALOG NUMBER

    AFWAL-TR-81-3k7Q _7 ________'___ovAS ° SCk. TITLE (and Subtitle) S./ . *5.- IPl OF REPORT A PERIOD COVERED

    .A CONTINUOUSLY JRECONFIGURING MULTI-MICROPROCESS Final R pe~t-

    FLIGHT CONTROL SYSTEM 1 Aug 1079 -- 30 Apr ]W81 .S. PERFORMING ORG. REPORT NUMBE4

    , AUTHOR(S) S. CONTRACT OR GRANT NUMBER(e)

    Stanley J. Larimer, Captain, USAF -Scott L.lMaher First Lieutenant, USAF 4

    9. PER;ORMING ORGANIZATION NAME AND ADDRESS SO. PROGRAN ELEMENT. PROJECT, TASK

    Flight Dynamics Laboratory (AFWAL/FIGC) AREA& WRK UNIT NUMBERS

    Air Force Wright Aeronautical Laboratories (AFSC) Program ement,: 62201FProj e c t : 2 403 1 1 -7 " -o.

    Wright-Patterson AFB, Ohio 45433 Task: 02 Work Uni. " 44

    1,. CONTROLLING OFFICE NAME AND ADDRESS 12. REPORT DATE

    j'- May 18113. NUMBER o"AGES

    148 . .14. MONITORING AGENCY NAME & ADDRESS(If different from Controlling Office) 15. SECURITY CLASS. (of this report)

    UNCLASSIFIEDIS.. DECLASSIFICATION DOWNGRADING

    SCHEDULE

    16. DISTRIBUTION STATEMENT (of this Report)

    Approved for public release; distribution unlimited.

    '7. DISTRIBUTION STATEMENT (of the *betract entered In Block 20, It different from Report)

    IS. SUPPLEMENTARY NOTES

    19. KFY WORDS (Continue on reverae aide if necessary and identify by block number)

    Uig'tal Control Systems Reconfiguration

    Multi-Microprocessor Distributed SystemsFlight Control SystemsDistributed Control

    ABSTRACT (Continue ot reverse side if necessary end Identify by block number)

    Recent research at the US Air Force Wright Aeronautical Laboratories (FlightDynamics Lab) has resulted in the development of a promising microprocessor-based flight control system design. This system is characterized by a collec-tion of cooperatively autonomous distributed microcomputers interconnected byan arbitrary number of common serial multiplex busses. Each processor in thesystem independently determines its assignments using a simple algorithm that

    (continued)

    DD JA m7, 1473 EDITION OF I NOV 65 IS OBSOLETE NSECURITY CLASSIFICATION OF THIS PAGE (*%*. Data Entered)

    * - / .

  • SECURITY CLASSIFICATION OF THIS PAGE(Wen DataEnlered)

    dynamically redistributes system functions from processor to processor in anever-ending process of reconfiguration. This approach offers several benefitsin terms of system reliability, and the architecture in general incorporatesmany state-of-the-art features which promise improved system throughput, expand-ability, and above all, ease of programming

    The Continuously Reconfiguring Multi-Micropro ssor Flight Control System(CRMmFCS) represents a major data point in multiprocessor control systemresearch. Promising ideas from a variety of ref rences have been included andintegrated in its design. Its laboratory impleme tation provides a demonstra-tion of these ideas for improving throughput, relibility, and ease of pro-gramming in flight control applications.

    * - j"

    SECURITY CLASSIFICATION OF THIS PAGE(When Dal Entered)

  • FOREWORD

    The research described in this report was performedin-house at the AFWAL Flight Dynamics Laboratory during theperiod from September 1979 to March 1981. It is the resultof a joint effort between members of the Control SystemsDevelopment Branch (AFWAL/FIGL) and the Control DynamicsBranch (AFWAL/FIGC) of the Flight Control Division.

    Work began in this area in late 1978 when the ControlSystems Development Branch initiated a work unit called"Multi-Microprocessor Control Elements" (24030244). Duringthis time, Lt. James E. May, Lt. Scott L. Maher, John Houtz,and Capt. Larry Tessler laid the foundation for a study ofhow growing microprocessor technology could be applied tothe problems of modern flight control. It was decided todesign and build some form of fully distributedmicroprocessor-based flight control system in order toexplore the potential problems and benefits in greatdetail.

    In September 1979, Lt. Scott Maher took over asPrincipal Investigator and Capt. Stan Larimer joined theprogram as the Associate Investigator from the ControlDynamics Branch. In the months that followed, Maher andLarimer developed what has come to be known as the"Continuously Reconfiguring Multi-Microprocessor FlightControl System" architecture. Steve Coates, RichardGallivan, and Tom Molnar also provided invaluable input tothe research during this phase.

    During the summer of 1980, Mr. Harry Snowball (ControlData Group Leader) and Mr. Evard Flinn (Control SystemsDevelopment Branch Chief) decided to intensify efforts todevelop this new architecture. They hired four newengineers including Bill Rollison, Ray Bortner, Mark Mears,and Stan Pruett to work on the project. They also assignedtechnician SSgt. Jeff Lyons and co-op students Dan Thompson,Russ Blake, and Bob Molnar to assist with the R&D activites.At the same time, Mr. Dave Bowser (Control Analysis GroupLeader) and Mr. Ron Anderson (Control Dynamics Branch Chief)increased FIGC support by assigning Lt. Allan Ballenger toact as an architecture and control law consultant on theproject. In 1981, Lt. Jack Crotty joined the program as asoftware designer thereby completing the CRMmFCS team.

    The authors would like to thank these individuals fortheir many contributions to the success of this program.Bill Rollison was responsible for the design, development,construction, and testing of the transmitter and receivercircuitry. Ray Bortner designed the real-world interfaceprocessor and helped to develop the CRMmFCS control laws andcorresponding aircraft simulation. Mark Mears was

    iii

  • responsible for the haidware and software design of theentire data collectfcn and processing network for theCRMmFCS laboratory implementation. Stan Pruett invented aunique five-port RS-232 communications controller whichallows all systems in the laboratory to communicate witheach other. He also assisted in the development of CRMmFCSprocessing module software and programmed the TRS-80 to actas controller for the entire laboratory system.

    Lt. Allan Ballenger was responsible for development ofthe real-world plant simulation and overall control lawdesign. He also organized a national workshop onmulti-processor flight control architectures which provideda vital link with others working in the field. Tom Molnarserved as a technical consultant and work unit monitor forthe proaram. Lt. Jack Crotty was responsible for thedevelopment of all CRMmFCS software. Dan Thompson providedthe original design for much of the TMS-9900 software usedto implement the CRMmFCS operating system.

    Rick Gallivan was responsible for the development of areal-world simulation on a Motorola 68000 microprocessor andprovided logistics support for the program. Steve Coateshandled the construction of the entire laboratory setup andh-lped to develop a real time graphics display for the pilotinterface. Jeff Lyons designed and constructed the firstworking bus termination circuit. Russ Blake and Bob Molnarprovided drafting and breadboarding support for many of thelaboratory components.

    The authors would also like to express special thanksto Art Eastman and Dave Dawson for their outstanding work indesigning and producing the many figures which appearthroughout this document. Carl Weatherholt, Rudy Chapski,and Bill Adams also provided invaluable support in thelaboratory. Many thanks also go to Pam Larimer, JanRobinson, and Dave Bowser for their assistance in thepreparation of this report.

    Finally, the authors wish to express their appreciationto Mr. Vernon Hoehne, Lt. Allan Balienger, and Tom Molnarfor their considerable efforts in reviewing the technicalreport.

    This report covers work performed from September 1979through March 1981. It was submitted by the authors in May1981.

    iv

  • TABLE OF CONTENTS

    I. INTRODUCTION ......... ................... 1

    II. OVERVIEW .......... ..................... 3

    A. Overview of the CRMmFCS Architecture .. ..... 3

    B. Design Philosophy ....... .............. 5

    C. Evolution of Continuous Reconfiguration . . . 7

    D. The Concept of Continuous Reconfiguration . . 14

    Example of Continuous Reconfiguration . . 15

    Advantages of Continuous Reconfiguration . 15

    Controlling A C. R. System .... ........ 19

    Requirements for C. R ... ........... ... 20

    E. Section Summary ..... ............... 22

    III. HARDWARE ARCHITECTURE .... .............. . 23

    A. Introduction ...... ................. .. 23

    B. Distributed Control of a MUX Bus ........ .. 23

    Other Approaches .... ............. . 24

    Improving Bus Efficiency ......... 26

    A New Approach to the Problem . ...... 27

    C. Transparent Contention . ........... .28

    Transmission Without Contention ...... . 29

    Transmission With Contention ....... 31

    An Extention to n Busses .. ......... .. 36

    D. Virtual Common Memory . . ........... 39

    The Virtual Common Memory Concept . . .. 39

    Virtual Common Memory Design ....... 42

    E. Effective Use of Virtual Common Memory . . .. 48

    F. Section Summary ..... ............... 53

    v

  • TABLE OF CONTENTS (concluded)

    IV. SOFTWARE STRUCTURE ...... ................ 54

    A. Introduction ...... ................. .. 54

    B. The Task Assignment Chart ... .......... 56

    Application of Task Assignment Charts . . 59

    Real Time Chart Execution .. ........ 61

    Multi-Rate Considerations .. ........ 65

    Compound Millimodules ... .......... 67

    C. Autonomous Control Algorithms .. ........ 67

    Autonomous Control .... ............ . 68

    Volunteering .... .................. 69

    Continuous Reconfiguration ........ 72

    D. Reliability Considerations ... .......... . 73

    E. Section Summary ..... . .............. 75

    V. LABORATORY IMPLEMENTATION .... ............ 76

    VI. CONCLUSION ........ .................... . 80

    APPENDIX A. Bus Transmitter Design ... .......... . 82

    APPENDIX B. Bus Reciever Design ... ........... . 97

    APPENDIX C. Bus Termination Circuit Design ....... .. 103

    APPENDIX D. Real World Interface ... ........... . 106

    APPENDIX E. Data Collection Circuitry .. ........ 112

    APPENDIX F. State Information Matrix (SIM) Theory . . 120

    APPENDIX G. Multi-Rate Millimodule Design . ...... 135

    REFERENCES ........ ...................... 140

    vi

  • LIST OF ILLUSTRATIONS

    1. Major CRMmFCS Architecture Components ... ....... 4

    2. Virtual Common Memory ................ 6

    3. Breakdown of Possible Architectures .... ........ 8

    4. Fixed vs. Pooled Designs with Equal Redundancy . . 10

    5. Fixed vs. Pooled Designs with Equal Numbers .... 12

    6. Continuous Reconfiguration . . . .......... 16

    7. Essential Architecture Elements ... .......... . 30

    8. Transmitter-Bus Interface ............. 32

    9. Bus Contention Arbitration .... ............ 34

    10. Transmitter Circuit ................. 37

    11. Evolution of Virtual Memory .... ............ . 40

    12. Physical Implementation of Virtual Memory ..... ... 44

    13. Receiver Block Diagram ..... .............. 46

    14. Bus Transmission Scheduling .... ............ . 50

    15. Data Flow Assignment ..... ............... 52

    16. Approaches to Task Scheduling ... ........... . 55

    17. A Generic Task Assignment Chart ... .......... . 57

    18. CRMmFCS Task Assignment Chart ... ........... . 60

    19. Task Assignment Table Generation .. ......... . 62

    20. Executive Flow Chart ................ 64

    21. The High-Rate Task Scheduling Problem ........ .. 66

    22. Volunteering and the Volunteer Status Table .... 70

    23. Laboratory Implementation .... ............. . 77

    vii

  • LIST OF ILLUSTRATIONS (concluded)

    A-i. Transmitter Block Diagram ................ 83

    A-2. Transmission Format ....... ............... 85

    A-3. Transmitter Circuit ...... .............. 90

    B-I. Receiver Functional Diagram ... ........... . 98

    B-2. Receiver Block Diagram .... ............. . i. 100

    C-i. Smart Bus Configuration .... ............. 104

    D-1. Real World Interface Processors .. ......... . 109

    E-1. Bus and SIM Monitors ..... ............... .. 115

    E-2. Data Collection and Post-Processing Circuitry 118

    F-i. The State Information Matrix ... ........... . 121

    F-2. The State-Time Form ..... ............... 123

    F-3. State Information Matrix Processing . ....... .. 125

    F-4. Single Copy / Multi-Access Architecture ..... .. 129

    F-5. Multi-Copy / Multi-Access Architecture ....... .. 130

    F-6. Distributed / Shared Information Architecture . . 131

    F-7. Broadcast Data Vector Architecture .. ........ .. 132

    F-8. Virtual Memory Emulation .... ............. . 133

    F-9. Virtual Memory Equivalent .... ............ 134

    G-1. Possible Rates for a Ten-Milliframe TAC ..... .. 136

    viii

  • SECTION I

    INTRODUCTION

    The use of microprocessors in flight control

    applications is a subject which has received much attention

    in recent years. Microprocessor technology is growing

    rapidly and there is a strong desire to take advantage of

    it. This report presents the results of several years of

    research performed at the Flight Dynamics Laboratory into

    how microprocessors might best be used for flight control

    applications.

    Microprocessors appear to have two major areas of

    application to the flight control problem. These may be

    termed "the low end" and "the high end." In the low end

    approach, microprocessors are distributed around the system

    in a dedicated fashion wherever a small amount of processing

    power is needed. In this mode, microprocessors are

    relegated to the role of "smart" peripherals to some central

    computer system. This application has been demonstrated

    with considerable success in many currently flying aircraft.

    By performing many of the repetitive, time intensive

    functions such as keyboard monitoring, sensor preprocessing,

    display generation, and inner-loop control, microprocessors

    can relieve the central computer of much of its

    computational load so that it can concentrate on what it

    does best: number crunching, system management, and

    outer-loop control. Since the low end application has been

    demonstrated in many operational systems and its utility is

    generally unquestioned, it will not be discussed further in

    this report.

    The "high end" application for microprocessors is a

    much newer field of research. It is concerned with how to

    use microprocessors as a distributed, multi-processing

  • replacement for the central computer itself. Since the goal

    of this research program was to investigate the potential of

    microprocessors for flight control and not to design a

    working model for near-term application (where a more

    conservative approach would be necessary), it was decided to

    attempt the most ambitious (and most promising) application

    of microprocessor technology: "a fully decentralized,

    continuously reconfiguring, self-healing, adaptive, pooled

    microprocessor-based flight control system." The result was

    the development of an architecture known as the

    "Continuously Reconfiguring Multi-Microprocessor Flight

    Control System" (CRMmFCS).

    This report presents the results of research to date in

    the development of the CRMmFCS. Section II gives an

    overview of the architecture and the philosophy behind it.

    Section III presents a detailed discussion of the hardware

    aspects of the architecture while Section IV does the same

    for software. Both sections represent virtually stand-alone

    discussions of their respective topics at a level which is

    as thorough as possible without sacrificing readability.

    Technical details which are of use to the reader only after

    a complete study of the architecture have been removed to

    the appendicies. Finally, Section V describes the actual

    laboratory implementation and test procedure which will be

    used to demonstrate the architecture. The results of these

    tests will be published in a subsequent technical report.

    2

  • SECTION II

    OVERVIEW

    A. OVERVIEW OF THE CRMmFCS ARCHITECTURE

    The CRMmFCS design centers around a system of

    autonomous microprocessors connected by a common set of

    serial multiplex busses. These processors operate in a

    pooled configuration where any processor can perform any

    task at any time. Furthermore, task assignments are

    continuously redistributed among all processors in a

    never-ending process of reconfiguration. If a processor

    fails in the system, it is simply left out of the next

    reconfiguration cycle and the system continues to operate as

    if nothing has happened. All of this is accomplished

    without use of a central controller.

    A diagram of the architecture is shown in Figure 1. In

    the figure, six processing modules are shown connected to a

    set of four common data busses. Each data bus consists of

    one clock and one data line and information is transferred

    between processors using a simple serial multiplexing

    scheme.

    Processors in the system compete for access to the

    busses without central traffic control using a technique

    called "transparent contention." Transparent contention is

    a scheme which allows any processor to talk on any bus at

    any time. In the event of a "collision" between two

    transmissions only one message survives while the remaining

    one is automatically retransmitted as soon as the bus is

    free. Transparent contention provides one hundred percent

    efficient bus utilization, eliminates most communications

    overhead, and completely avoids the need for a central

    controller. It is discussed in detail in Section III.

    3

  • E C,

    Eu.C,

    ~ft - - -I-IL C.)

    MBA.2 :...

    0-

  • In addition to competing for the bus, processors also

    compete for the right to perform tasks in the system.

    During every time frame all processors "volunteer" for the

    tasks to be done in the following frame. These tasks are

    then divided by mutual agreement among all functioning

    processors in the system, again without needing a central

    controller.

    Because tasks to be performed are redistributed at the

    beginning of each frame, the task any particular processor

    performs is changing all the time. The system is said to be"continuously reconfiguring." Continuous reconfiguration

    has a number of important advantages over other approaches.

    These include automatic recovery in the event of a failure,

    constant spare checkout because no unit acts as a spare all

    the time, latent fault protection, and zero reconfiguration

    delay. A complete discussion of continuous reconfiguration

    is presented in Section IV.

    Finally, although processors communicate only via

    simple serial busses, the architecture is configured so that

    they appear to share a single common memory as shown in

    Figure 2. This virtual common memory contains all

    information available about the state and environment of the

    entire system. Processors obtain the information needed for

    any task from the virtual common memory and place their

    results back there for use by other processors. Complete

    details on how a set of serial busses can be made to act

    like a common memory are presented in Section III.

    B. DESIGN PHILOSOPHY

    Before beginning a detailed discussion of the

    Continuously Reconfiguring Multi-Microprocessor Flight

    5

  • Figure 2. Virtual Common Memory

    6

  • Control System (CRMmFCS) it is desirable to briefly discuss

    the design goals and philosophy which lead to this

    architecture. The original objective of this in-house effort

    was to develop an Air Force understanding and capability in

    the area of multi- microprocessor flight control systems. It

    was determined that a high risk - high payoff approach could

    be taken in an effort to advance the state-of-the-art while

    achieving the original objective. The approach taken was to

    trade off low cost hardware for simplified software and to

    distribute system control to its extreme in order to study

    the extent to which its potential advantages could be

    achieved. Other goals were to reduce overall hardware,

    software, and life cycle costs of flight control systems

    while maintaining high reliability and fault tolerance.

    Design considerations also included expandability for

    integrated control applications and reconfigurability to

    meet future self-healing requirements.

    C. EVOLUTION OF CONTINUOUS RECONFIGURATION

    Figure 3 shows a breakdown of some of the possibilities

    that exist for implementing digital control systems in

    general. Starting at the top of the figure, it may be seen

    that digital control systems can be broken down into either

    uni-processor of multi-processor systems. The distinction

    here is not so much whether there is one or more than one

    processor in the system, but rather whether there is more

    than one processor performing different functions. A system

    with, say, four processors performing identical functions

    for redundancy would, for the purposes of this report, be

    considered a uni-processor system since its effective

    throughput is that of a single processor.

    In this study, the multi-processor approach was chosen

    for two reasons. First, since state-of-the-art

    7

  • digitalcontrolsystems

    multi.

    11processor

    Epro":ssor I

    F-'fixed' pooled'

    cold hot ontinuousspares spares recon-

    figuration

    Fiqure 21.. Breakdown of Possible Architec ures

  • microprocessors have somewhat lower throughput than their

    mini-computer counterparts, it is unlikely that a single

    microprocessor would be able to handle the workload of a

    modern flight control system. Since we are constrained to

    use microprocessors by the goals of this investigation, it

    would seem that a multiprocessor architecture is mandatory

    for flight control applications. Of course, with the rate

    at which the field is developing, there may soon be

    microprocessors that can handle the required workload in a

    uni-processor configuration. But with the development of

    ever more sophisticated estimation, parameter

    identification, and self-optimization algorithms and

    increasingly demanding command and control functions, the

    required workload may go up at an even faster rate. Since a

    multi-processor architecture will always be able to improve

    on the throughput of a uniprocessor architecture of the same

    state-of-the-art, and since the only thing growing faster

    than computer technology is the size of the problems to be

    solved, there will always be a need for multi-processor

    configurations. The need for better methods to construct

    such architectures is the second reason why the

    multi-microprocessor approach was selected for this

    investigation.

    Given then that a multi-microprocessor system is to be

    implemented, there are two possible ways in which their

    functions can be assigned. As shown in Figure 3, these ways

    are "fixed" assignment of processing resources, where the

    function of each processor is permanently assigned, and

    "pooled" processing resources, where processors are

    dynamically assigned to each function as the needs arise.

    The fixed assignment is inherently simpler to implement and

    is adequate for many applications. However, in systems

    requiring great reliability and minimum hardware, the pooled

    approach offers distinct advantages.

    9

  • fixed *s:par* V, pooled

    Figure 4. Fixed vs. Pooled Designswith Equal Redundancy

    10

  • Figure 4 demonstrates the advantages of the pooled

    approach in systems requiring a large number of processing

    tasks. The system shown requires six different tasks to be

    performed concurrently and that a quad level of redundancy

    be maintained. A fixed assignment implementation of this

    system (Figure 4a) requires that six processors be

    permanently assigned to the six tasks and that three spares

    be permanently assigned to each processor. The net result

    is a 24 processor system. Figure 4b shows an equivalent

    system using a pooled architecture. This system still

    requires at least six processors to perform the six

    concurrent tasks but the number of spares is substantialy

    reduced. This is because, since any processor can be

    dynamically assigned to any task, the three spares are able

    to cover for any three failures in the system. Thus both

    systems can tolerate any three random failures but the

    pooled architecture requires significantly fewer processors.

    It is clear that, as the number of tasks to be performed

    increases, this difference becomes even more important.

    The argument just presented applies only to systems

    where failure detection is provided externally and the only

    requirement is to replace a faulty processor with a spare.

    For systems which must detect and locate their own failed

    processors as well as replace them, the distinction is not

    so much in the number of processors saved as it is in the

    level of redundancy provided.

    For example, Figure 5 shows fixed and pooled

    architectures each having 24 processors and providing triad

    voting for fault detection and isolation. They each have a

    complement of six spares for redundancy purposes, and both

    detect and correct faults by comparing the results of Al,

    A2, and A3 or Bl, B2, and B3, etc. and replacing the

    disagreeing processor with a spare. Unfortunately, in the

    fixed architecture when a failure has occurred in a given

    11

  • * fixedAl C D3 B3 E1 E spare

    F 3 A2 D, 1, *3 pooledr any spare

    C2 D 2 B2 A,

    Figure 5. Fixed vs. Pooled Designswith Equal Numbers

    12

  • task group, no further failures can be tolerated in that

    group because its only available spare has been used up.

    Thus, the entire system can tolerate only one random failure

    with guaranteed integrity of all six functions. The pooled

    architecture, on the other hand, can tolerate up to six

    random failures because all of its spares are free to be

    assigned wherever they are needed throughout the system.

    Because of its many benefits and great versatility, the

    pooled processor approach was selected for use in this

    study. Given that decision, and referring again to Figure 3,

    there are at least three ways of implementing a pooled

    processor architecture. These three approaches include

    "cold spares", "hot spares", and "continuous

    reconfiguration." In each case a pool of spare processors

    is maintained to replace failed processors. The difference

    is in the way that the spares are brought on line.

    Cold spares are the simplest approach to the problem.

    A pool of idle spares is maintained and, when a failure

    occurs, one of the spares is loaded with whatever data and

    software it needs to perform the missing function and is

    then brought on line. This is an acceptable method when the

    system involved is not real-time and a brief interruption

    during reconfiguration is unimportant. Unfortunately, in

    real-time systems the delay involved in "warming up" a cold

    spare is often unacceptable and may even be disasterous.

    An obvious solution to the cold spare problem is to

    maintain a pool of "hot spares." That is, a pool containing

    spares which are already loaded with all the software and

    current data needed to come on line immediately after a

    failure is detected. One hot spare is maintained for each

    function in the system and the remaining spares are left"cold." When a hot spare switches on line, a cold spare is

    13

  • "warmed up" to replace it so that a hot spare is maintained

    for every function as long as the supply of cold spares

    lasts.

    The hot spare approach is a great improvement over cold

    spares and is entirely adequate for most applications. Its

    chief drawback is the large number of spares required to

    ensure that every function has its own hot spare. Once a

    processor has become a hot spare it is essentially dedicated

    to one function. This is contrary to the goal of truly

    pooled resources. In addition, although the switch-in time

    is much improved, reconfiguration is still treated as an

    emergency requiring special processing and introducing

    delays and reconfiguration transients. What is needed is an

    approach which requires a minimum number of spares, produces

    no reconfiguration delays, and avoids the dedication of

    spares to specific functions. Continuous reconfiguration is

    such an approach.

    D. THE CONCEPT OF CONTINUOUS RECONFIGURATION

    Continuous reconfiguration is defined as a scheme

    whereby the tasks to be performed in a multi-processor

    system are dynamically redistributed among all functioning

    processors at or near the minor frame rate of the overall

    system. This approach allows continuous spare checkout,

    latent fault protection, and elimination of failure

    transients due to reconfiguration delay. By treating

    reconfiguration as the norm rather than the exception,

    failures can be handled routinely rather than as

    emergencies, resulting in predictable failure mode behavior.

    Using this approach, it is projected that the need for

    unscheduled system maintenance may be greatly reduced.

    14

  • Example Of Continuous Reconfiguration

    An example of what is meant by continuous

    reconfiguration is shown in Figure 6. A system of nine

    processors is shown performing six different tasks, A thru

    F, during three consecutive time frames. During the first

    time frame processor 1 is doing task B, processor 2 task D,

    processor 3 is a spare, and so on. In continuous

    reconfiguration the tasks are redistributed among the

    processors at the beginning of every time frame. For

    example, in the second time frame , there is an entirely

    different assignment of tasks to the processors. This

    reassignment is accomplished by having all of the processors

    that are currently healthy in the system compete for task

    assignments. If a processor fails during any time frame, it

    is no longer able to compete for task assignments and is

    thereby automatically removed from the system. In Figure 6,

    if processor 4 failed during the second time frame, then

    during the next frame, it would not be able to compete for

    task assignment. The six tasks which need to be done are

    taken by healthy processors and the two remaining processors

    become spares (Figure 6c). In other words, a failed

    processor simply disappears from the system without any

    other processors being aware that it is gone.

    Assuming for the moment that it is possible to

    implement such a system efficiently, this scheme presents a

    number of distinct advantages. These advantages will be

    discussed next.

    Advantages gf Continuous Reconfiguxatign

    The primary motivation for developing a continuous

    reconfiguration scheme is to allow a multi-processor system

    to detect and recover from random failures with no effect on

    its performance. This is a major problem with

    15

  • 6a. Time Frame 1

    6b. Time Frame 2

    6c. Time Frame 3

    Figure 6. Continuous Reconfiguration

    16

  • recontiguration in general since the process of shutting

    down a failed processor and starting up a spare almost

    invariably causes a short delay during which the output of

    the system is incorrect. This period of time is called a

    reconfiguration delay and the result is a failure transient

    which can be disasterous at the system output. The main

    reason for these delays is that most reconfigurable systems

    treat failures as emergencies requiring special actions

    which take time. In a continuously reconfiguring system,

    reconfiguration is the norm, not the exception. Task

    reassignment is regularly scheduled at frequent intervals so

    that when a failure does occur the system takes it in stride

    without missing a beat.

    Figure 6c shows how this works. In the figure,

    processor 4 has failed but the rest of the system doesn't

    notice it. The task that processor 4 was performing in the

    previous frame (Figure 6b) has been automatically reassigned

    to some other processor and the only effect is the net loss

    of one spare. There has been no reconfiguration delay and

    therefore no transients due to reconfiguration. This is the

    first major advantage of continuous reconfiguration.

    Eliminating reconfiguration delay by itself can not

    guarantee that there will be no failure transients at the

    output. A second source of these transients is failure

    detection delay. For example, in Figure 6 processor 4 may

    have been generating bad data for some time before its

    failure was detected in frame 2. Without continuous

    reconfiguration this stream of bad data would go to a single

    output device (rudder, aileron, display, etc.) causing a

    significant transient in that particular device. With

    continuous reconfiguration, the bad data goes to a different

    device every frame depending upon which task the processor

    is doing at that time. Since real world dynamics are

    usually much slower that the computer's frame rate, the

    17

  • aircraft will simply not respond to a single sample of bad

    data. By moving the bad data around from surface to

    surface, failure effects can be kept insignificant until

    failure detection occurs. This dispersion of failure

    effects is the second major advantage of continuous

    reconfiguration.

    A discussion of how to rapidly detect and isolate

    failures and how to prevent AUn3y bad data from reaching the

    system outputs will be presented in Section III when the

    triad structure is introduced.

    A third advantage of the continuous reconfiguration

    approach is latent fault protection. Latent faults are a

    class of faults that are inherently undetectable because

    they produce no noticeable change in system performance or

    output. This would seem to be no cause for alarm since a

    fault which produces no error would appear to be harmless.

    However, a latent fault can be very dangerous if it impairs

    the system's ability to tolerate subsequent failures. For

    example, if processor K fails in such a manner that its

    outputs are correct but it is no longer able to check

    processor D, then the system will continue to function

    normally while the fault in K remains unobservable. if

    processor D should fail, and the system is depending upon K

    to detect it, a catastrophic system failure may result. The

    continuous reconfiguration scheme avoids the problem because

    the processor responsible for checking any other processor

    changes with every frame. Thus, no dangerous combination of

    failures is allowed to exist for more than one frame at a

    time. The possibility of a "deadly embrace" between two

    partially failed processors is also avoided.

    Finally, continuous reconfiguration allows the constant

    checkout of all processors because no processor serves as a

    spare for more than one frame at a time. In an ordinary

    18

  • recontigurable system, where certain processors are always

    spares, there is a danger that one of the spares will fail

    before it is needed. If this happens, a disaster may result

    when the failed spare is used in an emergency. The problem

    is analogous to changing a tire and discovering that the

    spare is flat. By constantly "rotating the tires" in a

    continuously reconfiguring system, failures in any processor

    can be detected as soon as they occur.

    In this section it has been shown that there are four

    major advantages to the continuous reconfiguration approach.

    These include (1) zero reconfiguration delay, (2) dispersion

    of failure effects, (3) latent fault protection, and (4)

    continuous spare checkout. Because of these advantages and

    their potential contribution to system reliability,

    continuous reconfiguration was selected as the method to be

    used for managing the pooled multi-microprocessor

    architecture developed in this program. The next section

    looks at some of the problems involved in implementing a

    continuously reconfiguring system.

    Controlling A Continuously Reconfiguring System

    A unique approach was taken for controlling the

    continuously reconfiguring multi-microprocessor flight

    control system. The traditional approach would have been to

    have a central controller in charge of assigning tasks,

    handling reconfiguration and controlling bus access.

    Unfortunately, a central controller introduces the

    possibility of a single point failure in the system

    requiring redundancy incompatible with the architecture and

    reducing the reliability of the continuous reconfiguration

    concept.

    An alternative to a central controller is the

    autonomous control approach. This is a scheme whereby each

    19

  • processor independently determines its own next task based

    upon the current aircraft state. This can be better

    understood by using an analogy. Like the traditional

    centrally controlled computer architecture, a company has a

    president who has several vice-presidents working for him.

    The president has access to all information concerning the

    states of the company and an understanding of how the

    company should function. He uses this knowledge to allocate

    tasks to the vice-presidents and arbitrate any disagreements

    that may arise between them. Autonomous control is analogous

    to replacing each of the vice-presidents with a clone of the

    president. The vice-presidents are now capable of making

    the same decisions that the president would have made under

    the same circumstances, since each has access to the same

    data and would go through the same decision making process

    that he would. The need for the president has been

    eliminated and he has been replaced by autonomous

    vice-presidents. This approach is not practical in the human

    world because no two humans think alike. In the computer

    world, however, it is a realizable possibility.

    ReQuiremenf9__C9= ous Reconfiguration

    In order to make continuous reconfiguration of

    autonomously controlled processors possible, several

    requirements must be satisfied. These requirements include

    well-defined task assignment rules, availability of all

    system state information to all processors, availability of

    all software to every processor, and an efficient bus

    contention scheme. The methods used to meet each of these

    requirements in the laboratory implementation are covered in

    detail later in this report.

    The first requirement is for a set of well-defined task

    assignment rules. Each of the processors must have an

    efficient means of determining the next task that it is

    20

  • required to do. There must not be an opportunity for any

    processor to conflict with other processors in the system

    and cause system failures. The task assignment rules are a

    function of the operating system software and are discussed

    in detail in Section IV.

    A second requirement is that all processors must have

    all software. In order for a processor to be capable of

    doing any system task at any point in time, it must have the

    software available to do the task. This may seem

    unrealistic at first but the trends in memory technology

    indicate that memory may be expected to double in density

    several times in the next five years while its cost

    continues to decrease. This trend makes supplying all

    software to every processor a reasonable exchange for the

    benefits offered by the CRMmFCS.

    A third requirement of this system is that all

    processors must have all data. Since any processor must be

    capable of doing any task at any point in time, each

    processor must have access to all data concerning the

    present state of the aircraft. This requirement has been

    met by the development of a virtual common memory

    architecture which allows every processor to access any

    piece of data by what appears to be a simple read from a

    shared common memory. This concept will be discussed in

    detail in Section III.

    Finally, if all processors are to operate independently

    and yet share the same set of data busses for communication,

    some method must be found for them to agree on who can talk

    on a bus at any given time. Since central control of any

    kind is not allowed in a fully distributed architecture,

    this must be done without use of a central bus controller.

    The scheme selected must also be very efficient since bus

    bandwidth will be at a premium in systems with many

    21

  • processors. This requirement for an efficient, autonomous

    bus contention scheme was satisfied through a new approach

    called "transparent contention." It will also be discussed

    in Section III.

    E. SECTION SUMMARY

    This section has presented an overview of the CRMmFCS

    architecture and the philosophy behind it. The concepts of

    continuous reconfiguation, autonomous control, transparent

    contention, and virtual common memory have also been

    introduced. In the next two sections, these ideas will be

    discussed thoroughly from both hardware and software points

    of view. In the process, every major component of the

    CRMmFCS system will be described in enough detail to give

    the reader a complete understanding of the overall design.

    Numerous references to the appendices will be made along the

    way to aid the interested reader in an even more detailed

    study of the architecture.

    22

  • SECTION III

    HARDWARE ARCHITECTURE

    A. INTRODUCTION

    The CRMmFCS architecture consists of a collection of

    autonomous microcomputers interconnected by a set of serial

    multiplex busses so that they appear to share one common

    memory through which they communicate. This section

    presents the details of how the system was designed from a

    hardware point of view. Section IV will address the same

    subject from a software perspective.

    One of the main requirements of the CRMmFCS design was

    the elimination of any form of central control. This meant

    that processors had to independently determine their own

    task assignments and that some means was required to manage

    communication without a bus controller. The problem of task

    selection was solved with software in the CRMmFCS and is

    discussed in Section IV. Autonomous bus control, on the

    other hand, had a convienient hardware solution. It is

    therefore discussed in this section.

    The CRMmFCS data bus is fundamental to the entire

    architecture. The need for an autonomously controlled bus

    which would act like a common memory influenced the design

    of every system component. For this reason, the hardware

    elements of the architecture will be discussed in terms of

    their relationship to the global bus design.

    B. DISTRIBUTED CONTROL OF A MULTIPLEX BUS

    One of the most fundamental questions which must be

    asked when designing a system of autonomous processors is

    23

  • how they will communicate with each other. Direct

    connection between a large number of processors is clearly

    impractical since the number of lines required for n

    processors is n(n-l)/2, (which rapidly becomes very large).

    A common serial multiplex bus is a more reasonable

    alternative, but it introduces the problem of bus traffic

    control: How does one resolve processor contention for the

    bus without resorting to a central controller? This section

    presents a promising solution to the problem.

    Othe.rArproaches

    In order to place this new solution in proper

    perspective, it is helpful to briefly review several

    existing bus control schemes. Three such schemes will be

    discussed including a well known central control approach

    and two experimental distributed control methods.

    The classical central control approach is the MIL-STD-

    1553 class of busses. Using this scheme, each bus is set up

    with a central controller and every processor in the system

    is considered to be a "remote terminal." Any given

    processor can talk on the bus only when instructed to do so

    by the central controller. This provides secure and

    flexible control of global bus resources, but has a number

    of limitations. First, a processor wishing to transmit must

    wait until the controller gives it permission, resulting in

    some inherent throughput delay. Second, there is even more

    delay involved for one processor to obtain data from

    another. This is because it must request the data through

    the controller and wait until the other processor receives

    the request, looks up the answer, and sends it back.

    Finally, the system does not make efficient use of bus

    bandwidth because part of the available transmission time is

    used up in the overhead of bus control. Thus, the 1553 bus

    24

  • is simple and reliable but somewhat limited in performance.

    Because it requires a central controller, it also violates

    the assumed goal of a fully distributed system design.

    It should be mentioned in passing that there are

    1553-based designs which claim to implement distributed

    control (Reference 3). Such claims are true only in a

    limited sense. While p control of the bus may be

    distributed in such systems, at any given time there is

    still only one central controller operating in the

    traditional command/response mode. For the purposes of this

    report, distributed control will imply free and open

    competition for the bus under a set of rules which all

    participants obey. At no time is such contention arbitrated

    at any central location, even if the location does move

    around the system.

    There are at least two examples of truly distributed

    bus control schemes already in existence. The following

    paragraphs summarize some of the interesting features of

    each approach but no attempt will be made to cover them

    comprehensively. The reader may consult the indicated

    references for further details.

    The first approach, found in the University of Hawaii

    "Aloha" architecture (Reference 4), allows processors to

    transmit on the bus any time it is available. In the event

    that more than one processor starts at the same time, a

    "collision" is said to have occurred and they all stop

    sending immediately. Each then waits a slightly different

    interval before attempting to retransmit. The one which

    "times out" first gains access to the bus and completes its

    message while all other processors wait until the bus is

    once again available.

    25

  • This approach avoids the need for a central controller,

    but has the limitation that some time is wasted while

    colliding processors "time out." This becomes serious when

    demand for the bus is high and collisions are frequent.

    Another approach to distributed bus control is used by

    Honeywell (Reference 5). In this approach, all processors

    take turns using the bus for a fixed amount of time in a

    rotating fashion. If a processor has data to transmit, it

    waits for its turn and then sends it all in a burst of some

    maximum number of words. If it has nothing to transmit when

    its turn arrives, it sends a null word and the system moves

    on to the next transmitter. This is a fairly efficient

    scheme, although some time is wasted for null messages. The

    only real difficulty is keeping track of whose turn it is.

    Improving Bu s Effiiency

    While each of the approaches mentioned above has been

    made to work effectively, there are still a number of things

    which can be done to improve bus utilization efficiency.

    These possibilities include:

    (1) Scheduling transmissions to avoid periods of bus

    inactivity or overload.

    (2) Forming a queue of data to be transmitted in

    every processor so that every available micro-

    second on the bus is in use.

    (3) Making sure there is no wasted time between

    transmissions.

    (4) Making sure there are no wasted transmissions

    due to collisions.

    26

  • (5) Transmitting data when it is generated instead of

    waiting until it is needed (thereby reducing

    access delays).

    These five criteria served as design guides for the

    approach to be presented. In the remainder of this report,

    it will be shown how these goals have been achieved using

    the concepts of "transparent contention" and "virtual common

    memory."

    A New Approach to the Problem

    This section describes a new approach to autonomous bus

    control designed to meet the goals listed above. The

    following is an overview of how the idea works.

    Time on the bus is divided into a series of consecutive

    intervals (slots) that are exactly one transmission word

    long (32 to 46 bits, depending on word format). At the

    beginning of each new slot, all processors compete to fill

    the slot with a word of data. The resulting massive bus

    collision is then resolved using "transparent contention."

    Transparent contention is a scheme which allows collisions

    to occur on the bus in a manner such that only one of the

    colliding messages survives. All other messages are

    automatically suppressed without wasting a single bit of

    transmission time. As a result, the slot is filled with one

    and only one word and competition moves on to the next

    available interval.

    As long as there is data available to transmit, this

    approach packs data onto the bus with absolutely maximum

    density. No time is wasted during transmissions and no time

    is wasted between them. One hundred percent efficient bus

    utilization has been achieved.

    27

  • In order to ensure that there is always data available

    for transmission, each processor maintains a queue of words

    to be transmitted. As each new piece of data is generated,

    the processor places it into a first-in first-out buffer

    (FIFO) and "forgets about it." A special transmitter

    circuit then emptys the FIFO onto the bus by competing for

    time slices with all other transmitters in the system. This

    frees the processor from transmission considerations and

    ensures a constant flow of data onto the bus.

    There are, of course, potential problems with this

    approach. If data is not generated fast enough, it is

    possible for all buffers to become empty resulting in unused

    time slots on the bus. This is of no concern unless there

    are other times when too much data is generated resulting in

    backlogs and throughput delays. It is therefore important

    to schedule data generation in the system such that an even

    rate of transmission is maintained. A technique for

    scheduling data flow is presented later in this section.

    All that remains now is to explain the details of

    transparent contention. The following section discusses how

    it can render bus collisions harmless. Subsequent sections

    will present details on how to build and use such a system.

    C. TRANSPARENT CONTENTION

    This section presents the theory of operation behind

    transparent contention. While almost any bus configuration

    can make use of the idea, one specific design was chosen

    because of its ease of implementation in the laboratory.

    This design is covered first in order to clarify subsequent

    discussion of the transparent contention concept.

    28

  • The approach selected is nothing new. It amounts to

    nothing more than clocking data out of one shift register

    across a serial bus and into another. What jk unique is

    the manner in which this process is controlled to prevent

    conflicts on the bus between contending transmitters. In

    order to understand this process, it is helpful to review

    how a single transmitter operates when there is no

    competition from other transmitters.

    Transmission Without Contention

    The essential elements of the bus architecture are

    shown in Figure 7. In the figure, three processing modules

    are shown interconnected by a common serial bus made up of a

    data line and a clock line. Each processing module consists

    of an ordinary microcomputer with two I/O devices including

    a broadcaster (B) and a receiver (R). These devices use the

    signal on the clock bus to shift data on to and off of the

    data bus respectively. The box labeled "T" in the figure is

    a bus termination circuit which generates the clock signal

    and terminates the bus properly (See Appendix C).

    Using this simple bus structure, a word of data is

    transmitted in the following manner. The processor wishing

    to transmit first places its information in its local

    broadcaster FIFO. If the bus is available (as we assume in

    this section), the broadcaster immediately latches the FIFO

    output into a serial shift register and shifts it onto the

    data bus with each positive-going edge of the bus clock. On

    each negative-going edge, a bit on the bus is shifted into

    receiving shift registers in every processor. From there,

    the complete word is moved into local memory in each

    processor using direct memory access. This technique will

    be discussed later in the section on "Virtual Common Memory

    Design."
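
    A minimal software model of this uncontended transfer is
    sketched below in C, assuming a 16-bit word purely for
    illustration: the broadcaster drives one bit on each rising
    clock edge and every receiver samples the settled bus level
    on the falling edge.

        /* One word moving from the broadcaster shift register, across the data
         * line, into a receiver shift register (no contention assumed). */
        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
            uint16_t tx = 0xB6C1;                  /* word latched from the FIFO   */
            uint16_t rx = 0;                       /* receiver serial-in register  */

            for (int bit = 0; bit < 16; bit++) {
                int bus = (tx >> (15 - bit)) & 1;  /* rising edge: drive next bit  */
                rx = (uint16_t)((rx << 1) | bus);  /* falling edge: receivers read */
            }
            printf("sent %04x, received %04x\n", tx, rx);
            return 0;
        }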


    [Figure 7. Essential Architecture Elements: three microcomputers, each
    with a broadcaster (B) and a receiver (R), attached to a common data
    line and clock line; the termination circuit (T) generates the clock
    and terminates the bus.]

    Transmission With Contention

    The question now arises, "What happens when more than

    one processor wishes to use the bus at the same time?" The

    answer is simply, "one of them wins." Exactly which one

    wins is determined by a special logic circuit in each

    transmitter which resolves the conflict. Its operation is

    described below.

    In the first place, access to the bus is granted on a

    first-come, first-served basis. While one processor is

    actively using the bus, a logical BUSY signal is maintained

    which prevents any other processor from initiating a

    broadcast (See Appendix A). This eliminates many

    conflicts, but sooner or later more than one transmitter

    will begin using an available bus on the exact same clock

    pulse. When this happens, some other method is required to

    resolve the contention.

    The solution is found by observing what actually

    happens when two transmitters put data on the bus at the

    same time. As shown in Figure 8, each transmitter is

    connected to the bus by an open collector transistor buffer.

    When a transmitter wants to send a "zero", it turns on its
    output transistor, shorting the bus to ground. To transmit a
    "one", the transistor is turned off, allowing the bus to be
    pulled high by the pull-up resistor. As long as all
    transistors are turned off, the bus will float at a logic
    "1". If any transistor turns on, the bus will be pulled to
    a logic "0" state.

    The net result is that logic zeros have an inherent

    priority on the bus. Because a "1" is transmitted by

    "letting go" of the bus (so it will float high) while a "0"

    is transmitted by actively pulling the bus low, units


    [Figure 8. Transmitter-Bus Interface: transmitters A and B each drive
    the data bus through an open-collector transistor buffer; a single
    pull-up resistor ties the bus to the reference voltage.]

    transmitting zeros will always win out over those sending

    ones. It is this characteristic which allows transparent

    contention.

    The key to the idea is that every transmitter

    constantly compares what it is trying to put on the bus with

    what is actually there. In the event of a disagreement, the

    transmitter simply stops sending and waits for the bus to

    become available again. What makes this approach work is

    that when any two transmitters disagree, only one of them

    notices and drops off. The other one does not notice

    (because it got its way on the bus) and therefore continues

    its transmission. No bus time is wasted because one message

    is finished without interruption.

    At this point an example is helpful. Suppose two

    broadcasters begin to transmit on the same clock pulse as in

    Figure 9. Transmitter 1 attempts to send the binary

    sequence 01001 while transmitter 2 sends 01101. During the

    first microsecond, both pull the bus low and observe a zero

    on the bus. Since that is what they wanted, they continue

    to transmit. During the next microsecond, both transmitters

    "let go" of the bus allowing it to float high. They each

    observe a logic 1 and, satisfied, continue to transmit.

    However, during the third interval, transmitter 2 releases

    the bus to let it float high while transmitter 1 actively

    pulls the bus low. They both observe a zero on the bus.

    Since that is what number 1 wanted, it continues to

    transmit. Number 2, on the other hand, does not get its

    desired "one" and concludes that some other transmitter has

    pulled the bus low. It therefore aborts its transmission

    and waits for the bus to become available again.

    The net result is that transmitter 1 successfully

    completes its transmission from start to finish without

    interruption while transmitter 2 aborts as soon as the two


    transmitter 1 desired output      0 1 0 0 1
    transmitter 2 desired output      0 1 1 0 1
    actual output appearing on bus    0 1 0 0 1
    time (microseconds)               1 2 3 4 5

    Figure 9. Bus Contention Arbitration

    disagree. No transmission time was lost and, in fact,

    transmitter 1 was never even aware of its competition.

    "Transparent contention" has been achieved.

    This concept works equally well for any number of

    transmitters in contention. If ten of them start

    simultaneously, they all send in parallel until there is a

    disagreement. At that time those sending zeros win while

    those sending ones drop off. The remaining transmitters

    continue until the next conflict at which time still more

    losers drop off. Eventually, only one transmitter is left

    and it finishes its transmission, completely unaware of its

    nine vanquished competitors.
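
    The bit-by-bit arbitration can be modeled in a few lines of
    C. The sketch below assumes an open-collector bus whose
    level is the logical AND of all actively driven bits (any
    "0" pulls it low); transmitters 1 and 2 use the words from
    the example above, and a hypothetical third contender is
    added to show losers dropping off at different points.

        /* Transparent contention among transmitters that start on the same
         * clock pulse.  A transmitter that reads back a bus bit different from
         * the bit it drove aborts; exactly one survives. */
        #include <stdio.h>

        #define NBITS  5
        #define NTX    3

        int main(void)
        {
            int word[NTX][NBITS] = { {0,1,0,0,1},     /* transmitter 1            */
                                     {0,1,1,0,1},     /* transmitter 2            */
                                     {0,1,1,1,0} };   /* hypothetical third unit  */
            int active[NTX] = { 1, 1, 1 };

            for (int bit = 0; bit < NBITS; bit++) {
                int bus = 1;                          /* bus floats high          */
                for (int t = 0; t < NTX; t++)
                    if (active[t] && word[t][bit] == 0)
                        bus = 0;                      /* any active "0" wins      */
                for (int t = 0; t < NTX; t++)
                    if (active[t] && word[t][bit] != bus) {
                        active[t] = 0;                /* loser aborts silently    */
                        printf("transmitter %d drops off at bit %d\n",
                               t + 1, bit + 1);
                    }
            }
            for (int t = 0; t < NTX; t++)
                if (active[t])
                    printf("transmitter %d finished, unaware of the others\n",
                           t + 1);
            return 0;
        }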

    This approach is based upon the assumption that no two

    transmitters will ever try to send identical words at the

    same time. If this coincidence should occur, each processor

    would assume that its own broadcast was successful and only

    one copy of the word in question would appear on the bus.

    This may or may not be tolerable depending upon how

    information appearing on the bus is used.

    A more significant consideration is the event in which

    two words being transmitted fail to disagree until near the

    end of the word. At that time the losing processor would

    abort its transmission, but only after having wasted its

    time sending most of the word. In a system with only one

    bus this is unimportant since the losing transmitter would

    have nothing to do but wait for the bus anyway. But in a

    system with n busses (to be discussed next), it is desirable

    for a transmitter to find out if it is going to lose as soon

    as possible so that it can begin searching for another bus.

    If these considerations are important, there is a

    simple solution. Each transmitter adds its own unique

    identification code to the beginning of each message. In a


    system with 16 processors, this code would be 4 bits long.

    Five bits would allow up to 32 processors, and so on. Using

    this method, any two processors are guaranteed to disagree
    within the identification code, freeing the loser to seek
    another bus. This approach has the added benefit that it is possible

    to determine which processor initiated each broadcast for

    fault isolation purposes.

    An Extension to n Busses

    The bus structure which has been discussed so far

    represents a very simple way to interconnect a large number

    of autonomous processors without need of a central

    controller. However, a single bus system of any kind is

    generally unacceptable from a reliability standpoint. At

    the very least, some form of redundancy is required in order

    to avoid a potential single point failure node in the

    system. Also, a single serial bus has only a finite

    bandwidth. A large system of processors exchanging massive

    amounts of data can quickly saturate such a bus. The

    approach proposed in this report is ideally suited for

    expansion to as many busses as are needed to meet the

    reliability and throughput requirements of nearly any

    system. The following paragraphs detail the implementation

    and advantages of an autonomously controlled n-bus design.

    Figure 10 shows the transmitter interface of a single

    processor in a system with four busses (4 sets of clock and

    data lines). The circuit is controlled by the box labeled

    "transmission control logic." Upon receiving a "START"

    signal (from the CPU), this logic instructs the "bus finder"

    to "SEARCH" for a free bus. When it finds one, it locks two

    data selectors and a data distributor onto the bus (using

    its "BUS SELECT" lines) and signals the control logic that

    it has "FOUND" a bus. The control logic then loads the

    shift register with data from the CPU output buffer and


    [Figure 10. Transmitter Circuit: a bus-available detector and two
    4-to-1 selectors lock onto a free clock/data bus pair, a 1-to-4
    distributor drives the selected data bus from the shift register, and
    the transmission control logic sequences the START, SEARCH, FOUND, and
    ABORT signals between the CPU and the bus finder.]

    enables the shift register clock. Data is shifted (using

    the appropriate bus clock) out through the 1 to 4

    distributor onto the selected data bus using the same

    open-collector transistor buffers shown in Figure 8. As

    always, a one-bit comparator monitors the difference between

    the shift register (desired) output and the actual output on

    the selected bus. If there is ever a miscompare, an "ABORT"

    is generated and the transmission control logic instructs

    the bus finder to locate another bus. This process

    continues until the transmitter is successful at placing its

    entire word on the bus, at which time another word is

    obtained from the CPU buffer and the cycle begins again.
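
    The control sequence just described can be summarized as a
    small state machine. The C sketch below is only a
    behavioral outline of the circuit: bus_is_free() and
    bus_bit() are stand-ins for the bus-available detector and
    the one-bit comparator input, and the stub versions here
    simply assume that bus 2 is idle and that no contention
    occurs.

        /* Behavioral outline of the transmission control logic: START, then
         * SEARCH for a free bus, then SHIFT the word while comparing desired
         * and actual bits, restarting the search on an ABORT (miscompare). */
        #include <stdint.h>
        #include <stdio.h>

        #define NBUS 4

        enum state { IDLE, SEARCH, SHIFT };

        static int bus_is_free(int b)         { return b == 2; }          /* stub */
        static int bus_bit(int b, int driven) { (void)b; return driven; } /* stub */

        void transmit(uint32_t word, int nbits)   /* called on START from the CPU */
        {
            enum state s = SEARCH;
            int bus = 0, bit = 0;

            while (s != IDLE) {
                switch (s) {
                case SEARCH:                              /* "bus finder"         */
                    for (bus = 0; bus < NBUS; bus++)
                        if (bus_is_free(bus)) break;
                    if (bus < NBUS) { bit = 0; s = SHIFT; }   /* FOUND            */
                    break;
                case SHIFT: {
                    int want = (word >> (nbits - 1 - bit)) & 1;
                    if (bus_bit(bus, want) != want) { s = SEARCH; break; } /* ABORT */
                    if (++bit == nbits) {
                        printf("word %08x sent on bus %d\n", (unsigned)word, bus);
                        s = IDLE;                 /* fetch next word from buffer  */
                    }
                    break;
                }
                default:
                    break;
                }
            }
        }

        int main(void) { transmit(0x000A5F3u, 20); return 0; }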

    This bus design has tremendous flexibility. Its

    bandwidth is exactly four times that of a single bus and can

    be expanded still further with additional busses.

    Reliability is also enhanced. Because processor to bus

    connections are continuously reconfiguring, selection of an

    alternate bus in the event of a failure is instantaneous and

    automatic.

    There are, however, a few physical limitations which

    remain to be resolved. The first is that there is a limit

    to how many open-collector transistors can be "wire-ored" to

    one bus before the sum of their leakage currents pulls the

    bus low even if no transistor is on. This limit can be

    increased using low leakage transistors, but can never be

    totally ignored. Noise considerations on such a bus will

    also require further research. For the present, the triplex

    data approach described in Section IV will be relied upon to

    correct for noise-corrupted transmissions.

    Another consideration is the effect of propagation

    delays on the output of each transmitter comparator. The

    fact that desired and actual outputs match at one location


    is no guarantee that the same holds true many feet away on

    the bus. This problem is avoided for reasonable line lengths

    by the manner in which data is clocked onto the bus. Data

    is shifted onto the bus on the rising edge of each clock

    pulse, but the comparator's output is not sampled until the

    falling edge. This allows one half of a microsecond for the

    data to settle before it is used.

    Finally, the open collector transistor implementation

    is only one approach to the transparent contention concept.

    Any technique where one logic state wins out over another

    will work. In the case of fiber optic busses, for example,

    the presence of light on the bus could be made to win out

    over its absence, and so on. For the purposes of concept

    demonstration in the laboratory, the wired-or approach has

    been shown to work very well.

    D. VIRTUAL COMMON MEMORY

    Up to this point, discussion has centered around the

    transparent contention concept and its physical

    implementation. In this section an actual application is

    presented, allowing the development of what is called "virtual common memory."

    The Virtual Common Memory Concept

    One of the main problems that occurs in the design of

    multiprocessor systems is how to distribute and exchange

    data efficiently. From a hardware standpoint, the easiest

    approach is usually to connect all processors to a common

    serial multiplex bus (Figure 11a). This minimizes hardware

    complexity and allows expandability, but often involves a

    large software overhead. This is because processors must


    [Figure 11. Evolution of Virtual Memory: (a) common data bus,
    (b) common memory, (c) virtual common memory.]

    exchange data on a "request" or "broadcast" basis, both of

    which require special handling by every processor in the

    system.

    In the "request" mode of operation, a processor that

    needs a piece of information simply asks for it on the bus

    and receives it from some other processor a short time

    later. This means that each processor must constantly

    monitor the bus for data requests rather than concentrating

    upon the task to which it has been assigned. Because

    processors must wait for much of their data, processing

    inevitably takes longer than it would if all data were

    available in local memory at the start of a given task.

    Additional processing time is also wasted in responding to

    requests for data from other system processors.

    The "broadcast" method is an alternate approach to data

    exchange where every piece of data is transmitted as soon as

    it is produced. Each processor then selects from the bus

    whatever information it needs to accomplish its current

    task. This approach also requires a lot of overhead as each

    processor must now constantly monitor the bus for items of

    local interest.

    Thus, the common serial multiplex bus, while being the

    simplest and most flexible from a hardware point of view, has

    serious drawbacks in terms of software complexity.

    The simplest and most efficient approach to

    interprocessor communication from a software point of view

    is a common memory containing all information required by

    all processors (Figure 11b). In such a system, a processor

    stores its output in the common memory where it can be

    instantly accessed by any other processor in the system.

    When one processor needs information from another, it simply

    reads it from the common memory without delay.


    Unfortunately, what is ideal from a software standpoint

    is difficult to implement in hardware. Serious contention

    problems develop when more than one processor attempts to

    access the same block of memory. Since each processor must

    be connected to the common memory by a complete set of

    address and data lines, the hardware complexity is also

    large. Finally, system expandability is impaired. In a

    serial bus system more processors can be added by simply

    connecting them to the bus, but there is a limit to the

    number of ports available in a common memory. When these

    have been used, no more can be added without redesigning the

    system.

    So, it appears that what is good for hardware is bad

    for software and vice-versa. Clearly, a scheme that could

    combine the best of both approaches is highly desirable.

    Virtual common memory is such a solution.

    Virtual common memory is a method for making a serial

    multiplex bus look like a single common memory to the system

    programmer. As such, it combines the hardware simplicity of
    a serial bus with the software simplicity of a common memory
    (Figure 11c). In the following paragraphs the physical

    implementation of a virtual common memory containing all

    information to be shared among processors will be

    described.

    Virtual Common Memory Design

    This section describes the virtual common memory design

    used in the CRMmFCS architecture. The essential features of

    this design are shown in Figure 12. In the figure, six

    processing modules are shown connected by four serial

    multiplex busses of the kind described above. This

    represents the system actually being constructed at the


    Flight Dynamics Laboratory, although additional busses or

    processors could have been included. The number shown is

    considered to be enough to demonstrate the overall concept.

    In a true shared memory architecture, the common memory

    contains all information required by any processor in the

    system. This information describes the entire state of the

    aircraft, and the memory which contains it is called the "state information memory" (SIM). When a processor needs a

    particular state variable, it accesses a well-defined

    location in the SIM. When it generates a variable, it

    places it in a specific SIM location where other processors

    can find it. No other processing is required for complete

    interprocessor communication. A complete discussion of the

    SIM concept is presented in Appendix F.

    In the virtual common memory design of Figure 12, each

    processor is given its own copy of the SIM. To access a SIM

    variable, it simply looks it up in its own copy. To store a

    variable into the SIM, a processor broadcasts it over the

    bus and every processor's copy is updated simultaneously.

    Since reading from a local copy is the same as reading from

    a common one, and since writing to the bus is the same as

    writing to a common memory, as far as any processor is

    concerned there is only one "virtual" common memory being

    used by everyone.
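
    From the programmer's side, then, the whole mechanism
    reduces to two operations. The C sketch below shows that
    view; sim_read, sim_write, and queue_broadcast are
    illustrative names rather than the CRMmFCS software
    interface, and the broadcast is stubbed as a direct store
    since the bus round trip is what the hardware provides.

        /* Virtual common memory as seen by one processor: reads are ordinary
         * local fetches, writes are broadcasts that update every SIM copy. */
        #include <stdint.h>
        #include <stdio.h>

        #define SIM_SIZE 2048                    /* one location per state variable */

        static uint16_t sim[SIM_SIZE];           /* this processor's local SIM copy */

        static void queue_broadcast(uint16_t addr, uint16_t value)
        {
            sim[addr] = value;                   /* stand-in for the bus round trip */
        }

        static uint16_t sim_read(uint16_t addr)  { return sim[addr]; }

        static void sim_write(uint16_t addr, uint16_t value)
        {
            queue_broadcast(addr, value);        /* every copy, including this one, */
        }                                        /* is updated off the bus          */

        int main(void)
        {
            sim_write(42, 1234);                 /* producer stores a variable      */
            printf("variable 42 = %u\n", sim_read(42));  /* any processor reads it  */
            return 0;
        }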

    In order to make bus transmissions appear to be reads

    and writes on a common memory, two special circuits were

    designed. These circuits are described next.

    The Transmitter. The transmitter circuit (labeled

    "XMIT" in Figure 12) was shown in detail in Figure 10. It

    is connected to the microcomputer through a two-page buffer

    (P1 and P2) which is memory-mapped to the local CPU. These


    [Figure 12. Physical Implementation of Virtual Memory: six processing
    modules, each with a two-page transmitter buffer (P1, P2) and local
    memory, connected by four serial multiplex busses.]

    pages alternate functions every millisecond so that, while

    one of them is being loaded by the CPU, the other is being

    unloaded onto the bus. This dual buffer approach ensures

    that all data is transmitted with no more than one

    millisecond delay. Its application will be discussed in

    greater detail later. Complete details on the transmitter

    circuit are given in Appendix A.
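
    The page-swapping behavior can be outlined as follows. In
    this C sketch the names and page size are assumptions, not
    the Appendix A design; the CPU fills one page during a
    frame and the frame-boundary routine hands that page to the
    broadcaster. For simplicity the page is drained immediately
    here, whereas the real buffer is emptied over the course of
    the following frame.

        /* Two-page transmitter buffer: the CPU loads one page while the other
         * is unloaded onto the bus; the pages swap roles every millisecond. */
        #include <stdint.h>
        #include <stdio.h>

        #define PAGE_WORDS 32

        static uint32_t page[2][PAGE_WORDS];
        static int fill[2];                       /* words currently in each page */
        static int cpu_page = 0;                  /* page being loaded by the CPU */

        static void cpu_store(uint32_t word)      /* during the current frame     */
        {
            if (fill[cpu_page] < PAGE_WORDS)
                page[cpu_page][fill[cpu_page]++] = word;
        }

        static void frame_boundary(void)          /* every millisecond            */
        {
            int bus_page = cpu_page;              /* freshly filled page goes     */
            cpu_page ^= 1;                        /* to the broadcaster           */
            for (int i = 0; i < fill[bus_page]; i++)
                printf("broadcast %08x\n", (unsigned)page[bus_page][i]);
            fill[bus_page] = 0;                   /* emptied, ready for the CPU   */
        }

        int main(void)
        {
            cpu_store(0x00C0FFEE);                /* data generated in frame 1    */
            frame_boundary();                     /* broadcast during frame 2     */
            return 0;
        }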

    The Receiver. The receiver consists of two major

    parts including a serial to parallel converting shift

    register (SIPO) and a block of random access memory (RAM)

    which contains a complete copy of all state information in

    the system. This state information memory (SIM) is mapped

    into the microcomputer's address space as a block of "read

    only" memory and is accessed by the SIPO outputs as a block

    of "write only" memory. Figure 13 shows how this works.

    In the architecture under discussion, a word of

    transmitted data is 37 bits long. It consists of four bytes

    of significant data separated by a zero bit before and after

    each byte (see Figure 13). A string of more than eight

    consecutive "I" bits on the bus indicates that it is no

    longer in use, so zero bits are included between each byte

    to ensure that the bus continues to "look" busy in the event

    that more than eight consecutive ones occur in the actual

    data word.

    Once the SIPO has been fully loaded with a 37 bit word

    from the bus, the 32 significant bits (excluding the 5

    separating zeros) are loaded onto the address and data lines

    of the SIM RAM as follows. The first 5 bits contain the

    identity code of the sending processor. These bits were

    used only for quick resolution of bus contention and could

    be discarded unless it is important to record who sent each

    word for fault isolation purposes. (In the CRMmFCS design,


    [Figure 13. Receiver Block Diagram: the serial-in parallel-out shift
    register (SIPO), clocked from the clock bus, collects the 37-bit word
    (zero bit, byte 4, zero, byte 3, zero, byte 2, zero, byte 1, zero)
    from the data bus and loads the 16-bit data word into the 2K by 16
    bit state information memory (SIM) RAM, which the CPU reads over its
    address and data lines.]


    these bits are in fact saved in the SIM for use by

    black-balling algorithms -- see Section IV-D.) The next 11

    bits specify the location in the SIM to which this data word

    is to be stored. They may be thought of as the name of the

    SIM variable and, in this case, they allow up to 2048

    different variables. Finally, the last 16 bits of the

    transmission contain the value of the variable. These 16

    data bits are loaded into the SIM via direct memory access

    (DMA). The variable is now available for access by the CPU

    whenever it is needed.
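
    One plausible packing of this 37-bit word is sketched below
    in C. Only the field widths (5-bit identity code, 11-bit
    SIM address, 16-bit value), the four data bytes, and the
    separating zero bits are taken from the text; the exact bit
    ordering chosen here is an assumption made for
    illustration.

        /* Pack and unpack a 37-bit transmission word: a leading zero bit, then
         * four data bytes each followed by a zero bit, where the 32 significant
         * bits are (front to back) a 5-bit sender ID, an 11-bit SIM address,
         * and a 16-bit data value.  Bit ordering is assumed, not documented. */
        #include <stdint.h>
        #include <stdio.h>

        static uint64_t pack_word(unsigned id, unsigned addr, unsigned data)
        {
            uint32_t sig  = ((uint32_t)(id   & 0x1F)  << 27) |  /* identity code  */
                            ((uint32_t)(addr & 0x7FF) << 16) |  /* SIM address    */
                             (uint32_t)(data & 0xFFFF);         /* variable value */
            uint64_t word = 0;                     /* leading zero bit (always 0) */
            for (int byte = 3; byte >= 0; byte--) {          /* byte 4 sent first */
                word = (word << 8) | ((sig >> (8 * byte)) & 0xFF);
                word <<= 1;                                  /* separator zero    */
            }
            return word;                                    /* 37 bits, MSB first */
        }

        static void unpack_word(uint64_t word, unsigned *id,
                                unsigned *addr, unsigned *data)
        {
            uint32_t sig = 0;
            word >>= 1;                                      /* drop final zero   */
            for (int byte = 0; byte < 4; byte++) {           /* byte 1 is at end  */
                sig |= (uint32_t)(word & 0xFF) << (8 * byte);
                word >>= 9;                                  /* skip separator    */
            }
            *id   = (sig >> 27) & 0x1F;
            *addr = (sig >> 16) & 0x7FF;
            *data =  sig        & 0xFFFF;
        }

        int main(void)
        {
            unsigned id, addr, data;
            unpack_word(pack_word(9, 0x155, 0xBEEF), &id, &addr, &data);
            printf("id=%u addr=0x%03x data=0x%04x\n", id, addr, data);
            return 0;
        }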

    While all of this is happening, the SIPO register is

    collecting the next word of data as it appears on the bus.

    This, in general, begins after nine bus clock cycles (during

    which the bus floats "high"). At this time every other

    transmitter realizes that the bus is no longer in use

    (because nine consecutive "ones" have occurred) and

    contention for the bus begins again. Thirty-seven

    microseconds later the SIPO is again full and another DMA

    cycle is executed to load its contents into the SIM. Thus,

    a word is received every 9 + 37 = 46 microseconds.

    In a system with n busses, there are n SIPO shift

    registers requesting direct memory access to the SIM. Since

    there are 46 microseconds between successive DMA requests

    from any one bus, and since current high speed RAM can

    handle as many as four accesses per microsecond, it is

    theoretically possible for a system to have as many as 4 x

    46 = 184 serial busses, each operating at 1 MHz, for a 184

    MHz total system bandwidth. Such a system would allow

    processors to exchange up to 184 x (1 variable/46

    microseconds) = 4 million variables per second.

    Of course, connecting 184 shift registers to a single

    block of memory is impractical with today's technology, but


    if the entire CPU, SIM, transmitter, and receiver were

    integrated on a single chip (with only the bus lines brought

    out to the pins), such an approach might well be possible.

    Until then, a practical limit is about 8 busses with a

    corresponding bandwidth of 8 MHz. Appendix B presents a

    complete description of the receiver design.

    While the potential for high throughput rates is

    intriguing, the real usefulness of virtual common memory is

    to allow easier programming of processors connected by

    serial multiplex busses. The next section shows how a

    virtual common memory architecture can be used most

    effectively for this purpose.

    E. EFFECTIVE USE OF VIRTUAL COMMON MEMORY

    Knowing how to use a virtual common memory architecture

    can make a big difference in how useful the idea is. This

    section discusses the application of virtual common memory

    to the CRMmFCS design. This is not the only way to use

    virtual common memory, but it does illustrate some important

    considerations for effective use of the concept.

    In the first place, it is important to schedule

    transmissions carefully in order to take full advantage of

    the data packing capabilities of the architecture. Figure 14

    shows how transmissions are scheduled in the CRMmFCS design.

    Time in the system is divided into 1 ms frames and all

    processors in the system are synchronized to this frame

    rate. Since each transmission is exactly 46 microseconds

    long, and since the transmitters pack data onto the bus with

    no wasted time in between, there is room for 1000/46 or 21


    complete transmissions per bus per millisecond. Therefore,

    when system software is being written, it is modularized

    into 1 millisecond chunks (called "millimodules") and no

    more than 4 x 21 = 84 transmissions are scheduled for any

    one millisecond. Since there are 84 slots available in each

    frame, this guarantees that every scheduled transmission

    will be completed sometime within its assigned millisecond.
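
    The slot budget behind this rule is easy to restate in
    code. The short C sketch below simply reproduces the
    arithmetic from the text (1 ms frames, 46 microsecond
    slots, four busses, 42 slots held in reserve).

        /* Slot budget per millimodule, using the figures given in the text. */
        #include <stdio.h>

        int main(void)
        {
            const int frame_us = 1000;     /* one millisecond frame               */
            const int slot_us  = 46;       /* 37-bit word plus 9 idle clocks      */
            const int busses   = 4;
            const int reserve  = 42;       /* kept free to survive two bus losses */

            int slots_per_bus = frame_us / slot_us;       /* 21 */
            int slots_total   = slots_per_bus * busses;   /* 84 */
            int schedulable   = slots_total - reserve;    /* 42 */

            printf("%d slots/bus, %d total, %d schedulable per millimodule\n",
                   slots_per_bus, slots_total, schedulable);
            return 0;
        }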

    Each dot in Figure 14 represents a 46 bit word of data

    appearing on the bus. In this example, 22 of the 84

    available slots are scheduled for use. At the beginning of

    each millisecond, all transmitters compete to place their

    part of the scheduled variables onto the bus. Transparent

    contention packs these words into one slot after the other

    on all four busses until every scheduled word has been

    transmitted. Then the bus sits idle until the start of the

    next millisecond.

    It is, of course, possible to schedule 84 transmissions

    per millisecond and never have the bus sit idle. However,

    for reliability reasons, it is usually wise to leave enough

    unused slots to "take up the slack" in the event that one of

    the busses fails. In the CRMmFCS design, 42 slots are left

    unused in each frame so that the system can tolerate two bus

    failures without loss of throughput.

    This approach makes processor task scheduling a lot

    easier. Because it is known (to the nearest millisecond)

    exactly when a variable will appear on the bus, it is also

    known (to the nearest millisecond) how soon a task can be

    scheduled using that variable. Figure 15 summarizes this

    scheduling procedure and illustrates an efficient virtual

    common memory system in operation.


    [Figure 14: each dot represents one scheduled 46-microsecond
    transmission slot on one of the four busses within a 1 ms frame.]

    The figure is divided into four different rows showing

    where variables A, B, C, and D are scheduled to be over a

    period of four milliseconds. During the first frame,

    variables A and B are shown in row 1 indicating that they

    are currently in the SIM and available for use. As a

    result, Task 1, which computes C = A + B, can be scheduled

    for this frame as shown in row 2. Once C has been computed,

    it is placed in the currently active page of the transmitter

    buffer (refer to Figure 12) where it remains for the

    duration of the frame. Let us assume that this was page 1.

    At the end of frame 1, the two transmitter pages are

    switched so that page 1 is connected to the bus and page 2

    (which was emptied onto the bus during frame 1) is connected

    to the CPU to collect any data generated during frame 2.

    Now that page 1 is connected to the bus, the transmitter

    circuit broadcasts variable C in the first available slot.

    This is indicated in row 4 of the figure.

    It is not possible to know exactly when during frame 2

    variable C actually appears on the bus. This is a random

    function of when transparent contention allowed the

    transmitter to gain access. But since there are more slots

    than there are scheduled variables, sooner or later C will

    get its chance. It is therefore guaranteed to reach the SIM

    in every processor by the end of frame 2. This is indicated

    by showing C in row 1 ready for use during frame 3. Task 2,

    which computes D = SQRT(C), can now be scheduled for frame 3

    and the entire process repeats.

    Thus, one of the major goals of the design has been

    accomplished: elimination of access delay to global

    variables. Because data is broadcast to every processor as

    soon as it is generated, it is always available in the SIM

    by the time a task is scheduled to use it.
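
    The scheduling rule implied by this example can be stated
    compactly: a task may run in a given frame only if each of
    its input variables was broadcast no later than the
    preceding frame. The C sketch below applies that rule to
    the Task 1 / Task 2 example above; the task table and the
    single-letter variable names are illustrative only.

        /* Frame assignment from data dependencies: a variable written in frame
         * f is broadcast during frame f+1 and is usable from frame f+2 on. */
        #include <stdio.h>

        struct task { const char *name; const char *reads; char writes; };

        int main(void)
        {
            int available['Z' + 1] = { 0 };   /* frame in which each variable has
                                                 reached every SIM copy (0 = start) */
            available['A'] = available['B'] = 0;   /* A, B present from the start   */

            struct task tasks[] = { { "Task 1: C = A + B",   "AB", 'C' },
                                    { "Task 2: D = sqrt(C)", "C",  'D' } };

            for (int t = 0; t < 2; t++) {
                int frame = 1;
                for (const char *v = tasks[t].reads; *v; v++)
                    if (available[(int)*v] + 1 > frame)
                        frame = available[(int)*v] + 1;       /* wait for inputs   */
                printf("%s scheduled for frame %d\n", tasks[t].name, frame);
                available[(int)tasks[t].writes] = frame + 1;  /* broadcast next    */
            }
            return 0;
        }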


    [Figure 15. Data Flow Assignment: rows show, for each millisecond
    frame (1 through 4), the variables in the SIM ready for use, the
    variables generated during that millisecond (C = A + B, D = sqrt(C)),
    the variables in the buffer ready for broadcast, and the variables
    appearing on the bus during that millisecond.]

    F. SECTION SUMMARY

    This section has presented an overview of the essential

    hardware elements of the CRMmFCS. It has also introduced

    two new concepts in interprocessor communications. The

    first is called "transparent contention" and represents a

    method for autonomous processors to share a common bus at

    maximum efficiency without need of a central controller. It

    has been shown that this approach opens up many new

    possibilities for improved bandwidth while avoiding the

    pitfalls of single point failures possible in many central

    controller designs.

    The second concept presented is called "virtual common

    memory." It represents a method of minimizing the hardware

    and software involved in interprocessor communications.

    While not essential to the concept, transparent contention

    was shown to be an ideal method for implementing virtual

    common memory in many practical situations.

    The next section of this report discusses how the

    CRMmFCS software structure was designed to utilize the

    hardware which was developed in this section.


    SECTION IV

    SOFTWARE STRUCTURE

    A. INTRODUCTION

    In Section III it was shown how a simple set of serial

    busses could be made to look like a shared common memory to

    the system programmer. Now that such a system may be

    assumed to exist, it is time to show how software can be

    designed to take advantage of this new architecture. This

    se