
Hard copies of this document are for REFERENCE ONLY and should not be considered the latest revision.

Document #: LAT-TD-03724-01

Date effective: May 3, 2004

Author(s): Eric J Siskind, PhD, NYCB Real-Time Computing Inc

Supersedes:

Subsystem/Office: Electronics & DAQ Subsystem

Document Title: FPGA Theory of Operation, AEM, GLAST LAT GASU DAQ Board

FPGA Theory of Operation, AEM



CHANGE HISTORY LOG

Revision    Effective Date    Description of Changes

01          May 3, 2004       Original




Table of Contents

1. INTRODUCTION

2. DEFINITIONS AND ACRONYMS

    2.1 Definitions

    2.2 Acronyms

3. REFERENCES

4. HARDWARE OVERVIEW, VHDL ORGANIZATION, & MEMORY ARBITRATION

    4.1 VHDL Organization

    4.2 Memory arbitration

5. SPACE-FLIGHT CONSIDERATIONS

6. GLOBAL LOGIC FEATURES

    6.1 Nomenclature

    6.2 Clocking

    6.3 VHDL Process Decomposition

7. INDIVIDUAL MODULE FEATURES

    7.1 LATp_Registers

    7.2 Environment_Registers

    7.3 Memory_Write_Data_Multiplexer

    7.4 Memory_Address_Multiplexer

    7.5 Memory_Control

    7.6 Trigger_Processor

    7.7 Event_Receiver

    7.8 Event_Transmitter




1. INTRODUCTION

This document contains a description of the theory of operation for the FPGA-hosted VHDL-

generated logic that implements the function of the Anti-Coincidence Detector (“ACD”) Electronics

Module (“AEM”) for the GLAST LAT GASU DAQ board. It is intended as an aid for the FPGA

developer or reviewer or the software implementer attempting to understand the operation of the

VHDL-generated logic. For the flight circuit board, the host FPGA is an Actel RT54SX72S part.

Early versions of the flight circuit board may be loaded with a non-radiation-hard variant of this part, the Actel A54SX72A.




2. DEFINITIONS AND ACRONYMS

The following terms, abbreviations, and acronyms are used in this document:

2.1 Definitions

adj adjustable

cm centimeter

Eff Efficiency

Hz Hertz, unit of frequency

kHz kilohertz, 10^3 Hz

MHz Megahertz, 10^6 Hz

msec millisecond, 10^-3 second

mV millivolt, 10^-3 Volt

p-p peak-to-peak

s, sec seconds

µ micro, 10^-6

V Volt

W Watt

2.2 Acronyms

ACD Anti Coincidence Detector

AEM Anti Coincidence Detector Electronics Module

CRU Command Response Unit

EBM Event Builder Module




EBU Event Builder Unit

FIFO First In First Out memory

FPGA Field Programmable Gate Array

FREE Front End Electronics

GEM Global Trigger Electronics Module

GLAST Gamma Ray Large Area Space Telescope

LAT Large Area Telescope

LVDS Low Voltage Differential Signaling

PDU Power Distribution Unit

SRAM Static Random Access Memory

TAM Trigger Accept Message

VHDL VHSIC Hardware Description Language




3. REFERENCES

LAT-TD-00639, AEM programming ICD, functional specifications of the software interface to the AEM

LAT-TD-00363, electrical interface between the AEM and the ACD front-end

LAT-TD-00606, details of the LATp inter-module serial protocol

LAT-TD-01545, format of the trigger message from the global-trigger electronics module (“GEM”) to the AEM (and elsewhere)




4. HARDWARE OVERVIEW, VHDL ORGANIZATION, & MEMORY ARBITRATION

The logic within the AEM has the responsibility for performing the following functions:

• Implement a node on the LAT command/response fabric. By exchanging serial messages

implementing the LATp protocol with the CRU, this function permits software executing in

the currently active Spacecraft Interface Unit (“SIU”) cPCI system which is the commander

of the command/response fabric to write and read registers within the AEM. These registers control both the parameters of the AEM’s operation and the application of 3.3 and 28 volt power to the individual ACD Front-End Electronics (“FREE”) boards.

• Extend the access of the command/response fabric to include the GARC and GAFE ASICs

on the FREE boards. This function involves converting the format of LATp messages from

the CRU that target the FREE board ASICs into the command format supported by those

ASICs, and re-formatting the reply messages received from those ASICs into appropriate

LATp messages returned to the CRU.

• Read the contents of 52 ADCs on the DAQ board and format these data into a block of

registers accessed via the command/response fabric node interface. These ADCs monitor

voltages of power going to each FREE board and generated by associated high voltage bias

supplies, the temperature of each FREE board, total current supplied to the ensemble of

FREE boards, and local power supply voltage and board temperature at the DAQ board.

• Receive Trigger Accept Messages (“TAMs” or “TAM messages”) from the GEM, generate

the appropriately timed sequence of calibration and/or event readout commands to the FREE

board ASICs, and store the TAM messages in a buffer memory. Note that the command path

to the FREE board ASICs used to implement this function is identical to that utilized to

forward formatted commands from the command/response fabric to these ASICs, and that

only a software protocol prevents the collision of these two classes of commands. This

function also includes generation of dead-time to the GEM until the appropriate set of data

responses has been received from the FREE board ASICs or alternative timeout processing

has occurred.




• Receive readout data from the FREE board ASICs in response to event readout commands

and store these data in a buffer memory. Note that the response path from the FREE board

ASICs used to implement this function is identical to that utilized to receive responses to

commands from the command/response fabric, and that only a software protocol prevents the

collision of these two classes of responses.

• Format stored TAM messages and event data from the buffer memory into messages

consisting of sequences of one or more LATp cells and forward these event contributions on

the event fabric connection to the EBM, while responding to backpressure from the EBM.

4.1 VHDL Organization

The first two of these functions are combined into a single VHDL module; each of the remaining

functions is coded into its own VHDL module. In addition, there is a top-level wrapper module that

instantiates the other five modules. The top-level module instantiates three additional modules

which generate the write data, address lines, and control signals for the asynchronous SRAM

respectively. It also contains a limited number of input flip-flops for those pins that feed logic in

multiple lower-level function modules, as well as a clock buffer and inverter.

4.2 Memory arbitration

The heart of the AEM is the buffer memory system, which provides the ability to simultaneously

receive new events from the FREE boards while forwarding previously acquired events to the EBM.

The AEM’s buffer memory size is 512k 16-bit words, implemented in two 512k by 8-bit static

memory chips. Only a small portion of this total capacity is used to buffer TAM messages and event

data; this active buffer region can be relocated within the overall memory region by entries in the

AEM’s configuration register in order to mitigate the effects of partial memory failures. At present,

the design generates the two most significant and two least significant of the memory’s 19 address

bits from this relocation register, thus providing some reserve in the case of failure of SRAM

internal components associated with either entire rows or entire columns of the chips’ memory

arrays.
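The relocation scheme can be illustrated with a minimal sketch. It assumes, hypothetically, that the relocation register supplies one 2-bit field for the two most significant address bits and one 2-bit field for the two least significant bits, and that the buffer logic supplies the 15 bits in between. The names `sram_address`, `reloc_high`, `reloc_low`, and `buffer_offset` are illustrative only and are not taken from the VHDL.

```python
ADDRESS_BITS = 19      # 512k words
RELOC_HIGH_BITS = 2    # two most significant address bits (from the text)
RELOC_LOW_BITS = 2     # two least significant address bits (from the text)
# Assumption: the remaining middle bits come from the buffer logic.
OFFSET_BITS = ADDRESS_BITS - RELOC_HIGH_BITS - RELOC_LOW_BITS  # 15


def sram_address(reloc_high, reloc_low, buffer_offset):
    """Compose a 19-bit SRAM address from hypothetical relocation fields."""
    if not 0 <= reloc_high < (1 << RELOC_HIGH_BITS):
        raise ValueError("reloc_high out of range")
    if not 0 <= reloc_low < (1 << RELOC_LOW_BITS):
        raise ValueError("reloc_low out of range")
    if not 0 <= buffer_offset < (1 << OFFSET_BITS):
        raise ValueError("buffer_offset out of range")
    return (
        (reloc_high << (OFFSET_BITS + RELOC_LOW_BITS))
        | (buffer_offset << RELOC_LOW_BITS)
        | reloc_low
    )
```

Under this assumed split, changing `reloc_high` moves the active region to a different quarter of the address space, while changing `reloc_low` selects a different set of interleaved addresses within that quarter.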

Allocation of memory cycles is performed via a fixed time-slice algorithm utilizing a free-running mod-15 counter rather than employing demand-based real-time arbitration. Time-slices 0-5 and 8-13 are allocated to writing event data from FREE boards 0-11, while time-slice 7 is utilized to write TAM data. The remaining time-slices 6 and 14 provide bandwidth for reading data to be formatted

into output event contributions sent to the EBM. In fact, since the output channel is only a single bit

wide while the memory width is 16 bits, this is slightly more than twice the necessary memory read

bandwidth. However, there is a two-tick pipeline delay introduced between the selection of a

memory address and the return of the read data from that address. The first tick is associated with

moving the output of the memory address multiplexer within the FPGA through an output pipeline

latch to the external pin and thence (ultimately) to the memory chips. The second tick is the pipeline

delay to latch the read data at the FPGA’s input pins. This read pipeline is effective as long as Tpd

at the address output pins plus Taa at the memory plus Tsu at the read data input pins is less than the

worst case clock period, appropriately de-rated for clock skewing plus circuit board trace

propagation delays. Thus, despite the fact that there is a potential delay of up to 8 clock ticks while

waiting for a memory read time-slice plus an additional two-tick delay after that to obtain the read

data word, it is still possible to generate the address for the next memory fetch conditionally from the

contents of the current fetch while the 16 bits from the current fetch are being serialized. In addition,

although the memory cycle allocation runs on a mod-15 counter while the output event is generated

according to the LATp mod-132 format (with 8 fetches spaced every 16 ticks but the ninth after an

additional 20 ticks), the presence of two time-slices allocated to memory reads every 15 ticks

ensures that the event fabric output construction FSM will always receive new memory read data in

a timely fashion.
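The slice allocation described above can be sketched as a small behavioural model. The mapping of time-slices 8-13 to FREE boards 6-11 in ascending order is an assumption, as are the function names; the model only checks the two figures quoted in the text, namely the worst-case wait for a read slice and the ratio of available to required read bandwidth.

```python
from fractions import Fraction

SLICES_PER_FRAME = 15     # free-running mod-15 counter
MEMORY_WIDTH_BITS = 16


def slice_owner(time_slice):
    """Return (activity, FREE board or None) for one time-slice."""
    if not 0 <= time_slice < SLICES_PER_FRAME:
        raise ValueError("time-slice out of range")
    if time_slice in (6, 14):
        return ("read", None)
    if time_slice == 7:
        return ("tam_write", None)
    # Assumption: slices 0-5 serve boards 0-5 and slices 8-13 serve boards 6-11.
    board = time_slice if time_slice <= 5 else time_slice - 2
    return ("free_write", board)


READ_SLICES = [s for s in range(SLICES_PER_FRAME) if slice_owner(s)[0] == "read"]


def worst_case_read_wait():
    """Ticks from a read request to the next read slice strictly after it."""
    worst = 0
    for request in range(SLICES_PER_FRAME):
        waits = [((s - request - 1) % SLICES_PER_FRAME) + 1 for s in READ_SLICES]
        worst = max(worst, min(waits))
    return worst


def read_bandwidth_ratio():
    """Available read bits per tick divided by the one bit per tick needed."""
    available = Fraction(len(READ_SLICES) * MEMORY_WIDTH_BITS, SLICES_PER_FRAME)
    return available / Fraction(1, 1)
```

With two read slices per 15 ticks the model gives 32/15 of the required bandwidth and a worst-case wait of 8 ticks, matching the figures in the paragraph above.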

The format of event data from the FREE board ASICs consists of a fixed-length 39-bit header

followed by 0 to 18 15-bit data words. The header is deserialized into two 16-bit words plus a 7-bit

field left-justified in a third 16-bit word; each 15-bit data word is right-justified in its own 16-bit

memory word. (The previous sentence neglects a transfer of one bit per word into the previous

word, which has no effect on memory bandwidth or latency.) The memory time-slicing algorithm

was chosen to maintain pace with the 15-tick rate of generation of new data words from each FREE

board once the fixed-length header has been processed. Given this choice, no additional latency

requirement is imposed by the arrival of the second word of deserialized header contents 16 ticks

after the generation of the first such word. However, an additional latency requirement is imposed

by the 7-tick interval between the arrival of the second and third words. The requirement is met by




the inclusion of a second 16-bit latch between the deserializer output and the memory buffer for each

of the twelve FREE boards. In effect, for each FREE board a separate two-deep by 16-bit wide

hardware FIFO is implemented within the FPGA fabric in order to buffer the variable arrival

intervals between successive words from the deserializers to the time structure of memory time-slice

availability. Note that the allocation of a separate time-slice to the writing of TAM messages is not

strictly necessary, as this activity always precedes the entry of data returned from the FREE boards

into the memory buffer. However, as the additional slice was available its allocation to this purpose

simplified the logic. Alternative time-slicing schemes with both mod-14 and mod-16 counters were

considered; the former adds nothing to performance and detracts from simplicity by requiring that

the TAM share write cycles with the FREE boards, while the latter requires that the depth of the

hardware FIFO for each FREE board increase from two entries to three.
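The FIFO depth argument can be checked with a simplified tick-level model. The sketch below assumes that the three header words arrive at ticks 0, 16, and 23, that up to 18 data words follow at 15-tick intervals, that each FREE board gets exactly one write slice per counter period, and that a word can be written no earlier than the tick after it arrives. These modelling choices, and all names in the code, are illustrative and are not taken from the VHDL.

```python
HEADER_WORD_TICKS = [0, 16, 23]   # arrival of the three deserialized header words
DATA_WORD_INTERVAL = 15           # one 15-bit data word every 15 ticks
MAX_DATA_WORDS = 18


def arrival_ticks(n_data_words=MAX_DATA_WORDS):
    """Ticks at which complete 16-bit words leave the deserializer."""
    last_header = HEADER_WORD_TICKS[-1]
    data = [last_header + DATA_WORD_INTERVAL * k for k in range(1, n_data_words + 1)]
    return HEADER_WORD_TICKS + data


def peak_fifo_depth(period, phase, arrivals):
    """Peak occupancy of one board's FIFO for a given write-slice phase."""
    arrival_set = set(arrivals)
    depth = 0
    peak = 0
    for tick in range(max(arrivals) + 1):
        # The write slice drains one previously stored word, if any.
        if tick % period == phase and depth > 0:
            depth -= 1
        # A word arriving on this tick is stored after the write slice.
        if tick in arrival_set:
            depth += 1
            peak = max(peak, depth)
    return peak


def worst_case_depth(period):
    """Largest peak occupancy over every possible write-slice phase."""
    arrivals = arrival_ticks()
    return max(peak_fifo_depth(period, phase, arrivals) for phase in range(period))
```

Under these assumptions `worst_case_depth(15)` is 2 and `worst_case_depth(16)` is 3, which agrees with the two-deep FIFO chosen for the mod-15 scheme and the three-deep FIFO that a mod-16 scheme would require.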




5. SPACE-FLIGHT CONSIDERATIONS

Very few specific design changes were made to the FPGA logic to accommodate the demands of

space-flight. This reflects not a careless and negligent attitude but rather a deliberate and considered

approach. Specifically, it was understood at the outset that no incorrect action taken by the logic

would represent a threat either to human life, to the integrity of the spacecraft, or to the well-being of

the overall LAT instrument. There are no consumables, whether finite energy sources or one-time

pyrotechnics, that are expended at the direction of this logic. At most, a transient event can result in

a slight overuse of power, error in the collected data, or (in the case of the power control to the ACD

front-end electronics) excessive power-cycling of electronic components. In general, the use of the

Actel RT54SX parts provides adequate SEU protection by virtue of the triplicated feedback paths in

both the master and slave portions of the R-cells inherent in that family of components. The effect

of SEU on data quantities stored in RAM buffers was not deemed sufficiently important in

the overall system architecture to merit the inclusion of coding with a Hamming distance of four (to

correct single-bit errors and detect double-bit errors) or even two (to detect single-bit errors) in the

specifications for any of the data paths in the electronics. As an exception, coding with a Hamming distance of four may now be considered for data fields that describe the number of words entered into FIFOs (so that the proper number of words will be removed from those FIFOs) in the particular instances where those word counts themselves pass through FIFOs or other memory structures that are not SEU-protected.
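As an illustration of the two Hamming distances mentioned above, the following sketch compares a single even-parity bit (distance two) with an extended Hamming (8,4) code (distance four) on a 4-bit field. The 4-bit width and the bit ordering are assumptions chosen for brevity; nothing here describes a code actually used in the LAT electronics.

```python
from itertools import combinations


def encode_parity(data):
    """Append one even-parity bit to a 4-bit value (Hamming distance 2)."""
    return (data << 1) | (bin(data).count("1") & 1)


def encode_secded(data):
    """Extended Hamming (8,4): three check bits plus overall parity."""
    d = [(data >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    bits = [p1, p2, d[0], p3, d[1], d[2], d[3]]
    p0 = sum(bits) & 1          # overall parity raises the distance from 3 to 4
    word = 0
    for bit in bits + [p0]:
        word = (word << 1) | bit
    return word


def min_distance(codewords):
    """Smallest number of differing bits between any two codewords."""
    return min(bin(a ^ b).count("1") for a, b in combinations(codewords, 2))
```

Calling `min_distance` on all 16 codewords of each encoder gives 2 for the parity code and 4 for the extended Hamming code.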

At the system architecture level, and again at the software architecture level, the overall design has

continually stressed the concept of errors that are “fatal, but not serious,” i.e. that cause the

unplanned termination of LAT instrument data-collection activities and/or software threads of

execution, but are easily recovered from by warm or cold restarts of software in conjunction with

hardware resets. Such errors, as long as they are infrequent and uncorrelated with specific physics

event topologies, only serve to add slightly to instrument dead-time with no further consequences.

The only specific concession that was made to space-flight was the inclusion of the “safe” attribute

on FSMs whose states were not encoded in such a way as to eliminate all possible illegal states,

regardless of the fact that the only way that these states can be entered is via multi-event upset. This




inclusion was made solely in the name of political correctness, and with the understanding that the

inclusion of logic to force the FSMs out of illegal states into a known idle state did not materially

detract from the maximum achievable clock frequency. Specifically, it is the designer’s belief that

any event which results in an FSM entering an illegal state requires a full and complete reset of the

hardware to ensure its future correct operation, and cannot be recovered from by simply forcing the

FSM into a known idle state. In particular, the full state of any set of control logic with an FSM at

its core is represented by far more than the state of the FSM itself, and includes the state of any

number of flip-flops that are set, cleared, incremented/decremented, and/or strobed with new data on

specific state-to-state transitions within the FSM, often in a data-dependent fashion. Any operation

that returns the FSM to a known state without performing a similar reset on this extended state

information may have incomplete and/or unintended consequences. Although it is theoretically

possible to annex the state of these external registers into the design of the FSM itself, so that the

overall state of the control logic is reflected solely by the state of the FSM, such an approach in

practice leads to such an immense increase in the number of potential states of the FSM as to make

state decoding impossible within any realistic clock period. In addition, the overriding system

design concern is that the states of some of these registers are correlated with the states of variables

stored in memory by software in some processor. In the end, it is no more effective to force an FSM

into a known idle state in response to finding it in an illegal state than it is to recognize that the next

instruction address in a software program has jumped to an illegal value and respond to this event by

simply forcing that instruction address to a known location in the program’s idle loop without any

concern for the state of data memory.

None of the preceding statements should be construed in a fashion as to suggest that the increased

likelihood of SEUs in the space-borne environment did not focus attention on the need to make the

hardware design as impervious as possible to unexpected conditions in the incoming data stream.

This is particularly true in the case where the choice of subsequent states of FSMs is dependent on

the presence or absence of particular bits in the data, and especially so in the case where those data

have passed through memory buffers where they are subjected to undetected SEUs. Thus, in the

particular case of the AEM, hard limits are imposed by the FSMs on the number of data words from

a single FREE board that will be entered into or removed from memory in any given event, despite

the fact that each data word carries its own “more data coming” flag. These hard limits are derived




from the known maximum number of data words that a FREE can properly generate per event.

However, it is argued that this kind of approach is merely good design practice in any environment

and does not represent any particular concession to the rigors of space-flight.
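The hard limit described above can be sketched as follows. The position of the “more data coming” flag within the word and the existence of a separate indication in the header that data words follow are both hypothetical; only the limit of 18 data words per FREE board per event is taken from the text.

```python
MAX_DATA_WORDS = 18           # known maximum per FREE board per event
MORE_DATA_FLAG = 1 << 14      # hypothetical position of the "more data coming" flag


def collect_data_words(header_says_more, incoming):
    """Accept data words until the flag clears or the hard limit is reached."""
    accepted = []
    more = header_says_more
    for word in incoming:
        # Stop on the flag, but never trust it beyond the hard limit.
        if not more or len(accepted) >= MAX_DATA_WORDS:
            break
        accepted.append(word)
        more = bool(word & MORE_DATA_FLAG)
    return accepted
```

A stream in which an upset has left the flag stuck high is therefore truncated at 18 words rather than running on indefinitely.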




6. GLOBAL LOGIC FEATURES

The following section deals with features that are global to the entire AEM design rather than

specific to any particular VHDL module.

6.1 Nomenclature

In general, signal names are chosen to be (hopefully) self-explanatory without resorting to an entire

tome per signal name. The suffix “_Next” is appended to a signal name to denote the “D” input of a

flip-flop whose “Q” output is the signal with the same name but without the suffix. In some cases,

the un-appended name may be that of a vector; in those cases the signal whose name contains the

suffix is typically the serial input to a shift register whose parallel outputs comprise the vector.

There may be a limited number of instances remaining in which the suffix “_Next2” denotes the

name of a signal which precedes a signal whose name excludes that suffix by a two-deep shift

register instead of a single D flip-flop.

The prefix of a lower case “n,” either at the beginning of a signal name or subsequent to the first

underscore character in the signal name, denotes a signal which is active low rather than active high.

The prefixes of lower case “d,” “u”, and “s” are used, either alone, in combination, or repetitively to

denote a variety of signal delaying techniques applied to the signal with the un-prefixed name. The

etymology of these names originates with the “d” prefix which originally denoted a “delayed”

signal, where the delay can be generated either by a D flip-flop or (shudder) by a gate delay.

Multiple d’s, as in “dd” or “ddd” connote delays by multiple D flip-flops or by (retch) multiple gate

delays. However, in the present design if the delay is indeed generated by a D flip-flop then the “d”

denotes the fact that the flip-flop is clocked on the “downward” or falling edge transition of the

external 20 MHz clock whose name is nSys_Clk. The corresponding “u” prefix denotes delay by a

D flip-flop clocked on the “upward” or rising edge transition of nSys_Clk. As its name indicates,

nSys_Clk is considered to be an active-low signal; the corresponding internal clock is Sys_Clk,

which is derived in the top-level VHDL module from nSys_Clk by an inversion operation. As a

result, “d” delay flip-flops are clocked on the rising edge of Sys_Clk while “u” delay flip-flops are

clocked on the falling edge of Sys_Clk. This is the first of a number of conventions adopted to




maximize confusion. An additional aid to confusion is realized by the employment of the prefix “d”

as an abbreviation for the “du” operation denoting processing of an input signal first through a flip-

flop clocked on the upward transition of nSys_Clk and then again with a second flip-flop clocked on

the downward transition of this same clock. The dubious reasoning behind this sequence of

operations will appear in the subsequent discussion of clocking. Note that the ordering of a

sequence of delaying operations is specified by the addition of a further prefix for each new

operation; thus the left-most prefix specifies the most recently applied operation.

Finally, the “s” prefix denotes a “synchronizing” D flip-flop of the “d” variety applied to an input

signal that (in general) is either inherently asynchronous to the 20 MHz clock or else synchronous to

the clock but so unpredictable in phase during the normal course of events as to best be treated as

asynchronous. If the signal processed by the synchronizer must enter into control logic at more than

the input of a single subsequent flip-flop, then it must receive “ds” processing by a two-stage

synchronizer where the second stage is clocked after the meta-stable states in the output of the “s”

flip-flop have had one additional clock period to decay. In general, this approach to moving control

signals between clocking domains is effective given the typical ratio of clock period to meta-stable

state decay time constant as long as the intent of the second flip-flop is not defeated by abnormally

long routing delays between the two stages of the synchronizer. When moving data signals between

clocking domains, a single-stage synchronizer is frequently employed as long as the two-stage

synchronizer in an accompanying control path ensures that the meta-stable states in the data path

have decayed before the data are latched or otherwise clocked in the new domain. The foregoing is

largely academic because the only inherently asynchronous signal in the AEM is the power-on reset

input nLogic_Reset. Although not strictly necessary, this input is processed by a two-stage “ds”

synchronizer. In fact, the only transition between clock domains within the entire LAT occurs at the

interface between the 20 MHz LAT clock and the 33 MHz cPCI clock in any of the cPCI crates; this

interface occurs across the two ports of several asynchronous FIFOs located between the two halves

of the “LCB” or LAT Communications Board.
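The prefix ordering rule in this section can be read mechanically, as in the sketch below. It takes only the prefix letters, not a full signal name, and lists the delaying operations in the order they were applied. The function name and the description strings are illustrative; the sketch ignores the special cases noted above, in which “d” may stand for a gate delay or abbreviate “du”.

```python
PREFIX_MEANING = {
    "d": "flip-flop clocked on the downward edge of nSys_Clk",
    "u": "flip-flop clocked on the upward edge of nSys_Clk",
    "s": "synchronizing flip-flop clocked on the downward edge of nSys_Clk",
}


def delay_chain(prefix):
    """List the operations in the order applied (right-most prefix first)."""
    unknown = [letter for letter in prefix if letter not in PREFIX_MEANING]
    if unknown:
        raise ValueError("unknown prefix letters: " + "".join(unknown))
    # The left-most prefix is the most recent operation, so reverse the string.
    return [PREFIX_MEANING[letter] for letter in reversed(prefix)]
```

For example, `delay_chain("ds")` lists the synchronizing flip-flop first and the second-stage “d” flip-flop after it.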

6.2 Clocking

The LAT 20 MHz clock is derived from an underlying 40 MHz clock by a divide-by-two operation

within the CRU FPGA. A separate output flip-flop is provided within the CRU for each 20 MHz




clock output, which in turn drives a single point-to-point trace on the GASU DAQ circuit board. (In

a few cases, where the far end of that trace is on an LVDS differential transmitter with four ports, a

single trace may drive all four transmitter ports.) As a result of these features, the 20 MHz clock

signals start life with a strict 50% duty cycle; this characteristic only degrades as a result of pulse-

width shrinkage within the LVDS differential signaling or on the relatively short traces between

chips on the DAQ board.

At present, the fundamental rule of LAT clocking is that flip-flops driving serial data onto chip

output pins are clocked on the downward transition of one of these board-level 20 MHz clocks. If

the serial data stream is propagating in the same direction as the clock, i.e. in a direction that is

outwards from the CRU, then the flip-flops on chip input pins at the receiving end of a signal path

are clocked on the upward transition of the accompanying 20 MHz clock. This scheme is effective

as long as Tpd at the output flip-flop plus Tsu at the input flip-flop is less than the minimum down

pulse width of the clock, as degraded for clock skewing both between data and clock in the signal

path and between the clock distribution networks in the transmitting and receiving chips. The

validity of the assumption that this criterion is always met is currently the subject of active inquiry,

but the results of that inquiry are beyond the scope of this document.
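The timing criterion above amounts to a simple margin calculation, sketched here. All numeric values used with it are hypothetical placeholders rather than measured or data-sheet figures; the only fixed point is that a 20 MHz clock with a 50% duty cycle has a nominal low phase of 25 ns.

```python
NOMINAL_LOW_WIDTH_NS = 25   # half of the 50 ns period of a 20 MHz clock


def launch_capture_margin_ns(t_pd_ns, t_su_ns, min_low_width_ns, skew_ns):
    """Margin for data launched on the falling edge and captured on the rising edge."""
    usable_window = min_low_width_ns - skew_ns   # low phase, de-rated for skew
    return usable_window - (t_pd_ns + t_su_ns)


def scheme_is_valid(t_pd_ns, t_su_ns, min_low_width_ns, skew_ns):
    """True when Tpd + Tsu fits inside the de-rated low phase of the clock."""
    return launch_capture_margin_ns(t_pd_ns, t_su_ns, min_low_width_ns, skew_ns) > 0
```

With placeholder values of Tpd = 8 ns, Tsu = 3 ns, a minimum low width of 24 ns, and 2 ns of skew, the margin is 11 ns; increasing Tpd or the skew shrinks it toward zero.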

For serial data streams propagating counter to the flow of the clock, in general the LAT system

design always assumed that the phase of clocking the data into the chip at the receiving end would

be dependent upon the round-trip propagation delay between the two ends of the signal path. As an

aid to implementing this plan, the functions of many of the LVDS receivers on the engineering

prototypes of the DAQ board were implemented in the “GLTC” ASIC. This is an 18-input LVDS

receiver with independent masking of each input, independent registered outputs for each input, and

an additional set of 9 registered outputs for the ORs of masked input pairs. (The OR feature is

intended to support the ORing of signals from phototubes on the two sides of an ACD tile or strip.)

In addition, the phase of the clock transition that updates the output registers is controlled by a single

input pin on each GLTC chip. Historically, the intent was to clock all logic within FPGAs, including

both input pin and output pin flip-flops, on the downward edge of the 20 MHz clock. However, data

streams returning to the DAQ board from the periphery of the LAT were received in GLTCs where

their active clock edges could be selected (presumably once, at LAT system integration time).

Regardless of whether the GLTC output registers were clocked on the upward or downward edges of




the local 20 MHz clock, the contents of these output registers were transferred to their accompanying

FPGA input pin registers on the next downward clock edge. This resulted in an additional delay

between GLTC output and FPGA input of either a half clock period or an entire clock period,

depending on the clock edge employed in the GLTC.

However, the current design of the flight version of the DAQ board calls for the replacement of the

GLTCs by commercial LVDS receivers everywhere but in the reception of the veto outputs from the

ACD phototubes for which they were originally designed. In that application, the pair-wise ORing

of signals and inclusion of individual input masks represent a significant off-loading of logic

functions from the FPGAs in the GEM. (A few additional GLTCs are utilized in the GEM for the

reception of individual trigger and “busy” signals.) Because of the elimination of the GLTC output

register function between the LVDS receivers and the FPGA input pins on the data streams returned

by the 12 FREE boards to the AEM in the direction counter to the clock propagation (the

“ACD_nData” vector), it is necessary to include provision for clocking these signals into the FPGA

on the upwards transition of the 20 MHz nSys_Clk signal. This is necessary before entering these

signals into the bulk of the AEM’s internal logic, which is still clocked on the upward transition of

Sys_Clk, i.e. on the downward transition of nSys_Clk, as in the original design. Because each of

these signals is shared by two different blocks of logic, one associated with extending LATp

command/response fabric connectivity to the FREE board ASICs and the other associated with

deserializing event data and entering those data into the memory buffer, it is prudent to re-register

these inputs on the downward edge of nSys_Clk in order to avoid distributing the outputs of the “u”

input flip-flops to multiple destinations within half a clock period. This is the rationale behind the

successive “u” and then “d” flip-flops on these 12 inputs. However, given that the length of the

cable plant between the GASU and the FREE boards is not identical for all 12 boards, it may prove

necessary to eliminate the “u” flip-flops on a board-by-board basis at LAT system integration time in

order to avoid violating the setup time specification at the inputs to these flip-flops. This is the

reason for the confusing abbreviation of “du” to “d,” since the “u” may not be present in some or all

elements of this vector signal.




6.3 VHDL Process Decomposition

At the present time, it appears to the VHDL developer that there are no generally accepted norms for

what constitutes a reasonable coding style when it comes to the decomposition of an assemblage of

logic into processes. Examples were studied that ran the gamut from placing every flip-flop in its

own process to having only two processes per module, with one of these containing all sequential

logic, reduced to the form of D flip-flops, and the other containing all combinatorial logic. In

general, all VHDL modules were already simplified to the point that they contained a maximum of

one FSM. After a small amount of reflection, the following template for process decomposition was

adopted:

• There is typically one process per sequential clock. This contains all the sequential elements driven by that clock, generally in the form of D flip-flops. The inputs to these flip-flops are usually signals with the _Next suffix. However, in those cases where it is possible to generate a reasonably simple Boolean combinatorial expression for the _Next signal, where that expression does not contain relational operators that the VHDL synthesis tool is incapable of processing within the context of a signal assignment, that non-relational expression is utilized directly rather than defining it in another process.

• There is one process per FSM with the sole output of the _Next value of the FSM’s state signal.

• There is one process per FSM that defines combinatorial outputs generated by states or state transitions of that FSM. This includes both pure combinatorial outputs as well as the _Next inputs to D flip-flops that represent extended state signals associated with the FSM, where those extended state signals are modified by FSM state transitions or state visitations.

• Where there are examples of output signals driven from the FSM whose transitions or combinatorial definitions are quite complicated, these examples are frequently broken off from the FSM output process into their own process. However, there are no consistent rules for consigning a piece of FSM output logic to a private process.
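The template above can be illustrated with a small behavioural model. The sketch below is written in Python rather than VHDL purely so that it can be executed. The three-state FSM, the names start, busy and count, and the terminal count of 2 are all hypothetical and are not taken from any AEM module. Each function stands in for one process of the template.

```python
from dataclasses import dataclass

# Hypothetical state names, for illustration only.
IDLE, RUN, FINISH = "Idle", "Run", "Finish"


@dataclass
class Registers:
    """All D flip-flops driven by the single clock of this model."""
    state: str = IDLE
    count: int = 0  # an extended state signal associated with the FSM


def state_next(regs, start):
    """FSM process whose sole output is the _Next value of the state signal."""
    if regs.state == IDLE:
        return RUN if start else IDLE
    if regs.state == RUN:
        return FINISH if regs.count == 2 else RUN
    return IDLE


def fsm_outputs(regs):
    """FSM output process: a pure combinatorial output (busy) plus the
    _Next input of the extended state register (count)."""
    busy = regs.state != IDLE
    count_next = regs.count + 1 if regs.state == RUN else 0
    return busy, count_next


def clock_tick(regs, start):
    """Clocked process: every register simply loads its _Next input."""
    new_state = state_next(regs, start)
    busy, count_next = fsm_outputs(regs)
    return Registers(state=new_state, count=count_next), busy
```

Calling clock_tick once per simulated clock edge reproduces the separation described above: no function other than clock_tick holds state, and the two combinatorial functions depend only on the current register contents and inputs.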




7. INDIVIDUAL MODULE FEATURES

The following section deals with features that are specific to the individual low-level VHDL

modules. Extensive comments are now included in the individual modules, so only an overall

discussion is presented here. The function of the minimal logic included in the top-level module has

already been discussed.

7.1 LATp_Registers

This module contains the logic to implement a node on the LAT command/response fabric and

extend the access of the command/response fabric to include the GARC and GAFE ASICs on the

FREE boards. It also provides a readout channel number and conversion start pulse to the

environmental monitoring ADCs. Finally, it provides a limited amount of remote access support to

hardware that is logically part of the AEM but is physically located in the CRU. This hardware was

added late in the development effort in order to correct for a shortcoming in the GARC design that

can be overcome with the aid of a burst of low-frequency clock pulses shortly after the power-up

operation on each FREE board. In order to maximize phase coherence in 20 MHz clocks throughout

the LAT, this clock frequency selection logic was located within the CRU, which contains the phase

reference for all LAT system clocking.

The heart of the LATp registers module is a 16-bit shift register that is used to deserialize incoming

commands from the LATp command/response fabric and serialize responses returned to that fabric.

This shift register is generally associated with an 8-bit mod-132 clock phase counter that counts

clock ticks within an incoming or outgoing 132-tick LATp cell. The module’s FSM can be

construed as simply the arbitration logic to determine the current activity associated with the shift

register, with the phase counter keeping track of the appropriate times at which to conditionally

switch the current activity. There are only two sets of state transitions that are not ultimately timed

by the current contents of the mod-132 counter. The first of these is the set of transitions out of the

Idle state, which is driven by the arrival of the first two bits of an incoming command cell, i.e. the

cell’s initial delineator. The other is the set of transitions out of the Awaiting_Response state, which

is driven by either the start bit of a response from the expected FREE board or else by a timeout.
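As a rough behavioural illustration of the shift register and its phase counter, the following Python sketch shifts one serial bit per tick into a 16-bit register while a counter wraps at 132. The class name, the MSB-first bit order, and the convention of reporting cell completion when the counter wraps to zero are assumptions made for the example. The actual module's capture points and FSM interaction are not modelled.

```python
CELL_TICKS = 132  # ticks per LATp cell, as stated in the text


class CellDeserializer:
    """Hypothetical model of a 16-bit shift register paired with a
    mod-132 phase counter (the count fits in eight bits)."""

    def __init__(self):
        self.shift = 0  # 16-bit shift register contents
        self.phase = 0  # clock phase within the current cell

    def tick(self, bit):
        """Shift in one serial bit (MSB-first is an assumption) and advance
        the phase counter. Returns True on the tick that completes a cell."""
        self.shift = ((self.shift << 1) | (bit & 1)) & 0xFFFF
        self.phase = (self.phase + 1) % CELL_TICKS
        return self.phase == 0
```

In this model, logic that needs to act at a particular point within a cell would compare self.phase against the relevant tick number, which corresponds to the role the text assigns to the phase counter.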




A second 16-bit shift register is used to serialize the internally generated calibration and/or event

readout commands to the FREEs. The presence of this shift register, as well as deserializing shift

registers in other VHDL modules for the event readout data returned by the FREEs, permits the

exchange of messages between the command/response fabric and the AEM to proceed in parallel

with the processing of triggers, the generation of event readout commands to the FREEs, and the

receipt of event data from the FREEs. The timing of these internally generated commands is

chosen to support, although not require, the earliest possible command generation consistent with the

receipt of necessary trigger information in the TAM. Specifically, the fiducial point from which

trigger time delays are measured is associated with clocking the start bit of the event readout

command into the output flip-flops for the command lines going to the FREEs on the same tick that

clocks the zero suppress bit of the TAM, which becomes the first bit of the readout command

subsequent to the start bit, into the input flip-flop on the pin that receives the TAM.

7.2 Environment_Registers

This module contains the logic to run a conversion cycle in and then acquire data from the four

ADCs associated with one FREE board (or a similar set of ADCs associated with the AEM itself).

These data comprise the contents of an environmental monitor register. Note that this logic is only

instantiated once, and thus the results of an ADC readout from the ADCs associated with any

FREE are stored in a single common register. Providing individual control logic and data storage for

ADC readout for each of the 13 sets of four ADCs would require more than 600 additional R-cells in

the flight FPGA part; these resources were simply not available.

The logic in this module is relatively simple. The data path consists of four 12-bit shift registers

that capture the serial read data streams from the four ADCs while the control logic is built around a

four-state binary-encoded FSM that simply cycles from the idle state through the three successive

stages of the conversion process. An eight-bit counter that is reset as each new state is entered is

used to define the length of time spent in each state, or in the case of the data conversion state, the

maximum duration of that conversion. The chip select output for the selected set of four ADCs is

asserted during all three of the non-idle states, while the readout clock is high in the idle and initial

analog acquisition stage, low during the digital conversion stage, and cycles for 16 ticks at 2.5 MHz

during the data readout state.
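The output behaviour described above can be sketched as a small Python model. With a 20 MHz system clock, one 2.5 MHz readout clock cycle spans 8 ticks, so the 16 readout cycles occupy 128 ticks, which fits within the eight-bit counter. The state names, the durations of the acquisition and conversion states, and the phase of the readout clock within each 8-tick cycle are hypothetical values chosen for the example.

```python
SYSTEM_CLOCK_HZ = 20_000_000
READOUT_CLOCK_HZ = 2_500_000
TICKS_PER_READOUT_CYCLE = SYSTEM_CLOCK_HZ // READOUT_CLOCK_HZ  # 8 ticks
READOUT_TICKS = 16 * TICKS_PER_READOUT_CYCLE                   # 128 ticks

# Acquire and Convert durations are hypothetical; only Readout follows
# from the figures quoted in the text.
DURATION = {"Acquire": 30, "Convert": 70, "Readout": READOUT_TICKS}
ORDER = ["Idle", "Acquire", "Convert", "Readout"]


def outputs(state, counter):
    """Return (chip_select, readout_clock) for a state and counter value."""
    chip_select = state != "Idle"      # asserted in all three non-idle states
    if state in ("Idle", "Acquire"):
        clock = 1                      # high while idle and during acquisition
    elif state == "Convert":
        clock = 0                      # low during digital conversion
    else:
        # Readout: low for the first half of each 8-tick cycle, high for
        # the second half (this phase is an assumption).
        half = TICKS_PER_READOUT_CYCLE // 2
        clock = 0 if counter % TICKS_PER_READOUT_CYCLE < half else 1
    return chip_select, clock


def run_cycle():
    """Return a (state, chip_select, clock) tuple for every tick of one
    conversion cycle. The counter restarts from zero in each new state."""
    trace = []
    for state in ORDER[1:]:
        for counter in range(DURATION[state]):
            trace.append((state,) + outputs(state, counter))
    return trace
```

Counting the rising edges of the clock column during the Readout portion of the trace gives 16, matching the 16 readout ticks quoted above.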




7.3 Memory_Write_Data_Multiplexer

This module contains the logic to multiplex the individual sources of write data for the buffer

memory onto a single bus, and to enable the tri-state drivers for that bus when an actual memory

write cycle is performed. The output of the multiplexer, whose select inputs are derived from the

memory time-slice allocation counter, is fed to registered output flip-flops, whose outputs in turn are

driven onto the external memory data bus via tri-state drivers. At present, a single register whose

output changes on clock cycle boundaries is used to enable these tri-state drivers, and there is

minimal effort expended to compensate for skewing in the distribution of this register’s output to the

sixteen individual enable lines of the tri-state drivers.

7.4 Memory_Address_Multiplexer

This module contains the logic to multiplex the individual sources of write and read cycle addresses

for the buffer memory onto a single bus. The output of the multiplexer, whose select inputs are

derived from the memory time-slice allocation counter, is fed to registered output flip-flops, whose

outputs in turn are driven directly onto the external memory address bus.

7.5 Memory_Control

This module contains the logic to generate the chip select, output enable, and write enable signals to

the external buffer memory, as well as the tri-state enable line for write data sent from the AEM

FPGA to this memory. It also contains the time-slice allocation counter. In the cases of event data

from the FREE boards and TAM data that are written into the buffer memory, the individual VHDL

module performing the write operation raises a request level, and this module asserts an

acknowledgment pulse during the single clock cycle representing the time-slice allocated to this data

source. However, in the case of buffer memory read cycles, the relevant VHDL module is furnished

with the contents of the time-slice allocation counter, and thus generates single-cycle requests to

perform read operations in the appropriate pair of time-slices.
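The request and acknowledgment handshake for write cycles can be sketched as follows. This is a minimal Python model under stated assumptions: the four-entry slice table and the source names are hypothetical and do not reflect the actual time-slice allocation, and the model clears the request itself on acknowledgment, whereas in the hardware the requesting module would be expected to drop its own request level.

```python
# Hypothetical slice table: the writer that owns each time-slice, or None
# for slices reserved for other purposes (for example, read cycles).
SLICE_OWNER = ["FREE0", "FREE1", "TAM", None]


class MemoryControl:
    """Toy model of a time-slice allocation counter with request levels."""

    def __init__(self):
        self.slice = 0
        self.request = {owner: False for owner in SLICE_OWNER if owner}

    def raise_request(self, source):
        """A writer raises its request level; it stays raised until served."""
        self.request[source] = True

    def tick(self):
        """Advance one clock cycle. Returns the source acknowledged during
        this cycle, or None if the slice goes unused."""
        owner = SLICE_OWNER[self.slice]
        acked = owner if owner and self.request[owner] else None
        if acked:
            self.request[acked] = False  # simplification: cleared on ack
        self.slice = (self.slice + 1) % len(SLICE_OWNER)
        return acked
```

The point of the model is that the latency between raising a request and receiving the acknowledgment depends only on the counter value at the time of the request, never on the activity of other sources.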

At present, the external chip select line to the buffer memory is asserted for the entire duration of the

clock tick in which either a write or read cycle is performed. The external write enable line is

asserted for approximately the second and third quarters of the clock tick in which a write cycle is




performed. Unfortunately, because of the lack of poly-phase clocking within an individual clock

tick, the generation of this write pulse employs registers clocked on both leading and trailing edges

of the 20 MHz clock to generate a pulse that is longer than half a clock tick by several gate delays.

A sequence of gate delays timed to the characteristics of the particular logic family is then used to

delay this half-tick pulse by approximately one quarter of a clock cycle. For similar reasons related

to lack of poly-phase clocking, both the external output enable signal and the tri-state enable line for

the memory write data are asserted for the entire duration of read and write cycles respectively. This

implies that there is currently no “bus turnaround” or “break before make” period when two memory

cycles of different types immediately succeed one another. The only available scheme for

implementing such a bus turnaround period would appear to rely on the generation of a pulse whose

duration is the first quarter of each clock cycle (by clocking a one into a register at the beginning of

every clock cycle and then using the delayed output of that register as an asynchronous reset to the

register). That pulse could then be used to blank out the first quarter of the external output enable

signal and the tri-state enable line for the memory write data. Such a scheme might succeed because

the output enable delay of the memory during read cycles is relatively short, and because the write

data are written on the trailing edge of the write enable pulse, which would be half a clock tick later

than the point at which the write data tri-state buffers are enabled. The use of such a scheme is

currently under consideration, but the excessive use of gate delays for timing control, already a cause

for concern in the write enable generation, has precluded this possibility to date. One mitigating

factor is the observation that the combination of the time-slicing algorithm and the fact that all FREE

boards should commence emitting data for the same event simultaneously tends to make most write

cycles follow other write cycles, rather than read cycles. Thus the bus turnaround problem is

associated with only 14-17 percent of write cycles, and only then when an additional event is being

acquired while a previous event is being read from the memory buffer, i.e. during periods of

significant event pile-up.
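The timing relationships quoted in this section can be checked with simple arithmetic. The Python sketch below uses idealized edges in picoseconds and ignores the gate delays mentioned above, so the figures are nominal only. The variable names are introduced for the example.

```python
TICK_PS = 50_000            # one tick of the 20 MHz clock is 50 ns
QUARTER_PS = TICK_PS // 4   # 12.5 ns

# Nominal (start, end) windows within a single clock tick.
chip_select = (0, TICK_PS)                    # entire tick
write_enable = (QUARTER_PS, 3 * QUARTER_PS)   # second and third quarters
blanking = (0, QUARTER_PS)                    # proposed turnaround pulse

# Under the proposed scheme the write data drivers would be enabled when
# the blanking pulse ends, and the data are written on the trailing edge
# of the write enable pulse.
write_data_enabled_at = blanking[1]
write_latched_at = write_enable[1]
margin_ps = write_latched_at - write_data_enabled_at
```

The resulting margin_ps of 25,000 ps equals half a clock tick, in agreement with the statement that the write would occur half a tick after the write data tri-state buffers are enabled.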

7.6 Trigger_Processor

This module contains the logic to receive TAM messages from the GEM, save these messages in the

buffer memory, generate the appropriately timed calibration strobe generation and/or event read

command to the FREE boards, and supervise the loading of FREE board event data into the buffer




memory. These operations are supervised by a 7-state FSM that is basically a counter, with only one

state transition that is not to the next successive state. That transition is an abort when the TAM

message contains a request to generate a calibration strobe without subsequent data readout. The module also

contains the logic to generate the “full” and “empty” flags for the event FIFO contained within the

buffer memory.

The incoming TAM message is deserialized in a 16-deep shift register, and successive halves of this

message are latched in a 16-bit latch before being entered into the buffer memory. The structure of

the TAM message is such that the so-called “CALSTROBE” and “TACK” bits appear early in the

message, but the “ZERO SUPPRESS” bit arrives somewhat later. This is apparently an artifact of

optimizing the TAM structure to the requirements of the readout electronics for the LAT

tracker/calorimeter towers. In the towers, the tracker always performs zero suppression, while the

zero suppression for the calorimeter is performed by the Tower Electronics Module (“TEM”) on

non-suppressed data returned to the TEM by the calorimeter front-end electronics. Thus, in the case

of the towers, the zero-suppression control information from the TAM is only consumed by the TEM

and is not included in any trigger information forwarded by the TEM to the front-end electronics.

However, in the ACD electronics the zero-suppression is performed by the FREE boards, and thus

the zero-suppression request is included in the trigger message from the AEM to the FREE boards.

This implies that the AEM is incapable of generating an event read trigger command message to the

FREE boards until it is in possession of the TAM’s zero-suppression control bit, or at least until the

arrival of that bit is imminent. In fact, because the zero suppression bit is the first bit after the start

bit in the event read trigger command, the reference time for all trigger delays is based on the start

bit of that command being clocked into the registered outputs of the FREE board’s command lines

one tick before the zero suppression bit is clocked into the registered input of the incoming TAM

message.

A significant amount of pipelining is employed in the generation of the delays for the FREE board

trigger-related commands. The values of 0 and 1 for the requested delay lengths are pre-decoded. If

the requested delay is zero, the command initialization signal is passed immediately. Otherwise, the

command initialization sets a delay flip-flop. The command is then generated and the flip-flop is

cleared when there is one remaining tick in the delay time, since it took one tick to set the flip-flop in

the first place. In fact, the delay timer is only started if the requested delay is two or more ticks, and




the flip-flop is cleared one tick after that timer reaches two remaining ticks, or else immediately if

the requested delay was one tick and the timer was never started. In the case that the TAM message

includes requests for both a calibration strobe and subsequent data readout, there is no hardware

protection for the delay between the calibration strobe and the data readout command being less than

the length of the calibration strobe command itself. This combination is architecturally deemed to be

a software error.
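One possible reading of the delay mechanism is sketched below in Python. The function name and the exact tick alignment are assumptions: tick 0 is taken to be the tick of the command initialization signal, the delay flip-flop is taken to become visible at tick 1, and the timer is modelled as holding the requested delay at that point. The sketch is intended only to show that, under this reading, the three cases (zero, one, and two or more ticks) all yield a total latency equal to the requested delay.

```python
def ticks_until_command(requested_delay):
    """Return the tick, relative to the initialization signal at tick 0,
    on which the command is generated in this model."""
    if requested_delay < 0:
        raise ValueError("delay must be non-negative")
    if requested_delay == 0:
        return 0           # pre-decoded zero: initialization passes through
    tick = 1               # setting the delay flip-flop has cost one tick
    if requested_delay == 1:
        return tick        # pre-decoded one: cleared at once, no timer
    timer = requested_delay  # timer started only for delays of two or more
    while timer != 2:
        timer -= 1
        tick += 1
    return tick + 1        # cleared one tick after the timer reads two
```

For example, a requested delay of 3 starts the timer at 3 on tick 1, reaches 2 on tick 2, and generates the command on tick 3.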

It is important to note that the advance of the FSM out of the Awaiting_Triggers state occurs when

two independent and asynchronous conditions have been met. The delay until the second half of the

TAM has been written into the buffer memory is dependent upon the current contents of the time-

slice allocation counter when the first bit of the TAM arrives. The delay until all requested

commands have been sent to the FREE boards is dependent upon the contents of the TAM message

itself as well as the lengths of the requested trigger delays. Thus it is necessary to perform the FSM

state advance on latched indications of the completion of these activities rather than on pulsed

indications of such.
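The need for latched rather than pulsed indications can be illustrated with a short Python model. The class and signal names are hypothetical. Each completion is assumed to arrive as a single-tick pulse, and the latches are assumed to be cleared when the FSM advances.

```python
class CompletionLatch:
    """Latches two independent single-tick completion pulses and reports
    when both have been seen."""

    def __init__(self):
        self.tam_written = False     # second half of TAM is in memory
        self.commands_sent = False   # all requested commands were sent

    def tick(self, tam_written_pulse, commands_sent_pulse):
        """Process one clock tick. Returns True on the tick at which the
        FSM may leave the Awaiting_Triggers state."""
        self.tam_written |= tam_written_pulse
        self.commands_sent |= commands_sent_pulse
        if self.tam_written and self.commands_sent:
            # Re-arm for the next trigger (an assumption of this model).
            self.tam_written = self.commands_sent = False
            return True
        return False
```

If the two pulses arrive on different ticks, a simple AND of the raw pulses never becomes true, whereas the latched version advances on the tick of the later pulse.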

7.7 Event_Receiver

This module, which is instantiated once per FREE board, contains the logic to receive event data

from an individual FREE board and enter those data into the buffer memory. The control logic is in

a relatively simple FSM, with the only complication introduced by the possibility of a timeout. The

FSM makes a transition from the Idle state to the Awaiting_Data state upon receipt of the

Start_Data_Acquisition signal from the trigger_processor module, assuming that the contribution

from the particular FREE board is not disabled. In normal operation, it leaves this state upon receipt

of the start bit of the FREE board’s event data stream. In this case, the FSM will either never receive

the Abort_Data_Acquisition signal (if the instantiations of this logic for all other enabled FREE

boards receive their respective start bits) or else it will ignore that signal (if one or more enabled

instantiations do not receive a start bit) since it is only sampled while in the Awaiting_Data state.

The FSM will always return to the Idle state, either because it receives a start bit or because it fails to

receive one, thus prompting the trigger_processor module to issue the Abort_Data_Acquisition

signal.




This module also includes the two-deep hardware FIFO that is interposed between the deserializing

shift register and the buffer memory. That FIFO is implemented as a pair of successive 16-bit

latches with a “full” flag on each latch. A two-input multiplexer selects the contents of the second

latch for writing into the buffer memory if that second latch is full, and otherwise sends the contents

of the first latch to the memory_write_data_multiplexer module.
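A behavioural sketch of such a two-deep FIFO is given below in Python. The read-side multiplexer follows the description above. The rule for when the first latch's contents move into the second latch (here, when a new word arrives while the first latch is still full) is an assumption, as are the class and method names and the overflow behaviour.

```python
class TwoDeepFifo:
    """Two successive 16-bit latches, each with its own full flag."""

    def __init__(self):
        self.first = self.second = 0
        self.first_full = self.second_full = False

    def push(self, word):
        """Accept a word from the deserializing shift register."""
        if self.first_full:
            if self.second_full:
                raise OverflowError("both latches are full")
            # Assumed rule: the older word advances to the second latch.
            self.second, self.second_full = self.first, True
        self.first, self.first_full = word & 0xFFFF, True

    def pop(self):
        """Select the word to write to memory: the second latch if it is
        full, otherwise the first latch. Returns None if both are empty."""
        if self.second_full:
            self.second_full = False
            return self.second
        if self.first_full:
            self.first_full = False
            return self.first
        return None
```

Because the second latch always holds the older of the two words, selecting it first whenever it is full preserves the arrival order.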

The logic replaces the parity bit of each PHA word by a corresponding parity error bit before

entering that word into the hardware FIFO, and sets a flag in the cable status register if that error bit

is asserted. The parity bit of the header is always replaced by a zero and never generates an error

flag because that header parity is not correctly generated by the FREE board’s GARC chip under all

circumstances. Incorrect parity may be generated when the number of zero suppression bits in that

header is decreased because the maximum number of PHA values in the event contribution has been

reached, or decreased because zero suppression has been disabled by the contents of the zero-

suppression control bit in the event data readout command.

A subtle re-mapping of “PHA data present” information is also performed before deserialized data

are entered into the hardware FIFO. As presented by the FREE board, the most significant bit of

each PHA value word indicates whether or not that data word is actually present. However, when

entered into the hardware FIFO, that bit from the first PHA value word is remapped into the “PHA vector

present” bit in the last word of the header, replacing the “command parity error” bit. If one or more

PHA value words are entered into the FIFO, the “PHA data present” bit of the next PHA value word

is remapped into the “additional PHA word present” bit of the current PHA value word. Note that

regardless of the value of the “additional PHA word present” bit in the 18th PHA value word, the

FSM prevents a 19th such word from being deserialized and entered into the hardware FIFO, thus

ensuring that once a start bit is received, processing of a single event’s data stream will conclude

after a limited period of time.
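The re-mapping can be sketched in Python as follows. The use of the most significant bit of each 16-bit PHA value word follows the text. The position of the “PHA vector present” bit within the last header word is not given here, so HEADER_VECTOR_PRESENT is a hypothetical placeholder, as is the function name. The sketch also assumes that the incoming stream ends with a word whose “present” bit is clear.

```python
PRESENT_BIT = 0x8000            # MSB of each PHA value word (per the text)
HEADER_VECTOR_PRESENT = 0x0001  # hypothetical bit position in the header
MAX_PHA_WORDS = 18              # the FSM never stores a 19th word


def remap(last_header_word, pha_words):
    """Return (header_word, stored_pha_words) after re-mapping.

    pha_words is the sequence as presented by the FREE board, where the
    MSB of each word says whether that word is actually present."""
    first_present = bool(pha_words) and bool(pha_words[0] & PRESENT_BIT)
    header = last_header_word & ~HEADER_VECTOR_PRESENT & 0xFFFF
    if first_present:
        header |= HEADER_VECTOR_PRESENT
    stored = []
    for i, word in enumerate(pha_words[:MAX_PHA_WORDS]):
        if not word & PRESENT_BIT:
            break
        following = pha_words[i + 1] if i + 1 < len(pha_words) else 0
        more = following & PRESENT_BIT  # becomes "additional word present"
        stored.append((word & ~PRESENT_BIT & 0xFFFF) | more)
    return header, stored
```

Note that the 18th stored word may still carry an asserted “additional PHA word present” bit. The cap on the number of stored words is what bounds the processing time, not the value of that bit.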

7.8 Event_Transmitter

This module contains the logic to generate the AEM’s event contribution and forward it to the EBM.

The FSM containing the control logic simply cycles through five successive states at appropriate

times. In particular, note that the advance from the Fetching_TAM_High state to the

Sending_Header state is delayed until there is adequate time to properly generate the LATp header




parity from the LATp destination address contained in the first half of the stored TAM message

before loading that parity into the serializing shift register. In addition, note that the logic that marks

each successive FREE board’s contribution to the event as completed while advancing to the next

FREE board ignores the “additional PHA word present” bit in the 18th PHA value word from each

FREE board. Thus, regardless of logic failures in the FREE board, communications failures in

transporting the event data from the FREE board to the AEM logic, or SEU corruption of the buffer

memory contents, the length of each FREE board’s contribution to the event is limited by the logic

in this VHDL module.

Because of the possibility that events will be generated with the contribution of one or more (or even

all) FREE boards disabled, the FREE board number is included in the third word of its header, along

with an end-of-file (“EOF”) bit that tags the contribution of the last FREE board included in the

event. In the case that contributions from all FREE boards are disabled, the logic still includes a null

header from the non-existent FREE board 15 (decimal) so that there is a place to include an asserted

EOF bit. The generation of this phantom null header and the related generation of padding for the

remainder of the current LATp cell subsequent to the end of the last included FREE board’s

contribution to the event represent the only real challenges to the logic design. The approach taken

is to use FREE board number 15 as an internal marker signifying that all contributions to the event

from enabled real FREE boards have already been processed. If this marker is detected and one or

more complete FREE board headers have already been added to the outgoing event, then only zero

padding will be added until the end of the current cell. However, if no header has yet been sent, then

the first two header words are still zero-filled but the third word includes the marker FREE board

number and the asserted EOF bit; the remainder of that cell is then padded out with zeroes. Note

that data fetches from the buffer memory continue from the region of that memory that would

otherwise be associated with FREE board 15 even after all legitimate event data have already been

added to the event data stream. The loading of a data word into the serializing shift register when

the next word to be fetched from the buffer memory will be one of these phantom data words sets

the Done flag, which is sampled twice at the end of each cell, first to disable the effects of

backpressure from the EBM and later to terminate the transmission of the current event’s data.
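The EOF tagging rule, including the phantom header case, can be summarized by the following Python sketch. It returns only the logical content of the third header word for each contribution, as a (board number, EOF) pair. The bit layout of that word is not modelled, and the function name and the assumption that contributions are emitted in ascending board order are introduced for the example.

```python
PHANTOM_BOARD = 15  # marker value for the non-existent FREE board


def header_plan(enabled_boards):
    """Return a list of (board_number, eof) pairs, one per header that
    appears in the outgoing event."""
    boards = sorted(enabled_boards)  # ascending order is an assumption
    if not boards:
        # No real contributions: a single null header carries the EOF bit.
        return [(PHANTOM_BOARD, True)]
    return [(board, board == boards[-1]) for board in boards]
```

For instance, header_plan([3, 0, 7]) marks only board 7 with EOF, while header_plan([]) yields the single phantom header for board 15.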
