Lab for Reliable Computing

Generalized Latency-Insensitive Systems for Single-Clock and Multi-Clock Architectures

Singh, M.; Theobald, M.

Design, Automation and Test in Europe Conference and Exhibition, 2004.

Lab for Reliable Computing, 2004/4

Kun-Sheng Huang 2

Reference

A methodology for correct-by-construction latency insensitive design. Carloni, L.P.; McMillan, K.L.; Saldanha, A.; Sangiovanni-Vincentelli, A.L. Computer-Aided Design, 1999. Digest of Technical Papers. 1999 IEEE/ACM International Conference.

Outline

Introduction

Latency-Insensitive Systems

New Approach – Part I: More Flexible Synchronous Modules

New Approach – Part II: Arbitrary Communication Network Topologies

Conclusions

Introduction

Latency-insensitive systems were originally proposed for the design of single-clock SoCs

A synchronous module is said to be latency-insensitive if it can operate correctly in the presence of arbitrary delays on its input and output channels

Two limitations of the original research:

1. Assumption that the data rates on all input and output channels of a synchronous module are identical

2. Only considers point-to-point interconnects

Latency-Insensitive Systems (1/4)

Using clock gating to stall a module whenever any of its communication channels is unavailable

Encapsulating the synchronous modules inside specially-designed “wrapper” circuits

As a result of this encapsulation, the synchronous blocks become more modular, thereby facilitating design reuse
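
The stalling rule described on this slide can be sketched as a small behavioral model in Python. This is only an illustration, not the wrapper circuit itself: it assumes that each channel exposes a boolean availability flag per clock cycle, and the names `basic_enable` and `run_wrapped` are hypothetical.

```python
def basic_enable(channel_available):
    """Clock-enable for the basic wrapper: run only if every channel is available."""
    return all(channel_available)


def run_wrapped(step, state, availability_trace):
    """Advance `state` with `step` only on cycles where the gated clock fires.

    availability_trace: one tuple of booleans per clock cycle (assumed format).
    Returns the final state and the number of stalled cycles.
    """
    stalls = 0
    for channel_available in availability_trace:
        if basic_enable(channel_available):
            state = step(state)   # gated clock ticks: the module advances
        else:
            stalls += 1           # clock is gated off: the module holds its state
    return state, stalls


# A counter module with two channels; one channel is unavailable in the
# second and third cycles, so the module is stalled twice.
trace = [(True, True), (True, False), (False, True), (True, True)]
final_state, stalls = run_wrapped(lambda s: s + 1, 0, trace)
```

Because the wrapped module only sees the gated clock, it needs no knowledge of channel delays, which is what makes the encapsulated block reusable.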

Latency-Insensitive Systems (2/4)

Figure 1. Carloni et al.’s approach to latency-insensitive design

Latency-Insensitive Systems (3/4)

Communication between the modules is achieved using point-to-point channels

The complete design flow consists of four basic steps:

1. Specification of synchronous components

2. Encapsulation

3. Physical layout, placement and routing

4. Relay station insertion

Latency-Insensitive Systems (4/4)

More Flexible Synchronous Modules

Carloni et al.’s approach uses a simplifying assumption:

Every input/output channel is exercised by a module on every clock tick

This assumption may cause a significant loss of throughput by generating more stalls than necessary

Example (1/2)

Example (2/2)

Carloni et al.’s approach can be made to work in this scenario provided M1 sends nine “garbage” data values to M2

This approach may introduce additional critical paths into the system, thereby potentially causing loss of performance

Transmitting unnecessary data values is wasteful of power

The proposed generalization applies to the stall-generation circuitry inside the wrapper circuit

Generalized Latency-Insensitive Modules (1/3)

Simple combinational gate

Generalized Latency-Insensitive Modules (2/3)

More sophisticated finite-state machine (FSM)

Generalized Latency-Insensitive Modules (3/3)

The generalization presented here has two key benefits:

1. A significant reduction in unnecessary stalls may be obtained, since stalls are no longer caused by the unavailability of those channels that are not currently needed

2. Modules that are not currently producing needed outputs can be safely stalled, without fear of stalling their neighbors; as a result, significant savings in power consumption may be obtained

Wrapper Specification and Synthesis (1/2)

Wrapper Specification and Synthesis (2/2)

There are two interesting features of Figure 5:

1. The machine’s output g is latched by a register on the negative clock edge before being used to gate the module’s clock

2. The register that stores the state bits is controlled by gclock, not by the original clock (clock)

Example

synchronous module

(b) xx/1: the module’s clock is enabled; */0: represents the remaining conditions, i.e., when the module is stalled

(c) g = y’ac + ybc; Y = y’ac; S0: y = 0; S1: y = 1; g: the FSM output; Y: next-state value
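
The equations in (c) can be exercised with a short Python sketch. It is a behavioral model under stated assumptions, not the synthesized circuit: the inputs a, b and c are taken to be 0/1 values (what they represent is not spelled out in this transcript), the function name `stall_fsm_step` is hypothetical, and the state register is assumed to update only when g = 1, because it is clocked by the gated clock.

```python
def stall_fsm_step(y, a, b, c):
    """One clock cycle of the example stall-generation FSM.

    y: current state bit (0 = S0, 1 = S1); a, b, c: 0/1 inputs of the FSM.
    Returns (g, next_y), where g = 1 enables the module's clock.
    """
    g = ((not y) and a and c) or (y and b and c)   # g = y'ac + ybc
    next_state = (not y) and a and c               # Y = y'ac
    # The state register is driven by the gated clock, so it keeps its
    # value whenever the module is stalled (g = 0).
    next_y = int(bool(next_state)) if g else y
    return int(bool(g)), next_y


# Drive the FSM for four cycles, starting in S0.
y = 0
trace = []
for a, b, c in [(1, 0, 1), (1, 0, 1), (0, 1, 1), (0, 1, 0)]:
    g, y = stall_fsm_step(y, a, b, c)
    trace.append((g, y))
```

In this run the machine is enabled in the first cycle and moves to S1, stalls in the second cycle because b = 0, is enabled again in the third cycle and returns to S0, and stalls in the fourth cycle because c = 0.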

Arbitrary Communication Network Topologies

The basic approach to latency-insensitive design assumes that all channels in the system are point-to-point channels

The proposed extension augments the basic approach to support arbitrary communication network topologies

Example

The actual throughput obtained may even be less than the rate of the slowest module because, in addition, the slowest module may be stalled at times

Generalized Communication Network (1/3)

Using a number of specialized blocks to implement the communication network

Specialized blocks include:

(i) forks, which replicate one input data stream onto multiple output channels

(ii) splits, which distribute data from one input channel onto multiple output channels

(iii) merges, which combine (i.e., interleave) multiple input data streams onto one output channel
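
The three block types can be illustrated on plain Python lists. This is a minimal sketch of their data-flow behavior only, with no timing or handshaking. The round-robin policy used for `split` and `merge` is an assumption made for the example; the slides do not say how a split chooses its output or in what order a merge interleaves its inputs.

```python
from itertools import zip_longest


def fork(stream, n_outputs):
    """Replicate every item of one input stream onto n_outputs channels."""
    items = list(stream)
    return [list(items) for _ in range(n_outputs)]


def split(stream, n_outputs):
    """Distribute the items of one input stream across n_outputs channels.

    Round-robin distribution is an assumed policy.
    """
    outputs = [[] for _ in range(n_outputs)]
    for i, item in enumerate(stream):
        outputs[i % n_outputs].append(item)
    return outputs


def merge(*streams):
    """Interleave several input streams onto one output channel.

    Round-robin interleaving is an assumed policy.
    """
    missing = object()   # sentinel for streams that have run out of items
    merged = []
    for group in zip_longest(*streams, fillvalue=missing):
        merged.extend(item for item in group if item is not missing)
    return merged


copies = fork([1, 2], 2)
parts = split([1, 2, 3, 4, 5], 2)
joined = merge(*parts)
```

With these policies, a split followed by a merge over the same number of channels restores the original order, whereas a fork increases the total amount of data in flight.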

Generalized Communication Network (2/3)

Three steps:

1. Specify the communication network topology, either using the specialized blocks, or using a high-level CSP-like language such as Tangram or Balsa

2. Choose between a synchronous and an asynchronous implementation; if synchronous, implement as stallable finite-state machines; if asynchronous, implement using predesigned handshake circuits available in Tangram and Balsa

3. Identify wires with long latencies. Segment them, and insert relay stations (synchronous) or FIFO handshake cells (asynchronous)

Generalized Communication Network (3/3)

The net impact of the proposed generalization of the communication network is two-fold:

1. A significantly greater degree of expressivity is offered for the specification of inter-module communication

2. The designer is offered much greater freedom to “mix-’n-match” modules of different speeds and different types of interfaces

Conclusions

The first extension allows much greater flexibility in interfacing a synchronous module with its I/O channels, thereby allowing higher system throughput through elimination of unnecessary stalls

The second extension proposes more general communication network topologies than the currently popular point-to-point interconnects

The third extension allows the handling of multiple clock domains

Relay station
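
Only the title of this slide survives in the transcript, so the following Python sketch is offered as a rough behavioral model rather than as the circuit shown there. It assumes a relay station can be modelled as a two-slot buffer whose stop signal reaches the upstream sender one cycle late; the class name `RelayStation` and its interface are hypothetical.

```python
from collections import deque


class RelayStation:
    """Behavioral model of a two-slot relay station with a registered stop signal."""

    def __init__(self):
        self.slots = deque()    # at most two buffered items
        self.stop_out = False   # stop signal seen by the upstream sender this cycle

    def tick(self, data_in, stop_in):
        """Simulate one clock cycle.

        data_in: item sent by the upstream sender, or None for "no data".
        stop_in: True if the downstream receiver is stalled this cycle.
        Returns the item delivered downstream, or None.
        """
        if data_in is not None and self.stop_out:
            raise ValueError("upstream must not send while stop_out is asserted")
        delivered = None
        if self.slots and not stop_in:
            delivered = self.slots.popleft()
        if data_in is not None:
            self.slots.append(data_in)
        # Registered back-pressure: the upstream sender reacts on the next cycle.
        self.stop_out = len(self.slots) == 2
        return delivered


rs = RelayStation()
out = [rs.tick("d0", False), rs.tick("d1", True)]   # downstream stalls in cycle 2
stalled_upstream = rs.stop_out                      # both slots are now occupied
out += [rs.tick(None, False), rs.tick("d2", False), rs.tick(None, False)]
```

The second slot absorbs the one item that is already in flight when the downstream side stalls, so no data is lost even though the stop signal arrives a cycle late.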