Bridging the gap between asynchronous design and designers



Post on 21-Jan-2016



Page 1: Bridging the gap between asynchronous design and designers

Bridging the gap between asynchronous design and designers

Peter A. Beerel, Fulcrum Microsystems, Calabasas Hills, CA, USA

Jordi Cortadella, Universitat Politècnica de Catalunya, Barcelona, Spain

Alex Kondratyev, Cadence Berkeley Labs, Berkeley, CA, USA

Page 2: Bridging the gap between asynchronous design and designers

Outline

1. Basic concepts on asynchronous circuit design

Tea Break

2. Logic synthesis from concurrent specifications

3. Synchronization of complex systems

Lunch

4. Design automation for asynchronous circuits

Tea Break

5. Industrial experiences

Page 3: Bridging the gap between asynchronous design and designers

Basic concepts on asynchronous circuit design

Page 4: Bridging the gap between asynchronous design and designers

Outline

• What is an asynchronous circuit?
• Asynchronous communication
• Asynchronous design styles (Micropipelines)
• Asynchronous logic building blocks
• Control specification and implementation
• Delay models and classes of async circuits
• Channel-based design
• Why asynchronous circuits?

Page 5: Bridging the gap between asynchronous design and designers

Synchronous circuit

[Figure: pipeline of registers (R) separated by combinational logic blocks (CL), all registers driven by a common CLK]

• Implicit (global) synchronization between blocks
• Clock period > Max Delay (CL + R)

Page 6: Bridging the gap between asynchronous design and designers

Asynchronous circuit

[Figure: the same pipeline of registers (R) and combinational logic blocks (CL), with Req and Ack signals in place of the clock]

• Explicit (local) synchronization: Req / Ack handshakes

Page 7: Bridging the gap between asynchronous design and designers

Motivation for asynchronous

• Asynchronous design is often unavoidable: asynchronous interfaces, arbiters, etc.
• Modern clocking is multi-phase and distributed, and virtually "asynchronous" (cf. GALS, next slide):
  • Mesochronous (clock travels together with data)
  • Local (possibly stretchable) clock generation
• Robust asynchronous design flows are coming (e.g. VLSI programming from Philips, Balsa from Univ. of Manchester, NCL from Theseus Logic, ...)

Page 8: Bridging the gap between asynchronous design and designers

Globally Async Locally Sync (GALS)

[Figure: a clocked domain (registers R, combinational logic CL, local CLK) enclosed in an async-to-sync wrapper; the wrapper connects to the asynchronous world through four handshake pairs, Req1/Ack1 to Req4/Ack4]

Page 9: Bridging the gap between asynchronous design and designers

Key Design Differences

Synchronous logic design:
• Proceeds without taking timing correctness (hazards, signal ack-ing, etc.) into account
• Combinational logic and memory latches (registers) are built separately
• Static timing analysis of CL is sufficient to determine the Max Delay (clock period)
• Fixed set-up and hold conditions for latches

Page 10: Bridging the gap between asynchronous design and designers

Key Design Differences

Asynchronous logic design:
• Must ensure hazard-freedom, signal ack-ing, local timing constraints
• Combinational logic and memory latches (registers) are often mixed in "complex gates"
• Dynamic timing analysis of logic is needed to determine relative delays between paths
• To avoid complex issues, circuits may be built as delay-insensitive and/or speed-independent (as discussed later)

Page 11: Bridging the gap between asynchronous design and designers

Verification and Testing Differences

Synchronous logic verification and testing:
• Only the functional correctness aspect is verified and tested
• Testing can be done with standard ATE and at low speed (but high speed may be required for DSM)

Asynchronous logic verification and testing:
• In addition to functional correctness, the temporal aspect is crucial: e.g. causality and order, deadlock-freedom
• Testing must cover faults in complex gates (logic + memory) and must proceed at normal operation rate
• Delay fault testing may be needed

Page 12: Bridging the gap between asynchronous design and designers

Synchronous communication

• Clock edges determine the time instants where data must be sampled
• Data wires may glitch between clock edges (set-up/hold times must be satisfied)
• Data are transmitted at a fixed rate (clock frequency)

[Figure: clocked waveform carrying the bit sequence 1 1 0 0 1 0]

Page 13: Bridging the gap between asynchronous design and designers

Dual rail

• Two wires with L (low) and H (high) per bit: “LL” = “spacer”, “LH” = “0”, “HL” = “1”
• n-bit data communication requires 2n wires
• Each bit is self-timed
• Other delay-insensitive codes exist (e.g. k-of-n), as does event-based signalling (choice criteria: pin and power efficiency)

[Figure: dual-rail waveform carrying the bit sequence 1 1 0 0 1 0]
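The encoding on this slide can be sketched in a few lines of Python. This is only an illustration: the names `SPACER`, `encode_word` and `decode_word` are hypothetical, and treating "HH" as an error is an assumption (the slide lists only the three legal combinations).

```python
# Dual-rail code from the slide: one (first wire, second wire) pair per bit.
SPACER = ("L", "L")
ENCODE = {0: ("L", "H"), 1: ("H", "L")}
DECODE = {pair: bit for bit, pair in ENCODE.items()}


def encode_word(bits):
    """Encode a list of 0/1 bits; an n-bit word uses 2n wires."""
    return [ENCODE[b] for b in bits]


def decode_word(pairs):
    """Return the bits once every pair holds a codeword, else None.

    A spacer on any bit means the word has not fully arrived yet.
    The pair ("H", "H") is not a codeword and is rejected (assumption).
    """
    if any(p == ("H", "H") for p in pairs):
        raise ValueError("HH is not a valid dual-rail codeword")
    if any(p == SPACER for p in pairs):
        return None
    return [DECODE[p] for p in pairs]


word = encode_word([1, 1, 0, 0, 1, 0])
print(decode_word(word))                 # [1, 1, 0, 0, 1, 0]
print(decode_word([SPACER] + word[1:]))  # None: first bit not yet valid
```

Because each bit signals its own validity, the receiver needs no separate clock or request wire to know when the word is complete.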

Page 14: Bridging the gap between asynchronous design and designers

Bundled data

• Validity signal: similar to an aperiodic local clock
• n-bit data communication requires n+1 wires
• Data wires may glitch when not valid
• Signaling protocols:
  • level sensitive (latch)
  • transition sensitive (register): 2-phase / 4-phase

[Figure: data waveform carrying the bit sequence 1 1 0 0 1 0, accompanied by the validity signal]

Page 15: Bridging the gap between asynchronous design and designers

Example: memory read cycle

Transition signaling, 4-phase

[Figure: timing diagram of the Address bus with its validity signal A and the Data bus with its validity signal D; the intervals "Valid address" and "Valid data" are marked]

Page 16: Bridging the gap between asynchronous design and designers

Example: memory read cycle

Transition signaling, 2-phase

[Figure: timing diagram of the Address bus with its validity signal A and the Data bus with its validity signal D; the intervals "Valid address" and "Valid data" are marked]

Page 17: Bridging the gap between asynchronous design and designers

Asynchronous modules

[Figure: module with a DATAPATH (Data IN to Data OUT) and a CONTROL block linked by start and done; the control has ports req in, ack in, req out and ack out]

Signaling protocol:

reqin+ start+ [computation] done+ reqout+ ackout+ ackin+
reqin- start- [reset] done- reqout- ackout- ackin-

(more concurrency is also possible)

Page 18: Bridging the gap between asynchronous design and designers

Asynchronous latches: C element

[Figure: C-element symbol with inputs A, B and output Z]

A B Z+
0 0 0
0 1 Z
1 0 Z
1 1 1

[Figure: static logic implementation, a transistor-level schematic between Vdd and Gnd with inputs A, B and output Z [van Berkel 91]]
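The truth table can be read as a next-state function: the output follows the inputs when they agree and holds its value otherwise. A minimal behavioural sketch (the function name `c_element` is hypothetical, and gate delays are ignored):

```python
def c_element(a, b, z):
    """Next output Z+ of a C-element given inputs A, B and current output Z."""
    return a if a == b else z


# Drive the element through one full cycle, starting from Z = 0.
z = 0
trace = []
for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
    z = c_element(a, b, z)
    trace.append(z)
print(trace)  # [0, 0, 1, 1, 0]
```

The output rises only after both inputs have risen and falls only after both have fallen, which is why the element is used to join handshake signals.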

Page 19: Bridging the gap between asynchronous design and designers

C-element: Other implementations

[Figure: two transistor-level C-element schematics between Vdd and Gnd, with inputs A, B and output Z, labelled "Quasi-Static" (with a weak inverter) and "Dynamic"]

Page 20: Bridging the gap between asynchronous design and designers

Dual-rail logic

[Figure: dual-rail AND gate with inputs A.t, A.f, B.t, B.f and outputs C.t, C.f]

Valid behavior for monotonic environment
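One common way to build such a gate, assumed here because the schematic itself is not recoverable from the transcript, is C.t = A.t AND B.t and C.f = A.f OR B.f. The sketch below models each signal as a hypothetical (t, f) pair of 0/1 values.

```python
SPACER, ZERO, ONE = (0, 0), (0, 1), (1, 0)  # (t rail, f rail)


def dual_rail_and(a, b):
    """Dual-rail AND, assuming C.t = A.t & B.t and C.f = A.f | B.f."""
    a_t, a_f = a
    b_t, b_f = b
    return (a_t & b_t, a_f | b_f)


for a in (ZERO, ONE):
    for b in (ZERO, ONE):
        print(a, b, dual_rail_and(a, b))

# With this gate structure a single false input already produces a valid 0.
print(dual_rail_and(ZERO, SPACER))  # (0, 1)
```

If the environment is monotonic (inputs only rise after a spacer and only fall after a codeword), each output rail changes at most once per phase, so the gate produces no glitches.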

Page 21: Bridging the gap between asynchronous design and designers

Completion detection

[Figure: dual-rail logic block whose outputs feed a completion detection tree; a C-element at the root of the tree produces done]
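A behavioural sketch of completion detection, under two assumptions that are not stated on the slide: each bit is tested with an OR of its two rails, and the tree of C-elements is collapsed into a single n-input C-element. The class name `CompletionDetector` is hypothetical.

```python
class CompletionDetector:
    """done rises when all bits are valid and falls when all are spacers."""

    def __init__(self):
        self.done = 0

    def update(self, pairs):
        # One validity signal per bit: either rail high means the bit is valid.
        valid = [t | f for (t, f) in pairs]
        if all(valid):
            self.done = 1
        elif not any(valid):
            self.done = 0
        # Otherwise hold the previous value, as a C-element would.
        return self.done


cd = CompletionDetector()
print(cd.update([(0, 0), (0, 0)]))  # 0: all spacers
print(cd.update([(1, 0), (0, 0)]))  # 0: only one bit has arrived
print(cd.update([(1, 0), (0, 1)]))  # 1: word complete
print(cd.update([(0, 0), (0, 1)]))  # 1: held until every bit is reset
print(cd.update([(0, 0), (0, 0)]))  # 0: back to spacer
```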

Page 22: Bridging the gap between asynchronous design and designers

Differential cascode voltage switch logic

[Figure: 3-input AND/NAND gate; an N-type transistor network with inputs A.t, B.t, C.t and A.f, B.f, C.f, controlled by start, with outputs Z.t and Z.f and a done signal]

Page 23: Bridging the gap between asynchronous design and designers

Examples of dual-rail design

• Asynchronous dual-rail ripple-carry adder (A. Martin, 1991)
  • Critical delay is proportional to log N (N = number of bits)
  • 32-bit adder delay (1.6 µm MOSIS CMOS): 11 ns versus 40 ns for synchronous
  • Async cell transistor count = 34 versus synchronous = 28
• More recent success stories (modularity and automatic synthesis) of dual-rail logic from Null-Convention Logic (Theseus Logic)

Page 24: Bridging the gap between asynchronous design and designers

Bundled-data logic blocks

[Figure: single-rail logic block in parallel with a delay element running from start to done]

Conventional logic + matched delay

Page 25: Bridging the gap between asynchronous design and designers

Micropipelines (Sutherland 89)

Micropipeline (2-phase) control blocks

[Figure: symbols of the control blocks: C-element (Join); Merge; Toggle (in, out0, out1); Select (in, sel, outt, outf); Call (r1, r2, ra, a1, a2); Request-Grant-Done (RGD) Arbiter (r1, r2, g1, g2, d1, d2)]

Page 26: Bridging the gap between asynchronous design and designers

Micropipelines (Sutherland 89)

[Figure: data path of latches (L) separated by logic blocks; control chain of C-elements and delay elements, with Rin and Aout on the input side and Rout and Ain on the output side]

Page 27: Bridging the gap between asynchronous design and designers

Data-path / Control

[Figure: data path of latches (L) separated by logic blocks, driven by a single CONTROL block with ports Rin, Aout, Rout and Ain]

Page 28: Bridging the gap between asynchronous design and designers

Control specification

[Figure: waveforms of A and B next to a specification over the transitions A+, B+, A–, B–]

A input, B output

Page 29: Bridging the gap between asynchronous design and designers

Control specification

[Figure: specification over the transitions A+, B–, A–, B+ for a component with signals A and B]

Page 30: Bridging the gap between asynchronous design and designers

Control specification

[Figure: waveforms of A, B and C next to a specification over the transitions A+, A–, B+, B–, C+, C–]

Page 31: Bridging the gap between asynchronous design and designers

Control specification

[Figure: waveforms of A, B and C next to a specification over the transitions A+, A–, B+, B–, C+, C–]

Page 32: Bridging the gap between asynchronous design and designers

Control specification

[Figure: FIFO controller (FIFO cntrl) built from C-elements, with ports Ri, Ao, Ro and Ai; specification over the transitions Ri+, Ao+, Ri-, Ao-, Ro+, Ai+, Ro-, Ai-]

Page 33: Bridging the gap between asynchronous design and designers

A simple filter: specification

y := 0;
loop
  x := READ (IN);
  WRITE (OUT, (x+y)/2);
  y := x;
end loop

[Figure: filter block with input channel IN (Rin, Ain) and output channel OUT (Rout, Aout)]
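The specification describes a two-sample averaging filter. A purely functional Python sketch of the same loop follows; handshaking is abstracted away, a finite input list stands in for the channel IN, and the name `filter_stream` is hypothetical.

```python
def filter_stream(samples):
    """Yield (x + y) / 2 for each input x, where y is the previous input."""
    y = 0
    for x in samples:          # x := READ (IN)
        yield (x + y) / 2      # WRITE (OUT, (x+y)/2)
        y = x                  # y := x


print(list(filter_stream([2, 4, 6])))  # [1.0, 3.0, 5.0]
```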

Page 34: Bridging the gap between asynchronous design and designers

A simple filter: block diagram

[Figure: latches x and y and an adder (+) between IN and OUT; a control block with external handshakes (Rin, Ain) and (Rout, Aout) and internal handshakes (Rx, Ax), (Ry, Ay) and (Ra, Aa)]

• x and y are level-sensitive latches (transparent when R=1)
• + is a bundled-data adder (matched delay between Ra and Aa)
• Rin indicates the validity of IN
• After Ain+ the environment is allowed to change IN
• (Rout, Aout) control a level-sensitive latch at the output

Page 35: Bridging the gap between asynchronous design and designers

A simple filter: control spec.

[Figure: the filter block diagram (x, y, +, control) together with a specification over the transitions Rin+, Ain+, Rin–, Ain–, Rx+, Ax+, Rx–, Ax–, Ry+, Ay+, Ry–, Ay–, Ra+, Aa+, Ra–, Aa–, Rout+, Aout+, Rout–, Aout–]

Page 36: Bridging the gap between asynchronous design and designers

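A simple filter: control impl.

[Figure: control implementation using a C-element and wires between Rin, Ain, Rx, Ax, Ry, Ay, Ra, Aa, Rout and Aout, shown together with the specification over the transitions Rin+, Ain+, Rin–, Ain–, Rx+, Ax+, Rx–, Ax–, Ry+, Ay+, Ry–, Ay–, Ra+, Aa+, Ra–, Aa–, Rout+, Aout+, Rout–, Aout–]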

Page 37: Bridging the gap between asynchronous design and designers

Taking delays into account

[Figure: specification over the transitions x+, x–, y+, y–, z+, z– and a gate-level circuit with signals x, y, z, x’ and z’]

Delay assumptions:
• Environment: 3 time units
• Gates: 1 time unit

events: x+   x’–  y+   z+   z’–  x–   x’+  z–   z’+  y–
time:   3    4    5    6    7    9    10   12   13   14

Page 38: Bridging the gap between asynchronous design and designers

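Taking delays into account

[Figure: the same specification over x+, x–, y+, y–, z+, z– and gate-level circuit with signals x, y, z, x’ and z’; one part of the circuit is annotated "very slow" and the outcome is annotated "failure !"]

Delay assumptions: unbounded delays

events: x+   x’–  y+   z+   x–   x’+  y–
time:   3    4    5    6    9    10   11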

Page 39: Bridging the gap between asynchronous design and designers

Gate vs wire delay models

• Gate delay model: delays in gates, no delays in wires
• Wire delay model: delays in gates and wires

Page 40: Bridging the gap between asynchronous design and designers

Delay models for async. circuits

• Bounded delays (BD): realistic for gates and wires.
  • Technology mapping is easy, verification is difficult
• Speed independent (SI): unbounded (pessimistic) delays for gates and “negligible” (optimistic) delays for wires.
  • Technology mapping is more difficult, verification is easy
• Delay insensitive (DI): unbounded (pessimistic) delays for gates and wires.
  • DI class (built out of basic gates) is almost empty
• Quasi-delay insensitive (QDI): delay insensitive except for critical wire forks (isochronic forks).
  • In practice it is the same as speed independent

[Figure: diagram of the circuit classes, with regions labelled BD, SI / QDI and DI]

Page 41: Bridging the gap between asynchronous design and designers

Channel-Based Design

• Synchronization and communication between blocks implemented with handshaking using asynchronous channels by sending/receiving “data tokens”

[Figure: a synchronous system, with blocks sharing a clock, next to an asynchronous system, with blocks connected by asynchronous channels]

Page 42: Bridging the gap between asynchronous design and designers

Channel Design – Single Rail

Features
• One request wire
• One wire per data bit
• One acknowledgment wire
• Has timing assumptions

[Figure: 4-phase bundled-data channel between sender and receiver, with wires Req, Data and Ack; timing diagram with the four handshake phases numbered 1 to 4 and the interval in which Data is stable]
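The four phases can be traced with a small sequential model. The ordering used below (Req+, Ack+, Req-, Ack-, with the sender driving Req and the data) is the usual 4-phase push-channel convention and is an assumption here, since the timing diagram did not survive extraction; `four_phase_transfer` is a hypothetical name.

```python
def four_phase_transfer(words):
    """Send each word over a 4-phase bundled-data channel; return data and trace."""
    received, trace = [], []
    for w in words:
        data = w                 # timing assumption: data is stable before Req+
        trace.append(f"Data={w}")
        trace.append("Req+")     # phase 1: sender signals valid data
        received.append(data)    # receiver latches the bundled data
        trace.append("Ack+")     # phase 2: receiver acknowledges
        trace.append("Req-")     # phase 3: sender returns Req to zero
        trace.append("Ack-")     # phase 4: receiver returns Ack to zero
    return received, trace


received, trace = four_phase_transfer([5, 9])
print(received)   # [5, 9]
print(trace[:5])  # ['Data=5', 'Req+', 'Ack+', 'Req-', 'Ack-']
```

The model makes the cost of the scheme visible: n data wires plus Req and Ack, and a bundling constraint (data must settle before Req+) that the circuit itself cannot check.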

Page 43: Bridging the gap between asynchronous design and designers

Channel Design: Dual Rail & 1-of-N

Dual Rail
• Two wires per data bit
• One acknowledgment wire
• Advantage: supports delay-insensitive design

1-of-N
• Generalization of dual-rail

[Figure: 4-phase 1-of-N channel between sender and receiver, with wires Data (1-of-N) and Ack; timing diagram with the four handshake phases numbered 1 to 4]

DataT  DataF  Logical value
0      0      Reset
0      1      0
1      0      1
1      1      Invalid
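A 1-of-N code raises exactly one of N wires per symbol, and dual rail is the N = 2 case. The sketch below uses hypothetical helper names and assumes that wire i stands for value i, so for dual rail the wires are ordered (DataF, DataT).

```python
def encode_1_of_n(value, n):
    """Return n wires with only wire `value` high."""
    if not 0 <= value < n:
        raise ValueError("value out of range for this code")
    return [1 if i == value else 0 for i in range(n)]


def decode_1_of_n(wires):
    """Return the value, None for the reset state, or raise on an invalid code."""
    hot = [i for i, w in enumerate(wires) if w]
    if not hot:
        return None              # all wires low: reset (spacer)
    if len(hot) > 1:
        raise ValueError("more than one wire high: invalid code")
    return hot[0]


print(encode_1_of_n(2, 4))          # [0, 0, 1, 0]
print(decode_1_of_n([0, 0, 1, 0]))  # 2
print(decode_1_of_n([0, 0]))        # None
```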

Page 44: Bridging the gap between asynchronous design and designers

Anatomy of a Channel-Based Asynchronous Design

• Architecture is typically a multi-level hierarchy of communicating blocks

[Figure: hierarchy from an ASIC (Main FSM, Register Bank, Memory, Adder/Mult., Subtract/Divider) through blocks such as Reg A, Reg B, Reg C, Adder and Multiplier down to leaf cells (B N-1, B N-2, B N-3 and FA N-1, FA N-2, FA N-3, FA 0); channels and leaf cells are labelled]

• Yields a hierarchical netlist of cells, where at each level blocks communicate along channels

Page 45: Bridging the gap between asynchronous design and designers

Asynchronous Cells

Definition
• Smallest element that communicates with its neighbors along asynchronous channels

Functionality
• Reads a subset of input channels
• Computes F and writes to a subset of output channels

Linear Pipelines
• Only one input and one output channel

[Figure: cell F with several input channels and output channels; linear-pipeline cell F with one input and one output channel]

Page 46: Bridging the gap between asynchronous design and designers

Cells for Non-Linear Pipelines

• Non-linear pipelines: Joins and Forks
• Conditional Joins: read only some of the input channels
• Conditional Splits: write only to some of the output channels

[Figure: cells F drawn as Fork, Join, Conditional Split and Conditional Join]

Page 47: Bridging the gap between asynchronous design and designers

Template-Based Leaf-Cell Design

• Each pipeline style (QDI, timed…) has a different blueprint
• Create a library using a blueprint to implement the lowest level communicating blocks

[Figure: blueprint for a QDI N-input M-output pipeline stage, built from blocks F, LCD, RCD and a C-element; two instances are shown: a 2-input 1-output pipeline stage (two LCD blocks) and a 1-input 2-output pipeline stage (two RCD blocks)]

Page 48: Bridging the gap between asynchronous design and designers

Template-Based Leaf-Cell Design

• Pros
  • Enables fine-grain 2-D pipelining, yielding high performance
  • Simplifies logic synthesis by enabling simple control circuit generation and re-use of typical datapath synthesis
  • Leaf cells can be laid out and verified, creating a leaf-cell library and localizing timing assumptions
• Cons
  • Unified template may not be optimal in all cases
  • Particularly, less effective for non-pipelined architectures with more complicated control

Page 49: Bridging the gap between asynchronous design and designers

Motivation (designer’s view)

• Modularity for system-on-chip design
  • Plug-and-play interconnectivity
• Average-case performance
  • No worst-case delay synchronization
• Many interfaces are asynchronous
  • Buses, networks, ...

Page 50: Bridging the gap between asynchronous design and designers

Motivation (technology aspects)

• Low power
  • Automatic clock gating
• Electromagnetic compatibility
  • No peak currents around clock edges
• Security
  • No ‘electro-magnetic difference’ between logical ‘0’ and ‘1’ in dual rail code
• Robustness
  • High immunity to technology and environment variations (temperature, power supply, ...)

Page 51: Bridging the gap between asynchronous design and designers

Dissuasion

• Concurrent models for specification
  • CSP, Petri nets, ...: no more FSMs
• Difficult to design
  • Hazards, synchronization
• Complex timing analysis
  • Difficult to estimate performance
• Difficult to test
  • No way to stop the clock

Page 52: Bridging the gap between asynchronous design and designers

But ... some success stories

• Philips
• AMULET microprocessors
• Sharp
• Intel (RAPPID)
• Start-up companies: Theseus Logic, Fulcrum Microsystems, Self-Timed Solutions

Recent blurb: It's Time for Clockless Chips, by Claire Tristram (MIT Technology Review, v. 104, no. 8, October 2001: http://www.technologyreview.com/magazine/oct01/tristram.asp) ...