
IHP Im Technologiepark 25 15236 Frankfurt (Oder) Germany www.ihp-microelectronics.com © 2009 - All rights reserved

Asynchronous Circuit Design GALS Systems

Synchronous and GALS NoCs

- DAAD Workshop, Nis, Serbia, July 2009 -

Dr. Miloš Krstić


Overview

• Motivation

• Problems of the synchronous design

• Asynchronous circuit design

• GALS - State of the Art

• Synchronous and GALS NoCs


Challenges with Synchronous Design

• Most digital systems today operate synchronously.

• However, the complexity of electronic systems grows enormously.

Property / Year                           1999   2001   2005   2011
CMOS process [µm]                         0.18   0.15   0.1    0.05
Transistors on chip [Mtrans/cm²]          7      14     41     247
On-chip clock [GHz]                       1.25   1.77   3.5    10
Off-chip clock [GHz]                      0.48   0.722  1.035  1.54
Power dissipation (handheld systems) [W]  1.4    1.7    2.4    2.2
Vdd [V]                                   1.5    1.2    0.9    0.5


Classical Synchronous Paradigm

• Usually digital circuits are designed to work synchronously

[Figure: register pipeline R1 to R4 with combinational logic CL3 and CL4, driven by a global CLK; a second version adds a clock gating signal]

Synchronous communication

• Clock edges determine the time instants where data must be sampled

• Data wires may glitch between clock edges (setup/hold times must be satisfied)

• Data are transmitted at a fixed rate - clock frequency


Problems with Synchronous Design

• As clock speeds increase clock distribution becomes difficult:

We need to minimize clock skew.

There is some upper limit to clock speed that depends on the material properties of the device.

It is not possible to propagate a signal from one side of the chip to the other within a single clock cycle

• Worst-case performance.

• Sensitive to variations in voltage, temperature, process.

• Not modular (fixed clock rate: poor match for reusability of components).

• Clock burns large fraction of chip power (~40-70%)

• Synchronization failure.


What is Asynchronous Design? (I)

• Synchronization is achieved without a global clock.

• Asynchronous Communication:

Handshake mechanisms

[Figure: Sender and Receiver linked by request, acknowledge and data signals]

What is Asynchronous Design? (II)

[Figure: the register pipeline R1 to R4 with CL3 and CL4, where each register has a local controller (CTL) and neighbouring controllers handshake through REQ and ACK. Example: a link / channel carries REQ, ACK and DATA, and tokens flow along it]

Asynchronous design styles (I)

• Bundled data (Single Rail) 4 - phase protocol

This style is very widely used because of very small and fast asynchronous controllers
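The 4-phase (return-to-zero) bundled-data handshake can be sketched in a few lines. This is a minimal behavioural model, not a circuit description: the function name `four_phase_transfer` and the trace format are illustrative assumptions, and the bundling constraint (DATA stable before REQ rises) is modelled simply by assigning the data first.

```python
def four_phase_transfer(words):
    """Trace a 4-phase bundled-data handshake for each word (behavioural sketch)."""
    req = ack = 0
    received = []
    trace = []
    for word in words:
        data = word                      # sender drives DATA before raising REQ
        req = 1                          # phase 1: REQ+ means DATA is valid
        trace.append(("REQ+", req, ack))
        received.append(data)            # receiver latches DATA while REQ is high
        ack = 1                          # phase 2: ACK+ means DATA was consumed
        trace.append(("ACK+", req, ack))
        req = 0                          # phase 3: REQ- (return to zero)
        trace.append(("REQ-", req, ack))
        ack = 0                          # phase 4: ACK- (ready for next transfer)
        trace.append(("ACK-", req, ack))
    return received, trace


if __name__ == "__main__":
    got, events = four_phase_transfer([0b1100, 0b1010])
    print(got)
    for event in events:
        print(event)
```

Each word costs four signal transitions, and both wires are back at zero between transfers.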

[Figure: 4-phase protocol waveforms for REQ, ACK and DATA (n wires), annotated "always like this" and "some variations"]

Bundled data

• Validity signal

Similar to an aperiodic local clock

• n-bit data communication requires n+1 wires

• Data wires may glitch when not valid

Asynchronous design styles (II)

• Bundled data (Single Rail) 2 - phase protocol

This style looks simpler and faster than 4-phase, but controllers are more complex
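For comparison, a 2-phase (transition signalling) transfer can be sketched the same way. Again this is only a behavioural model with hypothetical names (`two_phase_transfer`); the point is that every transition on REQ or ACK, rising or falling, is an event, so one transfer needs two transitions rather than four.

```python
def two_phase_transfer(words):
    """Trace a 2-phase bundled-data handshake for each word (behavioural sketch)."""
    req = ack = 0
    received = []
    trace = []
    for word in words:
        data = word                # sender drives DATA before toggling REQ
        req ^= 1                   # any REQ transition announces valid DATA
        trace.append(("REQ", req))
        received.append(data)      # receiver latches DATA
        ack ^= 1                   # ACK toggles so that it matches REQ again
        trace.append(("ACK", ack))
    return received, trace


if __name__ == "__main__":
    got, events = two_phase_transfer([5, 6, 7])
    print(got)
    print(events)
```

The controllers are more complex because they must react to both edges, which this abstract model hides.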

[Figure: 2-phase protocol waveforms for REQ, ACK and DATA (n wires)]

Asynchronous design styles (III)

• 4-phase dual rail protocol

Each data bit encoded into 2 wires

Offers generation of Delay-Insensitive circuits

Introduces very big area overhead

Value       d.t  d.f
EMPTY       0    0
VALID "0"   0    1
VALID "1"   1    0
Not used    1    1

[Figure: ACK and DATA (2n wires) waveforms; DATA alternates between EMPTY and VALID codewords]

Dual rail

• Two wires per bit

“00” = spacer, “01” = 0, “10” = 1
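The encoding above maps directly onto a pair of small helper functions. This is a sketch with assumed names (`encode_bit`, `decode_bit`, `encode_word`); each pair is written as (d.t, d.f), following the order used in the encoding table.

```python
EMPTY = (0, 0)  # spacer: no data on the wire pair


def encode_bit(bit):
    """Return the (d.t, d.f) pair for a valid bit."""
    return (1, 0) if bit else (0, 1)


def decode_bit(pair):
    """Return 0 or 1 for a valid pair, None for the spacer."""
    if pair == EMPTY:
        return None
    if pair == (1, 1):
        raise ValueError("codeword (1, 1) is not used in dual-rail encoding")
    return 1 if pair == (1, 0) else 0


def encode_word(value, width):
    """Encode an integer as a list of dual-rail pairs, least significant bit first."""
    return [encode_bit((value >> i) & 1) for i in range(width)]


if __name__ == "__main__":
    word = encode_word(0b101, 3)
    print(word)
    print([decode_bit(p) for p in word])
```

An n-bit word therefore occupies 2n wires, which is where the area overhead of dual-rail design comes from.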


Asynchronous modules

• Signaling protocol:

reqin+ start+ [computation] done+ reqout+ ackout+ ackin+

reqin- start- [reset] done- reqout- ackout- ackin-

[Figure: asynchronous module with a DATAPATH (Data IN, Data OUT) and a CONTROL block (req in, ack in, req out, ack out), linked by start and done signals]

Asynchronous components

• Asynchronous design requires additional components and special logic

• Such components are not available in standard synchronous design kits

• Critical components are the C-element and the Mutex

Muller C-element

a  b  z
0  0  0
0  1  no change
1  0  no change
1  1  1
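The truth table can be expressed as a tiny state-holding model. The class name `CElement` and its `update` method are illustrative assumptions; the behaviour follows the table: the output copies the inputs when they agree and keeps its previous value otherwise.

```python
class CElement:
    """Behavioural model of a 2-input Muller C-element."""

    def __init__(self, z=0):
        self.z = z  # stored output value

    def update(self, a, b):
        if a == b:      # inputs agree: output follows them
            self.z = a
        return self.z   # inputs differ: output is unchanged


if __name__ == "__main__":
    c = CElement()
    for a, b in [(0, 1), (1, 1), (1, 0), (0, 0)]:
        print(a, b, c.update(a, b))
```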


Mutual Exclusion element

• ME prevents multiple event propagation

ME is used for arbitration

[Figure: MUTEX element with request inputs R1, R2 and grant outputs G1, G2, and its implementation with internal nodes x1, x2]

Dual-rail logic

• Dual-rail logic requires additional logic for each logical operation

[Figure: dual-rail AND gate with inputs A.t, A.f, B.t, B.f and outputs C.t, C.f]
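To give a feel for the extra logic, here is a behavioural sketch of one common way to build a dual-rail AND gate: four C-elements detect the four valid input combinations, and an OR gate merges the three that produce a "0". The gate structure on the slide did not survive extraction, so this particular construction is an assumption, as are the names `CElement` and `DualRailAnd`. Pairs are written as (t, f).

```python
class CElement:
    """2-input Muller C-element: output changes only when inputs agree."""

    def __init__(self):
        self.z = 0

    def update(self, a, b):
        if a == b:
            self.z = a
        return self.z


class DualRailAnd:
    """Dual-rail AND built from four C-elements and one OR (assumed structure)."""

    def __init__(self):
        self.c_ff, self.c_ft, self.c_tf, self.c_tt = (CElement() for _ in range(4))

    def update(self, a, b):
        (a_t, a_f), (b_t, b_f) = a, b
        m_ff = self.c_ff.update(a_f, b_f)   # A=0, B=0
        m_ft = self.c_ft.update(a_f, b_t)   # A=0, B=1
        m_tf = self.c_tf.update(a_t, b_f)   # A=1, B=0
        m_tt = self.c_tt.update(a_t, b_t)   # A=1, B=1
        return (m_tt, m_ff | m_ft | m_tf)   # (C.t, C.f)


if __name__ == "__main__":
    gate = DualRailAnd()
    print(gate.update((1, 0), (0, 0)))  # only A valid: output stays empty
    print(gate.update((1, 0), (1, 0)))  # 1 AND 1
```

The output becomes valid only after both inputs are valid and returns to empty only after both inputs are empty, so the gate also signals completion of its own operation.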


Completion detection (dual-rail)

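[Figure: the two wires of each bit are combined, and a C-element combines the per-bit results into a single done signal]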

Completion detection tree
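A behavioural sketch of dual-rail completion detection follows. It assumes the usual arrangement (an OR per bit feeding a multi-input C-element), which is an interpretation of the lost figure; the class name `CompletionDetector` is hypothetical, and the tree of 2-input C-elements is collapsed into a single n-input rule.

```python
class CompletionDetector:
    """done rises when all bits are valid and falls when all bits are empty."""

    def __init__(self):
        self.done = 0

    def update(self, word):
        # One OR gate per bit: a bit is valid when either rail is high.
        valid = [t | f for (t, f) in word]
        if all(valid):            # every bit valid: computation complete
            self.done = 1
        elif not any(valid):      # every bit empty: spacer received
            self.done = 0
        return self.done          # mixed state: hold previous value


if __name__ == "__main__":
    cd = CompletionDetector()
    print(cd.update([(1, 0), (0, 0)]))  # partly valid
    print(cd.update([(1, 0), (0, 1)]))  # fully valid
```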


Completion detection (bundled-data)

[Figure: logic block in parallel with a delay element that turns start into done]

Conventional logic + matched delay


Muller pipeline

• "The" delay-insensitive handshake machine

• C[i] accepts 1/0 from C[i-1] only if C[i+1]=0/1

• Think of 1010101.. as waves: 10 10 10 1..

• The C-elements propagate waves precisely

• Timing depends on local delays, may vary along the pipe

• If RIGHT is quiet, pipe will fill and stall
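The rule "C[i] accepts 1/0 from C[i-1] only if C[i+1]=0/1" can be tried out in a small simulation. This is a sketch under simplifying assumptions: all stages update simultaneously in discrete steps, the LEFT environment follows a 4-phase handshake (it raises REQ whenever stage 0 is low and lowers it when stage 0 is high), and the RIGHT environment never acknowledges. The function names are illustrative.

```python
def c_element(a, b, z):
    """C-element rule: follow the inputs when they agree, else hold z."""
    return a if a == b else z


def step(state, req_in, ack_in):
    """Advance every stage once: C[i] = C(C[i-1], not C[i+1])."""
    n = len(state)
    nxt = []
    for i in range(n):
        left = req_in if i == 0 else state[i - 1]
        right = ack_in if i == n - 1 else state[i + 1]
        nxt.append(c_element(left, 1 - right, state[i]))
    return nxt


def run_with_quiet_right(n_stages, steps):
    """Drive the pipeline from LEFT while RIGHT never acknowledges."""
    state = [0] * n_stages
    for _ in range(steps):
        req_in = 1 - state[0]          # LEFT environment, 4-phase protocol
        state = step(state, req_in, ack_in=0)
    return state


if __name__ == "__main__":
    print(run_with_quiet_right(4, 20))
```

With four stages the pipeline settles into an alternating pattern and then stops changing, which is the "fill and stall" behaviour described in the last bullet.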

[Figure: Muller pipeline of C-elements C[i-1], C[i], C[i+1], C[i+2] connected by REQ and ACK signals between the LEFT and RIGHT environments]

Micropipelines (Sutherland 89)

[Figure: micropipeline with latches (L) separated by logic blocks, controlled by a chain of C-elements with delay elements; handshake signals Rin, Aout, Rout, Ain]

Abstract Pipeline

• Bubbles

• Tokens: Valid (0 or 1, who cares) and Empty tokens

E V V E E


Abstract Rings

• 3 stages, 1 bubble:

3 steps for token round

6 steps to cycle

V E V

V E E

V V E

E V E

token

bubble


Building Blocks

• Latch

• Source

• Sink

• Fork

• Join (wait for all)

• Merge (wait for one)

• MUX (inputs 0, 1)

• DEMUX (outputs 0, 1)

• Function Block (Join; CL; Fork)

Describing Asynchronous Circuits - STGs

[Figure: STG with transitions A+, B+, A–, B– and the corresponding waveforms of A and B]

A: input, B: output

Control specification – C element

[Figure: STG of a C-element with inputs A, B and output C (transitions A+, B+, C+, A-, B-, C-), next to the C-element symbol]

Control specification – FIFO Controller

[Figure: FIFO controller (FIFO cntrl) with handshake signals Ri, Ao on the input side and Ro, Ai on the output side; STG with transitions Ri+, Ao+, Ri-, Ao-, Ro+, Ai+, Ro-, Ai-; implementation with C-elements]

A simple filter: specification

y := 0;
loop
    x := READ (IN);
    WRITE (OUT, (x+y)/2);
    y := x;
end loop
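The specification is a two-tap averaging filter. A direct Python transcription is shown here as a sketch: the blocking READ/WRITE channel operations are replaced by iteration over an input list and appending to an output list, which is an assumption made only to keep the example runnable.

```python
def simple_filter(samples):
    """Output the average of each sample and the previous one (y starts at 0)."""
    y = 0
    out = []
    for x in samples:             # x := READ(IN)
        out.append((x + y) / 2)   # WRITE(OUT, (x + y) / 2)
        y = x                     # remember the current sample
    return out


if __name__ == "__main__":
    print(simple_filter([2, 4, 6]))
```

For the input 2, 4, 6 the outputs are 1.0, 3.0 and 5.0.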

[Figure: filter block with data ports IN and OUT and handshake signals Rin, Ain, Rout, Aout]

J. Cortadella - Introduction to asynchronous circuit design: specification and synthesis


A simple filter: block diagram

[Figure: datapath with latches x and y and an adder (+) between IN and OUT; a control block with external handshakes (Rin, Ain), (Rout, Aout) and internal handshakes (Rx, Ax), (Ry, Ay), (Ra, Aa)]

• x and y are level-sensitive latches (transparent when R=1)

• + is a bundled-data adder (matched delay between Ra and Aa)

• Rin indicates the validity of IN

• After Ain+ the environment is allowed to change IN

• (Rout,Aout) control a level-sensitive latch at the output

A simple filter: control spec.

[Figure: the filter block diagram together with the STG of the control block, built from the transitions Rin+/Ain+/Rin-/Ain-, Rx+/Ax+/Rx-/Ax-, Ry+/Ay+/Ry-/Ay-, Ra+/Aa+/Ra-/Aa- and Rout+/Aout+/Rout-/Aout-]

A simple filter: control impl.

[Figure: the control STG and a resulting implementation containing a C-element, connecting Rin, Ain, Rx, Ax, Ry, Ay, Ra, Aa, Rout, Aout]

Taking delays into account

[Figure: STG with transitions x+, x-, y+, y-, z+, z- and a gate-level circuit with signals x, y, z, x’, z’]

Delay assumptions:

• Environment: 3 time units

• Gates: 1 time unit

events:  x+  x’-  y+  z+  z’-  x-  x’+  z-  z’+  y-
time:    3   4    5   6   7    9   10   12  13   14

Taking delays into account

[Figure: the same STG and circuit as on the previous slide, with one gate marked "very slow" and the trace ending in "failure !"]

Delay assumptions: unbounded delays

events:  x+  x’-  y+  z+  x-  x’+  y-
time:    3   4    5   6   9   10   11

Gate vs wire delay models

• Gate delay model: delays in gates, no delays in wires

• Wire delay model: delays in gates and wires


Delay models for async. circuits

• Bounded delays (BD): realistic for gates and wires.

Technology mapping is easy, verification is difficult

• Speed independent (SI): Unbounded (pessimistic) delays for gates and “negligible” (optimistic) delays for wires.

Technology mapping is more difficult, verification is easy

• Delay insensitive (DI): Unbounded (pessimistic) delays for gates and wires.

DI class (built out of basic gates) is almost empty

• Quasi-delay insensitive (QDI): Delay insensitive except for critical wire forks (isochronic forks).

Formally, it is the same as speed independent

In practice, different synthesis strategies are used

[Figure: relation between the circuit classes BD, SI / QDI and DI]

Desynchronization - concept

• Start with synchronous design

• Replace clock with local handshake

• Use standard CAD tools

• Does not change datapath

• Guaranteed correctness

* Eyal Friedman, Desynchronization - From Synchronous to Asynchronous design, Seminar in VLSI Architecture, Technion, Israel, Spring 2008

Desynchronization - flow steps

• Main assumptions:

Normal Combinatorial logic, DFF

single clock

single clock edge

[Figure: D-FF, CL, D-FF, CL, D-FF chain driven by a common CLK]

Desynchronization flow step #1

• Replace DFF by M+S latches

[Figure: the D-FF chain before and after each D-FF is replaced by a master (M) and slave (S) latch pair; both versions are driven by CLK]

Desynchronization flow step #2

• Add matched delays

• Respect bundling assumption

Delay > Tpd of CL

Delay serves as completion signal
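The condition "Delay > Tpd of CL" can be turned into a small sizing helper. This is only an illustration: the function name, the idea of building the delay line from identical cells, and the 10 % safety margin are all assumptions, not part of the desynchronization flow described here.

```python
import math


def matched_delay_cells(t_pd, cell_delay, margin=0.1):
    """Number of delay cells so that the total delay exceeds t_pd by a margin.

    t_pd and cell_delay must use the same time unit (for example ns).
    """
    if t_pd <= 0 or cell_delay <= 0 or margin <= 0:
        raise ValueError("t_pd, cell_delay and margin must be positive")
    return math.ceil(t_pd * (1.0 + margin) / cell_delay)


if __name__ == "__main__":
    cells = matched_delay_cells(t_pd=2.0, cell_delay=0.25)
    print(cells, cells * 0.25)
```

A real flow would take the worst-case propagation delay from timing analysis and would also account for process, voltage and temperature variation when choosing the margin.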

[Figure: the M/S latch chain before and after a matched delay is added in parallel with each CL block; both versions are still driven by CLK]

Desynchronization flow step #3

• Replace clock by local handshake controllers

[Figure: the M/S latch chain with matched delays, before and after CLK is replaced by one local handshake controller (ctrl) per latch]

Why Asynchronous Design?

• We are used to synchronous design

Logic and timing assumptions are simpler, but not true in reality

Currently it is very hard to solve the big problems of synchronous design, such as clock skew, high power consumption, process variability ...

• Common arguments for asynchronous design:

Low power ?

High speed ?

Low emission ?

Low sensitivity to PVT (Process, Voltage, Temperature) variations ?

High modularity (SoC) ?

No clock distribution and timing problems (works) ?

Secure chips ?

Why not Asynchronous Design?

• Overhead (area, speed, power)

• Hard to design

Non-decomposable to small combinatorial logic blocks

Converting synchronous design to asynchronous typically fails

• Few CAD tools

There is no real complete design flow available

There is only one commercial async EDA vendor available (Handshake Solutions), with a very specific design flow (HASTE)

• Hard to test

Asynchronous test methods are not present yet (or not mature enough), and it is difficult to go into any production without proper testing

Available tools

• There are several tools available for automation of Asynchronous Design

• Most tools are developed at universities

• Two groups of tools: for synthesis of asynchronous controllers and for design of complete systems

• Group I (controller synthesis): Minimalist, Petrify, 3D

• Group II (system design): BALSA, TAST, HASTE

Minimalist

• Developed at Columbia University

• “burst-mode” synthesis package

• based on synthesis of asynchronous FSMs

• integrates synthesis, testability and verification tools

• Good side

Produce Hazard-free control circuits

Contains several different algorithms for synthesis

Can provide generalized C-element based mapping and also behavioral Verilog

• Bad side

Doesn’t support arbitration and EBM

No optimal algorithm selection


Petrify

• Designed by J. Cortadella, M. Kishinevsky, A. Kondratyev, L. Lavagno, A. Yakovlev

• Synthesis of Asynchronous controllers defined as Petri Nets or Signal Transition Graphs (STG)

• Good side

Produce optimal Hazard-free control circuits

Can provide generalized C-element based mapping, complex-gate mapping and mapping to the technology libraries

• Bad side

Supports only asynchronous design, not mixed sync-async

With increased number of signals, synthesis time grows exponentially

Suitable for relatively small controllers


3D

• Produced by Kenneth Yun

• “Extended Burst-Mode” synthesis package

• Good side

Produce Hazard-free control circuits

Supports restricted multiple-input change (input burst) with don't-care inputs

Supports input choices based on sampling possibly glitchy signals

Suitable for mixed sync-async systems (like GALS)

• Bad side

No technology mapping

No optimal algorithm selection

No support and further development


TAST

• Produced by TIMA Laboratory, France

• TAST is compiler/synthesizer of Asynchronous digital circuits from high level communication description language

Input is CHP language, which can describe Petri Nets.

It is using VHDL as a format for behavioral and post synthesis simulation.

Produces QDI (dual-rail, 1-M code rail) circuits

• Good side

Produces complete asynchronous system and provides full design-flow

• Bad side

Uses QDI style, which gives very big area overhead

Gives not optimized output circuits

Not available at the moment

TAST Design flow


BALSA

• Produced by University of Manchester

• BALSA is a compiler/synthesizer of asynchronous digital circuits from a high-level communication description language

Input is BALSA language developed specially for this package

Produces Bundled data, Dual-rail, 1-M code rail circuits

• Good side

Produces complete asynchronous system and provides full design-flow

• Bad side

Gives large overhead compared with manual design (up to 300 %)

All tools are not freely available


BALSA Design Flow


Asynchronous Success Stories - Philips

Philips developed its own full design flow based on TANGRAM language

Design flow also contains design for testability

Asynchronous Demonstrators

DCC error corrector - 1993-1994 - Low Power

80C51 - 1995 - Low Power, Low EMI

Smartcards - 1998 - Low Power, Security

DCC error corrector    date    area [mm²]   power [mW]
synchronous            93      3.4          2.60
async (dual-rail)      93/05   7.0          0.41
synchronous            94      3.3          0.60
async (single rail)    94/09   3.9          0.08

Asynchronous Success Stories - Philips 80c51 (I)

• Application - Pager baseband controller

First asynchronous µC (microcontroller) ever on the market

• Motivations for asynchronous solution of 80c51

Low power

Low EMI for easy integration


Asynchronous Success Stories - Philips 80c51 (II)

• Low power issue

Circuit is only active when and where needed


Asynchronous Success Stories - Philips 80c51 (III)

• Low current peaks


Asynchronous Success Stories - Philips 80c51 (IV)

• Low EMI


Asynchronous Success Stories - RAPPID

• RAPPID - Revolving Asynchronous Pentium Processor Instruction-length Decoder

• Instruction Length Decoder was performance bottleneck in ca. 1995-vintage CISC processors

• Potential for optimization for common cases (RISC-like)

• Results

Developed a novel aggressive asynchronous method

About 3x throughput (T = 3x)

About one half latency (L = 2x)

About one half power (P = 2x)

About same area (A = 0.8x)

Namely, this is a T x L x P x A ≈ 10x improvement
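Reading each of the four figures as an improvement factor (an interpretation, since the slide does not spell out the arithmetic, and A = 0.8x is taken to mean a slightly larger area), the combined figure of merit works out as:

```latex
\[
T \times L \times P \times A \;=\; 3 \times 2 \times 2 \times 0.8 \;=\; 9.6 \;\approx\; 10
\]
```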


Asynchronous Success Stories - Amulet

• The Amulet group was formed at the University of Manchester

• Amulet1 (1994)

60000 transistors in 1.0 µm, ARM6 instruction set

Half instruction throughput with same energy efficiency as ARM6

• Amulet2e (1996)

450000 transistors in 0.5 µm, ARM7 compatible

Still half the performance of a synchronous chip

• Amulet3i (2000)

800000 transistors in 0.35 µm, ARM9 compatible

Same performance as synchronous solution with an equal or marginally better energy efficiency


Globally Asynchronous Locally Synchronous (GALS) Systems


GALS Technique

• GALS stands for Globally-Asynchronous Locally-Synchronous systems.

• GALS techniques have the potential to solve some of the most challenging design issues of SoC integration of communication systems.

GALS method

[Figure: three synchronous blocks, each enclosed in an asynchronous wrapper and attached to a network node; Data is exchanged using Req / Ack handshakes]

• GALS can be used on its own or within the NoC concept

GALS as a Powerful Design Technique

• In wireless communication systems, GALS can address the main design challenges.

• GALS makes data transfer between the blocks very easy.

• Design problems as timing closure or clock-tree generation are limited to the level of much smaller local blocks.

• Decoupling of local blocks from central clock source reduces spectral noise considerably.

• Power saving is automatically integrated in asynchronous wrapper.


Power reduction with GALS

[Figure: Power distribution in a high-performance CPU, split into datapath, memory, control and I/O, and clock]

• The clock signal is the dominant source of power consumption.

• First estimations showed that about 30% of power savings could be expected in the clock net due to the application of GALS.

• Recently, some more pessimistic power estimation figures were presented

• GALS techniques offer independent setting of frequency and voltage levels for each locally synchronous module.

• When using dynamic voltage scaling (DVS), an average energy reduction of up to 30% can be reached
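The effect of per-module voltage and frequency settings can be illustrated with the standard first-order CMOS model, in which dynamic power scales as C·V²·f. The following is a minimal sketch under that model only; the module parameters and the scaled supply value are hypothetical and are not taken from the measurements in these slides.

```python
def dynamic_power(c_eff, vdd, freq, activity=1.0):
    """First-order CMOS dynamic power: P = activity * C * Vdd^2 * f."""
    return activity * c_eff * vdd ** 2 * freq


def energy_per_cycle(c_eff, vdd, activity=1.0):
    """Dynamic energy per clock cycle: E = activity * C * Vdd^2."""
    return activity * c_eff * vdd ** 2


# Hypothetical locally synchronous module (illustrative numbers only):
# effective switched capacitance in farads, supply in volts, clock in hertz.
c_eff, vdd, freq = 1e-9, 2.5, 80e6
print("power at nominal supply [W]:", dynamic_power(c_eff, vdd, freq))

# Assume this module has timing slack and can run from a lower supply.
scaled_vdd = 2.0
saving = 1.0 - energy_per_cycle(c_eff, scaled_vdd) / energy_per_cycle(c_eff, vdd)
print(f"energy saving per cycle at {scaled_vdd} V: {saving:.0%}")
```

Because the energy per cycle depends on the square of the supply voltage, even a moderate voltage reduction in a module with slack gives a noticeable saving; the real figure depends on leakage, level shifters, and how often each module can actually be scaled.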


Potential for reducing EMI with GALS

• We simulated the noise generated on the power supply line in a synchronous system and in a request-driven GALS system.

[Figure: two simulated supply-noise spectra, one per system; vertical axis in dB (about -20 to -140), horizontal axis Frequency in GHz (0.5 to 4.5).]

• GALS introduces a reduction of about 20 dB.
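To put a decibel figure in perspective, it can be converted into a linear ratio. The slide does not state whether the spectrum is plotted as power or as amplitude, so this small sketch shows both conversions; the function names are illustrative.

```python
def db_to_power_ratio(db):
    """Linear power ratio for a level difference in dB (10 * log10 convention)."""
    return 10 ** (db / 10)


def db_to_amplitude_ratio(db):
    """Linear amplitude ratio for a level difference in dB (20 * log10 convention)."""
    return 10 ** (db / 20)


reduction_db = 20  # the reduction quoted for the simulated GALS system
print("power ratio:", db_to_power_ratio(reduction_db))
print("amplitude ratio:", db_to_amplitude_ratio(reduction_db))
```

Read as a power ratio, 20 dB is a factor of 100; read as an amplitude (voltage) ratio, it is a factor of 10.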


GALS Opportunities – 3D Integration

• 3D integration can be a very interesting application field.

[Figure: 3D-integrated system with Sensor, A/D, Memory, DSP, and Comm layers.]


GALS Opportunities - NoCs

• Another interesting application is Networks on Chip (NoCs) and MPSoCs (Multi-Processor Systems-on-Chip).

[Figure: Network on Chip in which master and slave IP cores attach through interfaces (IF) to a set of interconnected switches.]


GALS Opportunities – Process Scaling and Variability

• Asynchronous design gives average-case performance, in contrast to the worst-case performance of a synchronous system.

Variability of the threshold voltage (Vth) makes individual transistors faster or slower, and more or less energy consuming.

[Figure: Vth spread around VtNom for a 65 nm minimum-size transistor; Vth variability = ±30% (±3σ).]
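The average-case versus worst-case argument can be made concrete with a toy Monte Carlo experiment. This is only a sketch under strong assumptions: each operation's delay is drawn uniformly within ±30% of a nominal value (the mapping from Vth variation to delay is not given in the slides), and handshake overhead in the self-timed case is ignored.

```python
import random

NOMINAL = 1.0   # nominal delay of one operation (arbitrary units)
SPREAD = 0.30   # assumed relative delay variation, chosen for illustration


def sample_delay(rng):
    """Delay of one operation under the assumed uniform variation model."""
    return NOMINAL * (1.0 + rng.uniform(-SPREAD, SPREAD))


rng = random.Random(0)          # fixed seed so the run is repeatable
n_ops = 10_000
delays = [sample_delay(rng) for _ in range(n_ops)]

# Synchronous: the clock period must cover the slowest possible operation.
sync_time = n_ops * NOMINAL * (1.0 + SPREAD)

# Self-timed: each operation finishes as soon as its own delay has elapsed.
async_time = sum(delays)

print("worst-case / average-case time ratio:", sync_time / async_time)
```

Under these assumptions the self-timed total tracks the mean delay while the clocked total tracks the maximum; in a real design the gain is reduced by completion-detection and handshake costs.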


GALS Methods

• GALS based on synchronizers

• GALS based on asynchronous FIFOs

• GALS based on pausible clocking


GALS with the Synchronizers

[Figure: a handshake converter placed between a clockless domain and a clocked domain, translating between a 2-phase and a 4-phase req/ack handshake; signals shown: req, ack, data, clock.]


GALS with FIFOs

[Figure: Locally Synchronous Module 1 (Clock 1) and Locally Synchronous Module 2 (Clock 2) connected through a FIFO; FIFO interface signals: Wr_clk, Wr_en, Rd_clk, Rd_en, Rd_valid, full, empty, and Data on both sides.]
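The role of the FIFO interface signals can be sketched with a small behavioural model. This is a minimal single-threaded illustration of the full/empty/Rd_valid semantics only; it does not model two independent clocks, pointer synchronisation, or metastability, and the class and method names are hypothetical.

```python
from collections import deque


class FifoModel:
    """Behavioural FIFO: writes are refused when full, reads when empty."""

    def __init__(self, depth):
        self.depth = depth
        self._items = deque()

    @property
    def full(self):
        return len(self._items) >= self.depth

    @property
    def empty(self):
        return not self._items

    def write(self, wr_en, data):
        """One Wr_clk edge: store data if Wr_en is set and the FIFO is not full."""
        if wr_en and not self.full:
            self._items.append(data)
            return True
        return False

    def read(self, rd_en):
        """One Rd_clk edge: return (rd_valid, data)."""
        if rd_en and not self.empty:
            return True, self._items.popleft()
        return False, None


fifo = FifoModel(depth=2)
accepted = [fifo.write(True, word) for word in ("a", "b", "c")]
print("accepted:", accepted, "full:", fifo.full)
results = [fifo.read(True) for _ in range(3)]
print("reads:", results, "empty:", fifo.empty)
```

The writer only has to watch `full` and the reader only `empty`, which is what lets the two modules run from unrelated clocks in the hardware version.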


Asynchronous wrappers

• A GALS system usually contains synchronous islands that communicate with each other through asynchronous wrappers.

• An asynchronous wrapper surrounds each locally synchronous island.

The wrapper consists of a pausable clock and input and output ports.


Classical Pausible Clocking GALS approach

[Figure: two locally synchronous modules, each inside its own asynchronous wrapper (Asynchronous Wrapper 1 and 2) with a local clock generator; the output port of Module 1 passes Data to the input port of Module 2 via a handshake, and the ports act on the clock generators through stretch1 and stretch2.]

• Published in Jens Muttersbach et al., Globally-Asynchronous Locally-Synchronous Architectures to Simplify the Design of On-Chip Systems, In Proc. of ASIC/SOC Conference, pp. 317-321, Sept. 1999.


Pausable Clock Generator

[Figure: pausable clock generator consisting of an ARBITER, an element labeled C, and a DELAY LINE made of DELAY SLICE stages (controls cc1, cc2, ..., ccn; slice ports fin, fout, bin, bout); signals: REQI1/2, ACKI1/2, STOPI, LCLK, RCLK, RCLKD, clk_grant, rclk, rclkd.]


Main challenges of the typical GALS methods

• In many solutions, data transfer throughput is a critical problem.

Most of them can transfer data only every second cycle of the local clock.

• Some described circuits can theoretically transfer data every clock cycle.

However, intensive stretching of the pausable clock generator will significantly diminish the practical performance.

• The latency of the transferred data is not known in advance and may vary significantly from one transfer to the next.

• It is not very practical to use ring oscillators for local clock generation.

• All solutions are oriented towards a very general application.

They are not optimised for specific systems and environmental demands.


Basic concept of the request-driven operation

• This approach covers point-to-point communication with very intensive but bursty data transfer.

• While receiving an input burst, a GALS block can operate in request-driven mode.

• When there is no input activity, the data stored inside the locally synchronous pipeline has to be flushed out.

A local clock generator then drives the GALS block.

• A time-out function controls the transition from request-driven operation to local clock generation mode.
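The mode switching described above can be sketched as a slot-by-slot behavioural model. It is a simplification with hypothetical parameters: time is divided into equal slots, `timeout` counts idle slots before the local clock takes over, and every request is assumed to refill the pipeline so that `pipeline_depth` local clock pulses are needed to flush it.

```python
def clock_sources(requests, timeout, pipeline_depth):
    """Return, for each time slot, what clocks the block.

    requests: iterable of booleans, True when an input request arrives.
    timeout: number of idle slots tolerated before the local clock starts.
    pipeline_depth: local clock pulses needed to flush the stored data.
    """
    sources = []
    idle = 0        # consecutive slots without a request
    to_flush = 0    # pulses still needed to empty the pipeline
    for req in requests:
        if req:
            idle = 0
            to_flush = pipeline_depth   # simplification: pipeline holds data again
            sources.append("request")   # request-driven mode
        else:
            idle += 1
            if idle > timeout and to_flush > 0:
                to_flush -= 1
                sources.append("local")  # local clock generation mode
            else:
                sources.append("none")   # waiting for time-out, or idle
    return sources


burst = [True] * 3 + [False] * 6
trace = clock_sources(burst, timeout=2, pipeline_depth=2)
print(trace)
```

In this model the block receives no clock at all once the pipeline is empty, which is the behaviour that makes the scheme resemble clock gating.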


Request-driven asynchronous wrapper

• The local clock can be generated either internally or externally.

[Figure: asynchronous wrapper around a locally synchronous module, containing an input port, an output port, time-out detection, and local clock generation; handshake signals and Data on both sides; the module is clocked either by the request-driven clock or by the local clock.]


What can we gain from this GALS technique?

• Reliable and fast transfer of large bursts of data is achieved. Data transfer is possible in every clock cycle of the synchronous block.

• In request-driven mode there is no arbitration in the input port, so the circuit responds immediately to input requests.

• The clock speed is determined by the master and not by the slower participant in the communication.

• The local clock can be generated internally or externally.

• The proposed architecture offers an efficient power-saving mechanism, similar to clock gating.

• EMI should be reduced due to varying delays and frequencies in different asynchronous wrappers.


Building the wrapper components - input port

[Figure: input port built around the INPUT CONTROLLER; signals: REQ_A1, ACK_A, REQ_INT, ACK_INT, REQI1, ACKI1, ACKEN, ACKC, RST, ST, STOP.]

• The input port has to control the dataflow according to a ‘broad’ 4-phase handshake protocol.

• The input port consists of a speed-independent (SI) input controller along with a few additional gates that provide glitch-free transitions of the input signals.
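The event order of one 4-phase (return-to-zero) transfer can be written down as a short trace. The sketch below assumes the usual meaning of 'broad' from the asynchronous-design literature, namely that data stays valid from the rising request until the falling acknowledge; the slides do not define the term, and the function name is illustrative.

```python
def four_phase_transfer(data):
    """Return the event trace of one 4-phase transfer of `data`.

    Each entry is (event, req, ack, data_on_bus); data_on_bus is None
    once the sender is free to change the data lines.
    """
    trace = []
    req, ack = 0, 0
    req = 1                                # sender has placed data, raises req
    trace.append(("req+", req, ack, data))
    ack = 1                                # receiver has latched data, raises ack
    trace.append(("ack+", req, ack, data))
    req = 0                                # sender returns req to zero
    trace.append(("req-", req, ack, data))  # 'broad': data still held valid
    ack = 0                                # receiver returns ack to zero
    trace.append(("ack-", req, ack, None))  # data may change from here on
    return trace


for event in four_phase_transfer(0xA5):
    print(event)
```

Both wires return to zero at the end of the transfer, so every transfer starts from the same idle state.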


Input controller specification

[Figure: state graph of the input controller with states 0 to 9, grouped into request-driven mode, local clock generation mode, transitional mode, and idle mode. Arc labels (input burst / output burst), in extraction order:
ACKC-, ST+ /
REQ_A1+ / REQI1+
REQ_A1+ / REQ_INT+, RST+, ACK_A+
ACKC+, REQ_A1- / REQ_INT-, RST-, ACK_A-
ACKC-, REQ_A1+ / REQ_INT+, RST+, ACK_A+
STOP+ / RST+
STOP-, ST- / RST-
ACKI1+ / ACKEN+, REQI1-
ACKC- / ACK_A+, RST+
ACKI1-, ACKC+ /
REQ_A1-, ST- / ACK_A-, RST-, ACKEN-
REQ_A1+ / REQ_INT+, RST+, ACK_A+
ST+ /]

• The input controller is modeled as an AFSM (asynchronous finite state machine).

• The controller is specified according to burst-mode requirements.

• The burst-mode AFSM is implemented as a ‘Huffman machine’ without explicit latches.

[Figure: Huffman machine: a hazard-free combinational network with inputs A, B, C, outputs X, Y, Z, and a fed-back state of several bits.]


Input controller implementation

• The burst-mode input controller is synthesized using the 3D tool, which supports 2-level hazard-free logic minimization and achieves optimal state assignment:

REQ_INT = REQ_A1 REQ_INT + ACKC' REQ_INT + REQ_A1 ACKC' ST' ACKEN'

ACK_A = ACKC' REQ_INT + REQ_A1 RST + ACKC' ST ACKI1' ACKEN Z0' + REQ_A1 ACKC' ST' ACKEN'

ACKEN = ACKI1 + REQ_A1 ACKEN + ST ACKEN

RST = STOP + ACKC' REQ_INT + REQ_A1 RST + ST RST + ACKC' ST ACKI1' ACKEN Z0' + REQ_A1 ACKC' ST' ACKEN'

REQ_I1 = REQ_A1 ST ACKI1' ACKEN'

Z0 = ACKI1 + REQ_A1' ACKC + REQ_A1' ST' Z0 + ACKC' ACKEN Z0 + ACKC ACKEN' Z0

• The logic equations are automatically converted into synthesizable structural VHDL code with our 3DC tool.

• Formal analysis of the asynchronous wrapper is performed.
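The sum-of-products equations above can be evaluated directly in software to check individual transitions. The sketch below transcribes them one-to-one; the helper name `next_outputs` and the choice of an all-zero starting state are assumptions made for illustration, and a single evaluation says nothing about hazards or timing.

```python
def next_outputs(s):
    """Evaluate the controller equations once for a dict of 0/1 signal values."""
    def n(x):
        return 1 - x  # complement, written as ' in the equations

    REQ_A1, ACKC, ST, STOP = s["REQ_A1"], s["ACKC"], s["ST"], s["STOP"]
    ACKI1, ACKEN, REQ_INT = s["ACKI1"], s["ACKEN"], s["REQ_INT"]
    RST, Z0 = s["RST"], s["Z0"]

    # Product terms shared by several equations.
    start = REQ_A1 & n(ACKC) & n(ST) & n(ACKEN)
    local = n(ACKC) & ST & n(ACKI1) & ACKEN & n(Z0)

    return {
        "REQ_INT": (REQ_A1 & REQ_INT) | (n(ACKC) & REQ_INT) | start,
        "ACK_A": (n(ACKC) & REQ_INT) | (REQ_A1 & RST) | local | start,
        "ACKEN": ACKI1 | (REQ_A1 & ACKEN) | (ST & ACKEN),
        "RST": STOP | (n(ACKC) & REQ_INT) | (REQ_A1 & RST) | (ST & RST)
               | local | start,
        "REQ_I1": REQ_A1 & ST & n(ACKI1) & n(ACKEN),
        "Z0": ACKI1 | (n(REQ_A1) & ACKC) | (n(REQ_A1) & n(ST) & Z0)
              | (n(ACKC) & ACKEN & Z0) | (ACKC & n(ACKEN) & Z0),
    }


SIGNALS = ["REQ_A1", "ACKC", "ST", "STOP", "ACKI1", "ACKEN", "REQ_INT", "RST", "Z0"]
all_low = dict.fromkeys(SIGNALS, 0)          # assumed starting point

after_request = next_outputs({**all_low, "REQ_A1": 1})
print(after_request)
```

With everything low and REQ_A1 raised, the equations assert REQ_INT, RST, and ACK_A, which agrees with the arc label REQ_A1+ / REQ_INT+, RST+, ACK_A+ in the specification.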


VHDL description of a port

-- Inverters producing the complemented signals
UN1: inv1x port map (ackc, t3);
UN2: inv1x port map (st, t4);
UN3: inv1x port map (clk1, t5);
UN4: inv1x port map (req, t6);
UN5: inv1x port map (ackeni, t7);
UN6: inv1x port map (endi, t8);
UN7: inv1x port map (z0, t9);
UN8: inv1x port map (z1, t10);
UN8i: inv1x port map (dvsi, t11);

-- Logic driving reqcix
U6: and2ix port map (reqci, ackc, t1);
U7: and2x port map (req, reqci, t28);
U8: and4x port map (req, t3, t4, t9, t12);
U9: or3x port map (t1, t28, t12, reqcix);

-- Logic driving ackix
U7i: and2x port map (req, reseti, t2);
U7ii: and2x port map (st, acki, t31);
U13: and3x port map (req, t3, z0, t13);
U14: or5x port map (t1, t13, t12, t2, t31, ackix);

-- Logic driving ackenix
U10: and2x port map (ackc, ackeni, t14);
U12: and2x port map (t9, ackeni, t15);
U15: or3x port map (t15, t14, clk1, ackenix);

-- Logic driving resetix
U11: and3x port map (st, t3, z0, t16);
U19: or5x port map (endi, t1, t2, t12, t16, resetix);

-- Logic driving reqiix
U17: and2x port map (t7, t9, t17);
U18: and3x port map (req, st, t5, t18);
U20: and2x port map (t18, t17, reqiix);

-- Logic driving the state bit z0x
U25: and2x port map (req, z0, t22);
U26: and2x port map (st, z0, t23);
U23: and3x port map (ackc, t5, ackeni, t21);
U27: or4x port map (t21, t22, t23, endi, z0x);

-- Logic driving the state bit z1x
U28: and2x port map (t6, ackc, t24);
U29: and2x port map (ackc, z1, t25);
U30: and3x port map (t6, t4, z1, t26);
U32: or3x port map (t25, t26, t24, z1x);

-- Example gate cell: 2-input AND with a 100 ps delay
library ieee;
use ieee.std_logic_1164.all;

entity and2x is
  port (a, b: in std_logic; c: out std_logic);
end and2x;

architecture struc of and2x is
  -- DONT_TOUCH_NETWORK is a synthesis-tool attribute; its declaration is
  -- assumed to come from the tool's attribute package.
  attribute DONT_TOUCH_NETWORK of a, b, c: signal is true;
begin
  c <= (a and b) after 100 ps;
end struc;


Externally-driven GALS Wrapper

[Figure: externally-driven asynchronous wrapper around a LOCALLY SYNCHRONOUS MODULE, with INPUT PORT, OUTPUT PORT, TIME-OUT DETECTION, and a CMU fed by an external clock; handshake signals, Data_in, and Data_out; the module is clocked by the request-driven clock or the externally generated clock. Legend: adapted block, reused blocks.]


Clock Management Unit

[Figure: clock management unit built from three MUTEX elements (M1, M2, M3), three elements labeled C (C1, C2, C3), and the gates AND2, OR2, and INV1; signals: ECLK, external_clock, REQI1, ACKI1, Stretch, STOPI, clk_grant, ste, cg, sti.]


Baseband processor for WLAN

• The goal of one of our projects was to develop a wireless broadband communication system in the 5 GHz band.

• The modem is compliant with the IEEE 802.11a WLAN standard.

• The system uses Orthogonal Frequency Division Multiplexing (OFDM) with data rates ranging from 6 to 54 Mbit/s.

• The synchronous baseband processor was implemented as an ASIC (700k gates).


Structure of the synchronous baseband processor

• The baseband processor includes a receiver and a transmitter datapath.

• Very complex blocks are implemented, such as a Viterbi decoder, FFT, IFFT, and CORDIC processors.

[Figure: block diagram of the baseband processor. Transmitter: input buffer, scrambler, signal field generator, encoder, interleaver, mapper, pilot insertion, pilot scrambler, IFFT, guard interval insertion, preamble insertion. Receiver: synchronizer datapath, synchronizer tracking, FFT, channel estimator, demapper, deinterleaver, Viterbi decoder, encoder, interleaver, mapper, descrambler, parallel converter, buffer 20 - 80, buffer 80 - 20. Legend: 80 Msps block, 20 Msps block.]


Design challenges in the baseband processor

• The design of the baseband processor involves challenges such as:

- several clock domains,

- global clock tree generation,

- large number of clock leaves (36 k flip-flops),

- clock skew handling,

- timing closure between the different modules,

- clock gating,

- power consumption,

- EMI.

• A request-driven GALS architecture was developed as a possible solution to those problems.


GALS partitioning

[Figure: the baseband processor block diagram partitioned into the GALS blocks Tx_1, Tx_2, Tx_3, Rx_1, Rx_2, Rx_3, and Rx_TRA, together with Tx_int and Rx_int (async-sync interfaces), token rate adaptation, FIFO TA, and the activation interface. Legend: 80 Msps block, 20 Msps block, rate adaptation block, interface block.]

• The partitioning process has to take into account possible power saving.


Test strategy

• We are using a hardware tester which is strictly cycle based and cannot react to asynchronous output signals of the circuit.

• The GALS arbitration processes preclude cycle level determinism.

• We want to be able to run very complex functional tests internally.

• The applied test technique should support system diagnosis.

• A test strategy based on Built-In Self-Test (BIST) is proposed.

• BIST reduces the effort for generating a test program and enables us to use a synchronous tester.


Design for Testability in GALS

• TPG and TDE are based on the linear feedback shift register structure with embedded additional logic.

• A central BIST controller performs control of the test procedure.

• We can run hierarchical tests.

• This BIST technique can be used as a method for prototype verification.

• In combination with the scan approach, BIST can even be used as a basis for the manufacturing test.
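The linear feedback shift register idea behind the TPG and TDE units can be sketched in a few lines. This is a generic 4-bit illustration, not the structure used on the chip: the width, tap positions, and seed are assumptions, the 'embedded additional logic' mentioned above is omitted, and treating the evaluator as a serial signature register is also an assumption.

```python
def lfsr_patterns(seed, taps, width, count):
    """Fibonacci LFSR: yield `count` successive states as test patterns.

    taps: bit positions (0 = LSB) XORed together to form the feedback bit.
    """
    mask = (1 << width) - 1
    state = seed & mask
    for _ in range(count):
        yield state
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask


def signature(response_bits, taps, width):
    """Serial-input signature register: compress a bit stream into one word."""
    mask = (1 << width) - 1
    sig = 0
    for bit in response_bits:
        feedback = bit & 1
        for t in taps:
            feedback ^= (sig >> t) & 1
        sig = ((sig << 1) | feedback) & mask
    return sig


TAPS = (3, 2)   # assumed tap choice; it cycles through all 15 non-zero states
patterns = list(lfsr_patterns(seed=0b0001, taps=TAPS, width=4, count=15))
print([format(p, "04b") for p in patterns])

good = [1, 0, 1, 1, 0, 0, 1, 0]
faulty = [1, 0, 1, 0, 0, 0, 1, 0]   # one response bit flipped
print(signature(good, TAPS, 4), signature(faulty, TAPS, 4))
```

A central controller then only has to start the generator, wait, and compare the final signature with a stored reference, which fits a strictly cycle-based tester.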

[Figure: BIST structure of the GALS baseband processor. Blocks: Tx_1, Tx_2, Tx_3, Tx_int, Rx_1, Rx_2, Rx_3, Rx_TRA, Rx_int, FIFO_TA, and the activation interface, with test pattern generators TPG0 to TPG4, evaluators TDE0 to TDE10, and a BIST internal loop.]


Design flow

• We used the IHP 0.25 µm CMOS process.

• An asynchronous wrapper is equivalent to about 1.3 k inverter gates.

The tunable clock generation alone accounts for 0.9 k gates.

• The asynchronous wrapper has a throughput of up to 150 Msps in request-driven mode and 100 Msps in local mode.

This application needs 80 Msps.

[Figure: design flow. Asynchronous wrappers: AFSM specification, 3D logic synthesis, 3DC tool (translation from 3D to structural VHDL). Synchronous blocks: functional specification, VHDL description. Further steps shown: abstract behavioural simulation, gate mapping, realistic behavioural simulation, timing-driven synthesis, post-synthesis simulation, layout, back annotation, tape-out. Tools shown: Synopsys DC, Cadence Silicon Encounter, Model Sim, Prime Power (power estimation), LoLA (formal analysis).]


Area and power distribution

• Area and power statistics are based on the synthesized netlist data.

Locally synchronous blocks occupy around 90% of the total area, the BIST circuitry requires around 3.5%, interface blocks 2.9%, and asynchronous wrappers 2%.

• Power was estimated with the Prime Power tool, based on the switching activities in a realistic transceiver scenario.

Synchronous datapath logic uses most of the power (around 52.4%), followed by the local synchronous clock trees with 34.5%, async-to-sync interfaces with 7%, and asynchronous wrappers with 2.9%.

• After layout, the estimated power consumption is 324.6 mW.


Implementation results

• Our GALS baseband processor has been fabricated and tested.

• The total number of pins is 120, and the silicon area including pads is 45.1 mm².

• The measured dynamic power dissipation was 332 mW for the purely synchronous baseband processor and slightly lower, 328 mW, for the GALS baseband processor.

[Figure: fabricated chip with the Receiver and Transmitter regions labeled.]


Improving System Integration with GALS

• Synchronous baseband processor challenges and how the GALS architecture addresses them:

- several clock domains: solved by the GALS architecture,

- global clock tree generation: no global clock in GALS,

- large number of clock leaves: clock leaves distributed over GALS blocks,

- clock skew handling: clock skew is reduced from 660 ps to 486 ps,

- timing closure between blocks: communication between the blocks through handshaking,

- clock gating: clock gating embedded in the asynchronous wrapper.


EMI measurement (I)

• The supply voltage variation spectrum of the inner processor core is measured.

[Figure: measured supply voltage spectra of the synchronous baseband processor and the GALS baseband processor; vertical axis in dB (0 to -70), horizontal axis in MHz (0 to 500); the annotated difference is about 5 dB.]


EMI measurement (II)

• Additionally, instantaneous cycle-to-cycle supply voltage peaks are reduced from 140 mV (synchronous design) to less than 100 mV (GALS).

• This reduction can be very important for mixed-signal designs and for secure systems.

• An application with fine-grained GALS partitioning can lead to results closer to the theoretical maximum reduction.


Conclusions

• There are several asynchronous designs currently on the market

Asynchronous design is used with the greatest success in medium-complexity, medium-performance circuits

• Future applications

GALS, large networks on chip (NoCs)

3D integration

Some local blocks in a GALS system could then be asynchronous

Asynchronous circuitry can provide lower EMI for SoCs

• The design and test flow remains a problem


Synchronous and GALS Networks on Chips


Synchronous and GALS NoCs

• Today on-chip design is more and more communication-centric

• Classical topologies are not sufficient (point-to-point, mesh, bus, etc.)

• Shared bus = low performance

Bandwidth is shared

Bus width (bits) relatively small

Global clock frequency limited

• Disadvantage of multiple buses

Not scalable, not generic

• A promising alternative could be Networks on Chip (NoCs)

• NoCs can be implemented completely synchronously, mesochronously, or in GALS fashion


NoC Paradigm

• Apply network protocols to SoCs

• Network:

Provides communication

Satisfies quality-of-service requirements:

Reliability

Performance: throughput, latency, ...

Power ?

• Additional requirements unique to NoC

Energy bounds

Area

Fit it to the standard design flow


Switching Network Basics

• Transport layer: message end-to-end

Implemented using network adapters

Assembly and disassembly of the packets at source/destination

• Network layer: packet end-to-end

Implemented using routers

Routers decide the routing path to the destination, based on the header of the packet and on topology knowledge

Scalable distributed system: load shared between routers

• Data-link layer: packet over link

Packets: header, payload, trailer

Error correction (on packet): redundancy, error correction codes
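How a router can combine the packet header with topology knowledge is easiest to see on a regular topology. The sketch below uses dimension-ordered (XY) routing on a 2D mesh, chosen here only as a simple, well-known example; the slides do not prescribe a routing algorithm, and the packet layout is hypothetical.

```python
def xy_route(src, dst):
    """Dimension-ordered routing on a 2D mesh: travel along X first, then Y.

    src, dst: (x, y) router coordinates. Returns the list of routers visited.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


# Hypothetical packet: the header carries the destination coordinates.
packet = {"header": {"dst": (2, 1)}, "payload": b"data", "trailer": None}
route = xy_route((0, 0), packet["header"]["dst"])
print(route)
```

Each router only needs the destination field and its own coordinates to pick the next hop, which is why the routing load is shared between routers rather than concentrated in a central arbiter.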

* Technion - Asynchronous NoC - Nikolai Samolazov


Bus vs. Network Arguments

Criterion | BUS | NoC
Scalability | Every IP adds parasitic capacitance | Only P2P connections
Scalability | Timing is difficult | Can be pipelined
Scalability | Bus arbiter performance | Load shared by routers
Bandwidth | Limited and shared by all IP | Scales with network size
Latency | Zero when granted control | Network latency always exists
Cost | Low area | Significant area
Design complexity | Simple: well known and understood | Requires changes in HW and sometimes SW levels


Hybrid Network

• Shared buses as the first-level communication medium

• NoC routers as main communication devices


Homogeneous NoC

[Figure: NoC in which every node is an FU.]

* NoC General Concepts - Andreas Ehliar - Per Karlström


Heterogeneous NoC

[Figure: NoC combining FU nodes with MUL, ALU, and DSP nodes.]


Heterogeneous NoC

[Figure: NoC combining FU nodes with MUL, ALU, and DSP nodes.]


Quality of Service

• Guaranteed latency

• Guaranteed bandwidth

• Correctness


Design Issues - Architecture

[Figure: NoC architecture with nine FU nodes.]


NoC Design

• Architecture

Network Adapter and Router Architecture

- Asynchronous or synchronous

Network Topology

Routing Strategy

- Static Routing

- Adaptive Routing

Interconnect

- Repeaters

- Pipelining

• Design Technology

Tools and Methodologies

Simulation and validation (correctness, performance, power)

- SystemC


Design Issues - Flow Control


Design Issues - Long Wires

• Solving the global interconnect mess

Delay

Bit errors

Repeaters

Clock domains

• Create one optimized solution that can be reused


Design Issues - Long Wires

• Add flip flops to increase clock frequency

• What about ACKs?

[Figure: link between two NoC routers]


Design Issues - Long Wires

• Bit errors on long wires will not be avoidable in the future

• Use error correcting codes

Disadvantage: More wires, more throughput needed

• Use parity bits to discover errors

Resend damaged packets

No longer possible to guarantee real-time performance
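The parity-and-resend option above can be sketched as follows. This is only a minimal illustration, assuming one even-parity bit appended to each data word; the names `add_parity` and `check_parity` and the 8-bit word width are hypothetical and do not come from the slides. A single parity bit detects any odd number of flipped bits but misses an even number.

```python
def add_parity(data: int, width: int = 8) -> int:
    """Append one even-parity bit as the new least significant bit."""
    if not 0 <= data < (1 << width):
        raise ValueError("data does not fit in the given width")
    parity = bin(data).count("1") % 2
    return (data << 1) | parity


def check_parity(word: int) -> bool:
    """Return True if the received word contains an even number of 1 bits."""
    return bin(word).count("1") % 2 == 0


sent = add_parity(0b10110010)
corrupted = sent ^ (1 << 4)      # one bit flipped on the long wire
needs_resend = not check_parity(corrupted)
```

A receiver that sees `check_parity` fail would request the packet again, which is why the latency of a transfer can no longer be bounded in advance.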


Design Issues - Long Wires

• Possibility to create a heavily optimized solution

Low voltage signaling

Advanced symbol encoding/decoding

Wave pipelining


Design Issues - Long Wires

• High performance interconnect through wave pipelining

Need very careful analysis

[Figure: NoC routers connected by wave-pipelined links]


Design Issues - Long Wires

• Wave pipelining performance

3.45 GHz signaling on one bit line in 0.25 um

More energy efficient than regular pipeline

Faster than regular pipeline

• Disadvantage

Much harder to test/verify


Network Topologies

• Mesh

• Tree

• Fat-Tree

• Routing algorithm depends on topology


Routing

• Routing: path from source to destination.

Must: deadlock free, livelock free

Livelock: message proceeds indefinitely, but never arrives

Possible only in adaptive non-minimal routing

Deadlock: packets waiting for each other in a cycle

• Three main categories:

Static (non-adaptive): predetermined path

Minimal fully adaptive: routes through any shortest path

Partially adaptive:

multiple routing paths

Some paths not shortest
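As an illustration of the static (non-adaptive) category, the sketch below uses dimension-order (XY) routing on a 2D mesh. XY routing is not named on the slide; it is assumed here as a common example of a predetermined, minimal path, and the function name `xy_route` and the coordinate convention are hypothetical.

```python
def xy_route(src, dst):
    """Return the (x, y) router coordinates visited from src to dst.

    The packet first travels along X until the column matches,
    then along Y, so the path depends only on src and dst.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


route = xy_route((0, 0), (2, 1))
```

Because every hop moves closer to the destination, such a route is minimal and cannot livelock; an adaptive scheme would instead choose among several candidate next hops at each router.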


Wormhole Routing

• Header forwarded ASAP, without waiting for the trailer

• Used in high-performance parallel computing networks (lumped)

Not in the internet (distributed)

• Packet may span several routers

Packet divided into flits (atomic flow control units)

• Main Disadvantage: cascaded contention

Packet requests busy link

VLSI routers have small buffers, so a packet cannot be buffered in one router

Routers spanned by packet are stalled

Practical limitation, prevents achieving theoretical bandwidth
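The division of a packet into flits can be sketched as below. This is an illustrative model only: the `Flit` type, the one-byte destination in the head flit and the 4-byte flit size are assumptions, not details taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flit:
    kind: str       # "head", "body" or "tail"
    payload: bytes


def packetize(dest: int, data: bytes, flit_bytes: int = 4) -> list:
    """Split a packet into a head flit (routing info) and payload flits."""
    chunks = [data[i:i + flit_bytes] for i in range(0, len(data), flit_bytes)]
    if not chunks:
        chunks = [b""]
    flits = [Flit("head", bytes([dest]))]   # dest must fit in one byte here
    for i, chunk in enumerate(chunks):
        kind = "tail" if i == len(chunks) - 1 else "body"
        flits.append(Flit(kind, chunk))
    return flits


flits = packetize(3, b"abcdefghij")
```

Only the head flit carries routing information, so the body and tail flits must follow it through the same routers. This is why a blocked head stalls every router the packet currently spans.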


NoC Design Characteristics: Cost

• Area

Network components area

Wires, repeaters area

• Power

Energy per transmitted packet

Idle power


NoC Design Characteristics: Performance

• Latency [sec]

From header leaving source, to trailer reaching destination

Composed of waiting latency + network latency

Waiting Latency

Time message waits before entering the network

Network Latency

Time message travels inside the network

• Throughput [bits/sec]

Measured at network port

Average amount of user data that is accepted by the network on that port in a certain amount of time

• Aggregate Throughput [bits/sec]

Sum of the throughputs at all network ports
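The three metrics above reduce to simple arithmetic, sketched here with hypothetical helper names and made-up example numbers (times in seconds, throughput in bits per second).

```python
def message_latency(waiting_s: float, network_s: float) -> float:
    """Total latency = waiting latency + network latency."""
    return waiting_s + network_s


def port_throughput(accepted_bits: int, window_s: float) -> float:
    """Average user data accepted at one port over an observation window."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return accepted_bits / window_s


def aggregate_throughput(per_port_bps) -> float:
    """Sum of the throughputs measured at all network ports."""
    return sum(per_port_bps)


latency = message_latency(2e-9, 8e-9)
total = aggregate_throughput([port_throughput(1000, 0.5),
                              port_throughput(3000, 0.5)])
```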


NoC Saturation

• Offered Load

Traffic produced by network clients as percentage of maximal network bandwidth

L : number of cycles needed to accept the message, D : average number of cycles between messages

$$\mathrm{OL} = \frac{L}{L + D}$$

• Saturation Threshold:

Offered Load at which average latency rises exponentially to infinite value
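A small worked example of the offered-load definition, assuming it is the ratio L / (L + D) with L and D as defined above. The function name and the numbers are illustrative only.

```python
def offered_load(L: float, D: float) -> float:
    """Fraction of maximal bandwidth requested by a client.

    L: cycles needed to accept one message
    D: average idle cycles between messages
    """
    if L <= 0 or D < 0:
        raise ValueError("L must be positive and D non-negative")
    return L / (L + D)


light = offered_load(4, 12)   # 4-cycle messages, 12 idle cycles between them
full = offered_load(4, 0)     # back-to-back messages
```

With these values the client requests a quarter of the port bandwidth in the first case and all of it in the second.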


Cost - Performance Tradeoff

Santiago Gonzalez Pestana et al. “Cost-Performance Trade-offs in Networks on Chip: A Simulation-Based Approach”, DATE 2004


Architecture of On-Chip Router


•Technion, Asynchronous vs. Synchronous Design Techniques for NoCs

•Robert Mullins, Asynchronous vs. Synchronous Design Techniques for NoCs


Router Pipeline

• Numerous stages of Router Pipeline

• Raise communication latency

• Can make packet buffers less effective

• Incurs pipelining overheads


Synchronous NoCs - Summary

• Can design high-performance single cycle routers

• Design is simplified by presence of global synchrony

• Distribution of global clock can be eased by

New clock generation / distribution techniques

Source synchronous communication


Limitations of Fully-Synchronous Networks

1. Difficult to distribute clock

Network spread over die & may have irregular layout

Minimising skew costs complexity and power

• Alternatives/extensions to PLL and H-tree:

Clock deskewing techniques

Distributed Clock Generator (DCG).

Distributed PLLs

Standing-wave oscillators and rotary clock schemes

Resonant global clocks, optical clock distribution etc.


Limitations of Fully-Synchronous Networks

2. Single Network Clock Frequency

Communicating synchronous IP blocks may operate at different and potentially adaptive clock frequencies

What is most appropriate network clock frequency?


Why Asynchronous NoCs

• No clock distribution, simple solution

• Networked IP blocks run at different clock frequencies

No synchronization issues at interfaces

• Ability to exploit data / path-dependent delays

Low-latency common or high-priority paths through router

• Freedom to optimize network links

Not constrained by need to distribute/generate multiple clock frequencies. Can exploit high-frequency narrow links

Dynamic latency/throughput trade-offs (adaptive pipeline depth)

Exploit dynamic optimizations on links (e.g. DVS)

• Easy to use interfaces, modularity, robust and simple implementation, reduced design time

• Some arguments for reduced power


Different NoC Architectures

• Router clocks derived from a single source

• Locally Generated Clocks (periodic & free-running)

• Synchronous Routers with Asynchronous Links

• Locally Clocked Routers / Asynchronous Interconnect (GALS style network)

Can support asynchronous interconnects

No longer exploiting periodic nature of router clocks

Correct operation is independent of the delay of the link

• GALS interfaces with pausible clocks

If necessary clock is stretched, data is always transferred reliably

Need to construct local delay line

• Local aperiodic clock generation

• Data-Driven Local Clock

Similarities to stoppable GALS interface and asynchronous priority arbiters


Mesochronous Clocking

• Clock skew may force the system to be partitioned into multiple clock domains

• Can exploit the fact that only the phase of each router’s clock differs (single clock source), so simple, error-free clock-domain crossing is possible


Router clocks derived from a single source

• Each router’s clock may be generated from the global network clock, either by:

Clock division or

Clock multiplication

• Clock domain crossing techniques can exploit known clock frequency relationships

A. Chakraborty and M. Greenstreet, “Efficient Self-Timed Interfaces for Crossing Clock Domains”, In Proceedings ASYNC’03

L. F. G. Sarmenta, G. A. Pratt and S. A. Ward, “Rational Clocking”, ICCD’95


Using Synchronisers for GALS NoCs

• Asynchronous channel uses 4-phase bundled data protocol

A. Sheibanyrad, A. Greiner, Two efficient synchronous asynchronous converters well-suited for networks-on-chip in GALS architectures, 2005
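The event order of one 4-phase bundled-data transfer can be sketched as below. This models a push channel in which the sender drives the data before raising the request; the generator name and the tuple format are hypothetical, and the sketch does not reproduce the converter circuits from the cited paper.

```python
def four_phase_transfer(word):
    """Yield the signal events for one 4-phase (return-to-zero) transfer."""
    yield ("data", word)   # data must be stable before req rises (bundling)
    yield ("req", 1)       # sender: data is valid
    yield ("ack", 1)       # receiver: data has been latched
    yield ("req", 0)       # sender: return to zero
    yield ("ack", 0)       # receiver: return to zero, channel idle again


events = list(four_phase_transfer(0xA5))
```

Each transfer therefore costs four signal transitions on the request/acknowledge pair. A synchroniser at the clocked side has to sample `req` or `ack` safely before the synchronous logic may act on it.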


Locally Generated Clocks (periodic & free-running)

• Can exploit knowledge about clocks (when crossing clock domains) even if all we know is that they are periodic, examples:

predictive synchronizers [Dally][Frank/Ginosar]

asynchronous FIFOs [Chakraborty/Greenstreet]


Using Asynchronous FIFOs in GALS NoCs

• Synchronous network wrapper assembles/disassembles data packets

• Can connect many independent clock domains


NoC architecture for low power

• NoC concept together with GALS methodology gives good opportunities for power saving

• Each hardware block in a NoC system can be set to its optimal frequency/voltage

• Best is to combine DVFS with GALS concept in order to reduce power


NoC architecture for DVFS – LETI Solution (NoCs 2008)

• A fully asynchronous Network-on-Chip

• IP units are synchronous islands using programmable Local Clock Generator

• Within the IP unit

Synchronization is done thanks to Pausable Clock

A Power Unit manages internal Vcore generated using external Vhigh and Vlow

A Network Interface is in charge of

NoC communications

Local Power Management

• Main CPU in charge of global power management


DVFS with GALS NoCs

• Each synchronous IP is an independent power and frequency domain

• A local fine grain Dynamic Voltage Scaling:

Implementation of a local hardware controller to control transitions between Vhigh and Vlow

Ensures smooth DVS transitions for IP safe computation

• A local fine grain Dynamic Frequency Scaling:

Automatic frequency scaling

Use of clock generation re-programming to find the optimal V/F point of operation

• Thanks to pausable clock technique, IP unit continues its operation during DVFS phases

• GALS architecture and local clock generation is a natural enabler for easy local DVFS


NoC Unit architecture

• Each IP core encapsulated with

Network Interface

Test Wrapper

Pausable Clock

Power Supply Unit

• IP units have 5 supply modes

Init: reset at Vhigh (1.2V)

High: Vhigh supply

Low: Vlow supply (0.8V)

Hopping: switch Vhigh / Vlow for DVFS

Idle: retention state at Vlow (no clock)

Off: stand-by mode


Local Power Manager

• Local Power Manager handles unit power modes

• A set of registers, programmable through the NoC

• Configuration of

Programmable delay line

Power Supply Unit

• Pulse Width modulator used to control the Hopping mode


Power Supply Unit

• Power Supply Unit manages Vcore

• Two power switches, Thigh and Tlow (LVT transistors)

• A Hopping Unit

• An Ultra Cut-Off Generator


Hopping Unit

• Energy per operation scales with V²

Decrease Voltage (and Frequency) to be energy efficient

• «Triple state» power supply

Use of two PMOS power switches

Vhigh (1.2 V), Vlow (0.7 V), or OFF (0 V)

• Switch between Vhigh and Vlow

Transitions take less than 100 ns

Mean speed / mean power of the IP is programmed by a PWM

• Compatible with synchronous and asynchronous IPs

For GALS system: coordination done with local clock generator

• Can easily be integrated in any CMOS circuit

No inductor contrary to traditional DC/DC converters

No capacitor contrary to charge pump implementation
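The two quantitative ideas on this slide, energy per operation scaling with V² and a PWM setting the mean speed, can be illustrated with an idealized model. The helper names are hypothetical, the frequencies are made-up example values, and the model ignores leakage, transition time and any other effect, so it is not meant to reproduce measured figures.

```python
def energy_ratio(v_low: float, v_high: float) -> float:
    """Dynamic energy per operation at v_low relative to v_high (E ~ C * V^2)."""
    return (v_low / v_high) ** 2


def pwm_mean(duty_high: float, at_high: float, at_low: float) -> float:
    """Time-weighted mean when the PWM spends duty_high of the time at Vhigh."""
    if not 0.0 <= duty_high <= 1.0:
        raise ValueError("duty_high must be between 0 and 1")
    return duty_high * at_high + (1.0 - duty_high) * at_low


ratio = energy_ratio(0.7, 1.2)            # voltages quoted on the slide
mean_mhz = pwm_mean(0.25, 400.0, 200.0)   # frequencies are invented examples
```

Changing the PWM duty cycle moves the mean speed continuously between the two fixed operating points, even though only two supply rails are available.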


Ultra Cut-Off Generator

• When the gate is reverse-biased, the leakage current goes through a minimum

• The optimal bias point varies with the temperature, the supply voltage and the process corners

• The proposed UCO generator automatically biases the gate of the power switch to its point of minimum leakage

• Compensates for temperature variation and alleviates process-corner variations.

• The gate oxide reliability is considered by introducing a passive stress reduction mechanism


Pausable Clock Interface

• Temporarily pause the clock when a transfer (NoC) or a supply switch is required

• Based on

Two GALS ports: Synchronous-to-Asynchronous and Asynchronous-to-Synchronous

A programmable delay line

A pausable clock generator

• Pausable Clock Generator arbitrates pause requests
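A purely behavioural sketch of clock pausing follows. It assumes pause requests are known as time intervals and that a rising edge falling inside such an interval is postponed until the request is released. The function name and the interval format are hypothetical, and the arbitration and delay-line circuitry of the actual generator is not modelled.

```python
def pausable_clock_edges(period, pauses, n_edges):
    """Return the times of the first n_edges rising edges.

    pauses: list of (start, end) intervals during which the clock is held low.
    """
    edges = []
    t = period
    ordered = sorted(pauses)
    while len(edges) < n_edges:
        for start, end in ordered:
            if start <= t < end:
                t = end            # stretch the cycle until the pause ends
        edges.append(t)
        t += period                # the next cycle restarts from the stretched edge
    return edges


edges = pausable_clock_edges(10, [(18, 27)], 4)
```

In this example the second edge moves from 20 to 27 and later edges keep the nominal period, which reflects the idea that the clock is stretched rather than sampled, so data crossing the interface is never captured mid-transition.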


Pausable Clock Interface

• Programmable delay line

Precise, small and low power

Using Standard cells

On the same unit power domain


Power Gain

• Programmable delay line matches with unit logic on the same power domain

Compensates for any mismatch through reprogramming

• Power reduction

Vhigh=1.2V and Vlow=0.8V

35 % dynamic power reduction between High and Low modes

Hopping mode is used to save power without any latency cost

Leakage power is reduced by two decades thanks to UCO

• Power Supply Unit efficiency

Hopping Unit

Only resistive losses in the power transistors

About 1 mW dynamic power

=> more than 95 % power efficiency

90 % total efficiency (external DC-DC taken into account)

An adaptive and reliable Power Supply Unit giving high power reduction factor and high power efficiency


Physical Implementation

• Power Switch

One single Power-Switch for the complete power domain

Sized to get a speed loss<5%

Area: <5% of the power domain

• Hopping Unit

Area : 140μm*35μm

Hopping Transition : <100 ns


Synchronous or Asynchronous?

• A clockless on-chip network appears to be an elegant solution, although some questions remain:

Test

Performance concerns

Shouldn’t asynchronous designs offer latency advantages?

Fast local control, path/data dependent delays, DI interconnects

Perhaps asynchronous routers mimic synchronous architectures too closely?

Exploit flexibility, novel architectures, different topologies

Overheads for data-driven clocking or GALS currently look small in comparison to the classical approach

• Synchronous design has advantages too

Predictability and determinism can be exploited

Fast single cycle routers possible

Global snapshot of state is good for scheduling

• Still lots of interesting research to be done


GALAXY project

• GALAXY project (GALS InterfAce for CompleX Digital SYstem Integration) is funded in the FP7 program of EU

• www.galaxy-project.org


Project goals

• This project builds on a technology approach in which the EU currently has world leadership

• We are working toward providing an integrated GALS NoC design flow

• We will provide an interoperability framework between the existing open and commercial CAD tools

• The project is evaluating the ability of the GALS approach to

solve system integration issues,

implement a complex GALS system on 40 nm CMOS process,

explore the low EMI and low-power properties,

and robustness to process variability problems.
