ece4740: digital vlsi design

Post on 05-Nov-2021

6/8/2018

ECE4740: Digital VLSI Design

Lecture 22: Tree adders

Carry-skip and carry-select adders

Recap

Carry-skip adder principle

• If BP = P_0·P_1·P_2·P_3 = 1, then C_{o,3} = C_{i,0}; otherwise the block itself generates (or kills) the carry internally

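[Figure: 4-bit carry-skip block. Four full adders (FA) with inputs A_0 B_0 to A_3 B_3 and sum outputs S_0 to S_3 ripple the carry from C_{i,0} to C_{o,3}. The block propagate signal BP = P_0·P_1·P_2·P_3 selects between the rippled carry and C_{i,0} to form the block carry-out C_{o,3}. Annotation: why is this the critical path?]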

N-bit carry skip adder

• Set block size to B=sqrt(N/2)

• Delay grows only with sqrt(N)

[Figure: 16-bit carry-skip adder built from four 4-bit blocks (bits 0 to 3, bits 4 to 7, bits 8 to 11, bits 12 to 15). Each block consists of setup, carry propagation and sum stages; C_{i,0} enters the first block. Annotation: no direct path to carry out]

Image taken from: CMOS VLSI Design: A Circuits and Systems Perspective by Weste, Harris
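The skip rule can be checked with a small bit-level model. The following is a minimal Python sketch, not part of the lecture: it assumes little-endian bit lists and a fixed block size of 4, and the function names (`carry_skip_add`, `to_bits`, `from_bits`) are illustrative only.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ c, (a & b) | (c & (a ^ b))


def carry_skip_add(a_bits, b_bits, c_in=0, block=4):
    """Add two little-endian bit lists with a carry-skip structure."""
    assert len(a_bits) == len(b_bits)
    sums, carry = [], c_in
    for start in range(0, len(a_bits), block):
        block_in = carry
        bp = 1  # block propagate: AND of all P_i = A_i xor B_i in the block
        for a, b in zip(a_bits[start:start + block], b_bits[start:start + block]):
            bp &= a ^ b
            s, carry = full_adder(a, b, carry)
            sums.append(s)
        # Skip multiplexer: if every bit propagates, the block carry-out
        # equals the block carry-in; otherwise use the rippled carry.
        carry = block_in if bp else carry
    return sums, carry


def to_bits(x, n):
    """Integer to little-endian list of n bits."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Little-endian bit list back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))
```

The model only shows that both multiplexer inputs agree logically whenever BP = 1; in hardware the benefit is that C_{i,0} reaches the block output without waiting for the ripple chain.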

Carry select adder (CSA)

• Pre-compute carry out for each block for Cin=0 and Cin=1

• Select correct outputs as soon as Cin is ready

• Only a MUX in the critical path!

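[Figure: 4-bit carry-select block. A setup stage computes P[3:0] and G[3:0] from A[3:0] and B[3:0]; a "0" carry propagation chain and a "1" carry propagation chain run in parallel; a MUX controlled by Cin selects the carries C[3:0] and Cout; sum generation produces S[3:0].]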
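The select principle can be sketched in a few lines of Python. This is a behavioral model under stated assumptions (little-endian bit lists, ripple-carry chains inside each block, hypothetical function names), not the circuit from the slide.

```python
def ripple_block(a_bits, b_bits, c_in):
    """Ripple-carry add of one block (little-endian bits): (sum_bits, carry_out)."""
    sums, carry = [], c_in
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_add(a_bits, b_bits, c_in=0, block_sizes=(4, 4, 4, 4)):
    """Carry-select addition: each block is computed for Cin=0 and Cin=1,
    and the real block carry-in only drives the selecting multiplexer."""
    assert sum(block_sizes) == len(a_bits) == len(b_bits)
    sums, carry, start = [], c_in, 0
    for size in block_sizes:
        a_blk = a_bits[start:start + size]
        b_blk = b_bits[start:start + size]
        result0 = ripple_block(a_blk, b_blk, 0)  # "0" carry propagation
        result1 = ripple_block(a_blk, b_blk, 1)  # "1" carry propagation
        blk_sums, carry = result1 if carry else result0  # MUX
        sums.extend(blk_sums)
        start += size
    return sums, carry
```

Both candidate results of a block depend only on A and B, so in hardware they are ready before the incoming carry arrives; only the multiplexer sits between block carry-in and block carry-out.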

Square root carry select adder

• Linearly increasing group sizes: T grows with sqrt(N)

[Figure: 20-bit square-root carry-select adder with blocks of 2, 3, 4, 5 and 6 bits producing S[1:0], S[4:2], S[8:5], S[13:9] and S[19:14] from the matching slices of A and B. Each block has a setup stage (P's, G's), a "0" carry chain and a "1" carry chain, a MUX and sum generation (C's); Cin drives the first MUX and the last MUX delivers Cout.]
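The block partition in the 20-bit example (2, 3, 4, 5, 6) can be generated for other word lengths. A minimal sketch, assuming the first block has 2 bits as in the figure and that the last block is simply truncated when the sizes do not add up; `sqrt_select_blocks` is a hypothetical helper name.

```python
def sqrt_select_blocks(n_bits, first=2):
    """Split n_bits into blocks of linearly increasing size: first, first+1, ...
    The last block is truncated if the sizes do not add up exactly."""
    sizes, size, remaining = [], first, n_bits
    while remaining > 0:
        take = min(size, remaining)
        sizes.append(take)
        remaining -= take
        size += 1
    return sizes


if __name__ == "__main__":
    print(sqrt_select_blocks(20))
    print(sqrt_select_blocks(64))
```

Because the block sizes grow by one per stage, k blocks cover on the order of k²/2 bits, so the number of MUX stages on the carry path grows roughly with sqrt(2N).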

Tree adders

Recap and more topologies

The carry recurrence

• Remember: C_{i+1} = G_i + P_i·C_i

• C_1 = G_0 + P_0·C_0

• C_2 = G_1 + P_1·C_1 = (G_1 + P_1·G_0) + (P_1·P_0)·C_0

• Can be modeled as an operation on a tuple: (G_i, P_i) ∘ (G_{i-1}, P_{i-1}) = (G_i + P_i·G_{i-1}, P_i·P_{i-1})

The result tuple holds the new group generate and group propagate signals.
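The tuple operation can be written down directly and checked against the recurrence. A minimal Python sketch with single-bit values 0/1; the names `dot` and `carry_from_group` are illustrative.

```python
from itertools import product


def dot(hi, lo):
    """Dot operator: combine (G, P) of the more significant group `hi`
    with (G, P) of the adjacent less significant group `lo`."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def carry_from_group(group, c0):
    """Carry out of a group that starts at bit 0: C = G + P*C0."""
    g, p = group
    return g | (p & c0)


# Exhaustive check of C2 = (G1 + P1*G0) + (P1*P0)*C0 against the recurrence.
for g0, p0, g1, p1, c0 in product((0, 1), repeat=5):
    c1 = g0 | (p0 & c0)  # C1 = G0 + P0*C0
    c2 = g1 | (p1 & c1)  # C2 = G1 + P1*C1
    assert carry_from_group(dot((g1, p1), (g0, p0)), c0) == c2
```

The operator is associative, which is what allows the tree adders on the following slides to group the combinations in different orders.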

PG diagram notation

[Figure: the three cell types used in PG diagrams.

Black cell (dot operator): inputs (G_{i:k}, P_{i:k}) and (G_{k-1:j}, P_{k-1:j}); outputs G_{i:j} and P_{i:j}, i.e. generate and propagate.

Gray cell: inputs G_{i:k}, P_{i:k} and G_{k-1:j}; output G_{i:j}, i.e. generate only.

Buffer: passes G_{i:j} and P_{i:j} through; can also just be a wire in some cases.]

Image adapted from: CMOS VLSI Design: A Circuits and Systems Perspective by Weste, Harris

Brent-Kung adder (1982)

[Figure: 16-bit Brent-Kung PG network, columns 15 down to 0. Labeled prefixes, from top to bottom:

1:0, 3:2, 5:4, 7:6, 9:8, 11:10, 13:12, 15:14

3:0, 7:4, 11:8, 15:12

7:0, 15:8

11:0

5:0, 9:0, 13:0

Outputs: 15:0, 14:0, 13:0, ..., 1:0, 0:0]

Image taken from: CMOS VLSI Design: A Circuits and Systems Perspective by Weste, Harris

Brent-Kung adder (cont’d)

[Figure: the same 16-bit Brent-Kung PG network. Annotation: collect all carries from 13:0 for sum output 14]

Image taken from: CMOS VLSI Design: A Circuits and Systems Perspective by Weste, Harris

Summary: Brent-Kung adder

[Figure: the same 16-bit Brent-Kung PG network]

• A=2N-log(N)-2

• T=2log(N)-2

• FOmax=log(N)

• Pros:

– regular structure (…really?)

– limited fan-in for all gates

• Cons:

– FO is an issue: grows with log(N)

– Power? (uneven path lengths cause glitches)

Image taken from: CMOS VLSI Design: A Circuits and Systems Perspective by Weste, Harris
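The area and delay formulas can be reproduced by building the network programmatically. A minimal Python sketch, assuming N is a power of two, log means log2, and depth is counted in cell levels with every cell placed as early as its inputs allow (the drawn figure spreads some cells over later rows); `brent_kung` is an illustrative name.

```python
def brent_kung(n):
    """Build an n-input Brent-Kung prefix network (n a power of two).
    Returns (number of dot-operator cells, logic depth in cell levels)."""
    low = list(range(n))  # column i currently holds the prefix i:low[i]
    depth = [0] * n
    cells = 0

    def combine(i, k):
        """Column i absorbs the prefix held by column k (must be adjacent)."""
        nonlocal cells
        assert low[i] == k + 1, "groups must be adjacent"
        low[i] = low[k]
        depth[i] = max(depth[i], depth[k]) + 1
        cells += 1

    # Forward tree: 1:0, 3:2, ... then 3:0, 7:4, ... up to (n-1):0.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            combine(i, i - d)
        d *= 2
    # Reverse tree: fill in the remaining prefixes (11:0, then 5:0, 9:0, ...).
    d = n // 4
    while d >= 1:
        for i in range(3 * d - 1, n, 2 * d):
            combine(i, i - d)
        d //= 2
    assert all(lo == 0 for lo in low)  # every column now holds i:0
    return cells, max(depth)


if __name__ == "__main__":
    print(brent_kung(16))
```

For N = 16 this gives 26 cells and a depth of 6, matching A = 2N-log(N)-2 and T = 2log(N)-2.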

Kogge-Stone adder (1973)

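[Figure: 16-bit Kogge-Stone PG network, columns 15 down to 0. Labeled prefixes, from top to bottom:

1:0, 2:1, 3:2, 4:3, 5:4, 6:5, 7:6, 8:7, 9:8, 10:9, 11:10, 12:11, 13:12, 14:13, 15:14

2:0, 3:0, 4:1, 5:2, 6:3, 7:4, 8:5, 9:6, 10:7, 11:8, 12:9, 13:10, 14:11, 15:12

4:0, 5:0, 6:0, 7:0, 8:1, 9:2, 10:3, 11:4, 12:5, 13:6, 14:7, 15:8

Outputs: 15:0, 14:0, 13:0, ..., 1:0, 0:0]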

• A=NlogN-N+1

• T=log(N)

• FOmax=2

• High wiring overhead

Image taken from: CMOS VLSI Design: A Circuits and Systems Perspective by Weste, Harris
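A Kogge-Stone network is compact enough to model as a working adder. The sketch below is an illustration in Python, assuming unsigned n-bit operands and a carry-in of 0; it also counts the dot-operator cells so the area formula can be checked. The name `kogge_stone_add` is hypothetical.

```python
def kogge_stone_add(a, b, n=16):
    """Add two n-bit integers with a Kogge-Stone prefix network.
    Returns (sum including carry-out, number of dot-operator cells)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]  # bit generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]  # bit propagate
    G, P = g[:], p[:]
    cells, d = 0, 1
    while d < n:
        # All cells of one level work on the previous level's values.
        G_prev, P_prev = G[:], P[:]
        for i in range(d, n):
            G[i] = G_prev[i] | (P_prev[i] & G_prev[i - d])
            P[i] = P_prev[i] & P_prev[i - d]
            cells += 1
        d *= 2
    # G[i] is now the group generate G_{i:0}, i.e. the carry into bit i+1.
    carries = [0] + G
    total = sum((p[i] ^ carries[i]) << i for i in range(n)) + (carries[n] << n)
    return total, cells


if __name__ == "__main__":
    print(kogge_stone_add(0xFFFF, 1))
```

For N = 16 the loop runs log(N) = 4 levels and uses 15 + 14 + 12 + 8 = 49 cells, which agrees with A = NlogN-N+1.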

Sklansky adder (1960)

[Figure: 16-bit Sklansky PG network, columns 15 down to 0. Labeled prefixes, from top to bottom:

1:0, 3:2, 5:4, 7:6, 9:8, 11:10, 13:12, 15:14

2:0, 3:0, 6:4, 7:4, 10:8, 11:8, 14:12, 15:12

12:8, 13:8, 14:8, 15:8

Outputs: 15:0, 14:0, 13:0, ..., 1:0, 0:0]

• A=0.5Nlog(N)

• T=log(N)

• FOmax=N/2

Jack Sklansky, UC Irvine

Images taken from: CMOS VLSI Design: A Circuits and Systems Perspective by Weste, Harris; https://www.the-scientist.com/?articles.view/articleNo/18705/title/New-Technology-Weighs-In-On-Mammography-Debate/
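The Sklansky figures of merit can be reproduced the same way. A minimal Python sketch, assuming N is a power of two and counting fan-out only as the number of cells in the next level driven by one prefix output (buffers and sum logic are ignored); `sklansky` is an illustrative name.

```python
from collections import Counter


def sklansky(n):
    """Build an n-input Sklansky prefix network (n a power of two).
    Returns (cells, depth in cell levels, maximum cell fan-out)."""
    low = list(range(n))  # column i currently holds the prefix i:low[i]
    depth = [0] * n
    cells, max_fanout, level = 0, 0, 0
    while (1 << level) < n:
        loads = Counter()  # how many cells each source column drives
        for i in range(n):
            if (i >> level) & 1:
                # Upper half of each 2^(level+1) block reads the top
                # column of the lower half.
                k = ((i >> level) << level) - 1
                assert low[i] == k + 1, "groups must be adjacent"
                low[i] = low[k]
                depth[i] = max(depth[i], depth[k]) + 1
                loads[k] += 1
                cells += 1
        max_fanout = max(max_fanout, max(loads.values()))
        level += 1
    assert all(lo == 0 for lo in low)  # every column now holds i:0
    return cells, max(depth), max_fanout


if __name__ == "__main__":
    print(sklansky(16))
```

For N = 16 the result is 32 cells, depth 4 and a maximum fan-out of 8, in line with A = 0.5Nlog(N), T = log(N) and FOmax = N/2; the fan-out of 8 is the 7:0 prefix driving columns 8 to 15 in the last level.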

Which tree adder should I pick?

Name             Area [A]        Time [T]      Max. FO     Wiring

Ripple carry     N-1             N-1           2

Sklansky         N/2*log(N)      log(N)        N/2

Brent-Kung       2N-log(N)-2     2*log(N)-2    log(N)

Kogge-Stone      N*log(N)-N+1    log(N)        2

Carry increment  2N-sqrt(2N)     sqrt(2N)      sqrt(2N)

• Trade-off between area, propagation delay, etc.

• For adders with a small number of bits, do not forget carry select and carry skip adders!
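To get a feeling for the trade-off, the formulas from the table can be evaluated for a concrete word length. A small Python sketch, assuming log means log2 and that area and time are in units of prefix cells and cell delays; the dictionary and function names are illustrative.

```python
from math import log2, sqrt

# (area, time) formulas as listed in the table; log is taken as log2.
FORMULAS = {
    "Ripple carry": (lambda n: n - 1, lambda n: n - 1),
    "Sklansky": (lambda n: n / 2 * log2(n), lambda n: log2(n)),
    "Brent-Kung": (lambda n: 2 * n - log2(n) - 2, lambda n: 2 * log2(n) - 2),
    "Kogge-Stone": (lambda n: n * log2(n) - n + 1, lambda n: log2(n)),
    "Carry increment": (lambda n: 2 * n - sqrt(2 * n), lambda n: sqrt(2 * n)),
}


def compare(n):
    """Evaluate area and time of every adder in the table for n bits."""
    return {name: (area(n), time(n)) for name, (area, time) in FORMULAS.items()}


if __name__ == "__main__":
    for name, (area, time) in compare(16).items():
        print(f"{name:16s} A = {area:6.1f}  T = {time:5.1f}")
```

The numbers ignore wiring and fan-out, which is exactly where Kogge-Stone and Sklansky pay for their short logic depth.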

Adder design with Synopsys DC

What are CAD tools doing?

Synopsys design compiler (DC)

• One of the leading CAD tools for digital integrated circuits and FPGAs

• How do you use it?

– You write the design in a hardware description language (HDL)

– You provide constraints and then compile it

– The tool generates a gate-level netlist

• Automatic logic optimization, sizing, etc.

Simple example

• Faraday 90nm CMOS technology

• 16-bit adder design

– Two 16-bit inputs (Data1 and Data2)

– One 17-bit output (Output)

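[Figure: registered adder. Data1 and Data2 are each captured by D flip-flops clocked by CLK, added (+), and the result is captured by a D flip-flop clocked by CLK that drives Output.]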

Synthesis script example

[...]

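# Read and elaborate the VHDL source
analyze -library WORK -format vhdl {chris_adder.vhd}

elaborate chris_adder -architecture arch1 -library WORK

# Constraint: set the clock period to 5 ns (rising edge at 0 ns, falling edge at 2.5 ns)
create_clock -name "ClkxCI" -period 5 -waveform {0 2.5} {ClkxCI}

# Synthesize and optimize the design
compile_ultra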

Area results for T=5ns

• 49 sequential cells (D-flip-flops)

• 18 combinational cells

• Combinational area = 463 µm²

• Noncombinational area = 882 µm²

WHY?

[Figure: Synopsys area report. Annotations: this is how Synopsys reports the circuit area; the values usually come without units, which depend on the library]

Timing results for 5ns

• Critical path: tpdmax=4.76ns

– Startpoint: LSB of Data1

– Endpoint: carry output (bit 16)

• Most likely a simple ripple carry adder

• Carry out (= bit 16) is critical!

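[Figure: Synopsys timing report for T = 5 ns, part 1. Annotation: AN2RLX1 = 2-input AND gate with drive strength 1]

[Figure: Synopsys timing report for T = 5 ns, part 2. Annotation: 4.76 ns − 3.84 ns = 0.92 ns]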

The reason (VHDL ftw!)

Can you see it now?

[Figure: VHDL source of the adder. Annotations: asynchronous reset; do not write data-path stuff in combinational blocks (bad practice)]

What happens if we set T=1ns?

• Very aggressive delay constraint!

• Combinational area = 1323 µm²

• Noncombinational area = 987 µm²

• (Remember 463 µm² and 882 µm² for the ripple carry adder)

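[Figure: Synopsys timing report for T = 1 ns, part 1]

[Figure: Synopsys timing report for T = 1 ns, part 2. Annotation: 0.76 ns − 1.00 ns = −0.24 ns (negative slack: the 1 ns constraint is not met)]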

Fastest design

• tpd = 1.24ns (including flip-flop timing)

• tpd = 720ps (without flip-flop timing)

• Old lab 4 constraints:

– 500ps critical path

– Area smaller than 800 µm² (5 pts)

• Hand-design can be faster & smaller than tool-based design (but it’s much more work)!

Note that we are comparing different processes here and in lab 4.

Flip-flop timing: propagation delay = 280 ps, setup time = 240 ps
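The timing numbers on these slides fit together through the usual register-to-register setup check. The sketch below works in picoseconds and uses the flip-flop values quoted above; the function names are illustrative, and the 3560 ps logic delay for the 5 ns run is an assumption derived from the 3.84 ns figure by subtracting the 280 ps flip-flop propagation delay.

```python
def setup_slack_ps(period, t_clk_to_q, t_logic, t_setup):
    """Setup slack of a register-to-register path, all values in ps.
    slack = required time - arrival time."""
    arrival = t_clk_to_q + t_logic   # launch flip-flop delay + logic delay
    required = period - t_setup      # data must be stable before the clock edge
    return required - arrival


def min_period_ps(t_clk_to_q, t_logic, t_setup):
    """Smallest clock period with non-negative setup slack."""
    return t_clk_to_q + t_logic + t_setup


if __name__ == "__main__":
    # Fastest design: 720 ps of logic, constrained to T = 1 ns.
    print(setup_slack_ps(1000, 280, 720, 240))
    print(min_period_ps(280, 720, 240))
```

With T = 1 ns the slack is −240 ps and the minimum period is 1240 ps, i.e. the 1.24 ns quoted including flip-flop timing; with T = 5 ns and the assumed 3560 ps of logic the slack is 920 ps.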

CAD tools are smart!

R. Zimmermann, “Non-Heuristic Optimization and Synthesis of Parallel-Prefix Adders,” IWLAS 1996