eecs 219b spring 2001

23
1 EECS 219B EECS 219B Spring 2001 Spring 2001 Timing Optimization Andreas Kuehlmann

Upload: allegra-nichols

Post on 30-Dec-2015

40 views

Category:

Documents


0 download

DESCRIPTION

EECS 219B Spring 2001. Timing Optimization Andreas Kuehlmann. Restructuring for Timing Optimization. Outline: Definitions and problem statement Overview of techniques (motivated by adders) Tree height reduction (THR) Generalized bypass transform (GBX) Generalized select transform (GST) - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: EECS 219B Spring 2001

1

EECS 219BEECS 219BSpring 2001Spring 2001

Timing Optimization

Andreas Kuehlmann

Page 2: EECS 219B Spring 2001

2

Restructuring for Timing OptimizationRestructuring for Timing Optimization

Outline:• Definitions and problem statement• Overview of techniques (motivated by adders)

– Tree height reduction (THR)– Generalized bypass transform (GBX)– Generalized select transform (GST)– Partial collapsing

Page 3: EECS 219B Spring 2001

3

Circuit RestructuringCircuit Restructuring

Approaches:

Local: • Mimic optimization techniques in adders

– Carry lookahead (THR tree height reduction)– Conditional sum (GST transformation)– Carry bypass (GBX transformation)

Global:• Reduce depth of entire circuit

– Partial collapsing– Boolean simplification

Page 4: EECS 219B Spring 2001

4

Tree-Height Reduction (THR)Tree-Height Reduction (THR)

n

l m

i j

h

k

3

6

5 5

1 4

1

0 0 0 0 2 0 0

a b c d e f g

i

1

0 0

a b

m

j

h

k

3

41

0 0 2 0 0

c d e f g

n’Duplicatedlogic

12

00

5Criticalregion

CollapsedCritical region

Singh’88:

Page 5: EECS 219B Spring 2001

5

Tree-Height ReductionTree-Height Reduction

i

1

0 0

a b

m

j

h

k

3

41

0 0 2 0 0

c d e f g

n’Duplicatedlogic

12

00

5

i

1

0 0

a b

m

j

h

k

3

41

0 0 2 0 0

c d e f g

12

0

35

n’

2

1

0

4

CollapsedCritical region

New delay = 5

Page 6: EECS 219B Spring 2001

6

Generalized bypass transform (GBX)Generalized bypass transform (GBX)

• Make critical path false– Speed up the circuit

• Bypass logic of critical path(s)

fm=f fm+1 fn=g…

fm =f fm+1 fn=g… 0

1

g’

dg__df

Booleandifference

s-a-0 redundant

McGeer’91:

Page 7: EECS 219B Spring 2001

7

GBX and KMS transformGBX and KMS transform

GBX gives little area increase, BUT creates an untestable fault

(on control input to multiplexer)

KMS transform: (remove false paths without increasing delay)

• fk is last node on false path that fans out.

• Duplicate false path {f1,…, fk} -> {f’1, … , f’k}

• f’j fans out to every fanout of fj except fj+1, and fj just fans out to fj+1

• Set f0 input to f1 to controlling value and propagate constant (can do because path is false and does not fanout)

KMS results

– Function of every node, except f1, … ,fk is unchanged

– Added k nodes– Area added in linear in size of length of false paths; in practice small area increase.

Page 8: EECS 219B Spring 2001

8

KMSKMS

fm+1 fm+2 fn…fm+k fm+k+1

f’m+1 f’m+2 f’m+k

fm+1 fm+2 fn…fm+k fm+k+10

…Delay is notincreased

Keutzer, Malik, Saldanha’90:

Page 9: EECS 219B Spring 2001

9

Generalized select transform (GST)Generalized select transform (GST)

Berman’90: Late signal feeds multiplexor

c d e f g

a

b

out

c d e f g

b

c d e f g

b

a=0

a=1

out0

1

a

Page 10: EECS 219B Spring 2001

10

GST vs GBXGST vs GBX

… 0

1g’

dh__da

a

a

b

cg

h

c d e f gb

c d e f gb

a=0

a=1

out0

1

a

GST

c d e f gb

c d e f gb

a=0

a=1

… 0

1g’

a cg

b

GBX

a

h

Boolean

diffe

Note:

rence =

a a a

hh h

GBX

Page 11: EECS 219B Spring 2001

11

GST vs GBXGST vs GBX

• Select transform appears to be more area efficient• But Boolean difference generally more efficiently formed in

practice• No delay/speedup advantage for either transform• Can reuse parts of the critical paths for multiple fanouts on GST

c d e f gb

c d e f gb

a=0

a=1

out10

1

a

GST out20

1

a

Page 12: EECS 219B Spring 2001

12

Technology Independent Delay ReductionsTechnology Independent Delay Reductions

Generally THR, GBX, GST (critical path based methods) work OK, – but very greedy and computationally expensive

Why are technology independent delay reductions hard?

Lack of fast and accurate delay models– # levels, fast but crude– # levels + correction term (fanout, wires,… ): a little better,

but still crude (what coefficients to use?)– Technology mapped: reasonable, but very slow– Place and route: better but extremely slow– Silicon: best, but infeasibly slow (except for FPGAs)

better

slower

Page 13: EECS 219B Spring 2001

13

Clustering/partial-collapseClustering/partial-collapse

Traditional critical-path based methods require– Well defined critical path– Good delay/slack information

Problems:– Good delay information comes from mapper and layout– Delay estimates and models are weak

Possible solutions:– Better delay modeling at technology independent level– Make speedup, insensitive to actual critical paths and

mapped delays

Page 14: EECS 219B Spring 2001

14

Clustering/Partial CollapseClustering/Partial Collapse

Two-level circuits are fast– Collapse circuit to 2-level - but

• Huge area penalty• Huge capacitive loading on inputs (can be much slower)

To avoid huge area penalty– Identify clusters of nodes

• Each cluster has some fixed size– Perform collapse of each cluster– Simplify each node

Details– How to choose the clusters?– How to choose cluster size?– How to simplify each node?

Page 15: EECS 219B Spring 2001

15

Lawler’s Clustering AlgorithmLawler’s Clustering Algorithm

• Optimal in delay:– For a given clustering size

• May duplicate nodes (hence possible area penalty)– Not optimal w.r.t duplication– Use a heuristic

• Fast: O(m x k)– m = number of edges in network– k = maximum cluster size

Page 16: EECS 219B Spring 2001

16

Clustering Algorithm - OverviewClustering Algorithm - Overview

• Label phase: (k is cluster size)– If node u is an input, label(u) := L := 0

• Else L := max label of fanin of u– If (# nodes in TFI(u) with (label = L) >= k)

label(u) := L+1• Cluster phase: (outputs to inputs)

– If node u is an output, L := infinity• Else L := max label of fanouts of u

– If (label(u) < L) then create a new cluster with “root” u and with members all the nodes in TFI(u) with label = label(u)

• Collapse phase:– Collapse all nodes in a cluster into a single node– Note: a node may be in several clusters (causes area increase

Page 17: EECS 219B Spring 2001

17

Example of ClusteringExample of Clustering

0

0

0

0 0

0

1

1

1

1

2

01

12

0

0

Result: Lawler’s algorithmgives minimum depth circuit

Typically, 1. we decompose initial circuit

into 2-input NANDs and invertors.

2. then cluster size k reflects # 2-input NANDs to be collapsed together.

k = 3

Page 18: EECS 219B Spring 2001

18

Choosing Choosing kk

• I(k): number of levels, given k

• d(k): duplication ratio

– Number of gates in cluster network divided by number of gates in original network

• Determine k0 where k0/d(k0)~2.0

• For every k from 2 to k0, compute d(k), I(k)

– Use exhaustive enumeration: label and cluster (without collapse) for each k.

– Each iteration is O(|E|k)

• Choose k such that

– I(k) is minimized

• Break ties using d(k)

– Minimize d(k)

d(k)

I(k)

1 2 k0

Page 19: EECS 219B Spring 2001

19

Area RecoveryArea Recovery

Area increase is due to node duplication – this occurs when node is in multiple clusters

Two solutions:– Break clusters into smaller pieces off critical path– After cluster and collapse, recover area

Page 20: EECS 219B Spring 2001

20

Relabeling Procedure:Relabeling Procedure:

Attempt to increase node labels without exceeding cluster size

In reverse topological order

Start : assign

Increase label(u) if• new-label(u) <= label(v) for each fanout v and• new-label(u) = new-label(v) for each fanout v only if

label(u) = label(v) before relabeling, and• no cluster size is violated

- ( ) max ( )i jj PO

new label O label O

Page 21: EECS 219B Spring 2001

21

Relabeling ExampleRelabeling Example

00

00

00

00 00

00

11

11

11

22

22

00

00

00

00 00

00

11

11

11

11

22

before

after

Page 22: EECS 219B Spring 2001

22

Post-Collapse Area RecoveryPost-Collapse Area Recovery

• Algebraic factorization, but– Undo factorization if depth increases

• Optimization using DC’s• Redundancy removal

Page 23: EECS 219B Spring 2001

23

Conclusions Conclusions

• Variety of methods for delay optimization– No single technique dominates – When applied to ripple-carry adder get– Carry-lookahead adder (THR)– Carry-bypass adder (GBX)– Carry-select adder (GST)– Clustering/Partial collapse

• All techniques ignore false paths when assessing the delay and critical regions– Can use KMS transform to eliminate false paths without

increasing delay (area increase however).