
Fundamental Algorithms for System Modeling, Analysis, and Optimization

Edward A. Lee, Jaijeet Roychowdhury, Sanjit A. Seshia, Stavros Tripakis

UC Berkeley, EECS 144/244, Spring 2013

Copyright © 2010-13, E. A. Lee, J. Roychowdhury, S. A. Seshia, S. Tripakis. All rights reserved

Lecture: Timing Analysis, Retiming

Thanks to Kurt Keutzer for several slides

RTL Synthesis Flow

[Figure (K. Keutzer): RTL synthesis flow. Labels in the figure: HDL (FSM, Verilog, VHDL); Simulation/Verification; RTL Synthesis; netlist (Boolean circuit/network); logic optimization; Library/module generators; netlist (Boolean circuit/network); physical design; layout (Graph / Rectangles); Timing Analysis.]

EECS 144/244, UC Berkeley: 2

Timing Analysis / Verification

Timing Analysis / Verification

Verifying a property about system timing

Arises in many settings:

• Integrated circuits

• Embedded software

• Distributed embedded systems

• Biological systems

Here we focus on circuits.

Chapter 15 of the Lee-Seshia book contains a broader discussion.

Timing Analysis for Digital ICs

(Clock) speed is one of the major performance metrics for digital circuits.

Timing analysis = the process of verifying that a chip meets its speed requirement. E.g., 1 GHz means that the next-state function must be computed within 1 ns.

Static Timing Analysis for Circuits

Determine the fastest permissible clock speed (e.g., 1 GHz) by determining the delay of the longest path from register to register (e.g., 1 ns).

[Figure: three blocks of combinational logic, each placed between registers clocked by clk.]

Cycle Time - Critical Path Delay

Cycle time (T) cannot be smaller than the longest path delay (Tmax):

$T \ge T_{\text{max}}$

Longest (critical) path delay is a function of the total gate and wire delays.

[Figure: registers Q1 and Q2, clocked by Tclock1 and Tclock2, connected by a critical path of ~5 logic levels; waveforms show clock, data, and the cycle time.]

Cycle Time - Setup Time

For FFs to correctly latch data, input data must be stable during the setup time (Tsetup) before the clock arrives:

$T \ge T_{\text{max}} + T_{\text{setup}}$

[Figure: registers Q1 and Q2, clocked by Tclock1 and Tclock2, connected by a critical path of ~5 logic levels; waveforms show clock, data, and the setup time.]

Cycle Time - Clock-skew

If the clock network has unbalanced delay, there is clock skew.

Cycle time is also a function of clock skew (Tskew):

$T \ge T_{\text{max}} + T_{\text{setup}} + T_{\text{skew}}$

[Figure: registers Q1 and Q2, clocked by Tclock1 and Tclock2, connected by a critical path of ~5 logic levels; waveforms show Tclock1, Tclock2, data, and the clock skew.]

Cycle Time - Clock to Q

Cycle time is also a function of the propagation delay of the FF (Tclk-to-Q).

Tclk-to-Q: time from arrival of the clock signal until the change at the FF output.

$T \ge T_{\text{max}} + T_{\text{setup}} + T_{\text{skew}} + T_{\text{clk-to-Q}}$

[Figure: registers Q1 and Q2, clocked by Tclock1 and Tclock2, connected by a critical path of ~5 logic levels; waveforms show Tclock1, Tclock2, data, and the clock-to-Q delay.]
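As a rough illustration of the cycle-time constraint, the sketch below adds up the four terms and converts the result to a clock frequency. The function names and the delay values are hypothetical (chosen only so the numbers come out round); all delays are assumed to be in nanoseconds.

```python
def min_cycle_time(t_max, t_setup, t_skew, t_clk_to_q):
    """Smallest cycle time T satisfying T >= Tmax + Tsetup + Tskew + Tclk-to-Q."""
    return t_max + t_setup + t_skew + t_clk_to_q


def max_clock_frequency_ghz(t_max, t_setup, t_skew, t_clk_to_q):
    """Highest clock frequency in GHz when all delays are given in ns."""
    return 1.0 / min_cycle_time(t_max, t_setup, t_skew, t_clk_to_q)


# Hypothetical delays in ns (illustrative values, not taken from the lecture).
T = min_cycle_time(t_max=0.70, t_setup=0.10, t_skew=0.05, t_clk_to_q=0.15)
F = max_clock_frequency_ghz(t_max=0.70, t_setup=0.10, t_skew=0.05, t_clk_to_q=0.15)
print(f"minimum cycle time: {T:.2f} ns")
print(f"maximum frequency:  {F:.2f} GHz")
```

With these assumed numbers the budget works out to about 1 ns, which corresponds to the 1 GHz example used earlier.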

Min Path Delay - Hold Time

For FFs to correctly latch data, input data must be stable during the hold time (Thold) after the clock arrives.

Determined by the delay of the shortest path in the circuit (Tmin) and the clock skew (Tskew):

$T_{\text{min}} \ge T_{\text{hold}} + T_{\text{skew}}$

[Figure: registers Q1 and Q2, clocked by Tclock1 and Tclock2, connected by a short path of ~3 logic levels; waveforms show Tclock1, data, and the hold time.]
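The hold constraint can be checked the same way as the cycle-time constraint. This is a minimal sketch; the helper names and the sample delays (in ns) are assumptions made for illustration.

```python
def hold_slack(t_min, t_hold, t_skew):
    """Slack of the hold constraint Tmin >= Thold + Tskew (negative means violated)."""
    return t_min - (t_hold + t_skew)


def meets_hold(t_min, t_hold, t_skew):
    """True when the shortest path is slow enough to satisfy the hold constraint."""
    return hold_slack(t_min, t_hold, t_skew) >= 0


# Hypothetical delays in ns.
ok = meets_hold(t_min=0.30, t_hold=0.10, t_skew=0.05)       # short path is slow enough
too_fast = meets_hold(t_min=0.10, t_hold=0.10, t_skew=0.05)  # data changes too early
print(ok, too_fast)
```

Note that, unlike the cycle-time constraint, a hold violation cannot be fixed by slowing the clock, because T does not appear in the inequality.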


Summary

Need to be able to compute:

• Tmin: delay of the “fastest” path in the circuit

• Tmax: delay of the “slowest” (critical) path in the circuit

Modeling Timing in a Combinational Circuit

[Figure: combinational circuit with nodes labeled A, B, C, W, X, Y, Z, and f. Arrival times are shown in green, interconnect delays in red, and gate delays in blue.]

What’s the right mathematical object to use to represent this physical object?

Modeling - 1

Use a labeled directed graph G = <V,E>:

• Vertices represent gates, primary inputs and primary outputs

• Edges represent wires

• Labels represent delays

What about arrival times?

[Figure: the circuit next to its graph model, with vertices A, B, C, W, X, Y, Z, f and a source node s; vertex and edge labels give the gate and wire delays.]

Modeling - 2

Find longest/shortest paths in a directed graph G = <V,E>.

What sort of directed graph do we have?

Is this in the standard form for a longest/shortest path problem?

[Figure: graph model with source s, vertices A, B, C, W, X, Y, Z, and f, labeled with gate delays, wire delays, and arrival times.]

Split Nodes into Edges

[Figure: the graph model with source s, vertices A, B, C, W, X, Y, Z, and f; each vertex delay is moved onto an edge so that only edges carry weights.]
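One way to picture the transformation is sketched below: every vertex is split into an "in" and an "out" copy joined by an edge that carries the gate delay, and every wire becomes an edge between copies. The function name, the dictionary encoding, and the two-gate fragment are assumptions for illustration; the slide's figure may fold gate delays into the wire edges instead, which gives the same path lengths.

```python
def split_nodes(node_delay, edges):
    """Return an edge-weighted graph equivalent to a node- and edge-weighted one.

    node_delay: dict mapping vertex -> gate delay
    edges:      dict mapping (u, v) -> wire delay
    Each vertex v becomes (v, "in") -> (v, "out") with weight node_delay[v];
    each wire (u, v) becomes (u, "out") -> (v, "in") with its wire delay.
    """
    weighted = {}
    for v, d in node_delay.items():
        weighted[((v, "in"), (v, "out"))] = d
    for (u, v), d in edges.items():
        weighted[((u, "out"), (v, "in"))] = d
    return weighted


# Hypothetical fragment: gate X (delay 1) drives gate Z (delay 2)
# through a wire of delay 0.05.
g = split_nodes({"X": 1, "Z": 2}, {("X", "Z"): 0.05})
for (u, v), weight in g.items():
    print(u, "->", v, "weight", weight)
```

After the split, any path length is just a sum of edge weights, which is the standard form for longest/shortest path algorithms.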

DAG with Weighted Edges

[Figure: DAG with source s, vertices A, B, C, W, X, Y, Z, and sink f; edge weights such as 0, 1, 0.1, 0.2, 1.05, 2.15, and 2.2 combine gate and wire delays.]

Problem: Find the longest (critical) path from source s to sink f.

Naïve Approach: Enumerate Paths

How many paths in this example? In the worst case?

[Figure: the weighted DAG with source s, vertices A, B, C, W, X, Y, Z, and sink f.]

Problem: Find the longest path from source s to sink f.
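To get a feel for the worst case, the sketch below counts source-to-sink paths in a DAG and applies it to a hypothetical family of graphs (k "diamonds" in series). The graph family and function names are assumptions for illustration, not the lecture's example.

```python
def count_paths(succ, order, source, sink):
    """Count source-to-sink paths in a DAG.

    succ:  dict mapping vertex -> list of successors
    order: vertices in topological order
    """
    paths = {v: 0 for v in order}
    paths[source] = 1
    for u in order:
        for v in succ.get(u, []):
            paths[v] += paths[u]
    return paths[sink]


def diamond_chain(k):
    """Hypothetical worst-case family: k diamonds in series (3k + 1 vertices)."""
    succ, order = {}, [0]
    for i in range(k):
        a, top, bot, b = 3 * i, 3 * i + 1, 3 * i + 2, 3 * i + 3
        succ[a] = [top, bot]   # two parallel branches ...
        succ[top] = [b]        # ... that reconverge at b
        succ[bot] = [b]
        order += [top, bot, b]
    return succ, order, 0, 3 * k


for k in (1, 2, 10, 20):
    print(k, "diamonds:", count_paths(*diamond_chain(k)), "paths")
```

Each diamond doubles the number of paths, so the count grows as 2^k while the graph grows only linearly, which is why enumerating paths does not scale.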

Algorithm 1: Longest path in a DAG

Critical Path Method [Kirkpatrick 1966, IBM JRD] (dynamic programming)

Let w(u,v) denote the weight of the edge from u to v

Steps:

1. Topologically sort vertices (graph is acyclic)

order: v1, v2, …, vn with v1 = s, vn = ?

2. For each vertex v, compute d(v) = length of longest path from source s to v

d(v1) = 0

For i = 2..n

$d(v_i) = \max_{(u, v_i) \in E} \bigl( d(u) + w(u, v_i) \bigr)$

Algorithm 1: Longest path in a DAG

Critical Path Method [Kirkpatrick 1966, IBM JRD]

Let w(u,v) denote the weight of the edge from u to v

Steps:

1. Topologically sort vertices

order: v1, v2, …, vn with v1 = s, vn = f

2. For each vertex v, compute d(v) = length of longest path from source s to v

d(v1) = 0

For i = 2..n

$d(v_i) = \max_{(u, v_i) \in E} \bigl( d(u) + w(u, v_i) \bigr)$

Time Complexity? O(m+n) (n vertices, m edges)

Run the CPM on our example
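A minimal Python sketch of the two steps follows. It uses Kahn's algorithm for the topological sort and pushes values along outgoing edges, which is equivalent to taking the max over incoming edges. The function names and the small DAG are hypothetical; the weights only loosely resemble the lecture's figure.

```python
from collections import defaultdict, deque


def longest_paths(vertices, w, source):
    """Critical Path Method: d[v] = length of the longest path from source to v.

    vertices: iterable of vertex names
    w:        dict mapping (u, v) -> edge weight; the graph must be acyclic
    """
    succ = defaultdict(list)
    indegree = {v: 0 for v in vertices}
    for (u, v) in w:
        succ[u].append(v)
        indegree[v] += 1

    # Step 1: topological sort (Kahn's algorithm).
    queue = deque(v for v in indegree if indegree[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in succ[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle")

    # Step 2: relax edges in topological order.
    d = {v: float("-inf") for v in order}   # -inf marks "unreachable from source"
    d[source] = 0.0
    pred = {}
    for u in order:
        if d[u] == float("-inf"):
            continue
        for v in succ[u]:
            if d[u] + w[(u, v)] > d[v]:
                d[v] = d[u] + w[(u, v)]
                pred[v] = u
    return d, pred


def critical_path(pred, sink):
    """Walk predecessor links back from the sink to recover the critical path."""
    path = [sink]
    while path[-1] in pred:
        path.append(pred[path[-1]])
    return path[::-1]


# Hypothetical DAG (not the lecture's full example).
w = {("s", "A"): 0, ("s", "B"): 1, ("A", "X"): 1.05,
     ("B", "X"): 0.1, ("X", "f"): 2.2, ("B", "f"): 2.15}
d, pred = longest_paths(["s", "A", "B", "X", "f"], w, "s")
path = critical_path(pred, "f")
print("critical path:", " -> ".join(path), "delay:", round(d["f"], 2))
```

Every vertex is queued once and every edge is examined a constant number of times, which is where the O(m+n) bound comes from. Replacing the max with a min (and -inf with +inf) gives the shortest path needed for Tmin.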

Graph vs. Circuit

Delay of a combinational circuit depends on:

• Circuit topology (graph model)

• Delay model (e.g., fixed vs. variable)

• Boolean behavior of gates

We have only considered circuit topology so far.

Longest/shortest path found on the graph can be very pessimistic:

• Paths can be FALSE

• Delay values are BOUNDS

False Paths (consider Transition Mode)

A path is false if it cannot be responsible for the delay of a circuit

[Figure: example circuit.] Graph model implies path of length 6

False vs. True Paths

TRUE path = one that can be responsible for the delay of a circuit

Need techniques to find whether a path is TRUE or FALSE
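As a toy illustration of why Boolean behavior matters, the sketch below uses a hypothetical circuit (not the one on the slide): two 2:1 multiplexers that share the select input s. The graph model contains a path from a through both muxes, but mux 1 passes a only when s = 1 and mux 2 passes mux 1's output only when s = 0. The brute-force check is a simplified sensitization test under that assumption; it is not the floating-mode algorithm referenced later in the lecture.

```python
from itertools import product


def circuit(a, b, c, s):
    """Hypothetical circuit: two 2:1 muxes that share the select input s."""
    m1 = a if s else b       # mux 1 passes a only when s = 1
    return c if s else m1    # mux 2 passes m1 only when s = 0


def can_propagate(input_index):
    """True if, for some input vector, flipping the chosen input flips the output."""
    for bits in product([0, 1], repeat=4):
        flipped = list(bits)
        flipped[input_index] ^= 1
        if circuit(*bits) != circuit(*flipped):
            return True
    return False


for name, index in (("a", 0), ("b", 1), ("c", 2), ("s", 3)):
    print(name, "can affect the output:", can_propagate(index))
```

Since the only structural path from a runs through both muxes and no input vector lets a change on a reach the output, that path is false, and a purely topological analysis would overestimate the delay if it were the longest one. Exhaustive enumeration is exponential in the number of inputs, so real tools need smarter techniques.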

The Fixed Delay Model: Constant delay for each gate (or wire)

[Figure: example circuit with fixed delays; a slide annotation reads "Typo: this should be 1".]

Paradoxical Behavior with the Transition Model?

[Figure: example circuit; a slide annotation reads "Typo: this should be 1".]

Non-monotonic w.r.t. time delays!

Problems with Fixed Delay + Transition Model

1. Transition model can be tricky to reason about

2. Fixed gate delays are unrealistic, due to manufacturing process variations

More realistic delay model: Lower and upper bounds

Perform timing analysis for a whole family of circuits that share the same lower/upper bounds

Fixed Delays → Bounded Delays

Want algorithms that report the critical path delay of the slowest circuit in the circuit family

Paper: Computation of Floating Mode Delay in Combinational Circuits: Theory and Algorithms, Devadas, Keutzer, Malik, 1993 (on bspace)

RETIMING


Retiming Tradeoffs

[Figure (Shenoy, 1997): the same circuit, with signals a and b and node delays of 1 and 2, drawn with four different register placements:]

• clock period = 6, # registers = 4

• clock period = 5, # registers = 4

• clock period = 4, # registers = 3

• clock period = 2, # registers = 4

[Shenoy, 1997]

Goals of Retiming

Possible goals:

• Minimize clock period (min-period retiming)

• Minimize number of registers (min-area retiming)

• Minimize number of registers for a target clock period (constrained min-area retiming)

[Shenoy, 1997]

Abstraction: Circuit Graph

Nodes: circuit elements

Node weights: delay

Dummy node for fanout

Abstraction: Registers?

Abstraction - Registers

Arc weights indicate registers

[Figure: circuit graph with every arc labeled by its register count (0 or 1).]

Abstraction - Environment

[Figure: circuit graph including the environment, with every arc labeled by its register count (0 or 1).]

Cutset – Divides the Graph in Two

[Figure: the same graph with a cutset separating its nodes into two groups.]

Retiming: Add a register on all arcs crossing the cutset in one direction, and subtract a register from all arcs crossing the cutset in the other direction.

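[Figure: the graph with the cutset applied. Three arcs crossing in one direction go from 0 to 0+1=1, two arcs crossing in the other direction go from 1 to 1-1=0, and all other arc weights are unchanged.]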
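The cutset rule can be sketched directly on a dictionary of arc weights. In this sketch the convention that arcs entering the chosen side gain a register (and arcs leaving it lose one) is an assumption; the opposite choice is equally valid. The function name and the three-node ring are hypothetical.

```python
def retime_cutset(w, inside):
    """Apply one cutset retiming step.

    w:      dict mapping arc (u, v) -> number of registers on the arc
    inside: set of nodes on one side of the cutset
    Arcs entering `inside` gain a register, arcs leaving it lose one.
    Raises ValueError if some arc would end up with a negative count.
    """
    new_w = {}
    for (u, v), regs in w.items():
        if u not in inside and v in inside:
            regs += 1
        elif u in inside and v not in inside:
            regs -= 1
        if regs < 0:
            raise ValueError(f"arc {u}->{v} would have {regs} registers")
        new_w[(u, v)] = regs
    return new_w


# Hypothetical ring a -> b -> c -> a with one register on arc a -> b.
ring = {("a", "b"): 1, ("b", "c"): 0, ("c", "a"): 0}
moved = retime_cutset(ring, inside={"a"})
print(moved)
```

Here the register moves from the output of a to its input, and the number of registers around the cycle stays the same, which is a general property of retiming.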

Recall: Retiming Tradeoffs

[Figure (Shenoy, 1997): the example circuit before and after the cutset retiming step.]

• Before: clock period = 6, # registers = 4

• After: clock period = 5, # registers = 4

[Shenoy, 1997]

Simplest cutset surrounds one node

[Figure: a cutset drawn around a single node. Two arcs on one side of the node go from 1 to 1-1=0 and one arc on the other side goes from 0 to 0+1=1; all other arc weights are unchanged.]

Recall: Retiming Tradeoffs

[Figure (Shenoy, 1997): the example circuit before and after the single-node retiming step.]

• Before: clock period = 5, # registers = 4

• After: clock period = 4, # registers = 3

[Shenoy, 1997]

For each node v, define r(v) = # of registers moved from the outputs to the inputs.

A retiming is now an assignment r(v) for every node v such that the weight of every arc is non-negative.

[Figure: the single-node example with r(v) = -1. Two arcs at node v go from 1 to 1-1=0 and one arc goes from 0 to 0+1=1; all other arc weights are unchanged.]
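With r(v) defined this way, the retimed weight of an arc from u to v is w(u,v) + r(v) - r(u); this is the standard Leiserson-Saxe formula, stated here as background rather than taken from the slide. The sketch below applies it and checks legality. The function names and the small fragment (node v with two input arcs and one output arc, assuming the decremented arcs in the r(v) = -1 example are inputs) are hypothetical.

```python
def apply_retiming(w, r):
    """Return retimed arc weights w_r(u, v) = w(u, v) + r(v) - r(u).

    w: dict mapping arc (u, v) -> number of registers
    r: dict mapping node -> registers moved from its outputs to its inputs
       (nodes missing from r are treated as r = 0)
    """
    return {(u, v): regs + r.get(v, 0) - r.get(u, 0) for (u, v), regs in w.items()}


def is_legal(w_r):
    """A retiming is legal when no arc has a negative register count."""
    return all(regs >= 0 for regs in w_r.values())


# Hypothetical fragment: node v has one register on each input arc, none on its output.
w = {("x", "v"): 1, ("y", "v"): 1, ("v", "z"): 0}

forward = apply_retiming(w, {"v": -1})   # move registers from the inputs of v to its output
backward = apply_retiming(w, {"v": 1})   # would need a register that the output does not have
print(forward, is_legal(forward))
print(backward, is_legal(backward))
```

The forward move replaces two registers with one, matching the kind of register saving seen in the tradeoff slides; the backward move is rejected because the output arc would end up with -1 registers.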

Rest of the story

Retiming operations = graph transformations

Each graph has a vector of costs: (clock period, # registers, …)

Formulate and solve optimization problem.
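To make the cost vector concrete, the sketch below evaluates (clock period, # registers) for a given circuit graph: the clock period is taken as the largest total node delay along any path whose arcs carry no registers, and the register count is the sum of the arc weights. The function name and the three-node ring are hypothetical, and counting registers per arc is a simplifying assumption (it ignores sharing at fanout points).

```python
from collections import defaultdict, deque


def cost_vector(delay, w):
    """Return (clock period, # registers) for a circuit graph.

    delay: dict mapping node -> combinational delay
    w:     dict mapping arc (u, v) -> number of registers on the arc
    The clock period is the largest total node delay along any path whose
    arcs all carry zero registers; such arcs must not form a cycle.
    """
    succ = defaultdict(list)
    indegree = {v: 0 for v in delay}
    for (u, v), regs in w.items():
        if regs == 0:
            succ[u].append(v)
            indegree[v] += 1

    arrival = dict(delay)   # delay of the longest register-free path ending at v
    queue = deque(v for v in delay if indegree[v] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + delay[v])
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if visited != len(delay):
        raise ValueError("combinational cycle: a loop with no registers")

    return max(arrival.values()), sum(w.values())


# Hypothetical ring a -> b -> c -> a with node delays 1, 2, 2.
delay = {"a": 1, "b": 2, "c": 2}
before = {("a", "b"): 0, ("b", "c"): 0, ("c", "a"): 2}
after = {("a", "b"): 0, ("b", "c"): 1, ("c", "a"): 1}   # same ring, one register moved

print("before:", cost_vector(delay, before))
print("after: ", cost_vector(delay, after))
```

In this toy ring, moving one register shortens the longest register-free path from a-b-c to a-b without changing the register count, which is the kind of tradeoff an optimizer would search over.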

Papers (on bspace):

1. Leiserson, C. E. and J. B. Saxe (1983). "Optimizing synchronous systems." Journal of VLSI and Computer Systems: pp. 41-67.

2. Leiserson, C. E. and J. B. Saxe (1991). "Retiming synchronous circuitry." Algorithmica 6(1): pp. 5-35.

3. Shenoy, N. (1997). "Retiming: Theory and practice." Integration, the VLSI Journal 22: pp. 1-21.