async00_tut4.ppt
TRANSCRIPT
Introduction to asynchronous circuit design:
specification and synthesis
Part IV:
Synthesis from HDL
Other synthesis paradigms
Outline
• Synthesis from standard HDL (Verilog) [L. Lavagno et al., Async00]
– Subset for asynchronous specification
– Data-path/control partitioning
– Circuit architecture. Control generation
• Synthesis from asynchronous HDL (CSP, Tangram)
– CSP for control generation [A. Martin et al, Caltech]
– Tangram for silicon compilation [K. van Berkel et al, Philips]
• Control synthesis using FSMs [K. Yun, S. Nowick]
– Burst-mode machines
– Comparison with STGs
• Disclaimer: this is NOT a comprehensive review
Motivation
• Language-based design is a key enabler of the success of synchronous logic
• Use HDL as single language for
– specification
– logic simulation and debugging
– synthesis
– post-layout simulation
• HDL must support multiple levels of abstraction
Control-data partitioning
• Splitting of asynchronous control and synchronous data path
• Automated insertion of bundling delays
[Figure: CONTROL UNIT and DATA PATH blocks linked by request and acknowledge signals, with a delay element]
Design flow
[Figure: design flow. Boxes: HDL specification; control/data splitting; STG (control); synthesizable HDL (data); synthesis (petrify); synthesis (Synopsys); timing analysis (Synopsys); logic delays; delay insertion; logic implementation; HDL implementation]
Asynchronous Verilog subset by example
always begin
  wait(start);
  R = SMP * 3;
  RES = SMP * 4 + R;
  if (RES[7] == 1) RES = 0;
  else begin
    if (RES[6] == 1) RES = 1;
  end
  done = 1;
  wait(!start);
  done = 0;
end
[Figure: data path with SMP, R and RES, and control unit (C.U.) with start and done]
• begin-end for sequencing, fork-join for concurrency, if-else for input choice
• Only structured mix of sequencing, concurrency and choice can be specified
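The behaviour that the control unit has to sequence in the Verilog example can be checked against a small software model. The Python sketch below is only a reference model, not part of the flow described in the slides. The 8-bit register width (`MASK`) and the function names `compute` and `one_cycle` are assumptions introduced here; the slide only shows that RES has at least a bit 7.

```python
MASK = 0xFF  # assumption: 8-bit registers (the slide does not give widths)

def compute(smp):
    """Data-path part: R = SMP*3; RES = SMP*4 + R; then the if-else on RES[7], RES[6]."""
    r = (smp * 3) & MASK
    res = (smp * 4 + r) & MASK
    if (res >> 7) & 1:        # if (RES[7] == 1) RES = 0;
        res = 0
    elif (res >> 6) & 1:      # else if (RES[6] == 1) RES = 1;
        res = 1
    return res

def one_cycle(smp):
    """Control part: one four-phase start/done handshake around compute()."""
    trace = ["start+"]        # wait(start)
    res = compute(smp)
    trace.append("done+")     # done = 1
    trace.append("start-")    # wait(!start)
    trace.append("done-")     # done = 0
    return res, trace

if __name__ == "__main__":
    for smp in (1, 10, 20):
        print(smp, one_cycle(smp))
```

The split into `compute` and `one_cycle` mirrors the data-path/control partitioning: the arithmetic is ordinary synchronous-style logic, while the handshake events are what the STG of the controller has to order.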
Controller design flow
[Figure: controller design flow. Labels: HDL; syntax-directed translation; trace expressions; Petri net; transformations; reductions; synthesis; circuit]
Trace expressions: example
( a || ( b ; c ) ) || ( d e )
[Figure: tree of the trace expression, with operator nodes || and ; and leaves a, b, c, d, e]
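Reduction example
[Figure: Petri net with transitions a, b, c, d, e, f, g, h, and the reduced net whose transitions are labelled d ; a ; ( b || f ), c, and g ; h ; e]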
Transformation: concurrency reduction
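[Figure: Petri net with transitions a, b, c, d, f and the corresponding trace-expression tree (operator nodes ; and ||)]
Concurrency in TE: b and f have a common parallel father
[Figure: the Petri net and the trace-expression tree after the reduction]
f and b are ordered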
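The "common parallel father" test can be sketched on a trace-expression tree: two events are concurrent when the lowest node containing both is a || node. The Python below is an illustration only. The trees `before` and `after` are a hypothetical reading of the figure (a ; ((b ; c) || f) ; d, and one possible ordering of f before b), and the helper names `leaves` and `common_father` are introduced here.

```python
def leaves(tree):
    """Set of event names in a trace-expression tree (leaf = str, node = (op, children))."""
    if isinstance(tree, str):
        return {tree}
    _, children = tree
    result = set()
    for child in children:
        result |= leaves(child)
    return result

def common_father(tree, x, y):
    """Operator of the lowest node whose subtree contains both x and y, else None."""
    if isinstance(tree, str):
        return None
    op, children = tree
    for child in children:
        names = leaves(child)
        if x in names and y in names:
            return common_father(child, x, y)   # both live deeper down
    names = leaves(tree)
    return op if x in names and y in names else None

# Hypothetical tree before the reduction: a ; ((b ; c) || f) ; d
before = (";", ["a", ("||", [(";", ["b", "c"]), "f"]), "d"])
# One possible result of the concurrency reduction: f is ordered before b
after = (";", ["a", "f", (";", ["b", "c"]), "d"])

if __name__ == "__main__":
    print(common_father(before, "b", "f"))  # concurrent before the reduction
    print(common_father(after, "b", "f"))   # ordered after the reduction
```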
Synthesis
• Place-based encoding (based on a David-cell approach)
• Transformations to improve area and performance
• Structural methods to derive a circuit [Pastor et al.] Transactions on CAD, Nov’98
Place-based encoding
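[Figure: Petri net fragment (places p1 and p2, transition t1, place p3, transition t2, place p4) and its expansion into place-signal transitions]
t1: p3+, then p1- and p2-
t2: p4+, then p3-
p1+ and p2+, then p4-
Markings (p1 p2 p3 p4): 1100, 0010, 0001
ER(t1) = 111-
ER(t2) = --11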
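The set-then-reset pattern of the place encoding can be replayed in a few lines of Python. This is a minimal sketch under assumptions: the dictionary `NET` is a hypothetical transcription of the figure (t1 consumes p1 and p2 and produces p3, t2 consumes p3 and produces p4), and `fire`, `code` and `matches` are names introduced here. It shows that the code reached right after the set phase of each transition falls inside the cube the slide gives for that transition.

```python
# Hypothetical transcription of the net: transition -> (input places, output places)
NET = {"t1": (["p1", "p2"], ["p3"]), "t2": (["p3"], ["p4"])}
PLACES = ["p1", "p2", "p3", "p4"]

def code(state):
    """Binary code of the place signals in the order p1 p2 p3 p4."""
    return "".join(str(state[p]) for p in PLACES)

def fire(transition, state):
    """Set the output places first, then reset the input places.

    Returns the list of signal events and the code reached after each event.
    """
    pre, post = NET[transition]
    steps = [(p, 1, "+") for p in post] + [(p, 0, "-") for p in pre]
    events, codes = [], []
    for place, value, sign in steps:
        state[place] = value
        events.append(place + sign)
        codes.append(code(state))
    return events, codes

def matches(cube, bits):
    """True if a binary code lies inside a cube such as '111-'."""
    return all(c in ("-", b) for c, b in zip(cube, bits))

if __name__ == "__main__":
    state = {"p1": 1, "p2": 1, "p3": 0, "p4": 0}   # marking 1100
    print(fire("t1", state))
    print(fire("t2", state))
```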
Synthesis example: VME bus
[Figure: STG of the VME bus controller (LDS+, LDTACK+, D+, DTACK+, DSr-, D-, DTACK-, LDS-, LDTACK-, DSr+) and, after place encoding, its expansion with place signals p1 to p11]
VME bus spec after transforms
[Figure: the place-encoded VME bus specification (place signals p1 to p11) and, after reductions and transforms, the resulting specification with a single remaining place signal p9: ldtack+, lds+, d+, dtack+, dsr-, p9+, d-, lds-, dtack-, p9-, ldtack-, dsr+]
Deriving next-state function
[Figure: STG with transitions x+, z+, z-, y-, x-, y+ and places p1 to p7, annotated with cubes 000, 1-0, 1-1, 0-1, -0-, -1-, 010]
Next-state function of signal y?
Deriving next-state function
[Figure: STG with transitions x+, z+, z-, y-, x-, y+ and places p1 to p7, annotated with cubes 000, 1-0, 1-1, 0-1, 10--01, 11--11, 010]
Next-state function of signal y?
y = x + z
Conclusion
• Initial prototype of automated flow without state explosion for ASIC design
– From HDLs (control / data splitting)
– Existing tools for data-path synthesis
– Direct synthesis guarantees implementation (HDL → Petri net, Petri-net-based encoding)
– Synthesis of large controllers by efficient spec models (Free-choice Petri nets + trace expressions)
– Exploration of the design space (optimization) by property-preserving transformations
– Logic synthesis by structural methods
• Quality of design often acceptable
• Timing post-optimization can be applied
Synthesis from asynchronous HDL
• CSP based languages
• CSP = communicating sequential processes [T. Hoare]
• Two synthesis techniques
– based on program transformations [Caltech]
– based on direct compilation [Philips]
• Tools are more mature than for asynchronous synthesis from standard HDL
• Complete shift in design methodology is required
Using CSP for control generation
• After li goes high do full handshake at the right, then complete handshake at the left and iterate.
[Figure: Q element with left handshake (li, lo) and right handshake (ro, ri)]
CSP: *[[li];ro+;[ri];ro-;[not ri];lo+;[not li];lo-]
STG: li+ → ro+ → ri+ → ro- → ri- → lo+ → li- → lo- → (back to li+)
• “;” = sequencing operator
• ro+ = ro goes high; ro- = ro goes low
• [li] = wait until li is high; [not li] = wait until li is low
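The correspondence between the CSP program of the Q element and its STG trace can be replayed with a small interpreter. The Python below is a sketch under assumptions: `PROGRAM` is a hand transcription of the CSP text into wait/set steps, and `environment` is a hypothetical four-phase environment (ri follows ro; li rises when lo is low and falls when lo is high). Neither is taken from a tool.

```python
# Hand transcription of *[[li];ro+;[ri];ro-;[not ri];lo+;[not li];lo-]
PROGRAM = [
    ("wait", "li", 1), ("set", "ro", 1),
    ("wait", "ri", 1), ("set", "ro", 0),
    ("wait", "ri", 0), ("set", "lo", 1),
    ("wait", "li", 0), ("set", "lo", 0),
]

def environment(state):
    """Next input event of an assumed four-phase environment, or None."""
    if state["ri"] != state["ro"]:        # right side acknowledges ro
        return ("ri", state["ro"])
    if state["li"] == state["lo"]:        # left side: request when lo is low,
        return ("li", 1 - state["lo"])    # withdraw the request when lo is high
    return None

def run(program, iterations=1):
    """Execute the program against the environment and return the event trace."""
    state = dict.fromkeys(["li", "lo", "ri", "ro"], 0)
    trace = []

    def fire(signal, value):
        state[signal] = value
        trace.append(signal + ("+" if value else "-"))

    for _ in range(iterations):
        for kind, signal, value in program:
            if kind == "set":
                fire(signal, value)
                continue
            while state[signal] != value:  # "wait": let the environment move
                event = environment(state)
                if event is None:
                    raise RuntimeError("deadlock while waiting for " + signal)
                fire(*event)
    return trace

if __name__ == "__main__":
    print(" ".join(run(PROGRAM)))
```

Running more than one iteration repeats the same eight events, which is the cyclic behaviour of the STG.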
Using CSP for control generation
CSP: *[[li];ro+;[ri];ro-;[not ri];lo+;[not li];lo-]
Production rules:
li -> ro+; ri -> ro-
not ri -> lo+; not li -> lo-
[Figure: circuit generating ro from li and ri; label: weak]
• Conflict: the rules for ro+ and ro- are not mutually exclusive (li and ri can be high at the same time)
• Eliminate conflict by state signal insertion (= solving CSC, complete state coding)
Conflict elimination
CSP: *[[li];ro+;[ri];x+;[x];ro-;[not ri];lo+;[not li];x-;[not x];lo-]
Production rules:
not x and li -> ro+; x or not li -> ro-
x and not ri -> lo+; not x or ri -> lo-
ri -> x+; not li -> x-
[Figure: circuit with a flip-flop (FF) holding x and not x, inputs li and ri, outputs lo and ro]
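Whether a set of production rules is conflict-free can be checked by walking the intended handshake trace and testing, in every state, whether the pull-up and pull-down guards of a signal are true together. The Python below is a minimal sketch, not the Caltech method. The function name `conflicting_signals`, the all-zero initial state, and the two traces (read off the CSP programs by dropping the waits and adding the environment responses) are assumptions made here.

```python
def conflicting_signals(rules, trace, signals):
    """Signals whose up and down guards are both true in some state along the trace."""
    state = {s: 0 for s in signals}      # assumed initial state: all signals low
    found = set()

    def check():
        for signal, (up, down) in rules.items():
            if up(state) and down(state):
                found.add(signal)

    check()
    for event in trace:
        state[event[:-1]] = 1 if event[-1] == "+" else 0
        check()
    return found

# Rules before state-signal insertion: li -> ro+; ri -> ro-; not ri -> lo+; not li -> lo-
PLAIN = {
    "ro": (lambda s: s["li"], lambda s: s["ri"]),
    "lo": (lambda s: not s["ri"], lambda s: not s["li"]),
}
PLAIN_TRACE = "li+ ro+ ri+ ro- ri- lo+ li- lo-".split()

# Rules after inserting x
WITH_X = {
    "ro": (lambda s: not s["x"] and s["li"], lambda s: s["x"] or not s["li"]),
    "lo": (lambda s: s["x"] and not s["ri"], lambda s: not s["x"] or s["ri"]),
    "x": (lambda s: s["ri"], lambda s: not s["li"]),
}
WITH_X_TRACE = "li+ ro+ ri+ x+ ro- ri- lo+ li- x- lo-".split()

if __name__ == "__main__":
    print(sorted(conflicting_signals(PLAIN, PLAIN_TRACE, ["li", "ri", "ro", "lo"])))
    print(sorted(conflicting_signals(WITH_X, WITH_X_TRACE, ["li", "ri", "ro", "lo", "x"])))
```

The check only visits states on the intended trace, so it is a sanity check for this one handshake cycle rather than a full reachability analysis.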
Conclusions
• Generating circuits from CSP control program is similar to STG synthesis
• One can be reduced to the other
• Particular technique may vary. Direct CSP program transformations can be (and were) used instead of methods based on state space generation
• See reference list for more details
Buffer example in Tangram
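(a?byte & b!byte)
begin
  x0: var byte
| forever do
    a?x0 ; b!x0
  od
end
[Figure: handshake circuit of the buffer with components *, ;, T, T and variable x between ports a and b; legend: passive port, active port; annotations: data path, Q element]
Each circle mapped to a netlist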
Summary
• Tangram program is partitioned into data path and control
• Data path is implemented as dual or single rail
• Control is mapped to composition of standard elements (“;” “||” etc)
• Each standard element is mapped to a circuit
• Post-optimization is done
• Composing islands of control elements and re-synthesis with STG can give more aggressive optimization
• Philips made a few chips using Tangram, including a product: 8051 micro-controller in low-power pager Muna (25 wks battery life from one AAA battery)
• Similar approach used in Balsa (Manchester Univ., public domain)
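The idea that each language construct maps to one standard element can be sketched as a syntax-directed walk over the buffer program. The Python below is an illustration only, not the Tangram compiler. The tuple-based syntax tree, the function name `compile_program`, and the component labels (chosen to match the *, ; and T symbols in the buffer figure) are assumptions introduced here.

```python
def compile_program(variables, statement):
    """Walk the syntax tree and emit one component per construct."""
    netlist = ["variable " + name for name in variables]

    def walk(node):
        kind = node[0]
        if kind == "forever":            # forever do ... od  ->  *
            netlist.append("repeater")
            walk(node[1])
        elif kind == "seq":              # s1 ; s2            ->  ;
            netlist.append("sequencer")
            for sub in node[1:]:
                walk(sub)
        elif kind == "in":               # port?var           ->  T
            netlist.append("transferrer %s -> %s" % (node[1], node[2]))
        elif kind == "out":              # port!var           ->  T
            netlist.append("transferrer %s -> %s" % (node[2], node[1]))
        else:
            raise ValueError("unknown construct: %r" % (kind,))

    walk(statement)
    return netlist

# forever do a?x0 ; b!x0 od
BUFFER = ("forever", ("seq", ("in", "a", "x0"), ("out", "b", "x0")))

if __name__ == "__main__":
    for component in compile_program(["x0"], BUFFER):
        print(component)
```

Because the translation is one component per construct, the size of the netlist grows linearly with the program text, which is what makes later peephole or STG-based re-synthesis of control islands worthwhile.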
Burst mode FSM
[Figure: burst-mode state graph with states s1 to s4 and arcs labelled a+b+/y+, c+/y-, c-/y+, a-/x+y-, b-/x-]
• Close to synchronous FSMs with binary encoded I/O
• Work in bursts:
– Input transitions fire
– Output transitions fire
– State signals change
• Mostly limited to fundamental mode: next input burst cannot arrive before stabilization at the outputs
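The burst discipline can be sketched as a small interpreter: input edges accumulate until they form a complete input burst of the current state, then the output burst fires and the state changes. The Python below is a minimal sketch under assumptions. The arc labels come from the slide, but the state topology in `SPEC` (which arc leaves which state) is a guess that happens to be consistent with the signal values, and `run` is a name introduced here.

```python
# state -> list of (input burst, output burst, next state); topology is assumed
SPEC = {
    "s1": [({"a+", "b+"}, ["y+"], "s2")],
    "s2": [({"c+"}, ["y-"], "s3"), ({"a-"}, ["x+", "y-"], "s4")],
    "s3": [({"c-"}, ["y+"], "s2")],
    "s4": [({"b-"}, ["x-"], "s1")],
}

def run(spec, input_edges, state="s1"):
    """Feed input edges one at a time; return the output edges and the final state."""
    pending = set()                      # input edges seen since the last burst
    outputs = []
    for edge in input_edges:
        pending.add(edge)
        for burst, out_burst, next_state in spec[state]:
            if pending == burst:         # the input burst is complete
                outputs.extend(out_burst)
                state = next_state
                pending = set()
                break
    return outputs, state

if __name__ == "__main__":
    # Edges inside a burst may arrive in any order (here b+ before a+)
    print(run(SPEC, ["b+", "a+", "c+", "c-", "a-", "b-"]))
```

The interpreter models fundamental mode implicitly: it only accepts the next input edge after the previous output burst has been produced.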
Extended Burst mode
[Figure: extended burst-mode state graph with states s1 to s4 and arcs labelled a+b*/y+, <b+>c+/y-, c-/y+, <b+>a-/x+y-, b-/x-]
• Directed don’t cares (b*): some concurrency is allowed for input transitions that do not influence an output burst
• Conditional guards <b+> = “if b=1 then …”
Synthesis of XBM
• Next state and output functions free of functional and logic hazards
• Sequential feedbacks should not introduce new hazards
• State assignment
– one state of the BM spec is mapped to one layer of the Karnaugh map
– compatible layers are merged
– layers are compatible if merging does not introduce CSC violations or hazards
– layers are encoded using a race-free encoding
XBM and STG
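[Figure: the extended burst-mode state graph (states s1 to s4; arcs a+b*/y+, <b+>c+/y-, c-/y+, <b+>a-/x+y-, b-/x-) next to an equivalent STG with transitions x-, a+, y+, b+, eps, c-, a-, c+, y-, y+, x+, y-, b-]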
Summary
• Specification: XBM is a subclass of STGs
• Synthesis: techniques are extensions of synchronous state assignment and logic minimization
• Timing:
– environment is limited to fundamental mode (difficult for pipelined and highly concurrent systems)
– internals are delay insensitive
• See reference list for details