ECE 551: Digital System Design & Synthesis
Lecture 08
The Synthesis Process
Constraints and Design Rules
High-Level Synthesis Options

Pre-Synthesis Steps
  Syntax Check
    Makes sure your HDL code follows the syntax rules of the Standard.
    Finds errors like typos, missing semicolons, “begin” without “end”, assigning to a net in a behavioral block, etc.
    Only a surface-level check
      Checks each module in isolation; doesn’t look at how they fit together
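
As a small illustration of the kind of error a syntax check catches, the sketch below shows a legal procedural assignment next to the illegal "assigning to a net in a behavioral block" case. The module and signal names are hypothetical and not from the lecture; the illegal line is left commented out so the file still compiles.

```verilog
// Hypothetical example: all names here are illustrative only.
module syntax_demo(input a, b, output reg y_var, output wire y_net);
  // Legal: a continuous assignment drives a net (wire).
  assign y_net = a & b;

  // Legal: a procedural (behavioral) block assigns to a variable (reg).
  always @(*) begin
    y_var = a | b;
    // Illegal if uncommented: y_net is a net, and nets cannot be
    // assigned inside an always block. A syntax check reports this.
    // y_net = a ^ b;
  end
endmodule
```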

Pre-Synthesis Steps
  Elaboration
    “Elaborates” HDL statements
      Unrolls FOR loops
      Computes values of constant functions
      Replaces parameters with their values
      Substitutes macro text
      Evaluates generate conditionals and loops
    Checks to make sure instantiated modules are defined
    Checks inter-module connections for mismatched input/output connections (i.e. module port width not the same as connected net/variable width)
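
To make the elaboration steps concrete, here is a minimal sketch of a design that relies on them: a parameter that gets replaced by a value, a generate loop that gets unrolled, and instantiated modules whose port widths must match. The modules (`full_adder`, `ripple_adder`) and the small self-checking testbench are assumptions made for illustration, not code from the lecture.

```verilog
// Hypothetical example: a parameterized ripple-carry adder built with generate.
module full_adder(input a, b, cin, output sum, cout);
  assign sum  = a ^ b ^ cin;
  assign cout = (a & b) | (cin & (a ^ b));
endmodule

module ripple_adder #(parameter WIDTH = 4)
  (input [WIDTH-1:0] a, b, input cin, output [WIDTH-1:0] sum, output cout);
  wire [WIDTH:0] c;            // carry chain; its width is fixed at elaboration
  assign c[0] = cin;
  assign cout = c[WIDTH];
  genvar i;
  generate
    for (i = 0; i < WIDTH; i = i + 1) begin : stage
      // Elaboration unrolls this loop into WIDTH full_adder instances
      // and checks each port connection against the declared port width.
      full_adder fa(.a(a[i]), .b(b[i]), .cin(c[i]), .sum(sum[i]), .cout(c[i+1]));
    end
  endgenerate
endmodule

module ripple_adder_tb;
  reg  [7:0] a, b;
  reg        cin;
  wire [7:0] sum;
  wire       cout;
  integer    errors;
  // WIDTH is overridden here; elaboration substitutes the value 8.
  ripple_adder #(.WIDTH(8)) dut(.a(a), .b(b), .cin(cin), .sum(sum), .cout(cout));
  initial begin
    errors = 0;
    a = 8'd200; b = 8'd100; cin = 1'b1; #1;
    if ({cout, sum} !== 9'd301) errors = errors + 1;
    a = 8'd0; b = 8'd0; cin = 1'b0; #1;
    if ({cout, sum} !== 9'd0) errors = errors + 1;
    if (errors == 0) $display("PASS"); else $display("FAIL: %0d errors", errors);
    $finish;
  end
endmodule
```

If `full_adder` were missing from the file list, or a port were connected to a net of the wrong width, the problem would surface at elaboration rather than at the per-module syntax check.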

Pre-Synthesis Steps
  Design Check
    Checks design for issues that may make it unsynthesizable, but are otherwise legal HDL
      Detects multiple drivers to non-tristates
      Detects combinational loops
      Gives errors or warnings about unsynthesizable constructs like delays, unsupported operators, etc.
      Warns about unconnected or constant-value ports
      May give warnings about inferred latches
    Many of these produce warnings rather than errors; make sure you read the warnings when synthesizing!
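
The inferred-latch warning is easiest to understand from an example. The sketch below (hypothetical module and signal names) puts an incompletely assigned combinational block next to a fully assigned one; both are legal HDL, but only the first makes the tool infer a latch.

```verilog
// Hypothetical example: names are illustrative only.
module latch_demo(input en, d, output reg q_latch, output reg q_comb);
  // Incomplete assignment: when en is 0, q_latch must hold its old value,
  // so a synthesis tool infers a latch and typically warns about it.
  always @(*) begin
    if (en)
      q_latch = d;
  end

  // Complete assignment: q_comb gets a value on every path, so this is
  // purely combinational logic (equivalent to en & d).
  always @(*) begin
    q_comb = 1'b0;        // default value covers the en == 0 case
    if (en)
      q_comb = d;
  end
endmodule
```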

Synthesis Process
  Inputs
    Functional hardware description in HDL
    List of design constraints and design rules
      Desired clock frequency / maximum delay
      Limits on area, power, capacitance
    Technology library (logic cells, wire models, etc.)
    User-specified synthesis options/strategies
  Output
    Ideally: A netlist that uses the specified technology library, produces the same behavior as the functional description, and meets the design constraints
    Reports that summarize the area and timing of the implementation

Logic Synthesis Steps
  Translation
    The synthesis tool identifies the behavior of high-level constructs and replaces them with a structural representation from a generic technology library.
    Examples: “adder”, “multiplier”, “flip-flop”, “latch”
  High-Level Optimizations
    The tool performs optimizations at the Boolean equation level
    The types of optimizations depend on your strategies
    Examples: Reducing the number of logic levels, minimizing the number of Boolean operations, eliminating redundant computations

Logic Synthesis Steps
  Mapping
    The synthesis tool replaces the generic representations of gates and logic structures with equivalent hardware representations from the provided technology library
    The netlist now consists of a structural representation of logic cells (Standard Cell) or LUTs/CLBs (FPGA)
  Low-Level Optimizations
    The tool performs optimizations at the logic cell level, either to reduce delay or reduce area
    Examples: Duplicating logic, re-ordering operations to minimize delay, re-timing registers

A Brief Aside on Mapping
  People commonly say that when using Structural Verilog, you know exactly what gates you are getting.
  Is this true? It actually depends on what’s in your Tech Library
    If your library contains an XOR gate, then an XOR primitive will be mapped to that gate
    But what if your Tech Library only contains NAND gates? Or only Look-up Tables?
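
One way to picture the NAND-only case: the structural `xor` primitive still has to be realized somehow, for instance with the classic four-NAND arrangement. The sketch below is only an illustration of that idea (the module names are hypothetical, and a real mapper may choose a different structure); the testbench compares it against the built-in `xor` primitive for all input combinations.

```verilog
// Hypothetical example: XOR realized with only 2-input NAND primitives.
module xor_from_nand(input a, b, output y);
  wire n1, n2, n3;
  nand g1(n1, a, b);     // n1 = ~(a & b)
  nand g2(n2, a, n1);    // n2 = ~(a & ~b)
  nand g3(n3, b, n1);    // n3 = ~(b & ~a)
  nand g4(y, n2, n3);    // y  = (a & ~b) | (~a & b)
endmodule

module xor_from_nand_tb;
  reg  a, b;
  wire y_prim, y_nand;
  integer i, errors;
  xor           g_ref(y_prim, a, b);               // what the designer wrote
  xor_from_nand dut(.a(a), .b(b), .y(y_nand));     // what a NAND-only library allows
  initial begin
    errors = 0;
    for (i = 0; i < 4; i = i + 1) begin
      {a, b} = i;        // low two bits of i
      #1;
      if (y_prim !== y_nand) errors = errors + 1;
    end
    if (errors == 0) $display("PASS"); else $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```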

Why require Constraints & Strategies?
  Synthesis is hard (NP-hard!)
    For a circuit of any useful size, the number of possible implementations is enormous
    It is too computationally intensive to try them all
    Need to know when a solution is good enough to stop
    We usually give the tool hints on how to proceed
  Often there is no universally “best” solution
    Area vs. delay
    Throughput vs. latency
    Power vs. frequency
    Constraints & strategies allow us to manage tradeoffs to find the solution that meets our needs

Constraint Examples
  Minimize area

  module mac(input clk, rst, input [31:0] in, output [63:0] out);
    reg [31:0] constreg;
    reg [63:0] mult, add, result;
    reg [2:0]  count;

    assign out = result;
    always @(*) mult = constreg * in;
    always @(*) add = mult + result;

    always @(posedge clk) begin
      if (rst) begin
        constreg <= in;
        result   <= 0;
        count    <= 0;
      end else if (count > 0) begin
        result <= add;
        count  <= count - 1;
      end else begin
        result <= 0;
        count  <= 4;
      end
    end
  endmodule

Setting Design Constraints
  set_max_area 20000
    Sets maximum area to 20,000 cell units
  set_max_delay 4 -to [all_outputs]
    Sets maximum delay of 4 to any output
  set_max_dynamic_power 10mW
    Sets maximum dynamic power to 10 mW
  create_clock "clk" -period 10
    Specifies that port clk is a clock with a period of 10 ns
  create_clock -name "my_clk" -period 12
    Creates a virtual clock called my_clk with a period of 12 ns; use with combinational logic

Constraint Examples
  CLK_PERIOD = 4 (250 MHz)
  MAX_AREA = 80000
  Arrival: 3.73
  Slack: 0.01
  Area: 68122
  Slack = CLK_PERIOD - (Arrival + Library Setup Time)
  Library Setup Time is approximately 0.25-0.26 ns for these examples
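
Plugging the numbers from this slide into the slack formula gives the reported value. The worked line below assumes the setup time is 0.26 ns (the upper end of the quoted 0.25-0.26 ns range) and that all times are in ns; the symbol names are introduced here for the illustration only.

```latex
\begin{align*}
\text{Slack} &= T_{\text{clk}} - \left(t_{\text{arrival}} + t_{\text{setup}}\right) \\
             &= 4 - (3.73 + 0.26) \\
             &= 0.01
\end{align*}
```

A positive slack means the path meets timing with that much margin; a negative slack means the constraint is violated by that amount.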

Constraint Examples
  Maximize speed (same mac module as in the Minimize area example)

Constraint Examples
  CLK_PERIOD = 4 (250 MHz)
  MAX_AREA = 80000
  Arrival: 3.73 (+ 0.26 = 3.99)
  Slack: 0.01
  Area: 68122

Constraint Examples
  CLK_PERIOD = 3.6 (278 MHz)
  MAX_AREA = 80000
  Arrival: 3.46 (+ 0.26 = 3.68)
  Slack: -0.08
  Area: 73131

Constraint Examples
  CLK_PERIOD = 3.7 (270 MHz)
  MAX_AREA = 90000
  Arrival: 3.45 (+ 0.25 = 3.7)
  Slack: 0.00
  Area: 75673

Optimization Priorities
  Design rules have the highest priority
    Design rules have priority over timing goals
    Timing goals have priority over area goals
  To prioritize area constraints:
    use the -ignore_tns (total negative slack) option when you specify the area constraint:
      set_max_area -ignore_tns 10000
  To change priorities use set_cost_priority
    Example: set_cost_priority -delay
  To remove all optimization constraints use remove_constraint

Compiling the Design
  Once optimization specifications are set, the design is compiled
  The compile command
    Logic-level and gate-level synthesis
    Optimizations of the design
  The compile_ultra command
    Two-pass high effort compile of the design
    May want to compile normally first to get ballpark figure (higher effort == longer compilation)
  What is the purpose of doing multiple passes?

Synthesis Strategies
  Even after supplying HDL code, Tech Library, and Constraints, the designer is still responsible for the Synthesis Strategy.
  Why do we use Strategies?
    The amount of CPU time and memory we devote to synthesis are still limited resources
    The designers may already have a good idea about what sort of hardware they want

Compiling the Design
  Useful compile options include:
    -map_effort low | medium | high (default is medium)
    -area_effort low | medium | high (default same as map_effort)
    -incremental_mapping (may improve already-mapped)
    -verify (compares initial and synthesized designs)
    -ungroup_all (collapses all levels of design hierarchy)

Top-Down Compilation
  Use the top-down compile strategy when compile time and synthesizer memory are not limiters
  Synthesizes each design unit separately and uses top-level constraints
  Basic steps are:
    Read in the entire design using analyze/elaborate or:
      acs_read_hdl -recurse $TOP_DESIGN
    Resolve multiple instances of any design references with uniquify
    Apply attributes and constraints to the top level
    Compile the design using compile or compile_ultra

Example Top-Down Script
  # read in the entire design
  analyze -library WORK -format verilog {E.v D.v C.v B.v A.v TOP.v}
  elaborate TOP
  current_design TOP
  # link TOP to the libraries and modules it references
  link
  # set design constraints
  set_max_area 2000
  # resolve multiple references
  uniquify
  # compile the design
  compile

Bottom-Up Compile Strategy
  The bottom-up compile strategy
    Compile the subdesigns separately and then incorporate them
    Top-level constraints are applied and the design is checked for violations.
  Advantages:
    Compiles large designs more quickly (divide-and-conquer)
    Requires less memory than top-down compile
  Disadvantages:
    Need to develop local constraints as well as global constraints
    May need to repeat process several times to meet design goals
  Might use if memory or CPU time are limited

Compile-Once-Don’t-Touch Method
  The compile-once-don’t-touch method uses the set_dont_touch command to preserve the compiled subdesign
    current_design top
    characterize U2/U3
    current_design C
    compile
    current_design top
    set_dont_touch {U2/U3 U2/U4}
    compile
  What are advantages and disadvantages?

Resolving Multiple References
  In a hierarchical design, subdesigns are often referenced by more than one cell instance

Uniquify Method
  The uniquify command creates a uniquely named copy of the design for each instance.
    current_design top
    uniquify
    compile
  Each design optimized separately
  What are advantages and disadvantages?

Ungroup Method (“Flattening”)
  The ungroup command makes unique copies of the design and removes levels of the hierarchy
    current_design B
    ungroup {U3 U4}
    current_design top
    compile
  What are advantages and disadvantages?

Benefits of Ungrouping Hierarchy

  module logic1(input a, c, e, output reg x);
    always @(a, c, e)
      x = ((~a|~c) & e) | (a&c);
  endmodule

  module logic2(input a, b, c, d, output reg y);
    always @(a, b, c, d)
      y = ((((~a|~c)&b) | ((a|~b)&c))&d) | ((a|~b)&~d);
  endmodule

  module logic(input a, b, c, d, e, f, output reg z);
    wire x, y;
    // module instances need instance names (u1, u2)
    logic1 u1(a, c, e, x);
    logic2 u2(a, b, c, d, y);
    always @(x, y, f)
      z = (~f&x) | (f&y);
  endmodule

  Without Hierarchy
    Area: 34.15
    Delay: 0.25
  With Hierarchy
    Area: 36.15
    Delay: 0.25

Ungrouping versus Boolean Flattening
  Ungrouping is commonly referred to as “Flattening the Hierarchy”, even by tool vendors
  Because of this, many people incorrectly think the “set_flatten true” option in Synopsys is the same as “ungroup”
  set_flatten true tells Design Vision to flatten the Boolean equations describing your logic down to a two-level expression. That is, to create a Sum of Products expression.
  Flattening Boolean equations is a way of reducing delay at the cost of increased area; we’ll talk about it more in a later lecture.
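
To show what a two-level Sum of Products form looks like, the sketch below takes the expression from the logic1 module earlier in the lecture and expands it by hand. This is a hand derivation for illustration, not tool output, and the module names are hypothetical; the testbench checks that both forms agree for all eight input combinations.

```verilog
// Hypothetical example: factored (multi-level) form versus a hand-flattened
// two-level sum-of-products form of the same function.
module x_factored(input a, c, e, output x);
  assign x = ((~a | ~c) & e) | (a & c);          // as written in logic1
endmodule

module x_two_level(input a, c, e, output x);
  assign x = (~a & e) | (~c & e) | (a & c);      // one AND level, one OR level
endmodule

module flatten_tb;
  reg  a, c, e;
  wire x1, x2;
  integer i, errors;
  x_factored  u1(.a(a), .c(c), .e(e), .x(x1));
  x_two_level u2(.a(a), .c(c), .e(e), .x(x2));
  initial begin
    errors = 0;
    for (i = 0; i < 8; i = i + 1) begin
      {a, c, e} = i;     // low three bits of i
      #1;
      if (x1 !== x2) errors = errors + 1;
    end
    if (errors == 0) $display("PASS"); else $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```

The hierarchy is untouched in both versions; only the shape of the Boolean equation changes, which is the distinction between Boolean flattening and ungrouping. A tool may also minimize the flattened form further (this particular function reduces to `e | (a & c)`).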

Dealing with Structured Logic
  Sometimes we do not want the synthesis tool to try to optimize our Boolean equations.
  Structured Logic refers to Boolean logic operations that are structured in a certain way to achieve a goal, such as reduced delay or fault tolerance.
    Examples: Carry-Lookahead Adder, Wallace Multiplier, duplicated logic
  set_structure true (default): tells the tool it can re-order, factor, or decompose the logic equations
  set_structure false: tells the tool to leave the logic alone
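
As an example of structured logic a designer might want left alone, here is a minimal sketch of a 4-bit carry-lookahead adder. Each carry is written as a flat function of the generate and propagate signals on purpose, so re-factoring these equations into a ripple chain would defeat the reason for writing them this way. The module name `cla4` and the testbench are assumptions for illustration, not code from the lecture.

```verilog
// Hypothetical example: 4-bit carry-lookahead adder with explicit carry equations.
module cla4(input [3:0] a, b, input cin, output [3:0] sum, output cout);
  wire [3:0] g = a & b;      // generate:  this bit creates a carry
  wire [3:0] p = a ^ b;      // propagate: this bit passes a carry along
  wire [4:0] c;
  assign c[0] = cin;
  // Every carry depends only on g, p, and cin, not on the previous carry.
  assign c[1] = g[0] | (p[0] & c[0]);
  assign c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c[0]);
  assign c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                     | (p[2] & p[1] & p[0] & c[0]);
  assign c[4] = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                     | (p[3] & p[2] & p[1] & g[0])
                     | (p[3] & p[2] & p[1] & p[0] & c[0]);
  assign sum  = p ^ c[3:0];
  assign cout = c[4];
endmodule

module cla4_tb;
  reg  [3:0] a, b;
  reg        cin;
  wire [3:0] sum;
  wire       cout;
  integer    i, errors;
  cla4 dut(.a(a), .b(b), .cin(cin), .sum(sum), .cout(cout));
  initial begin
    errors = 0;
    for (i = 0; i < 512; i = i + 1) begin
      {a, b, cin} = i;       // low nine bits of i: exhaustive input sweep
      #1;
      if ({cout, sum} !== a + b + cin) errors = errors + 1;
    end
    if (errors == 0) $display("PASS"); else $display("FAIL: %0d mismatches", errors);
    $finish;
  end
endmodule
```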

Checking your Design
  Use the check_design command to verify design consistency.
    Usually run both before and after compiling a design
    Gives a list of warning and error messages
      Errors will cause compiles to fail
      Warnings indicate a problem with the current design
      Try to fix all of these, since later they can lead to problems
    Use check_design -summary or check_design -no_warnings to limit the number of warnings given
  Use check_timing to locate potential timing problems

Analyzing your Design [1]
  There are several commands to analyze your design
  report_design
    displays characteristics of the current design
      operating conditions, wire load model, output delays, etc.
      parameters used by the design
  report_area
    displays area information for the current design
      number of nets, ports, cells, references
      area of combinational logic, non-combinational, interconnect, total

Analyzing Your Design [2]
  report_hierarchy
    displays the reference hierarchy of the current design
    tells modules/cells used and the libraries they come from
  report_timing
    reports timing information about the design
    default shows one worst case delay path
  report_resources
    Lists the resources and datapath blocks used by the current design
  Can send reports to files
    report_resources > cmult_resources.rpt
  Lots of other report commands available

Synthesis Scripts
  Synthesis scripts provide a convenient method for performing synthesis multiple times
  To run the script, enter the directory which contains the Verilog code and type:
    dc_shell -tcl_mode -f script.tcl
    dc_shell -tcl_mode -f script.tcl > log.txt &
  This will start the script and store its output to log.txt

Example Synthesis Script
  analyze -library WORK -format verilog {/.register_file_behave.v}
  elaborate reg_file_behave -architecture verilog -library WORK
  create_clock -name "clk" -period 2 -waveform {0 1} {clk}
  set_dont_touch_network [ find clock clk ]
  set_max_area 30000
  check_design
  uniquify
  compile -map_effort medium
  report_area > area_report.txt
  report_timing > timing_report.txt
  report_constraint -all_violators > violator_report.txt

Design Optimization: FIR Filter
  Used in signal processing
    Passes through some data but not all (filter!)
    Example: Remove noise from image/sound
  Uses multipliers and adders
    Multiply constant “tap” value against time-delayed input value

    $y[n] = \sum_{k=0}^{M} b_k \, x[n-k]$

  In the Verilog, y is out, b_k is taps, and x is data
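
FIR Filter Design
  [Figure: direct-form FIR filter block diagram. The input x[n] passes through a chain of z^-1 delay elements producing x[n-1], x[n-2], ..., x[n-M]; each value is multiplied by its filter tap b_0, b_1, b_2, ..., b_M, and the products are summed to give y_FIR[n].]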

Design Optimization: FIR Filter
  We’ll look at three different approaches to implementing this filter
    “Initial”
    “Small”
    “Fast”
  We’ll revisit the idea of re-architecting algorithms for better area, latency, and throughput later.
  As an exercise, you should take some time on your own to try to understand exactly what is happening in each of the following code segments.
    Learning to read and understand someone else’s (confusing) code is an extremely valuable skill

Initial Design: Code

  module fir_init(clk, rst, in, out);
    parameter bitwidth = 8;
    parameter ntaps = 4;
    parameter logntaps = 2;

    input clk, rst;
    input [bitwidth-1:0] in;
    output reg [bitwidth-1:0] out;

    reg [bitwidth-1:0] taps [0:ntaps-1];
    reg [bitwidth-1:0] data [0:ntaps-1];
    reg [logntaps:0] count;
    integer i;

    always @(posedge clk) begin
      if (rst) begin
        // indicate we need to load all the tap values
        count <= 0;
        // reset the data and taps
        for (i = 0; i < ntaps; i = i + 1) begin: resetloop
          data[i] <= 0;
          taps[i] <= 0;
        end
      end
      else if (count < ntaps) begin
        // we need to load the tap values before filtering
        for (i = ntaps-1; i > 0; i = i - 1) begin: loadtaps
          taps[i] <= taps[i-1];
        end
        // load the new value at tap[0]
        taps[0] <= in;
        count <= count+1;
      end
      else begin
        // ready to do the filtering
        // first shift in the new input data value
        for (i = ntaps-1; i > 0; i = i - 1) begin: shiftdata
          data[i] <= data[i-1];
        end
        // load the new value at data[0]
        data[0] <= in;
      end // else: !if(count < ntaps)
    end // always @ (posedge clk)

    // compute the filtered result
    always @(*) begin
      out = 0;
      for (i = 0; i < ntaps; i = i + 1) begin: filterloop
        out = out + (data[i] * taps[ntaps-1 - i]);
      end
    end
  endmodule

Initial Design: Synthesis
  Constraints
    CLK_PERIOD 4
    INPUT_DELAY 0.2
    OUTPUT_DELAY 0.2
    MAX_AREA 8000
  Results
    Arrival Time 3.13
    Slack 0.67 (MET)
    Area 7335
  Should we make our constraints more aggressive?
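
One way to start answering that question is to read the slack as headroom on the clock period. The estimate below assumes the reported times are in ns and that the synthesized netlist stays exactly as it is; the symbol names are introduced here only for the illustration.

```latex
\begin{align*}
T_{\min} &\approx T_{\text{clk}} - \text{Slack} = 4 - 0.67 = 3.33 \\
f_{\max} &\approx \frac{1}{3.33\ \text{ns}} \approx 300\ \text{MHz}
\end{align*}
```

This is only a rough bound. With a tighter constraint the tool may restructure the logic, so the achievable period and the resulting area can both differ from this estimate.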

Small Design: Code

  module fir_area(clk, rst, in, out);
    parameter bitwidth = 8;
    parameter ntaps = 4;
    parameter logntaps = 2;

    input clk, rst;
    input [bitwidth-1:0] in;
    output reg [bitwidth-1:0] out;

    reg [bitwidth-1:0] taps [0:ntaps-1];
    reg [bitwidth-1:0] data [0:ntaps-1];
    reg [bitwidth-1:0] partial;
    reg [logntaps:0] count;
    reg [logntaps-1:0] step;
    reg ready; // indicates ready to filter
    integer i;

    always @(posedge clk) begin
      if (rst) begin
        // indicate we need to load all the tap values
        count <= 0;
        ready <= 0;
        // reset the data and taps
        for (i = 0; i < ntaps; i = i + 1) begin: resetloop
          data[i] <= 0;
          taps[i] <= 0;
        end
      end
      else if (count < ntaps && ~ready) begin
        // we need to load the tap values before filtering
        for (i = ntaps-1; i > 0; i = i - 1) begin: loadtaps
          taps[i] <= taps[i-1];
        end
        // load the new value at tap[0]
        taps[0] <= in;
        count <= count+1;
        // when the last tap is being loaded, switch over to filtering
        if (count >= ntaps-1) begin
          ready <= 1;
          count <= 0;
        end
      end
      else begin
        // ready to do the filtering
        // first shift in the new input data value
        for (i = ntaps-1; i > 0; i = i - 1) begin: shiftdata
          data[i] <= data[i-1];
        end
        // load the new value at data[0]
        data[0] <= in;
      end // else: !if(count < ntaps)
    end // always @ (posedge clk)

    // compute the filtered result
    always @(posedge clk) begin
      if (rst || ~ready) begin
        step <= 0;
        partial <= 0;
      end
      else begin
        if (step == 0) begin
          out <= partial;
          partial <= (data[0] * taps[ntaps-1]);
        end
        else begin
          out <= out;
          partial <= partial + (data[step] * taps[ntaps - 1 - step]);
        end
        if (step < ntaps-1) step <= step + 1;
        else step <= 0;
      end
    end
  endmodule

Small Design: Synthesis
  Constraints
    CLK_PERIOD 4
    INPUT_DELAY 0.2
    OUTPUT_DELAY 0.2
    MAX_AREA 8000
  Results
    Arrival Time 2.76 (vs. 3.13)
    Slack 0.92 (MET) (4 clock cycles)
    Area 5754 (vs. 7335)
  What are the tradeoffs?
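
A rough way to quantify the tradeoff is to compare output rate and area. The arithmetic below assumes CLK_PERIOD is in ns, that both designs run at the constrained period of 4, and that "4 clock cycles" means the Small design produces one output every 4 cycles while the Initial design produces one per cycle.

```latex
\begin{align*}
\text{Initial rate} &= \frac{1}{1 \times 4\ \text{ns}} = 250\ \text{M outputs/s} \\
\text{Small rate}   &= \frac{1}{4 \times 4\ \text{ns}} = 62.5\ \text{M outputs/s} \\
\frac{\text{Small area}}{\text{Initial area}} &= \frac{5754}{7335} \approx 0.78
\end{align*}
```

Under these assumptions the Small design gives up a factor of four in output rate for roughly a 22% reduction in area.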

Fast Design: Code

  module fir_fast(clk, rst, in, out);
    parameter bitwidth = 8;
    parameter ntaps = 4;
    parameter logntaps = 2;

    input clk, rst;
    input [bitwidth-1:0] in;
    output [bitwidth-1:0] out;

    reg [bitwidth-1:0] taps [0:ntaps-1];
    reg [bitwidth-1:0] mult [0:ntaps-1];
    reg [bitwidth-1:0] partial [0:ntaps-1];
    reg [logntaps:0] count;
    reg ready; // indicates ready to filter
    integer i;

    assign out = partial[ntaps-1];

    always @(posedge clk) begin
      if (rst) begin
        // indicate we need to load all the tap values
        count <= 0;
        // give ready a known value so the ~ready test below is not X in simulation
        ready <= 0;
        // reset the taps
        for (i = 0; i < ntaps; i = i + 1) begin: resetloop
          taps[i] <= 0;
        end
      end
      else if (count < ntaps && ~ready) begin
        // we need to load the tap values before filtering
        for (i = ntaps-1; i > 0; i = i - 1) begin: loadtaps
          taps[i] <= taps[i-1];
        end
        // load the new value at tap[0]
        taps[0] <= in;
        count <= count+1;
      end
      else begin
        // taps stay the same
      end // else: !if(count < ntaps)
    end // always @ (posedge clk)

    // compute the filtered result (pipelined)
    always @(posedge clk) begin
      // get the product of the input with each of the tap values
      for (i = 0; i < ntaps; i = i + 1)
        mult[i] <= in * taps[i];
      // special case at front
      partial[0] <= mult[0];
      // get the partial sums for the rest
      for (i = 1; i < ntaps; i = i + 1)
        partial[i] <= partial[i-1] + mult[i];
    end
  endmodule

Fast Design: Synthesis
  Constraints
    CLK_PERIOD 4
    INPUT_DELAY 0.2
    OUTPUT_DELAY 0.2
    MAX_AREA 8000
  Results
    Arrival Time 1.92 (vs. 3.13)
    Slack 1.82 (MET) (1 clock cycle!*)
    Area 7311 (vs. 7335)
  What are the tradeoffs?

Optimization Strategies
  Area vs. Delay: Often only really optimize for one
    “Fastest given an area constraint”
    “Smallest given a speed constraint”
  Design Compiler Reference Manual has several pointers on synthesis settings for these goals
  In some ways, synthesis is as much an art as it is a science
    Experiment with different options to see how they interact with each other

Design Examples
  All using same constraints
  No special synthesis options
  Can get even more dramatic results by combining:
    Coding style
    Tight constraints
    Synthesis optimization options

Some More “Small Design” Results

                                 |              constraints                  |   results
  command                        | area   input delay  output delay  period  | area   slack
  -------------------------------+-------------------------------------------+--------------
  compile -area_effort medium    | 8000   0.2          0.2           4       | 5797   2.05
  compile -area_effort high      | 5500   0.2          0.2           4       | 5778   2.05
  compile_ultra                  | 5500   0.2          0.2           4       | 5242   1.42
  compile_ultra                  | 5000   0.2          0.2           4       | 5242   1.42
  compile + compile_ultra        | 5000   0.2          0.2           4       | 6562   1.78
  compile_ultra                  | 5500   0.2          0.2           2       | 5274   0.01
  compile_ultra                  | 5500   0.2          0.2           1.8     | 5391   0.00
  compile_ultra                  | 5500   0.2          0.2           1.7     | 5519   0.00
  compile_ultra (rst no delay)   | 5500   0.2          0.2           1.7     | 5414   0.00
  compile_ultra                  | 5500   0.1          0.1           1.7     | 5636   0.01
  compile_ultra (rst no delay)   | 5500   0.1          0.1           1.7     | 5414   0.00
  compile_ultra                  | 5500   0.5          0.5           1.7     | 5923   0.00
  compile_ultra (rst no delay)   | 5500   0.5          0.5           1.7     | 5414   0.00

Script
  analyze -library WORK -format verilog {fir_area.v}
  elaborate fir_area -architecture verilog -library WORK
  create_clock -name "clk" -period 4 {clk}
  set_dont_touch_network [ find clock clk ]
  set_max_area 5000
  set NORM_INPUTS [remove_from_collection [all_inputs] "clk rst"]
  #set NORM_INPUTS [remove_from_collection [all_inputs] "clk"]
  set_input_delay 0.2 -max -clock clk $NORM_INPUTS
  set_output_delay 0.2 -max -clock clk [all_outputs]
  check_design > check_design.txt
  uniquify
  #compile -map_effort medium -area_effort medium
  compile -map_effort high -area_effort high
  compile_ultra
  report_area > area_report.txt
  report_timing > timing_report.txt
  report_constraint -all_violators > violator_report.txt
  exit