dm 2 2014 [modalità compatibilità] -...
© Digital Integrated Circuits, 2nd ed. Design Methodologies
Design Methodologies: standard cell synthesis flow
Semicustom Design Flow
[Figure: flow diagram. Design capture (behavioral) -> HDL -> logic synthesis (structural/RTL; gate-level netlist) -> pre-layout simulation -> floorplanning (floorplan) -> placement and routing (physical; place & route, layout) -> circuit extraction (parasitic extraction) -> post-layout simulation -> GDSII file, tape-out to silicon foundry for mask generation. "Design iteration" arrows loop back through the flow.]
Semicustom design flow
• Design capture: schematics, block diagrams, HDLs, imported IPs
• Logic synthesis: translates the HDL description into a gate-level netlist, which is combined with the netlists of reused or generated macros
• Pre-layout simulation: performance analysis based on (roughly) estimated parasitics and layout parameters
• Floorplanning: creation of the chip layout outline based on estimated module sizes; early design of the clock and power distribution networks
• Placement: precise positioning of cells within blocks
Semicustom design flow (continued)
• Routing: interconnects between cells and blocks
• Extraction: builds a chip model from the actual physical layout, including parasitics
• Post-layout simulation: checks functionality and correctness of the circuit in the presence of layout parasitics; performance and power analysis
• Tape-out: generation of a binary file in GDSII format containing the information needed for mask generation, which is sent to the silicon foundry
Integrating Logic Synthesis with Physical Design
[Figure: physical synthesis flow. Inputs: RTL, (timing) constraints, macromodules with fixed netlists. Physical synthesis (logic synthesis with first-order place-and-route) produces a netlist with place-and-route info, which feeds place-and-route optimization (accurate place-and-route meeting timing constraints).]
• Exponential increase of design tool complexity and run-time
Design Environment
• The process parameters
  – Technology library
  – Operating conditions (PVT)
• I/O port attributes
  – Driving strength of input ports
  – Capacitive loading of output ports
• Design rule constraints
  – max_transition, max_fanout, max_capacitance
• Statistical wire-load model
  – wirelength = f(fanout)
  – Resistance/capacitance/area per unit length given
  – Pre-layout static timing analysis
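As a rough illustration of the statistical wire-load model above, the following Python sketch estimates a net's length from its fanout and converts it into lumped resistance and capacitance. The class name, the table values and the per-unit-length figures are hypothetical and are not taken from any real technology library.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WireLoadModel:
    """Toy statistical wire-load model; every number passed in is an assumption."""
    lengths: Tuple[float, ...]  # lengths[i] = estimated length for fanout i + 1
    slope: float                # extra length per fanout beyond the table
    res_per_unit: float         # resistance per unit length
    cap_per_unit: float         # capacitance per unit length

    def wirelength(self, fanout: int) -> float:
        """wirelength = f(fanout): table lookup, linear extrapolation past it."""
        if fanout < 1:
            raise ValueError("fanout must be at least 1")
        if fanout <= len(self.lengths):
            return self.lengths[fanout - 1]
        return self.lengths[-1] + self.slope * (fanout - len(self.lengths))

    def net_rc(self, fanout: int) -> Tuple[float, float]:
        """Lumped (resistance, capacitance) of a net with the given fanout."""
        length = self.wirelength(fanout)
        return length * self.res_per_unit, length * self.cap_per_unit


# Hypothetical model: lengths for fanout 1..3, then 1.5 length units per extra fanout.
wlm = WireLoadModel(lengths=(1.0, 2.0, 3.5), slope=1.5,
                    res_per_unit=0.5, cap_per_unit=0.25)
print(wlm.wirelength(5), wlm.net_rc(5))
```

Because the estimate depends only on fanout, it is available before any placement exists, which is what makes pre-layout static timing analysis possible, at the price of accuracy.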
Input and output delay constraints
These parameters may have a tremendous impact on the driving strength of boundary cells and on the power consumption of the design as a whole.
Design constraints
• Clock signal specification
  – Period
  – Duty cycle
  – Transition time
  – Skew
• Delay specifications
  – Maximum delays
  – Minimum delays
• Timing exceptions
  – Multicycle paths
  – False paths
• Path grouping
  – E.g., for multi-clock designs
When searching for the maximum speed of the design, a deliberately over-tight target period (e.g., 0.1 ns) can be given as a constraint. The minimum achievable period can then be derived from the amount of timing violation reported.
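The derivation mentioned above can be sketched in Python as a first-order estimate: the minimum period is the target period minus the worst slack, so a negative slack (a violation) lengthens the period. The function names and the 0.4 ns violation are hypothetical values chosen for illustration.

```python
def min_period_ns(target_period_ns: float, worst_slack_ns: float) -> float:
    """Smallest period that would have met timing: target minus worst slack.

    A negative slack (a violation) therefore lengthens the period.
    """
    return target_period_ns - worst_slack_ns


def max_frequency_ghz(period_ns: float) -> float:
    """Clock frequency in GHz for a period given in ns."""
    if period_ns <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_ns


# Hypothetical run: 0.1 ns target period, worst path violates by 0.4 ns.
period = min_period_ns(0.1, -0.4)
print(f"min period ~ {period:.2f} ns, max frequency ~ {max_frequency_ghz(period):.2f} GHz")
```

This is only an estimate: re-running synthesis with the derived period may give a somewhat different netlist and slack.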
Design constraints (continued)
[Figure: flowchart for enforcing bundling constraints. Steps: enforce absolute constraints; extract timing of paths; check whether bundling constraints are fulfilled; enforce minimum delay requirements on bundling paths.]
Design constraints: timing exception example (multicycle path)
set_multicycle_path -from U1 -to U5
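To make the effect of a multicycle exception concrete, here is a minimal Python sketch of the setup check it relaxes. It assumes an ideal clock (no skew) and ignores hold checks; the delay values and the number of cycles are hypothetical, since the command as shown on the slide does not state one.

```python
def setup_slack_ns(path_delay_ns: float, period_ns: float,
                   setup_time_ns: float = 0.0, cycles: int = 1) -> float:
    """Setup slack when the capture edge is `cycles` clock periods after launch."""
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    required_ns = cycles * period_ns - setup_time_ns
    return required_ns - path_delay_ns


# Hypothetical U1 -> U5 path: 3.0 ns of logic delay under a 2.0 ns clock.
single_cycle = setup_slack_ns(3.0, 2.0)          # default single-cycle check
two_cycles = setup_slack_ns(3.0, 2.0, cycles=2)  # path declared as 2-cycle
print(single_cycle, two_cycles)
```

With the single-cycle default the path is reported as violating, while the same path declared as a two-cycle path has positive slack, so the tool stops spending area and power on it.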
Performance-Area/Power trade-off during logic synthesis
Let us compare several adder implementations while relaxing the target clock speed for synthesis.
[Figure: two bar charts, one bar per adder instance (DW). "Target clock periods for 32 bit adders": CLF, PPARCH, CSM, RPL; axis in ps, 0 to 4000. "Target clock periods for 64 bit adders": BK, PPARCH, CSM, RPL; axis in ns, 0 to 8.]
• As the target clock period increases, new adder architectures progressively come into play (see the lower side of the bars in the plots).
• As the period is further increased, the adders' slack is exploited for power optimizations (RTL netlist transformations, insertion of HVT cells); therefore the adders show no slack over a certain time window.
• Beyond a certain period, the RTL netlists of the adders cannot be power-optimized any further, and they start showing slack (upper side of the bars in the plots).
Area-Power for 32 bit adders
[Figure: two plots versus target period [ps] (330, 341, 385, 418, 979, 1199, 3500) for the CLF, BK, PPARCH, CLA, CSM, RPCS and RPL adders. "Area 32 bit": area [u^2], axis 0 to 2000. "Power 32 Bit": total power, axis 0 to 1.00E-002.]
Let us sweep a range of target clock periods.
Maximum data introduction rate.
Synthesis tool optimizes adder slack for power.
• The "new entry" adder for a given target period is never the most power-efficient
• Higher area always means higher power
Floorplanning
Typical issues the floorplanning tool copes with:
• Does the design fit the budgeted chip area?
• Estimates the area of major units and defines their relative placement based on some objective function
• Estimates wire lengths and wiring congestion
More advanced cost functions can also be considered:
• Best IR drop solutions spread out the hot spot across a large part of the floorplan, instead of concentrating it in a specific region.
• Having high communication traffic (thick lines) spread over short (top) or long (bottom) virtual links is likely to heavily affect the power required for data transmission later on.
Placement
• Placement: assign cells to positions on the chip such that no two cells overlap (legalization) and some cost function (e.g., projected wirelength) is optimized
• Real objective functions are more complex: wirelength, routability/channel density, timing, power, ...
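A minimal Python sketch of the two ingredients named above: a wirelength cost and a legality (no-overlap) check. The slide does not say which wirelength model is meant; half-perimeter wirelength (HPWL) is used here as one common choice, and all cell names and coordinates are hypothetical.

```python
from typing import Dict, Iterable, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x, y, width, height)


def hpwl(pins: Sequence[Point]) -> float:
    """Half-perimeter of the bounding box enclosing all pins of one net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(nets: Iterable[Sequence[str]],
                     positions: Dict[str, Point]) -> float:
    """Cost function: sum of HPWL over all nets for a given placement."""
    return sum(hpwl([positions[cell] for cell in net]) for net in nets)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share interior area (abutting is allowed)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def is_legal(cells: Dict[str, Rect]) -> bool:
    """Legalization check: no two cells overlap."""
    names = list(cells)
    return not any(overlaps(cells[p], cells[q])
                   for i, p in enumerate(names) for q in names[i + 1:])


# Hypothetical placement of three cells and two nets.
positions = {"a": (0, 0), "b": (3, 4), "c": (1, 1)}
nets = [("a", "b"), ("a", "b", "c")]
print(total_wirelength(nets, positions))
```

A placer would search over `positions` to reduce this cost while keeping `is_legal` true; real tools add routability, timing and power terms to the cost.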
Link with routing
• Ideally, placement and routing (P&R) should be performed simultaneously, as they depend on each other's results
• This is however often too computation-intensive
• Approximation: placement estimates the wire length of a net using some wirelength model
During P&R, the gate-level netlist will change:
- Buffer insertion
- Driving strength resizing
- Local logic optimizations resulting in selective netlist modifications to meet design constraints
- Avoidance of routing congestion
These are good reasons why row utilization should NOT be 100%
Routing Congestion
Occurs when the demand for routing resources in a region exceeds (or tends to exceed) their supply, as estimated by a pre-routing model
• Unexpected delay overheads on wires
  – Use of the most resistive metal layers
  – Detours
  – Use of many delay-penalizing vias
  – Higher susceptibility to crosstalk
• Design convergence unpredictable
  – Designer may not be focusing on the real critical paths
  – Detours may push congestion to neighboring regions
  – Possible outcome: unroutability, or failure to meet timing constraints
• Impact on yield
  – Due to the increased number of vias
  – Due to the higher probability of shorts and opens in critical regions caused by random defects
  – Due to the larger area needed to "resolve" a congested design
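The demand-versus-supply definition above can be sketched as a simple congestion map in Python. The grid of routing bins, the track counts and the function names are hypothetical; the sketch assumes every bin has a positive supply.

```python
from typing import List, Tuple

Grid = List[List[float]]


def congestion_map(demand: Grid, supply: Grid) -> Grid:
    """Per-bin ratio of estimated routing demand to available supply."""
    return [[d / s for d, s in zip(d_row, s_row)]
            for d_row, s_row in zip(demand, supply)]


def congested_bins(demand: Grid, supply: Grid,
                   threshold: float = 1.0) -> List[Tuple[int, int]]:
    """(row, column) of every bin whose demand/supply ratio exceeds the threshold."""
    ratios = congestion_map(demand, supply)
    return [(r, c)
            for r, row in enumerate(ratios)
            for c, ratio in enumerate(row)
            if ratio > threshold]


def total_overflow(demand: Grid, supply: Grid) -> float:
    """Sum over all bins of the demand that exceeds the supply."""
    return sum(max(0.0, d - s)
               for d_row, s_row in zip(demand, supply)
               for d, s in zip(d_row, s_row))


# Hypothetical 2x2 grid: routing tracks requested vs. tracks available per bin.
demand = [[4, 10], [6, 3]]
supply = [[8, 8], [8, 8]]
print(congestion_map(demand, supply), congested_bins(demand, supply))
```

Bins above the threshold are where detours, extra vias and upper-layer routing are likely to appear.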
Placement
By acting upon the "row utilization" parameter of most placement tools, a given cell placement density can be achieved, either compacting the design or alleviating its routing congestion (pay attention to the trade-offs with timing closure, area and power budgets!)
[Figure: congestion map; quantification based on some routing model]
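As a small illustration of the row utilization parameter, the Python sketch below computes the fraction of a placement row occupied by cells and the space left over. The function names and the cell widths are hypothetical; real tools typically define utilization over the whole core area rather than a single row.

```python
from typing import Sequence


def row_utilization(cell_widths: Sequence[float], row_length: float) -> float:
    """Fraction of a placement row occupied by cells."""
    if row_length <= 0:
        raise ValueError("row length must be positive")
    used = sum(cell_widths)
    if used > row_length:
        raise ValueError("cells do not fit in the row")
    return used / row_length


def free_space(cell_widths: Sequence[float], row_length: float) -> float:
    """Row length left for buffers, resized cells and routing relief."""
    return row_length - sum(cell_widths)


# Hypothetical row of length 10 holding four cells.
widths = [2.0, 3.0, 1.5, 2.0]
row = 10.0
print(row_utilization(widths, row), free_space(widths, row))
```

Lowering the target utilization leaves more free space per row, which eases congestion but enlarges the block and lengthens wires.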
The (lucky) physical synthesis flow
Open tool -> Create floorplan -> Create power grid -> Placement -> Timing analysis -> Insert clock tree (CTS) -> Timing optimization and new reports -> Routing -> Post-routing optimization and design closure

Timing convergence (min-max analysis)
Clock Domain       Clock Frequency   Slack Pre-Opt   Slack Post-Opt
clk_Audio          100 MHz            5.75 ns        3.45 ns
clk_CPU            500 MHz            0.20 ns        0.09 ns
clk_DDR            250 MHz            0.38 ns        0.17 ns
clk_DMA            200 MHz            0.65 ns        0.39 ns
clk_DSP            300 MHz            0.34 ns        0.19 ns
clk_Radio          150 MHz            1.12 ns        0.82 ns
clk_SD_USB_WiFi    200 MHz            0.53 ns        0.23 ns
clk_SPI            140 MHz            6.33 ns        2.27 ns
clk_SRAM           500 MHz           -0.19 ns        0.14 ns
clk_Video          300 MHz            0.38 ns        0.20 ns

First-time-right design is a dream!
• Cells in those clock domains that have a big slack are relaxed (from a driving strength viewpoint) to save power.
• Small timing violations in the fastest clock domains are easily fixed.
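The table above can be read programmatically. The Python sketch below re-enters its values (the clock column lists frequencies in MHz), converts them to periods, and separates domains with a timing violation from domains whose positive slack shrank during optimization. The helper names are hypothetical.

```python
# (domain, frequency in MHz, slack pre-opt in ns, slack post-opt in ns), from the table above
TIMING = [
    ("clk_Audio", 100, 5.75, 3.45),
    ("clk_CPU", 500, 0.20, 0.09),
    ("clk_DDR", 250, 0.38, 0.17),
    ("clk_DMA", 200, 0.65, 0.39),
    ("clk_DSP", 300, 0.34, 0.19),
    ("clk_Radio", 150, 1.12, 0.82),
    ("clk_SD_USB_WiFi", 200, 0.53, 0.23),
    ("clk_SPI", 140, 6.33, 2.27),
    ("clk_SRAM", 500, -0.19, 0.14),
    ("clk_Video", 300, 0.38, 0.20),
]


def period_ns(freq_mhz: float) -> float:
    """Clock period in ns for a frequency in MHz."""
    return 1000.0 / freq_mhz


def violating(rows, column: int):
    """Domains with negative slack; column 2 is pre-opt, column 3 is post-opt."""
    return [row[0] for row in rows if row[column] < 0]


def relaxed(rows):
    """Domains that still meet timing but ended with less slack than they started."""
    return [name for name, _, pre, post in rows if 0 <= post < pre]


for name, freq, pre, post in TIMING:
    share = post / period_ns(freq)  # remaining slack as a share of the period
    print(f"{name:16s} period {period_ns(freq):6.2f} ns  post-opt slack {share:6.1%}")

print("violations before optimization:", violating(TIMING, 2))
print("violations after optimization: ", violating(TIMING, 3))
```

The split matches the two remarks above: one fast domain starts with a small violation that is fixed, while the others give up part of their slack.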
In practice: The "Timing Closure" Concern
[Figure, courtesy Synopsys: initial design, intermediate design and final design, showing timing and layout violations as white lines]
• Iterative removal of timing and layout violations (white lines): synthesis iterations, buffer insertion, placement constraints, routing issues, ...
• Due to the increased role of parasitics (mostly interconnect-related) in deep sub-micron designs, the prediction models of synthesis tools are having a hard time
Case study – a NoC switch
Fixing layout rule violations
[Figure: SWITCH layout with four Inputbuf blocks, an Arbiter, a Crossbar and four Outbuf blocks]
Case study – a NoC switch (continued)
Fixing layout rule violations
[Figure: SWITCH layout with four Inputbuf blocks, four Arbiter and Outbuf pairs, and a "Crossbar & control" block]
Switch radix
Topologies often differentiate themselves based on the switch radix they require
65nm MVth 1.2V technology; clock gating enabled
Area and power increased with switch radix, while frequency decreased dramatically
Switch radix
• Placement-aware logic synthesis worked as expected
• Physical synthesis is aware of placement… not routing!
• Starting from 14x14 switches, wire density in the switch crossbar becomes an issue
• Meeting timing constraints, avoiding crosstalk and resolving DRC violations cannot all be achieved at the same time
• Hundreds of violations in 14x14, tens of thousands in 30x30
Switch radix
There are two options to fix DRC violations:
• Increase switch area
• Decrease switch frequency
Switch area can be controlled by specifying the "row utilization" parameter:
• 85% was OK up to 10x10
• 70% was OK for 14x14
• At 30x30 even a utilization of 50% did not fix violations
Tuning switch area is only partially effective
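To get a feel for what these utilization targets cost in area, the Python sketch below computes the core-area growth factor relative to the 85% baseline. It assumes the total cell area stays fixed when utilization is lowered, which is a simplification (the tool may also resize cells or insert buffers); the function name is hypothetical.

```python
def area_scale(baseline_utilization: float, new_utilization: float) -> float:
    """Core-area growth factor when utilization is lowered, for a fixed total cell area."""
    for u in (baseline_utilization, new_utilization):
        if not 0 < u <= 1:
            raise ValueError("utilization must be in (0, 1]")
    return baseline_utilization / new_utilization


# Utilization values quoted above, relative to the 85% used up to 10x10.
for radix, utilization in (("14x14", 0.70), ("30x30", 0.50)):
    print(f"{radix}: core area x{area_scale(0.85, utilization):.2f}")
```

Under this assumption, going from 85% to 70% costs roughly a fifth more area, while 50% costs 70% more and, per the slide, still does not remove the violations at 30x30.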
Switch radix
There are two options to fix DRC violations:
• Increase switch area
• Decrease switch frequency
Decreasing the switch frequency:
• A 25% slow-down was OK for 14x14
• 30% was OK for 18x18
• At 30x30 even halving the clock speed did not fix violations
Frequency slowdown was somewhat more effective for this design (not general!)
Switch radix
Key take-away: high-radix switches at 65nm are feasible up to 10x10 or 14x14, after which their overhead in area and frequency becomes too severe.
We would need long links to connect cores to the switch. They would be pipelined, with additional area and power cost.
Standard cell design
It has become immensely popular, except for
• very high performance ICs
• ultra low energy consumption ICs
• extremely regularly structured ICs (memory, multiplier, ...)
Reasons for the success
• Increased quality of automatic cell placement and routing tools
• Availability of multiple routing layers
• Advent of sophisticated logic-synthesis tools
  - abstract design inputs: behavioural models, RTL models
  - gate-level netlist production (behavioural synthesis, logic synthesis, respectively)
Drawbacks
• Cell redesign with every migration to a new technology
• Huge cost for mask sets (on the order of hundreds of thousands of dollars)
Macrocells
• For certain blocks, the standard cell approach might be inefficient (multipliers, memories, embedded uP, DSPs)
• Blocks whose complexity is larger than that of traditional standard cells: macrocells
• 2 kinds of macrocells: hard macros and soft macros
Hard macrocells
• Custom designs of the requested functions
• Functionality and layout are fixed
  – Some parameterization is feasible (e.g., multipliers, memories)
• Good properties of custom design (dense layout, optimized performance and power)
• Opportunity for reuse in many designs
• Hard to port to new manufacturers or technologies -> less and less used
• Examples: embedded uP or memories, DSPs
• Parameterization by means of module compilers
  – Replication of basic macrocells
Hard MacroModules
256×32 (or 8192 bit) SRAM, generated by a hard-macro module generator (or memory compiler)
• automatic layout generation
• provides timing and power information
• adds redundancy to deal with defects
Soft Macrocells
Module with a given functionality, but without a specific physical implementation
• Placement and routing may vary from instance to instance
• Timing is not predictable: wait for the final layout
• No advantages of full custom design; they rely on the semicustom physical design process
• Ease of migration to new technologies
• Structural generators: given a function and its parameters, they generate:
  - a netlist of standard cells
  - constraints for the place and route tools
• Cleverer structures (than logic synthesis) based on function knowledge (e.g., multipliers)
“Soft” MacroModules
[Figure: Synopsys DesignCompiler. Input to the module compiler, and 2 instances of an 8x8 multiplier module with different aspect ratios]
• Macrocell generator: optimized connection of standard cells
• Soft approach advantages: different aspect ratios can be generated
Hybrid ASIC design methodology
• Macromodules have changed the semicustom design landscape: design reuse instead of designing from scratch
• Macrocells can be acquired from third-party vendors, who make the parts available through royalty or licensing agreements (Intellectual Property modules, IPs)
• Examples: embedded microprocessors, DSPs, bus interfaces (e.g., PCI), special purpose functions (FFT, ECC, MPEG dec.), graphic accelerators (GPUs)
• For an IP to be useful, it has to come with appropriate software tools, not just hardware (e.g., xdevelopment toolchain, test benches for validation)
“Intellectual Property”
A protocol stack SoC for wireless
[Figure: SoC layout annotated with: Tensilica Xtensa soft-core generated from Verilog description; hard-wired logic (std cells); hard macrocells (compiler from process vendor)]
A typical SoC consists of a blend of design styles and modules, embedding a number of hard or soft macrocells within a sea of standard cells