© Digital Integrated Circuits 2nd Design Methodologies Design Methodologies: standard cell synthesis flow

Upload: hoanganh

Post on 07-Sep-2018



Semicustom Design Flow

[Figure: semicustom design flow, from design capture to tape-out]

Main steps: Design Capture (behavioral) -> HDL -> Logic Synthesis (structural, RTL) -> Floorplanning -> Placement -> Routing (physical) -> GDSII file: tape-out to silicon foundry for mask generation

Intermediate results: gate-level netlist, floorplan, place & route, layout

Side steps: Pre-Layout Simulation, Circuit Extraction (parasitic extraction), Post-Layout Simulation

Design Iteration (feedback loop)

Semicustom design flow

• Design capture: schematics, block diagrams, HDLs, imported IPs
• Logic synthesis: from HDL into a gate-level netlist, combined with the netlist of reused or generated macros
• Pre-layout simulation: (grossly) estimated parasitics and layout parameters; performance analysis
• Floorplanning: chip layout creation based on estimated module sizes, early design of clock and power distribution networks
• Placement: precise positioning of cells within blocks
• Routing: interconnects between cells and blocks
• Extraction: chip model from the actual physical layout and parasitics
• Post-layout simulation: check functionality and correctness of the circuit in the presence of layout parasitics; performance AND power analysis
• Tape-out: binary file generation in GDSII format, containing the information needed for mask generation. To silicon foundry.

Integrating Logic synthesis with Physical Design

[Figure: RTL and (timing) constraints feed Physical Synthesis (logic synthesis with first-order place-and-route), which produces a netlist with place-and-route info. Together with macromodules (fixed netlists), this enters Place-and-Route Optimization (accurate place-and-route meeting timing constraints).]

• Exponential increase of design tool complexity and run-time

Logic synthesis

Design Environment

• The process parameters
  – Technology library
  – Operating conditions (PVT)
• I/O port attributes
  – Driving strength of input ports
  – Capacitive loading of output ports
• Design rule constraints
  – max_transition, max_fanout, max_capacitance
• Statistical wire-load model
  – wirelength = f(fanout)
  – Resistance/Capacitance/Area per unit length given
  – pre-layout static timing analysis
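The statistical wire-load model can be illustrated with a small sketch. It assumes a hypothetical fanout-to-length table, a hypothetical extrapolation slope, and made-up per-unit-length resistance and capacitance; in a real flow these values come from the technology library.

```python
# Hypothetical wire-load table: fanout -> estimated wirelength (um).
# All numbers are illustrative, not taken from any real library.
FANOUT_LENGTH_UM = {1: 10.0, 2: 25.0, 3: 40.0, 4: 60.0}
SLOPE_UM_PER_FANOUT = 20.0  # assumed extrapolation slope beyond the table
RES_OHM_PER_UM = 0.5        # assumed resistance per unit length
CAP_FF_PER_UM = 0.2         # assumed capacitance per unit length


def estimated_wirelength(fanout: int) -> float:
    """wirelength = f(fanout): table lookup, linear extrapolation above it."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    max_fanout = max(FANOUT_LENGTH_UM)
    if fanout <= max_fanout:
        return FANOUT_LENGTH_UM[fanout]
    extra = fanout - max_fanout
    return FANOUT_LENGTH_UM[max_fanout] + SLOPE_UM_PER_FANOUT * extra


def net_parasitics(fanout: int) -> tuple[float, float]:
    """Return (resistance in ohm, capacitance in fF) estimated for a net."""
    length = estimated_wirelength(fanout)
    return length * RES_OHM_PER_UM, length * CAP_FF_PER_UM
```

The estimated resistance and capacitance are what pre-layout static timing analysis would use in place of extracted parasitics, which do not exist yet at this stage.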

Input and output delay constraints

These parameters may have a tremendous impact on the driving strength of boundary cells and on the power consumption of the design as a whole.

Design constraints

• Clock signal specification
  – Period
  – Duty cycle
  – Transition time
  – Skew
• Delay specifications
  – Maximum delays
  – Minimum delays
• Timing exceptions
  – Multicycle paths
  – False paths
• Path grouping
  – E.g., for multi-clock designs

When searching for the maximum speed of the design, a deliberately unachievable clock period (e.g., 0.1 ns) can be given as a constraint. The minimum achievable period can then be derived from the amount of violation: it is the target period plus the magnitude of the worst negative slack.
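Deriving the minimum period from the amount of violation is simple arithmetic, sketched below. The function names and the example slack value are hypothetical; the sketch assumes the tool reports the worst setup slack against the over-constrained target period.

```python
def min_achievable_period_ns(target_period_ns: float, worst_slack_ns: float) -> float:
    """Shortest period the netlist can sustain.

    worst_slack_ns is the worst setup slack reported against the target
    period; it is negative when the constraint is violated.
    """
    return target_period_ns - worst_slack_ns


def max_frequency_mhz(period_ns: float) -> float:
    """Convert a clock period in ns into a frequency in MHz."""
    return 1000.0 / period_ns


# Hypothetical run: 0.1 ns target, worst slack of -0.9 ns reported.
period = min_achievable_period_ns(0.1, -0.9)
print(f"min period = {period:.2f} ns, f_max = {max_frequency_mhz(period):.0f} MHz")
```

In practice the result is only an estimate, since a tool may optimize differently under an extreme over-constraint than under a realistic one.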

Design constraints

[Figure: constraint-enforcement flow] Enforce absolute constraints -> Extract timing of paths -> Are bundling constraints fulfilled? -> Enforce minimum delay requirements on bundling paths

Design constraints

Example of a timing exception (multicycle path):

set_multicycle_path -from U1 -to U5
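The effect of a multicycle exception on the setup check can be sketched as follows. This is a simplified model assuming an ideal clock (no skew or uncertainty) and a launch edge at time 0; the cycle count and all delay values are hypothetical, since the command above does not show a multiplier.

```python
def setup_slack_ns(period_ns: float, path_delay_ns: float,
                   setup_time_ns: float = 0.0, cycles: int = 1) -> float:
    """Setup slack of a register-to-register path.

    cycles is the number of clock periods the data is allowed to take
    (1 for an ordinary path, N for an N-cycle multicycle path).
    """
    required_arrival = cycles * period_ns - setup_time_ns
    return required_arrival - path_delay_ns


# Hypothetical path: 3.5 ns of logic, 2 ns clock, 0.1 ns setup time.
single = setup_slack_ns(2.0, 3.5, setup_time_ns=0.1)            # violates
relaxed = setup_slack_ns(2.0, 3.5, setup_time_ns=0.1, cycles=2)  # meets timing
print(f"single-cycle slack = {single:.1f} ns, two-cycle slack = {relaxed:.1f} ns")
```

Declaring the exception only changes what the tool checks; the design itself must guarantee that the destination register samples the data every N cycles.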

Performance-Area/Power trade-off during logic synthesis

Let us compare several adder implementations while relaxing the target clock speed for synthesis.

[Figure: target clock periods for 32 bit adders, in ps (axis 0 to 4000), per instance (DW): CLF, PPARCH, CSM, RPL]

[Figure: target clock periods for 64 bit adders, in ns (axis 0 to 8), per instance (DW): BK, PPARCH, CSM, RPL]

• As the target clock period increases, new adder architectures come progressively into play (see the lower side of the bars in the plots).
• As the period is further increased, the adders' slack is exploited for power optimizations (RTL netlist transformations, insertion of HVT cells); therefore the adders show no slack for a certain time window.
• Beyond a certain period, the RTL netlists of the adders cannot be power-optimized any further, and they start showing slack (upper side of the bars in the plots).

Area-Power for 32 bit adders

[Figure: area of 32 bit adders in um^2 (axis 0 to 2000) vs. period in ps (330, 341, 385, 418, 979, 1199, 3500); series: CLF, BK, PPARCH, CLA, CSM, RPCS, RPL]

[Figure: total power of 32 bit adders (axis 0 to 1.00E-002) vs. period in ps (330, 341, 385, 418, 979, 1199, 3500); series: CLF, BK, PPARCH, CLA, CSM, RPCS, RPL]

Let us sweep a range of target clock periods
Maximum data introduction rate
Synthesis tool optimizes adder slack for power.

• The “new entry” adder for a given target period is never the most power efficient
• Higher area always means higher power

Floorplanning

Typical issues the floorplanning tool copes with:
• does the design fit the chip budgeted area?
• estimates area of major units and defines their relative placement based on some objective function
• estimates wire lengths and wiring congestion, although more advanced cost functions can be considered:

Best IR drop solutions spread out the hot spot across a large part of the floorplan, instead of concentrating it in a specific region.

Having high communication traffic (thick lines) spread over short (top) or long (bottom) virtual links is likely to heavily affect the power required for data transmission later on.

Placement

Placement: assign cells to positions on the chip, such that no two cells overlap with each other (legalization), and some cost function (e.g., projected wirelength) is optimized.

Real objective functions are more complex: wirelength, routability/channel density, timing, power, ...
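The two ingredients of the placement problem, a legality check and a wirelength cost, can be sketched as follows. The sketch uses half-perimeter wirelength (HPWL), a common estimate chosen here as an assumption, and places every pin at its cell center; the cell and net data are hypothetical.

```python
from itertools import combinations
from typing import NamedTuple


class Cell(NamedTuple):
    name: str
    x: float  # lower-left corner
    y: float
    w: float  # width
    h: float  # height


def overlaps(a: Cell, b: Cell) -> bool:
    """True if the two rectangles share interior area (abutting is fine)."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.h and b.y < a.y + a.h)


def is_legal(cells: list[Cell]) -> bool:
    """A placement is legal when no two cells overlap."""
    return not any(overlaps(a, b) for a, b in combinations(cells, 2))


def hpwl(pins: list[tuple[float, float]]) -> float:
    """Half-perimeter of the bounding box enclosing all pins of a net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wirelength(cells: list[Cell], nets: list[list[str]]) -> float:
    """Sum of HPWL over all nets, with pins assumed at cell centers."""
    centers = {c.name: (c.x + c.w / 2, c.y + c.h / 2) for c in cells}
    return sum(hpwl([centers[name] for name in net]) for net in nets)


cells = [Cell("A", 0, 0, 2, 1), Cell("B", 2, 0, 2, 1), Cell("C", 0, 1, 2, 1)]
nets = [["A", "B"], ["A", "B", "C"]]
print(is_legal(cells), total_wirelength(cells, nets))
```

A placer would search over cell positions to minimize such a cost while keeping `is_legal` true; real tools add routability, timing and power terms to the objective.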

Link with routing

• Ideally, placement and routing (P&R) should be performed simultaneously, as they depend on each other's results
• This is however often too computation-intensive
• Approximation: placement estimates the wire length of a net using some wirelength model

During P&R, the gate-level netlist will change:
- Buffer insertion
- Driving strength resizing
- Local logic optimizations ending up in selective netlist modifications to meet design constraints
- Avoid routing congestion

These are good reasons why row utilization should NOT be 100%


Wirelength estimation models

Routing Congestion

Occurs when the demand for routing resources in a region exceeds (or tends to exceed) their supply, estimated based on a pre-routing model.

• Unexpected delay overheads on wires
  – Take the most resistive metal layers
  – Detours
  – Use of lots of delay-penalizing vias
  – Higher susceptibility to crosstalk
• Design convergence unpredictable
  – Designer may not be focusing on the real critical paths
  – Detours may push congestion to neighboring regions
  – Possible outcome: unroutability, or failure to meet timing constraints
• Impact on yield
  – Due to the increased number of vias
  – Due to higher probability of shorts and opens in critical regions due to random defects
  – Due to the larger area needed to «resolve» a congested design
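The demand-versus-supply definition of congestion can be made concrete with a small sketch. It assumes the chip is divided into a grid of regions and that a pre-routing model (not specified here) has already produced a track demand per region; the numbers are hypothetical.

```python
def congestion_map(demand: list[list[int]], supply: list[list[int]]) -> list[list[float]]:
    """Per-region ratio of routing demand to routing supply."""
    return [[d / s for d, s in zip(d_row, s_row)]
            for d_row, s_row in zip(demand, supply)]


def overflow_regions(demand: list[list[int]], supply: list[list[int]]) -> list[tuple[int, int]]:
    """Grid coordinates (row, column) where demand exceeds supply."""
    return [(r, c)
            for r, (d_row, s_row) in enumerate(zip(demand, supply))
            for c, (d, s) in enumerate(zip(d_row, s_row))
            if d > s]


# Hypothetical 2x2 grid: tracks requested vs. tracks available per region.
demand = [[8, 12], [5, 10]]
supply = [[10, 10], [10, 10]]
print(congestion_map(demand, supply))
print(overflow_regions(demand, supply))
```

Regions with a ratio above 1 are the ones where the router will have to detour or change layers, which is where the delay and yield penalties listed above originate.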

Placement

Acting upon the “row utilization” parameter of most placement tools, a given cell placement density can be achieved, either to compact the design or to alleviate its routing congestion (pay attention to the trade-offs with timing closure, area and power budgets!)

[Figure: congestion quantification based on some routing model]

The (lucky) physical synthesis flow

Open tool -> Create Floorplan -> Create Power Grid -> Placement -> Timing analysis -> Insert clock tree (CTS) -> Timing optimization and new reports -> Routing -> Post-routing optimization and design closure

Timing convergence (min-max analysis)

Clock Domain       Clock Frequency   Slack Pre-Opt   Slack Post-Opt
clk_Audio          100 MHz            5.75 ns        3.45 ns
clk_CPU            500 MHz            0.20 ns        0.09 ns
clk_DDR            250 MHz            0.38 ns        0.17 ns
clk_DMA            200 MHz            0.65 ns        0.39 ns
clk_DSP            300 MHz            0.34 ns        0.19 ns
clk_Radio          150 MHz            1.12 ns        0.82 ns
clk_SD_USB_WiFi    200 MHz            0.53 ns        0.23 ns
clk_SPI            140 MHz            6.33 ns        2.27 ns
clk_SRAM           500 MHz           -0.19 ns        0.14 ns
clk_Video          300 MHz            0.38 ns        0.20 ns

First-time-right design is a dream!

Cells in those clock domains that have a big slack are relaxed (from a driving strength viewpoint) to save power.

Small timing violations in the fastest clock domains are easily fixed.
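The two observations above can be checked directly against the table. The sketch below only re-encodes the table rows; the function names are hypothetical.

```python
# Rows of the timing-convergence table:
# domain -> (frequency in MHz, slack before optimization in ns, slack after in ns)
TIMING = {
    "clk_Audio": (100, 5.75, 3.45),
    "clk_CPU": (500, 0.20, 0.09),
    "clk_DDR": (250, 0.38, 0.17),
    "clk_DMA": (200, 0.65, 0.39),
    "clk_DSP": (300, 0.34, 0.19),
    "clk_Radio": (150, 1.12, 0.82),
    "clk_SD_USB_WiFi": (200, 0.53, 0.23),
    "clk_SPI": (140, 6.33, 2.27),
    "clk_SRAM": (500, -0.19, 0.14),
    "clk_Video": (300, 0.38, 0.20),
}


def period_ns(freq_mhz: float) -> float:
    """Clock period in ns for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def violating_domains(stage: str) -> list[str]:
    """Domains with negative slack; stage is 'pre' or 'post' optimization."""
    column = {"pre": 1, "post": 2}[stage]
    return sorted(name for name, row in TIMING.items() if row[column] < 0)


def slack_given_up_ns(domain: str) -> float:
    """Slack traded away during optimization (e.g., for power savings)."""
    _, pre, post = TIMING[domain]
    return pre - post


print(violating_domains("pre"), violating_domains("post"))
print(f"clk_SPI gives up {slack_given_up_ns('clk_SPI'):.2f} ns of slack")
```

Only clk_SRAM, one of the two fastest domains, violates timing before optimization, and no domain does afterwards; the slow domains with large slack (clk_Audio, clk_SPI) give up the most.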

In practice: The “Timing Closure” Concern

[Figure, courtesy Synopsys: initial design, intermediate design, final design]

Iterative removal of timing and layout violations (white lines): synthesis iterations, buffer insertion, placement constraints, routing issues, ...

Due to the increased role of parasitics (mostly interconnect-related) in deep sub-micron designs, the prediction models of synthesis tools are having a hard time.

Case study – a NoC switch
Fixing layout rule violations

[Figure: SWITCH block diagram with four Inputbuf blocks, one Arbiter, a Crossbar and four Outbuf blocks]

[Figure: SWITCH block diagram with four Inputbuf blocks, a Crossbar & control block, and four Arbiter/Outbuf pairs]

Switch radix

Topologies often differentiate themselves based on the switch radix they require.

65nm MVth 1.2V technology; clock gating enabled

Area and power increased with switch radix, while frequency decreased dramatically

Switch radix

• Placement-aware logic synthesis worked as expected
• Physical synthesis is aware of placement... not routing!
• Beginning from 14x14 switches, wire density in the switch crossbar becomes an issue
• Meeting timing constraints, avoiding crosstalk and resolving DRC violations cannot all be achieved at the same time
• Hundreds of violations in 14x14, tens of thousands in 30x30

Switch radix

There are two options to fix DRC violations:
• Increase switch area
• Decrease switch frequency

Switch area can be controlled by specifying the “row utilization” parameter
• 85% was OK up to 10x10
• 70% was OK for 14x14
• At 30x30 even a utilization of 50% did not fix violations
• Tuning switch area only partially effective

Decreasing switch frequency
• 25% slow-down was OK for 14x14
• 30% was OK for 18x18
• At 30x30 even halving clock speed did not fix violations
• Frequency slowdown somewhat more effective for this design (not general!)
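The area cost of lowering row utilization can be estimated with a short calculation. The sketch assumes row utilization is defined as total standard-cell area divided by available row area, and that the cell area itself stays constant when the parameter changes (in practice the netlist also changes during P&R).

```python
def row_area(cell_area: float, utilization: float) -> float:
    """Row area needed to host cell_area at a given row utilization (0..1]."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return cell_area / utilization


def area_growth(old_utilization: float, new_utilization: float) -> float:
    """Relative area increase when moving to a lower utilization."""
    return row_area(1.0, new_utilization) / row_area(1.0, old_utilization) - 1.0


# Utilization values quoted above for the switch case study.
print(f"85% -> 70%: +{area_growth(0.85, 0.70):.0%} area")
print(f"85% -> 50%: +{area_growth(0.85, 0.50):.0%} area")
```

Under these assumptions, dropping from 85% to 50% utilization costs about 70% more area, which gives a sense of why tuning area alone was only partially effective.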

Switch radix

Key take-away: high radix switches at 65nm are feasible up to 10x10 or 14x14, after which their overhead in area and frequency becomes too severe.

We would need long links to connect cores to the switch. They would be pipelined, with additional area and power cost.

Standard cell design

It has become immensely popular, except for
• very high performance ICs
• ultra low energy consumption ICs
• extremely regularly structured ICs (memory, multiplier, ...)

Reasons for the success
• Increased quality of automatic cell placement and routing tools
• Availability of multiple routing layers
• Advent of sophisticated logic-synthesis tools
  - abstract design inputs: behavioural models, RTL models
  - gate-level netlist production (behavioural synthesis, logic synthesis, respectively)

Drawbacks
• Cell redesign with every migration to a new technology
• Huge cost for mask sets (order of hundreds of thousands $)

Macrocells

• For certain blocks, the standard cell approach might be inefficient (multipliers, memories, embedded uP, DSPs)
• Blocks whose complexity is larger than traditional standard cells: macrocells
• 2 kinds of macrocells: Hard Macros and Soft Macros

Hard macrocells

• custom designs of the requested functions
• Functionality and layout are fixed
• Some parameterization is feasible (e.g., multipliers, memories)
• good properties of custom design (dense layout, optimized performance and power)
• opportunity for reuse in many designs
• hard to port them to new manufacturers or technologies -> less and less used
• Examples: embedded uP or memories, DSPs
• Parameterization by means of module compilers
  - Replication of basic macrocells

Hard MacroModules

256x32 (or 8192 bit) SRAM, generated by hard-macro module generator (or memory compiler)
• automatic layout generation
• provides timing and power information
• adds redundancy to deal with defects

Soft Macrocells

Module with a given functionality, but without a specific physical implementation
• Placement and routing may vary from instance to instance
• Timing is not predictable – wait for final layout
• No advantages of full custom design, they rely on the semicustom physical design process
• Ease of migration to new technologies
• Structural generators: specify function and parameters, and they generate:
  - a netlist of standard cells
  - constraints for the place and route tools
• Cleverer structures (than logic synthesis) based on function knowledge (e.g., multipliers)

“Soft” MacroModules

[Figure: Synopsys DesignCompiler; input to Module compiler; 2 instances of 8x8 multiplier module with different aspect ratios]

Macrocell generator: optimized connection of standard cells
Soft approach advantages: different aspect ratios can be generated

Hybrid ASIC design methodology

• Macromodules have changed the semicustom design landscape: design reuse instead of designing from scratch
• Macrocells can be acquired from third-party vendors, who make the parts available through royalty or licensing agreements (Intellectual Property modules, IPs)
• Examples: embedded microprocessors, DSPs, bus interfaces (e.g., PCI), special purpose functions (FFT, ECC, MPEG dec.), graphic accelerators (GPUs)
• For an IP to be useful, it has to come with appropriate software tools, not just hardware (e.g., cross-development toolchain, test benches for validation)

“Intellectual Property”

[Figure: a protocol stack SoC for wireless. Labels: Tensilica Xtensa soft-core, generated from Verilog description; hard-wired (std cells); hard macrocells (compiler from process vendor)]

A typical SoC consists of a blend of design styles and modules, embedding a number of hard or soft macrocells within a sea of standard cells


Design synthesis