L1: Application Specific Integrated Circuits:
Introduction
Jun-Dong Cho
SungKyunKwan Univ.
Dept. of ECE, Vada Lab. http://vada.skku.ac.kr
VLSI Algorithmic Design Automation Lab.
Contents
• Why ASIC?
• Introduction to System On Chip Design
• Hardware and Software Co-design
• Low Power ASIC Designs
Why ASIC - Design productivity grows!
• Complexity increase: 40 % per year
• Design productivity increase: 15 % per year
• Integration of PCB on single die
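The two growth rates above compound, so the distance between what can be built and what can be designed widens every year. A minimal sketch of that arithmetic follows; normalizing both quantities to 1.0 in year 0 and the function name are assumptions made here for illustration, not part of the slides.

```python
def complexity_to_productivity_gap(years, complexity_growth=0.40,
                                   productivity_growth=0.15):
    """Ratio of design complexity to design productivity after `years`.

    Both quantities are normalized to 1.0 in year 0 (an assumption of this
    sketch); the default rates are the 40 % and 15 % quoted on the slide.
    """
    complexity = (1.0 + complexity_growth) ** years
    productivity = (1.0 + productivity_growth) ** years
    return complexity / productivity


if __name__ == "__main__":
    for years in (1, 5, 10):
        gap = complexity_to_productivity_gap(years)
        print(f"after {years:2d} years: complexity / productivity = {gap:.2f}")
```

Under these assumptions the ratio grows by a factor of 1.40 / 1.15 each year, which is the gap that design reuse and higher-level design are meant to close.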
Silicon in 2010
Die Area: 2.5 x 2.5 cm
Voltage: 0.6 V
Technology: 0.07 µm

                 Density       Access Time
                 (Gbits/cm2)   (ns)
DRAM             8.5           10
DRAM (Logic)     2.5           10
SRAM (Cache)     0.3           1.5

                 Density       Max. Ave. Power   Clock Rate
                 (Mgates/cm2)  (W/cm2)           (GHz)
Custom           25            54                3
Std. Cell        10            27                1.5
Gate Array       5             18                1
Single-Mask GA   2.5           12.5              0.7
FPGA             0.4           4.5               0.25
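To make the logic table concrete, the sketch below multiplies each density and power density by the 2.5 x 2.5 cm die area. It assumes, purely for illustration, that the whole die is filled with a single logic style; the dictionary and function names are hypothetical.

```python
DIE_AREA_CM2 = 2.5 * 2.5  # 2.5 x 2.5 cm die, from the slide

# style: (density in Mgates/cm2, max. average power in W/cm2, clock in GHz)
LOGIC_STYLES = {
    "Custom": (25.0, 54.0, 3.0),
    "Std. Cell": (10.0, 27.0, 1.5),
    "Gate Array": (5.0, 18.0, 1.0),
    "Single-Mask GA": (2.5, 12.5, 0.7),
    "FPGA": (0.4, 4.5, 0.25),
}


def full_die_budget(style):
    """Return (Mgates, watts) if the entire die used one logic style.

    Filling the whole die with a single style is an assumption of this
    sketch, not a claim made by the slide.
    """
    density, power_density, _clock_ghz = LOGIC_STYLES[style]
    return density * DIE_AREA_CM2, power_density * DIE_AREA_CM2


if __name__ == "__main__":
    for style in LOGIC_STYLES:
        mgates, watts = full_die_budget(style)
        print(f"{style:15s} {mgates:8.2f} Mgates {watts:8.2f} W")
```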
ASIC Principles
• Value-added ASIC for huge volume opportunities; standard parts for quick time to market applications
• Economics of Design
  – Fast Prototyping, Low Volume
  – Custom Design, Labor Intensive, High Volume
• CAD Tools Needed to Achieve the Design Strategies
  – System-level design: Concept to VHDL/C
  – Physical design: VHDL/C to silicon, Timing closure (Monterey, Magma, Synopsys, Cadence, Avant!)
• Design Strategies: Hierarchy; Regularity; Modularity; Locality
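The economics bullet contrasts low-volume fast prototyping with high-volume custom design. One common way to reason about that split is a break-even volume between a low-NRE, high-unit-cost option and a high-NRE, low-unit-cost one. The sketch below is only an illustration: the cost model and every number in it are hypothetical and do not come from the slides.

```python
def total_cost(nre, unit_cost, volume):
    """Total cost = one-time engineering cost (NRE) + per-unit cost * volume."""
    return nre + unit_cost * volume


def break_even_volume(nre_low, unit_low, nre_high, unit_high):
    """Volume above which the high-NRE, low-unit-cost option is cheaper."""
    if nre_high <= nre_low or unit_high >= unit_low:
        raise ValueError("expected higher NRE and lower unit cost for option 2")
    return (nre_high - nre_low) / (unit_low - unit_high)


# Hypothetical figures, chosen only to illustrate the calculation.
PROTOTYPE = {"nre": 50_000.0, "unit": 40.0}    # fast prototyping, low volume
CUSTOM = {"nre": 2_000_000.0, "unit": 5.0}     # custom design, high volume

if __name__ == "__main__":
    volume = break_even_volume(PROTOTYPE["nre"], PROTOTYPE["unit"],
                               CUSTOM["nre"], CUSTOM["unit"])
    print(f"break-even at about {volume:,.0f} units")
```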
ASIC Design Strategies
• Design is a continuous tradeoff to achieve performance specs with adequate results in all the other parameters.
• Performance Specs - function, timing, speed, power
• Size of Die - manufacturing cost
• Time to Design - engineering cost and schedule
• Ease of Test Generation & Testability - engineering cost, manufacturing cost, schedule
ASIC Flow
Structured ASIC Designs
• Hierarchy: Subdivide the design into many levels of sub-modules
• Regularity: Subdivide to max number of similar sub-modules at each level
• Modularity: Define sub-modules unambiguously, with well-defined interfaces
• Locality: Max local connections, keeping critical paths within module boundaries
ASIC Design Options
• Programmable Logic
• Programmable Interconnect
• Reprogrammable Gate Arrays
• Sea of Gates & Gate Array Design
• Standard Cell Design
• Full Custom Mask Design
• Symbolic Layout
• Process Migration - Retargeting Designs
ASIC Design Methodologies

                     Custom      Cell-based  Prediffused  Prewired
Density              Very High   High        High         Medium - Low
Performance          Very High   High        High         Medium - Low
Flexibility          Very High   High        Medium       Low
Design time          Very Long   Short       Short        Very Short
Manufacturing time   Medium      Medium      Short        Very Short
Cost - low volume    Very High   High        High         Low
Cost - high volume   Low         Low         Low          High
Why SOC?
• SOC specs are coming from ICT system engineers rather than from RTL descriptions.
• SOC will bridge the gap between software and its implementation in novel, energy-efficient silicon architectures.
• In SOC design, chips are assembled at the level of IP blocks (design reuse) and IP interfaces rather than at the gate level.
Common Fabric for IP Blocks
• Soft IP blocks are portable, but not as predictable as hard IP.
• Hard IP blocks are very predictable since a specific physical implementation can be characterized, but are hard to port since they are often tied to a specific process.
• Common fabric is required for both portability and predictability.
• Wide availability: Cell Based Array, a metal-programmable architecture that provides the performance of a standard cell and is optimized for synthesis.
Four main applications
Set-top box: Mobile multimedia system, base station for the home local-area network.
Digital PCTV: concurrent use of TV, 3D graphics, and Internet services
Set-top box LAN service: Wireless home-networks, multi-user wireless LAN
Navigation system: steer and control traffic and/or goods-transportation
PC-Multimedia Applications
Types of System-on-a-Chip Designs
Physical gap
Timing closure problem: layout-driven logic and RT-level synthesis
Energy efficiency requires locality of computation and storage: a good match for stream-based data processing of speech, images, and multimedia-system packets.
Next generation SOC designers must bridge the architectural gap between system specification and energy-efficient IP-based architectures, while CAE vendors and IP providers will bridge the physical gap.
Circular Y-Chart
SOC Co-Design Challenges
• Current systems are complex and heterogeneous
  – Contain many different types of components
• Half of the chip can be filled with 200 low-power, RISC-like processors (ASIP) interconnected by field-programmable buses, embedded in 20 Mbytes of distributed DRAM and flash memory
• Another half: ASIC
• Computational power will not result from multi-GHz clocking but from parallelism, with clock rates below 200 MHz. This will greatly simplify the design for correct timing, testability, and signal integrity.
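The claim that throughput comes from parallelism rather than clock rate can be checked with simple arithmetic. The sketch below assumes each processor retires a fixed number of instructions per cycle (one by default); that assumption and the function names are hypothetical. The 200 MHz value is used as the upper bound the slide mentions.

```python
import math


def aggregate_gips(num_processors, clock_hz, instr_per_cycle=1.0):
    """Peak aggregate throughput in giga-instructions per second.

    Assumes perfect parallel scaling and a fixed instructions-per-cycle
    figure, both simplifications made for this sketch.
    """
    return num_processors * clock_hz * instr_per_cycle / 1e9


def processors_needed(target_gips, clock_hz, instr_per_cycle=1.0):
    """Smallest processor count that reaches `target_gips` at `clock_hz`."""
    return math.ceil(target_gips * 1e9 / (clock_hz * instr_per_cycle))


if __name__ == "__main__":
    # 200 processors at the 200 MHz upper bound quoted on the slide.
    print(aggregate_gips(200, 200e6), "GIPS peak")
    # Processor count for an example target of 50 GIPS at the same clock.
    print(processors_needed(50, 200e6), "processors")
```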
Bridging the architectural gap
• One million gates of reconfigurable logic, one million gates of hardwired logic.
• 50 GIPS for programmable components or 500 GIPS for dedicated hardware
• Product reliability: design at a level far above the RT level, with reuse factors in excess of 100
• Trade-off: 100 MOPs/watt (microprocessor) vs. 100 GOPs/watt (hardwired)
• Reconfigurable computing with a large number of computing nodes and a very restricted instruction set (Pleiades)
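The trade-off figures become easier to compare when inverted into energy per operation, since one watt is one joule per second. The sketch below does only that conversion; the function and constant names are introduced here for illustration.

```python
def joules_per_operation(ops_per_second_per_watt):
    """Energy per operation: 1 W = 1 J/s, so J/op = 1 / (ops/s per W)."""
    return 1.0 / ops_per_second_per_watt


MICROPROCESSOR_OPS_PER_WATT = 100e6   # 100 MOPs/watt, from the slide
HARDWIRED_OPS_PER_WATT = 100e9        # 100 GOPs/watt, from the slide

if __name__ == "__main__":
    e_cpu = joules_per_operation(MICROPROCESSOR_OPS_PER_WATT)
    e_hw = joules_per_operation(HARDWIRED_OPS_PER_WATT)
    print(f"microprocessor: {e_cpu * 1e9:.1f} nJ per operation")
    print(f"hardwired:      {e_hw * 1e12:.1f} pJ per operation")
    print(f"ratio:          {e_cpu / e_hw:.0f}x")
```

Read this way, the slide's two figures differ by three orders of magnitude in energy per operation, which is the space reconfigurable architectures try to occupy.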
Why Lower Power
• Portable systems: long battery life, light weight, small form factor
• IC priority list: power dissipation, cost, performance
• Technology direction: reduced voltage/power designs based on mature high performance IC technology, high integration to minimize size, cost, power, and speed
Microprocessor Power Dissipation
[Figure: power (W), 0 to 50, versus year, 1980 to 2000, for the i286, i386 DX 16, i486 DX 25, i486 DX 50, i486 DX2 66, i486 DX4 100, P5 66, P6 166, P II 300, P III 500, P-PC601 50, P-PC604 133, P-PC750 400, Alpha 21064 200, Alpha 21164, and Alpha 21264]
Levels for Low Power Design
System:         Hardware-software partitioning, Power down
Algorithm:      Complexity, Concurrency, Locality, Regularity, Data representation
Architecture:   Parallelism, Pipelining, Signal correlations, Instruction set selection, Data rep.
Circuit/Logic:  Sizing, Logic Style, Logic Design
Technology:     Threshold Reduction, Scaling, Advanced packaging, SOI

Possible Power Savings at Different Design Levels
Level of Abstraction   Expected Saving
Algorithm              10 - 100 times
Architecture           10 - 90%
Logic Level            20 - 40%
Layout Level           10 - 30%
Device Level           10 - 30%
Power-hungry Applications
Signal Compression: HDTV Standard, ADPCM, Vector Quantization, H.263, 2-D motion estimation, MPEG-2 storage management
Digital Communications: Shaping Filters, Equalizers, Viterbi decoders, Reed-Solomon decoders
New Computing Platforms
• SOC power efficiency more than 10 GOPs/W
• Higher on-chip system integration: COTS: 100 W, SOAC: 10 W (inter-chip capacitive loads, I/O buffers)
• Speed & performance: shorter interconnection, fewer drivers, faster devices, more efficient processing architectures
• Mixed signal systems
• Reuse of IP blocks
• Multiprocessor, configurable computing
• Domain-specific, combined memory-logic
$P = k\,C\,F\,V^{2}$
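The power relation on this slide is the usual dynamic-power model. The slide does not define its symbols, so the sketch below assumes the conventional reading: k is a switching-activity factor, C the switched capacitance, F the clock frequency, and V the supply voltage. All numeric values are hypothetical.

```python
def dynamic_power(k, c_farads, f_hz, v_volts):
    """Dynamic power P = k * C * F * V^2 in watts.

    Symbol meanings (activity factor, switched capacitance, clock frequency,
    supply voltage) are the conventional ones and are assumed here.
    """
    return k * c_farads * f_hz * v_volts ** 2


if __name__ == "__main__":
    # Hypothetical operating point, for illustration only.
    base = dynamic_power(k=0.5, c_farads=1e-9, f_hz=200e6, v_volts=3.3)
    half_v = dynamic_power(k=0.5, c_farads=1e-9, f_hz=200e6, v_volts=1.65)
    print(f"at 3.3 V:  {base:.3f} W")
    print(f"at 1.65 V: {half_v:.3f} W ({half_v / base:.2f} of the original)")
```

Because the voltage term is squared, reducing V is the most effective single lever in this model, which is why the deck keeps returning to voltage reduction.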
Three Factors affecting Energy
– Reducing waste by Hardware Simplification: redundant h/w extraction, locality of reference, demand-driven / data-driven computation, application-specific processing, preservation of data correlations, distributed processing
– All-in-one approach (SOC): I/O pin and buffer reduction
– Voltage Reducible Hardware
  • 2-D pipelining (systolic arrays)
  • SIMD: parallel processing, useful for data with parallel structure
  • VLIW: flexible approach
IBM’s PowerPC Lower Power Architecture
• Optimum supply voltage through hardware parallelism, pipelining, and parallel instruction execution
  – 603e executes five instructions in parallel (IU, FPU, BPU, LSU, SRU)
  – FPU is pipelined so a multiply-add instruction can be issued every clock cycle
  – Low power 3.3-volt design
• Use small complex instruction with smaller instruction length
  – IBM’s PowerPC 603e is RISC
• Superscalar: CPI < 1
  – 603e issues as many as three instructions per cycle
• Low Power Management
  – 603e provides four software controllable power-saving modes.
• Copper Processor with SOI
• IBM’s Blue Logic ASIC: new design reduces power by a factor of 10
Power-Down Techniques
◆ Lowering the voltage along with the clock actually alters the energy-per-operation of the microprocessor, reducing the energy required to perform a fixed amount of work
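The distinction made above, that slowing the clock alone does not save energy for a fixed job while lowering the voltage does, can be sketched with a per-cycle energy model. The model (a fixed effective capacitance charged to V once per cycle), the operating points, and the function names are assumptions for illustration.

```python
def task_energy(cycles, c_eff_farads, v_volts):
    """Energy in joules for a fixed job of `cycles` clock cycles.

    Modeled as C_eff * V^2 per cycle; note that frequency does not appear.
    """
    return cycles * c_eff_farads * v_volts ** 2


def task_time(cycles, f_hz):
    """Run time in seconds for the same job."""
    return cycles / f_hz


if __name__ == "__main__":
    CYCLES, C_EFF = 1_000_000, 1e-9  # hypothetical workload and capacitance
    points = {
        "full speed":        (3.3, 200e6),
        "clock only halved": (3.3, 100e6),
        "voltage and clock": (1.65, 100e6),  # hypothetical scaled point
    }
    for name, (v, f) in points.items():
        e = task_energy(CYCLES, C_EFF, v)
        t = task_time(CYCLES, f)
        print(f"{name:18s} {e * 1e3:7.3f} mJ in {t * 1e3:6.2f} ms")
```

In this model, halving only the clock doubles the run time at the same energy, whereas scaling the voltage with it cuts the energy per operation.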
Implementing Digital Systems
H/W and S/W Co-design
Three Co-Design Approaches
Source: N.S. Voros et al., “Hardware-software co-design of embedded systems using multiple formalisms for application development”, IFIP International Conference FORTE/PSTV’98, Nov. ’98
• ASIP co-design: starts with an application, builds a specific programmable processor and translates the application into software code. H/w and s/w partitioning includes the instruction set design.
• H/w s/w synchronous system co-design: s/w processor as a master controller, and a set of h/w accelerators as co-processors. Vulcan, Codes, Tosca, Cosyma
• H/w s/w for distributed systems: mapping of a set of communicating processes onto a set of interconnected processors. Behavioral decomposition, process allocation and communication transformation. Coware (powerful), Siera (reuse), Ptolemy (DSP)
Mixing H/W and S/W
• Argument: Mixed hardware/software systems represent the best of both worlds. High performance, flexibility, design reuse, etc.
• Counterpoint: From a design standpoint, it is the worst of both worlds
  – Simulation: Problems of verification and test become harder
  – Interface: Too many tools, too many interactions, too much heterogeneity
• Hardware/software partitioning is “AI-complete”!
Low power partitioning approach
• Different HW resources are invoked according to the instruction executed at a specific point in time
• During the execution of the add op., the ALU and register are used, but the multiplier is in idle state.
• Non-active resources will still consume energy since the corresponding circuits continue to switch
• Calculate the wasted energy
• Add application-specific cores with partial running: whenever one core is performing, all the other cores are shut down
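The "calculate the wasted energy" step can be sketched as a walk over an instruction trace that adds up the energy of every resource the current instruction does not use. The resource names follow the add example above; the per-cycle energy values, the instruction-to-resource mapping, and the idealized shutdown (no leakage, no wake-up cost) are hypothetical.

```python
# Hypothetical per-cycle energy (arbitrary units) burned by a resource that
# is powered but not used by the current instruction.
IDLE_ENERGY = {"ALU": 2, "Register": 1, "Multiplier": 5}

# Hypothetical mapping from instruction to the resources it actually uses.
USES = {
    "add": {"ALU", "Register"},
    "mul": {"Multiplier", "Register"},
}


def wasted_energy(trace, shut_down_idle=False):
    """Sum the energy spent by resources that are idle in each cycle.

    With `shut_down_idle` the idle resources are assumed to consume nothing,
    an idealization that ignores leakage and wake-up overhead.
    """
    if shut_down_idle:
        return 0
    wasted = 0
    for op in trace:
        for resource, energy in IDLE_ENERGY.items():
            if resource not in USES[op]:
                wasted += energy
    return wasted


if __name__ == "__main__":
    trace = ["add", "add", "mul", "add"]
    print("wasted without shutdown:", wasted_energy(trace))
    print("wasted with shutdown:   ", wasted_energy(trace, shut_down_idle=True))
```

A partitioner could use a figure like this to decide which operations justify their own application-specific core.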
ASIP Design
• Given a set of applications, determine the microarchitecture of the ASIP (i.e., configuration of functional units in datapaths, instruction set)
• To accurately evaluate the performance of a processor on a given application, one needs to compile the application program onto the processor datapath and simulate the object code.
• The microarchitecture of the processor is a design parameter!
ASIP Design Flow
Cross-Disciplinary nature
• Software for low power: loop transformation leads to much higher temporal and spatial locality of data.
• Code size becomes an important objective
• Software will eventually become a part of the chip
• Behavior-platform-compiler codesign: codesigned with C++ or JAVA, describing their h/w and s/w implementation.
• Multidisciplinary system thinking is required for future designs (e.g., Eindhoven Embedded Systems Institute http://www.eesi.tue.nl/english)
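As one example of the loop transformations mentioned in the first bullet, loop fusion merges two passes over the data so that each intermediate value is consumed right after it is produced, instead of being written to a temporary array and read back later. The sketch below shows only the structure of the transformation; the function names are hypothetical, and Python lists will not reproduce the memory-energy effect seen on real hardware.

```python
def two_pass(samples):
    """Unfused version: the whole intermediate array is stored, then re-read."""
    scaled = [2 * x for x in samples]       # pass 1 writes a temporary array
    return [s + 1 for s in scaled]          # pass 2 reads it back


def fused(samples):
    """Fused version: each value is produced and consumed in one iteration."""
    result = []
    for x in samples:
        s = 2 * x          # the intermediate lives only for this iteration
        result.append(s + 1)
    return result


if __name__ == "__main__":
    data = list(range(8))
    print(two_pass(data))
    print(fused(data))
```

Both functions compute the same result; the fused form simply removes the temporary array, which is where the improvement in temporal locality comes from.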
VLSI Signal Processing Design Methodology
• pipelining, parallel processing, retiming, folding, unfolding, look-ahead, relaxed look-ahead, and approximate filtering
• bit-serial, bit-parallel and digit-serial architectures, carry save architecture
• redundant and residue systems
• Viterbi decoder, motion compensation, 2D-filtering, and data transmission systems
Low Power DSP
• DO-LOOP Dominant
  – VSELP Vocoder: 83.4 %
  – 2D 8x8 DCT: 98.3 %
  – LPC computation: 98.0 %
• DO-LOOP Power Minimization ==> DSP Power Minimization
VSELP: Vector Sum Excited Linear Prediction
LPC: Linear Prediction Coding
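Why loop optimization dominates can be shown with an Amdahl-style estimate. The sketch below assumes the percentages are the share of total power spent inside DO-loops, which the slide does not state explicitly, and the reduction factor of 2 is hypothetical.

```python
# Loop shares from the slide, read here as fractions of total power
# (an assumption of this sketch).
LOOP_SHARE = {
    "VSELP Vocoder": 0.834,
    "2D 8x8 DCT": 0.983,
    "LPC computation": 0.980,
}


def remaining_power_fraction(loop_share, loop_reduction_factor):
    """Fraction of the original power left after shrinking only loop power."""
    return (1.0 - loop_share) + loop_share / loop_reduction_factor


if __name__ == "__main__":
    for name, share in LOOP_SHARE.items():
        left = remaining_power_fraction(share, loop_reduction_factor=2.0)
        print(f"{name:16s} {left:.3f} of original power remains")
```

The closer the loop share is to 100 %, the closer the overall saving tracks the saving achieved inside the loops.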
Deep-Submicron Design Flows
• Rapid evaluation of complex designs for area and performance
• Timing convergence via estimated routing parasitics
• In-place timing repair without resynthesis
• Shorter design intervals, minimum iterations
• Block-level design and place and route
• Localized changes without disturbance
• Integration of complex projects and design reuse
SOC CAD Companies
• Avant! www.avanticorp.com
• Cadence www.cadence.com
• Duet Tech www.duettech.com
• Escalade www.escalade.com
• LogicVision www.logicvision.com
• Mentor Graphics www.mentor.com
• Palmchip www.palmchip.com
• Sonics www.sonicsinc.com
• Summit Design www.summit-design.com
• Synopsys www.synopsys.com
• Topdown design solutions www.topdown.com
• Xynetix Design Systems www.xynetix.com
• Zuken-Redac www.redac.co.uk