Increasing Your Processor Performance with ARM Advantage
Memories and Standard Cells
常骊波
ARM China, December 2007
ARM Advantage Physical IP Technology: Increase Your Processor Performance
A Broad Range of ARM Processors to Choose From
[Figure: ARM processor roadmap for 2005 and 2006, with performance in DMIPS (axis marks at 100, 250, 300, 500, 600, 1000+ and 2000+). Processors shown: ARM7TDMI®, ARM7TDMI-S™, ARM7EJ-S™, ARM946E-S™, ARM966E-S™, ARM968E-S™, ARM926EJ-S™, ARM1026EJ-S™, ARM1136EJ-S™, ARM1156T2F-S™, ARM1176JZF-S™, ARM11™ MPCore™ x4, and the ARM® Cortex™ "Intelligent Computing" family: Cortex-M3, Cortex-R4, Cortex-A8, Cortex-A9™.]
But...
Are you getting the optimum benefit?
Or is this what you are facing?
A fast processor with slow memory is like driving a sports car in heavy traffic...
Choose the right processor, and then...
[Figure: the right ARM core (ARM1176JZF-S) + optimized ARM Physical IP: WINNER]
ARM Processor Performance Package
• The "Processor Performance Package" (PPP) is ARM Artisan Physical IP that is optimized for use with high-performance ARM processors.
  - Specially designed and optimized memory instances for processor memory
  - High-performance Advantage-HS 12-track standard cell library
  - Floor planning guidelines and other configuration files for "out of the box" implementation
Why Choose the PPP?
• Physical implementation of the processor determines system throughput.
  - Choice of cell library affects power and area numbers.
  - Processor memory performance impacts system performance.
• The PPP provides up to a 20% performance increase over mainstream Advantage memories
  - With minimal impact on dynamic power
  - With very little area impact
• Floor planning guidelines and other ARM documentation make implementation simple.
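To put the 20% figure in timing terms, here is a small arithmetic sketch. It assumes, purely for illustration, that "performance" means clock frequency; the slide does not say so, and the 400 MHz baseline is a hypothetical value, not a number from the deck.

```python
def cycle_time_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def sped_up(freq_mhz: float, gain: float) -> float:
    """Frequency after a fractional gain (0.20 means 20%)."""
    return freq_mhz * (1.0 + gain)


# Hypothetical baseline: a core running at 400 MHz with mainstream memories.
baseline_mhz = 400.0
# 0.20 is the "up to 20%" upper bound quoted on the slide.
ppp_mhz = sped_up(baseline_mhz, 0.20)

# How much shorter each clock period becomes at the higher frequency.
saved_ns = cycle_time_ns(baseline_mhz) - cycle_time_ns(ppp_mhz)
saved_fraction = saved_ns / cycle_time_ns(baseline_mhz)

print(f"{baseline_mhz:.0f} MHz -> {ppp_mhz:.0f} MHz")
print(f"clock period shrinks by {saved_ns:.3f} ns ({saved_fraction:.1%})")
```

Under that assumption, a 20% frequency gain means removing one sixth of the clock period, which is why memory setup and access time weigh so heavily on the achievable speed.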
Design Flow
Step 1: ARM1176JZ[F]-S Configuration
Design Flow - Processor Configuration
• ARM1176JZ[F]-S configuration
  - Use the Verilog memory wrappers to connect the processor signals to the fast memory instances.
  - Use the clock gating cell provided in the PPP to implement high-level architectural clock gating.
  - Validate the connections between the processor and the memory instances using the test bench provided.
Design Flow
Step 1: ARM1176JZ[F]-S Configuration
Step 2: Prepare Libraries for EDA Flow
Design Flow - Prepare EDA Libraries
• Prepare EDA libraries
  - Use the scripts provided to generate the Synopsys Milkyway views, Cadence VoltageStorm views and Magma Volcano views.
Design Flow
Step 1: ARM1176JZ[F]-S Configuration
Step 2: Prepare Libraries for EDA Flow
Step 3: Perform Implementation
Design Flow - Perform Implementation
• Perform implementation
  - Request backend views for GDS2 stream-out and transistor-level DRC/LVS analysis from ARM.
Library Preparation
[Figure: two preparation paths, each branching by EDA vendor.
Standard Cell Library Preparation: for Synopsys flow, for Cadence flow, for Magma flow.
Memory Library Preparation: for Synopsys flow, for Cadence flow, for Magma flow.]
Library Preparation
• Standard Cell Library Preparation
  - For a Synopsys flow, Milkyway libraries of the standard cells are provided as part of the Advantage-HS standard cell library.
  - For a Cadence flow, VoltageStorm views can be generated using the scripts provided.
  - For a Magma flow, scripts are provided for generating the views for both standard cells and memories.
• Memory Library Preparation
  - Scripts are provided for generating Synopsys Milkyway, Cadence VoltageStorm and Magma Volcano views.
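The preparation rules above can be summarized as a small lookup. This is only an illustrative sketch: the dictionary keys, the "provided"/"script" labels and the `prep_plan` helper are hypothetical names, not part of the PPP deliverables or of any EDA tool.

```python
# View format named on the slide for each EDA vendor flow.
VIEW_FORMAT = {
    "synopsys": "Milkyway",
    "cadence": "VoltageStorm",
    "magma": "Volcano",
}

# How each (flow, library kind) pair is obtained, as read from the slide.
# "provided" = shipped ready-made; "script" = generated with supplied scripts.
SOURCE = {
    ("synopsys", "standard_cell"): "provided",
    ("cadence", "standard_cell"): "script",
    ("magma", "standard_cell"): "script",
    ("synopsys", "memory"): "script",
    ("cadence", "memory"): "script",
    ("magma", "memory"): "script",
}


def prep_plan(flow: str) -> list[str]:
    """Return human-readable preparation steps for one vendor flow."""
    flow = flow.lower()
    if flow not in VIEW_FORMAT:
        raise ValueError(f"unknown flow: {flow!r}")
    steps = []
    for kind in ("standard_cell", "memory"):
        how = SOURCE[(flow, kind)]
        verb = "use provided" if how == "provided" else "run script to generate"
        steps.append(f"{kind}: {verb} {VIEW_FORMAT[flow]} views")
    return steps


for name in VIEW_FORMAT:
    print(name, prep_plan(name))
```

The only ready-made case is the Synopsys standard cell library; every other combination goes through the supplied scripts.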
Implementation - Synopsys Flow
Implementation - Cadence Flow
Implementation - Magma Flow
Inputs:
• ARM Processor IP: ARM1176JZ(F)-S
• Technology file (65LP from TSMC)
• RC rules (65LP from TSMC)
• Tool-specific ATPG libraries for memories
• Verilog libraries for standard cells (tsmc65lp_rvt_sc_adv12.v)
• Volcano of standard cells and memories
Flow stages and tools:
• Logical/Physical Synthesis: Blast Create, Talus Design
• Place & Route: Blast Fusion, Talus Vortex
• Signoff Timing & Noise Analysis: Quartz Time
• ATPG: Talus ATPG
• Signoff Parasitic Extraction: Quartz RC
ARM Reference Methodology (iRM)
• ARM Reference Methodologies are designed to provide ARM Partners with a simple, deterministic and rapid route from RTL to GDSII.
• The iRM takes a configured RTL representation of an ARM core and implements it to a cell-level, DRC/LVS-clean representation.
• It provides an accompanying set of models for specific characteristics (timing, test, physical) of the final implementation.
• The Processor Performance Package can be easily integrated into an iRM if higher achievable performance or cache configuration changes are required.
ARM1176JZ[F]-S Performance Package for TSMC 65LP
Static power: 86.70 µW
Dynamic power: 0.363 mW/MHz
Area: 1.80 mm²
Frequency: 506 MHz
Notes: Nominal Vt only. Frequency data from PrimeTime-SI at ss, 1.08 V, 125 °C (un-margined). Power results: Dhrystone at tt, 1.2 V, 25 °C. Area includes RAM at 84% utilization.
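As a rough worked example, the per-MHz dynamic power can be scaled to the quoted frequency. Note the caveat: the frequency is a slow-corner (ss) figure while the power numbers are typical-corner (tt) Dhrystone results, so multiplying them mixes corners and gives only a ballpark estimate. The constant and function names are illustrative.

```python
# Figures quoted on the slide for the ARM1176JZ[F]-S on TSMC 65LP.
FREQ_MHZ = 506.0            # PrimeTime-SI at ss, 1.08 V, 125 C (un-margined)
DYNAMIC_MW_PER_MHZ = 0.363  # Dhrystone at tt, 1.2 V, 25 C
STATIC_UW = 86.70           # static power, in microwatts
AREA_MM2 = 1.80             # includes RAM at 84% utilization


def dynamic_power_mw(freq_mhz: float) -> float:
    """Dynamic power in mW, assuming it scales linearly with frequency."""
    return DYNAMIC_MW_PER_MHZ * freq_mhz


def total_power_mw(freq_mhz: float) -> float:
    """Dynamic plus static power in mW (static converted from microwatts)."""
    return dynamic_power_mw(freq_mhz) + STATIC_UW / 1000.0


print(f"dynamic: {dynamic_power_mw(FREQ_MHZ):.1f} mW")
print(f"total:   {total_power_mw(FREQ_MHZ):.1f} mW")
print(f"static share: {STATIC_UW / 1000.0 / total_power_mw(FREQ_MHZ):.3%}")
```

At these figures the static contribution is a very small fraction of the total, so the estimate is dominated by the dynamic term.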
Performance without Penalty
Feature: ARM-validated deliverables
Benefit: Reduce risk
Feature: Standard cell architecture and memory access timing are critical to CPU speed
  - Optimized memories improve access timing without compromising area.
  - Advantage-HS 12-track standard cell architecture is designed for high performance.
Benefit: 20% performance increase
Feature: automem configuration script for synthesis, supporting cache sizes 8K/8K, 16K/16K, 32K/32K
Benefit: Reduce time to market
Feature: Using LVt to achieve equivalent speed can add up to 5% wafer cost plus additional mask cost
Benefit: Save $
ARM1176 Performance Package Deliverables
• ARM Advantage-HS standard cell library (CLN65LP)
  - 12-track-high cell architecture for high performance
  - Large cell set with over 900 cells and fine drive strength granularity
  - Multiple beta ratios for often-used cells, enabling power/performance optimization
  - Robust power rail architecture to support high-performance designs
• Pre-configured RAM instances for all cache configurations
  - Performance numbers achieved using RVt only
  - DFT views provided for FastScan and TetraMAX
• Documentation includes:
  - Automatic memory configuration for L1 cache instances (8K/8K, 16K/16K, 32K/32K only)
  - Guidelines on the integration of TCM memories
  - Library preparation for Synopsys, Cadence and Magma EDA tool flows
  - Floor planning guidelines and references to other ARM documentation
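The restriction on automatic L1 cache configuration can be expressed as a simple check. This sketch is not the automem script itself, whose interface the slides do not show; the function name is hypothetical, and it assumes that "8K/8K" denotes the instruction and data cache sizes in KB.

```python
# L1 cache pairs covered by automatic memory configuration, per the slide.
# Assumption: each pair is (instruction cache KB, data cache KB).
SUPPORTED_L1_KB = {(8, 8), (16, 16), (32, 32)}


def is_auto_configurable(icache_kb: int, dcache_kb: int) -> bool:
    """True if the cache pair is one of the pre-configured combinations."""
    return (icache_kb, dcache_kb) in SUPPORTED_L1_KB


for pair in [(16, 16), (16, 32), (64, 64)]:
    status = "supported" if is_auto_configurable(*pair) else "not covered"
    print(f"{pair[0]}K/{pair[1]}K: {status}")
```

Asymmetric or larger configurations fall outside the listed set and would need to be handled separately.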
Supported ARM Processors
• ARM926
• Cortex-R4
• Cortex-A8
• Cortex-A9
Challenge - Implementation Ranges
[Figure: implementation ranges plotted as frequency (200 to 400 MHz) against power (150 to 350 mW), around a nominal performance point. Wanted: higher performance, lower power, higher area density.]
You can accomplish all of these with the Processor Performance Package and other ARM Physical IP.
Mobile Applications
• High speed required for the embedded processor (~650 MHz)
• High density for the rest of the SoC (~300 MHz)
• Aggressive power management
  - Low-leakage "LP/LL" processes
  - Multi-VT designs
  - Low-voltage operation
  - Retention and shutdown modes
• Processor Performance Package is the best choice for the higher-speed ARM processors
• Advantage memory is the most appropriate choice for the high-speed section
• Metro memory is the most appropriate choice for the high-density section
Office or Enterprise Applications
• High speed required over the entire chip (>750 MHz)
• Typically use G or high-speed processes
• Speed is the key criterion
  - Processor Performance Package offers the ideal solution
  - Setup time + access time
  - Memories need to support pipelined outputs for better timing
• High-capacity memories are also required
  - 2-4 Mbits of contiguous SRAM
• Advantage and Advantage-HS memory with pipelined outputs is the most appropriate choice
  - In some cases, low-VT devices may be used in the periphery to further improve access time
  - Large SRAMs greater than 1 Mbit are also required
High-Performance Consumer Applications
• High speed required for the embedded processor (~650 MHz)
• High density for the rest of the SoC (~300 MHz)
• Moderate power management
  - G or low-leakage "LP/LL" processes
  - Multi-VT designs
  - Voltage islands
• Large memories may be required
  - Up to 4 Mbits of single-port SRAM
• Advantage memory with mixed-VT periphery is the most appropriate choice for the high-speed section
• Metro memory with mixed-VT periphery is the most appropriate choice for the high-density section
  - SRAMs larger than 1 Mbit are available as instances
Low-Cost Consumer Applications
• Moderate speed required over the entire SoC (<300 MHz)
• High density required for the entire SoC
• Moderate power management
  - Low-leakage "LP/LL" processes
  - Multi-VT designs
  - Voltage islands
• Low-speed subsegment (<100 MHz)
  - Very low leakage requirements
  - Low-voltage operation
• Metro memory with mixed-VT periphery is the most appropriate choice for the moderate-speed segment
• Metro memory with all high-VT periphery is the most appropriate choice for the low-speed segment
• Memory power management should be used across the chip
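The memory recommendations from the four application slides can be collected into one lookup table. This is a reading aid only: the segment and section keys and the `recommend` helper are hypothetical names, and the strings simply restate what the slides recommend.

```python
# Recommended memory family per (application segment, chip section),
# restated from the mobile, enterprise, and consumer slides.
RECOMMENDED_MEMORY = {
    ("mobile", "high_speed"): "Advantage memory",
    ("mobile", "high_density"): "Metro memory",
    ("enterprise", "high_speed"):
        "Advantage / Advantage-HS memory with pipelined outputs",
    ("hp_consumer", "high_speed"): "Advantage memory, mixed-VT periphery",
    ("hp_consumer", "high_density"): "Metro memory, mixed-VT periphery",
    ("low_cost", "moderate_speed"): "Metro memory, mixed-VT periphery",
    ("low_cost", "low_speed"): "Metro memory, all high-VT periphery",
}


def recommend(segment: str, section: str) -> str:
    """Look up the memory the slides recommend for a segment and section."""
    try:
        return RECOMMENDED_MEMORY[(segment, section)]
    except KeyError:
        raise ValueError(
            f"no recommendation on the slides for {segment}/{section}"
        ) from None


print(recommend("mobile", "high_density"))
print(recommend("low_cost", "low_speed"))
```

The pattern across segments is consistent: Advantage memories serve the speed-critical sections and Metro memories the density-critical ones, with the periphery VT mix tuned to the leakage budget.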
65nm High Performance Platform
• All of the options needed to give the optimum PPA trade-off
• Available at multiple Vt
• PMK for low power at nominal Vt (RVt)
• Advantage-HS (LVt) with Cortex-A8 for maximum performance in consumer devices
• 65nm platforms available for TSMC and Common Platform
Products:
• Standard Cells
  - Advantage SC 10T RVt, HVt, LVt
  - Advantage PMK 10T RVt
  - Metro SC 8T RVt, HVt
  - Metro PMK 8T RVt
  - Advantage SC 12T RVt, HVt, LVt
  - Advantage PMK 12T RVt
• Memory Generators
  - Advantage SRAM-SP 64 Rows/Bank
  - Advantage SRAM-DP
  - Advantage RF-SP
  - Advantage RF-2P
  - Advantage ROM-VIA
  - Metro SRAM-SP 128 Rows/Bank
  - emBISTRx
• I/O Products
  - LVDS 850 MHz, 2.5 V
  - HSTL Class I/II 2.5 V
  - DDR1/2 flip-chip
  - DDR1/2 wire-bond 2.5 V - CUP
• High Speed Serial PHYs
  - PCI Express 1.1
  - PCI Express 2.0
  - XAUI 3.125 Gbps
  - CEI Short-Reach 6.4 Gbps
  - 10G
45nm Low Power Mobile Platform
45nm platform based on IBM CMOS11LP and TSMC 45GS platform also available for licensing today
• Manufacturability is becoming a major issue
  - Yield, variability, test/repair
  - Increased investment will pay off as it reduces cost for high-volume devices
• Cortex-A9 with PMK delivers high performance and low power for Connected Mobile Computers
Products:
• Standard Cells
  - Metro SC 9T RVt, HVt, LVt
  - Metro PMK 9T RVt, HVt, LVt
  - Advantage SC 12T RVt, HVt, LVt
  - Advantage PMK 12T RVt, HVt, LVt
• Memory Generators
  - Advantage SRAM-SP (Large Bit Cell) 64 / 128 R/B
  - Advantage SRAM-SP (Small Bit Cell) 64 / 128 R/B
  - Advantage SRAM-DP 64 / 128 R/B
  - Advantage RF-SP 128 R/B
  - Advantage RF-2P 128 R/B
  - Advantage ROM-VIA 64 R/B
• Memory Self-Test and Repair
  - emBISTRx
• I/O Products - Inline/Staggered
  - GPIO Programmable LVDS SSTL_18 SSTL_2 USB 1.1 PCI-X HSTL Class I/II
• DDR Products
  - MDDR
Conclusion
• ARM cell libraries and memories give you a predictable route to silicon with an industry-standard methodology.
• The ARM Processor Performance Package helps you get the best PPA out of your ARM processor.
• The reference methodology and other ARM documents make implementation an easy task.
• You can target a variety of applications using the Processor Performance Package combined with other ARM Physical IP.