Understanding the tradeoffs and tuning the methodology
rtcgroup.com/arm/2007/presentations/128 - the...
TRANSCRIPT
1
Understanding the tradeoffs and Tuning the methodology
Graham Scott, Technical Lead ARM Cortex Application Processors, Cadence
Nandan Nayampally, Director CPU Product Marketing, ARM Inc
2
Agenda
Market drivers for the Cortex-A8
Introduction to the Cortex-A8
The range of implementation options for the Cortex-A8
Summary
3
Enabling Enhanced User Experiences
Browse any website
Handle and process high data rates
Handle any office email & document
Provide the power for the next-gen 3D games
Edit & Enhance 8MP photos
Edit & Enhance captured videos
Watch any video in any format
Handle future User Interfaces
(Timeline shown on slide: 2008, 2009)
4
Multi-function and Convergence
5
CMC Requirements Example
Better Processor and System Performance
Rich Operating Systems
Advanced browsers
Advanced security
General applications
3D graphics & gaming
Java & Execution Environments
High bandwidth networks
Multi-format audio
Video Recorder / Player
Voice/Video over IP
..and these pictures are from today’s ARM technology
6
Cortex-A8: High Performance Uni-Processor
High throughput processor
  >1GHz operation, delivering 2,000+ DMIPS
  Low-power implementations at under 300 mW
In-order, dual-issue, superscalar core
MMU for running virtual memory open OS
Thumb-2 technology
NEON™ media acceleration technology
Jazelle-RCT technology
TrustZone security foundation
IEM™ Intelligent Energy Management and leakage control
Configurable L1 caches
Integrated L2 Cache
  Configurable size 0K - 1MB with programmable wait-states
  ECC error checking for fault-tolerance.
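As a quick consistency check on the throughput figures, the sketch below normalises the DMIPS numbers quoted in this deck by clock frequency. It treats ">1GHz" and "2,000+ DMIPS" as the point values 1000 MHz and 2000 DMIPS, which is an assumption; the function and dictionary names are illustrative only.

```python
def dmips_per_mhz(dmips: float, freq_mhz: float) -> float:
    """Dhrystone throughput normalised by clock frequency."""
    if freq_mhz <= 0:
        raise ValueError("frequency must be positive")
    return dmips / freq_mhz


# (frequency in MHz, DMIPS) pairs quoted on the slides of this deck.
# Lower bounds such as ">1GHz" and "800MHz+" are treated as point values.
quoted_points = {
    "headline": (1000.0, 2000.0),
    "TSMC 90G synthesizable": (700.0, 1400.0),
    "TSMC 65LP synthesizable": (500.0, 1000.0),
    "TSMC 65GP synthesizable (NVT)": (800.0, 1600.0),
}

ratios = {name: dmips_per_mhz(d, f) for name, (f, d) in quoted_points.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} DMIPS/MHz")
```

Every quoted point works out to the same DMIPS/MHz ratio, so in this deck the DMIPS differences between implementations track the achieved clock frequency.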
7
A Selection of Real Performance Points
This part of the presentation will give real examples of the range of performance achievable.
Signoff criteria need to be known to compare performance points
Performance numbers are achievable with the stated design flow and all frequency values shown are worst case (Vdd-10%, SS, 125C)
Criteria Include:
  OCV (10% inside WC and BC)
  Setup Margin (50ps)
  Well ties (where appropriate)
  Metal Fill
  Dense power grid
  Limited metal layer usage (65nm flows only)
  Holds fixed
  Minimal Routing Violations
Unless stated, all physical IP available from ARM
Flow generally tuned with Performance and Power Given Equal Weight
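To give a feel for what the 50ps setup margin costs at the frequencies quoted in this deck, here is a minimal sketch that converts a clock frequency to a period and reports the share taken by the margin. It deliberately ignores OCV derating, clock uncertainty and every other criterion in the list, so it is an illustration of the arithmetic rather than a signoff calculation; the function names are hypothetical.

```python
SETUP_MARGIN_PS = 50.0  # "Setup Margin (50ps)" from the signoff criteria


def period_ps(freq_mhz: float) -> float:
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1.0e6 / freq_mhz


def usable_period_ps(freq_mhz: float, margin_ps: float = SETUP_MARGIN_PS) -> float:
    """Period left for logic after reserving the setup margin (simplified)."""
    return period_ps(freq_mhz) - margin_ps


def margin_fraction(freq_mhz: float, margin_ps: float = SETUP_MARGIN_PS) -> float:
    """Share of the clock period consumed by the setup margin."""
    return margin_ps / period_ps(freq_mhz)


for f in (500.0, 700.0, 800.0):  # frequencies quoted on later slides
    print(f"{f:.0f} MHz: period {period_ps(f):.1f} ps, "
          f"usable {usable_period_ps(f):.1f} ps, "
          f"margin {margin_fraction(f):.1%} of period")
```

The same fixed margin takes a larger share of the cycle as the target frequency rises, which is one reason the signoff criteria need to be known before comparing performance points.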
8
Implementation Flow
Inputs: Constraints, Floorplan, Netlist, Std Cell Models, IP/Block Models
N2N Opt
tuned targets
Placement
  setPlaceMode -timingDriven
  placeDesign -inPlaceOpt
PreCTS Opt
  Optional: setOptMode -usefulSkew
  optDesign -preCts
Clock Synth
  clockDesign
  Reduce Clock Uncertainty
PostCts Opt
  Optional: setOptMode -usefulSkew
  optDesign -postCts [-ilm]
  Optional: optDesign -postCts -hold [-ilm]
S.M.A.R.T. NR
  Wire Spreading for SI and Yield
  Concurrent MCV insertion for Yield
postRoute Opt
  Concurrently Optimize Timing/SI
  optDesign -postRoute -setup -si
  optDesign -postRoute -hold -si
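The flow on this slide can be captured as ordered data, which makes the optional steps explicit. The sketch below is one possible reading, not the presenters' script: the stage order is assumed from the command names (preCts, postCts, postRoute), the "Optional:" label is assumed to apply only to the command that directly follows it, and N2N Opt is left out because its position in the diagram is not recoverable. Command strings are copied from the slide, including the bracketed [-ilm] notation, so the output is a listing rather than an executable tool script. `Stage`, `FLOW` and `listing` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    # Each step is (text copied from the slide, is_optional).
    steps: Tuple[Tuple[str, bool], ...]


# Assumed order: placement, pre-CTS, clock synthesis, post-CTS, route, post-route.
FLOW: List[Stage] = [
    Stage("Placement", (("setPlaceMode -timingDriven", False),
                        ("placeDesign -inPlaceOpt", False))),
    Stage("PreCTS Opt", (("setOptMode -usefulSkew", True),
                         ("optDesign -preCts", False))),
    Stage("Clock Synth", (("clockDesign", False),)),
    Stage("PostCts Opt", (("setOptMode -usefulSkew", True),
                          ("optDesign -postCts [-ilm]", False),
                          ("optDesign -postCts -hold [-ilm]", True))),
    Stage("S.M.A.R.T. NR", ()),  # the slide lists routing goals, not commands
    Stage("postRoute Opt", (("optDesign -postRoute -setup -si", False),
                            ("optDesign -postRoute -hold -si", False))),
]


def listing(flow: List[Stage], include_optional: bool = True) -> List[str]:
    """Flatten the flow into text lines, optionally skipping optional steps."""
    lines: List[str] = []
    for stage in flow:
        lines.append(f"# {stage.name}")
        for text, optional in stage.steps:
            if optional and not include_optional:
                continue
            lines.append(text)
    return lines


print("\n".join(listing(FLOW)))
```

Calling `listing(FLOW, include_optional=False)` gives the minimal flow without useful-skew and post-CTS hold optimization, which is a convenient way to compare the tuned and untuned variants side by side.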
9
TSMC 90G Synthesizable
First fully synthesizable flow for the Cortex-A8 to be distributed by ARM.
700MHz, 1400DMIPS.
Worst case: Includes all margins discussed earlier
Used as baseline for following discussion
Comments:
TSMC 90G was chosen for comparative reasons:
  The first Optimized implementation of the Cortex-A8 was implemented in this process.
  This process is not amenable to wireless/mobile due to its higher leakage; however, it is still a consideration for Consumer and Enterprise applications.
10
Use of Optimized techniques
The Cortex-A8 processor was designed for synthesizable implementation. However, it was also partitioned so that partners could:
  Apply advanced circuit design and structured implementation techniques
  Improve performance, power and area
  Significantly reduce resource requirement compared to full-custom
Along with synthesizable blocks, the Optimized implementation uses:
  Limited set of Custom array blocks
    Improve frequency and reduce power
  Structured datapath blocks
    Improve frequency, power and area.
  Advanced clocking (clock mesh)
    Improves frequency and power.
11
TSMC 65LP Synthesizable
Fully synthesized implementation
500MHz, 1000DMIPS.
Leakage Power < 0.5% of 90G
Dynamic Power < 50% of 90G
Considerations:
  Always ON operation
Key requirements:
  Extremely low stand-by power
    >100x lower than 90G
  Efficient high-performance
Tradeoffs:
  Higher dynamic power
    Higher Vdd (1.2V)
  Lower frequency
    Slower, less-leaky transistors
Summary: Higher DMIPS/mW
Implementation Targets
  Ideal for high-end mobile devices
  High-performance with Low-leakage
  Implementation geared towards low-power
  Methodology can be tuned to enable multiple voltage domains and added power savings
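The tradeoff above links higher dynamic power to the higher Vdd (1.2V). A rough way to see why is the standard first-order CMOS relation, dynamic power proportional to C * Vdd^2 * f. The sketch below applies it as a ratio, assuming equal switched capacitance and activity on both sides. The 1.0V reference supply in the example is a hypothetical value chosen for illustration and does not come from the slides.

```python
def relative_dynamic_power(vdd: float, freq_mhz: float,
                           ref_vdd: float, ref_freq_mhz: float) -> float:
    """Ratio of dynamic power to a reference point under P ~ C * Vdd^2 * f.

    Assumes the same switched capacitance and activity factor on both sides,
    which is a simplification for real implementations.
    """
    return (vdd / ref_vdd) ** 2 * (freq_mhz / ref_freq_mhz)


# 1.2 V is the Vdd quoted for 65LP on the slide; 1.0 V is a hypothetical
# reference supply used only to show the size of the quadratic effect.
same_freq = relative_dynamic_power(vdd=1.2, freq_mhz=500.0,
                                   ref_vdd=1.0, ref_freq_mhz=500.0)
print(f"1.2 V vs 1.0 V at equal frequency: {same_freq:.2f}x dynamic power")
```

Because supply voltage enters quadratically, a 20% higher Vdd costs noticeably more than 20% in dynamic power at the same frequency, which is the cost being traded for the much lower leakage.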
12
TSMC 65GP Synthesizable (NVT)
Considerations:
  Only ON when in use (battery)
  Tethered (no battery)
Key requirements:
  High-end performance
  Lower dynamic power
    to reduce packaging cost
  Lower area cost
Tradeoffs:
  Higher leakage power
Summary: Low area and power cost
Fully synthesized implementation
800MHz+, 1600+ DMIPS.
Leakage Power > 50x of 65LP
Dynamic power < 70% of 65LP
Implementation Targets
  Ideal for high-end consumer or tethered devices
  High-performance with Low dynamic power
  Implementation geared towards performance
  Methodology tuned to achieve right performance/cost (area/power) balance
13
TSMC 65GP Synthesizable (LVT)
Fully synthesized implementation
Further Boost in performance over NVT implementation
Leakage Power > 100% of 65LP
Dynamic Power < 60% of 65LP
Considerations:
  Only ON when in use (battery)
  Tethered (no battery)
  Extra Performance
Key requirements:
  High-end performance
  Lower dynamic power
    to reduce packaging cost
  Lower area cost
Tradeoffs:
  Higher leakage power again
Summary: Low area and power cost
Implementation Targets
  Ideal for tethered consumer and enterprise devices
  High-performance with Low dynamic power
  Implementation tuned for performance
  Methodology tuned to maintain or increase performance while reducing area and dynamic power
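The dynamic power figures for 65GP are quoted relative to 65LP, while 65LP is quoted relative to the 90G baseline. The sketch below chains those upper bounds to express the 65GP figures against 90G. It assumes the percentages on the different slides compare like for like, which the slides do not state; the helper name is hypothetical.

```python
from functools import reduce


def chain_upper_bounds(*ratios: float) -> float:
    """If A < r1 * B and B < r2 * C (all positive), then A < r1 * r2 * C."""
    return reduce(lambda acc, r: acc * r, ratios, 1.0)


LP_OVER_90G = 0.50      # "Dynamic Power < 50% of 90G" (65LP slide)
GP_NVT_OVER_LP = 0.70   # "Dynamic power < 70% of 65LP" (65GP NVT slide)
GP_LVT_OVER_LP = 0.60   # "Dynamic Power < 60% of 65LP" (65GP LVT slide)

nvt_over_90g = chain_upper_bounds(GP_NVT_OVER_LP, LP_OVER_90G)
lvt_over_90g = chain_upper_bounds(GP_LVT_OVER_LP, LP_OVER_90G)

print(f"65GP NVT dynamic power < {nvt_over_90g:.0%} of 90G")
print(f"65GP LVT dynamic power < {lvt_over_90g:.0%} of 90G")
```

The leakage figures cannot be chained the same way: the 65LP number is an upper bound relative to 90G, while the 65GP numbers are lower bounds relative to 65LP, so multiplying them does not yield a valid bound in either direction.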
14
Summary
The Cortex-A8 has a range of implementation choices
  Optimized implementation techniques provide best performance, area and power
  Synthesized implementations can also provide a high level of performance while maintaining aggressive power goals.
Depending on application, the choice of process and tuning of the methodology can help the user achieve the ideal performance/cost tradeoff.