Understanding the tradeoffs and Tuning the methodology
rtcgroup.com/arm/2007/presentations/128 - the...


Upload: lamkhuong

Post on 28-Jul-2018


Understanding the tradeoffs and Tuning the methodology

Graham Scott, Technical Lead ARM Cortex Application Processors, Cadence
Nandan Nayampally, Director CPU Product Marketing, ARM Inc


Agenda

Market drivers for the Cortex-A8

Introduction to the Cortex-A8

The range of implementation options for the Cortex-A8

Summary


Enabling Enhanced User Experiences

Browse any website

Handle and process high data rates

Timeframe: 2008-2009

Handle any office email & document

Provide the power for next-gen 3D games

Edit & Enhance 8MP photos

Edit & Enhance captured videos

Watch any video in any format

Handle future user interfaces


Multi-function and Convergence


CMC Requirements Example

Better Processor and System Performance

Rich Operating Systems
Advanced browsers
Advanced security
General applications
3D graphics & gaming
Java & Execution Environments
High bandwidth networks
Multi-format audio
Video Recorder / Player
Voice/Video over IP

...and these pictures are from today’s ARM technology


Cortex-A8: High Performance Uni-Processor

High throughput processor
  >1GHz operation, delivering 2,000+ DMIPS
  Low-power implementations at under 300 mW

In-order, dual-issue, superscalar core
MMU for running virtual memory open OS
Thumb-2 technology
NEON™ media acceleration technology
Jazelle-RCT technology
TrustZone security foundation
IEM™ Intelligent Energy Management and leakage control
Configurable L1 caches
Integrated L2 Cache
  Configurable size 0K - 1MB with programmable wait-states
  ECC error checking for fault-tolerance


A Selection of Real Performance Points

This part of the presentation gives real examples of the range of performance achievable.
Signoff criteria need to be known to compare performance points.
Performance numbers are achievable with the stated design flow, and all frequency values shown are worst case (Vdd-10%, SS, 125C).

Criteria include:
  OCV (10% inside WC and BC)
  Setup margin (50ps)
  Well ties (where appropriate)
  Metal fill
  Dense power grid
  Limited metal layer usage (65nm flows only)
  Holds fixed
  Minimal routing violations

Unless stated, all physical IP is available from ARM.

Flow generally tuned with performance and power given equal weight.
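As a rough illustration of what these signoff criteria imply for timing, the sketch below converts a target frequency into a clock period and deducts the 50ps setup margin. Treating the margin as a straight deduction from the period is an assumption made here for illustration, not something the slide states, and the function names are hypothetical. The 1.2V supply is borrowed from the 65LP slide purely as an example nominal value.

```python
def clock_period_ps(freq_mhz: float) -> float:
    """Clock period in picoseconds for a frequency given in MHz."""
    return 1e6 / freq_mhz


def usable_period_ps(freq_mhz: float, setup_margin_ps: float = 50.0) -> float:
    """Period left for logic after reserving the setup margin.

    Assumption: the 50ps margin is simply subtracted from the period.
    """
    return clock_period_ps(freq_mhz) - setup_margin_ps


def worst_case_vdd(nominal_vdd: float, drop_fraction: float = 0.10) -> float:
    """Supply voltage at the 'Vdd-10%' worst-case corner."""
    return nominal_vdd * (1.0 - drop_fraction)


if __name__ == "__main__":
    for mhz in (500, 700, 800):
        print(f"{mhz} MHz: period {clock_period_ps(mhz):.1f} ps, "
              f"usable {usable_period_ps(mhz):.1f} ps")
    print(f"Worst-case Vdd for a 1.2V nominal supply: {worst_case_vdd(1.2):.2f} V")
```

At 700MHz the period is about 1429ps, so a 50ps margin reserves roughly 3.5% of the cycle under this reading.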


Implementation Flow

Inputs: Constraints, Floorplan, Netlist, Std Cell Models, IP/Block Models

N2N Opt

Tuned targets

Placement
  setPlaceMode -timingDriven
  placeDesign -inPlaceOpt

PreCTS Opt
  Optional: setOptMode -usefulSkew
  optDesign -preCts

Clock Synth
  clockDesign
  Reduce clock uncertainty

PostCts Opt
  Optional: setOptMode -usefulSkew
  optDesign -postCts [-ilm]
  Optional: optDesign -postCts -hold [-ilm]

S.M.A.R.T. NR
  Wire spreading for SI and yield
  Concurrent MCV insertion for yield

postRoute Opt
  Concurrently optimize timing/SI
  optDesign -postRoute -setup -si
  optDesign -postRoute -hold -si
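The flow on this slide can be sketched as an ordered table of stages and commands. The Python below is only an illustration: the stage order (N2N Opt, placement, pre-CTS, clock synthesis, post-CTS, routing, post-route) is an assumed reading of the flattened diagram, the split between optional and mandatory commands is likewise an assumption, and the bracketed [-ilm] switch is left out. Command strings are copied from the slide and are not checked against any particular tool version; `Step`, `FLOW` and `build_script` are hypothetical names.

```python
from __future__ import annotations

from typing import NamedTuple


class Step(NamedTuple):
    command: str
    optional: bool = False


# Stage order is an assumed reading of the slide's flow diagram.
FLOW = [
    ("N2N Opt", []),
    ("Placement", [Step("setPlaceMode -timingDriven"),
                   Step("placeDesign -inPlaceOpt")]),
    ("PreCTS Opt", [Step("setOptMode -usefulSkew", optional=True),
                    Step("optDesign -preCts")]),
    ("Clock Synth", [Step("clockDesign")]),
    ("PostCts Opt", [Step("setOptMode -usefulSkew", optional=True),
                     Step("optDesign -postCts"),
                     Step("optDesign -postCts -hold", optional=True)]),
    ("S.M.A.R.T. NR", []),
    ("postRoute Opt", [Step("optDesign -postRoute -setup -si"),
                       Step("optDesign -postRoute -hold -si")]),
]


def build_script(include_optional: bool) -> list[str]:
    """Return the flow as script lines, one '# stage' comment per stage."""
    lines = []
    for stage, steps in FLOW:
        lines.append(f"# {stage}")
        for step in steps:
            if include_optional or not step.optional:
                lines.append(step.command)
    return lines


if __name__ == "__main__":
    print("\n".join(build_script(include_optional=True)))
```

Keeping the optional steps as data makes it easy to compare a run with and without useful skew or the post-CTS hold pass.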


TSMC 90G Synthesizable

First fully synthesizable flow for the Cortex-A8 to be distributed by ARM.

700MHz, 1400 DMIPS.
Worst case: includes all margins discussed earlier.

Used as baseline for the following discussion.

Comments:
TSMC 90G was chosen for comparative reasons: the first Optimized implementation of the Cortex-A8 was implemented in this process.
This process is not amenable to wireless/mobile use due to its higher leakage; however, it is still a consideration for Consumer and Enterprise applications.
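The frequency and DMIPS figures quoted in this deck imply a constant throughput per clock. The short sketch below works that ratio out from the 90G point (700MHz, 1400 DMIPS) and the headline figure (1GHz, 2,000 DMIPS, both quoted as lower bounds). The function names are hypothetical, and scaling DMIPS linearly with frequency is an assumption that ignores memory-system effects.

```python
def dmips_per_mhz(freq_mhz: float, dmips: float) -> float:
    """Throughput per clock, in DMIPS/MHz."""
    return dmips / freq_mhz


def estimated_dmips(freq_mhz: float, ratio: float = 2.0) -> float:
    """Scale DMIPS linearly with frequency (assumed, not from the slides)."""
    return ratio * freq_mhz


if __name__ == "__main__":
    points = {
        "Cortex-A8 headline (lower bound)": (1000, 2000),
        "TSMC 90G synthesizable": (700, 1400),
    }
    for name, (mhz, dmips) in points.items():
        print(f"{name}: {dmips_per_mhz(mhz, dmips):.1f} DMIPS/MHz")
```

Both points work out to 2.0 DMIPS/MHz, so differences between implementations in the deck come from achievable frequency rather than per-clock throughput.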


Use of Optimized techniques

The Cortex-A8 processor was designed for synthesizable implementation. However, it was also partitioned so that partners could:
  Apply advanced circuit design and structured implementation techniques
  Improve performance, power and area
  Significantly reduce resource requirement compared to full-custom

Along with synthesizable blocks, the Optimized implementation uses:
  Limited set of custom array blocks
    Improve frequency and reduce power
  Structured datapath blocks
    Improve frequency, power and area
  Advanced clocking (clock mesh)
    Improves frequency and power


TSMC 65LP Synthesizable

Fully synthesized implementation
500MHz, 1000 DMIPS.
Leakage Power < 0.5% of 90G
Dynamic Power < 50% of 90G

Considerations:
  Always ON operation

Key requirements:
  Extremely low stand-by power
    >100x lower than 90G
  Efficient high-performance

Tradeoffs:
  Higher dynamic power
    Higher Vdd (1.2V)
  Lower frequency
    Slower, less-leaky transistors

Summary: Higher DMIPS/mW

Implementation Targets
  Ideal for high-end mobile devices
  High-performance with low leakage
  Implementation geared towards low power
  Methodology can be tuned to enable multiple voltage domains and added power savings


TSMC 65GP Synthesizable (NVT)

Fully synthesized implementation
800MHz+, 1600+ DMIPS.
Leakage Power > 50x of 65LP
Dynamic Power < 70% of 65LP

Considerations:
  Only ON when in use (battery)
  Tethered (no battery)

Key requirements:
  High-end performance
  Lower dynamic power
    to reduce packaging cost
  Lower area cost

Tradeoffs:
  Higher leakage power

Summary: Low area and power cost

Implementation Targets
  Ideal for high-end consumer or tethered devices
  High-performance with low dynamic power
  Implementation geared towards performance
  Methodology tuned to achieve the right performance/cost (area/power) balance


TSMC 65GP Synthesizable (LVT)

Fully synthesized implementation
Further boost in performance over NVT implementation
Leakage Power > 100% of 65LP
Dynamic Power < 60% of 65LP

Considerations:
  Only ON when in use (battery)
  Tethered (no battery)
  Extra performance

Key requirements:
  High-end performance
  Lower dynamic power
    to reduce packaging cost
  Lower area cost

Tradeoffs:
  Higher leakage power again

Summary: Low area and power cost

Implementation Targets
  Ideal for tethered consumer and enterprise devices
  High-performance with low dynamic power
  Implementation tuned for performance
  Methodology tuned to maintain or increase performance while reducing area and dynamic power
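The dynamic-power figures on the three 65nm slides are each quoted against a different reference (65LP against 90G, 65GP against 65LP). The sketch below chains those upper bounds back to the 90G baseline. It assumes the percentages are like-for-like comparisons, which the slides do not state, and the names are hypothetical. The leakage figures are not chained here, because an upper bound (65LP below 0.5% of 90G) and a lower bound (65GP NVT above 50x of 65LP) cannot be combined into a single bound.

```python
# Upper bounds on dynamic power: process -> (reference process, ratio).
# Values are the "<" figures quoted on the slides.
DYNAMIC_UPPER_BOUND = {
    "65LP": ("90G", 0.50),
    "65GP NVT": ("65LP", 0.70),
    "65GP LVT": ("65LP", 0.60),
}


def dynamic_bound_vs_90g(process: str) -> float:
    """Upper bound on dynamic power as a fraction of the 90G baseline."""
    factor = 1.0
    while process != "90G":
        reference, ratio = DYNAMIC_UPPER_BOUND[process]
        factor *= ratio
        process = reference
    return factor


if __name__ == "__main__":
    for name in DYNAMIC_UPPER_BOUND:
        print(f"{name}: dynamic power < {dynamic_bound_vs_90g(name):.0%} of 90G")
```

Under these assumptions the 65GP NVT and LVT implementations come out below 35% and 30% of the 90G dynamic power respectively.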


Summary

The Cortex-A8 has a range of implementation choices.

Optimized implementation techniques provide the best performance, area and power.
Synthesized implementations can also provide a high level of performance while maintaining aggressive power goals.

Depending on the application, the choice of process and the tuning of the methodology can help the user achieve the ideal performance/cost tradeoff.