Power-Aware Architecture
林光輝 D87921034
鄭伯壎 D90943006
陳盈貝 D90943004
June 3, 2003
Source: ISSCC 2003 Microprocessor Workshop
Why worry about power dissipation
• Battery technology
– Linear improvements, nowhere near the exponential power increases we've seen
• Cooling techniques
– Air-cooled is reaching limits
– Fans often undesirable (noise, weight, expense)
– $1 per chip per Watt when operating in the >40W realm
– Water-cooled ?!?
• Environment
– US EPA: significant % of current electricity usage in US is directly due to desktop computers
– Increasing fast. And doesn't count embedded systems, printers, UPS backup?
• Past:
– Power important for laptops, cell phones
• Present:
– Power a critical, universal design constraint even for very high-end chips
• Circuits and process scaling can no longer solve all power problems.
– SYSTEMS must also be power-aware
– Architecture, OS, compilers
Notebook Power Usage Stats
[Pie chart: 1995 5V Notebook PC. Slice labels as extracted: Motherboard, Hard Disk, Floppy Disk, LCD/VGA, Power Supply. Slice values as extracted: 52%, 12%, 2%, 18%, 16%.]
From Roy, 1997
Processor Power Pie-Chart
• High performance processors (prior/current generation) typically burn most of their power in the clocked latches and arrays (registers, caches).
[Pie chart, example data. Slice labels as extracted: Clks Dist, Latches, Logic, IO, Arrays, other. Slice values as extracted: 9%, 46%, 12%, 4%, 28%, 1%.]
Pre-silicon ckt-sim based; assumes: no clock-gating
(taken from: Bose, Martonosi, Brooks: Sigmetrics-2001 Tutorial)
Power-Performance efficiency
• Performance metrics:
– delay (execution time) per instruction; MIPS
  • CPI (cycles per instr): abstracts out the MHz
  • SPEC (int or fp); TPM: factors in benchmark, MHz
• Power and energy metrics:
– watts (W) and joules (J = W*sec)
• Joint metric possibilities (perf and power)
– Watts (W): for ultra LP processors; also, thermal issues
– MIPS/W or SPEC/W ~ energy per instruction
  • CPI * W: equivalent inverse metric
– MIPS^2/W or SPEC^2/W ~ energy*delay (EDP)
– MIPS^3/W or SPEC^3/W ~ energy*(delay)^2 (ED2P)
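The joint metrics above can be sketched in a few lines of Python. This is a minimal illustration, not taken from the slides: the `Processor` class, the `joint_metric` helper, and both sets of numbers are hypothetical, chosen only to show that the ranking of two designs can change with the exponent k in MIPS^k/W.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    name: str
    mips: float   # performance, millions of instructions per second
    watts: float  # average power


def joint_metric(p: Processor, k: int) -> float:
    """Return MIPS^k / W: k=1 ~ energy, k=2 ~ EDP, k=3 ~ ED2P (higher is better)."""
    return p.mips ** k / p.watts


# Hypothetical numbers, chosen only to show how the ranking can flip.
low_power = Processor("low_power", mips=400.0, watts=2.0)
high_perf = Processor("high_perf", mips=2000.0, watts=40.0)

for k in (1, 2, 3):
    a, b = joint_metric(low_power, k), joint_metric(high_perf, k)
    winner = low_power.name if a > b else high_perf.name
    print(f"MIPS^{k}/W: {low_power.name}={a:.3g} {high_perf.name}={b:.3g} -> {winner}")
```

With these made-up figures the low-power design wins on MIPS/W, while the high-performance design wins on MIPS^2/W and MIPS^3/W, which is why the choice of metric matters.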
Energy vs. Power
• Power metrics (like W):
– max power => package design, cost, reliability
– average power => avg electric bill, battery life
• Energy metrics (like SPEC/W):
– compare battery life expectations; given workload
– compare energy efficiencies: processors that use constant voltage, frequency or capacitance scaling to reduce power
• ED2P metrics (like SPEC^3/W or CPI^3 * W):
– compare pwr-perf efficiencies: processors that use voltage scaling as the primary method of power reduction/control
For a systematic and mathematically sound treatment of the metrics issue, i.e. the right choice of k in SPEC^k/W, see Zyuban et al., ISLPED-02
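One way to see why ED2P-style metrics suit voltage-scaled designs is a first-order CMOS model. The sketch below assumes (this is not stated on the slides) that dynamic power is C * V^2 * f and that frequency, and hence performance, scales roughly linearly with supply voltage. Under those assumptions perf^3/W stays constant as voltage changes, while perf/W and perf^2/W do not. The `scaled` and `metric` helpers and the baseline numbers are hypothetical.

```python
from typing import Tuple


def scaled(perf: float, watts: float, v_ratio: float) -> Tuple[float, float]:
    """Scale a design point by a supply-voltage ratio under the first-order model:
    frequency (and so performance) ~ V, dynamic power ~ C * V^2 * f ~ V^3."""
    return perf * v_ratio, watts * v_ratio ** 3


def metric(perf: float, watts: float, k: int) -> float:
    """perf^k / W, e.g. SPEC^k / W."""
    return perf ** k / watts


base_perf, base_watts = 100.0, 50.0  # hypothetical baseline design point
for v_ratio in (1.0, 0.8, 0.5):
    perf, watts = scaled(base_perf, base_watts, v_ratio)
    # Columns: voltage ratio, then perf/W, perf^2/W, perf^3/W
    print(v_ratio, [round(metric(perf, watts, k), 3) for k in (1, 2, 3)])
```

Only the k = 3 column is unchanged across rows, so a cubic metric does not reward a design merely for running at a lower voltage. Real processors deviate from this model (leakage, non-linear frequency-voltage curves), so treat the invariance as approximate.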
Deducing Optimal Pipe Depths
[Figure: bips, bips/W, bips^2/W and bips^3/W curves, each relative to its optimum (y-axis "Relative to Optimal FO4", 0 to 1), versus total FO4 per stage (x-axis, 7 to 37). Annotations: "Power-performance optimal" and "Performance optimal".]
MICRO-35 paper (2002; V. Srinivasan et al.)
Metrics Comparison
[Figure: two bar charts, each relative to the worst performer, across processors (surviving x-axis labels: Intel PIII, Moto PPC740). First chart: SpecInt/W, SpecInt**2/W, SpecInt**3/W (y-axis 0 to 50). Second chart: SpecFP/W, SpecFP**2/W, SpecFP**3/W (y-axis 0 to 30).]
(Brooks, Bose et al., IEEE Micro, Nov/Dec 2000)
• Note:
> at the low end, E metrics like SpecInt/W appear to be fair
> at the highest end, ED2P metrics like (SpecInt)^3/W seem to do the job
> perhaps at the midrange, EDP metrics like (SpecInt)^2/W are appropriate?
Analysis Abstraction Levels
Abstraction Level      Analysis   Analysis   Analysis   Analysis    Energy
                       Capacity   Accuracy   Speed      Resources   Savings
Application            Most       Worst      Fastest    Least       Most
Behavioral
Architectural (RTL)
Logic (Gate)
Transistor (Circuit)   Least      Best       Slowest    Most        Least
Power/Performance abstractions at different levels of this hierarchy…
• Low-level:
– Hspice
– PowerMill
• Medium-Level:
– RTL, Gate-level Models
– PowerTheater
• Architecture-level:
– PennState: SimplePower
– Intel: Tempest
– Princeton: Wattch
– IBM: PowerTimer
Note: Recent work in statistical performance models is a smart abstraction on top of current detailed simulators (L. Eeckhout, et al., Noonburg and Shen, Carl, Nussbaum, Smith, …)
PowerTimer: Power models f(SF)
[Figure: power in mW (y-axis, 0 to 1400) versus switching factor SF (x-axis, 0 to 50) for fpq, fxq, fpr-map, gpr-map and gct.]
Power linearly dependent on Switching Factor
At 0% SF, Power = Clock Power (significant without clock gating)
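The linear relationship described above can be written as power = clock power + slope * SF. The sketch below is a minimal illustration under that reading; the function names, macro names and coefficients are hypothetical and are not read off the slide's plot or taken from PowerTimer itself.

```python
def macro_power_mw(sf_percent: float, clock_mw: float, mw_per_sf: float) -> float:
    """Linear model: power = clock power at 0% SF + slope * switching factor."""
    if not 0.0 <= sf_percent <= 100.0:
        raise ValueError("switching factor must be a percentage in [0, 100]")
    return clock_mw + mw_per_sf * sf_percent


# Hypothetical coefficients for two hypothetical macros.
MACROS = {
    "macro_a": {"clock_mw": 300.0, "mw_per_sf": 10.0},
    "macro_b": {"clock_mw": 120.0, "mw_per_sf": 4.0},
}


def total_power_mw(sf_by_macro: dict) -> float:
    """Sum per-macro power given each macro's switching factor (in percent)."""
    return sum(
        macro_power_mw(sf, **MACROS[name]) for name, sf in sf_by_macro.items()
    )


print(macro_power_mw(0.0, **MACROS["macro_a"]))  # clock power only
print(total_power_mw({"macro_a": 20.0, "macro_b": 50.0}))
```

In an architecture-level simulator, the switching factors would come from per-cycle or per-interval utilization statistics, and the intercept would shrink if clock gating were modeled.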
Model Validation
• Main challenge: defining a specification reference
[Diagram: an input test case feeds both the MODEL UNDER TEST and the GOLDEN REFERENCE; their outputs are compared, and an error is flagged if the outputs differ.]
• Secondary problems:
– generate apt test cases
– test case coverage
– choice of o/p signatures
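The compare-against-a-golden-reference flow above can be sketched as a small harness. Everything here is an assumption for illustration: the `validate` function, the two stand-in models, and the choice of a cycle count as the output signature are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def validate(
    model_under_test: Callable[[int], int],
    golden_reference: Callable[[int], int],
    testcases: Iterable[int],
) -> List[Tuple[int, int, int]]:
    """Run every test case through both models and collect mismatches
    as (input, model_output, reference_output)."""
    errors = []
    for case in testcases:
        got, expected = model_under_test(case), golden_reference(case)
        if got != expected:  # flag error if outputs differ
            errors.append((case, got, expected))
    return errors


# Hypothetical stand-ins: the "signature" compared here is a cycle count.
def reference_cycles(n_instr: int) -> int:
    return 2 * n_instr + 5


def buggy_model_cycles(n_instr: int) -> int:
    # Deliberate bug: wrong result for one corner case (empty program).
    return 2 * n_instr + 5 if n_instr > 0 else 0


print(validate(buggy_model_cycles, reference_cycles, range(0, 4)))
```

The harness only finds the bug because the test set happens to include the corner case, which is the coverage problem the slide lists among the secondary problems.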
Processor Architecture
• Options considered for best system-level power efficiency
– Monolithic processor
– Processor + application accelerators
– Multi-processor systems
• The system is still immature; initial steps use monolithic processors
– other choices involve considerable s/w effort
Monolithic processors
• The PC model
• Microarchitectural complexity increases over time to provide more and more performance
– superscalar, deep pipes, speculative execution
• Traditionally bad for power consumption
– need very careful trade off between power and performance
• Powering down unused functional units harder
• Simplest software platform
Accelerator based systems
• Core processor for OS + application specific accelerators
• Many benefits to modularity
• Functional partitioning = power partitioning
– easy to control system power
• Harder to program generically
– OS needs to understand underlying hardware structure
• Application specific hardware can be made very power efficient
[Diagram: blocks labelled CPU, MPEG, 3DGx and Shared Mem.]
Native Java execution
• Interest in Java growing to allow downloadable applications
• Many implementation methods
– coprocessors, standalone accelerators,…
• Inside the core is the most power efficient
– allows reuse of execution units
– calls to JVM simplified
– system design simplified
ARM1136J Java implementation
[Diagram: an instruction stream feeding an instruction pipeline (Fetch Stage, Decode Stage, Execute Stage) and an Execute Unit; labels: ARM, Thumb, Java.]
Software for power control
• Context specific software power control vital
• OS needs to understand how to configure system for lowest power for each execution scenario
– enable required functional units
– set optimum operating voltage, frequency and body bias
– disable everything else
• Interaction between h/w and s/w needs to increase dramatically to achieve this
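The per-scenario configuration described above can be pictured as a lookup table that the OS consults when the workload changes. The sketch below is hypothetical throughout: the unit names, scenario names, voltages, frequencies and body-bias values are invented for illustration, and a real system would apply them through platform-specific drivers.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class OperatingPoint:
    units: FrozenSet[str]  # functional units to enable
    voltage_v: float
    freq_mhz: float
    body_bias_v: float


ALL_UNITS = frozenset({"cpu", "mpeg", "gfx3d", "java"})  # hypothetical unit names

# Hypothetical scenario table; all values are illustrative.
POLICY = {
    "idle": OperatingPoint(frozenset({"cpu"}), 0.8, 50.0, -0.4),
    "video_playback": OperatingPoint(frozenset({"cpu", "mpeg"}), 1.0, 200.0, 0.0),
    "gaming": OperatingPoint(frozenset({"cpu", "gfx3d"}), 1.2, 400.0, 0.0),
}


def configure(scenario: str) -> dict:
    """Return the settings an OS would apply: enable required units,
    set voltage/frequency/body bias, disable everything else."""
    point = POLICY[scenario]
    return {
        "enable": sorted(point.units),
        "disable": sorted(ALL_UNITS - point.units),
        "voltage_v": point.voltage_v,
        "freq_mhz": point.freq_mhz,
        "body_bias_v": point.body_bias_v,
    }


print(configure("video_playback"))
```

Keeping the policy as data rather than code reflects the slide's point that the OS must understand the hardware: adding an accelerator means adding a unit name and updating the table, not rewriting the control logic.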
System Power Modeling Issues
• Need to understand system power components better
– where does all the power go in server farms?
– system-level power-performance metrics?
– relate volumetric power density to system reliability?
– MP scalability problem – power and performance?
– what are the right microarch paradigms of future?
– how do we model and design (inductive) noise-aware processors and systems?
  • reliability vs. performance vs. power tradeoffs