Power Management Lecture notes S. Yalamanchili and S. Mukhopadhyay

(2)

Technology Scaling

• Scaling dimensions down by 30% (0.7X) halves the area per transistor, doubling transistor density

• Power per transistor: Vdd scaling → lower power

• Transistor delay = $C_{gate} V_{dd} / I_{SAT}$: Cgate and Vdd scaling → lower delay

[Figure: MOSFET cross-section showing gate, source, drain, body, oxide thickness tox, and channel length L]

$P = C V_{dd}^2 f + V_{dd} I_{st} + V_{dd} I_{leak}$
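The three terms of the power equation can be evaluated numerically. This is a minimal sketch, assuming the common reading of the terms: C·Vdd²·f as switching (dynamic) power, Vdd·I_st as short-circuit power, and Vdd·I_leak as leakage (static) power. The function name and all parameter values are hypothetical, chosen only to show the arithmetic.

```python
def total_power(c_eff, vdd, freq, i_st, i_leak):
    """Return (dynamic, short_circuit, leakage, total) power in watts.

    Sketch of P = C*Vdd^2*f + Vdd*I_st + Vdd*I_leak.
    Assumes i_st is the average short-circuit current and i_leak the leakage current.
    """
    dynamic = c_eff * vdd ** 2 * freq      # switching power
    short_circuit = vdd * i_st             # short-circuit power (assumed meaning of I_st)
    leakage = vdd * i_leak                 # static (leakage) power
    return dynamic, short_circuit, leakage, dynamic + short_circuit + leakage


# Hypothetical values: 1 nF switched capacitance, 1.0 V, 2 GHz, 0.1 A short-circuit, 0.5 A leakage
dyn, sc, leak, total = total_power(1e-9, 1.0, 2e9, 0.1, 0.5)
print(f"dynamic={dyn:.1f} W, short-circuit={sc:.1f} W, leakage={leak:.1f} W, total={total:.1f} W")
```

In this model, halving Vdd alone cuts the switching term by 4X, which is why Vdd scaling was such an effective lever.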

(3)

Moore’s Law

From wikipedia.org

• Performance scaled with number of transistors

• Dennard scaling*: power per transistor scaled down with feature size, keeping power density roughly constant

Goal: Sustain Performance Scaling

*R. Dennard, et al., “Design of ion-implanted MOSFETs with very small physical dimensions,” IEEE Journal of Solid State Circuits, vol. SC-9, no. 5, pp. 256-268, Oct. 1974.

(4)

Parallelism and Power

IBM Power5

Source: IBM

AMD Trinity

Source: forwardthinking.pcmag.com

• How much of the chip area is devoted to compute?

• Run many cores slower. Why does this reduce power?

(5)

The Power Wall

• Dynamic power per transistor scales linearly with frequency, but also quadratically with Vdd

Lower Vdd can be compensated for with increased pipelining to keep throughput constant

Power per transistor is not the same as power per area: power density is the problem!

Multiple units can be run at lower frequencies to keep throughput constant, while saving power

$P = C V_{dd}^2 f + V_{dd} I_{st} + V_{dd} I_{leak}$
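Why power density, not power per transistor, becomes the problem can be seen with first-order scaling rules. This sketch assumes classic Dennard scaling by a factor k per generation (dimensions, C and Vdd scale by 1/k, frequency by k, transistors per unit area by k²) and compares it with a regime where Vdd no longer scales. Leakage is ignored, and these are textbook approximations, not measured data.

```python
def power_density_ratio(k, vdd_scales=True):
    """Relative dynamic power density after one scaling step by factor k.

    First-order model: C -> C/k, f -> f*k, density -> density*k^2,
    and Vdd -> Vdd/k only if vdd_scales (Dennard) is True. Leakage is ignored.
    """
    c = 1.0 / k
    f = k
    v = 1.0 / k if vdd_scales else 1.0
    power_per_transistor = c * v ** 2 * f
    transistors_per_area = k ** 2
    return power_per_transistor * transistors_per_area


k = 1 / 0.7  # 30% linear shrink, roughly doubling density
print(f"Dennard scaling:   power density x{power_density_ratio(k):.2f}")
print(f"Vdd held constant: power density x{power_density_ratio(k, vdd_scales=False):.2f}")
```

With Vdd scaling, power density stays flat; once Vdd stops scaling, every doubling of density roughly doubles power density.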

(6)

What is the Problem?

[Figure: Mukhopadhyay and Yalamanchili (2009), based on scaling using Pentium-class cores]

While Moore’s Law continues, scaling phenomena have changed

Power densities are increasing with each generation

(7)

ITRS Roadmap for Logic Devices

From: “ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems,” P. Kogge, et al., 2008

Power Management Basics

(9)

What are my Options?

1. Better technology

o Manufacturing

o Better devices (FinFET)

o New devices: non-CMOS? This is the future

2. Be more efficient: activity management

o Clock gating: dynamic energy/power

o Power gating: static energy/power

o Power state management: both

3. Improved architecture

o Simpler pipelines

4. Parallelism

Not this course

(10)

Activity Management

Clock Gating

• Turn off clock to a block of logic

o Eliminate unnecessary transitions/activity

o Clock distribution power

Power Gating

• Turn off power to a block of logic, e.g., core

o No leakage

[Figure: clock gating (clk gated by cond ahead of a block of combinational logic) and power gating (power gate transistor between Vdd and Core 0 / Core 1)]

(11)

Multiple Voltage Frequency Domains

From E. Rotem et al., Hot Chips 2011

• Cores and ring in one DVFS domain

• Graphics unit in another DVFS domain

• Cores and a portion of the cache can be gated off

Intel Sandy Bridge Processor

(12)

Processor Power States

• Performance states (P-states): operate at different voltages/frequencies

o Recall the delay-voltage relationship

o Lower voltage → lower leakage

o Lower frequency → lower power (not the same as energy!)

o Lower frequency → longer execution time

• Idle states (C-states): sleep states

o Differ in how much state is saved

• SW or HW managed transitions between states!
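The warning that lower power is not the same as lower energy can be made concrete. For a fixed task of N cycles, execution time is N/f; dynamic energy C·Vdd²·N does not depend on f, while leakage energy Vdd·I_leak·N/f grows as f drops. This sketch uses a hypothetical task and hypothetical device parameters, and holds Vdd fixed so that only frequency changes.

```python
def task_energy(cycles, c_eff, vdd, freq, i_leak):
    """Return (time_s, avg_power_w, energy_j) for a task of `cycles` clock cycles.

    Hypothetical first-order model: dynamic energy per cycle = C*Vdd^2,
    leakage power = Vdd*I_leak drawn for the whole execution time.
    """
    time_s = cycles / freq
    dynamic_energy = c_eff * vdd ** 2 * cycles
    leakage_energy = vdd * i_leak * time_s
    energy_j = dynamic_energy + leakage_energy
    return time_s, energy_j / time_s, energy_j


# Same voltage, half the frequency: power drops, but energy rises because leakage lasts longer.
for f in (2e9, 1e9):
    t, p, e = task_energy(cycles=2e9, c_eff=1e-9, vdd=1.0, freq=f, i_leak=0.5)
    print(f"f={f/1e9:.0f} GHz: time={t:.1f} s, power={p:.2f} W, energy={e:.2f} J")
```

In a real P-state change Vdd is lowered together with f, which also shrinks the C·Vdd² term; that is what lets DVFS save energy and not just power.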

(13)

Example of P-states

• Software Managed Power States

• Changing Power States is not free

AMD Trinity A10-5800 APU: 100W TDP

CPU P-states (voltage, frequency):

• HW only (boost)

o Pb0: 1 V, 2400 MHz

o Pb1: 0.875 V, 1800 MHz

• SW-visible

o P0: 0.825 V, 1600 MHz

o P1: 0.812 V, 1400 MHz

o P2: 0.787 V, 1300 MHz

o P3: 0.762 V, 1100 MHz

o P4: 0.75 V, 900 MHz
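Using the voltage and frequency values listed for the Trinity P-states, the relative dynamic power of each state can be estimated from P ∝ Vdd²·f. This is a sketch that assumes the same switched capacitance and activity in every state and ignores leakage, so the ratios are first-order estimates, not AMD figures.

```python
# (state, voltage in V, frequency in MHz) taken from the AMD Trinity A10-5800 P-state list above
P_STATES = [
    ("Pb0", 1.0, 2400),
    ("Pb1", 0.875, 1800),
    ("P0", 0.825, 1600),
    ("P1", 0.812, 1400),
    ("P2", 0.787, 1300),
    ("P3", 0.762, 1100),
    ("P4", 0.75, 900),
]


def relative_dynamic_power(states):
    """Return {state: Vdd^2 * f normalized to the first (fastest) state}.

    Assumes equal switched capacitance and activity in all states; leakage ignored.
    """
    _, v_ref, f_ref = states[0]
    ref = v_ref ** 2 * f_ref
    return {name: (v ** 2 * f) / ref for name, v, f in states}


for name, ratio in relative_dynamic_power(P_STATES).items():
    print(f"{name}: {ratio:.2f}x dynamic power of Pb0")
```

Because voltage enters squared, the lowest state draws roughly a fifth of the boost state's dynamic power while running at a bit over a third of its frequency.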

(14)

Example of P-states

From: http://www.intel.com/content/www/us/en/processors/core/2nd-gen-core-family-mobile-vol-1-datasheet.html

(15)

Management Knobs

• Each core can be in any one of multiple states

• How do I decide what state to set each core? Who decides? HW? SW?

• How do I decide when I can turn off a core?

• What am I saving? Static energy or dynamic energy?

(16)

Power Management

• Software-controlled power management

o Optimize power and/or energy

o Orchestrated by the operating system or application libraries

o Industry-standard interfaces for power management

o Advanced Configuration and Power Interface (ACPI) https://www.acpica.org/ http://www.acpi.info/

• Hardware power management

o Optimize power/energy

o Failsafe operation, e.g., protect against thermal emergencies

(17)

Power Management 3.0

[Figure: die temperature vs. time, with the thermal headroom below max die temp; performance (instructions/cycle) vs. time across CPU DVFS states: HW-only boost states (Pb0, Pb1) and SW-visible states (P0, P1, P2, ..., Pmin)]

Convert thermal headroom to higher performance through boost

Performance and energy efficiency depend on effective utilization of power and thermal headroom

(18)

Boosting

• Exploit package physics

o Temperature changes on the order of milliseconds

• Use the thermal headroom

[Figure (Intel Sandy Bridge): power vs. time between TDP power and max power; low-power periods build up thermal credits; turbo boost region lasting 10s of seconds]
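How long boost can last depends on the package's thermal time constant. A common first-order approximation treats the package as a single thermal resistance R (K/W) and capacitance C (J/K), so the die temperature approaches T_amb + P·R exponentially with time constant τ = R·C. This sketch, under that single-RC assumption and with hypothetical parameter values, estimates how long a boost power level can be held before the die reaches its maximum temperature. It is not Intel's turbo algorithm.

```python
import math


def boost_duration(t_start, t_max, t_amb, power, r_th, c_th):
    """Seconds a constant `power` can be sustained before the die reaches t_max.

    Single-RC thermal model (an assumption): T(t) = T_ss + (t_start - T_ss) * exp(-t / tau),
    with steady-state T_ss = t_amb + power * r_th and tau = r_th * c_th.
    Returns math.inf if the steady-state temperature never exceeds t_max.
    """
    if t_start >= t_max:
        return 0.0
    t_ss = t_amb + power * r_th
    if t_ss <= t_max:
        return math.inf  # this power level is sustainable indefinitely
    tau = r_th * c_th
    return tau * math.log((t_ss - t_start) / (t_ss - t_max))


# Hypothetical package: R = 0.5 K/W, C = 40 J/K (tau = 20 s), 45 C ambient, 100 C max.
cool = boost_duration(t_start=60, t_max=100, t_amb=45, power=150, r_th=0.5, c_th=40)
warm = boost_duration(t_start=90, t_max=100, t_amb=45, power=150, r_th=0.5, c_th=40)
print(f"boost from 60 C: {cool:.1f} s, from 90 C: {warm:.1f} s")
```

A die that has been running at low power starts cooler, which is the "thermal credit" that buys a longer boost.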

(19)

Power Gating

Intel Sandy Bridge Processor

• Turn off components that are not being used

o Lose all state information

• Costs of powering down

• Costs of powering up

• Smart shutdown

o Models to guide decisions
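A simple smart-shutdown model compares the leakage energy saved while a block is off with the energy spent powering it down and back up. This sketch assumes hypothetical transition energies and leakage power. Under those assumptions, a block is worth gating only when the expected idle time exceeds the break-even time (E_down + E_up) / P_leak. It ignores the wake-up latency penalty, which a real policy would also weigh.

```python
def break_even_time(e_down, e_up, p_leak):
    """Idle time (s) above which power gating saves energy.

    Assumes leakage is fully eliminated while gated and that powering down/up
    costs fixed energies e_down and e_up (J). p_leak is leakage power (W).
    """
    return (e_down + e_up) / p_leak


def should_power_gate(expected_idle_s, e_down, e_up, p_leak):
    """Gate only if the leakage energy saved exceeds the transition cost."""
    return expected_idle_s > break_even_time(e_down, e_up, p_leak)


# Hypothetical core: 0.5 W leakage, 2 mJ to power down, 3 mJ to power up (state restore)
t_be = break_even_time(2e-3, 3e-3, 0.5)
print(f"break-even idle time: {t_be * 1e3:.1f} ms")
print(should_power_gate(5e-3, 2e-3, 3e-3, 0.5), should_power_gate(50e-3, 2e-3, 3e-3, 0.5))
```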

(20)

Parallelism

• Concurrency + lower frequency → greater energy efficiency

$P = C V_{dd}^2 f + V_{dd} I_{st} + V_{dd} I_{leak}$

[Figure: a single core with its cache vs. multiple cores, each with its own cache]

Example

• 4X #cores

• 0.75X voltage

• 0.5X frequency

• 1X power

• 2X in performance
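The numbers in the example can be checked against the dynamic term of the power equation, with power ∝ N·Vdd²·f for N cores and throughput ∝ N·f. This sketch assumes perfectly parallel work and ignores leakage and uncore power. Under those assumptions the power ratio comes out to about 1.1X, which the slide rounds to 1X.

```python
def scale_design(cores=1.0, voltage=1.0, freq=1.0):
    """Return (relative dynamic power, relative throughput) versus a 1-core baseline.

    Assumes dynamic power per core ~ V^2 * f and perfectly parallel work
    (throughput ~ cores * f). Leakage is ignored.
    """
    power = cores * voltage ** 2 * freq
    throughput = cores * freq
    return power, throughput


power, perf = scale_design(cores=4, voltage=0.75, freq=0.5)
print(f"power: {power:.3f}X, performance: {perf:.1f}X, perf/W: {perf / power:.2f}X")
```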

(21)

Simplify Core Design

AMD Bulldozer Core

ARM A7 Core (arm.com)

• Support for branch prediction, schedulers, etc. consumes more energy per instruction

• Can fit many more simpler cores on a die

(22)

Metrics

• Power efficiency

o MIPS/watt, Ops/watt

• Energy efficiency

o Joules/instruction, Joules/op

• Composite

o Energy-delay product, Energy-delay²

o Why are these useful?
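Composite metrics are useful because energy alone can be minimized by running very slowly. The energy-delay product (E·D) and energy-delay² (E·D²) penalize delay, with E·D² weighting performance more heavily; lower is better for all three. This sketch compares two hypothetical operating points for the same task, with made-up numbers chosen to show that the metrics can rank the points differently.

```python
def metrics(energy_j, delay_s):
    """Return energy, energy-delay product, and energy-delay^2 for one run."""
    return {"E": energy_j, "ED": energy_j * delay_s, "ED2": energy_j * delay_s ** 2}


# Two hypothetical operating points for the same task
fast = metrics(energy_j=3.0, delay_s=1.0)   # high frequency / voltage
slow = metrics(energy_j=1.8, delay_s=1.5)   # lower frequency / voltage

for name in ("E", "ED", "ED2"):
    better = "slow" if slow[name] < fast[name] else "fast"
    print(f"{name}: fast={fast[name]:.2f}, slow={slow[name]:.2f} -> {better} is better")
```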

Modeling

(24)

Microarchitectural Level Models

• How can we study power consumption without building circuits? Use models

• Models are available at multiple levels of abstraction

o We are interested in microarchitectural models

(25)

Processor Microarchitecture

[Figure: processor microarchitecture block diagram with stages Fetch, Decode, Execute/Writeback, Memory, and Network; components: instruction cache, instruction TLB, branch prediction, fetch queue, instruction decoder, instruction queue, register files, ALU, MUL, FPU, LD, ST, L1 data cache, data TLB, L2 data cache, NoC router, on-chip network]

(26)

Energy/Power Calculation

• How do we calculate energy or power dissipation for a given microarchitecture?

• Energy/power varies between:

o Different ISAs; ARM vs. Intel x86

o Different microarchitectures; in-order vs. out-of-order

o Different applications; memory- vs. compute-bound

o Different technologies; 90nm vs. 22nm

o Different operating conditions; frequency, temperature

(27)

Architecture Activity (1)

[Figure: processor microarchitecture block diagram (instruction cache, instruction TLB, branch prediction, fetch queue, instruction decoder, instruction queue, register files, ALU, MUL, FPU, LD, ST, L1 data cache, data TLB, L2 data cache, NoC router, on-chip network)]

Activity 1: Instruction Fetch

icache.read++; fbuffer.write++;

• Collect activity counts of each architecture component (through simulation or measurement).

• List of components differs between microarchitectures.

• Activity counts at each component differ between applications.

(28)

Architecture Activity (2)

[Figure: processor microarchitecture block diagram (instruction cache, instruction TLB, branch prediction, fetch queue, instruction decoder, instruction queue, register files, ALU, MUL, FPU, LD, ST, L1 data cache, data TLB, L2 data cache, NoC router, on-chip network)]

Activity 2: Instruction Decode

fbuffer.read++; idecoder.logic++;

• Read/write accesses to caches, buffers, etc.

• Logical accesses to logic blocks such as decoder, ALUs, etc.

• Tradeoff of differentiating more access types (accuracy) vs simulation speed (complexity).
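The per-activity counter updates on these slides (icache.read++, fbuffer.write++, fbuffer.read++, idecoder.logic++) can be mimicked with a small counter table keyed by component and access type. This is a hypothetical Python stand-in for a simulator's bookkeeping: the component names follow the slides, but the ActivityCounters class and the fetch/decode functions are illustrative, not the API of any particular simulator.

```python
from collections import Counter, defaultdict


class ActivityCounters:
    """Per-component access counts, split by access type (read, write, logic)."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def record(self, component, access_type, n=1):
        self.counts[component][access_type] += n


def fetch(stats):
    # Activity 1: instruction fetch (icache.read++; fbuffer.write++;)
    stats.record("icache", "read")
    stats.record("fbuffer", "write")


def decode(stats):
    # Activity 2: instruction decode (fbuffer.read++; idecoder.logic++;)
    stats.record("fbuffer", "read")
    stats.record("idecoder", "logic")


stats = ActivityCounters()
for _ in range(100):  # hypothetical: 100 instructions fetched and decoded
    fetch(stats)
    decode(stats)
print({comp: dict(c) for comp, c in stats.counts.items()})
```

Adding more access types (for example, tag vs. data reads) only means more keys per component, which is the accuracy vs. simulation-speed tradeoff in miniature.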

(29)

Power and Architecture Activity

• For example, at the nth clock cycle, the collected counters for the data cache are:

o read = 20, write = 12;

o per-read energy = 0.5nJ; per-write energy = 0.6nJ;

o Read energy = read*per-read energy = 10nJ

o Write energy = write*per-write energy = 7.2nJ

o Total activity energy = read+write energies = 17.2nJ

o If n = 50 clock cycles and clock frequency = 2 GHz, total activity power = energy*clock_freq/n = 688 mW

*Note: n/clock_freq is the duration of n clock periods in seconds, so power = time average of energy
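The data-cache calculation can be reproduced in a few lines. The counter values, per-access energies, n, and clock frequency are the ones from the example; the function is a sketch of energy = Σ count × per-access energy and power = energy × clock_freq / n, and its name is hypothetical.

```python
def activity_power(counts, per_access_energy_j, n_cycles, clock_hz):
    """Return (energy_j, power_w) for activity accumulated over n_cycles.

    energy = sum(count * per-access energy); power = energy / (n_cycles / clock_hz).
    """
    energy = sum(counts[k] * per_access_energy_j[k] for k in counts)
    elapsed_s = n_cycles / clock_hz  # duration of n clock periods
    return energy, energy / elapsed_s


energy, power = activity_power(
    counts={"read": 20, "write": 12},
    per_access_energy_j={"read": 0.5e-9, "write": 0.6e-9},
    n_cycles=50,
    clock_hz=2e9,
)
print(f"energy = {energy * 1e9:.1f} nJ, power = {power * 1e3:.0f} mW")  # 17.2 nJ, 688 mW
```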

(30)

Things to consider (1)

1. How do we calculate per-read/write energies?

• Per-access energies can be estimated from circuit-level designs and analyses.

• There are various open-source tools for this.

[Figure: architecture specification and technology parameters → circuit-level estimation tool → estimation results: area, energy, timing, etc.]

(31)

Things to consider (2)

2. Is per-access energy always the same?

• Per-access energy in fact depends on:

o how many bits are switching

o how they are switching (0→1 or 1→0)

• It is reasonable to assume constant per-access energy over long observation windows (e.g., n = 1M clock cycles), since the number of switching bits averages out (e.g., 50% of bits switching).

• Most architecture simulators do not capture bit-level details due to simulation complexity.
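Counting switching bits is straightforward when the old and new values are known: XOR marks the bits that flip, and masking with the new (or old) value separates 0→1 from 1→0 transitions. This is a hypothetical sketch of what a bit-level model would have to track; as noted above, most architecture simulators skip this and use an averaged per-access energy instead.

```python
def bit_transitions(old, new, width=32):
    """Return (rising, falling) bit counts when a `width`-bit value changes from old to new."""
    mask = (1 << width) - 1
    flipped = (old ^ new) & mask
    rising = bin(flipped & new).count("1")    # 0 -> 1 transitions
    falling = bin(flipped & old).count("1")   # 1 -> 0 transitions
    return rising, falling


print(bit_transitions(0b1100, 0b1010, width=4))  # (1, 1)
print(bit_transitions(0x0000FFFF, 0xFFFF0000))   # (16, 16)
```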

(32)

Things to consider (3)

3. If a register file didn’t have read/write accesses but held data, what is the energy dissipation?

• Energy (or power) consists largely of dynamic and static dissipation.

• Dynamic (or switching) energy is the energy dissipated by switching activity.

• Static (or leakage) energy is the energy dissipated simply to keep the circuit powered on.

• In this case, the register file has no dynamic energy dissipation but consumes static energy.
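The register-file case can be written as total energy = dynamic energy from counted accesses + static power × time. This sketch uses hypothetical per-access energies and a hypothetical leakage power; with zero accesses, only the static term remains.

```python
def component_energy(reads, writes, e_read_j, e_write_j, p_static_w, interval_s):
    """Dynamic energy from counted accesses plus static (leakage) energy over the interval."""
    dynamic = reads * e_read_j + writes * e_write_j
    static = p_static_w * interval_s
    return dynamic, static, dynamic + static


# Hypothetical register file: idle (no accesses) but powered for 1 ms at 20 mW leakage
dyn, stat, total = component_energy(0, 0, 0.5e-9, 0.6e-9, 20e-3, 1e-3)
print(f"dynamic = {dyn:.1e} J, static = {stat:.1e} J, total = {total:.1e} J")
```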

Thermal Issues

(34)

Thermal Issues

• Heat can cause damage to the chip

o Need failsafe operation

• Thermal fields change the physical characteristics

o Leakage current, and therefore power, increases

o Delay increases

o Device degradation becomes worse

• Cooling solution determines the permitted power dissipation

(35)

Thermal Design Power (TDP)

• This is the maximum power at which the part is designed to operate

o Dictates the design of the cooling system

o Max temperature Tjmax

o Typically fixed by worst-case workload

• Parts typically operate below the TDP

• Opportunities for turbo mode?

AMD Trinity APU

http://ecs.vancouver.wsu.edu/thermofluids-research

(36)

Heat Sink Limits on Performance

Thermal design power (TDP): determines the cooling solution & package limits

Performance depends on effective utilization of this thermal headroom

www.legitreviews.com

[Figure: temperature and power vs. workload, with boost power above TDP power; instructions/cycle vs. time, with the thermal headroom below max die temp, HW boost states, and SW-visible states]

Convert thermal headroom to higher performance through boosting

(37)

Trinity TDP

Source: http://www.anandtech.com/show/6347/amd-a10-5800k-a8-5600k-review-trinity-on-the-desktop-part-2

(38)

Issues

• Cooling chips is now an issue for computer architects!

• Co-design the cooling system and the processor

• Some very “cool” new technologies, e.g., microfluidics!

(39)

Electrical and Fluidic I/Os

• Fluid flow through the microchannels carries heat out to an external heat exchanger (e.g., heat sink)

Courtesy L. Zheng (ECE) and Professor Muhannad Bakir (ECE)

(40)

Fabrication Examples

Electrical and fluidic microbumps, fluidic vias and fine wires

Micropin-fins (150 µm diameter and 225 µm diameter) and vias

Courtesy L. Zheng (ECE) and Professor Muhannad Bakir (ECE)

(41)

Conclusions

• Power/energy is the leading driver of modern architecture design

• Power and energy management is key to scalability

• Need integrated power/energy, performance, thermal management in fielded systems

• What about energy/power efficient algorithms?

(42)

Study Guide

• Explain the difference between energy dissipation and power dissipation

• Distinguish between static power dissipation and dynamic power dissipation

• Explain dynamic voltage and frequency scaling (DVFS)

o What are power states? Why is this an advantage?

o What is the impact of DVFS on i) energy, ii) execution time, and iii) power?

• Distinguish between clock gating and power gating

(43)

Study Guide (cont.)

• Define thermal design power (TDP)

• Name two schemes for preventing the chip from exceeding TDP. Explain how they achieve this goal

• What does boosting achieve?

• What is the difference between C-states and P-states?

• Name one power management technique that saves static power

• How does using many slower simpler cores improve power efficiency?

(44)

Study Guide (cont.)

• How is thermal design power (TDP) calculated?

• When using boost algorithms, what determines the duration of the high frequency operation?

• How does a power virus work?

• Describe how throttling works

• Know the power dissipation in some modern processor-memory systems drawn from the embedded, server, and high performance computing segments

(45)

Glossary

• Boosting

• C-states

• Dynamic Power and Energy

• Power Gating

• P-states

• Static Power and Energy

• Time constant

• Thermal Design Point

• Throttling