system-level power optimization - infflavio/ensino/cmp502/date00_tut_slides.pdf · • describe...

136
Benini & De Micheli DATE 2000 SYSTEM-LEVEL SYSTEM-LEVEL POWER OPTIMIZATION: POWER OPTIMIZATION: TECHNIQUES AND TOOLS TECHNIQUES AND TOOLS Giovanni De Micheli Giovanni De Micheli Luca Benini Luca Benini CSL - Stanford University CSL - Stanford University DEIS - Università di Bologna DEIS - Università di Bologna

Upload: others

Post on 04-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

SYSTEM-LEVELSYSTEM-LEVELPOWER OPTIMIZATION:POWER OPTIMIZATION:

TECHNIQUES AND TOOLSTECHNIQUES AND TOOLS

Giovanni De MicheliGiovanni De MicheliLuca BeniniLuca Benini

CSL - Stanford UniversityCSL - Stanford University

DEIS - Università di BolognaDEIS - Università di Bologna

Page 2: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Motivation

• Energy-efficient computing is required by:– Mobile electronic systems

• to extend battery life-time

– Large-scale electronic systems• to reduce operating costs and meet

environmental standards

• The quest for energy efficiency affects allaspects of system design

Page 3: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Objectives of the tutorial

• Describe design techniques and tools forsystem-level design

• Address system-level modeling, designand power management issues

• Purposely neglect:– Chip-level design issues

• physical and logic design

– Distributed systems (e.g. wireless networks)

Page 4: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Electronic systems

• A system is a combination of:– Hardware:

• Computation units• Storage units• Communication units• Peripherals

– Software :• Application and system software

• Energy is required by all hardware units• Software organization affects how

hardware consumes energy

Page 5: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Electronic system design

• Conceptualization and modeling:– From idea to model

• Design:– HW: computation, storage and communication– SW: application and system software

• Run-time management:– Run-time system management and control of

all units including peripherals

Page 6: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Examples• Modeling:

– Choice of algorithm– Application-specific hardware vs. programmable

hardware (software) implementation– Word-width and precision

• Design:– Structural trade-off

• Resource sharing and logic restructuring– Exploit multiple/variable supplies

• Management:– Operating system– Dynamic power management

Page 7: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Outline

• Introduction• System conceptualization and modeling

– Modeling and design– Modeling for energy-efficient design

• System design• System management• Conclusions

Page 8: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

System models

• Modeling is an abstraction:– Represent important features and hide

unnecessary details

• Functional models:– Capture functionality and requirements– Executable models:

• Support hw and/or sw compilation and simulation

• Implementation models:– Describe target realization

Page 9: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Taxonomy

Classes of Systems

General-purpose Systems Special-purpose Systems

Modeling Styles

Implementation Models Functional Models

Executable Non-executable

Page 10: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Functional models

• Abstract functional representation– Mathematical notation

• Abstract algorithmic notation– Pseudocode

• Executable functional models– Hw and/or sw languages:

• Verilog, VHDL, SystemC, C, C++, Java

• Non-executable functional models– Complex and/or incomplete representations

• e.g., task graph

Page 11: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Functional models: examples

+==

=− 1j2iyK

j2ixKz

)1i(102

i101i

while(1) { read(x); read(y); if (c%10 == 0) { z = K1 * x; write(z); z = K2 * y; write(z); } c++;}

read yread x

dsample 10dsample 10

scale K1 scale K2

multiplex

Page 12: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Conceptualization

• Refining abstract into executable models• Many different realizations:

– Different algorithms implement a function– e.g., sorting

• Many different stylistic embodiments– Different way of expressing computation, storage

and communication– e.g., different ways of writing hw/sw models

• Form and content of executable modelsaffects energy consumption and bias design

Page 13: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

System platform• Micro-architecture:

– Programmable component/core(e.g., processor, controller)

• Macro-architecture: system organization– Programmable and application-specific

components, memory, communication units

• System design:– Mapping a functional model onto a platform– Hardware and software design

• How to achieve energy-efficient design?

Page 14: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

System-level design• Design macro-architecture:

– Use off-the-shelf components or cores• Do not change micro-architectures

– Design application-specific components/cores• Exploit component architecture

• Low-energy computation can be achievedwith either approach:– Designing custom components requires more

effort but may lower-power consumption

Page 15: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Outline• Introduction• System conceptualization and modeling

– Modeling and design– Energy efficient design from

• Executable functional models• Non-executable functional models• Implementation models

• System design• System management• Conclusions

Page 16: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Algorithm selection

• Inputs– A target macro-architecture– Abstract functional/executable spec.– Constraints– Library of algorithms

• Objective Select the most energy-efficient algorithm

that satisfies constraints

Page 17: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Example: set data types[Wuytack96]

• Select abstract data type (data struct & algorithm)• Minimize memory power• Application: ATM segment protocol processor• Library with 32 complex abstract data types

Array Linked list Pointer array Binary tree

×1000 ∆P between best and worst

Page 18: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Issues in algorithm selection

• Applicable only to general-purposeprimitives with many alternativeimplementation

• Pre-characterization on targetarchitecture

• Limited search space exploration

Page 19: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Algorithm optimization

• Algorithm ⇒ control data flow graph• Computational energy

– Explicit in CDFG– Cost-per-operation metrics

• Communication, storage energy– Implicit in CDFG– Locality metrics

Page 20: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Example: dataflow optimization[Chau92,Srivastava96,Mehra96]

+ +

+

∗∗+∆

∆+ +

+

∗∗+∆

• Computational energy⇒ graph nodes– E.g. Node minimization

• Communication/storageenergy ⇒ graph edges– E.g. Min-cut clustering

Page 21: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Algorithm optimization issues• Analyze CDFG locality

– Need to rigorously define locality metrics– Locality analysis can be expensive

• Tradeoff between spatial locality andtemporal locality– Parallel vs. sequential

• Relationship with macro-architecturaltemplate (cost metric characterization)

• Accuracy

Page 22: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Algorithm transformation

• More aggressive than optimization– Global vs. local– Change the semantic of the computation

• Hard to automate• Very effective

Page 23: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Approximate processing

Introducing well-controlled errors can beadvantageous for power– Reduced data width (coarse discretization)– Layered algorithms (successive

approximations)– Lossy communication

Stage1340×400

Stage2680×800

Stage 31020×1200

Min P? Med P?N N

YY

Page 24: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Control flow-basedtransformations

• Mutual exclusion

if (hi > 0) NewAddr(Pack)Route(Pack)

>0NewAddr Route

Pack

hi

• Computational kernel

dequant(BLK)IDCT(BLK)

CPU CPU DCT

Page 25: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Algorithm transformationIssues

• Approximate processing– Error control

• Quantify average and worst-case errors

– Scope of applicability• In some domains, exact results are required

• Control flow-based transformation– Control flow analysis is hard– Input dependency of control flow

Page 26: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Outline• Introduction• System conceptualization and modeling

– Modeling and design– Energy efficient design from

• Executable functional models• Non-executable functional models• Implementation models

• System design• System management• Conclusions

Page 27: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Task graph• Nodes are tasks• Edges are dependencies• Periodic execution is implicitly assumed

T1T1

T2T2T3T3

BB

EE

Page 28: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Task graph techniques

• Problem formulation– Input

• Task graph• Set of available processing elements (PEs)• Power, performance, cost metrics• Performance & cost constraints

– Output• Power-optimized implementation (constrained)

• [Dave99], [Kirovski97]

Page 29: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Processing elements• Several classes of PEs

– General-purpose processors (e.g. RISC core)– Digital signal processors (e.g. VLIW core)– Programmable logic (e.g. LUT-based FPGA)– Specialized processors (e.g. custom DCT core)

• Trade off flexibility vs. efficiency– Specialized is faster and power-efficient– General-purpose is flexible and

inexpensive

Page 30: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Cost metrics• Multi-objective problems

– Constrained optimization– Explore design trade-offs

• Example: performance and power

#C lock C ycles Energy (µJ)

PE1 PE2 PE3 PE1 PE2 PE3Energy per Cycle 1.5 .2 .05

T1 100 250 900 150 50 45

T2 110 260 900 165 52 45

T3 120 300 1000 180 60 50

TCLK = 10ns @ Vdd = 3.3V

Page 31: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Constrained optimization

• Design space– Who does what and when (binding &

scheduling)– Supply voltage of the various PEs:

• TCLK = K Vdd/(Vdd - Vt)2

• Design target– Minimize power– Performance constraint (e.g. Titeration=21µsec)

Page 32: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Basic search algorithmTask graph preprocessing

(clustering-splitting)Task graph preprocessing

(clustering-splitting)

PE allocationPE allocation

Scheduling & BindingScheduling & Binding

Stop?Stop?Stop?Cost EstimationCost Estimation

NO

YES

Page 33: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Refined Power Metrics I

• Communication power• Memory power

MMPEPEMMPEPE MMPEPE

BUST1T1

T2T2T3T3

BB

EE

Page 34: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Refined Power Metrics II

• Multi-tasking overhead• Power management overhead

PE1

PE1

PE1

PE1

Page 35: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Limitations of Task-LevelAbstraction

• Task-level description does not expresscomplete functional information– Tasks must be defined a priori– Functional transformations are impossible

• Power metric characterization– Exaustive (every PE for every task)– Inaccurate (misses inter-task effects)

Page 36: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Outline• Introduction• System conceptualization and modeling

– Modeling and design– Energy efficient design from

• Executable functional models• Non-executable functional models• Implementation models

• System design• System management• Conclusions

Page 37: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

The spreadsheet model

• General-purpose systems– Backward compatibility– Component-based

• Spreadsheet-based analysis– Basic budgeting– Simple “what if” analyses– No learning curve

Page 38: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Example: datasheet analysis

PDA #Comp Vdd Iidle Ion %on %idle I(mA)

Proc 1 3.3 0.5 50 0.7 0.3 36.15

DRAM 1 3.3 0.1 12 0.7 0.3 8.43

FLASH 5 3.3 0.0 9 0.7 0.3 31.5

IR 1 3.3 0.0 64 0.05 0.95 3.2

RTC 1 3.3 0.0 0.1 1 0 0.1

DC-DC 1 - 0.1 5.5 0.99 0.01 5.44

TOT 83.82TOT 83.82

Page 39: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Outline• Introduction• System conceptualization and modeling• System design

– Computation– Memory– Communication– Software

• System Management• Conclusions

Page 40: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

System design• Input

– The output of the conceptualization phase• A macro-architectural template• A hardware-software partition• Component by component constraints

• Output– Complete hardware design

ProcProc InteconnectInteconnect

PEPE PEPE PEPE PEPE

MemMem MemMem

Page 41: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Design process

• Specify computation, storage, templatecomponents, and software– Synergic process

• Fundamental tradeoff: general-purposevs. application-specific– Flexibility has a cost in terms of power

Page 42: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Processing element design

• Application specific processing unit– Minimum flexibility, minimum power

• Application-specific processor– Tailored processor template

• Core processor– Maximum flexibility, maximum power

Page 43: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Application-specificcomputational units

• Designed at the circuit/logic/RT level– Outside the scope of this tutorial

• Synthesized from high-level executablespecification (behavioral synthesis)– Supply voltage reduction– Switching frequency reduction– Load capacitance reduction– Minimization of switching activity

Page 44: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Power-driven voltage scaling

From faster to power efficient by scalingdown voltage supply– Traditional speed-enhancing

transformations can be exploited for low-power design

• Pipelining• Parallelization• Loop unrolling• Re-timing

Page 45: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Issues

• Performance-enhancing transformationsdo not always pay off– Region of diminishing returns (e.g.

speculation)

• Voltage supply is driven low bytechnological reasons– Reduced headroom

Page 46: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Advanced voltage scaling

• Multiple voltages– Slow down non-critical path with lower

voltage supply– Two or more power grids– High-efficiency voltage converters

* + -

+

+

Page 47: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Clock frequency reduction

• fclk↓ does not decrease energy– …but it may increase battery life: C=K/Iα

• Multi-frequency clocks– GALS [Hemani99]– Low-frequency distribution [Chung95]

Clk domain 1

Clk domain 2

Clk domain 3

Page 48: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Reducing load capacitance

• Reduce wiring capacitance– Reduce local loads– Reduce global interconnect

Global interconnect can be reducedby improving spatial locality: trade off

communication for computation

Page 49: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Reduce switching activity• Improve correlation between

consecutive input to functional macros• Reduced glitching• All basic HLS steps have been modified

– A synergic approach lead best results

+ +

- +

>

+ +

-

+

>

+

+

-

+

>

A AB C

H

0

AB ABA C

A CH H

0 0

Page 50: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Issues

• High-level estimation– Accuracy is still limited– Dependency on input patterns

• Design flow– HLS is not yet mainstream technology– Compete with RTL techniques

Page 51: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Application-specificprocessors

• Parameterized processors tailored to aspecific application– Optimally exploit parallelism– Eliminate unneeded features

• Applied to different architectures– Single-issue cores ⇒ instruction subsetting– Superscalar cores ⇒ # and type of FUs– VLIW cores ⇒ FUs and compiler

Page 52: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Example: Application-specific VLIW optimization

ApplicationProcessor library

ApplicationProcessor library

Select ProcessorRetarget compiler

Select ProcessorRetarget compiler

Compile andSimulate (ISS)

Compile andSimulate (ISS)

Estimate and findbest so far

Estimate and findbest so far

Other?Other?

Eliminate dominatedsolutions

Eliminate dominatedsolutions

Done

Y

N

Page 53: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Issues

• Exploration techniques– Limited search space– Accuracy of cost metrics

• Back-end– Synthesis of ASIPS– Competitive with highly-optimized cores?

• Preliminary research results

Page 54: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Low power core processors

• Details are outside the tutorial’s scope– [Gonzalez96,Burd96]

• Key ideas– Low voltage– Reduce wasted switching– Specialized modes of operations/instructions– Variable voltage supply

Page 55: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Core design space forMultimedia [Nishitani99]

Sony Mpeg2 Enc

NEC Mpeg2 Enc

TMS320C6xTMS320C6201

MPactTriMedia

SH4

Scarlet

DEC21164

PentiumII

MMXPentiumVR4400

VR4300StrongARM110

Pallas

Lucent16210SPX

MN1933211

VR4100

1,0e+1

1,0e+2

1,0e+3

1,0e+4

1,0e+5

1,0e-2 1,0e-1 1,0e+0 1,0e+1 1,0e+2

MPEG2 ENC

MPEG1 ENC

MPEG2 DEC

ASIC

MediaProc

General Purpose

MOPS

W

Page 56: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Exploiting Variable Supply• Supply voltage can be dynamically

changed during system operation– Quadratic power savings– Circuit slowdown

• Just-in-time computation– Stretch execution time up to the max

tolerable

Available time

PowerFixed voltage + Shutdown

Variable voltage

Page 57: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Variable-supplyArchitectures

• High-efficiency adjustable DC-DC converter• Adjustable synchronization

– Variable-frequency clock generator [Chandrakasan96]

– Self-timed circuits [Nielsen94]

• Example: Power-pro architecture [Ishiara98], Crusoeembedded processor [Transmeta00]

DecPower

Manager

CPUProgROM

Data RAM

DC-DC& VCO

Vdd CLK

Page 58: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Issues

• Optimization still not proven on real-lifearchitectures

• Overhead in supporting variable voltage– Adjustable DC-DC– Adjustable clock– Interfaces

• Reliability concerns

Page 59: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Outline• Introduction• System conceptualization and modeling• System design

– Computation– Memory– Communication– Software

• System Management• Conclusions

Page 60: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Memory Optimization

• Custom data processors– Computation is less critical than data

storage (for data-dominated applications)

• General-purpose processors– A significant fraction of system power is

consumed by memories

Memory-related consumptionStorage

Transfer

Page 61: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Memory Power Optimization

• Key idea: exploit locality– Hierarchical memory– Partitioned memory

B1B1

B2B2

B3B3

B1B1

B2B2

B1B1

B2B2

PEPE

L1

L2L3L3

Page 62: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Optimization approaches

• Fixed memory access patterns– Optimize memory architecture

• Fixed memory architecture– Optimize memory access patterns

• Concurrently optimize memoryarchitecture and accesses

Page 63: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Optimize MemoryArchitecture

• Data replication to localize accesses– Implicit: multi-level caches [Su95], [Bahar98]– Explicit: buffers [Bajwa97], [Wuytack98]

• Partitioning to minimize cost per access– Multi-bank caches [Ko98]– Partitioned memories [Tellez97], [Wuytack98]

L1L1 L2L2 L3L3

B12 B23

Pmem = PL1⋅hitL1+PB12(1-hitL1)+PL2 ⋅hitL1+(PB23+PL3)⋅(1-hitL1-hitL2)

Page 64: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Optimize Memory Accesses

• Sequentialize memory accesses– Reduce address bus transitions [Catthoor94], [Su95]

– Exploit multiple small memories [Panda96]

• Localize program execution– Fit frequently executed code into a small instruction

buffer (or cache) [Panwar95], [Bellas98]

• Reduce storage requirements [Gebotys96],[Catthoor]

Page 65: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Optimize Memory Architectureand Access Patterns

• Two phase-process– Specification (program) transformations

• Reduce memory requirements• Improve regularity of accesses

– Build optimized memory architecture

• Highest potential– How to automate program transformations

Page 66: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Memory Optimization Tools

• Atomium (IMEC)– Supports very high-level transformations

and optimizations (e.g. data-structureselection)

• Avalanche (Princeton)– Focuses on cache optimization, provides

cycle-accurate estimates

• UCLA system-level tool– Integrates voltage supply selection

Page 67: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Outline• Introduction• System conceptualization and modeling• System design

– Computation– Memory– Communication– Software

• System Management• Conclusions

Page 68: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Design of communicationunits

• Trends:– Faster computation blocks, larger chips

– Communication speed is critical

– Energy cost of communication is significant

• Multifaceted design approach:– On chip, networks, wireless, ...

– Protocol stack

Page 69: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Protocol stacka simplified view

• Data Link– Error control through

coding and channelmanagement

– Shared busses

• Physical layer– Signaling– Modulation

NetworkNetwork

OS & MiddlewareOS & Middleware

ApplicationsApplications

Data LinkData Link

PhysicalPhysical

Page 70: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Data encoding

• Theoretical results:– Bounds on transition activity reduction:

• The higher the entropy rate of the source is,the lower is the gain achievable by coding

• Practical applications:– Processor-memory (and other) busses

• Data busses, address busses

• Transition activity reduction does notguarantee energy savings

Page 71: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Bus encoding

BUSControl line(s)

ENCDEC

ENCDEC

PROCESSOR MEMORY

Page 72: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Bus encoding

• Data buses:– Random white noise model

• Address busses:– Some spatio-temporal correlations

• Embedded software:– Addresses and data can be analyzed

a priori to determine encoding

Page 73: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Bus-Invert coding for data busses[Stan, Burleson]

• Add redundant line INV to bus• When INV=0

– Data is equal to remaining bus lines

• When INV = 1– Data is complement of remaining bus lines

• Performance:– Peak: at most n/2 bus lines switch– Average: Code is optimal. No other code

with 1-bit redundancy can do better

Page 74: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Bus-Invert coding for data busses

• Average switching reduction is bus-widthdependent:– Ex: 3.27 for an 8-bit bus

• Average switching per line decreases asbusses get wider– Use partitioned codes– No longer optimal (among redundant codes)

• Implementation issues:– Difference (XOR) of two data samples and

majority vote

Page 75: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Bus-Inver code comparisons

lines mode A- trans A-trans/ line A-power

2 - 1 0.5 100%2 BI 0.75 0.375 75%8 - 4 0.5 100%8 1 BI 3.27 0.409 81.8%8 4 BI 3 0.375 75%16 - 8 0.5 100%16 1 BI 6.83 0.427 85.4%

Page 76: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Extensions and generalizations

• Transition signaling:– Assert logic TRUE / FALSE by signal

transitions

• Use several redundant lines– Limited-weight codes [Stan, Burelson]– Coding is space and time– Modulation techniques

Page 77: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Encoding instruction addresses

• Most instruction addresses are consecutive– Use Gray code [Su, Tsui, Despain]

• Word-oriented machines:– Increments by 4 (32bit) or by 8 (64bit).– Modify Gray code to switch 1 bit per increment

[Metha, Owens, Irwin]– Gray code adder for jumps

• Harder to partition• Convert to Gray code after update

Page 78: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Working zone encoding (WZE)[Mussol, Lang, Cortadella]

• Conjecture:– Software programs favor working zones of

their address space

• WZE:– Transmit WZ identifier and offset in WZ– 1-hot encoding for offsets

• Applicability:– No caches: data/instruction/shared address

busses– With caches: data/instruction-only busses

Page 79: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

T0 Code• Add redundant line INC to bus• When INC = 0

– Address is equal to remaining bus lines

• When INC = 1– Transmitter freezes other bus lines– Receiver increments previously transmitted

address by a parameter called stride

• Asymptotically zero transitions forsequences– Better than Gray code

Page 80: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Mixed bus encoding techniques• T0_BI:

– Use two redundant lines: INC and INV– Good for shared address/data busses

• Dual encoding:– Good for time-multiplexed address busses– Use redundant line SEL :

• SEL=1 denotes addresses• SEL is already present in the bus interface

– Dual T0:• Use T0 code when SEL is asserted.

– Dual T0_BI:• Use T0 when SEL is asserted; otherwise use BI

Page 81: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Address bus encodingusing statistical analysis

• Statistical analysis of bus traces– Spatio-temporal correlation of word K-tuples– Often limited to first / second order statistics (K=1,2)

• Encode words according to correlation• Use transition signaling• Spatio-temporal correlation computation:

– On-line adaptive– Off-line for embedded software

Page 82: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Address bus encoding forembedded software

• Off-line statistical analysis of bus traces– Compute bit 2nd order correlation from known stream:

• Correlate bit it with bit jt+1

• Use correlation measure to group bits into fields– Apply graph clustering algorithm– Cluster correspond to mutual high spatio-temporal correlation

• Re-encode bus lines in each cluster– Group bus lines into clusters (with locally high correlation)– Encode signals within each cluster to reduce bus switching

Page 83: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Example[Ramprasad98,Benini99]

E⊕ ⊕

D

X(n-1)

z(n)

X(n-1)

y(n)y(n) x(n)x(n)

Correlator Decorrelator

Page 84: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Bus encoding: summary• Bus encoding is very useful to reduce switching

of high-capacitance busses

• Some techniques require synthesis ofdedicated encoder/decoder circuitry

• Power consumption of such circuits must beweighted against power savings on busses

• Techniques differ for address and data busses

Page 85: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Outline• Introduction• System conceptualization and modeling• System design

– Computation– Memory– Communication– Software

• System Management• Conclusions

Page 86: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Impact of software

• For a given a hardware platform, the energyto realize a function depends on software– Operating system– Different algorithms to embody a function

(e.g., sorting)– Different coding styles– Application software compilation

Page 87: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Source-code transformations• Minimize power-consuming activity

– Computation

– Communication

– Storage

for (c = 1..N) receive(A) B = c*A

receive(A)for (c = 1..N) B = c*A

A*B+A*C A*(B+C)

for (c = 1..N) B[c] = A[c]*D[c]for (c = 1..N) F[c] = B[c]-1

for (c = 1..N) F[c] = A[c]*D[c]-1

Page 88: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Coding styles• Use processor-specific instruction style:

– Variable types– Function calls style– Conditionalized instructions (for ARM)

• Follow general guidelines for software coding– Use table look-up instead of conditionals– Make local copies of global variables so that they

can be assigned to registers– Avoid multiple memory look-ups with pointer chains

Page 89: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Example:ARM Variable Types

• Default “int” variable type is 18.2% moreenergy efficient than “char” or “short”

• Sign or zero extending is needed for shortervariable types

int wordinc (int a){ return a + 1;}

char charinc (char a){ return a + 1;}

Page 90: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Example:ARM conditional execution

• All ARM instructions are conditional• Conditional execution reduces the number of

branches

int g(int a, int b, int c, int d){ if (a > 0 && b > 0 && c < 0 && d < 0)

return a + b + c + d; return -1;}

Page 91: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Example:Switch versus Table Lookup

• Table lookup is 52.3% more energy efficientthan a dense switch statement since fewerjumps are required

if ((unsigned) condition >= 4 ) return 0;return "EQ\0NE\0CS\0CC\0" + 3*condition;

switch(condition){ case 0: return "EQ"; case 1: return "NE"; case 2: return "CS"; case 3: return "CC"; default: return 0;}

Page 92: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Example: MPEG

• Compiler optimizations account for only 0.6%energy savings

• Source code optimizations give as much as32.3% energy savings

Source General Special Size Time EnergyOpt. Opt. Opt. %change %change %changenone Balance none 0 0 0none Code Size none 0 -0.6 -0.6none Time none 0 -0.2 -0.2none Balance all 0 -0.6 -0.6all Balance none -5.8 -35 -32.3all Balance all -5.8 -35 -32.3

Page 93: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Software analysis

• Simulation-based approach– Perform system-level simulation of hardware

platform running application software– Best for software program comparisons

• Experimental measurement– Run program with instruction streams and

measure average energy cost of instructions– Best for instruction-level analysis

Page 94: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Instruction-level analysis[Tiwari, Malik, Wolfe]

• Analyze loop execution containing specificinstruction(s)– Loop should be long enough to neglect overhead

and short enough to avoid cache misses– About 200 instructions

• Measure instruction base cost• Measure inter-instruction effects

(can be approximated by a constant)

Page 95: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Base cost for the 486DX2

# Instruction Current cCycles1 NOP 275.7 1

2 MOV DX, BX 302.4 13 MOV DX, [BX] 428.3 1

4 MOV DX, [BX] [DI] 409.0 2

5 MOV [BX], DX 521.7 16 MOV [BX] [DI], DX 451.7 2

Page 96: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Instruction-level analysis

• Energy cost per instruction:– current * cycles * voltage / frequency

• Energy cost per instruction stream– Basis for comparison

• Example: StrongArm@170MHz– Average power/instruction: 200mW– Load/store: 260mW

Page 97: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Software compilation

Parse & LexParse & Lex

Machine independent optimizations

Machine independent optimizations

Instruction selectionOperation schedulingRegister assignment

Instruction selectionOperation schedulingRegister assignment

Page 98: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Compilation for low-powerInstruction selection

• Use instruction cost as guideline:– Example: use register-based operation

• Instruction selection by tree cover:– Use energy cost as guideline [Tiwari]– Experimental results do not show major variations

from code obtained by selecting instructions tominimize code length

Page 99: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Compilation for low-poweroperation scheduling

• Reorder instructions:– Reduce inter-instruction effects [Tiwari]– Switching in control part [Tiwari, Malik, Wolfe]

• Cold scheduling:– Reorder instructions to reduce inter-instruction

effects on instruction bus [Su, Tsui, Despain]:• Consider instruction op-codes• Inter-instruction cost is op-code Hamming distance• Use list scheduler where priority criterion is tied to

Hamming distance

Page 100: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Scheduling to reduce off-chip traffic[Tomiyama, Ishihara, Inoue, Yasuura]

• Schedule instructions (within basic blocks) tominimize Hamming distance

• Scheduling algorithm:– Operates within each basic block– Searches for linear orders consistent with data-flow– Prunes search space by avoiding redundant

solutions (hash sub-trees) and heuristically limitsthe number of sub-trees

Page 101: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Example

• ( ) 11111111• (A) 10010011• (B) 11101000• (C) 10111011• (D) 01110100• ( ) 11111111

• ( ) 11111111• (A) 10010011• (C)10111011• (B) 11101000• (D) 01110100• ( ) 111111111

4

6464

4

244

4

Page 102: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Compilation for low-powerregister assignment

• Minimize spills to memory• Register labeling [Metha, Owens, Irwin]

– Reduce switching in instruction register/bus andregister file decoder by encoding

– Reduce Hamming distance between addresses ofconsecutive register accesses

– Approach is complementary to cold scheduling

Page 103: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Other compiler optimizations

• Loop unrolling to reduce overhead– Contra: increased code space

• Software pipelining– Decreases the number of stalls by fetching

instructions in different iterations

• Eliminate tail recursion– Reduce overhead and use of stack

Page 104: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Software compilation:summary

• Several interesting ideas but:– Shortest code may turn out to be best for energy

• Memory organization and encoding of data,instructions and addresses play major role:– High-capacitance busses draw a lot of power

• Instruction/data compression/decompressionlead to significant saving due to traffic reduction

Page 105: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Outline• Introduction• System conceptualization and modeling• System design• System Management

– Dynamic power management techniques– Implementation of DPM

• Conclusions

Page 106: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Dynamic power management• Systems are:

– Designed to deliver peak performance, but …– Not needing peak performance most of the time

• Components are idle at times• Dynamic power management (DPM):

– Puts idle components in low-power non-operationalstates when idle

• Power manager:– Observes and controls the system– Power consumption of power manager is negligible

Page 107: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Structure ofpower-manageable systems

• System consists of several components:– E.g., Laptop: Processor, memory, disk, display …– E.g., SOC: CPU, DSP, FPU, RF unit

• Components may:– Self-manage state transitions– Be controlled externally

• Power manager:– Abstraction of power control unit– May be realized in hardware or software

Page 108: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Power manageable components

• Components with several internal states– Corresponding to power and service levels

• Abstracted as a power state machine– State diagram with:

• Power and service annotation on states• Power and delay annotation on edges

Page 109: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Example: SA-1100

• RUN: operational• IDLE: a sw routine

may stop the CPUwhen not in use,while monitoringinterrupts

• SLEEP: Shutdownof on-chip activity

RUN

SLEEPIDLE

400mW

160uW50mW

90us

90us10us

10us160ms

Page 110: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Power State Transitions(Fujitsu MHF 2043 AT)

Working: 2.2 W(spinning + IO)

Sleeping: 0.13 W(stop spinning)

Idle: 0.95 W(spinning)

read / write

IO complete

shut down0.36 J, 0.67 sec

spin up4.4J, 1.6 sec

Page 111: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

The applicability of DPM

• State transition power (Ptr) and delay (Ttr)• If Τtr = 0, Ptr = 0 the policy is trivial

– Stop a component when it is not needed

• If Τtr != 0 or Ptr != 0 (always…)– Shutdown only when idleness is

long enough to amortize the cost

– What if Τ and P are not deterministic?

Ptr Ttr

OFF

Ptr Ttr

ON

Page 112: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

offon

ontrtrtrBE PP

PPTTT

−−

+=offon

ontrtrtrBE PP

PPTTT

−−

+=

System break-even-time:TBE

Transition delay (Ttr) Transition power (Ptr)

Sleep power (Poff)

•• Minimum idle time for amortizing theMinimum idle time for amortizing thecost of component shutdowncost of component shutdown

Page 113: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

When to use powermanagement

• When– Average idle periods are long enough– Transition delay is short enough– Transition power is low enough– Sleep power is low enough

• When designing system for a known workload– Criteria for component specification and design

avgidleBE TT < avgidleBE TT <

Page 114: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Controlling PM systems

• DPM is a control problem: a policy is the control law– Collect observations– Issue commands

• Optimal control– Synthesize the best controller (PM)

Page 115: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

The workload

The system

Tidlen

Tactiven-1

Tidlen-1

t

On Off

Pon Poff

PsdTsd

Page 116: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Predictive techniques• Observe time-varying workload

– Predict idle period Tpred ~ Tidle

– Go to sleep state if Tpred is long enough toamortize state transition cost

• Main issue: prediction accuracy

T

Load

Wasted idle time Incorrect prediction

Tpred

Tidle

Page 117: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

• Simple policy– If Tidle > TTO go to SLEEP– Stay in sleep until workload != 0

• Rationale– When Tidle > TTO it is likely that: Tidle > TTO + TBE

• Choice of TTO is critical– Large is safe, but it could be useless– Too small is highly undesirable

• Limitations– Performance penalty for wake-up is paid after every

shutdown– Power is wasted during TTO

Fixed time-out

Page 118: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

• Choose: TTO = TBE

• Property:– Worst- case energy consumption is twice higher as

compared to an oracle policy which knows the future• Rationale:

– Worst case workload has repeated idles periodsof length TBE separated by point-wise activity

• Disadvantage:– For some components (e.g., hard disks) a small time-out

corresponds to many transitions, which affect componentreliability

Component-specific time-out

Page 119: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Predictive shutdown[Srivastava and Broderson]

• More aggressive than time-out– Eliminates power waste caused by TTO

– Shutdown as soon as idle if predicted idleperiod Tpred>TBE

• Prediction is based on past history– Non-idle periods Tactive are observed as well– Tidle

n, Tactiven: n-th idle and active periods

Page 120: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Previous active period

Tidlen

Tactiven-1

TBE

Ta-threshold

If Tactiven-1 < Ta-threshold then go to SLEEP

Page 121: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

When to use predictivetechniques

• When workload has memory• Implementing predictive schemes

– Predictor families must be chosen basedon workload types

– Predictor parameters must be tuned to theinstance-specific workload statistics

– When workload is non-stationary orunknown, on-line adaptation is required.

Page 122: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Stochastic control approach

• Recognize inherent uncertainty

– Exact prediction of future events is impossible

– Abstraction of system model implies uncertainty

• Model components,system and workload as

stochastic processes

• Expected values of cost metrics are

optimized

Page 123: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Controlled Markov processes• System and environment modeled as Markov chains

– System is called service provider (SP)– Environment is called service requester (SR)

• SP is a controlled Markov chain– State transition probabilities depends on commands

• The power manager (PM) observes the state of thesystem and issues commands to control evolution

SR SPPM

Page 124: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Example

PowerManager

Policy

System

Queue

4 3 2 1

Sleep

Active

Idle

Sleep

Service Provider

Active

Idle

Service Requestor

Page 125: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Controlled Markov processesadvantages

• Optimum policy is captured by actionswhich are function of system state only– Actions are typically non-deterministic

i.e., probabilities of issuing a command– Control policy is a table– Policy implementation is easy

• Computation can be cast as linear programand solved exactly

Page 126: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Markov models for DPMoutstanding issues

• Discrete-time versus continuous time– Avoid periodic evaluation of system state– Event-driven paradigm

• Stochastic distributions:– Geometric and exponential distribution of

events may not fit component– Use semi-Markov models

• Non-stationary work-loads– Use adaptive schemes

Page 127: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Service Requestor - Idle State

0.0001

0.001

0.01

0.1

1

1 10 100 1000

Time (s)

Tai

l dis

trib

utio

n

ExponentialPareto

Experimental

bSR ta1E −⋅−=

Page 128: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

When to use stochasticcontrol

• Constrained power optimization– Power/performance trade-off

• Global view of the system– Relations between workload and resources– Extensions to multiple power states

• For stationary Markov models– Optimum policies can be efficiently computed

• For non-stationary Markov models– Good heuristic adaptive policies can be derived

Page 129: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Outline• Introduction• System conceptualization and modeling• System design• System Management

– Dynamic power management techniques– Implementation of DPM

• Conclusions

Page 130: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Operating system-basedpower management

• In systems with an operating system (OS)– The OS knows of tasks running and waiting– The OS should perform the DPM decisions

• Advanced Configuration and Power Interface(ACPI) [Intel, Microsoft, Toshiba]– Open standard to facilitate design of OS-based

power management

Page 131: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

OS

Kernel Power Management

DeviceDriver

ACPI driverAML interpreter

ACPI architecture

BIOS

Applications

Platform Hardware

Motherboarddevices

Chipset CPU

ACPI Tables

ACPITable interface

BIOS interface

Registerinterface

ACPI BIOS ACPI registers

Page 132: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

State of the art

• ACPI supported by:

– Recent PCs (1999+)

– Windows 2000

• Most PC still use APM

• Experiments with:

– Sony VAIO PCG-F150

– Power savings: 1.7 X

– Comparable performance

Page 133: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Implementations of DPM

• Shutdown idle components• Gate clock of idle units• Clock setting and voltage setting

– Support multiple-voltage multiple-frequencycomponents

– Components with multiple working power states

Page 134: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

DPM and operating systems

• Task scheduling:– Schedule tasks with variable latencies (due to

clock setting) while meeting timing constraints.– Schedule tasks to increase efficiency of DPM,

by grouping idle periods of resources

• Exploit interaction among applicationprograms, OS scheduling and hardware, toimprove power manageability

Page 135: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

DPM: summary• DPM exploits variations in workload:

– Analysis of workload statistics and componentparameters is required to determine its applicability

• DPM policy choices:– Predictive methods and stochastic control

• Implementation choices:– Shutdown, clock gating and setting– Strong interaction between software layers and

hardware

• New requirements for operating systems

Page 136: SYSTEM-LEVEL POWER OPTIMIZATION - INFflavio/ensino/cmp502/date00_tut_slides.pdf · • Describe design techniques and tools for system-level design • Address system-level modeling

Benini & De Micheli DATE 2000

Conclusions• System-level energy-efficient design is an area

of research with many opportunities– Many interesting ideas have been proposed– Design tools will not solve the entire problem

• Global view of the system is required:– Technology: pushing the limit of low voltage, noise,

reliability, frequency, performance, …– Hardware architecture: systems with heterogeneous

components and multi-threaded execution– Software: operating systems and compilers