
Page 1

May 23-24, 2000; Scottsdale, AZ; Kickoff_may_2000.ppt

Morphable Computer Architectures for Highly Energy Aware Systems

PACC Program Review: Nov. 1-3; Annapolis, MD

Peter M. Kogge: CSE Dept., University of Notre Dame; [email protected]

Kanad Ghose: CS Dept., SUNY-Binghamton; [email protected]

Nikzad “Benny” Toomarian: Center for Integrated Space Microsystems (CISM), Jet Propulsion Lab; [email protected]

Page 2

Nov. 1-3, 2000; Annapolis, MD; Oct_2000_review.ppt

Outline

Quad Chart

“Gear-Shifting” Simplified

The Morph Program

The Morph Architecture

Test Bed & Benchmarks

Page 3

MORPH: Dynamic Low Energy Architectures

[Schedule chart: tasks Profiles, Baseline, Morphable Node, Data Placement, Adaptive Algorithms, Run-time, Demo & Eval; time axis 5/00, 11/00, 5/01, 11/01, 5/02]

New Ideas
• Morphable microarchitecture to allow dynamic changes in energy expended per cycle
• Energy efficient morphable memory hierarchies
• Energy efficient ISA extensions to process data more energy efficiently
• Adaptive algorithms to select best configuration
• Energy aware run-time which can reconfigure system

MORPH Adds An “Energy Gear” to Dynamically Configurable Embedded Systems

IMPACT
• Focus on energy, not just power, management
• Develops suite of widely applicable energy-reducing architectural techniques
• Adds extra technology-independent degrees of freedom to dynamic energy control
• Provides an overall inherently more energy efficient embedded computing system
• Designed for transfer to real missions

Page 4

What is “Gear-Shifting” all about?

Definitions:
• IPC = Instructions per Cycle
• EPC = Energy per Cycle
• C = Cycles per Second
• Performance = “Instructions/second” = IPC x C
• Power = “Energy/second” = EPC x C
• M = performance required during some mode (instructions/second)

Real world: performance needs change very dramatically

Observations on Conventional Designs:
• Conventional designs fix IPC at some IPCmax to meet peak need
• In such designs $EPC = K \times IPC^{a}$, where “a” can range to almost 4
• Assume arbitrary clock selection (up to a maximum clock Cmax)
• Ignore Vdd changes for now

Power @ M: $Power(M) = K \times IPC_{max}^{a} \times (M / IPC_{max}) = K \times M \times IPC_{max}^{a-1}$

Dependent on clock only thru M
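The fixed-IPC relation above can be checked numerically. The following is a minimal sketch, assuming illustrative values for K and a (the slides give no concrete constants); the function name `conventional_power` is hypothetical.

```python
def conventional_power(m, ipc_max, k=1.0, a=3.0):
    """Power of a fixed-IPC design at required performance m (instructions/second).

    k and a are illustrative assumptions, not values from the slides.
    """
    clock = m / ipc_max      # C = M / IPCmax, because Performance = IPC x C
    epc = k * ipc_max ** a   # EPC = K x IPC^a, evaluated at the fixed IPCmax
    return epc * clock       # Power = EPC x C = K x M x IPCmax^(a-1)


if __name__ == "__main__":
    # Power grows linearly with M because IPC is pinned at IPCmax.
    for m in (1.0, 2.0, 4.0):
        print(m, conventional_power(m, ipc_max=4.0))
```

Under these assumptions the clock enters only through M, which matches the slide's closing remark.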

Page 5

Some Simplified Gear Equations

Assume IPC smoothly changeable from IPCmin to IPCmax

Let R = (IPCmax/IPCmin) = “dynamic ratio” of performance range

Let g be a gear setting, ranging from 0 to 1 to change IPC

$IPC(g) = IPC_{min} + (IPC_{max} - IPC_{min})\,g = IPC_{max}\,[1/R + (1 - 1/R)\,g]$

$EPC(g) = K \times \{IPC_{max}\,[1/R + (1 - 1/R)\,g]\}^{a}$

$Power(g, C) = K \times \{IPC_{max}\,[1/R + (1 - 1/R)\,g]\}^{a} \times C$

GEARS

Large R: OUR CHALLENGE
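The three gear equations on this slide translate directly into code. This is a sketch only: the default values of K and a are illustrative assumptions, and the function names are hypothetical.

```python
def ipc(g, ipc_max, r):
    """IPC(g) = IPCmax * [1/R + (1 - 1/R) * g], with gear setting g in [0, 1]."""
    return ipc_max * (1.0 / r + (1.0 - 1.0 / r) * g)


def epc(g, ipc_max, r, k=1.0, a=3.0):
    """EPC(g) = K * IPC(g)^a; k and a are assumed, not taken from the slides."""
    return k * ipc(g, ipc_max, r) ** a


def power(g, clock, ipc_max, r, k=1.0, a=3.0):
    """Power(g, C) = EPC(g) * C."""
    return epc(g, ipc_max, r, k, a) * clock


if __name__ == "__main__":
    # With IPCmax = 8 and R = 4, gear 0 gives IPCmin = 2 and gear 1 gives IPCmax = 8.
    print(ipc(0.0, 8.0, 4.0), ipc(1.0, 8.0, 4.0))
```

At g = 0 the expression reduces to IPCmax/R = IPCmin, and at g = 1 it reduces to IPCmax, as the definition of R requires.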

Page 6

A Gear-Shifting Strategy

To minimize power as we vary performance requirement M:

• Use most efficient IPCmin as long as possible (until clock at maximum): g = 0

• Then smoothly vary g while using Cmax

[Plots: gear setting g (0 to 1) and clock C (0 to Cmax), each versus Performance Rqmt on an x-axis marked 0, Imin x Cmax, Imax x Cmax]
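The two-phase strategy on this slide can be sketched as a small selection function. The name `choose_gear` and its argument names are hypothetical; the sketch assumes IPC varies linearly with g, as in the simplified gear equations.

```python
def choose_gear(m, ipc_min, ipc_max, c_max):
    """Return (g, clock) for required performance m using the two-phase strategy.

    Phase 1: stay in the lowest gear (g = 0) and raise the clock until Cmax.
    Phase 2: hold the clock at Cmax and raise g until IPC(g) * Cmax = m.
    """
    if m < 0 or m > ipc_max * c_max:
        raise ValueError("required performance is outside the achievable range")
    if m <= ipc_min * c_max:
        return 0.0, m / ipc_min
    # Solve IPCmin + (IPCmax - IPCmin) * g = m / Cmax for g.
    g = (m / c_max - ipc_min) / (ipc_max - ipc_min)
    return g, c_max


if __name__ == "__main__":
    for m in (1.0, 2.0, 5.0, 8.0):
        print(m, choose_gear(m, ipc_min=2.0, ipc_max=8.0, c_max=1.0))
```

The crossover at m = IPCmin x Cmax corresponds to the knee in both plots.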

Page 7

The Result

[Plot: Ratio of Power under optimal gear change to conventional fixed IPC Power (y-axis “Power Savings Factor”, marked 0, $(1/R)^{a-1}$, 1) versus Performance Rqmt M (x-axis marked 0, Imin x Cmax, Imax x Cmax). Plot annotation: “Huge savings if applications spend most time here”]

Potentially huge for large R

And we can still use all the other tricks to lower peak power!
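The axis value $(1/R)^{a-1}$ follows from the earlier definitions. One way to derive it, assuming the same K and a apply in every gear and that the gear-shifted design follows the two-phase strategy:

```latex
% Conventional design: IPC fixed at IPC_max
\[ P_{conv}(M) = K \, M \, IPC_{max}^{\,a-1} \]

% Region 1: M <= IPC_min C_max, so g = 0 and C = M / IPC_min
\[ P_{gear}(M) = K \, IPC_{min}^{\,a} \, \frac{M}{IPC_{min}} = K \, M \, IPC_{min}^{\,a-1}
   \quad\Rightarrow\quad
   \frac{P_{gear}}{P_{conv}} = \left(\frac{IPC_{min}}{IPC_{max}}\right)^{a-1}
                             = \left(\frac{1}{R}\right)^{a-1} \]

% Region 2: IPC_min C_max < M <= IPC_max C_max, so C = C_max and IPC = M / C_max
\[ P_{gear}(M) = K \left(\frac{M}{C_{max}}\right)^{a} C_{max}
   \quad\Rightarrow\quad
   \frac{P_{gear}}{P_{conv}} = \left(\frac{M}{IPC_{max} \, C_{max}}\right)^{a-1} \]
```

The two regions agree at M = IPCmin x Cmax, and the ratio reaches 1 at M = IPCmax x Cmax. As an illustration only, R = 4 and a = 3 would give a ratio of 1/16 throughout the first region.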

Page 8

The Morph Program

• Develop a microarchitecture with a large dynamic R
- “Multi-cluster” superscalar CPU
- Intelligent placement of data within mixed memory type hierarchy
- Inherently low energy caches
- Low energy ISA extensions

• Define & use a realistic embedded benchmark suite
- Drawn from deep-space processing needs, initially rovers
- Include other DARPA benchmarks such as from DIS
- Baseline on variety of systems

• Develop real-time algorithms for reconfiguration

• Demonstrate potential gains via simulation
- Simplescalar + energy models

• Technology transfer to potential future JPL missions

Page 9

The Team

SUNY-BINGHAMTON (Kanad Ghose)
• Morphable Caches, RFs
• Dynamic Bit Slicing
• Energy Eff VLIW archs
• Supporting compiler techniques

UNIVERSITY OF NOTRE DAME (Peter Kogge, Vincent Freeh, Jay Brockman)
• Morphable multi-cluster architecture
• “At the sense amps” ISA extension
• Runtime with hooks for dynamic morphing control

JET PROPULSION LABORATORY (Nikzad Toomarian, Mohammed Mojarradi, Savio Chau)
• Scenarios & benchmarks
• Baseline characterizations
• Runtime adaptation algorithms

Energy Aware Data Placement

Overall Goals:
• Architectures with variable IPC, EPC
• Tools & S/W to manage morphing
• Realistic demonstrations

Page 10

Starting A Solution: Multi Cluster Architecture

[Figure: three pipeline organizations
(a) Simple Pipeline: Fetch, Decode, Register File, Data Cache
(b) Classical Superscalar: Fetch, Decode, Rename, Issue Window, Register File, Bypass, Data Cache, memory disambiguation
(c) New Multi Cluster: Fetch, Decode, Rename and steering, feeding several clusters; One Cluster = Issue Window, Register File, Bypass, Data Cache, RAW, RAB, memory disambiguation]

Problem: single large centralized register files with many ports

Solution: multiple smaller register files with few ports

Issue Width (IW): $EPC/IPC \sim (IW)^{k}$, with k as high as 1.9

w Clusters, each of width IW/w: $w\,(IW/w)^{k} \ll (IW)^{k}$
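A quick numeric check shows how strong the inequality above is. This sketch treats the "~" as a proportionality with the same constant in both cases and ignores any inter-cluster communication cost; both are assumptions, and the function name is hypothetical.

```python
def relative_cost(issue_width, w, k=1.9):
    """Ratio of w * (IW/w)^k to IW^k; algebraically this equals w^(1 - k)."""
    monolithic = issue_width ** k
    clustered = w * (issue_width / w) ** k
    return clustered / monolithic


if __name__ == "__main__":
    # Splitting an 8-wide machine into 1, 2 or 4 clusters with k = 1.9.
    for w in (1, 2, 4):
        print(w, round(relative_cost(8, w), 3))
```

Because the ratio simplifies to w^(1-k), it does not depend on the total issue width, only on the number of clusters and on k.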

Page 11

Target Morph Configuration

[Figure: multi-cluster pipeline (Fetch, Decode, Rename and steering, then clusters; One Cluster = Issue Window, Register File, Bypass, Data Cache, RAW, RAB, memory disambiguation) attached to EEPROM, FLASH, DRAM and SRAM memories, with these annotations:]
• Dynamic issue width
• Dynamic ALU width
• Low energy caches
• Energy-aware data placement
• Dynamic data path width
• Alternative ISA features
• Selective substrate bias
• Embedded+external memory
• Variable multi-cluster microarchitecture

Page 12

Evaluation Methodology

[Plot: scatter of design points (+) for PACC Benchmarks. x-axis: IPC: Instructions per Cycle; y-axis: EPC: Energy per Cycle. Annotations: Energy Efficient Family; Today’s Performance Only Design Point]
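One plausible reading of the “Energy Efficient Family” in the plot is the set of design points that no other point beats on both axes. The sketch below selects that set; this interpretation, the function name, and the sample points are all assumptions, not taken from the slides.

```python
def energy_efficient_family(points):
    """Return the (ipc, epc) points that are not dominated by any other point.

    A point is dominated when some other point has IPC at least as high and
    EPC at least as low, with a strict improvement in at least one of the two.
    """
    family = []
    for ipc, epc in points:
        dominated = any(
            (o_ipc >= ipc and o_epc <= epc) and (o_ipc > ipc or o_epc < epc)
            for o_ipc, o_epc in points
        )
        if not dominated:
            family.append((ipc, epc))
    return sorted(family)


if __name__ == "__main__":
    # Made-up design points, for illustration only.
    designs = [(1.0, 1.0), (2.0, 3.0), (2.0, 5.0), (3.0, 9.0), (1.5, 4.0)]
    print(energy_efficient_family(designs))
```

A simulator sweep would supply the real (IPC, EPC) pairs; the quadratic scan is adequate for the small number of configurations such a sweep typically produces.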

Page 13

Multi-Cluster vs Conventional Results

[Plot: multi-cluster configurations labeled 1x4, 1x6, 1x8, 2x4, 2x6, 4x2 and 4x4, compared with Conventional]

Up to 1/2 the energy at same IPC, or 20% better IPC at same energy

Morph: dynamically change the cluster size & ride the EPC/IPC Savings

Page 14

On-chip Caches: Addressing Dynamic & Static Leakage

• On-chip caches dissipate 25% to 45% of total energy
- Likely to increase because of leakage

• Added line buffers (4 to 16) reduce dynamic energy dissipation by 40% to 65+%, with no penalty in access time and with 4% to 6% area penalty

• Use of dynamic activation of recently-accessed L2 cache areas reduces dynamic dissipation component by 40% to 80%
- Only selected areas of L2 in active mode, rest in standby
- Size of bit-cell groups controlled is critical
- Additional L2 area penalty of approx. 8%
- Heuristics for controlling transitions between active & standby modes
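To put the line-buffer figures in whole-chip terms, a rough bound can be computed from the two quoted ranges. This sketch assumes, for illustration only, that the cache share of total energy is entirely dynamic dissipation, so the quoted reduction applies to the whole share; the slides do not state this, and the function name is hypothetical.

```python
def total_energy_saving(cache_share, cache_reduction):
    """Fraction of total energy saved when caches account for `cache_share`
    of the total and their dissipation drops by `cache_reduction`."""
    return cache_share * cache_reduction


# Corner cases built from the ranges quoted on the slide.
low = total_energy_saving(0.25, 0.40)    # smallest share, smallest reduction
high = total_energy_saving(0.45, 0.65)   # largest share, largest reduction

if __name__ == "__main__":
    print(f"estimated total saving between {low:.3f} and {high:.3f}")
```

If leakage makes up part of the cache share, the real figure would be lower than this estimate, since line buffers target the dynamic component.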

Page 15

Addressing Dynamic & Static Dissipations in Caches

Page 16

Exploiting Bit-Slice Inactivity in Datapaths

Expectation: Higher-order data bits likely to be insignificant at least some of the time

Opportunity: exploit byte slice inactivity over transfer paths, within storage devices (register files, caches) & function units

[Charts, data not recoverable: one captioned “FOR SPECfp95 DP”, one captioned “FOR INTEGERS FROM SPECfp95”]

A circuit to provide read-enables in RFs to avoid energy dissipation on access
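To illustrate what byte-slice inactivity means for an integer operand, the sketch below counts how many low-order bytes carry information when the rest are pure sign extension. The sign-extension criterion and the function name are assumptions for illustration; the slides do not say how the hardware detects inactive slices.

```python
def active_bytes(value, width_bytes=4):
    """Smallest number of low-order bytes that hold `value` in two's complement.

    The remaining high-order bytes are sign extension and could, in principle,
    be left inactive on transfer paths, in storage, and in function units.
    """
    full_bits = 8 * width_bytes
    if not -(1 << (full_bits - 1)) <= value <= (1 << (full_bits - 1)) - 1:
        raise ValueError("value does not fit in the given width")
    for n in range(1, width_bytes + 1):
        bits = 8 * n
        if -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1:
            return n
    return width_bytes


if __name__ == "__main__":
    for v in (5, -1, 128, 70000, 2**31 - 1):
        print(v, active_bytes(v))
```

Small loop counters and flags would need one active byte out of four under this criterion, which is the kind of opportunity the slide points to.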

Page 17

Deep Space: The Ultimate Power-Constrained Embedded System

• Limited energy/power sources
- Renewable variable power: Solar cells
- Constant power: RPGs
- Fixed energy: batteries

• Multiple operational modes, all compute/energy constrained
- Cruise
- Communication: compression vs transmission
- Data gathering vs analysis
- Movement: collision avoidance

• Today: “Pre-canned” power management by serialized operations

Morph Initial Focus: Rovers

Page 18

Pathfinder Sojourner

Function                                               Time and Calculation    Energy Required
motor heating: 1 motor at a time                       7.51W x 1hr             7.51W-hr
motor heating: 2 motors at a time                      11.26W x 0.5hr          5.63W-hr
driving (extreme terrain @ -80degC)                    13.85W x 0.5hr          6.92W-hr
hazard detection                                       7.33W x 0.25hr          1.83W-hr
imaging (3 images @ 2 min/image)                       4.5W x 0.1hr            0.45W-hr
image compression (compress 3 images @ 6 min/image)    3.7W x 0.3hr            1.2W-hr
6Mbit communication @ 50min/sol                        6.27W x 0.8hr           5.2W-hr
42, 10 sec health checks during day                    6.27W x 0.1hr           0.63W-hr
remainder of 7 hr daytime CPU operation                3.7W x 4hr              15.0W-hr
WEB heating (as needed)                                50W-hr                  50W-hr
Total                                                                          95W-hr

vs peak 15 W-hr Solar Cells + 150 W-hr non-rechargeable battery

Effects on application code:
• Many actions sequential, not simultaneous
• No dynamic scheduling, no autonomy
• Not even CPU-clock management
• Nowhere near enough CPU performance
• Designed to limit worst case power
• Dump excess power into heaters
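The energy budget on this slide is power multiplied by duration for each function. The sketch below recomputes each entry and sums the listed values; the tuple layout and names are hypothetical, and the numbers are copied from the slide.

```python
# (function, power in W, duration in hr, energy listed on the slide in W-hr)
BUDGET = [
    ("motor heating, 1 motor at a time", 7.51, 1.0, 7.51),
    ("motor heating, 2 motors at a time", 11.26, 0.5, 5.63),
    ("driving", 13.85, 0.5, 6.92),
    ("hazard detection", 7.33, 0.25, 1.83),
    ("imaging", 4.5, 0.1, 0.45),
    ("image compression", 3.7, 0.3, 1.2),
    ("communication", 6.27, 0.8, 5.2),
    ("health checks", 6.27, 0.1, 0.63),
    ("remaining daytime CPU operation", 3.7, 4.0, 15.0),
    ("WEB heating", None, None, 50.0),  # listed directly as energy
]


def listed_total(budget):
    """Sum of the energy column as printed on the slide."""
    return sum(listed for _, _, _, listed in budget)


def recomputed(budget):
    """(function, power x duration, listed energy) for every row."""
    return [
        (name, listed if watts is None else watts * hours, listed)
        for name, watts, hours, listed in budget
    ]


if __name__ == "__main__":
    for name, energy, listed in recomputed(BUDGET):
        print(f"{name}: computed {energy:.2f} W-hr, listed {listed} W-hr")
    print(f"sum of listed entries: {listed_total(BUDGET):.2f} W-hr")
```

The listed entries sum to about 94.4 W-hr, which the slide appears to round to 95 W-hr. A few products differ slightly from the listed energies, presumably because the durations shown are rounded.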

Page 19

Athena/Mars ’03 Rovers: Rover Configuration

[Figure labels: Pancam/Mini-TES; Mini-Corer; Instrument Arm Cluster: Raman Spectrometer, Alpha-Proton-X-Ray Spectrometer (APXS), Mössbauer Spectrometer, Microscopic Imager]

• 3 Hrs/day of solar @ 50 W
• 5 amp hr 16V batteries
• More complex communication
• More complex on-board eqpt
• Still statically scheduled
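For comparison with the Sojourner budget, the first two bullets can be converted to energy. This is a rough worked calculation that assumes the full 50 W is available for all three hours and that the battery delivers its nominal voltage; the number of batteries is not stated on the slide.

```latex
\[ E_{solar} = 3\ \mathrm{hr/day} \times 50\ \mathrm{W} = 150\ \mathrm{W\text{-}hr/day} \]
\[ E_{battery} = 5\ \mathrm{A\,hr} \times 16\ \mathrm{V} = 80\ \mathrm{W\text{-}hr}\ \text{per battery} \]
```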

Page 20

MUSES-CN Asteroid NanoRover

Solar powered @ 1 watt, including RF telecommunications system for communications to lander or small-body orbiter for relay to Earth.

Clock-adjustable CPU speed

To run a command:
• Determine available solar power.
• Minimum required power = device + CPU power
• If available power < minimum required:
- if parameter enables re-orienting, re-orient to maximize solar power
- if still not enough and parameter enables waiting, wait up to parameter limit for solar power
- if still not enough, abort command
• Set CPU speed to maximum allowable based on (power available) - (minimum needed for devices)
• Perform command: during command execution, if power drops significantly (or load shed indication?...):
- CPU speed is reduced to minimum required
- Operate motors one-at-a-time
• Return CPU speed to parameter-specified idle

Still “sequential” operation
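The pre-execution part of the command procedure above can be sketched as a single decision function. Everything here is hypothetical: the function name, the parameter names, and the idea of passing in the power measured after re-orienting or waiting are illustrative assumptions, not the flight software's interface.

```python
def plan_command(available_w, device_w, cpu_min_w, *,
                 can_reorient=False, reoriented_w=None,
                 can_wait=False, waited_w=None):
    """Return the CPU power budget in watts for a command, or None to abort.

    reoriented_w and waited_w stand in for the solar power measured after
    re-orienting and after waiting; both are placeholders for real sensing.
    """
    minimum = device_w + cpu_min_w
    if available_w < minimum and can_reorient and reoriented_w is not None:
        available_w = reoriented_w   # re-orient to maximize solar power
    if available_w < minimum and can_wait and waited_w is not None:
        available_w = waited_w       # wait, up to the parameter limit
    if available_w < minimum:
        return None                  # still not enough: abort the command
    # CPU may use whatever power is left once the devices are supplied.
    return available_w - device_w


if __name__ == "__main__":
    print(plan_command(1.0, 0.5, 0.2))
    print(plan_command(0.5, 0.5, 0.2))
    print(plan_command(0.5, 0.5, 0.2, can_reorient=True, reoriented_w=0.9))
```

The run-time reactions listed on the slide (dropping the CPU to minimum speed, operating motors one at a time, returning to idle) would sit in the execution loop that follows this decision and are not modeled here.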

Page 21

Some Morph Test Beds

PACC-Blue
• 400MHz PPC 7400
• Enhanced superscalar + Altivec
• Linux

PACC-Gold
• 400MHz PPC 750
• Linux

JPL PPC-SBC
• 200 MHz 750
• VxWorks

[Diagram labels: Oscilloscope; Logic Analyzer; PowerPC 750; NT Box; Ethernet]

• Different PowerPC configurations
• Microarchitecture
• Clock rates
• ISA extensions

• Run rover/PACC application code
• Measure time/power
• Use as input to Simplescalar simulation

Page 22

The NASA X2000 Avionics System

[Diagram: X2000 avionics architecture. Labeled blocks: high-rate input (camera); high-speed bus (e.g. IEEE 1394); communication module (CDMA); bus power controller; symmetric multiprocessor modules; altimeter subnet; reconfigurable hardware blocks; low-speed bus (e.g. I2C)]

microcontroller-directed subnet
- power regulations & control
- analog telemetry sensors
- safety inhibits
- valve & pyro drive

• Design for 10-20X reduction in power, at 10-20X performance increase
• With long-term survivability & technology scaling
• Application-specific adaptive configuration to match run-time power supply constraints
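In the energy terms used earlier in the talk, the first design goal compounds. A rough reading, assuming the power and performance factors are met together and pairing the low ends and the high ends of the two ranges:

```latex
\[ \frac{E_{new}/\mathrm{instruction}}{E_{old}/\mathrm{instruction}}
   = \frac{P_{new}/P_{old}}{\mathit{Perf}_{new}/\mathit{Perf}_{old}}
   = \frac{1/10}{10}\ \text{to}\ \frac{1/20}{20}
   = \frac{1}{100}\ \text{to}\ \frac{1}{400} \]
```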

Page 23

X2000 FD Testbed with Power Awareness

[Diagram: X2000 FD testbed. Five cPCI bus (6U chassis) units, each holding two PPC 750 (Synergy) boards, two 1394a I/F (Saderta) boards, a Dual I2C I/F (JPL) board, and two or three empty slots. Other labeled elements: SUN E3500 Workstation (35 GB HD); two SUN Ultra 10 Workstations; GPIB; FPGA Rapid Prototype; PCI Bus analyzer; five Hard Drives; Terminal Server]
