
Day 4: Symbiotic Optimization

Kim Hazelwood

ACACES Summer School

July 2009


ACACES 2009 – Process Virtualization

Modern Computing Challenges

Performance

Power & temperature

Reliability

Parallelism (multicore)

Heterogeneity

Limited resources (embedded computing)


Typical Approaches

Optimize using SW or HW techniques in isolation

Performance

• SW: Compile-time optimizations, OS scheduling

• HW: Architectural improvements, VLSI technology

Reliability: Code/data duplication (HW or SW)

Power & Temperature

• HW control mechanisms

• Profile + recompile cycle


Modern Design Constraints

Compilers – “Compile once, run anywhere”

• Cannot ship “MS Office for 3Q08 batch of Core2 3GHz, > 8GB RAM, BrandX power supply, located in high altitudes…”

OSes – Scheduling algorithms don’t scale to modern architectures

Microarchitecture – Limited window of application knowledge

• Past must predict the future

Circuits – Guaranteed correctness, reliability

• Must design for the worst case


Tortola: Symbiotic Optimization

Enable HW/SW Communication

What small changes can we make to HW/OS to enable collaborative solutions?

[Diagram: SW Applications, Binary Modifier, and OS/HW layers, connected by arrows labeled "runtime traits", "hardware feedback; OS load", and "scheduling hints"]


The Power of Virtualization

• No longer restricted to a fixed ISA

• Reduce hardware complexity

– No more backwards compatibility warts

– Fix bugs after shipment

– Reduce time to market for new architectures

[Diagram: SW Applications on top of the Binary Modifier on top of HW. Initially: x86 to x86. Eventually: SWI to HWI.]


Tortola Applications

Combine global program information with run-time feedback

• System-specific power usage

• Application-specific heat anomalies

• Workload/input specific performance optimization

Two case studies

• The di/dt problem – solved using hardware feedback and dynamic optimization

• Heterogeneous multicore scheduling – solved using hardware feedback, OS feedback, and OS hints from the binary modifier


The di/dt Problem

Voltage stability is important for reliability, performance

Low-power techniques have a negative side effect: current variation

ITRS cites noise management as a Grand Challenge for 5-10 year time frame

Dips (undershoots) in supply voltage – can cause incorrect values to be calculated or stored

Spikes (overshoots) in supply voltage – can cause reliability problems


di/dt Solutions

Software: compiler optimizations

MicroArch: sensor/actuator mechanisms

Circuit-Level: decoupling capacitors, more Vdd/Gnd pins on package

Co-designed MicroArch & SW: binary modifier


Sensor-Actuator Mechanisms

On-chip voltage sensors detect abnormally high/low voltage levels

On-chip actuator then attempts to quickly raise/lower the processor’s current draw

• Phantom firing – increases current (at the expense of power)

• Resource throttling – reduces current (at the expense of performance)
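The sensor/actuator loop described on this slide can be sketched in a few lines. This is an illustrative model only: the threshold values, the `Action` enum, and the function name `choose_action` are assumptions made for the example and do not come from the slides.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    PHANTOM_FIRE = "phantom firing"    # raises current draw, costs power
    THROTTLE = "resource throttling"   # lowers current draw, costs performance


# Hypothetical control thresholds around a nominal 1.0 V supply.
LOW_THRESHOLD_V = 0.97
HIGH_THRESHOLD_V = 1.03


def choose_action(sensed_voltage):
    """Pick an actuator response for one on-chip voltage sensor reading."""
    if sensed_voltage >= HIGH_THRESHOLD_V:
        # Abnormally high voltage: raise the processor's current draw.
        return Action.PHANTOM_FIRE
    if sensed_voltage <= LOW_THRESHOLD_V:
        # Abnormally low voltage: lower the processor's current draw.
        return Action.THROTTLE
    return Action.NONE


for reading in (1.00, 1.04, 0.96):
    print(f"{reading:.2f} V -> {choose_action(reading).value}")
```

Either response has a cost (power or performance), which is why a loop that trips the actuator on every iteration is worth fixing in the code itself.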


Detecting Imminent Emergencies

[Figure: operating voltage range around a nominal 1V, with levels marked at 1.05V, 1.03V, 0.97V, and 0.95V and regions labeled Hard Emergency, Soft Emergency, and Control Threshold]


Targeting Mid-Frequency Di/dt

• Problematic: wide current spike

• Worst case: pulse at the resonant frequency

[Figure: two panels plotting supply voltage (V) and processor current (A) against time (cycles). One panel is annotated "20 cycles" and marks the minimum voltage; the other is annotated "60 cycles" and marks the minimum and maximum voltage. From Joseph et al., HPCA 2003.]


A di/dt Stressmark

BEGIN_LOOP:
    …
    ldt $f1, ($4)
    divt $f1, $f2, $f3
    divt $f3, $f2, $f3
    stt $f3, 8($4)
    ldq $7, 8($4)
    cmovne $31, $7, $3
    stq $3, ($4)
    stq $3, ($4)
    stq $3, ($4)
    …
    stq $3, ($4)
    …
    JUMP BEGIN_LOOP

[Annotations: the dependent ldt/divt/stt/ldq/cmovne chain is marked "Sequential, Low Power"; the run of independent stq stores is marked "Parallel, High Power"]

But… the actuator engages on every loop iteration, degrading performance

Why not correct the problem in the code?


Why use Dynamic Binary Modification?

Modify the instruction stream at run time

Much easier to react to an emergency than to predict one

Emergencies are processor dependent, but software should not be!

Enables run-time guarantees


Proposed Solution

Leverage our additional software layer to supplement existing solutions

Microarchitecture provides feedback to our software-based virtual layer

[Diagram: Executable → Binary Modifier (virtual layer, VL) → Altered Executable → Microprocessor with sensor+actuator extensions, spanning the SW/HW boundary]


Required Investigations

Characterizing emergencies

• How often do we see di/dt emergency loops?

• Are emergencies usually code-based?

Communication between the microarchitecture and the virtual layer

• What information should be passed to virtual layer during an emergency?

Fixing di/dt via binary modification

• Will existing techniques help?

• New algorithms?


Last-Executed Branch

Data suggests modifying a few code sequences will eliminate many voltage emergencies

[Chart: distinct vs. total emergencies per benchmark, on a log scale from 1 to 1,000,000. Benchmarks: gzip, vpr, gcc, mcf, crafty, parser, eon, perlbmk, gap, vortex, bzip2, twolf, wupwise, swim, art, mgrid, applu, mesa, galgel, equake, facerec, sixtrack, ammp, apsi]
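One way to gather distinct-versus-total counts like these is to tag each emergency with the address of the last-executed branch and tally the tags. The sketch below assumes that setup; the function names, the coverage parameter, and the log contents are made up for illustration.

```python
from collections import Counter


def summarize_emergencies(last_branch_addresses):
    """Return (distinct, total, counts) for a log of voltage emergencies.

    Each log entry is the address of the last branch executed before the
    emergency, used as a proxy for the code sequence that caused it.
    """
    counts = Counter(last_branch_addresses)
    return len(counts), sum(counts.values()), counts


def hottest_sites(counts, coverage):
    """Fewest branch addresses that together cover `coverage` of all emergencies."""
    total = sum(counts.values())
    covered, sites = 0, []
    for address, n in counts.most_common():
        if covered >= coverage * total:
            break
        sites.append(address)
        covered += n
    return sites


# Made-up log: one loop is responsible for most of the emergencies.
log = [0x4004F0] * 90 + [0x400A10] * 8 + [0x400C34] * 2
distinct, total, counts = summarize_emergencies(log)
print(f"{distinct} distinct sites, {total} total emergencies")
print([hex(a) for a in hottest_sites(counts, 0.95)])
```

When the distinct count is far smaller than the total, a binary modifier only has to rewrite the few sites returned by `hottest_sites` to remove most emergencies.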


Possible Compiler Optimizations

Our goal is to

• Smooth out current profile, or

• Knock pulses off of the resonant frequency

Some existing options

• Software pipelining, code motion, instruction padding

[Diagram: Executable → Binary Modifier (apply optimizations) → Altered Executable → Microprocessor with sensor+actuator extensions]


Loop Unrolling & SW Pipelining

Problematic loop: AABB AABB AABB (iterations 1, 2, 3)

Software pipelining smooths the current profile

Unrolled loop: AAAABBBB

Loop unrolling disrupts the resonance pulse

[Figure: current vs. time for the problematic, software-pipelined, and unrolled loops]
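A small numerical sketch shows why both transformations help. It models each loop's current draw as a sample sequence and measures the strength of the component that repeats at an assumed resonant period. The current levels, the resonant period of four samples, and the flat-profile model of software pipelining are all assumptions chosen for illustration.

```python
import cmath


def tone_magnitude(samples, period):
    """Normalized magnitude of the component repeating every `period` samples."""
    total = sum(x * cmath.exp(-2j * cmath.pi * k / period)
                for k, x in enumerate(samples))
    return abs(total) / len(samples)


LOW, HIGH = 1.0, 3.0     # hypothetical current draw of phases A and B
RESONANT_PERIOD = 4      # assumed: one AABB iteration matches the resonant period

problematic = [LOW, LOW, HIGH, HIGH] * 8      # AABB AABB ...
unrolled = ([LOW] * 4 + [HIGH] * 4) * 4       # AAAABBBB AAAABBBB ...
pipelined = [(LOW + HIGH) / 2] * 32           # A and B overlap: flat profile

for name, wave in [("problematic", problematic),
                   ("unrolled", unrolled),
                   ("pipelined", pipelined)]:
    print(f"{name:12s} {tone_magnitude(wave, RESONANT_PERIOD):.3f}")
```

In this model the problematic loop has a strong component at the resonant period, while the unrolled and pipelined versions have essentially none: unrolling doubles the pulse period, and pipelining removes the pulse altogether.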


Unrolling the Di/dt Stressmark

[Figure: supply voltage (0.97V to 1.02V) before and after loop unrolling. Before unrolling the trace shows one high (H) and one low (L) phase; after unrolling it shows H1, H2 and L1, L2.]


Two Case Studies

The di/dt problem – solved using hardware feedback and dynamic optimization

Heterogeneous multicore scheduling – solved using hardware feedback, OS feedback, and hints from the binary modifier


Heterogeneous Multicores

Process Heterogeneity

• Process variation

Design Heterogeneity

• Specialized processor cores

[Figure: today’s multicore designs vs. future multicore designs]


The Challenge: Scheduling

Many OSes assume identical core resources

• Bad assumption, even today (hyperthreading)

The OS may not have enough information to make the best scheduling decisions

• depending on the type of heterogeneity

Ideal process-to-core mappings change dynamically

Should this be a task for the OS alone?


Our Approach

Claim: OSes could benefit from scheduling hints

Solution: Combine historical performance data with application phase information

[Diagram: SW Applications, Binary Modifier, and OS/HW layers, connected by arrows labeled "phase information", "performance counter info; OS load", and "scheduling hints"]


Initial Investigation

Hardware configuration

• In-order core

• Out-of-order core

Software configuration

• Two applications executing (SPEC all combinations)

Migration indicators

• IPC (from HW)

• Phase (from SW)

1M cycle granularity

[Diagram: App1 and App2 running on one out-of-order core and one in-order core]


Results

1. Calculated ideal (omniscient) scheduling

2. Calculated random scheduling

3. Explored two migration heuristics

• IPC-threshold scheduling

• IPC-delta scheduling

Metric: Distance from ideal
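The slides do not define the metric precisely, so the following is only one plausible reading: for two applications on one out-of-order and one in-order core, compare the total IPC of a schedule against an omniscient per-timeslice choice. The data layout, the function names, the zero migration cost, and the sample IPC values are all assumptions.

```python
import random


def throughput(ipc, on_ooo):
    """Total IPC summed over timeslices.

    ipc[t][app][core] is the IPC of app (0 or 1) on core (0 = out-of-order,
    1 = in-order) during timeslice t. on_ooo[t] names the app placed on the
    out-of-order core in timeslice t; the other app gets the in-order core.
    """
    return sum(s[a][0] + s[1 - a][1] for s, a in zip(ipc, on_ooo))


def ideal_schedule(ipc):
    """Omniscient schedule: best mapping per timeslice, ignoring migration cost."""
    return [0 if s[0][0] + s[1][1] >= s[1][0] + s[0][1] else 1 for s in ipc]


def random_schedule(n, swap_probability, rng):
    """Start with app 0 on the out-of-order core; maybe swap at each timeslice."""
    schedule, current = [], 0
    for _ in range(n):
        if rng.random() < swap_probability:
            current = 1 - current
        schedule.append(current)
    return schedule


def distance_from_ideal(ipc, schedule):
    """Fraction of the ideal throughput that the given schedule loses."""
    best = throughput(ipc, ideal_schedule(ipc))
    return (best - throughput(ipc, schedule)) / best


# Made-up IPC table for two timeslices.
ipc = [
    [[2.0, 1.0], [1.2, 1.0]],   # timeslice 0: app 0 benefits most from out-of-order
    [[1.0, 0.9], [1.8, 0.8]],   # timeslice 1: app 1 benefits most from out-of-order
]
print(ideal_schedule(ipc))
print(f"{distance_from_ideal(ipc, [0, 0]):.1%}")
print(random_schedule(2, 0.5, random.Random(0)))
```

Under these assumptions a schedule that never migrates loses part of the ideal throughput whenever the applications change phase, which is the gap the migration heuristics try to close.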


Random Scheduling

[Chart: distance from ideal (0% to 60%) vs. probability of swapping at each 1M timeslice (10% to 100%)]


IPC Threshold Scheduling

[Chart: percent distance from ideal (0 to 30) for IPC thresholds of 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, and 2.0 IPC, with backoff settings of 0, 0.5, 1.0, 1.5, 1.9, and 2.0]


IPC Delta Scheduling

[Chart: percent distance from ideal (0 to 30) vs. migration trigger (IPC change) of 0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.75, 1, 1.25, 1.5, 1.75, and 2]
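The slides give only the name of the heuristic and its trigger axis, so the sketch below is a guess at its shape rather than the author's algorithm: request a swap whenever the monitored IPC changes by more than a trigger value between consecutive timeslices. The function name and the sample IPC trace are hypothetical.

```python
def ipc_delta_swaps(observed_ipc, trigger):
    """Return one flag per timeslice: True if a swap is requested after it.

    observed_ipc[t] is the IPC reported by hardware counters for the
    monitored application in timeslice t. A swap is requested when the IPC
    differs from the previous timeslice by more than `trigger`.
    """
    if not observed_ipc:
        return []
    swaps = [False]  # no earlier timeslice to compare the first one against
    for previous, current in zip(observed_ipc, observed_ipc[1:]):
        swaps.append(abs(current - previous) > trigger)
    return swaps


# Made-up IPC trace with one phase change between timeslices 1 and 2.
trace = [1.0, 1.05, 1.6, 1.55]
print(ipc_delta_swaps(trace, 0.2))
```

A small trigger reacts to noise and migrates too often; a large one misses real phase changes, which is the trade-off the trigger sweep explores.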


Ongoing Investigations

Varying heterogeneity

• cache sizes

• floating-point availability

Determining migration indicators

• cache misses

Combinations

Other software feedback


Symbiotic Optimization

Cross-layer approaches can be powerful

Minor changes enable communication channels

Hardware feedback and binary modification can help solve the di/dt problem

Hardware feedback and program phases can guide OS scheduling decisions

The Tortola design can also target power reduction, temperature reduction, reliability, etc.

http://www.tortolaproject.com/


Course Summary

Now you know …

Process VMs versus system VMs

Research issues in building process VMs

How to use process VMs for a variety of other research projects

Why cross-layer approaches can be powerful and how to use process VMs to get there

Thanks and enjoy the rest of your summer!
