new methods for exploiting program structure and behavior
TRANSCRIPT
New Methods forExploiting Program Structure and Behavior
in Computer Architecture
Guri Sohi
Computer Sciences DepartmentUniversity of Wisconsin-Madison
Contributors: Andreas Moshovos, Avinash Sodani,Amir Roth, Andy Glew, Craig Zilles, Harit Modi, Adam Butts,
Param Oberoi
r Architecture Slide2
s
ionships
Ou
• G
• C
• E
• P
• E
• H
New Methods for Exploiting Program Structure and Behavior in Compute
tline
rowth in processor performance
oncepts and performance refiners
xisting basis: observed events
otential future basis: causal relationship
xamples
andling information about causal relat
r Architecture Slide3
ckler
96 97 98 99
66
PPro/150
II/300PentiumIII/600
GroR
ela
tive I
nte
ger P
erfo
rm
an
ce
New Methods for Exploiting Program Structure and Behavior in Compute
wth in Microprocessor Performance
Thanks to Doug Burger and Steve Ke
80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95Date of First Volume Shipments
0.1
1
10
100
1000PerformanceClock RatePerformance/Clock Rate
40%
52%
8088/8
80286/10
386/16486/25
Pentium/
Pentium
r Architecture Slide4
ncepts
s
Co
• S
• A
• C
New Methods for Exploiting Program Structure and Behavior in Compute
ntributors to Performance Growth
emiconductor technology
rchitectural and microarchitectural coO memory hierarchiesO pipeliningO out-of-order execution with speculationO speculative multithreading
oncept Enablers/Performance RefinerO novel cache elementsO cache managementO branch predictorsO dependence predictors
r Architecture Slide5
re Role
parameters
future actions
PP
xecutionardware
Interface
Sof• In
• H
New Methods for Exploiting Program Structure and Behavior in Compute
tware/Hardware Interface and Hardwaterface is imperative (vs. declarative)
+ results in repeatable execution state- limited flexibility for changing technology
ardware tries to determine program’s
APP APP A
Compiler
? PerformanceRefiners
EH
Imperative
r Architecture Slide6
ions
ram’s
to learn
Ha
• H
• Ue
• La
New Methods for Exploiting Program Structure and Behavior in Compute
rdware Performance Refiners
ardware tries to learn program’s intentO Observes program actions (statistical)O Uses other information
ses information to keep ahead of progxecution
O make predictions about future outcomes
imited amount of hardware limits abilitybout events
r Architecture Slide7
outputsvaluesrns
based onerties
old all the time
uture actions
Ob
New Methods for Exploiting Program Structure and Behavior in Compute
served Events: Secondary Information
Secondary information: instruction inputs/O Examples: branch outcomes, addresses, O Properties: spatial/temporal locality, patte
Current mechanisms almost exclusivelysecondary information and its prop
Problem I: secondary properties may not h
Problem II: Hard to determine program’s f
r Architecture Slide8
mongst
program
, causalecture?
outcomes!
Ob
• Po
• Pb
• Cre
• Gm
New Methods for Exploiting Program Structure and Behavior in Compute
servations
rograms have structure (relationships aperations)
rogram structure causes the observedehavior
an we exploit primary information, i.e.lationships in architecture/microarchit
ood model for predicting about futureight be selected parts of the program
r Architecture Slide9
t operationsependences
t (strong)ary behavior
Prim
New Methods for Exploiting Program Structure and Behavior in Compute
ary Information: Program Structure
Primary information: relationships amongsO Examples: control dependences, data dO Properties:
- temporal stability: program is invarian- causality: causes all observed second
r Architecture Slide10
machines
memory
Pro
• B
• S
• In
• C
• P
New Methods for Exploiting Program Structure and Behavior in Compute
blems Considered
ranch Prediction
cheduling memory operations in OOO
ter-operation communication through
ommunication in multiprocessors
refetching linked data structures
r Architecture Slide11
tion (taken or
ous branch
branch outcomes
a cause
tion? 98), Roth
Pro
• Pn
• So
• Yo
• C
New Methods for Exploiting Program Structure and Behavior in Compute
blem: Branch Prediction
redict the outcome of a branch instrucot taken)
mith 2-bit predictor: learns about previutcomes
O observes outcomes of single branch
eh and Patt-type adaptive predictors: utcomes correlated with other branch
O learn branch correlationsO Still exploiting an observation, rather than
an we exploit primary (causal) informaO Yes, but not in this talk (e.g., Farcy (MICRO
(ICS’99))
r Architecture Slide12
ns of/stores
tore
if incorrect speculation
Pro
• Oin
• W
New Methods for Exploiting Program Structure and Behavior in Compute
blem: Scheduling Memory Operations
OO instruction scheduler has collectiostructions to schedule, including loads
hen to schedule a load instruction?O when it is not dependent on a pending s
- may be too lateO speculate no dependence, and recover
- need performance refiner to improveaccuracy
r Architecture Slide13
I1ST1I3I4
ST2
LD1I7
LD2I9
Ideal
n blindideal
De
C
New Methods for Exploiting Program Structure and Behavior in Compute
pendence Speculation
I1
ST1
I3
I4
ST2
LD1
I7
LD2
I9
I1ST1I3I4
ST2
LD1I7
I9
LD1I7
LD2
I1ST1I3I4
ST2LD1I7
LD2I9
Blind Selective
STA
LL
ode
• Selective may perform worse tha• Can also perform as well as the • In practice:
performance behavior varies
ST2
I4
I3
r Architecture Slide14
able
ze?
② Synchronize
LDPC STPC PRED
STPC
lations
Lea
• D
• T
• U
New Methods for Exploiting Program Structure and Behavior in Compute
rning Dependence Relationships
ependence: (Load PC, Store PC)
emporal locality - Small Working Set.
se a small table to:
Memory Dependence Prediction T
LD
ST
LDPC STPC
① Misspeculation
② Allocate entry
LDPC STPC PRED
LD
① Execute?
② No! Synchronize
LDPC STPC PRED
LDPC
LD
① Synchroni
ST
(1). track recent mis-specu
(2). Predict dependences
r Architecture Slide15
loads
d
nces and use to
Emer 1998
Sol
New Methods for Exploiting Program Structure and Behavior in Compute
ution Summary
Solution1: use prediction to stall offendingO no program structure information requireO not very effective
More Hesson, et. al., 1997
Solution 2: determine store-load dependesynchronize speculation
O use program structureO very effective
More Moshovos, et. al., 1997, Chrysos and
r Architecture Slide16
Memory
storevalue
store addr
load
load addr
eline
ad: Direct Link
store2 addr
Delays
ProP
rogr
am O
rder
De
New Methods for Exploiting Program Structure and Behavior in Compute
blem: Communicating Values Through
dela
y
Tim
Store - Lo
Implicit
1. Calculate address2. Establish Dependence
?
addr
ess
valu
e
Memory
store
load
store 2
Instructions
Explicit
lays: No
Implicit
Explicit
address
r Architecture Slide17
nto explicit
k: synonym
MemoryHierarchy
Traditional
ss
ss
Spe
Dyn
• D
• Sp
s
Sy
New Methods for Exploiting Program Structure and Behavior in Compute
culative Memory Cloaking
amically & Transparently convert implicit i
ependence prediction ➔ direct load-store lin
eculative and has to be eventually verified
tore PC load PC synonymDependence Prediction
value f/e
nonym File
addre
addre
1
2
3
4speculative
verify
value
Timeline
storevalue
addr
addr
load
r Architecture Slide18
ation deviceds.ertain loads
lue along link
Me
New Methods for Exploiting Program Structure and Behavior in Compute
mory Cloaking Summary
Program structure: Memory is a communicfor passing values from stores to loaNot random: only certain stores to c
Speculative Memory CloakingLink stores to loads explicitly, pass va
Store
Load
Value
Value/Addr
Value/Addr
MemoryValue
r Architecture Slide19
ed for passing another (USE).it directly)
-USE links into
CRO-30
ueMemory
Spe
New Methods for Exploiting Program Structure and Behavior in Compute
culative Memory Bypassing Summary
Program structure: Loads and stores are usvalues from one instruction (DEF) toVia memory? (maybe not, can do
Speculative Memory BypassingCollapse DEF-store, store-load, loada direct DEF-USE link
More: Moshovos & Sohi, Tyson & Austin, MI
Store
Load
Val
DEF
USE
Value
Value
Value Value
r Architecture Slide20
emory MP’s
g patterns
utes
ig)
y of program,
(small)er
Pro
New Methods for Exploiting Program Structure and Behavior in Compute
blem: Fast Communication in Shared m
Problem: Optimize CC protocols for sharin
So far: Detect patterns using address attribO Track state proportional in size to data (bO Little predictive power
Program structure: Sharing pattern propertnot data
Detect using instruction relationshipsO Track state proportional in size to programO Great predictive power, works much bett
More: [Kaxiras, PhD Thesis, HPCA99]
r Architecture Slide21
es
d
es, OO-progs
; l = l->next )
ss(l); == key)
r Load
refetch
y...
Pro
P
S
A
New Methods for Exploiting Program Structure and Behavior in Compute
blem: Prefetching Linked Data Structur
Linked Data Structures (LDS): pointer-baseO Lists, trees, graphs, etc.O Prevalent: simulators, compilers, databas
for (l = list; l
proceif (l->keykey
next
A B C
Pointe
ointer Chasing Problem
olution: Make sure latencies are short → P
→ Pointer loads serialized
keynext
keynext
→ Latencies add
s if memory latency wasn’t a problem alread
r Architecture Slide22
ible
Pot
New Methods for Exploiting Program Structure and Behavior in Compute
ential Solution
v /Pre.fetch/ := Issue loads as early as poss(as soon as address is ready)
First reaction: Try to predict addressesO Array: See A[0], A[1], A[2] → predict A[3]O LDS: See A, B, C → predict ?
r Architecture Slide23
producing
pendences,
tches
99]
Me
New Methods for Exploiting Program Structure and Behavior in Compute
chanics I - Overview
Observation: some piece of code must beaddresses/traversing structure
Step 1. Examine running program, learn deisolate code thread
Step 2. Pre-execute code to launch prefe
More: [Roth, Moshovos & Sohi, ASPLOS-8, ISCA
r Architecture Slide24
extext
l->keyl = l->nextCan go
(static) loads
ext
d outputs
ged to do this
nt load inputs
Me
E
New Methods for Exploiting Program Structure and Behavior in Compute
chanics II - Learning Dependences
l = l->nl = l->nDone
Output Load
l = l->next
l->key
l = l->nextA B
B
B
C
stablish (dynamic) dependence betweenTi
me
/Pro
gra
m O
rde
r
B l = l->n
1. Buffer recent loa
Use values exchan
2. Compare curre
r Architecture Slide25
ss
D-Cache
t
le
C->next
Me
New Methods for Exploiting Program Structure and Behavior in Compute
chanics III - Prefetching
2. Compute prefetch addre
l = l->nextl = l->next
l = l->nexl->key
Done Can goB
l = l->next
l->key
l = l->next
l->key
B C
C
C
D
Tim
e/P
rog
ram
Ord
er
1. Access dependence tab
r Architecture Slide26
ram structure
, managed,nce refiners?
Sum
• Sin
• Ha
• H
New Methods for Exploiting Program Structure and Behavior in Compute
mary and More Questions
everal applications for very limited progformation
ow should this information be gatherednd provided to the hardware performa
O described techniques used hardwareO software knows this information trivially
ow can software get involved?O 4 models
r Architecture Slide27
ormation.nere possible with
Mo
• HR
New Methods for Exploiting Program Structure and Behavior in Compute
del 1 for Software Involvement
ardware extracts program structure infepresents internally in declarative man
- Likely to have limited abilities; much morsoftware
+ may be only practical option
r Architecture Slide28
tion available
r hardware
e
are platformses of changes in
Mo
• Str
• U
• E
• D
• B
New Methods for Exploiting Program Structure and Behavior in Compute
del 2 for Software Involvement
oftware has program structure informaivially
se information to develop mandate fo
xpress mandate in imperative languag
oes not work with a collection of hardwO balance keeps shifting with disparate rat
technology
een there, done that
r Architecture Slide29
hardware
on
atable).re (not
Mo
• E
• S
• A
New Methods for Exploiting Program Structure and Behavior in Compute
del 3 for Software Involvement
xpress computation as a dataflow grapO dependence relationships available to h
hould this be done? NOO No repeatable state for program executi
dvantages of repeatable stateO Easy to write debug software (debO Easy to design, build, verify hardwa
debatable
r Architecture Slide30
ner
mation in a
Mo
• E
• Pd
• O
New Methods for Exploiting Program Structure and Behavior in Compute
del 4 for Software Involvement
xpress computation in imperative manO repeatable state
rovide advisory program structure inforeclarative manner
verheads for doing soO provide information judiciously
r Architecture Slide31
uired?
rmation?
re?
Res
New Methods for Exploiting Program Structure and Behavior in Compute
earch Issues
For a particular optimizationO What program structure information is reqO How do we represent this information?O How do we collect and manage this info
More broadlyO Where can we apply program structure?O Is there a larger framework?O What is general purpose program structuO Implementations?
r Architecture Slide32
tionwkward
Res
New Methods for Exploiting Program Structure and Behavior in Compute
each Issues II: Implementations
Hardware only
Software to hardwareO Compiler has all kinds of program informaO How to express it? Instruction-like things aO Where and how much to express?
Software/hardware hybrids
r Architecture Slide33
tionsfuture
on observed
ed programre actions
rs
veying,ation
Sum
• A(pp
• Ce
• Fst
• S
• Sm
New Methods for Exploiting Program Structure and Behavior in Compute
mary
rchitectural/microarchitectural innovaerformance refiners) will be critical to
rocessor performance growth
urrent performance refiners are basedvents
uture performance enablers likely to neructure to reason about program’s futu
everal examples presented; many othe
everal research issues in gathering, conaintaining, and managing such inform