Trace Substitution Hans Vandierendonck, Hans Logie, Koen De Bosschere Ghent University EuroPar 2003, Klagenfurt


Trace Substitution

Hans Vandierendonck, Hans Logie, Koen De Bosschere

Ghent University

EuroPar 2003, Klagenfurt


August 27, 2003 Euro-Par 2003 2

Instruction Fetch

• Wide-issue superscalar processors need to fetch multiple branches per cycle
  – IPC=8 implies fetching ~16 instructions/cycle and predicting ~3 branches/cycle
  – Multi-ported instruction cache?
• Trace cache:
  – Packs fetch groups in a trace
  – Trace tagged with PC, path, next fetch PC
  – Multiple branch predictor (MBP) predicts branch directions


The Trace Cache

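[Figure: block diagram of the fetch unit. Blocks: instruction cache, trace cache, MBP, MUX, fill unit. Signals (legend): fetch address, next address, instructions, hit/miss, pred. path, pred. trace, pred. insn; the trace cache hit signal drives the MUX select. Annotation: "only executed paths!"]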


Overview

• Observation
  – Trace cache misses are (sometimes) branch mispredictions
• Trace Substitution
  – How to make use of it
• Evaluation
  – Is it worth it?
• Conclusion


Observation

• Multiple branch predictor affects trace cache:
  – Non-perfect branch predictors reduce the trace cache hit rate
  – FIPA correlates better with TC hit rate than with MBP accuracy

TC: 16K traces, 4-way set-assoc., path associativity
MGAg, Mgshare: 12-bit history
repeat: 8Kbit hybrid, accessed 3x

[Figure: chart for gcc, vortex and avg, each with four predictors (MGAg, Mgshare, repeat, perfect). Left axis: FIPA, 0 to 16. Right axis: hit rate (%), 70% to 100%. Series: FIPA, MBP hits, TC hits.]


TC Misses Are a Tell-Tale for MBP misses

• Trace cache misses coincide with branch mispredictions, e.g.:
  – 16K-entry trace cache, 12-bit MGAg (high accuracy, low coverage):
    • 84.9% of TC misses are also MBP misses
    • 37.6% of MBP misses are also TC misses
  – 256-entry trace cache, 12-bit MGAg (low accuracy, higher coverage):
    • 25.1% of TC misses are also MBP misses
    • 55.9% of MBP misses are also TC misses
• This work: use TC misses to detect MBP misses and fix them
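The two percentages on this slide can be read as the accuracy and coverage of "TC miss" used as a detector for "MBP miss". A minimal sketch of that bookkeeping follows, assuming a per-fetch log of two booleans; the function name `detector_quality` and the sample log are hypothetical and not taken from the slides.

```python
def detector_quality(events):
    """Rate "TC miss" as a detector of "MBP miss".

    events: iterable of (tc_miss, mbp_miss) booleans, one pair per fetch.
    Returns (accuracy, coverage):
      accuracy = fraction of TC misses that are also MBP misses
      coverage = fraction of MBP misses that are also TC misses
    """
    tc_misses = mbp_misses = both = 0
    for tc_miss, mbp_miss in events:
        tc_misses += tc_miss
        mbp_misses += mbp_miss
        both += tc_miss and mbp_miss
    accuracy = both / tc_misses if tc_misses else 0.0
    coverage = both / mbp_misses if mbp_misses else 0.0
    return accuracy, coverage


# Made-up log: 4 TC misses, 8 MBP misses, 3 fetches where both occur.
events = (
    [(True, True)] * 3
    + [(True, False)] * 1
    + [(False, True)] * 5
    + [(False, False)] * 11
)
accuracy, coverage = detector_quality(events)
print(accuracy, coverage)  # 0.75 0.375
```

In these terms, the 16K-entry configuration has accuracy 84.9% and coverage 37.6%, and the 256-entry configuration has accuracy 25.1% and coverage 55.9%.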


Trace Substitution

• Assumption: TC miss implies MBP miss
  – Correlation between branches implies that some paths never occur
  – TC stores only those paths that do occur
• If the predicted path is wrong …
  – Fetch a different trace
  – Override MBP with MRU trace starting at fetch PC
    • Detect MRU trace from LRU bits stored in TC
    • No trace substitution applied if it does not exist
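The lookup rule above can be sketched in software as follows. This is an illustrative model under simplifying assumptions, not the authors' hardware: each fetch PC gets its own group of ways (a real trace cache indexes sets by address bits), a path is encoded as a string of T/N branch outcomes, and the names `TraceCache`, `fill` and `fetch` are hypothetical.

```python
from collections import OrderedDict


class TraceCache:
    """Toy trace cache with path associativity and MRU substitution."""

    def __init__(self, ways=4):
        self.ways = ways
        # fetch PC -> OrderedDict(path -> trace), most recently used last
        self.sets = {}

    def fill(self, pc, path, trace):
        """Fill unit: store an executed trace, evicting the LRU entry if full."""
        entries = self.sets.setdefault(pc, OrderedDict())
        entries[path] = trace
        entries.move_to_end(path)
        if len(entries) > self.ways:
            entries.popitem(last=False)  # drop the least recently used trace

    def fetch(self, pc, predicted_path):
        """Return (outcome, trace) for a fetch at pc with the MBP's path."""
        entries = self.sets.get(pc)
        if not entries:
            # No trace starts at this PC: no substitution, use the I-cache.
            return "icache", None
        if predicted_path in entries:
            entries.move_to_end(predicted_path)
            return "hit", entries[predicted_path]
        # TC miss on the predicted path: override the MBP with the MRU trace.
        mru_path = next(reversed(entries))
        return "substituted", entries[mru_path]


tc = TraceCache(ways=4)
tc.fill(0x1000, "TNT", ["i0", "i1"])
tc.fill(0x1000, "TTN", ["i0", "i2"])
print(tc.fetch(0x1000, "TTN"))  # ('hit', ['i0', 'i2'])
print(tc.fetch(0x1000, "NNN"))  # ('substituted', ['i0', 'i2'])
print(tc.fetch(0x2000, "T"))    # ('icache', None)
```

Whether a substituted trace should also update the recency order is not stated on the slide; the sketch leaves the order unchanged, since the substituted trace is already the most recently used one.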


Implementation

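[Figure: block diagram of the fetch unit with trace substitution. Blocks: instruction cache, trace cache, MBP, MUX, fill unit. Signals (legend): fetch address, next address, instructions, hit/miss, pred. path, pred. trace, pred. insn. Additional trace cache outputs: MRU and MRU hit, alongside hit and the MUX select.]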


Evaluation Setup

• Benchmarks
  – SPECint95 (except compress, go), reference inputs
  – 500 million instructions from start of program
  – Compiled for Alpha ISA, Compaq C compiler, -O4
• Fetch Unit
  – TC: 1 trace = 16 instructions, 3 cond. branches, trace ends at system call, indirect jump
  – TC: 4-way set-assoc., path associativity
  – MBP: MGAg, varying history length
  – Instruction cache: 32K, 2-way, 32-byte blocks, LRU
• Metric
  – FIPA = fetched instructions per fetch unit access
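As a small worked example of the metric, FIPA is the mean number of instructions delivered per fetch unit access. The sketch below assumes a list of per-access instruction counts; the function name `fipa` and the sample counts are hypothetical, and the slide does not say how wrong-path fetches are counted, so the code simply averages whatever counts it is given.

```python
def fipa(instructions_per_access):
    """Fetched instructions per fetch unit access (mean over all accesses)."""
    counts = list(instructions_per_access)
    return sum(counts) / len(counts) if counts else 0.0


# Four accesses: two full 16-instruction traces, one shorter trace,
# and one short instruction cache fetch (made-up numbers).
sample = [16, 12, 4, 16]
print(fipa(sample))  # 12.0
```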


Evaluation (1)

• Observations:
  – Gap between MGAg and perfect prediction increases with TC size
  – 20-40% of gap filled with trace substitution
  – Substitution applies only on a TC miss, thus the performance increase drops with TC size

TC: 4-way set-associative
MGAg: 12-bit history

[Figure: FIPA (axis 8 to 14) versus trace cache size in traces (64, 256, 1024, 4096, 16384). Series: perfect, MGAg+subst, MGAg.]
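One plausible reading of "gap filled" is the share of the FIPA difference between MGAg and the perfect predictor that substitution recovers. The sketch below assumes that reading; the function name `gap_filled` and the FIPA values are illustrative and are not read off the chart.

```python
def gap_filled(fipa_base, fipa_subst, fipa_perfect):
    """Fraction of the base-to-perfect FIPA gap recovered by substitution."""
    gap = fipa_perfect - fipa_base
    if gap <= 0:
        return 0.0  # no gap to fill
    return (fipa_subst - fipa_base) / gap


# Illustrative values only (not measured data).
fraction = gap_filled(fipa_base=10.0, fipa_subst=10.75, fipa_perfect=12.5)
print(f"{fraction:.0%}")  # 30%
```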


Evaluation (2)

• Observations:
  – Compensates for a poor branch predictor
  – No history with substitution ~ MGAg with 10-bit history
  – Improvement drops with more accurate predictor

TC: 256 traces, 4-way

[Figure: FIPA (axis 8.0 to 12.0) versus branch history length (0 to 16). Series: MGAg+subst, MGAg.]


Accuracy vs. Usage

• Definitions:
  – Usage = substitutions per fetch unit access
  – Accuracy = fraction of correct substitutions
• Note
  – Accuracy limited because correct-path trace is not always present!

[Figure: fraction of accesses (axis 0% to 45%) versus branch history length (0 to 16). Series: Usage, Accuracy. TC: 256 traces, 4-way.]
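The two definitions can be computed from three simulator counters. This is a minimal sketch with hypothetical counter names and made-up counts, not figures from the evaluation.

```python
def substitution_stats(accesses, substitutions, correct_substitutions):
    """Return (usage, accuracy) as defined on the slide.

    usage    = substitutions per fetch unit access
    accuracy = fraction of substitutions that fetched the correct path
    """
    usage = substitutions / accesses if accesses else 0.0
    accuracy = correct_substitutions / substitutions if substitutions else 0.0
    return usage, accuracy


# Made-up counters: 1000 accesses, 50 substitutions, 20 of them correct.
usage, accuracy = substitution_stats(1000, 50, 20)
print(usage, accuracy)  # 0.05 0.4
```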


Conclusion

• Proposed trace substitution
  – TC miss flags MBP miss
    • Not always correct, not all MBP misses found
    • Fetch MRU trace instead: cheap implementation
• Results in
  – Consistent performance improvement
    • No history+substitution ~ MGAg with 10-bit history
    • In other cases: gain of 0.2 instructions/access, or the same performance as with a 16 times smaller MBP
• Most effective when MBP or TC is small