just a new case for geometric history length predictors

54
1 André Seznec Caps Team IRISA/INRIA Just a new case for geometric history length predictors André Seznec and Pierre Michaud IRISA/INRIA/HIPEAC

Upload: miette

Post on 04-Jan-2016

27 views

Category:

Documents


3 download

DESCRIPTION

Just a new case for geometric history length predictors. André Seznec and Pierre Michaud IRISA/INRIA/HIPEAC. Why branch prediction ?. Processors are deeply pipelined: More than 30 cycles on a Pentium 4 Processors execute a few instructions per cycle: 3-4 on current superscalar processors - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Just a new case  for geometric history length predictors

1

André Seznec Caps Team

IRISA/INRIA

Just a new case for geometric history length predictors

André Seznec and Pierre Michaud

IRISA/INRIA/HIPEAC

Page 2: Just a new case  for geometric history length predictors

2

André SeznecCaps Team

Irisa

Why branch prediction ?

Processors are deeply pipelined: More than 30 cycles on a Pentium 4

Processors execute a few instructions per cycle: 3-4 on current superscalar processors More than 100 instructions are inflight in a processor

≈ 10 % of instructions are branches (mostly conditional) Effective direction/target known very late:

• To avoid delays, predict direction/target

• Backtrack as soon as a misprediction is detected

Page 3: Just a new case  for geometric history length predictors

3

André SeznecCaps Team

Irisa

Why (essentially) predicting the direction of conditional branches

Most branchs are conditional: PC relative address are known/computed at decode time:

• Correction early in the pipeline

Effective direction is computed at execution time• Correction very late in the pipeline

Other branchs: Returns: easy with a return stack Indirect jumps: same penalty as conditional branches on

mispredictions

3 cycles lost

30 cycles lost

Page 4: Just a new case  for geometric history length predictors

4

André SeznecCaps Team

Irisa

Dynamic branch prediction J. Smith 1981

Predict the same direction as the last time: Use of a table of 1-bit counter indexed by the instruction

address Table of 2bit counters: hysteresis to enhance predictor behaviors

0123

0000

11 1 1

Page 5: Just a new case  for geometric history length predictors

5

André SeznecCaps Team

Irisa

Exploiting branch historyYeh and Patt 91, Pan et al 91

Autocorrelation: Local history

• the past behavior of this particular branch

Intercorrelation: Global history

• the (recent) past behavior of all branches

Page 6: Just a new case  for geometric history length predictors

6

André SeznecCaps Team

Irisa

Global history predictor gshare, McFarling 93

PC

PHT

2n 2 bits counters

histo L bits

n bits

xor

prediction

Prediction is inserted in the history to predict the next branch

A representationof the recent path

Page 7: Just a new case  for geometric history length predictors

7

André SeznecCaps Team

Irisa

Local history predictor

2m L bits

local history

PHT

2n 2 bits counters

histo L bits

Prediction

PC

m bits

Page 8: Just a new case  for geometric history length predictors

8

André SeznecCaps Team

Irisa

And of course: hybrid predictors !!

Take a few distinct branch predictors

Select or compute the final prediction

Page 9: Just a new case  for geometric history length predictors

9

André SeznecCaps Team

Irisa

EV8 predictor: (derived from) (2Bc-gskew)Seznec et al, ISCA 2002

e-gskewMichaud et al 97

Page 10: Just a new case  for geometric history length predictors

10

André SeznecCaps Team

Irisa

Perceptron predictorJimenez and Lin 2001

∑Sign=prediction

X

Page 11: Just a new case  for geometric history length predictors

11

André SeznecCaps Team

Irisa

Qualities required on a branch predictor

State-of-the-art accuracy: 2bcgskew was state-of-art on most applications, but was

lagging behind neural inspired predictors on a few.

Reasonable implementation cost: Neural inspired are not really realistic, (huge number of

adders), many access to the tables

In-time response

Page 12: Just a new case  for geometric history length predictors

12

André SeznecCaps Team

Irisa

Speculative history must be managed !?

Global history: Append a bit on a single history register Use of a circular buffer and just a pointer to speculatively

manage the history

Local history: table of histories (unspeculatively updated) must maintain a speculative history per inflight branch:

• Associative search, etc ?!?

Page 13: Just a new case  for geometric history length predictors

13

André SeznecCaps Team

Irisa

From my experience with Alpha EV8 team

Designers hate maintaining

speculative local history

Page 14: Just a new case  for geometric history length predictors

14

André SeznecCaps Team

Irisa

L(0) ?

L(4)

L(3)

L(2)L(1)

TOT1

T2

T3

T4

The basis : A Multiple length global history predictor

Page 15: Just a new case  for geometric history length predictors

15

André SeznecCaps Team

Irisa

Underlying idea

H and H’ two history vectors equal on L bitse.g. L <L(2)

Branches (A,H) and (A,H’) biased in opposite directions

Table T2 allows to discriminatebetween (A,H) and (A,H’)

Page 16: Just a new case  for geometric history length predictors

16

André SeznecCaps Team

Irisa

Selecting between multiple predictions

Classical solution: Use of a meta predictor

“wasting” storage !?! chosing among 5 or 10 predictions ??

Neural inspired predictors: Use an adder tree instead of a meta-predictorJimenez and Lin 2001

Partial matching: Use tagged tables and the longest matching historyChen et al 96, Eden and Mudge 1998, Michaud 2005

Page 17: Just a new case  for geometric history length predictors

17

André SeznecCaps Team

Irisa

L(0) ∑

L(4)

L(3)

L(2)L(1)

TOT1

T2

T3

T4

Final computation through a sum

Prediction=Sign

Page 18: Just a new case  for geometric history length predictors

18

André SeznecCaps Team

Irisa

pc h[0:L1]

ctru

tag

hash hash

=?

ctru

tag

hash hash

=?

ctru

tag

hash hash

=?

prediction

pc pc h[0:L2] pc h[0:L3]

11 1 1 1 1 1

1

1

Final computation through partial match

Tagless base predictor

Page 19: Just a new case  for geometric history length predictors

19

André SeznecCaps Team

Irisa

From “old experience” on 2bcgskew and EV8

Some applications benefit from 100+ bits histories

Some don’t !!

Page 20: Just a new case  for geometric history length predictors

20

André SeznecCaps Team

Irisa

GEometric History Length predictor

L(1)1iαL(i)

0 L(0)

The set of history lengths forms a geometric series

{0, 2, 4, 8, 16, 32, 64, 128}

What is important: L(i)-L(i-1) is drastically increasing

Spends most of the storage for short history !!

Page 21: Just a new case  for geometric history length predictors

21

André SeznecCaps Team

Irisa

Tree adder: O-GEHL: Optimized Geometric History Length

predictor

Partial match: TAGE: Tagged Geometric History Length predictor

• Inspired from PPM-like, Michaud 2004

Page 22: Just a new case  for geometric history length predictors

22

André SeznecCaps Team

Irisa

O-GEHL update policy

Perceptron inspired threshold based

Perceptron-like threshold did not work

Reasonable fixed threshold= Number of tables

Page 23: Just a new case  for geometric history length predictors

23

André SeznecCaps Team

Irisa

Dynamic update threshold fitting

On an O-GEHL predictor, best threshold depends on the application the predictor size the counter width

By chance, on most applications, for the best fixed threshold,

updates on mispredictions ≈ updates on correct predictions

Monitor the difference

and adapt the update threshold

Page 24: Just a new case  for geometric history length predictors

24

André SeznecCaps Team

Irisa

Adaptative history length fitting (inspired by Juan et al 98)

(½ applications: L(7) < 50) ≠

(½ applications: L(7) > 150)

Let us adapt some history lengths to the behavior of each application

8 tables: T2: L(2) and L(8) T4: L(4) and L(9) T6: L(6) and L(10)

Page 25: Just a new case  for geometric history length predictors

25

André SeznecCaps Team

Irisa

Adaptative history length fitting (2)

Intuition: if high degree of conflicts on T7, stick with short

history

Implementation: monitoring of aliasing on updates on T7 through a

tag bit and a counter

Simple is sufficient: Flipping from short to long histories and vice-versa

Page 26: Just a new case  for geometric history length predictors

26

André SeznecCaps Team

Irisa

Evaluation framework

1st Championship Branch Prediction traces:

20 traces including system activity

Floating point apps : loop dominated

Integer apps: usual SPECINT

Multimedia apps

Server workload apps: very large footprint

Page 27: Just a new case  for geometric history length predictors

27

André SeznecCaps Team

Irisa

Reference configurationpresented for 2004 Championship Branch Prediction

8 tables: 2 Kentries except T1, 1Kentries 5 bit counters for T0 and T1, 4 bit counters

otherwise 1 Kbits of one bit tags associated with T7

10K + 5K + 6x8K + 1K = 64K

L(1) =3 and L(10)= 200 {0,3,5,8,12,19,31,49,75,125,200}

Page 28: Just a new case  for geometric history length predictors

28

André SeznecCaps Team

Irisa

The OGEHL predictor

2nd at CBP: 2.82 misp/KI

Best practice award: The predictor the closest to a possible hardware

implementation Does not use exotic features:

• Various prime numbers, etc• Strange initial state

Very short warming intervals: Chaining all simulations: 2.84 misp/KI

Page 29: Just a new case  for geometric history length predictors

29

André SeznecCaps Team

Irisa

A case for the OGEHL predictor (2)

High accuracy 32Kbits (3,150): 3.41 misp/KI

better than any implementable 128 Kbits predictor before CBP• 128 Kbits 2bcgkew (6,6,24,48): 3.55 misp/KI• 176 Kbits PBNP (43) : 3.67 misp/KI

1Mbits (5,300): 2.27 misp/KI• 1Mbit 2bcgskew (9,9,36,72): 3.19 misp/KI• 1888 Kbits PBNP (58): 3.23 misp/KI

Page 30: Just a new case  for geometric history length predictors

30

André SeznecCaps Team

Irisa

Robustness of the OGEHL predictor (3)

Robustness to variations of history lengths choices: L(1) in [2,6], L(10) in [125,300] misp. rate < 2.96 misp/KI

Geometric series: not a bad formula !! best geometric L(1)=3, L(10)=223, 2.80 misp/KI best overall {0, 2, 4, 9, 12, 18, 31, 54, 114, 145, 266} 2.78

misp/KI

Page 31: Just a new case  for geometric history length predictors

31

André SeznecCaps Team

Irisa

OGEHL scalability

4 components — 8 components• 64 Kbits: 3.02 -- 2.84 misp/KI• 256Kbits: 2.59 -- 2.44 misp/KI• 1Mbit: 2.40 -- 2.27 misp/KI

6 components — 12 components• 48 Kbits: 3.02 – 3.03 misp/KI• 768Kbits: 2.35 – 2.25 misp/KI

4 to 12 components bring high accuracy

Page 32: Just a new case  for geometric history length predictors

32

André SeznecCaps Team

Irisa

Impact of counter width

Robustness to counter width variations: 3-bit counter, 49 Kbits: 3.09 misp/KI

Dynamic update threshold fitting helps a lot

5-bit counter 79 Kbits: 2.79 misp/KI

4-bit is the best tradeoff

Page 33: Just a new case  for geometric history length predictors

33

André SeznecCaps Team

Irisa

PPM-like predictor: 3.24 misp/KI

but ..

The update policy was poor

Page 34: Just a new case  for geometric history length predictors

34

André SeznecCaps Team

Irisa

TAGE update policy

General principle:

Minimize the footprint of the prediction.

Just update the longest history

matching component and allocate at most one entry on mispredictions

Page 35: Just a new case  for geometric history length predictors

35

André SeznecCaps Team

Irisa

A tagged table entry

Ctr: 3-bit prediction counter U: 2-bit useful counter

Was the entry recently useful ? Tag: partial tag

Tag CtrU

Page 36: Just a new case  for geometric history length predictors

36

André SeznecCaps Team

Irisa

=? =? =?

11 1 1 1 1 1

1

1

Hit

Hit

Altpred

Pred

Miss

Page 37: Just a new case  for geometric history length predictors

37

André SeznecCaps Team

Irisa

Updating the U counter

If (Altpred ≠ Pred) then• Pred = taken : U= U + 1• Pred ≠ taken : U = U - 1

Graceful aging:Periodic reset of 1 bit in all counters

Page 38: Just a new case  for geometric history length predictors

38

André SeznecCaps Team

Irisa

Allocating a new entry on a misprediction

Find a single Unuseful entry with a longer history: Priviledge the smallest possible history

• To minimize footprint But not too much

• To avoid ping-pong phenomena

Initialize Ctr as weak and U as zero

Page 39: Just a new case  for geometric history length predictors

39

André SeznecCaps Team

Irisa

TAGE update policy

+ a few other tricks

The update is not on the critical path

Page 40: Just a new case  for geometric history length predictors

40

André SeznecCaps Team

Irisa

Tradeoffs on tag width

Too narrow: Lots of ( destructive) false match

Too wide: Losing storage space

Tradeoff: 11 bits for 8 tables 9 bits for 5 tables

can be tuned on a per table basis: (destructive) false match is better tolerated on shorter history

Page 41: Just a new case  for geometric history length predictors

41

André SeznecCaps Team

Irisa

TAGE vs OGEHL

8 comp. OGEHL

64K: 2.84 misp/KI128K: 2.54 misp/KI256K: 2.43 misp/KI512K: 2.33 misp/KI1M : 2.27 misp/KI

8 comp. TAGE

64K: 2.58 misp/KI128K: 2.38 misp/KI256K: 2.23 misp/KI512K: 2.12 misp/KI1M: 2.05 misp/KI

5 comp. TAGE

64K: 2.70 misp/KI128K: 2.45 misp/KI256K: 2.28 misp/KI512K: 2.19 misp/KI1M: 2.12 misp/KI

Page 42: Just a new case  for geometric history length predictors

42

André SeznecCaps Team

Irisa

Partial tag matching is more effective than adder tree

At equal table numbers and equal history lengths: TAGE better than OGEHL on each of the 20 benchmarks

Accuracy limit ( with very high storage budget) is higher on TAGE Additions leads to averaging and information loss With tags, no information is lost

Page 43: Just a new case  for geometric history length predictors

43

André SeznecCaps Team

Irisa

Prediction computation time ?

3 successive steps: Index computation: a few entry XOR gate Table read Adder tree

May not fit on a single cycle: But can be ahead pipelined !

Page 44: Just a new case  for geometric history length predictors

44

André SeznecCaps Team

Irisa

Ahead pipelining a global history branch predictor (principle)

Initiate branch prediction X+1 cycles in advance to provide the prediction in time Use information available:

• X-block ahead instruction address• X-block ahead history

To ensure accuracy: Use intermediate path information

Page 45: Just a new case  for geometric history length predictors

45

André SeznecCaps Team

Irisa

Practice

Ahead OGEHL or TAGE:8 // prediction computations

bcd

Ha

A

A B C D

Page 46: Just a new case  for geometric history length predictors

46

André SeznecCaps Team

Irisa

Ahead Pipelined 64 Kbits OGEHL or TAGE

3-block 64Kbits TAGE: 2.70 misp/KI vs 2.58 misp/KI

3-block 64Kbits OGEHL:2.94 misp/KI vs 2.84 misp/KI

Not such a huge accuracy loss

Page 47: Just a new case  for geometric history length predictors

47

André SeznecCaps Team

Irisa

And indirect jumps ?

TAGE principles can be adapted to predict indirect jumps:

“A case for (partially) tagged branch predictors”, JILP Feb. 2006

Page 48: Just a new case  for geometric history length predictors

48

André SeznecCaps Team

Irisa

A final case for the Geometric History Length predictors

delivers state-of-the-art accuracy uses only global information:

Very long history: 200+ bits !! can be ahead pipelined many effective design points

OGEHL or TAGE Nb of tables, history lengths

prediction computation logic complexity is low(compared with concurrent predictors)

Page 49: Just a new case  for geometric history length predictors

49

André SeznecCaps Team

Irisa

The End

Page 50: Just a new case  for geometric history length predictors

50

André SeznecCaps Team

Irisa

BACK UP

Page 51: Just a new case  for geometric history length predictors

51

André SeznecCaps Team

Irisa

Improved the global history

Address + conditional branch history: path confusion on short histories

Address + path: Direct hashing leads to path confusion

1. Represent all branches in branch history

2. Use also path history ( 1 bit per branch, limited to 16 bits)

Page 52: Just a new case  for geometric history length predictors

52

André SeznecCaps Team

Irisa

Hashing 200+ bits for indexing !!

Need to compute 11 bits indexes : Full hashing is unrealistic

1. Just regularly pick at most 33 bits in:

address+branch history +path history

2. A single 3-entry exclusive-OR stage

Page 53: Just a new case  for geometric history length predictors

53

André SeznecCaps Team

Irisa

Piecewise linear -- OGEHL (1) accuracy

My own best implementation of piecewise linear: Shares counters for the same history rank but uses only power of two numbers of entries in tables

64 Kbits: H=31, 256 counters per rank, x 32 sums 3.72 misp/KI ------------ 2.84 misp/KI

1 MBit : H=63, 2048 counters per rank, x 32 sums 2.64 misp/KI ------------ 2.27 misp/KI

48 Mbit: H=95, 64K counters per rank, x 32 sums 2.30 misp/KI

Page 54: Just a new case  for geometric history length predictors

54

André SeznecCaps Team

Irisa

Piecewise linear – OGEHL-TAGE (2) hardware complexity

Computation logic complexity: OGEHL: only a few adders TAGE: only a few comparators + priority encoder PL: hundreds of adders

Number of storage tables OGEHL, TAGE: 4 to 12 PL: H tables ( update time)

Index computation time: + a 3-entry XOR gate on OGEHL/TAGE

Information to checkpoint: Kilobits for PL