
1

André Seznec Caps Team

IRISA/INRIA

Looking for limits in branch prediction with the GTL predictor

André Seznec

IRISA/INRIA/HIPEAC

2


Motivations

Geometric history length predictors, introduced in 2004-2006:

• O-GEHL, CBP-1, Dec. 2004
• TAGE, JILP '06, Feb. 2006

• Storage effective
• Exploit very long global histories
• Were defined with possible implementation in mind

What are the limits of accuracy that can be captured with these schemes?

How do they compare with unconstrained prediction schemes?

3


Geometric history length predictors: global history + multiple lengths

[Figure: predictor tables T0 to T4, indexed with global history lengths L(0) to L(4); their outputs feed a combining function, shown as "?".]

4



GEometric History Length predictor

$L(0) = 0$

$L(i) = \alpha^{i-1} L(1)$ for $i \geq 1$

The set of history lengths forms a geometric series

What is important: the gap L(i) - L(i-1) increases drastically with i

Most of the storage goes to short histories!

{0, 2, 4, 8, 16, 32, 64, 128}

Capture correlation on very long histories
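A minimal sketch of how such a series can be generated, assuming L(0) = 0 and L(i) = α^(i-1) L(1) as on this slide. The function name, its parameters, and the round-to-nearest step are assumptions made for illustration; with L(1) = 2 and α = 2 it reproduces the example series above.

```python
def geometric_history_lengths(first_length, alpha, num_tables):
    """Return [L(0), ..., L(num_tables - 1)] with L(0) = 0 and
    L(i) = alpha**(i - 1) * L(1) for i >= 1."""
    lengths = [0]  # L(0) = 0: table T0 is indexed without any history
    for i in range(1, num_tables):
        # Rounding to the nearest integer is an assumption of this sketch.
        lengths.append(int(alpha ** (i - 1) * first_length + 0.5))
    return lengths


lengths = geometric_history_lengths(first_length=2, alpha=2.0, num_tables=8)
gaps = [longer - shorter for shorter, longer in zip(lengths, lengths[1:])]
print(lengths)  # [0, 2, 4, 8, 16, 32, 64, 128]
print(gaps)     # [2, 2, 4, 8, 16, 32, 64]
```

The printed gaps show the point made on the slide: half of the tables cover histories of 8 branches or fewer, while the last table alone reaches back 128 branches.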

5



Combining multiple predictions

• Neural-inspired predictors: use a (multiply-)add tree (O-GEHL, CBP-1)

• Partial matching: use tagged tables and the longest matching history (TAGE, JILP '06)

6


CBP-1 (2004): O-GEHL

[Figure: tables T0 to T4, indexed with history lengths L(0) to L(4), feed an adder (∑).]

Final computation through a sum

Prediction = Sign

256 Kbits: 12 components, 3.670 misp/KI
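The sum-and-sign computation can be sketched as follows. This is a toy model under stated assumptions, not the CBP-1 predictor: the class name, the XOR-fold index hash, the 4-bit counters, and the train-on-every-branch update are all simplifications chosen for illustration.

```python
def fold(value, width):
    """XOR-fold an integer of any length down to `width` bits."""
    mask = (1 << width) - 1
    folded = 0
    while value:
        folded ^= value & mask
        value >>= width
    return folded


class GehlSketch:
    """Toy GEHL-style predictor: one table of signed counters per history length."""

    def __init__(self, history_lengths, log_size=10, counter_bits=4):
        self.history_lengths = history_lengths
        self.log_size = log_size
        self.size = 1 << log_size
        self.cmax = (1 << (counter_bits - 1)) - 1   # +7 for 4-bit counters
        self.cmin = -(1 << (counter_bits - 1))      # -8 for 4-bit counters
        self.tables = [[0] * self.size for _ in history_lengths]
        self.history = 0                            # newest outcome in bit 0
        self.history_mask = (1 << max(history_lengths)) - 1

    def _index(self, pc, length):
        # Hypothetical hash: PC XOR the folded last `length` outcomes.
        recent = self.history & ((1 << length) - 1)
        return (pc ^ fold(recent, self.log_size)) & (self.size - 1)

    def predict(self, pc):
        total = sum(table[self._index(pc, length)]
                    for table, length in zip(self.tables, self.history_lengths))
        return total >= 0                           # prediction = sign of the sum

    def update(self, pc, taken):
        # Simplification: every counter is trained on every branch.
        for table, length in zip(self.tables, self.history_lengths):
            i = self._index(pc, length)
            if taken:
                table[i] = min(self.cmax, table[i] + 1)
            else:
                table[i] = max(self.cmin, table[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.history_mask
```

On a strictly alternating branch, the table indexed without history cannot learn anything, but the tables indexed with history saturate in opposite directions for the two contexts, so the sign of the sum follows the pattern.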

7


JILP '06: TAGE, longest matching history

[Figure: TAGE tagged tables with tag comparators (=?); the longest matching history provides the prediction.]

256 Kbits: 3.358 misp/KI
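A minimal sketch of longest-matching-history selection, assuming a tagless base table backed by tagged tables with increasing history lengths. The class name, hash functions, counter widths, and the allocate-in-the-next-table policy are assumptions made for illustration; the predictor published in JILP also uses useful counters and a more careful allocation policy, which are omitted here.

```python
def fold(value, width):
    """XOR-fold an integer of any length down to `width` bits."""
    mask = (1 << width) - 1
    folded = 0
    while value:
        folded ^= value & mask
        value >>= width
    return folded


class TageSketch:
    def __init__(self, history_lengths, log_size=10, tag_bits=8):
        self.history_lengths = history_lengths      # increasing, one per tagged table
        self.log_size = log_size
        self.size = 1 << log_size
        self.tag_bits = tag_bits
        self.base = [0] * self.size                 # tagless base table, counters in -2..1
        # Tagged entries are [tag, counter]; tag None marks an empty entry.
        self.tagged = [[[None, 0] for _ in range(self.size)]
                       for _ in history_lengths]
        self.history = 0                            # newest outcome in bit 0
        self.history_mask = (1 << max(history_lengths)) - 1

    def _slot(self, pc, table):
        # Hypothetical index and tag hashes over the PC and recent history.
        recent = self.history & ((1 << self.history_lengths[table]) - 1)
        index = (pc ^ fold(recent, self.log_size)) & (self.size - 1)
        tag = (fold(recent, self.tag_bits) ^ (pc >> 2)) & ((1 << self.tag_bits) - 1)
        return index, tag

    def _provider(self, pc):
        """Return (table, index) of the longest-history tag match, or None."""
        for table in reversed(range(len(self.tagged))):
            index, tag = self._slot(pc, table)
            if self.tagged[table][index][0] == tag:
                return table, index
        return None

    def predict(self, pc):
        hit = self._provider(pc)
        if hit is None:
            return self.base[pc & (self.size - 1)] >= 0
        return self.tagged[hit[0]][hit[1]][1] >= 0

    def update(self, pc, taken):
        hit = self._provider(pc)
        predicted = self.predict(pc)
        step = 1 if taken else -1
        if hit is None:
            b = pc & (self.size - 1)
            self.base[b] = max(-2, min(1, self.base[b] + step))
        else:
            entry = self.tagged[hit[0]][hit[1]]
            entry[1] = max(-4, min(3, entry[1] + step))
        # Simplified allocation: on a misprediction, take an entry in the
        # next table with a longer history than the provider.
        next_table = 0 if hit is None else hit[0] + 1
        if predicted != taken and next_table < len(self.tagged):
            index, tag = self._slot(pc, next_table)
            self.tagged[next_table][index] = [tag, 0 if taken else -1]
        self.history = ((self.history << 1) | int(taken)) & self.history_mask
```

On an alternating branch, the base table settles on "taken" and a single tagged entry, allocated after the first misprediction, overrides it in the context where the branch is not taken.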

8



What is global history?

conditional branch history: path confusion on short histories

path history: Direct hashing leads to path confusion

1. Represent all branches in branch history

2. Use path AND direction history
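The two points above can be sketched as a history update applied to every branch. The function name, the choice to record unconditional branches as taken, and the use of a single address bit per branch for the path history are assumptions made for illustration; the slide only states that all branches are represented and that path and direction are both used.

```python
def update_histories(direction_history, path_history, pc, is_conditional, taken,
                     max_bits=64):
    """Shift one branch into both histories and return the new pair."""
    mask = (1 << max_bits) - 1
    # 1. Every branch enters the direction history; an unconditional
    #    branch is recorded as taken (assumption of this sketch).
    bit = int(taken) if is_conditional else 1
    direction_history = ((direction_history << 1) | bit) & mask
    # 2. The path history keeps one address bit per branch (assumption:
    #    bit 2 of the PC, so that 4-byte-aligned branches can differ).
    path_history = ((path_history << 1) | ((pc >> 2) & 1)) & mask
    return direction_history, path_history


def run(path):
    direction, route = 0, 0
    for pc, is_conditional, taken in path:
        direction, route = update_histories(direction, route, pc,
                                            is_conditional, taken)
    return direction, route


# Two different paths with identical branch outcomes.
path_a = [(0x100, True, True), (0x108, False, True)]
path_b = [(0x104, True, True), (0x108, False, True)]
print(run(path_a))  # (3, 0)
print(run(path_b))  # (3, 2)
```

Both paths produce the same direction history, so a predictor indexed with directions alone would confuse them; the path history tells them apart.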

9



Using a kernel history and a user history

Traces mix user and kernel activities: Kernel activity after exception

• Global history pollution

Solution: use two separate global histories

• User history is updated only in user mode
• Kernel history is updated in both modes
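A minimal sketch of the two-history scheme, following the update rules stated above. The class and method names are hypothetical, and the rule that the predictor reads the history matching the current mode is an assumption; the slide only specifies when each history is updated.

```python
class SplitHistory:
    """Separate user and kernel global histories (newest outcome in bit 0)."""

    def __init__(self, max_bits=64):
        self.mask = (1 << max_bits) - 1
        self.user = 0
        self.kernel = 0

    def update(self, taken, kernel_mode):
        bit = int(taken)
        # Kernel history is updated in both modes.
        self.kernel = ((self.kernel << 1) | bit) & self.mask
        # User history is updated only in user mode.
        if not kernel_mode:
            self.user = ((self.user << 1) | bit) & self.mask

    def current(self, kernel_mode):
        # Assumption: each mode predicts with its own history.
        return self.kernel if kernel_mode else self.user


h = SplitHistory()
h.update(True, kernel_mode=False)
h.update(True, kernel_mode=False)
for _ in range(3):                      # kernel activity after an exception
    h.update(False, kernel_mode=True)
h.update(True, kernel_mode=False)
print(bin(h.user))    # 0b111
print(bin(h.kernel))  # 0b110001
```

The user history comes out of the exception as if the kernel branches had never happened, which is the pollution the slide aims to avoid.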

10



Accuracy limits for TAGE

Varying the predictor size, the number of components, the tag width, the history length.

Allowing multiple allocations

The best accuracy on distributed traces:

3.054 misp/KI
• History length around 1,000
• 15-20 components
• No need for tags wider than 16 bits

11



Accuracy limits for GEHL

Varying the predictor size, the number of components, the history length, counter width

(slightly) improving the update policy and fitting in the two hours simulation rule

on the distributed traces:

2.842 misp/KI
• 97 components
• 8-bit counters
• 2,000 bits of global history

12



GEHL vs TAGE

Realistic implementation parameters (storage budget, number of components):
TAGE is more accurate than (O-)GEHL

Unlimited budget, huge number of components:
GEHL is more accurate than TAGE

13



Will it be sufficient to win the Championship?

GEHL history length: 2,000; 97 components

2.842 misp/KI

14



A step further: hybrid GEHL-TAGE

On a few benchmarks, TAGE is more accurate than GEHL.

Let us try a hybrid GEHL-TAGE predictor

15


Hybrid GEHL-TAGE

[Figure: branch/path history + PC feed GEHL, TAGE, and a meta-predictor (Meta = egskew); a mux driven by Meta selects between the GEHL and TAGE predictions.]

Inherits from: agree/bimode, YAGS, 2bcgskew

16



GEHL+TAGE

GEHL provides the main prediction: also used as the base predictor for TAGE

(YAGS inspired)

TAGE records when GEHL fails:

{prediction, address, history}

(agree/bimode, YAGS inspired)

Meta selects between GEHL and TAGE

(2bcgskew inspired)
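The selection step can be sketched with a plain table of 2-bit chooser counters. This is a deliberate simplification: the deck uses an e-gskew meta-predictor, whereas the class below, its name, its PC-only indexing, and its counter range are assumptions chosen to keep the example short.

```python
class MetaChooser:
    """Table of 2-bit counters choosing between the GEHL and TAGE predictions."""

    def __init__(self, log_size=10):
        self.size = 1 << log_size
        # Counter range -2..1: >= 0 favours GEHL, < 0 favours TAGE.
        self.counters = [0] * self.size

    def select(self, pc, gehl_prediction, tage_prediction):
        if self.counters[pc % self.size] >= 0:
            return gehl_prediction      # GEHL provides the main prediction
        return tage_prediction

    def update(self, pc, gehl_prediction, tage_prediction, taken):
        if gehl_prediction == tage_prediction:
            return                      # nothing to learn when both agree
        i = pc % self.size
        if gehl_prediction == taken:
            self.counters[i] = min(1, self.counters[i] + 1)
        else:
            self.counters[i] = max(-2, self.counters[i] - 1)
```

Training only on disagreements means the chooser drifts toward TAGE solely for branches where TAGE has actually corrected a GEHL failure, which matches the role TAGE plays in this hybrid.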

17



Let us have fun !!

GEHL history length: 400

TAGE history length: 100,000

2.774 misp/KI

18



Might still be insufficient

GEHL history length: 400

TAGE history length: 100,000

2.774 misp/KI

19



Adding a loop predictor

The loop predictor captures the number of iterations of a loop.

When it encounters the same number of iterations 8 times in succession, the loop predictor provides the prediction.

Advantage: very reliable
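A minimal sketch of one loop-predictor entry under the rule stated above. The class name and fields are hypothetical, and two assumptions are made: the loop branch is taken on every iteration except the last, and the first observation of a trip count already counts as one of the 8 occurrences.

```python
class LoopEntry:
    CONFIDENT = 8  # same trip count seen 8 times in succession

    def __init__(self):
        self.trip_count = 0   # branch executions per loop run, last observed
        self.current = 0      # taken executions seen so far in this run
        self.confidence = 0   # consecutive runs with the same trip count

    def predict(self):
        """Return True (taken) or False (not taken) when confident, else None."""
        if self.confidence < self.CONFIDENT:
            return None
        # Taken while inside the loop, not taken on the exit iteration.
        return self.current + 1 < self.trip_count

    def update(self, taken):
        if taken:
            self.current += 1
            return
        # Loop exit: compare this run's trip count with the recorded one.
        iterations = self.current + 1
        if iterations == self.trip_count:
            self.confidence = min(self.confidence + 1, self.CONFIDENT)
        else:
            self.trip_count = iterations
            self.confidence = 1
        self.current = 0
```

While confidence is below the threshold, `predict()` returns `None` and the rest of the predictor decides; once the threshold is reached, the entry predicts the exit iteration exactly.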

20


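GTL predictor

[Figure: branch/path history + PC feed GEHL, TAGE, and Meta (= egskew); a first mux driven by Meta selects between GEHL and TAGE; a second mux, driven by the loop predictor's confidence, selects between that result and the loop predictor.]

+ static prediction on first occurrence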

21



Hope this will be sufficient to win the Championship!

GTL

GEHL, 97 comp., 400 hist. + TAGE, 19 comp., 100,000 hist

+ loop predictor

2.717 misp/KI

22



Geometric History Length predictors and limits on branch prediction

Unlimited budget, huge number of components: GEHL is more accurate than TAGE

Very old correlation can be captured:
• On two benchmarks, using a history length of 10,000 really helps

There does not seem to be much potential extra benefit from local history:
• We did not find any interesting extra scheme apart from loop prediction
• Loop prediction is very marginal, apart from gzip

23



The End
