dynamic branch prediction

21
07/03/22 Lec4 ILP 1 Dynamic Branch Prediction Why does prediction work? Underlying algorithm has regularities Data that is being operated on has regularities Instruction sequence has redundancies that are artifacts of way that humans/compilers think about problems Is dynamic branch prediction better than static branch prediction? Seems to be There are a small number of important branches in programs which have dynamic behavior

Upload: alima

Post on 06-Jan-2016

33 views

Category:

Documents


0 download

DESCRIPTION

Dynamic Branch Prediction. Why does prediction work? Underlying algorithm has regularities Data that is being operated on has regularities Instruction sequence has redundancies that are artifacts of way that humans/compilers think about problems - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Dynamic Branch Prediction

04/20/23 Lec4 ILP 1

Dynamic Branch Prediction

• Why does prediction work?– Underlying algorithm has regularities

– Data that is being operated on has regularities

– Instruction sequence has redundancies that are artifacts of way that humans/compilers think about problems

• Is dynamic branch prediction better than static branch prediction?

– Seems to be

– There are a small number of important branches in programs which have dynamic behavior

Page 2: Dynamic Branch Prediction

04/20/23 Lec4 ILP 2

Dynamic Branch Prediction

• Performance = ƒ(accuracy, cost of misprediction)

• Branch History Table: Lower bits of PC address index table of 1-bit values

– Says whether or not branch taken last time

– No address check

• Problem: in a loop, 1-bit BHT will cause two mispredictions (avg is 9 iteratios before exit):

– End of loop case, when it exits instead of looping as before

– First time through loop on next time through code, when it predicts exit instead of looping

Page 3: Dynamic Branch Prediction

04/20/23 Lec4 ILP 3

• Solution: 2-bit scheme where change prediction only if get misprediction twice

• Red: stop, not taken

• Green: go, taken

• Adds hysteresis to decision making process

Dynamic Branch Prediction

T

T NT

NT

Predict Taken

Predict Not Taken

Predict Taken

Predict Not TakenT

NTT

NT

Page 4: Dynamic Branch Prediction

04/20/23 Lec4 ILP 4

18%

5%

12%10%

9%

5%

9% 9%

0%1%

0%2%4%6%8%

10%12%14%16%18%20%

eqnt

ott

espr

esso gc

c li

spice

dodu

c

spice

fppp

p

mat

rix30

0

nasa

7

Mis

pre

dic

tio

n R

ate

BHT Accuracy

• Mispredict because either:– Wrong guess for that branch

– Got branch history of wrong branch when index the table

• 4096 entry table:

IntegerFloating Point

Page 5: Dynamic Branch Prediction

Correlating Branch Predictor

• It may possible to improve the accuracy if we look at the behavior of other branches.

if (aa == 2)aa = 0;

if (bb == 0)bb = 0;

if (aa != bb)

The behavior of b3 is correlated with the behavior of b1 and b2.

Page 6: Dynamic Branch Prediction

Correlating Predictors

• Two-level predictors

if (d == 0)d = 1;

if (d == 1)

Page 7: Dynamic Branch Prediction

initial value of d

b1 value of d before b2

b2

0 nt 1 nt

1 t 1 nt

2 t 2 t

Page 8: Dynamic Branch Prediction

1-bit Predictor (Initialized to NT)

d b1 predic

b1 action

new b1 pr

b2 predic

b2 action

new b2 pr

2 nt t t nt t t

0 t nt nt t nt nt

2 nt t t nt t T

0 t nt nt t nt nt

Page 9: Dynamic Branch Prediction

(1,1) Predictor

• Every branch has two separate prediction bits.

– First bit: the prediction if the last branch in the program is not taken.

– Second bit: the prediction if the last branch in the program is taken.

• Write the pair of prediction bits together.

Page 10: Dynamic Branch Prediction

Combinations & Meaning

Prediction bits Prediction if not taken

Prediction if taken

NT/NT NT NT

NT/T NT T

T/NT T NT

T/T T T

Page 11: Dynamic Branch Prediction

(1,1) Predictor Example

d b1 pred

b1 action

new b1

pred

b2 pred

b2 action

new b2

pred

2 nt/nt t t/nt nt/nt t nt/t

0 t/nt nt t/nt nt/t nt nt/t

2 t/nt t t/nt nt/t t nt/t

0 t/nt nt t/nt nt/t nt nt

Initial prediction bits are nt/ntInitial last branch is ntUsed prediction bit is colored with red

Page 12: Dynamic Branch Prediction

04/20/23 Lec4 ILP 12

Correlated Branch Prediction

• Idea: record m most recently executed branches as taken or not taken, and use that pattern to select the proper n-bit branch history table

• In general, (m,n) predictor means record last m branches to select between 2m history tables, each with n-bit counters

– Thus, old 2-bit BHT is a (0,2) predictor

• Global Branch History: m-bit shift register keeping T/NT status of last m branches.

• Each entry in table has m n-bit predictors.

Page 13: Dynamic Branch Prediction

04/20/23 Lec4 ILP 13

Correlating Branches

(2,2) predictor

– Behavior of recent branches selects between four predictions of next branch, updating just that prediction

Branch address

2-bits per branch predictor

Prediction

2-bit global branch history

4

Page 14: Dynamic Branch Prediction

04/20/23 Lec4 ILP 14

0%

Fre

quen

cy o

f M

isp

redi

ctio

ns

0%1%

5%6% 6%

11%

4%

6%5%

1%2%

4%

6%

8%

10%

12%

14%

16%

18%

20%

4,096 entries: 2-bits per entry Unlimited entries: 2-bits/entry 1,024 entries (2,2)

Accuracy of Different Schemes

4096 Entries 2-bit BHTUnlimited Entries 2-bit BHT1024 Entries (2,2) BHT

nasa

7

mat

rix3

00

dodu

cd

spic

e

fppp

p

gcc

expr

esso

eqnt

ott li

tom

catv

Page 15: Dynamic Branch Prediction

04/20/23 Lec4 ILP 15

Tournament Predictors

• Multilevel branch predictor

• Use n-bit saturating counter to choose between predictors

• Usual choice between global and local predictors

Page 16: Dynamic Branch Prediction

04/20/23 Lec4 ILP 16

Tournament Predictors

Tournament predictor using, say, 4K 2-bit counters indexed by local branch address. Chooses between:

• Global predictor

– 4K entries index by history of last 12 branches (212 = 4K)

– Each entry is a standard 2-bit predictor

• Local predictor

– Local history table: 1024 10-bit entries recording last 10 branches, index by branch address

– The pattern of the last 10 occurrences of that particular branch used to index table of 1K entries with 3-bit saturating counters

Page 17: Dynamic Branch Prediction

04/20/23 Lec4 ILP 17

Comparing Predictors

• Advantage of tournament predictor is ability to select the right predictor for a particular branch

– Particularly crucial for integer benchmarks

– A typical tournament predictor will select the global predictor almost 40% of the time for the SPEC integer benchmarks and less than 15% of the time for the SPEC FP benchmarks

0%

1%

2%

3%

4%

5%

6%

7%

8%

9%

10%

0 8 16 24 32 40 48 56 64 72 80 88 96 104 112 120 128

Total predictor size (Kbits)

Con

ditio

nal b

ranc

h m

ispr

edic

tion

rate

Local

Correlating

Tournament

2-bit BHTSPEC89

Page 18: Dynamic Branch Prediction

04/20/23 Lec4 ILP 18

Pentium 4 Misprediction Rate (per 1000 instructions, not per branch)

11

13

7

12

9

10 0 0

5

0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

164.gzip

175.vpr

176.gcc

181.mcf

186.crafty

168.wupwise

171.swim

172.mgrid

173.applu

177.mesa

Bra

nch

mis

pre

dic

tion

s p

er

10

00

In

str

ucti

on

s

SPECint2000 SPECfp2000

6% misprediction rate per branch SPECint (19% of INT instructions are branch)

2% misprediction rate per branch SPECfp(5% of FP instructions are branch)

Page 19: Dynamic Branch Prediction

• Branch target calculation is costly and stalls the instruction fetch.

• BTB stores PCs the same way as caches

• The PC of a branch is sent to the BTB

• When a match is found the corresponding Predicted PC is returned

• If the branch was predicted taken, instruction fetch continues at the returned predicted PC

Branch Target Buffers (BTB)

Page 20: Dynamic Branch Prediction

Branch Target Buffers

Branch PC Predicted PC

=?

PC

of in

structio

nF

ET

CH

Extra Prediction

statebits

Yes: inst is branch,Next PC = predicted PC

No: proceed normally (Next PC = PC+4)

Page 21: Dynamic Branch Prediction

04/20/23 Lec4 ILP 21

Dynamic Branch Prediction Summary

• Prediction becoming important part of execution

• Branch History Table: 2 bits for loop accuracy

• Correlation: Recently executed branches correlated with next branch

– Either different branches (GA)

– Or different executions of same branches (PA)

• Tournament predictors take insight to next level, by using multiple predictors

– usually one based on global information and one based on local information, and combining them with a selector

– In 2006, tournament predictors using 30K bits are in processors like the Power5 and Pentium 4

• Branch Target Buffer: include branch address & prediction