Dynamic Branch Prediction
04/20/23 Lec4 ILP 1
Dynamic Branch Prediction
• Why does prediction work?
– Underlying algorithm has regularities
– Data that is being operated on has regularities
– Instruction sequence has redundancies that are artifacts of the way that humans/compilers think about problems
• Is dynamic branch prediction better than static branch prediction?
– Seems to be
– There are a small number of important branches in programs which have dynamic behavior
Dynamic Branch Prediction
• Performance = ƒ(accuracy, cost of misprediction)
• Branch History Table: Lower bits of PC address index table of 1-bit values
– Says whether or not branch taken last time
– No address check
• Problem: in a loop, a 1-bit BHT will cause two mispredictions per loop execution (the average loop runs 9 iterations before exit):
– End of loop case, when it exits instead of looping as before
– First time through loop on next time through code, when it predicts exit instead of looping
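The two mispredictions per loop execution can be reproduced with a small simulation. This is a minimal sketch, not a model of any particular processor: the function name, the trip count of 9 taken iterations, and the assumption that the predictor starts out predicting taken are all illustrative.

```python
def one_bit_mispredictions(taken_per_visit=9, visits=3):
    """Count mispredictions of a 1-bit predictor on a loop-closing branch."""
    prediction = True  # 1-bit state: True = predict taken (assumed starting state)
    misses = 0
    for _ in range(visits):
        # The loop branch is taken `taken_per_visit` times, then falls through once.
        outcomes = [True] * taken_per_visit + [False]
        for taken in outcomes:
            if prediction != taken:
                misses += 1
            prediction = taken  # the single bit remembers only the last outcome
    return misses


print(one_bit_mispredictions(9, 3))  # prints 5
```

With the assumed starting state, the first visit costs one misprediction (the exit) and every later visit costs two (the exit, then the first iteration of the next visit).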
• Solution: 2-bit scheme that changes the prediction only after two mispredictions in a row
• Red: stop, not taken
• Green: go, taken
• Adds hysteresis to decision making process
Dynamic Branch Prediction
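[Figure: state diagram of the 2-bit scheme with four states, two labeled Predict Taken and two labeled Predict Not Taken, connected by transitions labeled T (taken) and NT (not taken).]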
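The hysteresis of the 2-bit scheme can be sketched with a saturating counter. This is one common realization and an assumption here: the transitions in the slide's state diagram may differ in detail. The function name, the starting state, and the trip count are illustrative.

```python
def two_bit_mispredictions(taken_per_visit=9, visits=3):
    """Count mispredictions of a 2-bit saturating counter on a loop branch."""
    counter = 3  # 0-1 predict not taken, 2-3 predict taken (assumed start: 3)
    misses = 0
    for _ in range(visits):
        for taken in [True] * taken_per_visit + [False]:
            predicted_taken = counter >= 2
            if predicted_taken != taken:
                misses += 1
            # Saturating update: step toward the outcome, clamp to [0, 3].
            counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


print(two_bit_mispredictions(9, 3))  # prints 3
```

A single loop exit moves the counter from 3 to 2, which still predicts taken, so only the exit itself is mispredicted on each visit, compared with two mispredictions per visit for the 1-bit scheme.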
BHT Accuracy
• Mispredict because either:
– Wrong guess for that branch
– Got branch history of wrong branch when indexing the table
• 4096 entry table:
[Figure: bar chart of misprediction rate (y-axis, 0% to 20%) per benchmark, in slide order. Integer: eqntott 18%, espresso 5%, gcc 12%, li 10%. Floating Point: spice 9%, doduc 5%, spice 9%, fpppp 9%, matrix300 0%, nasa7 1%.]
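The second cause of misprediction, picking up the history of the wrong branch, follows from indexing with low-order PC bits and no address check. A minimal sketch, assuming 4-byte instructions and hypothetical example addresses:

```python
def bht_index(pc, entries=4096):
    """Index a BHT with the low-order bits of the PC (no tag check)."""
    return (pc >> 2) % entries  # drop the byte offset; assumes 4-byte instructions


pc_a = 0x00401000            # hypothetical branch address
pc_b = pc_a + 4096 * 4       # differs from pc_a only above the index bits

print(bht_index(pc_a) == bht_index(pc_b))  # prints True
```

Both branches map to the same entry, so each one reads and overwrites the other's prediction state.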
Correlating Branch Predictor
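• It may be possible to improve the accuracy if we look at the behavior of other branches.
if (aa == 2) aa = 0; /* branch b1 */
if (bb == 2) bb = 0; /* branch b2 */
if (aa != bb) /* branch b3 */
• The behavior of b3 is correlated with the behavior of b1 and b2.
– If the conditions of b1 and b2 both hold, aa and bb are both 0, so the condition of b3 is false.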
Correlating Predictors
• Two-level predictors
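if (d == 0) d = 1; /* branch b1 */
if (d == 1) /* branch b2 */
• Each branch is taken when its condition is false (the branch skips the guarded statement): b1 is taken when d != 0, b2 is taken when d != 1.
initial value of d | b1 | value of d before b2 | b2
0 | nt | 1 | nt
1 | t | 1 | nt
2 | t | 2 | t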
1-bit Predictor (Initialized to NT)
d | b1 prediction | b1 action | new b1 prediction | b2 prediction | b2 action | new b2 prediction
2 | nt | t | t | nt | t | t
0 | t | nt | nt | t | nt | nt
2 | nt | t | t | nt | t | t
0 | t | nt | nt | t | nt | nt
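The table above can be replayed in a few lines. This is a sketch under the table's own assumptions: d alternates between 2 and 0, both 1-bit predictors start at not taken, b1 is taken when d != 0 and b2 is taken when d != 1. The function name is illustrative.

```python
def run_one_bit(d_values):
    """Simulate 1-bit predictors for b1 and b2; return the misprediction count."""
    pred = {"b1": False, "b2": False}  # False = predict not taken
    misses = 0
    for d in d_values:
        b1_taken = d != 0              # b1 skips `d = 1` when d != 0
        misses += pred["b1"] != b1_taken
        pred["b1"] = b1_taken
        if not b1_taken:
            d = 1                      # fall-through path of b1
        b2_taken = d != 1
        misses += pred["b2"] != b2_taken
        pred["b2"] = b2_taken
    return misses


print(run_one_bit([2, 0, 2, 0]))  # prints 8
```

All eight predictions in the four iterations are wrong, which matches the table: every prediction differs from the action in the same row.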
(1,1) Predictor
• Every branch has two separate prediction bits.
– First bit: the prediction if the last branch executed in the program was not taken.
– Second bit: the prediction if the last branch executed in the program was taken.
• Write the pair of prediction bits together.
Combinations & Meaning
Prediction bits | Prediction if last branch not taken | Prediction if last branch taken
NT/NT | NT | NT
NT/T | NT | T
T/NT | T | NT
T/T | T | T
(1,1) Predictor Example
d | b1 prediction | b1 action | new b1 prediction | b2 prediction | b2 action | new b2 prediction
2 | nt/nt | t | t/nt | nt/nt | t | nt/t
0 | t/nt | nt | t/nt | nt/t | nt | nt/t
2 | t/nt | t | t/nt | nt/t | t | nt/t
0 | t/nt | nt | t/nt | nt/t | nt | nt/t
• Initial prediction bits are nt/nt
• Initial last branch is nt
• In the original slide, the prediction bit actually used is colored red
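The (1,1) example can be checked the same way. A minimal sketch under the stated starting conditions (all prediction bits nt, initial last branch nt); the function name and data layout are illustrative, with b1 taken when d != 0 and b2 taken when d != 1.

```python
def run_1_1(d_values):
    """Simulate (1,1) predictors for b1 and b2; return the misprediction count."""
    # pred[branch][0]: prediction if the last branch was not taken
    # pred[branch][1]: prediction if the last branch was taken
    pred = {"b1": [False, False], "b2": [False, False]}
    last_taken = False                 # initial last branch is nt
    misses = 0
    for d in d_values:
        for name in ("b1", "b2"):
            taken = (d != 0) if name == "b1" else (d != 1)
            slot = int(last_taken)     # global history selects the bit
            misses += pred[name][slot] != taken
            pred[name][slot] = taken   # update only the selected bit
            last_taken = taken
            if name == "b1" and not taken:
                d = 1                  # fall-through path of b1
    return misses


print(run_1_1([2, 0, 2, 0]))  # prints 2
```

Only the two predictions in the first iteration are wrong; after that the bit selected by the last branch outcome is always correct for this sequence.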
Correlated Branch Prediction
• Idea: record m most recently executed branches as taken or not taken, and use that pattern to select the proper n-bit branch history table
• In general, an (m,n) predictor records the last m branches to select between 2^m history tables, each with n-bit counters
– Thus, old 2-bit BHT is a (0,2) predictor
• Global Branch History: m-bit shift register keeping T/NT status of last m branches.
• Each entry in table has 2^m n-bit predictors.
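A generic (m,n) predictor along these lines can be sketched as follows. The class name, the use of saturating counters for the n-bit predictors, the zero initial state, and the 4-byte instruction assumption in the index are all illustrative choices, not details given in the slides.

```python
class CorrelatingPredictor:
    """Sketch of an (m,n) predictor: m bits of global history, n-bit counters."""

    def __init__(self, m, n, entries):
        self.m, self.n, self.entries = m, n, entries
        self.history = 0  # m-bit global shift register of T/NT outcomes
        # One row per branch-address index, 2**m n-bit counters per row.
        self.table = [[0] * (2 ** m) for _ in range(entries)]

    def _row(self, pc):
        return self.table[(pc >> 2) % self.entries]  # assumes 4-byte instructions

    def predict(self, pc):
        # Predict taken when the selected counter is in its upper half.
        return self._row(pc)[self.history] >= 2 ** (self.n - 1)

    def update(self, pc, taken):
        row = self._row(pc)
        top = 2 ** self.n - 1
        c = row[self.history]
        row[self.history] = min(c + 1, top) if taken else max(c - 1, 0)
        # Shift the outcome into the global history, keeping only m bits.
        self.history = ((self.history << 1) | int(taken)) & (2 ** self.m - 1)


bht = CorrelatingPredictor(m=0, n=2, entries=4096)   # the plain 2-bit BHT
print(len(bht.table[0]))  # prints 1
```

With m = 0 the history is always 0 and each row holds a single counter, which is why the ordinary 2-bit BHT is the (0,2) case.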
Correlating Branches
(2,2) predictor
– Behavior of recent branches selects between four predictions of next branch, updating just that prediction
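[Figure: (2,2) predictor. The branch address (4 bits in the drawing) selects a row of 2-bit per-branch predictors; the 2-bit global branch history selects which predictor in that row supplies the prediction.]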
Accuracy of Different Schemes
[Figure: bar chart of frequency of mispredictions (y-axis, 0% to 20%) for three schemes: 4,096 entries with 2 bits per entry, unlimited entries with 2 bits per entry, and 1,024 entries (2,2). Benchmarks: nasa7, matrix300, doduc, spice, fpppp, gcc, espresso, eqntott, li, tomcatv. The per-bar values did not survive extraction.]
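The finite configurations in this comparison use the same amount of storage, which can be checked with the size formula for an (m,n) predictor: 2^m counters of n bits for each entry selected by the branch address. The helper name below is illustrative.

```python
def predictor_bits(m, n, entries):
    """Total state bits of an (m,n) predictor with `entries` address-indexed rows."""
    return (2 ** m) * n * entries


print(predictor_bits(0, 2, 4096))  # 4096-entry 2-bit BHT: prints 8192
print(predictor_bits(2, 2, 1024))  # 1024-entry (2,2) predictor: prints 8192
```

Any accuracy difference between those two configurations therefore comes from how the 8K bits are organized, not from extra storage.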
Tournament Predictors
• Multilevel branch predictor
• Use n-bit saturating counter to choose between predictors
• Usual choice between global and local predictors
Tournament Predictors
Tournament predictor using, say, 4K 2-bit counters indexed by local branch address. Chooses between:
• Global predictor
– 4K entries indexed by history of last 12 branches (2^12 = 4K)
– Each entry is a standard 2-bit predictor
• Local predictor
– Local history table: 1024 10-bit entries recording last 10 branches, indexed by branch address
– The pattern of the last 10 occurrences of that particular branch is used to index a table of 1K entries with 3-bit saturating counters
Comparing Predictors
• Advantage of tournament predictor is ability to select the right predictor for a particular branch
– Particularly crucial for integer benchmarks
– A typical tournament predictor will select the global predictor almost 40% of the time for the SPEC integer benchmarks and less than 15% of the time for the SPEC FP benchmarks
[Figure: conditional branch misprediction rate (y-axis, 0% to 10%) versus total predictor size (x-axis, 0 to 128 Kbits) on SPEC89. Series: Local (2-bit BHT), Correlating, Tournament.]
Pentium 4 Misprediction Rate (per 1000 instructions, not per branch)
[Figure: bar chart of branch mispredictions per 1000 instructions (y-axis, 0 to 14). SPECint2000: 164.gzip 11, 175.vpr 13, 176.gcc 7, 181.mcf 12, 186.crafty 9. SPECfp2000: 168.wupwise 1, 171.swim 0, 172.mgrid 0, 173.applu 0, 177.mesa 5.]
• 6% misprediction rate per branch for SPECint (19% of INT instructions are branches)
• 2% misprediction rate per branch for SPECfp (5% of FP instructions are branches)
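The per-branch and per-1000-instruction figures are related by the branch frequency. A short worked conversion, using only the percentages quoted above (the helper name is illustrative):

```python
def mispredicts_per_1000(branch_fraction, mispredict_rate):
    """Mispredictions per 1000 instructions from per-branch statistics."""
    return 1000 * branch_fraction * mispredict_rate


int_rate = mispredicts_per_1000(0.19, 0.06)  # SPECint: about 11.4
fp_rate = mispredicts_per_1000(0.05, 0.02)   # SPECfp: about 1.0
print(round(int_rate, 1), round(fp_rate, 1))
```

The integer programs suffer roughly ten times more mispredictions per instruction because they have both more branches and a higher per-branch misprediction rate.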
Branch Target Buffers (BTB)
• Branch target calculation is costly and stalls the instruction fetch.
• The BTB stores branch PCs the same way a cache stores tags.
• The PC of a branch is sent to the BTB.
• When a match is found, the corresponding predicted PC is returned.
• If the branch was predicted taken, instruction fetch continues at the returned predicted PC.
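Branch Target Buffers
[Figure: BTB lookup during FETCH. The PC of the instruction is compared (=?) against the Branch PC column; each entry also holds a Predicted PC and extra prediction state bits.]
• Yes: instruction is a branch, Next PC = predicted PC
• No: proceed normally (Next PC = PC+4)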
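The lookup above can be sketched with a dictionary keyed by branch PC. This is a simplification under stated assumptions: a real BTB is a fixed-size tagged table, and this sketch keeps only branches last seen taken instead of modeling the extra prediction state bits. The class and method names are illustrative.

```python
class BranchTargetBuffer:
    def __init__(self):
        self.entries = {}  # branch PC -> predicted target PC

    def next_pc(self, pc):
        """Return the fetch address that follows `pc`."""
        target = self.entries.get(pc)
        if target is not None:
            return target      # hit: instruction is a branch predicted taken
        return pc + 4          # miss: proceed normally

    def update(self, pc, taken, target):
        """Record the resolved outcome of the branch at `pc`."""
        if taken:
            self.entries[pc] = target
        else:
            self.entries.pop(pc, None)  # assumed policy: drop not-taken branches


btb = BranchTargetBuffer()
btb.update(0x400, True, 0x800)
print(hex(btb.next_pc(0x400)), hex(btb.next_pc(0x404)))  # prints 0x800 0x408
```

Because the lookup uses only the fetch PC, the predicted target is available before the instruction has been decoded.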
Dynamic Branch Prediction Summary
• Prediction becoming important part of execution
• Branch History Table: 2 bits for loop accuracy
• Correlation: Recently executed branches correlated with next branch
– Either different branches (GA)
– Or different executions of same branches (PA)
• Tournament predictors take insight to next level, by using multiple predictors
– usually one based on global information and one based on local information, and combining them with a selector
– In 2006, tournament predictors using 30K bits are in processors like the Power5 and Pentium 4
• Branch Target Buffer: include branch address & prediction