Dynamic History-Length Fitting: A third level of adaptivity for branch prediction
Toni Juan, Sanji Sanjeevan, Juan J. Navarro
Department of Computer Architecture, Universitat Politècnica de Catalunya
Presented by Danyao Wang, ECE1718, Fall 2008
ISCA '98
Overview
• Branch prediction background
• Dynamic branch predictors
• Dynamic history-length fitting (DHLF)
– Without context switches
– With context switches
• Results
• Conclusion
Why branch prediction?
• Superscalar processors with deep pipelines
– Intel Core 2 Duo: 14 stages
– AMD Athlon 64: 12 stages
– Intel Pentium 4: 31 stages
• Many cycles pass before a branch is resolved
– Waiting wastes time…
– Better to do some useful work in the meantime…
• Branch prediction!
What does it do?
    sub r1, r2, r3
    bne r1, r0, L1
    add r4, r5, r6
    …
L1: add r4, r7, r8
    sub r9, r4, r2
[Figure: pipeline timeline. When the bne is fetched it is predicted taken, so fetch continues from L1 and the following instructions execute speculatively. When the branch is resolved, the prediction is validated: correct.]
What happens when mispredicted?
    sub r1, r2, r3
    bne r1, r0, L1
    add r4, r5, r6
    …
L1: add r4, r7, r8
    sub r9, r4, r2
[Figure: pipeline timeline. When the bne is fetched it is predicted taken, so fetch continues from L1 and the following instructions execute speculatively. When the branch is resolved, the prediction is validated: incorrect! The speculatively executed instructions are squashed.]
How to predict branches?
• Statically at compile time
– Simple hardware
– Not accurate enough…
• Dynamically at execution time
– Hardware predictors
• Last-outcome predictor
• Saturation counter
• Pattern predictor
• Tournament predictor
(Going down the list: more complex, more accurate)
Last-Outcome Branch Predictor
• Simplest dynamic branch predictor
• Branch prediction table with 1-bit entries
• Intuition: history repeats itself
[Figure: the lower N bits of the PC index a branch prediction table with 2^N entries. Each entry is a 1-bit prediction (T or NT), read at fetch and written on a misprediction.]
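The lookup on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming a table indexed by the lower N bits of the PC; the class and method names are hypothetical and not taken from the paper.

```python
class LastOutcomePredictor:
    """1-bit-per-entry branch predictor (illustrative sketch)."""

    def __init__(self, index_bits):
        self.mask = (1 << index_bits) - 1
        # One bit per entry: True = taken, False = not taken.
        self.table = [False] * (1 << index_bits)

    def predict(self, pc):
        # Read at fetch: index with the lower N bits of the PC.
        return self.table[pc & self.mask]

    def update(self, pc, taken):
        # Write on misprediction only; a correct prediction
        # already stores the right outcome.
        if self.table[pc & self.mask] != taken:
            self.table[pc & self.mask] = taken
```

On a loop branch that is taken nine times and then falls through, this predictor mispredicts twice per loop execution: once on loop exit and once on the next entry.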
Saturation Counter Predictor
• Observation: branches highly bimodal
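• n-bit saturation counter
– Hysteresis
– n-bit entries in branch prediction table
[Figure: state diagram of a 2-bit bimodal predictor. The four counter states 00, 01, 10, 11 are linked by T (taken) and N (not-taken) transitions and split into "Pred. Taken" and "Pred. Not-Taken" halves. The saturated end states carry a strong bias, the middle states a weak bias.]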
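A minimal Python sketch of a saturating-counter table follows. It assumes the common convention that the counter increments on taken, decrements on not-taken, and predicts taken in the upper half of its range; the slide's state diagram may label states differently, and all names here are illustrative.

```python
class SaturatingCounterPredictor:
    """Table of n-bit saturating counters (illustrative sketch)."""

    def __init__(self, index_bits, counter_bits=2):
        self.mask = (1 << index_bits) - 1
        self.max_count = (1 << counter_bits) - 1
        # Counts at or above the midpoint predict taken (assumed convention).
        self.threshold = 1 << (counter_bits - 1)
        # Start every entry in the weakly not-taken state.
        self.table = [self.threshold - 1] * (1 << index_bits)

    def predict(self, pc):
        return self.table[pc & self.mask] >= self.threshold

    def update(self, pc, taken):
        i = pc & self.mask
        if taken:
            self.table[i] = min(self.table[i] + 1, self.max_count)
        else:
            self.table[i] = max(self.table[i] - 1, 0)
```

The hysteresis shows on a loop branch: a single not-taken outcome at loop exit moves the counter from strong to weak taken, so the next loop entry is still predicted taken.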
Pattern Predictors
• Near-by branches often correlate
• Looks for patterns in branch history
– Branch History Register (BHR): m most recent branch outcomes
[Figure: Two-Level Predictor. A function f combines the lower n bits of the PC with the m-bit history from the BHR into an n-bit index into a branch prediction table of 2^n entries, each a saturation counter.]
Tournament Predictor
• No one-size-suits-all predictor
• Dynamically choose among different predictors
[Figure: the PC feeds Predictor A, Predictor B and Predictor C in parallel; a chooser (metapredictor) selects which prediction to use.]
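One common way to build the chooser is a table of 2-bit counters that tracks which component has been right more often. The Python sketch below is a simplification under stated assumptions: it arbitrates between two predictors rather than the three in the figure, and the class name and counter encoding are hypothetical.

```python
class TwoWayChooser:
    """2-bit-counter chooser between predictors A and B (illustrative sketch)."""

    def __init__(self, index_bits):
        self.mask = (1 << index_bits) - 1
        # Counter values 0..1 select A, 2..3 select B; start weakly at A.
        self.table = [1] * (1 << index_bits)

    def select(self, pc, pred_a, pred_b):
        return pred_b if self.table[pc & self.mask] >= 2 else pred_a

    def update(self, pc, pred_a, pred_b, taken):
        i = pc & self.mask
        a_ok = pred_a == taken
        b_ok = pred_b == taken
        # Move only when exactly one component was right.
        if b_ok and not a_ok:
            self.table[i] = min(self.table[i] + 1, 3)
        elif a_ok and not b_ok:
            self.table[i] = max(self.table[i] - 1, 0)
```

Both component predictors have to be present in hardware at the same time, which is the area cost that the DHLF slides contrast with time multiplexing.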
What is the best predictor?
[Figure: plot with the optimal point marked "Optimal" and an arrow marking the "Better" direction.]
Observations
• Predictor performance depends on history length
• Optimal history length differs from program to program
• Predictors with a fixed history length fall short of their potential
• … so why not a dynamic history length?
Dynamic History-Length Fitting (DHLF)
Intuition
• Tournament predictor
– Picks best out of many predictors
– Spatial multiplexing
– Area cost …
• DHLF: time multiplexing
– Try different history lengths during execution
– Adapt history length to code
– Hope to find the best one
2-Level Predictor Revisited
• Index = f(PC, BHR)
• gshare, f = xor, m < n
• 2-bit saturation counter
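[Figure: the two-level predictor as before. f combines the lower n bits of the PC with the m-bit history from the BHR into an n-bit index into a branch prediction table of 2^n saturation counters. Annotations: "Predetermined" and "Figure out dynamically", the latter being what DHLF does for the history length m.]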
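A sketch of how a gshare index with an adjustable history length might be formed, in Python. It assumes that the m most recent outcomes sit in the low bits of the BHR and are XORed with the low bits of the PC; the exact bit alignment used in the paper is not given on the slide, and the function names are illustrative. The sketch also allows m equal to n.

```python
def gshare_index(pc, bhr, index_bits, history_len):
    """Form an n-bit table index from the PC and the m newest history bits."""
    if not 0 <= history_len <= index_bits:
        raise ValueError("history length must be between 0 and index_bits")
    pc_bits = pc & ((1 << index_bits) - 1)
    # Keep only the m most recent outcomes; m = 0 ignores history entirely.
    hist_bits = bhr & ((1 << history_len) - 1)
    return pc_bits ^ hist_bits


def shift_history(bhr, taken, max_len):
    """Shift the newest outcome into the BHR, keeping max_len bits."""
    return ((bhr << 1) | int(taken)) & ((1 << max_len) - 1)
```

With `history_len` as a runtime parameter instead of a design-time constant, the same table can behave like a PC-indexed bimodal predictor (m = 0) or a full gshare (m = n).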
DHLF Approach
• Current history length
• Best so far length
• Misprediction counter
• Branch counter
• Table of measured misprediction rates per length
– Initialized to zero
• Sampling at fixed intervals (step size)
– Try new length: get its misprediction rate (MR)
– Adjust if worse than best seen before
– Move to a random length if length has not changed for a while
• Avoids local minima
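The bookkeeping listed above can be sketched as a small controller in Python. This is one possible reading, not the paper's exact algorithm. It assumes that "adjust" means moving one step toward the length with the lowest recorded count, that zero-initialized entries therefore attract exploration of untried lengths, and that a hypothetical `stale_limit` parameter decides when to jump to a random length. The warm-up interval is omitted.

```python
import random


class DHLFController:
    """Interval-based history-length search (illustrative sketch)."""

    def __init__(self, max_len, step=16 * 1024, stale_limit=8, seed=0):
        self.max_len = max_len
        self.step = step                  # branches per sampling interval
        self.stale_limit = stale_limit    # hypothetical: intervals before a random jump
        self.table = [0] * (max_len + 1)  # mispredictions measured per length
        self.cur_len = 0                  # current history length
        self.branches = 0                 # branch counter
        self.mispredictions = 0           # misprediction counter
        self.stale = 0
        self.rng = random.Random(seed)

    def best_len(self):
        # Length with the lowest recorded count; untried lengths hold zero.
        return min(range(self.max_len + 1), key=lambda h: self.table[h])

    def record(self, mispredicted):
        self.branches += 1
        self.mispredictions += int(mispredicted)
        if self.branches == self.step:
            self.end_interval()

    def end_interval(self):
        self.table[self.cur_len] = self.mispredictions
        self.branches = 0
        self.mispredictions = 0
        best = self.best_len()
        if self.table[self.cur_len] > self.table[best]:
            # Worse than the best seen: move one step toward it.
            self.cur_len += 1 if best > self.cur_len else -1
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.stale_limit:
                # Unchanged for a while: try a random length.
                self.cur_len = self.rng.randint(0, self.max_len)
                self.stale = 0
```

A predictor would call `record` once per resolved conditional branch and use `cur_len` as the history length when forming the table index.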
DHLF Examples
[Figure: DHLF example runs with index = 12 bits and step = 16K; the optimal history length is marked "Optimal".]
Experimental Methodology
• SPECint95
• gshare and dhlf-gshare
• Trace-driven simulation
• Simulated up to 200M conditional branches
• Branch history register & pattern history table immediately updated with the true outcome
DHLF Performance
• Area overhead
– Index length = 10; step size = 16K; overhead = 7%
– Index length = 16; step size = 16K; overhead = 0.02%
[Figure: performance plot; an arrow marks the "Better" direction.]
Optimization Strategies
• Step size
– Small: learns faster
• Has to be big enough for meaningful misprediction stats
– Big: learns slower
• Change length incrementally
– Test as many lengths as possible
• Warm-up period
– No MR count for 1 interval after length change
Context Switches
• Branch prediction table trashed periodically
• Lower prediction accuracy immediately after a context switch
• Context switch frequency affects optimal history length
Impact on Misprediction Rate
[Figure: misprediction rate of gshare with index = 16 bits; an arrow marks the "Better" direction.]
Context-switch distance: number of branches executed between context switches
Coping with Context Switches
• Upon context switch
– Discard current misprediction counter
– Save current predictor data
• misprediction table
• current history length
• Approx. 221 bits for 16-bit index, step = 16K, 13-bit misprediction counter
• Returning from a context switch
– Warm-up: no MR counter for 1 interval
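The 221-bit figure can be reproduced with simple arithmetic, assuming the saved misprediction table holds one 13-bit count for each candidate history length from 0 up to the index length, and that the prediction table uses 2-bit counters. Both are assumptions about the accounting, and the function names in this Python sketch are illustrative.

```python
def dhlf_table_bits(index_bits, counter_bits=13):
    # One count per history length 0..index_bits (assumed layout).
    return (index_bits + 1) * counter_bits


def pht_bits(index_bits, bits_per_entry=2):
    # 2^n entries of 2-bit saturating counters (assumed).
    return (1 << index_bits) * bits_per_entry


def overhead_percent(index_bits):
    return 100.0 * dhlf_table_bits(index_bits) / pht_bits(index_bits)


print(dhlf_table_bits(16))             # 221
print(round(overhead_percent(10), 1))  # 7.0
print(round(overhead_percent(16), 2))  # 0.17
```

Under these assumptions the table accounts for 17 × 13 = 221 bits with a 16-bit index, and 11 × 13 = 143 bits against a 2048-bit prediction table with a 10-bit index, which matches the 7% overhead quoted under DHLF Performance. The same accounting gives about 0.17% for a 16-bit index, closer to 0.2% than to the 0.02% listed there, so that figure may rest on a different accounting.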
DHLF with Context Switches
[Figure: misprediction rate; an arrow marks the "Better" direction. The × markers show dhlf-gshare with step value = 16K, compared against gshare with all possible history lengths.]
Branch prediction table flushed every 70K instructions to simulate context switches.
Contributions
• Dynamically finds near-optimal history lengths
• Performs well for programs with different branch behaviours
• Performs well under context switches
• Can be applied to any two-level branch predictor
• Small area overhead
Backup Slides
DHLF Performance: SPECint95
[Figure: dhlf-gshare with step size = 16K, compared to gshare with all possible history lengths (no context switches); arrows mark the "Better" direction.]
DHLF with Context Switches
[Figure: dhlf-gshare with step size = 16K and context-switch distance = 70K; arrows mark the "Better" direction.]
dhlf-gskew
[Figure: dhlf-gskew with step value = 16K, compared to gskew with all history lengths; an arrow marks the "Better" direction.]
dhlf-gskew with Context Switch
[Figure: dhlf-gskew with step size = 16K and context-switch distance = 70K; an arrow marks the "Better" direction.]
DHLF Structure
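[Figure: DHLF data structure and control flow. The data structure holds a misprediction table with one entry per history length (0, 1, …, N), a pointer to the entry with the minimum misprediction count, a pointer to the entry for the current history length, a branch counter and a misprediction counter. Starting from the initial history length, each interval runs for "step" dynamic branches. At the end of an interval, if the current misprediction count exceeds the minimum achieved, the history length is adjusted; otherwise the next interval runs with the same length.]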
Questions
• Is fixed context switch distance realistic?
• Does updating the PHT with true branch data immediately affect results?
– Previous studies show little impact due to this