Evaluation of Dynamic Branch Prediction Schemes in a MIPS Pipeline
Debajit Bhattacharya
Ali JavadiAbhari
ELE 475 Final Project
9th May, 2012
Outline
Motivation
Branch Prediction
Simulation Setup & Testing Methodology
Dynamic Branch Prediction
  Single Bit Saturating Counter
  Two Bit Saturating Counter
  Two Level Local Branch History & Single Bit Prediction
  Two Level Local Branch History & Two Bit Prediction
Comparison of Performances
Conclusion
Future Work
Why Branch Prediction?
Branches (conditional and unconditional) redirect the stream of instructions, which results in dead cycles in the front end
Branch cost increases with:
  Super-pipelining: delays the branch resolution, e.g. Pentium 3 and Pentium 4 have 10- and 20-cycle penalties respectively
  Superscalar issue: multiplies the dead instructions, e.g. a 6-stage MIPS pipe has 3 and 7 dead instructions in its one-way and two-way implementations respectively
Branch Prediction
Minimizes the dead cycles generated by a “taken” branch
Essential in modern processors to restore the IPC
Two components of prediction:
  Direction/outcome of the branch (applies to conditional branches only)
  Target of the branch (applies to all branches)
Simulation Setup & Testing Methodology
5 Stage MIPS pipeline
Parcv2 instruction set
Pv2byp – configuration from Lab
Own Assembly Test
Micro-benchmarks from Lab:
  Vector-vector Add
  Complex Multiply
  Binary Search
  Masked Filter
Pv2Byp Pipeline
Target address of J and JAL known at D stage
Target address of JR and JALR known at X stage
Branch direction/outcome known at X stage
F D X M W
Dynamic Branch Prediction
Performance = f(accuracy, cost of misprediction)
One Level Predictor (Bimodal Prediction):
  Branch History Table
  Branch Target Buffer
Two Level Predictor:
  Branch History Register Table
  Pattern History Table
  Branch Target Buffer
All the tables are read at the F stage for prediction
All the tables are written in either the D or the X stage (depending on the resolution of the branch and the correctness of the prediction)
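The relation Performance = f(accuracy, cost of misprediction) can be made concrete with a simple additive CPI model. This is only a sketch under an assumed model, not a formula from the slides: the function name `effective_cpi`, its parameters and the numbers used are hypothetical and chosen for illustration.

```python
def effective_cpi(base_cpi, branch_fraction, accuracy, penalty_cycles):
    """Assumed model: every mispredicted branch adds `penalty_cycles` dead cycles."""
    mispredict_rate = 1.0 - accuracy
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Illustrative numbers only (not measurements from this project):
# 20% branches, 2 dead cycles per mispredict, two different accuracies.
for accuracy in (0.5, 0.9):
    print(accuracy, effective_cpi(1.0, 0.2, accuracy, 2))
```

Under this model, a higher accuracy or a lower misprediction cost both move the CPI back towards its base value.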
Hardware Description
BHT
  Indexed by the lower <bht_IndexSize> bits of PC
  Holds the prediction bit(s) (1 or 2)
BHR
  Indexed by the lower <bhr_IndexSize> bits of PC
  Holds the local branch history (<pht_IndexSize> bits)
PHT
  Indexed by the entries of the BHR (<pht_IndexSize> bits)
  Holds the prediction bit(s)
BTB
  Indexed by the lower <btb_IndexSize> bits of PC
  Holds the rest of the bits of PC as tag
  Holds the branch target PC
  Holds a valid bit for the two level predictor
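Hardware Description
[Figure: BHT and BTB organization. The BHT is indexed by PC[bht_IndexSize+1:2] and holds the Predict Bits. The BTB is indexed by PC[btb_IndexSize+1:2] and holds the Valid, Tag and Target fields. The stored Tag is compared with PC[31:btb_IndexSize+2] to generate BTBHit.]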
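The index and tag bit slices used by the BHT and BTB can be sketched in software as follows. This is a behavioral illustration only, assuming a 32-bit PC whose two lowest bits are the byte offset; the class and function names are hypothetical, and an invalid entry is modeled as `None` in place of a valid bit.

```python
def table_index(pc, index_size):
    # PC[index_size+1 : 2]: skip the two byte-offset bits, keep index_size bits.
    return (pc >> 2) & ((1 << index_size) - 1)


def btb_tag(pc, btb_index_size):
    # PC[31 : btb_index_size+2]: the remaining upper bits of a 32-bit PC.
    return (pc & 0xFFFFFFFF) >> (btb_index_size + 2)


class BTB:
    def __init__(self, index_size):
        self.index_size = index_size
        # Each entry is (tag, target); None models a cleared valid bit.
        self.entries = [None] * (1 << index_size)

    def lookup(self, pc):
        """Return the predicted target on a BTB hit, else None."""
        entry = self.entries[table_index(pc, self.index_size)]
        if entry is not None and entry[0] == btb_tag(pc, self.index_size):
            return entry[1]
        return None

    def update(self, pc, target):
        index = table_index(pc, self.index_size)
        self.entries[index] = (btb_tag(pc, self.index_size), target)


btb = BTB(index_size=4)
btb.update(0x1000, 0x2000)
print(btb.lookup(0x1000))  # hit: same index, same tag
print(btb.lookup(0x1040))  # same index, different tag: miss
```

Two PCs that share the lower index bits map to the same entry, so the tag comparison is what distinguishes a real hit from an alias.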
One Bit Saturating Counter
Exploits Temporal Correlation between two states – T and NT
Always two mispredicts in a backward branch loop
[Figure: state diagram with two states, Predict T and Predict NT; transitions are labeled T and NT.]
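The two mispredicts per loop can be reproduced with a small behavioral model of a one-bit predictor. This is a sketch, not the project's hardware description: the function name `run_one_bit` and the 10-iteration loop are assumptions made for illustration.

```python
def run_one_bit(outcomes, initial_prediction=True):
    """Count mispredicts of a one-bit predictor that stores the last outcome."""
    prediction = initial_prediction
    mispredicts = 0
    for taken in outcomes:
        if prediction != taken:
            mispredicts += 1
        prediction = taken  # the single bit always follows the latest outcome
    return mispredicts


# Backward branch of a 10-iteration loop: taken 9 times, then not taken once.
loop = [True] * 9 + [False]

# The loop is executed three times in a row (e.g. as an inner loop).
print(run_one_bit(loop * 3))
```

After the first execution, every further execution of the loop mispredicts twice: once at loop exit and once on the first iteration of the next entry, because the stored bit still says not taken.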
Two bit Saturating Counter
Needs two consecutive T/NT to change prediction state
Tolerates one branch going unusual direction, still predicts next branch correctly
Works better than One bit Counter in a nested loop
[Figure: state diagram with four states: Strong Taken and Weak Taken (both Predict T), Weak Not Taken and Strong Not Taken (both Predict NT); transitions are labeled T and NT.]
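The tolerance to a single unusual outcome can be illustrated with a plain two-bit saturating counter. This is a sketch under assumptions: the state encoding (0 = Strong Not Taken up to 3 = Strong Taken), the function name `run_two_bit` and the 10-iteration loop are hypothetical, and the exact transitions of the slide's state diagram are not recoverable from the transcript.

```python
def run_two_bit(outcomes, state=3):
    """Count mispredicts of a two-bit saturating counter.

    Assumed encoding: 0 = Strong NT, 1 = Weak NT, 2 = Weak T, 3 = Strong T.
    """
    mispredicts = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            mispredicts += 1
        # Saturating increment on taken, saturating decrement on not taken.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return mispredicts


# Backward branch of a 10-iteration loop, executed three times in a row.
loop = [True] * 9 + [False]
print(run_two_bit(loop * 3))
```

Here the loop exit only moves the counter from Strong Taken to Weak Taken, so the first iteration of the next execution is still predicted taken and only the exit itself is mispredicted.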
Two level Branch Predictor [Yeh & Patt, ’92]
Many branches execute repetitive patterns
Local/Current branch history patterns
Requires Initial settling of counter values
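[Figure: two level predictor. The BHR entry selected by the PC holds a history pattern (e.g. 111...01) that is used as the index into the PHT (entries 000..00 to 111..11). The indexed Pattern History Bit(s) give the Prediction Bit, and the FSM Logic updates them using the Branch Result from the X stage.]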
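A behavioral sketch of the two level local scheme with two-bit pattern history counters is given below. It is an illustration under assumptions, not the project's implementation: the class name, the initial counter value (Weak Not Taken), the zero-initialized histories and the taken-taken-not-taken test pattern are all hypothetical.

```python
class TwoLevelLocalPredictor:
    def __init__(self, bhr_index_size, pht_index_size):
        self.bhr_index_size = bhr_index_size
        self.pht_index_size = pht_index_size
        self.bhr = [0] * (1 << bhr_index_size)  # local histories, one per entry
        self.pht = [1] * (1 << pht_index_size)  # 2-bit counters, assumed Weak NT

    def _bhr_index(self, pc):
        return (pc >> 2) & ((1 << self.bhr_index_size) - 1)

    def predict(self, pc):
        history = self.bhr[self._bhr_index(pc)]
        return self.pht[history] >= 2  # True means predict taken

    def update(self, pc, taken):
        index = self._bhr_index(pc)
        history = self.bhr[index]
        counter = self.pht[history]
        self.pht[history] = min(counter + 1, 3) if taken else max(counter - 1, 0)
        # Shift the new outcome into the history, keeping pht_index_size bits.
        mask = (1 << self.pht_index_size) - 1
        self.bhr[index] = ((history << 1) | int(taken)) & mask


def count_mispredicts(predictor, pc, outcomes):
    mispredicts = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            mispredicts += 1
        predictor.update(pc, taken)
    return mispredicts


pattern = [True, True, False] * 20  # repeating T, T, NT
for history_bits in (1, 2):
    predictor = TwoLevelLocalPredictor(bhr_index_size=4, pht_index_size=history_bits)
    count_mispredicts(predictor, 0x1000, pattern)  # initial settling of counters
    print(history_bits, count_mispredicts(predictor, 0x1000, pattern))
```

With two history bits, each history value is followed by a single outcome in this pattern, so the settled predictor stops mispredicting; with one history bit, the history "taken" is followed by both outcomes and mispredicts persist.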
Comparison of Performance
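[Figure: bar chart comparing the schemes Fall Thru, 1-level 1-bit, 1-level 2-bit, 2-level 1-bit and 2-level 2-bit on the benchmarks complex multiply, vector addition, binary search, masked filter, 10 loop, 50 loop and nested loop; vertical axis from 0 to 1.2. The bar values are not recoverable from the transcript.]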
Effect of BTB Size
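[Figure: bar chart for the 1 Level 2 Bit predictor with series 2bit, 8bit and 16bit on the benchmarks complex multiply, vector addition, binary search, masked filter, 50 loop and nested loop; vertical axis from 0 to 1.2. The bar values are not recoverable from the transcript.]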
Effect of PHT Size
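[Figure: chart for the 2 Level 2 Bit predictor with series 4bit, 8bit and 16bit; vertical axis from 0 to 1. The plotted values are not recoverable from the transcript.]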
Conclusion
Larger predictor size: higher hardware cost, better prediction accuracy
Larger BHTs with smaller BTBs: reduces hardware cost and reuses branch history even if the entry is not present in the BTB
Smaller BHTs: multiple branches alias to the same entry, which degrades prediction
Once all branches reach a unique BHT entry, accuracy saturates
BHR width must capture the repetitive pattern in the two level predictor; otherwise it performs worse than the bimodal scheme
Future Work
Global Branch Prediction – Data dependent correlation – nested loops
Gshare and Gselect
Extending to two way superscalar – Pv2ssc
Thank You!
Q & A
Backup Slides