Perceptron Branch Prediction and Its Recent Developments
Mostly based on "Dynamic Branch Prediction with Perceptrons" by Daniel A. Jiménez and Calvin Lin
By Shugen Li


Page 1:

Perceptron Branch Prediction and Its Recent Developments

Mostly based on "Dynamic Branch Prediction with Perceptrons"

Daniel A. Jiménez and Calvin Lin

By Shugen Li

Page 2:

Introduction

As technology trends toward deeper pipelines and faster clock cycles, modern computer architectures increasingly rely on speculation to boost instruction-level parallelism.

Machine learning techniques offer the possibility of further improving performance by increasing prediction accuracy.

Page 3:

Introduction (cont'd)

Figure 1. A conceptual system model for branch prediction. Adapted from I. K. Chen, J. T. Coffey, and T. N. Mudge, "Analysis of branch prediction via data compression".

Page 4:

Introduction (cont'd)

We can improve accuracy by replacing these traditional predictors with neural networks, which provide good predictive capabilities.

The perceptron is one of the simplest possible neural networks: it is easy to understand, simple to implement, and has several attractive properties.

Page 5:

Why perceptrons?

The major benefit of perceptrons is that by examining their weights, i.e., the correlations that they learn, it is easy to understand the decisions that they make.

For many neural networks it is difficult or impossible to determine exactly how the network arrives at its decision.

A perceptron's decision-making process, in contrast, is easy to understand as the result of a simple mathematical formula.

Page 6:

Perceptron Model

The inputs x1 ... xn are the bits of the global branch history shift register.

w0 ... wn is the weights vector.

y is the output of the perceptron (the formula is written out below); y > 0 means the branch is predicted taken, otherwise it is predicted not taken.
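For reference, the output described above is the perceptron dot product, with an implicit bias input x_0 = 1:

    y = w_0 + \sum_{i=1}^{n} x_i w_i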

Page 7:

Perceptron Training

Let the branch outcome t be -1 if the branch was not taken, or 1 if it was taken, and let θ be the threshold, a parameter to the training algorithm used to decide when enough training has been done.
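A minimal software sketch of this training rule, assuming the update used in the Jiménez and Lin paper [1] (add t times each input to the corresponding weight when the prediction was wrong or |y| did not exceed the threshold); the function and variable names are illustrative:

    # Behavioral sketch only, not the hardware design.
    # history[i] is +1 if that older branch was taken, -1 otherwise.
    # weights[0] is the bias weight (its input is implicitly +1).

    def predict(weights, history):
        return weights[0] + sum(w * x for w, x in zip(weights[1:], history))

    def train(weights, history, t, theta):
        # t = +1 if the branch was actually taken, -1 if not taken.
        y = predict(weights, history)
        mispredicted = (y >= 0) != (t == 1)
        if mispredicted or abs(y) <= theta:
            weights[0] += t
            for i, x in enumerate(history):
                weights[i + 1] += t * x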

These two pages and figures are adapted from F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms.

Page 8:

Perceptron Limitations

A perceptron is only capable of learning linearly separable functions. This means it can learn the logical AND of two inputs, but not the exclusive-OR (a short worked check follows).
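A quick check of this claim (worked out here, not taken from the slides), with inputs encoded as -1/+1: AND is learnable with w_1 = w_2 = 1 and bias w_0 = -1, since y = -1 + x_1 + x_2 is positive only when both inputs are +1. XOR is not learnable: the two "false" cases require w_0 + w_1 + w_2 <= 0 and w_0 - w_1 - w_2 <= 0, which together give w_0 <= 0, while the two "true" cases require w_0 + w_1 - w_2 > 0 and w_0 - w_1 + w_2 > 0, which together give w_0 > 0, a contradiction.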

Page 9:

Predictor Block Diagram

Page 10:

Experimental Results

The evaluation uses the SPEC2000 integer benchmarks and compares the perceptron predictor with gshare and bi-mode, and also with a hybrid gshare/perceptron predictor.

The perceptron predictor benefits from its ability to make use of longer history lengths, and it does well when the branch being predicted exhibits linearly separable behavior.

Page 11:

The perceptron predictor can use much longer history lengths than traditional two-level schemes.

Page 12:

Performance

Page 13:

Implementation: Computing the Perceptron Output

There is no need to compute a full dot product. Instead, simply add the weight when the input bit is 1 and subtract it (add its two's complement) when the input bit is -1. This is similar to the work performed by multiplication circuits, which must find the sum of partial products that are each a function of an integer and a single bit; a small behavioral sketch is given below.

Furthermore, only the sign bit of the result is needed to make a prediction, so the other bits of the output can be computed more slowly without having to wait for a prediction.
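A small behavioral sketch of the add/subtract formulation above (illustrative only; hardware would sum the terms with an adder tree rather than a loop):

    def perceptron_output(weights, history):
        # x = +1 contributes +w, x = -1 contributes -w (its two's complement),
        # so no multiplier is needed.
        y = weights[0]  # bias weight
        for w, x in zip(weights[1:], history):
            y += w if x == 1 else -w
        return y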

Page 14:

Implementation (cont'd): Training

Page 15:

Limitations

Delay: large latency, even with the simplified computation method.

Low performance on branches that are not linearly separable.

Aliasing and hardware cost.

Page 16:

Recent development (1): Low-Power Perceptrons (selective weights), by Kaveh Aasaraai and Amirali Baniasadi. The weights are divided into three classes (a small classification sketch follows the definitions):

Non-Effective (NE): weights whose sign is opposite to the sign of the dot-product value. The summation of the NE weights is referred to as NE-SUM.

Semi-Effective (SE): weights having the sign of the dot-product value, but with an absolute value less than NE-SUM.

Highly-Effective (HE): weights having the same sign as the dot-product value and an absolute value greater than NE-SUM.
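A literal transcription of these definitions into a helper function (illustrative; the original paper may classify the signed contributions w_i * x_i rather than the raw weights, and the handling of the equality case is an assumption):

    def classify_weights(weights, dot_product):
        sign = 1 if dot_product >= 0 else -1
        ne = [w for w in weights if w * sign < 0]   # opposite sign to the sum
        ne_sum = abs(sum(ne))                       # NE-SUM
        se = [w for w in weights if w * sign > 0 and abs(w) < ne_sum]
        he = [w for w in weights if w * sign > 0 and abs(w) >= ne_sum]
        return ne, se, he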

Page 17:

Recent development (2): The Combined Perceptron Branch Predictor, by Matteo Monchiero and Gianluca Palermo

The predictor consists of two concurrent perceptron-like neural networks: one uses branch history information as its inputs, the other uses program counter bits.
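A rough sketch of the idea; how the two outputs are combined is an assumption here (a simple sum), and the paper's exact combination logic may differ:

    def dot(weights, inputs):
        # inputs are encoded as +1 / -1; weights[0] is the bias.
        return weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))

    def combined_predict(hist_weights, history, pc_weights, pc_bits):
        # One perceptron sees global history bits, the other program counter
        # bits; predict taken when the combined output is non-negative.
        return dot(hist_weights, history) + dot(pc_weights, pc_bits) >= 0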

Page 18:

Recent development (3): Path-Based Neural Prediction, by Daniel A. Jiménez

In an N-branch path-based neural predictor, the prediction for a branch is initiated N branches ahead, and the predictions for the next N branches are computed in parallel.

A row of N counters is read using the current instruction block address. On blocks containing a branch, one of the counters read is added to each of the N partial sums.

The delay is the perceptron table read delay followed by a single multiply-add delay.

Limitations: this does not account for the table read delay, and the misprediction penalty is also a concern.

Page 19:

Recent development (4): Revisiting the Perceptron Predictor, by A. Seznec

The accuracy of perceptron predictors is further improved with the following extensions: using pseudo-tags to reduce the impact of aliasing, skewing the perceptron weight tables to improve table utilization, and introducing redundant history to handle linearly inseparable data sets.

The non-linear redundant history also leads to a more efficient representation of the perceptron weights, Multiply-Add Contributions (MAC).

The drawback is increased hardware complexity.

Page 20:

Recent development (5): The O-GEometric History Length (O-GEHL) branch predictor, by A. Seznec

The GEHL predictor features M distinct predictor tables Ti.

The predictor tables store predictions as signed saturating counters.

A single counter C(i) is read from each predictor table Ti (1 ≤ i ≤ M).

The prediction is computed as the sign of the sum S of the M counters C(i), as given by the first equation (reconstructed below).

The prediction is taken when S is positive or null and not taken when S is negative.
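The "first equation" is not reproduced in this transcript; it is reconstructed here from the slide's own description (the paper's exact formula may add a small constant term to the sum):

    S = \sum_{i=1}^{M} C(i), \qquad \text{prediction} = \text{taken} \iff S \ge 0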

Page 21:

Recent development (5) (cont'd): The O-GEometric History Length branch predictor, by A. Seznec

The history lengths used in the indexing functions for the tables Ti are given by the second equation (reconstructed below); they form a geometric series.
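The "second equation" is likewise missing from the transcript; the geometric series implied by the predictor's name is sketched here, with the exact rounding being an assumption:

    L(i) = \alpha^{\,i-1} \times L(1) \ \text{(rounded to an integer)}, \qquad i = 1, \ldots, M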

The entries of all the T(i) tables are easy to train, in a way similar to the weights in the perceptron predictor.

Low hardware cost and better latency.

Page 22:

Conclusion

The perceptron predictor is attractive because it can use long history lengths without requiring exponential resources.

Its weakness is the increased computational complexity, with the latency and hardware cost that follow from it.

As a new idea, it can be combined with traditional methods to obtain better performance.

Several methods are being developed to reduce the latency and handle mispredictions.

Finally, this technology will become more practical as hardware costs continue to fall, so there should be plenty of room for further development.

Page 23:

References

[1] D. Jiménez and C. Lin, "Dynamic branch prediction with perceptrons", Proc. of the 7th Int. Symp. on High-Performance Computer Architecture (HPCA-7), 2001.

[2] D. Jiménez and C. Lin, "Neural methods for dynamic branch prediction", ACM Trans. on Computer Systems, 2002.

[3] A. Seznec, "Revisiting the perceptron predictor", Technical Report, IRISA, 2004.

[4] A. Seznec, "An optimized 2bcgskew branch predictor", Technical Report, IRISA, Sep. 2003.

[5] G. Loh, "The frankenpredictor", The 1st JILP Championship Branch Prediction Competition (CBP-1), 2004.

[6] K. Aasaraai and A. Baniasadi, "Low-power Perceptrons".

[7] A. Seznec, "The O-GEometric History Length branch predictor".

[8] M. Monchiero and G. Palermo, "The Combined Perceptron Branch Predictor".

[9] F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Spartan, 1962.

Page 24:

Thank You!

Questions?