Perceptron Branch Prediction and Its Recent Developments


Perceptron Branch Prediction
and its recent developments
Mostly based on "Dynamic Branch Prediction with Perceptrons"
by Daniel A. Jiménez and Calvin Lin
By Shugen Li
Introduction


With deeper pipelines and faster clock cycles, modern computer architectures increasingly rely on speculation to boost instruction-level parallelism.
Machine learning techniques offer the possibility of further improving performance by increasing prediction accuracy.
Introduction (cont’)

Figure 1. A conceptual system model for branch prediction (adapted from I. K. Chen, J. T. Coffey, and T. N. Mudge, "Analysis of branch prediction via data compression").
Introduction (cont’)


We can improve accuracy by replacing these traditional predictors with neural networks, which provide good predictive capabilities.
The perceptron is one of the simplest possible neural networks: easy to understand, simple to implement, and it has several attractive properties.
Why perceptrons ?



The major benefit of perceptrons is that by examining their weights, i.e., the correlations that they learn, it is easy to understand the decisions that they make.
For many neural networks it is difficult or impossible to determine exactly how the network is making its decisions.
A perceptron's decision-making process, by contrast, is easy to understand as the result of a simple mathematical formula.
Perceptrons Model



Inputs x1 … xn are the bits of the global branch history shift register, encoded as 1 for taken and -1 for not taken.
Weights w0 … wn form the weights vector; w0 is the bias weight.
The output of the perceptron is y = w0 + x1·w1 + … + xn·wn; y ≥ 0 means the prediction is taken, otherwise not taken (see the sketch below).
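The formula above is simple enough to capture directly in a few lines. A minimal sketch in Python (function names are my own; a real predictor computes this in hardware):

```python
# Minimal sketch of the perceptron model above. Inputs use the
# +1 (taken) / -1 (not taken) encoding; w[0] is the bias weight.

def perceptron_output(w, x):
    """y = w0 + x1*w1 + ... + xn*wn"""
    y = w[0]
    for wi, xi in zip(w[1:], x):
        y += wi * xi
    return y

def predict_taken(w, x):
    """A non-negative output predicts taken."""
    return perceptron_output(w, x) >= 0
```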
Perceptrons training

Let the branch outcome t be -1 if the branch was not taken, or 1 if it was taken, and let θ be the threshold, a parameter to the training algorithm used to decide when enough training has been done.
Whenever the prediction disagrees with t, or |y| ≤ θ, each weight is updated as wi := wi + t·xi (with x0 = 1 for the bias weight); a sketch follows below.
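A minimal sketch of this update rule, following Jiménez and Lin's training algorithm (variable names are illustrative):

```python
# Sketch of the perceptron training rule: update only when the
# prediction was wrong or the output magnitude is within theta.

def train(w, x, t, theta):
    """t is +1 (taken) or -1 (not taken); x[i] is +1 or -1."""
    y = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    mispredicted = (y >= 0) != (t == 1)
    if mispredicted or abs(y) <= theta:
        w[0] += t                   # bias input x0 is implicitly 1
        for i, xi in enumerate(x):
            w[i + 1] += t * xi      # strengthen agreeing history bits
```

Jiménez and Lin report that θ = ⌊1.93h + 14⌋, where h is the history length, gives the best accuracy.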
These two slides and their figures are adapted from F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms.
Perceptrons limitation


A perceptron is only capable of learning linearly separable functions.
This means a perceptron can learn the logical AND of two inputs, but not the exclusive-OR; the sketch below demonstrates this.
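A tiny self-contained demonstration of this limit, reusing the training rule sketched earlier: training converges on AND but can never classify all four XOR cases.

```python
# Demonstration: a perceptron learns AND but not XOR.

def train(w, x, t, theta=2):
    y = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    if (y >= 0) != (t == 1) or abs(y) <= theta:
        w[0] += t
        for i, xi in enumerate(x):
            w[i + 1] += t * xi

def learns(target, epochs=200):
    w = [0, 0, 0]
    cases = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    for _ in range(epochs):
        for x in cases:
            train(w, x, target(*x))
    return all((w[0] + w[1] * a + w[2] * b >= 0) == (target(a, b) == 1)
               for a, b in cases)

AND = lambda a, b: 1 if a == 1 and b == 1 else -1
XOR = lambda a, b: 1 if a != b else -1
print(learns(AND))   # True: AND is linearly separable
print(learns(XOR))   # False: XOR is not
```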
Predictor block Diagram
Experimental result




The predictor was evaluated on the SPEC2000 integer benchmarks and compared with gshare and bi-mode, and also with a hybrid gshare/perceptron predictor.
Its main strength is its ability to make use of much longer history lengths than traditional two-level schemes.
It does especially well when the branch being predicted exhibits linearly separable behavior.
Performance
Implementation

Computing the Perceptron Output.




Computing a full dot product is not needed. Instead, simply add the weight when the input bit is 1 and subtract it (add the two's complement) when the input bit is -1.
This is similar to the computation performed by multiplication circuits, which must find the sum of partial products that are each a function of an integer and a single bit.
Furthermore, only the sign bit of the result is needed to make a prediction, so the other bits of the output can be computed more slowly without making the prediction wait. A sketch follows below.
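A sketch of the same computation written the way the hardware performs it, with Boolean history bits standing in for the ±1 encoding:

```python
# The dot product collapses to conditional add/subtract because
# every input is +1 or -1.

def perceptron_output_addsub(w, taken_bits):
    """taken_bits[i] is True for a taken history bit, False otherwise."""
    y = w[0]                             # bias weight
    for wi, taken in zip(w[1:], taken_bits):
        y = y + wi if taken else y - wi  # no multiplier needed
    return y

# Only the sign is needed for the prediction itself:
print(perceptron_output_addsub([1, -2, 3], [True, False]) >= 0)
```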
Implementation (cont’)

Training
Limitations


Delay: the prediction latency is still large, even with the simplified computation.
Low accuracy on branches that are not linearly separable.
Aliasing and hardware cost.
Recent development (1)
Low-Power Perceptrons (selective weights)
By Kaveh Aasaraai and Amirali Baniasadi



Non-Effective (NE): these weights have a sign opposite to the sign of the dot-product value. We refer to the summation of the NEs as NE-SUM.
Semi-Effective (SE): weights having the sign of the dot-product value, but with an absolute value less than NE-SUM.
Highly-Effective (HE): weights having the same sign as the dot-product value and an absolute value greater than NE-SUM.
(A classification sketch follows below.)
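A hedged sketch of the NE/SE/HE classification: since inputs are ±1, the signed contribution wi·xi carries each weight's magnitude, so the sketch classifies contributions; the paper's exact hardware mechanism may differ.

```python
# Label each weight's contribution as NE, SE, or HE relative to the
# final dot-product sign, as defined on this slide.

def classify(weights, inputs):
    contribs = [w * x for w, x in zip(weights, inputs)]
    sign = 1 if sum(contribs) >= 0 else -1
    ne_sum = sum(abs(c) for c in contribs if c * sign < 0)  # NE-SUM
    labels = []
    for c in contribs:
        if c * sign < 0:
            labels.append("NE")  # opposes the output sign
        elif abs(c) < ne_sum:
            labels.append("SE")  # agrees, but smaller than NE-SUM
        else:
            labels.append("HE")  # agrees and at least NE-SUM
    return labels

print(classify([3, -1, 4], [1, 1, 1]))  # ['HE', 'NE', 'HE']
```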
Recent development (2)
The Combined Perceptron Branch Predictor
By Matteo Monchiero and Gianluca Palermo

The predictor consists of two concurrent perceptron-like neural networks: one uses branch history information as its inputs, the other uses program counter bits. A combination sketch follows below.
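A minimal sketch of the idea, assuming the two perceptron outputs are simply summed and the sign of the total gives the prediction (see the paper for the exact combination logic):

```python
# Two concurrent perceptrons: one fed by global history bits,
# one fed by program counter bits (both in the +1/-1 encoding).

def perceptron(w, x):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def combined_predict(w_hist, hist_bits, w_pc, pc_bits):
    total = perceptron(w_hist, hist_bits) + perceptron(w_pc, pc_bits)
    return total >= 0    # taken iff the combined output is non-negative
```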
Recent development (3)
Path-based neural prediction
By Daniel A. Jiménez




In an N-branch path-based neural predictor, the prediction for a branch is initiated N branches ahead of time.
The predictions for the next N branches are computed in parallel.
A row of N counters is read using the current instruction block address. On blocks containing a branch, one of the counters read is added to each of the N partial sums.
The effective delay is the perceptron table read delay followed by a single multiply-add delay.
Caveats: this figure does not account for the table read delay on the critical path, nor for the misprediction penalty. A sketch of the partial-sum pipeline follows below.
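A rough sketch of the partial-sum pipeline (indexing and misprediction-recovery details follow the paper, not this sketch): each branch's output combines counters read along the path of the previous blocks, so at prediction time only one add remains.

```python
# Sketch of an h-deep partial-sum pipeline for path-based prediction.

class PathBasedSketch:
    def __init__(self, h, rows):
        self.h = h
        self.W = [[0] * (h + 1) for _ in range(rows)]  # rows of counters
        self.SR = [0] * (h + 1)                        # in-flight partial sums

    def predict(self, pc):
        row = self.W[pc % len(self.W)]
        y = self.SR[self.h] + row[0]       # oldest sum + this branch's bias
        sign = 1 if y >= 0 else -1
        new_SR = [0] * (self.h + 1)
        # this block's remaining counters join the sums of the next h branches
        for j in range(1, self.h + 1):
            new_SR[self.h - j + 1] = self.SR[self.h - j] + sign * row[j]
        self.SR = new_SR                   # new_SR[0] starts a fresh sum at 0
        return y >= 0
```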
Recent development (4)
Revisiting the perceptron predictor
By A. Seznec

The accuracy of perceptron predictors is further improved with the following extensions:

using pseudo-tags to reduce the impact of aliasing,
skewing the perceptron weight tables to improve table utilization,
introducing redundant history to handle linearly inseparable data sets.
The nonlinear redundant history also leads to a more efficient representation of the perceptron weights, Multiply-Add Contributions (MAC).
The cost is increased hardware complexity.
Recent development (5)
The O-GEometric History Length (O-GEHL) branch predictor
By A. Seznec





The GEHL predictor features M distinct predictor tables Ti.
The predictor tables store predictions as signed saturated counters.
A single counter C(i) is read from each predictor table Ti (1 ≤ i ≤ M).
The prediction is computed as the sign of the sum of the M counters: S = C(1) + C(2) + … + C(M).
The prediction is taken when S is zero or positive, and not taken when S is negative.
Recent development (5) (cont')
The O-GEometric History Length (O-GEHL) branch predictor
By A. Seznec



The history lengths L(i) used in the indexing functions for the tables Ti form a geometric series, L(i) = α^(i-1) × L(1), which gives the predictor its name.
The counters in the Ti tables are easy to train, with a rule similar to the perceptron predictor's, giving low hardware cost and good latency. A sketch follows below.
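A hedged sketch of the O-GEHL prediction path putting the two equations together; the hash function here is an illustrative placeholder, not the paper's indexing function, and the parameter values are made up for the example.

```python
# O-GEHL sketch: M tables of signed counters, indexed with geometrically
# increasing history lengths; predict the sign of the counter sum S.

M, L1, ALPHA, BITS = 8, 2, 2.0, 10
LENGTHS = [0] + [int(L1 * ALPHA ** j) for j in range(M - 1)]  # 0,2,4,...,128
tables = [[0] * (1 << BITS) for _ in range(M)]

def index(pc, ghist, length, i):
    """Placeholder hash of pc with the most recent `length` history bits."""
    h = pc ^ (i << 3)
    for b in ghist[:length]:
        h = ((h * 2) ^ b) % (1 << BITS)
    return h

def gehl_predict(pc, ghist):
    s = sum(tables[i][index(pc, ghist, LENGTHS[i], i)] for i in range(M))
    return s >= 0   # taken when S is zero or positive
```

Training (not shown) applies a saturating update toward the outcome on all M selected counters whenever the prediction is wrong or |S| falls below an update threshold, much like the perceptron rule.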
Conclusion

The perceptron predictor is attractive because it can use long history lengths without requiring exponential resources.


Its weakness is the increased computational complexity and the resulting latency and hardware cost.
Being a new idea, it can be combined with traditional methods to obtain better performance.
Several methods are being developed to reduce the latency and to handle mispredictions.
This technology will become more practical as hardware costs continue to fall quickly.
There is plenty of room for further development.
Reference








[1] D. Jiménez and C. Lin, "Dynamic branch prediction with perceptrons", Proc. of the 7th Int. Symp. on High Performance Computer Architecture (HPCA-7), 2001.
[2] D. Jiménez and C. Lin, "Neural methods for dynamic branch prediction", ACM Trans. on Computer Systems, 2002.
[3] A. Seznec, "Revisiting the perceptron predictor", Technical Report, IRISA, 2004.
[4] A. Seznec, "An optimized 2bcgskew branch predictor", Technical Report, IRISA, Sep. 2003.
[5] G. Loh, "The frankenpredictor", in The 1st JILP Championship Branch Prediction Competition (CBP-1), 2004.
[6] K. Aasaraai and A. Baniasadi, "Low-power Perceptrons".
[7] A. Seznec, "The O-GEometric History Length branch predictor".
[8] M. Monchiero and G. Palermo, "The Combined Perceptron Branch Predictor".
[9] F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Spartan, 1962.
Thank You!
Questions?