Dynamic Branch Prediction (Sec 4.3)

• Control dependences become a limiting factor in exploiting ILP
• So far, we've discussed only static branch prediction schemes
• Here, we discuss using hardware to predict branch outcomes dynamically
• The effectiveness of a branch prediction scheme depends on
  – its prediction accuracy
  – its cost when the prediction is correct and when it is incorrect
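One way to make "accuracy" and "cost" concrete is to charge each branch its expected penalty. The sketch below assumes a base CPI of 1 and treats branches as the only source of stalls. The helper name `effective_cpi` and the sample numbers (20% branches, 90% accuracy, 0 cycles when correct, 2 cycles when wrong) are hypothetical, chosen only for illustration.

```python
def effective_cpi(branch_freq, accuracy, cost_correct, cost_incorrect, base_cpi=1.0):
    """Hypothetical helper: CPI when branches are the only source of stalls.

    Each branch costs cost_correct cycles with probability `accuracy`
    and cost_incorrect cycles otherwise.
    """
    expected_branch_cost = (accuracy * cost_correct
                            + (1.0 - accuracy) * cost_incorrect)
    return base_cpi + branch_freq * expected_branch_cost


# Illustrative numbers only: 20% branches, 90% accuracy, 2-cycle misprediction cost.
print(effective_cpi(0.20, 0.90, 0, 2))  # roughly 1.04
```

The branch term scales with branch frequency, so the same accuracy costs more in branch-heavy code.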
Branch Prediction Buffer
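• In its simplest form, the buffer is a small memory in which each entry holds one bit, the prediction bit, saying whether the branch was recently taken or not
• The memory is indexed by the low-order bits of the branch instruction's address, so different branches with the same low-order bits share one entry
• Fetching begins in the predicted direction
• If the prediction is wrong, the prediction bit is inverted
• The simple one-bit scheme has performance shortcomings: a loop branch that is almost always taken is still mispredicted twice per execution of the loop, once on exit and once on the first iteration of the next execution (example on page 263)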
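The one-bit scheme and its loop weakness can be sketched in a few lines. This is a minimal model, assuming a 16-entry buffer, 4-byte instructions (so the two byte-offset bits are dropped before indexing), and entries that start as "not taken"; the class, helper, and the address 0x40 are hypothetical.

```python
# Sketch of a 1-bit branch prediction buffer (assumed: 16 entries, 4-byte instructions).
class OneBitPredictor:
    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.table = [False] * (1 << index_bits)  # False = predict not taken

    def _index(self, pc):
        # Drop the 2 byte-offset bits, then keep the low-order address bits.
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)]

    def update(self, pc, taken):
        # A wrong prediction inverts the bit; a correct one leaves it alone.
        # Either way the bit ends up equal to the actual outcome.
        self.table[self._index(pc)] = taken


def misses_per_run(predictor, pc, run, repeats):
    """Count mispredictions for each of `repeats` executions of `run`."""
    result = []
    for _ in range(repeats):
        misses = 0
        for taken in run:
            if predictor.predict(pc) != taken:
                misses += 1
            predictor.update(pc, taken)
        result.append(misses)
    return result


# A loop branch taken 9 times and then not taken on exit.
loop = [True] * 9 + [False]
print(misses_per_run(OneBitPredictor(), 0x40, loop, 3))  # [2, 2, 2]
```

The branch is taken 90% of the time, yet the predictor is right only 80% of the time: the two misses per execution are the exit and the first iteration of the next execution.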
Branch Prediction Buffer (Cont’d)
• A two-bit prediction scheme changes its prediction only after two consecutive mispredictions (Fig. 4.13)
• More generally, an n-bit predictor keeps an n-bit saturating counter per entry (values 0 to 2^n - 1); the branch is predicted taken when the counter is at least 2^(n-1), and the counter is incremented when the branch is taken and decremented when it is not
• The branch prediction buffer is accessed during the IF stage
• If the instruction is decoded as a branch, the next fetch follows the predicted direction
• See Figure 4.14 for the prediction accuracy
• Prediction accuracy becomes more important in programs with a high branch frequency
• Prediction accuracy may improve if we also look at the recent behavior of other branches
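The n-bit counter rule can be sketched as follows. The sketch assumes a 16-entry buffer, 4-byte instructions, and counters that start at 0 (strongly not taken); the initial value and the class name are assumptions. It runs the same hypothetical loop branch as a one-bit buffer would see.

```python
# Sketch of an n-bit saturating-counter predictor (assumed: 16 entries,
# 4-byte instructions, counters initialized to 0 = strongly not taken).
class SaturatingCounterPredictor:
    def __init__(self, n=2, index_bits=4):
        self.max = (1 << n) - 1          # counter range is 0 .. 2**n - 1
        self.threshold = 1 << (n - 1)    # predict taken when counter >= 2**(n-1)
        self.mask = (1 << index_bits) - 1
        self.table = [0] * (1 << index_bits)

    def _index(self, pc):
        return (pc >> 2) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= self.threshold

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(self.table[i] + 1, self.max)
        else:
            self.table[i] = max(self.table[i] - 1, 0)


def misses_per_run(predictor, pc, run, repeats):
    """Count mispredictions for each of `repeats` executions of `run`."""
    result = []
    for _ in range(repeats):
        misses = 0
        for taken in run:
            if predictor.predict(pc) != taken:
                misses += 1
            predictor.update(pc, taken)
        result.append(misses)
    return result


loop = [True] * 9 + [False]  # taken 9 times, then not taken on exit
print(misses_per_run(SaturatingCounterPredictor(n=2), 0x40, loop, 3))  # [3, 1, 1]
```

After warm-up, the single not-taken exit only moves the 2-bit counter from 3 to 2, so the next execution still starts with a correct "taken" prediction. With `n=1` the same class behaves like the one-bit buffer and gives `[2, 2, 2]`.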
Branch Prediction Buffer (Cont’d)
• Consider the following code fragment:
      if (aa == 2)
          aa = 0;
      if (bb == 2)
          bb = 0;
      if (aa != bb) {
• The DLX code for the above, with aa and bb in R1 and R2, is:
            SUBI  R3, R1, #2
            BNEZ  R3, L1        ; branch b1 (aa != 2)
            ADD   R1, R0, R0    ; aa = 0
      L1:   SUBI  R3, R2, #2
            BNEZ  R3, L2        ; branch b2 (bb != 2)
            ADD   R2, R0, R0    ; bb = 0
      L2:   SUB   R3, R1, R2    ; R3 = aa - bb
            BEQZ  R3, L3        ; branch b3 (aa == bb)
• The behavior of b3 is correlated with the behavior of b1 and b2: if neither b1 nor b2 is taken, both aa and bb are set to 0, so b3 is taken
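The correlation can be checked by tracing the fragment directly. This sketch mirrors the DLX branches (taken = `True`) and tries a small, arbitrarily chosen range of starting values; `branch_outcomes` is a hypothetical helper.

```python
def branch_outcomes(aa, bb):
    """Return (b1, b2, b3) taken/not-taken for one pass through the fragment."""
    b1 = aa != 2        # BNEZ R3, L1: taken when aa != 2
    if not b1:
        aa = 0
    b2 = bb != 2        # BNEZ R3, L2: taken when bb != 2
    if not b2:
        bb = 0
    b3 = aa == bb       # BEQZ R3, L3: taken when aa - bb == 0
    return b1, b2, b3


# Try a small range of starting values (an arbitrary choice for illustration).
cases = [branch_outcomes(aa, bb) for aa in range(-3, 4) for bb in range(-3, 4)]
both_fall_through = [b3 for b1, b2, b3 in cases if not b1 and not b2]
print(both_fall_through)  # [True]
```

On its own, b3 is sometimes taken and sometimes not, so a predictor that looks only at b3's history cannot exploit this. Knowing the outcomes of b1 and b2 removes the uncertainty in this case.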
Correlating Branch Predictors
• Consider the code:
      if (d == 0)
          d = 1;
      if (d == 1)
• The generated instruction sequence is as follows (with d in R1):
            BNEZ  R1, L1        ; branch b1 (d != 0)
            ADDI  R1, R0, #1    ; d == 0, so d = 1
      L1:   SUBI  R3, R1, #1
            BNEZ  R3, L2        ; branch b2 (d != 1)
      L2:
• Here b2 is correlated with b1: if b1 is not taken (d was 0), d becomes 1, so b2 is also not taken
• See Figures 4.16, 4.17, 4.18 and 4.19
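The benefit of correlation on this code can be simulated. The sketch below is a general (m, n) predictor in the sense of Figure 4.20: each buffer entry holds 2^m n-bit counters, and the outcomes of the last m branches pick one of them. (0, 1) is then the plain one-bit buffer, and (1, 1) keeps two bits per branch, chosen by whether the previous branch was taken. The assumptions are: the input where d alternates between 2 and 0, the hypothetical branch addresses `B1`/`B2`, 4-byte instructions, and all state starting as "not taken".

```python
class CorrelatingPredictor:
    """(m, n) predictor: per buffer entry, 2**m n-bit saturating counters,
    one of which is selected by the outcomes of the last m branches."""

    def __init__(self, m, n, index_bits=4):
        self.history_mask = (1 << m) - 1
        self.max = (1 << n) - 1
        self.threshold = 1 << (n - 1)
        self.index_mask = (1 << index_bits) - 1
        self.history = 0  # global history; assumed to start as "all not taken"
        # Rows chosen by low-order address bits, columns by global history.
        self.table = [[0] * (1 << m) for _ in range(1 << index_bits)]

    def _slot(self, pc):
        return (pc >> 2) & self.index_mask, self.history  # assumes 4-byte instructions

    def predict(self, pc):
        row, col = self._slot(pc)
        return self.table[row][col] >= self.threshold

    def update(self, pc, taken):
        row, col = self._slot(pc)
        counter = self.table[row][col]
        self.table[row][col] = min(counter + 1, self.max) if taken else max(counter - 1, 0)
        # Shift this outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


B1, B2 = 0x100, 0x10C  # hypothetical addresses of branches b1 and b2


def run(predictor, d_values):
    """Run the d-fragment once per value of d; return total mispredictions."""
    misses = 0
    for d in d_values:
        b1 = d != 0          # BNEZ R1, L1
        if not b1:
            d = 1
        b2 = d != 1          # BNEZ R3, L2
        for pc, taken in ((B1, b1), (B2, b2)):
            if predictor.predict(pc) != taken:
                misses += 1
            predictor.update(pc, taken)
    return misses


d_sequence = [2, 0, 2, 0]  # assumed input: d alternates between 2 and 0
print(run(CorrelatingPredictor(m=0, n=1), d_sequence))  # 8: every branch mispredicted
print(run(CorrelatingPredictor(m=1, n=1), d_sequence))  # 2: only on the first pass
```

With one bit of history, b2's prediction is selected by b1's outcome, and b1's prediction by b2's outcome on the previous pass. After each combination has been seen once, both branches are predicted correctly.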
Correlating Branch Predictors (cont’d.)
• (m, n) predictor (Figure 4.20)
  – Uses the behavior of the last m branches (global history)
  – Uses an n-bit predictor for each branch
  – Has 2^m branch predictors per branch to choose from
  – The global history can be recorded as an m-bit shift register
  – The buffer is indexed by concatenating the low-order bits of the branch address with the m-bit global history (see Figure 4.20)
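The concatenated index and the resulting storage cost can be computed directly. In this sketch, `entries` counts the rows selected by the branch address. The function names, the example address, and the 4-byte instruction assumption are hypothetical. The storage formula is 2^m × n × entries.

```python
def buffer_index(branch_addr, history, m, index_bits):
    """Concatenate low-order branch-address bits with the m-bit global history."""
    addr_bits = (branch_addr >> 2) & ((1 << index_bits) - 1)  # assumes 4-byte instructions
    return (addr_bits << m) | (history & ((1 << m) - 1))


def predictor_bits(m, n, entries):
    """Storage of an (m, n) predictor: 2**m n-bit counters per address-selected entry."""
    return (2 ** m) * n * entries


print(buffer_index(0x10C, 0b10, m=2, index_bits=10))  # 270
print(predictor_bits(2, 2, 1024))   # 8192 bits for a (2, 2) predictor with 1K entries
print(predictor_bits(0, 2, 4096))   # 8192 bits for a plain 2-bit buffer with 4K entries
```

The last two lines show that a (2, 2) predictor with 1K entries uses the same number of bits as a 4K-entry two-bit buffer. It spends its bits on history instead of on more address-selected entries.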
Branch Target Buffers
• A branch target buffer stores the predicted address of the next instruction after a branch
• The intent is to know the branch target address by the end of the IF stage (see Fig. 4.22)
• We access the buffer during the IF stage, using the PC of the instruction being fetched
• If we get a hit, we fetch the next instruction from the predicted PC
• If there is no match, fetching proceeds normally (sequentially)
• A branch prediction field can also be added to each entry to improve prediction
• See Figs. 4.23 and 4.24, and do the example on page 274
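A branch target buffer behaves like a lookup table from branch PC to predicted next PC. The sketch makes several simplifying assumptions: capacity is unlimited and fully associative, the full PC serves as the tag, only branches that were last taken are kept, and instructions are 4 bytes. The class and method names are hypothetical.

```python
class BranchTargetBuffer:
    """Sketch of a BTB as a map from branch PC to predicted target PC.

    Assumptions: unlimited, fully associative capacity; only branches
    that were last taken are kept; 4-byte instructions.
    """

    def __init__(self):
        self.entries = {}  # full PC acts as the tag, so a hit means an exact match

    def next_fetch_pc(self, pc):
        # Looked up during IF: on a hit fetch from the predicted PC,
        # otherwise proceed sequentially.
        return self.entries.get(pc, pc + 4)

    def resolve(self, pc, taken, target):
        # Called once the branch outcome is known.
        if taken:
            self.entries[pc] = target
        else:
            self.entries.pop(pc, None)


btb = BranchTargetBuffer()
print(hex(btb.next_fetch_pc(0x200)))   # 0x204 (miss: fall through)
btb.resolve(0x200, taken=True, target=0x180)
print(hex(btb.next_fetch_pc(0x200)))   # 0x180 (hit: predicted target)
```

Keeping only taken branches means a miss and a "predict not taken" lead to the same sequential fetch, so storing not-taken branches would add nothing in this simple model.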
Multiple-Issue Processors
• So far, we have tried to achieve the ideal CPI of 1
• How can we improve performance further, to achieve CPI < 1?
• Multiple-issue processors are used to improve performance further
  – Superscalar processors:
    • Issue varying numbers of instructions per clock
    • Could be statically scheduled (Sun UltraSPARC II/III)
    • Or dynamically scheduled (Pentium III/4, MIPS R10000)
  – VLIW (Very Long Instruction Word) processors:
    • Issue a fixed number of instructions per clock
    • Statically scheduled by the compiler (Trimedia, i860, Itanium)
Superscalar Processors
• A superscalar processor has dynamic issue capability
• The hardware may issue from one to eight instructions in a clock cycle
• The instructions issued together are usually independent and must satisfy certain constraints, such as a limit on memory accesses per clock
• If an instruction has a dependence or a structural hazard, only the instructions preceding it are issued
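The issue rule can be sketched as a scan over the next few instructions in program order that stops at the first hazard. The model rests on several assumptions. It uses a hypothetical 4-wide machine with a single memory port. It checks only RAW dependences within the packet and ignores WAW. It assumes a result is usable on the next clock, which ignores real latencies. The DLX-style loop is illustrative.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    dest: Optional[str]        # register written, if any
    srcs: Tuple[str, ...]      # registers read
    is_mem: bool = False       # load or store


def issue_packet(window: List[Instr], width: int) -> int:
    """How many instructions at the head of `window` issue this clock."""
    written = set()
    mem_used = False
    for count, ins in enumerate(window[:width]):
        data_hazard = any(src in written for src in ins.srcs)
        structural_hazard = ins.is_mem and mem_used  # assumed: one memory port
        if data_hazard or structural_hazard:
            return count  # issue only the instructions before the hazard
        if ins.dest is not None:
            written.add(ins.dest)
        mem_used = mem_used or ins.is_mem
    return len(window[:width])


def issue_schedule(program: List[Instr], width: int) -> List[List[str]]:
    """Group the program into per-clock issue packets, in program order."""
    packets = []
    i = 0
    while i < len(program):
        n = issue_packet(program[i:], width)  # always >= 1: the first instruction never conflicts
        packets.append([ins.text for ins in program[i:i + n]])
        i += n
    return packets


loop = [
    Instr("LD F0, 0(R1)", "F0", ("R1",), is_mem=True),
    Instr("ADDD F4, F0, F2", "F4", ("F0", "F2")),
    Instr("SD 0(R1), F4", None, ("R1", "F4"), is_mem=True),
    Instr("SUBI R1, R1, #8", "R1", ("R1",)),
    Instr("BNEZ R1, Loop", None, ("R1",)),
]
for clock, packet in enumerate(issue_schedule(loop, width=4), start=1):
    print(clock, packet)
```

Although the machine is 4-wide, this tightly dependent loop issues in four clocks, and only the store and SUBI pair up. Independent instructions are what let a superscalar approach CPI < 1.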