lecture8_predication_v2.ppt

Loop Unrolling & Predication
CSE 820
Software Pipelining
With software pipelining, a reorganized loop
contains instructions from different
iterations of the original loop.
It is sometimes called symbolic loop unrolling.
Michigan State University
Computer Science and Engineering
Software Pipelined Loop
Unrolled Loop
Select a subset of each iteration (bold on the original slide):
the S.D of iteration 1, the ADD.D of iteration 2, and the L.D of iteration 3.
Iteration 1: L.D    F0, 0(R1)
             ADD.D  F4, F0, F2
             S.D    F4, 0(R1)
Iteration 2: L.D    F0, 0(R1)
             ADD.D  F4, F0, F2
             S.D    F4, 0(R1)
Iteration 3: L.D    F0, 0(R1)
             ADD.D  F4, F0, F2
             S.D    F4, 0(R1)
Software Pipelining
Loop: S.D    F4, 16(R1)    ; stores into M[i]
      ADD.D  F4, F0, F2    ; adds to M[i-1]
      L.D    F0, 0(R1)     ; loads M[i-2]
      DADDUI R1, R1, #-8
      BNE    R1, R2, Loop
Requires start-up and clean-up.
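The start-up and clean-up code can be made concrete with a small simulation. The sketch below is illustrative only: the function names are hypothetical, and it models a loop that adds a scalar to every array element (the L.D / ADD.D / S.D pattern) in Python rather than MIPS. It walks the array upward, whereas the slide's loop counts R1 downward.

```python
def add_scalar_plain(x, s):
    """Original loop: each iteration loads, adds, and stores one element."""
    out = list(x)
    for i in range(len(out)):
        f0 = out[i]          # L.D
        f4 = f0 + s          # ADD.D
        out[i] = f4          # S.D
    return out


def add_scalar_pipelined(x, s):
    """Software-pipelined form: the loop body mixes three iterations."""
    out = list(x)
    n = len(out)
    if n < 2:
        return add_scalar_plain(x, s)   # too short to fill the pipeline

    # Start-up: fill the pipeline.
    f0 = out[0]              # L.D   for element 0
    f4 = f0 + s              # ADD.D for element 0
    f0 = out[1]              # L.D   for element 1

    # Steady state: store element i, add element i+1, load element i+2.
    for i in range(n - 2):
        out[i] = f4          # S.D   for element i
        f4 = f0 + s          # ADD.D for element i+1
        f0 = out[i + 2]      # L.D   for element i+2

    # Clean-up: drain the pipeline.
    out[n - 2] = f4          # S.D   for element n-2
    f4 = f0 + s              # ADD.D for element n-1
    out[n - 1] = f4          # S.D   for element n-1
    return out
```

Both functions return the same list for any input; the pipelined version only changes the order in which the loads, adds, and stores of neighbouring iterations are issued.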
Symbolic Loop Unrolling
Software pipelining can be thought of as
symbolic loop unrolling, but has the
advantage of generating less code.
Software Pipelining has less overhead
Global Code Scheduling
Global code scheduling allows moving instructions across branches.
Most techniques concentrate on determining a
straight-line code segment representing
the most frequently executed code.
Trace Scheduling
Concept
1. Guess the likely path through branches (called the trace)
2. The trace now contains long stretches of code without taken branches (as predicted)
3. Schedule the trace, allowing movement across branches
• Add code off the trace to undo the effects of movement
• The increased ability to move across branches should improve scheduling
Movement + Undo
Consider
if (cond)
then { x = x + 5; }    // likely
else { }               // unlikely
After Movement
x = x + 5;
if (cond)
then { }               // likely
else { x = x - 5; }    // unlikely: undo
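The equivalence of the two versions can be checked with a short sketch. The function names are hypothetical, and the undo is exact only for values such as integers where subtracting 5 really reverses adding 5.

```python
def original(x, cond):
    """x = x + 5 executes only on the likely path."""
    if cond:              # likely
        x = x + 5
    return x


def after_movement(x, cond):
    """x = x + 5 is moved above the branch; the unlikely path undoes it."""
    x = x + 5             # moved above the branch
    if not cond:          # unlikely path, off the trace
        x = x - 5         # compensation code: undo the movement
    return x
```

The likely path now has no work left after the branch, at the cost of an extra instruction on the unlikely path.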
Select a trace
Trace showing jumps off the trace
Superblocks
Superblocks avoid the multiple entries and exits of traces.
A superblock has one entry and multiple exits,
which makes scheduling easier.
The one-entry-multiple-exit is achieved by
duplicating code where the unlikely path
exits the trace so that no reentry is needed.
Superblock: one entry and multiple exits
Predicated Instructions
Requires
– Hardware
– ISA modification
Predicated instructions eliminate branches,
converting a control dependence
into a data dependence.
IA-64 has predicated instructions,
but many existing ISAs contain at least one
(the conditional move).
Conditional Move
if (R1 == 0) R2 = R3;
Branch:
      BNEZ  R1, L
      ADDU  R2, R3, R0
L:
Conditional Move:
      CMOVZ R2, R3, R1
In a pipeline, the control dependence at the
beginning of the pipeline is transformed into
a data dependence at the end of the pipeline.
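A minimal sketch of the two forms, using hypothetical Python functions that model register values as plain integers. It shows only the semantics, not the pipeline timing.

```python
def branch_version(r1, r2, r3):
    """BNEZ R1,L ; ADDU R2,R3,R0 ; L:   returns the final value of R2."""
    if r1 != 0:           # BNEZ R1,L: skip the move when R1 is nonzero
        return r2
    r2 = r3 + 0           # ADDU R2,R3,R0 (R0 always reads as zero)
    return r2


def cmovz(r2, r3, r1):
    """CMOVZ R2,R3,R1: R2 gets R3 if R1 is zero, else keeps its old value."""
    return r3 if r1 == 0 else r2
```

In the second form R1 is just another source operand of the instruction, which is why the dependence on the condition becomes a data dependence.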
Full Predication
Every instruction has a predicate:
if the predicate is false, the instruction becomes a NOP.
Full predication is particularly useful for global scheduling
because it can eliminate non-loop branches,
which are the harder ones to schedule.
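As a rough illustration, the sketch below runs a branch-free instruction list in which every instruction carries a predicate. The tiny interpreter, the register names p1 and p2, and the if/else being converted are all assumptions made for the example; they are not IA-64 syntax.

```python
def run_predicated(program, regs):
    """Execute every instruction in order; a false predicate makes it a NOP."""
    for pred, dest, fn in program:
        if regs[pred]:                 # predicate true: instruction takes effect
            regs[dest] = fn(regs)
        # predicate false: nothing is written (the instruction is a NOP)
    return regs


# if (A == 0) A = B; else A = A + 4;   with no branch in the instruction stream
PROGRAM = [
    ("true", "p1", lambda r: r["A"] == 0),    # p1 = (A == 0)
    ("true", "p2", lambda r: not r["p1"]),    # p2 = !p1
    ("p1",   "A",  lambda r: r["B"]),         # (p1) A = B
    ("p2",   "A",  lambda r: r["A"] + 4),     # (p2) A = A + 4
]


def if_converted(a, b):
    """Run the predicated program and return the final value of A."""
    regs = {"true": True, "p1": False, "p2": False, "A": a, "B": b}
    return run_predicated(PROGRAM, regs)["A"]
```

Both predicates are computed before either arm runs, so the first arm overwriting A cannot change which arm takes effect.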
Exceptions & Predication
A predicated instruction must not be
allowed to generate an exception
if the predicate is false.
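A small sketch of this rule, assuming a hypothetical memory model in which a load from an unmapped address raises `MemoryFault`. The names are illustrative only.

```python
class MemoryFault(Exception):
    """Stands in for an exception such as a memory protection violation."""


def load(memory, addr):
    """Ordinary load: faults on an unmapped address."""
    if addr not in memory:
        raise MemoryFault(addr)
    return memory[addr]


def predicated_load(pred, memory, addr, old_value):
    """(pred) LD: with a false predicate the load is annulled and cannot fault."""
    if not pred:
        return old_value          # NOP: destination keeps its old value
    return load(memory, addr)     # predicate true: behaves like a normal load
```

With a false predicate the bad address is never touched, so the program behaves as if the branch around the load had been taken.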
Implementation
Predicated instructions can be annulled early in the pipeline,
but that requires the predicate value to be known early.
Annulling at commit instead delays annulment,
so the data hazard on the predicate
has an opportunity to be resolved.
The disadvantage is that annulled instructions still use
resources such as functional units and registers
(rename or other).
Predication is good for…
• Short alternative control flow
• Eliminating some unpredictable branches
• Reducing the overhead of global scheduling
But the precise rules for compilation are still being determined.
Limitations
• Annulled instructions waste resources: registers, functional units, cache & memory bandwidth
• If the evaluation of the predicate condition cannot be separated from (scheduled well ahead of) the predicated instruction, the instruction may stall on a data hazard; an accurately predicted branch might have performed better.
Limitations (cont'd)
• Predication across multiple branches can complicate control and is undesirable unless hardware supports it (as in IA-64).
• Predicated instructions may have a speed penalty; this is not the case when all instructions are predicated.
Example
if (A==0) A=B; else A=A+4;
      LD    R1,0(R3)     ;load A
      BNEZ  R1,L1        ;test A
      LD    R1,0(R2)     ;then clause
      J     L2           ;skip else
L1:   DADDI R1,R1,#4     ;else clause
L2:   SD    R1,0(R3)     ;store A
Hoist Load
if (A==0) A=B; else A=A+4;
      LD    R1,0(R3)     ;load A
      LD    R14,0(R2)    ;speculative load B
      BEQZ  R1,L3        ;other branch of if
      DADDI R14,R1,#4    ;else clause
L3:   SD    R14,0(R3)    ;store A
What if speculative load raises an exception?
Guard
if (A==0) A=B; else A=A+4;
      LD     R1,0(R3)    ;load A
      sLD    R14,0(R2)   ;speculative load
      BNEZ   R1,L1       ;test A
      SPECCK 0(R2)       ;speculative check
      J      L2          ;skip else
L1:   DADDI  R14,R1,#4   ;else clause
L2:   SD     R14,0(R3)   ;store A
sLD does not raise certain exceptions;
it leaves them for SPECCK (as in IA-64).
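The division of labour between sLD and SPECCK might be modelled as follows. This is only a sketch under assumptions: the `Machine` class, the `deferred` set, and the placeholder value returned by a faulting `sld` are invented for illustration and do not describe how IA-64 hardware records deferred exceptions.

```python
class MemoryFault(Exception):
    """Stands in for a fault such as a memory protection violation."""


class Machine:
    def __init__(self, memory):
        self.memory = memory          # dict: address -> value
        self.deferred = set()         # addresses whose sLD would have faulted

    def sld(self, addr):
        """Speculative load: never raises; records the fault instead."""
        if addr not in self.memory:
            self.deferred.add(addr)
            return 0                  # placeholder value, must not be used
        return self.memory[addr]

    def specck(self, addr):
        """Speculation check: raise the fault that sLD deferred, if any."""
        if addr in self.deferred:
            raise MemoryFault(addr)


def guarded_if(m, addr_a, addr_b):
    """if (A == 0) A = B; else A = A + 4;  in the order used on the Guard slide."""
    r1 = m.memory[addr_a]             # LD     R1,0(R3)
    r14 = m.sld(addr_b)               # sLD    R14,0(R2)
    if r1 == 0:                       # BNEZ   R1,L1 falls through when A == 0
        m.specck(addr_b)              # SPECCK 0(R2)
    else:
        r14 = r1 + 4                  # L1: DADDI R14,R1,#4
    m.memory[addr_a] = r14            # L2: SD R14,0(R3)
    return r14
```

A bad address for B is harmless when the else clause runs, and raises the fault only on the path that actually needs B.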
Other exception techniques
• Poison bit:
– applied to the destination register
– set upon exception
– raises an exception upon access to the poisoned register
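A poison bit per register could be sketched like this. The class and method names are hypothetical, and clearing the bit on an ordinary write is an assumption of this sketch rather than something stated on the slide.

```python
class PoisonedRegisterAccess(Exception):
    """Raised when a poisoned register is read."""


class RegisterFile:
    def __init__(self):
        self.values = {}              # register name -> value
        self.poison = set()           # registers whose poison bit is set

    def write(self, reg, value):
        """Ordinary write: stores the value and clears the poison bit (assumed)."""
        self.values[reg] = value
        self.poison.discard(reg)

    def speculative_load(self, reg, memory, addr):
        """Speculative load: on a fault, poison the destination instead of raising."""
        if addr in memory:
            self.write(reg, memory[addr])
        else:
            self.poison.add(reg)      # set upon exception

    def read(self, reg):
        """Reading a poisoned register raises the delayed exception."""
        if reg in self.poison:
            raise PoisonedRegisterAccess(reg)
        return self.values[reg]
```

The exception is tied to the first use of the bad value, so a speculative load whose result is never read costs nothing.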
Hoist Load above Store
If memory addresses are known,
a load can be hoisted above a store.
If not, …
add a special instruction to check addresses
before the loaded value is used.
(This is similar to the SPECCK shown earlier, as in IA-64.)
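The address check can be sketched as a comparison between the store address and the address of the hoisted load, with the load redone on a match. The function names are hypothetical and memory is modelled as a Python dict; real hardware would track the addresses itself.

```python
def store_then_load(memory, store_addr, store_value, load_addr):
    """Original order: the store executes before the load."""
    memory[store_addr] = store_value
    return memory[load_addr]


def hoisted_load_with_check(memory, store_addr, store_value, load_addr):
    """Load hoisted above the store, with an address check before use."""
    value = memory[load_addr]            # hoisted load, may read a stale value
    memory[store_addr] = store_value     # the store the load was moved above
    if store_addr == load_addr:          # check: did the store alias the load?
        value = memory[load_addr]        # fix-up: redo the load
    return value
```

When the addresses differ the early load is already correct, so the check costs one comparison and the load latency is hidden behind the store.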
Speculation: soft vs. hard
• software speculation must be able to disambiguate memory (to hoist loads past stores), but compile-time information is often insufficient
• hardware works best when control flow is unpredictable and when hardware branch prediction is superior to compile-time prediction
• exception handling is easier in hardware
• trace techniques require compensation code
• compilers see further for better scheduling
IA-64