Pipeline Presentation

Pipeline Enhancements for the Y86 Architecture
Kelly Carothers
Enhancements done
Hardware:
BTFNT Branch Jumping
Load-forwarding for variables
Software:
Use of IADDL
Rearrangement of code
Loop Unrolling
Load-forwarding
Passing a value from a later pipeline stage back to an
earlier one before it has been written to a register or
memory.
CPE Avg: 17.15
Load-forwarding from Memory stage
to Execute Stage
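One way to picture the Memory-to-Execute forwarding path is as a selection made in the Execute stage: when the instruction there is a store whose source register is still being loaded by the instruction one stage ahead, the freshly loaded value is used instead of the stale register read. The Python sketch below models that choice. The signal names (E_icode, E_srcA, E_valA, M_dstM, m_valM) follow common textbook naming for the Y86 pipeline and are assumptions, not taken from the slides.

```python
# Hypothetical instruction codes, named after the Y86 mnemonics.
RMMOVL, PUSHL = "rmmovl", "pushl"
RNONE = None  # marker for "no register"


def select_e_valA(E_icode, E_srcA, E_valA, M_dstM, m_valM):
    """Pick the data value that the Execute stage passes on to Memory.

    E_* describe the instruction in Execute; M_dstM is the register the
    instruction in Memory is loading into; m_valM is the value it just read.
    """
    stores_valA = E_icode in (RMMOVL, PUSHL)
    if stores_valA and E_srcA is not RNONE and E_srcA == M_dstM:
        return m_valM  # forwarded straight from the load in the Memory stage
    return E_valA      # normal path: value read earlier from the register file


# mrmovl (%ebx), %eax followed directly by rmmovl %eax, (%ecx):
forwarded = select_e_valA(RMMOVL, "%eax", 0, "%eax", 42)
```

With a path like this, a load followed immediately by a store of the same register would not need a stall cycle, which fits the mrmovl/rmmovl pair at the top of the copy loop.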
IADDL
Single instruction replaces the IRMOVL and ADDL
instructions for an immediate add.
CPE Avg: 14.22
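To make the saving concrete, the sketch below models the register effect of the two-instruction sequence and of IADDL in Python. It is a simplified model under stated assumptions: registers are a plain dictionary, values wrap to 32 bits, and condition codes are left out.

```python
def to_s32(x):
    """Wrap an integer to a signed 32-bit value, as a 32-bit register would."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def irmovl(regs, imm, rb):
    regs[rb] = to_s32(imm)                # rb = imm


def addl(regs, ra, rb):
    regs[rb] = to_s32(regs[rb] + regs[ra])  # rb = rb + ra


def iaddl(regs, imm, rb):
    regs[rb] = to_s32(regs[rb] + imm)     # rb = rb + imm


# Original sequence: two instructions and a scratch register (%edi).
before = {"%esi": 5, "%edi": 0}
irmovl(before, 1, "%edi")
addl(before, "%edi", "%esi")

# With IADDL: one instruction, no scratch register touched.
after = {"%esi": 5, "%edi": 0}
iaddl(after, 1, "%esi")
```

Both versions leave the same value in %esi; the IADDL version uses one instruction instead of two and leaves %edi alone.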
IADDL implementation
IADDL Code Comparison:
Original vs. Modified
Original:
        # Loop header
        xorl %esi,%esi          # count = 0;
        andl %edx,%edx          # len <= 0?
        jle Done                # if so, goto Done:
        # Loop body.
Loop:   mrmovl (%ebx), %eax     # src...
        rmmovl %eax, (%ecx)     # ...and store it to dst
        andl %eax, %eax         # val <= 0?
        jle Npos                # if so, goto Npos:
        irmovl $1, %edi
        addl %edi, %esi         # count++
Npos:   irmovl $1, %edi
        subl %edi, %edx         # len--
        irmovl $4, %edi
        addl %edi, %ebx         # src++
        addl %edi, %ecx         # dst++
        andl %edx,%edx          # len > 0?
        jg Loop                 # if so, goto Loop:
Modified:
        # Loop header
        xorl %esi,%esi          # count = 0;
        andl %edx,%edx          # len <= 0?
        jle Done                # if so, goto Done:
        # Loop body.
Loop:   mrmovl (%ebx), %eax
        rmmovl %eax, (%ecx)
        andl %eax, %eax         # val <= 0?
        jle Npos                # if so, goto Npos:
        iaddl $1, %esi          # count++
Npos:   iaddl $-1, %edx         # len--
        iaddl $4, %ebx          # src++
        iaddl $4, %ecx          # dst++
        andl %edx,%edx          # len > 0?
        jg Loop                 # if so, goto Loop:
BTFNT Branch Jumping
BTFNT – Backwards Taken Forwards Not Taken:
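Predict a backward branch (target at a lower address) as taken and a
forward branch as not taken, so fetch always continues at the smaller
of the two candidate addresses.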
CPE Avg : 12.37
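The prediction rule can be written as a one-line choice between the jump target and the fall-through address. The Python sketch below is only a model of that rule; valC (the branch target) and valP (the address of the next sequential instruction) are names borrowed from the usual Y86 conventions and are assumptions here.

```python
def btfnt_predict(valC, valP):
    """Predicted next PC for a conditional jump under BTFNT.

    valC: jump target address; valP: address of the instruction after the jump.
    """
    if valC < valP:
        return valC  # backward branch: predict taken
    return valP      # forward branch: predict not taken


# A loop-closing jump at 0x40 that targets 0x20 is predicted taken,
# while a forward skip from 0x40 to 0x60 is predicted not taken.
loop_back = btfnt_predict(valC=0x20, valP=0x45)
skip_ahead = btfnt_predict(valC=0x60, valP=0x45)
```

Loop-closing jumps point backward and are usually taken, while forward jumps often guard rarely executed paths, which is why this rule tends to mispredict less than always predicting taken.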
Code Rearrangement
*Code was arranged specifically for BTFNT
*Many unnecessary checks removed
Avg CPE: 11.64
Code Rearrangement:
IADDL Mod vs. End Result
IADDL Mod:
        # Loop header
        xorl %esi,%esi          # count = 0;
        andl %edx,%edx          # len <= 0?
        jle Done                # if so, goto Done:
        # Loop body.
Loop:   mrmovl (%ebx), %eax
        rmmovl %eax, (%ecx)
        andl %eax, %eax         # val <= 0?
        jle Npos                # if so, goto Npos:
        iaddl $1, %esi          # count++
Npos:   iaddl $-1, %edx         # len--
        iaddl $4, %ebx          # src++
        iaddl $4, %ecx          # dst++
        andl %edx,%edx          # len > 0?
        jg Loop                 # if so, goto Loop:
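End Result:
        rrmovl %edx, %esi       # count = len
        iaddl $1, %edx          # len++ (offsets the decrement at Loop)
Loop:   iaddl $-1, %edx         # len--
        jle Done                # len <= 0? if so, goto Done:
Loop1:  mrmovl (%ebx), %eax
        rmmovl %eax, (%ecx)
Npos:   iaddl $4, %ebx          # src++
        iaddl $4, %ecx          # dst++
        andl %eax, %eax         # val <= 0?
        jle decEsi              # if so, goto decEsi:
        jmp Loop
decEsi: iaddl $-1, %esi         # count--
        jg Loop                 # count > 0? if so, goto Loop: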
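As read from the rearranged listing, the count appears to start at len and go down by one for every non-positive value, instead of starting at zero and going up for every positive one. The Python sketch below checks that the two ways of counting agree. The function names are hypothetical, and the model ignores the pipeline entirely.

```python
def ncopy_count_up(src):
    """Copy src and count positive values by incrementing from zero."""
    dst, count = [], 0
    for val in src:
        dst.append(val)
        if val > 0:
            count += 1
    return dst, count


def ncopy_count_down(src):
    """Copy src and count positive values by decrementing from len(src)."""
    dst, count = [], len(src)   # corresponds to: rrmovl %edx, %esi
    for val in src:
        dst.append(val)
        if val <= 0:
            count -= 1          # corresponds to: decEsi: iaddl $-1, %esi
    return dst, count


sample = [3, -1, 0, 7, -5, 2]
up = ncopy_count_up(sample)
down = ncopy_count_down(sample)
```

Counting down also explains the final jg Loop: the count can only reach zero once every one of the len elements has been copied and found non-positive, so falling through at that point is safe.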
Loop Unrolling
*Increases code size
*Decreases CPE
Loop Unrolling:
No unrolling vs. 1 unroll
No unrolling:
Loop1:  mrmovl (%ebx), %eax
        rmmovl %eax, (%ecx)
Npos:   iaddl $4, %ebx          # src++
        iaddl $4, %ecx          # dst++
        andl %eax, %eax
        jle decEsi
        jmp Loop
1 unroll:
Loop1:  mrmovl (%ebx), %eax
        rmmovl %eax, (%ecx)
Npos:   iaddl $4, %ebx          # src++
        iaddl $4, %ecx          # dst++
        andl %eax, %eax
        jle decEsi
        iaddl $-1, %edx
        jle Done
        mrmovl (%ebx), %eax
        rmmovl %eax, (%ecx)
        iaddl $4, %ebx
        iaddl $4, %ecx
        andl %eax, %eax
        jle decEsi
        jmp Loop
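For readers less used to assembly, the general idea of unrolling can be sketched in Python. This is the common textbook form (two elements per trip plus a cleanup step), not a transcription of the listing above, which keeps a length check between the two copies. The function names are hypothetical.

```python
def ncopy_rolled(src):
    """Reference loop: one element, one loop-control check per iteration."""
    dst, count = [], 0
    for val in src:
        dst.append(val)
        if val > 0:
            count += 1
    return dst, count


def ncopy_unrolled2(src):
    """Same work, two elements per iteration, so half as many loop checks."""
    dst, count = [], 0
    i, n = 0, len(src)
    while i + 1 < n:
        a, b = src[i], src[i + 1]
        dst.append(a)
        dst.append(b)
        count += (a > 0) + (b > 0)
        i += 2
    if i < n:                   # odd length: one element left over
        dst.append(src[i])
        count += src[i] > 0
    return dst, count


data = [4, -3, 0, 9, 1]
rolled = ncopy_rolled(data)
unrolled = ncopy_unrolled2(data)
```

The body is longer, but the loop-control work is shared between two elements, which is the trade-off the slide names: larger code, lower CPE.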
Loop Unrolling Results
No Unrolling, Base Avg. CPE: 11.64
1 Unroll, Avg CPE: 11.16
2 Unrolls, Avg CPE: 11.00
Total Results
Initial Avg CPE: 18.15
Final Avg CPE: 11.00
Total Decrease of 7.15 CPE.
Final Results

Enhancement         Avg CPE   CPE Decrease
None                18.15     -------
Load-Forwarding     17.15     1.00
IADDL               14.22     2.93
BTFNT               12.37     1.85
Code Rearranging    11.64     0.73
1 Loop Unrolled     11.16     0.48
2 Loops Unrolled    11.00     0.16
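The decrease column is the difference between consecutive average CPE (cycles per element) values, and the per-step decreases add up to the total reported earlier. The short Python check below reproduces that arithmetic from the table's own numbers; nothing in it is measured, and the variable names are illustrative.

```python
# (enhancement, average CPE) pairs copied from the results table.
steps = [
    ("None", 18.15),
    ("Load-Forwarding", 17.15),
    ("IADDL", 14.22),
    ("BTFNT", 12.37),
    ("Code Rearranging", 11.64),
    ("1 Loop Unrolled", 11.16),
    ("2 Loops Unrolled", 11.00),
]

# Decrease contributed by each step relative to the one before it.
decreases = [
    round(prev_cpe - cpe, 2)
    for (_, prev_cpe), (_, cpe) in zip(steps, steps[1:])
]

# Overall improvement from the first row to the last.
total = round(steps[0][1] - steps[-1][1], 2)

for (name, cpe), dec in zip(steps[1:], decreases):
    print(f"{name:<18}{cpe:>6.2f}{dec:>6.2f}")
print(f"{'Total decrease':<18}{total:>12.2f}")
```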