Transcript Slide 1

Lecture 8: Modern Dynamic Instruction Scheduling
Tomasulo weakness, data forwarding, reg mapping table, generic superscalar models, examples
1
Tomasulo Performance
Observe at the EX stage: how many cycles to execute this code?
LW R2,45(R3)
ADD R6,R2,R4
SUB R10,R0,R6
ADD R10,R10,R12
Assume load takes 1 cycle, ALU 1 cycle
Diagram: IM, Fetch Unit, Decode, Rename, Reorder Buffer, Regfile, S-buf, L-buf, RS, RS, FU1, FU2, DM
2
Tomasulo vs MIPS Pipeline
How many cycles on the 5-stage MIPS pipeline?
Why does the simple pipeline run faster?
Diagram: IF, ID, EX, MEM, WB, with stall check and data forwarding
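As a rough check on the question above, the following sketch counts total cycles for the example code on a 5-stage pipeline with full data forwarding. The function name, the tuple encoding and the load_use_stall parameter are illustrative assumptions, not part of the lecture. With load_use_stall=0 it follows the slide's "load takes 1 cycle" simplification; the classic textbook pipeline corresponds to load_use_stall=1.

```python
# Sketch: total cycles on a 5-stage pipeline (IF ID EX MEM WB) with forwarding.
# Assumption: the only possible stall is a load followed immediately by a
# consumer of the loaded register; its cost is the load_use_stall parameter.

def five_stage_cycles(program, load_use_stall=0):
    """program: list of (op, dest, sources) tuples in program order."""
    if not program:
        return 0
    stalls = 0
    for (prev_op, prev_dest, _), (_, _, cur_sources) in zip(program, program[1:]):
        if prev_op == "LW" and prev_dest in cur_sources:
            stalls += load_use_stall
    # The first instruction needs 5 cycles; each later one completes one
    # cycle after its predecessor, plus any stall cycles.
    return 5 + (len(program) - 1) + stalls

code = [
    ("LW",  "R2",  ["R3"]),
    ("ADD", "R6",  ["R2", "R4"]),
    ("SUB", "R10", ["R0", "R6"]),
    ("ADD", "R10", ["R10", "R12"]),
]
print(five_stage_cycles(code))                     # 8
print(five_stage_cycles(code, load_use_stall=1))   # 9
```

Because of forwarding, each dependent instruction enters EX right behind its producer, which is the property the next slides try to recover in the out-of-order design.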
3
Tomasulo Complexity and Efficiency
Modern processors employ deep pipelines
=> Can the rename stage be finished in one fast cycle?
=> How are register contents stored?
Diagram: IM, Fetch Unit, Decode, Rename, Reorder Buffer, Regfile, S-buf, L-buf, RS, RS, FU1, FU2, DM
4
Review Tomasulo Inst Scheduling
Both in RS, no contention on CDB or FU
ADD R2,R2,45   # R2=>tag p, result = A
SUB R6,R2,R4   # R4 is ready, = B
Cycle 1: ADD starts at FU, producing A
Cycle 2: ADD broadcasts p + A; SUB matches on p and accepts A
Cycle 3: SUB starts execution, FU calc A-B
A is produced at cycle 1, but consumed at cycle 3 -- unavoidable?
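The timing above can be sketched as a small calculation of EX start cycles. This is a simplified model under stated assumptions: all instructions are already in the RS, there is no contention on CDB or FU, and a consumer may start wakeup_delay cycles after its producer finishes. The function and parameter names are hypothetical. wakeup_delay=1 models the broadcast cycle reviewed here; wakeup_delay=0 models back-to-back execution.

```python
# Sketch: EX start cycle of each instruction in a dependence chain.
# Assumptions: unlimited FUs, no CDB contention, every instruction is
# already waiting in a reservation station at cycle 1.

def ex_start_cycles(program, latency=1, wakeup_delay=1):
    """program: list of (dest, sources) in program order."""
    done = {}     # register -> last cycle in which its producer executes
    starts = []
    for dest, sources in program:
        start = 1
        for src in sources:
            if src in done:
                # Wait for the producer, plus the broadcast cycle if any.
                start = max(start, done[src] + wakeup_delay + 1)
        starts.append(start)
        done[dest] = start + latency - 1
    return starts

# ADD R2,R2,45 ; SUB R6,R2,R4
print(ex_start_cycles([("R2", ["R2"]), ("R6", ["R2", "R4"])]))   # [1, 3]

# LW R2,45(R3) ; ADD R6,R2,R4 ; SUB R10,R0,R6 ; ADD R10,R10,R12
chain = [("R2", ["R3"]), ("R6", ["R2", "R4"]),
         ("R10", ["R0", "R6"]), ("R10", ["R10", "R12"])]
print(ex_start_cycles(chain))                   # [1, 3, 5, 7]
print(ex_start_cycles(chain, wakeup_delay=0))   # [1, 2, 3, 4]
```

Under these assumptions the broadcast cycle doubles the length of a fully dependent chain, which is the weakness the slide points at.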
5
Review Data Forwarding
MIPS pipeline data forwarding: FU/MEM => FU
Why not in Tomasulo?
Diagram: REG/ROB, FU, ROB, bypass
Cycle 2: forward A from FU output to FU input…
But tag broadcasting has a one-cycle delay!
When is it known that A will be ready?
Cycle 1: A is to be ready
Cycle 2: A and its tag are broadcast
If the tag is broadcast one cycle earlier …
6
Revise Scheduling*
RS1: ADD R6,R2,R4
RS2: SUB R10,R0,R6
RS3: ADD R12,R10,R6
ADD(1) has been ready and selected
1. ADD(1)'s tag is broadcast, and its operands are sent to the FU; SUB is woken up and selected
2. SUB's tag is broadcast, and its operands are sent to the FU; forwarding logic replaces the 2nd FU operand with the FU output; ADD(2) is woken up, accepts the FU output, and is selected
3. So on and so forth…
Diagram: RS 1 to RS 5, SELECT, FU
RS can be centralized or distributed
*Updated
One cycle earlier
How to address CDB contention?
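The wakeup and select steps on this slide can be sketched as a loop. This is a minimal model, assuming a single FU, one selection per cycle with oldest-first priority, and register names standing in for tags; none of these details are stated in the lecture. The tag is made visible at select time, so a dependent can be selected in the next cycle and take its value from the bypass.

```python
# Sketch: wakeup/select with the tag broadcast one cycle earlier.
# Assumptions: one FU, one selection per cycle, oldest entry wins.

def schedule(entries, ready_tags):
    """entries: list of (name, dest_tag, source_tags) in age order.
    Returns {name: cycle in which the entry is selected}."""
    available = set(ready_tags)   # tags already broadcast
    waiting = list(entries)
    selected_at = {}
    cycle = 0
    while waiting:
        cycle += 1
        # Wakeup: entries whose source tags have all been broadcast.
        awake = [e for e in waiting if all(t in available for t in e[2])]
        if not awake:
            raise ValueError("no entry can wake up: a producer is missing")
        # Select: oldest awake entry goes to the FU.
        chosen = awake[0]
        selected_at[chosen[0]] = cycle
        waiting.remove(chosen)
        # Early broadcast: dependents may be selected in the next cycle.
        available.add(chosen[1])
    return selected_at

rs = [
    ("RS1", "R6",  ["R2", "R4"]),    # ADD R6,R2,R4
    ("RS2", "R10", ["R0", "R6"]),    # SUB R10,R0,R6
    ("RS3", "R12", ["R10", "R6"]),   # ADD R12,R10,R6
]
print(schedule(rs, ready_tags=["R0", "R2", "R4"]))
# {'RS1': 1, 'RS2': 2, 'RS3': 3}
```

The three dependent instructions are selected in consecutive cycles, matching the "one cycle earlier" goal; CDB contention is not modeled here.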
7
Revise Pipeline Stages
Original stages: FETCH => ISSUE => EXE => WB => COMMIT
ISSUE: decode, rename, allocate RS and ROB, and read REG/ROB
EX: wake up and select inst, then fu-execute
Revised stages: FETCH => RENAME => REG/ROB Rd => SCHEDULE => EXE => WB => COMMIT
8
Examples: Intel P6
… => Decode => Decode => Rename => ROB Rd => …
• 40-entry ROB
• 20-entry RS station
• Register Alias Table
9
Rethink RS and ROB design
Data broadcasting to RS stations:
Broadcasting saves reg-write to reg-read delay
n child instructions can receive data simultaneously
However,
Data forwarding can be used
Not all n child instructions may fu-execute next cycle
RS and ROB may store duplicate values
10
Physical Register
RS entry: op, Qj, Qk, busy, Vj, Vk
ROB entry: i-type, dest, PC, valid, result
Physical register: p1, p2, p3, …, p_n
Physical register: collection of all temporary
register contents
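One way to picture a single physical register file holding all temporary values is sketched below. It assumes that queue entries keep only physical register numbers, so a value is stored once instead of being copied into RS and ROB fields. The class and field names are illustrative, not taken from the lecture.

```python
# Sketch: values live only in the physical register file; waiting
# instructions refer to them by physical register number.
from dataclasses import dataclass, field
from typing import List

@dataclass
class PhysReg:
    valid: bool = False   # has the producer written its result yet?
    value: int = 0

@dataclass
class QueueEntry:
    op: str
    dest: int                                       # physical destination
    srcs: List[int] = field(default_factory=list)   # physical sources

class PhysRegFile:
    def __init__(self, n):
        self.regs = [PhysReg() for _ in range(n)]

    def write(self, p, value):
        self.regs[p] = PhysReg(True, value)

    def ready(self, entry):
        # An entry can execute once all of its sources hold valid values.
        return all(self.regs[p].valid for p in entry.srcs)

    def operands(self, entry):
        return [self.regs[p].value for p in entry.srcs]

prf = PhysRegFile(8)
prf.write(1, 10)                              # p1 already holds a value
sub = QueueEntry("SUB", dest=3, srcs=[2, 1])
print(prf.ready(sub))                         # False: p2 not written yet
prf.write(2, 52)                              # producer of p2 completes
print(prf.ready(sub), prf.operands(sub))      # True [52, 10]
```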
11
Register Mapping Approach
Rename architectural register to physical register
NO real architectural registers (now virtual register)
RS => issue queue
Rename stage: allocate issue queue entry, allocate ROB, allocate physical register
What is tag now?
Diagram: ra, rb, rc, pc; Mapping Table; pa, pb; alloc from free list; physical registers p1, p2, p3, …, p_n; vala, valb
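The rename step with a mapping table and a free list can be sketched as follows. This is a minimal illustration with hypothetical names; it assumes an initial identity-style mapping and uses the physical register number as the tag, which is one common answer to the question on the slide rather than something the lecture states.

```python
# Sketch: register renaming with a mapping table and a free list.
from collections import deque

class Renamer:
    def __init__(self, arch_regs, num_phys):
        # Assumption: architectural register i starts in physical register i.
        self.map = {r: i for i, r in enumerate(arch_regs)}
        self.free = deque(range(len(arch_regs), num_phys))

    def rename(self, dest, sources):
        # Read source mappings BEFORE remapping the destination, so that
        # "ADD R10,R10,R12" reads the previous R10.
        phys_srcs = [self.map[s] for s in sources]
        if not self.free:
            raise RuntimeError("no free physical register: rename must stall")
        new_dest = self.free.popleft()
        old_dest = self.map[dest]      # can be freed once this inst commits
        self.map[dest] = new_dest
        return new_dest, phys_srcs, old_dest

r = Renamer(["R0", "R2", "R4", "R6", "R10", "R12"], num_phys=10)
print(r.rename("R6",  ["R2", "R4"]))     # (6, [1, 2], 3)
print(r.rename("R10", ["R0", "R6"]))     # (7, [0, 6], 4)
print(r.rename("R10", ["R10", "R12"]))   # (8, [7, 5], 7)
```

The second and third instructions both write R10 but receive different physical registers, so the write-after-write dependence disappears.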
12
Mis-speculation Recovery
RS+ROB: no changes to arch. registers, so just clear pipeline and re-fetch
Fundamental issue: software does not see wrong register contents
Recovery for mapping approach: Roll back mapping table to the mis-speculation point
Architectural registers => virtual registers
Diagram: committed mapping; ROB with mapping 1, mapping 2; physical registers p1, p2, p3, …, p_n; mapping table status
How to implement a mapping table that supports recovery?
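One possible answer, sketched under assumptions: each ROB entry logs the previous mapping of its destination, and recovery walks the ROB from the youngest entry back to the mis-speculation point, undoing mappings and returning registers to the free list. Checkpointing the whole table is another option. The names below are hypothetical and commit is not modeled.

```python
# Sketch: roll back the mapping table by walking the ROB backwards.
from collections import deque

class RenameMap:
    def __init__(self, arch_regs, num_phys):
        self.map = {r: i for i, r in enumerate(arch_regs)}
        self.free = deque(range(len(arch_regs), num_phys))
        self.rob = []   # program order: (arch_reg, old_phys, new_phys)

    def rename_dest(self, arch_reg):
        new = self.free.popleft()
        self.rob.append((arch_reg, self.map[arch_reg], new))
        self.map[arch_reg] = new
        return len(self.rob) - 1       # ROB index of this instruction

    def recover(self, rob_index):
        """Squash every instruction younger than rob_index."""
        while len(self.rob) > rob_index + 1:
            arch_reg, old, new = self.rob.pop()   # youngest first
            self.map[arch_reg] = old              # undo the mapping
            self.free.appendleft(new)             # reclaim the register

m = RenameMap(["R6", "R10"], num_phys=6)
point = m.rename_dest("R6")    # last correct instruction, R6 -> p2
m.rename_dest("R10")           # wrong path, R10 -> p3
m.rename_dest("R10")           # wrong path, R10 -> p4
m.recover(point)
print(m.map, list(m.free))     # {'R6': 2, 'R10': 1} [3, 4, 5]
```

The walk costs one step per squashed instruction; a checkpoint-based design trades storage for a faster restore.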
13
Change of pipeline
Stages: FETCH => RENAME => SCHEDULE => REG => EXE => WB => COMMIT
Diagram: IM, Fetch Unit, Decode, Rename, issue queue, ROB, phy. regfile, S-buf, L-buf, FU1, FU2, DM
14
Example: Intel Pentium 4
Alloc => Rename => Rename => Queue => Schd => Schd => Schd => Disp => Disp => Reg => Reg => Ex
128 entries
15
Alpha 21264 Pipeline
16
Generic Superscalar Processor Models
Reservation based: Fetch => Rename => Reg/ROB => Schedule (wakeup, select) => execute (bypass, FU, FU, D-cache) => commit
Issue queue based: Fetch => Rename => Schedule (wakeup, select) => Regfile => execute (bypass, FU, FU, D-cache) => commit
Source: Palacharla PhD thesis 1998
17
Summary of Dynamic Scheduling
Pipeline stages
• Renaming (in-order)
• Schedule
• Commit (in-order)
Two organizations
• Mapping table + phy reg + issue queue + ROB; REN => SCHD => REG
• Reg alias table + RS + ROB, reg in RS and ROB; REN => REG => SCHD
Scheduling methods
• Tag broadcasting vs. scoreboarding (later)
CDC6600: introduces scoreboarding
Tomasulo: introduces renaming and tag broadcasting
Reorder buffer: provides in-order commit
Real OOO processors
• very complicated (like a vehicle)
• bring impl variants
• but all root in those basic designs
18