Scoreboarding & Tomasulo’s Approach
Based on slides by Vincent H. Berk
Scoreboarding
Figure A.51: The basic structure of a DLX processor with a scoreboard. Data buses connect the registers to the functional units (two FP multipliers, an FP divider, an FP adder, and an integer unit); the scoreboard exchanges control/status signals with the registers and with the functional units.
Four Stages of Scoreboard Control: ISSUE
1. Issue: decode instructions and check for structural hazards (ID1)
If a functional unit for the instruction is free and no other active instruction has the same destination register (WAW), the scoreboard issues the instruction to the functional unit and updates its internal data structure. If a structural or WAW hazard exists, the instruction issue stalls, and no further instructions issue until the hazard is cleared.
Algorithm:
Ensure in-order issue
Multiple issues per cycle are allowed
Check whether the destination register is already reserved for writing (WAW)
Check whether the read-operands stage of the functional unit is free (structural)
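The issue checks above can be sketched in a few lines of Python. This is only an illustrative sketch, not the CDC 6600 logic: the names `FunctionalUnit`, `try_issue`, and `result_status` are hypothetical, and the model issues one instruction per call.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FunctionalUnit:
    """Hypothetical, reduced functional-unit entry (only the fields issue needs)."""
    name: str
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None  # destination register of the active instruction


def try_issue(fu: FunctionalUnit, op: str, dest: str,
              result_status: Dict[str, str]) -> bool:
    """Issue `op` to `fu` unless a structural or WAW hazard exists.

    `result_status` maps a register name to the unit that will write it
    (the register result status table).
    """
    if fu.busy:                 # structural hazard: unit already in use
        return False
    if dest in result_status:   # WAW hazard: another active writer of dest
        return False
    fu.busy, fu.op, fu.fi = True, op, dest
    result_status[dest] = fu.name
    return True


result_status: Dict[str, str] = {}
add = FunctionalUnit("Add")
mult1 = FunctionalUnit("Mult1")
issued_subd = try_issue(add, "SUBD", "F8", result_status)    # unit free, F8 unreserved
issued_addd = try_issue(add, "ADDD", "F6", result_status)    # Add unit is still busy
issued_waw = try_issue(mult1, "MULTD", "F8", result_status)  # F8 already reserved
```

Because issue is in order, a caller of this sketch would stop at the first instruction for which `try_issue` returns `False` and retry it in a later cycle.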
Four Stages of Scoreboard Control: READ OPERANDS
2. Read operands: wait until no data hazards, then read operands (ID2); this is the first functional pipeline stage
A source operand is available if no earlier issued active instruction is going to write it, or if the register containing the operand is being written by a currently active functional unit. When the source operands are available, the scoreboard tells the functional unit to proceed to read the operands from the registers and begin execution. The scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order.
Algorithm:
Wait for operands to become available, as tracked by the register result status (RAW)
Operand caching is allowed
Forwarding from another WB stage is allowed
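A minimal sketch of the RAW check, assuming the Qj/Qk (producing unit) and Rj/Rk (operand ready) fields of the functional unit status. The names `SourceState`, `record_sources`, `can_read_operands`, and `producer_wrote` are hypothetical, and a missing operand is treated as ready, which is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SourceState:
    """Source-operand bookkeeping for one functional unit (hypothetical subset)."""
    fj: Optional[str]
    fk: Optional[str]
    qj: Optional[str] = None  # unit that will produce Fj, if any
    qk: Optional[str] = None  # unit that will produce Fk, if any
    rj: bool = False          # Fj available and not yet read
    rk: bool = False          # Fk available and not yet read


def record_sources(fj: Optional[str], fk: Optional[str],
                   result_status: Dict[str, str]) -> SourceState:
    """At issue, look up pending writers of the source registers."""
    qj = result_status.get(fj) if fj is not None else None
    qk = result_status.get(fk) if fk is not None else None
    return SourceState(fj, fk, qj, qk, rj=qj is None, rk=qk is None)


def can_read_operands(s: SourceState) -> bool:
    """Operands may be read only when no earlier writer is still pending."""
    return s.rj and s.rk


def producer_wrote(s: SourceState, unit: str) -> None:
    """Called when `unit` writes its result: clear the matching dependence."""
    if s.qj == unit:
        s.qj, s.rj = None, True
    if s.qk == unit:
        s.qk, s.rk = None, True


# MULTD F0, F2, F4 issued while the Integer unit still has to load F2.
multd = record_sources("F2", "F4", {"F2": "Integer"})
blocked = not can_read_operands(multd)
producer_wrote(multd, "Integer")
ready = can_read_operands(multd)
```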
Four Stages of Scoreboard Control: EXECUTION and WRITE RESULT
3. Execution: operate on operands (EX)
The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution. This stage can be (sub-)pipelined.
4. Write result: finish execution (WB)
Once the scoreboard is aware that the functional unit has completed execution, it checks for WAR hazards. If there is none, it writes the result. If there is a WAR hazard, it stalls the instruction.
Algorithm:
Delay the write until all Rj and Rk fields for this register are marked as either cached or read.
If operands are cached: forward the result right away.
If not: wait until all operands have been read before writing.
Forward the result to units waiting on this write for an operand.
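The WAR test at write-back can be sketched as follows, assuming operands are not cached, so a result must wait until every other unit has read the old value of the destination register. `UnitEntry` and `can_write_result` are hypothetical names; the register numbers reuse the DIVD/ADDD pair from the example program in these slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UnitEntry:
    """Hypothetical subset of a functional unit status entry."""
    name: str
    fi: Optional[str]  # destination register
    fj: Optional[str]  # first source register
    fk: Optional[str]  # second source register
    rj: bool           # Fj ready and not yet read
    rk: bool           # Fk ready and not yet read


def can_write_result(writer: UnitEntry, units: List[UnitEntry]) -> bool:
    """Return False while another unit still has to read the old value of writer.fi."""
    for other in units:
        if other is writer:
            continue
        if other.fj == writer.fi and other.rj:
            return False
        if other.fk == writer.fi and other.rk:
            return False
    return True


# DIVD F10, F0, F6 has not read F6 yet; ADDD F6, F8, F2 wants to overwrite F6.
divide = UnitEntry("Divide", fi="F10", fj="F0", fk="F6", rj=False, rk=True)
add = UnitEntry("Add", fi="F6", fj="F8", fk="F2", rj=False, rk=False)
units = [divide, add]
stalled = not can_write_result(add, units)  # WAR hazard on F6
divide.rk = False                           # DIVD has now read its operands
released = can_write_result(add, units)
```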
Three Parts of the Scoreboard
1. Instruction status
Indicates which of the 4 steps the instruction is in.
2. Functional unit status
Indicates the state of the functional unit (FU). There are 9 fields for each functional unit:
Busy – Indicates whether the unit is busy or not
Op – Operation to perform in the unit (e.g., + or -)
Fi – Destination register
Fj, Fk – Source-register numbers
Qj, Qk – Functional units producing source registers Fj, Fk
Rj, Rk – Flags indicating when Fj, Fk are available and not yet read (alternatively: read and cached)
3. Register result status
Indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
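One way to picture the three parts is as three small tables. The sketch below is an assumed Python layout (the class names `InstructionStatus`, `FunctionalUnitStatus`, and `Scoreboard` are hypothetical); it is filled with a state in which a load of F2 is reading its operands and MULTD F0, F2, F4 has just issued.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Optional


@dataclass
class InstructionStatus:
    """Part 1: the cycle in which each of the 4 steps was reached."""
    text: str
    issue: Optional[int] = None
    read_operands: Optional[int] = None
    exec_complete: Optional[int] = None
    write_result: Optional[int] = None

    def last_step(self) -> str:
        """Name of the latest step that has a recorded cycle."""
        for name in ("write_result", "exec_complete", "read_operands", "issue"):
            if getattr(self, name) is not None:
                return name
        return "not issued"


@dataclass
class FunctionalUnitStatus:
    """Part 2: the 9 fields kept for each functional unit."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None
    fj: Optional[str] = None
    fk: Optional[str] = None
    qj: Optional[str] = None
    qk: Optional[str] = None
    rj: bool = False
    rk: bool = False


@dataclass
class Scoreboard:
    instructions: List[InstructionStatus] = field(default_factory=list)
    units: Dict[str, FunctionalUnitStatus] = field(default_factory=dict)
    # Part 3: register name -> unit that will write it (absent means blank).
    register_result: Dict[str, str] = field(default_factory=dict)


sb = Scoreboard(
    instructions=[
        InstructionStatus("LD F2, 45(R3)", issue=5, read_operands=6),
        InstructionStatus("MULTD F0, F2, F4", issue=6),
    ],
    units={
        "Integer": FunctionalUnitStatus(True, "Load", "F2", None, "R3", rk=True),
        "Mult1": FunctionalUnitStatus(True, "Mult", "F0", "F2", "F4",
                                      qj="Integer", rk=True),
        "Add": FunctionalUnitStatus(),
    },
    register_result={"F0": "Mult1", "F2": "Integer"},
)
field_count = len(fields(FunctionalUnitStatus))
```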
Scoreboard Example Cycle 1

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1
LD     F2     45+  R3
MULTD  F0     F2   F4
SUBD   F8     F6   F2
DIVD   F10    F0   F6
ADDD   F6     F8   F2

Functional unit status (clock 1)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load   F6         R2                            Yes
Mult1    No
Mult2    No
Add      No
Divide   No

Register result status
      F0     F2     F4     F6     F8     F10    F12    ...    F30
FU                         Int

R2 has not been read/cached until cycle 2!
Scoreboard Example Cycle 2

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2
LD     F2     45+  R3
MULTD  F0     F2   F4
SUBD   F8     F6   F2
DIVD   F10    F0   F6
ADDD   F6     F8   F2

Functional unit status (clock 2)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load   F6         R2                            Yes
Mult1    No
Mult2    No
Add      No
Divide   No

Register result status
      F0     F2     F4     F6     F8     F10    F12    ...    F30
FU                         Int

Issue 2nd LD or MULTD?
Scoreboard Example Cycle 4

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3
MULTD  F0     F2   F4
SUBD   F8     F6   F2
DIVD   F10    F0   F6
ADDD   F6     F8   F2

Functional unit status (clock 4)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load   F6         R2                            No
Mult1    No
Mult2    No
Add      No
Divide   No

Register result status
      F0     F2     F4     F6     F8     F10    F12    ...    F30
FU                         Int
Scoreboard Example Cycle 5

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5
MULTD  F0     F2   F4
SUBD   F8     F6   F2
DIVD   F10    F0   F6
ADDD   F6     F8   F2

Functional unit status (clock 5)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load   F2         R3                            Yes
Mult1    No
Mult2    No
Add      No
Divide   No

Register result status
      F0     F2     F4     F6     F8     F10    F12    ...    F30
FU           Int

SUPERSCALAR: Issue MULTD?
Scoreboard Example Cycle 6

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6
MULTD  F0     F2   F4   6
SUBD   F8     F6   F2
DIVD   F10    F0   F6
ADDD   F6     F8   F2

Functional unit status (clock 6)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load   F2         R3                            Yes
Mult1    Yes   Mult   F0    F2   F4   Int                 No   Yes
Mult2    No
Add      No
Divide   No

Register result status
      F0     F2     F4     F6     F8     F10    F12    ...    F30
FU    Mult1  Int
Scoreboard Example Cycle 7

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7
MULTD  F0     F2   F4   6
SUBD   F8     F6   F2   7
DIVD   F10    F0   F6
ADDD   F6     F8   F2

Functional unit status (clock 7)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load   F2         R3                            No
Mult1    Yes   Mult   F0    F2   F4   Int                 No   Yes
Mult2    No
Add      Yes   Sub    F8    F6   F2             Int       Yes  No
Divide   No

Register result status
      F0     F2     F4     F6     F8     F10    F12    ...    F30
FU    Mult1  Int                  Add

Read multiply operands? DIVD could have been issued on this cycle.
Scoreboard Example Cycle 8a

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7
MULTD  F0     F2   F4   6
SUBD   F8     F6   F2   7
DIVD   F10    F0   F6   8
ADDD   F6     F8   F2

Functional unit status (clock 8)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load   F2         R3                            Yes
Mult1    Yes   Mult   F0    F2   F4   Int                 No   Yes
Mult2    No
Add      Yes   Sub    F8    F6   F2             Int       Yes  No
Divide   Yes   Div    F10   F0   F6   Mult1               No   Yes

Register result status
      F0     F2     F4     F6     F8     F10    F12    ...    F30
FU    Mult1  Int                  Add    Div
Scoreboard Example Cycle 8b

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6
SUBD   F8     F6   F2   7
DIVD   F10    F0   F6   8
ADDD   F6     F8   F2

Functional unit status (clock 8)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  No
Mult1    Yes   Mult   F0    F2   F4                       Yes  Yes
Mult2    No
Add      Yes   Sub    F8    F6   F2                       Yes  Yes
Divide   Yes   Div    F10   F0   F6   Mult1               No   Yes

Register result status
      F0     F2     F4     F6     F8     F10    F12    ...    F30
FU    Mult1                       Add    Div
Scoreboard Example Cycle 9

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6      9
SUBD   F8     F6   F2   7      9
DIVD   F10    F0   F6   8
ADDD   F6     F8   F2

Functional unit status (clock 9)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  No
Mult1    Yes   Mult   F0    F2   F4                       Yes  Yes
Mult2    No
Add      Yes   Sub    F8    F6   F2                       Yes  Yes
Divide   Yes   Div    F10   F0   F6   Mult1               No   Yes

Register result status
      F0     F2     F4     F6     F8     F10    F12    ...    F30
FU    Mult1                       Add    Div

Issue ADDD?
Scoreboard Example Cycle 11

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6      9
SUBD   F8     F6   F2   7      9              11
DIVD   F10    F0   F6   8
ADDD   F6     F8   F2

Functional unit status (clock 11)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  No
Mult1    Yes   Mult   F0    F2   F4                       Yes  Yes
Mult2    No
Add      Yes   Sub    F8    F6   F2                       Yes  Yes
Divide   Yes   Div    F10   F0   F6   Mult1               No   Yes

Register result status
      F0     F2     F4     F6     F8     F10    F12    ...    F30
FU    Mult1                       Add    Div
Scoreboard Example Cycle 12

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6      9
SUBD   F8     F6   F2   7      9              11                  12
DIVD   F10    F0   F6   8
ADDD   F6     F8   F2

Functional unit status (clock 12)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  No
Mult1    Yes   Mult   F0    F2   F4                       Yes  Yes
Mult2    No
Add      No
Divide   Yes   Div    F10   F0   F6   Mult1               No   Yes

Register result status
      F0     F2     F4     F6     F8     F10    F12    ...    F30
FU    Mult1                              Div
Scoreboard Example Cycle 13

Instruction status
Instruction   j    k    Issue  Read operands  Execution complete  Write result
LD     F6     34+  R2   1      2              3                   4
LD     F2     45+  R3   5      6              7                   8
MULTD  F0     F2   F4   6      9
SUBD   F8     F6   F2   7      9              11                  12
DIVD   F10    F0   F6   8
ADDD   F6     F8   F2   13

Functional unit status (clock 13)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  No
Mult1    Yes   Mult   F0    F2   F4                       Yes  Yes
Mult2    No
Add      Yes   Add    F6    F8   F2                       Yes  Yes
Divide   Yes   Div    F10   F0   F6   Mult1               No   Yes

Register result status
      F0     F2     F4     F6     F8     F10    F12    ...    F30
FU    Mult1                Add           Div
Scoreboarding Summary
Limitations of the CDC 6600 scoreboard
No forwarding hardware
Limited to instructions in a basic block (small window)
Small number of functional units (structural hazards), especially integer/load/store units
Does not issue on structural or WAW hazards
Waits on WAR hazards
Imprecise exceptions
Key idea: allow instructions behind a stall to proceed
Decode => issue instructions and read operands
Enables out-of-order execution => out-of-order completion
Scoreboarding Summary (continued)
Modern Day Improvements:
All operands are cached as soon as they are available
Forwarding
Pipelined functional units
Microcoding, e.g., IA-32 (widens the execution window)
More precise exceptions
In-order retirement
Works best with a large number of actual registers
Tomasulo approach:
Reservation stations vs. forwarding and caching
Temporary registers act as many virtual registers
Tomasulo’s Approach
Hardware Schemes for ILP
Key idea: allow instructions behind a stall to proceed
Decode => issue instructions and read operands
Enables out-of-order execution => out-of-order completion
Why in hardware at run time?
Works when dependences are not known at compile time
Simplifies the compiler
Allows code compiled for one machine to run well on another
Out-of-order execution divides the ID stage:
Issue: decode instructions, check for structural hazards
Read operands: wait until no data hazards, then read operands
Tomasulo’s Algorithm
For IBM 360/91, about 3 years after CDC 6600
Goal: high performance without special compilers
Differences between IBM 360 & CDC 6600 ISA
IBM has only 2 register specifiers per instruction vs. 3 in CDC 6600
IBM has 4 FP registers vs. 8 in CDC 6600
Differences between Tomasulo’s Algorithm & Scoreboard
Control & buffers (called “reservation stations”) distributed with functional units vs. centralized in scoreboard
Registers in instructions replaced by pointers to reservation station buffer
HW renaming of registers to avoid WAR, WAW hazards
Common data bus (CDB) broadcasts results to functional units
Loads and stores treated as functional units as well
Descendants of the 360/91 include: Alpha 21264, HP 8000, MIPS 10000, Pentium III, PowerPC 604, ...
Three Stages of Tomasulo Algorithm
1. Issue: get instruction from FP operation queue
If a reservation station is free, issue the instruction and send operands (renames registers).
2. Execution: operate on operands (EX)
When both operands are ready, execute; if not ready, watch the common data bus for the result.
3. Write result: finish execution (WB)
Write on the common data bus to all awaiting units; mark the reservation station available.
Common data bus: data + source (“come from” bus)
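The three stages can be sketched as below. This is a simplified model under stated assumptions, not the IBM 360/91 design: `Station`, `issue`, `ready`, and `broadcast` are hypothetical names, register values are plain floats, and the register result status is a dict from register name to the tag of the producing station.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Station:
    """Reservation station with the fields used in this sketch."""
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None  # operand values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None    # tags of the stations that will produce them
    qk: Optional[str] = None


def issue(st: Station, op: str, dest: str, src_j: str, src_k: str,
          regs: Dict[str, float], reg_status: Dict[str, str]) -> bool:
    """Stage 1: take a free station and rename sources to values or tags."""
    if st.busy:
        return False
    st.busy, st.op = True, op
    st.qj = reg_status.get(src_j)
    st.vj = regs[src_j] if st.qj is None else None
    st.qk = reg_status.get(src_k)
    st.vk = regs[src_k] if st.qk is None else None
    reg_status[dest] = st.name  # later readers of dest wait on this tag
    return True


def ready(st: Station) -> bool:
    """Stage 2 precondition: both operands have arrived."""
    return st.busy and st.qj is None and st.qk is None


def broadcast(tag: str, value: float, stations: List[Station],
              regs: Dict[str, float], reg_status: Dict[str, str]) -> None:
    """Stage 3: put (tag, value) on the CDB; waiting units capture the value."""
    for st in stations:
        if st.qj == tag:
            st.vj, st.qj = value, None
        if st.qk == tag:
            st.vk, st.qk = value, None
        if st.name == tag:
            st.busy = False  # producing station becomes available
    for reg in [r for r, producer in reg_status.items() if producer == tag]:
        regs[reg] = value
        del reg_status[reg]


regs = {"F0": 0.0, "F2": 0.0, "F4": 2.0}
reg_status = {"F2": "Load2"}  # F2 is still being loaded
mult1 = Station("Mult1")
issue(mult1, "MULTD", "F0", "F2", "F4", regs, reg_status)
waiting = not ready(mult1)
broadcast("Load2", 3.0, [mult1], regs, reg_status)
```

The tag carried with each result is what makes the CDB a "come from" bus: consumers match on the producer's name rather than on a register number.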
Tomasulo Organization
Figure: components shown are the instruction unit feeding the FP op queue, the FP registers, load buffers (from memory), store buffers (to memory), the operand bus and operation bus, reservation stations in front of the FP adders and FP multipliers, and the common data bus (CDB).
Reservation Station Components
Op – Operation to perform in the unit (e.g., + or –)
Qj, Qk – Reservation stations producing source registers
Vj, Vk – Value of source operands
Rj, Rk – Flags indicating when Vj, Vk are ready
Busy – Indicates that the reservation station and its FU are busy
Register result status
Indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
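A possible Python rendering of these components, with Rj and Rk derived from Qj and Qk rather than stored (an assumption of this sketch; the class name `ReservationStation` is hypothetical). The short demo shows why a value copied into Vk removes a WAR hazard: overwriting the register later does not disturb the waiting instruction.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None
    vk: Optional[float] = None
    qj: Optional[str] = None
    qk: Optional[str] = None

    @property
    def rj(self) -> bool:
        """Vj is ready when no station is still expected to produce it."""
        return self.qj is None

    @property
    def rk(self) -> bool:
        """Vk is ready when no station is still expected to produce it."""
        return self.qk is None


regs: Dict[str, float] = {"F6": 1.5}
# DIVD F10, F0, F6: F0 is still being computed by Mult1, F6 is copied at issue.
divd = ReservationStation("Mult2", busy=True, op="DIVD",
                          vk=regs["F6"], qj="Mult1")
regs["F6"] = 9.0  # a later ADDD F6, ... may overwrite F6 at any time
```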
Tomasulo Example Cycle 1

Instruction status
Instruction   j    k    Issue  Execution complete  Write result
LD     F6     34+  R2   1
LD     F2     45+  R3
MULTD  F0     F2   F4
SUBD   F8     F6   F2
DIVD   F10    F0   F6
ADDD   F6     F8   F2

Reservation stations (clock 1)
                     S1        S2        RS for j  RS for k
Name   Busy  Op      Vj        Vk        Qj        Qk
Add1   No
Add2   No
Add3   No
Mult1  No
Mult2  No

Load buffers
Name   Busy  Address
Load1  Yes   34+R2
Load2  No
Load3  No

Register result status
      F0        F2        F4        F6        F8        F10       F12       ...       F30
FU                                  Load1
Tomasulo Example Cycle 2
(ENGS 116 Lecture 8)

Instruction status
Instruction   j    k    Issue  Execution complete  Write result
LD     F6     34+  R2   1
LD     F2     45+  R3   2
MULTD  F0     F2   F4
SUBD   F8     F6   F2
DIVD   F10    F0   F6
ADDD   F6     F8   F2

Reservation stations (clock 2)
                     S1        S2        RS for j  RS for k
Name   Busy  Op      Vj        Vk        Qj        Qk
Add1   No
Add2   No
Add3   No
Mult1  No
Mult2  No

Load buffers
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No

Register result status
      F0        F2        F4        F6        F8        F10       F12       ...       F30
FU              Load2               Load1
Tomasulo Example Cycle 3

Instruction status
Instruction   j    k    Issue  Execution complete  Write result
LD     F6     34+  R2   1      3
LD     F2     45+  R3   2
MULTD  F0     F2   F4   3
SUBD   F8     F6   F2
DIVD   F10    F0   F6
ADDD   F6     F8   F2

Reservation stations (clock 3)
                     S1        S2        RS for j  RS for k
Name   Busy  Op      Vj        Vk        Qj        Qk
Add1   No
Add2   No
Add3   No
Mult1  Yes   MULTD             R(F4)     Load2
Mult2  No

Load buffers
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No

Register result status
      F0        F2        F4        F6        F8        F10       F12       ...       F30
FU    Mult1     Load2               Load1

Register names are renamed in reservation stations.
Load1 completing: who is waiting for Load1?
Tomasulo Example Cycle 4

Instruction status
Instruction   j    k    Issue  Execution complete  Write result
LD     F6     34+  R2   1      3                   4
LD     F2     45+  R3   2      4
MULTD  F0     F2   F4   3
SUBD   F8     F6   F2   4
DIVD   F10    F0   F6
ADDD   F6     F8   F2

Reservation stations (clock 4)
                     S1        S2        RS for j  RS for k
Name   Busy  Op      Vj        Vk        Qj        Qk
Add1   Yes   SUBD    M(34+R2)                      Load2
Add2   No
Add3   No
Mult1  Yes   MULTD             R(F4)     Load2
Mult2  No

Load buffers
Name   Busy  Address
Load1  No
Load2  Yes   45+R3
Load3  No

Register result status
      F0        F2        F4        F6        F8        F10       F12       ...       F30
FU    Mult1     Load2               M(34+R2)  Add1

Load2 completing: who is waiting for it?
Tomasulo Example Cycle 5

Instruction status
Instruction   j    k    Issue  Execution complete  Write result
LD     F6     34+  R2   1      3                   4
LD     F2     45+  R3   2      4                   5
MULTD  F0     F2   F4   3
SUBD   F8     F6   F2   4
DIVD   F10    F0   F6   5
ADDD   F6     F8   F2

Reservation stations (clock 5)
                     S1        S2        RS for j  RS for k
Name   Busy  Op      Vj        Vk        Qj        Qk
Add1   Yes   SUBD    M(34+R2)  M(45+R3)
Add2   No
Add3   No
Mult1  Yes   MULTD   M(45+R3)  R(F4)
Mult2  Yes   DIVD              M(34+R2)  Mult1

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Register result status
      F0        F2        F4        F6        F8        F10       F12       ...       F30
FU    Mult1     M(45+R3)                      Add1      Mult2
Tomasulo Example Cycle 6

Instruction status
Instruction   j    k    Issue  Execution complete  Write result
LD     F6     34+  R2   1      3                   4
LD     F2     45+  R3   2      4                   5
MULTD  F0     F2   F4   3
SUBD   F8     F6   F2   4
DIVD   F10    F0   F6   5
ADDD   F6     F8   F2   6

Reservation stations (clock 6)
                     S1        S2        RS for j  RS for k
Name   Busy  Op      Vj        Vk        Qj        Qk
Add1   Yes   SUBD    M(34+R2)  M(45+R3)
Add2   Yes   ADDD              M(45+R3)  Add1
Add3   No
Mult1  Yes   MULTD   M(45+R3)  R(F4)
Mult2  Yes   DIVD              M(34+R2)  Mult1

Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No

Register result status
      F0        F2        F4        F6        F8        F10       F12       ...       F30
FU    Mult1                         Add2      Add1      Mult2
Tomasulo Summary
Reservation stations: renaming to a larger set of registers + buffering of source operands
Prevents registers from becoming a bottleneck
Avoids the WAR, WAW hazards of the scoreboard
Allows loop unrolling in HW
Not limited to basic blocks (integer units get ahead, beyond branches)
Lasting Contributions
Dynamic scheduling
Register renaming
Load/store disambiguation
360/91 descendants are Pentium III; PowerPC 604; MIPS R10000; HP-PA 8000; Alpha 21264
Tomasulo with Speculation
1. Issue – Requires an empty reservation station and an empty ROB (reorder buffer) slot. Send operands to the reservation station from the register file or from the ROB. This stage is often referred to as dispatch.
2. Execute – Monitor the CDB for operands, check RAW hazards. When both operands are available, execute.
3. Write Result – When available, write the result on the CDB through to the ROB and any waiting reservation stations. Stores write to the value field in the ROB.
4. Commit – Three cases:
• Normal commit: write registers, in-order commit
• Store: update memory
• Incorrect branch: flush the ROB and reservation stations, and restart execution at the correct PC
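The three commit cases can be sketched with a queue standing in for the ROB. This is an illustrative model under assumptions: `RobEntry` and `commit` are hypothetical names, at most one instruction commits per call, and flushing the reservation stations is left out.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, Optional


@dataclass
class RobEntry:
    kind: str                       # "reg", "store", or "branch"
    dest: Optional[str] = None      # destination register for "reg"
    address: Optional[int] = None   # memory address for "store"
    value: Optional[float] = None
    ready: bool = False             # result has been written to the ROB
    mispredicted: bool = False
    correct_pc: Optional[int] = None


def commit(rob: Deque[RobEntry], regs: Dict[str, float],
           memory: Dict[int, float]) -> Optional[int]:
    """Commit the head entry if it is ready; return a restart PC after a flush."""
    if not rob or not rob[0].ready:
        return None                 # in-order: nothing behind the head may commit
    head = rob.popleft()
    if head.kind == "store":
        memory[head.address] = head.value
    elif head.kind == "branch":
        if head.mispredicted:
            rob.clear()             # discard all speculative work
            return head.correct_pc
    else:
        regs[head.dest] = head.value
    return None


regs: Dict[str, float] = {}
memory: Dict[int, float] = {}
rob: Deque[RobEntry] = deque([
    RobEntry("reg", dest="F0", value=6.0),
    RobEntry("store", address=100, value=1.0, ready=True),
    RobEntry("branch", ready=True, mispredicted=True, correct_pc=64),
    RobEntry("reg", dest="F8", value=2.0, ready=True),
])
commit(rob, regs, memory)           # head not ready: the finished store must wait
blocked_len = len(rob)
rob[0].ready = True
commit(rob, regs, memory)           # normal commit: writes F0
commit(rob, regs, memory)           # store: updates memory
restart_pc = commit(rob, regs, memory)  # mispredicted branch: flush
```

The write to F8 sits behind the mispredicted branch, so it is discarded by the flush and never reaches the register file.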
Now, for the grand finale
Let’s compare!!!
Scoreboard organization (Figure A.51, repeated for comparison): a centralized scoreboard exchanges control/status signals with the registers and the functional units (two FP multipliers, FP divide, FP add, integer unit), which share the data buses.
Tomasulo organization (repeated for comparison): FP op queue, FP registers, load and store buffers, reservation stations in front of the FP adders and FP multipliers, and the common data bus (CDB).
Scoreboard Example Cycle 6 (repeated for comparison): the first LD wrote its result in cycle 4, the second LD issued in cycle 5 and reads its operands in cycle 6, and MULTD issues in cycle 6 to Mult1, where it waits on the Integer unit for F2. SUBD, DIVD, and ADDD have not yet issued.
Tomasulo Example Cycle 6 (repeated for comparison): all six instructions have issued by cycle 6. Both loads have written their results (cycles 4 and 5); SUBD (Add1) and MULTD (Mult1) hold both operand values; ADDD (Add2) waits on Add1 and DIVD (Mult2) waits on Mult1.
Differences between Tomasulo’s Algorithm & Scoreboard
Tomasulo’s algorithm:
Control & buffers (“reservation stations”) distributed with functional units
Registers in instructions replaced by pointers to reservation station buffer
HW renaming of registers to avoid WAR, WAW hazards
Common data bus (CDB) broadcasts results to functional units
Loads and stores treated as functional units as well
Stages: Issue, Execution, Write result
Scoreboard:
Control & buffers centralized
Uses actual registers
Does not issue if structural or WAW hazards
Waits for WAR hazards
Forwarding?
Stages: Issue, Read operands, Execution, Write result