Scoreboarding & Tomasulo’s Approach

Based on the slides of Vincent H. Berk
Scoreboarding
Scoreboard
Figure A.51 The basic structure of a DLX processor with a scoreboard. (Diagram labels: Registers, Data buses, FP mult, FP mult, FP divide, FP add, Integer unit, Scoreboard, Control/status.)
Four Stages of Scoreboard Control: ISSUE
1. Issue: decode instructions and check for structural hazards (ID1)
If a functional unit for the instruction is free and no other active instruction has the same destination register (WAW), the scoreboard issues the instruction to the functional unit and updates its internal data structure. If a structural or WAW hazard exists, the instruction issue stalls, and no further instructions issue until the hazard is cleared.
Algorithm:
• Ensure in-order issue
• Multiple issues per cycle are allowed
• Check whether the destination register is already reserved for writing (WAW)
• Check whether the read-operands stage of the functional unit is free (structural)
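The issue checks above can be sketched in a few lines of Python. This is only an illustrative model, not the CDC 6600 or DLX control logic: the names FunctionalUnit, try_issue, units and result_status are assumptions made for this example, and a unit is treated as a single busy flag.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FunctionalUnit:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None  # destination register of the active instruction


def try_issue(unit_name: str, op: str, dest: str,
              units: Dict[str, FunctionalUnit],
              result_status: Dict[str, str]) -> bool:
    """Issue one instruction unless a structural or WAW hazard blocks it."""
    structural_hazard = units[unit_name].busy
    waw_hazard = dest in result_status  # another active instruction writes dest
    if structural_hazard or waw_hazard:
        return False  # issue stalls; later instructions wait behind it
    units[unit_name] = FunctionalUnit(busy=True, op=op, fi=dest)
    result_status[dest] = unit_name
    return True


units = {"Integer": FunctionalUnit(), "Add": FunctionalUnit()}
result_status: Dict[str, str] = {}

first_load = try_issue("Integer", "Load", "F6", units, result_status)
second_load = try_issue("Integer", "Load", "F2", units, result_status)  # unit busy
add_to_f6 = try_issue("Add", "Add", "F6", units, result_status)         # WAW on F6
print(first_load, second_load, add_to_f6)
```

In this sketch the second load stalls on a structural hazard and the add stalls on a WAW hazard, which mirrors the two stall conditions listed for the issue stage.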
Four Stages of Scoreboard Control: READ OPERANDS
2. Read operands: wait until no data hazards, then read operands (ID2); this is the first functional pipeline stage
A source operand is available if no earlier issued active instruction is going to write it, or if the register containing the operand is being written by a currently active functional unit. When the source operands are available, the scoreboard tells the functional unit to proceed to read the operands from the registers and begin execution. The scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order.
Algorithm:
• Wait for operands to become available, as tracked by the register result status (RAW)
• Operand caching is allowed
• Forwarding from another WB stage is allowed
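A minimal sketch of the read-operands check, assuming the Qj/Qk/Rj/Rk bookkeeping used in the scoreboard tables. UnitEntry, record_sources, can_read_operands and on_write_result are hypothetical names chosen for this example; real hardware updates these flags with control logic rather than function calls.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class UnitEntry:
    fj: Optional[str]
    fk: Optional[str]
    qj: Optional[str] = None  # unit that will produce Fj, if any
    qk: Optional[str] = None  # unit that will produce Fk, if any
    rj: bool = False
    rk: bool = False


def record_sources(entry: UnitEntry, result_status: Dict[str, str]) -> None:
    """At issue: note which units (if any) still have to write the sources."""
    entry.qj = result_status.get(entry.fj)
    entry.qk = result_status.get(entry.fk)
    entry.rj = entry.qj is None
    entry.rk = entry.qk is None


def can_read_operands(entry: UnitEntry) -> bool:
    """RAW check: both sources must be available before execution starts."""
    return entry.rj and entry.rk


def on_write_result(entry: UnitEntry, writer: str) -> None:
    """A unit wrote its result: any source waiting on it becomes available."""
    if entry.qj == writer:
        entry.qj, entry.rj = None, True
    if entry.qk == writer:
        entry.qk, entry.rk = None, True


# MULTD F0, F2, F4 issued while the integer unit still has to load F2.
result_status = {"F2": "Integer"}
multd = UnitEntry(fj="F2", fk="F4")
record_sources(multd, result_status)
stalled_before = not can_read_operands(multd)
on_write_result(multd, "Integer")
ready_after = can_read_operands(multd)
print(stalled_before, ready_after)
```

The example follows the MULTD case from the scoreboard walkthrough: the multiply waits in read operands until the load of F2 writes its result.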
Four Stages of Scoreboard Control: EXECUTION and WRITE RESULT
3. Execution: operate on operands (EX)
• The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution. This stage can be (sub-)pipelined.
4. Write result: finish execution (WB)
• Once the scoreboard is aware that the functional unit has completed execution, the scoreboard checks for WAR hazards. If there are none, it writes the result. If there is a WAR hazard, it stalls the instruction.
Algorithm:
• Delay the write until all Rj and Rk fields for this register are marked as either cached or read.
• If operands are cached: forward the answer right away.
• If not, wait until all operands are read before writing.
• Forward answers to units waiting for this write for their operand.
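The WAR check in the write-result stage can be sketched as follows, assuming the convention from the field list that Rj/Rk are set while a source is available but not yet read. Unit and can_write are hypothetical names for this example, and operand caching is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Unit:
    busy: bool = False
    fi: Optional[str] = None  # destination register
    fj: Optional[str] = None  # first source register
    fk: Optional[str] = None  # second source register
    rj: bool = False          # Fj available and not yet read
    rk: bool = False          # Fk available and not yet read


def can_write(name: str, units: Dict[str, Unit]) -> bool:
    """Allow the write only if no other unit still has to read the old value."""
    dest = units[name].fi
    for other_name, other in units.items():
        if other_name == name or not other.busy:
            continue
        if (other.fj == dest and other.rj) or (other.fk == dest and other.rk):
            return False  # WAR hazard: stall the write
    return True


# ADDD F6, F8, F2 has finished, but DIVD F10, F0, F6 has not read F6 yet.
units = {
    "Add": Unit(busy=True, fi="F6", fj="F8", fk="F2"),
    "Divide": Unit(busy=True, fi="F10", fj="F0", fk="F6", rj=False, rk=True),
}
blocked = not can_write("Add", units)
units["Divide"].rk = False  # DIVD has now read its operands
allowed = can_write("Add", units)
print(blocked, allowed)
```

The ADDD/DIVD pair is taken from the example instruction sequence, where ADDD writes F6 and the earlier DIVD reads it.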
Three Parts of the Scoreboard
1. Instruction status
• Indicates which of the 4 steps the instruction is in.
2. Functional unit status
• Indicates the state of the functional unit (FU). 9 fields for each functional unit:
  Busy – Indicates whether the unit is busy or not
  Op – Operation to perform in the unit (e.g., + or -)
  Fi – Destination register
  Fj, Fk – Source-register numbers
  Qj, Qk – Functional units producing source registers Fj, Fk
  Rj, Rk – Flags indicating when Fj, Fk are available and not yet read (alternatively: read and cached)
3. Register result status
• Indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
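One way to picture the three parts is as plain data structures. The sketch below is an assumption-laden illustration (class and field names are invented for this example); it fills in the state after the first load has issued in cycle 1 of the example.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Optional


@dataclass
class InstructionStatus:
    text: str
    issue: Optional[int] = None
    read_operands: Optional[int] = None
    execution_complete: Optional[int] = None
    write_result: Optional[int] = None


@dataclass
class FunctionalUnitStatus:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None
    fj: Optional[str] = None
    fk: Optional[str] = None
    qj: Optional[str] = None
    qk: Optional[str] = None
    rj: Optional[bool] = None
    rk: Optional[bool] = None


@dataclass
class Scoreboard:
    instructions: List[InstructionStatus]
    units: Dict[str, FunctionalUnitStatus]
    register_result: Dict[str, str] = field(default_factory=dict)


program = ["LD F6 34 R2", "LD F2 45 R3", "MULTD F0 F2 F4",
           "SUBD F8 F6 F2", "DIVD F10 F0 F6", "ADDD F6 F8 F2"]
unit_names = ("Integer", "Mult1", "Mult2", "Add", "Divide")
board = Scoreboard(
    instructions=[InstructionStatus(text) for text in program],
    units={name: FunctionalUnitStatus() for name in unit_names},
)

# State after cycle 1: the first load has issued to the integer unit.
board.instructions[0].issue = 1
board.units["Integer"] = FunctionalUnitStatus(
    busy=True, op="Load", fi="F6", fk="R2", rk=True)
board.register_result["F6"] = "Integer"

field_count = len(fields(FunctionalUnitStatus))
print(field_count, board.register_result)
```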
Scoreboard Example Cycle 1
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34   R2   1
LD     F2    45   R3
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2
Functional unit status (Clock = 1)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load   F6    -    R2   -         -         -    Yes
Mult1    No
Mult2    No
Add      No
Divide   No
Register result status
      F0     F2   F4  F6   F8   F10  F12  ...  F30
FU    -      -    -   Int  -    -    -         -
Note: R2 is not read (or cached) until cycle 2.
Scoreboard Example Cycle 2
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34   R2   1      2
LD     F2    45   R3
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2
Functional unit status (Clock = 2)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load   F6    -    R2   -         -         -    Yes
Mult1    No
Mult2    No
Add      No
Divide   No
Register result status
      F0     F2   F4  F6   F8   F10  F12  ...  F30
FU    -      -    -   Int  -    -    -         -
Question: issue the 2nd LD or MULTD?
Scoreboard Example Cycle 4
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34   R2   1      2              3                   4
LD     F2    45   R3
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2
Functional unit status (Clock = 4)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load   F6    -    R2   -         -         No   Yes
Mult1    No
Mult2    No
Add      No
Divide   No
Register result status
      F0     F2   F4  F6   F8   F10  F12  ...  F30
FU    -      -    -   Int  -    -    -         -
Scoreboard Example Cycle 5
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34   R2   1      2              3                   4
LD     F2    45   R3   5
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2
Functional unit status (Clock = 5)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load   F2    -    R3   -         -         -    Yes
Mult1    No
Mult2    No
Add      No
Divide   No
Register result status
      F0     F2   F4  F6   F8   F10  F12  ...  F30
FU    -      Int  -   -    -    -    -         -
Question (superscalar): issue MULTD in this cycle as well?
Scoreboard Example Cycle 6
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34   R2   1      2              3                   4
LD     F2    45   R3   5      6
MULTD  F0    F2   F4   6
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2
Functional unit status (Clock = 6)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load   F2    -    R3   -         -         -    Yes
Mult1    Yes   Mult   F0    F2   F4   Int       -         No   Yes
Mult2    No
Add      No
Divide   No
Register result status
      F0     F2   F4  F6   F8   F10  F12  ...  F30
FU    Mult1  Int  -   -    -    -    -         -
Scoreboard Example Cycle 7
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34   R2   1      2              3                   4
LD     F2    45   R3   5      6              7
MULTD  F0    F2   F4   6
SUBD   F8    F6   F2   7
DIVD   F10   F0   F6
ADDD   F6    F8   F2
Functional unit status (Clock = 7)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load   F2    -    R3   -         -         -    No
Mult1    Yes   Mult   F0    F2   F4   Int       -         No   Yes
Mult2    No
Add      Yes   Sub    F8    F6   F2   -         Int       Yes  No
Divide   No
Register result status
      F0     F2   F4  F6   F8   F10  F12  ...  F30
FU    Mult1  Int  -   -    Add  -    -         -
Question: read multiply operands? Note that DIVD could have been issued in this cycle.
Scoreboard Example Cycle 8a
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34   R2   1      2              3                   4
LD     F2    45   R3   5      6              7
MULTD  F0    F2   F4   6
SUBD   F8    F6   F2   7
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2
Functional unit status (Clock = 8)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load   F2    -    R3   -         -         -    Yes
Mult1    Yes   Mult   F0    F2   F4   Int       -         No   Yes
Mult2    No
Add      Yes   Sub    F8    F6   F2   -         Int       Yes  No
Divide   Yes   Div    F10   F0   F6   Mult1     -         No   Yes
Register result status
      F0     F2   F4  F6   F8   F10  F12  ...  F30
FU    Mult1  Int  -   -    Add  Div  -         -
Scoreboard Example Cycle 8b
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34   R2   1      2              3                   4
LD     F2    45   R3   5      6              7                   8
MULTD  F0    F2   F4   6
SUBD   F8    F6   F2   7
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2
Functional unit status (Clock = 8)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  No
Mult1    Yes   Mult   F0    F2   F4   -         -         Yes  Yes
Mult2    No
Add      Yes   Sub    F8    F6   F2   -         -         Yes  Yes
Divide   Yes   Div    F10   F0   F6   Mult1     -         No   Yes
Register result status
      F0     F2   F4  F6   F8   F10  F12  ...  F30
FU    Mult1  -    -   -    Add  Div  -         -
Scoreboard Example Cycle 9
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34   R2   1      2              3                   4
LD     F2    45   R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2
Functional unit status (Clock = 9)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  No
Mult1    Yes   Mult   F0    F2   F4   -         -         Yes  Yes
Mult2    No
Add      Yes   Sub    F8    F6   F2   -         -         Yes  Yes
Divide   Yes   Div    F10   F0   F6   Mult1     -         No   Yes
Register result status
      F0     F2   F4  F6   F8   F10  F12  ...  F30
FU    Mult1  -    -   -    Add  Div  -         -
Question: issue ADDD?
Scoreboard Example Cycle 11
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34   R2   1      2              3                   4
LD     F2    45   R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9              11
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2
Functional unit status (Clock = 11)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  No
Mult1    Yes   Mult   F0    F2   F4   -         -         Yes  Yes
Mult2    No
Add      Yes   Sub    F8    F6   F2   -         -         Yes  Yes
Divide   Yes   Div    F10   F0   F6   Mult1     -         No   Yes
Register result status
      F0     F2   F4  F6   F8   F10  F12  ...  F30
FU    Mult1  -    -   -    Add  Div  -         -
Scoreboard Example Cycle 12
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34   R2   1      2              3                   4
LD     F2    45   R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2
Functional unit status (Clock = 12)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  No
Mult1    Yes   Mult   F0    F2   F4   -         -         Yes  Yes
Mult2    No
Add      No
Divide   Yes   Div    F10   F0   F6   Mult1     -         No   Yes
Register result status
      F0     F2   F4  F6   F8   F10  F12  ...  F30
FU    Mult1  -    -   -    -    Div  -         -
Scoreboard Example Cycle 13
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34   R2   1      2              3                   4
LD     F2    45   R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2   13
Functional unit status (Clock = 13)
                      dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op     Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  No
Mult1    Yes   Mult   F0    F2   F4   -         -         Yes  Yes
Mult2    No
Add      Yes   Add    F6    F8   F2   -         -         Yes  Yes
Divide   Yes   Div    F10   F0   F6   Mult1     -         No   Yes
Register result status
      F0     F2   F4  F6   F8   F10  F12  ...  F30
FU    Mult1  -    -   Add  -    Div  -         -
Scoreboarding Summary
• Limitations of the CDC 6600 scoreboard
  - No forwarding hardware
  - Limited to instructions in a basic block (small window)
  - Small number of functional units (structural hazards), especially integer/load/store units
  - Do not issue if structural or WAW hazards exist
  - Wait for WAR hazards
  - Imprecise exceptions
• Key idea: allow instructions behind a stall to proceed
  - Decode => issue instructions and read operands
  - Enables out-of-order execution => out-of-order completion
• Modern-day improvements:
  - All operands are cached as soon as they are available
  - Forwarding
  - Pipelined functional units
  - Microcoding, e.g. IA32 (widens the execution window)
  - More precise exceptions
  - In-order retirement
• Works best with a large number of actual registers
• Tomasulo approach:
  - Reservation stations vs. forwarding and caching
  - Temporary registers act as many virtual registers
Tomasulo’s Approach
Hardware Schemes for ILP
• Key idea: allow instructions behind a stall to proceed
  - Decode => issue instructions and read operands
  - Enables out-of-order execution => out-of-order completion
• Why in hardware at run time?
  - Works when a dependence is not known at compile time
  - Simplifies the compiler
  - Allows code compiled for one machine to run well on another
• Out-of-order execution divides the ID stage:
  - Issue: decode instructions, check for structural hazards
  - Read operands: wait until no data hazards, then read operands
Tomasulo’s Algorithm
• For the IBM 360/91, about 3 years after the CDC 6600
• Goal: high performance without special compilers
• Differences between the IBM 360 and CDC 6600 ISAs
  - IBM has only 2 register specifiers per instruction vs. 3 in the CDC 6600
  - IBM has 4 FP registers vs. 8 in the CDC 6600
• Differences between Tomasulo’s algorithm and the scoreboard
  - Control and buffers (called “reservation stations”) are distributed with the functional units vs. centralized in the scoreboard
  - Registers in instructions are replaced by pointers to reservation station buffers
  - HW renaming of registers avoids WAR and WAW hazards
  - A common data bus (CDB) broadcasts results to the functional units
  - Loads and stores are treated as functional units as well
• Descendants: Alpha 21264, HP 8000, MIPS 10000, Pentium III, PowerPC 604, ...
Three Stages of Tomasulo Algorithm
1. Issue: get an instruction from the FP operation queue
If a reservation station is free, issue the instruction and send the operands (renames registers).
2. Execution: operate on operands (EX)
When both operands are ready, execute; if not ready, watch the common data bus for the result.
3. Write result: finish execution (WB)
Write on the common data bus to all awaiting units; mark the reservation station available.
Common data bus: data + source (“come from” bus)
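The three stages can be sketched as a toy model in Python. This is a simplified illustration under stated assumptions, not the IBM 360/91 design: Station, issue, ready and broadcast are names invented here, timing is ignored, the numeric values are made up, and Load2 is treated as an external producer that simply broadcasts a value.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Station:
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None
    vk: Optional[float] = None
    qj: Optional[str] = None
    qk: Optional[str] = None


def issue(name, op, dest, src_j, src_k, stations, regs, reg_status):
    """Stage 1: claim a free station and rename the source registers."""
    if stations[name].busy:
        return False  # structural hazard: no free reservation station
    st = Station(busy=True, op=op)
    st.qj = reg_status.get(src_j)
    st.qk = reg_status.get(src_k)
    st.vj = regs[src_j] if st.qj is None else None
    st.vk = regs[src_k] if st.qk is None else None
    stations[name] = st
    reg_status[dest] = name  # the destination now points at this station
    return True


def ready(st: Station) -> bool:
    """Stage 2 precondition: both operand values have been captured."""
    return st.busy and st.qj is None and st.qk is None


def broadcast(tag, value, stations, regs, reg_status):
    """Stage 3: put (value, source tag) on the CDB for every waiting consumer."""
    for st in stations.values():
        if st.qj == tag:
            st.vj, st.qj = value, None
        if st.qk == tag:
            st.vk, st.qk = value, None
    for reg in [r for r, producer in reg_status.items() if producer == tag]:
        regs[reg] = value
        del reg_status[reg]
    if tag in stations:
        stations[tag] = Station()  # mark the reservation station available


regs = {"F0": 0.0, "F2": 0.0, "F4": 2.5}
reg_status = {"F2": "Load2"}  # F2 is still being loaded
stations: Dict[str, Station] = {"Mult1": Station()}

issue("Mult1", "MULTD", "F0", "F2", "F4", stations, regs, reg_status)
waiting = not ready(stations["Mult1"])
broadcast("Load2", 4.0, stations, regs, reg_status)
can_execute = ready(stations["Mult1"])
product = stations["Mult1"].vj * stations["Mult1"].vk
broadcast("Mult1", product, stations, regs, reg_status)
print(waiting, can_execute, regs["F0"])
```

The tag carried with each broadcast is what the slide calls the “come from” part of the bus: consumers match on the producer's name rather than on a register number.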
Tomasulo Organization
(Figure: block diagram. Labels: From Instruction Unit, FP Op Queue, FP Registers, From Memory, Load Buffers, Store Buffers, To Memory, Operand Bus, Operation Bus, Reservation Stations (FP Add, FP Mul), FP Adders, FP Multipliers, Common data bus (CDB).)
Reservation Station Components
Op – Operation to perform in the unit (e.g., + or –)
Qj, Qk – Reservation stations producing the source operands
Vj, Vk – Values of the source operands
Rj, Rk – Flags indicating when Vj, Vk are ready
Busy – Indicates that the reservation station and its FU are busy
Register result status
• Indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
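To illustrate how the V and Q fields rename registers, here is a small sketch that captures the sources of two instructions from the example sequence. capture_sources and the symbolic R(...) values are assumptions made for this illustration; the starting register status (F0 pending from Mult1, F8 pending from Add1) matches the example at the point where DIVD issues.

```python
from typing import Dict, List, Optional, Tuple

Operand = Tuple[Optional[str], Optional[str]]  # (V: value, Q: producing station)


def capture_sources(sources: List[str], reg_status: Dict[str, str]) -> List[Operand]:
    """Return a (V, Q) pair per source: a value if ready, else the producer's tag."""
    captured = []
    for reg in sources:
        producer = reg_status.get(reg)
        if producer is None:
            captured.append((f"R({reg})", None))  # symbolic register contents
        else:
            captured.append((None, producer))
    return captured


reg_status = {"F0": "Mult1", "F8": "Add1"}

divd = capture_sources(["F0", "F6"], reg_status)  # DIVD F10, F0, F6 -> Mult2
reg_status["F10"] = "Mult2"
addd = capture_sources(["F8", "F2"], reg_status)  # ADDD F6, F8, F2 -> Add2
reg_status["F6"] = "Add2"
print(divd, addd)
```

Because DIVD already holds its own copy of F6 in Vk, the later ADDD can write F6 without waiting for DIVD to read it, which is how renaming removes the WAR stall seen in the scoreboard.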
Tomasulo Example Cycle 1
Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD     F6    34   R2   1
LD     F2    45   R3
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2
Reservation stations (Clock = 1)
                    S1        S2        RS for j  RS for k
Name   Busy  Op     Vj        Vk        Qj        Qk
Add1   No
Add2   No
Add3   No
Mult1  No
Mult2  No
Load buffers
Name   Busy  Address
Load1  Yes   34+R2
Load2  No
Load3  No
Register result status
      F0     F2        F4  F6        F8    F10    F12  ...  F30
FU    -      -         -   Load1     -     -      -         -
Tomasulo Example Cycle 2
Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD     F6    34   R2   1
LD     F2    45   R3   2
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2
Reservation stations (Clock = 2)
                    S1        S2        RS for j  RS for k
Name   Busy  Op     Vj        Vk        Qj        Qk
Add1   No
Add2   No
Add3   No
Mult1  No
Mult2  No
Load buffers
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No
Register result status
      F0     F2        F4  F6        F8    F10    F12  ...  F30
FU    -      Load2     -   Load1     -     -      -         -
Tomasulo Example Cycle 3
Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD     F6    34   R2   1      3
LD     F2    45   R3   2
MULTD  F0    F2   F4   3
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2
Reservation stations (Clock = 3)
                    S1        S2        RS for j  RS for k
Name   Busy  Op     Vj        Vk        Qj        Qk
Add1   No
Add2   No
Add3   No
Mult1  Yes   MULTD  -         R(F4)     Load2     -
Mult2  No
Load buffers
Name   Busy  Address
Load1  Yes   34+R2
Load2  Yes   45+R3
Load3  No
Register result status
      F0     F2        F4  F6        F8    F10    F12  ...  F30
FU    Mult1  Load2     -   Load1     -     -      -         -
Note: register names are renamed in the reservation stations.
Question: Load1 is completing; who is waiting for Load1?
Tomasulo Example Cycle 4
Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD     F6    34   R2   1      3                   4
LD     F2    45   R3   2      4
MULTD  F0    F2   F4   3
SUBD   F8    F6   F2   4
DIVD   F10   F0   F6
ADDD   F6    F8   F2
Reservation stations (Clock = 4)
                    S1        S2        RS for j  RS for k
Name   Busy  Op     Vj        Vk        Qj        Qk
Add1   Yes   SUBD   M(34+R2)  -         -         Load2
Add2   No
Add3   No
Mult1  Yes   MULTD  -         R(F4)     Load2     -
Mult2  No
Load buffers
Name   Busy  Address
Load1  No
Load2  Yes   45+R3
Load3  No
Register result status
      F0     F2        F4  F6        F8    F10    F12  ...  F30
FU    Mult1  Load2     -   M(34+R2)  Add1  -      -         -
Question: Load2 is completing; who is waiting for it?
Tomasulo Example Cycle 5
Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD     F6    34   R2   1      3                   4
LD     F2    45   R3   2      4                   5
MULTD  F0    F2   F4   3
SUBD   F8    F6   F2   4
DIVD   F10   F0   F6   5
ADDD   F6    F8   F2
Reservation stations (Clock = 5)
                    S1        S2        RS for j  RS for k
Name   Busy  Op     Vj        Vk        Qj        Qk
Add1   Yes   SUBD   M(34+R2)  M(45+R3)  -         -
Add2   No
Add3   No
Mult1  Yes   MULTD  M(45+R3)  R(F4)     -         -
Mult2  Yes   DIVD   -         M(34+R2)  Mult1     -
Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Register result status
      F0     F2        F4  F6        F8    F10    F12  ...  F30
FU    Mult1  M(45+R3)  -   -         Add1  Mult2  -         -
Tomasulo Example Cycle 6
Instruction status
Instruction  j    k    Issue  Execution complete  Write result
LD     F6    34   R2   1      3                   4
LD     F2    45   R3   2      4                   5
MULTD  F0    F2   F4   3
SUBD   F8    F6   F2   4
DIVD   F10   F0   F6   5
ADDD   F6    F8   F2   6
Reservation stations (Clock = 6)
                    S1        S2        RS for j  RS for k
Name   Busy  Op     Vj        Vk        Qj        Qk
Add1   Yes   SUBD   M(34+R2)  M(45+R3)  -         -
Add2   Yes   ADDD   -         M(45+R3)  Add1      -
Add3   No
Mult1  Yes   MULTD  M(45+R3)  R(F4)     -         -
Mult2  Yes   DIVD   -         M(34+R2)  Mult1     -
Load buffers
Name   Busy  Address
Load1  No
Load2  No
Load3  No
Register result status
      F0     F2        F4  F6        F8    F10    F12  ...  F30
FU    Mult1  -         -   Add2      Add1  Mult2  -         -
Tomasulo Summary
• Reservation stations: renaming to a larger set of registers + buffering of source operands
  - Prevents registers from becoming a bottleneck
  - Avoids the WAR and WAW hazards of the scoreboard
  - Allows loop unrolling in HW
• Not limited to basic blocks (integer units get ahead, beyond branches)
• Lasting contributions
  - Dynamic scheduling
  - Register renaming
  - Load/store disambiguation
• 360/91 descendants are Pentium III; PowerPC 604; MIPS R10000; HP-PA 8000; Alpha 21264
Tomasulo with Speculation
1. Issue – Requires an empty reservation station and an empty reorder buffer (ROB) slot. Send operands to the reservation station from the register file or from the ROB. This stage is often referred to as dispatch.
2. Execute – Monitor the CDB for operands, check RAW hazards. When both operands are available, execute.
3. Write Result – When the result is available, write it on the CDB through to the ROB and any waiting reservation stations. Stores write to the value field in the ROB.
4. Commit – Three cases:
• Normal commit: write registers, in-order commit
• Store: update memory
• Incorrect branch: flush the ROB and reservation stations, and restart execution at the correct PC
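The three commit cases can be sketched with a queue standing in for the ROB. This is a minimal model under assumptions made here: ROBEntry and commit are invented names, reservation stations and the PC redirect are not modeled, and a correctly predicted branch simply retires.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ROBEntry:
    kind: str                          # "alu", "store", or "branch"
    dest: Union[str, int, None]        # register name or memory address
    value: Optional[float] = None
    ready: bool = False                # result has been written to the ROB
    mispredicted: bool = False


def commit(rob: deque, regs: dict, memory: dict) -> str:
    """Retire ready entries from the head of the ROB, strictly in order."""
    while rob and rob[0].ready:
        entry = rob.popleft()
        if entry.kind == "branch":
            if entry.mispredicted:
                rob.clear()            # flush all younger, speculative work
                return "flush"
        elif entry.kind == "store":
            memory[entry.dest] = entry.value   # memory is updated only at commit
        else:
            regs[entry.dest] = entry.value     # normal commit writes the register
    return "ok"


regs, memory = {}, {}
rob = deque([
    ROBEntry("alu", "F0", 1.0, ready=True),
    ROBEntry("store", 100, 2.0, ready=True),
    ROBEntry("alu", "F2"),                     # still executing
    ROBEntry("alu", "F4", 3.0, ready=True),    # ready, but must wait its turn
])
outcome = commit(rob, regs, memory)

rob2 = deque([
    ROBEntry("branch", None, ready=True, mispredicted=True),
    ROBEntry("alu", "F6", 9.0, ready=True),
])
regs2: dict = {}
outcome2 = commit(rob2, regs2, {})
print(outcome, regs, memory, len(rob), outcome2, regs2)
```

In the first run the ready F4 result stays in the ROB behind the unfinished F2 entry, which shows the in-order commit; in the second run the mispredicted branch discards the speculative F6 result before it can reach the register file.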
Now, for the grand finale
Let’s compare!!!
(Shown again side by side for comparison: Figure A.51, the basic structure of a DLX processor with a scoreboard; the Tomasulo organization; Scoreboard Example Cycle 6; Tomasulo Example Cycle 6.)
Differences between Tomasulo’s Algorithm & Scoreboard
Tomasulo’s algorithm:
• Control & buffers (“reservation stations”) distributed with functional units
• Registers in instructions replaced by pointers to reservation station buffer
• HW renaming of registers to avoid WAR, WAW hazards
• Common data bus (CDB) broadcasts results to functional units
• Loads and stores treated as functional units as well
• Stages: Issue, Execution, Write result
Scoreboard:
• Control & buffers centralized
• Uses actual registers
• Does not issue if structural or WAW hazards exist
• Waits for WAR hazards
• Forwarding?
• Stages: Issue, Read operands, Execution, Write result