Scoreboarding

ENGS 116 Lecture 7
Scoreboarding
Vincent H. Berk
October 5, 2005
Reading for today: A7, A9-A11, article: Yeager
Reading for Friday: A.8, article: Smith&Pleszkun
Scoreboard Implications (hardware ILP)
• Out-of-order completion → WAR, WAW hazards?
• Solutions for WAR
– Queue both the operation and copies of its operands
– Read registers only during Read Operands stage
• For WAW, must detect hazard: stall until other write completes
• Need to have multiple instructions in execution phase → multiple
execution units or pipelined execution units
• Scoreboard keeps track of dependences, state, and operations
• Scoreboard replaces ID, EX, WB with 4 main stages
– The EX stage can be (sub-)pipelined
Figure A.51 The basic structure of a DLX processor with a scoreboard
(Diagram: the registers connect over data buses to two FP multipliers, an FP divider, an FP adder, and an integer unit; the scoreboard exchanges control/status signals with the registers and the functional units.)
Four Stages of Scoreboard Control: ISSUE
1. Issue: decode instructions & check for structural hazards
(ID1)
If a functional unit for the instruction is free and no other active
instruction has the same destination register (WAW), the scoreboard
issues the instruction to the functional unit and updates its internal data
structure. If a structural or WAW hazard exists, then the instruction
issue stalls, and no further instructions will issue until these hazards are
cleared.
Algorithm:
• Assure In-Order issue
• Multiple issues per cycle are allowed
• Check if Destination Register is already reserved for writing (WAW)
• Check if Read-Operand stage of Functional Unit is free (Structural)
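The two issue checks above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function `can_issue` and the dictionaries `fu_busy` and `result_status` are hypothetical names introduced here, and in-order issue is assumed to be enforced by the caller.

```python
def can_issue(unit, dest, fu_busy, result_status):
    """Return True when neither a structural nor a WAW hazard blocks issue.

    unit          -- functional unit the instruction needs
    dest          -- destination register of the instruction
    fu_busy       -- dict mapping unit name -> True while the unit is occupied
    result_status -- dict mapping register -> unit that will write it
    (All four names are illustrative, not taken from the lecture.)
    """
    structural_hazard = fu_busy.get(unit, False)
    waw_hazard = dest in result_status
    return not (structural_hazard or waw_hazard)


# State right after "LD F6, 34(R2)" has issued to the Integer unit.
fu_busy = {"Integer": True, "Mult1": False, "Mult2": False,
           "Add": False, "Divide": False}
result_status = {"F6": "Integer"}

second_load = can_issue("Integer", "F2", fu_busy, result_status)  # unit busy
write_f6 = can_issue("Add", "F6", fu_busy, result_status)         # WAW on F6
free_case = can_issue("Mult1", "F0", fu_busy, result_status)      # no hazard
print(second_load, write_f6, free_case)
```

The second load is held back by the busy Integer unit (structural hazard), and any instruction writing F6 is held back by the pending load (WAW hazard).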
Four Stages of Scoreboard Control:
READ-OPERANDS
2. Read operands: wait until no data hazards, then read operands (ID2) –
First Functional Pipeline Stage
A source operand is available if no earlier issued active instruction is going to
write it, or if the register containing the operand is being written by a currently
active functional unit. When the source operands are available, the scoreboard
tells the functional unit to proceed to read the operands from the registers and
begin execution. The scoreboard resolves RAW hazards dynamically in this
step, and instructions may be sent into execution out of order.
Algorithm:
• Wait for operands to become available, Register Result Status (RAW)
• Operand Caching is allowed
• Forwarding from another WB stage is allowed
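The RAW check in this stage can be sketched as follows. The class `Waiting` and the helpers `can_read_operands` and `result_written` are hypothetical names for this example; the sketch assumes that Qj/Qk were filled in at issue time and are cleared when the producing unit writes its result.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Waiting:
    """Source bookkeeping for one issued instruction (hypothetical layout)."""
    name: str
    qj: Optional[str] = None  # unit that will produce Fj; None = value is ready
    qk: Optional[str] = None  # unit that will produce Fk; None = value is ready


def can_read_operands(entry: Waiting) -> bool:
    """RAW check: both sources must have no pending producer."""
    return entry.qj is None and entry.qk is None


def result_written(entries: List[Waiting], unit: str) -> None:
    """Called when `unit` writes its result: release everyone waiting on it."""
    for entry in entries:
        if entry.qj == unit:
            entry.qj = None
        if entry.qk == unit:
            entry.qk = None


# MULTD F0, F2, F4 waits for the load of F2 (Integer unit);
# DIVD F10, F0, F6 waits for MULTD (Mult1).
multd = Waiting("MULTD", qj="Integer")
divd = Waiting("DIVD", qj="Mult1")
before = (can_read_operands(multd), can_read_operands(divd))
result_written([multd, divd], "Integer")
after = (can_read_operands(multd), can_read_operands(divd))
print(before, after)
```

After the load writes F2, MULTD may read its operands while DIVD keeps waiting for Mult1.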
Four Stages of Scoreboard Control
3. Execution: operate on operands (EX)
– The functional unit begins execution upon receiving operands.
When the result is ready, it notifies the scoreboard that it has
completed execution. This stage can be (sub-)pipelined.
4. Write result: finish execution (WB)
– Once the scoreboard is aware that the functional unit has
completed execution, the scoreboard checks for WAR hazards.
If none, it writes results. If WAR, it stalls the instruction.
Algorithm:
• Delay write until all Rj and Rk fields for this register are marked as
either cached or read.
– If caching of operands is done: forward answer right away.
– If not, wait until all operands are read before writing.
• Forward answers to units waiting for this write for their operand.
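The WAR check before write-back can be sketched like this. The dictionary layout and the function name `can_write_result` are assumptions for illustration; the flags Rj/Rk are taken to mean "operand available but not yet read", and operand caching is not modelled.

```python
def can_write_result(writer, dest, fu_status):
    """Return True when no other unit still has to read the old value of dest.

    fu_status maps unit name -> dict with keys "Fj", "Fk", "Rj", "Rk".
    Rj/Rk are True while the operand is available but has not been read yet.
    (The dictionary layout is an assumption made for this sketch.)
    """
    for unit, entry in fu_status.items():
        if unit == writer:
            continue
        if entry["Fj"] == dest and entry["Rj"]:
            return False
        if entry["Fk"] == dest and entry["Rk"]:
            return False
    return True


# ADDD F6, F8, F2 has finished executing and wants to write F6, while
# DIVD F10, F0, F6 is still waiting for F0 and has not read F6 yet.
fu_status = {
    "Add": {"Fj": "F8", "Fk": "F2", "Rj": False, "Rk": False},
    "Divide": {"Fj": "F0", "Fk": "F6", "Rj": False, "Rk": True},
}
write_ok_before = can_write_result("Add", "F6", fu_status)

# Once DIVD has read its operands, the flag is cleared and ADDD may write.
fu_status["Divide"]["Rk"] = False
write_ok_after = can_write_result("Add", "F6", fu_status)
print(write_ok_before, write_ok_after)
```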
Three Parts of the Scoreboard
1. Instruction status: Indicates which of 4 steps the instruction is in.
2. Functional unit status: Indicates the state of the functional unit (FU).
9 fields for each functional unit:
• Busy – Indicates whether the unit is busy or not
• Op – Operation to perform in the unit (e.g., + or -)
• Fi – Destination register
• Fj, Fk – Source-register numbers
• Qj, Qk – Functional units producing source registers Fj, Fk
• Rj, Rk – Flags indicating when Fj, Fk are available and not yet read (alternatively: read and cached)
3. Register result status: Indicates which functional unit will write each
register, if one exists. Blank when no pending instructions will write
that register.
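The three tables can be mirrored by a small data structure. The sketch below is a hypothetical layout, not the hardware organization: the class names `FUStatus` and `Scoreboard` and the method `issue` are invented for this example, and `issue` assumes the caller has already ruled out structural and WAW hazards.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FUStatus:
    """The nine per-unit fields listed above."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None
    fj: Optional[str] = None
    fk: Optional[str] = None
    qj: Optional[str] = None
    qk: Optional[str] = None
    rj: bool = False
    rk: bool = False


@dataclass
class Scoreboard:
    instruction_status: Dict[str, Dict[str, int]] = field(default_factory=dict)
    fu_status: Dict[str, FUStatus] = field(default_factory=dict)
    register_result: Dict[str, str] = field(default_factory=dict)

    def issue(self, label, unit, op, fi, fj, fk, cycle):
        """Record an issue; assumes the caller already ruled out hazards."""
        qj = self.register_result.get(fj) if fj else None
        qk = self.register_result.get(fk) if fk else None
        self.fu_status[unit] = FUStatus(
            busy=True, op=op, fi=fi, fj=fj, fk=fk, qj=qj, qk=qk,
            rj=fj is not None and qj is None,
            rk=fk is not None and qk is None,
        )
        self.register_result[fi] = unit
        self.instruction_status[label] = {"issue": cycle}


sb = Scoreboard()
sb.issue("LD F2, 45(R3)", "Integer", "Load", "F2", None, "R3", cycle=5)
sb.issue("MULTD F0, F2, F4", "Mult1", "Mult", "F0", "F2", "F4", cycle=6)
mult = sb.fu_status["Mult1"]
print(mult.qj, mult.rj, mult.rk, sb.register_result)
```

After these two issues, Mult1 records that F2 will come from the Integer unit (Qj set, Rj cleared), while F4 is already available (Rk set).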
Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6, 34(R2)    ✓      ✓              ✓                   ✓
LD    F2, 45(R3)    ✓      ✓              ✓
MULTD F0, F2, F4    ✓
SUBD  F8, F6, F2    ✓
DIVD  F10, F0, F6   ✓
ADDD  F6, F8, F2

Functional unit status
Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load  F2         R3                            Yes
Mult1    Yes   Mult  F0    F2   F4   Integer             No   Yes
Mult2    No
Add      Yes   Sub   F8    F6   F2             Integer   Yes  No
Divide   Yes   Div   F10   F0   F6   Mult1               No   Yes

Register result status
      F0     F2       F4   F6   F8   F10     F12  ...  F30
FU    Mult1  Integer            Add  Divide

FIGURE A.52 Components of the scoreboard
Scoreboard Example Cycle 1

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6, 34(R2)    1
LD    F2, 45(R3)
MULTD F0, F2, F4
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADDD  F6, F8, F2

Functional unit status
                     dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load  F6         R2                            Yes
Mult1    No
Mult2    No
Add      No
Divide   No

Register result status
Clock       F0     F2   F4   F6   F8   F10  F12  ...  F30
1      FU                    Int

R2 has not been read/cached until cycle 2!!!
Scoreboard Example Cycle 2

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6, 34(R2)    1      2
LD    F2, 45(R3)
MULTD F0, F2, F4
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADDD  F6, F8, F2

Functional unit status
                     dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load  F6         R2                            Yes
Mult1    No
Mult2    No
Add      No
Divide   No

Register result status
Clock       F0     F2   F4   F6   F8   F10  F12  ...  F30
2      FU                    Int

Issue 2nd LD or MULT?
Scoreboard Example Cycle 4

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6, 34(R2)    1      2              3                   4
LD    F2, 45(R3)
MULTD F0, F2, F4
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADDD  F6, F8, F2

Functional unit status
                     dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load  F6         R2                       No   Yes
Mult1    No
Mult2    No
Add      No
Divide   No

Register result status
Clock       F0     F2   F4   F6   F8   F10  F12  ...  F30
4      FU                    Int
Scoreboard Example Cycle 5

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6, 34(R2)    1      2              3                   4
LD    F2, 45(R3)    5
MULTD F0, F2, F4
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADDD  F6, F8, F2

Functional unit status
                     dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load  F2         R3                            Yes
Mult1    No
Mult2    No
Add      No
Divide   No

Register result status
Clock       F0     F2   F4   F6   F8   F10  F12  ...  F30
5      FU          Int

SUPERSCALAR: Issue MULTD?
Scoreboard Example Cycle 6

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6, 34(R2)    1      2              3                   4
LD    F2, 45(R3)    5      6
MULTD F0, F2, F4    6
SUBD  F8, F6, F2
DIVD  F10, F0, F6
ADDD  F6, F8, F2

Functional unit status
                     dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load  F2         R3                            Yes
Mult1    Yes   Mult  F0    F2   F4   Int                 No   Yes
Mult2    No
Add      No
Divide   No

Register result status
Clock       F0     F2   F4   F6   F8   F10  F12  ...  F30
6      FU   Mult1  Int
Scoreboard Example Cycle 7

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6, 34(R2)    1      2              3                   4
LD    F2, 45(R3)    5      6              7
MULTD F0, F2, F4    6
SUBD  F8, F6, F2    7
DIVD  F10, F0, F6
ADDD  F6, F8, F2

Functional unit status
                     dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load  F2         R3                            No
Mult1    Yes   Mult  F0    F2   F4   Int                 No   Yes
Mult2    No
Add      Yes   Sub   F8    F6   F2             Int       Yes  No
Divide   No

Register result status
Clock       F0     F2   F4   F6   F8   F10  F12  ...  F30
7      FU   Mult1  Int            Add

Read multiply operands? DIVD could have been issued on this cycle.
Scoreboard Example Cycle 8a

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6, 34(R2)    1      2              3                   4
LD    F2, 45(R3)    5      6              7
MULTD F0, F2, F4    6
SUBD  F8, F6, F2    7
DIVD  F10, F0, F6   8
ADDD  F6, F8, F2

Functional unit status
                     dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  Yes   Load  F2         R3                            Yes
Mult1    Yes   Mult  F0    F2   F4   Int                 No   Yes
Mult2    No
Add      Yes   Sub   F8    F6   F2             Int       Yes  No
Divide   Yes   Div   F10   F0   F6   Mult1               No   Yes

Register result status
Clock       F0     F2   F4   F6   F8   F10  F12  ...  F30
8      FU   Mult1  Int            Add  Div
Scoreboard Example Cycle 8b

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6, 34(R2)    1      2              3                   4
LD    F2, 45(R3)    5      6              7                   8
MULTD F0, F2, F4    6
SUBD  F8, F6, F2    7
DIVD  F10, F0, F6   8
ADDD  F6, F8, F2

Functional unit status
                     dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  No
Mult1    Yes   Mult  F0    F2   F4                       Yes  Yes
Mult2    No
Add      Yes   Sub   F8    F6   F2                       Yes  Yes
Divide   Yes   Div   F10   F0   F6   Mult1               No   Yes

Register result status
Clock       F0     F2   F4   F6   F8   F10  F12  ...  F30
8      FU   Mult1                 Add  Div
Scoreboard Example Cycle 9

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6, 34(R2)    1      2              3                   4
LD    F2, 45(R3)    5      6              7                   8
MULTD F0, F2, F4    6      9
SUBD  F8, F6, F2    7      9
DIVD  F10, F0, F6   8
ADDD  F6, F8, F2

Functional unit status
                     dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  No
Mult1    Yes   Mult  F0    F2   F4                       Yes  Yes
Mult2    No
Add      Yes   Sub   F8    F6   F2                       Yes  Yes
Divide   Yes   Div   F10   F0   F6   Mult1               No   Yes

Register result status
Clock       F0     F2   F4   F6   F8   F10  F12  ...  F30
9      FU   Mult1                 Add  Div

Issue ADDD?
Scoreboard Example Cycle 11

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6, 34(R2)    1      2              3                   4
LD    F2, 45(R3)    5      6              7                   8
MULTD F0, F2, F4    6      9
SUBD  F8, F6, F2    7      9              11
DIVD  F10, F0, F6   8
ADDD  F6, F8, F2

Functional unit status
                     dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  No
Mult1    Yes   Mult  F0    F2   F4                       Yes  Yes
Mult2    No
Add      Yes   Sub   F8    F6   F2                       Yes  Yes
Divide   Yes   Div   F10   F0   F6   Mult1               No   Yes

Register result status
Clock       F0     F2   F4   F6   F8   F10  F12  ...  F30
11     FU   Mult1                 Add  Div
Scoreboard Example Cycle 12

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6, 34(R2)    1      2              3                   4
LD    F2, 45(R3)    5      6              7                   8
MULTD F0, F2, F4    6      9
SUBD  F8, F6, F2    7      9              11                  12
DIVD  F10, F0, F6   8
ADDD  F6, F8, F2

Functional unit status
                     dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  No
Mult1    Yes   Mult  F0    F2   F4                       Yes  Yes
Mult2    No
Add      No
Divide   Yes   Div   F10   F0   F6   Mult1               No   Yes

Register result status
Clock       F0     F2   F4   F6   F8   F10  F12  ...  F30
12     FU   Mult1                      Div
Scoreboard Example Cycle 13

Instruction status
Instruction         Issue  Read operands  Execution complete  Write result
LD    F6, 34(R2)    1      2              3                   4
LD    F2, 45(R3)    5      6              7                   8
MULTD F0, F2, F4    6      9
SUBD  F8, F6, F2    7      9              11                  12
DIVD  F10, F0, F6   8
ADDD  F6, F8, F2    13

Functional unit status
                     dest  S1   S2   FU for j  FU for k  Fj?  Fk?
Name     Busy  Op    Fi    Fj   Fk   Qj        Qk        Rj   Rk
Integer  No
Mult1    Yes   Mult  F0    F2   F4                       Yes  Yes
Mult2    No
Add      Yes   Add   F6    F8   F2                       Yes  Yes
Divide   Yes   Div   F10   F0   F6   Mult1               No   Yes

Register result status
Clock       F0     F2   F4   F6   F8   F10  F12  ...  F30
13     FU   Mult1            Add       Div
Pipelining Functional Units
• Would add multiple ‘virtual’ FUs to scoreboard
• Lower hardware cost than multiple actual units
• Inherently avoids WAW
• Bubbles are inserted at Issue and Read-Op
• Works best with more actual registers
• Consider the example from the book:
– Mult 1&2 are a two-stage pipeline
Superscalar: Multiple Issues per cycle
• Very tedious
• ENSURE in order issue to avoid hazards
– Issuing hardware has to have ‘look-ahead’ hardware
• Works best with multiple internally pipelined FUs
• Consider:
    DIV.D  F0, F2, F4
    ADD.D  F10, F0, F8
    SUB.D  F8, F8, F14
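To see which hazards the look-ahead hardware would have to catch in this three-instruction sequence, the dependences can be enumerated with a short script. The function `find_hazards` and the tuple encoding of the program are hypothetical; the check is a simple pairwise comparison that ignores intervening writes, which is enough for this short example.

```python
def find_hazards(program):
    """List (kind, earlier, later, register) for every ordered instruction pair.

    program is a list of (name, destination, sources) tuples in program order.
    """
    found = []
    for i, (first, dest_first, srcs_first) in enumerate(program):
        for second, dest_second, srcs_second in program[i + 1:]:
            if dest_first in srcs_second:
                found.append(("RAW", first, second, dest_first))
            if dest_second in srcs_first:
                found.append(("WAR", first, second, dest_second))
            if dest_first == dest_second:
                found.append(("WAW", first, second, dest_first))
    return found


program = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F10", ("F0", "F8")),
    ("SUB.D", "F8", ("F8", "F14")),
]
hazards = find_hazards(program)
for hazard in hazards:
    print(hazard)
```

ADD.D must wait for the result of DIV.D in F0 (RAW), and SUB.D must not overwrite F8 before the stalled ADD.D has read it (WAR).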
Exceptions
• Imprecise due to out-of-order execution
• Improved by keeping track of recently executed instructions:
– Instructions are ‘retired’ in order
– Synchronous exceptions raised at retirement
– Operating system responsible for recovery
• Non-fatal Asynchronous exceptions let pipeline and scoreboard run
empty before context switch.
• Trap/INT instructions (system calls) require context switch.
• On context switch, pipeline and scoreboard are run empty.
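In-order retirement can be illustrated with a small model. This is a sketch under assumptions, not a description of any particular processor: the function `retire_in_order` and its arguments are hypothetical, and an exception is modelled simply as membership in a set of faulting instructions.

```python
from collections import deque


def retire_in_order(program_order, completion_order, faulting=()):
    """Retire instructions in program order as their results become available.

    Returns (retired, fault): the instructions retired so far and the
    instruction whose synchronous exception was raised at retirement, if any.
    """
    pending = deque(program_order)
    done = set()
    retired = []
    for instr in completion_order:
        done.add(instr)
        # Only the oldest unretired instruction may retire.
        while pending and pending[0] in done:
            head = pending.popleft()
            if head in faulting:
                return retired, head
            retired.append(head)
    return retired, None


program_order = ["DIV.D", "ADD.D", "SUB.D"]
completion_order = ["SUB.D", "ADD.D", "DIV.D"]  # out-of-order completion
retired, fault = retire_in_order(program_order, completion_order,
                                 faulting={"ADD.D"})
print(retired, fault)
```

Although SUB.D finished first, it has not retired when the exception of ADD.D is raised, so the state seen by the handler reflects only instructions that precede the faulting one.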
Scoreboarding Summary
• Limitations of CDC 6600 scoreboard
– No forwarding hardware
– Limited to instructions in basic block (small window)
– Small number of functional units (structural hazards), especially
integer/load/store units
– Do not issue if structural or WAW hazards
– Wait for WAR hazards
– Imprecise exceptions
• Key idea: Allow instructions behind stall to proceed
– Decode → issue instructions and read operands
– Enables out-of-order execution → out-of-order completion
Scoreboarding Summary
• Modern Day Improvements:
– All operands are cached as soon as available
– Forwarding
– Pipelining Functional Units
– Microcoding, e.g., IA32 (widens execution window)
– More precise exceptions
– In order retirement
– Works best with tons of actual registers
• Tomasulo approach:
– Reservation stations vs. Forwarding and Caching
– Temporary Registers work as many virtual registers