CS252 Graduate Computer Architecture Lecture 7 ILP 1:
CS252
Graduate Computer Architecture
Lecture 7
ILP 1:
Loop-Level parallelism extraction,
Data Flow, Explicit Register Renaming.
September 22nd, 2003
Prof. John Kubiatowicz
http://www.cs.berkeley.edu/~kubitron/courses/cs252-F03
9/22/03
CS252/Kubiatowicz
Lec 7.1
Review:
Dynamic hardware techniques for
out-of-order execution
• HW exploitation of ILP
– Works when can’t know dependence at compile time.
– Code for one machine runs well on another
• Scoreboard (ala CDC 6600 in 1963)
– Centralized control structure
– No register renaming, no forwarding
– Pipeline stalls for WAR and WAW hazards.
– Are these fundamental limitations??? (No)
• Reservation stations (ala IBM 360/91 in 1966)
– Distributed control structures
– Implicit renaming of registers (dispatched pointers)
– WAR and WAW hazards eliminated by register renaming
– Results broadcast to all reservation stations for RAW
Review: Scoreboard Architecture
(CDC 6600)
[Figure: Registers connected to the Functional Units (FP Mult, FP Mult, FP Divide, FP Add, Integer) and to Memory, all under control of the SCOREBOARD]
Review: Four Stages of
Scoreboard Control
• Issue—decode instructions & check for structural hazards (ID1)
– Instructions issued in program order (for hazard checking)
– Don’t issue if structural hazard
– Don’t issue if instruction is output dependent on any previously issued but
uncompleted instruction (no WAW hazards)
• Read operands—wait until no data hazards, then read ops (ID2)
– All real dependencies (RAW hazards) resolved in this stage, since we wait for
instructions to write back data.
– No forwarding of data in this model!
• Execution—operate on operands (EX)
– The functional unit begins execution upon receiving operands. When the result is
ready, it notifies the scoreboard that it has completed execution.
• Write result—finish execution (WB)
– Stall until no WAR hazards with previous instructions:
Example:
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F8,F8,F14
CDC 6600 scoreboard would stall SUBD until ADDD reads operands
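The hazard check behind this example can be sketched in a few lines of Python. This is only an illustration: the `Instr` record and the `hazards` function are hypothetical names, not part of the CDC 6600 design or of the lecture.

```python
from typing import NamedTuple, Set, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def hazards(earlier: Instr, later: Instr) -> Set[str]:
    """Classify the register hazards `later` has with respect to `earlier`."""
    found = set()
    if earlier.dest in later.srcs:
        found.add("RAW")   # later reads what earlier writes
    if later.dest in earlier.srcs:
        found.add("WAR")   # later overwrites what earlier still has to read
    if later.dest == earlier.dest:
        found.add("WAW")   # both write the same register
    return found


divd = Instr("DIVD", "F0", ("F2", "F4"))
addd = Instr("ADDD", "F10", ("F0", "F8"))
subd = Instr("SUBD", "F8", ("F8", "F14"))

print(hazards(divd, addd))  # ADDD needs F0 from DIVD
print(hazards(addd, subd))  # SUBD writes F8, which ADDD still has to read
```

The second call is the case on the slide: SUBD cannot write F8 until ADDD has read it, so a scoreboard without renaming has to hold SUBD in its write-result stage.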
Review: Tomasulo Organization
[Figure: the FP Op Queue and FP Registers feed the Reservation Stations (Add1, Add2, Add3 in front of the FP adders; Mult1, Mult2 in front of the FP multipliers). Load Buffers (Load1 to Load6) bring data From Mem; Store Buffers send data To Mem. All results return on the Common Data Bus (CDB).]
Review: Three Stages of
Tomasulo Algorithm
1. Issue—get instruction from FP Op Queue
If reservation station free (no structural hazard),
control issues instr & sends operands (renames registers).
2. Execution—operate on operands (EX)
When both operands ready then execute;
if not ready, watch Common Data Bus for result
3. Write result—finish execution (WB)
Write on Common Data Bus to all awaiting units;
mark reservation station available
• Normal data bus: data + destination (“go to” bus)
• Common data bus: data + source (“come from” bus)
– 64 bits of data + 4 bits of Functional Unit source address
– Write if matches expected Functional Unit (produces result)
– Does the broadcast
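A minimal Python sketch of the “come from” broadcast follows. The class layout and the names `ReservationStation` and `broadcast` are assumptions made for illustration; the field names Vj, Vk, Qj, Qk follow the slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReservationStation:
    name: str
    vj: Optional[float] = None  # operand values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None    # tag of the unit that will produce the operand
    qk: Optional[str] = None

    def ready(self) -> bool:
        # An instruction may execute once it waits on no producer.
        return self.qj is None and self.qk is None


def broadcast(stations: List[ReservationStation], source: str, value: float) -> None:
    """CDB write: every station compares the source tag with its pending tags."""
    for rs in stations:
        if rs.qj == source:
            rs.vj, rs.qj = value, None
        if rs.qk == source:
            rs.vk, rs.qk = value, None


mult1 = ReservationStation("Mult1", vk=2.0, qj="Load1")
add1 = ReservationStation("Add1", vj=1.0, qk="Mult1")
stations = [mult1, add1]

broadcast(stations, "Load1", 5.0)   # Load1 finishes: Mult1 captures the value
print(mult1.ready(), add1.ready())
```

Because the bus carries the name of the producer rather than a destination register, any number of waiting stations can pick up the same result in one broadcast.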
Review: Comparison Cycle 62
Instruction status (scoreboard):
Instruction   j    k    Issue  Read Oper  Exec Comp  Write Result
LD    F6      34+  R2   1      2          3          4
LD    F2      45+  R3   5      6          7          8
MULTD F0      F2   F4   6      9          19         20
SUBD  F8      F6   F2   7      9          11         12
DIVD  F10     F0   F6   8      21         61         62
ADDD  F6      F8   F2   13     14         16         22
Instruction status (Tomasulo):
Instruction   j    k    Issue  Exec Comp  Write Result
LD    F6      34+  R2   1      3          4
LD    F2      45+  R3   2      4          5
MULTD F0      F2   F4   3      15         16
SUBD  F8      F6   F2   4      7          8
DIVD  F10     F0   F6   5      56         57
ADDD  F6      F8   F2   6      10         11
• Why take longer on scoreboard/6600?
• Structural Hazards
• Lack of forwarding
Tomasulo Loop Example
Loop: LD    F0,0(R1)
      MULTD F4,F0,F2
      SD    F4,0(R1)
      SUBI  R1,R1,#8
      BNEZ  R1,Loop
• Assume Multiply takes 4 clocks
• Assume first load takes 8 clocks (cache miss), second
load takes 1 clock (hit)
• To be clear, will show clocks for SUBI, BNEZ
• Reality: integer instructions run ahead of the floating-point instructions
Loop Example
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
Reservation Stations:
Time
Name Busy
Add1
No
Add2
No
Add3
No
Mult1 No
Mult2 No
Op
Vj
Exec Write
Issue CompResult
S1
Vk
S2
Qj
RS
Qk
Busy Addr
Load1
Load2
Load3
Store1
Store2
Store3
No
No
No
No
No
No
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
Fu
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
Register result status
Clock
0
F0
R1
80
F2
F4
F6
F8
F10 F12
Fu
Loop Example Cycle 1
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
Reservation Stations:
Time
Name Busy
Add1
No
Add2
No
Add3
No
Mult1 No
Mult2 No
Op
Vj
Exec Write
Issue CompResult
1
S1
Vk
S2
Qj
RS
Qk
Busy Addr
Fu
Load1
Load2
Load3
Store1
Store2
Store3
Yes
No
No
No
No
No
80
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
Register result status
Clock
1
R1
80
F0
F2
F4
F6
F8
F10 F12
Fu Load1
Loop Example Cycle 2
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
Reservation Stations:
Time
Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 No
Vj
Exec Write
Issue CompResult
1
2
S1
Vk
S2
Qj
RS
Qk
R(F4) Load1
Busy Addr
Fu
Load1
Load2
Load3
Store1
Store2
Store3
Yes
No
No
No
No
No
80
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
Register result status
Clock
2
R1
80
F0
Fu Load1
F2
F4
F6
F8
F10 F12
Mult1
Loop Example Cycle 3
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
Reservation Stations:
Time
Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 No
Vj
Exec Write
Issue CompResult
1
2
3
S1
Vk
S2
Qj
RS
Qk
R(F4) Load1
Busy Addr
Fu
Load1
Load2
Load3
Store1
Store2
Store3
Yes
No
No
Yes
No
No
80
80
Mult1
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
Register result status
Clock
3
R1
80
F0
Fu Load1
F2
F4
F6
F8
F10 F12
Mult1
• Implicit renaming sets up “DataFlow” graph
Loop Example Cycle 4
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
Reservation Stations:
Time
Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 No
Vj
Exec Write
Issue CompResult
1
2
3
S1
Vk
S2
Qj
RS
Qk
R(F4) Load1
Busy Addr
Fu
Load1
Load2
Load3
Store1
Store2
Store3
Yes
No
No
Yes
No
No
80
80
Mult1
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
Register result status
Clock
4
R1
80
F0
Fu Load1
F2
F4
F6
F8
Mult1
• Dispatching SUBI Instruction
F10 F12
Loop Example Cycle 5
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
Reservation Stations:
Time
Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 No
Vj
Exec Write
Issue CompResult
1
2
3
S1
Vk
S2
Qj
RS
Qk
R(F4) Load1
Busy Addr
Fu
Load1
Load2
Load3
Store1
Store2
Store3
Yes
No
No
Yes
No
No
80
80
Mult1
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
Register result status
Clock
5
R1
72
F0
Fu Load1
F2
F4
F8
F10 F12
Mult1
• And, BNEZ instruction
F6
Loop Example Cycle 6
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
Reservation Stations:
Time
Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 No
Vj
Exec Write
Issue CompResult
1
2
3
6
S1
Vk
S2
Qj
RS
Qk
R(F4) Load1
Busy Addr
Fu
Load1
Load2
Load3
Store1
Store2
Store3
Yes
Yes
No
Yes
No
No
80
72
80
Mult1
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
Register result status
Clock
6
R1
72
F0
Fu Load2
F2
F4
F6
F8
F10 F12
Mult1
• Notice that F0 never sees Load from location 80
Loop Example Cycle 7
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
Reservation Stations:
Time
Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 Yes Multd
Vj
Exec Write
Issue CompResult
1
2
3
6
7
S1
Vk
S2
Qj
RS
Qk
R(F2) Load1
R(F2) Load2
Busy Addr
Fu
Load1
Load2
Load3
Store1
Store2
Store3
Yes
Yes
No
Yes
No
No
80
72
80
Mult1
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
Register result status
Clock
7
R1
72
F0
Fu Load2
F2
F4
F6
F8
F10 F12
Mult2
• Register file completely detached from computation
• First and Second iteration completely overlapped
Loop Example Cycle 8
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
1
2
3
6
7
8
Vj
S1
Vk
Reservation Stations:
Time
Exec Write
Issue CompResult
Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 Yes Multd
S2
Qj
RS
Qk
R(F2) Load1
R(F2) Load2
Busy Addr
Fu
Load1
Load2
Load3
Store1
Store2
Store3
Yes
Yes
No
Yes
Yes
No
80
72
80
72
Mult1
Mult2
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
Register result status
Clock
8
R1
72
F0
Fu Load2
F2
F4
F6
F8
F10 F12
Mult2
Loop Example Cycle 9
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
1
2
3
6
7
8
9
Vj
S1
Vk
S2
Qj
Reservation Stations:
Time
Exec Write
Issue CompResult
Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 Yes Multd
RS
Qk
R(F2) Load1
R(F2) Load2
Busy Addr
Fu
Load1
Load2
Load3
Store1
Store2
Store3
Yes
Yes
No
Yes
Yes
No
80
72
80
72
Mult1
Mult2
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
Register result status
Clock
9
R1
72
F0
Fu Load2
F2
F4
F6
F8
F10 F12
Mult2
• Load1 completing: who is waiting?
• Note: Dispatching SUBI
Loop Example Cycle 10
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
Reservation Stations:
Time
4
Exec Write
Issue CompResult
1
2
3
6
7
8
S1
Vk
9
10
10
S2
Qj
Name Busy Op
Vj
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd M[80] R(F2)
Mult2 Yes Multd
R(F2) Load2
RS
Qk
Busy Addr
Load1
Load2
Load3
Store1
Store2
Store3
No
Yes
No
Yes
Yes
No
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
Fu
72
80
72
Mult1
Mult2
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
Register result status
Clock
10
R1
64
F0
Fu Load2
F2
F4
F6
F8
F10 F12
Mult2
• Load2 completing: who is waiting?
• Note: Dispatching BNEZ
Loop Example Cycle 11
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
Reservation Stations:
Time
3
4
Exec Write
Issue CompResult
1
2
3
6
7
8
S1
Vk
Name Busy Op
Vj
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd M[80] R(F2)
Mult2 Yes Multd M[72] R(F2)
9
10
10
11
S2
Qj
RS
Qk
Busy Addr
Load1
Load2
Load3
Store1
Store2
Store3
No
No
Yes
Yes
Yes
No
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
64
80
72
Fu
Mult1
Mult2
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
Register result status
Clock
11
R1
64
F0
Fu Load3
F2
F4
F6
F8
F10 F12
Mult2
• Next load in sequence
Loop Example Cycle 12
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
Reservation Stations:
Time
2
3
Exec Write
Issue CompResult
1
2
3
6
7
8
S1
Vk
Name Busy Op
Vj
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd M[80] R(F2)
Mult2 Yes Multd M[72] R(F2)
9
10
10
11
S2
Qj
RS
Qk
Busy Addr
Load1
Load2
Load3
Store1
Store2
Store3
No
No
Yes
Yes
Yes
No
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
64
80
72
Fu
Mult1
Mult2
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
Register result status
Clock
12
R1
64
F0
Fu Load3
F2
F4
F6
F8
F10 F12
Mult2
• Why not issue third multiply?
Loop Example Cycle 13
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
Reservation Stations:
Time
1
2
Exec Write
Issue CompResult
1
2
3
6
7
8
S1
Vk
Name Busy Op
Vj
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd M[80] R(F2)
Mult2 Yes Multd M[72] R(F2)
9
10
10
11
S2
Qj
RS
Qk
Busy Addr
Load1
Load2
Load3
Store1
Store2
Store3
No
No
Yes
Yes
Yes
No
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
64
80
72
Fu
Mult1
Mult2
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
Register result status
Clock
13
R1
64
F0
Fu Load3
F2
F4
F6
F8
F10 F12
Mult2
Loop Example Cycle 14
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
Reservation Stations:
Time
0
1
Exec Write
Issue CompResult
1
2
3
6
7
8
9
14
10
11
S1
Vk
S2
Qj
RS
Qk
Name Busy Op
Vj
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd M[80] R(F2)
Mult2 Yes Multd M[72] R(F2)
10
Busy Addr
Load1
Load2
Load3
Store1
Store2
Store3
No
No
Yes
Yes
Yes
No
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
64
80
72
Fu
Mult1
Mult2
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
Register result status
Clock
14
R1
64
F0
Fu Load3
F2
F4
F6
F8
F10 F12
Mult2
• Mult1 completing. Who is waiting?
Loop Example Cycle 15
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
Reservation Stations:
Time
0
Exec Write
Issue CompResult
1
2
3
6
7
8
9
14
10
15
11
S1
Vk
S2
Qj
RS
Qk
Name Busy Op
Vj
Add1
No
Add2
No
Add3
No
Mult1 No
Mult2 Yes Multd M[72] R(F2)
10
15
Busy Addr
Load1
Load2
Load3
Store1
Store2
Store3
No
No
Yes
Yes
Yes
No
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
64
80
72
Fu
[80]*R2
Mult2
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
Register result status
Clock
15
R1
64
F0
Fu Load3
F2
F4
F6
F8
F10 F12
Mult2
• Mult2 completing. Who is waiting?
Loop Example Cycle 16
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
1
2
3
6
7
8
9
14
10
15
11
16
Vj
S1
Vk
S2
Qj
RS
Qk
Reservation Stations:
Time
Exec Write
Issue CompResult
Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 No
10
15
R(F2) Load3
Busy Addr
Load1
Load2
Load3
Store1
Store2
Store3
No
No
Yes
Yes
Yes
No
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
64
80
72
Fu
[80]*R2
[72]*R2
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
Register result status
Clock
16
R1
64
F0
Fu Load3
F2
F4
F6
F8
F10 F12
Mult1
Loop Example Cycle 17
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
1
2
3
6
7
8
9
14
10
15
11
16
Vj
S1
Vk
S2
Qj
RS
Qk
Reservation Stations:
Time
Exec Write
Issue CompResult
Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 No
10
15
R(F2) Load3
Busy Addr
Fu
Load1
Load2
Load3
Store1
Store2
Store3
No
No
Yes
Yes
Yes
Yes
64
80
72
64
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
[80]*R2
[72]*R2
Mult1
Register result status
Clock
17
R1
64
F0
Fu Load3
F2
F4
F6
F8
F10 F12
Mult1
Loop Example Cycle 18
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
1
2
3
6
7
8
9
14
18
10
15
10
15
Vj
S1
Vk
S2
Qj
RS
Qk
Reservation Stations:
Time
Exec Write
Issue CompResult
Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 No
11
16
R(F2) Load3
Busy Addr
Fu
Load1
Load2
Load3
Store1
Store2
Store3
No
No
Yes
Yes
Yes
Yes
64
80
72
64
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
[80]*R2
[72]*R2
Mult1
Register result status
Clock
18
R1
64
F0
Fu Load3
F2
F4
F6
F8
F10 F12
Mult1
Loop Example Cycle 19
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
1
2
3
6
7
8
9
14
18
10
15
19
10
15
19
11
16
Vj
S1
Vk
S2
Qj
RS
Qk
Reservation Stations:
Time
Exec Write
Issue CompResult
Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 No
R(F2) Load3
Busy Addr
Load1
Load2
Load3
Store1
Store2
Store3
No
No
Yes
No
Yes
Yes
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
Fu
64
72
64
[72]*R2
Mult1
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
Register result status
Clock
19
R1
64
F0
Fu Load3
F2
F4
F6
F8
F10 F12
Mult1
Loop Example Cycle 20
Instruction status:
ITER Instruction
1
1
1
2
2
2
LD
MULTD
SD
LD
MULTD
SD
F0
F4
F4
F0
F4
F4
j
k
0
F0
0
0
F0
0
R1
F2
R1
R1
F2
R1
1
2
3
6
7
8
9
14
18
10
15
19
10
15
19
11
16
20
Vj
S1
Vk
S2
Qj
RS
Qk
Reservation Stations:
Time
Exec Write
Issue CompResult
Name Busy Op
Add1
No
Add2
No
Add3
No
Mult1 Yes Multd
Mult2 No
R(F2) Load3
Busy Addr
Load1
Load2
Load3
Store1
Store2
Store3
No
No
Yes
No
No
Yes
Code:
LD
MULTD
SD
SUBI
BNEZ
F0
F4
F4
R1
R1
Fu
64
64
Mult1
0
F0
0
R1
Loop
R1
F2
R1
#8
...
F30
Register result status
Clock
20
R1
64
F0
Fu Load3
F2
F4
F6
F8
F10 F12
Mult1
Why can Tomasulo overlap
iterations of loops?
• Register renaming
– Multiple iterations use different physical destinations for registers
(dynamic loop unrolling).
• Reservation stations
– Permit instruction issue to advance past integer control flow
operations
• Another view: Tomasulo builds a dynamic “DataFlow”
graph from the instructions.
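The implicit renaming that lets the two iterations overlap can be sketched as follows. This is a simplified model under stated assumptions: `issue` and `reg_status` are hypothetical names, and only the register result status table is modeled, not timing or the stations themselves.

```python
def issue(reg_status, station, dest, srcs):
    """Issue one instruction: capture source tags first, then claim the destination."""
    waiting_on = {s: reg_status[s] for s in srcs if s in reg_status}
    if dest is not None:
        reg_status[dest] = station  # later readers of dest now wait on this station
    return waiting_on


reg_status = {}   # register -> station that will produce its next value
deps = []
for load, mult in [("Load1", "Mult1"), ("Load2", "Mult2")]:
    issue(reg_status, load, "F0", [])                         # LD    F0,0(R1)
    deps.append(issue(reg_status, mult, "F4", ["F0", "F2"]))  # MULTD F4,F0,F2

print(deps)        # each multiply waits on the load of its own iteration
print(reg_status)  # F0 and F4 point at the second iteration's stations
```

Both multiplies name the same architectural register F0, yet each one holds the tag of a different load buffer, which is the sense in which the hardware unrolls the loop dynamically.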
Administrivia
• Paper mailing list is:
[email protected]
• Prereq exams:
– They are available with my administrative assistant (really!)
– If you got “X” or “-” please read solutions!!!
Data-Flow Architectures
• Basic Idea: Hardware represents direct encoding of compiler
dataflow graphs:
Input: a,b
y:= (a+b)/x
x:= (a*(a+b))+b
output: y,x
• Data flows along arcs in
“Tokens”.
• When two tokens arrive at
compute box, box “fires” and
produces new token.
• Split operations produce copies
of tokens
[Figure: dataflow graph for the program above. Inputs A and B enter at the top and pass through +, *, / and + boxes to the outputs Y and X; X(0) marks the initial token for X.]
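The firing rule can be illustrated with a small Python sketch of the graph for `y := (a+b)/x` and `x := (a*(a+b))+b`. It is a simplification under stated assumptions: tokens are kept in a dictionary and never consumed (standing in for the split operations), and the arc names `sum`, `prod` and `x_prev` are invented for the example.

```python
import operator

# Each node: (operation, left input arc, right input arc, output arc)
NODES = [
    (operator.add, "a", "b", "sum"),
    (operator.mul, "a", "sum", "prod"),
    (operator.truediv, "sum", "x_prev", "y"),
    (operator.add, "prod", "b", "x"),
]


def run(tokens):
    """Fire any node whose two input tokens are present until nothing can fire."""
    tokens = dict(tokens)
    pending = list(NODES)
    fired = True
    while fired:
        fired = False
        for node in list(pending):
            op, left, right, out = node
            if left in tokens and right in tokens:
                tokens[out] = op(tokens[left], tokens[right])  # the box fires
                pending.remove(node)
                fired = True
    return tokens


result = run({"a": 2.0, "b": 4.0, "x_prev": 3.0})
print(result["y"], result["x"])
```

No program counter appears anywhere: the order of execution falls out of which tokens are available.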
Paper by Dennis and Misunas
[Figure: a Memory of Instruction Cells (Instruction Cell 0, Instruction Cell 1, ..., Instruction Cell n-1), each holding an Instruction, Operand 1 and Operand 2. Operation Packets go to Operation Unit 0 ... Operation Unit m-1, and Data Packets return to the cells. Annotation on the Instruction Cell: “Reservation Station?”]
Explicit Register Renaming
• Make use of a physical register file that is larger
than number of registers specified by ISA
• Keep a translation table:
– ISA register => physical register mapping
– When register is written, replace table entry with new register from
freelist.
– Physical register becomes free when not being used by any
instructions in progress.
• Pipeline can be exactly like “standard” DLX pipeline
– IF, ID, EX, etc….
• Advantages:
– Removes all WAR and WAW hazards
– Like Tomasulo, good for allowing full out-of-order completion
– Allows data to be fetched from a single register file
– Makes speculative execution/precise interrupts easier:
» All that needs to be “undone” for precise break point
is to undo the table mappings
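A translation table with a freelist can be sketched in Python as below. The `Renamer` class, its method names and the sequential physical register numbering are assumptions for illustration; returning registers to the freelist is left out.

```python
from collections import deque


class Renamer:
    def __init__(self, arch_regs, num_physical):
        # Initially architectural register i maps to physical register Pi.
        self.table = {r: f"P{i}" for i, r in enumerate(arch_regs)}
        self.free = deque(f"P{i}" for i in range(len(arch_regs), num_physical))

    def rename(self, dest, srcs):
        """Translate sources first, then give the destination a fresh physical register."""
        phys_srcs = [self.table[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical register: stall issue")
        phys_dest = self.free.popleft()
        self.table[dest] = phys_dest
        return phys_dest, phys_srcs


r = Renamer(["F0", "F2", "F4", "F8", "F10", "F14"], num_physical=9)
divd = r.rename("F0", ["F2", "F4"])    # DIVD F0,F2,F4
addd = r.rename("F10", ["F0", "F8"])   # ADDD F10,F0,F8
subd = r.rename("F8", ["F8", "F14"])   # SUBD F8,F8,F14
print(divd, addd, subd)
```

After renaming, ADDD reads the old physical copy of F8 while SUBD writes a new one, so the WAR hazard between them no longer exists.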
Question:
Can we use explicit register
renaming with scoreboard?
[Figure: the scoreboard architecture (Registers, Functional Units FP Mult, FP Mult, FP Divide, FP Add, Integer, Memory, SCOREBOARD) with a Rename Table added]
Scoreboard Example
Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Read Exec Write
Issue Oper Comp Result
Functional unit status:
Op
dest
Fi
S1
Fj
S2
Fk
Register Rename and Result
Clock
F0 F2
F4
F6
F8 F10 F12
P4
P6
Time Name
Int1
Int2
Mult1
Add
Divide
FU
Busy
FU
Qj
FU
Qk
Fj?
Rj
Fk?
Rk
...
F30
No
No
No
No
No
P0
P2
P8
P10
P12
P30
• Initialized Rename Table
Renamed Scoreboard 1
Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Read Exec Write
Issue Oper Comp Result
1
Functional unit status:
Time Name
Int1
Int2
Mult1
Add
Divide
Busy
Op
dest
Fi
Yes
No
No
No
No
Load
P32
Register Rename and Result
Clock
F0 F2
1
FU
P0
P2
S1
Fj
S2
Fk
FU
Qj
FU
Qk
Fj?
Rj
R2
F4
F6
P4
P32
Yes
F8 F10 F12
P8
Fk?
Rk
P10
P12
...
F30
P30
• Each instruction allocates free register
• Similar to single-assignment compiler transformation
Renamed Scoreboard 2
Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Read Exec Write
Issue Oper Comp Result
1
2
2
Functional unit status:
Time Name
Int1
Int2
Mult1
Add
Divide
Busy
Op
dest
Fi
Yes
Yes
No
No
No
Load
Load
P32
P34
Register Rename and Result
Clock
F0 F2
2
FU
P0
P34
S1
Fj
S2
Fk
FU
Qj
FU
Qk
Fj?
Rj
R2
R3
F4
F6
P4
P32
Yes
Yes
F8 F10 F12
P8
Fk?
Rk
P10
P12
...
F30
P30
Renamed Scoreboard 3
Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Read Exec Write
Issue Oper Comp Result
1
2
3
2
3
Functional unit status:
Time Name
Int1
Int2
Mult1
Add
Divide
Busy
Op
dest
Fi
Yes
Yes
Yes
No
No
Load
Load
Multd
P32
P34
P36
P34
F4
F6
P4
P32
Register Rename and Result
Clock
F0 F2
3
FU
3
P36
P34
S1
Fj
S2
Fk
R2
R3
P4
Fj?
Rj
Fk?
Rk
Int2
No
Yes
Yes
Yes
F8 F10 F12
...
F30
P8
FU
Qj
P10
FU
Qk
P12
P30
Renamed Scoreboard 4
Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Read Exec Write
Issue Oper Comp Result
1
2
3
4
2
3
3
4
4
Busy
Op
dest
Fi
S1
Fj
S2
Fk
No
Yes
Yes
Yes
No
Load
Multd
Sub
P34
P36
P38
P34
P32
R3
P4
P34
F4
F6
F8 F10 F12
P4
P32
P38
Functional unit status:
Time Name
Int1
Int2
Mult1
Add
Divide
Register Rename and Result
Clock
F0 F2
4
FU
P36
P34
FU
Qj
FU
Qk
Int2
Int2
P10
P12
Fj?
Rj
Fk?
Rk
No
Yes
Yes
Yes
No
...
F30
P30
Renamed Scoreboard 5
Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Read Exec Write
Issue Oper Comp Result
1
2
3
4
5
2
3
3
4
4
5
Busy
Op
dest
Fi
S1
Fj
S2
Fk
No
No
Yes
Yes
Yes
Multd
Sub
Divd
P36
P38
P40
P34
P32
P36
P4
P34
P32
F4
P4
Functional unit status:
Time Name
Int1
Int2
Mult1
Add
Divide
Register Rename and Result
Clock
F0 F2
5
FU
P36
P34
FU
Qj
Fj?
Rj
Fk?
Rk
Mult1
Yes
Yes
No
Yes
Yes
Yes
F6
F8 F10 F12
...
F30
P32
P38
P40
FU
Qk
P12
P30
Renamed Scoreboard 6
Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Read Exec Write
Issue Oper Comp Result
1
2
3
4
5
2
3
6
6
Functional unit status:
Time Name
Int1
Int2
10 Mult1
2 Add
Divide
FU
4
5
S1
Fj
S2
Fk
Busy
Op
dest
Fi
No
No
Yes
Yes
Yes
Multd
Sub
Divd
P36
P38
P40
P34
P32
P36
P4
P34
P32
Mult1
Yes
Yes
No
F4
F6
F8 F10 F12
...
P4
P32
P38
Register Rename and Result
Clock
F0 F2
6
3
4
P36
P34
FU
Qj
P40
FU
Qk
P12
Fj?
Rj
Fk?
Rk
Yes
Yes
Yes
F30
P30
Renamed Scoreboard 7
Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Read Exec Write
Issue Oper Comp Result
1
2
3
4
5
2
3
6
6
Functional unit status:
Time Name
Int1
Int2
9 Mult1
1 Add
Divide
FU
4
5
S1
Fj
S2
Fk
Busy
Op
dest
Fi
No
No
Yes
Yes
Yes
Multd
Sub
Divd
P36
P38
P40
P34
P32
P36
P4
P34
P32
Mult1
Yes
Yes
No
F4
F6
F8 F10 F12
...
P4
P32
P38
Register Rename and Result
Clock
F0 F2
7
3
4
P36
P34
FU
Qj
P40
FU
Qk
P12
Fj?
Rj
Fk?
Rk
Yes
Yes
Yes
F30
P30
Renamed Scoreboard 8
Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Read Exec Write
Issue Oper Comp Result
1
2
3
4
5
2
3
6
6
Functional unit status:
Time Name
Int1
Int2
8 Mult1
0 Add
Divide
FU
4
5
8
Busy
Op
dest
Fi
No
No
Yes
Yes
Yes
Multd
Sub
Divd
P36
P38
P40
P34
P32
P36
P4
P34
P32
Mult1
Yes
Yes
No
F4
F6
F8 F10 F12
...
P4
P32
P38
Register Rename and Result
Clock
F0 F2
8
3
4
P36
P34
S1
Fj
S2
Fk
FU
Qj
P40
FU
Qk
P12
Fj?
Rj
Fk?
Rk
Yes
Yes
Yes
F30
P30
Renamed Scoreboard 9
Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Read Exec Write
Issue Oper Comp Result
1
2
3
4
5
2
3
6
6
Functional unit status:
Time Name
Int1
Int2
7 Mult1
Add
Divide
FU
4
5
8
9
S1
Fj
S2
Fk
Busy
Op
dest
Fi
No
No
Yes
No
Yes
Multd
P36
P34
P4
Divd
P40
P36
P32
Mult1
No
Yes
F4
F6
F8 F10 F12
...
F30
P4
P32
P38
Register Rename and Result
Clock
F0 F2
9
3
4
P36
P34
FU
Qj
P40
FU
Qk
P12
Fj?
Rj
Fk?
Rk
Yes
Yes
P30
Renamed Scoreboard 10
Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Read Exec Write
Issue Oper Comp Result
1
2
3
4
5
10
2
3
6
6
Functional unit status:
Time Name
Int1
Int2
6 Mult1
Add
Divide
Busy
No
No
Yes
Yes
Yes
Op
FU
P36
4
5
8
9
dest
Fi
S1
Fj
S2
Fk
FU
Qj
FU
Qk
Fj?
Rj
Fk?
Rk
WAR Hazard gone!
Multd
Addd
Divd
Register Rename and Result
Clock
F0 F2
10
3
4
P34
P36
P42
P40
P34
P38
P36
P4
P34
P32
Mult1
Yes
Yes
No
Yes
Yes
Yes
F4
F6
F8 F10 F12
...
F30
P4
P42
P38
P40
P12
P30
• Notice that P32 not listed in Rename Table
– Still live. Must not be reallocated by accident
Renamed Scoreboard 11
Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Read Exec Write
Issue Oper Comp Result
1
2
3
4
5
10
2
3
6
6
FU
8
9
S1
Fj
S2
Fk
Busy
Op
dest
Fi
No
No
Yes
Yes
Yes
Multd
Addd
Divd
P36
P42
P40
P34
P38
P36
P4
P34
P32
Mult1
Yes
Yes
No
F4
F6
F8 F10 F12
...
P4
P42
P38
Register Rename and Result
Clock
F0 F2
11
4
5
11
Functional unit status:
Time Name
Int1
Int2
5 Mult1
2 Add
Divide
3
4
P36
P34
FU
Qj
P40
FU
Qk
P12
Fj?
Rj
Fk?
Rk
Yes
Yes
Yes
F30
P30
Renamed Scoreboard 12
Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Read Exec Write
Issue Oper Comp Result
1
2
3
4
5
10
2
3
6
6
FU
8
9
S1
Fj
S2
Fk
Busy
Op
dest
Fi
No
No
Yes
Yes
Yes
Multd
Addd
Divd
P36
P42
P40
P34
P38
P36
P4
P34
P32
Mult1
Yes
Yes
No
F4
F6
F8 F10 F12
...
P4
P42
P38
Register Rename and Result
Clock
F0 F2
12
4
5
11
Functional unit status:
Time Name
Int1
Int2
4 Mult1
1 Add
Divide
3
4
P36
P34
FU
Qj
P40
FU
Qk
P12
Fj?
Rj
Fk?
Rk
Yes
Yes
Yes
F30
P30
Renamed Scoreboard 13
Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Read Exec Write
Issue Oper Comp Result
1
2
3
4
5
10
2
3
6
6
3
4
4
5
8
9
11
13
Busy
Op
dest
Fi
S1
Fj
S2
Fk
No
No
Yes
Yes
Yes
Multd
Addd
Divd
P36
P42
P40
P34
P38
P36
P4
P34
P32
F4
P4
Functional unit status:
Time Name
Int1
Int2
3 Mult1
0 Add
Divide
Register Rename and Result
Clock
F0 F2
13
FU
P36
P34
FU
Qj
Fj?
Rj
Fk?
Rk
Mult1
Yes
Yes
No
Yes
Yes
Yes
F6
F8 F10 F12
...
F30
P42
P38
P40
FU
Qk
P12
P30
Renamed Scoreboard 14
Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Read Exec Write
Issue Oper Comp Result
1
2
3
4
5
10
2
3
6
6
3
4
4
5
8
9
11
13
14
Busy
Op
dest
Fi
S1
Fj
S2
Fk
No
No
Yes
No
Yes
Multd
P36
P34
P4
Divd
P40
P36
P32
F4
P4
Functional unit status:
Time Name
Int1
Int2
2 Mult1
Add
Divide
Register Rename and Result
Clock
F0 F2
14
FU
P36
P34
FU
Qj
Fj?
Rj
Fk?
Rk
Yes
Yes
Mult1
No
Yes
F6
F8 F10 F12
...
F30
P42
P38
P40
FU
Qk
P12
P30
Renamed Scoreboard 15
Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Read Exec Write
Issue Oper Comp Result
1
2
3
4
5
10
2
3
6
6
3
4
4
5
8
9
11
13
14
Busy
Op
dest
Fi
S1
Fj
S2
Fk
No
No
Yes
No
Yes
Multd
P36
P34
P4
Divd
P40
P36
P32
F4
P4
Functional unit status:
Time Name
Int1
Int2
1 Mult1
Add
Divide
Register Rename and Result
Clock
F0 F2
15
FU
P36
P34
FU
Qj
Fj?
Rj
Fk?
Rk
Yes
Yes
Mult1
No
Yes
F6
F8 F10 F12
...
F30
P42
P38
P40
FU
Qk
P12
P30
Renamed Scoreboard 16
Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Read Exec Write
Issue Oper Comp Result
1
2
3
4
5
10
2
3
6
6
3
4
16
8
4
5
11
13
14
Busy
Op
dest
Fi
S1
Fj
S2
Fk
No
No
Yes
No
Yes
Multd
P36
P34
P4
Divd
P40
P36
P32
F4
P4
Functional unit status:
Time Name
Int1
Int2
0 Mult1
Add
Divide
Register Rename and Result
Clock
F0 F2
16
FU
P36
P34
9
FU
Qj
Fj?
Rj
Fk?
Rk
Yes
Yes
Mult1
No
Yes
F6
F8 F10 F12
...
F30
P42
P38
P40
FU
Qk
P12
P30
Renamed Scoreboard 17
Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Read Exec Write
Issue Oper Comp Result
1
2
3
4
5
10
2
3
6
6
3
4
16
8
4
5
17
9
11
13
14
Busy
Op
dest
Fi
S1
Fj
S2
Fk
FU
Qj
No
No
No
No
Yes
Divd
P40
P36
P32
F4
P4
Functional unit status:
Time Name
Int1
Int2
Mult1
Add
Divide
Register Rename and Result
Clock
F0 F2
17
FU
P36
P34
Fj?
Rj
Fk?
Rk
Mult1
Yes
Yes
F6
F8 F10 F12
...
F30
P42
P38
P40
FU
Qk
P12
P30
Renamed Scoreboard 18
Instruction status:
Instruction
LD
F6
LD
F2
MULTD F0
SUBD
F8
DIVD
F10
ADDD
F6
j
34+
45+
F2
F6
F0
F8
k
R2
R3
F4
F2
F6
F2
Read Exec Write
Issue Oper Comp Result
1
2
3
4
5
10
2
3
6
6
18
11
Functional unit status:
Time Name
Int1
Int2
Mult1
Add
40 Divide
FU
4
5
17
9
13
14
S1
Fj
S2
Fk
FU
Qj
Busy
Op
dest
Fi
No
No
No
No
Yes
Divd
P40
P36
P32
Mult1
Yes
Yes
F4
F6
F8 F10 F12
...
F30
P4
P42
P38
Register Rename and Result
Clock
F0 F2
18
3
4
16
8
P36
P34
P40
FU
Qk
P12
Fj?
Rj
Fk?
Rk
P30
Explicit Renaming Support
Includes:
• Rapid access to a table of translations
• A physical register file that has more registers
than specified by the ISA
• Ability to figure out which physical registers are
free.
– No free registers => stall on issue
• Thus, register renaming doesn’t require
reservation stations. However:
– Many modern architectures use explicit register renaming +
Tomasulo-like reservation stations to control execution.
Explicit register renaming:
R10000 Freelist Management
Current Map Table: P0 P2 P4 P6 P8 P10 P12 P14 P16 P18 P20 P22 P24 P26 P28 P30
Freelist: P32 P34 P36 P38 ... P60 P62
Instructions in progress (Newest first, Oldest last; final column is Done?): none
• Physical register file larger than ISA register file
• On issue, each instruction that modifies a register is
allocated new physical register from freelist
• Used on: R10000, Alpha 21264, HP PA8000
Explicit register renaming:
R10000 Freelist Management
Current Map Table: P32 P2 P4 P6 P8 P10 P12 P14 P16 P18 P20 P22 P24 P26 P28 P30
Freelist: P34 P36 P38 P40 ... P60 P62
Instructions in progress (Newest first, Oldest last; final column is Done?):
F0  P0   LD P32,10(R2)     N
• Note that physical register P0 is “dead” (or not “live”)
past the point of this load.
– When we go to commit the load, we free up
Explicit register renaming:
R10000 Freelist Management
Current Map Table: P32 P2 P4 P6 P8 P34 P12 P14 P16 P18 P20 P22 P24 P26 P28 P30
Freelist: P36 P38 P40 P42 ... P60 P62
Instructions in progress (Newest first, Oldest last; final column is Done?):
F10 P10  ADDD P34,P4,P32   N
F0  P0   LD P32,10(R2)     N
Explicit register renaming:
R10000 Freelist Management
Current Map Table: P32 P36 P4 P6 P8 P34 P12 P14 P16 P18 P20 P22 P24 P26 P28 P30
Freelist: P38 P40 P44 P48 ... P60 P62
Instructions in progress (Newest first, Oldest last; final column is Done?):
--  --   BNE P36,<…>       N
F2  P2   DIVD P36,P34,P6   N
F10 P10  ADDD P34,P4,P32   N
F0  P0   LD P32,10(R2)     N
Checkpoint at BNE instruction:
Map Table: P32 P36 P4 P6 P8 P34 P12 P14 P16 P18 P20 P22 P24 P26 P28 P30
Freelist: P38 P40 P44 P48 ... P60 P62
Explicit register renaming:
R10000 Freelist Management
Current Map Table: P40 P36 P38 P6 P8 P34 P12 P14 P16 P18 P20 P22 P24 P26 P28 P30
Freelist: P42 P44 P48 P50 ... P0 P10
Instructions in progress (Newest first, Oldest last; final column is Done?):
--  --   ST 0(R3),P40      Y
F0  P32  ADDD P40,P38,P6   Y
F4  P4   LD P38,0(R3)      Y
--  --   BNE P36,<…>       N
F2  P2   DIVD P36,P34,P6   N
F10 P10  ADDD P34,P4,P32   y
F0  P0   LD P32,10(R2)     y
Checkpoint at BNE instruction:
Map Table: P32 P36 P4 P6 P8 P34 P12 P14 P16 P18 P20 P22 P24 P26 P28 P30
Freelist: P38 P40 P44 P48 ... P60 P62
Explicit register renaming:
R10000 Freelist Management
Current Map Table: P32 P36 P4 P6 P8 P34 P12 P14 P16 P18 P20 P22 P24 P26 P28 P30
Freelist: P38 P40 P44 P48 ... P60 P62
Instructions in progress (Newest first, Oldest last; final column is Done?):
F2  P2   DIVD P36,P34,P6   N
F10 P10  ADDD P34,P4,P32   y
F0  P0   LD P32,10(R2)     y
Speculation error fixed by restoring map table and freelist
Checkpoint at BNE instruction:
Map Table: P32 P36 P4 P6 P8 P34 P12 P14 P16 P18 P20 P22 P24 P26 P28 P30
Freelist: P38 P40 P44 P48 ... P60 P62
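The checkpoint-and-restore idea from these slides can be sketched in Python. This is a toy model, not the R10000 implementation: `RenameState` and its methods are hypothetical, the table holds only three registers, and registers freed by commits between checkpoint and restore are not modeled.

```python
from collections import deque


class RenameState:
    def __init__(self):
        self.map = {"F0": "P0", "F2": "P2", "F4": "P4"}
        self.free = deque(["P32", "P34", "P36", "P38"])
        self.checkpoints = []

    def rename_dest(self, reg):
        """Allocate a new physical register for an instruction that writes `reg`."""
        self.map[reg] = self.free.popleft()
        return self.map[reg]

    def checkpoint(self):
        # Taken at a predicted branch: copy both the map table and the freelist.
        self.checkpoints.append((dict(self.map), deque(self.free)))

    def restore(self):
        # Misprediction: discard every mapping made after the branch.
        self.map, self.free = self.checkpoints.pop()


s = RenameState()
s.rename_dest("F0")   # before the branch
s.checkpoint()        # at the branch
s.rename_dest("F4")   # speculative, past the branch
s.rename_dest("F0")   # speculative, past the branch
s.restore()           # branch was mispredicted
print(s.map, list(s.free))
```

Undoing speculation touches only these two small structures; the register file contents are never copied.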
What about Precise
Interrupts?
• Both Scoreboard and Tomasulo have:
In-order issue, out-of-order execution, and
out-of-order completion
• Need to “fix” the out-of-order completion
aspect so that we can find precise
breakpoint in instruction stream.
Relationship between precise
interrupts and speculation:
• Speculation is a form of guessing.
• Important for branch prediction:
– Need to “take our best shot” at predicting branch direction.
– If we issue multiple instructions per cycle, we lose lots of potential
instructions otherwise:
» Consider 4 instructions per cycle
» If it takes a single cycle to decide on a branch, we waste from 4 to 7
instruction slots!
• If we speculate and are wrong, we need to back up and restart
execution at the point where we predicted incorrectly:
– This is exactly the same as precise exceptions!
• Technique for both precise interrupts/exceptions and
speculation: in-order completion or commit
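One way to arrive at the range of 4 to 7 wasted slots is sketched below. The model is an assumption, not something the slide spells out: the branch sits in one of the 4 slots of its issue group, the slots after it in that group are lost, and so is each full group issued while the branch is being decided. The function name `wasted_slots` is hypothetical.

```python
def wasted_slots(width, branch_slot, resolve_cycles=1):
    """Issue slots lost while a branch is resolved, without prediction."""
    rest_of_group = width - 1 - branch_slot   # slots after the branch in its own group
    return rest_of_group + width * resolve_cycles


print([wasted_slots(4, slot) for slot in range(4)])  # [7, 6, 5, 4]
```

Under this reading, the worst case is a branch in the first slot of its group and the best case is a branch in the last slot.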
HW support for precise interrupts
• Need HW buffer for results of
uncommitted instructions: reorder
buffer
– 3 fields: instr, destination, value
– Reorder buffer can be operand source
=> more registers like RS
– Use reorder buffer number instead of
reservation station when execution
completes
– Supplies operands between execution
complete & commit
– Once operand commits,
result is put into register
– Instructions commit
– As a result, it’s easy to undo speculated
instructions
on mispredicted branches
or on exceptions
[Figure: FP Op Queue, Reorder Buffer and FP Regs, with Res Stations in front of each of two FP Adders]
Four Steps of Speculative
Tomasulo Algorithm
1.Issue—get instruction from FP Op Queue
If reservation station and reorder buffer slot free, issue instr &
send operands & reorder buffer no. for destination (this stage
sometimes called “dispatch”)
2.Execution—operate on operands (EX)
When both operands ready then execute; if not ready, watch CDB
for result; when both in reservation station, execute; checks RAW
(sometimes called “issue”)
3.Write result—finish execution (WB)
Write on Common Data Bus to all awaiting FUs
& reorder buffer; mark reservation station available.
4.Commit—update register with reorder result
When instr. at head of reorder buffer & result present, update
register with result (or store to memory) and remove instr from
reorder buffer. Mispredicted branch flushes reorder buffer
(sometimes called “graduation”)
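The commit step can be sketched with a small Python reorder buffer. The class and method names are assumptions for illustration; stores, exceptions and the link to reservation stations are left out.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()   # oldest instruction at the left
        self.registers = {}      # architectural state, updated only at commit

    def issue(self, instr, dest):
        entry = {"instr": instr, "dest": dest, "value": None, "done": False}
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value):
        # Execution may finish out of order; the result waits in the buffer.
        entry["value"], entry["done"] = value, True

    def commit(self):
        """Retire finished instructions from the head only, in program order."""
        committed = []
        while self.entries and self.entries[0]["done"]:
            e = self.entries.popleft()
            self.registers[e["dest"]] = e["value"]
            committed.append(e["instr"])
        return committed

    def flush(self):
        # Mispredicted branch or exception: drop all uncommitted work.
        self.entries.clear()


rob = ReorderBuffer()
multd = rob.issue("MULTD", "F0")
subd = rob.issue("SUBD", "F8")
rob.write_result(subd, 3.0)   # the younger instruction finishes first
first = rob.commit()          # nothing retires: MULTD is still at the head
rob.write_result(multd, 1.5)
second = rob.commit()         # both retire, oldest first
print(first, second, rob.registers)
```

Because the register file changes only at commit, flushing the buffer is enough to return to a precise point in the instruction stream.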
Summary
• DataFlow view:
– Data triggers execution rather than instructions triggering data
• Dynamic hardware schemes can unroll loops dynamically
in hardware
– Form of limited dataflow
– Register renaming is essential
• Explicit Renaming: more physical registers than needed
by ISA.
– Rename table: tracks current association between architectural
registers and physical registers
– Uses a translation table to perform compiler-like
transformation on the fly
• Precise Interrupts:
– Must commit things back in order
– Reorder buffer: temporarily holds results until commit possible
– Toss out things to achieve precise interrupt point