CS 162 Computer Architecture
Lecture 3: Pipelining Contd.
Instructor: L.N. Bhuyan
www.cs.ucr.edu/~bhuyan/cs162
1999 ©UCB
Single Cycle Datapath (From Ch 5)
[Figure: single-cycle MIPS datapath showing the PC, Imem, Regs, sign extend, ALU, and Dmem, with the control signals PCSrc, RegDst, RegWrite, ALUSrc, ALUOp, MemRead, MemWrite, and MemToReg.]
Required Changes to Datapath
°Introduce the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB
into the datapath to separate the 5 stages.
°The next PC value is computed in the 3rd stage, but the next
instruction must be fetched in the next cycle, so move the PCSrc
mux to the 1st stage. The PC is incremented unless there is a new
branch address.
°The branch address is computed in the 3rd stage, but in a pipeline
the PC value has already changed by then. The PC value must
therefore be carried along with the instruction. Width of IF/ID
register = (IR: 32) + (PC: 32) = 64 bits.
Changes to Datapath Contd.
°For a lw instruction, the write register address is needed at
stage 5, but by then the IR is occupied by a later instruction.
So the destination register field must be carried along as the
instruction moves through the stages. See the connection in the
figure.
Width of ID/EX register =
(Reg1: 32) + (Reg2: 32) + (offset: 32) + (PC: 32) + (destination register: 5)
= 133 bits
Assignment: What are the widths of the EX/MEM and MEM/WB registers?
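The width bookkeeping above can be checked with a short Python sketch. The IF/ID and ID/EX fields follow the slides; the field breakdowns for EX/MEM and MEM/WB are assumptions (one plausible set of fields), chosen because they reproduce the 102-bit and 69-bit labels on the pipelined datapath figure. Control bits are not counted.

```python
# Datapath fields (in bits) held by each pipeline register.
# IF/ID and ID/EX come from the slides; the EX/MEM and MEM/WB
# breakdowns are assumptions made for this sketch.
PIPELINE_REGISTERS = {
    "IF/ID": {"instruction": 32, "pc": 32},
    "ID/EX": {"reg1_data": 32, "reg2_data": 32, "offset": 32,
              "pc": 32, "dest_reg": 5},
    "EX/MEM": {"branch_target": 32, "zero": 1, "alu_result": 32,
               "store_data": 32, "dest_reg": 5},
    "MEM/WB": {"mem_data": 32, "alu_result": 32, "dest_reg": 5},
}


def register_width(name):
    """Sum the field widths of one pipeline register."""
    return sum(PIPELINE_REGISTERS[name].values())


for name in PIPELINE_REGISTERS:
    print(name, register_width(name), "bits")
```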
Pipelined Datapath (with Pipeline Regs)(6.2)
[Figure: pipelined datapath with the five stages Fetch, Decode, Execute, Memory, and Write Back separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers. Register widths as labeled: IF/ID 64 bits, ID/EX 133 bits, EX/MEM 102 bits, MEM/WB 69 bits.]
Pipelined Control (6.3)
• Start with single-cycle controller
• Group control lines by pipeline stage needed
• Extend pipeline registers with control bits
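[Figure: control values carried in the pipeline registers. The control unit decodes the instruction from IF/ID; ID/EX holds the EX, Mem, and WB groups, EX/MEM holds Mem and WB, and MEM/WB holds WB.
EX: RegDst, ALUop, ALUSrc
Mem: Branch, MemRead, MemWrite
WB: MemToReg, RegWrite]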
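A rough Python sketch of the grouping idea: each control line is tagged with the stage that uses it, and each pipeline register carries only the groups still needed downstream. The signal names come from the slide; the bit widths (ALUOp as 2 bits, everything else 1 bit) are assumptions taken from the usual textbook MIPS control.

```python
# Assumed width of each control line in bits.
SIGNAL_BITS = {
    "RegDst": 1, "ALUOp": 2, "ALUSrc": 1,
    "Branch": 1, "MemRead": 1, "MemWrite": 1,
    "MemToReg": 1, "RegWrite": 1,
}

# Control lines grouped by the stage that consumes them.
STAGE_SIGNALS = {
    "EX": ["RegDst", "ALUOp", "ALUSrc"],
    "MEM": ["Branch", "MemRead", "MemWrite"],
    "WB": ["MemToReg", "RegWrite"],
}

# Groups each pipeline register must still carry forward.
CARRIED_GROUPS = {
    "ID/EX": ["EX", "MEM", "WB"],
    "EX/MEM": ["MEM", "WB"],
    "MEM/WB": ["WB"],
}


def control_width(register):
    """Number of control bits added to one pipeline register."""
    return sum(SIGNAL_BITS[signal]
               for group in CARRIED_GROUPS[register]
               for signal in STAGE_SIGNALS[group])


for register in CARRIED_GROUPS:
    print(register, control_width(register), "control bits")
```

Under these assumptions the control bits shrink from stage to stage, because each group is dropped once its stage has used it.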
Pipelined Processor: Datapath + Control
• More work to correctly handle pipeline hazards
[Figure: pipelined datapath with the control unit added. The WB, M, and EX control groups travel through the ID/EX, EX/MEM, and MEM/WB registers; the signals shown are PCSrc, Branch, RegDst, ALUSrc, ALUOp, MemRead, MemWrite, MemToReg, and RegWrite.]
Recap
°If all pipeline stages can be kept busy, the pipeline can retire
(complete) up to one instruction per clock cycle (thereby
achieving single-cycle throughput)
°The pipeline paradox (for MIPS): any instruction still takes 5
cycles to execute (even though one instruction can retire per
cycle)
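The two recap points can be put into numbers with a small Python sketch. It assumes an ideal 5-stage pipeline with no stalls; the function name is made up for this illustration.

```python
def pipelined_cycles(num_instructions, depth=5):
    """Clock cycles to finish a program on an ideal pipeline (no stalls)."""
    if num_instructions == 0:
        return 0
    # The first instruction needs `depth` cycles (its latency); every
    # later instruction retires one cycle after its predecessor.
    return depth + (num_instructions - 1)


for n in (1, 10, 100):
    cycles = pipelined_cycles(n)
    print(n, "instructions:", cycles, "cycles, average CPI", cycles / n)
```

A single instruction still costs 5 cycles, but over 100 instructions the average approaches one cycle per instruction.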
Problems for Pipelining
°Hazards prevent the next instruction from
executing during its designated clock
cycle, limiting speedup
• Structural hazards: HW cannot support
this combination of instructions (e.g., a single
memory for instructions and data)
• Data hazards: an instruction depends on the
result of a prior instruction still in the
pipeline
• Control hazards: conditional branches and
other instructions may stall the pipeline,
delaying later instructions
Single Memory is a Structural Hazard
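[Figure: pipeline diagram over time (clock cycles) for Load followed by Instr 1 through Instr 4, in instruction order. Each instruction passes through M (memory), Reg, ALU, M, Reg; the data memory access of the Load falls in the same clock cycle as the instruction fetch of Instr 3.]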
• Can’t read same memory twice in same clock cycle
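The conflict can be located with a short Python sketch. It assumes one instruction is fetched per cycle starting in cycle 1, so instruction i (counting from 0) is in IF in cycle i + 1 and in MEM in cycle i + 4; the function name and tuple layout are made up for this illustration.

```python
def memory_conflicts(program):
    """Find cycles where a data access and an instruction fetch collide.

    program: list of (name, accesses_data_memory) in program order.
    Returns a list of (cycle, accessing_instruction, fetched_instruction).
    """
    conflicts = []
    for i, (name, accesses_data_memory) in enumerate(program):
        mem_cycle = i + 4          # IF in cycle i + 1, MEM three cycles later
        fetched = mem_cycle - 1    # index of the instruction in IF that cycle
        if accesses_data_memory and fetched < len(program):
            conflicts.append((mem_cycle, name, program[fetched][0]))
    return conflicts


program = [("Load", True), ("Instr 1", False), ("Instr 2", False),
           ("Instr 3", False), ("Instr 4", False)]
print(memory_conflicts(program))
```

With a single memory, each reported pair needs the same memory in the same cycle, so one of the two must wait.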
EX: MIPS multicycle datapath:
Structural Hazard in Memory
[Figure: MIPS multicycle datapath in which a single memory holds both instructions and data. The PC supplies the address; the memory feeds the instruction register and the memory data register, followed by the register file, the A and B registers, the ALU, and ALUOut.]
Structural Hazards limit performance
°Example: if there are 1.3 memory accesses per
instruction (30% of instructions executed are
loads and stores) and only one memory access
is possible per cycle, then
• Average CPI ≥ 1.3
• Otherwise the datapath resource (the memory)
would be more than 100% utilized
Structural Hazard Solution: Add more Hardware
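The bound can be reproduced with a minimal Python sketch. It assumes every instruction needs one fetch plus, for loads and stores, one data access, and that the memory serves a fixed number of accesses per cycle; the function and parameter names are made up for this illustration.

```python
def min_cpi(load_store_fraction, accesses_per_cycle=1, ideal_cpi=1.0):
    """Lower bound on CPI imposed by memory bandwidth."""
    # One instruction fetch plus one data access for loads and stores.
    accesses_per_instruction = 1 + load_store_fraction
    return max(ideal_cpi, accesses_per_instruction / accesses_per_cycle)


print(min_cpi(0.3))                        # single memory port
print(min_cpi(0.3, accesses_per_cycle=2))  # added hardware: two accesses per cycle
```

Adding a second port (or separate instruction and data memories) removes the bound, which is the "add more hardware" solution.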
Speed Up Equation for Pipelining
\[ \mathrm{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Pipeline stall clock cycles per instruction} \]

\[ \text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Clock Cycle}_{\text{unpipelined}}}{\text{Clock Cycle}_{\text{pipelined}}} \]

With Ideal CPI = 1:

\[ \text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Clock Cycle}_{\text{unpipelined}}}{\text{Clock Cycle}_{\text{pipelined}}} \]
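As a quick check of the speedup equation, here is a direct Python transcription. The function and parameter names are made up for this sketch; the clock cycle times default to equal values, which is an assumption rather than part of the formula.

```python
def pipeline_speedup(depth, stall_cpi, ideal_cpi=1.0,
                     cycle_unpipelined=1.0, cycle_pipelined=1.0):
    """Speedup of a pipelined machine over the unpipelined one."""
    cpi_ratio = (ideal_cpi * depth) / (ideal_cpi + stall_cpi)
    clock_ratio = cycle_unpipelined / cycle_pipelined
    return cpi_ratio * clock_ratio


print(pipeline_speedup(depth=5, stall_cpi=0.0))   # no stalls: speedup equals depth
print(pipeline_speedup(depth=5, stall_cpi=0.25))  # stalls reduce the speedup
```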
Example: Dual-port vs. Single-port
° Machine A: Dual ported memory
° Machine B: Single ported memory, but its pipelined
implementation has a 1.05 times faster clock rate
° Ideal CPI = 1 for both
° Loads are 40% of instructions executed
SpeedUpA = Pipeline Depth/(1 + 0) x (clockunpipe/clockpipe)
= Pipeline Depth
SpeedUpB = Pipeline Depth/(1 + 0.4 x 1)
x (clockunpipe/(clockunpipe / 1.05))
= (Pipeline Depth/1.4) x 1.05
= 0.75 x Pipeline Depth
SpeedUpA / SpeedUpB = Pipeline Depth/(0.75 x Pipeline Depth) = 1.33
° Machine A is 1.33 times faster
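The arithmetic of this example can be replayed in Python. The pipeline depth of 5 is an assumption for the sketch (it cancels in the ratio), and the clock ratio of machine A is taken as 1, as in the slide.

```python
def speedup(depth, stall_cpi, clock_ratio):
    """Speedup with ideal CPI = 1; clock_ratio = clock_unpipe / clock_pipe."""
    return depth / (1 + stall_cpi) * clock_ratio


depth = 5  # assumed; any depth gives the same ratio

# Machine A: dual-ported memory, so loads cause no structural stalls.
speedup_a = speedup(depth, stall_cpi=0.0, clock_ratio=1.0)

# Machine B: 40% loads, each costing 1 stall cycle, but a 1.05x faster clock.
speedup_b = speedup(depth, stall_cpi=0.4 * 1, clock_ratio=1.05)

print("SpeedUpB / depth:", speedup_b / depth)
print("SpeedUpA / SpeedUpB:", speedup_a / speedup_b)
```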
Data Hazard on Register $1 (6.4)
add $1, $2, $3
sub $4, $1, $3
and $6, $1, $7
or  $8, $1, $9
xor $10, $1, $11
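To see which of these readers of $1 are actually in trouble, the following Python sketch classifies them by their distance from the add. It assumes the 5-stage pipeline with registers read in stage 2 and written in stage 5, and no forwarding; the tuple layout and function name are made up for this illustration.

```python
# (opcode, destination, sources) for the sequence on the slide.
program = [
    ("add", "$1", ("$2", "$3")),
    ("sub", "$4", ("$1", "$3")),
    ("and", "$6", ("$1", "$7")),
    ("or", "$8", ("$1", "$9")),
    ("xor", "$10", ("$1", "$11")),
]


def classify_readers(program, producer_index=0):
    """Classify later instructions that read the producer's destination."""
    _, dest, _ = program[producer_index]
    result = {}
    later = program[producer_index + 1:]
    for distance, (op, new_dest, sources) in enumerate(later, start=1):
        if dest in sources:
            if distance <= 2:
                # Reads the register before the producer's write-back.
                result[op] = "hazard"
            elif distance == 3:
                # Read and write-back happen in the same cycle.
                result[op] = "ok if register file writes before it reads"
            else:
                result[op] = "no hazard"
        if new_dest == dest:
            break  # the register is overwritten; later readers see the new value
    return result


for op, status in classify_readers(program).items():
    print(op, "->", status)
```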
Data Hazard Solution:
• “Forward” result from one stage to another
[Figure: pipeline diagram over time (clock cycles) for add $1,$2,$3; sub $4,$1,$3; and $6,$1,$7; or $8,$1,$9; xor $10,$1,$11 through the stages IF, ID/RF, EX, MEM, WB, with the add result forwarded to the ALU inputs of the following instructions.]
• “or” is OK if the register file is implemented properly (write in the first half of the clock cycle, read in the second half)
Hazard Detection for Forwarding
° A hazard must be detected just before execution so that,
in case of a hazard, the data can be forwarded to the
input of the ALU.
° A hazard exists when a source register (Rs or Rt or
both) of the instruction in the EX stage equals the
destination register (Rd) of an earlier instruction still
in the pipeline (in either the MEM or WB stage).
° Compare the Rs and Rt values in the ID/EX register with
Rd in the EX/MEM and MEM/WB registers =>
Rs, Rt, and Rd must all be carried from the IF/ID register
to the ID/EX register (only Rd was carried before).
° If they match, forward the data to the input of the ALU
through the multiplexor.
See Fig. 6.43, p. 488 of the text
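The comparison described above can be sketched in Python as a selector for one ALU operand. This is an illustration, not the circuit in the text: the dictionary layout and the returned labels are made up, and the extra RegWrite and register-0 checks are assumptions taken from the usual textbook formulation rather than from this slide.

```python
def forward_select(src, ex_mem, mem_wb):
    """Choose where one ALU operand should come from.

    src: source register number (Rs or Rt) of the instruction in EX.
    ex_mem, mem_wb: dicts with 'reg_write' (bool) and 'rd' (register number)
    describing the instructions currently in the MEM and WB stages.
    """
    # The most recent result wins, so check EX/MEM first.
    if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == src:
        return "EX/MEM"
    if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == src:
        return "MEM/WB"
    return "REGFILE"


# sub $4,$1,$3 is in EX while add $1,$2,$3 is in MEM.
add_in_mem = {"reg_write": True, "rd": 1}
nothing_in_wb = {"reg_write": False, "rd": 0}
print(forward_select(1, add_in_mem, nothing_in_wb))  # operand $1
print(forward_select(3, add_in_mem, nothing_in_wb))  # operand $3
```

Calling the function once for Rs and once for Rt gives the two multiplexor selections.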
Forwarding: What about Loads?
• Dependencies backward in time are hazards
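[Figure: pipeline diagram for lw $1,0($2) followed by sub $4,$1,$3 through the stages IF, ID/RF, EX, MEM, WB. The loaded value is available only after the MEM stage of the lw, which is later than the EX stage of the sub needs it.]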
• Can’t solve with forwarding alone
• Must stall instruction dependent on load
• “Load-Use” hazard
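A load-use check can be sketched in Python as follows. It assumes the MIPS encoding, in which the destination of lw is the Rt field, and compares it with the source registers of the instruction currently being decoded; the function and parameter names are made up for this illustration.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True if the instruction in ID must wait for a load in EX.

    id_ex_mem_read: the instruction in EX is a load.
    id_ex_rt: destination register of that load.
    if_id_rs, if_id_rt: source registers of the instruction in ID.
    """
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)


# lw $1,0($2) is in EX while sub $4,$1,$3 is in ID.
print(must_stall(True, 1, 1, 3))
# The same registers after an add instead of a load: forwarding suffices.
print(must_stall(False, 1, 1, 3))
```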
Data Hazard Even with Forwarding
• Must stall pipeline 1 cycle (insert 1 bubble)
[Figure: pipeline diagram over time (clock cycles) for lw $1,0($2); sub $4,$1,$6; and $6,$1,$7; or $8,$1,$9 through the stages IF, ID/RF, EX, MEM, WB. A bubble is inserted so that sub and the instructions behind it are delayed by one cycle.]
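The effect of the bubble on timing can be sketched in Python by tracking the clock cycle in which each instruction reaches EX. The model is a simplification made for this illustration: forwarding is assumed, so the only stall is one cycle when an instruction uses the result of the load immediately before it.

```python
def ex_cycles(program):
    """Clock cycle in which each instruction is in the EX stage.

    program: list of (text, is_load, dest, sources) in program order.
    """
    cycles = []
    for i, (_, _, _, sources) in enumerate(program):
        if i == 0:
            cycles.append(3)  # IF in cycle 1, ID in cycle 2, EX in cycle 3
            continue
        _, prev_is_load, prev_dest, _ = program[i - 1]
        bubble = 1 if prev_is_load and prev_dest in sources else 0
        cycles.append(cycles[-1] + 1 + bubble)
    return cycles


program = [
    ("lw $1, 0($2)", True, "$1", ("$2",)),
    ("sub $4, $1, $6", False, "$4", ("$1", "$6")),
    ("and $6, $1, $7", False, "$6", ("$1", "$7")),
    ("or $8, $1, $9", False, "$8", ("$1", "$9")),
]
for (text, _, _, _), cycle in zip(program, ex_cycles(program)):
    print(text, "-> EX in cycle", cycle, ", WB in cycle", cycle + 2)
```

Only sub is stalled directly; and and or simply follow one cycle later than they would have without the bubble.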
Compiler Schemes to Improve Load Delay
° The compiler detects the data dependency and inserts
nop instructions until the data is available
sub $2, $1, $3
nop
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
° Compiler will find independent instructions to
fill in the delay slots
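A nop-insertion pass of the kind described above might look like the following Python sketch. The function name, tuple layout, and parameters are made up for this illustration. To reproduce the listing on the slide, sub is treated as the instruction whose result needs one delay slot; for load delays the default ("lw") applies.

```python
NOP = ("nop", "nop", None, ())


def insert_nops(program, delayed_ops=("lw",), delay=1):
    """Insert nops so results of delayed_ops are not used too early.

    program: list of (text, op, dest, sources) in program order.
    delay: number of slots that must separate producer and consumer.
    """
    out = []
    for instr in program:
        sources = instr[3]
        # Look back over the last `delay` emitted instructions, nearest first.
        for back in range(1, delay + 1):
            if len(out) < back:
                break
            _, prev_op, prev_dest, _ = out[-back]
            if prev_op in delayed_ops and prev_dest in sources:
                out.extend([NOP] * (delay - back + 1))
                break
        out.append(instr)
    return out


program = [
    ("sub $2, $1, $3", "sub", "$2", ("$1", "$3")),
    ("and $12, $2, $5", "and", "$12", ("$2", "$5")),
    ("or $13, $6, $2", "or", "$13", ("$6", "$2")),
    ("add $14, $2, $2", "add", "$14", ("$2", "$2")),
    ("sw $15, 100($2)", "sw", None, ("$15", "$2")),
]
for text, _, _, _ in insert_nops(program, delayed_ops=("sub",)):
    print(text)
```

A scheduling compiler would try to replace each inserted nop with an independent instruction instead.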
Software Scheduling to Avoid Load Hazards
Try producing fast code for
a = b + c;
d = e - f;
assuming a, b, c, d, e, and f are in memory.
Slow code:
LW  Rb,b
LW  Rc,c
ADD Ra,Rb,Rc
SW  a,Ra
LW  Re,e
LW  Rf,f
SUB Rd,Re,Rf
SW  d,Rd
Fast code:
LW  Rb,b
LW  Rc,c
LW  Re,e
ADD Ra,Rb,Rc
LW  Rf,f
SW  a,Ra
SUB Rd,Re,Rf
SW  d,Rd
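The difference between the two schedules can be counted with a short Python sketch. It assumes forwarding, so the only stall is one cycle when an instruction uses the register loaded by the LW immediately before it; the tuple layout and function name are made up for this illustration.

```python
def load_use_stalls(program):
    """Count one-cycle stalls caused by using a value right after its load.

    program: list of (op, dest, sources) in program order.
    """
    stalls = 0
    for (prev_op, prev_dest, _), (_, _, sources) in zip(program, program[1:]):
        if prev_op == "LW" and prev_dest in sources:
            stalls += 1
    return stalls


slow = [
    ("LW", "Rb", ("b",)), ("LW", "Rc", ("c",)),
    ("ADD", "Ra", ("Rb", "Rc")), ("SW", None, ("Ra",)),
    ("LW", "Re", ("e",)), ("LW", "Rf", ("f",)),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
fast = [
    ("LW", "Rb", ("b",)), ("LW", "Rc", ("c",)), ("LW", "Re", ("e",)),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ("f",)),
    ("SW", None, ("Ra",)), ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
print("slow code stalls:", load_use_stalls(slow))
print("fast code stalls:", load_use_stalls(fast))
```

Both versions execute the same eight instructions; the reordering only moves each use away from its load.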