CS152 – Computer Architecture and Engineering Lecture 12 – Pipeline Wrap up: Control Hazards, RAW/WAR/WAW 2004-10-07 John Lazzaro (www.cs.berkeley.edu/~lazzaro) Dave Patterson (www.cs.berkeley.edu/~patterson) www-inst.eecs.berkeley.edu/~cs152/ CS 152 L12 RAW Hazards, Interrupts (1) Fall.


CS152 – Computer Architecture and
Engineering
Lecture 12 – Pipeline Wrap up:
Control Hazards, RAW/WAR/WAW
2004-10-07
John Lazzaro
(www.cs.berkeley.edu/~lazzaro)
Dave Patterson
(www.cs.berkeley.edu/~patterson)
www-inst.eecs.berkeley.edu/~cs152/
CS 152 L12 RAW Hazards, Interrupts (1)
Fall 2004 © UC Regents
Pipelining Review
• What makes it easy
– all instructions are the same length
– just a few instruction formats
– memory operands appear only in loads and stores
• Hazards limit performance
– Structural: need more HW resources
– Data: need forwarding, compiler scheduling
• Data hazards must be handled carefully
• MIPS I instruction set architecture made pipeline
visible (delayed branch, delayed load)
Outline
• Pipelined Control
• Control Hazards
• RAW, WAR, WAW
• Brainstorm on pipeline bugs
MIPS Pipeline Data / Control Paths A (fast)
[Figure: pipelined MIPS datapath. Blocks: PC, Add (PC + 4), Instruction Memory, Register File, Sign Extend (16 to 32), Shift left 2, Add (branch target), ALU, ALU cntrl, Data Memory. Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB. A Control block supplies the EX, MEM, and WB control fields. Control signals: PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemWrite, MemRead, MemtoReg.]
MIPS Pipeline Data / Control Paths (debug)
[Figure: the same pipelined MIPS datapath, drawn for debugging. The instruction (Instr) is carried through the pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, and separate EX Control, MEM Control, and WB Control blocks are shown. Control signals: PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemWrite, MemRead, MemtoReg.]
MIPS Pipeline Control (pipelined debug)
[Figure: the pipelined MIPS datapath with pipelined debug control. The instruction (Instr) is carried through IF/ID, ID/EX, EX/MEM, MEM/WB alongside the EX, MEM, and WB Control blocks. Control signals: PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, Branch, MemWrite, MemRead, MemtoReg.]
Control Hazards
• When the flow of instruction addresses is
not what the pipeline expects; incurred by
change of flow instructions
– Conditional branches (beq, bne)
– Unconditional branches (j)
• Possible solutions
– Stall
– Move decision point earlier in the pipeline
– Predict
– Delay decision (requires compiler support)
Datapath Branch and Jump Hardware
[Figure: pipelined MIPS datapath with branch and jump hardware. The jump target is formed from PC+4[31-28] and the instruction's jump field shifted left 2; the branch target is PC + 4 plus the sign-extended 16-bit offset shifted left 2. Muxes controlled by Jump and PCSrc select the next PC. Also shown: Control, Branch, Forward Unit, ALU, ALU cntrl, Data Memory, and pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB.]
Administrivia
• Finish Lab 3; meet with TA Friday
• Midterm Tue Oct 12 5:30 - 8:30 in 101
Morgan
– Northwest corner of campus, near Arch and
Hearst
– Midterm review Sunday Oct 10, 7 PM, 306 Soda
– Bring 1 page, handwritten notes, both sides
– Nothing electronic: no calculators, cell phones,
pagers, …
– Meet at LaVal’s Northside afterwards for Pizza
Jumps Incur One Stall
• Jumps not decoded until ID, so one stall is needed
Instr. order (top to bottom), one column per clock cycle:
j       IM   Reg  ALU  DM   Reg
stall
lw                IM   Reg  ALU  DM   Reg
and                    IM   Reg  ALU  DM   Reg
• Fortunately, jumps are very infrequent –
only 2% of the SPECint instruction mix
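As a rough check on why a one-cycle jump stall is tolerable, the following sketch computes the average CPI when 2% of instructions are jumps. The base CPI of 1.0 and the absence of any other stalls are assumptions made for illustration, and the function name is hypothetical.

```python
def cpi_with_jump_stalls(base_cpi: float, jump_fraction: float,
                         stall_cycles: int = 1) -> float:
    """Average CPI when every jump adds `stall_cycles` bubble cycles."""
    return base_cpi + jump_fraction * stall_cycles


# Assumed: ideal base CPI of 1.0, jumps are 2% of the mix, one stall each.
cpi = cpi_with_jump_stalls(base_cpi=1.0, jump_fraction=0.02)
print(f"CPI with jump stalls: {cpi:.2f}")  # CPI with jump stalls: 1.02
```

Under these assumptions the jump stall costs about 2% in throughput.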
Review: Branches Incur Three Stalls
Instr. order (top to bottom), one column per clock cycle:
beq     IM   Reg  ALU  DM   Reg
stall
stall
stall
lw                          IM   Reg  ALU  DM   Reg
and                              IM   Reg  ALU  DM
• Can fix branch hazard by waiting – stall – but affects throughput
Moving Branch Decisions Earlier in Pipe
• Move the branch decision hardware back to the EX stage
– Reduces the number of stall cycles to two
– Adds an and gate and a 2x1 mux to the EX timing path
• Add hardware to compute the branch target address and
evaluate the branch decision to the ID stage
– Reduces the number of stall cycles to one (like with jumps)
– Computing branch target address can be done in parallel with
RegFile read (done for all instructions – only used when needed)
– Comparing the registers can’t be done until after RegFile read, so
comparing and updating the PC adds a comparator, an and gate,
and a 3x1 mux to the ID timing path
– Need forwarding hardware in ID stage
• For longer pipelines, decision points are later in the
pipeline, incurring more stalls, so we need a better
solution
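The stall counts quoted above (three with the decision in MEM, two in EX, one in ID) follow from how many younger instructions have been fetched by the time the branch resolves. A minimal sketch, assuming one instruction is fetched per cycle and that the stage list and function name below are illustrative only:

```python
FIVE_STAGE = ["IF", "ID", "EX", "MEM", "WB"]


def branch_stall_cycles(resolve_stage: str, stages=FIVE_STAGE) -> int:
    """Bubbles after a branch when it is resolved in `resolve_stage`.

    With one fetch per cycle, the number of stages the branch has already
    passed through equals the number of wrong-path fetch slots.
    """
    return stages.index(resolve_stage)


for stage in ("MEM", "EX", "ID"):
    print(stage, branch_stall_cycles(stage))
```

Passing a longer, hypothetical stage list shows the point of the last bullet: the later the decision stage sits in the pipe, the larger the result.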
Early Branch Forwarding Issues
• Bypass of source operands from the EX/MEM
if (IDcontrol.Branch
and (EX/MEM.RegisterRd != 0)
and (EX/MEM.RegisterRd == IF/ID.RegisterRs))
    ForwardC = 1
if (IDcontrol.Branch
and (EX/MEM.RegisterRd != 0)
and (EX/MEM.RegisterRd == IF/ID.RegisterRt))
    ForwardD = 1
Forwards the result from the second previous instr. to either input of the Compare
• MEM/WB dependency also needs to be forwarded
• If the instruction 2 before the branch is a load, then a
stall will be required since the MEM stage memory
access is occurring at the same time as the ID stage
branch compare operation
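The two EX/MEM bypass conditions can be written as a small function for checking test cases by hand. This is a sketch only: the `IdBranchInputs` record and its field names are hypothetical stand-ins for the pipeline-register fields named on the slide, and the MEM/WB path and load stall are not modeled.

```python
from dataclasses import dataclass


@dataclass
class IdBranchInputs:
    branch: bool      # IDcontrol.Branch
    ex_mem_rd: int    # EX/MEM.RegisterRd
    if_id_rs: int     # IF/ID.RegisterRs
    if_id_rt: int     # IF/ID.RegisterRt


def forward_controls(s: IdBranchInputs) -> tuple:
    """Return (ForwardC, ForwardD) for the ID-stage compare inputs."""
    # Register 0 is never forwarded, and only branches use the compare.
    candidate = s.branch and s.ex_mem_rd != 0
    forward_c = int(candidate and s.ex_mem_rd == s.if_id_rs)
    forward_d = int(candidate and s.ex_mem_rd == s.if_id_rt)
    return forward_c, forward_d


# An instruction two before "beq $1,$2,..." wrote $1, so Rs is bypassed.
print(forward_controls(IdBranchInputs(True, ex_mem_rd=1, if_id_rs=1, if_id_rt=2)))  # (1, 0)
```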
Branch Prediction
• Resolve branch hazards by assuming a given outcome
and proceeding without waiting to see the actual branch
outcome
1. Predict not taken – always predict branches will not be
taken, continue to fetch from the sequential instruction
stream, only when branch is taken does the pipeline
stall
– If taken, flush instructions in the pipeline after the branch
  • in IF, ID, and EX if branch logic in MEM – three stalls
  • in IF if branch logic in ID – one stall
– ensure that those flushed instructions haven’t changed
machine state – automatic in the MIPS pipeline since machine
state changing operations are at the tail end of the pipeline
(MemWrite or RegWrite)
– restart the pipeline at the branch destination
Flushing with Misprediction (Not Taken)
Instr. order (top to bottom), one column per clock cycle:
4  beq $1,$2,2             IM   Reg  ALU  DM   Reg
8  sub $4,$1,$5 (flush)         IM   Reg  ALU  DM   Reg
16 and $6,$1,$7                      IM   Reg  ALU  DM   Reg
20 or $8,$1,$9                            IM   Reg  ALU  DM   Reg
• To flush the IF stage instruction, add an IF.Flush control
line that zeros the instruction field of the IF/ID pipeline
register (transforming it into a noop)
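The example can be traced with a small model of what happens at the end of the cycle in which the branch sits in ID. This is a sketch under stated assumptions: the branch is resolved in ID, the function names are hypothetical, and instructions are represented as 32-bit integers.

```python
NOP = 0x00000000  # all-zero word; on MIPS this decodes as sll $0,$0,0, a no-op


def branch_target(branch_pc: int, offset_words: int) -> int:
    """Target = (PC + 4) + (sign-extended offset shifted left by 2)."""
    return branch_pc + 4 + (offset_words << 2)


def resolve_in_id(branch_pc: int, offset_words: int, taken: bool,
                  fetched_instr: int) -> tuple:
    """Return (next PC, word latched into IF/ID) once the branch resolves.

    `fetched_instr` is the word IF has just read from branch_pc + 4.
    A taken branch asserts IF.Flush, which zeros that word.
    """
    if taken:
        return branch_target(branch_pc, offset_words), NOP
    return branch_pc + 8, fetched_instr


SUB = (1 << 21) | (5 << 16) | (4 << 11) | 0x22  # R-type encoding of sub $4,$1,$5
print(resolve_in_id(4, 2, True, SUB))  # (16, 0)
```

With `beq $1,$2,2` at address 4, the taken path redirects fetch to address 16 (the `and`) and replaces the `sub` with a no-op, matching the diagram.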
Branch Prediction, cont’d
• Resolve branch hazards by statically assuming a given
outcome and proceeding
2. Predict taken – always predict branches will be taken
– Predict taken always incurs a stall (if branch destination
hardware has been moved to the ID stage)
• As the branch penalty increases (for deeper pipelines),
a simple static prediction scheme will hurt performance
• With more hardware, it is possible to try to predict branch
behavior dynamically during program execution
3. Dynamic branch prediction – predict branches at run time using run-time information
Dynamic Branch Prediction
• A branch prediction buffer (aka branch history table (BHT))
in the IF stage, addressed by the lower bits of the PC,
contains a bit that tells whether the branch was taken the
last time it was executed
– Bit may predict incorrectly (may be from a different branch with the
same low order PC bits, or may be a wrong prediction for this
branch) but this doesn’t affect correctness, just performance
– If the prediction is wrong, flush the incorrect instructions in pipeline,
restart the pipeline with the right instructions, and invert the
prediction bit
• The BHT predicts when a branch is taken, but does not tell
where it’s taken to!
– A branch target buffer (BTB) in the IF stage can cache the branch
target address (or even the branch target instruction) so that a
stall can be avoided
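A minimal sketch of a 1-bit BHT paired with a BTB, to make the indexing and aliasing behavior concrete. The table size, the choice to drop the two byte-offset bits of the PC before indexing, and the untagged BTB are all assumptions for illustration, not details from the slide.

```python
class OneBitPredictor:
    """1-bit branch history table plus an untagged branch target buffer."""

    def __init__(self, index_bits: int = 4):
        self.mask = (1 << index_bits) - 1
        self.taken_bit = [0] * (1 << index_bits)  # BHT: last outcome per entry
        self.target = {}                          # BTB: entry index -> target PC

    def _index(self, pc: int) -> int:
        # Assumed: word-aligned instructions, so skip the 2 byte-offset bits.
        return (pc >> 2) & self.mask

    def predict(self, pc: int):
        """Return (predicted taken?, predicted target or None)."""
        i = self._index(pc)
        if self.taken_bit[i]:
            return True, self.target.get(i)
        return False, None

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Record the actual outcome after the branch resolves."""
        i = self._index(pc)
        self.taken_bit[i] = int(taken)
        if taken:
            self.target[i] = target


p = OneBitPredictor(index_bits=4)
p.update(0x40, taken=True, target=0x10)
# 0x80 shares the low-order index bits with 0x40, so it inherits that entry.
print(p.predict(0x80))  # (True, 16)
```

The aliased prediction for `0x80` may be wrong, but as the slide notes that only costs a flush, since the real outcome is still computed in the pipeline.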
1-bit Prediction Accuracy
• 1-bit predictor in loop is incorrect twice when not taken
– Assume predict_bit = 0 to start (indicating
branch not taken) and loop control is at
the bottom of the loop code
1. First time through the loop, the predictor
mispredicts the branch since the branch is
taken back to the top of the loop; invert
prediction bit (predict_bit = 1)
2. As long as branch is taken (looping),
prediction is correct
3. Exiting the loop, the predictor again
mispredicts the branch since this time the
branch is not taken falling out of the loop;
invert prediction bit (predict_bit = 0)
Loop: 1st loop instr
      2nd loop instr
      .
      .
      .
      last loop instr
      bne $1,$2,Loop
      fall out instr
• For 10 times through the loop we have an 80% prediction
accuracy for a branch that is taken 90% of the time
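The 80% figure can be reproduced by replaying the loop branch's outcomes through a 1-bit predictor. A minimal sketch, with a hypothetical function name, assuming the branch is taken nine times and then falls through once:

```python
def one_bit_accuracy(outcomes, predict_bit: int = 0) -> float:
    """Fraction of correct predictions for a single 1-bit predictor."""
    correct = 0
    for taken in outcomes:
        if bool(predict_bit) == taken:
            correct += 1
        else:
            predict_bit ^= 1  # invert the bit on every misprediction
    return correct / len(outcomes)


loop_branch = [True] * 9 + [False]    # 10 trips: taken 9 times, then fall out
print(one_bit_accuracy(loop_branch))  # 0.8
```

The two misses are the first trip (bit starts at 0) and the exit (bit is 1), which is steps 1 and 3 above.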
2-bit Predictors
• A 2-bit scheme can give 90% accuracy since a prediction must be
wrong twice before the prediction bit is changed
[Figure: state diagram of a 2-bit predictor with two Predict Taken states and two Predict Not Taken states, connected by Taken and Not taken transitions. Annotations for the loop example: right 9 times, wrong on loop fall out, right on 1st iteration.]
Loop: 1st loop instr
      2nd loop instr
      .
      .
      .
      last loop instr
      bne $1,$2,Loop
      fall out instr
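The 90% figure can be checked with a 2-bit saturating counter, which is one common way to build such a predictor. This is an assumption: the slide's state diagram may use slightly different transitions, and the function name and starting state below are illustrative.

```python
def two_bit_accuracy(outcomes, counter: int = 3):
    """Replay outcomes through a 2-bit saturating counter.

    Counter values 2 and 3 predict taken; 0 and 1 predict not taken.
    Returns (fraction correct, final counter value).
    """
    correct = 0
    for taken in outcomes:
        if (counter >= 2) == taken:
            correct += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return correct / len(outcomes), counter


one_pass = [True] * 9 + [False]          # the loop branch from the slide
accuracy, state = two_bit_accuracy(one_pass * 2)
print(accuracy, state)                   # 0.9 2
```

After the fall-out misprediction the counter only drops to the weaker taken state, so the first iteration of the next pass through the loop is still predicted correctly.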
Delayed Decision
• First, move the branch decision hardware and target
address calculation to the ID pipeline stage
• A delayed branch always executes the next sequential
instruction – the branch takes effect after that next
instruction
– MIPS software moves an instruction to immediately after the
branch that is not affected by the branch (a safe instruction)
thereby hiding the branch delay
• As processors go to deeper pipelines and multiple
issue, the branch delay grows and more than one
delay slot is needed.
– Delayed branching has lost popularity compared to more
expensive but more flexible dynamic approaches
– Growth in available transistors has made dynamic approaches
relatively cheaper
Scheduling Branch Delay Slots
A. From before branch
    add $1,$2,$3
    if $2=0 then
        delay slot
  becomes
    if $2=0 then
        add $1,$2,$3
B. From branch target
    sub $4,$5,$6
    add $1,$2,$3
    if $1=0 then
        delay slot
  becomes
    add $1,$2,$3
    if $1=0 then
        sub $4,$5,$6
C. From fall through
    add $1,$2,$3
    if $1=0 then
        delay slot
    sub $4,$5,$6
  becomes
    add $1,$2,$3
    if $1=0 then
        sub $4,$5,$6
• A is the best choice, fills delay slot & reduces instruction count (IC)
• In B, the sub instruction may need to be copied, increasing IC
• In B and C, must be okay to execute sub when branch fails
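Whether strategy A applies comes down to one check: the instruction moved from before the branch must not produce a register the branch condition reads. A minimal sketch of that check, with a hypothetical function name and registers written as strings:

```python
def can_fill_from_before(candidate_dest: str, branch_sources: set) -> bool:
    """True if the instruction just before the branch may move into the slot.

    Moving it is safe only if the branch condition does not read the
    register that the candidate writes.
    """
    return candidate_dest not in branch_sources


# Case A: add $1,$2,$3 before "if $2=0": the branch does not read $1.
print(can_fill_from_before("$1", {"$2"}))  # True
# Cases B and C: add $1,$2,$3 before "if $1=0": the branch needs the new $1.
print(can_fill_from_before("$1", {"$1"}))  # False
```

When the check fails, the scheduler has to fall back to B or C, which is why those cases pick the `sub` instead.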
3 Generic Data Hazards: RAW, WAR, WAW
• Read After Write (RAW)
InstrJ tries to read operand before InstrI
writes it
I: add r1,r2,r3
J: sub r4,r1,r3
• Caused by a “Dependence” (in compiler
nomenclature). This hazard results from an
actual need for communication.
• Forwarding handles many, but not all, RAW
dependencies in 5 stage MIPS pipeline
3 Generic Data Hazards: RAW, WAR, WAW
• Write After Read (WAR)
InstrJ writes operand before InstrI reads it
I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7
• Called an “anti-dependence” by compiler writers.
This results from “reuse” of the name “r1”.
• Can’t happen in MIPS 5 stage pipeline because:
– All instructions take 5 stages, and
– Reads are always in stage 2, and
– Register Writes must be in stage 5
3 Generic Data Hazards: RAW, WAR, WAW
• Write After Write (WAW)
InstrJ writes operand before InstrI writes it.
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
• Called an “output dependence” by compiler writers
This also results from the “reuse” of name “r1”.
• Can’t happen in MIPS 5 stage pipeline because:
– All instructions take 5 stages, and
– Register Writes must be in stage 5
• Can see WAR and WAW in more complicated pipes
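The three dependence types can be told apart mechanically from each instruction's destination and source registers. The sketch below classifies the I/J pairs from the three slides; the `Instr` record and function name are hypothetical, and the code only names the dependence, without modeling whether a given pipeline turns it into a hazard.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: tuple


def dependences(i: Instr, j: Instr) -> list:
    """Dependences of later instruction j on earlier instruction i."""
    kinds = []
    if i.dest in j.srcs:
        kinds.append("RAW")  # j reads what i writes (true dependence)
    if j.dest in i.srcs:
        kinds.append("WAR")  # j writes what i reads (anti-dependence)
    if i.dest == j.dest:
        kinds.append("WAW")  # both write the same name (output dependence)
    return kinds


print(dependences(Instr("add", "r1", ("r2", "r3")), Instr("sub", "r4", ("r1", "r3"))))  # ['RAW']
print(dependences(Instr("sub", "r4", ("r1", "r3")), Instr("add", "r1", ("r2", "r3"))))  # ['WAR']
print(dependences(Instr("sub", "r1", ("r4", "r3")), Instr("add", "r1", ("r2", "r3"))))  # ['WAW']
```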
Supporting ID Stage Branches
[Figure: pipelined MIPS datapath with the branch resolved in ID. A Compare unit on the RegFile read outputs and a branch-target adder (sign-extended offset shifted left 2) sit in the ID stage and drive Branch and PCSrc; IF.Flush clears the IF/ID register. A Hazard Unit and Forward Units select the bypassed operands for the Compare inputs and for the ALU. Pipeline registers: IF/ID, ID/EX, EX/MEM, MEM/WB.]
Brain storm on pipeline bugs
• Where are bugs likely to hide in a pipelined
processor?
1.
2. …
• How can you write tests to uncover these likely
bugs?
1.
2. …
• Once it passes a test, never need to run it again
in the design process?
Brain storm on pipeline bugs
• Depending on branch solution (move to
ID, delayed, static prediction, dynamic
prediction), where are bugs likely to hide?
1.
2. …
• How can you write tests to uncover these
likely bugs?
1.
2. …
• Once it passes a test, don’t need to run it
again?
Peer Instruction
Clock     Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
1st add   Ifetch   Reg/Dec  Exec     Mem/Wr
2nd lw             Ifetch   Reg/Dec  Exec     Mem      Wr
3rd add                     Ifetch   Reg/Dec  Exec     Mem/Wr
• Suppose we use a 4 stage pipeline that combines memory access
and write back stages for all instructions but load, stalling when there are
structural hazards. Impact?
1. The branch delay slot is now 0 instructions
2. Most loads cause stall since often a structural hazard on reg. writes
3. Most stores cause stall since they have a structural hazard
4. Both 2 & 3: most loads&stores cause stall due to structural hazards
5. Most loads cause stall, but there is no load-use hazard anymore
6. Both 2 & 3, but there is no load-use hazard anymore
7. None of the above
Summary: Designing a Pipelined Processor
• Go back and examine your data path and control diagram
• Associate resources with states
– Be sure there are no structural hazards: one use / clock cycle
• Add pipeline registers between stages to balance clock cycle
– Amdahl’s Law suggests splitting longest stage
• Resolve all data and control dependencies
– If backwards in time in pipeline drawing to registers
=> data hazard: forward or stall to resolve them
– If backwards in time in pipeline drawing to PC
=> control hazard: we’ll see next time
• 5 stage pipeline with reads early in same stage, writes later in
same stage, avoids WAR/WAW hazards
• Assert control in appropriate stage
• Develop test instruction sequences likely to uncover pipeline
bugs (If you don’t test it, it won’t work)