Pipelining: Reducing Instruction Execution Time
COMP25212 Lecture 5
1
The Fetch-Execute Cycle
[Figure: CPU connected to Memory]

• Instruction execution is a simple repetitive cycle:
  – Fetch Instruction
  – Execute Instruction
Cycles of Operation
• Most logic circuits are driven by a clock
• In its simplest form, one instruction would take one clock cycle
• This assumes that fetching an instruction and accessing data memory can each be done in 1/5th of a cycle (i.e. a cache hit)
Fetch-Execute Detail
The two parts of the cycle can be subdivided
• Fetch
– Get instruction from memory
– Decode instruction & select registers
• Execute
– Perform operation or calculate address
– Access an operand in data memory
– Write result to a register
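The five sub-steps above can be sketched as a toy interpreter. The two-instruction program, the register names and the instruction format below are invented for illustration; they are not the lecture's ISA:

```python
# Toy non-pipelined CPU: every instruction performs all five
# sub-steps (IF, ID, EX, MEM, WB) before the next one starts.

imem = [("add", "r2", "r0", "r1"),   # r2 = r0 + r1
        ("lw",  "r3", "r2", None)]   # r3 = dmem[r2]
dmem = {3: 99}
regs = {"r0": 1, "r1": 2, "r2": 0, "r3": 0}
pc = 0

while pc < len(imem):
    inst = imem[pc]                    # IF:  fetch instruction
    op, rd, rs1, rs2 = inst            # ID:  decode & select registers
    if op == "add":
        result = regs[rs1] + regs[rs2] # EX:  perform operation
    elif op == "lw":
        addr = regs[rs1]               # EX:  calculate address
        result = dmem[addr]            # MEM: access data memory
    regs[rd] = result                  # WB:  write result to register
    pc += 1

print(regs["r2"], regs["r3"])          # → 3 99
```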
Processor Detail
[Figure: five-stage datapath. PC and Instruction Cache (IF); Decode and Register Bank (ID); ALU and MUX (EX); Data Cache (MEM); Write-Back (WB)]
Logic to do this

[Figure: the stages as blocks in series: Fetch Logic (with Inst Cache), Decode Logic, Exec Logic, Mem Logic (with Data Cache), Write Logic]
• Each stage does its work and passes it on to the next
• Each block is only doing useful work for 1/5th of each cycle
Can We Overlap Operations?
• E.g. while decoding one instruction we could be fetching the next
Clock cycle:   1    2    3    4    5    6    7

Inst a:        IF   ID   EX   MEM  WB
Inst b:             IF   ID   EX   MEM  WB
Inst c:                  IF   ID   EX   MEM  WB
Inst d:                       IF   ID   EX   MEM
Inst e:                            IF   ID   EX
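The overlap in the table can be reproduced in a few lines of Python (a sketch, using the slide's stage names and assuming one instruction enters the pipeline per cycle):

```python
# Reproduce the pipeline occupancy table: instruction i enters
# the pipeline at cycle i, and occupies stage s at cycle i + s.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_at(inst, cycle):
    """Stage occupied by instruction `inst` (0-based) at `cycle`,
    or None if it is not in the pipeline at that cycle."""
    s = cycle - inst
    return STAGES[s] if 0 <= s < len(STAGES) else None

for inst in range(5):                       # Inst a..e
    row = [stage_at(inst, c) or "" for c in range(7)]
    print("Inst", "abcde"[inst] + ":", " ".join(f"{x:>3}" for x in row))
```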
Insert Buffers Between Stages
[Figure: the same blocks with a clocked buffer between each pair: Fetch Logic (Inst Cache), Instruction Reg., Decode Logic, Exec Logic, Mem Logic (Data Cache), Write Logic]
• Instead of a direct connection between stages, use extra buffers to hold state
• Clock the buffers once per cycle
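The buffered structure can be modelled as a row of latches that all shift on each clock edge (a toy sketch; real latches hold decoded fields, operands and results, not just instruction names):

```python
# Toy model of inter-stage buffers: on every clock tick, each
# stage reads its input latch and writes its output latch.

latches = {"IF/ID": None, "ID/EX": None, "EX/MEM": None, "MEM/WB": None}
program = iter(["inst_a", "inst_b", "inst_c"])
retired = []

def clock_tick():
    # Move work from the back of the pipeline forwards so nothing
    # is overwritten, mimicking all latches clocking simultaneously.
    if latches["MEM/WB"] is not None:
        retired.append(latches["MEM/WB"])     # WB completes
    latches["MEM/WB"] = latches["EX/MEM"]
    latches["EX/MEM"] = latches["ID/EX"]
    latches["ID/EX"] = latches["IF/ID"]
    latches["IF/ID"] = next(program, None)    # IF fetches the next inst

for _ in range(7):
    clock_tick()

print(retired)   # → ['inst_a', 'inst_b', 'inst_c']
```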
This is a Pipeline
• Just like a car production line: one stage puts the engine in, the next puts the wheels on, etc.
• We still complete one instruction every cycle
• We can now increase the clock speed by 5x
• 5x faster!
• But it isn’t quite that easy!
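The claimed speedup is simple arithmetic (the 10 ns unpipelined cycle time below is an assumed figure for illustration, not from the slide):

```python
# Ideal pipeline speedup: if one unpipelined cycle took T, each
# of the 5 stages now only needs T/5, so the clock can run 5x
# faster while still completing one instruction per cycle.
T = 10.0                      # assumed unpipelined cycle time, in ns
stages = 5

stage_time = T / stages       # each stage must fit in 2 ns
clock_speedup = T / stage_time

print(clock_speedup)          # → 5.0
```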
Why 5 Stages
• Simply because the designers of early pipelined processors found that dividing execution into these 5 stages of roughly equal complexity was appropriate
• Some recent processors have used up to 30 pipeline stages
• We will consider 5 for simplicity at the moment
The Control Transfer Problem
• The obvious way to fetch instructions is in
serial program order (i.e. just incrementing
the PC)
• What if we fetch a branch?
• We only know it’s a branch when we
decode it in the second stage of the
pipeline
• By that time we are already fetching the
next instruction in serial order
A Pipeline ‘Bubble’
Program being executed:

  Inst 1
  Inst 2
  Inst 3
  Branch n
  Inst 5
  .
  .
  Inst n

Pipeline contents over successive cycles (oldest instruction on the left):

              WB    MEM   EX    ID     IF
cycle i:      1     2     3     Bra    5      ← branch decoded here
cycle i+1:    2     3     Bra   5      n
cycle i+2:    3     Bra   5     n      n+1

We must mark Inst 5 as unwanted and ignore it as it goes down
the pipeline. But we have wasted a cycle.
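The fetch redirection and the squashed slot can be modelled in a short sketch (the addresses and the "bubble" marker are my invention; real hardware squashes by clearing a valid bit in the latch):

```python
# Toy fetch sequence with a one-cycle branch bubble: the branch is
# only recognised in decode, one cycle after the wrong-path
# instruction (Inst 5) has already been fetched, so that slot is
# squashed and the PC is redirected to the branch target.

program = {0: "inst1", 1: "inst2", 2: "inst3",
           3: ("branch", 9), 4: "inst5", 9: "inst_n"}

fetched = []          # what IF brings in, one entry per cycle
pc = 0
for _ in range(6):
    fetched.append(program.get(pc, "nop"))
    pc += 1
    # One cycle later, decode spots the branch; model that by
    # redirecting after the wrong-path fetch has already happened.
    prev = fetched[-2] if len(fetched) >= 2 else None
    if isinstance(prev, tuple) and prev[0] == "branch":
        fetched[-1] = "bubble"     # squash the wrong-path instruction
        pc = prev[1]               # fetch from the branch target next

print(fetched)
```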
Conditional Branches
• It gets worse!
• Suppose we have a conditional branch
• It is possible that we might not be able to
determine the branch outcome until the
execute (3rd) stage
• We would then have 2 ‘bubbles’
• We can often avoid this by reading
registers during the decode stage.
Longer Pipelines
• ‘Bubbles’ due to branches are usually
called Control Hazards
• They occur when it takes one or more
pipeline stages to detect the branch
• The more stages there are, the less work each does
• So detecting a branch is more likely to take multiple stages
• Longer pipelines usually suffer more
degradation from control hazards
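The degradation can be quantified with a standard effective-CPI calculation (the 20% branch frequency and the bubble counts below are illustrative assumptions, not figures from the slide):

```python
# Effective cycles per instruction with branch bubbles:
#   CPI = 1 + branch_fraction * bubbles_per_branch
branch_fraction = 0.20        # assumed: 1 in 5 instructions branches

for bubbles in (1, 2, 5):     # short pipeline ... long pipeline
    cpi = 1 + branch_fraction * bubbles
    print(f"{bubbles} bubble(s) per branch -> CPI = {cpi:.2f}")
# Longer pipelines, which lose more cycles per branch, degrade
# CPI more: here 1.20, 1.40 and 2.00 respectively.
```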
Branch Prediction
• In most programs a branch instruction is
executed many times
• Also, the instruction will be at the same (virtual) address in memory each time
• What if, when a branch was executed
– We ‘remembered’ its address
– We ‘remembered’ the address that was
fetched next
Branch Target Buffer
• We could do this with some sort of cache:

    Address: branch address  →  Data: target address

• As we fetch an instruction we check the buffer
• If there is a valid entry we use it to fetch the next instruction
Branch Target Buffer
• For an unconditional branch we would
always get it right
• For a conditional branch it depends on the probability that the branch behaves the same way as it did last time
• E.g. for a 'for' loop which jumps back many times, we will get it right most of the time
• But it is only a prediction; if we get it wrong we correct it next cycle (and suffer a 'bubble')
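A last-outcome BTB can be sketched as a small dictionary (a toy model: a real BTB is a direct-mapped or set-associative structure with tags, and the 10-iteration loop below is my example):

```python
# Toy branch target buffer: maps a branch's address to the address
# that was fetched after it last time. A prediction is "correct"
# when the branch does the same thing as it did before.

btb = {}                       # branch address -> predicted next address

def predict(pc):
    return btb.get(pc)         # None = no valid entry yet

def update(pc, actual_next):
    btb[pc] = actual_next

# A backward branch at address 40 closing a 10-iteration loop:
# taken (target 10) nine times, then falls through to address 41.
correct = 0
for i in range(10):
    actual = 10 if i < 9 else 41
    if predict(40) == actual:
        correct += 1
    update(40, actual)

print(correct)   # → 8: wrong on the first lookup and on loop exit
```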
Outline Implementation
[Figure: fetch stage. The PC feeds both the Inst Cache and the Branch Target Buffer in parallel; on a valid BTB hit the next PC is the predicted target, otherwise the incremented (inc) PC]
Other Branch Prediction
• BTB is simple to understand but expensive
to implement
• Also, as described, it uses only each branch's last outcome to predict
• In practice branch prediction depends on
– More history (several previous branches)
– Context (how did we get to this branch)
• Real branch predictors are more complex
and vital to performance (long pipelines)
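The slide does not give a concrete scheme, but one classic way to use "more history" (my illustration, beyond what the slide states) is a table of 2-bit saturating counters indexed by a shift register of recent branch outcomes:

```python
# Sketch of a history-based predictor (illustrative, not from the
# slide): the last 4 outcomes index a table of 2-bit saturating
# counters; a counter value >= 2 predicts "taken".

HISTORY_BITS = 4
counters = [1] * (1 << HISTORY_BITS)   # start weakly not-taken
history = 0                            # last 4 outcomes, newest in bit 0

def predict():
    return counters[history] >= 2      # True = predict taken

def train(taken):
    global history
    c = counters[history]
    counters[history] = min(c + 1, 3) if taken else max(c - 1, 0)
    history = ((history << 1) | int(taken)) & ((1 << HISTORY_BITS) - 1)

# An alternating taken/not-taken pattern defeats a last-outcome
# predictor, but here the history soon distinguishes the two cases.
correct = 0
pattern = [True, False] * 20
for taken in pattern:
    correct += (predict() == taken)
    train(taken)
print(correct, "of", len(pattern))
```

After a short warm-up the predictor is right every time on this pattern, which is the sense in which history and context matter.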