Transcript Document

Chapter One
Introduction to Pipelined Processors
Non-linear pipeline
• In a floating-point adder, stages (2) and (4) each need a shift register.
• We can use the same shift register, and then there will be only 3 stages.
• Then we need a feedback connection from the third stage to the second stage.
• Further, the same pipeline can be used to perform fixed-point addition.
• A pipeline with feed-forward and/or feedback connections is called non-linear.
Example: 3-stage non-linear pipeline
3 stage non-linear pipeline
[Figure: 3-stage non-linear pipeline with an input, stages Sa, Sb and Sc, and two outputs (Output A and Output B)]
• It has 3 stages, Sa, Sb and Sc, with latches between them.
• Multiplexers (the crossed circles) can take more than one input and pass one of the inputs to the output.
• The outputs of the stages are tapped and used for feedback and feed-forward connections.
3 stage non-linear pipeline
• The above pipeline can perform a variety of
functions.
• Each functional evaluation can be represented
by a particular sequence of usage of stages.
• Some examples are:
1. Sa, Sb, Sc
2. Sa, Sb, Sc, Sb, Sc, Sa
3. Sa, Sc, Sb, Sa, Sb, Sc
Reservation Table
• Each functional evaluation can be represented using a diagram called a Reservation Table (RT).
• It is the space-time diagram of a pipeline
corresponding to one functional evaluation.
• X axis – time units
• Y axis – stages
Reservation Table
• For the sequence Sa, Sb, Sc, Sb, Sc, Sa (the second example above), called function A, we have:
Stage \ Time:  0  1  2  3  4  5
Sa:            A  .  .  .  .  A
Sb:            .  A  .  A  .  .
Sc:            .  .  A  .  A  .
Reservation Table
• For the sequence Sa, Sc, Sb, Sa, Sb, Sc (the third example above), called function B, we have:
Stage \ Time:  0  1  2  3  4  5
Sa:            B  .  .  B  .  .
Sb:            .  .  B  .  B  .
Sc:            .  B  .  .  .  B
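To make the two tables concrete, here is a minimal sketch (illustrative only; the stage names and markings follow the slides, while the Python representation and helper names are assumptions of this sketch) that stores each reservation table as a mapping from stage to the set of reserved time units and prints it:

```python
# Reservation tables for the 3-stage non-linear pipeline.
# Each table maps a stage name to the set of time units in which the
# stage is reserved during one functional evaluation.

STAGES = ["Sa", "Sb", "Sc"]

RT_A = {"Sa": {0, 5}, "Sb": {1, 3}, "Sc": {2, 4}}  # function A: Sa,Sb,Sc,Sb,Sc,Sa
RT_B = {"Sa": {0, 3}, "Sb": {2, 4}, "Sc": {1, 5}}  # function B: Sa,Sc,Sb,Sa,Sb,Sc


def compute_time(rt):
    """Compute time = number of clock periods for one evaluation."""
    return max(max(times) for times in rt.values()) + 1


def print_rt(mark, rt):
    width = compute_time(rt)
    print("Time: " + "  ".join(str(t) for t in range(width)))
    for stage in STAGES:
        row = [mark if t in rt[stage] else "." for t in range(width)]
        print(stage + ":   " + "  ".join(row))


print_rt("A", RT_A)   # compute time = 6
print_rt("B", RT_B)   # compute time = 6
```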
Function A
3-stage pipeline sequence: Sa, Sb, Sc, Sb, Sc, Sa
[Figure sequence: the pipeline diagram is shown next to the reservation table, which is filled in one time unit per slide. A is marked in Sa at time 0, Sb at time 1, Sc at time 2, Sb at time 3, Sc at time 4 and Sa at time 5, giving the completed table shown above.]
Function B
3-stage pipeline sequence: Sa, Sc, Sb, Sa, Sb, Sc
[Figure sequence: the reservation table is filled in one time unit per slide. B is marked in Sa at time 0, Sc at time 1, Sb at time 2, Sa at time 3, Sb at time 4 and Sc at time 5, giving the completed table shown above.]
Reservation Table
• After a function is initiated, its stages need to be reserved in the corresponding time units.
• Each function supported by a multifunction pipeline is represented by a different RT.
• The time taken for a function evaluation, in units of the clock period, is the compute time (for both A and B it is 6).
Reservation Table
• Markings in the same row => the stage is used more than once (at different times).
• Markings in the same column => more than one stage is used at the same time.
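As an illustrative check of these two properties (a sketch under the assumption that a reservation table is stored as a stage-to-time-units mapping, as in the earlier sketch):

```python
from collections import Counter

# Function A's reservation table, repeated here so the sketch is
# self-contained (the dictionary form is an assumption of this sketch).
RT_A = {"Sa": {0, 5}, "Sb": {1, 3}, "Sc": {2, 4}}


def stages_used_more_than_once(rt):
    """Rows with more than one marking: a stage is reused over time."""
    return [stage for stage, times in rt.items() if len(times) > 1]


def times_with_parallel_stage_use(rt):
    """Columns with more than one marking: several stages busy at once."""
    counts = Counter(t for times in rt.values() for t in times)
    return sorted(t for t, c in counts.items() if c > 1)


print(stages_used_more_than_once(RT_A))     # ['Sa', 'Sb', 'Sc']
print(times_with_parallel_stage_use(RT_A))  # []: one stage per time unit
```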
Multifunction pipelines
• The hardware of a multifunction pipeline should be reconfigurable.
• A multifunction pipeline can be static or dynamic.
Multifunction pipelines
• Static:
– Initially configured for one functional evaluation.
– For another function, the pipeline needs to be drained and reconfigured.
– You cannot have two inputs belonging to different functions in the pipeline at the same time.
Multifunction pipelines
• Dynamic:
– Can carry out different functional evaluations at the same time.
– It is difficult to control, as we need to be sure that there is no conflict in the usage of stages (a conflict check is sketched below).
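A minimal sketch of such a check (an illustrative assumption, not how a real pipeline controller is built): given two reservation tables and the offset in clock periods between their initiations, verify that no stage is demanded by both evaluations in the same time unit.

```python
# Illustrative stage-conflict check for a dynamic multifunction pipeline.
# rt1 and rt2 map stage names to the time units they reserve; `offset`
# is how many clock periods after rt1 the rt2 evaluation is initiated.

RT_A = {"Sa": {0, 5}, "Sb": {1, 3}, "Sc": {2, 4}}  # function A
RT_B = {"Sa": {0, 3}, "Sb": {2, 4}, "Sc": {1, 5}}  # function B


def conflicts(rt1, rt2, offset):
    """Return True if some stage is needed by both evaluations at once."""
    for stage in set(rt1) | set(rt2):
        t1 = rt1.get(stage, set())
        t2 = {t + offset for t in rt2.get(stage, set())}
        if t1 & t2:
            return True
    return False


# Initiating B two clock periods after A collides: both need Sa at time 5.
print(conflicts(RT_A, RT_B, 2))   # True
```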
Principles of Designing Pipeline Processors
(Design Problems of Pipeline Processors)
Instruction Prefetch and Branch Handling
• The instructions in computer programs can be
classified into 4 types:
– Arithmetic/Load Operations (60%)
– Store Type Instructions (15%)
– Branch Type Instructions (5%)
– Conditional Branch Type (Yes – 12% and No – 8%)
Instruction Prefetch and Branch Handling
• Arithmetic/Load Operations (60%):
– These operations require one or two operand fetches.
– The execution of different operations requires a different number of pipeline cycles.
Instruction Prefetch and Branch Handling
• Store Type Instructions (15%):
– These require a memory access to store the data.
• Branch Type Instructions (5%):
– These correspond to an unconditional jump.
Instruction Prefetch and Branch Handling
• Conditional Branch Type (Yes – 12% and No – 8%):
– The Yes path requires the calculation of the new (branch target) address.
– The No path proceeds to the next sequential instruction.
Instruction Prefetch and Branch Handling
• Arithmetic-load and store instructions do not alter the execution order of the program.
• Branch instructions and interrupts have damaging effects on the performance of pipelined computers.
Handling Example – Interrupt System of the Cray-1
Cray-1 System
• The interrupt system is built around an exchange package.
• When an interrupt occurs, the Cray-1 saves the 8 scalar registers, the 8 address registers, the program counter and the monitor flags.
• These are packed into 16 words and swapped with a block whose address is specified by a hardware exchange address register.
Instruction Prefetch and Branch Handling
• In general, the higher the percentage of branch-type instructions in a program, the slower the program will run on a pipelined processor.
Effect of Branching on Pipeline Performance
• Consider a linear pipeline of 5 stages:
Fetch Instruction → Decode → Fetch Operands → Execute → Store Results
Overlapped execution of instructions without branching
[Figure: space-time diagram of instructions I1 to I8 overlapped in the 5-stage pipeline]
I5 is a branch instruction
[Figure: space-time diagram of instructions I1 to I8 where I5 is a branch instruction]
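To put numbers on the second diagram, here is a small sketch (the 5-stage pipeline and instructions I1 to I8 come from the figures; the function itself is an illustrative assumption) that counts clock periods, using the fact derived below that each successful branch adds a delay of (n - 1) clock periods:

```python
# Clock periods to execute m instructions on an n-stage linear pipeline.
# Each successful branch costs an extra (n - 1) clock periods to refill
# the pipeline behind it.

def clock_periods(m, n, successful_branches=0):
    without_branching = n + (m - 1)   # first result after n, then 1 per cycle
    return without_branching + successful_branches * (n - 1)


n, m = 5, 8                            # 5-stage pipeline, instructions I1..I8
print(clock_periods(m, n))             # 12 clock periods without branching
print(clock_periods(m, n, 1))          # 16 clock periods if I5 is taken
```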
Estimation of the effect of branching on an n-segment instruction pipeline
Estimation of the effect of branching
• Consider an instruction cycle consisting of n pipeline clock periods.
• Let
– p – probability that an instruction is a conditional branch (20%)
– q – probability that a conditional branch is successful (12% out of the 20%, i.e. q = 12/20 = 0.6)
Estimation of the effect of branching
• Suppose there are m instructions.
• Then the number of successful branches = m × p × q (= m × 0.2 × 0.6).
• A delay of (n - 1)/n of an instruction cycle is required for each successful branch to flush the pipeline.
Estimation of the effect of branching
• Thus, the total number of instruction cycles required for m instructions is
(n + m - 1)/n + mpq(n - 1)/n
Estimation of the effect of branching
• As m becomes large, the average number of instructions per instruction cycle is given by
lim (m → ∞) m / [ (n + m - 1)/n + mpq(n - 1)/n ] = n / (1 + pq(n - 1))
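As a numeric check, a short sketch (n, p and q take the values used in the slides; the function name is an assumption) evaluating this limit:

```python
# Average number of instructions completed per instruction cycle on an
# n-segment pipeline with conditional-branch probability p and
# branch-taken probability q, in the limit of many instructions.

def avg_instructions_per_cycle(n, p, q):
    return n / (1 + p * q * (n - 1))


print(avg_instructions_per_cycle(5, 0.0, 0.6))   # 5.0: no branches, ideal
print(avg_instructions_per_cycle(5, 0.2, 0.6))   # ~3.38: with branching
```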
Estimation of the effect of branching
• When p = 0, the above measure reduces to n, which is the ideal case.
• In reality, it is always less than n.
Solution = ?
Multiple Prefetch Buffers
• Three types of buffers can be used to match the instruction fetch rate to the pipeline consumption rate:
1. Sequential Buffers: for in-sequence pipelining.
2. Target Buffers: hold instructions fetched from a branch target (for out-of-sequence pipelining).
Multiple Prefetch Buffers
• A conditional branch causes both the sequential buffer and the target buffer to fill; based on the branch condition, one of them is selected and the other is discarded (as sketched below).
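A minimal sketch of that selection (the buffer contents and names are illustrative assumptions, not the organization of any particular machine):

```python
# Toy model of sequential vs. target prefetch buffers around a
# conditional branch: both buffers fill while the branch is unresolved;
# once the condition is known, one feeds the pipeline and the other is
# discarded.

sequential_buffer = ["I6", "I7", "I8"]    # fall-through (No) path
target_buffer = ["T1", "T2", "T3"]        # branch-target (Yes) path


def select_path(branch_taken):
    """Pick the buffer that feeds the pipeline; the other is dropped."""
    chosen = target_buffer if branch_taken else sequential_buffer
    return list(chosen)


print(select_path(branch_taken=True))     # ['T1', 'T2', 'T3']
print(select_path(branch_taken=False))    # ['I6', 'I7', 'I8']
```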
Multiple Prefetch Buffers
3. Loop Buffers:
– Hold the sequential instructions within a loop.