Μετατροπή της γλώσσας Ε σε λογικό προγρα
Download
Report
Transcript Μετατροπή της γλώσσας Ε σε λογικό προγρα
LAB3: IMPROVING MIPS
PERFORMANCE WITH
PIPELINING
Electrical and Computer Engineering
University of Cyprus
22-10-2014
Previous Labs: MIPS single-cycle
with Memory System
2
In this Lab: Pipelining
You are expected to:
Understand the concept
Be familiar with the 5 MIPS pipeline stages
Understand the three pipeline hazards and
their solutions.
3
PIPELINING
Technique in which the execution of several
instructions is overlapped.
Each instruction is broken into several stages.
Stages can operate concurrently PROVIDED
WE HAVE SEPARATE RESOURCES FOR
EACH STAGE! => each step uses a different
functional unit.
Note: execution time for a single instruction is
NOT improved. Throughput of several
instructions is improved.
4
Major Pipeline Benefit = Performance
Ideal Performance
Time/instruction = non-piped-time/#stages
This is an asymptote of course, but +10% is
commonly achieved
Two ways to view the performance mechanism
Reduced CPI (i.e. non-piped to piped change)
Close to 1 instruction/cycle if you’re lucky
Reduced cycle-time (i.e. increasing pipeline depth)
Work split into more stages
Simpler stages result in faster clock cycles
5
Other Pipeline Benefits
Completely hardware mechanism
All modern machines are pipelined
This was the key technique to advancing
performance in the 80’s
In the 90’s the move was to multiple
pipelines
Beware, no benefit is totally free/good
6
Laundry analogy to pipelining
7
Implementation of a RISC (Unpipelined,
Multicycle)
Implementation of an integer subset of a RISC
architecture that takes at most 5 clock cycles.
Instruction Fetch (IF)
Instruction Decode/Register Fetch (ID)
Execution/Effective Address Calculation (EX)
Memory Access (MEM)
Write-Back (WB)
8
Review: Single-cycle Datapath for MIPS
Stage 5
PC
Instruction
Memory
(Imem)
Stage 1
Registers
Stage 2
ALU
Stage 3
Data
Memory
(Dmem)
Stage 4
Use datapath figure to represent pipeline
IFtch Dcd Exec Mem WB
Reg
ALU
IM
DM
Reg
9
Classic 5 Stage Pipeline for a RISC
Processor
10
Pipelined Execution Representation
Time
IFtch Dcd Exec Mem WB
IFtch Dcd Exec Mem WB
IFtch Dcd Exec Mem WB
IFtch Dcd Exec Mem WB
IFtch Dcd Exec Mem WB
Program Flow
To simplify pipeline, every instruction takes
same number of steps, called stages
One clock cycle per stage
11
Important Pipeline Characteristics
Latency
Time it takes an instruction to go through the
pipe
Latency = # stages * stage-delay
Dominant feature if there are a lot of
exceptions…
Throughput
Determined by the rate at which instructions
can start/finish
Dominant feature if there are few exceptions
12
Pipelining Lessons
Pipelining doesn’t help latency (execution time) of
single task, it helps throughput of entire workload
Multiple tasks operating simultaneously using
different resources
Potential speedup = Number of pipe stages
Time to “fill” pipeline and time to “drain” it reduces
speedup
Pipeline rate limited by slowest pipeline stage
Unbalanced lengths of pipe stages also reduces
speedup
13
Single Cycle Datapath
4
P
C
Read
Addr
31:0
Instruction
Imem
15:11
a
d
d
25:21
20:16
M
u
x
a
d
d
<<
2
M
u
x
PCSrc
MemWrite
Read
Reg1
Read
Read data1
Reg2
Read
Write
data2
Reg
Write
Data
Regs
RegDst
RegWrite
15:0
Sign
Extend
M
u
x
A
L
U
Read
data
Zero
Address
MemToReg
Dmem
ALUcon
ALUsrc
Write
Data
MemRead
ALUOp
M
u
x
14
Required Changes to Datapath
Introduce registers to separate 5 stages by
putting IF/ID, ID/EX, EX/MEM, and
MEM/WB registers in the datapath.
Next PC value is computed in the 3rd step,
but we need to bring in next instruction in
the next cycle.
Branch address is computed in 3rd stage.
With pipeline, the PC value has changed!
Must carry the PC value along with
instruction.
15
Changes to Datapath (cont.)
For lw instruction, we need write register
address at stage 5. But the IR is now
occupied by another instruction! So, we
must carry the IR destination field as we
move along the stages. See connection in
fig.
16
Pipelined Datapath (with Pipeline Regs)
Fetch
Decode
Execute
Memory
Write
Back
0
M
u
x
1
IF/ID
EX/MEM
ID/EX
MEM/WB
Add
4
Add
Add
result
PC
Ins truction
Shift
left 2
Address
Read
register 1
Read
data 1
Read
register 2
Read
data 2
Write
register
Imem
Write
data
0
M
u
x
1
Regs
Zero
ALU ALU
result
Address
Write
data
16
Sign
extend
32
Read
data
1
M
u
x
0
Dmem
5
64 bits
133 bits
102 bits
69 bits
17
Pipelined Datapath (with Control Signals)
18
Control for Pipelined Datapath
Basic approach: build on single-cycle control
Place control unit in ID stage
Pass control signals to following stages
RegDst
ALUOp[1:0]
ALUSrc
MemRead
MemWrite
Branch
RegWrite
MemtoReg
19
Control for Pipelined Datapath
Execution/Addre ss Calculation M em ory access stage Write-back stage
stage control lines
control lines
control lines
ALU
ALU
M em
M em
Reg
M em to
Instruction Reg Dst Op1
Op0 ALU Src Branch Read
Write
w rite
Reg
R-format
1
1
0
0
0
0
0
1
0
lw
0
0
0
1
0
1
0
1
1
sw
X
0
0
1
0
0
1
0
X
beq
X
0
1
0
1
0
0
0
X
20
Datapath and Control Unit
21
Tracking Control Signals - Cycle 1
LW
22
Tracking Control Signals - Cycle 2
SW
LW
23
Tracking Control Signals - Cycle 3
1
0
01
ADD
SW
LW
24
Tracking Control Signals - Cycle 4
0
0
1
SUB
ADD
SW
LW
25
Tracking Control Signals - Cycle 5
1
1
SUB
ADD
SW
LW
26
Can pipelining get us into trouble?
Yes: Pipeline Hazards
Hazards occur because data required for
executing the current instruction may not be
available.
Structural hazards: attempt to use the same
resource two different ways at the same time
Control hazards: attempt to make a decision
before condition is evaluated
branch instructions
interrupts
27
Pipelining troubles (cont.)
Data hazards occur when an instruction needs
register contents for an arithmetic/
logical/memory instruction, before they are
ready.
instruction depends on result of prior
instruction still in the pipeline
Can always resolve hazards by waiting:
pipeline control must detect the hazard
take action (or delay action) to resolve
hazards
28
Hazard detection & Forwarding Units
29
Forwarding: A solution to Data Hazards
30
Stalls
Forwarding will not always solve the problems of data hazards.
For example, suppose an add instruction follows a load word (lw), and
the add involves the register that receives the memory data.
In this case, forwarding will not work.
The reason is that the data must be read from memory, and so it will
not be available until the end of the MEM cycle. Thus the required data
is not available for a forward, and the add instruction. if it proceeds, will
process the wrong data.
A solution to this problem is the stall.
A stall halts the instruction awaiting data, while the key instruction (a lw
in this case) proceeds to the end of the MEM cycle, after which the
desired data is available to the add.
31
Other Problems With Branches
A remaining problem is what to do about instructions following
a branch.
Even assuming forwarding and stalls, the branch/no branch
decision is not made until the third stage.
This means that in the MIPS pipeline, two following
instructions will enter the pipe before the branch/no branch
decision is made.
What if:
The following instructions were for the case of “branch taken” and the
branch was not taken.
The following instructions were for “branch not taken” and it was taken.
In either case, the wrong instructions are in the pipe and they
must be eliminated (“flushed”).
How can this problem be prevented?
32
Control Hazard Approach
One approach is to always assume the branch is(or is not)
taken:
Say we assume the branch is never taken. Then if the
instruction in ALU/EX is a branch, the instructions in IF and
ID/RF will be those in the “not taken” program line (branch
determination is made in ALU/EX).
It this assumption is correct, the pipeline will continue to flow
without delay.
When the branch is taken, instructions in IF and ID/RF must be
“flushed,” usually by changing the “op” code of those instructions
to a “nop” and letting them proceed to the end of the pipe.
33
Summary
Pipelining is a fundamental concept in
computers/nature
Multiple instructions in flight
Limited by length of longest stage
Latency vs.Throughput
Hazards gum up the works
34