Introduction - AWARDSPACE.COM
Basic MIPS Architecture:
Single-Cycle Datapath and Control
Chapter 4
Sections 4.1 – 4.4
Appendix D.1 and D.2
Dr. Iyad F. Jafar
Outline
Introduction
Clocking
Single-cycle Datapath
Single-cycle Control
Performance Analysis
Introduction
So far, we have built a small ALU
ADD, SUB, SLT, AND, OR, …
What about
Memory and registers?
Control operations?
Interpreting (decoding) instructions?
The big picture
The CPU’s datapath deals with moving data around
The CPU’s control manages the data
[Figure: generic implementation as a repeating cycle: Fetch (PC = PC + 4), Decode, Execute]
Clocking
The clocking methodology defines when signals can
be read and when they are written
An edge-triggered methodology
Typical execution
read contents of state elements
send values through combinational logic
write results to one or more state elements
[Figure: a state element feeds combinational logic, which feeds a second state element; the clock waveform marks one clock cycle between edges]
This assumes state elements are written on every clock cycle; if not, an explicit write control signal is needed
The write occurs only when both the write control is asserted and the clock edge occurs
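The write rule above can be sketched in a few lines of Python. This is only an illustrative model, not part of the original slides; the class name StateElement and the method clock_edge are hypothetical.

```python
class StateElement:
    """Hypothetical model of an edge-triggered state element."""

    def __init__(self, value=0):
        self.value = value

    def clock_edge(self, next_value, write_enable=True):
        # The stored value changes only when the write control is
        # asserted at the moment the clock edge occurs.
        if write_enable:
            self.value = next_value
        return self.value


reg = StateElement(5)
reg.clock_edge(9, write_enable=False)  # edge without write: value is kept
kept = reg.value
reg.clock_edge(9, write_enable=True)   # edge with write: value is updated
updated = reg.value
```

A state element that is written on every cycle, such as the PC, corresponds to leaving write_enable at its default.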
Single-Cycle Datapath
The first implementation considered
All instructions start and finish execution in one cycle!
This includes the time required to fetch, decode, and execute the instruction
In the following, we will consider the datapath of each of these steps
Single-Cycle Datapath
Fetch Datapath
Fetching the instruction from memory requires
Sending the PC to memory to read the instruction
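Updating the PC to point to the next instruction
[Figure: fetch datapath. The PC drives the Read Address input of the Instruction Memory, which outputs the instruction; an adder computes PC + 4 and feeds it back to the PC]
Do we need an explicit write signal for writing the PC?
Do we need an explicit read signal for reading the memory?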
Single-Cycle Datapath
Decode Datapath
Regardless of the instruction
Send the opcode (31-26) and the function (5-0) fields of the
instruction to the control unit
Read two registers; rs (25-21) and rt (20-16)
Reading is not harmful!
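[Figure: decode datapath. The instruction feeds the Control Unit and the Read Addr 1 and Read Addr 2 inputs of the Register File, which outputs Read Data 1 = R[rs] and Read Data 2 = R[rt]; the Write Addr and Write Data inputs are also shown]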
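The field positions used in the decode step can be illustrated with a small Python sketch. The function name decode_fields is hypothetical; the bit ranges are the ones quoted in these slides (opcode 31-26, rs 25-21, rt 20-16, rd 15-11, offset 15-0, function 5-0).

```python
def decode_fields(instr):
    """Split a 32-bit MIPS instruction word into its fields (illustrative)."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # bits 31-26
        "rs": (instr >> 21) & 0x1F,      # bits 25-21
        "rt": (instr >> 16) & 0x1F,      # bits 20-16
        "rd": (instr >> 11) & 0x1F,      # bits 15-11
        "imm16": instr & 0xFFFF,         # bits 15-0
        "funct": instr & 0x3F,           # bits 5-0
    }


# add $t0, $t1, $t2 encodes as 0x012A4020 (rs = 9, rt = 10, rd = 8)
fields = decode_fields(0x012A4020)
```

All fields are extracted unconditionally, which mirrors the point that reading is harmless: the control unit later decides which of them matter.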
Single-Cycle Datapath
Inside the Register File
How can we read a register out of 32 registers?
[Figure: register file read ports. The outputs of Register 0 to Register 31 feed two 32-to-1 multiplexers; Read Register 1 selects Read Data 1 and Read Register 2 selects Read Data 2]
Single-Cycle Datapath
Inside the Register File
How can we write a register out of 32 registers?
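[Figure: register file write port. A 5-to-32 decoder on the write Register Number, combined with the Write signal and the Clock, drives the C input of one of Register 0 to Register 31; Write Data feeds the D input of every register]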
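A minimal Python sketch of the register file's behavior, assuming two read ports and one write port as in the slides. The class and method names are hypothetical, and the hardwired-zero behavior of MIPS register 0 is deliberately not modeled.

```python
class RegisterFile:
    """Illustrative model: 32 registers, two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_addr1, read_addr2):
        # Each read port acts like a 32-to-1 multiplexer selected by
        # a 5-bit register number.
        return self.regs[read_addr1], self.regs[read_addr2]

    def clock_edge(self, write_addr, write_data, reg_write):
        # The decoder selects one register; it is updated only when
        # the write signal is asserted at the clock edge.
        if reg_write:
            self.regs[write_addr] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(8, 42, reg_write=False)
before = rf.read(8, 0)
rf.clock_edge(8, 42, reg_write=True)
after = rf.read(8, 8)
```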
Single-Cycle Datapath
Execution Datapath
R-type instructions (ADD, SUB, SLT, AND, OR)
The two registers are read already!
Perform operation based on OPCODE and FUNC fields
Store the result back into the register file (the destination register is specified in the rd field of the instruction, bits 15-11)
[Figure: R-type execution datapath. Read Data 1 (R[rs]) and Read Data 2 (R[rt]) feed the ALU, which is driven by ALU Control; the ALU result returns to the Write Data input of the Register File, with RegWrite enabling the write]
The register file is not written on every cycle! Need an
explicit write signal
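The five R-type operations listed above can be sketched as a single Python function on 32-bit values. This is an illustration only; the function names and the use of operation names as strings are assumptions, not the hardware encoding.

```python
MASK = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def alu(operation, a, b):
    """Illustrative 32-bit ALU for AND, OR, ADD, SUB, and SLT."""
    if operation == "and":
        return a & b & MASK
    if operation == "or":
        return (a | b) & MASK
    if operation == "add":
        return (a + b) & MASK
    if operation == "sub":
        return (a - b) & MASK
    if operation == "slt":
        # Set on less than uses a signed comparison.
        return 1 if to_signed(a) < to_signed(b) else 0
    raise ValueError("unsupported operation: " + operation)


result = alu("add", 7, 5)
```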
Single-Cycle Datapath
Execution Datapath
Load Instruction
Compute the load address
Store the loaded data in the register file. The destination
register is the rt field of the instruction (20-16)
[Figure: load datapath. R[rs] and the sign-extended offset feed the ALU; the ALU result drives the Address input of the Data Memory; the memory's Read Data output goes to the Write Data input of the Register File. Control signals shown: RegWrite, ALU Control, MemRead, MemWrite]
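The load address computation can be sketched as follows, assuming the address is the base register R[rs] plus the sign-extended 16-bit offset, as the Sign Ext. block in the figure suggests. The function names are hypothetical.

```python
def sign_extend16(imm16):
    """Sign-extend a 16-bit field to a Python integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def effective_address(base, imm16):
    # Address = R[rs] + sign-extended offset, wrapped to 32 bits.
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF


addr = effective_address(0x1000, 0xFFFC)  # offset field 0xFFFC means -4
```

The same computation gives the address for a store; only the direction of the memory transfer differs.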
Single-Cycle Datapath
Execution Datapath
Store Instruction
Compute the store address
Store the rt register in memory
[Figure: store datapath. R[rs] and the sign-extended offset feed the ALU to form the Data Memory Address; R[rt] (Read Data 2) goes to the memory's Write Data input. Control signals shown: RegWrite, ALU Control, MemRead, MemWrite]
Single-Cycle Datapath
Execution Datapath
Branch Instruction
Compare the two registers
Compute the branch address
Change PC if true!
[Figure: branch datapath. The ALU compares Read Data 1 and Read Data 2 and produces Zero; one adder computes PC + 4 and a second adder adds the sign-extended offset multiplied by 4 to form the branch address; a multiplexer selects PC + 4 (input 0) or the branch address (input 1) as the next PC]
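A short Python sketch of the next-PC selection for BEQ, following the adders and the x4 block in the figure: the branch address is taken to be PC + 4 plus the sign-extended offset times 4. The function names are hypothetical.

```python
MASK = 0xFFFFFFFF


def sign_extend16(imm16):
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc_beq(pc, rs_val, rt_val, imm16):
    """Return the next PC for a BEQ instruction (illustrative)."""
    pc_plus_4 = (pc + 4) & MASK
    # The ALU subtracts the operands; Zero is asserted when they are equal.
    zero = ((rs_val - rt_val) & MASK) == 0
    branch_address = (pc_plus_4 + (sign_extend16(imm16) << 2)) & MASK
    # The multiplexer picks the branch address only when the test is true.
    return branch_address if zero else pc_plus_4


taken = next_pc_beq(0x100, 3, 3, 2)
not_taken = next_pc_beq(0x100, 3, 4, 2)
```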
Single-Cycle Datapath
Execution Datapath
Jump Instruction
Compute the jump address
Store it in the PC
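[Figure: jump datapath. The address field of the instruction read from the Instruction Memory is multiplied by 4 to form the jump address; a multiplexer controlled by Jump selects PC + 4 (input 0) or the jump address (input 1) as the next PC]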
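The jump address computation can be sketched in Python. The sketch assumes the usual MIPS rule, also visible in the complete datapath figure, that the 26-bit field Instr[25-0] is shifted left by 2 and combined with the upper four bits PC[31-28] of PC + 4. The function name is hypothetical.

```python
def jump_target(pc, instr):
    """Return the target of a J instruction (illustrative)."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    addr26 = instr & 0x03FFFFFF          # Instr[25-0]
    # Upper 4 bits come from PC + 4; the low 28 bits are the field times 4.
    return (pc_plus_4 & 0xF0000000) | (addr26 << 2)


# 0x08100004 is a J instruction (opcode 000010) with address field 0x0100004
target = jump_target(0x00400000, 0x08100004)
```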
Single-Cycle Datapath
Creating the Single Datapath
Assemble the datapath segments and add control
lines and multiplexors as needed
Single cycle design
Fetch, decode, and execute each instruction in one clock cycle
No datapath resource can be used more than once per
instruction, so some must be duplicated (e.g., separate
Instruction Memory and Data Memory, several adders)
Multiplexors needed at the input of shared elements with
control lines to do the selection
Write signals to control writing to the Register File and
Data Memory
Cycle time is determined by length of the longest path
Single-Cycle Datapath
[Figure: complete single-cycle datapath with control. The Control Unit decodes Instr[31-26] and drives RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite. Instr[25-21] and Instr[20-16] select the registers to read; a multiplexer chooses Instr[20-16] or Instr[15-11] as the Write Addr; Instr[15-0] is sign-extended from 16 to 32 bits; Instr[5-0] feeds the ALU control; Instr[25-0] shifted left 2 is combined with PC[31-28] to form the jump address; PCSrc selects between PC + 4 and the branch address; the ALU also outputs zero and ovf]
Single-Cycle Control
Need to design the control that generates the
appropriate control signals based on the Opcode and
Function fields to
Specify the operation of the ALU
Control the data flow by selecting the appropriate input of the
multiplexors
With the following observations across different
instructions
Op field is always in bits 31-26 of the instruction
Address of registers to be read are always specified by
The rs field (bits 25-21)
The rt field (bits 20-16)
For LW and SW, the rs field is the base register
Address of register to be written is in one of two places
For LW, the address is the rt field (bits 20-16 )
For R-type, the address is the rd field (bits 15-11)
Offset for BEQ, LW, and SW is always in bits 15-0 of the instruction
Single-Cycle Control
Signal Name: Effect when Deasserted (0) / Effect when Asserted (1)
RegDst
Deasserted (0): The destination register is from the rt field
Asserted (1): The destination register is from the rd field
RegWrite
Deasserted (0): None
Asserted (1): Enable writing to the register selected by the Write register port
ALUSrc
Deasserted (0): The second ALU operand comes from the second register file output
Asserted (1): The second ALU operand is the sign-extended offset
PCSrc
Deasserted (0): PC value is PC+4
Asserted (1): PC is the branch address
MemRead
Deasserted (0): None
Asserted (1): Contents of memory address are put on Read data output
MemWrite
Deasserted (0): None
Asserted (1): Data on the Write data input is placed in the specified address
MemtoReg
Deasserted (0): The data fed to the register file Write data input comes from the ALU
Asserted (1): The data fed to the register file Write data input comes from memory
ALUOp
Used with the function field of the instruction to generate the ALU control signals that specify the ALU operation
R-type Instruction Data/Control Flow
[Figure: the complete single-cycle datapath, repeated to trace the data and control flow of an R-type instruction]
Load Word Instruction Data/Control Flow
[Figure: the complete single-cycle datapath, repeated to trace the data and control flow of a load word instruction]
Branch Equal Instruction Data/Control Flow
[Figure: the complete single-cycle datapath, repeated to trace the data and control flow of a branch equal instruction]
Jump Instruction Data/Control Flow
[Figure: the complete single-cycle datapath, repeated to trace the data and control flow of a jump instruction]
Single-Cycle Control
The Main Control Unit
The input is the Op field (6 bits) from the instruction
The output is nine control signals
The truth table !
          Inputs                      Outputs
Instr.    Op5 Op4 Op3 Op2 Op1 Op0     RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch ALUOp1 ALUOp0
R-type     0   0   0   0   0   0        1      0       0        1        0       0       0      1      0
LW         1   0   0   0   1   1        0      1       1        1        1       0       0      0      0
SW         1   0   1   0   1   1        X      1       X        0        0       1       0      0      0
BEQ        0   0   0   1   0   0        X      0       X        0        0       0       1      0      1
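The truth table can be written down directly as a lookup from opcode to control signals. The Python sketch below is only an illustration of the table, not a hardware description; the names MAIN_CONTROL and main_control are hypothetical, and None stands for a don't-care (X) entry.

```python
# Opcode -> control signals, copied row by row from the truth table.
MAIN_CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),  # R-type
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),  # LW
    0b101011: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),  # SW
    0b000100: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),  # BEQ
}


def main_control(opcode):
    """Return the control signals for a 6-bit opcode."""
    return MAIN_CONTROL[opcode]


lw_signals = main_control(0b100011)
```

In hardware the same mapping is realized with combinational logic, which is why the don't-care entries are useful: they leave the logic free to output either value.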
Single-Cycle Control
The Main Control Unit
To design the logic circuit, generate the appropriate
minterms for each output signal
Simply, use a PLA!
Single-Cycle Control
The ALU Control Unit
It has two inputs
ALUop (2 bits) from Main control
Func (6 bits) from the instruction
It has two outputs
Bnegate (1 bit)
Operation (2 bits)
Supported Operations
Function   Bnegate   Operation
and        0         00
or         0         01
add        0         10
sub        1         10
slt        1         11
[Figure: ALU control block with inputs ALUop and Func and outputs Bnegate and Operation]
Single-Cycle Control
The ALU Control Unit
Truth Table !
         Inputs                                    Outputs
Instr.   ALUop1 ALUop0   F5 F4 F3 F2 F1 F0         Bnegate Operation1 Operation0
AND        1      0       1  0  0  1  0  0            0        0          0
OR         1      0       1  0  0  1  0  1            0        0          1
ADD        1      0       1  0  0  0  0  0            0        1          0
SUB        1      0       1  0  0  0  1  0            1        1          0
SLT        1      0       1  0  1  0  1  0            1        1          1
LW         0      0       n/a                         0        1          0
SW         0      0       n/a                         0        1          0
BEQ        0      1       n/a                         1        1          0
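The ALU control truth table can be sketched as a small Python function that returns the pair (Bnegate, Operation). The function name alu_control is hypothetical; the values are the ones in the table.

```python
# Function field -> (Bnegate, Operation) for R-type instructions.
R_TYPE = {
    0b100100: (0, 0b00),  # AND
    0b100101: (0, 0b01),  # OR
    0b100000: (0, 0b10),  # ADD
    0b100010: (1, 0b10),  # SUB
    0b101010: (1, 0b11),  # SLT
}


def alu_control(alu_op, funct):
    """Return (bnegate, operation) for a 2-bit ALUop and 6-bit Func."""
    if alu_op == 0b00:
        return (0, 0b10)   # LW and SW: add, Func is ignored
    if alu_op == 0b01:
        return (1, 0b10)   # BEQ: subtract, Func is ignored
    return R_TYPE[funct]   # ALUop = 10: decode the function field


sub_ctrl = alu_control(0b10, 0b100010)
```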
Single-Cycle Control
The ALU Control Unit
Hardware Implementation
Generating minterms!! Minimization!!
By inspection!
Performance Analysis
All instructions have to finish in one cycle!
How long is the cycle time?
Different units are used in different instructions
Each unit has its own delay
Need to find the longest path!
Assume the following times
Unit             Delay
ALU              2 ns
Memory           2 ns
Register File    1 ns

Instruction   Path                                                                Total
R-type        Instr. Fetch + Register Read + ALU + Register Write                 6 ns
LW            Instr. Fetch + Register Read + ALU + Memory Read + Register Write   8 ns
SW            Instr. Fetch + Register Read + ALU + Memory Write                   7 ns
Branch        Instr. Fetch + Register Read + ALU                                  5 ns
Jump          Instr. Fetch                                                        2 ns
Thus, the cycle time should be at least 8 ns
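The longest-path calculation can be reproduced with a short Python sketch using the assumed unit delays above. The dictionary names and unit labels are hypothetical.

```python
# Unit delays in ns; every memory access (fetch, read, write) costs 2 ns.
DELAY_NS = {"instr_fetch": 2, "reg_read": 1, "alu": 2,
            "mem_access": 2, "reg_write": 1}

# Units on the path of each instruction class.
PATHS = {
    "R-type": ["instr_fetch", "reg_read", "alu", "reg_write"],
    "LW": ["instr_fetch", "reg_read", "alu", "mem_access", "reg_write"],
    "SW": ["instr_fetch", "reg_read", "alu", "mem_access"],
    "Branch": ["instr_fetch", "reg_read", "alu"],
    "Jump": ["instr_fetch"],
}

latency = {name: sum(DELAY_NS[u] for u in units)
           for name, units in PATHS.items()}

# The single-cycle clock must cover the slowest instruction.
cycle_time = max(latency.values())
```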
Performance Analysis
The cycle time is fixed!
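However, not all instructions require the same time, so part of the cycle is wasted for the shorter instructions
[Figure: timing diagram over Cycle 1 and Cycle 2 of the clock; LW fills its cycle, while SW finishes early and the rest of its cycle is marked as waste]
Possible solution?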
Performance Analysis
Example 1. Consider the following two implementations of a single-cycle machine:
Machine A: all instructions execute in one cycle of fixed length
Machine B: all instructions execute in one cycle; however, the cycle time adapts to the instruction type
Use the information given in the tables to compare the two machines

Instruction type   Percentage (%)
ALU                45
Load               25
Store              10
Branch             15
Jump               5

Unit               Time (ps)
Memory             200
ALU and adders     100
Register File      50
Performance Analysis
Example 1. Continued.
CPU Execution Time = IC x CPI x Clock cycle time
CPI_A = CPI_B = 1
IC_A = IC_B
CC_A = 600 ps (the time of the slowest instruction, Load)

Instruction   Inst.    Register   ALU   Data     Register   Total
Type          Memory   Read             Memory   Write      (ps)
R-type        200      50         100   0        50         400
Load          200      50         100   200      50         600
Store         200      50         100   200      -          550
Branch        200      50         100   0        -          350
Jump          200      -          -     -        -          200

CC_B = 600 x 0.25 + 550 x 0.1 + 400 x 0.45 + 350 x 0.15 + 200 x 0.05 = 447.5 ps
Performance_B / Performance_A = CC_A / CC_B = 600 / 447.5 = 1.34
So, the adaptive clock cycle is faster; however, it is hard to implement!
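The arithmetic of Example 1 can be checked with a short Python sketch. The variable names are hypothetical; the instruction mix and per-instruction times are the ones given in the tables.

```python
# Instruction mix (fractions) and per-instruction times in ps.
MIX = {"R-type": 0.45, "Load": 0.25, "Store": 0.10,
       "Branch": 0.15, "Jump": 0.05}
TIME_PS = {"R-type": 400, "Load": 600, "Store": 550,
           "Branch": 350, "Jump": 200}

# Machine A: fixed cycle, set by the slowest instruction.
cc_a = max(TIME_PS.values())

# Machine B: the average cycle is the mix-weighted instruction time.
cc_b = sum(MIX[k] * TIME_PS[k] for k in MIX)

# With equal IC and CPI, the performance ratio is the cycle-time ratio.
speedup = cc_a / cc_b
```

Because IC and CPI are the same for both machines, they cancel and only the clock cycle times enter the comparison.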
Single Cycle Disadvantages & Advantages
Single-cycle implementation assumes that all instructions can execute in one cycle
Advantages
Simple and easy to understand
Disadvantages
Hardware duplication!
Uses the clock cycle inefficiently – the clock cycle must
be timed to accommodate the slowest instruction
(especially problematic for more complex instructions like
floating point multiply)