Major CPU Design Steps
Datapath
1. Analyze instruction set operations using independent RTN:
   ISA => RTN => datapath requirements.
   – This provides the required datapath components and how they are
     connected to meet ISA requirements.
2. Select required datapath components and connections, and
   establish the clock methodology (e.g. clock edge-triggered).
   + Determine the number of cycles per instruction and the operations in each cycle.
3. Assemble a datapath meeting the requirements.
Control
4. Identify and define the function of all control points or
   signals needed by the datapath.
   – Analyze the implementation of each instruction to determine the setting of
     the control points that affect its operations and register transfers.
5. Design and assemble the control logic.
   – Hard-wired: finite-state machine implementation.
   – Microprogrammed, i.e. using a control program
     (3rd Edition Chapter 5.5 – See Handout – Not in 4th Edition).
EECC550 - Shaaban
#1 Lec # 5 Winter 2009 1-5-2010
Single Cycle MIPS Datapath:
[Figure: single-cycle MIPS datapath with PC, instruction memory, a register file of 32 32-bit registers (ports Ra, Rb, Rw; buses busA, busB, busW), extender, main ALU with ALU control (driven by the function field), data memory, and adders for PC+4 and the branch target. Control signals shown: RegDst, RegWr, ExtOp, ALUSrc, ALUop (2 bits), MemWr, MemtoReg, Branch, PCSrc.]
CPI = 1, Long Clock Cycle
Jump Not Included
(Includes ORI, which is not in the book version)
T = I x CPI x C
Single Cycle MIPS Datapath Extended To Handle Jump with Control Unit Added
(Figure 5.24 page 314)
[Figure: single-cycle datapath with the control unit added. The control unit decodes the opcode, Instruction [31–26], and generates RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite. The jump address [31–0] is formed from Instruction [25–0] shifted left 2 (26 bits to 28 bits) concatenated with PC + 4 [31–28]. Register fields: rs = Instruction [25–21], rt = Instruction [20–16], rd = Instruction [15–11], imm16 = Instruction [15–0] (sign extended from 16 to 32 bits); ALU control takes the function field, Instruction [5–0].]
Book figure may have an error!
In this book version, ORI is not supported, so no zero extension of the immediate is needed.
ALUOp (2 bits):
00 = add
01 = subtract
10 = R-Type
Drawbacks of Single-Cycle Processor
1. Long cycle time (CPI = 1):
   – All instructions must take as much time as the slowest:
     • Cycle time for load is longer than needed for all other instructions.
   – Real memory is not as well-behaved as idealized memory:
     • Cannot always complete data access in one (short) cycle.
2. Impossible to implement complex, variable-length instructions and
   complex addressing modes in a single cycle.
     • e.g. indirect memory addressing.
3. High and duplicate hardware resource requirements:
   – No hardware functional unit can be used more than once in
     a single cycle (e.g. ALUs).
4. Cannot pipeline (overlap) the processing of one instruction with the
   previous instructions.
   – (instruction pipelining, chapter 6).
Abstract View of Single Cycle CPU
[Figure: abstract single-cycle CPU. Blocks in order: Next PC / PC, Instruction Fetch (2 ns), Register Fetch (1 ns), Ext, ALU (2 ns), Data Mem / Mem Access (2 ns), Result Store / Reg. Wrt (1 ns). Main Control (driven by op) and ALU control (driven by fun) generate RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd and MemWr; Equal feeds Branch, Jump.]
One CPU Clock Cycle Duration C = 8 ns
One instruction per cycle, CPI = 1
Assuming the following datapath/control hardware component delays:
Memory Units: 2 ns
ALU and adders: 2 ns
Register File: 1 ns
Control Unit: < 1 ns
Single Cycle Instruction Timing
Arithmetic & Logical:  PC | Inst Memory | Reg File | mux | ALU | mux | setup
Load:                  PC | Inst Memory | Reg File | mux | ALU | Data Mem | mux | setup   (Critical Path)
Store:                 PC | Inst Memory | Reg File | mux | ALU | Data Mem
Branch:                PC | Inst Memory | Reg File | cmp | mux
Critical Path: Load (e.g. 8 ns). It determines the CPU clock cycle, C.
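The per-class timing paths can be checked with a few lines of Python. This is only a sketch under the delays assumed on these slides (memory 2 ns, ALU 2 ns, register file 1 ns); mux, setup and control delays are ignored, the branch comparison is assumed to cost one ALU delay, and the names `DELAY_NS`, `PATHS`, `path_delay_ns` and `critical_path` are illustrative, not part of the lecture.

```python
# Component delays assumed on the slides, in ns (mux/setup/control ignored).
DELAY_NS = {"inst_mem": 2, "reg_read": 1, "alu": 2, "data_mem": 2, "reg_write": 1}

# Major units on each instruction class's path, in order.
# Assumption: the branch compare ("cmp") is charged one ALU delay.
PATHS = {
    "arith_logic": ["inst_mem", "reg_read", "alu", "reg_write"],
    "load": ["inst_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "store": ["inst_mem", "reg_read", "alu", "data_mem"],
    "branch": ["inst_mem", "reg_read", "alu"],
}


def path_delay_ns(instr_class):
    """Sum the unit delays along one instruction class's path."""
    return sum(DELAY_NS[unit] for unit in PATHS[instr_class])


def critical_path():
    """Return (class, delay) of the slowest path; it sets the clock cycle C."""
    slowest = max(PATHS, key=path_delay_ns)
    return slowest, path_delay_ns(slowest)


if __name__ == "__main__":
    for name in PATHS:
        print(f"{name:12s} {path_delay_ns(name)} ns")
    print("critical path:", critical_path())
```

Under these assumptions the load path is the longest, which matches the 8 ns cycle quoted on the slide.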
Clock Cycle Time & Critical Path
[Figure: clock waveform showing one CPU clock cycle, duration C = 8 ns here; the critical path (i.e. the longest delay) is LW in this case.]
• Critical path: the slowest path between any two storage devices.
• Clock cycle time is a function of the critical path, and must be greater than:
  – Clock-to-Q + Longest Delay Path through the Combinational Logic + Setup + Clock Skew
Assuming the following datapath/control hardware component delays:
Memory Units: 2 ns
ALU and adders: 2 ns
Register File: 1 ns
Control Unit: < 1 ns
Reducing Cycle Time: Multi-Cycle Design
• Cut the combinational dependency graph by inserting registers / latches.
• The same work is done in two or more shorter cycles, rather than one long cycle.
[Figure: one long cycle (e.g. CPI = 1): storage element -> acyclic combinational logic -> storage element. => Two shorter cycles (e.g. CPI = 2): storage element -> acyclic combinational logic (A) in Cycle 1 -> storage element -> acyclic combinational logic (B) in Cycle 2 -> storage element.]
Storage element: register or memory.
Place registers to:
• Get a balanced clock cycle length
• Save any results needed for the remaining cycles
Basic MIPS Instruction Processing Steps
Instruction Fetch: Obtain instruction from program storage (instruction memory).
    Instruction ← Mem[PC]
Next Instruction: Update program counter to address of next instruction.
    PC ← PC + 4
(Instruction Fetch and Next Instruction are common steps for all instructions.)
Instruction Decode: Determine instruction type (done by Control Unit); obtain operands from registers.
Execute: Compute result value or status.
Result Store: Store result in register/memory if needed (usually called Write Back).
Partitioning The Single Cycle Datapath
Add registers between steps to break into cycles:
1. Instruction Fetch Cycle (IF): 2 ns
2. Instruction Decode Cycle (ID), operand fetch: 1 ns
3. Execution Cycle (EX): 2 ns
4. Data Memory Access Cycle (MEM): 2 ns
5. Write back Cycle (WB), result store: 1 ns
Place registers to:
• Get a balanced clock cycle length
• Save any results needed for the remaining cycles
[Figure: the single-cycle datapath (Next PC / PC, Instruction Fetch, Operand Fetch, Ext, Exec, Mem Access / Data Mem, Result Store / Reg. File) cut into the five cycles above. Control signals shown: ExtOp, ALUSrc, ALUctr, MemRd, MemWr, RegDst, RegWr; the instruction goes to the Control Unit; Branch, Jump feed Next PC.]
Example Multi-cycle Datapath
[Figure: the single-cycle datapath partitioned into five stages by the registers IR, A, B, R and M. Control signals shown: ExtOp, ALUSrc, ALUctr, MemRd, MemWr, MemToReg, RegDst, RegWr; the instruction goes to the Control Unit; Equal feeds Branch, Jump and Next PC.]
1. Instruction Fetch (IF): 2 ns
2. Instruction Decode (ID): 1 ns
3. Execution (EX): 2 ns
4. Memory (MEM): 2 ns
5. Write Back (WB): 1 ns
All clock-edge triggered (register write enable control lines not shown).
Registers added:
IR: Instruction register
A, B: Two registers to hold operands read from register file.
R: or ALUOut, holds the output of the main ALU
M: or Memory data register (MDR), to hold data read from data memory
CPU Clock Cycle Time: Worst cycle delay = C = 2 ns
Thus Clock Rate: f = 1 / 2 ns = 500 MHz (ignoring MUX, CLK-Q delays)
Assuming the following datapath/control hardware component delays:
Memory Units: 2 ns
ALU and adders: 2 ns
Register File: 1 ns
Control Unit: < 1 ns
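The cycle time and clock rate above follow from taking the slowest stage. A minimal Python sketch, assuming the per-stage delays shown on the slide and ignoring MUX and CLK-Q delays as the slide does (the names `STAGE_DELAY_NS`, `cycle_time_ns` and `clock_rate_mhz` are illustrative):

```python
# Per-stage delays from the example multi-cycle datapath, in ns.
STAGE_DELAY_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}


def cycle_time_ns(stage_delays):
    """The clock cycle must cover the slowest stage."""
    return max(stage_delays.values())


def clock_rate_mhz(cycle_ns):
    """f = 1 / C; with C in ns, 1000 / C gives MHz."""
    return 1000.0 / cycle_ns


if __name__ == "__main__":
    c = cycle_time_ns(STAGE_DELAY_NS)
    print(f"C = {c} ns, f = {clock_rate_mhz(c)} MHz")
```

Note that the 1 ns stages (ID, WB) still occupy a full 2 ns cycle, which is why balancing the stage lengths matters.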
Operations (Dependent RTN) for Each Cycle
IF (Instruction Fetch), all instructions:
    IR ← Mem[PC]
ID (Instruction Decode), all instructions:
    A ← R[rs]
    B ← R[rt]
EX (Execution):
    R-Type:           R ← A funct B
    Logic Immediate:  R ← A OR ZeroExt[imm16]
    Load:             R ← A + SignEx(Im16)
    Store:            R ← A + SignEx(Im16)
    Branch:           Zero ← A - B
                      If Zero = 1: PC ← PC + 4 + (SignExt(imm16) x4)
                      else (i.e. Zero = 0): PC ← PC + 4
MEM (Memory):
    Load:             M ← Mem[R]
    Store:            Mem[R] ← B;  PC ← PC + 4
WB (Write Back):
    R-Type:           R[rd] ← R;  PC ← PC + 4
    Logic Immediate:  R[rt] ← R;  PC ← PC + 4
    Load:             R[rt] ← M;  PC ← PC + 4
Instruction Fetch (IF) & Instruction Decode (ID) cycles are common for all instructions.
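To make the per-cycle RTN concrete, the sketch below steps a load through the five cycles on a toy machine state. It is an illustration only: registers and memory are plain dictionaries, the instruction fields are passed in already decoded (so the IF step does not really read instruction memory), and `run_load` and `sign_extend16` are hypothetical helper names.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def run_load(regs, mem, pc, rs, rt, imm16):
    """Step `lw rt, imm16(rs)` through IF, ID, EX, MEM, WB.

    Returns (new_pc, trace), where trace lists the cycles taken.
    """
    trace = ["IF"]                # IF:  IR <- Mem[PC] (fields are given pre-decoded)
    a, b = regs[rs], regs[rt]     # ID:  A <- R[rs], B <- R[rt] (B is unused by a load)
    trace.append("ID")
    r = a + sign_extend16(imm16)  # EX:  R <- A + SignEx(Im16)
    trace.append("EX")
    m = mem[r]                    # MEM: M <- Mem[R]
    trace.append("MEM")
    regs[rt] = m                  # WB:  R[rt] <- M; PC <- PC + 4
    pc += 4
    trace.append("WB")
    return pc, trace


if __name__ == "__main__":
    regs = {8: 100, 9: 0}
    mem = {96: 42}
    print(run_load(regs, mem, 0, rs=8, rt=9, imm16=0xFFFC), regs)
```

The intermediate variables `a`, `b`, `r` and `m` play the role of the added registers A, B, R and M that carry values from one cycle to the next.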
MIPS Multi-Cycle Datapath: Five Cycles of Load
Load (CPI = 5):
Cycle 1: IF | Cycle 2: ID | Cycle 3: EX | Cycle 4: MEM | Cycle 5: WB
1- Instruction Fetch (IF): Fetch the instruction from instruction memory.
2- Instruction Decode (ID): Operand register fetch and instruction decode.
3- Execute (EX): Calculate the effective memory address.
4- Memory (MEM): Read the data from the data memory.
5- Write Back (WB): Write the loaded data to the register file. Update PC.
Multi-cycle Datapath Instruction CPI
• R-Type/Immediate: Require four cycles, CPI = 4
  – IF, ID, EX, WB
• Loads: Require five cycles, CPI = 5
  – IF, ID, EX, MEM, WB
• Stores: Require four cycles, CPI = 4
  – IF, ID, EX, MEM
• Branches/Jumps: Require three cycles, CPI = 3
  – IF, ID, EX
• Average or effective program CPI: 3 ≤ CPI ≤ 5,
  depending on program profile (instruction mix).
Single Cycle Vs. Multi-Cycle CPU
Single Cycle Implementation: clock cycle 8 ns (125 MHz)
  Cycle 1: Load
  Cycle 2: Store (part of the cycle is wasted)
Multiple Cycle Implementation: clock cycle 2 ns (500 MHz)
  Load:   Cycles 1-5 (IF, ID, EX, MEM, WB)
  Store:  Cycles 6-9 (IF, ID, EX, MEM)
  R-type: starts in Cycle 10 (IF)
Single-Cycle CPU:
CPI = 1, C = 8 ns, f = 125 MHz
One million instructions take
I x CPI x C = 10^6 x 1 x 8x10^-9 = 8 msec
Multi-Cycle CPU:
CPI = 3 to 5, C = 2 ns, f = 500 MHz
One million instructions take from
10^6 x 3 x 2x10^-9 = 6 msec
to 10^6 x 5 x 2x10^-9 = 10 msec,
depending on instruction mix used.
T = I x CPI x C
Assuming the following datapath/control hardware component delays:
Memory Units: 2 ns
ALU and adders: 2 ns
Register File: 1 ns
Control Unit: < 1 ns
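The comparison above is a direct application of T = I x CPI x C. A short sketch that reproduces the three figures (the function name `exec_time_s` is illustrative; the instruction count and cycle times are the ones used on the slide):

```python
def exec_time_s(instr_count, cpi, cycle_time_s):
    """CPU execution time T = I x CPI x C, in seconds."""
    return instr_count * cpi * cycle_time_s


if __name__ == "__main__":
    n = 10**6  # one million instructions
    single = exec_time_s(n, 1, 8e-9)      # single-cycle: CPI = 1, C = 8 ns
    multi_best = exec_time_s(n, 3, 2e-9)  # multi-cycle, all 3-cycle instructions
    multi_worst = exec_time_s(n, 5, 2e-9) # multi-cycle, all 5-cycle instructions
    print(f"single-cycle: {single * 1e3:.1f} msec")
    print(f"multi-cycle:  {multi_best * 1e3:.1f} to {multi_worst * 1e3:.1f} msec")
```

Whether the multi-cycle design wins therefore depends on the instruction mix: it is faster only when the effective CPI stays below 4 with these cycle times.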
Control Unit Design: Finite State Machine (FSM) Control Model
• State specifies control points (outputs) for Register Transfer.
• Control points (outputs) are assumed to depend only on the current state
  and not inputs (i.e. Moore finite state machine).
• Transfer (register/memory writes) and state transition occur upon exiting
  the state on the falling edge of the clock.
[Figure: Moore finite state machine. The inputs (opcode, conditions) and the current state (held in e.g. flip-flops) feed the next state logic, so the state transition depends on inputs. The output logic derives the outputs (control points), sent to the datapath, from the current state. In control state X the control points carry out its register transfer; control then moves from the last state to the next state.]
Control Specification For Multi-cycle CPU
Finite State Machine (FSM) - State Transition Diagram
“instruction fetch” (start state): IR ← MEM[PC]
“decode / operand fetch”: A ← R[rs]; B ← R[rt]
Execute:
  R-type:       R ← A fun B
  ORi:          R ← A or ZX
  LW:           R ← A + SX
  SW:           R ← A + SX
  BEQ & Zero:   PC ← PC + 4 + SX || 00; to instruction fetch
  BEQ & ~Zero:  PC ← PC + 4; to instruction fetch
Memory:
  LW:           M ← MEM[R]
  SW:           MEM[R] ← B; PC ← PC + 4; to instruction fetch
Write-back:
  R-type:       R[rd] ← R; PC ← PC + 4; to instruction fetch
  ORi:          R[rt] ← R; PC ← PC + 4; to instruction fetch
  LW:           R[rt] ← M; PC ← PC + 4; to instruction fetch
13 states: 4 State Flip-Flops needed
Traditional FSM Controller
[Figure: a state transition table (columns: state, op, cond, next state, control point outputs) implemented as next state logic plus output logic. The 11 inputs are the 6-bit Opcode, Equal and the 4-bit Current State from the state register (4 flip-flops); the outputs (control points) go to the datapath.]
Traditional FSM Controller
datapath + state diagram => control
• Translate RTN statements into control points.
• Assign states.
• Implement the controller.
More on FSM controller implementation in Appendix C
Mapping RTNs To Control Points Examples & State Assignments
State 0 (0000) “instruction fetch”: IR ← MEM[PC]; control points: imem_rd, IRen
State 1 (0001) “decode / operand fetch”: A ← R[rs]; B ← R[rt]; control points: Aen, Ben
Execute:
  State 4 (0100)  R-type:       R ← A fun B; control points: ALUfun, Sen
  State 6 (0110)  ORi:          R ← A or ZX
  State 8 (1000)  LW:           R ← A + SX
  State 11 (1011) SW:           R ← A + SX
  State 2 (0010)  BEQ & Zero:   PC ← PC + 4 + SX || 00; to instruction fetch state 0000
  State 3 (0011)  BEQ & ~Zero:  PC ← PC + 4; to instruction fetch state 0000
Memory:
  State 9 (1001)  LW:           M ← MEM[R]
  State 12 (1100) SW:           MEM[R] ← B; PC ← PC + 4; to instruction fetch state 0000
Write-back:
  State 5 (0101)  R-type:       R[rd] ← R; PC ← PC + 4; control points: RegDst, RegWr, PCen; to instruction fetch state 0000
  State 7 (0111)  ORi:          R[rt] ← R; PC ← PC + 4; to instruction fetch state 0000
  State 10 (1010) LW:           R[rt] ← M; PC ← PC + 4; to instruction fetch state 0000
13 states: 4 State Flip-Flops needed
Detailed Control Specification - State Transition Table
State   Current   Op field   Z   Next   IR en   PC en sel   Ops A B
IF      0000      ??????     ?   0001   1
ID      0001      BEQ        0   0011                       1 1
ID      0001      BEQ        1   0010                       1 1
ID      0001      R-type     x   0100                       1 1
ID      0001      orI        x   0110                       1 1
ID      0001      LW         x   1000                       1 1
ID      0001      SW         x   1011                       1 1
BEQ     0010      xxxxxx     x   0000           1 1
BEQ     0011      xxxxxx     x   0000           1 0
R       0100      xxxxxx     x   0101
R       0101      xxxxxx     x   0000           1 0
ORI     0110      xxxxxx     x   0111
ORI     0111      xxxxxx     x   0000           1 0
LW      1000      xxxxxx     x   1001
LW      1001      xxxxxx     x   1010
LW      1010      xxxxxx     x   0000           1 0
SW      1011      xxxxxx     x   1100
SW      1100      xxxxxx     x   0000           1 0
Remaining output column groups: Exec (Ex, Sr, ALU, S), Mem (R, W, M), Write-Back (M-R, Wr, Dst).
Exec outputs (Ex Sr ALU S): R 0100: 0 1 fun 1; ORI 0110: 0 0 or; LW 1000: 1 0 add 1; SW 1011: 1 0 add 1.
Can be combined in one state
More on FSM controller implementation in Appendix C
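The next-state half of the transition table maps naturally onto a lookup function. The sketch below is one possible encoding, assuming the 4-bit state assignments listed on these slides; it models only the next-state logic (no control outputs), and the names `NEXT_FROM_DECODE`, `SEQUENTIAL`, `next_state` and `cycles` are illustrative.

```python
# Next state out of the decode state (0001), selected by opcode.
NEXT_FROM_DECODE = {"R-type": 0b0100, "ORI": 0b0110, "LW": 0b1000, "SW": 0b1011}

# States whose successor does not depend on the inputs.
SEQUENTIAL = {
    0b0000: 0b0001,  # IF -> ID
    0b0100: 0b0101,  # R-type execute -> write-back
    0b0110: 0b0111,  # ORI execute -> write-back
    0b1000: 0b1001,  # LW execute -> memory
    0b1001: 0b1010,  # LW memory -> write-back
    0b1011: 0b1100,  # SW execute -> memory
}


def next_state(state, op=None, zero=None):
    """Next-state logic: only the decode state looks at the opcode and Z."""
    if state == 0b0001:
        if op == "BEQ":
            return 0b0010 if zero else 0b0011
        return NEXT_FROM_DECODE[op]
    # Every final state of an instruction returns to instruction fetch.
    return SEQUENTIAL.get(state, 0b0000)


def cycles(op, zero=False):
    """Count the states visited by one instruction, starting in fetch."""
    state, count = 0b0000, 1
    while True:
        state = next_state(state, op, zero)
        if state == 0b0000:
            return count
        count += 1


if __name__ == "__main__":
    for op in ("R-type", "ORI", "LW", "SW", "BEQ"):
        print(op, cycles(op))
```

Counting states this way gives the per-instruction CPI directly from the state diagram.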
Alternative Multiple Cycle Datapath (In Textbook)
• Minimizes Hardware: 1 memory, 1 ALU
[Figure: multi-cycle datapath with a single ideal memory (Address, Din, Dout), PC, Instruction Reg, Mem Data Reg, register file (Ra, Rb, Rw, busA, busB, busW), registers A and B, extender for Imm 16, << 2 shifter, one ALU with ALU Control and a Zero output, and ALU Out. Control signals shown: PCWr, PCWrCond, PCSrc, IorD, MemRd, MemWr, IRWr, RegDst, RegWr, MemtoReg, ALUSrcA, ALUSrcB (inputs 0 to 3, with the constant 4 on one input), ALUOp.]
Alternative Multiple Cycle Datapath (In Textbook)
(Figure 5.27 page 322)
• Shared instruction/data memory unit
• A single ALU shared among instructions
• Shared units require additional or widened multiplexors
• Temporary registers to hold data between clock cycles of the instruction:
  • Additional registers:
    Instruction Register (IR), Memory Data Register (MDR), A, B, ALUOut
Alternative Multiple Cycle Datapath With Control Lines
(Figure 5.28 page 323)
[Figure: the multi-cycle datapath with control lines, showing PC, PC+4, the rs, rt, rd and imm16 fields, and the branch target.]
(ORI not supported, Jump supported)
The Effect of The 1-bit Control Signals
(Figure 5.29 page 324)
RegDst
    Deasserted (=0): The register destination number for the write register comes from the rt field (instruction bits 20:16).
    Asserted (=1): The register destination number for the write register comes from the rd field (instruction bits 15:11).
RegWrite
    Deasserted (=0): None
    Asserted (=1): The register on the write register input is written with the value on the Write data input.
ALUSrcA
    Deasserted (=0): The first ALU operand is the PC.
    Asserted (=1): The first ALU operand is register A (i.e. R[rs]).
MemRead
    Deasserted (=0): None
    Asserted (=1): The content of memory specified by the address input is put on the memory data output.
MemWrite
    Deasserted (=0): None
    Asserted (=1): The memory content specified by the address input is replaced by the value on the Write data input.
MemtoReg
    Deasserted (=0): The value fed to the register write data input comes from the ALUOut register.
    Asserted (=1): The value fed to the register write data input comes from the memory data register (MDR).
IorD
    Deasserted (=0): The PC is used to supply the address to the memory unit.
    Asserted (=1): The ALUOut register is used to supply the address to the memory unit.
IRWrite
    Deasserted (=0): None
    Asserted (=1): The output of the memory is written into the Instruction Register (IR).
PCWrite
    Deasserted (=0): None
    Asserted (=1): The PC is written; the source is controlled by PCSource.
PCWriteCond
    Deasserted (=0): None
    Asserted (=1): The PC is written if the Zero output of the ALU is also active.
The Effect of The 2-bit Control Signals
(Figure 5.29 page 324)
Signal Name   Value (Binary)   Effect
ALUOp         00               The ALU performs an add operation
              01               The ALU performs a subtract operation
              10               The funct field of the instruction determines the ALU operation (R-Type)
ALUSrcB       00               The second input of the ALU comes from register B (i.e. R[rt])
              01               The second input of the ALU is the constant 4
              10               The second input of the ALU is the sign-extended 16-bit immediate (imm16) field of the instruction in IR
              11               The second input of the ALU is the sign-extended 16-bit immediate field of IR shifted left 2 bits (for branches)
PCSource      00               Output of the ALU (PC+4) is sent to the PC for writing
              01               The content of ALUOut (the branch target address) is sent to the PC for writing
              10               The jump target address (IR[25:0] shifted left 2 bits and concatenated with PC+4[31:28]), i.e. the jump address, is sent to the PC for writing
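The two non-sequential PCSource values involve a little bit manipulation that is easy to get wrong. The sketch below computes both targets as described above, assuming 32-bit addresses held in Python integers; the function names `branch_target` and `jump_target` are illustrative.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_target(pc_plus_4, imm16):
    """PCSource = 01: PC+4 plus the sign-extended immediate shifted left 2 bits."""
    return (pc_plus_4 + (sign_extend16(imm16) << 2)) & MASK32


def jump_target(pc_plus_4, ir_25_0):
    """PCSource = 10: IR[25:0] shifted left 2 bits, concatenated with PC+4[31:28]."""
    return (pc_plus_4 & 0xF0000000) | ((ir_25_0 & 0x03FFFFFF) << 2)


if __name__ == "__main__":
    print(hex(branch_target(0x00400004, 0xFFFF)))    # branch back by one word
    print(hex(jump_target(0x00400004, 0x0100000)))   # jump within the same 256 MB region
```

Because only 26 bits come from the instruction, a jump cannot leave the 256 MB region selected by the upper four bits of PC+4.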
Operations (Dependent RTN) for Each Cycle
IF (Instruction Fetch), all instructions:
    IR ← Mem[PC]
    PC ← PC + 4
ID (Instruction Decode), all instructions:
    A ← R[rs]
    B ← R[rt]
    ALUout ← PC + (SignExt(imm16) x4)
EX (Execution):
    R-Type:  ALUout ← A funct B
    Load:    ALUout ← A + SignEx(Imm16)
    Store:   ALUout ← A + SignEx(Imm16)
    Branch:  Zero ← A - B
             Zero: PC ← ALUout
    Jump:    PC ← Jump Address
MEM (Memory):
    Load:    MDR ← Mem[ALUout]
    Store:   Mem[ALUout] ← B
WB (Write Back):
    R-Type:  R[rd] ← ALUout
    Load:    R[rt] ← MDR
Instruction Fetch (IF) & Instruction Decode (ID) cycles are common for all instructions.
High-Level View of Finite State Machine Control
(Figure 5.31 page 332)
[Figure: instruction fetch and decode (Figure 5.32, states 0-1), followed by one of: load/store instructions (Figure 5.33, states 2-5), R-type instructions (Figure 5.34, states 6-7), branch instruction (Figure 5.35, state 8), jump instruction (Figure 5.36, state 9); each sequence returns to instruction fetch.]
• First steps are independent of the instruction class.
• Then a series of sequences that depend on the instruction opcode.
• Then the control returns to fetch a new instruction.
• Each box above represents one or several states.
FSM State Transition Diagram (From Book)
(Figure 5.38 page 339)
State 0 (IF):              IR ← Mem[PC]; PC ← PC + 4
State 1 (ID):              A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x4)
State 2 (EX, load/store):  ALUout ← A + SignEx(Imm16)
State 3 (MEM, load):       MDR ← Mem[ALUout]
State 4 (WB, load):        R[rt] ← MDR
State 5 (MEM, store):      Mem[ALUout] ← B
State 6 (EX, R-type):      ALUout ← A func B
State 7 (WB, R-type):      R[rd] ← ALUout
State 8 (EX, branch):      Zero ← A - B; Zero: PC ← ALUout
State 9 (EX, jump):        PC ← Jump Address
Total 10 states
More on FSM controller implementation in Appendix C
Instruction Fetch (IF) and Decode (ID) FSM States
(Figure 5.32 page 333)
IF: IR ← Mem[PC]
    PC ← PC + 4
ID: A ← R[rs]
    B ← R[rt]
    ALUout ← PC + (SignExt(imm16) x4)
From ID, control continues to the states of Figure 5.33, Figure 5.34, Figure 5.35 or Figure 5.36.
Instruction Fetch (IF) Cycle (State 0)
IR ← Mem[PC]
PC ← PC + 4
MemRead = 1
ALUSrcA = 0
ALUSrcB = 01
ALUOp = 00 (add)
IorD = 0
PCWrite = 1
IRWrite = 1
PCSource = 00
[Figure 5.28 page 323: multi-cycle datapath with the control values above marked (ORI not supported, Jump supported).]
Instruction Decode (ID) Cycle (State 1)
A ← R[rs]
B ← R[rt]
ALUout ← PC + (SignExt(imm16) x4)   (calculate branch target)
ALUSrcA = 0
ALUSrcB = 11
ALUOp = 00 (add)
[Figure 5.28 page 323: multi-cycle datapath with the control values above marked (ORI not supported, Jump supported).]
Load/Store Instructions FSM States
(Figure 5.33 page 334)
From Instruction Decode:
EX:           ALUout ← A + SignEx(Imm16)   (i.e. effective address calculation)
MEM (load):   MDR ← Mem[ALUout]
WB (load):    R[rt] ← MDR
MEM (store):  Mem[ALUout] ← B
To Instruction Fetch (Figure 5.32)
Load/Store Execution (EX) Cycle (State 2)
Effective address calculation
ALUout ← A + SignEx(Imm16)
ALUSrcA = 1
ALUSrcB = 10
ALUOp = 00 (add)
[Figure 5.28 page 323: multi-cycle datapath with the control values above marked (ORI not supported, Jump supported).]
Load Memory (MEM) Cycle (State 3)
MDR ← Mem[ALUout]
MemRead = 1
IorD = 1
[Figure 5.28 page 323: multi-cycle datapath with the control values above marked (ORI not supported, Jump supported).]
Load Write Back (WB) Cycle (State 4)
R[rt] ← MDR
RegWrite = 1
MemtoReg = 1
RegDst = 0
[Figure 5.28 page 323: multi-cycle datapath with the control values above marked (ORI not supported, Jump supported).]
Store Memory (MEM) Cycle (State 5)
Mem[ALUout] ← B
MemWrite = 1
IorD = 1
[Figure 5.28 page 323: multi-cycle datapath with the control values above marked (ORI not supported, Jump supported).]
R-Type Instructions FSM States
(Figure 5.34 page 335)
From Instruction Decode:
EX: ALUout ← A funct B
WB: R[rd] ← ALUout
To State 0 (Instruction Fetch) (Figure 5.32)
R-Type Execution (EX) Cycle (State 6)
ALUout ← A funct B
ALUSrcA = 1
ALUSrcB = 00
ALUOp = 10 (R-Type)
[Figure 5.28 page 323: multi-cycle datapath with the control values above marked (ORI not supported, Jump supported).]
R-Type Write Back (WB) Cycle (State 7)
R[rd] ← ALUout
RegWrite = 1
MemtoReg = 0
RegDst = 1
[Figure 5.28 page 323: multi-cycle datapath with the control values above marked (ORI not supported, Jump supported).]
Branch Instruction: Single EX State
Jump Instruction: Single EX State
(Figures 5.35, 5.36 page 337)
Branch, from Instruction Decode:
EX: Zero ← A - B
    Zero: PC ← ALUout
To State 0 (Instruction Fetch) (Figure 5.32)
Jump, from Instruction Decode:
EX: PC ← Jump Address
To State 0 (Instruction Fetch) (Figure 5.32)
Branch Execution (EX) Cycle (State 8)
Zero ← A - B
Zero: PC ← ALUout
ALUSrcA = 1
ALUSrcB = 00
ALUOp = 01 (Subtract)
PCWriteCond = 1
PCSource = 01
[Figure 5.28 page 323: multi-cycle datapath with the control values above marked (ORI not supported, Jump supported).]
Jump Execution (EX) Cycle (State 9)
PC ← Jump Address
PCWrite = 1
PCSource = 10
[Figure 5.28 page 323: multi-cycle datapath with the control values above marked (ORI not supported, Jump supported).]
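The control values listed for states 0 to 9 amount to the output table of the Moore machine: each state fixes a set of control signals. The sketch below collects them in one dictionary and looks up the states each instruction class visits. It is an illustration only; signals not listed for a state are simply omitted (treated as deasserted or don't-care), and the names `CONTROL`, `STATE_SEQUENCE`, `signals_for` and `writes_register` are hypothetical.

```python
# Control outputs per FSM state, as listed on the state slides above.
CONTROL = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 0b01, "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00},  # IF
    1: {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},                    # ID
    2: {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00},                    # load/store EX
    3: {"MemRead": 1, "IorD": 1},                                         # load MEM
    4: {"RegWrite": 1, "MemtoReg": 1, "RegDst": 0},                       # load WB
    5: {"MemWrite": 1, "IorD": 1},                                        # store MEM
    6: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},                    # R-type EX
    7: {"RegWrite": 1, "MemtoReg": 0, "RegDst": 1},                       # R-type WB
    8: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b01,
        "PCWriteCond": 1, "PCSource": 0b01},                              # branch EX
    9: {"PCWrite": 1, "PCSource": 0b10},                                  # jump EX
}

# States visited by each instruction class, in order.
STATE_SEQUENCE = {
    "lw": [0, 1, 2, 3, 4],
    "sw": [0, 1, 2, 5],
    "r-type": [0, 1, 6, 7],
    "beq": [0, 1, 8],
    "j": [0, 1, 9],
}


def signals_for(instr):
    """Return [(state, control outputs)] for each cycle of the instruction."""
    return [(s, CONTROL[s]) for s in STATE_SEQUENCE[instr]]


def writes_register(instr):
    """True if any state of the instruction asserts RegWrite."""
    return any(CONTROL[s].get("RegWrite", 0) for s in STATE_SEQUENCE[instr])


if __name__ == "__main__":
    for state, outputs in signals_for("lw"):
        print(state, outputs)
```

The length of each state sequence is the CPI of that instruction class on this datapath.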
MIPS Multi-cycle Datapath Performance Evaluation
• What is the average CPI?
  – State diagram gives CPI for each instruction type.
  – Workload (program) below gives frequency of each type.
Type          CPIi for type   Frequency   CPIi x freqIi
Arith/Logic   4               40%         1.6
Load          5               30%         1.5
Store         4               10%         0.4
Branch        3               20%         0.6
Average CPI:                              4.1
Better than CPI = 5 if all instructions took the same number
of clock cycles (5).
T = I x CPI x C
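The average CPI in the table is a frequency-weighted sum, which a few lines of Python can reproduce. This is a sketch using the instruction mix from the table above; `MIX` and `average_cpi` are illustrative names.

```python
# (CPI for the type, fraction of executed instructions), from the table above.
MIX = {
    "arith_logic": (4, 0.40),
    "load": (5, 0.30),
    "store": (4, 0.10),
    "branch": (3, 0.20),
}


def average_cpi(mix):
    """Effective CPI = sum over types of CPI_i x frequency_i."""
    return sum(cpi * freq for cpi, freq in mix.values())


if __name__ == "__main__":
    print(f"average CPI = {average_cpi(MIX):.1f}")
```

With the 2 ns cycle of the multi-cycle design, this mix gives about 8.2 ns per instruction on average, slightly worse than the 8 ns single-cycle design for this particular workload.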
Adding Support for swap to Multi Cycle Datapath
• You are to add support for a new instruction, swap, that
  exchanges the values of two registers, to the MIPS multicycle
  datapath of Figure 5.28 on page 323:
      swap $rs, $rt
      R[rt] ← R[rs]
      R[rs] ← R[rt]
• Swap uses the R-Type format with:
  the value of field rs = the value of field rd
• Add any necessary datapaths and control signals to the
  multicycle datapath. Find a solution that minimizes the
  number of clock cycles required for the new instruction without
  modifying the register file (i.e. no additional register write ports).
  Justify the need for the modifications, if any.
• Show the necessary modifications to the multicycle control
  finite state machine of Figure 5.38 on page 339 when adding
  the swap instruction. For each new state added, provide the
  dependent RTN and active control signal values.
Adding swap Instruction Support to Multi Cycle Datapath
swap $rs, $rt
R[rt] ← R[rs]
R[rs] ← R[rt]
We assume here rs = rd in instruction encoding.
Instruction fields: op [31-26], rs [25-21], rt [20-16], rd [15-11]
[Figure: the multi-cycle datapath of Figure 5.28 with the MemtoReg multiplexor widened by two inputs (2 and 3) fed from registers A (R[rs]) and B (R[rt]).]
The outputs of A and B should be connected to the multiplexor controlled by MemtoReg if one of the two fields
(rs and rd) contains the name of one of the registers being swapped. The other register is specified by rt.
The MemtoReg control signal becomes two bits.
Adding swap Instruction Support to Multi Cycle Datapath
FSM of Figure 5.38 with two new states added for swap after ID:
IF:         IR ← Mem[PC]; PC ← PC + 4
ID:         A ← R[rs]; B ← R[rt]; ALUout ← PC + (SignExt(imm16) x4)
WB1 (new):  R[rd] ← B
WB2 (new):  R[rt] ← A
Existing states are unchanged: EX (ALUout ← A + SignEx(Imm16); ALUout ← A func B; Zero ← A - B, Zero: PC ← ALUout), MEM, WB (R[rd] ← ALUout).
Swap takes 4 cycles
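The reason two write-back states suffice is that A and B already hold both old values after ID, so neither write can clobber a value that is still needed. A toy sketch of the four cycles, assuming a dictionary register file and the rs = rd encoding from the slide (`swap_cycles` is a hypothetical name):

```python
def swap_cycles(regs, rs, rt):
    """Step `swap $rs, $rt` through IF, ID, WB1, WB2 and return the cycle trace."""
    rd = rs                       # encoding assumption: field rd equals field rs
    trace = ["IF"]                # IF:  IR <- Mem[PC]; PC <- PC + 4 (not modelled)
    a, b = regs[rs], regs[rt]     # ID:  A <- R[rs]; B <- R[rt]
    trace.append("ID")
    regs[rd] = b                  # WB1: R[rd] <- B (B routed through the MemtoReg mux)
    trace.append("WB1")
    regs[rt] = a                  # WB2: R[rt] <- A (A routed through the MemtoReg mux)
    trace.append("WB2")
    return trace


if __name__ == "__main__":
    regs = {4: 111, 5: 222}
    print(swap_cycles(regs, 4, 5), regs)
```

One register write per cycle is all that is needed, which is why the register file keeps its single write port.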
Adding Support for add3 to Multi Cycle Datapath
• You are to add support for a new instruction, add3, that adds the values of
  three registers, to the MIPS multicycle datapath of Figure 5.28 on page 323.
  For example:
      add3 $s0, $s1, $s2, $s3
  Register $s0 gets the sum of $s1, $s2 and $s3.
  The instruction encoding uses a modified R-format, with an additional register
  specifier rx replacing the five low bits of the “funct” field.
  Field:     OP        rs        rt        rd        Not used   rx
  Bits:      [31-26]   [25-21]   [20-16]   [15-11]   [10-5]     [4-0]
  Width:     6 bits    5 bits    5 bits    5 bits    6 bits     5 bits
  Example:   add3      $s1       $s2       $s0                  $s3
• Add necessary datapath components, connections, and control signals to the multicycle
  datapath without modifying the register bank or adding additional ALUs. Find a solution
  that minimizes the number of clock cycles required for the new instruction. Justify the
  need for the modifications, if any.
• Show the necessary modifications to the multicycle control finite state machine of Figure
  5.38 on page 339 when adding the add3 instruction. For each new state added, provide
  the dependent RTN and active control signal values.
add3 instruction support to Multi Cycle Datapath
add3 $rd, $rs, $rt, $rx
R[rd] ← R[rs] + R[rt] + R[rx]
rx is a new register specifier in field [4-0] of the instruction.
No additional register read ports or ALUs allowed.
Modified R-Format: op [31-26], rs [25-21], rt [20-16], rd [15-11], rx [4-0]
[Figure: the multi-cycle datapath of Figure 5.28 with the new control lines ReadSrc and WriteB, a multiplexor selecting rt or rx for Read Register 2, and ALUout fed back to the first ALU operand multiplexor.]
1. ALUout is added as an extra input to the first ALU operand MUX, to use the previous ALU result as an input for the second addition.
2. A multiplexor should be added to select between rt and the new field rx, which contains the register number of the 3rd operand
   (bits 4-0 of the instruction), as the input for Read Register 2.
   This multiplexor will be controlled by a new one-bit control signal called ReadSrc.
3. A WriteB control line is added to enable writing R[rx] to B.
add3 instruction support to Multi Cycle Datapath
FSM of Figure 5.38 with two new states added for add3 after ID:
IF:         IR ← Mem[PC]; PC ← PC + 4
ID:         A ← R[rs]; B ← R[rt] (WriteB); ALUout ← PC + (SignExt(imm16) x4)
EX1 (new):  ALUout ← A + B; B ← R[rx] (WriteB)
EX2 (new):  ALUout ← ALUout + B
WB:         R[rd] ← ALUout
Existing states are unchanged: EX (ALUout ← A + SignEx(Im16); ALUout ← A func B; Zero ← A - B, Zero: PC ← ALUout), MEM, WB.
Add3 takes 5 cycles
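The key point of this solution is that EX1 does two things in the same cycle: the ALU adds the old A and B while B is reloaded with R[rx] for the next cycle. A toy sketch of the five cycles, assuming a dictionary register file and 32-bit wraparound at write-back (`add3_cycles` is a hypothetical name):

```python
def add3_cycles(regs, rd, rs, rt, rx):
    """Step `add3 $rd, $rs, $rt, $rx` through IF, ID, EX1, EX2, WB."""
    trace = ["IF"]                    # IF:  IR <- Mem[PC]; PC <- PC + 4 (not modelled)
    a, b = regs[rs], regs[rt]         # ID:  A <- R[rs]; B <- R[rt]
    trace.append("ID")
    alu_out = a + b                   # EX1: ALUout <- A + B (uses the old B) ...
    b = regs[rx]                      #      ... while B <- R[rx] (ReadSrc selects rx, WriteB set)
    trace.append("EX1")
    alu_out = alu_out + b             # EX2: ALUout <- ALUout + B
    trace.append("EX2")
    regs[rd] = alu_out & 0xFFFFFFFF   # WB:  R[rd] <- ALUout (kept to 32 bits)
    trace.append("WB")
    return trace


if __name__ == "__main__":
    regs = {16: 0, 17: 1, 18: 2, 19: 3}
    print(add3_cycles(regs, 16, 17, 18, 19), regs[16])
```

In hardware both EX1 transfers happen on the same clock edge; the sequential Python code mimics that by computing the sum before overwriting `b`.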