CPU Design Basics


55:035
Computer Architecture and Organization
Lecture 9
Outline

Building a CPU
  Basic Components
  MIPS Instructions
  Basic 5 Steps for CPU
  Single-Cycle Design
  Multi-cycle Design
  Comparison of Single and Multi-cycle Designs
Overview

Brief look


Digital logic
CPU Datapath

MIPS Example
Digital Logic
[Figure: three building blocks. D-type flip-flop (input D, output Q, edge-triggered Clock). Multiplexer (inputs A on 0 and B on 1, select input S, output F). D-type flip-flop with enable (inputs D and EN, output Q, edge-triggered Clock), drawn both as a symbol and as a multiplexer feeding a D-type flip-flop.]
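As a rough behavioural sketch of these building blocks, the Python below models the multiplexer and the flip-flop with enable. It assumes the usual construction in which EN selects between the old Q and the new D; the names `mux2` and `DFlipFlopEnable` are made up for this illustration.

```python
def mux2(a, b, s):
    """2-to-1 multiplexer: F = A when S is 0, F = B when S is 1."""
    return b if s else a


class DFlipFlopEnable:
    """Edge-triggered D flip-flop with enable (behavioural model only)."""

    def __init__(self):
        self.q = 0

    def rising_edge(self, d, en):
        # Assumed construction: EN = 0 recirculates the old Q, EN = 1 captures D.
        self.q = mux2(self.q, d, en)
        return self.q


ff = DFlipFlopEnable()
ff.rising_edge(d=1, en=0)  # enable low: Q keeps its reset value
ff.rising_edge(d=1, en=1)  # enable high: Q captures D
ff.rising_edge(d=0, en=0)  # enable low again: Q holds the captured value
```

Calling `rising_edge` once per clock edge stands in for the clock input; nothing changes between calls, which is the edge-triggered behaviour the slide refers to.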
Digital Logic
Registers
[Figure: registers built from edge-triggered D flip-flops with enable (EN), shown at three widths: 1 bit, 4 bits (inputs D3..D0, outputs Q3..Q0) and N bits.]
Digital Logic
Tri-state Driver (Buffer)
[Figure: buffer with data input "in", control input "drive" and output "out".]

In   Drive   Out
0    0       Z
1    0       Z
0    1       0
1    1       1

What is Z ??
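A small sketch may help with the question about Z. Z is the high-impedance state: the driver is electrically disconnected, so another driver can put a value on the same wire. The Python below reproduces the truth table and adds a hypothetical `bus` helper (not from the slides) that resolves several tri-state outputs sharing one wire.

```python
Z = "Z"  # high impedance: this driver is not driving the wire


def tristate(in_bit, drive):
    """Tri-state buffer: pass the input through only when drive is 1."""
    return in_bit if drive else Z


def bus(*outputs):
    """Resolve a shared wire, assuming at most one driver is active at a time."""
    driven = [o for o in outputs if o != Z]
    if len(driven) > 1:
        raise ValueError("bus conflict: more than one active driver")
    return driven[0] if driven else Z


# Two drivers share a wire; only the second one is enabled.
shared = bus(tristate(1, drive=0), tristate(0, drive=1))
```

This is one of the options the later slides raise for a register-file read port: many registers can share one output wire as long as exactly one driver is enabled.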
Digital Logic
Adder/Subtractor or ALU
[Figure: ALU block with data inputs A and B, Carry-in, control input Add/sub (or ALUop), and outputs F and Carry-out.]
Overview

Brief look


Digital logic
How to Design a CPU Datapath

MIPS Example
Designing a CPU: 5 Steps

1. Analyze the instruction set → datapath requirements
   MIPS: ADD, SUB, ORI, LW, SW, BR
   Meaning of each instruction given by RTL (register transfers)
   2 types of registers: CPU/ISA registers, temporary registers
2. Datapath requirements → select the datapath components
   ALU, register file, adder, data memory, etc.
3. Assemble the datapath
   Datapath must support planned register transfers
   Ensure all instructions are supported
4. Analyze datapath control required for each instruction
5. Assemble the control logic
Step 1a: Analyze ISA

All MIPS instructions are 32 bits long.
Three instruction formats:

R-type   op [31:26]   rs [25:21]   rt [20:16]   rd [15:11]   shamt [10:6]   funct [5:0]
         6 bits       5 bits       5 bits       5 bits       5 bits         6 bits
I-type   op [31:26]   rs [25:21]   rt [20:16]   immediate [15:0]
         6 bits       5 bits       5 bits       16 bits
J-type   op [31:26]   target address [25:0]
         6 bits       26 bits

R: registers, I: immediate, J: jumps
These formats were intentionally chosen to simplify the design
Step 1b: Analyze ISA
[Figure: the R-type, I-type and J-type field layouts from Step 1a, repeated.]

Meaning of the fields:
  op: operation of the instruction
  rs, rt, rd: the source and destination register specifiers
    Destination is either rd (R-type), or rt (I-type)
  shamt: shift amount
  funct: selects the variant of the operation in the “op” field
  immediate: address offset or immediate value
  target address: target address of the jump instruction
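The field layout can be made concrete with a short decoding sketch. The Python below extracts every field of all three formats from a 32-bit word with shifts and masks; the function name `decode` and the dictionary keys are choices made for this illustration, and the caller decides which fields are meaningful for a given format.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op": (instr >> 26) & 0x3F,      # bits 31..26, all formats
        "rs": (instr >> 21) & 0x1F,      # bits 25..21, R- and I-type
        "rt": (instr >> 16) & 0x1F,      # bits 20..16, R- and I-type
        "rd": (instr >> 11) & 0x1F,      # bits 15..11, R-type
        "shamt": (instr >> 6) & 0x1F,    # bits 10..6,  R-type
        "funct": instr & 0x3F,           # bits 5..0,   R-type
        "imm16": instr & 0xFFFF,         # bits 15..0,  I-type
        "target": instr & 0x3FFFFFF,     # bits 25..0,  J-type
    }


# Build an R-type word by hand: op=0, rs=1, rt=2, rd=3, shamt=0, funct=0x21 (addu).
word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0x21
fields = decode(word)
```

Because the register fields sit at the same bit positions in every format, the hardware can start reading registers before it knows which format it is looking at, which is one way the formats simplify the design.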
MIPS ISA: subset for today

ADD and SUB
  addU rd, rs, rt
  subU rd, rs, rt
  Fields: op [31:26] 6 bits, rs [25:21] 5 bits, rt [20:16] 5 bits, rd [15:11] 5 bits, shamt [10:6] 5 bits, funct [5:0] 6 bits
OR Immediate:
  ori rt, rs, imm16
LOAD and STORE Word
  lw rt, rs, imm16
  sw rt, rs, imm16
BRANCH:
  beq rs, rt, imm16
Fields for ori, lw, sw and beq: op [31:26] 6 bits, rs [25:21] 5 bits, rt [20:16] 5 bits, immediate [15:0] 16 bits
Step 2: Datapath Requirements

REGISTER FILE
  MIPS ISA requires 32 registers, 32b each
    Called a register file
    Contains 32 entries
    Each entry is 32b
AddU rd,rs,rt or SubU rd,rs,rt
  Read two sources rs, rt
  Operation rs + rt or rs – rt
  Write destination rd ← rs+/-rt
Requirements
  Read two registers (rs, rt)
  Perform ALU operation
  Write a third register (rd)
How to implement?
[Figure: REGFILE with register-number inputs RdReg1, RdReg2 and WrReg (5 bits each), data input WrData, control input RegWrite, and data outputs RdData1 and RdData2; an ALU with control input ALUop and outputs Result and Zero?.]
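A behavioural sketch of these requirements, using the port names from the figure, might look as follows. It is only a model under stated assumptions: reads are treated as combinational, the write happens when `clock` is called, and the special behaviour of MIPS register 0 is not modelled. The class name `RegFile`, the function `alu` and the string values for ALUop are invented for this example.

```python
MASK = 0xFFFFFFFF  # keep values to 32 bits


class RegFile:
    """32 entries of 32 bits, two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rd_reg1, rd_reg2):
        # Two read ports: return RdData1 and RdData2.
        return self.regs[rd_reg1], self.regs[rd_reg2]

    def clock(self, wr_reg, wr_data, reg_write):
        # One write port: the write only happens when RegWrite is asserted.
        if reg_write:
            self.regs[wr_reg] = wr_data & MASK


def alu(a, b, alu_op):
    """Return (Result, Zero?) for the add and sub operations."""
    result = (a + b) & MASK if alu_op == "add" else (a - b) & MASK
    return result, result == 0


rf = RegFile()
rf.clock(1, 7, reg_write=1)          # R[1] = 7
rf.clock(2, 5, reg_write=1)          # R[2] = 5
a, b = rf.read(1, 2)                 # read rs and rt
result, zero = alu(a, b, "sub")      # rs - rt
rf.clock(3, result, reg_write=1)     # rd <- rs - rt
```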
Step 3: Datapath Assembly

ADDU rd, rs, rt
SUBU rd, rs, rt
  Need an ALU
    Hook it up to REGISTER FILE
    REGFILE has 2 read ports (rs, rt), 1 write port (rd)
  Parameters rs, rt, rd come from instruction fields
  Control signals depend upon instruction fields
    E.g.: ALUop = f(Instruction) = f(op, funct)
[Figure: rs, rt and rd drive RdReg1, RdReg2 and WrReg of REGFILE; RdData1 and RdData2 feed the ALU (control ALUop, outputs Result and Zero?); RegWrite controls the write of WrData.]
Steps 2 and 3: ORI Instruction

ORI rt, rs, Imm16
  Need new ALUop for ‘OR’ function, hook up to REGFILE
  1 read port (rs), 1 write port (rt), 1 const value (Imm16)
  Control signals depend upon instruction fields
    E.g.: ALUsrc = f(Instruction) = f(op, funct)
[Figure: rs drives RdReg1; rt replaces rd (crossed out) at WrReg; Imm16 passes through a 16-bit ZERO EXTEND block; a multiplexer controlled by ALUsrc selects RdData2 or the extended immediate as the second ALU input; ALUop, RegWrite, Result and Zero? as before.]
Steps 2 and 3: Destination Register

Must select proper destination, rd or rt
  Depends on Instruction Type
    R-type may write rd
    I-type may write rt
[Figure: a multiplexer controlled by RegDst selects rd or rt as WrReg; the rest of the datapath (REGFILE, 16-bit ZERO EXTEND of Imm16, ALUsrc multiplexer, ALU with ALUop, Result and Zero?) is unchanged.]
Steps 2 and 3: Load Word

LW rt, rs, Imm16
  Need Data Memory: data ← Mem[Addr]
    Addr is rs+Imm16, Imm16 is signed, use ALU for +
  Store in rt: rt ← Mem[rs+Imm16]
[Figure: the previous datapath plus DATAMEM (Addr input from the ALU Result, RdData output); a multiplexer controlled by MemtoReg selects the ALU Result or RdData as WrData; the extender becomes a SIGN/ZERO EXTEND block controlled by ExtOp.]
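The difference between the two extension modes is easy to see in a few lines of code. The sketch below assumes the encoding ExtOp = 1 for sign extension and ExtOp = 0 for zero extension; the slides name the signal but do not give its encoding, so that choice and the function name `extend` are assumptions.

```python
MASK = 0xFFFFFFFF


def extend(imm16, ext_op):
    """Widen a 16-bit immediate to 32 bits (assumed: ext_op 1 = sign, 0 = zero)."""
    imm16 &= 0xFFFF
    if ext_op and (imm16 & 0x8000):
        return imm16 | 0xFFFF0000   # copy the sign bit into the upper 16 bits
    return imm16                    # upper 16 bits are zero


# lw with a negative offset: base 0x1000, Imm16 = 0xFFFC (which is -4).
base = 0x1000
addr = (base + extend(0xFFFC, ext_op=1)) & MASK
```

With zero extension the same immediate would be treated as 65532 rather than -4, which is why LW and SW need the signed mode while ORI uses the unsigned one.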
Steps 2 and 3: Store Word

SW rt, rs, Imm16
  Need Data Memory: Mem[Addr] ← data
    Addr is rs+Imm16, Imm16 is signed, use ALU for +
  Store in Mem: Mem[rs+Imm16] ← rt
[Figure: the Load Word datapath plus a WrData input on DATAMEM and a MemWrite control signal; RegDst, RegWrite, ALUsrc, ALUop, ExtOp and MemtoReg as before.]
Writes: Need to Control Timing

Problem: write to data memory
  Data can come anytime
  Addr must come first
  MemWrite must come after Addr
    Else? writes to wrong Addr!
Solution: use ideal data memory
  Assume everything works ok
  How to fix this for real?
    One solution: synchronous memory
    Another solution: delay MemWr to come late
Problems?: write to register file
  Does RegWrite signal come after WrReg number?
  When does the write to a register happen?
  Read from same register as being written?
Missing Pieces: Instruction Fetching

Where does the Instruction come from?
  From instruction memory, of course!
  Recall: stored-program concept
    Alternatives? How about hard-coding wires and switches…? This is how ENIAC was programmed!
How to branch?
  BEQ rs, rt, Imm16
Instruction Processing

Fetch instruction
Execute instruction
Fetch next instruction
Execute next instruction
Fetch next instruction
Execute next instruction
Etc…

How to maintain sequence? Use a counter!
Branches (out of sequence)? Load the counter!
Instruction Processing

Program Counter
  Points to current instruction
  Address to instruction memory
    Instr ← InstrMem[PC]
Next instruction: counts up by 4
  Remember: memory is byte-addressable, instructions are 4 bytes
    PC ← PC + 4
Branch instruction: replace PC contents
Step 1: Analyze Instructions

Register Transfer Language…
  op | rs | rt | rd | shamt | funct = InstrMem[ PC ]
  op | rs | rt | Imm16              = InstrMem[ PC ]

Instr    Register Transfers
ADDU     R[rd] ← R[rs] + R[rt];                     PC ← PC + 4
SUBU     R[rd] ← R[rs] – R[rt];                     PC ← PC + 4
ORI      R[rt] ← R[rs] | zero_ext(Imm16);           PC ← PC + 4
LOAD     R[rt] ← MEM[ R[rs] + sign_ext(Imm16) ];    PC ← PC + 4
STORE    MEM[ R[rs] + sign_ext(Imm16) ] ← R[rt];    PC ← PC + 4
BEQ      if ( R[rs] == R[rt] ) then
             PC ← PC + 4 + { sign_ext(Imm16) || b’00’ }
         else
             PC ← PC + 4
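The register transfers above can be read as a tiny interpreter. The sketch below applies one instruction's transfers to a register list and a memory, and returns the next PC. It is an illustration only: instructions are passed as already-decoded fields, memory is a Python dictionary keyed by byte address, and the names `step`, `R` and `MEM` are chosen for this example.

```python
MASK = 0xFFFFFFFF


def sign_ext(imm16):
    """Sign-extend a 16-bit value (given as 0..0xFFFF) to 32 bits."""
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def step(name, rs, rt, rd, imm16, R, MEM, pc):
    """Apply the register transfers of one instruction; return the next PC."""
    next_pc = (pc + 4) & MASK
    if name == "ADDU":
        R[rd] = (R[rs] + R[rt]) & MASK
    elif name == "SUBU":
        R[rd] = (R[rs] - R[rt]) & MASK
    elif name == "ORI":
        R[rt] = R[rs] | imm16                      # zero_ext leaves imm16 unchanged
    elif name == "LOAD":
        R[rt] = MEM.get((R[rs] + sign_ext(imm16)) & MASK, 0)
    elif name == "STORE":
        MEM[(R[rs] + sign_ext(imm16)) & MASK] = R[rt]
    elif name == "BEQ":
        if R[rs] == R[rt]:
            # {sign_ext(Imm16) || b'00'} is the extended offset shifted left by 2.
            next_pc = (pc + 4 + (sign_ext(imm16) << 2)) & MASK
    else:
        raise ValueError(f"unsupported instruction: {name}")
    return next_pc


R = [0] * 32
MEM = {}
R[1], R[2] = 0x100, 42
pc = step("STORE", 1, 2, 0, 0xFFFC, R, MEM, 0)   # MEM[0x100 - 4] <- R[2]
pc = step("LOAD", 1, 3, 0, 0xFFFC, R, MEM, pc)   # R[3] <- MEM[0x100 - 4]
pc = step("BEQ", 2, 3, 0, 0xFFFE, R, MEM, pc)    # taken: offset of -2 words
```

Every branch of `step` needs something from the datapath (an adder, an OR unit, a memory port, a comparator), which is the sense in which the RTL gives the datapath requirements.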
Steps 2 and 3: Datapath & Assembly
[Figure: PC supplies the read address to Instruction Memory, which outputs Instruction[31:0]; an adder with the constant 4 computes the next PC value.]

PC: a register
  Counter, counts by +4
  Provides address to Instruction Memory
Steps 2 and 3: Datapath & Assembly
[Figure: the fetch datapath extended for branches. Instruction Memory outputs Instruction[25:21], Instruction[20:16], Instruction[15:11] and Instruction[15:0] (Imm16); Imm16 goes through the Sign/Zero Extend block (16 to 32 bits, controlled by ExtOp) and Shift Left 2 into a second adder; a multiplexer controlled by PCSrc selects the next PC from the two adder results.]

PC: a register
  Counter, counts by +4
  Sometimes, must add {SignExtend(Imm16) || b’00’} for branch instructions

Note: the sign-extender for Imm16 is already in the datapath (everything else is new)
Steps 2 and 3: Add Previous Datapath
[Figure: complete single-cycle datapath. Fetch: PC, Instruction Memory, PC + 4 adder, branch adder with Shift Left 2, PCSrc multiplexer. Registers: Register File with Read reg. 1 from Instruction[25:21], Read reg. 2 from Instruction[20:16], and Write reg. chosen by the RegDst multiplexer from Instruction[20:16] or Instruction[15:11]. Execute: Sign/Zero Extend of Instruction[15:0] (Imm16), ALUSrc multiplexer, ALU with Zero and ALU result outputs, ALU Control driven by Instruction[5:0] (funct) and ALUOp. Memory: Data Memory (Address, Write data, Read data) and MemtoReg multiplexer. Control signals shown: RegWrite, RegDst, ExtOp, ALUSrc, ALUOp, MemWrite, MemtoReg, PCSrc.]

What have we done?
  Created a simple CPU datapath
    Control still missing (next slide)
  Single-cycle CPU
    Every instruction takes 1 clock cycle
    Clocking ?
One Clock Cycle

Clock Locations
  PC, REGFILE have clocks
Operation
  On rising edge, PC will get new value
    Maybe REGFILE will have one value updated as well
  After rising edge
    PC and REGFILE can’t change
    New value out of PC
    Instruction out of INSTRMEM
    Instruction selects registers to read from REGFILE
    Instruction controls ALUop, ALUsrc, MemWrite, ExtOp, etc
    ALU does its work
    DataMem may be read (depending on instruction)
    Result value goes back to REGFILE
    New PC value goes back to PC
    Await next clock edge

Lots to do in only 1 clock cycle !!
Missing Steps?

Control is missing (Steps 4 and 5 we mentioned earlier)
  Generate the control signals (shown in green on the slide)
    ALUsrc, MemWrite, MemtoReg, PCSrc, RegDst, etc
  These are all f(Instruction), where f() is a logic expression
  Will look at control strategies in upcoming lecture
Implementation Details
  How to implement REGFILE?
    Read port: tristate buffers? Multiplexer? Memory?
    Two read ports: two of above?
    Write port: how to write only 1 register?
  How to control writes to memory? To register file?
More instructions
  Shift instructions
  Jump instruction
  Etc
1-Cycle CPU Datapath
[Figure: the complete single-cycle datapath from "Steps 2 and 3: Add Previous Datapath", repeated.]
1-cycle CPU Datapath + Control
[Figure: the single-cycle datapath with a Control unit driven by Instruction[31:26]. Control outputs: RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite. The ALU control block is driven by Instruction[5:0]; the ALU Zero output and PCSrc also appear.]
1-cycle CPU Control – Lookup Table

Input or Output   Signal Name   R-format   Lw   Sw   Beq
Inputs            Op5           0          1    1    0
                  Op4           0          0    0    0
                  Op3           0          0    1    0
                  Op2           0          0    0    1
                  Op1           0          1    1    0
                  Op0           0          1    1    0
Outputs           RegDst        1          0    X    X
                  ALUSrc        0          1    1    0
                  MemtoReg      0          1    X    X
                  RegWrite      1          1    0    0
                  MemRead       0          1    0    0
                  MemWrite      0          0    1    0
                  Branch        0          0    0    1
                  ALUOp1        1          0    0    0
                  ALUOp0        0          0    0    1

Also: I-type instructions (ORI) & ExtOp (sign-extend control), etc.
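Since the controller is a pure function of the opcode, it can be sketched as a dictionary lookup. The Python below transcribes the table, keyed by the 6-bit opcode Op5..Op0; don't-care entries (X) are represented as `None`, and ALUOp1/ALUOp0 are kept as separate keys. The names `CONTROL` and `control` are chosen for this illustration, and ORI and ExtOp are left out just as they are in the table.

```python
X = None  # don't care

CONTROL = {
    0b000000: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1, MemRead=0,
                   MemWrite=0, Branch=0, ALUOp1=1, ALUOp0=0),   # R-format
    0b100011: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1, MemRead=1,
                   MemWrite=0, Branch=0, ALUOp1=0, ALUOp0=0),   # lw
    0b101011: dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0, MemRead=0,
                   MemWrite=1, Branch=0, ALUOp1=0, ALUOp0=0),   # sw
    0b000100: dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0, MemRead=0,
                   MemWrite=0, Branch=1, ALUOp1=0, ALUOp0=1),   # beq
}


def control(instr):
    """Look up the control signals for a 32-bit instruction word."""
    opcode = (instr >> 26) & 0x3F
    return CONTROL[opcode]


signals = control(0b100011 << 26)  # any lw instruction
```

A hardware version would be a small ROM or a handful of gates per output column; the dictionary only shows that no state is involved.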
1-cycle CPU + Jump Instruction
[Figure: the single-cycle datapath and control extended for jumps. Jump address [31..0] is formed from PC + 4 [31..28] and Instruction[25:0].]
1-cycle CPU Problems?

Every instruction 1 cycle
Some instructions “do more work”
  Eg, lw must read from DATAMEM
All instructions must have same clock period…
  Many instructions run slower than necessary
Tricky timing on MemWrite, RegWrite(?) signals
  Write signal must come *after* address is stable
Need extra resources…
  PC+4 adder, ALU for BEQ instruction, DATAMEM+INSTRMEM
Performance!

Single-Cycle CPU Performance
  Execute one instruction per clock cycle (CPI=1)
  Clock cycle time? Note dataflow includes:
    INSTRMEM read
    REGFILE access
    Sign extension
    ALU operation
    DATAMEM read
    REGFILE/PC write
  Not every instruction uses all resources (eg, DATAMEM read)
  Can we change clock period for each instruction?
    No! (Why not?)
    One clock period: the worst case!
  This is why a single-cycle CPU is not good for performance
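A quick calculation shows how the worst case sets the clock. The delays below are hypothetical numbers picked only for illustration (the slides give none), sign extension is assumed to overlap with the register read, and the per-instruction paths are a simplified reading of the dataflow list above.

```python
# Hypothetical unit delays in nanoseconds (not taken from the slides).
DELAY_NS = {
    "INSTRMEM": 2.0,
    "REGFILE_READ": 1.0,
    "ALU": 2.0,
    "DATAMEM": 2.0,
    "REGFILE_WRITE": 1.0,
}

# Units each instruction class passes through, in order.
PATH = {
    "R-type": ["INSTRMEM", "REGFILE_READ", "ALU", "REGFILE_WRITE"],
    "LW": ["INSTRMEM", "REGFILE_READ", "ALU", "DATAMEM", "REGFILE_WRITE"],
    "SW": ["INSTRMEM", "REGFILE_READ", "ALU", "DATAMEM"],
    "BEQ": ["INSTRMEM", "REGFILE_READ", "ALU"],
}

latency = {name: sum(DELAY_NS[u] for u in units) for name, units in PATH.items()}

# A single-cycle CPU has one fixed clock, so it must fit the slowest instruction.
clock_period = max(latency.values())
wasted = {name: clock_period - t for name, t in latency.items()}
```

With these made-up delays LW needs the whole period and every other instruction finishes early and waits, which is the "worst case" point on the slide.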
1-cycle CPU Datapath + Controller
[Figure: the single-cycle datapath with controller and jump support, repeated.]
1-cycle CPU Summary

Operation
  1 cycle per instruction
  Control signals held fixed during entire cycle (except BRANCH)
  Only 2 registers
    PC, updated every clock cycle
    REGFILE, updated when required
  During clock cycle, data flows from register-outputs to register-inputs
  Fixed clock frequency / period
Performance
  1 instruction per cycle
  Slowest instruction determines clock frequency
Outstanding issue: MemWrite timing
  Assume this signal writes to memory at end of clock cycle
Multi-cycle CPU Goals

Improve performance
  Break each instruction into smaller steps / multiple cycles
    LW instruction → 5 cycles
    SW instruction → 4 cycles
    R-type instruction → 4 cycles
    Branch, Jump → 3 cycles
  Aim for 5x clock frequency
    Complex instructions (eg, LW) → 5 cycles → same performance as before
    Simple instructions (eg, ADD) → fewer cycles → faster
Save resources (gates/transistors)
  Re-use ALU over multiple cycles
  Put INSTR + DATA in same memory
MemWrite timing solved?
Multi-cycle CPU Datapath
[Figure: multi-cycle datapath with PC, a single Memory (Address, Write data, MemData), Instruction Register, Memory Data Register, Registers (RdReg1, RdReg2, Write reg, Write data, RdData1, RdData2), A and B, Sign Extend, Shift Left 2, the constant 4, one ALU (Zero, ALU result) and ALUOut, connected through multiplexers.]

Add multiplexers + control signals (IorD, MemtoReg, ALUSrcA, ALUSrcB)
Move signal paths (+4, Shift Left 2)
Multi-cycle CPU Datapath
[Figure: the multi-cycle datapath, repeated.]

Add registers + control signals (IR, MDR, A, B, ALUOut)
  Registers with no control signal load value every clock cycle (eg, PC)
Instruction Execution Example

Execute a “Load Word” instruction
  LW rt, 0(rs)
5 Steps
  1. Fetch instruction
  2. Read registers
  3. Compute address
  4. Read data
  5. Write registers
Load Word Instruction Sequence
[Figure, repeated on six slides: the multi-cycle datapath, one slide per step and a final slide titled "All 5 Steps Shown".]
1. Fetch Instruction
InstructionRegister ← Mem[PC]
2. Read Registers
A ← Registers[Rs]
3. Compute Address
ALUOut ← A + SignExt(Imm16)
4. Read Data
MDR ← Memory[ALUOut]
5. Write Registers
Registers[Rt] ← MDR
Multi-cycle Load Word: Recap
1. Fetch Instruction
InstructionRegister ← Mem[PC]
2. Read Registers
A ← Registers[Rs]
3. Compute Address
ALUOut ← A + {SignExt(Imm16)}
4. Read Data
MDR ← Memory[ALUOut]
5. Write Registers
Registers[Rt] ← MDR

Missing Steps?

Multi-cycle Load Word: Recap
1. Fetch Instruction
InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers
A ← Registers[Rs]
3. Compute Address
ALUOut ← A + {SignExt(Imm16)}
4. Read Data
MDR ← Memory[ALUOut]
5. Write Registers
Registers[Rt] ← MDR

Missing Steps?
  Must increment the PC
  Do it as part of the instruction fetch (in step 1)
  Need PCWrite control signal
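The five steps can be traced with a small generator that performs one step's register transfers per clock cycle. This is a sketch under simplifying assumptions: instructions and data share one dictionary keyed by byte address (matching the single memory of the multi-cycle design), the temporary registers live in a `state` dictionary, and the function name `load_word_cycles` is invented here.

```python
MASK = 0xFFFFFFFF


def sign_ext(imm16):
    """Sign-extend a 16-bit value (given as 0..0xFFFF) to 32 bits."""
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def load_word_cycles(state, mem):
    """Run one LW instruction, yielding the step name after each clock cycle."""
    state["IR"] = mem[state["PC"]]                 # 1. IR <- Mem[PC]
    state["PC"] = (state["PC"] + 4) & MASK         #    PC <- PC + 4
    yield "fetch instruction"

    rs = (state["IR"] >> 21) & 0x1F
    rt = (state["IR"] >> 16) & 0x1F
    imm16 = state["IR"] & 0xFFFF
    state["A"] = state["R"][rs]                    # 2. A <- Registers[Rs]
    yield "read registers"

    state["ALUOut"] = (state["A"] + sign_ext(imm16)) & MASK   # 3. address
    yield "compute address"

    state["MDR"] = mem[state["ALUOut"]]            # 4. MDR <- Memory[ALUOut]
    yield "read data"

    state["R"][rt] = state["MDR"]                  # 5. Registers[Rt] <- MDR
    yield "write registers"


# lw $2, 8($1) stored at address 0, with R[1] = 0x100 and Mem[0x108] = 99.
mem = {0: (0x23 << 26) | (1 << 21) | (2 << 16) | 8, 0x108: 99}
state = {"PC": 0, "R": [0] * 32, "IR": 0, "A": 0, "ALUOut": 0, "MDR": 0}
state["R"][1] = 0x100
steps = list(load_word_cycles(state, mem))
```

Each value computed in one step is parked in a temporary register (IR, A, ALUOut, MDR) so that it is still available in the next cycle, which is why the multi-cycle datapath adds those registers.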
Multi-cycle R-Type Instruction
1. Fetch Instruction
InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers
A ← Registers[Rs]; B ← Registers[Rt]
3. Compute Value
ALUOut ← A op B
4. Write Registers
Registers[Rd] ← ALUOut

RTL describes data flow action in each clock cycle
  → Control signals determine precise data flow
  → Each step implies unique control values
Multi-cycle R-Type Instruction: Control Signal Values
1. Fetch Instruction
InstructionRegister ← Mem[PC]; PC ← PC + 4
MemRead=1, ALUSrcA=0, IorD=0, IRWrite, ALUSrcB=01, ALUop=00, PCWrite, PCSource=00
2. Read Registers
A ← Registers[Rs]; B ← Registers[Rt]
ALUSrcA=0, ALUSrcB=11, ALUop=00
3. Compute Value
ALUOut ← A op B
ALUSrcA=1, ALUSrcB=00, ALUop=10
4. Write Registers
Registers[Rd] ← ALUOut
RegDst=1, RegWrite, MemtoReg=0

Each step implies unique control values
  Fixed for entire cycle
  “Default value” implied if unspecified
Check Your Work – Is RTL Valid ?
1. Datapath check
  Within one cycle…
    Each cycle has valid data flow path (path exists)
    Each register gets only one new value
  Across multiple cycles…
    Register value is defined in a previous (earlier in time) clock cycle before it is used
      Eg, “A ← 3” must occur before “B ← A”
    Make sure register value doesn’t disappear if set >1 cycle earlier
2. Control signal check
  Each cycle, RTL describing the datapath flow implies a value for each control signal
    0 or 1 or default or don’t care
  Each control signal gets only one fixed value the entire cycle
3. Overall check
  Does the sequence of steps work ?
Multi-cycle BEQ Instruction
1. Fetch Instruction
InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers, Precompute Target
A ← Registers[Rs]; B ← Registers[Rt]; ALUOut ← PC + {SignExt(Imm16), b’00’}
3. Compare Registers, Conditional Branch
if ( (A – B) == 0 ) PC ← ALUOut

Green on the slide shows PC calculation flow (in parallel with other operations)
Multi-cycle Datapath with Control Signals
[Figure: the multi-cycle datapath annotated with control signals PCSrc, PCWrite, IRWrite, IorD, MemRead, MemWrite, MemtoReg, RegDst, RegWrite, ALUSrcA, ALUSrcB and ALUOp; ALU Control is driven by Instruction[5:0]; Jump address [31..0] is formed from PC[31..28] and Instr[25:0].]
Multi-cycle Datapath with Controller
[Figure: the same datapath with a controller driven by Instr[31:26].]
Multi-cycle CPU Control: Overview
[Figure: diagram with blocks labelled "Control Signal Outputs".]

General approach: Finite State Machine (FSM)
  Need details in each branch of control…
    Precise outputs for each state (Mealy depends on inputs, Moore does not)
    Precise “next state” for each state (can depend on inputs)
How to Implement FSM ?

Manually with logic gates + FFs
  Bubble diagram, next-state table, state assignment
  Karnaugh map for each state bit, each output bit (painful!)
High-level language description (eg, Verilog, VHDL)
  Describe FSM bubble diagram (next-states, output values)
  Automatically synthesized into gates + FFs
Microcode (µ-code) description
  Sequence through many µ-ops for each CPU instruction
    One µ-op (µ-instruction) sends correct control signal for 1 cycle
    µ-op similar to one bubble in FSM
  Acts like a mini-CPU within a CPU
    µPC: microcode program counter
    Microcode storage memory contains µ-ops
  Can look similar to RTL or some new “assembly language”
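To make the FSM idea concrete, here is a minimal next-state sketch for the multi-cycle controller. The state names are hypothetical labels invented for this example; only the number of states per instruction class (LW 5, SW 4, R-type 4, BEQ and J 3) comes from the slides, and the control outputs of each state are omitted.

```python
# One state per clock cycle; the first two states are shared by all instructions.
SEQUENCE = {
    "LW": ["FETCH", "DECODE", "MEM_ADDR", "MEM_READ", "WRITEBACK"],
    "SW": ["FETCH", "DECODE", "MEM_ADDR", "MEM_WRITE"],
    "R-type": ["FETCH", "DECODE", "EXECUTE", "R_WRITE"],
    "BEQ": ["FETCH", "DECODE", "BRANCH"],
    "J": ["FETCH", "DECODE", "JUMP"],
}


def next_state(state, instr_class):
    """Next-state function: depends on the current state and the opcode class."""
    seq = SEQUENCE[instr_class]
    i = seq.index(state)
    # After the last state of an instruction, return to FETCH for the next one.
    return seq[i + 1] if i + 1 < len(seq) else "FETCH"


def run(instr_class):
    """Return the list of states visited while executing one instruction."""
    state = "FETCH"
    visited = [state]
    while True:
        state = next_state(state, instr_class)
        if state == "FETCH":
            return visited
        visited.append(state)


cycles = {name: len(run(name)) for name in SEQUENCE}
```

In a Moore-style controller each state name would map to one fixed set of control signal values; a microcoded controller stores the same information as one µ-op per state.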
FSM Specification: Bubble Diagram
[Figure: bubble diagram of the control FSM.]

Can build this by examining RTL
It is possible to automatically convert RTL into this form!
FSM: Gates + FFs Implementation
[Figure: FSM high-level organization.]

FSM: Microcode Implementation
[Figure: Microcode Storage (memory) with Inputs and Outputs; its outputs are the datapath control outputs and the sequencing control. Other blocks: Microprogram Counter, Adder (with the constant 1), Address Select Logic, and inputs from the instruction register opcode field.]
Multi-cycle CPU with Control FSM
[Figure: the multi-cycle datapath with the control FSM, driven by Instr[31:26]; labels include FSM Control Outputs and Conditional Branch.]
Control FSM: Overview


General approach: Finite State Machine (FSM)
Need details in each branch of control…
Detailed FSM
[Figure: complete state diagram, shown twice; the second copy labels its regions Instruction Fetch, Memory Reference, R-Type, Branch and Jump.]
Detailed FSM: Instruction Fetch
Detailed FSM: Memory Reference
[Figure: states for LW and SW.]
Detailed FSM: R-Type Instruction
Detailed FSM: Branch Instruction
Detailed FSM: Jump Instruction
Performance Comparison
Single-cycle CPU
vs
Multi-cycle CPU
Simple Comparison

CPU                Instructions   Clock cycles
Single-cycle CPU   All            1
Multi-cycle CPU    LW             5
Multi-cycle CPU    SW, R-type     4
Multi-cycle CPU    BEQ, J         3
What’s really happening?
( Load Word Instruction )
[Figure: timelines for the Single-cycle CPU and the Multi-cycle CPU. Ideally, both pass through the steps Fetch, Decode, Calc Addr, Memory, Write.]
In practice, steps differ in speeds…
Load Word Instruction
[Figure: timelines of Fetch, Decode, Calc Addr, Memory, Write for the Single-cycle CPU and the Multi-cycle CPU, with parts of the multi-cycle timeline marked “Wasted time!” and “Violation!”.]
Single-cycle vs Multi-cycle
LW instruction faster for single-cycle
[Figure: timelines of Fetch, Decode, Calc Addr, Memory, Write for both CPUs. Now wasted time is larger! Violation fixed!]
Single-cycle vs Multi-cycle
SW instruction ~ same speed
[Figure: timelines of Fetch, Decode, Calc Addr, Memory for both CPUs, marking “Wasted time!” and the “Speed diff”.]
Single-cycle vs Multi-cycle
BEQ, J instruction faster for multi-cycle
[Figure: timelines of Fetch, Decode, Calc Addr for both CPUs, marking “Wasted time!” and the “Speed diff”.]
Performance Summary

Which CPU implementation is faster?
  LW → single-cycle is faster
  SW, R-type → about the same
  BEQ, J → multi-cycle is faster
Real programs use a mix of these instructions
  Overall performance depends on instruction frequency!
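The dependence on instruction frequency can be worked through with a short calculation. The cycle counts per instruction come from the slides; the clock periods and the instruction mix below are hypothetical values chosen only to show the arithmetic.

```python
# Cycle counts per instruction on the multi-cycle CPU (from the slides).
CYCLES = {"LW": 5, "SW": 4, "R-type": 4, "BEQ": 3, "J": 3}

# Hypothetical instruction mix (fractions sum to 1) and clock periods in ns.
MIX = {"LW": 0.25, "SW": 0.10, "R-type": 0.45, "BEQ": 0.15, "J": 0.05}
SINGLE_CYCLE_PERIOD_NS = 8.0
MULTI_CYCLE_PERIOD_NS = 2.0

# Average cycles per instruction, weighted by how often each instruction occurs.
multi_cpi = sum(MIX[name] * CYCLES[name] for name in MIX)

single_time_ns = 1 * SINGLE_CYCLE_PERIOD_NS        # CPI = 1
multi_time_ns = multi_cpi * MULTI_CYCLE_PERIOD_NS  # average time per instruction
```

With these particular numbers the two designs come out almost even (about 8.1 ns against 8.0 ns per instruction); a mix with more branches would favour the multi-cycle CPU, and a mix with more loads would favour the single-cycle one.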
Implementation Summary

Single-cycle CPU
  1 instruction per cycle (eg, 1MHz → 1 MIPS)
  No “wasted time” on most complex instruction
  Large wasted time on simpler instructions
  Simple controller (just a lookup table or memory)
  Simple instructions
Multi-cycle CPU
  << 1 instruction per cycle (eg, 1MHz → 0.2 MIPS)
  Small time wasted on most complex instruction
    Hence, this instruction always slower than single-cycle CPU
  Small time wasted on simple instructions
    Eliminates “large wasted time” by using fewer clock cycles
  Complex controller (FSM)
  Potential to create complex instructions