Transcript Document

Building A CPU
• We’ve built a small ALU
• Add, Subtract, SLT, And, Or
• Could figure out Multiply and Divide...
• What about the rest?
• How do we deal with memory and registers?
• What about control operations (branches)?
• How do we interpret instructions?
• The whole thing...
• A CPU’s datapath deals with moving data around
• A CPU’s control manages the data
5.1
Datapath Overview
[Figure: datapath overview. The PC supplies the read address to the instruction memory, which outputs Instruction [31-0]. The register file (read reg num A/B, read reg data A/B, write reg num, write reg data) feeds the ALU. The ALU result feeds the data memory (read address, write address, write data, read data).]
• Current instruction: addressed by the PC
• Instructions: R-type use 3 registers; I-type use 2 registers and data
• ALU computes on: 2 registers (R-type); a register and data (I-type)
• Memory: address from the ALU; data to/from the registers
• Data written into the destination register comes from the ALU or memory
5.1
Instruction Datapath
• Instructions will be held in the instruction memory
• The instruction to fetch is at the location specified by the PC
• Instr. = M[PC]
• After we fetch one instruction, the PC must be incremented to the next instruction
• All instructions are 4 bytes
• PC = PC + 4
Note: Regular instruction width (32 bits for MIPS) makes this easy
[Figure: the PC drives the read address of the instruction memory; an adder computes PC + 4.]
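The fetch step (Instr. = M[PC], then PC = PC + 4) can be sketched in Python. This is only an illustration: the function name `fetch`, the byte-string model of the instruction memory, the big-endian byte order, and the two sample instruction words are assumptions, not part of the slides.

```python
import struct

def fetch(instruction_memory: bytes, pc: int):
    """Fetch one instruction; return (instruction word, next PC)."""
    # Instr. = M[PC]: read the 32-bit word stored at byte address PC.
    # Big-endian byte order is an assumption of this sketch.
    (instr,) = struct.unpack_from(">I", instruction_memory, pc)
    # PC = PC + 4: every instruction is 4 bytes wide.
    next_pc = (pc + 4) & 0xFFFFFFFF
    return instr, next_pc

# Two hypothetical instruction words placed at addresses 0 and 4.
memory = bytes.fromhex("012850208d090004")
instr, pc = fetch(memory, 0)
```

Because every instruction has the same width, the next address never depends on the instruction that was just fetched.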
5.2
R-type Instruction Datapath
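[Figure: R-type datapath. The instruction supplies read reg num A, read reg num B, and write reg num to the register file. Read reg data A and B feed the ALU. The ALU outputs Zero and Result, and Result returns to the register file as write reg data.]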
• R-type Instructions have three registers
• Two read (Rs, Rt) to provide data to the ALU
• One write (Rd) to receive data from the ALU
• We’ll need to specify the operation to the ALU (later...)
• We might be interested if the result of the ALU is zero (later...)
5.2
Memory Operations
[Figure: memory-operation datapath. Read reg data A and the sign-extended (16-bit to 32-bit) immediate feed the ALU. The ALU result is the data memory's read/write address. Read reg data B is the memory write data, and the memory read data returns to the register file as write reg data.]
• Memory operations first need to compute the effective address
• LW $t1, 450($s3)   # E.A. = 450 + $s3
• Add together one register and 16 bits of immediate data
• Immediate data needs to be sign-extended from 16 bits to 32 bits
• Memory then performs the load or store: a load writes the data read from memory into the destination register (Rt); a store writes the data read from Rt into memory
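As a rough sketch of the effective-address calculation (base register plus sign-extended 16-bit immediate), assuming 32-bit wrap-around arithmetic. The helper names `sign_extend16` and `effective_address` are made up for this example.

```python
def sign_extend16(imm16: int) -> int:
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def effective_address(base: int, imm16: int) -> int:
    """E.A. = base register + sign-extended immediate, wrapped to 32 bits."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF

# LW $t1, 450($s3) with a hypothetical value of 0x1000 in $s3.
address = effective_address(0x1000, 450)
```

Sign extension matters for negative offsets: the 16-bit pattern 0xFFFC means -4, not 65532.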
5.2
Branches
• Branches conditionally change the next instruction
• BEQ $2, $1, 42
• The offset is specified as the number of words to be added to the next instruction (PC+4)
• Take offset, multiply by 4
• Shift left two
• Add this to PC+4 (from PC logic)
• Control logic has to decide if the branch is taken
• Uses ‘zero’ output of ALU
[Figure: branch datapath. The 16-bit offset is sign-extended to 32 bits, shifted left by 2, and added to PC + 4 to form the branch target. The ALU compares the two registers and sends its Zero output to the control logic.]
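The branch-target arithmetic above can be sketched as follows. The function names and the sample PC value are assumptions for illustration, and `zero` stands in for the ALU's Zero output.

```python
def sign_extend16(imm16: int) -> int:
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def branch_target(pc: int, offset16: int) -> int:
    """Target = (PC + 4) + (sign-extended word offset shifted left by 2)."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF

def next_pc(pc: int, offset16: int, zero: bool) -> int:
    """BEQ takes the branch only when the ALU reports Zero (operands equal)."""
    return branch_target(pc, offset16) if zero else (pc + 4) & 0xFFFFFFFF

# BEQ $2, $1, 42 located at a hypothetical address 0x00400000.
taken = next_pc(0x00400000, 42, zero=True)
not_taken = next_pc(0x00400000, 42, zero=False)
```

An offset of -1 (0xFFFF) branches back to the BEQ itself, since the offset is relative to PC + 4.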
5.2
Integrating the R-types and Memory
[Figure: combined R-type and memory datapath. One multiplexor selects the second ALU operand (read reg data B or the sign-extended immediate); another selects the register write data (ALU result or memory read data).]
• R-types and Load/Stores are similar in many respects
• Differences:
• 2nd ALU source: R-types use a register, I-types use the immediate
• Register write data: R-types use the ALU result, loads use the memory read data
• Mux the conflicting datapaths together
5.3
Adding the instruction memory
Simply add the instruction memory
and PC to the beginning of the datapath.
[Figure: the combined datapath with the PC, the PC + 4 adder, and the instruction memory (Instruction [31-0]) placed in front of the register file, ALU, and data memory.]
Separate Instruction and Data memories are needed in order to allow
the entire datapath to complete its job in a single clock cycle.
5.3
Adding the Branch Datapath
[Figure: the datapath with the branch logic added. A second adder sums PC + 4 and the sign-extended offset shifted left by 2; a multiplexor selects either PC + 4 or the branch target as the next PC.]
Now we have the datapath for R-type, I-type, and branch instructions.
On to the control logic!
5.3
When does everything happen?
Single-Cycle Design
[Figure: the full datapath with clk driving the state elements (PC, register file, data memory).]
• Combinational logic: Just does it! Outputs are always just a function of the inputs (with some delay)
• Registers: Written at the end of the clock cycle (rising-edge triggered)
5.3
Example
• Suppose it takes:
• memory 100 nsec to read a word,
• the ALU and adders take 4 nsec,
• the register file can be read or written in 1 nsec,
• the PC can be read or written in 0.2 nsec,
• all multiplexors take 0.1 nsec.
• Assume everything else takes 0 time (control, shift, sign extend, wires, etc.).
• How long will it take to execute an add instruction?
• How long will it take to execute a lw instruction?
• How long will it take to execute a beq instruction?
• How long will it take to execute a j instruction?
What do we need to control?
[Figure: the full datapath annotated with the decisions the control logic must make:]
• Registers: Should we write data?
• Mux: Result from ALU or Memory?
• Mux: Are we branching or not?
• Mux: Where does the 2nd ALU operand come from?
• ALU: What is the operation?
• Memory: Read/Write/neither?
Almost all of the information we need is in the instruction!
5.3
The ALU
• The ALU is stuck right in the middle of everything...
• It must:
• Add, Subtract, And, or Or for arithmetic instructions
• Subtract for a branch on equal
• Subtract and set for a SLT
• Add for a memory access

Function   BInvert  CarryIn  Op   Result
And        0        0        00   R = A AND B
Or         0        0        01   R = A OR B
Add        0        0        10   R = A + B
Subtract   1        1        10   R = A - B
SLT        1        1        11   R = 1 if A < B, 0 if A ≥ B

BInvert and CarryIn are always the same: Combine into one signal called “sub”
[Figure: 1-bit ALU slice with inputs A, B, BInvert, CarryIn, Less, and Operation. A 4-input multiplexor (0: AND, 1: OR, 2: adder, 3: Less) selects Result; the adder produces CarryOut.]
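A behavioral sketch of the ALU function table, using the combined “sub” signal for both BInvert and CarryIn. The names `alu` and `to_signed` are assumptions, and SLT is computed here with a direct signed comparison as a simplification of the hardware's Less input.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a signed two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def alu(a: int, b: int, sub: int, op: int):
    """Return (result, zero) for the given 'sub' bit and 2-bit 'op' code."""
    a &= MASK32
    b &= MASK32
    b_in = (~b & MASK32) if sub else b   # BInvert: invert B when sub = 1
    total = (a + b_in + sub) & MASK32    # CarryIn = sub, so this is A - B when sub = 1
    if op == 0b00:
        result = a & b_in                # And
    elif op == 0b01:
        result = a | b_in                # Or
    elif op == 0b10:
        result = total                   # Add or Subtract
    elif op == 0b11:
        # SLT: simplified to a direct signed comparison in this sketch.
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError("op must be a 2-bit value")
    return result, result == 0

difference, is_zero = alu(5, 5, sub=1, op=0b10)
```

The Zero flag from a subtraction is what a branch on equal tests: it is set exactly when the two operands match.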
5.3
Setting the ALU controls
• The instruction Opcode and Function give us the info we need
• For R-type instructions, Opcode is zero, function code
determines ALU controls
• For I-type instructions, Opcode determines ALU controls
New control signal: ALUOp is 00 for memory, 01 for Branch, and 10 for R-type
Instruction    Opcode   ALUOp   Funct. Code   ALU action   ALU control (sub op)
add            R-type   10      100000        add          0 10
sub            R-type   10      100010        subtract     1 10
and            R-type   10      100100        and          0 00
or             R-type   10      100101        or           0 01
SLT            R-type   10      101010        SLT          1 11
load word      LW       00      xxxxxx        add          0 10
store word     SW       00      xxxxxx        add          0 10
branch equal   BEQ      01      xxxxxx        subtract     1 10
5.3
Controlling the ALU
For ALUOp = 00 or 01, the function code is unused.
ALUOp is determined by the Opcode; separate logic will generate ALUOp.

ALUOp   F5  F4  F3  F2  F1  F0   Function   ALU Ctrl (sub op)
00      x   x   x   x   x   x    Add        0 10
x1      x   x   x   x   x   x    Sub        1 10
1x      x   x   0   0   0   0    Add        0 10
1x      x   x   0   0   1   0    Sub        1 10
1x      x   x   0   1   0   0    And        0 00
1x      x   x   0   1   0   1    Or         0 01
1x      x   x   1   0   1   0    SLT        1 11

Since ALUOp can only be 00, 01, or 10, we don’t care what ALUOp0 is when ALUOp1 is 1 (and we don’t care what ALUOp1 is when ALUOp0 is 1).
[Figure: gate-level ALU control logic with inputs ALUOp1, ALUOp0, F3, F2, F1, F0 and outputs A2, A1, A0.]
A 6-input truth table: use standard minimization techniques.
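The truth table can be written as a small lookup, shown here as a behavioral sketch and not as the minimized gate logic. The function name `alu_control` and the (sub, op) tuple return value are assumptions for illustration.

```python
def alu_control(aluop: int, funct: int):
    """Map the 2-bit ALUOp and 6-bit function code to (sub, op)."""
    aluop1 = (aluop >> 1) & 1
    aluop0 = aluop & 1
    if aluop1 == 0 and aluop0 == 0:
        return (0, 0b10)              # ALUOp = 00 (memory): add
    if aluop0 == 1:
        return (1, 0b10)              # ALUOp = x1 (branch): subtract
    # ALUOp = 1x (R-type): only F3..F0 matter; F5 and F4 are don't-cares.
    r_type = {
        0b0000: (0, 0b10),            # add
        0b0010: (1, 0b10),            # sub
        0b0100: (0, 0b00),            # and
        0b0101: (0, 0b01),            # or
        0b1010: (1, 0b11),            # slt
    }
    key = funct & 0xF
    if key not in r_type:
        raise ValueError("function code not covered by the truth table")
    return r_type[key]

slt_controls = alu_control(0b10, 0b101010)
```

Checking ALUOp0 before ALUOp1 mirrors the don't-care rows: ALUOp = 11 never occurs, so the two tests cannot conflict.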
5.3
Decoding the Instruction - Data
The instruction holds the key to all of the data signals
R-type:
Bits    Field      Use
31-26   Opcode     To ctrl logic
25-21   RS         Read reg. A
20-16   RT         Read reg. B
15-11   RD         Write reg.
10-6    ShAmt      Not used
5-0     Function   To ALU Control

Memory, Branch:
Bits    Field            Use
31-26   Opcode           To ctrl logic
25-21   RS               Read reg. A
20-16   RT               Write reg. / Read reg. B
15-0    Immediate Data   Memory address or Branch Offset
One problem - Write register number must come from two different places.
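A minimal sketch of splitting the 32-bit instruction into these fields, including the choice of write register. The names `decode` and `write_register` are assumptions, and the `reg_dest` argument follows the RegDest encoding used elsewhere in these slides (0: Rt, 1: Rd).

```python
def decode(instr: int) -> dict:
    """Slice a 32-bit instruction word into its bit fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,   # bits 31-26
        "rs":     (instr >> 21) & 0x1F,   # bits 25-21
        "rt":     (instr >> 16) & 0x1F,   # bits 20-16
        "rd":     (instr >> 11) & 0x1F,   # bits 15-11 (R-type only)
        "shamt":  (instr >> 6) & 0x1F,    # bits 10-6  (R-type only)
        "funct":  instr & 0x3F,           # bits 5-0   (R-type only)
        "imm":    instr & 0xFFFF,         # bits 15-0  (memory, branch)
    }

def write_register(fields: dict, reg_dest: int) -> int:
    """Pick the write register number: Rd when reg_dest = 1, else Rt."""
    return fields["rd"] if reg_dest else fields["rt"]

add_fields = decode(0x01285020)   # add $10, $9, $8
lw_fields = decode(0x8D090004)    # lw  $9, 4($8)
```

All fields are extracted for every instruction; the control signals decide which ones are actually used.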
5.3
Instruction Decoding
We can decode the data simply by dividing up the instruction bus:
• Opcode: [31-26], to the control logic
• Read Reg A: Rs [25-21]
• Read Reg B: Rt [20-16]
• Write Reg: Either Rd [15-11] or Rt [20-16]
• Immediate Data: [15-0]
[Figure: the full datapath with the instruction bus split into these fields; a multiplexor selects Rd or Rt as the write register number.]
5.3
Control Signals
[Figure: the full datapath with the control unit (Ctrl, driven by Op [31-26]) and the ALU control (driven by FC [5-0], 6 bits) added. Control signals and when they are asserted:]
• RegWrite: Load, R-type
• RegDest: R-type
• ALUSrc: selects the 2nd ALU operand
• MemRead: Load
• MemWrite: Store
• MemToReg: Load
• PCSrc: BEQ and Zero
• ALUOp: 00: Memory, 01: Branch, 10: R-type
ALU Control: a function of ALUOp and the function code
5.3
Inside the control oval
Signal encodings: ALUSrc (0: Reg, 1: Imm); MemToReg (0: ALU, 1: Mem); RegDest (0: Rt, 1: Rd); PCSrc (1: Branch); ALUOp (00: Mem, 01: Branch, 10: R-type)

Instruction   Opcode   RegWrite   ALUSrc   MemToReg   RegDest   MemRead   MemWrite   PCSrc   ALUOp
R-format      000000   1          0        0          1         0         0          0       10
LW            100011   1          1        1          0         1         0          0       00
SW            101011   0          1        x          x         0         1          0       00
BEQ           000100   0          0        x          x         0         0          1       01
• This control logic can be decoded in several ways:
• Random logic, PLA, PAL
• Just build hardware that looks for the 4 opcodes
• For each opcode, assert the appropriate signals
Note: BEQ must also check the zero output of the ALU...
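One way to picture "hardware that looks for the 4 opcodes" is a lookup table keyed by opcode. This is a sketch only: the `Control` tuple, its field names, and the use of `None` for don't-care entries are assumptions, and the table's PCSrc column is called `branch` here because it still has to be combined with the ALU's Zero output.

```python
from typing import NamedTuple, Optional

class Control(NamedTuple):
    reg_write: int
    alu_src: int
    mem_to_reg: Optional[int]   # None models a don't-care (x)
    reg_dest: Optional[int]     # None models a don't-care (x)
    mem_read: int
    mem_write: int
    branch: int
    alu_op: int

X = None
CONTROL_TABLE = {
    0b000000: Control(1, 0, 0, 1, 0, 0, 0, 0b10),   # R-format
    0b100011: Control(1, 1, 1, 0, 1, 0, 0, 0b00),   # LW
    0b101011: Control(0, 1, X, X, 0, 1, 0, 0b00),   # SW
    0b000100: Control(0, 0, X, X, 0, 0, 1, 0b01),   # BEQ
}

def main_control(opcode: int) -> Control:
    """Return the control signals for one of the four supported opcodes."""
    if opcode not in CONTROL_TABLE:
        raise ValueError("opcode not supported by this datapath")
    return CONTROL_TABLE[opcode]

def pc_src(ctrl: Control, zero: int) -> int:
    """The branch is taken only when the instruction is BEQ and Zero is set."""
    return ctrl.branch & zero

lw_signals = main_control(0b100011)
```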
5.3
Control Signals
We must AND BEQ and Zero to produce PCSrc.
[Figure: the full datapath with the control unit generating BEQ, MemToReg, MemRead, MemWrite, ALUOp, ALUSrc, RegWrite, and RegDest; an AND gate combines BEQ with the ALU's Zero output to drive PCSrc.]
5.3
Jumping
[Figure: the datapath with jump support. The 26-bit field J [25-0] is shifted left by 2 to form 28 bits, then concatenated with bits [31-28] of PC + 4 to form the 32-bit jump address; a multiplexor controlled by the Jump signal selects it as the next PC.]
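The jump-address construction (shift the 26-bit field left by 2, then concatenate with the top 4 bits of PC + 4) can be sketched as below. The function name `jump_target` and the sample addresses are assumptions for illustration.

```python
def jump_target(pc: int, target26: int) -> int:
    """Concatenate (PC + 4)[31-28] with the 26-bit field shifted left by 2."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper4 = pc_plus_4 & 0xF0000000          # bits 31-28 of PC + 4
    lower28 = (target26 & 0x03FFFFFF) << 2   # 26 bits become 28 bits
    return upper4 | lower28

# A jump at a hypothetical address 0x10000008 with J [25-0] = 3.
target = jump_target(0x10000008, 3)
```

Unlike a branch, the result is not relative to the PC: only the top 4 bits carry over, so a jump stays inside the current 256 MB region.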
5.3
Performance
What major functional units are used by different instructions?

Instruction   Functional units used                                             Time
R-type        Instr. Fetch, Register Read, ALU, Register Write                  6ns
LW            Instr. Fetch, Register Read, ALU, Memory Read, Register Write     8ns
SW            Instr. Fetch, Register Read, ALU, Memory Write                    7ns
Branch        Instr. Fetch, Register Read, ALU                                  5ns
Jump          Instr. Fetch                                                      2ns

Assume the following times:
• Memory Access: 2ns
• ALU: 2ns
• Registers: 1ns
Since the longest time is 8ns (LW), the cycle time must be at least 8ns.
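The per-instruction times follow from summing the delays of the units each instruction uses. A small sketch of that arithmetic, with dictionary and function names that are assumptions of this example:

```python
# Unit delays in nanoseconds, as assumed on the slide.
DELAY_NS = {"mem": 2, "alu": 2, "reg": 1}

# Functional units used by each instruction class, in order.
UNITS_USED = {
    "R-type": ["mem", "reg", "alu", "reg"],          # fetch, reg read, ALU, reg write
    "LW":     ["mem", "reg", "alu", "mem", "reg"],   # adds memory read and reg write
    "SW":     ["mem", "reg", "alu", "mem"],          # adds memory write
    "Branch": ["mem", "reg", "alu"],
    "Jump":   ["mem"],                               # instruction fetch only
}

def instruction_time(kind: str) -> int:
    """Sum the delays of the units on this instruction's path."""
    return sum(DELAY_NS[unit] for unit in UNITS_USED[kind])

times = {kind: instruction_time(kind) for kind in UNITS_USED}
# A single-cycle design must fit the slowest instruction in one cycle.
min_cycle_time = max(times.values())
```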
Example
• Calculate the execution times for the following program in a single-cycle datapath with a cycle time of 50 ns
main:
        add  $9, $0, $0         # clear $9
        lw   $8, Tonto($9)      # put Tonto[0] in $8
        addi $9, $9, 4          # increment $9
        lw   $10, Tonto($9)     # put Tonto[1] in $10
        add  $11, $10, $8
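One way to set up the calculation, assuming each of the five listed lines counts as exactly one machine instruction (an assembler may expand a label-based `lw` into more than one instruction, which would change the count). In a single-cycle datapath every instruction takes one full cycle, so:

```latex
T_{\text{exec}} = N_{\text{instr}} \times T_{\text{cycle}} = 5 \times 50\,\text{ns} = 250\,\text{ns}
```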
Example 2
Calculate the execution times for the following program in a single-cycle datapath with a cycle time of 50 ns
        .data
ARRAY:  .word 3, 5, 7, 9, 2     # random values
SUM:    .word 0                 # initialize sum to zero
        .text
main:
        addi $6, $0, 5          # initialize loop counter to 5
        addi $7, $0, 0          # initialize array index to zero
        addi $8, $0, 0          # set $8 (sum temp) to zero
REPEAT:
        lw   $5, ARRAY($7)      # R5 = ARRAY[i]
        add  $8, $8, $5         # SUM += ARRAY[i]
        addi $7, $7, 4          # increment index (i++)
        addi $6, $6, -1         # decrement loop counter
        bne  $6, $0, REPEAT     # check if 5 repetitions
        sw   $8, SUM($0)        # copy sum to memory
        addi $v0, $0, 10        # exit program
        syscall
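For a program with a loop, the count that matters is the number of instructions actually executed. The sketch below assumes every listed line (including `syscall` and the label-based `lw`/`sw`) counts as one single-cycle instruction, which a real assembler or system may not honor; the function name is made up for this example.

```python
CYCLE_NS = 50  # cycle time given in the example

def dynamic_instruction_count(setup: int, loop_body: int,
                              iterations: int, teardown: int) -> int:
    """Instructions executed = setup + loop body * iterations + teardown."""
    return setup + loop_body * iterations + teardown

# 3 addi before REPEAT; 5 instructions per pass (lw, add, addi, addi, bne);
# 5 passes; then sw, addi, syscall.
executed = dynamic_instruction_count(setup=3, loop_body=5, iterations=5, teardown=3)
total_ns = executed * CYCLE_NS
```

Under these assumptions the program executes 31 instructions, for a total of 1550 ns.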