CS152 Lecture 8


EEM 486: Computer Architecture
Designing a Single Cycle Datapath
The Big Picture: Where are We Now?

The Five Classic Components of a Computer
◦ Control (part of the Processor)
◦ Datapath (part of the Processor)
◦ Memory
◦ Input
◦ Output
Today’s Topic: Design a Single Cycle Processor
The Big Picture: The Performance Perspective

Performance of a machine is determined by:
◦ Instruction count
◦ Clock cycle time
◦ Clock cycles per instruction (CPI)

Processor design (datapath and control) will determine:
◦ Clock cycle time
◦ Clock cycles per instruction

Today:
◦ Single cycle processor:
   ▪ Advantage: One clock cycle per instruction
   ▪ Disadvantage: long cycle time
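
The three performance factors multiply together: execution time = instruction count x CPI x clock cycle time. The following is a minimal Python sketch of that product; the function name and the numbers are hypothetical, chosen only to illustrate the single cycle case where CPI is 1.

```python
def execution_time_ns(inst_count, cpi, cycle_time_ns):
    """Execution time = instruction count x CPI x clock cycle time."""
    return inst_count * cpi * cycle_time_ns

# Hypothetical program and clock, chosen only for illustration.
inst_count = 1_000_000
# Single cycle processor: CPI is exactly 1, so the cycle time dominates.
total_ns = execution_time_ns(inst_count, cpi=1, cycle_time_ns=8)
print(total_ns)  # 8000000
```

With CPI fixed at 1, the only lever left to the processor designer in this model is the clock cycle time.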
How to Design a Processor: Step-by-Step
1. Analyze instruction set => datapath requirements
   ◦ Meaning of each instruction is given by the register transfers
   ◦ Datapath must include storage element for ISA registers (possibly more)
   ◦ Datapath must support each register transfer
2. Select set of datapath components and establish clocking methodology
3. Assemble datapath meeting the requirements
4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer
5. Assemble the control logic
The MIPS Instruction Formats

The three instruction formats (bit 31 on the left, bit 0 on the right):
◦ R-type
   bits:   31-26    25-21    20-16    15-11    10-6     5-0
   field:  op       rs       rt       rd       shamt    funct
   width:  6 bits   5 bits   5 bits   5 bits   5 bits   6 bits
◦ I-type
   bits:   31-26    25-21    20-16    15-0
   field:  op       rs       rt       immediate
   width:  6 bits   5 bits   5 bits   16 bits
◦ J-type
   bits:   31-26    25-0
   field:  op       target address
   width:  6 bits   26 bits

The different fields are:
◦ op: operation of the instruction
◦ rs, rt, rd: the source and destination register specifiers
◦ shamt: shift amount
◦ funct: selects the variant of the operation in the “op” field
◦ address / immediate: address offset or immediate value
◦ target address: target address of the jump instruction
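
Because every field sits at a fixed bit position, extracting the fields needs only shifts and masks. The sketch below is one way to write that in Python; the function name `decode` and the dictionary layout are illustrative choices, not part of the lecture.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into all possible fields."""
    return {
        "op":     (word >> 26) & 0x3F,   # bits 31-26
        "rs":     (word >> 21) & 0x1F,   # bits 25-21
        "rt":     (word >> 16) & 0x1F,   # bits 20-16
        "rd":     (word >> 11) & 0x1F,   # bits 15-11 (R-type)
        "shamt":  (word >> 6) & 0x1F,    # bits 10-6  (R-type)
        "funct":  word & 0x3F,           # bits 5-0   (R-type)
        "imm16":  word & 0xFFFF,         # bits 15-0  (I-type)
        "target": word & 0x3FFFFFF,      # bits 25-0  (J-type)
    }

# Build an R-type word by hand: op=0, rs=1, rt=2, rd=3, shamt=0, funct=0x21.
word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0x21
fields = decode(word)
print(fields["rs"], fields["rt"], fields["rd"])  # 1 2 3
```

The decoder extracts every field unconditionally; which ones are meaningful depends on the format selected by `op`.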
Step 1a: The MIPS-lite Subset for Today

ADD/SUB (R-type: op | rs | rt | rd | shamt | funct)
◦ addU rd, rs, rt
◦ subU rd, rs, rt

OR Immediate (I-type: op | rs | rt | immediate)
◦ ori rt, rs, imm16

LOAD/STORE Word (I-type: op | rs | rt | immediate)
◦ lw rt, rs, imm16
◦ sw rt, rs, imm16

BRANCH (I-type: op | rs | rt | immediate)
◦ beq rs, rt, imm16
Logical Register Transfers

RTL gives the meaning of the instructions

All start by fetching the instruction
op | rs | rt | rd | shamt | funct = MEM[ PC ]
op | rs | rt | Imm16 = MEM[ PC ]

inst     Register Transfers
ADDU     R[rd] <– R[rs] + R[rt];  PC <– PC + 4
SUBU     R[rd] <– R[rs] - R[rt];  PC <– PC + 4
ORi      R[rt] <– R[rs] | zero_ext(Imm16);  PC <– PC + 4
LOAD     R[rt] <– MEM[ R[rs] + sign_ext(Imm16) ];  PC <– PC + 4
STORE    MEM[ R[rs] + sign_ext(Imm16) ] <– R[rt];  PC <– PC + 4
BEQ      if ( R[rs] == R[rt] ) then PC <– PC + 4 + [sign_ext(Imm16) || 00]
         else PC <– PC + 4
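
The register transfers above can be executed directly. The following Python sketch applies one instruction's transfers per call. Several things here are assumptions made for illustration: instructions are written as tuples rather than encoded words, registers are a list, memory is a dictionary keyed by byte address, and register 0 is not treated as hardwired to zero.

```python
MASK32 = 0xFFFFFFFF

def sign_ext(imm16):
    """Interpret a 16-bit field as a signed value."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def zero_ext(imm16):
    """Interpret a 16-bit field as an unsigned value."""
    return imm16 & 0xFFFF

def step(R, MEM, pc, inst):
    """Apply one instruction's register transfers and return the next PC."""
    name = inst[0]
    if name == "ADDU":
        _, rd, rs, rt = inst
        R[rd] = (R[rs] + R[rt]) & MASK32
    elif name == "SUBU":
        _, rd, rs, rt = inst
        R[rd] = (R[rs] - R[rt]) & MASK32
    elif name == "ORI":
        _, rt, rs, imm = inst
        R[rt] = R[rs] | zero_ext(imm)
    elif name == "LOAD":
        _, rt, rs, imm = inst
        R[rt] = MEM.get((R[rs] + sign_ext(imm)) & MASK32, 0)
    elif name == "STORE":
        _, rt, rs, imm = inst
        MEM[(R[rs] + sign_ext(imm)) & MASK32] = R[rt]
    elif name == "BEQ":
        _, rs, rt, imm = inst
        if R[rs] == R[rt]:
            # sign_ext(Imm16) || 00 is the signed offset times 4.
            return (pc + 4 + (sign_ext(imm) << 2)) & MASK32
    else:
        raise ValueError(f"unknown instruction {name}")
    return (pc + 4) & MASK32

# A Python list stands in for instruction memory (one entry per 4 bytes).
program = [
    ("ORI", 1, 0, 5),      # R[1] = 5
    ("ORI", 2, 0, 7),      # R[2] = 7
    ("ADDU", 3, 1, 2),     # R[3] = 12
    ("STORE", 3, 0, 16),   # MEM[16] = 12
    ("LOAD", 4, 0, 16),    # R[4] = 12
    ("BEQ", 3, 4, 1),      # equal, so skip the next instruction
    ("SUBU", 3, 3, 3),     # skipped
    ("SUBU", 5, 3, 1),     # R[5] = 12 - 5 = 7
]
R, MEM, pc = [0] * 32, {}, 0
while pc // 4 < len(program):
    pc = step(R, MEM, pc, program[pc // 4])
print(R[3], R[4], R[5])  # 12 12 7
```

Every branch of `step` either returns a branch target or falls through to PC + 4, mirroring the "PC <– PC + 4" that ends each non-branch transfer.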
Step 1: Requirements of the Instruction Set

Memory
◦ instruction & data

Registers (32 x 32)
◦ read RS
◦ read RT
◦ Write RT or RD

PC

Extender

Add and Sub: two registers, or a register and an extended immediate

Logical Or of a register and extended immediate

Add 4 or extended immediate to PC
Step 2: Components of the Datapath

◦ Combinational Elements
◦ Storage Elements
◦ Clocking methodology
Combinational Logic Elements (Basic Building Blocks)
◦ Adder: 32-bit inputs A and B with CarryIn; 32-bit Sum output and Carry
◦ MUX: 32-bit inputs A and B with Select; 32-bit output Y
◦ ALU: 32-bit inputs A and B with OP; 32-bit Result output
Storage Element: Register (Basic Building Block)

Register
◦ Similar to the D Flip Flop except
   ▪ N-bit input and output
   ▪ Write Enable input
◦ Write Enable:
   ▪ Negated (0): Data Out will not change
   ▪ Asserted (1): Data Out becomes Data In
[Figure: register with N-bit Data In, N-bit Data Out, Write Enable, and Clk inputs.]
Storage Element: Register File

Register File consists of 32 registers:
◦ Two 32-bit output busses: busA and busB
◦ One 32-bit input bus: busW

Register is selected by:
◦ RA (number) selects the register to put on busA (data)
◦ RB (number) selects the register to put on busB (data)
◦ RW (number) selects the register to be written via busW (data) when Write Enable is 1

Clock input (CLK)
◦ The CLK input is a factor ONLY during write operation
◦ During read operation, behaves as a combinational logic block:
   ▪ RA or RB valid => busA or busB valid after “access time.”
[Figure: 32 x 32-bit register file with 5-bit RW, RA, and RB selects, Write Enable, Clk, 32-bit busW input, and 32-bit busA and busB outputs.]
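
The split between combinational reads and clocked writes can be modeled in a few lines. This is a behavioral Python sketch only: the class and method names are hypothetical, and the access time is not modeled.

```python
class RegisterFile:
    """32 x 32-bit register file: combinational reads, clocked write."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: return (busA, busB) for selects RA and RB."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        """At the clock edge, write busW into register RW if enabled."""
        if write_enable:
            self.regs[rw] = bus_w & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(rw=3, bus_w=42, write_enable=1)
rf.clock_edge(rw=4, bus_w=99, write_enable=0)  # ignored: Write Enable is 0
bus_a, bus_b = rf.read(ra=3, rb=4)
print(bus_a, bus_b)  # 42 0
```

Only `clock_edge` changes state, which matches the statement that CLK matters only for writes.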
Storage Element: Idealized Memory

Memory (idealized)
◦ One input bus: Data In
◦ One output bus: Data Out

Memory word is selected by:
◦ Address selects the word to put on Data Out
◦ Write Enable = 1: address selects the memory word to be written via the Data In bus

Clock input (CLK)
◦ The CLK input is a factor ONLY during write operation
◦ During read operation, behaves as a combinational logic block:
   ▪ Address valid => Data Out valid after “access time.”
[Figure: memory with Address, 32-bit Data In, Write Enable, and Clk inputs and a 32-bit DataOut output.]
Clocking Methodology (Just in case)
[Figure: timing diagram. Around each Clk edge, inputs must be stable for the Setup time before the edge and the Hold time after it; elsewhere they are “Don’t Care”.]

All storage elements are clocked by the same clock edge
Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew
(CLK-to-Q + Shortest Delay Path - Clock Skew) > Hold Time
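
Both timing constraints are simple arithmetic on component delays. The Python sketch below evaluates them; the function names and the picosecond values are hypothetical and serve only to show how the two formulas are used.

```python
def min_cycle_time(clk_to_q, longest_path, setup, skew):
    """Cycle Time = CLK-to-Q + Longest Delay Path + Setup + Clock Skew."""
    return clk_to_q + longest_path + setup + skew

def hold_ok(clk_to_q, shortest_path, skew, hold):
    """Hold check: (CLK-to-Q + Shortest Delay Path - Clock Skew) > Hold Time."""
    return (clk_to_q + shortest_path - skew) > hold

# Hypothetical delays in picoseconds.
cycle_ps = min_cycle_time(clk_to_q=50, longest_path=800, setup=30, skew=20)
safe = hold_ok(clk_to_q=50, shortest_path=40, skew=20, hold=25)
print(cycle_ps, safe)  # 900 True
```

Note that clock skew hurts both checks: it lengthens the minimum cycle time and reduces the margin on the hold constraint.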
Step 3: Assemble Datapath Meeting Requirements

Register Transfer Requirements => Datapath Assembly
◦ Instruction Fetch
◦ Read Operands and Execute Operation
3a: Overview of the Instruction Fetch Unit

The common RTL operations
◦ Fetch the Instruction: mem[PC]
◦ Update the program counter:
   ▪ Sequential Code: PC <– PC + 4
   ▪ Branch and Jump: PC <– “something else”
[Figure: the PC, clocked by Clk, supplies the Address to the Instruction Memory, which outputs the 32-bit Instruction Word; Next Address Logic computes the new PC.]
3b: Add & Subtract

R[rd] <– R[rs] op R[rt]
Example: addU rd, rs, rt
◦ Ra, Rb, and Rw come from instruction’s rs, rt, and rd fields
◦ ALUctr and RegWr: control logic after decoding the instruction
[Figure: R-type format (op, rs, rt, rd, shamt, funct) and datapath. Rs, Rt, and Rd drive the 5-bit Ra, Rb, and Rw inputs of the 32 x 32-bit register file; 32-bit busA and busB feed the ALU, which is controlled by ALUctr; the 32-bit Result returns on busW and is written under RegWr and Clk.]
Register-Register Timing: One complete cycle
[Figure: timing diagram for one clock cycle, showing each signal moving from its old value to its new value.]
◦ PC: new value after Clk-to-Q
◦ Rs, Rt, Rd, Op, Func: new values after the Instruction Memory Access Time
◦ ALUctr, RegWr: new values after the Delay through Control Logic
◦ busA, busB: new values after the Register File Access Time
◦ busW: new value after the ALU Delay
◦ Register Write occurs at the clock edge that ends the cycle, once busW holds its new value
3c: Logical Operations with Immediate

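R[rt] <– R[rs] op ZeroExt[imm16]
[Figure: I-type format (op, rs, rt, immediate) and datapath. A Mux controlled by RegDst selects Rd or Rt as Rw; Rs drives Ra. ZeroExt widens imm16 from 16 to 32 bits, and a Mux controlled by ALUSrc selects busB or the extended immediate as the second ALU input. The ALU is controlled by ALUctr; its 32-bit Result returns on busW and is written under RegWr and Clk.]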
3d: Load Operations

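R[rt] <– Mem[R[rs] + SignExt[imm16]]
E.g.: lw rt, rs, imm16
[Figure: I-type format and datapath for load. The Extender, controlled by ExtOp, widens imm16 from 16 to 32 bits; the ALUSrc Mux passes it to the ALU, which computes the address from busA and the extended immediate. The address drives Adr of the Data Memory (WrEn driven by MemWr, clocked by Clk). A Mux controlled by W_Src selects the ALU result or the memory output for busW; a Mux controlled by RegDst selects Rd or Rt as Rw.]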
3e: Store Operations

Mem[ R[rs] + SignExt[imm16] ] <– R[rt]
E.g.: sw rt, rs, imm16
[Figure: the same datapath as for load, with busB connected to the 32-bit Data In of the Data Memory; MemWr drives WrEn to enable the memory write. The control points shown are RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, and W_Src.]
3f: The Branch Instruction
beq rs, rt, imm16 (I-type: op | rs | rt | immediate)
◦ mem[PC]: fetch the instruction from memory
◦ Equal <– R[rs] == R[rt]: calculate the branch condition
◦ Calculate the next instruction’s address:
   ▪ if (Equal): PC <– PC + 4 + ( SignExt(imm16) x 4 )
   ▪ else: PC <– PC + 4
Datapath for Branch Operations
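beq rs, rt, imm16: the datapath generates the condition (Equal)
[Figure: I-type format and branch datapath. Rs and Rt drive Ra and Rb; busA and busB feed an “Equal?” comparator that produces Cond. In the instruction fetch unit, one Adder computes PC + 4 and a second Adder adds the PC Ext output (imm16 extended, with 00 appended); a Mux controlled by nPC_sel selects the next PC, which supplies the Inst Address.]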
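
The next-address logic for branches amounts to two adders and a mux. The Python sketch below models it under one stated assumption: the mux takes the branch target only when both nPC_sel and Equal are asserted. The lecture defers the control details, so that gating and the function names are illustrative.

```python
def pc_ext(imm16):
    """Sign-extend imm16 and append two zero bits (multiply by 4)."""
    signed = imm16 - 0x10000 if imm16 & 0x8000 else imm16
    return signed * 4

def next_pc(pc, imm16, npc_sel, equal):
    """Model the two adders and the mux that choose the next PC."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF                         # first adder
    branch_target = (pc_plus_4 + pc_ext(imm16)) & 0xFFFFFFFF  # second adder
    take_branch = npc_sel and equal                           # assumed mux select
    return branch_target if take_branch else pc_plus_4

print(hex(next_pc(0x100, 3, 1, 1)))       # 0x110: forward by 3 instructions
print(hex(next_pc(0x100, 0xFFFF, 1, 1)))  # 0x100: imm16 = -1 branches to itself
print(hex(next_pc(0x100, 3, 1, 0)))       # 0x104: condition false, fall through
```

The offset is relative to PC + 4, not PC, which is why an immediate of -1 lands back on the branch itself.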

Putting it All Together: A Single Cycle Datapath
[Figure: the complete single cycle datapath. Instruction<31:0> from the Inst Memory is split into Rs <21:25>, Rt <16:20>, Rd <11:15>, and Imm16 <0:15>. The instruction fetch unit (PC, two Adders, PC Ext, Mux) is steered by nPC_sel and Equal. The register file, Extender, ALU, and Data Memory are steered by RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemWr, and MemtoReg.]
A Single Cycle Datapath
[Figure: the single cycle datapath with its control unit. Control decodes Instruction [31-26] and drives RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite. Instruction [25-21] and [20-16] select Read register 1 and Read register 2; a Mux chooses Instruction [20-16] or [15-11] as the Write register. Instruction [15-0] is sign-extended from 16 to 32 bits; it feeds the ALU through a Mux and, after Shift left 2, the branch target adder. ALU control uses Instruction [5-0]. The ALU produces Zero and the ALU result, which addresses the Data memory; a final Mux selects the ALU result or Read data as the register Write data.]
An Abstract View of the Critical Path

Register file and ideal memory:
◦ The CLK input is a factor ONLY during write operation
◦ During read operation, behave as combinational logic:
   ▪ Address valid => Output valid after “access time.”

Critical Path (Load Operation) =
   PC’s Clk-to-Q +
   Instruction Memory’s Access Time +
   Register File’s Access Time +
   ALU to Perform a 32-bit Add +
   Data Memory Access Time +
   Setup Time for Register File Write +
   Clock Skew
[Figure: PC -> Ideal Instruction Memory -> register file (Rd, Rs, Rt, Imm16) -> ALU (inputs A and B) -> Ideal Data Memory, with Clk on the PC, the register file, and the data memory.]
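
Summing the terms of the load path gives the minimum clock cycle time for the whole machine. The Python sketch below adds them up; every delay value is hypothetical (in picoseconds) and only the list of components comes from the lecture.

```python
# Components of the load critical path, with hypothetical delays in ps.
LOAD_PATH = [
    ("PC Clk-to-Q", 50),
    ("Instruction memory access", 200),
    ("Register file access", 100),
    ("ALU 32-bit add", 150),
    ("Data memory access", 200),
    ("Register file write setup", 30),
    ("Clock skew", 20),
]

def critical_path_ps(path):
    """Total delay along a path given as (name, delay_ps) pairs."""
    return sum(delay for _, delay in path)

cycle_time_ps = critical_path_ps(LOAD_PATH)
print(cycle_time_ps)  # 750
```

Because every instruction takes one cycle, the slowest instruction (here, load) sets the cycle time for all of them.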
An Abstract View of the Implementation
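[Figure: the Control block takes the Instruction from the Ideal Instruction Memory and sends Control Signals to the Datapath; the Datapath returns Conditions. The Datapath contains the PC and Next Address logic, the 32 x 32-bit register file (Rw, Ra, Rb selected by Rd, Rs, Rt), the ALU (inputs A and B), and the Ideal Data Memory (Address, Data In, Data Out).]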
Steps 4 & 5: Implement the control
Next time
Summary

5 steps to design a processor
◦ 1. Analyze instruction set => datapath requirements
◦ 2. Select set of datapath components & establish clock methodology
◦ 3. Assemble datapath meeting the requirements
◦ 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer
◦ 5. Assemble the control logic

MIPS makes it easier
◦ Instructions same size
◦ Source registers always in same place
◦ Immediates same size, location
◦ Operations always on registers/immediates

Single cycle datapath => CPI = 1, but a long clock cycle time (CCT)

Next time: implementing control