Building a CPU
• We’ve built a small ALU
  • Add, Subtract, SLT, And, Or
  • Could figure out Multiply and Divide...
• What about the rest?
  • How do we deal with memory and registers?
  • What about control operations (branches)?
  • How do we interpret instructions?
• The whole thing...
  • A CPU’s datapath deals with moving data around
  • A CPU’s control manages the data

5.1 Datapath Overview
[Figure: datapath overview. The PC addresses the instruction memory (Instruction [31-0]); the instruction selects registers; the registers feed the ALU; the ALU result addresses the data memory.]
• Current instruction: PC
• Instructions:
  • R-type: 3 registers
  • I-type: 2 registers, data
• ALU computes on:
  • R-type: 2 registers
  • I-type: register and data
• Memory:
  • Address from ALU
  • Data to/from registers
• Data to write into the destination register comes from: ALU or memory

5.1 Instruction Datapath
[Figure: the PC drives the instruction memory's read address; an adder computes PC + 4.]
• Instructions will be held in the instruction memory
• The instruction to fetch is at the location specified by the PC
  • Instr. = M[PC]
• After we fetch one instruction, the PC must be incremented to the next instruction
  • All instructions are 4 bytes
  • PC = PC + 4
Note: Regular instruction width (32 bits for MIPS) makes this easy

5.2 R-type Instruction Datapath
[Figure: the instruction supplies two read register numbers and one write register number; the two read-data outputs feed the ALU; the ALU Result returns to the register write-data port; the ALU also has a Zero output.]
• R-type instructions have three registers
  • Two read (Rs, Rt) to provide data to the ALU
  • One write (Rd) to receive data from the ALU
• We’ll need to specify the operation to the ALU (later...)
• We might be interested if the result of the ALU is zero (later...)

5.2 Memory Operations
[Figure: register file, 16-to-32-bit sign extend, ALU and data memory.]
• Memory operations first need to compute the effective address
  • LW $t1, 450($s3)   # E.A. = 450 + $s3
  • Add together one register and 16 bits of immediate data
  • Immediate data needs to be converted from 16-bit to 32-bit (sign extend)
• Memory then performs the load or store using the remaining register (Rt): the destination for a load, the data source for a store

5.2 Branches
[Figure: branch datapath. The 16-bit offset is sign-extended to 32 bits, shifted left 2 and added to PC + 4; the ALU's Zero output goes to the control logic.]
• Branches conditionally change the next instruction
  • BEQ $2, $1, 42
• The offset is specified as the number of words to be added to the next instruction (PC+4)
  • Take offset, multiply by 4
    • Shift left two
  • Add this to PC+4 (from PC logic)
• Control logic has to decide if the branch is taken
  • Uses ‘zero’ output of ALU

5.2 Integrating the R-types and Memory
[Figure: combined R-type and memory datapath with two multiplexors, one choosing the second ALU input (register or sign-extended immediate) and one choosing the register write data (ALU result or memory read data).]
• R-types and Load/Stores are similar in many respects
• Differences:
  • 2nd ALU source: R-types use a register, I-types use the immediate
  • Write data: R-types use the ALU result, I-types (loads) use memory
• Mux the conflicting datapaths together

5.3 Adding the instruction memory
Simply add the instruction memory and PC to the beginning of the datapath.
[Figure: the combined datapath with the PC, the PC + 4 adder and the instruction memory added in front.]
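The address arithmetic introduced so far (effective address, PC + 4, branch target) can be sketched in Python. This is a minimal illustration, not part of the original slides: the function names are hypothetical, and the instruction memory is assumed to be a dict mapping byte addresses to 32-bit words.

```python
MASK32 = 0xFFFFFFFF  # keep every value inside 32 bits


def sign_extend16(imm16):
    """Sign-extend a 16-bit immediate to a 32-bit two's-complement pattern."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def effective_address(base, imm16):
    """E.A. = register + sign-extended immediate, e.g. LW $t1, 450($s3)."""
    return (base + sign_extend16(imm16)) & MASK32


def fetch(instr_mem, pc):
    """Instr. = M[PC]; also return PC + 4 (all instructions are 4 bytes)."""
    return instr_mem[pc], (pc + 4) & MASK32


def branch_target(pc_plus_4, offset16):
    """Word offset times 4 (shift left two), added to PC + 4."""
    return (pc_plus_4 + (sign_extend16(offset16) << 2)) & MASK32


def next_pc(pc_plus_4, offset16, is_beq, zero):
    """Take the branch only when the instruction is BEQ and the ALU reports zero."""
    return branch_target(pc_plus_4, offset16) if (is_beq and zero) else pc_plus_4
```

For example, with PC + 4 = 4 and an offset of 42 words, `branch_target(4, 42)` is 4 + 42 * 4 = 172, and a negative offset such as 0xFFFF (that is, -1) moves the target back by one word.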
Separate instruction and data memories are needed in order to allow the entire datapath to complete its job in a single clock cycle.

5.3 Adding the Branch Datapath
[Figure: the full datapath with the branch adder (PC + 4 plus the shifted offset) and a multiplexor selecting the next PC.]
Now we have the datapath for R-type, I-type, and branch instructions. On to the control logic!

5.3 When does everything happen?
[Figure: single-cycle design; the full datapath with clk inputs on the clocked state elements.]
• Single-Cycle Design
• Combinational logic: Just does it! Outputs are always just a function of its inputs (with some delay)
• Registers: Written at the end of the clock cycle (rising-edge triggered).

5.3 Example
• Suppose that:
  • memory takes 100 nsec to read a word,
  • the ALU and adders take 4 nsec,
  • the register file can be read or written in 1 nsec,
  • the PC can be read or written in 0.2 nsec,
  • all multiplexors take 0.1 nsec.
• Assume everything else takes 0 time (control, shift, sign extend, wires, etc.).
• How long will it take to execute an add instruction?
• How long will it take to execute a lw instruction?
• How long will it take to execute a beq instruction?
• How long will it take to execute a j instruction?

What do we need to control?
• Registers: Should we write data?
• Mux: Result from ALU or memory?
• Mux: Are we branching or not?
• Mux: Where does the 2nd ALU operand come from?
• ALU: What is the operation?
• Memory: Read/write/neither?
Almost all of the information we need is in the instruction!

5.3 The ALU
• The ALU is stuck right in the middle of everything...
• It must:
  • Add, Subtract, And, or Or for arithmetic instructions
  • Subtract for a branch on equal
  • Subtract and set for a SLT
  • Add for a memory access
[Figure: 1-bit ALU slice with inputs A, B, BInvert, CarryIn and Less, a 4-input Operation multiplexor (inputs 0-3), and outputs Result and CarryOut.]

Function  BInvert  CarryIn  Op  Result
And       0        0        00  R = A • B
Or        0        0        01  R = A ∨ B
Add       0        0        10  R = A + B
Subtract  1        1        10  R = A - B
SLT       1        1        11  R = 1 if A < B, 0 if A ≥ B

BInvert and CarryIn are always the same: combine them into one signal called “sub”

5.3 Setting the ALU controls
• The instruction Opcode and Function give us the info we need
  • For R-type instructions, Opcode is zero, function code determines ALU controls
  • For I-type instructions, Opcode determines ALU controls
New control signal: ALUOp is 00 for memory, 01 for branch, and 10 for R-type
Columns: Instruction, Opcode, ALUOp, Funct. code, ALU action, ALU control (sub, op)
Instruction   Opcode  ALUOp  Funct. code  ALU action  ALU control (sub op)
add           R-type  10     100000       add         0 10
sub           R-type  10     100010       subtract    1 10
and           R-type  10     100100       and         0 00
or            R-type  10     100101       or          0 01
SLT           R-type  10     101010       SLT         1 11
load word     LW      00     xxxxxx       add         0 10
store word    SW      00     xxxxxx       add         0 10
branch equal  BEQ     01     xxxxxx       subtract    1 10

5.3 Controlling the ALU
• For ALUOp = 00 or 01, the function code is unused
• ALUOp is determined by the Opcode; separate logic will generate ALUOp
• Since ALUOp can only be 00, 01, or 10, we don’t care what ALUOp0 is when ALUOp1 is 1

ALUOp  F5  F4  F3  F2  F1  F0  Function  ALU Ctrl (sub op)
00     x   x   x   x   x   x   Add       0 10
x1     x   x   x   x   x   x   Sub       1 10
1x     x   x   0   0   0   0   Add       0 10
1x     x   x   0   0   1   0   Sub       1 10
1x     x   x   0   1   0   0   And       0 00
1x     x   x   0   1   0   1   Or        0 01
1x     x   x   1   0   1   0   SLT       1 11

[Figure: ALU control logic block with inputs ALUOp1, ALUOp0, F3, F2, F1, F0 and outputs A2, A1, A0.]
A 6-input truth table: use standard minimization techniques

5.3 Decoding the Instruction - Data
The instruction holds the key to all of the data signals

R-type:
Bits   Field     Goes to
31-26  Opcode    To ctrl logic
25-21  RS        Read reg. A
20-16  RT        Read reg. B
15-11  RD        Write reg.
10-6   ShAmt     Not used
5-0    Function  To ALU control

Memory, Branch:
Bits   Field           Goes to
31-26  Opcode          To ctrl logic
25-21  RS              Read reg. A
20-16  RT              Write reg. / Read reg. B
15-0   Immediate data  Memory address or branch offset

One problem: the write register number must come from two different places.

5.3 Instruction Decoding
[Figure: the full datapath with the instruction bus split into its fields.]
• We can decode the data simply by dividing up the instruction bus
  • Opcode: [31-26]
  • Read Reg A: Rs [25-21]
  • Read Reg B: Rt [20-16]
  • Write Reg: Either Rd [15-11] or Rt [20-16]
  • Immediate Data: [15-0]

5.3 Control Signals
[Figure: the complete datapath with a control block (Ctrl) driven by Op [31-26].]
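One piece of this control, the ALU control block, can be sketched in Python from the "Controlling the ALU" truth table. This is a minimal sketch rather than the slides' own implementation: the names `alu_control`, `alu` and `to_signed` are hypothetical, and the SLT case uses a direct signed comparison instead of modelling the Less wiring of the 1-bit slices.

```python
MASK32 = 0xFFFFFFFF

# R-type rows of the truth table, keyed by function-code bits F3..F0.
FUNCT_TO_CTRL = {
    0b0000: (0, 0b10),  # add
    0b0010: (1, 0b10),  # sub
    0b0100: (0, 0b00),  # and
    0b0101: (0, 0b01),  # or
    0b1010: (1, 0b11),  # SLT
}


def alu_control(alu_op, funct):
    """Map 2-bit ALUOp and 6-bit function code to (sub, op)."""
    if alu_op == 0b00:      # memory access: add
        return (0, 0b10)
    if alu_op & 0b01:       # x1, branch on equal: subtract
        return (1, 0b10)
    return FUNCT_TO_CTRL[funct & 0xF]  # 1x, R-type: F5 and F4 are don't-cares


def to_signed(x):
    """Interpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def alu(a, b, sub, op):
    """Return (result, zero) for 32-bit operands a and b."""
    if op == 0b00:
        result = a & b
    elif op == 0b01:
        result = a | b
    elif op == 0b10:
        # "sub" drives both BInvert and CarryIn: A + ~B + 1 = A - B.
        b_in = (~b & MASK32) if sub else b
        result = (a + b_in + sub) & MASK32
    else:  # op == 0b11: set on less than
        result = 1 if to_signed(a) < to_signed(b) else 0
    return result & MASK32, int(result == 0)
```

For instance, `alu(7, 5, *alu_control(0b10, 0b100010))` performs the R-type subtract, and `alu(5, 5, *alu_control(0b01, 0))` gives the zero result a BEQ on equal registers relies on.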
• RegWrite: load, R-type
• RegDest: R-type
• ALUSrc: memory instructions
• MemRead, MemToReg: load
• MemWrite: store
• PCSrc: BEQ and zero
• ALUOp: 00: memory, 01: branch, 10: R-type
• ALU Control: a function of ALUOp and the function code (FC [5-0])

5.3 Inside the control oval

Instruction  Opcode  RegWrite  ALUSrc  MemToReg  RegDest  MemRead  MemWrite  PCSrc  ALUOp
R-format     000000  1         0       0         1        0        0         0      10
LW           100011  1         1       1         0        1        0         0      00
SW           101011  0         1       x         x        0        1         0      00
BEQ          000100  0         0       x         x        0        0         1      01

Signal encodings:
• ALUSrc: 0 = Reg, 1 = Imm
• MemToReg: 0 = ALU, 1 = Mem
• RegDest: 0 = Rt, 1 = Rd
• PCSrc: 1 = Branch
• ALUOp: 00 = Mem, 01 = Branch, 10 = R-type

• This control logic can be decoded in several ways:
  • Random logic, PLA, PAL
  • Just build hardware that looks for the 4 opcodes
  • For each opcode, assert the appropriate signals
Note: BEQ must also check the zero output of the ALU...

5.3 Control Signals
• We must AND BEQ and Zero
[Figure: the full datapath with the control block's outputs (RegDest, BEQ, MemRead, MemToReg, ALUOp, MemWrite, ALUSrc, RegWrite) wired in; PCSrc is BEQ ANDed with Zero.]

5.3 Jumping
[Figure: the datapath with jump support. The 26-bit jump field is shifted left 2 to give 28 bits and concatenated with bits [31-28] of PC + 4 to form the 32-bit jump target.]
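The jump-target computation on this slide (26-bit field shifted left 2, concatenated with the top four bits of PC + 4) can be sketched in Python. The function name `jump_target` is hypothetical, and the sketch assumes the instruction arrives as a 32-bit integer.

```python
def jump_target(pc_plus_4, instruction):
    """Build the 32-bit jump address from PC + 4 and a J-format instruction."""
    j_field = instruction & 0x03FFFFFF      # bits [25-0] of the instruction
    shifted = j_field << 2                  # 26 bits -> 28 bits (word to byte address)
    upper = pc_plus_4 & 0xF0000000          # bits [31-28] of PC + 4
    return upper | shifted                  # concatenate: 4 bits + 28 bits
```

Because the upper four bits come from PC + 4, a jump can only reach targets within the same 256 MB region as the instruction that follows it.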
• J: [25-0] is the jump field; Jump is the new control signal

5.3 Performance
What major functional units are used by different instructions?

Class   Functional units used                                          Total
R-type  Instr. fetch, register read, ALU, register write               6ns
LW      Instr. fetch, register read, ALU, memory read, register write  8ns
SW      Instr. fetch, register read, ALU, memory write                 7ns
Branch  Instr. fetch, register read, ALU                               5ns
Jump    Instr. fetch                                                   2ns

Assume the following times:
• Memory access: 2ns
• ALU: 2ns
• Registers: 1ns
Since the longest time is 8ns (LW), the cycle time must be at least 8ns.

Example
• Calculate the execution time for the following program in a single-cycle datapath with a cycle time of 50 ns

    main:  add  $9, $0, $0        # clear $9
           lw   $8, Tonto($9)     # put Tonto[0] in $8
           addi $9, $9, 4         # increment $9
           lw   $10, Tonto($9)    # put Tonto[1] in $10
           add  $11, $10, $8      # $11 = Tonto[1] + Tonto[0]

Example 2
• Calculate the execution time for the following program in a single-cycle datapath with a cycle time of 50 ns

           .data
    ARRAY: .word 3, 5, 7, 9, 2    # random values
    SUM:   .word 0                # initialize sum to zero
           .text
    main:  addi $6, $0, 5         # initialize loop counter to 5
           addi $7, $0, 0         # initialize array index to zero
           addi $8, $0, 0         # set $8 (sum temp) to zero
    REPEAT:
           lw   $5, ARRAY($7)     # $5 = ARRAY[i]
           add  $8, $8, $5        # sum += ARRAY[i]
           addi $7, $7, 4         # increment index (i++)
           addi $6, $6, -1        # decrement loop counter
           bne  $6, $0, REPEAT    # check if 5 repetitions
           sw   $8, SUM($0)       # copy sum to memory
           addi $v0, $0, 10       # exit program
           syscall
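The Performance figures and the two execution-time examples above can be checked with a short Python sketch. It is an illustration under stated assumptions, not the slides' own solution: every source line is counted as one machine instruction (no pseudo-instruction expansion), `syscall` is counted as one cycle, and the names used here are hypothetical.

```python
# Unit delays from the Performance slide, in ns.
UNIT_NS = {"mem": 2, "alu": 2, "reg": 1}

# Functional units each instruction class passes through, in order.
PATHS = {
    "R-type": ["mem", "reg", "alu", "reg"],
    "LW":     ["mem", "reg", "alu", "mem", "reg"],
    "SW":     ["mem", "reg", "alu", "mem"],
    "Branch": ["mem", "reg", "alu"],
    "Jump":   ["mem"],
}

# Latency of each class is the sum of the unit delays on its path.
LATENCY_NS = {name: sum(UNIT_NS[u] for u in path) for name, path in PATHS.items()}

# A single-cycle design must fit its slowest instruction in one cycle.
MIN_CYCLE_NS = max(LATENCY_NS.values())


def single_cycle_time_ns(instruction_count, cycle_ns):
    """Every instruction takes exactly one cycle in a single-cycle datapath."""
    return instruction_count * cycle_ns


# Example: five straight-line instructions (assumed one cycle each).
EXAMPLE_1_COUNT = 5

# Example 2: 3 setup instructions, a 5-instruction loop body executed
# 5 times, then sw, addi and syscall (assumed one cycle each).
EXAMPLE_2_COUNT = 3 + 5 * 5 + 3

if __name__ == "__main__":
    print(LATENCY_NS, MIN_CYCLE_NS)
    print(single_cycle_time_ns(EXAMPLE_1_COUNT, 50))
    print(single_cycle_time_ns(EXAMPLE_2_COUNT, 50))
```

Under these assumptions the first program executes 5 instructions (250 ns at a 50 ns cycle) and the second executes 31 instructions (1550 ns). The instruction mix does not matter here, because a single-cycle design charges every instruction the same cycle time.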