Transcript Document
Multicycle Datapath Design
CMSC411/Computer Architecture
These slides and all associated material are © 2003 by J. Six and are available only for students enrolled in CMSC411.
I once put instant coffee in a microwave and went back in time.
CMSC411 – Computer Architecture / © 2003 J. Six

Use and Distribution Notice
Possession of any of these files implies understanding of and agreement to this policy. The slides are provided for the use of students enrolled in Jeff Six's Computer Architecture class (CMSC 411) at the University of Maryland Baltimore County. They are the creation of Mr. Six and he reserves all rights as to the slides. These slides are not to be modified or redistributed in any way. All of these slides may only be used by students for the purpose of reviewing the material covered in lecture. Any other use, including but not limited to the modification of any slides or the sale of any slides or material, in whole or in part, is expressly prohibited. Most of the material in these slides, including the examples, is derived from Computer Organization and Design, Second Edition. Credit is hereby given to the authors of this textbook for much of the content. This content is used here for the purpose of presenting this material in CMSC 411, which uses this textbook.

Multicycle Implementations
- Previously, we broke each instruction into a series of steps that correspond to the functional units that were needed.
- In a multicycle implementation, each step in the execution of an instruction takes one clock cycle.
- This allows a functional unit to be used more than once per instruction. We only require that each unit be used at most once per clock cycle.
- This allows instructions to take different numbers of cycles, and functional units can be shared – this turns out to be a major advantage.

The Multicycle Datapath
Here is an abstract representation of a multicycle datapath.
[Figure: abstract multicycle datapath showing the PC, a single memory (instructions or data), the instruction register, the memory data register, the register file, the A and B registers, the ALU, and ALUOut]

The Multicycle Datapath
This multicycle datapath has a couple of key differences…
- A single memory unit is used instead of separate units for instructions and data.
- The ALU and two adders have been replaced with a single ALU.
- After every major functional unit, we have added registers. These registers hold the output of that unit until the value is used in the next clock cycle. This is key to the reuse capability of the functional units in the datapath.

Data Storage
- Data that is used by later instructions must be stored in a programmer-accessible state element. This includes the register file, the PC, and the memory.
- Data that is used later in the same instruction must be stored in one of the additional registers that appear after each functional unit. These include…
  - The instruction register (IR) and memory data register (MDR) save the output of a memory access.
  - The A and B registers hold the output of reads from the register file.
  - The ALUOut register holds the output of the ALU.

Sharing Functional Units
- Since we are now reusing functional units, we need multiplexors in front of our functional units to select the appropriate inputs for the current use.
- For example, a multiplexor is needed in front of the memory address input for the memory unit to select between the PC (for an instruction access) and the output of the ALU (for a data access).
- We end up needing two more changes…
  - The first input into the ALU needs a mux (the A register or the PC).
  - The mux on the second ALU input expands from two inputs to four (we need to add the constant 4, for PC incrementing, and the sign-extended and shifted offset field, for branch address calculation).
The New Datapath
Here is the datapath with all of these changes. Notice that by adding some registers and some multiplexors, we reduce the number of memory units to one and eliminate two adders.
[Figure: multicycle datapath with the PC, the memory address mux, the single memory, the instruction register (fields [25–21], [20–16], [15–11], [15–0]), the memory data register, the register file with its write-register and write-data muxes, the A and B registers, sign extend (16 to 32 bits), shift left 2, the two-input mux on the first ALU input, the four-input mux on the second ALU input (including the constant 4), the ALU with its Zero output, and ALUOut]

Control Signals
We have changed the way our datapath works; it needs new control signals.
- The programmer-visible state units need a write signal, as does the IR (which needs to hold the instruction until the end of that instruction – we do not want to write to it all the time like the other internal registers we added).
- The memory needs a read signal.
- The ALU control signals stay the same (lucky us).
- The newly added multiplexors need control signals.

The New Datapath (with control signals)
Here’s the datapath with the control signals…
[Figure: the same datapath annotated with the control signals IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, MemtoReg, ALUSrcA, ALUSrcB, and ALUOp; the ALU control unit takes ALUOp and Instruction [5–0]]

Adding Branches and Jumps
Our new datapath does not yet support branch and jump instructions. There are now three possible PC values…
- The output of the ALU. This is the value PC+4 during the instruction fetch stage.
- The ALUOut register. This is the address of a branch target, after it is computed.
- The lower 26 bits of the IR shifted left by two and concatenated with the upper 4 bits of the incremented PC. This is for a jump instruction.
Note that the PC is written unconditionally (normal increment and jumps) or conditionally (conditional branch). We need two new control signals, PCWrite and PCWriteCond.

The PC Write Control Signal
- We can compute a PC write control signal using these two new control signals, the ALU Zero signal, and a couple of gates.
- We can AND the Zero signal with PCWriteCond and then OR that with PCWrite. This will give us the PC write control signal: PC write = PCWrite OR (PCWriteCond AND Zero).

The Complete Multicycle Datapath
[Figure: complete multicycle datapath. The control unit takes Op [5–0] (Instruction [31–26]) and drives PCWriteCond, PCWrite, IorD, MemRead, MemWrite, MemtoReg, IRWrite, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, and RegDst. A three-input PCSource mux selects among the ALU result (0), ALUOut (1), and the jump address (2), which is formed from Instruction [25–0] shifted left 2 (28 bits) concatenated with PC [31–28].]

Breaking Down an Instruction
- The multicycle datapath is complete. Now we need to look at what should happen in each clock cycle – we should attempt to balance the amount of work done in each cycle so that we can minimize the clock cycle time.
- We will do this by breaking down instruction execution into a series of steps, each taking one clock cycle and balanced in length.
- Each step contains, at most, one ALU op, one register file access, or one memory access.
One clock cycle could be as short as the longest of these operations.

Clocking Considerations
- Remember that our implementation is edge-triggered; we can continue to read the current value of a register – the new value will not appear until the next clock cycle.
- All of the operations grouped into one step happen in parallel within the same clock cycle. Successive steps occur in subsequent clock cycles.
- Reading/writing the PC or a standalone register is slightly different from reading/writing the register file. The former occurs immediately; the latter takes a clock cycle (due to additional control constraints).

Instruction Execution Steps
We will now break instruction execution into five steps…
- Instruction Fetch
- Instruction Decode and Register Fetch
- Execution, Memory Address Computation, or Branch Completion
- Memory Access or R-type Completion
- Memory Read Completion
Each instruction will take three, four, or five of these steps to complete.

Instruction Fetch Stage
In this stage, we fetch the instruction from memory and compute the address of the next sequential instruction.
- Send the PC to the memory as the address, perform a read, and write the instruction into the IR.
- Increment the PC by four.
- Store the incremented PC into the PC. This new value will not be visible until the next clock cycle.
Note that we can increment the PC and read the instruction from memory in parallel.

Instruction Decode and Register Fetch Stage
Here (and in the previous stage), we do not yet know what type of instruction we’re dealing with, so we can only do tasks that are common to all instructions or that are not harmful (it does not matter if they are done when they are not necessary).
- We can read the registers specified by rs and rt and store them in registers A and B.
- We can compute the branch target address with the ALU and save it in ALUOut.
These are non-harmful actions (and can be done in parallel) – if the instruction does not require them, no harm has been done; if it does, we’ve gotten some work done early!

Execution, Memory Address Computation, or Branch Completion Stage
- In this stage, the datapath operation can finally be determined by the instruction class.
- In all cases, the ALU is operating on the operands prepared in the previous step, performing one of three functions.
- The action taken during this stage is dependent on the instruction class.
For memory reference instructions…
- The ALU adds the operands (register A and the sign-extended version of bits 15–0 of the instruction).
For R-type instructions…
- The ALU performs the operation specified by the function code on the two values read from the register file in the previous stage.
For branch instructions…
- The ALU computes the Zero signal by comparing the two registers read in the previous stage. If the conditional branch is taken, the target address is written to the PC and is used for the next instruction fetch.
For jump instructions…
- The PC is replaced by the jump address.

Memory Access or R-type Completion Stage
For memory access instructions…
- In a load instruction, a data word is retrieved from memory and is written into the MDR.
- In a store instruction, the data is written into memory.
- The address used is the one computed in the previous step and stored in ALUOut.
For R-type instructions…
- The contents of ALUOut, the output of the ALU from the previous stage, are written into the destination register in the register file.

Memory Read Completion Stage
In this stage, memory loads complete by writing the value that was read from memory in the previous stage from the MDR into the register file.

Now … Onto the Control Unit
- When we designed the control unit for the single cycle datapath, we used truth tables.
- For our multicycle datapath, the control unit is more complex because the instruction is executed in a series of steps – therefore, it must specify both the control signals and the next step in the instruction execution sequence.
- We will discuss the multicycle control unit using two different techniques…
  - Finite State Machines (FSMs)
  - Microprogramming

Finite State Machine Design
- A finite state machine (FSM) consists of a set of states and directions on how to change states.
- A FSM maps the current state and the inputs into a new state using a next-state function.
- Each state also specifies a set of outputs that are asserted when the FSM is in that state. If a specific output is not asserted, we can assume that it is deasserted.
- Let’s look at the FSM implementation of the multicycle control unit.

Understanding FSM Diagrams
- The FSMs we will now look at include state transitions (to other FSMs) and a set of output signals.
- The output signals are control signals. These specify which control signals must be set for that class of instruction at that stage.
- To best understand what the FSM diagrams are showing you, it’s a good idea to have a copy of the multicycle datapath with the control signals labeled, along with the FSM diagram. This should enable you to easily follow along with what the FSM diagrams are telling you.

Generic FSM Flow
A FSM can be constructed for the generic execution flow. We will break all of these boxes into their own FSMs on the next couple of slides (the figure numbers refer to the text).
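Before turning to the control FSMs, it may help to see the five execution steps described earlier written out as register transfers. The following is only a sketch in Python, not part of the original slides: the state dictionary, the helper names (`make_state`, `sign_extend16`, `run_instruction`), and the restriction of R-type to add and sub are assumptions made for illustration. The opcode and funct numbers are the standard MIPS encodings. Python statements run one after another, so the edge-triggered timing is only approximated by the order of the assignments.

```python
# Sketch only: a register-transfer view of the five execution steps.
# The state layout and helper names are assumptions for illustration.
LW, SW, BEQ, J, RTYPE = 0x23, 0x2B, 0x04, 0x02, 0x00  # standard MIPS opcodes
FUNCT_ADD, FUNCT_SUB = 0x20, 0x22                     # standard MIPS funct codes
MASK32 = 0xFFFFFFFF


def make_state(mem):
    """Build a fresh machine state; mem maps byte addresses to 32-bit words."""
    return {"PC": 0, "IR": 0, "A": 0, "B": 0, "ALUOut": 0, "MDR": 0,
            "regs": [0] * 32, "mem": dict(mem)}


def sign_extend16(x):
    """Interpret a 16-bit field as a signed integer."""
    return x - 0x10000 if x & 0x8000 else x


def run_instruction(s):
    """Run one instruction to completion; return the number of steps used."""
    # Step 1: instruction fetch (IR = Memory[PC]; PC = PC + 4).
    s["IR"] = s["mem"][s["PC"]]
    s["PC"] = (s["PC"] + 4) & MASK32

    # Step 2: decode and register fetch. A, B, and the branch target are
    # computed before the instruction class is known.
    ir = s["IR"]
    op, rs, rt, rd = ir >> 26, (ir >> 21) & 31, (ir >> 16) & 31, (ir >> 11) & 31
    imm = sign_extend16(ir & 0xFFFF)
    s["A"], s["B"] = s["regs"][rs], s["regs"][rt]
    s["ALUOut"] = (s["PC"] + (imm << 2)) & MASK32

    # Step 3: execution, memory address computation, or branch completion.
    if op == J:
        # Upper 4 bits of the incremented PC, then IR[25-0] shifted left 2.
        s["PC"] = (s["PC"] & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)
        return 3
    if op == BEQ:
        zero = ((s["A"] - s["B"]) & MASK32) == 0
        if zero:  # PCWriteCond AND Zero
            s["PC"] = s["ALUOut"]
        return 3
    if op == RTYPE:
        funct = ir & 0x3F
        if funct == FUNCT_ADD:
            s["ALUOut"] = (s["A"] + s["B"]) & MASK32
        elif funct == FUNCT_SUB:
            s["ALUOut"] = (s["A"] - s["B"]) & MASK32
        else:
            raise ValueError("funct code not modeled in this sketch")
        # Step 4: R-type completion (write ALUOut to register rd).
        s["regs"][rd] = s["ALUOut"]
        return 4
    if op in (LW, SW):
        s["ALUOut"] = (s["A"] + imm) & MASK32
        # Step 4: memory access.
        if op == SW:
            s["mem"][s["ALUOut"]] = s["B"]
            return 4
        s["MDR"] = s["mem"][s["ALUOut"]]
        # Step 5: memory read completion (write MDR to register rt).
        s["regs"][rt] = s["MDR"]
        return 5
    raise ValueError("undefined instruction")
```

The return value is the number of steps taken, which matches the three, four, or five steps mentioned earlier. Writes to register 0 are not special-cased in this sketch.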
[Figure: generic FSM flow. Start leads to instruction fetch/decode and register fetch (Figure 5.37), which leads to one of: memory access instructions (Figure 5.38), R-type instructions (Figure 5.39), branch instruction (Figure 5.40), jump instruction (Figure 5.41)]

Instruction Fetch and Decode FSM
This stage’s FSM is identical for all instruction classes (again, all numbers refer to the text). This is Figure 5.37.
- State 0 (Instruction fetch), entered from Start: MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
- State 1 (Instruction decode/Register fetch): ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
- From state 1, control passes to the Memory reference FSM (Figure 5.38), the R-type FSM (Figure 5.39), the Branch FSM (Figure 5.40), or, when Op = 'JMP', the Jump FSM (Figure 5.41).

Instruction class-specific FSMs
- After the instruction fetch and decode FSM, control passes to one of four FSMs, based on which instruction class is being executed.
- The figure number for each of the four instruction class-specific FSMs is provided, so control flow can easily be followed.

Memory Reference Instruction FSM (Figure 5.38)
- Entered from state 1 when (Op = 'LW') or (Op = 'SW').
- State 2 (Memory address computation): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
- State 3 (Memory access, Op = 'LW'): MemRead, IorD = 1
- State 4 (Write-back step): RegWrite, MemtoReg = 1, RegDst = 0
- State 5 (Memory access, Op = 'SW'): MemWrite, IorD = 1
- States 4 and 5 return to state 0 (Figure 5.37).

R-Type Instruction FSM (Figure 5.39)
- Entered from state 1 when (Op = R-type).
- State 6 (Execution): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
- State 7 (R-type completion): RegDst = 1, RegWrite, MemtoReg = 0
- State 7 returns to state 0 (Figure 5.37).
Branch Instruction FSM (Figure 5.40)
- Entered from state 1 when (Op = 'BEQ').
- State 8 (Branch completion): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
- State 8 returns to state 0 (Figure 5.37).

Jump Instruction FSM (Figure 5.41)
- Entered from state 1 when (Op = 'J').
- State 9 (Jump completion): PCWrite, PCSource = 10
- State 9 returns to state 0 (Figure 5.37).

Building a Multicycle Datapath FSM
Now that we have seen the common FSM for the first stages and the instruction-specific FSMs for the following stages, we can build a single combined FSM for the multicycle datapath.

The Multicycle Datapath FSM
[Figure: the complete FSM, combining the previous figures]
- State 0 (Instruction fetch), entered from Start: MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
- State 1 (Instruction decode/register fetch): ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
- State 2 (Memory address computation): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
- State 3 (Memory access, Op = 'LW'): MemRead, IorD = 1
- State 4 (Write-back step): RegDst = 0, RegWrite, MemtoReg = 1
- State 5 (Memory access, Op = 'SW'): MemWrite, IorD = 1
- State 6 (Execution): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
- State 7 (R-type completion): RegDst = 1, RegWrite, MemtoReg = 0
- State 8 (Branch completion): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
- State 9 (Jump completion, Op = 'J'): PCWrite, PCSource = 10
- States 4, 5, 7, 8, and 9 return to state 0.

FSM Logic Implementation
- FSMs are often implemented in hardware using a state register (to hold the current state) and a big block of combinational logic that takes…
  - Inputs: current state and the instruction register opcode field
  - Outputs: next state and the datapath control signals
- This is not the only way to implement a FSM in hardware, but it is typical.
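The state register plus combinational logic arrangement described above can be sketched in Python as two lookups: one from the current state to the control outputs, and one from the current state and opcode to the next state. This is a minimal sketch and not the hardware itself. The state numbers and signal values follow the FSM figures in these slides, while the symbolic opcode names ('LW', 'SW', 'R-type', 'BEQ', 'J') and the helper names (`signal`, `next_state`, `trace`) are assumptions standing in for the real 6-bit opcode field and the logic block.

```python
# Sketch only: the ten-state control FSM as lookup tables.
# Signal values follow the FSM figures; helper names are hypothetical.
OUTPUTS = {
    0: {"MemRead": 1, "ALUSrcA": 0, "IorD": 0, "IRWrite": 1,
        "ALUSrcB": 0b01, "ALUOp": 0b00, "PCWrite": 1, "PCSource": 0b00},
    1: {"ALUSrcA": 0, "ALUSrcB": 0b11, "ALUOp": 0b00},
    2: {"ALUSrcA": 1, "ALUSrcB": 0b10, "ALUOp": 0b00},
    3: {"MemRead": 1, "IorD": 1},
    4: {"RegWrite": 1, "MemtoReg": 1, "RegDst": 0},
    5: {"MemWrite": 1, "IorD": 1},
    6: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b10},
    7: {"RegDst": 1, "RegWrite": 1, "MemtoReg": 0},
    8: {"ALUSrcA": 1, "ALUSrcB": 0b00, "ALUOp": 0b01,
        "PCWriteCond": 1, "PCSource": 0b01},
    9: {"PCWrite": 1, "PCSource": 0b10},
}

# Dispatch out of state 1, keyed by a symbolic opcode name.
DECODE = {"LW": 2, "SW": 2, "R-type": 6, "BEQ": 8, "J": 9}


def signal(state, name):
    """Outputs depend only on the state; unlisted signals read as 0.

    For write/read enables, 0 means deasserted. For mux selects that a
    state does not list, the value is really a don't-care.
    """
    return OUTPUTS[state].get(name, 0)


def next_state(state, op):
    """Next-state function: current state plus opcode gives the new state."""
    if state == 0:
        return 1
    if state == 1:
        return DECODE[op]
    if state == 2:
        return 3 if op == "LW" else 5
    if state == 3:
        return 4
    if state == 6:
        return 7
    return 0  # states 4, 5, 7, 8, and 9 go back to instruction fetch


def trace(op):
    """List the states visited while executing one instruction."""
    states = [0]
    while True:
        nxt = next_state(states[-1], op)
        if nxt == 0:
            return states
        states.append(nxt)
```

For example, `trace("LW")` walks states 0, 1, 2, 3, 4, so a load takes five cycles, while `trace("BEQ")` visits only 0, 1, 8.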
- This implementation is known as a Moore machine, as the outputs depend only on the current state.

FSM Logic Implementation (Moore Machine)
[Figure: combinational control logic takes the instruction register opcode field and the current state (from the state register) as inputs, and produces the datapath control outputs and the next state, which is written back into the state register]

Limitations of FSM-based Control Unit Design
- Control design using a FSM proved successful with our (very) limited implementation (although the FSM took up a whole slide).
- Imagine designing (or understanding) a FSM for a complete MIPS implementation (100+ instructions, ranging from 1 to 20 clock cycles) or IA-32 (many times that number of instructions plus a lot of addressing modes).
- Even after such a complex FSM is designed, how can it be translated into hardware? The potential for errors is huge.
- FSM control design does not scale well.

Microinstructions
- Think of what defines the behavior of the datapath during a stage of instruction execution: the control signals.
- Enter microinstructions. A microinstruction is a low-level control instruction that consists of the set of control signals that are asserted at that stage of instruction execution.
- Microinstructions must also include some notion of sequencing…
  - Is the next sequential microinstruction executed?
  - Do we transfer control to a different microinstruction?

Microprogramming
- The design of a program that implements the machine instructions in terms of microinstructions is known as microprogramming.
- Just as we design assembly languages to represent machine instructions as a series of fields, we can do the same for microinstructions.
- Now, let’s turn to designing a microinstruction format.
Microinstruction Format Design
- The microprogram is just a symbolic representation of the control that is converted into a series of control signals – the format chosen for this should simplify the representation, making it as easy as possible to write and understand a microprogram.
- We also do not want to allow inconsistent microinstructions…
  - We cannot let a microinstruction require that a control signal be set to two different values.
  - To do this, we make each microinstruction field specify a non-overlapping set of control signals.
  - Signals that are never asserted at the same time may share the same field.

Our Microinstruction Format
We can define a simple microinstruction format for our MIPS implementation. The first six fields control the datapath. The last field specifies how to select the next microinstruction.

Field             Function of Field
ALU control       operation being done by the ALU
SRC1              source for first ALU operand
SRC2              source for second ALU operand
Register Control  read or write for register file, source of value for a write
Memory            read or write for memory, source for memory, destination register
PCWrite Control   writing of the PC
Sequencing        how to choose the next microinstruction

Microinstruction Addressing
- Typically, microinstructions are stored in a ROM or PLA and are given (sequential) addresses.
- There are three methods to choose the next microinstruction (MI)…
  - Increment the address of the current MI. (Seq in the Sequencing field)
  - Branch to the MI that begins execution of the next MIPS instruction. (Fetch in the Sequencing field)
  - Choose the next MI based on control unit input (called a dispatch). A dispatch table of addresses of target microinstructions is stored in a ROM/PLA. There can be more than one dispatch table. (Dispatch x in the Sequencing field; x = number of the dispatch table to use)
Possible Field Values
Let’s look at the possible values for some of the microinstruction fields.
- ALU Control (operation for the ALU): Add, Subt, or Func Code (use the funct field)
- SRC1 (first ALU input): PC, A (register A)
- SRC2 (second ALU input): B, 4, Extend (sign-ext), Extshft (sign-ext + shift)
- Register Control (read/write for reg file + number): Read (read into A and B), Write ALU (write ALUOut to register file), Write MDR (write MDR to register file)
- Memory (read/write and address for mem): Read PC (read mem[PC] into IR), Read ALU (read mem[ALUOut] into MDR), Write ALU (write B into mem[ALUOut])
- PCWrite Control (writing of the PC): ALU (write ALU output into PC), ALUOut-cond (if Zero, write ALUOut into PC), Jump address (write jump address into PC)
- Sequencing: Seq, Fetch, Dispatch x

Example
The first two steps of execution of an instruction are the same. Here’s the microprogram…

Label  ALU Control  SRC1  SRC2     Reg. Ctrl.  Memory   PCWrite Ctrl.  Sequencing
Fetch  Add          PC    4                    Read PC  ALU            Seq
       Add          PC    Extshft  Read                                Dispatch 1

- In MI 1, we compute PC+4, fetch the instruction into the IR, cause the output of the ALU to be written into the PC, and then go to the next MI.
- In MI 2, we store PC + sign-extended and shifted IR[15-0] into ALUOut, read values into A & B, and use dispatch table 1 to find out where to go next (switching based on instruction class).

Implementing Microcode Controllers
- A common way to implement microcode controllers (hardware to run microprograms/microcode) stores the code in a read-only memory (ROM/PLA) and implements the sequencing function separately.
- The microcode store specifies the control lines and how to select the next MI – not the next MI itself.
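The three sequencing choices (Seq, Fetch, Dispatch x) can be sketched as a small next-address function in Python. This is a hedged illustration only. The first two entries of the table (Seq, then Dispatch 1) come from the example microprogram above. The remaining eight entries, the contents of both dispatch tables, and the names `SEQUENCING`, `DISPATCH`, `next_address`, and `microprogram_path` are assumptions, laid out to mirror FSM states 2 through 9.

```python
# Sketch only: choosing the next microinstruction address.
# Entries 0 and 1 follow the example microprogram; entries 2-9 and both
# dispatch tables are assumed, mirroring FSM states 2-9.
SEQUENCING = [
    "Seq",         # 0: Fetch
    "Dispatch 1",  # 1: decode / register fetch
    "Dispatch 2",  # 2: memory address computation
    "Seq",         # 3: load, memory read
    "Fetch",       # 4: load, write-back
    "Fetch",       # 5: store, memory write
    "Seq",         # 6: R-type execution
    "Fetch",       # 7: R-type completion
    "Fetch",       # 8: branch completion
    "Fetch",       # 9: jump completion
]

# Dispatch tables: table number -> {symbolic opcode -> target address}.
DISPATCH = {
    1: {"LW": 2, "SW": 2, "R-type": 6, "BEQ": 8, "J": 9},
    2: {"LW": 3, "SW": 5},
}


def next_address(addr, opcode):
    """Pick the next microinstruction address from the Sequencing field."""
    seq = SEQUENCING[addr]
    if seq == "Seq":
        return addr + 1            # increment the current address
    if seq == "Fetch":
        return 0                   # start the next MIPS instruction
    table = int(seq.split()[1])    # "Dispatch x" selects dispatch table x
    return DISPATCH[table][opcode]


def microprogram_path(opcode):
    """List the microinstruction addresses used by one MIPS instruction."""
    path = [0]
    while True:
        nxt = next_address(path[-1], opcode)
        if nxt == 0:
            return path
        path.append(nxt)
```

Note that the microcode table stores only how to find the next address, never the address itself. The opcode matters only when a Dispatch entry is reached.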
- A separate piece of logic, the address select logic, takes the sequencing information from the microcode and inputs from the opcode of the current instruction to select the next microinstruction.

A Common Implementation
[Figure: microcode storage produces the datapath control outputs and the sequencing control; a microprogram counter holds the current address; an adder increments it by 1; address select logic chooses the next address using the sequencing control and inputs from the instruction register opcode field]

Hardware Support for Exceptions
- We have one last item to discuss concerning multicycle datapath implementations.
- One of the hardest parts of microprocessor control is handling exceptions and interrupts – events other than branches and jumps that change the normal flow of instruction execution.
- An exception is an unexpected event from within the processor – such as an arithmetic overflow.
- An interrupt is an unexpected event from outside the processor – these are typically caused by I/O devices that require attention.

Interrupts vs. Exceptions
- The terms exception and interrupt are frequently confused and intermixed…
  - Intel x86 uses the word interrupt for all of these events (internal or external).
  - PowerPC uses the word exception for an unusual event and the word interrupt to indicate the change in control flow.
- We will use the term exception to refer to any (internal or external) unexpected change in control flow and the term interrupt only when the event is externally caused (this is the convention that MIPS uses).

Exception Support in the Design Process
- We will now look at control unit support for two types of exceptions that arise from the instructions we have already implemented.
- Determining exceptional conditions (and taking the appropriate action) is often on the critical path of a design (the longest path through the datapath) and is thus vital to computing the clock cycle time.
- Adding support for exceptions after the design of a datapath has been finalized often significantly reduces performance and complicates the effort required to correctly implement the design.

Exception Processing
- The two exceptions we will support are…
  - an undefined instruction
  - arithmetic overflow
- When we encounter one of these conditions, we will store the address of the offending instruction in the exception program counter (EPC) and transfer control to the operating system at some specified address (the exception handler).
- Normally, this exception handler deals with the error and then either terminates the program or continues its execution (using the EPC value).

Expressing the Reason for the Exception
For the OS to handle the exception, it needs to know what caused it. There are two main methods used for this…
- Method 1 (used in the MIPS architecture): Add a register (called the Cause register) that stores a field that indicates the reason for the exception. Here there is one major exception handler.
- Method 2 (used in the Intel x86 architecture): Known as vectored interrupts, the address to which control is transferred is determined by the cause of the exception (many exception handlers, called from a table – sometimes called the interrupt vector jump table).

Implementing Exception Support
Since we are implementing MIPS, we need to add two registers to our design…
- EPC: this stores the address of the offending instruction (this would also be needed for vectored interrupts).
- Cause: records the cause of the exception. For our implementation, 0 will represent an undefined instruction and 1 will represent an arithmetic overflow.
We need new signals…
- EPCWrite – write to the EPC register?
- CauseWrite – write to the Cause register?
- IntCause – write a zero or one (which exception?)
- exception address – the address of the exception handler, which must be written into the PC (shown as C0 00 00 00 in the datapath figure)

The New Datapath
[Figure: the complete multicycle datapath extended for exceptions. The control unit gains the outputs CauseWrite, IntCause, and EPCWrite. An EPC register and a Cause register are added, with a mux selecting 0 or 1 as the Cause value. The PCSource mux gains a fourth input (3), the exception address C0 00 00 00.]

Support for Exceptions
- Since the PC has already been incremented during the first cycle of instruction execution, we cannot just write the PC into the EPC.
- We use the ALU to subtract 4 from the PC and write the result to the EPC. We already have 4 as an input – great! This requires no additional control signals or paths.
- If an exception is detected, we will follow a FSM to implement its handling.

Exception FSM
If an exception occurs, we follow a simple FSM…
- State 10 (undefined instruction): IntCause = 0, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11
- State 11 (arithmetic overflow): IntCause = 1, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11
- Both states return to state 0 to begin the next instruction.

Exception Detection
So how does the control unit detect an exception?
- An undefined instruction is detected when no next state is defined for the opcode value. If the opcode is not for one of the instructions we support, an undefined instruction exception has occurred.
- When the ALU Overflow signal is asserted, an overflow exception has occurred.
We can then add the exception FSM to our generic FSM.

Our Final Multicycle Datapath FSM
[Figure: the final FSM, with the two exception states added]
- State 0 (Instruction fetch), entered from Start: MemRead, ALUSrcA = 0, IorD = 0, IRWrite, ALUSrcB = 01, ALUOp = 00, PCWrite, PCSource = 00
- State 1 (Instruction decode/Register fetch): ALUSrcA = 0, ALUSrcB = 11, ALUOp = 00
- State 2 (Memory address computation): ALUSrcA = 1, ALUSrcB = 10, ALUOp = 00
- State 3 (Memory access, Op = 'LW'): MemRead, IorD = 1
- State 4 (Write-back step): RegWrite, MemtoReg = 1, RegDst = 0
- State 5 (Memory access, Op = 'SW'): MemWrite, IorD = 1
- State 6 (Execution): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 10
- State 7 (R-type completion): RegDst = 1, RegWrite, MemtoReg = 0
- State 8 (Branch completion): ALUSrcA = 1, ALUSrcB = 00, ALUOp = 01, PCWriteCond, PCSource = 01
- State 9 (Jump completion, Op = 'J'): PCWrite, PCSource = 10
- State 10 (undefined instruction, entered from state 1 when the opcode is not one we support): IntCause = 0, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11
- State 11 (entered from state 7 on Overflow): IntCause = 1, CauseWrite, ALUSrcA = 0, ALUSrcB = 01, ALUOp = 01, EPCWrite, PCWrite, PCSource = 11

One More Issue
- Close examination of the final multicycle datapath FSM reveals that if an arithmetic overflow occurs, the instruction completes and writes its (incorrect) result, because the overflow transition occurs when the instruction is in the write stage.
- Is this the correct behavior? Maybe – it’s a design decision!
- Some architectures – including MIPS – specify that an instruction has no effect if it causes an exception (so the FSM is not quite accurate to the MIPS specification).

Case Study: The IA-32/x86 Control
- The Intel IA-32 microprocessor line, since the 80486, uses a combination of hardwired control and microprogrammed control.
- Hardwired control is used for simple instructions (single path through the datapath and a small number of clock cycles).
- Microprogrammed control is used for complex instructions.
- This allows IA-32 to achieve low cycle counts for simple instructions while not requiring such instructions to pass through the complex datapath that handles the very complex general instructions (make the common case fast!).

The Pentium Pro Datapath
- The Pentium and all of its successors are superscalar processors (we will discuss this in detail later).
- More than one instruction is executed per clock cycle – this requires a duplication of datapath resources.
- Think of the PPro as having multiple datapaths, each tailored for a specific type of instruction.
- Using such a design, we can execute a memory load at the same time as we execute a computation instruction, for example.

Microoperations
- The PPro executes simple microoperations (another name for microinstructions), much like MIPS instructions.
- Microoperations (MOs) are self-contained operations that are 72 bits wide internally – these 72 bits are expanded to 120 (for the integer datapath) or 285 (for the floating-point datapath) control signals.
- The control unit that executes the MOs is a hardwired control unit (like the FSM model we have already seen).

Generating Microoperations
The instruction-to-MO generation is done in one of two ways.
- For x86 instructions that require four or fewer MOs, the instruction is directly decoded into one to four MOs by hardwired PLAs.
- For other x86 instructions, the microprocessor uses a traditional microcode sequencer (the design we saw for MIPS with address select logic) and microcode control store to generate the sequence of five or more MOs.
Pentium Pro Performance
- The Pentium Pro’s datapath combines many features…
  - simple, low-level hardwired control and simple datapaths for executing MOs
  - a translation process that has hardwired control for simple instructions and microcoded control for complex instructions
- These features allow the PPro to perform very well in terms of CPI for integer instructions compared with traditional RISC microprocessors – something that was a long time coming from a traditional CISC design.

Summary
- Datapath design can be vastly improved by moving from a single cycle to a multicycle design.
- Control unit design can be accomplished using hardwired control (FSM) or microprogramming.
- Now that we have covered the basics of datapath and control unit design, we will move to pipelining, the key to modern microprocessor performance.
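As a closing illustration of what different cycle counts per instruction mean for performance, the average CPI of the multicycle design can be worked out from the number of steps each instruction class takes. The sketch below is in Python. The cycle counts follow from the FSM in these slides (five for a load, four for a store or R-type, three for a branch or jump). The instruction mix is an assumed, purely illustrative workload and is not taken from these slides, and the names `CYCLES`, `ASSUMED_MIX`, and `average_cpi` are hypothetical.

```python
# Sketch only: average CPI of the multicycle design for an assumed mix.
# Cycle counts come from the FSM (number of states visited per class).
CYCLES = {"lw": 5, "sw": 4, "r-type": 4, "beq": 3, "j": 3}

# Assumed instruction mix (fractions of executed instructions); these
# numbers are illustrative only and must sum to 1.
ASSUMED_MIX = {"lw": 0.22, "sw": 0.11, "r-type": 0.49, "beq": 0.16, "j": 0.02}


def average_cpi(mix):
    """Weighted average of cycles per instruction over the given mix."""
    return sum(fraction * CYCLES[kind] for kind, fraction in mix.items())


if __name__ == "__main__":
    print(round(average_cpi(ASSUMED_MIX), 2))
```

With this assumed mix the average works out to about 4.04 cycles per instruction, compared with 5 if every instruction had to take as long as a load.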