Transcript Slide 1
EEE515J1 ASICs and DIGITAL DESIGN
Lecture 7: CPUs; The SHC1, Simple Hypothetical CPU #1
Ian McCrum, Room 5B18, Tel: 90 366364 (voice mail on 6th ring)
Email: [email protected]
Web site: http://www.eej.ulst.ac.uk (old archive http://tigger.engj.ulst.ac.uk/~ddij23 )
Last changed 30/11/07@18:00
30/11/07 www.eej.ulster.ac.uk/~ian/modules/EEE515J1/ files

Slide 2: Common ASM DATA PROCESSOR blocks
[Figure: the CONTROL LOGIC sends Control Signals to the DATA PROCESSOR and receives Status Signals from it. The DATA PROCESSOR takes Input Data and produces Output Data. The CONTROL LOGIC also receives External Inputs (only a few, and preferably synchronised to the system clock).]
DATA PROCESSOR: simple blocks, each of which does a single, simple, easily expressed function.
CONTROL LOGIC: actually an FSM, receiving inputs and deciding what sequences of outputs to generate.
We often get data from the outside world or from an internal storage register, process it in some way, and then put the result back into an internal register or send it to the outside world.

Slide 3: More common DATA PROCESSOR blocks
[Figure: (a) a resettable counter (CLEAR, COUNTUP, CLOCK) whose 16-bit output goes to a DETECT block with output EQ16; (b) a REGISTER loaded with a number or constant (LOADSTARTVALUE) feeding a resettable counter (LOAD, COUNTDOWN, CLOCK) whose DETECT block outputs EQ on zero.]
When designing ASM machines we often need to repeat a set of operations a number of times, so we often have counters and some means of detecting when a count is reached (or counters that count down, plus a zero detector: a NOR gate!).

Slide 4: More general-purpose data processing block
[Figure: two similar datapaths, each with registers (LOAD, CLEAR) feeding a result REGISTER (LOADRESULT, CLEARRESULT), one through an ADDER and the other through an ALU controlled by an ALU Function code.]
The ALU can output A+B, A-B, B-A, A, B, A AND B, A OR B, A XOR B, NOT A and NOT B, selected by 4 function code lines. It can also output the STATUS bits Z, C, N and V (see the 74F181 datasheet).
We could add the blocks on the left to every digital machine we design...
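The ALU block of Slide 4 can be modelled in a few lines, which helps when working out what the status bits will be after each function. This is a minimal Python sketch, assuming an 8-bit data path (as in the SHC01) and a hypothetical numbering of the function codes; the real 74F181 uses its own select-line encoding. C is taken here as carry-out for ADD and as borrow for the two subtractions, which is one common convention and not necessarily the one used in the handout.

```python
from enum import IntEnum

MASK = 0xFF  # 8-bit data path, as in the SHC01


class AluOp(IntEnum):
    """Hypothetical 4-bit function codes (NOT the 74F181 select encoding)."""
    ADD = 0x0
    SUB_AB = 0x1   # A - B
    SUB_BA = 0x2   # B - A
    PASS_A = 0x3
    PASS_B = 0x4
    AND = 0x5
    OR = 0x6
    XOR = 0x7
    NOT_A = 0x8
    NOT_B = 0x9


def _subtract(x, y):
    """8-bit x - y. C is set on a borrow (an assumed convention)."""
    r = (x - y) & MASK
    borrow = x < y
    # Signed overflow: operands differ in sign and the result's sign differs from x
    overflow = bool((x ^ y) & (x ^ r) & 0x80)
    return r, borrow, overflow


def alu(op, a, b):
    """Return (result, flags); flags holds the Z, C, N and V status bits."""
    a &= MASK
    b &= MASK
    c = v = False  # logic and pass-through functions leave C and V clear
    if op == AluOp.ADD:
        total = a + b
        r = total & MASK
        c = total > MASK                      # carry out of bit 7
        v = bool((a ^ r) & (b ^ r) & 0x80)    # result sign differs from both operands
    elif op == AluOp.SUB_AB:
        r, c, v = _subtract(a, b)
    elif op == AluOp.SUB_BA:
        r, c, v = _subtract(b, a)
    elif op == AluOp.PASS_A:
        r = a
    elif op == AluOp.PASS_B:
        r = b
    elif op == AluOp.AND:
        r = a & b
    elif op == AluOp.OR:
        r = a | b
    elif op == AluOp.XOR:
        r = a ^ b
    elif op == AluOp.NOT_A:
        r = ~a & MASK
    elif op == AluOp.NOT_B:
        r = ~b & MASK
    else:
        raise ValueError(f"unused function code {op}")
    flags = {"Z": r == 0, "C": c, "N": bool(r & 0x80), "V": v}
    return r, flags
```

Because AluOp is an IntEnum, a raw 4-bit code taken from the function lines, such as alu(0x5, a, b), selects AND just as well as alu(AluOp.AND, a, b).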
Adding these blocks is the start of designing a “general-purpose digital machine”: a CPU.

Slide 5: The SHC01 (see SHC01.pdf)
The SHC01 is the minimum needed to do useful work, and it has many areas that can be improved:
- It has only one accumulator (and a temporary register).
- As it stands, it cannot implement subroutines or even indexed memory accesses.
- It has only 8-bit data and address buses.
It has a PROGRAM ROM, starting at address zero, where every instruction code (OPCODE) and operand is stored. It requires 22 control signals, emitted in the correct order, for everything to work, and allows up to 16 microinstructions for each OPCODE loaded into the IR (Instruction Register). See the fetch-execute tables and microcode tables (the .pdf on the website, or the handout in class) to see how this machine works.

Slide 6: SHC01 block diagram
[Figure: an 8-bit DATA BUS connects ACCA (S), the MDR (S), the RESULT register (S, E) holding the ALU output (function C[2..0]), the IR (S), the PC (S, I, E), the PROGRAM ROM (E), the DATA RAM (S, E), the INPUT BUFFER (E) and the OUTPUT REG (S). The PC and the MAR (S, E) drive the 8-bit ADDRESS BUS.]
The CONTROL UNIT ROM has 13 address inputs (8 from the IR, 1 status bit from the ALU and 4 from LAT) and 22 data outputs, i.e. { ACCAS, MDRS, RESULTS, RESULTE, IRS, PCS, PCI, PCE, MARS, MARE, ALU[C2..C0], ROME, RAMS, RAME, INPE, OUTS, LAT[d3..d0] }. Hence the ROM is 2^13 × 22 bits in size.
The control unit ROM outputs signals to control:
- strobing data into a register (the 'S' lines);
- enabling outputs from registers or buffers (the 'E' lines);
- the function of the ALU (C2, C1 and C0);
- incrementing the PC (the 'I' line);
- a 4-bit number supplied to the LAT latch, which causes the ROM to switch to (typically) the next microinstruction.

Slide 7: POWER UP SEQUENCE and fetch-execute of the first instruction (assumes immediate ADD)
Round brackets mean “contents of”.

STEP | ACTION | RESULT | COMMENT
0 | PCE=1 | (PC)->AB | PUT PROGRAM COUNTER CONTENTS ONTO ADDRESS BUS
1 | ROME=1 | (ROM)->DB | READ THE PROGRAM ROM; OPCODE NOW ON DATA BUS
2 | PCI=1, PCE=1, IRS=1 | (PC)+1->PC, (PC)->AB, (DB)->IR | POINT PC AT OPERAND AND READ THE ROM; ITS CONTENTS GO INTO THE IR
3 | ROME=1 | (ROM)->DB | ADDRESS BUS SETTLES WITH NEW VALUE, THE ADDRESS OF THE OPERAND
4 | MDRS=1 | (DB)->MDR | PUT IT IN THE MDR
5 | ALU=ADD | ALU=(ACC)+(MDR) | EXECUTE THE INSTRUCTION
6 | RESULTS=1 | (ALU)->RESULT |
7 | RESULTE=1 | (RESULT)->DB | PUT ANSWER ONTO DATA BUS
8 | ACCAS=1 | (DB)->ACC | AND INTO ACC
9 | PCI=1 | | THESE ARE PART OF
10 | PCE=1, ROME=1 | | THE NEXT
11 | IRS=1, PCI=1, PCE=1 | | FETCH-EXECUTE TABLE

Do examine the 5-page handout carefully, and check the microcode tables that implement the above.

Slide 8: Improving the SHC01
1) Use a REGISTER BANK
[Figure: a REGISTER BANK (8 registers) with write strobe WRS and address inputs REG_WRITE_ADDRESS[2..0], REGA_READ_ADDRESS[2..0] and REGB_READ_ADDRESS[2..0]; its outputs A and B feed the ALU (C2, C1, C0), whose RESULT register (S, E) returns the answer.]
The register bank needs 10 control signals instead of 2, but the control logic can be altered to make this efficient: take bits directly from the IR to the register address lines.
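The point of Slide 8, that register addresses can come straight from the IR rather than from the control unit, can be sketched in Python. This is a minimal illustration, assuming a hypothetical 16-bit instruction layout (opcode in bits 15..9, write address in 8..6, A read address in 5..3, B read address in 2..0) and a hypothetical opcode-to-ALU-function table; neither comes from the lecture.

```python
MASK16 = 0xFFFF  # hypothetical 16-bit data path


class RegisterBank:
    """8 registers, two read ports (A and B) and one write port strobed by WRS."""

    def __init__(self):
        self.regs = [0] * 8

    def read(self, rega_read_address, regb_read_address):
        # 3-bit addresses select the registers presented on outputs A and B
        return self.regs[rega_read_address & 7], self.regs[regb_read_address & 7]

    def write(self, wrs, reg_write_address, value):
        # The write only happens when the WRS strobe is asserted
        if wrs:
            self.regs[reg_write_address & 7] = value & MASK16


def ir_fields(ir):
    """Hypothetical layout: opcode[15:9], write address[8:6], A read[5:3], B read[2:0]."""
    return (ir >> 9) & 0x7F, (ir >> 6) & 7, (ir >> 3) & 7, ir & 7


# Hypothetical opcode -> ALU function mapping, for illustration only
ALU_FUNCTIONS = {
    0x01: lambda a, b: a + b,
    0x02: lambda a, b: a - b,
    0x03: lambda a, b: a & b,
}


def execute_register_op(bank, ir):
    """Execute one register-to-register instruction.

    The nine register address lines are taken straight from IR fields, so the
    control unit only has to supply WRS (always asserted here) and the ALU
    function (decoded from the opcode here)."""
    opcode, write_addr, a_addr, b_addr = ir_fields(ir)
    a, b = bank.read(a_addr, b_addr)
    result = ALU_FUNCTIONS[opcode](a, b) & MASK16
    bank.write(True, write_addr, result)
    return result
```

For example, with R1 = 5 and R2 = 7, the word (0x01 << 9) | (3 << 6) | (1 << 3) | 2 adds R1 and R2 into R3. Only WRS and the function select remain as control-unit outputs, which is why this approach pays off as the machine gets wider.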
A register bank suits larger machines, 16 bits and above.

Slide 9: Improving the SHC01
2) Use a bigger ALU, or 3) add a secondary ALU
[Figure: the REG bank feeds both the main ALU (set by its ALU Function code) and a Secondary ALU, e.g. a MULTIPLIER.]

Slide 10: Improving the SHC01
4) Improve memory addressing capability
[Figure: the PC (S, I, E) and the MAR both address the ROM and RAM. Besides MARS and MARE, the MAR gains the control lines (a) MARI, increment; (b) MARII, double increment; (c) MARD, decrement; (d) MARDD, double decrement; (e) MARCLR, reset (to access address zero).]

Slide 11: Improving the SHC01
5) Add a second MAR
[Figure: MAR1 and MAR2, alongside the PC, load from the DATA BUS and drive the ADDRESS BUS to the ROM and RAM.]
If you have a source and a destination address in external RAM, it makes sense to have two address pointers within the CPU. MAR2 will need the usual S and E lines; it makes sense to add the others as well (cf. the previous slide).

Slide 12: Improving the SHC01
6) Add a second ALU to allow calculated addresses
[Figure: DATA BUS, MAR1, PC, MAR2, TEMPREG2 and a REG feeding a secondary ALU (a simple adder) whose output can drive the ADDRESS BUS to the ROM and RAM.]

Slide 13: Now to optimise the control unit
The control unit currently needs 13 inputs and 22 outputs. Implemented as one large ROM, it needs 2^13 × 22 = 180,224 bits.

Slide 14: MICROPROGRAMMING
[Figure: the INSTRUCTION REGISTER (8 bits), one STATUS BIT FROM ALU and a 4 BIT LATCH (clocked by clk) together address the CONTROL UNIT ROM CONTAINING MICROCODE. Four of its outputs feed back into the latch; the other 18 go to all the 'S' and 'E' control signals, the ALU C2, C1 and C0 control lines, the AS and BS strobe lines and the PC increment line (PCI).]
On power-up the IR and the LATCH are at zero, so the first address presented to the inputs of the MICROCODE ROM is IR = 0000 0000, status bit = X (don't care), LATCH = 0000. The microcode then:
1. puts the PC's contents onto the address bus;
2. enables the PROGRAM ROM's outputs onto the data bus;
3. strobes the IR. The first real opcode is now in the IR, so the microcode ROM has a new address that depends on what that opcode is: the microcode performs a “microjump” to the new microcode.

Slide 15: Improving the CONTROL UNIT of SHC01
1) Replace LAT with a “MICROPROGRAM COUNTER”
[Figure: as Slide 14, but the 4-bit LATCH becomes a counter driven by 2 outputs of the CONTROL UNIT ROM.]
If we use just the micro-orders “COUNT” and “RESET”, this saves 2 outputs from the control unit, so its new size is 2^13 × 20 (163,840 bits). In fact we can remove the need for “RESET” if we complicate the microcode; the new size is then 2^13 × 19 (155,648 bits). It is even possible to make “COUNT” the default and remove the need for it as well, but at that stage the microcode becomes hard to follow, so this step is left until the very end, when a number of obfuscating optimisations can be carried out.

Slide 16: Improving the CONTROL UNIT of SHC01
2) Look for redundancy in the control signals: PCE/MARE
[Figure: the PC (enabled by PCE) and the MAR (enabled by MARE) both drive the address bus.]
PCE and MARE are never '1' at the same time, and it does no harm to have one of them at '1' all the time, so we can drop MARE and generate it from PCE with an inverter.
(The combination “00” is not used.) This saves an output; the CU ROM is now 2^13 × 18 (147,456 bits).
3) Look for redundancy in the control signals: mutually exclusive 'S' lines
It so happens that we never activate more than one S line at a time, so we can use a decoder. There are times when no S line is active, so it is convenient to use a 3:8 decoder, leaving one code for “no S line”, and provide 7 S lines from a 3-bit number emitted by the control unit ROM. The CU ROM is now 2^13 × 14 (114,688 bits).

Slide 17: Improving the CONTROL UNIT of SHC01
4) Look for redundancy in the control signals: NANOMEMORY
[Figure: the INSTRUCTION REGISTER (8 bits), the STATUS BIT FROM ALU and the 4 BIT LATCH address a CONTROL UNIT ROM CONTAINING a lookup number for each microinstruction; this 7-bit number addresses the NANOMEMORY (clocked by clk), which produces the control signals.]
Although the CU ROM could output many different patterns, if we analyse the complete set of microcode we might discover, for example, that we only need 100 different emissions. Hence we use a “LOOKUP TABLE” to generate these: the CU ROM outputs a number between 0 and 99 (7 bits, since 2^7 = 128 ≥ 100), and the NANOMEMORY emits the required wide microinstruction. The CU ROM is 2^13 × 7 = 57,344 bits and the NANOMEMORY is 2^7 × 14 = 1,792 bits, giving a total of 59,136 bits.

Slide 18: Improving the CONTROL UNIT of SHC01
5) Only provide the opcodes actually wanted (probably far fewer than 256)
Although the CU ROM could provide many different opcodes, such a simple architecture may need only 50 or so. We can keep IR7 and IR6 low all the time, and hence apply only 6 bits from the IR to the ROM (enough for 2^6 = 64 opcodes).
[Figure: as Slide 17, but only 6 bits of the INSTRUCTION REGISTER reach the CONTROL UNIT ROM.]
The CU ROM is 2^11 × 7 = 14,336 bits and the NANOMEMORY is 2^7 × 14 = 1,792 bits, giving a total of 16,128 bits.
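Every ROM size on Slides 13 to 18 follows from one formula: size = 2^(address lines) × (data bits), with the nanomemory added once it appears. The Python sketch below recomputes the whole progression. The step labels are illustrative; the address-line counts and widths are the ones used in the lecture. The helper build_nanomemory is a hypothetical illustration of the lookup-table idea of Slide 17, assuming each microinstruction's control word is represented as an integer: it collects the distinct words into a table (the nanomemory) and replaces each microinstruction by its index into that table (what the CU ROM would then store).

```python
def rom_bits(address_lines, data_bits):
    """Size in bits of a ROM with `address_lines` inputs and `data_bits` outputs."""
    return (1 << address_lines) * data_bits


# (label, CU ROM address lines, CU ROM width, nanomemory (address lines, width) or None)
STEPS = [
    ("4-bit LAT, 22 outputs", 13, 22, None),
    ("Microprogram counter: COUNT + RESET", 13, 20, None),
    ("RESET removed", 13, 19, None),
    ("MARE driven from PCE by an inverter", 13, 18, None),
    ("7 S lines from a 3:8 decoder", 13, 14, None),
    ("7-bit lookup number + nanomemory", 13, 7, (7, 14)),
    ("Only 6 IR bits into the CU ROM", 11, 7, (7, 14)),
]


def total_bits(cu_address_lines, cu_width, nano=None):
    """Total control-store bits: the CU ROM plus the optional nanomemory."""
    bits = rom_bits(cu_address_lines, cu_width)
    if nano is not None:
        bits += rom_bits(*nano)
    return bits


def build_nanomemory(control_words):
    """Split wide control words into a table of distinct words (the nanomemory)
    and one narrow index per microinstruction (the CU ROM contents).

    Returns (indices, table, index_width)."""
    table = []      # distinct control words, in order of first appearance
    position = {}   # control word -> its index in `table`
    indices = []
    for word in control_words:
        if word not in position:
            position[word] = len(table)
            table.append(word)
        indices.append(position[word])
    # Bits needed to hold the largest index, len(table) - 1 (at least 1 bit)
    index_width = max(1, (len(table) - 1).bit_length())
    return indices, table, index_width


if __name__ == "__main__":
    for label, addr, width, nano in STEPS:
        print(f"{label:40s} {total_bits(addr, width, nano):>8,} bits")
```

With 100 distinct control words, build_nanomemory reports an index width of 7 bits, matching the 7-bit CU ROM output on Slide 17; the saving comes from storing each 14-bit word once instead of once per microinstruction.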
Slide 19: Improving the CONTROL UNIT of SHC01
6) Use fields in the IR to drive control signals directly
[Figure: some fields of the INSTRUCTION REGISTER go to the CONTROL UNIT ROM, while others are wired directly, e.g. to the ALU fn or the REG bank addresses.]
Although this is more common in bigger machines (e.g. 16 bits), we can divide the IR into fields and “wire” them directly to parts of the CPU, bypassing the CU and saving space there. If a field in the IR is used as a “MODE” field, it can drive multiplexers and switches that route the other IR fields to different parts of the CPU. The PDP-11, for example, uses this to allow fields to drive the ALU or the ADDRESS calculation sections. At this point the architecture (and the microcode) becomes complicated, and beyond the scope of the course!

Slide 20: Summary
- Be able to sketch a typical CPU.
- Be able to sketch a typical CONTROL UNIT.
- Be able to work out FETCH-EXECUTE tables for simple (explained) instructions.
- Be able to write out a MICROCODE table, including whatever steps are required at power-up to get the machine going.
- Be able to suggest architectural improvements to the CPU.
- Be able to sketch CONTROL UNIT improvements and calculate the resulting savings in ROM sizes.