Transcript org2-2
Part 2

A five-level memory hierarchy.
Tanenbaum, Structured Computer Organization, Fifth Edition, (c) 2006 Pearson Education, Inc. All rights reserved. 0-13-148521-0
Note cost vs. size.

Design principles for modern computers:
1. All instructions are directly executed by hardware.
   Eliminate the microcode interpreter.
2. Maximize the rate at which instructions are issued.
   If you issue 500 million instructions per second, you have a 500 MIPS machine. Parallelism helps here.
3. Instructions should be easy to decode.
   Made possible by regular, fixed-length instructions with a small number of fields. Fewer instructions are better. Fewer instruction formats are better.
4. Only loads and stores should reference memory.
   Memory access takes a long time. Most instructions should use registers. Separate ops for load & store, which can then be done in parallel with other instructions.
5. Provide many registers.
   At least 32! It is time consuming to have to save registers temporarily and reload them later.

Ways to increase speed:
a. increase the clock speed
b. parallelism
   types:
   1. processor/core level
   2. instruction level

Fetching instructions from memory is slow, so use a prefetch buffer = a set of registers (memory) containing instructions to be executed. Fetch and execution can now be done in parallel!

A five-stage pipeline. The state of each stage as a function of time. Nine clock cycles are illustrated.

Latency = time to execute one instruction from start to finish
Bandwidth = instructions executed per second, typically given in MIPS (millions of instructions per second)
Cycle time = time to move through 1 stage of the pipeline = clock cycle = 1 / clock rate

Problem: Let the cycle time = 3 nsec/stage, and let the execution of each instruction require 6 stages or steps.
a.
What is the bandwidth in MIPS for a machine without any pipeline (i.e., without any instruction-level parallelism)?
   (6 stages / 1 instruction) x (3 x 10^-9 seconds / 1 stage) = 18 x 10^-9 seconds / 1 instruction
   1 inst / (18 x 10^-9 sec) = 55.6 x 10^6 inst/sec, or about 56 MIPS
b. What is the bandwidth in MIPS for a machine with a pipeline?
   Once the pipeline is full, one instruction completes every cycle: 3 x 10^-9 sec/inst
   1 inst / (3 x 10^-9 sec) = 333 x 10^6 inst/sec, or about 333 MIPS

Dual five-stage pipelines with a common instruction fetch unit. The instruction fetch unit fetches pairs of instructions.
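The arithmetic in the pipeline bandwidth problem can be checked with a short Python sketch. The function names `latency_ns` and `bandwidth_mips` are made up for this illustration, and the sketch assumes that every stage takes exactly one cycle and that a pipelined machine keeps its pipeline full (no stalls).

```python
def latency_ns(cycle_time_ns: float, stages: int) -> float:
    """Time for one instruction to pass through all stages, in nsec."""
    return cycle_time_ns * stages


def bandwidth_mips(cycle_time_ns: float, stages: int, pipelined: bool) -> float:
    """Instructions completed per second, in MIPS.

    Assumption: one cycle per stage; a pipelined machine stays full,
    so it completes one instruction every cycle.
    """
    if pipelined:
        ns_per_instruction = cycle_time_ns
    else:
        # Without a pipeline, the next instruction starts only after
        # the previous one has finished all of its stages.
        ns_per_instruction = latency_ns(cycle_time_ns, stages)
    # 1 instruction per nsec = 10^9 inst/sec = 1000 MIPS
    return 1000.0 / ns_per_instruction


# Values from the problem: 3 nsec/stage, 6 stages per instruction.
print(f"latency:       {latency_ns(3, 6):.0f} nsec")
print(f"no pipeline:   {bandwidth_mips(3, 6, pipelined=False):.1f} MIPS")
print(f"with pipeline: {bandwidth_mips(3, 6, pipelined=True):.1f} MIPS")
```

Note that the latency is 18 nsec in both cases; pipelining raises the bandwidth, not the speed of an individual instruction.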
Note: Since 2 instructions can be executed at the same time (S4), they must not conflict over resource usage (e.g., a register), and neither may depend on the result of the other. How can we ensure this? (1) hardware, (2) compiler

386: no pipeline
486: one pipeline
first-generation Pentium: two 5-stage pipelines:
1. u pipeline: can execute any instruction
2. v pipeline: limited; only integer instructions or FXCH
P4: 20 stages
“The later "Prescott" and "Cedar Mill" Pentium 4 cores (and their Pentium D derivatives) had a 31-stage pipeline, the longest in mainstream consumer computing.” http://en.wikipedia.org/wiki/Instruction_pipeline
Nehalem (16 pipeline stages), Enhanced Core, and Sandy Bridge microarchitecture (next few slides; see http://www.intel.com/content/dam/doc/manual/64-ia-32-architectures-optimization-manual.pdf)

A superscalar processor with five functional units.
S3 issues an instruction every clock cycle.
S4 may require more than 1 clock cycle.
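The pairing rule in the note on dual pipelines (no conflict over a register, and neither instruction depending on the result of the other) can be sketched as a simple check, of the kind that either the hardware or the compiler would have to perform. This is a minimal illustration only: the `Instr` record, the register names, and `can_pair` are hypothetical, and the check is deliberately conservative. It does not model any real processor's issue logic.

```python
from typing import FrozenSet, NamedTuple, Optional


class Instr(NamedTuple):
    """Hypothetical instruction record used only for this sketch."""
    name: str
    dest: Optional[str]      # register written, or None if none is written
    srcs: FrozenSet[str]     # registers read


def can_pair(first: Instr, second: Instr) -> bool:
    """Return True if the two instructions may execute at the same time.

    `first` precedes `second` in program order.
    """
    # second needs the result of first
    if first.dest is not None and first.dest in second.srcs:
        return False
    # both write the same register (resource conflict)
    if first.dest is not None and first.dest == second.dest:
        return False
    # second overwrites a register that first still has to read
    # (conservative: treated here as a conflict)
    if second.dest is not None and second.dest in first.srcs:
        return False
    return True


add = Instr("ADD", "r1", frozenset({"r2", "r3"}))
sub = Instr("SUB", "r4", frozenset({"r1", "r5"}))   # reads r1, written by ADD
mul = Instr("MUL", "r6", frozenset({"r7", "r8"}))   # independent of ADD

print(can_pair(add, mul))   # independent pair
print(can_pair(add, sub))   # SUB depends on ADD's result
```

A compiler could use a check like this to reorder instructions so that adjacent pairs are independent; hardware would apply the same idea at issue time and fall back to issuing one instruction when the pair conflicts.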