Computer Architecture
Dr. R. Venkatesan
Fall 2005

PREREQUISITES
• Digital Logic: basic building blocks, design
• Computer Programming: object-oriented
• Computer Organization: microprocessors
• Basic Instruction Set: assembly language
• Computer Interfacing: microprocessors
• Computer Design: digital systems design
• HDL: concurrency, delay: VHDL
• HLL compilers

ENGR 6861 Fall 2005, R. Venkatesan, Memorial University

What makes a better architecture
• Higher performance: speed, throughput
• Elegance: symmetry, simplicity, orthogonality
• Flexibility: scalability – upwards/downwards
• Power efficiency
• Low cost: mostly depends on the above factors
However, the sleekest architecture need not be the most popular one, as marketing skill and market lead are perhaps the two most important factors in achieving popularity or business success. That said, even market leaders have to keep improving their architectures in order to retain their popularity and success.

Computers and Processors
• Old classification of computers: micro, mini, mainframe, super
• Classification based on use: general-purpose; servers; embedded systems; special-purpose: DSPs; numerical coprocessors for division, convolution, FFT, etc.; graphics/video processors; audio/speech processors; data processors; communication processors including network processors, security processors, and codecs for error control; and so on
• Cluster processors; distributed computers; shared-memory multiprocessors; supercomputers; array processors including systolic arrays
Enabling Technologies
• IC technology: CMOS feature size, integration limits, Moore's Law
• Memory (DRAM) technology: memory size, cost and speed; memory speed lags behind processor speed, and the gap is getting progressively wider
• Mass/secondary storage technology: portability
• Network technology: LAN, MAN, WAN, Ethernet, ATM, Internet, Bluetooth, wireless technologies, WiFi, WiMax, 2G, 2.5G, 3G, 4G, 5G, UWB, ad hoc and sensor networks

Measuring Performance
• Performance is inversely related to the execution time of the application
• Possible measures: wall-clock time; response time; CPU time for the user application (with or without the system-call time for the application); processor clock speed; metrics such as MIPS, GFLOPS, TOPS, MPolygons/s and kTrans/s
• Measuring the actual CPU time to execute the application on the target computer is the best approach – but is it always possible?
• Benchmarks and benchmark suites: compare the performance against a "standard" computer/processor

Processor Benchmarks
• Real applications: e.g. GCC, MS Word, LaTeX
• Modified (scripted) applications: to stress a particular aspect of the processor, such as multiuser access
• Kernels: small repeated code; e.g. Livermore loops
• Toy benchmarks: Puzzle, Quicksort, Sieve of Eratosthenes
• Synthetic benchmarks: Whetstone (floating-point), Dhrystone (integer/string), Dhampstone
• Benchmark suites: combinations of the above for a selected focus: SPEC CPU2000, SPECint, SPECfp, SPECweb, SPEC SFS, TPC-C, etc.

Performance comparison – example 1

                    Times (s)                 Weightings
                    A        B       C        W(1)    W(2)    W(3)
Program P1          1        10      20       0.999   0.909   0.50
Program P2          1000     100     20       0.001   0.091   0.50
Total time          1001     110     40

Weighted means (s):
Arith. mean W(1)    2.00     10.09   20.00
Arith. mean W(2)    91.91    18.19   20.00
Arith. mean W(3)    500.50   55.00   20.00
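The weighted means in the table can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the weight values are those of the table, where W(1) and W(2) are chosen to equalize the two programs' contributions on machines A and B respectively, and W(3) weights both programs equally.

```python
# Weighted arithmetic means for example 1 (all times in seconds).
times = {"A": (1, 1000), "B": (10, 100), "C": (20, 20)}
weights = {"W(1)": (0.999, 0.001), "W(2)": (0.909, 0.091), "W(3)": (0.5, 0.5)}

for wname, (w1, w2) in weights.items():
    # Weighted mean of the two program times on each machine.
    means = {m: round(w1 * p1 + w2 * p2, 2) for m, (p1, p2) in times.items()}
    print(wname, means)
```

Running this reproduces the three "Arith. mean" rows, e.g. W(1) gives 2.00 s for A, 10.09 s for B and 20.00 s for C – each weighting scheme crowns a different "fastest" machine.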
Example contd. – normalized execution times

                Normalized to A        Normalized to B        Normalized to C
                A     B      C         A      B     C         A      B     C
Program P1      1.0   10.0   20.0      0.1    1.0   2.0       0.05   0.5   1.0
Program P2      1.0   0.1    0.02      10.0   1.0   0.2       50.0   5.0   1.0
Arith. mean     1.0   5.05   10.01     5.05   1.0   1.10      25.03  2.75  1.0
Geom. mean      1.0   1.0    0.63      1.0    1.0   0.63      1.58   1.58  1.0
Total time      1.0   0.11   0.04      9.1    1.0   0.36      25.03  2.75  1.0

(Raw times, reproduced here for ease: P1: A = 1 s, B = 10 s, C = 20 s; P2: A = 1000 s, B = 100 s, C = 20 s.)

Improving performance of a computer
• Use faster material: silicon, GaAs, InP
• Use faster technology: photochemical lithography
• Employ a better architecture within one processor
  – Selection of instruction set: RISC/CISC, VLIW
  – Cache (levels of cache): higher throughput
  – Virtual memory: relocatability, security
  – Pipelining: k stages gives a maximum speedup of k
    • Superpipelining
    • Superscalar (multiple pipelines) with dynamic scheduling
  – Branch prediction
• Use multiple processors: the emphasis of this course
  – Scalability, level of parallelism
  – Shared memory, array processing, multicomputers, MPP
• Employ better software: compilers, etc.

Speedup
• Any (architectural) enhancement will hopefully lead to better performance, and speedup is a measure of this improvement.
• Performance improvement should be based on the total CPU time taken to execute the application, not just one of the component times such as memory access time or clock period.
• If the whole processor is replicated, then the fraction enhanced is 100%, as the whole computation is affected.
• If an enhancement affects only part of the computation, then we need to determine the fraction of the CPU time impacted by the enhancement.
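Returning to the normalized-time example above: the point of the table is that the geometric mean of normalized times is consistent regardless of which machine we normalize to, while the arithmetic mean is not. A short sketch in Python (not part of the original slides; the function name `norm_means` is illustrative) makes this visible:

```python
from math import prod

# Raw times in seconds for programs (P1, P2) on machines A, B, C.
times = {"A": (1, 1000), "B": (10, 100), "C": (20, 20)}

def norm_means(ref):
    """Arithmetic and geometric means of times normalized to machine `ref`."""
    out = {}
    for m, ts in times.items():
        ratios = [t / r for t, r in zip(ts, times[ref])]
        am = sum(ratios) / len(ratios)          # arithmetic mean of ratios
        gm = prod(ratios) ** 0.5                # geometric mean of 2 ratios
        out[m] = (round(am, 2), round(gm, 2))
    return out

for ref in "ABC":
    print(f"normalized to {ref}:", norm_means(ref))
```

The geometric-mean column always ranks the machines the same way (e.g. B's geometric mean relative to A is 1.0, C's is 0.63), whereas the arithmetic means change their verdict depending on the reference machine.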
Amdahl's Law
• This simple but important law tells us that we should always aim at enhancements that affect a large fraction of the computation, if not the whole computation.

  Speedup = (Performance for entire task using the enhancement when possible) / (Performance for entire task without using the enhancement)

  Speedup = 1 / ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)

CPU (Computation) Time
• CPU time is the product of three quantities:
  – the number of instructions executed, or instruction count (IC) – remember this is not the code (program) size;
  – the average number of clock cycles per instruction (CPI) – if CPI varies across instructions, a weighted average is needed;
  – the clock period (τ).
• CPU time = IC × CPI × τ
• An architectural (or compiler-based) enhancement aimed at decreasing one of these three factors might end up increasing one or both of the others. It is the product of the three quantities after applying the enhancement that gives the new CPU time.

CPU Performance Equation
CPU time = IC × CPI_avg × τ
CPU time = CPU clock cycles for a program × τ
CPU time = CPU clock cycles for a program / f
CPI_avg = CPU clock cycles for a program / IC
CPU time = IC × CPI_avg / f
(instructions/program) × (clock cycles/instruction) × (seconds/clock cycle) = seconds/program = CPU time
MIPS = IC / (execution time × 10^6) = f / (CPI × 10^6)
Execution time = IC / (MIPS × 10^6)

Speedup example
• Three enhancements for different parts of the computation are contemplated, with speedups of 40, 20 and 5, respectively. E1 improves 20%, E2 improves 30% and E3 improves 70% of the computation. Assuming all three cost the same, which is the best choice?
• Speedup due to E1 = 1 / ((1 − 0.2) + 0.2/40) = 1.242
• Speedup due to E2 = 1 / ((1 − 0.3) + 0.3/20) = 1.399
• Speedup due to E3 = 1 / ((1 − 0.7) + 0.7/5) = 2.273
• So a large fraction enhanced is more beneficial than a huge speedup applied to a small fraction.
• Hence the frequency of execution of different instructions becomes important – statistics.
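The three speedups above follow directly from Amdahl's Law. A minimal Python sketch (not part of the original slides; the function name `amdahl` is illustrative):

```python
def amdahl(fraction_enhanced, speedup_enhanced):
    """Overall speedup when `fraction_enhanced` of the CPU time
    is accelerated by a factor of `speedup_enhanced`."""
    return 1 / ((1 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# The three candidate enhancements from the example.
for name, frac, sp in [("E1", 0.2, 40), ("E2", 0.3, 20), ("E3", 0.7, 5)]:
    print(f"{name}: overall speedup = {amdahl(frac, sp):.3f}")
```

E3 wins despite having the smallest raw speedup (5 versus 40 and 20), because it touches the largest fraction of the computation – exactly the lesson the slide draws.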