#### Transcript Lect 16: Benchmarks and Performance Metrics

Lect 16: Benchmarks and Performance Metrics (Maeng)

##### Lect 16-2: Measurement Tools

- Benchmarks, traces, mixes
- Cost, delay, area, power estimation
- Simulation at many levels: ISA, RT, gate, circuit
- Queuing theory
- Rules of thumb
- Fundamental laws

##### Lect 16-3: Marketing Metrics

MIPS = Instruction count / (Execution time × 10^6) = Clock rate / (CPI × 10^6)

- How do we compare machines with different instruction sets? Programs with different instruction mixes?
- Depends on the dynamic frequency of instructions
- Uncorrelated with performance

MFLOPS = FP operations / (Execution time × 10^6)

- Machine dependent
- Often not where the time is spent
- Normalized operation weights: add, sub, compare, mult = 1; divide, sqrt = 4; exp, sin, ... = 8

##### Lect 16-4: Fallacies and Pitfalls

Fallacy: MIPS is an accurate measure for comparing performance among computers.

- Dependent on the instruction set
- Varies between programs on the same computer
- Can vary inversely to performance

Fallacy: MFLOPS is a consistent and useful measure of performance.

- Dependent on the machine and on the program
- Not applicable outside floating-point code
- The set of floating-point operations is not consistent across machines

##### Lect 16-5: Programs to Evaluate Processor Performance

- (Toy) benchmarks: 10-100 line programs, e.g. sieve, puzzle, quicksort
- Synthetic benchmarks: attempt to match average frequencies of real workloads, e.g. Whetstone, Dhrystone
- Kernels: time-critical excerpts of real programs, e.g. Livermore loops
- Real programs: e.g. gcc, spice

##### Lect 16-6: Types of Benchmarks

- Architectural: synthetic mixes such as WHETSTONE, DHRYSTONE, ...
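The MIPS and normalized-MFLOPS formulas from the Marketing Metrics slide can be sketched as follows; all numeric inputs here are invented illustrative values, not measurements from any real machine.

```python
# Sketch of the MIPS and normalized-MFLOPS calculations from the slide.
# All numbers used below are made-up illustrative values.

def mips(instruction_count, exec_time_s):
    """MIPS = instruction count / (execution time * 10^6)."""
    return instruction_count / (exec_time_s * 1e6)

def mips_from_cpi(clock_rate_hz, cpi):
    """Equivalent form: MIPS = clock rate / (CPI * 10^6)."""
    return clock_rate_hz / (cpi * 1e6)

# Normalized MFLOPS: weight each FP operation by its relative cost
# (add/sub/compare/mult = 1, divide/sqrt = 4, exp/sin/... = 8).
FP_WEIGHTS = {"add": 1, "sub": 1, "compare": 1, "mult": 1,
              "divide": 4, "sqrt": 4, "exp": 8, "sin": 8}

def normalized_mflops(op_counts, exec_time_s):
    """Weighted FP operations / (execution time * 10^6)."""
    weighted = sum(FP_WEIGHTS[op] * n for op, n in op_counts.items())
    return weighted / (exec_time_s * 1e6)

# 100M instructions in 0.5 s, i.e. a 200 MHz clock at CPI = 1.0:
print(mips(100_000_000, 0.5))     # 200.0
print(mips_from_cpi(200e6, 1.0))  # 200.0

# 8M multiplies and 1M divides in 1 s: (8*1 + 1*4) M weighted ops.
print(normalized_mflops({"mult": 8_000_000, "divide": 1_000_000}, 1.0))  # 12.0
```

Note how the normalization weights make the divide count for four multiplies: this is exactly the machine-dependent fudging the slide warns about.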
- Algorithmic: LINPACK
- Kernels: self-contained sub-programs, such as a PDE solver without input/output
- Production: working code for a significant problem, e.g. PERFECT and SPEC
- Workload

##### Lect 16-7: Levels of Benchmark Specification

- Problem statement: algorithm plus code production; reflects the effort and skill of the implementer more than it does the capability of the system
- Solution method: e.g. NASA Ames
- Source language code: every system performs the same operations; the necessary baseline from which to measure the effectiveness of 'smart' compiler options

##### Lect 16-8: Benchmarking Games

- Differing configurations used to run the same workload on two systems
- Compiler wired to optimize the workload
- Workload arbitrarily picked
- Very small benchmarks used
- Benchmarks manually translated to optimize performance

##### Lect 16-9: Common Benchmarking Mistakes

- Only average behavior represented in the test workload
- Not ensuring the same initial conditions
- "Benchmark engineering": particular optimizations, different compilers or preprocessors, runtime libraries

##### Lect 16-10: Benchmarks

- DHRYSTONE
  - A synthetic benchmark for non-numeric, system-type programming
  - Contains fewer loops, simpler calculations, and more 'if' statements
  - C code
- LINPACK
  - From Argonne National Lab
  - Solution of linear equations in a FORTRAN environment
  - Specified at the solution-method and code levels
  - Used on vectorized processors
- SPEC: Standard Performance Evaluation Corp.
  - Non-profit group of computer vendors, systems integrators, universities, research organizations, publishers, and consultants throughout the world
  - http://www.specbench.org

##### Lect 16-11: SPEC

Groups:

- Open Systems Group (OSG)
  - CPU committee
  - SFS committee: file server benchmarks
  - SDM committee: multi-user Unix commands benchmarks
- High Performance Group (HPG)
  - SMP, workstation clusters, DSM, vector processors, ...
- Graphics Performance Characterization Group (GPC)

What metrics can be measured?
CINT95 and CFP95:

- 'C' denotes 'component-level' benchmarks
  - Measure the performance of the processor, the memory architecture, and the compiler
  - I/O, networking, and graphics are not measured by CINT95 and CFP95
- 'S' denotes 'system-level' benchmarks

##### Lect 16-12: SPEC: System Performance Evaluation Cooperative

- First round, 1989: 10 programs yielding a single number
- Second round, 1992: SPECint92 (6 integer programs) and SPECfp92 (14 floating-point programs); reference machine VAX-11/780
- Third round, 1995: single flag setting for all programs; new set of programs ("benchmarks useful for 3 years"); non-baseline and baseline results; reference machine SPARCstation 10 Model 40
- Fourth round, 1998: under development

##### Lect 16-13: SPEC First Round

- One program spent 99% of its time in a single line of code
- A new front-end compiler could improve its result dramatically

[Bar chart: SPEC performance (0-800) per benchmark: gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv]

##### Lect 16-14: CPU95

CINT95:

- 099.go: an internationally ranked Go-playing program
- 124.m88ksim: a chip simulator for the Motorola 88100 microprocessor
- 126.gcc: based on the GNU C compiler version 2.5.3
- 129.compress: an in-memory version of the common UNIX utility
- 130.li: Xlisp interpreter
- 132.ijpeg: image compression/decompression on in-memory images
- 134.perl: an interpreter for the Perl language
- 147.vortex: an object-oriented database

CFP95:

- 101.tomcatv: vectorized mesh generation
- 102.swim: shallow water equations
- 103.su2cor: Monte Carlo method
- 104.hydro2d: Navier-Stokes equations
- 107.mgrid: 3D potential field
- 110.applu: partial differential equations
- 125.turb3d: turbulence modeling
- 141.apsi: weather prediction
- 145.fpppp: from the Gaussian series of quantum chemistry benchmarks
- 146.wave5: Maxwell's equations

##### Lect 16-15: CINT95 (written in C)

- SPECint95: the geometric mean of eight normalized ratios (one per integer benchmark) when compiled with aggressive optimization for each benchmark
- SPECint_base95: the geometric mean of eight normalized ratios when compiled with conservative optimization for each benchmark
- SPECint_rate95: the geometric mean of eight normalized throughput ratios when compiled with aggressive optimization for each benchmark
- SPECint_rate_base95: the geometric mean of eight normalized throughput ratios when compiled with conservative optimization for each benchmark

##### Lect 16-16: CFP95 (written in FORTRAN)

- SPECfp95: the geometric mean of 10 normalized ratios (one per floating-point benchmark) when compiled with aggressive optimization for each benchmark
- SPECfp_base95: the geometric mean of 10 normalized ratios when compiled with conservative optimization for each benchmark
- SPECfp_rate95: the geometric mean of 10 normalized throughput ratios when compiled with aggressive optimization for each benchmark
- SPECfp_rate_base95: the geometric mean of 10 normalized throughput ratios when compiled with conservative optimization for each benchmark

##### Lect 16-17: The Pros and Cons of Geometric Means

- Independent of the running times of the individual programs
- Independent of the reference machine
- Do not predict execution time
- Focus attention on the benchmarks where performance is easiest to improve: cutting 2 s to 1 s counts the same as cutting 10000 s to 5000 s
- Invite "cracks" and benchmark engineering
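The SPEC-style scoring described above can be sketched as a normalization against the reference machine followed by a geometric mean. The run times below are invented for illustration, not real SPEC data.

```python
# Sketch of SPEC-style scoring: each benchmark's run time is normalized
# against the reference machine's time (the SPARCstation 10 Model 40 for
# the 1995 suites), and the score is the geometric mean of the ratios.
# All times below are invented illustrative values.
import math

def spec_ratio(ref_time_s, measured_time_s):
    """Normalized ratio: how many times faster than the reference machine."""
    return ref_time_s / measured_time_s

def geometric_mean(ratios):
    return math.prod(ratios) ** (1.0 / len(ratios))

# Hypothetical reference times and measured times for eight benchmarks.
ref_times = [100.0, 200.0, 50.0, 400.0, 2.0, 80.0, 120.0, 10000.0]
run_times = [50.0, 100.0, 25.0, 200.0, 1.0, 40.0, 60.0, 5000.0]

ratios = [spec_ratio(r, m) for r, m in zip(ref_times, run_times)]
print(geometric_mean(ratios))  # 2.0, since every benchmark ran twice as fast

# The slide's caveat: the geometric mean is independent of absolute
# running times, so halving a 2 s benchmark moves the score exactly as
# much as halving a 10000 s benchmark, which rewards "benchmark
# engineering" on whichever program is easiest to speed up.
easy = ratios.copy(); easy[4] *= 2   # the 2 s benchmark: 2 s -> 1 s
hard = ratios.copy(); hard[7] *= 2   # the 10000 s benchmark: 10000 s -> 5000 s
print(geometric_mean(easy) == geometric_mean(hard))  # True
```

The last comparison makes the "pro and con" concrete: the property that makes the mean independent of reference times is the same property that hides where the wall-clock time actually goes.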