CDA 3101 Fall 2013
Introduction to Computer Organization
Benchmarks
30 August 2013
Overview
• Benchmarks
• Popular benchmarks
  – Linpack
  – Intel’s iCOMP
• SPEC benchmarks
• MIPS benchmark
• Fallacies and pitfalls
Benchmarks
• Benchmarks measure different aspects of component and system performance
• Ideal situation: use the real workload
Types of Benchmarks
• Real programs
• Toy benchmarks
• Kernels
• Synthetic benchmarks
• Risk: adjusting the design to benchmark requirements
  – (Partial) solution: use real programs and update them constantly
• Real programs to use:
  – Engineering or scientific applications
  – Software development tools
  – Transaction processing
  – Office applications
A Benchmark Story
1. You create a benchmark called the vmark
2. Run it on lots of different computers
3. Publish the vmarks at www.vmark.org
4. vmark and www.vmark.org become popular
   – Users start buying their PCs based on vmark
   – Vendors bang on your door
5. Vendors examine the vmark code and tune their compilers and/or microarchitecture to run vmark fast
6. Your vmark benchmark has been broken
7. Create vmark 2.0
Performance Reports
• Reproducibility
  – Include hardware / software configuration (SPEC)
  – Evaluation process conditions
• Summarizing performance
  – Total time
  – Arithmetic mean: AM = (1/n) * Σ exec_time_i
  – Harmonic mean: HM = n / Σ (1/rate_i)
  – Weighted mean: WM = Σ w_i * exec_time_i
  – Geometric mean: GM = (Π exec_time_ratio_i)^(1/n)
  – Key GM property: GM(X_i) / GM(Y_i) = GM(X_i / Y_i)
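The four summary statistics above, and the GM ratio property, can be sketched in a few lines of Python (the execution times below are made-up numbers, chosen only to make the arithmetic easy to check):

```python
import math

def arithmetic_mean(times):
    # AM = (1/n) * sum of exec times
    return sum(times) / len(times)

def harmonic_mean(rates):
    # HM = n / sum of (1/rate_i); appropriate for rates such as Mflops
    return len(rates) / sum(1.0 / r for r in rates)

def weighted_mean(weights, times):
    # WM = sum of w_i * exec_time_i (weights sum to 1)
    return sum(w * t for w, t in zip(weights, times))

def geometric_mean(ratios):
    # GM = (product of ratios)^(1/n)
    return math.prod(ratios) ** (1.0 / len(ratios))

# GM's ratio property: GM(X)/GM(Y) == GM(X/Y), so a GM-based ranking
# does not depend on which machine is chosen as the reference.
x = [2.0, 8.0, 4.0]   # hypothetical exec times on machine X
y = [1.0, 4.0, 16.0]  # hypothetical exec times on machine Y
lhs = geometric_mean(x) / geometric_mean(y)
rhs = geometric_mean([xi / yi for xi, yi in zip(x, y)])
print(abs(lhs - rhs) < 1e-12)  # True
```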
Ex.1: Linpack Benchmark
• “Mother of all benchmarks”
• Time to solve a dense system of linear equations; the core loop is:

      DO I = 1, N
        DY(I) = DY(I) + DA * DX(I)
      END DO

• Metrics
  – R_peak: system peak Gflops
  – N_max: matrix size that gives the highest Gflops
  – N_1/2: matrix size that achieves half the rated R_max
  – R_max: the Gflops achieved for the N_max-size matrix
• Used to rank systems at http://www.top500.org
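The Fortran loop above translates directly to Python; a minimal sketch (vector length and values are made up) that also derives a rough flop rate, since each iteration performs one multiply and one add:

```python
import time

def daxpy(da, dx, dy):
    """DY(I) = DY(I) + DA * DX(I) for I = 1..N, as in the loop above."""
    for i in range(len(dx)):
        dy[i] += da * dx[i]
    return dy

n = 100_000
dx = [1.0] * n
dy = [2.0] * n

start = time.perf_counter()
daxpy(3.0, dx, dy)
elapsed = time.perf_counter() - start

flops = 2 * n  # one multiply + one add per iteration
print(dy[0])   # 5.0  (2.0 + 3.0 * 1.0)
print(flops / elapsed / 1e9)  # rough Gflop/s; interpreted Python is far below R_peak
```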
Ex.2: Intel’s iCOMP Index 3.0
• New version (3.0) reflects:
  – Mix of instructions for existing and emerging software
  – Increasing use of 3D, multimedia, and Internet software
• Benchmarks and weights:
  – 2 integer productivity applications (20% each)
  – 3D geometry and lighting calculations (20%)
  – FP engineering and finance programs and games (5%)
  – Multimedia and Internet application (25%)
  – Java application (10%)
• Score: weighted GM of relative performance
  – Baseline processor: Pentium II at 350 MHz
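A weighted-GM index like iCOMP can be sketched as follows; the per-category performance ratios below are invented for illustration, while the weights are the ones listed above:

```python
def weighted_geometric_mean(ratios, weights):
    """Weighted GM: product of ratio_i ** w_i, with weights summing to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    result = 1.0
    for r, w in zip(ratios, weights):
        result *= r ** w
    return result

# Hypothetical speedups vs. the Pentium II/350 baseline, one per category:
# two productivity apps, 3D, FP/games, multimedia/Internet, Java.
ratios  = [1.8, 1.6, 2.0, 1.5, 2.2, 1.9]
weights = [0.20, 0.20, 0.20, 0.05, 0.25, 0.10]

index = weighted_geometric_mean(ratios, weights)
print(round(index, 3))  # a single composite score for the processor
```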
Ex.3: SPEC CPU Benchmarks
• System Performance Evaluation Corporation
• Need to update/upgrade benchmarks:
  – Longer run time
  – Larger problems
  – Application diversity
• Rules to run and report (www.spec.org)
  – Baseline and optimized results
  – Geometric mean of normalized execution times
  – Reference machine: Sun Ultra5_10 (300-MHz SPARC, 256 MB)
• CPU2006: latest SPEC CPU benchmark (4th version)
  – 12 integer and 17 floating-point programs
• Metrics: response time and throughput
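SPEC-style scoring normalizes each benchmark against the reference machine and takes the geometric mean of the ratios. A sketch with made-up times (the benchmark names are just labels, and the seconds are invented):

```python
import math

# Hypothetical seconds on the Sun Ultra5_10-style reference machine...
ref_times = {"gzip": 1400.0, "gcc": 1100.0, "mcf": 1800.0}
# ...and on the machine under test.
my_times  = {"gzip":  350.0, "gcc":  275.0, "mcf":  450.0}

# Each ratio is reference_time / measured_time (bigger is faster).
ratios = [ref_times[b] / my_times[b] for b in ref_times]

# The reported score is the geometric mean of the normalized ratios.
score = math.prod(ratios) ** (1.0 / len(ratios))
print(score)  # 4.0: this machine is 4x the reference on every benchmark
```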
Ex.3: SPEC CPU Benchmarks
[Figure: timeline of SPEC CPU benchmark suites, 1989-2006; previous benchmarks now retired]
Ex.3: SPEC CPU Benchmarks
• Observe: We will use the SPEC 2000 & 2006 CPU benchmarks in this set of notes.
• Task: You are asked to read about the SPEC 2006 CPU benchmark suite, described at www.spec.org/cpu2006
• Result: Compare SPEC 2006 with SPEC 2000 data (www.spec.org/cpu2000) to answer the extra-credit questions in Homework #2.
SPEC CINT2000 Benchmarks
1.  164.gzip     C    Compression
2.  175.vpr      C    FPGA Circuit Placement and Routing
3.  176.gcc      C    C Programming Language Compiler
4.  181.mcf      C    Combinatorial Optimization
5.  186.crafty   C    Game Playing: Chess
6.  197.parser   C    Word Processing
7.  252.eon      C++  Computer Visualization
8.  253.perlbmk  C    PERL Programming Language
9.  254.gap      C    Group Theory, Interpreter
10. 255.vortex   C    Object-oriented Database
11. 256.bzip2    C    Compression
12. 300.twolf    C    Place and Route Simulator
SPEC CFP2000 Benchmarks
1.  168.wupwise  F77  Physics / Quantum Chromodynamics
2.  171.swim     F77  Shallow Water Modeling
3.  172.mgrid    F77  Multi-grid Solver: 3D Potential Field
4.  173.applu    F77  Parabolic / Elliptic Partial Differential Equations
5.  177.mesa     C    3-D Graphics Library
6.  178.galgel   F90  Computational Fluid Dynamics
7.  179.art      C    Image Recognition / Neural Networks
8.  183.equake   C    Seismic Wave Propagation Simulation
9.  187.facerec  F90  Image Processing: Face Recognition
10. 188.ammp     C    Computational Chemistry
11. 189.lucas    F90  Number Theory / Primality Testing
12. 191.fma3d    F90  Finite-element Crash Simulation
13. 200.sixtrack F77  High Energy Nuclear Physics Accelerator Design
14. 301.apsi     F77  Meteorology: Pollutant Distribution
SPECINT2000 Metrics
• SPECint2000: the geometric mean of 12 normalized ratios (one for each integer benchmark) when each benchmark is compiled with "aggressive" optimization
• SPECint_base2000: the geometric mean of 12 normalized ratios when compiled with "conservative" optimization
• SPECint_rate2000: the geometric mean of 12 normalized throughput ratios when compiled with "aggressive" optimization
• SPECint_rate_base2000: the geometric mean of 12 normalized throughput ratios when compiled with "conservative" optimization
SPECint_base2000 Results
[Chart: SPECint_base2000 for MIPS R12000 @ 400 MHz (IRIX), Intel Pentium III @ 733 MHz (NT 4.0), Alpha 21264 @ 667 MHz (Tru64)]
SPECfp_base2000 Results
[Chart: SPECfp_base2000 for MIPS R12000 @ 400 MHz (IRIX), Alpha 21264 @ 667 MHz (Tru64), Intel Pentium III @ 733 MHz (NT 4.0)]
Effect of CPI: SPECint95 Ratings
[Chart: SPECint95 ratings rising with microarchitecture improvements that lower CPI]
Effect of CPI: SPECfp95 Ratings
[Chart: SPECfp95 ratings rising with microarchitecture improvements that lower CPI]
CPU time = IC * CPI * clock cycle
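The CPU-time equation can be checked with a small worked example (all numbers below are hypothetical):

```python
# CPU time = IC * CPI * clock_cycle_time, per the formula above.
ic = 2_000_000_000        # instructions executed (hypothetical)
cpi = 1.5                 # average clock cycles per instruction (hypothetical)
clock_rate = 1.0e9        # 1 GHz, so the cycle time is 1 ns
cycle_time = 1.0 / clock_rate

cpu_time = ic * cpi * cycle_time
print(cpu_time)  # 3.0 seconds

# A microarchitecture improvement that lowers CPI to 1.0 cuts CPU time
# proportionally, with IC and clock rate unchanged:
cpu_time_improved = ic * 1.0 * cycle_time
print(cpu_time_improved)  # 2.0 seconds
```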
SPEC Recommended Readings
• SPEC 2006 – Survey of Benchmark Programs
  http://www.spec.org/cpu2006/publications/CPU2006benchmarks.pdf
• SPEC 2006 Benchmarks – Journal Articles on Implementation Techniques and Problems
  http://www.spec.org/cpu2006/publications/SIGARCH-2007-03/
• SPEC 2006 Installation, Build, and Runtime Issues
  http://www.spec.org/cpu2006/issues/
Another Benchmark: MIPS
• Millions of Instructions Per Second
• MIPS = IC / (CPU time * 10^6)
• Comparing apples to oranges
• Flaw: 1 MIPS on one processor does not accomplish the same work as 1 MIPS on another
  – It is like determining the winner of a foot race by counting who used fewer steps
  – Some processors do FP in software (e.g., 1 FP op = 100 INT ops)
  – Different instructions take different amounts of time
• Useful for comparisons between 2 processors from the same vendor that support the same ISA with the same compiler (e.g., Intel’s iCOMP benchmark)
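The flaw is easy to see numerically. In this sketch (instruction counts and times are invented), two machines run the same program in the same time, but the machine whose ISA needs three times as many instructions "wins" on MIPS:

```python
def mips(instruction_count, cpu_time_s):
    # MIPS = IC / (CPU time * 10**6), per the formula above.
    return instruction_count / (cpu_time_s * 1e6)

# Same program, same 10 s runtime; machine B's ISA needs 3x the instructions
# (e.g., FP done in software as many integer ops).
machine_a = mips(1_000_000_000, 10.0)
machine_b = mips(3_000_000_000, 10.0)
print(machine_a, machine_b)  # 100.0 300.0: B rates 3x higher yet does no more work
```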
Fallacies and Pitfalls
• Ignoring Amdahl’s law
• Using clock rate or MIPS as a performance metric
• Using the arithmetic mean of normalized CPU times (ratios) instead of the geometric mean
• Using hardware-independent metrics
  – e.g., using code size as a measure of speed
• Believing that synthetic benchmarks predict performance
  – They do not reflect the behavior of real programs
• Believing that the geometric mean of CPU-time ratios is proportional to total execution time [it is NOT!]
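The arithmetic-mean fallacy can be demonstrated directly. With the made-up times below, the AM of normalized ratios declares each machine slower than the other depending on which one is the reference, while the GM gives a consistent answer:

```python
import math

# Made-up execution times (seconds) for two programs on machines A and B.
a = [10.0, 100.0]
b = [20.0, 50.0]

def am(xs):
    return sum(xs) / len(xs)

def gm(xs):
    return math.prod(xs) ** (1.0 / len(xs))

# Normalized to A: B's ratios are 2.0 and 0.5, so AM = 1.25 ("B is slower").
am_b_over_a = am([bi / ai for ai, bi in zip(a, b)])
# Normalized to B: A's ratios are 0.5 and 2.0, so AM = 1.25 ("A is slower").
am_a_over_b = am([ai / bi for ai, bi in zip(a, b)])
# The GM is 1.0 under either reference: the two machines come out even.
gm_b_over_a = gm([bi / ai for ai, bi in zip(a, b)])

print(am_b_over_a, am_a_over_b, gm_b_over_a)  # 1.25 1.25 1.0
```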
Conclusions
• Performance is specific to a particular program or set of programs
• CPU time is the only adequate measure of performance
• For a given ISA, performance increases come from:
  – increases in clock rate (without adverse CPI effects)
  – improvements in processor organization that lower CPI
  – compiler enhancements that lower CPI and/or IC
• Your workload is the ideal benchmark
• You should not believe everything you read!