EGRE 426 Computer Organization and Design Chapter 4

EGRE 426
Computer Organization and Design
Chapter 4
Performance
• Performance is important!
– Often determines the viability of the hardware/software system.
• Consider running Windows XP on a PC with the performance of the original IBM
PC (4.8 MHz clock)
• Determining performance can be difficult.
• Response Time (latency)
– How long does it take for my job to run?
– How long does it take to execute a job?
– How long must I wait for the database query?
• Throughput
– How many jobs can the machine run at once?
– What is the average execution rate?
– How much work is getting done?
Determining performance can be difficult
• Instruction execution times.
– When a salesman quotes a MIPS (millions of instructions per second)
value he is guaranteeing that the machine will not run faster than that
value.
• Benchmark programs are useful but can produce misleading results.
– Benchmarks may depend on small sections of repetitive code.
– There have been many instances of compilers being optimized to do well
on popular benchmarks.
• Real programs provide the best indication of performance.
– Should be chosen based on user needs.
• Scientific applications have different requirements than large database
applications.
SPEC95 Benchmarks
• The System Performance Evaluation Cooperative (SPEC)
group was formed in 1988 by representatives of many
computer companies.
• Most popular and comprehensive set of CPU benchmarks.
– 8 integer and 10 floating-point programs (see Fig 2.6 page 72).
SPEC '95
Benchmark   Description
go          Artificial intelligence; plays the game of Go
m88ksim     Motorola 88k chip simulator; runs test program
gcc         The GNU C compiler generating SPARC code
compress    Compresses and decompresses a file in memory
li          Lisp interpreter
ijpeg       Graphic compression and decompression
perl        Manipulates strings and prime numbers in the special-purpose programming language Perl
vortex      A database program
tomcatv     A mesh generation program
swim        Shallow water model with 513 x 513 grid
su2cor      Quantum physics; Monte Carlo simulation
hydro2d     Astrophysics; hydrodynamic Navier-Stokes equations
mgrid       Multigrid solver in 3-D potential field
applu       Parabolic/elliptic partial differential equations
turb3d      Simulates isotropic, homogeneous turbulence in a cube
apsi        Solves problems regarding temperature, wind velocity, and distribution of pollutant
fpppp       Quantum chemistry
wave5       Plasma physics; electromagnetic particle simulation
Amdahl's Law
Execution Time After Improvement =
Execution Time Unaffected + (Execution Time Affected / Amount of Improvement)
• Example:
"Suppose a program runs in 100 seconds on a machine, with
multiply responsible for 80 seconds of this time. How much do we
have to improve the speed of multiplication if we want the program to
run 4 times faster?"
How about making it 5 times faster?
• Principle: Make the common case fast
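The multiply example can be worked through numerically. The following is a minimal Python sketch of the formula on this slide; the function names time_after_improvement and required_improvement are illustrative assumptions, not part of the original slides.

```python
def time_after_improvement(unaffected, affected, improvement):
    """Amdahl's Law: new time = unaffected time + affected time / improvement."""
    return unaffected + affected / improvement


def required_improvement(total, affected, target_speedup):
    """Factor by which the affected part must speed up to hit the target.

    Returns None when the target cannot be reached, i.e. when the
    unaffected time alone already meets or exceeds the target time.
    """
    unaffected = total - affected
    target_time = total / target_speedup
    remaining = target_time - unaffected  # time budget left for the affected part
    if remaining <= 0:
        return None
    return affected / remaining


# Program runs in 100 s, with multiply responsible for 80 s of it.
print(required_improvement(100, 80, 4))  # 16.0
print(required_improvement(100, 80, 5))  # None
```

Running 4 times faster means a target of 25 seconds, so multiply must shrink from 80 to 5 seconds, a 16x improvement. Running 5 times faster means a target of 20 seconds, which equals the 20 seconds that multiply does not affect, so no finite speedup of multiplication is enough.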
Execution Time
• Elapsed Time
– counts everything (disk and memory accesses, I/O, etc.)
– a useful number, but often not good for comparison purposes
• CPU time
– doesn't count I/O or time spent running other programs
– can be broken up into system time and user time
• Our focus: user CPU time
– time spent executing the lines of code that are "in" our program
For a given instruction set architecture,
increases in CPU performance can come from
three sources
• Increase in clock rate or reduction in clock cycles per
instruction.
• Better compilers
• Improvements in processor architecture
Terms
• Cycle time or clock cycle time.
– If clock frequency, f = 400 MHz then cycle time T = 1/f = 2.5 ns.
• CPI – cycles per instruction or clocks per instruction.
– Different instructions may require different number of clock cycles
to execute.
• MIPS – millions of instructions per second.
– Varies depending on instruction stream.
– Peak MIPS – Best case instruction stream.
– Native MIPS – Typical instruction stream.
– MIPS = (instruction count) / (execution time x 10^6)
= average number of instructions executed in one microsecond.
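The two formulas above (T = 1/f and the MIPS definition) can be checked with a short Python sketch. The helper names cycle_time_ns and mips, and the 300 million instructions in 3 seconds used as input, are illustrative assumptions.

```python
def cycle_time_ns(clock_mhz):
    """Clock cycle time in nanoseconds for a clock rate given in MHz (T = 1/f)."""
    # 1 / (f x 10^6 Hz) seconds = 1000 / f nanoseconds
    return 1000.0 / clock_mhz


def mips(instruction_count, execution_time_s):
    """MIPS = instruction count / (execution time x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


print(cycle_time_ns(400))  # 2.5
print(mips(300e6, 3.0))    # 100.0 (illustrative: 300 million instructions in 3 s)
```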
An example
• Assume we only need to consider CPU time.
• Let clock rate = 400 MHz = 400 million cycles/sec.
• Three types of instructions: A, B, and C.
• Assume we run a program that executes 1000 million
instructions.
An example continued
Inst type   CPI   Millions of instructions   Millions of cycles   Time in sec.                    MIPS
A           4     300                        300 x 4 = 1200       1.2 x 10^9 x 2.5 ns = 3 sec     300/3 = 100 (400/4 = 100)
B           6     500                        500 x 6 = 3000       3.0 x 2.5 = 7.5 sec             500/7.5 = 67 (400/6 = 67)
C           8     200                        200 x 8 = 1600       1.6 x 2.5 = 4 sec               200/4 = 50 (400/8 = 50)
Total             1000                       5800                 14.5 sec                        1000 Mi/14.5 sec = 69 native
Avg CPI = 5800 Mcy/1000 Minst = 5.8
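The arithmetic in this example can be reproduced with a few lines of Python. This is a sketch under the slide's stated assumptions (400 MHz clock, CPU time only); the names mix and summarize are hypothetical.

```python
CLOCK_MHZ = 400  # 400 million cycles per second

# Instruction type -> (CPI, millions of instructions executed)
mix = {"A": (4, 300), "B": (6, 500), "C": (8, 200)}


def summarize(mix, clock_mhz):
    """Return per-type (Mcycles, seconds, MIPS) rows and overall totals."""
    rows = {}
    total_minst = 0
    total_mcycles = 0
    for name, (cpi, minst) in mix.items():
        mcycles = cpi * minst          # millions of cycles for this type
        seconds = mcycles / clock_mhz  # Mcycles / (Mcycles per second)
        rows[name] = (mcycles, seconds, minst / seconds)
        total_minst += minst
        total_mcycles += mcycles
    total_seconds = total_mcycles / clock_mhz
    summary = {
        "cpi": total_mcycles / total_minst,     # average cycles per instruction
        "seconds": total_seconds,               # total CPU time
        "mips": total_minst / total_seconds,    # native MIPS for this mix
    }
    return rows, summary


rows, summary = summarize(mix, CLOCK_MHZ)
for name, (mcycles, seconds, rate) in rows.items():
    print(name, mcycles, seconds, round(rate))
print(round(summary["cpi"], 1), summary["seconds"], round(summary["mips"]))
```

Note that the native MIPS of about 69 is the total instruction count divided by the total time, not the plain average of the three per-type MIPS values.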