Evaluating Computer Performance


Evaluating Computer Performance
Edward L. Bosworth, Ph.D.
Computer Science Department
Columbus State University
The IBM Stretch
• The IBM 7030, called the “Stretch”, was
intended to be 100
times as fast as existing
IBM models, such as
the 704 and 705.
• Question: How was the
performance to be
measured?
Typical Comments
• 1. I want a better computer.
• 2. I want a faster computer.
• 3. I want a computer or network of computers that
does more work.
• 4. I have the latest game in “World of Warcraft” and
want a computer that can play it.
• QUESTION: What does “better” mean?
• What does “faster” really mean, beyond the obvious?
• What does it mean for a computer to do more work?
The Need for Performance
• What applications require such computational power?
• Weather modeling. We would like to have weather
predictions that are accurate up to two weeks after the
prediction. This requires a great deal of computational
power and data storage. It was not until the 1990s
that the models included the Appalachian Mountains.
• We would like to model the flight characteristics of an
airplane design before we actually build it. We have a
set of equations for the flow of air over the wings; at
present these would take tens of years to solve.
• Some AI applications, such as IBM’s Watson.
The Job Mix
• What is our speed measure: the time to run one job or
the number of jobs run per unit time?
• The World of Warcraft example is a good illustration of a
job mix. Here the job mix is simple; run this game.
• Most computers run a mix of jobs. To assess the
suitability of a computer for such an environment, one
needs a proper “job mix” that represents the computing
needs of one’s organization.
• These days, few organizations have the time to specify a
good job mix. The more common option is to use the
results of commonly available benchmarks, which are
job mixes tailored to common applications.
What Is Performance?
• In many applications, especially the old “batch mode”
computing, the measure was the number of jobs per
unit time. The more user jobs that could be processed,
the better.
• For a single computer running spreadsheets, the speed
might be measured in the number of calculations per
second.
• For computers that support process monitoring, the
requirement is that the computer correctly assesses
the process and takes corrective action (possibly to
include warning a human operator) within the shortest
possible time.
Two “Big and Fast” Computer Types
• Here we mention two classes of large computer
systems and the performance measures
appropriate for each class.
• A supercomputer, such as the Cray XT5, is
designed to work one big and very complex
problem at a time. Its performance measure:
time to solve each problem.
• An enterprise computer, such as the IBM z/9, is
designed to process a very large number of
simple transactions. Its performance measure:
the number of transactions per second.
Hard Real Time
• Some systems, called “hard real time” systems, are those in which
there is a fixed time interval during which the computer
must produce an answer or take corrective action.
• As an example, consider a process monitoring computer
with a required response time of 15 seconds. There are
two performance measures of interest.
• 1. Can it respond within the required 15 seconds? If not,
it cannot be used.
• 2. How many processes can be monitored while
guaranteeing the required 15-second response time for
each process being monitored? The more, the better.
• A fire alarm with a two-day response time is unusable.
Statistics: The Average and Median
• The problem starts with a list of N numbers (A1, A2, …,
AN), with N ≥ 2.
• The average is computed by the formula (A1 + A2 + … + AN) / N.
• The median is the “value in the middle”. Half of the
values are larger and half are smaller. For a small even
number of values, there might be 2 candidate medians.
• The average and median are often close, but may be
wildly different. Consider Bill Gates’s high school class.
The average income is above one million dollars, thanks
to Bill and another Microsoft founder. The median is
about $40,000.
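The contrast between average and median is easy to reproduce with Python’s standard library; the income figures below are invented purely for illustration.

```python
from statistics import mean, median

# One enormous outlier drags the average far from the median
# (all figures invented for illustration).
incomes = [35_000, 38_000, 40_000, 45_000, 50_000, 1_000_000_000]

print(f"average = {mean(incomes):,.0f}")   # hundreds of millions
print(f"median  = {median(incomes):,.0f}") # 42,500, still a "typical" value
```

Half the values lie below the median and half above, so a single billionaire cannot move it much.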
The Weighted Average
• In certain averages, one might want to pay more
attention to some values than others.
For example, in assessing an instruction mix, one might
want to give a weight to each instruction that
corresponds to its percentage in the mix.
• Each of our numbers (A1, A2, …, AN), with N ≥ 2, has an
associated weight.
So we have (A1, A2, …, AN) and (W1, W2, …, WN). The
weighted average is given by the formula
(W1A1 + W2A2 …+ WNAN) / (W1 + W2 +…+ WN).
• NOTE: If all the weights are equal, this value becomes
the arithmetic mean.
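A short sketch of the formula above (the instruction-mix cycle counts and weights are invented for illustration):

```python
def weighted_average(values, weights):
    """(W1*A1 + ... + WN*AN) / (W1 + ... + WN), per the formula above."""
    return sum(w * a for w, a in zip(weights, values)) / sum(weights)

# Hypothetical instruction mix: cycles per instruction class,
# weighted by each class's share of the mix.
cycles = [1, 2, 4]         # ALU, load/store, branch
shares = [0.6, 0.3, 0.1]   # fraction of the mix
print(weighted_average(cycles, shares))     # about 1.6 cycles on average
print(weighted_average(cycles, [1, 1, 1]))  # equal weights: the arithmetic mean
```

With equal weights the result reduces to the plain arithmetic mean, exactly as the note says.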
Other Averages
• Geometric Mean
• The geometric mean is the Nth root of the product:
(A1·A2·…·AN)^(1/N). It is generally applied only to positive
numbers, as we are considering here.
• Some of the SPEC benchmarks (discussed later) report the
geometric mean.
• Harmonic Mean
• The harmonic mean is N / ( (1/A1) + (1 / A2) + … + ( 1 / AN) )
• This is more useful for averaging rates or speeds. As an
example, suppose that you drive at 40 miles per hour for
half the distance and 60 miles per hour for the other half.
Your average speed is 48 miles per hour.
More on the Harmonic Mean
• Are we averaging by time or distance driven?
• By distance
Drive 300 miles at 40 mph. Time is 7.5 hours.
Drive 300 miles at 60 mph. Time is 5.0 hours.
• You have covered 600 miles in 12.5 hours, for an
average speed of 48 mph.
• But: Drive 1 hour at 40 mph. You cover 40 miles.
Drive 1 hour at 60 mph. You cover 60 miles.
That is 100 miles in 2 hours; 50 mph.
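The 40/60 mph example is exactly what the harmonic mean computes, and Python’s standard library includes it directly:

```python
from statistics import harmonic_mean, mean

speeds = [40, 60]  # mph, driven over equal distances
print(harmonic_mean(speeds))  # about 48.0: the average speed by distance
print(mean(speeds))           # 50: the average speed by time
```

The harmonic mean answers “equal distances at each rate”; the arithmetic mean answers “equal times at each rate”, matching the two calculations on the slide.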
Performance & Execution Time
• Whatever it is, performance is inversely related to
execution time. The longer the execution time, the lower
the performance.
• Again, we assess a computer’s performance by
measuring the time to execute either a single program or
a mix of computer programs.
• Wall-clock time is easiest to measure, but may include
time not spent on the target program or job mix.
• CPU execution time is the time the CPU actually spends
executing the target program. It is harder to measure.
MIPS, MFLOPS, etc.
• Here are some commonly used performance measures.
• The first is MIPS (Million Instructions Per Second).
• Another measure is the FLOPS sequence, commonly used
to specify the performance of supercomputers, which tend
to use floating–point math fairly heavily. The sequence is:
• MFLOPS: Million Floating Point Operations Per Second
• GFLOPS: Billion Floating Point Operations Per Second
(“giga” is the standard prefix for 10⁹.)
• TFLOPS: Trillion Floating Point Operations Per Second
(“tera” is the standard prefix for 10¹².)
• Note that these measures are for test programs. The
actual sustained performance will be somewhat lower.
MIPS as a Performance Measure
• The term “MIPS” had its origin in the marketing
departments of IBM and DEC, to sell the IBM
370/158 and VAX–11/780.
• One wag has suggested that the term “MIPS”
stands for “Meaningless Indicator of Performance
for Salesmen”.
• A more significant reason for the decline in the
popularity of the term “MIPS” is the fact that it
just measures the number of instructions
executed, not the work those instructions accomplish.
MIPS vs. Work Done
• Consider the high-level language instruction
A[K++] = B. How many assembly language
instructions are used to realize this one line?
• A VAX-11/780 would require one instruction.
• A MIPS processor would require at least two:
A[K] = B
K=K+1
• We want to measure the time to solve the
problem, not just some instruction count.
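To see why an instruction count alone is misleading, here is a small sketch. All of the numbers are hypothetical: machine B executes more instructions per second, yet machine A finishes the same job sooner.

```python
# Hypothetical illustration: a higher MIPS rating does not imply a
# shorter run time for the same problem.
def mips(instruction_count, seconds):
    """Millions of instructions executed per second for one run."""
    return instruction_count / (seconds * 1e6)

# Machine A (CISC-like): 10 million instructions, finishes in 0.020 s.
# Machine B (RISC-like): 20 million instructions, finishes in 0.025 s.
mips_a = mips(10e6, 0.020)   # about 500 MIPS
mips_b = mips(20e6, 0.025)   # about 800 MIPS

# B posts the higher MIPS figure, yet A solves the problem sooner.
print(mips_a, mips_b)
```

The time to solve the problem (0.020 s vs. 0.025 s) is the measure that matters, not the MIPS figures.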
Understanding Performance
• Algorithm: determines the number of operations executed.
• Programming language, compiler, and architecture: determine the
number of machine instructions executed per operation.
• Processor and memory system: determine how fast instructions are executed.
• I/O system (including OS): determines how fast I/O operations are executed.
Chapter 1 — Computer Abstractions and Technology — 17

Defining Performance
• Which airplane has the best performance?
[Four bar charts compare the Boeing 777, Boeing 747, BAC/Sud Concorde,
and Douglas DC-8-50 on passenger capacity, cruising range (miles),
cruising speed (mph), and passengers × mph; each measure ranks the
airplanes differently.]
Response Time and Throughput
• Response time: how long it takes to do a task.
• Throughput: total work done per unit time,
e.g., tasks/transactions/… per hour.
• How are response time and throughput affected by:
• Replacing the processor with a faster version?
• Adding more processors?
• We’ll focus on response time for now…
CPU Clocking
• Operation of digital hardware is governed by a
constant-rate clock.
[Waveform diagram: within each clock period, data transfer and
computation occur, followed by a state update.]
• Clock period: the duration of a clock cycle,
e.g., 250 ps = 0.25 ns = 250×10⁻¹² s.
• Clock frequency (rate): cycles per second,
e.g., 4.0 GHz = 4000 MHz = 4.0×10⁹ Hz.
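Clock period and clock rate are reciprocals of one another, which the example figures make easy to verify:

```python
# Clock period and clock rate are reciprocals (example figures from
# the slide: 250 ps corresponds to 4.0 GHz).
period_s = 250e-12            # 250 ps
rate_hz = 1.0 / period_s      # about 4.0e9 Hz, i.e., 4.0 GHz
print(rate_hz / 1e9, "GHz")
```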
CPU Time
• CPU Time = CPU Clock Cycles × Clock Cycle Time
= CPU Clock Cycles / Clock Rate
• Performance is improved by:
• Reducing the number of clock cycles
• Increasing the clock rate
• The hardware designer must often trade off clock
rate against cycle count.
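A small sketch of the relation above; the cycle count and clock rate are invented figures.

```python
def cpu_time(clock_cycles, clock_rate_hz):
    """CPU Time = CPU Clock Cycles / Clock Rate."""
    return clock_cycles / clock_rate_hz

# Hypothetical program: 8 billion cycles on a 4 GHz processor.
print(cpu_time(8e9, 4e9))  # 2.0 seconds
```

Halving the cycle count or doubling the clock rate would each halve the CPU time, which is exactly the trade-off the slide describes.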
Performance Summary
• The BIG Picture:
CPU Time = (Instructions / Program) × (Clock cycles / Instruction)
× (Seconds / Clock cycle)
• Performance depends on:
• Algorithm: affects IC, possibly CPI
• Programming language: affects IC, CPI
• Compiler: affects IC, CPI
• Instruction set architecture: affects IC, CPI, Tc
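The factored form of the equation can be sketched directly; the instruction count, CPI, and clock rate below are invented figures.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU Time = IC x CPI x Tc, where Tc = 1 / clock rate."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical: 10 billion instructions at an average CPI of 2.0
# on a 4 GHz clock.
print(cpu_time(10e9, 2.0, 4e9))  # 5.0 seconds
```

Each factor offers a separate lever: a better algorithm or compiler lowers IC, a better microarchitecture lowers CPI, and a faster clock lowers Tc.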
Games People Play with Benchmarks
• Synthetic benchmarks (Whetstone, Linpack, and Dhrystone) are
convenient, but easy to fool. These problems arise directly from
the commercial pressures to quote good benchmark numbers.
• Manufacturers can equip their compilers with special switches to
emit code that is tailored to optimize a given benchmark at the cost
of slower performance on a more general job mix. “Just get us
some good numbers!”
• The benchmarks are usually small enough to be run out of cache
memory.
This says nothing of the efficiency of the entire memory system,
which must include cache memory, main memory, and support for
virtual memory.
• A 1995 Intel special compiler was designed only to excel in the SPEC
integer benchmark. Its code was fast, but incorrect.
• Small benchmarks often give overly optimistic results.
SPEC Benchmarks
• The SPEC (Standard Performance Evaluation Corporation) was founded
in 1988 by a consortium of computer manufacturers.
• As of 2007, the current SPEC benchmarks were:
• 1. CPU2006 measures CPU throughput, cache and memory access
speed, and compiler efficiency. This has two components:
• SPECint2006 to test integer processing, and
• SPECfp2006 to test floating point processing.
• 2. SPEC MPI 2007 measures the performance of parallel computing
systems and clusters running MPI (Message–Passing Interface).
• 3. SPECweb2005 is a set of benchmarks for web servers.
• 4. SPEC JBB2005 is a set of benchmarks for server–side Java
performance.
• 5. SPEC JVM98 is a set of benchmarks for client–side Java performance.
• 6. SPEC MAIL 2001 is a set of benchmarks for mail servers.
Concluding Remarks on
Benchmarks
• First, remember the great temptation to manipulate
benchmark results for commercial advantage.
As the Romans said, “Caveat emptor”.
• Do not forget Amdahl’s Law, which computes the
improvement in overall system performance due to the
improvement in a specific component.
• This law was formulated by George Amdahl in 1967. One
formulation of the law is given in the following equation.
S = 1 / [ (1 – f) + (f /K) ]
• S is the speedup of the overall system,
• f is the fraction of work performed by the faster
component, and K is the speedup of the new component.
Amdahl’s Law for Multiprocessors
• Consider a computer system with K CPU’s.
• Let f represent the fraction of the job mix that
can be executed in parallel. Then (1 – f)
represent the amount of sequential code.
• Amdahl’s law remains the same.
S = 1 / [ (1 – f) + (f /K) ]
• As the number of processors becomes large,
this approaches a limiting value S = 1/(1 – f).
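The formula and its limiting value are easy to check numerically; the parallel fraction below is an invented example.

```python
def amdahl_speedup(f, k):
    """S = 1 / ((1 - f) + f / k): f = parallelizable fraction, k = speedup
    of the improved part (here, the number of CPUs)."""
    return 1.0 / ((1.0 - f) + f / k)

# Hypothetical job mix: 90% parallelizable.
print(amdahl_speedup(0.9, 8))          # about 4.7x with 8 CPUs
print(amdahl_speedup(0.9, 1_000_000))  # about 10x: the limit 1 / (1 - f)
```

Even with a million processors, the 10% sequential portion caps the overall speedup at 10×, which is the limiting value S = 1/(1 − f).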
Some Results for Amdahl’s Law