0657.311A Computer Systems Architecture


Performance – Last Lecture

The bottom-line performance measure is time:

$$\text{Performance}_A = \frac{1}{\text{Execution Time}_A}$$
Comparing Performance

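Machine A is n times faster than machine B when

$$n = \frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution Time}_B}{\text{Execution Time}_A}$$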
Example

If machine A runs a program in 25 seconds
and machine B runs the same program in 20
seconds, how much faster is machine B than
machine A?
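A minimal Python sketch of the comparison, using the two times from the example. The helper name `speedup` is ours, chosen for illustration.

```python
def speedup(time_slow: float, time_fast: float) -> float:
    """Return n such that the faster machine is n times faster.

    Because performance = 1 / execution time,
    Performance_B / Performance_A = Time_A / Time_B.
    """
    return time_slow / time_fast


time_a = 25.0  # seconds on machine A
time_b = 20.0  # seconds on machine B
n = speedup(time_a, time_b)
print(f"Machine B is {n:.2f} times faster than machine A")
```

The ratio is 25 / 20 = 1.25, i.e. machine B is 1.25 times (25%) faster than machine A.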
Metrics of performance
(Figure: levels of a computer system, from the application down to transistors, with the metric typically quoted at each level)
Application: answers per month; useful operations per second
Programming language, compiler, ISA: (millions of) instructions per second (MIPS); (millions of) floating-point operations per second (MFLOP/s)
Datapath, control: megabytes per second
Function units; transistors, wires, pins: cycles per second (clock rate)
Each metric has a place and a purpose, and each can be misused.
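To illustrate how a metric such as MIPS can be misused, here is a small Python sketch using the usual definition MIPS = instruction count / (execution time × 10^6). The two "compiler" versions and their instruction counts and times are hypothetical numbers chosen for illustration, not measurements.

```python
def mips(instruction_count: float, exec_time_s: float) -> float:
    """Millions of instructions per second: IC / (time * 10^6)."""
    return instruction_count / (exec_time_s * 1e6)


# Hypothetical: two compiled versions of the same program on one machine.
versions = {
    "compiler X": (10e9, 10.0),  # (instructions executed, seconds)
    "compiler Y": (5e9, 7.5),
}

for name, (ic, t) in versions.items():
    print(f"{name}: {mips(ic, t):.1f} MIPS, {t} s")
```

Compiler Y's code executes fewer instructions and finishes sooner, yet it scores lower in MIPS (about 667 versus 1000). Execution time is the measure that tracks what the user actually waits for.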
Relating Metrics

Instead of reporting execution time in seconds, we often use cycles:

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

So, to improve performance (everything else being equal) you
can either
reduce the number of cycles a program requires, or
reduce the clock cycle time or, said another way,
increase the clock rate.
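A short Python sketch of the relation above. The cycle counts and clock rates are hypothetical values picked so the arithmetic stays exact; the helper name `exec_time` is ours.

```python
def exec_time(cycles: float, clock_rate_hz: float) -> float:
    """Seconds/program = cycles/program * seconds/cycle.

    Seconds per cycle is 1 / clock rate, so dividing by the clock
    rate is the same as multiplying by the clock cycle time.
    """
    return cycles / clock_rate_hz


base = exec_time(2e9, 500e6)            # 2 billion cycles at 500 MHz
fewer_cycles = exec_time(1.5e9, 500e6)  # reduce the cycles required
faster_clock = exec_time(2e9, 800e6)    # increase the clock rate
print(base, fewer_cycles, faster_clock)  # 4.0 3.0 2.5
```

Either lever shortens the run, provided the other factor really stays the same.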
Example

Our favourite program runs in 10 seconds on computer A,
which has a 400 MHz clock. We are trying to help a
computer designer build a new machine B that will run
this program in 6 seconds. The designer can use new (or
perhaps more expensive) technology to substantially
increase the clock rate, but has informed us that this
increase will affect the rest of the CPU design, causing
machine B to require 1.2 times as many clock cycles as
machine A for the same program. What clock rate
should we tell the designer to target?
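One way to work the example, sketched in Python: recover machine A's cycle count from its time and clock rate, scale it by 1.2 for machine B, then divide by the 6-second target. The variable names are ours.

```python
time_a = 10.0       # seconds on machine A
clock_a = 400e6     # 400 MHz
time_b = 6.0        # target seconds on machine B
cycle_factor = 1.2  # B needs 1.2x as many cycles as A

cycles_a = time_a * clock_a       # cycles = time * clock rate (4 x 10^9)
cycles_b = cycle_factor * cycles_a
clock_b = cycles_b / time_b       # clock rate = cycles / time

print(f"Target clock rate for B: {clock_b / 1e6:.0f} MHz")
```

Machine A needs 4 × 10^9 cycles, machine B needs 4.8 × 10^9, and fitting those into 6 seconds requires an 800 MHz clock.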

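The following formula relates the most basic metrics
to CPU time:

$$\text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

How do we measure these metrics?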
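The formula translates directly into code. A minimal Python sketch; the instruction count, CPI and clock rate in the usage line are hypothetical.

```python
def cpu_time(instructions: float, cpi: float, clock_rate_hz: float) -> float:
    """CPU time = instructions/program * cycles/instruction * seconds/cycle."""
    cycles = instructions * cpi
    # Dividing by the clock rate is the same as multiplying by the cycle time.
    return cycles / clock_rate_hz


# Hypothetical: 2 x 10^9 instructions, CPI of 1.5, 1 GHz clock.
print(cpu_time(2e9, 1.5, 1e9))  # 3.0
```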
Measuring CPI


CPI is often obtained by a detailed simulation of an
architecture or by using CPU performance counters.
Sometimes we know the clock cycle count for
different instruction types:

$$\text{CPU clock cycles} = \sum_{i=1}^{n} \text{CPI}_i \times C_i$$

where $\text{CPI}_i$ is the average number of clock cycles for an instruction of class i,
and $C_i$ is the count of instructions of class i.
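A Python sketch of the sum above, assuming a hypothetical instruction mix (the classes, CPIs and counts are illustrative, not from any real machine). Dividing the total cycles by the total instruction count gives the overall average CPI.

```python
# Hypothetical instruction mix: class -> (CPI_i, C_i)
mix = {
    "ALU":    (1, 50_000_000),
    "load":   (2, 20_000_000),
    "store":  (2, 10_000_000),
    "branch": (3, 20_000_000),
}


def clock_cycles(mix):
    """CPU clock cycles = sum over classes of CPI_i * C_i."""
    return sum(cpi * count for cpi, count in mix.values())


def average_cpi(mix):
    """Overall CPI = total clock cycles / total instruction count."""
    total_instructions = sum(count for _, count in mix.values())
    return clock_cycles(mix) / total_instructions


print(clock_cycles(mix), average_cpi(mix))  # 170000000 1.7
```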
Amdahl's Law
$$\text{Execution time after improvement} = \text{Execution time unaffected} + \frac{\text{Execution time affected}}{\text{Amount of improvement}}$$

Example:
"Suppose a program runs in 100 seconds on a machine,
with multiply responsible for 80 seconds of this time. How
much do we have to improve the speed of multiplication if we
want the program to run 4 times faster?"
How about making it 5 times faster?
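Solving Amdahl's Law for the amount of improvement gives a quick check of both questions. A Python sketch; the function names are hypothetical helpers, not from the lecture.

```python
from typing import Optional


def improved_time(unaffected: float, affected: float, improvement: float) -> float:
    """Amdahl's Law: time after = unaffected + affected / improvement."""
    return unaffected + affected / improvement


def required_improvement(total: float, affected: float,
                         target_speedup: float) -> Optional[float]:
    """Solve total / target = (total - affected) + affected / n for n.

    Returns None when no finite improvement suffices, i.e. when the
    unaffected time alone already meets or exceeds the target time.
    """
    target_time = total / target_speedup
    unaffected = total - affected
    remaining = target_time - unaffected  # time budget left for the affected part
    if remaining <= 0:
        return None
    return affected / remaining


print(required_improvement(100, 80, 4))  # 16.0
print(required_improvement(100, 80, 5))  # None
```

For 4 times faster the target is 25 s; the 20 s spent outside multiplication leaves 5 s for the multiplies, so multiplication must become 16 times faster. For 5 times faster the target of 20 s equals the unaffected time, so even an infinitely fast multiplier is not enough.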

Principle: Make the common case fast
Example

Suppose we enhance a machine making all floating-point
instructions run five times faster. If the execution time of some
benchmark before the floating-point enhancement is 10
seconds, what will the speedup be if half of the 10 seconds is
spent executing floating-point instructions?
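Applying Amdahl's Law to the first question, as a short Python sketch (variable names are ours):

```python
old_time = 10.0       # seconds before the enhancement
fp_time = 5.0         # half of the run is floating point
fp_improvement = 5.0  # FP instructions run 5x faster

# Amdahl's Law: unaffected time + affected time / improvement
new_time = (old_time - fp_time) + fp_time / fp_improvement
speedup = old_time / new_time
print(f"new time = {new_time} s, speedup = {speedup:.2f}")
```

The run drops from 10 s to 6 s, a speedup of about 1.67, well short of 5 because half the time is unaffected.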

We are looking for a benchmark to show off the new floating-point unit described above, and want the overall benchmark to
show a speedup of 3. One benchmark we are considering runs
for 100 seconds with the old floating-point hardware. How much
of the execution time would floating-point instructions have to
account for in this program in order to yield our desired speedup
on this benchmark?
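The second question runs Amdahl's Law backwards: fix the target time, then solve for the floating-point time x in target = (100 - x) + x / 5. A Python sketch with variable names of our choosing:

```python
old_time = 100.0      # seconds on the old floating-point hardware
target_speedup = 3.0
fp_improvement = 5.0  # FP unit is 5x faster

target_time = old_time / target_speedup  # about 33.3 s
# target_time = (old_time - x) + x / fp_improvement  =>  solve for x
fp_time = (old_time - target_time) / (1 - 1 / fp_improvement)
print(f"FP must account for {fp_time:.1f} s ({fp_time / old_time:.0%}) of the run")
```

Floating-point instructions would have to account for about 83.3 of the 100 seconds (roughly 83% of the execution time).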
Remember

Performance is specific to a particular program or set of programs.
Total execution time is a consistent summary of performance.
For a given architecture, performance increases come from:
  increases in clock rate (without adverse CPI effects)
  improvements in processor organization that lower CPI
  compiler enhancements that lower CPI and/or instruction count
Pitfall: expecting an improvement in one aspect of a machine's
performance to increase total performance by a proportional amount.
Evaluating Performance


Real workloads are best, but often not possible to use.
We mostly use benchmarks:
  real applications are best
  SPEC benchmarks: CPU92, CPU95, CPU2000
  SPECweb99
The guiding principle for reporting performance is
reproducibility, i.e. detailing:
  CPU details
  software (e.g. compiler version)
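As a sketch of collecting such details automatically, the Python standard library's `platform` and `sys` modules expose some of them. The field selection and the helper name `environment_report` are our own; note that `platform.processor()` may return an empty string on some Linux systems, so /proc/cpuinfo (see the example below) is usually the more detailed source of CPU information there.

```python
import platform
import sys


def environment_report() -> dict:
    """Collect some CPU/software details worth reporting alongside results."""
    return {
        "machine": platform.machine(),
        "processor": platform.processor(),  # may be '' on some systems
        "os": f"{platform.system()} {platform.release()}",
        "python": sys.version.split()[0],
        "python_compiler": platform.python_compiler(),
    }


for key, value in environment_report().items():
    print(f"{key:16}: {value}")
```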
Example

/proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
type            : primary processor
cpu family      : 6
model           : 9
model name      : Intel(R) Pentium(R) M processor 1600MHz
stepping        : 5
brand id        : 6
cpu count       : 0
apic id         : 0
cpu MHz         : 599
fpu             : yes
flags           : fpu vme de pse tsc msr mce cx8 sep mtrr pge mca cmov pat clfl dtes acpi mmx fxsr sse sse2 tmi pbe tm2 est
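To record these details alongside benchmark results, a script can parse the file. A minimal Python sketch, assuming the "key : value" layout shown above with blank lines between processors; /proc/cpuinfo is Linux-specific and its field names vary by kernel version and architecture. The helper name `parse_cpuinfo` is hypothetical.

```python
from __future__ import annotations


def parse_cpuinfo(text: str) -> list[dict[str, str]]:
    """Parse /proc/cpuinfo-style text into one dict per processor block.

    Each line is 'key<whitespace>: value'; blank lines separate processors.
    """
    processors = []
    current: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                processors.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            current[key.strip()] = value.strip()
    if current:
        processors.append(current)
    return processors


# On Linux: with open("/proc/cpuinfo") as f: cpus = parse_cpuinfo(f.read())
sample = """processor\t: 0
vendor_id\t: GenuineIntel
model name\t: Intel(R) Pentium(R) M processor 1600MHz
cpu MHz\t\t: 599
"""
cpus = parse_cpuinfo(sample)
print(cpus[0]["model name"], cpus[0]["cpu MHz"])
```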