Performance
Computer Architecture – CS401
Erkay Savas
Sabanci University
7/17/2015
Erkay Savas
1
Performance
• What is performance?
• How to measure performance?
• Performance metrics
• Performance evaluation
• Why does some hardware perform better than other hardware for different programs?
• What factors in hardware are related to overall system performance?
• How does the machine's instruction set affect performance?
Airplane Analogy
• Which of these airplanes has the best
performance?
Airplane           Passenger   Range     Speed      Passenger throughput
                   capacity    (miles)   (m.p.h.)   (passengers x m.p.h.)
Boeing 777         375         4630      610        228750
Boeing 747         470         4150      610        286700
Airbus A 3xx       656         8400      600        393600
Concorde           132         4000      1350       178200
Douglas DC-8-50    146         8720      544        79424
Computer Performance
• Response time (latency)
– How long does it take for my job to run?
– How long does it take to execute a program?
– How long must I wait for a database query?
• Throughput
– How many jobs can the machine run at once?
– What is the average execution rate?
– How much work is getting done?
• If we upgrade a machine with a new processor
what do we increase?
• If we add a new machine what do we increase?
Which Time to Measure?
• Elapsed Time (Wall clock time, response time)
– Counts everything (disk and memory access, I/O,
operating system overhead, work on other
processes)
– Useful but not always good for comparison purposes
• CPU (execution) time
– The time the CPU spends computing for the user task
– Does not include time spent waiting for I/O or running other programs
– User CPU time: CPU time spent within the program
– System CPU time: CPU time spent in the operating system performing tasks on behalf of the program
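The difference between elapsed time and CPU time can be observed directly. The following is a minimal sketch, assuming Python's standard `time` module; `measure` and `sample_task` are hypothetical names introduced only for illustration, and the sleep merely stands in for waiting on I/O.

```python
import time

def measure(task):
    """Run task() once and return (elapsed_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock (elapsed) time
    cpu_start = time.process_time()    # user + system CPU time of this process
    task()
    cpu = time.process_time() - cpu_start
    elapsed = time.perf_counter() - wall_start
    return elapsed, cpu

def sample_task():
    # Hypothetical workload: some computation, then a wait that stands in
    # for I/O. Sleeping adds to elapsed time but not to CPU time.
    sum(i * i for i in range(200_000))
    time.sleep(0.2)

elapsed, cpu = measure(sample_task)
print(f"elapsed = {elapsed:.3f} s, CPU = {cpu:.3f} s")
```

The elapsed time should come out at least 0.2 s larger than the CPU time, because the waiting period is counted only by the wall clock.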
CPU Time
• The Unix time command reflects this breakdown; for example, it may report:
90.7u 12.9s 2:39 65%
Interpretation:
• User CPU time is 90.7 s
• System CPU time is 12.9 s
• Elapsed time is 2 min 39 s = 159 s (more than 90.7 + 12.9 = 103.6 s)
• CPU time is 103.6/159 ≈ 65% of total elapsed time
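The interpretation above can be checked with a few lines of arithmetic. This is a sketch only: `parse_time_output` is a hypothetical helper, and the field layout is assumed from the single sample line on the slide (the actual output format of `time` differs between shells).

```python
def parse_time_output(line):
    """Parse a summary such as '90.7u 12.9s 2:39 65%' (layout assumed)."""
    user_field, sys_field, elapsed_field, _percent = line.split()
    user = float(user_field.rstrip("u"))      # user CPU seconds
    system = float(sys_field.rstrip("s"))     # system CPU seconds
    minutes, seconds = elapsed_field.split(":")
    elapsed = int(minutes) * 60 + float(seconds)
    return user, system, elapsed

user, system, elapsed = parse_time_output("90.7u 12.9s 2:39 65%")
cpu = user + system
print(f"CPU time = {cpu:.1f} s, elapsed = {elapsed:.0f} s, "
      f"CPU share = {100 * cpu / elapsed:.0f}%")
```

The CPU share is recomputed from the first three fields rather than read from the last one, which confirms that the reported 65% is (user + system) / elapsed.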
A Definition of Performance
• For some program running on machine X
$$\text{Performance}_X = \frac{1}{\text{Execution\_time}_X}$$
• The machine X is said to be “n times faster” than the machine Y if
$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution\_time}_Y}{\text{Execution\_time}_X} = n$$
• Example: Machine A runs a program in 10
seconds and machine B runs the same
program in 15 seconds, how much faster is
A than B?
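Applying the definition to this example takes one division. A minimal sketch (`speedup` is a hypothetical helper name):

```python
def speedup(time_x, time_y):
    """How many times faster X is than Y: Execution_time_Y / Execution_time_X."""
    return time_y / time_x

n = speedup(10, 15)   # machine A: 10 s, machine B: 15 s
print(n)              # 1.5
```

So machine A is 1.5 times faster than machine B for this program.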
Metrics of Performance
• “Time to execute a program” is the ultimate
metric in determining the performance
• However, it is convenient to inspect other metrics
as well when we examine the details of a machine.
• Computers use a clock that runs at a constant
rate and determines when an event takes place in
hardware.
• These discrete time intervals are called clock
cycles (or ticks, clock ticks, clock periods).
• Clock rate (frequency) is the inverse of clock
period.
Clock Cycles
• Clock “ticks” indicate when to start activities
[Figure: clock signal along a time axis; the start of events is often the rising edge of the clock]
• Instead of reporting execution time in seconds, we often use cycles
$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$
Clock Cycle
• cycle time (CT) = time between ticks = seconds
per cycle
• Cycle Count (CC): the number of clock cycles to
execute a program
• clock rate (frequency) = cycles per second
(1 Hz = 1 cycle/sec)
• A 200 MHz clock has a $1/(200 \cdot 10^6)$ s = ? nanosecond cycle time
• A 4 GHz clock has a $1/(4 \cdot 10^9)$ s = ? nanosecond cycle time
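The two conversions can be worked out as follows. A small sketch (`cycle_time_ns` is a hypothetical helper), using cycle time = 1 / clock rate:

```python
def cycle_time_ns(clock_rate_hz):
    """Cycle time in nanoseconds for a clock rate given in Hz."""
    return 1e9 / clock_rate_hz   # 1 s = 1e9 ns

print(cycle_time_ns(200e6))  # 5.0   (200 MHz)
print(cycle_time_ns(4e9))    # 0.25  (4 GHz)
```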
CPI
• CPI: Clocks Per Instruction
– Number of cycles spent on an instruction on average.
– $CC = IC \times CPI$
– Hard to compute.
– It is useful when comparing the performance of two machines with the same ISA. (Why?)
• Example: two machines with the same ISA. For a certain program we have
– Machine A: CPI = 2.0
– Machine B: CPI = 1.2
– Which machine is faster?
– What if machine A uses a 250 ps and machine B a 500 ps cycle time?
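One way to approach the last question: because both machines share the ISA, they execute the same instruction count, so it is enough to compare the average time per instruction, CPI × cycle time. A sketch (`time_per_instruction_ps` is a hypothetical helper):

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction in picoseconds: CPI x cycle time."""
    return cpi * cycle_time_ps

a = time_per_instruction_ps(2.0, 250)   # machine A
b = time_per_instruction_ps(1.2, 500)   # machine B
print(f"A: {a:.0f} ps, B: {b:.0f} ps, A is {b / a:.2f}x faster")
```

With equal cycle times, B would win on CPI alone (1.2 versus 2.0). With the cycle times given, A needs about 500 ps per instruction against about 600 ps for B, so A is about 1.2 times faster.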
Improving Performance
$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$
So, to improve performance
1. Increase the clock frequency (i.e. decrease the clock period)
2. Reduce the number of clock cycles per program ($IC \times CPI$)
Instruction = Cycle?
• No !
• The number of cycles per instruction depends
on the implementations of the instructions in
hardware
• The number differs for each processor (even
with the same ISA)
The Reason
• Operations take different numbers of cycles
– Multiplication takes longer than addition
– Floating point operations take longer than integer
operations
– The access time to a register is much shorter than
access to the main memory.
Simple Formulae for CPU Time
• CPU execution time =
CPU clock cycles for a program Clock
cycle time (CC CT)
• CPU execution time =
CPU clock cycles for a program/Clock rate
• We can write
CPU clock cycles for a program =
IC CPI
• Then
CPU execution time = (IC CPI)/Clock rate
Example
• Computer A has an 800 MHz clock
– It runs our favorite program in 15 s
• Our goal
– Design computer B with the same ISA
– It will run the same program in 8 s.
• We will use a new technology
– It can increase the clock rate;
– however, it will also increase CPI by a factor of 1.25.
• What clock rate should we aim to use?
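A possible way to work this example uses CPU time = clock cycles / clock rate. The sketch below assumes that "increase CPI by 1.25" means B needs 1.25 times as many cycles as A for the same instruction count; the variable names are illustrative only.

```python
def cpu_time(cycles, clock_rate_hz):
    """CPU execution time in seconds: clock cycles / clock rate."""
    return cycles / clock_rate_hz

clock_a = 800e6                 # computer A: 800 MHz
time_a = 15.0                   # seconds on A
cycles_a = time_a * clock_a     # cycles A spends on the program

# Assumption: same IC (same ISA), CPI grows by a factor of 1.25 on B.
cycles_b = 1.25 * cycles_a
target_time_b = 8.0             # seconds

clock_b = cycles_b / target_time_b
print(f"required clock rate: {clock_b / 1e9:.3f} GHz")  # 1.875 GHz
```

Under that assumption, computer B needs a clock of about 1.875 GHz, roughly 2.3 times the clock rate of A, to obtain a speedup of 15/8 ≈ 1.9.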
Performance
• Performance is determined by execution time
(CPU time)
• We also have other indicators
– # of cycles to execute the program
– # of instructions in the program (IC)
– # of cycles per second
– average # of cycles per instruction (CPI)
– average # of instructions per second
• Common pitfall: thinking one of the variables is
indicative of performance when it really isn’t.
Number of Instructions Example
• A compiler designer has the following two alternatives to generate a certain piece of code with instructions A (1 cycle), B (2 cycles), and C (3 cycles):
1. $2 \times 10^6$ of A, $10^6$ of B, and $2 \times 10^6$ of C ($IC = 5 \times 10^6$)
2. $4 \times 10^6$ of A, $10^6$ of B, and $10^6$ of C ($IC = 6 \times 10^6$)
– Which code sequence is faster?
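Since both sequences run on the same machine, the one with fewer clock cycles is faster. A short sketch of the count (`CYCLES` and `cycle_count` are hypothetical names):

```python
CYCLES = {"A": 1, "B": 2, "C": 3}   # cycles per instruction class

def cycle_count(mix):
    """Total clock cycles for a dict of {class: instruction count}."""
    return sum(count * CYCLES[cls] for cls, count in mix.items())

seq1 = {"A": 2_000_000, "B": 1_000_000, "C": 2_000_000}
seq2 = {"A": 4_000_000, "B": 1_000_000, "C": 1_000_000}

for name, mix in (("sequence 1", seq1), ("sequence 2", seq2)):
    ic = sum(mix.values())
    cc = cycle_count(mix)
    print(f"{name}: IC = {ic}, CC = {cc}, CPI = {cc / ic}")
```

Sequence 2 executes more instructions (6 million against 5 million) but needs 9 million cycles instead of 10 million, so it is the faster one; its CPI is 1.5 against 2.0.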
MIPS
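• MIPS = Millions of Instructions Per Second
$$
\begin{aligned}
\text{MIPS} &= \frac{IC}{\text{Execution\_time} \times 10^6} \\
&= \frac{IC}{\text{\# of clocks} \times \text{cycle time} \times 10^6} \\
&= \frac{IC \times \text{clock rate}}{IC \times CPI \times 10^6} \\
&= \frac{\text{clock rate}}{CPI \times 10^6}
\end{aligned}
$$
• For a fixed IC, a faster machine has a higher MIPS
$$\text{Execution\_time} = \frac{IC}{\text{MIPS} \times 10^6}$$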
A MIPS Example
• A computer with a 500 MHz clock
– Three different classes of instructions:
– A (1 cycle), B (2 cycles), C (3 cycles)
• Two compilers are used to produce code for a large piece of software.
– Compiler 1: 5 billion A, 1 billion B, and 1 billion C instructions.
– Compiler 2: 10 billion A, 1 billion B, and 1 billion C instructions.
• Which sequence will be faster according to
execution time?
• Which sequence will be faster according to MIPS?
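Both questions can be answered with execution time = clock cycles / clock rate and MIPS = IC / (execution time × 10^6). A minimal sketch (`evaluate` and the dictionary names are hypothetical):

```python
CLOCK_RATE = 500e6                  # 500 MHz
CYCLES = {"A": 1, "B": 2, "C": 3}   # cycles per instruction class

def evaluate(mix):
    """Return (execution_time_seconds, mips) for {class: instruction count}."""
    ic = sum(mix.values())
    cc = sum(count * CYCLES[cls] for cls, count in mix.items())
    exec_time = cc / CLOCK_RATE
    mips = ic / (exec_time * 1e6)
    return exec_time, mips

compiler1 = {"A": 5_000_000_000, "B": 1_000_000_000, "C": 1_000_000_000}
compiler2 = {"A": 10_000_000_000, "B": 1_000_000_000, "C": 1_000_000_000}

for name, mix in (("compiler 1", compiler1), ("compiler 2", compiler2)):
    exec_time, mips = evaluate(mix)
    print(f"{name}: {exec_time:.0f} s, {mips:.0f} MIPS")
```

Compiler 1 yields 20 s at 350 MIPS, compiler 2 yields 30 s at 400 MIPS: the code from compiler 2 has the higher MIPS rating yet takes longer to run.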
Problems of MIPS
• MIPS specifies instruction execution rate
• MIPS does not take into account the
capabilities of the instructions
– Thus, it is impossible to compare computers with different ISAs using MIPS.
• MIPS is not constant even on a single machine; it depends on the application.
• As we saw in the previous example, MIPS can
vary inversely with performance.
CPI example
• CPI
– Machine A: CPI = 10/7 = 1.43
– Machine B: CPI = 15/12 = 1.25
• CPU time
– CPU time = $(IC \times CPI)$ / clock rate
– Let us assume both machines use a 200 MHz clock
Overview
• A given program will require
1. Some number of instructions
2. Some number of clock cycles
3. Some number of seconds
• Vocabulary
– Cycle time: (micro or nano) seconds per cycle
– Clock rate (frequency): cycles per second
– CPI: clocks per instruction
– MIPS: millions of instructions per second
– MFLOPS: millions of floating point operations per second
Performance
• Performance is ultimately determined by
execution time
• Is any of the following metrics a good measure of performance by itself? Why?
– # of cycles to execute a program
– # of instructions in a program
– # of cycles per second
– Average # of cycles per instruction
– Average # of instructions per second
Question
• Assuming two machines have the same ISA,
which of the following quantities are identical?
– Clock rate
– CPI
– Execution time
– # of instructions
– MIPS
Program Performance
HW or SW component      Affects what?          How?
Algorithm               IC, possibly CPI
Programming language    IC, CPI
Compiler                IC, CPI
ISA                     IC, clock rate, CPI
Benchmarks
• Programs specifically chosen to measure
performance
– must reflect typical workload of the user
• Benchmark types
– Real applications
– Small benchmarks
– Benchmark suites
– Synthetic benchmarks
Real Applications
• Workload: Set of programs a typical user
runs day in and day out.
• To use these real applications for metrics is a
direct way of comparing the execution time
of the workload on two machines.
• Using real applications for metrics has
certain restrictions:
– They are usually big
– They take time to port to different machines
– They take considerable time to execute
– It is hard to observe the outcome of a certain improvement technique
Comparing & Summarizing Performance
              Computer A   Computer B
Program 1     1 s          100 s
Program 2     1000 s       100 s
Total time    1001 s       200 s
• A is 100 times faster than B for program 1
• B is 10 times faster than A for program 2
• For total performance, the arithmetic mean is used:
$$AM = \frac{1}{n} \sum_{i=1}^{n} \text{Time}_i$$
Arithmetic Mean
• If the programs in the workload do not run equally often, then we have to use the weighted arithmetic mean
$$AM = \frac{1}{n} \sum_{i=1}^{n} w_i \, \text{Time}_i$$
• Suppose that the program 1 runs 10 times as
often as the program 2. Which machine is
faster?
                       weight   Computer A   Computer B
Program 1 (seconds)    10       1            100
Program 2 (seconds)    1        1000         100
Weighted AM            -        ?            ?
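The missing table entries can be filled in with a short calculation. The sketch below follows the slide's formula, a sum of weight × time divided by the number of programs n; `weighted_am` is a hypothetical helper, and weights of 1 reduce it to the plain arithmetic mean.

```python
def weighted_am(times, weights):
    """(1/n) * sum(w_i * Time_i), following the formula on the slide."""
    n = len(times)
    return sum(w * t for w, t in zip(weights, times)) / n

times_a = [1, 1000]    # computer A: program 1, program 2 (seconds)
times_b = [100, 100]   # computer B: program 1, program 2 (seconds)

# Plain arithmetic mean (all weights equal to 1)
print(weighted_am(times_a, [1, 1]), weighted_am(times_b, [1, 1]))     # 500.5 100.0

# Program 1 runs 10 times as often as program 2
print(weighted_am(times_a, [10, 1]), weighted_am(times_b, [10, 1]))   # 505.0 550.0
```

With equal weights B looks about five times faster, but once program 1 is weighted 10 to 1, A comes out ahead (505 against 550). Dividing by the sum of the weights instead of n would scale both results by the same constant and leave the ranking unchanged.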