Performance
Computer Architecture – CS401
Erkay Savas
Sabanci University
7/17/2015
Erkay Savas
1
Performance
• What is performance?
• How do we measure performance?
• Performance metrics
• Performance evaluation
• Why does some hardware perform better than
other hardware for different programs?
• What factors in hardware are related to
overall system performance?
• How does the machine's instruction set
affect performance?
Airplane Analogy
• Which of these airplanes has the best
performance?
Airplane           Passenger   Range     Speed    Passenger throughput
                   capacity    (miles)   (mph)    (passengers × mph)
Boeing 777         375         4630      610      228750
Boeing 747         470         4150      610      286700
Airbus A3xx        656         8400      600      393600
Concorde           132         4000      1350     178200
Douglas DC-8-50    146         8720      544      79424
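The throughput column is simply capacity multiplied by speed, so the "best" airplane depends on which metric is chosen. A minimal Python sketch using the figures from the table (the dictionary layout and the helper name `best_by` are illustrative assumptions, not part of the slides):

```python
# Figures from the airplane table; the data layout is illustrative.
airplanes = {
    "Boeing 777":      {"capacity": 375, "range_miles": 4630, "speed_mph": 610},
    "Boeing 747":      {"capacity": 470, "range_miles": 4150, "speed_mph": 610},
    "Airbus A3xx":     {"capacity": 656, "range_miles": 8400, "speed_mph": 600},
    "Concorde":        {"capacity": 132, "range_miles": 4000, "speed_mph": 1350},
    "Douglas DC-8-50": {"capacity": 146, "range_miles": 8720, "speed_mph": 544},
}

# Passenger throughput = capacity x cruising speed (passengers x mph).
for specs in airplanes.values():
    specs["throughput"] = specs["capacity"] * specs["speed_mph"]


def best_by(metric):
    """Return the name of the airplane with the largest value of `metric`."""
    return max(airplanes, key=lambda name: airplanes[name][metric])


for metric in ("capacity", "range_miles", "speed_mph", "throughput"):
    print(f"best {metric}: {best_by(metric)}")
```

Concorde wins on speed, the DC-8-50 on range, and the Airbus A3xx on both capacity and throughput, which mirrors the latency versus throughput distinction for computers.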
Computer Performance
• Response time (latency)
– How long does it take for my job to run?
– How long does it take to execute a program?
– How long must I wait for a database query?
• Throughput
– How many jobs can the machine run at once?
– What is the average execution rate?
– How much work is getting done?
• If we upgrade a machine with a new processor
what do we increase?
• If we add a new machine what do we increase?
Which Time to Measure?
• Elapsed Time (Wall clock time, response time)
– Counts everything (disk and memory access, I/O,
operating system overhead, work on other
processes)
– Useful but not always good for comparison purposes
• CPU (execution) time
– The time the CPU spends computing for the user task
– Does not include time spent waiting for I/O or running
other programs
– User CPU time: CPU time spent within the program
– System CPU time: CPU time spent in the operating
system performing tasks on behalf of the program
CPU Time
• Unix time command reflects this breakdown by
returning the following when prompted:
90.7u 12.9s 2:39 65%
Interpretation:
• User CPU time is 90.7 s
• System CPU time is 12.9 s
• Elapsed time is 2 min 39 s = 159 s (more than the
90.7 + 12.9 = 103.6 s of CPU time)
• CPU time is 65% of total elapsed time: 103.6/159 ≈ 0.65
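The interpretation above can be checked with a few lines of Python. This is a minimal sketch that assumes exactly the four-field format shown on the slide (other shells print the timing in different formats); the helper `elapsed_to_seconds` is a hypothetical name introduced for illustration.

```python
def elapsed_to_seconds(text):
    """Convert an elapsed-time field such as '2:39' (minutes:seconds) to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


sample = "90.7u 12.9s 2:39 65%"
user_field, system_field, elapsed_field, percent_field = sample.split()

user_s = float(user_field.rstrip("u"))      # user CPU time in seconds
system_s = float(system_field.rstrip("s"))  # system CPU time in seconds
elapsed_s = elapsed_to_seconds(elapsed_field)

cpu_s = user_s + system_s           # total CPU time
cpu_fraction = cpu_s / elapsed_s    # share of elapsed time spent on the CPU

print(f"CPU time     : {cpu_s:.1f} s")
print(f"Elapsed time : {elapsed_s:.0f} s")
print(f"CPU fraction : {cpu_fraction:.0%}")
```

The remaining third of the elapsed time was spent waiting for I/O or running other programs.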
A Definition of Performance
• For some program running on machine X
Performance_X = 1 / Execution_time_X
• Machine X is said to be “n times
faster” than machine Y if
Performance_X / Performance_Y = Execution_time_Y / Execution_time_X = n
• Example: Machine A runs a program in 10
seconds and machine B runs the same
program in 15 seconds. How much faster is
A than B?
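The example can be worked directly from the definition. A minimal sketch (the function names are illustrative, not from the slides):

```python
def performance(execution_time_s):
    """Performance is defined as the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(time_x_s, time_y_s):
    """Return n such that machine X is n times faster than machine Y."""
    return performance(time_x_s) / performance(time_y_s)


time_a_s = 10.0  # machine A
time_b_s = 15.0  # machine B

n = times_faster(time_a_s, time_b_s)
print(f"A is {n:.2f} times faster than B")
```

The ratio of performances equals the inverse ratio of execution times, 15/10, so A is 1.5 times faster than B.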
Metrics of Performance
• “Time to execute a program” is the ultimate
metric in determining the performance
• However, it is convenient to inspect other metrics
as well when we examine the details of a machine.
• Computers use a clock that runs at a constant
rate and determines when an event takes place in
hardware.
• These discrete time intervals are called clock
cycles (or ticks, clock ticks, clock periods).
• Clock rate (frequency) is the inverse of clock
period.
Clock Cycles
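• Clock “ticks” indicate when to start activities
[Figure: clock signal along a time axis; events often start on the rising edge of the clock]
• Instead of reporting execution time in seconds,
we often use cycles:
$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$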
Clock Cycle
• cycle time (CT) = time between ticks = seconds
per cycle
• Cycle Count (CC): the number of clock cycles to
execute a program
• clock rate (frequency) = cycles per second
(1 Hz = 1 cycle/sec)
• A 200 MHz clock has a 1/(200 × 10^6) s = ?
nanosecond cycle time
• A 4 GHz clock has a 1/(4 × 10^9) s = ? nanosecond
cycle time
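The two questions above reduce to inverting the clock rate and converting seconds to nanoseconds. A minimal sketch (the helper name `cycle_time_ns` is illustrative):

```python
def cycle_time_ns(clock_rate_hz):
    """Cycle time in nanoseconds: the inverse of the clock rate in Hz."""
    return 1e9 / clock_rate_hz


for label, rate_hz in (("200 MHz", 200e6), ("4 GHz", 4e9)):
    print(f"{label}: {cycle_time_ns(rate_hz)} ns cycle time")
```

A 200 MHz clock has a 5 ns cycle time and a 4 GHz clock has a 0.25 ns cycle time.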
CPI
• CPI: Clocks Per Instruction
– Number of cycles spent on an instruction on average.
– CC = IC × CPI
– Hard to compute.
– It is useful when comparing the performance of two
machines with the same ISA. (Why?)
• Example: two machines with the same ISA. For
a certain program we have
– Machine A: CPI = 2.0
– Machine B: CPI = 1.2
– Which machine is faster?
– What if machine A uses a 250 ps cycle time and
machine B a 500 ps cycle time?
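Because both machines share an ISA and run the same program, the instruction count is identical and cancels out, so the comparison comes down to average time per instruction (CPI × cycle time). A minimal sketch of both questions, with illustrative names:

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction in picoseconds: CPI x cycle time."""
    return cpi * cycle_time_ps


cpi_a, cpi_b = 2.0, 1.2

# Case 1: equal cycle times, so only CPI matters and B wins.
b_over_a_same_clock = cpi_a / cpi_b

# Case 2: A has a 250 ps cycle, B has a 500 ps cycle.
a_ps = time_per_instruction_ps(cpi_a, 250)
b_ps = time_per_instruction_ps(cpi_b, 500)
a_over_b = b_ps / a_ps

print(f"Same clock: B is {b_over_a_same_clock:.2f} times faster than A")
print(f"A: {a_ps:.0f} ps/instr, B: {b_ps:.0f} ps/instr")
print(f"Different clocks: A is {a_over_b:.2f} times faster than B")
```

With equal cycle times B is about 1.67 times faster; with the stated cycle times A needs 500 ps per instruction against 600 ps for B, so A is 1.2 times faster. CPI alone does not decide the comparison.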
Improving Performance
$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$
So, to improve performance
1. Increase the clock frequency (i.e. decrease
the clock period)
2. Reduce the number of clock cycles per
program (IC × CPI)
Instruction = Cycle?
• No!
• The number of cycles per instruction depends
on the implementations of the instructions in
hardware
• The number differs for each processor (even
with the same ISA)
The Reason
• Operations take different numbers of cycles
– Multiplication takes longer than addition
– Floating point operations take longer than integer
operations
– The access time to a register is much shorter than
access to the main memory.
Simple Formulae for CPU Time
• CPU execution time =
CPU clock cycles for a program × Clock
cycle time (CC × CT)
• CPU execution time =
CPU clock cycles for a program / Clock rate
• We can write
CPU clock cycles for a program =
IC × CPI
• Then
CPU execution time = (IC × CPI) / Clock rate
Example
• Computer A has an 800 MHz clock
– It runs our favorite program in 15 s
• Our goal
– Design computer B with the same ISA
– It will run the same program in 8 s.
• We will use a new technology
– It can increase the clock rate;
– however, it will also increase CPI by a factor of 1.25.
• What clock rate should we aim to use?
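One way to work this example is to apply CPU time = clock cycles / clock rate twice. The sketch below assumes the CPI penalty is multiplicative (1.25 times the CPI of A), since an additive change could not be evaluated without knowing A's CPI; the variable names are illustrative.

```python
clock_rate_a_hz = 800e6   # computer A: 800 MHz
time_a_s = 15.0           # A runs the program in 15 s
time_b_target_s = 8.0     # B should run it in 8 s
cpi_factor = 1.25         # assumption: B's CPI is 1.25 times A's CPI

# Cycles used by A: CC = CPU time x clock rate.
cycles_a = time_a_s * clock_rate_a_hz

# Same ISA and program, so IC is unchanged; cycles scale with CPI.
cycles_b = cpi_factor * cycles_a

# Clock rate B needs to finish cycles_b within the target time.
clock_rate_b_hz = cycles_b / time_b_target_s

print(f"Cycles on A: {cycles_a:.3e}")
print(f"Cycles on B: {cycles_b:.3e}")
print(f"Required clock rate for B: {clock_rate_b_hz / 1e9:.3f} GHz")
```

Under that assumption B needs 15 × 10^9 cycles in 8 s, that is a clock rate of 1.875 GHz, roughly 2.34 times the clock rate of A.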
Performance
• Performance is determined by execution time
(CPU time)
• We also have other indicators
– # of cycles to execute program
– # of instructions in program (IC)
– # of cycles per second
– average # of cycles per instruction (CPI)
– average # of instructions per second
• Common pitfall: thinking one of the variables is
indicative of performance when it really isn’t.
Number of Instructions Example
• A compiler designer has the following two
alternatives to generate a certain piece of
code with instructions
A (1 cycle), B (2 cycles), and C (3 cycles):
1. 2 × 10^6 of A, 10^6 of B, and 2 × 10^6 of C
(IC = 5 × 10^6)
2. 4 × 10^6 of A, 10^6 of B, and 10^6 of C
(IC = 6 × 10^6)
– Which code sequence is faster?
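Counting cycles rather than instructions answers the question. A minimal sketch, assuming both sequences run on the same machine at the same clock rate (the names `CYCLES` and `summarize` are illustrative):

```python
# Cycles needed by each instruction class.
CYCLES = {"A": 1, "B": 2, "C": 3}

# Instruction mix of each code sequence.
sequences = {
    1: {"A": 2_000_000, "B": 1_000_000, "C": 2_000_000},
    2: {"A": 4_000_000, "B": 1_000_000, "C": 1_000_000},
}


def summarize(mix):
    """Return (instruction count, cycle count, average CPI) for a mix."""
    ic = sum(mix.values())
    cc = sum(count * CYCLES[cls] for cls, count in mix.items())
    return ic, cc, cc / ic


for number, mix in sequences.items():
    ic, cc, cpi = summarize(mix)
    print(f"Sequence {number}: IC = {ic:,}  CC = {cc:,}  CPI = {cpi:.2f}")
```

Sequence 2 executes more instructions (6 × 10^6 against 5 × 10^6) but needs fewer cycles (9 × 10^6 against 10 × 10^6), so it is the faster one. Instruction count alone is misleading.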
MIPS
• MIPS = Million Instructions Per Second
MIPS = IC / (Execution_time × 10^6)
     = IC / (# of clocks × cycle time × 10^6)
     = (IC × clock rate) / (IC × CPI × 10^6)
     = clock rate / (CPI × 10^6)
• For a fixed instruction count, a faster machine has a higher MIPS
Execution_time = IC / (MIPS × 10^6)
A MIPS Example
• A computer with a 500 MHz clock
– Three different classes of instructions:
A (1 cycle), B (2 cycles), C (3 cycles)
• Two compilers are used to produce code for a large
piece of software.
– Compiler 1: 5 billion A, 1 billion B, and 1 billion C instructions.
– Compiler 2: 10 billion A, 1 billion B, and 1 billion C instructions.
• Which sequence will be faster according to
execution time?
• Which sequence will be faster according to MIPS?
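Both questions can be answered by computing execution time and MIPS for each compiler's output. A minimal sketch with illustrative names (`evaluate`, `CLASS_CYCLES`), using MIPS = IC / (execution time × 10^6):

```python
CLOCK_RATE_HZ = 500_000_000          # 500 MHz
CLASS_CYCLES = {"A": 1, "B": 2, "C": 3}

BILLION = 1_000_000_000
compilers = {
    "Compiler 1": {"A": 5 * BILLION, "B": 1 * BILLION, "C": 1 * BILLION},
    "Compiler 2": {"A": 10 * BILLION, "B": 1 * BILLION, "C": 1 * BILLION},
}


def evaluate(mix):
    """Return (execution time in seconds, MIPS rating) for an instruction mix."""
    ic = sum(mix.values())
    cycles = sum(count * CLASS_CYCLES[cls] for cls, count in mix.items())
    time_s = cycles / CLOCK_RATE_HZ
    mips = ic / (time_s * 1e6)
    return time_s, mips


for name, mix in compilers.items():
    time_s, mips = evaluate(mix)
    print(f"{name}: {time_s:.0f} s, {mips:.0f} MIPS")
```

Compiler 1 finishes in 20 s at 350 MIPS, while Compiler 2 takes 30 s at 400 MIPS. The code with the higher MIPS rating is the slower one.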
Problems of MIPS
• MIPS specifies instruction execution rate
• MIPS does not take into account the
capabilities of the instructions
– Thus, it is impossible to compare computers with
different ISAs using MIPS.
• MIPS is not constant even on a single machine;
it depends on the application.
• As we saw in the previous example, MIPS can
vary inversely with performance.
CPI example
• CPI
– Machine A: CPI = 10/7 = 1.43
– Machine B: CPI = 15/12 = 1.25
• CPU time
– CPU time = (IC × CPI) / clock rate
– Let us assume both machines use a 200 MHz clock
Overview
• A given program will require
1. Some number of instructions
2. Some number of clock cycles
3. Some number of seconds
• Vocabulary
– Cycle time: (micro or nano) seconds per cycle
– Clock rate (frequency): cycles per second
– CPI: clock cycles per instruction
– MIPS: millions of instructions per second
– MFLOPS: millions of floating point operations per
second
Performance
• Performance is ultimately determined by
execution time
• Is any of the following metrics a good measure of
performance by itself? Why?
– # of cycles to execute a program
– # of instructions in a program
– # of cycles per second
– Average # of cycles per instruction
– Average # of instructions per second
Question
• Assuming two machines have the same ISA,
which of the following quantities are identical?
– Clock rate
– CPI
– Execution time
– # of instructions
– MIPS
Program Performance
HW or SW component      Affects what?            How?
Algorithm               IC, possibly CPI
Programming language    IC, CPI
Compiler                IC, CPI
ISA                     IC, clock rate, CPI
Benchmarks
• Programs specifically chosen to measure
performance
– Must reflect the typical workload of the user
• Benchmark types
– Real applications
– Small benchmarks
– Benchmark suites
– Synthetic benchmarks
Real Applications
• Workload: the set of programs a typical user
runs day in and day out.
• Using these real applications as metrics is a
direct way of comparing the execution time
of the workload on two machines.
• Using real applications as metrics has
certain restrictions:
– They are usually big
– It takes time to port them to different machines
– They take considerable time to execute
– It is hard to observe the effect of a particular
improvement technique
Comparing & Summarizing Performance
              Computer A    Computer B
Program 1     1 s           100 s
Program 2     1000 s        100 s
Total time    1001 s        200 s
• A is 100 times faster than B for program 1
• B is 10 times faster than A for program 2
• For total performance, arithmetic mean is used:
$$AM = \frac{1}{n} \sum_{i=1}^{n} \text{Time}_i$$
Arithmetic Mean
• If the programs in the workload do not run
equally often, then we have to use a weighted
arithmetic mean
$$AM = \frac{1}{n} \sum_{i=1}^{n} w_i \times \text{Time}_i$$
• Suppose that program 1 runs 10 times as
often as program 2. Which machine is
faster?
              Program 1 (seconds)    Program 2 (seconds)    Weighted AM
weight        10                     1                      -
Computer A    1                      1000                   ?
Computer B    100                    100                    ?
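The table can be filled in with a short calculation. The sketch below normalizes by the sum of the weights, which is an assumption made here so that the result is a per-run average; dividing by a different constant (such as n) changes the numbers but not which machine comes out ahead. The names are illustrative.

```python
weights = [10, 1]  # program 1 runs 10 times as often as program 2

# Execution times in seconds for Program 1 and Program 2.
times_s = {
    "Computer A": [1, 1000],
    "Computer B": [100, 100],
}


def weighted_mean(values, weights):
    """Weighted arithmetic mean, normalized by the sum of the weights."""
    total = sum(w * v for w, v in zip(weights, values))
    return total / sum(weights)


for machine, values in times_s.items():
    print(f"{machine}: weighted AM = {weighted_mean(values, weights):.1f} s")
```

Computer A comes out at about 91.8 s and Computer B at 100 s, so under this workload A is slightly faster, the opposite of the unweighted comparison.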