Computer Architecture
Dr. R. Venkatesan
Fall 2005
PREREQUISITES
• Digital Logic: basic building blocks, design
• Computer programming: object-oriented
• Computer Organization: Microprocessors
• Basic Instruction Set: Assembly Language
• Computer Interfacing: Microprocessors
• Computer Design: Digital Systems Design
• HDL: concurrency, delay: VHDL
• HLL compilers
ENGR 6861 Fall 2005
R. Venkatesan, Memorial University
2
What makes a better architecture
• Higher performance: speed, throughput
• Elegance: symmetry, simplicity, orthogonality
• Flexibility: scalability – upwards/downwards
• Power-efficiency
• Low cost: mostly depends on the above factors
However, a sleeker architecture need not be the most popular architecture, as marketing skills and market lead are perhaps the two most important factors in achieving popularity or business success. Having said that, we should remember that even market leaders have to improve their architecture in order to retain their popularity and success.
Computers and Processors
• Old classification of computers: micro, mini, mainframe, super
• Classification based on use: general-purpose, servers, embedded systems, special-purpose: DSP; numerical coprocessors for division, convolution, FFT, etc.; graphic/video processors; audio/speech processors; data processors; communication processors including network processors, security processors, codecs for error control; and so on
• Cluster processors; distributed computers; shared-memory multiprocessors; supercomputers; array processors including systolic arrays
Enabling Technologies
• IC technology: CMOS feature size, integration limits, Moore's Law
• Memory (DRAM) technology: memory size, memory cost, memory speed: lags behind processor speed, and the gap is getting progressively wider
• Mass/secondary storage technology: portability
• Network technology: LAN, MAN, WAN, Ethernet, ATM, Internet, Bluetooth, wireless technologies, WiFi, WiMax, 2G, 2.5G, 3G, 4G, 5G, UWB, ad hoc and sensor networks
Measuring Performance
• Performance is inversely related to the execution time of the application
• Possible measures: wall-clock time, response time, CPU time for the user application (with or without the system-call times for the application), processor clock speed, and metrics such as MIPS, GFLOPS, TOPS, MPolygons/s and kTrans/s
• Measuring the actual CPU time to execute the application on the target computer would be the best measure, but is it always possible?
• Benchmarks and benchmark suites: compare the performance with a "standard" computer/processor
Processor Benchmarks
• Real applications: e.g. GCC, MS-Word, LaTeX
• Modified (scripted) applications: to exercise a particular aspect of the processor, like multiuser access
• Kernels: small repeated code; e.g. Livermore loops
• Toy Benchmarks: Puzzle, QuickSort, Sieve of Eratosthenes
• Synthetic Benchmarks: Whetstone (numeric), Dhrystone (integer), Dhampstone
• Benchmark Suites: combinations of the above for a selected focus: SPECCPU2000, SPECint1997, SPECfp1992, SPECWeb, SPECSFS, TPC-C, etc.
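One of the toy benchmarks named above, the Sieve of Eratosthenes, is small enough to sketch and time directly. A minimal Python version; the timing harness and the problem size of one million are illustrative choices, not from the slides:

```python
import time

def sieve(n):
    # Sieve of Eratosthenes: count the primes below n.
    if n < 2:
        return 0
    flags = bytearray([1]) * n
    flags[0] = flags[1] = 0
    i = 2
    while i * i < n:
        if flags[i]:
            # Cross out every multiple of i starting at i*i.
            flags[i * i::i] = bytes(len(range(i * i, n, i)))
        i += 1
    return sum(flags)

start = time.perf_counter()
count = sieve(1_000_000)
elapsed = time.perf_counter() - start
print(f"{count} primes below 1e6 in {elapsed:.3f} s")  # 78498 primes
```

As a benchmark this measures one narrow mix of integer and memory operations, which is exactly the weakness of toy benchmarks the slide is pointing at.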
Performance comparison – example 1
                      COMPUTERS               Weightings
                    A      B      C      W(1)   W(2)   W(3)
Program P1 (s)      1     10     20     0.999  0.909   0.50
Program P2 (s)   1000    100     20     0.001  0.091   0.50
Total time (s)   1001    110     40

Arith. Mean W(1)    2.00   10.09   20.00
Arith. Mean W(2)   91.91   18.19   20.00
Arith. Mean W(3)  500.50   55.00   20.00

(W(1) equalizes the two programs' weighted times on A, W(2) equalizes them on B, and W(3) weights both programs equally.)
Example contd..
Execution times normalized to each machine:

                   Normalized to A        Normalized to B        Normalized to C
                    A      B      C       A      B      C        A      B      C
Program P1         1.0   10.0   20.0     0.1    1.0    2.0      0.05   0.5    1.0
Program P2         1.0    0.1    0.02   10.0    1.0    0.2     50.0    5.0    1.0
Arithmetic Mean    1.0    5.05  10.01    5.05   1.0    1.1     25.03   2.75   1.0
Geometric Mean     1.0    1.0    0.63    1.0    1.0    0.63     1.58   1.58   1.0
Total Time         1.0    0.11   0.04    9.1    1.0    0.36    25.03   2.75   1.0

P1: A = 1 s, B = 10 s, C = 20 s; P2: A = 1000 s, B = 100 s, C = 20 s (reproduced here for ease)
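The weighted and normalized means in these tables can be recomputed directly. A short Python sketch (the helper names are mine, not from the slides) that reproduces, for example, the W(1) weighted mean of 2.00 on A and the geometric mean of 0.63 for C normalized to A:

```python
# Execution times in seconds: (P1, P2) on machines A, B, C.
times = {"A": (1, 1000), "B": (10, 100), "C": (20, 20)}

def weighted_mean(t, w):
    # Weighted arithmetic mean of the two program times.
    return w[0] * t[0] + w[1] * t[1]

def geometric_mean(ratios):
    prod = 1.0
    for r in ratios:
        prod *= r
    return prod ** (1.0 / len(ratios))

# W(1) equalizes the weighted times of P1 and P2 on machine A.
w1 = (1000 / 1001, 1 / 1001)
print(round(weighted_mean(times["A"], w1), 2))   # 2.0
print(round(weighted_mean(times["B"], w1), 2))   # 10.09

# C's times normalized to A, and their geometric mean.
norm_c = (times["C"][0] / times["A"][0], times["C"][1] / times["A"][1])
print(round(geometric_mean(norm_c), 2))          # 0.63
```

Note how the geometric mean gives the same machine ranking whichever machine is used as the reference, which is exactly why the tables show it staying consistent while the arithmetic mean flips.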
Improving performance of computer
• Use faster material: silicon, GaAs, InP
• Use faster technology: photochemical lithography
• Employ better architecture within one processor
– Selection of instruction set: RISC/CISC, VLIW
– Cache (levels of cache): higher throughput
– Virtual memory: relocatability, security
– Pipelining: k stages gives a maximum speedup of k
• Superpipelining,
• Superscalar (multiple pipelines) with dynamic scheduling
– Branch prediction
• Use multiple processors: emphasis of this course
– Scalability, level of parallelism
– Shared memory, array processing, multicomputers, MPP
• Employ better software: compilers, etc.
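The claim above that k pipeline stages give a maximum speedup of k can be checked with the usual ideal-pipeline cycle count: n instructions take k + n − 1 cycles instead of k·n. A minimal Python sketch (the function name is mine; stalls and hazards are ignored, so this is the upper bound only):

```python
def pipeline_speedup(k, n):
    # Ideal k-stage pipeline, no stalls: n instructions finish in
    # k + n - 1 cycles instead of k * n unpipelined cycles.
    return (k * n) / (k + n - 1)

print(pipeline_speedup(5, 5))          # 25/9, far below 5 for short runs
print(pipeline_speedup(5, 1_000_000))  # approaches the k-fold limit of 5
```

The speedup only approaches k for long instruction streams, since the pipeline fill and drain cycles are amortized away.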
Speedup
• Any (architectural) enhancement will hopefully lead to better performance, and speedup is a measure of this improvement.
• Performance improvement should be based on the total CPU time taken to execute the application, and not just any of the component times, like memory access time or clock period.
• If the whole processor is replicated, then the fraction enhanced is 100%, as the whole computation will be impacted.
• If an enhancement affects only a part of the computation, then we need to determine the fraction of the CPU time impacted by the enhancement.
Amdahl’s Law
• The following simple but important law tells us that we should always aim at making enhancements that affect a large fraction of the computation, if not the whole computation.

Speedup = (performance for the entire task using the enhancement when possible) / (performance for the entire task without using the enhancement)

Speedup = 1 / ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
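The law above is simple enough to compute directly. A minimal Python sketch (the function name is mine); it shows, for instance, that speeding up half the computation tenfold yields well under a 2x overall gain:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    # Overall speedup from Amdahl's Law as stated above.
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)

print(round(amdahl_speedup(0.5, 10), 3))  # 1.818: half the work, 10x faster
print(round(amdahl_speedup(0.2, 40), 3))  # 1.242: 40x on only 20% of the work
```

The unenhanced fraction dominates: as speedup_enhanced grows without bound, the overall speedup is capped at 1 / (1 − Fraction_enhanced).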
CPU (Computation) time
• CPU time is the product of three quantities:
  – Number of instructions executed, or Instruction Count (IC): remember this is not the code (program) size
  – Average number of clock cycles per instruction (CPI): if CPI varies for different instructions, a weighted average is needed
  – Clock period (τ)
• CPU time = IC * CPI * τ
• An architectural (or compiler-based) enhancement that is aimed at decreasing one of the above three factors might end up increasing one or both of the other two. It is the product of the three quantities after applying the enhancement that gives us the new CPU time.
CPU Performance Equation
CPU Time = IC * CPIavg * τ
CPU Time = CPU clock cycles for a program * τ
CPU Time = CPU clock cycles for a program / f
CPIavg = CPU clock cycles for a program / IC
CPU Time = IC * CPIavg / f
(# inst./pgm.) * (clk. cycles/instn.) * (secs./clk. cycle) = secs./pgm. = CPU time
MIPS = IC / (execution time * 10^6) = f / (CPI * 10^6)
Execution time = IC / (MIPS * 10^6)
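These identities are easy to exercise numerically. A small Python sketch; the instruction count, CPI and clock rate below are made-up illustrative values, not from the slides:

```python
def cpu_time(ic, cpi, f):
    # CPU time = IC * CPI / f  (equivalently IC * CPI * tau).
    return ic * cpi / f

def mips(ic, exec_time):
    # MIPS = IC / (execution time * 10^6).
    return ic / (exec_time * 1e6)

t = cpu_time(2e9, 1.5, 1e9)    # 2e9 instructions, CPI 1.5, 1 GHz clock
print(t)                       # 3.0 seconds
print(round(mips(2e9, t), 1))  # 666.7, which equals f / (CPI * 10^6)
```

The last line confirms the slide's second identity for MIPS: with these numbers, f / (CPI * 10^6) = 10^9 / (1.5 * 10^6) gives the same 666.7.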
Speedup example
• Three enhancements for different parts of the computation are contemplated, with speedups of 40, 20 and 5, respectively. E1 improves 20%, E2 improves 30% and E3 improves 70% of the computation. Assuming all three cost the same, which is the best choice?
• Speedup due to E1 = 1 / ((1 − 0.2) + 0.2/40) = 1.242
• Speedup due to E2 = 1 / ((1 − 0.3) + 0.3/20) = 1.399
• Speedup due to E3 = 1 / ((1 − 0.7) + 0.7/5) = 2.273
• So a higher fraction enhanced is more beneficial than a huge speedup for a small fraction.
• So the frequency of execution of different instructions becomes important – statistics.