CSCE430/830 Computer Architecture
Fundamentals of Computer Design
Instructor: Hong Jiang
Courtesy of Prof. Yifeng Zhu @ U. of Maine
Fall, 2007
Portions of these slides are derived from:
Dave Patterson © UCB
Slide 1
Motivations and Introduction
• Phenomenal growth in the computer industry/technology: performance doubling roughly every 18 months over the past 20 years, yielding multi-GFLOPS processors, largely due to
– Micro-electronics technology
– Computer design innovations
• We have come a long way in the short time of 60 years since the first general-purpose computer in 1946:
Slide 2
Motivations and Introduction
Past (Milestones):
– First electronic computer, ENIAC, in 1946: 18,000 vacuum tubes, 3,000 cubic feet, 20 2-foot 10-digit registers, 5 KIPS (thousand additions per second);
– First microprocessor (a CPU on a single IC chip), the Intel 4004, in 1971: 2,300 transistors, 60 KIPS, $200;
– Virtual elimination of assembly-language programming reduced the need for object-code compatibility;
– The creation of standardized, vendor-independent operating systems, such as UNIX and its clone Linux, lowered the cost and risk of bringing out a new architecture;
– RISC instruction set architectures paved the way for drastic design innovations that focused on two critical performance techniques: instruction-level parallelism and the use of caches.
Slide 3
Motivations and Introduction
Present (State of the art):
– Microprocessors approaching/surpassing 10 GFLOPS;
– A high-end microprocessor (<$10K) today is easily more powerful than a supercomputer (>$10 million) of ten years ago;
– While technology advancement contributes a sustained annual performance growth of 35%, innovative computer design accounts for another 25% annual growth rate → a factor of 15 in performance gains!
Slide 4
Technology Trend
In reality:
Big Fish Eating Little Fish
Slide 5
Technology Trend
1988 Computer Food Chain:
Mainframe
Supercomputer
Minisupercomputer
Minicomputer
Workstation
PC
Massively Parallel Processors
Slide 6
Technology Trend
1998 Computer Food Chain:
Mainframe
Server
Supercomputer
Workstation
PC
Clusters
Minisupercomputer
Minicomputer
Now who is eating whom?
Slide 7
Supercomputer Trends in Top 500
Parallel Computing Architectures in Top 500
SIMD, Single processor, Cluster, Constellations, SMP, MPP
(www.top500.org, Nov. 2004)
[Figure: Symmetric Multiprocessing (SMP): CPUs sharing a common MEMORY over a BUS/CROSSBAR; Massively Parallel Processor (MPP): CPU-plus-memory (M) nodes connected by a network; cluster: PCs connected by a network]
Slide 8
Why Such Changes in 10 Years?
• Performance
– Technology Advances
» CMOS VLSI dominates older technologies (TTL, ECL)
in cost AND performance
– Computer architecture advances improve the low-end
» RISC, superscalar, RAID, …
• Price: lower costs due to …
– Simpler development
» CMOS VLSI: smaller systems, fewer components
– Higher volumes
» CMOS VLSI: same development cost spread over 10,000 vs. 10,000,000 units
– Lower margins by class of computer, due to fewer services
• Function
– Rise of networking/local interconnection technology
Slide 9
Amazing Underlying Technology Change
• In 1965, Gordon Moore
sketched out his prediction of
the pace of silicon technology.
• Moore's Law: The number
of transistors incorporated in
a chip will approximately
double every 24 months.
• Decades later, Moore's Law
remains true.
From Intel
Slide 10
Technology Trends: Moore’s Law
• Gordon Moore (Founder of Intel) observed in 1965 that the
number of transistors on a chip doubles about every 24 months.
• In fact, the number of transistors on a chip doubles about
every 18 months.
From Intel
Slide 11
Technology Trends
In terms of speed, the CPU has improved dramatically, but memory and disk have improved only a little. This has led to dramatic changes in architecture, operating systems, and programming practices.
Slide 12
Technology → dramatic change
• Processor
– transistor number in a chip: about 55% per year
– clock rate: about 20% per year
• Memory
– DRAM capacity: about 60% per year (4x every 3 years)
– Memory speed: about 10% per year
– Cost per bit: improves about 25% per year
• Disk
– capacity: about 60% per year
– total use of data: grows 100% (doubles) every 9 months!
• Network bandwidth
– 10 years: 10 Mb/s → 100 Mb/s
– 5 years: 100 Mb/s → 1 Gb/s
Slide 13
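The DRAM rule of thumb above, 60% annual growth equating to roughly 4x every 3 years, is easy to sanity-check with a few lines of Python (a minimal sketch, not part of the original slides; the function name is my own):

```python
def growth_factor(annual_rate, years):
    """Compound growth factor after `years` of growth at `annual_rate` (e.g. 0.60 = 60%)."""
    return (1 + annual_rate) ** years

# 60% per year for 3 years: 1.6^3 ≈ 4.1, i.e. roughly 4x every 3 years
print(growth_factor(0.60, 3))
```

The same helper also shows why the transistor-count and clock-rate trends compound so differently over a decade (1.55^10 is about 80x, while 1.20^10 is only about 6x).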
Technology  dramatic change
From IBM
Slide 14
Computer Architecture Is …
the attributes of a [computing] system as
seen by the programmer, i.e., the
conceptual structure and functional behavior,
as distinct from the organization of the data
flows and controls, the logic design, and the
physical implementation.
Amdahl, Blaauw, and Brooks, 1964
Slide 15
Computer Architecture’s Changing
Definition
• 1950s to 1960s Computer Architecture Course:
Computer Arithmetic
• 1970s to mid 1980s Computer Architecture
Course:
Instruction Set Design, especially ISA appropriate
for compilers
• 1990s Computer Architecture Course:
Design of CPU, memory system, I/O system,
Multiprocessors, Networks
• 2010s: Computer Architecture Course:
Self-adapting systems? Self-organizing
structures?
DNA Systems/Quantum Computing?
Slide 16
CSCE430/830 Course Focus
Understanding the design techniques, machine structures, technology factors, and evaluation methods that will determine the form of computers in the 21st century.
[Figure: Computer Architecture (Instruction Set Design, Organization, Hardware/Software Boundary) at the intersection of Technology, Parallelism, Programming Languages, Applications, Operating Systems, Measurement & Evaluation, Interface Design (ISA), Compilers, and History]
Slide 17
Computer Engineering Methodology
[Figure: an iterative design cycle: Evaluate Existing Systems for Bottlenecks (driven by Benchmarks) → Simulate New Designs and Organizations (driven by Workloads) → Implement Next Generation System (constrained by Implementation Complexity and Technology Trends) → back to evaluation]
Architecture design is an iterative process: searching the space of possible designs at all levels of computer systems.
Slide 18
Summary
1. Moore’s Law: the number of transistors incorporated in a chip approximately doubles every 18 months.
2. CPU speed increases dramatically, but the
speed of memory, disk and network increases
slowly.
3. Architecture design is an iterative process.
Measure performance: Benchmarks
Slide 19
Quantitative Principles
• Performance Metrics: How do we conclude
that System-A is “better” than System-B?
• Amdahl’s Law: Relates total speedup of a
system to the speedup of some portion of that
system.
• Topics: (Sections 1.1, 1.2, 1.5, 1.6)
– Metrics for different market segments
– Benchmarks to measure performance
– Quantitative principles of computer design
Slide 20
Importance of Measurement
Architecture design is an
iterative process:
• Search the possible design space
• Make selections
• Evaluate the selections made
Good measurement
tools are required to
accurately evaluate
the selection.
[Figure: a Cost/Performance Analysis funnel separates Good Ideas from Mediocre Ideas and Bad Ideas]
Slide 21
Two notions of “performance”
Plane              DC to Paris   Speed      Passengers   Throughput (pmph)
Boeing 747         6.5 hours     610 mph    470          286,700
BAD/Sud Concorde   3 hours       1350 mph   132          178,200
Which has higher performance?
• Time to do the task (Execution Time)
– execution time, response time, latency, etc.
• Tasks per day, hour, week, sec, ns. .. (Performance)
– throughput, bandwidth, etc.
Slide 22
Performance Definitions
• Performance is in units of things-per-second.
– bigger is better
• Execution time is the reciprocal of performance:
performance(X) = 1 / execution_time(X)
• "X is n times faster than Y" means
n = execution_time(Y) / execution_time(X) = performance(X) / performance(Y)
• When is throughput more important than execution time?
• When is execution time more important than throughput?
Slide 23
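The two equivalent forms of "n times faster" can be sketched in Python (an illustrative sketch; the function names are my own, not from the slides):

```python
def performance(ex_time):
    """Performance is the reciprocal of execution time."""
    return 1.0 / ex_time

def times_faster(ex_time_x, ex_time_y):
    """'X is n times faster than Y': n = ExTime(Y)/ExTime(X) = Perf(X)/Perf(Y)."""
    return ex_time_y / ex_time_x

print(times_faster(10.0, 15.0))               # 1.5
print(performance(10.0) / performance(15.0))  # ≈ 1.5 (same ratio)
```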
Performance Terminology
“X is n% faster than Y” means:
ExTime(Y) / ExTime(X) = Performance(X) / Performance(Y) = 1 + n/100
n = 100 x (Performance(X) - Performance(Y)) / Performance(Y)
n = 100 x (ExTime(Y) - ExTime(X)) / ExTime(X)
Example: Y takes 15 seconds to complete a task, X takes 10 seconds. What % faster is X than Y?
Slide 24
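The slide's example can be checked directly with the execution-time form of the formula (a minimal sketch; the function name is my own):

```python
def percent_faster(ex_time_x, ex_time_y):
    """n such that 'X is n% faster than Y': n = 100 * (ExTime(Y) - ExTime(X)) / ExTime(X)."""
    return 100.0 * (ex_time_y - ex_time_x) / ex_time_x

# Y takes 15 s, X takes 10 s:
print(percent_faster(10, 15))  # 50.0 -> X is 50% faster than Y
```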
Quantitative Design: Amdahl's Law
Amdahl’s Law gives a quick way to find the
speedup from some enhancement.
Speedup due to enhancement E:
Speedup(E) = Execution_Time_Without_Enhancement / Execution_Time_With_Enhancement
           = Performance_With_Enhancement / Performance_Without_Enhancement
Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected.
Slide 25
Quantitative Design: Amdahl's Law
ExTime_new = ExTime_old x ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)

Speedup_overall = ExTime_old / ExTime_new = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
Slide 26
Pictorial Depiction of Amdahl’s Law
Enhancement E accelerates fraction F of the original execution time by a factor of S.
Before: execution time without enhancement E, shown normalized to 1 = (1 - F) + F
  unaffected fraction (1 - F) | affected fraction F
After: execution time with enhancement E
  unaffected fraction (1 - F), unchanged | F/S
Speedup(E) = Execution Time without enhancement E / Execution Time with enhancement E = 1 / ((1 - F) + F/S)
Slide 27
Quantitative Design: Amdahl's Law
• Floating-point (FP) instructions are improved to run 2x faster, but only 10% of the instructions actually executed are FP. If the old execution time is ExTime_old, what are the new execution time and the overall speedup?
ExTime_new = ExTime_old x (0.9 + 0.1/2) = 0.95 x ExTime_old
Speedup_overall = ExTime_old / ExTime_new = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced)
               = 1 / ((1 - 0.1) + 0.1/2) = 1 / 0.95 = 1.053
Slide 28
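Amdahl's Law as applied on this slide fits in a few lines of Python (a sketch, with a function name of my own choosing):

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup = 1 / ((1 - F) + F/S)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)

# FP instructions run 2x faster, but only 10% of instructions are FP:
print(round(amdahl_speedup(0.10, 2.0), 3))  # 1.053
```

Note how little the overall speedup is for a small F: even an infinite S on 10% of the work caps the speedup at 1/0.9 ≈ 1.11.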
Computer Clocks
• A computer clock runs at a constant rate and determines when events take place in hardware.
• The clock cycle time is the amount of time for one clock period to elapse (e.g., 5 ns).
• The clock rate is the inverse of the clock cycle time. For example, if a computer has a clock cycle time of 5 ns, the clock rate is:
1 / (5 x 10^-9 sec) = 200 MHz
Slide 29
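The reciprocal relation between cycle time and clock rate can be checked in one line (an illustrative sketch; the function name is my own):

```python
def clock_rate_hz(cycle_time_seconds):
    """Clock rate is the inverse of the clock cycle time."""
    return 1.0 / cycle_time_seconds

# 5 ns cycle time -> 200 MHz:
print(clock_rate_hz(5e-9) / 1e6)  # 200.0 (MHz)
```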
Computing CPU time
• The time to execute a given program is
CPU time = CPU clock cycles for a program x clock cycle time
Since clock cycle time and clock rate are reciprocals,
CPU time = CPU clock cycles for a program / clock rate
• CPI: clock cycles per instruction
CPI = CPU clock cycles for a program / Instruction count
Slide 30
Computing CPU time
• The time to execute a given program is
CPU time = CPU clock cycles for a program x clock cycle time
Since clock cycle time and clock rate are reciprocals,
CPU time = CPU clock cycles for a program / clock rate
• The number of CPU clock cycles can be determined by
CPU clock cycles = (instructions/program) x (clock cycles/instruction) = Instruction count x CPI
which gives
CPU time = Instruction count x CPI x clock cycle time
CPU time = Instruction count x CPI / clock rate
• The units for this are
seconds/program = (instructions/program) x (clock cycles/instruction) x (seconds/clock cycle)
Slide 31
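The CPU-time formula derived above can be sketched as a small Python helper (the function name is my own, illustrative choice):

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time (seconds) = Instruction count x CPI / clock rate."""
    return instruction_count * cpi / clock_rate_hz

# e.g. 1 million instructions at CPI 2.0 on a 1 GHz clock:
print(cpu_time(1_000_000, 2.0, 1e9))  # 0.002 s
```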
Example of Computing CPU time
• If a computer has a clock rate of 2 GHz,
how long does it take to execute a
program with 1,000,000 instructions, if
the CPI for the program is 3.5?
Slide 32
Example of Computing CPU time
• If a computer has a clock rate of 2 GHz, how long does it
take to execute a program with 1,000,000 instructions, if
the CPI for the program is 3.5?
• Using the equation
CPU time = Instruction count x CPI / clock rate
gives
CPU time = 10^6 x 3.5 / (2 x 10^9) = 1.75 x 10^-3 seconds
• If a computer’s clock rate increases from 200 MHz to 250 MHz and the other factors remain the same, how many times faster will the computer be?
CPU time_old / CPU time_new = clock rate_new / clock rate_old = 250 MHz / 200 MHz = 1.25
• What simplifying assumptions did we make?
Slide 33
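Both computations on this slide can be verified directly with CPU time = IC x CPI / clock rate (a minimal sketch of the arithmetic, not part of the original slides):

```python
# 1,000,000 instructions, CPI 3.5, 2 GHz clock:
cpu_time = 1_000_000 * 3.5 / 2e9
print(cpu_time)  # 0.00175 s = 1.75 ms

# Raising the clock from 200 MHz to 250 MHz (IC and CPI unchanged):
speedup = 250 / 200
print(speedup)  # 1.25
```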
Performance Example
• Two computers M1 and M2 with the same instruction set.
• For a given program, we have
                  M1       M2
Clock rate        50 MHz   75 MHz
CPI               2.8      3.2
• How many times faster is M2 than M1 for this program?
ExTime_M1 / ExTime_M2 = (IC_M1 x CPI_M1 / Clock Rate_M1) / (IC_M2 x CPI_M2 / Clock Rate_M2)
                      = (2.8/50) / (3.2/75) = 1.31
Slide 34
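The M1/M2 comparison can be checked numerically (a sketch; since both machines run the same program with the same instruction set, the instruction count cancels in the ratio):

```python
def cpu_time(ic, cpi, clock_rate_hz):
    # CPU time = instruction count x CPI / clock rate
    return ic * cpi / clock_rate_hz

ic = 1.0  # arbitrary; it cancels in the ratio
ratio = cpu_time(ic, 2.8, 50e6) / cpu_time(ic, 3.2, 75e6)
print(ratio)  # ≈ 1.3125, i.e. M2 is about 1.31x faster
```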
Aspects of CPU Performance
CPU time = Seconds/Program = (Instructions/Program) x (Cycles/Instruction) x (Seconds/Cycle)

               Inst Count   CPI   Clock Cycle Time
Program            X         X
Compiler           X        (X)
Inst. Set          X         X
Organization                 X          X
Technology                              X
Slide 35
Performance Summary
• Two performance metrics: execution time and throughput.
• Amdahl’s Law:
Speedup(E) = Execution Time without enhancement E / Execution Time with enhancement E = 1 / ((1 - F) + F/S)
• When trying to improve performance, look at what occurs frequently => make the common case fast.
• CPU time:
CPU time = Instruction count x CPI x clock cycle time = Instruction count x CPI / clock rate
Slide 36