CSCE 385 COMPUTER ARCHITECTURE
Instructor: Dr. Mike Turi
Department of Computer Science & Computer Engineering
Pacific Lutheran University
Lecture Slides adapted from Prof. Jose Delgado-Frias and
Mr. Paul Wettin (Washington State University EE 334)
Course Objectives
Students in this course will be able to:
• Understand how modern computer systems work
• Perform quantitative analysis of computer systems
• Analyze, at the system level, the impact of changes to a
computer system
• Estimate the performance of a computer system
• Design novel schemes that improve the performance of
computer systems
• Use tools to design modern systems
• Recognize the need for further learning in this field (life-long
learning)
Things you’ll be learning
• How computers work, a basic foundation
• Classic/basic components of a computer
• Stored program concept: instructions and data
• Issues affecting modern processors (caches, pipelines)
• Principles of locality to be exploited by means of memory
hierarchy (L1, L2 & L3 cache; main memory; disk,…)
• Greater performance by means of instruction level parallelism
• Principle of abstraction, used to build systems as layers
• Compilation vs. interpretation through system layers
• How to analyze their performance (or how not to!)
• Principles and pitfalls of performance measurement
• Multiprocessor systems
Focus
• Our primary focus: the processor (datapath and control)
– Implemented using millions of transistors
– Impossible to understand by looking at each transistor
– We need to:
• Have an overall picture of the system
• Analyze how the components interact
• Consider both Hardware and Software
Classes of Computers
• Desktop Computer
• Server
– High dependability
• Supercomputers
– Hundreds to thousands of processors
– Terabytes of memory, petabytes of storage
• Datacenters
– Large clusters of computers
Embedded Computers
• Largest class of computers
• Span widest range of applications and performance
• Where are these found?
• Large growth in developing countries
• Cost and power requirements
• Dependability requirements
Basic components
Computer
• Processor
– Control (“brain”)
– Datapath (“brawn”)
• Memory (where programs, data live when running)
• Devices
– Input: Keyboard, Mouse
– Output: Display, Printer
– Disk (where programs, data live when not running)
Computer System
Software
• Application (Explorer)
• Compiler, Assembler
• Operating System
Instruction Set Architecture
Hardware
• Processor, Memory, I/O system
• Datapath & Control
• Digital Design
• Circuit Design
• transistors
Abstraction
• Delving into the depths reveals more information
• An abstraction omits unneeded detail, helps us cope with complexity
What are some of the details that appear in these familiar abstractions?
High-level language program (in C)
swap(int v[], int k)
{int temp;
temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;
}
C compiler
Assembly language program (for MIPS)
swap:
muli $2, $5, 4
add $2, $4, $2
lw $15, 0($2)
lw $16, 4($2)
sw $16, 0($2)
sw $15, 4($2)
jr $31
Assembler
Binary machine language program (for MIPS)
00000000101000010000000000011000
00000000100011100001100000100001
10001100011000100000000000000000
10001100111100100000000000000100
10101100111100100000000000000000
10101100011000100000000000000100
00000011111000000000000000001000
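One detail the assembly version exposes is the address arithmetic that the C indexing hides. The sketch below is an illustration only, not part of the original slides: it models memory as a Python dict of byte addresses, assumes 4-byte integers, and uses hypothetical names (`swap_words`, `memory`, `base`). Each line is annotated with the MIPS instruction it imitates.

```python
def swap_words(memory, base, k):
    """Swap v[k] and v[k+1], following the MIPS sequence step by step.

    memory: dict mapping byte address -> word value (hypothetical model)
    base:   byte address of v[0] (register $4 in the assembly)
    k:      element index (register $5 in the assembly)
    """
    offset = k * 4            # muli $2, $5, 4   (4 bytes per int, assumed)
    addr = base + offset      # add  $2, $4, $2  (address of v[k])
    t15 = memory[addr]        # lw   $15, 0($2)  (load v[k])
    t16 = memory[addr + 4]    # lw   $16, 4($2)  (load v[k+1])
    memory[addr] = t16        # sw   $16, 0($2)  (v[k] = old v[k+1])
    memory[addr + 4] = t15    # sw   $15, 4($2)  (v[k+1] = old v[k])


# Example: v = [10, 20, 30, 40] stored starting at byte address 1000 (assumed).
base = 1000
memory = {base + 4 * i: value for i, value in enumerate([10, 20, 30, 40])}
swap_words(memory, base, 1)
v_after = [memory[base + 4 * i] for i in range(4)]
print(v_after)  # [10, 30, 20, 40]
```

The multiplication by 4 and the 0/4 byte offsets are exactly the kind of detail that the C abstraction omits.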
Computer history
Generation -1: The early days (????-1642)
– Calculations had to be performed, recognized need for
non-human computer
Generation 0: Mechanical (1642-1935)
– Mechanical computers: Examples are Babbage Machine
(Charles Babbage) and Difference Engine (Georg Scheutz)
• Babbage Machine: http://www.youtube.com/watch?v=BlbQsKpq3Ak
Generation 1: Electromechanical (1935-1945)
Generation 2: Vacuum tubes (1945-1955)
Generation 3: Discrete transistors (1955-1965)
Generation 4: Integrated circuits (1965-1980)
Generation 5: VLSI (1980-????)
Generation 1: Electromechanical (1935-1945)
• Grace Murray Hopper found the first computer bug beaten to death in
the jaws of a relay. She glued it into the logbook of the computer and
thereafter, when the machine stopped (frequently), she told Howard Aiken
that they were "debugging" the computer.
Intel 4004
• In 1971, Ted Hoff produced
the Intel 4004 in response to
the request from a Japanese
company (Busicom) to create
a chip for a calculator
• It is the first microprocessor,
i.e. the first processor-on-a-chip
Intel Pentium II
Pentium III (2000)
Pentium III continued
• PIC: Programmable Interrupt Controller
• E/BBL: External/Back-side Bus Logic
• CLK: Clocking
• L2: Level 2 cache
• DTLB: Data Translation Look-aside Buffer
• DCU: Data Cache Unit
• BTB: Branch Target Buffer
• BAC: Branch Address Calculator
• TAP: Testability Access Port
• IFU: Instruction Fetch Unit
• PMH: Page Miss Handler
Pentium III continued
• PFU: Packed FPU (MMX)
• SIMD: Packed Floating point
• MOB: Memory Order Buffer
• IEU: Integer Execution Unit
• RAT: Register Alias Table
• FEU: FPU Execution Unit
• MIU: Memory Interface Unit
• RS: Reservation Station
• ID: Instruction Decode
• ROB: Re-Order Buffer
• MS: Micro-instruction Sequencer
Intel Pentium 4
Fall 2002
• 2.80 GHz
• 0.13-micron technology
• 478-pin package
• 512 KB L2 Cache
• 50 Amps
• Vcc = 1.5V
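The current and voltage figures together suggest a rough power figure via P = I × V. The snippet below is a back-of-the-envelope sketch, assuming the full 50 A is drawn at Vcc; the function name `power_watts` is hypothetical and the result is an upper-bound style estimate, not a measured value.

```python
def power_watts(current_amps, voltage_volts):
    """Electrical power P = I * V, in watts."""
    return current_amps * voltage_volts


# Figures from the slide: 50 A at Vcc = 1.5 V (assumed to apply together).
p4_power = power_watts(50, 1.5)
print(p4_power)  # 75.0
```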
Intel Core i7
• Cores: 4
• Threads: 8
• Clock: 2.5GHz
• 32nm CMOS Tech
• 731M transistors
• Cache
– L1: 32KB (8-way)
– L2: 256KB (8-way)
– L3: 8MB (16-way)
Intel Core i7 (Ivy Bridge, 2012)
• 22nm technology (tri-gate transistor technology)
• 1.40 billion transistors; 160mm²
• 3.5GHz (Turbo frequency of 3.9GHz)
Tri-Gate transistor technology (FinFET technology)
Semiconductor Manufacturing
• Start with a Silicon ingot (single Silicon crystal)
• Slice ingot into unfinished wafers
• Perform many processing steps to put circuits onto wafer
• Create a patterned wafer with dies (chips)
– Use a mask to do this (a “master” copy)
• Processing steps not 100% accurate (causes defects on dies)
• Wafer tester figures out which dies are good/bad
– Yield: percentage of good dies out of the total number on the wafer
– Larger dies are more expensive
• Dicer cuts rectangular dies from wafer (discards defective dies)
• Bond the good die to the package (with I/O pins)
• Test the packaged dies (discard bad packaged dies)
• Ship to customers
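The yield definition above, and the remark that larger dies cost more, can be made concrete with a small calculation. This is a minimal sketch with made-up numbers: the wafer cost, die counts, and function names (`die_yield`, `cost_per_good_die`) are all assumptions for illustration. It uses the standard relation cost per good die = wafer cost / (dies per wafer × yield).

```python
def die_yield(good_dies, total_dies):
    """Yield: fraction of good dies out of all dies on the wafer."""
    return good_dies / total_dies


def cost_per_good_die(wafer_cost, total_dies, yield_fraction):
    """Spread the wafer cost over only the dies that pass the wafer test."""
    return wafer_cost / (total_dies * yield_fraction)


# Hypothetical numbers, chosen only for illustration.
WAFER_COST = 5000.0  # dollars per processed wafer (assumed)

# Small die: many fit on a wafer and few are hit by defects.
small_yield = die_yield(good_dies=360, total_dies=400)
# Large die: fewer fit on a wafer and a defect is more likely to land on each.
large_yield = die_yield(good_dies=60, total_dies=100)

small_cost = cost_per_good_die(WAFER_COST, 400, small_yield)
large_cost = cost_per_good_die(WAFER_COST, 100, large_yield)

print(f"small die: yield {small_yield:.0%}, ${small_cost:.2f} per good die")
print(f"large die: yield {large_yield:.0%}, ${large_cost:.2f} per good die")
```

With these assumed figures the large die is several times more expensive per good part, because it loses on both die count and yield.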
For more info on Semiconductor Manufacturing
• Textbook Figure 1.18
• YouTube videos (a sample, much more info available)
– How to make wafers and pattern (Science Channel):
http://www.youtube.com/watch?v=aWVywhzuHnQ
– Patterning and packaging (Lexar):
http://www.youtube.com/watch?v=kvf29R7nXlM
– Details about manufacturing process (GlobalFoundries):
http://www.youtube.com/watch?v=qm67wbB5GmI
– Inside a Fab (Intel):
http://www.youtube.com/watch?v=PecKlm6VutU
Why Such Change in 12 years?
• Performance
– Technology Advances
• CMOS VLSI dominates older technologies (TTL, ECL) in cost
AND performance
– Computer architecture advances improve the low end
• RISC, superscalar, RAID, …
• Price: Lower costs due to …
– Simpler development
• CMOS VLSI: smaller systems, fewer components
– Higher volumes
• CMOS VLSI: same development cost whether 10,000 or 10,000,000 units are built
– Lower margins by class of computer, due to fewer services
• Function
– Rise of networking/local interconnection technology
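The "higher volumes" point above is an amortization argument: a fixed development cost is divided across every unit sold. The sketch below illustrates it with invented figures; `DEV_COST`, `UNIT_COST`, and `cost_per_unit` are hypothetical names and values, not data from the slides.

```python
def cost_per_unit(dev_cost, unit_cost, units):
    """Per-unit cost = amortized development cost + per-unit manufacturing cost."""
    return dev_cost / units + unit_cost


# Hypothetical figures, for illustration only.
DEV_COST = 10_000_000.0   # one-time development cost in dollars (assumed)
UNIT_COST = 50.0          # manufacturing cost per unit in dollars (assumed)

low_volume = cost_per_unit(DEV_COST, UNIT_COST, 10_000)
high_volume = cost_per_unit(DEV_COST, UNIT_COST, 10_000_000)

print(low_volume)   # 1050.0
print(high_volume)  # 51.0
```

At 10,000 units the development cost dominates; at 10,000,000 units it nearly disappears from the per-unit price.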
Single Processor Performance
[Figure annotations: VAX-11/780, 5MHz; RISC; Move to multi-processor]
Processor Core Trend
Source: S. Fuller and L. Millett, “Computing Performance: Game Over or Next Level?”
IEEE Computer, pp. 31-38, January 2011.
Processor Core Clock Frequency
Source: S. Fuller and L. Millett, “Computing Performance: Game Over or Next Level?”
IEEE Computer, pp. 31-38, January 2011.
Metrics of Performance
• Application: answers per month, operations per second
• Programming Language, Compiler, ISA: (millions) of instructions per second (MIPS);
(millions) of (FP) operations per second (MFLOP/s)
• Datapath, Control: megabytes per second
• Function Units; Transistors, Wires, Pins: cycles per second (clock rate)
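The MIPS and MFLOP/s metrics are simple rates: a count divided by execution time, scaled by one million. The following is a minimal sketch with invented workload numbers; the function names and the example counts are assumptions for illustration.

```python
def mips(instruction_count, seconds):
    """Millions of instructions per second."""
    return instruction_count / (seconds * 1e6)


def mflops(fp_operations, seconds):
    """Millions of floating-point operations per second."""
    return fp_operations / (seconds * 1e6)


# Hypothetical run: 2 billion instructions, 300 million of them
# floating-point operations, finishing in 4 seconds.
run_mips = mips(2_000_000_000, 4.0)
run_mflops = mflops(300_000_000, 4.0)

print(run_mips)    # 500.0
print(run_mflops)  # 75.0
```

Both numbers describe the same run, which is one reason a single rate metric can mislead: a program with few floating-point operations scores low on MFLOP/s regardless of how quickly it finishes.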
CPU time
• Depends on: Program, Compiler, Instruction set, Organization, Technology
• Components: Instruction Count, Cycles Per Instruction, Clock Rate
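The three components combine through the standard relation CPU time = Instruction Count × Cycles Per Instruction / Clock Rate. The sketch below applies it to invented numbers to show how changing one component at a time affects the result; the function name and all figures are assumptions for illustration.

```python
def cpu_time(instruction_count, cpi, clock_rate_hz):
    """CPU time in seconds = instructions * cycles/instruction / (cycles/second)."""
    return instruction_count * cpi / clock_rate_hz


# Hypothetical baseline: 1 billion instructions, CPI of 2, 2 GHz clock.
baseline = cpu_time(1_000_000_000, 2.0, 2_000_000_000)

# A compiler that emits 20% fewer instructions (same CPI and clock).
fewer_instructions = cpu_time(800_000_000, 2.0, 2_000_000_000)

# A faster clock from better technology (same instruction count and CPI).
faster_clock = cpu_time(1_000_000_000, 2.0, 2_500_000_000)

print(f"baseline:           {baseline:.2f} s")
print(f"fewer instructions: {fewer_instructions:.2f} s")
print(f"faster clock:       {faster_clock:.2f} s")
```

In this made-up case the two improvements give the same CPU time, which is why the components have to be considered together rather than one at a time.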
Computer Architecture
Computer Architecture =
Instruction Set Architecture +
Machine Organization + …
What is a Computer Architecture Course?
Changing Definition
• 1950s to 1960s: Computer Arithmetic
• 1970s to mid 1980s:
Instruction Set Design, especially ISA appropriate for
compilers
• 1990s:
Design of CPU, memory system, I/O system, Multiprocessors
• 2000s:
Design of CPU (Performance & Power), Memory system, I/O,
embedded systems, wireless.
Instruction Set Architecture (ISA)
instruction set
Interface Design (ISA)
A good interface:
• Lasts through many implementations (portability,
compatibility)
• Is used in many different ways (generality)
• Provides convenient functionality to higher levels
• Permits an efficient implementation at lower levels
[Figure: several uses above the Interface; implementations imp 1, imp 2, imp 3 below it, along a time axis]
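The idea of one interface outliving several implementations can be sketched in software terms. This is an analogy, not an ISA: the `Adder` interface, its two implementations, and the `total` function are hypothetical names invented for illustration. The "use" code depends only on the interface, so either implementation can be swapped in without changing it.

```python
from abc import ABC, abstractmethod


class Adder(ABC):
    """Hypothetical interface: the only thing user code may rely on."""

    @abstractmethod
    def add(self, a: int, b: int) -> int:
        ...


class NativeAdder(Adder):
    """Implementation 1: use the host's built-in addition."""

    def add(self, a: int, b: int) -> int:
        return a + b


class BitwiseAdder(Adder):
    """Implementation 2: add non-negative integers with XOR and carry."""

    def add(self, a: int, b: int) -> int:
        while b != 0:
            carry = (a & b) << 1  # bits that overflow into the next position
            a = a ^ b             # sum of bits without the carries
            b = carry
        return a


def total(adder: Adder, values):
    """A 'use' of the interface: works with any implementation."""
    result = 0
    for value in values:
        result = adder.add(result, value)
    return result


native_total = total(NativeAdder(), [1, 2, 3, 4])
bitwise_total = total(BitwiseAdder(), [1, 2, 3, 4])
print(native_total, bitwise_total)  # 10 10
```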