Lecture 1: Course Introduction and Overview

Adapted from
“http://wwwinst.EECS.Berkeley.EDU:80/~cs152/fa97/index_lectures.html”,
“http://www.cs.berkeley.edu/~pattrsn/252S98/index.html”
Copyright 1998 UCB
Chapter 2: Instruction Set Principles
and Examples
Soonchunhyang University, Division of Computer Science
Sang-Jung Lee
S.J.Lee 1
Review, #1
• Designing to Last through Trends
             Capacity         Speed
Logic        2x in 3 years    2x in 3 years
DRAM         4x in 3 years    2x in 10 years
Disk         4x in 3 years    2x in 10 years
Processor    (n.a.)           2x in 1.5 years
• Time to run the task
– Execution time, response time, latency
• Tasks per day, hour, week, sec, ns, ...
– Throughput, bandwidth
• “X is n times faster than Y” means
$$ n = \frac{\mathrm{ExTime}(Y)}{\mathrm{ExTime}(X)} = \frac{\mathrm{Performance}(X)}{\mathrm{Performance}(Y)} $$
Review, #2
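• Amdahl's Law:
$$ \mathrm{Speedup}_{\mathrm{overall}} = \frac{\mathrm{ExTime}_{\mathrm{old}}}{\mathrm{ExTime}_{\mathrm{new}}} = \frac{1}{(1 - \mathrm{Fraction}_{\mathrm{enhanced}}) + \dfrac{\mathrm{Fraction}_{\mathrm{enhanced}}}{\mathrm{Speedup}_{\mathrm{enhanced}}}} $$
• CPI Law:
$$ \text{CPU time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}} $$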
• Execution time is the REAL measure of computer
performance!
• Good products are created when we have:
– Good benchmarks
– Good ways to summarize performance
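As a worked illustration of Amdahl's Law and the CPI Law from this review, here is a minimal Python sketch. The function names and all input numbers are assumptions chosen for illustration; they do not come from the lecture.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when only a fraction of the execution time is enhanced."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def cpu_time(instructions: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time = instructions/program x cycles/instruction x seconds/cycle."""
    return instructions * cpi * cycle_time_s


# Hypothetical case: 40% of the run time is made 10x faster.
overall = amdahl_speedup(0.4, 10.0)

# Hypothetical program: 1e9 instructions, CPI of 2.0, 1 ns clock cycle.
seconds = cpu_time(1e9, 2.0, 1e-9)
```

With these made-up inputs the overall speedup is 1 / 0.64, well below 10x, which is the usual point of Amdahl's Law: the unenhanced fraction limits the gain.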
Introduction
[Figure: execution cycle: Instruction Fetch -> Instruction Decode -> Operand Fetch -> Execute -> Result Store -> Next Instruction]
What Must be Specified?
• Instruction Format or Encoding
– how is it decoded?
• Location of operands and result
– where other than memory?
– how many explicit operands?
– how are memory operands located?
– which can or cannot be in memory?
• Data type and Size
• Operations
– what are supported
• Successor instruction
– jumps, conditions, branches
Evolution of Instruction Sets
Single Accumulator (EDSAC 1950)
Accumulator + Index Registers
(Manchester Mark I, IBM 700 series 1953)
Separation of Programming Model
from Implementation
High-level Language Based
(B5000 1963)
Concept of a Family
(IBM 360 1964)
General Purpose Register Machines
Complex Instruction Sets
(Vax, Intel 432 1977-80)
Load/Store Architecture
(CDC 6600, Cray 1 1963-76)
RISC
(MIPS, SPARC, HP-PA, IBM RS6000, PowerPC ... 1987)
LIW/"EPIC"? (IA-64 ... 1999)
Classifying Instruction Set Architectures
• Accumulator (1 register):
– 1 address      add A           acc ← acc + mem[A]
– 1+x address    addx A          acc ← acc + mem[A + x]
• Stack:
– 0 address      add             tos ← tos + next
• General Purpose Register:
– 2 address      add A B         EA(A) ← EA(A) + EA(B)
– 3 address      add A B C       EA(A) ← EA(B) + EA(C)
• Load/Store:
– 3 address      add Ra Rb Rc    Ra ← Rb + Rc
                 load Ra Rb      Ra ← mem[Rb]
                 store Ra Rb     mem[Rb] ← Ra
• Comparison:
– Bytes per instruction?
– Number of Instructions?
– Cycles per instruction?
Comparing Number of Instructions
° Code sequence for C = A + B for four classes of instruction sets
Stack     Accumulator   Register            Register
                        (register-memory)   (load-store)
Push A    Load A        Load R1,A           Load R1,A
Push B    Add B         Add R1,B            Load R2,B
Add       Store C       Store C,R1          Add R3,R1,R2
Pop C                                       Store C,R3
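To make the comparison concrete, the stack and load-store sequences for C = A + B can be run on two toy interpreters. This is a minimal sketch: the interpreter functions, the tuple encoding of instructions, and the values of A and B are all assumptions made for illustration.

```python
def run_stack(program, mem):
    """Execute a 0-address (stack) program; returns the instruction count."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(mem[args[0]])
        elif op == "add":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        elif op == "pop":
            mem[args[0]] = stack.pop()
    return len(program)


def run_load_store(program, mem):
    """Execute a 3-address load-store program; returns the instruction count."""
    regs = {}
    for op, *args in program:
        if op == "load":       # load Rd, addr
            regs[args[0]] = mem[args[1]]
        elif op == "add":      # add Rd, Rs1, Rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "store":    # store addr, Rs
            mem[args[0]] = regs[args[1]]
    return len(program)


stack_prog = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
ls_prog = [("load", "R1", "A"), ("load", "R2", "B"),
           ("add", "R3", "R1", "R2"), ("store", "C", "R3")]

mem_stack = {"A": 2, "B": 3}   # hypothetical operand values
mem_ls = {"A": 2, "B": 3}
n_stack = run_stack(stack_prog, mem_stack)
n_ls = run_load_store(ls_prog, mem_ls)
```

Both sequences take four instructions here; the classes differ in how many bytes each instruction needs and how many memory accesses it makes, which this sketch does not model.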
General Purpose Registers Dominate
1975-1995: all machines use general purpose registers
Advantages of registers
• registers are faster than memory
• registers are easier for a compiler to use
- e.g., (A*B) - (C*D) - (E*F) can do multiplies in any order vs. stack
• registers can hold variables
- memory traffic is reduced, so program is sped up (since registers are faster than memory)
- code density improves (since register named with fewer bits than memory location)
Memory Addressing
Since 1980 almost every machine uses addresses to the level of 8 bits (byte)
2 questions for design of ISA:
• Since one could read a 32-bit word either as four byte loads from sequential byte addresses or as one word load from a single byte address, how do byte addresses map onto words?
• Can a word be placed on any byte boundary?
Addressing Objects: Endianness and Alignment
• Big Endian: address of most significant byte = word address
– IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA
• Little Endian: address of least significant byte = word address
– Intel 80x86, DEC Vax, DEC Alpha (Windows NT)
Byte numbering within a 32-bit word (msb on the left, lsb on the right):
little endian:   3  2  1  0    (byte 0 is the lsb)
big endian:      0  1  2  3    (byte 0 is the msb)
Alignment: require that objects fall on an address that is a multiple of their size.
[Figure: aligned vs. not aligned objects at byte offsets 0 to 3]
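The two byte orders and the alignment rule can be checked with Python's standard `struct` module. This is a small sketch; the sample word `0x01020304` and the helper `is_aligned` are assumptions introduced for illustration.

```python
import struct

word = 0x01020304  # hypothetical 32-bit value; 0x01 is the most significant byte

# Bytes listed in increasing address order.
big = struct.pack(">I", word)      # big endian: msb at the lowest address
little = struct.pack("<I", word)   # little endian: lsb at the lowest address


def is_aligned(address: int, size: int) -> bool:
    """An object is aligned if its address is a multiple of its size."""
    return address % size == 0
```

For example, `is_aligned(8, 4)` holds for a word at address 8, while a word at address 6 is not aligned even though a half-word at address 6 is.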
Addressing Modes
Addressing mode       Example               Meaning
Register              Add R4,R3             R4 ← R4+R3
Immediate             Add R4,#3             R4 ← R4+3
Displacement          Add R4,100(R1)        R4 ← R4+Mem[100+R1]
Register indirect     Add R4,(R1)           R4 ← R4+Mem[R1]
Indexed / Base        Add R3,(R1+R2)        R3 ← R3+Mem[R1+R2]
Direct or absolute    Add R1,(1001)         R1 ← R1+Mem[1001]
Memory indirect       Add R1,@(R3)          R1 ← R1+Mem[Mem[R3]]
Auto-increment        Add R1,(R2)+          R1 ← R1+Mem[R2]; R2 ← R2+d
Auto-decrement        Add R1,-(R2)          R2 ← R2-d; R1 ← R1+Mem[R2]
Scaled                Add R1,100(R2)[R3]    R1 ← R1+Mem[100+R2+R3*d]
Why Auto-increment/decrement? Scaled?
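The "Meaning" column can be read as a rule for computing the memory address of the operand. The sketch below covers the memory modes that have no side effects; the function name, its parameters, and the register and memory contents are assumptions for illustration, and `d` stands for the operand size in bytes as in the table.

```python
def operand_address(mode, regs, mem, base=None, index=None, disp=0, d=4):
    """Return the memory address of the operand for a few addressing modes."""
    if mode == "displacement":       # Mem[disp + R[base]]
        return disp + regs[base]
    if mode == "register_indirect":  # Mem[R[base]]
        return regs[base]
    if mode == "indexed":            # Mem[R[base] + R[index]]
        return regs[base] + regs[index]
    if mode == "direct":             # Mem[disp]
        return disp
    if mode == "memory_indirect":    # Mem[Mem[R[base]]]
        return mem[regs[base]]
    if mode == "scaled":             # Mem[disp + R[base] + R[index]*d]
        return disp + regs[base] + regs[index] * d
    raise ValueError(f"unsupported mode: {mode}")


# Hypothetical machine state.
regs = {"R1": 200, "R2": 1000, "R3": 5}
mem = {200: 4096}

disp_addr = operand_address("displacement", regs, mem, base="R1", disp=100)
scaled_addr = operand_address("scaled", regs, mem, base="R2", index="R3", disp=100)
indirect_addr = operand_address("memory_indirect", regs, mem, base="R1")
```

Scaled mode folds the multiply by the element size into the address calculation, which is one answer to the "Scaled?" question above: it suits stepping through an array with an element index.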
Addressing Mode Usage? (ignore register mode)
3 SPEC89 programs
--- Displacement:                   42% avg, 32% to 55%
--- Immediate:                      33% avg, 17% to 43%
--- Register deferred (indirect):   13% avg, 3% to 24%
--- Scaled:                         7% avg, 0% to 16%
--- Memory indirect:                3% avg, 1% to 6%
--- Misc:                           2% avg, 0% to 3%
75% displacement & immediate
88% displacement, immediate & register indirect
Displacement Address Size?
[Figure: distribution of displacement sizes; x-axis: Address Bits (0 to 15); y-axis: 0% to 30%; series: Int. Avg., FP Avg.]
Avg. of 5 SPECint92 programs v. avg. 5 SPECfp92 programs
X-axis is in powers of 2: 4 => addresses > 2^3 (8) and ≤ 2^4 (16)
1% of addresses > 16 bits
12 to 16 bits of displacement needed
Immediate Size?
• 50% to 60% fit within 8 bits
• 75% to 80% fit within 16 bits
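Whether a given constant "fits within n bits" can be tested as below. This is a minimal sketch that assumes immediates are stored as signed two's-complement fields; the function name and the sample values are hypothetical, not measurements from the lecture.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable in a bits-wide two's-complement field."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


# Hypothetical immediates, used only to exercise the check.
samples = [0, 1, -1, 100, 300, 70000]
fit_8 = [v for v in samples if fits_signed(v, 8)]
fit_16 = [v for v in samples if fits_signed(v, 16)]
```

Running the same check over the immediates of real programs is how size distributions like the percentages above are obtained.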
Operations in the Instruction Set
Data Movement
Load (from memory)
Store (to memory)
memory-to-memory move
register-to-register move
input (from I/O device)
output (to I/O device)
push, pop (to/from stack)
Arithmetic
integer (binary + decimal) or FP
Add, Subtract, Multiply, Divide
Shift
shift left/right, rotate left/right
Logical
not, and, or, set, clear
Control (Jump/Branch)
unconditional, conditional
Subroutine Linkage
call, return
Interrupt
trap, return
Synchronization
test & set
String
search, translate
Graphics (MMX)
parallel subword ops (4 16-bit add)
Top 10 80x86 Instructions
Rank   Instruction               Integer average (% of total executed)
1      load                      22%
2      conditional branch        20%
3      compare                   16%
4      store                     12%
5      add                       8%
6      and                       6%
7      sub                       5%
8      move register-register    4%
9      call                      1%
10     return                    1%
       Total                     96%
° Simple instructions dominate instruction frequency
Methods of Testing Condition
• Condition Codes
– Processor status bits are set as a side-effect of arithmetic instructions (possibly on Moves) or explicitly by compare or test instructions.
– Ex: add r1, r2, r3
      bz label
• Condition Register
– Ex: cmp r1, r2, r3
      bgt r1, label
• Compare and Branch
– Ex: bgt r1, r2, label
Conditional Branch Distance
[Figure: distribution of conditional branch distances; x-axis: Bits of Branch Displacement (0 to 15); y-axis: 0% to 40%; series: Int. Avg., FP Avg.]
Conditional Branch Addressing
• PC-relative since most branches are close to the current PC address
• At least 8 bits suggested (±128 instructions)
• Compare Equal/Not Equal most important for integer programs (86%)
Frequency of comparison types in branches:
Comparison   Int Avg.   FP Avg.
LT/GE        7%         40%
GT/LE        7%         23%
EQ/NE        86%        37%
Data Types
Bit: 0, 1
Bit String: sequence of bits of a particular length
4 bits is a nibble
8 bits is a byte
16 bits is a half-word
32 bits is a word
64 bits is a double-word
Character:
ASCII 7 bit code
Decimal:
digits 0-9 encoded as 0000b thru 1001b
two decimal digits packed per 8 bit byte
Integers:
2's Complement
Floating Point:
Single Precision, Double Precision, Extended Precision
value = M x R^E (M: mantissa, R: base, E: exponent)
How many +/- #'s?
Where is decimal pt?
How are +/- exponents represented?
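Two of the encodings listed above, packed decimal and 2's complement, are easy to show in code. This is a minimal sketch; the function names and the sample values are assumptions for illustration.

```python
def pack_bcd(n: int) -> bytes:
    """Pack a non-negative integer as BCD: two decimal digits per 8-bit byte."""
    if n < 0:
        raise ValueError("this sketch handles non-negative values only")
    digits = str(n)
    if len(digits) % 2:            # pad to an even number of digits
        digits = "0" + digits
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))


def twos_complement(value: int, bits: int = 32) -> int:
    """Bit pattern of value as a bits-wide 2's complement integer."""
    return value & ((1 << bits) - 1)


packed = pack_bcd(1998)            # each digit occupies one 4-bit nibble
minus_one = twos_complement(-1, 8)
```

Each BCD nibble holds one digit encoded 0000b through 1001b, so the hexadecimal form of the packed bytes reads the same as the decimal number.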
Operand Size Usage
Frequency of reference by size:
Size         Int Avg.   FP Avg.
Doubleword   0%         69%
Word         74%        31%
Halfword     19%        0%
Byte         7%         0%
• Support these data sizes and types:
8-bit, 16-bit, 32-bit integers and
32-bit and 64-bit IEEE 754 floating point numbers
Generic Examples of Instruction Format Widths
Variable:
Fixed:
Hybrid:
Summary of Instruction Formats
• If code size is most important,
use variable length instructions
• If performance is most important,
use fixed length instructions
• Recent embedded machines (ARM, MIPS) added
optional mode to execute subset of 16-bit wide
instructions (Thumb, MIPS16); per procedure decide
performance or density
Compilers and Instruction Set Architectures
• Ease of compilation
– orthogonality: no special registers, few special cases,
all operand modes available with any data type or instruction type
– completeness: support for a wide range of operations
and target applications
– regularity: no overloading for the meanings of instruction fields
– streamlined: resource needs easily determined
• Register Assignment is critical too
– Easier if lots of registers
Summary of Compiler Considerations
• Provide at least 16 general purpose registers
plus separate floating-point registers,
• Be sure all addressing modes apply to all
data transfer instructions,
• Aim for a minimalist instruction set.
A "Typical" RISC
• 32-bit fixed format instruction (3 formats)
• 32 32-bit GPR (R0 contains zero, DP take pair)
• 3-address, reg-reg arithmetic instruction
• Single address mode for load/store: base + displacement
– no indirection
• Simple branch conditions
• Delayed branch
see: SPARC, MIPS, HP PA-Risc, DEC Alpha, IBM PowerPC,
CDC 6600, CDC 7600, Cray-1, Cray-2, Cray-3
DLX Architecture
• Simple load-store architecture
• Based on observations about instruction set
architecture
• Emphasizes:
– Simple load-store instruction set
– Design for pipeline efficiency
– Design for compiler target
• DLX registers
– 32 32-bit GPRs named R0, R1, ..., R31
– 32 32-bit FPRs named F0, F1, ..., F31
» Accessed independently for 32-bit data
» Accessed in even-odd pairs (F0, F2, ..., F30) for 64-bit (double-precision) data
– R0 is always 0
– Other status registers, e.g., floating-point status register
• Byte addressable in big-endian with 32-bit
address
DLX Addressing Modes
• All instructions 32 bits wide
Mode                Instruction fields         Operand
Register (direct)   op | rs | rt | rd          register
Immediate           op | rs | rt | immed       immed
Base+index          op | rs | rt | immed       Memory[register + immed]
PC-relative         op | rs | rt | immed       Memory[PC + immed]
DLX Instruction Format
R-type instruction
Bits:    31-26   25-21   20-16   15-11   10-0
Field:   Op      rs1     rs2     rd      func
I-type instruction
Bits:    31-26   25-21   20-16   15-0
Field:   Op      rs1     rd      immediate
J-type instruction
Bits:    31-26   25-0
Field:   Op      Offset added to PC
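The field boundaries above translate directly into shifts and masks. The following sketch packs and unpacks R-type and I-type words; the function names are assumptions, and the opcode and func numbers in the example are placeholders rather than real DLX encodings.

```python
def encode_r(op, rs1, rs2, rd, func):
    """R-type: Op[31:26] rs1[25:21] rs2[20:16] rd[15:11] func[10:0]."""
    return (op << 26) | (rs1 << 21) | (rs2 << 16) | (rd << 11) | func


def encode_i(op, rs1, rd, immediate):
    """I-type: Op[31:26] rs1[25:21] rd[20:16] immediate[15:0]."""
    return (op << 26) | (rs1 << 21) | (rd << 16) | (immediate & 0xFFFF)


def decode_r(word):
    return {"op": (word >> 26) & 0x3F, "rs1": (word >> 21) & 0x1F,
            "rs2": (word >> 16) & 0x1F, "rd": (word >> 11) & 0x1F,
            "func": word & 0x7FF}


def decode_i(word):
    imm = word & 0xFFFF
    if imm & 0x8000:               # sign-extend the 16-bit immediate
        imm -= 0x10000
    return {"op": (word >> 26) & 0x3F, "rs1": (word >> 21) & 0x1F,
            "rd": (word >> 16) & 0x1F, "immediate": imm}


# Placeholder opcode/func values, chosen only to exercise the fields.
r_word = encode_r(op=0, rs1=2, rs2=3, rd=1, func=0x20)
i_word = encode_i(op=8, rs1=2, rd=1, immediate=-3)
```

Because every format keeps Op and rs1 in the same bit positions, a decoder can read the source register before it knows which format it is looking at, which is one benefit of fixed 32-bit formats.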
DLX Operation Overview
• Data transfers
– LB,LBU,SB, LH,LHU,SH, LW,SW
– LF,LD,SF,SD
– MOVI2S,MOVS2I, MOVF,MOVD, MOVFP2I,MOVI2FP
• Arithmetic/logical
– ADD, ADDI, ADDU, ADDUI,SUB,SUBI, SUBU,SUBUI,
MULT,MULTU,DIV,DIVU
– AND,ANDI, OR, ORI, XOR,XORI
– LHI
– SLL,SRL,SRA,SLLI,SRLI,SRAI
– S__, S__I => “__” may be LT,GT,LE,GE,EQ,NE
• Control
– BEQZ,BNEZ, BFPT,BFPF
– J,JR, TRAP, RFE
• Floating point
– ADDD,ADDF,SUBD,SUBF,MULTD,MULTF,DIVD,DIVF
– CVTF2D,CVTF2I,CVTD2F,CVTD2I,CVTI2F,CVTI2D
– __D,__F => “__” may be LT,GT,LE,GE,EQ,NE
Examples of DLX load and store instructions
Instruction           Example            Meaning
load word             LW R1, 30(R2)      R1 =_32 Mem[30+R2]
load word             LW R1, 1000(R0)    R1 =_32 Mem[1000+0]
load byte             LB R1, 40(R3)      R1 =_32 (Mem[40+R3]_0)^24 ## Mem[40+R3]
load byte unsigned    LBU R1, 40(R3)     R1 =_32 0^24 ## Mem[40+R3]
load half word        LH R1, 40(R3)      R1 =_32 (Mem[40+R3]_0)^16 ## Mem[40+R3] ## Mem[41+R3]
load float            LF F0, 50(R3)      F0 =_32 Mem[50+R3]
load double           LD F0, 50(R3)      F0 ## F1 =_64 Mem[50+R3]
store word            SW 500(R4), R3     Mem[500+R4] =_32 R3
store float           SF 40(R3), F0      Mem[40+R3] =_32 F0
store double          SD 40(R3), F0      Mem[40+R3] =_32 F0; Mem[44+R3] =_32 F1
store half            SH 502(R2), R3     Mem[502+R2] =_16 R3_16..31
store byte            SB 41(R3), R2      Mem[41+R3] =_8 R2_24..31
Notation: =_n transfers n bits; x^n replicates x n times; ## concatenates; a subscript selects bits (bit 0 is the most significant).
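The difference between LB, LBU, and LH in the table is how the upper bits of the 32-bit register are filled. The sketch below models memory as a Python `bytes` object and follows the big-endian byte order stated for DLX; the function names and memory contents are assumptions for illustration.

```python
def load_byte(mem: bytes, addr: int) -> int:
    """LB: replicate the byte's sign bit into the upper 24 bits."""
    b = mem[addr]
    return (b | 0xFFFFFF00) if b & 0x80 else b


def load_byte_unsigned(mem: bytes, addr: int) -> int:
    """LBU: fill the upper 24 bits with zeros."""
    return mem[addr]


def load_half(mem: bytes, addr: int) -> int:
    """LH: big-endian 16-bit load, sign-extended to 32 bits."""
    h = (mem[addr] << 8) | mem[addr + 1]
    return (h | 0xFFFF0000) if h & 0x8000 else h


mem = bytes([0x7F, 0x80, 0xFF, 0xFE])   # hypothetical memory contents
signed_byte = load_byte(mem, 1)         # byte 0x80 has its sign bit set
unsigned_byte = load_byte_unsigned(mem, 1)
signed_half = load_half(mem, 2)
```

The returned values are 32-bit register bit patterns, so a sign-extended 0x80 appears as 0xFFFFFF80 rather than as a negative Python integer.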
Examples of DLX arithmetic/logical instructions
Instruction                     Example          Meaning
add                             ADD R1,R2,R3     R1 = R2 + R3
add immediate                   ADDI R1,R2,#3    R1 = R2 + 3
load high immediate             LHI R1,#42       R1 = 42 ## 0^16
shift left logical immediate    SLLI R1,R2,#5    R1 = R2 << 5
set less than                   SLT R1,R2,R3     if (R2<R3) R1 = 1 else R1 = 0
Examples of control-flow instructions
Instruction               Example          Meaning
jump                      J name           PC = name; ((PC+4) - 2^25) <= name < ((PC+4) + 2^25)
jump and link             JAL name         R31 = PC+4; PC = name; ((PC+4) - 2^25) <= name < ((PC+4) + 2^25)
jump and link register    JALR R2          R31 = PC+4; PC = R2
jump register             JR R3            PC = R3
branch equal zero         BEQZ R4, name    if (R4 == 0) PC = name; ((PC+4) - 2^15) <= name < ((PC+4) + 2^15)
branch not equal zero     BNEZ R4, name    if (R4 != 0) PC = name; ((PC+4) - 2^15) <= name < ((PC+4) + 2^15)
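The range conditions in the "Meaning" column follow from the width of the signed offset field: 16 bits for branches and 26 bits for jumps. A minimal sketch of that check, with hypothetical function names and addresses:

```python
def in_range(pc: int, target: int, offset_bits: int) -> bool:
    """Check (PC+4) - 2^(n-1) <= target < (PC+4) + 2^(n-1) for an n-bit signed offset."""
    half = 1 << (offset_bits - 1)
    return (pc + 4) - half <= target < (pc + 4) + half


def branch_reachable(pc: int, target: int) -> bool:
    """BEQZ/BNEZ: 16-bit offset, so the bound is 2^15."""
    return in_range(pc, target, 16)


def jump_reachable(pc: int, target: int) -> bool:
    """J/JAL: 26-bit offset, so the bound is 2^25."""
    return in_range(pc, target, 26)


pc = 0x1000                       # hypothetical address of the branch
near = pc + 4 + 32767             # last target a 16-bit offset can reach
far = pc + 4 + 32768              # one byte beyond the branch range
```

A target such as `far` is out of reach for BEQZ/BNEZ but still within the range of J or JAL; JR and JALR take the full target address from a register and have no such limit.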
Summary, #1
• Classifying Instruction Set Architectures
– Accumulator (1 register):1 address
– Stack:0 address
– General Purpose Register:2 address, 3 address
– Load/Store: 3 address
• General Purpose Registers Dominate
• Data Addressing modes that are important:
Displacement, Immediate, Register Indirect
• Displacement size should be 12 to 16 bits
• Immediate size should be 8 to 16 bits
Summary, #2
• Operations in the Instruction Set :
– Data Movement, Arithmetic, Shift, Logical, Control
– subroutine linkage, interrupt
– synchronization, string, graphics(MMX)
• Methods of Testing Condition
– condition codes
– condition register
– compare and branch
Summary, #3
• DLX Architecture
– Simple load-store architecture
– DLX registers
» 32 32-bit GPRs named R0, R1, ..., R31
» 32 32-bit FPRs named F0, F1, ..., F31 (paired as F0, F2, ..., F30 for 64-bit data)
– Byte addressable in big-endian with 32-bit address