EE-F011 Computer Architecture

EEF011 Computer Architecture
Chapter 2
Instruction Set Principles and Examples
吳俊興
Department of Computer Science and Information Engineering, National University of Kaohsiung
October 2004
Chapter 2. Instruction Set Principles
and Examples
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Addressing Modes for Signal Processing
2.5 Type and Size of Operands
2.6 Operands for Media and Signal Processing
2.7 Operations in the Instruction Set
2.8 Operations for Media and Signal Processing
2.9 Instructions for Control Flow
2.10 Encoding an Instruction Set
2.1 Introduction
Instruction Set Architecture – the portion of the machine
visible to the assembly level programmer or to the
compiler writer
– In order to use the hardware of a computer, we must speak its
language
– The words of a computer language are called instructions, and
its vocabulary is called an instruction set
software / instruction set / hardware (the instruction set is the interface between the two)

Instr. #   Operation+Operands
i          movl -4(%ebp), %eax
(i+1)      addl %eax, (%edx)
(i+2)      cmpl 8(%ebp), %eax
(i+3)      jl L5
:
L5:
Topics
1. A taxonomy of instruction set alternatives and
qualitative assessment
2. Instruction set quantitative measurements
3. Specific instruction set architecture
4. Issues and bearings of languages and
compilers
5. Examples: MIPS and Trimedia TM32 CPU
Appendices C-F:
MIPS, PowerPC, Precision Architecture, SPARC
ARM, Hitachi SH, MIPS 16, Thumb
80x86 (App. D), IBM 360/370 (App. E), VAX (App. F)
2.2 Classifying Instruction Set Architectures
• Operand storage in the CPU: where are operands kept other than in memory?
• Number of explicit operands named per instruction: how many? (min, max, average)
• Addressing mode: how is the effective address of an operand calculated? Can all operands use any mode?
• Operations: what are the options for the opcode?
• Type & size of operands: how is typing done? How is the size specified?
These choices critically affect the number of instructions, CPI, and
CPU cycle time
ISA Classification
• Most basic differentiation: internal storage in a
processor
– Operands may be named explicitly or implicitly
• Major choices:
1. In an accumulator architecture one operand is implicitly
the accumulator => similar to calculator
2. The operands in a stack architecture are implicitly on the
top of the stack
3. The general-purpose register architectures have only
explicit operands – either registers or memory locations
Basic ISA Classes
ISA type: Stack
  Examples: B5500, B6500, HP 3000/70
  Explicit operands per ALU inst.: 0
  Result destination: Stack
  Operand access method: Push & Pop Stack
ISA type: Accumulator
  Examples: Motorola 6809 + ancient ones
  Explicit operands per ALU inst.: 1
  Result destination: Accumulator
  Operand access method: Acc = Acc + mem[A]
ISA type: Register Set
  Examples: IBM 360, DEC VAX + all modern micro’s
  Explicit operands per ALU inst.: 2 or 3
  Result destination: Registers or Memory
  Operand access method: Rx = Ry + mem[A], Rx = Rx + Ry (2), or Rx = Ry + Rz (3)
  Register-register, register-memory, and memory-memory (gone) options
Example
Stack: 0 address
  add              tos ← tos + next
Accumulator: 1 address
  add A            acc ← acc + mem[A]
General Purpose Register (register-memory): 1 address
  add R1, A        R1 ← R1 + mem[A]
  ALU instructions can have two operands.
GPR (register-register, also called load/store): 0 address
  load R1, A       R1 ← mem[A]
  load R2, B       R2 ← mem[B]
  add R3, R1, R2   R3 ← R1 + R2
  ALU instructions can have three operands.
Operand Locations and Code Sequence for C=A+B
Stack
Push A
Push B
Add
Pop C
Accumulator
Load A
Add B
Store C
GPR
(register-memory)
Load R1, A
Add R1, B
Store C, R1
GPR
(load-store)
Load R1, A
Load R2, B
Add R3, R1, R2
Store C, R3
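As a rough illustration of the four code sequences for C = A + B, the following Python sketch interprets each sequence on a toy machine. The `run_*` function names, the dictionary used as memory, and the sample values are assumptions made for this sketch only.

```python
# Toy interpreters for C = A + B on the four ISA classes (illustrative only).

def run_stack(mem):
    stack = []
    stack.append(mem["A"])        # Push A
    stack.append(mem["B"])        # Push B
    right = stack.pop()           # Add: both operands are implicit (top of stack)
    left = stack.pop()
    stack.append(left + right)
    mem["C"] = stack.pop()        # Pop C
    return 4                      # instructions in the sequence

def run_accumulator(mem):
    acc = mem["A"]                # Load A
    acc = acc + mem["B"]          # Add B (accumulator is the implicit operand)
    mem["C"] = acc                # Store C
    return 3

def run_register_memory(mem):
    r1 = mem["A"]                 # Load R1, A
    r1 = r1 + mem["B"]            # Add R1, B (one operand comes from memory)
    mem["C"] = r1                 # Store C, R1
    return 3

def run_load_store(mem):
    r1 = mem["A"]                 # Load R1, A
    r2 = mem["B"]                 # Load R2, B
    r3 = r1 + r2                  # Add R3, R1, R2 (registers only)
    mem["C"] = r3                 # Store C, R3
    return 4

results = {}
for run in (run_stack, run_accumulator, run_register_memory, run_load_store):
    mem = {"A": 2, "B": 3, "C": 0}
    count = run(mem)
    results[run.__name__] = (mem["C"], count)
print(results)
```

All four sequences compute the same result; they differ in instruction count and in how many operands each instruction has to name.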
Pro’s and Con’s
Stack
  Advantages:
  • Simple effective address
  • Short instructions
  • Good code density
  Disadvantages:
  • Lack of random access
  • Efficient code is difficult to generate
  • Stack is often a bottleneck
Accumulator
  Advantages:
  • Minimal internal state
  • Fast context switch
  • Short instructions
  Disadvantages:
  • Very high memory traffic
Register
  Advantages:
  • Registers are faster than memory
  • Registers can be used to hold variables
    + reduce memory traffic
    + speed up programs
  • Registers are more efficient for a compiler to use than other forms of
    internal storage
  Disadvantages:
  • Longer instructions
  • Possibly complex effective address generation
  • Size and structure of register set has many options
Register is the class that won out!
Register Machines
• How many registers are sufficient?
• General-purpose registers vs. special-purpose registers
• compiler flexibility and hand-optimization
• Two major concerns for arithmetic and logical instructions (ALU)
1. Two or three operands
   X + Y → X
   X + Y → Z
2. How many of the operands may be memory addresses (0 – 3)

Number of memory addresses   Max number of operands allowed   Examples
0                            3                                Alpha, ARM, MIPS, PowerPC, Sparc, SuperH, Trimedia TM5200
1                            2                                IBM 360/370, Intel 80x86, Motorola 68000, TI TMS320C54x
2                            2                                VAX, PDP-1, National 32x32, IBM 360SS
3                            3                                VAX
Hence, register classification (# mem, # operands)
(0, 3): Register-Register
ALU is Register to Register – also known as
pure Reduced Instruction Set Computer (RISC)
o Advantages
– simple fixed length instruction encoding
– decode is simple since there are few instruction types
– simple code generation model
– instruction CPI tends to be very uniform
• except for memory instructions of course
• but there are only 2 of them - load and store
o Disadvantages
– instruction count tends to be higher
– some instructions are short - wasting instruction word bits
(1, 2): Register-Memory
Evolved RISC and also old CISC
• new RISC machines capable of doing speculative loads
• predicated and/or deferred loads are also possible
o Advantages
– data access to ALU immediate without loading first
– instruction format is relatively simple to encode
– code density is improved over Register (0, 3) model
o Disadvantages
– operands are not equivalent - source operand may be
destroyed
– need for memory address field may limit # of registers
– CPI will vary
• if memory target is in L0 cache then not so bad
• if not - life gets miserable
(2, 2) or (3, 3): Memory-Memory
True and most complex CISC model
• currently extinct and likely to remain so
• more complex memory actions are likely to appear but not
directly linked to the ALU
o Advantages
– most compact code
– doesn’t waste registers for temporary values
• good idea for use-once data - e.g. streaming media
o Disadvantages
– large variation in instruction size - may need a shoe-horn
– large variation in CPI - i.e. work per instruction
– exacerbates the infamous memory bottleneck
• register file reduces memory accesses if reused
Not used today
2.3 Memory Addressing
Interpreting Memory Addresses
• In today’s machines, objects have byte addresses – an address
refers to the number of bytes counted from the beginning of
memory
• Object Length: Provides access for bytes (8 bits), half words (16
bits), words (32 bits), and double words (64 bits). The type is
implied in opcode (e.g., LDB – load byte; LDW – load word; etc.)
• Byte Ordering
– Little Endian: puts the byte whose address is xx00 at the least significant position in
the word. (7,6,5,4,3,2,1,0)
– Big Endian: puts the byte whose address is xx00 at the most significant position in
the word. (0,1,2,3,4,5,6,7)
• Problem occurs when exchanging data among machines with different orderings
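The byte-ordering problem can be sketched in a few lines of Python. The sample value is arbitrary, and `int.to_bytes` / `int.from_bytes` merely stand in for how a machine would lay a 64-bit word out in memory.

```python
# The same 64-bit value laid out in memory under both byte orders.
value = 0x0102030405060708

little = value.to_bytes(8, "little")  # lowest address holds the least significant byte
big = value.to_bytes(8, "big")        # lowest address holds the most significant byte

# A machine that reads little-endian bytes as if they were big-endian
# sees a different number: this is the data-exchange problem.
misread = int.from_bytes(little, "big")

print(little.hex(), big.hex(), hex(misread))
```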
Interpreting Memory Addresses
• Alignment Issues
– Accesses to objects larger than a byte must be aligned. An access to an
object of size s bytes at byte address A is aligned if A mod s = 0.
  • Misalignment causes hardware complications, since the memory is
  typically aligned on a word or a double-word boundary
  • Misalignment typically results in an alignment fault that must be
  handled by the OS
– Hence
  • byte address is anything - never misaligned
  • half word - even addresses - low order address bit = 0 (XXXXXXX0) else trap
  • word - low order 2 address bits = 0 (XXXXXX00) else trap
  • double word - low order 3 address bits = 0 (XXXXX000) else trap
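The alignment rule A mod s = 0 and its low-order-bits form can be checked with a small sketch. The function names and sample addresses below are hypothetical; the mask version assumes the size is a power of two.

```python
def is_aligned(address, size):
    """Alignment test straight from the definition: A mod s == 0."""
    return address % size == 0

def is_aligned_mask(address, size):
    """Same test using the low-order address bits (size must be a power of two)."""
    return address & (size - 1) == 0

# Sizes in bytes: byte = 1, half word = 2, word = 4, double word = 8.
checks = [(0x1003, 1), (0x1002, 2), (0x1002, 4), (0x1004, 4), (0x1004, 8), (0x1008, 8)]
for address, size in checks:
    print(hex(address), size, is_aligned(address, size))
```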
Figure 2.5
Addressing Modes
How do architectures specify the address of an object they will access?
• Effective address: the actual memory address specified by the addressing mode.
• “->” is for assignment. Mem[R[R1]] refers to the contents of the memory location whose
address is given by the contents of register 1 (R1).
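To make the idea of an effective address concrete, here is a minimal Python sketch of three modes named in this chapter (immediate, register indirect, displacement). The register and memory contents, and the helper names, are made up for illustration.

```python
# Hypothetical machine state for the sketch.
regs = {"R1": 1000}
mem = {1000: 11, 1100: 33}

def immediate_operand(constant):
    # Immediate: the operand is the constant itself; no memory address is formed.
    return constant

def register_indirect_address(base):
    # Register indirect: effective address = Regs[base]
    return regs[base]

def displacement_address(offset, base):
    # Displacement: effective address = offset + Regs[base]
    return offset + regs[base]

print(immediate_operand(3))
print(mem[register_indirect_address("R1")])   # Mem[Regs[R1]]
print(mem[displacement_address(100, "R1")])   # Mem[100 + Regs[R1]]
```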
Figure 2.7
Summary of use of memory addressing modes
Based on a VAX which supported everything – from SPEC89
Displacement Addressing Mode
How big should the displacement be?
Figure 2.8 Displacement values are widely distributed
Displacement Addressing Mode (cont.)
• Benchmarks show 12 bits of displacement would capture
about 75% of the full 32-bit displacements and 16 bits should
capture about 99%
• Remember: optimize for the common case. Hence, the
choice is at least 12-16 bits
• For addresses that do fit in displacement size:
    Add R4, 10000(R0)
• For addresses that don’t fit in displacement size, the compiler
must do the following:
    Load R1, 1000000
    Add R1, R0
    Add R4, 0(R1)
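Whether a displacement fits in the instruction field is a simple range test. The sketch below assumes a signed (two's complement) displacement field, which the slides do not state explicitly; `fits_signed` is a hypothetical helper name.

```python
def fits_signed(value, bits):
    """True if value is representable in a signed field of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

# The two displacements used in the example above, against 12- and 16-bit fields.
for displacement in (10000, 1000000):
    print(displacement, fits_signed(displacement, 12), fits_signed(displacement, 16))
```

Under this assumption 10000 needs a 16-bit field, while 1000000 fits in neither and must be built in a register first.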
Immediate Addressing Mode
• Used where we want to get to a numerical value in an instruction
• Around 20% of the operations have an immediate operand
At high level:        At Assembler level:
a = b + 3;            Load R2, 3
                      Add R0, R1, R2
if ( a > 17 )         Load R2, 17
                      CMPBGT R1, R2
goto Addr             Load R1, Address
                      Jump (R1)
Immediate Addressing Mode
How frequent for immediates?
Figure 2.9 About one-quarter of data transfers and ALU operations have an
immediate operand
Immediate Addressing Mode
How big for immediates?
Figure 2.10 Benchmarks show that 50%-70% of the immediates fit within 8 bits
and 75%-80% fit within 16 bits
2.4 Addressing Modes for Signal Processing
Two addressing modes that distinguish DSPs
1. Modulo or circular addressing mode
– autoincrement/autodecrement to support circular buffers
• As data are added, a pointer is checked to see if it is pointing to the end
of the buffer
– If not, the pointer is incremented to the next address
– If it is, the pointer is set instead to the start of the buffer
2. Bit reverse addressing mode
– the hardware reverses the lower bits of the address, with the
number of bits reversed depending on the step of the FFT
algorithm
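The pointer update for circular addressing can be sketched as follows. The buffer bounds (addresses 100 to 103) and the function name are assumptions for illustration; a DSP does this check in hardware as part of the autoincrement.

```python
def next_circular(pointer, start, end):
    """Autoincrement with wrap-around; `end` is the last address of the buffer."""
    if pointer == end:
        return start        # at the end of the buffer: wrap to the start
    return pointer + 1      # otherwise: step to the next address

trace = []
pointer = 100
for _ in range(6):
    pointer = next_circular(pointer, start=100, end=103)
    trace.append(pointer)
print(trace)
```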
Addressing for Fast Fourier Transform (FFT)
• FFTs start or end their processing with data shuffled in a
particular order
Without special support, such address transformations would take
an extra memory access to get the new address, or involve a fair
amount of logical instructions to transform the address
Index (binary) => Bit-reversed index (binary)
0 (000) => 0 (000)
1 (001) => 4 (100)
2 (010) => 2 (010)
3 (011) => 6 (110)
4 (100) => 1 (001)
5 (101) => 5 (101)
6 (110) => 3 (011)
7 (111) => 7 (111)
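The 3-bit mapping above can be reproduced in software with a short loop, which also shows the "fair amount of logical instructions" needed without hardware support. This is a sketch; `bit_reverse` is a hypothetical helper name.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the lowest bit of index into result
        index >>= 1
    return result

mapping = [bit_reverse(i, 3) for i in range(8)]
print(mapping)
```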
Figure 2.11 Static Frequency of Addressing
Modes for TI TMS320C54x DSP
17 addressing modes, 6 modes also found in Figure 2.6 account for 95% of the
DSP addressing
Summary: Memory Addressing
• A new architecture is expected to support at least:
displacement, immediate, and register indirect
– these represent 75% to 99% of the addressing modes used (Figure 2.7)
• The size of the address for displacement mode should be
at least 12-16 bits
– this captures 75% to 99% of the displacements (Figure 2.8)
• The size of the immediate field should be at least 8-16 bits
– this captures 50% to 80% of the immediates (Figure 2.10)
Desktop and server processors rely on compilers, but
historically DSPs rely on hand-coded libraries
2.5 Type And Size of Operands
How is the type of an operand designated?
• The type of the operand is usually encoded in the opcode
– e.g., LDB – load byte; LDW – load word
• Common operand types: (imply their sizes)
Character (8 bits or 1 byte)
Half word (16 bits or 2 bytes)
Word (32 bits or 4 bytes)
Double word (64 bits or 8 bytes)
Single precision floating point (4 bytes or 1 word)
Double precision floating point (8 bytes or 2 words)
• Characters are almost always in ASCII
• 16-bit Unicode (used in Java) is gaining popularity
• Integers are two’s complement binary
• Floating points follow the IEEE standard 754
• Some architectures support packed decimal: 4 bits are used to
encode the values 0-9; 2 decimal digits are packed into each byte
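Packed decimal is easy to demonstrate: each byte carries two decimal digits, one per 4-bit nibble. The sketch below handles non-negative integers only and omits any sign nibble, which is a simplifying assumption; the function names are hypothetical.

```python
def pack_bcd(number):
    """Pack a non-negative integer, two decimal digits per byte."""
    digits = str(number)
    if len(digits) % 2:
        digits = "0" + digits          # pad to an even number of digits
    return bytes((int(digits[i]) << 4) | int(digits[i + 1])
                 for i in range(0, len(digits), 2))

def unpack_bcd(raw):
    """Recover the integer from its packed decimal bytes."""
    value = 0
    for byte in raw:
        value = value * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    return value

print(pack_bcd(1234).hex(), unpack_bcd(pack_bcd(907)))
```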
Figure 2.12 Distribution of data accesses
by size for the benchmark programs
SPEC2000 Operand Sizes
The double-word data type is used for double-precision floating point in
floating-point programs and for addresses
2.6 Operands for Media and Signal Processing
• Vertex
– A common 3D data type dealt with in graphics applications
– four components: (x, y, z) and w=color or hidden surfaces
– vertex values are usually 32-bit floating-point values
– Three vertices specify a graphics primitive such as a triangle
• Pixel
– Typically 32 bits, consisting of four 8-bit channels
• R (red), G (green), B (blue), and A (attribute: e.g. transparency)
• DSPs add fixed point
– fractions between -1 and +1 (divide by 2^(n-1))
• Blocked floating point
– a block of variables with common exponent
– accumulators, registers that are wider to guard against
round-off error to aid accuracy in fixed-point arithmetic
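The fixed-point interpretation "divide by 2^(n-1)" can be sketched for n = 16 as below. The function names are hypothetical, and clamping out-of-range inputs to the nearest representable value is an assumption of this sketch.

```python
def to_fixed(x, bits=16):
    """Encode a fraction in [-1, +1) as an n-bit two's complement integer."""
    scale = 1 << (bits - 1)
    raw = round(x * scale)
    return max(-scale, min(scale - 1, raw))   # clamp to the representable range

def from_fixed(raw, bits=16):
    """Interpret an n-bit integer as a fraction: divide by 2^(n-1)."""
    return raw / (1 << (bits - 1))

print(to_fixed(0.5), from_fixed(16384), to_fixed(1.0))
```

Note that +1 itself is not representable: the largest 16-bit value is 32767/32768.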
Size of Data operands for DSP
Figure 2.13 Four generations of DSPs, their data width, and the width of the registers
that reduces round-off error
Figure 2.14 Size of data operands for TMS320C540x DSP. This DSP has two 40-bit
accumulators and no floating-point operations.
Brief Summary
• Review instruction set classes
– choose register-register class
• Review memory addressing
– select displacement, immediate, and register indirect
addressing modes
• Select the operand sizes and types
2.7 Operations in the Instruction Set
Figure 2.15 Categories of instruction operators and examples of each.
• All computers generally provide a full set of operations for the first three categories
• All computers must have some instruction support for basic system functions
• Graphics instructions typically operate on many smaller data items in parallel
Figure 2.16 Top 10 instructions for the 80x86
• Simple instructions dominate this list and are responsible for 96% of the instructions
executed
• These percentages are the average of the five SPECint92 programs
2.8 Operations for Media & Signal Processing
• Data for multimedia operations is often narrower than the
64-bit data word
– normally in single precision, not double precision
• Single-instruction multiple-data (SIMD) or vector
instructions
– A partitioned add operation on 16-bit data with a 64-bit ALU
would perform four 16-bit adds in a single clock cycle
• Hardware cost: prevent carries between the four 16-bit partitions of the
ALU
– Two 32-bit floating-point operations (paired single operations)
• The two partitions must be insulated to prevent operations on one half
from affecting the other
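A partitioned add can be imitated in software to show what "preventing carries between partitions" means. This sketch splits two 64-bit values into four 16-bit lanes; the function name and the sample operands are illustrative assumptions.

```python
MASK64 = (1 << 64) - 1

def partitioned_add16(a, b):
    """Add four independent 16-bit lanes packed into 64-bit values."""
    result = 0
    for lane in range(4):
        shift = 16 * lane
        x = (a >> shift) & 0xFFFF
        y = (b >> shift) & 0xFFFF
        # Masking each lane's sum drops the carry instead of passing it on.
        result |= ((x + y) & 0xFFFF) << shift
    return result

a = 0x0001_FFFF_0003_0004
b = 0x0001_0001_0001_0001
partitioned = partitioned_add16(a, b)
ordinary = (a + b) & MASK64   # a plain 64-bit add lets the carry leak into the next lane
print(hex(partitioned), hex(ordinary))
```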
Figure 2.17 Summary of multimedia support for desktop RISCs
• B: byte (8 bits), H: half word (16 bits), W: word (32 bits), 8B: operation on 8 bytes in a single
instruction
• All are fixed-width operations, performing multiple narrow operations on either a 64-bit or
128-bit ALU
Multimedia Operations for DSPs
• DSP architectures use saturating arithmetic
– If the result is too large to be represented, it is set to the largest
representable number
– There is not an option of causing an exception on arithmetic overflow
• Prevent missing an event in real-time applications
• The result will be used no matter what the inputs
• There are several modes to round the wider accumulators into the
narrower data words
• The targeted kernels for DSPs accumulate a series of products,
and hence have a multiply-accumulate (MAC) instruction
– MACs are key to dot product operations for vector and matrix multiplies
Finite Impulse Response (FIR) Problem
y[n] = Σ c[k] * x[n-k], summed over k = 0 .. N-1
In C:
y[n] = 0;
for (k = 0; k < N; k++)
    y[n] = y[n] + c[k] * x[n-k];
General form:
x = x + y * z
IBM PowerPC 440 MAC instruction:
macchw RT, RA, RB
where RT = x, RA = y, RB = z
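The two DSP ideas on this slide, saturating arithmetic and multiply-accumulate, can be sketched together. The helper names are hypothetical, the 16-bit width is an assumption, and clamping negative overflow to the most negative value is the usual convention rather than something stated above.

```python
def saturate(value, bits=16):
    """Clamp to the signed n-bit range instead of wrapping or raising an exception."""
    highest = (1 << (bits - 1)) - 1
    lowest = -(1 << (bits - 1))
    return max(lowest, min(highest, value))

def mac(x, y, z):
    """General multiply-accumulate form: x = x + y * z."""
    return x + y * z

def fir(c, x, n):
    """y[n] = sum over k of c[k] * x[n-k], built from repeated MACs."""
    acc = 0
    for k in range(len(c)):
        acc = mac(acc, c[k], x[n - k])
    return acc

print(saturate(40000), saturate(-40000), fir([1, 2, 3], [4, 5, 6, 7], 3))
```

In this sketch the accumulator is an unbounded Python integer, standing in for the wider accumulator registers described earlier; `saturate` would be applied when the result is narrowed to a data word.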
Figure 2.18 Mix of instructions for TMS320C540x DSP
• The 16-bit architecture uses two 40-bit
accumulators, 8 address registers, no
floating-point operations (fixed point
instead), plus a stack for passing
parameters to library routines and for
saving return addresses
• 15% to 20% of the multiplies and
MACs round the final sum (not shown)
2.9 Instructions for Control Flow
• Control instructions change the flow of control:
instead of executing the next instruction, the
program branches to the address specified in the
branching instructions
• They are a big deal
  • Primarily because they are difficult to optimize out
  • AND they are frequent
• Four types of control instructions
  • Conditional branches
  • Jumps – unconditional transfer
  • Procedure calls
  • Procedure returns
Control Flow Instructions
• Issues:
  • Where is the target address? How to specify it?
  • Where is return address kept? How are the arguments passed? (calls)
  • Where is return address? How are the results passed? (returns)
• Figure 2.19 Breakdown
Addressing Modes for Control Flow Instructions
• PC-relative (Program Counter)
– supply a displacement added to the PC
• Known at compile time for jumps, branches, and calls (specified
within the instruction)
– the target is often near the current instruction
• requiring fewer bits
• independently of where it is loaded (position independence)
• Register indirect addressing – dynamic addressing
– The target address may not be known at compile time
– Naming a register that contains the target address
• Case or switch statements
• Virtual functions or methods
• Higher-order functions or function pointers
• Dynamically shared libraries
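Position independence of PC-relative branches can be shown with a little arithmetic. The sketch assumes 4-byte instructions and a displacement measured from the address of the next instruction (PC+4, the convention used for MIPS later in these slides); the addresses and function names are made up.

```python
def branch_displacement(branch_pc, target, instr_bytes=4):
    """Displacement the compiler stores in the instruction."""
    return target - (branch_pc + instr_bytes)

def branch_target(branch_pc, displacement, instr_bytes=4):
    """Address the hardware computes when the branch is taken."""
    return branch_pc + instr_bytes + displacement

disp = branch_displacement(0x1000, 0x1010)
original = branch_target(0x1000, disp)
# Load the same code 0x4000 bytes higher: the stored displacement still works.
relocated = branch_target(0x1000 + 0x4000, disp)
print(disp, hex(original), hex(relocated))
```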
Figure 2.20 Branch distances
These measurements were taken on a load-store computer (Alpha architecture)
with all instructions aligned on word boundaries
Conditional Branch Options
Figure 2.21 Major methods for evaluating branch conditions
Figure 2.22 Comparison Type vs. Frequency
• Most loops go from 0 to n.
• Most backward branches are loops – taken about 90%

Program   % backward branches   % all control instructions that modify PC
gcc       26%                   63%
spice     31%                   63%
TeX       17%                   70%
Average   25%                   65%
Repeat Instruction for DSP
• DSPs: add looping structure, called a repeat
instruction, to avoid loop overhead
• It allows a single instruction or a block of instructions
to be repeated up to, say, 256 times
• eg. TMS320C54 dedicates three special registers to
hold the block starting address, ending address, and
repeat counter
Procedure Invocation Options
•Procedure calls and returns
– control transfer
– state saving; the return address must be saved
Newer architectures require the compiler to generate stores and
loads for each register saved and restored
•Two basic conventions in use to save registers
– caller saving: the calling procedure must save the registers that
it wants preserved for access after the call
• the called procedure need not worry about registers
– callee saving: the called procedure must save the registers it
wants to use
• leaving the caller unrestrained
Most real systems today use a combination of the two
mechanisms
• specified in an application binary interface (ABI) that sets down the basic
rules as to which registers should be caller saved and which should be callee
saved
2.10 Encoding an Instruction Set
• Opcode: specifying the operation
• # of operands
– addressing mode
– address specifier: tells what addressing
mode is used
– Load-store computer
• Only one memory operand
• Only one or two addressing modes
• Encoding issues
1. The desire to have as many registers and
addressing modes as possible
2. The impact of the size of the register and
addressing mode fields on the average
instruction size and hence on the average
program size
3. A desire to have instructions encoded into
lengths that will be easy to handle in a
pipelined implementation
Figure 2.23 Three basic variations
in instruction encoding
• The length of 80x86 instructions
varies between 1 and 17 bytes
• Trade-off: size of programs vs.
ease of decoding
Instruction formats for desktop/server RISC architectures
Reduced Code Size in RISCs
• Hybrid encoding – support 16-bit and 32-bit instructions
in RISC, eg. ARM Thumb and MIPS 16
– narrow instructions support fewer operations, smaller address
and immediate fields, fewer registers, and two-address format
rather than the classic three-address format
– claim a code size reduction of up to 40%
• Compression in IBM’s CodePack
– Adds hardware to decompress instructions as they are fetched
from memory on an instruction cache miss
– The instruction cache contains full 32-bit instructions, but
compressed code is kept in main memory, ROMs, and the disk
• Hitachi’s SuperH: fixed 16-bit format
– 16 rather than 32 registers
– fewer instructions
2.11 The Role of Compilers
• Today almost all programming is done in high-level languages. As
such, an ISA is essentially a compiler target.
• Because performance of a computer will be significantly affected by
the compiler, understanding compiler technology today is critical to
designing and efficiently implementing an instruction set.
• Compiler goals:
  • All correct programs execute correctly
  • Most compiled programs execute fast (optimizations)
  • Fast compilation
  • Debugging support
Typical Modern Compiler Structure
C, Fortran, or Cobol, etc.
Object Code
Optimization Types
• High level – done at or near source code level
  • If procedure is called only once, put it in-line and save CALL
  • more general case: if call-count < some threshold, put them in-line
• Local – done within straight-line code
  • common sub-expressions produce same value – either allocate a
  register or replace with single copy
  • constant propagation – replace constant valued variable with the
  constant
  • stack height reduction – re-arrange expression tree to minimize
  temporary storage needs
• Global – across a branch
  • copy propagation – replace all instances of a variable A that has been
  assigned X (i.e., A=X) with X.
  • code motion – remove code from a loop that computes same value each
  iteration of the loop and put it before the loop
  • simplify or eliminate array addressing calculations in loops
Optimization Types
• Machine-dependent optimizations – based on machine knowledge
  • strength reduction – replace multiply by a constant with shifts and adds
    • would make sense if there was no hardware support for MUL
    • a trickier version: 17 × x = (x arithmetic left shifted by 4) + x
  • pipeline scheduling – reorder instructions to improve pipeline
  performance
  • dependency analysis
  • branch offset optimization - reorder code to minimize branch offsets
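Strength reduction of a multiply by a constant can be sketched in Python. `times17` is the specific case mentioned above; `multiply_by_constant` is a hypothetical generalization that handles any non-negative constant by adding one shifted copy of x per set bit.

```python
def times17(x):
    """17 * x rewritten as a shift and an add: (x << 4) + x."""
    return (x << 4) + x

def multiply_by_constant(x, constant):
    """Multiply by a non-negative constant using only shifts and adds."""
    result = 0
    shift = 0
    while constant:
        if constant & 1:
            result += x << shift   # add x * 2^shift for each set bit of the constant
        constant >>= 1
        shift += 1
    return result

print(times17(5), multiply_by_constant(7, 10))
```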
Compiler Optimizations – Change in IC
• L0 – unoptimized
• L1 – local opts, code scheduling, & local reg. allocation
• L2 – global opts and loop transformations, & global reg. allocation
• L3 – procedure integration
The Impact of Compiler Technology
How are variables allocated and addressed?
How many registers are needed to allocate variables appropriately?
• Three separate areas for data allocation
  • Stack
  – Used to allocate local variables
  – Grown and shrunk on calls and returns
  – Addressing is relative to the stack pointer
  • Global data area
  – Used to allocate statically declared objects, such as global variables and
  constants
  – A large percentage of these objects are aggregate data structures such as
  arrays
  • Heap
  – Used to allocate dynamic objects
  – Accesses are usually by pointers
  – Data is typically not scalars (single variables)
Register Allocation Problem
• Reasonably simple for stack-allocated objects
  • Done with graph coloring: variable – vertex; dependency
  between variables – edge; # registers will be equal to # colors
• Essentially impossible for heap-allocated objects because they are
accessed with pointers
• Hard for global variables and some static variables because of possible
aliasing
• There are multiple ways to refer to the address of a variable b
p ← &b
b ← 2
*p ← 3
...b...
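The graph-coloring view of register allocation can be illustrated with a greedy sketch. This is a simple heuristic, not the algorithm any particular compiler uses; the variable names, the example graph, and the function name are assumptions for illustration.

```python
def color_graph(interference, num_registers):
    """Greedy coloring: give each variable a register unused by its neighbors.

    Returns a {variable: register} map, or None if some variable cannot be
    colored (a real allocator would spill it to memory instead).
    """
    assignment = {}
    # Color the most constrained (highest-degree) variables first.
    order = sorted(interference, key=lambda v: len(interference[v]), reverse=True)
    for var in order:
        used = {assignment[n] for n in interference[var] if n in assignment}
        free = [r for r in range(num_registers) if r not in used]
        if not free:
            return None
        assignment[var] = free[0]
    return assignment

# An edge means the two variables are live at the same time.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(color_graph(graph, 3), color_graph(graph, 2))
```

With three registers every variable gets one; with two, the triangle a-b-c cannot be colored and the sketch reports failure.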
How do you help the compiler writer?
• Key: make the frequent cases fast and the rare case correct
• Guidelines that will make it easy to write a compiler
– Regularity
  Addressing modes, operations, and data types should be
  independent of each other
– Provide primitives, not solutions
  What works in one language may be detrimental to others, so don’t
  optimize for one particular language
– Simplify trade-offs among alternatives
  Anything that makes code sequence performance obvious is a
  definite win!
– Provide instructions that bind the quantities known at compile
time as constants
2.12 The MIPS64 Architecture
A Typical General-Purpose Load-Store Architecture
• Registers
  • 32 64-bit integer registers (R0:R31)
  – R0 is always equal to 0 and the rest are general purpose
  • 32 FP registers (F0:F31)
  – Each holds a single 32-bit or a single 64-bit float
• Data Types
  • byte, half word, word, double word
• Instructions
  • 32 bits long
  • Fixed length: 4 bytes, 3 instruction types ==> easy decode
  • Overall goal: simple ==> pipeline ease ==> performance
MIPS64 Operations
• Key features
  • addressing modes for data
  Displacement (size 12-16 bits)
  Immediate (size 8-16 bits)
  Register indirect: placing 0 in the 16-bit displacement field
  Absolute addressing: using R0 as the base register
  • conditional branches
  Test the register source for zero or nonzero (compare equal); or
  Compare 2 registers (compare less)
  If satisfied, then jump to PC+4+immediate (size 8 bits long)
  • jump, call, and return
  • floats - usual set of operations
  .S for single precision (32-bit)
  .D for double precision (64-bit)
Instruction Format
Fixed length with 3 types
NOTE:
• 16 bit immediate: immediate data and PC-relative offset for short jumps and
conditional branches.
• 26 bit PC-relative offset for calls and returns (regular or traps) and
long jumps.
MIPS64 Operations (load and store)
Example instruction   Instruction name    Meaning
LD R1, 30(R2)         Load double word    Regs[R1] ←_64 Mem[30+Regs[R2]]
LD R1, 1000(R0)       Load double word    Regs[R1] ←_64 Mem[1000+0]
LW R1, 30(R2)         Load word           Regs[R1] ←_64 (Mem[30+Regs[R2]]_0)^32 ## Mem[30+Regs[R2]]
LB R1, 30(R2)         Load byte           Regs[R1] ←_64 (Mem[30+Regs[R2]]_0)^56 ## Mem[30+Regs[R2]]
L.S F0, 30(R2)        Load FP single      Regs[F0] ←_64 Mem[30+Regs[R2]] ## 0^32
L.D F0, 30(R2)        Load FP double      Regs[F0] ←_64 Mem[30+Regs[R2]]
SD R3, 500(R4)        Store double word   Mem[500+Regs[R4]] ←_64 Regs[R3]
SW R3, 500(R4)        Store word          Mem[500+Regs[R4]] ←_32 Regs[R3]
SH R3, 500(R4)        Store half word     Mem[500+Regs[R4]] ←_16 Regs[R3]_48..63
SB R3, 500(R4)        Store byte          Mem[500+Regs[R4]] ←_8 Regs[R3]_56..63
S.S F0, 500(R4)       Store FP single     Mem[500+Regs[R4]] ←_32 Regs[F0]_0..31
S.D F0, 500(R4)       Store FP double     Mem[500+Regs[R4]] ←_64 Regs[F0]

Illustration legends (subscripts are written with "_" and superscripts with "^"):
• A subscript is appended to the symbol ← whenever the length of the datum being
transferred might not be clear. Thus, ←_n means transfer an n-bit quantity.
• A subscript is used to indicate selection of a bit from a field. Bits are labeled
from the most significant bit starting at 0.
(Mem[30+Regs[R2]]_0)^32 : replicating the sign bit 32 times
• The variable Mem, used as an array that stands for main memory, is indexed by a
byte address and may transfer any number of bytes.
• A superscript is used to replicate a field.
• The symbol ## is used to concatenate two fields and may appear on either side
of a data transfer.
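The sign-replication notation used for LW and LB (replicate bit 0 of the loaded value, then concatenate it in front) is ordinary sign extension. A minimal Python sketch follows, with hypothetical function names and a 64-bit register width taken from the MIPS64 description.

```python
MASK64 = (1 << 64) - 1

def sign_extend(value, from_bits):
    """Replicate the sign bit of a from_bits-wide value up to 64 bits."""
    sign_bit = (value >> (from_bits - 1)) & 1
    # (sign bit)^(64 - from_bits): all ones if negative, all zeros otherwise.
    upper = ((1 << (64 - from_bits)) - 1) if sign_bit else 0
    low = value & ((1 << from_bits) - 1)
    return ((upper << from_bits) | low) & MASK64   # upper ## low

def load_word(word32):
    return sign_extend(word32, 32)   # LW: 32 copies of the sign bit ## the word

def load_byte(byte8):
    return sign_extend(byte8, 8)     # LB: 56 copies of the sign bit ## the byte

print(hex(load_word(0x80000000)), hex(load_byte(0x7F)), hex(load_byte(0xFF)))
```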
MIPS64 Operations (A/L w/o immediate)
Example instruction   Instruction name       Meaning
DADDU R1, R2, R3      Add unsigned           Regs[R1] ← Regs[R2]+Regs[R3]
DADDI R1, R1, #-3     Add immediate          Regs[R1] ← Regs[R1]-3
LUI R1, #88           Load upper immediate   Regs[R1] ← 0^32 ## 88 ## 0^16
DSLL R1, R2, #3       Shift left logical     Regs[R1] ← Regs[R2]<<3
DSLT R1, R2, R3       Set less than          If (Regs[R2]<Regs[R3]) Regs[R1] ← 1 else Regs[R1] ← 0
<<: logical shift left; 0^n: n zero bits; ##: concatenation
MIPS64 Operations (flow control)
Example instruction   Instruction name           Meaning
J name                Jump                       PC ← name; ((PC+4)–2^25) ≤ name < ((PC+4)+2^25)
JAL name              Jump and link              Regs[R31] ← PC+4; PC ← name; ((PC+4)–2^25) ≤ name < ((PC+4)+2^25)
JALR R2               Jump and link register     Regs[R31] ← PC+4; PC ← Regs[R2]
JR R3                 Jump register              PC ← Regs[R3]
BEQZ R4, name         Branch equal zero          If (Regs[R4] == 0) PC ← name; ((PC+4)–2^15) ≤ name < ((PC+4)+2^15)
BNE R3, R4, name      Branch not equal           If (Regs[R3] != Regs[R4]) PC ← name; ((PC+4)–2^15) ≤ name < ((PC+4)+2^15)
MOVZ R1, R2, R3       Conditional move if zero   If (Regs[R3] == 0) Regs[R1] ← Regs[R2]
Summary
Chapter 2 Instruction Set Principles and Examples
2.1 Introduction
2.2 Classifying Instruction Set Architectures
2.3 Memory Addressing
2.4 Addressing Modes for Signal Processing
2.5 Type and Size of Operands
2.6 Operands for Media and Signal Processing
2.7 Operations in the Instruction Set
2.8 Operations for Media and Signal Processing
2.9 Instructions for Control Flow
2.10 Encoding an Instruction Set
2.11 The Role of Compilers
2.12 The MIPS Architecture