No Slide Title - VLSI Laboratory

Chapter 2:Instruction Set Principles and Examples
Rung-Bin Lin
2-1
Chapter 2. Instruction Set Principles and
Examples
• Topics
– Present a taxonomy of instruction set alternatives
– Analyze some instruction set measurements
– Discuss instruction set architectures not aimed at desktops or
servers: DSPs and media processors
– Address the issue of languages and compilers
– Give an overview of the MIPS architecture
Classification of Instruction Set Architectures
• The type of internal storage in the CPU is the most
basic differentiation.
– The major choices are a stack, an accumulator, or a set of
registers. Operands may be explicit or implicit.
• Stack architecture : Early machines
• Accumulator architecture : Early machines
• General purpose register (GPR) architecture : machines after
1980.
Types of Machines
Code Sequence for C=A+B
Stack        Accumulator    Register-memory    Register-register
Push A       Load A         Load R1, A         Load R1, A
Push B       Add B          Add R1, B          Load R2, B
Add          Store C        Store C, R1        Add R3, R1, R2
Pop C                                          Store C, R3
Memory Access for GPR Machines
– Two ways to access explicit operands
• First loaded into temporary storage
• Accessed directly from memory
– Memory access for register machine
• Register-memory architecture: one can access memory as part of any
instruction.
• Register-register or load-store architecture: memory access only by load
or store instructions.
– Memory access for memory-memory architecture
– Reasons for emergence of general-purpose register (GPR) machines
• Registers are faster than memory
• Registers are easier for a compiler to use and can be used more effectively
than other forms of internal storage.
– Example: how is (A*B)-(C*D)-(E*F) evaluated on a stack machine? On a GPR machine?
• Registers can be used to hold variables: this reduces memory traffic, improves
code density, and speeds up the program.
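The example question above can be explored with a small sketch. The two toy interpreters below are hypothetical (the instruction names Push, Pop, Mul, Sub, Load, Store and the result variable X are assumptions made for illustration, not part of any real ISA). The stack program must interleave pushes and operations in one fixed order, while the register program is free to compute the three products in any order before subtracting.

```python
# Toy stack machine and toy load-store GPR machine (both hypothetical).
MEM = {"A": 2, "B": 3, "C": 4, "D": 5, "E": 6, "F": 7}

def run_stack(program, mem):
    """Run a stack program; ALU operands are implicit (top of stack)."""
    stack = []
    for op, *args in program:
        if op == "Push":
            stack.append(mem[args[0]])
        elif op == "Pop":
            mem[args[0]] = stack.pop()
        else:  # "Mul" or "Sub" on the top two stack entries
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right if op == "Mul" else left - right)
    return mem

def run_gpr(program, mem):
    """Run a register-register program; ALU operands are explicit registers."""
    regs = {}
    for op, *args in program:
        if op == "Load":
            regs[args[0]] = mem[args[1]]
        elif op == "Store":
            mem[args[0]] = regs[args[1]]
        else:  # "Mul" or "Sub" with destination and two sources
            dst, s1, s2 = args
            regs[dst] = regs[s1] * regs[s2] if op == "Mul" else regs[s1] - regs[s2]
    return mem

STACK_PROG = [
    ("Push", "A"), ("Push", "B"), ("Mul",),
    ("Push", "C"), ("Push", "D"), ("Mul",),
    ("Sub",),
    ("Push", "E"), ("Push", "F"), ("Mul",),
    ("Sub",),
    ("Pop", "X"),
]

GPR_PROG = [
    ("Load", "R1", "A"), ("Load", "R2", "B"), ("Mul", "R1", "R1", "R2"),
    ("Load", "R2", "C"), ("Load", "R3", "D"), ("Mul", "R2", "R2", "R3"),
    ("Load", "R3", "E"), ("Load", "R4", "F"), ("Mul", "R3", "R3", "R4"),
    ("Sub", "R1", "R1", "R2"), ("Sub", "R1", "R1", "R3"),
    ("Store", "X", "R1"),
]

stack_result = run_stack(STACK_PROG, dict(MEM))["X"]
gpr_result = run_gpr(GPR_PROG, dict(MEM))["X"]
```

Both programs produce the same value; the difference is that the three multiplies in GPR_PROG could be reordered freely, which is the flexibility a compiler can exploit.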
Types of GPR Machines
• Two major instruction set characteristics divide
GPR architectures:
• Whether an ALU instruction has two or three operands.
• How many of the operands may be memory addresses in an ALU
instruction.
• Example (fig. 2.3, possible combinations)
Number of memory    Maximum number of    Examples
addresses           operands allowed
0                   3                    SPARC, MIPS, PA, PowerPC, Alpha
1                   2                    Intel 80x86, Motorola 68000
2                   2                    VAX
3                   3                    VAX
Advantages & Disadvantages of GPR Machines
The notation (m, n) means m memory operands out of n total operands.
Type                            Advantages                     Disadvantages
Register-register (0,3)         Simple, fixed inst. length,    Poor code density
                                similar clocks to execute
Register-memory (1,2)           Easy to encode,                Operands destroyed,
                                good code density              CPI varies
Memory-memory (2,2) or (3,3)    Most compact                   Inst. length and CPI
                                                               vary greatly
Memory Addressing
• How is a memory address interpreted?
– Byte addressed: Provide access for bytes, half words,
words, and double words (64 bits)
– Conventions for ordering the bytes within a word:
• Little Endian: the byte whose address is xxxx00 is placed at the LSB position.
Word address    Data (MSB ... LSB)
0               3 2 1 0
4               7 6 5 4
• Big Endian: the byte whose address is xxxx00 is placed at the MSB position.
Word address    Data (MSB ... LSB)
0               0 1 2 3
4               4 5 6 7
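As a small illustration of the two byte orders, the sketch below uses Python's built-in int.to_bytes to list the bytes of one 32-bit word in increasing address order. The helper name store_word and the sample value are assumptions chosen for the example.

```python
def store_word(value, byteorder):
    """Return the 4 bytes of a 32-bit word in increasing address order."""
    return list(value.to_bytes(4, byteorder=byteorder))

WORD = 0x0A0B0C0D  # 0x0A is the most-significant byte, 0x0D the least

little = store_word(WORD, "little")  # lowest address holds the LSB
big = store_word(WORD, "big")        # lowest address holds the MSB

# Reading little-endian bytes back with the wrong convention scrambles the value.
misread = int.from_bytes(bytes(little), "big")
```

This is why byte order matters when two machines with different conventions exchange binary data.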
Address Alignment
• Access to objects larger than a byte must be aligned.
• An access to an object of size S bytes at byte
address A is aligned if A mod S =0.
– Fig. 2.5: aligned and misaligned accesses
Object addressed    Aligned at byte offsets    Misaligned at byte offsets
Byte                0,1,2,3,4,5,6,7            Never
Half word           0,2,4,6                    1,3,5,7
Word                0,4                        1,2,3,5,6,7
Double word         0                          1,2,3,4,5,6,7
– A misaligned memory access will take multiple aligned
memory references
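The rule A mod S = 0 can be checked directly. The sketch below (function names are assumptions for illustration) lists the aligned byte offsets within an 8-byte span for each object size.

```python
def is_aligned(address, size):
    """An access of `size` bytes at byte address `address` is aligned if address mod size == 0."""
    return address % size == 0

def aligned_offsets(size, span=8):
    """List the byte offsets in [0, span) at which an object of `size` bytes is aligned."""
    return [offset for offset in range(span) if is_aligned(offset, size)]

half_word = aligned_offsets(2)
word = aligned_offsets(4)
double_word = aligned_offsets(8)
```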
Addressing Mode
• How do architectures specify the address of an object
they will access?
– In a GPR machine, an addressing mode can specify
• a constant,
• a register,
• a location in memory (the address computed by the mode is the effective address).
– Immediates or literals are usually considered memory
addressing modes.
– Addressing modes that depend on the program counter are
called PC-relative addressing.
– Addressing modes can significantly reduce instruction
counts, but may add to the complexity of building a
machine and increase the average CPI.
Addressing Modes for Desktops and Servers
Addressing mode      Example instruction
Register             ADD R4, R3
Immediate            ADD R4, #3
Displacement         ADD R4, 100(R1)
Register Indirect    ADD R4, (R1)
Indexed              ADD R3, (R1+R2)
Direct (Absolute)    ADD R1, (1001)
Memory Indirect      ADD R1, @(R3)
Autoincrement        ADD R1, (R2)+
Autodecrement        ADD R1, -(R2)
Scaled               ADD R1, 100(R2)[R3]
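A rough sketch of how the memory-referencing modes above form an effective address. The function name, the dictionary-based register file and memory, and the sample values are all assumptions for illustration. Autoincrement and autodecrement are left out because they also modify the base register.

```python
def effective_address(mode, regs, mem, base=None, index=None, disp=0, scale=1):
    """Return the memory address an operand refers to under the given mode."""
    if mode == "displacement":       # 100(R1)
        return disp + regs[base]
    if mode == "register_indirect":  # (R1)
        return regs[base]
    if mode == "indexed":            # (R1+R2)
        return regs[base] + regs[index]
    if mode == "direct":             # (1001)
        return disp
    if mode == "memory_indirect":    # @(R3): memory holds the operand's address
        return mem[regs[base]]
    if mode == "scaled":             # 100(R2)[R3]; scale = operand size in bytes
        return disp + regs[base] + regs[index] * scale
    raise ValueError(f"unknown addressing mode: {mode}")

regs = {"R1": 200, "R2": 8, "R3": 4}   # hypothetical register contents
mem = {4: 1000}                        # hypothetical memory: address 4 holds 1000

ea_displacement = effective_address("displacement", regs, mem, base="R1", disp=100)
ea_scaled = effective_address("scaled", regs, mem, base="R2", index="R3", disp=100, scale=4)
```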
Addressing Mode Usage (VAX)
Displacement Addressing Mode (Alpha,
SPEC CPU2000)
Immediate or Literal Addressing Mode
Distribution of Immediate Values (Alpha,
SPEC CPU2000)
Addressing Mode for DSPs
• 95% of DSP address references use the same addressing
modes as desktops and servers
• Circular (modulo) addressing mode
– Uses a starting address register and an ending address register
• Bit-reverse addressing mode
– Reverses the bit order in an address register
• For a 3-bit address: 6 (110) -> 3 (011)
• Used for the Fast Fourier Transform (FFT)
• Bit-reverse and circular addressing account for only 5% of DSP
address references
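Both DSP modes are simple address transformations. The sketch below is a software model only (the function names and the buffer bounds are assumptions): bit_reverse mirrors the low bits of an index, and circular_next advances an address inside a buffer and wraps from the ending address back to the starting address.

```python
def bit_reverse(index, width):
    """Reverse the low `width` bits of `index`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def circular_next(address, start, end, step=1):
    """Advance `address` by `step` inside the buffer [start, end], wrapping around."""
    length = end - start + 1
    return start + (address - start + step) % length

# Access order an 8-point FFT would use with 3-bit bit-reversed addressing.
fft_order = [bit_reverse(i, 3) for i in range(8)]

# Stepping past the end of a 4-entry buffer at addresses 4..7 wraps to 4.
wrapped = circular_next(7, start=4, end=7)
```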
Type and Size of Operands
• How is the type of an operand designated?
– Encoding it in the OpCode
– Annotated with tags
• For desktops(servers) and DSPs
Operand Types for Desktops and Servers
– Character, half word, word, single-precision floating
point, double-precision floating point.
– Distribution of data accesses by size
Operands for Media and Signal Processing
• Vertex for graphic operations
– A vertex has four components
• X-coordinate, Y-coordinate, Z-coordinate, and W-coordinate (to
help with color or hidden surface)
• Pixel for image processing
– A pixel typically has 32 bits, which are divided into four 8-bit channels.
• R (red), G (green), B (blue), and A (denotes the transparency of the
surface or the pixel)
• Fixed-point operand type (in addition to floating
point)
• Data widths
– 32 bits, 24 bits and 16 bits are common (fig. 2.13)
Operations in the Instruction Set
• Desktops and servers
– ALU, Data transfer, control, system, floating point,
decimal, string, graphics.
– SIMD instructions for media and signal processing (fig.
2.17)
– Ten most used 80x86 instructions (account for 96% of the total
inst.)
• Load (22%), conditional branch (20%), compare (16%),
store (12%), …. (fig. 2.16)
• DSPs (besides the above operations in desktops)
– MAC (multiply and accumulate)
– Saturating arithmetic
• If the result is too large to be represented, it is set to the largest
representable number.
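A minimal sketch of the two DSP operations, assuming 16-bit signed operands. The helper names are hypothetical, and clamping negative overflow to the smallest representable number is the usual symmetric behavior rather than something the slide states. wrapping_add16 is included only to contrast saturation with ordinary two's-complement wraparound.

```python
INT16_MAX = 2**15 - 1
INT16_MIN = -(2**15)

def saturating_add16(a, b):
    """Add two 16-bit signed values, clamping the result to the representable range."""
    return max(INT16_MIN, min(INT16_MAX, a + b))

def wrapping_add16(a, b):
    """Add two 16-bit signed values with two's-complement wraparound."""
    total = (a + b) & 0xFFFF
    return total - 0x10000 if total & 0x8000 else total

def mac(samples, coeffs):
    """Multiply-accumulate: sum of pairwise products, as in a FIR filter or dot product."""
    acc = 0
    for x, c in zip(samples, coeffs):
        acc += x * c
    return acc

saturated = saturating_add16(30000, 10000)
wrapped = wrapping_add16(30000, 10000)
dot = mac([1, 2, 3], [4, 5, 6])
```

With saturation an overflowing sum stays at full scale; with wraparound it flips sign, which is far more damaging in signal processing.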
Instructions for Control Flow
• Types and their frequency
Destination Address of a Control Flow
Instruction
• The destination is specified explicitly in the instruction, except for procedure return.
– The most common way to specify the destination is
PC-relative addressing. Its advantages:
• Requires fewer bits
• Produces position-independent code, which runs wherever it is loaded
• Procedure return and indirect jump require the
destination address to be specified dynamically.
– The target address may be specified as simply as naming a register.
– Alternatively, any addressing mode may be permitted to supply the
target address.
Use of Indirect Jump
• The register indirect jump is useful for
– case or switch statements
– dynamically shared libraries
– virtual functions or methods in object-oriented
languages
– higher-order functions or function pointers in C or C++
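The case/switch use can be pictured as a jump table: the selector indexes a table of targets, and control transfers through the loaded entry. The Python sketch below is only an analogy (function objects stand in for code addresses, and all names are hypothetical).

```python
def op_add(a, b):
    return a + b

def op_sub(a, b):
    return a - b

def op_mul(a, b):
    return a * b

# The table plays the role of a jump table holding target addresses.
JUMP_TABLE = [op_add, op_sub, op_mul]

def dispatch(selector, a, b):
    """Load the target from the table, then transfer control through it."""
    target = JUMP_TABLE[selector]  # like loading a code address into a register
    return target(a, b)            # like a register indirect jump or call

result = dispatch(2, 2, 3)
```

The target is not known until run time, which is why it cannot be encoded in the instruction itself.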
Branch distance
How Is the Branch Condition Tested?
• Condition code (CC)
– 80x86, ARM, PowerPC, SPARC, SuperH
– Tests special bits set by ALU operations
– Advantage: the condition is sometimes set for free
– Disadvantage: CC is extra state and constrains the ordering of inst.
• Condition register
– Alpha, MIPS
– Tests an arbitrary register holding the result of a comparison
– Advantage: simple
– Disadvantage: uses up a register
• Compare and branch
– Compare is part of the branch
– PA-RISC, VAX
– Advantage: one instruction rather than two for a branch
– Disadvantage: may be too much work for one instruction
Frequency of Conditional Branches
Procedure Invocation Options
• Procedure calls and returns involve saving
machine state
– The program counter (return address) must be saved for returning to the
caller
– Some older architectures save other registers automatically in
hardware
– Newer architectures save registers by executing code generated by
the compiler
• Caller saving
• Callee saving
Encoding an Instruction Set
• How should the addressing modes be encoded with the operations?
– It depends on the number of operands per instruction and the number
of addressing modes per operand.
• If the number of operation and addressing mode combinations is large,
a separate address specifier is needed for each operand (e.g., the VAX).
• If it is small, the addressing mode can be encoded as part of the opcode
(e.g., a load-store machine with one memory operand and one or two addressing
modes).
– The architects must balance several competing forces when encoding
the instruction set.
• The desire to have as many registers and addressing modes as possible.
• The impact of the size of the register and addressing mode fields on the
average instruction size and on the average program size.
• A desire to have instructions encoded into lengths that will be easy to
handle in the implementation (i.e., multiples of bytes).
Choices for Encoding the Instruction Set
An Example of The Variable Encoding
ADD EAX, 1000(EBX)
• Indicates a 32-bit integer ADD instruction with two operands.
• Opcode takes 1 byte
• Needs a 1-byte address specifier to indicate addressing mode and
the register being used.
• The displacement needs four bytes.
• Total length: 1+1+4=6 bytes
– 80x86 instruction length
• 1 to 17 bytes
Code Size Reduction
• Use of hybrid format (40% reduction)
– ARM series processors
• 32-bit ARM instruction set
• 16-bit Thumb instruction set
– MIPS (40% reduction in code size)
• 32-bit MIPS
• 16-bit MIPS
• Use of code compression
– IBM PowerPC decompresses the compressed code as it is
fetched from main memory
The Role of Compilers
• Compilers are closely coupled with the instruction set
architecture
• The compiler greatly affects the code size and execution speed
of a program
• Goal of a compiler writer
– Correctness
– Speed of compiled code
The Structure of Recent Compilers
Phase-ordering Problem
• Compilers make assumptions about the ability of
later steps to deal with certain problems. This limits
the effectiveness of optimization.
• Examples
– The compiler must choose which procedure calls to expand inline before
it knows the exact size of the procedure.
– Global common subexpression elimination assumes the
value will be allocated to a register.
Optimization of Compilation
• High-level optimization
– Processor independent
• Local optimization
– Within a basic block
• Global optimization
– Across branches
• Register allocation
– Machine dependent
– Can be solved by Graph Coloring
• Other processor-dependent optimization
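Register allocation by graph coloring, mentioned in the list above, can be sketched as follows. Variables that are live at the same time are joined by an edge in an interference graph, and each register is a color. This is a simple greedy heuristic written for illustration, not the algorithm of any particular compiler; the function name, the graph, and the highest-degree-first ordering are assumptions.

```python
def color_registers(interference, num_regs):
    """Greedily assign a register index to each variable, or None if it must be spilled."""
    assignment = {}
    # Visit the most constrained (highest-degree) variables first.
    order = sorted(interference, key=lambda v: (-len(interference[v]), v))
    for var in order:
        taken = {
            assignment[n]
            for n in interference[var]
            if assignment.get(n) is not None
        }
        free = [r for r in range(num_regs) if r not in taken]
        assignment[var] = free[0] if free else None
    return assignment

# Hypothetical interference graph: a, b, c are mutually live; d overlaps only c.
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

with_three = color_registers(GRAPH, 3)
with_two = color_registers(GRAPH, 2)
```

With three registers every variable gets one; with two, one member of the a-b-c triangle has no free color and is marked for spilling to memory.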
Impact of Optimization on Performance
How the Architect Can Help the Compiler
Writer
• Instruction set properties that help compiler writers
– Regularity: Operations, data types, and addressing
modes should be orthogonal (independent).
– Provide primitives, not solutions: Don’t try to match a
language construct.
– Simplify the trade-offs among alternatives: One of the
most difficult instances of complex trade-offs occurs in a
register-memory architecture in deciding how many times
a variable should be referenced before it is cheaper to
load it into a register.
– Provide instructions that bind the quantities known at
compile time as constants.
The MIPS Architecture
• The MIPS architecture emphasizes
– A simple load-store instruction set.
– Design for pipelining efficiency, including a fixed instruction set
encoding.
– Efficiency as a compiler target.
• Registers for MIPS64
– 32 64-bit general purpose registers, named R0, R1, …, R31. The
value of R0 is always 0.
– 32 floating-point registers, named F0, F1, …, F31, each holding one
single-precision or one double-precision value.
• Data types for MIPS64
– 8-bit byte, 16-bit half word, 32-bit word, and 64-bit double word for integer.
– 32-bit single-precision, 64-bit double-precision for floating point.
Addressing Modes for MIPS64
– Immediate and displacement, both with 16-bit fields.
– Register deferred (register indirect) can be obtained by placing 0 in the
16-bit displacement field. Ex. LD R1, 0(R1).
– Absolute addressing can be obtained by using register R0
as the base register. Ex. LW R1, xx(R0).
– Byte addressable with a 64-bit address.
– A mode bit selects Big-Endian or Little-Endian mode.
– A load-store architecture.
MIPS Instruction Format
MIPS Operations
– Loads and stores (fig. 2.28 on page 133)
– ALU operations (fig. 2.29 on page 134)
– Branches and jumps (fig. 2.30 on page 135)
• Jump
– Target address: a 26-bit offset, shifted left 2 bits, replaces the lower 28 bits
of PC+4; or a register contains the destination address.
• Jump and Link
– Target address: same as jump.
– Return address: PC+4
• Conditional Branch
– Target address: 16-bit signed offset, shifted left 2 bits, added to PC+4
• The branch condition is specified in the instruction, which may test the
source register for zero or non-zero.
– Floating-point operations
• Add, subtract, multiply, and divide.
• MOV.S and MOV.D copy a single-precision (MOV.S) or double-precision (MOV.D)
floating-point register to another register of the same type.
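The branch and jump target calculations can be worked through numerically. The sketch below assumes the usual MIPS convention that offsets count 4-byte instructions, so they are shifted left by 2 bits: a branch adds its sign-extended 16-bit offset to PC+4, and a jump replaces the lower 28 bits of PC+4 with its 26-bit offset. Function names and sample addresses are assumptions for illustration.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def branch_target(pc, offset16):
    """PC-relative target: (PC + 4) plus the sign-extended offset times 4."""
    return pc + 4 + (sign_extend(offset16, 16) << 2)

def jump_target(pc, offset26):
    """Keep the upper bits of PC + 4 and replace the lower 28 bits with offset * 4."""
    next_pc = pc + 4
    return (next_pc & ~0x0FFFFFFF) | ((offset26 & 0x3FFFFFF) << 2)

forward = branch_target(0x1000, 3)        # three instructions past PC + 4
backward = branch_target(0x1000, 0xFFFF)  # offset -1: back to the branch itself
jump = jump_target(0x10001000, 0x100)
```

The 16-bit field therefore reaches about 2^15 instructions in either direction, while a jump can reach anywhere inside the current 256 MB region.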
Usage of MIPS Instructions
Concluding Remarks
– Changes in instruction set architecture in the 1990s
• 32-bit address ----> 64-bit address
• Optimization of conditional branches via conditional execution
• Optimization of cache performance via prefetch
• Support for multimedia
• Faster floating-point operations
– Trends in the next decade
• Long instruction word
• Increased conditional execution
• Blending of general-purpose and DSP architectures
• 80x86 emulation