Classification of Architectures
Code sequences for C = A + B:

Stack     Accumulator   GPR mem-mem     GPR reg-mem    GPR reg-reg
PUSH A    LOAD A        ADD C, A, B     LOAD R1, A     LOAD R1, A
PUSH B    ADD B           -OR-          ADD R1, B      LOAD R2, B
ADD       STORE C       ADD R1, A, B    STORE C, R1    ADD R2, R1
POP C                   STORE C, R1                    STORE C, R2
• Some GPR reg-reg processors use three operands, e.g. ADD R2, R1, R2.
• Registers are used for expression evaluation, parameter passing and holding variables.
• GPRs vs. dedicated registers: GPRs are superior because they are easier for compilers to use.
• Number of registers: 8, 16, 32 or 64. If the register file is too large, access becomes slower because of decoding.
• GPR reg-reg (load-store architecture): 0 memory operands and 3 (or 2) total operands in ALU instructions. It is simple, with fixed-length instruction encoding, simple code generation and nearly the same CPI for all instructions, at the cost of a higher IC.
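The instruction sequences for C = A + B listed above can be traced with a toy interpreter. The Python sketch below is only illustrative: the function names (run_stack, run_accumulator, run_load_store) and the dictionary used as memory are assumptions made for this example, not part of any real instruction set.

```python
def run_stack(mem):
    """Stack machine: PUSH A; PUSH B; ADD; POP C."""
    stack = []
    stack.append(mem["A"])                   # PUSH A
    stack.append(mem["B"])                   # PUSH B
    stack.append(stack.pop() + stack.pop())  # ADD (operands are implicit)
    mem["C"] = stack.pop()                   # POP C
    return 4                                 # instructions executed


def run_accumulator(mem):
    """Accumulator machine: LOAD A; ADD B; STORE C."""
    acc = mem["A"]        # LOAD A
    acc += mem["B"]       # ADD B (accumulator is the implicit operand)
    mem["C"] = acc        # STORE C
    return 3


def run_load_store(mem):
    """GPR reg-reg machine: only loads and stores touch memory."""
    regs = {}
    regs["R1"] = mem["A"]                 # LOAD R1, A
    regs["R2"] = mem["B"]                 # LOAD R2, B
    regs["R2"] = regs["R2"] + regs["R1"]  # ADD R2, R1
    mem["C"] = regs["R2"]                 # STORE C, R2
    return 4


for run in (run_stack, run_accumulator, run_load_store):
    mem = {"A": 2, "B": 3, "C": 0}
    count = run(mem)
    print(run.__name__, "C =", mem["C"], "instructions =", count)
```

All three models compute the same result; they differ in how many instructions are needed and in how many of those instructions reference memory.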
ENGR 6861 Fall 2005
R. Venkatesan, Memorial University
Memory Addressing
• Byte-addressable memory means each byte has a unique address. It is almost always used so that a single character can be modified selectively.
• Each word consists of several bytes. If the word size is 64 bits, each word spans eight addresses.
• Memory access: read (load) or write (store).
• A store (write) operation should be able to access 1 byte, a quarter word, a half word or 1 word at a time, so that a character, short integer, integer/SPFP or long integer/DPFP value can be modified as required.
• For orthogonality, load (read) operations also support 1-byte, quarter-word, half-word and 1-word accesses.
• Little endian: the least significant byte (lsB) of the word has the address xxx…xx000.
• Big endian: the most significant byte (msB) of the word has the address xxx…xx000.
• Aligned memory access restriction: a word must start at an address of the form xxx…xx000, and not at an address that ends with any other 3-bit combination. This can waste a few memory locations, but it avoids a complex alignment network, thus making t (the clock cycle time) smaller.
Addressing Modes
• Dozens of memory addressing modes have been used in CISC systems such as the IBM 360 and 370 and the VAX-11.
• Detailed statistical analyses have been carried out on CISC computers, and the results are discussed in the textbook.
• Registers are used as operands 50% of the time.
• Displacement mode (index + disp.) is used 25% of the time.
• Immediate addressing is used 20% of the time.
• All other addressing modes together are used less than 5% of the time.
• Therefore, it is efficient to design load-store processors with displacement as the only memory addressing mode, as long as the instruction set also permits immediate operands. This way, no bit field is needed in each load or store instruction to identify the memory addressing mode. If an unavailable mode is needed, multiple instructions are used instead.
• Displacement value: 12 bits capture 75% of cases and 16 bits capture 99%.
• Immediate addressing is used for comparisons, moving constants into registers, shifts, etc.
• Immediate value: 8 bits cover 70% of cases; 16 bits cover 80%.
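The displacement-only design can be sketched in Python as follows. The sketch assumes a signed two's-complement displacement field (as MIPS uses); the function names are hypothetical, and indexed_address only illustrates how a missing mode can be built from two instructions.

```python
def fits_signed(value, bits):
    """True if value is representable in a two's-complement field of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def displacement_address(base, disp, disp_bits=16):
    """Effective address for displacement mode: base register + signed displacement."""
    if not fits_signed(disp, disp_bits):
        raise ValueError("displacement does not fit; extra instructions are needed")
    return base + disp


def indexed_address(base, index):
    """Indexed mode (base + index) synthesized from two steps."""
    temp = base + index                   # ADD Rt, Rbase, Rindex
    return displacement_address(temp, 0)  # LOAD Rd, 0(Rt)


print(hex(displacement_address(0x1000, -8)))
print(hex(indexed_address(0x1000, 0x20)))
print(fits_signed(40000, 16), fits_signed(40000, 17))
```

Register indirect addressing is the special case with a displacement of 0, and absolute addressing is the special case with a base register that holds 0.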
Operations in the Instruction Set
• Categories: ALU reg-reg; ALU immediate; load; store; control transfer; FP; and system, string, graphics, decimal and other operations that are needed but occur rarely.
• Control transfer: jump or branch (PC-relative); conditional or unconditional; call; return; software interrupts; trap. The most common are conditional branches with an 8- to 12-bit offset. 75% are forward branches.
• Register indirect jumps are useful for case or switch statements, dynamically shared libraries, virtual functions in C++, and returns.
• Most compares use an immediate operand of 0.
• Condition code (flag): simple, but constrains instruction ordering.
• Condition register: test an arbitrary register that holds the result of a comparison.
• Compare and branch: one complex instruction; not well suited to pipelining.
• State saving for procedure calls: caller saving or callee saving.
• Media and signal processing: partitioned add performs four 16-bit adds in a 64-bit ALU in one cycle; SIMD or vector instructions; paired single operations using DPFP registers for vertices; saturating arithmetic; MAC (multiply-accumulate).
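A partitioned add, with and without saturation, can be modelled in a few lines of Python. This is a software sketch under the assumption of unsigned 16-bit lanes; real hardware performs the four lane additions in parallel by cutting the carry chain, whereas the loop here handles them one at a time. The name partitioned_add is chosen for this example.

```python
LANES = 4
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1


def partitioned_add(x, y, saturate=False):
    """Add four unsigned 16-bit lanes packed into the 64-bit words x and y."""
    result = 0
    for lane in range(LANES):
        shift = lane * LANE_BITS
        a = (x >> shift) & LANE_MASK
        b = (y >> shift) & LANE_MASK
        s = a + b
        if saturate:
            s = min(s, LANE_MASK)  # clamp at the largest lane value
        else:
            s &= LANE_MASK         # wrap around; the carry never enters the next lane
        result |= s << shift
    return result


x = 0x0001_FFFF_0010_8000
y = 0x0001_0001_0020_8000
print(f"{partitioned_add(x, y):016x}")
print(f"{partitioned_add(x, y, saturate=True):016x}")
```

Saturation matters for media data: an overflowing pixel or audio sample should stay at the maximum value rather than wrap around to a small one.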
Type and Size of Operands
• Most modern processors place another restriction: operands in ALU instructions should always be 1 word long. This simplifies the hardware, thus reducing t (the clock cycle time).
• Character, integer (also short and long), SPFP, DPFP and Boolean are common.
• Fixed-point data type for DSPs: 0.000000000000000, with the binary point assumed in a fixed location; range: -1 to +1 - 2^-15.
• DSP: blocked fixed point: the exponent is kept separately (scaling) and shared among a set of fixed-point variables.
• Decimal: BCD or packed decimal.
• Media or graphics operations would benefit from data types such as:
– vertex: x, y, z coordinates and w for color; each a 32-bit FP number.
– pixel: R, G, B, A (transparency): 4 x 8 = 32 bits.
• DSP: modulo or circular addressing mode; bit-reverse addressing.
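The DSP-specific items can be illustrated with a small Python sketch. It assumes the 16-bit fixed-point format described above (one sign bit and 15 fraction bits, often called Q15); the function names are invented for this example.

```python
Q15_SCALE = 1 << 15  # 15 fraction bits


def to_q15(x):
    """Quantize a real number to a 16-bit fixed-point integer, saturating at the range limits."""
    n = round(x * Q15_SCALE)
    return max(-Q15_SCALE, min(Q15_SCALE - 1, n))


def from_q15(n):
    """Convert a 16-bit fixed-point integer back to a real number."""
    return n / Q15_SCALE


def bit_reverse(index, bits):
    """Reverse the low `bits` bits of index (the access order used by FFTs)."""
    out = 0
    for _ in range(bits):
        out = (out << 1) | (index & 1)
        index >>= 1
    return out


def circular_next(pointer, step, start, length):
    """Modulo (circular) addressing: advance within a buffer and wrap at its end."""
    return start + (pointer - start + step) % length


print(to_q15(0.5), from_q15(to_q15(1.0)))
print([bit_reverse(i, 3) for i in range(8)])
print(hex(circular_next(0x107, 2, start=0x100, length=8)))
```

The largest representable value is 1 - 2^-15, so converting 1.0 saturates to just below 1.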
CISC vs RISC
• Three (selected) instructions are sufficient to design a processor that can run any application, but even simple operations will then take a very large number of repetitions of these three instructions.
• CISC processors provide several hundred to thousands of instructions, but compilers do not use most of the esoteric ones.
• Most modern processors employ about 100 instructions and thus essentially follow the RISC architecture.
• They employ fixed instruction lengths (32 bits, for example); impose the aligned memory access constraint; perform all ALU operations on word-sized operands; provide only one or a few addressing modes; and do not use flags but use a GPR for compares. Floating-point operations are almost always supported.
• Processors released in the past ten years include graphics, DSP and multimedia operations, and data types to facilitate such operations.
• Examples of processors with good architectures: MIPS, Alpha, SUN Sparc, Intel i860 (the processor used in the iPSC/860).
Role of compilers
• Compiler-based issues:
– Regularity and orthogonality: operations, data types, addressing modes
– Provide primitives, not solutions
– Simplify trade-offs among alternatives
– Provide instructions that bind the quantities known at compile time as constants
• Compiler optimizations:
– High-level optimizations: often done on the source, with the output fed to later optimization passes. Example: procedure integration.
– Local optimizations: optimize code only within a straight-line code fragment (a basic block). Examples: common subexpression elimination, constant propagation, stack height reduction.
– Global optimizations: extend local optimizations across branches and introduce a set of transformations aimed at optimizing loops. Examples: copy propagation, code motion, induction variable elimination.
– Register allocation: associates registers with operands.
– Processor-dependent optimizations. Examples: pipeline scheduling, branch offset optimization, replacing multiplies with adds and shifts.
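As an illustration of the last item, multiplication by a compile-time constant can be rewritten as shifts and adds. The Python sketch below shows the idea for non-negative constants only; it is a simplified version of what a compiler might do, and the names shift_add_plan and multiply_by_constant are hypothetical.

```python
def shift_add_plan(constant):
    """Return the shift amounts s such that the sum of (x << s) equals x * constant."""
    if constant < 0:
        raise ValueError("this sketch handles non-negative constants only")
    return [s for s in range(constant.bit_length()) if (constant >> s) & 1]


def multiply_by_constant(x, constant):
    """Multiply using only shifts and adds, one term per set bit of the constant."""
    return sum(x << s for s in shift_add_plan(constant))


print(shift_add_plan(10))            # 10 = 2 + 8, so x*10 = (x << 1) + (x << 3)
print(multiply_by_constant(7, 10))
```

Whether the rewrite pays off depends on the processor: it helps when a multiply takes many more cycles than the few shifts and adds that replace it, which is why it is a processor-dependent optimization.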
MIPS 64 Architecture
• 64-bit load-store architecture; little or big endian can be selected (mode bit).
• 32 64-bit GPRs (integer registers). R0 always holds 0 and is read-only; R31 is used with calls; all other GPRs are identical in usability.
• 32 64-bit FPRs, F0 … F31, can hold SPFP or DPFP numbers; when an SPFP number is loaded, the other half of the FPR is unused.
• Aligned memory access; the only addressing mode is displacement; memory is accessed as bytes, half words, words or double words (a 64-bit number is called a double word!). LD, SD, LW, LWU, SW, LH, LHU, SH, LB, LBU, SB. For loads narrower than 64 bits, the most significant bits are filled with the sign bit or with zeros (unsigned).
• ALU operands are always 64 bits long. DADD, DADDI, DADDU, DADDIU, DSUB, DSUBU, DMUL, DMULU, DDIV, DDIVU, MADD, AND, ANDI, OR, ORI, XOR, XORI, LUI (load bits 32-47 with the immediate and sign-extend), DSLL, DSRL, DSRA, DSLLV, DSRLV, DSRAV, SLT, SLTI, SLTU, SLTIU.
• Control: BEQZ, BNEZ, BEQ, BNE, BC1T, J, JR, JAL, JALR, TRAP
• FP: ADD.D, ADD.S, ADD.PS; SUB.D, MUL.D, MADD.D, DIV.D, etc.
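The difference between the signed and unsigned loads (for example LB vs. LBU, or LW vs. LWU) can be modelled in Python. This is a sketch of the behaviour described above, not a definition of the MIPS instructions; the load function and its parameters are assumptions made for the example.

```python
MASK64 = (1 << 64) - 1


def load(memory, address, size, signed, little_endian=True):
    """Model an aligned load of `size` bytes into a 64-bit register.

    Signed loads fill the upper register bits with copies of the sign bit;
    unsigned loads fill them with zeros.
    """
    if address % size != 0:
        raise ValueError("unaligned access")
    raw = bytes(memory[address:address + size])
    order = "little" if little_endian else "big"
    value = int.from_bytes(raw, order, signed=signed)
    return value & MASK64  # register contents as a 64-bit pattern


memory = bytearray(16)
memory[8] = 0x80
print(f"{load(memory, 8, 1, signed=True):016x}")   # behaves like LB
print(f"{load(memory, 8, 1, signed=False):016x}")  # behaves like LBU
```

With the byte 0x80 in memory, the signed load yields a register whose upper 56 bits are all ones, while the unsigned load leaves them zero.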
MIPS 64 Instruction Formats