CS136, Advanced Architecture
Instruction Set Architecture
CS 136
Types of ISAs
• Stack
– Implicit operands (top of stack)
– Heavy memory traffic
– Limited ability to access operands at will
– Obsolete
• Accumulator
– Implicit register operand (“accumulator”)
– One memory operand
– Insufficient temporaries
– Obsolete
• General-purpose register
– Multiple registers
– Several variations
GPR Architectures
• Memory-memory
– CISC idea
– Usually allows any operand to be in register as well
• Register-memory
– Example: x86
– Can do one operand in a register and one in memory, or both in registers
• Register-register
– Only design used in newly designed modern machines
– Lots of registers ⇒ fast, flexible operand access
– Simplicity of hardware
– Compiler has full flexibility in register usage
Five Ways to Do C = A + B
STACK     ACCUM     MEM-MEM     REG-MEM      REG-REG
PUSH A    LOAD A    ADD C,A,B   LOAD R1,A    LOAD R1,A
PUSH B    ADD B                 ADD R1,B     LOAD R2,B
ADD       STORE C               STORE R1,C   ADD R3,R1,R2
POP C                                        STORE R3,C
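To make the comparison concrete, here is a small Python sketch that executes each of the five sequences on a toy machine. The tuple encoding of instructions and the `run` function are hypothetical, invented for illustration; the point is only that all five compute the same C while differing in instruction count and in where the operands live.

```python
def run(style, program, mem):
    """Execute a toy program in one of the five ISA styles (hypothetical encoding)."""
    mem = dict(mem)
    stack, acc, regs = [], 0, {}
    for op, *args in program:
        if style == "STACK" and op == "PUSH":
            stack.append(mem[args[0]])                  # operand comes from memory
        elif style == "STACK" and op == "ADD":
            b, a = stack.pop(), stack.pop()             # implicit operands: top of stack
            stack.append(a + b)
        elif style == "STACK" and op == "POP":
            mem[args[0]] = stack.pop()
        elif style == "ACCUM" and op == "LOAD":
            acc = mem[args[0]]
        elif style == "ACCUM" and op == "ADD":
            acc += mem[args[0]]                         # accumulator is the implicit operand
        elif style == "ACCUM" and op == "STORE":
            mem[args[0]] = acc
        elif style == "MEM-MEM" and op == "ADD":
            mem[args[0]] = mem[args[1]] + mem[args[2]]  # all three operands in memory
        elif style in ("REG-MEM", "REG-REG") and op == "LOAD":
            regs[args[0]] = mem[args[1]]
        elif style in ("REG-MEM", "REG-REG") and op == "STORE":
            mem[args[1]] = regs[args[0]]
        elif style == "REG-MEM" and op == "ADD":
            regs[args[0]] += mem[args[1]]               # one register, one memory operand
        elif style == "REG-REG" and op == "ADD":
            regs[args[0]] = regs[args[1]] + regs[args[2]]  # registers only
        else:
            raise ValueError(f"{op} not supported in {style}")
    return mem

PROGRAMS = {
    "STACK":   [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")],
    "ACCUM":   [("LOAD", "A"), ("ADD", "B"), ("STORE", "C")],
    "MEM-MEM": [("ADD", "C", "A", "B")],
    "REG-MEM": [("LOAD", "R1", "A"), ("ADD", "R1", "B"), ("STORE", "R1", "C")],
    "REG-REG": [("LOAD", "R1", "A"), ("LOAD", "R2", "B"),
                ("ADD", "R3", "R1", "R2"), ("STORE", "R3", "C")],
}

for style, prog in PROGRAMS.items():
    result = run(style, prog, {"A": 2, "B": 3, "C": 0})
    print(f"{style:8} {len(prog)} instructions, C = {result['C']}")
```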
Memory Addressing
• Originally just word addressing
• 8-bit bytes and byte addressing introduced on the IBM 360 series
• Brief experiments with bit addressing (bad idea)
• Unaligned accesses not worth supporting
• Some machines byte-address but only load/store a word at a time
– Turned out to be a bad design decision
– Too many programs do string processing one character at a time
– May need to revisit in the future (32-bit characters?)
• Modern RISC designs allow short load/store, but not short arithmetic
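A minimal sketch of why word-only load/store hurts character processing. It models a hypothetical machine whose memory can only be accessed one aligned 32-bit word at a time, and it assumes byte 0 of a word is the least-significant byte; the class and method names are illustrative. Storing one character then takes a word load, some masking, and a word store.

```python
class WordOnlyMemory:
    """Hypothetical memory that only supports aligned 32-bit word accesses.
    Byte 0 of a word is assumed to be its least-significant byte."""

    def __init__(self, n_words):
        self.words = [0] * n_words
        self.word_accesses = 0          # counts every trip to memory

    def load_word(self, addr):
        if addr % 4:
            raise ValueError("only aligned word accesses are supported")
        self.word_accesses += 1
        return self.words[addr // 4]

    def store_word(self, addr, value):
        if addr % 4:
            raise ValueError("only aligned word accesses are supported")
        self.word_accesses += 1
        self.words[addr // 4] = value & 0xFFFF_FFFF

    def load_byte(self, addr):
        base, shift = addr & ~3, (addr & 3) * 8
        return (self.load_word(base) >> shift) & 0xFF

    def store_byte(self, addr, byte):
        """Read-modify-write: load the word, replace one byte, store the word back."""
        base, shift = addr & ~3, (addr & 3) * 8
        word = self.load_word(base)
        word = (word & ~(0xFF << shift)) | ((byte & 0xFF) << shift)
        self.store_word(base, word)


mem = WordOnlyMemory(2)
for i, ch in enumerate(b"Hi!"):
    mem.store_byte(i, ch)
print(mem.word_accesses)   # 6: two memory accesses per character stored
```

A machine with native byte stores needs one access per character. The read-modify-write sequence is also not atomic, which matters if another processor writes a neighboring byte of the same word.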
Endian-ness
• The word is “Endian”, not “Indian”
• Reference to Gulliver’s Travels
• Little-Endian invented by Digital Equipment on the PDP-11
– Mathematically more elegant
– Horrible for humans
– “It seemed like a good idea at the time”
– Should be banished from the face of the Earth
• Some machines can switch endianness with a control bit
– This idea is even stupider than the original
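A quick Python illustration of the two byte orders for the 32-bit value 0x12345678, using only built-ins and the standard `struct` module. The weight check is one way to read “mathematically more elegant”: in little-endian order the byte at offset i always has weight 256**i, whatever the width of the value.

```python
import struct

value = 0x12345678
little = value.to_bytes(4, "little")   # lowest address holds the least-significant byte
big = value.to_bytes(4, "big")         # lowest address holds the most-significant byte
print(little.hex(" "), "|", big.hex(" "))   # 78 56 34 12 | 12 34 56 78

# Little-endian: the byte at offset i has weight 256**i
assert sum(byte << (8 * i) for i, byte in enumerate(little)) == value

# Reading bytes with the wrong convention silently scrambles the value
assert int.from_bytes(little, "big") == 0x78563412

# struct spells the two orders '<' and '>'
assert struct.pack("<I", value) == little and struct.pack(">I", value) == big
```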
Addressing Modes
• How can an instruction reference memory?
• Early days: absolute address in instruction
– Led to instruction modification (programs rewrote the address field at run time, e.g., to step through an array)
– Improvement: “Indirection” fetched a pointer from an absolute location and used it as the final address
• Minimum necessary today: follow a pointer in a register
– Clumsy if it’s the only option
• Fanciest conceivable: *(R1+S*R2+constant), with either or both of R1 and R2 autoincremented or autodecremented as a side effect, either before or after the instruction
– No machine went quite this far
– But VAX came close
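A sketch of address generation for the “fanciest conceivable” mode, *(R1 + S*R2 + constant), with optional updates before or after. Registers are a dict, and the `side_effects` triples are a hypothetical notation invented here; the simpler modes (register indirect, register + constant, indexed) are just special cases with fields left out.

```python
def effective_address(regs, base, index=None, scale=1, disp=0, side_effects=()):
    """EA = regs[base] + scale*regs[index] + disp, for a hypothetical general mode.

    side_effects: (reg, when, delta) triples, when in {"pre", "post"}.
    "pre" updates happen before the address is formed, "post" updates after.
    Mutates regs in place and returns the effective address.
    """
    for reg, when, delta in side_effects:
        if when not in ("pre", "post"):
            raise ValueError(f"unknown update time {when!r}")
        if when == "pre":
            regs[reg] += delta          # e.g., *--p: update before forming the address
    addr = regs[base] + disp
    if index is not None:
        addr += scale * regs[index]
    for reg, when, delta in side_effects:
        if when == "post":
            regs[reg] += delta          # e.g., *p++: update after forming the address
    return addr


regs = {"R1": 1000, "R2": 3}
print(effective_address(regs, "R1", disp=8))                     # 1008: register + constant
print(effective_address(regs, "R1", "R2", scale=4, disp=16))     # 1028: base + 4*index + offset
print(effective_address(regs, "R1", side_effects=[("R1", "post", 4)]), regs["R1"])  # 1000 1004
```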
Addressing Modes (cont’d)
• What’s actually useful?
• Need to follow pointers: can restrict to registers
– ADD R1,(R2)
– Better: LOAD R1,(R2) (like MIPS), so only loads and stores follow pointers
• Frequent stack access ⇒ register + constant useful
• Immediates needed for built-in constants
• Access to globals ⇒ absolute memory addresses
– (We’ll see that that’s painful)
• PC-relative modes
– Used to be needed for data; not in modern systems
– Still needed for calls and branches
• Absolute addresses no longer needed for branches
– Can always emulate with PC-relative, since PC known
– Still available on some architectures
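A sketch of PC-relative branch encoding, assuming the MIPS convention: a signed 16-bit field counts 32-bit words from the instruction after the branch (PC + 4). Because only the distance is stored, moving the code to any base address leaves the encoding unchanged. The function names are illustrative.

```python
def encode_branch(pc, target):
    """Return the 16-bit offset field for a branch at `pc` to `target`."""
    delta = target - (pc + 4)           # distance from the following instruction
    if delta % 4:
        raise ValueError("target must be word-aligned")
    words = delta // 4
    if not -(1 << 15) <= words < (1 << 15):
        raise ValueError("target out of range for a 16-bit offset")
    return words & 0xFFFF               # two's-complement 16-bit field

def decode_branch(pc, field):
    """Recover the target: sign-extend the field, scale by 4, add to PC + 4."""
    words = field - (1 << 16) if field & 0x8000 else field
    return pc + 4 + (words << 2)


field = encode_branch(0x1000, 0x0FF0)   # a backward branch
print(hex(field))                        # 0xfffb, i.e., -5 words
```

Relocating both the branch and its target by the same amount gives the same field, which is why PC-relative code can run at any address.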
Operand Types and Sizes
• Type usually implies size
• Integers can safely be widened to word size
– Shrink again when stored
– Takes advantage of two’s-complement representation
• Single-precision FP gives different results than double precision
⇒ Necessary to support both widths
– Some FPUs can do two SP operations in parallel
• Older machines allowed “packed” decimal (2 digits per byte)
– x86 supports it with the DAA (Decimal Adjust after Addition) instruction
– Still useful in the business world, though dying
• 32 bits standard these days, 64 bits coming
– 128 some day?
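Three small sketches for the points above, assuming 8-bit narrow integers and IEEE single/double floats. Widening with sign extension and then truncating gives the same bits as narrow two's-complement arithmetic; rounding 0.1 to single precision changes its value; and packed-BCD addition needs a decimal correction after the ordinary binary add. `bcd_add` applies the two corrections a DAA-style adjust performs, but it is a simplified model without flags, not the exact x86 specification.

```python
import struct

def sign_extend(value, bits):
    """Widen a `bits`-wide two's-complement field to a full-width integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def truncate(value, bits):
    """Shrink back when storing: keep only the low `bits` bits."""
    return value & ((1 << bits) - 1)

# Widened arithmetic, then truncation, matches 8-bit arithmetic bit for bit
a, b = sign_extend(0xF0, 8), sign_extend(0x20, 8)     # -16 and 32
assert truncate(a + b, 8) == truncate(0xF0 + 0x20, 8) == 0x10

def to_single(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(to_single(0.1) == 0.1)   # False: the two widths give different answers

def bcd_add(x, y):
    """Add two packed-BCD bytes (two decimal digits each), assuming valid BCD input.
    Returns (result_byte, decimal_carry)."""
    s = x + y                            # ordinary binary add
    if (x & 0x0F) + (y & 0x0F) > 9:      # low digit passed 9: skip the 6 unused codes
        s += 0x06
    if s > 0x99:                         # high digit passed 9
        s += 0x60
    return s & 0xFF, int(s > 0xFF)

print(hex(bcd_add(0x19, 0x28)[0]))   # 0x47, i.e., decimal 19 + 28 = 47
```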
Operations Provided
• Only one instruction truly needed: SJ
– Subtract A from B, giving C; if the result is < 0, jump to D
– It’s Turing-complete!
• Practical machines need a bit more at minimum:
– Arithmetic and logical (add, multiply, divide?, and, or, …)
– Data movement (load/store, move between registers)
– Control (conditional/unconditional branch, procedure call and return, trap to OS)
– System control (return from interrupt, manage VM, set unprivileged mode, access I/O devices)
• Other builtins can be useful:
– Basic floating point
» Bad x86 design idea: sin, sqrt, etc.!
– Decimal
– String
– Vector, graphics
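A minimal SJ interpreter, assuming each instruction is a tuple (A, B, C, D) of memory-cell names with the semantics above: C = B - A, then jump to D if the result is negative, otherwise fall through. The cell names, the HALT convention (any jump outside the program stops it), and the multiply program are invented for illustration. Two tricks do the work: subtracting from a cell holding 0 negates, and subtracting 1 from 0 always gives a negative result, i.e., an unconditional jump.

```python
HALT = -1  # hypothetical convention: any target outside the program stops the machine

def run_sj(program, mem, max_steps=100_000):
    """Run SJ instructions (a, b, c, d): mem[c] = mem[b] - mem[a]; if < 0, jump to d."""
    pc, steps = 0, 0
    while 0 <= pc < len(program):
        if steps >= max_steps:
            raise RuntimeError("step limit reached")
        a, b, c, d = program[pc]
        mem[c] = mem[b] - mem[a]
        pc = d if mem[c] < 0 else pc + 1
        steps += 1
    return mem

# P = X * Y for Y >= 0, by repeated addition. Z holds 0 and ONE holds 1; Y is consumed.
MULTIPLY = [
    ("X",    "Z", "NEGX", 1),     # 0: NEGX = -X (continue at 1 either way)
    ("ONE",  "Y", "Y",    HALT),  # 1: Y = Y - 1; if Y < 0, done
    ("NEGX", "P", "P",    3),     # 2: P = P - (-X) = P + X (continue at 3 either way)
    ("ONE",  "Z", "T",    1),     # 3: T = -1 < 0, so always jump back to 1
]

mem = run_sj(MULTIPLY, {"X": 6, "Y": 7, "P": 0, "Z": 0, "ONE": 1, "NEGX": 0, "T": 0})
print(mem["P"])  # 42
```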
Control Flow
• Addressing modes are important
– PC-relative means code can run at any virtual address
– Useful for dynamically linked (shared) libraries
• Pointer-following jump needed for returns
– Also useful for switch statements, function pointers, virtual functions, and shared libraries
• How to specify the condition for conditional branches?
– Condition code as side effect of every instruction
» Boils down to extra register
» Spurious dependencies in pipeline
– Condition register explicitly set by comparison
– Compare as part of branch
» Adds delay slots in pipeline
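A sketch contrasting two of these options, assuming 32-bit two's-complement integers. `compare_flags` models a condition-code machine: one subtraction sets N, Z, V and a borrow flag as side effects (the “extra register”), and a signed less-than branch then tests whether N differs from V. `slt` models the compare-into-a-register style, like MIPS SLT. The dictionary of flags is illustrative; x86 and ARM define the carry flag after a subtraction with opposite polarity, so the sketch simply calls it `borrow`.

```python
MASK = 0xFFFF_FFFF

def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK
    return x - (1 << 32) if x >> 31 else x

def compare_flags(a, b):
    """Flags a condition-code machine would set as a side effect of computing a - b."""
    a, b = a & MASK, b & MASK
    diff = (a - b) & MASK
    sa, sb, sd = a >> 31, b >> 31, diff >> 31
    return {
        "N": sd == 1,                 # result negative
        "Z": diff == 0,               # result zero
        "borrow": a < b,              # unsigned a < b
        "V": sa != sb and sd != sa,   # signed overflow
    }

def branch_if_less(flags):
    """Signed 'less than' recovered from the flags: N differs from V."""
    return flags["N"] != flags["V"]

def slt(a, b):
    """Compare-into-register style: write 1 or 0 to an ordinary register."""
    return int(to_signed(a) < to_signed(b))
```

Both styles answer the same question; the difference is whether the answer lives in hidden flag state written by many instructions or in a general register named explicitly by one.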
Encodings
• Variable-length instructions
– Highly efficient (few wasted bits)
– Allows complex specifications (e.g., x86 addressing modes)
– Usually means misaligned instruction fetch
– Greatly complicates fetch/decode units
• Fixed-length instructions
– May limit number of registers
– Usually very few instruction formats
– Wastes space but gains speed (e.g., only aligned fetches)
– Limits width of immediate operands
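Why variable-length encodings complicate fetch and decode, shown with a hypothetical toy encoding in which the low 2 bits of an instruction's first byte give its length (1 to 4 bytes). Finding where instruction k starts requires decoding the length of every instruction before it, whereas with fixed 4-byte instructions every boundary is known in advance. Both functions are illustrative.

```python
def starts_variable(code: bytes) -> list[int]:
    """Instruction start offsets under a hypothetical variable-length encoding:
    the low 2 bits of the first byte hold (length - 1), so lengths are 1-4 bytes."""
    starts, i = [], 0
    while i < len(code):
        starts.append(i)
        i += (code[i] & 0b11) + 1   # must decode this length before finding the next start
    return starts

def starts_fixed(code: bytes, width: int = 4) -> list[int]:
    """Fixed-length instructions: every start is known up front, all aligned."""
    return list(range(0, len(code), width))


code = bytes([0x01, 0xAA, 0x00, 0x03, 0x11, 0x22, 0x33])
print(starts_variable(code))   # [0, 2, 3]
```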
The Fight for Bits
• How wide should an instruction be?
– Wider ⇒ can encode more registers, more options
– Wider ⇒ bigger programs, more memory bandwidth
– Bigger programs ⇒ fewer cache hits
• Things you need to encode:
– Operation code (16 to 1000 instructions)
– Operands (at least one, normally two or three)
– Immediate operands
– Memory offsets
– Branch targets
– Branch conditions
– Conditional operations (e.g., conditional load, add)
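Back-of-the-envelope bit budgeting for a fixed 32-bit instruction, as a sketch. The opcode count, register count, and number of register operands are parameters; the values below (64 opcodes, 32 registers) happen to match the 6-bit opcode and 5-bit register fields of the MIPS formats. Whatever is left must cover immediates, offsets, branch targets, and conditions.

```python
from math import ceil, log2

def field_bits(n_choices):
    """Bits needed to name one of n_choices things (opcodes, registers, ...)."""
    return max(1, ceil(log2(n_choices)))

def bits_left(width, n_opcodes, n_registers, n_register_operands):
    """Bits remaining for immediates, offsets, or extra options."""
    used = field_bits(n_opcodes) + n_register_operands * field_bits(n_registers)
    return width - used


print(bits_left(32, 64, 32, 2))   # 16: room for a 16-bit immediate
print(bits_left(32, 64, 32, 3))   # 11: left for shift amount and function bits
print(bits_left(32, 64, 64, 3))   # 8: doubling the registers costs one bit per operand
```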
Two or Three Operands?
• In favor of three:
– Smaller code size
– No clobbered operands ⇒ fewer copies or reloads
– Hardwiring R0 to zero lets the ALU support fewer operations (e.g., a move is just ADD Rd,Rs,R0)
• In favor of two:
– Can address more registers
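A toy register machine (hypothetical opcodes and tuple layout) that runs the same computation, R3 = R1 + R2 and R4 = R1 - R2 with R1 and R2 preserved, in both styles. The two-operand version must copy a source first because its destination is also a source. The last program shows the R0 trick, assuming R0 is hardwired to zero as on MIPS, so a move needs no separate ALU operation.

```python
def run_regs(program, regs):
    """Toy machine. R0 always reads as 0 and ignores writes (assumed, as on MIPS).
    Three-operand: (op, rd, rs, rt) -> rd = rs op rt
    Two-operand:   (op, rd, rt)     -> rd = rd op rt   (destination is clobbered)
    MOV:           ("MOV", rd, rs)  -> rd = rs
    """
    regs = dict(regs, R0=0)
    alu = {"ADD": lambda x, y: x + y, "SUB": lambda x, y: x - y}
    for op, rd, *src in program:
        if op == "MOV":
            value = regs[src[0]]
        elif len(src) == 2:
            value = alu[op](regs[src[0]], regs[src[1]])
        else:
            value = alu[op](regs[rd], regs[src[0]])
        if rd != "R0":                      # writes to R0 are discarded
            regs[rd] = value
    return regs

THREE = [("ADD", "R3", "R1", "R2"), ("SUB", "R4", "R1", "R2")]
TWO = [("MOV", "R3", "R1"), ("ADD", "R3", "R2"),
       ("MOV", "R4", "R1"), ("SUB", "R4", "R2")]
MOVE_VIA_R0 = [("ADD", "R5", "R1", "R0")]   # a move expressed as an ordinary ADD

start = {"R1": 10, "R2": 4}
for name, prog in [("three-operand", THREE), ("two-operand", TWO)]:
    out = run_regs(prog, start)
    print(f"{name}: {len(prog)} instructions, R3={out['R3']}, R4={out['R4']}")
```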
How to Decide All These Questions?
• Slide rules at 50 paces?
• Analysis wars
– Look at existing designs, existing programs
– “Recompile” programs for hypothetical architecture
» Analyze size of resulting program
» Run through simulator to see how it performs
– Impractical approach
» Writing compiler back ends is expensive
» Simulators are slow
– Instead, make projections based on existing object code
Example of Bad Analysis: @-(R2)
• DEC VAX had three “auto” addressing modes: autopostincrement, autopredecrement, and indirect autopostincrement
• What happened to indirect autopredecrement?
– Designers analyzed the output of the BLISS compiler on many programs
– BLISS didn’t provide a way to express autopredecrement
– So they concluded it wasn’t necessary
– Analyzing C would have given a very different result!
*--p1 = a[--i];
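A toy model of the four “auto” modes, including the missing indirect autopredecrement @-(R). Memory is a Python dict, pointers are assumed to be 4 bytes, and the class and method names are hypothetical. Each method returns the operand's address; the C equivalents in the comments show that @-(R) corresponds to `**--pp`, walking an array of pointers backward.

```python
class ToyVax:
    """Hypothetical model of VAX-style auto addressing modes (4-byte pointers assumed)."""

    def __init__(self, mem, regs):
        self.mem, self.regs = mem, regs

    def autoinc(self, r, size=4):      # (R)+ : use R, then R += size        (*p++)
        addr = self.regs[r]
        self.regs[r] += size
        return addr

    def autodec(self, r, size=4):      # -(R) : R -= size, then use R        (*--p)
        self.regs[r] -= size
        return self.regs[r]

    def autoinc_deferred(self, r):     # @(R)+ : fetch pointer at R, R += 4   (**pp++)
        return self.mem[self.autoinc(r, 4)]

    def autodec_deferred(self, r):     # @-(R) : R -= 4, fetch pointer at R   (**--pp)
        return self.mem[self.autodec(r, 4)]   # the mode the VAX left out


# Two pointers stored at addresses 100 and 104, pointing at 500 and 600
m = ToyVax({100: 500, 104: 600, 500: 7, 600: 9}, {"R2": 108})
first = m.mem[m.autodec_deferred("R2")]
print(first, m.regs["R2"])   # 9 104
```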
Example of Difficult Analysis: imm16
• How big should an immediate be?
• Easy analysis: examine existing code
– Calculate frequency of various widths
– Analyze tradeoff of using those bits for other purposes
• Problem: the analyzed architecture affects the frequency of different widths
– E.g., Alpha has only 16 bits, so you’ll never see over 16!
– Alternative: look for multi-instruction sequences that effectively use more than 16 bits
» Hard to find (compiler pipeline scheduling separates the instructions)
» Compiler will stand on its head, using sneaky tricks to avoid generating extra instructions
– Need for wider constants depends on architecture
» E.g., MIPS needs them when jumping to shared libraries
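A sketch of both sides of this analysis. `signed_width` and `width_histogram` implement the easy frequency count. `split_hi_lo` shows one reason multi-instruction constants are hard to spot in object code: when a 32-bit constant is built from an upper half plus a sign-extended 16-bit lower half, the upper half must be bumped whenever the lower half's top bit is set, so neither half looks like the original constant. The function names and sample values are illustrative.

```python
from collections import Counter

def signed_width(v):
    """Smallest two's-complement field width (in bits) that can hold v."""
    return (v if v >= 0 else ~v).bit_length() + 1

def width_histogram(immediates):
    """How many immediates need each width: the 'easy analysis'."""
    return Counter(signed_width(v) for v in immediates)

def split_hi_lo(value):
    """Split a 32-bit constant into an upper half and a sign-extended 16-bit lower half.
    If lo's top bit is set, sign extension subtracts 0x10000, so hi is bumped by one."""
    value &= 0xFFFF_FFFF
    lo = value & 0xFFFF
    if lo & 0x8000:
        lo -= 0x1_0000
    hi = ((value - lo) >> 16) & 0xFFFF
    return hi, lo

def rebuild(hi, lo):
    """What the two-instruction sequence computes: hi << 16, then add lo."""
    return ((hi << 16) + lo) & 0xFFFF_FFFF


hi, lo = split_hi_lo(0x1234_8000)
print(hex(hi), lo)   # 0x1235 -32768: the upper half is not 0x1234
```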
Interaction with Compilers
• Nearly all modern code generated by compilers
• Architect must make the compiler’s job easier
– Lots of registers
– Orthogonal instruction set
– Few side effects
– Instructions and addressing modes matched to language constructs
» But do NOT attempt to implement them in detail!
» Primitives are better than “solutions” even when the solutions are correct
– Good support for stack, globals, and pointers
– Support for both compile-time and run-time binding
– Don’t ask the compiler to predict dynamic information (e.g., branch targets)
– Don’t provide features the language can’t express
» Example pro and con: vector architectures
The MIPS64 Architecture
• Extension of MIPS32
• Data path widened to 64 bits
– Still 32-bit instructions
– Still only 32 registers
• Most instructions have a 64-bit version marked by a “D” prefix (e.g., DADD)
MIPS Instruction Formats
I-Type Instruction
Opcode (6) | rs (5) | rt (5) | Immediate (16)
R-Type Instruction
Opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
J-Type Instruction
Opcode (6) | Offset inserted into PC (26)
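A sketch of packing and unpacking the three formats, with field widths taken from the layouts of this slide. The `FIELDS` table and function names are illustrative. As a check against real MIPS encodings: funct 0x20 is `add`, so the R-type example encodes `add $3, $1, $2` (rd = 3, rs = 1, rt = 2), and opcode 8 in the I-type test is `addi`.

```python
# Field layouts: (name, width), most-significant field first; each format totals 32 bits.
FIELDS = {
    "R": [("opcode", 6), ("rs", 5), ("rt", 5), ("rd", 5), ("shamt", 5), ("funct", 6)],
    "I": [("opcode", 6), ("rs", 5), ("rt", 5), ("imm", 16)],
    "J": [("opcode", 6), ("target", 26)],   # target = "offset inserted into PC"
}

def encode(fmt, **values):
    """Pack named fields into a 32-bit word; omitted fields default to 0."""
    unknown = set(values) - {name for name, _ in FIELDS[fmt]}
    if unknown:
        raise ValueError(f"unknown fields for {fmt}-type: {sorted(unknown)}")
    word = 0
    for name, width in FIELDS[fmt]:
        mask = (1 << width) - 1
        word = (word << width) | (values.get(name, 0) & mask)  # negative imm -> two's complement
    return word

def decode(fmt, word):
    """Split a 32-bit word back into its named (unsigned) fields."""
    fields = {}
    for name, width in reversed(FIELDS[fmt]):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


add = encode("R", rs=1, rt=2, rd=3, funct=0x20)
print(f"{add:08x}")   # 00221820
print(decode("R", add))
```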
I-Type Instructions
Opcode (6) | rs (5) | rt (5) | Immediate (16)
• Encodes loads, stores (all widths), and immediate ALU ops
• Also conditional branches (rt unused)
R-Type Instructions
Opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
• Register-register ALU operations
– “funct” encodes the ALU operation: add, sub, etc.
– Opcode chooses operands, special registers, sizes, etc.
– Conditional moves
• Handles special registers, floating point, …
J-Type Instructions
Opcode (6) | Offset inserted into PC (26)
• Jump, jump and link
• Trap, return from exception
MIPS Control Flow
• Unconditional jump substitutes low bits of PC
– NOT addition!
– Exceptionally bad on a 64-bit architecture, where the upper 36 bits stay unchanged
• No built-in stack
– Subroutine call stores the return address in a register
– Callee must save it on the stack if necessary
– Reduces overall cycle time
– Ultra-efficient for leaf functions
• Conditional branches only test against zero
– Complex tests (e.g., <) store Z/NZ result in a register
– We’ve seen how this improves the pipeline
• Conditional moves can eliminate many branches
– Feature of many modern architectures
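A sketch of the jump-target computation, assuming the MIPS rule: the low 28 bits of the target are the 26-bit field shifted left by 2, and every bit above those comes unchanged from the address of the instruction after the jump (PC + 4, the delay slot). Nothing is added, so a jump can only reach the 256 MB region the caller is already in; on a 64-bit machine that leaves 36 bits fixed. The function name is illustrative.

```python
def jump_target(pc, index, width=64):
    """MIPS-style J/JAL target: upper bits from PC + 4, low 28 bits from index << 2."""
    mask = (1 << width) - 1
    region = ((pc + 4) & mask) & ~((1 << 28) - 1)   # keep the upper (width - 28) bits
    return region | ((index & ((1 << 26) - 1)) << 2)


# Every reachable target shares the caller's upper bits: one 256 MB window
pc = 0x1_2345_6780
lowest = jump_target(pc, 0)
highest = jump_target(pc, (1 << 26) - 1)
print(hex(lowest), hex(highest))   # 0x120000000 0x12ffffffc
```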
MIPS Floating Point
• Floating point was originally a coprocessor
⇒ Separate FP registers
– Special instructions to move to/from integer registers
• MIPS64 (but not 32) has paired single operations
– Two SP numbers pass through DP ALU simultaneously
• MIPS64 also has multiply-add in one instruction
– Useful in signal processing (multimedia)
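A sketch of the paired-single idea: one 64-bit FP register holds two single-precision values, and one instruction operates on both halves independently. The register is modeled as a Python int, and the lane order (first argument in the upper 32 bits) is an assumption for illustration, not the MIPS register layout specification.

```python
import struct

def pack_ps(hi, lo):
    """Model a 64-bit FP register holding two singles: `lo` in bits 0-31, `hi` in 32-63."""
    return int.from_bytes(struct.pack("<ff", lo, hi), "little")

def unpack_ps(reg):
    lo, hi = struct.unpack("<ff", reg.to_bytes(8, "little"))
    return hi, lo

def add_ps(a, b):
    """Paired-single add: each lane is added separately and rounded to single when packed;
    nothing carries from one lane into the other."""
    a_hi, a_lo = unpack_ps(a)
    b_hi, b_lo = unpack_ps(b)
    return pack_ps(a_hi + b_hi, a_lo + b_lo)


r = add_ps(pack_ps(1.5, 2.25), pack_ps(0.5, 0.75))
print(unpack_ps(r))   # (2.0, 3.0)
```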
Fallacies and Pitfalls
• PITFALL: Instruction designed to support a feature in some language
– Examples: PDP-11/45 MARK, VAX CALLS, IBM 360 ED/EDMK
– Why is this bad?
» Easy to get wrong (PDP-11 MARK instruction)
» Easy to make inefficient (VAX CALLS)
» Languages evolve, hardware doesn’t
Fallacies and Pitfalls (2)
• FALLACY: Typical programs exist
– We wish!
• PITFALL: Ignoring the compiler
– Designing for better code size based on a bad compiler’s output
– A good compiler can blow your idea out of the water
• FALLACY: Flawed architectures can’t succeed
– Ummm, x86?
– Every architecture has drawbacks
• FALLACY: You (YOU!) can design a flawless architecture
– Always tradeoffs
– Always something new to learn
Summary
• Instruction encoding is important
• Don’t forget to provide what the compiler needs
– This is NOT what you think the compiler needs!
• Addresses will only get wider
• Data will only get wider
– Including characters
• Cleverness to improve bandwidth (e.g., MADD)
• RISC is here to stay