COSC 3P92
Week 8 Lecture slides
It is dangerous to be right when the government is
wrong.
Voltaire (1694 - 1778)
RISC machines
• Reduced instruction set computer (RISC) vs. complex
instruction set computer (CISC) (680x0, IBM 360, ...)
• CISC technology has evolved highly complex
instruction sets, to bridge "semantic gap"
between hardware and software
– simplify compilers
– alleviate software crisis
– improve architecture quality
• But has CISC design gone "over the top"?
• If one looks at the software being executed, it is
typically simple and unsophisticated, and does not
exploit the sophisticated features of CISC
instruction sets.
• Software studies (all values represent %)

Statement     SAL   XPL   Fortran     C   Pascal   Average
Assignment     47    55        51    38       45        47
If             17    17        10    43       29        23
Call           25    17         5    12       15        15
Loop            6     5         9     3        5         6
Goto            0     1         9     3        0         3
Other           5     5        16     1        6         7

Assignments         Vars/Proc           Params/Call
N Terms     %       N Locals    %       N Param     %
0           -       0          22       0          41
1          80       1          17       1          19
2          15       2          20       2          15
3           3       3          14       3           9
4           2       4           8       4           7
>=5         0       >=5        20       >=5         8
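The point of these tables is that most statements are very simple. A minimal sketch of the arithmetic, using only the percentages listed above (the dictionary names and the helper `share_at_most` are illustrative, not part of the original studies):

```python
# Percentages copied from the software-study tables above.
# Keys are counts; the key 5 stands for "5 or more".
TERMS_PER_ASSIGNMENT = {1: 80, 2: 15, 3: 3, 4: 2, 5: 0}
LOCALS_PER_PROCEDURE = {0: 22, 1: 17, 2: 20, 3: 14, 4: 8, 5: 20}
PARAMS_PER_CALL = {0: 41, 1: 19, 2: 15, 3: 9, 4: 7, 5: 8}


def share_at_most(distribution, limit):
    """Sum the percentages of all entries whose count is <= limit."""
    return sum(pct for n, pct in distribution.items() if n <= limit)


print(share_at_most(TERMS_PER_ASSIGNMENT, 2))   # 95
print(share_at_most(LOCALS_PER_PROCEDURE, 4))   # 81
print(share_at_most(PARAMS_PER_CALL, 3))        # 84
```

Read this way, roughly 95% of assignments have at most two terms and 84% of calls pass at most three parameters, which is the motivation for optimising the simple cases.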
• RISC philosophy: create an instruction set that
lets you do the most common computations,
while maximising their efficiency
• To do this, throw away microprogramming, and
aim for instructions which execute in 1 cycle
• RISC chips have many features which are
exportable to contemporary CISC chips; also,
there are points of contention about design.
• There are also chips which seem to have both
CISC and RISC-like qualities.
History

                             IBM 370/168   VAX-11/780   Dorado   iAPX-432
Year                                1973         1978     1978       1982
No. of instructions                  208          303      270        222
Control memory size (Kbit)           420          480      136         64
Instr. size (bits)                 16-48       16-456     8-24      6-321
Machine type                    register     register    stack      stack

                             IBM 801       RISC I       MIPS
Year                                1980         1982     1983
No. of instructions                  120           39       55
Control memory size (Kbit)             0            0        0
Instr. size (bits)                    32           32       32
Machine type                    register     register  register
History
• The IBM 801 project (1975) was designed with the
following principles:
– choose an instruction set to be a good target for a compiler
– provide a hardware engine that can execute its instructions in
one machine cycle
– design the storage hierarchy so that the control engine does
not have to wait for storage access
– base the entire system design on an optimizing compiler
RISC vs CISC: characteristics

   RISC                                 CISC
1. Simple instns taking 1 cycle         Complex instns taking multiple cycles
2. Only LOADs, STOREs access memory     Any instn may access memory
3. Designed around pipeline             Designed around instn set
4. Instns executed by h/w               Instns interpreted by microprogram
5. Fixed format instns                  Variable format instns
6. Few instns and modes                 Many instns and modes
7. Complexity in the compiler           Complexity in the microprogram
8. Multiple register sets               Single register set
RISC Design
* Sacrifice everything to reduce the data path
cycle time.
* Microcode is not magic.
• Five steps:
1. Find key operations in intended applications.
2. Design optimal data path for these operations.
3. Design instructions which perform these operations on this
data path.
4. Add new instructions if they don't slow down machine
5. Repeat for other resources (cache, MMU,...)
Design Issues
• Single-cycle instructions
– key RISC characteristic
– rapid execution of simple instructions
– complex instns will require more compiled code
• Only LOAD and STORE instns access memory
– permits pipelining efficiency
– not as many addressing modes
Load signed byte
Load unsigned byte
Load signed halfword
Load unsigned halfword
Load word
Store byte
Store halfword
Store word
Fig. 8.5 Load and Store instructions for a typical 32-bit machine
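The signed and unsigned load variants differ only in how a byte or halfword is widened to a full 32-bit register. A minimal sketch of that difference, assuming a big-endian byte order and using hypothetical helper names `load` and `store` (neither is taken from any real instruction set):

```python
def load(memory, address, size, signed):
    """Read `size` bytes at `address` and widen them to a 32-bit register value.

    A signed load copies the top bit of the loaded value into the upper
    register bits (sign extension); an unsigned load fills them with zeros.
    """
    raw = memory[address:address + size]
    value = int.from_bytes(raw, byteorder="big", signed=signed)
    return value & 0xFFFFFFFF  # register contents as a 32-bit pattern


def store(memory, address, size, value):
    """Write the low `size` bytes of a register value to memory."""
    low_bits = value & ((1 << (8 * size)) - 1)
    memory[address:address + size] = low_bits.to_bytes(size, "big")


memory = bytearray(8)
store(memory, 0, 1, 0xFF)                      # store byte
print(hex(load(memory, 0, 1, signed=True)))    # 0xffffffff (load signed byte)
print(hex(load(memory, 0, 1, signed=False)))   # 0xff (load unsigned byte)
```

There is no signed or unsigned variant of "load word" or of the stores: a word already fills the register, and a store simply truncates.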
Design Issues
• Maximal pipelining
– permits n instructions in n cycles

Cycle                    1  2  3  4  5  6  7  8  9 10
Instruction fetch        1  2  L  4  5  6  S  8  9 10
Instruction execution       1  2  L  4  5  6  S  8  9
Memory reference                     L           S
Fig. 8.6 A pipelined RISC machine with delayed LOAD, L, and STORE, S.
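The timing in Fig. 8.6 can be sketched in a few lines. This assumes each instruction is fetched in one cycle and executed in the next, and that a LOAD or STORE makes its memory reference one cycle after that; the function name `timeline` and the stage labels are illustrative only.

```python
def timeline(instructions):
    """Return {stage: {cycle: label}} for a pipeline fetching one instruction per cycle."""
    stages = {"fetch": {}, "execute": {}, "memory": {}}
    for index, label in enumerate(instructions):
        cycle = index + 1
        stages["fetch"][cycle] = label
        stages["execute"][cycle + 1] = label
        if label in ("L", "S"):
            # LOAD and STORE need one further cycle for the memory reference.
            stages["memory"][cycle + 2] = label
    return stages


program = ["1", "2", "L", "4", "5", "6", "S", "8", "9", "10"]
stages = timeline(program)
for name, row in stages.items():
    cells = [row.get(cycle, ".") for cycle in range(1, 11)]
    print(f"{name:8}", " ".join(f"{cell:>2}" for cell in cells))
```

The LOAD fetched in cycle 3 only delivers its data in cycle 5, while instruction 4 is already executing, which is the two-cycle memory access problem.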
• Problems:
(i) memory accesses take 2 cycles
(ii) jumps ruin pipeline
• Solutions:
For (i): - hardware interlock (wait until the access completes)
- no interlock: the next instruction sees the old
register value, so the compiler must schedule around it
Design Issues
– For (ii): need to optimise pipeline (instruction order) at compile
time.
• No Micro-code
– eliminate interpretation, max. data path efficiency.
– frees a lot of chip space
• Fixed format instructions
– simple to decode
Bits     7       1     5       5      1     13
       OPCODE    C    DEST   SOURCE   I   OFFSET

I: 0 = Not immediate, 1 = Immediate
C: 0 = Do not set condition codes, 1 = Set condition codes
Fig. 8-7. RISC I basic instruction format.
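A fixed format is simple to decode because every field sits at a known bit position. The sketch below packs and unpacks the six fields of Fig. 8-7 (7 + 1 + 5 + 5 + 1 + 13 = 32 bits). It assumes the fields are laid out from the most significant bit in the order shown and calls the one-bit immediate flag `i`; the function names `encode` and `decode` are illustrative.

```python
# (field name, width in bits), most significant field first (an assumption).
FIELDS = [("opcode", 7), ("c", 1), ("dest", 5), ("source", 5), ("i", 1), ("offset", 13)]


def encode(**values):
    """Pack the named fields into one 32-bit instruction word."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def decode(word):
    """Split a 32-bit instruction word back into its fields."""
    fields = {}
    shift = 32
    for name, width in FIELDS:
        shift -= width
        fields[name] = (word >> shift) & ((1 << width) - 1)
    return fields


word = encode(opcode=0x12, c=1, dest=3, source=4, i=1, offset=100)
print(f"{word:032b}")
print(decode(word))
```

Decoding needs only shifts and masks, with no dependence on the opcode, which is what lets the hardware extract all fields in parallel.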
• Reduced instruction set
– because of simple instruction format
– with RISC I, offset can double as operand, yielding 3 operand
instructions.
– to effect complex addressing, need to generate code to
explicitly construct the addressing
• More compile-time complexity
– compiler technology is the reason that RISC technology is
feasible.
– compiled code is executed directly, so compiler must account
for delayed instructions, register usage,...
– lack of sophisticated instructions (e.g. multiply) adds
compiler complexity
• Multiple register sets
– RISC chips have lots of registers (100's!)
– techniques for organizing them.
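To illustrate the multiply case mentioned above: on a machine without a multiply instruction, the compiler (or a run-time routine) has to build the product from simpler operations. A minimal sketch of the classic shift-and-add method, restricted to non-negative integers; the function name `multiply` is illustrative and this is not the code any particular RISC compiler emits.

```python
def multiply(a, b):
    """Multiply two non-negative integers using only shift, add and bit test."""
    if a < 0 or b < 0:
        raise ValueError("this sketch handles non-negative operands only")
    product = 0
    while b:
        if b & 1:          # lowest bit of the multiplier set?
            product += a   # add the current shifted multiplicand
        a <<= 1            # multiplicand * 2
        b >>= 1            # move on to the next multiplier bit
    return product


print(multiply(13, 11))    # 143
```

Each loop iteration corresponds to a handful of single-cycle instructions, so a multiply costs many instructions of compiled code rather than one complex instruction.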
Register Usage
• RISC organises registers to minimize (remove) memory
accesses during procedure calls
• memory traffic in CISC is largely caused during
procedure calling
• Need to maximize pipeline, minimise memory access.
– overlapping register window organisation

R0 - R7      Global Variables
R8 - R15     Incoming Parameters
R16 - R24    Local Variables
R25 - R31    Outgoing Parameters
(32-bit words)
Fig. 8.8 The 32 registers visible to a program at any instant of time.
• CWP - current window pointer
• Output and input register sets double up in
usage during procedure calls
• No stack needed UNLESS
– too many parameters
– parameters are too large in value
– too many nested calls cause all registers to be
used
• ... in which case standard stack techniques are
used.
• Remember: most programs are simple!
• Philosophical point:
– Registers vs. memory?
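The way output and input registers "double up" across a call can be sketched as an address mapping controlled by the current window pointer. The sketch below assumes, for simplicity, four groups of eight registers (globals, incoming, locals, outgoing) and eight windows; these sizes, the class name and the method names are assumptions of the sketch, and window overflow (too many nested calls) is deliberately not handled.

```python
NUM_WINDOWS = 8     # assumed number of windows on the chip
GLOBALS = 8         # R0-R7 are shared by every procedure
WINDOW_STEP = 16    # incoming + local registers claimed per call


class WindowedRegisterFile:
    def __init__(self):
        self.regs = [0] * (GLOBALS + NUM_WINDOWS * WINDOW_STEP)
        self.cwp = 0  # current window pointer

    def _physical(self, r):
        """Map a visible register number (0-31) to a physical register index."""
        if not 0 <= r < 32:
            raise ValueError("only R0-R31 are visible")
        if r < GLOBALS:
            return r
        offset = self.cwp * WINDOW_STEP + (r - GLOBALS)
        return GLOBALS + offset % (NUM_WINDOWS * WINDOW_STEP)

    def read(self, r):
        return self.regs[self._physical(r)]

    def write(self, r, value):
        self.regs[self._physical(r)] = value

    def call(self):
        # The caller's outgoing registers become the callee's incoming ones.
        self.cwp = (self.cwp + 1) % NUM_WINDOWS

    def ret(self):
        self.cwp = (self.cwp - 1) % NUM_WINDOWS


rf = WindowedRegisterFile()
rf.write(24, 42)      # caller puts a parameter in its first outgoing register
rf.call()
print(rf.read(8))     # 42: the callee sees it as its first incoming register
```

No data is copied on the call: only the window pointer changes, which is why parameter passing needs no memory traffic as long as the windows do not run out.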
RISC vs CISC
• Benchmarking computers is difficult
– effects of hardware organisation (I/O, memory mgmt, ...)
– different chip technologies: ECL (emitter coupled logic) vs
MOS
– operating system
– language effects: C vs Prolog vs COBOL vs ...
– type of program: recursive vs iterative
• Overlapping register windows:
– not part of MIPS chip
– could it be exported to CISC chips too?
CISC vs RISC
• Compiler writing for RISC
• Delayed JUMP

Address   Normal Branch   Delayed Branch   Optimized Delayed Branch
100       LOAD  X, A      LOAD  X, A       LOAD  X, A
101       ADD   1, A      ADD   1, A       JUMP  105
102       JUMP  105       JUMP  106        ADD   1, A
103       ADD   A, B      NO-OP            ADD   A, B
104       SUB   C, B      ADD   A, B       SUB   C, B
105       STORE A, Z      SUB   C, B       STORE A, Z
106                       STORE A, Z
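The three code sequences can be checked with a toy interpreter. In the sketch below a delayed jump always executes the instruction that follows it (the delay slot) before control transfers. The tuple encoding, the function name `run` and the operand meanings (`LOAD X, A` reads memory location X into register A, `STORE A, Z` writes register A to location Z) are assumptions made for illustration.

```python
def run(program, memory, delayed):
    """Execute `program` (address -> (op, arg1, arg2)) and return the final memory."""
    regs = {"A": 0, "B": 0, "C": 0}
    mem = dict(memory)
    pc = min(program)
    pending = None  # jump target waiting for its delay slot to finish
    while pc in program:
        op, x, y = program[pc]
        next_pc = pc + 1
        if pending is not None:
            # This instruction is the delay slot: afterwards, take the jump.
            next_pc, pending = pending, None
        if op == "LOAD":
            regs[y] = mem[x]
        elif op == "ADD":
            regs[y] += regs[x] if isinstance(x, str) else x
        elif op == "SUB":
            regs[y] -= regs[x] if isinstance(x, str) else x
        elif op == "STORE":
            mem[y] = regs[x]
        elif op == "JUMP":
            if delayed:
                pending = x
            else:
                next_pc = x
        pc = next_pc
    return mem


LOAD, ADD1 = ("LOAD", "X", "A"), ("ADD", 1, "A")
ADDAB, SUBCB, STORE = ("ADD", "A", "B"), ("SUB", "C", "B"), ("STORE", "A", "Z")
NOOP = ("NOOP", None, None)

normal = {100: LOAD, 101: ADD1, 102: ("JUMP", 105, None),
          103: ADDAB, 104: SUBCB, 105: STORE}
delayed_branch = {100: LOAD, 101: ADD1, 102: ("JUMP", 106, None), 103: NOOP,
                  104: ADDAB, 105: SUBCB, 106: STORE}
optimized = {100: LOAD, 101: ("JUMP", 105, None), 102: ADD1,
             103: ADDAB, 104: SUBCB, 105: STORE}

print(run(normal, {"X": 5}, delayed=False)["Z"])          # 6
print(run(delayed_branch, {"X": 5}, delayed=True)["Z"])   # 6
print(run(optimized, {"X": 5}, delayed=True)["Z"])        # 6
```

All three versions store the same value, but the optimized one fills the delay slot with useful work (`ADD 1, A`) instead of a NO-OP, so it is one instruction shorter than the plain delayed version.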
• Compilers need to account for:
– memory delays
– jump delays
– register allocation
– simple instruction set
• RISC compilers need to make the best use of
registers
– preferable to use all the regs in a single window
(and not memory)
• Optimising compilers can do data flow analysis
on programs to see when variables are "active"
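A minimal sketch of the idea for straight-line code: walking the instructions backwards, a variable is "active" (live) from a use back to the definition that produced it. The function name `live_sets` and the `(destination, sources)` encoding are illustrative; real compilers extend this to branches and loops.

```python
def live_sets(instructions, live_out=frozenset()):
    """Return, for each instruction, the set of variables live just before it.

    `instructions` is a list of (destination, [sources]) pairs;
    `live_out` holds the variables still needed after the last instruction.
    """
    live = set(live_out)
    result = []
    for dest, sources in reversed(instructions):
        live.discard(dest)      # the old value of dest is dead above this point
        live.update(sources)    # operands must be available before this point
        result.append(frozenset(live))
    result.reverse()
    return result


code = [
    ("a", ["x"]),        # a = x + 1
    ("b", ["a", "c"]),   # b = a + c
    ("z", ["a"]),        # z = a
]
for (dest, sources), live in zip(code, live_sets(code, live_out={"z"})):
    print(dest, "<-", sources, "live before:", sorted(live))
```

Here at most two variables are live at any point, so two registers suffice, and `b` is never live after its definition, so the instruction computing it could be removed.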
Example 1: Pentium II (CISC)
• Recall:
– instruction formats [5.13]
– addressing modes [5.26]
• Instruction set: [5.33]
• CISC instruction set
– design determined by backward compatibility
– superscalar microprocessor tries to “deconstruct” CISC instns
into pipelineable microinstructions
– erratic variants for instn type, register usage, addressing
modes
– [reference pages]
Example 2: UltraSparc II
• Recall: formats [5.14]
– addressing: either immediate or register
– only load, store access memory
• [5.34]
Example 4: MIPS R4000
• Microprocessor without Interlocking Pipe Stages
• similarities with UltraSPARC:
– 64 bit design
– LOAD/STORE architecture
– 2^64 byte-addressable memory
– paging, coprocessors, ...
• differences:
– configurable to either Big- or Little-endian (byte ordering in
words)
– no register windows (a single set of general registers)
– no condition codes: results of tests saved in regs
– 8 stage pipeline
• Generally, MIPS does not give the programmer as
orthogonal an instruction set as SPARC, for the
sake of hardware efficiency.
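The big- versus little-endian choice mentioned above only affects the order in which the bytes of a word are laid out in memory. A short illustration using Python's standard integer conversions (the sample value is arbitrary):

```python
value = 0x0A0B0C0D

big = value.to_bytes(4, byteorder="big")        # most significant byte first
little = value.to_bytes(4, byteorder="little")  # least significant byte first

print(big.hex())      # 0a0b0c0d
print(little.hex())   # 0d0c0b0a

# Reading the same bytes with the wrong convention yields a different word.
print(hex(int.from_bytes(big, byteorder="little")))   # 0xd0c0b0a
```

A chip that can be configured either way can exchange data with machines of both conventions without swapping bytes in software.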
• No window file:
– pro:
» with saved space, can fit MMU, cache controller, MUL/DIV
on chip
» removes overhead of saving 500 regs when
multitasking
» registers not fixed in purpose
– con:
» more (software) overhead in procedure calling
Summary
• Pentium II:
– 2-address, 32-bit CISC
– irregular
• UltraSPARC
– 3-address, 64-bit RISC
– 128-bit bus
– somewhat complex formats
• MIPS
– another 64-bit RISC

SPARC               vs.   MIPS
orthogonal instns         H/W optimised
register windows          none
software MUL/DIV          hardware MUL/DIV
condition codes           none
Itanium
• P4 has severe problems (IA-32)
– CISC
– 2 Address Memory Oriented ISA
– Small Register Set
» 6 registers
– Lack of Regs requires internal renaming of Regs,
» requires out of order execution to compensate for memory
reference waits
» Hence expensive h/w
– Deep pipeline (result of out of order execution),
» Flushing becomes very expensive
– Speculative execution causing traps to set.
• A large portion of the P4 is devoted to dealing
with the problems of its CISC architecture.
Itanium.
• Itanium (EPIC: a better RISC)
– Has many functional units, each able to work in parallel.
– 3-address RISC.
– Much of the instruction work (reordering etc.) is moved to the
compiler.
– Parallelism of the h/w is known by the compiler
» Takes advantage of the h/w to produce efficient code.
– Simple memory model: 2^64 bytes
– 128 registers, reducing memory references
» 32 static
» 96 for a register stack (like the register windows of the
UltraSPARC III).
– Procedure calls put the call stack on the register stack
» Parameters are placed in registers as part of the call frame.
» Local variables are allocated on the stack by the procedure.
Itanium..
– 128 Floating point registers
– 64 Predicate registers
» Used for predication (conditional execution of instructions).
– 8 branch registers
– 128 special-purpose
» Inter application communication.
Itanium…
• Predication (removing branches)
(a) An if statement.
(b) Generic assembly code for a).
(c) A conditional instruction.
• Branches are removed by allowing all instructions
to execute.
– A condition sets a predicate bit
– An instruction writes back its result only if its predicate is true.
Itanium….
• CMOVZ executes (performs its move) only if R1 is 0
• CMOVN executes only if R1 is not 0
• This means:
– Executing a few instructions is cheaper than a branch.
– Most branches can be eliminated.
» No pipeline problems.
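A small sketch of how a pair of conditional moves replaces an if/else. The Python functions `cmovz` and `cmovn` model the two instructions named above; their operand order (destination, source, condition register) and the register names R2 to R4 are assumptions of the sketch. The `if` inside each helper stands in for the hardware's decision whether to write the result back, not for a program branch.

```python
def cmovz(regs, dest, src, cond):
    """CMOVZ: copy regs[src] into regs[dest] only if regs[cond] is zero."""
    if regs[cond] == 0:
        regs[dest] = regs[src]


def cmovn(regs, dest, src, cond):
    """CMOVN: copy regs[src] into regs[dest] only if regs[cond] is not zero."""
    if regs[cond] != 0:
        regs[dest] = regs[src]


def with_branch(r1, r3, r4):
    """Reference version: if (R1 == 0) R2 = R3; else R2 = R4;"""
    if r1 == 0:
        r2 = r3
    else:
        r2 = r4
    return r2


def with_conditional_moves(r1, r3, r4):
    """Same computation as two straight-line conditional moves, no jump."""
    regs = {"R1": r1, "R2": 0, "R3": r3, "R4": r4}
    cmovz(regs, "R2", "R3", "R1")
    cmovn(regs, "R2", "R4", "R1")
    return regs["R2"]


for r1 in (0, 7):
    print(with_branch(r1, 10, 20), with_conditional_moves(r1, 10, 20))
```

Both instructions always flow through the pipeline in order; exactly one of them has an effect, so there is no jump for the pipeline to mispredict.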
Itanium…..
• Predicate registers come in pairs,
– E.g. if P4 is false then P5 is true, and vice versa.
– Any instruction can be predicated.
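The pairing can be sketched as a compare that sets two complementary predicate bits, followed by instructions that each name the predicate guarding them. The helper names `cmpeq` and `predicated`, the register names and the add/subtract example are hypothetical and are not Itanium syntax.

```python
def cmpeq(preds, p_true, p_false, a, b):
    """Set a predicate pair: p_true gets (a == b), p_false gets the complement."""
    preds[p_true] = (a == b)
    preds[p_false] = not preds[p_true]


def predicated(preds, p, regs, dest, value):
    """Write `value` to regs[dest] only if predicate p is set; otherwise do nothing."""
    if preds[p]:
        regs[dest] = value


def run(r1, r2):
    """if (R1 == R2) R3 = R1 + R2; else R3 = R1 - R2;  written without a jump."""
    regs = {"R1": r1, "R2": r2, "R3": 0}
    preds = {}
    cmpeq(preds, "P4", "P5", regs["R1"], regs["R2"])
    # Both instructions are issued; only the one whose predicate is true writes back.
    predicated(preds, "P4", regs, "R3", regs["R1"] + regs["R2"])
    predicated(preds, "P5", regs, "R3", regs["R1"] - regs["R2"])
    return regs["R3"], preds


print(run(3, 3))   # (6, {'P4': True, 'P5': False})
print(run(5, 2))   # (3, {'P4': False, 'P5': True})
```

Because the two predicates are always complementary, the "then" and "else" instructions can be interleaved freely by the compiler without changing the result.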
The end