Transcript Lecture 14

Lecture 8: Reduced Instruction Set Computer
CS311-Computer Organization
RISC
Lecture 8 - 1
Lecture 8: RISC
In this lecture, we will study
• Program execution characteristics
• RISC Philosophy
– Make the most frequently executed statement fast
» Functional, Transfer instructions
» Simple, small number of fixed format instructions
» Large register file
– Make the most time consuming statements fast
» Procedure Call and Return instructions
» Large register file
• Large Register File
• Overlapping Register Windows
– Linear and Circular organization of ORWs
• Ultimate RISC
Instruction Execution Characteristics:
Type of Operations
Relative Dynamic Frequencies of statements in HLL programs (%)

Language  Workload    Assignment  Loop  Call  If  Goto  Others
PASCAL    Scientific  74          4     1     20  2     -
FORTRAN   Student     67          3     3     11  9     7
PASCAL    Systems     45          5     15    29  -     6
C         Systems     38          3     12    43  3     1
SAL       Systems     42          4     12    36  -     6
What type of statements is most frequent?
– Assignment statements dominate
» Functional instructions and Transfer instructions
» Movements of data must be made simple, thus fast
– Conditional Statements(if and loop together)
» Instructions with Control function
» Sequence control mechanism is important
Instruction Execution Characteristics:
Time Consumed by Statements
Time Consumed, weighted by number of machine instructions and memory references (%)

            Dynamic Occur    Machine Instr Wt   Memory Ref Wt
            PASCAL   C       PASCAL   C         PASCAL   C
Assignment  45       38      13       13        14       15
Loop        5        3       42       32        33       26
Call        15       12      31       33        44       45
If          29       43      11       21        7        13
Goto        -        3       -        -         -        -
Others      6        1       3        1         2        1
Machine instruction weighted
= [Average No. of machine instructions / Statement] x [Frequency of Occurrence]
Memory reference weighted
= [Average No. of memory references / Statement] x [Frequency of Occurrence]
Most time consuming statement is procedure CALL/RETURN
Instruction Execution Characteristics:
Type of Operands
Dynamic Frequencies of Occurrence (%)

                  PASCAL  C    Average
Integer Constant  16      23   20
Scalar Variable   58      53   55
Array/Structure   26      24   25
Majority of references to scalar
– 80% are local to a procedure
– References to arrays/structure require index or pointer
Locations of operands(Average per instruction)
– 0.5 operands in memory
– 1.4 operands in registers
Instruction Execution Characteristics:
Procedure Calls
• Two most significant aspects in implementing this operation
– Number of parameters
– Depth of nesting
• Statistics on Number of Parameters
– 98% of dynamically called procedures were passed fewer than 6
parameters
– 92% of them used fewer than 6 local scalar variables
CALL SUB(X1, X2, X3)
parameters
SUB(A, B, C)
Multiple Register Sets
Multiple register sets:
- Assume that we have several sets of registers, so that each procedure can use a different set
- Saves some time in procedure CALL/RETURN, which simply changes the R set pointer value
[Figure: the R set pointer selects one of the register sets Set 0, Set 1, Set 2, ..., Set n-1]
Instruction Execution Characteristics:
Depth of Procedure Nesting
Procedure Nesting and Register Set Window
[Figure: nesting depth plotted against time t as calls and returns occur, with the register set window covering the current range of depths]
A nesting depth of 5 can be served by a register set window of size 5 without using memory
When nesting depth > 5
- A movement of more than 5 levels in either direction (CALL/RETURN) requires shifting the register set window (down/up)
Shifting the register set window: the contents of one register set must be saved in memory so that the set can be used by the new procedure
Statistics: with a window depth of 8, a shift is needed on fewer than 1% of calls and returns
Complex Instruction Set
Computer(CISC)
Design Philosophy of CISC
• Distinction between Architecture and Implementation via microprogrammed control unit
• Richer Instruction Set
– Performance of instruction - powerfulness
– Reduce Semantic Gap for programming easiness
– Simplifying compiler functions
• Larger Microprogram
– Moving hardware functions to micro-code
– Moving software functions to micro-code
• Parallelism
– Pipelining
– Multiple function units, processors, computers
• NO ATTENTION ON INSTRUCTION FREQUENCY, TIME-CONSUMING INSTRUCTIONS, etc
RISC Philosophy(1):
Make the Most Frequent Statements
Execute Fast
The most frequent statements are assignment statements, and the compiler translates each of them into a set of Functional and/or Transfer instructions. Functional and Transfer instructions therefore need to execute fast.
Instruction Cycle of Functional Instruction or Transfer Instruction

Phase    Action                         How to make it fast
I-F(M)   Read instr. from M             Short instruction
I-P      Decode / effective addr        Fixed instr. format, simple addr. modes
O-F(M)   Read opd from M                Have operands in registers
E        Perform operation              Cannot do anything about it with an instr. set

Improved Architecture
- Pipelined Execution
Assignment Statements
• To make the Instruction Fetch fast
– Short OP-code part: Small number of instructions in the instruction set
– Short Operand Address part: Make the operands in the registers
instead of M
• To make the Instruction Preparation fast
– Fixed length instruction
– Fixed format instruction
– Simple addressing modes
• To make the Operand Fetch fast
– Make the operands available from registers instead of memory
– Needs a large register file
• To make the Instruction Execution fast
– Multiple register set; Overlapping MRS
– Instruction execution pipeline
RISC Philosophy(2):
Make the Most Time-Consuming
Statements Execute Fast
Procedure Call and Return
CALL SUB(X1, X2, X3)
SUB(A, B, C)
Methods of passing Parameters
• Through memory
– Parameters are stored in memory locations that are accessible to both the calling and the called procedure
– Execution of CALL and RETURN instructions is very slow due to the memory accesses, especially when there are many parameters to pass
• Through registers
– Parameters are stored in the registers in the CPU
– The calling procedure needs to save in memory the registers that are not used for passing parameters. This results in many memory accesses and makes these instructions slow.
Time Out
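• An old woman was sitting with her cat, polishing a dusty lamp.
• Just then a little genie popped out of the lamp and told the old woman to make three wishes.
• The old woman quickly said, "I want to be rich, I want to be young and beautiful, and I wish my cat would turn into a handsome prince."
• There was a puff of smoke and a bang, and the old woman became young and beautiful, with gold and jewels piled up all around her. The cat had vanished, and in its place stood a dashing prince with his arms open wide. The rejuvenated woman rushed into his embrace.
• The prince whispered softly in her ear,
• "Don't you regret having had me neutered back when I was a cat?"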
CISC and RISC
                           IBM         VAX      Intel   Berkeley   IBM
                           S/360-168   11-780   8086    RISC I     801
Year developed             73          78       78      81         80
No. of instructions        208         303      133     31         120
Instruction length (bits)  16-48       16-456   8-32    32         32
Addressing modes           4           22       6       3          5
No. of GPR                 16          16       4       138        32
CM capacity                420Kb       480Kb    -       0          0
Cache                      64Kb        64Kb             0          0
RISC
• A limited and simple instruction set
• A large number of GPR(Register File)
• An emphasis on optimizing the instruction pipeline
Large Register File
Quick access to operands is desirable
- Assignment Statements rely on Functional and Transfer
Instructions
- Functional Instructions heavily rely on registers
- Frequency of Transfer Instructions depends on the
number of registers in the register file
If the number of registers is small, a strategy is needed to keep the
most frequently accessed operands in registers and so minimize
Register-Memory traffic
- Software approach
Maximize register usage by compiler
(Requires sophisticated program analysis)
- Hardware approach
More registers in the register file
Register Window
• Fact
– Statistically, most operand references are to local scalars - 80%
– Variables local to a procedure cannot be accessed by other procedure(s)
• Problem
– The set of local variables changes with each procedure CALL/RETURN
– CALL/RETURN occurs frequently
– Parameters need to be passed around
• Observations
– Statistically, procedures have few parameters(<6) and few local variables(<6)
– Statistically, depth of procedure activation fluctuates within a relatively narrow range(<8)
• Solution
– Multiple small sets of registers
– Each set is assigned to a different procedure
– Windows for adjacent procedures overlap to allow parameter passing
Multiple Register Set
[Figure: the Register Set Pointer selects one of the register sets Set 1, Set 2, Set 3, ..., Set m]
Each Register Set is assigned to a different procedure
- Size of a Register Set is equal to the size of a window
- Parameters need to be copied between the calling and the called procedure's Register Sets
- Requires register move instructions
Overlapping Register
Window
When the Register Sets are implemented in a large Register File, each
Register Set is called a Register Window.
Overlapping Register Window
- Portions of register windows overlap for passing parameters
- At any time only one window is visible
- No need for moving information for parameter passing
[Figure: Window i (Procedure i) and Window i+1 (Procedure i+1) each consist of Parameter Registers, Local Registers, and Temporary Registers. The Temporary Registers of Window i and the Parameter Registers of Window i+1 are the same physical registers; parameters are exchanged through them on CALL and RETURN.]
How about global variables?
Global Variables
Global Variables are commonly accessible by all the procedures
• Assign to memory locations by compiler
– Straightforward, but inefficient for frequently accessed
global variables because of the frequent memory accesses
• Set aside a set of Global Variable registers
– Available to all procedures
– Unified register numbering system to simplify instruction
format
– e.g.
R0 ~ R7: Global
R8 ~ R13: Current window
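A minimal sketch of how a unified register numbering could map onto overlapping windows. It assumes the Berkeley RISC sizes quoted elsewhere in this lecture (10 global, 6 parameter, 10 local, and 6 temporary registers, 138 in total, which implies 8 windows) rather than the R0 ~ R13 example above. The name `physical_register` and the circular wrap-around are illustrative assumptions, not something the slides specify.

```python
# Assumed sizes: Berkeley RISC figures quoted in this lecture.
N_GLOBAL, N_PARAM, N_LOCAL, N_TEMP = 10, 6, 10, 6
N_WINDOWS = 8                      # derived: 10 + 8 * (10 + 6) = 138 physical registers
STRIDE = N_LOCAL + N_TEMP          # registers a window does not share with its caller
N_PHYSICAL = N_GLOBAL + N_WINDOWS * STRIDE

def physical_register(window, reg):
    """Map logical register `reg` of `window` to a physical register index."""
    n_logical = N_GLOBAL + N_PARAM + N_LOCAL + N_TEMP
    if not 0 <= reg < n_logical:
        raise ValueError("logical register out of range")
    if reg < N_GLOBAL:
        return reg                 # globals are shared by every window
    offset = window * STRIDE + (reg - N_GLOBAL)
    return N_GLOBAL + offset % (N_WINDOWS * STRIDE)   # wrap around circularly

# The caller's first temporary register and the callee's first parameter
# register resolve to the same physical register.
first_temp = N_GLOBAL + N_PARAM + N_LOCAL
print(physical_register(0, first_temp), physical_register(1, N_GLOBAL))
```

Because the temporaries of one window and the parameters of the next share physical registers, a CALL only has to change the window number; no register move is needed to pass parameters.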
Linear Organization
of Register Windows
[Figure: linear layout of the Physical Register File (registers 0 to n-1). The Global Registers supply logical registers 0 to p-1 for every procedure; Set 1, Set 2, Set 3, ... each supply logical registers p to m-1.]
Circular Organization
of Register Windows
An n-window register file accommodates n-1 procedure calls
[Figure: windows W0 to W5 arranged in a circle. CWP marks the current window and SWP the save/restore boundary; CWP moves one way on Call and the other way on Return, and SWP moves on Save and Restore.]
Procedure Call:
  CWP ← CWP + 1 (current window pointer)
  if CWP = SWP (save window pointer)
  then interrupt, save Window(SWP), SWP ← SWP + 1
  Load temporary registers with parameters which must be passed down
  Call proceeds
Return:
  CWP ← CWP - 1
  if CWP = SWP
  then interrupt, restore called procedure's Window(SWP), SWP ← SWP - 1
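The effect of the circular organization on memory traffic can be sketched with a small counting model. This is only an illustration under stated assumptions: it tracks how many windows are resident instead of modelling the CWP and SWP pointers themselves, it reads "n windows accommodate n-1 procedure calls" as "at most n-1 activations are resident at once", and the names `WindowFile` and `run` are hypothetical.

```python
class WindowFile:
    """Counts window saves and restores for a circular register window file."""

    def __init__(self, n_windows):
        self.capacity = n_windows - 1  # assumed reading of the slide's n-1 rule
        self.resident = 1              # activations held in registers (current one included)
        self.saved = 0                 # activations whose window was saved to memory
        self.spills = 0                # number of window saves (overflow)
        self.restores = 0              # number of window restores (underflow)

    def call(self):
        if self.resident == self.capacity:
            # Register file is full: the oldest window is saved to memory.
            self.saved += 1
            self.spills += 1
        else:
            self.resident += 1

    def ret(self):
        if self.resident == 1:
            if self.saved == 0:
                raise RuntimeError("return from the outermost procedure")
            # The caller's window is no longer in registers: restore it.
            self.saved -= 1
            self.restores += 1
        else:
            self.resident -= 1

def run(trace, n_windows=8):
    """Replay a list of "call"/"ret" events and return (spills, restores)."""
    wf = WindowFile(n_windows)
    for op in trace:
        if op == "call":
            wf.call()
        else:
            wf.ret()
    return wf.spills, wf.restores

print(run(["call"] * 5 + ["ret"] * 5))   # shallow nesting stays in registers
print(run(["call"] * 7 + ["ret"] * 7))   # one level too deep for 8 windows
```

Programs whose nesting depth stays within a narrow band rarely reach the save or restore branch, which is the point of the statistics quoted earlier in the lecture.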
Code Size
• Smaller programs
– Program takes less memory space
– Smaller program improves performance
» Fewer instructions
» Fewer bytes to fetch
» In a paging environment, the program occupies fewer pages, which
reduces page faults
• CISC
– Smaller number of instructions in the program (the program may
be shorter, but does not necessarily take less space)
Example
CISC
  Instruction format: 8 | 4+12 | 4+12 | 4+12
  Memory Traffic
    Instruction:    56 bits
    Data:           32 x 3 = 96 bits
    Total MB used:  56 + 96 = 152 bits
RISC
  Instruction format: 8 | 4 | 4 | 12
    LD   Rb ← B
    LD   Rc ← C
    ADD  Ra ← Rb + Rc
    ST   Ra → A
  Memory Traffic
    Instruction:    112 bits
    Data:           96 bits
    Total MB used:  112 + 96 = 208 bits
CISC: More instructions in the instruction set
      Longer OP-code
RISC: More chances of storing intermediate results in registers
      Less use of LD/ST
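The traffic figures in this example can be checked with a few lines of arithmetic. The sketch below assumes the field widths used in this lecture (8-bit OP-code, 4-bit register address, 12-bit memory address, 32-bit data) and a single fixed 28-bit format for all four load/store-style instructions; the function names are illustrative.

```python
OPCODE, REG, MEM_ADDR, DATA = 8, 4, 12, 32   # field widths in bits

def memory_to_memory():
    """One three-address instruction whose operands all live in memory."""
    instr_bits = OPCODE + 3 * (REG + MEM_ADDR)   # 8 + 3 * 16 = 56
    data_bits = 3 * DATA                          # read B, read C, write A
    return instr_bits, data_bits

def load_store():
    """LD, LD, ADD, ST, each in the fixed format 8 + 4 + 4 + 12 = 28 bits."""
    instr_bits = 4 * (OPCODE + REG + REG + MEM_ADDR)
    data_bits = 3 * DATA                          # two loads and one store
    return instr_bits, data_bits

for name, fn in (("CISC", memory_to_memory), ("RISC", load_store)):
    i, d = fn()
    print(f"{name}: instruction {i} bits, data {d} bits, total {i + d} bits")
```

For a single statement the load/store version moves more bits. The advantage only appears when intermediate results stay in registers across several statements, so that the LD and ST instructions are not repeated.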
Characteristic of RISC(1, 2)
(1) 1 Instruction per cycle(memory cycle)
– Machine cycle:
IF + IP + Time to fetch the operands from registers
+ Perform operation + Store the result in a register
– RISC instruction <=> CISC micro-instruction
=> No need to microprogram(Hardwired control)
(2) Register-to-Register operation
– With only simple Load and Store operations for accessing
memory(Load/Store Arch.)
– Simplifies the instruction set, and control unit
Example: B ← B + C, A ← A + B, D ← D - A
Field widths: Data 32 bits; OP-code 8 bits; Reg Address 4 bits;
M address: MM instr 12 + 4 bits, RISC 12 bits

            CISC-I               RISC
            ADD B, C, B          ADD Rb, Rc, Rb
            ADD A, B, A          ADD Ra, Rb, Ra
            SUB D, A, D          SUB Rd, Ra, Rd
I:          56 x 3 = 168 bits    28 x 3 = 84 bits
D:          96 x 3 = 288 bits    0 bits
Total MB:   456 bits             84 bits
Cycles:     3 x 4 = 12           3 x 1 = 3
Characteristic of RISC(3, 4)
(3) Simple Addressing Modes - Shorten EA generation time
– Almost all instructions use register addressing
– Relative addressing using PC, BAR, and Index address
– Other complex modes may be synthesized by software
Instruction format: OP-code | . . . | Rs1 | S2

Addressing Mode     Effective Address   Synthesis   Used by
Immediate           Operand = A         S2          R-to-R
Direct              EA = A              0 + S2      LD/ST
Register            EA = R              Rs1, S2     R-to-R
Register Indirect   EA = [R]            Rs1 + 0     LD/ST
Displacement        EA = [R] + A        Rs1 + S2    LD/ST
(4) Simple Instruction Format - Shorten instruction Decoding Time
– Usually one format
– Fixed length/align on word boundary
– Fixed field length
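The synthesis of addressing modes listed under characteristic (3) comes down to one address calculation, [Rs1] + S2. The sketch below illustrates this under the assumption that register 0 always reads as zero, which is how "0 + S2" is interpreted here; the register contents and the name `effective_address` are hypothetical.

```python
def effective_address(regs, rs1, s2):
    """Displacement form, the single address calculation: EA = [Rs1] + S2."""
    return regs[rs1] + s2

regs = [0] * 16        # regs[0] is assumed to always hold zero
regs[5] = 0x100        # hypothetical base address held in R5

direct = effective_address(regs, 0, 0x40)            # Direct:            0 + S2
register_indirect = effective_address(regs, 5, 0)    # Register indirect: Rs1 + 0
displacement = effective_address(regs, 5, 0x40)      # Displacement:      Rs1 + S2

print(hex(direct), hex(register_indirect), hex(displacement))
```

Since every memory mode is a special case of the same addition, the hardware needs only one effective-address path.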
Characteristic of RISC(5)
(5) Pipelining (We will learn this later in detail)
At this time, you just need to know that
- Instruction execution hardware can be made of a few interconnected
independent sub-modules, called pipeline STAGEs (S0, S1, S2, S3)
- An instruction execution progresses at each pipeline stage in sequence
- When an instruction completes its execution at the i-th stage, the next
instruction commences its execution at the i-th stage
- Thus, in the ideal situation, throughput increases nearly n times, where n
is the number of pipeline stages
- Branch instruction makes the pipelined execution inefficient
Laundry Task
Laundry Example
• Ann, Brian, Cathy, Dave
each have one load of clothes
to wash, dry, and fold (loads A, B, C, D)
• Washer takes 30 minutes
• Dryer takes 40 minutes
• “Folder” takes 20 minutes
We have 3 different work stages
Sequential Laundry
[Figure: timeline from 6 PM to midnight. Loads A, B, C, D run one after another in task order, each taking 30 + 40 + 20 minutes.]
Sequential laundry takes 6 hours for 4 loads
If they learned pipelining, how long would laundry take?
Pipelined Laundry
[Figure: timeline starting at 6 PM. Loads A, B, C, D overlap in task order; the time axis is divided into segments of 30, 40, 40, 40, 40, and 20 minutes.]
• Pipelined laundry takes 3.5
hours for 4 loads
• Maximum of 3 tasks can be
carried out concurrently
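The 6-hour and 3.5-hour figures can be reproduced with a short scheduling sketch. It assumes each stage (washer, dryer, folder) handles one load at a time and that a finished load can wait between stages; the function names are illustrative.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load runs through every stage before the next load starts."""
    return loads * sum(stage_minutes)

def pipelined_minutes(stage_minutes, loads):
    """Finish time of the last load when each stage handles one load at a time."""
    free_at = [0] * len(stage_minutes)   # time at which each stage becomes free
    for _ in range(loads):
        ready = 0                         # every load is available at time 0
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, free_at[s])   # wait for the load and for the stage
            free_at[s] = start + minutes
            ready = free_at[s]
    return free_at[-1]

stages = [30, 40, 20]   # washer, dryer, folder
print(sequential_minutes(stages, 4) / 60)   # hours for 4 loads, one at a time
print(pipelined_minutes(stages, 4) / 60)    # hours for 4 loads, overlapped
```

The pipelined total is 30 + 4 x 40 + 20 minutes: the 40-minute dryer is the slowest stage and sets the rate at which loads finish.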
Pipelined Execution
[Figure: a 4-stage pipeline S0 → S1 → S2 → S3]
1 instruction execution: I0 passes through S0, S1, S2, and S3 in turn, taking t x 4
Execution of a Sequence of Instructions

Time   1t   2t   3t   4t   5t   6t   7t   8t
I0     S0   S1   S2   S3
I1          S0   S1   S2   S3
I2               S0   S1   S2   S3
I3                    S0   S1   S2   S3
I4                         S0   S1   S2   S3

I0 completes at 4t, I1 at 5t, I2 at 6t, I3 at 7t, I4 at 8t
n instructions complete at (n+3)t
When n is large this becomes approximately nt
Thus, 1 instruction completes in every t
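The completion times on this slide generalize to a k-stage pipeline: n instructions finish at (n + k - 1)t. The sketch below assumes equal stage times and no stalls or branches; the function names are illustrative.

```python
def pipeline_time(n, k, t=1):
    """Time for n instructions on a k-stage pipeline with stage time t (no stalls)."""
    return (n + k - 1) * t

def speedup(n, k):
    """Unpipelined time n*k*t divided by the pipelined time (n + k - 1)*t."""
    return (n * k) / (n + k - 1)

print(pipeline_time(5, 4))          # I0..I4 on the 4-stage pipeline, in units of t
print(round(speedup(1000, 4), 3))   # approaches the number of stages as n grows
```

The k - 1 extra cycles are the time to fill the pipeline, which is why the speedup only approaches k for long instruction sequences.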
Pipeline Characteristics
• Multiple tasks operating simultaneously
• Pipeline does not help latency of single task, but it helps
throughput of entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Unbalanced lengths of pipeline stages reduce speedup
• Potential speedup = Number of pipeline stages
• Time to fill the pipeline and time to drain it reduce speedup
Time Out
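• A male crab met a female crab and proposed to her.
• The female crab noticed that the male crab was walking straight ahead instead of sideways.
• 'What a remarkable fellow. I must not let this one get away,' she thought, and married him at once.
• But the next day she saw her husband walking sideways just like every other crab, and she angrily confronted him.
• "What on earth happened? Didn't you walk straight before we got married?"
• The male crab replied, "Oh dear, honey, I can't possibly drink that much every day."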
Berkeley RISC
RISC-I and RISC-II
A 32-bit processor
31 and 39 instructions, respectively
ORW, 138 Rs; Window: 10 global, 6 temporary, 10 local, 6 parameter
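Instruction Format (32 bits)

Bits    31-25     24    23-19       18-14   13-0
Field   OP-code   SCC   Rd / Cond   Rs1     S2
Width   7         1     5 (Cond 4)  5       14

S2 with bit 13 = 0: Rs2 in bits 4-0
S2 with bit 13 = 1: imm13 in bits 12-0
Long-immediate form: imm19 in bits 18-0, in place of Rs1 and S2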
Cond(flag): C, Z, O, N
Rd: destination register
Rs1: Source register
S2: Functional Instr.: if MSB=0, then S2=Rs2: another source register
: if MSB=1, imm13(13-bit immediate data)
Transfer or Sequencing Instr.: if MSB=0, EA=[Rs1]+[Rs2]; index reg.
: if MSB=1, EA=[Rs1]+imm13
RISC-II: EA=[PC] + S2
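A minimal sketch of decoding the short-immediate format, assuming the usual reading of the format figure: OP-code in bits 31-25, SCC in bit 24, Rd in bits 23-19, Rs1 in bits 18-14, and S2 in bits 13-0 with bit 13 selecting between Rs2 and imm13. The helper `encode`, the sample opcode value, and the sign extension of imm13 are assumptions made for illustration, not taken from the slides.

```python
def decode(word):
    """Split a 32-bit instruction word into its short-immediate fields."""
    fields = {
        "opcode": (word >> 25) & 0x7F,   # bits 31-25
        "scc":    (word >> 24) & 0x1,    # bit 24
        "rd":     (word >> 19) & 0x1F,   # bits 23-19
        "rs1":    (word >> 14) & 0x1F,   # bits 18-14
    }
    if (word >> 13) & 0x1:               # MSB of S2 set: 13-bit immediate
        imm = word & 0x1FFF
        # Sign extension is an assumption of this sketch.
        fields["imm13"] = imm - 0x2000 if imm & 0x1000 else imm
    else:                                # MSB of S2 clear: second source register
        fields["rs2"] = word & 0x1F
    return fields

def encode(opcode, scc, rd, rs1, s2_is_imm, s2):
    """Hypothetical helper that packs the fields back into one word."""
    return ((opcode << 25) | (scc << 24) | (rd << 19) | (rs1 << 14)
            | (s2_is_imm << 13) | (s2 & 0x1FFF))

word = encode(0x12, 1, 3, 4, 1, 5)       # 0x12 is an arbitrary sample opcode
print(decode(word))
```

Every field sits at a fixed position, so decoding is a handful of shifts and masks; this is what "fixed field length" buys in decoding time.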
RISC-II Instruction Set
• Functional(C:carry, R:reverse)
– ADD, ADDC, SUB, SUBC, SUBR, SUBCR, AND, OR, XOR, SLL,
SRL, SRA
• Transfer(X:index, W:word, H:half, B:byte, R:relative,
U/S:unsigned/signed)
(Index: EA=Rs1+S2(Rs2), Relative: EA=PC+S2(Rs2))
– LDXW, LDXHU(S), LDXBU(S), LDRW, LDRHU(S), LDRBU(S)
– STXW, STXHU(S), STXBU(S), STRW, STRHU(S), STRBU(S)
• Sequence Control
– JMP, JMPR, CALL, CALLR, RET, CALLINT, RETINT, ...
Ultimate RISC Instruction Set
• BN instruction
– Conditional branch phase in each instruction cycle
– Does not conform with RISC philosophy, that is, inefficient use
of instruction pipeline
• Ultimate RISC instruction set
– Move the content of the SOURCE(Read) to the
DESTINATION(Write), both within memory
– 2-address instruction
» 1 address fits in an M word
» 4-cycle instruction
    addr ← M[PC], PC ← PC + 1
    temp ← M[addr]
    addr ← M[PC], PC ← PC + 1
    M[addr] ← temp
[Figure: an instruction holds a source address X and a destination address Y. addr takes X, temp takes the value A stored at X, addr then takes Y, and A is written to location Y.]
Ultimate RISC Architecture
[Figure: IEU, ALU, M, and I/O attached to a common BUS]
- Memory Mapped I/O
- Memory Mapped ALU
- PC: 1 special word(address=0)
- ALU contains an accumulator and flags
Memory Mapped ALU
Arithmetic operations - Special Addresses
When ALU is used as a Destination
- Store a value in AC
- Operate on AC
When ALU is used as the Source
- One address gets the value of AC
- Other addresses test the condition codes and set the
destination address
(Branch to either one of 2 consecutive addresses)
Memory Mapped ALU
Writing an operand into the address associated with an operation performs
that operation; the result is read from another address

Address   Write(used as the destination)   Read(source address)
8         AC ← data                        data ← AC
9         AC ← AC - data                   data ← N
10        AC ← data - AC                   data ← Z
11        AC ← data + AC                   data ← V
12        AC ← data ⊕ AC                   data ← C
13        AC ← data v AC                   data ← N ⊕ V
14        AC ← AC ^ data                   data ← (N ⊕ V) v Z
15        AC ← data / 2                    data ← C ^ ~Z
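To make the move-only instruction set concrete, here is a small simulator sketch. It assumes the conventions from these slides: the PC is memory word 0, each instruction is a (source, destination) pair of consecutive words, and the ALU sits at addresses 8 to 15. Only three ALU write addresses (8: AC ← data, 9: AC ← AC - data, 11: AC ← data + AC) and one read address (8: AC) are modelled. The class name, memory size, and data addresses are hypothetical.

```python
AC_LOAD, AC_SUB, AC_ADD = 8, 9, 11   # ALU write addresses; reading 8 returns AC

class MoveMachine:
    def __init__(self, size=64):
        self.mem = [0] * size   # word-addressed memory; mem[0] is the PC
        self.ac = 0             # accumulator inside the memory-mapped ALU

    def read(self, addr):
        if addr == AC_LOAD:     # reading address 8 yields the accumulator
            return self.ac
        return self.mem[addr]

    def write(self, addr, data):
        if addr == AC_LOAD:
            self.ac = data
        elif addr == AC_SUB:
            self.ac = self.ac - data
        elif addr == AC_ADD:
            self.ac = data + self.ac
        else:
            self.mem[addr] = data   # includes address 0: writing the PC is a jump

    def step(self):
        """One instruction: the four cycles of the move instruction."""
        addr = self.mem[self.mem[0]]; self.mem[0] += 1   # fetch source address
        temp = self.read(addr)                           # read source data
        addr = self.mem[self.mem[0]]; self.mem[0] += 1   # fetch destination address
        self.write(addr, temp)                           # write data to destination

def add_demo(a, b):
    """Compute a + b with three moves: A -> AC, B -> AC+, AC -> C."""
    m = MoveMachine()
    A, B, C, START = 40, 41, 42, 16      # hypothetical data and code addresses
    m.mem[A], m.mem[B] = a, b
    program = [(A, AC_LOAD), (B, AC_ADD), (AC_LOAD, C)]
    for i, (src, dst) in enumerate(program):
        m.mem[START + 2 * i], m.mem[START + 2 * i + 1] = src, dst
    m.mem[0] = START
    for _ in program:
        m.step()
    return m.mem[C]

print(add_demo(3, 4))
```

Arithmetic never appears as an opcode: the choice of destination address selects the operation, which is why a single move instruction is enough.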
Condition Codes and Branching
Condition Codes
2 (binary 10): True
0 (binary 00): False
- Upon testing a CC, the result sets the low-order bits of the destination address
- This allows a branch to either one of two consecutive instructions
[Figure: the low-order bits of the destination address are set to 00 when False and to 10 when True]
Branch: moving a target address to location 0(PC)
Instruction Cycle
Instruction Layout in memory
S D S D S D ... S D
- 2 adjoining words/instruction
- Contiguous storage of instructions
Instruction Cycle - 4 clean cycles for pipelining
[1] Fetch Source Address and increment PC:   IS   (read)
[2] Read Source Data:                        RS   (read)
[3] Fetch Destination Address:               ID   (read)
[4] Write Data to Destination:               WD   (write)
Pipelining with a 4-port memory(3 reads and 1 write)
Instruction 1:   IS1   RS1   ID1   WD1
Instruction 2:         IS2   RS2   ID2   WD2
Instruction 3:               IS3   RS3   ID3   WD3
Instruction 4:                     IS4   RS4   ID4   WD4
Completion of 1 instr/cycle
Improvement
3-Cycle Design
S S S ... S
D D D ... D
Instruction Cycle - 3 clean cycles
[1] Fetch Source and Destination Addresses and increment PC:   ISD   (read)
[2] Read Source Data:                                          RS    (read)
[3] Write Data to Destination:                                 WD    (write)
3-way Pipelining using a 3-port memory(2 read ports and 1 write port)
Instruction 1:   ISD1   RS1    WD1
Instruction 2:          ISD2   RS2    WD2
Instruction 3:                 ISD3   RS3    WD3
Instruction 4:                        ISD4   RS4    WD4
Completion of 1 instr./cycle
Improvement
2-Cycle Design
[Figure: IEU, Instruction Memory, ALU, Data Memory, and I/O]
Instruction Cycle (2 dedicated memory units; 1 instruction, 1 data)
[1] Read Data from Source:                                        RS        (read)
[2] Write Data to Destination, Read instruction, Increment PC:    WD (RI)   (write, read)
2-way Pipelining
Instruction p:     WDp
Instruction p+1:   RSp+1   WDp+1
Instruction p+2:           RSp+2   WDp+2
Instruction p+3:                   RSp+3   WDp+3
Instruction p+4:                           RSp+4   WDp+4
Completion of 1 instr./cycle