Transcript Lecture 14
Lecture 8: Reduced Instruction Set Computer
CS311-Computer Organization RISC Lecture 8

Lecture 8: RISC
In this lecture, we will study
• Program execution characteristics
• RISC philosophy
– Make the most frequently executed statements fast
» Functional and transfer instructions
» A small number of simple, fixed-format instructions
» Large register file
– Make the most time-consuming statements fast
» Procedure call and return instructions
» Large register file
• Large register file
• Overlapping register windows (ORWs)
– Linear and circular organization of ORWs
• Ultimate RISC

Instruction Execution Characteristics: Type of Operations
Relative dynamic frequencies (%) of statements in HLL programs:

Language  Workload    Assignment  Loop  Call  If   Goto  Others
Pascal    Scientific  74          4     1     20   2     -
FORTRAN   Student     67          3     3     11   9     7
Pascal    Systems     45          5     15    29   -     6
C         Systems     38          3     12    43   3     1
SAL       Systems     42          4     12    36   -     6

What type of statement is most frequent?
– Assignment statements dominate
» They are implemented with functional and transfer instructions
» Data movement must be made simple, and thus fast
– Conditional statements (if and loop together)
» They are implemented with control (sequencing) instructions
» The sequence control mechanism is important

Instruction Execution Characteristics: Time Consumed by Statements
Relative frequencies (%) for Pascal and C:

Statement   Dynamic occurrence   Machine-instruction weighted   Memory-reference weighted
            Pascal   C           Pascal   C                     Pascal   C
Assignment  45       38          13       13                    14       15
Loop        5        3           42       32                    33       26
Call        15       12          31       33                    44       45
If          29       43          11       21                    7        13
Goto        -        3           -        -                     -        -
Others      6        1           3        1                     2        1

Machine-instruction weighted = [average number of machine instructions per statement] x [frequency of occurrence]
Memory-reference weighted = [average number of memory references per statement] x [frequency of occurrence]
(Each weighted column is normalized so that it sums to 100%.)
The most time-consuming statements are procedure CALL/RETURN: they carry the largest memory-reference weight.

Instruction Execution Characteristics: Type of Operands
Dynamic frequencies of occurrence (%):

          Integer constant  Scalar variable  Array/structure
Pascal    16                58               26
C         23                53               24
Average   20                55               25

• The majority of references are to scalars
– 80% of them are local to a procedure
– References to arrays/structures require an index or a pointer
• Location of operands (average per instruction)
– 0.5 operands in memory
– 1.4 operands in registers

Instruction Execution Characteristics: Procedure Calls
• The two most significant aspects in implementing procedure calls
– Number of parameters
– Depth of nesting
• Statistics on the number of parameters
– 98% of dynamically called procedures were passed fewer than 6 parameters
– 92% of them used fewer than 6 local scalar variables
• Example: CALL SUB(X1, X2, X3) passes the parameters X1, X2, X3 to SUB(A, B, C)

Multiple Register Sets
- Assume we have several register sets, each of which can be used by a different procedure
- This saves time on procedure CALL/RETURN: switching procedures only requires changing the register set pointer value
[Figure: a register set pointer selects one of Set 0, Set 1, Set 2, ..., Set n-1]
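The two weighting formulas on the "Time Consumed by Statements" slide can be sketched in a few lines of Python. The per-statement costs in the usage are hypothetical placeholders, not the measured averages behind the lecture's table; the final normalization step is an assumption inferred from the fact that each weighted column in the table sums to 100.

```python
def weighted_percentages(frequency, cost):
    """Weight each statement type's dynamic frequency by its average cost
    (machine instructions or memory references per statement), then
    normalize the products so that they sum to 100%."""
    products = {kind: frequency[kind] * cost[kind] for kind in frequency}
    total = sum(products.values())
    return {kind: 100.0 * value / total for kind, value in products.items()}


# Hypothetical inputs: two statement types that occur equally often,
# where one costs three times as much per execution as the other.
frequency = {"assignment": 50, "call": 50}
cost = {"assignment": 1, "call": 3}
weights = weighted_percentages(frequency, cost)
print(weights)  # {'assignment': 25.0, 'call': 75.0}
```

A rare but expensive statement type gains weight under this formula, which is how CALL, at only 12 to 15% of dynamic occurrences, ends up with 44 to 45% of the memory-reference weight.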
Instruction Execution Characteristics: Depth of Procedure Nesting
[Figure: procedure nesting depth over time t, rising on each Call and falling on each Return, with a register set window of size 5 covering the current range of depths]
• A nesting depth of 5 can be served by a register set window of size 5 without using memory
• When the nesting depth exceeds 5
- A movement of more than 5 levels in either direction (CALL/RETURN) requires shifting the register set window (down/up)
- Shifting the register set window: the contents of one register set must be saved in memory so that the set can be used by the new procedure
• Statistics: with a window depth of 8, the window needs to shift on fewer than 1% of calls and returns

Complex Instruction Set Computer (CISC)
Design philosophy of CISC
• Distinction between architecture and implementation via a microprogrammed control unit
• Richer instruction set
– More powerful instructions
– Reduce the semantic gap to make programming easier
– Simplify compiler functions
• Larger microprogram
– Moving hardware functions to microcode
– Moving software functions to microcode
• Parallelism
– Pipelining
– Multiple functional units, processors, computers
• NO ATTENTION TO INSTRUCTION FREQUENCY, TIME-CONSUMING INSTRUCTIONS, etc.

RISC Philosophy (1): Make the Most Frequent Statements Execute Fast
The most frequent statements are assignment statements, and the compiler translates each of them into a set of functional and/or transfer instructions. Thus, functional and transfer instructions must be made to execute fast.
Instruction cycle of a functional or transfer instruction, and how each step can be made fast:
– I-F (M): read instruction from memory → short instruction
– I-P: decode / compute effective address → fixed instruction format, simple addressing modes
– O-F (M): read operand from memory → have operands in registers
– E: perform operation → cannot be improved through the instruction set alone
Improved architecture: pipelined execution

Assignment Statements
• To make instruction fetch fast
– Short op-code part: a small number of instructions in the instruction set
– Short operand address part: keep operands in registers instead of memory (M)
• To make instruction preparation fast
– Fixed-length instructions
– Fixed-format instructions
– Simple addressing modes
• To make operand fetch fast
– Make operands available from registers instead of memory
– Needs a large register file
• To make instruction execution fast
– Multiple register sets (MRS); overlapping MRS
– Instruction execution pipeline

RISC Philosophy (2): Make the Most Time-Consuming Statements Execute Fast
Procedure call and return: CALL SUB(X1, X2, X3) → SUB(A, B, C)
Methods of passing parameters
• Through memory
– Parameters are stored in memory locations accessible to both the calling and the called procedure
– CALL and RETURN execute very slowly because of the memory accesses, especially when there are many parameters to pass
• Through registers
– Parameters are stored in CPU registers
– The calling procedure must save to memory the registers that are not used for passing parameters. This causes many memory accesses and slows these instructions down.

Time Out
• An old woman was sitting with her cat, polishing a dusty lamp.
• Just then a tiny genie popped out of the lamp and told her to make three wishes.
• The old woman quickly said, "I want to be rich, I want to be young and beautiful, and I wish my cat would become a handsome prince."
• Then, with a puff of smoke and a loud pop, the old woman became young and beautiful, and heaps of gold and jewels piled up all around her. The cat had vanished, and in its place a dashing prince stood with his arms open wide. The rejuvenated woman quickly threw herself into his arms.
• The prince whispered softly in her ear:
• "Don't you regret having me neutered back when I was a cat?"

CISC and RISC

Machine          Year  No. of        Instruction     Addressing  No. of  Control memory  Cache
                       instructions  length (bits)   modes       GPRs    capacity
IBM S/370-168    1973  208           16-48           4           16      420Kb           64Kb
VAX 11-780       1978  303           16-456          22          16      480Kb           64Kb
Intel 8086       1978  133           8-32            6           4       -               -
Berkeley RISC I  1981  31            32              3           138     0               0
IBM 801          1980  120           32              5           32      0               0

RISC
• A limited and simple instruction set
• A large number of GPRs (register file)
• An emphasis on optimizing the instruction pipeline

Large Register File
Quick access to operands is desirable
- Assignment statements rely on functional and transfer instructions
- Functional instructions rely heavily on registers
- The frequency of transfer instructions depends on the number of registers in the register file
If the number of registers is small, a strategy is needed to keep the most frequently accessed operands in registers and minimize register-memory traffic
- Software approach: maximize register usage by the compiler (requires sophisticated program analysis)
- Hardware approach: more registers in the register file

Register Window
• Facts
– Statistically, about 80% of scalar references are to local scalars
– Local variables of a procedure cannot be accessed by other procedures
• Problems
– The set of local variables changes with each procedure CALL/RETURN
– CALL/RETURN occurs frequently
– Parameters need to be passed between procedures
• Observations
– Statistically, procedures have few parameters (<6) and few local variables (<6)
– Statistically, the depth of procedure activation fluctuates within a relatively narrow range (<8)
• Solution
– Multiple small sets of registers
– Each set is assigned to a different procedure
– Windows for adjacent procedures overlap to allow parameter passing

Multiple Register Set
[Figure: a register set pointer selects one of Set 1, Set 2, Set 3, ..., Set m]
Each register set is assigned to a different procedure
- The size of a register set is equal to the size of a window
- Parameters must be copied into the called/calling procedure's register set
- This requires register move instructions

Overlapping Register Window
When the register sets are implemented in one large register file, each register set is called a register window.
Overlapping register windows
- Portions of adjacent register windows overlap for passing parameters
- At any time, only one window is visible
- No information needs to be moved for parameter passing
[Figure: window i (procedure i) and window i+1 (procedure i+1) each hold parameter, local, and temporary registers. Window i's temporary registers and window i+1's parameter registers are the same physical registers, through which parameters are exchanged. CALL moves from window i to window i+1; RETURN moves back.]
How about global variables?

Global Variables
Global variables are accessible by all procedures.
• Assign them to memory locations (done by the compiler)
– Straightforward, but inefficient for frequently accessed global variables because of the frequent memory accesses
• Set aside a set of global-variable registers
– Available to all procedures
– A unified register numbering system simplifies the instruction format
– e.g., R0 ~ R7: global; R8 ~ R13: current window

Linear Organization of Register Windows
[Figure: the physical register file (0 to n-1) begins with the global registers 0 to p-1; Set 1, Set 2, Set 3, ... each see those same global registers as 0 to p-1 and their own block of the file as registers p to m-1]
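The overlap between window i's temporary registers and window i+1's parameter registers amounts to a simple address calculation. The sketch below assumes the Berkeley RISC split listed later in this lecture (10 global, 6 parameter, 10 local, and 6 temporary registers per window) rather than the R0 ~ R7 / R8 ~ R13 example above, and it assumes the logical numbering order globals, parameters, locals, temporaries. The function name `physical` and the modulo wrap-around (which makes the windowed part of the file circular) are illustrative choices, not part of the slides.

```python
# Hypothetical register layout, following the Berkeley RISC window described
# in this lecture (10 global, 6 parameter, 10 local, 6 temporary registers).
NUM_GLOBALS = 10
NUM_PARAMS = 6    # shared with the caller's temporary registers
NUM_LOCALS = 10   # private to the window
NUM_TEMPS = 6     # shared with the callee's parameter registers
STRIDE = NUM_PARAMS + NUM_LOCALS                             # new registers per window
VISIBLE = NUM_GLOBALS + NUM_PARAMS + NUM_LOCALS + NUM_TEMPS  # 32 logical registers


def physical(logical: int, window: int, num_windows: int) -> int:
    """Map a logical register seen by `window` to a physical register index.

    Assumed logical order: globals, parameters, locals, temporaries.
    """
    if not 0 <= logical < VISIBLE:
        raise ValueError(f"logical register {logical} out of range")
    if logical < NUM_GLOBALS:
        return logical                  # same physical register in every window
    offset = logical - NUM_GLOBALS      # 0-5 params, 6-15 locals, 16-21 temps
    # Window w's temporaries (offsets 16-21) land on window w+1's parameters;
    # the modulo wraps the last window onto the first.
    return NUM_GLOBALS + (window * STRIDE + offset) % (num_windows * STRIDE)


print(physical(26, 0, 8), physical(10, 1, 8))  # 26 26
print(len({physical(r, w, 8) for r in range(VISIBLE) for w in range(8)}))  # 138
```

Because each additional window contributes only its 16 parameter and local registers, 8 windows fit in 10 + 8 x 16 = 138 physical registers, the GPR count listed for Berkeley RISC I.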
Circular Organization of Register Windows
[Figure: six windows W0 to W5 arranged in a ring; the current window pointer (CWP) advances on Call and moves back on Return; windows are saved to and restored from memory at the saved window pointer (SWP)]
An n-window register file accommodates n-1 procedure calls.
• Procedure call
– CWP ← CWP + 1 (CWP: current window pointer, counted modulo n)
– If CWP = SWP (SWP: saved window pointer), then interrupt: save Window(SWP), SWP ← SWP + 1
– Load the temporary registers with the parameters that must be passed down
– The call proceeds
• Return
– CWP ← CWP - 1
– If CWP = SWP, then interrupt: restore the window of the procedure being returned to, Window(SWP), and SWP ← SWP - 1

Code Size
• Smaller programs
– A program takes less memory space
– A smaller program improves performance
» Fewer instructions
» Fewer bytes to fetch
» In a paging environment, it occupies fewer pages and causes fewer page faults
• CISC
– Fewer instructions in the program (the program may be shorter, but it does not necessarily take less space)

Example: A ← B + C
• CISC: one memory-to-memory instruction, ADD B, C, A (fields: 8 | 4+12 | 4+12 | 4+12 bits)
– Memory traffic: instruction 56 bits; data 32 x 3 = 96 bits; total MB used: 56 + 96 = 152 bits
• RISC: LD Rb, B; LD Rc, C; ADD Ra, Rb, Rc; ST Ra, A (four 28-bit instructions)
– Memory traffic: instructions 4 x 28 = 112 bits; data 96 bits; total MB used: 112 + 96 = 208 bits
• CISC: more instructions in the instruction set → longer op-code
• RISC: more chances of keeping intermediate results in registers → less use of LD/ST

Characteristics of RISC (1, 2)
(1) One instruction per cycle (memory cycle)
– Machine cycle: IF + IP + time to fetch the operands from registers + perform the operation + store the result in a register
– A RISC instruction corresponds to a CISC micro-instruction ⇒ no need for microprogramming (hardwired control)
(2) Register-to-register operation
– Only simple load and store operations access memory (load/store architecture)
– Simplifies the instruction set and the control unit
Example: B ← B + C; A ← A + B; D ← D - A (for RISC, all operands are already in registers)
• CISC: ADD B, C, B; ADD A, B, A; SUB D, A, D
– Instructions: 56 x 3 = 168 bits; data: 96 x 3 = 288 bits; total MB: 456 bits; cycles: 3 x 4 = 12
• RISC: ADD Rb, Rc, Rb; ADD Ra, Rb, Ra; SUB Rd, Ra, Rd
– Instructions: 28 x 3 = 84 bits; data: 0 bits; total MB: 84 bits; cycles: 3 x 1 = 3
Field sizes: data 32 bits; op-code 8 bits; register address 4 bits; memory address 12 + 4 bits in memory-to-memory (CISC) instructions, 12 bits in RISC

Characteristics of RISC (3, 4)
(3) Simple addressing modes: shorten EA generation time
– Almost all instructions use register addressing
– Relative addressing using the PC, BAR, and index address
– Other complex modes may be synthesized by software
Instruction fields: OP-code ... Rs1 S2

Addressing mode     Effective address   Synthesis (Rs1, S2)   Used by
Immediate           Operand = A         S2                    R-to-R
Direct              EA = A              0 + S2                LD/ST
Register            EA = R              Rs1, S2               R-to-R
Register indirect   EA = [R]            Rs1 + 0               LD/ST
Displacement        EA = [R] + A        Rs1 + S2              LD/ST

(4) Simple instruction format: shorten instruction decoding time
– Usually one format
– Fixed length, aligned on word boundaries
– Fixed field lengths

Characteristic of RISC (5)
(5) Pipelining (we will study this in detail later)
For now, you just need to know that
- Instruction execution hardware can be built from a few interconnected, independent submodules called pipeline STAGES (e.g., S0, S1, S2, S3)
- An instruction's execution progresses through the pipeline stages in sequence
- When an instruction completes its execution at the i-th stage, the next instruction begins its execution at the i-th stage
- Thus, in the ideal situation, throughput increases nearly n times, where n is the number of pipeline stages
- Branch instructions make pipelined execution inefficient

Laundry Task
Laundry example
• Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold
• The washer takes 30 minutes
• The dryer takes 40 minutes
• The "folder" takes 20 minutes
We have 3 different work stages.

Sequential Laundry
[Figure: timeline from 6 PM to midnight; loads A, B, C, D run one after another, each taking 30 + 40 + 20 minutes]
• Sequential laundry takes 6 hours for 4 loads (4 x 90 minutes)
• If they learned pipelining, how long would laundry take?

Pipelined Laundry
[Figure: timeline from 6 PM; the stages of loads A to D overlap: 30 minutes of washing for A, four 40-minute dryer slots back to back, and 20 minutes of folding for D at the end]
• Pipelined laundry takes 3.5 hours for 4 loads (30 + 4 x 40 + 20 = 210 minutes)
• A maximum of 3 tasks can be carried out concurrently

Pipelined Execution
[Figure: four pipeline stages S0, S1, S2, S3; executing one instruction alone takes 4t]
Execution of a sequence of instructions (each stage takes time t): I0 completes at 4t, I1 at 5t, I2 at 6t, I3 at 7t, I4 at 8t.
• n instructions complete at (n + 3)t
• When n is large, this becomes about nt
• Thus, one instruction completes every t

Pipeline Characteristics
• Multiple tasks operate simultaneously
• Pipelining does not help the latency of a single task, but it helps the throughput of the entire workload
• The pipeline rate is limited by the slowest pipeline stage
• Unbalanced lengths of pipeline stages reduce the speedup
• Potential speedup = number of pipeline stages
• The time to fill the pipeline and the time to drain it reduce the speedup

Time Out
• A male crab met a female crab and proposed to her.
• She noticed that, instead of walking sideways, he walked straight ahead.
• 'What a remarkable fellow. I can't let one like him get away,' she thought, and she married him at once.
• But the next day she saw her husband walking sideways just like every other crab, and she angrily confronted him.
• "What on earth happened? Didn't you walk straight before we got married?"
• The male crab replied:
• "Oh dear, honey, I can't drink that much every single day, can I?"

Berkeley RISC
RISC-I and RISC-II
• 32-bit processors
• 31 and 39 instructions, respectively
• Overlapping register windows (ORW), 138 registers; each window: 10 global, 6 parameter, 10 local, and 6 temporary registers
Instruction formats (field widths in bits):
– Short format: OP-code (7, bits 31-25) | SCC (1, bit 24) | Rd (5, bits 23-19) | Rs1 (5, bits 18-14) | S2 (14, bits 13-0: a 1-bit MSB flag plus Rs2 or imm13)
– Long format: OP-code (7) | SCC (1) | Rd or Cond (5) | imm19 (19, bits 18-0)
• SCC: set condition codes; condition flags: C, Z, O, N
• Rd: destination register
• Rs1: source register
• S2
– Functional instructions: if MSB = 0, S2 = Rs2 (another source register); if MSB = 1, S2 = imm13 (13-bit immediate data)
– Transfer or sequencing instructions: if MSB = 0, EA = [Rs1] + [Rs2] (index register); if MSB = 1, EA = [Rs1] + imm13
– RISC-II: EA = [PC] + S2

RISC-II Instruction Set
• Functional (C: carry, R: reverse)
– ADD, ADDC, SUB, SUBC, SUBR, SUBCR, AND, OR, XOR, SLL, SRL, SRA
• Transfer (X: index, W: word, H: half-word, B: byte, R: relative, U/S: unsigned/signed)
(Index: EA = Rs1 + S2 (Rs2); relative: EA = PC + S2 (Rs2))
– LDXW, LDXHU(S), LDXBU(S), LDRW, LDRHU(S), LDRBU(S)
– STXW, STXHU(S), STXBU(S), STRW, STRHU(S), STRBU(S)
• Sequence control
– JMP, JMPR, CALL, CALLR, RET, CALLINT, RETINT, ...
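Berkeley RISC manages its overlapping windows with the CWP/SWP scheme from the circular-organization slide. The Python simulation below makes that pseudo-code concrete under stated assumptions: window numbers wrap modulo n; SWP is taken to name the slot just below the oldest window still held in registers, so the slide's two tests "CWP = SWP" work as written; on overflow this sketch advances SWP before saving, so the window spilled is the oldest resident one; and the memory save area is a Python list used as a stack. Register contents are not modeled, only which procedure owns each window, and the class name `CircularWindowFile` is hypothetical.

```python
class CircularWindowFile:
    """Simulates CWP/SWP bookkeeping for an n-window circular register file.

    Assumptions (not from the slides): window numbers wrap modulo n, SWP
    names the slot just below the oldest resident window, and the memory
    save area is a stack. Only window ownership is tracked.
    """

    def __init__(self, n: int) -> None:
        if n < 2:
            raise ValueError("need at least two windows")
        self.n = n
        self.cwp = 0                 # window of the running procedure
        self.swp = n - 1             # slot just below the oldest resident window
        self.owner = [None] * n      # procedure owning each window
        self.owner[0] = "main"
        self.save_area = []          # windows spilled to memory
        self.depth = 0               # calls not yet returned
        self.saves = 0
        self.restores = 0

    def call(self, name: str) -> None:
        # The caller has already loaded its temporaries with the parameters.
        self.cwp = (self.cwp + 1) % self.n
        if self.cwp == self.swp:     # window overflow trap
            self.swp = (self.swp + 1) % self.n
            self.save_area.append(self.owner[self.swp])  # spill oldest window
            self.owner[self.swp] = None
            self.saves += 1
        self.owner[self.cwp] = name
        self.depth += 1

    def ret(self) -> str:
        """Return from the running procedure and report who runs next."""
        if self.depth == 0:
            raise RuntimeError("main has no caller to return to")
        self.owner[self.cwp] = None
        self.cwp = (self.cwp - 1) % self.n
        if self.cwp == self.swp:     # window underflow trap
            self.owner[self.swp] = self.save_area.pop()  # restore the caller
            self.swp = (self.swp - 1) % self.n
            self.restores += 1
        self.depth -= 1
        return self.owner[self.cwp]


w = CircularWindowFile(6)
for proc in ["A", "B", "C", "D"]:
    w.call(proc)
print(w.saves)  # 0: main plus 4 callees = 5 = n - 1 activations fit
w.call("E")
print(w.saves)  # 1: main's window was spilled to make room for E
print([w.ret() for _ in range(5)], w.restores)  # ['D', 'C', 'B', 'A', 'main'] 1
```

Saving the oldest window first and restoring from a stack works because returns unwind in the reverse order of calls: the most recently spilled window is always the next one needed.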
Ultimate RISC Instruction Set
• BN instruction
– Has a conditional branch phase in each instruction cycle
– Does not conform to the RISC philosophy: it uses the instruction pipeline inefficiently
• Ultimate RISC instruction set
– Move the content of the SOURCE (read) to the DESTINATION (write), both in memory
– 2-address instruction
» One address fits in a memory word
» 4-cycle instruction:
addr ← M[PC], PC ← PC + 1
temp ← M[addr]
addr ← M[PC], PC ← PC + 1
M[addr] ← temp
[Figure: an instruction consisting of the addresses X and Y; the value A stored at address X is copied to address Y]

Ultimate RISC Architecture
[Figure: the IEU, ALU, memory (M), and I/O share a single bus]
• Memory-mapped I/O
• Memory-mapped ALU
• PC: one special memory word (address 0)
• The ALU contains an accumulator (AC) and flags
Memory-mapped ALU
• Arithmetic operations use special addresses
• When the ALU is used as a destination
- Store a value in AC
- Operate on AC
• When the ALU is used as the source
- One address gets the value of AC
- Other addresses test the condition codes and set the destination address (branching to either of 2 consecutive addresses)

Memory Mapped ALU
An operation is performed by writing an operand to the address associated with that operation; the result is read from another address.

Address  Write (used as the destination)  Read (used as the source)
8        AC ← data                        data ← AC
9        AC ← AC - data                   data ← N
10       AC ← data - AC                   data ← Z
11       AC ← data + AC                   data ← V
12       AC ← data + AC                   data ← C
13       AC ← data ∨ AC                   data ← N ⊕ O
14       AC ← AC ∧ data                   data ← (N ⊕ O) ∨ Z
15       AC ← data / 2                    data ← C ∧ ¬Z

Condition Codes and Branching
• Condition code values: 2 (binary 10) = true, 0 (binary 00) = false
- Testing a CC sets the low-order bits of the destination address: 00 when false, 10 when true
- This allows a branch to either of two consecutive instructions (each instruction occupies two words)
- The branch itself is performed by moving the target address to location 0 (the PC)

Instructions Cycle
Instruction layout in memory: S D S D S D ... S D
- 2 adjoining words per instruction (source address S, destination address D)
- Contiguous storage of instructions
Instruction cycle: 4 clean cycles for pipelining
[1] Fetch the source address and increment the PC: IS
[2] Read the source data: RS
[3] Fetch the destination address: ID
[4] Write the data to the destination: WD
Pipelining with a 4-port memory (3 reads and 1 write):

Cycle           1    2    3    4    5    6    7
Instruction 1   IS1  RS1  ID1  WD1
Instruction 2        IS2  RS2  ID2  WD2
Instruction 3             IS3  RS3  ID3  WD3
Instruction 4                  IS4  RS4  ID4  WD4

In steady state, each cycle performs three reads (IS, RS, ID) and one write (WD): completion of 1 instruction per cycle.

Improvement: 3-Cycle Design
Instruction layout in memory: S S S ... S and D D D ... D
Instruction cycle: 3 clean cycles
[1] Fetch the source and destination addresses and increment the PC: ISD (read)
[2] Read the source data: RS (read)
[3] Write the data to the destination: WD (write)
3-way pipelining using a 3-port memory (2 read ports and 1 write port):

Cycle           1     2     3     4     5     6
Instruction 1   ISD1  RS1   WD1
Instruction 2         ISD2  RS2   WD2
Instruction 3               ISD3  RS3   WD3
Instruction 4                     ISD4  RS4   WD4

Completion of 1 instruction per cycle.

Improvement: 2-Cycle Design
[Figure: the IEU with a dedicated instruction memory, plus the ALU, data memory, and I/O]
Instruction cycle (2 dedicated memory units: 1 for instructions, 1 for data)
[1] Read data from the source: RS (read)
[2] Write data to the destination, read the next instruction, and increment the PC: WD, RI (write, read)
2-way pipelining:

Cycle             1      2      3      4      5
Instruction p     WDp
Instruction p+1   RSp+1  WDp+1
Instruction p+2          RSp+2  WDp+2
Instruction p+3                 RSp+3  WDp+3
Instruction p+4                        RSp+4  WDp+4

Completion of 1 instruction per cycle.
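The ultimate RISC's single move instruction can be simulated in a few lines. This Python sketch is hypothetical throughout: it models only the 4-cycle instruction cycle, the PC at address 0, and four of the memory-mapped ALU functions from the table (addresses 8 to 11). The names `MoveMachine`, `AC_ADDR`, and so on are invented for illustration, the memory size and program placement are arbitrary, updating the Z flag after every ALU write is an assumption, and neither pipelining nor the condition-code branch mechanism is modeled (reading address 10 simply moves the value 2 or 0).

```python
PC = 0                       # address 0 holds the program counter (from the slides)
ALU_FIRST, ALU_LAST = 8, 15  # memory-mapped ALU addresses (from the table)
AC_ADDR = 8                  # write: AC <- data       read: data <- AC
SUB_ADDR = 9                 # write: AC <- AC - data
Z_ADDR = 10                  # read: data <- Z as 2 (true) or 0 (false)
ADD_ADDR = 11                # write: AC <- data + AC


class MoveMachine:
    """Hypothetical move-only machine: each instruction is a (source,
    destination) pair of adjoining words. Only a few ALU addresses are
    modeled; branching on condition codes is not."""

    def __init__(self, size: int = 64) -> None:
        self.mem = [0] * size
        self.ac = 0
        self.z = False

    def read(self, addr: int) -> int:
        if addr == AC_ADDR:
            return self.ac
        if addr == Z_ADDR:
            return 2 if self.z else 0
        if ALU_FIRST <= addr <= ALU_LAST:
            raise NotImplementedError(f"ALU read at {addr} not modeled")
        return self.mem[addr]

    def write(self, addr: int, value: int) -> None:
        if addr == AC_ADDR:
            self.ac = value
        elif addr == SUB_ADDR:
            self.ac -= value
        elif addr == ADD_ADDR:
            self.ac += value
        elif ALU_FIRST <= addr <= ALU_LAST:
            raise NotImplementedError(f"ALU write at {addr} not modeled")
        else:
            self.mem[addr] = value  # writing address 0 is a jump
            return
        self.z = self.ac == 0       # assumed: Z tracks the latest AC value

    def step(self) -> None:
        """Execute one 4-cycle move instruction."""
        src = self.mem[self.mem[PC]]   # addr <- M[PC]
        self.mem[PC] += 1              # PC <- PC + 1
        temp = self.read(src)          # temp <- M[addr]
        dst = self.mem[self.mem[PC]]   # addr <- M[PC]
        self.mem[PC] += 1              # PC <- PC + 1
        self.write(dst, temp)          # M[addr] <- temp

    def load(self, start: int, moves) -> None:
        """Store (source, destination) pairs contiguously and point PC at them."""
        for i, (src, dst) in enumerate(moves):
            self.mem[start + 2 * i] = src
            self.mem[start + 2 * i + 1] = dst
        self.mem[PC] = start


m = MoveMachine()
m.mem[40], m.mem[41] = 7, 5  # hypothetical data words X and Y
m.load(16, [(40, AC_ADDR), (41, ADD_ADDR), (AC_ADDR, 42),   # M[42] <- X + Y
            (40, AC_ADDR), (40, SUB_ADDR), (Z_ADDR, 43)])   # M[43] <- Z of X - X
for _ in range(6):
    m.step()
print(m.mem[42], m.mem[43], m.mem[PC])  # 12 2 28
```

The program contains no ADD or SUB op-codes: arithmetic happens as a side effect of moving data to special addresses, which is the idea behind the memory-mapped ALU.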