CS 162 Computer Architecture Lecture 3: Pipelining Contd. Instructor: L.N. Bhuyan www.cs.ucr.edu/~bhuyan/cs162
1999 ©UCB

Single Cycle Datapath (From Ch 5)
[Figure: single-cycle datapath. PC, Imem, register file (Regs), sign extend, shift left 2, adders, ALU, and Dmem, with control signals PCSrc, RegDst, RegWrite, ALUSrc, ALUOp, MemRead, MemWrite, and MemToReg.]

Required Changes to Datapath
° Introduce registers to separate the 5 stages by putting IF/ID, ID/EX, EX/MEM, and MEM/WB registers in the datapath.
° The next PC value is computed in the 3rd step, but we need to bring in the next instruction in the next cycle, so move the PCSrc mux to the 1st stage. The PC is incremented unless there is a new branch address.
° The branch address is computed in the 3rd stage. With pipelining, the PC value has changed by then, so we must carry the PC value along with the instruction. Width of IF/ID register = (IR: 32) + (PC: 32) = 64 bits.

Changes to Datapath Contd.
° For a lw instruction, we need the write register address at stage 5, but the IR is now occupied by another instruction. So we must carry the IR destination field along as the instruction moves through the stages. See the connection in the figure.
° Length of ID/EX register = (Reg1: 32) + (Reg2: 32) + (offset: 32) + (PC: 32) + (destination register: 5) = 133 bits
° Assignment: What are the lengths of the EX/MEM and MEM/WB registers?

Pipelined Datapath (with Pipeline Regs) (6.2)
[Figure: pipelined datapath with stages Fetch, Decode, Execute, Memory, and Write Back, separated by the IF/ID, ID/EX, EX/MEM, and MEM/WB registers. Register widths as labeled in the figure: 64 bits, 133 bits, 102 bits, 69 bits.]

Pipelined Control (6.3)
• Start with the single-cycle controller
• Group control lines by the pipeline stage in which they are needed
• Extend the pipeline registers with control bits
[Figure: control lines generated from the instruction and carried in the pipeline registers. EX: RegDst, ALUOp, ALUSrc. Mem: Branch, MemRead, MemWrite. WB: MemToReg, RegWrite.]

Pipelined Processor: Datapath + Control
• More work is needed to correctly handle pipeline hazards
[Figure: pipelined datapath with the control unit and the WB, M, and EX control fields carried through ID/EX, EX/MEM, and MEM/WB.]

Recap
° If we can keep all pipeline stages busy, we can retire (complete) up to one instruction per clock cycle, thereby achieving single-cycle throughput.
° The pipeline paradox (for MIPS): any instruction still takes 5 cycles to execute, even though the pipeline can retire one instruction per cycle.

Problems for Pipelining
° Hazards prevent the next instruction from executing during its designated clock cycle, limiting speedup
• Structural hazards: the hardware cannot support this combination of instructions (e.g., a single memory for instructions and data)
• Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
• Control hazards: conditional branches and other instructions may stall the pipeline, delaying later instructions

Single Memory is a Structural Hazard
[Figure: pipeline timing diagram, time (clock cycles) against instruction order, for Load, Instr 1, Instr 2, Instr 3, and Instr 4, each passing through M, Reg, ALU, M, Reg.]
• Can't read the same memory twice in the same clock cycle

EX: MIPS multicycle datapath: Structural Hazard in Memory
[Figure: MIPS multicycle datapath with a single memory holding both instructions and data, plus PC, Instruction Register, Memory Data Register, register file, A and B registers, ALU, and ALUOut.]

Structural Hazards limit performance
° Example: if there are 1.3 memory accesses per instruction (30% of instructions execute loads and stores) and only one memory access per cycle, then
• Average CPI ≥ 1.3
• Otherwise the datapath resource is more than 100% utilized
Structural Hazard Solution: Add more Hardware

Speed Up Equation for Pipelining
\[
\text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Pipeline stall clock cycles per instruction}
\]
\[
\text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Clock Cycle}_{\text{unpipelined}}}{\text{Clock Cycle}_{\text{pipelined}}}
\]
With Ideal CPI = 1:
\[
\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Clock Cycle}_{\text{unpipelined}}}{\text{Clock Cycle}_{\text{pipelined}}}
\]

Example: Dual-port vs. Single-port
° Machine A: dual-ported memory
° Machine B: single-ported memory, but its pipelined implementation has a 1.05 times faster clock rate
° Ideal CPI = 1 for both
° Loads are 40% of instructions executed
SpeedUpA = Pipeline Depth/(1 + 0) x (clock_unpipe/clock_pipe) = Pipeline Depth
SpeedUpB = Pipeline Depth/(1 + 0.4 x 1) x (clock_unpipe/(clock_unpipe/1.05)) = (Pipeline Depth/1.4) x 1.05 = 0.75 x Pipeline Depth
SpeedUpA / SpeedUpB = Pipeline Depth/(0.75 x Pipeline Depth) = 1.33
° Machine A is 1.33 times faster

Data Hazard on Register $1 (6.4)
    add $1, $2, $3
    sub $4, $1, $3
    and $6, $1, $7
    or  $8, $1, $9
    xor $10, $1, $11

Data Hazard Solution:
• "Forward" the result from one stage to another
[Figure: pipeline timing diagram (IF, ID/RF, EX, MEM, WB) for add $1,$2,$3; sub $4,$1,$3; and $6,$1,$7; or $8,$1,$9; xor $10,$1,$11, showing the forwarding paths.]
• "or" is OK if the register file is implemented properly

Hazard Detection for Forwarding
° A hazard must be detected just before execution so that, in case of a hazard, the data can be forwarded to the input of the ALU.
° A hazard is detected when a source register (Rs, Rt, or both) of the instruction in the EX stage is equal to the destination register (Rd) of an instruction further along in the pipeline (in either the MEM or the WB stage).
° Compare the Rs and Rt register numbers in the ID/EX register with Rd in the EX/MEM and MEM/WB registers => need to carry the Rs, Rt, and Rd values from the IF/ID register to the ID/EX register (only Rd was carried before).
° If they match, forward the data to the input of the ALU through the multiplexor. See Fig. 6.43, p. 488 of the text.

Forwarding: What about Loads?
• Dependencies backward in time are hazards
[Figure: pipeline timing diagram (IF, ID/RF, EX, MEM, WB) for lw $1,0($2) followed by sub $4,$1,$3.]
• Can't solve with forwarding alone
• Must stall the instruction dependent on the load
• "Load-Use" hazard

Data Hazard Even with Forwarding
• Must stall the pipeline 1 cycle (insert 1 bubble)
[Figure: pipeline timing diagram for lw $1, 0($2); sub $4,$1,$6; and $6,$1,$7; or $8,$1,$9, with one bubble inserted after the lw.]

Compiler Schemes to Improve Load Delay
° The compiler detects the data dependency and inserts nop instructions until the data is available
    sub $2, $1, $3
    nop
    and $12, $2, $5
    or  $13, $6, $2
    add $14, $2, $2
    sw  $15, 100($2)
° The compiler finds independent instructions to fill in the delay slots

Software Scheduling to Avoid Load Hazards
Try producing fast code for
    a = b + c;
    d = e - f;
assuming a, b, c, d, e, and f are in memory.

    Slow code:          Fast code:
    LW  Rb,b            LW  Rb,b
    LW  Rc,c            LW  Rc,c
    ADD Ra,Rb,Rc        LW  Re,e
    SW  a,Ra            ADD Ra,Rb,Rc
    LW  Re,e            LW  Rf,f
    LW  Rf,f            SW  a,Ra
    SUB Rd,Re,Rf        SUB Rd,Re,Rf
    SW  d,Rd            SW  d,Rd
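The difference between the slow and fast code sequences can be checked with a small script. The following is only a sketch under stated assumptions, not part of the lecture: it assumes a 5-stage pipeline with forwarding, where an instruction that reads a register loaded by the immediately preceding LW costs exactly one stall cycle, and it ignores all other hazards. The names `parse`, `load_use_stalls`, and `total_cycles` are hypothetical helpers introduced here for illustration.

```python
# Sketch: count load-use stalls in the slide's instruction sequences.
# Assumptions (not from the slides): 5-stage pipeline with forwarding,
# a use right after a load costs one bubble, no other hazards modeled.

SLOW = ["LW Rb,b", "LW Rc,c", "ADD Ra,Rb,Rc", "SW a,Ra",
        "LW Re,e", "LW Rf,f", "SUB Rd,Re,Rf", "SW d,Rd"]

FAST = ["LW Rb,b", "LW Rc,c", "LW Re,e", "ADD Ra,Rb,Rc",
        "LW Rf,f", "SW a,Ra", "SUB Rd,Re,Rf", "SW d,Rd"]


def parse(line):
    """Split 'OP X,Y,Z' into (opcode, destination, list of source registers)."""
    op, rest = line.split(None, 1)
    args = [a.strip() for a in rest.split(",")]
    if op == "LW":            # LW Rdest,addr: address is a symbolic name
        return op, args[0], []
    if op == "SW":            # SW addr,Rsrc in the slide's notation
        return op, None, [args[1]]
    return op, args[0], args[1:]   # ALU op: dest, src1, src2


def load_use_stalls(program):
    """Count instructions that read the register loaded by the previous LW."""
    stalls = 0
    pending_load = None       # destination of the previous instruction if it was a LW
    for line in program:
        op, dest, sources = parse(line)
        if pending_load is not None and pending_load in sources:
            stalls += 1
        pending_load = dest if op == "LW" else None
    return stalls


def total_cycles(program, depth=5):
    """Cycles to drain the pipeline: fill time plus one per instruction plus stalls."""
    return len(program) + (depth - 1) + load_use_stalls(program)


if __name__ == "__main__":
    for name, prog in (("slow", SLOW), ("fast", FAST)):
        print(name, "stalls:", load_use_stalls(prog), "cycles:", total_cycles(prog))
```

Under these assumptions the slow sequence incurs a stall at both ADD and SUB, because each directly follows the load of one of its operands, while the reordered fast sequence places an independent instruction after every load and incurs none.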