Transcript Slide 1
Lecture 8: Modern Dynamic Instruction Scheduling
Tomasulo weakness, data forwarding, reg mapping table, generic superscalar models, examples

Slide 2
Tomasulo Performance
Observe at the EX stage: how many cycles does it take to execute this code?
    LW  R2,45(R3)
    ADD R6,R2,R4
    SUB R10,R0,R6
    ADD R10,R10,R12
Assume a load takes 1 cycle and an ALU operation takes 1 cycle.
[Diagram: IM, Fetch Unit, Decode, Rename, Reorder Buffer, Regfile, S-buf, L-buf, RS, RS, FU1, FU2, DM]

Slide 3
Tomasulo vs MIPS Pipeline
How many cycles on the 5-stage MIPS pipeline?
Why does the simple pipeline run faster?
[Diagram: IF, ID, EX, MEM, WB, with stall check and data forwarding]

Slide 4
Tomasulo Complexity and Efficiency
Modern processors employ deep pipelines
=> Can the rename stage be finished in one fast cycle?
=> How is register content stored?
[Diagram: IM, Fetch Unit, Decode, Rename, Reorder Buffer, Regfile, S-buf, L-buf, RS, RS, FU1, FU2, DM]

Slide 5
Review Tomasulo Inst Scheduling
Both instructions are in RS; no contention on CDB or FU.
    ADD R2,R2,45    # R2 => tag p, result = A
    SUB R6,R2,R4    # R4 is ready, = B
Cycle 1: ADD starts at FU, producing A
Cycle 2: ADD broadcasts p + A; SUB matches on p and accepts A
Cycle 3: SUB starts execution, FU calculates A-B
A is produced at cycle 1 but consumed at cycle 3. Unavoidable?

Slide 6
Review Data Forwarding
MIPS pipeline data forwarding: FU/MEM => FU
Why not in Tomasulo?
[Diagram: REG/ROB, FU, ROB bypass]
Cycle 2: forward A from FU output to FU input… But tag broadcasting has a one-cycle delay!!
When is it known that A will be ready?
Cycle 1: A is to be ready
Cycle 2: A and its tag are broadcast
If the tag is broadcast one cycle earlier …

Slide 7
Revise Scheduling*
    RS1: ADD R6,R2,R4
    RS2: SUB R10,R0,R6
    RS3: ADD R12,R10,R6
ADD(1) has been ready and selected. Then, over steps 1, 2, 3:
- ADD(1)'s tag is broadcast, and its operands are sent to the FU;
- SUB is woken up and selected;
- SUB's tag is broadcast, and its operands are sent to the FU;
- the forwarding logic replaces the 2nd FU operand with the FU output;
- ADD(2) is woken up, accepts the FU output, and is selected.
So on and so forth…
[Diagram: RS 1 to RS 5 feed SELECT, which feeds the FU]
RS can be centralized or distributed.
*Updated: tag broadcast happens one cycle earlier.
How to address CDB contention?

Slide 8
Revise Pipeline Stages
Before: FETCH, ISSUE, EXE, WB, COMMIT
After: FETCH, RENAME, REG/ROB Rd, SCHEDULE, EXE, WB, COMMIT
ISSUE: decode, rename, allocate RS and ROB, and read REG/ROB
EX: wake up and select an instruction, then FU-execute

Slide 9
Examples: Intel P6
[Pipeline fragment: … Decode, Decode, Rename, ROB Rd …]
- 40-entry ROB
- 20-entry reservation station (RS)
- Register Alias Table

Slide 10
Rethink RS and ROB design
Data broadcasting to the reservation stations:
- Broadcasting saves the reg-write to reg-read delay
- n child instructions can receive data simultaneously
However:
- Data forwarding can be used instead
- Not all n child instructions may FU-execute in the next cycle
- RS and ROB may store duplicate values

Slide 11
Physical Register
RS entry: busy, op, Qj, Qk, Vj, Vk
ROB entry: i-type, dest, PC, valid, result
Physical registers: p1, p2, p3, ..., p_n
Physical register: the collection of all temporary register contents

Slide 12
Register Mapping Approach
Rename architectural registers to physical registers
NO real architectural registers (they are now virtual registers)
RS => issue queue
Rename stage: allocate an issue queue entry, allocate a ROB entry, allocate a physical register
What is the tag now?
[Diagram: mapping table with entries ra => pa, rb => pb, rc => pc; a free list used for allocation; physical registers p1, p2, p3, ..., p_n holding values such as vala and valb]

Slide 13
Mis-speculation Recovery
RS+ROB: no changes are made to the architectural registers, so just clear the pipeline and re-fetch
Fundamental issue: software must not see wrong register contents
Recovery for the mapping approach: roll back the mapping table to the mis-speculation point
Architectural registers => virtual registers
[Diagram: committed mapping, ROB, physical registers p1, p2, p3, ..., p_n; mapping 1 and mapping 2 as mapping table status]
How to implement a mapping table that supports recovery?
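One possible answer to the closing question on the mis-speculation recovery slide is to checkpoint the mapping table at each speculated branch. The following is only a minimal sketch under stated assumptions, not the design of any real processor: the class `RenameMap`, its method names, and the allocation log are all hypothetical, and freeing registers at commit is left out.

```python
from collections import deque


class RenameMap:
    """Toy register mapping table with checkpoint-based recovery (hypothetical design)."""

    def __init__(self, arch_regs, num_phys):
        # Assumption: architectural register i initially maps to physical register i.
        self.table = {r: i for i, r in enumerate(arch_regs)}
        self.free = deque(range(len(arch_regs), num_phys))  # free list
        self.log = []            # physical registers allocated so far, in order
        self.checkpoints = {}    # branch id -> (saved table, log length)

    def rename(self, dest, srcs):
        # Sources read the current mapping; the destination gets a fresh register.
        src_tags = [self.table[s] for s in srcs]
        new_tag = self.free.popleft()
        self.log.append(new_tag)
        self.table[dest] = new_tag
        return new_tag, src_tags

    def checkpoint(self, branch_id):
        # Save a copy of the mapping at the speculation point.
        self.checkpoints[branch_id] = (dict(self.table), len(self.log))

    def rollback(self, branch_id):
        saved_table, log_len = self.checkpoints[branch_id]
        # Return registers allocated after the checkpoint to the free list.
        for tag in reversed(self.log[log_len:]):
            self.free.appendleft(tag)
        del self.log[log_len:]
        self.table = dict(saved_table)
        # Drop this checkpoint and every younger one.
        ids = list(self.checkpoints)
        for b in ids[ids.index(branch_id):]:
            del self.checkpoints[b]


m = RenameMap(["R0", "R2", "R4", "R6", "R10", "R12"], num_phys=10)
print(m.rename("R6", ["R2", "R4"]))     # (6, [1, 2])
m.checkpoint("branch1")
print(m.rename("R10", ["R0", "R6"]))    # (7, [0, 6])
print(m.rename("R10", ["R10", "R12"]))  # (8, [7, 5])
m.rollback("branch1")
print(m.table["R10"], list(m.free))     # 4 [7, 8, 9]
```

Copying the whole table per branch is simple but costly in hardware terms; walking the ROB backwards to undo mappings one at a time is another option that trades storage for recovery time.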
Slide 14
Change of pipeline
Stages: FETCH, RENAME, SCHEDULE, REG, EXE, WB, COMMIT
[Diagram: IM, Fetch Unit, Decode, Rename, issue queue, ROB, phy. regfile, S-buf, L-buf, FU1, FU2, DM]

Slide 15
Example: Intel Pentium 4
[Pipeline fragment: Alloc, Rename, Rename, Queue, Schd, Schd, Schd, Disp, Disp, Reg, Reg, Ex]
128 entries

Slide 16
Alpha 21264 Pipeline

Slide 17
Generic Superscalar Processor Models
Reservation based: Fetch, Rename, Reg/ROB, Schedule (wakeup, select), execute (FUs, D-cache, bypass), commit
Issue queue based: Fetch, Rename, Schedule (wakeup, select), Regfile, execute (FUs, D-cache, bypass), commit
Source: Palacharla PhD thesis, 1998

Slide 18
Summary of Dynamic Scheduling
Pipeline stages:
- Renaming (in-order)
- Schedule
- Commit (in-order)
Two organizations:
- Mapping table + phy reg + issue queue + ROB; REN => SCHD => REG
- Reg alias table + RS + ROB, with register values in RS and ROB; REN => REG => SCHD
Scheduling methods:
- Tag broadcasting vs. scoreboarding (later)
History:
- CDC6600: introduces scoreboarding
- Tomasulo: introduces renaming and tag broadcasting
- Reorder buffer: provides in-order commit
Real OOO processors:
- are very complicated (like a vehicle)
- bring implementation variants
- but are all rooted in those basic designs
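To make the cost of late tag broadcasting concrete, here is a small sketch that counts the cycle in which each instruction of the four-instruction example from the Tomasulo Performance slide starts at the FU. It is a simplified model, not the lecture's own: the function `ex_start_cycles` and the parameter `wakeup_delay` are hypothetical names, and the model assumes that all instructions are already waiting in the RS, that there are enough FUs and no CDB contention, and that every operation takes 1 cycle. A `wakeup_delay` of 1 models the original scheme (a result produced in cycle 1 is consumed in cycle 3), and 0 models broadcasting the tag one cycle earlier.

```python
def ex_start_cycles(program, wakeup_delay, latency=1):
    """Return the cycle in which each instruction starts at the FU.

    program: list of (dest, [sources]) tuples in program order.
    wakeup_delay: extra cycles between a result being produced and a
        dependent instruction being able to start (hypothetical parameter).
    """
    ready_at = {}  # register -> first cycle in which a consumer may start
    starts = []
    for dest, srcs in program:
        # An instruction starts once all of its source operands are usable.
        start = max([1] + [ready_at.get(s, 1) for s in srcs])
        starts.append(start)
        # Renaming means later readers only see the latest writer of dest.
        ready_at[dest] = start + latency + wakeup_delay
    return starts


# LW R2,45(R3); ADD R6,R2,R4; SUB R10,R0,R6; ADD R10,R10,R12
program = [
    ("R2", ["R3"]),
    ("R6", ["R2", "R4"]),
    ("R10", ["R0", "R6"]),
    ("R10", ["R10", "R12"]),
]

print(ex_start_cycles(program, wakeup_delay=1))  # [1, 3, 5, 7]
print(ex_start_cycles(program, wakeup_delay=0))  # [1, 2, 3, 4]
```

Under these assumptions the dependent chain runs back to back once the wakeup delay is removed, which is the effect the revised scheduling aims for.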