Transcript Document
Pipelining: Reducing Instruction Execution Time
COMP25212 Lecture 5

The Fetch-Execute Cycle
[Diagram: CPU connected to Memory]
• Instruction execution is a simple repetitive cycle: fetch instruction, execute instruction

Cycles of Operation
• Most logic circuits are driven by a clock
• In its simplest form, one operation would take one clock cycle
• This assumes that fetching an instruction and accessing data memory can each be done in 1/5th of a cycle (i.e. a cache hit)

Fetch-Execute Detail
The two parts of the cycle can be subdivided:
• Fetch
– Get instruction from memory
– Decode instruction & select registers
• Execute
– Perform operation or calculate address
– Access an operand in data memory
– Write result to a register

Processor Detail
[Diagram: 5-stage datapath: PC and Instruction Cache (IF), decode logic and Register Bank (ID), ALU and MUX (EX), Data Cache (MEM), write-back (WB)]

Logic to do this
[Diagram: Fetch Logic, Decode Logic, Exec Logic, Mem Logic and Write Logic connected in series, with the Inst Cache feeding the fetch stage and the Data Cache attached to the memory stage]
• Each stage will do its work and pass it on to the next
• Each block is only computing for 1/5th of each cycle

Can We Overlap Operations?
• E.g. while decoding one instruction we could be fetching the next

Clock cycle:  1    2    3    4    5    6    7
Inst a:       IF   ID   EX   MEM  WB
Inst b:            IF   ID   EX   MEM  WB
Inst c:                 IF   ID   EX   MEM  WB
Inst d:                      IF   ID   EX   MEM
Inst e:                           IF   ID   EX

Insert Buffers Between Stages
[Diagram: the same five stages, but with a clocked buffer (e.g. an Instruction Register) between each pair of stages]
• Instead of a direct connection between stages, use extra buffers to hold state
• Clock the buffers once per cycle

This is a Pipeline
• Just like a car production line: one stage puts the engine in, the next puts the wheels on, etc.
• We still complete one instruction every cycle
• We can now increase the clock speed by 5x
• 5x faster!
• But it isn't quite that easy!
Why 5 Stages?
• Simply because early pipelined processors determined that dividing the work into these 5 stages of roughly equal complexity was appropriate
• Some recent processors have used up to 30 pipeline stages
• We will consider 5 for simplicity at the moment

The Control Transfer Problem
• The obvious way to fetch instructions is in serial program order (i.e. just incrementing the PC)
• What if we fetch a branch?
• We only know it's a branch when we decode it in the second stage of the pipeline
• By that time we are already fetching the next instruction in serial order

A Pipeline 'Bubble'
[Diagram: instruction sequence Inst 1, Inst 2, Inst 3, Branch to n, Inst 5, …, Inst n; the branch is only recognised when it reaches the decode stage, by which time Inst 5 has already been fetched]
• We must mark Inst 5 as unwanted and ignore it as it goes down the pipeline
• But we have wasted a cycle

Conditional Branches
• It gets worse!
• Suppose we have a conditional branch
• It is possible that we might not be able to determine the branch outcome until the execute (3rd) stage
• We would then have 2 'bubbles'
• We can often avoid this by reading registers during the decode stage
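The cost of these bubbles can be put into a rough formula. This is a back-of-the-envelope sketch of my own (the function and parameter names are hypothetical, not from the lecture): total cycles are the instruction count, plus the pipeline fill cost, plus one wasted cycle per bubble.

```python
# Rough cost model for control hazards on an in-order pipeline (illustrative).
def total_cycles(num_instructions, num_stages, taken_branches, bubbles_per_branch):
    """Instructions + pipeline fill time + one stall cycle per bubble."""
    fill = num_stages - 1                      # cycles before the first result
    stalls = taken_branches * bubbles_per_branch
    return num_instructions + fill + stalls

# 100 instructions, 5 stages, 10 taken branches detected at decode (1 bubble each):
print(total_cycles(100, 5, 10, 1))   # 114
# Same program, but the condition only resolves at execute (2 bubbles each):
print(total_cycles(100, 5, 10, 2))   # 124
```

The comparison shows why resolving branches earlier (e.g. reading registers during decode) matters: each extra stage of delay per branch adds a full bubble.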
Longer Pipelines
• 'Bubbles' due to branches are usually called Control Hazards
• They occur when it takes one or more pipeline stages to detect the branch
• The more stages, the less each one does, so detection is more likely to take multiple stages
• Longer pipelines therefore usually suffer more degradation from control hazards

Branch Prediction
• In most programs a branch instruction is executed many times
• Also, the instruction will be at the same (virtual) address in memory each time
• What if, when a branch was executed:
– We 'remembered' its address
– We 'remembered' the address that was fetched next

Branch Target Buffer
• We could do this with some sort of cache, where the address is the branch address and the data is the target address
• As we fetch the branch we check the buffer
• If there is a valid entry in the buffer, we use it to fetch the next instruction

Branch Target Buffer (continued)
• For an unconditional branch we would always get it right
• For a conditional branch it depends on the probability that the next outcome is the same as the previous one
• E.g. for a 'for' loop which jumps back many times, we will get it right most of the time
• But it is only a prediction; if we get it wrong, we correct it on the next cycle (and suffer a 'bubble')

Outline Implementation
[Diagram: fetch stage in which the PC indexes both the Inst Cache and the Branch Target Buffer; on a valid BTB hit the stored target becomes the next PC, otherwise the PC is simply incremented]

Other Branch Prediction
• The BTB is simple to understand but expensive to implement
• Also, as described, it just uses the last outcome of a branch to predict the next one
• In practice branch prediction depends on:
– More history (several previous branches)
– Context (how did we get to this branch?)
• Real branch predictors are more complex and vital to performance (especially with long pipelines)
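The check-at-fetch, correct-at-resolve behaviour of a BTB can be sketched as a small direct-mapped cache. This is an illustrative model, not a hardware design from the lecture; the class name, entry count, and 4-byte instruction size are assumptions.

```python
# Sketch of a direct-mapped Branch Target Buffer (illustrative, not a real design).
class BranchTargetBuffer:
    def __init__(self, num_entries=16):
        self.num_entries = num_entries
        # Each entry holds (branch_address, target_address) or None (invalid).
        self.entries = [None] * num_entries

    def predict(self, pc):
        """At fetch time: on a valid hit, predict the remembered target."""
        entry = self.entries[pc % self.num_entries]
        if entry is not None and entry[0] == pc:
            return entry[1]          # predicted next PC
        return pc + 4                # default: next sequential instruction

    def update(self, pc, actual_target):
        """When the branch resolves, remember where it actually went."""
        self.entries[pc % self.num_entries] = (pc, actual_target)

btb = BranchTargetBuffer()
print(hex(btb.predict(0x100)))       # miss: falls through to 0x104
btb.update(0x100, 0x80)              # branch at 0x100 jumped back to 0x80
print(hex(btb.predict(0x100)))       # hit: predicts 0x80 next time
```

This captures the 'for' loop example above: once the backward jump has executed once, every later fetch of the same branch predicts the loop top, and only the final fall-through iteration mispredicts.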