Branch prediction!

Transcript Branch prediction!

Dynamic History-Length Fitting: A third level of adaptivity for branch prediction

ISCA '98 Toni Juan Sanji Sanjeevan Juan J. Navarro

Department of Computer Architecture University Polit ècnica de Catalunya Presented by Danyao Wang ECE1718, Fall 2008

Overview • Branch prediction background • Dynamic branch predictors • Dynamic history-length fitting (DHLF) – Without context switches – With context switches • Results • Conclusion 2

Why branch prediction?

• Superscalar processors with deep pipelines – Intel Core 2 Duo: 14 stages – AMD Athlon 64: 12 stages – Intel Pentium 4: 31 stages • Many cycles before branch is resolved – Wasting time if wait… – Would be good if can do some useful work… •

Branch prediction!

What does it do?

sub r1, r2, r3 bne r1, r0, L1 add r4, r5, r6 … L1: add r4, r7, r8 sub r9, r4, r2 Branch resolved fetch decode fetch

sub

decode fetch Branch fetched Predict taken.

Fetch from L1

bne

decode fetch

add

decode

sub

Execute speculatively Validate prediction:

Correct

Time 4

What happens when mispredicted?

sub r1, r2, r3 bne r1, r0, L1 add r4, r5, r6 … L1: add r4, r7, r8 sub r9, r4, r2 Branch resolved fetch decode fetch

sub

decode fetch Branch fetched Predict taken.

Fetch from L1

bne

decode fetch

add

decode

sub

squash Execute speculatively Validate prediction:

Incorrect!

Time 5

How to predict branches?

• Statically at compile time – Simple hardware – Not accurate enough… • Dynamically at execution time – Hardware predictors • Last-outcome predictor • Saturation counter • Pattern predictor • Tournament predictor

More Complex More Accurate

Last-Outcome Branch Predictor • Simplest dynamic branch predictor • Branch prediction table with 1-bit entries • Intuition: history repeats itself PC lower N bits of PC index 2 N entries

1-bit Prediction: T or NT

-Read at Fetch -Write on misprediction

Branch Prediction Table

Saturation Counter Predictor • Observation: branches highly

bimodal

• n-bit saturation counter – Hysteresis – n-bit entries in branch prediction table

Strong bias

e.g.

2-bit bimodal predictor N

Pred. Not-Taken

T N

Pred. Taken 11

N T WEAK bias

Pattern Predictors • Near-by branches often correlate • Looks for patterns in branch history – Branch History Register (BHR):

outcomes most recent branch

Two-Level Predictor

lower n bits of PC saturation counter PC BHR f N-bit index 2 N entries m-bit history Branch Prediction Table 9

Tournament Predictor • No one-size-suits-all predictor • Dynamically choose among different predictors PC Predictor A Predictor B Predictor C Chooser or metapredictor 10

What is the best predictor?

Optimal

Better

Observations • Predictor performance depends on history length • Optimal history length differs for programs • Predictors with fixed history length underperforming potential • … dynamic history length?

Dynamic History-Length Fitting (DHLF)

Intuition • Tournament predictor – Picks best out of many predictors – Spatial multiplexing – Area cost … • DHLF: time multiplexing – Try different history lengths during execution – Adapt history length to code – Hope to find the best one 14

2-Level Predictor Revisited lower n bits of PC saturation counter PC BHR f n-bit index 2 n entries m-bit history Branch Prediction Table • Index = f(PC, BHR) • gshare, f = xor, m < n • 2-bit saturation counter 15

DHLF Approach • Current history length • Best so far length • Misprediction counter • Branch counter • Table of measured misprediction rates per length – Initialized to zero • Sampling at fixed intervals (

step

size) – Try new length: get MR – Adjust if worse than best seen before – Move to a random length if length has not changed for a while • Avoids local minima 16

DHLF Examples Index = 12 bits step = 16K Optimal 17

Experimental Methodology • SPECint95 •

gshare

and

dhlf-gshare

• Trace-driven simulation • Simulated up to 200M conditional branches • Branch history register & pattern history table immediately updated with the true outcome 18

DHLF Performance

Better

• Area overhead – Index length = 10; step size = 16K; overhead = 7% – Index length = 16; step size = 16K; overhead = 0.02% 19

Optimization Strategies • Step size – Small: learns faster • Has to be big enough for meaningful misprediction stats – Big: learns slower • Change length incrementally – Test as many lengths as possible • Warm-up period – No MR count for 1 interval after length change 20

Context Switches • Branch prediction table trashed periodically • Lower prediction accuracy immediately after a context switch • Context switch frequency affects optimal history length 21

Impact on Misprediction Rate

Context-switch distance

: # branches executed between context switches

Better

gshare. Index = 16 bits 22

Coping with Context Switches • Upon context switch – Discard current misprediction counter – Save current predictor data • misprediction table • current history length • Approx. 221 bits for 16-bit index, step = 16K, 13 bit misprediction counter • Returning from a context switch – Warm-up: no MR counter for 1 interval 23

DHLF with Context Switches x dhlf-gshare with step value = 16K gshare with all possible history length

Better

Branch prediction table flush every 70K instructions to simulate context switch.

Contributions • Dynamically finds near-optimal history lengths • Performs well for programs with different branch behaviours • Performs well under context switches • Can be applied to any two-level branch predictor • Small area overhead 25

Backup Slides

DHLF Performance: SPECint95

Better Better

dhlf-share; step size = 16K. Compared to all possible history lengths (no context switch) 27

DHLP with Context Switches

Better Better

dhlf-gshare; step size = 16K; context-switch distance = 70K 28

dhlf-gskew

Better

Step value = 16K. Compared to all history lengths for gskew, 29

dhlf-gskew with Context Switch

Better

Step size = 16K; Context-switch distance = 70K.

DHLF Structure Initial history length

step

dynamic branches Run next interval No current misprediction > min achieved?

Yes Adjust history length DHLF Data Structure Misprediction table 0 1 N N entries ptr. to min. misprediction count branch counter ptr. to entry for current history length misprediction counter 31

Questions • Is fixed context switch distance realistic?

• Does updating the PHT with true branch data immediately affect results?

– Previous studies show little impact due to this 32

Branch prediction!

Transcript Branch prediction!

Directory