BranchTap Improving Performance With Very Few Checkpoints Through Adaptive Speculation Control Patrick Akl and Andreas Moshovos AENAO Research Group Department of Electrical and Computer Engineering University.


BranchTap
Improving Performance With Very Few Checkpoints
Through Adaptive Speculation Control
Patrick Akl and Andreas Moshovos
AENAO Research Group
Department of Electrical and Computer Engineering
University of Toronto
June 28th, 2006
BranchTap: Improving Performance With Very Few
Checkpoints Through Adaptive Speculation Control
1/25
What Happens on a Branch Misprediction?
[Figure: execution timeline. Predict a branch outcome and follow the predicted path → misprediction discovered → recover processor state → redirect fetch → resume execution on the correct path]
• We wish to make the recovery fast
State-of-the-art recovery
• Existing mechanisms
– Reorder buffer based: slow
– Instantaneous checkpoints: faster
• Problem: can’t have enough checkpoints
• State-of-the-art solution: checkpoint prediction
– Allocate the few checkpoints judiciously
• Another degree of freedom: speculation control
– Sometimes deeper speculation = higher recovery cost
• Can hurt performance
– Throttle speculation
BranchTap Results / Benefits
• No additional checkpoints are needed
• Dynamically adapts to application behavior
• Improves performance for most programs
– Misprediction performance penalty reduced by 28% on average
• BranchTap comes “for free”
– Very simple to implement
– Better than more accurate checkpoint predictors
Outline
• Background
• BranchTap
• Methodology and Results
• Summary
State Recovery Example: Register Alias Table
Original Code
A add r1, r2, 100
B breq r1, E
C sub r1, r2, r2
Renamed Code
A add p4, p2, 100
B breq p4, E
C sub p5, p2, p2
[Figure: RAT indexed by architectural register and holding physical register names; labels: Lg(# arch. regs), # arch. regs; entries shown: p1, p4, p5, p4, p2, p3]
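The renaming in this example can be sketched in a few lines of Python. This is an illustrative sketch only: the initial mapping (r1→p1, r2→p2, r3→p3), the free list starting at p4, and the `rename` helper are assumptions chosen to match the register names on the slide.

```python
def rename(code, rat, free_list):
    """Rename instructions given as (op, dest, srcs) tuples.

    Sources are looked up in the RAT before the destination is remapped.
    """
    renamed = []
    for op, dest, srcs in code:
        # Immediates and branch targets are not in the RAT and pass through.
        new_srcs = [rat.get(s, s) for s in srcs]
        new_dest = None
        if dest is not None:
            new_dest = free_list.pop(0)  # allocate a fresh physical register
            rat[dest] = new_dest         # update the RAT mapping
        renamed.append((op, new_dest, new_srcs))
    return renamed

# Assumed initial state (hypothetical, chosen to match the slide).
rat = {"r1": "p1", "r2": "p2", "r3": "p3"}
free_list = ["p4", "p5", "p6"]
code = [
    ("add", "r1", ["r2", 100]),   # A
    ("breq", None, ["r1", "E"]),  # B
    ("sub", "r1", ["r2", "r2"]),  # C
]
renamed = rename(code, rat, free_list)
```

After renaming, r1 maps to p5. If branch B turns out to be mispredicted, the RAT must be rolled back so that r1 maps to p4 again.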
ROB: Slow, Fine-Grain Recovery
Each entry contains
1. Architectural destination register
2. Its previous RAT map
Recovery steps (instructions are held in the Reorder Buffer in program order)
1. Misprediction discovered
2. Locate newest instruction
3. Undo RAT updates in reverse order
• Too slow: recovery latency proportional to number of instructions to squash
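A minimal Python sketch of this ROB walk, assuming a list-based ROB ordered oldest to newest. The names `RobEntry` and `rob_recover` are hypothetical. It counts one undo step per squashed instruction to show why the latency grows with the number of instructions to squash.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RobEntry:
    arch_dest: Optional[str]  # architectural destination register (None for a branch)
    prev_map: Optional[str]   # RAT mapping that this instruction overwrote

def rob_recover(rat, rob, branch_pos):
    """Squash every entry younger than rob[branch_pos], newest first.

    Returns the number of undo steps, one per squashed instruction.
    """
    steps = 0
    while len(rob) > branch_pos + 1:
        entry = rob.pop()  # locate the newest instruction
        if entry.arch_dest is not None:
            rat[entry.arch_dest] = entry.prev_map  # undo its RAT update
        steps += 1
    return steps

# Assumed state after renaming A, B, C from the earlier example.
rat = {"r1": "p5", "r2": "p2"}
rob = [RobEntry("r1", "p1"), RobEntry(None, None), RobEntry("r1", "p4")]
steps = rob_recover(rat, rob, branch_pos=1)  # branch B was mispredicted
```

Here only instruction C is squashed, so a single step restores r1 to p4. With hundreds of in-flight instructions the same walk takes hundreds of steps.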
Global Checkpoints: Fast, Coarse-Grain Recovery
[Figure: Reorder Buffer in program order with a checkpoint taken at each branch (B); when a misprediction is discovered, the RAT is restored from the checkpoint of the mispredicted branch]
• Branch w/ GC: Recovery is “Instantaneous”
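Checkpoint-based recovery can be sketched as a snapshot and restore of the RAT. This is a hedged illustration rather than the hardware design: the class `CheckpointedRat`, its method names, and the policy of discarding younger checkpoints on recovery are assumptions.

```python
class CheckpointedRat:
    def __init__(self, mapping, max_checkpoints):
        self.mapping = dict(mapping)
        self.max_checkpoints = max_checkpoints
        self.checkpoints = {}  # branch id -> RAT snapshot, in allocation order

    def take_checkpoint(self, branch_id):
        """Snapshot the RAT at a branch; returns False if no checkpoint is free."""
        if len(self.checkpoints) >= self.max_checkpoints:
            return False
        self.checkpoints[branch_id] = dict(self.mapping)
        return True

    def recover(self, branch_id):
        """Restore the whole RAT in one step, independent of squash count."""
        ids = list(self.checkpoints)
        pos = ids.index(branch_id)
        self.mapping = dict(self.checkpoints[branch_id])
        # The resolved branch and all younger branches release their checkpoints.
        for stale in ids[pos:]:
            del self.checkpoints[stale]

rat = CheckpointedRat({"r1": "p4", "r2": "p2"}, max_checkpoints=1)
took_b = rat.take_checkpoint("B")    # checkpoint taken at branch B
rat.mapping["r1"] = "p5"             # instruction C renames r1
took_b2 = rat.take_checkpoint("B2")  # a later branch finds no free checkpoint
rat.recover("B")                     # misprediction on B
```

The second branch gets no checkpoint because only one exists. That is the case the rest of the talk is concerned with.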
Impact of More Checkpoints
[Figure: RAT with checkpoints, shown as a concept and as the actual implementation; each entry maps an architectural register to a physical register, with a working copy]
• More checkpoints?
– Power hungry structure
– Increased delay
• Only a few checkpoints can practically be implemented
– Cannot always cover all branches
Intelligent Checkpointing
• State-of-the-art solution
– Checkpoint allocation: allocate checkpoints at hard-to-predict branches
– Checkpoint management: release checkpoints as soon as they are no longer needed
• Use few checkpoints efficiently
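One common way to identify hard-to-predict branches is a table of resetting confidence counters. The Python sketch below assumes that scheme; the slides do not specify the estimator, so the counter policy, the thresholds, and the names `ConfidenceTable` and `should_checkpoint` are all illustrative assumptions.

```python
class ConfidenceTable:
    """Assumed resetting-counter confidence estimator, indexed by branch PC."""

    def __init__(self, entries=1024, max_count=15, high_conf_at=15):
        self.counters = [0] * entries
        self.max_count = max_count
        self.high_conf_at = high_conf_at

    def _index(self, pc):
        return pc % len(self.counters)

    def is_low_confidence(self, pc):
        return self.counters[self._index(pc)] < self.high_conf_at

    def update(self, pc, predicted_correctly):
        i = self._index(pc)
        if predicted_correctly:
            # Saturating increment on a correct prediction.
            self.counters[i] = min(self.counters[i] + 1, self.max_count)
        else:
            # Reset on a misprediction.
            self.counters[i] = 0

def should_checkpoint(table, pc, free_checkpoints):
    """Allocate a checkpoint only at low-confidence branches, if one is free."""
    return free_checkpoints > 0 and table.is_low_confidence(pc)

table = ConfidenceTable(entries=4, max_count=3, high_conf_at=2)
pc = 0x40
```

A branch starts as low confidence, becomes high confidence after enough consecutive correct predictions, and drops back to low confidence on a misprediction.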
Conventional Mechanisms: Recovery Scenarios
• Misspeculation on a branch w/ a GC: Direct recovery
[Figure: ROB with a checkpoint at the mispredicted branch (B): fast recovery]
• Misspeculation on a branch w/o a GC: Indirect recovery
[Figure: ROB with the checkpoint at a different branch than the mispredicted one (B): slow recovery]
• With intelligent checkpointing:
– 30% indirect recoveries → 75% of performance loss
Outline
• Background
• BranchTap
• Methodology and Results
• Summary
BranchTap Motivation
[Figure: ROB timelines for a low confidence branch (B) in two scenarios, No Wait and Wait, each marking the checkpoints, the point where the misprediction is discovered, and the approximate recovery cost]
Sometimes, it is better to wait if no checkpoint is available
BranchTap Concept
• Key idea: stall when speculation is likely to deteriorate performance
– Count the number of low confidence branches w/o a checkpoint
– If it exceeds a threshold, stall
• Threshold selection
– Fixed
• Varies greatly across programs
• Can deteriorate performance significantly
– Adaptive
• Robust performance
• Minimize recovery cost while conserving good speculation opportunities
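The counting rule above can be sketched as a single counter and a comparator. This Python sketch assumes the counter tracks unresolved (in-flight) low confidence branches and that a branch leaves the count when it resolves. The class and method names are hypothetical, and the threshold adaptation itself is not modeled.

```python
class BranchTapThrottle:
    def __init__(self, threshold):
        self.threshold = threshold
        self.unprotected = 0  # in-flight low confidence branches w/o a checkpoint

    def on_branch_dispatch(self, low_confidence, has_checkpoint):
        """Returns True if the branch was counted; the caller keeps this flag."""
        counted = low_confidence and not has_checkpoint
        if counted:
            self.unprotected += 1
        return counted

    def on_branch_resolved(self, counted):
        """Called when a branch resolves or is squashed (assumed policy)."""
        if counted:
            self.unprotected -= 1

    def should_stall(self):
        # Stall when the count exceeds the threshold.
        return self.unprotected > self.threshold

tap = BranchTapThrottle(threshold=1)
b1 = tap.on_branch_dispatch(low_confidence=True, has_checkpoint=False)
b2 = tap.on_branch_dispatch(low_confidence=True, has_checkpoint=True)
b3 = tap.on_branch_dispatch(low_confidence=False, has_checkpoint=False)
stall_after_one = tap.should_stall()
b4 = tap.on_branch_dispatch(low_confidence=True, has_checkpoint=False)
stall_after_two = tap.should_stall()
tap.on_branch_resolved(b1)
stall_after_resolve = tap.should_stall()
```

With a threshold of 1, the front end keeps fetching past one unprotected low confidence branch, stalls at the second, and resumes once one of them resolves.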
Threshold Adaptation Policy
[Figure: Execution Timeline (Cycles); labels: WT, Sample & adapt, WT, WT + 4, WT - 4, No adaptation, Next WT]
• BranchTap adapts across and within applications
Outline
• Background
• BranchTap
• Methodology and Results
• Summary
Results Overview
• Performance w/o Checkpoints
– BranchTap improves even with just an ROB
• Performance w/ 4 Checkpoints
– BranchTap improves over conventional recovery methods
• Performance w/ Larger Checkpoint Predictors
– BranchTap offers better performance than a 64x larger predictor
Methodology
• Simulator based on Simplescalar
• 24 SPEC CPU 2000 benchmarks
• Reference Inputs
• Processor configurations
– 8-way OoO core
– Up to 1K in-flight instructions
– 1K-entry confidence table for low confidence branch identification
• 1B committed instructions after skipping 100B
“Perfect Checkpointing” Configuration
• A checkpoint is auto-magically taken at all mispredicted branches
– All recoveries are fast
• We report the “deterioration relative to perfect checkpointing”
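The slides do not give a formula for this metric. One plausible reading, stated here as an assumption, is the relative performance loss against the perfect-checkpointing configuration, with the "penalty removed" figures being the relative reduction in that loss. The Python sketch below uses made-up performance numbers purely for illustration.

```python
def deterioration(perf_perfect, perf_config):
    """Assumed definition: relative loss versus perfect checkpointing."""
    return (perf_perfect - perf_config) / perf_perfect

def penalty_removed(det_conventional, det_branchtap):
    """Assumed definition: fraction of the conventional deterioration removed."""
    return (det_conventional - det_branchtap) / det_conventional

# Hypothetical performance values (e.g. IPC), not taken from the talk.
det_conv = deterioration(perf_perfect=2.0, perf_config=1.8)
det_bt = deterioration(perf_perfect=2.0, perf_config=1.9)
removed = penalty_removed(det_conv, det_bt)
```

Under these made-up numbers the conventional configuration deteriorates by about 10%, the BranchTap configuration by about 5%, and about half of the penalty is removed.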
Performance with No Checkpoints
• Deterioration relative to “perfect checkpointing”
[Figure: bar chart of deterioration (y-axis 0% to 25%, lower is better) for gzip, vpr, lucas, art, and AVG; series: Conventional, BranchTap Adaptive, BranchTap Non-Adaptive; annotation: -39%]
• BranchTap improves over conventional mechanisms
• Adaptation leads to robust performance improvements
Performance Evaluation with 4 Checkpoints
• Deterioration relative to “perfect checkpointing”
[Figure: bar chart of deterioration (y-axis 0% to 10%, lower is better) for twolf, parser, lucas, mcf, bzip2, and AVG; series: Conventional, BranchTap Adaptive, BranchTap Non-Adaptive; annotation: -28%]
• BranchTap with 4 checkpoints is better than 6 checkpoints alone
BranchTap vs. Larger Checkpoint Predictors
[Figure: deterioration (y-axis 0.0% to 3.0%, lower is better) versus confidence table size (64, 256, 1K, 4K, 16K, 64K), with the BranchTap result marked]
• BranchTap with a 1K-entry confidence table and 4 GCs:
– Higher performance than a 64K-entry confidence table with 4 GCs
– Lower complexity, virtually comes “for free”
Outline
• Background
• BranchTap
• Methodology and Results
• Summary
Summary
• Performance with 4 (no) checkpoints
– ~28% (39%) of misprediction penalty removed
– BranchTap is robust:
• Up to 6% (13%) better and at most 1.2% (0.1%) worse than conventional mechanisms
• BranchTap is very simple to implement
– Few counters and comparators
• BranchTap is better than other alternatives
– BT + 1K predictor better than a 64K predictor alone
– BT + 4 GCs better than 6 GCs alone
BranchTap
Improving Performance With Very Few Checkpoints
Through Adaptive Speculation Control
Patrick Akl and Andreas Moshovos
AENAO Research Group
Department of Electrical and Computer Engineering
University of Toronto
{pakl, moshovos}@eecg.toronto.edu