Transcript part2.ppt

Motivation
• Claim: It’s hard to make a good predictor
– Example: which is better?
• Use 50% of bits for global history / 50% for address
• Use 25% for global / 75% for address
• Use 75% for global / 25% for address
• Use all bits as global XOR address
• Idea: Automatically try ALL predictors?!?!
• This takes too long!
– Their approach: use genetic programming
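To make the four candidate designs concrete, here is a minimal sketch of how each one could compute a table index. The 16-bit index width, the function names, and the sample PC and history values are assumptions for illustration, not details from the slides.

```python
INDEX_BITS = 16  # assumed table index width (illustrative)

def mask(bits):
    """Return a mask with the low `bits` bits set."""
    return (1 << bits) - 1

def concat_index(pc, ghist, global_bits, index_bits=INDEX_BITS):
    """Index made of `global_bits` history bits followed by address bits."""
    addr_bits = index_bits - global_bits
    return ((ghist & mask(global_bits)) << addr_bits) | (pc & mask(addr_bits))

def xor_index(pc, ghist, index_bits=INDEX_BITS):
    """Index that uses all bits: global history XOR address."""
    return (pc ^ ghist) & mask(index_bits)

pc, ghist = 0x4A3C, 0xB5F1  # made-up branch address and global history
candidates = {
    "50/50": concat_index(pc, ghist, 8),
    "25/75": concat_index(pc, ghist, 4),
    "75/25": concat_index(pc, ghist, 12),
    "xor": xor_index(pc, ghist),
}
```

Each entry of `candidates` is a different index for the same branch, which is why the choice between the designs is hard to settle by inspection.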
1
Generic GP Algorithm
1) Create an initial population
2) Rank the fitness of individuals in population
3) Create the new generation
i. Replication: good individuals survive
ii. Crossover: randomly pair good individuals with other good individuals
iii. Mutation: randomly change crossovers
iv. Encapsulation: limit crossovers
4) Goto step #2 until you’re happy.
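The steps above can be sketched as a short loop. This is a minimal illustration under stated assumptions: the function name `evolve`, its parameters, and the survivor and mutation fractions are hypothetical, encapsulation (step iv) is left out, and the toy individuals are bit lists rather than predictor programs.

```python
import random

def evolve(population, fitness, crossover, mutate, generations=10,
           survivors=0.2, mutation_rate=0.1, rng=None):
    """Generic loop: rank, replicate, cross over, mutate, repeat.

    `fitness` returns a score where higher is better. All knobs are
    hypothetical values chosen for this sketch.
    """
    rng = rng or random.Random(0)
    size = len(population)
    for _ in range(generations):
        # Step 2: rank individuals by fitness, best first.
        ranked = sorted(population, key=fitness, reverse=True)
        # Step 3.i: replication, the best individuals survive unchanged.
        keep = max(2, int(size * survivors))
        next_gen = ranked[:keep]
        # Steps 3.ii and 3.iii: cross good individuals, sometimes mutate the child.
        while len(next_gen) < size:
            a, b = rng.sample(ranked[:keep], 2)
            child = crossover(a, b, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)

def one_point(a, b, rng):
    """Toy crossover: head of one parent, tail of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def flip(ind, rng):
    """Toy mutation: flip one randomly chosen bit."""
    i = rng.randrange(len(ind))
    return ind[:i] + [1 - ind[i]] + ind[i + 1:]

# Toy usage: evolve 16-bit lists toward all ones (fitness = number of ones).
rng = random.Random(1)
pop = [[rng.randint(0, 1) for _ in range(16)] for _ in range(40)]
best = evolve(pop, sum, one_point, flip, generations=30, rng=rng)
```

Because the best individuals are copied unchanged into every new generation, the best fitness in this sketch never decreases.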
2
GP Language
(What is the population?)
• Predictor: a store, Predictor[n,d]
– n elements
– d bits / element
• Terminals
– inputs (PC, branch direction, etc.)
– updates (taken?)
– predictor value (P)
– constants
• Functions
– if, xor, cat, maskhi, masklo, msb, satur
[Figure: example predictor, a d-bit per-branch saturating counter. Labels: PC, MSB, IF, T, P + 1, P - 1]
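As a rough illustration of the per-branch saturating counter named on this slide, the sketch below expresses it in Python. The semantics given to `msb` and `satur`, the modulo indexing by PC, and the class layout are my reading of the names on the slide, not the paper's definitions.

```python
def msb(value, d):
    """Most significant bit of a d-bit value."""
    return (value >> (d - 1)) & 1

def satur(value, d):
    """Clamp value to the range of an unsigned d-bit counter."""
    return max(0, min(value, (1 << d) - 1))

class Predictor:
    """A store of n elements with d bits per element."""

    def __init__(self, n, d):
        self.n, self.d = n, d
        self.table = [0] * n

    def predict(self, pc):
        # Prediction expression: MSB of the counter selected by the PC.
        return msb(self.table[pc % self.n], self.d)

    def update(self, pc, taken):
        # Update expression: IF taken THEN P + 1 ELSE P - 1, saturated.
        p = self.table[pc % self.n]
        self.table[pc % self.n] = satur(p + 1 if taken else p - 1, self.d)

bp = Predictor(n=1024, d=2)
for outcome in [True, True, True]:
    bp.update(0x40, outcome)
prediction = bp.predict(0x40)  # 1 means "predict taken"
```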
3
Creating the Initial Generation
• Use templates (e.g., a 2-bit counter with a random index)
– Makes the initial population reasonable
– Any problems with this approach?
• How many?
– They create 400 random predictors.
– Is this enough?
– Is this too many?
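One way to picture template-based initialization is sketched below: a fixed 2-bit counter template whose index expression is generated at random. The terminal names, the choice of functions, the depth limit, and the dictionary layout are all assumptions made for illustration; only the 2-bit counter template and the population size of 400 come from the slide.

```python
import random

TERMINALS = ["PC", "GHIST"]   # assumed input names
FUNCTIONS = ["xor", "cat"]    # a subset of the functions in the language

def random_index(rng, depth=2):
    """Build a random index expression as a nested tuple."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    return (rng.choice(FUNCTIONS),
            random_index(rng, depth - 1),
            random_index(rng, depth - 1))

def from_template(rng):
    """Fill the 2-bit counter template with a random index expression."""
    return {"template": "twobit", "d": 2, "index": random_index(rng)}

rng = random.Random(0)
population = [from_template(rng) for _ in range(400)]
```

Every individual here shares the counter structure and differs only in its index, which hints at how strongly a template can bias later generations.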
4
Tournament Selection
(How do we randomly select?)
• Create new generation
• Must favour better individuals
• Every individual must have a chance
[Figure: Population (400) → Uniform Sample (8) → Fittest survives!]
5
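The tournament selection described on this slide can be sketched in a few lines: draw a uniform sample of 8 from the population of 400 and keep the fittest. The function names and the toy fitness (the individual's own value, higher is better) are assumptions for illustration.

```python
import random

def tournament_select(population, fitness, sample_size=8, rng=random):
    """Pick one individual: draw a uniform sample, keep the fittest."""
    contestants = rng.sample(population, sample_size)
    return max(contestants, key=fitness)

def select_parents(population, fitness, count, rng=random):
    """Run `count` independent tournaments."""
    return [tournament_select(population, fitness, rng=rng) for _ in range(count)]

rng = random.Random(0)
population = list(range(400))  # toy individuals; fitness is the value itself
parents = select_parents(population, lambda x: x, count=400, rng=rng)
```

Sampling uniformly gives most individuals some chance to be picked, while taking the maximum of the sample favours the better ones.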
Fitness
• Branch misprediction rate
• Multiple benchmarks
– Branch streams collected with ATOM
– Selected SPEC 92/95 and IBS benchmarks
• Problem: SLOW!
• How do we make it faster?
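A minimal sketch of this fitness measure follows: replay a branch trace through a predictor, count wrong predictions, and combine several traces. Averaging the per-trace rates is an assumption (the slide only says multiple benchmarks are used), and `LastOutcome` and the two tiny traces are hypothetical stand-ins for real predictors and benchmark streams.

```python
class LastOutcome:
    """Stand-in predictor: predict whatever the branch did last time."""

    def __init__(self):
        self.last = {}

    def predict(self, pc):
        return self.last.get(pc, True)

    def update(self, pc, taken):
        self.last[pc] = taken

def mispredict_rate(predictor, trace):
    """Fraction of (pc, taken) pairs in `trace` that are predicted wrongly."""
    wrong = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong / len(trace)

def fitness(make_predictor, traces):
    """Mean misprediction rate over all traces (lower is better)."""
    rates = [mispredict_rate(make_predictor(), t) for t in traces]
    return sum(rates) / len(rates)

loop_trace = [(0x10, True)] * 9 + [(0x10, False)]          # loop branch
alternating = [(0x20, i % 2 == 0) for i in range(10)]      # T, N, T, N, ...
score = fitness(LastOutcome, [loop_trace, alternating])
```

Every branch of every trace is replayed for every individual in every generation, which is where the cost noted on the slide comes from.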
6
Does it work? Yes!
(Table 1) Branch Predictor Performance
Predictor        Mispredict Rate (SPEC)   Mispredict Rate (IBS)
Onebit[1,512K]   17.7                     10.0
Twobit[2,256K]   13.1                     6.7
GShare           6.7                      2.7
GAg[18]          7.9                      4.0
PAg[18,8K]       7.9                      4.5
PAp[9,18,8K]     11.2                     5.5
GP1              9.7                      5.7
GP2              9.5                      5.0
GP3              9.7                      5.7
GP4              7.2                      3.0
GP5              7.0                      2.9
GP6              7.1                      2.9
• Get predictors that are as good as hand-crafted ones
• Why doesn’t it find GShare?
7
Does it work? No!
• Predictors are too complex to be implemented
• Was this a wasted attempt? No
– Discovered ideas that might be usable (sub-trees)
• Global history biased by direction (records usual histories)
• Separate forward and backward branch histories
• PC Histories
– Used the same framework for indirect jump predictors
• Can you use it for value predictors?
8
Fitness Thoughts
• Use only 1 (or a small number of) traces each time
– Pro: faster
– Con: might overly bias the generations
• Use only prefix of trace
– Go long enough to get stable RANKING
– They claim this is good enough (tried some predictors)
• Even very short traces seem to be good enough
– Is it really good enough?
• Hand-crafted predictors might be very different from the ones coming out of this algorithm!
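The "stable ranking" idea can be checked mechanically, as in the sketch below: rank the predictors on a trace prefix and compare that with the ranking on the full trace. The per-branch misprediction flags are made-up data and the function names are hypothetical. The check reports the first prefix whose ranking agrees with the full trace, which is weaker than showing that the ranking stays the same from then on.

```python
def ranking(mispredicts, length):
    """Predictor names ordered by misprediction rate over the first `length` branches."""
    return sorted(mispredicts,
                  key=lambda name: sum(mispredicts[name][:length]) / length)

def first_agreeing_prefix(mispredicts, step):
    """Shortest prefix (a multiple of `step`) ranked the same as the full trace."""
    total = len(next(iter(mispredicts.values())))
    full = ranking(mispredicts, total)
    for length in range(step, total + 1, step):
        if ranking(mispredicts, length) == full:
            return length
    return total

# 1 = mispredicted, 0 = correct, one flag per branch (synthetic data).
mispredicts = {
    "A": [1, 1, 1, 0, 0, 0, 0, 0],
    "B": [0, 0, 1, 1, 1, 1, 0, 0],
    "C": [1, 0, 1, 1, 1, 1, 1, 1],
}
prefix = first_agreeing_prefix(mispredicts, step=2)
```

In this synthetic data the ranking over the first two branches puts B ahead of A, the opposite of the full-trace order, which is the kind of bias a too-short prefix could introduce.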
9
Discussion points
• How expressive is the language?
– Still need insight to come up with useful functions
– We need insight to come up with new inputs
• e.g., register values!
• How do you build good templates?
– They seem to have needed them
– This VERY strongly affects the generations
• How do you build practical predictors?
– This is hard to add to the GP framework
10