PREDICTING UNROLL FACTORS USING SUPERVISED LEARNING Mark Stephenson & Saman Amarasinghe
PREDICTING UNROLL FACTORS USING
SUPERVISED LEARNING
Mark Stephenson & Saman Amarasinghe
Massachusetts Institute of Technology
Computer Science and Artificial Intelligence Lab
INTRODUCTION & MOTIVATION
Compiler heuristics rely on detailed
knowledge of the system
• Compiler interactions not understood
• Architectures are complex
[Figure: feature comparison of the Pentium® (3M transistors) and the Pentium 4 (55M transistors): superscalar execution, hyperthreading, speculative execution, improved FPU]
HEURISTIC DESIGN
Current approach to heuristic
development is somewhat ad hoc
Can compiler writers learn anything from
baseball?
• Is it feasible to deal with empirical data?
• Can we use statistics and machine learning
to build heuristics?
CASE STUDY
Loop unrolling
• Code expansion can degrade performance
• Increased live ranges, register pressure
• A myriad of interactions with other passes
Requires categorization into multiple
classes
• i.e., what’s the unroll factor?
ORC’S HEURISTIC (UNKNOWN TRIPCOUNT)
if (trip_count_tn == NULL) {
  UINT32 ntimes = MAX(1, OPT_unroll_times - 1);
  INT32 body_len = BB_length(head);
  while (ntimes > 1 && ntimes * body_len > CG_LOOP_unrolled_size_max)
    ntimes--;
  Set_unroll_factor(ntimes);
} else {
  …
}
ORC’S HEURISTIC (KNOWN TRIPCOUNT)
} else {
  BOOL const_trip = TN_is_constant(trip_count_tn);
  INT32 const_trip_count = const_trip ? TN_value(trip_count_tn) : 0;
  INT32 body_len = BB_length(head);
  CG_LOOP_unroll_min_trip = MAX(CG_LOOP_unroll_min_trip, 1);
  if (const_trip && CG_LOOP_unroll_fully &&
      (body_len * const_trip_count <= CG_LOOP_unrolled_size_max ||
       (CG_LOOP_unrolled_size_max == 0 &&
        CG_LOOP_unroll_times_max >= const_trip_count))) {
    Set_unroll_fully();
    Set_unroll_factor(const_trip_count);
  } else {
    UINT32 ntimes = OPT_unroll_times;
    ntimes = MIN(ntimes, CG_LOOP_unroll_times_max);
    if (!is_power_of_two(ntimes)) {
      ntimes = 1 << log2(ntimes);
    }
    while (ntimes > 1 && ntimes * body_len > CG_LOOP_unrolled_size_max)
      ntimes /= 2;
    if (const_trip) {
      while (ntimes > 1 && const_trip_count < 2 * ntimes)
        ntimes /= 2;
    }
    Set_unroll_factor(ntimes);
  }
}
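The two branches above are easier to follow transliterated into Python. This is a sketch, not ORC's code: the parameter names mirror the ORC variables, but the default limits are illustrative, and a bit-length trick stands in for `is_power_of_two`/`log2`.

```python
def unroll_factor(body_len, trip_count=None,
                  opt_unroll_times=8, times_max=8,
                  size_max=128, unroll_fully=True):
    """Return the unroll factor the ORC-style logic would pick.

    body_len: instructions in the loop body; trip_count: None if unknown.
    Defaults are illustrative stand-ins for ORC's configuration knobs.
    """
    if trip_count is None:
        # Unknown trip count: shrink linearly until the unrolled body fits.
        ntimes = max(1, opt_unroll_times - 1)
        while ntimes > 1 and ntimes * body_len > size_max:
            ntimes -= 1
        return ntimes
    # Known trip count: fully unroll if the expanded body is small enough.
    if unroll_fully and (body_len * trip_count <= size_max or
                         (size_max == 0 and times_max >= trip_count)):
        return trip_count
    # Otherwise pick the largest power of two that fits the size budget
    # and leaves at least two full iterations of the unrolled loop.
    ntimes = min(opt_unroll_times, times_max)
    ntimes = 1 << (ntimes.bit_length() - 1)   # round down to a power of two
    while ntimes > 1 and ntimes * body_len > size_max:
        ntimes //= 2
    while ntimes > 1 and trip_count < 2 * ntimes:
        ntimes //= 2
    return ntimes
```

For example, a 20-instruction body with unknown trip count gets factor 6 under the default 128-instruction budget, while a 4-instruction body with a known trip count of 10 is fully unrolled.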
SUPERVISED LEARNING
Supervised learning algorithms try to find
a function F(X) → Y
• X : vector of characteristics that define a loop
• Y : empirically found best unroll factor
[Diagram: F(X) maps each loop's feature vector to an unroll factor in {1, …, 8}]
EXTRACTING THE DATA
Extract features
• Most features readily available in ORC
• Kitchen sink approach
Finding the labels (best unroll factors)
• Added instrumentation pass
Assembly instructions inserted to time loops
Calls to a library at all exit points
• Compile and run at all unroll factors (1..8)
For each loop, choose the best one as the label
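The labeling step reduces to an argmin over the measured times. A minimal sketch, where the `timings` dict stands in for the output of the instrumentation library (the numbers are made up):

```python
def best_unroll_factor(timings):
    """timings: {unroll_factor: measured_cycles} for factors 1..8.
    The label is the factor with the lowest measured time."""
    return min(timings, key=timings.get)

# Hypothetical measurements for one loop: factor 4 is fastest.
timings = {1: 980, 2: 610, 3: 590, 4: 520, 5: 560, 6: 585, 7: 600, 8: 640}
```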
LEARNING ALGORITHMS
Prototyped in Matlab
Two learning algorithms classified our
data set well
• Near neighbors
• Support Vector Machine (SVM)
Both algorithms classify quickly
• Train at the factory
• No increase in compilation time
NEAR NEIGHBORS
[Scatter plot: loops plotted by # branches vs. # FP operations, with regions labeled "unroll" and "don't unroll"]
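The near-neighbors idea fits in a few lines: classify a new loop with the label of its closest training example in feature space. A toy sketch in Python (the talk's prototype was in Matlab), using two of the plotted features, (# branches, # FP operations), and made-up training data:

```python
import math

def nearest_neighbor(train, x):
    """train: list of (feature_vector, unroll_factor); x: feature_vector.
    Returns the label of the closest training example (Euclidean distance)."""
    _, label = min(train, key=lambda ex: math.dist(ex[0], x))
    return label

# Toy training set: FP-heavy loops favored a high unroll factor,
# branch-heavy loops were left alone.
train = [((0, 12), 8), ((1, 9), 4), ((5, 1), 1), ((7, 0), 1)]
```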
SUPPORT VECTOR MACHINES
Map the original feature space into a
higher-dimensional space (using a kernel)
Find a hyperplane that maximally
separates the data
SUPPORT VECTOR MACHINES
[Scatter plot: the same loops in the mapped space (# branches² vs. # FP operations), where "unroll" and "don't unroll" are now linearly separable]
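The kernel idea can be shown with an explicit feature map instead of a kernel. In this contrived example (the data and weights are illustrative, not from the talk), no single threshold on a 1-D feature separates the classes, but a *linear* decision in the mapped space (x, x²), like the slide's # branches² axis, carves out exactly the middle range:

```python
def phi(x):
    """Quadratic feature map: 1-D feature -> 2-D space (x, x^2)."""
    return (x, x * x)

def classify(x):
    """Linear decision w . phi(x) + c <= 0 in the mapped space.
    With w = (-6, 1), c = 8 this selects exactly 2 <= x <= 4,
    a region no single threshold on x alone can capture."""
    fx, fx2 = phi(x)
    return "unroll" if (-6 * fx + 1 * fx2 + 8) <= 0 else "don't unroll"
```

An SVM does the same thing implicitly, via a kernel, while also choosing the separating hyperplane with maximum margin.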
PREDICTION ACCURACY
Leave-one-out cross validation
Filter out ambiguous training examples
• Only keep examples whose best factor is clearly better (≥ 1.05x speedup)
• Throw away obviously noisy examples
Accuracy:
NN: 62%
SVM: 65%
ORC: 16%
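Leave-one-out cross validation holds out each labeled loop in turn, trains on the rest, and counts how often the held-out loop is classified correctly. A self-contained sketch with a toy nearest-neighbor classifier and made-up data:

```python
import math

def nearest_label(train, x):
    # 1-nearest-neighbor: label of the closest training example.
    return min(train, key=lambda ex: math.dist(ex[0], x))[1]

def loocv_accuracy(data, classify):
    """data: list of (feature_vector, label). For each example, train on
    the other n-1 examples and test on the held-out one."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += (classify(rest, x) == y)
    return hits / len(data)

data = [((0, 10), 8), ((1, 9), 8), ((5, 1), 1), ((6, 0), 1)]
```

With n labeled loops this costs n training runs, which is acceptable here because both classifiers train quickly.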
REALIZING SPEEDUPS (SWP DISABLED)
[Bar chart: speedup over ORC, −10% to 40%, for each SPEC CPU2000 benchmark (164.gzip through 301.apsi); series NN v. ORC, SVM v. ORC, and Oracle v. ORC]
FEATURE SELECTION
Feature selection is a way to identify the
best features
Start with loads of features
Small feature sets are better
• Learning algorithms run faster
• Are less prone to overfitting the training data
• Useless features can confuse learning
algorithms
FEATURE SELECTION CONT.
MUTUAL INFORMATION SCORE
Measures the reduction of uncertainty in
one variable given knowledge of another
variable
Does not tell us how features interact with
each other
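Concretely, the score is I(F; Y) = Σ p(f, y) · log₂(p(f, y) / (p(f) p(y))) over a discretized feature F and the label Y, estimated from co-occurrence counts. A small sketch (the data is illustrative):

```python
import math
from collections import Counter

def mutual_information(pairs):
    """pairs: list of (feature_value, label) observations.
    Returns the estimated mutual information in bits."""
    n = len(pairs)
    pf = Counter(f for f, _ in pairs)      # marginal counts of F
    py = Counter(y for _, y in pairs)      # marginal counts of Y
    pfy = Counter(pairs)                   # joint counts of (F, Y)
    return sum(c / n * math.log2((c / n) / ((pf[f] / n) * (py[y] / n)))
               for (f, y), c in pfy.items())

# A feature that determines the label perfectly scores H(Y) = 1 bit here;
# a feature independent of the label scores 0.
dependent   = [(0, 1), (0, 1), (1, 8), (1, 8)]
independent = [(0, 1), (0, 8), (1, 1), (1, 8)]
```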
FEATURE SELECTION CONT.
GREEDY FEATURE SELECTION
Choose single best feature
Choose another feature that, together
with the best feature, most improves
classification accuracy …
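The greedy procedure above, sketched in Python. Here `score(subset)` stands in for classification accuracy under cross validation; the toy scorer is invented to show why greedy selection can beat ranking features individually: `nest_level` is worthless alone but valuable alongside `fp_ops`.

```python
def greedy_select(features, score, k):
    """Forward selection: repeatedly add the feature that most
    improves score(chosen + [feature]), until k features are chosen."""
    chosen = []
    while len(chosen) < k:
        best = max((f for f in features if f not in chosen),
                   key=lambda f: score(chosen + [f]))
        chosen.append(best)
    return chosen

def toy_score(subset):
    # Illustrative stand-in for cross-validated accuracy.
    s = 0.0
    if "fp_ops" in subset:
        s += 0.5
    if "branches" in subset:
        s += 0.2
    if "fp_ops" in subset and "nest_level" in subset:
        s += 0.3   # interaction: only helps in combination
    return s
```

With k = 2 this picks `fp_ops` first, then `nest_level`, even though `branches` would rank second in isolation.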
FEATURE SELECTION
THE BEST FEATURES
Rank | Mutual Information Score       | Greedy Feature Selection with SVM
1.   | # FP operations (0.59)         | # FP operations
2.   | # Operands (0.49)              | Loop nest level
3.   | Instruction fan-in in DAG (0.34) | # Operands
4.   | # Live ranges (0.20)           | # Branches
5.   | # Memory operations (0.13)     | # Memory operations
RELATED WORK
Monsifrot et al., “A Machine Learning Approach to
Automatic Production of Compiler Heuristics.” 2002
Calder et al., “Evidence-Based Static Branch Prediction
Using Machine Learning.” 1997
Cavazos et al., “Inducing Heuristics to Decide Whether to
Schedule.” 2004
Moss et al., “Learning to Schedule Straight-Line Code.”
1997
Cooper et al., “Optimizing for Reduced Code Space using
Genetic Algorithms.” 1999
Puppin et al., “Adapting Convergent Scheduling using
Machine Learning.” 2003
Stephenson et al., “Meta Optimization: Improving Compiler
Heuristics with Machine Learning.” 2003
CONCLUSION
Supervised classification can effectively
find good heuristics
• Even for multi-class problems
• SVM and near neighbors perform well
• Potentially has a big impact
Spent very little time tuning the learning
parameters
Let a machine learning algorithm tell us
which features are best
THE END
SOFTWARE PIPELINING
ORC has been tuned with SWP in mind
• Every major release of ORC has had a
different unrolling heuristic for SWP
• Currently 205 lines long
Can we learn a heuristic that outperforms
ORC’s SWP unrolling heuristic?
REALIZING SPEEDUPS (SWP ENABLED)
[Bar chart: improvement over ORC, −10% to 25%, for each SPEC CPU2000 benchmark; series NN v. ORC, SVM v. ORC, and Oracle v. ORC]
HURDLES
Compiler writer must extract features
Acquiring labels takes time
• Instrumentation library
• ~2 weeks to collect data
Predictions confined to training labels
Have to tweak learning algorithms
Noise