PREDICTING UNROLL FACTORS USING SUPERVISED LEARNING Mark Stephenson & Saman Amarasinghe

PREDICTING UNROLL FACTORS USING
SUPERVISED LEARNING
Mark Stephenson & Saman Amarasinghe
Massachusetts Institute of Technology
Computer Science and Artificial Intelligence Lab
INTRODUCTION & MOTIVATION
 Compiler heuristics rely on detailed
knowledge of the system
 Compiler interactions not understood
 Architectures are complex
[Table: feature growth across Intel processors — Pentium (3M transistors) vs. Pentium 4 (55M transistors); features include superscalar execution, hyperthreading, speculative execution, and an improved FPU.]
HEURISTIC DESIGN
 Current approach to heuristic
development is somewhat ad hoc
 Can compiler writers learn anything from
baseball?
• Is it feasible to deal with empirical data?
• Can we use statistics and machine learning
to build heuristics?
CASE STUDY
 Loop unrolling
• Code expansion can degrade performance
• Increased live ranges, register pressure
• A myriad of interactions with other passes
 Requires categorization into multiple
classes
• i.e., what’s the unroll factor?
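To make the transformation concrete, here is a minimal Python sketch of unrolling by a factor of 4 (illustrative only; a compiler unrolls intermediate or machine code, not source):

```python
def sum_rolled(xs):
    # Baseline loop: one loop-condition check (branch) per element.
    total = 0
    for x in xs:
        total += x
    return total

def sum_unrolled_by_4(xs):
    # Unrolled by 4: four adds per iteration, so one branch per
    # four elements -- at the cost of a larger loop body.
    total = 0
    i, n = 0, len(xs)
    while i + 4 <= n:
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
        i += 4
    while i < n:          # remainder loop for the last n % 4 elements
        total += xs[i]
        i += 1
    return total
```

The larger body is exactly what the slide warns about: more code, longer live ranges, more register pressure.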
ORC’S HEURISTIC (UNKNOWN TRIPCOUNT)
if (trip_count_tn == NULL) {
  UINT32 ntimes = MAX(1, OPT_unroll_times-1);
  INT32 body_len = BB_length(head);
  while (ntimes > 1 &&
         ntimes * body_len > CG_LOOP_unrolled_size_max)
    ntimes--;
  Set_unroll_factor(ntimes);
} else {
  …
}
ORC’S HEURISTIC (KNOWN TRIPCOUNT)
} else {
  BOOL const_trip = TN_is_constant(trip_count_tn);
  INT32 const_trip_count = const_trip ? TN_value(trip_count_tn) : 0;
  INT32 body_len = BB_length(head);

  CG_LOOP_unroll_min_trip = MAX(CG_LOOP_unroll_min_trip, 1);
  if (const_trip && CG_LOOP_unroll_fully &&
      (body_len * const_trip_count <= CG_LOOP_unrolled_size_max ||
       (CG_LOOP_unrolled_size_max == 0 &&
        CG_LOOP_unroll_times_max >= const_trip_count))) {
    Set_unroll_fully();
    Set_unroll_factor(const_trip_count);
  } else {
    UINT32 ntimes = OPT_unroll_times;
    ntimes = MIN(ntimes, CG_LOOP_unroll_times_max);
    if (!is_power_of_two(ntimes)) {
      ntimes = 1 << log2(ntimes);
    }
    while (ntimes > 1 &&
           ntimes * body_len > CG_LOOP_unrolled_size_max)
      ntimes /= 2;
    if (const_trip) {
      while (ntimes > 1 && const_trip_count < 2 * ntimes)
        ntimes /= 2;
    }
    Set_unroll_factor(ntimes);
  }
}
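The known-tripcount branch can be paraphrased compactly in Python. This is a sketch that mirrors the control flow above, with ORC's globals turned into parameters (the parameter names are simplifications, not ORC identifiers):

```python
def unroll_factor(body_len, const_trip_count, opt_unroll_times,
                  unroll_times_max, unrolled_size_max,
                  unroll_fully=True):
    # const_trip_count is None when the trip count is not a constant.
    const_trip = const_trip_count is not None

    # Fully unroll small constant-trip loops.
    if (const_trip and unroll_fully and
            (body_len * const_trip_count <= unrolled_size_max or
             (unrolled_size_max == 0 and
              unroll_times_max >= const_trip_count))):
        return const_trip_count

    # Otherwise start from the requested factor, clipped and
    # rounded down to a power of two.
    ntimes = min(opt_unroll_times, unroll_times_max)
    while ntimes & (ntimes - 1):     # not a power of two yet
        ntimes &= ntimes - 1         # drop the lowest set bit

    # Halve until the unrolled body fits the size budget.
    while ntimes > 1 and ntimes * body_len > unrolled_size_max:
        ntimes //= 2

    # Halve until at least two full unrolled iterations execute.
    if const_trip:
        while ntimes > 1 and const_trip_count < 2 * ntimes:
            ntimes //= 2
    return ntimes
```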
SUPERVISED LEARNING
 Supervised learning algorithms try to find
a function F(X) → Y
• X : vector of characteristics that define a loop
• Y : empirically found best unroll factor
[Figure: F(X) maps each loop to an unroll factor in {1, 2, …, 8}.]
EXTRACTING THE DATA
 Extract features
• Most features readily available in ORC
• Kitchen sink approach
 Finding the labels (best unroll factors)
• Added instrumentation pass
 Assembly instructions inserted to time loops
 Calls to a library at all exit points
• Compile and run at all unroll factors (1..8)
 For each loop, choose the best one as the label
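The labeling step can be sketched as a small search driver; `time_loop` below is a hypothetical hook standing in for "compile the benchmark with this unroll factor, run it, and time the loop":

```python
def best_unroll_factor(time_loop, factors=range(1, 9)):
    # Try every candidate unroll factor and keep the fastest one
    # as the training label for this loop.
    timings = {f: time_loop(f) for f in factors}
    return min(timings, key=timings.get)

# Example with canned timings (seconds) instead of real runs:
measured = {1: 10.0, 2: 7.0, 3: 8.0, 4: 6.5,
            5: 7.1, 6: 9.0, 7: 9.5, 8: 9.9}
label = best_unroll_factor(measured.get)   # -> 4
```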
LEARNING ALGORITHMS
 Prototyped in Matlab
 Two learning algorithms classified our
data set well
• Near neighbors
• Support Vector Machine (SVM)
 Both algorithms classify quickly
• Train at the factory
• No increase in compilation time
NEAR NEIGHBORS
[Figure: loops plotted by # branches (x-axis) vs. # FP operations (y-axis), with clusters of points labeled "unroll" and "don't unroll".]
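A minimal version of the nearest-neighbor idea, in Python (a sketch, not ORC's implementation): each loop is a point in feature space, and a new loop takes the majority label of its k closest training points.

```python
import math

def knn_predict(train, query, k=3):
    # train: list of (feature_vector, unroll_factor) pairs,
    # e.g. features = (# FP operations, # branches).
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    labels = [label for _, label in neighbors]
    return max(set(labels), key=labels.count)   # majority vote

# Toy training set: FP-heavy loops were best unrolled 4x,
# branchy loops were best left alone (factor 1).
train = [((10, 1), 4), ((9, 2), 4), ((11, 1), 4),
         ((1, 8), 1), ((2, 9), 1), ((0, 7), 1)]
```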
SUPPORT VECTOR MACHINES
 Map the original feature space into a
higher-dimensional space (using a kernel)
 Find a hyperplane that maximally
separates the data
SUPPORT VECTOR MACHINES
[Figure: the same loops replotted in the lifted feature space with axes # branches and # branches²; a hyperplane now separates "unroll" from "don't unroll".]
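The kernel idea can be shown with a one-dimensional toy example (an illustration of the technique, not the paper's actual SVM): points that no single threshold can separate become linearly separable after a quadratic feature map.

```python
def feature_map(x):
    # Quadratic lift: 1-D input -> 2-D feature space (x, x^2).
    return (x, x * x)

# "unroll" loops sit on both sides of the "don't unroll" ones,
# so no threshold on x alone separates the classes...
points = [(-3, "unroll"), (-2, "unroll"), (0, "don't unroll"),
          (1, "don't unroll"), (3, "unroll")]

# ...but in the lifted space, the hyperplane x^2 = 2 does.
def classify(x, threshold=2.0):
    _, x_squared = feature_map(x)
    return "unroll" if x_squared > threshold else "don't unroll"
```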
PREDICTION ACCURACY
 Leave-one-out cross validation
 Filter out ambiguous training examples
• Only keep obviously better examples (1.05x)
• Throw away obviously noisy examples
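One plausible reading of the 1.05x filter, as a Python sketch (the exact criterion here is an assumption): keep a loop only when its best factor beats the runner-up by a clear margin.

```python
def label_if_unambiguous(timings, margin=1.05):
    # timings: dict mapping unroll factor -> measured runtime.
    # Returns the winning factor, or None when the win is within
    # the noise margin and the example should be discarded.
    ranked = sorted(timings, key=timings.get)
    best, runner_up = ranked[0], ranked[1]
    if timings[runner_up] / timings[best] >= margin:
        return best
    return None
```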
      Accuracy
NN    62%
SVM   65%
ORC   16%
REALIZING SPEEDUPS (SWP DISABLED)
[Bar chart: speedup over ORC, roughly -10% to 40%, for each SPEC CPU2000 benchmark (164.gzip through 301.apsi); three series: NN v. ORC, SVM v. ORC, and Oracle v. ORC.]
FEATURE SELECTION
 Feature selection is a way to identify the
best features
 Start with loads of features
 Small feature sets are better
• Learning algorithms run faster
• Are less prone to overfitting the training data
• Useless features can confuse learning
algorithms
FEATURE SELECTION CONT.
MUTUAL INFORMATION SCORE
 Measures the reduction of uncertainty in
one variable given knowledge of another
variable
 Does not tell us how features interact with
each other
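For discrete features and labels, the score can be computed directly from counts; a self-contained sketch:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    # I(X;Y) = H(X) + H(Y) - H(X,Y), in bits: how much knowing a
    # feature value (x) reduces uncertainty about the label (y).
    def entropy(samples):
        counts = Counter(samples)
        n = len(samples)
        return -sum((c / n) * math.log2(c / n)
                    for c in counts.values())
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))
```

A feature that fully determines the label contributes its entire entropy; a feature independent of the label scores 0.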
FEATURE SELECTION CONT.
GREEDY FEATURE SELECTION
 Choose single best feature
 Choose another feature that, together
with the best feature, most improves
classification accuracy …
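The greedy procedure is plain forward selection; `accuracy` below is a hypothetical hook standing in for cross-validated classifier accuracy on a candidate feature set:

```python
def greedy_select(features, accuracy, k):
    # Repeatedly add the feature that most improves the accuracy
    # of the currently chosen set.
    chosen, remaining = [], list(features)
    for _ in range(min(k, len(remaining))):
        best = max(remaining, key=lambda f: accuracy(chosen + [f]))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```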
FEATURE SELECTION
THE BEST FEATURES
Rank  Mutual Information Score           Greedy Feature Selection with SVM
1.    # FP operations (0.59)             # FP operations
2.    # Operands (0.49)                  Loop nest level
3.    Instruction fan-in in DAG (0.34)   # Operands
4.    # Live ranges (0.20)               # Branches
5.    # Memory operations (0.13)         # Memory operations
RELATED WORK
 Monsifrot et al., “A Machine Learning Approach to
Automatic Production of Compiler Heuristics.” 2002
 Calder et al., “Evidence-Based Static Branch Prediction
Using Machine Learning.” 1997
 Cavazos et al., “Inducing Heuristics to Decide Whether to
Schedule.” 2004
 Moss et al., “Learning to Schedule Straight-Line Code.”
1997
 Cooper et al., “Optimizing for Reduced Code Space using
Genetic Algorithms.” 1999
 Puppin et al., “Adapting Convergent Scheduling using
Machine Learning.” 2003
 Stephenson et al., “Meta Optimization: Improving Compiler
Heuristics with Machine Learning.” 2003
CONCLUSION
 Supervised classification can effectively
find good heuristics
• Even for multi-class problems
• SVM and near neighbors perform well
• Can potentially have a big impact
 Spent very little time tuning the learning
parameters
 Let a machine learning algorithm tell us
which features are best
THE END
SOFTWARE PIPELINING
 ORC has been tuned with SWP in mind
• Every major release of ORC has had a
different unrolling heuristic for SWP
• Currently 205 lines long
 Can we learn a heuristic that outperforms
ORC’s SWP unrolling heuristic?
REALIZING SPEEDUPS (SWP ENABLED)
[Bar chart: improvement over ORC, roughly -10% to 25%, for each SPEC CPU2000 benchmark; three series: NN v. ORC, SVM v. ORC, and Oracle v. ORC.]
HURDLES
 Compiler writer must extract features
 Acquiring labels takes time
• Instrumentation library
• ~2 weeks to collect data
 Predictions confined to training labels
 Have to tweak learning algorithms
 Noise