Progressive Register Allocation for Irregular Architectures
David Koes
Seth Copen Goldstein
March 23, 2005
2005 International Symposium on Code Generation and Optimization

Irregular Architectures
• Few registers
• Register usage restrictions
– address registers, hardwired registers...
• Memory operands
• Examples:
– x86, 68k, ColdFire, ARM Thumb, MIPS16, V800, various DSPs...
[Figure: the eight x86 registers: eax, ebx, ecx, edx, esi, edi, esp, ebp]
Fewer Registers → More Spills
[Bar chart: percent of functions that spill (y-axis, 0 to 50 percent) for PPC (32), 68k (16), and x86 (8)]
• Used gcc to compile >10,000 functions from Mediabench, Spec95, Spec2000, and microbenchmarks
• Recorded which functions spilled
Register Usage Restrictions
• Instructions may prefer or require a specific
subset of registers
– x86 multiply instruction
imul %edx,%eax // 2 byte instruction
imul %edx,%ecx // 3 byte instruction
– x86 divide instruction
idivl %ecx // eax = edx:eax/ecx
Memory Operands
• Load/store not always needed to access
variables allocated to memory
– depends upon instruction
– still less efficient than register access
addl 8(%ebp), %eax
vs
movl 8(%ebp), %edx
addl %edx, %eax
Register Allocation Challenges
• Optimize spill code
– with few registers, spilling unavoidable
• Model register usage restrictions
• Exploit memory operands
– affects spilling decisions
Previous Work
Methods (compared on: models irregular features, fast, optimal)
• Graph Coloring
• Integer Programming [Goodwin and Wilken 96] [Kong and Wilken 98] [Fu and Wilken 2002]
• Separated IP [Appel and George 01]
• PBQP [Scholz and Eckstein 02]
[Table: the per-method marks in each comparison column are not preserved in this transcript]
Our Goals
• Expressive
– Explicitly represent architectural irregularities
and costs
• Proper model
– An optimum solution results in optimal
register allocation
• Progressive solution algorithm
– more computation → better solution
– decent feasible solution obtained quickly
– competitive with current allocators
Multicommodity Network Flow (MCNF)
[Figure: example network in which commodities a and b flow from source nodes through a crossbar and an instruction node to sink nodes; edges are labeled 2 and 4]
Modeling Usage Constraints
int foo(int a, int b, int c)
{
    a = a*b;
    return a/c;
}
[Figure: flow network for foo. Variables a, b, and c flow through allocation-class nodes eax, edx, ecx, and mem into the imul and idiv instruction nodes; edge labels include -1 and 1]
not quite right…
Modeling Spills and Moves
int foo(int a, int b, int c)
{
    a = a*b;
    return a/c;
}
[Figure: flow network for foo with allocation-class nodes eax, edx, ecx, and mem before and after the imul and idiv instruction nodes; variables a, b, and c flow through them, and edge labels include 3, 1, and -1]
Modeling Stores
• Simple approach flawed
– doesn’t model memory
persistency
• Solution: antivariables
– flow only through memory
– eviction cost = store cost
– evict only once
Register Allocation as MCNF
• Variables → Commodities
• Variable Usage → Network Design
• Nodes → Allocation Classes (Reg/Mem)
• Register Limits → Node Capacities
• Spill Costs → Edge Costs
• Variable Definition → Source
• Variable Last Use → Sink
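The mapping above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `Node`, `respects_capacities`, and `total_cost` are hypothetical, the two-point network is made up, and the edge cost of 3 is a placeholder for a load or store cost.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    point: int     # program point (one layer of the network)
    klass: str     # allocation class: a register name or "mem"
    capacity: int  # how many variables may occupy this node at once


def respects_capacities(paths):
    """paths maps each variable (commodity) to the list of Nodes it flows through."""
    load = Counter(node for path in paths.values() for node in path)
    return all(count <= node.capacity for node, count in load.items())


def total_cost(paths, edge_cost):
    """Sum the cost of every edge used; edges missing from edge_cost are free."""
    return sum(edge_cost.get((u, v), 0)
               for path in paths.values()
               for u, v in zip(path, path[1:]))


UNBOUNDED = 10**9  # memory is treated as having no practical limit (assumption)
eax0, mem0 = Node(0, "eax", 1), Node(0, "mem", UNBOUNDED)
eax1, mem1 = Node(1, "eax", 1), Node(1, "mem", UNBOUNDED)

# Hypothetical costs: moving between memory and a register costs 3.
edge_cost = {(eax0, mem1): 3, (mem0, eax1): 3}

good = {"a": [eax0, eax1], "b": [mem0, mem1]}  # a stays in eax, b stays in memory
bad = {"a": [eax0, eax1], "b": [mem0, eax1]}   # both variables end up in eax
```

Here `good` respects every node capacity at zero cost, while `bad` puts two variables into the single-capacity `eax` node at point 1, so it is not a valid allocation even though its cost is finite.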
Solving an MCNF
• Integer solution NP-complete
• Use standard IP solvers
– commercial solvers (CPLEX) are impressive
• Exploit structure of problem
– variety of MCNF specific solvers
• empirically faster than IP solvers
• Lagrangian Relaxation technique
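For reference, the integer program behind an MCNF is usually written as below. This is the textbook form and not necessarily the exact formulation used in this work. The notation is an assumption: $x_{ij}^{k}$ is the flow of commodity $k$ on edge $(i,j)$, $c_{ij}^{k}$ is its cost, $b_{i}^{k}$ is the supply of commodity $k$ at node $i$ (+1 at its source, -1 at its sink, 0 elsewhere), and $u_{ij}$ is the capacity shared by all commodities. Node capacities such as register limits fit this form if a node is split into an in-node and an out-node joined by a capacitated edge.

```latex
\begin{align*}
\min_{x}\quad & \sum_{k}\sum_{(i,j)\in E} c_{ij}^{k}\, x_{ij}^{k} \\
\text{s.t.}\quad
& \sum_{j:(i,j)\in E} x_{ij}^{k} - \sum_{j:(j,i)\in E} x_{ji}^{k} = b_{i}^{k}
  && \forall i \in N,\ \forall k \\
& \sum_{k} x_{ij}^{k} \le u_{ij} && \forall (i,j)\in E \\
& x_{ij}^{k} \in \{0,1\} && \forall (i,j)\in E,\ \forall k
\end{align*}
```

The second constraint family (the bundle constraints) is the only thing coupling the commodities. Without it the problem separates into one single-commodity flow problem per variable.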
Lagrangian Relaxation: Intuition
• Relaxes the hard constraints
– only have to solve single commodity flow
• Combines easy subproblems using a
Lagrangian multiplier
– an additional price on each edge
Example: edges have unit capacity
[Figure: two copies of a small network carrying commodities a and b; the edges are labeled 1 and 0 in the first copy and 1 and 0+1 in the second, where a price has been added]
– with price, solution to single commodity flow can be solution to multicommodity flow
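The effect of the price can be reproduced with a small sketch. The layout is an assumption read off the figure labels: two parallel unit-capacity edges, here called `upper` (cost 1) and `lower` (cost 0), and two commodities that each need one of them. The function names are hypothetical.

```python
from itertools import product

COST = {"upper": 1, "lower": 0}      # edge costs (assumed from the figure labels)
CAPACITY = {"upper": 1, "lower": 1}  # "edges have unit capacity"
COMMODITIES = ("a", "b")


def cheapest_edges(prices):
    """Edges that a single commodity, solved on its own, is willing to use."""
    priced = {e: COST[e] + prices.get(e, 0) for e in COST}
    best = min(priced.values())
    return {e for e, c in priced.items() if c == best}


def feasible_choices(prices):
    """Ways for every commodity to take a cheapest edge without exceeding capacity."""
    options = cheapest_edges(prices)
    result = []
    for pick in product(sorted(options), repeat=len(COMMODITIES)):
        if all(pick.count(e) <= CAPACITY[e] for e in options):
            result.append(dict(zip(COMMODITIES, pick)))
    return result


without_price = feasible_choices({})
with_price = feasible_choices({"lower": 1})
```

With no price, both commodities want the `lower` edge and `without_price` is empty: no combination of single-commodity optima fits the capacities. With a price of 1 on `lower` (the 0+1 label), both edges cost the same, and `with_price` contains the two assignments that send a and b over different edges.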
Solution Procedure
• Compute prices using iterative
subgradient optimization
– converge to optimal prices
• At each iteration, greedily construct a
feasible solution using current prices
– allocate most expensive vars first
– can always find an allocation
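The loop described above can be sketched on a deliberately tiny instance. This is a minimal illustration, not the authors' solver: the network has a single layer (each variable picks one allocation class), the costs are invented, the step size 1/iteration is one common choice, and "most expensive" is assumed to mean the variable with the highest worst-case cost.

```python
from collections import Counter


def solve(costs, capacity, iterations=50):
    """costs[v][e]: cost of placing variable v in class e; capacity[e]: limit of e."""
    prices = {e: 0.0 for e in capacity}
    best_cost, best_alloc, best_bound = float("inf"), None, float("-inf")
    for it in range(1, iterations + 1):
        # Relaxed subproblem: every variable independently takes its cheapest priced class.
        choice = {v: min(c, key=lambda e: c[e] + prices[e]) for v, c in costs.items()}
        bound = (sum(costs[v][e] + prices[e] for v, e in choice.items())
                 - sum(prices[e] * capacity[e] for e in capacity))
        best_bound = max(best_bound, bound)  # every bound is a valid lower bound

        # Greedy feasible allocation using the current prices, most expensive first.
        free = dict(capacity)
        alloc = {}
        for v in sorted(costs, key=lambda v: max(costs[v].values()), reverse=True):
            e = min((e for e in costs[v] if free[e] > 0),
                    key=lambda e: costs[v][e] + prices[e])
            alloc[v] = e
            free[e] -= 1
        cost = sum(costs[v][e] for v, e in alloc.items())
        if cost < best_cost:
            best_cost, best_alloc = cost, alloc

        # Subgradient step: raise prices on overused classes, lower them on underused ones.
        usage = Counter(choice.values())
        for e in prices:
            prices[e] = max(0.0, prices[e] + (usage[e] - capacity[e]) / it)
    return best_alloc, best_cost, best_bound


# Hypothetical costs; "mem" has room for every variable, so greedy always succeeds.
costs = {
    "a": {"eax": 0, "edx": 2, "mem": 4},
    "b": {"eax": 0, "edx": 0, "mem": 3},
    "c": {"eax": 1, "edx": 1, "mem": 2},
}
capacity = {"eax": 1, "edx": 1, "mem": 3}
alloc, cost, bound = solve(costs, capacity)
```

Because memory can hold every variable, the greedy pass never gets stuck, which mirrors the "can always find an allocation" point. The returned `bound` never exceeds the optimal cost, so the difference between `cost` and `bound` limits how far the allocation can be from optimal.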
Solution Procedure
• Advantages
+ have feasible solution at each step
+ iterative nature → progressive
+ Lagrangian relaxation theory provides
means for computing a lower bound
+ Can compute optimality bound
• Disadvantages
– No guarantee of optimality of solution
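The lower bound and the optimality bound listed above can be written out with the standard Lagrangian dual. The notation is generic and is an assumption, not taken from the talk: $x_{ij}^{k}$ is the flow of commodity $k$ on edge $(i,j)$, $c_{ij}^{k}$ its cost, $u_{ij}$ the shared capacity, $w_{ij} \ge 0$ the price on the edge, $X^{k}$ the set of feasible single-commodity flows for $k$, $z^{*}$ the optimal cost, and $\hat{x}$ the current feasible allocation with cost $z(\hat{x})$.

```latex
\begin{align*}
L(w) &= \min_{x^{k}\in X^{k}} \sum_{k}\sum_{(i,j)} \bigl(c_{ij}^{k} + w_{ij}\bigr)\, x_{ij}^{k}
        \;-\; \sum_{(i,j)} w_{ij}\, u_{ij}, \qquad w \ge 0 \\
L(w) &\le z^{*} \le z(\hat{x}) \\
\text{gap} &= \frac{z(\hat{x}) - L(w)}{L(w)} \qquad (L(w) > 0)
\end{align*}
```

A gap of zero proves the current allocation optimal. A nonzero gap only bounds how far from optimal the allocation can be, which is consistent with the disadvantage listed above.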
Evaluation
• Replace gcc’s local allocator
• Optimize for code size
– easy to statically evaluate
• Evaluate on MediaBench, MiBench,
SpecInt95, SpecInt2000
– consider only blocks where local allocation is
interesting (enough variables to spill)
Behavior of Solver
Proven Optimality
[Stacked bar chart: percentage of blocks (y-axis, 0% to 100%) whose solution is proven Optimal, Within 5%, Within 10%, Within 15%, Within 20%, or >25% of optimal after 1, 10, 100, and 1000 iterations; bars are grouped by 5-10 conflicts (355 blocks), 10-15 conflicts (23 blocks), 15-20 conflicts (7 blocks), and >= 20 conflicts (5 blocks)]
Comprehensive Results
[Bar chart: improvement over gcc (y-axis, -15.00% to 20.00%) after 1, 10, 100, and 1000 iterations; bars are grouped by 5-10 conflicts (355 blocks), 10-15 conflicts (23 blocks), 15-20 conflicts (7 blocks), and >= 20 conflicts (5 blocks); one result is annotated "artifact of interaction with gcc"]
Progressive Nature
:-(
Contributions
• New MCNF model for register allocation
+ expressive, can model irregular architectures
+ can be solved using conventional ILP solvers
• Progressive solution procedure
+ decent initial solution
+ maintains feasible solution
+ improves solution over time
– no optimality guarantees