Progressive Register Allocation for Irregular Architectures
David Koes [email protected]
Seth Copen Goldstein [email protected]
March 23, 2005
2005 International Symposium on Code Generation and Optimization

Irregular Architectures
• Few registers
• Register usage restrictions
  – address registers, hardwired registers...
• Memory operands
• Examples:
  – x86, 68k, ColdFire, ARM Thumb, MIPS16, V800, various DSPs...
[Figure: the x86 registers eax, ebx, ecx, edx, esi, edi, esp, ebp]

Fewer Registers, More Spills
[Figure: bar chart, percent of functions that spill (y-axis, 0 to 50) for PPC (32), 68k (16), and x86 (8)]
• Used gcc to compile >10,000 functions from Mediabench, Spec95, Spec2000, and microbenchmarks
• Recorded which functions spilled

Register Usage Restrictions
• Instructions may prefer or require a specific subset of registers
  – x86 multiply instruction
      imul %edx,%eax   // 2 byte instruction
      imul %edx,%ecx   // 3 byte instruction
  – x86 divide instruction
      idivl %ecx       // eax = edx:eax/ecx

Memory Operands
• Load/store not always needed to access variables allocated to memory
  – depends upon instruction
  – still less efficient than register access
      addl 8(%ebp), %eax
    vs
      movl 8(%ebp), %edx
      addl %edx, %eax

Register Allocation Challenges
• Optimize spill code
  – with few registers, spilling unavoidable
• Model register usage restrictions
• Exploit memory operands
  – affects spilling decisions

Previous Work
[Table: columns Method, Models Irregular Features, Fast, Optimal; cell marks not recoverable]
• Graph Coloring
• Integer Programming [Goodwin and Wilken 96] [Kong and Wilken 98] [Fu and Wilken 2002]
• Separated IP [Appel and George 01]
• PBQP [Scholz and Eckstein 02]

Our Goals
• Expressive
  – Explicitly represent architectural irregularities and costs
• Proper model
  – An optimum solution results in optimal register allocation
• Progressive solution algorithm
  – more computation, better solution
  – decent feasible solution obtained quickly
  – competitive with current allocators

Multicommodity Network Flow (MCNF)
[Figure: example network with source nodes for commodities a and b, crossbar and instruction node groups, numeric edge labels (2 and 4), and sink nodes for a and b]

Modeling Usage Constraints
    int foo(int a, int b, int c)
    {
      a = a*b;
      return a/c;
    }
[Figure: flow network for foo with imul and idiv instruction nodes over the allocation classes eax, edx, ecx, and mem, carrying variables a, b, and c; edge labels include -1 and 1; annotated "not quite right…"]

Modeling Spills and Moves
    int foo(int a, int b, int c)
    {
      a = a*b;
      return a/c;
    }
[Figure: the network for foo with imul and idiv instruction nodes and the allocation classes eax, edx, ecx, and mem for variables a, b, and c; edge labels include 3, 1, and -1]

Modeling Stores
• Simple approach flawed
  – doesn’t model memory persistency
• Solution: antivariables
  – flow only through memory
  – eviction cost = store cost
  – evict only once

Register Allocation as MCNF
• Variables ↔ Commodities
• Variable Usage ↔ Network Design
• Nodes ↔ Allocation Classes (Reg/Mem)
• Register Limits ↔ Node Capacities
• Spill Costs ↔ Edge Costs
• Variable Definition ↔ Source
• Variable Last Use ↔ Sink

Solving an MCNF
• Integer solution NP-complete
• Use standard IP solvers
  – commercial solvers (CPLEX) are impressive
• Exploit structure of problem
  – variety of MCNF specific solvers
    • empirically faster than IP solvers
    • Lagrangian Relaxation technique

Lagrangian Relaxation: Intuition
• Relaxes the hard constraints
  – only have to solve single commodity flow
• Combines easy
subproblems using a Lagrangian multiplier
  – an additional price on each edge
Example: edges have unit capacity. With a price, the solution to single commodity flow can be a solution to multicommodity flow.
[Figure: commodities a and b on unit-capacity edges with costs 1 and 0; after pricing, the edge costs are 1 and 0+1]

Solution Procedure
• Compute prices using iterative subgradient optimization
  – converge to optimal prices
• At each iteration, greedily construct a feasible solution using current prices
  – allocate most expensive vars first
  – can always find an allocation

Solution Procedure
• Advantages
  + have feasible solution at each step
  + iterative nature makes it progressive
  + Lagrangian relaxation theory provides means for computing a lower bound
  + can compute optimality bound
• Disadvantages
  – No guarantee of optimality of solution

Evaluation
• Replace gcc’s local allocator
• Optimize for code size
  – easy to statically evaluate
• Evaluate on MediaBench, MiBench, SpecInt95, SpecInt2000
  – consider only blocks where local allocation is interesting (enough variables to spill)

Behavior of Solver
[Figure]

Proven Optimality
[Figure: stacked bar chart (y-axis, 0% to 100%) with categories Optimal, Within 5%, Within 10%, Within 15%, Within 20%, and >25%, shown after 1, 10, 100, and 1000 iterations for blocks with 5-10 conflicts (355 blocks), 10-15 conflicts (23 blocks), 15-20 conflicts (7 blocks), and >= 20 conflicts (5 blocks)]

Comprehensive Results
[Figure: bar chart, improvement over gcc (y-axis, -15.00% to 20.00%) after 1, 10, 100, and 1000 iterations for blocks with 5-10 conflicts (355 blocks), 10-15 conflicts (23 blocks), 15-20 conflicts (7 blocks), and >= 20 conflicts (5 blocks); annotation: "artifact of interaction with gcc"]

Progressive Nature :-(

Contributions
• New MCNF model for register allocation
  + expressive, can model irregular architectures
  + can be solved using conventional ILP solvers
• Progressive solution procedure
  + decent initial solution
  + maintains feasible solution
  + improves solution over time
  – no optimality guarantees
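
Sketch: Lagrangian prices on a toy allocation
The Lagrangian Relaxation and Solution Procedure slides can be illustrated with a small sketch in Python. It is not the authors' solver and does not build the full MCNF network. It assumes a single register class with num_regs registers, a register cost of 0, and a per-variable memory (spill) cost. The names allocate, spill_cost, and num_regs, and the 1/k step size, are hypothetical choices made for this illustration.

```python
def allocate(spill_cost, num_regs=1, iters=20):
    """Toy Lagrangian relaxation (illustrative assumption, not the paper's model).

    Each variable lives in a register (cost 0) or in memory (cost spill_cost[v]);
    at most num_regs variables fit in registers.
    """
    price = 0.0  # Lagrangian multiplier on the shared register capacity
    best_cost, best_alloc = float("inf"), None
    lower_bound = float("-inf")

    for k in range(1, iters + 1):
        # Relaxed problem: capacity is ignored, so each variable independently
        # picks the cheaper of (register at the current price) and (memory).
        wants_reg = [v for v, c in spill_cost.items() if price < c]
        relaxed = sum(min(price, c) for c in spill_cost.values())
        # Value of the relaxed problem is a lower bound on the optimal cost.
        lower_bound = max(lower_bound, relaxed - price * num_regs)

        # Greedy feasible allocation: most expensive variables first.
        alloc, free = {}, num_regs
        for v in sorted(spill_cost, key=spill_cost.get, reverse=True):
            if free > 0 and price <= spill_cost[v]:
                alloc[v], free = "reg", free - 1
            else:
                alloc[v] = "mem"
        cost = sum(spill_cost[v] for v, where in alloc.items() if where == "mem")
        if cost < best_cost:
            best_cost, best_alloc = cost, alloc

        # Subgradient step: raise the price when registers are oversubscribed,
        # lower it (never below zero) when they are undersubscribed.
        price = max(0.0, price + (len(wants_reg) - num_regs) / k)

    return best_cost, best_alloc, lower_bound


if __name__ == "__main__":
    print(allocate({"a": 3, "b": 1}))
```

In this toy case a feasible allocation exists at every iteration, and the lower bound rises to meet the best cost found, which is how an optimality bound can be reported alongside the current solution.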