TOPIC N: Overview of The Pro64 Code Generator (slides by Gao, Dehnert and Amaral)


Outline
• The code generator flow diagram
• Hyperblock formation and predication (HBF)
• Predicate Query System (PQS)
• Loop preparation (CGPREP) and software pipelining
• Global and local instruction scheduling (IGLS)
• Global and local register allocation (GRA, LRA)
• WHIRL/CGIR and TARG-INFO
11/6/2015
PACT2000 Tutorial: Open64
Flowchart of Code Generator
[Figure: code generator flowchart; stages in order]
WHIRL
→ WHIRL-to-TOP Lowering
→ CGIR: Quad Op List
→ Control Flow Opt I; EBO
→ Hyperblock Formation; Critical-Path Reduction
→ Control Flow Opt II; EBO
→ Process Inner Loops: unrolling, EBO; loop prep, software pipelining
→ IGLS: pre-pass; GRA, LRA, EBO; IGLS: post-pass; Control Flow Opt
→ Code Emission
Legend:
EBO: Extended basic block optimization (peephole, etc.)
PQS: Predicate Query System
Hyperblock Formation and
Predicated Execution
• Hyperblock: a single-entry, multiple-exit control-flow region
– loop body, hammock region, etc.
• Hyperblock formation algorithm
– Based on Scott Mahlke’s method [Mahlke96]
– Extended with heuristic-driven “conditional tail duplication” to limit side effects such as excess code duplication
Hyperblock Formation Algorithm
Region Identification
• Hammock regions
• Innermost loops
• General regions (sequence based)
Block Selection
• Paths sorted by priorities
• Inclusion of a path is guided by its impact on resources, scheduling height, and priority level
Tail Duplication
If Conversion
• Internal branches are removed via predication
• Predicate reuse
• Side exits
Objective: Keep the scheduling height close to that of the highest priority path.
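The if-conversion step can be illustrated on the switch statement used in the example slide. The sketch below is not Pro64 code: the function names and predicate variables are hypothetical, and each `if (p) ...` line in the second function stands in for a single predicated operation. It assumes `bb` is non-negative, as the slide's source implicitly does.

```cpp
#include <cassert>
#include <vector>

// Branching form of the fall-through switch from the example slide.
int with_branches(int aa, int bb, const std::vector<int>& tab) {
    const int tabsiz = static_cast<int>(tab.size());
    switch (aa) {
    case 1:
        if (aa < tabsiz) aa = tab[aa];
        // falls through
    case 2:
        if (bb < tabsiz) bb = tab[bb];
        // falls through
    default:
        return aa + bb;
    }
}

// If-converted form: internal branches become predicate computations,
// and each guarded statement models one predicated op.
int if_converted(int aa, int bb, const std::vector<int>& tab) {
    const int tabsiz = static_cast<int>(tab.size());
    const bool p_case1 = (aa == 1);                 // entered at case 1
    const bool p_case2 = p_case1 || (aa == 2);      // entered at case 2 or fell through
    const bool p_ld_aa = p_case1 && (aa < tabsiz);  // guard for the first table load
    const bool p_ld_bb = p_case2 && (bb < tabsiz);  // guard for the second table load
    if (p_ld_aa) aa = tab[aa];  // (p_ld_aa) aa = ld tab[aa]
    if (p_ld_bb) bb = tab[bb];  // (p_ld_bb) bb = ld tab[bb]
    return aa + bb;             // default block, always executed
}
```

Both functions compute the same result; the second has a single straight-line body, which is what gives the scheduler a larger region to work with.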
Features of the Pro64 Hyperblock
Formation Algorithm
• Form “good” vs. “maximal” hyperblocks
• Conditional code duplication
• Reduce unnecessary duplication
• Seamless integration of HBF with global scheduling - an integrated part of IGLS
• Avoid unnecessary reverse if-conversion
Hyperblock Formation - An Example
(a) Source
    aa = a[i];
    bb = b[i];
    switch (aa) {
    case 1:
        if (aa < tabsiz)
            aa = tab[aa];
    case 2:
        if (bb < tabsiz)
            bb = tab[bb];
    default:
        ans = aa + bb;
    }
[Figure: (b) CFG with blocks 1, 2, 4, 5, 6, 7, 8; (c) hyperblock formation with aggressive tail duplication, showing hyperblocks H1 and H2 and duplicated blocks 6’, 7’, 8’]
Hyperblock Formation - An Example
Cont’d
[Figure: (a) CFG; (b) hyperblock formation with aggressive tail duplication (hyperblocks H1 and H2, duplicated blocks 6’, 7’, 8’); (c) Pro64 hyperblock formation (hyperblocks H1 and H2)]
Predicate Query System (PQS)
• Purpose: gather information and provide
interfaces allowing other phases to make queries
regarding the relationships among predicate
values
• PQS functions (examples)
BOOL PQSCG_is_disjoint (PQS_TN tn1, PQS_TN tn2)
BOOL PQSCG_is_subset (PQS_TN_SET& tns1, PQS_TN_SET& tns2)
• Efficiency: O(log n), where n is the number of
ancestor temporaries (TNs).
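A minimal sketch of how such queries could be answered, assuming predicates are produced in complementary pairs by compare ops and that each predicate records the qualifying (parent) predicate under which it was defined. The class and method names here are hypothetical and are not the PQSCG interface; the walk is linear in the depth of the ancestor chain, and a `false` answer only means "not proven".

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// One record per predicate TN (hypothetical layout).
struct PredInfo {
    int parent;      // qualifying predicate, or -1 for "always true"
    int complement;  // sibling predicate produced by the same compare
};

class PredTable {
public:
    // Models "pT, pF = cmp ... (qual)". Returns the id of pT; pF is id + 1.
    int add_compare(int qual) {
        const int t = static_cast<int>(info_.size());
        info_.push_back(PredInfo{qual, t + 1});
        info_.push_back(PredInfo{qual, t});
        return t;
    }

    // True if a and b can never both be true: some ancestor of b is the
    // complement of some ancestor of a (each predicate counts as its own ancestor).
    bool is_disjoint(int a, int b) const {
        std::unordered_set<int> ancestors_of_a;
        for (int x = a; x != -1; x = info_[x].parent) ancestors_of_a.insert(x);
        for (int y = b; y != -1; y = info_[y].parent)
            if (ancestors_of_a.count(info_[y].complement)) return true;
        return false;
    }

    // True if a being true implies b is true: b is a or one of its ancestors.
    bool implies(int a, int b) const {
        for (int x = a; x != -1; x = info_[x].parent)
            if (x == b) return true;
        return false;
    }

private:
    std::vector<PredInfo> info_;
};
```

A scheduler could use `is_disjoint` to decide that two ops guarded by complementary predicates never execute together, for example when checking whether they may share a register.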
Loop Preparation and Optimization
for Software Pipelining
• Loop canonicalization for SWP
• Read/Write removal (register aware)
• Loop unrolling (resource aware)
• Recurrence removal
• Prefetch (several different types)
• Forced if-conversion
Pro64 Software Pipelining
Method Overview
• Applies only to SWP-amenable loops
• Extensive loop preparation and optimization before application [DehnertTowle93]
• Uses a lifetime-sensitive SWP algorithm [Huff93]
• Register allocation after scheduling, based on Cydra 5 [RLTS92, DeTo93]
• Handles both while and do loops
• Falls back smoothly to normal scheduling if pipelining does not succeed
Pro64 Lifetime-Sensitive Modulo
Scheduling for Software Pipelining
Features
• Try to place an op ASAP or ALAP to minimize register pressure
• Slack scheduling
• Limited backtracking
• Operation-driven scheduling framework
[Figure: scheduling flowchart]
Compute Estart/Lstart for all unplaced ops
→ Choose a good op to place into the current partial schedule within its Estart/Lstart range
→ Succeed? If no, eject conflicting ops and repeat; if yes, continue until done
→ Register allocate
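To make the placement step concrete, here is a much simplified sketch of modulo scheduling at a fixed initiation interval (II). It is not the Pro64 scheduler: it places ops greedily at the earliest legal cycle (ASAP only), uses a single resource class, and gives up instead of ejecting conflicting ops. All names (`Dep`, `modulo_schedule`, `units`) are hypothetical, and ops are assumed to be numbered in topological order of their same-iteration dependences.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Dependence edge: `to` must start at least `latency` cycles after `from`,
// where `from` belongs to the iteration `distance` iterations earlier.
struct Dep { int from, to, latency, distance; };

// Returns a start cycle per op, or an empty vector if no schedule is found
// at this II (a caller would then retry with II + 1).
std::vector<int> modulo_schedule(int n_ops, const std::vector<Dep>& deps,
                                 int ii, int units) {
    std::vector<int> start(n_ops, -1);
    std::vector<int> mrt(ii, 0);  // modulo reservation table: ops per slot

    for (int op = 0; op < n_ops; ++op) {
        // Estart: earliest cycle allowed by already placed predecessors.
        int estart = 0;
        for (const Dep& d : deps)
            if (d.to == op && start[d.from] >= 0)
                estart = std::max(estart, start[d.from] + d.latency - d.distance * ii);

        // Try the II consecutive cycles from Estart; later cycles repeat slots.
        bool placed = false;
        for (int t = estart; t < estart + ii && !placed; ++t) {
            if (mrt[t % ii] < units) {
                ++mrt[t % ii];
                start[op] = t;
                placed = true;
            }
        }
        if (!placed) return {};
    }

    // Final check, which also covers loop-carried edges into earlier ops.
    for (const Dep& d : deps)
        if (start[d.to] < start[d.from] + d.latency - d.distance * ii) return {};
    return start;
}
```

With one functional unit, a three-op dependence chain cannot be scheduled at II = 2 but fits at II = 3, which is the resource-constrained lower bound for that loop body.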
Integrated Global Local
Scheduling (IGLS) Method
• The basic IGLS framework integrates
global code motion (GCM) with local
scheduling [MantripragadaJainDehnert98]
• IGLS extended to hyperblock scheduling
• Performs profitable code motion between
hyperblock regions and normal regions
IGLS Phase Flow Diagram
[Figure: IGLS phase flow; boxes shown]
Hyperblock Scheduling (HBS)
Block Priority Selection
Global Code Motion (GCM)
Motion Selection
Target Selection
Local Code Scheduling (LCS)
Advantages of the Extended IGLS
Method - The Example Revisited
• Advantages:
– No rigid boundaries between hyperblocks and non-hyperblocks
– GCM moves code into and out of a hyperblock according to profitability
[Figure: (a) Pro64 hyperblock (H1, H2); (b) aggressive duplication (H1, H2, duplicated block 8’)]
Software Pipelining
vs
Normal Scheduling
[Figure: decision flowchart]
A SWP-amenable loop candidate?
– Yes: inner loop processing and software pipelining; on success, proceed to code emission; on failure or if not profitable, take the normal path
– No (normal path): IGLS → GRA/LRA → IGLS → Code Emission
WHIRL
• Abstract syntax tree based
• Base representation is simple and efficient
• Used through several phases with lowering
• Designed for multiple target architectures
• Uses symbol table and maps
Code Generation Intermediate
Representation (CGIR)
• Conventional and simple
• Load/store architecture
• Predication
• Flags on ops (copy ops, integer add, load, etc.)
• Flags on operands (TNs)
• Structured as basic blocks
Global and Local
Register Allocation
(GRA/LRA)
• LRA-RQ provides an estimate of local register requirements
• Allocates global variables using a priority-based register allocator [ChowHennessy90, Chow83, Briggs92]
• Incorporates IA-64 specific extensions, e.g. register stack usage
[Figure: From prepass IGLS → GRA (LRA Register Request (LRA-RQ); Priority Based Register Allocation with IA-64 Extensions) → LRA → To postpass IGLS]
Pro64 Priority-Based
Register Allocator
GRA-Create
– Create_LRANGE (live range set)
– Create_Live_BB_Sets (for each live range, find the blocks in which the live range is live)
– Create_Interference_Graph (backward walk-through to find live ranges that are live simultaneously)
GRA-Color
– Simplify (form a stack of LRs, which will be colored from top to bottom)
– Choose_Register or GRA_Note_Spill
GRA-Spill
– Spill (spill and optimize spill-code placement)
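The coloring phase can be sketched as follows. This is an illustration of priority-ordered coloring in general, not the Pro64 implementation: live ranges are simply sorted by a given priority instead of being pushed on a Simplify stack, and all names (`LiveRange`, `color_by_priority`) are hypothetical. A result of -1 corresponds to noting the live range for spilling.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A live range with its priority and its neighbors in the interference graph.
struct LiveRange {
    double priority;
    std::vector<int> neighbors;
};

// Returns the register chosen for each live range, or -1 if it must be spilled.
std::vector<int> color_by_priority(const std::vector<LiveRange>& lrs, int num_regs) {
    const int n = static_cast<int>(lrs.size());
    std::vector<int> order(n);
    for (int i = 0; i < n; ++i) order[i] = i;
    // Highest priority first; ties keep their original order.
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return lrs[a].priority > lrs[b].priority; });

    std::vector<int> reg(n, -1);
    for (int lr : order) {
        // Registers already taken by interfering, already colored live ranges.
        std::vector<bool> used(num_regs, false);
        for (int nb : lrs[lr].neighbors)
            if (reg[nb] >= 0) used[reg[nb]] = true;
        // Choose the first free register; leave -1 (spill) if none is free.
        for (int r = 0; r < num_regs; ++r) {
            if (!used[r]) { reg[lr] = r; break; }
        }
    }
    return reg;
}
```

With two registers and three mutually interfering live ranges, the lowest-priority one is the one left uncolored, which is the behavior a priority-based allocator aims for.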
Local Register Allocation
(LRA)
• Assign_Registers using reverse linear scan with priority assignment
• Reordering: depth-first ordering on the DDG
[Figure: Assign_Registers; on success, done; on failure, Fix_LRA: instruction reordering (first time), then spill global, spill local]
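A reverse linear scan over one basic block might look like the sketch below. It leaves out the priority assignment and the Fix_LRA recovery path, and it assumes each TN is defined at most once in the block and is not used by the op that defines it. The types and the function name are hypothetical.

```cpp
#include <cassert>
#include <vector>

// One op: the TN it defines (-1 if none) and the TNs it uses.
struct Op {
    int def;
    std::vector<int> uses;
};

// Walks the block bottom-up. A TN gets a register at its last use (the first
// one seen when walking backward) and releases it at its definition.
// Returns false when registers run out, where a real allocator would
// reorder instructions or spill.
bool assign_registers(const std::vector<Op>& block, int num_tns, int num_regs,
                      std::vector<int>& reg) {
    reg.assign(num_tns, -1);
    std::vector<bool> live(num_tns, false);
    std::vector<int> free_regs;
    for (int r = num_regs - 1; r >= 0; --r) free_regs.push_back(r);

    for (auto it = block.rbegin(); it != block.rend(); ++it) {
        if (it->def >= 0) {
            if (live[it->def]) {
                // Live range ends here (going backward): release the register
                // so that the op's own sources may reuse it.
                live[it->def] = false;
                free_regs.push_back(reg[it->def]);
            } else {
                // Result is never used below: it needs a register only at this op.
                if (free_regs.empty()) return false;
                reg[it->def] = free_regs.back();
            }
        }
        for (int u : it->uses) {
            if (!live[u]) {
                if (free_regs.empty()) return false;
                reg[u] = free_regs.back();
                free_regs.pop_back();
                live[u] = true;
            }
        }
    }
    return true;
}
```

Releasing the destination register before allocating the sources lets an op such as `t3 = t2 + 1` reuse one register for both `t2` and `t3`.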
From WHIRL to CGIR
An Example
(a) Source
    int *a;
    int i;
    int aa;
    aa = a[i];
(b) WHIRL
    ST aa
      LD
        +
          a
          *
            CVTL32
              i
            4
(c) CGIR
    T1 = sp + &a
    T2 = ld T1
    T3 = sp + &i
    T4 = ld T3
    T5 = sxt T4
    T6 = T5 << 2
    T7 = T6
    T8 = T2 + T7
    T9 = ld T8
    T10 = sp + &aa
    := st T10 T9
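The lowering in this example can be mimicked with a small tree walk that emits one op per tree node into fresh temporaries. This is only a sketch: the `Node` layout, the operator labels (taken from the slide's simplified tree rather than from real WHIRL opcodes), and the textual op format are all assumptions, and the output omits the copy `T7 = T6` shown on the slide.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Node {
    std::string op;          // "VAR", "CONST", "+", "*", "CVTL32", "LD", "ST"
    std::string sym;         // variable name (VAR, ST)
    long value = 0;          // constant value (CONST)
    std::vector<Node> kids;  // operands
};

Node make(const std::string& op, const std::string& sym, long value,
          const std::vector<Node>& kids) {
    Node n;
    n.op = op; n.sym = sym; n.value = value; n.kids = kids;
    return n;
}

struct Lowerer {
    std::vector<std::string> ops;  // emitted ops, in order
    int next_tn = 1;

    // Emits "Tn = rhs" and returns the new temporary's name.
    std::string emit(const std::string& rhs) {
        std::string t = "T" + std::to_string(next_tn++);
        ops.push_back(t + " = " + rhs);
        return t;
    }

    // Lowers a subtree and returns the operand holding its value.
    std::string lower(const Node& n) {
        if (n.op == "CONST") return std::to_string(n.value);
        if (n.op == "VAR") {  // stack variable: form the address, then load
            std::string addr = emit("sp + &" + n.sym);
            return emit("ld " + addr);
        }
        if (n.op == "CVTL32") { std::string v = lower(n.kids[0]); return emit("sxt " + v); }
        if (n.op == "LD") { std::string a = lower(n.kids[0]); return emit("ld " + a); }
        if (n.op == "+") {
            std::string l = lower(n.kids[0]);
            std::string r = lower(n.kids[1]);
            return emit(l + " + " + r);
        }
        if (n.op == "*") {
            std::string l = lower(n.kids[0]);
            const Node& c = n.kids[1];
            if (c.op == "CONST" && c.value > 0 && (c.value & (c.value - 1)) == 0) {
                int shift = 0;  // multiply by a power of two becomes a shift
                for (long v = c.value; v > 1; v >>= 1) ++shift;
                return emit(l + " << " + std::to_string(shift));
            }
            std::string r = lower(c);
            return emit(l + " * " + r);
        }
        if (n.op == "ST") {
            std::string v = lower(n.kids[0]);
            std::string addr = emit("sp + &" + n.sym);
            ops.push_back(":= st " + addr + " " + v);
        }
        return "";
    }
};

// Tree for "aa = a[i];" following panel (b) of the slide.
Node example_tree() {
    Node index = make("*", "", 0, {make("CVTL32", "", 0, {make("VAR", "i", 0, {})}),
                                   make("CONST", "", 4, {})});
    Node addr = make("+", "", 0, {make("VAR", "a", 0, {}), index});
    return make("ST", "aa", 0, {make("LD", "", 0, {addr})});
}
```

Calling `lower(example_tree())` on a fresh `Lowerer` fills `ops` with a sequence of the same shape as panel (c): two address-plus-load pairs, a sign extension, a shift by 2, an add, the indirect load, and the final store.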
From WHIRL to CGIR
Cont’d
• Information passed
– alias information
– loop information
– symbol table and maps
The Target Information Table
(TARG_INFO)
Objective:
• Parameterized description of a target machine
and system architecture
• Separates architecture details from the
compiler’s algorithms
• Minimizes compiler changes when targeting a
new architecture
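The idea of a parameterized target description can be sketched as a plain data table that the compiler's algorithms consult instead of hard-coding machine details. Everything below is hypothetical: the struct layout, the two toy targets, and their numbers are made up for illustration and do not reflect the actual TARG_INFO format or any real processor.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class OpClass { IntAlu, Load, Store };

// A target described purely by data (illustrative fields only).
struct TargetDesc {
    std::string name;
    int issue_width;                 // ops issued per cycle
    int int_registers;               // allocatable integer registers
    std::map<OpClass, int> latency;  // result latency per op class, in cycles
};

// Two made-up targets; retargeting means supplying another table.
TargetDesc toy_target_a() {
    return TargetDesc{"toy-a", 2, 32,
                      {{OpClass::IntAlu, 1}, {OpClass::Load, 2}, {OpClass::Store, 1}}};
}
TargetDesc toy_target_b() {
    return TargetDesc{"toy-b", 6, 128,
                      {{OpClass::IntAlu, 1}, {OpClass::Load, 3}, {OpClass::Store, 1}}};
}

// A target-independent query: length in cycles of a chain of dependent ops.
int dependent_chain_cycles(const TargetDesc& target, const std::vector<OpClass>& chain) {
    int cycles = 0;
    for (OpClass k : chain) cycles += target.latency.at(k);
    return cycles;
}
```

The same `dependent_chain_cycles` code gives different answers for the two tables, which is the separation between algorithms and architecture details that the slide describes.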
The Target Information Table
(TARG_INFO)
Cont’d
• Based on an extension of Cydra tables, with major improvements
• Architectures already targeted:
– Whole MIPS family
– IA-64
– IA-32
– SGI graphics processors (earlier version)