An Introduction to the Open64 Compiler
Guang R. Gao (CAPSL, University of Delaware)
Xiaomi An (CAPSL, University of Delaware)
Courtesy of Fred Chow
Outline
• Background and Motivation
• Part I: An overview of the Open64 compiler infrastructure and design principles
• Part II: Using Open64 in compiler research & development
4/8/2015
Open64 Tutorial - An Introduction
2
What Is the Original Open64 (Pro64)?
• A suite of optimizing compiler tools for Intel IA-64 and x86-64 on Linux systems
• C, C++, and Fortran90/95 compilers
• Conforming to the IA-64 and x86-64 Linux ABI and API standards
• Open to all researchers/developers in the community
Historical Perspectives (courtesy of Fred Chow)
• 1980-83: Stanford RISC compiler research
• 1987: MIPS Ucode Compiler (R2000), global optimization under -O2
• 1989: Cydrome Cydra5 Compiler, software pipelining
• 1991: MIPS Ucode Compiler (R4000), loop optimization under -O3
• 1994: SGI Ragnarok Compiler (R8000), floating-point performance
• 1997: SGI MIPSpro Compiler (R10000), incorporating Stanford SUIF and Rice IPA work
• 2000: Pro64/Open64 Compiler (Itanium)
Who Might Want to Use Open64?
• Researchers: test new compiler analysis and optimization algorithms
• Developers: retarget to another architecture/system
• Educators: a compiler teaching platform
Who Is Using Open64 – from the ACM CGO 2008 Open64 Workshop Attendee List (April 6, 2008, Boston)
12 companies: Google, NVIDIA, IBM, HP, Qualcomm, AMD, Tilera, PathScale, SimpLight, Absoft, Coherent Logix, STMicro
9 educational institutions: USC, Rice, U. of Delaware, U. of Houston, U. of Illinois, Tsinghua, Fudan, ENS Lyon, NRCC
Attendees:
• Jeremy D. Abramson (Univ of Southern California, Redondo Beach, USA)
• Laksono Adhianto (Rice University, Houston, USA)
• Richard Bagley (AMD, Marlborough, USA)
• Benoit Boissinot (ENS Lyon, Lyon, France)
• John Cavazos (University of Delaware, Newark, USA)
• Gautam Chakrabarti (PathScale, LLC, Sunnyvale, USA)
• Tong Chen (IBM, Yorktown Heights, USA)
• Fred Chow (PathScale, LLC, Fremont, USA)
• Antonios G. Danalis (University of Delaware, Newark, USA)
• Anshuman Dasgupta (Qualcomm, Inc., Austin, USA)
• Subrato K. De (Qualcomm, Inc., San Diego, USA)
• Deepak Eachempati (University of Houston, Houston, USA)
• Ge Gan (University of Delaware, Newark, USA)
• Guang Gao (University of Delaware, Newark, USA)
• Robert Gottlieb (Tilera Corporation, Westborough, USA)
• Christophe Guillon (STMicroelectronics, Grenoble, France)
• Robert Hundt (Google, Palo Alto, USA)
• Suneel Jain (Hewlett Packard, Los Gatos, USA)
• Roy Ju (AMD, Sunnyvale, USA)
• Charles A. Linthicum (Qualcomm, Inc., Austin, USA)
• Yin Ma (Absoft, Rochester Hills, USA)
• Nathaniel McIntosh (Hewlett Packard Company, Lexington, USA)
• Michael P. Murphy (NVIDIA, Newark, USA)
• Kannan Narayanan (Coherent Logix, Inc., Milpitas, USA)
• M. Serhat Özender (NRCC, Istanbul, Turkey)
• Vinodha Ramasamy (Google, Inc., Mountain View, USA)
• Fabrice Rastello (LIP, ENS Lyon, Lyon, France)
• Juergen Ributzka (University of Delaware, Newark, USA)
• Shane Ryoo (University of Illinois, Urbana, USA)
• Uriel Schafer (Hewlett-Packard, Silver Spring, USA)
• Xu Wang (University of Delaware, Newark, USA)
• Handong Ye (University of Delaware, Newark, USA)
• Bixia Zheng (AMD, Sunnyvale, USA)
• Hucheng Zhou (Tsinghua University, Beijing, China)
• Xing Zhou (Tsinghua University, Beijing, China)
Vision and Status of Open64 Today
• People should view it as GCC with an alternative backend, with great potential to reclaim its place as the best compiler in the world
• The technology incorporates the top compiler optimization research of the 1990s
• It has regained momentum in the last three years thanks to PathScale's and HP's investment in robustness and performance
• Targets x86 and Itanium in the public repository; ARM, MIPS, PowerPC, and several other signal-processing CPUs in private branches
Overview of Open64 Infrastructure
• Logical compilation model and component flow
• WHIRL Intermediate Representation
• Very High Optimizer
• Inter-Procedural Analysis (IPA)
• Loop Nest Optimizer (LNO) and Parallelization
• Global Optimization (WOPT)
• Code Generation (CG)
Component Flow
Front end
↓ (middle-end)
Very High Optimizer → Interprocedural Analysis and Optimization → Loop Nest Optimization and Parallelization → Global (Scalar) Optimization
↓ (backend)
Code Generation
All components communicate through a good, common IR.
Front Ends
• C front end based on gcc
• C++ front end based on g++
• Fortran90/95 front end from MIPSpro
Semantic Level of IR
At a higher level (closer to the source program):
• More kinds of constructs
• Shorter code sequence
• More program info present
• Hierarchical constructs
• Cannot perform many optimizations
At a lower level (closer to machine instructions):
• Less program info
• Fewer kinds of constructs
• Longer code sequence
• Flat constructs
• All optimizations can be performed
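The high-vs-low contrast above can be sketched in C (an illustrative analogy, not actual WHIRL; the function names are hypothetical): the same 2-D array access written once as a single hierarchical construct, and once as the flat address arithmetic a lower-level IR would expose.

```c
#include <assert.h>

#define N 4

/* High-level form: one hierarchical construct; the IR still knows this
   is a 2-D array access (more program info, shorter code sequence). */
int load_high(int a[N][N], int i, int j) {
    return a[i][j];
}

/* Lowered form: the access is flattened into explicit address
   arithmetic (longer sequence, but every operation is now visible
   to optimization). */
int load_low(int *base, int i, int j) {
    return *(base + i * N + j);
}
```

Both compute the same value; the difference is only in how much structure remains visible to the optimizer.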
Compilation Flow
[Figure: overall compilation flow diagram]
Very High WHIRL Optimizer
Lowers to High WHIRL while performing optimizations. The first part deals with common language constructs:
• Bit-field optimizations
• Short-circuit boolean expressions
• Switch statement optimization
• Simple if-conversion
• Assignments of small structs: lower struct copy to assignments of individual fields
• Convert patterns of code sequences to intrinsics: saturated subtract, abs()
• Other pattern-based optimizations: max, min
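One of the patterns above, saturated subtract, can be illustrated in C (a hypothetical sketch of the kind of source pattern recognized, not actual VHO output; function names are invented):

```c
#include <assert.h>

/* Branchy source pattern a pattern-matcher can recognize: subtract,
   but clamp the result at zero instead of wrapping around. */
unsigned sat_sub_pattern(unsigned a, unsigned b) {
    return a > b ? a - b : 0;
}

/* Branch-free form a saturated-subtract intrinsic or target
   instruction would compute: the mask is all ones exactly when a > b,
   so the wrapped difference is kept or zeroed. */
unsigned sat_sub_branchless(unsigned a, unsigned b) {
    unsigned d = a - b;             /* may wrap when a <= b */
    return d & -(unsigned)(a > b);  /* mask: ~0u if a > b, else 0 */
}
```

Recognizing the branchy form and replacing it with one intrinsic is exactly the kind of rewrite listed on this slide.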
Roles of IPA
The only optimization component operating at program scope:
• Analysis: collects information from the entire program
• Optimization: performs optimizations across procedure boundaries
• Depends on later phases for full optimization effects
• Supplies cross-file information for later optimization phases
IPA Flow
[Figure: IPA flow diagram]
IPA Main Stage
Analysis:
– alias analysis
– array section analysis
– code layout
Optimization:
– inlining
– cloning
– dead function and variable elimination
– constant propagation
Loop Nest Optimizations
1. Transformations for the data cache
2. Transformations that help other optimizations
3. Vectorization and parallelization

LNO Transformations for the Data Cache
• Cache blocking: transform loops to work on sub-matrices that fit in cache
• Loop interchange
• Array padding: reduce cache conflicts
• Prefetch generation: hide the long latency of cache-miss references
• Loop fusion
• Loop fission
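Cache blocking can be sketched in C (an illustrative example with hypothetical names and a toy size, not LNO output): the blocked version computes the same product while finishing all work on one B x B tile before moving on, so the tile stays in cache.

```c
#include <assert.h>

#define N 8   /* toy matrix size */
#define B 4   /* block size: a few B x B tiles should fit in cache */

/* Naive triple loop: streams through whole rows/columns of b with
   little reuse before eviction.  c must be zero-initialized. */
void matmul_naive(double a[N][N], double b[N][N], double c[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* Blocked (tiled) version: identical arithmetic, reordered so that
   each B x B sub-matrix is fully processed while it is cache-resident. */
void matmul_blocked(double a[N][N], double b[N][N], double c[N][N]) {
    for (int ii = 0; ii < N; ii += B)
        for (int jj = 0; jj < N; jj += B)
            for (int kk = 0; kk < N; kk += B)
                for (int i = ii; i < ii + B; i++)
                    for (int j = jj; j < jj + B; j++)
                        for (int k = kk; k < kk + B; k++)
                            c[i][j] += a[i][k] * b[k][j];
}
```

At realistic sizes the blocked form touches far less memory per unit of work; at this toy size it simply demonstrates that the transformation preserves the result.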
LNO Transformations that Help Other Optimizations
• Scalar expansion / array expansion: reduce inter-loop dependencies, enable parallelization
• Scalar variable renaming: fewer constraints on register allocation
• Array scalarization: improves register allocation
• Hoisting of messy loop bounds
• Outer loop unrolling
• Array substitution (forward and backward)
• Loop unswitching
• IF hoisting
• Inter-iteration CSE
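Scalar expansion can be sketched in C (a hand-applied illustration with invented names, not LNO output): the scalar t is redefined every iteration, and that reuse of one location creates a dependence between iterations; expanding t into an array gives each iteration private storage, so the loop can be split and parallelized.

```c
#include <assert.h>

#define N 8

/* Before: every iteration writes and reads the same scalar t, so
   iterations cannot safely run concurrently. */
void squares_scalar(const int a[N], const int b[N], int out[N]) {
    int t;
    for (int i = 0; i < N; i++) {
        t = a[i] + b[i];
        out[i] = t * t;
    }
}

/* After scalar expansion: t has one element per iteration, removing
   the cross-iteration dependence on a shared location. */
void squares_expanded(const int a[N], const int b[N], int out[N]) {
    int t[N];
    for (int i = 0; i < N; i++)
        t[i] = a[i] + b[i];
    for (int i = 0; i < N; i++)
        out[i] = t[i] * t[i];
}
```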
LNO Parallelization
• SIMD code generation: highly dependent on the SIMD instructions of the target
• Generation of vector intrinsics: based on the library functions available
• Automatic parallelization: leverages the OpenMP support in the rest of the backend
Global Optimization Phase
• SSA is the unifying technology
• Open64 extensions to SSA technology:
– Representing aliases and indirect memory operations (Chow et al., CC 96)
– Integrated partial redundancy elimination (Chow et al., PLDI 97; Kennedy et al., CC 98, TOPLAS 99)
– Support for speculative code motion
– Register promotion via load and store placement (Lo et al., PLDI 98)
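The partial redundancy elimination mentioned above can be illustrated with a classic hand-transformed example in C (hypothetical function names; a sketch of the transformation's effect, not compiler output):

```c
#include <assert.h>

/* Before PRE: on the flag path a+b is evaluated twice; on the other
   path, once.  The second evaluation is only *partially* redundant. */
int sum_before_pre(int a, int b, int flag) {
    int x;
    if (flag) x = a + b;
    else      x = 0;
    int y = a + b;    /* redundant only when the flag path was taken */
    return x + y;
}

/* After PRE: the computation is inserted on the path where the value
   was unavailable, making the later use fully redundant.  Now a+b is
   evaluated exactly once on every path. */
int sum_after_pre(int a, int b, int flag) {
    int t, x;
    if (flag) { t = a + b; x = t; }
    else      { x = 0; t = a + b; }
    int y = t;        /* reuse the available value */
    return x + y;
}
```

SSAPRE performs this insertion-plus-reuse directly on the SSA form.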
Overview
• Works at function scope
• Builds the control flow graph
• Performs alias analysis
• Represents the program in SSA form
• SSA-based optimization algorithms
• Cooperation among multiple phases to achieve the final effects
• Phase order designed to maximize effectiveness
• Separated into Preopt and Mainopt
– Preopt serves as a pre-optimizing front end for LNO and IPA (in High WHIRL)
– Provides use-def info to LNO and IPA
– Provides alias info to CG
Optimizations Performed
Pre-optimizer:
 Goto conversion
 Loop normalization
 Induction variable canonicalization
 Dead store elimination
 Copy propagation
 Dead code elimination
 Alias analysis (flow-free and flow-sensitive)
 Compute def-use chains for LNO and IPA
 Pass alias info to CG
Main optimizer:
 Partial redundancy elimination based on the SSAPRE framework:
o Global common subexpression elimination
o Loop-invariant code motion
o Strength reduction
o Linear function test replacement
 Value-number-based full redundancy elimination
 Induction variable elimination
 Register promotion
 Bitwise dead store elimination
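Strength reduction, one of the SSAPRE-based optimizations listed above, can be sketched in C (a hand-applied illustration with invented names, not compiler output): the multiplication i * stride inside the loop is replaced by an induction variable advanced by stride each iteration.

```c
#include <assert.h>

/* Before: an index multiplication on every iteration. */
long sum_strided(const int *a, int n, int stride) {
    long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i * stride];
    return s;
}

/* After strength reduction: i * stride becomes the induction variable
   idx, advanced by stride, trading the multiply for an add. */
long sum_strided_reduced(const int *a, int n, int stride) {
    long s = 0;
    int idx = 0;
    for (int i = 0; i < n; i++, idx += stride)
        s += a[idx];
    return s;
}
```

Linear function test replacement goes one step further, rewriting the loop exit test itself in terms of the new induction variable.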
Feedback
Used throughout the compiler:
• Instrumentation can be added at any stage (VHO, LNO, WOPT, CG)
• Explicit instrumentation data is incorporated where inserted
• Instrumentation data is maintained and checked for consistency through program transformations
Code Generation Flow
WHIRL is translated by WHIRL-to-TOP lowering into CGIR, which then passes through:
• Extended basic block optimization
• Control flow optimization
• Hyperblock formation
• Critical path reduction
• Inner loop optimization / software pipelining
• IGLS (instruction scheduling)
• GRA/LRA (global/local register allocation)
• Code emission, producing the executable
Information from the front end (alias, structure, etc.) flows smoothly into the backend in Pro64.
Software Pipelining vs. Normal Scheduling
For each inner loop, CG first asks: is this a SWP-amenable loop candidate?
• Yes: inner loop processing → software pipelining → GRA/LRA; on failure, or if pipelining is not profitable, fall back to IGLS
• No: IGLS (normal scheduling)
Both paths finish with code emission.
Code Generation Intermediate Representation (CGIR)
• TOPs (Target Operations) are “quads”
• Operands/results are TNs (temporary names)
• Basic block nodes in the control flow graph
• Load/store architecture
• Supports predication
• Flags on TOPs (copy ops, integer add, load, etc.)
• Flags on operands (TNs)
From WHIRL to CGIR (Cont’d)
• Information passed:
– alias information
– loop information
– symbol table and maps
The Target Information Table (TARG_INFO)
Objective:
• Parameterized description of a target machine and system architecture
• Separates architecture details from the compiler’s algorithms
• Minimizes compiler changes when targeting a new architecture
WHIRL SSA: A New Optimization Infrastructure for Open64
Parallel Processing Institute, Fudan University, Shanghai, China
Global Delivery China Center, Hewlett-Packard, Shanghai, China
Goal
• A better “DU manager”
– Factored UD chains
– Reduced traversal overhead
– Keeps alias information
• Handles both direct and indirect accesses
• Eliminates “incomplete DU/UD chains”
• Easy to use
– STL-style iterator to traverse the DU/UD chain
• A flexible infrastructure
– Available from High WHIRL to Low WHIRL
– Lightweight, demand-driven
– Precise and updatable
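A minimal sketch in C of the factored DU/UD idea (all names here are hypothetical illustrations, not the actual WHIRL SSA API): each use records its unique reaching definition (the UD edge), and each definition chains its uses (the DU chain), so a client can walk either direction without recomputing anything.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical factored use-def structure.  In SSA each use has
   exactly one reaching def, so UD is a single pointer; DU is the
   def's linked list of uses. */
typedef struct Use Use;
typedef struct Def {
    int  version;   /* SSA version number of the defined variable */
    Use *uses;      /* head of this def's use list (the DU chain) */
} Def;
struct Use {
    Def *def;       /* UD edge: the unique reaching definition */
    Use *next;      /* next use of the same def */
};

/* Iterator-style traversal of a def's uses, in the spirit of the
   STL-style iterator mentioned on the slide. */
static int count_uses(const Def *d) {
    int n = 0;
    for (const Use *u = d->uses; u != NULL; u = u->next)
        n++;
    return n;
}
```

The real infrastructure additionally factors in alias information (via virtual variables) and supports updates after transformations; this sketch shows only the chain shape.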
PHI Placement
• For structured control flow (SCF), φ nodes are mapped onto the root WN
• For GOTO-LABEL, φ nodes are placed on the LABEL

Examples (subscripts denote SSA versions; ← denotes assignment):
(a) IF: for "if (p1) a1 ←", the merge a3 ← φ(a1, a2) is attached to the IF node
(b) WHILE_DO: for "a1 ←; while (p1) { a3 ← }", the loop-header φ, a2 ← φ(a1, a3), is attached to the WHILE_DO node
(c) DO_WHILE: the same header φ, a2 ← φ(a1, a3), is attached to the DO_WHILE node
(d) DO_LOOP: for "I1 ←" with step I3 = op(I2), the header φ, I2 ← φ(I1, I3), is attached to the DO_LOOP node (whose components are INIT, CMP, STEP, and BODY)
Thank you!