Demystifying GCC
Morgan Deters
Ron K. Cytron
Distributed Object Computing Laboratory
Washington University
St. Louis, MO
Copyright is held by the author/owner(s).
OOPSLA'06, October 22–26, 2006, Portland, Oregon, USA.
2006 ACM 06/0010.
Copyright © 2005–2006 Morgan Deters
Under the Hood of the
GNU Compiler Collection
Tutorial Objectives
• Introduce the internals of GCC 4.1.1
 Java and C++ front-ends
 Optimizations
 Back-end structure
• How to add new or change
 language front-ends
 optimizations
 machine-specific back-ends
• How to debug/improve GCC
OOPSLA 2006
Portland, Oregon
22 October 2006
GCC Big Picture
What is GCC?
Why use GCC?
What does compilation with GCC look like?
What is GCC?
• A compiler for multiple languages…
 C
 C++
 Java
 Objective-C/C++
 FORTRAN
 Ada
What is GCC?
• …supporting multiple targets
 arc, arm, avr, bfin, c4x, cris, crx, fr30,
 frv, h8300, i386, ia64, iq2000, m32c, m32r, m68hc11,
 m68k, mcore, mips, mmix, mn10300, mt, pa, pdp11,
 rs6000, s390, sh, sparc, stormy16, v850, vax, xtensa
These are code generators; variants are also supported
(e.g. powerpc is a “variant” of the rs6000 code generator)
What GCC is not
• GCC is not
 an assembler (see GNU binutils)
 a C library (see glibc)
 a debugger (see gdb)
 an IDE
Advantages of using GCC
as an R&D platform
• Research is immediately usable by everyone
 Large development community and user base
 GCC is a modern, practical compiler
• multiple architectures, full standard languages, optimizations
• debugging support
• You can meet GCC halfway
 modular: hack some parts, rely on the others
• Can incorporate bug fixes that come along
 minor version upgrades (e.g. 3.3.x → 3.4.x) – no big deal
 major version upgrades (e.g. 3.x → 4.x) – more of a pain
• Need not maintain code indefinitely (if incorporated)
The GCC project and the GPL
• Open-source
 covered by GNU General Public License (GPL)
• Any changes you make to GCC source code or
associated libraries must also be GPLed
• However, compiler and libraries can be
used/linked against in non-GPL development
Your improvements to GCC must be open-source,
but your customers need not open-source their
programs to use your stuff
Typical structure of GCC compilation
[Figure: source program → compiler → assembly program → assembler → linker → ELF object; the steps are run by the gcc/g++/gcj driver]
Inside the compiler
[Figure: compiler (C, C++, Java): parser / semantic checker → trees → gimplifier → tree optimizations → expander → RTL → RTL passes → target arch instruction selection]
GCC Basics
How do you build GCC?
How do you navigate the source tree?
GCC Basics: Getting Started
• Requirements to build GCC
 usual suite of UNIX tools (C compiler, assembler/linker,
GNU Make, tar, awk, POSIX shell)
• For development
 GNU m4 and GNU autotools (autoconf/automake/libtool)
 gperf
 bison, flex
 autogen, guile, gettext, perl, Texinfo, diffutils, patch, …
• Obtaining GCC sources
 gcc.gnu.org or local mirror (see gcc.gnu.org/mirrors.html)
 get gcc-core package, then language add-ons
• gcc-java requires gcc-g++
Building GCC from sources
• Configure it in a separate build directory from sources
 /path/to/source/directory/configure options…
 --prefix=install-location
 --enable-languages=comma-separated-language-list
 --enable-checking
• turns on sanity checks (especially on intermediate representation)
• Build it!
 Environment variables useful when debugging compiler/runtime
• CFLAGS                stage 1 flags (using host C compiler)
• BOOT_CFLAGS           stage 2 and stage 3 flags (using stage 1 GCC)
• CFLAGS_FOR_TARGET     flags for new GCC building target binaries
• CXXFLAGS_FOR_TARGET   flags for new GCC building libstdc++/others
• GCJFLAGS              flags for new GCC building Java runtime
• ‘-O0 -ggdb3’ is recommended when debugging
Building GCC from sources
• Build it ! continued…
 make bootstrap (to bootstrap) or make (to not)
• bootstrap useful when compiling with non-GCC host compiler
• during development, non-bootstrap is faster and also better at
recompiling just those sources that have changed
 use make’s -j option to speed things up on MP/dual core
 make bootstrap-lean
• cleans up between stages, uses less disk
 make profiledbootstrap
• faster compiler produced, but need GCC host
• -j unsupported
• Install it !
 make install
Building a cross-compiler
• Code generator can be built for any target
 runtime libraries then are built using that code generator
• Since GCC outputs assembly, you actually need a full
cross development toolchain
 Dan Kegel’s crosstool automates a GNU/Linux cross chain
for popular configurations:
• Linux kernel headers
• GNU binutils
• glibc
• gcc
• see kegel.com/crosstool
GCC Basics: Getting Around
• Other tools recommended when hacking GCC
GNU Screen   attach/reattach terminal sessions
etags        navigation to source definitions (emacs)
ctags        navigation to source definitions (vi)
c++filt      demangle C++/Java mangled symbols
readelf      decompose ELF files
objdump      object file dumper/disassembler
gdb          GNU debugger
GCC Drivers
• gcc, g++, gcj are drivers, not compilers
 They will execute (as appropriate):
• compiler (cc1, cc1plus, jc1)
• Java program main entry point generation (jvgenmain)
• assembler (as)
• linker (collect2)
• Differences between drivers include active
#defines, default libraries, other behavior
 but can use any driver for any source language
Most useful driver options for debugging
-E                   preprocess, don’t compile
-S                   compile, don’t assemble
-H                   verbose header inclusion
-save-temps          save temporary files
-print-search-dirs   print search paths
-v                   verbose (see what the driver does)
-g                   include debugging symbols
--help               get command line help
--version            show full version info
-dumpversion         show minimal version info
For extra help
man gcc       basic option assistance
info gcc      using gcc in-depth; language extensions etc.
info gccint   internals documentation
Top-level INSTALL directory in distribution provides
help on configuring and building GCC
Tour of GCC source
INSTALL       configuration/installation documentation
boehm-gc      the Boehm garbage collector
config        architecture-specific configure fragments
contrib       contributed scripts
fastjar       a replacement for the jar tool
fixincludes   source for a program to fix host header
              files when they aren't ANSI-compliant
gcc           the main compiler source
include       headers used by GCC (libiberty mostly)
intl          support for languages other than English
Tour of GCC source, cont’d
libcpp               source for C preprocessing library
libffi               Foreign Function Interface library (allows
                     function callers and receivers to have
                     different calling conventions)
libiberty            useful utility routines (symbol tables etc.)
                     used by GCC and replacement functions
                     for common things not provided by host
libjava              source for standard Java library
libmudflap           source for a pointer instrumentation library
libstdc++-v3         source for standard C++ library
maintainer-scripts   utility scripts for GCC maintainers
zlib                 compression library source
Front-end
Middle-end Back-end
The GCC Front-End
Option processing
Controlling drivers and hooking up front-ends
The C, C++, and Java front-ends
The GENERIC high-level intermediate representation
The GCC Front-End
• gcc, g++, gcj driver entry point
 main (gcc/gcc.c)
• cc1, cc1plus, jc1 share a common entry point
 toplev_main (gcc/toplev.c)
• actual main in gcc/main.c
– just calls toplev_main()
– can be overridden by front-end
Command-line option processing
• In gcc/ directory
common.opt      option definitions
opts.{c,h}      common_handle_option()
c-opts.c        c_common_handle_option()
c.opt           C compiler option definitions
java/lang.opt   Java compiler option definitions
java/lang.c     java_handle_option()
• These are cc1, cc1plus, jc1 option handling routines
 drivers just pass on arguments as declared in spec files
common.opt
• Parsed by awk scripts at build time to generate
options.c, options.h
• Simple format
 Language specifications and option stanzas
• Each option stanza contains
1. option name
2. space-separated options list
3. documentation string for --help output
Properties of command-line options
• Available properties for use in .opt option spec files are
Common            option is available for all front-ends
Target            option is target-specific
Joined            argument is mandatory and may be joined
Separate          argument is mandatory and may be separate
JoinedOrMissing   optional argument, must be joined if present
RejectNegative    there is not an associated “no-” option
UInteger          argument expected is a nonnegative integer
Undocumented      undocumented; do not include in --help output
Report            -fverbose-asm should report the state of this option
Properties of options cont’d
Var(var-name)     set var-name to true (or argument) if present
VarExists         do not define variable in resulting options.c
Init(value)       static initializer for variable
Mask(name)        associated with a bit in target_flags bit vector;
                  MASK_name is automatically #defined to the
                  bitmask; TARGET_name is automatically #defined
                  as an expression that is 1 when the option is used,
                  0 when not
InverseMask(other, [this])
                  option is inverse of another option with
                  Mask(other); if this is given, #define
                  TARGET_this.
MaskExists        don’t #define again; use for synonymous options
Condition(cond)   option permitted iff preprocessor cond is true
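The Mask(name) entry can be pictured with a small sketch. The option name FOO, its bit position, and the set_foo helper are hypothetical; in a real build the corresponding definitions are generated from the .opt files rather than written by hand.

```cpp
// Sketch of what Mask(FOO) amounts to for a hypothetical option "FOO".
// The bit position and the setter below are illustrative assumptions.
int target_flags = 0;   // stand-in for the target_flags bit vector

#define MASK_FOO   (1 << 0)                            // bitmask for the option
#define TARGET_FOO ((target_flags & MASK_FOO) != 0)    // 1 when the option is used, 0 when not

// Hypothetical helper: roughly what option handling does when the
// option (or its "no-" form) appears on the command line.
inline void set_foo(bool enabled) {
  if (enabled)
    target_flags |= MASK_FOO;
  else
    target_flags &= ~MASK_FOO;
}
```

Back-end code would then test TARGET_FOO instead of inspecting target_flags directly.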
Language-specific options
• gcc/c.opt, gcc/java/lang.opt, gcc/cp/lang.opt
• Special processing in gcc/java/lang.c
• Specify valid language-names as an option
Adding command-line options
Controlling the drivers: spec files
specs for gcc driver              gcc/gcc.c
additional specs for g++ driver   gcc/cp/lang-specs.h
additional specs for gcj driver   gcc/java/lang-specs.h
%{E|M|MM:%(trad_capable_cpp) %(cpp_options) %(cpp_debug_options)}
%{!E:%{!M:%{!MM:
%{traditional|ftraditional:
%eGNU C no longer supports -traditional without -E}
%{save-temps|traditional-cpp|no-integrated-cpp:%(trad_capable_cpp)
%(cpp_options) -o %{save-temps:%b.i} %{!save-temps:%g.i} \n
cc1 -fpreprocessed %{save-temps:%b.i} %{!save-temps:%g.i}
%(cc1_options)}
%{!save-temps:%{!traditional-cpp:%{!no-integrated-cpp:
cc1 %(cpp_unique_options) %(cc1_options)}}}
%{!fsyntax-only:%(invoke_as)}}}
adapted from gcc/gcc.c
gcc/gcc.c contains documentation on spec language
Use -dumpspecs to see specifications
The C front-end
• C front-end is in gcc/ directory
 parse entry point c_common_parse_file (c-opts.c)
• workhorse is c_parse_file (c-parser.c)
c-common.def     IR codes for C compiler
c-common.c       functions for C-like front-ends
c-convert.c      type conversion
c-cppbuiltin.c   built-in preprocessor #defines
c-decl.c         declaration handling
c-dump.c         IR-dumping
c-errors.c       pedantic warning issuance
c-format.c       format checking for printf-like functions
c-gimplify.c     lowering of IR (and documentation)
The C front-end, cont’d
c-incpath.c        include path generation for preprocessor
c-lang.c           language infrastructure, front-end hookups
c-lex.c            lexical analyzer (manually coded)
c-objc-common.c    some functions for C and Objective-C
c-opts.c           option processing, some init stuff
c-parser.c         parser (based on an old bison parser)
c-pch.c            precompiled header support
c-ppoutput.c       preprocessing-only support (-E option)
c-pragma.c         support for #pragma pack and #pragma weak
c-pretty-print.c   used to pretty-print expressions in error messages
c-semantics.c      statement list handling in IR
c-typeck.c         functions to build IR, type checks
gccspec.c          driver-specific tasks for gcc driver
The C++ front-end
• In subdirectory gcc/cp/
 same parse entry point as C compiler
call.c               function/method invocation lookup and handling
class.c              building (the runtime artifacts of) classes etc.
cp-gimplify.c        IR lowering
cp-lang.c            language hooks for C++ front-end
cp-objcp-common.c    common bits for C++ and Objective-C++
cvt.c                type conversion
cxx-pretty-print.c   C++ pretty-printer
decl.c               declaration and variable handling
decl2.c              additional declaration and variable handling
dump.c               IR dumping
The C++ front-end, cont’d
error.c         C++ error-reporting callbacks
except.c        C++ exception-handling support
expr.c          IR lowering for C++
friend.c        C++ “friend” support
init.c          data initializers and constructors
lex.c           the C++ lexical analyzer
mangle.c        C++ name mangling
method.c        method handling; default constructor generation
name-lookup.c   context-aware name (type, var, namespace) lookup
optimize.c      constructor/destructor cloning
parser.c        the C++ parser
pt.c            parameterized type (template) support
ptree.c         IR pretty-printing
repo.c          C++ template repository support
The C++ front-end, cont’d
rtti.c        support for run-time type information
search.c      type search in the presence of multiple inheritance
semantics.c   semantic checking
tree.c        C++ front-end specific IR functionality
typeck.c      functionality dealing with types, conversion
typeck2.c     types, conversion, type errors
g++spec.c     driver-specific tasks for g++ driver
The Java front-end
• In subdirectory gcc/java/
 parse entry point java_parse_file (jcf-parse.c)
boehm.c           per-type bitmask building for Boehm GC
buffer.{c,h}      expandable buffer data type
builtins.c        builtin/inline functions for Java (like Math.min())
check-init.c      checks over IR for uninitialized variables
class.c           IR building of classes, class-references, vtables, etc.
constants.c       class file constant pool handling
decl.c            Java declaration support (misc.)
except.c          Java exception support
expr.c            Java expressions (misc.)
gjavah.c          source for gcjh program
java-gimplify.c   IR lowering
The Java front-end
jcf-depend.c    class file dependency tracking
jcf-dump.c      source for jcf-dump program
jcf-io.c        class file I/O utility functions
jcf-parse.c     entry point for compiling Java files
jcf-path.c      CLASSPATH-sensitive search
jcf-reader.c    generic, pluggable class file reader
jcf-write.c     class file writer
jv-scan.c       source for jv-scan program
jvgenmain.c     source for jvgenmain program
jvspec.c        Java option specs
lang.c          language hooks, options processing
lex.c           Java lexical analyzer
mangle.c        symbol-mangling routines
mangle_name.c   symbol-mangling routines
The Java front-end
parse-scan.y    minimal, fast parser for syntax checking
parse.y         Java (source-language) parser
resource.c      support for --resource option
typeck.c        routines related to types and type conversion
verify-glue.c   interface between verifier and compiler
verify-impl.c   bytecode verifier
win32-host.c    for Windows; case-sensitive filename matching
zextract.c      read class files from zip/jar archives
keyword.gperf   Java keyword specification
Multiple “front-ends” for Java
• common entry point at java_parse_file
 gcc/java/jcf-parse.c
• compile .java → .o
 gcc/java/parse.y
• compile .class → .o (or .jar → .so)
 gcc/java/expr.c (with gcc/java/jcf-reader.c)
 expand_byte_code, process_jvm_instruction
• compile .java → .class (with -C option)
 gcc/java/parse.y with flag_emit_class_files set
 unusual back-end (as if syntax checking only)
The “treelang” front end:
Essential front-end components
• configure fragment (config-lang.in)
• language-specific options (lang.opt)
• filename handling for driver (lang-specs.h)
• treelang-specific tree codes (treelang-tree.def)
• front-end hookups to toplev.c (treetree.c)
 see gcc/langhooks.h for documentation
• flex scanner (lex.l)
• bison parser (parse.y)
• structural functions (tree1.c)
Adding a new
front-end to GCC
GENERIC trees
• Front-ends are written in C !
• We’d like to have…
 tree node base class
• subclasses for expressions etc.
• Instead we have
 union tree_node (gcc/tree.h)
• each field is a struct component of the union
Structs vs. unions
[Figure: memory layout from low memory to high memory.
 struct: field 1, field 2, field 3, field 4 laid out one after another
 union: field 1, field 2, field 3, field 4 all start at the same address]
fields overlap in memory;
you’re on your own for type safety!
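A minimal sketch of the contrast, assuming four made-up fields of different types (the names Fields, Overlap, and f1 to f4 are illustrative only):

```cpp
#include <cstddef>

// Four hypothetical fields, mirroring the "field 1..4" boxes on the slide.
struct Fields  { char f1; int f2; double f3; short f4; };   // laid out one after another
union  Overlap { char f1; int f2; double f3; short f4; };   // all start at the same address

// A struct needs room for every field (plus padding); a union only for its largest.
static_assert(sizeof(Fields) >= sizeof(char) + sizeof(int) + sizeof(double) + sizeof(short),
              "struct holds all fields");
static_assert(sizeof(Overlap) >= sizeof(double), "union holds its largest field");

// Returns true when the union members share one address.
inline bool union_fields_overlap() {
  Overlap u;
  u.f3 = 0.0;
  return static_cast<void *>(&u.f1) == static_cast<void *>(&u.f3)
      && static_cast<void *>(&u.f2) == static_cast<void *>(&u.f4);
}
```

Nothing in the union records which field was written last, which is why the node kinds that follow carry a code of their own.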
The tree_node union
Everything is a tree!
typedef union tree_node *tree;
[Figure: union tree_node, whose members common, int_cst, type, identifier, field_decl, exp, … overlap, all starting at the same (low) address]
The tree_node union
• The “common” part contains
 code (kind of tree – declaration, expression, etc.)
 chain (for linking trees together)
 type (type of the represented item – also a tree)
 flags
• side effects
• addressable
• access flags (used for other things in non-declarations)
• 7 language-specific flags
Macros for accessing tree parts
• In the common part
 TREE_*
• TREE_CODE(tree)
• TREE_TYPE(tree)
• TREE_SIDE_EFFECTS(tree) etc.
• For specific trees
 type trees
• TYPE_*
– TYPE_FIELDS(tree)   gets a list of fields in the type
– TYPE_NAME(tree)     gets the type’s associated decl
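The pattern of a shared "common" prefix plus accessor macros can be sketched with a toy union. Everything prefixed toy_ or TOY_ is invented for illustration; the real definitions are in gcc/tree.h and are far larger. The sketch repeats the leading fields in each struct so that reading them through any union member is well-defined C++.

```cpp
#include <cassert>

union toy_node;                                  // forward declaration

enum toy_code { TOY_INTEGER_CST, TOY_IDENTIFIER };

// Every node kind starts with the same three fields.
struct toy_common     { toy_code code; toy_node *chain; toy_node *type; };
struct toy_int_cst    { toy_code code; toy_node *chain; toy_node *type; long value; };
struct toy_identifier { toy_code code; toy_node *chain; toy_node *type; const char *name; };

union toy_node {
  toy_common     common;
  toy_int_cst    int_cst;
  toy_identifier identifier;
};
typedef union toy_node *toy_tree;

// Accessor macros in the style of TREE_CODE and TREE_TYPE.
#define TOY_CODE(t) ((t)->common.code)
#define TOY_TYPE(t) ((t)->common.type)

// Build an integer-constant node.
inline toy_node toy_make_int(long v) {
  toy_node n;
  n.int_cst.code = TOY_INTEGER_CST;
  n.int_cst.chain = nullptr;
  n.int_cst.type = nullptr;
  n.int_cst.value = v;
  return n;
}

// The union gives no type safety, so check the code before touching
// kind-specific fields.
inline long toy_int_value(toy_tree t) {
  assert(TOY_CODE(t) == TOY_INTEGER_CST);
  return t->int_cst.value;
}
```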
Expression trees
• Lots of tree codes used for expressions
 gcc/tree.def defines all standard tree codes
 LT_EXPR           less-than conditional
 TRUTH_ORIF_EXPR   short-circuiting OR conditional
 MODIFY_EXPR       assignment
 NOP_EXPR          type promotion (typically)
 SAVE_EXPR         store in temporary for multiple uses
 ADDR_EXPR         take address of
• Front-end extensions to GENERIC permitted
 gcc/c-common.def
 gcc/cp/cp-tree.def       e.g. DYNAMIC_CAST_EXPR
 gcc/java/java-tree.def   e.g. SYNCHRONIZED_EXPR
A few useful front-end functions
build() expression tree building – pass tree code,
tree type, and (arbitrary number of) operands
fold() simple tree restructuring and optimization;
mostly useful for constant folding
gcc_assert() assertion verification – if it fails it
gives an “internal compiler error” report with source
file and line number under compilation (as well as
source file and line number in compiler code)
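To make the division of labor concrete, here is a toy sketch of build-like and fold-like helpers over a tiny expression type. The Expr type, the three codes, and the helper names are hypothetical and much simpler than GCC's own functions; assert stands in for gcc_assert().

```cpp
#include <cassert>
#include <memory>
#include <utility>

enum class Code { IntCst, Plus, Mult };          // hypothetical tree codes

struct Expr {
  Code code;
  long value;                                    // meaningful only for IntCst
  std::shared_ptr<Expr> op0, op1;                // operands of Plus / Mult
};
using ExprPtr = std::shared_ptr<Expr>;

// Leaf constructor for integer constants.
inline ExprPtr build_int(long v) {
  return std::make_shared<Expr>(Expr{Code::IntCst, v, nullptr, nullptr});
}

// build-like helper: takes a code and two operands.
inline ExprPtr build2(Code code, ExprPtr a, ExprPtr b) {
  assert(code != Code::IntCst && a && b);        // sanity check on the IR
  return std::make_shared<Expr>(Expr{code, 0, std::move(a), std::move(b)});
}

// fold-like helper: bottom-up constant folding.
inline ExprPtr fold_expr(const ExprPtr &e) {
  if (e->code == Code::IntCst)
    return e;
  ExprPtr a = fold_expr(e->op0);
  ExprPtr b = fold_expr(e->op1);
  if (a->code == Code::IntCst && b->code == Code::IntCst)
    return build_int(e->code == Code::Plus ? a->value + b->value
                                           : a->value * b->value);
  return build2(e->code, a, b);
}
```

Folding the tree for 2 + 3 * 4 collapses it to a single constant node.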
Code naming conventions
• Preprocessor macros ALL UPPERCASE
• Variables/functions all lowercase with underscores
• Predicates end in “_P” or “_p”
• Global flags start with “flag_”
• Global trees (vary somewhat with front-end)
 null_node (or null_pointer_node)
 integer_zero_node
 void_type_node
 integer_unsigned_type_node (or unsigned_int_type_node)
• Tree accessor macros FROM_TO (e.g. TYPE_DECL)
Modifying the
front-end
Gimplification
• GENERIC + extensions → GIMPLE
 GIMPLE is a subset of GENERIC
 based on SIMPLE from McGill’s McCAT group
• GIMPLE is just like GENERIC but
 no language extensions
• front-end gimplify_expr callback
 3-address form (with temporary variables)
 control structures lowered to goto
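The last two points can be illustrated by hand. The sketch below pairs an ordinary function with a manually lowered version that uses one operation per statement, explicit temporaries, and goto. The function and the temporary names t1 and t2 are made up, and a real GIMPLE dump will look different.

```cpp
// Structured source form.
inline int accumulate(int n, int a, int b) {
  int total = 0;
  for (int i = 0; i < n; ++i)
    total = total + a * b + i;
  return total;
}

// Hand-lowered form: 3-address statements and control flow via goto.
inline int accumulate_lowered(int n, int a, int b) {
  int total = 0;
  int i = 0;
  int t1, t2;            // temporaries introduced by the lowering
loop_test:
  if (i >= n) goto done;
  t1 = a * b;            // at most one operation per statement
  t2 = total + t1;
  total = t2 + i;
  i = i + 1;
  goto loop_test;
done:
  return total;
}
```

Both versions compute the same value; only the shape of the code differs.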
Front-end
Middle-end
Back-end
GCC Middle-End
Optimization of trees
Static Single-Assignment form
The Register Transfer Language
intermediate representation
The middle-end in context
[Figure: pipeline labeled Front-end, Middle-end, Back-end: Gimplification → Tree optimizations → Expansion into RTL → RTL passes → Register allocation → RTL passes]
Optimizations over the
tree representation
• Managed by pass manager in gcc/passes.c
 init_optimization_passes orders the passes
 passes represented by a tree_opt_pass struct
(tree-pass.h) even though it does RTL now too
• “gate” function – whether or not to run optimization
• “execute” function – implementation of pass
• property bitmaps
– properties required, destroyed, and created
• “todo” bitmaps
– run internal GC, dump the tree, verify SSA form, etc.
Passes and subpasses
• Passes can be used to group subpasses
• all_passes contains all_optimization_passes
 all_optimization_passes has optimizations in order
• pass_tree_loop contains loop optimizations
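A rough sketch of how gate functions, execute functions, and subpasses fit together. The Pass struct below is a simplified stand-in, not the tree_opt_pass layout, and the flag flag_toy_loops and the subpass names loop_init and loop_done are hypothetical labels chosen for the example.

```cpp
#include <string>
#include <vector>

struct Pass {
  std::string name;
  bool (*gate)();                                     // nullptr means "always run"
  void (*execute)(const std::string &name,
                  std::vector<std::string> &log);     // nullptr for a pure grouping pass
  std::vector<Pass> sub;                              // subpasses, in order
};

// Run passes in order; a closed gate skips the pass and all its subpasses.
inline void run_passes(const std::vector<Pass> &passes, std::vector<std::string> &log) {
  for (const Pass &p : passes) {
    if (p.gate && !p.gate())
      continue;
    if (p.execute)
      p.execute(p.name, log);
    run_passes(p.sub, log);
  }
}

bool flag_toy_loops = true;                           // hypothetical global flag

inline bool gate_loops() { return flag_toy_loops; }
inline void record(const std::string &name, std::vector<std::string> &log) {
  log.push_back(name);                                // a real pass would transform the IR
}

// A small pipeline: two ordinary passes around one grouping pass.
inline std::vector<Pass> example_pipeline() {
  Pass loop{"loop", gate_loops, nullptr, {}};
  loop.sub.push_back(Pass{"loop_init", nullptr, record, {}});
  loop.sub.push_back(Pass{"loop_done", nullptr, record, {}});
  std::vector<Pass> all;
  all.push_back(Pass{"ccp", nullptr, record, {}});
  all.push_back(loop);
  all.push_back(Pass{"dce", nullptr, record, {}});
  return all;
}
```

With the flag cleared, the grouping pass is gated off and none of its subpasses run.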
Adding a tree optimization pass
Debugging middle-end tree passes
Command-line options for dumping trees:
-fdump-tree-X           output after pass X
-fdump-tree-original    output initial tree (before all opts)
-fdump-tree-optimized   output final GIMPLE (after all opts)
-fdump-tree-gimple      dump before & after gimplification
-fdump-tree-inlined     output after function inlining
-fdump-tree-all         output after each pass
(Make sure you specify an -O level or you might not get anything.)
Passes available for dumping in GCC 4.1.1 (see info page):
cfg, vcg, ch, ssa, salias, alias, ccp, storeccp, pre, fre, copyprop,
store_copyprop, dce, mudflap, sra, sink, dom, dse, phiopt, forwprop,
copyrename, nrv, vect, vrp
Debugging middle-end tree passes
Can specify options for tree dumps:
• address   print address of each tree node
• slim      less output; don’t dump all scope bodies
• raw       raw tree output (rather than pretty-printed C-like trees)
• details   detailed output (not supported by all passes)
• stats     statistics (not supported by all passes)
• blocks    basic block boundaries
• vops      output virtual operands for each statement
• lineno    output line #s
• uid       output decl’s unique ID along with each variable
• all       all except raw, slim, and lineno
e.g.
-fdump-tree-dse-details   detailed post-DSE output
-fdump-tree-all-all       (almost) everything
Static Single-Assignment (SSA) form
Cytron et al. Efficiently computing static single
assignment form and the control dependence graph.
ACM TOPLAS, October 1991.
• (Pure) functional languages have nice properties for
optimization
 single-assignment: one assignment to each variable
 static single-assignment: next best thing
• each variable assigned at one static location in the program
 makes it clearer where data is produced
• reduces complexity of many optimization algorithms
• removes association of variable uses over its lifetime
SSA renaming (1)
y = 10;
/* compute 2^y */
x = 1;
while (y > 0) {
  x = x * 2;
  y = y - 1;
}
model control flow
[Figure: flow graph: (y = 10; x = 1) → test “y <= 0 ?”; true → EXIT; false → (x = x * 2; y = y - 1) → back to the test]
SSA renaming (2)
y1 = 10;
/* compute 2^y */
x1 = 1;
while (y1 > 0) {
  x2 = x1 * 2;
  y2 = y1 - 1;
}
version all variables
[Figure: flow graph: (y1 = 10; x1 = 1) → test “y1 <= 0 ?”; true → EXIT; false → (x2 = x1 * 2; y2 = y1 - 1) → back to the test]
SSA renaming (3)
y1 = 10;
/* compute 2^y */
x1 = 1;
while (true) {
  x3 = φ(x1, x2);
  y3 = φ(y1, y2);
  if (y3 <= 0)
    break;
  x2 = x3 * 2;
  y2 = y3 - 1;
}
insert “phi” nodes
[Figure: flow graph: (y1 = 10; x1 = 1) → (x3 = φ(x1, x2); y3 = φ(y1, y2)) → test “y3 <= 0 ?”; true → EXIT; false → (x2 = x3 * 2; y2 = y3 - 1) → back to the φ block]
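The renamed loop can be run as ordinary code if φ is modeled explicitly. In this sketch the phi helper and the came_from_entry flag are inventions for illustration: a real φ node has no run-time cost, and the choice of argument follows from the incoming control-flow edge.

```cpp
// phi() picks the value that matches the edge control arrived on.
constexpr int phi(bool came_from_entry, int entry_value, int loop_value) {
  return came_from_entry ? entry_value : loop_value;
}

// Computes 2^10 using the versioned names from the slides.
inline int pow2_in_ssa_style() {
  const int y1 = 10;
  const int x1 = 1;
  int x2 = 0, y2 = 0;             // assigned at exactly one place, in the loop body
                                  // (the 0 is only a placeholder)
  bool came_from_entry = true;    // stands in for "which edge did we take?"
  while (true) {
    const int x3 = phi(came_from_entry, x1, x2);
    const int y3 = phi(came_from_entry, y1, y2);
    if (y3 <= 0)
      return x3;
    x2 = x3 * 2;
    y2 = y3 - 1;
    came_from_entry = false;      // later iterations arrive over the back edge
  }
}
```

Each name has a single static definition even though x2 and y2 receive a new value on every iteration.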
Into and out of SSA form in GCC
[Figure: pass_build_ssa (gcc/tree-into-ssa.c) → SSA optimizations → pass_del_ssa (gcc/tree-outof-ssa.c)]
Dealing with SSA form in GCC
• Given a tree node n with code = PHI_NODE
PHI_RESULT(n)        get lhs of φ
PHI_NUM_ARGS(n)      get rhs count
PHI_ARG_DEF(n, i)    get ssa-name
PHI_ARG_EDGE(n, i)   get edge
PHI_ARG_ELT(n, i)    tuple (ssa-name, edge)
• Given a tree node n with code = SSA_NAME
SSA_NAME_DEF_STMT(n)   get defining statement
SSA_NAME_VERSION(n)    get SSA version #
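As a mental model for these accessors, a φ node can be pictured as a result plus one (ssa-name, edge) pair per incoming edge. The Toy types, the lowercase accessor functions, and the block numbers below are hypothetical and only mirror the shape of the macros, not their implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ToyEdge    { int src_block; int dest_block; };     // incoming control-flow edge
struct ToySsaName { std::string var; int version; };      // e.g. x, version 3
struct ToyPhiArg  { ToySsaName def; ToyEdge edge; };       // the (ssa-name, edge) tuple

struct ToyPhi {
  ToySsaName result;                                       // left-hand side of the φ
  std::vector<ToyPhiArg> args;                             // one argument per incoming edge
};

inline const ToySsaName &phi_result(const ToyPhi &p) { return p.result; }
inline std::size_t phi_num_args(const ToyPhi &p) { return p.args.size(); }
inline const ToyPhiArg &phi_arg_elt(const ToyPhi &p, std::size_t i) {
  assert(i < p.args.size());
  return p.args[i];
}
inline const ToySsaName &phi_arg_def(const ToyPhi &p, std::size_t i) { return phi_arg_elt(p, i).def; }
inline const ToyEdge &phi_arg_edge(const ToyPhi &p, std::size_t i) { return phi_arg_elt(p, i).edge; }

// x3 = φ(x1, x2): x1 arrives from the entry block (0), x2 from the loop body (2).
inline ToyPhi example_phi() {
  ToyPhi p;
  p.result = ToySsaName{"x", 3};
  p.args.push_back(ToyPhiArg{ToySsaName{"x", 1}, ToyEdge{0, 1}});
  p.args.push_back(ToyPhiArg{ToySsaName{"x", 2}, ToyEdge{2, 1}});
  return p;
}
```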
A few useful functions in the middle-end
walk_use_def_chains(var, func, data)
start at ssa-name var, calling func at each point up the chain;
data is a generic pointer for use by func
— see tree-ssa.c and internals docs (info gccint)
walk_dominator_tree(dom-walk-data, basic-block)
start at basic-block and walk children in dominator
relationship; dom-walk-data provides several callbacks
— see domwalk.h and internals docs (info gccint)
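The shape of a dominator-tree walk with callbacks can be sketched as follows. The Toy types and the two callback slots are assumptions made for the example; consult domwalk.h for the actual structure and callback set.

```cpp
#include <string>
#include <vector>

struct ToyBlock {
  std::string name;
  std::vector<ToyBlock *> dom_children;      // blocks immediately dominated by this one
};

struct ToyDomWalk {                          // stand-in for the callback bundle
  void (*before_children)(const ToyBlock &, std::vector<std::string> &);
  void (*after_children)(const ToyBlock &, std::vector<std::string> &);
};

// Visit bb, then recurse into the blocks it immediately dominates.
inline void toy_walk_dominator_tree(const ToyDomWalk &w, const ToyBlock &bb,
                                    std::vector<std::string> &trace) {
  if (w.before_children)
    w.before_children(bb, trace);
  for (const ToyBlock *child : bb.dom_children)
    toy_walk_dominator_tree(w, *child, trace);
  if (w.after_children)
    w.after_children(bb, trace);
}

// Example callbacks that just record the visiting order.
inline void note_enter(const ToyBlock &bb, std::vector<std::string> &t) { t.push_back("enter " + bb.name); }
inline void note_leave(const ToyBlock &bb, std::vector<std::string> &t) { t.push_back("leave " + bb.name); }
```

A pass would typically keep per-block state in the "before" callback and discard it in the "after" callback, so that facts learned in a block are visible only in the blocks it dominates.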
Implementing an optimization
from start to finish
RTL expansion and optimization
• Expansion performed by pass_expand
(gcc/cfgexpand.c)
 Back-end has a say in this
• As of GCC 4.1.x, RTL passes are carried out
by same pass manager that works on trees
• pass_final (at end) outputs assembly
Front-end Middle-end
Back-end
GCC Back-End
Register allocation
Instruction selection
Debugger support
Register allocation
• RTL pseudo-registers → hard registers
• Proceeds in several passes
1. Register class scan (preference registers)
2. Register allocation within basic blocks
3. Register allocation for remaining registers
4. Reload (renumbering, spilling)
Instruction selection
• Machine description (.md) files for target CPU
define_expand()
matches standard names and generates RTL; assists in
expansion of GIMPLE
define_insn()
matches RTL templates and generates assembly
• Internals documentation has details
A machine description tour
Debugger support
• Specifying -g to the compiler inserts debugging
symbols in the assembly output
• DWARF2 format
 embedded within ELF
 a tree of debug info entries (compilation unit at the root)
• each with a linked list of attributes
 DWARF2 manual: ftp.freestandards.org/pub/dwarf/dwarf-2.0.0.pdf
• Once assembled, “readelf -w” interprets them
Runtime Issues
Object layout
Virtual method lookup
The Boehm garbage collector
crt stuff
Simple object layout (C++)
class A {
public:
  int x;
  virtual void myMethod();
  virtual void other();
};
class B : public A {
public:
  int y;
  virtual void myMethod();
  virtual void third();
};
[Figure: vtable for A: A::myMethod, A::other; instances of A: vtable pointer, x.
 vtable for B: B::myMethod, A::other, B::third; instances of B: vtable pointer, x, y]
GCC Runtime Issues
74
22 October 2006
Demystifying GCC
Morgan Deters and Ron K. Cytron
Simple object layout (C++)
 subobject A of B: the leading vtable pointer and x within instances of B
 sub-vtable A of B: the leading B::myMethod and A::other entries within the vtable for B
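The consequences of this layout can be observed from ordinary C++. The sketch below reuses the classes from the slide but, as an assumption made for testability, gives the methods `int` return values and inline bodies. The helper names are hypothetical, and the address and size checks reflect what common ABIs (including the one GCC uses) do rather than anything the C++ standard promises.

```cpp
#include <cassert>
#include <cstddef>

class A {
public:
    int x = 0;
    virtual int myMethod() { return 1; }
    virtual int other() { return 2; }
};

class B : public A {
public:
    int y = 0;
    int myMethod() override { return 3; }  // replaces A's slot in B's vtable
    virtual int third() { return 4; }      // appended after the inherited slots
};

// Calls go through the vtable, so the dynamic type decides which body runs.
int call_my_method(A& a) { return a.myMethod(); }
int call_other(A& a) { return a.other(); }

// With single inheritance the A subobject typically sits at the start of B,
// so converting B* to A* needs no address change.
bool a_subobject_at_offset_zero(B& b) {
    return static_cast<void*>(static_cast<A*>(&b)) == static_cast<void*>(&b);
}
```

Because `sizeof(A)` exceeds `sizeof(int)`, the hidden vtable pointer is visible in the object size even though it never appears in the source.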
Object layout (Java)
instances of B: vtable pointer, x, y
 subobject A of B: the leading vtable pointer and x
vtable for B: class B pointer, GC descriptor, finalize, hashCode, equals, toString, clone, myMethod, other, third
 sub-vtable Object of B and sub-vtable A of B are leading portions of the vtable for B
But more complicated for C++!
First, classes might not have virtual functions!
class A {
public:
  int x;
  void myMethod();
  void other();
};
class B : public A {
public:
  int y;
  void myMethod();
  void third();
};
instances of A: x
instances of B: x, y (x is subobject A of B)
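A short check of the no-virtual-functions case follows. It is a sketch under the assumption that the methods return `int` (the slide uses `void`); `call_through_base` is a hypothetical helper. The size equalities hold on common ABIs such as GCC's but are not guaranteed by the language.

```cpp
#include <cassert>
#include <type_traits>

class A {
public:
    int x;
    int myMethod() { return 1; }
    int other() { return 2; }
};

class B : public A {
public:
    int y;
    int myMethod() { return 3; }  // hides A::myMethod; there is no overriding
    int third() { return 4; }
};

// No virtual functions means no vtable and no vtable pointer in the object.
static_assert(!std::is_polymorphic<A>::value, "A has no virtual functions");
static_assert(!std::is_polymorphic<B>::value, "B has no virtual functions");

// The call is bound at compile time from the static type A.
int call_through_base(A& a) { return a.myMethod(); }
```

Passing a `B` to `call_through_base` still runs `A::myMethod`, which is the behavioral counterpart of the vtable-free layout.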
But more complicated for C++!
Second, classes might have multiple bases!
class A {
public:
  int x;
  virtual void one();
};
class B {
public:
  int y;
  virtual void two();
};
class C : public A, public B {
public:
  int z;
  virtual void three();
};
vtable for A: A::one
instances of A: vtable pointer, x
vtable for B: B::two
instances of B: vtable pointer, y
vtable for C: ?
instances of C: ?
Object layout for multiple bases
Requires “this pointer-adjustment”
vtable for A: A::one
instances of A: vtable pointer, x
vtable for B: B::two
instances of B: vtable pointer, y
vtable for C: A::one, (blank), (blank), B::two, C::three
instances of C: vtable pointer, x (subobject A of C); vtable pointer, y (subobject B of C); z
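The pointer adjustment can be observed directly by comparing addresses before and after a conversion to each base. This is a sketch with hypothetical helper names and `int`-returning methods added for testability; the exact byte offset of the B subobject depends on the target, so only its sign is relied on here.

```cpp
#include <cassert>
#include <cstddef>

class A { public: int x = 0; virtual int one() { return 1; } };
class B { public: int y = 0; virtual int two() { return 2; } };
class C : public A, public B { public: int z = 0; virtual int three() { return 3; } };

// Byte distance from the start of the C object to its A subobject.
std::ptrdiff_t offset_of_a_in_c(C& c) {
    A* as_a = &c;  // implicit upcast
    return reinterpret_cast<char*>(as_a) - reinterpret_cast<char*>(&c);
}

// Byte distance from the start of the C object to its B subobject.
std::ptrdiff_t offset_of_b_in_c(C& c) {
    B* as_b = &c;  // implicit upcast; the compiler adds the subobject offset
    return reinterpret_cast<char*>(as_b) - reinterpret_cast<char*>(&c);
}

// Downcasting applies the opposite adjustment.
C* back_to_c(B* b) { return static_cast<C*>(b); }
```

On common ABIs the first base shares the object's address while the second does not, which is why a call to a B method on a C object must adjust `this`.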
Multiple bases, cont’d
vtable for C: A::one, [ offset = -4 ], (blank), B::two, C::three
instances of C: vtable pointer, x (subobject A of C); vtable pointer, y (subobject B of C); z
But what about dynamic_cast?!
Multiple bases, cont’d
Top-level offset
vtable for C: [ offset = 0 ], (blank), A::one, [ offset = -4 ], (blank), B::two, C::three
instances of C: vtable pointer, x (subobject A of C); vtable pointer, y (subobject B of C); z
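The stored offsets are what let `dynamic_cast` recover the complete object from a pointer to a non-first base. A minimal sketch, with hypothetical helper names and the same three classes (method bodies added so the example links):

```cpp
#include <cassert>

class A { public: int x = 0; virtual int one() { return 1; } };
class B { public: int y = 0; virtual int two() { return 2; } };
class C : public A, public B { public: int z = 0; virtual int three() { return 3; } };

// dynamic_cast<void*> yields the address of the most-derived object,
// which requires knowing how far the B subobject is from the top.
void* most_derived_address(B* b) { return dynamic_cast<void*>(b); }

// A cross-cast from one base to a sibling base; null if the object has no A part.
A* cross_cast_to_a(B* b) { return dynamic_cast<A*>(b); }
```

Given a `B*` that points into a `C`, `most_derived_address` returns the address of the `C` itself even though the `B*` holds a different address.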
Multiple bases, finished*
But what about C++ type info?!
vtable for C: [ offset = 0 ], ptr. typeinfo C, A::one, [ offset = -4 ], ptr. typeinfo C, B::two, C::three
instances of C: vtable pointer, x (subobject A of C); vtable pointer, y (subobject B of C); z
* there are further complications, but we’ll leave it here
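Because each sub-vtable carries a pointer to the typeinfo of the complete class, `typeid` reports the dynamic type even when applied through a base reference. A small sketch, assuming RTTI is enabled (the default) and using the hypothetical helper `refers_to_a_c`:

```cpp
#include <cassert>
#include <typeinfo>

class A { public: int x = 0; virtual int one() { return 1; } };
class B { public: int y = 0; virtual int two() { return 2; } };
class C : public A, public B { public: int z = 0; virtual int three() { return 3; } };

// For a polymorphic type, typeid on a reference inspects the dynamic type.
bool refers_to_a_c(const B& b) { return typeid(b) == typeid(C); }
```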
Java and C++ share object layout
vtable for (Java) B: [ offset = 0 ], null typeinfo, class B pointer, GC descriptor, finalize, hashCode, equals, toString, clone, myMethod, other, third
Virtual method lookup (C++, Java)
Now, virtual method invocation is a snap!
 Compiler knows method offset within vtable
 So it generates an indirect access through instance pointer…
 …and invokes the method through the pointer found in vtable
instance of B: vtable pointer, x, y
vtable for (Java) B: [ offset = 0 ], null typeinfo, class B pointer, GC descriptor, finalize, hashCode, equals, toString, clone, myMethod, other, third
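The lookup steps above can be written out by hand with a table of function pointers. This is a hand-rolled model for illustration, not the code GCC emits: `Object`, `VTable`, `Slot`, `invoke`, and the method bodies are all hypothetical, and the table omits the header entries (offset, typeinfo, class pointer, GC descriptor).

```cpp
#include <cassert>

struct Object;                     // forward declaration for the slot type
using Method = int (*)(Object*);   // assumed common signature for every slot

struct VTable {
    Method slots[3];
};

struct Object {
    const VTable* vtable;  // first word of every instance
    int x;
    int y;
};

// Slot numbers are fixed when the class is compiled.
enum Slot { kMyMethod = 0, kOther = 1, kThird = 2 };

int B_myMethod(Object* self) { return self->x + self->y; }
int A_other(Object*) { return 2; }
int B_third(Object* self) { return self->y; }

const VTable kVTableForB = {{B_myMethod, A_other, B_third}};

int invoke(Object* obj, Slot slot) {
    const VTable* vt = obj->vtable;  // indirect access through the instance pointer
    Method m = vt->slots[slot];      // load the pointer at the known offset
    return m(obj);                   // call it, passing the instance as "this"
}
```

The cost of a virtual call in this model is two loads and an indirect call, independent of how deep the class hierarchy is.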
The Boehm garbage collector
Boehm, H., Space Efficient Conservative Garbage Collection. In ACM PLDI’93.
• Conservative mark & sweep garbage collector
 designed to operate in a hostile environment as a drop-in replacement for malloc
 “conservative” means it cannot distinguish between pointers and non-pointers, so any value that looks like a pointer is treated as one
 Java is considerably less “hostile” than C/C++
• can’t hide pointers from the compiler
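To show what treating possible pointers as pointers means in practice, the toy marker below scans a set of root words and marks any heap block whose address range contains the word's value. It is a sketch of the idea only, not the Boehm collector's implementation: `ToyHeap` and `mark_from_roots` are hypothetical, blocks are fixed-size, and there is no tracing through marked blocks and no sweep phase.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical heap: `marked.size()` equal-sized blocks starting at `base`.
struct ToyHeap {
    std::uintptr_t base;
    std::size_t block_size;
    std::vector<bool> marked;
};

// Conservative root scan: the collector does not know which words are
// pointers, so every word that falls inside the heap keeps its block alive.
void mark_from_roots(ToyHeap& heap, const std::vector<std::uintptr_t>& roots) {
    const std::uintptr_t limit = heap.base + heap.block_size * heap.marked.size();
    for (std::uintptr_t word : roots) {
        if (word >= heap.base && word < limit) {
            heap.marked[(word - heap.base) / heap.block_size] = true;
        }
    }
}
```

An integer that merely happens to equal an address inside a block keeps that block alive, which is the price of not needing type information from the compiler.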
Java and Boehm GC
• Java front-end generates class pointer masks
 stows them in vtable
 computed in gcc/java/boehm.c
• Class too big for a pointer mask ?
 use a count of reference fields
 use a “mark procedure”
vtable: [ offset = 0 ], null typeinfo, class B pointer, GC descriptor, finalize, …
• Where to look
 boehm-gc/doc contains docs
 libjava/prims.cc contains GC-aware allocation routines
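A pointer mask can be pictured as one bit per object word, set when that word holds a reference. The sketch below builds such a mask and falls back to a mark procedure when the object has more words than the mask has bits. The encoding, the 64-word limit, and the names `GcDescriptor`, `DescriptorKind`, and `make_descriptor` are assumptions for illustration; the real descriptor format is defined by the Boehm collector and computed in gcc/java/boehm.c, and it differs from this one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class DescriptorKind { Bitmap, MarkProcedure };

// Hypothetical descriptor: either a bitmap of reference words or a request
// to use a separate marking routine.
struct GcDescriptor {
    DescriptorKind kind;
    std::uint64_t bits;
};

// word_is_reference[i] is true when word i of the object holds a reference.
GcDescriptor make_descriptor(const std::vector<bool>& word_is_reference) {
    constexpr std::size_t kMaxWords = 64;  // assumed capacity of the mask
    if (word_is_reference.size() > kMaxWords) {
        return {DescriptorKind::MarkProcedure, 0};  // class too big for a mask
    }
    std::uint64_t mask = 0;
    for (std::size_t i = 0; i < word_is_reference.size(); ++i) {
        if (word_is_reference[i]) mask |= (std::uint64_t{1} << i);
    }
    return {DescriptorKind::Bitmap, mask};
}
```

With a mask available, the collector can skip non-reference fields entirely, which makes marking Java objects precise rather than conservative.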
crt stuff (“C runtime”)
• crt1.o, crti.o, crtn.o* provided by glibc
 crt1.o sets up libc before main() is even invoked
 crti.o prologue for .init and .fini
 crtn.o epilogue for .init and .fini
• crtbegin.o, crtend.o* provided by GCC
 crtbegin.o contributes frame_dummy() call to .init; calls static data destructors in .fini
 crtend.o calls static data constructors in .init
 code in gcc/crtstuff.c
* and some variations
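The visible effect of this startup machinery is that constructors of static objects have already run by the time main() starts. A minimal sketch, with the hypothetical names `StartupProbe` and `constructed_before_main`:

```cpp
#include <cassert>

namespace {

bool g_constructed = false;  // constant-initialized before any constructor runs

// The constructor records that it ran; the startup code invokes it
// as part of static data construction, before main() is entered.
struct StartupProbe {
    StartupProbe() { g_constructed = true; }
};

StartupProbe g_probe;

}  // namespace

bool constructed_before_main() { return g_constructed; }
```

Calling `constructed_before_main()` as the first statement of main() returns true on a hosted GCC toolchain, since the .init code has already run the constructor.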
Language feature with
runtime support
Wrap-up
Running GCC under GDB
Obtaining development versions of GCC
Reporting bugs in GCC
What’s next for GCC
Running GCC under GDB
• Inevitably, hacking a compiler will result in
 segfault
 assertion failure
 incorrect code generation
• Remember to attach debugger to the compiler proper (e.g. cc1, cc1plus, jc1), not the driver (gcc)
• “gcc -v …” shows the command line the driver runs; then use GDB on the actual front-end
Debugging GCC
Obtaining development versions of GCC
• All GCC development is in the open
 design discussions
 change logs
 bugs
• Subversion (SVN) repository
 public read access
 for details: gcc.gnu.org/svn.html
 clients available from subversion.tigris.org/
What to do if you find a bug in GCC
• Check to see if bug is present in SVN version
• Check to see if bug is in bug database
 http://gcc.gnu.org/bugzilla/
• Collect version information (gcc --version)
• Guidelines: http://gcc.gnu.org/bugs.html
• Report it: http://gcc.gnu.org/bugzilla/
Thanks!
Ron K. Cytron
[email protected]
– Distributed Object Computing Laboratory –
Washington University
Dept. of Computer Science & Engineering
St. Louis, MO 63130 USA
Morgan Deters
[email protected]