Transcript Slide 1
Demystifying GCC
Under the Hood of the GNU Compiler Collection
Morgan Deters and Ron K. Cytron
Distributed Object Computing Laboratory, Washington University, St. Louis, MO
Tutorial at OOPSLA 2006, Portland, Oregon, 22 October 2006
Copyright is held by the author/owner(s). OOPSLA'06, October 22–26, 2006, Portland, Oregon, USA. 2006 ACM 06/0010. Copyright © 2005–2006 Morgan Deters

Tutorial Objectives
• Introduce the internals of GCC 4.1.1
  Java and C++ front-ends
  Optimizations
  Back-end structure
• How to add new or change
  language front-ends
  optimizations
  machine-specific back-ends
• How to debug/improve GCC

GCC Big Picture
  What is GCC?
  Why use GCC?
  What does compilation with GCC look like?

What is GCC?
• A compiler for multiple languages…
  C, C++, Java, Objective-C/C++, FORTRAN, Ada
• …supporting multiple targets
  arc   c4x   frv    iq2000   m68k   mn10300  rs6000  stormy16
  arm   cris  h8300  m32c     mcore  mt       s390    v850
  avr   crx   i386   m32r     mips   pa       sh      vax
  bfin  fr30  ia64   m68hc11  mmix   pdp11    sparc   xtensa
  These are code generators; variants are also supported (e.g. powerpc is a “variant” of the rs6000 code generator)

What GCC is not
• GCC is not
  an assembler (see GNU binutils)
  a C library (see glibc)
  a debugger (see gdb)
  an IDE
Advantages of using GCC as an R&D platform
• Research is immediately usable by everyone
  Large development community and user base
  GCC is a modern, practical compiler
  • multiple architectures, full standard languages, optimizations
  • debugging support
• You can meet GCC halfway
  modular: hack some parts, rely on the others
• Can incorporate bug fixes that come along
  minor version upgrades (e.g. 3.3.x → 3.4.x) – no big deal
  major version upgrades (e.g. 3.x → 4.x) – more of a pain
• Need not maintain code indefinitely (if incorporated)

The GCC project and the GPL
• Open-source, covered by the GNU General Public License (GPL)
• Any changes you make to GCC source code or associated libraries must also be GPLed
• However, compiler and libraries can be used/linked against in non-GPL development
  Your improvements to GCC must be open-source, but your customers need not open-source their programs to use your stuff

Typical structure of GCC compilation
  [figure: the gcc/g++/gcj driver runs: source program → compiler → assembly program → assembler → linker → ELF object]

Inside the compiler
  [figure: compiler (C, C++, Java): parser / semantic checker → trees → gimplifier → tree optimizations → expander → RTL → RTL passes → target arch instruction selection]

GCC Basics
  How do you build GCC?
  How do you navigate the source tree?
GCC Basics: Getting Started
• Requirements to build GCC
  usual suite of UNIX tools (C compiler, assembler/linker, GNU Make, tar, awk, POSIX shell)
• For development
  GNU m4 and GNU autotools (autoconf/automake/libtool)
  gperf
  bison, flex
  autogen, guile, gettext, perl, Texinfo, diffutils, patch, …
• Obtaining GCC sources
  gcc.gnu.org or local mirror (see gcc.gnu.org/mirrors.html)
  get gcc-core package, then language add-ons
  • gcc-java requires gcc-g++

Building GCC from sources
• Configure it in a separate build directory from sources
  /path/to/source/directory/configure options…
  --prefix=install-location
  --enable-languages=comma-separated-language-list
  --enable-checking
  • turns on sanity checks (especially on intermediate representation)
• Build it!
  Environment variables useful when debugging compiler/runtime
  • CFLAGS                stage 1 flags (using host C compiler)
  • BOOT_CFLAGS           stage 2 and stage 3 flags (using stage 1 GCC)
  • CFLAGS_FOR_TARGET     flags for new GCC building target binaries
  • CXXFLAGS_FOR_TARGET   flags for new GCC building libstdc++/others
  • GCJFLAGS              flags for new GCC building Java runtime
  • ‘-O0 -ggdb3’ is recommended when debugging

Building GCC from sources
• Build it! continued…
  make bootstrap (to bootstrap) or make (to not)
  • bootstrap useful when compiling with non-GCC host compiler
  • during development, non-bootstrap is faster and also better at recompiling just those sources that have changed
  use make’s -j option to speed things up on MP/dual core
  make bootstrap-lean
  • cleans up between stages, uses less disk
  make profiledbootstrap
  • faster compiler produced, but need GCC host
  • -j unsupported
• Install it!
  make install
Building a cross-compiler
• Code generator can be built for any target
  runtime libraries then are built using that code generator
• Since GCC outputs assembly, you actually need a full cross development toolchain
  Dan Kegel’s crosstool automates a GNU/Linux cross chain for popular configurations:
  • Linux kernel headers
  • GNU binutils
  • glibc
  • gcc
  • see kegel.com/crosstool

GCC Basics: Getting Around
• Other tools recommended when hacking GCC
  GNU Screen   attach/reattach terminal sessions
  etags        navigation to source definitions (emacs)
  ctags        navigation to source definitions (vi)
  c++filt      demangle C++/Java mangled symbols
  readelf      decompose ELF files
  objdump      object file dumper/disassembler
  gdb          GNU debugger

GCC Drivers
• gcc, g++, gcj are drivers, not compilers
  They will execute (as appropriate):
  • compiler (cc1, cc1plus, jc1)
  • Java program main entry point generation (jvgenmain)
  • assembler (as)
  • linker (collect2)
• Differences between drivers include active #defines, default libraries, other behavior
  but can use any driver for any source language

Most useful driver options for debugging
  -E                   preprocess, don’t compile
  -S                   compile, don’t assemble
  -H                   verbose header inclusion
  -save-temps          save temporary files
  -print-search-dirs   print search paths
  -v                   verbose (see what the driver does)
  -g                   include debugging symbols
  --help               get command line help
  --version            show full version info
  -dumpversion         show minimal version info

For extra help
  man gcc       basic option assistance
  info gcc      using gcc in-depth; language extensions etc.
  info gccint   internals documentation
  Top-level INSTALL directory in distribution provides help on configuring and building GCC

Tour of GCC source
  INSTALL       configuration/installation documentation
  boehm-gc      the Boehm garbage collector
  config        architecture-specific configure fragments
  contrib       contributed scripts
  fastjar       a replacement for the jar tool
  fixincludes   source for a program to fix host header files when they aren't ANSI-compliant
  gcc           the main compiler source
  include       headers used by GCC (libiberty mostly)
  intl          support for languages other than English

Tour of GCC source, cont’d
  libcpp               source for C preprocessing library
  libffi               Foreign Function Interface library (allows function callers and receivers to have different calling conventions)
  libiberty            useful utility routines (symbol tables etc.) used by GCC and replacement functions for common things not provided by host
  libjava              source for standard Java library
  libmudflap           source for a pointer instrumentation library
  libstdc++-v3         source for standard C++ library
  maintainer-scripts   utility scripts for GCC maintainers
  zlib                 compression library source

The GCC Front-End
  Option processing
  Controlling drivers and hooking up front-ends
  The C, C++, and Java front-ends
  The GENERIC high-level intermediate representation
The GCC Front-End
• gcc, g++, gcj driver
  entry point main (gcc/gcc.c)
• cc1, cc1plus, jc1 share a common entry point
  toplev_main (gcc/toplev.c)
  • actual main in gcc/main.c
    – just calls toplev_main()
    – can be overridden by front-end

Command-line option processing
• In gcc/ directory
  common.opt      option definitions
  opts.{c,h}      common_handle_option()
  c-opts.c        c_common_handle_option()
  c.opt           C compiler option definitions
  java/lang.opt   Java compiler option definitions
  java/lang.c     java_handle_option()
• These are cc1, cc1plus, jc1 option handling routines
  drivers just pass on arguments as declared in spec files

common.opt
• Parsed by awk scripts at build time to generate options.c, options.h
• Simple format
  Language specifications and option stanzas
• Each option stanza contains
  1. option name
  2. space-separated list of option properties
  3. documentation string for --help output

Properties of command-line options
• Available properties for use in .opt option spec files are
  Common            option is available for all front-ends
  Target            option is target-specific
  Joined            argument is mandatory and may be joined
  Separate          argument is mandatory and may be separate
  JoinedOrMissing   optional argument, must be joined if present
  RejectNegative    there is not an associated “no-” option
  UInteger          argument expected is a nonnegative integer
  Undocumented      undocumented; do not include in --help output
  Report            -fverbose-asm should report the state of this option
Properties of options, cont’d
  Var(var-name)                set var-name to true (or argument) if present
  VarExists                    do not define variable in resulting options.c
  Init(value)                  static initializer for variable
  Mask(name)                   associated with a bit in target_flags bit vector; MASK_name is automatically #defined to the bitmask; TARGET_name is automatically #defined as an expression that is 1 when the option is used, 0 when not
  InverseMask(other, [this])   option is inverse of another option with Mask(other); if this is given, #define TARGET_this
  MaskExists                   don’t #define again; use for synonymous options
  Condition(cond)              option permitted iff preprocessor cond is true

Language-specific options
• gcc/c.opt, gcc/java/lang.opt, gcc/cp/lang.opt
• Special processing in gcc/java/lang.c
• Specify valid language-names as an option

Adding command-line options
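The Mask(name) property described above can be pictured with a small sketch. It assumes two hypothetical target options, -mfoo and -mbar, declared with Mask(FOO) and Mask(BAR); the option names and the helper set_target_option are made up for illustration, and only the MASK_name / TARGET_name pattern and the target_flags bit vector come from the slide.

```cpp
#include <cassert>

// Bit vector holding the state of all target options (name from the slide).
static int target_flags = 0;

// What the generated definitions would look roughly like for the
// hypothetical options -mfoo (Mask(FOO)) and -mbar (Mask(BAR)).
#define MASK_FOO (1 << 0)                              // bitmask for -mfoo
#define MASK_BAR (1 << 1)                              // bitmask for -mbar
#define TARGET_FOO ((target_flags & MASK_FOO) != 0)    // 1 when -mfoo is on
#define TARGET_BAR ((target_flags & MASK_BAR) != 0)    // 1 when -mbar is on

// Hypothetical stand-in for option handling: set or clear one mask bit,
// as would happen for -mfoo versus -mno-foo.
inline void set_target_option(int mask, bool on) {
    if (on)
        target_flags |= mask;
    else
        target_flags &= ~mask;
}
```

Back-end code would then test TARGET_FOO wherever the behavior depends on the option, without caring which bit was assigned.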
Controlling the drivers: spec files
  gcc/gcc.c                 specs for gcc driver
  gcc/cp/lang-specs.h       additional specs for g++ driver
  gcc/java/lang-specs.h     additional specs for gcj driver

  Example (adapted from gcc/gcc.c):
    %{E|M|MM:%(trad_capable_cpp) %(cpp_options) %(cpp_debug_options)}
    %{!E:%{!M:%{!MM:
      %{traditional|ftraditional: %eGNU C no longer supports -traditional without -E}
      %{save-temps|traditional-cpp|no-integrated-cpp:%(trad_capable_cpp) %(cpp_options) -o %{save-temps:%b.i} %{!save-temps:%g.i} \n
        cc1 -fpreprocessed %{save-temps:%b.i} %{!save-temps:%g.i} %(cc1_options)}
      %{!save-temps:%{!traditional-cpp:%{!no-integrated-cpp:
        cc1 %(cpp_unique_options) %(cc1_options)}}}
      %{!fsyntax-only:%(invoke_as)}}}

  gcc/gcc.c contains documentation on spec language
  Use -dumpspecs to see specifications

The C front-end
• C front-end is in gcc/ directory
  parse entry point c_common_parse_file (c-opts.c)
  • workhorse is c_parse_file (c-parser.c)
  c-common.def     IR codes for C compiler
  c-common.c       functions for C-like front-ends
  c-convert.c      type conversion
  c-cppbuiltin.c   built-in preprocessor #defines
  c-decl.c         declaration handling
  c-dump.c         IR-dumping
  c-errors.c       pedantic warning issuance
  c-format.c       format checking for printf-like functions
  c-gimplify.c     lowering of IR (and documentation)
The C front-end, cont’d
  c-incpath.c        include path generation for preprocessor
  c-lang.c           language infrastructure, front-end hookups
  c-lex.c            lexical analyzer (manually coded)
  c-objc-common.c    some functions for C and Objective-C
  c-opts.c           option processing, some init stuff
  c-parser.c         parser (based on an old bison parser)
  c-pch.c            precompiled header support
  c-ppoutput.c       preprocessing-only support (-E option)
  c-pragma.c         support for #pragma pack and #pragma weak
  c-pretty-print.c   used to pretty-print expressions in error messages
  c-semantics.c      statement list handling in IR
  c-typeck.c         functions to build IR, type checks
  gccspec.c          driver-specific tasks for gcc driver

The C++ front-end
• In subdirectory gcc/cp/
  same parse entry point as C compiler
  call.c               function/method invocation lookup and handling
  class.c              building (the runtime artifacts of) classes etc.
  cp-gimplify.c        IR lowering
  cp-lang.c            language hooks for C++ front-end
  cp-objcp-common.c    common bits for C++ and Objective-C++
  cvt.c                type conversion
  cxx-pretty-print.c   C++ pretty-printer
  decl.c               declaration and variable handling
  decl2.c              additional declaration and variable handling
  dump.c               IR dumping
The C++ front-end, cont’d
  error.c         C++ error-reporting callbacks
  except.c        C++ exception-handling support
  expr.c          IR lowering for C++
  friend.c        C++ “friend” support
  init.c          data initializers and constructors
  lex.c           the C++ lexical analyzer
  mangle.c        C++ name mangling
  method.c        method handling; default constructor generation
  name-lookup.c   context-aware name (type, var, namespace) lookup
  optimize.c      constructor/destructor cloning
  parser.c        the C++ parser
  pt.c            parameterized type (template) support
  ptree.c         IR pretty-printing
  repo.c          C++ template repository support

The C++ front-end, cont’d
  rtti.c        support for run-time type information
  search.c      type search in the presence of multiple inheritance
  semantics.c   semantic checking
  tree.c        C++ front-end specific IR functionality
  typeck.c      functionality dealing with types, conversion
  typeck2.c     types, conversion, type errors
  g++spec.c     driver-specific tasks for g++ driver

The Java front-end
• In subdirectory gcc/java/
  parse entry point java_parse_file (jcf-parse.c)
  boehm.c           per-type bitmask building for Boehm GC
  buffer.{c,h}      expandable buffer data type
  builtins.c        builtin/inline functions for Java (like Math.min())
  check-init.c      checks over IR for uninitialized variables
  class.c           IR building of classes, class-references, vtables, etc.
  constants.c       class file constant pool handling
  decl.c            Java declaration support (misc.)
  except.c          Java exception support
  expr.c            Java expressions (misc.)
  gjavah.c          source for gcjh program
  java-gimplify.c   IR lowering
The Java front-end, cont’d
  jcf-depend.c    class file dependency tracking
  jcf-dump.c      source for jcf-dump program
  jcf-io.c        class file I/O utility functions
  jcf-parse.c     entry point for compiling Java files
  jcf-path.c      CLASSPATH-sensitive search
  jcf-reader.c    generic, pluggable class file reader
  jcf-write.c     class file writer
  jv-scan.c       source for jv-scan program
  jvgenmain.c     source for jvgenmain program
  jvspec.c        Java option specs
  lang.c          language hooks, options processing
  lex.c           Java lexical analyzer
  mangle.c        symbol-mangling routines
  mangle_name.c   symbol-mangling routines

The Java front-end, cont’d
  parse-scan.y    minimal, fast parser for syntax checking
  parse.y         Java (source-language) parser
  resource.c      support for --resource option
  typeck.c        routines related to types and type conversion
  verify-glue.c   interface between verifier and compiler
  verify-impl.c   bytecode verifier
  win32-host.c    for Windows; case-sensitive filename matching
  zextract.c      read class files from zip/jar archives
  keyword.gperf   Java keyword specification

Multiple “front-ends” for Java
• common entry point at java_parse_file
  gcc/java/jcf-parse.c
• compile .java → .o
  gcc/java/parse.y
• compile .class → .o (or .jar → .so)
  gcc/java/expr.c (with gcc/java/jcf-reader.c)
  expand_byte_code, process_jvm_instruction
• compile .java → .class (with -C option)
  gcc/java/parse.y with flag_emit_class_files set
  unusual back-end (as if syntax checking only)
The “treelang” front end: Essential front-end components
• configure fragment (config-lang.in)
• language-specific options (lang.opt)
• filename handling for driver (lang-specs.h)
• treelang-specific tree codes (treelang-tree.def)
• front-end hookups to toplev.c (treetree.c)
  see gcc/langhooks.h for documentation
• flex scanner (lex.l)
• bison parser (parse.y)
• structural functions (tree1.c)

Adding a new front-end to GCC

GENERIC trees
• Front-ends are written in C!
• We’d like to have…
  tree node base class
  • subclasses for expressions etc.
• Instead we have
  union tree_node (gcc/tree.h)
  • each field is a struct
  components of union

Structs vs. unions
  [figure: in a struct, field 1 through field 4 are laid out one after another from low memory to high memory; in a union, field 1 through field 4 all start at the same address]
  fields overlap in memory; you’re on your own for type safety!

The tree_node union
  Everything is a tree!
  [figure: union tree_node overlays the structs common, int_cst, type, identifier, field_decl, exp, … in the same memory]
  typedef union tree_node *tree;

The tree_node union
• The “common” part contains
  code (kind of tree – declaration, expression, etc.)
  chain (for linking trees together)
  type (type of the represented item – also a tree)
  flags
  • side effects
  • addressable
  • access flags (used for other things in non-declarations)
  • 7 language-specific flags
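To make the "union with a common part" idea concrete, here is a minimal sketch. All toy_* names are hypothetical; the real definitions live in gcc/tree.h and differ in detail. The sketch repeats the common fields at the start of each variant so that reading them through any member is well defined in C++ (the common-initial-sequence rule).

```cpp
#include <cassert>

// Hypothetical tree codes; GCC's real ones are listed in gcc/tree.def.
enum toy_code { TOY_INTEGER_CST, TOY_IDENTIFIER };

union toy_node;
typedef union toy_node *toy_tree;

// The "common" part: code, chain and type, shared by every kind of node.
struct toy_common     { toy_code code; toy_tree chain; toy_tree type; };
// Each variant starts with the same three fields, then adds its own.
struct toy_int_cst    { toy_code code; toy_tree chain; toy_tree type; long value; };
struct toy_identifier { toy_code code; toy_tree chain; toy_tree type; const char *name; };

// All variants overlap in memory; the code field says which one is valid.
union toy_node {
    toy_common     common;
    toy_int_cst    int_cst;
    toy_identifier identifier;
};

// Accessor macro in the style of TREE_CODE.
#define TOY_CODE(t) ((t)->common.code)

inline toy_node toy_make_int(long v) {
    toy_node n;
    n.int_cst.code = TOY_INTEGER_CST;
    n.int_cst.chain = nullptr;
    n.int_cst.type = nullptr;
    n.int_cst.value = v;
    return n;
}

inline toy_node toy_make_identifier(const char *name) {
    toy_node n;
    n.identifier.code = TOY_IDENTIFIER;
    n.identifier.chain = nullptr;
    n.identifier.type = nullptr;
    n.identifier.name = name;
    return n;
}

// Type safety is manual: check the code before touching variant fields.
inline long toy_int_value(toy_tree t) {
    assert(TOY_CODE(t) == TOY_INTEGER_CST);
    return t->int_cst.value;
}
```

The assertion in toy_int_value plays the role that checked accessors play when GCC is configured with --enable-checking: nothing in the language stops code from reading the wrong variant, so the check has to be explicit.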
Macros for accessing tree parts
• In the common part
  TREE_*
  • TREE_CODE(tree)
  • TREE_TYPE(tree)
  • TREE_SIDE_EFFECTS(tree) etc.
• For specific trees
  type trees
  • TYPE_*
    – TYPE_FIELDS(tree)   gets a list of fields in the type
    – TYPE_NAME(tree)     gets the type’s associated decl

Expression trees
• Lots of tree codes used for expressions
  gcc/tree.def defines all standard tree codes
  LT_EXPR            less-than conditional
  TRUTH_ORIF_EXPR    short-circuiting OR conditional
  MODIFY_EXPR        assignment
  NOP_EXPR           type promotion (typically)
  SAVE_EXPR          store in temporary for multiple uses
  ADDR_EXPR          take address of
• Front-end extensions to GENERIC permitted
  gcc/c-common.def
  gcc/cp/cp-tree.def         e.g. DYNAMIC_CAST_EXPR
  gcc/java/java-tree.def     e.g. SYNCHRONIZED_EXPR

A few useful front-end functions
  build()        expression tree building: pass tree code, tree type, and (arbitrary number of) operands
  fold()         simple tree restructuring and optimization; mostly useful for constant folding
  gcc_assert()   assertion verification: if it fails it gives an “internal compiler error” report with source file and line number under compilation (as well as source file and line number in compiler code)

Code naming conventions
• Preprocessor macros ALL UPPERCASE
• Variables/functions all lowercase with underscores
• Predicates end in “_P” or “_p”
• Global flags start with “flag_”
• Global trees (vary somewhat with front-end)
  null_node (or null_pointer_node)
  integer_zero_node
  void_type_node
  integer_unsigned_type_node (or unsigned_int_type_node)
• Tree accessor macros FROM_TO (e.g. TYPE_DECL)
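As a rough illustration of what build() and fold() do for expression trees, the sketch below builds a tiny tree and folds constant subexpressions. It is a toy under stated assumptions: the toy_* names, the shared_ptr representation, and the recursive folding strategy are all invented for this example and are not GCC's implementation.

```cpp
#include <cassert>
#include <memory>

// Hypothetical tree codes, loosely modeled on INTEGER_CST, PLUS_EXPR, MULT_EXPR.
enum toy_code { TOY_INTEGER_CST, TOY_PLUS_EXPR, TOY_MULT_EXPR };

struct toy_expr {
    toy_code code;
    long value;                            // meaningful only for TOY_INTEGER_CST
    std::shared_ptr<toy_expr> op0, op1;    // operands of binary expressions
};
typedef std::shared_ptr<toy_expr> toy_tree;

// Counterpart of building an integer constant node.
inline toy_tree toy_build_int(long v) {
    return std::make_shared<toy_expr>(toy_expr{TOY_INTEGER_CST, v, nullptr, nullptr});
}

// Counterpart of build(): pass a tree code and the operands.
inline toy_tree toy_build(toy_code code, toy_tree a, toy_tree b) {
    return std::make_shared<toy_expr>(toy_expr{code, 0, a, b});
}

// Counterpart of fold(): fold the operands first, then replace the node
// by a constant when both operands turned out to be constants.
inline toy_tree toy_fold(const toy_tree &t) {
    if (t->code == TOY_INTEGER_CST)
        return t;
    toy_tree a = toy_fold(t->op0);
    toy_tree b = toy_fold(t->op1);
    if (a->code == TOY_INTEGER_CST && b->code == TOY_INTEGER_CST) {
        long v = (t->code == TOY_PLUS_EXPR) ? a->value + b->value
                                            : a->value * b->value;
        return toy_build_int(v);
    }
    return toy_build(t->code, a, b);
}
```

For example, building (2 + 3) * 4 and passing it to toy_fold yields a single constant node, which is the kind of simplification the slide means by "mostly useful for constant folding".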
Modifying the front-end

Gimplification
• GENERIC + extensions → GIMPLE
  GIMPLE is a subset of GENERIC, based on SIMPLE from McGill’s McCAT group
  • GIMPLE is just like GENERIC but no language extensions
  • front-end gimplify_expr callback
  3-address form (with temporary variables)
  control structures lowered to goto

GCC Middle-End
  Optimization of trees
  Static Single-Assignment form
  The Register Transfer Language intermediate representation

The middle-end in context
  [figure: Front-end → Gimplification → (trees) → Tree optimizations → Expansion into RTL → (RTL) → RTL passes → Register allocation → (RTL) → RTL passes → Back-end; everything from gimplification through the RTL passes is the middle-end]

Optimizations over the tree representation
• Managed by pass manager in gcc/passes.c
  init_optimization_passes orders the passes
  passes represented by a tree_opt_pass struct (tree-pass.h), even though it does RTL now too
  • “gate” function – whether or not to run optimization
  • “execute” function – implementation of pass
  • property bitmaps – properties required, destroyed, and created
  • “todo” bitmaps – run internal GC, dump the tree, verify SSA form, etc.
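The gate/execute split of a pass can be sketched as follows. This is a simplified, hypothetical model: toy_pass, toy_passes, toy_optimize and the pass names are invented here, and the real tree_opt_pass struct in tree-pass.h carries more fields (properties, todo flags, sub-passes).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, stripped-down pass descriptor.
struct toy_pass {
    const char *name;
    bool (*gate)();       // decides whether to run; null means "always run"
    void (*execute)();    // the implementation of the pass
};

static std::vector<std::string> toy_log;   // records which passes ran
static int toy_optimize = 0;               // stand-in for the -O level

static bool gate_if_optimizing() { return toy_optimize >= 1; }
static void exec_lower() { toy_log.push_back("lower"); }
static void exec_dce()   { toy_log.push_back("dce"); }

// The order of this table is the order in which passes run.
static const toy_pass toy_passes[] = {
    { "lower", nullptr,            exec_lower },
    { "dce",   gate_if_optimizing, exec_dce   },
};

// Minimal pass manager loop: consult the gate, then execute.
inline void toy_run_passes() {
    for (const toy_pass &p : toy_passes)
        if (p.gate == nullptr || p.gate())
            p.execute();
}
```

With toy_optimize at 0 only "lower" runs; raising it to 1 or more lets the gated "dce" pass run as well, which mirrors why tree dumps may be empty unless an -O level is given.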
Passes and subpasses
• Passes can be used to group subpasses
• all_passes contains all_optimization_passes
  all_optimization_passes has optimizations in order
• pass_tree_loop contains loop optimizations

Adding a tree optimization pass

Debugging middle-end tree passes
  Command-line options for dumping trees:
  -fdump-tree-X           output after pass X
  -fdump-tree-original    output initial tree (before all opts)
  -fdump-tree-optimized   output final GIMPLE (after all opts)
  -fdump-tree-gimple      dump before & after gimplification
  -fdump-tree-inlined     output after function inlining
  -fdump-tree-all         output after each pass
  (Make sure you specify an -O level or you might not get anything.)
  Passes available for dumping in GCC 4.1.1 (see info page):
  cfg, vcg, ch, ssa, salias, alias, ccp, storeccp, pre, fre, copyprop, store_copyprop, dce, mudflap, sra, sink, dom, dse, phiopt, forwprop, copyrename, nrv, vect, vrp

Debugging middle-end tree passes
  Can specify options for tree dumps:
  • address   print address of each tree node
  • slim      less output; don’t dump all scope bodies
  • raw       raw tree output (rather than pretty-printed C-like trees)
  • details   detailed output (not supported by all passes)
  • stats     statistics (not supported by all passes)
  • blocks    basic block boundaries
  • vops      output virtual operands for each statement
  • lineno    output line #s
  • uid       output decl’s unique ID along with each variable
  • all       all except raw, slim, and lineno
  e.g.
  -fdump-tree-dse-details   detailed post-DSE output
  -fdump-tree-all-all       (almost) everything
Static Single-Assignment (SSA) form
  Cytron et al. Efficiently computing static single assignment form and the control dependence graph. ACM TOPLAS, October 1991.
• (Pure) functional languages have nice properties for optimization
  single-assignment: one assignment to each variable
  static single-assignment: next best thing
  • each variable assigned at one static location in the program
  makes it clearer where data is produced
  • reduces complexity of many optimization algorithms
  • removes association of variable uses over its lifetime

SSA renaming (1): model control flow
  y = 10; /* compute 2^y */
  x = 1;
  while (y > 0) {
    x = x * 2;
    y = y - 1;
  }
  [figure: control-flow graph: entry block (y = 10, x = 1) → test “y > 0 ?”; true → loop body (x = x * 2, y = y - 1) → back to the test; false → EXIT]

SSA renaming (2): version all variables
  y1 = 10; /* compute 2^y */
  x1 = 1;
  while (y1 > 0) {
    x2 = x1 * 2;
    y2 = y1 - 1;
  }
  [figure: the same control-flow graph with versioned names: entry block (y1 = 10, x1 = 1) → test “y1 > 0 ?”; true → loop body (x2 = x1 * 2, y2 = y1 - 1); false → EXIT]

SSA renaming (3): insert “phi” nodes
  y1 = 10; /* compute 2^y */
  x1 = 1;
  while (true) {
    x3 = φ(x1, x2);
    y3 = φ(y1, y2);
    if (y3 <= 0) break;
    x2 = x3 * 2;
    y2 = y3 - 1;
  }
  [figure: control-flow graph: entry block (y1 = 10, x1 = 1) → join block (x3 = φ(x1, x2), y3 = φ(y1, y2)) → test “y3 > 0 ?”; true → loop body (x2 = x3 * 2, y2 = y3 - 1) → back to the join block; false → EXIT]

Into and out of SSA form in GCC
  pass_build_ssa   gcc/tree-into-ssa.c
  SSA optimizations
  pass_del_ssa     gcc/tree-outof-ssa.c
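The renaming steps can be checked by writing the SSA version of the 2^y loop as ordinary code. In this sketch the φ function is modeled by a helper that picks its first argument when control arrives from the entry block and its second when it arrives from the loop back edge; the helper phi, the from_entry flag, and the function names are assumptions made for illustration. Each versioned name is still assigned at exactly one place in the program text, even though x2 and y2 are assigned many times at run time.

```cpp
#include <cassert>

// Models a phi node with two incoming edges: the entry edge and the back edge.
inline long phi(bool from_entry, long entry_value, long back_edge_value) {
    return from_entry ? entry_value : back_edge_value;
}

// The loop from the slides in SSA style: compute 2^y.
inline long pow2_ssa(long y_init) {
    const long y1 = y_init;
    const long x1 = 1;
    long x2 = 0, y2 = 0;        // given real values only inside the loop body
    bool from_entry = true;     // which edge did we arrive on?
    for (;;) {
        const long x3 = phi(from_entry, x1, x2);
        const long y3 = phi(from_entry, y1, y2);
        if (y3 <= 0)
            return x3;          // the EXIT edge
        x2 = x3 * 2;            // the single static assignment to x2
        y2 = y3 - 1;            // the single static assignment to y2
        from_entry = false;     // next arrival is via the back edge
    }
}

// The original, non-SSA loop, for comparison.
inline long pow2_plain(long y) {
    long x = 1;
    while (y > 0) {
        x = x * 2;
        y = y - 1;
    }
    return x;
}
```

Both functions return the same result for every input, which is the property that renaming plus φ insertion has to preserve.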
Dealing with SSA form in GCC
• Given a tree node n with code = PHI_NODE
  PHI_RESULT(n)        get lhs of φ
  PHI_NUM_ARGS(n)      get rhs count
  PHI_ARG_DEF(n, i)    get ssa-name
  PHI_ARG_EDGE(n, i)   get edge
  PHI_ARG_ELT(n, i)    tuple (ssa-name, edge)
• Given a tree node n with code = SSA_NAME
  SSA_NAME_DEF_STMT(n)   get defining statement
  SSA_NAME_VERSION(n)    get SSA version #

A few useful functions in the middle-end
  walk_use_def_chains(var, func, data)
    start at ssa-name var, calling func at each point up the chain; data is a generic pointer for use by func (see tree-ssa.c and internals docs, info gccint)
  walk_dominator_tree(dom-walk-data, basic-block)
    start at basic-block and walk children in dominator relationship; dom-walk-data provides several callbacks (see domwalk.h and internals docs, info gccint)

Implementing an optimization from start to finish

RTL expansion and optimization
• Expansion performed by pass_expand (gcc/cfgexpand.c)
  Back-end has a say in this
• As of GCC 4.1.x, RTL passes are carried out by same pass manager that works on trees
• pass_final (at end) outputs assembly

GCC Back-End
  Register allocation
  Instruction selection
  Debugger support

Register allocation
• RTL pseudo-registers → hard registers
• Proceeds in several passes
  1. Register class scan (preference registers)
  2. Register allocation within basic blocks
  3. Register allocation for remaining registers
  4. Reload (renumbering, spilling)

Instruction selection
• Machine description (.md) files for target CPU
  define_expand() matches standard names and generates RTL; assists in expansion of GIMPLE
  define_insn() matches RTL templates and generates assembly
• Internals documentation has details

A machine description tour

Debugger support
• Specifying -g to the compiler inserts debugging symbols in the assembly output
• DWARF2 format embedded within ELF
  a tree of debug info entries (compilation unit at the root)
  • each with a linked list of attributes
  DWARF2 manual: ftp.freestandards.org/pub/dwarf/dwarf-2.0.0.pdf
• Once assembled, “readelf -w” interprets them

Runtime Issues
  Object layout
  Virtual method lookup
  The Boehm garbage collector
  crt stuff

Simple object layout (C++)
  class A {
  public:
    int x;
    virtual void myMethod();
    virtual void other();
  };
  class B : public A {
  public:
    int y;
    virtual void myMethod();
    virtual void third();
  };
  [figure: vtable for A: A::myMethod, A::other; instances of A: vtable pointer, x. vtable for B: B::myMethod, A::other, B::third; instances of B: vtable pointer, x, y]
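The simple object layout can be imitated by hand, which also shows what a virtual call compiles down to. The sketch below is an illustration only: ObjA, ObjB, VTable and the integer return values are hypothetical, and for brevity one VTable struct with three slots is shared, leaving the third slot null for A, whereas the vtable for A on the slide has just two entries.

```cpp
#include <cassert>

struct ObjA;                                  // instance of "class A"
typedef int (*method_fn)(ObjA *self);         // every method receives "this"

// Table of method pointers, one slot per virtual method.
struct VTable { method_fn myMethod; method_fn other; method_fn third; };

struct ObjA { const VTable *vtable; int x; };   // vtable pointer, then x
struct ObjB { ObjA base; int y; };              // subobject A of B comes first, then y

static int A_myMethod(ObjA *self) { return self->x; }
static int A_other(ObjA *)        { return 100; }
// B's override knows its "this" really points at the start of an ObjB.
static int B_myMethod(ObjA *self) { return reinterpret_cast<ObjB *>(self)->y; }
static int B_third(ObjA *)        { return 300; }

static const VTable vtable_A = { A_myMethod, A_other, nullptr };
static const VTable vtable_B = { B_myMethod, A_other, B_third };   // other is inherited

// What a virtual call to myMethod turns into: load the vtable pointer from
// the instance, load the slot at a fixed offset, call through it.
inline int call_myMethod(ObjA *obj) { return obj->vtable->myMethod(obj); }
```

Because ObjB starts with its ObjA subobject, a pointer to a B instance can be used wherever a pointer to A is expected, and the same slot offset finds B's override.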
Simple object layout (C++)
  [figure: as before. vtable for A: A::myMethod, A::other; instances of A: vtable pointer, x. vtable for B: B::myMethod, A::other, B::third; instances of B: vtable pointer, x, y. The leading part of each B instance is labeled “subobject A of B” and the leading part of the vtable for B is labeled “sub-vtable A of B”]

Object layout (Java)
  [figure: instances of B: vtable pointer, x, y, with the leading part labeled “subobject A of B”. vtable for B: class B pointer, GC descriptor, finalize, hashCode, equals, toString, clone, myMethod, other, third, with nested parts labeled “sub-vtable Object of B” and “sub-vtable A of B”]

But more complicated for C++!
  First, classes might not have virtual functions!
  class A {
  public:
    int x;
    void myMethod();
    void other();
  };
  class B : public A {
  public:
    int y;
    void myMethod();
    void third();
  };
  [figure: instances of A: x; instances of B: x, y, with x labeled “subobject A of B”]

But more complicated for C++!
  Second, classes might have multiple bases!
  class A {
  public:
    int x;
    virtual void one();
  };
  class B {
  public:
    int y;
    virtual void two();
  };
  class C : public A, public B {
  public:
    int z;
    virtual void three();
  };
  [figure: vtable for A: A::one; instances of A: vtable pointer, x. vtable for B: B::two; instances of B: vtable pointer, y. vtable for C: ? instances of C: ?]

Object layout for multiple bases
  [figure: vtable for A, instances of A, vtable for B and instances of B as before. Instances of C: vtable pointer, x (subobject A of C), vtable pointer, y (subobject B of C), z. The vtable for C holds the entries A::one, B::two and C::three, with a separate part for the B subobject]
  Requires “this pointer-adjustment”
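The “this pointer-adjustment” can be observed directly with the classes from the slide. This is a minimal sketch: the helper names offset_of_B_in_C and most_derived are invented, the methods are given empty bodies so the example links, and the exact offset of the B subobject is implementation-dependent (the language only guarantees that the conversions round-trip correctly).

```cpp
#include <cassert>
#include <cstddef>

class A { public: int x = 1; virtual void one() {} };
class B { public: int y = 2; virtual void two() {} };
class C : public A, public B { public: int z = 3; virtual void three() {} };

// Byte distance from the start of a C object to its B subobject.
// Converting C* to B* makes the compiler add this offset.
inline std::ptrdiff_t offset_of_B_in_C(C *c) {
    B *b = c;   // derived-to-base conversion adjusts the pointer
    return reinterpret_cast<char *>(b) - reinterpret_cast<char *>(c);
}

// dynamic_cast<void *> returns the address of the most derived object,
// which is why the vtable needs to record a top-level offset.
inline void *most_derived(B *b) { return dynamic_cast<void *>(b); }
```

On common ABIs the A subobject sits at offset 0 and the B subobject at a positive offset, so a B* into a C object does not equal the C* it came from. Casting back with static_cast<C *> subtracts the same offset again.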
Multiple bases, cont'd
But what about dynamic_cast?!
[Diagram: as on the previous slide, with an [offset = -4] entry added to the vtable for C; instances of C = vtable pointer, x (subobject A of C), vtable pointer, y (subobject B of C), z.]

Multiple bases, cont'd
Top-level offset
[Diagram: as before, with an [offset = 0] entry added at the top of the vtable for C alongside the [offset = -4] entry.]

Multiple bases, finished*
But what about C++ type info?!
[Diagram: as before, with a "ptr. typeinfo C" entry added after each offset entry in the vtable for C.]
* there are further complications, but we'll leave it here

Java and C++ share object layout
[Diagram: vtable for (Java) B = [offset = 0], null typeinfo, class B pointer, GC descriptor, finalize, hashCode, equals, toString, clone, myMethod, other, third.]

Virtual method lookup (C++, Java)
Now, virtual method invocation is a snap!
    Compiler knows method offset within vtable
    So it generates an indirect access through instance pointer…
    …and invokes the method through the pointer found in vtable
[Diagram: instance of B = vtable pointer, x, y; vtable for (Java) B as on the previous slide.]

The Boehm garbage collector
Boehm, H., Space Efficient Conservative Garbage Collection. In ACM PLDI'91.
• Conservative mark & sweep garbage collector
    designed to operate in a hostile environment as a drop-in replacement for malloc
    "conservative" means it cannot distinguish between pointers and non-pointers
    Java is considerably less "hostile" than C/C++
    • can't hide pointers from the compiler

Java and Boehm GC
• Java front-end generates class pointer masks
    stows them in vtable
    computed in gcc/java/boehm.c
• Class too big for a pointer mask?
    use a count of reference fields
    use a "mark procedure"
• Where to look
    boehm-gc/doc contains docs
    libjava/prims.cc contains GC-aware allocation routines
[Diagram: vtable = [offset = 0], null typeinfo, class B pointer, GC descriptor, finalize, …]

crt stuff ("C runtime")
• crt1.o, crti.o, crtn.o* provided by glibc
    crt1.o sets up libc before main() is even invoked
    crti.o prologue for .init and .fini
    crtn.o epilogue for .init and .fini
• crtbegin.o, crtend.o* provided by GCC
    crtbegin.o contributes frame_dummy() call to .init; calls static data destructors in .fini
    crtend.o calls static data constructors in .init
    code in gcc/crtstuff.c
* and some variations

Language feature with runtime support

Wrap-up
    Running GCC under GDB
    Obtaining development versions of GCC
    Reporting bugs in GCC
    What's next for GCC
Running GCC under GDB
• Inevitably, hacking a compiler will result in
    segfault
    assertion fault
    incorrect code generation
• Remember to attach debugger to the compiler, not the driver
• "gcc -v …," then use GDB on the actual front-end

Debugging GCC

Obtaining development versions of GCC
• All GCC development is in the open
    design discussions
    change logs
    bugs
• Subversion (SVN) repository
    public read access
    for details: gcc.gnu.org/svn.html
    clients available from subversion.tigris.org/

What to do if you find a bug in GCC
• Check to see if bug is present in SVN version
• Check to see if bug is in bug database http://gcc.gnu.org/bugzilla/
• Collect version information (gcc --version)
• Guidelines: http://gcc.gnu.org/bugs.html
• Report it: http://gcc.gnu.org/bugzilla/

Thanks!
Ron K. Cytron [email protected]
Morgan Deters [email protected]
Distributed Object Computing Laboratory, Washington University, Dept. of Computer Science & Engineering, St. Louis, MO 63130 USA
Copyright © 2005–2006 Morgan Deters