Chapter 1 Introduction
Samuel
College of Computer Science & Technology
Harbin Engineering University

Compilers
• Compilers are computer programs that translate one language to another.
  – They are very complex programs, ranging from 10,000 to 1,000,000 lines of code.
• A compiler's input is a program written in its source language.
• It produces an equivalent program written in its target language.

Compiler [email protected]

This is a book.
Step 1: lexical analysis
  This / is / a / book / .
Step 2: syntax analysis
  This (subject)   is (predicate)   a (quantifier)   book (object)   . (end)
Step 3: semantic analysis
  This   pronoun   这
  is     copula    是
  a      numeral   一
  book   noun      书
  .      period    。
Step 4: This is a book.
Step 5: 这是一书。
Step 6: 这是一本书。

Translation Process
Source Code -> 1 Scanner -> 2 Parser -> 3 Semantic Analyzer -> 4 Source Code Optimizer -> 5 Code Generator -> 6 Target Code Optimizer -> Target Code
Supporting components: Literal Table, Symbol Table, Error Handler

Translation Process: Source Code -> 1 Scanner

The Scanner
• Reads the source program (a stream of characters).
• Performs lexical analysis: collects sequences of characters into meaningful units called tokens.
• Example: a[index] = 4 + 2
    a        identifier
    [        left bracket
    index    identifier
    ]        right bracket
    =        assignment
    4        number
    +        plus sign
    2        number

Translation Process: Source Code -> Scanner -> Tokens -> 2 Parser

The Parser
• Receives the source in the form of tokens.
• Performs syntax analysis
  – determines the structure of the program
  – similar to performing grammatical analysis on a sentence in natural language.
• The result is represented as a parse tree or a syntax tree.
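The token stream that the scanner hands to the parser can be sketched in a few lines. This is only a minimal illustration for the running example a[index] = 4 + 2, not the design of any particular compiler: the names TOKEN_SPEC and scan are hypothetical, and the token categories are simply the labels used on the scanner slide.

```python
import re

# Hypothetical token categories, named after the labels on the scanner slide.
# Each entry pairs a category with a compiled regular expression.
TOKEN_SPEC = [
    ("number", re.compile(r"\d+")),
    ("identifier", re.compile(r"[A-Za-z_]\w*")),
    ("left bracket", re.compile(r"\[")),
    ("right bracket", re.compile(r"\]")),
    ("assignment", re.compile(r"=")),
    ("plus sign", re.compile(r"\+")),
    ("whitespace", re.compile(r"\s+")),
]


def scan(source):
    """Collect the characters of source into (category, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        for kind, pattern in TOKEN_SPEC:
            match = pattern.match(source, pos)
            if match:
                # Whitespace separates tokens but is not a token itself.
                if kind != "whitespace":
                    tokens.append((kind, match.group()))
                pos = match.end()
                break
        else:
            # No pattern matched at this position: a lexical error.
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
    return tokens


for kind, text in scan("a[index] = 4 + 2"):
    print(f"{text:8}{kind}")
```

Running the sketch prints one token per line, in the same order as the table on the scanner slide. A character that fits no category (for example "?") raises an error, which is roughly where an error handler would step in.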

Parse Tree
a[index] = 4 + 2
    expression
      assign-expression
        expression
          subscript-expression
            expression
              identifier a
            [
            expression
              identifier index
            ]
        =
        expression
          additive-expression
            expression
              number 4
            +
            expression
              number 2

Abstract Syntax Tree
a[index] = 4 + 2
An abstract syntax tree is a condensation of the information contained in a parse tree.
    assign-expression
      subscript-expression
        identifier a
        identifier index
      additive-expression
        number 4
        number 2

Translation Process: Source Code -> Scanner -> Tokens -> Parser -> Syntax Tree -> 3 Semantic Analyzer

The Semantic Analyzer
• The semantics of a program are its "meaning".
• The semantics of a program determine its runtime behavior.
• Most programming languages have features (called static semantics) that can be determined prior to execution.
• Typical static semantics features
  – Declarations
  – Type checking
• The extra pieces of information computed by the semantic analyzer are called attributes.
  – They are added to the tree as annotations, or "decorations".

Abstract Syntax Tree
a[index] = 4 + 2
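How a semantic analyzer might attach type attributes to the syntax tree of a[index] = 4 + 2 can be sketched as follows. This is an illustration under assumptions, not the slides' implementation: the Node class, the static_type field, the annotate function, and the SYMBOLS table (standing in for declarations of a and index) are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # e.g. "identifier", "additive-expression"
    value: object = None           # name or constant, for leaf nodes
    children: list = field(default_factory=list)
    static_type: str = None        # attribute filled in by annotate()


# Hypothetical symbol table: assumes a and index were declared like this.
SYMBOLS = {"a": "array of integer", "index": "integer"}


def annotate(node, symbols):
    """Decorate the tree bottom-up with types, checking them on the way."""
    for child in node.children:
        annotate(child, symbols)
    if node.kind == "number":
        node.static_type = "integer"
    elif node.kind == "identifier":
        if node.value not in symbols:
            raise NameError(f"undeclared identifier {node.value!r}")
        node.static_type = symbols[node.value]
    elif node.kind == "subscript-expression":
        array, index = node.children
        if array.static_type != "array of integer" or index.static_type != "integer":
            raise TypeError("subscript needs an integer array and an integer index")
        node.static_type = "integer"
    elif node.kind == "additive-expression":
        if any(child.static_type != "integer" for child in node.children):
            raise TypeError("addition needs integer operands")
        node.static_type = "integer"
    elif node.kind == "assign-expression":
        target, value = node.children
        if target.static_type != value.static_type:
            raise TypeError("assignment between different types")
    return node


# Abstract syntax tree for a[index] = 4 + 2
tree = Node("assign-expression", children=[
    Node("subscript-expression", children=[
        Node("identifier", "a"),
        Node("identifier", "index"),
    ]),
    Node("additive-expression", children=[
        Node("number", 4),
        Node("number", 2),
    ]),
])
annotate(tree, SYMBOLS)
```

After the call, every node below the assignment carries a type attribute, which is the information an annotated tree records. Declaring index with a non-integer type in SYMBOLS would make annotate raise a TypeError, a small instance of static type checking.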
    assign-expression
      subscript-expression
        identifier a
        identifier index
      additive-expression
        number 4
        number 2

Annotated Tree
a[index] = 4 + 2
    assign-expression
      subscript-expression : integer
        identifier a : array of integer
        identifier index : integer
      additive-expression : integer
        number 4 : integer
        number 2 : integer

Translation Process: Source Code -> Scanner -> Tokens -> Parser -> Syntax Tree -> Semantic Analyzer -> 4 Source Code Optimizer

The Source Code Optimizer
• The earliest point at which optimization steps can be performed is just after semantic analysis.
• There may be optimization possibilities that depend only on the source code.
• Compilers exhibit a wide variation in the kinds of optimization they perform and where they place them.
• The output of the source code optimizer is the intermediate representation (IR) or intermediate code.

Example
• 4 + 2 can be precomputed by the compiler.
  – This optimization is known as constant folding.
  – This optimization can be performed on the annotated syntax tree by collapsing the right-hand subtree to its constant value.
    assign-expression
      subscript-expression : integer
        identifier a : array of integer
        identifier index : integer
      additive-expression : integer
        number 4 : integer
        number 2 : integer
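The collapsing step can be sketched as a small recursive pass over the tree. This is a minimal illustration, assuming a hypothetical Node class and a hypothetical fold_constants function; it handles only additive expressions whose operands are all numeric constants, which is all the running example needs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                      # e.g. "number", "additive-expression"
    value: object = None           # name or constant, for leaf nodes
    children: list = field(default_factory=list)


def fold_constants(node):
    """Return the tree with constant additive subtrees collapsed to numbers."""
    # Fold the children first so nested constant expressions collapse too.
    node.children = [fold_constants(child) for child in node.children]
    if node.kind == "additive-expression" and all(
        child.kind == "number" for child in node.children
    ):
        return Node("number", sum(child.value for child in node.children))
    return node


# Syntax tree for a[index] = 4 + 2
tree = Node("assign-expression", children=[
    Node("subscript-expression", children=[
        Node("identifier", "a"),
        Node("identifier", "index"),
    ]),
    Node("additive-expression", children=[
        Node("number", 4),
        Node("number", 2),
    ]),
])

folded = fold_constants(tree)
print(folded.children[1])
```

The left subtree, a[index], is left untouched because the value of index is not known at compile time; only the right-hand subtree becomes a single number node.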
After constant folding:
    assign-expression
      subscript-expression : integer
        identifier a : array of integer
        identifier index : integer
      number 6 : integer

Translation Process: Source Code -> Scanner -> Tokens -> Parser -> Syntax Tree -> Semantic Analyzer -> Source Code Optimizer -> Intermediate code -> 5 Code Generator

The Code Generator
• The code generator takes the intermediate code or IR and generates code for the target machine.
  – We will write target code in assembly language form. Most compilers generate object code directly.
• The properties of the target machine become important.
  – Use instructions of the target machine.
  – Data representations: how many bytes or words integer and floating-point data types occupy in memory.

Example
a[index] = 4 + 2, folded to a[index] = 6
    MOV R0, index   ;; value of index -> R0
    MUL R0, 2       ;; double value in R0
    MOV R1, &a      ;; address of a -> R1
    ADD R1, R0      ;; add R0 to R1
    MOV *R1, 6      ;; constant 6 -> address in R1
• &a is the address of a (the base address of the array).
• *R1 means indirect register addressing.
• We assume that the machine performs byte addressing.
• Integers occupy two bytes of memory, which is why the index is doubled.

Translation Process: Source Code -> Scanner -> Tokens -> Parser -> Syntax Tree -> Semantic Analyzer -> Source Code Optimizer -> Intermediate code -> Code Generator -> Target Code -> 6 Target Code Optimizer

The Target Code Optimizer
a[index] = 6
• Improvements include
  – Choosing addressing modes to improve performance.
  – Replacing slow instructions by faster ones.
  – Eliminating redundant or unnecessary operations.
• Example:
    MOV R0, index   ;; value of index -> R0
    SHL R0          ;; double the value in R0
    MOV &a[R0], 6   ;; constant 6 -> address a+R0

Translation Process: Source Code -> Scanner -> Tokens -> Parser -> Syntax Tree -> Semantic Analyzer -> Source Code Optimizer -> Intermediate code -> Code Generator -> Target Code -> Target Code Optimizer -> Target Code

Interpreters
• An interpreter is a language translator like a compiler.
• The difference: the source program is executed immediately, not after translation is complete.
• A programming language can be either interpreted or compiled.
• Traditionally interpreted languages: BASIC, LISP, Java (Java is compiled to bytecode, which is then interpreted).
• Traditionally compiled languages: FORTRAN, C, C++.
• Interpreters share many operations with compilers.

Assemblers
• An assembler is a translator for the assembly language of a particular computer.
• Assembly language is a symbolic form of the machine language, and it is easy to translate.
• Sometimes a compiler will generate assembly language as its target language.
• An assembler then finishes the translation into object code.

Linkers
• A linker collects code that was separately compiled or assembled in different object files into a final executable file.
• It also connects to the code for standard library functions and to resources supplied by the OS (memory allocators, I/O devices).
• Linking was originally one of the principal activities of a compiler.

Loaders
• In object code, the primary memory references are made relative to an undetermined starting location that can be anywhere in memory.
• A loader resolves all relocatable addresses relative to a given starting address.
• Usually, the loading process is part of the OS.
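The address resolution a loader performs can be illustrated with a toy model. This sketch assumes a made-up object format, not any real one: the code is a list of words, and a separate list names the positions whose contents are addresses relative to location 0. The function name relocate and the sample values are hypothetical.

```python
def relocate(object_code, relocation_entries, load_address):
    """Return a copy of object_code with each relocatable word adjusted.

    object_code        -- list of integer words, addresses relative to 0
    relocation_entries -- positions in object_code that hold addresses
    load_address       -- starting address chosen at load time
    """
    image = list(object_code)  # leave the original object code untouched
    for position in relocation_entries:
        image[position] += load_address
    return image


# Hypothetical object code: the words at positions 1 and 3 are addresses.
object_code = [0x10, 0x0004, 0x20, 0x0008]
relocation_entries = [1, 3]

loaded = relocate(object_code, relocation_entries, load_address=0x4000)
print([hex(word) for word in loaded])
```

Loading the same object code at a different starting address changes only the two address words; the other words are copied unchanged. Real object formats record considerably more information per relocation than a bare position.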

Preprocessors
• A preprocessor is a separate program that is called by the compiler before the translation begins.
• Preprocessors can
  – Delete comments
  – Include other files
  – Perform macro substitutions
• A macro is a shorthand description of a repeated sequence of text.

Editors
• Source programs are written using an editor that produces a standard text file (such as ASCII).
• Recently, compilers have been bundled with editors and other programs into an interactive development environment (IDE).
• Such editors may be oriented towards the format of the programming language.
  – The programmer may be informed of errors as the program is written.
• The compiler can be called from within the editor.

Debuggers
• A debugger is a program used to find execution errors in a compiled program.
  – It is also packaged in an IDE.
• The debugger keeps track of source code information such as line numbers and the names of variables and procedures.
• It can halt execution at a breakpoint and provide information on called functions and the current values of variables.

Homework
1.2 Given the C assignment
    a[i+1] = a[i] + 2
draw a parse tree and a syntax tree for the expression, using the similar example from this chapter as a guide.