Transcript Document

Chapter 1
Introduction
Samuel
College of Computer Science & Technology
Harbin Engineering University
Compilers
• Compilers are computer programs that
translate one language to another.
– They are very complex programs, ranging from 10,000 to
1,000,000 lines of code.
• Its input is a program written in its source
language.
• It produces an equivalent program written in
its target language.
Compiler
Example: translating a natural-language sentence
This is a book.
Step 1: lexical analysis
This / is / a / book / .
Step 2: syntax analysis
This -> subject
is -> predicate
a -> quantifier
book -> object
. -> end
Step 3: semantic analysis
This -> pronoun 这
is -> copula 是
a -> numeral 一
book -> noun 书
. -> period 。
Step 4: This is a book.
Step 5: 这是一书。 (word-for-word translation)
Step 6: 这是一本书。 (polished translation, with the measure word 本 added)
Translation Process
[Figure: the phases of a compiler. Source Code -> (1) Scanner -> (2) Parser -> (3) Semantic Analyzer -> (4) Source Code Optimizer -> (5) Code Generator -> (6) Target Code Optimizer -> Target Code. Auxiliary components: Literal Table, Symbol Table, Error Handler. Phase 1, the Scanner, comes first.]
The Scanner
• Reads the source program (stream of characters).
• Performs lexical analysis: collects sequences of
characters into meaningful units called tokens.
• Example:
a[index] = 4 + 2

a        identifier
[        left bracket
index    identifier
]        right bracket
=        assignment
4        number
+        plus sign
2        number
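The token list above can be reproduced with a small hand-written scanner. The following is a minimal sketch, not the implementation used in the course: the function name `scan`, the `TOKEN_PATTERNS` table, and the regular expressions are assumptions chosen to match the token names on the slide.

```python
import re

# Hypothetical token table: (token kind, pattern), tried in order.
TOKEN_PATTERNS = [
    ("identifier", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("number", re.compile(r"[0-9]+")),
    ("left bracket", re.compile(r"\[")),
    ("right bracket", re.compile(r"\]")),
    ("assignment", re.compile(r"=")),
    ("plus sign", re.compile(r"\+")),
]


def scan(source):
    """Collect the characters of `source` into (lexeme, kind) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1  # whitespace separates tokens but is not a token itself
            continue
        for kind, pattern in TOKEN_PATTERNS:
            match = pattern.match(source, pos)
            if match:
                tokens.append((match.group(), kind))
                pos = match.end()
                break
        else:
            # No pattern matched at this position: report a lexical error.
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
    return tokens


for lexeme, kind in scan("a[index] = 4 + 2"):
    print(f"{lexeme:8} {kind}")
```

A production scanner would usually be generated from a specification or built as a finite automaton; the linear search over patterns here is only for readability.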
Translation Process
[Figure: compiler phases diagram. The Scanner passes Tokens to phase 2, the Parser.]
The Parser
• Receives the source code in the form of tokens.
• Performs syntax analysis
– determines the structure of the program
– similar to performing grammatical analysis on a sentence
in natural language.
• The result is represented as a parse tree or a syntax
tree.
Parse Tree
a[index] = 4 + 2

expression
  assign-expression
    expression
      subscript-expression
        expression
          identifier a
        [
        expression
          identifier index
        ]
    =
    expression
      additive-expression
        expression
          number 4
        +
        expression
          number 2
Abstract Syntax Tree
a[index] = 4 + 2
An abstract syntax tree is a condensation of the
information contained in a parse tree.

assign-expression
  subscript-expression
    identifier a
    identifier index
  additive-expression
    number 4
    number 2
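One common way to obtain such a tree is recursive descent. The sketch below is an illustration under assumptions, not the parser from the course: the grammar is reduced to assignments, subscripts and `+`, the class and method names are hypothetical, and the tree is represented as nested tuples whose first element is the node kind.

```python
import re


def tokenize(source):
    # Identifiers, numbers and the four symbols used in the example.
    # Note: any other character is silently skipped in this sketch.
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|[\[\]=+]", source)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, symbol):
        if self.peek() != symbol:
            raise SyntaxError(f"expected {symbol!r}, found {self.peek()!r}")
        self.pos += 1

    def parse_assign(self):
        # assign -> primary '=' additive
        target = self.parse_primary()
        self.expect("=")
        value = self.parse_additive()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return ("assign-expression", target, value)

    def parse_additive(self):
        # additive -> primary { '+' primary }
        node = self.parse_primary()
        while self.peek() == "+":
            self.pos += 1
            node = ("additive-expression", node, self.parse_primary())
        return node

    def parse_primary(self):
        # primary -> number | identifier [ '[' additive ']' ]
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        if token.isdigit():
            return ("number", int(token))
        if token[0].isalpha() or token[0] == "_":
            node = ("identifier", token)
            if self.peek() == "[":
                self.pos += 1
                index = self.parse_additive()
                self.expect("]")
                node = ("subscript-expression", node, index)
            return node
        raise SyntaxError(f"unexpected token {token!r}")


tree = Parser(tokenize("a[index] = 4 + 2")).parse_assign()
print(tree)
```

The parser builds the abstract syntax tree directly; brackets and the `=` sign guide the parse but leave no nodes behind, which is the "condensation" the slide refers to.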
Translation Process
[Figure: compiler phases diagram. The Parser passes a Syntax Tree to phase 3, the Semantic Analyzer.]
The Semantic Analyzer
• The semantics of a program are its “meaning”.
• The semantics of a program determine its runtime
behavior.
• Most programming languages have features (called
static semantics) that can be determined prior to
execution.
• Typical static semantics features
– Declarations
– Type checking
• The extra pieces of information computed by the semantic
analyzer are called attributes.
– They are added to the tree as annotations, or “decorations”
Annotated Tree
a[index] = 4 + 2

assign-expression
  subscript-expression (integer)
    identifier a (array of integer)
    identifier index (integer)
  additive-expression (integer)
    number 4 (integer)
    number 2 (integer)
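The annotation step can be sketched as a recursive walk that looks up declarations and checks types. This is a minimal illustration, assuming a tuple-based tree and a hypothetical symbol table in which `a` is declared as an array of integers and `index` as an integer; the function name `annotate` and the type strings are not taken from the slides' tooling.

```python
# Hypothetical declarations, as they might be recorded in the symbol table.
SYMBOL_TABLE = {"a": "array of integer", "index": "integer"}


def annotate(node, symbols):
    """Return the tree as dicts, with a 'type' attribute added to every node."""
    kind = node[0]
    if kind == "number":
        return {"kind": kind, "value": node[1], "type": "integer"}
    if kind == "identifier":
        if node[1] not in symbols:
            raise NameError(f"undeclared identifier {node[1]!r}")
        return {"kind": kind, "name": node[1], "type": symbols[node[1]]}

    # All interior nodes in this sketch have exactly two children.
    left, right = [annotate(child, symbols) for child in node[1:]]
    if kind == "subscript-expression":
        if not left["type"].startswith("array of "):
            raise TypeError("subscripted value is not an array")
        if right["type"] != "integer":
            raise TypeError("array index must be an integer")
        result = left["type"][len("array of "):]
    elif kind == "additive-expression":
        if left["type"] != "integer" or right["type"] != "integer":
            raise TypeError("operands of + must be integers")
        result = "integer"
    elif kind == "assign-expression":
        if left["type"] != right["type"]:
            raise TypeError("type mismatch in assignment")
        result = left["type"]
    else:
        raise ValueError(f"unknown node kind {kind!r}")
    return {"kind": kind, "children": [left, right], "type": result}


tree = (
    "assign-expression",
    ("subscript-expression", ("identifier", "a"), ("identifier", "index")),
    ("additive-expression", ("number", 4), ("number", 2)),
)
annotated = annotate(tree, SYMBOL_TABLE)
print(annotated["children"][0]["type"])  # type of a[index]
```

Unlike the slide, this sketch also records a type on the assign-expression node; that is a design choice of the example, made so the assignment's two sides can be compared.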
Translation Process
[Figure: compiler phases diagram. The output of the Semantic Analyzer goes to phase 4, the Source Code Optimizer.]
The Source Code Optimizer
• The earliest point at which optimization steps can
be performed is just after semantic analysis.
• There may be possibilities for code improvement that
depend only on the source code.
• Compilers vary widely in the kinds of optimization they
perform and in where they place them.
• The output of the source code optimizer is the
intermediate representation (IR) or intermediate
code.
Example
• 4 + 2 can be precomputed by the compiler.
– This optimization is known as constant folding.
– This optimization can be performed on the annotated syntax
tree by collapsing the right-hand subtree to its constant value.

Before constant folding:
assign-expression
  subscript-expression (integer)
    identifier a (array of integer)
    identifier index (integer)
  additive-expression (integer)
    number 4 (integer)
    number 2 (integer)
Example (after constant folding)
assign-expression
  subscript-expression (integer)
    identifier a (array of integer)
    identifier index (integer)
  number 6 (integer)
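Constant folding on a tree can be sketched as a bottom-up rewrite. The following is a minimal illustration, assuming a tuple-based syntax tree and only the `+` operator; the function name `fold_constants` is hypothetical.

```python
def fold_constants(node):
    """Return a copy of the tree with constant additions precomputed."""
    kind = node[0]
    if kind in ("number", "identifier"):
        return node  # leaves are already as simple as they can be
    # Fold the children first, so nested constants such as (1 + 2) + 3 collapse.
    children = [fold_constants(child) for child in node[1:]]
    if kind == "additive-expression" and all(c[0] == "number" for c in children):
        return ("number", sum(c[1] for c in children))
    return (kind, *children)


tree = (
    "assign-expression",
    ("subscript-expression", ("identifier", "a"), ("identifier", "index")),
    ("additive-expression", ("number", 4), ("number", 2)),
)
print(fold_constants(tree))
```

Only the right-hand subtree changes: the subscript expression contains identifiers, whose values are not known until run time, so it is left as it is.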
Translation Process
[Figure: compiler phases diagram. The Source Code Optimizer passes Intermediate code to phase 5, the Code Generator.]
The Code Generator
• The code generator takes the intermediate code or IR
and generates code for the target machine.
– We will write target code in assembly language form. Most
compilers generate object code directly.
• The properties of the target machine become
important.
– Use instructions of the target machine.
– Data representations: how many bytes or words integer and
floating-point data types occupy in memory.
Example
a[index] = 4 + 2
a[index] = 6

MOV R0, index   ;; value of index -> R0
MUL R0, 2       ;; double value in R0
MOV R1, &a      ;; address of a -> R1
ADD R1, R0      ;; add R0 to R1
MOV *R1, 6      ;; constant 6 -> address in R1

• &a is the address of a (the base address of the array)
• *R1 means indirect register addressing
• We assume that the machine performs byte addressing.
• Integers occupy two bytes of memory, which is why the
index is doubled to obtain a byte offset.
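The instruction sequence above can be produced mechanically from the folded tree. The sketch below is an illustration only: it handles just the `array[identifier] = constant` shape, the function name `generate_store` and the tuple-based tree are assumptions, and the register choice (R0, R1) is hard-coded rather than allocated.

```python
INT_SIZE = 2  # bytes per integer, matching the slide's assumption


def generate_store(node, int_size=INT_SIZE):
    """Emit target code for `array[index] = constant` as a list of lines."""
    kind, target, value = node
    if (kind != "assign-expression"
            or target[0] != "subscript-expression"
            or target[1][0] != "identifier"
            or target[2][0] != "identifier"
            or value[0] != "number"):
        raise NotImplementedError("this sketch only handles array[identifier] = constant")
    array_name = target[1][1]
    index_name = target[2][1]
    return [
        f"MOV R0, {index_name}",   # value of index -> R0
        f"MUL R0, {int_size}",     # scale the index to a byte offset
        f"MOV R1, &{array_name}",  # base address of the array -> R1
        "ADD R1, R0",              # R1 now holds the element's address
        f"MOV *R1, {value[1]}",    # store the constant at that address
    ]


tree = (
    "assign-expression",
    ("subscript-expression", ("identifier", "a"), ("identifier", "index")),
    ("number", 6),
)
print("\n".join(generate_store(tree)))
```

The `int_size` parameter makes the dependence on the target machine's data representation explicit: with four-byte integers the multiplier would be 4 instead of 2.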
Translation Process
[Figure: compiler phases diagram. The Code Generator passes Target Code to phase 6, the Target Code Optimizer.]
The Target Code Optimizer
a[index] = 6
• Improvements include
– Choosing addressing modes to improve performance.
– Replacing slow instructions by faster ones.
– Eliminating redundant or unnecessary operations.

MOV R0, index   ;; value of index -> R0
MUL R0, 2       ;; double value in R0
MOV R1, &a      ;; address of a -> R1
ADD R1, R0      ;; add R0 to R1
MOV *R1, 6      ;; constant 6 -> address in R1

• Example:
MOV R0, index   ;; value of index -> R0
SHL R0          ;; double the value in R0
MOV &a[R0], 6   ;; constant 6 -> address a+R0
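Both improvements in the example can be expressed as pattern rewrites over short instruction windows, a technique usually called peephole optimization. The sketch below is a simplified illustration, not the optimizer from the course: the function name `peephole` and the text-based instruction format are assumptions, and the second rewrite assumes the address register is not used again afterwards.

```python
import re


def peephole(code):
    """Apply two local rewrites to a list of instruction strings."""
    # Rewrite 1: replace a slow instruction by a faster one.
    # Multiplying by 2 is the same as shifting left by one bit.
    code = [re.sub(r"^MUL (R\d+), 2$", r"SHL \1", line) for line in code]

    # Rewrite 2: choose a better addressing mode.
    # MOV Rx, &arr / ADD Rx, Ry / MOV *Rx, v  becomes  MOV &arr[Ry], v
    # (assumes Rx is dead after the store; a real optimizer must check this).
    pattern = r"MOV (R\d+), &(\w+)\nADD \1, (R\d+)\nMOV \*\1, (\w+)"
    result = []
    i = 0
    while i < len(code):
        match = re.fullmatch(pattern, "\n".join(code[i:i + 3]))
        if match:
            _, array, index_reg, value = match.groups()
            result.append(f"MOV &{array}[{index_reg}], {value}")
            i += 3
        else:
            result.append(code[i])
            i += 1
    return result


before = [
    "MOV R0, index",
    "MUL R0, 2",
    "MOV R1, &a",
    "ADD R1, R0",
    "MOV *R1, 6",
]
print("\n".join(peephole(before)))
```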
Translation Process
[Figure: the complete compiler phases diagram. The Target Code Optimizer emits the final Target Code.]
Interpreters
• An interpreter is a language translator like a compiler.
• The difference: the source program is executed
immediately, not after translation is complete.
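• In principle, any programming language can be either
interpreted or compiled.
• Languages that are typically interpreted: BASIC, LISP, Java
(Java is first compiled to bytecode, which a virtual machine then
interprets or compiles at run time).
• Languages that are typically compiled: FORTRAN, C, C++.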
• Interpreters share many operations with compilers.
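To illustrate the difference, an interpreter can walk the syntax tree and carry out each operation immediately instead of emitting target code. The sketch below is a minimal example under assumptions: the tree is the tuple-based form of a[index] = 4 + 2, the function name `execute` is hypothetical, and program memory is modelled as a Python dictionary.

```python
def execute(node, memory):
    """Evaluate a syntax tree directly against `memory` (a dict of variables)."""
    kind = node[0]
    if kind == "number":
        return node[1]
    if kind == "identifier":
        return memory[node[1]]
    if kind == "additive-expression":
        return execute(node[1], memory) + execute(node[2], memory)
    if kind == "subscript-expression":
        return execute(node[1], memory)[execute(node[2], memory)]
    if kind == "assign-expression":
        target, value = node[1], execute(node[2], memory)
        if target[0] == "subscript-expression":
            array = execute(target[1], memory)
            array[execute(target[2], memory)] = value
        elif target[0] == "identifier":
            memory[target[1]] = value
        else:
            raise ValueError("cannot assign to this expression")
        return value
    raise ValueError(f"unknown node kind {kind!r}")


tree = (
    "assign-expression",
    ("subscript-expression", ("identifier", "a"), ("identifier", "index")),
    ("additive-expression", ("number", 4), ("number", 2)),
)
memory = {"a": [0, 0, 0], "index": 1}
execute(tree, memory)
print(memory["a"])
```

The scanning, parsing and semantic analysis steps are the same as in a compiler, which is one reason interpreters and compilers share so much code.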
Assemblers
• An assembler is a translator for the assembly
language of a particular computer.
• Assembly language is a symbolic form of the
machine language, so it is particularly easy to translate.
• Sometimes, a compiler will generate assembly
language as its target language.
• An assembler then finishes the translation into
object code.
Linkers
• A linker collects code that was separately compiled or
assembled in different object files into a single
executable file.
• It also connects the program to the code for standard library
functions and to resources supplied by the OS (memory
allocators, I/O devices).
• Linking was originally one of the principal activities
of a compiler.
Loaders
• In object code, the principal memory references are often
made relative to an undetermined starting
location that can be anywhere in memory. Such code is
called relocatable.
• A loader resolves all relocatable addresses relative to a
given starting address.
• Usually, the loading process is part of the OS.
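Relocation can be pictured as adding the chosen starting address to every operand that was marked as relocatable. The sketch below is a deliberately simplified model: the object-code format (a list of `(opcode, operand, relocatable)` tuples), the opcodes, and the function name `load` are all hypothetical and do not correspond to any real object file format.

```python
def load(object_code, base_address):
    """Resolve relocatable operands relative to `base_address`."""
    loaded = []
    for opcode, operand, relocatable in object_code:
        if relocatable:
            operand += base_address  # offset from program start -> absolute address
        loaded.append((opcode, operand))
    return loaded


# Hypothetical object code: addresses are offsets from an unknown start.
object_code = [
    ("LOAD", 16, True),    # address of a variable, relative to program start
    ("ADDI", 2, False),    # immediate constant: must not be relocated
    ("JUMP", 0, True),     # branch back to the first instruction
]
print(load(object_code, base_address=4096))
```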
Preprocessors
• A preprocessor is a separate program that is called
by the compiler before the translation begins.
• Preprocessors can
– Delete comments
– Include other files
– Perform macro substitutions
• A macro is a shorthand description of a repeated sequence of
text.
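Two of the tasks listed above, deleting comments and performing macro substitutions, can be sketched in a few lines. This is a toy illustration and not the C preprocessor: it assumes `//` line comments and parameterless `#define NAME text` macros, it does not protect string literals, and the function name `preprocess` is hypothetical.

```python
import re


def preprocess(source):
    """Strip // comments and expand simple #define macros, line by line."""
    macros = {}
    output = []
    for line in source.splitlines():
        line = re.sub(r"//.*", "", line).rstrip()  # delete line comments
        definition = re.match(r"\s*#define\s+(\w+)\s+(.*)", line)
        if definition:
            # Record the macro and drop the directive from the output.
            macros[definition.group(1)] = definition.group(2)
            continue
        for name, body in macros.items():
            # Replace whole-word occurrences of the macro name with its text.
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _: body, line)
        if line:
            output.append(line)
    return "\n".join(output)


source = """#define SIZE 10 // array length
int a[SIZE]; // buffer
"""
print(preprocess(source))
```

File inclusion would follow the same pattern: on seeing an include directive, the preprocessor reads the named file and processes its lines in place of the directive.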
Editors
• Source programs are written using an editor that
produces a standard text file (such as ASCII).
• Recently, compilers have been bundled with editors
and other programs into an interactive development
environment (IDE).
• Such editors may be oriented towards the format or
structure of the programming language.
– The programmer may be informed of errors as the program is
written.
• The compiler can be called from within the editor.
Debuggers
• A debugger is a program that is used to find execution
errors in a compiled program.
– It is often packaged with a compiler in an IDE.
• The debugger keeps track of source code
information such as line numbers and the names of variables
and procedures.
• It can halt execution at breakpoints and provide
information on called functions and the current values of
variables.
Homework
1.2 Given the C assignment
a[i+1] = a[i] + 2
draw a parse tree and a syntax tree for the
expression, using the similar example in this chapter as a guide.