CS416 Compiler Design


CS308 Compiler Principles
Introduction
Fan Wu
Department of Computer Science and Engineering
Shanghai Jiao Tong University
Why study compiling?
• Importance:
– Programs written in high-level languages have to be translated into binary
code before execution
– Reduce execution overhead of the programs
– Make high-performance computer architectures effective on users'
programs
• Influence:
– Language Design
– Computer Architecture (influence is bi-directional)
• Techniques used influence other areas
– Text editors, information retrieval systems, and pattern recognition programs
– Query processing systems such as SQL
– Equation solvers
– Natural Language Processing
– Debugging and finding security holes in code
– …
Compiler Concept
• A compiler is a program that takes a
program written in a source language and
translates it into an equivalent program in
a target language.
[Diagram: source program (normally a program written in a high-level programming language) → COMPILER → target program (normally the equivalent program in machine code or a relocatable object file); the compiler also reports error messages.]
Interpreter
• An interpreter directly executes the
operations specified in the source program
on inputs supplied by the user.
[Diagram: source program and input → INTERPRETER → output; the interpreter also reports error messages.]
Programming Languages
• Compiled languages:
– Fortran, Pascal, C, C++, C#, Delphi, Visual
Basic, …
• Interpreted languages:
– BASIC, Perl, PHP, Ruby, TCL, MATLAB,…
• Jointly compiled and interpreted languages:
– Java, Python, …
Compiler vs. Interpreter
• Preprocessing
– Compilers do extensive preprocessing
– Interpreters run programs “as is”, with little or
no preprocessing
• Efficiency
– The target program produced by a compiler usually runs much faster than
interpreting the source code
Compiler Structure
[Diagram: Source Language → Front End (language specific; performs Analysis) → Intermediate Language → Back End (machine specific; performs Synthesis) → Target Language. Both ends share the Symbol Table.]
• Separation of Concerns
• Retargeting
Two Main Phases
• Analysis Phase: breaks up a source
program into constituent pieces and
produces an internal representation of it
called intermediate code.
• Synthesis Phase: translates the
intermediate code into the target program.
Phases of Compilation
• Compilers work in a sequence of phases.
• Each phase transforms the source program from one
representation into another representation.
• The phases use the symbol table to store information about the entire
source program.
[Diagram: Source Language → Lexical Analyzer → Syntax Analyzer → Semantic Analyzer → Intermediate Code Generator (the Analysis phases, producing the Intermediate Language) → Code Optimizer → Code Generator (the Synthesis phases) → Target Language. All phases share the Symbol Table.]
A Model of a Compiler Front End
• Lexical analyzer reads the source program character by character and
returns the tokens of the source program.
• Parser creates the tree-like syntactic structure of the given program.
• Intermediate-code generator translates the syntax tree into three-address codes.
Lexical Analysis
Lexical Analysis
• Lexical Analyzer reads the source
program character by character and
returns the tokens of the source program.
<token-name, attribute-value>
• A token describes a pattern of characters
having the same meaning in the source
program. (such as identifiers, operators,
keywords, numbers, delimiters, and so on)
<NUM, 60>
White Space Removal
• Blanks, tabs, newlines, and comments do not appear in the grammar
• The lexical analyzer skips white space before passing tokens to the parser
Constants
• When a sequence of digits appears in the input
stream, the lexical analyzer passes to the parser a
token consisting of the terminal num along with an
integer-valued attribute computed from the digits.
31+28+59 → <num, 31><+><num, 28><+><num, 59>
• Simulate parsing some number ....
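As a rough illustration (not the course's code; the class and method names here are made up), a Java sketch of a scanner loop that skips white space and groups a run of digits into a <num, value> token:

import java.io.IOException;

class Num {
    final int value;
    Num(int v) { value = v; }
}

class DigitLexer {
    private int peek = ' ';

    // Returns the next token: a Num for a run of digits, or the character itself.
    Object scan() throws IOException {
        // skip blanks, tabs, and newlines
        while (peek == ' ' || peek == '\t' || peek == '\n') {
            peek = System.in.read();
        }
        // group consecutive digits into a single <num, value> token
        if (Character.isDigit((char) peek)) {
            int value = 0;
            do {
                value = 10 * value + Character.digit((char) peek, 10);
                peek = System.in.read();
            } while (Character.isDigit((char) peek));
            return new Num(value);      // e.g. the characters "60" become <NUM, 60>
        }
        // operators, delimiters, etc. are returned as single-character tokens
        Object token = (char) peek;
        peek = ' ';
        return token;
    }
}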
Keywords and Identifiers
Keywords:
Fixed character strings used as punctuation marks or
to identify constructs.
Identifiers:
A character string forms an identifier only if it is not a
keyword.
Lexical Analysis Cont’d
• Puts information about identifiers into the
symbol table.
• Regular expressions are used to describe
tokens (lexical constructs).
• A (Deterministic) Finite State Automaton
can be used in the implementation of a
lexical analyzer.
Symbol Table
Symbol Table
• Symbol Tables are data structures that are
used by compilers to hold information about
the source-program constructs.
• For each identifier, there is an entry in the
symbol table containing its information.
• Symbol tables need to support multiple
declarations of the same identifier
– One symbol table per scope (of declaration)...
{ int x; char y; { bool y; x; y; } x; y; }
Outer symbol table: x : int, y : char
Inner symbol table: y : bool (chained to the outer table)
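A minimal Java sketch of this chaining (illustrative names such as Env, put, get are assumptions, not the course's code): each scope gets its own table, and lookup falls back to the table of the enclosing scope.

import java.util.HashMap;
import java.util.Map;

// One symbol table per scope; 'prev' points to the enclosing scope's table.
class Env {
    private final Map<String, String> table = new HashMap<>();
    private final Env prev;

    Env(Env prev) { this.prev = prev; }

    void put(String id, String type) { table.put(id, type); }

    // Search this scope first, then the enclosing scopes.
    String get(String id) {
        for (Env e = this; e != null; e = e.prev) {
            String t = e.table.get(id);
            if (t != null) return t;
        }
        return null;
    }
}

class ScopeDemo {
    public static void main(String[] args) {
        Env outer = new Env(null);          // { int x; char y;
        outer.put("x", "int");
        outer.put("y", "char");
        Env inner = new Env(outer);         //   { bool y;
        inner.put("y", "bool");
        System.out.println(inner.get("x")); // int  (found in the outer scope)
        System.out.println(inner.get("y")); // bool (inner declaration hides the outer one)
        System.out.println(outer.get("y")); // char (back in the outer scope)
    }
}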
Parsing
• A Syntax/Semantic Analyzer (Parser) creates the syntactic
structure (generally a parse tree) of the given program.
• Parsing is the problem of taking a string of terminals and
figuring out how to derive it from the start symbol of the
grammar
Syntax Definition
• Context-Free Grammar (CFG) is used to specify
the syntax of a formal language (for example a
programming language like C, Java)
• Grammar describes the structure (usually
hierarchical) of programming languages.
– Example: in Java an if statement should fit the form
• if ( expression ) statement else statement
– Production:
statement → if ( expression ) statement else statement
– Note the recursive nature of statement.
Definition of CFG
• Four components:
– A set of terminal symbols (tokens):
elementary symbols of the language defined
by the grammar
– A set of non-terminals (syntactic variables):
represent the set of strings of terminals
– A set of productions: non-terminal → a
sequence of terminals and/or non-terminals
– A designation of one of the non-terminals as
the start symbol.
A Grammar Example
List of digits separated by plus or minus signs
• Accepts strings such as 9-5+2, 3-1, or 7.
• 0, 1, …, 9, +, - are the terminal symbols
• list and digit are non-terminals
• Every “line” is a production
• list is the start symbol
• Grouping: list → list + digit | list – digit | digit
Derivations
• A grammar derives strings by beginning with
the start symbol and repeatedly replacing a
non-terminal by the body of a production
• Language: the set of terminal strings that can be
derived from the start symbol of the grammar.
• Example: Derivation of 9-5+2
– 9 is a list, since 9 is a digit.
– 9-5 is a list, since 9 is a list and 5 is a digit.
– 9-5+2 is a list, since 9-5 is a list and 2 is a digit.
Parse Trees
• A parse tree shows how the start symbol
of a grammar derives a string in the
language
Example: for a production A → XYZ, an interior node labeled A has children X, Y, Z from left to right.
Parse Trees Properties
• The root is labeled by the start symbol.
• Each leaf is labeled by a terminal or by ε.
• Each interior node is labeled by a non-terminal.
• If A is the non-terminal labeling some interior
node and X1, X2, …, Xn are the labels of the
children of that node from left to right, then
there must be a production A → X1 X2 ··· Xn.
Parse Tree for 9-5+2
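The tree itself is not reproduced in this transcript; under the grammar above, with left-associative grouping, it has the following shape (a sketch):

              list
            /  |   \
         list  +  digit
        /  |  \      |
     list  -  digit  2
       |        |
     digit      5
       |
       9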
Ambiguity
• A grammar can have more than one parse
tree generating a given string of terminals.
list → list + digit | list – digit | digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
string → string + string | string - string | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
The string 9-5+2 has two parse trees under the string grammar: one grouping it as (9-5)+2 = 6 and one as 9-(5+2) = 2.
Eliminating Ambiguity
• Operator Associativity: in most
programming languages arithmetic operators
have left associativity.
– Example: 9+5-2 = (9+5)-2
– Exception: Assignment operator = has right
associativity: a=b=c is equivalent to a=(b=c)
• Operator Precedence: if an operator has
higher precedence, then it binds to its
operands first.
– Example: * has higher precedence than +,
therefore 9+5*2 = 9+(5*2)
Parsing
• Parsing is the process of determining how
a string of terminals can be generated by a
grammar.
• Two classes:
– Top-down: construction of parse tree starts at
the root and proceeds towards the leaves
– Bottom-up: construction of parse tree starts at
the leaves and proceeds towards the root
Top-Down Parsing
• The top-down construction of a parse tree is
done by starting from the root, and repeatedly
performing the following two steps.
– At node N, labeled with non-terminal A, select the
proper production of A and construct children at N
for the symbols in the production body.
– Find the next node at which a subtree is to be
constructed, typically the leftmost unexpanded
non-terminal of the tree.
Predictive Parsing
• Recursive descent parsing: a top-down
method of syntax analysis in which a set of
recursive procedures is used to process
the input.
• Predictive parsing: a simple form of
recursive-descent parsing
– The lookahead symbol unambiguously
determines the flow of control, based on the
first terminal(s) of each body of the non-terminal
Procedure for stmt
Necessary condition to use predictive parsing: no conflict among the first symbols of the bodies for the same head.
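The procedure shown on the slide is not reproduced here. A self-contained Java sketch of the idea, assuming the simplified grammar stmt → if ( expr ) stmt else stmt | other from the earlier example; the token codes are made up and expr is stubbed to a single token:

class StmtParser {
    // illustrative token codes
    static final int IF = 256, ELSE = 257, EXPR = 258, OTHER = 259;
    static final int LP = '(', RP = ')';

    private final int[] tokens;   // pre-lexed token stream
    private int pos = 0;
    private int lookahead;

    StmtParser(int[] tokens) { this.tokens = tokens; lookahead = tokens[0]; }

    // stmt -> if ( expr ) stmt else stmt | other
    // The lookahead token alone decides which body to use.
    void stmt() {
        if (lookahead == IF) {
            match(IF); match(LP); expr(); match(RP);
            stmt(); match(ELSE); stmt();
        } else if (lookahead == OTHER) {
            match(OTHER);
        } else {
            throw new RuntimeException("syntax error in stmt");
        }
    }

    // expr is stubbed out as a single EXPR token for this sketch
    void expr() { match(EXPR); }

    // match() advances the input if the lookahead is the expected token
    void match(int t) {
        if (lookahead != t) throw new RuntimeException("expected token " + t);
        pos++;
        lookahead = pos < tokens.length ? tokens[pos] : -1;
    }

    public static void main(String[] args) {
        // if ( expr ) other else other
        new StmtParser(new int[]{IF, LP, EXPR, RP, OTHER, ELSE, OTHER}).stmt();
        System.out.println("parsed OK");
    }
}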
Left Recursion Elimination
• Left recursion: the leftmost symbol of the body is the same as
the non-terminal at the head.
• A left-recursive production can be eliminated
by rewriting the offending production, as shown below:
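The slide's rewriting is not reproduced in this transcript; the standard transformation replaces a left-recursive pair of productions

A → A α | β

with an equivalent right-recursive pair that uses a new non-terminal R:

A → β R
R → α R | ε

For example, list → list + digit | digit becomes list → digit rest together with rest → + digit rest | ε.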
Syntax Analyzer vs. Lexical Analyzer
• Both of them do similar things
• Granularity
– The lexical analyzer works on the characters to
recognize the smallest meaningful units (tokens)
in a source program.
– The syntax analyzer works on the smallest
meaningful units (tokens) in a source program to
recognize meaningful structures in the
programming language.
• Recursion
– The lexical analyzer deals with simple non-recursive constructs of the language.
– The syntax analyzer deals with recursive
constructs of the language.
Semantic Analysis
• Semantic Analyzer
– adds semantic information to the parse tree (syntax-directed translation)
– checks the source program for semantic errors
– collects type information for the code generation
– type checking: check whether each operator has
matching operands
– coercion: type conversion
Semantic Analysis
• A Semantic Analyzer checks the source
program for semantic errors and collects the
type information for the code generation.
• Type checking is an important part of semantic
analysis.
[Figure: syntax tree annotated with type information, yielding a semantic tree]
Syntax-Directed Translation
• Syntax-directed translation is done by
attaching rules or program fragments to
productions in a grammar.
• Infix expression → postfix expression
• Techniques: Attributes & Translation
Schemes
Postfix Notation
• Definition:
– If E is a variable or constant,
• E → E
– If E is an expression of the form E1 op E2,
• E1 op E2 → E’1 E’2 op
– If E is a parenthesized expression of the form (E1),
• (E1) → E’1
• Examples:
– 9-5+2 → 95-2+
– 9-(5+2) → 952+-
Attributes
• A syntax-directed definition
– associates attributes with non-terminals and
terminals in a grammar
– attaches semantic rules to the productions of
the grammar
• An attribute is said to be synthesized if its
value at a parse-tree node is determined
from attribute values of its children and
itself.
Semantic Rules for Infix to Postfix
[Figure: syntax-directed definition and annotated parse tree for translating 9-5+2 → 95-2+]
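The definition itself is not reproduced in this transcript. A reconstruction along standard textbook lines (so the exact attribute names are an assumption): each non-terminal gets a synthesized string-valued attribute t, and || denotes string concatenation.

Production              Semantic rule
expr → expr1 + term     expr.t = expr1.t || term.t || '+'
expr → expr1 - term     expr.t = expr1.t || term.t || '-'
expr → term             expr.t = term.t
term → 0                term.t = '0'
term → 1                term.t = '1'
…                       …
term → 9                term.t = '9'

Evaluating these rules bottom-up over the parse tree of 9-5+2 yields the attribute 95-2+ at the root.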
Translation Schemes
• A Syntax-Directed Translation Scheme is
a notation for specifying a translation by
attaching program fragments to
productions in a grammar.
• The program fragments are called
semantic actions.
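The scheme discussed on the following slide is not reproduced in this transcript; a sketch along standard textbook lines (a hedged reconstruction, not the slide's exact scheme): the actions in braces are executed when the parser reaches that point, so the postfix translation is printed incrementally.

expr → expr1 + term { print('+') }
expr → expr1 - term { print('-') }
expr → term
term → 0 { print('0') }
term → 1 { print('1') }
…
term → 9 { print('9') }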
A Translation Scheme
[Figure: parse tree for 9-5+2 with embedded semantic actions, and the translation scheme producing 95-2+]
Attribute vs. Translation Scheme
• A syntax-directed definition attaches strings
as attributes to the nodes in the parse tree
• A syntax-directed translation scheme prints
the translation incrementally, through
semantic actions
A Simple Translator
Grammar for a list of digits separated by plus or minus signs
Translation of 9-5+2 to 95-2+
[Figure: translation steps for 9-5+2, with left recursion eliminated]
Procedures for Simple Translator
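The procedures themselves are not reproduced in this transcript. A self-contained Java sketch (illustrative names and I/O handling, not the course's code) for the left-recursion-eliminated grammar expr → term rest, rest → + term { print('+') } rest | - term { print('-') } rest | ε, term → digit { print(digit) }:

import java.io.IOException;

class SimpleTranslator {
    private int lookahead;   // one-character lookahead read from standard input

    SimpleTranslator() throws IOException { lookahead = System.in.read(); }

    // expr -> term rest
    void expr() throws IOException {
        term();
        rest();
    }

    // rest -> + term { print('+') } rest | - term { print('-') } rest | empty
    // (the tail recursion is written as a loop)
    void rest() throws IOException {
        while (true) {
            if (lookahead == '+') {
                match('+'); term(); System.out.write('+');
            } else if (lookahead == '-') {
                match('-'); term(); System.out.write('-');
            } else {
                break;      // empty production
            }
        }
    }

    // term -> digit { print(digit) }
    void term() throws IOException {
        if (Character.isDigit((char) lookahead)) {
            System.out.write((char) lookahead);
            match(lookahead);
        } else {
            throw new RuntimeException("syntax error");
        }
    }

    void match(int t) throws IOException {
        if (lookahead == t) lookahead = System.in.read();
        else throw new RuntimeException("syntax error");
    }

    public static void main(String[] args) throws IOException {
        new SimpleTranslator().expr();   // e.g. input 9-5+2 prints 95-2+
        System.out.flush();
    }
}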
Syntax vs. Semantics
• The syntax of a programming language
describes the proper form of its programs.
• The semantics of the language defines
what its programs mean, what each
program does when it executes.
Intermediate Code Generation
Intermediate Code Generation
• The front end of a compiler constructs an
intermediate representation of the source
program from which the back end
generates the target program.
• Two kinds of intermediate representations
– Tree: parse trees and (abstract) syntax trees
– Linear representation: three-address code
Three-Address Codes
• Three-address code is a sequence of instructions of
the form
x = y op z
• Arrays will be handled by using the following two
variants of instructions:
x[y]=z
x=y[z]
• Instructions for control flow:
ifFalse x goto L
ifTrue x goto L
goto L
• Instruction for copying value
x=y
Translation of Statements
• Use jump instructions to implement the
flow of control through the statement.
• The translation of if expr then stmt1 (a sketch follows below)
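The generated code is not shown in this transcript; a hedged sketch of the usual pattern, using the ifFalse instruction and a fresh label (here called after, as in the exercise later in this lecture):

code to evaluate expr into a temporary x
ifFalse x goto after
code for stmt1
after: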
Translation of Expressions
• Approach:
– No code is generated for identifiers and constants
– If a node x of class Expr has operator op, then an
instruction is emitted to compute the value at node x into a
temporary.
• Expression: i-j+k translates into
t1 = i-j
t2 = t1+k
• Expression: 2 * a[i] translates into
t1 = a [ i ]
t2 = 2 * t1
* Do not use a temporary in place of a[i], if a[i]
appears on the left side of an assignment.
Test Yourself
• Generate three-address codes for
If(x[2*a]==y[b]) x[2*a+1]=y[b+1];
t4=2*a
t2=x[t4]
t3=y[b]
t1= t2 == t3
ifFalse t1 goto after
t5=t4+1
t7=b+1
t6=y[t7]
x[t5]=t6
after:
Code Optimization
• The code optimizer improves the code
produced by the intermediate code
generator in terms of time and space.
Code Generation
• The code generator takes as input an
intermediate representation of the source
program and maps it into the target
language.
• Example:
MOVE id3, R1
MULT #60.0, R1
ADD id2, R1
MOVE R1, id1
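A plausible reading (assuming MULT and ADD accumulate into R1): this sequence computes id1 = id2 + id3 * 60.0, i.e. target code for an assignment such as position = initial + rate * 60, where id1, id2, id3 stand for symbol-table entries.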
Tools
• Lexical Analysis – Lex, Flex, JLex
• Syntax Analysis – Yacc, JavaCC, SableCC
• Semantic Analysis – Yacc, JavaCC, SableCC
Homework
• Reading
– Chapters 1 and 2