Compiler - Tunghai University
Chapter 2
Chang Chi-Chung
2008.03 rev.1
A Simple Syntax-Directed Translator
This chapter contains introductory material to
Chapters 3 to 8
Goal: create a syntax-directed translator that maps
infix arithmetic expressions into postfix
expressions.
Building a simple compiler involves:
Defining the syntax of a programming language
Developing a source code parser: for our compiler
we will use predictive parsing
Implementing syntax directed translation to
generate intermediate code
A Code Fragment To Be Translated
The syntax-directed translator is then extended to map code fragments into three-address code. See Appendix A.
{
int i; int j;
float[100] a; float v; float x;
while (true) {
do i = i + 1; while ( a[i] < v );
do j = j - 1; while ( a[j] > v );
if ( i>= j ) break;
x = a[i]; a[i] = a[j]; a[j] = x;
}
}
1: i = i + 1
2: t1 = a [ i ]
3: if t1 < v goto 1
4: j = j - 1
5: t2 = a [ j ]
6: if t2 > v goto 4
7: ifFalse i >= j goto 9
8: goto 14
9: x = a [ i ]
10: t3 = a [ j ]
11: a [ i ] = t3
12: a [ j ] = x
13: goto 1
14:
A Model of a Compiler Front End
Figure: source program (character stream) → Lexical analyzer → token stream → Parser → syntax tree → Intermediate code generator → three-address code. The symbol table is shared by the phases.
Two Forms of Intermediate Code
Abstract syntax trees
Three-address instructions
Example: do i = i + 1; while ( a[i] < v );
Abstract syntax tree:
do-while
  body
    assign
      i
      +
        i
        1
  <
    []
      a
      i
    v
Three-address instructions:
1: i = i + 1
2: t1 = a [ i ]
3: if t1 < v goto 1
Syntax Definition
Using Context-free grammar (CFG)
BNF: Backus-Naur Form
A context-free grammar has four components:
A set of tokens (terminal symbols)
A set of nonterminals
A set of productions
A designated start symbol
Example of CFG
G = <T, N, P, S>
T = { +,-,0,1,2,3,4,5,6,7,8,9 }
N = { list, digit }
P =
list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
S = list
Derivations
A CFG generates a language: the set of all strings (sequences of tokens)
that can be derived from the start symbol.
Begin with the start symbol
Repeatedly replace a nonterminal symbol in the
current sentential form with one of the right-hand
sides of a production for that nonterminal
Example of the Derivations
list
⇒ list + digit
⇒ list - digit + digit
⇒ digit - digit + digit
⇒ 9 - digit + digit
⇒ 9 - 5 + digit
⇒ 9 - 5 + 2
Productions
list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Leftmost derivation
replaces the leftmost nonterminal in each step (the derivation of 9-5+2 above is leftmost).
Rightmost derivation
replaces the rightmost nonterminal in each step.
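The leftmost derivation of 9-5+2 can be replayed mechanically. The following is a minimal sketch, not part of the original slides: the class name LeftmostDerivation and its methods are hypothetical, and the sequence of productions to apply is hard-coded rather than discovered by a parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class LeftmostDerivation {
    // Grammar G from the slides; each body is a space-separated symbol string.
    static final Map<String, List<String>> PRODUCTIONS = Map.of(
        "list", List.of("list + digit", "list - digit", "digit"),
        "digit", List.of("0", "1", "2", "3", "4", "5", "6", "7", "8", "9"));

    // Replace the leftmost nonterminal of the sentential form with the given body.
    static List<String> step(List<String> form, String body) {
        for (int i = 0; i < form.size(); i++) {
            List<String> bodies = PRODUCTIONS.get(form.get(i));
            if (bodies == null) continue;               // terminal: keep scanning
            if (!bodies.contains(body))
                throw new IllegalArgumentException("no production " + form.get(i) + " -> " + body);
            List<String> next = new ArrayList<>(form.subList(0, i));
            next.addAll(Arrays.asList(body.split(" ")));
            next.addAll(form.subList(i + 1, form.size()));
            return next;
        }
        throw new IllegalStateException("no nonterminal left");
    }

    // Returns every sentential form in the leftmost derivation of 9-5+2.
    static List<String> derive() {
        String[] bodies = {"list + digit", "list - digit", "digit", "9", "5", "2"};
        List<String> form = List.of("list");            // begin with the start symbol
        List<String> forms = new ArrayList<>();
        forms.add(String.join(" ", form));
        for (String body : bodies) {
            form = step(form, body);
            forms.add(String.join(" ", form));
        }
        return forms;
    }

    public static void main(String[] args) {
        System.out.println(String.join("\n=> ", derive()));
    }
}
```

A rightmost derivation would scan the sentential form from the right end instead; the final string is the same.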
Parse Trees
Given a CFG, a parse tree according to the grammar is a tree
with the following properties.
The root of the tree is labeled by the start symbol
Each leaf of the tree is labeled by a terminal (=token) or ε
Each interior node is labeled by a nonterminal
If A → X1 X2 … Xn is a production, then node A has immediate
children X1, X2, …, Xn where Xi is a (non)terminal or ε (ε denotes
the empty string)
Example
A → XYZ
A
  X
  Y
  Z
Example of the Parse Tree
Parse tree of the string 9-5+2 using grammar G
list
  list
    list
      digit
        9
    -
    digit
      5
  +
  digit
    2
The sequence of leaves is called the yield of the parse tree.
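To make the notion of yield concrete, here is a small sketch that builds the parse tree of 9-5+2 by hand and reads its leaves from left to right. The Node class and the method names are illustrative assumptions, not something defined in the slides.

```java
import java.util.ArrayList;
import java.util.List;

class ParseTreeYield {
    static class Node {
        final String label;                       // nonterminal or terminal
        final List<Node> children = new ArrayList<>();
        Node(String label, Node... kids) {
            this.label = label;
            for (Node k : kids) children.add(k);
        }
    }

    // The yield is the left-to-right concatenation of the leaf labels.
    static String yieldOf(Node n) {
        if (n.children.isEmpty()) return n.label;
        StringBuilder sb = new StringBuilder();
        for (Node c : n.children) sb.append(yieldOf(c));
        return sb.toString();
    }

    // Parse tree of 9-5+2 under list -> list + digit | list - digit | digit.
    static Node example() {
        Node nine = new Node("list", new Node("digit", new Node("9")));
        Node nineMinusFive = new Node("list", nine, new Node("-"),
                                      new Node("digit", new Node("5")));
        return new Node("list", nineMinusFive, new Node("+"),
                        new Node("digit", new Node("2")));
    }

    public static void main(String[] args) {
        System.out.println(yieldOf(example()));   // prints 9-5+2
    }
}
```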
Ambiguity
Consider the following context-free grammar
G = <{string}, {+,-,0,1,2,3,4,5,6,7,8,9}, P, string>
P = string → string + string | string - string | 0 | 1 | … | 9
This grammar is ambiguous, because more
than one parse tree represents the string 9-5+2
Ambiguity (Cont’d)
Two parse trees for 9-5+2:
Tree 1, grouping (9-5)+2:
string
  string
    string
      9
    -
    string
      5
  +
  string
    2
Tree 2, grouping 9-(5+2):
string
  string
    9
  -
  string
    string
      5
    +
    string
      2
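The ambiguity can be checked by brute force. The sketch below, with the hypothetical class AmbiguityDemo, enumerates every parse tree of an input under string → string + string | string - string | 0 | … | 9 and renders each tree as a fully parenthesized expression. It assumes single-digit operands and no whitespace.

```java
import java.util.ArrayList;
import java.util.List;

class AmbiguityDemo {
    // One fully parenthesized string per parse tree of s.
    static List<String> parses(String s) {
        List<String> result = new ArrayList<>();
        if (s.length() == 1 && Character.isDigit(s.charAt(0))) {
            result.add(s);                        // string -> digit
            return result;
        }
        // Try every operator position as the root of the tree.
        for (int k = 1; k < s.length() - 1; k++) {
            char op = s.charAt(k);
            if (op != '+' && op != '-') continue;
            for (String left : parses(s.substring(0, k)))
                for (String right : parses(s.substring(k + 1)))
                    result.add("(" + left + op + right + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parses("9-5+2"));      // prints [(9-(5+2)), ((9-5)+2)]
    }
}
```

The two groupings evaluate to 2 and 6 respectively, which is why the ambiguity matters for translation.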
Associativity of Operators
Left-associative
If an operand has an operator on both sides of it, then it
belongs to the operator to its left.
Left-associative operators have left-recursive productions
The string a+b+c has the same meaning as (a+b)+c
left → left + term | term
Right-associative
If an operand has an operator on both sides of it, then it
belongs to the operator to its right.
The string a=b=c has the same meaning as a=(b=c)
Right-associative operators have right-recursive productions
right → term = right | term
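The two grouping rules can be sketched as folds over the operand list: a left-associative operator folds from the left, a right-associative one from the right. The class and method names below are hypothetical and only illustrate the grouping; they do not parse anything.

```java
class AssociativityDemo {
    // left -> left op term | term : the leftmost operands are grouped first.
    static String groupLeft(String op, String... operands) {
        String acc = operands[0];
        for (int i = 1; i < operands.length; i++)
            acc = "(" + acc + op + operands[i] + ")";
        return acc;
    }

    // right -> term op right | term : the rightmost operands are grouped first.
    static String groupRight(String op, String... operands) {
        String acc = operands[operands.length - 1];
        for (int i = operands.length - 2; i >= 0; i--)
            acc = "(" + operands[i] + op + acc + ")";
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(groupLeft("+", "a", "b", "c"));   // prints ((a+b)+c)
        System.out.println(groupRight("=", "a", "b", "c"));  // prints (a=(b=c))
    }
}
```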
Associativity of Operators (cont’d)
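Figure: parse trees for the two kinds of grammar. For the left-associative operator, the tree for a + b + c grows down toward the left, grouping (a + b) + c. For the right-associative operator, the tree for a = b = c (nodes right and letter) grows down toward the right, grouping a = (b = c).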
Precedence of Operators
String 9+5*2 has the same meaning as 9+(5*2)
* has higher precedence than +
Construct a grammar for arithmetic
expressions with precedence of operators.
left-associative: + - (expr)
left-associative: * / (term)
Step 1:
factor → digit | ( expr )
Step 2:
term → term * factor
| term / factor
| factor
Step 3:
expr → expr + term
| expr - term
| term
Step 4:
expr → expr + term | expr - term | term
term → term * factor | term / factor | factor
factor → digit | ( expr )
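One way to see that this grammar encodes precedence is to evaluate expressions with one procedure per nonterminal. The sketch below is an illustration under stated assumptions: the class PrecedenceEvaluator is hypothetical, the left-recursive productions are implemented as loops (which keeps the left-to-right grouping), operands are single digits, and / is integer division.

```java
class PrecedenceEvaluator {
    private final String input;
    private int pos = 0;

    PrecedenceEvaluator(String input) { this.input = input.replace(" ", ""); }

    static int evaluate(String s) {
        PrecedenceEvaluator p = new PrecedenceEvaluator(s);
        int value = p.expr();
        if (p.pos != p.input.length()) throw new IllegalArgumentException("syntax error");
        return value;
    }

    private char peek() { return pos < input.length() ? input.charAt(pos) : '\0'; }

    // expr -> expr + term | expr - term | term
    int expr() {
        int value = term();
        while (peek() == '+' || peek() == '-') {
            char op = input.charAt(pos++);
            int right = term();
            value = (op == '+') ? value + right : value - right;
        }
        return value;
    }

    // term -> term * factor | term / factor | factor
    int term() {
        int value = factor();
        while (peek() == '*' || peek() == '/') {
            char op = input.charAt(pos++);
            int right = factor();
            value = (op == '*') ? value * right : value / right;
        }
        return value;
    }

    // factor -> digit | ( expr )
    int factor() {
        char c = peek();
        if (Character.isDigit(c)) { pos++; return c - '0'; }
        if (c == '(') {
            pos++;
            int value = expr();
            if (peek() != ')') throw new IllegalArgumentException("syntax error");
            pos++;
            return value;
        }
        throw new IllegalArgumentException("syntax error");
    }

    public static void main(String[] args) {
        System.out.println(evaluate("9+5*2"));    // prints 19
    }
}
```

Because term is called from inside expr, every * or / is reduced before the surrounding + or -, which is exactly the effect of placing the higher-precedence operators deeper in the grammar.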
An Example: Syntax of Statements
The grammar describes a subset of Java statements.
Semicolons appear only at the end of productions that do not end in stmt.
This approach prevents the build-up of semicolons
after statements such as if- and while-, which end
with nested substatements.
stmt → id = expression ;
     | if ( expression ) stmt
     | if ( expression ) stmt else stmt
     | while ( expression ) stmt
     | do stmt while ( expression ) ;
     | { stmts }
stmts → stmts stmt
      | ε
Syntax-Directed Translation
Syntax-directed translation is done by attaching rules
or program fragments to productions in a grammar.
In this chapter: translate infix expressions into postfix notation.
Infix: 9 - 5 + 2
Postfix: 9 5 - 2 +
An Example
expr → expr1 + term
The pseudo-code of the translation
translate expr1 ;
translate term ;
handle + ;
Syntax-Directed Translation (Cont’d)
Two concepts (approaches) related to
Syntax-Directed Translation.
Synthesized Attributes
Syntax-directed definition
Build up a translation by attaching strings, computed by semantic
rules, as attributes to the nodes in the parse tree.
Translation Schemes
Syntax-directed translation
Build up a translation by executing program fragments, called
semantic actions, that are embedded within production
bodies.
Syntax-directed definition
A syntax-directed definition associates
With each grammar symbol (terminals and nonterminals), a
set of attributes.
With each production, a set of semantic rules for computing
the values of the attributes associated with the symbols
appearing in the production.
An attribute is said to be
Synthesized
if its value at a parse-tree node is determined from attribute
values at its children and at the node itself.
Inherited
if its value at a parse-tree node is determined from attribute
values at the node itself, its parent, and its siblings in the parse
tree.
An Example: Synthesized Attributes
An annotated parse tree
Suppose a node N in a parse tree is labeled by
grammar symbol X.
X.a denotes the value of attribute a of X at
node N.
Annotated parse tree for 9-5+2:
expr.t = "95-2+"
  expr.t = "95-"
    expr.t = "9"
      term.t = "9"
        9
    -
    term.t = "5"
      5
  +
  term.t = "2"
    2
Semantic Rules
Production               Semantic Rules
expr → expr1 + term      expr.t = expr1.t || term.t || '+'
expr → expr1 - term      expr.t = expr1.t || term.t || '-'
expr → term              expr.t = term.t
term → 0                 term.t = '0'
term → 1                 term.t = '1'
…                        …
term → 9                 term.t = '9'
|| is the operator for string concatenation in the semantic rules.
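The semantic rules in the table can be evaluated bottom-up on a parse tree. The following sketch assumes a hand-built parse tree for 9-5+2 and a hypothetical Node class with a field t for the synthesized attribute; it is one possible encoding, not the slides' implementation.

```java
import java.util.List;

class SynthesizedAttributes {
    static class Node {
        final String symbol;          // "expr", "term", or a terminal
        final List<Node> children;
        String t;                     // synthesized attribute (postfix string)
        Node(String symbol, Node... children) {
            this.symbol = symbol;
            this.children = List.of(children);
        }
    }

    // Children are annotated first, so every rule only reads child attributes.
    static void annotate(Node n) {
        for (Node c : n.children) annotate(c);
        if (n.children.isEmpty()) return;             // terminals carry no attribute
        if (n.symbol.equals("term")) {
            n.t = n.children.get(0).symbol;           // term -> digit : term.t = 'digit'
        } else if (n.children.size() == 1) {
            n.t = n.children.get(0).t;                // expr -> term : expr.t = term.t
        } else {
            // expr -> expr1 op term : expr.t = expr1.t || term.t || op
            n.t = n.children.get(0).t + n.children.get(2).t + n.children.get(1).symbol;
        }
    }

    static Node example() {                           // parse tree of 9-5+2
        Node e9 = new Node("expr", new Node("term", new Node("9")));
        Node e95 = new Node("expr", e9, new Node("-"), new Node("term", new Node("5")));
        return new Node("expr", e95, new Node("+"), new Node("term", new Node("2")));
    }

    public static void main(String[] args) {
        Node root = example();
        annotate(root);
        System.out.println(root.t);                   // prints 95-2+
    }
}
```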
Depth-First Traversals
Tree traversals
Breadth-First
Depth-First
Preorder: N L R
Inorder: L N R
Postorder: L R N
Depth-First Traversal: postorder, from left to right
procedure visit(node N)
{
    for ( each child C of N, from left to right )
    {
        visit(C);
    }
    evaluate semantic rules at node N;
}
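The three depth-first orders differ only in when the node itself is handled relative to its children. As a sketch, assume the abstract syntax tree of 9-5+2 (a + node whose left child is a - node); the class Traversals and its Node type are hypothetical.

```java
class Traversals {
    static class Node {
        final char label;
        final Node left, right;
        Node(char label, Node left, Node right) {
            this.label = label; this.left = left; this.right = right;
        }
        Node(char label) { this(label, null, null); }
    }

    // N L R
    static String preorder(Node n) {
        return n == null ? "" : n.label + preorder(n.left) + preorder(n.right);
    }

    // L N R
    static String inorder(Node n) {
        return n == null ? "" : inorder(n.left) + n.label + inorder(n.right);
    }

    // L R N: the order used by visit(), children first, then the node
    static String postorder(Node n) {
        return n == null ? "" : postorder(n.left) + postorder(n.right) + n.label;
    }

    static Node example() {
        return new Node('+', new Node('-', new Node('9'), new Node('5')), new Node('2'));
    }

    public static void main(String[] args) {
        System.out.println(preorder(example()));   // prints +-952
        System.out.println(inorder(example()));    // prints 9-5+2
        System.out.println(postorder(example()));  // prints 95-2+
    }
}
```

On this tree the postorder traversal produces the postfix form directly, which is why synthesized attributes can be evaluated in a single postorder pass.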
Example: Depth-First Traversals
expr.t = 95-2+
  expr.t = 95-
    expr.t = 9
      term.t = 9
        9
    -
    term.t = 5
      5
  +
  term.t = 2
    2
Note: all attributes are the synthesized type
Translation Schemes
A translation scheme is a CFG embedded
with semantic actions
Example
rest → + term { print("+") } rest
Embedded Semantic Action
rest
  +
  term
  { print("+") }
  rest
An Example: Translation Scheme
expr → expr + term { print('+') }
expr → expr - term { print('-') }
expr → term
term → 0 { print('0') }
term → 1 { print('1') }
…
term → 9 { print('9') }
Parse tree for 9-5+2 with the actions as extra leaves:
expr
  expr
    expr
      term
        9
        { print('9') }
    -
    term
      5
      { print('5') }
    { print('-') }
  +
  term
    2
    { print('2') }
  { print('+') }
Parsing
Parsing is the process of determining whether a string of
terminals (tokens) can be generated by a
grammar.
Time complexity:
For any CFG there is a parser that takes at most O(n^3)
time to parse a string of n terminals.
Linear-time algorithms suffice to parse essentially all
languages that arise in practice.
Two kinds of methods
Top-down: constructs a parse tree from root to leaves
Bottom-up: constructs a parse tree from leaves to root
Top-Down Parsing
Recursive descent parsing is a top-down method
of syntax analysis in which a set of recursive
procedures is used to process the input.
One procedure is associated with each nonterminal of a
grammar.
If a nonterminal has multiple productions, each production
is implemented in a branch of a selection statement based
on input lookahead information
Predictive parsing
A special form of recursive descent parsing
The lookahead symbol unambiguously determines the flow
of control through the procedure body for each nonterminal.
An Example: Top-Down Parsing
stmt → expr ;
     | if ( expr ) stmt
     | for ( optexpr ; optexpr ; optexpr ) stmt
     | other
optexpr → ε
        | expr
Parse tree for the input for ( ; expr ; expr ) other:
stmt
  for
  (
  optexpr
    ε
  ;
  optexpr
    expr
  ;
  optexpr
    expr
  )
  stmt
    other
Pseudocode For a Predictive Parser
void stmt() {
    switch ( lookahead ) {
    case expr:
        match(expr); match(';'); break;
    case if:
        match(if); match('(');
        match(expr); match(')');
        stmt(); break;
    case for:
        match(for); match('(');
        optexpr(); match(';');
        optexpr(); match(';');
        optexpr(); match(')');
        stmt(); break;
    case other:
        match(other); break;
    default:
        report("syntax error");
    }
}
Use ε-Productions
optexpr → ε | expr
void optexpr() {
    if ( lookahead == expr ) match(expr);
}
void match(terminal t) {
    if ( lookahead == t )
        lookahead = nextTerminal;
    else
        report("syntax error");
}
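The pseudocode translates almost line for line into Java. The sketch below is one possible rendering under stated assumptions: tokens are whitespace-separated strings such as expr, for and ;, the end of input is marked by a hypothetical "$" token, and the class name PredictiveStmtParser and the helper accepts are not from the slides.

```java
import java.util.List;

class PredictiveStmtParser {
    private final List<String> tokens;
    private int pos = 0;
    private String lookahead;

    PredictiveStmtParser(List<String> tokens) {
        this.tokens = tokens;
        this.lookahead = next();
    }

    private String next() { return pos < tokens.size() ? tokens.get(pos++) : "$"; }

    void match(String t) {
        if (lookahead.equals(t)) lookahead = next();
        else throw new IllegalStateException("syntax error: expected " + t);
    }

    // One branch per production of stmt, selected by the lookahead token.
    void stmt() {
        switch (lookahead) {
            case "expr":
                match("expr"); match(";"); break;
            case "if":
                match("if"); match("("); match("expr"); match(")"); stmt(); break;
            case "for":
                match("for"); match("(");
                optexpr(); match(";"); optexpr(); match(";"); optexpr();
                match(")"); stmt(); break;
            case "other":
                match("other"); break;
            default:
                throw new IllegalStateException("syntax error at " + lookahead);
        }
    }

    // optexpr -> epsilon | expr : do nothing when the lookahead is not expr.
    void optexpr() {
        if (lookahead.equals("expr")) match("expr");
    }

    static boolean accepts(String input) {
        PredictiveStmtParser p = new PredictiveStmtParser(List.of(input.trim().split("\\s+")));
        try {
            p.stmt();
            return p.lookahead.equals("$");       // all input must be consumed
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("for ( ; expr ; expr ) other"));  // prints true
    }
}
```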
Example: Predictive Parsing
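Figure: predictive (LL(1)) parsing of the input for ( ; expr ; expr ) other. The parse tree grows from stmt to the children for ( optexpr ; optexpr ; optexpr ) stmt through the calls match(for) match('(') optexpr() match(';') optexpr() match(';') optexpr() match(')') stmt(), while lookahead advances through the input from left to right.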
FIRST
FIRST(α) is the set of terminals that appear
as the first symbols of one or more strings
generated from α
α is a string of grammar symbols (a sentential form)
Example
FIRST(stmt) = { expr, if, for, other }
FIRST(expr ;) = { expr }
stmt → expr ;
     | if ( expr ) stmt
     | for ( optexpr ; optexpr ; optexpr ) stmt
     | other
Examples: First
type → simple
     | ^ id
     | array [ simple ] of type
simple → integer
       | char
       | num dotdot num
FIRST(simple) = { integer, char, num }
FIRST(^ id) = { ^ }
FIRST(type) = { integer, char, num, ^, array }
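These FIRST sets can be computed mechanically. The sketch below is deliberately restricted: it assumes a grammar with no ε-productions and no left recursion (both hold for the type grammar above), so FIRST of a symbol string depends only on its first symbol. The class FirstSets and its representation of the grammar are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FirstSets {
    // Each nonterminal maps to its production bodies; anything else is a terminal.
    static final Map<String, List<List<String>>> GRAMMAR = Map.of(
        "type", List.of(
            List.of("simple"),
            List.of("^", "id"),
            List.of("array", "[", "simple", "]", "of", "type")),
        "simple", List.of(
            List.of("integer"),
            List.of("char"),
            List.of("num", "dotdot", "num")));

    // FIRST of a non-empty symbol string.
    // Assumes no epsilon-productions and no left recursion.
    static Set<String> first(List<String> symbols) {
        Set<String> result = new TreeSet<>();
        String head = symbols.get(0);
        List<List<String>> bodies = GRAMMAR.get(head);
        if (bodies == null) {
            result.add(head);                     // a terminal starts with itself
        } else {
            for (List<String> body : bodies) result.addAll(first(body));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(first(List.of("simple")));  // prints [char, integer, num]
        System.out.println(first(List.of("^", "id"))); // prints [^]
        System.out.println(first(List.of("type")));    // prints [^, array, char, integer, num]
    }
}
```

A general algorithm must also handle ε-productions (look past a nullable first symbol) and iterate to a fixed point when the grammar is recursive on the left.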
Designing a Predictive Parser
A predictive parser is a program consisting of a
procedure for every nonterminal.
The procedure for nonterminal A:
Decides which A-production to use by examining
the lookahead symbol. Issues to handle:
    Left Factor
    Left Recursion
    ε Production
Mimics the body of the chosen production.
Applying translation scheme
Construct a predictive parser, ignoring the actions.
Copy the actions from the translation scheme into
the parser
Left Factor
Two or more productions for nonterminal A start with the
same symbols.
Example:
stmt → if ( expr ) stmt
     | if ( expr ) stmt else stmt
Use Left Factoring to fix it
stmt → if ( expr ) stmt rest
rest → else stmt | ε
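Left factoring two alternatives amounts to pulling out their longest common prefix. The following sketch shows that step for exactly two alternatives; the class LeftFactoring, the method factor, and the caller-supplied name of the new nonterminal are illustrative assumptions. The output uses -> for the arrow.

```java
import java.util.ArrayList;
import java.util.List;

class LeftFactoring {
    // Factors two alternatives of `head` that share a prefix.
    // `fresh` is the name chosen for the new nonterminal.
    static List<String> factor(String head, String fresh,
                               List<String> alt1, List<String> alt2) {
        int k = 0;                                // length of the common prefix
        while (k < alt1.size() && k < alt2.size() && alt1.get(k).equals(alt2.get(k))) k++;
        if (k == 0) throw new IllegalArgumentException("no common prefix");
        String prefix = String.join(" ", alt1.subList(0, k));
        List<String> out = new ArrayList<>();
        out.add(head + " -> " + prefix + " " + fresh);
        out.add(fresh + " -> " + tail(alt1, k) + " | " + tail(alt2, k));
        return out;
    }

    // What remains of an alternative after the prefix; epsilon if nothing remains.
    private static String tail(List<String> alt, int k) {
        return k == alt.size() ? "\u03b5" : String.join(" ", alt.subList(k, alt.size()));
    }

    public static void main(String[] args) {
        List<String> withElse = List.of("if", "(", "expr", ")", "stmt", "else", "stmt");
        List<String> withoutElse = List.of("if", "(", "expr", ")", "stmt");
        for (String production : factor("stmt", "rest", withElse, withoutElse))
            System.out.println(production);
    }
}
```

After factoring, the parser can commit to the shared prefix first and decide between else stmt and ε only once the lookahead reaches that point.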
Left Recursion
Left Recursive
A production for nonterminal A starts with a self
reference.
A → Aα | β
An Example:
expr → expr + term | term
Rewrite the left recursion as right recursion by
using the following rules.
A → βR
R → αR | ε
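The rewriting rule is mechanical enough to sketch in code. The example below handles only immediate left recursion for a single nonterminal, with bodies given as space-separated strings; the class LeftRecursionElimination and the caller-supplied name for R are hypothetical. The output uses -> for the arrow.

```java
import java.util.ArrayList;
import java.util.List;

class LeftRecursionElimination {
    // Rewrites A -> A a1 | A a2 | ... | b1 | b2 | ...
    // into     A -> b1 R | b2 R | ...   and   R -> a1 R | a2 R | ... | epsilon
    static List<String> eliminate(String a, String r, List<String> bodies) {
        List<String> alphas = new ArrayList<>();  // tails of the left-recursive bodies
        List<String> betas = new ArrayList<>();   // bodies that do not start with A
        for (String body : bodies) {
            if (body.equals(a)) throw new IllegalArgumentException("A -> A is not allowed");
            if (body.startsWith(a + " ")) alphas.add(body.substring(a.length() + 1));
            else betas.add(body);
        }
        List<String> aAlts = new ArrayList<>();
        for (String beta : betas) aAlts.add(beta + " " + r);
        List<String> rAlts = new ArrayList<>();
        for (String alpha : alphas) rAlts.add(alpha + " " + r);
        rAlts.add("\u03b5");                      // epsilon ends the repetition
        return List.of(a + " -> " + String.join(" | ", aAlts),
                       r + " -> " + String.join(" | ", rAlts));
    }

    public static void main(String[] args) {
        for (String p : eliminate("expr", "rest", List.of("expr + term", "expr - term", "term")))
            System.out.println(p);
    }
}
```

Both grammars generate the same strings (a β followed by zero or more α's), but only the rewritten one can be parsed top-down without looping forever.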
Example: Left and Right Recursive
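Figure: two trees that generate the same string β α α … α. The left-recursive tree (A → Aα | β) grows down toward the left, with β at the bottom. The right-recursive tree (A → βR, R → αR | ε) starts with β and grows down toward the right through R nodes, ending in ε.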
Abstract and Concrete Syntax
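Abstract syntax tree for 9-5+2:
+
  -
    9
    5
  2
Concrete syntax tree (parse tree) for 9-5+2; the slide marks a term node as a "helper":
expr
  expr
    expr
      term
        9
    -
    term
      5
  +
  term
    2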
Conclusion: Parsing and Translation Scheme
Given a CFG G as below:
expr → expr + term { print('+') }
expr → expr - term { print('-') }
expr → term
term → 0 { print('0') }
term → 1 { print('1') }
…
term → 9 { print('9') }
The semantic actions translate the input into postfix notation.
Conclusion: Parsing and Translation Scheme
Step 1
Eliminate left recursion
Technique
A → Aα | Aβ | γ
into
A → γR
R → αR | βR | ε
Use the rule to transform G.
Conclusion: Parsing and Translation Scheme
Left-Recursion-elimination
expr → term rest
rest → + term { print('+') } rest
     | - term { print('-') } rest
     | ε
term → 0 { print('0') }
term → 1 { print('1') }
…
term → 9 { print('9') }
An Example: Left-Recursion-elimination
Parse tree for 9-5+2 with the actions as extra leaves:
expr
  term
    9
    { print('9') }
  rest
    -
    term
      5
      { print('5') }
    { print('-') }
    rest
      +
      term
        2
        { print('2') }
      { print('+') }
      rest
        ε
expr → term rest
rest → + term { print('+') } rest
     | - term { print('-') } rest
     | ε
term → 0 { print('0') } | 1 { print('1') } | … | 9 { print('9') }
Conclusion: Parsing and Translation Scheme
Step 2
Procedures for Nonterminals.
void expr() {
    term(); rest();
}
void rest() {
    if ( lookahead == '+' ) {
        match('+'); term();
        print('+'); rest();
    }
    else if ( lookahead == '-' ) {
        match('-'); term();
        print('-'); rest();
    }
    else { } // do nothing with the input
}
void term() {
    if ( lookahead is a digit ) {
        t = lookahead; match(lookahead);
        print(t);
    }
    else
        report("syntax error");
}
Conclusion: Parsing and Translation Scheme
Step 3
Simplifying the Translator
Before: rest() calls itself as its last action (tail recursion).
void rest() {
    if ( lookahead == '+' ) {
        match('+'); term();
        print('+'); rest();
    }
    else if ( lookahead == '-' ) {
        match('-'); term();
        print('-'); rest();
    }
    else { }
}
After: the tail-recursive calls are replaced by a loop.
void rest() {
    while ( true ) {
        if ( lookahead == '+' ) {
            match('+'); term();
            print('+'); continue;
        }
        else if ( lookahead == '-' ) {
            match('-'); term();
            print('-'); continue;
        }
        break;
    }
}
Conclusion: Parsing and Translation Scheme
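Complete
import java.io.*;

class Parser {
    static int lookahead;   // current input character (the lookahead symbol)

    public Parser() throws IOException {
        lookahead = System.in.read();
    }

    // expr -> term rest, with rest() folded into a loop
    void expr() throws IOException {
        term();
        while ( true ) {
            if ( lookahead == '+' ) {
                match('+'); term();
                System.out.write('+');
                continue;
            }
            else if ( lookahead == '-' ) {
                match('-'); term();
                System.out.write('-');
                continue;
            }
            else return;
        }
    }

    // term -> 0 | 1 | ... | 9, printing the digit
    void term() throws IOException {
        if ( Character.isDigit((char)lookahead) ) {
            System.out.write((char)lookahead);
            match(lookahead);
        }
        else throw new Error("syntax error");
    }

    // Advance to the next input character if the lookahead equals t
    void match(int t) throws IOException {
        if ( lookahead == t )
            lookahead = System.in.read();
        else throw new Error("syntax error");
    }
}

// Driver (not shown on the slide): reads an expression such as 9-5+2 from standard input
public class Postfix {
    public static void main(String[] args) throws IOException {
        Parser parse = new Parser();
        parse.expr();
        System.out.write('\n');
        System.out.flush();
    }
}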