Compiler - Tunghai University


Chapter 2: A Simple Syntax-Directed Translator
Chang Chi-Chung
2008.03 rev.1

This chapter contains introductory material to
Chapters 3 to 8


Goal: create a syntax-directed translator that maps
infix arithmetic expressions into postfix
expressions.
Building a simple compiler involves:
Defining the syntax of a programming language
Developing a source code parser (our compiler
uses predictive parsing)
Implementing syntax-directed translation to
generate intermediate code
A Code Fragment To Be Translated
The syntax-directed translator is later extended to map code fragments into three-address code. See Appendix A.
{
int i; int j;
float[100] a; float v; float x;
while (true) {
do i = i + 1; while ( a[i] < v );
do j = j - 1; while ( a[j] > v );
if ( i>= j ) break;
x = a[i]; a[i] = a[j]; a[j] = x;
}
}
1: i = i + 1
2: t1 = a [ i ]
3: if t1 < v goto 1
4: j = j - 1
5: t2 = a [ j ]
6: if t2 > v goto 4
7: ifFalse i >= j goto 9
8: goto 14
9: x = a [ i ]
10: t3 = a [ j ]
11: a [ i ] = t3
12: a [ j ] = x
13: goto 1
14:
A Model of a Compiler Front End
Source program (character stream)
-> Lexical analyzer -> token stream
-> Parser -> syntax tree
-> Intermediate code generator -> three-address code
The symbol table is shared by all phases.
Two Forms of Intermediate Code

Abstract syntax trees

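Three-address instructions

Example: do i = i + 1; while ( a[i] < v );

Abstract syntax tree:
do-while
  body: assign
    i
    +
      i
      1
  condition: <
    []
      a
      i
    v

Three-address instructions:
1: i = i + 1
2: t1 = a [ i ]
3: if t1 < v goto 1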
Syntax Definition



Using Context-free grammar (CFG)
BNF: Backus-Naur Form
Context-free grammar has four components:




A set of tokens (terminal symbols)
A set of nonterminals
A set of productions
A designated start symbol
Example of CFG

G = <T, N, P, S>
T = { +, -, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }
N = { list, digit }
P =
  list → list + digit
  list → list - digit
  list → digit
  digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
S = list
Derivations

The language of a CFG is the set of all strings (sequences
of tokens) that can be derived from the start symbol.
To derive a string:
Begin with the start symbol
Repeatedly replace a nonterminal symbol in the
current sentential form with one of the right-hand
sides of a production for that nonterminal
Example of the Derivations
list
⇒ list + digit
⇒ list - digit + digit
⇒ digit - digit + digit
⇒ 9 - digit + digit
⇒ 9 - 5 + digit
⇒ 9 - 5 + 2

Productions
  list → list + digit
  list → list - digit
  list → digit
  digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Leftmost derivation: replaces the leftmost nonterminal in each step.
The derivation of 9-5+2 above is a leftmost derivation.
Rightmost derivation: replaces the rightmost nonterminal in each step.
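The leftmost derivation of 9-5+2 can be replayed mechanically. The following is a minimal sketch, not part of the original slides: the class name LeftmostDerivationSketch and its methods are hypothetical, and the caller is trusted to supply a production body that belongs to the leftmost nonterminal at each step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class LeftmostDerivationSketch {
    // Nonterminals of the example grammar; every other symbol is a terminal.
    static final Set<String> NONTERMINALS = Set.of("list", "digit");

    // One derivation step: replace the leftmost nonterminal in `form` with `body`.
    static List<String> step(List<String> form, List<String> body) {
        List<String> next = new ArrayList<>();
        boolean replaced = false;
        for (String symbol : form) {
            if (!replaced && NONTERMINALS.contains(symbol)) {
                next.addAll(body);   // substitute the production body
                replaced = true;
            } else {
                next.add(symbol);    // copy everything else unchanged
            }
        }
        return next;
    }

    // Start from the start symbol `list` and apply the bodies in order.
    // No check is made that each body is a legal choice (assumption).
    static String derive(List<List<String>> bodies) {
        List<String> form = List.of("list");
        for (List<String> body : bodies) {
            form = step(form, body);
        }
        return String.join(" ", form);
    }
}
```

Feeding it the bodies list + digit, list - digit, digit, 9, 5, 2 in that order reproduces the sentential forms shown in the derivation example.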
Parse Trees

Given a CFG, a parse tree according to the grammar is a tree
with the following properties.
  The root of the tree is labeled by the start symbol
  Each leaf of the tree is labeled by a terminal (= token) or ε
  Each interior node is labeled by a nonterminal
  If A → X1 X2 … Xn is a production, then node A has immediate
  children X1, X2, …, Xn, where each Xi is a terminal or a nonterminal,
  or a single child ε (ε denotes the empty string)

Example
  A → XYZ
  A
    X
    Y
    Z
Example of a Parse Tree

Parse tree of the string 9-5+2 using grammar G
list
  list
    list
      digit
        9
    -
    digit
      5
  +
  digit
    2
The sequence of leaves, read from left to right, is called the
yield of the parse tree. Here the yield is 9-5+2.
Ambiguity

Consider the following context-free grammar
G = <{+,-,0,1,2,3,4,5,6,7,8,9}, {string}, P, string>
P = string → string + string | string - string | 0 | 1 | … | 9

This grammar is ambiguous, because more
than one parse tree represents the string 9-5+2
Ambiguity (Cont’d)
Parse tree 1, grouping (9-5)+2:
string
  string
    string
      9
    -
    string
      5
  +
  string
    2

Parse tree 2, grouping 9-(5+2):
string
  string
    9
  -
  string
    string
      5
    +
    string
      2
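Why ambiguity matters becomes concrete once the two parse trees for 9-5+2 are evaluated. The sketch below is an illustration only: the Node class and the AmbiguitySketch name are hypothetical, and evaluation is not something the slides define for this grammar.

```java
class AmbiguitySketch {
    // A node of a parse tree for the ambiguous grammar:
    // either a digit leaf or an operator with two subtrees.
    static final class Node {
        final char op;          // '+' or '-', or 0 for a digit leaf
        final int value;        // digit value, used only by leaves
        final Node left, right; // subtrees, null for leaves

        Node(int value) {
            this.op = 0; this.value = value; this.left = null; this.right = null;
        }

        Node(Node left, char op, Node right) {
            this.op = op; this.value = 0; this.left = left; this.right = right;
        }
    }

    // Evaluate a tree bottom-up.
    static int eval(Node n) {
        if (n.op == 0) return n.value;
        int l = eval(n.left);
        int r = eval(n.right);
        return n.op == '+' ? l + r : l - r;
    }

    // Tree for the grouping (9-5)+2.
    static Node leftGrouping() {
        return new Node(new Node(new Node(9), '-', new Node(5)), '+', new Node(2));
    }

    // Tree for the grouping 9-(5+2).
    static Node rightGrouping() {
        return new Node(new Node(9), '-', new Node(new Node(5), '+', new Node(2)));
    }
}
```

The first tree evaluates to 6 and the second to 2, so the same string gets two different meanings under the ambiguous grammar.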
Associativity of Operators

Left-associative
  If an operand has an operator on both sides of it, then it
  belongs to the operator to its left.
  String a+b+c has the same meaning as (a+b)+c
  Left-associative operators have left-recursive productions
  left → left + term | term

Right-associative
  If an operand has an operator on both sides of it, then it
  belongs to the operator to its right.
  String a=b=c has the same meaning as a=(b=c)
  Right-associative operators have right-recursive productions
  right → term = right | term
Associativity of Operators (cont’d)
[Figure: two parse trees.
Left-associative: a+b+c is parsed as (a+b)+c, and the tree grows down to the left.
Right-associative: a=b=c is parsed as a=(b=c), and the tree grows down to the right.]
Precedence of Operators



String 9+5*2 has the same meaning as 9+(5*2)
* has higher precedence than +
Construct a grammar for arithmetic expressions that encodes
operator precedence, using one nonterminal per precedence level.
  left-associative: + -  (expr)
  left-associative: * /  (term)

Step 1:
  factor → digit | ( expr )
Step 2:
  term → term * factor
       | term / factor
       | factor
Step 3:
  expr → expr + term
       | expr - term
       | term
Step 4: the resulting grammar
  expr → expr + term | expr - term | term
  term → term * factor | term / factor | factor
  factor → digit | ( expr )
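To see the precedence levels at work, the grammar can be turned into a small evaluator. This is a sketch under assumptions, not the chapter's translator: PrecedenceSketch and evaluate are hypothetical names, the left-recursive productions are written as loops, operands are single digits, whitespace is not accepted, and / is integer division.

```java
class PrecedenceSketch {
    private final String input;
    private int pos = 0;

    private PrecedenceSketch(String input) { this.input = input; }

    // Evaluate an expression such as "9+5*2"; rejects trailing input.
    static int evaluate(String s) {
        PrecedenceSketch p = new PrecedenceSketch(s);
        int v = p.expr();
        if (p.pos != s.length()) throw new IllegalArgumentException("syntax error at " + p.pos);
        return v;
    }

    private char peek() { return pos < input.length() ? input.charAt(pos) : '\0'; }

    // expr -> expr + term | expr - term | term   (lowest precedence)
    private int expr() {
        int v = term();
        while (peek() == '+' || peek() == '-') {
            char op = input.charAt(pos++);
            int r = term();
            v = (op == '+') ? v + r : v - r;   // left-associative accumulation
        }
        return v;
    }

    // term -> term * factor | term / factor | factor   (higher precedence)
    private int term() {
        int v = factor();
        while (peek() == '*' || peek() == '/') {
            char op = input.charAt(pos++);
            int r = factor();
            v = (op == '*') ? v * r : v / r;
        }
        return v;
    }

    // factor -> digit | ( expr )
    private int factor() {
        char c = peek();
        if (Character.isDigit(c)) { pos++; return c - '0'; }
        if (c == '(') {
            pos++;
            int v = expr();
            if (peek() != ')') throw new IllegalArgumentException("expected ) at " + pos);
            pos++;
            return v;
        }
        throw new IllegalArgumentException("syntax error at " + pos);
    }
}
```

Because term is parsed below expr, 5*2 is grouped before the addition in 9+5*2, while parentheses route back through expr and override that grouping.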
An Example: Syntax of Statements


The grammar is a subset of Java statements.
A semicolon appears only in productions that do not end in stmt.
This approach prevents the build-up of semicolons
after statements such as if- and while-, which end
with nested substatements.
stmt → id = expression ;
     | if ( expression ) stmt
     | if ( expression ) stmt else stmt
     | while ( expression ) stmt
     | do stmt while ( expression ) ;
     | { stmts }
stmts → stmts stmt
      | ε
Syntax-Directed Translation


Syntax-Directed translation is done by attaching rules
or program fragments to productions in a grammar.
Translate infix expressions into postfix notation (in
this chapter).
  Infix: 9 - 5 + 2
  Postfix: 9 5 - 2 +
An Example
  expr → expr1 + term
  (the subscript in expr1 only distinguishes the two occurrences of expr)
  The pseudo-code of the translation:
    translate expr1 ;
    translate term ;
    handle + ;

Syntax-Directed Translation (Cont’d)

Two concepts (approaches) related to
Syntax-Directed Translation.

Synthesized Attributes
  Syntax-directed definition
  Build up a translation by attaching strings, computed by semantic
  rules, as attributes to the nodes in the parse tree.
Translation Schemes
  Syntax-directed translation
  Build up a translation by program fragments, which are
  called semantic actions and are embedded within production
  bodies.
Syntax-directed definition

The syntax-directed definition associates



With each grammar symbol (terminals and nonterminals), a
set of attributes.
With each production, a set of semantic rules for computing
the values of the attributes associated with the symbols
appearing in the production.
An attribute is said to be
  Synthesized
    if its value at a parse-tree node is determined from attribute
    values at its children and at the node itself.
  Inherited
    if its value at a parse-tree node is determined from attribute
    values at the node itself, its parent, and its siblings in the parse
    tree.
An Example: Synthesized Attributes

An annotated parse tree


Suppose a node N in a parse tree is labeled by
grammar symbol X.
X.a denotes the value of attribute a of X at
node N.

Annotated parse tree for 9-5+2:
expr.t = "95-2+"
  expr.t = "95-"
    expr.t = "9"
      term.t = "9"
        9
    -
    term.t = "5"
      5
  +
  term.t = "2"
    2
Semantic Rules
Production              Semantic Rule
expr → expr1 + term     expr.t = expr1.t || term.t || '+'
expr → expr1 - term     expr.t = expr1.t || term.t || '-'
expr → term             expr.t = term.t
term → 0                term.t = '0'
term → 1                term.t = '1'
…                       …
term → 9                term.t = '9'
|| is the operator for string concatenation in the semantic rules.
Depth-First Traversals

Tree traversals


Breadth-First
Depth-First




Preorder: N L R
Inorder: L N R
Postorder: L R N
Depth-First Traversal: postorder, from left to right
procedure visit(node N)
{
    for ( each child C of N, from left to right )
    {
        visit(C);
    }
    evaluate semantic rules at node N;
}
Example: Depth-First Traversals
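Annotated parse tree for 9-5+2:
expr.t = 95-2+
  expr.t = 95-
    expr.t = 9
      term.t = 9
        9
    -
    term.t = 5
      5
  +
  term.t = 2
    2
Postorder evaluation order: term.t = 9, expr.t = 9, term.t = 5,
expr.t = 95-, term.t = 2, expr.t = 95-2+
Note: all attributes are of the synthesized type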
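The postorder evaluation of the synthesized attribute t can be sketched in Java as follows. The Node class, the field names, and the way each production is recognized from the number of children are assumptions made for this illustration; the slides only give the visit pseudocode and the semantic rules.

```java
import java.util.List;

class PostorderAttributeSketch {
    static final class Node {
        final String label;        // grammar symbol or token text
        final List<Node> children; // empty for leaves
        String t = "";             // synthesized attribute, filled in by visit()

        Node(String label, Node... children) {
            this.label = label;
            this.children = List.of(children);
        }
    }

    // Postorder, left to right: visit the children first, then apply
    // the semantic rule of the production used at this node.
    static void visit(Node n) {
        for (Node c : n.children) visit(c);
        if (n.children.isEmpty()) {
            n.t = n.label;                       // token leaf: its own text
        } else if (n.children.size() == 1) {
            n.t = n.children.get(0).t;           // expr -> term, term -> digit
        } else {
            // expr -> expr1 op term : expr.t = expr1.t || term.t || op
            n.t = n.children.get(0).t + n.children.get(2).t + n.children.get(1).label;
        }
    }

    // Parse tree for 9-5+2.
    static Node example() {
        Node nine = new Node("expr", new Node("term", new Node("9")));
        Node nineMinusFive =
            new Node("expr", nine, new Node("-"), new Node("term", new Node("5")));
        return new Node("expr", nineMinusFive, new Node("+"), new Node("term", new Node("2")));
    }
}
```

After visit(root) the root holds t = "95-2+", and every inner node holds the value shown in the annotated parse tree.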
Translation Schemes


A translation scheme is a CFG embedded
with semantic actions
Example
  rest → + term { print('+') } rest
Embedded semantic action: in the parse tree, the action is an extra
child of rest, placed between term and the inner rest.
rest
  +
  term
  { print('+') }
  rest
An Example: Translation Scheme
Translation scheme:
expr → expr + term   { print('+') }
expr → expr - term   { print('-') }
expr → term
term → 0             { print('0') }
term → 1             { print('1') }
…
term → 9             { print('9') }

Parse tree with actions for 9-5+2:
expr
  expr
    expr
      term
        9
        { print('9') }
    -
    term
      5
      { print('5') }
    { print('-') }
  +
  term
    2
    { print('2') }
  { print('+') }
Parsing


The process of determining if a string of
terminals (tokens) can be generated by a
grammar.
Time complexity:



For any CFG there is a parser that takes at most O(n^3)
time to parse a string of n terminals.
Linear algorithms suffice to parse essentially all
languages that arise in practice.
Two kinds of methods


Top-down: constructs a parse tree from root to leaves
Bottom-up: constructs a parse tree from leaves to root
Top-Down Parsing

Recursive descent parsing is a top-down method
of syntax analysis in which a set of recursive
procedures is used to process the input.



One procedure is associated with each nonterminal of a
grammar.
If a nonterminal has multiple productions, each production
is implemented in a branch of a selection statement based
on input lookahead information
Predictive parsing


A special form of recursive descent parsing
The lookahead symbol unambiguously determines the flow
of control through the procedure body for each nonterminal.
An Example: Top-Down Parsing
stmt → expr ;
     | if ( expr ) stmt
     | for ( optexpr ; optexpr ; optexpr ) stmt
     | other
optexpr → ε
        | expr

Parse tree for: for ( ; expr ; expr ) other
stmt
  for
  (
  optexpr
    ε
  ;
  optexpr
    expr
  ;
  optexpr
    expr
  )
  stmt
    other
Pseudocode for a predictive parser, for the grammar
stmt → expr ;
     | if ( expr ) stmt
     | for ( optexpr ; optexpr ; optexpr ) stmt
     | other

void stmt() {
    switch ( lookahead ) {
    case expr:
        match(expr); match(';'); break;
    case if:
        match(if); match('(');
        match(expr); match(')');
        stmt(); break;
    case for:
        match(for); match('(');
        optexpr(); match(';');
        optexpr(); match(';');
        optexpr(); match(')');
        stmt(); break;
    case other:
        match(other); break;
    default:
        report("syntax error");
    }
}
Using ε-productions
optexpr → ε | expr
void optexpr() {
    if ( lookahead == expr ) match(expr);
    // otherwise use optexpr → ε and consume nothing
}
void match(terminal t) {
    if ( lookahead == t )
        lookahead = nextTerminal;
    else
        report("syntax error");
}
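The pseudocode above can be made runnable once the terminals are modeled explicitly. The following is a minimal sketch under assumptions: the Tok enum, the class name PredictiveParserSketch, and the accepts helper are hypothetical, expr is treated as a single token exactly as in the slides, and an exception stands in for report("syntax error").

```java
import java.util.List;

class PredictiveParserSketch {
    // Terminals of the example grammar; EOF marks the end of the input.
    enum Tok { EXPR, IF, FOR, OTHER, LPAREN, RPAREN, SEMI, EOF }

    private final List<Tok> tokens;
    private int pos = 0;
    private Tok lookahead;

    PredictiveParserSketch(List<Tok> tokens) {
        this.tokens = tokens;
        this.lookahead = next();
    }

    private Tok next() { return pos < tokens.size() ? tokens.get(pos++) : Tok.EOF; }

    private void match(Tok t) {
        if (lookahead == t) lookahead = next();
        else throw new IllegalStateException("syntax error: expected " + t + ", saw " + lookahead);
    }

    // One branch per stmt-production, selected by the lookahead token.
    void stmt() {
        switch (lookahead) {
            case EXPR:
                match(Tok.EXPR); match(Tok.SEMI); break;
            case IF:
                match(Tok.IF); match(Tok.LPAREN);
                match(Tok.EXPR); match(Tok.RPAREN);
                stmt(); break;
            case FOR:
                match(Tok.FOR); match(Tok.LPAREN);
                optexpr(); match(Tok.SEMI);
                optexpr(); match(Tok.SEMI);
                optexpr(); match(Tok.RPAREN);
                stmt(); break;
            case OTHER:
                match(Tok.OTHER); break;
            default:
                throw new IllegalStateException("syntax error at " + lookahead);
        }
    }

    // optexpr -> expr | ε : the ε-production is chosen when lookahead is not expr.
    void optexpr() {
        if (lookahead == Tok.EXPR) match(Tok.EXPR);
    }

    // Convenience wrapper: true if the whole token list is one stmt.
    static boolean accepts(List<Tok> tokens) {
        try {
            PredictiveParserSketch p = new PredictiveParserSketch(tokens);
            p.stmt();
            return p.lookahead == Tok.EOF;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

For the token sequence of for ( ; expr ; expr ) other, accepts returns true. Dropping the semicolons makes the second match(Tok.SEMI) fail, so the input is rejected.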
Example: Predictive Parsing
Parse tree (LL(1) parsing):
stmt
  for
  (
  optexpr
  ;
  optexpr
  ;
  optexpr
  )
  stmt
Calls made by stmt(), in order:
  match(for) match('(')
  optexpr() match(';')
  optexpr() match(';')
  optexpr() match(')') stmt()
Input: for ( ; expr ; expr ) other
lookahead marks the next unread input token.
FIRST



FIRST(α) is the set of terminals that appear
as the first symbols of one or more strings
generated from α
α is a string of grammar symbols (a sentential form)
Example, for the grammar
stmt → expr ;
     | if ( expr ) stmt
     | for ( optexpr ; optexpr ; optexpr ) stmt
     | other
FIRST(stmt) = { expr, if, for, other }
FIRST(expr ;) = { expr }
Examples: First
type → simple
     | ^ id
     | array [ simple ] of type
simple → integer
       | char
       | num dotdot num
FIRST(simple) = { integer, char, num }
FIRST(^ id) = { ^ }
FIRST(type) = { integer, char, num, ^, array }
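The FIRST sets above can be computed by iterating until nothing changes. This sketch assumes a grammar without ε-productions, as in the type/simple example, so only the first symbol of each production body matters; the class and method names are hypothetical.

```java
import java.util.*;

class FirstSetSketch {
    // The example grammar. A symbol is a nonterminal iff it is a key of the map.
    static Map<String, List<List<String>>> grammar() {
        Map<String, List<List<String>>> g = new LinkedHashMap<>();
        g.put("type", List.of(
            List.of("simple"),
            List.of("^", "id"),
            List.of("array", "[", "simple", "]", "of", "type")));
        g.put("simple", List.of(
            List.of("integer"),
            List.of("char"),
            List.of("num", "dotdot", "num")));
        return g;
    }

    // Fixed-point computation of FIRST(A) for every nonterminal A.
    // Assumption: no production body is empty (no ε-productions).
    static Map<String, Set<String>> firstSets(Map<String, List<List<String>>> g) {
        Map<String, Set<String>> first = new HashMap<>();
        for (String a : g.keySet()) first.put(a, new TreeSet<>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : g.entrySet()) {
                Set<String> target = first.get(e.getKey());
                for (List<String> body : e.getValue()) {
                    String x = body.get(0);
                    if (g.containsKey(x)) {
                        // Body starts with a nonterminal: inherit a copy of its FIRST set.
                        changed |= target.addAll(new ArrayList<>(first.get(x)));
                    } else {
                        // Body starts with a terminal: that terminal is in FIRST.
                        changed |= target.add(x);
                    }
                }
            }
        }
        return first;
    }
}
```

For the example grammar this yields the same sets as listed above: FIRST(simple) contains integer, char, and num, and FIRST(type) adds ^ and array to those.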
Designing a Predictive Parser


A predictive parser is a program consisting of a
procedure for every nonterminal.
The procedure for nonterminal A
  Decides which A-production to use by examining
  the lookahead symbol. Issues to handle:
    Left Factor
    Left Recursion
    ε Production
  Mimics the body of the chosen production.
Applying a translation scheme
  Construct a predictive parser, ignoring the actions.
  Copy the actions from the translation scheme into
  the parser
Left Factor

Left factor
  Two or more productions for nonterminal A start with the
  same symbols, so the lookahead symbol cannot choose between them.
Example:
  stmt → if ( expr ) stmt
       | if ( expr ) stmt else stmt
Use left factoring to fix it:
  stmt → if ( expr ) stmt rest
  rest → else stmt | ε
Left Recursion

Left recursive
  A production for nonterminal A starts with a self
  reference.
  A → Aα | β
  A recursive-descent procedure for A would call itself without
  consuming any input, so it would loop forever.
An example:
  expr → expr + term | term
Rewrite the left recursion as right recursion by
using the following rules.
  A → βR
  R → αR | ε
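The rewriting rule can be applied mechanically. The sketch below handles only immediate left recursion and assumes that every non-recursive body β is non-empty; the class name, the method name eliminate, and the ASCII output format (-> and epsilon) are choices made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class LeftRecursionSketch {
    // Rewrites  A -> A α1 | ... | β1 | ...  into
    //           A -> β1 R | ...   and   R -> α1 R | ... | epsilon
    // `a` is the nonterminal A, `r` is the name chosen for the new nonterminal R.
    static List<String> eliminate(String a, List<List<String>> bodies, String r) {
        List<String> aBodies = new ArrayList<>();
        List<String> rBodies = new ArrayList<>();
        for (List<String> body : bodies) {
            if (!body.isEmpty() && body.get(0).equals(a)) {
                // A -> A α  becomes  R -> α R
                List<String> alpha = body.subList(1, body.size());
                rBodies.add(String.join(" ", alpha) + " " + r);
            } else {
                // A -> β  becomes  A -> β R
                aBodies.add(String.join(" ", body) + " " + r);
            }
        }
        rBodies.add("epsilon");
        return List.of(
            a + " -> " + String.join(" | ", aBodies),
            r + " -> " + String.join(" | ", rBodies));
    }
}
```

Applied to expr → expr + term | term with rest as the new nonterminal, it produces expr -> term rest and rest -> + term rest | epsilon.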
Example: Left and Right Recursive
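[Figure: two trees generating the string β α α … α.
Left recursive (A → Aα | β): A nodes repeat down the left side,
with β at the bottom left.
Right recursive (A → βR, R → αR | ε): β is the leftmost child of A, and R nodes
repeat down the right side, ending in ε.]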
Abstract and Concrete Syntax
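Abstract syntax tree for 9-5+2:
+
  -
    9
    5
  2

Concrete syntax tree (parse tree) for 9-5+2:
expr
  expr
    expr
      term
        9
    -
    term
      5
  +
  term
    2
term is a helper nonterminal: it appears only in the concrete syntax tree.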
Conclusion: Parsing and Translation Scheme

Given the translation scheme G below:
expr → expr + term   { print('+') }
expr → expr - term   { print('-') }
expr → term
term → 0             { print('0') }
term → 1             { print('1') }
…
term → 9             { print('9') }

The semantic actions translate into postfix notation.
Conclusion: Parsing and Translation Scheme

Step 1
  Eliminate left recursion
  Technique: rewrite
    A → Aα | Aβ | γ
  into
    A → γR
    R → αR | βR | ε

Use this rule to transform G.
Conclusion: Parsing and Translation Scheme

After left-recursion elimination
expr → term rest
rest → + term { print('+') } rest
     | - term { print('-') } rest
     | ε
term → 0   { print('0') }
term → 1   { print('1') }
…
term → 9   { print('9') }
An Example: Left-Recursion-elimination
Parse tree with actions for 9-5+2:
expr
  term
    9
    { print('9') }
  rest
    -
    term
      5
      { print('5') }
    { print('-') }
    rest
      +
      term
        2
        { print('2') }
      { print('+') }
      rest
        ε

expr → term rest
rest → + term { print('+') } rest
     | - term { print('-') } rest
     | ε
term → 0 { print('0') } | 1 { print('1') } | … | 9 { print('9') }
Conclusion: Parsing and Translation Scheme

Step 2
  Procedures for nonterminals.
void expr() {
    term(); rest();
}
void rest() {
    if ( lookahead == '+' ) {
        match('+'); term();
        print('+'); rest();
    }
    else if ( lookahead == '-' ) {
        match('-'); term();
        print('-'); rest();
    }
    else { } // do nothing with the input
}
void term() {
    if ( lookahead is a digit ) {
        t = lookahead; match(lookahead);
        print(t);
    }
    else
        report("syntax error");
}
Conclusion: Parsing and Translation Scheme

Step 3
  Simplifying the translator: the recursive calls to rest() are
  tail calls, so they can be replaced by a loop.
Before:
void rest() {
    if ( lookahead == '+' ) {
        match('+'); term();
        print('+'); rest();
    }
    else if ( lookahead == '-' ) {
        match('-'); term();
        print('-'); rest();
    }
    else { }
}
After:
void rest() {
    while ( true ) {
        if ( lookahead == '+' ) {
            match('+'); term();
            print('+'); continue;
        }
        else if ( lookahead == '-' ) {
            match('-'); term();
            print('-'); continue;
        }
        break;
    }
}
Conclusion: Parsing and Translation Scheme

Complete program
import java.io.*;

class Parser {
    static int lookahead;   // next unread input character

    public Parser() throws IOException {
        lookahead = System.in.read();
    }

    // expr -> term rest, with rest() inlined as a loop
    void expr() throws IOException {
        term();
        while ( true ) {
            if ( lookahead == '+' ) {
                match('+'); term();
                System.out.write('+');
                continue;
            }
            else if ( lookahead == '-' ) {
                match('-'); term();
                System.out.write('-');
                continue;
            }
            else return;
        }
    }

    // term -> digit { print(digit) }
    void term() throws IOException {
        if ( Character.isDigit((char) lookahead) ) {
            System.out.write((char) lookahead);
            match(lookahead);
        }
        else throw new Error("syntax error");
    }

    // Consume the expected character and read the next one.
    void match(int t) throws IOException {
        if ( lookahead == t )
            lookahead = System.in.read();
        else throw new Error("syntax error");
    }
}
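To try the translator without standard input, the same three procedures can be driven from a string. This is a hedged variant, not the program from the slides: PostfixSketch and translate are hypothetical names, output is collected in a StringBuilder instead of System.out, and, unlike the original, leftover input after the expression is reported as a syntax error.

```java
class PostfixSketch {
    private final String input;
    private int pos = 0;
    private int lookahead;                     // next unread character, or -1 at end
    private final StringBuilder out = new StringBuilder();

    private PostfixSketch(String input) {
        this.input = input;
        this.lookahead = read();
    }

    private int read() { return pos < input.length() ? input.charAt(pos++) : -1; }

    // Translate an infix string of single digits, + and - into postfix.
    static String translate(String infix) {
        PostfixSketch p = new PostfixSketch(infix);
        p.expr();
        if (p.lookahead != -1) throw new IllegalArgumentException("syntax error");
        return p.out.toString();
    }

    // expr -> term rest, with rest written as a loop
    private void expr() {
        term();
        while (true) {
            if (lookahead == '+') { match('+'); term(); out.append('+'); }
            else if (lookahead == '-') { match('-'); term(); out.append('-'); }
            else return;
        }
    }

    // term -> digit { print(digit) }
    private void term() {
        if (lookahead != -1 && Character.isDigit((char) lookahead)) {
            out.append((char) lookahead);
            match(lookahead);
        } else {
            throw new IllegalArgumentException("syntax error");
        }
    }

    private void match(int t) {
        if (lookahead == t) lookahead = read();
        else throw new IllegalArgumentException("syntax error");
    }
}
```

Calling PostfixSketch.translate("9-5+2") returns "95-2+", matching the annotated parse tree earlier in the chapter. An incomplete input such as "9-" raises IllegalArgumentException.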