Compiler Construction


Chapter 3
Context-Free Grammars and Parsing
Parsing: Syntax Analysis
- Parsing decides which parts of the incoming token stream should be grouped together.
- The output of parsing is some representation of a parse tree.
- The intermediate code generator transforms the parse tree into an intermediate language.

Comparisons between r.e. (regular expressions)
and c.f.g. (context-free grammars)
- r.e. describes tokens, using an F.A. (finite automaton) to test a valid token
- c.f.g. describes programming language constructs, using a P.F.A. (pushdown automaton) to test a valid program (sentence)
Features of programming languages

contents:
- declarations
- sequential statements
- iterative statements
- conditional statements

features:
- declare/state recursively & repeatedly
- hierarchical specification
  e.g., compound statement -> statement -> expression -> id
- nested structures
- similarity
Description of the syntax of programming
languages
- Syntax Diagrams (See Sec. 3.5.2)
- Context-Free Grammars (CFG)
Context-Free Grammar (in BNF)
exp -> exp addop term | term
addop -> + | -
term -> term mulop factor | factor
mulop -> *
factor -> ( exp ) | number
History
- In 1956, Chomsky introduced context-free grammars for the description of natural language.
- Algol 60 used BNF (Backus-Naur Form) to describe its syntax.
- The syntactic specification of programming languages: CFG (a BNF description)
Capabilities of Context-free grammars
- give a precise syntactic specification of programming languages
- a parser can be constructed automatically from a CFG
- the syntax entities specified in a CFG can be used for translating into object code
- useful for describing nested structures such as balanced parentheses, matching begin-end's, corresponding if-then-else's, etc.
Def. of context free grammars
- A CFG is a 4-tuple (V,T,P,S), where
V - a finite set of variables (non-terminals)
T - a finite set of terminal symbols (tokens)
P - a finite set of productions (or grammar rules)
S - a start symbol
and
V ∩ T = ∅
S ∈ V
Productions are of the form: A -> α, where A ∈ V, α ∈ (V+T)*
- CFG generates CFL (Context-Free Languages)
An Example
G = ( {E}, {+, *, (, ), id}, P, E)
P: { E -> E + E
E -> E * E
E -> ( E )
E -> id }
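As a rough illustration of the definition, the example grammar can be written down as a Python 4-tuple and checked against the side conditions V ∩ T = ∅ and S ∈ V, plus the production form A -> α. The names `Grammar` and `is_well_formed` are made up for this sketch and do not come from the slides.

```python
from typing import NamedTuple

class Grammar(NamedTuple):
    V: frozenset   # variables (non-terminals)
    T: frozenset   # terminal symbols (tokens)
    P: tuple       # productions as (head, body) pairs; body is a tuple of symbols
    S: str         # start symbol

def is_well_formed(g: Grammar) -> bool:
    """Check V ∩ T = ∅, S ∈ V, and that every production is A -> α
    with A ∈ V and α ∈ (V+T)*."""
    if g.V & g.T:
        return False
    if g.S not in g.V:
        return False
    symbols = g.V | g.T
    return all(head in g.V and all(sym in symbols for sym in body)
               for head, body in g.P)

# The example grammar G = ({E}, {+, *, (, ), id}, P, E)
G = Grammar(
    V=frozenset({"E"}),
    T=frozenset({"+", "*", "(", ")", "id"}),
    P=(
        ("E", ("E", "+", "E")),
        ("E", ("E", "*", "E")),
        ("E", ("(", "E", ")")),
        ("E", ("id",)),
    ),
    S="E",
)

print(is_well_formed(G))
```

Storing each production body as a tuple of symbols (rather than a string) keeps multi-character tokens such as id intact.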
Rules from F.A.(r.e.) to CFG
1. For each state there is a nonterminal symbol.
2. If state A has a transition to state B on symbol a, introduce A -> aB.
3. If A goes to B on input ε, introduce A -> B.
4. If A is an accepting state, introduce A -> ε.
5. Make the start state of the NFA be the start symbol of the grammar.
Examples
(1) r.e.: (a|b)(a|b|0|1)*
c.f.g.: S -> aA | bA
        A -> aA | bA | 0A | 1A | ε
(2) r.e.: (a|b)*abb
c.f.g.: S -> aS | bS | aA
        A -> bB
        B -> bC
        C -> ε
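The five F.A.-to-CFG rules can be sketched in a few lines of Python. This is only an illustration: the function name `nfa_to_grammar`, the dictionary encoding of the transition function, and the use of the empty string for ε are assumptions of the sketch. It is applied to an NFA for (a|b)*abb with states S, A, B, C, where C is accepting.

```python
EPS = ""  # assumption: the empty string stands for ε

def nfa_to_grammar(states, transitions, start, accepting):
    """Apply the five rules: one nonterminal per state, A -> aB per
    transition, A -> B for ε-moves, A -> ε for accepting states."""
    productions = {state: [] for state in states}          # rule 1
    for (src, symbol), targets in transitions.items():
        for dst in sorted(targets):
            if symbol == EPS:
                productions[src].append((dst,))            # rule 3
            else:
                productions[src].append((symbol, dst))     # rule 2
    for state in accepting:
        productions[state].append(())                      # rule 4: A -> ε
    return start, productions                              # rule 5

# NFA for (a|b)*abb with states S, A, B, C (C accepting)
transitions = {
    ("S", "a"): {"S", "A"},
    ("S", "b"): {"S"},
    ("A", "b"): {"B"},
    ("B", "b"): {"C"},
}
start, prods = nfa_to_grammar({"S", "A", "B", "C"}, transitions, "S", {"C"})
for head in ["S", "A", "B", "C"]:
    bodies = ["".join(body) or "ε" for body in prods[head]]
    print(head, "->", " | ".join(bodies))
```

Up to the order of the alternatives, the productions printed for this NFA are the ones listed for the r.e. (a|b)*abb.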
Why don’t we use c.f.g. to replace r.e. ?
- r.e. => easy & clear description for tokens
- r.e. => efficient token recognizer
- modularizing the components
Derivations (How does a CFG define a language?)
Definitions:
- directly derive: α => β, where α, β ∈ (V+T)*
- derive in zero or more steps: α =>* β
- derive in one or more steps: α =>+ β
- derive in i steps: α =>^i β
- sentential form: α ∈ (V+T)* such that S =>* α
- sentence: w ∈ T* such that S =>* w
- language: { w | S =>+ w, w ∈ T* }
- leftmost derivations
- rightmost derivations
G = ( {exp, op}, {+, -, *, (, ), number}, P, exp )
P : { exp -> exp op exp | ( exp ) | number
      op -> + | - | * }
Example sentence: (number - number) * number
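A leftmost derivation of (number - number) * number in this grammar can be replayed mechanically. The sketch below is illustrative only: the helper names and the idea of selecting productions by index are assumptions, not part of the slides.

```python
# Productions of G; the dictionary keys are the non-terminals (variables).
P = {
    "exp": [["exp", "op", "exp"], ["(", "exp", ")"], ["number"]],
    "op": [["+"], ["-"], ["*"]],
}

def leftmost_step(form, choice):
    """Replace the leftmost non-terminal in `form` using production number `choice`."""
    for i, sym in enumerate(form):
        if sym in P:
            return form[:i] + P[sym][choice] + form[i + 1:]
    raise ValueError("no non-terminal left: the form is already a sentence")

def leftmost_derivation(start, choices):
    """Return the list of sentential forms produced by the given choices."""
    forms = [[start]]
    for choice in choices:
        forms.append(leftmost_step(forms[-1], choice))
    return forms

# Production choices that derive (number - number) * number
choices = [0, 1, 0, 2, 1, 2, 2, 2]
for form in leftmost_derivation("exp", choices):
    print("=>", " ".join(form))
```

Every list printed before the last one is a sentential form; the last one contains only terminals and is therefore a sentence.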
Parse trees
=> a graphical representation for derivations.
(Note the difference between parse tree
and syntax tree.)
=> Often the parse tree is produced in only
a figurative sense; in reality, the parse tree
exists only as a sequence of actions made
by stepping through the tree construction
process.
Ambiguity
Ambiguous Grammars
- Def.: A context-free grammar that can
produce more than one parse tree for some
sentence.
- The ways to disambiguate a grammar: (1)
specifying the intention (e.g., associativity and
precedence for arithmetic operators), or
(2) rewriting the grammar to incorporate the
intention into the grammar itself.
For (1) Precedence: negate > exponent (↑) > * / > + -
Associativity: exponent ==> right associativity
others ==> left associativity
In yacc, disambiguating rules are used to solve the problem
of (1), e.g., the order in which precedence declarations are listed, special syntax, and default
resolutions (refer to the yacc manual for the disambiguating rules).
For (2), introduce one nonterminal for each precedence
level.
Example 1
E -> E + E | E - E | E * E | E / E | E ↑ E | ( E )
| - E | id
is ambiguous (↑ is the exponent operator, with
right associativity).
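[Figure: two parse trees for id + id * id. In the first, the root is expanded by E -> E * E and its left child derives id + id; in the second, the root is expanded by E -> E + E and its right child derives id * id.]
More than one parse tree for the sentence id + id * id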
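[Figure: two syntax trees for id + id * id. One has * at the root with + as its left child; the other has + at the root with * as its right child.]
More than one syntax tree for the sentence id + id * id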
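The ambiguity can also be checked by counting parse trees. The sketch below covers only the fragment E -> E + E | E * E | ( E ) | id of Example 1 (an assumption made to keep it short), and the function name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for E -> E + E | E * E | ( E ) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct trees that derive tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)                  # E -> ( E )
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)    # E -> E op E, split at k
        return total

    return count(0, len(tokens))

print(count_parse_trees("id + id * id".split()))   # 2
print(count_parse_trees("id + id".split()))        # 1
```

A count greater than one for some sentence is exactly what the definition of an ambiguous grammar asks for.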

The corresponding grammar shown below is unambiguous:
element -> ( expression ) | id    /* a parenthesized expression is evaluated first */
primary -> - primary | element
factor -> primary ↑ factor | primary    /* ↑ has right associativity */
term -> term * factor | term / factor | factor
expression -> expression + term | expression - term | term
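Ex: id + id * id
[Figure: parse tree for id + id * id. The root expression expands to expression + term. The left expression derives id through term, factor, primary, element. The right term expands to term * factor, and each of these derives id through primary and element.]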
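One way to see how the layered grammar encodes precedence and associativity is a small recursive-descent parser with one function per nonterminal. This is a sketch under assumptions: the exponent operator is written ^, tokens arrive as a list of strings, and the left-recursive rules are implemented as loops (a standard workaround, since recursive descent cannot call a left-recursive rule directly). None of the function names come from the slides.

```python
def parse(tokens):
    """Parse a token list with the unambiguous grammar; return a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expression():      # expression -> expression + term | expression - term | term
        node = term()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            node = (op, node, term())        # the loop builds a left-leaning tree
        return node

    def term():            # term -> term * factor | term / factor | factor
        node = factor()
        while peek() in ("*", "/"):
            op = peek()
            advance()
            node = (op, node, factor())
        return node

    def factor():          # factor -> primary ^ factor | primary
        node = primary()
        if peek() == "^":
            advance()
            node = ("^", node, factor())     # recursion on the right: right associative
        return node

    def primary():         # primary -> - primary | element
        if peek() == "-":
            advance()
            return ("neg", primary())
        return element()

    def element():         # element -> ( expression ) | id
        if peek() == "(":
            advance()
            node = expression()
            if peek() != ")":
                raise SyntaxError("expected )")
            advance()
            return node
        if peek() == "id":
            advance()
            return "id"
        raise SyntaxError(f"unexpected token {peek()!r}")

    tree = expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse("id + id * id".split()))
```

For id + id * id the result has + at the root and the * subtree as its right operand, so only one of the two earlier syntax trees can be produced.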
Example 2

stat -> IF cond THEN stat
| IF cond THEN stat ELSE stat
| other-stat
is an ambiguous grammar
Dangling else problem
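The sentence
if c1 then if c2 then s2 else s3
has two parse trees:
(1) Root: stat -> IF cond THEN stat, where the inner stat is IF cond THEN stat ELSE stat. The else belongs to the inner if: if c1 then (if c2 then s2 else s3).
(2) Root: stat -> IF cond THEN stat ELSE stat, where the THEN part is IF cond THEN stat. The else belongs to the outer if: if c1 then (if c2 then s2) else s3.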
The corresponding grammar shown below is
unambiguous.
stat -> matched-stat | unmatched-stat
matched-stat -> if cond then matched-stat else matched-stat | other-stat
unmatched-stat -> if cond then stat | if cond then matched-stat else unmatched-stat
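The matched/unmatched grammar pairs each else with the nearest unmatched then. A recursive-descent sketch that greedily consumes an else as soon as it sees one gives the same pairing. The function name `parse_stat` and the simplification that cond and other-stat are single tokens are assumptions of this sketch.

```python
def parse_stat(tokens, pos=0):
    """Return (tree, next_pos). An else is consumed by the innermost open if."""
    if pos < len(tokens) and tokens[pos] == "if":
        cond = tokens[pos + 1]
        if tokens[pos + 2] != "then":
            raise SyntaxError("expected then")
        then_part, pos = parse_stat(tokens, pos + 3)
        if pos < len(tokens) and tokens[pos] == "else":
            else_part, pos = parse_stat(tokens, pos + 1)
            return ("if", cond, then_part, else_part), pos
        return ("if", cond, then_part), pos
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    return tokens[pos], pos + 1      # other-stat: a single token such as s2

tree, _ = parse_stat("if c1 then if c2 then s2 else s3".split())
print(tree)   # ('if', 'c1', ('if', 'c2', 's2', 's3'))
```

The else ends up inside the inner if, which corresponds to the reading if c1 then (if c2 then s2 else s3).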
Non-context free language constructs



- L = {wcw | w is in (a|b)*}
- L = {a^n b^m c^n d^m | n ≥ 1 and m ≥ 1}
- L = {a^n b^n c^n | n ≥ 0}
Basic Parsing Techniques
1. How to check if an input string is a
sentence of a given grammar?
(check the syntax -- not only used in the
programming language)
2. How to construct a parse tree for the
input string, if desired?
Method
1. top-down
   - classic approach: recursive descent
   - modern approach: LL parsing (produce leftmost derivation)
2. bottom-up
   - classic approach: operator precedence
   - modern approach: LR parsing (shift-reduce parsing; produce rightmost derivation in reverse order)
An Example (for LR Parsing)
S -> aABe
A -> Abc | b
B -> d
w = abbcde
Rightmost derivation (rm):
S =>rm aABe =>rm aAde =>rm aAbcde =>rm abbcde
LR parsing:
abbcde ==> aAbcde ==> aAde ==> aABe ==> S
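The relation between the rightmost derivation and the LR reduction sequence can be replayed with a short sketch. It is not an LR parser: it only rewrites the rightmost non-terminal step by step and then reverses the list of sentential forms. Upper-case letters are treated as non-terminals, and the names `PRODUCTIONS`, `rightmost_step` and `rightmost_derivation` are assumptions of the sketch.

```python
# Upper-case letters are non-terminals, lower-case letters are terminals
# (an assumption of this sketch).
PRODUCTIONS = {
    "S": ["aABe"],
    "A": ["Abc", "b"],
    "B": ["d"],
}

def rightmost_step(form, choice):
    """Rewrite the rightmost non-terminal of `form` with production number `choice`."""
    for i in range(len(form) - 1, -1, -1):
        if form[i] in PRODUCTIONS:
            return form[:i] + PRODUCTIONS[form[i]][choice] + form[i + 1:]
    raise ValueError("no non-terminal left")

def rightmost_derivation(start, choices):
    """Return the sentential forms of the rightmost derivation."""
    forms = [start]
    for choice in choices:
        forms.append(rightmost_step(forms[-1], choice))
    return forms

derivation = rightmost_derivation("S", [0, 0, 0, 1])
print(" => ".join(derivation))              # the rightmost derivation
print(" ==> ".join(reversed(derivation)))   # the order in which an LR parser reduces
```

Reading the derivation backwards gives abbcde ==> aAbcde ==> aAde ==> aABe ==> S, the sequence of reductions listed for LR parsing.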
Assignment #4
1. Do exercises 3.3, 3.5, 3.24, 3.25.
Use the BNF grammar of the TINY language in Fig. 3.6 to derive, step by step, the sequence of tokens of the program in Fig. 3.8 (for practice only).