CS 236 – Discrete Mathematics

Discussion #3
Grammar Formalization
& Parse-Tree Construction
Topics
• Grammar Definitions
• Parse Trees
• Constructing Parse Trees
Formal Definition of a Grammar
A grammar G is a 4-tuple:
G = (VN, VT, S, Φ), where
– VN, VT are sets of non-terminal and terminal symbols
– S ∈ VN is the start symbol
– Φ is a finite set of relations from
(VT ∪ VN)+ to (VT ∪ VN)*
– an element (α, β) of Φ is written as α → β and is
called a production rule or a rewriting rule
Examples of Grammars
G1 = (VN, VT, S, Φ), where:
VN = {S, B}
VT = {a, b, c}
S = S
Φ = { S → aBSc,
S → abc,
Ba → aB,
Bb → bb }

G2 = (VN, VT, S, Φ), where:
VN = {I, L, D}
VT = {a, b, …, z, 0, 1, …, 9}
S = I
Φ = { I → L | ID | IL,
L → a | b | … | z,
D → 0 | 1 | … | 9 }

G3 = (VN, VT, S, Φ), where:
VN = {S, A, B}
VT = {a, b}
S = S
Φ = { S → aA,
A → aA | bB,
B → bB | ε }
Definition of a Context-Free Grammar
• A context-free grammar is a grammar with the
following restriction:
– The relation Φ is a finite set of relations from
VN to (VT ∪ VN)+
– i.e. the left hand side of a production is a single nonterminal
– i.e. the right hand side of any production cannot be empty
• Context-free grammars generate context-free
languages. With slight variations, essentially all
programming languages are context-free languages.
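The restriction above can be checked mechanically. The following is a minimal sketch, assuming a grammar is stored as a small record of its four components and that every symbol is a single character; the names `Grammar`, `rule`, and `is_context_free` are illustrative and not part of the course material, and the second grammar uses a cut-down alphabet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset   # VN
    terminals: frozenset      # VT
    start: str                # S
    productions: tuple        # pairs (lhs, rhs), each a tuple of symbols


def rule(lhs, rhs):
    """Helper: split one-character symbols, e.g. rule("Ba", "aB")."""
    return (tuple(lhs), tuple(rhs))


def is_context_free(g):
    """True if every production has a single non-terminal on the left
    and a non-empty right-hand side (the restriction stated above)."""
    return all(
        len(lhs) == 1 and lhs[0] in g.nonterminals and len(rhs) > 0
        for lhs, rhs in g.productions
    )


# A grammar with multi-symbol left-hand sides.
G1 = Grammar(
    frozenset("SB"), frozenset("abc"), "S",
    (rule("S", "aBSc"), rule("S", "abc"), rule("Ba", "aB"), rule("Bb", "bb")),
)

# An identifier grammar with letters and digits cut down to two each.
G2_small = Grammar(
    frozenset("ILD"), frozenset("ab01"), "I",
    (rule("I", "L"), rule("I", "ID"), rule("I", "IL"),
     rule("L", "a"), rule("L", "b"), rule("D", "0"), rule("D", "1")),
)

print(is_context_free(G1))        # False: "Ba" and "Bb" are not single non-terminals
print(is_context_free(G2_small))  # True
```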
Examples of Grammars (again)
Which are context-free grammars?
G1 = (VN, VT, S, Φ), where:
VN = {S, B}
VT = {a, b, c}
S = S
Φ = { S → aBSc,
S → abc,
Ba → aB,
Bb → bb }

G2 = (VN, VT, S, Φ), where:
VN = {I, L, D}
VT = {a, b, …, z, 0, 1, …, 9}
S = I
Φ = { I → L | ID | IL,
L → a | b | … | z,
D → 0 | 1 | … | 9 }

G3 = (VN, VT, S, Φ), where:
VN = {S, A, B}
VT = {a, b}
S = S
Φ = { S → aA,
A → aA | bB,
B → bB | ε }
Backus-Naur Form (BNF)
• A traditional meta language to represent
grammars for programming languages
• Every non-terminal is enclosed in < and >
• Instead of the symbol → we use ::=
• Example
Production rules:
I → L | ID | IL
L → a | b | … | z
D → 0 | 1 | … | 9
BNF:
<I> ::= <L> | <I><D> | <I><L>
<L> ::= a | b | … | z
<D> ::= 0 | 1 | … | 9
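The translation into BNF is purely mechanical, as the following sketch suggests. It assumes productions are held in a dictionary from a non-terminal to its list of alternatives, with one character per symbol; the function name `to_bnf` and the cut-down alphabet are illustrative assumptions.

```python
def to_bnf(productions, nonterminals):
    """Render {lhs: [alternatives]} as BNF lines.

    Each alternative is a string of one-character symbols; non-terminals
    are wrapped in < and >, terminals are copied unchanged.
    """
    lines = []
    for lhs, alternatives in productions.items():
        rendered = [
            "".join(f"<{s}>" if s in nonterminals else s for s in alt)
            for alt in alternatives
        ]
        lines.append(f"<{lhs}> ::= " + " | ".join(rendered))
    return lines


# Identifier grammar with the letter and digit ranges cut down to two each.
productions = {"I": ["L", "ID", "IL"], "L": ["a", "b"], "D": ["0", "1"]}
for line in to_bnf(productions, {"I", "L", "D"}):
    print(line)
# <I> ::= <L> | <I><D> | <I><L>
# <L> ::= a | b
# <D> ::= 0 | 1
```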
Definition: Direct Derivative
Let G = (VN, VT, S, Φ) be a grammar and
ψ, σ ∈ (VN ∪ VT)*.
σ is said to be a direct derivative of ψ (written
ψ ⇒ σ) if there are strings φ1 and φ2 (including
possibly empty strings)
such that ψ = φ1Bφ2,
σ = φ1βφ2,
B ∈ VN and
B → β is a production of G.
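As a rough illustration of the definition, the sketch below computes every direct derivative of a string by rewriting one non-terminal occurrence at a time. It assumes one-character symbols and a dictionary of productions; `direct_derivatives` is a hypothetical name and the alphabet is cut down.

```python
def direct_derivatives(string, productions):
    """Return all strings obtained by rewriting exactly one
    non-terminal occurrence with one of its right-hand sides."""
    results = set()
    for i, symbol in enumerate(string):
        for rhs in productions.get(symbol, []):
            # prefix + symbol + suffix  becomes  prefix + rhs + suffix
            results.add(string[:i] + rhs + string[i + 1:])
    return results


# Identifier grammar with the letter and digit ranges cut down to two each.
productions = {"I": ["L", "ID", "IL"], "L": ["a", "b"], "D": ["0", "1"]}
print(sorted(direct_derivatives("Ib", productions)))
# ['IDb', 'ILb', 'Lb']
```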
Example: Direct Derivatives
G = (VN, VT, S, Φ), where:
VN = {I, L, D}
VT = {a, b, …, z, 0, 1, …, 9}
S = I
Φ = { I → L | ID | IL,
L → a | b | … | z,
D → 0 | 1 | … | 9 }

ψ      σ      Rule Used   φ1   φ2
I      L      I → L       ε    ε
Ib     Lb     I → L       ε    b
Lb     ab     L → a       ε    b
IDD    I0D    D → 0       I    D
Definition: Derivation
Let G = (VN, VT, S, Φ) be a grammar.
A string ψ produces σ (σ reduces to ψ, or σ is the
derivation of ψ, written ψ ⇒+ σ)
if there are strings φ0, φ1, …, φn (n > 0) such that
ψ = φ0 ⇒ φ1, φ1 ⇒ φ2, …, φn-1 ⇒ φn, φn = σ.
Example: Derivation
• Let G = (VN, VT, S, Φ), where:
VN = {I, L, D}
VT = {a, b, …, z, 0, 1, …, 9}
S = I
Φ = { I → L | ID | IL,
L → a | b | … | z,
D → 0 | 1 | … | 9 }
• I produces abc12:
I ⇒ ID
⇒ IDD
⇒ ILDD
⇒ ILLDD
⇒ LLLDD
⇒ aLLDD
⇒ abLDD
⇒ abcDD
⇒ abc1D
⇒ abc12
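A derivation like the one for abc12 can be verified step by step. The sketch below assumes one-character symbols and a dictionary of productions; `is_direct_derivative` and `is_derivation` are hypothetical helper names, not part of the course material.

```python
def is_direct_derivative(before, after, productions):
    """True if `after` results from rewriting one non-terminal in `before`."""
    for i, symbol in enumerate(before):
        for rhs in productions.get(symbol, []):
            if before[:i] + rhs + before[i + 1:] == after:
                return True
    return False


def is_derivation(steps, productions):
    """True if there is at least one step and every string is a
    direct derivative of the string before it."""
    return len(steps) > 1 and all(
        is_direct_derivative(a, b, productions)
        for a, b in zip(steps, steps[1:])
    )


productions = {
    "I": ["L", "ID", "IL"],
    "L": list("abcdefghijklmnopqrstuvwxyz"),
    "D": list("0123456789"),
}
steps = ["I", "ID", "IDD", "ILDD", "ILLDD", "LLLDD",
         "aLLDD", "abLDD", "abcDD", "abc1D", "abc12"]
print(is_derivation(steps, productions))  # True
```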
Definition: Language
• A sentential form is any derivative of the
start symbol S.
• A language L generated by a grammar G is
the set of all sentential forms whose
symbols are all terminals; that is,
L(G) = {σ | S ⇒+ σ and σ ∈ VT*}
Example: Language
• Let G = (VN, VT, S, Φ), where:
VN = {I, L, D}
VT = {a, b, …, z, 0, 1, …, 9}
S = I
Φ = { I → L | ID | IL,
L → a | b | … | z,
D → 0 | 1 | … | 9 }
• I produces abc12:
I ⇒ ID ⇒ IDD ⇒ ILDD ⇒ ILLDD ⇒ LLLDD ⇒ aLLDD ⇒ abLDD ⇒ abcDD ⇒ abc1D ⇒ abc12
• L(G) = {abc12, x, m934897773645, a1b2c3, …}
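To get a feel for L(G), one can enumerate its short sentences by exploring sentential forms breadth-first. This is only a sketch: it assumes one-character symbols, a cut-down alphabet, and a grammar in which no production shortens a string (true here), so forms longer than the bound can be discarded. The name `sentences` is illustrative.

```python
from collections import deque


def sentences(start, productions, max_len):
    """Collect every terminal string of length <= max_len derivable
    from `start`, exploring sentential forms breadth-first."""
    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        form = queue.popleft()
        if not any(s in productions for s in form):
            found.add(form)          # all symbols are terminals
            continue
        for i, symbol in enumerate(form):
            for rhs in productions.get(symbol, []):
                new = form[:i] + rhs + form[i + 1:]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
    return found


# Identifier grammar with the letter and digit ranges cut down to two each.
productions = {"I": ["L", "ID", "IL"], "L": ["a", "b"], "D": ["0", "1"]}
print(sorted(sentences("I", productions, 2)))
# ['a', 'a0', 'a1', 'aa', 'ab', 'b', 'b0', 'b1', 'ba', 'bb']
```

Every sentence starts with a letter, which matches the intuition that this grammar describes identifiers.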
Syntax Analysis: Parsing
• The parse of a sentence is the construction
of a derivation for that sentence
• The parsing of a sentence results in
– acceptance or rejection
– and, if acceptance, then also a parse tree
• We are looking for an algorithm to parse a
sentence (i.e. to parse a program) and
produce a parse tree.
Parse Trees
• A parse tree is composed of
– interior nodes representing syntactic categories
(non-terminal symbols)
– leaf nodes representing terminal symbols
• For each interior node N, the transition from
N to its children represents the application
of a production.
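One possible in-memory representation of such a tree is sketched below. The class name `Node` and the helper `frontier` are assumptions made for illustration; reading the leaves left to right recovers the parsed sentence.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str                                   # grammar symbol at this node
    children: list = field(default_factory=list)  # empty for leaf nodes

    def is_leaf(self):
        return not self.children


def frontier(node):
    """Concatenate the leaf symbols left to right."""
    if node.is_leaf():
        return node.symbol
    return "".join(frontier(child) for child in node.children)


# Parse tree for "a1" using I -> ID, I -> L, L -> a, D -> 1.
tree = Node("I", [
    Node("I", [Node("L", [Node("a")])]),
    Node("D", [Node("1")]),
])
print(frontier(tree))  # a1
```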
Parse Tree Construction
• Top-down
– Starts with the root (starting symbol)
– Proceeds downward to leaves using productions
• Bottom-up
– Starts from leaves
– Proceeds upward to the root
• Although these seem like reasonable approaches
to develop a parsing algorithm, we’ll see that
neither works well, so we’ll need to find a better
way.
Example: Top-Down Parse for 4 * 2 + 3
VN = {E, D}
VT = {0, 1, …, 9, +, –, *, /, (, )}
S = E
Φ = { E → D | ( E ) | E + E | E – E | E * E | E / E,
D → 0 | 1 | … | 9 }

          E
        / | \
       E  *  E
       |   / | \
       D  E  +  E
       |  |     |
       4  D     D
          |     |
          2     3

Problems:
- How do we guess which rule applies?
- Note that we produced the wrong parse tree (precedence is wrong)
Ambiguous Grammar
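Two Different Parse Trees for 4*2+3
Φ = { E → D | ( E ) | E + E | E – E | E * E | E / E,
D → 0 | 1 | … | 9 }

Tree 1, with * at the root (groups as 4 * (2 + 3)):

          E
        / | \
       E  *  E
       |   / | \
       D  E  +  E
       |  |     |
       4  D     D
          |     |
          2     3

Tree 2, with + at the root (groups as (4 * 2) + 3):

            E
          / | \
         E  +  E
       / | \   |
      E  *  E  D
      |     |  |
      D     D  3
      |     |
      4     2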
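The ambiguity can be observed by brute force. The sketch below enumerates every parse tree of a token list under E → D | E op E (parentheses are left out for brevity, which is a simplification of the grammar above); trees are nested tuples, and the names `parse_trees` and `evaluate` are illustrative.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def parse_trees(tokens):
    """All parse trees for `tokens` under E -> D | E op E.

    A tree is either a single digit string or a tuple (left, op, right).
    """
    if len(tokens) == 1:
        t = tokens[0]
        return [t] if len(t) == 1 and t.isdigit() else []
    trees = []
    for i in range(1, len(tokens) - 1):
        if tokens[i] in OPS:                      # try each operator as the root
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((left, tokens[i], right))
    return trees


def evaluate(tree):
    """Evaluate a tree according to its own grouping."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))


trees = parse_trees(["4", "*", "2", "+", "3"])
for tree in trees:
    print(tree, "=", evaluate(tree))
# ('4', '*', ('2', '+', '3')) = 20
# (('4', '*', '2'), '+', '3') = 11
```

Two trees with two different values for the same sentence is exactly what makes the grammar ambiguous.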
Example: Bottom-Up Parse
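A → V | I | (A + A) | (A * A)
V → L | VL | VD
I → D | ID
D → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
L → x | y | z

Reductions, read from the sentence (bottom row) up to the start symbol (top row):
A
( A + A )
( ( A * A ) + A )
( ( A * ( A + A ) ) + I )
( ( V * ( V + V ) ) + I D )
( ( L * ( L + L ) ) + D D )
( ( z * ( x + y ) ) + 12 )

Problem: scanning the entire program repeatedly
Problem: which reduction applies? In D D, the second D could also be reduced to I, giving I I, which leads nowhere (??)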
So,
how do we develop a parsing algorithm?
• “Fix” the grammar
– So that we can go top down, left to right, with no backup
– LL(1) grammar: Left-to-right, Left-most non-terminal,
one symbol look ahead
• “Fix” (How?)
– Observe grammar properties: determine what’s needed to
make them LL(1)
– Transform grammars to make them LL(1)
• Note: works for many grammars, but not all
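To make "top down, left to right, with no backup" concrete, here is a minimal recursive-descent sketch. The grammar it parses is an assumption chosen for illustration and is not taken from these slides: E → T { + T }, T → F { * F }, F → digit | ( E ). Each function decides what to do by looking only at the next symbol.

```python
def parse(text):
    """Parse single digits, +, * and parentheses into a nested tuple tree.

    Raises SyntaxError on invalid input. The parser never backs up:
    every decision uses one symbol of look-ahead.
    """
    tokens = [c for c in text if not c.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(symbol):
        nonlocal pos
        if peek() != symbol:
            raise SyntaxError(f"expected {symbol!r} at position {pos}")
        pos += 1

    def expr():                      # E -> T { + T }
        node = term()
        while peek() == "+":
            expect("+")
            node = (node, "+", term())
        return node

    def term():                      # T -> F { * F }
        node = factor()
        while peek() == "*":
            expect("*")
            node = (node, "*", factor())
        return node

    def factor():                    # F -> digit | ( E )
        nonlocal pos
        symbol = peek()
        if symbol is not None and symbol.isdigit():
            pos += 1
            return symbol
        expect("(")
        node = expr()
        expect(")")
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return tree


print(parse("4 * 2 + 3"))  # (('4', '*', '2'), '+', '3')
```

With this assumed grammar, multiplication binds tighter than addition, so 4 * 2 + 3 gets a single tree with the expected precedence.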