Cse321, Programming Languages and Compilers
Lecture #7, Feb. 5, 2007
•Grammars
•Top down parsing
•Transition Diagrams
•Ambiguity
•Left recursion
•Refactoring by adding levels
•Recursive descent parsing
•Predictive Parsers
•First and Follow
•Parsing tables
Assignments
• Reading
– Chapter 3
– Page 73-106
– Quiz on Wednesday
• Mid Term exam
– Monday. Feb 19, 2007. Time: in class.
• Next Homework
– On the web page, and the last page of this handout
– Due date to be negotiated.
– Recall Project #1 is due next Wednesday.
– I promised no homework today or Wednesday.
• Project 1
– Recall Project #1, the scanner is due Feb. 14th
Grammars 1
• Grammar
– A set of tokens (terminals): T
– A set of non-terminals: N
– A set of productions { lhs ::= rhs , ... }
» lhs in N
» rhs is a sequence of N U T
– A Start symbol: S (in N)
• Shorthands
– Provide only the productions
» All lhs symbols comprise N
» All other symbols comprise T
» lhs of first production is S
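The shorthand above can be sketched in SML. This is only an illustration: representing symbols as strings and the names production, member, nub, nonterminals, terminals, and start are assumptions, not part of the lecture.

```sml
(* Illustrative representation (an assumption): symbols are strings,
   and a production is a pair (lhs, rhs). *)
type production = string * string list

val grammar : production list =
  [("E", ["E", "+", "E"]),
   ("E", ["E", "*", "E"]),
   ("E", ["id"])]

fun member x xs = List.exists (fn y => y = x) xs

(* Remove duplicates, keeping the first occurrence of each element. *)
fun nub [] = []
  | nub (x :: xs) = x :: nub (List.filter (fn y => y <> x) xs)

(* All lhs symbols comprise N. *)
val nonterminals = nub (map (fn (lhs, _) => lhs) grammar)

(* All other symbols comprise T. *)
val terminals =
  nub (List.filter (fn s => not (member s nonterminals))
                   (List.concat (map (fn (_, rhs) => rhs) grammar)))

(* The lhs of the first production is S. *)
val start = #1 (hd grammar)
```

For this grammar, N is {E}, T is {+, *, id}, and S is E.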
Grammars 2
• Rewriting rules
– Pick a non-terminal to replace. Which order?
» left-to-right
» right-to-left
• Derivations (a list of productions used to derive a string from
a grammar).
• Sentences of G: L(G)
– Start with S
– Contain only terminal symbols
– L(G) is the set of all such strings derivable from S in 1 or more steps
Grammars 3
• Parse trees.
– Graphical representations of derivations.
– The leaves of a fully filled-out parse tree, read left to right, form a sentence.
• Context Free Grammars
– how do they compare to regular expressions?
– Nesting (matched ()’s) requires CFGs; REs are not powerful
enough.
• Ambiguity
– A string has two distinct parse trees (two leftmost derivations)
– E ::= E + E | E * E | id
» x+x*y
• Left-recursion
– E ::= E + E | E * E | id
– Makes certain top-down parsers loop
Top Down Parsing
• Begin with the start symbol and try to derive the
parse tree from the root.
• Consider the grammar
Exp ::= id
| Exp + Exp
| Exp * Exp
| ( Exp )
derives x, x+x, x+x+x, x*y, x+y*z, ...
Example Parse (top down)
– stack                         input
      Exp                       x+y*z

      Exp                       x+y*z
     / | \
  Exp  +  Exp

      Exp                       y*z
     / | \
  Exp  +  Exp
   |
 id(x)
Top Down Parse (cont)
      Exp                       y*z
     / | \
  Exp  +  Exp
   |     / | \
 id(x) Exp * Exp

      Exp                       z
     / | \
  Exp  +  Exp
   |     / | \
 id(x) Exp * Exp
        |
      id(y)
Top Down Parse (cont.)
      Exp
     / | \
  Exp  +  Exp
   |     / | \
 id(x) Exp * Exp
        |     |
      id(y) id(z)
Transition Diagrams
• Transition diagrams for predictive parsers
– One diagram for each Non-terminal
– The grammar should have no left recursion and should be left factored
– Diagrams can (recursively) mention each other
E -> T E'
E' -> + T E' | <empty>
T -> F T'
T' -> * F T' | <empty>
F -> ( E ) | id
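[Figure: transition diagrams for E, E', T, T', and F; the edges of each diagram are labeled with the symbols of that non-terminal's right-hand sides (T, E', +, F, T', *, (, E, ), id)]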
Problems with Top Down Parsing
• Backtracking may be necessary:
– S ::= ee | bAc | bAe
– A ::= d | cA
try on string “bcde”
• Infinite loops possible from (direct or indirect) left recursive
grammars.
– E ::= E + id | id
• Ambiguity is a problem when a unique parse is not
possible.
• These often require extensive grammar restructuring
(grammar debugging).
Grammar Transformations
• Removing ambiguity.
• Removing Left Recursion
• Backtracking and Factoring
Removing ambiguity.
• Adding levels to a grammar
– E ::= E + E | E * E | id | ( E ) (ambiguous)
– E ::= E + T | T (layered: one level per precedence)
– T ::= T * F | F
– F ::= id | ( E )
• The dangling else grammar.
– st ::= if exp then st else st
| if exp then st
| id := exp
– Note that the following has two possible parses
if x=2 then if x=3 then y:=2 else y := 4
if x=2 then (if x=3 then y:=2 ) else y := 4
if x=2 then (if x=3 then y:=2 else y := 4)
Adding levels (cont)
• Original grammar
st ::= if exp then st else st
| if exp then st
| id := exp
• Assume that every st between then and else must
be matched, i.e. it must have both a then and an
else.
• New Grammar with additional levels
st      -> match | unmatch
match   -> if exp then match else match
         | id := exp
unmatch -> if exp then st
         | if exp then match else unmatch
Removing Left Recursion
• Top down recursive descent parsers require non-left
recursive grammars
• Technique: Left Factoring
» E ::= E + E | E * E | id
» E ::= id E’
» E’ ::= + E E’ | * E E’ | ε
• General Technique to remove direct left recursion
– Every Non terminal with productions
» T ::= T n | T m (left recursive productions)
| a | b (non-left recursive productions)
– These productions generate (a | b) (n | m) *
– Make a new non-terminal T’
– Remove the old productions
– Add the following productions
» T ::= a T’ | b T’
» T’ ::= n T’ | m T’ | ε
– T starts with “a” and “b” because they are the rhs of the
non-left recursive productions.
[Figure: left-leaning parse tree in which T expands to T n repeatedly, with a at the bottom left]
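The general technique can be sketched as a small SML function. This is a sketch under assumptions: symbols are strings, a right-hand side is a string list, the empty list stands for the empty production, and the name removeLeftRec is hypothetical.

```sml
type rhs = string list

(* Split T's alternatives into left-recursive ones (T n, T m) and the
   rest (a, b), then build the productions for T and the new T'. *)
fun removeLeftRec (t : string) (alts : rhs list) : (string * rhs list) list =
  let
    val t' = t ^ "'"
    fun isLeftRec (x :: _) = (x = t)
      | isLeftRec []       = false
    val (lefts, others) = List.partition isLeftRec alts
    (* T  ::= a T' | b T' *)
    val newT  = map (fn alpha => alpha @ [t']) others
    (* T' ::= n T' | m T' | empty   (the empty list is the empty rhs) *)
    val newT' = map (fn r => tl r @ [t']) lefts @ [[]]
  in
    (* Leave the productions alone when nothing is left recursive. *)
    if null lefts then [(t, alts)] else [(t, newT), (t', newT')]
  end
```

For example, removeLeftRec "E" [["E","+","id"],["id"]] gives E ::= id E' and E' ::= + id E' | empty. The sketch handles direct left recursion only.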
Backtracking and Factoring
• Backtracking may be necessary:
– S ::= ee | bAc | bAe
– A ::= d | cA
• try on string “bcde”
S -> bAc (by S -> bAc)
-> bcAc (by A -> cA)
-> bcdc (by A -> d)
• But this is the wrong answer! bcdc is not “bcde”, so the parser must back up and try S -> bAe.
• Factoring a grammar
– Factor common prefixes and make the different postfixes into a
new non-terminal
– S ::= ee | bAQ
– Q ::= c | e
– A ::= d | cA
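A minimal sketch of why the factored grammar needs no backtracking: each function below picks its production by looking at one input character. Working directly on a char list and the names ParseError, expect, parseS, parseA, parseQ, and accepts are assumptions made for this illustration.

```sml
exception ParseError of string

(* Each function consumes a prefix of the input and returns the rest. *)
fun expect c (x :: rest) =
      if x = c then rest
      else raise ParseError ("expected " ^ String.str c)
  | expect c [] =
      raise ParseError ("expected " ^ String.str c ^ " at end of input")

(* A ::= d | cA *)
fun parseA (#"d" :: rest) = rest
  | parseA (#"c" :: rest) = parseA rest
  | parseA _ = raise ParseError "in A"

(* Q ::= c | e *)
fun parseQ (#"c" :: rest) = rest
  | parseQ (#"e" :: rest) = rest
  | parseQ _ = raise ParseError "in Q"

(* S ::= ee | bAQ *)
fun parseS (#"e" :: rest) = expect #"e" rest
  | parseS (#"b" :: rest) = parseQ (parseA rest)
  | parseS _ = raise ParseError "in S"

(* True when the whole string is derived from S. *)
fun accepts s =
  (null (parseS (String.explode s))) handle ParseError _ => false
```

With this version, accepts "bcde" succeeds on the first try, because the choice between c and e is postponed until Q.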
Recursive Descent Parsing
• One procedure (function) for each non-terminal.
• Procedures are often (mutually) recursive.
• They can return a bool (the input matches that non-terminal) or, more often, a data structure
(the parse tree built from the input)
• Need to control the lexical analyzer (requiring it to
“back-up” on occasion)
R.D. parser for R.E.’s
• Build an instance of the datatype:
datatype Re =
    empty of int
  | simple of string * int
  | concat of Re * Re
  | closure of Re
  | union of Re * Re;
• The lexical Analyzer datatype
datatype token =
    Done
  | Bar
  | Star
  | Hash
  | Leftparen
  | Rightparen
  | Single of string;
Ambiguous grammar
1. RE -> RE bar RE
2. RE -> RE RE
3. RE -> RE *
4. RE -> id
5. RE -> #
6. RE -> ( RE )
•Transform grammar by layering
•Tightest binding operators (*) at
the lowest layer
•Layers are Alt, then Concat, then
Closure, then Simple.
Alt -> Alt bar Concat
Alt -> Concat
Concat -> Concat Closure
Concat -> Closure
Closure -> simple star
Closure -> simple
simple -> id | (Alt ) | #
Left Recursive Grammar
Alt -> Alt bar Concat
Alt -> Concat
Concat -> Concat Closure
Concat -> Closure
Closure -> simple star
Closure -> simple
simple -> id | (Alt ) | #
For every Non terminal with productions
T ::= T n | T m (left recursive productions)
| a | b (non-left recursive productions)
1. Make a new non-terminal T’
2. Remove the old productions
3. Add the following productions
T ::= a T’ | b T’
T’ ::= n T’ | m T’ | ε
Alt        ::= Concat moreAlt
moreAlt    ::= Bar Concat moreAlt
             | ε
Concat     ::= Closure moreConcat
moreConcat ::= Closure moreConcat
             | ε
Closure    ::= Simple Star
             | Simple
Simple     ::= Id
             | ( Alt )
             | #
Lookahead and the Lexer
val lookahead = ref Done;
val input = ref [Done];
val location = ref 0;
fun nextloc () =
(location := (!location) + 1; !location)
fun init s = (location := 0;
input := lexan s;
lookahead := hd(!input);
input := tl(!input))
• Lexes the whole input
• Stores it in the variable input
• Keeps track of next token (so that backup is possible)
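The parser code in this lecture calls lexan, tok2str, and error without showing them. The sketch below is one possible set of definitions, given only so the pieces can be loaded together; treating every character as one token, the ParseError exception, and the "<end>" string are assumptions, not the course's actual lexer.

```sml
datatype token =
    Done | Bar | Star | Hash | Leftparen | Rightparen
  | Single of string;

(* Hypothetical error reporting: raise an exception carrying the message. *)
exception ParseError of string;
fun error msg = raise ParseError msg;

(* Printable form of a token, used in error messages. *)
fun tok2str Done       = "<end>"
  | tok2str Bar        = "|"
  | tok2str Star       = "*"
  | tok2str Hash       = "#"
  | tok2str Leftparen  = "("
  | tok2str Rightparen = ")"
  | tok2str (Single s) = s;

(* Hypothetical lexer: one token per character, terminated by Done. *)
fun lexan (s : string) : token list =
  let fun tok #"|" = Bar
        | tok #"*" = Star
        | tok #"#" = Hash
        | tok #"(" = Leftparen
        | tok #")" = Rightparen
        | tok c    = Single (String.str c)
  in map tok (String.explode s) @ [Done] end;
```

Because lexan always ends the list with Done, init can safely take hd and tl of its result even for the empty string.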
Matching a single Terminal
fun match t =
  if (!lookahead) = t
  then (if null(!input)
        then lookahead := Done
        else (lookahead := hd(!input)
             ; input := tl(!input)))
  else error ("looking for: "^(tok2str t)^
              " found: "^(tok2str (!lookahead)));
• Match one token
• Advance the input
• Handle the end of file correctly
• Report errors in a sensible way
• This function will be called a lot!!
moreAlt and moreConcat
When we removed left recursion, we added non-terminals that might recognize ε,
i.e. moreAlt and moreConcat.
Observe the shape of parse trees using those
productions.
moreConcat ::= Closure moreConcat
             | <empty>
moreConcat
  /     \
closure  moreConcat
  ...     /     \
        closure  moreConcat
          ...       |
                    ε
They always end in ε at the far right of the tree.
1 Function for each NT
• A simple way to write a parser for a language is the
technique called recursive descent parsing.
• Each Non-terminal is represented by a function that
returns a syntax item corresponding to the element
that production parses.
• If it can’t parse that element it raises an error.
• When a production might return the empty string we
need to handle that by using the SML 'a option
datatype.
Alt : unit -> Re
moreAlt : unit -> Re option
Concat : unit -> Re
Closure : unit -> Re
moreConcat : unit -> Re option
Simple : unit -> Re
Alt        ::= Concat moreAlt
moreAlt    ::= Bar Alt moreAlt
             | ε
Concat     ::= Closure moreConcat
moreConcat ::= Closure moreConcat
             | ε
Closure    ::= Simple Star
             | Simple
Simple     ::= Id
             | ( Alt )
             | #
Alt ::= Concat moreAlt
fun Alt () =
  let val x = Concat ()
      val y = moreAlt ()
  in case y of
       NONE => x
     | SOME z => union(x,z)
  end
moreAlt ::= Bar Alt moreAlt
          | ε
and moreAlt () =
  case (!lookahead) of
    Bar => let val _ = match Bar
               val x = Alt()
               val y = moreAlt ()
           in case y of
                NONE => SOME x
              | (SOME z) => SOME(union(x,z))
           end
  | _ => NONE
“and” separates mutually recursive functions.
Concat ::= Closure moreConcat
and Concat () =
  let val x = Closure ()
      val y = moreConcat ()
  in case y of
       NONE => x
     | SOME z => concat(x,z)
  end
moreConcat ::= Closure moreConcat
             | ε
and moreConcat () =
  case (!lookahead) of
    (Single _ | Leftparen | Hash) =>
      let val x = Closure()
          val y = moreConcat()
      in case y of
           NONE => SOME x
         | SOME z => SOME(concat(x,z))
      end
  | _ => NONE
Closure ::= Simple Star
          | Simple
A simple form of left factoring is used here: parse the common prefix Simple first, then look for Star.
and Closure () =
  let val x = Simple()
  in case !lookahead of
       Star => (match Star; closure x)
     | other => x
  end
Simple ::= Id
         | ( Alt )
         | #
and Simple () =
  case !lookahead of
    Single c =>
      let val _ = match (Single c)
          val n = nextloc()
      in simple(c,n) end
  | Leftparen =>
      let val _ = match Leftparen
          val x = Alt();
          val _ = match Rightparen
      in x end
  | Hash =>
      let val _ = match Hash
          val n = nextloc()
      in empty n end
  | x => error ("In Simple no match: "^(tok2str x));
Top Level Parser
fun parse s =
  let val _ = init s
      val ans = Alt()
      val _ = match Done
  in ans end;
parse "a(b*|c)#";
concat
(simple ("a",1),
concat (union
(closure (simple ("b",2)),
simple ("c",4)),
empty 8))
Predictive Parsers
• Using a stack to avoid recursion. Encoding the
diagrams in a table
• The Nullable, First, and Follow functions
– Nullable: Can a symbol derive the empty string? False for every
terminal symbol.
– First: all the terminals that a non-terminal could possibly derive as
its first symbol.
» term or nonterm -> set( term )
» sequence(term + nonterm) -> set( term)
– Follow: all the terminals that could immediately follow the string
derived from a non-terminal.
» non-term -> set( term )
Example First and Follow Sets
E  ::= T E' $
E' ::= + T E'
E' ::= ε
T  ::= F T'
T' ::= * F T'
T' ::= ε
F  ::= ( E )
F  ::= id
First E  = { "(", "id"}
First F  = { "(", "id"}
First T  = { "(", "id"}
First E' = { "+", ε}
First T' = { "*", ε}
Follow E  = {")","$"}
Follow F  = {"+","*",")","$"}
Follow T  = {"+",")","$"}
Follow E' = {")","$"}
Follow T' = {"+",")","$"}
• First of a terminal is itself.
• First can be extended to sequence of symbols.
Nullable
• if ε is in First(symbol) then that symbol
is nullable.
• Sometimes, rather than let ε be a symbol,
we derive an additional function
nullable.
• Nullable(E') = true
• Nullable(T') = true
• Nullable for all other symbols is false
E  ::= T E' $
E' ::= + T E'
E' ::= ε
T  ::= F T'
T' ::= * F T'
T' ::= ε
F  ::= ( E )
F  ::= id
Computing First
• Use the following rules until no more terminals
can be added to any FIRST set.
1) if X is a terminal, FIRST(X) = {X}
2) if X ::= ε is a production then add ε to
FIRST(X), (Or set nullable of X to true).
3) if X is a non-terminal and
– X ::= Y1 Y2 ... Yk
– add a to FIRST(X)
» if a in FIRST(Yi) and
» for all j<i ε in FIRST(Yj)
• E.g. if Y1 can derive ε then if a is in FIRST(Y2)
it is surely in FIRST(X) as well.
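The rules above amount to a fixed-point iteration, which can be sketched in SML as follows. This is a sketch under assumptions: symbols are strings, ε is tracked by a separate nullable set (the alternative mentioned in rule 2) so the FIRST sets here hold terminals only, and all names (grammar, nullableSet, firstSets, lookup, and so on) are hypothetical.

```sml
type production = string * string list

(* The lecture's expression grammar; [] is the empty right-hand side. *)
val grammar : production list =
  [("E",  ["T", "E'", "$"]),
   ("E'", ["+", "T", "E'"]), ("E'", []),
   ("T",  ["F", "T'"]),
   ("T'", ["*", "F", "T'"]), ("T'", []),
   ("F",  ["(", "E", ")"]),  ("F",  ["id"])]

fun member x xs = List.exists (fn y => y = x) xs
fun union (xs, ys) =
  foldl (fn (y, acc) => if member y acc then acc else acc @ [y]) xs ys
fun isNonterm x = List.exists (fn (l, _) => l = x) grammar
fun lookup tbl x =
  case List.find (fn (k, _) => k = x) tbl of SOME (_, v) => v | NONE => []

(* X is nullable if some rhs of X consists only of nullable symbols. *)
fun nullableSet () =
  let fun step ns =
        let val ns' = foldl (fn ((l, rhs), acc) =>
                               if List.all (fn s => member s acc) rhs
                               then union (acc, [l]) else acc) ns grammar
        in if length ns' = length ns then ns else step ns' end
  in step [] end

fun firstSets () =
  let
    val nullable = nullableSet ()
    (* Rule 1: FIRST of a terminal is itself. *)
    fun firstSym tbl x = if isNonterm x then lookup tbl x else [x]
    (* Rule 3: look past Yj only while Yj is nullable. *)
    fun firstSeq tbl [] = []
      | firstSeq tbl (y :: ys) =
          if member y nullable
          then union (firstSym tbl y, firstSeq tbl ys)
          else firstSym tbl y
    fun update tbl (l, rhs) =
      (l, union (lookup tbl l, firstSeq tbl rhs))
      :: List.filter (fn (k, _) => k <> l) tbl
    fun size tbl = foldl (fn ((_, v), n) => n + length v) 0 tbl
    (* Repeat until no more terminals can be added to any set. *)
    fun step tbl =
      let val tbl' = foldl (fn (p, t) => update t p) tbl grammar
      in if size tbl' = size tbl then tbl else step tbl' end
  in step [] end
```

On the lecture's grammar this yields nullable = {E', T'}, FIRST(E) = FIRST(T) = FIRST(F) = {(, id}, FIRST(E') = {+}, and FIRST(T') = {*}. Since the sets only grow, comparing total sizes is enough to detect the fixed point.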
Example First Computation
• Terminals
– First($) = {$} First(*) = {*} First(+) = {+} ...
• Empty Productions
– add ε to First(E’), add ε to First(T’)
• Other NonTerminals
– Computing from the lowest layer (F) up
» First(F) = {id , ( }
» First(T’) = { ε, * }
» First(T) = First(F) = {id, ( }
» First(E’) = { ε, + }
» First(E) = First(T) = {id, ( }
E  ::= T E' $
E' ::= + T E'
E' ::= ε
T  ::= F T'
T' ::= * F T'
T' ::= ε
F  ::= ( E )
F  ::= id
Computing Follow
• Use the following rules until nothing can be
added to any follow set.
1) Place $ (the end of input marker) in FOLLOW(S)
where S is the start symbol.
2) If A ::= a B b
then everything in FIRST(b) except ε is in
FOLLOW(B)
3) If there is a production A ::= a B
or A ::= a B b where FIRST(b)
contains ε (i.e. b can derive the empty string)
then everything in FOLLOW(A) is in FOLLOW(B)
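The three rules can be sketched as another fixed-point iteration in SML. Assumptions are labeled in the code: symbols are strings, nullable and FIRST are entered by hand from the lecture's sets, and the first production is written E ::= T E' with $ supplied by rule 1, a choice that reproduces the Follow sets listed in the lecture. All function names are hypothetical.

```sml
type production = string * string list

(* Assumption: E ::= T E' is written without the explicit $. *)
val grammar : production list =
  [("E",  ["T", "E'"]),
   ("E'", ["+", "T", "E'"]), ("E'", []),
   ("T",  ["F", "T'"]),
   ("T'", ["*", "F", "T'"]), ("T'", []),
   ("F",  ["(", "E", ")"]),  ("F",  ["id"])]

(* Nullable and FIRST entered by hand (epsilon is tracked by nullable). *)
val nullable = ["E'", "T'"]
fun first "E" = ["(", "id"] | first "T" = ["(", "id"]
  | first "F" = ["(", "id"] | first "E'" = ["+"] | first "T'" = ["*"]
  | first t = [t]                 (* FIRST of a terminal is itself *)

fun member x xs = List.exists (fn y => y = x) xs
fun union (xs, ys) =
  foldl (fn (y, acc) => if member y acc then acc else acc @ [y]) xs ys
fun isNonterm x = List.exists (fn (l, _) => l = x) grammar
fun lookup tbl x =
  case List.find (fn (k, _) => k = x) tbl of SOME (_, v) => v | NONE => []

(* FIRST of a sequence b, and whether all of b can derive the empty string. *)
fun firstSeq [] = ([], true)
  | firstSeq (y :: ys) =
      if member y nullable
      then let val (f, n) = firstSeq ys in (union (first y, f), n) end
      else (first y, false)

(* Apply rules 2 and 3 to every non-terminal B in A ::= a B b. *)
fun visit (lhs, rhs) tbl =
  let fun go [] t = t
        | go (b :: beta) t =
            if isNonterm b
            then let val (f, allNull) = firstSeq beta
                     val add = if allNull then union (f, lookup t lhs) else f
                     val t' = (b, union (lookup t b, add))
                              :: List.filter (fn (k, _) => k <> b) t
                 in go beta t' end
            else go beta t
  in go rhs tbl end

fun size tbl = foldl (fn ((_, v), n) => n + length v) 0 tbl

(* Rule 1 seeds FOLLOW(start) with $; then iterate until nothing changes. *)
fun followSets start =
  let fun step tbl =
        let val tbl' = foldl (fn (p, t) => visit p t) tbl grammar
        in if size tbl' = size tbl then tbl else step tbl' end
  in step [(start, ["$"])] end
```

Calling followSets "E" gives FOLLOW(E) = FOLLOW(E') = {), $}, FOLLOW(T) = FOLLOW(T') = {+, ), $}, and FOLLOW(F) = {+, *, ), $}.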
Ex. Follow Computation
• Rule 1, Start symbol
– Add $ to Follow(E)
• Rule 2, Productions with embedded nonterms
– Add First( ) ) = { ) } to follow(E)
– Add First($) = { $ } to Follow(E')
– Add First(E') = {+,ε } to Follow(T)
– Add First(T') = {*,ε} to Follow(F)
• Rule 3, Nonterm in last position
– Add follow(E') to follow(E') (doesn't do much)
– Add follow (T) to follow(T')
– Add follow(T) to follow(F) since T' --> ε
– Add follow(T') to follow(F) since T' --> ε
E  ::= T E' $
E' ::= + T E'
E' ::= ε
T  ::= F T'
T' ::= * F T'
T' ::= ε
F  ::= ( E )
F  ::= id
Table from First and Follow
1. For each production A -> alpha do 2 & 3
2. For each terminal a in First alpha do add A -> alpha to M[A,a]
3. if ε is in First alpha, add A -> alpha to M[A,b] for each terminal
b in Follow A. If ε is in First alpha and $ is in Follow A add A -> alpha to M[A,$].
First E  = {"(","id"}
First F  = {"(","id"}
First T  = {"(","id"}
First E' = {"+",ε}
First T' = {"*",ε}
Follow E  = {")","$"}
Follow F  = {"+","*",")","$"}
Follow T  = {"+",")","$"}
Follow E' = {")","$"}
Follow T' = {"+",")","$"}
1  E  ::= T E' $
2  E' ::= + T E'
3       | ε
4  T  ::= F T'
5  T' ::= * F T'
6       | ε
7  F  ::= ( E )
8       | id
M[A,t] (rows: nonterms, columns: terminals, entries: production numbers)
       +    *    )    (    id   $
E                     1    1
E'     2         3              3
T                     4    4
T'     6    5    6              6
F                     7    8
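The table construction can be sketched in SML with the First and Follow sets entered by hand. This is only an illustration: the string representation of symbols, tracking ε with a nullable list, and the names prods, entries, table, and m are assumptions.

```sml
(* Numbered productions; [] is the empty right-hand side. *)
val prods =
  [(1, "E",  ["T", "E'", "$"]),
   (2, "E'", ["+", "T", "E'"]), (3, "E'", []),
   (4, "T",  ["F", "T'"]),
   (5, "T'", ["*", "F", "T'"]), (6, "T'", []),
   (7, "F",  ["(", "E", ")"]),  (8, "F",  ["id"])]

(* Sets entered by hand from the lecture (epsilon tracked by nullable). *)
val nullable = ["E'", "T'"]
fun first "E" = ["(", "id"] | first "T" = ["(", "id"]
  | first "F" = ["(", "id"] | first "E'" = ["+"] | first "T'" = ["*"]
  | first t = [t]                 (* FIRST of a terminal is itself *)
fun follow "E"  = [")", "$"]
  | follow "E'" = [")", "$"]
  | follow "T"  = ["+", ")", "$"]
  | follow "T'" = ["+", ")", "$"]
  | follow "F"  = ["+", "*", ")", "$"]
  | follow _    = []

fun member x xs = List.exists (fn y => y = x) xs
fun union (xs, ys) =
  foldl (fn (y, acc) => if member y acc then acc else acc @ [y]) xs ys

(* FIRST of alpha, and whether alpha can derive the empty string. *)
fun firstSeq [] = ([], true)
  | firstSeq (y :: ys) =
      if member y nullable
      then let val (f, n) = firstSeq ys in (union (first y, f), n) end
      else (first y, false)

(* Steps 2 and 3: the columns in which production n is entered. *)
fun entries (n, a, alpha) =
  let val (f, eps) = firstSeq alpha
      val cols = if eps then union (f, follow a) else f
  in map (fn t => ((a, t), n)) cols end

(* Step 1: do this for each production. *)
val table = List.concat (map entries prods)

(* m (A, t) lists the productions in cell M[A,t]; [] is an error entry. *)
fun m (a, t) =
  map (fn (_, n) => n) (List.filter (fn (k, _) => k = (a, t)) table)
```

A cell holding more than one production would signal a conflict, meaning the grammar cannot be parsed predictively with one token of lookahead. For this grammar every cell holds at most one.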
Predictive Parsing Table
       id      +         *         (       )    $
E      T E'                        T E'
E'             + T E'                      ε    ε
T      F T'                        F T'
T'             ε         * F T'            ε    ε
F      id                          (E)
Table Driven Algorithm
push start symbol
repeat
  let X = top of stack, A = next input
  if terminal(X)
  then if X = A
       then pop X; remove A
       else error()
  else (* nonterminal(X) *)
       if M[X,A] = Y1 Y2 ... Yk
       then pop X;
            push Yk Yk-1 ... Y1   (* so Y1 ends up on top *)
       else error()
until stack is empty, input = $
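The loop above can be sketched in SML as a tail-recursive function over a stack and an input list. This is a sketch under assumptions: tokens are strings, every identifier has already been classified as "id", the input ends with "$", and the names ParseError, isTerminal, table, run, and parse are hypothetical.

```sml
exception ParseError of string

fun isTerminal x =
  not (List.exists (fn n => n = x) ["E", "E'", "T", "T'", "F"])

(* Parsing table M[X,A] from the lecture; NONE marks an error entry
   and SOME [] is an entry holding the empty right-hand side. *)
fun table ("E",  "id") = SOME ["T", "E'"]
  | table ("E",  "(")  = SOME ["T", "E'"]
  | table ("E'", "+")  = SOME ["+", "T", "E'"]
  | table ("E'", ")")  = SOME []
  | table ("E'", "$")  = SOME []
  | table ("T",  "id") = SOME ["F", "T'"]
  | table ("T",  "(")  = SOME ["F", "T'"]
  | table ("T'", "+")  = SOME []
  | table ("T'", "*")  = SOME ["*", "F", "T'"]
  | table ("T'", ")")  = SOME []
  | table ("T'", "$")  = SOME []
  | table ("F",  "id") = SOME ["id"]
  | table ("F",  "(")  = SOME ["(", "E", ")"]
  | table _            = NONE

(* The stack is a list with its top at the head, so prepending
   Y1 ... Yk is the same as pushing Yk, Yk-1, ..., Y1. *)
fun run ([], ["$"]) = true             (* stack empty, input = $ *)
  | run ([], _) = raise ParseError "input left over"
  | run (_, []) = raise ParseError "missing $"
  | run (x :: stack, a :: input) =
      if isTerminal x
      then if x = a then run (stack, input)        (* pop X; remove A *)
           else raise ParseError ("expected " ^ x ^ " found " ^ a)
      else case table (x, a) of
               SOME ys => run (ys @ stack, a :: input)
             | NONE => raise ParseError ("no entry for M[" ^ x ^ "," ^ a ^ "]")

fun parse tokens = run (["E"], tokens)
```

For example, parse ["id", "+", "id", "$"] returns true, while parse ["id", "+", "$"] raises ParseError because M[T,$] is empty.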
Example Parse
(uses the Predictive Parsing Table above; the top of the stack is at the right)
Stack          Input
E              x + y $
E' T           x + y $
E' T' F        x + y $
E' T' id       x + y $
E' T'          + y $
E'             + y $
E' T +         + y $
E' T           y $
E' T' F        y $
E' T' id       y $
E' T'          $
E'             $
               $
CS321 Prog Lang & Compilers
Assigned: Feb 5, 2007
Assignment # 7
Due: Wed. Feb 14, 2007
Cut and paste the following into your solution file.
==================================================================
datatype RE
= Empty
| Union of RE * RE
| Concat of RE * RE
| Star of RE
| C of char;
================================================================
The purpose of today's homework is to write functions analogous to "first" and
"follow" for context free grammars from today's lecture. There are two differences:
the functions you will write for homework are for regular expressions, not context
free grammars, and since REs don't have non-terminal and terminal symbols, the
functions are for a complete RE rather than a symbol.
Write 3 ML functions
1) Write (nullable: RE -> bool). Returns a boolean, true if the empty string is a
member of the set of strings recognized by the RE, false otherwise.
2) Write (first: RE -> char list). Returns a char list which contains those
characters which may appear as the first character in the strings recognized by
that RE.
3) Write (last: RE -> char list). Returns a char list which contains those characters
which may appear as the last character in the strings recognized by that RE.
All these functions are simple functions defined with pattern matching, with one clause for
each constructor of RE.