Announcements
HW1 due on Monday February 8th
Name and date your submission
Submit electronically in Homework Server AND
on paper at the beginning of class
Make sure you have an account in HW Server!
You may submit late on Thursday February 11th
for 50% credit
No other submissions accepted
Email questions to [email protected]
Spring 16 CSCI 4430, A Milanova
Last Class
Top-down parsing vs. bottom-up parsing
Top-down parsing
Introduction
A backtracking parser
Recursive descent predictive parser
Table-driven top-down parser
LL(1) parsing tables, FIRST and FOLLOW sets
Today’s Lecture Outline
Top-down (also called LL) parsing
LL(1) parsing tables, FIRST, FOLLOW and
PREDICT sets
Writing an LL(1) grammar
Bottom-up (also called LR) parsing
Model of a bottom-up (LR) parser
Programming Language Syntax
Parsing
Read: finish Chapter 2.3.2 and start
Chapter 2.3.3
LL(1) Parsing Tables
One dimension is nonterminal to expand
Other dimension is lookahead token
     | a
  A  | A → α
E.g., entry “nonterminal A on terminal a”
contains production A → α
This means, when the parser is at nonterminal A
and the lookahead token in the stream is a, the
parser must expand A by production A → α
LL(1) Parsing Table
start → expr $$
expr → term term_tail
term → id factor_tail
term_tail → + term term_tail | ε
factor_tail → * id factor_tail | ε

              id               +                  *                  $$
start         expr $$          -                  -                  -
expr          term term_tail   -                  -                  -
term_tail     -                + term term_tail   -                  ε
term          id factor_tail   -                  -                  -
factor_tail   -                ε                  * id factor_tail   ε
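To make the table concrete, here is a minimal Python sketch of a table-driven top-down parser that uses the entries of this slide's table. The names `TABLE`, `NONTERMINALS`, and `ll1_parse` are illustrative choices, not part of the lecture, and tokens are assumed to arrive as plain strings.

```python
# LL(1) table from the slide: (nonterminal, lookahead) -> right-hand side.
# An empty list stands for an epsilon production; missing keys are errors.
TABLE = {
    ("start", "id"): ["expr", "$$"],
    ("expr", "id"): ["term", "term_tail"],
    ("term_tail", "+"): ["+", "term", "term_tail"],
    ("term_tail", "$$"): [],
    ("term", "id"): ["id", "factor_tail"],
    ("factor_tail", "+"): [],
    ("factor_tail", "*"): ["*", "id", "factor_tail"],
    ("factor_tail", "$$"): [],
}
NONTERMINALS = {"start", "expr", "term_tail", "term", "factor_tail"}


def ll1_parse(tokens):
    """Return True if the token list (without the end marker) is accepted."""
    stream = list(tokens) + ["$$"]
    stack = ["start"]  # the top of the stack is the end of the list
    pos = 0
    while stack:
        top = stack.pop()
        lookahead = stream[pos] if pos < len(stream) else None
        if top in NONTERMINALS:
            rhs = TABLE.get((top, lookahead))
            if rhs is None:  # empty table entry: syntax error
                return False
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        else:
            if top != lookahead:  # terminal on the stack must match input
                return False
            pos += 1
    return pos == len(stream)
```

For example, `ll1_parse(["id", "+", "id", "*", "id"])` accepts, while `ll1_parse(["id", "+"])` is rejected because the entry for term on $$ is empty.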
LL(1) Parsing Tables
We can construct an LL(1) parsing table for any
context-free grammar
In general, the table will contain multiply-defined entries.
That is, for some nonterminal and lookahead token, more
than one production applies
A grammar whose LL(1) parsing table has no
multiply-defined entries is said to be an LL(1) grammar
LL(1) grammars are a very special subclass of context-free grammars. Why?
FIRST and FOLLOW sets
Let α be any sequence of nonterminals and
terminals
FIRST(α) is the set of terminals a that begin the strings
derived from α
If there is a derivation α ⇒* ε, then ε is in FIRST(α)
Let A be a nonterminal
FOLLOW(A) is the set of terminals b (including special
end-of-input marker $$) that can appear immediately to
the right of A in some sentential form:
start ⇒* …Ab… ⇒* …
Computing FIRST
Notation:
α is an arbitrary sequence
of terminals and nonterminals.
Apply these rules until no more terminals or ε can be
added to any FIRST(α) set
(1) If α starts with a terminal a, then FIRST(α) = {a}
(2) If α is a nonterminal X, where X → ε, then add ε to
FIRST(α)
(3) If α is a nonterminal X, where X → Y1Y2…Yk, then place a in
FIRST(X) if for some i, a is in FIRST(Yi) and ε is in all of
FIRST(Y1), … FIRST(Yi-1). If ε is in all of FIRST(Y1), …
FIRST(Yk), add ε to FIRST(X).
Everything in FIRST(Y1) is surely in FIRST(X)
If Y1 does not derive ε, then we add nothing more;
Otherwise, we add FIRST(Y2), and so on
Similarly, if α is Y1Y2…Yk , we’ll repeat the above
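The rules above amount to a fixed-point computation. The following Python sketch is one possible way to write it, using the expression grammar from the earlier slides; the dictionary layout and the names `GRAMMAR`, `first_of_sequence`, and `compute_first` are assumptions made for illustration.

```python
EPS = "ε"

# Expression grammar from the slides; [] is an epsilon right-hand side.
# Any symbol that is not a key of GRAMMAR is treated as a terminal.
GRAMMAR = {
    "start": [["expr", "$$"]],
    "expr": [["term", "term_tail"]],
    "term": [["id", "factor_tail"]],
    "term_tail": [["+", "term", "term_tail"], []],
    "factor_tail": [["*", "id", "factor_tail"], []],
}


def first_of_sequence(seq, first):
    """FIRST of a symbol sequence, given the FIRST sets of nonterminals."""
    result = set()
    for sym in seq:
        sym_first = first[sym] if sym in first else {sym}  # rule (1)
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result  # this symbol cannot vanish, so stop here
    result.add(EPS)  # every symbol can derive epsilon, or seq is empty
    return result


def compute_first(grammar):
    """Apply rules (2) and (3) until no FIRST set grows."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of_sequence(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first
```

Calling `compute_first(GRAMMAR)` gives the FIRST sets of all nonterminals, and `first_of_sequence` can then be reused for arbitrary right-hand sides.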
Warm-up Exercise
start → expr $$
expr → term term_tail
term → id factor_tail
term_tail → + term term_tail | ε
factor_tail → * id factor_tail | ε
FIRST(term) = { id }
FIRST(expr) =
FIRST(start) =
FIRST(term_tail) =
FIRST(+ term term_tail) =
FIRST(factor_tail) =
Exercise
start → S $$
S → x S | A y
A → B C D | ε
B → z S | ε
C → v S | ε
D → w S
Compute FIRST sets:
FIRST(x S) =
FIRST(A y) =
FIRST(BCD) =
FIRST(z S) =
FIRST(v S) =
FIRST(w S) =
FIRST(S) =
FIRST(A) =
FIRST(B) =
FIRST(C) =
FIRST(D) =
Computing FOLLOW
Notation:
A,B,S are nonterminals.
α,β are arbitrary sequences
of terminals and nonterminals.
Apply these rules until nothing can be added to
any FOLLOW(A) set
(1) If there is a production A → αBβ, then everything
in FIRST(β) except for ε should be added to
FOLLOW(B)
(2) If there is a production A → αB, or a production
A → αBβ, where FIRST(β) contains ε, then
everything in FOLLOW(A) should be added to
FOLLOW(B)
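The two FOLLOW rules can also be run as a fixed-point loop. The sketch below is one hedged way to do it in Python for the expression grammar; the FIRST sets are assumed to be already known and are written out by hand, and the names `GRAMMAR`, `FIRST`, and `compute_follow` are illustrative.

```python
EPS = "ε"

GRAMMAR = {
    "start": [["expr", "$$"]],
    "expr": [["term", "term_tail"]],
    "term": [["id", "factor_tail"]],
    "term_tail": [["+", "term", "term_tail"], []],
    "factor_tail": [["*", "id", "factor_tail"], []],
}
# FIRST sets of the nonterminals, assumed to have been computed already.
FIRST = {
    "start": {"id"}, "expr": {"id"}, "term": {"id"},
    "term_tail": {"+", EPS}, "factor_tail": {"*", EPS},
}


def first_of_sequence(seq):
    """FIRST of a symbol sequence; symbols not in FIRST are terminals."""
    result = set()
    for sym in seq:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_follow(grammar):
    """Apply rules (1) and (2) until no FOLLOW set grows."""
    follow = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, productions in grammar.items():
            for rhs in productions:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:  # only nonterminals have FOLLOW
                        continue
                    beta_first = first_of_sequence(rhs[i + 1:])
                    new = beta_first - {EPS}      # rule (1)
                    if EPS in beta_first:         # rule (2)
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow
```

Because the end marker $$ appears explicitly in the start production, it reaches FOLLOW(expr) through rule (1) with no special case.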
Warm-up
start → expr $$
expr → term term_tail
term → id factor_tail
term_tail → + term term_tail | ε
factor_tail → * id factor_tail | ε
FOLLOW(expr) = { $$ }
FOLLOW(term_tail) =
FOLLOW(term) =
FOLLOW(factor_tail) =
Exercise
start → S $$
S → x S | A y
A → B C D | ε
B → z S | ε
C → v S | ε
D → w S
Compute FOLLOW sets:
FOLLOW(A) =
FOLLOW(B) =
FOLLOW(C) =
FOLLOW(D) =
FOLLOW(S) =
PREDICT Sets
PREDICT(A → α) =
FIRST(α) if α does not derive ε
(FIRST(α) – {ε}) ∪ FOLLOW(A) if α derives ε
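As a small illustration of the PREDICT definition, the Python sketch below evaluates it for productions of the expression grammar. The FIRST and FOLLOW sets are assumed to be known and are written out by hand; `predict` and `first_of_sequence` are hypothetical helper names.

```python
EPS = "ε"

# FIRST and FOLLOW sets for the expression grammar, assumed precomputed.
FIRST = {
    "start": {"id"}, "expr": {"id"}, "term": {"id"},
    "term_tail": {"+", EPS}, "factor_tail": {"*", EPS},
}
FOLLOW = {
    "start": set(), "expr": {"$$"}, "term": {"+", "$$"},
    "term_tail": {"$$"}, "factor_tail": {"+", "$$"},
}


def first_of_sequence(seq):
    """FIRST of a symbol sequence; symbols not in FIRST are terminals."""
    result = set()
    for sym in seq:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def predict(lhs, rhs):
    """PREDICT(lhs -> rhs), following the two-case definition."""
    f = first_of_sequence(rhs)
    if EPS in f:  # rhs derives epsilon: also look at what follows lhs
        return (f - {EPS}) | FOLLOW[lhs]
    return f
```

For instance, `predict("factor_tail", [])` evaluates the ε-production and returns FOLLOW(factor_tail), while `predict("term_tail", ["+", "term", "term_tail"])` returns just the terminal +.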
Constructing LL(1) Parsing Table
Algorithm uses PREDICT sets
foreach production A → α in grammar G
  foreach symbol c in PREDICT(A → α)
    add A → α to entry parse_table[A,c]
If all entries in parse_table contain at most one
production, then G is said to be LL(1)
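The construction loop and the LL(1) check can be sketched in a few lines of Python. In this sketch the PREDICT sets are supplied by hand for the expression grammar (an assumption of the example, worked out from the definitions), and `build_table` and `is_ll1` are illustrative names.

```python
from collections import defaultdict

# (left-hand side, right-hand side, PREDICT set); () is an epsilon body.
# The PREDICT sets are written out by hand for this example.
PRODUCTIONS = [
    ("start", ("expr", "$$"), {"id"}),
    ("expr", ("term", "term_tail"), {"id"}),
    ("term", ("id", "factor_tail"), {"id"}),
    ("term_tail", ("+", "term", "term_tail"), {"+"}),
    ("term_tail", (), {"$$"}),
    ("factor_tail", ("*", "id", "factor_tail"), {"*"}),
    ("factor_tail", (), {"+", "$$"}),
]

# Left-recursive variant: both alternatives of expr are predicted on id.
LEFT_RECURSIVE = [
    ("expr", ("expr", "+", "term"), {"id"}),
    ("expr", ("term",), {"id"}),
]


def build_table(productions):
    """Map (nonterminal, lookahead) to the list of applicable bodies."""
    table = defaultdict(list)
    for lhs, rhs, predict_set in productions:
        for c in predict_set:
            table[(lhs, c)].append(rhs)
    return dict(table)


def is_ll1(table):
    """A grammar is LL(1) if no entry holds more than one production."""
    return all(len(entries) <= 1 for entries in table.values())
```

Here `is_ll1(build_table(PRODUCTIONS))` holds, whereas the left-recursive variant produces a multiply-defined entry for expr on id.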
Exercise
start → S $$
S → x S | A y
A → B C D | ε
B → z S | ε
C → v S | ε
D → w S
Compute PREDICT sets:
PREDICT(S → x S) =
PREDICT(S → A y) =
PREDICT(A → B C D) =
PREDICT(A → ε) =
… etc…
Writing an LL(1) Grammar
Most context-free grammars are not LL(1)
grammars
Obstacles to LL(1)-ness
Left recursion is an obstacle. Why?
expr → expr + term | term
term → term * id | id
Common prefixes are an obstacle. Why?
stmt → if b then stmt else stmt |
       if b then stmt |
       a
Removal of Left Recursion
Left recursion can be removed from a
grammar mechanically
Starting from this left-recursive expression
grammar:
expr → expr + term | term
term → term * id | id
After removal of left recursion we obtain this
equivalent grammar, which is LL(1):
expr → term term_tail
term_tail → + term term_tail | ε
term → id factor_tail
factor_tail → * id factor_tail | ε
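The mechanical step can be illustrated with a short Python sketch that handles immediate left recursion only (A → A α | β becomes A → β A', A' → α A' | ε). The function name and the list-of-lists grammar encoding are assumptions of this example; indirect left recursion is not covered.

```python
def remove_immediate_left_recursion(nt, productions, tail_name):
    """Rewrite  A -> A a1 | ... | b1 | ...  as
    A -> b1 A' | ...   and   A' -> a1 A' | ... | epsilon.

    nt is A, tail_name is the name chosen for A', and [] is epsilon.
    """
    # Bodies that start with A itself, with the leading A stripped off.
    recursive = [rhs[1:] for rhs in productions if rhs and rhs[0] == nt]
    # All remaining bodies.
    others = [rhs for rhs in productions if not rhs or rhs[0] != nt]
    if not recursive:
        return {nt: productions}  # nothing to do
    return {
        nt: [rhs + [tail_name] for rhs in others],
        tail_name: [alpha + [tail_name] for alpha in recursive] + [[]],
    }


# The two left-recursive nonterminals of the expression grammar.
expr_rules = remove_immediate_left_recursion(
    "expr", [["expr", "+", "term"], ["term"]], "term_tail")
term_rules = remove_immediate_left_recursion(
    "term", [["term", "*", "id"], ["id"]], "factor_tail")
```

With the tail names chosen as on the slide, `expr_rules` and `term_rules` together reproduce the four productions of the LL(1) grammar.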
Removal of Common Prefixes
Common prefixes can be removed
mechanically as well, by using left-factoring
Original if-then-else grammar:
stmt → if b then stmt else stmt |
       if b then stmt |
       a
After left-factoring:
stmt → if b then stmt else_part | a
else_part → else stmt | ε
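Left-factoring can be sketched in Python as well. The version below is deliberately simple: it factors only the first group of alternatives that share a leading symbol and does a single pass, which is enough for the if-then-else grammar. The names `common_prefix` and `left_factor` and the list encoding are assumptions of this example.

```python
def common_prefix(seqs):
    """Longest prefix shared by all symbol sequences in seqs."""
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(nt, productions, new_name):
    """Factor one group of alternatives of nt that share a first symbol.

    Returns the rewritten rules for nt plus the new nonterminal new_name;
    [] stands for epsilon. Only a single pass is performed.
    """
    groups = {}
    for rhs in productions:
        key = rhs[0] if rhs else None
        groups.setdefault(key, []).append(rhs)
    for key, group in groups.items():
        if key is not None and len(group) > 1:
            prefix = common_prefix(group)
            rest = [rhs for rhs in productions if rhs not in group]
            return {
                nt: [prefix + [new_name]] + rest,
                new_name: [rhs[len(prefix):] for rhs in group],
            }
    return {nt: productions}  # no common prefixes found


factored = left_factor(
    "stmt",
    [["if", "b", "then", "stmt", "else", "stmt"],
     ["if", "b", "then", "stmt"],
     ["a"]],
    "else_part",
)
```

Applied to the original if-then-else grammar, `factored` contains the two stmt alternatives and the two else_part alternatives shown after left-factoring.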
Exercise
Compute FIRSTs:
start → stmt $$
stmt → if b then stmt else_part | a
else_part → else stmt | ε
FIRST(stmt $$), FIRST(if b then stmt else_part),
FIRST(a), FIRST(else stmt)
Compute FOLLOW:
FOLLOW(else_part)
Compute PREDICT sets for all 5 productions
Construct the LL(1) parsing table. Is this grammar
an LL(1) grammar?
Lecture Outline
Top-down (also called LL) Parsing (continued)
LL(1) parsing table, FIRST, FOLLOW and
PREDICT sets
Writing an LL(1) grammar
Bottom-up (also called LR) Parsing
Model of the bottom-up (LR) parser
Bottom-up Parsing
list → id list_tail
list_tail → , id list_tail | ;
Terminals are seen in the
order of appearance in the
token stream
id , id , id ;
Parse tree is constructed
From the leaves to the top
A right-most derivation in reverse

Parse tree for id , id , id ;
list
  id
  list_tail
    ,
    id
    list_tail
      ,
      id
      list_tail
        ;
Bottom-up Parsing
list → id list_tail
list_tail → , id list_tail | ;

Stack        Input        Action
             id,id,id;    shift
id           ,id,id;      shift
id,          id,id;       shift
id,id        ,id;         shift
id,id,       id;          shift
id,id,id     ;            shift
id,id,id;                 reduce by list_tail → ;
Bottom-up Parsing
list → id list_tail
list_tail → , id list_tail | ;

Stack                 Input    Action
id,id,id list_tail             reduce by list_tail → , id list_tail
id,id list_tail                reduce by list_tail → , id list_tail
id list_tail                   reduce by list → id list_tail
list                           ACCEPT
Bottom-up Parsing
Also called LR parsing
LR parsing is better than LL parsing
LR parsers work with LR(k) grammars
Accept a larger class of languages
Just as efficient!
L stands for “left-to-right” scan of input
R stands for “rightmost” derivation
k stands for “need k tokens of lookahead”
We are interested in LR(0) and LR(1) and variants
in between
LR Parsing
The parsing method used in practice
LR parsers recognize virtually all PL constructs
LR parsers recognize a much larger set of grammars
than predictive parsers
LR parsing is efficient
LR parsing variants
SLR (or Simple LR)
LALR (or Lookahead LR) – yacc/bison generate LALR
parsers
LR (Canonical LR)
SLR < LALR < LR
Main Idea
Stack: holds the part of the input seen so far
A string of both terminals and nonterminals
Input: holds the remaining part of the input
A string of terminals
Parser performs two actions
Reduce: parser pops a “suitable” production
right-hand-side off the stack, and pushes the
production left-hand-side on the stack
Shift: parser pushes next terminal from the input
on top of the stack
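The two actions can be written as tiny stack operations. The Python sketch below replays the shift and reduce steps for the token string id , id , id ; of the list grammar from the earlier bottom-up example. The sequence of actions is hard-coded here, since deciding when to shift or reduce is exactly what the parsing table is for; `shift`, `reduce_top`, and `replay` are illustrative names.

```python
def shift(stack, tokens):
    """Shift: push the next input terminal on top of the stack."""
    stack.append(tokens.pop(0))


def reduce_top(stack, lhs, rhs):
    """Reduce: pop the right-hand side rhs off the stack, push lhs."""
    start = len(stack) - len(rhs)
    if stack[start:] != rhs:
        raise ValueError(f"top of stack is not {rhs}")
    del stack[start:]
    stack.append(lhs)


def replay():
    """Replay the action sequence for  id , id , id ;  step by step."""
    stack = []
    tokens = ["id", ",", "id", ",", "id", ";"]
    for _ in range(6):                      # six shifts consume the input
        shift(stack, tokens)
    reduce_top(stack, "list_tail", [";"])
    reduce_top(stack, "list_tail", [",", "id", "list_tail"])
    reduce_top(stack, "list_tail", [",", "id", "list_tail"])
    reduce_top(stack, "list", ["id", "list_tail"])
    return stack, tokens
```

After `replay()` the input is empty and the stack holds only the start symbol list, which corresponds to acceptance.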
Example
Recall the grammar
expr → expr + term | term
term → term * id | id
This is not LL(1) because it is left recursive
LR parsers can handle left recursion!
Consider string
id + id * id
id + id*id
expr → expr + term | term
term → term * id | id

Stack      Input      Action
           id+id*id   shift id
id         +id*id     reduce by term → id
term       +id*id     reduce by expr → term
expr       +id*id     shift +
expr+      id*id      shift id
expr+id    *id        reduce by term → id
id + id*id
expr → expr + term | term
term → term * id | id

Stack          Input   Action
expr+term      *id     shift *
expr+term*     id      shift id
expr+term*id           reduce by term → term * id
expr+term              reduce by expr → expr + term
expr                   ACCEPT, SUCCESS
id + id*id
expr → expr + term | term
term → term * id | id
Sequence of reductions performed by parser
id+id*id
term+id*id
expr+id*id
expr+term*id
expr+term
expr
• A right-most derivation in reverse
• The stack (e.g., expr) concatenated with remaining
input (e.g., +id*id) gives a sentential form (expr+id*id)
in the right-most derivation
Handle
expr → expr + term | term
term → term * id | id
A handle
Formally, if we have a right-most derivation
S ⇒ … ⇒ αAw ⇒ αβw, then we say that
A → β at position α is a handle of αβw
Notation: S and A are nonterminals, w is a
sequence of terminals, α and β are arbitrary
sequences (of both terminals and nonterminals)
Recall our example id+id*id

Stack          Input
expr+term      *id     Is expr → expr+term a handle?
expr+term*id           Is term → id a handle?
Model of an LR parser
[Figure: model of an LR parser. The input a1 … ai … an $$ is read by the LR parsing program, which keeps a stack of state and symbol pairs (sm Xm on top, then sm-1 Xm-1, …, down to s0) and consults a parsing table with two parts, action and goto.]
action[s,a]: Do we shift or reduce?
goto[s,A]: After reduction to
nonterminal A, what state is pushed
on top of the stack?
Model of an LR parser
Stack is (s0,X1,s1,…Xm,sm), input pointer at ai
action[sm,ai] is shift s
Push ai and state s on stack:
(s0,X1,s1,…Xm,sm,ai,s)
action[sm,ai] is reduce by A → β
Pop β (i.e., pop 2*|β| things off the stack - all
symbols in β plus all their corresponding states):
(s0,X1,s1,…Xm-|β|,sm-|β|)
Push A and goto[sm-|β|,A]=s on top of the stack:
(s0,X1,s1,…Xm-|β|,sm-|β|,A,s)
LR Parsing Table
1. expr → expr + term
2. expr → term
3. term → term * id
4. term → id

Columns id, +, *, $$ form the action table; columns expr and term form the goto table.

state   id    +     *     $$    expr   term
0       s3                      1      2
1             s4          acc
2             r2    s5    r2
3             r4    r4    r4
4       s3                             6
5       s7
6             r1    s5    r1
7             r3    r3    r3
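To show how the LR model uses such a table, here is a hedged Python sketch of the driver loop for the four numbered productions of this slide. The table entries are transcribed as dictionaries; the encoding (`("s", n)`, `("r", n)`, `("acc", None)`) and the names `ACTION`, `GOTO`, and `lr_parse` are choices made for this example.

```python
# Production number -> (left-hand side, length of right-hand side).
PRODUCTIONS = {
    1: ("expr", 3),  # expr -> expr + term
    2: ("expr", 1),  # expr -> term
    3: ("term", 3),  # term -> term * id
    4: ("term", 1),  # term -> id
}
ACTION = {
    (0, "id"): ("s", 3),
    (1, "+"): ("s", 4), (1, "$$"): ("acc", None),
    (2, "+"): ("r", 2), (2, "*"): ("s", 5), (2, "$$"): ("r", 2),
    (3, "+"): ("r", 4), (3, "*"): ("r", 4), (3, "$$"): ("r", 4),
    (4, "id"): ("s", 3),
    (5, "id"): ("s", 7),
    (6, "+"): ("r", 1), (6, "*"): ("s", 5), (6, "$$"): ("r", 1),
    (7, "+"): ("r", 3), (7, "*"): ("r", 3), (7, "$$"): ("r", 3),
}
GOTO = {(0, "expr"): 1, (0, "term"): 2, (4, "term"): 6}


def lr_parse(tokens):
    """Return the production numbers used, in order, or None on error."""
    stream = list(tokens) + ["$$"]
    stack = [0]  # s0, X1, s1, ..., Xm, sm as in the LR parser model
    reductions = []
    pos = 0
    while True:
        kind, arg = ACTION.get((stack[-1], stream[pos]), ("err", None))
        if kind == "s":    # shift: push the terminal and the new state
            stack += [stream[pos], arg]
            pos += 1
        elif kind == "r":  # reduce: pop 2*|rhs| entries, then use goto
            lhs, rhs_len = PRODUCTIONS[arg]
            del stack[len(stack) - 2 * rhs_len:]
            stack += [lhs, GOTO[(stack[-1], lhs)]]
            reductions.append(arg)
        elif kind == "acc":
            return reductions
        else:              # blank table entry: syntax error
            return None
```

For the input id + id * id, `lr_parse(["id", "+", "id", "*", "id"])` returns `[4, 2, 4, 3, 1]`, the same sequence of reductions as in the earlier shift-reduce trace, i.e. a right-most derivation in reverse.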
Summary
Top-down (also called LL) Parsing (continued)
LL(1) parsing table and predict sets
Writing an LL(1) grammar
Bottom-up (also called LR) Parsing
Model of the bottom-up (LR) parser
LR parsing table
Next Class
We will continue with Bottom-up Parsing.
Keep reading Chapter 2.3.3