Transcript ppt

Announcements

HW1 due on Monday February 8th


Name and date your submission
Submit electronically in Homework Server AND
on paper at the beginning of class




Make sure you have an account in HW Server!
You may submit late on Thursday February 11th
for 50% credit
No other submissions accepted
Email questions to [email protected]
Spring 16 CSCI 4430, A Milanova
Last Class


Top-down parsing vs. bottom-up parsing
Top-down parsing





Introduction
A backtracking parser
Recursive descent predictive parser
Table-driven top-down parser
LL(1) parsing tables, FIRST and FOLLOW sets
Today’s Lecture Outline

Top-down (also called LL) parsing



LL(1) parsing tables, FIRST, FOLLOW and
PREDICT sets
Writing an LL(1) grammar
Bottom-up (also called LR) parsing

Model of a bottom-up (LR) parser
Programming Language Syntax
Parsing
Read: finish Chapter 2.3.2 and start
Chapter 2.3.3
LL(1) Parsing Tables


One dimension is the nonterminal to expand
The other dimension is the lookahead token
[Figure: table with row A and column a; the entry holds α]
E.g., entry “nonterminal A on terminal a”
contains production A → α

This means that when the parser is at nonterminal A
and the lookahead token in the stream is a, the
parser must expand A by production A → α
LL(1) Parsing Table
start → expr $$
expr → term term_tail
term → id factor_tail
term_tail → + term term_tail | ε
factor_tail → * id factor_tail | ε

             id               +                  *                  $$
start        expr $$          -                  -                  -
expr         term term_tail   -                  -                  -
term_tail    -                + term term_tail   -                  ε
term         id factor_tail   -                  -                  -
factor_tail  -                ε                  * id factor_tail   ε
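To make the table concrete, here is a minimal Python sketch of a table-driven LL(1) parser that uses the entries above. It assumes the input is already a list of tokens (id, +, *) ending in $$; the dictionary encoding and the names TABLE and ll1_parse are illustrative, not part of the slides. A missing key plays the role of a "-" (error) entry, and an empty right-hand side stands for ε.

```python
# Hypothetical encoding of the LL(1) table: TABLE[(nonterminal, lookahead)] is the
# right-hand side to expand. Missing keys are the "-" (error) entries; [] is ε.
TABLE = {
    ("start", "id"): ["expr", "$$"],
    ("expr", "id"): ["term", "term_tail"],
    ("term_tail", "+"): ["+", "term", "term_tail"],
    ("term_tail", "$$"): [],
    ("term", "id"): ["id", "factor_tail"],
    ("factor_tail", "+"): [],
    ("factor_tail", "*"): ["*", "id", "factor_tail"],
    ("factor_tail", "$$"): [],
}
NONTERMINALS = {"start", "expr", "term", "term_tail", "factor_tail"}


def ll1_parse(tokens):
    """Parse a token list ending in "$$"; return the productions applied, in order."""
    stack = ["start"]  # the end of the Python list is the top of the parse stack
    pos = 0            # index of the current lookahead token
    applied = []
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in NONTERMINALS:
            rhs = TABLE.get((top, lookahead))
            if rhs is None:
                raise SyntaxError(f"no table entry for ({top}, {lookahead})")
            applied.append((top, rhs))
            stack.extend(reversed(rhs))  # push so the leftmost symbol is on top
        elif top == lookahead:
            pos += 1                     # terminal on top matches the input: consume it
        else:
            raise SyntaxError(f"expected {top}, found {lookahead}")
    return applied


for lhs, rhs in ll1_parse(["id", "+", "id", "*", "id", "$$"]):
    print(lhs, "->", " ".join(rhs) or "ε")
```

Each step consults exactly one table entry, so the parser never backtracks; the productions it prints form a leftmost derivation of id + id * id.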
LL(1) Parsing Tables

We can construct an LL(1) parsing table for any
context-free grammar


In general, the table will contain multiply-defined entries.
That is, for some nonterminal and lookahead token, more
than one production applies
A grammar whose LL(1) parsing table has no
multiply-defined entries is said to be an LL(1) grammar

LL(1) grammars are a very special subclass of context-free grammars. Why?
FIRST and FOLLOW sets

Let α be any sequence of nonterminals and
terminals



FIRST(α) is the set of terminals a that begin the strings
derived from α
If there is a derivation α ⇒* ε, then ε is in FIRST(α)
Let A be a nonterminal

FOLLOW(A) is the set of terminals b (including special
end-of-input marker $$) that can appear immediately to
the right of A in some sentential form:
start ⇒* …Ab… ⇒* …
Computing FIRST

Notation:
α is an arbitrary sequence
of terminals and nonterminals.
Apply these rules until no more terminals or ε can be
added to any FIRST(α) set
(1) If α starts with a terminal a, then FIRST(α) = {a}
(2) If α is a nonterminal X, where X → ε, then add ε to
FIRST(α)
(3) If α is a nonterminal X and X → Y1Y2…Yk, then place a in
FIRST(X) if for some i, a is in FIRST(Yi) and ε is in all of
FIRST(Y1), …, FIRST(Yi-1). If ε is in all of FIRST(Y1), …,
FIRST(Yk), add ε to FIRST(X).
  • Everything in FIRST(Y1), except ε, is surely in FIRST(X)
  • If Y1 does not derive ε, then we add nothing more;
    otherwise, we add FIRST(Y2), and so on
Similarly, if α is a sequence Y1Y2…Yk, we repeat the above
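The rules above amount to a fixed-point iteration. The Python sketch below is one way to code it, assuming a grammar encoded as a dictionary from each nonterminal to a list of right-hand sides, where [] stands for ε and any symbol that is not a dictionary key is a terminal. The names GRAMMAR, first_of_seq and first_sets are hypothetical. It runs on the warm-up grammar from the next slide; swapping in the exercise grammar is one way to check hand-computed answers.

```python
EPS = "ε"

# Warm-up grammar. Hypothetical encoding: nonterminal -> list of right-hand sides;
# [] is ε; symbols that are not keys are terminals.
GRAMMAR = {
    "start": [["expr", "$$"]],
    "expr": [["term", "term_tail"]],
    "term": [["id", "factor_tail"]],
    "term_tail": [["+", "term", "term_tail"], []],
    "factor_tail": [["*", "id", "factor_tail"], []],
}


def first_of_seq(seq, first, grammar):
    """FIRST of a sequence Y1 Y2 ... Yk, using the current FIRST sets."""
    result = set()
    for sym in seq:
        if sym not in grammar:            # rule (1): a terminal starts the sequence
            result.add(sym)
            return result
        result |= first[sym] - {EPS}      # rule (3): add FIRST(Yi) without ε
        if EPS not in first[sym]:         # Yi cannot derive ε, so stop here
            return result
    result.add(EPS)                       # empty sequence, or every Yi derives ε
    return result


def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                        # repeat until no set grows
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                new = first_of_seq(rhs, first, grammar)  # [] yields {ε}: rule (2)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = first_sets(GRAMMAR)
for nt in GRAMMAR:
    print(f"FIRST({nt}) = {sorted(FIRST[nt])}")
```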
Warm-up Exercise
start → expr $$
expr → term term_tail
term → id factor_tail
term_tail → + term term_tail | ε
factor_tail → * id factor_tail | ε
FIRST(term) = { id }
FIRST(expr) =
FIRST(start) =
FIRST(term_tail) =
FIRST(+ term term_tail) =
FIRST(factor_tail) =
Exercise
start → S $$
S → x S | A y
A → BCD | ε
B → z S | ε
C → v S | ε
D → w S
Compute FIRST sets:
FIRST(x S) =
FIRST(A y) =
FIRST(BCD) =
FIRST(z S) =
FIRST(v S) =
FIRST(w S) =
FIRST(S) =
FIRST(A) =
FIRST(B) =
FIRST(C) =
FIRST(D) =
Computing FOLLOW
Notation:
A,B,S are nonterminals.
α,β are arbitrary sequences
of terminals and nonterminals.
Apply these rules until nothing can be added to
any FOLLOW(A) set

(1) If there is a production A → αBβ, then everything
in FIRST(β) except for ε should be added to
FOLLOW(B)
(2) If there is a production A → αB, or a production
A → αBβ, where FIRST(β) contains ε, then
everything in FOLLOW(A) should be added to
FOLLOW(B)
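A matching Python sketch for the FOLLOW rules, again assuming the dictionary encoding of the grammar ([] for ε, non-keys are terminals). To keep it short, the FIRST sets of the warm-up grammar's nonterminals are hard-coded rather than recomputed; the names follow_sets and first_of_seq are hypothetical. Rule (2) covers both of its cases at once, because FIRST of an empty β is taken to be {ε}.

```python
EPS = "ε"

GRAMMAR = {
    "start": [["expr", "$$"]],
    "expr": [["term", "term_tail"]],
    "term": [["id", "factor_tail"]],
    "term_tail": [["+", "term", "term_tail"], []],
    "factor_tail": [["*", "id", "factor_tail"], []],
}

# FIRST sets of the nonterminals, hard-coded here to keep the sketch short;
# they are what the FIRST rules give for this grammar.
FIRST = {
    "start": {"id"},
    "expr": {"id"},
    "term": {"id"},
    "term_tail": {"+", EPS},
    "factor_tail": {"*", EPS},
}


def first_of_seq(seq):
    result = set()
    for sym in seq:
        if sym not in GRAMMAR:
            result.add(sym)
            return result
        result |= FIRST[sym] - {EPS}
        if EPS not in FIRST[sym]:
            return result
    result.add(EPS)          # β is empty or every symbol in β derives ε
    return result


def follow_sets(grammar):
    follow = {nt: set() for nt in grammar}
    changed = True
    while changed:           # repeat until nothing can be added
        changed = False
        for a, rhss in grammar.items():
            for rhs in rhss:
                for i, b in enumerate(rhs):
                    if b not in grammar:
                        continue                 # FOLLOW is only needed for nonterminals
                    first_beta = first_of_seq(rhs[i + 1:])
                    new = first_beta - {EPS}     # rule (1): FIRST(β) except ε
                    if EPS in first_beta:        # rule (2): A → αB, or β derives ε
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


FOLLOW = follow_sets(GRAMMAR)
for nt in GRAMMAR:
    print(f"FOLLOW({nt}) = {sorted(FOLLOW[nt])}")
```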
Warm-up
start → expr $$
expr → term term_tail
term → id factor_tail
term_tail → + term term_tail | ε
factor_tail → * id factor_tail | ε
FOLLOW(expr) = { $$ }
FOLLOW(term_tail) =
FOLLOW(term) =
FOLLOW(factor_tail) =
Exercise
start → S $$
S → x S | A y
A → BCD | ε
B → z S | ε
C → v S | ε
D → w S
Compute FOLLOW sets:
FOLLOW(A) =
FOLLOW(B) =
FOLLOW(C) =
FOLLOW(D) =
FOLLOW(S) =
PREDICT Sets
PREDICT(A → α) = FIRST(α)                        if α does not derive ε
PREDICT(A → α) = (FIRST(α) – {ε}) ∪ FOLLOW(A)    if α derives ε
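The two cases of the PREDICT definition translate directly into code. This Python sketch hard-codes the FIRST and FOLLOW sets of the warm-up grammar (hypothetical names FIRST, FOLLOW, predict) and prints PREDICT for each production; [] again stands for ε.

```python
EPS = "ε"

# Warm-up grammar with its FIRST and FOLLOW sets hard-coded (they follow from
# the FIRST and FOLLOW rules on the previous slides).
GRAMMAR = {
    "start": [["expr", "$$"]],
    "expr": [["term", "term_tail"]],
    "term": [["id", "factor_tail"]],
    "term_tail": [["+", "term", "term_tail"], []],
    "factor_tail": [["*", "id", "factor_tail"], []],
}
FIRST = {
    "start": {"id"},
    "expr": {"id"},
    "term": {"id"},
    "term_tail": {"+", EPS},
    "factor_tail": {"*", EPS},
}
FOLLOW = {
    "start": set(),
    "expr": {"$$"},
    "term": {"+", "$$"},
    "term_tail": {"$$"},
    "factor_tail": {"+", "$$"},
}


def first_of_seq(seq):
    result = set()
    for sym in seq:
        if sym not in GRAMMAR:
            result.add(sym)
            return result
        result |= FIRST[sym] - {EPS}
        if EPS not in FIRST[sym]:
            return result
    result.add(EPS)
    return result


def predict(a, alpha):
    """PREDICT(A → α), following the two cases of the definition."""
    first_alpha = first_of_seq(alpha)
    if EPS not in first_alpha:                 # α does not derive ε
        return first_alpha
    return (first_alpha - {EPS}) | FOLLOW[a]   # α derives ε


for a, rhss in GRAMMAR.items():
    for alpha in rhss:
        print(f"PREDICT({a} -> {' '.join(alpha) or 'ε'}) = {sorted(predict(a, alpha))}")
```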
Constructing LL(1) Parsing Table

Algorithm uses PREDICT sets
foreach production A → α in grammar G
    foreach symbol c in PREDICT(A → α)
        add A → α to entry parse_table[A,c]
If all entries in parse_table contain at most one
production, then G is said to be LL(1)

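A Python sketch of the table-construction loop above, with conflict detection added. It assumes the PREDICT sets are already known and passes them in as a dictionary keyed by (A, α), with α as a tuple and () standing for ε; build_ll1_table and the two example inputs are hypothetical. The first input is the warm-up grammar, whose eight table entries should all be single productions; the second is the if-then-else grammar with a common prefix, where two productions land in the same entry.

```python
from collections import defaultdict


def build_ll1_table(predict_sets):
    """predict_sets maps (A, α) to PREDICT(A → α); α is a tuple and () is ε.
    Returns the table and its multiply-defined entries (the conflicts)."""
    table = defaultdict(list)
    for (a, alpha), symbols in predict_sets.items():
        for c in symbols:
            table[(a, c)].append(alpha)   # add A → α to parse_table[A, c]
    conflicts = {cell: alphas for cell, alphas in table.items() if len(alphas) > 1}
    return dict(table), conflicts


# PREDICT sets of the warm-up grammar (hard-coded for this sketch).
WARMUP_PREDICT = {
    ("start", ("expr", "$$")): {"id"},
    ("expr", ("term", "term_tail")): {"id"},
    ("term", ("id", "factor_tail")): {"id"},
    ("term_tail", ("+", "term", "term_tail")): {"+"},
    ("term_tail", ()): {"$$"},
    ("factor_tail", ("*", "id", "factor_tail")): {"*"},
    ("factor_tail", ()): {"+", "$$"},
}

# The if-then-else grammar before left-factoring: both "if" alternatives
# are predicted by the same token.
PREFIX_PREDICT = {
    ("stmt", ("if", "b", "then", "stmt", "else", "stmt")): {"if"},
    ("stmt", ("if", "b", "then", "stmt")): {"if"},
    ("stmt", ("a",)): {"a"},
}

for name, sets in [("warm-up", WARMUP_PREDICT), ("common prefix", PREFIX_PREDICT)]:
    table, conflicts = build_ll1_table(sets)
    print(name, "is LL(1)" if not conflicts else f"has conflicts at {sorted(conflicts)}")
```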
Exercise
start → S $$
S → x S | A y
A → BCD | ε
B → z S | ε
C → v S | ε
D → w S
Compute PREDICT sets:
PREDICT(S → x S) =
PREDICT(S → A y) =
PREDICT(A → BCD) =
PREDICT(A → ε) =
… etc…
Writing an LL(1) Grammar


Most context-free grammars are not LL(1)
grammars
Obstacles to LL(1)-ness
expr → expr + term | term
term → term * id | id

Left recursion is an
obstacle. Why?

stmt → if b then stmt else stmt |
       if b then stmt |
       a
Common prefixes are
an obstacle. Why?
Removal of Left Recursion


Left recursion can be removed from a
grammar mechanically
We start from this left-recursive expression
grammar:
expr → expr + term | term
term → term * id | id

After removal of left recursion we obtain this
equivalent grammar, which is LL(1):
expr → term term_tail
term_tail → + term term_tail | ε
term → id factor_tail
factor_tail → * id factor_tail | ε
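The mechanical transformation can be sketched in Python for immediate left recursion: alternatives of the form A → A α move into a new nonterminal that repeats α, and the remaining alternatives β are followed by that new nonterminal. The function name and the list-of-lists encoding ([] for ε) are assumptions, and the caller picks the new nonterminal's name. Indirect left recursion (A ⇒+ A… through other nonterminals) needs a more general algorithm, which this sketch does not implement.

```python
def remove_immediate_left_recursion(a, rhss, tail):
    """Rewrite  A → A α1 | ... | A αm | β1 | ... | βn  as
         A    → β1 tail | ... | βn tail
         tail → α1 tail | ... | αm tail | ε
    where `tail` is the caller-chosen name of the new nonterminal."""
    alphas = [rhs[1:] for rhs in rhss if rhs and rhs[0] == a]  # the A α alternatives
    betas = [rhs for rhs in rhss if not rhs or rhs[0] != a]    # all other alternatives
    if not alphas:
        return {a: rhss}                                       # A is not left-recursive
    return {
        a: [beta + [tail] for beta in betas],
        tail: [alpha + [tail] for alpha in alphas] + [[]],     # [] stands for ε
    }


expr_rules = remove_immediate_left_recursion(
    "expr", [["expr", "+", "term"], ["term"]], "term_tail")
term_rules = remove_immediate_left_recursion(
    "term", [["term", "*", "id"], ["id"]], "factor_tail")

for rules in (expr_rules, term_rules):
    for lhs, rhss in rules.items():
        print(lhs, "->", " | ".join(" ".join(r) or "ε" for r in rhss))
```

Called with the names term_tail and factor_tail, it reproduces the LL(1) grammar above.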
Removal of Common Prefixes


Common prefixes can be removed
mechanically as well, by using left-factoring
Original if-then-else grammar:
stmt → if b then stmt else stmt |
       if b then stmt |
       a

After left-factoring:
stmt → if b then stmt else_part | a
else_part → else stmt | ε
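One left-factoring step can be sketched in Python: group the alternatives by their first symbol, take the longest common prefix of a group with more than one member, and move the differing suffixes into a new nonterminal. The function names and the encoding ([] for ε) are hypothetical; the sketch performs one step per call, so a grammar with several such groups, or with nested common prefixes, needs repeated calls.

```python
def common_prefix(seqs):
    """Longest sequence of symbols that every sequence in seqs starts with."""
    prefix = []
    for symbols in zip(*seqs):        # zip stops at the shortest sequence
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(a, rhss, new_name):
    """One left-factoring step for nonterminal A: factor out the longest common
    prefix of the first group of alternatives that share a first symbol."""
    groups = {}
    for rhs in rhss:
        if rhs:                        # ε alternatives have no first symbol
            groups.setdefault(rhs[0], []).append(rhs)
    for group in groups.values():
        if len(group) > 1:
            prefix = common_prefix(group)
            others = [rhs for rhs in rhss if rhs not in group]
            return {
                a: [prefix + [new_name]] + others,
                new_name: [rhs[len(prefix):] for rhs in group],  # [] is ε
            }
    return {a: rhss}                   # nothing to factor


factored = left_factor(
    "stmt",
    [["if", "b", "then", "stmt", "else", "stmt"], ["if", "b", "then", "stmt"], ["a"]],
    "else_part",
)
for lhs, rhss in factored.items():
    print(lhs, "->", " | ".join(" ".join(r) or "ε" for r in rhss))
```

Applied to the original if-then-else grammar with the new name else_part, it yields the left-factored grammar above.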
Exercise

Compute FIRSTs:
start → stmt $$
stmt → if b then stmt else_part | a
else_part → else stmt | ε
FIRST(stmt $$), FIRST(if b then stmt else_part),
FIRST(a), FIRST(else stmt)

Compute FOLLOW:
FOLLOW(else_part)


Compute PREDICT sets for all 5 productions
Construct the LL(1) parsing table. Is this grammar
an LL(1) grammar?
Lecture Outline

Top-down (also called LL) Parsing (continued)



LL(1) parsing table, FIRST, FOLLOW and
PREDICT sets
Writing an LL(1) grammar
Bottom-up (also called LR) Parsing

Model of the bottom-up (LR) parser
Bottom-up Parsing
list → id list_tail
list_tail → , id list_tail | ;
Terminals are seen in the
order of appearance in the
token stream
id , id , id ;

Parse tree:
list
  id
  list_tail
    ,
    id
    list_tail
      ,
      id
      list_tail
        ;

Parse tree is constructed
  From the leaves to the top
  A right-most derivation in reverse
Bottom-up Parsing
list → id list_tail
list_tail → , id list_tail | ;

Stack         Input         Action
              id,id,id;     shift
id            ,id,id;       shift
id,           id,id;        shift
id,id         ,id;          shift
id,id,        id;           shift
id,id,id      ;             shift
id,id,id;                   reduce by list_tail → ;
Bottom-up Parsing
list → id list_tail
list_tail → , id list_tail | ;

Stack                 Input   Action
id,id,id list_tail            reduce by list_tail → ,id list_tail
id,id list_tail               reduce by list_tail → ,id list_tail
id list_tail                  reduce by list → id list_tail
list                          ACCEPT
Bottom-up Parsing


Also called LR parsing
LR parsing is better than LL parsing
  Accepts larger class of languages
  Just as efficient!
LR parsers work with LR(k) grammars
  L stands for “left-to-right” scan of input
  R stands for “rightmost” derivation
  k stands for “need k tokens of lookahead”
We are interested in LR(0) and LR(1) and variants
in between
LR Parsing

The parsing method used in practice




LR parsers recognize virtually all PL constructs
LR parsers recognize a much larger set of grammars
than predictive parsers
LR parsing is efficient
LR parsing variants




SLR (or Simple LR)
LALR (or Lookahead LR) – yacc/bison generate LALR
parsers
LR (Canonical LR)
SLR < LALR < LR
Main Idea

Stack | Input

Stack: holds the part of the input seen so far
  A string of both terminals and nonterminals
Input: holds the remaining part of the input
  A string of terminals
Parser performs two actions
  Reduce: parser pops a “suitable” production
  right-hand-side off the stack, and pushes the
  production left-hand-side on the stack
  Shift: parser pushes next terminal from the input
  on top of the stack
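The two actions can be illustrated with a small Python sketch that replays a given sequence of shifts and reductions. It does not decide which action to take (choosing the action is the job of the LR parsing table); the action sequence is copied from the id , id , id ; trace in the Bottom-up Parsing slides. The names replay, SHIFT and ACTIONS and the tuple encoding of actions are hypothetical.

```python
def replay(tokens, actions):
    """Apply a given sequence of shift/reduce actions.
    Each action is ("shift",) or ("reduce", lhs, rhs). Returns the final stack
    and the (stack, input) configurations seen before each action."""
    stack, rest = [], list(tokens)
    configs = []
    for action in actions:
        configs.append((list(stack), list(rest)))
        if action[0] == "shift":
            if not rest:
                raise ValueError("shift with empty input")
            stack.append(rest.pop(0))          # Shift: next terminal goes on top
        else:
            _, lhs, rhs = action
            cut = len(stack) - len(rhs)
            if cut < 0 or stack[cut:] != rhs:
                raise ValueError(f"{' '.join(rhs)} is not on top of the stack")
            del stack[cut:]                    # Reduce: pop the right-hand side...
            stack.append(lhs)                  # ...and push the left-hand side
    return stack, configs


# list → id list_tail,  list_tail → , id list_tail | ;
TOKENS = ["id", ",", "id", ",", "id", ";"]
SHIFT = ("shift",)
ACTIONS = [SHIFT] * 6 + [
    ("reduce", "list_tail", [";"]),
    ("reduce", "list_tail", [",", "id", "list_tail"]),
    ("reduce", "list_tail", [",", "id", "list_tail"]),
    ("reduce", "list", ["id", "list_tail"]),
]

final_stack, configs = replay(TOKENS, ACTIONS)
for (stack, rest), action in zip(configs, ACTIONS):
    print(f"{' '.join(stack):<22}{' '.join(rest):<14}{action[0]}")
print(" ".join(final_stack))
```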
Example

Recall the grammar
expr → expr + term | term
term → term * id | id



This is not LL(1) because it is left recursive
LR parsers can handle left recursion!
Consider string
id + id * id
id + id*id
expr → expr + term | term
term → term * id | id

Stack       Input       Action
            id+id*id    shift id
id          +id*id      reduce by term → id
term        +id*id      reduce by expr → term
expr        +id*id      shift +
expr+       id*id       shift id
expr+id     *id         reduce by term → id
expr → expr + term | term
term → term * id | id
id + id*id

Stack           Input   Action
expr+term       *id     shift *
expr+term*      id      shift id
expr+term*id            reduce by term → term * id
expr+term               reduce by expr → expr + term
expr                    ACCEPT, SUCCESS
id + id*id
expr → expr + term | term
term → term * id | id
Sequence of reductions performed by parser
id+id*id
term+id*id
expr+id*id
expr+term*id
expr+term
expr
• A right-most derivation in reverse
• The stack (e.g., expr) concatenated with remaining
  input (e.g., +id*id) gives a sentential form (expr+id*id)
  in the right-most derivation
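The claim that the reductions form a right-most derivation in reverse can be checked mechanically. This Python sketch (hypothetical names GRAMMAR, REDUCTIONS and is_rightmost_step) reads the sequence of sentential forms above backwards and tests that each step rewrites the rightmost nonterminal with one of its productions.

```python
GRAMMAR = {
    "expr": [["expr", "+", "term"], ["term"]],
    "term": [["term", "*", "id"], ["id"]],
}

# Sentential forms from the sequence of reductions, one symbol per list item.
REDUCTIONS = [
    ["id", "+", "id", "*", "id"],
    ["term", "+", "id", "*", "id"],
    ["expr", "+", "id", "*", "id"],
    ["expr", "+", "term", "*", "id"],
    ["expr", "+", "term"],
    ["expr"],
]


def is_rightmost_step(before, after):
    """True if `after` results from `before` by rewriting its rightmost nonterminal."""
    positions = [i for i, sym in enumerate(before) if sym in GRAMMAR]
    if not positions:
        return False
    i = positions[-1]                              # the rightmost nonterminal
    return any(before[:i] + rhs + before[i + 1:] == after for rhs in GRAMMAR[before[i]])


derivation = list(reversed(REDUCTIONS))            # read the reductions backwards
for before, after in zip(derivation, derivation[1:]):
    print(" ".join(before), "=>", " ".join(after), is_rightmost_step(before, after))
```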
expr → expr + term | term
term → term * id | id
Handle

A handle
  Formally, if we have a right-most derivation
  S ⇒ … ⇒ αAw ⇒ αβw, then we say that
  A → β at position α is a handle of αβw
  Notation: S and A are nonterminals, w is a
  sequence of terminals, α and β are arbitrary
  sequences (of both terminals and nonterminals)
Recall our example id+id*id
Stack           Input
expr+term       *id
expr+term*id
Is expr → expr+term a handle?
Is term → id a handle?
Model of an LR parser
[Figure: the LR parsing program reads the input a1 … ai … an $$,
maintains a stack of states and grammar symbols sm Xm sm-1 Xm-1 … s0
(sm on top), and consults a parsing table with two parts, action and goto]
action[s,a]: Do we shift or reduce?
goto[s,A]: After reduction to nonterminal A, what state is pushed
on top of the stack?
Model of an LR parser


Stack is (s0,X1,s1,…Xm,sm), input pointer at ai
action[sm,ai] is shift s


Push ai and state s on stack:
(s0,X1,s1,…Xm,sm,ai,s)
action[sm,ai] is reduce by A → β


Pop β (i.e., pop 2*|β| things off the stack - all
symbols in β plus all their corresponding states):
(s0,X1,s1,…Xm-|β|,sm-|β|)
Push A and goto[sm-|β|,A]=s on top of the stack:
(s0,X1,s1,…Xm-|β|,sm-|β|,A,s)
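LR Parsing Table
1. expr → expr + term
2. expr → term
3. term → term * id
4. term → id
Action table: columns id, +, *, $$
Goto table: columns expr, term
(sN: shift and go to state N; rN: reduce by production N; acc: accept)

state | id   +    *    $$  | expr  term
0     | s3                 | 1     2
1     |      s4        acc |
2     |      r2   s5   r2  |
3     |      r4   r4   r4  |
4     | s3                 |       6
5     | s7                 |
6     |      r1   s5   r1  |
7     |      r3   r3   r3  |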
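Putting the model and the table together, here is a minimal Python sketch of the LR driver loop with the table above encoded by hand. It assumes tokens arrive as a list ending in $$ and encodes actions as ("s", state), ("r", production) and ("acc",); these encodings and the names ACTION, GOTO, PRODUCTIONS and lr_parse are illustrative. The stack alternates states and symbols as in (s0,X1,s1,…Xm,sm), so a reduction by A → β pops 2*|β| entries.

```python
# Productions numbered as on the slide: number -> (lhs, |rhs|)
PRODUCTIONS = {
    1: ("expr", 3),   # expr → expr + term
    2: ("expr", 1),   # expr → term
    3: ("term", 3),   # term → term * id
    4: ("term", 1),   # term → id
}

ACTION = {
    (0, "id"): ("s", 3),
    (1, "+"): ("s", 4), (1, "$$"): ("acc",),
    (2, "+"): ("r", 2), (2, "*"): ("s", 5), (2, "$$"): ("r", 2),
    (3, "+"): ("r", 4), (3, "*"): ("r", 4), (3, "$$"): ("r", 4),
    (4, "id"): ("s", 3),
    (5, "id"): ("s", 7),
    (6, "+"): ("r", 1), (6, "*"): ("s", 5), (6, "$$"): ("r", 1),
    (7, "+"): ("r", 3), (7, "*"): ("r", 3), (7, "$$"): ("r", 3),
}
GOTO = {(0, "expr"): 1, (0, "term"): 2, (4, "term"): 6}


def lr_parse(tokens):
    """Return the production numbers used in reductions, in order."""
    stack = [0]                        # s0, X1, s1, ..., Xm, sm
    pos = 0
    reductions = []
    while True:
        state, a = stack[-1], tokens[pos]
        act = ACTION.get((state, a))
        if act is None:
            raise SyntaxError(f"unexpected {a!r} in state {state}")
        if act[0] == "s":              # shift: push ai and the new state
            stack += [a, act[1]]
            pos += 1
        elif act[0] == "r":            # reduce by A → β
            lhs, n = PRODUCTIONS[act[1]]
            del stack[len(stack) - 2 * n:]           # pop β and its states
            stack += [lhs, GOTO[(stack[-1], lhs)]]   # push A and goto[s, A]
            reductions.append(act[1])
        else:                          # accept
            return reductions


print(lr_parse(["id", "+", "id", "*", "id", "$$"]))
```

On id + id * id the reductions come out as 4, 2, 4, 3, 1, the same order as in the hand trace: term → id, expr → term, term → id, term → term * id, expr → expr + term.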
Summary

Top-down (also called LL) Parsing (continued)



LL(1) parsing table and predict sets
Writing an LL(1) grammar
Bottom-up (also called LR) Parsing


Model of the bottom-up (LR) parser
LR parsing table
Next Class

We will continue with Bottom-up Parsing.
Keep reading Chapter 2.3.3