CPSC 388 – Compiler Design and Construction

Implementing a Parser
LL(1) and LALR Grammars
FBI Noon Dining Hall Vicki Anderson Recruiter
Announcements
 PROG 3 out, due Oct 9th
 Get started NOW!
 HW due Friday
 HW6 posted, due next Friday
Parsing using CFGs
 Algorithms can parse using CFGs in O(n^3) time (n is the
number of characters in input stream) – TOO SLOW
 Subclasses of grammars can be parsed in O(n) time
 LL(1)
1 token of look ahead
Do a left most derivation
Scan input from left to right
 LALR(1)
one token of look-ahead
do a rightmost derivation in reverse
scan the input left-to-right
LA means “look-ahead”
(nothing to do with the number of tokens)
LALR(1)
 More general than LL(1) grammars
(Every LL(1) grammar is an LALR(1) grammar but
not vice versa)
 Class of grammars used by java_cup,
Bison, YACC
 Parsed bottom up
(start with terminals and build tree from leaves
up to root)
 Covered in text section 4.6-4.7
 For class need to understand details of just
LL(1) grammars
LL(1) Grammars – Predictive Parsers
 “build” parse tree top-down
actually discover tree top-down, don’t
actually build it
 Keep track of work to be done using a stack
 Scanned tokens along with stack
correspond to leaves of incomplete tree
 Use parse table to decide how to parse
input
 Rows are non-terminals
 Columns are tokens (plus EOF token)
 Cells are the bodies of production rules
Predictive Parser Algorithm
s.push(EOF)                  // special EOF terminal
s.push(start)                // start is start non-terminal
x = s.peek()
t = scanner.next_token()
while (x != EOF):
    if x == t:
        s.pop()
        t = scanner.next_token()
    else if x is terminal: error
    else if table[x][t] == empty: error
    else:
        let body = table[x][t]   // body of production
        output x → body
        s.pop()
        s.push(…)                // push body from right to left
    x = s.peek()
Example Parse using algorithm
 Consider the language of balanced
parentheses and brackets, e.g. ([])
 Input String is “([])EOF”
 Grammar:
S→ε|(S)|[S]
 Parse Table:
      (      )    [      ]    EOF
S     (S)    ε    [S]    ε    ε
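The predictive parser algorithm can be sketched in Python against this one-row table. This is a minimal sketch, not the course's reference implementation: the names `TABLE`, `NONTERMINALS`, and `parse` are illustrative, tokens are assumed to be single characters, and the final check that the input is exhausted is an addition that the pseudocode leaves implicit.

```python
EOF = "EOF"

# Parse table for S → ε | (S) | [S]; each body is a list of grammar symbols.
TABLE = {
    "S": {
        "(": ["(", "S", ")"],
        ")": [],
        "[": ["[", "S", "]"],
        "]": [],
        EOF: [],
    }
}
NONTERMINALS = set(TABLE)


def parse(tokens, start="S"):
    """Return the list of productions used, or raise SyntaxError."""
    stream = iter(list(tokens) + [EOF])
    stack = [EOF, start]          # top of stack is the end of the list
    output = []
    t = next(stream)
    x = stack[-1]
    while x != EOF:
        if x == t:                # terminal on stack matches current token
            stack.pop()
            t = next(stream)
        elif x not in NONTERMINALS:
            raise SyntaxError(f"expected {x!r}, got {t!r}")
        elif t not in TABLE[x]:
            raise SyntaxError(f"no rule for {x} on {t!r}")
        else:
            body = TABLE[x][t]
            output.append((x, tuple(body)))
            stack.pop()
            stack.extend(reversed(body))   # push body from right to left
        x = stack[-1]
    if t != EOF:                  # added check: leftover input is an error
        raise SyntaxError(f"unexpected trailing token {t!r}")
    return output
```

Calling `parse("([])")` records the three productions S→(S), S→[S], S→ε in the order a leftmost derivation applies them.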
Not All Grammars LL(1)
 Not all Grammars are LL(1):
S→(S)|[S]|()|[]
 If input is ( don’t know which rule to
use!
 Try parsing the input “[[]]” with the LL(1) grammar
using the predictive parser
 Draw input seen so far
 Stack
 Action taken
Is Grammar LL(1)
 Given a grammar how do you tell if it
is LL(1)?
 How to build the parse table?
 If the parse table can be built with at most one
entry per cell, then the grammar is LL(1)
Non-LL(1) Grammars
 A grammar is not LL(1) if it is left-recursive
 A grammar is not LL(1) if it is not left-factored
 It is sometimes possible to change a
grammar to remove left-recursion
and to make it left-factored
Left-Recursion
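 Grammar g is recursive if there exists
a non-terminal x such that:
x ⇒+ α x β   (recursive)
x ⇒+ x β     (left recursive)
x ⇒+ α x     (right recursive)
 ⇒+ means "derives in one or more steps"; α and β are
sequences of terminals and/or nonterminals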
Removing Immediate Left-Recursion
 Consider the grammar
A → Aα | β
 A is a nonterminal
 α is a sequence of terminals and/or nonterminals
 β is a sequence of terminals and/or nonterminals
not starting with A
 Replace production with
A → β A’
A’ → α A’ | ε
 The two grammars are equivalent (they recognize the
same set of input strings)
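The replacement rule can be sketched as a small Python function. This is a sketch under assumptions: a body is represented as a list of symbols with `[]` standing for ε, the function name is hypothetical, the new nonterminal is named by appending `'` to the head, and only immediate left recursion with non-empty α is handled.

```python
def remove_immediate_left_recursion(head, bodies):
    """Rewrite A → Aα | β as A → β A', A' → α A' | ε.

    bodies is a list of production bodies; each body is a list of symbols.
    """
    # Bodies that start with the head contribute an α (the rest of the body).
    alphas = [b[1:] for b in bodies if b and b[0] == head]
    # All other bodies are the βs.
    betas = [b for b in bodies if not b or b[0] != head]
    if not alphas:
        return {head: bodies}          # nothing to do
    new = head + "'"
    return {
        head: [beta + [new] for beta in betas],
        new: [alpha + [new] for alpha in alphas] + [[]],   # [] is ε
    }


result = remove_immediate_left_recursion("A", [["A", "a"], ["b"]])
print(result)
```

For the generic grammar A → A a | b this yields A → b A' and A' → a A' | ε, matching the pattern on the slide.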
You Try it
 Remove left recursion from the grammar:
exp → exp - factor | factor
factor → INTLITERAL | ( exp )
 Construct parse tree using original
grammar and new grammar using input “53-2”
 In general more difficult than this to
remove left recursion, see text 4.3.3
Left Factored
 A grammar is NOT left-factored if a
non-terminal has two productions
whose bodies have common prefixes
exp → ( exp ) | ( )
 A top-down predictive parser would
not know which production rule to
use when seeing input character of
“(”
Left Factoring
 Given a pair of productions:
A → α β1 | α β2
 α is a sequence of terminals and non-terminals
 β1 and β2 are sequences of terminals and nonterminals that don’t share a common prefix (either may
be ε)
 Change to:
A → α A’
A’ → β1 | β2
Left Factoring Example
 So for grammar
exp → ( exp ) | ( )
 It becomes
exp → ( exp’
exp’ → exp ) | )
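One left-factoring pass can be sketched in Python as follows. The representation (bodies as lists of symbols, `[]` for ε) and the names `common_prefix` and `left_factor` are assumptions made for illustration. The sketch groups bodies by their first symbol and factors each group once; the remainders it produces may themselves need another pass.

```python
def common_prefix(seqs):
    """Longest prefix shared by every sequence in seqs."""
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor(head, bodies):
    """One left-factoring pass over the productions of `head`."""
    groups = {}
    for body in bodies:
        key = body[0] if body else None      # ε bodies form their own group
        groups.setdefault(key, []).append(body)

    result = {head: []}
    fresh = head
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[head].extend(group)       # no shared prefix to factor
            continue
        alpha = common_prefix(group)
        fresh += "'"                         # exp', exp'', ...
        result[head].append(alpha + [fresh])
        result[fresh] = [body[len(alpha):] for body in group]
    return result


print(left_factor("exp", [["(", "exp", ")"], ["(", ")"]]))
```

On exp → ( exp ) | ( ) this produces exp → ( exp' and exp' → exp ) | ), the same result as the worked example.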
You Try It
 Remove left recursion and do left
factoring for grammar
exp → ( exp ) | exp exp | ( )
Building Parse Tables
 Recall a parse table
 Every row is a non-terminal
 Every column is an input token
 Every cell contains a production body
 If any cell contains more than one
production body then grammar is not
LL(1)
 To build parse table need to have
FIRST set and FOLLOW set
FIRST set
 FIRST(α)
α is some sequence of terminals and nonterminals
FIRST(α) is set of terminals that begin the
strings derivable from α
if α can derive ε, then ε is in FIRST(α)
FIRST(α) = { t | (t is a terminal and α ⇒* tβ) or (t = ε and α ⇒* ε) }
FIRST(X)
X is a single terminal, non-terminal or ε
FIRST(X)={X} //X is terminal
FIRST(X)={ε} //X is ε
FIRST(X)=… //X is non-terminal
 Look at all production rules with X as head
 For each production rule, X → Y1 Y2 … Yn
 Put FIRST(Y1) - {ε} into FIRST(X).
 If ε is in FIRST(Y1), then put FIRST(Y2) - {ε} into
FIRST(X).
 If ε is in FIRST(Y2), then put FIRST(Y3) - {ε} into
FIRST(X).
 etc...
 If ε is in FIRST(Yi) for all 1 <= i <= n (every symbol on the production's right-hand side), then put ε into FIRST(X).
Example FIRST Sets
 Compute FIRST sets for each nonterminal:
Production                        FIRST of head
exp → term exp’                   { INTLITERAL, ( }
exp’ → - term exp’ | ε            { -, ε }
term → factor term’               { INTLITERAL, ( }
term’ → / factor term’ | ε        { /, ε }
factor → INTLITERAL | ( exp )     { INTLITERAL, ( }
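The FIRST rules for non-terminals can be sketched in Python on this expression grammar. Two things here are assumptions rather than something the slides state: the grammar encoding (a dict from head to a list of bodies, `[]` for ε) and the repeat-until-nothing-changes loop, which is one common way to handle non-terminals whose FIRST sets depend on each other.

```python
EPS = "ε"

GRAMMAR = {
    "exp": [["term", "exp'"]],
    "exp'": [["-", "term", "exp'"], []],
    "term": [["factor", "term'"]],
    "term'": [["/", "factor", "term'"], []],
    "factor": [["INTLITERAL"], ["(", "exp", ")"]],
}


def first_sets(grammar):
    """Compute FIRST(X) for every non-terminal X by iterating to a fixed point."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                all_nullable = True
                for sym in body:
                    # FIRST of a terminal is the terminal itself.
                    sym_first = first[sym] if sym in grammar else {sym}
                    first[head] |= sym_first - {EPS}
                    if EPS not in sym_first:
                        all_nullable = False
                        break
                if all_nullable:          # every Yi can derive ε (or body is ε)
                    first[head].add(EPS)
                if len(first[head]) != before:
                    changed = True
    return first


for nonterminal, symbols in first_sets(GRAMMAR).items():
    print(nonterminal, sorted(symbols))
```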
FIRST(α) for any α
 α is of the form X1 X2 … Xn
 Where each Xi is a terminal, non-terminal or ε
1. Put FIRST(X1) - {ε} into FIRST(α)
2. If ε is in FIRST(X1) put
FIRST(X2) - {ε} into FIRST(α).
3. etc...
4. If ε is in the FIRST set for every Xi,
put ε into FIRST(α).
Example FIRST sets for rules
FIRST( term exp' )      = { INTLITERAL, ( }
FIRST( - term exp' )    = { - }
FIRST( ε )              = { ε }
FIRST( factor term' )   = { INTLITERAL, ( }
FIRST( / factor term' ) = { / }
FIRST( ε )              = { ε }
FIRST( INTLITERAL )     = { INTLITERAL }
FIRST( ( exp ) )        = { ( }
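The procedure for FIRST(α) on a sequence of symbols can be sketched in Python. The `FIRST` dictionary below hard-codes the non-terminal FIRST sets for the expression grammar, and the function name `first_of_sequence` is hypothetical. Any symbol missing from the dictionary is assumed to be a terminal, and the empty list stands for ε.

```python
EPS = "ε"

# FIRST sets of the non-terminals in the expression grammar.
FIRST = {
    "exp": {"INTLITERAL", "("},
    "exp'": {"-", EPS},
    "term": {"INTLITERAL", "("},
    "term'": {"/", EPS},
    "factor": {"INTLITERAL", "("},
}


def first_of_sequence(symbols, first=FIRST):
    """FIRST(X1 X2 ... Xn); an empty list of symbols means ε."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})   # terminal: FIRST is the symbol itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result                   # this Xi cannot vanish, so stop here
    result.add(EPS)                         # every Xi can derive ε
    return result


for body in (["term", "exp'"], ["-", "term", "exp'"], [], ["(", "exp", ")"]):
    print(body, sorted(first_of_sequence(body)))
```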
Why Do We Care about FIRST(α)?
 During parsing, suppose the top-of-stack
symbol is nonterminal A, that there are two
productions:
 A→α
 A→β
 And that the current token is x
 If x is in FIRST(α) then use first production
 If x is in FIRST(β) then use second
production
FOLLOW(A) sets
 Only defined for single
non-terminals, A
 the set of terminals that can appear
immediately to the right of A (may
include EOF but never ε)
Calculating FOLLOW(A)
 If A is start non-terminal put EOF in
FOLLOW(A)
 Find productions with A in body:
 For each production X → α A β
 put FIRST(β) – {ε} in FOLLOW(A)
 If ε in FIRST(β) put FOLLOW(X) into
FOLLOW(A)
 For each production X → α A
 put FOLLOW(X) into FOLLOW(A)
FIRST and FOLLOW sets
 To compute FIRST(A) you must look for A
on a production's left-hand side.
 To compute FOLLOW(A) you must look for A
on a production's right-hand side.
 FIRST and FOLLOW sets are always sets of
terminals (plus, perhaps, ε for FIRST sets,
and EOF for follow sets).
 Nonterminals are never in a FIRST or a
FOLLOW set.
Example FOLLOW sets
CAPS are non-terminals and lower-case are terminals
S → Bc | DB
B → ab | cS
D → d | ε
X    FIRST(X)       FOLLOW(X)
------------------------------------------
D    { d, ε }       { a, c }
B    { a, c }       { c, EOF }
S    { a, c, d }    { EOF, c }
Note: FOLLOW of S always includes EOF
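The FOLLOW rules can be sketched in Python on this example grammar. The FIRST sets are hard-coded from the table above. The grammar encoding, the helper names, and the repeat-until-stable loop are assumptions made for illustration; the loop is needed because FOLLOW(B) and FOLLOW(S) depend on each other here.

```python
EPS, EOF = "ε", "EOF"

GRAMMAR = {
    "S": [["B", "c"], ["D", "B"]],
    "B": [["a", "b"], ["c", "S"]],
    "D": [["d"], []],                 # [] is the ε body
}
FIRST = {"S": {"a", "c", "d"}, "B": {"a", "c"}, "D": {"d", EPS}}


def first_of_sequence(symbols):
    """FIRST of a sequence; symbols not in FIRST are treated as terminals."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}


def follow_sets(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(EOF)            # the start non-terminal is followed by EOF
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue      # FOLLOW is only defined for non-terminals
                    beta_first = first_of_sequence(body[i + 1:])
                    new = beta_first - {EPS}
                    if EPS in beta_first:      # β is empty or can derive ε
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


for nonterminal, symbols in follow_sets(GRAMMAR, "S").items():
    print(nonterminal, sorted(symbols))
```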
You Try It
 Compute FIRST and FOLLOW sets
for:
methodHeader → VOID ID LPAREN paramList RPAREN
paramList → epsilon
paramList → nonEmptyParamList
nonEmptyParamList → ID ID
nonEmptyParamList → ID ID COMMA nonEmptyParamList
 Remember you need FIRST and FOLLOW
sets for all non-terminals and FIRST sets
for all bodies of rules
Parse Table
 Columns: current token (a, b, c, d)
 Rows: non-terminals (S, A, X, R)
 Cells: rule bodies
Parse Table Construction Algorithm
for each production X → α:
    for each terminal t in First(α):
        put α in Table[X,t]
    if ε is in First(α) then:
        for each terminal t in Follow(X):
            put α in Table[X,t]
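The construction algorithm can be sketched in Python using the balanced-bracket grammar S → ε | (S) | [S] from the earlier parsing example. The FIRST and FOLLOW sets below were worked out by hand for that grammar, and the names `build_table` and `is_ll1` are hypothetical. Each cell holds a list of bodies so that a conflict (more than one body in a cell) is visible rather than silently overwritten.

```python
EPS, EOF = "ε", "EOF"

GRAMMAR = {"S": [[], ["(", "S", ")"], ["[", "S", "]"]]}   # [] is the ε body
FIRST = {"S": {"(", "[", EPS}}
FOLLOW = {"S": {")", "]", EOF}}


def first_of_sequence(symbols):
    """FIRST of a rule body; symbols not in FIRST are treated as terminals."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}


def build_table(grammar):
    """Map (non-terminal, token) to the list of bodies placed in that cell."""
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            body_first = first_of_sequence(body)
            targets = body_first - {EPS}
            if EPS in body_first:          # body can vanish: use FOLLOW(head)
                targets |= FOLLOW[head]
            for t in targets:
                table.setdefault((head, t), []).append(body)
    return table


def is_ll1(table):
    """LL(1) means no cell received more than one body."""
    return all(len(bodies) == 1 for bodies in table.values())


table = build_table(GRAMMAR)
for (head, token), bodies in sorted(table.items()):
    print(head, token, bodies)
print("LL(1):", is_ll1(table))
```

The five cells produced agree with the table used in the balanced-bracket parsing example, and no cell has two entries.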
Example Parse Table Construction
S→Bc|DB
B→ab|cS
D→d|ε
For this grammar:
 Construct FIRST and FOLLOW Sets
 Apply algorithm to calculate parse
table
Example Parse Table Construction
X     FIRST(X)       FOLLOW(X)
--------------------------------------------------
D     { d, ε }       { a, c }
B     { a, c }       { c, EOF }
S     { a, c, d }    { EOF, c }
Bc    { a, c }
DB    { d, a, c }
ab    { a }
cS    { c }
d     { d }
ε     { ε }
Parse Table
      a         b     c         d     EOF
S     Bc, DB          Bc, DB    DB
B
D     ε               ε
Finish Filling In Table
Predictive Parser Algorithm
s.push(EOF)                  // special EOF terminal
s.push(start)                // start is start non-terminal
x = s.peek()
t = scanner.next_token()
while (x != EOF):
    if x == t:
        s.pop()
        t = scanner.next_token()
    else if x is terminal: error
    else if table[x][t] == empty: error
    else:
        let body = table[x][t]   // body of production
        output x → body
        s.pop()
        s.push(…)                // push body from right to left
    x = s.peek()