CIS 461
Compiler Design & Construction
Fall 2012
slides derived from Tevfik Bultan, Keith Cooper, and Linda Torczon
Lecture-Module #12
Parsing 4
Parsing Techniques
Top-down parsers (LL(1), recursive descent)
• Start at the root of the parse tree from the start symbol and grow toward the leaves (similar to a derivation)
• Pick a production and try to match the input
• A bad "pick" may require backtracking
• Some grammars are backtrack-free (predictive parsing)
Bottom-up parsers (LR(1), operator precedence)
• Start at the leaves and grow toward the root
• We can think of the process as reducing the input string to the start symbol
• At each reduction step, a particular substring matching the right side of a production is replaced by the symbol on the left side of that production
• Bottom-up parsers handle a large class of grammars (every LL(1) grammar is also LR(1))
Top-down Parsing
(Figure: the parse tree grows down from the start symbol S at the root. Labels: fringe of the parse tree, left-to-right scan, left-most derivation, lookahead.)
Bottom-up Parsing
(Figure: the parse tree grows up from the input string toward S. Labels: upper fringe of the parse tree, right-most derivation in reverse, lookahead.)
Handle-pruning, Bottom-up Parsers
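The process of discovering a handle & reducing it to the appropriate left-hand side is called handle pruning
Handle pruning forms the basis for a bottom-up parsing method
To construct a rightmost derivation
$S = \gamma_0 \Rightarrow \gamma_1 \Rightarrow \gamma_2 \Rightarrow \cdots \Rightarrow \gamma_{n-1} \Rightarrow \gamma_n = w$
apply the following simple algorithm
for i ← n to 1 by -1
    Find the handle $\langle A_i \to \beta_i, k_i \rangle$ in $\gamma_i$
    Replace $\beta_i$ with $A_i$ to generate $\gamma_{i-1}$
Here $k_i$ is the position of the right end of $\beta_i$ within $\gamma_i$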
Example
The expression grammar
1  S      → Expr
2  Expr   → Expr + Term
3         | Expr – Term
4         | Term
5  Term   → Term * Factor
6         | Term / Factor
7         | Factor
8  Factor → num
9         | id
Handles for the rightmost derivation of the input string x – 2 * y
Sentential Form                Handle (Prod'n, Pos'n)
S                              -
Expr                           1,1
Expr – Term                    3,3
Expr – Term * Factor           5,5
Expr – Term * <id,y>           9,5
Expr – Factor * <id,y>         7,3
Expr – <num,2> * <id,y>        8,3
Term – <num,2> * <id,y>        4,1
Factor – <num,2> * <id,y>      7,1
<id,x> – <num,2> * <id,y>      9,1
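The pruning loop can be replayed on this example in a few lines of Python. This is a minimal sketch under stated assumptions. The handles are taken from the table rather than discovered, and discovering them is the parser's real job. Each position counts symbols from 1 and marks the handle's right end, token values such as <id,x> are dropped, and ASCII "-" stands in for "–". The names PRODUCTIONS, HANDLES, and prune are illustrative only.

```python
# Productions of the expression grammar, numbered as on the slide
# (ASCII "-" stands in for the slide's "–").
PRODUCTIONS = {
    1: ("S", ["Expr"]),
    2: ("Expr", ["Expr", "+", "Term"]),
    3: ("Expr", ["Expr", "-", "Term"]),
    4: ("Expr", ["Term"]),
    5: ("Term", ["Term", "*", "Factor"]),
    6: ("Term", ["Term", "/", "Factor"]),
    7: ("Term", ["Factor"]),
    8: ("Factor", ["num"]),
    9: ("Factor", ["id"]),
}


def prune(sentence, handles):
    """Replay handle pruning: for each given handle <production, position>,
    replace the handle's rhs with its lhs. Position is the 1-based index of
    the handle's right end. Returns every sentential form, sentence first."""
    form = list(sentence)
    forms = [list(form)]
    for prod, pos in handles:
        lhs, rhs = PRODUCTIONS[prod]
        start = pos - len(rhs)                   # 0-based index of the left end
        if start < 0 or form[start:pos] != rhs:
            raise ValueError(f"<{prod},{pos}> is not a handle of {form}")
        form[start:pos] = [lhs]                  # replace beta_i with A_i
        forms.append(list(form))
    return forms


# Handles from the table, read bottom-up (i = n down to 1).
HANDLES = [(9, 1), (7, 1), (4, 1), (8, 3), (7, 3), (9, 5), (5, 5), (3, 3), (1, 1)]
for form in reversed(prune(["id", "-", "num", "*", "id"], HANDLES)):
    print(" ".join(form))
```

Printed in reverse order, the forms read S, Expr, Expr - Term, and so on down to id - num * id. This is the table's left column without the token values. A wrong handle raises ValueError instead of silently rewriting the form.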
Handle-pruning, Bottom-up Parsers
One implementation technique is the shift-reduce parser
push $
lookahead = get_next_token( )
repeat until (top of stack == start symbol and lookahead == $)
    if the top of the stack is a handle A → β
        then /* reduce β to A */
            pop |β| symbols off the stack
            push A onto the stack
    else if (lookahead ≠ $)
        then /* shift */
            push lookahead
            lookahead = get_next_token( )
    else /* need to shift, but out of input */
        error( )
How do errors show up?
• failure to find a handle
• hitting $ and needing to shift (final else clause)
Either generates an error
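The following minimal Python sketch runs this loop on the expression grammar. The slide leaves "is the top of the stack a handle?" abstract. Here, a hypothetical find_handle answers that question with hand-written, grammar-specific rules plus one token of lookahead: it shifts when Term is followed by * or /, following the lookahead rule discussed later in this lecture. This is an illustration only. LR parsers replace such a function with a handle-recognizing DFA. Token names and ParseError are assumptions of the sketch.

```python
class ParseError(Exception):
    pass


def find_handle(stack, lookahead):
    """Hypothetical, grammar-specific handle test for the expression grammar.
    Returns (production number, lhs, |rhs|) when the top of the stack is the
    right end of a handle, otherwise None."""
    top = stack[-1]
    if top == "id":
        return (9, "Factor", 1)
    if top == "num":
        return (8, "Factor", 1)
    if top == "Factor":
        if stack[-3:] == ["Term", "*", "Factor"]:
            return (5, "Term", 3)
        if stack[-3:] == ["Term", "/", "Factor"]:
            return (6, "Term", 3)
        return (7, "Term", 1)
    if top == "Term":
        if lookahead in ("*", "/"):
            return None                    # Term is not yet complete: shift
        if stack[-3:] == ["Expr", "+", "Term"]:
            return (2, "Expr", 3)
        if stack[-3:] == ["Expr", "-", "Term"]:
            return (3, "Expr", 3)
        return (4, "Expr", 1)
    if top == "Expr" and lookahead == "$" and stack == ["$", "Expr"]:
        return (1, "S", 1)
    return None


def shift_reduce(tokens):
    """The slide's shift-reduce loop; tokens must end with '$'.
    Returns (number of shifts, list of productions used to reduce)."""
    stack = ["$"]                                  # push $
    stream = iter(tokens)
    lookahead = next(stream)
    shifts, reductions = 0, []
    while not (stack[-1] == "S" and lookahead == "$"):
        handle = find_handle(stack, lookahead)
        if handle is not None:                     # reduce beta to A
            prod, lhs, size = handle
            del stack[-size:]                      # pop |beta| symbols
            stack.append(lhs)                      # push A
            reductions.append(prod)
        elif lookahead != "$":                     # shift
            stack.append(lookahead)
            lookahead = next(stream)
            shifts += 1
        else:                                      # need to shift, out of input
            raise ParseError(f"no handle on stack {stack} at end of input")
    return shifts, reductions


shifts, reductions = shift_reduce(["id", "-", "num", "*", "id", "$"])
print(shifts, reductions)   # 5 [9, 7, 4, 8, 7, 9, 5, 3, 1]
```

On x – 2 * y, the parser performs 5 shifts and then reduces by productions 9, 7, 4, 8, 7, 9, 5, 3, 1. This is the handle table read bottom-up. An input such as "id id" runs out of tokens with no handle on the stack and lands in the final else clause.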
Example, Corresponding Parse Tree
Parse tree for x – 2 * y (each child is indented one level below its parent)
S
  Expr
    Expr
      Term
        Factor
          <id,x>
    –
    Term
      Term
        Factor
          <num,2>
      *
      Factor
        <id,y>
1. Shift until top-of-stack is the right end of a handle
2. Find the left end of the handle & reduce
5 shifts + 9 reduces + 1 accept
Shift-reduce Parsing
Shift-reduce parsers are easily built and easily understood
A shift-reduce parser has just four actions
• Shift: next word is shifted onto the stack
• Reduce: right end of handle is at top of stack
    Locate left end of handle within the stack
    Pop handle off stack & push appropriate lhs
• Accept: stop parsing & report success
• Error: call an error reporting/recovery routine
Handle finding is key
• handle is on stack
• finite set of handles ⇒ use a DFA!
Accept & Error are simple
Shift is just a push and a call to the scanner
Reduce takes |rhs| pops & 1 push
If handle-finding requires state, put it in the stack
LR Parsers
• LR(k) parsers are table-driven, bottom-up, shift-reduce parsers that use a limited right context (k-token lookahead) for handle recognition
• LR(k): Left-to-right scan of the input, Rightmost derivation in reverse, with k tokens of lookahead
A grammar is LR(k) if, given a rightmost derivation
$S = \gamma_0 \Rightarrow \gamma_1 \Rightarrow \gamma_2 \Rightarrow \cdots \Rightarrow \gamma_{n-1} \Rightarrow \gamma_n = \text{sentence}$
we can
1. isolate the handle of each right-sentential form $\gamma_i$, and
2. determine the production by which to reduce,
by scanning $\gamma_i$ from left to right, going at most k symbols beyond the right end of the handle of $\gamma_i$
LR Parsers
A table-driven LR parser looks like this:
(Figure: source code → Scanner → Table-driven Parser → IR, with the parser keeping a Stack; grammar → Parser Generator → ACTION & GOTO Tables, which drive the parser.)
LR Shift-Reduce Parsers
push($);     // $ is the end-of-file symbol
push(s0);    // s0 is the start state of the DFA that recognizes handles
lookahead = get_next_token();
repeat forever
    s = top_of_stack();
    if ( ACTION[s,lookahead] == reduce A → β ) then
        pop 2*|β| symbols;    // each grammar symbol is paired with a state
        s = top_of_stack();
        push(A);
        push(GOTO[s,A]);
    else if ( ACTION[s,lookahead] == shift si ) then
        push(lookahead);
        push(si);
        lookahead = get_next_token();
    else if ( ACTION[s,lookahead] == accept and lookahead == $ )
        then return success;
    else error();
The skeleton parser
• uses ACTION & GOTO
• does |words| shifts
• does |derivation| reductions
• does 1 accept
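Below is a direct Python transcription of the skeleton, offered as a sketch. It hard-codes the ACTION and GOTO tables that this lecture gives for the grammar S → Z, Z → Z z | z. Table entries are Python tuples, and each character of the input string counts as one token. The names ACTION, GOTO, PRODUCTIONS, ParseError, and lr_parse are assumptions of the sketch.

```python
class ParseError(Exception):
    pass


# Tables for 1: S -> Z, 2: Z -> Z z, 3: Z -> z, copied from this lecture.
# Missing (state, lookahead) entries mean "error".
ACTION = {
    (0, "z"): ("shift", 2),
    (1, "$"): ("accept",),
    (1, "z"): ("shift", 3),
    (2, "$"): ("reduce", 3),
    (2, "z"): ("reduce", 3),
    (3, "$"): ("reduce", 2),
    (3, "z"): ("reduce", 2),
}
GOTO = {(0, "Z"): 1}
PRODUCTIONS = {1: ("S", 1), 2: ("Z", 2), 3: ("Z", 1)}   # number -> (lhs, |rhs|)


def lr_parse(text):
    """Skeleton LR driver; the stack alternates grammar symbols and states.
    Returns the list of actions taken."""
    tokens = iter(list(text) + ["$"])
    stack = ["$", 0]                             # push($); push(s0)
    lookahead = next(tokens)
    trace = []
    while True:                                  # repeat forever
        s = stack[-1]
        action = ACTION.get((s, lookahead))
        if action is None:
            raise ParseError(f"no action for state {s} on {lookahead!r}")
        if action[0] == "reduce":
            lhs, size = PRODUCTIONS[action[1]]
            del stack[-2 * size:]                # pop 2*|beta| entries
            s = stack[-1]
            stack.append(lhs)                    # push(A)
            stack.append(GOTO[(s, lhs)])         # push(GOTO[s, A])
            trace.append(f"reduce {action[1]}")
        elif action[0] == "shift":
            stack.append(lookahead)
            stack.append(action[1])
            lookahead = next(tokens)
            trace.append(f"shift {action[1]}")
        else:                                    # accept, only listed for $
            trace.append("accept")
            return trace


print(lr_parse("zz"))
# ['shift 2', 'reduce 3', 'shift 3', 'reduce 2', 'accept']
```

The action list matches the "zz" trace in this lecture. The reduction by S → Z never appears because ACTION[1, $] is accept. As a result, the driver performs one reduction fewer than the derivation S ⇒ Z ⇒ Z z ⇒ z z has steps.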
LR Parsers (parse tables)
To make a parser for L(G), we need a set of tables
The grammar
1  S → Z
2  Z → Z z
3    | z
The tables
ACTION
State   $          z
0       -          shift 2
1       accept     shift 3
2       reduce 3   reduce 3
3       reduce 2   reduce 2
GOTO
State   Z
0       1
1
2
3
Example Parses
The string “z”
Stack              Input   Action
$ s0               z$      shift 2
$ s0 z s2          $       reduce 3
$ s0 Z s1          $       accept
The string “zz”
Stack              Input   Action
$ s0               zz$     shift 2
$ s0 z s2          z$      reduce 3
$ s0 Z s1          z$      shift 3
$ s0 Z s1 z s3     $       reduce 2
$ s0 Z s1          $       accept
LR Parsers
How does this LR stuff work?
• Unambiguous grammar ⇒ unique rightmost derivation
• Keep upper fringe on a stack
  – All active handles include TOS
  – Shift inputs until TOS is right end of a handle
• Language of handles is regular
  – Build a handle-recognizing DFA
  – ACTION & GOTO tables encode the DFA
• To match subterms, recurse and leave DFA's state on stack
• Final states of the DFA correspond to reduce actions
  – New state is GOTO[state at TOS (after the pops), lhs]
  – For Z, this takes the DFA to S1
(Figure: control DFA for the simple example. S0 goes to S1 on Z and to S2 on z; S1 goes to S3 on z; S2 and S3 are final states marked as reduce actions.)
Building LR Parsers
How do we generate the ACTION and GOTO tables?
• Use the grammar to build a model of the handle-recognizing DFA
• Use the DFA model to build ACTION & GOTO tables
• If construction succeeds, the grammar is LR
How do we build the handle-recognizing DFA?
• Encode the set of productions that can be used as handles in the DFA state: use LR(k) items
• Use two functions goto(s, X) and closure(s)
  – goto() is analogous to move() in the NFA-to-DFA (subset) construction
  – closure() is analogous to ε-closure
• Build up the states and transition functions of the DFA
• Use this information to fill in the ACTION and GOTO tables
LR(k) items
An LR(k) item is a pair [P, δ], where
  P is a production A → β with a • at some position in the rhs
  δ is a lookahead string of length ≤ k (terminal symbols or $)
Examples: [A → •BCD, a], [A → B•CD, a], [A → BC•D, a], & [A → BCD•, a]
The • in an item indicates the position of the top of the stack
• LR(0) items [A → β•γ] (no lookahead symbol)
• LR(1) items [A → β•γ, a] (one token lookahead)
• LR(2) items [A → β•γ, a b] (two token lookahead) ...
LR(k) items
The • in an item indicates the position of the top of the stack
[A → •βγ, a] means that the input seen so far is consistent with the use of A → βγ immediately after the symbol on top of the stack
[A → β•γ, a] means that the input seen so far is consistent with the use of A → βγ at this point in the parse, and that the parser has already recognized β
[A → βγ•, a] means that the parser has seen βγ, and that a lookahead of a is consistent with reducing to A (for LR(k) parsers, a is a string of terminal symbols of length k)
The table construction algorithm uses items to represent valid
configurations of an LR(1) parser
LR(1) Items
The production A → BCD, with lookahead a, generates 4 items
[A → •BCD, a], [A → B•CD, a], [A → BC•D, a], & [A → BCD•, a]
The set of LR(1) items for a grammar is finite
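Here is one possible concrete representation of items, given as a Python sketch. The Item class and the helper names are hypothetical. An item is a production, a dot position from 0 to |rhs|, and a single lookahead symbol. Enumerating the dot positions reproduces the four items above. Summing (|rhs| + 1) × (number of lookahead symbols) over all productions shows why the set is finite.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Item:
    """LR(1) item: production lhs -> rhs, dot position, one lookahead symbol."""
    lhs: str
    rhs: Tuple[str, ...]
    dot: int          # 0..len(rhs); dot == len(rhs) means the rhs is complete
    lookahead: str

    def __str__(self):
        body = list(self.rhs)
        body.insert(self.dot, "•")
        return f"[{self.lhs} → {' '.join(body)}, {self.lookahead}]"


def items_for(lhs, rhs, lookahead):
    """Every item one production yields for one lookahead: |rhs| + 1 of them."""
    return [Item(lhs, tuple(rhs), d, lookahead) for d in range(len(rhs) + 1)]


def count_items(grammar, lookaheads):
    """Size of the (finite) set of LR(1) items: sum of (|rhs| + 1) * |lookaheads|."""
    return sum((len(rhs) + 1) * len(lookaheads) for _, rhs in grammar)


for item in items_for("A", ["B", "C", "D"], "a"):
    print(item)
# [A → • B C D, a]
# [A → B • C D, a]
# [A → B C • D, a]
# [A → B C D •, a]

Z_GRAMMAR = [("S", ["Z"]), ("Z", ["Z", "z"]), ("Z", ["z"])]
print(count_items(Z_GRAMMAR, ["$", "z"]))   # 14
```

For the small grammar S → Z, Z → Z z | z, with lookaheads $ and z, there are at most 14 LR(1) items. The canonical collection can only ever contain subsets of them.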
What’s the point of all these lookahead symbols?
• Carry them along to choose the correct reduction
• Lookaheads are bookkeeping, unless the item has • at the right end
  – Has no direct use in [A → β•γ, a]
  – In [A → β•, a], a lookahead of a implies a reduction by A → β
  – For { [A → β•, a], [B → γ•δ, b] }:
      lookahead = a ⇒ reduce to A
      lookahead ∈ FIRST(δ) ⇒ shift
• Limited right context is enough to pick the actions
Back to Finding Handles
Parser in a state where the stack (the fringe) was
Expr – Term
With lookahead of *
How did it choose to expand Term rather than reduce to Expr?
• Lookahead symbol is the key
• With lookahead of + or –, parser should reduce to Expr
• With lookahead of * or /, parser should shift
• Parser uses lookahead to decide
• All this context from the grammar is encoded in the handle-recognizing mechanism
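The lookahead rule from the LR(1) items discussion can be written as a small function over a set of items. This Python sketch assumes the item set is already closed. Under that assumption, "lookahead ∈ FIRST(δ) ⇒ shift" reduces to checking the terminal immediately after the •. The function reports a conflict when more than one action applies. The items for the state after Expr – Term are written out by hand for illustration, and choose_action is a hypothetical name.

```python
from typing import NamedTuple, Tuple


class Item(NamedTuple):
    lhs: str
    rhs: Tuple[str, ...]
    dot: int
    lookahead: str


def choose_action(items, lookahead):
    """Pick shift / reduce / error for a closed LR(1) item set and a lookahead.
    [A -> beta •, a] with a == lookahead          => reduce by A -> beta
    [B -> gamma • t delta, b] with t == lookahead => shift
    More than one applicable action is a conflict."""
    reduces = {(i.lhs, i.rhs) for i in items
               if i.dot == len(i.rhs) and i.lookahead == lookahead}
    shift = any(i.dot < len(i.rhs) and i.rhs[i.dot] == lookahead for i in items)
    if len(reduces) + int(shift) > 1:
        raise ValueError(f"conflict on lookahead {lookahead!r}")
    if shift:
        return ("shift",)
    if reduces:
        lhs, rhs = reduces.pop()
        return ("reduce", lhs, rhs)
    return ("error",)


# Items of the state reached after Expr - Term (written out by hand).
E_MINUS_T = ("Expr", "-", "Term")
state = ([Item("Expr", E_MINUS_T, 3, la) for la in ("+", "-", "$")]
         + [Item("Term", ("Term", op, "Factor"), 1, la)
            for op in ("*", "/") for la in ("+", "-", "*", "/", "$")])

print(choose_action(state, "*"))   # ('shift',)
print(choose_action(state, "+"))   # ('reduce', 'Expr', ('Expr', '-', 'Term'))
```

With * or / as the lookahead, only the Term items apply, so the parser shifts. With +, –, or $, only the completed Expr item applies, so the parser reduces. This matches the answer given above.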
Back to x - 2 * y
(Figure: the shift-reduce trace for x – 2 * y, annotated where the parser must shift and where it must reduce.)
1. Shift until TOS is the right end of a handle
2. Find the left end of the handle & reduce
LR(1) Table Construction
High-level overview
1 Build the handle-recognizing DFA (aka Canonical Collection of sets of LR(1) items), C = { I0, I1, ..., In }
  a Introduce a new start symbol S' which has only one production S' → S
  b Initial state, I0 should include
    • [S' → •S, $], along with any equivalent items
    • Derive equivalent items as closure(I0)
  c Repeatedly compute, for each Ik and each grammar symbol α, goto(Ik, α)
    • If the set is not already in the collection, add it
    • Record all the transitions created by goto( )
    This eventually reaches a fixed point
2 Fill in the ACTION and GOTO tables using the DFA
The canonical collection completely encodes the transition diagram for the handle-finding DFA
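Step 1 can be sketched as a worklist over states. This Python sketch makes several assumptions:
• Items are plain tuples (lhs, rhs, dot, lookahead).
• The grammar is the small example S → Z, Z → Z z | z used in this lecture. Its production S → Z already plays the role of S' → S.
• There are no ε-productions, so FIRST of a string is FIRST of its first symbol.
• goto is only tried for symbols that appear right after a •, since every other symbol yields the empty set.
All function names are illustrative.

```python
# Example grammar: S -> Z, Z -> Z z | z.  Items are (lhs, rhs, dot, lookahead).
GRAMMAR = [("S", ("Z",)), ("Z", ("Z", "z")), ("Z", ("z",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST for each nonterminal, assuming no epsilon productions."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            add = first[rhs[0]] if rhs[0] in NONTERMINALS else {rhs[0]}
            if not add <= first[lhs]:
                first[lhs] |= add
                changed = True
    return first


FIRST = first_sets()


def closure(items):
    """Worklist version of the closure fixpoint."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, a = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            head = (rhs[dot + 1:] + (a,))[0]          # first symbol of "delta a"
            for b in FIRST.get(head, {head}):
                for p_lhs, p_rhs in GRAMMAR:
                    item = (p_lhs, p_rhs, 0, b)
                    if p_lhs == rhs[dot] and item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def goto(state, x):
    """Advance the dot over x, then close the result."""
    return closure({(l, r, d + 1, a) for (l, r, d, a) in state
                    if d < len(r) and r[d] == x})


def canonical_collection(start_item):
    """Build C = {I0, ..., In} and the goto transitions between them."""
    states = [closure({start_item})]
    transitions = {}
    work = [0]
    while work:                                       # until no new states appear
        i = work.pop(0)
        # Only symbols right after a dot can give a non-empty goto set.
        symbols = sorted({r[d] for (_, r, d, _) in states[i] if d < len(r)})
        for x in symbols:
            target = goto(states[i], x)
            if target not in states:                  # new set: add it
                states.append(target)
                work.append(len(states) - 1)
            transitions[(i, x)] = states.index(target)  # record the transition
    return states, transitions


states, transitions = canonical_collection(("S", ("Z",), 0, "$"))
print(len(states), transitions)   # 4 {(0, 'Z'): 1, (0, 'z'): 2, (1, 'z'): 3}
```

The sketch finds four states, numbered as in the ACTION and GOTO tables for this grammar. From s0, the DFA goes to s1 on Z and to s2 on z; from s1, it goes to s3 on z. Step 2, filling in the tables, is not shown here.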
Computing Closures
closure(I) adds all the items implied by items already in I
• Any item [A → β•Bδ, a] implies [B → •τ, x] for each production with B on the lhs, and each x ∈ FIRST(δa)
• Since βBδ is valid, any way to derive βBδ is valid, too
The algorithm
Closure( I )
    while ( I is still changing )
        for each item [A → β•Bδ, a] ∈ I
            for each production B → τ ∈ P
                for each terminal b ∈ FIRST(δa)
                    if [B → •τ, b] ∉ I
                        then add [B → •τ, b] to I
Fixpoint computation
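Here is a Python sketch of Closure( I ) that follows the fixpoint loop literally. It rests on these assumptions:
• Items are tuples (lhs, rhs, dot, lookahead), and grammars are lists of (lhs, rhs-tuple) pairs.
• No production derives ε, so FIRST(δa) is FIRST of its first symbol.
• FIRST is recomputed on every call, for simplicity rather than speed.
The function names make_first and closure are hypothetical.

```python
def make_first(grammar):
    """FIRST sets for the nonterminals (assumes no epsilon productions)."""
    nonterminals = {lhs for lhs, _ in grammar}
    first = {nt: set() for nt in nonterminals}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in grammar:
            head = rhs[0]
            add = first[head] if head in nonterminals else {head}
            if not add <= first[lhs]:
                first[lhs] |= add
                changed = True
    return nonterminals, first


def closure(items, grammar):
    """Closure( I ) as a fixpoint over items (lhs, rhs, dot, lookahead)."""
    nonterminals, first = make_first(grammar)
    result = set(items)
    changed = True
    while changed:                                    # while I is still changing
        changed = False
        for lhs, rhs, dot, a in list(result):         # each [A -> beta . B delta, a]
            if dot == len(rhs) or rhs[dot] not in nonterminals:
                continue
            B = rhs[dot]
            head = (rhs[dot + 1:] + (a,))[0]          # first symbol of "delta a"
            firsts = first[head] if head in nonterminals else {head}
            for b in firsts:                          # each terminal b in FIRST(delta a)
                for p_lhs, p_rhs in grammar:          # each production B -> tau
                    if p_lhs == B and (B, p_rhs, 0, b) not in result:
                        result.add((B, p_rhs, 0, b))  # add [B -> . tau, b]
                        changed = True
    return result


Z_GRAMMAR = [("S", ("Z",)), ("Z", ("Z", "z")), ("Z", ("z",))]
EXPR_GRAMMAR = [
    ("S", ("Expr",)),
    ("Expr", ("Expr", "+", "Term")), ("Expr", ("Expr", "-", "Term")), ("Expr", ("Term",)),
    ("Term", ("Term", "*", "Factor")), ("Term", ("Term", "/", "Factor")), ("Term", ("Factor",)),
    ("Factor", ("num",)), ("Factor", ("id",)),
]
print(len(closure({("S", ("Z",), 0, "$")}, Z_GRAMMAR)))        # 5
print(len(closure({("S", ("Expr",), 0, "$")}, EXPR_GRAMMAR)))  # 35
```

On the small grammar, the sketch produces the five items of s0 worked out for the example grammar. On the expression grammar, closing [S → •Expr, $] yields 35 items. The Expr items carry lookaheads {+, –, $}, and the Term and Factor items carry {+, –, *, /, $}.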
Example Grammar
Initial step builds the item [S → •Z, $] and takes its closure( )
1  S → Z
2  Z → Z z
3    | z
Closure( [S → •Z, $] )
Item              From
[S → •Z, $]       Original item
[Z → •Z z, $]     1, δa is $
[Z → •z, $]       1, δa is $
[Z → •Z z, z]     2, δa is z $
[Z → •z, z]       2, δa is z $
So, initial state s0 is
{ [S → •Z, $], [Z → •Z z, $], [Z → •z, $], [Z → •Z z, z], [Z → •z, z] }
Computing Gotos
goto(I, x) computes the state that the parser would reach if it recognized an x while in state I
• goto( { [A → β•Xδ, a] }, X ) produces [A → βX•δ, a]
• It also includes closure( [A → βX•δ, a] ) to fill out the state
The algorithm
Goto( I, x )
    new = Ø
    for each [A → β•xδ, a] ∈ I
        new = new ∪ { [A → βx•δ, a] }
    return closure(new)
• Not a fixpoint method
• Uses closure
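A matching Python sketch of Goto( I, x ) for the small grammar follows. It bundles a compact worklist closure so it runs on its own. FIRST is written out by hand for this grammar, since both S and Z start with z. Items are tuples (lhs, rhs, dot, lookahead), and the names are hypothetical.

```python
# Grammar S -> Z, Z -> Z z | z; items are (lhs, rhs, dot, lookahead).
GRAMMAR = [("S", ("Z",)), ("Z", ("Z", "z")), ("Z", ("z",))]
NONTERMINALS = {"S", "Z"}
FIRST = {"S": {"z"}, "Z": {"z"}}      # worked out by hand for this grammar


def closure(items):
    """Compact worklist closure (no epsilon productions assumed)."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, a = work.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            head = (rhs[dot + 1:] + (a,))[0]          # first symbol of "delta a"
            for b in FIRST.get(head, {head}):
                for p_lhs, p_rhs in GRAMMAR:
                    item = (p_lhs, p_rhs, 0, b)
                    if p_lhs == rhs[dot] and item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def goto(state, x):
    """Goto( I, x ): move the dot over x in every item that allows it, then close."""
    new = set()                                       # new = Ø
    for lhs, rhs, dot, a in state:                    # each [A -> beta • x delta, a]
        if dot < len(rhs) and rhs[dot] == x:
            new.add((lhs, rhs, dot + 1, a))           # [A -> beta x • delta, a]
    return closure(new)


s0 = closure({("S", ("Z",), 0, "$")})
print(sorted(goto(s0, "z")))
# [('Z', ('z',), 1, '$'), ('Z', ('z',), 1, 'z')]
```

goto(s0, z) yields exactly the two items of s2. Closure adds nothing to them because the • sits at the end of each rhs. goto(s0, Z) yields the three items of s1, where the dot sits before a terminal or at the end.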
Example Grammar
s0 is { [S → •Z, $], [Z → •Z z, $], [Z → •z, $], [Z → •Z z, z], [Z → •z, z] }
goto( s0, z )
• Loop produces
    Item            From
    [Z → z•, $]     Item 3 in s0
    [Z → z•, z]     Item 5 in s0
• Closure adds nothing since • is at end of rhs in each item
In the construction, this produces s2
{ [Z → z•, {$, z}] }
New, but obvious, notation for two distinct items [Z → z•, $] and [Z → z•, z]