Bottom-up Parsing
Parsing Techniques

Top-down parsers   (LL(1), recursive descent)
• Start at the root of the parse tree and grow toward leaves
• Pick a production & try to match the input
• Bad "pick" ⇒ may need to backtrack
• Some grammars are backtrack-free   (predictive parsing)

Bottom-up parsers   (LR(1), operator precedence)
• Start at the leaves and grow toward the root
• As input is consumed, encode possibilities in an internal state
• Start in a state valid for legal first tokens
• Bottom-up parsers handle a large class of grammars
Bottom-up Parsing   (definitions)

The point of parsing is to construct a derivation.

A derivation consists of a series of rewrite steps
    S ⇒ γ0 ⇒ γ1 ⇒ γ2 ⇒ … ⇒ γn–1 ⇒ γn ⇒ sentence

• Each γi is a sentential form
  – If γ contains only terminal symbols, γ is a sentence in L(G)
  – If γ contains ≥ 1 non-terminal, γ is a sentential form
• To get γi from γi–1, expand some NT A ∈ γi–1 by using A → β
  – Replace the occurrence of A ∈ γi–1 with β to get γi
  – In a leftmost derivation, it would be the first NT A ∈ γi–1

A left-sentential form occurs in a leftmost derivation
A right-sentential form occurs in a rightmost derivation
Bottom-up Parsing
• Bottom-up parsing and the reverse rightmost derivation
  – A derivation consists of a series of rewrite steps
  – A bottom-up parser builds a derivation by working from the
    input sentence back toward the start symbol S

    S ⇒ γ0 ⇒ γ1 ⇒ γ2 ⇒ … ⇒ γn–1 ⇒ γn ⇒ sentence
    (bottom-up: the parser traces this derivation from right to left)

In terms of the parse tree, this is working from leaves to root
• Nodes with no parent in a partial tree form its upper fringe
• Since each replacement of β with A shrinks the upper fringe,
  we call it a reduction.
Finding Reductions   (Handles)

The parser must find a substring β of the tree's frontier that
matches some production A → β that occurs as one step
in the rightmost derivation.

Informally, we call this substring β a handle.

Formally,
  A handle of a right-sentential form γ is a pair <A → β, k> where
  A → β ∈ P and k is the position in γ of β's rightmost symbol.
  If <A → β, k> is a handle, then replace β at k with A.

Handle Pruning
  The process of discovering a handle and reducing it to the
  appropriate left-hand side is called handle pruning.
  Because γ is a right-sentential form, the substring to the right of a
  handle contains only terminal symbols.
Example   (a very busy slide)

The expression grammar:

 1  Goal   → Expr
 2  Expr   → Expr + Term
 3         | Expr – Term
 4         | Term
 5  Term   → Term * Factor
 6         | Term / Factor
 7         | Factor
 8  Factor → number
 9         | id
10         | ( Expr )

Handles for the rightmost derivation of x – 2 * y:

Prod'n   Sentential Form               Handle
  —      Goal                            —
  1      Expr                           1,1
  3      Expr – Term                    3,3
  5      Expr – Term * Factor           5,5
  9      Expr – Term * <id,y>           9,5
  7      Expr – Factor * <id,y>         7,3
  8      Expr – <num,2> * <id,y>        8,3
  4      Term – <num,2> * <id,y>        4,1
  7      Factor – <num,2> * <id,y>      7,1
  9      <id,x> – <num,2> * <id,y>      9,1
Handle-pruning, Bottom-up Parsers

One implementation technique is the shift-reduce parser:

push INVALID
token ← next_token( )
repeat until (top of stack = Goal and token = EOF)
    if the top of the stack is a handle A → β
        then  /* reduce β to A */
            pop |β| symbols off the stack
            push A onto the stack
        else if (token ≠ EOF)
            then  /* shift */
                push token
                token ← next_token( )

How do errors show up?
• failure to find a handle
• hitting EOF & needing to shift (final else clause)
Either generates an error.
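A minimal Python sketch of this loop, for concreteness. The hard part — deciding whether the top of the stack is a handle — is abstracted into a caller-supplied find_handle(stack, lookahead) that returns (lhs, rhs) or None; the LR(1) machinery developed later in these slides is one way to implement it. All names here are illustrative, not part of any standard API.

def shift_reduce(tokens, find_handle, goal="Goal", eof="EOF"):
    """Handle-pruning, shift-reduce parse.  `tokens` must end with eof."""
    tokens = iter(tokens)
    stack = ["INVALID"]
    token = next(tokens)
    while not (stack[-1] == goal and token == eof):
        handle = find_handle(stack, token)
        if handle is not None:                  # reduce beta to A
            lhs, rhs = handle
            del stack[len(stack) - len(rhs):]   # pop |beta| symbols
            stack.append(lhs)                   # push A
        elif token != eof:                      # shift
            stack.append(token)
            token = next(tokens)
        else:                                   # no handle, nothing left to shift
            raise SyntaxError("parse fails at end of input")
    return stack                                # ["INVALID", goal] on success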
Back to x – 2 * y

Stack     Input            Handle   Action
$         id – num * id    none     shift
$ id      – num * id

(5 shifts + 9 reduces + 1 accept)

1. Shift until the top of the stack is the right end of a handle
2. Find the left end of the handle & reduce
(Slides 9–12 of the deck repeat this table, adding one row per step — reduce by 9, 7, 4; shift – and num; reduce by 8, 7; shift * and id — until the complete trace on the next slide is reached.)
Back to x – 2 * y

Stack                    Input            Handle   Action
$                        id – num * id    none     shift
$ id                     – num * id       9,1      red. 9
$ Factor                 – num * id       7,1      red. 7
$ Term                   – num * id       4,1      red. 4
$ Expr                   – num * id       none     shift
$ Expr –                 num * id         none     shift
$ Expr – num             * id             8,3      red. 8
$ Expr – Factor          * id             7,3      red. 7
$ Expr – Term            * id             none     shift
$ Expr – Term *          id               none     shift
$ Expr – Term * id                        9,5      red. 9
$ Expr – Term * Factor                    5,5      red. 5
$ Expr – Term                             3,3      red. 3
$ Expr                                    1,1      red. 1
$ Goal                                    none     accept

5 shifts + 9 reduces + 1 accept

1. Shift until the top of the stack is the right end of a handle
2. Find the left end of the handle & reduce
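To check that the action column really does rebuild the derivation, here is a small Python replay of the trace above. The productions and the shift/reduce sequence are copied from these slides; only the ASCII token spellings ("id", "num", "-", "*") are mine.

PRODUCTIONS = {            # production number: (lhs, rhs), as numbered earlier
    1: ("Goal",   ["Expr"]),
    3: ("Expr",   ["Expr", "-", "Term"]),
    4: ("Expr",   ["Term"]),
    5: ("Term",   ["Term", "*", "Factor"]),
    7: ("Term",   ["Factor"]),
    8: ("Factor", ["num"]),
    9: ("Factor", ["id"]),
}

tokens  = ["id", "-", "num", "*", "id"]                  # x - 2 * y
actions = ["shift", 9, 7, 4, "shift", "shift", 8, 7,     # a number means
           "shift", "shift", 9, 5, 3, 1]                 # "reduce by that production"

stack, rest = [], list(tokens)
for act in actions:
    if act == "shift":
        stack.append(rest.pop(0))
    else:
        lhs, rhs = PRODUCTIONS[act]
        assert stack[-len(rhs):] == rhs, (stack, act)    # the handle sits on top
        del stack[-len(rhs):]
        stack.append(lhs)

print(stack, rest)        # ['Goal'] []  --  i.e., accept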
Example

The same trace, drawn as the parse tree it builds for x – 2 * y
(children indented under their parent; leaves carry the lexemes):

Goal
  Expr
    Expr
      Term
        Factor
          <id,x>
    –
    Term
      Term
        Factor
          <num,2>
      *
      Factor
        <id,y>
Shift-reduce Parsing

Shift-reduce parsers are easily built and easily understood.

A shift-reduce parser has just four actions:
• Shift — next word is shifted onto the stack
• Reduce — right end of handle is at top of stack;
  locate left end of handle within the stack,
  pop handle off stack & push appropriate lhs
• Accept — stop parsing & report success
• Error — call an error reporting/recovery routine

Handle finding is key
• handle is on stack
• finite set of handles
⇒ use a DFA!

Critical Question: How can we know when we have found a handle
without generating lots of different derivations?
⇒ Answer: we use lookahead in the grammar along with tables
  produced as the result of analyzing the grammar.
⇒ LR(1) parsers build a DFA that runs over the stack & finds them.
LR(1) Parsers

A table-driven LR(1) parser looks like:

[Diagram: source code → Scanner → Table-driven Parser → IR;
 grammar → Parser Generator → ACTION & GOTO Tables,
 which feed the Table-driven Parser]

Tables can be built by hand
It is a perfect task to automate
LR(1) Skeleton Parser

stack.push(INVALID); stack.push(s0);
not_found = true;
token = scanner.next_token();
while (not_found) {
    s = stack.top();
    if ( ACTION[s,token] == "reduce A → β" ) then {
        stack.popnum(2 * |β|);      // pop 2*|β| symbols
        s = stack.top();
        stack.push(A);
        stack.push(GOTO[s,A]);
    }
    else if ( ACTION[s,token] == "shift si" ) then {
        stack.push(token); stack.push(si);
        token = scanner.next_token();
    }
    else if ( ACTION[s,token] == "accept" & token == EOF )
        then not_found = false;
    else report a syntax error and recover;
}
report success;

The skeleton parser
• uses ACTION & GOTO tables
• does |words| shifts
• does |derivation| reductions
• does 1 accept
• detects errors by failure of the 3 other cases
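For concreteness, here is one way the skeleton might look as runnable Python, assuming ACTION and GOTO are plain dictionaries keyed by (state, symbol); function and parameter names are illustrative, not from the slides. The tables for the simplified expression grammar appear on a later slide, where a usage example follows.

EOF = "EOF"

def lr1_parse(tokens, action_table, goto_table, productions, start_state=0):
    """Table-driven LR(1) parse.  `tokens` must end with EOF.
    action_table[(state, token)] is ("shift", k), ("reduce", p) or ("accept",);
    productions[p] is (lhs, length of rhs).  Returns True on accept."""
    stack = ["INVALID", start_state]                 # alternating symbol / state
    tokens = iter(tokens)
    token = next(tokens)
    while True:
        state = stack[-1]
        act = action_table.get((state, token), ("error",))
        if act[0] == "shift":                        # push word and new state
            stack.append(token)
            stack.append(act[1])
            token = next(tokens)
        elif act[0] == "reduce":                     # pop 2*|rhs|, push lhs, then GOTO
            lhs, rhs_len = productions[act[1]]
            del stack[len(stack) - 2 * rhs_len:]
            stack.append(lhs)
            stack.append(goto_table[(stack[-2], lhs)])   # stack[-2] is the exposed state
        elif act[0] == "accept" and token == EOF:
            return True
        else:                                        # the three other cases failed
            raise SyntaxError(f"syntax error in state {state} at {token!r}")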
LR(1) Parsers

How does this LR(1) stuff work?
• Unambiguous grammar ⇒ unique rightmost derivation
• Keep upper fringe on a stack
  – All active handles include top of stack (TOS)
  – Shift inputs until TOS is right end of a handle
• Language of handles is regular (finite)
  – Build a handle-recognizing DFA
  – ACTION & GOTO tables encode the DFA
• To match subterm, invoke subterm DFA & leave old DFA's state on stack
• Final state in DFA ⇒ a reduce action
  – New state is GOTO[state at TOS (after pop), lhs]
  – For SN, this takes the DFA to s1

[Diagram: control DFA for SN — states s0 through s3 and sN, with
 transitions labeled "baa"; final states are marked "reduce action"]
Building LR(1) Parsers

How do we generate the ACTION and GOTO tables?
• Use the grammar to build a model of the DFA
• Use the model to build ACTION & GOTO tables
• If construction succeeds, the grammar is LR(1)

The Big Picture
• Model the state of the parser
• Use two functions goto( s, X ) and closure( s )
  – goto() is analogous to move() in the subset construction
  – closure() adds information to round out a state
  – (X is a terminal or a non-terminal)
• Build up the states and transition functions of the DFA
• Use this information to fill in the ACTION and GOTO tables
LR(k) items

An LR(k) item is a pair [P, δ], where
  P is a production A → β with a • at some position in the rhs
  δ is a lookahead string of length ≤ k   (words or EOF)

The • in an item indicates the position of the top of the stack

[A → •βγ, a] means that the input seen so far is consistent with the use
  of A → βγ immediately after the symbol on top of the stack
[A → β•γ, a] means that the input seen so far is consistent with the use
  of A → βγ at this point in the parse, and that the parser has already
  recognized β
[A → βγ•, a] means that the parser has seen βγ, and that a lookahead
  symbol of a is consistent with reducing to A

The table construction algorithm uses items to represent valid
configurations of an LR(1) parser
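One possible in-code representation of an item, as a hedged sketch: an item [A → β•γ, a] becomes a (production number, dot position, lookahead) triple, with the productions kept in a separate table. The grammar shown is the simplified right-recursive expression grammar introduced a few slides later; names and numbering (0-based here) are illustrative.

from typing import NamedTuple, Tuple

class Item(NamedTuple):
    prod: int         # index into PRODUCTIONS
    dot: int          # position of the • in the rhs (0 .. len(rhs))
    lookahead: str    # one terminal, or "EOF"

PRODUCTIONS: Tuple[Tuple[str, Tuple[str, ...]], ...] = (
    ("Goal",   ("Expr",)),                  # 0
    ("Expr",   ("Term", "-", "Expr")),      # 1
    ("Expr",   ("Term",)),                  # 2
    ("Term",   ("Factor", "*", "Term")),    # 3
    ("Term",   ("Factor",)),                # 4
    ("Factor", ("ident",)),                 # 5
)

start_item = Item(prod=0, dot=0, lookahead="EOF")   # [Goal → •Expr, EOF]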
Computing Gotos

Goto(s,X) computes the state that the parser would reach
if it recognized an X while in state s

• Goto( { [A → β•Xδ, a] }, X ) produces { [A → βX•δ, a] }   (obviously)
• It also includes closure( { [A → βX•δ, a] } ) to fill out the state

The algorithm

Goto( s, X )
    new ← Ø
    ∀ items [A → β•Xδ, a] ∈ s
        new ← new ∪ { [A → βX•δ, a] }
    return closure(new)

• Not a fixed-point method!
• Straightforward computation
• Uses closure( )

Goto() advances the parse
Computing Closures

Closure(s) adds all the items implied by items already in s
• Any item [A → β•Bδ, a] implies [B → •τ, x] for each production
  with B on the lhs, and each x ∈ FIRST(δa)
• Since B is valid, any way to derive B is valid, too

The algorithm

Closure( s )
    while ( s is still changing )
        ∀ items [A → β•Bδ, a] ∈ s
            ∀ productions B → τ ∈ P
                ∀ b ∈ FIRST(δa)          // δ might be ε
                    if [B → •τ, b] ∉ s
                        then add [B → •τ, b] to s

• Classic fixed-point algorithm
• Halts because s ⊂ ITEMS
• Worklist version is faster

Closure "fills out" a state
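A sketch of closure() and goto() from the last two slides, continuing the Item/PRODUCTIONS representation above. FIRST is hard-coded from the "grammar & sets" slide; since nothing in that grammar derives ε, FIRST(δa) is just the FIRST set of the first symbol of δ, or {a} when δ is empty.

FIRST = {
    "Goal": {"ident"}, "Expr": {"ident"}, "Term": {"ident"}, "Factor": {"ident"},
    "-": {"-"}, "*": {"*"}, "ident": {"ident"},
}
NONTERMINALS = {"Goal", "Expr", "Term", "Factor"}

def first_of_string(delta, lookahead):
    """FIRST(δa) for a grammar with no ε-productions."""
    return FIRST[delta[0]] if delta else {lookahead}

def closure(items):
    """Add every item implied by items already in the set (fixed point)."""
    s = set(items)
    changed = True
    while changed:
        changed = False
        for it in list(s):
            lhs, rhs = PRODUCTIONS[it.prod]
            if it.dot == len(rhs) or rhs[it.dot] not in NONTERMINALS:
                continue                              # • is not in front of a non-terminal
            B, delta = rhs[it.dot], rhs[it.dot + 1:]
            for p, (plhs, _) in enumerate(PRODUCTIONS):
                if plhs != B:
                    continue
                for b in first_of_string(delta, it.lookahead):
                    new = Item(p, 0, b)
                    if new not in s:
                        s.add(new)
                        changed = True
    return frozenset(s)

def goto(s, x):
    """State reached on recognizing x in state s: advance the • past x, then close."""
    moved = set()
    for it in s:
        _, rhs = PRODUCTIONS[it.prod]
        if it.dot < len(rhs) and rhs[it.dot] == x:
            moved.add(Item(it.prod, it.dot + 1, it.lookahead))
    return closure(moved)

# closure({start_item}) reproduces the ten items of s0 shown on the later
# "building the collection" slide.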
LR(1) Items

The production A → β, where β = B1B2B3, with lookahead a, can give
rise to 4 items:
  [A → •B1B2B3, a], [A → B1•B2B3, a], [A → B1B2•B3, a], & [A → B1B2B3•, a]

The set of LR(1) items for a grammar is finite

What's the point of all these lookahead symbols?
• Carry them along to choose the correct reduction (if a choice occurs)
• Lookaheads are bookkeeping, unless item has • at right end
  – Has no direct use in [A → β•γ, a]
  – In [A → β•, a], a lookahead of a implies a reduction by A → β
  – For { [A → β•, a], [B → γ•δ, b] }, a ⇒ reduce to A; FIRST(δ) ⇒ shift
⇒ Limited right context is enough to pick the actions
LR(1) Table Construction

High-level overview
1  Build the canonical collection of sets of LR(1) items, I
   a  Begin in an appropriate state, s0
      – [S' → •S, EOF], along with any equivalent items
      – Derive equivalent items as closure( i0 )
   b  Repeatedly compute, for each sk, and each X, goto(sk, X)
      – If the set is not already in the collection, add it
      – Record all the transitions created by goto( )
   This eventually reaches a fixed point
2  Fill in the table from the collection of sets of LR(1) items

The canonical collection completely encodes the
transition diagram for the handle-finding DFA
Building the Canonical Collection

Start from s0 = closure( { [S' → •S, EOF] } )
Repeatedly construct new states, until all are found

The algorithm

s0 ← closure( { [S' → •S, EOF] } )
S ← { s0 }
k ← 1
while ( S is still changing )
    ∀ sj ∈ S and ∀ x ∈ ( T ∪ NT )
        sk ← goto(sj, x)
        record sj → sk on x
        if sk ∉ S then
            S ← S ∪ { sk }
            k ← k + 1

• Fixed-point computation
• Loop adds to S
• S ⊆ 2^ITEMS, so S is finite
• Worklist version is faster
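Continuing the same sketch (Item, PRODUCTIONS, closure, goto from above), the canonical-collection loop might be written as the worklist variant the slide recommends. State numbers are assigned in discovery order, so they need not match the s0–s8 numbering on the following slides, but the same nine sets come out.

GRAMMAR_SYMBOLS = {"Expr", "Term", "Factor", "-", "*", "ident"}   # T ∪ NT, minus the start symbol

def canonical_collection():
    s0 = closure({Item(prod=0, dot=0, lookahead="EOF")})   # closure([Goal → •Expr, EOF])
    states = {s0: 0}              # item set -> state number
    transitions = {}              # (state number, symbol) -> state number
    worklist = [s0]
    while worklist:
        s = worklist.pop()
        for x in GRAMMAR_SYMBOLS:
            t = goto(s, x)
            if not t:
                continue
            if t not in states:                       # a new set: add it
                states[t] = len(states)
                worklist.append(t)
            transitions[(states[s], x)] = states[t]   # record s → t on x
    return states, transitions

states, transitions = canonical_collection()
print(len(states))                # 9 states, as in the summary slides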
Example   (grammar & sets)

Simplified, right-recursive expression grammar

1  Goal   → Expr
2  Expr   → Term – Expr
3  Expr   → Term
4  Term   → Factor * Term
5  Term   → Factor
6  Factor → ident

Symbol    FIRST
Goal      { ident }
Expr      { ident }
Term      { ident }
Factor    { ident }
–         { – }
*         { * }
ident     { ident }
Example   (building the collection)

Initialization Step

s0 ← closure( { [Goal → •Expr, EOF] } )
   = { [Goal → •Expr, EOF], [Expr → •Term – Expr, EOF], [Expr → •Term, EOF],
       [Term → •Factor * Term, EOF], [Term → •Factor * Term, –],
       [Term → •Factor, EOF], [Term → •Factor, –],
       [Factor → •ident, EOF], [Factor → •ident, –], [Factor → •ident, *] }

S ← { s0 }
Example   (building the collection)

Iteration 1
  s1 ← goto(s0, Expr)
  s2 ← goto(s0, Term)
  s3 ← goto(s0, Factor)
  s4 ← goto(s0, ident)

Iteration 2
  s5 ← goto(s2, –)
  s6 ← goto(s3, *)

Iteration 3
  s7 ← goto(s5, Expr)
  s8 ← goto(s6, Term)
Example   (Summary)

S0: { [Goal → •Expr, EOF], [Expr → •Term – Expr, EOF], [Expr → •Term, EOF],
      [Term → •Factor * Term, EOF], [Term → •Factor * Term, –],
      [Term → •Factor, EOF], [Term → •Factor, –],
      [Factor → •ident, EOF], [Factor → •ident, –], [Factor → •ident, *] }

S1: { [Goal → Expr •, EOF] }

S2: { [Expr → Term • – Expr, EOF], [Expr → Term •, EOF] }

S3: { [Term → Factor • * Term, EOF], [Term → Factor • * Term, –],
      [Term → Factor •, EOF], [Term → Factor •, –] }

S4: { [Factor → ident •, EOF], [Factor → ident •, –], [Factor → ident •, *] }

S5: { [Expr → Term – • Expr, EOF], [Expr → •Term – Expr, EOF], [Expr → •Term, EOF],
      [Term → •Factor * Term, –], [Term → •Factor, –],
      [Term → •Factor * Term, EOF], [Term → •Factor, EOF],
      [Factor → •ident, *], [Factor → •ident, –], [Factor → •ident, EOF] }
Example   (Summary)

S6: { [Term → Factor * • Term, EOF], [Term → Factor * • Term, –],
      [Term → •Factor * Term, EOF], [Term → •Factor * Term, –],
      [Term → •Factor, EOF], [Term → •Factor, –],
      [Factor → •ident, EOF], [Factor → •ident, –], [Factor → •ident, *] }

S7: { [Expr → Term – Expr •, EOF] }

S8: { [Term → Factor * Term •, EOF], [Term → Factor * Term •, –] }
Filling in the ACTION and GOTO Tables

The algorithm                                 (x is the state number)

∀ set sx ∈ S
    ∀ item i ∈ sx
        if i is [A → β•aγ, b] and goto(sx, a) = sk, a ∈ T
            then ACTION[x, a] ← "shift k"
        else if i is [S' → S•, EOF]
            then ACTION[x, a] ← "accept"
        else if i is [A → β•, a]
            then ACTION[x, a] ← "reduce A → β"
    ∀ n ∈ NT
        if goto(sx, n) = sk
            then GOTO[x, n] ← k

Many items generate no table entry
  – Closure( ) instantiates FIRST(X) directly for [A → β•Xδ, a]
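Continuing the sketch, the table-filling pass can be transcribed almost directly from this algorithm; it assumes the Item, PRODUCTIONS and canonical_collection definitions above, and its state numbers follow canonical_collection() rather than the numbering on the next slides.

TERMINALS = {"-", "*", "ident", "EOF"}

def make_tables(states, transitions):
    ACTION, GOTO = {}, {}
    for s, x in states.items():               # s: set of items, x: its state number
        for it in s:
            lhs, rhs = PRODUCTIONS[it.prod]
            if it.dot < len(rhs) and rhs[it.dot] in TERMINALS:
                a = rhs[it.dot]                               # [A → β•aγ, b], a ∈ T
                ACTION[(x, a)] = ("shift", transitions[(x, a)])
            elif it.dot == len(rhs) and lhs == "Goal" and it.lookahead == "EOF":
                ACTION[(x, "EOF")] = ("accept",)              # [Goal → Expr •, EOF]
            elif it.dot == len(rhs):
                ACTION[(x, it.lookahead)] = ("reduce", it.prod)   # [A → β•, a]
        for n in ("Expr", "Term", "Factor"):
            if (x, n) in transitions:
                GOTO[(x, n)] = transitions[(x, n)]
    return ACTION, GOTO

ACTION, GOTO = make_tables(*canonical_collection())
# The entries match the table on the next slides, up to state renumbering.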
Example   (Summary)

The Goto Relationship (from the construction)

State   Expr   Term   Factor    –    *   ident
  0       1      2       3                 4
  1
  2                             5
  3                                  6
  4
  5       7      2       3                 4
  6              8       3                 4
  7
  8
Example   (Filling in the tables)

The algorithm produces the following table:

              ACTION                        GOTO
State   ident    –      *     EOF     Expr   Term   Factor
  0      s4                              1      2       3
  1                            acc
  2              s5            r3
  3              r5     s6     r5
  4              r6     r6     r6
  5      s4                              7      2       3
  6      s4                                     8       3
  7                            r2
  8              r4            r4
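Written out as Python dictionaries, the table above can drive the lr1_parse sketch from the skeleton-parser slide. The entries are transcribed from this slide, production numbers follow the simplified expression grammar (1–6), and the sample input is one the grammar accepts (x – y * z scans to ident – ident * ident).

ACTION = {
    (0, "ident"): ("shift", 4),
    (1, "EOF"):   ("accept",),
    (2, "-"):     ("shift", 5),    (2, "EOF"): ("reduce", 3),
    (3, "-"):     ("reduce", 5),   (3, "*"):   ("shift", 6),   (3, "EOF"): ("reduce", 5),
    (4, "-"):     ("reduce", 6),   (4, "*"):   ("reduce", 6),  (4, "EOF"): ("reduce", 6),
    (5, "ident"): ("shift", 4),
    (6, "ident"): ("shift", 4),
    (7, "EOF"):   ("reduce", 2),
    (8, "-"):     ("reduce", 4),   (8, "EOF"): ("reduce", 4),
}
GOTO = {
    (0, "Expr"): 1, (0, "Term"): 2, (0, "Factor"): 3,
    (5, "Expr"): 7, (5, "Term"): 2, (5, "Factor"): 3,
                    (6, "Term"): 8, (6, "Factor"): 3,
}
RULES = {                 # production number: (lhs, length of rhs)
    1: ("Goal", 1), 2: ("Expr", 3), 3: ("Expr", 1),
    4: ("Term", 3), 5: ("Term", 1), 6: ("Factor", 1),
}

print(lr1_parse(["ident", "-", "ident", "*", "ident", "EOF"],
                ACTION, GOTO, RULES))         # True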
What can go wrong?

What if set s contains [A → β•aγ, b] and [B → δ•, a]?
• First item generates "shift", second generates "reduce"
• Both define ACTION[s,a] — cannot do both actions
• This is a fundamental ambiguity, called a shift/reduce conflict
• Modify the grammar to eliminate it   (if-then-else)
• Shifting will often resolve it correctly

What if set s contains [A → γ•, a] and [B → δ•, a]?
• Each generates "reduce", but with a different production
• Both define ACTION[s,a] — cannot do both reductions
• This is a fundamental ambiguity, called a reduce/reduce conflict
• Modify the grammar to eliminate it   (PL/I's overloading of (...))

In either case, the grammar is not LR(1)
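The if-then-else case mentioned above is the classic shift/reduce conflict; the illustration below is the standard textbook example rather than something from these slides. With productions such as

    Stmt → if Expr then Stmt
         | if Expr then Stmt else Stmt
         | other

the state reached after "if Expr then Stmt" holds both a completed item (reduce the short form) and an item with the • before else (shift). On lookahead else, both actions are defined. Most parser generators resolve the conflict by preferring the shift, which binds each else to the nearest unmatched then; the alternative is to rewrite the grammar with matched/unmatched statement forms.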
LR(k) versus LL(k)   (Top-down Recursive Descent)

Finding Reductions

LR(k) ⇒ Each reduction in the parse is detectable with
  – the complete left context,
  – the reducible phrase, itself, and
  – the k terminal symbols to its right

LL(k) ⇒ Parser must select the reduction based on
  – the complete left context
  – the next k terminals

Thus, LR(k) examines more context

"… in practice, programming languages do not actually seem to fall
in the gap between LL(1) languages and deterministic languages"
  — J.J. Horning, "LR Grammars and Analysers", in Compiler Construction,
    An Advanced Course, Springer-Verlag, 1976
Summary

                        Advantages                Disadvantages
Top-down                Fast                      Hand-coded
recursive descent       Good locality             High maintenance
                        Simplicity                Right associativity
                        Good error detection

LR(1)                   Fast                      Large working sets
                        Deterministic langs.      Poor error messages
                        Automatable               Large table sizes
                        Left associativity