More on LR Parsing
CSE244
Aggelos Kiayias
Computer Science & Engineering Department
The University of Connecticut
191 Auditorium Road, Box U-155
Storrs, CT 06269-3155
[email protected]
http://www.cse.uconn.edu/~akiayias
CH4.1
Picture So Far
SLR construction:
based on the canonical collection of LR(0) items;
gives rise to the SLR(1) parsing table.
No multiply defined entries => the grammar is called “SLR(1)”.
More general class: LR(1) grammars,
using the notion of an LR(1) item and the canonical
LR(1) parsing table.
CH4.2
LR(1) Items
DEF. An LR(1) item is a production with a marker
together with a terminal:
E.g. [S → aA.Be, c]
Intuition: it indicates how much of a certain production
we have seen already (aA) + what we could expect next
(Be) + a lookahead that agrees with what should follow
in the input if we ever reduce by the production
S → aABe
By incorporating such lookahead information into the
item concept we can make wiser reduce decisions.
The lookahead of an LR(1) item is used directly only
when considering reduce actions (i.e. when the
marker is at the rightmost position).
The core of an LR(1) item [S → aA.Be, c] is the LR(0)
item S → aA.Be
Different LR(1) items may share the same core.
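A minimal sketch of how an LR(1) item and its core could be represented in Python. The class name LR1Item and its fields are illustrative assumptions, not something defined in the lecture.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LR1Item:
    """Hypothetical representation of an LR(1) item [lhs -> rhs with a dot, lookahead]."""
    lhs: str
    rhs: Tuple[str, ...]
    dot: int          # position of the marker inside rhs
    lookahead: str    # a terminal (or "$")

    def core(self):
        # The core drops the lookahead and is just the LR(0) item.
        return (self.lhs, self.rhs, self.dot)

    def is_complete(self):
        # Marker at the rightmost position: only then is the lookahead consulted.
        return self.dot == len(self.rhs)

    def __str__(self):
        body = list(self.rhs)
        body.insert(self.dot, ".")
        return f"[{self.lhs} -> {' '.join(body)}, {self.lookahead}]"


item_c = LR1Item("S", ("a", "A", "B", "e"), 2, "c")
item_d = LR1Item("S", ("a", "A", "B", "e"), 2, "d")
print(item_c)                            # [S -> a A . B e, c]
print(item_c.core() == item_d.core())    # True: different items, same core
```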
CH4.3
Usefulness of LR(1) items
E.g. if we have two LR(1) items of the form
[A → α. , a] and [B → β. , b] in the same state, we can
use the lookahead to decide which
reduction to apply (the same setting could
produce a reduce/reduce conflict in the
SLR approach).
How the notion of validity changes:
An item [A → β1.β2 , a] is valid for a viable
prefix αβ1 if there is a rightmost derivation that
yields αAaw, which in one step yields αβ1β2aw.
CH4.4
Constructing the Canonical Collection of
LR(1) items
Initial item: [S’ → .S , $]
Closure. (more refined)
if [A → α.Bβ , a] belongs to the set of items, and
B → γ is a production of the grammar, then:
we add the item [B → .γ , b]
for all b ∈ FIRST(βa)
Goto. (the same)
A state containing [A → α.Xβ , a] will move to a
state containing [A → αX.β , a] with label X
Every state is closed according to Closure.
Every state has transitions according to Goto.
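The Closure and Goto rules above can be sketched in Python as follows. This is an illustration under stated assumptions: items are plain tuples (lhs, rhs, dot, lookahead), the grammar has no ε-productions (so FIRST(βa) is the FIRST set of its first symbol), and all function names are made up for this sketch. The grammar used is S’ → S, S → CC, C → cC | d.

```python
# Augmented grammar; the dict keys are the nonterminals.
GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}


def first_sets(g):
    """FIRST of every nonterminal (assumes no epsilon productions)."""
    first = {nt: set() for nt in g}
    changed = True
    while changed:
        changed = False
        for nt, prods in g.items():
            for rhs in prods:
                new = first[rhs[0]] if rhs[0] in g else {rhs[0]}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def closure(items, g, first):
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot == len(rhs) or rhs[dot] not in g:
            continue                      # nothing to expand after the marker
        beta_a = rhs[dot + 1:] + (la,)    # the string "beta a"
        head = beta_a[0]
        lookaheads = first[head] if head in g else {head}
        for gamma in g[rhs[dot]]:
            for b in lookaheads:
                item = (rhs[dot], gamma, 0, b)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(state, X, g, first):
    moved = {(l, r, d + 1, a) for (l, r, d, a) in state if d < len(r) and r[d] == X}
    return closure(moved, g, first)


def canonical_collection(g, start):
    first = first_sets(g)
    states = [closure({(start, g[start][0], 0, "$")}, g, first)]
    index, transitions, i = {states[0]: 0}, {}, 0
    while i < len(states):
        symbols = {r[d] for (_, r, d, _) in states[i] if d < len(r)}
        for X in sorted(symbols):
            target = goto(states[i], X, g, first)
            if target not in index:
                index[target] = len(states)
                states.append(target)
            transitions[(i, X)] = index[target]
        i += 1
    return states, transitions


states, transitions = canonical_collection(GRAMMAR, "S'")
print(len(states))   # number of LR(1) states for this grammar
```

State numbers produced by this sketch depend on the order in which symbols are explored, so they need not match the numbering used on the slides.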
CH4.5
Constructing the LR(1) Parsing Table
Shift actions: (same)
If [A → α.bβ , a] is in state Ik and Ik moves to state
Im with label b then we add the action
action[k, b] = “shift m”
Reduce actions: (more refined)
If [A → α. , a] is in state Ik then we add the action
“Reduce A → α”
into action[k, a]
Observe that we don’t use information from
FOLLOW(A) anymore.
Goto part of the table is as before.
CH4.6
Example I
S’ → S
S → CC
C → cC | d
construction
FIRST
S: c, d
C: c, d
CH4.7
Example II
S’ → S
S → L = R | R
L → * R | id
R → L
FIRST
S: *, id
L: *, id
R: *, id
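The FIRST tables of Example I and Example II can be reproduced with a small fixed-point computation. The sketch below is one common way to do it, not necessarily the method used in class; the function name and the use of the empty string for ε are assumptions of this sketch.

```python
EPS = ""  # assumed marker for the empty string


def first_sets(grammar):
    """Iterate until no FIRST set grows any more."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                before = len(first[nt])
                for sym in rhs:
                    sym_first = first[sym] if sym in grammar else {sym}
                    first[nt] |= sym_first - {EPS}
                    if EPS not in sym_first:
                        break             # this symbol cannot vanish
                else:
                    first[nt].add(EPS)    # every symbol of rhs can vanish
                changed |= len(first[nt]) != before
    return first


EXAMPLE_1 = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}
EXAMPLE_2 = {
    "S'": [("S",)],
    "S": [("L", "=", "R"), ("R",)],
    "L": [("*", "R"), ("id",)],
    "R": [("L",)],
}

first_1 = first_sets(EXAMPLE_1)
first_2 = first_sets(EXAMPLE_2)
print(sorted(first_1["S"]), sorted(first_1["C"]))
print(sorted(first_2["S"]), sorted(first_2["L"]), sorted(first_2["R"]))
```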
CH4.8
LR(1) is more general than SLR(1):
S’ → S
S → L = R | R
L → * R | id
R → L
I0 = {
[S’ → .S , $ ]
[S → .L = R , $ ]
[S → .R , $ ]
[L → .* R , = / $ ]
[L → .id , = / $ ]
[R → .L , $ ] }
I1 = {[S’ → S. , $ ]}
I2 = {
[S → L. = R , $ ]
[R → L. , $ ] }
I3 = {[S → R. , $ ]}
I4 = {
[L → *.R , = / $ ]
[R → .L , = / $ ]
[L → .* R , = / $ ]
[L → .id , = / $ ] }
I5 = {[L → id. , = / $ ]}
I6 = {
[S → L = .R , $ ]
[R → .L , $ ]
[L → .* R , $ ]
[L → .id , $ ] }
I7 = {[L → *R. , = / $ ]}
I8 = {[R → L. , = / $ ]}
I9 = {
[L → *.R , $ ]
[R → .L , $ ]
[L → .* R , $ ]
[L → .id , $ ] }
I10 = {[L → *R. , $ ]}
I11 = {[L → id. , $ ]}
I12 = {[R → L. , $ ]}
action[2, = ] ?
s6 (because of S → L. = R)
The item [R → L. , $ ] calls for a reduce only on $, not on =.
THERE IS NO CONFLICT ANYMORE
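As a cross-check of the "no conflict" claim, the shift and reduce rules of the LR(1) table construction can be applied mechanically to this grammar. The sketch below assumes tuple items (lhs, rhs, dot, lookahead), a grammar without ε-productions, and illustrative function names; build_table raises an error if any entry would be multiply defined.

```python
GRAMMAR = {  # augmented grammar of this slide; dict keys are the nonterminals
    "S'": [("S",)],
    "S": [("L", "=", "R"), ("R",)],
    "L": [("*", "R"), ("id",)],
    "R": [("L",)],
}


def first_sets(g):
    first = {nt: set() for nt in g}
    changed = True
    while changed:
        changed = False
        for nt, prods in g.items():
            for rhs in prods:
                new = first[rhs[0]] if rhs[0] in g else {rhs[0]}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def closure(items, g, first):
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot == len(rhs) or rhs[dot] not in g:
            continue
        head = (rhs[dot + 1:] + (la,))[0]          # first symbol of "beta a"
        for gamma in g[rhs[dot]]:
            for b in (first[head] if head in g else {head}):
                item = (rhs[dot], gamma, 0, b)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def canonical_collection(g, start):
    first = first_sets(g)
    states = [closure({(start, g[start][0], 0, "$")}, g, first)]
    index, trans, i = {states[0]: 0}, {}, 0
    while i < len(states):
        for X in sorted({r[d] for (_, r, d, _) in states[i] if d < len(r)}):
            moved = {(l, r, d + 1, a) for (l, r, d, a) in states[i]
                     if d < len(r) and r[d] == X}
            target = closure(moved, g, first)
            if target not in index:
                index[target] = len(states)
                states.append(target)
            trans[(i, X)] = index[target]
        i += 1
    return states, trans


def build_table(g, start):
    states, trans = canonical_collection(g, start)
    action = {}

    def put(key, value):
        if action.setdefault(key, value) != value:
            raise ValueError(f"conflict at {key}: {action[key]} vs {value}")

    for k, state in enumerate(states):
        for lhs, rhs, dot, la in state:
            if dot < len(rhs):
                if rhs[dot] not in g:                      # shift on a terminal
                    put((k, rhs[dot]), ("shift", trans[(k, rhs[dot])]))
            elif lhs == start:
                put((k, "$"), ("accept",))
            else:                                          # reduce only on the lookahead
                put((k, la), ("reduce", lhs, rhs))
    goto_table = {key: m for key, m in trans.items() if key[1] in g}
    return action, goto_table


ACTION, GOTO = build_table(GRAMMAR, "S'")   # no ValueError: no multiply defined entry
after_L = GOTO[(0, "L")]                    # the state called I2 on the slide
print(ACTION[(after_L, "=")][0], ACTION[(after_L, "$")])
```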
CH4.9
LALR Parsing
Canonical sets of LR(1) items
Number of states much larger than in the SLR construction
LR(1) = order of thousands for a standard prog. lang.
SLR(1) = order of hundreds for a standard prog. lang.
LALR(1) (lookahead-LR)
A tradeoff:
Collapse states of the LR(1) table that have the same
core (the “LR(0)” part of each state)
LALR never introduces a Shift/Reduce Conflict if
LR(1) doesn’t.
It might introduce a Reduce/Reduce Conflict (that did
not exist in the LR(1))…
Still much better than SLR(1) (larger class of grammars)
… but with a table smaller than LR(1), actually about the size of SLR(1)
What Yacc and most compilers employ.
CH4.10
Collapsing states with the same core.
E.g., if I3 and I6 collapse, then whenever the LALR(1)
parser puts I36 onto the stack, the LR(1) parser
would have either I3 or I6.
A shift/reduce conflict would not be introduced by
the LALR “collapse”.
Indeed, if the LALR(1) table has a Shift/Reduce
conflict, this conflict must also exist in the
LR(1) version: the reduce item involved comes from one
of the merged LR(1) states, and since two states with
the same core have the same outgoing
arrows, that state already had the shift as well.
On the other hand, a reduce/reduce conflict may be
introduced.
Still LALR(1) is preferred: table size comparable to
SLR(1).
Direct construction is also possible.
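The "collapse" can be sketched by building the canonical LR(1) collection and grouping states by core. The code below is an illustration only, under the same assumptions as the earlier sketches (tuple items, no ε-productions, made-up function names), using the grammar S’ → S, S → CC, C → cC | d. It does not implement the more efficient direct construction.

```python
GRAMMAR = {"S'": [("S",)], "S": [("C", "C")], "C": [("c", "C"), ("d",)]}


def first_sets(g):
    first = {nt: set() for nt in g}
    changed = True
    while changed:
        changed = False
        for nt, prods in g.items():
            for rhs in prods:
                new = first[rhs[0]] if rhs[0] in g else {rhs[0]}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def closure(items, g, first):
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot == len(rhs) or rhs[dot] not in g:
            continue
        head = (rhs[dot + 1:] + (la,))[0]
        for gamma in g[rhs[dot]]:
            for b in (first[head] if head in g else {head}):
                item = (rhs[dot], gamma, 0, b)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def canonical_collection(g, start):
    first = first_sets(g)
    states = [closure({(start, g[start][0], 0, "$")}, g, first)]
    seen, i = {states[0]}, 0
    while i < len(states):
        for X in sorted({r[d] for (_, r, d, _) in states[i] if d < len(r)}):
            moved = {(l, r, d + 1, a) for (l, r, d, a) in states[i]
                     if d < len(r) and r[d] == X}
            target = closure(moved, g, first)
            if target not in seen:
                seen.add(target)
                states.append(target)
        i += 1
    return states


def core(state):
    """The LR(0) part of a state: every item with its lookahead dropped."""
    return frozenset((l, r, d) for (l, r, d, _) in state)


lr1_states = canonical_collection(GRAMMAR, "S'")
lalr_states = {}
for state in lr1_states:
    # States with the same core are merged; their lookaheads are united.
    lalr_states.setdefault(core(state), set()).update(state)

print(len(lr1_states), len(lalr_states))   # LR(1) states vs. LALR(1) states
```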
CH4.11
Error Recovery in LR Parsing
An error is detected when, for a given stack $...Ii and input symbols s…s’…$,
it holds that action[i,s] = empty
Panic-mode error recovery.
CH4.12
Panic Recovery Strategy I
Scan down the stack until a state Ij is found such that:
Ij moves with the non-terminal A to some state
Ik
Ik moves with s’ to some state Ik’
Proceed as follows:
Pop all states above Ij
Push A and state Ik
Discard all symbols from the input until s’
There may be many choices as above.
[Essentially, the parser determines in this way that a
string produced by A contains an error; it assumes
the string is correct and advances.]
Error message: construct of type “A” has error at
location X
CH4.13
Panic Recovery Strategy II
Scan down the stack until a state Ij is found such that:
Ij moves with the terminal t to some state Ik
Ik with s’ has a valid action.
Proceed as follows:
Pop all states above Ij
Push t and state Ik
Discard all symbols from the input until s’
There may be many choices as above.
Error message: “missing t”
CH4.14
Example
E’ → E
E → E + E
  | E * E
  | ( E )
  | id

        action                                  goto
state   id    +       *       (     )     $     E
0       s3    e1      e1      s2    e2    e1    1
1       e3    s4      s5      e3    e2    acc
2       s3    e1      e1      s2    e2    e1    6
3       r4    r4      r4      r4    r4    r4
4       s3    e1      e1      s2    e2    e1    7
5       s3    e1      e1      s2    e2    e1    8
6       e3    s4      s5      e3    s9    e4
7       r1    r1      s5      r1    r1    r1
8       r2    r2      r2      r2    r2    r2
9       r3    r3      r3      r3    r3    r3
CH4.15
Collection of LR(0) items
E’ → E
E → E + E
  | E * E
  | ( E )
  | id
I0: E’ → .E
    E → .E + E
    E → .E * E
    E → .( E )
    E → .id
I1: E’ → E.
    E → E. + E
    E → E. * E
I2: E → ( .E )
    E → .E + E
    E → .E * E
    E → .( E )
    E → .id
I3: E → id.
I4: E → E + .E
    E → .E + E
    E → .E * E
    E → .( E )
    E → .id
I5: E → E * .E
    E → .E + E
    E → .E * E
    E → .( E )
    E → .id
I6: E → ( E. )
    E → E. + E
    E → E. * E
I7: E → E + E.
    E → E. + E
    E → E. * E
I8: E → E * E.
    E → E. + E
    E → E. * E
I9: E → ( E ).
Follow(E’) = { $ }
Follow(E) = { +, *, ), $ }
CH4.16
The parsing table
state   id    +       *       (     )     $     E
0       s3                    s2                1
1             s4      s5                  acc
2       s3                    s2                6
3             r4      r4            r4    r4
4       s3                    s2                7
5       s3                    s2                8
6             s4      s5            s9
7             s4/r1   s5/r1         r1    r1
8             s4/r2   s5/r2         r2    r2
9             r3      r3            r3    r3
CH4.17
Error-handling
state   id    +       *       (     )     $     E
0       s3    e1              s2                1
1             s4      s5                  acc
2       s3                    s2                6
3             r4      r4            r4    r4
4       s3                    s2                7
5       s3                    s2                8
6             s4      s5            s9
7             s4/r1   s5/r1         r1    r1
8             s4/r2   s5/r2         r2    r2
9             r3      r3            r3    r3
CH4.18
Error-handling
I0: E’ → .E
    E → .E + E
    E → .E * E
    E → .( E )
    E → .id
I2: E → ( .E )
    E → .E + E
    E → .E * E
    E → .( E )
    E → .id
I5: E → E * .E
    E → .E + E
    E → .E * E
    E → .( E )
    E → .id
I8: E → E * E.
    E → E. + E
    E → E. * E
e1: Push E onto the stack and move to state 1.
Message: “missing operand”
Or:
e1: Push id onto the stack and change to state 3.
Message: “missing operand”
CH4.19
Error-handling
state   id    +       *       (     )     $     E
0       s3    e1      e1      s2          e1    1
1             s4      s5                  acc
2       s3                    s2                6
3             r4      r4            r4    r4
4       s3                    s2                7
5       s3                    s2                8
6             s4      s5            s9
7             s4/r1   s5/r1         r1    r1
8             s4/r2   s5/r2         r2    r2
9             r3      r3            r3    r3
CH4.20
Error-handling
state   id    +       *       (     )     $     E
0       s3    e1      e1      s2    e2    e1    1
1             s4      s5                  acc
2       s3                    s2                6
3             r4      r4            r4    r4
4       s3    e1              s2    e2          7
5       s3                    s2                8
6             s4      s5            s9
7             s4/r1   s5/r1         r1    r1
8             s4/r2   s5/r2         r2    r2
9             r3      r3            r3    r3
CH4.21
Error-handling
e2: Remove “)” from the input.
Message: “unbalanced right parenthesis”
Try the input id+)
CH4.22
Error-handling state 1
state   id    +       *       (     )     $     E
0       s3    e1      e1      s2    e2    e1    1
1       e3    s4      s5                  acc
2       s3                    s2                6
3             r4      r4            r4    r4
4       s3                    s2                7
5       s3                    s2                8
6             s4      s5            s9
7             s4/r1   s5/r1         r1    r1
8             s4/r2   s5/r2         r2    r2
9             r3      r3            r3    r3
CH4.23
Error-Handling
I1: E’ → E.
    E → E. + E
    E → E. * E
I3: E → id.
I4: E → E + .E
    E → .E + E
    E → .E * E
    E → .( E )
    E → .id
I6: E → ( E. )
    E → E. + E
    E → E. * E
I7: E → E + E.
    E → E. + E
    E → E. * E
I9: E → ( E ).
e3: Push + onto the stack and change to state 4.
Message: “missing operator”
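The error routines can be tried out with a small table-driven driver. The sketch below hard-codes the complete table from the Example slide (states 0 to 9 with entries e1 to e4) and follows the slides for e1 (in its "push id, state 3" variant), e2 and e3. The routine e4 is not described on the slides; the version here (push state 9, report a missing right parenthesis) is an assumption taken from the usual textbook treatment of this grammar. As in that treatment, only states are kept on the stack.

```python
TOKENS = ["id", "+", "*", "(", ")", "$"]
ROWS = """
0 s3 e1 e1 s2 e2 e1
1 e3 s4 s5 e3 e2 acc
2 s3 e1 e1 s2 e2 e1
3 r4 r4 r4 r4 r4 r4
4 s3 e1 e1 s2 e2 e1
5 s3 e1 e1 s2 e2 e1
6 e3 s4 s5 e3 s9 e4
7 r1 r1 s5 r1 r1 r1
8 r2 r2 r2 r2 r2 r2
9 r3 r3 r3 r3 r3 r3
"""
ACTION = {}
for line in ROWS.strip().split("\n"):
    state, *cells = line.split()
    for tok, cell in zip(TOKENS, cells):
        ACTION[(int(state), tok)] = cell

GOTO_E = {0: 1, 2: 6, 4: 7, 5: 8}        # goto on the only nonterminal E
RHS_LEN = {1: 3, 2: 3, 3: 3, 4: 1}       # E->E+E, E->E*E, E->(E), E->id


def parse(tokens):
    """Parse a token list and return the list of error messages issued."""
    stack, pos, messages = [0], 0, []
    tokens = list(tokens) + ["$"]
    while True:
        act = ACTION[(stack[-1], tokens[pos])]
        if act == "acc":
            return messages
        if act[0] == "s":                 # shift
            stack.append(int(act[1:]))
            pos += 1
        elif act[0] == "r":               # reduce by production number
            del stack[-RHS_LEN[int(act[1:])]:]
            stack.append(GOTO_E[stack[-1]])
        elif act == "e1":                 # pretend an id was seen
            stack.append(3)
            messages.append("missing operand")
        elif act == "e2":                 # drop the ")" from the input
            pos += 1
            messages.append("unbalanced right parenthesis")
        elif act == "e3":                 # pretend a + was seen
            stack.append(4)
            messages.append("missing operator")
        else:                             # e4: assumed routine, see lead-in
            stack.append(9)
            messages.append("missing right parenthesis")


print(parse(["id", "+", ")"]))
```

On the input id+) suggested on the slides, the driver first discards the ")" via e2 and then, at end of input, supplies the missing operand via e1 before accepting.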
CH4.24
Intro to Translation
Side-effects and Translation Schemes.
Side-effects are attached to the symbols
to the right of them.
E’ → E
E → E + E {print(+)}
  | E * E {print(*)}
  | {parenthesis++} ( E ) {parenthesis--}
  | id { print(id); print(parenthesis); }
Do the construction as before but:
A side-effect in front of a symbol is
executed in a state when we make the move
following that symbol to another state.
Side-effects at the rightmost end are executed
during reduce actions.
Do for example id*(id+id)$
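The exercise id*(id+id)$ can be traced with a small driver that attaches the side-effects to the parser moves. This is a sketch under assumptions: it reuses the 10-state table for this grammar with the shift/reduce conflicts in states 7 and 8 resolved as in the earlier Example table, runs {parenthesis++} when "(" is shifted, runs the rightmost side-effects on reduce, and collects the printed values in a list instead of printing them.

```python
TOKENS = ["id", "+", "*", "(", ")", "$"]
ROWS = """
0 s3 -  -  s2 -  -
1 -  s4 s5 -  -  acc
2 s3 -  -  s2 -  -
3 -  r4 r4 -  r4 r4
4 s3 -  -  s2 -  -
5 s3 -  -  s2 -  -
6 -  s4 s5 -  s9 -
7 -  r1 s5 -  r1 r1
8 -  r2 r2 -  r2 r2
9 -  r3 r3 -  r3 r3
"""
ACTION = {}
for line in ROWS.strip().split("\n"):
    state, *cells = line.split()
    for tok, cell in zip(TOKENS, cells):
        ACTION[(int(state), tok)] = cell

GOTO_E = {0: 1, 2: 6, 4: 7, 5: 8}        # goto on the only nonterminal E
RHS_LEN = {1: 3, 2: 3, 3: 3, 4: 1}       # E->E+E, E->E*E, E->(E), E->id


def translate(tokens):
    """Run the parser and return everything the side-effects would print."""
    out, parenthesis = [], 0
    stack, pos = [0], 0
    tokens = list(tokens) + ["$"]
    while True:
        tok = tokens[pos]
        act = ACTION[(stack[-1], tok)]
        if act == "acc":
            return out
        if act == "-":
            raise SyntaxError(f"unexpected {tok!r} at position {pos}")
        n = int(act[1:])
        if act[0] == "s":
            if tok == "(":
                parenthesis += 1          # side-effect in front of "("
            stack.append(n)
            pos += 1
        else:
            if n == 1:
                out.append("+")           # {print(+)}
            elif n == 2:
                out.append("*")           # {print(*)}
            elif n == 3:
                parenthesis -= 1          # {parenthesis--}
            else:
                out += ["id", parenthesis]  # {print(id); print(parenthesis)}
            del stack[-RHS_LEN[n]:]
            stack.append(GOTO_E[stack[-1]])


result = translate(["id", "*", "(", "id", "+", "id", ")"])
print(result)   # ['id', 0, 'id', 1, 'id', 1, '+', '*']
```

The operators come out in postfix order, and each id is followed by the parenthesis nesting depth at the moment it was reduced.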
CH4.25