LR-Grammar

LR-Grammars
LR(0), LR(1), and LR(K)
Deterministic Context-Free
Languages



DCFL
A family of languages that are accepted
by a Deterministic Pushdown
Automaton (DPDA)
Many programming languages can be
described by means of DCFLs
Prefix and Proper Prefix

Prefix (of a string)


Any number of leading symbols of that
string
Example: abc


Prefixes: , a, ab, abc
Proper Prefix (of a string)


A prefix of a string, but not the string itself
Example: abc

Proper prefixes: ε, a, ab
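A minimal Python sketch of the two definitions above (the function names are illustrative, not from the slides); the empty string '' plays the role of ε:

```python
def prefixes(s):
    """Return every prefix of s, from the empty string up to s itself."""
    return [s[:i] for i in range(len(s) + 1)]

def proper_prefixes(s):
    """Return every prefix of s except s itself."""
    return prefixes(s)[:-1]

print(prefixes("abc"))         # ['', 'a', 'ab', 'abc']
print(proper_prefixes("abc"))  # ['', 'a', 'ab']
```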
Prefix Property


A Context-Free Language (CFL) L is said
to have the prefix property if, whenever w
is in L, no proper prefix of w is in L
Not considered a severe restriction

Why?

Because we can easily convert a DCFL to a
DCFL with the prefix property by introducing an
endmarker
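A small sketch of the endmarker idea, assuming a finite sample language given as a Python set (real DCFLs are usually infinite, so this only illustrates the definition; the helper names and the sample set are hypothetical):

```python
def has_prefix_property(language):
    """True if no string in the finite set `language` has a proper prefix in it."""
    return not any(w[:i] in language for w in language for i in range(len(w)))

def add_endmarker(language, marker="$"):
    """Append an endmarker that does not occur in the original alphabet."""
    return {w + marker for w in language}

L = {"ab", "abab"}                            # "ab" is a proper prefix of "abab"
print(has_prefix_property(L))                 # False
print(has_prefix_property(add_endmarker(L)))  # True
```

With the endmarker attached, no string can be a proper prefix of another, because $ occurs only in the last position.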
Suffix and Proper Suffix

Suffix (of a string)


Any number of trailing symbols
Proper Suffix

A suffix of a string, but not the string itself
Example Grammar

This is the grammar that will be used in
many of the examples:



S’ → Sc
S → SA | A
A → aSb | ab
LR-Grammar


Left-to-right scan of the input producing
a rightmost derivation
Simply:


L stands for Left-to-right
R stands for rightmost derivation
LR-Items

An item (for a given CFG)


A production with a dot anywhere in the
right side (including the beginning and end)
In the event of an ε-production: B → ε

B → · is an item
Example: Items

Given our example grammar:


S’ → Sc, S → SA | A, A → aSb | ab
The items for the grammar are:
S’ → ·Sc, S’ → S·c, S’ → Sc·
S → ·SA, S → S·A, S → SA·, S → ·A, S → A·
A → ·aSb, A → a·Sb, A → aS·b, A → aSb·, A → ·ab, A → a·b, A → ab·
Some Notation

⇒* = zero or more steps in a derivation

⇒*rm = zero or more steps in a rightmost derivation

⇒rm = single step in a rightmost
derivation
Right-Sentential Form

A sentential form that can be derived by
a rightmost derivation

A string of terminals and variables α is
called a sentential form if S ⇒* α
More terms

Handle


A substring which matches the right-hand side of a
production and represents 1 step in the derivation
Or more formally:


(of a right-sentential form γ for CFG G)
Is a substring β such that:



S ⇒*rm δAw ⇒rm δβw
δβw = γ
If the grammar is unambiguous and has
no useless symbols:


The rightmost derivation of a given right-sentential
form is unique, and so is its handle
Example

Given our example grammar:


S’ → Sc, S → SA | A, A → aSb | ab
An example rightmost derivation:


S’ ⇒ Sc ⇒ SAc ⇒ SaSbc
Therefore we can say that SaSbc is a
right-sentential form

The handle is aSb
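The derivation in this example can be replayed step by step. This is only a sketch: sentential forms are lists of symbols, and `rightmost_step` is a hypothetical helper that refuses to rewrite anything but the rightmost variable.

```python
VARIABLES = {"S'", "S", "A"}

def rightmost_step(form, head, body):
    """Replace the rightmost variable of `form` (a list of symbols) by `body`."""
    idx = max(i for i, sym in enumerate(form) if sym in VARIABLES)
    assert form[idx] == head, "production must rewrite the rightmost variable"
    return form[:idx] + list(body) + form[idx + 1:]

form = ["S'"]
for head, body in [("S'", "Sc"), ("S", "SA"), ("A", "aSb")]:
    form = rightmost_step(form, head, body)
    print("".join(form))   # prints Sc, then SAc, then SaSbc
```

The body inserted by the last step (aSb) is the handle of the final right-sentential form.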
More terms

Viable Prefix



(of a right-sentential form γ)
Is any prefix of γ ending no farther right
than the right end of a handle of γ.
Complete item

An item where the dot is the rightmost
symbol
Example

Given our example grammar:


S’ → Sc, S → SA | A, A → aSb | ab
The right-sentential form abc:


S’ ⇒*rm Ac ⇒ abc
Valid items (A → α·β is valid for viable prefix γ
if S ⇒*rm δAw ⇒rm δαβw and δα = γ):
A → ab· for viable prefix ab
A → a·b for viable prefix a
A → ·ab for viable prefix ε
A → ab· is a complete item, so Ac is the right-sentential
form that precedes abc
LR(0)




Left-to-right scan of the input producing
a rightmost derivation with a look-ahead
(on the input) of 0 symbols
It is a restricted type of CFG
1st in the family of LR-grammars
LR(0) grammars define exactly the
DCFLs having the prefix property
Computing Sets of Valid Items

The definition of LR(0) and the method
of accepting L(G) for LR(0) grammar G
by a DPDA depends on:


Knowing the set of valid items for each
viable prefix γ
For every CFG G, the set of viable
prefixes is a regular set

This regular set is accepted by an NFA
whose states are the items for G
Continued

Given an NFA (whose states are the
items for G) that accepts the regular set


We can apply the subset construction to
this NFA and yield a DFA
The state of this DFA after reading a viable
prefix γ is the set of valid items for γ
NFA M

NFA M recognizes the viable prefixes for CFG

M = (Q, V ∪ T, δ, q0, Q)



Q = set of items for G plus state q0
G = (V, T, P, S)
Three Rules


δ(q0, ε) = {S → ·α | S → α is a production}
δ(A → α·Bβ, ε) = {B → ·γ | B → γ is a production}


Allows expansion of a variable B appearing
immediately to the right of the dot
δ(A → α·Xβ, X) = {A → αX·β}

Permits moving the dot over any grammar symbol X
if X is the next input symbol
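The three rules can be sketched directly in Python for the example grammar. The representation is an assumption made for illustration: items are (head, body, dot) tuples, q0 is the string "q0", and bodies hold one character per symbol. `valid_items` simulates the NFA on a prefix, taking ε-closures as it goes.

```python
GRAMMAR = [("S'", "Sc"), ("S", "SA"), ("S", "A"), ("A", "aSb"), ("A", "ab")]
START, Q0 = "S'", "q0"

def eps_moves(state):
    """Rules 1 and 2: the ε-transitions of M."""
    if state == Q0:
        return {(h, b, 0) for h, b in GRAMMAR if h == START}
    head, body, dot = state
    if dot == len(body):
        return set()
    return {(h, b, 0) for h, b in GRAMMAR if h == body[dot]}

def closure(states):
    """All states reachable from `states` by ε-transitions alone."""
    result, todo = set(states), list(states)
    while todo:
        for nxt in eps_moves(todo.pop()):
            if nxt not in result:
                result.add(nxt)
                todo.append(nxt)
    return result

def move(states, symbol):
    """Rule 3: move the dot over `symbol`, then follow ε-transitions."""
    items = [s for s in states if s != Q0]
    return closure({(h, b, d + 1) for h, b, d in items
                    if d < len(b) and b[d] == symbol})

def valid_items(prefix):
    """The item states in δ(q0, prefix)."""
    states = closure({Q0})
    for symbol in prefix:
        states = move(states, symbol)
    return states - {Q0}

def show(item):
    head, body, dot = item
    return f"{head} → {body[:dot]}·{body[dot:]}"

print(sorted(show(i) for i in valid_items("ab")))  # ['A → ab·']
print(len(valid_items("a")))                        # 6
```

For the prefix a, the six items are A → a·Sb and A → a·b plus the four items with the dot at the far left of the S and A productions.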
Theorem 10.9


The NFA M has the property that δ(q0, γ)
contains A → α·β iff A → α·β is valid for γ
This theorem gives a method for
computing the sets of valid items for any
viable prefix

Note: M is an NFA. It can be converted to a
DFA. Then, by inspecting each state, it can
be determined whether G is an LR(0) grammar
Definition of LR(0) Grammar

G is an LR(0) grammar if


The start symbol does not appear on the
right side of any productions
For every viable prefix γ of G, if A → α· is a
complete item valid for γ, then it is unique

i.e., there are no other complete items (and
there are no items with a terminal to the right of
the dot) that are valid for γ
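A sketch of this test for the example grammar, under assumptions made for illustration: items are (head, body, dot) tuples, bodies hold one character per symbol, and the DFA states are built directly with closure and goto functions (this is the subset construction done on the fly, with q0 folded into the initial state).

```python
GRAMMAR = [("S'", "Sc"), ("S", "SA"), ("S", "A"), ("A", "aSb"), ("A", "ab")]
START = "S'"
VARIABLES = {h for h, _ in GRAMMAR}

def closure(items):
    """Add B → ·γ for every variable B that appears right after a dot."""
    result, todo = set(items), list(items)
    while todo:
        _, body, dot = todo.pop()
        if dot < len(body):
            for h, b in GRAMMAR:
                if h == body[dot] and (h, b, 0) not in result:
                    result.add((h, b, 0))
                    todo.append((h, b, 0))
    return frozenset(result)

def goto(state, symbol):
    """DFA transition: move the dot over `symbol` in every item that allows it."""
    return closure({(h, b, d + 1) for h, b, d in state
                    if d < len(b) and b[d] == symbol})

def dfa_states():
    """All non-empty sets of valid items reachable from the initial state."""
    start = closure({(h, b, 0) for h, b in GRAMMAR if h == START})
    seen, todo = {start}, [start]
    while todo:
        state = todo.pop()
        for symbol in {b[d] for _, b, d in state if d < len(b)}:
            nxt = goto(state, symbol)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen

def is_lr0(states):
    """Check both conditions of the definition."""
    if any(START in body for _, body in GRAMMAR):
        return False                      # start symbol on a right side
    for state in states:
        complete = [i for i in state if i[2] == len(i[1])]
        terminal_next = [i for i in state
                         if i[2] < len(i[1]) and i[1][i[2]] not in VARIABLES]
        if complete and (len(complete) > 1 or terminal_next):
            return False                  # the complete item is not unique
    return True

states = dfa_states()
print(len(states), is_lr0(states))  # 9 True
```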
Facts we now know:




Every LR(0) grammar generates a
DCFL
Every DCFL with the prefix property has
an LR(0) grammar
Every language with an LR(0) grammar
has the prefix property
L is a DCFL iff L$ has an LR(0) grammar
($ is an endmarker not in the alphabet of L)
DPDA’s from LR(0) Grammars


We trace out the rightmost derivation in
reverse
The stack holds a viable prefix of a right-sentential
form, interleaved with the states of
the DFA



Viable prefix: X1X2…Xk
States: s0, s1,…,sk
Stack: s0X1s1…Xksk
Reduction

If sk contains A → α·



Then A → α· is valid for X1X2…Xk
α = suffix of X1X2…Xk
Let


α = Xi+1…Xk
w be such that X1…Xkw is a right-sentential
form.
Reduction Continued

There is a derivation:


S *rm X1…XiAw rm X1…Xkw
To obtain the right-sentential form
(X1…Xkw) in a right derivation we
reduce  to A

Therefore, we pop Xi+1…Xk from the stack
and push A onto the stack
Shift

If sk contains only incomplete items


Then the right-sentential form preceding X1…Xkw
cannot be formed using a reduction
Instead we simply “shift” the next input
symbol onto the stack
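The reduce and shift moves can be put together in a small simulation for the example grammar. This is a sketch under stated assumptions, not the DPDA construction itself: a Python list stands in for the stack s0X1s1…Xksk, DFA states are frozensets of (head, body, dot) items, and acceptance is signalled by the reduction to S’ with no input left.

```python
GRAMMAR = [("S'", "Sc"), ("S", "SA"), ("S", "A"), ("A", "aSb"), ("A", "ab")]
START = "S'"

def closure(items):
    result, todo = set(items), list(items)
    while todo:
        _, body, dot = todo.pop()
        if dot < len(body):
            for h, b in GRAMMAR:
                if h == body[dot] and (h, b, 0) not in result:
                    result.add((h, b, 0))
                    todo.append((h, b, 0))
    return frozenset(result)

def goto(state, symbol):
    return closure({(h, b, d + 1) for h, b, d in state
                    if d < len(b) and b[d] == symbol})

def parse(w):
    """Return the reductions made (a rightmost derivation in reverse), or None."""
    stack = [closure({(h, b, 0) for h, b in GRAMMAR if h == START})]
    pos, reductions = 0, []
    while True:
        top = stack[-1]
        complete = [(h, b) for h, b, d in top if d == len(b)]
        if complete:                                # reduce
            head, body = complete[0]                # unique, since G is LR(0)
            del stack[len(stack) - 2 * len(body):]  # pop Xi+1 ... Xk with their states
            reductions.append((head, body))
            if head == START:
                return reductions if pos == len(w) else None
            stack += [head, goto(stack[-1], head)]
        else:                                       # shift
            nxt = goto(top, w[pos]) if pos < len(w) else None
            if not nxt:
                return None                         # no move possible: reject
            stack += [w[pos], nxt]
            pos += 1

print(parse("abc"))   # [('A', 'ab'), ('S', 'A'), ("S'", 'Sc')]
print(parse("ab"))    # None
```

Reading the returned list backwards gives the rightmost derivation S’ ⇒ Sc ⇒ Ac ⇒ abc.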
Theorem 10.10

If L is L(G) for an LR(0) grammar G,
then L is N(M) for a DPDA M

N(M) = the language accepted by empty
stack or null stack
Proof

Construct from G the DFA D


Transition function: recognizes G’s viable prefixes
Stack symbols of M are



Grammar symbols of G
States of D
M has start state q and other states
used to perform reductions
We know that:

If G is LR(0) then


Reductions are the only way to get the previous
right-sentential form when the state of the
DFA (on the top of the stack) contains a
complete item
When M starts on input w it will
construct a right-most derivation for w in
reverse order
What we need to prove:


When a shift is called for and the top
DFA state on the stack has only
incomplete items then there are no
handles
(Note: if there was a handle, then some
DFA state on the stack would have a
complete item)
Suppose ∃ a state containing A → α· (complete item)



Each state is first put onto the top of the
stack
α would then immediately be reduced to
A
Therefore, a complete item cannot
possibly become buried on the stack
Proof continued



Acceptance occurs when the
top of the stack contains the start
symbol
The start symbol, by definition of LR(0)
grammars, cannot appear on the right
side of a production
L(G) always has the prefix property if G
is LR(0)
Conclusion of Proof



Thus, if w is in L(G), M finds the
rightmost derivation of w, reduces w to
S, and accepts
If M accepts w, then the sequence of
right-sentential forms provides a
derivation of w from S
N(M) = L(G)
Corollary of Theorem 10.10


Every LR(0) grammar is unambiguous
Why?

The rightmost derivation of w is unique

(Given the construction we provided)
LR(1) Grammars



LR grammar with 1 symbol of look-ahead
All and only deterministic CFL’s have
LR(1) grammars
Are greatly important to compiler design

Why?


Because they are broad enough to include the
syntax of almost all programming languages
Restrictive enough to have efficient parsers
(that are essentially DPDAs)
LR(1) Item

Consists of an LR(0) item followed by a
look-ahead set consisting of terminals
and/or the special symbol $


General Form:


A → α·β, {a1, a2, …, an}
$ = the right end of the string
The LR(1) items are the states of an NFA
that recognizes viable prefixes; converting this NFA
to a DFA gives the sets of valid items
A grammar is LR(1) if


The start symbol does not appear on the
right side of any productions
If the set of items, I, valid for some viable
prefix includes some complete item A → α·,
{a1,…,an} then


No ai appears immediately to the right of the
dot in any item of I
If B → β·, {b1,…,bk} is another complete item in
I, then ai ≠ bj for any 1 ≤ i ≤ n and 1 ≤ j ≤ k
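The two conditions on a set of items I can be checked mechanically. The sketch below assumes an LR(1) item is a tuple (head, body, dot, look-ahead set); the sample item sets are made up for illustration and do not come from the example grammar.

```python
def lr1_conflict_free(item_set):
    """item_set: list of (head, body, dot, lookaheads), lookaheads a frozenset."""
    items = list(item_set)
    after_dot = {b[d] for _, b, d, _ in items if d < len(b)}
    complete = [la for _, b, d, la in items if d == len(b)]
    for i, la in enumerate(complete):
        if la & after_dot:
            return False      # some ai sits immediately right of a dot in I
        if any(la & other for other in complete[i + 1:]):
            return False      # two complete items share a look-ahead symbol
    return True

ok = [("A", "a", 1, frozenset({"$"})), ("B", "ab", 1, frozenset({"$"}))]
clash = [("A", "a", 1, frozenset({"b"})), ("B", "ab", 1, frozenset({"$"}))]
print(lr1_conflict_free(ok))     # True
print(lr1_conflict_free(clash))  # False
```

In the second set, the look-ahead b of the complete item A → a· also appears right after the dot in B → a·b, so the parser could not choose between reducing and shifting.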
Accepting LR(1) language:



Similar to the DPDA used with LR(0)
grammars
However, it is allowed to use the next
input symbol during its decision making
This is accomplished by appending a $
to the end of the input and the DPDA
keeps the next input symbol as part of
the state
LR(1) Rules for Reduce/Shift



If the top set of items has a complete item
A → α·, {a1, a2, …, an}, where A ≠ S, reduce
by A → α if the current input symbol is in
{a1, a2, …, an}
If the top set of items has an item S → α·,
{$}, then reduce by S → α and accept if the
current symbol is $ (i.e., the end of the
input is reached)
If the top set of items has an item
A → α·aβ, T, and a is the current input
symbol, then shift
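The three rules amount to a lookup on the top set of items and the current input symbol. A hedged sketch, assuming items are (head, body, dot, look-ahead set) tuples, the start symbol is named "S", and the item set already satisfies the LR(1) conditions (so at most one rule can fire); the function name and sample item sets are hypothetical:

```python
def lr1_action(item_set, symbol, start="S"):
    """Decide the move for the top set of items and the current input symbol."""
    for head, body, dot, lookaheads in item_set:
        if dot == len(body) and symbol in lookaheads:
            kind = "accept" if head == start else "reduce"
            return (kind, head, body)          # rules 1 and 2
        if dot < len(body) and body[dot] == symbol:
            return ("shift", symbol)           # rule 3
    return ("error",)

top = [("A", "ab", 2, frozenset({"a", "$"}))]
print(lr1_action(top, "a"))   # ('reduce', 'A', 'ab')
print(lr1_action(top, "b"))   # ('error',)
print(lr1_action([("S", "A", 1, frozenset({"$"}))], "$"))  # ('accept', 'S', 'A')
```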
Regarding the Rules


The definition of LR(1) guarantees that at most one of the
rules will apply for any input
symbol or $
Often for practicality the information is
summarized into a table


Rows: sets of items
Columns: terminals and $