Syntax Analysis
Introduction to parsers
Context-free grammars
Push-down automata
Top-down parsing
Bottom-up parsing
Bison - a parser generator
2
Introduction to parsers
[Figure: source code → Lexical Analyzer → token → Parser → syntax tree → Semantic Analyzer. The Parser requests each next token from the Lexical Analyzer, and the phases share the Symbol Table.]
3
Context-Free Grammars
A set of terminals: basic symbols from
which sentences are formed
A set of nonterminals: syntactic
categories denoting sets of sentences
A set of productions: rules specifying
how the terminals and nonterminals can
be combined to form sentences
The start symbol: a distinguished
nonterminal denoting the language
4
An Example
Terminals: id, ‘+’, ‘-’, ‘*’, ‘/’, ‘(’, ‘)’
Nonterminals: expr, op
Productions:
expr → expr op expr
expr → ‘(’ expr ‘)’
expr → ‘-’ expr
expr → id
op → ‘+’ | ‘-’ | ‘*’ | ‘/’
The start symbol: expr
5
Derivations
A derivation step is an application of a
production as a rewriting rule
E ⇒ - E
A sequence of derivation steps
E ⇒ - E ⇒ - ( E ) ⇒ - ( id )
is called a derivation of “- ( id )” from E
The symbol ⇒* denotes “derives in zero or
more steps”; the symbol ⇒+ denotes “derives
in one or more steps”
E ⇒* - ( id )
E ⇒+ - ( id )
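A derivation can be replayed mechanically by treating each step as a list rewrite. The sketch below is only an illustration: it assumes the expression grammar with E standing for expr and the op nonterminal folded into E, and the names PRODUCTIONS, derive_step, and derive are hypothetical.

```python
# Assumed grammar: E -> E + E | E * E | ( E ) | - E | id
PRODUCTIONS = {
    "E": [["E", "+", "E"], ["E", "*", "E"], ["(", "E", ")"], ["-", "E"], ["id"]],
}

def derive_step(form, position, rhs):
    """Rewrite the nonterminal at `position` of the sentential form by `rhs`."""
    head = form[position]
    if rhs not in PRODUCTIONS.get(head, []):
        raise ValueError(f"{head} -> {' '.join(rhs)} is not a production")
    return form[:position] + rhs + form[position + 1:]

def derive(start, steps):
    """Apply (position, rhs) steps in order and return every sentential form."""
    forms = [start]
    for position, rhs in steps:
        forms.append(derive_step(forms[-1], position, rhs))
    return forms

forms = derive(["E"], [(0, ["-", "E"]), (1, ["(", "E", ")"]), (2, ["id"])])
print(" => ".join(" ".join(form) for form in forms))
```

Each tuple names the position of the nonterminal to rewrite and the right-hand side to use, so the three steps reproduce the derivation of - ( id ) from E.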
6
Context-Free Languages
A context-free language L(G) is the language
defined by a context-free grammar G
A string of terminals ω is in L(G) if and only
if S ⇒+ ω; ω is called a sentence of G
If S ⇒* α, where α may contain nonterminals,
then we call α a sentential form of G
E ⇒ - E ⇒ - ( E ) ⇒ - ( id )
G1 is equivalent to G2 if L(G1) = L(G2)
7
Left- & Right-most Derivations
Each derivation step needs to choose
– a nonterminal to rewrite
– a production to apply
A leftmost derivation always chooses the
leftmost nonterminal to rewrite
E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E )
⇒lm - ( id + E ) ⇒lm - ( id + id )
A rightmost derivation always chooses the
rightmost nonterminal to rewrite
E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E )
⇒rm - ( E + id ) ⇒rm - ( id + id )
8
Parse Trees
A parse tree is a graphical representation
for a derivation that filters out the order of
choosing nonterminals for rewriting
Many derivations may correspond to the
same parse tree, but every parse tree has
associated with it a unique leftmost and a
unique rightmost derivation
9
An Example
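Leftmost derivation:
E ⇒lm - E
⇒lm - ( E )
⇒lm - ( E + E )
⇒lm - ( id + E )
⇒lm - ( id + id )
Parse tree (children indented under their parent):
E
  -
  E
    (
    E
      E
        id
      +
      E
        id
    )
Rightmost derivation:
E ⇒rm - E
⇒rm - ( E )
⇒rm - ( E + E )
⇒rm - ( E + id )
⇒rm - ( id + id )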
10
Ambiguous Grammar
A grammar is ambiguous if it produces
more than one parse tree for some sentence
E ⇒ E + E
⇒ id + E
⇒ id + E * E
⇒ id + id * E
⇒ id + id * id
E ⇒ E * E
⇒ E + E * E
⇒ id + E * E
⇒ id + id * E
⇒ id + id * id
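One way to check ambiguity on a concrete sentence is to count its parse trees. The sketch below assumes the reduced grammar E → E + E | E * E | id used in these two derivations; the function name count_parse_trees is hypothetical.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count the parse trees of `tokens` under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of distinct ways to derive tokens[lo:hi] from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        for mid in range(lo + 1, hi - 1):
            # Try every operator as the root of the tree for this span.
            if tokens[mid] in ("+", "*"):
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

A count greater than one for any sentence shows that the grammar is ambiguous; a count of one for a few test sentences proves nothing in general.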
11
Ambiguous Grammar
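[Figure: the two parse trees for id + id * id. In the first, the root expands to E + E and its right operand expands to E * E; in the second, the root expands to E * E and its left operand expands to E + E.]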
12
Resolving Ambiguity
Use disambiguating rules to throw away
undesirable parse trees
Rewrite grammars by incorporating
disambiguating rules into grammars
13
An Example
The dangling-else grammar
stmt → if expr then stmt
| if expr then stmt else stmt
| other
Two parse trees for
if E1 then if E2 then S1 else S2
14
An Example
[Figure: the two parse trees for if E1 then if E2 then S1 else S2. In one tree the else belongs to the inner if; in the other it belongs to the outer if.]
15
Disambiguating Rules
Rule: match each else with the closest
previous unmatched then
Remove undesired state transitions in the
pushdown automaton
16
Grammar Rewriting
stmt → m_stmt
| unm_stmt
m_stmt → if expr then m_stmt else m_stmt
| other
unm_stmt → if expr then stmt
| if expr then m_stmt else unm_stmt
17
RE vs. CFG
Every language described by a RE can also
be described by a CFG
Why use REs for lexical syntax?
– do not need a notation as powerful as CFGs
– are more concise and easier to understand than
CFGs
– More efficient lexical analyzers can be
constructed from REs than from CFGs
– Provide a way for modularizing the front end
into two manageable-sized components
18
Push-Down Automata
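[Figure: a push-down automaton: a Finite Automaton that reads an Input tape ending in $, uses a Stack whose bottom is marked $, and produces Output.]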
19
An Example
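S’ → S $
S → a S b
S → ε
[Figure: state diagram of a push-down automaton for this grammar, with start state 0 and further states 1, 2, and 3. Edges are labeled with (input symbol, stack top) pairs: (a, $), (a, a), (b, a), and ($, $); the edges that read a push a onto the stack.]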
20
Nonregular Constructs
REs can denote only a fixed number of
repetitions or an unspecified number of
repetitions of one given construct:
a^n, a*
A nonregular construct:
– L = {a^n b^n | n ≥ 0}
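The language above needs unbounded memory to match each b against an earlier a, which is what a stack provides. A minimal sketch of a stack-based recognizer follows; the function name accepts_anbn is hypothetical and the code is an illustration, not a general push-down automaton.

```python
def accepts_anbn(text):
    """Return True iff text has the form a^n b^n for some n >= 0."""
    stack = []
    seen_b = False
    for ch in text:
        if ch == "a" and not seen_b:
            stack.append("a")      # push one marker per a
        elif ch == "b" and stack:
            seen_b = True
            stack.pop()            # match this b against one pushed a
        else:
            return False           # a after b, unmatched b, or foreign symbol
    return not stack               # every a must have been matched

print(accepts_anbn("aaabbb"), accepts_anbn("aabbb"))
```

A finite automaton has no counterpart to the stack, so it can only handle a bounded n.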
21
Non-Context-Free Constructs
CFGs can denote only a fixed number of
repetitions or an unspecified number of
repetitions of one or two given constructs
Some non-context-free constructs:
– L1 = {wcw | w is in (a | b)*}
– L2 = {a^n b^m c^n d^m | n ≥ 1 and m ≥ 1}
– L3 = {a^n b^n c^n | n ≥ 0}
22
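Words of Encouragement
The way of great learning
lies in manifesting bright virtue, in loving the people, and in resting in the highest good.
-- The Great Learning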
23
Top-Down Parsing
Construct a parse tree from the root to the
leaves using leftmost derivation
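1. S → c A B
2. A → a b
3. A → a
4. B → d
input: cad
[Figure: the parse tree is built step by step. Production 1 expands S to c A B. Production 2 (A → a b) is tried first, but b does not match the next input symbol d, so the parser backtracks. Production 3 (A → a) then succeeds, and production 4 (B → d) completes the tree.]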
24
Predictive Parsing
Top-down parsing without backtracking
– there is only one alternative production to
choose at each derivation step
stmt → if expr then stmt else stmt
| while expr do stmt
| begin stmt_list end
25
LL(k) Parsing
The first L stands for scanning the input
from left to right
The second L stands for producing a
leftmost derivation
The k stands for the number of lookahead
input symbols used to choose alternative
productions at each derivation step
26
LL(1) Parsing
Use one input symbol of lookahead
Recursive-descent parsing
Nonrecursive predictive parsing
27
An Example
LL(1): S → a b e | c d e
LL(2): S → a b e | a d e
28
Recursive Descent Parsing
The parser consists of a set of (possibly
recursive) procedures
Each procedure is associated with a
nonterminal of the grammar and is
responsible for deriving the productions of
that nonterminal
Each procedure should be able to choose
a unique production to derive based on
the current token
29
An Example
type → simple
| id
| array [ simple ] of type
simple → integer
| char
| num dotdot num
FIRST(simple) = {integer, char, num}
30
Recursive Descent Parsing
♥ For each terminal in the production, the
terminal is matched with the current token
♥ For each nonterminal in the production, the
procedure associated with the nonterminal
is called
♥ The sequence of matchings and procedure
calls in processing the input implicitly
defines a parse tree for the input
31
An Example
array [ num dotdot num ] of integer
Parse tree (children indented under their parent):
type
  array
  [
  simple
    num
    dotdot
    num
  ]
  of
  type
    simple
      integer
32
An Example
procedure match(t : terminal);
begin
    if lookahead = t then
        lookahead := nexttoken
    else error
end;
33
An Example
procedure type;
begin
    if lookahead is in { integer, char, num } then
        simple
    else if lookahead = id then
        match(id)
    else if lookahead = array then begin
        match(array); match('['); simple; match(']');
        match(of); type
    end
    else error
end;
34
An Example
procedure simple;
begin
    if lookahead = integer then
        match(integer)
    else if lookahead = char then
        match(char)
    else if lookahead = num then begin
        match(num); match(dotdot); match(num)
    end
    else error
end;
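The three procedures match, type, and simple translate almost line for line into a runnable program. The sketch below is one possible rendering, not the slides' implementation: the class name TypeParser, the "$" end marker, and the tuple shape of the returned tree are all assumptions made for illustration.

```python
class ParseError(Exception):
    pass

class TypeParser:
    """Recursive-descent parser for the type/simple grammar."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks end of input (assumption)
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        if self.lookahead != terminal:
            raise ParseError(f"expected {terminal}, found {self.lookahead}")
        self.pos += 1

    def parse_type(self):
        if self.lookahead in ("integer", "char", "num"):
            return ("type", self.parse_simple())
        if self.lookahead == "id":
            self.match("id")
            return ("type", "id")
        if self.lookahead == "array":
            self.match("array")
            self.match("[")
            index = self.parse_simple()
            self.match("]")
            self.match("of")
            return ("type", "array", index, self.parse_type())
        raise ParseError(f"unexpected token {self.lookahead}")

    def parse_simple(self):
        if self.lookahead in ("integer", "char"):
            name = self.lookahead
            self.match(name)
            return ("simple", name)
        if self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
            return ("simple", "num", "dotdot", "num")
        raise ParseError(f"unexpected token {self.lookahead}")

def parse(tokens):
    parser = TypeParser(tokens)
    tree = parser.parse_type()
    parser.match("$")        # the whole input must be consumed
    return tree

tree = parse("array [ num dotdot num ] of integer".split())
print(tree)
```

The nesting of the returned tuples mirrors the sequence of procedure calls, which is the implicit parse tree the slides describe.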
35
First Sets
The first set of a string α is the set of
terminals that begin the strings derived from
α. If α ⇒* ε, then ε is also in the first set of
α.
36
First Sets
If X is a terminal, then FIRST(X) is {X}
If X is a nonterminal and X → ε is a
production, then add ε to FIRST(X)
If X is a nonterminal and X → Y1 Y2 ... Yk is
a production, then add a to FIRST(X) if
for some i, a is in FIRST(Yi) and ε is in all
of FIRST(Y1), ..., FIRST(Yi-1). If ε is in
FIRST(Yj) for all j, then add ε to FIRST(X)
37
An Example
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FIRST(F) = { (, id }
FIRST(T') = { *, ε }
FIRST(E') = { +, ε }
FIRST(T) = { (, id }
FIRST(E) = { (, id }
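The FIRST rules can be applied repeatedly until no set grows. The sketch below does this for the expression grammar of this example; the dictionary encoding of the grammar and the names EPS, first_of_string, and compute_first are assumptions made for illustration.

```python
EPS = "ε"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] encodes the ε-production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

def first_of_string(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST of each nonterminal."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}   # terminal a: {a}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)        # every symbol can vanish, or the string is empty
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:         # fixed-point iteration
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

FIRST = compute_first(GRAMMAR)
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]))
```

Because sets only ever grow and are bounded by the terminals plus ε, the loop always terminates.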
38
Follow Sets
The follow set of a nonterminal A is the set of
terminals that can appear immediately to the
right of A in some sentential form, namely, if
S ⇒* α A a β
then a is in the follow set of A.
39
Follow Sets
Place $ in FOLLOW(S), where S is the start
symbol and $ is the input right endmarker
If there is a production A → α B β, then
everything in FIRST(β) except for ε is placed
in FOLLOW(B)
If there is a production A → α B or A → α B β
where FIRST(β) contains ε, then everything
in FOLLOW(A) is in FOLLOW(B)
40
An Example
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = { ), $ }, FOLLOW(E') = { ), $ }
FOLLOW(T) = { +, ), $ }, FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
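The FOLLOW rules are also a fixed-point computation. The sketch below takes the FIRST sets of this example as given input and derives the FOLLOW sets; the grammar encoding and the names first_of_string and compute_follow are assumptions made for illustration.

```python
EPS, END = "ε", "$"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],      # [] encodes the ε-production
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
# FIRST sets copied from the example.
FIRST = {"E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
         "E'": {"+", EPS}, "T'": {"*", EPS}}

def first_of_string(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # terminal a: {a}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                  # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue            # only nonterminals have FOLLOW sets
                    rest = first_of_string(rhs[i + 1:])
                    new = rest - {EPS}      # rule 2: FIRST(β) without ε
                    if EPS in rest:
                        new |= follow[head] # rule 3: β can vanish
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FOLLOW = compute_follow(GRAMMAR, "E")
for nt in GRAMMAR:
    print(nt, sorted(FOLLOW[nt]))
```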
41
Nonrecursive Predictive Parsing
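[Figure: model of a nonrecursive predictive parser: a Parsing driver reads the Input, manipulates the Stack, consults the Parsing table, and emits the Output.]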
42
Stack Operations
Match
– when the top stack symbol is a terminal and it
matches the input token, pop the terminal and
advance the input pointer
Expand
– when the top stack symbol is a nonterminal,
replace this symbol by the right hand side of
one of its productions (pop the nonterminal and
push the right hand side of a production in
reverse order)
43
An Example
type → simple
| id
| array [ simple ] of type
simple → integer
| char
| num dotdot num
44
An Example
(E = expand, M = match; the top of the stack is at the right)
Action  Stack                        Input
E       type                         array [ num dotdot num ] of integer
M       type of ] simple [ array     array [ num dotdot num ] of integer
M       type of ] simple [           [ num dotdot num ] of integer
E       type of ] simple             num dotdot num ] of integer
M       type of ] num dotdot num     num dotdot num ] of integer
M       type of ] num dotdot         dotdot num ] of integer
M       type of ] num                num ] of integer
M       type of ]                    ] of integer
M       type of                      of integer
E       type                         integer
E       simple                       integer
M       integer                      integer
45
Parsing Driver
push $S onto the stack, where S is the start symbol;
set ip to point to the first symbol of w$;
repeat
    let X be the top stack symbol and a the symbol pointed to by ip;
    if X is a terminal or $ then
        if X = a then
            pop X from the stack and advance ip
        else error
    else /* X is a nonterminal */
        if M[X, a] = X → Y1 Y2 ... Yk then
            pop X from the stack and push Yk ... Y2 Y1 onto the stack
        else error
until X = $ and a = $
46
Constructing Parsing Table
Input. Grammar G.
Output. Parsing Table M.
Method.
1. For each production A → α, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each
symbol b in FOLLOW(A).
4. Make each undefined entry of M be error.
47
An Example
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = { ), $ }
FOLLOW(E') = { ), $ }
FOLLOW(T) = { +, ), $ }, FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
Parsing table M:
      id          +           *           (           )           $
E     E → TE'                             E → TE'
E'                E' → +TE'                           E' → ε      E' → ε
T     T → FT'                             T → FT'
T'                T' → ε      T' → *FT'               T' → ε      T' → ε
F     F → id                              F → (E)
48
An Example
(the Output column shows the production applied in each configuration; rows with no output are match steps)
Stack       Input            Output
$E          id + id * id$    E → TE'
$E'T        id + id * id$    T → FT'
$E'T'F      id + id * id$    F → id
$E'T'id     id + id * id$
$E'T'       + id * id$       T' → ε
$E'         + id * id$       E' → +TE'
$E'T+       + id * id$
$E'T        id * id$         T → FT'
$E'T'F      id * id$         F → id
$E'T'id     id * id$
$E'T'       * id$            T' → *FT'
$E'T'F*     * id$
$E'T'F      id$              F → id
$E'T'id     id$
$E'T'       $                T' → ε
$E'         $                E' → ε
$           $
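The trace can be reproduced with a small table-driven parser. The sketch below hard-codes the LL(1) table for the expression grammar as a Python dictionary; the names TABLE, NONTERMINALS, and predictive_parse are assumptions made for illustration, and error handling is reduced to raising SyntaxError.

```python
# M[nonterminal, lookahead] -> right-hand side; [] encodes an ε-production.
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def predictive_parse(tokens, start="E"):
    """Return the productions applied, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]
    ip = 0
    output = []
    while stack:
        top, a = stack[-1], tokens[ip]
        if top not in NONTERMINALS:        # terminal or $: must match the input
            if top != a:
                raise SyntaxError(f"expected {top}, found {a}")
            stack.pop()
            ip += 1
        else:                              # nonterminal: expand by the table
            rhs = TABLE.get((top, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{top}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))    # push the right-hand side in reverse
            output.append((top, rhs))
    return output

for head, rhs in predictive_parse("id + id * id".split()):
    print(head, "->", " ".join(rhs) or "ε")
```

The printed productions, read top to bottom, form a leftmost derivation of the input.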
49
LL(1) Grammars
A grammar is an LL(1) grammar if its LL(1)
parsing table has no multiply-defined
entries
50
A Counter Example
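S → i E t S S' | a
S' → e S | ε
E → b
FOLLOW(S) = {$, e}
FOLLOW(S') = {$, e}
FOLLOW(E) = {t}
Parsing table M:
      a         b         e                   i                 t         $
S     S → a                                   S → i E t S S'
S'                        S' → e S, S' → ε                                S' → ε
E               E → b
M[S', e] has two entries, so the grammar is not LL(1).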
51
LL(1) Grammars
A grammar G is LL(1) iff whenever A → α | β
are two distinct productions of G, the
following conditions hold:
– For no terminal a do both α and β derive strings
beginning with a.
FIRST(α) ∩ FIRST(β) = ∅
– At most one of α and β can derive the empty string.
– If β ⇒* ε, then α does not derive any string
beginning with a terminal in FOLLOW(A).
FIRST(α) ∩ FOLLOW(A) = ∅
52
Left Recursion
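A grammar is left recursive if it has a
nonterminal A such that A ⇒+ A α
A → A α | β
can be rewritten without left recursion as
A → β R
R → α R | ε
[Figure: parse trees for β α α under each grammar. With left recursion the tree grows down its left edge through A; with the rewritten grammar it grows down its right edge through R.]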
53
Direct Left Recursion
A → A α1 | A α2 | ... | A αm | β1 | β2 | ... | βn
is replaced by
A → β1 A' | β2 A' | ... | βn A'
A' → α1 A' | α2 A' | ... | αm A' | ε
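This rewriting rule is mechanical enough to code directly. The sketch below handles only direct left recursion for a single nonterminal; the function name and the list-of-lists encoding of alternatives are assumptions made for illustration.

```python
def eliminate_direct_left_recursion(head, alternatives):
    """Rewrite A -> A α1 | ... | A αm | β1 | ... | βn without left recursion."""
    # The αj: tails of the alternatives that start with the head itself.
    recursive = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == head]
    # The βi: all other alternatives.
    others = [rhs for rhs in alternatives if not rhs or rhs[0] != head]
    if not recursive:
        return {head: alternatives}        # nothing to do
    new_head = head + "'"                  # assumes this name is not already used
    return {
        head: [beta + [new_head] for beta in others],
        new_head: [alpha + [new_head] for alpha in recursive] + [[]],  # [] is ε
    }

print(eliminate_direct_left_recursion("E", [["E", "+", "T"], ["T"]]))
```

Applied to E → E + T | T, it yields E → T E' and E' → + T E' | ε.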
54
An Example
E → E + T | T
T → T * F | F
F → ( E ) | id
becomes
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
55
Indirect Left Recursion
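S → A a | b
A → A c | S d | ε
S ⇒ A a ⇒ S d a
Substituting the S-productions into A → S d:
A → A c | A a d | b d | ε
After eliminating direct left recursion:
S → A a | b
A → b d A' | A'
A' → c A' | a d A' | ε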
56
Indirect Left Recursion
Input. Grammar G with no cycles (derivations of the form
A ⇒+ A) or ε-productions (productions of the form A → ε).
Output. An equivalent grammar with no left recursion.
1. Arrange the nonterminals in some order A1, A2, ..., An
2. for i := 1 to n do begin
       for j := 1 to i - 1 do begin
           replace each production of the form Ai → Aj γ
           by the productions Ai → δ1 γ | δ2 γ | ... | δk γ
           where Aj → δ1 | δ2 | ... | δk are all the
           current Aj-productions;
       end
       eliminate direct left recursion among Ai-productions
   end
57
Left Factoring
Two alternatives of a nonterminal A have a
nontrivial common prefix if α ≠ ε, and
A → α β1 | α β2
Left factoring rewrites them as
A → α A'
A' → β1 | β2
58
An Example
S → i E t S | i E t S e S | a
E → b
becomes
S → i E t S S' | a
S' → e S | ε
E → b
59
Error Recovery
Panic mode: skip tokens until a token in a set
of synchronizing tokens appears
1. If a terminal on stack cannot be matched, pop the
terminal
2. use FOLLOW(A) as sync set for A (pop A)
3. use the first set of a higher construct as sync set
for A
4. use FIRST(A) as sync set for A
5. use the production deriving ε as the default for A
60
An Example
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
61
An Example
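Parsing table with synchronizing entries (sync2 marks entries added by rule 2, using FOLLOW(A) as the sync set):
      id          +           *           (           )           $
E     E → TE'                             E → TE'     sync2       sync2
E'                E' → +TE'                           E' → ε      E' → ε
T     T → FT'     sync2                   T → FT'     sync2       sync2
T'                T' → ε      T' → *FT'               T' → ε      T' → ε
F     F → id      sync2       sync2       F → (E)     sync2       sync2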
62
An Example
(the Output column shows the action taken in each configuration; rows with no output are match steps)
Stack       Input            Output
$E          ) id * + id$     error, skip )
$E          id * + id$       E → TE'
$E'T        id * + id$       T → FT'
$E'T'F      id * + id$       F → id
$E'T'id     id * + id$
$E'T'       * + id$          T' → *FT'
$E'T'F*     * + id$
$E'T'F      + id$            error
$E'T'       + id$            F has been popped; T' → ε
$E'         + id$            E' → +TE'
$E'T+       + id$
$E'T        id$              T → FT'
$E'T'F      id$              F → id
$E'T'id     id$
$E'T'       $                T' → ε
$E'         $                E' → ε
$           $
63
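Words of Encouragement
Fan Chi asked about benevolence.
The Master said: "Love others."
The Master said: "People's faults are characteristic of the group they belong to.
Observe a person's faults, and you will know their benevolence." -- The Analects
The purpose of life is to seek happiness.
-- The Dalai Lama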
64
Bottom-Up Parsing
Construct a parse tree from the leaves to the
root using rightmost derivation in reverse
S → a A B e
A → A b c | b
B → d
input: abbcde
abbcde ⇐rm aAbcde ⇐rm aAde ⇐rm aABe ⇐rm S
[Figure: the parse tree for abbcde grows from the leaves toward the root as each reduction is applied.]
65
Handles
A handle of a right-sentential form γ
consists of
– a production A → β
– a position of γ where β can be replaced by A to
produce the previous right-sentential form in a
rightmost derivation of γ
Right-sentential form    Handle    Production
abbcde                   b         A → b
aAbcde                   Abc       A → A b c
aAde                     d         B → d
aABe                     aABe      S → a A B e
S
66
Handle Pruning
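S ⇒*rm α A w ⇒rm α β w
[Figure: parse tree with root S in which the interior node A has the handle β as its children.]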
The string to the right of the handle contains only
terminals
A is the bottommost leftmost interior node with all
its children in the tree
67
An Example
[Figure: the parse tree for abbcde shrinks as each handle is pruned, leaving the frontiers abbcde, aAbcde, aAde, aABe, and finally S.]
68
Shift-Reduce Parsing
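[Figure: model of a shift-reduce parser: a Parsing driver reads the Input (ending in $), keeps a Stack (bottom marked $) with the Handle at its top, consults the Parsing table, and emits the Output.]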
69
Stack Operations
Shift: shift the next input symbol onto the
top of the stack
Reduce: replace the handle at the top of the
stack with the corresponding nonterminal
Accept: announce successful completion of
the parsing
Error: call an error recovery routine
70
An Example
(S = shift, R = reduce, A = accept)
Stack     Input      Action
$         abbcde$    S
$a        bbcde$     S
$ab       bcde$      R
$aA       bcde$      S
$aAb      cde$       S
$aAbc     de$        R
$aA       de$        S
$aAd      e$         R
$aAB      e$         S
$aABe     $          R
$S        $          A
71
Shift/Reduce Conflict
stmt → if expr then stmt
| if expr then stmt else stmt
| other
Stack: $ - - - if expr then stmt
Input: else - - - $
Shift: continue toward if expr then stmt else stmt
Reduce: by stmt → if expr then stmt
72
Reduce/Reduce Conflict
stmt → id ( para_list )
stmt → expr := expr
para_list → para_list , para
para_list → para
para → id
expr → id ( expr_list )
expr → id
expr_list → expr_list , expr
expr_list → expr
Stack: $ - - - id ( id
Input: , id ) - - - $
If the lexical analyzer returns procid for a procedure name:
Stack: $ - - - procid ( id
Input: , id ) - - - $
73
LR(k) Parsing
The L stands for scanning the input from
left to right
The R stands for constructing a rightmost
derivation in reverse
The k stands for the number of lookahead
input symbols used to make parsing
decisions
74
LR Parsing
The LR parsing algorithm
Constructing SLR(1) parsing tables
Constructing LR(1) parsing tables
Constructing LALR(1) parsing tables
75
Model of an LR Parser
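[Figure: model of an LR parser: a Parsing driver reads the Input (ending in $) and keeps a Stack holding $ S0 ... Xm-1 Sm-1 Xm Sm, with state Sm on top; it consults a Parsing table made of an Action part and a Goto part, and emits the Output.]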
76
An Example
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → ( E )
(6) F → id
        Action                              Goto
State   id    +     *     (     )     $     E    T    F
0       s5                s4                1    2    3
1             s6                      acc
2             r2    s7          r2    r2
3             r4    r4          r4    r4
4       s5                s4                8    2    3
5             r6    r6          r6    r6
6       s5                s4                     9    3
7       s5                s4                          10
8             s6                s11
9             r1    s7          r1    r1
10            r3    r3          r3    r3
11            r5    r5          r5    r5
77
An Example
Stack                    Input             Action
$0                       id + id * id $    s5
$0 id5                   + id * id $       r6
$0 F3                    + id * id $       r4
$0 T2                    + id * id $       r2
$0 E1                    + id * id $       s6
$0 E1 +6                 id * id $         s5
$0 E1 +6 id5             * id $            r6
$0 E1 +6 F3              * id $            r4
$0 E1 +6 T9              * id $            s7
$0 E1 +6 T9 *7           id $              s5
$0 E1 +6 T9 *7 id5       $                 r6
$0 E1 +6 T9 *7 F10       $                 r3
$0 E1 +6 T9              $                 r1
$0 E1                    $                 acc
78
LR Parsing Driver
push $s0 onto the stack, where s0 is the initial state;
set ip to point to the first symbol of w$;
repeat
    let s be the top state on the stack and a the symbol pointed to by ip;
    if action[s, a] == shift s’ then
        push a and s’ onto the stack and advance ip
    else if action[s, a] == reduce A → β then begin
        pop 2 * |β| symbols off the stack; s’ = goto[top(), A];
        push A and s’ onto the stack
    end
    else if action[s, a] == accept then
        return
    else error
until false
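The driver can be exercised against the parsing table for the expression grammar numbered (1) to (6). The sketch below encodes that table as Python dictionaries and, as a simplification, stacks only states, so a reduction pops |β| entries rather than 2 * |β|. The names PRODUCTIONS, ACTION, GOTO, and lr_parse are assumptions made for illustration.

```python
# Production number -> (head, length of right-hand side).
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
               4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}
ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}
GOTO = {0: {"E": 1, "T": 2, "F": 3}, 4: {"E": 8, "T": 2, "F": 3},
        6: {"T": 9, "F": 3}, 7: {"F": 10}}

def lr_parse(tokens):
    """Return the actions taken, ending in 'acc', or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    states = [0]           # grammar symbols are implied by the states
    ip = 0
    trace = []
    while True:
        action = ACTION[states[-1]].get(tokens[ip])
        if action is None:
            raise SyntaxError(f"unexpected {tokens[ip]} in state {states[-1]}")
        trace.append(action)
        if action == "acc":
            return trace
        if action[0] == "s":                       # shift: push state, advance
            states.append(int(action[1:]))
            ip += 1
        else:                                      # reduce: pop |β| states,
            head, length = PRODUCTIONS[int(action[1:])]
            del states[len(states) - length:]      # then follow goto on the head
            states.append(GOTO[states[-1]][head])

print(" ".join(lr_parse("id + id * id".split())))
```

Note that a reduction pushes the goto state for the head A and does not advance the input pointer.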
79
LR(0) Items
• An LR(0) item of a grammar G is a
production of G with a dot at some position of
the right-hand side, A → α • β
• The production A → X Y Z yields the
following four LR(0) items
A → • X Y Z, A → X • Y Z,
A → X Y • Z, A → X Y Z •
• An LR(0) item represents a state in an NPDA
indicating how much of a production we have
seen at a given point in the parsing process
80
From CFG to NPDA
• The state A → α • B β will go to the state B → • γ
via an edge of the empty string ε
• The state A → α • a β will go to the state
A → α a • β via an edge of terminal a (a shift)
• The state A → α • will cause a reduction on
seeing a terminal in FOLLOW(A)
• The state A → α • B β will go to the state
A → α B • β via an edge of nonterminal B (after a
reduction)
81
An Example
Augmented grammar:
Easier to identify
the accepting state
1. E’ → E
2. E → E + T
3. E → T
4. T → T * F
5. T → F
6. F → ( E )
7. F → id
82
An Example
[Figure: the NPDA for the augmented grammar. Its 20 states, numbered 0 to 19, are the LR(0) items from E’ → • E to F → ( E ) •; edges are labeled ε or with one of the grammar symbols E, T, F, id, +, *, (, ).]
83
From NPDA to DPDA
• There are two functions performed on sets
of LR(0) items (DPDA states)
• The function closure(I) adds more items to I
when there is a dot to the left of a
nonterminal (corresponding to ε edges)
• The function goto(I, X) moves the dot past
the symbol X in all items in I where the dot precedes X
(corresponding to non-ε edges)
84
The Closure Function
function closure(I);
begin
    J := I;
    repeat
        for each item A → α • B β in J and
            each production B → γ of G such that
            B → • γ is not in J do
                J := J ∪ { B → • γ }
    until no more items can be added to J;
    return J
end
85
An Example
1. E’ → E
2. E → E + T
3. E → T
4. T → T * F
5. T → F
6. F → ( E )
7. F → id
s0 = E’ → • E,
I0 = closure({s0 }) =
{ E’ → • E,
E → • E + T,
E → • T,
T → • T * F,
T → • F,
F → • ( E ),
F → • id }
86
The Goto Function
function goto(I, X);
begin
    set J to the empty set
    for any item A → α • X β in I do
        add A → α X • β to J
    return closure(J)
end
87
An Example
I0 = {E’ → • E, E → • E + T, E → • T,
T → • T * F, T → • F, F → • ( E ),
F → • id }
goto(I0 , E)
= closure({E’ → E •, E → E • + T })
= {E’ → E •, E → E • + T }
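The closure and goto functions can be tried out on this example. The sketch below represents an LR(0) item as a (head, right-hand side, dot position) triple; that encoding and the helper name show are assumptions made for illustration.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}

def closure(items):
    """Add B -> • γ for every nonterminal B that appears right after a dot."""
    result = set(items)
    worklist = list(items)
    while worklist:
        head, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for prod_head, prod_rhs in GRAMMAR:
                item = (prod_head, prod_rhs, 0)
                if prod_head == rhs[dot] and item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)

def goto(items, symbol):
    """Move the dot past `symbol` wherever it follows the dot, then close."""
    moved = {(head, rhs, dot + 1)
             for head, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)

def show(item):
    head, rhs, dot = item
    return f"{head} -> {' '.join(rhs[:dot] + ('•',) + rhs[dot:])}"

I0 = closure({("E'", ("E",), 0)})
I1 = goto(I0, "E")
for item in sorted(I1):
    print(show(item))
```

Returning frozensets makes the item sets hashable, so they can serve directly as the states collected by the subset construction.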
88
Subset Construction
function items(G’);
begin
    C := {closure({S’ → • S})}
    repeat
        for each set of items I in C and each symbol X do begin
            J := goto(I, X)
            if J is not empty and not in C then
                C := C ∪ {J}
        end
    until no more sets of items can be added to C
    return C
end
89
An Example
1. E’ → E
2. E → E + T
3. E → T
4. T → T * F
5. T → F
6. F → ( E )
7. F → id
90
I0 : E’ → • E
E → • E + T
E → • T
T → • T * F
T → • F
F → • ( E )
F → • id
goto(I0, E) =
I1 : E’ → E •
E → E • + T
goto(I0, T) =
I2 : E → T •
T → T • * F
goto(I0, F) =
I3 : T → F •
goto(I0, ‘(’) =
I4 : F → ( • E )
E → • E + T
E → • T
T → • T * F
T → • F
F → • ( E )
F → • id
goto(I0, id) =
I5 : F → id •
goto(I1, ‘+’) =
I6 : E → E + • T
T → • T * F
T → • F
F → • ( E )
F → • id
goto(I2, ‘*’) =
I7 : T → T * • F
F → • ( E )
F → • id
goto(I4, E) =
I8 : F → ( E • )
E → E • + T
goto(I6, T) =
I9 : E → E + T •
T → T • * F
goto(I7, F) =
I10 : T → T * F •
goto(I8, ‘)’) =
I11 : F → ( E ) •
91
An Example
[Figure: the DFA whose states 0 to 11 are the sets of LR(0) items I0 to I11, with edges labeled by grammar symbols according to the goto function.]
92
SLR(1) Parsing Table Generation
procedure SLR(G’);
begin
    for each state I in items(G’) do begin
        if A → α • a β in I and goto(I, a) = J for a terminal a then
            action[I, a] = “shift J”
        if A → α • in I and A ≠ S’ then
            action[I, a] = “reduce A → α” for all a in FOLLOW(A)
        if S’ → S • in I then action[I, $] = “accept”
        if A → α • X β in I and goto(I, X) = J for a nonterminal X
            then goto[I, X] = J
    end
    all other entries in action and goto are made error
end
93
An Example
(productions are numbered as in the augmented grammar; a = accept)
        Action                              Goto
State   id    +     *     (     )     $     E    T    F
0       s5                s4                1    2    3
1             s6                      a
2             r3    s7          r3    r3
3             r5    r5          r5    r5
4       s5                s4                8    2    3
5             r7    r7          r7    r7
6       s5                s4                     9    3
7       s5                s4                          10
8             s6                s11
9             r2    s7          r2    r2
10            r4    r4          r4    r4
11            r6    r6          r6    r6
94
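Words of Encouragement
The Master said: "Only the benevolent can truly like others and truly dislike others."
The Master said: "People of purpose and benevolence do not seek to live at the expense of benevolence;
they may give up their lives to fulfill it."
The Master said: "If one sets one's heart on benevolence, one will be free of evil."
-- The Analects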
95
LR(1) Items
• An LR(1) item of a grammar G is a pair,
( A → α • β, a ), of an LR(0) item A → α • β
and a lookahead symbol a
• The lookahead has no effect in an LR(1) item
of the form ( A → α • β, a ), where β is not ε
• An LR(1) item of the form ( A → α •, a )
calls for a reduction by A → α only if the next
input symbol is a
96
The Closure Function
function closure(I);
begin
    J := I;
    repeat
        for each item (A → α • B β, a) in J and
            each production B → γ of G and
            each b ∈ FIRST(β a) such that
            (B → • γ, b) is not in J do
                J := J ∪ { (B → • γ, b) }
    until no more items can be added to J;
    return J
end
97
The Goto Function
function goto(I, X);
begin
    set J to the empty set
    for any item (A → α • X β, a) in I do
        add (A → α X • β, a) to J
    return closure(J)
end
98
Subset Construction
function items(G’);
begin
    C := {closure({(S’ → • S, $)})}
    repeat
        for each set of items I in C and each symbol X do begin
            J := goto(I, X)
            if J is not empty and not in C then
                C := C ∪ {J}
        end
    until no more sets of items can be added to C
    return C
end
99
An Example
1. S’ → S
2. S → C C
3. C → c C
4. C → d
100
An Example
I0: closure({(S’ → • S, $)}) =
(S’ → • S, $)
(S → • C C, $)
(C → • c C, c/d)
(C → • d, c/d)
I1: goto(I0, S) = (S’ → S •, $)
I2: goto(I0, C) =
(S → C • C, $)
(C → • c C, $)
(C → • d, $)
I3: goto(I0, c) =
(C → c • C, c/d)
(C → • c C, c/d)
(C → • d, c/d)
I4: goto(I0, d) =
(C → d •, c/d)
I5: goto(I2, C) =
(S → C C •, $)
101
An Example
I6: goto(I2, c) =
(C → c • C, $)
(C → • c C, $)
(C → • d, $)
I7: goto(I2, d) =
(C → d •, $)
I8: goto(I3, C) =
(C → c C •, c/d)
goto(I3, c) = I3
goto(I3, d) = I4
I9: goto(I6, C) =
(C → c C •, $)
goto(I6, c) = I6
goto(I6, d) = I7
102
LR(1) Parsing Table Generation
procedure LR(G’);
begin
    for each state I in items(G’) do begin
        if (A → α • a β, b) in I and goto(I, a) = J for a terminal a then
            action[I, a] = “shift J”
        if (A → α •, a) in I and A ≠ S’ then
            action[I, a] = “reduce A → α”
        if (S’ → S •, $) in I then action[I, $] = “accept”
        if (A → α • X β, a) in I and goto(I, X) = J for a nonterminal X
            then goto[I, X] = J
    end
    all other entries in action and goto are made error
end
103
An Example
(a = accept)
        Action              Goto
State   c     d     $       S    C
0       s3    s4            1    2
1                   a
2       s6    s7                 5
3       s3    s4                 8
4       r4    r4
5                   r2
6       s6    s7                 9
7                   r4
8       r3    r3
9                   r3
104
The Core of LR(1) Items
• The core of a set of LR(1) items is the set of
their first components (i.e., LR(0) items)
• The core of the set of LR(1) items
{ (C → c • C, c/d),
(C → • c C, c/d),
(C → • d, c/d) }
is { C → c • C,
C → • c C,
C → • d }
105
Merging Cores
I3: { (C → c • C, c/d)
(C → • c C, c/d)
(C → • d, c/d) }
I6: { (C → c • C, $)
(C → • c C, $)
(C → • d, $) }
I4: { (C → d •, c/d) }
I7: { (C → d •, $) }
I8: { (C → c C •, c/d) }
I9: { (C → c C •, $) }
106
LALR(1) Parsing Table Generation
procedure LALR(G’);
begin
    for each state I in mergeCore(items(G’)) do begin
        if (A → α • a β, b) in I and goto(I, a) = J for a terminal a then
            action[I, a] = “shift J”
        if (A → α •, a) in I and A ≠ S’ then
            action[I, a] = “reduce A → α”
        if (S’ → S •, $) in I then action[I, $] = “accept”
        if (A → α • X β, a) in I and goto(I, X) = J for a nonterminal X
            then goto[I, X] = J
    end
    all other entries in action and goto are made error
end
107
An Example
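(a = accept; state 36 merges I3 and I6, 47 merges I4 and I7, 89 merges I8 and I9)
        Action              Goto
State   c     d     $       S    C
0       s36   s47           1    2
1                   a
2       s36   s47                5
36      s36   s47                89
47      r4    r4    r4
5                   r2
89      r3    r3    r3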
108
LR Grammars
• A grammar is SLR(1) iff its SLR(1) parsing
table has no multiply-defined entries
• A grammar is LR(1) iff its LR(1) parsing
table has no multiply-defined entries
• A grammar is LALR(1) iff its LALR(1)
parsing table has no multiply-defined
entries
109
Hierarchy of Grammar Classes
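[Figure: nested classes of grammars. Among the Unambiguous Grammars, SLR(1) ⊂ LALR(1) ⊂ LR(1) ⊂ LR(k) and LL(1) ⊂ LL(k) ⊂ LR(k); the Ambiguous Grammars lie outside all of these classes.]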
110
Hierarchy of Grammar Classes
• Why LL(k) ⊂ LR(k)?
• Why SLR(k) ⊂ LALR(k) ⊂ LR(k)?
111
LL(k) vs. LR(k)
• For a grammar to be LL(k), we must be able
to recognize the use of a production by
seeing only the first k symbols of what its
right-hand side derives
• For a grammar to be LR(k), we must be able
to recognize the use of a production by
having seen all of what is derived from its
right-hand side with k more symbols of
lookahead
112
LALR(k) vs. LR(k)
• The merge of the sets of LR(1) items having the
same core does not introduce shift/reduce conflicts
• Suppose there is a shift/reduce conflict on
lookahead a in the merged set because of
1. (A → α •, a)
2. (B → β • a γ, b)
• Then some set of items has item (A → α •, a), and
since the cores of all sets merged are the same, it
must have an item (B → β • a γ, c) for some c
• But then this set has the same shift/reduce conflict
on a
113
LALR(k) vs. LR(k)
• The merge of the sets of LR(1) items having the
same core may introduce reduce/reduce conflicts
• As an example, consider the grammar
1. S’ → S
2. S → a A d | a B e | b A e | b B d
3. A → c
4. B → c
that generates acd, ace, bce, bcd
• The set {(A → c •, d), (B → c •, e)} is valid for acx
• The set {(A → c •, e), (B → c •, d)} is valid for bcx
• But the union {(A → c •, d/e), (B → c •, d/e)}
generates a reduce/reduce conflict
114
SLR(k) vs. LALR(k)
1. S’ → S
2. S → L = R
3. S → R
4. L → * R
5. L → id
6. R → L
115
SLR(k) vs. LALR(k)
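I0: closure({S’ → • S}) =
S’ → • S
S → • L = R
S → • R
L → • * R
L → • id
R → • L
I1: goto(I0, S) = S’ → S •
I2: goto(I0, L) =
S → L • = R
R → L •
I3: goto(I0, R) =
S → R •
I4: goto(I0, *) =
L → * • R
R → • L
L → • * R
L → • id
I5: goto(I0, id) =
L → id •
FOLLOW(R) = {=, $}, so in I2 the SLR(1) table has both a shift and a reduce by R → L on =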
116
SLR(k) vs. LALR(k)
I6: goto(I2, =) =
S → L = • R
R → • L
L → • * R
L → • id
I7: goto(I4, R) =
L → * R •
I8: goto(I4, L) =
R → L •
I9: goto(I6, R) =
S → L = R •
117
SLR(k) vs. LALR(k)
I0: closure({(S’ → • S, $)}) =
(S’ → • S, $)
(S → • L = R, $)
(S → • R, $)
(L → • * R, =/$)
(L → • id, =/$)
(R → • L, $)
I1: goto(I0, S) = (S’ → S •, $)
I2: goto(I0, L) =
(S → L • = R, $)
(R → L •, $)
I3: goto(I0, R) =
(S → R •, $)
I4: goto(I0, *) =
(L → * • R, =/$)
(R → • L, =/$)
(L → • * R, =/$)
(L → • id, =/$)
I5: goto(I0, id) =
(L → id •, =/$)
118
SLR(k) vs. LALR(k)
I6: goto(I2, =) =
(S → L = • R, $)
(R → • L, $)
(L → • * R, $)
(L → • id, $)
I7: goto(I4, R) =
(L → * R •, =/$)
I8: goto(I4, L) =
(R → L •, =/$)
I9: goto(I6, R) =
(S → L = R •, $)
I10: goto(I6, L) =
(R → L •, $)
I11: goto(I6, *) = (same core as I4)
(L → * • R, $)
(R → • L, $)
(L → • * R, $)
(L → • id, $)
I12: goto(I6, id) = (same core as I5)
(L → id •, $)
I13: goto(I11, R) =
(L → * R •, $)
119
Bison – A Parser Generator
A language for specifying parsers and semantic analyzers
[Figure: lang.y → Bison compiler → lang.tab.c (plus lang.tab.h with the -d option); lang.tab.c → C compiler → a.out; tokens → a.out → syntax tree]
120
Bison Programs
%{
C declarations
%}
Bison declarations
%%
Grammar rules
%%
Additional C code
121
An Example
line → expr ‘\n’
expr → expr ‘+’ term | term
term → term ‘*’ factor | factor
factor → ‘(’ expr ‘)’ | DIGIT
122
An Example
%token DIGIT
%start line
%%
line   : expr '\n'          {printf("line: expr \\n\n");}
       ;
expr   : expr '+' term      {printf("expr: expr + term\n");}
       | term               {printf("expr: term\n");}
       ;
term   : term '*' factor    {printf("term: term * factor\n");}
       | factor             {printf("term: factor\n");}
       ;
factor : '(' expr ')'       {printf("factor: ( expr )\n");}
       | DIGIT              {printf("factor: DIGIT\n");}
       ;
123
Functions and Variables
• yyparse(): the parser function
• yylex(): the lexical analyzer function. Bison
recognizes any non-positive value as
indicating the end of the input
• yylval: the attribute value of a token. Its
default type is int, and can be declared to be
multiple types in the first section using
%union {
int ival;
double dval;
}
124
Conflict Resolutions
• A reduce/reduce conflict is resolved by
choosing the production listed first
• A shift/reduce conflict is resolved in favor
of shift
• A mechanism is provided for assigning precedences and
associativities to terminals
125
Precedence and Associativity
• The precedence and associativity of operators are
declared simultaneously
%nonassoc '<'     /* lowest */
%left '+' '-'
%right '^'        /* highest */
• The precedence of a rule is determined by the
precedence of its rightmost terminal
• The precedence of a rule can be modified by
adding %prec <terminal> to its right end
126
An Example
%{
#include <stdio.h>
%}
%token NUMBER
%left '+' '-'
%left '*' '/'
%right UMINUS
%%
127
An Example
line : expr '\n'
     ;
expr : expr '+' expr
     | expr '-' expr
     | expr '*' expr
     | expr '/' expr
     | '-' expr %prec UMINUS
     | '(' expr ')'
     | NUMBER
     ;
128
Error Recovery
• Error recovery is performed via error productions
• An error production is a production containing
the predefined terminal error
• After adding an error production,
A → B | error α
on encountering an error in the middle of B, the
parser pops symbols from its stack until it reaches a
state in which error can be shifted, shifts
error, and skips input tokens until a token in
FIRST(α)
129
Error Recovery
• The parser can report a syntax error by calling
the user provided function yyerror(char *)
• The parser will suppress the report of another
error message for 3 tokens
• You can resume error report immediately by
using the macro yyerrok
• Error productions are used for major
nonterminals
130
An Example
line : expr '\n'
     | error '\n'    {yyerror("reenter last line:");
                      yyerrok;}
     ;
expr : expr '+' expr
     | expr '*' expr
     | '-' expr %prec UMINUS
     | '(' expr ')'
     | NUMBER
     ;
131
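Words of Encouragement
The Master said: "It is beautiful to dwell amid benevolence. If one chooses not to dwell in benevolence, how can one be called wise?"
The Master said: "Those who are not benevolent cannot endure hardship for long,
nor can they enjoy comfort for long. The benevolent rest in benevolence; the wise profit from benevolence."
The Master said: "If I hear the Way in the morning, I may die content in the evening."
-- The Analects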
132