Syntax Analysis

1
Syntax Analysis
Introduction to parsers
Context-free grammars
Push-down automata
Top-down parsing
Bottom-up parsing
Bison - a parser generator
2
Introduction to parsers
[Figure: source code → Lexical Analyzer → token → Parser → syntax tree → Semantic Analyzer. The parser asks the lexical analyzer for each next token, and the phases share the Symbol Table.]
3
Context-Free Grammars
A set of terminals: basic symbols from
which sentences are formed
A set of nonterminals: syntactic
categories denoting sets of sentences
A set of productions: rules specifying
how the terminals and nonterminals can
be combined to form sentences
The start symbol: a distinguished
nonterminal denoting the language
4
An Example
Terminals: id, ‘+’, ‘-’, ‘*’, ‘/’, ‘(’, ‘)’
Nonterminals: expr, op
Productions:
expr → expr op expr
expr → ‘(’ expr ‘)’
expr → ‘-’ expr
expr → id
op → ‘+’ | ‘-’ | ‘*’ | ‘/’
The start symbol: expr
5
Derivations
A derivation step is an application of a
production as a rewriting rule
E ⇒ - E
A sequence of derivation steps
E ⇒ - E ⇒ - ( E ) ⇒ - ( id )
is called a derivation of “- ( id )” from E
The symbol ⇒* denotes “derives in zero or
more steps”; the symbol ⇒+ denotes “derives
in one or more steps”
E ⇒* - ( id )
E ⇒+ - ( id )
6
Context-Free Languages
A context-free language L(G) is the language
defined by a context-free grammar G
A string of terminals ω is in L(G) if and only
if S ⇒+ ω; ω is called a sentence of G
If S ⇒* α, where α may contain nonterminals,
then we call α a sentential form of G
E ⇒ - E ⇒ - ( E ) ⇒ - ( id )
G1 is equivalent to G2 if L(G1) = L(G2)
7
Left- & Right-most Derivations
Each derivation step needs to choose
– a nonterminal to rewrite
– a production to apply
A leftmost derivation always chooses the
leftmost nonterminal to rewrite
E ⇒lm - E ⇒lm - ( E ) ⇒lm - ( E + E )
⇒lm - ( id + E ) ⇒lm - ( id + id )
A rightmost derivation always chooses the
rightmost nonterminal to rewrite
E ⇒rm - E ⇒rm - ( E ) ⇒rm - ( E + E )
⇒rm - ( E + id ) ⇒rm - ( id + id )
8
Parse Trees
A parse tree is a graphical representation
for a derivation that filters out the order of
choosing nonterminals for rewriting
Many derivations may correspond to the
same parse tree, but every parse tree has
associated with it a unique leftmost and a
unique rightmost derivation
9
An Example
Leftmost derivation:
E ⇒lm - E
⇒lm - ( E )
⇒lm - ( E + E )
⇒lm - ( id + E )
⇒lm - ( id + id )
Rightmost derivation:
E ⇒rm - E
⇒rm - ( E )
⇒rm - ( E + E )
⇒rm - ( E + id )
⇒rm - ( id + id )
Both derivations correspond to the same parse tree:
E
  -
  E
    (
    E
      E
        id
      +
      E
        id
    )
10
Ambiguous Grammar
A grammar is ambiguous if it produces
more than one parse tree for some sentence
E ⇒ E + E
⇒ id + E
⇒ id + E * E
⇒ id + id * E
⇒ id + id * id
E ⇒ E * E
⇒ E + E * E
⇒ id + E * E
⇒ id + id * E
⇒ id + id * id
11
Ambiguous Grammar
Parse tree 1 for id + id * id (root E → E + E):
E
  E
    id
  +
  E
    E
      id
    *
    E
      id
Parse tree 2 for id + id * id (root E → E * E):
E
  E
    E
      id
    +
    E
      id
  *
  E
    id
12
Resolving Ambiguity
Use disambiguating rules to throw away
undesirable parse trees
Rewrite grammars by incorporating
disambiguating rules into grammars
13
An Example
The dangling-else grammar
stmt → if expr then stmt
| if expr then stmt else stmt
| other
Two parse trees for
if E1 then if E2 then S1 else S2
14
An Example
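Parse tree 1 (else attached to the inner if):
S
  if E1 then
  S
    if E2 then S1 else S2
Parse tree 2 (else attached to the outer if):
S
  if E1 then
  S
    if E2 then S1
  else S2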
15
Disambiguating Rules
Rule: match each else with the closest
previous unmatched then
Remove undesired state transitions in the
pushdown automaton
16
Grammar Rewriting
stmt → m_stmt
| unm_stmt
m_stmt → if expr then m_stmt else m_stmt
| other
unm_stmt → if expr then stmt
| if expr then m_stmt else unm_stmt
17
RE vs. CFG
Every language described by a RE can also
be described by a CFG
Why use REs for lexical syntax?
– do not need a notation as powerful as CFGs
– are more concise and easier to understand than
CFGs
– More efficient lexical analyzers can be
constructed from REs than from CFGs
– Provide a way for modularizing the front end
into two manageable-sized components
18
Push-Down Automata
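[Figure: a push-down automaton: a finite automaton that reads an input ending in $, works on a stack whose bottom is marked by $, and produces output]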
19
An Example
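S’ → S $
S → a S b
S → ε
[Figure: state diagram of a push-down automaton for this grammar, with start state 0 and states 1, 2, 3; edges are labeled with pairs (input symbol, stack top) such as (a, $), (a, a), (b, a) and ($, $), together with the symbol a that is pushed or popped]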
20
Nonregular Constructs
REs can denote only a fixed number of
repetitions or an unspecified number of
repetitions of one given construct:
a^n, a*
A nonregular construct:
– L = {a^n b^n | n ≥ 0}
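As a rough illustration of why a stack is enough for this language, the following Python sketch accepts exactly the strings of {a^n b^n | n ≥ 0} by pushing one symbol per a and popping one per b. The function name accepts_anbn is made up for this example, and the code only imitates a push-down automaton; it is not a transcription of the automaton in the earlier figure.

```python
def accepts_anbn(s):
    """Return True if s has the form a^n b^n for some n >= 0."""
    stack = []
    i = 0
    # Phase 1: push one marker for every leading 'a'.
    while i < len(s) and s[i] == "a":
        stack.append("a")
        i += 1
    # Phase 2: pop one marker for every 'b' that follows.
    while i < len(s) and s[i] == "b" and stack:
        stack.pop()
        i += 1
    # Accept only if the input is used up and every 'a' was matched.
    return i == len(s) and not stack
```

A finite automaton has no such unbounded memory, which is why no regular expression can describe this language.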
21
Non-Context-Free Constructs
CFGs can denote only a fixed number of
repetitions or an unspecified number of
repetitions of one or two given constructs
Some non-context-free constructs:
– L1 = {wcw | w is in (a | b)*}
– L2 = {a^n b^m c^n d^m | n ≥ 1 and m ≥ 1}
– L3 = {a^n b^n c^n | n ≥ 0}
22
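Words of Encouragement
The way of great learning
lies in illuminating bright virtue, in renewing the people, and in resting in the highest good.
-- The Great Learning (大學)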
23
Top-Down Parsing
Construct a parse tree from the root to the
leaves using leftmost derivation
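1. S → c A B
2. A → a b
3. A → a
4. B → d
input: cad
[Figure: the parse tree after each step]
Step 1: apply production 1 to get S ⇒ c A B; c matches the input
Step 2: apply production 2 to get c a b B; a matches, but b does not match d, so backtrack
Step 3: apply production 3 to get c a B; a matches
Step 4: apply production 4 to get c a d; d matches and the parse succeeds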
24
Predictive Parsing
Top-down parsing without backtracking
– there is only one alternative production to
choose at each derivation step
stmt → if expr then stmt else stmt
| while expr do stmt
| begin stmt_list end
25
LL(k) Parsing
The first L stands for scanning the input
from left to right
The second L stands for producing a
leftmost derivation
The k stands for the number of lookahead
input symbols used to choose alternative
productions at each derivation step
26
LL(1) Parsing
Use one input symbol of lookahead
Recursive-descent parsing
Nonrecursive predictive parsing
27
An Example
LL(1): S → a b e | c d e
LL(2): S → a b e | a d e
28
Recursive Descent Parsing
The parser consists of a set of (possibly
recursive) procedures
Each procedure is associated with a
nonterminal of the grammar and is
responsible for deriving the productions of
that nonterminal
Each procedure should be able to choose
a unique production to derive based on
the current token
29
An Example
type → simple
| id
| array [ simple ] of type
simple → integer
| char
| num dotdot num
FIRST(simple) = {integer, char, num}
30
Recursive Descent Parsing
♥ For each terminal in the production, the
terminal is matched with the current token
♥ For each nonterminal in the production, the
procedure associated with the nonterminal
is called
♥ The sequence of matchings and procedure
calls in processing the input implicitly
defines a parse tree for the input
31
An Example
array [ num dotdot num ] of integer
type
  array
  [
  simple
    num
    dotdot
    num
  ]
  of
  type
    simple
      integer
32
An Example
procedure match(t : terminal);
begin
  if lookahead = t then
    lookahead := nexttoken
  else error
end;
33
An Example
procedure type;
begin
  if lookahead is in { integer, char, num } then
    simple
  else if lookahead = id then
    match(id)
  else if lookahead = array then begin
    match(array); match('['); simple; match(']');
    match(of); type
  end
  else error
end;
34
An Example
procedure simple;
begin
  if lookahead = integer then
    match(integer)
  else if lookahead = char then
    match(char)
  else if lookahead = num then begin
    match(num); match(dotdot); match(num)
  end
  else error
end;
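The three procedures above might be transcribed into Python roughly as follows. This is a sketch under assumptions that are not in the slides: tokens are plain strings, an end marker $ is appended to the input, and each procedure returns a nested tuple so that the implicit parse tree becomes visible. The names Parser, ParseError and parse are illustrative only.

```python
class ParseError(Exception):
    """Raised when the current token cannot start or continue a production."""


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]  # "$" is an assumed end marker
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        # Same role as procedure match: consume the token or report an error.
        if self.lookahead != terminal:
            raise ParseError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1

    def type_(self):
        # type -> simple | id | array [ simple ] of type
        if self.lookahead in ("integer", "char", "num"):
            return ("type", self.simple())
        if self.lookahead == "id":
            self.match("id")
            return ("type", "id")
        if self.lookahead == "array":
            self.match("array")
            self.match("[")
            index = self.simple()
            self.match("]")
            self.match("of")
            return ("type", "array", index, self.type_())
        raise ParseError(f"unexpected token {self.lookahead!r}")

    def simple(self):
        # simple -> integer | char | num dotdot num
        if self.lookahead in ("integer", "char"):
            name = self.lookahead
            self.match(name)
            return ("simple", name)
        if self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
            return ("simple", "num", "dotdot", "num")
        raise ParseError(f"unexpected token {self.lookahead!r}")


def parse(text):
    parser = Parser(text.split())
    tree = parser.type_()
    parser.match("$")  # the whole input must be consumed
    return tree
```

Calling parse("array [ num dotdot num ] of integer") follows the same sequence of matches and procedure calls as the parse tree shown for that input.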
35
First Sets
The first set of a string α is the set of
terminals that begin the strings derived from
α. If α ⇒* ε, then ε is also in the first set of
α.
36
First Sets
If X is terminal, then FIRST(X) is {X}
If X is nonterminal and X → ε is a
production, then add ε to FIRST(X)
If X is nonterminal and X → Y1 Y2 ... Yk is
a production, then add a to FIRST(X) if
for some i, a is in FIRST(Yi) and ε is in all
of FIRST(Y1), ..., FIRST(Yi-1). If ε is in
FIRST(Yj) for all j, then add ε to FIRST(X)
37
An Example
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FIRST(F) = { (, id }
FIRST(T') = { *, ε }
FIRST(E') = { +, ε }
FIRST(T) = { (, id }
FIRST(E) = { (, id }
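The FIRST rules can be applied repeatedly until nothing changes. The Python sketch below does this for the example grammar; the dictionary encoding of the grammar (an empty list standing for an ε-production) and the names compute_first and first_of_string are assumptions of this example, not part of the slides.

```python
EPS = "ε"

GRAMMAR = {                      # [] stands for an ε-production
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols, given FIRST of each nonterminal."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # terminal: FIRST(a) = {a}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result        # this symbol cannot vanish, so stop here
    result.add(EPS)              # every symbol can derive ε (or the string is empty)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:               # iterate until a fixed point is reached
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
```

The resulting sets agree with the five FIRST sets listed in the example.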
38
Follow Sets
The follow set of a nonterminal A is the set of
terminals that can appear immediately to the
right of A in some sentential form, namely,
if S ⇒* α A a β, then
a is in the follow set of A.
39
Follow Sets
Place $ in FOLLOW(S), where S is the start
symbol and $ is the input right endmarker
If there is a production A → α B β, then
everything in FIRST(β) except for ε is placed
in FOLLOW(B)
If there is a production A → α B, or A → α B β
where FIRST(β) contains ε, then everything
in FOLLOW(A) is in FOLLOW(B)
40
An Example
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = { ), $ }, FOLLOW(E') = { ), $ }
FOLLOW(T) = { +, ), $ }, FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
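The FOLLOW rules can be iterated in the same way. In the sketch below the FIRST sets are copied from the example instead of being recomputed, and the grammar encoding and the name compute_follow are assumptions made for illustration.

```python
EPS, END = "ε", "$"

GRAMMAR = {                      # [] stands for an ε-production
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}

# FIRST sets copied from the example rather than recomputed here.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}


def first_of_string(symbols):
    """FIRST of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # a terminal's FIRST set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                  # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, productions in grammar.items():
            for rhs in productions:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue            # FOLLOW is defined for nonterminals only
                    tail_first = first_of_string(rhs[i + 1:])
                    new = tail_first - {EPS}        # rule 2
                    if EPS in tail_first:           # rule 3
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = compute_follow(GRAMMAR, "E")
```

For this grammar the computed sets match the FOLLOW sets listed in the example.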
41
Nonrecursive Predictive Parsing
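[Figure: model of a nonrecursive predictive parser: a parsing driver reads the input, keeps a stack, consults the parsing table, and produces output]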
42
Stack Operations
Match
– when the top stack symbol is a terminal and it
matches the input token, pop the terminal and
advance the input pointer
Expand
– when the top stack symbol is a nonterminal,
replace this symbol by the right hand side of
one of its productions (pop the nonterminal and
push the right hand side of a production in
reverse order)
43
An Example
type → simple
| id
| array [ simple ] of type
simple → integer
| char
| num dotdot num
44
An Example
Action  Stack                       Input
E       type                        array [ num dotdot num ] of integer
M       type of ] simple [ array    array [ num dotdot num ] of integer
M       type of ] simple [          [ num dotdot num ] of integer
E       type of ] simple            num dotdot num ] of integer
M       type of ] num dotdot num    num dotdot num ] of integer
M       type of ] num dotdot        dotdot num ] of integer
M       type of ] num               num ] of integer
M       type of ]                   ] of integer
M       type of                     of integer
E       type                        integer
E       simple                      integer
M       integer                     integer
(E = expand, M = match; the top of the stack is at the right)
45
Parsing Driver
push $S onto the stack, where S is the start symbol
set ip to point to the first symbol of w$;
repeat
  let X be the top stack symbol and a the symbol pointed to by ip;
  if X is a terminal or $ then
    if X = a then
      pop X from the stack and advance ip
    else error
  else /* X is a nonterminal */
    if M[X, a] = X → Y1 Y2 ... Yk then
      pop X from the stack and push Yk ... Y2 Y1 onto the stack
    else error
until X = $ and a = $
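The driver can be sketched in Python as follows. The table is written out by hand for the expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id; the dictionary layout, the function name ll1_parse and the use of SyntaxError for the error case are assumptions of this sketch.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# M[X, a] -> right-hand side of the production to apply ([] stands for ε).
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}


def ll1_parse(tokens, start="E"):
    """Return the productions applied, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]             # the top of the stack is the end of the list
    ip = 0
    output = []
    while True:
        X, a = stack[-1], tokens[ip]
        if X == "$" and a == "$":
            return output            # stack and input are both exhausted
        if X not in NONTERMINALS:    # X is a terminal or $
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()              # match
            ip += 1
        else:
            rhs = TABLE.get((X, a))
            if rhs is None:
                raise SyntaxError(f"no entry for M[{X}, {a}]")
            stack.pop()              # expand: push the right-hand side reversed
            stack.extend(reversed(rhs))
            output.append(f"{X} -> {' '.join(rhs) or 'ε'}")
```

For the token list id + id * id the function returns the eleven productions of a leftmost derivation, beginning with E -> T E' and ending with E' -> ε.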
46
Constructing Parsing Table
Input. Grammar G.
Output. Parsing Table M.
Method.
1. For each production A → α, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each
symbol b in FOLLOW(A).
4. Make each undefined entry of M be error.
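A minimal Python sketch of this method, applied to the expression grammar E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id. The FIRST and FOLLOW sets are entered by hand, each table entry is kept as a list so that multiply-defined entries stay visible, and missing keys play the role of error entries. The names build_table and first_of_string are illustrative.

```python
EPS = "ε"

GRAMMAR = {                      # [] stands for an ε-production
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols):
    """FIRST of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})   # a terminal's FIRST set is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table(grammar):
    table = {}
    for head, productions in grammar.items():       # step 1
        for rhs in productions:
            first = first_of_string(rhs)
            lookaheads = first - {EPS}               # step 2
            if EPS in first:                         # step 3
                lookaheads |= FOLLOW[head]
            for a in lookaheads:
                table.setdefault((head, a), []).append(rhs)
    return table                                     # step 4: absent keys are errors


TABLE = build_table(GRAMMAR)
CONFLICTS = {key: rhss for key, rhss in TABLE.items() if len(rhss) > 1}
```

CONFLICTS is empty for this grammar, so no entry is multiply defined.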
47
An Example
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = { ), $ }
FOLLOW(E') = { ), $ }
FOLLOW(T) = { +, ), $ }, FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
      id          +            *            (           )          $
E     E → TE'                               E → TE'
E'                E' → +TE'                             E' → ε     E' → ε
T     T → FT'                               T → FT'
T'                T' → ε       T' → *FT'                T' → ε     T' → ε
F     F → id                                F → (E)
48
An Example
Stack       Input            Output
$E          id + id * id$    E → TE'
$E'T        id + id * id$    T → FT'
$E'T'F      id + id * id$    F → id
$E'T'id     id + id * id$
$E'T'       + id * id$       T' → ε
$E'         + id * id$       E' → +TE'
$E'T+       + id * id$
$E'T        id * id$         T → FT'
$E'T'F      id * id$         F → id
$E'T'id     id * id$
$E'T'       * id$            T' → *FT'
$E'T'F*     * id$
$E'T'F      id$              F → id
$E'T'id     id$
$E'T'       $                T' → ε
$E'         $                E' → ε
$           $
(Output is the production applied in that configuration; rows without output are matches. The top of the stack is at the right.)
49
LL(1) Grammars
A grammar is an LL(1) grammar if its LL(1)
parsing table has no multiply-defined
entries
50
A Counter Example
S → i E t S S' | a
S' → e S | ε
E → b
FOLLOW(S) = {$, e}
FOLLOW(S') = {$, e}
FOLLOW(E) = {t}
      a        b        e                  i                 t    $
S     S → a                                S → i E t S S'
S'                      S' → e S                                  S' → ε
                        S' → ε
E              E → b
The entry M[S', e] is multiply defined.
51
LL(1) Grammars
A grammar G is LL(1) iff whenever A → α | β
are two distinct productions of G, the
following conditions hold:
– For no terminal a do both α and β derive strings
beginning with a.
FIRST(α) ∩ FIRST(β) = ∅
– At most one of α and β can derive the empty string.
– If β ⇒* ε, then α does not derive any string
beginning with a terminal in FOLLOW(A).
FIRST(α) ∩ FOLLOW(A) = ∅
52
Left Recursion
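A grammar is left recursive if it has a
nonterminal A such that A ⇒+ A α
A → A α | β
can be rewritten as
A → β R
R → α R | ε
[Figure: parse trees for β α α α under both grammars; with A → A α | β the tree grows down its left edge through A, with A → β R and R → α R | ε it grows down its right edge through R]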
53
Direct Left Recursion
A → A α1 | A α2 | ... | A αm | β1 | β2 | ... | βn
A → β1 A' | β2 A' | ... | βn A'
A' → α1 A' | α2 A' | ... | αm A' | ε
54
An Example
E → E + T | T
T → T * F | F
F → ( E ) | id
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
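The rewriting of direct left recursion is mechanical enough to sketch in a few lines of Python. Here a production is a list of symbols, an empty list stands for ε, and the new nonterminal is named by appending a prime; the function name and this encoding are assumptions of the example.

```python
def eliminate_direct_left_recursion(head, productions):
    """Rewrite A -> A α1 | ... | A αm | β1 | ... | βn without left recursion.

    Returns a dict mapping each nonterminal to its new list of productions.
    """
    alphas = [rhs[1:] for rhs in productions if rhs and rhs[0] == head]
    betas = [rhs for rhs in productions if not rhs or rhs[0] != head]
    if not alphas:
        return {head: productions}       # nothing to do
    new = head + "'"                     # assumed to be an unused name
    return {
        head: [beta + [new] for beta in betas],             # A  -> βi A'
        new: [alpha + [new] for alpha in alphas] + [[]],    # A' -> αi A' | ε
    }
```

Applied to E → E + T | T this yields E → T E' and E' → + T E' | ε, as in the example; productions without left recursion, such as those of F, come back unchanged.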
55
Indirect Left Recursion
S → A a | b
A → A c | S d | ε
S ⇒ A a ⇒ S d a
A → A c | A a d | b d | ε
S → A a | b
A → b d A' | A'
A' → c A' | a d A' | ε
56
Indirect Left Recursion
Input. Grammar G with no cycles (derivations of the form
A ⇒+ A) or ε-productions (productions of the form A → ε).
Output. An equivalent grammar with no left recursion.
1. Arrange the nonterminals in some order A1, A2, ..., An
2. for i := 1 to n do begin
     for j := 1 to i - 1 do begin
       replace each production of the form Ai → Aj γ
       by the productions Ai → δ1 γ | δ2 γ | ... | δk γ
       where Aj → δ1 | δ2 | ... | δk are all the
       current Aj-productions;
     end
     eliminate direct left recursion among Ai-productions
   end
57
Left Factoring
Two alternatives of a nonterminal A have a
nontrivial common prefix if α ≠ ε, and
A → α β1 | α β2
A → α A'
A' → β1 | β2
58
An Example
S → i E t S | i E t S e S | a
E → b
S → i E t S S' | a
S' → e S | ε
E → b
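One round of left factoring can be sketched in Python as below. The sketch groups the alternatives of a nonterminal by their first symbol and factors out the longest prefix shared by each group; it performs a single round only, so a grammar may need several passes. The list encoding (an empty list for ε) and the names left_factor and common_prefix are assumptions of this example.

```python
def common_prefix(a, b):
    """Longest common prefix of two symbol lists."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def left_factor(head, productions):
    """Factor alternatives of `head` that start with the same symbol."""
    groups = {}
    for rhs in productions:
        groups.setdefault(rhs[0] if rhs else None, []).append(rhs)
    result = {head: []}
    for first, alts in groups.items():
        if first is None or len(alts) == 1:
            result[head].extend(alts)        # nothing to factor in this group
            continue
        prefix = alts[0]
        for rhs in alts[1:]:
            prefix = common_prefix(prefix, rhs)
        new = head + "'" * len(result)       # S', S'', ...: one fresh name per group
        result[head].append(prefix + [new])              # A  -> α A'
        result[new] = [rhs[len(prefix):] for rhs in alts]  # A' -> β1 | β2 | ...
    return result
```

For S → i E t S | i E t S e S | a the function produces S → i E t S S' | a together with S' → ε | e S, the same grammar as in the example up to the order of the alternatives.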
59
Error Recovery
Panic mode: skip tokens until a token in a set
of synchronizing tokens appears
1. If a terminal on the stack cannot be matched, pop the
terminal
2. Use FOLLOW(A) as the sync set for A (pop A)
3. Use the first set of a higher construct as the sync set
for A
4. Use FIRST(A) as the sync set for A
5. Use the production deriving ε as the default for A
60
An Example
E → T E'
E' → + T E' | ε
T → F T'
T' → * F T' | ε
F → ( E ) | id
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
61
An Example
      id          +            *            (           )          $
E     E → TE'                               E → TE'     sync       sync
E'                E' → +TE'                             E' → ε     E' → ε
T     T → FT'     sync                      T → FT'     sync       sync
T'                T' → ε       T' → *FT'                T' → ε     T' → ε
F     F → id      sync         sync         F → (E)     sync       sync
(sync marks a synchronizing entry obtained from FOLLOW(A), as in rule 2)
62
An Example
Stack       Input            Output
$E          ) id * + id$     error, skip )
$E          id * + id$       E → TE'
$E'T        id * + id$       T → FT'
$E'T'F      id * + id$       F → id
$E'T'id     id * + id$
$E'T'       * + id$          T' → *FT'
$E'T'F*     * + id$
$E'T'F      + id$            error, M[F, +] = sync
$E'T'       + id$            F has been popped; T' → ε
$E'         + id$            E' → +TE'
$E'T+       + id$
$E'T        id$              T → FT'
$E'T'F      id$              F → id
$E'T'id     id$
$E'T'       $                T' → ε
$E'         $                E' → ε
$           $
(Output is the production applied or the recovery action taken in that configuration; rows without output are matches.)
63
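Words of Encouragement
Fan Chi asked about benevolence.
The Master said: "Love others."
The Master said: "People's faults reflect the kind of person they are.
Observe the faults, and you will know the benevolence." -- The Analects (論語)
The purpose of life is to seek happiness.
-- The Dalai Lama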
64
Bottom-Up Parsing
Construct a parse tree from the leaves to the
root using rightmost derivation in reverse
S → a A B e
A → A b c | b
B → d
input: abbcde
abbcde ⇐rm aAbcde ⇐rm aAde ⇐rm aABe ⇐rm S
[Figure: the parse tree built from the leaves a b b c d e upward, one reduction per step: A → b, A → A b c, B → d, S → a A B e]
65
Handles
A handle of a right-sentential form γ
consists of
– a production A → β
– a position of γ where β can be replaced by A to
produce the previous right-sentential form in a
rightmost derivation of γ
abbcde ⇐rm aAbcde    (handle: A → b)
aAbcde ⇐rm aAde      (handle: A → A b c)
aAde ⇐rm aABe        (handle: B → d)
aABe ⇐rm S           (handle: S → a A B e)
66
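Handle Pruning
α β ω ⇐rm α A ω ⇐*rm S
[Figure: a parse tree with root S in which the subtree rooted at A has frontier β, with α to its left and ω to its right]
The string ω to the right of the handle contains only
terminals
A is the bottommost leftmost interior node with all
its children in the tree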
67
An Example
[Figure: the parse tree of abbcde pruned one handle at a time; the frontier shrinks from a b b c d e to a A b c d e, a A d e, a A B e, and finally S]
68
Shift-Reduce Parsing
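[Figure: model of a shift-reduce parser: a parsing driver reads the input (ending in $), keeps a stack (starting from $) with the handle on top, consults the parsing table, and produces output]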
69
Stack Operations
Shift: shift the next input symbol onto the
top of the stack
Reduce: replace the handle at the top of the
stack with the corresponding nonterminal
Accept: announce successful completion of
the parsing
Error: call an error recovery routine
70
An Example
Action  Stack     Input
S       $         abbcde$
S       $a        bbcde$
R       $ab       bcde$
S       $aA       bcde$
S       $aAb      cde$
R       $aAbc     de$
S       $aA       de$
R       $aAd      e$
S       $aAB      e$
R       $aABe     $
A       $S        $
(S = shift, R = reduce, A = accept; the action is the one taken in that configuration)
71
Shift/Reduce Conflict
stmt → if expr then stmt
| if expr then stmt else stmt
| other
Stack: $ - - - if expr then stmt
Input: else - - - $
Shift ⇒ if expr then stmt else stmt
Reduce ⇒ if expr then stmt
72
Reduce/Reduce Conflict
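stmt → id ( para_list )
stmt → expr := expr
para_list → para_list , para
para_list → para
para → id
expr → id ( expr_list )
expr → id
expr_list → expr_list , expr
expr_list → expr
Stack: $ - - - id ( id
Input: , id ) - - - $
The id on top of the stack can be reduced by either para → id or expr → id.
If the lexical analyzer returns a distinct token procid for procedure names, the choice becomes clear:
Stack: $ - - - procid ( id
Input: , id ) - - - $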
73
LR(k) Parsing
The L stands for scanning the input from
left to right
The R stands for constructing a rightmost
derivation in reverse
The k stands for the number of lookahead
input symbols used to make parsing
decisions
74
LR Parsing
The LR parsing algorithm
Constructing SLR(1) parsing tables
Constructing LR(1) parsing tables
Constructing LALR(1) parsing tables
75
Model of an LR Parser
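[Figure: model of an LR parser: the parsing driver reads the input (ending in $) and keeps a stack $ S0 ... Xm-1 Sm-1 Xm Sm of alternating grammar symbols Xi and states Si, with Sm on top; it consults the parsing table, which is split into an Action part and a Goto part, and produces output]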
76
An Example
(1) E → E + T
(2) E → T
(3) T → T * F
(4) T → F
(5) F → ( E )
(6) F → id
State   Action                                Goto
        id    +     *     (     )     $       E    T    F
0       s5                s4                  1    2    3
1             s6                      acc
2             r2    s7          r2    r2
3             r4    r4          r4    r4
4       s5                s4                  8    2    3
5             r6    r6          r6    r6
6       s5                s4                       9    3
7       s5                s4                            10
8             s6                s11
9             r1    s7          r1    r1
10            r3    r3          r3    r3
11            r5    r5          r5    r5
77
An Example
Action  Stack                   Input
s5      $0                      id + id * id $
r6      $0 id5                  + id * id $
r4      $0 F3                   + id * id $
r2      $0 T2                   + id * id $
s6      $0 E1                   + id * id $
s5      $0 E1 +6                id * id $
r6      $0 E1 +6 id5            * id $
r4      $0 E1 +6 F3             * id $
s7      $0 E1 +6 T9             * id $
s5      $0 E1 +6 T9 *7          id $
r6      $0 E1 +6 T9 *7 id5      $
r3      $0 E1 +6 T9 *7 F10      $
r1      $0 E1 +6 T9             $
acc     $0 E1                   $
78
LR Parsing Driver
push $s0 onto the stack, where s0 is the initial state
set ip to point to the first symbol of w$;
repeat
  let s be the top state on the stack and a the symbol pointed to by ip;
  if action[s, a] == shift s’ then
    push a and s’ onto the stack and advance ip
  else if action[s, a] == reduce A → β then
    pop 2 * | β | symbols off the stack; s’ = goto[top(), A];
    push A and s’ onto the stack
  else if action[s, a] == accept then
    return
  else error
until false
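The driver can be sketched in Python with the SLR(1) table for the expression grammar (1) E → E + T, (2) E → T, (3) T → T * F, (4) T → F, (5) F → ( E ), (6) F → id. Unlike the pseudocode, this sketch stacks states only, so a reduction by A → β pops |β| entries rather than 2 * |β|; the table encoding and the name lr_parse are assumptions of the example.

```python
# Head and right-hand-side length of productions (1) to (6).
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3),
               4: ("T", 1), 5: ("F", 3), 6: ("F", 1)}

ACTION = {
    0: {"id": "s5", "(": "s4"},
    1: {"+": "s6", "$": "acc"},
    2: {"+": "r2", "*": "s7", ")": "r2", "$": "r2"},
    3: {"+": "r4", "*": "r4", ")": "r4", "$": "r4"},
    4: {"id": "s5", "(": "s4"},
    5: {"+": "r6", "*": "r6", ")": "r6", "$": "r6"},
    6: {"id": "s5", "(": "s4"},
    7: {"id": "s5", "(": "s4"},
    8: {"+": "s6", ")": "s11"},
    9: {"+": "r1", "*": "s7", ")": "r1", "$": "r1"},
    10: {"+": "r3", "*": "r3", ")": "r3", "$": "r3"},
    11: {"+": "r5", "*": "r5", ")": "r5", "$": "r5"},
}

GOTO = {(0, "E"): 1, (0, "T"): 2, (0, "F"): 3,
        (4, "E"): 8, (4, "T"): 2, (4, "F"): 3,
        (6, "T"): 9, (6, "F"): 3, (7, "F"): 10}


def lr_parse(tokens):
    """Return the sequence of actions taken, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    states = [0]                 # state stack; grammar symbols are left implicit
    ip = 0
    trace = []
    while True:
        act = ACTION[states[-1]].get(tokens[ip])
        if act is None:
            raise SyntaxError(f"unexpected {tokens[ip]!r} in state {states[-1]}")
        trace.append(act)
        if act == "acc":
            return trace
        if act[0] == "s":        # shift: push the new state, advance the input
            states.append(int(act[1:]))
            ip += 1
        else:                    # reduce by A -> β: pop |β| states, then goto on A
            head, length = PRODUCTIONS[int(act[1:])]
            del states[len(states) - length:]
            states.append(GOTO[(states[-1], head)])
```

Note that a reduction does not advance the input pointer; only a shift consumes a token.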
79
LR(0) Items
• An LR(0) item of a grammar G is a
production of G with a dot at some position of
the right-hand side, A → α • β
• The production A → X Y Z yields the
following four LR(0) items
A → • X Y Z, A → X • Y Z,
A → X Y • Z, A → X Y Z •
• An LR(0) item represents a state in an NPDA
indicating how much of a production we have
seen at a given point in the parsing process
80
From CFG to NPDA
• The state A → α • B β will go to the state B →
• γ via an edge of the empty string ε
• The state A → α • a β will go to the state A →
α a • β via an edge of terminal a (a shift)
• The state A → α • will cause a reduction on
seeing a terminal in FOLLOW(A)
• The state A → α • B β will go to the state A →
α B • β via an edge of nonterminal B (after a
reduction)
81
An Example
Augmented grammar: makes it easier to identify
the accepting state
1. E’ → E
2. E → E + T
3. E → T
4. T → T * F
5. T → F
6. F → ( E )
7. F → id
82
An Example
[Figure: the NPDA for the augmented expression grammar. Its states are the LR(0) items, from E’ → • E to F → id •; ε-edges lead from an item with the dot before a nonterminal to the initial items of that nonterminal, and edges labeled with a grammar symbol move the dot past that symbol.]
83
From NPDA to DPDA
• There are two functions performed on sets
of LR(0) items (DPDA states)
• The function closure(I) adds more items to I
when there is a dot to the left of a
nonterminal (corresponding to ε edges)
• The function goto(I, X) moves the dot past
the symbol X in all items in I that contain X
(corresponding to non-ε edges)
84
The Closure Function
function closure(I);
begin
  J := I;
  repeat
    for each item A → α • B β in J and
        each production B → γ of G such that
        B → • γ is not in J do
      J := J ∪ { B → • γ }
  until no more items can be added to J;
  return J
end
85
An Example
1. E’ → E
2. E → E + T
3. E → T
4. T → T * F
5. T → F
6. F → ( E )
7. F → id
s0 = E’ → • E,
I0 = closure({s0 }) =
{ E’ → • E,
E → • E + T,
E → • T,
T → • T * F,
T → • F,
F → • ( E ),
F → • id }
86
The Goto Function
function goto(I, X);
begin
  set J to the empty set
  for any item A → α • X β in I do
    add A → α X • β to J
  return closure(J)
end
87
An Example
I0 = {E’ → • E, E → • E + T, E → • T,
T → • T * F, T → • F, F → • ( E ),
F → • id }
goto(I0 , E)
= closure({E’ → E •, E → E • + T })
= {E’ → E •, E → E • + T }
88
Subset Construction
function items(G’);
begin
  C := {closure({S’ → • S})}
  repeat
    for each set of items I in C and each symbol X do
      J := goto(I, X)
      if J is not empty and not in C then
        C := C ∪ {J}
  until no more sets of items can be added to C
  return C
end
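The closure, goto and items functions for LR(0) items can be sketched in Python as follows, using the augmented expression grammar E’ → E, E → E + T | T, T → T * F | F, F → ( E ) | id. An item is encoded as a pair (production index, dot position); this encoding and the order in which states are discovered are choices of the sketch, so the state numbers need not coincide with the ones used on the slides.

```python
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}
SYMBOLS = ["E", "T", "F", "id", "+", "*", "(", ")"]


def closure(items):
    """Add B -> • γ for every item with the dot in front of nonterminal B."""
    result = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (head, _) in enumerate(GRAMMAR):
                if head == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(items, symbol):
    """Move the dot past `symbol` wherever possible, then take the closure."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


def items():
    """Canonical collection of sets of LR(0) items."""
    collection = [closure({(0, 0)})]
    for state in collection:             # the list grows while it is traversed
        for symbol in SYMBOLS:
            target = goto(state, symbol)
            if target and target not in collection:
                collection.append(target)
    return collection


C = items()
```

For this grammar the collection C contains 12 sets of items, and the first set has the 7 items of I0.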
89
An Example
1. E’ → E
2. E → E + T
3. E → T
4. T → T * F
5. T → F
6. F → ( E )
7. F → id
90
I0: E’ → • E
    E → • E + T
    E → • T
    T → • T * F
    T → • F
    F → • ( E )
    F → • id
goto(I0, E) =
I1: E’ → E •
    E → E • + T
goto(I0, T) =
I2: E → T •
    T → T • * F
goto(I0, F) =
I3: T → F •
goto(I0, ‘(’) =
I4: F → ( • E )
    E → • E + T
    E → • T
    T → • T * F
    T → • F
    F → • ( E )
    F → • id
goto(I0, id) =
I5: F → id •
goto(I1, ‘+’) =
I6: E → E + • T
    T → • T * F
    T → • F
    F → • ( E )
    F → • id
goto(I2, ‘*’) =
I7: T → T * • F
    F → • ( E )
    F → • id
goto(I4, E) =
I8: F → ( E • )
    E → E • + T
goto(I6, T) =
I9: E → E + T •
    T → T • * F
goto(I7, F) =
I10: T → T * F •
goto(I8, ‘)’) =
I11: F → ( E ) •
91
An Example
[Figure: transition diagram of the DFA whose states 0 to 11 are the item sets I0 to I11, with edges labeled by the grammar symbols E, T, F, id, +, *, ( and ) according to the goto function]
92
SLR(1) Parsing Table Generation
procedure SLR(G’);
begin
  for each state I in items(G’) do begin
    if A → α • a β in I and goto(I, a) = J for a terminal a then
      action[I, a] = “shift J”
    if A → α • in I and A ≠ S’ then
      action[I, a] = “reduce A → α” for all a in Follow(A)
    if S’ → S • in I then action[I, $] = “accept”
    if A → α • X β in I and goto(I, X) = J for a nonterminal X
      then goto[I, X] = J
  end
  all other entries in action and goto are made error
end
93
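An Example
State   Action                                Goto
        id    +     *     (     )     $       E    T    F
0       s5                s4                  1    2    3
1             s6                      a
2             r3    s7          r3    r3
3             r5    r5          r5    r5
4       s5                s4                  8    2    3
5             r7    r7          r7    r7
6       s5                s4                       9    3
7       s5                s4                            10
8             s6                s11
9             r2    s7          r2    r2
10            r4    r4          r4    r4
11            r6    r6          r6    r6
(a = accept; reductions are numbered by the productions of the augmented grammar)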
94
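Words of Encouragement
The Master said: "Only the benevolent can truly like people and truly dislike people."
The Master said: "Resolute scholars and benevolent people do not seek to live at the expense of benevolence;
they will even give up their lives to fulfill it."
The Master said: "If one's heart is set on benevolence, one will be free of evil."
-- The Analects (論語)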
95
LR(1) Items
• An LR(1) item of a grammar G is a pair,
( A → α • β, a ), of an LR(0) item A → α • β
and a lookahead symbol a
• The lookahead has no effect in an LR(1) item
of the form ( A → α • β, a ), where β is not ε
• An LR(1) item of the form ( A → α •, a )
calls for a reduction by A → α only if the next
input symbol is a
96
The Closure Function
function closure(I);
begin
  J := I;
  repeat
    for each item (A → α • B β, a) in J and
        each production B → γ of G and
        each b ∈ FIRST(β a) such that
        (B → • γ, b) is not in J do
      J := J ∪ { (B → • γ, b) }
  until no more items can be added to J;
  return J
end
97
The Goto Function
function goto(I, X);
begin
  set J to the empty set
  for any item (A → α • X β, a) in I do
    add (A → α X • β, a) to J
  return closure(J)
end
98
Subset Construction
function items(G’);
begin
  C := {closure({(S’ → • S, $)})}
  repeat
    for each set of items I in C and each symbol X do
      J := goto(I, X)
      if J is not empty and not in C then
        C := C ∪ {J}
  until no more sets of items can be added to C
  return C
end
99
An Example
1. S’ → S
2. S → C C
3. C → c C
4. C → d
100
An Example
I0: closure({(S’ → • S, $)}) =
(S’ → • S, $)
(S → • C C, $)
(C → • c C, c/d)
(C → • d, c/d)
I1: goto(I0, S) = (S’ → S •, $)
I2: goto(I0, C) =
(S → C • C, $)
(C → • c C, $)
(C → • d, $)
I3: goto(I0, c) =
(C → c • C, c/d)
(C → • c C, c/d)
(C → • d, c/d)
I4: goto(I0, d) =
(C → d •, c/d)
I5: goto(I2, C) =
(S → C C •, $)
101
An Example
I6: goto(I2, c) =
(C → c • C, $)
(C → • c C, $)
(C → • d, $)
I7: goto(I2, d) =
(C → d •, $)
goto(I3, c) = I3
goto(I3, d) = I4
I8: goto(I3, C) =
(C → c C •, c/d)
goto(I6, c) = I6
goto(I6, d) = I7
I9: goto(I6, C) =
(C → c C •, $)
102
LR(1) Parsing Table Generation
procedure LR(G’);
begin
  for each state I in items(G’) do begin
    if (A → α • a β, b) in I and goto(I, a) = J for a terminal a then
      action[I, a] = “shift J”
    if (A → α •, a) in I and A ≠ S’ then
      action[I, a] = “reduce A → α”
    if (S’ → S •, $) in I then action[I, $] = “accept”
    if (A → α • X β, a) in I and goto(I, X) = J for a nonterminal X
      then goto[I, X] = J
  end
  all other entries in action and goto are made error
end
103
An Example
State   Action              Goto
        c     d     $       S    C
0       s3    s4            1    2
1                   a
2       s6    s7                 5
3       s3    s4                 8
4       r4    r4
5                   r2
6       s6    s7                 9
7                   r4
8       r3    r3
9                   r3
(a = accept)
104
The Core of LR(1) Items
• The core of a set of LR(1) items is the set of
their first components (i.e., LR(0) items)
• The core of the set of LR(1) items
{ (C → c • C, c/d),
(C → • c C, c/d),
(C → • d, c/d) }
is { C → c • C,
C → • c C,
C → • d }
105
Merging Cores
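I3: { (C → c • C, c/d)
(C → • c C, c/d)
(C → • d, c/d) }
I6: { (C → c • C, $)
(C → • c C, $)
(C → • d, $) }
I4: { (C → d •, c/d) }
I7: { (C → d •, $) }
I8: { (C → c C •, c/d) }
I9: { (C → c C •, $) }
I3 and I6 have the same core and merge into I36; likewise I4 and I7 merge into I47, and I8 and I9 into I89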
106
LALR(1) Parsing Table Generation
procedure LALR(G’);
begin
  for each state I in mergeCore(items(G’)) do begin
    if (A → α • a β, b) in I and goto(I, a) = J for a terminal a then
      action[I, a] = “shift J”
    if (A → α •, a) in I and A ≠ S’ then
      action[I, a] = “reduce A → α”
    if (S’ → S •, $) in I then action[I, $] = “accept”
    if (A → α • X β, a) in I and goto(I, X) = J for a nonterminal X
      then goto[I, X] = J
  end
  all other entries in action and goto are made error
end
107
An Example
State   Action              Goto
        c     d     $       S    C
0       s36   s47           1    2
1                   a
2       s36   s47                5
36      s36   s47                89
47      r4    r4    r4
5                   r2
89      r3    r3    r3
(a = accept)
108
LR Grammars
• A grammar is SLR(1) iff its SLR(1) parsing
table has no multiply-defined entries
• A grammar is LR(1) iff its LR(1) parsing
table has no multiply-defined entries
• A grammar is LALR(1) iff its LALR(1)
parsing table has no multiply-defined
entries
109
Hierarchy of Grammar Classes
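[Figure: nested classes of grammars. Ambiguous grammars lie outside the unambiguous grammars; within the unambiguous grammars, LR(k) contains LL(k) and LR(1), LR(1) contains LALR(1), LALR(1) contains SLR(1), and LL(k) contains LL(1).]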
110
Hierarchy of Grammar Classes
• Why LL(k) ⊂ LR(k)?
• Why SLR(k) ⊂ LALR(k) ⊂ LR(k)?
111
LL(k) vs. LR(k)
• For a grammar to be LL(k), we must be able
to recognize the use of a production by
seeing only the first k symbols of what its
right-hand side derives
• For a grammar to be LR(k), we must be able
to recognize the use of a production by
having seen all of what is derived from its
right-hand side with k more symbols of
lookahead
112
LALR(k) vs. LR(k)
• The merge of the sets of LR(1) items having the
same core does not introduce shift/reduce conflicts
• Suppose there is a shift/reduce conflict on
lookahead a in the merged set because of
1. (A → α •, a)
2. (B → β • a γ, b)
• Then some set of items has item (A → α •, a), and
since the cores of all sets merged are the same, it
must have an item (B → β • a γ, c) for some c
• But then this set has the same shift/reduce conflict
on a
113
LALR(k) vs. LR(k)
• The merge of the sets of LR(1) items having the
same core may introduce reduce/reduce conflicts
• As an example, consider the grammar
1. S’ → S
2. S → a A d | a B e | b A e | b B d
3. A → c
4. B → c
that generates acd, ace, bce, bcd
• The set {(A → c •, d), (B → c •, e)} is valid for acx
• The set {(A → c •, e), (B → c •, d)} is valid for bcx
• But the union {(A → c •, d/e), (B → c •, d/e)}
generates a reduce/reduce conflict
114
SLR(k) vs. LALR(k)
1. S’ → S
2. S → L = R
3. S → R
4. L → * R
5. L → id
6. R → L
115
SLR(k) vs. LALR(k)
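I0: closure({S’ → • S}) =
S’ → • S
S → • L = R
S → • R
L → • * R
L → • id
R → • L
I1: goto(I0, S) = S’ → S •
I2: goto(I0, L) =
S → L • = R
R → L •
I3: goto(I0, R) =
S → R •
I4: goto(I0, *) =
L → * • R
R → • L
L → • * R
L → • id
I5: goto(I0, id) =
L → id •
FOLLOW(R) = {=, $}
Since = is in FOLLOW(R), the SLR(1) table for I2 on input = holds both a shift (from S → L • = R) and a reduction by R → L: a shift/reduce conflict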
116
SLR(k) vs. LALR(k)
I6: goto(I2, =) =
S → L = • R
R → • L
L → • * R
L → • id
I7: goto(I4, R) =
L → * R •
I8: goto(I4, L) =
R → L •
I9: goto(I6, R) =
S → L = R •
117
SLR(k) vs. LALR(k)
I0: closure({(S’ → • S, $)}) =
(S’ → • S, $)
(S → • L = R, $)
(S → • R, $)
(L → • * R, =/$)
(L → • id, =/$)
(R → • L, $)
I1: goto(I0, S) = (S’ → S •, $)
I2: goto(I0, L) =
(S → L • = R, $)
(R → L •, $)
I3: goto(I0, R) =
(S → R •, $)
I4: goto(I0, *) =
(L → * • R, =/$)
(R → • L, =/$)
(L → • * R, =/$)
(L → • id, =/$)
I5: goto(I0, id) =
(L → id •, =/$)
118
SLR(k) vs. LALR(k)
I6: goto(I2, =) =
(S → L = • R, $)
(R → • L, $)
(L → • * R, $)
(L → • id, $)
I7: goto(I4, R) =
(L → * R •, =/$)
I8: goto(I4, L) =
(R → L •, =/$)
I9: goto(I6, R) =
(S → L = R •, $)
I10: goto(I6, L) =
(R → L •, $)
I11: goto(I6, *) = (same core as I4)
(L → * • R, $)
(R → • L, $)
(L → • * R, $)
(L → • id, $)
I12: goto(I6, id) = (same core as I5)
(L → id •, $)
I13: goto(I11, R) =
(L → * R •, $)
119
Bison – A Parser Generator
A language for specifying parsers and semantic analyzers
[Figure: lang.y → Bison compiler → lang.tab.c (plus lang.tab.h with the -d option); lang.tab.c → C compiler → a.out; tokens → a.out → syntax tree]
120
Bison Programs
%{
C declarations
%}
Bison declarations
%%
Grammar rules
%%
Additional C code
121
An Example
line → expr ‘\n’
expr → expr ‘+’ term | term
term → term ‘*’ factor | factor
factor → ‘(’ expr ‘)’ | DIGIT
122
An Example
%token DIGIT
%start line
%%
line : expr '\n' {printf("line: expr \\n\n");}
  ;
expr : expr '+' term {printf("expr: expr + term\n");}
  | term {printf("expr: term\n");}
  ;
term : term '*' factor {printf("term: term * factor\n");}
  | factor {printf("term: factor\n");}
  ;
factor : '(' expr ')' {printf("factor: ( expr )\n");}
  | DIGIT {printf("factor: DIGIT\n");}
  ;
123
Functions and Variables
• yyparse(): the parser function
• yylex(): the lexical analyzer function. Bison
recognizes any non-positive value as
indicating the end of the input
• yylval: the attribute value of a token. Its
default type is int, and can be declared to be
multiple types in the first section using
%union {
int ival;
double dval;
}
124
Conflict Resolutions
• A reduce/reduce conflict is resolved by
choosing the production listed first
• A shift/reduce conflict is resolved in favor
of shift
• A mechanism for assigning precedences and
associativities to terminals
125
Precedence and Associativity
• The precedence and associativity of operators are
declared simultaneously
%nonassoc '<'    /* lowest */
%left '+' '-'
%right '^'       /* highest */
• The precedence of a rule is determined by the
precedence of its rightmost terminal
• The precedence of a rule can be modified by
adding %prec <terminal> to its right end
126
An Example
%{
#include <stdio.h>
%}
%token NUMBER
%left '+' '-'
%left '*' '/'
%right UMINUS
%%
127
An Example
line : expr '\n'
  ;
expr : expr '+' expr
  | expr '-' expr
  | expr '*' expr
  | expr '/' expr
  | '-' expr %prec UMINUS
  | '(' expr ')'
  | NUMBER
  ;
128
Error Recovery
• Error recovery is performed via error productions
• An error production is a production containing
the predefined terminal error
• After adding an error production,
A → α B β | α error β
on encountering an error in the middle of B, the
parser pops symbols from its stack until α, shifts
error, and skips input tokens until a token in
FIRST(β)
129
Error Recovery
• The parser can report a syntax error by calling
the user provided function yyerror(char *)
• The parser will suppress the report of another
error message until 3 consecutive tokens have been shifted successfully
• You can resume error reporting immediately by
using the macro yyerrok
• Error productions are used for major
nonterminals
130
An Example
line : expr '\n'
  | error '\n' {yyerror("reenter last line:");
                yyerrok;}
  ;
expr : expr '+' expr
  | expr '*' expr
  | '-' expr %prec UMINUS
  | '(' expr ')'
  | NUMBER
  ;
131
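Words of Encouragement
The Master said: "It is beautiful to dwell among the benevolent. If one chooses not to dwell in benevolence, how can one be called wise?"
The Master said: "Those without benevolence cannot endure hardship for long,
nor can they enjoy comfort for long. The benevolent rest content in benevolence; the wise profit from it."
The Master said: "If one hears the Way in the morning, one may die content in the evening."
-- The Analects (論語)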
132
