Transcript Algorithm

(American Heritage Dict.)
Parse: v.
To break (a sentence) down into its component parts of
speech with an explanation of the form, function, and
syntactical relationship of each part.
the dog loves the cat
× the loves dog the cat
× the cat the dog loves
IT 327, 4/7/2015
The most practical parsers:
Predictive parser: no backtracking.
1. Input (token string)
2. Stacks, parsing table
3. Output (syntax tree, intermediate code)
Two kinds of predictive parsers:
Top-Down:
The syntax tree is built up from the root
Example: LL(1) parser
  Left to right scanning
  Leftmost derivations
  1 symbol look-ahead
Bottom-Up:
The syntax tree is built up from the leaves
Example: LR(1) parser
  Left to right scanning
  Rightmost derivations (traced in reverse)
  1 symbol look-ahead
LL(1) Grammar

1. S → ASb
2. S → C
3. A → a
4. C → cC
5. C → ε

LL(1) Parsing Table ($ is the end-of-file symbol)

      a   b   c   $
S     1   2   2   2
A     3
C         5   4   5

A left-most derivation of aaaccbbb:

S ⇒ ASb
  ⇒ aSb
  ⇒ aASbb
  ⇒ aaSbb
  ⇒ aaASbbb
  ⇒ aaaSbbb
  ⇒ aaaCbbb
  ⇒ aaacCbbb
  ⇒ aaaccCbbb
  ⇒ aaaccbbb
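The way the parsing table drives the left-most derivation can be sketched with an explicit stack. This is a minimal Python sketch, not the course's implementation; the names RULES, TABLE and leftmost_derivation are made up for illustration, symbols are assumed to be single characters, and upper-case letters are assumed to be non-terminals.

```python
# Rules and table copied from the slide; "" stands for the empty string (epsilon).
RULES = {1: ("S", "ASb"), 2: ("S", "C"), 3: ("A", "a"), 4: ("C", "cC"), 5: ("C", "")}
TABLE = {
    ("S", "a"): 1, ("S", "b"): 2, ("S", "c"): 2, ("S", "$"): 2,
    ("A", "a"): 3,
    ("C", "b"): 5, ("C", "c"): 4, ("C", "$"): 5,
}


def leftmost_derivation(text):
    """Return the sentential forms of the left-most derivation of `text`."""
    tokens = text + "$"      # append the end-of-file symbol
    pos = 0                  # index of the look-ahead token
    stack = ["S"]            # the top of the stack is the end of the list
    matched = ""             # terminals consumed so far
    forms = ["S"]
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top.isupper():    # non-terminal: the table picks the rule
            rule = TABLE.get((top, look))
            if rule is None:
                raise SyntaxError(f"no table entry for ({top}, {look})")
            stack.extend(reversed(RULES[rule][1]))
            forms.append(matched + "".join(reversed(stack)))
        elif top == look:    # terminal: match it and advance
            matched += top
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")
    if tokens[pos] != "$":
        raise SyntaxError("input continues after a complete sentence")
    return forms


for form in leftmost_derivation("aaaccbbb"):
    print(form)
```

Each printed line is one sentential form, so the output can be compared step by step with the derivation of aaaccbbb above.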
Recursive-descent Parser

S():
  switch (token) {      // one case for every terminal and the end-of-file symbol
    case a: A(); S(); get(b);
            build S → ASb;
            break;
    case b: C();
            build S → C;
            break;
    case c: C();
            build S → C;
            break;
    case $: C();
            build S → C;
            break;
  }
Recursive-descent Parser

A():
  switch (token) {
    case a: get(a);
            build A → a;
            break;
    case b: error;
            break;
    case c: error;
            break;
    case $: error;
            break;
  }
Recursive-descent Parser

C():
  switch (token) {
    case a: error;
            break;
    case b: build C → ε;
            break;
    case c: get(c); C();
            build C → cC;
            break;
    case $: build C → ε;
            break;
  }
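The three procedures S(), A() and C() can be put together as one small program. The following is a minimal Python sketch under a few assumptions: tokens are single characters, "build" is reduced to recording the rule number, and the names Parser, used and parse are hypothetical.

```python
class Parser:
    """Recursive-descent recognizer for S -> ASb | C, A -> a, C -> cC | epsilon."""

    def __init__(self, text):
        self.tokens = text + "$"   # "$" is the end-of-file symbol
        self.pos = 0
        self.used = []             # rule numbers, in the order they are applied

    @property
    def token(self):
        return self.tokens[self.pos]

    def get(self, expected):
        if self.token != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.token!r}")
        self.pos += 1

    def S(self):
        if self.token == "a":          # rule 1: S -> ASb
            self.used.append(1)
            self.A()
            self.S()
            self.get("b")
        elif self.token in "bc$":      # rule 2: S -> C
            self.used.append(2)
            self.C()
        else:
            raise SyntaxError(f"unexpected {self.token!r}")

    def A(self):
        if self.token != "a":          # the table has no entry for b, c or $
            raise SyntaxError(f"unexpected {self.token!r}")
        self.used.append(3)            # rule 3: A -> a
        self.get("a")

    def C(self):
        if self.token == "c":          # rule 4: C -> cC
            self.used.append(4)
            self.get("c")
            self.C()
        elif self.token in "b$":       # rule 5: C -> epsilon
            self.used.append(5)
        else:
            raise SyntaxError(f"unexpected {self.token!r}")


def parse(text):
    parser = Parser(text)
    parser.S()
    parser.get("$")                    # the whole input must be consumed
    return parser.used


print(parse("aaaccbbb"))   # [1, 3, 1, 3, 1, 3, 2, 4, 4, 5]
```

The printed rule numbers are exactly the rules of the left-most derivation of aaaccbbb, in order. Each branch of an if/elif chain corresponds to one column of the parsing table.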
LL(1) Parsing

      a   b   c   $
S     1   2   2   2
A     3
C         5   4   5

Input: aaaccbbb

Remaining input    Pending calls
aaaccbbb           S();
                   A();S();get(b);
                   get(a);S();get(b);
aaccbbb            S();get(b);
                   A();S();get(b);get(b);
                   get(a);S();get(b);get(b);
accbbb             S();get(b);get(b);
                   A();S();get(b);get(b);get(b);
                   get(a);S();get(b);get(b);get(b);
ccbbb              S();get(b);get(b);get(b);
                   C();get(b);get(b);get(b);
                   get(c);C();get(b);get(b);get(b);
cbbb               C();get(b);get(b);get(b);
                   get(c);C();get(b);get(b);get(b);
bbb                C();get(b);get(b);get(b);
                   get(b);get(b);get(b);
bb                 get(b);get(b);
b                  get(b);
(empty)            (done)
LL(1) Grammar
A grammar that has an LL(1) parsing table,
i.e., a table in which no entry holds more than one rule (no conflict).
LL(1) grammars allow ε-productions.
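Whether a grammar has a conflict-free table can be checked mechanically. The slides do not say how the table is built; the sketch below assumes the usual FIRST/FOLLOW construction, single-character symbols, and "" for ε. The names first_of and ll1_table are illustrative only.

```python
EPS = ""   # the empty string (epsilon)


def first_of(seq, first):
    """FIRST set of a symbol sequence, given the FIRST sets of the non-terminals."""
    out = set()
    for sym in seq:
        f = first.get(sym, {sym})       # a terminal's FIRST set is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)                        # every symbol in seq can vanish
    return out


def ll1_table(rules, start):
    """rules: list of (head, body) pairs, numbered from 1 in list order."""
    heads = {head for head, _ in rules}
    first = {h: set() for h in heads}
    follow = {h: set() for h in heads}
    follow[start].add("$")
    changed = True
    while changed:                      # iterate until nothing grows any more
        changed = False
        for head, body in rules:
            before = len(first[head]), sum(len(s) for s in follow.values())
            first[head] |= first_of(body, first)
            for i, sym in enumerate(body):
                if sym in heads:
                    rest = first_of(body[i + 1:], first)
                    follow[sym] |= rest - {EPS}
                    if EPS in rest:
                        follow[sym] |= follow[head]
            after = len(first[head]), sum(len(s) for s in follow.values())
            changed |= before != after
    table = {}
    for number, (head, body) in enumerate(rules, start=1):
        f = first_of(body, first)
        lookaheads = (f - {EPS}) | (follow[head] if EPS in f else set())
        for t in lookaheads:
            table.setdefault((head, t), []).append(number)
    return table


grammar = [("S", "ASb"), ("S", "C"), ("A", "a"), ("C", "cC"), ("C", "")]
table = ll1_table(grammar, "S")
conflicts = {cell: nums for cell, nums in table.items() if len(nums) > 1}
print(sorted(table.items()))
print("LL(1)" if not conflicts else f"conflicts: {conflicts}")
```

For the five-rule grammar used so far, every cell holds a single rule number and the entries agree with the parsing table shown on the earlier slides, so the sketch reports LL(1).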
Not every CFG is an LL(1) grammar

<stmt> ::= <if-stmt> | s1 | s2
<if-stmt> ::= if <expr> then <stmt> else <stmt>
            | if <expr> then <stmt>
<expr> ::= e1 | e2

if e1 then if e2 then s1 else s2

The else can be attached to either if:

if (a > 2)
    if (b > 1)
        b++;
    else
        a++;

if (a > 2)
    if (b > 1)
        b++;
else
    a++;
The recursive-descent parser does not work for every CFG

1. E → E+T
2. E → T
3. T → T*F
4. T → F
5. F → (E)
6. F → id

Input: id+id*id

E():
  switch (token) {
    case id: E();     // E() calls itself again before any token is consumed
             ...
  }

Left-recursions
A left-recursive grammar

1. A → Aα
2. A → β

Remove left-recursion

1. A → βA’
2. A’ → αA’
3. A’ → ε

[Figure: parse trees before and after removing the left-recursion]
Eliminating left-recursions

Left-recursive grammar:
1. E → E+T
2. E → T
3. T → T*F
4. T → F
5. F → (E)
6. F → id

Equivalent grammar without left-recursion:
1. E → T E’
2. E’ → + T E’
3. E’ → ε
4. T → F T’
5. T’ → * F T’
6. T’ → ε
7. F → (E)
8. F → id
An Algorithm for Eliminating immediate left-recursions

Given a CFG G, let A be one of its non-terminal symbols such that A → Aα

1. Add a new non-terminal symbol A’ to G;
2. For each production A → β such that A is not the 1st symbol in β,
   add A → βA’ to G and remove A → β;
3. For each production A → Aα, replace it by A’ → αA’;
4. Add A’ → ε to G;

Result:
1. A → βA’
2. A’ → αA’
3. A’ → ε
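The four steps can be sketched in a few lines of Python. The representation is an assumption made for illustration: a grammar is a dict from non-terminal to a list of bodies, each body is a tuple of symbols, the empty tuple stands for ε, and the new symbol is named by appending an apostrophe (assumed not to be in use already).

```python
def remove_immediate_left_recursion(grammar, a):
    """Return a copy of `grammar` in which A -> A alpha | beta is rewritten for A = `a`."""
    alphas = [body[1:] for body in grammar[a] if body[:1] == (a,)]   # A -> A alpha
    betas = [body for body in grammar[a] if body[:1] != (a,)]        # A -> beta
    result = dict(grammar)
    if not alphas:                       # nothing to do: A is not left-recursive
        return result
    new = a + "'"                        # step 1: the new non-terminal A'
    result[a] = [beta + (new,) for beta in betas]                    # step 2
    result[new] = [alpha + (new,) for alpha in alphas] + [()]        # steps 3 and 4
    return result


expr = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
for nonterminal in ("E", "T"):
    expr = remove_immediate_left_recursion(expr, nonterminal)

for head, bodies in expr.items():
    for body in bodies:
        print(head, "->", " ".join(body) or "epsilon")
```

Applied to the expression grammar, the sketch yields the eight rules of the left-recursion-free expression grammar (E → T E’, E’ → + T E’, and so on), although the printing order groups rules by non-terminal.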
Indirect left-recursions

1. S → Aa
2. S → b
3. A → Sd
4. A → e

Example: bdada
S ⇒ Aa ⇒ Sda ⇒ Aada ⇒ Sdada ⇒ bdada

[Figure: parse tree for bdada]
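Indirect left-recursions

find all immediate left recursions; if any, remove them
repeat
    remove the last non-terminal symbol Z with rule Z → X…
    find all immediate left recursions; if any, remove them

Rule for removing an immediate left-recursion:
A → Aα, A → β   becomes   A → βA’, A’ → αA’, A’ → ε

Original grammar:
1. S → Aa
2. S → b
3. A → Ac
4. A → Sd
5. A → e

After removing the immediate left-recursion on A:
1. S → A a
2. S → b
3. A → SdA’
4. A → eA’
5. A’ → cA’
6. A’ → ε

After removing A by substituting its rules into S:
1. S → SdA’a
2. S → eA’a
3. S → b
4. A’ → cA’
5. A’ → ε

After removing the immediate left-recursion on S:
1. S → eA’aS’
2. S → bS’
3. S’ → dA’aS’
4. S’ → ε
5. A’ → cA’
6. A’ → ε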
An Algorithm for Eliminating left-recursions

Given a CFG G, let A1, A2, ..., An be its nonterminal symbols

for i := n down to 1 do {
    for j := 1 to i-1 do {
        // find one level of indirection
        For each production Ai → Aj ω do {
            Remove Ai → Aj ω by:
            For each production Aj → γ, add Ai → γ ω to the grammar;
        }
    } // end for j
    Eliminate the immediate left-recursion caused by Ai
} // end for i
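A runnable sketch of this idea follows. Note the assumptions: it uses the common textbook ordering (i running upward through a caller-supplied order, with each Aj already cleaned before it is substituted), which is not the loop direction written in the pseudocode above; the grammar is assumed to have no cycles; and the dict-of-tuples representation and the name eliminate_left_recursion are illustrative.

```python
def eliminate_left_recursion(grammar, order):
    """grammar: dict non-terminal -> list of bodies (tuples); () stands for epsilon."""
    g = {head: list(bodies) for head, bodies in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for body in g[ai]:
                if body[:1] == (aj,):    # Ai -> Aj omega: one level of indirection
                    expanded += [gamma + body[1:] for gamma in g[aj]]
                else:
                    expanded.append(body)
            g[ai] = expanded
        # eliminate the immediate left-recursion caused by Ai
        alphas = [body[1:] for body in g[ai] if body[:1] == (ai,)]
        betas = [body for body in g[ai] if body[:1] != (ai,)]
        if alphas:
            new = ai + "'"               # assumed not to clash with an existing name
            g[ai] = [beta + (new,) for beta in betas]
            g[new] = [alpha + (new,) for alpha in alphas] + [()]
    return g


grammar = {
    "S": [("A", "a"), ("b",)],
    "A": [("A", "c"), ("S", "d"), ("e",)],
}
result = eliminate_left_recursion(grammar, ["A", "S"])
for head in ("S", "S'", "A'"):
    for body in result[head]:
        print(head, "->", "".join(body) or "epsilon")
```

With A processed before S, the rules printed for S, S' and A' are the six rules reached at the end of the worked example on indirect left-recursion. The sketch also keeps the rewritten rules for A, which are no longer reachable from S.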
A Grammar for if statements

1. S → iCtSE
2. S → a
3. E → eS
4. E → ε
5. C → b

      a   b   e     i   t   $
S     2             1
E             3,4           4
C         5

Is it an LL(1) grammar?
Is there an LL(1) parsing table for it?
No!
A Grammar for if statements

1. S → iCtSE
2. S → a
3. E → eS
4. E → ε
5. C → b

      a   b   e     i   t   $
S     2             1
E             3,4           4
C         5

Why is there a conflict?

Input: ibtibtae……

Using rule 4 for the inner E:
   S ⇒ ...
     ⇒ i b t S E…
     ⇒ i b t ibtSE E…
     ⇒ i b t ibtaE E…
4:   ⇒ i b t ibta E…
     ⇒ i b t ibta eS…

Using rule 3 for the inner E:
   S ⇒ ...
     ⇒ i b t S E…
     ⇒ i b t ibtSE E…
     ⇒ i b t ibtaE E…
3:   ⇒ i b t ibtaeS E…
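The conflict reflects a real ambiguity, which can be made concrete by counting the left-most derivations of one sentence. The sketch below is a brute-force counter, not a parser; it assumes single-character symbols and completes the input above with a final a (ibtibtaea) so that it is a full sentence. The names RULES and count_parses are hypothetical.

```python
from functools import lru_cache

# The if-statement grammar; "" stands for the empty string (epsilon).
RULES = {"S": ("iCtSE", "a"), "E": ("eS", ""), "C": ("b",)}


@lru_cache(maxsize=None)
def count_parses(form, text):
    """Number of left-most derivations of `text` from the sentential form `form`."""
    if not form:
        return 1 if not text else 0
    head, rest = form[0], form[1:]
    if head in RULES:                    # expand the left-most non-terminal
        return sum(count_parses(body + rest, text) for body in RULES[head])
    if text and text[0] == head:         # match one terminal
        return count_parses(rest, text[1:])
    return 0


print(count_parses("S", "ibta"))        # 1
print(count_parses("S", "ibtibtaea"))   # 2
```

The two derivations of ibtibtaea correspond to attaching the e-part to the inner or to the outer i. The recursion terminates for this grammar because every rule either starts with a terminal or is an ε-rule.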
A Grammar for if statements

1. S → iCtSE
2. S → a
3. E → eS
4. E → ε
5. C → b

Can we have an unambiguous equivalent grammar for this grammar?
Yes!
In general, No! Some inherently ambiguous languages exist.

Can we write a program to test whether a given grammar is ambiguous?
No!

Can we write a program to get an unambiguous equivalent grammar from
any grammar of a language that is known to be not inherently ambiguous?
No!
Is there an LL(2) Grammar? Yes!

We need to look two symbols ahead in order to
determine which rule should be used.

{ a^m b^n c | m ≥ 1 and n ≥ 0 }

1. S → AB
2. A → aA
3. A → a
4. B → bB
5. B → c

LL(2) Parsing Table

      a   aa   ab   ac   b   c
S     1
A         2    3    3
B                        4   5

Input: a a a a a b b b b c
LL(2) Parsing

1. S → AB
2. A → aA
3. A → a
4. B → bB
5. B → c

LL(2) Parsing Table

      a   aa   ab   ac   b   c
S     1
A         2    3    3
B                        4   5

Remaining input    Pending calls
a a a b c          S();
a a a b c          A();B();
a a a b c          get(a);A();B();
a a b c            A();B();
a a b c            get(a);A();B();
a b c              A();B();
a b c              get(a);B();
b c                B();
b c                get(b);B();
c                  B();
c                  get(c);
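The only place where the second look-ahead symbol matters is A(), which must choose between rules 2 and 3. A minimal Python sketch of the LL(2) recognizer, assuming single-character tokens and padding the input with two end-of-file symbols so that two look-ahead symbols always exist (parse_ll2 is a hypothetical name):

```python
def parse_ll2(text):
    """Recognize a^m b^n c (m >= 1, n >= 0); return the rule numbers applied."""
    tokens = text + "$$"          # padded so that two look-ahead symbols always exist
    pos = 0
    used = []

    def fail():
        raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")

    def get(expected):
        nonlocal pos
        if tokens[pos] != expected:
            fail()
        pos += 1

    def S():
        if tokens[pos] != "a":
            fail()
        used.append(1)            # rule 1: S -> AB
        A()
        B()

    def A():
        look2 = tokens[pos:pos + 2]
        if look2 == "aa":         # rule 2: A -> aA
            used.append(2)
            get("a")
            A()
        elif look2 in ("ab", "ac"):   # rule 3: A -> a
            used.append(3)
            get("a")
        else:
            fail()

    def B():
        if tokens[pos] == "b":    # rule 4: B -> bB
            used.append(4)
            get("b")
            B()
        elif tokens[pos] == "c":  # rule 5: B -> c
            used.append(5)
            get("c")
        else:
            fail()

    S()
    get("$")                      # nothing may follow the final c
    return used


print(parse_ll2("aaabc"))   # [1, 2, 2, 3, 4, 5]
```

The rule sequence printed for aaabc matches the order in which the calls unfold in the trace above.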
Is there an LL(1) grammar equivalent to the
following LL(2) grammar?
Yes

{ a^m b^n c | m ≥ 1 and n ≥ 0 }

LL(2) grammar:
1. S → AB
2. A → aA
3. A → a
4. B → bB
5. B → c

Equivalent LL(1) grammar:
1. S → aAB
2. A → aA
3. A → ε
4. B → bB
5. B → c

Input: a a a a a b b b b c
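A quick sanity check of the claimed equivalence is to enumerate every sentence up to some length from both grammars and compare the two sets. This is a bounded brute-force check, not a proof. It assumes single-character symbols and that every recursive rule adds at least one terminal (true for both grammars here), which keeps the search finite; the name language is illustrative.

```python
def language(rules, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`."""
    found, seen, todo = set(), {start}, [start]
    while todo:
        form = todo.pop()
        # position of the left-most non-terminal, if any
        idx = next((i for i, sym in enumerate(form) if sym in rules), None)
        if idx is None:
            found.add(form)              # only terminals left: a sentence
            continue
        for body in rules[form[idx]]:
            new = form[:idx] + body + form[idx + 1:]
            terminals = sum(1 for sym in new if sym not in rules)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                todo.append(new)
    return found


LL2 = {"S": ["AB"], "A": ["aA", "a"], "B": ["bB", "c"]}
LL1 = {"S": ["aAB"], "A": ["aA", ""], "B": ["bB", "c"]}

same = language(LL2, "S", 6) == language(LL1, "S", 6)
print(same, len(language(LL1, "S", 6)))   # True 15
```

Both grammars produce the same 15 sentences of length at most 6, which is what { a^m b^n c | m ≥ 1 and n ≥ 0 } contains up to that length.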
No left-recursive grammar is an LL(k) grammar
But we can effectively find an equivalent one

Left-recursive grammar:
1. S → S A
2. S → a
3. A → b

S ⇒ SA ⇒ SAA ⇒ SAAA ⇒ SAAAA ⇒ SAAAAA ⇒ aAAAAA ⇒ .... ⇒ abbbbb

Equivalent grammar without left-recursion:
1. S → a S’
2. S’ → A S’
3. S’ → ε
4. A → b

Left-recursive grammar:
1. E → E+T
2. E → T
3. T → T*F
4. T → F
5. F → (E)
6. F → id

Equivalent grammar without left-recursion:
1. E → T E’
2. E’ → + T E’
3. E’ → ε
4. T → F T’
5. T’ → * F T’
6. T’ → ε
7. F → ( E )
8. F → id

Are we happy with this?
Does every LL(2) grammar have an equivalent
LL(1) grammar? No

LL(2) grammar (no equivalent LL(1) grammar):
1. S → aSA
2. S → ε
3. A → abS
4. A → c

LL(k) grammar, k ≥ 2 (no equivalent LL(k-1) grammar):
1. S → aSA
2. S → ε
3. A → a^(k-1) bS
4. A → c

Kurki-Suonio [1969]

LL(1) ⊂ LL(2) ⊂ LL(3) ⊂ ..... ⊂ LL(k) ⊂ LL(k+1) ⊂ ...
LL(k) grammar, k ≥ 2

1. S → aSA
2. S → ε
3. A → a^(k-1) bS
4. A → c

(Kurki-Suonio [1969])
This grammar is unambiguous, as every LL(k) grammar is.
Is there an unambiguous CFG that
is not an LL(k) grammar?
Yes
There exists a DCFL that is not LL(k) -- Stearns [1970]
{ a^n | n ≥ 0 } ∪ { a^n b^n | n ≥ 0 }
LL(1) Parser Implementation

1. E → T E’
2. E’ → + T E’
3. E’ → ε
4. T → F T’
5. T’ → * F T’
6. T’ → ε
7. F → (E)
8. F → n

      n   +   *   (   )   $
E     1           1
E’        2           3   3
T     4           4
T’        6   5       6   6
F     8           7

p.s. Let n be any positive integer less than 32767

Programming Assignment
Details will be announced later.