
Introduction to Compilers
Chapter 7
LL Parsing

Deterministic Top-Down Parsing
::= deterministic selection of production rules to be applied in
top-down syntax analysis.

One pass, no backup
1. Input string is scanned once from left to right.
2. Parsing process is deterministic.

Top-down parsing with no backup
::= deterministic top-down parsing,
called LL parsing:
“Left to right scanning and Left parse”
LL Parsing

How to decide which production is to be applied:
sentential form : ω1 ω2 … ωi-1 X α
input string    : ω1 ω2 … ωi-1 ωi ωi+1 … ωn
When X → α1 | α2 | ... | αk ∈ P,
look at ωi and choose exactly one of the X-productions.


The condition for no backtracking : FIRST and FOLLOW are required.
(=> LL condition)

FIRST(α)
::= the set of terminals that begin the strings derived from α.
if α ⇒* ε, then ε is also in FIRST(α).
⇒ FIRST(A) ::= { a ∈ VT ∪ {ε} | A ⇒* aβ, β ∈ V* }.

Computation of FIRST(X), where X ∈ V.
1) if X ∈ VT, then FIRST(X) = {X}
2) if X ∈ VN and X → aα ∈ P, then FIRST(X) = FIRST(X) ∪ {a}
   if X → ε ∈ P, then FIRST(X) = FIRST(X) ∪ {ε}
3) if X → Y1Y2…Yk ∈ P and Y1Y2…Yi-1 ⇒* ε,
   then FIRST(X) = FIRST(X) ∪ (⋃(j=1..i) FIRST(Yj) − {ε}).
   if Y1Y2…Yk ⇒* ε, then FIRST(X) = FIRST(X) ∪ {ε}.
ex1)
E → TE'
T → FT'
F → (E) | id
E' → +TE' | ε
T' → *FT' | ε
FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}
ex2)
PROGRAM → begin d semi X end
X → d semi X
X → s Y
Y → semi s Y | ε
FIRST(PROGRAM) = {begin}
FIRST(X) = {d, s}
FIRST(Y) = {semi, ε}
Text p.275
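
The three FIRST rules can be tried out with a small fixed-point loop. The following Python sketch is an illustration only, not the textbook's code: the grammar encoding (a dict from nonterminal to a list of right-hand-side tuples, with an empty tuple for an ε-production) and the names compute_first, EX1 and EX2 are assumptions made here.

```python
EPS = "ε"  # marker for the empty string (a convention of this sketch)


def compute_first(grammar):
    """Return FIRST(X) for every nonterminal X of `grammar`.

    `grammar` maps a nonterminal to a list of right-hand sides; each
    right-hand side is a tuple of symbols, and () is an ε-production.
    Any symbol that is not a key of `grammar` is treated as a terminal.
    """
    first = {nt: set() for nt in grammar}

    def first_of(symbol):
        # Rule 1: FIRST of a terminal is the terminal itself.
        return first[symbol] if symbol in grammar else {symbol}

    changed = True
    while changed:  # repeat until no FIRST set grows any more
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = set()
                all_nullable = True
                for sym in rhs:  # rules 2 and 3: scan Y1, Y2, ... left to right
                    f = first_of(sym)
                    new |= f - {EPS}
                    if EPS not in f:  # Yi is not nullable, stop here
                        all_nullable = False
                        break
                if all_nullable:  # every Yi derives ε (or rhs is empty)
                    new.add(EPS)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


EX1 = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
EX2 = {
    "PROGRAM": [("begin", "d", "semi", "X", "end")],
    "X": [("d", "semi", "X"), ("s", "Y")],
    "Y": [("semi", "s", "Y"), ()],
}
first1 = compute_first(EX1)
first2 = compute_first(EX2)
```

Under these assumptions, first1 and first2 hold the same sets as ex1 and ex2 above. The loop simply reapplies the rules until nothing changes, which is the usual way to handle the mutual dependence between the sets.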

FOLLOW(A)
::= the set of terminals that can appear immediately to
the right of A in some sentential form. If A can be
the rightmost symbol in some sentential form, then
$ is in FOLLOW(A).
::= {a ∈ VT ∪ {$} | S ⇒* αAaβ, α, β ∈ V*}.
※ $ is the right end marker of the input.

Computation of FOLLOW(A)
1) FOLLOW(S) = {$}
2) if A → αBβ ∈ P and β ≠ ε,
   then FOLLOW(B) = FOLLOW(B) ∪ (FIRST(β) − {ε})
3) if A → αB ∈ P, or A → αBβ ∈ P and β ⇒* ε,
   then FOLLOW(B) = FOLLOW(B) ∪ FOLLOW(A).
Text p.277

E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

Nullable = { E', T' }

FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E') = {+, ε}
FIRST(T') = {*, ε}

FOLLOW(E) = {), $}
FOLLOW(E') = {), $}
FOLLOW(T) = {+, ), $}
FOLLOW(T') = {+, ), $}
FOLLOW(F) = {*, +, ), $}
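
The FOLLOW rules can be checked against this example in the same way. The Python sketch below is only an illustration: it takes the FIRST sets of the nonterminals as given (copied from the slide), and the names GRAMMAR, first_of_string and compute_follow are assumptions of the sketch.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
# FIRST sets of the nonterminals, taken from the example above.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}


def first_of_string(symbols):
    """FIRST of a symbol string; it contains ε only if every symbol is nullable."""
    result = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})  # FIRST of a terminal is the terminal itself
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)  # also reached for the empty string
    return result


def compute_follow(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)  # rule 1
    changed = True
    while changed:  # repeat until no FOLLOW set grows any more
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                for i, b in enumerate(rhs):
                    if b not in grammar:
                        continue  # FOLLOW is computed for nonterminals only
                    beta = first_of_string(rhs[i + 1:])
                    new = beta - {EPS}  # rule 2
                    if EPS in beta:  # rule 3: β is empty or β ⇒* ε
                        new |= follow[a]
                    if not new <= follow[b]:
                        follow[b] |= new
                        changed = True
    return follow


follow = compute_follow(GRAMMAR, "E")
```

With the FIRST sets above as input, follow should match the five FOLLOW sets listed on the slide.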

Exercise 7.4 (3) - p.307
S → aAa | ε
A → abS | c

LL condition: basic concept
::= no backup condition
::= the condition for deterministic parsing of top-down method.
input          : ω1ω2 ... ωi-1ωi ... ωn
derived string : ω1ω2 ... ωi-1Xα
X → α1 | α2 | ... | αm
⇒ Look at ωi and deterministically select, among the X-productions,
the rule used to expand X.

Definition: for A → α | β ∈ P,
1. FIRST(α) ∩ FIRST(β) = ∅
2. if α ⇒* ε, then FOLLOW(A) ∩ FIRST(β) = ∅
   (that is, if ε ∈ FIRST(α), then FOLLOW(A) ∩ FIRST(β) = ∅)

A → aBc | Bc | dAa
B → bB | ε

FIRST(A) = {a, b, c, d}
FIRST(B) = {b, ε}

Checking the LL condition

FOLLOW(A) = {$, a}
FOLLOW(B) = {c}
1) For A → aBc | Bc | dAa,
FIRST(aBc) = {a}, FIRST(Bc) = {b, c}, FIRST(dAa) = {d}
are pairwise disjoint, so no two alternatives share a first symbol.

2) For B → bB | ε,
FIRST(bB) ∩ FOLLOW(B) = {b} ∩ {c} = ∅
By 1) and 2), the grammar satisfies the LL condition.

Recursive-descent parsing
::= A top-down method that uses a set of recursive procedures to
recognize its input with no backtracking.

Create a procedure for each nonterminal.
ex) G : S → aA | bB
        A → aA | c
        B → bB | d

procedure pS;
begin
    if nextSymbol = ta then
        begin getNextSymbol; pA end
    else if nextSymbol = tb then
        begin getNextSymbol; pB end
    else error
end;

procedure pA;
begin
    if nextSymbol = ta then
        begin getNextSymbol; pA end
    else if nextSymbol = tc then getNextSymbol
    else error
end;

procedure pB; ...

/* main */
begin
    getNextSymbol;
    pS;
    if nextSymbol = '$' then accept else error
end.

ω = aac$
※ procedure call sequence ::= leftmost derivation
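
For readers who want to run the procedures, here is a hedged Python rendering of the same scheme. It is a sketch, not the slide's code: the class and method names are invented here, each input character is treated as one token, and p_b is filled in by analogy with pA because the slide elides pB. The trace list records the productions in call order, which is the leftmost derivation noted above.

```python
class ParseError(Exception):
    """Raised where the slide's procedures call `error`."""


class RecursiveDescentParser:
    def __init__(self, text):
        self.symbols = list(text) + ["$"]  # one character per token, $ appended
        self.pos = 0                       # index of nextSymbol
        self.trace = []                    # productions applied, in call order

    @property
    def next_symbol(self):
        return self.symbols[self.pos]

    def get_next_symbol(self):
        self.pos += 1

    def p_s(self):  # S -> aA | bB
        if self.next_symbol == "a":
            self.trace.append("S -> aA")
            self.get_next_symbol()
            self.p_a()
        elif self.next_symbol == "b":
            self.trace.append("S -> bB")
            self.get_next_symbol()
            self.p_b()
        else:
            raise ParseError(f"unexpected {self.next_symbol!r} in S")

    def p_a(self):  # A -> aA | c
        if self.next_symbol == "a":
            self.trace.append("A -> aA")
            self.get_next_symbol()
            self.p_a()
        elif self.next_symbol == "c":
            self.trace.append("A -> c")
            self.get_next_symbol()
        else:
            raise ParseError(f"unexpected {self.next_symbol!r} in A")

    def p_b(self):  # B -> bB | d (assumed by analogy with p_a)
        if self.next_symbol == "b":
            self.trace.append("B -> bB")
            self.get_next_symbol()
            self.p_b()
        elif self.next_symbol == "d":
            self.trace.append("B -> d")
            self.get_next_symbol()
        else:
            raise ParseError(f"unexpected {self.next_symbol!r} in B")


def parse(text):
    """Main program: call p_s, then require that only $ remains."""
    parser = RecursiveDescentParser(text)
    parser.p_s()
    if parser.next_symbol != "$":
        raise ParseError("input left over after S")
    return parser.trace
```

Calling parse("aac") follows the call sequence pS, pA, pA from the example string, and an input such as "ab" ends in ParseError.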

The main problem in constructing a recursive-descent syntax
analyzer is the choice of productions when a procedure is first
entered. To resolve this problem, we can compute the lookahead
of each production.

LOOKAHEAD of a production

Definition : LOOKAHEAD(A → α)
= FIRST({ω | S ⇒* μAβ ⇒ μαβ ⇒* μω ∈ VT*}).

Meaning :
the set of terminals that can begin a string derived from α,
and if α ⇒* ε, then FOLLOW(A) is added to the set.

Computing formula: LOOKAHEAD(A → X1X2...Xn)
= FIRST(X1X2...Xn) ⊕ FOLLOW(A)
where ⊕ is the ring sum: F ⊕ G = F if ε ∉ F, and (F − {ε}) ∪ G otherwise.

S → aSA | ε
A → c

Nullable Set = {S}

FIRST(S) = {a, ε}
FIRST(A) = {c}

FOLLOW(S) = {$, c}
FOLLOW(A) = {$, c}

LOOKAHEAD(S → aSA) = FIRST(aSA) ⊕ FOLLOW(S) = {a}
LOOKAHEAD(S → ε) = FIRST(ε) ⊕ FOLLOW(S) = {$, c}
LOOKAHEAD(A → c) = FIRST(c) ⊕ FOLLOW(A) = {c}
※ Order of computing LOOKAHEAD :
Nullable => FIRST => FOLLOW => LOOKAHEAD
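
The last step of that order, combining FIRST and FOLLOW into LOOKAHEAD, can be sketched in a few lines of Python. The FIRST and FOLLOW sets are copied from the example rather than computed, and the names ring_sum, first_of_string and lookahead are assumptions of this sketch. The combining operator used is the ring sum: keep FIRST as it is when it has no ε, otherwise drop ε and add FOLLOW.

```python
EPS = "ε"

# Sets taken from the example for S -> aSA | ε, A -> c.
FIRST = {"S": {"a", EPS}, "A": {"c"}}
FOLLOW = {"S": {"$", "c"}, "A": {"$", "c"}}
PRODUCTIONS = [("S", ("a", "S", "A")), ("S", ()), ("A", ("c",))]


def first_of_string(symbols):
    """FIRST of a symbol string; the empty string gives {ε}."""
    result = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})  # FIRST of a terminal is the terminal itself
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)
    return result


def ring_sum(x, y):
    """x if ε is not in x, otherwise (x - {ε}) ∪ y."""
    return set(x) if EPS not in x else (x - {EPS}) | y


def lookahead(lhs, rhs):
    return ring_sum(first_of_string(rhs), FOLLOW[lhs])


lookaheads = {(lhs, rhs): lookahead(lhs, rhs) for lhs, rhs in PRODUCTIONS}
```

Under these inputs the three entries of lookaheads agree with the three LOOKAHEAD sets of the example.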

Definition : A   |  ∈ P,
LOOKAHEAD(A  ) ∩ LOOKAHEAD(A  ) = .


Meaning : for each distinct pair of productions with the same
left-hand side, it can select the unique alternate that
derives a string beginning with the input symbol.
The grammar G is said to be strong LL(1)
if it satisfies the strong LL condition.
ex) G : S  aSA | 
Ac


LOOKAHEAD(S  aSA) = {a}
LOOKAHEAD(S  ) = FOLLOW(S) = {$, c}
LOOKAHEAD(S  aSA) ∩ LOOKAHEAD(S  ) = 
 G는 strong LL(1)이다.

If a grammar is strong LL(1), we can construct a parser for
sentences of the grammar using the following scheme.

Terminal procedure:
a ∈ VT,
procedure pa; /* getNextSymbol => scanner */
begin
if nextSymbol = ta then getNextSymbol
else error
end;
※ getNextSymbol : the routine that plays the role of the scanner; it builds
one token from the input stream and assigns it to the variable nextSymbol.
Text p.284

A ∈ VN,
procedure pA;
var i: integer;
begin
case nextSymbol of
    LOOKAHEAD(A → X1X2...Xm): for i := 1 to m do pXi;
    LOOKAHEAD(A → Y1Y2...Yn): for i := 1 to n do pYi;
    :
    LOOKAHEAD(A → Z1Z2...Zr): for i := 1 to r do pZi;
    LOOKAHEAD(A → ε): ;
otherwise: error
end /* case */
end;
LL Parsing
※ The input buffer contains the string to be parsed,
followed by $.

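Parsing proceeds according to the relationship between the current input
symbol and the stack top symbol.

Initial configuration :
STACK : $S
INPUT : ω$
Parsing table (LL) : determines the parsing action.

[Figure: parsing table with nonterminals as rows and terminals as columns; the entry in row X, column a is r.]
※ M[X,a] = r : when the stack top symbol is X and the current symbol is a,
expand by production rule r.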

Parsing Actions
X : stack top symbol, a : current input symbol
1. if X = a = $, then accept.
2. if X = a, then pop X and advance input.
3. if X ∈ VN, then if M[X,a] = r (X → ABC), then replace X by ABC
   else error.
Text p.291
Algorithm Predictive_Parser_Action;
begin
    // set ip to point to the first symbol of ω$;
    repeat
        // let X be the top stack symbol and a the symbol pointed to by ip;
        if X is a terminal or $ then
            if X = a then
                pop X from the stack and advance ip
            else error(1)
        else /* X is nonterminal */
            if M[X,a] = X → Y1Y2...Yk then
                begin
                    pop X from the stack;
                    push Yk,Yk-1,...,Y1 onto the stack, with Y1 on top;
                    output the production X → Y1Y2...Yk
                end
            else error(2)
    until X = a = $ /* stack is empty */
end.
• G : 1. S → aSb
      2. S → bA
      3. A → aA
      4. A → b
string : aabbbb
• Parsing Table:
                 terminals
  nonterminal    a    b    $
  S              1    2    .
  A              3    4    .

  STACK     INPUT       ACTIONS              OUTPUT
  $S        aabbbb$     expand 1             1
  $bSa      aabbbb$     pop a and advance
  $bS       abbbb$      expand 1             1
  $bbSa     abbbb$      pop a and advance
  $bbS      bbbb$       expand 2             2
  $bbAb     bbbb$       pop b and advance
  $bbA      bbb$        expand 4             4
  $bbb      bbb$        pop b and advance
  $bb       bb$         pop b and advance
  $b        b$          pop b and advance
  $         $           Accept
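
The trace above can be reproduced with a short table-driven driver. This Python sketch follows the predictive parsing actions but is not the textbook's code: the names PRODUCTIONS, TABLE and ll_parse are assumptions, each character is treated as one token, and rule 3 is taken as A → aA, which is what the table entry M[A,a] = 3 implies.

```python
# Numbered productions: rule number -> (left-hand side, right-hand side).
PRODUCTIONS = {
    1: ("S", ["a", "S", "b"]),
    2: ("S", ["b", "A"]),
    3: ("A", ["a", "A"]),  # assumed form of rule 3, see the note above
    4: ("A", ["b"]),
}
# Parsing table M[X, a]; a missing key is an error entry.
TABLE = {("S", "a"): 1, ("S", "b"): 2, ("A", "a"): 3, ("A", "b"): 4}
NONTERMINALS = {"S", "A"}


def ll_parse(text, start="S"):
    """Return the list of rule numbers used (the left parse) or raise ValueError."""
    stack = ["$", start]          # top of the stack is the end of the list
    tokens = list(text) + ["$"]   # input buffer followed by $
    ip = 0
    output = []
    while True:
        x, a = stack[-1], tokens[ip]
        if x == a == "$":
            return output                      # accept
        if x not in NONTERMINALS:              # terminal on top of the stack
            if x != a:
                raise ValueError(f"expected {x!r}, found {a!r}")
            stack.pop()                        # pop X and advance input
            ip += 1
        else:                                  # nonterminal: consult the table
            rule = TABLE.get((x, a))
            if rule is None:
                raise ValueError(f"no entry for M[{x},{a}]")
            stack.pop()
            # push the right-hand side reversed so that Y1 ends up on top
            stack.extend(reversed(PRODUCTIONS[rule][1]))
            output.append(rule)
```

For the string aabbbb, ll_parse returns the rule numbers in the OUTPUT column, read top to bottom.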
※ How to construct a predictive parsing table for the grammar.

main idea : If A → α is a production with a in FIRST(α),
then the parser will expand A by α when the
current input symbol is a. And if α ⇒* ε, then
we should again expand A by α when the
current input symbol is in FOLLOW(A).

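parsing table (LL):
[Figure: table with nonterminals (VN) as rows and terminals (VT) as columns; entry M[X,a] sits in row X, column a.]
M[X,a] = r : expand X with r-production
blank : error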

Construction Algorithm :

for each production A → α,
1. for each terminal a ∈ FIRST(α), M[A,a] := <A → α>
2. if α ⇒* ε, then
   for each b ∈ FOLLOW(A), M[A,b] := <A → α>.

G: 1. E → TE'     2. E' → +TE'    3. E' → ε
   4. T → FT'     5. T' → *FT'    6. T' → ε
   7. F → (E)     8. F → id

FIRST(E) = FIRST(T) = FIRST(F) = { ( , id }
FIRST(E') = { + , ε }
FIRST(T') = { * , ε }

FOLLOW(E) = FOLLOW(E') = { ) , $ }
FOLLOW(T) = FOLLOW(T') = { + , ) , $ }
FOLLOW(F) = { + , * , ) , $ }

Parsing Table (. = error entry):
                 Terminal
  Nonterminal    id   +    *    (    )    $
  E              1    .    .    1    .    .
  E'             .    2    .    .    3    3
  T              4    .    .    4    .    .
  T'             .    6    5    .    6    6
  F              8    .    .    7    .    .
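
The construction algorithm can be replayed on this grammar to confirm the table. The Python sketch below is illustrative only: FIRST and FOLLOW are copied from the slide instead of being computed, and the names PRODUCTIONS, first_of_string and build_table are assumptions. Each table cell holds a list of rule numbers, so a cell with more than one number would show a conflict.

```python
EPS = "ε"

PRODUCTIONS = {
    1: ("E", ("T", "E'")),
    2: ("E'", ("+", "T", "E'")),
    3: ("E'", ()),
    4: ("T", ("F", "T'")),
    5: ("T'", ("*", "F", "T'")),
    6: ("T'", ()),
    7: ("F", ("(", "E", ")")),
    8: ("F", ("id",)),
}
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", "$"}, "E'": {")", "$"},
    "T": {"+", ")", "$"}, "T'": {"+", ")", "$"},
    "F": {"+", "*", ")", "$"},
}


def first_of_string(symbols):
    """FIRST of a right-hand side; the empty right-hand side gives {ε}."""
    result = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})  # FIRST of a terminal is the terminal itself
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)
    return result


def build_table():
    table = {}
    for number, (lhs, rhs) in PRODUCTIONS.items():
        first = first_of_string(rhs)
        targets = first - {EPS}    # step 1: every terminal in FIRST(α)
        if EPS in first:           # step 2: α ⇒* ε, so add FOLLOW(A)
            targets |= FOLLOW[lhs]
        for terminal in targets:
            table.setdefault((lhs, terminal), []).append(number)
    return table


table = build_table()
```

Every cell of table ends up with exactly one rule number, in the same positions as the table above.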

LL(1) Grammar
::= a grammar whose parsing table has no multiply-defined entries.
⇒ If an entry is multiply defined, the parser cannot decide which rule
to expand by, so it cannot parse deterministically.

LL(1) condition:
for A → α | β,
1. FIRST(α) ∩ FIRST(β) = ∅.
2. if α ⇒* ε, then FOLLOW(A) ∩ FIRST(β) = ∅.


G : 1. S → iCtSS'
    2. S → a
    3. S' → eS
    4. S' → ε
    5. C → b
FIRST(S) = {i, a}     FIRST(S') = {e, ε}    FIRST(C) = {b}
FOLLOW(S) = {$, e}    FOLLOW(S') = {$, e}   FOLLOW(C) = {t}
Parsing Table (. = error entry):
                  Terminals
  Nonterminals    a    b    e      i    t    $
  S               2    .    .      1    .    .
  S'              .    .    3,4    .    .    4
  C               .    5    .      .    .    .

M[S',e] := <3,4> is multiply defined.
Here, when the stack top is S' and the input symbol is e, the parser cannot
tell whether to expand by rule 3 or by rule 4.
Therefore G is not an LL(1) grammar.
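
The same conflict shows up when the LL(1) condition is checked directly on the pairs of alternatives. The Python sketch below is an illustration, with FIRST and FOLLOW copied from the slide and the names ALTERNATIVES, first_of_string and ll1_violations assumed for this sketch. It applies condition 2 in both directions, since either alternative of a pair may be the nullable one.

```python
from itertools import combinations

EPS = "ε"

ALTERNATIVES = {
    "S": [("i", "C", "t", "S", "S'"), ("a",)],
    "S'": [("e", "S"), ()],
    "C": [("b",)],
}
FIRST = {"S": {"i", "a"}, "S'": {"e", EPS}, "C": {"b"}}
FOLLOW = {"S": {"$", "e"}, "S'": {"$", "e"}, "C": {"t"}}


def first_of_string(symbols):
    """FIRST of a right-hand side; the empty right-hand side gives {ε}."""
    result = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})  # FIRST of a terminal is the terminal itself
        result |= f - {EPS}
        if EPS not in f:
            return result
    result.add(EPS)
    return result


def ll1_violations():
    """Return (nonterminal, clashing symbols) for every violated pair."""
    violations = []
    for a, alts in ALTERNATIVES.items():
        for alpha, beta in combinations(alts, 2):
            fa, fb = first_of_string(alpha), first_of_string(beta)
            clash = fa & fb                    # condition 1
            if EPS in fa:                      # condition 2, α nullable
                clash |= FOLLOW[a] & fb
            if EPS in fb:                      # condition 2, β nullable
                clash |= FOLLOW[a] & fa
            if clash:
                violations.append((a, clash))
    return violations


violations = ll1_violations()
```

For this grammar the only violation reported is for S', on the symbol e, which is the doubly defined entry M[S',e].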

[Example 7.15] --- text p.298
G : S → aA | abA
    A → Ab | a
ω : abab