Context-free Languages - 法政大学 [HOSEI UNIVERSITY]

Context-free Languages
http://cis.k.hosei.ac.jp/~yukita/
Context-free grammar G1
A → 0A1
A → B
B → #
A grammar consists of substitution rules, which are often called production rules.
A and B are variables. In particular, A is called the start variable.
0, 1, and # are terminals.
Parse tree for 000#111 in grammar G1
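[Figure: parse tree for 000#111. Each of the three upper A nodes has children 0, A, 1; the lowest A has the single child B, whose child is #.]
The corresponding derivation is A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111.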
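The derivation of a string in grammar G1 can be sketched in a few lines of Python. The function name derive_g1 is hypothetical, and the sketch is hard-wired to the three rules A → 0A1, A → B, B → #; it is an illustration rather than a general parser.

```python
def derive_g1(n):
    """Return the sentential forms of the derivation of 0^n # 1^n in G1."""
    steps = ["A"]
    for i in range(1, n + 1):
        steps.append("0" * i + "A" + "1" * i)   # apply A -> 0A1
    steps.append("0" * n + "B" + "1" * n)       # apply A -> B
    steps.append("0" * n + "#" + "1" * n)       # apply B -> #
    return steps

print(" => ".join(derive_g1(3)))
# A => 0A1 => 00A11 => 000A111 => 000B111 => 000#111
```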
The English Language
<SENTENCE> → <NOUN-PHRASE> <VERB-PHRASE>
<NOUN-PHRASE> → <CMPLX-NOUN> | <CMPLX-NOUN> <PREP-PHRASE>
<VERB-PHRASE> → <CMPLX-VERB> | <CMPLX-VERB> <PREP-PHRASE>
<PREP-PHRASE> → <PREP> <CMPLX-NOUN>
<CMPLX-NOUN> → <ARTICLE> <NOUN>
<CMPLX-VERB> → <VERB> | <VERB> <NOUN-PHRASE>
<ARTICLE> → a | the
<NOUN> → boy | girl | flower
<VERB> → touches | likes | sees
<PREP> → with
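A minimal sketch of how this grammar generates sentences, assuming a dictionary encoding of the rules (the names GRAMMAR and generate are hypothetical). Each variable is expanded by picking one of its rules at random; the grammar has no recursion, so the expansion always stops.

```python
import random

GRAMMAR = {
    "SENTENCE": [["NOUN-PHRASE", "VERB-PHRASE"]],
    "NOUN-PHRASE": [["CMPLX-NOUN"], ["CMPLX-NOUN", "PREP-PHRASE"]],
    "VERB-PHRASE": [["CMPLX-VERB"], ["CMPLX-VERB", "PREP-PHRASE"]],
    "PREP-PHRASE": [["PREP", "CMPLX-NOUN"]],
    "CMPLX-NOUN": [["ARTICLE", "NOUN"]],
    "CMPLX-VERB": [["VERB"], ["VERB", "NOUN-PHRASE"]],
    "ARTICLE": [["a"], ["the"]],
    "NOUN": [["boy"], ["girl"], ["flower"]],
    "VERB": [["touches"], ["likes"], ["sees"]],
    "PREP": [["with"]],
}

def generate(symbol, rng):
    """Expand symbol by one randomly chosen rule; terminals are returned as is."""
    if symbol not in GRAMMAR:
        return [symbol]
    words = []
    for part in rng.choice(GRAMMAR[symbol]):
        words.extend(generate(part, rng))
    return words

# Prints one sentence of the language, e.g. of the shape "a boy sees".
print(" ".join(generate("SENTENCE", random.Random(0))))
```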
Definition 2.1
A context-free grammar is a 4-tuple (V, Σ, R, S), where
1. V is a finite set of variables,
2. Σ is a finite set of terminals,
3. R is a finite set of rules, where R ⊆ {"A → w" | A ∈ V, w ∈ (V ∪ Σ)*}, and
4. S ∈ V is the start symbol.
If u, v, and w are strings of variables and terminals, and A → w is a rule of the grammar, we say that uAv yields uwv, written uAv ⇒ uwv.
Write u ⇒* v if u = v or u ⇒ u_1 ⇒ u_2 ⇒ ... ⇒ u_k ⇒ v.
The language of G is L(G) = {w ∈ Σ* | S ⇒* w}.
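The definition of L(G) can be explored by enumerating derivations. The sketch below is an illustration under stated assumptions: variables are single characters, the rules are stored in a dictionary, and the search discards sentential forms longer than max_len + 1 symbols. That pruning bound is a heuristic chosen for this example, not part of the definition. The name language_up_to is hypothetical.

```python
from collections import deque

def language_up_to(rules, start, max_len):
    """Enumerate strings of L(G) with length <= max_len via leftmost derivations.

    rules maps each variable (one character) to a list of right-hand sides;
    "" stands for the empty string.
    """
    seen, found = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, if any.
        i = next((k for k, c in enumerate(form) if c in rules), None)
        if i is None:
            found.add(form)              # only terminals left: a string of L(G)
            continue
        for rhs in rules[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            terminals = sum(c not in rules for c in new)
            # Heuristic pruning (assumption): drop forms that are already too long.
            if terminals <= max_len and len(new) <= max_len + 1 and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(found, key=lambda w: (len(w), w))

G1 = {"A": ["0A1", "B"], "B": ["#"]}
print(language_up_to(G1, "A", 5))   # ['#', '0#1', '00#11']
```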
Context-free Languages
If a language is generated by a context-free grammar, we say that the language is a context-free language.
Any regular language turns out to be a context-free language. Given a DFA for the language:
Assign a variable R_i to each state q_i of the DFA.
Add the rule R_i → aR_j if δ(q_i, a) = q_j.
Add the rule R_i → ε if q_i is an accept state.
Make R_0 the start variable, where q_0 is the start state of the DFA.
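The DFA-to-grammar construction can be sketched as follows. The example DFA (binary strings with an even number of 1s), the state names, and the function names dfa_to_cfg and generates are assumptions made for illustration.

```python
def dfa_to_cfg(delta, accept):
    """Build R_i -> a R_j for delta(q_i, a) = q_j and R_i -> ε for accepting q_i."""
    rules = {}
    for (q, a), r in delta.items():
        rules.setdefault(f"R_{q}", []).append((a, f"R_{r}"))
        rules.setdefault(f"R_{r}", [])
    for q in accept:
        rules.setdefault(f"R_{q}", []).append(())   # () encodes the ε right-hand side
    return rules

def generates(rules, var, w):
    """Decide whether var derives w in the right-linear grammar built above."""
    if not w:
        return () in rules[var]
    return any(rhs and rhs[0] == w[0] and generates(rules, rhs[1], w[1:])
               for rhs in rules[var])

# Hypothetical DFA: accepts binary strings with an even number of 1s.
delta = {("even", "0"): "even", ("even", "1"): "odd",
         ("odd", "0"): "odd", ("odd", "1"): "even"}
rules = dfa_to_cfg(delta, accept={"even"})
print(generates(rules, "R_even", "1001"), generates(rules, "R_even", "10"))
# True False
```

The start variable here is R_even because "even" is the start state of the example DFA.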
Context Dependency
Let A, B, C ∈ V, and w ∈ (V ∪ Σ)*.
B → w is context independent, while ABC → AwC is context dependent.
In the latter rule, A _ C is considered the context for B to yield w.
Example 2.2 G3
G3 = ({S}, {a, b}, R, S), where R consists of only one rule line:
S → aSb | SS | ε.
abab, aaabbb, and aababb are in L(G3).
If you regard a and b as "(" and ")", respectively, the language consists of all strings of properly nested parentheses.
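The parenthesis reading of G3 can be checked mechanically for short strings. The sketch below compares a recognizer that follows the three alternatives S → aSb | SS | ε with a direct nesting check; the function names in_g3 and balanced are hypothetical, and the comparison only covers strings of length at most 8.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def in_g3(w):
    """True if S =>* w using S -> aSb | SS | ε."""
    if w == "":
        return True                                   # S -> ε
    if len(w) >= 2 and w[0] == "a" and w[-1] == "b" and in_g3(w[1:-1]):
        return True                                   # S -> aSb
    # S -> SS with both parts non-empty, so the recursion always shrinks.
    return any(in_g3(w[:k]) and in_g3(w[k:]) for k in range(1, len(w)))

def balanced(w):
    """Direct check: read a as "(" and b as ")"."""
    depth = 0
    for c in w:
        depth += 1 if c == "a" else -1
        if depth < 0:
            return False
    return depth == 0

print([w for w in ("abab", "aaabbb", "aababb", "abba") if in_g3(w)])
# ['abab', 'aaabbb', 'aababb']

# The two checks agree on every string over {a, b} of length at most 8.
agree = all(in_g3("".join(t)) == balanced("".join(t))
            for n in range(9) for t in product("ab", repeat=n))
print(agree)   # True
```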
Example 2.3 G4
G4 = (V, Σ, R, <EXPR>).
V = {<EXPR>, <TERM>, <FACTOR>}, Σ = {a, +, ×, (, )}.
The rules R are:
<EXPR> → <EXPR> + <TERM> | <TERM>
<TERM> → <TERM> × <FACTOR> | <FACTOR>
<FACTOR> → ( <EXPR> ) | a
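Parse tree for a+a×a
[Figure: parse tree in G4. The root <EXPR> expands to <EXPR> + <TERM>. The left <EXPR> derives a through <TERM> and <FACTOR>. The right <TERM> expands to <TERM> × <FACTOR>, where <TERM> derives a through <FACTOR>, and <FACTOR> derives a.]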
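Parse tree for (a+a)×a
[Figure: parse tree in G4. The root <EXPR> expands to <TERM>, which expands to <TERM> × <FACTOR>. The left <TERM> derives <FACTOR>, which expands to ( <EXPR> ); the inner <EXPR> expands to <EXPR> + <TERM>, and both sides derive a. The right <FACTOR> derives a.]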
Ambiguity in grammar G5
<EXPR> → <EXPR> + <EXPR>
| <EXPR> × <EXPR>
| ( <EXPR> )
| a
A parse tree for a+a×a
[Figure: a parse tree in G5 with five <EXPR> nodes and leaves a, +, a, ×, a.]
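Another parse tree for a+a×a
[Figure: a second parse tree in G5 with five <EXPR> nodes and leaves a, +, a, ×, a. The two trees for this string group it as a+(a×a) and as (a+a)×a.]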
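The difference between G4 and G5 can be made concrete by counting parse trees. The sketch below is an illustration that assumes a grammar with no ε-rules and no cycles of unit rules, which holds for both G4 and G5. The single-letter variable names E, T, F and the function count_trees are shorthand introduced here.

```python
from functools import lru_cache

def count_trees(rules, start, tokens):
    """Count parse trees of tokens from start.

    Assumes no ε-rules and no cycles of unit rules, so every symbol on a
    right-hand side covers at least one token.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def sym(x, i, j):
        if x not in rules:                       # terminal symbol
            return int(j == i + 1 and tokens[i] == x)
        return sum(seq(rhs, i, j) for rhs in rules[x])

    @lru_cache(maxsize=None)
    def seq(rhs, i, j):
        """Ways for the symbol sequence rhs to derive tokens[i:j]."""
        if len(rhs) == 1:
            return sym(rhs[0], i, j)
        # The first symbol takes tokens[i:k]; the rest need one token each.
        return sum(sym(rhs[0], i, k) * seq(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return sym(start, 0, len(tokens))

G4 = {"E": [("E", "+", "T"), ("T",)],
      "T": [("T", "×", "F"), ("F",)],
      "F": [("(", "E", ")"), ("a",)]}
G5 = {"E": [("E", "+", "E"), ("E", "×", "E"), ("(", "E", ")"), ("a",)]}
print(count_trees(G4, "E", "a+a×a"), count_trees(G5, "E", "a+a×a"))   # 1 2
```

G4 yields one tree for a+a×a, while G5 yields two, which is what makes G5 ambiguous.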
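Different derivations for the same parse tree
Derivation 1:
E ⇒ E × E
⇒ {E + E} × E
⇒ {a + E} × E
⇒ {a + a} × E
⇒ {a + a} × a
Derivation 2:
E ⇒ E × E
⇒ E × a
⇒ {E + E} × a
⇒ {a + E} × a
⇒ {a + a} × a
Here E abbreviates <EXPR>, and the braces mark the part derived from the left E.
[Figure: the common parse tree, with five <EXPR> nodes and leaves a, +, a, ×, a.]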
Leftmost Derivation
• If a string has two different parse trees, we
say that the grammar is ambiguous.
• A derivation of a string is a leftmost
derivation if at every step the leftmost
remaining variable is the one replaced.
• Every parse tree has a unique leftmost derivation.
Definition 2.4 Ambiguity
A string w is derived ambiguously by a context-free grammar G if it has two or more different leftmost derivations. Grammar G is ambiguous if it generates some string ambiguously.
Remark: An ambiguous grammar G_a and an unambiguous grammar G_na can generate the same language. There are languages that cannot be generated by any unambiguous grammar, in which case we say that the language is inherently ambiguous.
Definition 2.5 Chomsky normal form
A context-free grammar is in Chomsky normal form if every rule is of one of the following forms:
A → BC (where A ∈ V, and B, C ∈ V − {S}),
A → a (where A ∈ V, and a ∈ Σ), and
S → ε,
where S is the start symbol.
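Definition 2.5 translates directly into a checker. This is a minimal sketch, assuming every variable appears as a key of the rules dictionary; the function name is_cnf and the example grammar for {a^n b^n | n ≥ 0} are hypothetical.

```python
def is_cnf(rules, start):
    """Check Definition 2.5; rules maps a variable to a list of bodies (tuples)."""
    for head, bodies in rules.items():
        for body in bodies:
            if len(body) == 2:
                ok = all(s in rules and s != start for s in body)   # A -> BC
            elif len(body) == 1:
                ok = body[0] not in rules                           # A -> a
            else:
                ok = len(body) == 0 and head == start               # S -> ε
            if not ok:
                return False
    return True

# Hypothetical grammar in Chomsky normal form for {a^n b^n | n >= 0}.
cnf = {"S0": [("A", "T"), ("A", "B"), ()],
       "S": [("A", "T"), ("A", "B")],
       "T": [("S", "B")],
       "A": [("a",)],
       "B": [("b",)]}
not_cnf = {"S": [("a", "S", "b"), ()]}
print(is_cnf(cnf, "S0"), is_cnf(not_cnf, "S"))   # True False
```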
Theorem 2.6
Any context-free language is generated by a context-free grammar in Chomsky normal form.
Proof of Th. 2.6
1. Add a new start symbol S_0 and the rule S_0 → S. The new start symbol never occurs on the right-hand side of a rule.
2. Remove an ε-rule A → ε, where A is not the start symbol.
For such A, if there is a rule R → uAv, add the rule R → uv.
If there is a rule R → uAvAw, add the rules R → uvAw, R → uAvw, and R → uvw. And so on: add one rule for each way of deleting occurrences of A.
If we have the rule R → A, we add R → ε unless we have previously removed R → ε.
We repeat this process until we have eliminated all such ε-rules.
Proof of Th. 2.6
3. We remove a unit rule A → B. For such A, B, and any rule B → u, we add A → u unless this was a unit rule previously removed.
We repeat these steps until we eliminate all unit rules.
4. We replace each rule A → u_1 u_2 ... u_k, where k ≥ 3, with the rules
A → u_1 A_1,
A_1 → u_2 A_2,
A_2 → u_3 A_3, ...,
A_{k-2} → u_{k-1} u_k.
Here, A_1, A_2, ..., A_{k-2} are new variables.
If k ≥ 2, we replace any terminal u_i in the preceding rules with the new variable U_i and add the rule U_i → u_i.
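The splitting of long rules in step 4 can be sketched as below. The function name binarize and the scheme for naming fresh variables (the head followed by an index, assumed not to clash with existing variables) are assumptions for illustration; replacing terminals by new variables U_i is not shown.

```python
def binarize(head, body):
    """Split head -> u1 u2 ... uk (k >= 3) into a chain of two-symbol rules."""
    rules, left = [], head
    for i, symbol in enumerate(body[:-2], start=1):
        new_var = f"{head}{i}"            # fresh names A1, A2, ... (assumed unused)
        rules.append((left, [symbol, new_var]))
        left = new_var
    rules.append((left, body[-2:]))
    return rules

print(binarize("A", ["u1", "u2", "u3", "u4"]))
# [('A', ['u1', 'A1']), ('A1', ['u2', 'A2']), ('A2', ['u3', 'u4'])]
```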
Example 2.7 G6
1. The original G6 is shown first. The result of applying the first step to make a new start symbol follows.
Original G6:
S → ASA | aB
A → B | S
B → b | ε
After step 1:
S_0 → S
S → ASA | aB
A → B | S
B → b | ε
Example 2.7 Step 2
2. Remove rules B   , and introducecompensations for it.
S0  S
S0  S
S  ASA| aB
S  ASA| aB | a
A B|S
Bb
A B|S |ε
Bb
Remove rules A   , and introducecompensations for it.
S0  S
S0  S
S  ASA| aB | a
S  ASA| aB | a | SA | AS | S
A B|S
A B|S
Bb
Bb
23
Example 2.7 Step 3
3. Remove the unit rule S → S:
S_0 → S
S → ASA | aB | a | SA | AS
A → B | S
B → b
Remove the unit rule S_0 → S:
S_0 → ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS
A → B | S
B → b
Remove the unit rule A → B:
S_0 → ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS
A → b | S
B → b
Remove the unit rule A → S:
S_0 → ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS
A → b | ASA | aB | a | SA | AS
B → b
Example 2.7 Step 4
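4. Convert the remaining rules into the proper form by adding the new variables A_1 and U.
Before:
S_0 → ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS
A → b | ASA | aB | a | SA | AS
B → b
After:
S_0 → AA_1 | UB | a | SA | AS
S → AA_1 | UB | a | SA | AS
A → b | AA_1 | UB | a | SA | AS
A_1 → SA
U → a
B → b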
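One use of Chomsky normal form is that membership can be tested by the CYK dynamic-programming algorithm. CYK is not covered in these slides. The sketch below runs it on the final grammar of Example 2.7; the two lookup tables unit and binary are an encoding chosen for this illustration.

```python
def cyk(unit, binary, start, w):
    """CYK membership test for a grammar in Chomsky normal form.

    unit maps a terminal a to the variables A with A -> a;
    binary maps a pair (B, C) to the variables A with A -> BC.
    The empty string is rejected because the grammar below has no rule S0 -> ε.
    """
    n = len(w)
    if n == 0:
        return False
    # table[i][l] = set of variables deriving w[i:i+l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, c in enumerate(w):
        table[i][1] = set(unit.get(c, ()))
    for l in range(2, n + 1):
        for i in range(n - l + 1):
            for k in range(1, l):
                for B in table[i][k]:
                    for C in table[i + k][l - k]:
                        table[i][l] |= binary.get((B, C), set())
    return start in table[0][n]

# Final grammar of Example 2.7 (Step 4).
unit = {"a": {"S0", "S", "A", "U"}, "b": {"A", "B"}}
binary = {("A", "A1"): {"S0", "S", "A"},
          ("U", "B"): {"S0", "S", "A"},
          ("S", "A"): {"S0", "S", "A", "A1"},
          ("A", "S"): {"S0", "S", "A"}}
print([w for w in ["a", "ab", "ba", "aab", "b", "bb"] if cyk(unit, binary, "S0", w)])
# ['a', 'ab', 'ba', 'aab']
```

The strings b and bb are rejected, consistent with the original G6, in which every derived string contains an a.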
Pushdown Automata
[Figure: schematic of a finite automaton (a state control reading the input a a b b) next to a pushdown automaton (a state control reading the same input, with an additional stack holding x, y, z, ...).]
Definition 2.8
A pushdown automaton is a 6-tuple (Q, Σ, Γ, δ, q_0, F), where Q, Σ, Γ, and F are all finite sets, and
1. Q is the set of states,
2. Σ is the input alphabet,
3. Γ is the stack alphabet,
4. δ: Q × Σε × Γε → 2^(Q × Γε) is the transition function, where Σε = Σ ∪ {ε} and Γε = Γ ∪ {ε},
5. q_0 ∈ Q is the start state, and
6. F ⊆ Q is the set of accept states.
Computation
The machine M accepts input w = w_1 w_2 ... w_m, where each w_i ∈ Σε, if there are a sequence of states r_0, r_1, ..., r_m ∈ Q and strings s_0, s_1, ..., s_m ∈ Γ* that satisfy the next three conditions. The strings s_i represent the sequence of stack contents that M has on the accepting branch of the computation.
1. r_0 = q_0 and s_0 = ε.
2. For i = 0, ..., m − 1, (r_{i+1}, b) ∈ δ(r_i, w_{i+1}, a), where s_i = at and s_{i+1} = bt for some a, b ∈ Γε, t ∈ Γ*.
3. r_m ∈ F.
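The acceptance condition can be sketched as a search over configurations (state, input position, stack). The example PDA for {0^n 1^n | n ≥ 0}, its state names, and the function name pda_accepts are assumptions for illustration. The search is only guaranteed to stop for machines without ε-moves that push forever.

```python
from collections import deque

def pda_accepts(delta, start, accept, w):
    """Search the configurations (state, input position, stack) of a PDA.

    delta maps (state, a, pop) to a set of (state, push); "" plays the role of ε.
    The stack is a string whose first character is the top.
    """
    seen = {(start, 0, "")}
    queue = deque(seen)
    while queue:
        q, i, stack = queue.popleft()
        if i == len(w) and q in accept:
            return True
        reads = [""] + ([w[i]] if i < len(w) else [])
        pops = [""] + ([stack[0]] if stack else [])
        for a in reads:
            for t in pops:
                for r, b in delta.get((q, a, t), ()):
                    conf = (r, i + len(a), b + stack[len(t):])
                    if conf not in seen:
                        seen.add(conf)
                        queue.append(conf)
    return False

# Hypothetical PDA for {0^n 1^n | n >= 0}; $ marks the bottom of the stack.
delta = {("q1", "", ""): {("q2", "$")},
         ("q2", "0", ""): {("q2", "0")},
         ("q2", "1", "0"): {("q3", "")},
         ("q3", "1", "0"): {("q3", "")},
         ("q3", "", "$"): {("q4", "")}}
print([w for w in ["", "01", "0011", "001", "10"]
       if pda_accepts(delta, "q1", {"q1", "q4"}, w)])
# ['', '01', '0011']
```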
Theorem 2.12
• A language is context free if and only if
some pushdown automaton recognizes it.
• Lemma 2.13
– If a language is context free, then some
pushdown automaton recognizes it.
• Lemma 2.15
– If a pushdown automaton recognizes some
language, then it is context free.
Proof of Lemma 2.13
CFL ⇒ recognized by a PDA
We construct a PDA P = (Q, Σ, Γ, δ, q_start, F).
Let (r, u) ∈ δ(q, a, s), where u = u_1 ... u_l, be shorthand notation for
δ(q, a, s) contains (q_1, u_l),
δ(q_1, ε, ε) = {(q_2, u_{l-1})},
δ(q_2, ε, ε) = {(q_3, u_{l-2})},
...
δ(q_{l-1}, ε, ε) = {(r, u_1)},
where q_1, ..., q_{l-1} are new states.
Proof of Lemma 2.13
We put Q = {q_start, q_loop, q_accept} ∪ E, where E is the set of states needed to implement the shorthand.
We define δ as follows.
δ(q_start, ε, ε) = {(q_loop, S$)}
δ(q_loop, ε, A) = {(q_loop, w) | A → w is a rule in R}
δ(q_loop, a, a) = {(q_loop, ε)}
δ(q_loop, ε, $) = {(q_accept, ε)}
State Diagram of P
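[Figure: state diagram of P. The arrow q_start → q_loop is labelled ε, ε → S$. The self-loops on q_loop are labelled ε, A → w for each rule A → w, and a, a → ε for each terminal a. The arrow q_loop → q_accept is labelled ε, $ → ε.]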
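Example 2.14
S → aTb | b
T → Ta | ε
[Figure: state diagram of the PDA built from this grammar. The arrow q_start → q_loop is labelled ε, ε → S$. The loops on q_loop are ε, S → aTb; ε, S → b; ε, T → Ta; ε, T → ε (pushes of more than one symbol pass through extra states), together with a, a → ε and b, b → ε. The arrow q_loop → q_accept is labelled ε, $ → ε.]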
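The behaviour of P on the grammar of Example 2.14 can be simulated directly. The sketch below pushes whole right-hand sides at once (the shorthand of the proof) and adds a pruning rule that is not part of the construction: a branch is dropped when the stack holds more terminals than there are input symbols left. With that rule the search stops for this grammar, but termination is not guaranteed for grammars such as S → SS | ε. The function name cfg_pda_accepts is hypothetical.

```python
from collections import deque

def cfg_pda_accepts(rules, start, w):
    """Simulate the PDA P of Lemma 2.13 on input w.

    rules maps a variable to a list of right-hand-side strings ("" is ε).
    A configuration is (input position, stack) in state q_loop; the stack is a
    string with its top first and the marker $ at the bottom.
    """
    first = (0, start + "$")                 # q_start: push S$ and enter q_loop
    seen, queue = {first}, deque([first])
    while queue:
        i, stack = queue.popleft()
        top, rest = stack[0], stack[1:]
        if top == "$":                       # ε,$→ε leads to q_accept
            if i == len(w):
                return True
            continue
        if top in rules:                     # ε,A→w for each rule A → w
            nexts = [(i, rhs + rest) for rhs in rules[top]]
        elif i < len(w) and w[i] == top:     # a,a→ε matches an input symbol
            nexts = [(i + 1, rest)]
        else:
            nexts = []
        for conf in nexts:
            terminals = sum(c not in rules and c != "$" for c in conf[1])
            # Pruning (an addition for this sketch): more terminals on the
            # stack than input symbols left can never lead to acceptance.
            if terminals <= len(w) - conf[0] and conf not in seen:
                seen.add(conf)
                queue.append(conf)
    return False

G = {"S": ["aTb", "b"], "T": ["Ta", ""]}     # grammar of Example 2.14
print([w for w in ["b", "ab", "aaab", "", "ba", "abb"] if cfg_pda_accepts(G, "S", w)])
# ['b', 'ab', 'aaab']
```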
Proof of Lemma 2.15
Recognized by a PDA ⇒ CFL
We construct a grammar G.
We can assume without loss of generality that machine P satisfies the following conditions.
1. It has a single accept state, q_accept.
2. It empties its stack before accepting.
3. Each transition either pushes a symbol onto the stack (a push move) or pops one off the stack (a pop move), but does not do both at the same time.
Proof of Lemma 2.15
Let P = (Q, Σ, Γ, δ, q_0, {q_accept}) be given. We construct G.
Put V = {A_pq | p, q ∈ Q}, S = A_{q_0, q_accept}, and the rules are:
(1) For each p, q, r, s ∈ Q, t ∈ Γ, and a, b ∈ Σε, if (r, t) ∈ δ(p, a, ε) and (q, ε) ∈ δ(s, b, t), put the rule A_pq → a A_rs b in G.
(2) For each p, q, r ∈ Q, put the rule A_pq → A_pr A_rq in G.
(3) For each p ∈ Q, put the rule A_pp → ε in G.
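The three rule types can be generated mechanically. The sketch below is an illustration on a hypothetical four-state PDA for {0^n 1^n | n ≥ 0} that satisfies the three conditions of the proof. The encoding of A_pq as a pair (p, q) and the function name pda_to_cfg are assumptions.

```python
from itertools import product

def pda_to_cfg(states, delta):
    """Rules of the grammar G built from a PDA in the normalized form above.

    delta maps (state, a, pop) to a set of (state, push), with "" for ε.
    A variable A_pq is encoded as the pair (p, q); a rule is (head, body).
    """
    pushes = [(p, a, r, t) for (p, a, pop), moves in delta.items()
              for r, t in moves if pop == "" and t != ""]
    pops = [(s, b, q, t) for (s, b, t), moves in delta.items()
            for q, push in moves if t != "" and push == ""]
    rules = []
    for (p, a, r, t), (s, b, q, u) in product(pushes, pops):
        if t == u:                                    # rule type (1)
            rules.append(((p, q), [a, (r, s), b]))
    for p, q, r in product(states, repeat=3):         # rule type (2)
        rules.append(((p, q), [(p, r), (r, q)]))
    for p in states:                                  # rule type (3)
        rules.append(((p, p), []))
    return rules

# Hypothetical PDA: every move either pushes or pops, and q3 is the accept state.
states = ["q0", "q1", "q2", "q3"]
delta = {("q0", "", ""): {("q1", "$")},
         ("q1", "0", ""): {("q1", "0")},
         ("q1", "1", "0"): {("q2", "")},
         ("q2", "1", "0"): {("q2", "")},
         ("q2", "", "$"): {("q3", "")},
         ("q1", "", "$"): {("q3", "")}}
rules = pda_to_cfg(states, delta)
for head, body in rules[:4]:                          # the type (1) rules
    print(head, "->", body)
```

For this machine the type (1) rules read A_q0q3 → A_q1q2 | A_q1q1 and A_q1q2 → 0 A_q1q1 1 | 0 A_q1q2 1, which together with A_q1q1 → ε generate 0^n 1^n.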
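A_pq → A_pr A_rq
[Figure: stack height plotted against the input string. The computation from p to q returns to an empty stack at an intermediate state r; the part from p to r is generated by A_pr, the part from r to q by A_rq, and the whole by A_pq.]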
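A_pq → a A_rs b
[Figure: stack height plotted against the input string. The first move reads a and pushes a symbol, going from p to r; the last move reads b and pops that symbol, going from s to q. The part between r and s is generated by A_rs, and the whole by A_pq.]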
Claim 2.16 If A_pq generates x, then x can bring P from p with empty stack to q with empty stack.
Proof. Induction on the number of steps in the derivation of x from A_pq.
Basis: A derivation with a single step must use a rule whose RHS contains no variables. The only rules of this kind in G are A_pp → ε. Input ε takes P from p with empty stack to p with empty stack.
Induction step: (Assume the claim for derivations of at most k steps and prove it for k + 1 steps.)
Assume that A_pq ⇒* x with k + 1 steps. The first step in this derivation is either A_pq ⇒ a A_rs b or A_pq ⇒ A_pr A_rq.
Proof (continued)
Case A_pq ⇒ a A_rs b:
Let A_rs generate y, which should complete within k steps. We have x = ayb.
The induction hypothesis tells us that, on input y, P can go from r on empty stack to s on empty stack. Because A_pq → a A_rs b is a rule of G, (r, t) ∈ δ(p, a, ε) and (q, ε) ∈ δ(s, b, t) for some stack symbol t.
So P can read a and push t, process y above t, and then read b and pop t.
Therefore, x can bring P from p with empty stack to q with empty stack.
Proof (continued)
Case A_pq ⇒ A_pr A_rq:
Let x = yz, A_pr ⇒* y and A_rq ⇒* z. Both derivations should complete within k steps.
The induction hypothesis tells us that y can bring P from p to r, and z can bring P from r to q, with empty stacks at the beginning and end.
Hence, x can bring it from p to q with empty stacks at the beginning and end.
Claim 2.17 If x can bring P from p with empty stack to q with empty stack, A_pq generates x.
Proof. Induction on the number of steps in the computation of P that goes from p to q with empty stacks on input x.
Basis: The computation has 0 steps. It starts and ends at the same state, say p. P only has time to read x = ε. By construction, G has the rule A_pp → ε.
Induction step: Assume true for computations of length at most k ≥ 0, and prove true for computations of length k + 1.
Suppose that x brings P from p to q in k + 1 steps with empty stacks at the beginning and end.
Proof (continued)
Case: The stack is empty only at the beginning and end:
Then (r, t) ∈ δ(p, a, ε) and (q, ε) ∈ δ(s, b, t) for some r, s ∈ Q, a, b ∈ Σε, and t ∈ Γ.
So, A_pq → a A_rs b is in G.
Let x = ayb. Then input y brings P from r to s within k − 1 steps with empty stacks at the beginning and end, because the symbol t at the bottom of the stack is never touched in between.
The induction hypothesis tells us that A_rs ⇒* y. Hence, A_pq ⇒* x.
Proof (continued)
Case: The stack becomes empty at a state r other than at the beginning or end:
Let x = yz, where y brings P from p to r, and z brings P from r to q.
The two computations complete within k steps.
The induction hypothesis tells us that A_pr ⇒* y and A_rq ⇒* z.
Because A_pq → A_pr A_rq is in G, A_pq ⇒* yz = x.
Corollary 2.18 Every regular language is context free.
[Figure: the regular languages drawn as a region inside the context-free languages.]
Theorem 2.19 [Pumping Lemma]
If A is a context-free language, then there is a number p (the pumping length) where, if s is any string in A of length at least p, then s may be divided into s = uvxyz satisfying the following conditions:
1. For each i ≥ 0, u v^i x y^i z ∈ A,
2. |vy| > 0, and
3. |vxy| ≤ p.
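Proof
[Figure: a parse tree T for s = uvxyz in which a variable R occurs twice on one path. The upper R generates vxy and the lower R generates x. Replacing the lower subtree by a copy of the upper one gives a tree for uvvxyyz; replacing the upper subtree by the lower one gives a tree for uxz.]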
Proof
Let b ≥ 2 be the maximum number of symbols in the RHS of a rule.
In any parse tree, no node can have more than b children.
If the height of the parse tree is at most h, the length of the string generated is at most b^h.
We set p = b^(|V|+2) > b^(|V|+1). Then a parse tree (having the smallest number of nodes) for any string s of length at least p requires height at least |V| + 2.
The longest path must have length at least |V| + 2, so it must contain at least |V| + 1 variables, since only leaves are terminals.
Thus some variable, say R, repeats.
For later convenience, let R be the one that repeats lowest in the path.
Condition 1 holds because replacing the smaller subtree rooted at R by the larger one, or the larger by the smaller, again gives a valid parse tree.
Proof
Condition 2 requires that v and y are not both ε. If they were, the minimality of the parse tree would be broken: substituting the smaller subtree for the larger one would give a parse tree for s with fewer nodes. See the figure on the earlier slide.
Condition 3:
We chose R so that it repeats in the bottom |V| + 1 variables on the path.
So, the subtree where R generates vxy is at most |V| + 2 high. A tree of this height can generate a string of length at most b^(|V|+2) = p.
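As a small worked instance of the choice p = b^(|V|+2), the sketch below computes the bound for the expression grammar G4. The dictionary encoding and the function name pumping_length are assumptions; the value is the bound used in this proof, not the smallest pumping length of the language.

```python
def pumping_length(rules):
    """p = b^(|V|+2), with b the longest right-hand side (at least 2)."""
    b = max(2, max(len(rhs) for bodies in rules.values() for rhs in bodies))
    return b ** (len(rules) + 2)

G4 = {"E": [("E", "+", "T"), ("T",)],
      "T": [("T", "×", "F"), ("F",)],
      "F": [("(", "E", ")"), ("a",)]}
print(pumping_length(G4))   # 3 ** (3 + 2) = 243
```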
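Example 2.20 B = {a^n b^n c^n | n ≥ 0}. Let s = a^p b^p c^p = uvxyz.
Case 1: v and y are homogeneous (each contains only one type of symbol). Then v and y cannot together contain a's, b's, and c's, so uv^2xy^2z does not have equal numbers of the three symbols.
Case 2: v or y is heterogeneous (contains more than one type of symbol). Then uv^2xy^2z has its symbols out of order.
In both cases uv^2xy^2z ∉ B, so B is not context free.
[Figure: possible positions of v and y within the blocks a^p, b^p, c^p in each case.]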
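The case analysis can be checked by brute force for one small value of p. The sketch below tries every division s = uvxyz with |vy| > 0 and |vxy| ≤ p and tests the pumped strings for i = 0, ..., 3. It illustrates the argument for a single p and is not a proof, since the lemma speaks about an unknown p. The function names and the comparison language {a^n b^n} are assumptions.

```python
def find_pumping_split(s, p, member, max_i=3):
    """Return (u, v, x, y, z) meeting conditions 1-3 for i = 0..max_i, else None."""
    n = len(s)
    for start in range(n):                         # vxy = s[start:end]
        for end in range(start + 1, min(n, start + p) + 1):
            for a in range(start, end + 1):        # v = s[start:a]
                for b in range(a, end + 1):        # x = s[a:b], y = s[b:end]
                    u, v, x, y, z = s[:start], s[start:a], s[a:b], s[b:end], s[end:]
                    if v + y and all(member(u + v * i + x + y * i + z)
                                     for i in range(max_i + 1)):
                        return u, v, x, y, z
    return None

def in_B(w):
    n = len(w) // 3
    return w == "a" * n + "b" * n + "c" * n

def in_anbn(w):
    n = len(w) // 2
    return w == "a" * n + "b" * n

print(find_pumping_split("aaabbbccc", 3, in_B))           # None: no division works
print(find_pumping_split("aaabbb", 3, in_anbn) is not None)   # True: a^n b^n can be pumped
```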
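Example 2.21 C = {a^i b^j c^k | 0 ≤ i ≤ j ≤ k}.
Let s = a^p b^p c^p = uvxyz.
Case 1: v and y are homogeneous. At least one of the symbols a, b, c appears in neither v nor y. Depending on which symbols are pumped, see if uv^2xy^2z breaks the balance i ≤ j ≤ k, or see if uv^0xy^0z = uxz breaks it. For example, if only a's are pumped, uv^2xy^2z has more a's than b's; if only c's are pumped, uxz has fewer c's than b's.
Case 2: v or y is heterogeneous. Then uv^2xy^2z destroys the order of the symbols.
[Figure: possible positions of v and y within the blocks a^p, b^p, c^p in each case.]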
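Example 2.22 D = {ww | w ∈ {0,1}*}. Let s = 0^p 1^p 0^p 1^p = uvxyz with |vxy| ≤ p.
Case 1: vxy lies in the first half of s. See if the first half of uv^2xy^2z begins with 0 while the latter half begins with 1.
Case 2: vxy lies in the second half of s. See if the first half of uv^2xy^2z ends with 0 while the latter half ends with 1.
Case 3: vxy straddles the midpoint of s. See if uv^0xy^0z = uxz = 0^p 1^i 0^j 1^p, where i and j cannot both be p.
In each case the resulting string is not of the form ww.
[Figure: the position of vxy within the blocks 0^p 1^p 0^p 1^p in each of the three cases.]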