Equivalence of PDA and CFG
Lecture 17
Oct 25, 2011
• Section 2.1 (push-down automata)
• Section 2.2 (pumping lemma for context-free languages)
Pushdown Automata
Pushdown automata are for context-free languages
what finite automata are for regular languages.
PDAs are recognizing automata that have a
single stack (= memory):
Last-In First-Out pushing and popping
Note: PDAs are nondeterministic.
Informal Description PDA (1)
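[Figure: a PDA reading the input w = 00100100111100101; its control holds an internal state from the set Q, and its stack holds the symbols x, y, y, z, x.]
The PDA M reads the input w and the top stack element.
Depending on
- the input symbol w_i,
- the stack symbol s_j, and
- the state q_k ∈ Q
the PDA M:
- jumps to a new state,
- pushes an element onto the stack
(nondeterministically)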
Informal Description PDA (2)
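[Figure: the same PDA after reading all of w = 00100100111100101; its control is in some state of Q, and the stack holds x, y, y, z, x.]
After the PDA has read the complete input, M will be in some state q ∈ Q.
If it is possible to end in an accepting state q ∈ F ⊆ Q, then M accepts w.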
Formal Description PDA
A pushdown automaton M is defined by a
six-tuple (Q, Σ, Γ, δ, q0, F), with
• Q a finite set of states
• Σ a finite input alphabet
• Γ a finite stack alphabet
• q0 ∈ Q the start state
• F ⊆ Q the set of accepting states
• δ the transition function
δ: Q × Σε × Γε → P(Q × Γε), where Σε = Σ ∪ {ε} and Γε = Γ ∪ {ε}
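PDA for L = { 0^n 1^n | n ≥ 0 }
Example 2.14:
The PDA first pushes “$ 0^n” on the stack.
Then, while reading the 1^n string, the zeros are popped again.
If, in the end, $ is left on the stack, then “accept”.
[Figure: state diagram with states q1, q2, q3, q4. An edge label “a, b → c” means: read a, pop b, push c.
q1 → q2 on ε, ε → $; q2 → q2 on 0, ε → 0; q2 → q3 on 1, 0 → ε; q3 → q3 on 1, 0 → ε; q3 → q4 on ε, $ → ε.]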
Machine Diagram for 0^n1^n
[Figure: the state diagram of the PDA for 0^n1^n from Example 2.14, repeated.]
On w = 000111 the (state; stack) evolution is:
(q1; ε) → (q2; $) → (q2; 0$) → (q2; 00$) → (q2; 000$) → (q3; 00$) → (q3; 0$) → (q3; $) → (q4; ε)
This final q4 is an accepting state.
Machine Diagram for 0^n1^n
[Figure: the state diagram of the PDA for 0^n1^n from Example 2.14, repeated.]
On w = 0101 the (state; stack) evolution is:
(q1; ε) → (q2; $) → (q2; 0$) → (q3; $) → (q4; ε) …
But we still have part of the input, “01”, left to read.
There is no accepting path.
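The two traces above can be reproduced with a small simulator. This is only a sketch: the table `DELTA` encodes the transitions as read off the Example 2.14 diagram, the accepting set {q1, q4} is an assumption (q1 is included so that the empty string, the case n = 0, is accepted), and the names `EPS`, `DELTA`, `ACCEPT` and `accepts` are invented for this illustration.

```python
from collections import deque

EPS = ""  # stands for ε

# (state, input symbol or EPS, popped symbol or EPS) -> set of (next state, pushed symbol or EPS)
DELTA = {
    ("q1", EPS, EPS): {("q2", "$")},
    ("q2", "0", EPS): {("q2", "0")},
    ("q2", "1", "0"): {("q3", EPS)},
    ("q3", "1", "0"): {("q3", EPS)},
    ("q3", EPS, "$"): {("q4", EPS)},
}
ACCEPT = {"q1", "q4"}  # assumed accepting states


def accepts(w, delta=DELTA, start="q1", accept=ACCEPT):
    """Breadth-first search over configurations (state, input position, stack).

    The stack is a string whose first character is the top. The search explores
    all nondeterministic choices; it terminates for this machine, but a PDA
    with ε-loops that keep pushing could make it run forever.
    """
    first = (start, 0, "")
    seen, queue = {first}, deque([first])
    while queue:
        state, pos, stack = queue.popleft()
        if pos == len(w) and state in accept:
            return True
        for (p, a, pop), targets in delta.items():
            if p != state:
                continue
            if a != EPS and (pos >= len(w) or w[pos] != a):
                continue  # the required input symbol is not next
            if pop != EPS and not stack.startswith(pop):
                continue  # the required symbol is not on top of the stack
            rest = stack[len(pop):]
            for q, push in targets:
                nxt = (q, pos + len(a), push + rest)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


print(accepts("000111"), accepts("0101"))
```

The search accepts as soon as some branch has consumed the whole input in an accepting state, which is the acceptance condition described on the informal slides.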
Another Example of a PDA
Consider the language over the alphabet {a, b}:
L = { w | #a(w) = #b(w) }
(#a(w) stands for the number of a’s in w.)
PDA design intuition: push a symbol 1 on seeing an a, pop one on seeing a b.
Problem: what if we see a lot of b’s at the start, and the a’s come later?
Then the roles can be swapped: push on b, pop on a.
The PDA needs to know which role is active, so it uses two different states.
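The design intuition can be sketched as a direct stack simulation. The state names `more_a` and `more_b` and the function name are made up for this illustration, and the check `not stack` stands in for what a real PDA would detect with a bottom-of-stack marker such as $.

```python
def equal_as_and_bs(w):
    """Return True iff w over {a, b} has as many a's as b's, using one stack."""
    stack = []            # holds copies of the counter symbol "1"
    state = "more_a"      # which letter is currently in excess (irrelevant while the stack is empty)
    for ch in w:
        if ch not in ("a", "b"):
            return False  # symbol outside the alphabet
        if not stack:
            # Counts are balanced so far: the letter just read becomes the one in excess.
            state = "more_a" if ch == "a" else "more_b"
            stack.append("1")
        elif (state == "more_a") == (ch == "a"):
            stack.append("1")   # same letter as the excess letter: push
        else:
            stack.pop()         # opposite letter: cancel one excess symbol
    return not stack            # accept iff nothing is left unmatched


print(equal_as_and_bs("abba"), equal_as_and_bs("bbaa"), equal_as_and_bs("aab"))
```

Because the stack only ever holds one kind of excess, the two states are enough to tell whether the next letter should push or pop.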
One more PDA: for even-length palindromes
L = { w w^R | w ∈ {0, 1}* }
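A PDA for this language has to guess nondeterministically where w ends and w^R begins. The sketch below imitates that guess by trying every possible midpoint; the function name `accepts_wwR` is hypothetical, and the loop over `mid` is a stand-in for nondeterminism, not part of any PDA definition.

```python
def accepts_wwR(s):
    """Return True iff s = w w^R for some w over {0, 1}."""
    if any(ch not in "01" for ch in s):
        return False
    for mid in range(len(s) + 1):      # each value of mid is one nondeterministic guess
        stack = list(s[:mid])          # phase 1: push the first mid symbols
        ok = True
        for ch in s[mid:]:             # phase 2: pop and compare with the rest of the input
            if not stack or stack.pop() != ch:
                ok = False
                break
        if ok and not stack:           # input consumed and stack empty: this guess accepts
            return True
    return False


print(accepts_wwR("0110"), accepts_wwR("010"))
```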
PDAs versus CFL
Theorem 2.20: A language L is context-free if and only if
there is a pushdown automaton M that recognizes L.
Two step proof:
1) Given a CFG G, construct a PDA M_G
2) Given a PDA M, make a CFG G_M
Equivalence of PDA and CFG (0)
Part 1: For every CFG, we can build an equivalent PDA.
General construction: each rule A → w of the CFG is included as a move of the PDA (pop A from the stack, push w).
Equivalence of PDA and CFG (1)
Part 1: For every CFG, we can build an equivalent PDA.
Example: (page 115 of text)
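As an illustration of Part 1 (not the textbook's example), the sketch below builds the moves of the PDA's main loop from a grammar and searches over its configurations. Several things here are assumptions: the sample grammar S → 0S1 | 01, the function names, and the simplification that the $ bottom marker and the separate start and accept states are folded into an "empty stack" test. The pruning step is only valid for grammars without ε-rules.

```python
from collections import deque


def cfg_to_pda_moves(rules, terminals):
    """Moves of the loop state as triples (read, pop, push); push is a tuple, top first."""
    moves = []
    for head, bodies in rules.items():
        for body in bodies:
            moves.append(("", head, body))   # rule A -> w: read nothing, pop A, push w
    for a in terminals:
        moves.append((a, a, ()))             # terminal a: read a, pop a, push nothing
    return moves


def pda_accepts(w, rules, terminals, start):
    """Search all nondeterministic runs; assumes the grammar has no ε-rules."""
    moves = cfg_to_pda_moves(rules, terminals)
    first = (0, (start,))                    # (input position, stack with top at index 0)
    seen, queue = {first}, deque([first])
    while queue:
        pos, stack = queue.popleft()
        if pos == len(w) and not stack:
            return True                      # input consumed and stack empty
        if not stack or len(stack) > len(w) - pos:
            continue                         # every stack symbol still needs at least one input symbol
        for read, pop, push in moves:
            if stack[0] != pop:
                continue
            if read and w[pos] != read:
                continue
            nxt = (pos + len(read), push + stack[1:])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


RULES = {"S": [("0", "S", "1"), ("0", "1")]}   # hypothetical sample grammar
print(pda_accepts("0011", RULES, "01", "S"), pda_accepts("0101", RULES, "01", "S"))
```

The stack always holds the part of a leftmost derivation that has not yet been matched against the input, which is why expanding the top variable and matching the top terminal are the only two kinds of move.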
NPDA, CFG equivalence
Proof of (⇐): L is recognized by an NPDA
implies L is described by a CFG.
– harder direction
– first step: convert NPDA into “normal form”:
• single accept state
• empties stack before accepting
• each transition either pushes or pops a symbol
NPDA, CFG equivalence
– main idea: non-terminal A_{p,q} generates exactly the
strings that take the NPDA from state p (w/ empty
stack) to state q (w/ empty stack)
– then A_{start,accept} generates all of the strings in the
language recognized by the NPDA.
NPDA, CFG equivalence
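• Two possibilities to get from state p to q. First possibility: the stack becomes empty at an intermediate state r.
[Figure: stack height plotted against input position for a string taking the NPDA from p to q. The height returns to zero at state r; the part read from p to r is generated by A_{p,r}, and the part read from r to q is generated by A_{r,q}.]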
NPDA, CFG equivalence
• NPDA P = (Q, Σ, Γ, δ, start, {accept})
• CFG G:
– non-terminals V = {A_{p,q} : p, q ∈ Q}
– start variable A_{start,accept}
– productions:
for every p, r, q ∈ Q, add the rule
A_{p,q} → A_{p,r} A_{r,q}
NPDA, CFG equivalence
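• Two possibilities to get from state p to q. Second possibility: the stack is empty only at the beginning and at the end.
[Figure: stack height plotted against input position for a string taking the NPDA from p to q. The first move (p to r) pushes d, the last move (s to q) pops the same d, and the part read in between is generated by A_{r,s}.]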
NPDA, CFG equivalence
• NPDA P = (Q, Σ, Γ, δ, start, {accept})
• CFG G:
– non-terminals V = {A_{p,q} : p, q ∈ Q}
– start variable A_{start,accept}
– productions:
for every p, r, s, q ∈ Q, d ∈ Γ, and a, b ∈ (Σ ∪ {ε}):
if (r, d) ∈ δ(p, a, ε) (from state p, read a, push d, move to state r), and
(q, ε) ∈ δ(s, b, d) (from state s, read b, pop d, move to state q), add the rule
A_{p,q} → a A_{r,s} b
NPDA, CFG equivalence
• NPDA P = (Q, Σ, Γ, δ, start, {accept})
• CFG G:
– non-terminals V = {A_{p,q} : p, q ∈ Q}
– start variable A_{start,accept}
– productions:
for every p ∈ Q, add the rule
A_{p,p} → ε
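The three families of productions can be generated mechanically. The sketch below is one possible encoding, not the slides' own: a variable A_{p,q} is the pair (p, q), a rule is a pair (head, body), push moves are tuples (p, a, d, r) and pop moves are tuples (s, b, d, q), with "" standing for ε. The sample machine is the 0^n1^n PDA of Example 2.14 with q4 assumed to be the only accept state, so it is in normal form and recognizes the strings with n ≥ 1.

```python
from itertools import product


def pda_to_cfg(states, pushes, pops):
    """Return the set of rules (head, body) of the grammar built from a normal-form PDA."""
    rules = set()
    # A_{p,q} -> A_{p,r} A_{r,q} for all states p, r, q
    for p, r, q in product(states, repeat=3):
        rules.add(((p, q), ((p, r), (r, q))))
    # A_{p,q} -> a A_{r,s} b whenever a push of d is matched by a pop of the same d
    for (p, a, d, r), (s, b, e, q) in product(pushes, pops):
        if d == e:
            body = tuple(x for x in (a, (r, s), b) if x != "")  # drop ε terminals
            rules.add(((p, q), body))
    # A_{p,p} -> ε for every state p
    for p in states:
        rules.add(((p, p), ()))
    return rules


STATES = ["q1", "q2", "q3", "q4"]
PUSHES = {("q1", "", "$", "q2"), ("q2", "0", "0", "q2")}
POPS = {("q2", "1", "0", "q3"), ("q3", "1", "0", "q3"), ("q3", "", "$", "q4")}

rules = pda_to_cfg(STATES, PUSHES, POPS)
for head, body in sorted(rules, key=repr):
    if len(body) != 2:          # skip the many A_{p,q} -> A_{p,r} A_{r,q} rules when printing
        print(head, "->", body)
```

For this machine the useful rules are A_{q1,q4} → A_{q2,q3}, A_{q2,q3} → 0 A_{q2,q3} 1, A_{q2,q3} → 0 A_{q2,q2} 1 and A_{q2,q2} → ε, which together derive exactly the strings 0^n1^n with n ≥ 1.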
NPDA, CFG equivalence
• two claims to verify correctness:
1. if A_{p,q} generates string x, then x can take
NPDA P from state p (w/ empty stack) to q
(w/ empty stack)
2. if x can take NPDA P from state p (w/ empty
stack) to q (w/ empty stack), then A_{p,q}
generates string x
NPDA, CFG equivalence
1. if A_{p,q} generates string x, then x can take
NPDA P from state p (w/ empty stack) to q
(w/ empty stack)
– induction on length of derivation of x.
– base case: 1 step derivation. must have only
terminals on rhs. In G, must be production of
form A_{p,p} → ε.
NPDA, CFG equivalence
1. if A_{p,q} generates string x, then x can take
NPDA P from state p (w/ empty stack) to q
(w/ empty stack)
– assume true for derivations of length at most k,
prove for length k+1.
– verify case: A_{p,q} → A_{p,r} A_{r,q} →^k x = yz
– verify case: A_{p,q} → a A_{r,s} b →^k x = ayb
NPDA, CFG equivalence
2. if x can take NPDA P from state p (w/
empty stack) to q (w/ empty stack), then
A_{p,q} generates string x
– induction on # of steps in P’s computation
– base case: 0 steps. starts and ends at same state
p. only has time to read empty string ε.
– G contains A_{p,p} → ε.
NPDA, CFG equivalence
2. if x can take NPDA P from state p (w/
empty stack) to q (w/ empty stack), then
A_{p,q} generates string x
– induction step. assume true for computations of
length at most k, prove for length k+1.
– if stack becomes empty sometime in the middle
of the computation (at state r)
• y is read going from state p to r (A_{p,r} →* y)
• z is read going from state r to q (A_{r,q} →* z)
• conclude: A_{p,q} → A_{p,r} A_{r,q} →* yz = x
NPDA, CFG equivalence
2. if x can take NPDA P from state p (w/
empty stack) to q (w/ empty stack), then
A_{p,q} generates string x
– if stack becomes empty only at beginning and
end of computation.
• first step: state p to r, read a, push d
• go from state r to s, read string y (A_{r,s} →* y)
• last step: state s to q, read b, pop d
• conclude: A_{p,q} → a A_{r,s} b →* ayb = x
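PDA → CFG conversion
Summary of the construction: G has a variable A_{p,q} for every pair of states p, q ∈ Q, the start variable A_{start,accept}, and three kinds of rules: A_{p,q} → A_{p,r} A_{r,q}, A_{p,q} → a A_{r,s} b for each matching push/pop pair, and A_{p,p} → ε.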
Non-CF Languages
The language L = { a^n b^n c^n | n ≥ 0 } does not appear to be
context-free.
Informal: A PDA can compare the number of a’s with the number of b’s. But by the
time the b’s are processed, the stack is empty, so it is not possible to
compare the number of a’s with the number of c’s.
The problem of A ⇒* vAy:
If S ⇒* uAz ⇒* uvAyz ⇒* uvxyz ∈ L,
then S ⇒* uAz ⇒* uvAyz ⇒* … ⇒* uv^i A y^i z
⇒* uv^i x y^i z ∈ L as well, for all i = 0, 1, 2, …
Pumping Lemma for CFLs
Idea: If we can prove the existence of derivations
for elements of the CFL L that use the step
A ⇒* vAy, then a new form of ‘v-y pumping’
holds: (A ⇒* vAy ⇒* v^2 A y^2 ⇒* v^3 A y^3 ⇒* …)
Observation: We can prove this existence if the parse tree is tall enough.
Recall Parse Trees
Parse tree for S ⇒ AbbcBa ⇒* cbbccccaBca ⇒ cbbccccacca
[Figure: parse tree with root S and children A, b, b, c, B, a; its leaves, read left to right, spell cbbccccacca.]
Pumping a Parse Tree
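[Figure: parse tree with root S in which a variable A occurs twice on one root-to-leaf path. The leaves read u v x y z; the upper A yields vxy and the lower A yields x.]
If s = uvxyz ∈ L is long, then its parse tree is tall.
Hence, there is a path on which a variable A
repeats itself. We can pump this A-A part.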
A Tree Tall Enough
Let L be a context-free language, and let G be its
grammar with at most b symbols on the right side of
the rules: A → X_1 … X_b
A parse tree of depth h produces a string of length
at most b^h. Long strings imply tall trees.
Let |V| be the number of variables of G. If h = |V|+2 or
bigger, then there is a variable on a ‘top-down path’ that
occurs more than once.
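uvxyz ∈ L
[Figure: parse tree of uvxyz; the upper A yields vxy and the lower A yields x.]
By repeating the A-A part we get…
uv^2xy^2z ∈ L
[Figure: parse tree of uv^2xy^2z, obtained by substituting a copy of the upper A subtree for the lower A.]
… while removing the A-A part gives…
uxz ∈ L
[Figure: parse tree of uxz, obtained by substituting the lower A subtree for the upper A.]
In general uv^i x y^i z ∈ L for all i = 0, 1, 2, …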
Pumping Lemma for CFL
For every context-free language L, there is a pumping
length p, such that for every string s ∈ L with |s| ≥ p, we can
write s = uvxyz with
1) uv^i x y^i z ∈ L for every i ∈ {0, 1, 2, …}
2) |vy| ≥ 1
3) |vxy| ≤ p
Note that
1) implies that uxz ∈ L
2) says that v and y cannot both be empty strings
Condition 3) is not always used. (It is not a crucial part of the
pumping lemma, but helps to reduce the number of cases.)
Formal Proof of Pumping Lemma
Let G = (V, Σ, R, S) be the grammar of a CFL.
Maximum size of rules is b ≥ 2: A → X_1 … X_b
A string s requires a minimum tree depth ≥ log_b |s|.
If |s| ≥ p = b^{|V|+2}, then tree depth ≥ |V|+2, hence
there is a path and a variable A where A repeats
itself: S ⇒* uAz ⇒* uvAyz ⇒* uvxyz
It follows that uv^i x y^i z ∈ L for all i = 0, 1, 2, …
Furthermore:
|vy| ≥ 1 because the tree is minimal
|vxy| ≤ p because any bottom subtree with ≥ p leaves
already has a ‘repeating path’, so we can pick the lowest repetition
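The counting step of this proof can be spelled out as follows. This is a sketch in the slide's own symbols (b, h, |V|, p); the only added ingredient is the pigeonhole principle.

```latex
% A parse tree of height h whose nodes have at most b children has at most b^h leaves.
\[
  |s| \le b^{h} \quad\Longrightarrow\quad h \ge \log_b |s| .
\]
% With the slide's choice p = b^{|V|+2}:
\[
  |s| \ge p = b^{|V|+2} \quad\Longrightarrow\quad h \ge \log_b b^{|V|+2} = |V| + 2 .
\]
% A longest root-to-leaf path has h edges: h variable nodes followed by one leaf.
\[
  h \ge |V| + 2 > |V| \quad\Longrightarrow\quad \text{some variable } A \text{ occurs twice on that path.}
\]
```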
Pumping lemma for { a^n b^n c^n | n ≥ 0 }
Assume that B = { a^n b^n c^n | n ≥ 0 } is a CFL
Let p be the pumping length, and s = a^p b^p c^p ∈ B
P.L.: s = uvxyz = a^p b^p c^p, with uv^i x y^i z ∈ B for all i ≥ 0
Options for vxy:
1) The strings v and y are uniform
(v = a…a and y = c…c, for example).
Then v and y together touch at most two of the three letters, so uv^2xy^2z will not contain the same number
of a’s, b’s and c’s, hence uv^2xy^2z ∉ B
2) At least one of v or y is not uniform (i.e., it has at
least two different symbols occurring in it).
Then uv^2xy^2z will not be of the form a…ab…bc…c
Hence uv^2xy^2z ∉ B
Pumping lemma applied to {anbncn} continued
Assume that B = { a^n b^n c^n | n ≥ 0 } is a CFL
Let p be the pumping length, and s = a^p b^p c^p ∈ B
P.L.: s = uvxyz = a^p b^p c^p, with uv^i x y^i z ∈ B for all i ≥ 0
We showed: For every way of partitioning s into uvxyz,
there is an i such that uv^i x y^i z is not in B. Contradiction.
B is not a context-free language.
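The case analysis can be sanity-checked by brute force for small values of p. This is only an illustration, not a proof: the true pumping length is unknown, so the sketch simply treats p as a parameter, enumerates every split s = uvxyz with |vy| ≥ 1 and |vxy| ≤ p, and checks that pumping with i = 0 or i = 2 leaves B. All function names are invented for this example.

```python
def in_B(s):
    """Membership in B = { a^n b^n c^n | n >= 0 }."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n


def decompositions(s, p):
    """Yield every (u, v, x, y, z) with s = uvxyz, |vy| >= 1 and |vxy| <= p."""
    n = len(s)
    for i in range(n + 1):                               # u = s[:i]
        for j in range(i, n + 1):                        # v = s[i:j]
            for k in range(j, n + 1):                    # x = s[j:k]
                for l in range(k, min(n, i + p) + 1):    # y = s[k:l], with l - i <= p
                    if (j - i) + (l - k) >= 1:
                        yield s[:i], s[i:j], s[j:k], s[k:l], s[l:]


def pump(u, v, x, y, z, i):
    return u + v * i + x + y * i + z


def every_split_fails(p):
    """True iff each split of a^p b^p c^p leaves B for i = 0 or i = 2."""
    s = "a" * p + "b" * p + "c" * p
    return all(
        any(not in_B(pump(*d, i)) for i in (0, 2))
        for d in decompositions(s, p)
    )


print([every_split_fails(p) for p in (1, 2, 3)])
```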
Another example
Proof that C = { a^i b^j c^k | 0 ≤ i ≤ j ≤ k } is not context-free.
Let p be the pumping length, and s = a^p b^p c^p ∈ C
P.L.: s = uvxyz, such that uv^i x y^i z ∈ C for every i ≥ 0
vxy can’t have both a’s and c’s. Why? Because |vxy| ≤ p and the a’s and c’s of s are separated by p b’s.
So only two options for vxy:
1) vxy belongs to a*b*, then the string uv^2xy^2z has
not enough c’s, hence uv^2xy^2z ∉ C
2) vxy belongs to b*c*, then the string uv^0xy^0z = uxz
has too many a’s, hence uv^0xy^0z ∉ C
Contradiction: C is not a context-free language.
D = { ww | w ∈ {0,1}* } (Ex. 2.22)
Carefully take the strings s ∈ D.
Let p be the pumping length, take s = 0^p 1^p 0^p 1^p.
Three options for s = uvxyz with 1 ≤ |vxy| ≤ p:
1) If vxy lies to the left of the middle | in 0^p1^p|0^p1^p, then the second
half of uv^2xy^2z starts with “1”, hence uv^2xy^2z ∉ D
2) Same reasoning if vxy lies to the right
of the middle of 0^p1^p|0^p1^p, hence uv^2xy^2z ∉ D
3) If vxy straddles the middle of 0^p1^p|0^p1^p, then uxz
equals 0^p 1^i 0^j 1^p ∉ D (because i or j < p)
Contradiction: D is not context-free.
Pumping lemma for CFG - remarks
Using the CFL pumping lemma is more difficult
than the pumping lemma for regular languages.
You have to choose the string s carefully, and divide the
options efficiently.
Additional CFL properties would be helpful (like we had
for regular languages).
What about closure under standard operations?
Union Closure Properties
Lemma: Let A1 and A2 be two CF languages, then the
union A1 ∪ A2 is context free as well.
Proof: Assume that the two grammars are
G1 = (V1, Σ, R1, S1) and G2 = (V2, Σ, R2, S2), with variables renamed if necessary so that V1 and V2 are disjoint.
Construct a third grammar G3 = (V3, Σ, R3, S3) by:
V3 = V1 ∪ V2 ∪ { S3 } (new start variable) with
R3 = R1 ∪ R2 ∪ { S3 → S1 | S2 }.
It follows that L(G3) = L(G1) ∪ L(G2).
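The union construction is short enough to sketch in code. The representation is an assumption made for this example: a grammar is a dict from variable to a list of rule bodies (tuples of symbols), and the two sample grammars for { a^n b^n } and { b^n a^n } with n ≥ 1 are hypothetical. Both deliberately use the same variable name S, which is why the sketch renames variables before merging. The helper `strings_up_to` only works for grammars without ε-rules.

```python
def union_grammar(rules1, start1, rules2, start2, new_start="S3"):
    """Build G3 with S3 -> S1 | S2 after making the variable sets disjoint."""
    def tag(rules, suffix):
        ren = {v: v + suffix for v in rules}
        tagged = {ren[v]: [tuple(ren.get(x, x) for x in body) for body in bodies]
                  for v, bodies in rules.items()}
        return tagged, ren

    r1, ren1 = tag(rules1, "_1")
    r2, ren2 = tag(rules2, "_2")
    rules3 = {**r1, **r2}
    rules3[new_start] = [(ren1[start1],), (ren2[start2],)]
    return rules3, new_start


def strings_up_to(rules, start, max_len):
    """All terminal strings of length <= max_len; assumes no ε-rules, so forms never shrink."""
    seen, frontier, out = set(), [(start,)], set()
    while frontier:
        form = frontier.pop()
        if form in seen or len(form) > max_len:
            continue
        seen.add(form)
        idx = next((i for i, x in enumerate(form) if x in rules), None)
        if idx is None:
            out.add("".join(form))       # no variables left: a terminal string
            continue
        for body in rules[form[idx]]:    # expand the leftmost variable
            frontier.append(form[:idx] + body + form[idx + 1:])
    return out


G1 = {"S": [("a", "S", "b"), ("a", "b")]}
G2 = {"S": [("b", "S", "a"), ("b", "a")]}
G3, S3 = union_grammar(G1, "S", G2, "S")
print(sorted(strings_up_to(G3, S3, 4)))
```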
Intersection, Complement?
Let again A1 and A2 be two CF languages.
One can prove that, in general,
the intersection A1 ∩ A2,
and
the complement Ā1 = Σ* \ A1
are not necessarily context-free languages.
Intersection, Complement?
Proof for complement:
Recall that a problem in HW 5 shows that
L = { x#y | x, y ∈ {a, b}*, x ≠ y } IS context-free.
The complement of this language is
L’ = { w | w has no # symbol } ∪
{ w | w has two or more # symbols } ∪
{ w#w | w ∈ {a, b}* }.
We can show that L’ is NOT context-free.
Context-free languages are NOT closed under
intersection
Proof by counterexample: Recall that in an earlier slide in
this lecture, we showed that
L = { a^n b^n c^n | n ≥ 0 } is NOT context-free.
Let
A = { a^n b^n c^m | n, m ≥ 0 } and
B = { a^n b^m c^m | n, m ≥ 0 }. It is easy to see that both A
and B are context-free. (Design CFG’s.)
But A ∩ B = L, which is not context-free.
This shows that CFLs are not closed under intersection.
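The key fact behind the counterexample is that A ∩ B is exactly { a^n b^n c^n | n ≥ 0 }. The sketch below checks this by brute force on all strings over {a, b, c} up to length 6; the membership tests are written directly from the set definitions and their names are made up here.

```python
import re
from itertools import product


def blocks(s):
    """Return (#a, #b, #c) if s has the shape a*b*c*, else None."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return tuple(len(g) for g in m.groups()) if m else None


def in_A(s):            # A = { a^n b^n c^m }
    t = blocks(s)
    return t is not None and t[0] == t[1]


def in_B(s):            # B = { a^n b^m c^m }
    t = blocks(s)
    return t is not None and t[1] == t[2]


def in_L(s):            # L = { a^n b^n c^n }
    t = blocks(s)
    return t is not None and t[0] == t[1] == t[2]


words = ["".join(t) for n in range(7) for t in product("abc", repeat=n)]
print(all((in_A(w) and in_B(w)) == in_L(w) for w in words))
```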