Transcript Chapter 5

Chapter 8
Properties of Context-free Languages
These class notes are based on material from our textbook, An
Introduction to Formal Languages and Automata, 4th ed.,
by Peter Linz.
The pumping lemma for context-free
languages:
Suppose you have a CFG G in which the
variable A is used in two different rules, to derive
two different strings, e.g.,
(1) S → vAz
(2) A → wAy
(3) A → x
We can use these rules, applying rule 2 recursively,
to generate the following string:
S ⇒ vAz ⇒ vwAyz ⇒ vwwAyyz ⇒
vwwwAyyyz ⇒ etc. ⇒ v w^n x y^n z.
Of course, we can apply rule 3 at any point
along the way to bring the process to a halt. Thus,
the following strings are all legitimate strings in the
language:
vwxyz, vwwxyyz, vwwwxyyyz, etc.
In fact, with rules 2 and 3 in the grammar, there is
no way to prevent the language from containing an
infinite number of strings of the form v w^n x y^n z.
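The repeated use of rule 2 can be mimicked mechanically. The following is a minimal Python sketch, assuming the three rules S → vAz, A → wAy, A → x and treating v, w, x, y, z as literal placeholder strings that do not themselves contain the letter A. The function name derive is our own and does not come from the textbook.

```python
def derive(n, v="v", w="w", x="x", y="y", z="z"):
    """Apply rule 2 (A -> wAy) n times, then finish with rule 3 (A -> x).

    Assumes none of the placeholder strings contains the letter "A".
    """
    sentential = v + "A" + z                                # rule 1: S -> vAz
    for _ in range(n):
        sentential = sentential.replace("A", w + "A" + y)   # rule 2
    return sentential.replace("A", x)                       # rule 3 stops the recursion


for n in range(4):
    print(n, derive(n))
```

Each value of n gives one string of the form v w^n x y^n z, so the loop could be continued indefinitely.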
Remember the definition of Chomsky Normal
Form grammars: A CFG is in Chomsky Normal
Form if every production is of one of these two
types:
A → BC
A → a
Remember also that we can put any CFG
into CNF (omitting the null string, if it belongs to
the original language).
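The two permitted rule shapes can be checked directly. This is a small sketch in Python, assuming a grammar is stored as a list of (head, body) pairs where body is a tuple of symbols. That representation and the name is_cnf are our own choices for illustration.

```python
def is_cnf(productions, variables):
    """Return True if every rule has the form A -> BC or A -> a."""
    for head, body in productions:
        if head not in variables:
            return False
        two_variables = len(body) == 2 and all(s in variables for s in body)
        one_terminal = len(body) == 1 and body[0] not in variables
        if not (two_variables or one_terminal):
            return False
    return True


variables = {"S", "A", "B"}
cnf_rules = [("S", ("A", "B")), ("A", ("a",)), ("B", ("b",))]
other_rules = [("S", ("a", "S", "b")), ("S", ("a", "b"))]
print(is_cnf(cnf_rules, variables))    # True
print(is_cnf(other_rules, variables))  # False
```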
If a grammar is in CNF, then its derivation tree will
be binary; that is, every node will have at most two
children. Why? There are only 3 possibilities:
(1) The node represents the first type of rule
above, in which a single variable produces two
variables.
(2) The node represents the second type of rule
above, in which a single variable produces a single
terminal.
(3) The node is a terminal node and so has no
children.
A path in a binary tree is either empty, or consists
of a node, one of its descendants, and all of the nodes in
between.
The length of a path is the number of nodes it
contains (for this class, we will use this definition;
however, most of the time length and height are defined in terms
of the number of edges, not the number of nodes).
The height of a binary tree is the length of its
longest path.
You could create a very tall binary tree by
having all branches be unary. You can create the
shortest possible binary tree by having all of its
branches be binary, except possibly for some or all
of the branches at the bottom level of the tree.
•What is the smallest height possible in a binary tree of 7
nodes? How many leaf nodes does it have?
•What is the smallest height possible in a binary tree of 15
nodes? How many leaf nodes does it have?
•What is the smallest height possible in a binary tree of 31
nodes? How many leaf nodes does it have?
•What is the smallest height possible in a binary tree of
2^n − 1 nodes? How many leaf nodes does it have?
Note the pattern here:
In a completely filled binary tree with 2^n − 1
nodes, half of the nodes (rounding up) will be
leaves. That is, 2^n / 2 nodes will be leaf nodes.
And we can rewrite 2^n / 2 as 2^(n−1). This gives us:
Lemma:
For any h ≥ 1, a binary tree which has more
than 2^(h−1) leaf nodes must have a height greater than
h.
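The arithmetic behind the lemma can be tabulated with a short Python sketch. It uses this class's convention that height is counted in nodes, and the helper names min_height and max_leaves are our own.

```python
def min_height(nodes):
    """Smallest height (counted in nodes) of a binary tree with `nodes` nodes."""
    height = 0
    capacity = 0                      # nodes that fit in a full tree of this height
    while capacity < nodes:
        height += 1
        capacity = 2 ** height - 1
    return height


def max_leaves(height):
    """Most leaves a binary tree of the given height (in nodes) can have."""
    return 2 ** (height - 1)


for n in (7, 15, 31):
    h = min_height(n)
    print(n, "nodes: height", h, "leaves", max_leaves(h))
```

For 7, 15, and 31 nodes this gives heights 3, 4, and 5 with 4, 8, and 16 leaves, matching the 2^(n−1) pattern.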
Here is the point of all this:
If the height of the derivation tree for a given
string in the language is h, and there are fewer than
h production rules in the grammar, then at least one
rule must recur on the same path in the derivation
of this string.
Theorem 8.1:
Let L be a CFL. Then there is an integer m
so that for any w ∈ L satisfying |w| ≥ m, there are
strings u, v, x, y, and z satisfying
w = uvxyz
|vy| > 0
|vxy| ≤ m
for any i ≥ 0, uv^i x y^i z ∈ L
We can use the pumping lemma for context-free
languages to prove that a given language is not
context-free.
We do this by assuming that the language is
context-free; this means that there must be an m
satisfying the conditions given above.
If we find that this causes a contradiction,
then we know the language can’t be a CFL.
Example:
•Given the language L = {a^i b^i c^i | i ≥ 1}, assume
that L is context-free.
•Let w = a^m b^m c^m, so that |w| ≥ m.
•Since |vy| > 0, v and y together contain at least
one type of symbol.
•Since |vxy| ≤ m, the string vxy can contain at
most two distinct types of symbols.
•The string uv^2 x y^2 z contains additional occurrences
of the symbols in v and y, and no additional occurrences
of the symbol that does not appear in vxy.
•Therefore, uv^2 x y^2 z cannot contain equal numbers
of all three symbols.
•But the pumping lemma says that uv^2 x y^2 z must be
a legitimate string in L. This is a
contradiction.
•Consequently, L cannot be a context-free
language.
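The argument above can be spot-checked by brute force for one small value of m. The sketch below, in Python, tries every split w = uvxyz with |vxy| ≤ m and |vy| > 0 and tests whether pumping with i = 0 and i = 2 stays in the language. It is an illustration for a fixed m, not a proof, and the names pumpable and in_abc are our own.

```python
def pumpable(w, m, member, powers=(0, 2)):
    """True if some split w = uvxyz with |vxy| <= m and |vy| > 0
    keeps u v^t x y^t z in the language for every t in `powers`."""
    n = len(w)
    for i in range(n + 1):                      # u = w[:i]
        end = min(i + m, n)                     # enforces |vxy| <= m
        for j in range(i, end + 1):             # v = w[i:j]
            for p in range(j, end + 1):         # x = w[j:p]
                for q in range(p, end + 1):     # y = w[p:q]
                    u, v, x, y, z = w[:i], w[i:j], w[j:p], w[p:q], w[q:]
                    if not (v or y):            # enforces |vy| > 0
                        continue
                    if all(member(u + v * t + x + y * t + z) for t in powers):
                        return True
    return False


def in_abc(s):
    """Membership test for {a^i b^i c^i | i >= 1}."""
    n = len(s) // 3
    return n >= 1 and s == "a" * n + "b" * n + "c" * n


m = 4
print(pumpable("a" * m + "b" * m + "c" * m, m, in_abc))  # False
```

No split survives, which is what the pumping-lemma argument predicts for this string.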
Example:
Given the language L = {a^i b^i c^i | i ≥ 1}, how would
you try to process this language using a push-down
automaton?
We can ensure that we have an equal number of a’s
and b’s by pushing the a’s onto the stack one at a
time, then popping them off and matching them up
with the b’s one by one.
However, once we have done that, we don’t have
anything left to match the c’s with, so we can’t
guarantee that we have the same number of c’s as
a’s and b’s.
We can’t solve this problem by pushing a’s or b’s
back onto the stack.
This is due to the limitations of the type of memory
we have in a PDA.
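The push-then-pop strategy can be sketched in Python for the a's and b's alone. This is a simplified deterministic simulation of the stack discipline, not a formal PDA definition, and the function name accepts_anbn is our own.

```python
def accepts_anbn(s):
    """Accept a^n b^n (n >= 1) by pushing each a and popping one a per b."""
    stack = []
    i = 0
    while i < len(s) and s[i] == "a":              # push phase
        stack.append("a")
        i += 1
    while i < len(s) and s[i] == "b" and stack:    # pop phase
        stack.pop()
        i += 1
    # Accept only if all input was consumed and every a was matched.
    return len(s) > 0 and i == len(s) and not stack


print(accepts_anbn("aaabbb"))  # True
print(accepts_anbn("aaabb"))   # False
```

Once the last b has been matched the stack is empty, so no record of n remains to compare against a following block of c's.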
Pumping lemma (again)
•The pumping lemma for regular languages
states: every sufficiently long string in a
regular language contains a short substring
that can be pumped.
•The pumping lemma for context-free
languages states: every sufficiently long
string in a context-free language contains two
short (and close-together) substrings that can
be pumped (the same number of times).

Formal statement (again)
Let L be a context-free language. Then there exists
some positive integer m such that any string w ∈ L
of length |w| ≥ m can be decomposed into
substrings u, v, x, y, z such that w = uvxyz, and
|vxy| ≤ m,
|v| > 0 or |y| > 0,
uv^k x y^k z ∈ L, for all k ≥ 0.
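To see what a successful decomposition looks like, here is a small Python sketch for the context-free language {a^n b^n | n ≥ 0}. The language, the particular split, and the helper names pump and in_anbn are our own illustrative choices.

```python
def pump(u, v, x, y, z, k):
    """Return u v^k x y^k z."""
    return u + v * k + x + y * k + z


def in_anbn(s):
    """Membership test for {a^n b^n | n >= 0}."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "a" * n + "b" * n


# One valid split of w = aaabbb: v and y sit on either side of the midpoint.
u, v, x, y, z = "aa", "a", "", "b", "bb"
for k in range(4):
    w_k = pump(u, v, x, y, z, k)
    print(k, w_k, in_anbn(w_k))
```

Here |vxy| = 2 and |vy| = 2, and every pumped string has equally many a's and b's, so all of them stay in the language.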
Informal statement
Every context-free language has a “pumping
length” such that every string in the language
that is longer than this can be pumped to yield
another string in the language.
The string can be divided into five parts such
that the second and fourth parts can be repeated
together, or “pumped,” any number of times,
and the resulting string remains in the language.
What is m?
In the pumping lemma for regular languages, the
“pumping length” m reflects the number of states
of the finite automaton.
In the pumping lemma for context-free
languages, what does m reflect? Roughly, it is
the length of the longest string that can be
generated by a parse tree in which the same
nonterminal never occurs twice on the same path
through the tree.
In a sufficiently large parse tree, some
nonterminal must repeat along some path from
the root. This follows from the pigeonhole
principle.
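[Figure: parse tree rooted at S in which the nonterminal A occurs twice on one path; the leaves, read left to right, spell u v x y z.]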
Proof Idea
•The repetition of some nonterminal along a
path through the parse tree allows us to
replace the subtree under the last occurrence
of the nonterminal with the subtree under an
earlier occurrence of the nonterminal and still
get a valid parse tree.
•This corresponds to pumping v and y.
•Note that the parse tree of the previous slide
corresponds to the following derivation:
S ⇒* uAz ⇒* uvAyz ⇒* uvxyz
Important to remember
You can use a pumping lemma to prove that a
language is not context-free (or regular).
You cannot use a pumping lemma to prove that
a language is context-free (or regular).
Exercise
The language L = {tt | t ∈ {a,b}*} is not context-free.
Pick a string in L. Try a^m b^m a^m b^m. Then note that
you must consider three cases. It must be the case
that vxy is a substring of the prefix a^m b^m, or the
“middle” b^m a^m, or the suffix a^m b^m.
Intuitively, why can’t a PDA accept this language,
although it can accept the language {tt^R | t ∈
{a,b}*}?
Pumping Lemma for
Linear Languages
Let L be an infinite linear language. Then there
exists some positive integer m such that any
w ∈ L with |w| ≥ m can be decomposed as w =
uvxyz with
|uvyz| ≤ m
|vy| ≥ 1
such that
uv^i x y^i z ∈ L
for all i = 0, 1, 2, …
Note that the conclusion for this theorem is different
from Theorem 8.1, since in 8.1 we have
|vxy| ≤ m
and in Theorem 8.2 we have
|uvyz| ≤ m
This implies that the strings v and y to be pumped must
now be within m symbols of the left and right ends of w,
respectively. The middle string x can be of arbitrary
length.
Theorem 8.2 helps establish the fact that the family of
linear languages is a proper subset of the family of
context-free languages.
Closure properties for context-free languages:
The family of context-free languages is closed
under the operations of:
Union
Concatenation
Kleene closure
but not under the operations of
Intersection
Complementation
Definition
A context-free grammar (CFG) is a 4-tuple
G = (V, T, S, P) where V and T are disjoint
sets, S ∈ V, and P is a finite set of rules of the
form A → x, where A ∈ V and x ∈ (V ∪ T)*.
V = non-terminals or variables
T = terminals
S = Start symbol
P = Productions or grammar rules
Closure properties of CFLs
CFLs are closed under Union, Concatenation
and Kleene closure.
Proof by construction:
Let
G1 = (V1, T1, S1, P1) and
G2 = (V2, T2, S2, P2)
with
L1 = L(G1) and
L2 = L(G2)
Union
We create grammar Gu = (Vu, T1 ∪ T2, Su, Pu)
generating
L1 ∪ L2
1. Rename the elements of V2 if necessary so that
V1 ∩ V2 = ∅.
2. Create a new start symbol Su, not already in V1
or V2.
3. Set Vu = V1 ∪ V2 ∪ {Su}
4. Set Pu = P1 ∪ P2 ∪ {Su → S1 | S2}
Construction completed.
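The four steps can be written out as a short Python sketch. It assumes a grammar is stored as a tuple (variables, terminals, start, productions), with productions as (head, body) pairs, and that step 1 (renaming) has already been done. The representation and the name union_grammar are our own.

```python
def union_grammar(g1, g2, new_start="S_u"):
    """Build a grammar for L(g1) ∪ L(g2) from two grammars with disjoint variables."""
    v1, t1, s1, p1 = g1
    v2, t2, s2, p2 = g2
    if v1 & v2:
        raise ValueError("rename the variables of g2 first (step 1)")
    if new_start in v1 | v2:
        raise ValueError("the new start symbol must be fresh (step 2)")
    variables = v1 | v2 | {new_start}                                   # step 3
    productions = p1 + p2 + [(new_start, (s1,)), (new_start, (s2,))]    # step 4
    return variables, t1 | t2, new_start, productions


# Hypothetical example grammars: g1 generates a+, g2 generates b+.
g1 = ({"S1"}, {"a"}, "S1", [("S1", ("a", "S1")), ("S1", ("a",))])
g2 = ({"S2"}, {"b"}, "S2", [("S2", ("b", "S2")), ("S2", ("b",))])
variables, terminals, start, productions = union_grammar(g1, g2)
print(start, sorted(variables), sorted(terminals))
for head, body in productions:
    print(head, "->", " ".join(body))
```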
Concatenation
We create grammar Gc = (Vc, T1 ∪ T2, Sc, Pc)
generating L1L2
1. Rename the elements of V2 if necessary so that
V1 ∩ V2 = ∅.
2. Create a new start symbol Sc, not already in V1
or V2.
3. Set Vc = V1 ∪ V2 ∪ {Sc}
4. Set Pc = P1 ∪ P2 ∪ {Sc → S1S2}
Construction completed.
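A matching Python sketch for concatenation, under the same assumptions: grammars are tuples (variables, terminals, start, productions), productions are (head, body) pairs, and the variable sets are already disjoint. The name concat_grammar is our own.

```python
def concat_grammar(g1, g2, new_start="S_c"):
    """Build a grammar for L(g1)L(g2) from two grammars with disjoint variables."""
    v1, t1, s1, p1 = g1
    v2, t2, s2, p2 = g2
    if v1 & v2:
        raise ValueError("rename the variables of g2 first (step 1)")
    if new_start in v1 | v2:
        raise ValueError("the new start symbol must be fresh (step 2)")
    variables = v1 | v2 | {new_start}                 # step 3
    productions = p1 + p2 + [(new_start, (s1, s2))]   # step 4: Sc -> S1 S2
    return variables, t1 | t2, new_start, productions


# Hypothetical example grammars: g1 generates a+, g2 generates b+.
g1 = ({"S1"}, {"a"}, "S1", [("S1", ("a", "S1")), ("S1", ("a",))])
g2 = ({"S2"}, {"b"}, "S2", [("S2", ("b", "S2")), ("S2", ("b",))])
variables, terminals, start, productions = concat_grammar(g1, g2)
for head, body in productions:
    print(head, "->", " ".join(body))
```

The only difference from the union construction is the single new rule, which forces a string of L1 to be followed by a string of L2.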
Closure under Kleene star
Let G1 be any context-free grammar with the
starting symbol S, where S does not appear on the
right-hand side of any rule. Adding the rules
S → λ
and
S → SS
creates a new context-free grammar G2 such
that L(G2) is the result of applying the Kleene
star operator to L(G1).
Kleene Closure
We create grammar G* = (V*, T1, S, P*) generating
L1*
1. Create a new start symbol S, not already in V1.
2. Set V* = V1 ∪ {S}
3. Set P* = P1 ∪ {S → S1S | λ}
Construction completed. (See text for
justification.)
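The same construction as a Python sketch, assuming grammars are tuples (variables, terminals, start, productions) with productions as (head, body) pairs. The empty tuple stands for λ, and the name star_grammar is our own.

```python
def star_grammar(g1, new_start="S_star"):
    """Build a grammar for L(g1)* by adding a fresh start symbol."""
    v1, t1, s1, p1 = g1
    if new_start in v1:
        raise ValueError("the new start symbol must be fresh (step 1)")
    variables = v1 | {new_start}                      # step 2
    productions = p1 + [
        (new_start, (s1, new_start)),                 # S -> S1 S
        (new_start, ()),                              # S -> λ (empty body)
    ]                                                 # step 3
    return variables, set(t1), new_start, productions


# Hypothetical example grammar generating the single string ab.
g1 = ({"S1"}, {"a", "b"}, "S1", [("S1", ("a", "b"))])
variables, terminals, start, productions = star_grammar(g1)
for head, body in productions:
    print(head, "->", " ".join(body) if body else "λ")
```

Using a fresh start symbol keeps the new rules from interacting with any occurrence of S1 inside the original productions.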
Not closed under intersection
The context-free languages are not closed under
intersection. However, the intersection of a
context-free language with a regular language
is always a context-free language.
The context-free languages are not closed under
complementation.
Corollary:
Are Regular Languages context free?
Yes.
Why?
We can express any regular language in the form
of a CFG.
The regular languages are a proper subset of the CFLs.
Proof:
According to your textbook, the set of regular
languages is the smallest set that contains the
languages ∅, {λ}, and {a} (for every a ∈ Σ)
and is closed under the operations of union,
concatenation, and Kleene*. We just
demonstrated that the operations of union,
concatenation, and Kleene* on CFGs produce
CFGs, so all we need to do is show that the
languages ∅, {λ}, and {a} have CFGs.
The empty language can be written
S → S
The language consisting of the null string can be written
S → λ
The language consisting of a single character a can be
written
S → a
QED
Decision properties of context-free languages:
Can decide:
Membership
Empty
Infinite
But there is no algorithm for deciding whether
two CFGs generate the same language!
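Two of the decidable questions can be illustrated in Python. The sketch below uses the CYK algorithm for membership, which assumes the grammar is in Chomsky Normal Form, and a marking procedure for emptiness. These are standard techniques that we have chosen for illustration; the slides do not say which algorithms are intended. The grammar representation, as (head, body) pairs, and the function names are our own.

```python
def cyk(word, productions, start):
    """Membership test for a CNF grammar given as (head, body) pairs."""
    n = len(word)
    if n == 0:
        return False                  # a CNF grammar as defined here cannot derive λ
    # table[i][k] holds the variables that derive word[i : i + k + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        for head, body in productions:
            if body == (ch,):
                table[i][0].add(head)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for head, body in productions:
                    if len(body) == 2 and body[0] in left and body[1] in right:
                        table[i][length - 1].add(head)
    return start in table[0][n - 1]


def is_empty(productions, start, variables):
    """True if the start symbol derives no terminal string."""
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            if head not in generating and all(
                s in generating or s not in variables for s in body
            ):
                generating.add(head)
                changed = True
    return start not in generating


# Hypothetical CNF grammar for {a^n b^n | n >= 1}: S -> AB | AC, C -> SB, A -> a, B -> b
rules = [("S", ("A", "B")), ("S", ("A", "C")), ("C", ("S", "B")),
         ("A", ("a",)), ("B", ("b",))]
print(cyk("aabb", rules, "S"))                       # True
print(cyk("aab", rules, "S"))                        # False
print(is_empty(rules, "S", {"S", "A", "B", "C"}))    # False
print(is_empty([("S", ("S",))], "S", {"S"}))         # True
```

The last call uses the single rule S → S, which never produces a terminal string and therefore generates the empty language.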