THEORY OF COMPUTATION

CSC 422
Evelyne Tropper
7/6/2015
1
INTRODUCTION
• Are all problems programmable?
• What statement of a problem constitutes an
implementable program?
• Do the specifications of a program always
lead to a program?
• Is it always possible to find specifications of
a problem that lead to a program?
Evelyne Tropper
2
Purpose of Theory of
Computation
• Design a language for the mathematical
specification of computer languages in
general & for all computers
– Describe the workings of a general computer in
the simplest and most basic terms possible
– Find a mathematical language acceptable to the
above general, basic computer
Evelyne Tropper
3
SETS & FUNCTIONS
• {} — a collection
• {a,b,c} = A — a set
• C × S = {(c1,s1),…,(cn,sn)} — Cartesian product
• {0,1} — binary alphabet
• e & |e| = 0 — the empty string & its length
• w^r — reverse of w (if w = w^r ==> palindrome)
• Ā — complement of A (A even nos ==> Ā odd nos)
RELATIONS
• Reflexive if (a,a) ∈ R ∀ a ∈ S
• Symmetric if (a,b) ∈ R ==> (b,a) ∈ R
• Transitive if (a,b) ∈ R, (b,c) ∈ R ==> (a,c) ∈ R
• An equivalence relation is reflexive, symmetric & transitive
(A small checker sketch follows.)
Evelyne Tropper
5
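To make these definitions concrete, here is a minimal Python sketch (not from the slides; the function names are mine) that checks the three properties for a relation given as a set of pairs over a finite set.

```python
def is_reflexive(R, S):
    """R is a set of pairs over the set S; reflexive means (a, a) in R for every a in S."""
    return all((a, a) in R for a in S)

def is_symmetric(R):
    """(a, b) in R implies (b, a) in R."""
    return all((b, a) in R for (a, b) in R)

def is_transitive(R):
    """(a, b) in R and (b, c) in R imply (a, c) in R."""
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

def is_equivalence(R, S):
    return is_reflexive(R, S) and is_symmetric(R) and is_transitive(R)

# Example: "<=" on {1, 2, 3} is reflexive and transitive but not symmetric.
S = {1, 2, 3}
R = {(a, b) for a in S for b in S if a <= b}
print(is_reflexive(R, S), is_symmetric(R), is_transitive(R))  # True False True
```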
Examples of reflexivity
• Let relation R be: a <= b;
  then a <= a & R is reflexive
• Let relation R be: a < b;
  then a is not < a & R is not reflexive
• Let set S be: {oaks}; let R be: oaks ⊆ trees;
  then a ∈ S => a ∈ trees, so (a,a) ∈ R
Evelyne Tropper
6
Examples of symmetry
• If R is (big, large) then
  a ∈ {big} => a ∈ {large} &
  a ∈ {large} => a ∈ {big}
• If R is the relation of synonyms then
  a synonym of b => b synonym of a,
  so R is symmetric & also reflexive
Evelyne Tropper
7
Examples of transitivity
• If R is “on top of” & (a,b) ∈ R, (b,c) ∈ R,
  then (a,c) ∈ R & R is transitive
• If R is “<” & (a,b) ∈ R, (b,c) ∈ R,
  then (a,c) ∈ R & R is transitive
• If R is “child of” & (Bob, Mary) ∈ R, (Mary, John) ∈ R, then R is not transitive
• If R is “successor of” & (Bob, Mary) ∈ R, (Mary, John) ∈ R, then R is transitive
Evelyne Tropper
8
Union, Intersection & Complement
[Venn diagrams showing A ∪ B, A ∩ B, and the complement Ā of A within the universe.]
Evelyne Tropper
9
Laws of Set Operations
• Idempotency: A ∪ A = A, A ∩ A = A
• Commutativity: A ∪ B = B ∪ A, A ∩ B = B ∩ A
• Associativity: (A ∪ B) ∪ C = A ∪ (B ∪ C)
                 (A ∩ B) ∩ C = A ∩ (B ∩ C)
• Distributivity: (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)
                  (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)
• Absorption: (A ∪ B) ∩ A = A, (A ∩ B) ∪ A = A
• DeMorgan’s: A - (B ∪ C) = (A - B) ∩ (A - C)
              A - (B ∩ C) = (A - B) ∪ (A - C)
10
Examples
• Apples ∪ apples = apples
• Apples ∪ oranges = oranges ∪ apples
  Dad ∩ Mom = Mom ∩ Dad
…
Evelyne Tropper
11
FUNCTIONS
• R is a function if:
  ∀ x ∈ X ∃! y ∈ Y : (x,y) ∈ R
• There are two notations:
  – f : X --> Y
  – f ⊆ X × Y
• Alternate definition of a function:
  ∀ x ∈ X ∃! y ∈ Y such that f(x) = y
Evelyne Tropper
12
[Diagram: sets X and Y; x ∈ X maps to y ∈ Y, with a dotted line from x to a second element y1.]
For each x ∈ X there is a unique y ∈ Y.
In other words, the dotted line going from x to y1 cannot exist.
Evelyne Tropper
13
BIJECTIVE FUNCTION
1-1 function
• If y  Y ! x X then we have a
Bijection or 1-1 function
• If f is bijective =>  f--1(y) = x for f(x) = y
x
x1
X
y
y1
Y
14
FINITE AUTOMATA
• Are all problems as easily programmable?
[Diagram: input → program → output]
• Consider a program to compute the integral of x^n vs a sort program.
• The first has constant memory requirements; the second may need an arbitrary amount of memory.
• Can the implementation of a program be
designed as a program?
Deterministic Finite Automata
• A DFA is a program to design programs
that use a constant amount of memory.
• A DFA can be thought of as reading a tape one character at a time until the end of the tape. Each position on the tape can put the DFA in a different state.
• It can elucidate errors that may show up in a
programming problem.
Evelyne Tropper
16
Example 1
When a customer pays for goods with electronic money, that is, with an ATM card, all possibilities must be accounted for:
• The customer may decide to pay, i.e. sends the money to the store
• The customer may cancel & money is sent to bank to be deposited
in customer’s account
• The store may ship the goods to the customer
• The store may redeem the money, i.e. the money is sent to the
bank to be given to the store
• The bank may send the money to the store
Can the store ship the goods without ever getting paid?
Evelyne Tropper
17
DFA for previous problem
[Three state diagrams are shown: one for the store (states a–g, with transitions pay, redeem, transfer and ship), one for the bank (states 1–4, with transitions cancel, redeem and transfer), and one for the customer (transitions pay and cancel from its start state).]
18
Example 2
Memory machine
[Transition table and state diagram for the memory machine: state q0 remembers that the last input was 0, q1 that it was 1. Arcs are labeled input/output: q0 --0/0--> q0, q0 --1/0--> q1, q1 --1/1--> q1, q1 --0/1--> q0, so the output at time t+1 repeats the input at time t.]
19
Example 3
Parity machine
[Transition table and state diagram for the parity machine: state q0 means an even number of 1's seen so far, q1 an odd number. Arcs are labeled input/output: q0 --0/0--> q0, q0 --1/1--> q1, q1 --0/1--> q1, q1 --1/0--> q0; the output is the current parity.]
20
Example 4
Adding machine
Binary addition with carries:
    101101
  + 111001
  --------
   1100110
States are (i1 i2 carry):
(000), (001), (010), (011), (100), (101), (110), (111)
[State diagram over these 8 states, starting at 000; arcs are labeled with the input bits and the output sum bit: 00/0, 00/1, 01/0, 01/1, 10/0, 10/1, 11/0, 11/1. The states whose third component is 1 carry into the next column.]
21
Definition of a DFA
DFA A = (Q, Σ, δ, s, F)
Q - finite set of states
Σ - finite input alphabet
δ - transition function from Q × Σ --> Q
s - initial state, s ∈ Q
F - favorable, or accepting, states, F ⊆ Q
(A small simulation sketch follows.)
Evelyne Tropper
22
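A minimal sketch of this 5-tuple in Python (illustrative, not from the slides): δ is represented as a dictionary from (state, symbol) to state, and a word is accepted when reading it ends in a favorable state.

```python
def run_dfa(delta, start, favorable, word):
    """Simulate a DFA given delta: dict[(state, symbol)] -> state."""
    q = start
    for a in word:
        q = delta[(q, a)]  # deterministic: exactly one next state
    return q in favorable

# DFA accepting binary strings with an even number of 1's (the parity machine).
delta = {("even", "0"): "even", ("even", "1"): "odd",
         ("odd", "0"): "odd", ("odd", "1"): "even"}
print(run_dfa(delta, "even", {"even"}, "1101"))  # False (three 1's)
print(run_dfa(delta, "even", {"even"}, "1100"))  # True
```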
Applications of a DFA
• Search engines on the Web can use them
• News analysts searching on-line for special topics
• Stock analysts searching for stock names
• “Shopping robots” searching for the best price on-line
• Searching for all books on Amazon containing a certain phrase or word
• grep, egrep, fgrep in Unix
Evelyne Tropper
23
Functioning of a DFA
• Automaton A in state q ∈ Q
• reads a ∈ Σ (a letter of the alphabet)
• enters state q1 = δ(q, a), which is determined completely by the current state & the content of the current cell.
• The set of all input words accepted by A is called the language L(A) of the DFA.
Evelyne Tropper
24
Finite State Diagram
It is a directed graph representing a DFA.
[Diagram: an edge labeled a from state q to state q1 — state q changes to state q1 after reading input a.]
The initial state is preceded by an incoming arrow (>).
A favorable state is doubly circled.
Evelyne Tropper
25
Programs & Finite State
Diagrams
Program A
  statement 1
  if --- then exit
  else --- end
  statement 2
  statement 3
  end
[Finite state diagram of Program A: states s0 through s5, with transitions labeled Statmt 1, If-then, If-else, exit, Statmt 2, Statmt 3 and end.]
Evelyne Tropper
26
Example 1
[DFA diagram with states s, q, r and transitions on a and b.]
Accepts the language L(a^n b^m).
Ex: statement, statement, …, if-then, if-then, …, end
27
Example 2
[DFA diagram with states s, q, r and a trap state: once the trap state is entered, the automaton stays there and the string is not accepted.]
Accepts any string with 2 consecutive a's: L(…, a, a, …).
Ex: Any program with a nested pair of “while” loops
28
Example 3
[DFA diagram with states s, q, r; on a second consecutive a there is no further state — the computation is dead.]
It will not accept two consecutive a's, L(…, a, a, …). (Ex: no nested “for” loops.)
It is the complement of the previous example.
29
Example 4
[DFA diagram with states s, q, f and transitions on a and b.]
• An initial “a” must be followed by another “a” for the string to be accepted.
• An initial “b” can be followed by any number of a's & b's.
30
Example 5
[DFA diagram with states s, q, p, f and transitions on a and b.]
Accepts L(a b^n a b^m).
Ex: one “case” followed by multiple statements
Evelyne Tropper
31
Configurations
•In chess there are individual moves & standard
patterns for the opening or end game.
•Imagine that the players have reached the end
game; the pieces left are BK, BQ, 1 BR, 1 BKt, 3
P; WK,WQ, 1 WB, 1 WKt, 2 P.
•Imagine they are in a certain configuration, that is
a certain pattern on the board.
•Then the next moves can be figured out from that
point on.
Evelyne Tropper
32
Definition of a configuration
A configuration is a composite of state (pieces left), position (pattern on the board), and input (next moves).
Ex: In reading input {aaaabba}, after the prefix {aaa} has been read, a state q is reached; from there the remaining input {abba} is read. (q, abba) is a configuration on the way towards acceptance. w is accepted if it yields a favorable state.
(q,w) yields (q1,w1) in one step if ∃ σ ∈ Σ : w = σw1 & δ(q, σ) = q1.
(q,w) yields (q1,w1) if ∃ a sequence of configurations (p1,u1), …, (pk,uk) such that (p1,u1) = (q,w), (pk,uk) = (q1,w1) & each (pi,ui) yields (pi+1,ui+1) in one step.
All configurations yield unique configurations, as the automaton is deterministic. (A small trace sketch follows.)
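As a small illustration of configurations and one-step yields (the helper name is mine, and it assumes the dictionary encoding of δ used in the earlier DFA sketch), the following lists the configurations (q, w) a DFA passes through while consuming its input.

```python
def trace_configurations(delta, start, word):
    """Return the sequence of configurations (state, remaining input) of a DFA run."""
    q, w = start, word
    configs = [(q, w)]
    while w:
        q, w = delta[(q, w[0])], w[1:]   # one step: read the first remaining symbol
        configs.append((q, w))
    return configs

# Reusing the parity DFA from the earlier sketch:
delta = {("even", "0"): "even", ("even", "1"): "odd",
         ("odd", "0"): "odd", ("odd", "1"): "even"}
for c in trace_configurations(delta, "even", "1101"):
    print(c)   # ('even', '1101'), ('odd', '101'), ('even', '01'), ('even', '1'), ('odd', '')
```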
Example 1
A vending machine for newspapers. The cover is released when $0.25
is reached. It does not return money if > $0.25 is put in.
Σ = {5¢, 10¢, 25¢}
[DFA diagram: states s (start), 5, 10, 15 and 20 record the amount inserted so far, and a is the favorable state; inserting 5¢ or 10¢ moves to the state for the new total, and reaching 25¢ (including inserting 25¢ directly from s) moves to a.]
Evelyne Tropper
34
Examples of applications in
Computer Science
• Programming sequences, branching, loops
• Pattern matching (for WWW & AI)
• Lexical analysis in compilers
• Finite state machines in software specs & design
• Word processors
• Design of telecommunication protocols
• Design of circuits for VLSI
• Hardware design
• Control mechanisms
Evelyne Tropper
35
Non-deterministic Finite
Automata
• The union of 2 DFAs gives an NDFA, as there can be more than one arrow leaving a state on the same input.
• For example, in pattern matching of A ∪ B, we may match A or we may match B.
• NDFA A = (Q, Σ, Δ, s, F)
  where Δ ⊆ Q × (Σ ∪ {e}) × Q is a transition relation of triples (q, a, p) ∈ Δ.
The empty string as input permits the automaton to jump from one state to the other. So after A is matched, whatever state the other branch is at can jump.
36
Example
• The ability to be in several different states at once can be expressed as the ability to “guess” what will come next.
• When a system searches for a certain sequence of characters, such as a keyword, we can use a sequence of states that does nothing but jump from state to state until we find the keyword.
Evelyne Tropper
37
Example 1
Find first occurrence of keywords
Searching for the keywords: “Web” or “Internet”
Finding either gets us to a favorable state.
[NFA diagram: from the start state, one branch of e-transitions spells w-e-b and another spells i-n-t-e-r-n-e-t; the end of either branch is a favorable state.]
Evelyne Tropper
38
Functioning of a NDFA vs DFA
• Automaton A in state q ∈ Q
• reads a ∈ Σ (a letter of the alphabet)
• enters state q1 or q2 ∈ Δ(q, a): the current state & the content of the current cell no longer determine a unique next state.
[Diagram: on the same input, state q can change to state q1 or to state q2.]
Evelyne Tropper
39
Interpretation
• One input can send you to 2 different sub-programs (e.g. matching 2 entries named Jones in a DB).
• This is useful when you build a program from multiple sub-programs. In industry, whole teams work on complicated programs, each member on separate sub-programs.
• In parallel programming, this represents different threads.
• NDFAs are easier to design & can be converted to DFAs.
Evelyne Tropper
40
Examples
[First diagram: an NFA with states s, q, p and e-transitions from s to q and from s to p; there is no jumping from q to p, therefore no need for a trap state. (q,w) can yield many different configurations (q1, e).]
[Second diagram: an NFA with states s, q, r over inputs a and b — an automaton for strings that begin & end with b.]
41
Definition & theorem
• Definition: Two automata A & A1 that accept the same language are equivalent.
• Theorem: For every non-deterministic finite automaton A there exists a deterministic finite automaton A1 equivalent to A.
Evelyne Tropper
42
Example
[NFA diagram with states s, q, r, t, inputs a and b, and e-transitions.]
This NDFA is equivalent to a DFA whose states are subsets of {s, q, r, t}: on inputs a and b the reachable subsets are {s}, {q,r,t}, {r,t} and {t}, e.g. {t} --b--> {t}.
Accepting states are: {t}; {r,t}; {q,r,t}
The trap state is {∅}.
43
Equivalent Deterministic Automaton
[Diagram of the equivalent DFA over the subset states {s}, {q,r,t}, {r,t} and {t}, with transitions on a and b. A code sketch of the general subset construction follows.]
Evelyne Tropper
44
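A sketch of the general subset construction behind this example, under the assumption that the NFA is given as a dictionary of transition sets plus a map of e-transitions (names are illustrative, not from the slides).

```python
from collections import deque

def eclose(states, eps):
    """Epsilon-closure of a set of NFA states; eps maps a state to its e-successors."""
    stack, seen = list(states), set(states)
    while stack:
        for t in eps.get(stack.pop(), set()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)

def nfa_to_dfa(delta, eps, start, favorable, alphabet):
    """Subset construction.  delta maps (state, symbol) -> set of states.
    Returns the DFA transition table, its start subset, and its favorable subsets."""
    d_start = eclose({start}, eps)
    d_delta, d_fav = {}, set()
    queue, seen = deque([d_start]), {d_start}
    while queue:
        S = queue.popleft()
        if S & favorable:
            d_fav.add(S)            # a subset containing a favorable NFA state is favorable
        for a in alphabet:
            T = eclose(set().union(*(delta.get((q, a), set()) for q in S)), eps)
            d_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    return d_delta, d_start, d_fav

# The empty subset frozenset() plays the role of the trap state {∅} from the example.
```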
Regular Expressions
Regular expressions are used to:
• Design a language for the mathematical specification of the languages acceptable by finite automata (computers of all types)
• Do that by a general algorithmic procedure
• Convert an FA back to its specification through an algorithmic procedure
• These are used in turn to do simulations.
Evelyne Tropper
45
Reg. Exp. vs FA
• DFA & NDFA are machine-like descriptions
• Regular Expressions are algebraic-like
descriptions.
Evelyne Tropper
46
Applications of Regular
Expressions
• Lexical analyzers such as Lex & Flex which
take source code and convert it into tokens
• Grep in Unix
• Alternative to FA notation for describing
software components
• A declarative way to express which strings
are acceptable
Evelyne Tropper
47
Language for regular expressions
Some of the complex programming languages can be obtained from simpler languages using union (∪), intersection (∩), complement, concatenation & Kleene stars.
Concatenation: of strings u, v it is “uv”;
  of languages L1, L2 it is L1L2 = {uv | u ∈ L1, v ∈ L2}
Kleene Star: L* of L is the infinite union
  {e} ∪ L ∪ L^2 ∪ L^3 ∪ …
  L* = {w1 w2 … wk | k ≥ 0, wi ∈ L}
Evelyne Tropper
48
Examples
L ∪ M = {001, 10, 111} ∪ {e, 001} = {e, 001, 10, 111}
L ∩ M = {001}
LM = {001, 10, 111, 001001, 10001, 111001}
L* for {0, 1} = all strings of 0's & 1's
   for {0, 11} = all strings of 0's & 1's such that 1's come in pairs:
   L^0 = {e}; L^1 = {0, 11}; L^2 = {00, 011, 110, 1111};
   L^3 = {000, 0011, 0110, 01111, 1100, 11011, 11110, 111111}
(A small code sketch of concatenation and a bounded Kleene star follows.)
…
Evelyne Tropper
49
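A small sketch of concatenation and a length-bounded Kleene star for finite languages (names are mine; the true L* is infinite, so the sketch truncates it at a maximum length).

```python
def concat(L1, L2):
    """Concatenation of two finite languages: {uv | u in L1, v in L2}."""
    return {u + v for u in L1 for v in L2}

def kleene_star(L, max_len):
    """Approximate L*: all concatenations of words of L up to length max_len."""
    result, frontier = {""}, {""}
    while frontier:
        frontier = {w + u for w in frontier for u in L
                    if len(w + u) <= max_len} - result
        result |= frontier
    return result

L, M = {"001", "10", "111"}, {"", "001"}
print(sorted(concat(L, M)))   # ['001', '001001', '10', '10001', '111', '111001']
print(sorted(kleene_star({"0", "11"}, 4)))  # strings where 1's come in pairs, up to length 4
```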
Languages accepted by FA
Theorem: If languages L & M are acceptable by a finite automaton, so are L ∪ M, L ∩ M, Σ* - L (complement), L - M, LM (concatenation) and L* (Kleene star).
Suppose L is accepted by an automaton with start state s, and M by an automaton with start state s1. [Two schematic automata are shown.]
Evelyne Tropper
50
1. Then L ∪ M is accepted by: [Diagram: a new start state q with e-transitions to the start states s and s1 of the two automata.]
2. Then the complement of L is accepted by a FA which is the DFA equivalent of the NDFA, with favorable & unfavorable states flipped.
3. L ∩ M = the complement of (complement(L) ∪ complement(M)) and is accepted (a direct product-construction sketch appears after the next slide).
4. L - M = L ∩ complement(M) and is accepted.
Evelyne Tropper
51
5. LM is accepted by: [Diagram: the automaton for L followed by the automaton for M; each favorable state of L gets an e-transition to the start state s1 of M.]
6. L* is accepted by: [Diagram: a new start state (to include {e}) with an e-transition to s; each favorable state of L connects back to the original start state by an e-transition.]
52
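The slides obtain L ∩ M via complements; a direct alternative is the product construction, sketched here under the assumption that both DFAs are complete (a transition for every state and symbol) and given as dictionaries. Names are illustrative.

```python
def product_dfa(d1, s1, f1, d2, s2, f2, alphabet):
    """Run both DFAs in lockstep; accept when both accept, so the product accepts L1 ∩ L2.
    d1, d2 map (state, symbol) -> state and must be total (complete DFAs)."""
    states1 = {s1} | {q for (q, _) in d1} | set(d1.values())
    states2 = {s2} | {q for (q, _) in d2} | set(d2.values())
    delta = {((p, r), a): (d1[(p, a)], d2[(r, a)])
             for p in states1 for r in states2 for a in alphabet}
    favorable = {(p, r) for p in f1 for r in f2}
    return delta, (s1, s2), favorable
```

Accepting when at least one component accepts would give L ∪ M instead.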
Definition of a Regular
Expression
It is a string over an alphabet:
  Σ ∪ { (, ), e, ∅, ∪, * }
It is mostly used in lexical analyzers.
1. ∅, e, and a ∈ Σ (the ground elements) are regular expressions.
2. If α, β are regular expressions, then α ∪ β, αβ, α* are regular expressions, by induction.
3. No other string is a regular expression.
53
Mapping of a regular expression α to
its language L(α)
• L(∅) = ∅, L(e) = {e}, L(a) = {a} ∀ a ∈ Σ
• If α, β are regular expressions then
  L(α ∪ β) = L(α) ∪ L(β)
  L(αβ) = L(α)L(β)
  L(α*) = (L(α))*
Evelyne Tropper
54
Examples
• L((ab*)a) = {w | w is of the form a b^n a}
• An identifier in C begins with a letter & may be followed by a string of letters & digits. The identifiers can thus be expressed by the regular expression:
  ([a-z] ∪ [A-Z]) ([a-z] ∪ [A-Z] ∪ [0-9])*
• Identifiers of languages with underscores can be expressed by:
  ([a-z] ∪ [A-Z]) ((_([a-z] ∪ [A-Z] ∪ [0-9]))* ([a-z] ∪ [A-Z] ∪ [0-9])*)*
(A hedged version in Python's re syntax follows.)
55
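A hedged translation of the identifier expressions into Python's re syntax (re uses character classes and | for union; the underscore pattern below is a common simplification, not a literal transcription of the slide's expression).

```python
import re

# C-style identifier: a letter followed by letters and digits.
ident = re.compile(r"[A-Za-z][A-Za-z0-9]*")
# Simplified variant allowing underscores after the initial letter.
ident_underscore = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

for s in ["x1", "Total9", "9lives", "loop_count"]:
    print(s, bool(ident.fullmatch(s)), bool(ident_underscore.fullmatch(s)))
# x1 True True / Total9 True True / 9lives False False / loop_count False True
```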
Examples
• Strings with alternating 0's & 1's:
  (01)* + (10)* + 0(10)* + 1(01)*
  This is the same as starting with (01)* and adding an optional 1 at the beginning and an optional 0 at the end:
  (e+1)(01)*(e+0)
• Precedence of operators in a Reg. Exp. is: *, then concatenation (.), then + — or by grouping with parentheses
56
Regular Expression --> FA
For the reg. exp.: ( (a ∪ ab)* ba )*
• The ground singletons a and b become: [one-edge automata from a start state (>) into a favorable state]
• The doubletons (such as ab) become: [two-edge chains]
• (a ∪ ab) becomes: [an automaton whose start state has e-transitions into the automata for a and for ab]
Evelyne Tropper
57
[(a ∪ ab)*: e-transitions are added back to the start state to allow repetition.]
Add: ba  [the automaton for ba is concatenated on with e-transitions]
Add: ( ... )*  [a final layer of e-transitions stars the whole expression]
Evelyne Tropper
58
FA --> Reg. Exp.
Start with an FA. [Diagram: an FA over inputs a and b with two favorable states.]
• Replace the 2 favorable states with 1.
• Label the nodes 1 to n. [Diagram: the relabeled FA with nodes 1 to 7.]
• Replace arrows from i to j & from j to k with an arrow labeled l_ij l_jk.
  The label from 1 to 3 will be “ab”; the label from 1 to 4 will be “abb”.
• If there is an arrow from j to j itself, insert its label starred: l_ij (l_jj)* l_jk.
  The labels b, a from 2 to 3 will be inserted as “ab(ab)*b”.
• Different arrows from i to j will be replaced by their union l1 ∪ l2.
  The labels from 1 to 4 and 1 to 7 will be replaced by “ab(ab)*b ∪ ba”.
59
Example
[Diagram: an FA with start node 1, nodes 2, 3, 4 and favorable node 5, with edges labeled a and b.]
1. Make a single favorable state.
2. Number the nodes.
3. 3 → 2 → 4 becomes an edge labeled ab from 3 to 4; 1 → 2 → 4 becomes ab from 1 to 4; 4 → 2 → 4 becomes bb.
4. Multiple arrows from one node to another are replaced by a union.
Evelyne Tropper
60
[After steps 1, 2, 3: nodes 1, 3, 4, 5 remain, with edges labeled ab, b, a, bb and bab.]
[After step 4: nodes 1, 4, 5 remain, with edges labeled ab, bab ∪ bb, and a.]
The resulting regular expression is ab(bab ∪ bb)*a.
61
Non-regular languages
• In the design of circuits on a chip, it is important to know whether two automata define the same language. This can make it possible to minimize the area & cost of the chip.
• FA & regular languages can describe
programs using a fixed amount of memory
regardless of input. This is not true for
loops.
Evelyne Tropper
62
Example
• L(a^n b^n) = {ab, a^2 b^2, a^3 b^3, …} is not a regular language because after accepting “a^n” the automaton is in a state “q”, and that state cannot know what came before. So it does not know how many “b”s it should accept.
• A loop in a program can be represented by a
regular language or a FA if we can insert or
remove iterations without changing the
nature of the program.
Evelyne Tropper
63
Pumping Lemma for Reg. Exp.
• For every regular language L there exists a constant n such that every string w in L with |w| >= n can be broken into 3 strings, w = xyz, where:
1. y ≠ e
2. |xy| <= n
3. For all k >= 0, the string x y^k z is also in L
(A sketch that computes such a decomposition from a DFA follows.)
Evelyne Tropper
64
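The lemma's decomposition can be computed from any DFA for L: among the first n+1 prefixes of w some state must repeat (pigeonhole), and the segment of w between the two visits is y. A sketch, assuming the dictionary DFA encoding used earlier (names are mine):

```python
def pumping_decomposition(delta, start, word):
    """Return x, y, z with word = xyz, |xy| <= n (n = number of DFA states) and y != '',
    such that x y^k z drives the DFA to the same state as word, for every k >= 0."""
    states = {start} | {q for (q, _) in delta} | set(delta.values())
    n = len(states)
    q, seen = start, {start: 0}              # state -> index of the prefix that reached it
    for i, a in enumerate(word[:n], start=1):
        q = delta[(q, a)]
        if q in seen:                        # pigeonhole: a state repeats
            j = seen[q]
            return word[:j], word[j:i], word[i:]
        seen[q] = i
    return None  # cannot happen when len(word) >= n

delta = {("even", "0"): "even", ("even", "1"): "odd",
         ("odd", "0"): "odd", ("odd", "1"): "even"}
x, y, z = pumping_decomposition(delta, "even", "1100")
print(x, y, z)   # x='', y='11', z='00' -- pumping y keeps the string in the language
```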
Steps in proof
• Assume the language is regular.
• Find the defining property of the language.
• Define a y such that |xy| <= n and w = xyz.
• Find a k such that x y^k z destroys the property of the language.
• You have now found one example that is not in the language.
• Therefore the language cannot be regular.
Evelyne Tropper
65
Example 1
• L(a^n b^n) = {w} is not a regular language.
Proof:
|w| = 2n
Let w = xyz; then
1. |xy| <= n, so y = a^m for some m >= 1
2. Therefore x y^k z = a^(n-m) a^(mk) b^n = a^(n+m(k-1)) b^n, which for k ≠ 1 is not in the language
Evelyne Tropper
66
Example 2
• L(1^n) = {w : |w| is a prime} is not a regular language.
Proof:
1. If it were a regular language, there would exist a prime p >= n such that w = 1^p and |w| = p.
2. Then w = xyz, |xy| <= n.
3. Let |y| = m. Then |xz| = p - m.
4. Consider the string x y^(p-m) z, which must then be in L.
5. |x y^(p-m) z| = |xz| + (p-m)|y| = (p - m) + (p - m)m = (m + 1)(p - m)
6. Therefore |x y^(p-m) z| is not a prime.
Evelyne Tropper
67
Example 3
• The language of palindromes is not regular.
Proof:
1. If it were a regular language, then there would exist an n such that every w ∈ L with |w| >= n can be represented as w = xyz.
2. Consider a^n b a^n. Since |xy| <= n, y is a substring of the a^n on the left. Let y = a^m.
3. Then w = a^(n-m) y b a^n and by the Pumping Lemma a^(n-m) y^k b a^n ∈ L. But a^(n-m) y^k b a^n = a^(n-m) a^(mk) b a^n, and for k ≠ 1 it is not a palindrome.
Evelyne Tropper
68
The closure properties of RL
• The union of 2 RLs is regular
• The intersection of 2 RLs is regular
• The complement of a RL is regular
• The difference of 2 RLs is regular
• The reversal of a RL is regular
• The Kleene star of a RL is regular
• The concatenation of 2 RLs is regular
• A homomorphism (substitution of strings for symbols) of a RL is regular
• The inverse homomorphism of a RL is regular
Evelyne Tropper
69
Equivalence & Minimization of
Automata
• When are 2 descriptors of a RL equivalent?
• Consequently, when can we minimize a DFA? The result would be a unique minimal DFA.
• We start by asking when 2 states can be
replaced by a single state behaving like both.
Evelyne Tropper
70
Equivalent & distinguishable states
• p & q are equivalent if:
  ∀w: δ(p, w) is an accepting state iff δ(q, w) is an accepting state
• If 2 states are not equivalent, then they are distinguishable: for some string w, one of δ(p, w), δ(q, w) is accepting while the other is not.
Evelyne Tropper
71
Writing efficient FA’s & RE’s
• There are many different FA's and RE's that satisfy a given language, just as there are many different programs that solve a particular problem.
• FA's & RE's are used to design VLSI chips.
• In all cases it is more efficient & cost effective to find the minimal FA or RE that represents the problem at hand.
Evelyne Tropper
72
State Minimization Problem
• The idea is to find the FA (or RE) with the minimal number of states equivalent to the one being used.
• Equivalent automata are indistinguishable by the strings they accept: if they accept exactly the same inputs, they are indistinguishable.
Evelyne Tropper
73
Example 1
[Transition diagram of a DFA with states s, m, n, q, r, t, p over inputs a and b.]
1. State n is unreachable.
2. States t & p are both reachable on input a and are indistinguishable.
3. n will be eliminated, and t & p will be merged.
4. q & r both reach an accepting state through a and reach a dead state through b. They are indistinguishable.
Evelyne Tropper
74
Equivalent automaton
[Diagram of the equivalent automaton with states s, m, {q,r} and {t,p}, and transitions on a and b.]
Evelyne Tropper
75
Example 2
[Transition diagram of a DFA with states A–H over inputs 0 and 1; C is the favorable state.]
1. A & E are both non-accepting, so the empty input does not distinguish them.
2. On input 1 they both go to F.
3. On input 0, A goes to B & E goes to H. Then on input 1, B goes to C & H goes to C. So on input 01, both A & E go to C, a favorable state.
4. Therefore A & E are equivalent.
5. D & F both go to C on 0.
Evelyne Tropper
Equivalent minimal FA
[Diagram of the equivalent minimal FA with states {A,E}, {B,H}, C, F and G, and transitions on 0 and 1; C is the favorable state.]
Evelyne Tropper
77
Definition of equivalent states
States p & q are equivalent if, for all input strings w, δ(p, w) is an accepting state if and only if δ(q, w) is an accepting state.
Any pair of states that cannot be distinguished in this way are equivalent.
78
Finding equivalent states
• Any state that is not accepting cannot be
equivalent to any accepting state.
• States that reach an accepting state with the
same single input are equivalent.
• States that reach an accepting state with the
same multiple input are equivalent.
Evelyne Tropper
79
Proof of equivalency
On a very large FA it is easier to show which states are not equivalent than which are equivalent: to prove that states are equivalent, all inputs (0, 1, 00, 01, 11, 000, …) must be considered.
1. The accepting state(s) is(are) not equivalent to any non-accepting
state
2. Going back one step from the accepting state(s), the states getting
there on different inputs are not equivalent.
3. Going back two steps from the accepting state(s), the states
getting there on different inputs are not equivalent.
4. Etc.
Evelyne Tropper
80
Table of inequivalences
for example 2
     A    B    C    D    E    F    G
B    x
C    x    x
D    x    x    x
E         x    x    x
F    x    x    x         x
G    x    x    x    x    x    x
H    x         x    x    x    x    x
An 'x' marks a distinguishable pair; the blank cells are the equivalent pairs (A,E), (B,H) and (D,F).
Minimization of FA through tables
• Eliminate any state that cannot be reached from the start state.
• Partition the remaining states into equivalence blocks so that no pair of states from different blocks is equivalent.
(A sketch of the table-filling algorithm follows.)
Evelyne Tropper
82
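A sketch of the table-filling step in code (illustrative names; assumes a complete DFA in the dictionary encoding used earlier): first mark every pair made of an accepting and a non-accepting state, then keep marking pairs whose successors on some symbol are already marked. The unmarked pairs are the equivalent ones and can be merged.

```python
from itertools import combinations

def distinguishable_pairs(delta, states, favorable, alphabet):
    """Table-filling algorithm: returns the set of marked (distinguishable) pairs."""
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in favorable) != (q in favorable)}
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                succ = frozenset((delta[(p, a)], delta[(q, a)]))
                if len(succ) == 2 and succ in marked:   # successors already known distinguishable
                    marked.add(pair)
                    changed = True
                    break
    return marked

# Any pair of states not returned by this function is equivalent.
```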
Context-Free Grammars
• Some languages cannot be recognized by FA’s
• Can FA’s recognize legal English statements?
• Can FA’s recognize the syntactical correctness
of statements in programs?
Evelyne Tropper
83
Uses of Context-Free Grammars
• Since the 1960’s CFGs have been used to
turn out parsers automatically.
• They are used to describe document formats through Document Type Definitions (DTDs), used in XML for creating Web pages.
• Grammars can define languages through the
use of Parse Trees.
Evelyne Tropper
84
Parse Tree example 1
sentence → noun-phrase, verb-phrase
noun-phrase → (proper-noun | determiner), common-noun
verb-phrase → verb | (verb, adverb)
…
A Parse Tree is a recursive definition of a
language.
Evelyne Tropper
85
Palindromes example 2
The language of palindromes cannot be represented
by RE’s, but it can be defined recursively as:
1. e, 0, 1 are palindromes
2. If w is a palindrome, so are 0w0 and 1w1
As productions:
1. P → e
2. P → 0
3. P → 1
4. P → 0P0
5. P → 1P1
[Parse tree deriving 0110: P expands to 0P0, the inner P to 1P1, and the innermost P to e.]
86
Definition of a CFG
• The set of symbols that form the strings of the grammar are called terminals, Σ
• The set of variables, or strings, are called non-terminals, NT
• A start symbol, S ∈ NT
• A set of productions or rules, R:
  R ⊆ NT × (Σ ∪ NT)*
  which serve to derive the terminals from the non-terminals
Evelyne Tropper
87
Derivation mechanism
• The left part of the arrow always has a non-terminal
• The right part of the arrow has a sentential form
• A sentential form v is one-step derivable from sentential form u:
  u ⇒ v if u = xAy, v = xzy, and A → z is in the rules R
• v is derivable from u, written u ⇒* v, if there exists a sequence, called a derivation,
  u0, …, un : u = u0, v = un and u0 ⇒ u1, u1 ⇒ u2, …, un-1 ⇒ un
• The language L(G) generated, or derivable, in a grammar is defined as:
  L(G) = {w | w ∈ Σ*, S ⇒* w}
(A small generation sketch follows.)
88
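A small generation sketch (names are mine, not from the slides): a grammar is a dictionary from non-terminals to right-hand sides, and short strings of L(G) are enumerated by leftmost derivation, breadth first.

```python
from collections import deque

def generate(rules, start, max_len, limit=20):
    """Enumerate strings derivable from `start`, shortest sentential forms first.
    rules: dict mapping a non-terminal to a list of right-hand sides (tuples of symbols)."""
    results, queue, seen = [], deque([(start,)]), {(start,)}
    while queue and len(results) < limit:
        form = queue.popleft()
        nts = [i for i, s in enumerate(form) if s in rules]
        if not nts:                           # no non-terminals left: a derived string
            if len(form) <= max_len:
                results.append("".join(form))
            continue
        i = nts[0]                            # always expand the leftmost non-terminal
        for rhs in rules[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            if len(new) <= max_len + 1 and new not in seen:
                seen.add(new)
                queue.append(new)
    return results

# Palindrome grammar from the earlier slide: P -> e | 0 | 1 | 0P0 | 1P1
rules = {"P": [(), ("0",), ("1",), ("0", "P", "0"), ("1", "P", "1")]}
print(generate(rules, "P", max_len=4))   # the binary palindromes of length <= 4
```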
Example 3 - a pseudo-language
Grammar rules:
  Inst → Assign | Inst-phrase | END
  Assign → (Id) (:=) (Id) (op) (Id)
  Id → ([a-z] + [A-Z]) ([0-9]* + [a-z]* + [A-Z]*)
  op → + | -
  Inst-phrase → If-phrase | If-else-phrase
  If-phrase → (IF) (condition)
  condition → (Id) (Cond-op) (Inst)
  Cond-op → = | < | > | <= | >=
  If-else-phrase → (If-phrase) (ELSE) (Inst)
Example of a pgm:
  Id := Id + Id
  Id := Id - Id
  IF (Id [= < > <= >=] Id)
    Inst
  ELSE Inst
  END
89
Example 4
Let grammar G have non-terminals
  NT = {S, Np, Vp, Ap, N, V, A}
and terminals
  Σ = {big, stout, John, bought, white, car}
and rules, or productions
  S → Np Vp
  Np → N | Ap N | e
  Ap → Ap A | e
  Vp → V Np
  A → big | stout | white
  N → John | car
  V → bought
Strings generated by G:
• John bought car
• John bought big car
• big stout John bought big white car
• big stout car bought big white car
90
Different derivations of a sentence
Given: S = John bought car
we can have the derivation:
  S ⇒ Np Vp ⇒ N Vp ⇒ John Vp ⇒ John bought Np ⇒ John bought N ⇒ John bought car
or
  S ⇒ Np Vp ⇒ Np V Np ⇒ N V Np ⇒ N V N ⇒ N V car ⇒ N bought car ⇒ John bought car
Both derivations give the same result and are best described by a parse tree. All derivations with the same parse tree are equivalent.
We can always replace the leftmost non-terminal, giving leftmost derivations, or we can always replace the rightmost non-terminal, giving rightmost derivations.
91
CFG derivations are non-deterministic.
Parse tree for example 4
[Parse tree: S has children Np and Vp; Np → N → John; Vp → V Np; V → bought; Np → N → car.]
Parse trees for a+b*c
[Two parse trees for a+b*c: in one, the top-level operator is + and b*c forms a subtree; in the other, the top-level operator is * and a+b forms a subtree.]
They are not equivalent. That is why we need a precedence rule for algebraic expressions.
Factors are parenthesized expressions that cannot be broken down.
Terms are expressions that cannot be broken down by the + or - operators, such as a*b.
A parser will accept only leftmost or only rightmost derivations.
93
Applications
1. Parsers were the first applications of CFG
2. The YACC command in Unix is a CFG-based tool that creates either a tree or a piece of object code. It allows one to state the precedence of operators in expressions.
3. XML (Extensible Markup Language) and HTML (Hypertext Markup Language), the language with which Web pages are created, both use a DTD (Document Type Definition), which is a CFG describing the tags allowed.
Evelyne Tropper
94
Pushdown Automata
An FA can easily be expressed as derivations, so that any FA can be expressed as a CFG:
  δ(s, a) = s1 can be expressed as s → a s1
However, there are non-regular, context-free languages such as L(a^n b^n).
If we add a stack to an FA such that every time it reads a b it pops an a, then it does not need to “remember” how many a's there were.
95
Stack operation
[Diagram: a finite state control reads the input tape (a, b, …), pushes and pops symbols on a stack, and outputs accept/reject.]
Evelyne Tropper
96
Example - language of palindromes
Take the language of palindromes of even length:
  Lwwr = {w w^r | w ∈ {0,1}*}
• q0 is the state that represents a “guess” that we are not yet in the middle. In state q0 we read input symbols & push them onto the stack.
• At any time we may guess that we have seen the middle & go to state q1. Here the right end of the first half is on top of the stack and its left end at the bottom.
• In state q1 we compare input symbols with the symbol at the top of the stack. If they do not match, the guess was wrong & this branch dies.
• If the input symbol matches the symbol on the top of the stack, we pop it; we keep popping until the stack is empty & enter an accepting state.
97
Example for w=1111
(q0,1111,a0) ⊢ (q0,111,1a0) ⊢ (q0,11,11a0) ⊢ (q0,1,111a0) ⊢ (q0,e,1111a0)   [keep pushing]
(q0,1111,a0) ⊢ (q1,1111,a0) ⊢ (f,1111,a0)                                   [guess the middle at once: dies, input left]
(q0,111,1a0) ⊢ (q1,111,1a0) ⊢ (q1,11,a0) ⊢ (f,11,a0)                        [dies, input left]
(q0,11,11a0) ⊢ (q1,11,11a0) ⊢ (q1,1,1a0) ⊢ (q1,e,a0) ⊢ (f,e,a0)             [guess the middle after 11: accepts]
(q0,1,111a0) ⊢ (q1,1,111a0) ⊢ (q1,e,11a0)                                   [dies]
(q0,e,1111a0) ⊢ (q1,e,1111a0)                                               [dies]
98
Definition of Pushdown Automata
P = (Q, Σ, Γ, s0, a0, Δ, F)
where:
Q is a finite set of states
Σ is the input alphabet
Γ is the set of stack symbols
s0 ∈ Q is the initial state
a0 is the start symbol needed at the bottom of the stack to get to a favorable state after the stack has been emptied
Δ(s, a, X), with s ∈ Q, a ∈ Σ ∪ {e}, X ∈ Γ, is the transition relation; its output is (p, S), with p ∈ Q and S a string of stack symbols replacing X on top of the stack
F ⊆ Q is the set of favorable states.
(A small simulation sketch follows.)
99
PDA for language of palindromes
P = (Q, Σ, Γ, s0, a0, Δ, F)
P = ({s0, s1, f}, {0,1}, {0, 1, a0}, s0, a0, Δ, {f})
• Originally, in state s0 the stack contains a0 and we input 0 or 1:
  Δ(s0,0,a0) = {(s0,0a0)} and Δ(s0,1,a0) = {(s0,1a0)}
• Reading another 0 or 1 we obtain the transitions: Δ(s0,0,0) = {(s0,00)}, Δ(s0,0,1) = {(s0,01)}, Δ(s0,1,0) = {(s0,10)} and Δ(s0,1,1) = {(s0,11)}
• We can go from state s0 to state s1 on input e: Δ(s0,e,a0) = {(s1,a0)}; Δ(s0,e,0) = {(s1,0)} and Δ(s0,e,1) = {(s1,1)}
• In state s1 we match input symbols to stack symbols: Δ(s1,0,0) = {(s1,e)} and Δ(s1,1,1) = {(s1,e)}
• After the matched input has emptied the stack we are left with a0 in it, and e takes us to the favorable state f: Δ(s1,e,a0) = {(f,a0)}
100
PDA for L = {a^n b^n}
Q = {s0, s1, f}, Σ = {a, b}, Γ = {a, a0}, F = {f}
• Δ(s0,e,e) = {(s1,a0)}
• Δ(s1,a,e) = {(s1,a)}
• Δ(s1,b,a) = {(s1,e)}
• Δ(s1,e,a0) = {(f,e)}
Evelyne Tropper
101
PDA for L ⊆ {a,b}*:
the language with the same number of a's and b's
Q = {s0, s1, f}, Σ = {a, b}, Γ = {a, b, a0}, F = {f}
1. Δ(s0,e,e) = {(s1,a0)}
2. Δ(s1,a,a0) = {(s1,aa0)}
3. Δ(s1,a,a) = {(s1,aa)}
4. Δ(s1,a,b) = {(s1,e)}
5. Δ(s1,b,a0) = {(s1,ba0)}
6. Δ(s1,b,b) = {(s1,bb)}
7. Δ(s1,b,a) = {(s1,e)}
8. Δ(s1,e,a0) = {(f,e)}
Evelyne Tropper
102
Transitions accepting abbbaaaabb
State   Input left      Stack     Transition
s0      abbbaaaabb      e         1
s1      abbbaaaabb      a0        2
s1      bbbaaaabb       a a0      7
s1      bbaaaabb        a0        5
s1      baaaabb         b a0      6
s1      aaaabb          bb a0     4
s1      aaabb           b a0      4
s1      aabb            a0        2
s1      abb             a a0      3
s1      bb              aa a0     7
s1      b               a a0      7
s1      e               a0        8
f       e               e         (accept)
103
Acceptance by empty stack
In the last example we reached the end when the stack was empty; acceptance could have been decided at that point without singling out a favorable state.
A PDA that accepts input strings by empty stack rather than by a favorable state is described by:
  P = (Q, Σ, Γ, s0, a0, Δ)
Theorem:
Acceptance by empty stack and acceptance by favorable state are equivalent: each kind of PDA can be converted into an equivalent PDA of the other kind.
Evelyne Tropper
104
CFG → PDA
Theorem:
Given any CFG, G, there exists an algorithm that constructs a PDA, A, such that L(A) = L(G).
Proof:
1. Let A have 2 states, s0 and f.
2. Push the start symbol S onto the stack.
3. If the topmost symbol on the stack is N, a non-terminal, A picks a rule N → w in G and replaces N on the top of the stack by w.
4. If the topmost symbol on the stack is T, a terminal, A advances to the next input symbol; if it matches T, it pops the top of the stack.
Transitions:
1. Δ(s0,e,e) = {(f,S)}
2. Δ(f,e,N) = {(f,w)} for each rule N → w in grammar G
3. Δ(f,T,T) = {(f,e)} for each terminal T
105
Example 1 for CFG → PDA
Grammar G with the following rules:
  S → e
  S → aSa
  S → bSb
for the language L = {w w^r | w ∈ {a,b}*}.
The transitions of the PDA for G are:
1. Δ(s0,e,e) = {(f,S)}
2. Δ(f,e,S) = {(f,aSa)}
3. Δ(f,e,S) = {(f,bSb)}
4. Δ(f,e,S) = {(f,e)}
5. Δ(f,a,a) = {(f,e)}
6. Δ(f,b,b) = {(f,e)}
(A code sketch of this construction follows.)
106
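A sketch of this CFG → PDA construction in code (illustrative; it reuses the pda_accepts sketch given after the PDA definition and accepts by empty stack, as in the empty-stack acceptance slide).

```python
def cfg_to_pda(rules, start_symbol, terminals):
    """Two-state construction from the slide above.
    rules: dict mapping a non-terminal to a list of right-hand sides (tuples of symbols)."""
    delta = {("s0", "", ""): {("f", (start_symbol,))}}             # 1. push the start symbol
    for N, rhss in rules.items():
        delta[("f", "", N)] = {("f", tuple(rhs)) for rhs in rhss}  # 2. expand a non-terminal on top
    for T in terminals:
        delta[("f", T, T)] = {("f", ())}                           # 3. match & pop a terminal
    return delta

# Grammar for {w w^r | w in {a,b}*}: S -> e | aSa | bSb
rules = {"S": [(), ("a", "S", "a"), ("b", "S", "b")]}
delta = cfg_to_pda(rules, "S", {"a", "b"})
# Uses the pda_accepts sketch defined earlier, accepting by empty stack:
print(pda_accepts(delta, "s0", "", set(), "abba", by_empty_stack=True))  # True
print(pda_accepts(delta, "s0", "", set(), "aba",  by_empty_stack=True))  # False
```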
PDA → CFG
Theorem:
Given any PDA, A, there exists an algorithm that constructs a CFG, G, such that L(G) = L(A).
Proof:
Consider a simple PDA accepting by empty stack. A simple PDA is one where every transition replaces the top of the stack.
For any transition (s, a, β) --> (q, γ),
let β be A1 A2 … An. These represent the popped symbols.
Then the transition (s, a, β) --> (q, γ) can be replaced by:
  (s, e, A1) --> (sA1, e)
  (sA1, e, A2) --> (sA1A2, e)
  …
  (sA1…An-2, e, An-1) --> (sA1A2…An-1, e)
  (sA1…An-1, e, An) --> (q, γ)
107
The grammar can now be defined by the following rules:
• For every s ∈ Q, S --> [s0, a0, s]
• For every s, q ∈ Q, every a ∈ Σ ∪ {e}, A ∈ Γ, if (s, a, A) --> (q, e), then [s, A, q] --> a
• For every s, q ∈ Q, every a ∈ Σ ∪ {e}, A ∈ Γ, if (s, a, A) --> (q, B1 B2 … Bk), then
  [s, A, qk] --> a [q, B1, q1] [q1, B2, q2] … [qk-1, Bk, qk]
Evelyne Tropper
108
RL → PDA → CFG
Theorem:
Every regular language is context-free.
Proof:
Given an FA, A, accepting a language L, view A as a PDA that does not use its stack.
Therefore L is accepted by a PDA.
Because every PDA has an equivalent CFG, every regular language (or FA) is context-free.
Evelyne Tropper
109
Languages that are not
Context-Free
Take a derivation in which a non-terminal A reappears below itself:
  S ⇒* uAz ⇒* uvAyz ⇒* uvxyz
Then it must be true that
  A ⇒* x and A ⇒* vAy
Then we can derive further, getting:
  S ⇒* uAz ⇒* uvAyz ⇒* uv^2 A y^2 z ⇒* uv^3 A y^3 z ⇒* … ⇒* uv^n A y^n z
Now we can define a type of Pumping Lemma for context-free languages.
Evelyne Tropper
110
Pumping Lemma for Context-Free Languages
Lemma:
Let G = (Σ, NT, R, S) be a context-free grammar. Then there exists a number n such that any string w ∈ L(G) with length |w| ≥ n can be written as w = uvxyz for some strings u, v, x, y, z ∈ Σ*, such that
1. |v| > 0 or |y| > 0
2. |vxy| ≤ n
3. For any k ≥ 0, u v^k x y^k z ∈ L(G)
Evelyne Tropper
111
Example 1
Show that the language L = {a^k b^k c^k | k = 0, 1, 2, …} is not context-free.
Proof:
If it is context-free, let w = a^n b^n c^n with length |w| = 3n.
Let w = uvxyz with |vxy| ≤ n.
Then vxy contains only a's and b's, or only b's and c's.
If vxy contains only a's and b's, then v or y contains at least one symbol.
Then the string u v^2 x y^2 z contains more than n a's, or more than n b's.
But the number of c's is still the same.
Therefore the string is not in the language.
Evelyne Tropper
112