Transcript Chapter 9

CS 3813: Introduction to Formal
Languages and Automata
Chapter 11
A Hierarchy of Formal Languages and
Automata
These class notes are based on material from our textbook, An
Introduction to Formal Languages and Automata, 3rd ed., by
Peter Linz, published by Jones and Bartlett Publishers, Inc.,
Sudbury, MA, 2001. They are intended for classroom use only
and are not a substitute for reading the textbook.
Diagrams from some slides are from a
previous year’s textbook: Martin, John C.,
Introduction to Languages and the Theory
of Computation. Boston: WCB McGraw-Hill, 1991.
Slides are for use of this class only.
Functions
A function is a mapping from a set of elements
(called the domain) to another set of elements
(called the range).
- If the domain and range are the set of strings over an alphabet, we call it a string function. If the domain and range are the set of natural numbers, we call it a number-theoretic function.
- Any natural number can be represented by a string. We will see later that any string can be represented by a natural number. (This will turn out to be important.)
Computability
- A function is partial Turing computable if there is a TM that computes it and the TM stops on all inputs in the domain of the function (which may be a subset of all possible inputs).
- A function is Turing computable if there is a TM that computes it and the TM stops on all inputs.
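To make the distinction concrete, here is a small Python sketch (an illustration, not from the textbook): one function halts on every input, the other halts only on inputs in its domain (the even numbers) and loops forever elsewhere, like a TM that fails to halt outside the domain of a partial computable function.

    def successor(n):
        # Turing computable: halts on every natural number.
        return n + 1

    def half(n):
        # Partial Turing computable: its domain is the even naturals.
        # On an odd input the loop never terminates.
        while n % 2 != 0:
            pass  # loop forever, mirroring a TM that never halts
        return n // 2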
Languages
Given the set of all possible strings over an
alphabet, a language is a subset of this set.
- A language can be represented by a characteristic function that has the set of all strings as its domain and {0, 1} as its range. It maps a string to 1 if it is in the language, and otherwise maps it to 0.
- When we extend the concept of computability to languages, we usually call it “decidability.”
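As a concrete illustration (a sketch, not from the notes), here is the characteristic function of the simple language L = the even-length strings over {a, b}:

    def chi_even_length(s):
        # Characteristic function: 1 if s is in L, 0 otherwise.
        return 1 if len(s) % 2 == 0 else 0

    print(chi_even_length("abab"))  # 1
    print(chi_even_length("aba"))   # 0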
Decidability
A Turing computable language has a
characteristic function that is Turing
computable. A Turing computable language is
also called a decidable language.
- A semi-decidable language has a TM that outputs 1 (or equivalently, halts) for every input string in the language, and does not halt for any input string that is not in the language.
- So, we talk about computability for functions, and decidability for languages. But it’s the same idea.
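A sketch of the difference in behavior (the example language is made up and happens to be decidable; the point is only the halting behavior of this particular machine):

    def semi_decide_contains_ab(s):
        # An acceptor for L = {w : w contains "ab"}: halts with 1 for
        # every member of L, but never halts on non-members.
        # A decider would instead halt and return 0 on non-members.
        if "ab" in s:
            return 1
        while True:
            pass  # loop forever on strings not in L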
Review of definitions
- A function can be Turing computable, partial Turing computable, or uncomputable. What are the differences?
- A language can be decidable, semi-decidable, or undecidable. What are the differences?
Enumerability
A language is said to be Turing enumerable if
there is a TM that lists all the strings of the
language. (Note that the TM never terminates
if the language is infinite.)
- Some facts:
– A language is Turing enumerable if and only if it is semi-decidable.
– If a language and its complement are Turing enumerable, then the language is decidable.
– If a language is decidable, then its complement is decidable.
Church-Turing thesis
This thesis (not theorem!) holds that any
algorithmic procedure that can be performed by a
human or computer can be performed by a TM.
- It can’t be proved, but is widely believed.
- First implication: instead of describing a TM in detail, we can describe a high-level algorithm and assume a TM can be described that computes it.
- Second implication: if we can show that a problem cannot be solved by any TM, we may conclude that it can’t be solved by any computer.
Universal Turing machine
A TM that takes as input the description of a TM (a “program”) and an input string, simulates (“runs”) the TM on the input, and returns the result.
- Can be viewed as a programmable TM.
- Equivalently, can be viewed as an “interpreter” for the TM programming language. Just as you can write an interpreter for C in C, you can construct a universal TM that is an interpreter for TM programs.
- Although Turing developed the concept of a universal TM for theoretical reasons, it helped stimulate the development of stored-program computers.
“Programming” a universal Turing machine
We can encode any TM as a unique string (or program) over some fixed alphabet, say {0, 1}.
- We can encode any input to the TM as a string over the same alphabet.
- There are many ways to do this and it doesn’t matter what method we use … what matters is that we can do this at all.
Important questions
How many Turing machines are there?
- How many functions are there?
- How many computable functions are there?
- How many languages are there?
- How many decidable languages are there?
- We’ll come back to these questions later. To answer them, we first need to discuss what it means for a set to be countably or uncountably infinite. And for that, we begin with a review of set theory.
Review of set theory
The cardinality of a set is the number of elements in the set. For example, let S = {2, 4, 6}. Then |S| = 3.
The powerset of S, written 2^S, is the set of all subsets of S. For example,
2^S = {{}, {2}, {4}, {6}, {2,4}, {2,6}, {4,6}, {2,4,6}}.
The cardinality of powersets
We can use mathematical induction to prove that the cardinality of the powerset of a finite set S is 2^|S|. What about a more difficult question: what is the cardinality of the powerset of an infinite set?
Countable sets
Two sets have the same cardinality if their elements can be put in 1-1 correspondence with each other.
- An infinite set is countable if its elements can be placed in 1-1 correspondence with the natural numbers, that is, if its elements can be listed sequentially. Basically, this amounts to being able to specify what the first element of the set is, what the second is, etc.
The even natural numbers are countable
even natural number:  2  4  6  8  10  12  14 … 2n …
counting number:      1  2  3  4   5   6   7 …  n …
The set of even natural numbers has the same cardinality as the set of natural numbers, although it is a strict subset of the set of natural numbers.
The integers are countable
integer:     …  -4  -3  -2  -1  0  1  2  3  4 …
counting #:  …   9   7   5   3  1  2  4  6  8 …
The rational numbers are countable
Here are the rational numbers:
1/1  1/2  1/3  1/4  1/5  1/6  1/7 …
2/1  2/2  2/3  2/4  2/5  2/6 …
3/1  3/2  3/3  3/4  3/5 …
4/1  4/2  4/3  4/4 …
5/1  5/2  5/3 …
6/1  6/2 …
7/1 …
…
What is the first rational number?   1/1
What is the second rational number?  2/1
What is the third rational number?   1/2
What is the fourth rational number?  3/1
etc. (Sweep the grid along its diagonals.)
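The diagonal sweep can be sketched in Python (an illustration, not from the slides): it generates the grid entries p/q along the anti-diagonals p + q = 2, 3, 4, …, which is exactly the order suggested above.

    def positive_rationals():
        # Sweep the grid along its anti-diagonals; every p/q eventually
        # appears (duplicates such as 2/2 are listed, just as in the grid).
        total = 2
        while True:
            for p in range(total - 1, 0, -1):
                yield f"{p}/{total - p}"
            total += 1

    gen = positive_rationals()
    print([next(gen) for _ in range(6)])
    # ['1/1', '2/1', '1/2', '3/1', '2/2', '1/3']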
The real numbers are uncountable
(Cantor’s diagonal argument)
Assume the real numbers between 0 and 1 can be
listed in order as infinite decimals.
f0:  0. f0(0)  f0(1)  f0(2)  f0(3) …
f1:  0. f1(0)  f1(1)  f1(2)  f1(3) …
f2:  0. f2(0)  f2(1)  f2(2)  f2(3) …
f3:  0. f3(0)  f3(1)  f3(2)  f3(3) …
...
Consider the real number f defined by f(n) = fn(n) + 1 (addition mod 10, so the result is still a digit).
Note that for every i, f(i) ≠ fi(i). Therefore f is not in the list.
This contradiction disproves the assumption that the real numbers between 0 and 1 are countable.
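Here is a finite illustration of the diagonal step (a sketch with made-up digits, not from the textbook): given any finite list of decimal expansions, build a number that differs from the i-th expansion in its i-th digit.

    def diagonal_real(rows):
        # rows[i] holds the digits of the i-th listed real (after the point).
        # Add 1 (mod 10) to each diagonal digit, as on the slide.
        return [(rows[i][i] + 1) % 10 for i in range(len(rows))]

    listed = [
        [1, 4, 1, 5],   # 0.1415...
        [7, 1, 8, 2],   # 0.7182...
        [3, 3, 3, 3],   # 0.3333...
        [5, 0, 0, 0],   # 0.5000...
    ]
    print(diagonal_real(listed))
    # [2, 2, 4, 1] -> 0.2241..., which differs from every listed number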
The real numbers are uncountable
Didn’t get that? OK; let’s try again.
We can define the first real number. Let’s arbitrarily make 0.0 the first real number. That means that we can put it into one-to-one correspondence with the number 1.
real #      counting #
0.0         1
But now what is the second real number?
real #      counting #
0.0         1
X           2
No matter what number we pick for X, we can always find another real number in between the previous real number and X. For example, we can divide X by 2. That gives us another real number in between 0.0 and X.
The real numbers are uncountable
Since we cannot specify what the second, third, fourth, etc. elements of the set of real numbers are, the set of real numbers is uncountable, or uncountably infinite. (Strictly speaking, this is intuition rather than a proof: the rationals are also dense yet countable; the diagonal argument on the previous slide is what actually establishes uncountability.)
Definition: A set is uncountably infinite if it is impossible to sequentially list its elements.
Georg Cantor used this argument to distinguish between different levels of infinity.
ℵ0 (aleph-null) = the infinity of the integers
ℵ1 (aleph-one) = the infinity of the real numbers
The powerset of an infinite set S is uncountable
The proof is by contradiction using diagonalization.
Assume that the powerset of an infinite (countable) set S is countable; this means that the subsets of S can be listed in sequence.
Order the elements of S sequentially. Represent each subset of S by an infinite row of 0’s and 1’s, where 1 indicates that the corresponding element of S occurs in it.
        Element # in the original set
        1  2  3  4  5  6 …
S1:     1  0  1  1  0  1 …
S2:     0  0  1  1  0  0 …
S3:     1  1  1  0  0  1 …
S4:     1  0  1  0  1  1 …
…
The powerset of an infinite set S is uncountable
        Element # in the original set
        1  2  3  4  5  6 …
S1:     1  0  1  1  0  1 …
S2:     0  0  1  1  0  0 …
S3:     1  1  1  0  0  1 …
S4:     1  0  1  0  1  1 …
…
Consider Sx, a subset of S that differs from each of these at some point along the diagonal. It will be represented by:
Sx:     0  1  0  1  …
Note that Sx is a valid subset of S, but it is not identical to any of the subsets already listed.
Its existence contradicts the assumption that the powerset
of an infinite set is countable.
The powerset of an infinite set S is uncountable
Don’t worry if you don’t get this right away; we will see
this in more detail a few slides later on.
Formal languages and countability
A formal language is a set of strings over an alphabet. Is this set countably or uncountably infinite?
- If the symbols of an alphabet are arranged in order, we can define a lexicographical ordering over the strings in any language over that alphabet.
– “alphabetical order” is an example of a lexicographic ordering
- What does this imply about the countability of the strings in any language?
Formal languages and countability
Answer: The number of strings in a language is countably infinite.
- Proof:
– Divide the strings of the language into subsets based on their length; i.e., put all strings of length 1 together, all strings of length 2 together, etc.
– Within each set, put the strings in lexicographical order.
– Merge the subsets, preserving their order.
– Now put the strings into one-to-one correspondence with the counting numbers.
Formal languages and countability
Example: L = {ww}, where Σ = {a, b}
1 ↔ aa
2 ↔ bb
3 ↔ aaaa
4 ↔ abab
5 ↔ baba
6 ↔ bbbb
7 ↔ aaaaaa
. . . (the strings are listed in canonical order)
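A Python sketch (not part of the slides) that produces this listing: generate all strings over {a, b} in canonical order and keep those of the form ww. (The empty string λ = λλ is skipped here so that the output matches the list above.)

    from itertools import count, product

    def canonical_strings(alphabet=("a", "b")):
        # Shorter strings first; same-length strings in alphabetical order.
        for length in count(0):
            for tup in product(alphabet, repeat=length):
                yield "".join(tup)

    def first_members_of_ww(limit):
        found = []
        for s in canonical_strings():
            half = len(s) // 2
            if s and len(s) % 2 == 0 and s[:half] == s[half:]:
                found.append(s)
                if len(found) == limit:
                    return list(enumerate(found, start=1))

    print(first_members_of_ww(7))
    # [(1, 'aa'), (2, 'bb'), (3, 'aaaa'), (4, 'abab'), (5, 'baba'),
    #  (6, 'bbbb'), (7, 'aaaaaa')]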
How many TMs are there?
Because we can encode each TM as a string over an alphabet, the number of possible TMs is countably infinite.
- From this we may also conclude that the number of possible programs in any programming language is countably infinite.
Formal languages and
countability (continued)
Any language over Σ is a subset of Σ*.
- How many possible languages over Σ are there? (In other words, how many subsets of Σ* are there?)
How many languages are there?
Answer: There are an uncountably infinite number of languages.
- Proof:
– Any language over Σ is a subset of Σ*.
– Σ* is an infinite set.
– The powerset of Σ* is the set of all subsets of Σ*.
– The powerset of an infinite set is uncountable.
How many TMs are there?
What does this imply about whether all
languages are decidable?
How many TMs are there?
We have shown that:
– The number of strings in a language is countably infinite.
– We can represent any Turing machine as a string over the alphabet Σ = {0, 1}.
– Therefore, the number of TMs is countably infinite.
– But there are an uncountably infinite number of languages.
– Consequently, we cannot put the set of TMs into one-to-one correspondence with the set of languages.
How many TMs are there?
This means that there are more languages than there are TMs.
- Every TM accepts all and only the strings of one specific language.
- Therefore, there must be some languages that cannot be recognized by any TM.
- The next chapter will talk about specific languages that are not decidable (and specific functions that are not computable).
11.1: Recursive and recursively
enumerable languages
Remember that the strings that a TM accepts
constitute the language of the Turing
machine. We represent this as L(T).
A Turing machine always accepts the words of
its language by stopping in the halting state.
However, it is allowed to reject strings that
don’t belong to its language either by
crashing (in a finite number of steps), or by
looping forever.
Recursive and recursively
enumerable languages
Infinite loops are bad for us, because if we
are trying to decide whether a string belongs
to the language of a TM or not, we can’t tell
after waiting a finite amount of time whether
the TM is going to halt on the very next step,
or whether it is going to go on forever. We
would prefer to have our TMs crash to reject
a string.
Recursive and recursively
enumerable
It turns out that these distinctions exactly
correspond to the last two major classes of
languages that we want to discuss in this
course:
Recursively enumerable = accepted by a TM
that may loop (or may crash) to reject
Recursive = accepted by a TM that always
crashes to reject
Definition 11.1:
If L ⊆ Σ* is a language, then a Turing machine T with input alphabet Σ is said to accept L if L(T) = L.
The Turing machine T recognizes or decides L if T computes the characteristic function
χL : Σ* → {0, 1}.
In other words, T halts for every string x in Σ*, outputting a 1 if x ∈ L, and outputting a 0 otherwise.
Definitions:
Definition 11.1: A language L is recursively enumerable if there is a TM that accepts L.
Definition 11.2: A language L is recursive if there is a TM that recognizes (decides) L.
This means that a language is recursive iff there exists a membership algorithm for it. If all we have is a TM that accepts L (one that may loop forever on strings not in L), then L is recursively enumerable, but not necessarily recursive.
We also know:
The set of recursive languages is a proper
subset of the set of recursively enumerable
languages.
Theorem:
If L1 and L2 are recursively enumerable languages over Σ, then L1 ∪ L2 and L1 ∩ L2 are also recursively enumerable languages.
Theorem:
If L1 and L2 are recursive languages over Σ, then L1 ∪ L2 and L1 ∩ L2 are also recursive languages.
If L is a recursive language, then its complement L̄ is also a recursive language.
(Proof: Obviously, just swap the outputs of the TM: wherever it would write 0, write 1, and vice versa.)
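The closure-under-complement proof is just output swapping. A sketch, with the decider modeled as an ordinary Python function that always returns 0 or 1 (a modeling assumption, not the textbook's TM construction):

    def complement_decider(decide_L):
        # From a total decider for L, build a total decider for the
        # complement of L by swapping the outputs 0 and 1.
        return lambda w: 1 - decide_L(w)

    decide_even = lambda w: 1 if len(w) % 2 == 0 else 0   # toy decidable L
    decide_odd = complement_decider(decide_even)
    print(decide_odd("aba"), decide_odd("abab"))  # 1 0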
Theorem:
If L is a recursively enumerable language, and L̄ is also recursively enumerable, then L must be recursive.
Another way to say this is that the only way that a language L and its complement L̄ can both be recursively enumerable is if both are recursive.
Think about this. This implies that the complement of a non-recursive recursively enumerable language is . . . what?
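Why the theorem holds: run an acceptor for L and an acceptor for L̄ side by side with a growing step budget; since every string is in one of the two languages, one acceptor eventually accepts, so the combined machine always halts. A rough Python sketch, where each acceptor is modeled by a hypothetical function accepts_within(w, k) that reports acceptance within k simulated moves (an assumption for the sketch, not a real TM simulator):

    def make_decider(accepts_L_within, accepts_coL_within):
        # Dovetail the two acceptors; the loop always terminates because
        # w belongs to L or to its complement.
        def decide(w):
            k = 1
            while True:
                if accepts_L_within(w, k):
                    return 1
                if accepts_coL_within(w, k):
                    return 0
                k += 1
        return decide

    # Toy demo: L = strings containing "ab"; pretend each acceptor
    # needs |w| moves before it can accept.
    acc_L  = lambda w, k: ("ab" in w) and k >= len(w)
    acc_co = lambda w, k: ("ab" not in w) and k >= len(w)
    decide = make_decider(acc_L, acc_co)
    print(decide("xaby"), decide("yyy"))  # 1 0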
Theorem:
The complement of a non-recursive recursively
enumerable language is a language that is not
recursively enumerable.
This means that the language cannot be accepted
by a Turing Machine....
.... which means that NO automaton can accept the
language.
11.1: Enumerating a language
Putting a set of strings in canonical order means
listing the shortest strings first, and listing the
strings of the same length alphabetically. So the
set of strings {abb, a, ba, aa, b} would look like
this in canonical order: {a, b, aa, ba, abb}.
Enumerating a set means to list the elements of the set one at a time – to put them into one-to-one correspondence with the positive integers.
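Canonical order is easy to express directly (a sketch reproducing the example above): sort by length first and break ties alphabetically.

    def canonical_order(strings):
        # Shortest strings first; ties broken alphabetically.
        return sorted(strings, key=lambda s: (len(s), s))

    print(canonical_order(["abb", "a", "ba", "aa", "b"]))
    # ['a', 'b', 'aa', 'ba', 'abb']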
Theorem:
A language L ⊆ Σ* is recursively enumerable (that is, can be accepted by some TM) if and only if L can be enumerated by some TM.
How would the TM do this?
Theorem:
One way is to list every possible string in canonical order: {λ, a, b, aa, ab, ba, bb, aaa, …}
Next, construct a universal TM that contains within it a simulation of the TM that accepts L. Have it write 0 on its own tape to start off. Now run the UTM on the strings. To avoid infinite loops, we make a series of passes:
Theorem:
1st pass: The UTM generates the string λ and simulates one move of the TM on that input.
2nd pass: The UTM simulates two moves of the TM on the string λ, then generates the string a and simulates one move of the TM on that input.
3rd pass: The UTM simulates three moves of the TM on the string λ, two moves of the TM on the string a, then generates the string b and simulates one move of the TM on that input.
. . . and so on.
Theorem:
Whenever the TM accepts a string, the UTM
writes the next integer on its tape.
Every string that is accepted by the TM is
accepted after a finite number of moves.
You can see that eventually, after a finite
series of moves, all the strings belonging
to L will be accepted by the TM, and the
UTM will have written a series of integers
on its tape.
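The pass structure (“dovetailing”) can be sketched in Python. Here the TM being simulated is modeled by a hypothetical function steps_to_accept(s), which returns the number of moves after which the TM accepts s, or None if it never accepts; a real implementation would single-step a TM simulator instead.

    from itertools import count, product

    def canonical(alphabet=("a", "b")):
        for length in count(0):
            for tup in product(alphabet, repeat=length):
                yield "".join(tup)

    def enumerate_language(steps_to_accept, passes):
        # Pass p: generate one new string, then give the strings seen so far
        # p, p-1, ..., 1 simulated moves (oldest string gets the most moves).
        # Every accepted string is emitted after finitely many passes, even
        # though the TM may loop forever on strings outside the language.
        gen, seen, emitted = canonical(), [], []
        for p in range(1, passes + 1):
            seen.append(next(gen))
            for s, budget in zip(seen, range(p, 0, -1)):
                n = steps_to_accept(s)
                if n is not None and n <= budget and s not in emitted:
                    emitted.append(s)
        return emitted

    # Toy "TM": accepts strings containing "ab" after |s| moves, loops otherwise.
    toy = lambda s: len(s) if "ab" in s else None
    print(enumerate_language(toy, passes=12))  # ['ab', 'aab', 'aba']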
Another observation:
Note that, for some languages, the TM may
accept a longer string in fewer passes than
a shorter string.
However, if the language is recursive (not just recursively enumerable), then it turns out that the language can be enumerated in canonical order (using a different construction).
Theorem:
L is recursive if and only if there is a TM
that enumerates L in canonical order.
Theorem 11.2: Not all languages
are recursively enumerable
Theorem 11.1: If S is an infinite countable set, its powerset 2^S is not countable.
Proof: Use Cantor’s diagonalization
demonstration in a proof by contradiction.
Cantor’s diagonalization proof:
Assume that 2^S is countable. If S = {s1, s2, s3, …}, then we can represent any element t of 2^S by a binary number in which the 1’s represent the elements of S that are in t and the 0’s represent the elements of S that are not in this particular t.
Cantor’s diagonalization proof:
For example:
t1:  0 0 0 0 0 …
t2:  1 0 0 0 0 …
t3:  0 1 0 0 0 …
t4:  1 1 0 0 0 …
t5:  0 0 1 0 0 …
t6:  1 0 1 0 0 …
t7:  1 1 1 0 0 …
...
Cantor’s diagonalization proof:
So 0101000… represents the set {s2, s4}. But you can look at this pattern of 1’s and 0’s as if it were a binary number printed in reverse order: …0001010, which represents the integer 10. Each different element of 2^S will correspond to a unique integer. In the table above, t1 corresponds to 0, t2 corresponds to 1, etc.
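A sketch of this encoding in Python (positions are 1-based, as on the slide: digit i of the pattern says whether si is in the subset):

    def pattern_to_subset(pattern):
        return {f"s{i + 1}" for i, bit in enumerate(pattern) if bit == "1"}

    def pattern_to_int(pattern):
        # Read the pattern as a binary numeral in reverse order:
        # a 1 in position i contributes 2**(i-1).
        return sum(2 ** i for i, bit in enumerate(pattern) if bit == "1")

    print(pattern_to_subset("0101000"))  # {'s2', 's4'} (order may vary)
    print(pattern_to_int("0101000"))     # 10, i.e. binary ...0001010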
Cantor’s diagonalization proof:
After we have listed all of the elements of 2^S in a matrix, Cantor says we can create a new subset of S (a new element of 2^S) that does not appear in our matrix. Simply take the first digit of the first element, the second digit of the second element, and so on, and flip each of them (change 0 to 1 and 1 to 0). The resulting pattern does not represent any element already in the matrix, because it differs from each listed element in at least one position. However, by definition, the powerset of S contains all possible subsets of S. Ergo, 2^S cannot be countable.
Cantor’s diagonalization proof:
Cantor used the diagonalization argument to
show that the real numbers could not be
put into one-to-one correspondence with
the integers. The real numbers are not
enumerable, or countable.
Cantor’s diagonalization proof:
What does this have to do with formal
languages, you may ask.
Cantor’s diagonalization proof:
We showed earlier how you can encode any TM in
the form of a finite binary pattern. Each
different TM will be represented by a unique
pattern of 1’s and 0’s. We can interpret that
pattern as a binary number, corresponding to a
specific integer. There is an infinite number of
different possible TMs (just as there is an
infinite number of integers), but the set of all
possible TMs is countable – infinite, but
countably infinite.
Cantor’s diagonalization proof:
A language L is a set of strings; you can
think of each different language as a
different subset of Σ*. Conversely, each
different subset of Σ* constitutes a
different language. Languages may have
an infinite number of strings in them.
(Even regular languages may be infinite:
how about the language of all strings that
begin with the letter a?)
Cantor’s diagonalization proof:
The set of all possible languages is the set of all possible subsets of Σ*: 2^Σ*. Since Σ* is infinite, the set of all languages over Σ is not countable.
Cantor’s diagonalization proof:
Every recursively enumerable language can
be represented by a TM. But the number
of TMs is countable, while the number of
different languages is not countable.
Hence, there must exist some languages
which cannot be represented by any TM.
These languages are not recursively
enumerable.
Cantor’s diagonalization proof:
I know what you’re going to ask: can you
give a good, clear example of a language
which is not recursively enumerable?
My answer is, “NO”!
(But see your book, pages 279-280).
Theorem:
Since we can encode any TM as a string of 1’s and 0’s, we can talk about TMs that accept the very string that represents them. Let SA (for “self-accepting”) be the language of all such encodings.
Theorem:
The language SA is recursively enumerable but not recursive.
That is, any TM that accepts SA must loop forever on at least one input string in order to reject it.
Theorem:
Now consider the set of strings that encode TMs which do not accept their own encodings. Call this language NSA, for “non-self-accepting”.
We know that if L is a recursively enumerable language, and L̄ is also recursively enumerable, then L must be recursive.
The language NSA is not recursively enumerable.
Grammars
Now let’s take a final look at grammars.
Unrestricted Grammars
A grammar G = (V, T, S, P) is called
unrestricted if all the productions are of the
form
u → v
where u is in (V ∪ T)+ and v is in (V ∪ T)*
Basically, the only restriction is that λ is not allowed as the left side of a production.
Example:
Let L = {ww | w ∈ {a, b}*}. We already know that a context-free grammar cannot produce this language. But the following unrestricted grammar will produce all and only the strings of L:
S → FM
F → FaA
F → FbB
F → λ
Aa → aA     Ab → bA
Ba → aB     Bb → bB
AM → Ma     BM → Mb
M → λ
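To see the grammar at work, here is a small Python sketch (not from the textbook) that replays a derivation of abab, checking at every step that the rewrite applies one of the productions above:

    PRODUCTIONS = [
        ("S", "FM"), ("F", "FaA"), ("F", "FbB"), ("F", ""),
        ("Aa", "aA"), ("Ab", "bA"), ("Ba", "aB"), ("Bb", "bB"),
        ("AM", "Ma"), ("BM", "Mb"), ("M", ""),
    ]  # "" stands for the empty string λ

    def check_derivation(steps):
        # Consecutive sentential forms must differ by one application of
        # some production lhs -> rhs at some position.
        for before, after in zip(steps, steps[1:]):
            ok = any(
                before[:i] + rhs + before[i + len(lhs):] == after
                for lhs, rhs in PRODUCTIONS
                for i in range(len(before) - len(lhs) + 1)
                if before[i:i + len(lhs)] == lhs
            )
            assert ok, f"illegal step: {before} => {after}"
        return steps[-1]

    derivation = ["S", "FM", "FbBM", "FaAbBM", "aAbBM",
                  "abABM", "abAMb", "abMab", "abab"]
    print(check_derivation(derivation))  # abab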
Unrestricted Grammars
Any language generated by an unrestricted
grammar is recursively enumerable.
Theorem:
A Turing machine may be constructed to
accept any recursively enumerable
language.
Formally:
If G = (V, T, S, P) is any unrestricted grammar, then there is a Turing machine T = (Q, Σ, Γ, q0, δ) with L(T) = L(G).
Theorem: Proof by construction:
Construct an NTM T to accept L(G). It will be a
composite TM:
T = MovePastInput → Simulate → Equal
Prior to processing, the input string is copied onto
the tape.
Theorem: Proof by construction:
MovePastInput moves the tape head to the first blank
after the input string.
Simulate simulates a derivation of the string in the
grammar G and leaves the derivation of the string on
the tape. (Since it is an NTM, it can “guess” the right
string to generate.)
Equal compares them.
If the strings are equal, the machine erases the second
string, leaving the original input string on the tape as
the output of the TM.
The TM will reject every string that can’t be generated by
G, either by crashing during Simulate, or by looping
forever while trying to generate the string.
Simulate:
The Simulate part of this composite TM will be
different for every different grammar that we
want to build a TM for. The next slide shows
what the Simulate part would look like for the
grammar
S → aBS | λ
aB → Ba
Ba → aB
B → b
which generates strings with equal numbers of a’s and b’s.
Context-sensitive grammars:
Definition:
A context-sensitive grammar (CSG) is an unrestricted grammar in which every production has the form
α → β
where |α| ≤ |β|.
(Note that, if we interpret this rule strictly, λ can never be part of a context-sensitive language.)
Context-sensitive Grammars
A grammar G = (V, T, S, P) is called context-sensitive if all the productions are of the form
x → y
where x and y are in (V ∪ T)+ and |x| ≤ |y|.
(Note that, if we interpret this rule strictly, λ can never be part of a context-sensitive language.)
Context-sensitive grammars:
The reason why we call these grammars context-sensitive is that we can rewrite the productions of any context-sensitive grammar to be in this form:
xAy → xvy
where x, y, and v are strings of any combination of variables and terminals, v is nonnull, and A is a single variable.
Context-sensitive grammars:
Let’s look more closely at this rule:
xAy → xvy
We can describe this rule by saying that:
A goes to v in the context of x on the left and y
on the right
Compare to context-free:
Remember that, in a context-free grammar, the
productions all have a single variable on the
left, such as:
A → aa
This rule states that we can always replace the
variable A with two terminal a’s anywhere A
happens to occur in an intermediate string.
Context-sensitive:
However, in a context-sensitive grammar, we
might have two rules such as:
aAa → aba
bAb → bbabb
which say that we can replace A with a single b
when it is in the middle of two a’s, but we
replace it with bab when it is in the middle of
two b’s.
Context-sensitive:
Clearly, context-sensitive rules give a grammar
more power. A context-sensitive grammar can
use the surrounding characters to decide to do
different things with a variable, instead of
always having to do the same thing every time.
Context-sensitive:
All productions in context-sensitive grammars are
non-decreasing or non-contracting; that is, they
never result in the length of the intermediate
string being reduced.
Context-sensitive:
Suppose that a TM, T, simulating a context-sensitive grammar is trying to generate a
particular string s to see if the string belongs to
L(T), the language of the Turing machine.
Assume that |s| is 6, and the TM has just written 7
characters on the tape. Is there any way that
this intermediate string is ever going to shrink
back to 6 characters?
No, not if the TM is a correct implementation of
the context-sensitive grammar.
This means two important things:
1. In a TM that is simulating a context-sensitive
grammar, we never need to have more cells on
our tape than the number of characters in the
string that the TM has been given to process.
This means two important things:
2. We never have to loop infinitely to reject a
string. As soon as the TM has checked all the
valid intermediate strings of its language that
are less than or equal to the length of the input
string, if it hasn’t found a match then it is never
going to. With a minor modification, the TM
can crash immediately to reject the string, and
does not have to go on examining additional
longer intermediate strings ad infinitum.
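The bounded search is easy to sketch. The grammar below is a standard context-sensitive grammar for {a^n b^n c^n : n ≥ 1}; it is not one of the grammars in these notes, and the code only illustrates the idea of searching the (finite) space of sentential forms no longer than the input, not the LBA construction itself.

    from collections import deque

    # A standard CSG for a^n b^n c^n, n >= 1 (an outside example):
    PRODUCTIONS = [("S", "aSBc"), ("S", "abc"), ("cB", "Bc"), ("bB", "bb")]

    def in_language(target, start="S"):
        # No production shrinks a sentential form, so a derivation of `target`
        # only passes through strings of length <= len(target).  Exploring
        # that finite space decides membership: no infinite loop is needed.
        limit = len(target)
        seen, queue = {start}, deque([start])
        while queue:
            form = queue.popleft()
            if form == target:
                return True
            for lhs, rhs in PRODUCTIONS:
                i = form.find(lhs)
                while i != -1:
                    new = form[:i] + rhs + form[i + len(lhs):]
                    if len(new) <= limit and new not in seen:
                        seen.add(new)
                        queue.append(new)
                    i = form.find(lhs, i + 1)
        return False

    print(in_language("aabbcc"))  # True
    print(in_language("aabbc"))   # False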
Note:
This is NOT true for standard TMs that simulate
unrestricted grammars; unrestricted grammars
can have any kind of grammar rule, including
rules that shrink the size of the intermediate
string.
Thus we might need a tape hundreds of cells
long to process a string that ends up being only
2 terminals long. So we can’t arbitrarily restrict
the length of a tape for unrestricted grammars
and recursively enumerable languages.
Linear-bounded automaton:
A Turing machine that has the length of its tape
limited to the length of the input string is called
a linear-bounded automaton (LBA).
Definition:
A linear bounded automaton is a 5-tuple M = (Q, Σ, Γ, q0, δ) that is a nondeterministic Turing machine except that:
1. There are two extra tape symbols < and >, which are not elements of Γ.
2. The TM begins in the configuration (q0, <x>), with its
tape head scanning the symbol < in cell 0. The >
symbol is in the cell immediately to the right of the
input string x.
3. The TM cannot replace < or > with anything else, nor
move the tape head left of < or right of >.
LBAs and CSLs:
The languages corresponding to LBAs are
precisely the class of context-sensitive
languages.
We can break this down into two theorems:
Theorem 11.4:
If L ⊆ Σ* is a context-sensitive language, then there is a linear-bounded automaton accepting L.
Theorem 11.5:
If there is a linear bounded automaton M = (Q, Σ, Γ, q0, δ) accepting the language L ⊆ Σ*, then there is a context-sensitive grammar generating L – {λ}.
LBAs:
One last fact about LBAs: The definition of
LBAs specifies that they be nondeterministic.
It is not known if nondeterminism is necessary or not, but no proof currently exists that a deterministic version of an LBA would be sufficiently powerful to recognize all context-sensitive languages.
Context-sensitive Grammars
Most characteristics of human languages can
be described by context-sensitive
grammars.
However, human languages are creative (we
don’t have to obey the rules of the
language, or can make up new rules), so
they are probably really not even
recursively enumerable.
But what about recursive languages?
Theorem:
Every context-sensitive language is recursive.
Why?
Because there is a TM (specifically, an LBA) that
computes the characteristic function (often
called a membership algorithm) for each CSL.
Recursive vs. context-sensitive:
Theorem:
There exists at least one recursive language L over {a, b} such that L – {λ} is not context-sensitive. (see book for proof)
The previous two theorems together show that the
context-sensitive languages are a proper subset
of the recursive languages:
{context-sensitive languages} ⊂ {recursive languages}
Recursive languages:
This means that LBAs can’t recognize all
recursive languages.
In fact, we don’t have a different type of automaton that can recognize all and only the recursive languages – just TMs that always halt or crash, and never loop.
Also, recursive languages don’t seem to have a
grammar that corresponds to them. (Strange,
but true.)
Summary: languages
We have studied seven types of languages in this
course. They can be represented by a set of
concentric circles, each circle representing the
fact that the languages form proper subsets of
one another as you go from the outer to the
inner circles.
Languages:
non-recursively enumerable
recursively-enumerable
recursive
context-sensitive
non-deterministic context-free
deterministic context-free
regular
Summary: Automata
All of the automata we study in this class have a
finite number of states. They differ in the “auxiliary
memory” they have and how it is organized.
DFA / NFA: no auxiliary memory
DPDA / NPDA: infinite stack memory
LBA: proportional tape memory
DTM / NTM: infinite tape memory
Automata:
Turing machines
linear-bounded automata
non-deterministic push-down automata
deterministic push-down automata
finite-state automata
Summary: grammars
Noam Chomsky presented four classes of
grammars for generating languages. They can
be represented by a set of concentric circles,
each circle representing the fact that the
grammars form proper subsets of one another
and become weaker as you go from the outer to
the inner circles. Each corresponds to a class of
languages and to a type of automaton.
Grammars:
Phrase-structure, or unrestricted (Type 0)
context-sensitive (Type 1)
context-free (Type 2)
regular (Type 3)
Chomsky Hierarchy of Grammars
(All production rules have the form X → Y.)
Type 0 – Phrase structure (recursively enumerable): X = any string with ≥ 1 nonterminal; Y = any string. Automaton: TM.
Type 1 – Context-sensitive: X = any string with ≥ 1 nonterminal; Y = any string with length ≥ |X|. Automaton: LBA.
Type 2 – Context-free: X = 1 nonterminal; Y = any string. Automaton: PDA.
Type 3 – Regular: X = 1 nonterminal; Y = 1 terminal, or 1 terminal and 1 nonterminal. Automaton: FA.