Language Recognition

INFO 2950
Prof. Carla Gomes
Module
Modeling Computation:
Language Recognition
Rosen, Chapter 12.4
What sets can be recognized by
finite-state automata?
Regular Sets
Regular Sets
Definition: A regular set is a set that can be
generated starting from the empty set, empty
string, and single elements from the alphabet,
using concatenations, unions, and Kleene closures
in arbitrary order.
We will give a more precise definition after we
define a regular expression.
Regular Expressions
Definition: The regular expressions over a set I are
defined recursively by:
– ∅ (the empty set) is a regular expression,
– λ (the set containing the empty string) is a regular
expression,
– x is a regular expression for all x ∈ I,
– (AB), (A ∪ B), and A* are regular expressions if A and B
are regular expressions.
Definition: A regular set is a set represented by a
regular expression.
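The recursive definition can be read as a recipe for computing the set an expression represents. The sketch below is one possible reading, not part of the course material: the tuple encoding and the function name `language` are assumptions made for illustration, and because A* is infinite in general, the function only lists strings up to a chosen length.

```python
def language(expr, max_len):
    """Return every string of length <= max_len in the set represented by expr.

    Expressions are nested tuples (an encoding assumed for this sketch):
      ("empty",)          the empty set
      ("lambda",)         the set containing the empty string
      ("sym", x)          a single symbol x from the alphabet I
      ("concat", A, B)    (AB)
      ("union", A, B)     (A ∪ B)
      ("star", A)         A*
    """
    kind = expr[0]
    if kind == "empty":
        return set()
    if kind == "lambda":
        return {""}
    if kind == "sym":
        return {expr[1]} if max_len >= 1 else set()
    if kind == "union":
        return language(expr[1], max_len) | language(expr[2], max_len)
    if kind == "concat":
        left = language(expr[1], max_len)
        right = language(expr[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":
        base = language(expr[1], max_len)
        result = {""}          # zero copies
        frontier = {""}
        while frontier:        # keep appending copies until nothing new fits
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown expression kind: {kind!r}")


zero, one = ("sym", "0"), ("sym", "1")
zero_then_ones = ("concat", zero, ("star", one))   # the expression 01*
print(sorted(language(zero_then_ones, 3)))         # ['0', '01', '011']
```

Each branch of the function corresponds to one clause of the definition, so the set computed for an expression is built from the sets of its sub-expressions.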
Regular Expression Example
Examples: 001*, 1(0(01)*11, and AB*C are
regular expressions
The regular set defined by the regular expression 01*
is the set of strings starting with a 0 followed by 0
or more 1s.
The regular set defined by (10)* is the set of strings
containing 0 or more copies of 10.
The regular set defined by 0(0 ∪ 1)*1 is the set of all
binary strings beginning with 0 and ending with 1.
The regular set defined by (0 ∪ 1)1(0 ∪ 1) is the set of
strings {010, 011, 110, 111}.
What are the strings represented by:
10*
A 1 followed by any number of 0s (including no 0s)
(10)*
Any number of copies of 10 (including the null string)
0 ∪ 01
The string 0 or the string 01
0(0 ∪ 1)*
Any string beginning with 0
(0*1)*
Any string not ending with a 0 (including the null string)
Find a regular expression
The set of bit strings with even length
(00 ∪ 01 ∪ 10 ∪ 11)*
The set of bit strings ending with a 0 and not containing 11
Concatenations of 0 or 10; not the null string
(0 ∪ 10)*(0 ∪ 10)
The set of bit strings containing an odd number of 0s
At least one 0
Zero or more 1s, followed by a 0, followed by zero or more 1s
1*01*(01*01*)*
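Answers to exercises like these can be sanity-checked by brute force. The sketch below uses Python's `re` module, which writes union as `|` where the slides write ∪, and compares each expression with a plain-language predicate on every bit string up to a fixed length. The helper names `bit_strings` and `agrees` are assumptions made for this illustration, and agreement up to length 10 is evidence, not a proof.

```python
import re
from itertools import product


def bit_strings(max_len):
    """Yield every bit string of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def agrees(pattern, predicate, max_len=10):
    """True if pattern fully matches exactly the strings satisfying predicate."""
    rx = re.compile(pattern)
    return all((rx.fullmatch(s) is not None) == predicate(s)
               for s in bit_strings(max_len))


# Each entry pairs an expression from the exercises with the property it should capture.
checks = {
    "even length": ("(00|01|10|11)*", lambda s: len(s) % 2 == 0),
    "ends with 0, no 11": ("(0|10)*(0|10)",
                           lambda s: s.endswith("0") and "11" not in s),
    "odd number of 0s": ("1*01*(01*01*)*", lambda s: s.count("0") % 2 == 1),
}
for name, (pattern, predicate) in checks.items():
    print(name, agrees(pattern, predicate))
```

`fullmatch` is used rather than `search` because a regular expression in the sense of these slides describes whole strings, not substrings.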
Regular Expression Applications
Regular expressions are actually used quite often in
computer science.
For instance, if you are editing a file with vi, and
want to see if it contains the string blah followed
by a number followed by any character followed
by the letter Q, you can use the regular expression
blah[0-9][0-9]*.Q
This works because vi uses regular expressions for
searching.
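The same pattern can be tried outside an editor. The sketch below swaps vi for Python's `re` module, which accepts this particular pattern unchanged; the function name `contains_pattern` and the sample lines are made up for illustration.

```python
import re

# Same pattern as in the text: "blah", one or more digits, any one character, then Q.
PATTERN = re.compile(r"blah[0-9][0-9]*.Q")


def contains_pattern(line):
    """True if the pattern occurs anywhere in line, as an editor search would report."""
    return PATTERN.search(line) is not None


for line in ["xx blah42-Q yy", "blah7Q", "blahQ", "blah77Q"]:
    print(repr(line), contains_pattern(line))
```

Note that "blah7Q" does not match: after the digit, the `.` consumes the Q and nothing is left for the final Q. "blah77Q" does match, with the second 7 playing the role of "any character".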
Regular Expression      Regular Grammar
a*                      S → λ | aS
(a+b)*                  S → λ | aS | bS
a* + b*                 S → λ | A | B
                        A → a | aA
                        B → b | bB
a*b                     S → b | aS
ba*                     S → bA
                        A → λ | aA
(ab)*                   S → λ | abS
EXAMPLE 1
Consider the language { a^m b^n | m, n ∈ N }, which is
represented by the regular expression a*b*.
A regular grammar for this language can be written
as follows:
S → λ | aS | B
B → b | bB.
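One way to check a grammar like this is to apply its productions mechanically and compare the results with the regular expression. The sketch below is an illustration under stated assumptions: the dictionary encoding, with "" standing for the empty string, and the function name `generate` are not from the slides, and the search is only guaranteed to finish for grammars whose sentential forms carry at most one nonterminal, as is the case here.

```python
import re
from collections import deque

# Productions of the grammar in Example 1; "" stands for the empty string.
GRAMMAR = {
    "S": ["", "aS", "B"],
    "B": ["b", "bB"],
}


def generate(grammar, start, max_len):
    """Return all terminal strings with at most max_len symbols derivable from start."""
    results = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, if any.
        idx = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if idx is None:
            results.add(form)  # only terminals left: a generated string
            continue
        for body in grammar[form[idx]]:
            new_form = form[:idx] + body + form[idx + 1:]
            terminals = sum(1 for ch in new_form if ch not in grammar)
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results


strings = generate(GRAMMAR, "S", 4)
print(sorted(strings, key=lambda s: (len(s), s)))
print(all(re.fullmatch("a*b*", s) for s in strings))
```

The first line printed lists the derived strings by length; the second reports whether every one of them is matched by a*b*.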
Grammars, Expressions, and
Automata
• Consider the set
A={binary strings which start with 0 and end with 1}
We saw that A is recognized by a finite-state automaton.
A is generated by the grammar with V={S,A,0,1},
T={0,1}, and P={S → 0A, A → 0A, A → 1A, A → 1}
We also saw that A is defined by the regular expression
0(0 ∪ 1)*1
• This is no coincidence, as we will see next.
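To make the automaton side of this concrete, here is a small sketch of a deterministic automaton for A, checked against the regular expression on all short bit strings. The state names and the transition table are one possible construction chosen for illustration; the automaton shown in the lecture may be drawn differently.

```python
import re
from itertools import product

# Transition table of a DFA for "starts with 0 and ends with 1".
# "dead" is a trap state entered when the first symbol is 1.
DELTA = {
    ("start", "0"): "last0", ("start", "1"): "dead",
    ("last0", "0"): "last0", ("last0", "1"): "last1",
    ("last1", "0"): "last0", ("last1", "1"): "last1",
    ("dead", "0"): "dead",   ("dead", "1"): "dead",
}
ACCEPTING = {"last1"}


def dfa_accepts(s):
    """Run the DFA on bit string s and report whether it ends in an accepting state."""
    state = "start"
    for ch in s:
        state = DELTA[(state, ch)]
    return state in ACCEPTING


# Compare with the regular expression (Python writes union as "|").
rx = re.compile("0(0|1)*1")
same = all(dfa_accepts("".join(bits)) == (rx.fullmatch("".join(bits)) is not None)
           for n in range(9) for bits in product("01", repeat=n))
print(same)
```

The automaton only needs to remember two facts: whether the first symbol was 0, and what the most recent symbol was.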
Three Equivalent Representations
Regular expressions, finite automata, and regular languages:
each can describe the others.
Kleene’s Theorem:
For every regular expression, there is a
deterministic finite-state automaton that
defines the same language, and vice versa.
Grammars, Expressions, and
Automata
• Theorem: Let L be a language. The following three
statements are equivalent:
– L is a regular set (that is, L is represented by a regular expression)
– L is a regular language (that is, L is generated by a regular grammar)
– L is recognized by a finite-state automaton
• Put another way, L is a regular set if and only if L is a
regular language if and only if L is recognized by a finite-state
automaton.
• In other words, regular sets, regular languages, and
languages recognized by finite-state automata are all the
same thing.
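One direction of this equivalence, from a regular grammar to an automaton, can be sketched directly. The code below assumes every production has the form A → aB, A → a, or A → λ, uses one state per nonterminal plus one extra accepting state, and simulates the resulting nondeterministic automaton by tracking the set of states it could be in. The names `grammar_to_nfa`, `nfa_accepts`, and `<final>` are hypothetical, and the grammar used is the one given earlier for strings that start with 0 and end with 1.

```python
import re
from itertools import product


def grammar_to_nfa(productions, start):
    """Build (delta, start, accepting) from productions given as (head, body) pairs.

    Assumes each body is "" (for λ), one terminal, or a terminal followed by
    one nonterminal.
    """
    final = "<final>"          # extra accepting state for productions A -> a
    delta = {}
    accepting = {final}
    for head, body in productions:
        if body == "":
            accepting.add(head)                              # A -> λ
        elif len(body) == 1:
            delta.setdefault((head, body), set()).add(final)  # A -> a
        else:
            delta.setdefault((head, body[0]), set()).add(body[1])  # A -> aB
    return delta, start, accepting


def nfa_accepts(nfa, s):
    """Simulate the automaton on s, tracking every state it could be in."""
    delta, start, accepting = nfa
    current = {start}
    for ch in s:
        current = set().union(*(delta.get((q, ch), set()) for q in current))
    return bool(current & accepting)


# Grammar with P = {S -> 0A, A -> 0A, A -> 1A, A -> 1}.
nfa = grammar_to_nfa([("S", "0A"), ("A", "0A"), ("A", "1A"), ("A", "1")], "S")
rx = re.compile("0(0|1)*1")
same = all(nfa_accepts(nfa, "".join(b)) == (rx.fullmatch("".join(b)) is not None)
           for n in range(9) for b in product("01", repeat=n))
print(same)
```

Here the grammar, the automaton built from it, and the regular expression agree on every bit string of length at most 8.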
Example
• Example: What language does the following finite-state
automaton recognize?
Complex Example Continued
• If we start by going to state S1, we can recognize 000, 0110, 011100, 0111110,
011111100, 00100, 0010100, 01110110, 01110100, …
• It is not easy to see the pattern right away, but notice that they
– Start with 0
– Can have any number of instances of 111 or 01 interleaved
– Can then have either 00 or 110
– Can end with any number of 1s
• These are all of the form 0(111 ∪ 01)*(00 ∪ 110)1*
• But we can also start by going to S6
Complex Example Continued
• If we start by going to S6 we notice that the strings
– Start with 1
– Have any number of occurrences of 01
– Have a 1
– End with as many 0s as we want
• These are of the form 1(01)*10*
• Thus, we can recognize (0(111 ∪ 01)*(00 ∪ 110)1*) ∪ (1(01)*10*)
Limitations
• Problem: Find a finite-state automaton that recognizes the
following language
L = {0^n 1^n | n = 0, 1, 2, …}
• Solution: It cannot be done.
• Proof: Take an advanced course.
• Can you describe L with a regular expression?
• Can you give a regular grammar that generates L?
• Can you give any grammar that generates L?
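As a hint for the last question, here is a small sketch. It assumes the grammar S → 0S1 | λ, a standard context-free grammar for L that is not stated on the slides, and uses hypothetical helper names `derive` and `in_L`. The membership test compares against n zeros followed by n ones, which amounts to counting without bound, something a fixed, finite set of states cannot do.

```python
def derive(n):
    """Apply S -> 0S1 exactly n times, then S -> λ, and return the resulting string."""
    form = "S"
    for _ in range(n):
        form = form.replace("S", "0S1")
    return form.replace("S", "")


def in_L(s):
    """True if s consists of n 0s followed by n 1s for some n >= 0."""
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "0" * n + "1" * n


for n in range(4):
    print(repr(derive(n)), in_L(derive(n)))
```

The production S → 0S1 places a terminal on both sides of the nonterminal, which is exactly what the production forms of a regular grammar do not allow.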
Models of computing
Machine model               Language class
DFA                         Regular languages
Pushdown automata           Context-free
Bounded Turing machines     Context-sensitive
Turing machines             Phrase-structure
Summary
• Hopefully it is clear that although finite-state machines and
finite-state automata are useful models of computation,
they have serious limitations.
• Are there more powerful ways to model computation?
• The answer is: Yes.
• Some more powerful models include
– Pushdown automata
– Linear bounded automata
– Turing machines
– Quantum computation models