Formal Languages and Automata Theory


Unit 1: Automata Theory and Formal Languages
Readings 1, 2.2, 2.3
1
What is automata theory
• Automata theory is the study of abstract
computational devices
• Abstract devices are (simplified) models of real
computations
• Computations happen everywhere: On your laptop,
on your cell phone, in nature, …
• Why do we need abstract models?
2
A simple computer
[Figure: circuit with a battery, a switch, and a light bulb]
input: switch
output: light bulb
actions: flip switch
states: on, off
3
A simple “computer”
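[Figure: the battery circuit, with a state diagram: states off (start) and on, and an arrow labeled f in each direction between them]
input: switch
output: light bulb
actions: f for “flip switch”
states: on, off
bulb is on if and only if there was an odd number of flips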
4
Another “computer”
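[Figure: circuit with a battery and two switches, with a state diagram: four states (three labeled off, one labeled on; the start state is labeled off) and arrows labeled 1 and 2]
inputs: switches 1 and 2
actions: 1 for “flip switch 1”, 2 for “flip switch 2”
states: on, off
bulb is on if and only if both switches were flipped an odd number of times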
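The two-switch device can be sketched in a few lines of Python. This is only an illustration: the function name `bulb_is_on` and the encoding of actions as the characters "1" and "2" are assumptions, not part of the slides. The state is the pair of parities, which is why four states suffice.

```python
def bulb_is_on(actions: str) -> bool:
    """Return True if the bulb is on after the given sequence of flips.

    `actions` is a string over {"1", "2"}; each character flips that switch.
    The state is the pair (switch 1 flipped an odd number of times?,
    switch 2 flipped an odd number of times?).
    """
    odd1, odd2 = False, False  # start state: no switch has been flipped
    for action in actions:
        if action == "1":
            odd1 = not odd1
        elif action == "2":
            odd2 = not odd2
        else:
            raise ValueError(f"unknown action: {action!r}")
    # The bulb is on only when both parities are odd.
    return odd1 and odd2


print(bulb_is_on("12"))    # True
print(bulb_is_on("1221"))  # False
```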
5
A design problem
[Figure: circuit with a battery, five switches numbered 1 to 5, and an unknown component marked “?”]
Can you design a circuit where the light is on if and only
if all the switches were flipped exactly the same number
of times?
6
A design problem
• Such devices are difficult to reason about, because
they can be designed in an infinite number of ways
• By representing them as abstract computational
devices, or automata, we will learn how to answer
such questions
7
These devices can model many things
• They can describe the operation of any “small
computer”, like the control component of an alarm
clock or a microwave
• They are also used in lexical analyzers to recognize
well formed expressions in programming languages:
ab1 is a legal name of a variable in C
5u= is not
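As a rough illustration of the lexical-analyzer use, the check below tests the two names from the slide with a regular expression. It assumes a simplified rule (a letter or underscore followed by letters, digits, or underscores) and ignores details of real C such as reserved keywords; the name `is_legal_name` is hypothetical.

```python
import re

# Simplified identifier rule (an assumption, not the full C grammar):
# a letter or underscore, then any number of letters, digits, or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_legal_name(s: str) -> bool:
    # fullmatch requires the whole string to fit the pattern.
    return IDENTIFIER.fullmatch(s) is not None


print(is_legal_name("ab1"))  # True
print(is_legal_name("5u="))  # False
```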
8
Different kinds of automata
• This was only one example of a computational device,
and there are others
• We will look at different devices, and look at the
following questions:
– What can a given type of device compute, and what are its
limitations?
– Is one type of device more powerful than another?
9
Some devices we will see
• finite automata: Devices with a finite amount of memory. Used to model “small” computers.
• push-down automata: Devices with infinite memory that can be accessed in a restricted way. Used to model parsers, etc.
• Turing Machines: Devices with infinite memory. Used to model any computer.
• time-bounded Turing Machines: Infinite memory, but bounded running time. Used to model any computer program that runs in a “reasonable” amount of time.
10
Some highlights of the course
• Finite automata
– We will understand what kinds of things a device with finite
memory can do, and what it cannot do
– Introduce simulation: the ability of one device to “imitate”
another device
– Introduce nondeterminism: the ability of a device to make
arbitrary choices
• Push-down automata
– These devices are related to grammars, which describe the
structure of programming (and natural) languages
11
Some highlights of the course
• Turing Machines
– This is a general model of a computer, capturing anything
we could ever hope to compute
– Surprisingly, there are many things that we cannot
compute, for example:
Write a program that, given the code of another
program in C, tells if this program ever outputs
the word “hello”
– It seems that you should be able to tell just by looking at
the program, but it is impossible to do!
12
Some highlights of the course
• Time-bounded Turing Machines
– Many problems are possible to solve on a computer in
principle, but take too much time in practice
– Traveling salesman: Given a list of cities, find the shortest
way to visit them and come back home
[Figure: map showing the cities Beijing, Chengdu, Xian, Shanghai, Guangzhou, Hong Kong]
– Easy in principle: Try the cities in every possible order
– Hard in practice: 100 cities can be ordered in 100! ≈ 9.3 × 10^157 ways, so this would take far more than 100 years even on the fastest computer!
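The “try every order” idea can be sketched as follows. The four cities and their coordinates are hypothetical, chosen only so the result is easy to check by hand; they are not real map data. The function name `shortest_tour` is also an assumption.

```python
from itertools import permutations
from math import dist

# Hypothetical planar coordinates, for illustration only (not real map data).
cities = {"A": (0, 0), "B": (3, 0), "C": (3, 4), "D": (0, 4)}


def shortest_tour(cities: dict, home: str):
    """Brute force: try every order of the other cities, keep the shortest."""
    others = [c for c in cities if c != home]
    best_len, best_tour = float("inf"), None
    for order in permutations(others):
        tour = (home, *order, home)
        # Sum the straight-line distances of consecutive legs.
        length = sum(dist(cities[a], cities[b]) for a, b in zip(tour, tour[1:]))
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour


length, tour = shortest_tour(cities, "A")
print(length, tour)  # 14.0 ('A', 'B', 'C', 'D', 'A')
```

The loop runs once per permutation, so its running time grows factorially with the number of cities, which is the point of the slide.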
13
Preliminaries of automata theory
• How do we formalize the question
Can device A solve problem B?
• First, we need a formal way of describing the
problems that we are interested in solving
14
Problems
• Examples of problems we will consider
– Given a word s, does it contain the subword “fool”?
– Given a number n, is it divisible by 7?
– Given a pair of words s and t, are they the same?
– Given an expression with brackets, e.g. (()()), does every left bracket match with a subsequent right bracket?
• All of these have “yes/no” answers.
• There are other types of problems that ask “Find this” or “How many of that”, but we won’t look at those.
15
Alphabets and strings
• A common way to talk about words, numbers, pairs of
words, etc. is by representing them as strings
• To define strings, we start with an alphabet
An alphabet is a finite set of symbols.
• Examples
S1 = {a, b, c, d, …, z}: the set of letters in English
S2 = {0, 1, …, 9}: the set of (base 10) digits
S3 = {a, b, …, z, #}: the set of letters plus the
special symbol #
S4 = {(, )}: the set of open and closed brackets
16
Strings
A string over alphabet S is a finite sequence
of symbols in S.
• The empty string will be denoted by e
• Examples
abfbz is a string over S1 = {a, b, c, d, …, z}
9021 is a string over S2 = {0, 1, …, 9}
ab#bc is a string over S3 = {a, b, …, z, #}
))()(() is a string over S4 = {(, )}
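A small sketch of these definitions, assuming alphabets are represented as Python sets of one-character strings; the helper name `is_string_over` is hypothetical. Note that the empty string is a string over every alphabet.

```python
import string

S1 = set(string.ascii_lowercase)  # {a, b, ..., z}
S2 = set(string.digits)           # {0, 1, ..., 9}
S3 = S1 | {"#"}                   # letters plus the special symbol #
S4 = {"(", ")"}                   # open and closed brackets


def is_string_over(s: str, alphabet: set) -> bool:
    # A string over an alphabet uses only symbols from that alphabet.
    return all(symbol in alphabet for symbol in s)


print(is_string_over("abfbz", S1))  # True
print(is_string_over("ab#bc", S1))  # False
print(is_string_over("ab#bc", S3))  # True
```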
17
Languages
A language is a set of strings over an alphabet.
• Languages can be used to describe problems with
“yes/no” answers, for example:
L1 = The set of all strings over S1 that contain the substring “fool”
L2 = The set of all strings over S2 that are divisible by 7
   = {7, 14, 21, …}
L3 = The set of all strings of the form s#s where s is any string over {a, b, …, z}
L4 = The set of all strings over S4 where every ( can be matched with a subsequent )
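Each language corresponds to a yes/no membership test. The sketch below writes the four tests as Python predicates; the function names are hypothetical. Two assumptions are worth flagging: `in_L2` judges a digit string by its numeric value (so it also accepts "0" and strings with leading zeros, which the listing {7, 14, 21, …} does not address), and `in_L4` reads the definition as “balanced brackets”, so an unmatched ) is rejected too.

```python
import string

LETTERS = set(string.ascii_lowercase)


def in_L1(s: str) -> bool:
    return "fool" in s


def in_L2(s: str) -> bool:
    # Assumption: a nonempty digit string is judged by its numeric value.
    return s != "" and all(c in string.digits for c in s) and int(s) % 7 == 0


def in_L3(s: str) -> bool:
    # s must be of the form w#w with w over {a, ..., z}.
    left, sep, right = s.partition("#")
    return sep == "#" and left == right and all(c in LETTERS for c in left)


def in_L4(s: str) -> bool:
    # Assumption: "matched" means balanced, checked with a depth counter.
    depth = 0
    for c in s:
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:  # a ) with no earlier ( to match
                return False
        else:
            return False   # not a string over S4
    return depth == 0


print(in_L1("afoolb"), in_L2("21"), in_L3("ab#ab"), in_L4("(()())"))  # True True True True
```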
18
Finite Automata
19
Example of a finite automaton
[Figure: state diagram with states off and on; an arrow labeled f leads from each state to the other]
• There are states off and on, the automaton starts in
off and tries to reach the “good state” on
• What sequences of fs lead to the good state?
• Answer: {f, fff, fffff, …} = {f^n: n is odd}
• This is an example of a deterministic finite automaton
over alphabet {f}
20
Deterministic finite automata
• A deterministic finite automaton (DFA) is a 5-tuple (Q, S, d, q0, F) where
– Q is a finite set of states
– S is an alphabet
– d: Q × S → Q is a transition function
– q0 ∈ Q is the initial state
– F ⊆ Q is a set of accepting states (or final states).
• In diagrams, the accepting states will be denoted by
double circles
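The 5-tuple translates almost directly into code. The sketch below is one possible encoding, not a standard API: the class name `DFA`, its field names, and the choice of a dict keyed by (state, symbol) for d are all assumptions. The instance is the on/off switch automaton from the earlier slide.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    states: frozenset     # Q
    alphabet: frozenset   # S
    delta: dict           # d, stored as {(state, symbol): next_state}
    start: str            # q0
    accepting: frozenset  # F

    def accepts(self, word: str) -> bool:
        # Follow exactly one transition per input symbol, left to right.
        state = self.start
        for symbol in word:
            state = self.delta[(state, symbol)]
        return state in self.accepting


switch = DFA(
    states=frozenset({"off", "on"}),
    alphabet=frozenset({"f"}),
    delta={("off", "f"): "on", ("on", "f"): "off"},
    start="off",
    accepting=frozenset({"on"}),
)

print(switch.accepts("fff"))  # True
print(switch.accepts("ff"))   # False
```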
21
Example
alphabet S = {0, 1}
set of states Q = {q0, q1, q2}
initial state q0
accepting states F = {q0, q1}

transition function d:

            inputs
states      0       1
q0          q0      q1
q1          q2      q1
q2          q2      q2
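To get a feel for this example, the transition table can be run on every short input. The sketch below copies the table into a nested dict; the names `DELTA`, `accepts`, and `accepted_up_to` are hypothetical.

```python
from itertools import product

# Transition table d copied from the slide: DELTA[state][symbol] -> next state.
DELTA = {
    "q0": {"0": "q0", "1": "q1"},
    "q1": {"0": "q2", "1": "q1"},
    "q2": {"0": "q2", "1": "q2"},
}
START = "q0"
ACCEPTING = {"q0", "q1"}


def accepts(word: str) -> bool:
    state = START
    for symbol in word:
        state = DELTA[state][symbol]
    return state in ACCEPTING


def accepted_up_to(max_len: int) -> list:
    # Enumerate all strings over {0, 1} of length at most max_len.
    words = ("".join(p) for n in range(max_len + 1) for p in product("01", repeat=n))
    return [w for w in words if accepts(w)]


print(accepted_up_to(2))  # ['', '0', '1', '00', '01', '11']
```

The only string of length two that is rejected is 10: once a 0 follows a 1, the automaton moves to q2 and can never leave it.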
22
Language of a DFA
The language of a DFA (Q, S, d, q0, F) is the set of
all strings over S that, starting from q0 and
following the transitions as the string is read left
to right, will reach some accepting state.
[Figure: M, the state diagram with states off and on; an arrow labeled f leads from each state to the other]
• Language of M is {f, fff, fffff, …} = {f^n: n is odd}
23
Examples
[Figure: three DFA diagrams over the alphabet {0, 1}]
What are the languages of these DFAs?
24
Examples
• Construct a DFA that accepts the language
( S = {0, 1} )
L = {010, 1}
• Answer
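[Figure: state diagram with states qe (start), q0, q01, q010, q1, qdie; accepting states q010 and q1]
qe: 0 → q0, 1 → q1
q0: 0 → qdie, 1 → q01
q01: 0 → q010, 1 → qdie
q010: 0, 1 → qdie
q1: 0, 1 → qdie
qdie: 0, 1 → qdie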
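The answer can be checked mechanically. The states in the diagram are named after the prefix read so far (qe for the empty string, then q0, q01, q010, q1), plus the dead state qdie. The transitions below are one reading of the diagram based on those names and should be treated as an assumption, as should the choice of q010 and q1 as accepting states.

```python
from itertools import product

START = "qe"
ACCEPTING = {"q010", "q1"}  # assumed: the states reached by 010 and by 1
DELTA = {
    "qe":   {"0": "q0",   "1": "q1"},
    "q0":   {"0": "qdie", "1": "q01"},
    "q01":  {"0": "q010", "1": "qdie"},
    "q010": {"0": "qdie", "1": "qdie"},
    "q1":   {"0": "qdie", "1": "qdie"},
    "qdie": {"0": "qdie", "1": "qdie"},
}


def accepts(word: str) -> bool:
    state = START
    for symbol in word:
        state = DELTA[state][symbol]
    return state in ACCEPTING


# Every string over {0, 1} of length at most 5 that the DFA accepts.
accepted = [
    "".join(p)
    for n in range(6)
    for p in product("01", repeat=n)
    if accepts("".join(p))
]
print(accepted)  # ['1', '010']
```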
26
Examples
• Construct a DFA over alphabet {0, 1} that accepts all
strings that end in 101
• Hint: The DFA must “remember” the last 3 bits of
the string it is reading
• Sketch of answer:
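[Figure: partial state diagram whose states qe, q0, q1, q00, q01, q10, q11, q000, q001, …, q101, …, q111 record the last (up to) three bits read, with transitions labeled 0 and 1; q101 is the accepting state]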
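The hint can be turned into a complete construction by generating the states instead of drawing them. In the sketch below each state is the string of the last (up to) three bits read, so "" plays the role of qe and "101" the role of q101; this encoding and the names `STATES`, `DELTA`, and `accepts` are assumptions made for illustration.

```python
from itertools import product

ALPHABET = "01"
# One state per string of at most 3 bits: "", "0", "1", "00", ..., "111".
STATES = ["".join(p) for n in range(4) for p in product(ALPHABET, repeat=n)]
# Reading symbol a appends it and keeps only the last three bits.
DELTA = {(q, a): (q + a)[-3:] for q in STATES for a in ALPHABET}
START = ""
ACCEPTING = {"101"}


def accepts(word: str) -> bool:
    state = START
    for a in word:
        state = DELTA[(state, a)]
    return state in ACCEPTING


print(len(STATES))                          # 15
print(accepts("0110101"), accepts("1010"))  # True False
```

Fifteen states for a three-bit pattern is what makes the nondeterministic shortcut on the next slides attractive.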
29
Would be easier if…
• Suppose we could guess when the string we are
reading has only 3 symbols left
• Then we could simply look for the sequence 101
and accept if we see it
[Figure: a point marked “3 symbols left”, followed by transitions labeled 1, 0, 1, and a state qdie]
This is not a DFA!
30
Nondeterminism
• Nondeterminism is the ability to make guesses, which
we can later verify
• Informal nondeterministic algorithm for language of
strings that end in 101:
1. Guess if you are approaching end of input
2. If guess is yes, look for 101 and accept if you see it
3. If guess is no, read one more symbol and go to step 1
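One way to make “guessing” concrete on an ordinary computer is to try every possible guess and accept if any of them works. The sketch below does this for the three steps above; `guess_at` (the position at which the guess switches from “no” to “yes”) and both function names are hypothetical.

```python
def accepts_with_guess(word: str, guess_at: int) -> bool:
    """Run the algorithm for one fixed sequence of guesses.

    The guess is "no" for the first `guess_at` symbols (step 3) and "yes"
    right after them (step 2), so the rest of the input must be exactly 101.
    """
    return word[guess_at:] == "101"


def accepts(word: str) -> bool:
    # Nondeterministic acceptance: SOME sequence of guesses leads to accept.
    return any(accepts_with_guess(word, i) for i in range(len(word) + 1))


print(accepts("1101"))  # True
print(accepts("0110"))  # False
```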
31
Nondeterministic finite automaton
• This is a kind of automaton that allows you to make
guesses
[Figure: NFA with states q0, q1, q2, q3: q0 loops on 0, 1; q0 --1--> q1 --0--> q2 --1--> q3]
• Each state can have zero, one, or more transitions
out labeled by the same symbol
32
Semantics of guessing
[Figure: NFA with states q0, q1, q2, q3: q0 loops on 0, 1; q0 --1--> q1 --0--> q2 --1--> q3]
• State q0 has two transitions labeled 1
• Upon reading 1, we have the choice of staying in q0 or
moving to q1
33
Semantics of guessing
[Figure: NFA with states q0, q1, q2, q3: q0 loops on 0, 1; q0 --1--> q1 --0--> q2 --1--> q3]
• State q1 has no transition labeled 1
• Upon reading 1 in q1, we die; upon reading 0, we
continue to q2
34
Semantics of guessing
[Figure: NFA with states q0, q1, q2, q3: q0 loops on 0, 1; q0 --1--> q1 --0--> q2 --1--> q3]
• State q3 has no transition going out
• Upon reading anything in q3, we die
35
Meaning of automaton
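[Figure: NFA with states q0, q1, q2, q3: q0 loops on 0, 1; q0 --1--> q1 --0--> q2 --1--> q3, annotated with the following remarks]
• Guess if you are 3 symbols away from end of input
• If so, guess you will see the pattern 101
• Check that you are at the end of input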
36
Formal definition
• A nondeterministic finite automaton (NFA) is a 5-tuple (Q, S, d, q0, F) where
– Q is a finite set of states
– S is an alphabet
– d: Q × S → subsets of Q is a transition function
– q0 ∈ Q is the initial state
– F ⊆ Q is a set of accepting states (or final states).
• Only difference from DFA is that output of d is a set
of states
37
Example
[Figure: NFA with states q0, q1, q2, q3: q0 loops on 0, 1; q0 --1--> q1 --0--> q2 --1--> q3]
alphabet S = {0, 1}
states Q = {q0, q1, q2, q3}
initial state q0
accepting states F = {q3}

transition function d:

            inputs
states      0        1
q0          {q0}     {q0, q1}
q1          {q2}     ∅
q2          ∅        {q3}
q3          ∅        ∅
38
Language of an NFA
The language of an NFA is the set of all strings for
which there is some path that, starting from the
initial state, leads to an accepting state as the
string is read left to right.
• Example
[Figure: NFA with states q0, q1, q2, q3: q0 loops on 0, 1; q0 --1--> q1 --0--> q2 --1--> q3]
– 1101 is accepted, but 0110 is not
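The “there is some path” definition can be checked in two ways, sketched below for the example NFA (q0 loops on 0 and 1, then q0 to q1 on 1, q1 to q2 on 0, q2 to q3 on 1). `accepts` tracks the set of all states the NFA could be in; `accepting_path` searches for one concrete path. Both function names and the dict encoding (missing entries stand for the empty set) are assumptions.

```python
# Transition function of the example NFA; missing entries mean the empty set.
DELTA = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},
    ("q1", "0"): {"q2"},
    ("q2", "1"): {"q3"},
}
START = "q0"
ACCEPTING = {"q3"}


def accepts(word: str) -> bool:
    # Track every state reachable after each symbol.
    current = {START}
    for symbol in word:
        current = {r for q in current for r in DELTA.get((q, symbol), set())}
    return bool(current & ACCEPTING)


def accepting_path(word: str, state: str = START):
    """Return the states along one accepting path, or None if there is none."""
    if not word:
        return [state] if state in ACCEPTING else None
    for nxt in sorted(DELTA.get((state, word[0]), ())):
        rest = accepting_path(word[1:], nxt)
        if rest is not None:
            return [state] + rest
    return None


print(accepts("1101"), accepts("0110"))  # True False
print(accepting_path("1101"))            # ['q0', 'q0', 'q1', 'q2', 'q3']
print(accepting_path("0110"))            # None
```

The path search may explore exponentially many branches in the worst case, while the set-based version does a bounded amount of work per input symbol.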
39
NFAs are as powerful as DFAs
• Obviously, an NFA can do everything a DFA can do
• But can it do more?
NO!
• Theorem
A language L is accepted by some DFA if and
only if it is accepted by some NFA.
41
Proof of theorem
• To prove the theorem, we have to show that for
every NFA there is a DFA that accepts the same
language
• We will give a general method for simulating any NFA
by a DFA
• Let’s do an example first
42
Simulation example
[Figure: an NFA with states q0, q1, q2 over {0, 1}, and the DFA that simulates it, whose states are q0, “q0 or q1”, and “q0 or q2”]
43
General method
                    NFA                 DFA
states              q0, q1, …, qn       ∅, {q0}, {q1}, {q0,q1}, …, {q0,…,qn}
                                        (one for each subset of states in the NFA)
initial state       q0                  {q0}
transitions         d                   d’({qi1,…,qik}, a) = d(qi1, a) ∪ … ∪ d(qik, a)
accepting states    F ⊆ Q               F’ = {S: S contains some state in F}
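The general method can be written out as a short program. The sketch below builds one DFA state per subset of NFA states, exactly as in the table; the function names `subsets`, `nfa_to_dfa`, and `dfa_accepts` are hypothetical, and the NFA used to exercise it is the four-state “ends in 101” example from the earlier slides.

```python
from itertools import combinations, product


def subsets(states):
    """All subsets of `states`, as frozensets (these are the DFA states)."""
    items = sorted(states)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def nfa_to_dfa(states, alphabet, delta, start, accepting):
    """Subset construction; `delta` maps (state, symbol) to a set of states."""
    dfa_states = subsets(states)
    dfa_delta = {}
    for subset in dfa_states:
        for a in alphabet:
            # d'({qi1,...,qik}, a) = d(qi1, a) ∪ ... ∪ d(qik, a)
            dfa_delta[(subset, a)] = frozenset(
                r for q in subset for r in delta.get((q, a), ())
            )
    dfa_start = frozenset({start})
    # F' = all subsets containing some accepting state of the NFA.
    dfa_accepting = {s for s in dfa_states if s & accepting}
    return dfa_states, dfa_delta, dfa_start, dfa_accepting


def dfa_accepts(dfa_delta, dfa_start, dfa_accepting, word):
    state = dfa_start
    for a in word:
        state = dfa_delta[(state, a)]
    return state in dfa_accepting


NFA_DELTA = {
    ("q0", "0"): {"q0"},
    ("q0", "1"): {"q0", "q1"},
    ("q1", "0"): {"q2"},
    ("q2", "1"): {"q3"},
}
Q, D, Q0, F = nfa_to_dfa({"q0", "q1", "q2", "q3"}, "01", NFA_DELTA, "q0", {"q3"})

print(len(Q))                        # 16
print(dfa_accepts(D, Q0, F, "1101"))  # True
print(dfa_accepts(D, Q0, F, "0110"))  # False
```

This version creates all 2^n subsets even though many are unreachable from the initial state; generating only the reachable subsets is a common refinement.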
44
Proof of correctness
• Lemma
After reading n symbols, the DFA is in state
{qi1,…,qik} if and only if qi1,…,qik are exactly the
states that the NFA could be in
• Proof by induction on n
• At the end, the DFA accepts iff it is in a state that
contains some accepting state of NFA
• By lemma, this is true iff the NFA can reach an
accepting state
45
Exercises
• Construct NFAs for the following languages over the
alphabet {a, b, …, z}:
– All strings that contain eat or sea or easy
– All strings that contain both sea and tea
– All strings that do not contain fool
46