Regular - The University of Sydney
Download
Report
Transcript Regular - The University of Sydney
Surinder Kumar Jain,
University of Sydney
Automaton
Expressions
DFA
NFA
Ε-NFA
CFG as a DFA
Equivalence
Minimal DFA
Definition
Conversion from/to Automaton
Regular Langauges
Pumping Lemma – proving regularness
Closures
Equivalence
A
system with many states
Can transition from one state to another
Usually caused by external input
Set of states is finite
System is in one state at any given time
Mathematical Definition of a DFA
A = (Q, Σ,δ, q0,F)
Q : States, DFA is in one of these finite states at any
time.
Σ : Input symbols, DFA changes its state from one
state to another state on consuming an input symbol.
δ : Transition function.
Given a state and an input symbols, gives the next DFA
state
Function over QxΣ -> Q.
q0 : Initial DFA state
F : Accepting states. Once DFA reaches one of these
states, it may not accept any more input symbols.
Q = { waiting, pending, rejected, approved, paid }
Σ = {receive, reject, accept, pay }
δ : (waiting -> receive -> pending), (pending -> reject -> rejected),
(pending -> accept -> accepted), (accepted -> pay -> paid)
q0 : {waiting}
F : { rejected, paid }
start
receive
Waiting
accept
Pending
pay
Accepted
Paid
Paid
reject
Rejected
Paid
Q = { waiting, pending, rejected, approved, paid }
Σ = {receive, reject, accept, pay }
δ : (waiting -> receive -> pending), (pending -> reject -> rejected),
(pending -> accept -> accepted), (accepted -> pay -> paid)
q0 : {waiting}
F : { rejected, paid }
Set
of alphabets
Concatenation (joining)
Strings
A subset of strings is a language
A
DFA defines a language
Alphabet set is the set of input symbols
Concatenation - one symbol follows another
Acceptance – sequence of symbols takes DFA from
start state to one of the accepting states
Five-tuple
like a DFA, (Q, Σ,δ, q0,F)
Transition function returns a set not one
state
Several outgoing arcs with same symbol
In several states at the same time
Language of NFA
Any NFA language can be described by some DFA
Adding non-determinism does not give any thing
more
Why use NFAs then :
Easier to make for some languages
May have fewer states and less complex
Algorithm to convert NFA to DFA
For n state NFA,DFA may have up to 2n states
Can throw away inaccessible states
Observation : DFA has practically the same number of
states as NFA though it often has more transitions
For an NFA, N = {Q, Σ, δ, q0, F},
Construct the DFA, D = {Qd, Σ, δd, {q0}, Fd}
Qd = Powerset of Q
δd(S, a) = Up in S δ(p,a)
for every S in Qd.
Fd = S : S is subset of Q and S has an accepting state
of NFA
DFA operates on one state at a time, NFA operates on sets of states.
Given a state, NFA gives a set of new states
Make all possible sets of DFA states as NFA states
Transit from one set of states to a new set of all possible state set
Any set with an accepting state is the accepting state in NFA
O(2n)
(number of subsets of a set)
Efficient algorithm
Do not construct the entire power set
Start with start state
Only construct subsets that can reach an
accepting state from the start state
The number of states in DFA is much less than 2n.
DFA has practically the same number of states as
NFA though it often has more transitions
Includes
ε (the empty string, not in alphabet
set) as a transition
ε is identity in concatenation
a.ε = ε.a = a for all a
Spontaneous transition without an input
An
ε-NFA language can be described by some
NFA
Every NFA can be described by some DFA
Adding ε transition does not give any thing
more
Why use ε-NFAs then :
Easier to make for some languages
Useful in proving equivalence of languages
Conversion
aims to remove ε transitions
Define a new set of states
ε are contained inside the set
No ε arc leaves or enters the new set of states
Epsilon
closure (eclose)
For a state, set of all states reachable
spontaneously
Follow the ε arcs recursively and include reachable
states in the epsilon closure
For an ε-NFA, N = {Q, Σ, δ, q0, F},
Construct the DFA, D = {Qd, Σ, δd, {eclose(q0)}, Fd}
Qd = { eclose(q) | q = eclose(q) and q in Q }
δd(S, a) = Up in S δ(p,eclose(a))
for every S in Qd.
Fd = S : S is subset of Q and S has an accepting state
of NFA
DFA operates on one state at a time, ε-NFA operates on sets of states
with no ε transition leaving the set
Make all eclose sets as DFA states
Transit from one set of states to a new set of all eclose state set
Any set with an accepting state is the accepting state in NFA
An
imperative program can be represented as
a Control Flow Graph (CFG) with
It
statements at nodes and
predicates at edges
can be converted into a CFG with
both statements and predicates at edges
by pushing node statements up incoming edges
Such
a CFG is a DFA
Program points are States
Statements are input symbols that change
program state from program point to point
Algebraic
expression to denote languages
Composed of symbols “ε”, “Ø”, “+”, “*”, “.”,
“(“, “)” and alphabets
The language is generated using rules :
L(ε) = empty set
L(Ø) = empty set
L(a) = a for all alphabets a
L(p+q) = L(p) U L(q)
L(p.q) = { p’.q’ | p’ in L(p) & q’ in L(q) }
L(p*) = { qn | q in L(p) and n >= 0 }, q0= ε, qk=q.qk-1
a+b.c
The language generated is :
{ a, b.c }
a.b.c*.d
the language generated is :
{ a.b.d, a.b.c.d, a.b.c.c.d, a.b.c.c.c.d, … }
A finite way to express an infinite language
DEFINITION
Two
regular expression (or automaton)
are EQUAL
if they both generate same languages
Thus
(a.b)* + (b.a)* + a.(b.a)* + b.(b.a)*
= (ε + b).(a.b)*.(ε+a)
p+q=q+p
(p + q) + r = p + (q + r)
(p.q).r = p.(q.r)
Ø+p=p+Ø=p
ε.p = p.ε = p
Ø.p = p.Ø = Ø
p.(q=r) = p.q + p.r
(p + q).r = p.r + q.r
p+p=p
(p*)* = p*
Ø* = ε
ε* = ε
p.p* = p*.p
(p + q)* = (p*.q*)*
Every
language
defined by a finite automaton is also defined by
some regular expression
defined by a regular expression is also defined by
some DFA
Hopcroft’s
formula
Rij(k) = Rij(k-1)+Rik(k-1).(Rkk(k-1))*.Rkj(k-1)
Rij(n) is the regular expression of all paths from i
to j. (n is the number of states)
States are sorted in some order and numbered 1
to n
Rij(k) is regular expression of all paths from i to j
passing thru nodes whose sort order is less than k
Computed for all i,j for k=0, then k=1,…,k=n
Rs,f1(n)+…+Rs,fk(n) is the regular expression of the
DFA
s is the start state, f1,…,fk are accepting states, n is
the number of states.
Hopcroft
n3 to compute the table and
4n as size of regular expression grows by 4 every
time.
In
A
formula is O(n34n),
practice it is close to O(n3)
By simplifying the regular expression at every
step and
using judicious algorithm avoiding recomputation
of Rkk(k)
Most DFAs have almost n and not 2n accessible
states
faster state elimination method close to
O(n2) is also available
Regular
expression is converted to ε-NFA
ε-NFA can the be converted to NFA and to DFA
RE to ε-NFA conversion rules :
ε
Ø
a
+
->
->
->
->
One edge (two state) DFA with ε transition
Two state DFA with no edges
Two state with “a” transition
A new start/accept statejoining two
arguments of + in parallel
. -> Accept of first is start of second
* -> An ε edge joining star/accept of argument and
a new start/accept state
Convert
resulting ε-NFA to a DFA
Augment
regular expression r to (r).#
Position number for each occurrence of
alphabet
Compute for each node of syntax tree
nullable (ε in the language)
firstpos (set of possible first alphabets)
lastpos (set of possible last alphabets)
Compute
for each position
followpos (set of possible next alphabet after
this position)
Construct
the DFA
Unix
text search, search matching patterns
(grep)
Lexical/Parser analysis
Parse text against a regular expression
find set of first tokens at this expression root
find set of last tkens at this expression root
can the expression at this root be null set
find set of next tokens after an alphabet position
in a regular expression
Efficient
search of patterns in very large
repository (web text search)
DEFINITION
A
language (a set of strings)
is defined to be a regular language if
it can be defined by a finite automaton
by a DFA or
by an NFA or
by an ε-NFA or
by a regular expression
Four
different ways to describe a regular language
If
L is a regular language then there exists
integer n such that
for every string w in L
we can break w into x, y, z such that w=x.y.z
yε
|x.y| =< n
x.yk.z is in L (for all k >= 0)
Proof based on
For a DFA of length n
any string of length > n
must revisit a state
Used to
prove that a language is not regular
Language is a set of string over finite alphabets
Language operators :
Union of two languages L(A B) = L(A) L(B) - re
Intersection
Concatenation L(A.B) = { a.b | a in A, b in B}
Kleene Closure L(A*) = { an | a in A, n >= 0 }
a0 = ε for all a and an = an-1
Compliment L(A’) = { a | a not in A } (with respect to some
overall alphabet set) - dfa
Difference L(A-B) = L(A) – L(B) - dfa switch q0 F
Reversal L (A) = { ak.ak-1…a1 | a1…ak-1.ak in A }
Homomorphism – replace an alphabet with another regular
expression
Inverse homomorphism
Is
the language described empty?
Is a particualr string in the described
language?
Do two different of languages actually
describe the same language?
Decision properties may require conversion
between various forms.
Can the conversion be done in reasonable time?
Conversion
Complexity
Computing ε closures
O(n3) Warshall’s O(n)
Subset construction
O(2n)
NFA to DFA
O(n32n) (In practice O(n3s)
DFA to NFA conversion
O(n)
NFA/DFA to Regular
Expression
O(n34n) (worst case)
(Actual is much less)
Regular Expression to εNFA
O(n)
Regular Expression to NFA
O(n3)
Regular Expression to DFA
O(n34n^32^n)
Equivalence
of two states
States p and q in an automaton are Defined
to be equivalent if
For all input strings applied at state p or q
p ends up in an accepting state
if and only if
q also ends up in an accepting state
The
accepting state reached by p does not
have to be same accepting state as that
reached by q
If
two states p and q are equivalent
we can combine them together into a single
state
it wont affect the language accepted by the
DFA
This process of combining states together is
called Minimization
Table-filling algorithm can find if two states
are equivalent or not. Complexity O(n2)
Non-equivalent pairs are distinguishable
Minimum DFA is unique
Eliminate all states not reachable from start
Determine which states are equivalent
Partition states into blocks of equivalent states
Equivalence is transitive
Thus no state is in two blocks
Equivalence of two Regular Languages
Convert them into their minimum DFAs
and check for isomorphism
Union method
Make a minimum DFA of the union of the two
Start state of the two original DFAs must be
equivalent if and only if DFAs are equivalent