Regular - The University of Sydney

Download Report

Transcript Regular - The University of Sydney

Surinder Kumar Jain,
University of Sydney

Automaton







Expressions



DFA
NFA
Ε-NFA
CFG as a DFA
Equivalence
Minimal DFA
Definition
Conversion from/to Automaton
Regular Langauges



Pumping Lemma – proving regularness
Closures
Equivalence
A
system with many states
 Can transition from one state to another
 Usually caused by external input
 Set of states is finite
 System is in one state at any given time





Mathematical Definition of a DFA
A = (Q, Σ,δ, q0,F)
Q : States, DFA is in one of these finite states at any
time.
Σ : Input symbols, DFA changes its state from one
state to another state on consuming an input symbol.
δ : Transition function.




Given a state and an input symbols, gives the next DFA
state
Function over QxΣ -> Q.
q0 : Initial DFA state
F : Accepting states. Once DFA reaches one of these
states, it may not accept any more input symbols.
Q = { waiting, pending, rejected, approved, paid }
Σ = {receive, reject, accept, pay }
δ : (waiting -> receive -> pending), (pending -> reject -> rejected),
(pending -> accept -> accepted), (accepted -> pay -> paid)
q0 : {waiting}
F : { rejected, paid }
start
receive
Waiting
accept
Pending
pay
Accepted
Paid
Paid
reject
Rejected
Paid
Q = { waiting, pending, rejected, approved, paid }
Σ = {receive, reject, accept, pay }
δ : (waiting -> receive -> pending), (pending -> reject -> rejected),
(pending -> accept -> accepted), (accepted -> pay -> paid)
q0 : {waiting}
F : { rejected, paid }
 Set
of alphabets
 Concatenation (joining)
 Strings
 A subset of strings is a language
A



DFA defines a language
Alphabet set is the set of input symbols
Concatenation - one symbol follows another
Acceptance – sequence of symbols takes DFA from
start state to one of the accepting states
 Five-tuple
like a DFA, (Q, Σ,δ, q0,F)
 Transition function returns a set not one
state
 Several outgoing arcs with same symbol
 In several states at the same time
 Language of NFA
Any NFA language can be described by some DFA
 Adding non-determinism does not give any thing
more
 Why use NFAs then :

Easier to make for some languages
 May have fewer states and less complex


Algorithm to convert NFA to DFA
For n state NFA,DFA may have up to 2n states
 Can throw away inaccessible states
 Observation : DFA has practically the same number of
states as NFA though it often has more transitions











For an NFA, N = {Q, Σ, δ, q0, F},
Construct the DFA, D = {Qd, Σ, δd, {q0}, Fd}
Qd = Powerset of Q
δd(S, a) = Up in S δ(p,a)
for every S in Qd.
Fd = S : S is subset of Q and S has an accepting state
of NFA
DFA operates on one state at a time, NFA operates on sets of states.
Given a state, NFA gives a set of new states
Make all possible sets of DFA states as NFA states
Transit from one set of states to a new set of all possible state set
Any set with an accepting state is the accepting state in NFA
 O(2n)
(number of subsets of a set)
 Efficient algorithm





Do not construct the entire power set
Start with start state
Only construct subsets that can reach an
accepting state from the start state
The number of states in DFA is much less than 2n.
DFA has practically the same number of states as
NFA though it often has more transitions
 Includes
ε (the empty string, not in alphabet
set) as a transition
 ε is identity in concatenation
 a.ε = ε.a = a for all a
 Spontaneous transition without an input
 An
ε-NFA language can be described by some
NFA
 Every NFA can be described by some DFA
 Adding ε transition does not give any thing
more
 Why use ε-NFAs then :


Easier to make for some languages
Useful in proving equivalence of languages
 Conversion
aims to remove ε transitions
 Define a new set of states
ε are contained inside the set
 No ε arc leaves or enters the new set of states

 Epsilon


closure (eclose)
For a state, set of all states reachable
spontaneously
Follow the ε arcs recursively and include reachable
states in the epsilon closure









For an ε-NFA, N = {Q, Σ, δ, q0, F},
Construct the DFA, D = {Qd, Σ, δd, {eclose(q0)}, Fd}
Qd = { eclose(q) | q = eclose(q) and q in Q }
δd(S, a) = Up in S δ(p,eclose(a))
for every S in Qd.
Fd = S : S is subset of Q and S has an accepting state
of NFA
DFA operates on one state at a time, ε-NFA operates on sets of states
with no ε transition leaving the set
Make all eclose sets as DFA states
Transit from one set of states to a new set of all eclose state set
Any set with an accepting state is the accepting state in NFA
 An
imperative program can be represented as
a Control Flow Graph (CFG) with


 It


statements at nodes and
predicates at edges
can be converted into a CFG with
both statements and predicates at edges
by pushing node statements up incoming edges
 Such


a CFG is a DFA
Program points are States
Statements are input symbols that change
program state from program point to point
 Algebraic
expression to denote languages
 Composed of symbols “ε”, “Ø”, “+”, “*”, “.”,
“(“, “)” and alphabets
 The language is generated using rules :






L(ε) = empty set
L(Ø) = empty set
L(a) = a for all alphabets a
L(p+q) = L(p) U L(q)
L(p.q) = { p’.q’ | p’ in L(p) & q’ in L(q) }
L(p*) = { qn | q in L(p) and n >= 0 }, q0= ε, qk=q.qk-1
a+b.c
The language generated is :
{ a, b.c }
a.b.c*.d
the language generated is :
{ a.b.d, a.b.c.d, a.b.c.c.d, a.b.c.c.c.d, … }
A finite way to express an infinite language
DEFINITION
 Two
regular expression (or automaton)
 are EQUAL
 if they both generate same languages
Thus
(a.b)* + (b.a)* + a.(b.a)* + b.(b.a)*
= (ε + b).(a.b)*.(ε+a)














p+q=q+p
(p + q) + r = p + (q + r)
(p.q).r = p.(q.r)
Ø+p=p+Ø=p
ε.p = p.ε = p
Ø.p = p.Ø = Ø
p.(q=r) = p.q + p.r
(p + q).r = p.r + q.r
p+p=p
(p*)* = p*
Ø* = ε
ε* = ε
p.p* = p*.p
(p + q)* = (p*.q*)*
 Every


language
defined by a finite automaton is also defined by
some regular expression
defined by a regular expression is also defined by
some DFA
 Hopcroft’s






formula
Rij(k) = Rij(k-1)+Rik(k-1).(Rkk(k-1))*.Rkj(k-1)
Rij(n) is the regular expression of all paths from i
to j. (n is the number of states)
States are sorted in some order and numbered 1
to n
Rij(k) is regular expression of all paths from i to j
passing thru nodes whose sort order is less than k
Computed for all i,j for k=0, then k=1,…,k=n
Rs,f1(n)+…+Rs,fk(n) is the regular expression of the
DFA

s is the start state, f1,…,fk are accepting states, n is
the number of states.
 Hopcroft


n3 to compute the table and
4n as size of regular expression grows by 4 every
time.
 In



A
formula is O(n34n),
practice it is close to O(n3)
By simplifying the regular expression at every
step and
using judicious algorithm avoiding recomputation
of Rkk(k)
Most DFAs have almost n and not 2n accessible
states
faster state elimination method close to
O(n2) is also available
 Regular
expression is converted to ε-NFA
 ε-NFA can the be converted to NFA and to DFA
 RE to ε-NFA conversion rules :






ε
Ø
a
+
->
->
->
->
One edge (two state) DFA with ε transition
Two state DFA with no edges
Two state with “a” transition
A new start/accept statejoining two
arguments of + in parallel
. -> Accept of first is start of second
* -> An ε edge joining star/accept of argument and
a new start/accept state
 Convert
resulting ε-NFA to a DFA
 Augment
regular expression r to (r).#
 Position number for each occurrence of
alphabet
 Compute for each node of syntax tree



nullable (ε in the language)
firstpos (set of possible first alphabets)
lastpos (set of possible last alphabets)
 Compute

for each position
followpos (set of possible next alphabet after
this position)
 Construct
the DFA
 Unix
text search, search matching patterns
(grep)
 Lexical/Parser analysis





Parse text against a regular expression
find set of first tokens at this expression root
find set of last tkens at this expression root
can the expression at this root be null set
find set of next tokens after an alphabet position
in a regular expression
 Efficient
search of patterns in very large
repository (web text search)
DEFINITION
A
language (a set of strings)
 is defined to be a regular language if
 it can be defined by a finite automaton




by a DFA or
by an NFA or
by an ε-NFA or
by a regular expression
 Four
different ways to describe a regular language
 If



L is a regular language then there exists
integer n such that
for every string w in L
we can break w into x, y, z such that w=x.y.z




yε
|x.y| =< n
x.yk.z is in L (for all k >= 0)
Proof based on



For a DFA of length n
any string of length > n
must revisit a state
 Used to
prove that a language is not regular
Language is a set of string over finite alphabets
 Language operators :

Union of two languages L(A B) = L(A) L(B) - re
Intersection
Concatenation L(A.B) = { a.b | a in A, b in B}
 Kleene Closure L(A*) = { an | a in A, n >= 0 }









a0 = ε for all a and an = an-1
Compliment L(A’) = { a | a not in A } (with respect to some
overall alphabet set) - dfa
Difference L(A-B) = L(A) – L(B) - dfa switch q0 F
Reversal L (A) = { ak.ak-1…a1 | a1…ak-1.ak in A }
Homomorphism – replace an alphabet with another regular
expression
Inverse homomorphism
 Is
the language described empty?
 Is a particualr string in the described
language?
 Do two different of languages actually
describe the same language?
Decision properties may require conversion
between various forms.
 Can the conversion be done in reasonable time?

Conversion
Complexity
Computing ε closures
O(n3) Warshall’s O(n)
Subset construction
O(2n)
NFA to DFA
O(n32n) (In practice O(n3s)
DFA to NFA conversion
O(n)
NFA/DFA to Regular
Expression
O(n34n) (worst case)
(Actual is much less)
Regular Expression to εNFA
O(n)
Regular Expression to NFA
O(n3)
Regular Expression to DFA
O(n34n^32^n)
 Equivalence
of two states
 States p and q in an automaton are Defined
to be equivalent if




For all input strings applied at state p or q
p ends up in an accepting state
if and only if
q also ends up in an accepting state
 The
accepting state reached by p does not
have to be same accepting state as that
reached by q
 If
two states p and q are equivalent
 we can combine them together into a single
state
 it wont affect the language accepted by the
DFA
 This process of combining states together is
called Minimization
 Table-filling algorithm can find if two states
are equivalent or not. Complexity O(n2)
 Non-equivalent pairs are distinguishable

Minimum DFA is unique






Eliminate all states not reachable from start
Determine which states are equivalent
Partition states into blocks of equivalent states
Equivalence is transitive
Thus no state is in two blocks
Equivalence of two Regular Languages
Convert them into their minimum DFAs
 and check for isomorphism


Union method
Make a minimum DFA of the union of the two
 Start state of the two original DFAs must be
equivalent if and only if DFAs are equivalent
