Regular Expressions and Non-regular Languages
http://cis.k.hosei.ac.jp/~yukita/
Expressions and their values
Arithmetic expression: $(5 + 3) \times 4$
Value: $32$
Regular expression: $(0 \cup 1) \circ 0^*$, or $(0 \cup 1)0^*$
Value: The language that consists of strings beginning with 0 or 1 followed by an arbitrary number of 0s.
Subexpressions $0$, $1$, $0 \cup 1$, and $0^*$ have values $\{0\}$, $\{1\}$, $\{0,1\}$, and $\{0\}^*$, respectively. The value of the whole expression is $\{0,1\} \circ \{0\}^*$.
Definition 1.26
$R$ is a regular expression if
1. $R = a$ for some $a \in \Sigma$,
2. $R = \varepsilon$,
3. $R = \emptyset$,
4. $R = (R_1 \cup R_2)$, where $R_1$ and $R_2$ are regular expressions,
5. $R = (R_1 \circ R_2)$, where $R_1$ and $R_2$ are regular expressions, or
6. $R = (R_1^*)$, where $R_1$ is a regular expression.
The values of atomic expressions
Expression $a$: value $\{a\}$. Meaning: the language that consists of only one string $a \in \Sigma^*$.
Expression $\varepsilon$: value $\{\varepsilon\}$. Meaning: the language that consists of only the empty string $\varepsilon \in \Sigma^*$.
Expression $\emptyset$: value $\{\}$. Meaning: the empty language.
In many real programming languages, we must distinguish the alphabet symbol $a \in \Sigma$ from the string $a \in \Sigma^*$; the former should be written as 'a' while the latter as "a". But we will not follow this convention.
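Example 1.27
Let $\Sigma = \{0,1\}$ throughout the following examples.
1. $0^*10^* = \{w \mid w \text{ has exactly a single } 1\}$.
2. $\Sigma^*1\Sigma^* = \{w \mid w \text{ has at least one } 1\}$.
3. $\Sigma^*001\Sigma^* = \{w \mid w \text{ contains the string } 001 \text{ as a substring}\}$.
4. $(\Sigma\Sigma)^* = \{w \mid w \text{ is a string of even length}\}$.
7. $0\Sigma^*0 \cup 1\Sigma^*1 \cup 0 \cup 1 = \{w \mid w \text{ starts and ends with the same symbol}\}$.
8. $(0 \cup \varepsilon)1^* = 01^* \cup 1^*$.
10. $1^*\emptyset = \emptyset$.
11. $\emptyset^* = \{\varepsilon\}$.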
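The identities in Example 1.27 can be spot-checked by brute force. The sketch below is an illustration only and is not part of the original slides: it assumes Python's `re` module as a stand-in for the regular expressions, writes $\Sigma$ as the character class `[01]`, and compares each expression with its set description on all strings up to a hypothetical bound `MAX_LEN`. Items 10 and 11 are left out because `re` has no direct notation for $\emptyset$.

```python
import re
from itertools import product

SIGMA = "[01]"   # stand-in for the alphabet {0, 1}
MAX_LEN = 8      # hypothetical bound for the brute-force check

def strings(max_len):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product("01", repeat=n):
            yield "".join(letters)

# Items 1-4 and 7: each regular expression paired with its set description.
EXAMPLES = [
    ("0*10*", lambda w: w.count("1") == 1),
    (f"{SIGMA}*1{SIGMA}*", lambda w: "1" in w),
    (f"{SIGMA}*001{SIGMA}*", lambda w: "001" in w),
    (f"(?:{SIGMA}{SIGMA})*", lambda w: len(w) % 2 == 0),
    (f"0{SIGMA}*0|1{SIGMA}*1|0|1", lambda w: w != "" and w[0] == w[-1]),
]

def agrees(pattern, predicate):
    """True if pattern and predicate accept the same strings up to MAX_LEN."""
    return all((re.fullmatch(pattern, w) is not None) == predicate(w)
               for w in strings(MAX_LEN))

def same_language(p1, p2):
    """True if two patterns match the same strings up to MAX_LEN."""
    return all((re.fullmatch(p1, w) is None) == (re.fullmatch(p2, w) is None)
               for w in strings(MAX_LEN))

all_ok = all(agrees(p, pred) for p, pred in EXAMPLES)
item8_ok = same_language("(?:0|)1*", "01*|1*")   # item 8
```

A bounded check like this cannot prove an identity, but a single disagreement would expose a wrong one.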
Units for the binary operations
$R \cup \emptyset = R$ shows that $\emptyset$ is the unit for the union operation.
$R \circ \varepsilon = R$ shows that $\varepsilon$ is the unit for the concatenation operation.
We denote by $L(R)$ the language of $R$.
$R \cup \varepsilon$ may not equal $R$.
For example, if $R = 0$, then $L(R) = \{0\}$ but $L(R \cup \varepsilon) = \{0, \varepsilon\}$.
In general, $R \circ \emptyset \neq R$.
For example, if $R = 0$, then $L(R) = \{0\}$ but $L(R \circ \emptyset) = \emptyset$.
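For finite languages these four facts can be computed directly. The following minimal sketch assumes languages are represented as Python sets of strings; the helper `concat` and the variable names are illustrative and do not come from the slides.

```python
def concat(A, B):
    """Concatenation of two finite languages given as sets of strings."""
    return {x + y for x in A for y in B}

R = {"0"}        # L(R) for R = 0
EMPTY = set()    # the empty language
EPS = {""}       # the language containing only the empty string

union_unit = (R | EMPTY == R)          # R ∪ ∅ = R
concat_unit = (concat(R, EPS) == R)    # R ∘ ε = R
union_eps = R | EPS                    # L(R ∪ ε), which differs from L(R)
concat_empty = concat(R, EMPTY)        # L(R ∘ ∅), which is empty
```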
Theorem 1.28
• A language is regular if and only if some regular expression describes it.
We break down this theorem as follows.
• Lemma 1.29
– If a language is described by a regular expression, then it is regular.
• Lemma 1.32
– If a language is regular, then it is described by a regular expression.
Proof of Lemma 1.29
We shall convert $R$ into an NFA $N$.
Case 1: $R = a$ for some $a \in \Sigma$.
Case 2: $R = \varepsilon$.
Case 3: $R = \emptyset$.
Case 4: $R = R_1 \cup R_2$.
Case 5: $R = R_1 \circ R_2$.
Case 6: $R = R_1^*$.
Cases 4 to 6 are treated on the next three slides.
Case 4: Let $N_1$, $N_2$, and $N$ correspond to $R_1$, $R_2$, and $R$, respectively.
[Figure: $N$, built from $N_1$ and $N_2$ with two $\varepsilon$-arrows.]
Case 5: Let $N_1$, $N_2$, and $N$ correspond to $R_1$, $R_2$, and $R$, respectively.
[Figure: $N$, built from $N_1$ followed by $N_2$ with $\varepsilon$-arrows.]
Case 6: Let $N_1$ and $N$ correspond to $R_1$ and $R$, respectively.
[Figure: $N$, built from $N_1$ with $\varepsilon$-arrows.]
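The six cases can be sketched as a recursive construction. The code below is a minimal illustration under several assumptions that are not in the slides: regular expressions are nested tuples such as `("union", r1, r2)`, an NFA is a triple `(start, accepts, delta)`, and the label `None` marks an $\varepsilon$-arrow. The wiring for cases 4 to 6 follows the usual textbook construction (new start state for union and star, $\varepsilon$-arrows from accept states of $N_1$).

```python
from itertools import count

_ids = count()  # fresh state names 0, 1, 2, ...

def build(r):
    """Return (start, accepts, delta); delta maps (state, label) -> set of states."""
    kind = r[0]
    if kind == "sym":                      # Case 1: R = a
        s, t = next(_ids), next(_ids)
        return s, {t}, {(s, r[1]): {t}}
    if kind == "eps":                      # Case 2: R = ε
        s = next(_ids)
        return s, {s}, {}
    if kind == "empty":                    # Case 3: R = ∅
        return next(_ids), set(), {}
    if kind == "union":                    # Case 4: new start with ε-arrows to N1, N2
        s1, a1, d1 = build(r[1])
        s2, a2, d2 = build(r[2])
        s = next(_ids)
        return s, a1 | a2, {**d1, **d2, (s, None): {s1, s2}}
    if kind == "concat":                   # Case 5: accept states of N1 ε-jump into N2
        s1, a1, d1 = build(r[1])
        s2, a2, d2 = build(r[2])
        d = {**d1, **d2}
        for q in a1:
            d.setdefault((q, None), set()).add(s2)
        return s1, a2, d
    if kind == "star":                     # Case 6: new accepting start, loop back
        s1, a1, d1 = build(r[1])
        s = next(_ids)
        d = dict(d1)
        d[(s, None)] = {s1}
        for q in a1:
            d.setdefault((q, None), set()).add(s1)
        return s, a1 | {s}, d
    raise ValueError(f"unknown node {kind!r}")

def eclose(states, delta):
    """All states reachable from `states` through ε-arrows."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for t in delta.get((q, None), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(nfa, w):
    start, acc, delta = nfa
    cur = eclose({start}, delta)
    for c in w:
        cur = eclose({t for q in cur for t in delta.get((q, c), ())}, delta)
    return bool(cur & acc)

# (0 ∪ 1) ∘ 0*, the expression from the first slide
EXAMPLE = ("concat", ("union", ("sym", "0"), ("sym", "1")), ("star", ("sym", "0")))
NFA = build(EXAMPLE)
```

With this representation, `accepts(NFA, "100")` follows the $\varepsilon$-arrows from the union part into the starred part before reading the trailing 0s.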
Generalized Nondeterministic Finite Automaton
• A GNFA is roughly an NFA in which the transition arrows may have regular expressions as labels.
• We assume the following standard form for convenience, which can always be attained with an easy modification.
– There is only one accept state, and it is different from the start state.
– The start state has transition arrows going to every other state but no arrows coming in from any other state.
– The accept state has arrows coming in from every other state but no arrows going to any other state.
– Except for the start and accept states, one arrow goes from every state to every other state and also from each state to itself.
Standard Form of GNFA
...
Q  {qstart }  Q  {qaccept} is a disjont union.
For each pair (qi , q j ) in ({qstart }  Q)  (Q  Q)  Q  {qaccept}
thereis one and only one directedarrow going from qi to q j .
14
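Equivalent GNFA with one fewer state
[Figure: before, arrows $q_i \to q_{\mathrm{rip}}$ labeled $R_1$, a self-loop on $q_{\mathrm{rip}}$ labeled $R_2$, $q_{\mathrm{rip}} \to q_j$ labeled $R_3$, and $q_i \to q_j$ labeled $R_4$; after removing $q_{\mathrm{rip}}$, a single arrow $q_i \to q_j$ labeled $(R_1)(R_2)^*(R_3) \cup (R_4)$.]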
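Definition 1.33
A generalized nondeterministic finite automaton is a 5-tuple $(Q, \Sigma, \delta, q_{\mathrm{start}}, q_{\mathrm{accept}})$, where $\mathcal{R}$ is the set of regular expressions,
1. $Q$ is the finite set of states,
2. $\Sigma$ is the input alphabet,
3. $\delta : (Q - \{q_{\mathrm{accept}}\}) \times (Q - \{q_{\mathrm{start}}\}) \to \mathcal{R}$ is the transition function,
4. $q_{\mathrm{start}}$ is the start state, and
5. $q_{\mathrm{accept}}$ is the accept state.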
Computation with GNFA
A GNFA accepts a string $w \in \Sigma^*$ if $w = w_1 w_2 \cdots w_k$ with each $w_i \in \Sigma^*$, and a sequence of states $q_0, q_1, \ldots, q_k$ exists such that
1. $q_0 = q_{\mathrm{start}}$,
2. $q_k = q_{\mathrm{accept}}$,
3. for each $i$, we have $w_i \in L(R_i)$, where $R_i = \delta(q_{i-1}, q_i)$.
Converting GNFA
Convert($G$):
1. Let $k$ be the number of states of $G$.
2. If $k = 2$, then return the regular expression appearing on the only arrow.
3. If $k > 2$, we select any state $q_{\mathrm{rip}} \in Q - \{q_{\mathrm{start}}, q_{\mathrm{accept}}\}$ and let $G'$ be the GNFA $(Q', \Sigma, \delta', q_{\mathrm{start}}, q_{\mathrm{accept}})$, where $Q' = Q - \{q_{\mathrm{rip}}\}$, and for any $q_i \in Q' - \{q_{\mathrm{accept}}\}$ and $q_j \in Q' - \{q_{\mathrm{start}}\}$ let
$\delta'(q_i, q_j) = (R_1)(R_2)^*(R_3) \cup (R_4)$,
for $R_1 = \delta(q_i, q_{\mathrm{rip}})$, $R_2 = \delta(q_{\mathrm{rip}}, q_{\mathrm{rip}})$, $R_3 = \delta(q_{\mathrm{rip}}, q_j)$, and $R_4 = \delta(q_i, q_j)$.
4. Compute Convert($G'$) and return this value.
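The procedure Convert can be sketched in a few lines. This is an illustrative version under stated assumptions, not the slides' own code: arrow labels are Python `re` pattern strings, `None` stands for $\emptyset$, `""` stands for $\varepsilon$, missing dictionary entries count as $\emptyset$, and atomic labels are single symbols so that plain string concatenation is safe. The example GNFA (states `qs`, `a`, `b`, `qa`) is hypothetical and wraps a two-state DFA for "even number of 1s".

```python
import re
from itertools import product

def union(r, s):
    """R ∪ S, with None playing the role of the empty set."""
    if r is None:
        return s
    if s is None:
        return r
    return f"(?:{r}|{s})"

def concat(r, s):
    """R ∘ S; concatenating with the empty set gives the empty set."""
    if r is None or s is None:
        return None
    return r + s

def star(r):
    """R*; both ∅* and ε* equal ε."""
    return "" if r in (None, "") else f"(?:{r})*"

def convert(states, delta, start, accept):
    """Recursive state elimination following steps 1-4 of Convert(G)."""
    if len(states) == 2:
        return delta.get((start, accept))
    rip = next(q for q in states if q not in (start, accept))
    rest = [q for q in states if q != rip]
    r2 = delta.get((rip, rip))
    new_delta = {}
    for qi in rest:
        for qj in rest:
            if qi == accept or qj == start:
                continue  # no arrows leave accept or enter start
            r1 = delta.get((qi, rip))
            r3 = delta.get((rip, qj))
            r4 = delta.get((qi, qj))
            new_delta[(qi, qj)] = union(concat(concat(r1, star(r2)), r3), r4)
    return convert(rest, new_delta, start, accept)

# Hypothetical GNFA: state a = even number of 1s so far, b = odd.
STATES = ["qs", "a", "b", "qa"]
DELTA = {
    ("qs", "a"): "", ("a", "qa"): "",
    ("a", "a"): "0", ("a", "b"): "1",
    ("b", "b"): "0", ("b", "a"): "1",
}
REGEX = convert(STATES, DELTA, "qs", "qa")

def matches_even_ones(max_len=8):
    """Compare REGEX with the intended language on all short strings."""
    words = ("".join(t) for n in range(max_len + 1) for t in product("01", repeat=n))
    return all((re.fullmatch(REGEX, w) is not None) == (w.count("1") % 2 == 0)
               for w in words)
```

The resulting pattern is verbose because no simplification is attempted; the order in which states are ripped changes its shape but not its language.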
Claim 1.34 For any GNFA $G$, Convert($G$) is equivalent to $G$.
Proof. Basis ($k = 2$): Obvious.
Induction step: Assume that the claim is true for $k - 1$ states. Suppose that $G$ accepts an input $w$. Then, in an accepting branch of the computation, $G$ enters a sequence of states
$q_{\mathrm{start}}, q_1, q_2, q_3, \ldots, q_{\mathrm{accept}}$.
If $q_{\mathrm{rip}} \notin \{q_{\mathrm{start}}, q_1, q_2, q_3, \ldots, q_{\mathrm{accept}}\}$, clearly $G'$ also accepts $w$.
If $q_{\mathrm{rip}} \in \{q_{\mathrm{start}}, q_1, q_2, q_3, \ldots, q_{\mathrm{accept}}\}$, removing each run of consecutive $q_{\mathrm{rip}}$ states forms an accepting computation for $G'$. The states $q_i$ and $q_j$ bracketing a run have a new regular expression on the arrow between them that describes all strings taking $q_i$ to $q_j$ via $q_{\mathrm{rip}}$ on $G$. So $G'$ accepts $w$.
Proof continued
For the other direction, suppose that $G'$ accepts an input $w$. As each arrow between any two states $q_i$ and $q_j$ in $G'$ describes the collection of strings taking $q_i$ to $q_j$ in $G$, either directly or via $q_{\mathrm{rip}}$, $G$ must also accept $w$. Thus $G$ and $G'$ are equivalent.
The induction hypothesis states that when the algorithm calls itself recursively on input $G'$, the result is a regular expression that is equivalent to $G'$ because $G'$ has $k - 1$ states. Hence the regular expression also is equivalent to $G$, and the algorithm is proved correct.
Non-regularity
$B$, $C$, and $D$ seem to require machines with an infinite number of states to recognize them.
$B = \{0^n 1^n \mid n \ge 0\}$
$C = \{w \mid w \text{ has an equal number of 0s and 1s}\}$
$D = \{w \mid w \text{ has an equal number of occurrences of 01 and 10 as substrings}\}$
$B$ and $C$ will turn out to be nonregular, while $D$ is regular. See Problem 1.41.
Theorem 1.37 Pumping Lemma
If $A$ is a regular language, then there is a number $p$ (the pumping length) where, if $s \in A$ with $|s| \ge p$, then $s$ can be divided into three pieces, $s = xyz$, satisfying the following conditions:
1. for each $i \ge 0$, $xy^i z \in A$,
2. $|y| > 0$, and
3. $|xy| \le p$.
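Proof of Th 1.37
$s = s_1 s_2 \cdots s_k \cdots s_l \cdots s_n = (s_1 s_2 \cdots)(s_k \cdots s_l)(\cdots s_n) = xyz$
While reading $s$, the DFA passes through the states $q_1, q_2, \ldots, q_k, \ldots, q_k, \ldots, q_a$, where $q_a$ is an accepting state and the state $q_k$ is visited twice.
By the pigeonhole principle, we can take the pumping length as the number of states plus one (the number of states alone already suffices).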
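The pigeonhole argument is constructive: the split $s = xyz$ is found at the first state that the DFA visits twice. The sketch below illustrates this under assumptions of its own; the dictionary encoding of the DFA, the function names, and the example machine `EVEN_ONES` (accepting strings with an even number of 1s) are hypothetical and not taken from the slides.

```python
def run(delta, start, accepts, w):
    """Run a DFA given as a dict (state, symbol) -> state."""
    q = start
    for c in w:
        q = delta[(q, c)]
    return q in accepts

def pump_split(delta, start, s):
    """Split s = x y z at the first repeated state of the run on s.

    Returns None when no state repeats, which can only happen if s is
    shorter than the number of states.
    """
    seen = {start: 0}          # state -> number of symbols read when first seen
    q = start
    for i, c in enumerate(s, 1):
        q = delta[(q, c)]
        if q in seen:
            j = seen[q]
            return s[:j], s[j:i], s[i:]
        seen[q] = i
    return None

EVEN_ONES = {("e", "0"): "e", ("e", "1"): "o",
             ("o", "0"): "o", ("o", "1"): "e"}
x, y, z = pump_split(EVEN_ONES, "e", "1001")
pumped_ok = all(run(EVEN_ONES, "e", {"e"}, x + y * i + z) for i in range(6))
```

Because the repeated state is found within the first two symbols of a two-state machine, the split also satisfies $|y| > 0$ and $|xy| \le 2$.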
Example 1.38
Claim: $B = \{0^n 1^n \mid n \ge 0\}$ is not regular.
Proof. Assume the contrary that $B$ is regular.
Let $p$ be the pumping length. Then, $s = 0^p 1^p$ can be decomposed as $s = xyz$ with $|y| \ge 1$, and $xy^n z \in B$ for any $n \ge 0$.
There can be three cases $y = 0^k$, $y = 0^k 1^l$, and $y = 1^l$, for some nonzero $k, l$. In each case, we can easily see that $xy^n z \notin B$ (take $n = 2$), which leads to a contradiction.
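The three-case analysis can be confirmed exhaustively for small $p$. The following sketch is an illustration with hypothetical helper names: it tries every split $s = xyz$ with $|y| \ge 1$ of $s = 0^p 1^p$ and checks whether pumping once more ($n = 2$) ever stays inside $B$.

```python
def in_B(w):
    """Membership in B = {0^n 1^n | n >= 0}."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def splits(s):
    """Yield every decomposition s = x y z with |y| >= 1."""
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            yield s[:i], s[i:j], s[j:]

def pumpable(s, n=2):
    """True if some split keeps x y^n z inside B."""
    return any(in_B(x + y * n + z) for x, y, z in splits(s))

# For each small p, record whether 0^p 1^p survives pumping with n = 2.
RESULTS = {p: pumpable("0" * p + "1" * p) for p in range(1, 8)}
```

Every entry of `RESULTS` is `False`, matching the claim that no choice of $y$ works.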
Example 1.39
Claim: $C = \{w \mid w \text{ has an equal number of 0s and 1s}\}$ is not regular.
Proof. Assume the contrary that $C$ is regular.
Let $p$ be the pumping length. Then, $s = 0^p 1^p$ can be decomposed as $s = xyz$ with $|y| \ge 1$ and $|xy| \le p$, and $xy^n z \in C$ for any $n \ge 0$.
Then we must have $y = 0^k$ for some nonzero $k$.
We can easily see that $xy^n z \notin C$ for any $n \ne 1$, which contradicts the assumption.
Alternative proof of 1.39
The class of regular languages is closed under the intersection operation. This is easy to prove if we run two DFAs in parallel and accept only strings which are accepted by both of the DFAs.
Now, assume $C$ is regular. Then, $C \cap 0^*1^* = B$ is also regular.
This contradicts what we proved in Example 1.38.
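Running two DFAs in parallel amounts to the product construction. The sketch below is a minimal illustration that assumes both DFAs are total and share the same alphabet; the tuple encoding `(delta, start, accepts)` and the two example machines are hypothetical. Since $C$ itself has no DFA, the example intersects $0^*1^*$ with the language of even-length strings instead.

```python
from itertools import product

def run(dfa, w):
    delta, q, accepts = dfa
    for c in w:
        q = delta[(q, c)]
    return q in accepts

def intersect(d1, d2):
    """Product DFA: states are pairs, accepting only if both components accept."""
    (t1, s1, f1), (t2, s2, f2) = d1, d2
    states1 = {q for q, _ in t1}
    states2 = {q for q, _ in t2}
    alphabet = {c for _, c in t1}
    delta = {((p, q), c): (t1[(p, c)], t2[(q, c)])
             for p in states1 for q in states2 for c in alphabet}
    return delta, (s1, s2), {(p, q) for p in f1 for q in f2}

# 0*1*: state a = reading 0s, b = reading 1s, d = dead state.
ZEROS_THEN_ONES = ({("a", "0"): "a", ("a", "1"): "b",
                    ("b", "0"): "d", ("b", "1"): "b",
                    ("d", "0"): "d", ("d", "1"): "d"}, "a", {"a", "b"})
# Strings of even length.
EVEN_LENGTH = ({("e", "0"): "o", ("e", "1"): "o",
                ("o", "0"): "e", ("o", "1"): "e"}, "e", {"e"})
BOTH = intersect(ZEROS_THEN_ONES, EVEN_LENGTH)

def words(max_len):
    for n in range(max_len + 1):
        for t in product("01", repeat=n):
            yield "".join(t)

product_ok = all(run(BOTH, w) == (run(ZEROS_THEN_ONES, w) and run(EVEN_LENGTH, w))
                 for w in words(8))
```

The product has at most $|Q_1| \cdot |Q_2|$ states, which is why the intersection of two regular languages stays regular.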
Example 1.40
Claim: $F = \{ww \mid w \in \{0,1\}^*\}$ is not regular.
Proof. Assume the contrary that $F$ is regular.
Let $p$ be the pumping length and let $s = 0^p 1 0^p 1 \in F$. This $s$ can be split into pieces like $s = xyz$ with $|y| \ge 1$ and $|xy| \le p$, and $xy^n z \in F$ for any $n \ge 0$.
Then we must have $y = 0^k$ for some nonzero $k$.
We can easily see that $xy^n z \notin F$ for any $n \ne 1$, which contradicts the assumption.
Example 1.41 Unary Language
Claim: $D = \{1^{n^2} \mid n \ge 0\}$ is not regular.
Proof. Assume the contrary that $D$ is regular.
Let $p$ be the pumping length and let $s = 1^{p^2} \in D$. This $s$ can be split into pieces like $s = xyz$ with $|y| \ge 1$ and $|xy| \le p$, and $xy^n z \in D$ for any $n \ge 0$.
The length of $xy^n z$ grows linearly with $n$, while the lengths of strings in $D$ grow as $0, 1, 4, 9, 16, 25, 36, 49, \ldots$
These two facts are incompatible, as can be easily seen.
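The incompatibility can be made explicit by comparing the pumped length with the next perfect square. The following worked inequality is a sketch that uses only $|y| \ge 1$ and $|xy| \le p$ from the pumping lemma, with $n = 2$ as the chosen pumping count:

```latex
|xy^2z| = |xyz| + |y| = p^2 + |y|, \qquad 1 \le |y| \le |xy| \le p,
\\
p^2 \;<\; p^2 + |y| \;\le\; p^2 + p \;<\; p^2 + 2p + 1 = (p+1)^2 .
```

The length of $xy^2z$ therefore lies strictly between two consecutive squares, so $xy^2z \notin D$.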
Example 1.42 Pumping Down
Claim: $E = \{0^i 1^j \mid i > j\}$ is not regular.
Proof. Assume the contrary that $E$ is regular.
Let $p$ be the pumping length and let $s = 0^{p+1} 1^p$. This $s$ can be split into $s = xyz$ with $|y| \ge 1$ and $|xy| \le p$, and $xy^n z \in E$ for any $n \ge 0$.
Then we must have $y = 0^k$ for some nonzero $k$.
We can easily see that $xy^0 z = xz \notin E$, which contradicts the assumption.
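The reason this example pumps down rather than up can be seen by checking both directions. The sketch below is illustrative only, with hypothetical helper names: for every admissible split of $s = 0^{p+1}1^p$ it records whether pumping up ($n = 1, \ldots, 4$) stays in $E$ and whether removing $y$ ($n = 0$) leaves $E$.

```python
import re

def in_E(w):
    """Membership in E = {0^i 1^j | i > j}."""
    m = re.fullmatch(r"(0*)(1*)", w)
    return m is not None and len(m.group(1)) > len(m.group(2))

def check(p):
    """Return (pumping up always stays in E, pumping down always leaves E)."""
    s = "0" * (p + 1) + "1" * p
    up_stays, down_leaves = True, True
    for i in range(p + 1):               # |x| = i
        for j in range(i + 1, p + 1):    # |xy| = j <= p, so |y| >= 1
            x, y, z = s[:i], s[i:j], s[j:]
            up_stays &= all(in_E(x + y * n + z) for n in range(1, 5))
            down_leaves &= not in_E(x + z)
    return up_stays, down_leaves

CHECKS = {p: check(p) for p in range(1, 7)}
```

Pumping up only adds 0s and so never violates $i > j$; the contradiction has to come from $n = 0$.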
Problem 1.41 Differential Encoding
Claim: $D = \{w \mid w \text{ contains an equal number of occurrences of the substrings 01 and 10}\}$ is regular.
[Figure: DFA with start state qstart and states 0xx0, 0xx1, 1xx1, and 1xx0, with arrows labeled 0 and 1.]
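The automaton in the figure can be written out and tested. The transition table below is a reconstruction under assumptions, since the arrows did not survive extraction: a state such as `0xx1` is read as "the input starts with 0 and currently ends with 1", and the accepting states are taken to be `qstart`, `0xx0`, and `1xx1`, because the counts of 01 and 10 are equal exactly when the string is empty or its first and last symbols agree.

```python
from itertools import product

# Assumed transitions: the first symbol is remembered, the last one is updated.
DELTA = {
    ("qstart", "0"): "0xx0", ("qstart", "1"): "1xx1",
    ("0xx0", "0"): "0xx0", ("0xx0", "1"): "0xx1",
    ("0xx1", "0"): "0xx0", ("0xx1", "1"): "0xx1",
    ("1xx1", "0"): "1xx0", ("1xx1", "1"): "1xx1",
    ("1xx0", "0"): "1xx0", ("1xx0", "1"): "1xx1",
}
ACCEPT = {"qstart", "0xx0", "1xx1"}   # assumed accepting states

def accepts(w):
    q = "qstart"
    for c in w:
        q = DELTA[(q, c)]
    return q in ACCEPT

def in_D(w):
    """Direct definition: equal numbers of the substrings 01 and 10."""
    return w.count("01") == w.count("10")

# Strings of length <= 10 on which the DFA and the definition disagree.
MISMATCHES = [w
              for n in range(11)
              for w in map("".join, product("01", repeat=n))
              if accepts(w) != in_D(w)]
```

`str.count` counts non-overlapping occurrences, which is exact here because neither 01 nor 10 can overlap with itself.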